SQL Server – Achieving minimally logged inserts into a partitioned table with a computed column as the partitioning key

A while ago, I started a thread on SQLServerCentral discussing tactics for partitioning a huge table with respect to index alignment.  During the discussion, Steve Hall pointed out a great way of working with cycling partitions using a computed column, which meant some of the maintenance aspects of partition archiving were a doddle.  I’ve since adopted this tactic as a standard method for partitioning tables that have a rolling archive window, and for the most part it works very nicely.

Recently, we’ve had a requirement to move another large heap table into a partitioning structure to ease maintenance on the table and to ensure we aren’t just storing unneeded data forever.  After all, querying a 98GB heap with around half a billion rows just isn’t a good idea anyway!  In this instance, however, the old data needed migrating into the new table, which seemed to create a bit of a problem in terms of the logging level.

As per the BOL guidelines on operations that can be minimally logged, we ensured that all of the following were met (a quick way to verify them is shown after the list):-

  • The database is in the SIMPLE recovery model.
  • The target table is empty, albeit with a partitioned clustered index.
  • The insert statement specifies the WITH (TABLOCK) hint.
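
For reference, here’s a quick way to confirm the recovery model before kicking off the load (the database name in the commented-out ALTER DATABASE is purely illustrative, and you should only change recovery model if you understand the impact on your log backup chain):-

/* check the current recovery model of the database we're loading into */
SELECT name
, recovery_model_desc
FROM sys.databases
WHERE name = DB_NAME()

/* switch to SIMPLE if required (illustrative database name) */
--ALTER DATABASE [MyArchiveDB] SET RECOVERY SIMPLE
GO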

Despite all these settings, however, when performing the insert, the log started to grow rapidly.  Querying sys.fn_dblog returned a swathe of LOP_INSERT_ROWS records for the context LCX_CLUSTERED.  I had an inkling that this might be due to the computed column, so I went to Twitter to see if anyone had any thoughts on the matter.  Luckily for me, Adam Machanic was kind enough to have a look at the problem and share his thoughts.
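
For anyone wanting to reproduce the check, the query was along these lines (run it in the context of the database doing the load; it simply counts the fully logged row inserts since the last checkpoint):-

/* count fully logged row inserts against the clustered index */
SELECT D.AllocUnitName
, COUNT(*) AS NumberLogRecords
FROM sys.fn_dblog(NULL, NULL) AS D
WHERE D.Operation = N'LOP_INSERT_ROWS'
AND D.Context = N'LCX_CLUSTERED'
GROUP BY D.AllocUnitName
ORDER BY NumberLogRecords DESC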

The first suggestion was to try trace flag 610 in the session, but this made no difference to the statement.  However, it did prompt me to test whether minimal logging could be achieved without the computed column, so I knocked up a copy of the table with the computed column replaced by a straight TINYINT and populated it by moving the computed expression into the insert statement.  From checking sys.fn_dblog, it was obvious straight away that minimal logging was being applied, and all the LOP_INSERT_ROWS records were gone.
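
I no longer have the exact test script to hand, but the throwaway copy looked roughly like this – note it borrows the dbo.Heap table and the pscTenYears partition scheme that are only created further down in this post, so treat it as a sketch rather than something to run at this point:-

/* copy of the table with the computed column replaced by a plain TINYINT (illustrative names) */
CREATE TABLE dbo.PartitionedTable_NoComputed
(DateInserted DATETIME2(3) NOT NULL
, SomeData VARCHAR(255) NULL
, PartitionKey TINYINT NOT NULL
)

CREATE CLUSTERED INDEX [CIX_dbo:PartitionedTable_NoComputed-PartitionKey:DateInserted]
ON dbo.PartitionedTable_NoComputed (PartitionKey, DateInserted)
WITH (DATA_COMPRESSION = PAGE) ON pscTenYears(PartitionKey)
GO

/* the computed expression moves into the insert, and this load was minimally logged */
INSERT INTO dbo.PartitionedTable_NoComputed WITH (TABLOCK)
(DateInserted
, SomeData
, PartitionKey)
SELECT DateInserted
, SomeData
, CONVERT(TINYINT, DATEDIFF(YEAR,(0),DateInserted)%10+1)
FROM dbo.Heap
GO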

Adam was then able to reproduce the issue, and (via a series of tweets) suggested the workaround: stage each partition’s data in a non-partitioned table with a matching clustered index, and then switch it into the partitioned table.

So, putting this together, we need to create an additional staging table with the same clustered index structure as the final partitioned target table and with the computed column; importantly, though, this table must not be partitioned.  We also need to create a check constraint on the staging table to limit the data within it to only the criteria of the partition we’re switching into; if we don’t, we will get the following error:-

Msg 4972, Level 16, State 1, Line 1

ALTER TABLE SWITCH statement failed. Check constraints or partition function of source table allows values that are not allowed by check constraints or partition function on target table.

Bear in mind that NULL equates to “unknown” for the purposes of check constraints and partition functions, as per the answer from Paul White to this Stack Exchange question, so you will need to ensure that the check constraint handles NULLs accordingly.  You’re also going to need either a separate staging table for each partition, each with its own check constraint, or to drop and recreate the check constraint on the staging table each time.
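
As a quick illustration of that NULL behaviour (a throwaway table, nothing to do with the walkthrough below): a check constraint only rejects rows where the predicate evaluates to FALSE, so a NULL slips through a simple equality check unless it is excluded explicitly – which is why the constraints later in this post include an explicit IS NOT NULL clause:-

/* illustrative only: UNKNOWN is not FALSE, so the NULL insert succeeds */
CREATE TABLE dbo.CheckNullDemo
(Val TINYINT NULL CONSTRAINT [CHK_Val=1] CHECK (Val = 1))

INSERT INTO dbo.CheckNullDemo (Val) VALUES (1)    /* allowed: predicate is TRUE */
INSERT INTO dbo.CheckNullDemo (Val) VALUES (NULL) /* also allowed: predicate is UNKNOWN */
--INSERT INTO dbo.CheckNullDemo (Val) VALUES (2)  /* rejected: predicate is FALSE */

DROP TABLE dbo.CheckNullDemo
GO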

So, stepping through the code to set up an example of this, we’ll start off by creating a heap and adding some data, then creating a partition function and scheme for our target table:-

/* clean up any objects from a previous run, then create the Heap table */
IF OBJECT_ID('dbo.Heap') IS NOT NULL
DROP TABLE dbo.Heap

IF OBJECT_ID('dbo.PartitionedTable') IS NOT NULL
DROP TABLE dbo.PartitionedTable

IF OBJECT_ID('dbo.PartitionedTable_Staging_Partition1') IS NOT NULL
DROP TABLE dbo.PartitionedTable_Staging_Partition1

IF OBJECT_ID('dbo.PartitionedTable_Staging_Partition2') IS NOT NULL
DROP TABLE dbo.PartitionedTable_Staging_Partition2

IF EXISTS (SELECT * FROM sys.partition_schemes AS PS WHERE name = 'pscTenYears')
DROP PARTITION SCHEME pscTenYears

IF EXISTS (SELECT * FROM sys.partition_functions AS PF WHERE name = 'pfnTenYears')
DROP PARTITION FUNCTION pfnTenYears
GO

CREATE TABLE dbo.Heap
(DateInserted DATETIME2(3) NOT NULL
, SomeData VARCHAR(255))
GO

/* create ten years of dates with 1,000 rows per day */
DECLARE @StartDate DATETIME2(3) = '2006-12-31'

INSERT INTO dbo.Heap WITH (TABLOCK)
(DateInserted
, SomeData)
SELECT DateInserted
, name
FROM (
SELECT TOP 3652 DATEADD(dd, ROW_NUMBER() OVER (ORDER BY (SELECT 1)), @StartDate) AS DateInserted
, c1.name
FROM sys.columns C1
CROSS JOIN sys.columns C2
) A
CROSS JOIN (SELECT TOP 1000 1 AS multiplier
FROM sys.columns c3
CROSS JOIN sys.columns c4) b
GO

/* create the partition function we need to cycle 10 years (remember we need spare partitions because one needs to be empty at all times) */
CREATE PARTITION FUNCTION pfnTenYears(TINYINT) AS RANGE LEFT FOR VALUES (1,2,3,4,5,6,7,8,9,10,11)
GO

/* create the partition scheme */
CREATE PARTITION SCHEME pscTenYears AS PARTITION pfnTenYears ALL TO ([DEFAULT])
GO

/* create the target, partitioned table */
CREATE TABLE dbo.PartitionedTable
(DateInserted DATETIME2(3) NOT NULL
, SomeData Varchar(255) NULL
, PartitionKey AS CONVERT(TINYINT, DATEDIFF(YEAR,(0),DateInserted)%10+1) PERSISTED NOT NULL
)

/* add the partitioned clustered index */
CREATE CLUSTERED INDEX [CIX_dbo:PartitionedTable-PartitionKey:DateInserted]
ON dbo.PartitionedTable (PartitionKey, DateInserted)
WITH (DATA_COMPRESSION = PAGE) ON pscTenYears(PartitionKey)
GO

Next, we’re going to create the interim staging table.  Importantly, this will have a check constraint to restrict the value of the PartitionKey column, in order to ensure that it aligns with the target partition we will be switching into.  We’re then going to insert the first partition’s worth of data into the table:-

/* create an intermediate staging table*/
CREATE TABLE dbo.PartitionedTable_Staging_Partition1
(DateInserted DATETIME2(3) NOT NULL
, SomeData VARCHAR(255)
, PartitionKey AS CONVERT(TINYINT, DATEDIFF(YEAR,(0),DateInserted)%10+1) PERSISTED NOT NULL
)

CREATE CLUSTERED INDEX [CIX_dbo:PartitionedTable-PartitionKey:DateInserted]
ON dbo.PartitionedTable_Staging_Partition1 (PartitionKey, DateInserted)
WITH (DATA_COMPRESSION = PAGE) ON [PRIMARY]

/* we need this constraint otherwise the partition switch won't work*/
ALTER TABLE dbo.PartitionedTable_Staging_Partition1
ADD CONSTRAINT [CHK_PartitionKey=1] CHECK (PartitionKey = 1 AND PartitionKey IS NOT NULL)
GO

/* do a checkpoint to clear the log */
CHECKPOINT
GO

/* insert only the first partition */
INSERT INTO dbo.PartitionedTable_Staging_Partition1 WITH (TABLOCK)
(DateInserted
, SomeData)
SELECT DateInserted
, SomeData
FROM dbo.Heap
WHERE $PARTITION.pfnTenYears(CONVERT(TINYINT, DATEDIFF(YEAR,(0),DateInserted)%10+1)) = 1

I’m now going to query the transaction log using fn_dblog, to have a look at whether the row inserts were fully logged:-

/* look at the log for the new insert; we shouldn't see any LOP_INSERT_ROWS records */
SELECT D.AllocUnitName
, context
, COUNT(*) NumberLogRecords
FROM sys.fn_dblog(NULL, NULL) AS D
GROUP BY D.AllocUnitName
, context
ORDER BY NumberLogRecords DESC

Here’s my result set; as we can see, only the PFS, IAM and GAM page changes for the table have been logged:-

(Screenshot: MinimalLoggedInsert – fn_dblog output showing only allocation page activity)

Lastly, to move this partition into our target table we can run the following statement:-

/* switch this into our original table */
ALTER TABLE dbo.PartitionedTable_Staging_Partition1
SWITCH TO dbo.PartitionedTable PARTITION 1
GO

Doing the rest of the partitions is a pretty similar operation; here’s the same set of steps again, changed to target the second partition:-

/* to do the second partition we can run the following; the code is pretty much the same as the first insert; just where the partition key = 2*/

CREATE TABLE dbo.PartitionedTable_Staging_Partition2
(DateInserted DATETIME2(3) NOT NULL
, SomeData VARCHAR(255)
, PartitionKey AS CONVERT(TINYINT, DATEDIFF(YEAR,(0),DateInserted)%10+1) PERSISTED NOT NULL
)

CREATE CLUSTERED INDEX [CIX_dbo:PartitionedTable-PartitionKey:DateInserted]
ON dbo.PartitionedTable_Staging_Partition2 (PartitionKey, DateInserted)
WITH (DATA_COMPRESSION = PAGE) ON [PRIMARY]

/* we need this constraint otherwise the partition switch won't work*/
ALTER TABLE dbo.PartitionedTable_Staging_Partition2
ADD CONSTRAINT [CHK_PartitionKey=2] CHECK (PartitionKey = 2 AND PartitionKey IS NOT NULL)
GO

/* do a checkpoint to clear the log */
CHECKPOINT
GO

/* insert only the second partition */
INSERT INTO dbo.PartitionedTable_Staging_Partition2 WITH (TABLOCK)
(DateInserted
, SomeData)
SELECT DateInserted
, SomeData
FROM dbo.Heap
WHERE $PARTITION.pfnTenYears(CONVERT(TINYINT, DATEDIFF(YEAR,(0),DateInserted)%10+1)) = 2

/* look at the log for the new insert */
SELECT D.AllocUnitName
, context
, COUNT(*) NumberLogRecords
FROM sys.fn_dblog(NULL, NULL) AS D
GROUP BY D.AllocUnitName
, context
ORDER BY NumberLogRecords DESC

/* switch this into our original table */
ALTER TABLE dbo.PartitionedTable_Staging_Partition2
SWITCH TO dbo.PartitionedTable PARTITION 2
GO
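
As an aside – this isn’t part of the original walkthrough, and the object names simply follow the convention above – if you had to work through all ten partitions, you could generate the same staging pattern with a bit of dynamic SQL rather than hand-writing each table.  A rough sketch for the remaining partitions:-

/* sketch only: loop the staging table / insert / switch pattern over partitions 3 to 10 */
DECLARE @p INT = 3, @n NVARCHAR(2), @sql NVARCHAR(MAX)

WHILE @p <= 10
BEGIN
    SET @n = CAST(@p AS NVARCHAR(2))

    SET @sql = N'
    CREATE TABLE dbo.PartitionedTable_Staging_Partition' + @n + N'
    (DateInserted DATETIME2(3) NOT NULL
    , SomeData VARCHAR(255)
    , PartitionKey AS CONVERT(TINYINT, DATEDIFF(YEAR,(0),DateInserted)%10+1) PERSISTED NOT NULL);

    CREATE CLUSTERED INDEX [CIX_Staging_Partition' + @n + N']
    ON dbo.PartitionedTable_Staging_Partition' + @n + N' (PartitionKey, DateInserted)
    WITH (DATA_COMPRESSION = PAGE) ON [PRIMARY];

    ALTER TABLE dbo.PartitionedTable_Staging_Partition' + @n + N'
    ADD CONSTRAINT [CHK_PartitionKey=' + @n + N'] CHECK (PartitionKey = ' + @n + N' AND PartitionKey IS NOT NULL);

    INSERT INTO dbo.PartitionedTable_Staging_Partition' + @n + N' WITH (TABLOCK)
    (DateInserted, SomeData)
    SELECT DateInserted, SomeData
    FROM dbo.Heap
    WHERE $PARTITION.pfnTenYears(CONVERT(TINYINT, DATEDIFF(YEAR,(0),DateInserted)%10+1)) = ' + @n + N';

    ALTER TABLE dbo.PartitionedTable_Staging_Partition' + @n + N'
    SWITCH TO dbo.PartitionedTable PARTITION ' + @n + N';'

    EXEC sys.sp_executesql @sql

    SET @p += 1
END
GO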

Let’s have a look at the table PartitionKey counts, and the number of rows in each partition:-

/* now let's check our partitioned table */

SELECT PT.PartitionKey
, COUNT(*) NumberRowsPerPartition
FROM dbo.PartitionedTable AS PT
GROUP BY PT.PartitionKey

/* and if we look at the metadata for the partitions, we'll see that's also correct */

SELECT sc.name AS SchemaName
, ob.name AS TableName
, i.name AS IndexName
, p.partition_number
, p.rows
FROM sys.objects AS ob
INNER JOIN sys.schemas AS sc
	ON ob.schema_id = sc.schema_id
INNER JOIN sys.indexes AS i
	ON ob.object_id = i.object_id
INNER JOIN sys.partitions AS p
	ON i.index_id = p.index_id
	AND ob.object_id = p.object_id
WHERE ob.name = 'PartitionedTable'
ORDER BY SchemaName, TableName, IndexName, partition_number

(Screenshot: PartitionData – rows per partition)

So, we’ve been able to perform two separate inserts into the same table whilst taking advantage of minimal logging.  Neat, huh?  The next step, which I’ll look at covering in a future post, will be to utilize this in SSIS to perform a single scan of the table and a conditional split, which will then insert each chunk of data into its staging table before using the switch to pull it all together.

Very big thanks to Adam Machanic for his assistance in resolving this problem!
