Using the MERGE statement is pretty well documented however what about using it to update a dimension that includes Type 1 and Type 2 changes? There are examples on the web but they're either not SQL specific or seem flawed in one way or another. To follow this example you'll need the AdventureWorksDW2008 database available here. The R2 version of AdventureWorksDW might work but as yet I haven't tested it. Firstly lets set up a staging table and a dimension table:
SELECT ProductKey,ProductAlternateKey AS ProductId,EnglishProductName,
ISNULL(ListPrice, 0) AS ListPrice,HASHBYTES('SHA1', EnglishProductName)
AS Type1CheckSum,
HASHBYTES('SHA1', CAST(ISNULL(ListPrice, 0) AS VARCHAR(10))) AS
Type2CheckSum
INTO ProductStaging
FROM AdventureWorksDW2008..DimProduct
WHERE EndDate IS NULL
GO
CREATE TABLE [dbo].[DimProduct]
(
[ProductKey] [INT] IDENTITY(1, 1) NOT NULL,
ProductId NVARCHAR(25) NOT NULL,
[EnglishProductName] [NVARCHAR](50) NOT NULL,
[ListPrice] [MONEY] NOT NULL,
DateUpdated DATETIME NOT NULL DEFAULT GETDATE(),
Type1Checksum VARBINARY(8000) NULL DEFAULT 0,
Type2Checksum VARBINARY(8000) NULL DEFAULT 0,
StartDate DATETIME NOT NULL DEFAULT GETDATE(),
EndDate DATETIME NULL,
CurrentFlag BIT NOT NULL DEFAULT 1
CONSTRAINT [PK_DimProduct_ProductKey] PRIMARY KEY CLUSTERED ( [ProductKey]
ASC )
)
Note that we're using HASHBYTES on the staging table - this will be used to compare columns. The SHA1 algorithm ensures sensitivity to changes across those columns. If you're using non-character fields you'll need to cast them before hashing them.
Firstly lets add new records (at this point all of them):
MERGE dbo.DimProduct AS dim
USING ProductStaging AS stg
ON dim.ProductId=stg.ProductId
WHEN MATCHED AND dim.Type1Checksum<>stg.Type1Checksum THEN --type1 changes
UPDATE SET dim.Type1Checksum=stg.Type1Checksum,
dim.EnglishProductName=stg.EnglishProductName,
dim.DateUpdated=GETDATE()
WHEN NOT MATCHED THEN --new records
INSERT (ProductId, EnglishProductName, ListPrice, Type1Checksum, Type2Checksum)
VALUES (stg.ProductId, stg.EnglishProductName, stg.ListPrice, stg.Type1Checksum, stg.Type2Checksum);
You'll note that this statement actually does the Type 1 updates as well - ie where the checksum\hash columns are different.
OK, lets add some changes to the ProductName - which will be handled as Type 1 overwrites, and some changes to ListPrice - which we'll handle as Type 2:
UPDATE ProductStaging
SET EnglishProductName = 'Fred',
Type1CheckSum = HASHBYTES('SHA1', 'Fred')
WHERE ProductId = 'AR-5381';
UPDATE ProductStaging
SET ListPrice = 9.99,
Type2CheckSum = HASHBYTES('SHA1', CAST(ISNULL(9.99, 0) AS VARCHAR(10)))
WHERE ProductId = 'BE-2349';
GO
So now we'll do the Type 1 update using the same MERGE statement as above. We also set the DateUpdated column - for auditing purposes:
MERGE dbo.DimProduct AS dim
USING ProductStaging AS stg
ON dim.ProductId = stg.ProductId
WHEN MATCHED AND dim.Type1Checksum<>stg.Type1Checksum THEN --type1 changes
UPDATE SET dim.Type1Checksum = stg.Type1Checksum,
dim.EnglishProductName = stg.EnglishProductName,
dim.DateUpdated = GETDATE()
WHEN NOT MATCHED THEN --new records
INSERT (ProductId,
EnglishProductName,
ListPrice,
Type1Checksum,
Type2Checksum)
VALUES (stg.ProductId,
stg.EnglishProductName,
stg.ListPrice,
stg.Type1Checksum,
stg.Type2Checksum);
Here we're careful to ensure that Type 1 changes apply to all records including the historic ones. Some examples filter for current records only...
Finally the Type 2 INSERTS. The trick here is to use the OUTPUT statement on the UPDATE to create a pseudo-table and then SELECT from that into the DimProduct dimension table to create the new 'current' record. Note that I'm also setting the CurrentFlag and EndDate on the old dimension record - so that its clear this is a historic record. Again all in one statement:
INSERT dbo.DimProduct
(ProductId,EnglishProductName,ListPrice,DateUpdated,Type1Checksum,
Type2Checksum,
CurrentFlag)
SELECT ProductId,EnglishProductName,ListPrice,GETDATE(),Type1Checksum,
Type2Checksum,1
FROM (
MERGE DimProduct AS dim
USING ProductStaging AS stg
ON dim.ProductId = stg.ProductId
WHEN MATCHED AND dim.Type2Checksum<>stg.Type2Checksum AND dim.CurrentFlag=1 THEN
UPDATE SET CurrentFlag = 0,
EndDate = GETDATE()
OUTPUT $Action AS Operation,stg.ProductId,stg.EnglishProductName,stg.ListPrice,
stg.Type1Checksum,stg.Type2Checksum
) AS MergeOut
WHERE Operation = 'UPDATE';
So that's it! Hope its um...useful.
1 comment:
Very useful..niche!
Post a Comment