Monday 8 August 2011

Splitting dates over monthly periods using a 'numbers table'

One of the things you may find yourself having to do is to take a single record with a start and end date and subdivide it into monthly or weekly periods. Typically this is so that you can create a record for each month or week in a data warehouse, and have a data structure thats easy to report from. There are many ways of doing this of course, but an elegant set-based solution involves using a 'numbers table'.

Numbers tables are well documented elsewhere, however they can now be created in a trivial fashion in Denali using a Sequence, with the following script.


USE adventureworks2008r2
SET nocount ON

IF EXISTS (SELECT 1
           FROM   information_schema.tables
           WHERE  table_name = 'Numbers')
  DROP TABLE numbers

GO

CREATE TABLE numbers
  (
     NUMBER INT NOT NULL,
     CONSTRAINT pk_numbers PRIMARY KEY CLUSTERED (NUMBER) WITH FILLFACTOR = 100
  )

IF EXISTS (SELECT 1
           FROM   sys.objects
           WHERE  name = 'SeqNumbers'
AND type_desc = 'SEQUENCE_OBJECT')
  DROP sequence dbo.seqnumbers

GO

CREATE SEQUENCE SeqNumbers
AS INT
MINVALUE 0
GO

INSERT Numbers(Number) VALUES(NEXT VALUE FOR SeqNumbers)
GO 10000

This will give you a table of numbers from 0-10000.

Then a simple query will split a table into monthly records - or any other time period - by joining to the numbers table using the difference between the start and end dates of the rows you want to split. An example will probably make it clearer - we'll use the Production.BillOfMaterials in the AdventureWorks2008R2 database.

Firstly lets make the dates a little less uniform:

UPDATE  [Production].[BillOfMaterials]
SET    EndDate = '20040405'
WHERE  BillOfMaterialsID=271

UPDATE  [Production].[BillOfMaterials]
SET    StartDate='20040731', EndDate = '20040801'
WHERE  BillOfMaterialsID=1950


UPDATE  [Production].[BillOfMaterials]
SET    StartDate='20041031', EndDate = '20051130'
WHERE  BillOfMaterialsID=2899

Querying the Production.BillOfMaterials for an Id and start and end dates and quantity looks something like this:


Taking the difference between the start and end dates and joining to the numbers table gives one row per month that exists between the start and end dates. A little jiggery-pokery with the date functions gives you the start and end of each month over which the bill of materials spanned, like so;

SELECT [BillOfMaterialsID],
       [StartDate],
       [EndDate],
     DATEDIFF(m, StartDate, EndDate) NumberOfMonths,
     CAST(CONVERT(CHAR(6), DATEADD(m, Number, StartDate), 112)+'01' AS DATETIME) MonthStartDate,
     DATEADD(dd, -1, (CAST(CONVERT(CHAR(6), DATEADD(m, Number+1, StartDate), 112)+'01' AS DATETIME))) MonthEndDate,
     CAST(PerAssemblyQty/(DATEDIFF(m, StartDate,  EndDate)+1) AS DECIMAL(5,2))PerAssemblyQtyMonthly,
     PerAssemblyQty
  FROM [Production].[BillOfMaterials] b
  INNER JOIN dbo.Numbers n ON DATEDIFF(m, StartDate, EndDate)>=n.Number
  WHERE [EndDate] IS NOT NULL
You'll note that I also divided the PerAssemblyQty by the number of months (adding one because the DATEDIFF on two dates in the same month returns 0). Finally lets see the result set

No comments: