Sunday, 14 November 2010

SQL Server Denali: SSIS Configuration and Package Execution

One of the first things you notice from the release notes is the change to SSIS configurations in Denali. This is excellent news, because the use of configurations in previous versions of SSIS wasn't without its problems. The inimitable Jamie Thomson has a series of excellent posts about it which I won't repeat here, plus there is additional information on the MSDN Wiki.

Briefly however, the new deployment model involves deployment to a specially constructed SQL DB otherwise known as 'SSISDB' (at least in this CTP), which appears under its own node in SSMS - with its own GUI options. The SSISDB database is the SSIS 'Catalog' and includes the concepts of environments and parameters to allow run-time configuration of packages. SSIS is therefore now 'environment aware', and there is no need to store configuration locations outside of SSIS in Windows environment variables and so on - which was always a bit awkward. Parameters can now be created at project level, removing the need to create Parent package configurations. All of this can be administered using views and stored procedures, the API, or the aforementioned SSMS GUI.
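To give a flavour of that 'views and stored procedures' surface, listing what has been deployed is just a query against the catalog views - something like this (object names are as per the CTP documentation and may well change before release):

-- lists each package in each deployed project
SELECT p.name   AS project_name,
       pkg.name AS package_name
FROM   catalog.projects p
       INNER JOIN catalog.packages pkg
         ON pkg.project_id = p.project_id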

However, the one thing I was left wondering was: 'how do I execute a package under the new configuration model without using the SSMS GUI?'. DTExec still exists but does not appear to have options to use the new environment variables (at least in the current CTP). Happily, perusal of the new MSDN documentation reveals the answer.

Firstly, create an environment called 'env1', then deploy an SSIS project called 'Project' containing a package called 'Package1' to a folder called 'Test' (see Jamie's blog for how to set these up). Then you need to retrieve the appropriate reference id for the project and environment:

DECLARE @execution_id BIGINT
DECLARE @reference_id BIGINT

SELECT @reference_id = e.reference_id
FROM   catalog.project_environments e
       INNER JOIN catalog.projects p
         ON e.project_id = p.project_id
WHERE  e.environment_name = 'env1'
       AND p.name = 'Project'

Then we need to tell the SSIS Catalog that we want to execute the package with the particular reference:

EXECUTE catalog.create_execution
  @folder_name = 'Test',
  @project_name = 'Project',
  @package_name = 'Package1.dtsx',
  @reference_id = @reference_id,
  @execution_id = @execution_id OUTPUT

Finally tell SSIS to execute using the execution id created above:

EXECUTE catalog.start_execution @execution_id = @execution_id
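Incidentally, if you want to check how an execution got on without the SSMS GUI, the catalog views should cover that too - something like the following (view and column names are taken from the MSDN documentation and may change between CTPs):

-- status is an integer code (created/running/succeeded/failed etc - see the docs)
SELECT execution_id, status, start_time, end_time
FROM   catalog.executions
WHERE  execution_id = @execution_id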

A couple of notes:

1. The package referenced by 'create_execution' has to have the 'Entry-point package' property set in Visual Studio:


2. There is an unused parameter in the above call to 'create_execution' - '@use32bitruntime' - use this on 64-bit environments if you want to force SSIS to execute in 32-bit mode.
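For completeness, here's roughly what that call looks like with the parameter filled in - a minimal sketch rather than anything definitive for this CTP:

EXECUTE catalog.create_execution
  @folder_name = 'Test',
  @project_name = 'Project',
  @package_name = 'Package1.dtsx',
  @reference_id = @reference_id,
  @use32bitruntime = 1,  -- force the 32-bit runtime on a 64-bit box
  @execution_id = @execution_id OUTPUT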

This leaves two overall configuration options as I see it. You could either:

1. Have one SSIS 'environment' for every application you have. This means that the SSIS environment referenced at runtime would be the same across Dev, Test, Live and so on. Obviously the parameters themselves would be different!

2. Have all of your application/network environments (ie Dev, Test, Live) created in every instance of the SSIS Catalog and reference the appropriate one at runtime. This is obviously more flexible but probably carries more risk of running the wrong configuration on the wrong environment - with potentially serious consequences!

SQL Server Denali: First Impressions

So I, along with a great many others, have been playing around with Denali CTP1 over the last few days and reading about the many announcements made at SQL PASS. It's become clear that it's probably the biggest release since SQL Server 2005 in terms of new features, with possibly the biggest splash being made by 'Project Crescent' (the reporting tool I mentioned in my previous Denali post) and the Business Intelligence Semantic Model (BISM).

They're both described in quite a bit of detail here, and the introduction of BISM has provoked some misgivings in the developer community. However, the introduction of the PowerPivot engine (otherwise known as VertiPaq) to developer tools has to be a good thing, even if they haven't managed to integrate it with Analysis Services. Potentially this could even mean the end of data warehousing and ETL, although this has been tried before...

Project Crescent is Silverlight-based and looks like the sort of end-user BI tool those of us working with MS BI software have been waiting for since Reporting Services came out. There's a very brief demo of it here.

On a more detailed note, as I said in my previous post you won't be able to use the new Denali CTP1 projects if you have 2008 or 2008 R2 installed, and I found that I had to uninstall the 2008 Team Foundation Client as well. However, if you're using Windows 7, PowerShell 2.0 is already installed.

Additionally, on starting SQL Server Management Studio (SSMS) I found the Create Catalog option on the Integration Services folder (of which more in a later post) greyed out as follows:



When I tried to connect to the (now superseded) Integration Services Server I got the following 'Access denied' error:



On Windows 7 both problems were solved simply by running SSMS in Administrator mode: right-click the SSMS icon and select 'Run as Administrator':


More soon...

Thursday, 11 November 2010

Denali available for download

As promised by MS the first CTP of the next version of SQL Server is available for download now. What's new? Check here and here.

One thing that immediately jumps out at me is the following:

Business users will gain greater agility and empowerment with a new highly interactive, web-based data exploration and visualization solution. Introducing new and engaging ways to discover insights through a drag-and-drop interface, smart and powerful data querying and interactive storyboarding to allow users to quickly create and share visual presentations of large datasets.
Sounds intriguing - but I'm not sure it's in the CTP. Needless to say I'm downloading now...

Update:

It's worth checking the release notes here. There is a dependency on PowerShell 2.0 - and more importantly no support for side-by-side installations with 2008/R2. There is also no support for XP, regardless of SP version.

Thursday, 28 October 2010

Integration Services Lookup caching - Part one

Lookup caching in Integration Services is a new feature in 2008 which didn't get a lot of attention at the time of release, and even now a search reveals posts that discuss one aspect or another, but maybe not the whole story. Hopefully I can redress the balance here a little.

First of all then - what is Lookup caching?

Back in Integration Services 2005 there was a transformation task which allowed you to match a record in the data flow with a record or field in a database via an OLE DB connection. In a data warehouse context this was often used to look up surrogate keys when loading fact tables. This was OK, but every time a lookup was needed it required a round-trip to the database. If you're loading a data warehouse with role-playing dimensions you might be looking up the same values for the same fact table, in the same data flow. There were some options for performance tuning, but essentially you always had to connect to the database in some way every time. Inefficient, and in stark contrast to products such as Informatica which allowed you to reuse lookup data cached in memory.

So what changed in 2008?

For 2008 Microsoft substantially improved lookup functionality. You now have two basic caching options for lookups. These boil down to in-memory caching and file caching. Both are potentially faster than lookups using a database.

In this post we'll deal with memory-caching.

As the name suggests this stores the lookup data in memory (ie RAM). This is pretty well covered by Jamie here. The key point is that the in-memory cache will not extend beyond the scope of the package, ie you can't re-use an in-memory cache in one package having created it in another. At this point it's worth clarifying how this is used in a package.

Step one:

After you've created a connection manager, the first step is to create a data source in a Data Flow Task - in this example it is an OLEDB Source Transformation. Note you can use any source for this - providing another advantage over using the 2005 look-up task on its own. I'm calling this OLEDB_Dim_Time for obvious reasons:

  
As you can see we're going to look up a time surrogate key - and by time I mean time of day, not a calendar or date dimension. I'll cover the usefulness of a 'real' time dimension in another post. As is best practice, we are selecting only the fields needed for the look-up and the value to be added to the data flow (ie the TimeKey).
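For illustration, the source query behind a transform like OLEDB_Dim_Time boils down to something like this (DimTime and TimeOfDay are placeholder names from my example - yours will differ):

-- Only the business key used for matching plus the surrogate key the
-- lookup will add to the data flow - no SELECT *.
SELECT TimeOfDay,  -- business key, eg '13:45:00'
       TimeKey     -- surrogate key returned by the lookup
FROM   dbo.DimTime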

Step two:

The next step is to terminate the data flow in the Cache Transform - which caches in memory.

The Cache Transform Editor looks as follows:



Here you need to create a cache connection manager by clicking the New... button. This displays the Cache Connection Manager Editor. Give the Cache Connection Manager a sensible name - in this case 'TimeLKP' - and click on the 'Columns' tab:



Here you need to specify which columns are 'indexed columns'. This isn't 'indexed' in the sense that there's a database index on the columns specified; instead this is the index that the lookup will use (ie the business key(s)). In other words, specify the order in which a lookup transform should use the lookup columns - lowest level (ie most granular) first. Here we have one column to look up on, so we make this position 1. Then, if you want, you can alter the column names in the cache as follows:





At the end of this you should have a Data Flow that looks something like this:


If you were creating multiple caches you could put them all within the same Data Flow - or split them up; it's up to you. However, you have to ensure that the cache is created before the lookup is used. To this end I put the lookup in a separate Data Flow.

Step three:

All that's left is to use the Cache Connection Manager in a Lookup Transform.
This is very straightforward. Create another Data Flow and, within that, a data source connection in the normal way. Then add a Lookup Transform to the Data Flow and open its Editor. Select the 'Cache connection manager' radio button.



Then choose your Cache connection: 


The rest of the process is exactly the same as a normal lookup:


Hopefully that's clarified how to go about creating an in-memory lookup cache. In my next post I'll demonstrate creating a file cache and share the results of my entirely unscientific performance comparison of memory-caching vs file-caching vs no caching.

Thursday, 30 September 2010

Next version SSIS

Details of the next version of SSIS have been leaking onto the internets. Some highlights: undo in the designer - which will be a big time-saver; data-flow sequence containers - allowing developers to group and hide transformations; and last but not least - rounded corners!

This is great, and no doubt other improvements will be announced over the next few weeks (presumably Visual Studio 2010 integration is a cast-iron certainty). However, other gripes on Connect remain open, or have already been closed as 'won't fix', such as case sensitivity on the Lookup transform (forcing additional 'uppercase' columns, or passing the whole lookup to the SQL database), and some of Jamie Thomson's debugging suggestions.

Nit-picking aside, it's interesting to note that less than six months after the release of R2, MS is starting to talk about CTPs of the next version.

Wednesday, 22 September 2010

Data dude build without a Visual Studio installation

The simple answer is that you can't.

I've just run into this issue myself. It strikes me that requiring an additional client license, and a full VS install, simply to build a project is way over-the-top. The TFS alternative is obviously better (it being a server product) - but potentially still requires additional licensing. Anyway, with increasing numbers of shops running continuous integration, I think it removes a significant amount of desirable functionality from the Visual Studio database project story.

If you want to try and get database projects built only using MSbuild vote here.

Sunday, 19 September 2010

Data dude erroring on deployment - Connect request

I've created a connect request for the problem described below. Fingers crossed it gets accepted - if you want it added go and vote now!

Saturday, 21 August 2010

Data dude erroring on deployment over-writing database file settings

At MITIE we've started using Visual Studio (VS) 2010 for managing database development. The nice thing about 2010 is that the functionality from Data Dude that used to be in a separate SKU and license is now included in the vanilla editions of VS Pro and Premium (Kimberly Tripp has a good post about it here). Premium has more functionality than Pro, obviously.

I won't cover the advantages of using VS 2010 to develop databases as that is covered in detail elsewhere. However there are some wrinkles that aren't quite so well documented.

I started to get this error whilst trying to deploy a database project from VS 2010 that I'd previously imported from SQL Server:

Error SQL01268: .Net SqlClient Data Provider: Msg 1834, Level 16, State 1, Line 1 The file [file name] cannot be overwritten. It is being used by database [database name].

There is some information on the problem here; however the thread is a bit long-winded and the final fix leaves something to be desired...

Essentially my problem boiled down to the fact that the database I had imported had different file settings - in this case the filegrowth settings - from the database I was deploying to, so the database project carried different filegrowth settings too. In this situation VS/data dude detects the difference and attempts to alter the database file, which obviously won't work if there are open connections to the database. It's worth checking what the target database actually has first (see the query below); after that, there are two solutions:
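This check isn't part of the fix itself - it's just a standard query against sys.database_files, run in the context of the target database, showing the current size and growth settings:

-- size and growth are reported in 8KB pages unless growth is a percentage
SELECT name,
       size * 8 / 1024 AS size_mb,
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS VARCHAR(10)) + '%'
            ELSE CAST(growth * 8 / 1024 AS VARCHAR(10)) + ' MB'
       END AS filegrowth
FROM   sys.database_files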

1. Close all connections to the database and then deploy (possibly even putting the database in single user mode). This is a bit inconvenient and assumes you're happy with the settings in the database project. In this case I wasn't.

2. Happily there is an alternative. You can change the filegrowth (and other) settings of the database project; however these aren't available through the VS 2010 IDE. Instead you have to edit the .sql files found here:


[Project location]\[Database project name]\Schema Objects\Storage\Files
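For reference, the scripts in that folder are just ALTER DATABASE file definitions; the one I edited looked roughly like the following (the file name, size and SQLCMD variables here are illustrative rather than copied from my project):

ALTER DATABASE [$(DatabaseName)]
    ADD FILE (NAME = [MyDatabase_Data],
              FILENAME = '$(DefaultDataPath)MyDatabase_Data.mdf',
              SIZE = 100MB,
              FILEGROWTH = 1MB)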


In my case I changed the filegrowth settings from:


FILEGROWTH=1MB




To:


FILEGROWTH=10%




After doing this my deployment problems were solved.

Clearly, appropriate filegrowth settings are a whole separate topic - which could be the subject of a whole post in itself. On this occasion 10% was better than 1MB.

Unfortunately there is a limited set of database settings available to change through VS 2010 using the database properties > schema settings dialogue - which doesn't include autogrowth, let alone sizing. Ideally it'd be nice to be able to amend those settings through VS itself. I think there's probably another Connect request in the works...

Wednesday, 7 July 2010

MERGE and Slowly Changing Dimensions

Blimey - has it really been over 2 months since my last post? A few things have happened since then - not least moving job from Clarity to MITIE. Anyway on with the subject of the post.

Using the MERGE statement is pretty well documented; however, what about using it to update a dimension that includes Type 1 and Type 2 changes? There are examples on the web, but they're either not SQL-specific or seem flawed in one way or another. To follow this example you'll need the AdventureWorksDW2008 database, available here. The R2 version of AdventureWorksDW might work, but as yet I haven't tested it. First, let's set up a staging table and a dimension table:



SELECT ProductKey,
       ProductAlternateKey AS ProductId,
       EnglishProductName,
       ISNULL(ListPrice, 0) AS ListPrice,
       HASHBYTES('SHA1', EnglishProductName) AS Type1CheckSum,
       HASHBYTES('SHA1', CAST(ISNULL(ListPrice, 0) AS VARCHAR(10))) AS Type2CheckSum
INTO   ProductStaging
FROM   AdventureWorksDW2008..DimProduct
WHERE  EndDate IS NULL
GO

CREATE TABLE [dbo].[DimProduct]
  (
     [ProductKey]         [INT] IDENTITY(1, 1) NOT NULL,
     ProductId            NVARCHAR(25) NOT NULL,
     [EnglishProductName] [NVARCHAR](50) NOT NULL,
     [ListPrice]          [MONEY] NOT NULL,
     DateUpdated          DATETIME NOT NULL DEFAULT GETDATE(),
     Type1Checksum        VARBINARY(8000) NULL DEFAULT 0,
     Type2Checksum        VARBINARY(8000) NULL DEFAULT 0,
     StartDate            DATETIME NOT NULL DEFAULT GETDATE(),
     EndDate              DATETIME NULL,
     CurrentFlag          BIT NOT NULL DEFAULT 1,
     CONSTRAINT [PK_DimProduct_ProductKey] PRIMARY KEY CLUSTERED ( [ProductKey] ASC )
  )



Note that we're using HASHBYTES on the staging table - this will be used to compare columns. The SHA1 algorithm ensures sensitivity to changes across those columns. If you're using non-character fields you'll need to cast them before hashing them.
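As an aside, if you need to hash several columns - including non-character ones - the pattern is to cast each one to a string and separate them with a delimiter, so that adjacent values can't run together and produce the same hash. A quick illustration (not part of the load above), combining the name and price into a single checksum:

SELECT HASHBYTES('SHA1',
         EnglishProductName + '|'
         + CAST(ISNULL(ListPrice, 0) AS VARCHAR(10))) AS CombinedCheckSum
FROM   AdventureWorksDW2008..DimProduct
WHERE  EndDate IS NULL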

Firstly let's add new records (at this point all of them):



MERGE dbo.DimProduct AS dim
USING ProductStaging AS stg
ON dim.ProductId = stg.ProductId

WHEN MATCHED AND dim.Type1Checksum <> stg.Type1Checksum THEN --type1 changes
  UPDATE SET dim.Type1Checksum = stg.Type1Checksum,
             dim.EnglishProductName = stg.EnglishProductName,
             dim.DateUpdated = GETDATE()

WHEN NOT MATCHED THEN --new records
  INSERT (ProductId, EnglishProductName, ListPrice, Type1Checksum, Type2Checksum)
  VALUES (stg.ProductId, stg.EnglishProductName, stg.ListPrice, stg.Type1Checksum, stg.Type2Checksum);





You'll note that this statement actually does the Type 1 updates as well - ie where the checksum/hash columns are different.

OK, let's add some changes to the ProductName - which will be handled as Type 1 overwrites, and some changes to ListPrice - which we'll handle as Type 2:



UPDATE ProductStaging
SET    EnglishProductName = 'Fred',
       Type1CheckSum = HASHBYTES('SHA1', 'Fred')
WHERE  ProductId = 'AR-5381';

UPDATE ProductStaging
SET    ListPrice = 9.99,
       Type2CheckSum = HASHBYTES('SHA1', CAST(ISNULL(9.99, 0) AS VARCHAR(10)))
WHERE  ProductId = 'BE-2349';
GO



So now we'll do the Type 1 update using the same MERGE statement as above. We also set the DateUpdated column - for auditing purposes:




MERGE dbo.DimProduct AS dim
USING ProductStaging AS stg
ON dim.ProductId = stg.ProductId

WHEN MATCHED AND dim.Type1Checksum <> stg.Type1Checksum THEN --type1 changes
  UPDATE SET dim.Type1Checksum = stg.Type1Checksum,
             dim.EnglishProductName = stg.EnglishProductName,
             dim.DateUpdated = GETDATE()

WHEN NOT MATCHED THEN --new records
  INSERT (ProductId, EnglishProductName, ListPrice, Type1Checksum, Type2Checksum)
  VALUES (stg.ProductId, stg.EnglishProductName, stg.ListPrice, stg.Type1Checksum, stg.Type2Checksum);



Here we're careful to ensure that Type 1 changes apply to all records including the historic ones. Some examples filter for current records only...

Finally, the Type 2 INSERTs. The trick here is to use the OUTPUT clause on the UPDATE to create a pseudo-table and then SELECT from that into the DimProduct dimension table to create the new 'current' record. Note that I'm also setting the CurrentFlag and EndDate on the old dimension record - so that it's clear this is a historic record. Again, all in one statement:



INSERT dbo.DimProduct
       (ProductId, EnglishProductName, ListPrice, DateUpdated,
        Type1Checksum, Type2Checksum, CurrentFlag)
SELECT ProductId, EnglishProductName, ListPrice, GETDATE(),
       Type1Checksum, Type2Checksum, 1
FROM   (
        MERGE DimProduct AS dim
        USING ProductStaging AS stg
        ON dim.ProductId = stg.ProductId
        WHEN MATCHED AND dim.Type2Checksum <> stg.Type2Checksum AND dim.CurrentFlag = 1 THEN
          UPDATE SET CurrentFlag = 0,
                     EndDate = GETDATE()
        OUTPUT $Action AS Operation, stg.ProductId, stg.EnglishProductName, stg.ListPrice,
               stg.Type1Checksum, stg.Type2Checksum
       ) AS MergeOut
WHERE  Operation = 'UPDATE';
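
If you want to sanity-check the results at this point (this query isn't part of the load itself), AR-5381 should show the Type 1 name overwrite on its single row, while BE-2349 should now have an end-dated historic row plus a new current row at 9.99:

SELECT ProductId, EnglishProductName, ListPrice, StartDate, EndDate, CurrentFlag
FROM   dbo.DimProduct
WHERE  ProductId IN ('AR-5381', 'BE-2349')
ORDER  BY ProductId, StartDate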





So that's it! Hope it's um...useful.

Wednesday, 21 April 2010

SQL Server 2008 R2 RTM

MS have issued a press release stating that SQL Server 2008 R2 has been Released To Manufacturing (are they still pressing discs? I guess so...). It also states that it will be available in early May from the likes of us and MS's own channels. Self-service BI gets top billing. I've certainly been impressed with the reporting elements of R2 - not only PowerPivot but also Reporting Services, so I'm looking forward to using it in anger with clients...

Thursday, 18 March 2010

SQL Bits VI Registration Open

Title says it all really. Go here to register and here for the agenda - lots of BI stuff needless to say...

Thursday, 18 February 2010

SQL Bits 6 and MS Tech Days announced

SQL Bits VI, The 6th Sets (see what they did there?) was announced a few days ago (although I only got the mail yesterday). It's on the 16th April in London. Handily, the day before - also in London - MS has announced a Tech Day devoted to SQL Server. Good news is that they are both free, so get in fast as these are likely to fill up quickly...

Update: registration is yet to open for SQLBits, but you can join their mailing list to be told when it does.

Wednesday, 10 February 2010

Repeating table headers in Reporting Services 2008

You may find it a bit of a challenge to get Reporting Services 2008 to repeat table headers across pages. Unfortunately the tablix properties 'Repeat Row Headers' and 'Repeat Column Headers' appear to do absolutely nothing, despite what MSDN says. In fact there is a Connect request related to it here. There is a bit of information on the web about this - but no definitive guide - and I still found myself scratching my head a bit, so here's my attempt at explaining it:

Step 1:

Select the tablix in question, click the arrow on the left of Column Groups at the bottom of the Design tab, and select 'Advanced Mode':



Step 2:

Under 'Row Groups' you should now see grey bars before and after each tablix grouping marked '(Static)'. These represent the column headers for each group. Click on one of these:



Step 3:

Go to the Properties window (there's no right-click menu on the static groups) and set the RepeatOnNewPage property to 'True':




Step 4:


Repeat step 3 for every static group that needs a repeating header. Bear in mind there are also static groups after the tablix groupings - don't set these to repeat.

Then you should be done! Shame this can't simply be done on the tablix property - let's hope MS fix this in a Service Pack at some point.

Wednesday, 20 January 2010

SQL Server 2008 R2 release dated

So the speculation was right. SQL Server 2008 R2 will be out in May this year. Check out the post from MS here. No surprises that PowerPivot and self-service BI are getting a lot of airtime.

Monday, 11 January 2010

Analysis Services processing failure - 'Class not registered'

Beware of migrating SQL Server 2008 Analysis Services databases from a server that had an upgrade from SQL 2005 to one that has a clean install of 2008 - with no previous 2005 install. What you'll find is that the connection string information in the upgraded database will still be using the 2005 SQL Native Client provider, SQLNCLI.1. This is fine on a box that has this provider, of course - ie one with a previous 2005 install. However a clean 2008 install won't have this version, and you'll get a not terribly informative 'Class not registered' message when you try to process the database - as I found out.

The solution is very straightforward - change the connection string in the database's data source to use SQLNCLI10.1 instead of SQLNCLI.1.
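In other words, in the database's data source the connection string changes from something like this (server and catalog names are placeholders):

Provider=SQLNCLI.1;Data Source=MyServer;Integrated Security=SSPI;Initial Catalog=MyDW

to:

Provider=SQLNCLI10.1;Data Source=MyServer;Integrated Security=SSPI;Initial Catalog=MyDW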

More information can be found here.

Visual Studio 2008 crash using Reporting Services 2008 charts

Frustrating day today - with a few technology niggles (see above). Anyway, this means I can post about it in the hope that it saves someone else a lot of trouble. The major problem I had was with Visual Studio 2008 crashing without warning or error. The first occasion led to me losing work. On both occasions it happened after adding a standard line chart to a Reporting Services report. After checking the Event Log I found the following entry:

.NET Runtime version 2.0.50727.3603 - Fatal Execution Engine Error (7225795E) (80131506)


Did some Googling around and discovered many people getting a similar (though not identical) error message. The answer is to download a hotfix here. So far this seems to have done the trick, with no crashes. Note that this isn't included in Visual Studio SP1 or SQL Server 2008 SP1 - so service packing won't help (it is in fact a .NET hotfix, from what I can tell). I would strongly suggest applying the hotfix for anyone planning to do graph/chart work in Reporting Services 2008...