Sunday, 2 September 2012

Using OR for multiple constraints on an SSIS task

After helping out a colleague a while ago with this issue, I thought I'd post a short explanation of the gotcha that occurs when you route two or more precedence constraints (ie the 'arrows' that join tasks) into a downstream task. The behaviour of the flow may not be what you expect. Hopefully a quick example will make this clear.

Step 1:

Create a package in your favourite version of SSIS (I'm using 2012 - but any will do). Create a Connection Manager to SQL Server (it doesn't matter which database), and add two Execute SQL Tasks to the Control Flow and connect them to a Script Task as follows:
 

Step 2:

On the first Execute SQL Task create a SQL statement that you know will succeed, such as SELECT 1.

Step 3:

On the second Execute SQL Task create a SQL statement that you know will fail, such as SELECT * FROM NoTable.

The Script Task can be left as is - we're just using it to test the control flow logic.

Before you run the package - what do you think will happen? Will the script task execute?

Step 4:

Run the package. You should see the following:


If you thought that the Script Task wouldn't run - award yourself a pat on the back. If you thought it would - read on!

Step 5:

The default behaviour for precedence constraints is that all constraints must evaluate to True before allowing 'downstream' tasks to execute.

Therefore, in order to ensure that the downstream Script Task executes, double-click on one of the constraints (it doesn't matter which one) and in the editor select the radio button that says 'Logical OR. One constraint must evaluate to True'.
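To make the AND/OR distinction concrete, here's a minimal sketch in Python. This is my own illustration of the evaluation logic, not how SSIS works internally:

```python
# Simplified model of how precedence constraints gate a downstream task.
# Each upstream constraint evaluates to True (satisfied) or False (not).

def downstream_runs(constraint_results, logical_or=False):
    """Return True if the downstream task would be allowed to execute."""
    if logical_or:
        # 'Logical OR. One constraint must evaluate to True'
        return any(constraint_results)
    # Default behaviour (Logical AND): all constraints must be True
    return all(constraint_results)

# Our example package: one SQL task succeeds, one fails
results = [True, False]
and_result = downstream_runs(results)                   # default AND: blocked
or_result = downstream_runs(results, logical_or=True)   # OR: allowed to run
```

With the default AND behaviour the mixed results block the Script Task; switching to OR lets it through.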



















Click OK and you'll see the precedence constraints change to being dashed;
















Step 6:

Now only one of the tasks needs to succeed for the Script Task to execute. Run the package and you'll note that the Script Task has indeed run;
 




Tuesday, 5 June 2012

FireInformation to debug SSIS script

This is a quick blog that I've been meaning to post for a while. When writing script in an SSIS script task or transform, debugging can be a bit tricky.
Strategies for debugging scripts are briefly outlined here. I know a lot of beginners (and those few hold-outs still migrating over from DTS) use message boxes to debug scripts. There are a number of disadvantages to message boxes: you can't copy the contents of the message (to validate a file-path for example), the message isn't recorded anywhere to allow you to follow the script logic, and you have to remember to remove the message box statement before deploying to a server (otherwise the package will hang waiting for a response that will never come!). Therefore I prefer using the FireInformation method of the IDTSComponentEvents interface. At its simplest it looks like this;
public void Main()
{
    string s1 = "Hello world";
    bool f = false;

    // Raises an Information event carrying s1; it appears in the Output
    // window when debugging, and in any attached log providers
    Dts.Events.FireInformation(0, "", s1, "", 0, ref f);

    Dts.TaskResult = (int)ScriptResults.Success;
}
This results in the following in the 'Output' window;

Note that the 'fireAgain' parameter ('f' in the script) is passed by reference and needs the 'ref' keyword. All this means is that the variable can be altered by the FireInformation method itself; I always set it to false. Obviously more useful things can be logged, but the principle is the same. This is particularly useful in a Data Flow (ie as a Script Transformation) where breakpoints and so on rapidly become impractical. One word of warning though: ensure that 'OnInformation' events aren't logged as part of the SSIS logging, otherwise all of these debugging messages will be written out to your logging provider, slowing things down. Alternatively simply comment the calls out before deploying the package.

Saturday, 5 May 2012

Why you need UseSandboxSettings

At MITIE we use Visual Studio Database projects to automate our database builds. This usually works perfectly; however, we recently discovered that the build was deleting all of the data from one of our development databases. We had backups so we didn't lose any data, but it was inconvenient to say the least!

Suspicion first fell on the 'Always recreate database' option in the Database.sqldeployment file; however, this didn't appear to be set, which created something of a puzzle. After a bit of poking around we realised that the problem stemmed from the use of 'My isolated development environment' in the project properties;

This setting is intended to enable developers to build and deploy the database project to their own machines using their own settings, as explained here. However, the .dbproj.user file that stores individual developers' settings was being kept in source control, so the automated build was using these settings for the shared development environment. The solution was two-fold. Firstly, we removed the .dbproj.user file from source control; secondly, we added the "UseSandboxSettings=False" option to our msbuild script, examples of which can be found here. This msbuild option ensures that user settings are ignored by the build process, whether they are included or not. I'd recommend doing this for all builds on shared development environments.
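For reference, the resulting build call ends up looking something like this (the project name and output here are illustrative, not our actual build script):

```shell
# Deploy the database project, ignoring any per-developer .dbproj.user settings
msbuild OurDatabase.dbproj /t:Deploy /p:UseSandboxSettings=False
```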

Tuesday, 3 April 2012

SQL Bits X - the 'take-aways'

I've just about recovered from attending SQL Bits X in London and I thought I'd share the points that I took away from Thursday and Friday.

Overall the conference was well-organised and probably twice the size of the last one I attended in Brighton. On Thursday alone there were about 600 attendees, and to quote Chris Webb there were 'loads more' on Friday. It's still organised by volunteers and does not make a profit, yet is comfortably a match for the other more established conferences (such as Connections, TechEd or MS's own BI Conferences). So many thanks to all those who made it happen.

One other note - what follows is my own personal take on each session, as much for my own benefit as anyone else's and any mistakes or omissions in relaying them here are entirely my own.

SSAS Deep-dive
Akshai Mirchandani
On Thursday I went to Akshai Mirchandani's SSAS deep-dive session, which, slightly confusingly, was merged with Cathy Dumas' 'Putting tabular models into production' for the first hour. This actually worked quite well as they went through the pros and cons of SQL 2012's Analysis Services Tabular vs 'old' Analysis Services cubes (now re-branded as 'Multi-dimensional'). The major take-away for me here was the fact that MS has committed to adding Multi-dimensional (ie cube) support to Power View (aka 'Project Crescent') 'before the next major release'. I'm not sure exactly what that means: whether we have to wait for a service pack, an 'R2' release, or even a Reporting Services-type release, which if you recall originally came out in 2003 as a separate install (before SQL 2005). In any case this could still be a couple of years away.

Additionally MS is committed to supporting both Tabular and Multi-dimensional for the foreseeable future. However, whilst Multi-dimensional will get incremental development, Tabular will get the most investment. As an aside, my colleague spent the rest of his day in the Tabular session and identified some serious limitations in the current version around KPIs, which will limit the use to which we can put it. Anyway, the controversy over the BI Semantic Model and Multi-dimensional versus Tabular has been covered at some length elsewhere, so I'll leave that particular hot potato.

The rest of Akshai's session was 'deep' as advertised (it was based on the three-day Maestro course) and way out of the scope of this blog post; however, the following points stood out for me:

  • Full processing of cubes blocks queries - no surprise - but it is also blocked by queries. This could lead to a situation where processing can't complete because of long-running queries, while other queries that started after processing began wait for the cube to finish processing (ie release its 'pending' lock). To mitigate this, change the 'ForceCommitTimeout' property, which sets the amount of time processing will wait for a query to finish before cancelling it.
  • Always consider keying string attributes with an integer, since there is a 12-byte overhead for even the smallest string. A smaller key means better querying and processing performance.
  • Similarly use an integer value to key degenerate dimensions. It may even be worth experimenting with a separate database table for degenerate dimensions. Also try indexing the degenerate column in the database.
  • It's usually worth setting the partition-slice (ie the 'Slice' property of a partition) explicitly, for various reasons described here and here. Further, some customers use region or time-zone slices in addition to date, in order to support rolling partition-processing by time-zone.
  • Finally, it's worth remembering that calculated measures should evaluate to NULL rather than 0 where possible. This improves performance by taking advantage of block computation algorithms.

Introduction to Microsoft Codename "Data Explorer"
Miguel Llopis
The first session I attended on Friday was about 'Data Explorer'. This is a beta product from Azure Labs that is essentially a self-service ETL tool. Despite being a beta product it is fully-featured, with a long list of potential data sources such as SQL databases, text files, Excel, OData and even tables in web pages - the example they used was a Wikipedia page. There is also a wide variety of transformations that can be applied, as well as look-ups and merges of separate data sets. The immediate use-case I thought of was as a way of profiling data sources prior to bringing them into a data warehouse via more traditional ETL tools. I will definitely be checking it out when I get the chance.

Building a BI Portal for Tabular Models
Cathy Dumas
Despite the title this was more of a high-level look at the variety of SQL Server tools now available for reporting, rather than a practical lesson in creating portals for Tabular. This included the usual suspects such as SSRS, Excel Services, PerformancePoint and Power View. For me, the key points here were;
  • Currently Reporting Services, of all things, has the best support for mobile - as long as you're using iOS (ie iPhone, iPad). The PASS presentation featuring Power View on mobile was, in Cathy's words, a 'fake'. Apparently it was merely indicative of functionality that will be made available in the future.
  • Reporting Services can also be used to report on Tabular models with DAX, by pretending it's a DMX query. It's unsupported, but works, and reminds me of similar hacks needed to report on cubes in the first release of SSRS.
  • SharePoint is the future for BI reporting; for example, the next version of PerformancePoint is likely to be much more integrated into SharePoint. If you consider yourself a BI professional, and you want to exclusively use MS products, it's no longer enough just to learn SQL - you need to understand SharePoint too. This also means paying attention to SharePoint Cumulative Updates, and to the next version of SharePoint coming with the monster Office 15 release.
  • Cathy concluded by suggesting that the best solution for dashboards at the moment is a mix of Excel and Reporting Services. She called out the latter's strength in geospatial, for which Excel has no support. Conversely, Excel has the advantage of allowing end-users to alter reports rather than constantly needing a BI pro's help.

Load Testing Analysis Services
Bob Duffy
The penultimate session I attended was devoted to performance testing Analysis Services. The source code can be found here.
  • The session principally used Visual Studio Test Edition, which was slightly disappointing for me as I don't think MITIE's MSDN subscription actually covers it!
  • Despite the reliance on VS Test Edition, there were some good tips. One of which was in the analysis of trace files: Bob suggested using checksums to identify similar queries and taking the average response of these, rather than focusing on the slowest queries that might represent 'freak' results.
  • What came through quite strongly in the case study Bob quoted was that caution needs to be exercised when using more CPU cores to process and query Analysis Services cubes. This is because, before SQL Server 2012, SSAS didn't work at all well with NUMA; it can be so bad that adding cores actually worsens performance. In the case study the ultimate solution was to run 4 virtual machines on each NUMA node, each hosting an SSAS instance.
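As an aside, the checksum tip can be sketched like this in Python. This is my own illustration of the idea, not Bob's actual tooling, and the sample queries and timings are made up:

```python
import hashlib
import re
from collections import defaultdict

def query_checksum(query):
    """Normalise whitespace and case, then hash, so similar queries collide."""
    normalised = re.sub(r"\s+", " ", query.strip().lower())
    return hashlib.md5(normalised.encode()).hexdigest()

def average_durations(trace_rows):
    """trace_rows: iterable of (query_text, duration_ms) pairs.
    Returns a dict of checksum -> average duration for that query group."""
    groups = defaultdict(list)
    for query, duration in trace_rows:
        groups[query_checksum(query)].append(duration)
    return {k: sum(v) / len(v) for k, v in groups.items()}

trace = [
    ("SELECT [Measures].[Sales] ON 0 FROM [Cube]", 120),
    ("select [measures].[sales] on 0  from [cube]", 80),   # same query, noisy timing
    ("SELECT [Measures].[Cost] ON 0 FROM [Cube]", 3000),   # a one-off 'freak' result
]
averages = average_durations(trace)
```

Averaging within each group smooths out the noisy timings, so one freak slow run doesn't dominate the analysis.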

Unit Testing SSIS Packages
John Welch
John presented his unit-testing framework/software for Integration Services. I'd actually already downloaded this from CodePlex a while ago.
  • It's still the only real option for unit-testing SSIS, and John promised that the next release would include the ability to test individual data-flow transformations, which is sorely needed.
  • One good tip was the use of database snapshots to roll-back changes made by unit tests.

..and finally SQLRelay!

At the end of the Friday we were all reminded that all-day SQL-related community events continue through April and May; details here. I thought I'd give a special mention to the Bristol event since it's in my back yard. Hope to see you there!

Wednesday, 7 March 2012

SQL Server 2012 RTM!

SQL Server 2012 is released to everyone today. Get the evaluation here (saves a couple of clicks).

There's also a virtual launch event later on at about 4:00pm GMT. Register for that here.

Finally the official, European, real-world 'technical' launch event is SQLBits X on 29th-31st March. Register for that here.

Monday, 28 November 2011

Reporting Services Win/Lose Sparkline - Setting the Axis Line

Win/lose sparklines are a useful way of displaying performance over time, providing you can boil performance down to a binary 'win or lose' result.


Sean Boon has a good example of doing this here but I struggled to get the Axis line to display in the right place. I thought I'd share what worked for me.

What I found was that, after setting the axis to visible as per Sean's blog post, the axis is initially drawn at the bottom, like so:




This just looks daft and gives an incorrect, or at least confusing, impression of where '0' is.

The answer to this is buried well within the SSRS interface.

1. First select the sparkline chart, then press 'F4' to see the Properties grid.

2. Find the ChartAreas property and click the ellipsis button to the right:


3. Within the 'ChartArea Collection Editor' select the 'CategoryAxes' property and click the ellipsis button to the right of that:



4. Within the 'ChartAxis Collection Editor' make sure that the 'CrossAt' property is set to '0':


5. Click OK to get back to the 'ChartArea Collection Editor', then click the ellipsis button on the right of 'ValueAxes' and repeat step 4.

6. You should now find that the axis for your win/lose sparkline goes through the middle of your chart, where you'd expect. You'll probably also want to remove the tick marks from the axis by ticking the 'Hide major tick marks' box in the horizontal axis properties.

Monday, 8 August 2011

Splitting dates over monthly periods using a 'numbers table'

One of the things you may find yourself having to do is to take a single record with a start and end date and subdivide it into monthly or weekly periods. Typically this is so that you can create a record for each month or week in a data warehouse, and have a data structure that's easy to report from. There are many ways of doing this of course, but an elegant set-based solution involves using a 'numbers table'.

Numbers tables are well documented elsewhere; however, they can now be created in a trivial fashion in Denali (the codename for SQL Server 2012) using a Sequence, with the following script.


USE AdventureWorks2008R2
SET NOCOUNT ON

IF EXISTS (SELECT 1
           FROM   INFORMATION_SCHEMA.TABLES
           WHERE  TABLE_NAME = 'Numbers')
  DROP TABLE Numbers

GO

CREATE TABLE Numbers
  (
     Number INT NOT NULL,
     CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number) WITH FILLFACTOR = 100
  )

IF EXISTS (SELECT 1
           FROM   sys.objects
           WHERE  name = 'SeqNumbers'
           AND    type_desc = 'SEQUENCE_OBJECT')
  DROP SEQUENCE dbo.SeqNumbers

GO

CREATE SEQUENCE SeqNumbers
AS INT
START WITH 0
MINVALUE 0
GO

-- GO 10000 runs the batch 10,000 times, generating the numbers 0 to 9999
INSERT Numbers(Number) VALUES(NEXT VALUE FOR SeqNumbers)
GO 10000

This will give you a table of numbers from 0 to 9999.

Then a simple query will split a table into monthly records - or any other time period - by joining to the numbers table using the difference between the start and end dates of the rows you want to split. An example will probably make it clearer; we'll use the Production.BillOfMaterials table in the AdventureWorks2008R2 database.

First, let's make the dates a little less uniform:

UPDATE  [Production].[BillOfMaterials]
SET    EndDate = '20040405'
WHERE  BillOfMaterialsID=271

UPDATE  [Production].[BillOfMaterials]
SET    StartDate='20040731', EndDate = '20040801'
WHERE  BillOfMaterialsID=1950


UPDATE  [Production].[BillOfMaterials]
SET    StartDate='20041031', EndDate = '20051130'
WHERE  BillOfMaterialsID=2899

Querying Production.BillOfMaterials for the ID, the start and end dates, and the quantity looks something like this:


Taking the difference between the start and end dates and joining to the numbers table gives one row per month that exists between the start and end dates. A little jiggery-pokery with the date functions gives you the start and end of each month over which the bill of materials spanned, like so;

SELECT [BillOfMaterialsID],
       [StartDate],
       [EndDate],
     DATEDIFF(m, StartDate, EndDate) NumberOfMonths,
     CAST(CONVERT(CHAR(6), DATEADD(m, Number, StartDate), 112)+'01' AS DATETIME) MonthStartDate,
     DATEADD(dd, -1, (CAST(CONVERT(CHAR(6), DATEADD(m, Number+1, StartDate), 112)+'01' AS DATETIME))) MonthEndDate,
     CAST(PerAssemblyQty/(DATEDIFF(m, StartDate,  EndDate)+1) AS DECIMAL(5,2))PerAssemblyQtyMonthly,
     PerAssemblyQty
  FROM [Production].[BillOfMaterials] b
  INNER JOIN dbo.Numbers n ON DATEDIFF(m, StartDate, EndDate)>=n.Number
  WHERE [EndDate] IS NOT NULL
You'll note that I also divided the PerAssemblyQty by the number of months (adding one because DATEDIFF on two dates in the same month returns 0). Finally, let's see the result set:
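For readers outside SQL Server, the splitting arithmetic can be sketched in any language. Here's a minimal Python re-implementation of the query's logic (the sample dates and quantity are hypothetical, the rounding mimics the DECIMAL(5,2) cast, and the month-end column is omitted for brevity):

```python
from datetime import date

def month_start(d, offset):
    """First day of the month 'offset' months after d.
    Mirrors CONVERT(CHAR(6), DATEADD(m, Number, StartDate), 112) + '01'."""
    m = d.month - 1 + offset
    return date(d.year + m // 12, m % 12 + 1, 1)

def split_by_month(start, end, qty):
    """One row per calendar month between start and end, qty spread evenly."""
    # Same as DATEDIFF(m, StartDate, EndDate): month boundaries crossed
    months = (end.year - start.year) * 12 + (end.month - start.month)
    monthly_qty = round(qty / (months + 1), 2)
    # Equivalent of joining to Numbers WHERE Number <= months
    return [(month_start(start, n), monthly_qty) for n in range(months + 1)]

# Spans July-October 2004, so 3 month boundaries -> 4 monthly rows
rows = split_by_month(date(2004, 7, 31), date(2004, 10, 1), 9.0)
```

Note that, just as in the SQL, a record starting on the last day of one month and ending on the first day of the next still yields a row for each calendar month it touches.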