Tuesday 3 April 2012

SQL Bits X - the 'take-aways'

I've just about recovered from attending SQL Bits X in London and I thought I'd share the points that I took away from Thursday and Friday.

Overall the conference was well-organised and probably twice the size of the last one I attended in Brighton. On Thursday alone they had about 600, and to quote Chris Webb there were 'loads more' on Friday. Its still organised by volunteers and does not make a profit, and is comfortably a match for the other more established conferences (such as Connections, TechEd or MS's own BI Conferences). So many thanks to all those who made it happen.

One other note - what follows is my own personal take on each session, as much for my own benefit as anyone else's and any mistakes or omissions in relaying them here are entirely my own.

SSAS Deep-dive
Akshai Mirchandani
On Thursday I went to Akshai Mirchandani's SSAS deep-dive session, which slightly confusingly, was merged with Cathy Dumas' 'Putting tabular models into production' for the first hour. This actually worked quite well as they went through the pro's and cons of SQL 2012's Analysis Services Tabular vs 'old' Analysis Services cubes (now re-branded as 'Multi-dimensional'). The major take-away for me here, was the fact that MS has committed to add Multi-dimensional (ie cube) support to Power View (aka 'Project Crescent') 'before the next major release'. Not sure exactly what that means - whether we have to wait for a service pack, an 'R2' release or even a Reporting Services-type release. Which if you recall, originally came out in 2003 as a seperate install (before SQL 2005). In any case this could still be a couple of years away.

Additionally MS is committed to supporting both Tabular and Multi-dimensional into the foreseeable future. However, whilst Multi-dimensional will get incremental development, Tabular will get the most investment. As an aside my colleague spent the rest of his day in the Tabular session and identified some serious limitations in the current version around KPI's, which will limit the use to which we can put it. Anyway the controversy over the BI Semantic Model and Multi-dimensional versus Tabular has been covered at some length elsewhere and so I'll leave that particular hot potato.

The rest of Akshai's session was 'deep' as advertised (it was based on the three-day meastro course) and way out of the scope of this blog post, however the following points stood out for me;

  • Full processing of cubes blocks queries - no surprise - but it is also blocked by queries. This could lead to a situation where processing can't complete because of long-running queries. In turn other queries that started after processing began would be waiting for the cube to complete processing (ie release its 'pending' lock). In order to mitigate this change the 'ForceCommitTimeout' property. This sets the amount of time the processing will wait for a query to finish before cancelling it.
  • Always consider keying string attributes by using a integer, since there is a 12-byte overhead for even the smallest string. A smaller key means better querying and processing performance.
  • Similarly use an integer value to key degenerate dimensions. It may even be worth experimenting with a separate database table for degenerate dimensions. Also try indexing the degenerate column in the database.
  • Its usually worth explicitly setting the partition-slice (ie the 'Slice' property in a partition) explicitly. This is for various reasons, described here and here. Further, some customers use regions or time-zone slicers in addition to date in order to support rolling partition-processing by time-zone.
  • Finally its worth remembering that when constructing a calculated measure its worth evaluating to NULL rather then 0. This improves performance by taking advantage of block computation algorithms

Introduction to Microsoft Codename "Data Explorer"
Miguel Llopis
The first session I attended on Friday was about 'Data Explorer'. This is a beta product from Azure Labs that is essentially a self-service ETL tool. Despite being a beta product it is fully-featured with a long list of potential data sources such as SQL databases, text files, Excel, OData and even tables in web pages - the example they used was a Wikipedia page. There is also a wide variety of transformations that can be applied as well as look-ups and merges of seperate data sets. The immediate use-case I thought of was as a way of profiling data sources prior to bringing into a data warehouse via more traditional ETL tools. I will definitely be checking it out when I get the chance.

Building a BI Portal for Tabular Models
Cathy Dumas
Despite the title this was more of a high-level look at the variety of SQL Server tools now available for reporting, rather than a practical lesson in creating portals for Tabular. This included the usual suspects such as SSRS, Excel Services, PerformancePoint and Power View. For me, the key points here were;
  • Currently Reporting Services, of all things, has the best support for mobile - as long as you're using iOS (ie iPhone, iPad). The PASS presentation featuring Power View on mobile was in Cathy's words a 'fake'. Apparently it was merely indicative of functionality that will be made available in the future.
  • Reporting Services can also be used to report on Tabular models by using DAX by pretending its a DMX query. Its unsupported, but works, and reminds me of similar hacks needed to report on cubes in the first release of SSRS.
  • SharePoint is the future for BI reporting, for example the next version of PerformancePoint is likely to be much more integrated into SharePoint. If you consider yourself a BI professional, and you want to exclusively use MS products, its no longer enough just to learn SQL, you need to understand SharePoint too. This also means paying attention to SharePoint Cumulative Updates and the next version of SharePoint coming with the monster Office 15 release.
  • Cathy concluded by suggesting that the best solution for dashboards at the moment was a mix of Excel and Reporting Services. She called out the latter's strength in geospatial for which Excel has no support. Conversly Excel has the advantage of allowing end-users to alter reports rather than costantly needing BI pro's help.

Load Testing Analysis Services
Bob Duffy
The penultimate session I attended was devoted to performance testing Analysis Services. The source code can be found here.
  • The session principally used Visual Studio Test Edition, which was slightly disappointing for me as I don't think MITIE's MSDN subscription actually covers it!
  • Despite the reliance on VS Test Edition, there were some good tips. One if which was in the analysis of trace files. Bob suggested using checksum to identify similar queries and take the average response of these rather than focusing on the slowest queries that might represent 'freak' results.
  • What came through quite strongly in the case study Bob quoted was that caution needs to be exercised when using more CPU cores to process and query Analysis Services cubes. This is because before SQL Server 2012 SSAS didn't work at all well with NUMA. This can be so bad that performance can actually worsen. In the case study the ultimate solution for this was to run 4 virtual machines on each NUMA node, each hosting an SSAS instance.

Unit Testing SSIS Packages
John Welch
John presented his unit-testing framework/software for Integration Services. I'd actually already downloaded this from CodePlex a while ago.
  • Its still the only real option for unit-testing SSIS, and John promised that the next release would incude the ability to test individual data-flow transformations which is sorely needed.
  • One good tip was the use of database snapshots to roll-back changes made by unit tests.

..and finally SQLRelay!

At the end of the Friday we were all reminded that all-day SQL-related community events continue through April and May; details here. I'd thought I'd give a special mention to the Bristol event since its in my back yard. Hope to see you there!