Today I want to report on the Parallel Data Warehouse (PDW) session given by Thomas Kejser, a good, informative session - that only lacked a demo for obvious reasons (although I was told they are working on that). Any inaccuracies are my own however.
Firstly Thomas went through the limitations of the usual SMP (think traditional big-iron) architecture. At about 10TB designing the data warehouse database requires significant additional customisation incurring additional administration and development cost. On top of that the cost of moving above a 64-core SMP architecture starts becoming prohibitively expensive in terms of the increase in the number of additional cores for each pound spent (ie doubling the core-count costs more than twice as much). None of this is of course news for those who operate at these sort of scales.
So PDW is a true MPP solution operating across multiple cores in multiple nodes. PDW comes as a pre-built appliance - but using standard commodity hardware from Dell/EMC, IBM, HP or Bull. Nodes are nothing more than standard servers or disk arrays. The appliance comes pre-built, with all of the cabling, power, disk configuration and software installation already done. In theory it would be possible to mix racks from different hardware vendors. An appliance consists of one or more racks each with 64 cores - Thomas saying the largest being 5 racks at the moment.
An appliance has two control nodes, a management node, a backup node, and a series of compute and storage nodes. It also had one 'spare' node per rack - in case of a whole node failure. There is also a landing zone for ETL tools to deposit text files and a specialised data movement service to move data between nodes where necessary.
As in other MPP architectures designers have to ensure that data is evenly spread across nodes by choosing atomic, evenly distributed columns. Datetime columns are often good candidates. Tables are created with the following syntax:
CREATE TABLE AS SELECT [select list]
WITH
(
CLUSTER_ON [column name]
DISTRIBUTE_ON [column name] REPLICATE
PARTITION_ON [column name]
)
As the key word DISTRIBUTE ON suggests, it identifies the column(s) that distributes the data across the nodes. Its worth pointing out at this point that, like most other MPP products, PDW will only recognise a subset of T-SQL.
A control node receives queries from clients, breaks the query down and issues it to the compute nodes, based on the distribution of data. Only one control node can be active at one time. Compute nodes work on their slice of the query, aggregating it up and sending it back to the control node. Thus each compute node only has to work on a small part of the total query. As the amount of data grows, additional racks can be added to increase the number of nodes and maintain or increase the performance of the appliance.
The key to the appliance are the compute nodes roughly analogous to a Netezza SPU, although unlike those a compute node is simply a standard commodity server running SQL 2008. The version of 2008 on the compute node is claimed to be entirely standard - although the hardware is optimised it is only the sort of thing you can do yourself - each core having its own dedicated chunk of disk and tempdb separated out.
Thomas was asked about the fact that all queries have to pass through one control node, the suspicion being that it could form a bottleneck. Thomas countered this by saying that most queries would never touch the disk of the control node and would be streamed through in-memory. Aggregations carried out on the control node would be relatively trivial as the compute nodes would pre-aggregate the data first.
Thomas also suggested some ETL tips (that work on 'normal' SQL) which I'll post later-on. There were also intriguing possibilities raised around Analysis Services using PDW in ROLAP mode - leveraging the more powerful MDX query language whilst utilising the raw performance of PDW. MS have already posited PDW as the centre of a hub-and-spoke model:
So thats an overview of Madison/PDW - hopefully its of interest. We were told 8-9 companies have it in production and its scheduled for a first half 2010 release. After the summer we should start hearing more about customer experiences and the future roadmap. Pricing is anticipated to be cheaper than Netezza on a per-TB basis, and its likely that MS will discount the list price.