Thursday 3 December 2009

Server properties using err.. SERVERPROPERTY

Useful bit of SQL if you're connecting to a SQL Server for the first time:



SELECT SERVERPROPERTY('Edition') AS Edition
,SERVERPROPERTY('ProductVersion') AS ProductVersion
,CASE WHEN SERVERPROPERTY('IsIntegratedSecurityOnly')=0
THEN 'Mixed Security' ELSE 'Integrated Security Only' END AS SecurityMode
,SERVERPROPERTY('LicenseType') AS LicenseType
,SERVERPROPERTY('ProductLevel') AS SPLevel
,SERVERPROPERTY('ServerName') AS ServerName
,SERVERPROPERTY('Collation') AS Collation
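
One thing to note: SERVERPROPERTY returns sql_variant, so if you want to build the values into a string you need to cast them first - a minimal sketch:

SELECT 'Running ' + CAST(SERVERPROPERTY('Edition') AS varchar(128))
+ ' at patch level ' + CAST(SERVERPROPERTY('ProductLevel') AS varchar(128)) AS Summary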

Wednesday 2 December 2009

SQL Server R2 release dated - or is it?

The Reg is quoting the SQL Server Performance blog, stating that R2 will be released on May 6, 2010. However, MS is quoted as saying that this will be delayed if 'more work is needed'. Given that the date only emerged as a result of posting some TPC benchmarks, and the backtracking that followed, one wonders how well planned this was...

Sunday 22 November 2009

SQL Bits - R2 day - Parallel Data Warehouse

Went to my first SQL Bits last Friday (given that it was practically in my backyard, how could I not?) and I have to say that for a not-for-profit, community-organised event it was easily as good as those organised by DevExpress, Microsoft et al. Leaving aside the opportunity to catch up with ex-colleagues and do a bit of networking, there were some excellent sessions. Given that this was 'R2 day', these were almost entirely devoted to the next release of SQL Server. Hence I attended sessions on Gemini (now PowerPivot), Madison (now Parallel Data Warehouse) and R2 Reporting Services.

Today I want to report on the Parallel Data Warehouse (PDW) session given by Thomas Kejser - a good, informative session that only lacked a demo, for obvious reasons (although I was told they are working on that). Any inaccuracies are my own, however.

Firstly Thomas went through the limitations of the usual SMP (think traditional big-iron) architecture. At around 10TB, designing the data warehouse database requires significant additional customisation, incurring extra administration and development cost. On top of that, moving above a 64-core SMP architecture starts to become prohibitively expensive in terms of the cost of each additional core (ie doubling the core count costs more than twice as much). None of this is news, of course, for those who operate at this sort of scale.

So PDW is a true MPP solution operating across multiple cores in multiple nodes. PDW comes as a pre-built appliance - but one using standard commodity hardware from Dell/EMC, IBM, HP or Bull. Nodes are nothing more than standard servers or disk arrays. The appliance comes pre-built, with all of the cabling, power, disk configuration and software installation already done. In theory it would be possible to mix racks from different hardware vendors. An appliance consists of one or more racks, each with 64 cores - Thomas said the largest installation at the moment is five racks.

An appliance has two control nodes, a management node, a backup node, and a series of compute and storage nodes. It also has one 'spare' node per rack, in case of a whole-node failure. There is also a landing zone for ETL tools to deposit text files, and a specialised data movement service to move data between nodes where necessary.

As in other MPP architectures, designers have to ensure that data is spread evenly across nodes by choosing atomic, evenly distributed columns as distribution keys. Datetime columns are often good candidates. Tables are created with syntax along the following lines:


CREATE TABLE [table name] AS SELECT [select list]
WITH
(
CLUSTER_ON [column name],
DISTRIBUTE_ON [column name] | REPLICATE,
PARTITION_ON [column name]
)
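
For illustration, a fact table distributed on its date key might look something like this under the sketch syntax above (the table and column names are made up, and the final R2 syntax may well differ):

CREATE TABLE FactSales AS SELECT * FROM SalesStaging
WITH
(
CLUSTER_ON OrderDateKey,
DISTRIBUTE_ON OrderDateKey,
PARTITION_ON OrderDateKey
)

Small dimension tables, by contrast, would typically use REPLICATE, so that a full copy sits on every node and joins to the fact table never have to cross nodes.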


As the keyword DISTRIBUTE_ON suggests, it identifies the column(s) used to distribute the data across the nodes. It's worth pointing out at this point that, like most other MPP products, PDW only recognises a subset of T-SQL.

A control node receives queries from clients, breaks each query down and issues it to the compute nodes based on the distribution of the data. Only one control node can be active at any one time. Compute nodes work on their slice of the query, aggregating the results and sending them back to the control node. Thus each compute node only has to work on a small part of the total query. As the amount of data grows, additional racks can be added to increase the number of nodes and maintain or increase the performance of the appliance.

The key to the appliance is the compute nodes, roughly analogous to Netezza SPUs - although, unlike those, a compute node is simply a standard commodity server running SQL Server 2008. The version of 2008 on the compute nodes is claimed to be entirely standard, and although the hardware is optimised it is only the sort of thing you could do yourself: each core has its own dedicated chunk of disk, with tempdb separated out.

Thomas was asked about the fact that all queries have to pass through one control node, the suspicion being that it could form a bottleneck. Thomas countered this by saying that most queries would never touch the disk of the control node and would be streamed through in-memory. Aggregations carried out on the control node would be relatively trivial, because the compute nodes pre-aggregate the data first.
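
To make that division of labour concrete, here is a hand-written sketch of how a simple aggregate could be decomposed - my own illustration of the principle, not PDW's actual internal plan:

-- The user's query:
-- SELECT Region, SUM(SalesAmount) FROM FactSales GROUP BY Region

-- Step 1: each compute node runs this against its local slice of FactSales,
-- reducing millions of rows to one row per region
SELECT Region, SUM(SalesAmount) AS PartialSum
FROM FactSales
GROUP BY Region

-- Step 2: the control node only has to combine the handful of
-- pre-aggregated rows streamed back from the compute nodes
SELECT Region, SUM(PartialSum) AS TotalSales
FROM PartialResults
GROUP BY Region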

Thomas also suggested some ETL tips (which work on 'normal' SQL Server too) which I'll post later on. There were also intriguing possibilities raised around Analysis Services using PDW in ROLAP mode - leveraging the more powerful MDX query language whilst utilising the raw performance of PDW. MS have already posited PDW as the centre of a hub-and-spoke model:

[Diagram: hub-and-spoke architecture with PDW at the centre]

So that's an overview of Madison/PDW - hopefully it's of interest. We were told 8-9 companies have it in production, and it's scheduled for a first-half 2010 release. After the summer we should start hearing more about customer experiences and the future roadmap. Pricing is anticipated to be cheaper than Netezza on a per-TB basis, and it's likely that MS will discount the list price.

Tuesday 10 November 2009

November SQL Server 2008 R2 CTP out

Thanks to Jamie for posting about this yesterday. The November SQL Server 2008 R2 CTP is out now for TechNet and MSDN subscribers, and for everybody else tomorrow. Get the bits here.

Wednesday 4 November 2009

More SQL 2008 R2 news

Gemini (posted about previously) has been renamed 'PowerPivot' - or more precisely 'PowerPivot for Excel 2010' - continuing MS's penchant for long-winded product names. I've had a play with CTP2 and it pretty much works as advertised, giving the ability to slice and dice large, disconnected datasets with ease on modest hardware, with massive levels of data compression. There are also options to combine it with non-data-warehouse data, such as industry trends analysis or high-level reports retrieved from outside organisations in Excel-ready formats. It also includes some useful GUI tricks pinched from QlikView, such as greying out slicers which have no data. Unfortunately CTP2 didn't include a connector for Analysis Services, so I await that with bated breath in the November CTP.

Also in the news was the renaming of Madison (also mentioned previously) to - wait for it - 'SQL Server 2008 R2 Parallel Data Warehouse', confirmed as coming out with R2 in the first half of next year. This is great news, as it answers the frequent criticism that 'SQL doesn't scale'. Not only that, but unlike many other warehouse appliance providers (such as Netezza and Teradata) you have the choice of commodity hardware from five different vendors, and unlike Greenplum (an open-source, hardware-independent appliance vendor) MS is hardly a tiny company that few have heard of. More information on the sort of architecture MS propose for their Parallel Data Warehouse can be found here.

Thursday 3 September 2009

Strange Reporting Services 2008 error

Fixed a rather odd SSRS error today. A report with multiple tablix controls wouldn't render as a PDF. Looking in the ReportServer error log I saw the following:


System.FormatException: Too many bytes in what should have been a 7 bit encoded Int32.


The issue was related to a divide-by-zero 'error' in one row of the result set from an Analysis Services cube. Analysis Services doesn't actually raise an error in this situation but returns '1.#INF'. The HTML version of the report happily rendered this as 'Infinity'; however, the PDF renderer appears to use the BinaryReader.Read7BitEncodedInt method of System.IO, and this chokes on the string since it expects a 7-bit encoded integer - hence the slightly obtuse error message in the Reporting Services log above. Removing '1.#INF' is as easy as guarding the division in the MDX:


WITH MEMBER Measures.c AS IIF(Measures.a=0, NULL, Measures.b/Measures.a)
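
For comparison, the equivalent guard in plain T-SQL is usually written with NULLIF, which turns a zero denominator into NULL and so avoids the problem value altogether - a minimal sketch with made-up column and table names:

SELECT b / NULLIF(a, 0) AS c -- NULL rather than an error when a is zero
FROM SomeTable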



Mosha commented here that he didn't see the point in removing these values. I guess he hadn't had to render PDFs using Reporting Services at the time!

Update:

After attempting to reproduce the error, it appears that Reporting Services is normally quite happy to render PDFs correctly, even when there are '1.#INF'-type values in the results - and this is the case even when switching from an integer-only field. So the bug is obviously much subtler than it first appeared. I'll post about this again if I figure it out...

Tuesday 25 August 2009

Madison news

More news on the next developments in SQL Server BI. Mentioned previously on this blog, Madison's progress is reported on the Data Platform Insider blog. One customer is quoted as having 20TB of data, and installations range from 8 to 20 nodes.

Exciting times next year in the world of MS BI with the releases of Gemini, Madison and Office 2010...

Gemini CTP

Not exactly news I know but the Gemini CTP (technically CTP2) is out. You have to sign up to SQL Server Connect to get it (Windows Live ID required) and join the Gemini program by clicking the link at the top of the page and filling in a short questionnaire. There's also a Gemini blog here.

Why should you be interested? Well, there is a good demo - in two parts - presented by Donald Farmer here, giving an indication:

[Embedded videos: Gemini demo, parts one and two]

20 000 000 rows in a spreadsheet being manipulated as if it were 20 000. The hardware running it? You'll have to watch to the end of the second part...

Anyway, as I say, I've joined the CTP program, so when I get the chance I'll be downloading it and giving it a whirl.

Wednesday 19 August 2009

VMWare networking and Windows firewall

A couple of days ago a colleague and I were trying to set up three virtual Windows 2008 servers in a domain (on VMware's ESX, as it happens): one domain controller (DC), one web server and one database server. Simple enough, you'd think.

So we had a virtual switch set up routing to the DC, with DHCP running and the other two servers set to pick up IP addresses dynamically. However, try as we might, we couldn't get them to 'see' the DC using ping etc, yet everything looked OK on the switch and in the TCP/IP settings on the servers. So we scratched our heads for a while - and I left for a meeting saying 'it's as though we have a loose bit of virtual CAT5...'.

When I returned, my colleague had cracked it. As you may have guessed from the title of this post, it was Windows firewall all along. Turning it off on all three servers brought the network to life. Bit of a forehead-slapping 'doh' moment...

Saturday 30 May 2009

SQL Agent job schedules

Last week I found myself reviewing a SQL Server (2000 vintage, no less!). As part of the review I was asked to document the Agent jobs - when they ran, how long they took, and so on. Unfortunately there were well over 100 of them. Clearly a query was called for - and to save myself from having to repeat the exercise, I'll reproduce it here:

This is the 2000 version:



select j.name,
case freq_type when 1 then 'Once'
when 4 then 'Daily'
when 8 then 'Weekly'
when 16 then 'Monthly'
when 32 then 'Monthly Relative'
else 'On start' end as TypeOfFrequency,
case when freq_type=4 then
case when freq_interval=1 then 'Every day'
else 'Every '+cast(freq_interval as varchar(2))+' days' end
when freq_type=8 then 'Every '+
left(
case when 1|freq_interval=freq_interval then 'Sunday, ' else '' end
+case when 2|freq_interval=freq_interval then 'Monday, ' else '' end
+case when 4|freq_interval=freq_interval then 'Tuesday, ' else '' end
+case when 8|freq_interval=freq_interval then 'Wednesday, ' else '' end
+case when 16|freq_interval=freq_interval then 'Thursday, ' else '' end
+case when 32|freq_interval=freq_interval then 'Friday, ' else '' end
+case when 64|freq_interval=freq_interval then 'Saturday,' else '' end,
len(case when 1|freq_interval=freq_interval then 'Sunday, ' else '' end
+case when 2|freq_interval=freq_interval then 'Monday, ' else '' end
+case when 4|freq_interval=freq_interval then 'Tuesday, ' else '' end
+case when 8|freq_interval=freq_interval then 'Wednesday, ' else '' end
+case when 16|freq_interval=freq_interval then 'Thursday, ' else '' end
+case when 32|freq_interval=freq_interval then 'Friday, ' else '' end
+case when 64|freq_interval=freq_interval then 'Saturday,' else '' end)-1)
when freq_type=16 then 'On day '+cast(freq_interval as varchar(2))+' of the month'
else 'The '+
case freq_relative_interval when 1 then 'first '
when 2 then 'second '
when 4 then 'third '
when 8 then 'fourth '
else 'last ' end
+'day of the month'
end as Interval,
case freq_subday_type when 1
then 'At ' + reverse(left(reverse(next_run_time), 2)+':'
+ substring(reverse(next_run_time), 3, 2)+':'
+ substring(reverse(next_run_time), 5, 2))

when 2 then 'Every '+cast(freq_subday_interval as varchar(1000))+' seconds'
when 4 then 'Every '+cast(freq_subday_interval as varchar(1000))+' mins'
else 'Every ' +cast(freq_subday_interval as varchar(1000))+' hours'
end as 'Daily execution (hhmmss)',
avg(run_duration) AvgDuration, -- nb run_duration is an HHMMSS-encoded integer, so treat these aggregates as a rough guide
min(run_duration) MinDuration,
max(run_duration) MaxDuration,
count(*) Executions
from sysjobs j
inner join sysjobhistory h on j.job_id=h.job_id
inner join sysjobschedules s on j.job_id=s.job_id
group by
j.name,
case freq_type when 1 then 'Once'
when 4 then 'Daily'
when 8 then 'Weekly'
when 16 then 'Monthly'
when 32 then 'Monthly Relative'
else 'On start' end,
case when freq_type=4 then
case when freq_interval=1 then 'Every day' else 'Every '+cast(freq_interval as varchar(2))+' days' end
when freq_type=8 then 'Every '+
left(case when 1|freq_interval=freq_interval then 'Sunday, ' else '' end
+case when 2|freq_interval=freq_interval then 'Monday, ' else '' end
+case when 4|freq_interval=freq_interval then 'Tuesday, ' else '' end
+case when 8|freq_interval=freq_interval then 'Wednesday, ' else '' end
+case when 16|freq_interval=freq_interval then 'Thursday, ' else '' end
+case when 32|freq_interval=freq_interval then 'Friday, ' else '' end
+case when 64|freq_interval=freq_interval then 'Saturday,' else '' end,
len(case when 1|freq_interval=freq_interval then 'Sunday, ' else '' end
+case when 2|freq_interval=freq_interval then 'Monday, ' else '' end
+case when 4|freq_interval=freq_interval then 'Tuesday, ' else '' end
+case when 8|freq_interval=freq_interval then 'Wednesday, ' else '' end
+case when 16|freq_interval=freq_interval then 'Thursday, ' else '' end
+case when 32|freq_interval=freq_interval then 'Friday, ' else '' end
+case when 64|freq_interval=freq_interval then 'Saturday,' else '' end)-1)
when freq_type=16 then 'On day '+cast(freq_interval as varchar(2))+' of the month'
else 'The '+
case freq_relative_interval when 1 then 'first '
when 2 then 'second '
when 4 then 'third '
when 8 then 'fourth '
else 'last ' end
+'day of the month'
end,
case freq_subday_type when 1 then 'At ' + reverse(left(reverse(next_run_time), 2)+':'
+ substring(reverse(next_run_time), 3, 2)+':'
+ substring(reverse(next_run_time), 5, 2))
when 2 then 'Every '+cast(freq_subday_interval as varchar(1000))+' seconds'
when 4 then 'Every '+cast(freq_subday_interval as varchar(1000))+' mins'
else 'Every ' +cast(freq_subday_interval as varchar(1000))+' hours'
end

order by j.name


Nasty, huh! I'm not sure why bitmask values are being used where they are never combined (ie freq_relative_interval), but anyway... Note also the smart-alec use of the bitwise OR operator to pick out the days of the week - illustrated below.
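
If the bitwise test looks odd: x | mask = x is true exactly when every bit in mask is already set in x. The more conventional spelling uses bitwise AND - a standalone illustration with a made-up value:

-- freq_interval for weekly schedules is a bitmask of days:
-- 1=Sunday, 2=Monday, 4=Tuesday, 8=Wednesday, 16=Thursday, 32=Friday, 64=Saturday
declare @freq_interval int
set @freq_interval = 2 | 32 -- ie Monday and Friday

select case when 2 | @freq_interval = @freq_interval then 'Monday is set (OR test)' end,
case when @freq_interval & 2 = 2 then 'Monday is set (AND test)' end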

This is the 2005/8 version. Note that since you can now have multiple schedules per job, you might get multiple rows per job - hence the addition of ScheduleName:



select j.name,
s.name as ScheduleName,
case freq_type when 1 then 'Once'
when 4 then 'Daily'
when 8 then 'Weekly'
when 16 then 'Monthly'
when 32 then 'Monthly Relative'
else 'On start' end as TypeOfFrequency,
case when freq_type=4 then
case when freq_interval=1 then 'Every day' else 'Every '+cast(freq_interval as varchar(2))+' days' end
when freq_type=8 then 'Every '+
left(
case when 1|freq_interval=freq_interval then 'Sunday, ' else '' end
+case when 2|freq_interval=freq_interval then 'Monday, ' else '' end
+case when 4|freq_interval=freq_interval then 'Tuesday, ' else '' end
+case when 8|freq_interval=freq_interval then 'Wednesday, ' else '' end
+case when 16|freq_interval=freq_interval then 'Thursday, ' else '' end
+case when 32|freq_interval=freq_interval then 'Friday, ' else '' end
+case when 64|freq_interval=freq_interval then 'Saturday,' else '' end,
len(case when 1|freq_interval=freq_interval then 'Sunday, ' else '' end
+case when 2|freq_interval=freq_interval then 'Monday, ' else '' end
+case when 4|freq_interval=freq_interval then 'Tuesday, ' else '' end
+case when 8|freq_interval=freq_interval then 'Wednesday, ' else '' end
+case when 16|freq_interval=freq_interval then 'Thursday, ' else '' end
+case when 32|freq_interval=freq_interval then 'Friday, ' else '' end
+case when 64|freq_interval=freq_interval then 'Saturday,' else '' end)-1)
when freq_type=16 then 'On day '+cast(freq_interval as varchar(2))+' of the month'
else 'The '+
case freq_relative_interval when 1 then 'first '
when 2 then 'second '
when 4 then 'third '
when 8 then 'fourth '
else 'last ' end
+'day of the month'
end as Interval,
case freq_subday_type when 1 then 'At ' + reverse(left(stuff('000000', 6-len(cast(active_start_time as varchar(6))), len(cast(active_start_time as varchar(6))), cast(active_start_time as varchar(6))), 2))+':'+
substring(reverse(stuff('000000', 6-len(cast(active_start_time as varchar(6))), len(cast(active_start_time as varchar(6))), cast(active_start_time as varchar(6)))), 3, 2)+':'+
substring(reverse(stuff('000000', 6-len(cast(active_start_time as varchar(6))), len(cast(active_start_time as varchar(6))), cast(active_start_time as varchar(6)))), 1, 2)
when 2 then 'Every '+cast(freq_subday_interval as varchar(1000))+' seconds'
when 4 then 'Every '+cast(freq_subday_interval as varchar(1000))+' mins'
else 'Every ' +cast(freq_subday_interval as varchar(1000))+' hours'
end as 'Daily execution (hhmmss)',
avg(run_duration) AvgDuration, -- as before, run_duration is HHMMSS-encoded; see the note after the query
min(run_duration) MinDuration,
max(run_duration) MaxDuration

from sysjobs j
left outer join sysjobhistory h on j.job_id=h.job_id
left outer join sysjobschedules js on j.job_id=js.job_id
left outer join dbo.sysschedules s on js.schedule_id=s.schedule_id
group by
j.name,
s.name,
case freq_type when 1 then 'Once'
when 4 then 'Daily'
when 8 then 'Weekly'
when 16 then 'Monthly'
when 32 then 'Monthly Relative'
else 'On start' end,
case when freq_type=4 then
case when freq_interval=1 then 'Every day' else 'Every '+cast(freq_interval as varchar(2))+' days' end
when freq_type=8 then 'Every '+
left(case when 1|freq_interval=freq_interval then 'Sunday, ' else '' end
+case when 2|freq_interval=freq_interval then 'Monday, ' else '' end
+case when 4|freq_interval=freq_interval then 'Tuesday, ' else '' end
+case when 8|freq_interval=freq_interval then 'Wednesday, ' else '' end
+case when 16|freq_interval=freq_interval then 'Thursday, ' else '' end
+case when 32|freq_interval=freq_interval then 'Friday, ' else '' end
+case when 64|freq_interval=freq_interval then 'Saturday,' else '' end,
len(case when 1|freq_interval=freq_interval then 'Sunday, ' else '' end
+case when 2|freq_interval=freq_interval then 'Monday, ' else '' end
+case when 4|freq_interval=freq_interval then 'Tuesday, ' else '' end
+case when 8|freq_interval=freq_interval then 'Wednesday, ' else '' end
+case when 16|freq_interval=freq_interval then 'Thursday, ' else '' end
+case when 32|freq_interval=freq_interval then 'Friday, ' else '' end
+case when 64|freq_interval=freq_interval then 'Saturday,' else '' end)-1)
when freq_type=16 then 'On day '+cast(freq_interval as varchar(2))+' of the month'
else 'The '+
case freq_relative_interval when 1 then 'first '
when 2 then 'second '
when 4 then 'third '
when 8 then 'fourth '
else 'last ' end
+'day of the month'
end,

case freq_subday_type when 1 then 'At ' + reverse(left(stuff('000000', 6-len(cast(active_start_time as varchar(6))), len(cast(active_start_time as varchar(6))), cast(active_start_time as varchar(6))), 2))+':'+
substring(reverse(stuff('000000', 6-len(cast(active_start_time as varchar(6))), len(cast(active_start_time as varchar(6))), cast(active_start_time as varchar(6)))), 3, 2)+':'+
substring(reverse(stuff('000000', 6-len(cast(active_start_time as varchar(6))), len(cast(active_start_time as varchar(6))), cast(active_start_time as varchar(6)))), 1, 2)
when 2 then 'Every '+cast(freq_subday_interval as varchar(1000))+' seconds'
when 4 then 'Every '+cast(freq_subday_interval as varchar(1000))+' mins'
else 'Every ' +cast(freq_subday_interval as varchar(1000))+' hours'
end

order by j.name
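
One caveat with both versions: run_duration is stored as an HHMMSS-encoded integer (so 13542 means 1 hour, 35 minutes and 42 seconds), which makes averaging it directly only a rough guide. For accurate figures, convert to seconds first - a minimal sketch:

select run_duration / 10000 * 3600 -- hours to seconds
+ run_duration / 100 % 100 * 60 -- minutes to seconds
+ run_duration % 100 -- seconds
as DurationSeconds
from sysjobhistory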

Sunday 17 May 2009

Kilimanjaro now 2008 R2 and Madison CTP out in July

Kilimanjaro, previously the code name of the next release of SQL Server, is now SQL Server 2008 R2.

The list of features is pretty long for an 'R2' - the link above provides the full details - but as a hopelessly biased BI chap, the most interesting one for me is the technology previously known as Gemini. This is MS's answer to the column-orientated 'in-memory' analysis engines of vendors such as QlikTech. The ability to provide users with the means to analyse millions of rows of data through Excel at 'lightning fast speeds', without needing to create cubes, write MDX etc, is an interesting one, to say the least. No confirmed date for this, but Mary-Jo Foley reckons it will be July.

For me though the most exciting announcement is that a Madison CTP is also planned for July. The lack of a compelling scalability story has always been a bit of a chink in SQL Server's armour, particularly when compared to vendors such as Teradata and Netezza, and I will be all over this when it comes out.

What both of these announcements show is that SQL Server is forging ahead, addressing weaker product areas and - providing the CTPs ship on time - keeping to schedule.

Sunday 29 March 2009

Microsoft Connectors for Oracle and Teradata by Attunity

Thanks to Euan Garden for mentioning that MS have licensed Attunity's Oracle and Teradata 'Connect adapters' (ie database providers) for use with SSIS 2008 (Attunity press release here).

I've known about Attunity's providers for a while, but the licensing deal is great news, because paying separately for database providers is a difficult case to make. Anyone using Oracle as a source or destination in an SSIS package should investigate these since, as I have posted before, there are a number of bugs in the Oracle providers. I've yet to try the Attunity providers myself, so I'll post again when I have. However, it's hard to see how they could be any worse than Oracle's.

Anyway you can download them here.

Thursday 26 February 2009

Deploying Reporting Services programmatically - shared data sources

Hello again! After a longish hiatus I'm back, posting fairly useful stuff - so I don't lose it, forget it, or have to do it all over again in another job.

The first useful snippet covers deploying Reporting Services reports using the rs utility on the command line. This is necessary in situations where you can't deploy directly from BIDS - which, to my mind, isn't good practice outside of the development environment anyway. Although there are various examples of doing this, there aren't many using VB.NET (which you have to use with rs) and none covering my requirements.

As you may know, when you use the rs utility to deploy (technically, create) reports on a Report Server, it does not maintain references to shared data sources, even when they exist in the same relative position given in the Reporting Services project. This means you have the tricky job of adding those references programmatically after using CreateReport to create the reports.

Tricky? Well, it is if your reports have multiple different data sources. If you try adding a data source to a report that doesn't know about it (ie hasn't got a reference to it in the .rdl) you will get an error. Therefore you have to work out which data sources a report references and relate those references to the shared data sources with the same names elsewhere in Reporting Services. A complicating factor is that you can't (or at least I couldn't) simply map the data sources obtained from the report to the shared data source location, which is why I ended up looping over the data sources retrieved from the report definition in the code below. Anyway, it works, and that's good enough for me...

To use the example, create an empty .rss file in Notepad, copy and paste the following code into it, and save it as UpdateDataSources.rss:

' rs.exe compiles this script against its Report Server web service proxy;
' 'rs' is the proxy object it provides, and ReportFolder and SharedDataSourceFolder
' are global variables supplied on the command line via -v.
Public Sub Main()

rs.Credentials = System.Net.CredentialCache.DefaultCredentials

Dim item as CatalogItem
Dim items as CatalogItem()

Try
items=rs.ListChildren(ReportFolder, False)

For Each item in items

If item.Type = 2 Then ' ie Report

Dim dataSources() as DataSource = rs.GetItemDataSources(item.Path)

For Each ds as DataSource in dataSources

' Repoint this data source at the shared data source of the same name
Dim sharedDs(0) as DataSource
sharedDs(0)=GetDataSource(SharedDataSourceFolder, ds.Name)

rs.SetItemDataSources(item.Path, sharedDs)
Console.WriteLine("Set " & ds.Name & " datasource for " & item.Path & " report")

Next

End If


Next

Console.WriteLine("Shared data source reference set for {0} reports.", ReportFolder)

Catch e As SoapException
Console.WriteLine(e.Detail.InnerXml.ToString())
End Try

End Sub

' Builds a DataSource object whose definition is a reference to the shared
' data source of the given name in the given folder
Private Function GetDataSource(sharedDataSourcePath as string, dataSourceName as String) as DataSource


Dim reference As New DataSourceReference()
Dim ds As New DataSource

reference.Reference = sharedDataSourcePath & "/" & dataSourceName
ds.Item = CType(reference, DataSourceDefinitionOrReference)

ds.Name = dataSourceName

GetDataSource=ds


End Function


Use it with rs on the command line as follows (the -s argument tells rs which report server to connect to):

rs -i UpdateDataSources.rss -s [Report Server URL] -v ReportFolder=[folder containing the reports to be updated] -v SharedDataSourceFolder=[folder containing the shared data sources]
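
For example, with hypothetical server and folder names (quote the values if they contain spaces):

rs -i UpdateDataSources.rss -s http://localhost/reportserver -v ReportFolder="/Sales Reports" -v SharedDataSourceFolder="/Data Sources"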