Sunday, 4 November 2012

SSIS release and Source Control

This post was inspired by a post by the inestimable Jamie Thomson here and the associated comments, in which  I rashly said I'd try and explain the source control model in use where I currently work. Source control is one of the first things I try and organise when starting a project, and given there are perfectly good free source control systems, with Visual Studio integration  there's no real excuse not to use one. So assuming you're using one how do you use it?

The simplest way is to simply start randomly adding projects to source control, however this creates the problem of creating mulitple working folders and after a while it becomes really difficult to find things. So the first tip is fairly obvious;

Put all SSIS packages in one folder and have one working folder

Put all of the SSIS packages in one folder. Call the folder 'SSIS', 'ETL' or 'Stuff that moves data around', it doesn't matter. Also split your packages into multiple projects which go seperate sub-folders. Finally set the root of your source control to a working folder so that all of the sub-folders on your file-system match with source control.

Branch before deploying your packages

Assuming you're not developing on your live environment (if you are, I strongly recommend you stop - even if it means using you're own PC or laptop as the development environment), at some point you're going to need to deploy your packages to another server (hopefully a test server).

This creates a potential problem. You have two copies of the package - one on the development machine and one on the test (or live) machine. However, there is only one in source control. This is obviously fine until you need to change the development package. As soon as you do you've 'lost' the test/live version from source control. Of course this isn't quite true as one of the advantages of source control is the ability to keep the history of your packages. However, if you encounter problems with your live version you have to roll-back to the live version in the development environment (assuming you can easily tell which one this is).

If you are in a team, even a small one, this could then hold up development, if others need to change the same package. If you imagine this problem replicated across dozens of packages you can easily end up in a big muddle with package versions being rolled back and forward with no-one knowing which version is which.

The solution is to branch from development folder before deploying to another machine. This is very easy in TFS and Subversion - I'll demonstrate in TFS;

Step one

Select the main development folder (TFS calls this 'trunk' by default). Right-click on it and select 'Branch...'


Step two

Simply name the branch. Its best to add it under a seperate 'branches' folder.




The advantage of this is that you now have two packages that reflect the machines they are on. If you need to fix the live package, you can without disturbing the development version of the package, or preventing the onward development of that package. There is no need to go back and restore previous versions. You also have a complete 'snapshot' of all packages on the live environment, which means that you can recreate it, re-release it, migrate it and so on with ease.

Only release packages that are 'ready'

The other advantage of branching in this way relates back to Jamie's original post. It gives you the ability to release packages that are ready for release, instead of all packages under development, including those that have only just been started.

We delete all packages from the target server before releasing; this has a number of advantages;
 
1. Any packages that are dropped from the source control release branch becuase they are no longer needed are also dropped from target servers. This means the target exactly mirrors the source control release branch.We do this by using the dtutil /delete command (we're still on 2008 R2) and some good old-fashioned batch script, to get us a list of packages to delete;
dtutil /SourceServer "<target server>" /FDirectory SQL;<target SSIS folder>;S > packagelist.txt

type packagelist.txt | findstr /v package(s) > packagelist2.txt

type packagelist2.txt | findstr /v successfully > packagelist3.txt

FOR /F "skip=6 tokens=4" %%G IN (packagelist3.txt) DO dtutil /SourceServer "<target server>" /SQL MGP_BI\%%G /DELETE
2. We can be sure that the release is deploying all of the packages. Any packages snuck in by any 'back-doors' will get overwritten by their source control-derived cousins. Anything developed outside of source control will equally get removed.

3. Finally, we can use the same process to build or migrate to a new SSIS server.
 
Merging packages

This simple branching technique does give rise to another problem however. At some point you have to make sure that the development and live/release branches are the same. Fixing the live package is all very well but obviously the fix needs to be applied to development as well - otherwise the fix will get overwritten when the development package is released.

This is where merging comes in - and for SSIS it can prove tricky, however if you stick to the following you won't go far wrong;

Try merging the files using your favourite source control client. If you have a package file in a release branch that has been changed and the development package has not been altered since the release version was changed the merge is simple. If you have BIDS and TFS;

1. Right-click the release package that has just been changes.

2. Click 'Merge..'

3. In the Merge dialogue you will see the Source branch (ie the release package) and in the Target branch you will see the development (or 'trunk') package.



4. Click 'Next', 'Next' again and 'Finish'.

5. You'll find that the development package in the 'trunk' folder has a merge check-out on it. Check it in and you will find the development package has been updated with the release branch changes.

You could of course manually make the same changes in both packages, but in this case TFS handles the change gracefully by simply overwriting the development package, ensuring there are no mistakes made. This of course makes the assumption that the live package contains all of the development package information plus the fix. This is reasonable as the release package is a copy of the development package in the first place.

In the next example the development package has been changed after the live package.

In this case you will get a merge conflict - TFS can't tell which bits of the development package need changing and which bits need to be updated with the live package. In this case do not attempt to merge the files using the merge conflict resolution tool in TFS. Simply cancel the merge. The changes have to be manually worked out and applied to both packages using BIDS. Generally speaking this happens when different developers have been working on the same package, therefore they need to collaborate and make sure all of the appropriate changes are included. In big teams it is often worth making one individual responsible for ensuring that changes to the release branch are included in the development branch. They can even set up an alert in TFS so that get a mail when such live changes occur.

No comments: