Thoughts on Branching Strategies for SSIS Projects

A question came in recently from a customer about branching strategies for SSIS projects. Their development team has been using SSIS for years, but have not been using any form of source control software up until now. They felt that the package format improvements in SQL Server 2012, which were focused on making it possible to diff/merge package files, were a good motivation for them to adopt this best practice.

I gave the following high level guidance:

  • Treat your SSIS packages as if they were source files from a compiled language (i.e. C#)
  • Think of your SSIS project as an “Application” (I frequently refer to SSIS projects as “Data Integration Applications” in my talks)
  • There is a lot of guidance out there on software development lifecycle best practices, most of which applies to SSIS development
    • It may take some experimenting to determine what methodology works best for your project and team

I’ve always liked Jamie Thomson’s posts about his experiences with SSIS Team Development (as well as general best practices and naming conventions that apply here as well). The changes we made in SQL 2012 alleviate the need for some of these best practices, but many of them still apply.

In regards to specific branching strategies, it’s common to have three core branches.

Development

This is the branch developers work out of. Developers are responsible for testing changes, and making sure the project is fully functional in this branch. If a developer is making significant changes that would be disruptive to everyone else, or cause the project to be broken for any amount of time, then you should create a sub branch for them to work out of. This branch should be regularly merged (weekly?) to the Integration branch so that issues not caught during developer testing can be identified early.

Integration

The builds from this branch go up to your test environment. This is where you’d do signoffs before the project is moved to production. Most changes coming into this branch should be coming from Development – the only time you’d directly check something in here is during the stabilization period (and this should be rare). Once QA have signed off on the release, you’d merge this to your Main branch.

Main

This is what gets released to production. You should use build tags to identify specific versions/releases. Nothing should be checked directly into this branch without first going through Integration testing.

Note, this branching strategy assumes a single team/project. If you have multiple teams/projects, you need to decide if you are sharing a common Main, or if everyone goes into the same Integration branch (and whether there are multiple levels of Integration). Within the SQL org, for example, we’d have a “Release” branch past Main that we actually do the external builds from (it also allows us to do servicing builds post-RTM).

(The actual SQL branching strategy is a lot more complex than that, but this is the basic idea!)

Are you currently working with SSIS in team environment? Are you using source control? I’d love to hear any recommendations you might have.

11 thoughts on “Thoughts on Branching Strategies for SSIS Projects

  1. Hi Matt

    We are using SSIS 2012 developing in a team of 4 developers. We are using the new SSIS project deployment model. It seems that many operations, including changes that should be isolated to a single package, result in a need for the dev. environment to check out the project file. Upon inspection of the project file (.dtproj), it appears that it contains a lot of package specific properties – I guess that is why this file needs to be checked out when editing individual packages. This setup does not seem to be compatible with being multiple developers working on the same project. Do you have any recommendations on how to solve this?

    Thanks.

    Michael Barrett

    1. (Sorry for the delay – not sure how I missed your original comments)

      Yes, package parameters and VS Configurations are also tracked in the .dtproj file. Ideally we'd split these into separate artifacts, but unfortunately we didn't get to that for SQL 2012 RTM. Once your parameters are defined, you shouldn't see too much traffic on the .dtproj file.

  2. I can confirm the point Michael Barrett makes: constant changes in the .dtproj file don't work well with any source control system.

    That's also a reason why Visual Studio configurations are actually unusable in SSIS projects. The configurations (Debug/Development/Release for instance) are saved in the project file (very bad idea IMO, but is generally a VS-Feature, not something the SSIS Team is responsible for) and of course differ from one development machine to another. If I add a new package to the project, I need to check-in (commit) the .dtproj file, thus also commiting my configurations. Each developer making an update after my checkin now gets a merge conflict for the .dtproj file, which needs to be solved manually: he DOES want to have the new package in the .dtproj file and he DOESN'T want to overwrite his configurations with mine.

    Layout information which was saved in the same .dtsx file as the actual logic has always been a pain in the neck (high probability of conflicts which need to be resolved manually), but at least it was placed at the end of the file, so that a human being could easily say whether one needs to study the changed logic or can just reapply the changed layout information.

    I would say that the SSIS 2012 continues the very source-control unfriendly behavior putting data which don't belong together in one location (like saving packet parameters, packet versions, packet descriptions etc. in the project file). Unfortunately 🙁

    1. Yeah – we tried to break out as much as possible into separate files, but we weren't able to get it all into SQL 2012. The metadata stored in the project file (parameters, package information, etc) becomes the manifest for the compiled .isproj file. We cache it so we don't have to open up each package at compile time to extract the information we need. I'd eventually like to see this moved into its own file which can simply be excluded from source control.

      Note that merging layout changes is painful. If you don't mind doing an auto-layout post merge, you can actually ignore or delete the layout information at the bottom of the file.

      1. Hi Matt,

        concerning layout: I can of course speak only of my experience and experience of our clients, but almost nobody merges the layout manually. When there are conflicts in the layout – one developer had another screen resolution, another one has several additional pads fixed left or right in Visual Studio (like Solution Explorer or SSIS Variables), so they adjusted the layout to be most readable on their screens – the developer which needs to resolve the conflict just takes his local version of the layout (or the version in the repository, this really doesn't matter so much). But I still need to open the file, check that these were only layout changes and copy in the diff dialog the section of my layout (or your layout, doesn't matter).
        The point I'm trying to make is that layout is visual, client-dependent information. It can be even regenerated by BIDS/SSDT automatically. It's pretty unfortunate that this information is placed in one file with the actual package logic and human intervention is needed each time the decision needs to be made whether it was a code-relevant change or not.
        Comparing to .NET/C# development, Visual Studio guys made a great usage of partial classes and moved the layout, generated by the designer into *.Designer.cs files. BIDS/SSDT would definitely benefit from a similar, modular approach.

        I’ve made a MS Connect item for this issue 🙂 https://connect.microsoft.com/SQLServer/feedback/

  3. Hi Matt,

    Are there any changes in SQL Server 2014 that would relate to your post/comments?

    I like your approach and I must admit it is the only one I feel comfortable with BI projects.

    Here are a few questions:
    1) Would you move dev to integration items individually? In here I have a scenario in mind where you have integration 5 files and dev has 3 changed files but only 2 should be part of next release. Would you move only 2 of them to integration branch?
    2) Personally I prefer to avoid merges so would it be a reasonable approach to 'overwrite' files when they are moved from dev >> Integration >> Main? Under a condition that "next" branch item has not been modified?
    3) Bug fixes.
    a)I like your approach as this means I can potentially do fix in dev and than push to integration and main under a condition that "previous" branch item has not been modified and that gives me a question. Is that a reasonable approach?
    b) If you have item in integration that requires a fix and dev item has been "improved" than would you try to merge (with more friendly tools than text merge) integration fix into dev item without interrupting new development or would you just apply a fix manually in both items without merge and on next occasion 'overwrite" integration item?
    4) I presume with your approach if you have SSIS/SSAS/SSRS multiple projects than it still doesn't matter as you just move changes to the next step?

    Are you aware of any tutorials/articles that would show steps for your approach (ideally with multiple different BI projects).

    I hope my questions make sense if not let me know I will try to re-phrase it.

    p.s. I always avoided branching for BI (apart from keeping separate files for releases) but it seems I cannot always avoid it and your approach gives me hope that I found what I was looking for.

    Take care
    Emil

  4. Hi Matt

    Has the VS handling of .dtproj files changed with SQL Server 2014, with the new SSDT tools for SQL 2014?

    Regards
    Daniel

Comments are closed.