New SSIS Whitepapers from the SQLCAT Team

Two new SSIS SQL Server Technical Articles written by the SQLCAT team have been posted to the MSDN Library. They both focus on hybrid/cloud data movement, but they also contain design patterns and guidance that applies to on-premises data flows as well. The SSIS Operational and Tuning Guide has a great section on designing a package for restartability and performance tuning that all SSIS users will find useful.

SSIS Operational and Tuning Guide

SQL Server Integration Services (SSIS) can be used effectively as a tool for moving data to and from Windows Azure (WA) SQL Database, as part of the total extract, transform, and load (ETL) solution and as part of the data movement solution. SSIS can be used effectively to move data between sources and destinations in the cloud, and in a hybrid scenario between the cloud and on-premise. This paper outlines SSIS best practices for cloud sources and destinations, discusses project planning for SSIS projects whether the project is all in the cloud or involves hybrid data moves, and walks through an example of maximizing performance on a hybrid move by scaling out the data movement.


SSIS for Azure and Hybrid Data Movement

SQL Server Integration Services (SSIS) can be used effectively as a tool for moving data to and from Windows Azure SQL Database, as part of the total extract, transform, and load (ETL) solution and as part of the data movement solution. SSIS can be used effectively to move data between sources and destinations in the cloud, and in a hybrid scenario between the cloud and on-premise. This paper outlines best practices for using SSIS for cloud sources and destinations and for project planning for SSIS projects to be used with Azure or hybrid data moves, and gives an example of maximizing performance on a hybrid move by scaling out the data movement.


Did you know that the DQS team also published a whitepaper on using DQS with AlwaysOn? It’s another interesting read.

SQL Saturday #146 | Nashua, New Hampshire


I’ve been invited to speak at SQL Saturday #146 in Nashua in October. I’ll be presenting two talks related to Data Movement:

Efficient On-Premise to Cloud Data Transfer

Thinking about moving some of your operations to Azure? Have multiple remote sites, and want to use the cloud to centralize and share data between them?? Just like hearing talks about data transfer performance?! Have we got the session for you! We’ll cover some common user scenarios, and describe when and how to use the latest Microsoft data transfer technologies, including SQL Server Integration Services (SSIS), SQL Data Sync (a capability of SQL Database), and more.

EIM – Bringing Together SSIS, DQS and MDS

Enterprise Information Management (EIM) is an industry term for managing your data for data integration, quality, and governance. This session revolves around a demo which brings together the EIM functionality in SQL Server, a key part of our Credible, Consistent Data story for the 2012 release. We will show you how SQL Server Integration Services (SSIS), Data Quality Services (DQS), Master Data Services (MDS) and other Microsoft technologies work together to provide a comprehensive EIM solution.

Both are intermediate (200-300) level talks, and will cover multiple technologies. I’ll be using this as a venue to preview of some of the new content I’ll be presenting at the PASS  Summit in November. I’ll most likely end up in SSIS Design Pattern book co-author Andy Leonard’s (Blog | Twitter) two SSIS sessions, as well – I always love to see other people present SSIS topics, and Andy is one of the best SSIS presenters around.

The full schedule is now available on the event site. If you’re at the event, please come by and say hello! I’ll be happy to sign your ebook edition of my two SQL 2012 books.

The timing is perfect, as I’ll be presenting to the New England SQL Server User Group a couple of days before… Hope to see you at both events!

Getting started with SQL Data Sync

Here’s the scenario – you have an on-prem SQL database that you’d like to expose to a number of internet based applications. This could be Windows Azure Reporting Services reports, web applications, Windows Azure based apps, or mobile clients… it’s might even be all of the above. Now, you could make SQL Azure your primary database location, however, chances are you have processes and business apps running on-prem that are accessing (and perhaps generating) the data within this database.

In comes SQL Data Sync – a web service in the Azure platform that allows one-way or bi-directional syncing of on-prem SQL databases, and SQL Azure databases. It also allows SQL Azure <-> SQL Azure syncing. This post provides a step-by-step look at how to get started with the service. For more information, you can also take a look at the Getting Started entry in Books Online.

[warning]SQL Data Sync is currently in a technology Preview state.[/warning]

[warning]This steps and user interface described in this post is current as of 2012-07-03.[/warning]

The first step to using SQL Data Sync is to provision yourself a server. Log into the Windows Azure Platform management portal, and click the Data Sync button on the left hand pane.


In the center of your window, you’ll see the Provision option.


Click on the Provision button to setup your service account. Select your subscription, and the region you’d like to host your hub.


Once you have provisioned the server, you will create your Sync Group.


Step 1 is to specify a name for the sync group. In this scenario, I’d like to sync an on-prem AdventureWorksDW database to the cloud, so I’ll name my sync group accordingly.


Once you’ve set the name, Step 2 is add a SQL Server database to the sync group.


Since this is the first Sync Group we are creating, we’ll be adding a new SQL Server database to the sync group. The sync direction specifies how the data is being moved:

  • Bi-Directional – data flows both ways between the Hub and the client
  • Sync to the Hub – data flows to the hub only. The hub acts as a mirror for changes made on the client.
  • Sync from the Hub – data flows to the client only. The client acts as a mirror for changes made to the Hub.


In this scenario, I’d like to mirror my on-prem database to make it available to Cloud applications, so I select “Sync to the Hub”.

On the next page, I need to select a Client Sync Agent. Since I don’t have one already installed, I select the “Install a new Agent” option.


I now download the agent binary to my local machine, enter a name, and generate an Agent Key.

Agent Configuration

Clicking the Download button brings you to a download page, with the option to download and install the agent msi. During the installation process, you are prompted for an account which has network access. This becomes the account which will run the agent service.


Once the agent is installed, you will see a new “Microsoft SQL Data Sync” service created on your machine. Before you can start syncing, you will need to launch the Agent Configuration Tool (“Microsoft SQL Data Sync Agent Preview”).


Launching the newly installed Agent application brings up the configuration UI.


Back to the Windows Azure Management Portal to generate a key for the agent.


Copy the key from the management portal, and go back to the Agent configuration tool. Click the Submit Agent Key button and enter the key.


Now your Agent is linked to the Sync Group you are creating. We can proceed with making databases available.


Click Register to add a new database. You’ll enter the SQL instance information, and the database name. You have the option of using SQL Authentication (i.e. a user name and password), or to use Windows Authentication and connect to the database using the SQL Data Sync Agent service account.


Once a database has been configured, we can go back to the Management Portal.

On the next step, we will add a SQL Server database to our sync group.


Click the Get Database List button to retrieve the list of databases you registered with the SQL Data Sync Agent. Select your database (in this example I am selecting AdventureWorksDW).

Step 3 will be to configure a Windows Azure SQL Database to use as the Sync Hub.


Click the database icon to enter your Windows Azure SQL database information.

Note, for the Test connection to work, I had to enter:

  • Fully qualified server name (i.e. <server>
  • Database name (i.e. AdventureWorksDW_Sync)
  • Credentials
    • Username without @<server>
    • Password

Step 4 is to enter the configuration details.


For the configuration, you can specify the sync schedule, as well as the Conflict Resolution if you are doing Bi-directional syncing. In this example, I don’t expect to have any conflicts (as it’s a one way push to SQL Azure), but I’ll select Client Wins.

Step 5 is to define the dataset you will synchronize.



When we select our source database (in this case, the on-prem AdventureWorksDW database), we see that we have some tables that can’t be properly synced to SQL Azure. Clicking on the See the Report link brings up details.


For this example, we can ignore the tables that can’t be synced. Clicking Select All will automatically skip tables with incompatible schemas.


At the bottom of the UI, we can see a Row Filter area. Filtering allows us to sync only certain records… for example, we might want to only sync a single year of data from our fact table. This is essentially adding a WHERE clause to our sync query. To do this, we would select the table we want to filter, and click the Filter checkbox next to the column we are adding the WHERE clause on.


Step 6 is to “Deploy” the changes to the server.


Once you click Deploy, SQL Data Sync completes the configuration of the sync group. The selected schema will be created in your SQL Azure database, and the initial set of data will be moved over.


Your newly created Sync Group now appears underneath your Subscription on the left hand side of the Management Portal. On the right, you can see the dataset sync definition, including any filters you’ve defined.