Archive for the ‘SSIS’ Category

Task Factory Advanced Execute Package Task

April 30, 2012

If you’ve ever developed any kind of SSIS auditing framework, chances are you’ve wanted to pass variables from the child package to the parent package.  It can be done, but it takes some hoop jumping, including writing code in a script task.  As I’ve said before, scripting is not my strong suit, so if there is a way to work around it, I’ll give it a try.

Thankfully, Pragmatic Works has made the workaround unnecessary with a new addition to its ever-growing list of SSIS components.  The Advanced Execute Package Task lets you easily configure the passing of variables between packages in both directions.

image

The component is easy to configure.  Once you’ve selected the location of the package you want to execute, either SQL Server or File System, the next step is to assign the direction of the variable mapping.  There are three options: read from the child package, write to the child package, or read and write variables from the child package.

image

Once that is selected (in this case I’m going to read the variable from the child package), the next step is to select the variable from the child package you want to read.  A nice touch of the Advanced Execute Package Task is that it only allows you to map variables of the same type. This means that if you choose a string variable in the child package, you can only choose string variables in the parent package.

image

Once the variable mapping is set the way you want it, click the Add Mapping button.  This adds the mapping to the package, and you can now configure another variable.

image

With this component it is very easy to pass variables back and forth between packages.  But don’t take my word for it; download Task Factory and try it out for yourself.

SSIS For Loop Containers Part 2

March 30, 2012

    Continuing from my previous post on For Loop Containers, I’ll give you a slightly more practical example of using the container.

    A “Watcher” package is often used to check for the existence of a file in a folder, or for a row to be inserted into a table. Let’s set one up. Here’s what it’s going to do:

    1. Check for new records in a source table.

    2. If there are new records in the table, then rows will be inserted.

    3. If there are no new records, the package will enter a wait loop and “sleep” for 1 minute.

    4. Then it will iterate through the loop again.

    5. Exit the loop once the rows are inserted.
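
Stripped of the SSIS specifics, steps 1–5 describe a simple polling loop. Here’s a rough Python sketch (table access and the data flow are stubbed out with lists of dictionaries; all names are hypothetical):

```python
import time

def max_load_date(table):
    # Stand-in for the Execute SQL tasks that run SELECT MAX(LoadDate)
    return max((row["LoadDate"] for row in table), default="1900-01-01")

def watcher(source, destination, wait_seconds=60, sleep=time.sleep):
    """Poll until the source has newer rows than the destination,
    load the new rows, then exit the loop."""
    while True:
        cutoff = max_load_date(destination)
        if max_load_date(source) > cutoff:
            # The data flow task: insert only the rows newer than the cutoff
            destination.extend(r for r in source if r["LoadDate"] > cutoff)
            break
        sleep(wait_seconds)  # the script task's Thread.Sleep
```

The `sleep` callback is injectable only so the sketch can be exercised without actually waiting; in the package that role is played by the script task.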

      1. First, let’s set up the variables we are going to need for this project.

        image_thumb[20]

        Next, drag a For Loop Container onto the design pane and configure it. Unlike the first For Loop we set up, we are going to leave the InitExpression and the AssignExpression blank because, as you’ll remember from earlier, they are optional. In the EvalExpression enter this:

        @dtMaxDestinationLoadDate >= @dtMaxSourceLoadDate

        That is all we have to do here. So far this is even easier to configure than the first one, but hold on: it’s the use of the control flow that adds the complexity. We will be using a couple of Execute SQL Tasks to populate the two variables used in the EvalExpression.

        Here are the create scripts for the source and destination tables:

        CREATE TABLE [dbo].[ForLoopSource](
        [ID] [int] IDENTITY(1,1) NOT NULL,
        [Name] [varchar](50) NULL,
        [LoadDate] [datetime] NULL
        ) ON [PRIMARY]

        CREATE TABLE [dbo].[ForLoopDestination](
        [ID] [int] IDENTITY(1,1) NOT NULL,
        [Name] [varchar](50) NULL,
        [LoadDate] [datetime] NULL
        ) ON [PRIMARY]

        I’ll leave it to you to populate the tables with data, but there are some name generator websites out there that you can get pretty big datasets from if you need them, or you can just pull the data from Adventure Works.

        Drag in an Execute SQL Task and configure your connection. Then enter this code:

        SELECT ISNULL(MAX([LoadDate]), '1/1/1900') AS MaxSourceLoadDate
        FROM [Demo].[dbo].[ForLoopSource]

        Once you’ve entered the query, make sure to set the result set to single row, and then map the result set to the dtMaxSourceLoadDate variable. Then click OK.

        image_thumb[22]

        image_thumb[24]

        Do the same for the next Execute SQL Task, but map the result set to the dtMaxDestinationLoadDate variable.

        SELECT ISNULL(MAX([LoadDate]), '1/1/1900') AS MaxDestinationLoadDate
        FROM [Demo].[dbo].[ForLoopDestination]

        The next step is to set the conditions that determine which path the package should take. We do this by adding expressions on the precedence constraints.

        image

        When the expression for the constraint on the left @dtMaxSourceLoadDate>@dtMaxDestinationLoadDate evaluates to true, the package will proceed to the data flow task and load the new rows into the destination table.

        When the expression for the constraint on the right @dtMaxSourceLoadDate<=@dtMaxDestinationLoadDate evaluates as true, the package will proceed to the script task which puts the loop in a wait status before it returns to the top and iterates through again.

        It’s in this script task that we use the third variable we set up at the beginning, intWaitTime.  Here is the VB.NET for the script task:

        Public Sub Main()
            ' intWaitTime holds the wait time in minutes
            Dim minutes As Double = Convert.ToDouble(Dts.Variables("User::intWaitTime").Value)
            Dim ms As Int32 = Convert.ToInt32(minutes * 60 * 1000)
            System.Threading.Thread.Sleep(ms)
            Dts.TaskResult = ScriptResults.Success
        End Sub

        Using a variable instead of hard-coding the value in the script task gives us the option to configure it later in a table of package configurations. Once the wait is over, the package loops again until the destination table has a date that is greater than the max date in the source table.

        Inside the data flow, your source query should look like this:

        SELECT Name, LoadDate
        FROM dbo.ForLoopSource
        WHERE LoadDate > ?

        with the parameter mapped to @dtMaxDestinationLoadDate.

        I hope this helps you all get a better grasp on how For Loops can be used to enhance the functionality of your SSIS packages.

        Until Next Time…

      SSIS For Loop Containers Part 1

      March 29, 2012

      In a previous blog I talked about the Sequence Container, and some of its uses.  To continue with that theme I now bring you the For Loop Container. 

      The For Loop is one of two loop containers available in SSIS.  In my opinion it is easier to set up and use than the For Each Loop, but it is just as useful.  The basic function of the For Loop is to loop over whatever tasks you put inside the container a predetermined number of times, or until a condition is met. The For Loop Container, as is true of all the containers in SSIS, supports transactions: set the TransactionOption property in the properties pane of the container to “Required”, or to “Supported” if a parent container or the package itself is set to “Required”.

      There are three expressions that control the number of times the
      loop executes in the For Loop container.

      1. The InitExpression is the first expression to be evaluated on the For Loop and is only evaluated once at the beginning. This expression is optional in the For Loop Container.  It is evaluated before any work is done inside the loop.  Typically you use it to set the initial value for the variable that will be used in the other expressions in the For Loop Container. You can also use it to initialize a variable that might be used in the workflow of the loop.
      2. The EvalExpression is the second expression evaluated when the loop first starts. This expression is not optional. It is also evaluated before any work is performed inside the container, and then again at the beginning of each loop.  This is the expression that determines whether the loop continues or terminates. If it evaluates to TRUE, the loop executes again; if FALSE, the loop ends.  Pay particular attention to this expression.  I will admit that I have accidentally written an EvalExpression that evaluated to FALSE right away and terminated the loop before any work was done, and it took me longer than it should have to figure out that the EvalExpression was the culprit.
      3. The AssignExpression is the last expression used in the For Loop. It is used to change the value of the variable used in the EvalExpression. This expression is evaluated for each pass through the loop as well, but at the end of the workflow. This expression is optional.

      image

      Let’s walk through setting up an example package. In this example we’ll create a loop that executes a given number of times.

      Create a new package and add two variables to it, intStartVal and intEndVal.

      image

      Next add a For Loop Container to the package and open the editor.  Assign the following values for the expressions:

      image

      That is all the configuring required for the For Loop Container.  Now let’s add a Script Task that displays a message box with the value of the intStartVal variable as the loop updates it. Here is the code to do that:

      Public Sub Main()
          ' Display the current value of the loop variable
          MsgBox(Dts.Variables("intStartVal").Value)
          Dts.TaskResult = ScriptResults.Success
      End Sub

      Once that is done the package is ready to execute.

      image

      First Iteration

      image

      Second Iteration

      image

      Fifth Iteration

      image

      Complete

      image

      Now that the EvalExpression @intStartVal <= @intEndVal evaluates to FALSE, the package ends.  In part two of SSIS For Loop Containers I’ll go a little deeper into using a For Loop with some real-world examples. Until then…
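
The three expressions behave exactly like the clauses of an ordinary loop. Here is a rough Python sketch of the package we just ran (the InitExpression and AssignExpression values are my assumptions about how the screenshots were configured):

```python
def for_loop_container(init, eval_expr, assign, body):
    # InitExpression: optional, evaluated once before any work is done
    if init is not None:
        init()
    # EvalExpression: required, checked before every pass; FALSE ends the loop
    while eval_expr():
        body()                      # the tasks inside the container
        # AssignExpression: optional, evaluated at the end of every pass
        if assign is not None:
            assign()

# The dictionary stands in for the SSIS variables intStartVal and intEndVal
state = {"intStartVal": 0, "intEndVal": 5}
shown = []  # the values the MsgBox would display, one per iteration
for_loop_container(
    init=lambda: state.update(intStartVal=1),
    eval_expr=lambda: state["intStartVal"] <= state["intEndVal"],
    assign=lambda: state.update(intStartVal=state["intStartVal"] + 1),
    body=lambda: shown.append(state["intStartVal"]),
)
# shown is now [1, 2, 3, 4, 5]
```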

      An Alternative to the OLE DB Command to update Slowly Changing Dimensions.

      March 21, 2012

      In a previous post I detailed how to create a package that handles SCD Type 1 changes using SSIS.  The component that handles writing the updates to the destination table inside the data flow is the OLEDB Command. 

      image

      It is the only native component in the data flow that can issue update statements against the destination table. Unfortunately, those update statements are executed on a row-by-row basis.  The question, then, is how do we accomplish batch updates in an SSIS package?

      The answer is pretty simple:

      • We first have to create a table to stage the updates, with the same structure as our destination table.
          CREATE TABLE [dbo].[StageDimUser](
          [UserSK] [int] NOT NULL,
          [UserAK] [varchar](50) NULL,
          [ScreenName] [varchar](50) NULL,
          [AccountCreateDate] [date] NULL,
          [TimeZone] [varchar](50) NULL,
          [Language] [varchar](50) NULL,
          [GeographySK] [int] NULL,
          [Hash_CD] [int] NULL)
      • Replace the OLE DB Command with another OLE DB Destination Component.

      image

      • Connect the new destination to our newly created stage table and insert all the updates to that table
      • Return to the control flow and add an Execute SQL Task that updates the destination table with the values from the stage table.  The Execute SQL Task will perform a batch update instead of updating row by row, so even though we’ve added an extra step to the process, the resulting processing time should be much faster.
        UPDATE [Demo].[dbo].[DimUser]
        SET [UserAK] = StageDimUser.[UserAK]
        ,[ScreenName] = StageDimUser.[ScreenName]
        ,[AccountCreateDate] = StageDimUser.[AccountCreateDate]
        ,[TimeZone] = StageDimUser.[TimeZone]
        ,[Language] = StageDimUser.[Language]
        ,[GeographySK] = StageDimUser.[GeographySK]
        ,[Hash_CD] = StageDimUser.[Hash_CD]
        FROM StageDimUser
        WHERE DimUser.UserSK = StageDimUser.UserSK
      • Finally, once the updates are done, truncate the stage table to prep it for the next load.
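
The steps above amount to a stage-then-batch-update pattern. A minimal Python sketch, with lists of dictionaries standing in for the destination and stage tables (names are illustrative):

```python
def batch_update(destination, stage, key="UserSK"):
    """Apply all staged changes in one set-based pass, like the Execute SQL
    Task's UPDATE ... FROM, instead of issuing one statement per row."""
    staged = {row[key]: row for row in stage}    # index the stage table
    for row in destination:
        if row[key] in staged:
            row.update(staged[row[key]])         # overwrite with staged values
    stage.clear()                                # truncate the stage table

dim_user = [{"UserSK": 1, "ScreenName": "old_name"},
            {"UserSK": 2, "ScreenName": "unchanged"}]
stage_dim_user = [{"UserSK": 1, "ScreenName": "new_name"}]
batch_update(dim_user, stage_dim_user)
# dim_user row 1 now carries the staged values; the stage list is empty
```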

      image

      Using Sequence Containers in SSIS

      March 21, 2012

      Not every package requires the use of a Sequence container, and not everyone likes to use them. For me personally they can be a huge help when developing and debugging SSIS packages.  Some of the most common uses for Sequence Containers are:

      • Group and control the execution order of a package.  For example, when loading a data warehouse, place all of the dimension load tasks inside the Sequence Container and allow them to run in parallel, then connect the Sequence Container to the fact table load to ensure that all the dimensions finish loading before the fact table loads.

      image

      • Grouping related tasks in a package to give a clearer “High Level” understanding of what a complex package is doing.

      • Easier debugging by allowing you to disable groups of tasks to focus package debugging on one subset of the package control flow.

      image

      • Grouping tasks that require similar property settings without having to set each individual task’s properties.  Similarly, it allows you to set properties at the container level rather than the package level.

      • Providing scope for variables that a group of related tasks and containers use.

      • Creating a transaction around all the components inside the container, meaning that if one task inside the container fails, all the work completed successfully before the failure is rolled back, as if it never happened.

      It’s this last point that I would like to look at in more detail.  To be quite honest, I haven’t always been aware that you could create a transaction using a Sequence Container, and knowing it would have made my life a bit easier. If you head over to MSDN you can find these articles covering transactions (1, 2, 3). All Integration Services container types allow you to configure them for use in transactions with one of three settings.  MSDN describes them:

      • Required indicates that the container starts a transaction, unless one is already started by its parent container. If a transaction already exists, the container joins the transaction. For example, if a package that is not configured to support transactions includes a Sequence container that uses the Required option, the Sequence container would start its own transaction. If the package were configured to use the Required option, the Sequence container would join the package transaction.

      • Supported indicates that the container does not start a transaction, but joins any transaction started by its parent container. For example, if a package with four Execute SQL tasks starts a transaction and all four tasks use the Supported option, the database updates performed by the Execute SQL tasks are rolled back if any task fails. If the package does not start a transaction, the four Execute SQL tasks are not bound by a transaction, and no database updates except the ones performed by the failed task are rolled back.

      • NotSupported indicates that the container does not start a transaction or join an existing transaction. A transaction started by a parent container does not affect child containers that have been configured to not support transactions. For example, if a package is configured to start a transaction and a For Loop container in the package uses the NotSupported option, none of the tasks in the For Loop can roll back if they fail.
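
The three MSDN definitions boil down to a small decision rule. This is just a sketch of that rule in Python, not actual SSIS internals:

```python
def effective_transaction(option, parent_transaction):
    """Return the transaction a container runs in, or None.
    parent_transaction is whatever transaction (if any) the parent passes down."""
    if option == "Required":
        # Join the parent's transaction if one exists, otherwise start our own
        return parent_transaction if parent_transaction else "own transaction"
    if option == "Supported":
        # Join the parent's transaction if any, but never start one
        return parent_transaction
    if option == "NotSupported":
        # Ignore any transaction entirely
        return None
    raise ValueError("unknown TransactionOption: " + option)
```

Applying the rule down the container hierarchy shows why setting Required only at the parent is enough: every child left at the default Supported simply inherits the parent's transaction.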

      This means that at the package level, at the Sequence, For Loop and For Each Loop Container level, as well as most if not all (sorry, I haven’t checked them all, but you can if you want to) control flow containers, you can configure them to use or ignore a transaction just by changing the TransactionOption property. By default the property is set to Supported, which I find rather convenient, because then I only have to set it at the parent container level and every child will automatically pick up the transaction. I like it when I don’t have to change the property on a handful of tasks.

      So let’s create a few examples.

      Let’s start simple.  I’ve seen this example done by a few people, here and here, and if it’s good enough for them, it’s good enough for me. Let’s create a package that creates a table and inserts a bunch of rows using multiple Execute SQL Tasks.  Here are the SQL statements for the 5 tasks:

      CREATE TABLE [dbo].[TransactionTest](
      [PartNumber] [int] NULL CONSTRAINT uq_TransactionTest_PartNumber UNIQUE,
      [PartDescription] [varchar](200) NULL
      ) ON [PRIMARY]

      INSERT INTO [dbo].[TransactionTest] ([PartNumber], [PartDescription])
      VALUES (1, 'Spacely Sprokets')

      INSERT INTO [dbo].[TransactionTest] ([PartNumber], [PartDescription])
      VALUES (2, 'Cogswells Cosmic Cogs')

      INSERT INTO [dbo].[TransactionTest] ([PartNumber], [PartDescription])
      VALUES (3, 'Thing 1')

      INSERT INTO [dbo].[TransactionTest] ([PartNumber], [PartDescription])
      VALUES (3, 'Thing 2')

      image

      As you can see, I’ve added a Sequence Container and put the Execute SQL Tasks inside it.  I’ve also set the TransactionOption for the container to Required.  Now when I run the package the final insert should fail, because it repeats PartNumber 3 and that column has a unique constraint.

      Before I could even get the package to fail for the reason I wanted, I got this error:

      Error: The SSIS Runtime has failed to start the distributed transaction due to error 0x8004D01B “The Transaction Manager is not available.”. The DTC transaction failed to start. This could occur because the MSDTC Service is not running.

      So to enable the service on my laptop I went to Control Panel -> Administrative Tools -> Component Services -> Computers -> My Computer -> Properties and started the Distributed Transaction Coordinator, and that solved my problem. However, if you are doing this in your work environment, be aware that there are some hurdles to jump through regarding firewalls.

      So now back to the package.

      image

      As expected, the 4th insert fails and the container fails.  If we look at my server, you’ll see that the table didn’t get created.

      image

      See, I told you: no table named TransactionTest.  But if we go ahead and change the 4th insert so it works…

      image

      image

      image

      Well there you go, it really does work.  Pretty cool little tidbit to have in your back pocket.  Now don’t go thinking this will solve all of your problems; some people (SSIS Junkie, Matt Masson) have encountered issues using transactions in some instances, so make sure to test this out thoroughly.

      This will work on package designs for loading a data warehouse star schema using Execute Package Tasks as well. 

      image

      As long as the task is set to Supported, and the items inside the child package are set to Supported as well, they will inherit the transaction created by the parent container.

      I hope this opened your eyes to the fact that you can use Sequence Containers to do a number of useful things, and that incorporating transactions inside your SSIS packages is possible and not too difficult to implement.

      Slowly Changing Dimension Type 1 with Upsert Destination from Pragmatic Works’ Task Factory

      March 19, 2012

      Over my last 3 blogs I’ve shown you a variety of ways to handle Type 1 changes in a Slowly Changing Dimension.  All of those design patterns can be created with components that come standard with SSIS.  This time I’m going to show you a third-party tool from Pragmatic Works that replaces every component we’ve used before, except the source component, with just one component.

      image

      SELECT
      [UserAK]
      ,[ScreenName]
      ,[AccountCreateDate]
      ,[TimeZone]
      ,[Language]
      ,[GeographySK]
      ,CHECKSUM(
      [ScreenName]
      ,[AccountCreateDate]
      ,[TimeZone]
      ,[Language]
      ,[GeographySK]
      ) as Hash_CD
      FROM
      DimUserUpdate
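
The Hash_CD column exists so that change detection compares one value per row instead of comparing every column. The idea, sketched in Python with crc32 standing in for T-SQL CHECKSUM (a different algorithm, same purpose; the row values are made up):

```python
from zlib import crc32

# The columns the CHECKSUM covers in the source query above
TRACKED = ("ScreenName", "AccountCreateDate", "TimeZone", "Language", "GeographySK")

def hash_cd(row):
    # Reduce the tracked columns to one comparable value per row
    return crc32("|".join(str(row[c]) for c in TRACKED).encode())

before = {"ScreenName": "jdoe", "AccountCreateDate": "2010-01-01",
          "TimeZone": "EST", "Language": "en", "GeographySK": 7}
after = dict(before, GeographySK=8)  # one tracked column changed

# Identical rows hash the same; a change in any tracked column changes the hash
assert hash_cd(before) == hash_cd(dict(before))
assert hash_cd(before) != hash_cd(after)
```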

      Now, instead of adding the lookup and conditional split or the SCD component, all we need to do is add the Upsert Destination, configure it, and we are done.

      image

      The first thing we have to do is create a new connection manager.  The Upsert Destination requires an ADO.net Connection to the database instead of the standard OLEDB connection. Once you’ve set the connection and selected the Destination Table, you must identify the key column for the destination by placing a check in the box next to the correct column.  Once that is done, go to the Update Method Tab.

      image

      On this tab we can choose between several ways of handling the update.  The first is the Bulk Update Method. This method is the fastest, but if you plan on doing any kind of row count logging you will not be able to track updated rows. The next option is the Column Compare Method, which compares the values of the columns from the source with those of the columns in the destination, and updates a column if a change exists.  This method is slower but allows you to log row counts for both updates and inserts.  The next option is the Timestamp Compare, which compares two timestamp values to see if a record has been updated.  Similarly, the Last Update Compare uses a last-update column value.  For this package we are going to use the Column Compare method. Simply select the columns you want to use for the comparison using the check boxes down the left-hand side.  Now you can click OK and be done, or move on to the Advanced Tab.  I’m going to set up a row count for updates, so I’m heading to the Advanced Tab next.

      image

      Simply map each row count variable to the correct drop down and then click OK.  I’m going to add a message box using a Script Task to the Control Flow just to show that the update row count works properly.

      You can add this little snippet to get the row count to show in a message box:

      Public Sub Main()
          ' Show the number of rows the Upsert Destination updated
          MsgBox(Dts.Variables("UpdateRowCount").Value)
          Dts.TaskResult = ScriptResults.Success
      End Sub

      Now I’m just going to update the DimUserUpdate table so we actually get some updates coming through.

      UPDATE DimUserUpdate SET GeographySK = GeographySK + 1
      WHERE [UserAK] IN ('98939115','97454425','97202071','9567382','87627850')

      Now execute the package.

      image

      As you can see, although 2,056 rows came through the source query, only 5 records were actually updated.  The Upsert Destination makes designing packages for handling SCD Type 1 changes much faster and, in my opinion, easier.  If you get a chance I suggest trying it out, along with the multitude of other components Task Factory has to offer.

      Slowly Changing Dimension Type 1 with Merge Join.

      March 19, 2012

      In my last 2 posts (1 and 2) I covered different methods for implementing SCD Type 1 changes in SSIS.  Today we will cover how to use the Merge Join design pattern to load SCD Type 1 columns.

      Right from the start the design is a bit different. Instead of adding your source query to an OLEDB Source component and then connecting it to the lookup or the SCD component, we create two OLEDB Source components.

      image

      The first one is the same as in the previous two posts; it connects to our source table (DimUserUpdate).  However, we have to modify the query a little to get it to work with the Merge Join component, so let’s take a look at that query.

      SELECT
      [UserAK]
      ,[ScreenName]
      ,[AccountCreateDate]
      ,[TimeZone]
      ,[Language]
      ,[GeographySK]
      ,CHECKSUM(
      [ScreenName]
      ,[AccountCreateDate]
      ,[TimeZone]
      ,[Language]
      ,[GeographySK]
      ) AS Hash_CD
      FROM DimUserUpdate
      ORDER BY UserAK

      The addition of the ORDER BY clause is essential when using the Merge Join design pattern.
       
      The second OLEDB Source component will query our destination table, essentially replacing the lookup component from the Lookup/Conditional Split design pattern.  As in the lookup query, we only need to bring back the UserAK, UserSK and Hash_CD.  But just like the source query above, we need to add the ORDER BY clause because we are going to use the Merge Join.
       
      SELECT [UserSK]
      ,[UserAK]
      ,[Hash_CD]
      FROM [Demo].[dbo].[DimUser]
      ORDER BY UserAK

      Simply using the ORDER BY clause is not enough; SSIS has no way of knowing that the datasets entering the Merge Join are sorted. You have to tell SSIS that the data is sorted, and on what column the sort is happening.  I agree it’s a little redundant, but that’s how it is.  You must do this for each OLEDB Source.  To do this, right click on the OLEDB Source and select Show Advanced Editor from the menu.

      image

      Once inside the advanced editor select the Input and Output Properties Tab along the top of the window. Then under the Common Properties for the OLEDB Source Output change the IsSorted property to True.

      image

      That takes care of telling SSIS that the data is sorted, but now we need to tell SSIS what column it is sorted on.  To do that, drill down from the OLEDB Source Output to the Output Columns, select the column, UserAK (or columns, if your query is ordered by more than one column), and change its SortKeyPosition to 1.

      image

      Now do this for the second OLEDB Source and then we’ll be ready for the Merge Join component.  Please note that if you have multiple columns in your ORDER BY clause, they must match in both queries, and you must set the SortKeyPosition in the same order for both components.  Once the source components are configured, drag in the Merge Join component.

      image

      Now drag the first source component’s output connection to the Merge Join component. Once you’ve connected it, a selection box will appear asking which side of the join this output belongs on.  This is important, because we are basically creating a Left Outer Join inside the component, so knowing which side of the join each input belongs to is essential.

      image

      When you add the second output from the other OLEDB Source you won’t be prompted again; it will just become the input for the other side of the join. Now double click on the Merge Join component and open the editor.

      The first property to set is the Join Type.  Your options are Left Outer Join, Inner Join, or Full Outer Join.  We are going to use the Left Outer Join option.  Below the Join Type are the columns from our two queries, each on the side we specified when hooking the component up to the Merge Join.  The keys are already linked; now we need to add the columns we want to return from both sides of the join.  Make sure to include both Hash_CD columns to make comparing them easier later on in the package.

      image

      The next step will be to add a Conditional Split to the package to determine which rows are new rows to be inserted and which rows need to be evaluated for updates.  Only one output needs to be created; we’ll call it NewRecords.  Here is the expression you need to add to the editor:

      image

      If this doesn’t make immediate sense, let me explain.  Since we did a left outer join, any record with no match between the two queries on UserAK is a new record, and as a result its UserSK evaluates to NULL.  Now hook up the NewRecords output to your OLEDB Destination, set the connection to the destination table, confirm the mappings, and the new records will get inserted correctly.

      Now on to the second Conditional Split. Here we need to evaluate whether the records that matched on UserAK have had a change in any of the columns we are tracking changes on.  To do this we will use the CHECKSUM values we created in our source query.  We need to write two SSIS expressions, one for each output, to determine if the records have changed.  Here are the expressions:

      image
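
Taken together, the Merge Join and the two Conditional Splits implement a simple classification. Here is a rough Python sketch, with a dictionary lookup standing in for the sorted merge (the row values are invented):

```python
def classify(source_rows, dim_rows):
    """Left-outer-join the source to the dimension on UserAK, then split:
    no match -> new record to insert; match with a different Hash_CD -> update."""
    dim = {row["UserAK"]: row for row in dim_rows}
    inserts, updates = [], []
    for row in source_rows:
        match = dim.get(row["UserAK"])      # the Merge Join (left outer)
        if match is None:                   # UserSK would have come back NULL
            inserts.append(row)             # first Conditional Split: NewRecords
        elif match["Hash_CD"] != row["Hash_CD"]:
            updates.append(row)             # second split: send to OLEDB Command
    return inserts, updates

src = [{"UserAK": "a1", "Hash_CD": 10},   # not in the dimension -> insert
       {"UserAK": "a2", "Hash_CD": 20},   # unchanged -> ignored
       {"UserAK": "a3", "Hash_CD": 99}]   # hash differs -> update
dim = [{"UserAK": "a2", "UserSK": 5, "Hash_CD": 20},
       {"UserAK": "a3", "UserSK": 6, "Hash_CD": 30}]
inserts, updates = classify(src, dim)
# inserts holds the a1 row; updates holds the a3 row
```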

      We then connect the Update output to the OLEDB Command component to update the rows that need to be updated.

      Inside the OLEDB Command editor the first tab you are shown is the Connection Manager tab. Simply connect to the destination database and then select the Component Properties tab.

      image_thumb20

      At the bottom of this tab, under Custom Properties is the SQLCommand property. Here you will write your update statement. To do this you will have to map all of the columns to be updated in the destination to the columns coming through in the data flow. The query will look like you are mapping them to parameter values (?). Notice that even the where clause is set using the parameter indicator.

      UPDATE DimUser
      SET
      [ScreenName] = ?
      ,[AccountCreateDate] = ?
      ,[TimeZone] = ?
      ,[Language] = ?
      ,[GeographySK] = ?
      WHERE [UserSK] = ?

      Once the query is complete go to the Column Mapping tab.

      image_thumb23

      Now it is just a matter of correctly mapping the parameter values to the columns in your data flow (input columns). Make sure to pay attention to the column order in the update query so you map each parameter value to the appropriate input column. Remember that the last parameter value is for the WHERE clause; it is why we brought the UserSK value through from the destination query to begin with.

      When you’re done, the data flow should look something like this:

      image

      I hope this helps some of you looking to try different ways to update SCD1 Dimensions.