h1

BIxPress MDX Calculation Builder

April 30, 2012

This weekend at SQL Saturday 130 I gave presented a session on Introduction to SSAS, and after my session one of the attendees came up to me and asked how to create what amounted to a rolling 30 60 and 90 days calculation inside of SSAS.  Since this was an 100 level session, the attendee hadn’t ever seen an MDX expression so walking her through the calculation would have been a bit over her head. This is a common problem for most developers just getting into cube building with SSAS.  MDX is a difficult language to pick up.  Thankfully Pragmatic Works has some software that can help with that. 

When I’ve written about BIxPress in the past it has mostly been about the SSIS features it contains, but SSIS isn’t all it does. BIxPress has an MDX calculation builder that plugs in to the Calculation Tab inside of BIDS in you SSAS project, and its not that difficult to use, meaning you don’t have to write any MDX at all. Yes I am serious and don’t call me Shirley (thank you Leslie Nielsen)

As I said before, navigate to the calculations tab in you SSAS Project inside of BIDS. Once there, and if you have BIxPress installed of course, you will see a little green calculator button has been added to the tool bar.

image

When you click on the icon, a list of available calculations and sets are presented in a new window. These include calculations like Month to Date and Percent of Parent, as well as sets returning Top 10 Count or Top 10 Percent.

image

In this example I’m going to do the Profit Margin calculation, so I select it from the list and hit next.  The first thing I’m asked is to provide the measure that represents the total for Sales.

image

Once that is selected I need to provide the Total Cost.

image

(Notice I haven’t written any MDX yet) The next screen allows me to set any conditional formatting, like text color or Red for Profit Margin of <%15 or something to that effect, but I don’t need that for this example.

On the last screen we give the Calculation an Name and associate it with the correct measure group(Internet Sales Profit Margin, and Internet Sale respectively) and then click finish.  Once that is done, build and deploy the cube, and just like that you have a new calculation in your SSAS cube, and you didn’t have to write a single MDX expression.

image

image

Take a look at this and many other feature of BIxPress by Downloading a trial version here.  Also if you have any suggestions for additional calculations, or other features you’d like to see in BIxPress or any of the Pragmatic Works software products, post them here.

h1

Task Factory Advanced Execute Package Task

April 30, 2012

If you’ve ever developed any kind of SSIS auditing framework, chances are you’ve wanted to pass variables from the child package to the parent package.  As you know it can be accomplished, but it takes some hoop jumping including writing some code in a script task.  As I’ve said before, scripting is not my strong suit, so if there is a way to work around it, I’ll give it a try.

Thankfully Pragmatic Works has made the work around unnecessary with the new component in the ever growing list of SSIS components.  The Advanced Execute Package Task component allows you to easily configure the passing of variables between packages in both directions.

image

The component is easy to configure.  Once you’ve selected the location of the package you want to execute, either SQL Server of File System, the next step is to assign the direction of the variable mapping.  There are three options, read from child package, write to child package, read and write variable from child package.

image

Once that is selected (in this case I’m going to read the variable from the child package) the next step is to select the variable from the child package you want to read.  The nice thing that the Advance Execute Package Task does, is that it only allows you to map variables of the same type. This means that if you choose a string variable in the child package you can only choose string variables in the parent package.

image

Once the variable mapping is set the way you want it, click the add Mapping button.  This adds the mapping to the package and you can now configure another variable.

image

With this component it is very easy to pass variables back and forth between packages.  Don’t take my word for it, download Task Factory for yourself and try it out.

h1

SSIS For Loop Containers Part 2

March 30, 2012

    In the continuation from my previous post on For Loop Containers I’ll give you a little more practical example of using the container.

    A “Watcher” package is used quite often for checking for the existence of a file in a folder or for a row to be inserted into a table. Lets set one up. Here’s what its going to do:

    1. Check for new records in a source table.

    2. If there are new records in the table, then rows will be inserted.

    3. If there are not any new records, the package will enter into a wait loop, and “sleep” for 1 min

    4. then iterate through the loop again.

    5. Exit the loop once the rows are inserted.

      1. First lets set up the variables we are going to need for this project.

        image_thumb[20]

        Next lets set drag a For Loop container on to the design pane and configure it. Unlike the first For Loop we set up we are going to leave the InitExpression and the AssignExpression blank, because as you remember from earlier they are optional. In the EvalExpression enter this:

        @dtMaxDestinationLoadDate >= @dtMaxSourceLoadDate

        That is all we have to do here. So far this is even easier to configure then the first one, but hold on it’s the use of the control flow that adds the complexity. We will be using a couple of execute SQL tasks to populate our two variable used in the EvalExpression.

        Here is the create Script for the Source and Destination Tables

        CREATE TABLE [dbo].[ForLoopSource](

        [ID] [int] IDENTITY(1,1) NOT NULL,

        [Name] [varchar](50) NULL,

        [LoadDate] [datetime] NULL

        ) ON [PRIMARY]

        CREATE TABLE [dbo].[ForLoopDestination](

        [ID] [int] IDENTITY(1,1) NOT NULL,

        [Name] [varchar](50) NULL,

        [LoadDate] [datetime] NULL

        ) ON [PRIMARY]

        I’ll leave it to you to populate the tables with data, but there are some name generator websites out there that you can get pretty big datasets from if you need them, or you can just pull the data from Adventure Works.

        Drag in an Execute SQL Task and configure your connection. Then enter this code:

        SELECT

        ISNULL(MAX([LoadDate]),‘1/1/1900’)MaxSourceLoadDate

        FROM

        [Demo].[dbo].[ForLoopSource]

        Once you’ve entered the query make sure to set the result set to single row, and then map the result set to the dtMaxSourceLoadDate variable. Then click ok.

        image_thumb[22]

        image_thumb[24]

        Do the same for the next Execute SQL Task but set up for the dtMaxSourceLoadDate.

        SELECT

        ISNULL(MAX([LoadDate]),‘1/1/1900’)MaxDestinationLoadDate

        FROM

        [Demo].[dbo].[ForLoopDestination]

        The next step is to set the conditions that determine which path the package should take. We do this by adding expressions on the precedence constraints.

        image

        When the expression for the constraint on the left @dtMaxSourceLoadDate>@dtMaxDestinationLoadDate evaluates to true, the package will proceed to the data flow task and load the new rows into the destination table.

        When the expression for the constraint on the right @dtMaxSourceLoadDate<=@dtMaxDestinationLoadDate evaluates as true, the package will proceed to the script task which puts the loop in a wait status before it returns to the top and iterates through again.

        Its in this script task that we use the third variable that we set up at the beginning intWaitTime.  Here is the VB.net for the script task

        Public Sub Main()
                Dim sec As Double = Convert.ToDouble(SnippetParameter(“User::intWaitTime”,””,””))
                Dim ms As Int32 = Convert.ToInt32(60 * (sec * 1000))
                System.Threading.Thread.Sleep(ms)
              Dts.TaskResult = ScriptResults.Success
              End Sub

        By making a variable to configure this instead of hard coding the value in the script task gives us the option to later on configure that variable in a table with package configurations. Once the wait is over the package loops again, until the destination table has a date that is greater then the max date in the source table.

        Inside the dataflow you source query should look like this.

        SELECT        Name, LoadDate
        FROM            dbo.ForLoop
        Where LoadDate > ?

        with the parameter mapped to @dtMaxDestinationLoadDate.

        I hope this helps you all get a better grasp on how For Loops can be used to enhance the functionality of you SSIS packages.

        Until Next Time…

      h1

      SSIS For Loop Containers Part 1

      March 29, 2012

      In a previous blog I talked about the Sequence Container, and some of its uses.  To continue with that theme I now bring you the For Loop Container. 

      The For Loop is one of two Loop containers available in SSIS.  In my opinion it is easier to set up and use than the For Each Loop, but it is just as useful.  The basic Function of the for loop is to loop over whatever tasks you put inside the container a predetermined number of times, or until a condition is met. The For Loop Container, as is true of all the containers in SSIS, supports transactions by setting the Transaction Option in the properties pane of the container to “Required”, or “Supported” if a parent container, or the package itself is set to “Required”

      There are three expressions that control the number of times the
      loop executes in the For Loop container.

      1. The InitExpression is the first expression to be evaluated on the For Loop and is only evaluated once at the beginning. This expression is optional in the For Loop Container.  It is evaluated before any work is done inside the loop.  Typically you use it to set the initial value for the variable that will be used in the other expressions in the For Loop Container. You can also use it to initialize a variable that might be used in the workflow of the loop.
      2. The EvalExpression is the second expression evaluated when the loop first starts. This expression is not optional. It is also evaluated before any work is performed inside the container, and then evaluated at the beginning of each loop.  This is the expression that determines if the loop continues or terminates. If the expression entered evaluates to TRUE, the loop executes again. If it evaluates to FALSE, the loop ends.  Make sure to pay particular attention to this expression.  I will admit that I have accidentally written an expression in the EvalExpression that evaluates to False right away and terminated the loop before any work was done, and it took me longer than it probably should have to figure out that the EvalExpression was the reason why it was wrong.
      3. The AssignExpression is the last expression used in the For Loop. It is used to change the value of the variable used in the EvalExpression. This expression is evaluated for each pass through the loop as well, but at the end of the workflow. This expression is optional.

      image

      Lets walk through setting up an example of the package. In this example we’ll create a loop that executes a given number of times.

      Create a new package and add two variables to it, intStartVal and intEndVal.

      image

      Next add a For Loop Container to the package and open the editor.  Assign the following values for the expressions:

      image

      That is all the configuring that is required for the For Loop Container.  Now lets add a Script Task that will display a message box with the value of the intStartVal variable as the loop updates the value of that variable. Here is the code to do that:

      Public Sub Main()
      '
      MsgBox(Dts.Variables("intStartVal").Value)

      '
      Dts.TaskResult = ScriptResults.Success
      End Sub

      Once that is done the package is ready to execute.

      image

      First Iteration

      image

      Second Iteration

      image

      Fifth Iteration

      image

      Complete

      image

      Now that the EvalExpression @intStartVal <= @intEndVal evaluated to false the package ends.  In part two of SSIS For Loop Containers I’ll go a little deeper in using a For Loop with some real world examples. Until then…

      h1

      Indexing Strategies for OLTP Databases.

      March 24, 2012

      Let me start by saying that I am not a DBA, I am a BI Developer.  I work with some really bright DBA’s and they can tell you much more about this topic then I can, and probably in great depth, with many terms like PERFMON, DMV and querying a bunch of those Sys tables.  To this point in my career I’ve haven’t had the exposure or need to think about many of the DBA tasks.  I mostly worry about building packages, cubes and repots.  This just isn’t going to cut it.  As I’ve progressed in my career as a consultant, I realized that clients are going to be expecting more and more from me, and rightfully so.  As a junior developer I could lean on the more senior guys to help carry me when things outside of my knowledge base came up, but now I need to become one of those senior guys, and that means <gulp> moving outside of my comfort zone and diving into topics that are in areas other than strictly Business Intelligence.  So for my first foray into the land of the DBA I decided to look into Indexes, and more specifically indexing strategies in OLTP systems, since my only real exposure to them is reading from them when loading data into a data warehouse, and not writing to them. 

      Now some of this might seem pretty basic to you but this is how I think about them.  Online Transactional Processing (OLTP) systems are primarily used for data entry. Since the primary function of them is to input data, and usually high volumes of data over an extended period of time, the primary performance concern for an OLTP system is how fast can it input the data. Now that is over simplifying it a bit, because its not like every new insert is coming from a new customer ordering a new item.  Most likely it’s a returning customer ordering an item that has been ordered by other customers before.  So obviously we aren’t going to put duplicate data inside our OLTP system (3NF and all that goodness) so there will have to be some reads performed on the database. This is where the indexes come in.  You want to create indexes that will speed up the reads you are doing on your OLTP system to enable you to do faster inserts updates and deletes to your system.  The problem is that indexes can actually slow down the inserts updates and deletes.  The trick then is to find the “sweet spot” where the indexes speed up the  overall process, not just the reads on your queries. As you can imagine there is no single way to do it, no one “right” answer.  I did however find some tips out there to help fledgling DBAs and other wannabes like me, keep from making a fool of ourselves when the cool kids (DBAs) come behind and look at the indexes we’ve implemented.  Here are some of the Highlights and I’ve included links to them.

      Well I hope this has helped someone other than me. One thing I have learned since I started blogging, is that if I can explain it in writing then I have a pretty good grasp of the concept, and I can always fall back on the blog that I wrote as a refresher (convenient!).

      h1

      An Alternative to the OLE DB Command to update Slowly Changing Dimensions.

      March 21, 2012

      In a previous post I detailed how to create a package that handles SCD Type 1 changes using SSIS.  The component that handles writing the updates to the destination table inside the data flow is the OLEDB Command. 

      image

      It is the only native component in the dataflow that can write update statements to the destination table. Unfortunately these update statements are done on a row by row basis.  The question is then, how do we accomplish batch updates in the SSIS package?

      The answer is pretty simple:

      • We first have to create a table stage the updates,  with the same structure as our destination table.
          CREATE TABLE [dbo].[StageDimUser](
          [UserSK] [int] NOT NULL,
          [UserAK] [varchar](50) NULL,
          [ScreenName] [varchar](50) NULL,
          [AccountCreateDate] [date] NULL,
          [TimeZone] [varchar](50) NULL,
          [Language] [varchar](50) NULL,
          [GeographySK] [int] NULL,
          [Hash_CD] [int] NULL)
      • Replace the OLE DB Command with another OLE DB Destination Component.

      image

      • Connect the new destination to our newly created stage table and insert all the updates to that table
      • Return to the control flow and add an execute SQL task that updates the destination table with the values from the stage table.  The execute SQL task will perform a Batch Update instead of updating row by row.  So even though we’ve added an extra step to the process, the resulting processing time should be much faster.
        UPDATE [Demo].[dbo].[DimUser]
        SET [UserAK] = StageDimUser.[UserAK]
        ,[ScreenName] = StageDimUser.[ScreenName]
        ,[AccountCreateDate] = StageDimUser.[AccountCreateDate]
        ,[TimeZone] = StageDimUser.[TimeZone]
        ,[Language] = StageDimUser.[Language]
        ,[GeographySK] = StageDimUser.[GeographySK]
        ,[Hash_CD] = StageDimUser.[Hash_CD]
        FROM StageDimUser
        WHERE DimUser.UserSK = StageDimUser.UserSK
      • Finally once the updates are done, truncate the stage table to prep the table for the next load.

      image

      h1

      Using Sequence Containers in SSIS

      March 21, 2012

      Not every package requires the use of a Sequence container, and not everyone likes to use them. For me personally they can be a huge help when developing and debugging SSIS packages.  Some of the most common uses for Sequence Containers are:

      • Group and control the execution order of a package.  For example loading a data warehouse, place all of the dimension load tasks inside the sequence container and allow them to run in parallel, then connect the sequence container to the fact table load to ensure that all the dimensions finish loading before the fact table loads.

      image

      • Grouping related tasks in a package to give a clearer “High Level” understanding of what a complex package is doing.

      • Easier debugging by allowing you to disable groups of tasks to focus package debugging on one subset of the package control flow.

      image

      • Grouping tasks that require similar property settings without having to set each individual task property.  Similarly it allows you to set  properties at the container level and not at the package level.

      • Providing scope for variables that a group of related tasks and containers use.

      • Create a transaction around all the components inside the container.  Meaning if one task inside the container fails all the work that was completed successfully prior to that task failing is rolled back and its like it never happened.

      Its this last point that I would like to look at in more detail.  To be quite honest I haven’t always been aware that you could create a transaction using a sequence container, and it would have made my life a bit easier. If you head over to MSDN you can find these articles covering transactions (1, 2, 3). All Microsoft Integration Services container types allow you to configure them for use in transaction with one of three settings.  MSDN describes them:

      • Required indicates that the container starts a transaction, unless one is already started by its parent container. If a transaction already exists, the container joins the transaction. For example, if a package that is not configured to support transactions includes a Sequence container that uses the Required option, the Sequence container would start its own transaction. If the package were configured to use the Required option, the Sequence container would join the package transaction.

      • Supported indicates that the container does not start a transaction, but joins any transaction started by its parent container. For example, if a package with four Execute SQL tasks starts a transaction and all four tasks use the Supported option, the database updates performed by the Execute SQL tasks are rolled back if any task fails. If the package does not start a transaction, the four Execute SQL tasks are not bound by a transaction, and no database updates except the ones performed by the failed task are rolled back.

      • NotSupported indicates that the container does not start a transaction or join an existing transaction. A transaction started by a parent container does not affect child containers that have been configured to not support transactions. For example, if a package is configured to start a transaction and a For Loop container in the package uses the NotSupported option, none of the tasks in the For Loop can roll back if they fail.

      This means that at the Package level, at the Sequence, For Loop and For Each Loop Container level as well as most if not all (sorry I haven’t checked them all, but you can if you want to) Control Flow containers you can configure them to use or ignore a transaction. Just by changing the TransactionOption property. By default the property is set to Supported, which I find rather convenient, because then I only have to set it at the Parent container level and every child will automatically pick up the transaction. I like it when I don’t have to change the property of a handful of tasks.

      So lets create a few examples.

      Lets start simple.  I’ve seen this example done by a few people, here and here, and if its good enough for them, its good enough for me. Lets create a package that creates a table and inserts a bunch of rows using multiple Execute SQL tasks.  Here are the SQL statements for the 5 tasks:

      CREATE TABLE [dbo].[TransactonTest](
      [PartNumber] [int] NULL CONSTRAINT uq_TransactionTest_PartNumber UNIQUE,
      [PartDescription] [Varchar] (200) null


      ) ON [PRIMARY]





      INSERT INTO [dbo].[TransactonTest]
      ([PartNumber]
      ,[PartDescription])
      VALUES
      (1
      ,'Spacely Sprokets')

      INSERT INTO [dbo].[TransactonTest]
      ([PartNumber]
      ,[PartDescription])
      VALUES
      (2
      ,'Cogswells Cosmic Cogs')


      INSERT INTO [dbo].[TransactonTest]
      ([PartNumber]
      ,[PartDescription])
      VALUES
      (3
      ,'Thing 1')


      INSERT INTO [dbo].[TransactonTest]
      ([PartNumber]
      ,[PartDescription])
      VALUES
      (3
      ,'Thing 2')

      image

      As you can see I’ve added a Sequence Container and put the Execute SQL tasks inside the container.  I’ve also set the TransactionOption for the Container to Required.  Now when I run my package the final insert should fail. 

      Before I could even get the package to fail for the reason I wanted to I got this error.

      Error: The SSIS Runtime has failed to start the distributed transaction due to error 0x8004D01B “The Transaction Manager is not available.”. The DTC transaction failed to start. This could occur because the MSDTC Service is not running.

      So to enable the service on my laptop I went tp ControlPanel->AdministrativeTools->Component Services->Computers->My Computer->Properties and started the Distributed Transaction Coordinator, and that solved my problem. However if you are using this in your work environment be aware that there are some hurdles to jump through regarding firewalls  

      So now back to the package.

      image

      As expected the 4th insert fails and the container fails.  If we look at my server you’ll see that the table didn’t get created.

      image

      See I told you no table named TransactionTest.  But if we go ahead and change the 4th insert to work…

      image

      image

      image

      Well there you go, it really does work.  Pretty cool little tidbit to have in your back pocket.  Now don’t go thinking that this will solve all of your problems, some people (SSIS Junkie, Matt Masson) and  have encountered issues with using transactions in some instances, so make sure to test this out thoroughly.

      This will work on package designs for loading a data warehouse star schema using Execute Package Tasks as well. 

      image

      As long as the tasks is set to supported, and the items inside the child package are set to supported as well the will inherit the transaction created by the parent container.

      I hope this opened up your eyes to the fact that you can use Sequence containers to do a number of useful things and that incorporating transactions inside your SSIS packages is possible and not to difficult to implement.