h1

Slowly Changing Dimension Type 1 with Lookup and Conditional Split

March 18, 2012

In my last post I talked about using the SCD component that comes with SSIS to load a Dimension.  This method is ok for loading small tables (<50,000). The Lookup and Conditional Split design pattern performs much better.  The main reason for the enhanced performance is the lookup component.  The lookup component executes its query once and stores the results in cache where as the SCD component queries the destination table for each row that comes from the source.  While there is no easy to configure wizard to setting up this design pattern, it isn’t too difficult to do manually.

With the SCD component we didn’t have to do any additional work in our source query to make comparing the columns quick and easy.  With the Lookup and Conditional Split we need to add a checksum to our query.  Here is the query that includes the checksum:

SELECT 
[UserAK]
,[ScreenName]
,[AccountCreateDate]
,[TimeZone]
,[Language]
,[GeographySK]
,CHECKSUM(
[ScreenName]
,[AccountCreateDate]
,[TimeZone]
,[Language]
,[GeographySK]
) as Hash_CD
FROM
DimUserUpdate

image

Now that we have our source query configured correctly and added the Lookup component and connected the two, lets configure the Lookup.

image

On the General Tab you can configure the Cache mode, the connection type and the way to handle no matches.  This third setting is very important when configuring the lookup for Slowly Changing Dimensions.  You want to set the option to Redirect Rows to No Match output.  This allows you to insert rows that don’t already exist in you destination table.

image

On the Connection Tab you obviously set the connection to your database and can either choose to select a table or use a SQL query.  I suggest you always use a SQL query and bring back only the rows you need for the lookup.  Doing this will save space in memory, and if you have a very wide table with millions of rows this could cause your lookup to run slowly or even fail.  In this query I am only bringing back the UserAK (business key) UserSK(surrogate key) and the Checksum value.  With regards to the checksum it is entirely up to you (or the data architect if that isn’t you as well) on whether or not you store the checksum value.  In this example I am not storing the value.  Here is the query:

 

SELECT
UserSK
,UserAK
,CHECKSUM(
[ScreenName]
,[AccountCreateDate]
,[TimeZone]
,[Language]
,[GeographySK]
) as Hash_CD
FROM dbo.DimUser

image

The Columns Tab is pretty easy to configure.  The two boxes on the upper half of the editor window represent the source query (left) and the lookup query (right).  To configure just drag the business key (UserAK) from the source query on to the busness key (UserAK) from the lookup query.  Then put a check mark next to the columns from the Lookup that you want to bring back into the dataflow (UserSK and Hash_CD).  Since the column Hash_CD exists in both the lookup and the source queries make sure to set the Output Alias (LKP_Hash_CD) so it is easy to differentiate between the two.  This is all you need to do to configure the lookup.

From the lookup we should have to outputs:

  1. Lookup Match Output
  2. Lookup No Match Output

The Lookup No Match Output will contain all of our new records.  We can map this output directly to our OLEDB Destination with no further work. 

image

The Lookup Match Output contains all the records that had matches in the destination table, so the next step will be to determine if the records coming are different from the records that already exists.  To do this we will use the Conditional Split transform.

image

To configure this transform we will use the two Hash_CD values two create two different outputs from the Conditional Split.  Fist we will configure the NoChange Outupt.  First name the output and then add the SSIS expression to compare the two values.  In this instance we want to send all the matching Hash_CD values to this output.  Here is the expression :

LKP_Hash_CD == Hash_CD

The next output will be the Change Output, and will contain all the records where the Hash_CD values didn’t match. Here is that expression:

LKP_Hash_CD != Hash_CD
That is it for the Conditional Split configuration. Now we need to set up the package to perform the updates.  There are a couple of methods to do this, but we are going to use the OLEDB Command to do the updates.
image
 
As you can see we use the Change Output to connect to our OLEDB Command component. 
image
Inside the OLEDB Command editor the first tab you are shown is the connection manager tab. Simply connect to the destination database and then select the component properties tab.
image

At the bottom of this tab, under Custom Properties is the SQLCommand property.  Here you will write your update statement.  To do this you will have to map all of the columns to be updated in the destination to the columns coming through in the data flow.  The query will look like you are mapping them to parameter values (?).  Notice that even the where clause is set using the parameter indicator.

UPDATE DimUser
SET
[ScreenName] = ?
,[AccountCreateDate] = ?
,[TimeZone] = ?
,[Language] = ?
,[GeographySK] = ?

WHERE [UserSK] =?

Once the query is complete go to the Column Mapping tab.

image

No it is just a matter of correctly mapping the parameter values to the columns in your data flow (input columns).  Make sure to pay attention to your column order in the update query to map the parameter values to the appropriate input column. Remember that the last parameter value is for the where clause and it is why we brought the UserSK value from the Lookup query to begin with. 

Once all the mapping is done click OK and you are now ready to handle Type 1 changes in your Slowly Changing Dimension.

Advertisements

2 comments

  1. […] A Blog About SQL Server and Business Intelligence « Slowly Changing Dimension Type 1 with Lookup and Conditional Split Slowly Changing Dimension Type 1 with Merge Join. March 19, […]


  2. […] a previous post I detailed how to create a package that handles SCD Type 1 changes using SSIS.  The component […]



Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: