SCD via SQL Stored Procedure
We are going to revisit the issue of dealing with Slowly Changing Dimensions in a data warehouse. We have seen a demonstration of using the SCD transformation that is available in SQL Server Integration Services (SSIS); however, this is not always the best option for processing an SCD. When the volume of rows you’re dealing with is substantial, this creates a significant, and usually unacceptable, performance hit. Another valid reason why you may choose not to implement the SSIS transformation is flexibility. The SSIS transformation requires your begin and end dates to be datetime data types, which could prove to be inconvenient when your SCD needs to tie back to a date and/or time dimension.
One alternative we are going to exhibit is using a SQL Server stored procedure. This example demonstrates the implementation of a Type 2 SCD, preserving the change history in the dimension table by creating a new row when there are changes.
Let’s take a look at our Vendor dimension table:
VENDOR_KEY serves as the surrogate or warehouse key in the table. The effective date range columns retain the history of each vendor (VENDOR_IDNT) in the dimension, allowing us to see the column values at any point in time. Fact table rows can be joined to the dimension row on the VENDOR_KEY where the date of the fact is within the effective date range of the dimension. The BEGIN_EFF_DT and END_EFF_DT have an integer data type because they will correspond to key values that tie back to the date dimension, whose primary key has a yyyyddd format (ex: 2008001).
The Vendor staging table:
The staging table reflects its source file with the addition of the FILE_DT column. The FILE_DT column will be used as the effective date in the load process should the file contain changes or new vendors.
The stored procedure takes the data from the staging table and loads it into the dimension table. This process can be broken down into three steps.
1. Insert new records of vendors that do not exist in the dimension
2. Insert new records of vendors that do exist in the dimension and contain field values that are different from the previous record.
3. Update the effective dates in changed records to expire the previously active record.
The first step is to insert any completely new vendor records.
The first record for each vendor is pulled from the dimension table and the VENDOR_IDNT is compared to the VENDOR_IDNT values in the staging table. A new record is inserted only when there is no match in the dimension table. The BEGIN_EFF_DT and END_EFF_DT are set to -2147483648 (smallest integer value) and 2147483647 (largest integer value), respectively, for the new record. The idea being that if there is only one record for a vendor then it is always effective.
The second step of the procedure is to insert new rows for any items that have changed.
The latest record for each vendor is pulled from the dimension and the values are compared to those in the staging table. The columns compared by the EXCEPT clause are what tracks changes. A new row is inserted for any differences in the LONG_NAME or SHORT_NAME columns for a matching VENDOR_IDNT. The BEGIN_EFF_DT of the inserted row is set to the FILE_DT from the staging table and the END_EFF_DT is set to 2147483647 because it will be the latest record for that vendor.
The third step for the procedure is to expire all the entries that have been updated.
The END_EFF_DT is updated on the previously valid record and it is set to one day before the BEGIN_EFF_DT on the new record, so there is no overlap.
Here we have an example of a record that changed and what the table will look like when the procedure is finished: