Column Change Auditing with SSIS
I was recently asked by a client to write an SSIS package to update some data in a database. One of the requirements was to audit all changes with before and after column values, update user and update date. The client requested that CDC (Change Data Capture) NOT be implemented. And not being very proficient in .net programming languages, I chose not to attempt to code any SSIS “Scripting Component” transformations. An alternate method was chosen that used a combination of SSIS “Conditional Split”, “Derived Column”, “Multicast” and “Union All” transformations. The trick was to “create” a separate audit row for each column that was changed in a database row. The following example shows this method for the row update data flow. The row has already been determined to have column value changes before it reaches this point.Derive Static Values: Used to provide fixed values not derived from the data. In this case the audit output had an “Action” column designating an insert or update activity. This column value is set here. Also update Userid is derived from a passed variable, and update date was populated using getdate(). Hard coded table name and column name columns are also set here for each table being audited. Get Original Column Values: Does a lookup of each changing data row to get the original values to store in the audit tables OriginalValue column.
If Column n Changed: Series of Conditional Split transformations to test for value changes in each column. If a column value will be changed, the data flow is directed to a Multicast transformation. If the data value will not change, the data flow continues to check the next column for changes.
Multicast n: Flows the data for a ColumnChanged split conditional to both the data update and audit destinations. One Multicast transformation is coded for each column tested for changes.
Union Data n: Unions the changed rows and the unchanged rows together for the next Conditional Split test.
Union Audit: Unions the rows together that were created for the audit output destination. Union All Input 1 and Union All Input 3 values for OriginalValue and BusinessKey are taken from “Get Original Column Values” transformation for each column (row), ChangedValue comes from the input source, TableName, Userid, UpdateDate ColumnName come from “Derive Static Values” transformation.
Count Rows Updated: Count of updated rows for logging.
Update Data Table: This is the destination of the actual data table where the data updates are to be made.
Audit Destination: The audit table destination containing each changed column as a row.
Below is a code snippet from the insert flow. As with the Update flow, the data is Multicast and Union’d so that the data is made available for both the insert data and audit destinations.
Split for Data and Audit Inserts: Multicast transformation that splits data into separate rows for each column for the audit destinations, and the output data for the “Insert New Data Row” destination.
Insert New Data Row: Destination for actual data insert.
Derive BusinessKey Data: Saves the new BusinessKey value from the source data for the audit row’s ChangedValue column. In an insert flow, all columns are captured for the audit with blanks provided for OriginalValue.
Derive Column n Data: Saves each data column of the inserted row for the ChangedValue column of the audit. Again, blanks are used for OriginalValue.
Union Audit: All audit columns (which have now been transformed into rows) are union’d together for audit destination insert.
Audit Destination: The audit table destination containing each column as a row with original value, changed value, etc.
The sample update and insert flows shown above are actually contained in a single data flow. The insert or update decisions are determined in a series of lookup transformations (not shown). Then, the audit output from the update and insert flows are union’d together again and stored in a “Raw File” destination. The raw file is passed to another data flow that handles the audit table inserts. A lookup of each data row using the passed business key gets the integer primary key value from the inserted / updated rows. The PK value is also required for the audit table.
Although there may be other ways to achieve the same goal of deriving a data row from a column, I’ve found that this basic method does it well using only “out of the box” SSIS components.