Tallan's Blog

Tallan’s Experts Share Their Knowledge on Technology, Trends and Solutions to Business Challenges

Posts Tagged "ETL"

Improving performance of inserting multiple parent/child tables in a single SQL Procedure

Tom Babiec wrote a great blog a few months back on inserting multiple parent child tables in a single stored procedure. We use this technique a lot in our data integration work, and it’s proven to be very robust in many contexts. The SQL procedure outlined in that blog is useful not just for BizTalk, but generally speaking for ADO.NET and other applications trying to load multiple rows into SQL Server for multiple parent child tables. However, when dealing with larger datasets (and as the table grows), we’ve noticed some degradation in performance. In some cases, we were seeing variances of 30 seconds to 10+ minutes for loading the same exact data set on the same database. We tried a few different options, including forcing a recompile of the stored procedure between each load
, but this did not solve…

Column Change Auditing with SSIS

I was recently asked by a client to write an SSIS package to update some data in a database.  One of the requirements was to audit all changes with before and after column values, update user and update date. The client requested that CDC (Change Data Capture) NOT be implemented. And not being very proficient in .net programming languages, I chose not to attempt to code any SSIS “Scripting Component” transformations. An alternate method was chosen that used a combination of SSIS “Conditional Split”, “Derived Column”, “Multicast” and “Union All” transformations. The trick was to “create” a separate audit row for each column that was changed in a database row. The following example shows this method for the row update data flow. The row has already been determined to have column value changes before it reaches this point.

Derive Static Values:                       Used to…

Partitioned Fact Tables

Once tables grow into the millions of records, they become candidates for partitioning. Table partitioning offers many benefits, particularly in warehouse environments. Since data is split into smaller units of storage, backups can target filegroups with a higher rate of change. Systems with multiple CPUs see improved query performance as partitioned data leads to greater parallelism. Perhaps most significant is the ability to swap in huge amounts of data by partition switching, an operation that is practically instantaneous.
Tables can be partitioned horizontally or vertically. With vertical partitioning, columns are split out into separate physical tables. This post focuses on horizontally partitioned tables, which take advantage of new constructs Microsoft added in the 2005 release – partition functions and partition schemes. Table partitioning is an Enterprise Edition only feature in SQL Server 2005 and 2008.

Introduction to SAP Business Objects Data Integrator ETL tool

SAP Business  Objects Data Integrator XI R2/R3
The purpose of this presentation is to familiarize the team with some of the features of Business Objects Data Integrator (DI). The presentation is also aimed at providing some insight into how technology can affect the design and implementation of  the ETL processes’.
While at a high level it is best that an ETL architecture be technology agnostic, the physical implementation can stand to benefit by being designed to take advantage of the features provided by the technology.

SSIS data management fundamentals

Loading data into a warehouse for use in business intelligence often involves loading data from multiple dissimilar sources. Many times the easiest way to consistently manage the data involves doing extracts from the source systems into a simple text format such as comma delimited. Even legacy systems have some form of reporting engines that allow for text file output, and in dealing with more current database types there are mechanisms such as BCP to allow for dumping output to a text file. A few more reason for using flat files – it allows for a snapshot in time that is replicable, it allows for a single output from a source system to easily be imported into multiple different warehouses, and allows for the imported data to easily be broken into manageable chunks by managing the text files.
SSIS has many features…

ETL Process Control Architecture

Data Volumes are growing at incredible rates across many IT departments. Businesses are dealing with new problems every day, and there is much demand on IT to be flexible and to react to many issues facing the business on a daily basis. Many times, processes become obsolete or disfunct because they cannot keep up with the rate of change and increasing demands by the business. In some cases, the users lose confidence in the systems and seek alternative methods for delivering reports and data in a timely fashion.
A centralized, standard Process Control Environment can provide an inexpensive and efficient way to provide an auditing mechanism on new and existing data migration processes which may run occasionally or very frequently.  This will enable an IT department to become more pro-active and aware of issues well before the end users are aware….

SCD via SQL Stored Procedure

We are going to revisit the issue of dealing with Slowly Changing Dimensions in a data warehouse. We have seen a demonstration of using the SCD transformation that is available in SQL Server Integration Services (SSIS); however, this is not always the best option for processing an SCD. When the volume of rows you’re dealing with is substantial, this creates a significant, and usually unacceptable, performance hit. Another valid reason why you may choose not to implement the SSIS transformation is flexibility. The SSIS transformation requires your begin and end dates to be datetime data types, which could prove to be inconvenient when your SCD needs to tie back to a date and/or time dimension.
One alternative we are going to exhibit is using a SQL Server stored procedure. This example demonstrates the implementation of a Type 2 SCD, preserving the…

The BCP Option

Moving data from one repository to the next may sound like a trivial task, and in the sense of the normal mechanical approach it very well may be.  What could be simpler than INSERT INTO Here SELECT * FROM There?  Not much, it would seem, until your data source is something other than a SQL Server table or your data volume is so heavy that such a simple statement chokes, or worse yet, affects the transactional system’s performance.  What can you do when your data volumes are large enough that you need to consider faster approaches than manual SQL Statements?
Fortuitously, Microsoft has provided us with the BCP utility.  BCP is a command line executable that is optimized to move data very quickly from one place to another.  BCP is flexible enough to allow transformations of data in the operation, as…

The SCD Transformation in SSIS

In this post, we will demonstrate the Slowly Changing Dimension (SCD) transformation that is available for use in a Data Flow task within SQL Server Integration Services (SSIS).  We’ll walk through an example of how to use it, explaining the terms as we go. In addition, I’ll demonstrate some “extra effort” kind of modifications that can be made for a more thorough usage.  After that, we’ll discuss the pros and cons of using it at all.  And we’ll wrap up with some tips on how to tune the transformation for better performance.