Conformed Dimensions and Facts
Data marts are often developed to represent important systems within a company. Over time additional data marts are added, and eventually there is a desire extract data across multiple marts. Extracting data across data marts can be cumbersome and some times impossible if the data marts were not designed to share common dimensions, also known as conformed dimensions. Ideally since the dimension criteria would be the same for each data mart, selecting data across multiple marts would be as routine as selecting data from a single data mart.
Going hand and hand with conformed dimensions, conformed facts involve standardizing facts across multiple marts. Adopting conformed facts eliminates the ambiguity of having facts that possess the same name but have different underlying calculations.
In this blog we use simple data marts to demonstrate the use of both conformed dimensions and…
We are going to revisit the issue of dealing with Slowly Changing Dimensions in a data warehouse. We have seen a demonstration of using the SCD transformation that is available in SQL Server Integration Services (SSIS); however, this is not always the best option for processing an SCD. When the volume of rows you’re dealing with is substantial, this creates a significant, and usually unacceptable, performance hit. Another valid reason why you may choose not to implement the SSIS transformation is flexibility. The SSIS transformation requires your begin and end dates to be datetime data types, which could prove to be inconvenient when your SCD needs to tie back to a date and/or time dimension.
One alternative we are going to exhibit is using a SQL Server stored procedure. This example demonstrates the implementation of a Type 2 SCD, preserving the…
Moving data from one repository to the next may sound like a trivial task, and in the sense of the normal mechanical approach it very well may be. What could be simpler than INSERT INTO Here SELECT * FROM There? Not much, it would seem, until your data source is something other than a SQL Server table or your data volume is so heavy that such a simple statement chokes, or worse yet, affects the transactional system’s performance. What can you do when your data volumes are large enough that you need to consider faster approaches than manual SQL Statements?
Fortuitously, Microsoft has provided us with the BCP utility. BCP is a command line executable that is optimized to move data very quickly from one place to another. BCP is flexible enough to allow transformations of data in the operation, as…
In this post, we will demonstrate the Slowly Changing Dimension (SCD) transformation that is available for use in a Data Flow task within SQL Server Integration Services (SSIS). We’ll walk through an example of how to use it, explaining the terms as we go. In addition, I’ll demonstrate some “extra effort” kind of modifications that can be made for a more thorough usage. After that, we’ll discuss the pros and cons of using it at all. And we’ll wrap up with some tips on how to tune the transformation for better performance.
Data warehousing is a big subject. This overview is intended to cover some of the most representative issues on a high level: the nature of OLAP systems, star schemas, facts and dimensions, and differing perspectives (Inmon vs. Kimball) on warehouse design.
OLTP vs. OLAP
OLTP systems are the operational databases supporting applications. They are highly normalized, and focused on CRUD operations.
OLAP databases are usually arranged in star schemas and are built for speed in retrieving aggregated data.