Tallan Blog

Tallan’s Experts Share Their Knowledge on Technology, Trends and Solutions to Business Challenges

Category Archive for "Data & Analytics"

Introduction to SAP Business Objects Data Integrator ETL tool

SAP Business Objects Data Integrator XI R2/R3
The purpose of this presentation is to familiarize the team with some of the features of Business Objects Data Integrator (DI). The presentation also aims to provide some insight into how the technology can affect the design and implementation of the ETL processes.
While at a high level it is best for an ETL architecture to be technology agnostic, the physical implementation can benefit from being designed to take advantage of the features the technology provides.

SSIS data management fundamentals

Loading data into a warehouse for use in business intelligence often involves loading data from multiple dissimilar sources. Often the easiest way to manage the data consistently is to extract from the source systems into a simple text format such as comma delimited. Even legacy systems have some form of reporting engine that allows for text file output, and more current database platforms offer mechanisms such as BCP for dumping output to a text file. A few more reasons for using flat files: they provide a replicable point-in-time snapshot, they allow a single output from a source system to be imported easily into multiple different warehouses, and they allow the imported data to be broken into manageable chunks simply by managing the text files.
SSIS has many features…
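
As a rough sketch of the flat-file staging approach described above (the table name, file path, and columns are purely illustrative, not from the post), a comma-delimited extract might be pulled into a staging table with plain T-SQL before SSIS takes over the downstream transformations:

-- Hypothetical staging table for a comma-delimited customer extract
CREATE TABLE dbo.stg_Customer
(
    CustomerID   VARCHAR(20)  NULL,
    CustomerName VARCHAR(100) NULL,
    ModifiedDate VARCHAR(30)  NULL   -- staged as text; typed later in the ETL
);

-- Load the text file produced by the source-system extract
BULK INSERT dbo.stg_Customer
FROM 'C:\extracts\customer_20100101.csv'
WITH
(
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2              -- skip the header row
);

Keeping each extract as its own dated file is what makes the point-in-time snapshot replicable: the same file can be reloaded, or loaded into several warehouses, without touching the source system again.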

ETL Process Control Architecture

Data volumes are growing at incredible rates across many IT departments. Businesses are dealing with new problems every day, and there is heavy demand on IT to be flexible and to react to the many issues facing the business daily. Often, processes become obsolete or defunct because they cannot keep up with the rate of change and the increasing demands of the business. In some cases, users lose confidence in the systems and seek alternative methods for delivering reports and data in a timely fashion.
A centralized, standard Process Control Environment can provide an inexpensive and efficient auditing mechanism for new and existing data migration processes, whether they run occasionally or very frequently.  This will enable an IT department to become more proactive and aware of issues well before the end users are aware…
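
As a minimal sketch of what such an environment might capture (the table and procedure names below are hypothetical, not a prescribed design), every process can log its start, finish, status, and row counts to one central audit table:

-- Central audit table shared by every data migration process
CREATE TABLE dbo.ProcessControlLog
(
    ProcessLogID  INT IDENTITY(1,1) PRIMARY KEY,
    ProcessName   VARCHAR(100)  NOT NULL,
    StartTime     DATETIME      NOT NULL DEFAULT GETDATE(),
    EndTime       DATETIME      NULL,
    RowsProcessed INT           NULL,
    Status        VARCHAR(20)   NOT NULL DEFAULT 'Running',
    ErrorMessage  VARCHAR(4000) NULL
);
GO

-- Called at the start of a process; hands back the key used to close the entry
CREATE PROCEDURE dbo.LogProcessStart
    @ProcessName  VARCHAR(100),
    @ProcessLogID INT OUTPUT
AS
BEGIN
    INSERT INTO dbo.ProcessControlLog (ProcessName) VALUES (@ProcessName);
    SET @ProcessLogID = SCOPE_IDENTITY();
END;
GO

-- Called when the process succeeds or fails
CREATE PROCEDURE dbo.LogProcessEnd
    @ProcessLogID  INT,
    @RowsProcessed INT,
    @Status        VARCHAR(20),
    @ErrorMessage  VARCHAR(4000) = NULL
AS
BEGIN
    UPDATE dbo.ProcessControlLog
    SET EndTime       = GETDATE(),
        RowsProcessed = @RowsProcessed,
        Status        = @Status,
        ErrorMessage  = @ErrorMessage
    WHERE ProcessLogID = @ProcessLogID;
END;
GO

With every load writing to the same log, a query against dbo.ProcessControlLog for failed or long-running entries is enough to alert IT before users notice a stale report.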

Developing a Dimensional Model w/ conformed dimensions and facts

Conformed Dimensions and Facts
Data marts are often developed to represent important systems within a company.  Over time additional data marts are added, and eventually there is a desire to extract data across multiple marts.  Extracting data across data marts can be cumbersome, and sometimes impossible, if the marts were not designed to share common dimensions, also known as conformed dimensions. Ideally, since the dimension criteria would be the same for each data mart, selecting data across multiple marts would be as routine as selecting data from a single data mart.
Going hand in hand with conformed dimensions, conformed facts involve standardizing facts across multiple marts.  Adopting conformed facts eliminates the ambiguity of having facts that share the same name but have different underlying calculations.
In this blog we use simple data marts to demonstrate the use of both conformed dimensions and…
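
To make the idea concrete, here is a hedged sketch (the tables and columns are illustrative only, not the marts used in the post) of two fact tables from separate marts sharing one conformed date dimension, followed by a drill-across query that combines them on the same criteria:

-- The conformed date dimension, shared by both marts
CREATE TABLE dbo.DimDate
(
    DateKey       INT PRIMARY KEY,   -- e.g. 20100131
    CalendarYear  INT NOT NULL,
    CalendarMonth INT NOT NULL
);

-- Fact table in the sales mart
CREATE TABLE dbo.FactSales
(
    DateKey     INT NOT NULL REFERENCES dbo.DimDate (DateKey),
    SalesAmount DECIMAL(18,2) NOT NULL
);

-- Fact table in the inventory mart
CREATE TABLE dbo.FactInventory
(
    DateKey        INT NOT NULL REFERENCES dbo.DimDate (DateKey),
    QuantityOnHand INT NOT NULL
);

-- Drill across: aggregate each fact to the shared grain, then join the results
WITH Sales AS
(
    SELECT d.CalendarYear, d.CalendarMonth, SUM(f.SalesAmount) AS TotalSales
    FROM dbo.FactSales f
    JOIN dbo.DimDate d ON d.DateKey = f.DateKey
    GROUP BY d.CalendarYear, d.CalendarMonth
),
Inventory AS
(
    SELECT d.CalendarYear, d.CalendarMonth, SUM(f.QuantityOnHand) AS TotalOnHand
    FROM dbo.FactInventory f
    JOIN dbo.DimDate d ON d.DateKey = f.DateKey
    GROUP BY d.CalendarYear, d.CalendarMonth
)
SELECT s.CalendarYear, s.CalendarMonth, s.TotalSales, i.TotalOnHand
FROM Sales s
JOIN Inventory i ON i.CalendarYear = s.CalendarYear
                AND i.CalendarMonth = s.CalendarMonth;

Because both marts agree on DimDate, the combined query needs no translation layer; that is the payoff of conforming the dimension.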

SCD via SQL Stored Procedure

We are going to revisit the issue of dealing with Slowly Changing Dimensions in a data warehouse. We have seen a demonstration of using the SCD transformation that is available in SQL Server Integration Services (SSIS); however, this is not always the best option for processing an SCD. When the volume of rows you’re dealing with is substantial, this creates a significant, and usually unacceptable, performance hit. Another valid reason why you may choose not to implement the SSIS transformation is flexibility. The SSIS transformation requires your begin and end dates to be datetime data types, which could prove to be inconvenient when your SCD needs to tie back to a date and/or time dimension.
One alternative we are going to exhibit is using a SQL Server stored procedure. This example demonstrates the implementation of a Type 2 SCD, preserving the…
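
As a hedged sketch of the general set-based Type 2 pattern (the staging table, dimension columns, and integer date keys below are assumptions for illustration, not the procedure from the full post), the work boils down to two statements: expire the changed rows, then insert the new versions.

-- Illustrative Type 2 SCD load: stg.Customer is the staged source,
-- dbo.DimCustomer the dimension, with date keys tying back to a date dimension
CREATE PROCEDURE dbo.LoadDimCustomer_Type2
AS
BEGIN
    DECLARE @TodayKey INT;
    SET @TodayKey = CONVERT(CHAR(8), GETDATE(), 112);   -- yyyymmdd key for today

    -- 1. Expire the current row for any customer whose tracked attributes changed
    UPDATE d
    SET d.EndDateKey = @TodayKey,
        d.IsCurrent  = 0
    FROM dbo.DimCustomer d
    JOIN stg.Customer s ON s.CustomerID = d.CustomerID
    WHERE d.IsCurrent = 1
      AND (s.CustomerName <> d.CustomerName OR s.City <> d.City);
      -- a production version would also guard NULLs in these comparisons

    -- 2. Insert a new current row for changed customers and for brand-new ones
    --    (anything in staging that no longer has a current dimension row)
    INSERT INTO dbo.DimCustomer
        (CustomerID, CustomerName, City, BeginDateKey, EndDateKey, IsCurrent)
    SELECT s.CustomerID, s.CustomerName, s.City, @TodayKey, NULL, 1
    FROM stg.Customer s
    LEFT JOIN dbo.DimCustomer d
           ON d.CustomerID = s.CustomerID AND d.IsCurrent = 1
    WHERE d.CustomerID IS NULL;
END;

Because both statements are set-based, the procedure touches each table a fixed number of times regardless of row volume, which is exactly where it outruns the row-by-row SSIS transformation.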

The BCP Option

Moving data from one repository to the next may sound like a trivial task, and in the sense of the normal mechanical approach it very well may be.  What could be simpler than INSERT INTO Here SELECT * FROM There?  Not much, it would seem, until your data source is something other than a SQL Server table or your data volume is so heavy that such a simple statement chokes, or worse yet, affects the transactional system’s performance.  What can you do when your data volumes are large enough that you need to consider faster approaches than manual SQL Statements?
Fortunately, Microsoft has provided us with the BCP utility.  BCP is a command line executable that is optimized to move data very quickly from one place to another.  BCP is flexible enough to allow transformations of data in the operation, as…
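
For reference, a typical round trip with the utility looks something like the lines below (the server, database, and file names are placeholders): -c exports character data, -T uses a trusted connection, -t sets the field terminator, and -b commits the load in batches so one giant transaction doesn't swamp the log.

rem Export the source table to a pipe-delimited text file
bcp SourceDB.dbo.Orders out C:\extracts\orders.txt -S SRCSERVER -T -c -t "|"

rem Load the file into a warehouse staging table, committing every 10,000 rows
bcp WarehouseDB.stg.Orders in C:\extracts\orders.txt -S DWSERVER -T -c -t "|" -b 10000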

The SCD Transformation in SSIS

In this post, we will demonstrate the Slowly Changing Dimension (SCD) transformation that is available for use in a Data Flow task within SQL Server Integration Services (SSIS).  We’ll walk through an example of how to use it, explaining the terms as we go. In addition, I’ll demonstrate some “extra effort” modifications that can be made for more thorough usage.  After that, we’ll discuss the pros and cons of using it at all, and we’ll wrap up with some tips on how to tune the transformation for better performance.

A Warehousing Overview

Data warehousing is a big subject. This overview is intended to cover some of the most representative issues on a high level: the nature of OLAP systems, star schemas, facts and dimensions, and differing perspectives (Inmon vs. Kimball) on warehouse design.
OLTP vs. OLAP
OLTP systems are the operational databases supporting applications. They are highly normalized, and focused on CRUD operations.
OLAP databases are usually arranged in star schemas and are built for speed in retrieving aggregated data.
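
As a small illustration of that star-schema shape (the fact and dimension names here are generic placeholders), a typical OLAP-style query joins the central fact table to its dimensions and aggregates the measures:

-- Star-schema query: one central fact, two dimensions, aggregated measures
SELECT d.CalendarYear,
       p.ProductCategory,
       SUM(f.SalesAmount)   AS TotalSales,
       SUM(f.OrderQuantity) AS TotalQuantity
FROM dbo.FactSales f
JOIN dbo.DimDate    d ON d.DateKey    = f.DateKey
JOIN dbo.DimProduct p ON p.ProductKey = f.ProductKey
GROUP BY d.CalendarYear, p.ProductCategory;

Contrast that with an OLTP query, which typically reads or updates a handful of rows for a single transaction rather than scanning and summarizing millions.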
