Tallan's Technology Blog

Tallan's Top Technologists Share Their Thoughts on Today's Technology Challenges

Analysis Services Tabular: Factless Facts and Revealing Object Relationships

Matthew Gajdosik

Defining relationships is central to data analytics, but how can you use SQL Server Analysis Services (SSAS) Tabular to keep them from being overlooked? Often, you can run into the situation where you really just need to say what is missing from a data set or report. Usually, you can work around this conflict of expectations versus reality by building a factless fact. This type of fact table is really just a list of relationships, where the “fact” is really the existence of the row.

Typically, this type of table is built directly into a data warehouse, and helps to materialize calculations that would otherwise be too expensive or awkward to calculate. Think about the steps that you might need to take in order to highlight gaps in data (such as a student missing class), if you were not provided a tidy checklist of expectations. And how might you implement them in SQL Server Analysis Services?

With all of the great recent updates to SSAS Tabular, and the resulting uptick in popularity for the platform, it’s getting easier and easier to pick up Tabular for yourself or your business. In this post, I’ll share my experiences through building a sample model, and walking through some of the approaches you might want to try while figuring out the best fit for your own model.

Simple Use Case

Imagine you are the new reporting and analytics expert at Sample Merchandise, Inc., and you have been tasked with helping the sales managers. After sitting down to work out their requirements, everyone agrees that they want to improve how they review their sales and salespersons throughout the week, in order to help gauge scheduling and performance.

The managers send you a copy of the report that they have been using up until now, which shows the total sales for each salesperson, by day, for the period chosen. They are quite happy with it, but wonder why it sometimes skips dates, like missing the Wednesday in the copy they have forwarded to you.

Sales Mon, 01/02 Tues, 01/03 Thurs, 01/05 Fri, 01/06
Doe, Jane $60.00 $321.00 $189.00
Doe, John $178.00 $555.00 $220.00
Smith, John $107.00 $74.00 $75.00

Table 1. The sum of all sales, grouped by salesperson and date.

After looking at the source data, you realize that not only are days missing, but salespeople as well, and adding these to the report by hand leads to a very different set of results.

Sales Mon, 01/02 Tues, 01/03 Wed, 01/04 Thurs, 01/05 Fri, 01/06
Doe, Jane $60.00 $321.00 $0 $0 $189.00
Doe, John $178.00 $0 $0 $555.00 $220.00
Smith, John $0 $107.00 $0 $74.00 $75.00
Smith, George $0 $0 $0 $0 $0

Table 2. The sum of all sales, as calculated for each salesperson and each day in the week.

Blank Handling

An important factor in all of this is the handling of “blank” results. In SSAS Tabular, a BLANK is similar to the concept of a NULL in that it is a special type of result returned whenever there is no result, such as taking the SUM() of a series of blank cells, or a context with no rows. In most common tools for reporting from an SSAS cube, such as Microsoft Excel, a BLANK is filtered out by default, and is used to select which results are “real” for the selected measures, and which just represent the cross-join of all of the different values.

You can see this in the MDX-based queries that most such SSAS-compatible reporting platforms use (due to its cross-compatibility and the historical prevalence of SSAS Multidimensional). For instance, the following MDX is representative of the type of query that would be used to build the results in the first table (Table 1), and uses the NONEMPTY() syntax to select only non-BLANK results.

SELECT
    NON EMPTY { [Measures].[Total Sales] } ON COLUMNS,
    NON EMPTY { (
        [Salesperson].[Salesperson_Name].[Salesperson_Name].ALLMEMBERS
        * [Calendar].[Date].[Date].ALLMEMBERS
    ) } ON ROWS
FROM [Model];

However, this also means that you can control the display results by forcing the measure results in or out of a BLANK state. By forcing a BLANK, you can hide results that your measure should not be returning (typically done implicitly through a FILTER() or relationship), and likewise you can force a measure to display by guaranteeing a result, even a zero/default result. Therefore, you may want to ask the following questions:

  1. Are there any situations where the users or businesses always want to see, or assume they will see, a result, even if there is no factual data?
  2. If so, are these relationships already captured by the data model? For instance, if a division or sales group should always display all of its members. (In this case, take note of where exactly you can separate the fact table(s) and these relationships for later.)
  3. If the relationships are not already accessible, how can they be added, and what is required to define them? For instance, do they already track which salespeople are active for a particular day or time period, or is there a way to derive this from other information? If not, are there a set of assumptions you can use to generate it?

(Note: Avoid the situation where you simply avoid using NON EMPTY() or otherwise don’t filter BLANKs. This implies that all combinations are valid, and may cause a lot of unnecessary results to be displayed, which can be misleading or frustrating depending on scale.)

Building Your Model

To recreate the example above, all you need is the following:

  • A fact table organized by date and some assignment (in this case a salesperson), and with an associated amount. Importantly, this is a transactional fact and does not include any significant pre-aggregation.
  • A dimensional or equivalent table listing the set of available assignments (salespeople).
  • A dimensional/calendar or equivalent table listing the set of expected dates.

The screen capture below is the Tabular model used for this example.

Simple SSAS Tabular example cube layout Figure 1. The base Tabular model, with a central Sales fact table, and three dimensional tables.

The [Total Sales] measure was written as below, and results in the PivotTable first seen (Table 1).

Total Sales:=SUM([Sale_Amount])

Given the simple data model, there are a number of ways to build in the concept of a factless fact. When reading the following example implementations, try to consider any challenges your particular data model may face.

Context Checking

In the basic model, the results only display (are non-blank) when a Sale entry exists. Since Tabular is driven by contexts, sometimes all you really need to do is alter or search the one you’re in to work around a problem. For example, if you just wanted to make sure that you weren’t losing any rows or columns to a lack of entries, you could try the following code:

Total Sales:=SUM(Sale[Sale_Amount]) + 0

This works because the 0 value will override the BLANK resulting from the SUM() if there are no Sale rows in context, but the problem is that this applies to EVERY combination; there is no selectivity to the data. One way to make this change more selective, and to work only when the user is slicing by dates could be written as follows:

Total Sales:=SUM(Sale[Sale_Amount]) + IF(NOT(ISEMPTY('Calendar')) && ISFILTERED('Calendar'[Date]),0,BLANK())

This DAX constructs an additional value to potentially force a BLANK result from the SUM into a zero, if the Calendar table has rows in the context and was filtered. In fact, if you were asked just to make sure that every date was populated, this could be all that you need to do. The result is that every cell will display at least the default value, but only when you are viewing dates, as in the table below.

Sales Mon, 01/02 Tues, 01/03 Wed, 01/04 Thurs, 01/05 Fri, 01/06
Doe, Jane $60.00 $321.00 $0 $0 $189.00
Doe, John $178.00 $0 $0 $555.00 $220.00
Smith, John $0 $107.00 $0 $74.00 $75.00
Smith, George $0 $0 $0 $0 $0
Sales Product A Product B Product C Product D Product E
Doe, Jane $60.00 $510.00
Doe, John $460.00 $95.00 $398.00
Smith, John $74.00 $182.00

Table 3. The sum of all sales, as calculated for each salesperson and each day in the week, and the sum of all sales, as calculated for each salesperson and each product.

However, this also runs into an issue if the date is filtered in any other way. If the user were to filter the entire report down to a specific week, for instance, then every combination of salesperson and product would display with a zero or summed amount, and further corrections would require further complicated DAX code that’ll likely need to be duplicated elsewhere. A pure DAX-based approach can give you some quick configurability, but will quickly become unreliable, and you may want to look into materializing these relationships.

Factless Union

Since the problem with the original model is that no rows existed for the combinations the business may expect, you can simply add the set of combinations you need. Generating this will depend heavily on your exact data, and may be easier to control from your source database. In particular, since fact tables will usually be strong candidates for partitioning, it may be easiest to add these extra rows as another partition or set of partitions, so that you can use different sources from the “real” rows.

This approach can be very efficient if you have a large number of existing measures, since a SUM() requires no DAX code changes whatsoever, and other aggregates can be corrected using only a simple filter. Additionally, since you define the data, you can base your rows on the “true” set of relationships, if they exist, or on whatever projections and assumptions the business rules allow. However, what are the potential concerns with this approach?

First, the rows you generate must account for all relationships that could relate to your fact table(s) in order to not be filtered out when any of their attributes are used. In our example, the fact table is related to three objects: Dates, Salespersons, and Products. While we have already implied that a relationship exists between Salesperson and Date, does the same relationship exist for these items and Products? If not, should all dates and salespersons be visible if the user has filtered to a single product?

Consider the following query, which comments on some variations for this simple scenario.

SELECT DISTINCT -- Replicate the fact table schema,
 Salesperson.Salesperson_ID AS [Salesperson_ID],
 Calendar.Date AS [Sale_Date],
 Product.Product_ID AS [Product_ID],
 0 AS [Sale_Amount] -- but zero the corresponding amount.
FROM Salesperson -- The cross join is one of the simplest methods to generate all of the combinations you might find,
CROSS JOIN Calendar
--FROM Salesperson_Schedule -- but should be replaced if the relationship is already described.
LEFT OUTER JOIN Product
-- ON 1=2 -- If the results should be hidden when filtering by product, don't include any,
 ON 1=1 -- and otherwise you can default to zero for all products.
LEFT OUTER JOIN Sale -- If you need to remove any existing records, you can join to the fact
 ON Sale.Product_ID = Product.Product_ID
 AND Sale.Salesperson_ID = Salesperson.Salesperson_ID
 AND Sale.Sale_Date = Calendar.Date
WHERE Sale.Sale_Date IS NULL -- and remove any conditions that are already recorded.

Second, consider the impact on any indirect measures you may need to implement, such as counts. If you need to display counts, then you will need to make sure that the ‘factless’ entries do not have any impact, such as by adding a filterable flag, or including the amount column twice; one where the factless entries default to zero, and one where they default to BLANK (since most aggregate functions, including COUNT() will ignore BLANK values).

You could even take an aggregation approach, and precalculate a new version of the Sale table for each combination of all relationship keys and slice-able columns. Of course, you should keep in mind that using aggregations can restrict incremental loads, since the full table may need to be reprocessed every time.

Parallel Fact

Is there a way to avoid some of these issues with COUNT()s and other aggregates, and needing to account for every relationship attached to the fact? Simple: Rather than appending the factless rows to the fact table, use them to generate a new table specific to the relationship you need to represent. In this example, the underlying relationship table, Salesperson_Schedule, has been imported to the model and related to the Salesperson and Calendar tables, as in the screen capture below.

SSAS Tabular example cube with parallel factFigure 2. The base Tabular model, with a central Sales fact table, and three dimensional tables.

Moving from our base measure of Total Sales:= SUM([Sale_Amount]), we can take advantage of BLANK behaviors to enforce a 0-result (default) when an entry in Salesperson_Schedule exists.

Total Sales:=SUM([Sale_Amount])+IF(ISEMPTY(Salesperson_Schedule),BLANK(),0)

This takes advantage of BLANK handling in DAX, as the SUM() will return a BLANK when there are no results, as BLANK + BLANK = BLANK, and BLANK + n = n. Therefore, if either the SUM or parallel factless fact have results, the user will see a result. This adds a further dimension to the results; unexpected data is still visible (such as sales while off-schedule), but cells are hidden appropriately when data is both unexpected and unrecorded. This can lead to examples like the following:

Sales Mon, 01/02 Tues, 01/03 Wed, 01/04 Thurs, 01/05 Fri, 01/06
Doe, Jane $60.00 $321.00 $0 $0 $189.00
Doe, John $178.00 $0 $0 $555.00 $220.00
Smith, John $0 $107.00 $0 $74.00 $75.00
Sales Mon, 01/09 Tues, 01/10 Wed, 01/11 Thurs, 01/12 Fri, 01/13
Doe, Jane $46.00 $212.00 $0 $24.00 $108.00
Doe, John $188.00 $153.00 $26.00 $555.00 $49.00
Smith, John $55.00 $45.00
Smith, George $68.00 $89.00 $0 $241.00 $23.00

Table 4. The sum of all sales, as calculated for each salesperson and each day in the week. When a salesperson is not scheduled, they are left blank, as in George Smith in week one. Sales during an unscheduled period are still visible, however, as in John Smith in week two. Any scheduled day missing sales displays as a zero value.

Since this doesn’t need to account for the Product relationship anymore, you can end up with fewer rows, and can either accept the default behavior for Product when unrelated (any Product filters will have no impact on the factless fact), or add in code or relationships to change the behavior to suit your needs.

However, you may need to be careful when handling the additional relationships, as any bidirectional filters shared between the two tables may lead to a multiple filter paths error being generated. Try to avoid needing bidirectional filters close to these tables, but you can always activate or change the direction of relationships only when needed for your particular measures to resolve these path conflicts.

Summary

Having seen some of the examples, which seems most appropriate and achievable for your situation? Do you have your own way to handle factless display in your Tabular models, or want to compare how you would implement this in Multidimensional? Feel free to comment below.

Context Checking

Pro: Can be simple to apply, makes minimal assumptions about the data

Con: Can’t include detailed relationships, DAX code changes required

Factless Union

Pro: “Clean” implementation, no/minimal DAX code changes, high control over data

Con: May interfere with complex calculations, must account for all relationships and columns when generating data

Parallel Fact

Pro: “Clean” implementation, high control over data AND relationships

Con: DAX code changes required, must manage additional relationships to avoid multiple filter paths

_________________________________________________________________________________________

To learn more on how Tallan can help transform your data into knowledgeable insights with SQL Server 2016, CLICK HERE.

No comments

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>