Tallan's Technology Blog

Part 1 – Analysis Services Multidimensional: “A duplicate attribute key has been found when processing”…

Mark Frawley
Introduction – Part 1

The most common and dreaded error that may occur when processing a dimension in Analysis Services Multidimensional (MD) is undoubtedly “Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: …”. It is ‘common’ because of both the poor data integrity frequently found in cube data sources and improper modeling of the data in MD. It is ‘dreaded’ because a number of distinct situations all give rise to this error, and it can be difficult to diagnose which one is the actual problem. There is enough to explore around this error that this article is only Part 1; a Part 2 will follow.

The full error message looks like “Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: ‘TABLENAME’, Column: ‘COLUMN’, Value: ‘VALUE’. The attribute is ‘ATTRIBUTE’.”  TABLENAME will be a concatenation of the Schema and Name properties of the dimension table (separated by “_”) as defined in the Data Source View (DSV) – note it will not be the Friendly Name property of the table which appears in the graphical displays of the DSV and dimension editor. This is the first point of confusion, if the Friendly Name property has been used to give better names to source tables. ATTRIBUTE will be the name of the dimension attribute raising the error, COLUMN will be the name of the relational column of the dimension table on which the attribute is based, and VALUE will be the value of the attribute that is raising the duplicate key error.

Before diving in, it’s worth mentioning that Analysis Services Tabular will never raise this error: it will happily ingest and use the same data. Whether this leads to incorrect results or not depends on a number of factors outside the scope of this article. In general, MD is much stricter about data integrity than Tabular, which may be good or bad, depending on your goals and needs.

Case Study Prerequisites

It is assumed the reader has access to and permissions on SQL Server, Analysis Services MD and a compatible version of Visual Studio (VS), with either BIDS or SQL Server Data Tools (SSDT) as appropriate to your version of VS and MD – and sufficient familiarity with all of these to carry out the indicated demos with a minimum of guidance. The demos should work on any version of SQL Server and any version of Analysis Services MD 2005 or later.

To focus as exclusively and simply on the issues as possible, test dimension tables are created and updated in TEMPDB on SQL Server, and modeled as dimensions in a cube database consisting of only the test dimensions (no measure groups) – rather than the traditional Adventure Works database and cube.

Case 1 – Blanks and Nulls in Attribute Key Data

This is the simplest and most common scenario. Simple as it is, much can be learned from it. The duplicate key error will occur when all of the following are true for an attribute:

  1. its KeyColumn property is a column in the source which is nullable
  2. this KeyColumn column becomes a WChar data type in MD
  3. the NullProcessing property is set to “Automatic” (the default)
  4. the NameColumn property is either not defined (shows in BIDS/SSDT as “(none)”), or is explicitly defined as the same column as the KeyColumn property. In both cases the Name values come from the Key column.
  5. a server-generated Unknown member is not defined for the dimension
  6. the Key/Name column contains both nulls and blank/empty strings. If there are no null values, or if there are no blank/empty values, the error will not occur.

Though this may seem like enough different conditions that the issue should arise rarely, in the real world all six conditions are met quite frequently.

Let’s examine the possibilities. Start by executing the following T-SQL to create and populate the test dimension table. This is about the simplest dimension possible, with a surrogate key, a code field and a code description field. Note that the Descr column allows nulls, and initially holds multiple empty strings but no nulls.

Tempdb is used because working in it requires only a very low level of SQL Server permissions.

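-- Test dimension table: surrogate key, code, and a nullable description
create table tempdb..MD_Dupe_Case1
(
SK     int                  not null,
Code   char(1)              not null,
Descr  varchar(15)          null      -- nullable on purpose
)
go
 
-- SK 3 and 4 start out as empty strings; there are no nulls yet
insert tempdb..MD_Dupe_Case1
values
(1, 'A', 'Value1'),
(2, 'B', 'Value2'),
(3, 'C', ''),
(4, 'D', '')
go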
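
Before modeling the dimension, it can help to confirm the two source-side conditions from the list above: a nullable key column, and the mix of nulls and blanks it contains. The following is a minimal sketch that is not part of the original walkthrough. It assumes the table landed in the dbo schema of tempdb, and the column aliases (NullRows, BlankOrEmptyRows) are purely illustrative. It can be rerun after each of the updates later in this case.

```sql
use tempdb;
go

-- Condition 1: is the key column nullable? (look at IS_NULLABLE for Descr)
select COLUMN_NAME, DATA_TYPE, IS_NULLABLE
from INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = 'MD_Dupe_Case1';

-- Condition 6: does the column hold nulls, blank/empty strings, or both?
select
    sum(case when Descr is null then 1 else 0 end) as NullRows,
    sum(case when Descr is not null and ltrim(rtrim(Descr)) = '' then 1 else 0 end) as BlankOrEmptyRows
from dbo.MD_Dupe_Case1;
```

If both counts are greater than zero, the column contains the combination described in condition 6; if either is zero, that condition is not met.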

Next, create a new Multidimensional project in Visual Studio and create a dimension of three attributes representing the above dimension table, defining SK as the Key attribute of the dimension, and in each case defining only the KeyColumn property of each attribute. You will of course first need to create a Data Source object pointing to your SQL Server instance, and then import the table into the DSV.

The following is an example of what this should look like, highlighting the Descr attribute:

[Image as-dupe-pic-1: the dimension with the Descr attribute highlighted]

Build and deploy the project, and do a Full process on the dimension. It should be successful. If you browse the Descr attribute you should see a single blank member.

We need to get a better handle on what is really going on: when browsing the attribute, what is “really” behind something displayed as blank? Let’s run an MDX query in SSMS to see the values for Key and Name of the Descr attribute, along with additional derived measures that test the values, so we are not totally dependent on how our tool displays things. The following MDX query shows whether a given property is “EMPTY” (MD’s equivalent of NULL) and distinguishes empty and non-empty strings. As a side benefit it shows how the cube sees the “All” member; we will remove this in future queries. Note: since there is no cube and the query window expects one, you must proceed in a particular way to be able to run this query: a) in SSMS, navigate to the cube database you created and highlight it; b) then, and only then, open a new query window. You will see that the object pane on the left says “Error loading metadata”, but you can still run the query.

WITH MEMBER [Measures].[key] AS '[MD Dupe Case1].[Descr].currentmember.properties("key0")'
MEMBER [Measures].[name] AS '[MD Dupe Case1].[Descr].currentmember.properties("name")'
MEMBER [Measures].[TestKeyForNull] AS CASE WHEN ISEMPTY([Measures].[key]) THEN "Empty" ELSE "Not Empty" END
MEMBER [Measures].[TestKeyForEmptyString] AS CASE WHEN [Measures].[key] = "" THEN "Empty String" ELSE "Not Empty String" END
MEMBER [Measures].[TestNameForNull] AS CASE WHEN ISEMPTY([Measures].[name]) THEN "Empty" ELSE "Not Empty" END
MEMBER [Measures].[TestNameForEmptyString] AS CASE WHEN [Measures].[name] = "" THEN "Empty String" ELSE "Not Empty String" END
SELECT {
[Measures].[key],
[Measures].[name],
[Measures].[TestKeyForNull],
[Measures].[TestKeyForEmptyString],
[Measures].[TestNameForNull],
[Measures].[TestNameForEmptyString]} ON 0,
[MD Dupe Case1].[Descr].members ON 1
FROM [$MD Dupe Case1]

Here is the result:

[Image as-dupe-pic-2: result of the MDX query]

It is unclear what the “(null)” values shown for “key” and “name” (derived from the attribute’s “key0” and “name” properties) signify: there were no nulls in the source data, TestKeyForNull reports that “key” is not ISEMPTY, and TestKeyForEmptyString reports it as an Empty String. We’ll just accept this.

Next, run this SQL update:

update tempdb..MD_Dupe_Case1
set Descr = '     '
where SK = 4

Build, deploy and do a Full process on the dimension (hereafter this will be abbreviated BDF). It should again be successful, and again if you browse the Descr attribute you should see a single blank member, thus illustrating that empty strings and non-empty but blank strings are effectively trimmed to the same distinct value, and may coexist with non-empty, non-blank, non-null values. If you also run the prior MDX Query the results will be the same.
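
As a side observation, SQL Server itself ignores trailing spaces when comparing varchar values, so the empty string and the five-space string are already treated as the same value by GROUP BY and DISTINCT on the relational side. The sketch below, which is an addition and not part of the original demo, groups the table by Descr and uses datalength to show that one group can contain values of different stored lengths. The column aliases are illustrative.

```sql
-- One group per distinct Descr as SQL Server compares it.
-- MinBytes and MaxBytes differ when a group mixes '' with blank-only strings.
select
    Descr,
    count(*)               as RowsInGroup,
    min(datalength(Descr)) as MinBytes,
    max(datalength(Descr)) as MaxBytes
from tempdb..MD_Dupe_Case1
group by Descr;
```

This does not rule out additional trimming inside MD; it only suggests that at least part of the collapsing can happen before MD sees the data.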

Now, run this update, followed by BDF:

update tempdb..MD_Dupe_Case1
set Descr = null
where SK in (3,4)

Once again processing will be successful, and browsing Descr will look the same as in the earlier cases, thereby illustrating that multiple nulls may coexist with non-empty, non-blank values. Running the MDX Query will also give the same results.

Now let’s see what happens when there exist both nulls and empty/blank strings in the Key column. Run this update, followed by BDF:

update tempdb..MD_Dupe_Case1
set Descr = ''
where SK = 4

The Full process fails with the duplicate key error on the Descr attribute:

[Image as-dupe-pic-3: the duplicate key error on the Descr attribute]

Since the dimension did not successfully process, it remains in its prior state, and we cannot explore what is going on via MDX queries. We have to deduce what we can from the SQL query issued during the processing of the Descr attribute, which can be found in the Process Progress dialogue window and is shown below (note you will need to explicitly switch to Tempdb to run this):

SELECT DISTINCT [dbo_MD_Dupe_Case1].[Descr] AS [dbo_MD_Dupe_Case1Descr0_0]
FROM [dbo].[MD_Dupe_Case1] AS [dbo_MD_Dupe_Case1]

The result:

[Image as-dupe-pic-4: result of the SELECT DISTINCT query]

So why does the error happen? Recall that the NullProcessing property for this attribute is set to “Automatic”. This setting (as well as ZeroOrBlank, which is equivalent) means MD will convert nulls to empty strings for the WChar data type, as we have here. When this happens in the above case, there will be two rows equal to the empty string: the original empty string and the converted null. Since the column is in the role of attribute key, that is a duplicate, which MD is not expecting because a DISTINCT was done. This can be seen in the error message, where “Value:” is given as ‘’.
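
The sequence just described (DISTINCT first, null conversion second) can be imitated in T-SQL to predict which values would collide. This is only an emulation under the assumption that Automatic null processing behaves like a COALESCE to the empty string applied after the DISTINCT; it is not the query MD runs, and the names distinct_keys, ConvertedKey and Occurrences are illustrative.

```sql
use tempdb;
go

-- Step 1: the DISTINCT that MD issues against the source column.
-- Step 2: imitate the null-to-empty-string conversion, then count collisions.
with distinct_keys as
(
    select distinct Descr
    from dbo.MD_Dupe_Case1
)
select
    coalesce(Descr, '') as ConvertedKey,
    count(*)            as Occurrences
from distinct_keys
group by coalesce(Descr, '')
having count(*) > 1;   -- any row returned here is a would-be duplicate key
```

With the table in the state that produced the error (one null and one empty string), the query should return a single row for the empty string; with only nulls or only blanks present it should return nothing.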

It could be argued that this is a design flaw in MD – it doesn’t seem quite fair to change the data after a DISTINCT and then demand that it still conform to DISTINCT semantics – but that is the behavior.

We won’t demonstrate it, but the analogue to this if the key column is numeric and NullProcessing set to either Automatic or ZeroOrBlank is the value zero. Nulls would be converted to zero and if there were also present “real” zero values, a similar duplicate key error would result.  Interestingly, if you look back at the prior MDX query results you’ll see that “key” for the “All” member is shown as having a value of zero.  That might make one expect that even one “real” zero value for the key would immediately lead to the error, since the “All” member always exists (well, almost always, but we won’t get into that).  However, this is not the case – “real” zeroes don’t conflict with whatever MD uses internally as the “All” key, but mixed nulls and zeroes will get the error.

You now have insight into the likely cause when the duplicate key error occurs and the reported Value is ‘’ or 0. If you are using views in your DSV (you are, aren’t you?), you can fix the problem by applying COALESCE to the relevant column so that nulls become ‘’ or 0 before MD reads them. Or you may wish to look further back into your ETL process or even source data, to see why you are receiving inconsistent data and perhaps fix it upstream.
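
A minimal sketch of the view-based fix follows. The view name vMD_Dupe_Case1 is hypothetical, and the sketch assumes the test table lives in the dbo schema of tempdb; in a real project the DSV would then point at the view instead of the table.

```sql
use tempdb;
go

-- Hypothetical DSV view: nulls in Descr are converted to '' on the SQL side,
-- so the DISTINCT issued by MD no longer sees both a null and an empty string.
create view dbo.vMD_Dupe_Case1
as
select
    SK,
    Code,
    coalesce(Descr, '') as Descr
from dbo.MD_Dupe_Case1;
go

-- Check what the attribute's processing query would now see.
select distinct Descr
from dbo.vMD_Dupe_Case1;
```

For a numeric key column the same pattern would apply with 0 in place of the empty string, provided that folding nulls into zero is acceptable for the business meaning of the data.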

Unfortunately, when this error happens, MD does not tell us the value of the dimension key (SK here) on the row(s) with the problem. You need to develop your own SQL query on the DSV source to find the rows with nulls.
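
As one possible starting point for such a query against the test table, the sketch below lists the dimension key of every row whose Descr is null, empty or blank, which are the rows that collapse into the same key value. The Problem column is an illustrative label, not something MD reports.

```sql
-- List the SK of every row that contributes to the '' key collision.
select
    SK,
    Code,
    Descr,
    case when Descr is null then 'null' else 'blank or empty' end as Problem
from tempdb..MD_Dupe_Case1
where Descr is null
   or ltrim(rtrim(Descr)) = ''
order by SK;
```

For a real dimension, substitute the table or view behind the DSV and the column named in the error message.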

Case 1 – But wait, there’s more!

First, while the default error configuration for processing will cause it to fail on the first error, this can be changed so that you can get more information in one run – for instance, to identify all attributes in a dimension that have this problem, and/or how many rows for a given attribute have the problem. This is very useful during development, though you wouldn’t want to modify the default fail-on-first error configuration in production.

To change the error configuration, click “Change Settings” on the Processing dialogue, then choose the “Dimension key errors” tab on the next pane. Click “Use custom error configuration”, then “Ignore errors count”, and then change the “Duplicate key” dropdown to “Report and continue”. Now a Full process will report all such errors across all dimensions. Here is what it looks like:

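[Image as-dupe-pic-5: processing results with duplicate key errors reported under the custom error configuration]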

Second, what about those other NullProcessing options: Preserve, Error and UnknownMember? These have the following effects:

  • Preserve: as this suggests, nulls will be preserved rather than raising an error. With our setup thus far and default error configuration, this option will cause processing to succeed again. When you browse the attribute you will see two indistinguishable blank members, one representing the null and one the true blank. This setting does not seem particularly useful.
  • Error: this setting with our setup and default error configuration will cause processing to fail, but with a different message: Errors in the OLAP storage engine: The record was skipped because a null attribute key was encountered. Attribute: Descr of Dimension: MD Dupe Case1 from Database: DupeKeyTests, Record: 2. At first this might appear useful because it gives you a record number, here “2”. However, it is not useful after all, as it is not the unique identifier (SK here) of the failing row, but the ordinal number of the row as presented by the SQL DISTINCT query – an order which is not defined. It does more precisely explain the nature of the error than the message you get with Automatic, which may be useful in some cases.
  • UnknownMember: this will cause nulls to be mapped into the MD-generated unknown member of the dimension, thereby allowing processing to succeed and for the null cases to be represented as a non-blank member according to whatever you configured for the dimension’s unknown member (how to configure this will not be covered as there are many articles on-line for how to do so). Using the MD-generated unknown member may be useful in prototyping but is generally not recommended – it is far preferable to create one or more “real” “unknown” members in the dimension to represent unknown case(s) and have the ETL assign them according to business rules to the fact table.
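
To make the last recommendation concrete, here is a rough sketch of a “real” unknown member and an ETL-style lookup that falls back to it. Everything beyond the test table is an assumption for illustration: the surrogate key value -1, the '?' code, the temporary table #StageFact and its columns are all hypothetical, and real business rules would usually be more involved.

```sql
use tempdb;
go

-- A "real" unknown member: an ordinary dimension row with a reserved key.
insert dbo.MD_Dupe_Case1 (SK, Code, Descr)
values (-1, '?', 'Unknown');
go

-- Hypothetical staging rows: one valid code, one null code, one unmatched code.
create table #StageFact (Code char(1) null, Amount money not null);
insert #StageFact (Code, Amount)
values ('A', 10), (null, 20), ('Z', 30);

-- Lookup with fallback: rows that find no dimension match get the unknown key.
select
    coalesce(d.SK, -1) as DimSK,
    s.Amount
from #StageFact as s
left join dbo.MD_Dupe_Case1 as d
    on d.Code = s.Code;

drop table #StageFact;
```

With this approach the fact table never carries a null dimension key, so the attribute’s null processing setting no longer decides how unknown cases appear to users.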

Third, again we will not demonstrate it, but the null processing options are also settable in the Dimension Usage pane of the cube editor. In the grid, if you click the “-“ on the granularity attribute at the intersection of any dimension and a measure group, then click Advanced on the resulting dialog, the resulting pane will show a Null Processing column with the same options as those defined earlier. If the setting here differs from that on the granularity attribute in the main dimension (usually the Key attribute), you will get a red squiggly on the granularity attribute in the Dimension Usage tab. What is not clear is whether this is merely a warning – i.e. you can override the main setting – or not. For a visualization of this, though not an answer to this question, see here.
