Tallan Blog

Tallan’s Experts Share Their Knowledge on Technology, Trends and Solutions to Business Challenges

Parsing Delimited Strings in a SQL Database

I often receive requests from colleagues needing to solve particularly thorny problems from within a SQL database. This article is intended for those who might find themselves needing to parse a delimited string inside a SQL database, those who find themselves in a position to provide solutions to this particular issue and those liking SQL challenges.

Recently a colleague posed a problem where the value in one column was a list of user property metadata. Each user property was a list of the information required to extract values from a second column in the same table, a list within a list with a common delimiter throughout. A key-value pairs table is a suitable design for this application; the designer chose a different path. The data and the metadata about that data were denormalized into separate columns. Our goal is to determine a solution that extracts user properties from the given data.

I always like to take up SQL puzzles as a challenge. For those of you similarly inclined, I encourage you to load the table as defined below and figure out a solution before proceeding. The goal is to take the data provided and produce the key-value pairs as a result. For those who wish to work a solution before reading on, the script to load the data follows, along with the expected result set.

Data and Expected Results

The sample data provided for the analysis:

Sample Data

The script to load the sample data:

CREATE TABLE [dbo].[Sample Data2](
     [Userid] [varchar](36) NOT NULL,
     [PropertyNames] [varchar](125) NOT NULL,
     [PropertyValuesString] [varchar](125) NOT NULL)

GO
--TRUNCATE TABLE [dbo].[Sample Data2]
INSERT INTO [dbo].[Sample Data2] (UserId,PropertyNames,PropertyValuesString) VALUES ('101','AttorneyTableID:S:0:1:FirstName:S:1:6:Location.UnitID:S:7:1:LastName:S:8:6:Unit:S:14:1:Location.UnitName:S:15:13:City:S:28:6:','0Raisin4Muffin4Boston-TrialsBoston')
INSERT INTO [dbo].[Sample Data2] (UserId,PropertyNames,PropertyValuesString) VALUES ('102','FirstName:S:0:6:City:S:6:0:Unit:S:6:2:AttorneyTableID:S:8:1:LastName:S:9:6:','Raisin210Muffin')
INSERT INTO [dbo].[Sample Data2] (UserId,PropertyNames,PropertyValuesString) VALUES ('103','FirstName:S:0:7:City:S:7:11:Unit:S:18:2:AttorneyTableID:S:20:1:LastName:S:21:7:','BrownieSpringfield450Squares')
INSERT INTO [dbo].[Sample Data2] (UserId,PropertyNames,PropertyValuesString) VALUES ('104','FirstName:S:0:10:Location.UnitID:S:10:2:LastName:S:12:4:Unit:S:16:2:Location.UnitName:S:18:15:','Pittsfield33CAFL33Pittsfield-CAFL')
INSERT INTO [dbo].[Sample Data2] (UserId,PropertyNames,PropertyValuesString) VALUES ('105','FirstName:S:0:4:Location.UnitID:S:4:2:LastName:S:6:5:Unit:S:11:2:Location.UnitName:S:13:15:City:S:28:10:','Corn32Flake32Pittsfield-DistPittsfield')

The requested result was:

Requested Result

If you are taking this on as a puzzle, the key value pairs should produce this result:

Key Value Pair Result

Background

The first impulse of a procedural programmer is to loop through a delimited list picking off tokens one at a time and adding each token to a list. This is the simplest way to approach the programming problem:

Initialize RemainingUnparsed = Input Delimited String
WHILE LENGTH(RemainingUnparsed) > 0
DELIMITER1POS = position of First Occurrence of delimiter in RemainingUnparsed
Add to list LEFT(RemainingUnparsed, DELIMITER1POS -1)
RemainingUnparsed = RIGHT(RemainingUnparsed,LENGTH(RemainingUnparsed) – DELIMITER1POS)
NEXT

A deeper insight into this formulation is that we are looking at an example of recursion. Every succeeding delimiter is a level deeper into recursing/walking the tokens in the delimited list from first to last. Each iteration of the loop is a new level that is dependent on the prior iteration. The recursion is obscured by the fact that the flow diagram ‘looks’ like there is only one level.

The most common example of recursing data is to enumerate the parent-child relationships in a hierarchy. The classic example is the manager-employee (organization chart), where the manager is a special case of an employee. Two facts are not known, a priori, about recursed hierarchies. First, the width, i.e., the number of children per parent is unknown. Secondly, the depth, i.e., how many parent/child relationships there are, is also not known. In fact, the depth may be ragged in the sense that the depth for each child may be different.

For the manager/employee example, there are usually many ‘child’ employees per manager. A delimited list follows this pattern as a special case where the entire hierarchy is available as the delimited list, and each token represents the ‘parent’ for each succeeding token. The width of the hierarchy is exactly one, but the depth remains unknown. The delimited list would have exactly one member at each level, each member having exactly one parent (except the first) and one child (except the last).

There are multiple ways to accomplish the parsing of a delimited list in most SQL Databases:

  • Write a stored procedure/table valued function that returns a table and builds the table using a WHILE LOOP;
  • Use some form of a STRING_SPLIT() function to parse the delimited list (for SQL Server this function is available only as of compatibility level 130 (SQL Server 2016));
  • If the hierarchies are deep, leading to a performance issue, implement a CLR if the dba’s approve (Again this pertains directly to MS SQL Server, but other databases may allow for a similar construct) .
  • Write a Table Valued Function that utilizes a recursive CTE to build the result table;

The SQL Server variant of the STRING_SPLIT() function has a few limitations. In some of the solutions presented here, particularly the general solution, it is necessary to know the position of the token within the source string. The built-in function only provides the tokens as rows and not the level where the token was retrieved. The table returned is not guaranteed to be in the order of the tokens in the source string as well, so artificially numbering the result set is not appropriate.

If performance is an issue, the CLR choice will be better, but this option is not always available. The dba’s may prefer not to introduce CLR to the database server and veto any use of CLR calls. Since I have posed this as a SQL puzzle for those who wish to pursue it, a CLR is not an all SQL solution, and so will not be further considered for the purposes of this discussion.

A table-valued function with a recursive CTE is generally my first choice. There are several reasons for choosing this method. A primary reason is that I can customize the result set produced. This is to the extent that the depth of each token can be returned, and delimiters can be anything of arbitrary length (the provided function returns the level and supports delimiters up to two characters in length). Additionally, a CTE is a general solution that can be applied to any database that is at least ANSI SQL 99 compliant. Delimited lists are not usually so deep as to cause a performance issue, although it is possible that the number of tokens may exceed the limits on how deep a recursion is permitted to execute.

Finally, it is not the purpose of this article to document CTE’s as there are many other sources available(See Microsoft documentation and CTE Introduction). There is no solution that is ‘perfect’. Nevertheless, we will examine this use case from a perspective where a recursive CTE may offer better options for achieving a solution that is not immediately obvious.

Analysis

Having decided that the recursive CTE in a table-valued function is the route that will be pursued, the structure of the data provided will be evaluated. Special attention is paid to column two, which contains the delimited string that must be parsed. The first column is the user, and we will presume it is the primary key to the table. Column three, the PropertyValuesString, is a list of property values strung together with no delimiters.

Column two, the PropertyNames column, contains the metadata describing the property name and the starting location of the property value (0 based) and its’ length in the PropertyValuesString column. All property metadata is delimited with a colon(:). We need to parse the PropertyNames column so that the metadata properties can be used to extract the value of a particular user property from the PropertyValuesString column.

Four solutions are provided in two groups of two solutions. The original request was to extract only the Last Name and First Name properties from the data.  The first two solutions (1a and 1b) were constructed with, perhaps, too narrow a focus on the idea that only two of the properties were required, and the overall list of properties is short and well known. Hardcoding the list of known properties is not a general solution. The second set of two solutions focuses on being agnostic to what the list of properties is and making no assumptions about the data. Solutions are developed from that viewpoint.

Our main interest is the interpretation and structure of the metadata stored in the PropertyValuesString column, as this must be established for any solution. Looking at the PropertyNames value for Userid 101 as provided in the sample data, we see that a colon serves as the delimiter between all tokens in the list:

AttorneyTableID:S:0:1:FirstName:S:1:6:Location.UnitID:S:7:1:LastName:S:8:6:Unit:S:14:1:Location.UnitName:S:15:13:City:S:28:6:

What are the issues with parsing this string directly? Primarily, parsing the string into a simple list does not respect the groupings within the overall string:

parsing string issues

There are seven repeating groups of four tokens each within the subject string. I will explore three approaches to extracting column positions into the Property, Start Position, and Length properties. The third approach will be the most general and will assume very little about the tokens. The first two approaches will assume that the Property Names are a known, fixed list and will use code based on that knowledge to split the string into seven pieces. These approaches also assume that the second token in each property ( ‘:S:’ ) is a constant in all cases and can be ignored.

How can the data be “shaped” to make it more amenable to providing the information we need to parse the PropertyValuesString column? As previously noted, the values in the PropertyNames column do not lend themselves easily to parsing in a sensible manner. All items in the column are separated by a ‘:’, so determining where a property name and its positional information is not easily extracted as separate data elements, one row per property name.

To address this, I will replace the ‘:’ in front of any property name to a delimiter of my choosing, ‘||’ in this case (to show that delimiters do not need to be 1 character in length). The ‘:S:’ in each property name could be used as the delimiter to split the name and positional information. The table-valued function that is used to support parsing does not support more than 2 characters in the delimiter(it could have been easily modified to support delimiters of arbitrary length).

A few manipulations make the process of parsing easier: Convert the ‘:S:’ to ‘~’. If there are other values in this location, treat them similarly; The only impediment to making it all work is removing the trailing ‘:’ when it exists in the parsed rows (the last property name in the list of property names for a given row)

The Solution(s)

The PropertyNames column defines properties/attributes and the location of the property value within the PropertyValuesString. The properties are delimited within the PropertyNames column. This is the column that needs to be split into a list of PropertyName, Start Location, and Length within the PropertyValuesString column. Once this list is generated, extracting the Property Values is trivial using SUBSTRING(). Only First and Last Name properties are requested. Adding additional properties to the output is trivial.

Definition of a Table-Valued Function for parsing delimited lists

Creating a table-valued function to parse the delimited lists is central to the solution. This is intended to be a general-purpose function that can be called from any code with a requirement to parse a delimited string. If a delimiter of length longer than 2 characters is needed, change the length to a suitable number for the @Delimiter parameter:

CREATE OR ALTER FUNCTION [dbo].[ParseDelimitedStringIntoTable]
(
     @DelimitedList varchar(max)
   , @Delimiter varchar(2) = ','
)
RETURNS TABLE
AS
RETURN
(
--  You can simply highlight one of the following to test the sql
--  If you need a longer delimiter than 2 characters you should be able to
--     DECLARE @Delimiter to something more than 2 and all should still work.
/*
DECLARE @DelimitedList varchar(max), @Delimiter varchar(2) = ',';
SELECT @DelimitedList='A,B,C';
SELECT * FROM [dbo].[ParseDelimitedStringIntoTable] (@DelimitedList, @Delimiter)
*//*
DECLARE @DelimitedList varchar(max), @Delimiter varchar(2) = ',';
SELECT @DelimitedList='W|X||Y||Z', @Delimiter = '||';
SELECT * FROM [dbo].[ParseDelimitedStringIntoTable] (@DelimitedList, @Delimiter)
*//*
DECLARE @DelimitedList varchar(max), @Delimiter varchar(2) = ',';
SELECT @DelimitedList='ABCDAAAXGAXGA', @Delimiter = 'A';
SELECT * FROM [dbo].[ParseDelimitedStringIntoTable] (@DelimitedList, @Delimiter)
*/
WITH TokenList
       ( FullString
       , a
       , b
       ) AS
       ( SELECT @DelimitedList
              , 1
              , cast(charindex(@Delimiter,@DelimitedList + @Delimiter) as int)
          UNION ALL
         SELECT @DelimitedList
              , b +len(@Delimiter)
              , cast(charindex(@Delimiter,@DelimitedList + @Delimiter,b+len(@Delimiter)) as int)
           FROM TokenList
          WHERE charindex(@Delimiter,@DelimitedList + @Delimiter,b+len(@Delimiter)) <> 0
       )
SELECT DISTINCT row_number() over (partition by (SELECT NULL) order by a) as ParsedOrder
     , substring(FullString,a,b-a)  as ParsedValue
  FROM TokenList

)

GO

Option #1a – A Base Solution

In this option, we manipulate the PropertyValuesString to demarcate each set of 3 tokens with a ‘||’ delimiter and replacing ‘:S:’ with ‘~’, resulting in:

AttorneyTableID~0:1||FirstName~1:6||Location.UnitID~7:1||LastName~8:6:Unit~14:1||Location.UnitName~15:13||City:S:28:6:

Now we have a delimited string that can be split into the proper seven values.  The next step splits on the ‘||’ and then uses the position of the  ‘~’ and ‘:’ and overall length of the result string to extract the Property Start Position and length using SUBSTRING() based on the locations of the two embedded strings.  These properties are then used to extract the values of the properties from the PropertyValuesString column.

; WITH ParsedPropertyNames AS
(
SELECT a.UserID
     , a.PropertyNames
     , a.PropertyValuesString
     , replace(replace(replace(replace
	      (replace(replace(replace(a.PropertyNames,':S:','~')
     ,':FirstName','||FirstName')
     ,':Location','||Location')
     ,':LastName','||LastName')
     ,':Unit','||Unit')
     ,':AttorneyTable','||AttorneyTable')
		  ,':City','||City')
         as ManipulatedPropertyNames
FROM [dbo].[Sample Data2] a
)
     , ParsedProperties AS
(
SELECT a.UserID
     , a.PropertyNames
     , a.PropertyValuesString
     , a.ManipulatedPropertyNames
     , b.ParsedValue
     , Substr1StartPostition = cast(charindex('~',b.ParsedValue ) as int) +1
     , Substr2StartPostition = cast(charindex(':',b.ParsedValue ) as int) +1
     , LengthString = LEN(b.ParsedValue) -
             case when substring(b.ParsedValue, len(b.parsedValue),1) = ':' then 1 else 0 end
     , LastNameStart = case when b.ParsedValue like 'LastName%' then
           substring(b.ParsedValue,cast(charindex('~',b.ParsedValue ) as int) +1
	           ,cast(charindex(':',b.ParsedValue ) as int)
                  -cast(charindex('~',b.ParsedValue ) as int)-1)+1
		                    else null end
     , LastNameLen = case when b.ParsedValue like 'LastName%' then
           substring(b.ParsedValue,cast(charindex(':',ParsedValue ) as int) +1,
                    LEN(b.ParsedValue) - case when substring(b.ParsedValue,len(b.parsedValue),1) = ':'
     	                           then 1 else 0 end
                -(cast(charindex(':',b.ParsedValue ) as int) ) ) else cast(null as int) end
     , FirstNameStart = case when b.ParsedValue like 'FirstName%' then
           substring(b.ParsedValue,cast(charindex('~',ParsedValue ) as int) +1
		        ,cast(charindex(':',b.ParsedValue ) as int)
                -cast(charindex('~',b.ParsedValue ) as int)-1) +1 else null end
     , FirstNameLen = case when b.ParsedValue like 'FirstName%' then
           substring(b.ParsedValue,cast(charindex(':',b.ParsedValue ) as int) +1,
                     LEN(b.ParsedValue) - case when substring(b.ParsedValue,len(b.parsedValue),1) = ':'
     	                           then 1 else 0 end
                 -(cast(charindex(':',b.ParsedValue ) as int) )) else cast(null as int) end
  FROM ParsedPropertyNames a
 OUTER APPLY [dbo].[ParseDelimitedStringIntoTable] (a.ManipulatedPropertyNames,'||') b
 WHERE b.ParsedValue LIKE 'LastName%'
    OR b.ParsedValue LIKE 'FirstName%'
)
SELECT /* *, */UserID

     , LastName = max(case when a.ParsedValue like 'LastName%'
                      then substring(a.PropertyValuesString,a.LastNameStart,a.LastNameLen)
					  else null end)
     , FirstName = max(case when a.ParsedValue like 'FirstName%'
                       then substring(a.PropertyValuesString,a.FirstNameStart,a.FirstNameLen)
					   else null end)
  FROM ParsedProperties a
 GROUP BY UserID

If we look at the ParsedValue produced in the above query, it takes the form of ‘LastName~12:4’.  That looks like it might also be parsable.  In fact, there are two levels of parsing possible, first on the ‘~’ and then on the ‘:’. (If the two delimiters used here were converted to a single delimiter, one of the levels of parsing could be removed.)

Option #1b – Adding Additional Parsing Levels

The purpose here is to remove the need for CHARINDEX(), SUBSTRING() and LEN() calls to extract the start position and length of the properties from the above query.  Additional calls to the delimited list parser function are utilized instead.

; WITH ParsedPropertyNames AS
(
SELECT a.UserID
     , a.PropertyNames
	 , a.PropertyValuesString
     , replace(replace(replace(replace
	      (replace(replace(replace(a.PropertyNames,':S:','~')
          ,':FirstName','||FirstName')
	  ,':Location','||Location')
	  ,':LastName','||LastName')
	  ,':Unit','||Unit')
          ,':AttorneyTable','||AtttorneyTable')
	  ,':City','||City')
         as ManipulatedPropertyNames
FROM [dbo].[Sample Data2] a
)
     , ParsedProperties AS
(
SELECT x.UserID
     , x.ParsedValue
     , PropertyValuesString = MAX(x.PropertyValuesString)
	 , LastNameStart        = MAX(x.LastNameStart       )
	 , LastNameLen          = MAX(x.LastNameLen         )
	 , FirstNameStart       = MAX(x.FirstNameStart      )
	 , FirstNameLen         = MAX(x.FirstNameLen        )
  FROM (
        SELECT a.UserID
             , a.PropertyNames
        	     , a.PropertyValuesString
        	     , a.ManipulatedPropertyNames
        	     , b.ParsedValue
             , LastNameStart = case when b.ParsedValue like 'LastName%'
        	                         and d.ParsedOrder = 1 then cast(d.ParsedValue as int) +1
        		                     else 0 end
             , LastNameLen = case when b.ParsedValue like 'LastName%'
        	                         and d.ParsedOrder = 2 then cast(d.ParsedValue as int)
             	                     else 0 end
             , FirstNameStart = case when b.ParsedValue like 'FirstName%'
        	                         and d.ParsedOrder = 1 then cast(d.ParsedValue as int) +1
                                     else 0 end
             , FirstNameLen = case when b.ParsedValue like 'FirstName%'
        	                         and d.ParsedOrder = 2 then cast(d.ParsedValue as int)
                                     else 0 end
          FROM ParsedPropertyNames a
         OUTER APPLY [dbo].[ParseDelimitedStringIntoTable] (a.ManipulatedPropertyNames,'||') b
         OUTER APPLY [dbo].[ParseDelimitedStringIntoTable] (b.ParsedValue,'~')  c
         OUTER APPLY [dbo].[ParseDelimitedStringIntoTable] (c.ParsedValue,':')  d
         WHERE c.ParsedOrder = 2
           AND ((b.ParsedValue LIKE 'LastName%')
            OR  (b.ParsedValue LIKE 'FirstName%'))
     ) x
 GROUP BY x.UserID
     , x.ParsedValue
)
SELECT /* *, */UserID

     , LastName = max(case when a.ParsedValue like 'LastName%'
            then substring(a.PropertyValuesString,a.LastNameStart,a.LastNameLen)
			else null end)
     , FirstName = max(case when a.ParsedValue like 'FirstName%'
            then substring(a.PropertyValuesString,a.FirstNameStart,a.FirstNameLen)
			else null end)
  FROM ParsedProperties a
 GROUP BY a.UserID

Option #2a – A General Approach

In this case, we will expand the use of the parsing function to eliminate much of the hard-coding of property names that the prior two options relied upon. The first step develops the full list of property names. These names are crossed with the original values of the PropertyNames columns and substitutes ‘||’ for ‘:’ wherever the value of concatenate(‘:’,PropertyName) is found. This results in 0 (Property name is first in the delimited list) or 1 match for every combination of the crossed tables. This is split on the ‘||’ and the results (1 or 2 rows) filtered where the result of the split starts with the Property Name. This list may contain more than 4 tokens, but we are only interested in tokens 1, 3, and 4, so the next step splits the resultant string on the ‘:’ and filters for items of interest. At this point, the property name is on row 1, the start position of the property value is on row 3, and the length of the property value is on row 4, which we collapse and then apply to the Property names to produce a key-value pair for every property. The remainder of the request simply filters to Last Name and First Name and applies a Substring() to the PropertyValueString column to fulfill the actual request.

; WITH ExtractPropertyNames(PropertyName) AS
(
--  Extract a distinct list of the property names in PropertyNames column
--    with no manipulation of the string.  Only assumption here is that
--    there is at least one character in the Property Name, and that the
--    second token regardless of value is one character long
   SELECT DISTINCT b.ParsedValue
     FROM [dbo].[Sample Data2] a
    OUTER APPLY [dbo].[ParseDelimitedStringIntoTable] (a.PropertyNames,':') b
    WHERE b.ParsedValue LIKE '%[a-z]%'
      AND Len(b.ParsedValue) > 1
)
     , ParsedPropertyNames AS
(
   SELECT a.UserID
        , a.PropertyNames
        , a.PropertyValuesString
        , replace(a.PropertyNames
        , concat(':',b.PropertyName),concat('||',b.PropertyName)) as ManipulatedPropertyNames
        , b.PropertyName
     FROM [dbo].[Sample Data2] a
     JOIN ExtractPropertyNames b
       ON a.PropertyNames LIKE concat('%',b.PropertyName,'%')
)
     , ParsedProperties AS
(
   SELECT y.UserID
   	, y.PropertyName
   	, y.PropertyMetaData
        , y.PropertyValuesString
        , c.parsedValue
        , c.ParsedOrder
     FROM (
        SELECT x.UserID
        	    , x.parsedOrder
        	    , x.propertyName
        	    , PropertyMetaData = MIN(x.ParsedValue)
            , PropertyValuesString = MAX(x.PropertyValuesString)
          FROM (
             SELECT a.UserID
                  , a.PropertyNames
     	          , b.parsedOrder
             	  , a.PropertyValuesString
             	  , a.ManipulatedPropertyNames
             	  , b.ParsedValue
     			  , a.PropertyName
               FROM ParsedPropertyNames a
              OUTER APPLY [dbo].[ParseDelimitedStringIntoTable] (a.ManipulatedPropertyNames,'||') b
               ) x
         WHERE  x.ParsedValue LIKE concat(PropertyName,'%')
         GROUP BY x.UserID
             , x.ParsedOrder
   	         , x.PropertyName
         ) y
     OUTER APPLY [dbo].[ParseDelimitedStringIntoTable] (y.PropertyMetaData,':')  c
     WHERE c.ParsedOrder IN (1,3,4)
)
     , ParseUserPropertiesIntoKeyValuePairs AS
(
   SELECT a.UserID
        , a.PropertyName
        , PropertyValue = SUBSTRING(MAX(a.PropertyValuesString),
	      cast(MAX(case when a.ParsedOrder = 3 then a.ParsedValue else null end) as int) +1,
	      cast(MAX(case when a.ParsedOrder = 4 then a.ParsedValue else null end) as int))
     FROM ParsedProperties a
    WHERE a.ParsedOrder IN (3,4)
    GROUP BY a.UserID
        , a.PropertyName
)
SELECT UserID
     , FirstName = MAX(case when z.PropertyName = 'FirstName' then z.PropertyValue else null end)
     , LastName = MAX(case when z.PropertyName = 'LastName' then z.PropertyValue else null end)
  FROM ParseUserPropertiesIntoKeyValuePairs z
 GROUP BY z.Userid
ORDER BY 1,2
--OPTION (MAXRECURSION 1000)

Option #2b – A Simplified General Approach

Option #2a can be simplified. In this case, the need to develop a list of the Property Names is avoided. The delimited string is parsed in whole for ‘:’, no manipulation required. What is critical is that the parsed order is used to find the elements needed (grouped by sets of 4, 1st, 3rd, and 4th tokens). The result:

; WITH ParsedProperties AS
(
             SELECT a.UserID
     	          , ParsedOrder = b.ParsedOrder % 4
             	  , a.PropertyValuesString
     		  , PropertyName = MAX(case when b.ParsedOrder % 4 = 1
			                    then b.ParsedValue else null end)
			      OVER (partition by a.Userid,(b.ParsedOrder - 1) / 4)
     		  , StartPosition = MAX(case when b.ParsedOrder % 4 = 3
			                     then b.ParsedValue +1 else null end)
			      OVER (partition by a.Userid,(b.ParsedOrder - 1) / 4)
     		  , PropertyLength = MAX(case when b.ParsedOrder % 4 = 0
			                      then b.ParsedValue else null end)
			      OVER (partition by a.Userid,(b.ParsedOrder - 1) / 4)
               FROM [Sample Data2] a
              OUTER APPLY [dbo].[ParseDelimitedStringIntoTable] (a.PropertyNames,':') b
         WHERE  (b.ParsedOrder % 4) IN (1,3,0)
		   AND b.ParsedValue != ''
)
     , ParseUserPropertiesIntoKeyValuePairs AS
(
   SELECT a.UserID
        , a.PropertyName
        , PropertyValue = SUBSTRING(a.PropertyValuesString,
		      cast(StartPosition as int),
			  cast(PropertyLength as int))
     FROM ParsedProperties a
    WHERE a.ParsedOrder = 1
)
SELECT UserID
     , LastName = MAX(case when z.PropertyName = 'LastName' then z.PropertyValue else null end)
     , FirstName = MAX(case when z.PropertyName = 'FirstName' then z.PropertyValue else null end)
  FROM ParseUserPropertiesIntoKeyValuePairs z
 GROUP BY z.Userid
--OPTION (MAXRECURSION 1000)

Results

All queries produce identical results:

Result 2

Relative Performance

How well do these solutions compare with each other from a cost perspective?  The solutions were run as a batch in the order specified above.

Execution Plan

They were also run individually 10 times in succession:

Performance

Solution 1a (Query 1) performs best but suffers in that it is not a general solution. Solution 1b (Query 2), also not general, shows the worst cost, although, from an execution time perspective, it is not as bad as the execution plan seems to indicate.

Focus on Solution 2a (Query 3) and Solution 2b (Query 4) as these are general solutions. From both cost and execution, time perspectives Solution 2a is about 50% worse than Solution 2b. The reason for this is that there are two passes to parse the delimited string in Solution 2a, the first to determine the list of properties and the second to break each string into one or two pieces depending on the location of the property within the overall delimited string.

Conclusion

The ‘parsing delimited lists’ use case for recursive CTEs is often overlooked by/unknown to developers. The table-valued function provided here leverages a recursive CTE to enable ‘set-based’ processing of delimited lists directly in SQL. For performance reasons, this technique may not apply to all scenarios.  However, in most cases, it provides a reliable, robust solution to a common problem for programmers.

For those who took up the challenge of a puzzle, I sincerely hope you found the exercise gratifying. I look forward to seeing alternate approaches.


Learn more about Tallan or see us in person at one of our many Events!

Share this post:

1 Comment. Leave new

Get Qoral Health
February 8, 2020 9:51 am

The PARSENAME function is logically designed to parse four-part object names. The nice thing about PARSENAME is that it’s not limited to parsing just SQL Server four-part object names – it will parse any function or string data that is delimited by dots.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

\\\