Tallan's Technology Blog

Tallan's Top Technologists Share Their Thoughts on Today's Technology Challenges

Muenchian Grouping in BizTalk while keeping Mapper functionality

Dan Field

Muenchian Grouping is a powerful technique for grouping looping/repeating nodes in an XML document by common values.  BizTalk does not support this out of the box, but it can be achieved by adding custom XSLT to a map.  Chris Romp wrote a post about this years ago that serves as an excellent example of the idea in BizTalk: http://blogs.msdn.com/b/chrisromp/archive/2008/07/31/muenchian-grouping-and-sorting-in-biztalk-maps.aspx.  The drawback of his method is that you lose all other Mapper functionality by using completely custom XSLT, and custom XSLT is more difficult to maintain than a BizTalk map.

Enter Sandro Pereira’s phenomenal tutorial on Muenchian Grouping in BizTalk maps (https://code.msdn.microsoft.com/windowsdesktop/Muenchian-Grouping-and-790347d2).  His solution is particularly powerful because it allows you to maintain the functionality and simplicity of the BizTalk Mapping engine while extending it to allow for the Muenchian Grouping technique as well.  However, there is still a limitation to this approach: the XSLT functoids remain responsible for any transformation of the child nodes that are grouped.  That poses a problem if your grouping logic requires that a parent (or perhaps even a root) node gets grouped on the criteria and many child nodes must be appended to the proper parent.

I recently faced just this situation while working for a client.  The XML data coming in needed to be extensively transformed, and in particular, duplicate child nodes had to be converted to unique parent nodes, with the original parents being appended to the correct new unique node.  Custom XSLT is clearly required here, but a hybrid approach can be used to still allow a regular BizTalk map to transform the resultant data.

My keys look like the following: each duplicate node has a pair of elements that, when joined, make it unique (ContactID and ContactType):

 <xsl:key name="Contacts" match="Contact" use="concat(ContactID, '|', ContactType)"/>

I could then use an xsl:for-each loop to select one node per key as the new parent node; the rest of the map was pretty straightforward:

  <xsl:for-each select="//Root/Node1/Node2/Contact[generate-id(.) = generate-id(key('Contacts', concat(ContactID, '|', ContactType)))]">
    ... parent nodes appended here ...
  </xsl:for-each>
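For readers who want to see the whole pattern in one place, here is a minimal, self-contained sketch of a stylesheet built around that key and loop.  It is not the client's actual XSLT.  The output element names (Contacts, Contact, Node2) are hypothetical, and the sketch assumes that Contact elements appear only under /Root/Node1/Node2 and that everything in Node2 other than Contact should travel with the original parent.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Index every Contact by its composite key (ContactID + ContactType). -->
  <xsl:key name="Contacts" match="Contact" use="concat(ContactID, '|', ContactType)"/>

  <xsl:template match="/">
    <!-- Hypothetical output root; the real target schema will differ. -->
    <Contacts>
      <!-- Muenchian step: keep only the first Contact in each key group. -->
      <xsl:for-each select="/Root/Node1/Node2/Contact[generate-id(.) = generate-id(key('Contacts', concat(ContactID, '|', ContactType))[1])]">
        <Contact>
          <xsl:copy-of select="ContactID | ContactType"/>
          <!-- Append each original parent that held a Contact with this key.
               Selecting the parents as a node-set means a Node2 containing
               two matching Contacts is still emitted only once. -->
          <xsl:for-each select="key('Contacts', concat(ContactID, '|', ContactType))/..">
            <Node2>
              <!-- Copy the parent's other children, leaving out Contact. -->
              <xsl:copy-of select="*[not(self::Contact)]"/>
            </Node2>
          </xsl:for-each>
        </Contact>
      </xsl:for-each>
    </Contacts>
  </xsl:template>
</xsl:stylesheet>
```

Each unique ContactID/ContactType pair becomes one Contact element in the output, with the Node2 elements that originally contained it nested underneath.  Note that key names are case-sensitive, so the name passed to key() must match the name attribute on xsl:key exactly.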

To avoid writing very complicated logic and replicating custom functoids in the “…” part of that loop, I started considering ways to use two maps: one using custom XSLT and the second being a regular BizTalk map.

The most basic way to achieve this strategy would be to have two maps that are called in sequence from an orchestration.  The first map has the custom XSLT to do the Muenchian grouping; the second map is a regular BizTalk map that works from the output of the first map.  This would work, but not if you want to do mapping on a receive port (which the architecture called for), where only the first map would get triggered.  It is also a bad idea if the message sizes are large (which occasionally happens with this trading partner).  Another method would be the one discussed here (again by Chris Romp): http://blogs.msdn.com/b/chrisromp/archive/2008/08/06/stacking-maps-in-biztalk-server.aspx, but this would involve creating several dummy receive locations to achieve something that can be done fairly simply in a pipeline component.

My eventual solution was pretty simple: create a pipeline component with the custom XSLT as an embedded resource.  I decided to put it in the decode stage, but it could have gone in the XML Disassembler stage, or in a C# helper library that gets called by an orchestration.  Here’s the relevant code:


A quick explanation of the code: a VirtualStream is used to manipulate the incoming message; the embedded XSLT is loaded as a resource, and XslCompiledTransform is used to run the transform against the message.  The new message is added to the Context’s ResourceTracker to ensure that its streams are properly disposed of when the pipeline is done with them, and the message is passed on to the next stage of the pipeline.

This approach does have a drawback.  If the trading partner changes the schema, there will now be several updates to make: the pipeline component’s XSLT, the map, and the inverted schema.  However, this change is not likely to occur often, and using this method means that future developers can continue to use the BizTalk mapper functionality (including some custom functoids) when mapping the partially transformed trading partner data to the internal canonical format.

And the upside is worth it.  The complexity of the XSLT grouping is handled in one place, and changes to it will be easier to maintain separately from changes to the rest of the mapping logic.  Plus, new developers on the solution will be able to quickly see how the data gets mapped into the canonical format, which is more difficult to understand when looking at raw XSLT.
