Tallan's Technology Blog

Tallan's Top Technologists Share Their Thoughts on Today's Technology Challenges

BizTalk mapping patterns: Remove Empty Nodes

Dan Field

Some recipients of BizTalk messages (such as the SQL Server adapter and some SOAP web services) will run into problems if there’s an empty node in the message, such as:

<root>
  <Keeper>This has value</Keeper>
  <node />
  <node></node>
  <node>
    <child />
  </node>
</root>

In this example, we’d really just want to keep Keeper and get rid of the rest, so that other services don’t throw exceptions on those empty/valueless nodes.

There are mapping patterns to address this, generally using a Value Mapping functoid with a Logical functoid. For example, you might use a Logical Existence or Logical String functoid as the first parameter (see this blog for an example) to prevent an empty value from being mapped to the destination node. With a Table Looping functoid, you can achieve this for an entire table by gating the first column (see the Remarks section of the Table Looping reference: MSDN). This technique works well, and isn’t too difficult to maintain in a small map where only a few nodes are affected. However, it becomes very cumbersome to maintain and update in a larger map where you need to suppress all empty nodes, potentially numbering in the hundreds. Fortunately, there are alternatives. In this post, I’ll outline some methods for removing empty nodes from XML, and how to use these methods effectively in BizTalk.

Methodology

There are several ways to remove empty nodes from an XML document.  The most naive approach would be to use a regular expression to do so; however, this approach is bad practice for many reasons.  XML becomes challenging to parse correctly using regular expressions, what with concerns about namespaces, prefixes, CDATA nodes, comments, etc.  While it is probably technically possible, such a regular expression would become a maintenance nightmare.  Another approach is using XSLT, such as the following template (adapted from here):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes" method="xml"/>
    <xsl:template match="node()">
        <xsl:if test="count(descendant::text()[string-length(normalize-space(.))>0]|@*)">
            <xsl:copy>
                <xsl:apply-templates select="@*|node()" />
            </xsl:copy>
        </xsl:if>
    </xsl:template>
    <xsl:template match="@*">
        <xsl:copy />
    </xsl:template>
    <xsl:template match="text()">
        <xsl:value-of select="." />
    </xsl:template>
</xsl:stylesheet>

You could also do it using an XmlReader (with some look-ahead logic) – this approach gets slightly involved, and I really doubt that any implementation would perform much better than the way XDocument would handle things anyway.

That leads to the simplest approach (while still respecting the complexity of the XML): using LINQ to XML.

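// Requires: using System.Linq; using System.Xml.Linq;
XDocument xd = XDocument.Load(xmlStream);
// Remove all empty attributes (but keep namespace declarations such as xmlns="")
xd.Descendants().Attributes()
        .Where(a => !a.IsNamespaceDeclaration && string.IsNullOrWhiteSpace(a.Value))
        .Remove();
// Remove every element that has no text content and no attributes of its own
xd.Descendants()
        .Where(e => (e.Attributes().All(a => a.IsNamespaceDeclaration)) // leave elements with non-XMLNS attributes
                        && string.IsNullOrWhiteSpace(e.Value))          // look for empty elements (with no attributes on them)
        .Remove();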

That XDocument can then be .Save()ed to your stream again. XDocument will handle the complexity of the XmlReader for us (it uses XmlReader behind the scenes). Like XmlDocument, it still loads the whole document into memory, but its object model is lighter than the XmlDocument DOM and it generally performs better for this kind of operation. Because of its simplicity and performance, this is my preferred method; however, you could certainly use XslCompiledTransform (perhaps pre-compiling the XSLT template mentioned above) in the methods listed below.
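To make the load, filter, and save round trip concrete, here is a minimal, self-contained sketch that runs the LINQ to XML approach against the sample document from the top of this post. The class and method names (EmptyNodeRemover, RemoveEmptyNodes) are hypothetical, a MemoryStream stands in for whatever stream you actually have, and the namespace-declaration check on attributes is an extra precaution rather than a requirement.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class EmptyNodeRemover
{
    // Hypothetical helper: reads XML from a stream, strips empty nodes,
    // and returns a new, rewound stream containing the result.
    public static Stream RemoveEmptyNodes(Stream xmlStream)
    {
        XDocument xd = XDocument.Load(xmlStream);

        // Remove empty attributes, but leave namespace declarations alone.
        xd.Descendants().Attributes()
          .Where(a => !a.IsNamespaceDeclaration && string.IsNullOrWhiteSpace(a.Value))
          .Remove();

        // Remove elements with no attributes of their own and no text below them.
        xd.Descendants()
          .Where(e => e.Attributes().All(a => a.IsNamespaceDeclaration)
                      && string.IsNullOrWhiteSpace(e.Value))
          .Remove();

        var output = new MemoryStream();
        xd.Save(output);
        output.Position = 0; // rewind so the next reader starts at the beginning
        return output;
    }

    public static void Main()
    {
        const string xml =
            "<root><Keeper>This has value</Keeper><node /><node></node><node><child /></node></root>";

        using (var input = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
        using (var reader = new StreamReader(RemoveEmptyNodes(input)))
        {
            // Only <root> and <Keeper> should survive.
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}
```

Note that the element predicate only looks at an element's own attributes. A parent with no text anywhere beneath it is removed even if one of its children carries an attribute, and that child goes with it.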

Implementation

There are two places in BizTalk where these kinds of methods can be applied: a custom pipeline component or an Orchestration MessageAssignment shape. The Execute method of a custom pipeline component would look like this:

public IBaseMessage Execute(IPipelineContext pContext, IBaseMessage pInMsg)
{
    if (IsEnabled)
    {
        try
        {
            XDocument xd = XDocument.Load(pInMsg.BodyPart.GetOriginalDataStream());
            VirtualStream vts = new VirtualStream();

            xd.Descendants()
              .Where(e => (e.Attributes().All(a => a.IsNamespaceDeclaration))
                        && string.IsNullOrWhiteSpace(e.Value))
              .Remove();

            // Write the trimmed document to the new stream, then rewind it for the next reader
            xd.Save(vts);
            vts.Position = 0;
            pInMsg.BodyPart.Data = vts;

            // Let the pipeline dispose of the stream when the message has been processed
            pContext.ResourceTracker.AddResource(vts);
        }
        catch (Exception ex)
        {
            // Log your exception meaningfully - BizTalk doesn't always bubble up exceptions the way you'd expect!
            throw;
        }
    }

    return pInMsg;
}

This component could be added to a send pipeline, and is probably the most streamlined way of handling this issue (the new message won’t have to hit the MessageBox, as it would in an Orchestration). The map to your SQL or SOAP call could execute on the port (or in an Orchestration), and the component would trim down the map result for you. However, sometimes it might be preferable or necessary to remove the empty nodes in an Orchestration, in which case the following snippet would take care of things (after calling your map):

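public static void RemoveEmptyNodes(XLANGMessage msg, int part = 0)
{
    VirtualStream vts = new VirtualStream();
    // XLANGMessage has no built-in ToXDocument(); retrieve the part as a Stream and load that instead
    XDocument xd = XDocument.Load((Stream)msg[part].RetrieveAs(typeof(Stream)));

    xd.Descendants()
        .Where(e => (e.Attributes().All(a => a.IsNamespaceDeclaration))
                        && string.IsNullOrWhiteSpace(e.Value))
        .Remove();

    xd.Save(vts);
    vts.Position = 0; // rewind before handing the stream back to the message part
    msg[part].LoadFrom(vts);
}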

And then in a MessageAssignment:

Utilities.RemoveEmptyNodes(msg_SQL, 0);

Caveat Emptor!

The one caution to be aware of: these methods could potentially leave you with a completely empty message! Going back to the example at the beginning, if that Keeper node didn’t exist, any of these methods would result in a completely empty document. However, the same would be true of the Value Mapping pattern if applied to the entire document, and other checks should exist to prevent this from happening in the first place.
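One such check can live right next to the removal itself. The sketch below is only an illustration under assumed names (EmptyMessageGuard and TryRemoveEmptyNodes are hypothetical): it applies the same LINQ to XML filter and then reports whether the document still has a root element, so the caller can decide whether to skip, fail, or route the message.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class EmptyMessageGuard
{
    // Hypothetical helper: strips empty elements and reports whether anything is left.
    public static bool TryRemoveEmptyNodes(XDocument xd)
    {
        xd.Descendants()
          .Where(e => e.Attributes().All(a => a.IsNamespaceDeclaration)
                      && string.IsNullOrWhiteSpace(e.Value))
          .Remove();

        // If even the root element was empty, the document no longer has a root.
        return xd.Root != null;
    }

    public static void Main()
    {
        XDocument onlyEmpty = XDocument.Parse("<root><node /><node><child /></node></root>");

        if (!TryRemoveEmptyNodes(onlyEmpty))
        {
            // What to do here is a design decision: skip the send, raise an error, etc.
            Console.WriteLine("Nothing left to send after removing empty nodes.");
        }
    }
}
```

In a pipeline component or an Orchestration helper, the same Root check would go immediately before the call to Save.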

1 Comment

Great article. I only have one question. Where is the XLANGMessage.ToXDocument() defined? Is this an extension method?
