BizTalk Patterns: Database Assisted Aggregation
Aggregation is a common pattern used in Enterprise Integration. System (or systems) A sends many messages that System B expects in a single message, or in several messages grouped on a particular attribute (or set of attributes).
The most common way to approach this in BizTalk is using a Sequential Convoy orchestration to aggregate the message – Microsoft provides a sample of how to do this in the SDK. This is a powerful pattern, but has a few downsides:
- Sequential Convoys can become difficult to manage if they’re expected to run for a long time
- Complex subgrouping can multiply the headaches here – for example, if you have to aggregate messages for hundreds of destinations concurrently
- The destination message may become very large, to the point where BizTalk cannot optimally process it anymore – particularly if it is a large flat file message.
- Modifying the aggregate message may be challenging in a standard BizTalk map, especially if one message might be expected to modify a previously aggregated message.
In this post, we’ll look at two approaches to overcoming these challenges through a database assisted aggregation pattern.
The advantages of this pattern are:
- Avoid complexities of sequential convoys, especially sequential convoys that may have to run for long periods of time
- Allow for complex subgrouping
- More flexible data modification options throughout aggregation
There are two versions of this pattern that I’ve been using. One involves storing XML blobs in a document-database model, and one involves writing data down to a table throughout the day. The document model works well if there’s no need to modify data, data is not going to a flat file destination, and you need to capture only limited metadata about the document. The table model works well for flat file based output, especially for larger flat files that would be prohibitive to work with in BizTalk in a single shot.
In this scenario, whenever a message comes in that needs to be aggregated, it’s mapped to a stored procedure that takes an XML parameter along with any other metadata fields about that message that will be captured on the row level. This might be as simple as a Transaction ID, OriginID, and a date that will be used for grouping/sorting. A stored procedure would look like this:
CREATE TABLE tAggregatorSample (TransactionID int, OriginId, DT datetime, doc XML); GO CREATE PROCEDURE pInsertDoc ( @TransactionID int, @OriginID int, @dt DATETIME, @doc XML ) AS BEGIN INSERT tAggregatorSample (TransactionId, OriginId, DT, doc) VALUES (@TransactionID, @originId, @dt, @doc); END
A map to this procedure would look fairly standard, but the “doc” node would take input from a scripting functoid to use an xsl:copy-of instruction to copy the desired portion of the XML document. This snippet would go in a scripting functoid using Inline XSLT, and a single output link to the doc node:
<doc xmlns='http://schemas.microsoft.com/Sql/2008/05/TypedProcedures/pInsertDoc'> <!-- put it in a CDATA node to prevent BizTalk from confusing content in the mapper --> <xsl:text disable-output-escaping="yes"><![CDATA[</xsl:text> <xsl:copy-of select="/path/to/node/to/copy" /> <xsl:text disable-output-escaping="yes">]]</xsl:text> <xsl:text disable-output-escaping="yes">></xsl:text> </doc>
The select attribute on the copy-of could point to any node in the document; the root node works for the entire document, but since we’re aggregating we probably want a child node that will later be re-wrapped in the root node. So, using the following XML:
<ns0:Root xmlns:ns0="http://ns-for-schema"> <ns0:TransactionId>1</ns0:TransactionId> <ns0:OriginId>123</ns0:OriginId> <ns0:TransactionDate>2015-10-10</ns0:TransactionDate> <ns0:Transaction><ns0:Child /><ns0:Child />...</ns0:Transaction> <ns0:Root>
We’d probably want the path for the “Transaction” node.
On the outbound side of things, when ready (or a particular trigger message is received), a stored procedure like the following will extract the aggregated XML from the database:
CREATE PROCEDURE pExtractDoc ( @OriginId int, @dt DATETIME ) AS BEGIN WITH XMLNAMESPACES('http://ns-for-schema' as ns1) SELECT TOP 1 TransactionId as 'ns0:TransactionId' ,OriginId as 'ns0:OriginId' ,dt as 'ns0:TransactionDate') ,(SELECT doc as '*' -- just get the raw XML as is FROM pAggregatorSample t2 WHERE OriginId = @OriginId ORDER BY DT FOR XML PATH(''), TYPE) -- stuff the XML back into the parent FOR XML PATH('ns1:Root') END
Invoking this as an XmlProcedure would give you back your aggregated document ready for consumption by BizTalk (for example, a send port filtering on such messages coming from this port).
This model works well when large aggregation is done for a simple flat file. In this case, the scenario is a system that’s expecting a very large flat file, or perhaps needs previous records updated. Here, the inbound map and stored procedure is far more standard – map the relevant fields to the columns as desired, insert them, and you’re good. If the rows need to be modified, they can now be modified easily using standard SQL UPDATE statements.
On the outbound end, you have a couple options. If the file is going to a large CSV file, you can use the BCP utility to very quickly export it; I’ve tested this with up to 500MB worth of raw flat file data, and it finishes in about a minute (forget even trying to process a message like that in BizTalk – the XML version would be several gigabytes). On the other hand, if you’re dealing with smaller quantities of data, you can export the data back to BizTalk via a stored procedure (either StronglyTyped or XmlProcedure as desired), map to your destination schema, and be on your way. Even in this case, the outbound mapping becomes much simpler and avoids the need to reconstruct messages for every update (very expensive as the message grows in size).
Some data retention policy should be put into place here – issuing a TRUNCATE after the last aggregated message is sent, or deleting rows as they get polled for example. If there’s any desire to resubmit aggregated messages, the data could be stored for longer periods of time and resubmitted easily using the same process as the initial export.