Message Schema Resolution – The Simple Approach
Recently, a company in the process of implementing BizTalk 2006 within their enterprise asked me to recommend best practices for an existing usage scenario they wanted to support with BizTalk. This company receives, on a recurring basis, a variety of disparate inbound flat file business documents which, at present, are loaded into their suite of business applications using SQL Server DTS. The exchange with this company has certainly inspired folks at BlogBizTalk and Tallan to start developing a whitepaper that compares and defines best-practice usage and design considerations for SSIS (or an equivalent ETL tool) and BizTalk, whether the two are used together or independently. In the meantime, a few much more immediate considerations needed addressing.
One such lower-level consideration was how to handle the disparity of received file formats. My understanding was that, for quite some time, the existing integration with external partners had not required inbound files to comply with any one format, so a newly introduced requirement of that kind was unlikely to be readily adopted by those partners. More generally, over the course of our conversation I identified a real desire to make the BizTalk implementation essentially invisible to any involved systems or groups of users.
Currently, the “Which file format is this?” question (or Schema Resolution, to introduce the BizTalk vernacular) is answered by a variety of T-SQL code incorporated in the DTS packages built for the existing integration. Clearly, BizTalk can provide this company with a far more extensible solution. As with my previous posting on the Aggregator pattern, I will refer anyone asking this question to an example in the Pipelines section of the BizTalk SDK samples: SchemaResolverComponent.
The Solution Example Out of the Box
This example consists of a BizTalk solution and, in a separate solution, a C# class library project. The C# solution contains classes that define a custom Flat File Disassembler pipeline component. By implementing interfaces defined in the BizTalk class libraries for .NET, this project defines a new pipeline component that, at its core, “probes” (via the IProbeMessage interface) the received message within a pipeline and, based on the character values at particular positions within the file, determines the applicable message schema. The Probe method assigns schema properties to the message context; a subsequent call to the base disassembler, passing that same context, then disassembles the message according to the schema that was determined.
Note that in this example, the indication of applicable schema is not based on an actual parse of the file but rather a key value that is extracted from the stream of the message content.
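To make the key-based idea concrete, here is a minimal, language-neutral sketch in Python. It is not the SDK sample's code and does not use any BizTalk API: the key offset, key length, key values, schema names, and the dictionary standing in for the message context are all assumptions made for illustration.

```python
import io

# Hypothetical key-to-schema table; the offset, length, keys, and schema
# names are illustrative assumptions, not values from the SDK sample.
KEY_OFFSET = 0
KEY_LENGTH = 2
SCHEMA_BY_KEY = {
    "SO": "Schemas.SalesOrder",
    "PO": "Schemas.PurchaseOrder",
}


def resolve_schema(stream):
    """Return the schema name for a message, or None if the key is unknown.

    Only the key bytes are read; the stream is rewound afterwards so a
    downstream parser still sees the whole message.
    """
    start = stream.tell()
    stream.seek(start + KEY_OFFSET)
    key = stream.read(KEY_LENGTH).decode("ascii", errors="replace")
    stream.seek(start)
    return SCHEMA_BY_KEY.get(key)


def probe(stream, context):
    """Mimic the probe step: record the schema on the context if recognized."""
    schema = resolve_schema(stream)
    if schema is None:
        return False
    context["schema"] = schema
    return True


if __name__ == "__main__":
    context = {}
    message = io.BytesIO(b"PO0001 widget 10\n")
    print(probe(message, context), context)
```

The point of the sketch is that no parsing happens during the probe: a lookup on a few characters decides which schema the later disassembly step will use.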
Looking at the code for the custom pipeline component, it is easy to see how the T-SQL code I mentioned might translate. That T-SQL very likely performs a similar direct analysis of the file content, looking for such keys to identify the format.
However, using a custom pipeline component to perform schema resolution would not be my prescribed approach to this problem, especially for the company in question, as they are relatively new to BizTalk. Custom pipeline components are a more advanced topic in BizTalk development and can in many cases, especially for those new to BizTalk, very quickly lead to exactly the lack of extensibility and maintainability that we adopted BizTalk to avoid in the first place.
Documentation that ships with this sample solution reads as follows:
“The flat file disassembler component normally requires you to define a parsing schema at design time. So if you expect to receive different flat file documents on the same receive location, you typically include several flat file disassemblers in the receive pipeline, one for each schema. At run time, the correct disassembler component is selected using a pipeline probing mechanism. However, this approach is expensive if you have many flat file schemas because probing for every corresponding disassembler component degrades pipeline performance.”
So unless this performance cost actually becomes a problem in a particular BizTalk message processing scenario, or the number of possible message schemas exceeds 255 (the number of disassemblers allowed within the Disassemble stage of a pipeline), one’s best bet is to go with the much simpler approach of including multiple base Flat File Disassemblers, each referencing a specific document schema.
Extending the Solution Example into Simplicity
So, as a step in the learning process, I would certainly recommend extending this sample solution to incorporate this far simpler approach to message resolution during disassembly. Before making any of the modifications described below, I would build and deploy the sample as per its documentation (running setup.bat within the solution directory) and do a test run with the sample flat files provided. This will be the baseline for what we reimplement in the extended version.
Starting with the solution example provided, the first thing we want to do to test our move to this far simpler approach is create a new pipeline that contains one Flat File Disassembler per document schema. I have called this ~.AlternateReceivePipeLine.AlternateRP, in keeping with the naming conventions that appear to be in place for this solution.
If we rebuild and redeploy our application, then change our receive location’s selected pipeline to the one we have just created (and perform any restarts required from the admin console), we should see exactly the same result that we saw from our multi-class custom pipeline component: namely, four resultant XML files that should match the four from our baseline test.
In our new pipeline, a parse of the message is attempted with each of the four available schemas in turn until one succeeds. Only a message that passes through the pipeline without matching ANY of the designated disassembly schemas will cause an exception.
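The try-each-schema-until-one-succeeds behavior can be sketched as follows. This is a toy model in Python, not BizTalk code: the comma-delimited layout, the tags, the field counts, and every name here are hypothetical stand-ins for real flat file schemas and disassemblers.

```python
class NoMatchingSchemaError(Exception):
    """Raised when no disassembler recognizes the message."""


def make_disassembler(schema_name, tag, field_count):
    """Build a toy parser for comma-delimited lines of the form 'TAG,f1,f2,...'.

    The returned function yields a dict on success and None on mismatch.
    """
    def disassemble(text):
        fields = text.strip().split(",")
        if fields[0] != tag or len(fields) != field_count:
            return None
        return {"schema": schema_name, "fields": fields[1:]}
    return disassemble


def run_pipeline(text, disassemblers):
    """Try each disassembler in order; the first successful parse wins."""
    for disassemble in disassemblers:
        result = disassemble(text)
        if result is not None:
            return result
    raise NoMatchingSchemaError("no configured schema matched the message")


if __name__ == "__main__":
    stage = [
        make_disassembler("SalesOrder", "SO", 3),
        make_disassembler("PurchaseOrder", "PO", 4),
    ]
    print(run_pipeline("PO,1001,widget,10\n", stage))
```

Note that the cost grows with the position of the matching schema in the list, which is the trade-off the sample documentation quoted earlier warns about when the number of schemas is large.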
At this point, I would suggest adding orchestrations to illustrate how, once message resolution has taken place within the receive pipeline, BizTalk takes care of the rest of the post-MessageBox decision making about handling a message of format #1 as opposed to format #2 (or in this case, say, a Sales Order vs. a Purchase Order). Note that the same applies to maps assigned to our document schemas and configured at either the send or receive port. To demonstrate this, create two very simple orchestrations, such as SalesOrderSimplePickup.odx and PurchaseOrderSimplePickup.odx, that look something like the following.
If we rebuild, redeploy, and remove the filter on our send port, then redeliver our four sample messages to the receive location, we should see only two resultant files appear in our OUT file location. Since no orchestrations were created to take our other two message types, those messages were not processed or sent.
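The routing outcome can be pictured with a small publish/subscribe sketch. It is a deliberately simplified Python model and not how BizTalk is implemented: the MessageBox class, its methods, and the two placeholder type names for the unsubscribed messages are assumptions, and unmatched messages are simply collected in a list here rather than handled the way a real BizTalk host would handle them.

```python
from collections import defaultdict


class MessageBox:
    """Toy publish/subscribe hub keyed on resolved message type."""

    def __init__(self):
        self.subscribers = defaultdict(list)
        self.unrouted = []

    def subscribe(self, message_type, handler):
        """Register a handler (think: an orchestration) for one message type."""
        self.subscribers[message_type].append(handler)

    def publish(self, message):
        """Deliver the message to every subscriber of its type, if any."""
        handlers = self.subscribers.get(message["type"], [])
        if not handlers:
            self.unrouted.append(message)
            return
        for handler in handlers:
            handler(message)


if __name__ == "__main__":
    out_folder = []
    box = MessageBox()
    box.subscribe("SalesOrder", out_folder.append)
    box.subscribe("PurchaseOrder", out_folder.append)
    # "OtherTypeA" and "OtherTypeB" are placeholders for the two sample
    # message types that have no orchestration.
    for message_type in ["SalesOrder", "PurchaseOrder", "OtherTypeA", "OtherTypeB"]:
        box.publish({"type": message_type})
    print(len(out_folder), len(box.unrouted))
```

Once the pipeline has stamped each message with its type, the decision about who handles it is driven entirely by subscriptions, which is why only the two subscribed types produce output.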
Clearly, either backbone for a messaging solution as described (a custom component or multiple base disassemblers) allows us to deal with messages in a variety of possible formats and resolve their schemas. Custom pipeline components are a powerful tool within BizTalk that can certainly be used when out-of-the-box functionality does not meet a requirement. However, given their potential to erode some of the benefits of using BizTalk, and the relative complexity of their development, especially for newcomers to BizTalk, I would highly recommend making the fullest possible use of out-of-the-box solutions such as the one described. This is particularly true of the challenge facing the company mentioned in this example and its integration with external partner systems.