Tallan's Technology Blog

Tallan's Top Technologists Share Their Thoughts on Today's Technology Challenges

Beyond the Flat File Wizard in BTS 2006 – Part 1

I was recently helping one of our clients parse a record-based flat file where the records are mostly positional identified by a tag. The issue was that the flat file was a credit card transaction log where each record, identified by a tag, is a sibling to the other records with no specific order in which the records appear. Such a scenario cannot be handled by an “out of the box”  usage of the FF wizard. This series of postings will focus on explaining how BizTalk Flat File schemas are structured, analyze the output of the FF wizard and how to customize it.

BizTalk handles Flat Files through the use of flat file extensions defined within schema annotations (more about annotations…, more about flat file extensions…). A receive pipeline component known as a Flat File Disassembler (or assembler for the send counterpart) can act on these extensions to parse an incoming flat file into XML. This disassembler is one of the standard components packaged with BizTalk and can be found in the Microsoft.BizTalk.Pipeline.Components.dll assembly under “C:\Program Files\Microsoft BizTalk Server 2006\Pipeline Components\” (for a default installation). The FF assembler and disassembler components in this assembly are FFDasmComp and FFAsmComp.

Although the Flat File wizard is a really nice new feature of BTS 06, it does make it easier for developers to get by without understanding the different options available with FF extensions to meet their needs. This series of posts will be example driven to explain what the wizard generates and go over some common flat file scenarios where the developer will need to update the schema properties vs. using the output of the wizard.

Example 1: A Common CSV File
The first example is a simple comma separated file of contacts that has three fields: First Name, Last Name and Age. The following is what a sample Contacts.csv file looks like:

    1 Mario, Harik, 32

    2 Foo , Bar , 12

    3 John , Dough, 47

 

Creating a flat file schema from this file using the wizard is done in two steps, the first one is to specify the record delimiter of new line (“{CR}{LF}”) with repeating records for the root element and the second is to specify each of the record’s delimiter of comma (“,”). The following are the essential steps of the creation process:

  1. Add a new Flat File Wizard Schema to your project.
  2. Select the flat file you want to create a schema for (contacts.csv in this example) as your instance.
  3. Give the root node a name of Contacts and assign the proper namespace based on your naming convention.
  4. Enter your record delimiter of data under the root node which is {CR}{LF} in this case.
  5. The wizard will then allows you to specify the child elements of the selected data based on the specified delimiter. Our first example is a simple one where it contains a set of identical repeating records.
  6. The wizard will then reiterate over all the records or repeating records to define how they will be parsed.
  7. In this example, the second iteration will define the contact record on the contacts schema. This time however, select “,” (comma) as being the delimiter.
  8. Enter the child elements information as shown below.
  9. And this would create the flat file schema that will be used by the flat file disassembler or assembler to be able to produce an XML document out of the contacts CSV file or vice versa. Before we start looking into non-wizard supported FF customizations, we’ll first analyze the output of the wizard for this example.

The schema generated by the FF wizard looks as follows:

The first thing we notice in this schema are the schema-level annotations that specify that we’re using the flat file extensions:

    3 <xs:annotation>

    4     <xs:appinfo>

    5       <schemaEditorExtension:schemaInfo namespaceAlias=bextensionClass=Microsoft.BizTalk.FlatFileExtension.FlatFileExtensionstandardName=Flat Filexmlns:schemaEditorExtension=http://schemas.microsoft.com/BizTalk/   2003/SchemaEditorExtensions />

    6       <b:schemaInfo standard=Flat Filecodepage=1200default_pad_char=” ” pad_char_type=charcount_positions_by_byte=falseparser_optimization=speedlookahead_depth=3suppress_empty_nodes=falsegenerate_empty_nodes=trueallow_early_termination=falseearly_terminate_optional_fields=falseallow_message_breakup_of_infix_root=falsecompile_parse_tables=falseroot_reference=Contacts />

    7     </xs:appinfo>

    8   </xs:annotation>  

 

Underneath the root node, the flat file wizard added the following annotations:

Underneath the root node, the flat file wizard added the following annotations:

   11 <xs:appinfo>

   12         <b:recordInfo structure=delimitedchild_delimiter_type=hexchild_delimiter=0xD 0xAchild_order=postfixsequence_number=1preserve_delimiter_for_empty_data=truesuppress_trailing_delimiters=false />

   13       </xs:appinfo>

 

These specify that the structure is delimited (vs. positional) where the child_delimiter_type is of type hex (vs. char).  The carriage return and line feed are represented with their hex ascii representation of 13 (hex=D) and 10 (hex=A). When specifying hex delimiters, they have to be separated with a space and prefixed with 0x.

These annotations will instruct the FF assembler or disassembler to map one Contact element to a line in the corresponding flat file.

The instructions on how to interpret the elements within each Contact are specified on the Contact node with the following annotations:

   24 <xs:appinfo>

   25               <b:recordInfo structure=delimitedchild_delimiter_type=charchild_delimiter=,child_order=infixsequence_number=1preserve_delimiter_for_empty_data=truesuppress_trailing_delimiters=false />

   26             </xs:appinfo>

 

These elements are also delimited but using commas instead of carriage return and line feed.

Using these default settings, and parsing the incoming CSV file, the output XML file is as follows:

    1 <Contacts xmlns=http://Tallan.BizTalk.SQLAdapterUpdategrams.Contacts_FF>

    2     <Contact xmlns=“”>

    3         <FirstName>Mario</FirstName>

    4         <LastName>Harik</LastName>

    5         <Age>32</Age>

    6     </Contact>

    7     <Contact xmlns=“”>

    8         <FirstName>Foo</FirstName>

    9         <LastName>Bar</LastName>

   10         <Age>12</Age>

   11     </Contact>

   12     <Contact xmlns=“”>

   13         <FirstName>John</FirstName>

   14         <LastName>Dough</LastName>

   15         <Age>47</Age>

   16     </Contact>

   17 </Contacts>

 

Simple Customizations
Character Wrapping or Escape Delimiters

In this example, it is unlikely for delimiters to appear within the boundaries of one of the Contact elements but this is a common scenario in CSV files. These are usually dealt with using wrapping characters or escape characters; for example:
“This is my, first name”, last name, 45
or
This is my \, first name, last name, 45

With our wizard generated schema, the following flat file:

    1 “Ma,rio”, Har\,ik, 32

 

results in the following XML:

    2 <Contact xmlns=“”>

    3   <FirstName>”Ma</FirstName>

    4   <LastName>rio”</LastName>

    5   <Age>Har\,ik, 32</Age>

    6   </Contact>

    7 </Contacts>

 

Which is not the desired result. Adding a wrap character to the field’s annotation can take care of nested commas; this is done on the element that needs to support wrapping (in this case, we add it to the FirstName element’s annotation):

   35 <xs:element name=FirstNametype=xs:string>

   36                 <xs:annotation>

   37                   <xs:appinfo>

   38                     <b:fieldInfo justification=leftsequence_number=1wrap_char_type=charwrap_char= />

   39                   </xs:appinfo>

   40                 </xs:annotation>

   41               </xs:element>

 

Specifying an escape character such as a “\” can be done on the record info annotation:

   24 <xs:appinfo>

   25               <b:recordInfo structure=delimitedchild_delimiter_type=charchild_delimiter=,child_order=infixsequence_number=1preserve_delimiter_for_empty_data=truesuppress_trailing_delimiters=falseescape_char_type=charescape_char=\ />

   26             </xs:appinfo>

 

These settings will then cause the line shown above to be parsed as:

    1 <Contact xmlns=“”>

    2     <FirstName>Ma,rio</FirstName>

    3     <LastName>Har,ik</LastName>

    4     <Age>32</Age>

    5 </Contact>

 

In the next post in this series, I’ll cover some of the other properties of flat files and how to use them to add further customizations than the one provided by the wizard.

4 Comments. Leave new

HI,

I would like to appreciate the solution that you have mentioned here and that has helped me a lot but there is a small query if you can help me in this I need to use biztalk but in my case instead of flat file I will be having a mdb(Access Database) file. So please can you give me a solution of how should I convert that in xml in biztalk

Hi Manuj,
You cannot get an Access MDB file into BizTalk using the flat file diassembler. You will either have to use a custom pipeline component or an adapter. The custom pipeline component can use ADO.NET to be able to query the MDB file for the data you want to build an XML message from. You can get an ODBC adapter from the Top XML BizTalk utilities that can be found at: http://www.topxml.com/biztalkutilities/. We’ll post a few entries about this in the near future on how to implement both approaches.

Hi Mario,
very interesting post. Could you, please, cover the other properties of the FF processing? Maybe it is possible to get us the more general view of the FF processor architecture?

Thanks!

The above information(Wrap Text) in creating FF schema was really helpful in solving my requriment.Appreciate your effort and thanks a lot for this.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

\\\