
Working with FILESTREAM BLOBs in BizTalk

Dan Field

MSDN provides an example of INSERTing large data into SQL Server, leveraging the WCF-SQL adapter's built-in FILESTREAM capabilities.  However, it's also possible to leverage the transaction enlisted by the WCF adapter in a custom pipeline to pull FILESTREAM data out of SQL Server more efficiently than the more common SELECT … FOR XML query, which simply grabs the FILESTREAM content and stuffs it into an XML node in the resulting document.

Imagine, for example, you had a large document of some sort (XML, Flat File, etc.) to store in SQL Server that BizTalk would need to process from a FILESTREAM table defined like so:

CREATE TABLE [tFilestreamDemo] (
    [CreateDate] DATETIME NOT NULL,
    [CreateUser] VARCHAR (50) NOT NULL,
    [ID] UNIQUEIDENTIFIER DEFAULT (NEWSEQUENTIALID()) ROWGUIDCOL NOT NULL,
    [Metadata1] INT NOT NULL,
    [MetadataDate] DATE NOT NULL,
    [Metadata3] VARCHAR (50) NULL,
    [XmlBlob] VARBINARY (MAX) FILESTREAM NOT NULL,
    CONSTRAINT [PK_tFilestreamDemo] PRIMARY KEY CLUSTERED ([ID] ASC)
) FILESTREAM_ON [FileStreamGroup]; -- [FileStreamGroup] is a placeholder for your FILESTREAM filegroup

The tradeoff here is losing the typed XML data in favor of more efficient storage and access to larger file objects (especially when the data will, on average, be large).  This can make a vast difference if you have to store a large (>100MB) XML file in the database for some reason.
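
As a point of reference (this is a sketch, not the MSDN/adapter sample itself), a blob could be written into this table directly from .NET along these lines. The connection string, source path, and metadata values below are placeholders; the pattern is to insert a zero-length blob, ask SQL Server for its FILESTREAM path and transaction context, then stream the file in with SqlFileStream:

using System;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.IO;
using System.Transactions;

static void WriteBlob(string connectionString, string sourcePath)
{
    // Open our own transaction so GET_FILESTREAM_TRANSACTION_CONTEXT() returns a token
    using (TransactionScope scope = new TransactionScope())
    using (SqlConnection conn = new SqlConnection(connectionString))
    {
        conn.Open();

        // Insert a zero-length blob, then ask SQL Server where its FILESTREAM file lives
        SqlCommand cmd = conn.CreateCommand();
        cmd.CommandText = @"
            DECLARE @id UNIQUEIDENTIFIER = NEWID();
            INSERT INTO tFilestreamDemo (ID, CreateDate, CreateUser, Metadata1, MetadataDate, Metadata3, XmlBlob)
            VALUES (@id, GETDATE(), SUSER_SNAME(), 1, GETDATE(), 'sample', 0x);
            SELECT XmlBlob.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT()
            FROM tFilestreamDemo WHERE ID = @id;";

        string path;
        byte[] txContext;
        using (SqlDataReader rdr = cmd.ExecuteReader())
        {
            rdr.Read();
            path = rdr.GetString(0);
            txContext = (byte[])rdr.GetValue(1);
        }

        // Stream the source file straight into the FILESTREAM store
        using (FileStream source = File.OpenRead(sourcePath))
        using (SqlFileStream dest = new SqlFileStream(path, txContext, FileAccess.Write))
            source.CopyTo(dest);

        scope.Complete();
    }
}

The zero-length insert is what gives SQL Server a file to hand back via PathName().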

It would be possible to extract data from this table by writing a procedure as follows:

CREATE PROCEDURE usp_Sel_FilestreamDemo
(
    @MetadataDate DATE
)
AS
SET NOCOUNT ON
SET XACT_ABORT ON
BEGIN
    ;WITH XMLNAMESPACES('http://sql_message_target_namespace.com' as ns0)
    SELECT
        Metadata1 as 'ns0:Metadata1'
       ,MetadataDate as 'ns0:MetadataDate'
       ,Metadata3 as 'ns0:Metadata3'
       ,CAST(XmlBlob as XML) as 'ns0:XmlBlob' -- note: if we don't cast it, we'll get it as Base64
    FROM tFileStreamDemo
    WHERE MetadataDate = @MetadataDate -- filter on the polling parameter
    FOR XML PATH('ns0:Root')
END

This will result in a very large message, like so (assuming XmlPolling is used – the same principle would apply to TypedPolling):

<SqlAdapterWrapperNode xmlns="http://namespace_adapter_is_Configured_to_use.com">
  <ns0:Root xmlns:ns0="http://sql_message_target_namespace.com">
    <ns0:Metadata1>1</ns0:Metadata1>
    <ns0:MetadataDate>2016-02-29</ns0:MetadataDate>
    <ns0:Metadata3>Metadata3</ns0:Metadata3>
    <ns0:XmlBlob><![CDATA[<100 MB of XML text here.../>]]></ns0:XmlBlob>
  </ns0:Root>
  <ns0:Root>
    ...
  </ns0:Root>
</SqlAdapterWrapperNode>

This message will be problematic for a few reasons:

  1. It will cause additional load on SQL Server when it tries to retrieve a lot of large data to send over the pipe
  2. It will greatly increase the likelihood of connection timeouts while that data is being retrieved
  3. It will be one more large XML message BizTalk will have to track.

To avoid this, we could instead utilize the SqlFileStream class with the following procedure:

CREATE PROCEDURE usp_Sel_FilestreamDemo
(
    @MetadataDate DATE
)
AS
SET NOCOUNT ON
SET XACT_ABORT ON
BEGIN
    ;WITH XMLNAMESPACES('http://sql_message_target_namespace.com' as ns0)
    SELECT
        Metadata1 as 'ns0:Metadata1'
       ,MetadataDate as 'ns0:MetadataDate'
       ,Metadata3 as 'ns0:Metadata3'
       ,XmlBlob.PathName() as 'ns0:FilePath' -- this will give us a path to pass in the disassembler to the actual FILESTREAM file
       ,GET_FILESTREAM_TRANSACTION_CONTEXT() as 'ns0:TxContext' -- this will give us the transaction context to use in that call
    FROM tFileStreamDemo
    WHERE MetadataDate = @MetadataDate -- filter on the polling parameter
    FOR XML PATH('ns0:Root')
END

This procedure will be much more efficient. Instead of making the SQL Server process retrieve the entire blob, convert it, and serialize it into the XML result, we simply return the file's path handle and the transaction context needed to read it. The resulting XML will look more like this:

<SqlAdapterWrapperNode xmlns="http://namespace_adapter_is_Configured_to_use.com">
  <ns0:Root xmlns:ns0="http://sql_message_target_namespace.com">
    <ns0:Metadata1>1</ns0:Metadata1>
    <ns0:MetadataDate>2016-02-29</ns0:MetadataDate>
    <ns0:Metadata3>Metadata3</ns0:Metadata3>
    <ns0:FilePath>\\server_name\path\to\mssql\filestream\volumeinfo</ns0:FilePath>
    <ns0:TxContext>0x1C2D7FD5DC09164EA21D3AFD27611A6D</ns0:TxContext>
  </ns0:Root>
  <ns0:Root>
    ...
  </ns0:Root>
</SqlAdapterWrapperNode>

And this will be much easier for BizTalk to process, even if multiple records are returned in the resultset. Now, loading that file can be deferred. If you want to load that file in the receive location, you can use the TxContext value that has been passed – the location will be operating under the same transaction as the adapter (assuming UseAmbientTransaction is set to true).

Following that path, here’s what the code for the Disassembler methods would look like (a Disassembler is required here because we’re assuming we might want to handle multiple files per message – if not, then it’d be fine to write a somewhat simpler Decoder component instead, but the rough idea would be the same):

The Disassemble method:

private Queue<IBaseMessage> qOutputMsgs = new Queue<IBaseMessage>();
private const string _sysProps = "http://schemas.microsoft.com/BizTalk/2003/system-properties";

public void Disassemble(IPipelineContext pContext, IBaseMessage pInMsg)
{
  Stream originalStream = pInMsg.BodyPart.GetOriginalDataStream();
  try
  {
    HashSet<string> propNamesToIgnore = new HashSet<string>()
    {
      "MessageType", "MessageID", "SchemaStrongName"
    };
    XmlReader reader = XmlReader.Create(originalStream);
    pContext.ResourceTracker.AddResource(reader);

    reader.MoveToContent();
    // we're effectively transforming the message
    // replace with your actual MessageType
    // get the IDocumentSpec so that we can properly promote SchemaStrongName for later mapping activities
    string msgType = "http://destination_message_target_namespace.com#Root";
    IDocumentSpec docSpecName = pContext.GetDocumentSpecByType(msgType);

    string connectionString = SSOHelper.GetSetting("ConnectionString"); // this could be done using BTDF or the Microsoft SSO Client sample
    IBaseMessageFactory msgFactory = pContext.GetMessageFactory();
    // open a new connection, which will implicitly use the MSDTC transaction from the adapter
    // we'll be able to use the FILESTREAM Transaction Context we received from the procedure since we're in the same transaction
    // only need one connection to get multiple files out
    using (SqlConnection conn = new SqlConnection(connectionString))
    {
      conn.Open();
      while (reader.ReadToFollowing("Metadata1"))
      {
        IBaseMessage outMsg = msgFactory.CreateMessage();
        outMsg.AddPart("Body", msgFactory.CreateMessagePart(), true);
        reader.Read();
        string metadata1 = reader.Value; // do whatever you want with this

        reader.ReadToFollowing("FileName");
        reader.Read();
        string fsName = reader.Value;

        reader.ReadToFollowing("TxContext");
        reader.Read();
        byte[] txContext = Convert.FromBase64String(reader.Value);
        VirtualStream vts = new VirtualStream();

        // this does the heavy lifting - load the file into a virtual stream
        // FileOptions.SequentialScan works well here because we're just sequentially copying the file to a new stream
        using (SqlFileStream sqlFileStream = new SqlFileStream(fsName, txContext, FileAccess.Read, FileOptions.SequentialScan, 0))
          sqlFileStream.CopyTo(vts); // this is fast.

        vts.Position = 0;
        pContext.ResourceTracker.AddResource(vts);

        outMsg.BodyPart.Data = vts;

        CopyProperties(pInMsg, outMsg, propNamesToIgnore);

        outMsg.Context.Promote("MessageType", _sysProps, msgType);
        outMsg.Context.Write("SchemaStrongName", _sysProps, docSpecName.DocSpecStrongName);

        qOutputMsgs.Enqueue(outMsg);
      }
    }
  }
  catch (Exception e)
  {
    // do whatever appropriate logging
    Logger.LogError("FILESTREAM Disassembler encountered exception:\r\n\r\n" + e.ToString());
    throw;
  }
}

A helper method to copy properties:

/// <summary>
/// Copies properties from one IBaseMessage to another
/// </summary>
/// <param name="pInMsg">Source message</param>
/// <param name="outMsg">Destination Message</param>
/// <param name="propsToIgnore">Optional - hashset of property names to ignore</param>
/// <returns>Number of properties copied</returns>
private static int CopyProperties(IBaseMessage pInMsg, IBaseMessage outMsg, HashSet<string> propsToIgnore = null)
{
  uint propCount = pInMsg.Context.CountProperties;

  int copiedProperties = 0;
  for (int i = 0; i < propCount; i++)
  {
    string propName;
    string propNS;
    object propVal = pInMsg.Context.ReadAt(i, out propName, out propNS);
    if (propsToIgnore == null || !propsToIgnore.Contains(propName))
    {
      if (pInMsg.Context.IsPromoted(propName, propNS))
      {
        outMsg.Context.Promote(propName, propNS, propVal);
      }
      else
      {
        outMsg.Context.Write(propName, propNS, propVal);
      }
      copiedProperties++;
    }
  }

  return copiedProperties;
}

And the GetNext method:

public IBaseMessage GetNext(IPipelineContext pContext)
{
  if (qOutputMsgs.Count > 0)
  {
    IBaseMessage outMsg = qOutputMsgs.Dequeue();
    return outMsg;
  }
  else
    return null;
}
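
For completeness, these methods would sit inside an otherwise standard custom pipeline component class. Here's a rough sketch of that shell; the class name and GUID are placeholders, and the Disassemble, GetNext, CopyProperties, and queue members shown above would be dropped in:

using System;
using System.Collections;
using System.Runtime.InteropServices;
using Microsoft.BizTalk.Component.Interop;
using Microsoft.BizTalk.Message.Interop;

[ComponentCategory(CategoryTypes.CATID_PipelineComponent)]
[ComponentCategory(CategoryTypes.CATID_DisassemblingParser)]
[Guid("00000000-0000-0000-0000-000000000000")] // generate a real GUID for your component
public class FileStreamDisassembler : IBaseComponent, IComponentUI, IPersistPropertyBag, IDisassemblerComponent
{
  // IBaseComponent - identifies the component in the pipeline designer
  public string Name { get { return "FILESTREAM Disassembler"; } }
  public string Version { get { return "1.0"; } }
  public string Description { get { return "Loads FILESTREAM blobs referenced by the polled message"; } }

  // IComponentUI
  public IntPtr Icon { get { return IntPtr.Zero; } }
  public IEnumerator Validate(object projectSystem) { return null; }

  // IPersistPropertyBag - no design-time properties in this sketch
  public void GetClassID(out Guid classID) { classID = new Guid("00000000-0000-0000-0000-000000000000"); }
  public void InitNew() { }
  public void Load(IPropertyBag propertyBag, int errorLog) { }
  public void Save(IPropertyBag propertyBag, bool clearDirty, bool saveAllProperties) { }

  // IDisassemblerComponent - paste in the Disassemble and GetNext methods
  // (plus the qOutputMsgs queue and CopyProperties helper) shown above
  public void Disassemble(IPipelineContext pContext, IBaseMessage pInMsg) { /* as above */ }
  public IBaseMessage GetNext(IPipelineContext pContext) { /* as above */ return null; }
}

The DisassemblingParser category is what allows the component to be dropped into the Disassemble stage of a receive pipeline.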

While this will still load the large message into BizTalk, it will do so much more quickly (testing took processing down from ~5000ms per file per message to ~2ms per message (!) in my dev environment). It will also significantly reduce load on the SQL Server process (and, to a lesser extent, the BizTalk process), freeing up valuable resources for other applications.

The same principles could be used in an orchestration or later component, but would require actually creating a transaction (or piggybacking off the MSDTC transaction in a send pipeline) and calling GET_FILESTREAM_TRANSACTION_CONTEXT() to get the transaction context token. Lenni Lobel has an excellent blog on how to do that here – the same .NET calls would apply in some custom orchestration or pipeline code, with the possible exception of using System.Transactions (to piggyback off MSDTC) instead of a regular SqlTransaction.
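
As a rough sketch of that approach (this is just the general pattern, not the code from that post; the connection string and key are placeholders):

using System;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.IO;
using System.Transactions;

static Stream ReadBlob(string connectionString, Guid id)
{
  MemoryStream output = new MemoryStream();

  // TransactionScope enlists the connection (escalating through MSDTC if another resource joins);
  // a plain SqlTransaction would also satisfy GET_FILESTREAM_TRANSACTION_CONTEXT()
  using (TransactionScope scope = new TransactionScope())
  using (SqlConnection conn = new SqlConnection(connectionString))
  {
    conn.Open();

    SqlCommand cmd = conn.CreateCommand();
    cmd.CommandText =
      "SELECT XmlBlob.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT() " +
      "FROM tFilestreamDemo WHERE ID = @ID";
    cmd.Parameters.AddWithValue("@ID", id);

    string path;
    byte[] txContext;
    using (SqlDataReader rdr = cmd.ExecuteReader())
    {
      if (!rdr.Read()) throw new InvalidOperationException("No row found for " + id);
      path = rdr.GetString(0);
      txContext = (byte[])rdr.GetValue(1);
    }

    // Stream the FILESTREAM file out while the transaction is still open
    using (SqlFileStream source = new SqlFileStream(path, txContext, FileAccess.Read))
      source.CopyTo(output);

    scope.Complete();
  }

  output.Position = 0;
  return output;
}

Note that the copy has to happen before the scope completes; the SqlFileStream handle is only valid while the transaction that produced the context token is open.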
