Integrating document BLOB storage with SQL Server

NoSQL platforms can support highly scalable databases with BLOB attachments (images, documents, and other files), but if you think you need to embrace a NoSQL solution in lieu of SQL Server just because you have a high volume of BLOBs in your database, then think again. Sure, if you have good reasons to go with NoSQL anyway – for example, if you want the flexibility of schema-free tables, and you can accept the compromises of eventual consistency – then NoSQL can fit the bill nicely.

But critical line-of-business applications often can’t afford the relaxed constraints of NoSQL databases, and usually require strongly typed schemas with full transactional integrity; that is, a full-fledged relational database management system (RDBMS). However, relational platforms like SQL Server were originally designed and optimized to work primarily with structured data, not BLOBs. So historically, it was never feasible to store large amounts of BLOB data directly in the database. That is, until FILESTREAM.

With FILESTREAM, Microsoft addresses the dilemma of storing BLOBs within the relational database. My new Pluralsight course, SQL Server 2012-2014 Native File Streaming, explains this innovative feature in detail, and in this blog post I’ll discuss how FILESTREAM (and its related technologies) can be used to implement a highly scalable BLOB storage solution that’s fully integrated with a relational SQL Server database. You’ll also find live demos of everything mentioned in this post in the course.

Introducing FILESTREAM

Although SQL Server was never originally intended to handle BLOBs at large scale, that changed with FILESTREAM (introduced in SQL Server 2008). Before FILESTREAM, SQL Server was forced to shove BLOBs into the standard database filegroups, which are really optimized for storing structured row data in 8 KB pages. Because BLOBs don’t fit naturally within this structure, they get pushed into off-row storage, which bloats the structured filegroups and ultimately kills performance.

[Image: BLOBs force-fit into the standard database filegroups, before FILESTREAM]

FILESTREAM changes all that. First, to be clear, FILESTREAM is not actually a data type. Rather, it’s an attribute that you apply to the varbinary(max) data type, the same data type that you would use to store BLOBs directly inside the row. But by merely appending the FILESTREAM attribute to the varbinary(max) data type, SQL Server takes a radically different approach to physical BLOB storage. Rather than inundating the standard database filegroups with BLOBs, SQL Server stores BLOB content as files in the file system – where they belong; the file system being a native environment optimized for storing and streaming unstructured binary content. At the same time, it establishes and maintains reference pointers between the rows in the standard filegroups and the files in the file system that are tied to varbinary(max) columns in those rows. All this magic occurs behind the scenes, and is totally transparent to any existing code that works with ordinary varbinary(max) columns.
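
To make this concrete, here’s a minimal sketch of a FILESTREAM-enabled database, along with the PhotoAlbum table used in the examples below (the database, filegroup, and file paths are illustrative assumptions):

-- All database, filegroup, and path names here are illustrative
CREATE DATABASE PhotoLibrary
 ON PRIMARY
  (NAME = PhotoLibrary_data, FILENAME = 'C:\Demo\PhotoLibrary_data.mdf'),
 FILEGROUP FileStreamGroup CONTAINS FILESTREAM
  (NAME = PhotoLibrary_blobs, FILENAME = 'C:\Demo\Photos')
 LOG ON
  (NAME = PhotoLibrary_log, FILENAME = 'C:\Demo\PhotoLibrary_log.ldf');
GO

USE PhotoLibrary;
GO

-- Tables with FILESTREAM columns require a ROWGUIDCOL column;
-- the Photo column's content is physically stored in the file system
CREATE TABLE PhotoAlbum(
  PhotoId int PRIMARY KEY,
  RowId uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
  PhotoDescription varchar(max),
  Photo varbinary(max) FILESTREAM DEFAULT(0x));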

[Image: with FILESTREAM, rows in the database filegroups point to BLOB files stored in the NTFS file system]

In this manner, the BLOB data is physically stored separately from the structured row data, but it is logically an integral part of the database. So, for example, backing up the database includes the BLOB data, with the option of performing a partial backup that excludes the FILESTREAM filegroup whenever you want smaller backups without the BLOB data.
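
For instance, a filegroup backup along these lines (a sketch using the illustrative names from above) captures the structured data while leaving the BLOBs out:

-- Back up only the primary filegroup, excluding the FILESTREAM filegroup
BACKUP DATABASE PhotoLibrary
 FILEGROUP = 'PRIMARY'
 TO DISK = 'C:\Demo\PhotoLibrary_NoBlobs.bak';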

Furthermore, this solution provides full transactional consistency – because FILESTREAM integrates with the NTFS file system, and NTFS is a transactional file system. So when you start a database transaction and insert a row, and that row includes BLOB data stored in a varbinary(max) FILESTREAM column, then SQL Server automatically initiates an NTFS file system transaction over that BLOB data. Then, the fate of the file system transaction hinges on the fate of the database transaction. If and when the database transaction commits, then SQL Server will also commit the NTFS file system transaction; similarly, rolling back the database transaction automatically rolls back the NTFS transaction.
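
Here’s a quick sketch of that behavior in action, against the PhotoAlbum table from above:

BEGIN TRANSACTION;

-- SQL Server creates a file in the FILESTREAM container for this BLOB...
INSERT INTO PhotoAlbum(PhotoId, PhotoDescription, Photo)
 VALUES(2, 'Canyon', 0x);

-- ...and rolling back the database transaction automatically rolls back
-- the NTFS file system transaction along with it
ROLLBACK TRANSACTION;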

Accessing BLOBs with T-SQL

With FILESTREAM, you can treat BLOBs as ordinary varbinary(max) columns in T-SQL. For example, you can use the OPENROWSET function with the BULK provider to import an external file into a varbinary(max) column, and if that column is decorated with the FILESTREAM attribute, then SQL Server will automatically store a copy of that file as a BLOB in the NTFS file system behind the scenes, rather than force-fitting it into the standard database filegroups.

For example:

INSERT INTO PhotoAlbum(PhotoId, PhotoDescription, Photo)
  VALUES(
    3,
    'Mountains',
    (SELECT BulkColumn FROM OPENROWSET(BULK 'C:\Demo\Ascent.jpg', SINGLE_BLOB) AS x))

To retrieve BLOBs, it’s a simple SELECT:

SELECT * FROM PhotoAlbum
[Image: query results, with the Photo column returning each BLOB’s binary content]

Using SqlFileStream and the Streaming API

Although FILESTREAM delivers scalable storage by leveraging the NTFS file system behind the scenes, BLOB access needs to scale as well. It’s great that you can enjoy total transparency by just using T-SQL, but stop for a moment and think about what SQL Server needs to do when retrieving BLOBs with T-SQL. To serve up the Photo column in the resultset shown above, for example, SQL Server needed to read the entire contents of each BLOB from the NTFS file system it manages internally, and this can quickly place a great deal of memory pressure on the server.

To address this concern, FILESTREAM exposes a streaming API. When you use this API, SQL Server still manages the file system behind the scenes, but it shifts the burden and memory requirements of actually reading and writing BLOBs in the file system off of itself and onto the client application. This keeps the memory requirements on SQL Server very lean, regardless of how large your BLOBs may be.

The SqlFileStream class is a managed code wrapper around the streaming API that makes it extremely easy to use from .NET. In C# or VB .NET, you start a database transaction and issue an INSERT statement, but you don’t actually include the BLOB content in the INSERT. Instead, SQL Server passes back the information you need to create a SqlFileStream object. This object inherits from the base System.IO.Stream class, meaning it supports the same read/write methods as the other standard .NET stream classes – memory streams, HTTP request/response streams, and local file streams – so it’s easy to stream your BLOBs in and out, using buffers in memory allocated to your application, not SQL Server. Then you just commit the database transaction, and SQL Server automatically commits the NTFS file system transaction at the same time.
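
On the T-SQL side, the two pieces of information that SqlFileStream needs – the BLOB’s logical path name and a token for the current transaction – come back like this (a sketch against the PhotoAlbum table above; it must run inside an open transaction):

BEGIN TRANSACTION;

-- Insert a zero-length BLOB as a placeholder to stream content into
INSERT INTO PhotoAlbum(PhotoId, PhotoDescription, Photo)
 VALUES(4, 'Waterfall', 0x);

-- PathName() and the transaction context are what the client passes
-- to the SqlFileStream constructor
SELECT Photo.PathName() AS Path,
       GET_FILESTREAM_TRANSACTION_CONTEXT() AS TxContext
 FROM PhotoAlbum
 WHERE PhotoId = 4;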

In my course, I show you SqlFileStream up close, and demonstrate how to program against the streaming API from a thick client application, a client/server (Web) application, and in an n-tier (WCF) scenario as well.

Introducing FileTable

The FILESTREAM story only gets better with FileTable, added in SQL Server 2012. While FILESTREAM revolutionizes BLOB storage in SQL Server, it’s only accessible to developers and administrators. What about ordinary users? They’re certainly not going to write T-SQL or streaming API code to access BLOBs. And there’s also no way for ordinary client applications to access FILESTREAM data.

The solution is FileTable, which combines FILESTREAM with the hierarchyid data type to furnish an “emulated” file system; that is, a file system that users and applications can work with, but which is really a FileTable in a SQL Server database. A FileTable is just a regular table except that it has a fixed schema; specifically, it has these pre-determined columns for the metadata of the emulated file system:

[Image: the fixed FileTable schema, including stream_id, file_stream, name, path_locator, is_directory, and the file attribute and timestamp columns]
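
Creating a FileTable is refreshingly simple, precisely because the schema is fixed; here’s a sketch (the table and directory names are illustrative, and non-transacted access must first be enabled at the database level):

-- Enable non-transacted access so users and applications can reach
-- the emulated file system through the Windows file share
ALTER DATABASE PhotoLibrary
 SET FILESTREAM (NON_TRANSACTED_ACCESS = FULL, DIRECTORY_NAME = N'PhotoLibrary');
GO

-- No column list; every FileTable gets the same fixed schema
CREATE TABLE PhotoDocuments AS FileTable
 WITH (FILETABLE_DIRECTORY = N'PhotoDocuments');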

Every row in a FileTable represents either a file or a folder (depending on the is_directory column), and the hierarchyid value in the path_locator column is what implements the folder structure of the emulated file system. The hierarchyid data type has methods that you can use to query and manipulate the structure; for example, you can programmatically move entire subtrees from one parent to another.
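
For instance, here’s a sketch of moving the contents of one folder into another by reparenting the subtree (the FileTable and folder names are hypothetical):

DECLARE @OldParent hierarchyid, @NewParent hierarchyid;

SELECT @OldParent = path_locator FROM PhotoDocuments WHERE name = 'Archive';
SELECT @NewParent = path_locator FROM PhotoDocuments WHERE name = 'Current';

-- Rewrite each descendant's path_locator so it hangs off the new parent
UPDATE PhotoDocuments
 SET path_locator = path_locator.GetReparentedValue(@OldParent, @NewParent)
 WHERE path_locator.IsDescendantOf(@OldParent) = 1
  AND path_locator <> @OldParent;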

For rows that represent files, the file_stream column holds the actual BLOB, and this is a varbinary(max) FILESTREAM column. So behind the scenes, it is stored in the NTFS file system just as a varbinary(max) FILESTREAM column in an ordinary (non-FileTable) table would be.

And so, in addition to being able to use T-SQL or the streaming API with a FileTable, the emulated file system that a FileTable represents also gets exposed to users and client applications via a Windows file share. As a result, changes made to the table in the database are reflected in the emulated file system, and conversely, changes made to the emulated file system by users or client applications are reflected automatically in the database, which ultimately pushes down into the physical NTFS file system being used for BLOB storage behind the scenes.
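
You can ask the database where that file share surfaces; a sketch using the hypothetical FileTable above (the file name is illustrative too):

-- The UNC root of the FileTable's emulated file system
SELECT FileTableRootPath(N'dbo.PhotoDocuments') AS RootPath;

-- The full UNC path of an individual file row
SELECT file_stream.GetFileNamespacePath(1) AS FullPath
 FROM PhotoDocuments
 WHERE name = 'ReadMe.txt';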

[Image: a FileTable exposing its emulated file system to users and applications through a Windows file share]

Summary

This blog post explained FILESTREAM and its related feature, FileTable. We first saw how FILESTREAM supports scalable BLOB storage using the NTFS file system behind the scenes, and provides transparent T-SQL access using the varbinary(max) data type. We also learned about the streaming API and SqlFileStream, which shift the burden and memory requirements for streaming BLOBs off of SQL Server and onto client applications, providing scalable BLOB access. Finally, we saw how FileTable combines FILESTREAM with the hierarchyid data type to furnish an emulated file system on the front end that users and client applications can interact with, but which in actuality is just a table in the database.

And so, with FILESTREAM, line-of-business applications can embrace scalable BLOB integration without being forced to consider a NoSQL alternative to SQL Server.

Posted in SQL Server

A Better ReadPropertyBag in BizTalk Pipeline components

Just a quick one today. I’ve been working on several BizTalk pipeline components lately, and had been using some code that’s been floating around here for a while – I’ve found it posted on other blogs and message boards as well, but the earliest posting I’ve found is from another Tallan blog post from 2006: http://blog.tallan.com/2006/11/22/custom-pipeline-components-part-2-archive-same-continued/.

The relevant code from that post is this:

Read More »

Posted in BizTalk, Biztalk Tutorial

Calling a custom pipeline programmatically from within a BizTalk Orchestration

Interfaces are oftentimes required to process email attachments received through the POP3 adapter. While recently working on a project with a use case such as this, we came across a requirement where the attachment not only had to be separated/detached from the email as a body part, but also had to be reformatted/transformed from a CSV file into a standard BizTalk XML message format.

Typically, the best practice and approach to parsing a flat file is to use the Flat File Schema Wizard with an example of your file to generate the appropriate XML schema. You would then create a custom pipeline and place a flat file disassembler component in the Disassemble stage, configuring the component with the schema generated by the wizard:

[Image: the Flat File Schema Wizard]

Read More »

Posted in BizTalk, Biztalk Tutorial

Embedding Power View Reports Into Your Blog/Site

For all of you out there who have had challenges embedding your Power View dashboards into your public-facing websites, here are the fruits of my labors in attempting – and finally achieving – success in doing just that. There are no new revelations here; consider this to be more of a compendium of all the important information that’s out there on this subject.

Making your Power View Report Available to the Public

This was one of the biggest challenges I encountered. While there are many examples of how to create a guest link, none of the resources I found gave the full picture of what you need to do to make this possible.

Read More »

Posted in Business Intelligence, PowerView, SharePoint

Dynamics CRM 2013 Reporting: Error Pre-Filtering Report w/ Multiple DataSources

On a recent client engagement, I had the task of making one of our already existing Dynamics CRM reports use pre-filtering. This is normally a very straightforward procedure, and you can find the steps at the following link: https://msdn.microsoft.com/en-us/library/bb955092.aspx.

Now, given the complexity of the query used to populate this specific report, I needed to use explicit pre-filtering. Once again, this is something that should be fairly straightforward to do. After I finished adjusting the report to use explicit pre-filtering, I deployed my report out to CRM. But when I tried running the report, I got the following error: "The expected parameter has not been supplied for the report."

Read More »

Posted in Dynamics CRM, Reporting Services

BizTalk Orchestration handling webHttp REST GET requests

Consider the following scenario: you have a tested BizTalk Orchestration that effectively takes a parameter (say, a message identifier) and returns the corresponding message from a database for consumption.  Now you want to expose a WCF service for consumers of that orchestration to use.  While you could design a SOAP-based interface, a simple RESTful interface that accepts a GET verb would do the trick more elegantly (and in a way that would be much easier for clients to work with). It may also meet a requirement where you already have clients that expect a RESTful interface for data retrieval.

There’s plenty out there on handling REST GET requests using BizTalk Orchestrations as a consumer:

Unfortunately, the same can’t quite be said about handling REST GET requests as a provider; there are resources:

…but this side of the equation tends to focus on POST and PUT verbs where there will be a message body.  The GET verb is not going to provide a message body (only parameters).

Using the above two articles should be enough to get a WCF webHttp service deployed and ready, and I’m not going to rehash every step of that process (MSDN and Richard Seroter provide excellent instructions including pictures and diagrams).

However, when you get to the configuration of the receive location, there is some additional work needed to properly handle GET requests:

Read More »

Posted in BizTalk, Biztalk Tutorial

BizTalk WCF Timeout Issue

Background

On a recent engagement, I was developing a BizTalk 2013 solution that received XML-based data from several sources and ultimately inserted rows into a SQL Server database using a stored procedure. An additional requirement was that the database connection had to be dynamic.

As with most development, everything worked fine in the development environment and even in initial UAT testing. A timeout issue started to appear intermittently as the size of the data being processed increased and the content of the database grew; rerunning the same test data through would often complete successfully. The details of the message logged to the Windows Event Log were:

A message sent to adapter "WCF-Custom" on send port "WCF.Timeout_1.0.0.0_WCF.Timeout.ProcessDataToSQL_Dyn_WCFTest_c495b1996f35b303" with URI "mssql://xxxx//Test?" is suspended.

Error details: System.Data.SqlClient.SqlException (0x80131904): Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding. ---> System.ComponentModel.Win32Exception (0x80004005): The wait operation timed out

I captured a SQL Profiler trace for the offending stored procedure and reran it in SSMS. For one of the test cases, I noticed the processing time took between 50 seconds and 1 minute 10 seconds, and I immediately remembered that the default timeout on WCF adapter calls is 1 minute.

Since the send port is dynamic, there are no binding configuration properties to set, so I added the following line to the message assignment shape:

SQL_Request(WCF.SendTimeout) = "00:10:00";

I then redeployed the solution, sent a test message, and received the exact same timeout message! Next, I figured maybe it was one of the “other” timeout settings being triggered, so I added the following additional lines to the message assignment shape:

SQL_Request(WCF.OpenTimeout) = "00:10:00";

SQL_Request(WCF.CloseTimeout) = "00:10:00";

I then redeployed the solution, sent a test message and received the same message again. That wasn’t it. What to do now?

Read More »

Posted in BizTalk, Biztalk Tutorial

Introducing the T-Connect EDI Validator!

All of us who have worked on EDI integration know the pain of maintaining the integrity of the EDI messages coming into the system, as well as the value of validating them right at the pipeline gate. To aid in this, Microsoft BizTalk Server supports out-of-the-box verification of structural integrity with extended schema validation settings.  The great part about this is that it allows us to reject any invalid incoming EDI transactions right out of the gate and return those results in the generated ACK file. This way, we ensure that we don’t let invalid data into the system, while also making the sender aware of our “bad data” rejection.

However, as great as this feature is, it still leaves a major validation pain point.

Let’s consider the following scenario.

  1. Let’s say in the case of EDI 837 transactions, we consider REF*D9 as the claim identifier within our system and we want to ensure that there are no duplicate claims within a given interchange.
  2. In the same transaction assume the trading partner is sending some other custom information like a Return Date with an NTE Segment, which has a specific format.
  3. Since we are talking about HIPAA Claims, our organization only processes claims that are higher than a certain threshold.

Read More »

Posted in Announcements, BizTalk

Validating an untyped XML message in a BizTalk Orchestration

Receiving an untyped message (a System.Xml.XmlDocument) in a BizTalk orchestration offers a lot of flexibility in design.  The same code can be used for many different message types, and schema changes don’t require reworking your ports and receive shapes.  However, you lose some of the advantages of working with strongly typed messages – your orchestration could very well get a malformed message that is invalid by structure or content, and you won’t know until further down the line when another component that does validate the message (such as a send port) throws an exception.  Many times this isn’t a concern, but I was recently working on a solution where the orchestration received untyped messages and needed to know whether the message would validate against its schema or not.  Other components in the integration that processed the message later were not open for modification, and since the message was being constructed by the orchestration I couldn’t just add the XML Validator component to the receive pipeline for the orchestration.

There are a few different ways to handle this scenario. XmlDocument provides a Validate() method that takes a callback function to which it sends warnings and errors.  This method requires that the correct schema for the message type be added to the document first.  This could be accomplished with a C# helper class:

Read More »

Posted in 0-Uncategorized, BizTalk, Biztalk Tutorial

New Update to the T-Connect EDI Splitter Now Available!


In September of last year, we released the T-Connect EDI Splitter for BizTalk. We are proud to announce the release and immediate availability of the newest version of the T-Connect EDI Splitter. For the uninitiated, the EDI File Splitter is a Microsoft BizTalk Server pipeline component that enables the rapid splitting of large EDI files within BizTalk Server; the split files are delivered to a user-configurable folder location. Along with additional features added to the EDI Splitter for BizTalk, we are also releasing a standalone application for Microsoft Windows, available today: the T-Connect EDI Splitter for Windows.

Read More »

Posted in Announcements, BizTalk