Overview of Simple Parallel Programming with .NET 4.0

Today’s modern day computers contain multi-core hardware capable of performing multiple tasks simultaneously, also known as parallel processing. Prior to .Net 4.0 the norm was to use synchronous operations to perform multi-threading. Although some level of synchronization logic is needed for data that is shared, the simple .NET 4.0 multi-threading technique presented in this article is excellent for scenarios when you have large data and do not have shared dependencies between the data.

To use the parallel feature in .NET 4.0 you need to use the namespace System.Threading.Tasks. Then you simply take the list structure that you want to traverse and iterate through it using Parallel.ForEach(IEnumerable<TSource> source, Action<TSource> body). For example,

Parallel.ForEach(someLargeList, item =>

{

doSomething();

});.

For information on synchronized parallel programming, take a look at http://reedcopsey.com/2010/01/22/parallelism-in-net-part-4-imperative-data-parallelism-aggregation/. The author goes over how you can synchronize dependent data using the .NET 4.0 multithreading feature.

Posted in .NET Framework, Enterprise .NET | Leave a comment

A Look at Optional Parameters, Named Parameters and Permanent Redirect in .NET 4.0

While getting acquainted with VS 2010 and .NET 4.0, I’ve come across some improvements that that I’ll be sharing in this blog. Three features in particular are Optional Parameters, Named Parameters and Permanent Redirect.

Before .NET 4.0 one would have to overload methods if they wanted the ability to implement optional parameters. With this latest version of .NET, developers simply have to add the optional parameter in the last position of parameters as displayed below.

Example of Optional Parameters

Example of Optional Parameters

Another new feature in .NET 4.0 is named parameters, which allow you to ignore the parameter order and mention parameters with names in a different order. For example:

Example of Named Parameters

Example of Named Parameters

Although Microsoft coins this as an improvement, I find that this gives way for confusing code that may be hard to manage. For instance, imagine reading several lines of code, written by someone else, with methods having parameters all jumbled in no particular order.

Finally, in situations where we want to redirect users to a new page permanently and update the search engine’s indexes, a new command Response.RedirectPermanent can be used in .NET 4.0. In previous versions of .NET, Response.Redirect was used to do the redirection. The caveat in the aforementioned approach was that the header information for the page would not be updated, therefore allowing search engines to keep the old information of the page in their indexes for search. With the new command Response.RedirectPermanent, header information is updated on the server, the user is directly redirected to the new page and it saves a double trip to server as well. In addition, search engines, on re-indexing of that site and page, update the header information in their index for better performance on searches.

Posted in .NET Framework, Enterprise .NET | Leave a comment

Adding CROSS APPLY to your bag of T-SQL tricks

Recently I found myself stuck on a problem while constructing a query for a stored procedure. Like any other developer when I get stuck I start looking through my “bag of tricks” to see if I can find something that will help me solve this problem, but this time I came up empty handed. That’s when I started reading and learned about the APPLY operator, mainly CROSS apply. Many of you may be well versed with this operator already but if you are not, it may come in handy sometime in the future.

According to MSDN the APPLY operator allows you to invoke a table-valued function for each row returned by an out table expression of a query. The table-valued function acts as the right input and the outer table expression acts as the left input. The right input is evaluated for each row from the left input and the rows produced are combined for the final output. The list of columns produced by the APPLY operator is the set of columns in the left input followed by the list of columns returned by the right input. So to reiterate what MSDN is trying to get across is that when you use the APPLY operator each row of the output table of the table valued function (a function that returns a table rather than a scalar value) or sub-query is evaluated for each row of the outer table (your main query). So for each row of your main query, some function or sub query is being applied to it using data from that row; the results of that function or sub-query are then being added to row.

The APPLY operator comes in two flavors, CROSS APPLY and OUTER APPLY. CROSS APPLY returns only the rows from the outer table (main query) which have a result set from the table valued function. This is similar to how an INNER JOIN functions. On the other hand OUTER APPLY returns both rows that produce results and those that do not. For the rows where results were not produced NULL values will be present.

Always keep performance in mind; every time a table-value function is run there will have to be at least one scan. This means that although using the APPLY operator can be great fun, it has its time and place and shouldn’t be used all the time (Although, I know it can be tempting). I hope this helps clear up the APPLY operator so that you can use it in the future when you get stuck on a problem.

Posted in SQL Server, Tips and Tricks | Leave a comment

It’s a File System… It’s a Database Table… It’s SQL Server Denali FileTable!

SQL Server 2008 introduced FILESTREAM, an innovative feature that integrates the relational database engine with the NTFS file system to provide highly efficient storage and management of BLOBs. Now FileTable, a new feature coming in the next version of SQL Server (code-named “Denali”), builds on FILESTREAM and takes database BLOB management to the next level. In this post, I’ll explain FileTable and show you how it works. I also encourage you to experiment with FileTable yourself by downloading and installing SQL Server Denali CTP3 from http://bit.ly/DenaliCTP3 (but be sure to install it on a virtual machine; beta software can be difficult or impossible to uninstall).

Note: FileTable relies on FILESTREAM, so you need to understand FILESTREAM to fully understand FileTable. If you’re new to FILESTREAM, you can get the necessary background by reading Introducing FILESTREAM, Enabling and Using FILESTREAM, and Using SqlFileStream.

FileTable combines FILESTREAM with hierarchyid (another SQL Server 2008 feature) and the Windows file system API to deliver new and exciting BLOB capabilities in SQL Server. Like the two words joined in its name, one FileTable functions as two distinct things simultaneously:

  1. A FileTable is an Ordinary Table
  2. A FileTable is an Ordinary File System

First and foremost, a FileTable is a regular SQL Server database table in every respect, with one exception: The schema of a FileTable is fixed. The columns of a FileTable and their data types are pre-determined by SQL Server. Specifically, every FileTable contains these columns:

Column Name Data Type Description
stream_id uniqueidentifier ROWGUIDCOL Unique row identifier
file_stream varbinary(max) FILESTREAM BLOB content (NULL if directory)
name nvarchar(255) Name of file or directory
path_locator hierarchyid Location of file or directory within the logical file system
creation_time datetimeoffset(7) Created
last_write_time datetimeoffset(7) Last Modified
last_access_time datetimeoffset(7) Last Accessed
is_directory bit 0=file, 1=directory
is_offline bit Storage attributes
is_hidden bit
is_readonly bit
is_archive bit
is_system bit
is_temporary bit

Every FileTable implements a logical folder structure using the path_locator column. This is a hierarchyid value that denotes the location of each file and folder (row) within the logical file system (table). The hierarchyid data type was introduced in SQL Server 2008 as a binary value that, relative to other hierarchyid values in the same tree structure, identifies a unique node position (reminder to self: blog on hierarchyid!). It is implemented as a system CLR type, which means that it’s a .NET framework class, and has a set of methods you can use (e.g., GetAncestor, GetDescendant, GetReparentedValue, IsDescendantOf, etc.) to traverse and manipulate the hierarchy. Thus, it’s perfect for casting the hierarchical structure of a file system over a relational table, as FileTable does.

The path_locator column is defined as the table’s primary key with a non-clustered index. A separate key value is also stored in the stream_id column with a non-clustered unique index. This is the uniqueidentifier ROWGUIDCOL value required by any table with varbinary(max) FILESTREAM columns, so FileTable is no exception. And unlike path_locator, this unique value will never change once it is assigned to a new FileTable row, even if the row is later “reparented” (i.e., moved to another location in the hierarchy). Thus, you should treat stream_id as each row’s “ID,” even though it isn’t the table’s primary key.

Every row in a FileTable corresponds precisely to either a single file or folder, as determined by the bit value in the is_directory column. The file or folder name is stored in the name column as an nvarchar(255) string. All of the other column names are self-describing, and are used to store typical file system information such as various timestamps and storage attributes. The actual file content (the BLOB itself) is stored in the file_stream column, which is a varbinary(max) data type decorated with the FILESTREAM attribute. This means that the binary content in the file_stream column is stored in the NTFS file system that SQL Server is managing behind the scenes, rather than the structured file groups where all the other table data is stored (standard FILESTREAM behavior in SQL Server 2008).

In addition to these fourteen columns, each FileTable includes the following three computed (read-only) columns:

Column Name Data Type Description
parent_path_locator hierarchyid Parent node derived from path_locator
file_type nvarchar(255) Extension derived from name
cached_file_size bigint BLOB byte length derived from file_stream

The parent_path_locator column returns the result of calling GetAncestor(1) on path_locator to obtain the path_locator to the parent folder of any file or folder in the table. The file_type column returns the extension of the filename parsed from the string value in the name column. And the cached_file_size column returns the number of bytes stored in the file_stream column (these are BLOBs stored in SQL Server’s internally-managed NTFS file system behind the scenes).

With this fixed schema in place, every FileTable has what it needs to represent a logical file system. Thus, SQL Server is able to fabricate a Windows file share over any FileTable. This magically exposes the FileTable to any user or application who can then view and update the table using standard file I/O semantics (e.g., drag-and-drop with Windows Explorer, or read/write with System.IO.FileStream). So:

  • Creating a file or directory in the logical file system adds a row to the table
  • Adding a row to the table creates a file or directory in the logical file system

Here’s the total FileTable picture:

Take a moment to digest what’s happening here. In addition to the programmatic FILESTREAM access using either T-SQL or SqlFileStream, Denali now offers a third interface to FILESTREAM: A logical file system. In a sense, this fills the SQL Server 2008 FILESTREAM gap in which the file system itself is completely inaccessible. This is not to say that FileTable lets you directly access SQL Server’s internally managed NTFS file system; certainly not. That remains obfuscated and private, as it continues to function in standard FILESTREAM fashion against BLOB data that just happens to be in a FileTable instead of a regular table. What we do get is an abstraction layer over the FileTable that functions as a standard file system. In this logical file system, everything about each file and folder—except the BLOB content of the files themselves—is stored in the FileTable’s structured file group, while the BLOBs themselves are physically stored in the NTFS file system.

So you can see that there’s really nothing new beneath the FileTable layer; as before, SQL Server synchronizes transactional access between the row in the FileTable and its corresponding BLOB content in the NTFS file system to ensure that integrity is properly maintained between them. As with T-SQL access, this synchronization occurs implicitly when manipulating the FileTable via the exposed Windows file share. And, being an ordinary table in virtually every respect, you also have the option to use SqlFileStream with explicit transaction synchronization for the fastest possible streaming of BLOBs into and out of a FileTable.

I don’t know about you, but I find all of this extremely appealing. We now have total flexibility for BLOB storage in the database. With FileTable, you can easily migrate existing applications that work against physical file systems without writing any custom T-SQL or fancy SqlFileStream code. Just use a FileTable, let the existing applications continue working without modification, and enjoy the benefits of your files becoming an integral part of the SQL Server database.

That’s the whole FileTable story. The rest of this post just walks you through the steps and syntax for getting a FileTable up and running. It’s quite simple and straightforward. Specifically, you’ll need to:

  • Enable FILESTREAM for file system access
  • Create a FileTable-enabled database
  • Create a FileTable

Enable FILESTREAM for File System Access

FILESTREAM must be enabled for file system access at both the service level (either during setup, or later via the SQL Server Configuration Manager) and at the instance level. Complete details can be found in my Enabling and Using FILESTREAM post. Once enabled, SQL Server exposes a file share for all the FileTable-enabled databases you create under the SQL Server instance (by default, the share name is MSSQLSERVER).

The following statement enables FILESTREAM at the instance level for file system access (level 2).

EXEC sp_configure filestream_access_level, 2
RECONFIGURE

Create a FileTable-Enabled Database

Naturally, a FileTable-enabled database must include the FILEGROUP…CONTAINS FILESTREAM clause expected of any FILESTREAM-enabled database. In addition, you must also specify two parameters in the SET FILESTREAM clause of the the CREATE DATABASE (or ALTER DATABASE) statement. The DIRECTORY_NAME specifies the name of the folder that will be created for this database in the root file share associated with the instance. And enabling full non-transacted access with NON_TRANSACTED_ACCESS=FULL exposes every FileTable within the database as a subfolder beneath the database folder of the instance’s file share.

CREATE DATABASE PhotoLibrary
 ON PRIMARY
  (NAME = PhotoLibrary_data,
   FILENAME = 'C:\DB\PhotoLibrary_data.mdf'),
 FILEGROUP FileStreamGroup CONTAINS FILESTREAM
  (NAME = PhotoLibrary_blobs,
   FILENAME = 'C:\DB\Photos')
 SET FILESTREAM(
  (DIRECTORY_NAME='PhotoLibrary',
   NON_TRANSACTED_ACCESS=FULL)
 LOG ON
  (NAME = PhotoLibrary_log,
   FILENAME = 'C:\DB\PhotoLibrary_log.ldf')

Create a FileTable

Not surprisingly, this is the easiest part. Since SQL Server controls the schema of every FileTable, you just use a CREATE TABLE statement with the new AS FileTable clause and don’t include any columns:

CREATE TABLE PhotoFiles AS FileTable

Your FileTable is ready to use. You will find a root PhotoFiles folder for the FileTable beneath the PhotoLibrary folder created for the database in the Windows file share for the instance. You can interact with the FileTable using T-SQL, SqlFileStream, or the logical file system exposed by the Windows file share.

Summary

FILESTREAM + hierarchyid + Windows File Share = FileTable

Genius! Enjoy…

Posted in SQL Server | Tagged , , , | Leave a comment

Changing SharePoint Default ASP.NET compiler to use ASP.NET 3.5

Overview

While SharePoint 2010 supports .NET 3.5, it uses the .NET 2.0 compiler by default for ASP.NET pages.  If you write any UserControls or consume any DLLs in your hosted ASPX pages that utilizes .NET 3.5 features such as the var keyword, extension methods, and LINQ, you’ll get compilation error messages when attempting to access those pages.

To resolve this issue, you can edit the web.config in your site collection’s virtual directory (Usually under \inetpub\wwwroot\wss\virtualdirectories\<site>).  Add the following snippet under the configuration element:

 

<system.codedom>
  <compilers>
    <compiler language="c#;cs;csharp" extension=".cs" warningLevel="4"
              type="Microsoft.CSharp.CSharpCodeProvider, System, Version=2.0.0.0,
                    Culture=neutral, PublicKeyToken=b77a5c561934e089">
      <providerOption name="CompilerVersion" value="v3.5" />
      <providerOption name="WarnAsError" value="false" />
    </compiler>
  </compilers>
</system.codedom> 

.csharpcode, .csharpcode pre
{
font-size: small;
color: black;
font-family: consolas, “Courier New”, courier, monospace;
background-color: #ffffff;
/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt
{
background-color: #f4f4f4;
width: 100%;
margin: 0em;
}
.csharpcode .lnum { color: #606060; }

Also, look under the Assemblies section and make sure you have the following entries:

   1: <add assembly="System.Web.Extensions, Version=3.5.0.0,Culture=neutral, PublicKeyToken=31BF3856AD364E35" />

   2: <add assembly="System.Xml.Linq, Version=3.5.0.0,Culture=neutral, PublicKeyToken=B77A5C561934E089" />

   3: <add assembly="Microsoft.SharePoint, Version=14.0.0.0,Culture=neutral, PublicKeyToken=71e9bce111e9429c" />

   4: <add assembly="System.Web.Extensions, Version=3.5.0.0,Culture=neutral, PublicKeyToken=31bf3856ad364e35" />

   5: <add assembly="Microsoft.Web.CommandUI, Version=14.0.0.0,Culture=neutral, PublicKeyToken=71e9bce111e9429c" />

   6: <add assembly="Microsoft.SharePoint.Search, Version=14.0.0.0,Culture=neutral, PublicKeyToken=71e9bce111e9429c" />

   7: <add assembly="Microsoft.Office.Access.Server.UI, Version=14.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c" />

   8: <add assembly="Microsoft.SharePoint.Publishing, Version=14.0.0.0,Culture=neutral, PublicKeyToken=71e9bce111e9429c" />

 

Note: You can also add other .NET 3.5 assemblies here as needed.

Posted in .NET Framework, SharePoint, SharePoint 2010, SharePoint Foundation, SharePoint Server | Tagged , , | 1 Comment

Text input length limit with jQuery

Requirements

  • jQuery (tested on 1.6.2)

Introduction

Limiting the character length on a text input is one of the most commonly used patterns in web applications. In this post I’ll go over an easily re-usable implementation that is compatible with Internet Explorer, Firefox and Chrome using jQuery

Considerations

Setting a character length limit on text input is fairly simple to implement. For example the following snippet of code can achieve this:

$('#tweet_input').keydown(function () {
	var tweet_length = this.val().length;
	if(tweet_length &gt; 140) {
		this.val(this.substring(0,140));
	}
});

One flaw with this simple implementation is that the text length limiting logic only fires when a key is pressed. If someone were to copy and paste text into this text input the character count limit would not be applied. Another problem to consider is that when we truncate the text the cursor position is set to the end of the text box. This becomes an issue when a user highlights a portion of text in the middle and hits  the character length limit; the cursor will be set to the end of the text box giving them a glitchy experience.

jQuery plugin

Implementing the text length limit in this manner is not very re-usable and not easily maintainable. In jQuery a popular way to write frequently used code is to write it as a plugin. To do this we start it off by writing the following:

(function( $ ){
	$.fn.charLimit = function( options ) {
		if(options.limit === undefined || typeof options.limit !== 'number') {
			$.error('Option limit must be defined and must be a number.');
		}
		return this.each(function() {
			// CharLimit logic goes here.
		});
	};
})( jQuery );

This will add a function called charLimit to jQuery that takes a required parameter ‘limit’ that must be a number. So we can call:

$('#selector').charLimit({limit: 140});

anywhere on the page and it will apply the character length limiting to the matching selector.

Character Length Limit Logic

Initialization and Helper Functions

We will initialize our function with two variables; self which is the selector for the target text input and charLimit which is the character length limit. We have two helper functions _truncate which limits text length to charLimit and _setCaretPos which sets the cursor position of the text box. Calling _truncate will call _setCaretPos therefore it will limit character count and set the caret to the proper location.

// CharLimit logic goes here.
var self = $(this);
var charLimit = options.limit;

function _truncate(ev) {
	var caretPos;
	if (ev.target.selectionStart !== undefined) {
		caretPos = ev.target.selectionEnd;
	} else if(document.selection) {
		ev.target.focus();
		var range = document.selection.createRange();
		range.moveStart('character', -ev.target.value.length);
		caretPos = range.text.length;
	}

	self.val(self.val().substring(0, charLimit));
	_setCaretPos(ev, caretPos);
}

function _setCaretPos(ev, pos) {
	if ($(ev.target).get(0).setSelectionRange) {
		$(ev.target).get(0).setSelectionRange(pos, pos);
	} else if ($(ev.target).get(0).createTextRange) {
		var range = $(ev.target).get(0).createTextRange();
		range.collapse(true);
		range.moveEnd('character', pos);
		range.moveStart('character', pos);
		range.select();
	}
}

Keyboard Event

The most of obvious of the events to handle; we will call the _truncate function anytime a key is pressed. The difference between the keypress and keydown event is that keypress will ignore special keys such as backspace, shift and ctrl. We will also block the key input event if the character limit is reached unless a portion of text is highlighted. We do this because if a portion of text is highlighted it would delete the selected portion bringing down the text length.

self.keypress(function(ev) {
	var charCount = self.val().length;
	var selected;
	if (ev.target.selectionStart !== undefined) {
		selected = !(ev.target.selectionStart==ev.target.selectionEnd);
	} else if(document.selection) {
		ev.target.focus();
		var range = document.selection.createRange();
		selected = (range.text.length &gt; 0);
	}

	if(charCount &gt; charLimit-1 &amp;&amp; !selected) {
		return false;
	}
	setTimeout(function() {
		_truncate(ev);
	}, 1);
});

Paste Event

Another event to handle is the paste event. The following will handle paste events triggered by either keyboard or mouse. It will call same truncate helper function that the keyboard event handler uses.

self.bind('paste', function(ev) {
	setTimeout(function() {
		_truncate(ev);
	}, 1);
});

Final Product

When we put everything together this is our final product.

(function($) {
	$.fn.charLimit = function(options) {
		if(options === undefined || options.limit === undefined || typeof options.limit !== 'number') {
			$.error('Option limit must be defined and must be a number.');
		}

		return this.each(function() {
			var self = $(this);
			var charLimit = options.limit;

			function _truncate(ev) {
				var caretPos;
				if (ev.target.selectionStart !== undefined) {
					caretPos = ev.target.selectionEnd;
				} else if(document.selection) {
					ev.target.focus();
					var range = document.selection.createRange();
					range.moveStart('character', -ev.target.value.length);
					caretPos = range.text.length;
				}

				self.val(self.val().substring(0, charLimit));
				_setCaretPos(ev, caretPos);
			}

			function _setCaretPos(ev, pos) {
				if ($(ev.target).get(0).setSelectionRange) {
					$(ev.target).get(0).setSelectionRange(pos, pos);
				} else if ($(ev.target).get(0).createTextRange) {
					var range = $(ev.target).get(0).createTextRange();
					range.collapse(true);
					range.moveEnd('character', pos);
					range.moveStart('character', pos);
					range.select();
				}
			}

			self.keypress(function(ev) {
				var charCount = self.val().length;
				var selected;
				if (ev.target.selectionStart !== undefined) {
					selected = !(ev.target.selectionStart==ev.target.selectionEnd);
				} else if(document.selection) {
					ev.target.focus();
					var range = document.selection.createRange();
					selected = (range.text.length &gt; 0);
				}

				if(charCount &gt; charLimit-1 &amp;&amp; !selected) {
					return false;
				}
				setTimeout(function() {
					_truncate(ev);
				}, 1);
			});

			self.bind('paste', function(ev) {
				setTimeout(function() {
					_truncate(ev);
				}, 1);
			});

		});
	};
})( jQuery );

In order to use it all you need to is type the following. Where #your_textbox is the id of your text input.

// Include this once.
<script type="text/javascript" src="/js/jquery.charlimit.js"></script>

<input type="text" id='your_textbox'></input>
<script type="text/javascript">$('#your_textbox').charLimit({limit: 60})</script>

Resources

jquery.charlimit.js

Posted in Uncategorized | Tagged , | Leave a comment

Column Change Auditing with SSIS

I was recently asked by a client to write an SSIS package to update some data in a database.  One of the requirements was to audit all changes with before and after column values, update user and update date. The client requested that CDC (Change Data Capture) NOT be implemented. And not being very proficient in .net programming languages, I chose not to attempt to code any SSIS “Scripting Component” transformations. An alternate method was chosen that used a combination of SSIS “Conditional Split”, “Derived Column”, “Multicast” and “Union All” transformations. The trick was to “create” a separate audit row for each column that was changed in a database row. The following example shows this method for the row update data flow. The row has already been determined to have column value changes before it reaches this point.

 

 
 
 
Derive Static Values:                       Used to provide fixed values not derived from the data. In this case the audit output had an “Action” column designating an insert or update activity. This column value is set here.  Also update Userid is derived from a passed variable, and update date was populated using getdate().  Hard coded table name and column name columns are also set here for each table being audited.
Get Original Column Values:         Does a lookup of each changing data row to get the original values to store in the audit tables OriginalValue column.

If Column n Changed:                     Series of Conditional Split transformations to test for value changes in each column. If a column value will be changed, the data flow is directed to a Multicast transformation. If the data value will not change, the data flow continues to check the next column for changes.

Multicast n:                                       Flows the data for a ColumnChanged split conditional to both the data update and audit destinations. One Multicast transformation is coded for each column tested for changes.

Union Data n:                                   Unions the changed rows and the unchanged rows together for the next Conditional Split test.

Union Audit:                                     Unions the rows together that were created for the audit output destination. Union All Input 1 and Union All Input 3 values for OriginalValue and BusinessKey are taken from “Get Original Column Values” transformation for each column (row), ChangedValue comes from the input source, TableName, Userid, UpdateDate ColumnName come from “Derive Static Values” transformation.

Count Rows Updated:                    Count of updated rows for logging.

Update Data Table:                         This is the destination of the actual data table where the data updates are to be made.

Audit Destination:                           The audit table destination containing each changed column as a row.

Below is a code snippet from the insert flow. As with the Update flow, the data is Multicast and Union’d so that the data is made available for both the insert data and audit destinations.

 

Split for Data and Audit Inserts:                  Multicast transformation that splits data into separate rows for each column for the audit destinations, and the output data for the “Insert New Data Row” destination.

Insert New Data Row:                                   Destination for actual data insert.

Derive BusinessKey Data:                             Saves the new BusinessKey value from the source data for the audit row’s ChangedValue column. In an insert flow, all columns are captured for the audit with blanks provided for OriginalValue.

Derive Column n Data:                                  Saves each data column of the inserted row for the ChangedValue column of the audit. Again, blanks are used for OriginalValue.

Union Audit:                                                    All audit columns (which have now been transformed into rows) are union’d together for audit destination insert.

Audit Destination:                                          The audit table destination containing each column as a row with original value, changed value, etc.

The sample update and insert flows shown above are actually contained in a single data flow. The insert or update decisions are determined in a series of lookup transformations (not shown). Then, the audit output from the update and insert flows are union’d together again and stored in a “Raw File” destination. The raw file is passed to another data flow that handles the audit table inserts. A lookup of each data row using the passed business key gets the integer primary key value from the inserted / updated rows. The PK value is also required for the audit table.

Although there may be other ways to achieve the same goal of deriving a data row from a column, I’ve found that this basic method does it well using only “out of the box” SSIS components.

Posted in ETL | 1 Comment

Using Synonyms for Flexibility in a SQL Server Environment

Synonyms are a useful tool to make database aplications more flexible, portable and easier to code. A synonym can be viewed as a pointer to a fully qualified SQL Server object that is used in place of its fully qualified name. This allows application code to be more concise and easier to maintain.

Synonyms are a layer of abstraction that allow access to the underlying object without the need to know the name or location of the object. This shields the application from base object movement and name changes. They also provide a way to give objects shorter, more friendly names as opposed to SQL Server’s fully qualified object naming convention. This is specially appealing if you have end-users reporting against your database tables.

 A synonym can reference objects across schemas, databases and servers. For example, a synonyn named “syn_Employee” can be coded to reference a local table named “dbo.Employee”. If, for example, the synonym was changed to point to a different table named “dbo.Emp_2” on a another server or changed to reference a view “vEmployee”, calling applications would not require any code changes. As long as the column names remain the same and the required permissions are in place, this synonym change would be transparent.

Synonyms can reference tables, views, stored procedures and some user defined functions. It is interesting to note that similar to a view calling another view, a synonym can reference a view and a view can reference a synonym. Care should be taken in the number of possible levels this can spawn.

Synonyms cannot be called across a linked server. Instead, a synonym should be created on the local server referencing the linked server’s database/table and the app code whould then reference the local synonym.

Synonyms can be created with the “New Synonym” dialog in SQL Server 2008 SSMS. Expand Databases, select and expand desired database, right click Synonyms and select New Synonym.

 

Or, they can be compiled using a Create Synonym statement:

CREATE SYNONYM MySchema.MySynonym FOR MyDatabase.MySchema.MyTable

Consider this sample use for synonyms.

You have database with a table (dbo.Sales) containing sales transactions. In order to improve performance, older transactions are periodically archived to another table (dbo.Sales_Archive). Creating a synonym  referencing the table desired would allow applications to access either table without changing the called object’s name.

CREATE SYNONYM syn_Sales FOR MyDatabase.dbo.Sales

With the synonym coded as such, the sql code would access the dbo.Sales table using the name “syn_Sales”. (Select Col1, Col2, Col2 from syn_Sales where …).

Point the synonym to the table containing the archive data by executing the following:

DROP SYNONYM dbo.syn_Sales
CREATE SYNONYM dbo.syn_Sales FOR MyDatabase.dbo.Sales_Archive

The same sample code (Select Col1, Col2, Col2 from syn_Sales where …) would work without change.

Note that there is no “Alter” for a synonym. The synonym must be dropped and recreated for any change.

If the archive tables were in a different database, the synonym syntax would look like this:

CREATE SYNONYM dbo.syn_Sales FOR MyArchiveDatabase.dbo.Sales_Archive

If the archive tables were in a database on another server, all four qualifying levels of the name must be specified. Note that the linked server for the RemoteServer must be defined.

CREATE SYNONYM dbo.syn_Sales FOR RemoteServer.MyDatabase.dbo.Sales_Archive

Synonyms are loosley bound to their base object. This means that validation and permission checks are not done until run time. This also means that a synonym can be compiled with a reference to a non-existent object. It will not fail until the synonym is called. With this comes the advantage that a base object can be altered or dropped without affecting the synonym thus allowing DBAs the freedom to make changes to the underlying object without worrying about application code.

CREATE SYNONYM and ALTER SCHEMA permission are to required to create synonyms in a database / schema. Besides access permissions for a base object, select, update, etc. permissions are also needed for the synonyms themselves.

Generally, synonyms will be used in multiples. All the objects in a database or schema used in an application could be referenced by synonyms. So, to set an environment, multiple DROP / CREATE synonym statements would be issued:

DROP SYNONYM dbo.syn_Sales
CREATE SYNONYM dbo.syn_Sales FOR MyDatabase.dbo.Sales
DROP SYNONYM dbo.syn_Customer
CREATE SYNONYM dbo.syn_Customer FOR MyDatabase.dbo.Customer
DROP SYNONYM dbo.syn_Product
CREATE SYNONYM dbo.syn_Product FOR MyDatabase.dbo. Product

These statements can be issued either externally to the requesting application, or possibly internally by a switch in the application itself (note the permissions needed to drop and create a synonym). If one wanted to be really ambitious, a separate stored procedure or app could be developed to set all the synonyms needed for an application based on the environment desired, with parameters such as source and target databases and schemas and a list of objects, etc.

Consider using a naming convention for your synonyms. A prefix or suffix attached to your synonym name will make it easier to recognize an object as a synonym.

It is a good idea to always code all levels of the synonym name (Server.Database.Schema.Table). This standardizes your synonyms and makes it easier to classify and manage.

Although synonyms will add some additional complexity to database maintenance, a site with multiple environments can realize benefits when implementing a well thought out synonym strategy.

Posted in SQL Server | Leave a comment

Quartz.NET Job Scheduler Error: Failure occurred during job recovery

So, I was trying to hook up Quartz.NET with a Windows Service to provide a scheduled job that would execute on a daily trigger. I am using SQL Server as the persistence store for the jobs and triggers created by quartz.

I deployed and tested it on my local box and everything was working great until I moved it over to the QA box. I installed the service, no errors or warnings. But when I went to start the service, I got an error "The service did not respond in a timely fashion" and the service would not start. Fortuantely for me I had verbose logging turned on and noticed the following error in the log:

Failure occured during job recovery
Stack Trace:   at Quartz.Impl.AdoJobStore.JobStoreSupport.SchedulerStarted()
   at Quartz.Core.QuartzScheduler.Start()
   at Quartz.Impl.StdScheduler.Start()

I looked at the quartz source to see what was going on. The code that throws the error is below:


public virtual void SchedulerStarted()
{
if (Clustered)
{
  clusterManagementThread = new ClusterManager(this);
  clusterManagementThread.Initialize();
}
else
{
  try
  {
   RecoverJobs();
  }
  catch (SchedulerException se)
  {
   throw new SchedulerConfigException("Failure occured during job recovery.", se);
  }
}
misfireHandler = new MisfireHandler(this);
misfireHandler.Initialize();
schedulerRunning = true;
}


That sent me down the path to see what was the difference between my local database and the one in the QA environment. Long story short, it was basically that the "QRTZ_LOCKS" table was missing data . Corrected that and everything works fine now.


INSERT INTO [dbo].[QRTZ_LOCKS] VALUES(‘TRIGGER_ACCESS’);
INSERT INTO [dbo].[QRTZ_LOCKS] VALUES(‘JOB_ACCESS’);
INSERT INTO [dbo].[QRTZ_LOCKS] VALUES(‘CALENDAR_ACCESS’);
INSERT INTO [dbo].[QRTZ_LOCKS] VALUES(‘STATE_ACCESS’);
INSERT INTO [dbo].[QRTZ_LOCKS] VALUES(‘MISFIRE_ACCESS’);


So, in case you come across this issue:

1. Look at the connection string and make sure that whether using SQL Server or Windows Authentication, the user has read/write access to the quartz tables

2. Second, make sure you have populated all the necessary data in the Quartz tables.

Posted in Uncategorized | Leave a comment

SQL Server 2012 Windowing Functions Part 2 of 2: New Analytic Functions

This is the second part of my two-part article on windowing functions in SQL Server 2012. In Part 1, I explained the new running and sliding aggregation capabilities added to the OVER clause in SQL Server 2012. In this post, I’ll explain the new T-SQL analytic windowing functions. All of these functions operate using the windowing principles I explained in Part 1.

Eight New Analytic Functions

There are eight new analytic functions that have been added to T-SQL. All of them work in conjunction with an ordered window defined with OVER and an associated ORDER BY clause that can be optionally partitioned with a PARTITION BY clause and framed with a BETWEEN clause. The new functions are:

  • FIRST_VALUE
  • LAST_VALUE
  • LAG
  • LEAD
  • PERCENT_RANK
  • CUME_DIST
  • PERCENTILE_CONT
  • PERCENTILE_DISC

In the following code listing, the FIRST_VALUE, LAST_VALUE, LAG, and LEAD functions are used to analyze a set of orders at the product level.

DECLARE @Orders AS table(OrderDate date, ProductID int, Quantity int)
INSERT INTO @Orders VALUES
 ('2011-03-18', 142, 74),
 ('2011-04-11', 123, 95),
 ('2011-04-12', 101, 38),
 ('2011-05-21', 130, 12),
 ('2011-05-30', 101, 28),
 ('2011-07-25', 123, 57),
 ('2011-07-28', 101, 12)

SELECT
  OrderDate,
  ProductID,
  Quantity,
  WorstOn = FIRST_VALUE(OrderDate)
             OVER(PARTITION BY ProductID ORDER BY Quantity),
  BestOn = LAST_VALUE(OrderDate)
            OVER(PARTITION BY ProductID ORDER BY Quantity
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING),
  PrevOn = LAG(OrderDate, 1)
            OVER(PARTITION BY ProductID ORDER BY OrderDate),
  NextOn = LEAD(OrderDate, 1)
            OVER(PARTITION BY ProductID ORDER BY OrderDate)
 FROM @Orders
 ORDER BY OrderDate

OrderDate  ProductID Quantity WorstOn    BestOn     PrevOn     NextOn
---------- --------- -------- ---------- ---------- ---------- ----------
2011-03-18 142       74       2011-03-18 2011-03-18 NULL       NULL
2011-04-11 123       95       2011-07-25 2011-04-11 NULL       2011-07-25
2011-04-12 101       38       2011-07-28 2011-04-12 NULL       2011-05-30
2011-05-21 130       12       2011-05-21 2011-05-21 NULL       NULL
2011-05-30 101       28       2011-07-28 2011-04-12 2011-04-12 2011-07-28
2011-07-25 123       57       2011-07-25 2011-04-11 2011-04-11 NULL
2011-07-28 101       12       2011-07-28 2011-04-12 2011-05-30 NULL

In this query, four analytic functions specify an OVER clause that partitions the result set by ProductID. The product partitions defined for FIRST_VALUE and LAST_VALUE are sorted by Quantity, while the product partitions for LAG and LEAD are sorted by OrderDate. The full result set is sorted by OrderDate, so you need to visualize the sorted partition for each of the four functions to understand the output—the result set sequence is not the same as the row sequence used in the windowing functions.

FIRST_VALUE and LAST_VALUE

The WorstOn and BestOn columns use FIRST_VALUE and LAST_VALUE respectively to return the “worst” and “best” dates for the product in each partition. Performance is measured by quantity, so sorting each product’s partition by quantity will position the worst order at the first row in the partition and the best order at the last row in the partition. FIRST_VALUE and LAST_VALUE can return the value of any column (OrderDate, in this case), not just the aggregate column itself. For LAST_VALUE, it is also necessary to explicitly define a window over the entire partition with ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. Otherwise, as explained in my coverage of OVER clause enhancements in Part 1, the default window is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which frames (constrains) the window, and does not consider the remaining rows in the partition that are needed to obtain the highest quantity for LAST_VALUE.

In the output, notice that OrderDate, LowestOn, and HighestOn for the first order (product 142) are all the same value (3/18). This is because product 142 was only ordered once, so FIRST_VALUE and LAST_VALUE operate over a partition that has only this one row in it, with an OrderDate value of 3/18. The second row is for product 123, quantity 95, ordered on 4/11. Four rows ahead in the result set (not the partition) there is another order for product 123, quantity 57, placed on 7/25. This means that, for this product, FIRST_VALUE and LAST_VALUE operate over a partition that has these two rows in it, sorted by quantity. This positions the 7/25 order (quantity 57) first and the 4/11 (quantity 95) last within the partition. As a result, rows for product 123 report 7/25 for WorstDate and 4/11 for BestDate. The next order (product 101) appears two more times in the result set, creating a partition of three rows. Again, based on the Quantity sort of the partition, each row in the partition reports the product’s worst and best dates (which are 7/28 and 4/12, respectively).

LAG and LEAD

The PrevOn and NextOn columns use LAG and LEAD to return the previous and next date that each product was ordered. They specify an OVER clause that partitions by ProductId as before, but the rows in these partitions are sorted by OrderDate. Thus, the LAG and LEAD functions examine each product’s orders in chronological sequence, regardless of quantity. For each row in each partition, LAG is able to access previous (lagging) rows within the same partition. Similarly, LEAD can access subsequent (leading) rows within the same partition. The first parameter to LAG and LEAD specifies the column value to be returned from a lagging or leading row, respectively. The second parameter specifies the number of rows back or forward LAG and LEAD should seek within each partition, relative to the current row. The query passes OrderDate and 1 as parameters to LAG and LEAD, using product partitions that are ordered by date. Thus, the query returns the most recent past date, and nearest future date, that each product was ordered.

Because the first order’s product (142) was only ordered once, its single-row partition has no lagging or leading rows, and so LAG and LEAD both return NULL for PrevOn and NextOn. The second order (on 4/11) is for product 123, which was ordered again on 7/25, creating a partition with two rows sorted by OrderDate, with the 4/11 order positioned first and the 7/25 order positioned last within the partition. The first row in a multi-row window has no lagging rows, but at least one leading row. Similarly, the last order in a multi-row window has at least one lagging row, but no leading rows. As a result, the first order (4/11) reports NULL and 7/25 for PrevOn and NextOn (respectively), and the second order (7/25) reports 4/11 and NULL for PrevOn and NextOn (respectively). Product 101 was ordered three times, which creates a partition of three rows. In this partition, the second row has both a lagging row and a leading row. Thus, the three orders report PrevOn and NextOn values for product 101, respectively indicating NULL-5/30 for the first (4/12) order, 4/12-7/28 for the second (5/30) order, and 5/30-NULL for the third and last order.

The last functions to examine are PERCENT_RANK (rank distribution), CUME_DIST (cumulative distribution, or percentile), PERCENTILE_CONT (continuous percentile lookup), and PERCENTILE_DISC (discreet percentile lookup). The following queries demonstrate these functions, which are all closely related, by querying sales figures across each quarter of two years.

DECLARE @Sales table(Yr int, Qtr int, Amount money)
INSERT INTO @Sales VALUES
  (2010, 1, 5000), (2010, 2, 6000), (2010, 3, 7000), (2010, 4, 2000),
  (2011, 1, 1000), (2011, 2, 2000), (2011, 3, 3000), (2011, 4, 4000)

-- Distributed across all 8 quarters
SELECT
  Yr, Qtr, Amount,
  R = RANK() OVER(ORDER BY Amount),
  PR = PERCENT_RANK() OVER(ORDER BY Amount),
  CD = CUME_DIST() OVER(ORDER BY Amount)
 FROM @Sales
 ORDER BY Amount

-- Distributed (partitioned) by year with percentile lookups
SELECT
  Yr, Qtr, Amount,
  R = RANK() OVER(PARTITION BY Yr ORDER BY Amount),
  PR = PERCENT_RANK() OVER(PARTITION BY Yr ORDER BY Amount),
  CD = CUME_DIST() OVER(PARTITION BY Yr ORDER BY Amount),
  PD5 = PERCENTILE_DISC(.5)
         WITHIN GROUP (ORDER BY Amount) OVER(PARTITION BY Yr),
  PD6 = PERCENTILE_DISC(.6)
         WITHIN GROUP (ORDER BY Amount) OVER(PARTITION BY Yr),
  PC5 = PERCENTILE_CONT(.5)
         WITHIN GROUP (ORDER BY Amount) OVER(PARTITION BY Yr),
  PC6 = PERCENTILE_CONT(.6)
         WITHIN GROUP (ORDER BY Amount) OVER(PARTITION BY Yr)
 FROM @Sales
 ORDER BY Yr, Amount

Yr    Qtr  Amount   R  PR                 CD
----  ---  -------  -  -----------------  -----
2011  1    1000.00  1  0                  0.125
2011  2    2000.00  2  0.142857142857143  0.375
2010  4    2000.00  2  0.142857142857143  0.375
2011  3    3000.00  4  0.428571428571429  0.5
2011  4    4000.00  5  0.571428571428571  0.625
2010  1    5000.00  6  0.714285714285714  0.75
2010  2    6000.00  7  0.857142857142857  0.875
2010  3    7000.00  8  1                  1

Yr    Qtr  Amount   R  PR             CD    PD5      PD6      PC5   PC6
----  ---  -------  -  -------------  ----  -------  -------  ----  ----
2010  4    2000.00  1  0              0.25  5000.00  6000.00  5500  5800
2010  1    5000.00  2  0.33333333333  0.5   5000.00  6000.00  5500  5800
2010  2    6000.00  3  0.66666666667  0.75  5000.00  6000.00  5500  5800
2010  3    7000.00  4  1              1     5000.00  6000.00  5500  5800
2011  1    1000.00  1  0              0.25  2000.00  3000.00  2500  2800
2011  2    2000.00  2  0.33333333333  0.5   2000.00  3000.00  2500  2800
2011  3    3000.00  3  0.66666666667  0.75  2000.00  3000.00  2500  2800
2011  4    4000.00  4  1              1     2000.00  3000.00  2500  2800

The new functions are all based on the RANK function introduced in SQL Server 2005. So both these queries also report on RANK, which will aid both in my explanation and your understanding of each of the new functions.

PERCENT_RANK and CUME_DIST

In the first query, PERCENT_RANK and CUME_DIST (aliased as PR and CD respectively) rank quarterly sales across the entire two year period. Look at the value returned by RANK (aliased as R). It ranks each row in the unpartitioned window (all eight quarters) by dollar amount. Both 2011Q2 and 2010Q4 are tied for $2,000 in sales, so RANK assigns them the same value (2). The next row break the tie, so RANK continues with 4, which accounts for the “empty slot” created by the two previous rows that were tied.

Now examine the values returned by PERCENT_RANK and CUME_DIST. Notice how they reflect the same information as RANK with decimal values ranging from 0 and 1. The only difference between the two is a slight variation in their formulaic implementation, such that PERCENT_RANK always starts with 0 while CUME_DIST always starts with a value greater than 0. Specifically, PERCENT_RANK returns (RANK – 1) / (N – 1) for each row, where N is the total number of rows in the window. This always returns 0 for the first (or only) row in the window. CUME_DIST returns RANK / N, which always returns a value greater than 0 for the first row in the window (which would be 1, if there’s only row). For windows with two or more rows, both functions return 1 for the last row in the window with decimal values distributed among all the other rows.

The second query examines the same sales figures, only this time the result set is partitioned by year. There are no ties within each year, so RANK assigns the sequential numbers 1 through 4 to each of the quarters, for 2010 and 2011, by dollar amount. You can see that PERCENT_RANK and CUME_DIST perform the same RANK calculations as explained for the first query (only, again, partitioned by year this time).

PERCENTILE_DISC and PERCENTILE_CONT

This query also demonstrates PERCENTILE_DISC and PERCENTILE_CONT. These very similar functions each accept a percentile parameter (the desired CUME_DIST value) and “reach in” to the window for the row at or near that percentile. The code demonstrates by calling both functions twice, once with a percentile parameter value of .5 and once with .6, returning columns aliased as PD5, PD6, PC5, and PC6. Both functions examine the CUME_DIST value for each row in the window to find the one nearest to .5 and .6. The subtle difference between them is that PERCENTILE_DISC will return a precise (discreet) value from the row with the matching percentile (or greater), while PERCENTILE_CONT interpolates a value based on a continuous range. Specifically, PERCENTILE_CONT returns a value ranging from the row matching the specified percentile—or a calculated value higher than that (based on the specified percentile) if there is no exact match—and the row with the next higher percentile in the window. This explains the values they return in this query.

Notice that these functions define their window ordering using ORDER BY in a WITHIN GROUP clause rather than in the OVER clause. Thus, you do not (and cannot) specify ORDER BY in the OVER clause. The OVER clause is still required, however, so OVER (with empty parentheses) must be specified even if you don’t want to partition using PARTITION BY.

For the year 2010, the .5 percentile (CUME_DIST value) is located exactly on quarter 1, which had $5,000 in sales. Thus PERCENTILE_DISC(.5) returns 5000. There is no row in the window with a percentile of .6, so PERCENTILE_DISC(.6) matches up against the first row with a percentile greater than or equal to .6, which is the row for quarter 2 with $6,000 in sales, and thus returns 6000. In both cases, PERCENTILE_DISC returns a discreet value from a row in the window at or greater than the specified percentile. The same calculations are performed for 2011, returning 2000 for PERCENTILE_DISC(.5) and 3000 for PERCENTILE_DISC(.6), corresponding to the $2,000 in sales for quarter 2 (percentile .5) and the $3,000 in sales for quarter 3 (percentile .75)

As I stated, PERCENTILE_CONT is very similar. It takes the same percentile parameter to find the row in the window matching that percentile. If there is no exact match, the function calculates a value based on the scale of percentiles distributed across the entire window, rather than looking ahead to the row having the next greater percentile value, as PERCENTILE_DISC does. Then it returns the median between that value and the value found in the row with the next greater percentile. For 2010, the .5 percentile matches up with 5000 (as before). The next percentile in the window is for .75 for 6000. The median between 5000 and 6000 is 5500 and thus, PERCENTILE_CONT(.5) returns 5500. There is no row in the window with a percentile of .6, so PERCENTILE_CONT(.6) calculates what the value for .6 would be (somewhere between 5000 and 6000, a bit closer to 5000) and then calculates the median between that value and the next percentile in the window (again, .75 for 6000). Thus, PERCENTILE_CONT(.6) returns 5800; slightly higher than the 5500 returned for PERCENTILE_CONT(.5).

Conclusion

This post explained the eight new analytic functions added to T-SQL in SQL Server 2012. These new functions, plus the running and sliding aggregation capabilities covered in Part 1, greatly expand the windowing capabilities of the OVER clause available since SQL Server 2005.

Posted in SQL Server | Tagged | 1 Comment