Partitioned Fact Tables
Once tables grow into the millions of records, they become candidates for partitioning. Table partitioning offers many benefits, particularly in warehouse environments. Since data is split into smaller units of storage, backups can target filegroups with a higher rate of change. Systems with multiple CPUs see improved query performance as partitioned data leads to greater parallelism. Perhaps most significant is the ability to swap in huge amounts of data by partition switching, an operation that is practically instantaneous.
Tables can be partitioned horizontally or vertically. With vertical partitioning, columns are split out into separate physical tables. This post focuses on horizontally partitioned tables, which take advantage of new constructs Microsoft added in the 2005 release – partition functions and partition schemes. Table partitioning is an Enterprise Edition only feature in SQL Server 2005 and 2008.
Partitioned tables store rows in separate filegroups. Typically, these filegroups will reside on different disks. There are three steps required to partition a table or index:
1) Create a Partition Function
Data will be partitioned by a particular column. The partition function defines the data ranges which define the boundaries of your partitions. Here’s an example:
CREATE PARTITION FUNCTION myRangePF1 (int) AS RANGE LEFT FOR VALUES (1, 100, 1000);
This command creates a partition function named myRangePF1. A partition function must provide a data-typed input parameter. This data type must ultimately map onto the column in your table used to define partition ranges. That said, it’s important to note that this partition function is a logical range – it’s not tied to any particular table, column or filegroups.
2) Create a Partition Scheme
Partition schemes provide the mapping between a particular partition function and filegroups. Here’s a sample statement to create a partition scheme:
CREATE PARTITION SCHEME myRangePS1 AS PARTITION myRangePF1 TO (test1fg, test2fg, test3fg, test4fg);
The partition scheme references an existing partition function (myRangePF1) as well as a set of existing filegroups. While partition schemes are tied to a partition function, they are not specific to any particular table.
3) Place a table on the Partition Scheme
A partitioned table looks identical to any other table with the exception that it is created on a partition scheme instead of a filegroup. Here’s an example:
CREATE TABLE PartitionTable (col1 int, col2 char(10)) ON myRangePS1 (col1) ;
The table is placed on the partition scheme myRangePS1, and the column col1 is specified as the argument which will define which filegroup data is stored in. Indexes may also be place on partition schemes. When the index and table are placed on the same partition scheme, they are said to be aligned – this is one of the requirements to enable partition switching.
Subsequent INSERT statements against PartitionTable will be automatically partitioned based on the value in col1. Typically, the column which defines partition boundaries is the foreign key to the date dimension – this tends to be the easiest way to evenly distribute incoming records.
The SWITCH Statement
INSERT statements may be the bread and butter of adding data to a table, but it’s not optimized for bulk loads. Partitioning would be worthwhile from a management perspective if all it provided was easier maintenance of large data sets. The SWITCH statement, however, provides a much faster solution to loading huge amounts of data and does not run into blocking or contention issues as INSERT often can.
The SWITCH statement accomplishes all this by changing only metadata in the partitioned fact table. All table and index data in SQL Server is ultimately stored in 8KB pages, implemented as doubly linked lists. This means each page contains a pointer to the preceding and trailing page. In the case of partitioned tables, the last page of a filegroup points to the start of the next partition. SWITCH manipulates these pointers, so that an entirely new partition may be swapped in, or removed, simply by updating the trailing pointer.
The syntax to do this looks like this:
ALTER TABLE MyPartitionedTable SWITCH PARTITION @partition_number TO dbo.MyNewPartition
There are plenty of conditions which must be met for this switch to work. Many of them are common sense – the incoming partition must have the same structure as the partitioned table, and partitions must exist on the same column.