Improving performance of inserting multiple parent/child tables in a single SQL Procedure
Tom Babiec wrote a great blog a few months back on inserting multiple parent child tables in a single stored procedure. We use this technique a lot in our data integration work, and it’s proven to be very robust in many contexts. The SQL procedure outlined in that blog is useful not just for BizTalk, but generally speaking for ADO.NET and other applications trying to load multiple rows into SQL Server for multiple parent child tables. However, when dealing with larger datasets (and as the table grows), we’ve noticed some degradation in performance. In some cases, we were seeing variances of 30 seconds to 10+ minutes for loading the same exact data set on the same database. We tried a few different options, including forcing a recompile of the stored procedure between each load
, but this did not solve the problem. Finally, taking a look at the query plan gave some additional insight for a fairly simple fix.
By default, the query optimizer is using the Nested Loops operator to do the final MERGE between the input table and the tracker table:
Note that the table scan can be avoided by ensuring there’s a proper primary key on the source table – but this table scan was of little consequence in our testing, as it’s scanning an input table variable either way.
This works well when at least one of the datasets is small (e.g. 10 rows or less), and is the most performant join operator in that case. However, when either the source tables get larger or the destination tables get larger, we’ve noticed the query optimizer fails to change to using Merge Join for that final merge operation, which is the preferable operator when dealing with two larger datasets. We were occasionally able to get this to happen by changing around indexes on the table variables, but we’d consistently find the query plan going back to Nested Loops. There are a few possible reasons for this:
- The query optimizer assumes that the table variables coming in to the procedure will normally be small. This is probably usually a fair assumption.
- Merge Join is more expensive when one of the result sets is in fact small, so Nested Loops will be preferable for most cases.
- It may become particularly unpredictable when joining the result set within the USING portion of the merge statement – the optimizer may incorrectly assume this is going to result in fewer rows.
- Merge Join will require two additional sort operations – however, this sort will be cheap since the sort is actually happening in relation to existing clustered indexes (so the data is already sorted)
Whatever the final reasons, testing showed that Merge Joins were vastly better here. There were two attempts we had; first, we tried just using the MERGE hint for the source table, but this would result in Nested Loops on the destination table; finally, we ended up an OPTION (MERGE JOIN); for the whole merge statement, forcing the engine to use Merge Join over Nested Loops to the final table and the intermediate source table:
MERGE TargetTableName target USING (SELECT * FROM @sourceTable src INNER JOIN @trackerTable tracker ON src.Pk = tracker.Pk) as remapTable ON 1=0 -- force insert WHEN NOT MATCHED THEN INSERT (col1, col2, col3, etc...) VALUES (col1, col2, col3, etc...) OUTPUT INSERTED.pk_id, remapTable.old_id into @nextTrackerTable OPTION (MERGE JOIN);
Which results in a query plan like this:
Using the OPTION (MERGE JOIN); does the trick here, and renders the more favorable query plan for these scenarios. Using this technique showed no noticeable change in smaller loads (where most input tables had fewer than 10 rows); but for larger loads (even up to hundreds of thousands of rows in children), we saw data loads go from 10+ minutes down to >1 minute.