Tallan's Technology Blog

Tallan's Top Technologists Share Their Thoughts on Today's Technology Challenges

The BCP Option

dstrickland

Moving data from one repository to the next may sound like a trivial task, and in the usual mechanical sense it very well may be.  What could be simpler than INSERT INTO Here SELECT * FROM There?  Not much, it would seem, until your data source is something other than a SQL Server table, or your data volume is so heavy that such a simple statement chokes, or worse yet, affects the transactional system’s performance.  What can you do when your data volumes are large enough that you need a faster approach than manual SQL statements?

Fortunately, Microsoft has provided us with the bcp utility.  bcp is a command-line executable optimized to move data very quickly from one place to another.  It is flexible enough to allow transformations of the data during the operation, as well as logging, authentication, and batch processing.  Microsoft has also provided a T-SQL command for moving large chunks of data efficiently: BULK INSERT.  Like the bcp utility, BULK INSERT has been optimized for large data loads.  In case you were wondering what the differences are and how they might affect your situation, here are a few highlights:

  • BULK INSERT executes in process; bcp executes out of process.
  • bcp copies data to files on the file system and then operates on those files; BULK INSERT executes within the Transact-SQL engine.
  • bcp makes use of a legacy protocol called TDS (Tabular Data Stream) and cannot stream rows larger than 8 KB; BULK INSERT will work on rows larger than 8 KB.
  • BULK INSERT may be executed from within Query Analyzer or over an ADO.NET connection, but it still requires a data file (and optionally a format file) as part of the source definition.

Below are some caveats and resources to give you a frame of reference when starting a BCP task.

Considerations and Caveats

  • When importing, turn off auto stats
    • Turn off auto-create stats and auto-update stats after creating the database
    • Run the BCP process
    • Use sp_createstats to create statistics afterwards
    • Why? This configuration change disables automatic statistics management in the database. SQL Server uses these statistics when building query plans. If you run a large insert operation with statistics enabled, SQL Server will try to maintain those statistical measures while the bulk copy operation is happening, which degrades the performance of the insert as well as the database in general. It is better to let SQL Server compute the statistics after the entire bulk copy is done rather than while it is in process.
  • Before importing, check the data row size of the source
    • You can’t use bcp when table or view data rows are more than 8 KB. In those situations you will need to use the T-SQL BULK INSERT statement instead.
    • As a side note, this exception often masks another. If you fail to specify the correct row terminator value, the bcp utility will continue to read past the end of the first row until it hits the 8 KB maximum. The row-size exception will be generated, but it simply masks the real issue.
  • Don’t run as a generic account
    • While you can specify the administrator as the user when performing the operation, it is best from an audit standpoint to use an account that is assigned to the operator and only the operator.
  • Double check disk space
    • On the target db
    • Any file server used
  • Change the target db to the bulk-logged recovery model
    • Why? This configuration change limits the amount of transaction log space used to record the transactions. Without making this change, you may fill the transaction log, run out of disk space, and bring the server to a grinding halt.
  • Set the target db into single-user mode
  • If you have multiple processes that need to bulk copy from separate sources to the same destination table consider running the processes in parallel as outlined here: Importing Data in Parallel with Table Level Locking.
  • Consider configuring batches in your bulk copy processes as documented here Managing Batches for Bulk Import to avoid having to reprocess successful rows if a large load fails and the transaction rolls back. This concept is analogous to a transaction checkpoint.
  • By default, insert and update triggers WILL NOT fire using either bulk copy method. This can be overridden by providing a hint/argument to BULK INSERT or bcp respectively. From a performance perspective this is typically a good thing, but from a database consistency perspective you will need to analyze whether the triggers need to fire or not.
  • Consider disabling table constraints during the bulk copy operation. Regardless of the technique you use, you will have better performance if you can disable constraints on the target table. Using bcp or BULK INSERT you can override default values by specifying that the bulk copy operation should keep nulls. Be careful when doing this, since it may mean that re-enabling constraints has to be done with NOCHECK, and in doing so you may expose null-check weaknesses in any applications that consume the data.
  • In terms of ordering data, pay it forward. What I mean by that is, you should analyze how data is ordered in the target table and then specify that order when pulling the data out of the source table. This will mean the database has less work to do once you re-enable statistics after the import is over.
  • Consider locking the table during the bulk import to improve the bulk import performance by reducing the lock contention on the table. When considering this option you must remain cognizant of the table usage and the timing of the bulk import. If the table is hot, or is written to frequently by a synchronous application this may not be viable. For more information see the article here Controlling the Locking Behavior for Bulk Import.
  • If copying data from one SQL Server to another you should probably specify that the bulk copy operation use the native format for data. There isn’t much of a reason to translate data to a different format when both ends of the pipe speak the same language.
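Several of the caveats above can be scripted up front. The following T-SQL is a minimal sketch of a pre- and post-load script; the database name TargetDb and table name dbo.TargetTable are placeholders, so adjust them to your environment.

```sql
-- Before the load: assumes a database named TargetDb and table dbo.TargetTable (placeholders).
-- 1. Disable automatic statistics management for the duration of the load.
ALTER DATABASE TargetDb SET AUTO_CREATE_STATISTICS OFF;
ALTER DATABASE TargetDb SET AUTO_UPDATE_STATISTICS OFF;

-- 2. Switch to the bulk-logged recovery model to limit transaction log growth.
ALTER DATABASE TargetDb SET RECOVERY BULK_LOGGED;

-- 3. Put the database into single-user mode.
ALTER DATABASE TargetDb SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

-- 4. Disable constraints on the target table.
ALTER TABLE dbo.TargetTable NOCHECK CONSTRAINT ALL;

-- ... run the bcp or BULK INSERT load here ...

-- After the load: reverse the changes and rebuild statistics.
ALTER TABLE dbo.TargetTable WITH CHECK CHECK CONSTRAINT ALL;
ALTER DATABASE TargetDb SET MULTI_USER;
ALTER DATABASE TargetDb SET RECOVERY FULL;
ALTER DATABASE TargetDb SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE TargetDb SET AUTO_UPDATE_STATISTICS ON;
EXEC sp_createstats;
```

Note that re-enabling constraints with WITH CHECK forces existing rows to be validated; if the loaded data may violate a constraint, you will want to clean it up first.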

Samples

BCP

The following command will create a format file that contains the table metadata for the Department table in the AdventureWorks db.

bcp AdventureWorks.HumanResources.Department format nul -T -n -f Department-n.fmt

The following line, when executed from a command prompt on a machine that hosts the AdventureWorks database, will select the contents of the Department table and create a data file that can be used for importing.  Note that it exports in native format (-n) so the file matches the native format file created above.  Here is the command.  Feel free to give it a shot.

bcp "SELECT * FROM AdventureWorks.HumanResources.Department" queryout "Department.dat" -T -n

This final command will now import the data into the specified target table using the previously created format file to validate the data contained in the data file.  The example assumes you created a new copy of the Department table called Department2 with no rows so that you can try out the test script.

bcp AdventureWorks.HumanResources.Department2 in Department.dat -T -f Department-n.fmt
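The batching and locking caveats discussed earlier can be applied to the same import with additional switches. This is an optional sketch, not required for the sample: -b commits in batches of 10,000 rows so a failure only rolls back the current batch, -e logs rejected rows to a file (the file name here is illustrative), and the TABLOCK hint takes a table-level lock to reduce lock contention.

```
bcp AdventureWorks.HumanResources.Department2 in Department.dat -T -f Department-n.fmt -b 10000 -e bcp_errors.txt -h "TABLOCK"
```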

BULK INSERT

To accomplish the same task with a BULK INSERT command, the approach would be:

BULK INSERT AdventureWorks.HumanResources.Department2
    FROM 'Department.dat'
    WITH
        (
            FORMATFILE = 'Department-n.fmt'
        )

Notice that the BULK INSERT command still makes use of the format file and data file we created in the bcp example; because the format file describes the fields and rows, no terminator options are needed here.  Keep in mind that BULK INSERT requires a data file as its source, so it will have to be created by some other process, such as a bcp out operation.

Appendix A – BCP Documentation

From ( http://msdn.microsoft.com/en-us/library/ms162802.aspx )

bcp Utility – Command Syntax

bcp {[[database_name.][owner].]{table_name | view_name} | "query"}
    {in | out | queryout | format} data_file
    [-mmax_errors] [-fformat_file] [-x] [-eerr_file]
    [-Ffirst_row] [-Llast_row] [-bbatch_size]
    [-n] [-c] [-N] [-w] [-V (60 | 65 | 70 | 80)] [-6]
    [-q] [-C { ACP | OEM | RAW | code_page } ] [-tfield_term]
    [-rrow_term] [-iinput_file] [-ooutput_file] [-apacket_size]
    [-Sserver_name[\instance_name]] [-Ulogin_id] [-Ppassword]
    [-T] [-v] [-R] [-k] [-E] [-h"hint [,...n]"]

Example

bcp pubs.dbo.authors out c:\temp\steventest.txt -n -Sstevenw -Usa -P -eC:\temp\error.txt

In this example the pubs.dbo.authors table is the source and the operation direction is out.  The exported data will be stored in a file called steventest.txt, in native format per the -n argument.  The server to use is stevenw, the login id is sa, and the password is blank.  Any errors that are encountered will be logged in error.txt.
For a full listing of command arguments with an explanation of their usage please refer to the URL listed at the beginning of this section.

Appendix B – BULK INSERT (T-SQL)

BULK INSERT (T-SQL) - Command Syntax
BULK INSERT [['database_name'.]['owner'].]{'table_name' FROM data_file}
[WITH
(
[ BATCHSIZE [= batch_size]]
[[,] CHECK_CONSTRAINTS]
[[,] CODEPAGE [= 'ACP' | 'OEM' | 'RAW' | 'code_page']]
[[,] DATAFILETYPE [=
{'char' | 'native'| 'widechar' | 'widenative'}]]
[[,] FIELDTERMINATOR [= 'field_terminator']]
[[,] FIRSTROW [= first_row]]
[[,] FORMATFILE [= 'format_file_path']]
[[,] KEEPIDENTITY]
[[,] KEEPNULLS]
[[,] KILOBYTES_PER_BATCH [= kilobytes_per_batch]]
[[,] LASTROW [= last_row]]
[[,] MAXERRORS [= max_errors]]
[[,] ORDER ({column [ASC | DESC]} [,...n])]
[[,] ROWS_PER_BATCH [= rows_per_batch]]
[[,] ROWTERMINATOR [= 'row_terminator']]
[[,] TABLOCK]
)
]
Example

BULK INSERT Northwind.dbo.[Order Details]

    FROM 'f:\orders\lineitem.tbl'

    WITH

        (

            FIELDTERMINATOR = '|',

            ROWTERMINATOR = '|\n'

        )
This example inserts values into the Order Details table from the lineitem.tbl file.  It specifies that fields are pipe delimited and that each row is terminated with a pipe followed by a newline character.
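If the caveats from earlier apply, the same statement can be extended with batching and locking options. This is a sketch; the batch size is illustrative and should be tuned to your load.

```sql
BULK INSERT Northwind.dbo.[Order Details]
    FROM 'f:\orders\lineitem.tbl'
    WITH
        (
            FIELDTERMINATOR = '|',
            ROWTERMINATOR = '|\n',
            BATCHSIZE = 10000,  -- commit every 10,000 rows so a failure only rolls back the current batch
            TABLOCK,            -- take a table-level lock to reduce lock contention during the load
            KEEPNULLS           -- keep nulls in the data file rather than applying column defaults
        )
```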

Tags: ETL
