Thursday, March 5, 2009

Importing and exporting bulk data with SQL Server's bcp utility

The Bulk Copy Program (bcp) utility in Microsoft SQL Server enables database administrators to import bulk data into a table or export it from a table into a file. It also supports several options that define how data will be exported, where it will be imported and which data will be loaded.
This tip discusses several examples of bcp commands to bulk-copy data in and out of SQL Server tables. The examples have been tested on SQL Server 2005 and SQL Server 2008, and for all examples I used the AdventureWorks sample database.

Exporting data with the bcp utility

One of the simplest operations that you can perform with the bcp utility is to bulk-copy data out of a SQL Server table into a text file. By entering the command at a Windows command prompt, you can run a bcp command. The following command, for example, copies the data from the Sales.vSalesPerson view in the AdventureWorks database to the C:\Data\SalesPerson.txt file:

bcp AdventureWorks.Sales.vSalesPerson out C:\Data\SalesPerson.txt -c –T

As you can see, the bcp command begins with the utility name, followed by the fully qualified name of the table (database.schema.table). The out keyword is next, which tells the bcp utility that the data will be exported from the table. The path and file name of the target text file follows the out keyword. Note that the command examples in this tip might wrap to multiple lines. All the examples, however, should be run as a single command.

In addition to these basic arguments, the bcp utility supports several switches that control the utility's behavior. In the example above, the -c switch indicates that all data is treated as character data, regardless of how it is stored in the source table. If you do not specify the -c switch or another type-related switch, you will be prompted to specify the type for each column after you enter the bcp command. You may notice that bcp command switches are case sensitive.

The next switch in the above example is -T, which tells the bcp utility to use a trusted connection to connect to the SQL Server instance. If you do not specify -T, you must provide your user name (the -U switch) and your password (the -P switch), or you will be prompted for this information.

Because no instance is specified in the preceding example, the utility uses the default instance on the local machine. To specify a SQL Server instance, use the -S switch, followed by the server name, as in the following example:

bcp AdventureWorks.Sales.vSalesPerson out C:\Data\SalesPerson.txt -c -T -S Server01

The bcp utility will now connect to the default instance on Server01. If you want to connect to an instance other than the default, you must specify the instance name along with the server name, as in Server01\SqlSrv.

By default, the bcp utility uses tabs to separate fields in the target file. You can, however, override that behavior by using the -t switch, as in the following example:

bcp AdventureWorks.Sales.vSalesPerson out C:\Data\SalesPerson.csv -c -T -t,

In this case, a comma is specified after the -t switch, which means that the data fields will now be separated by commas. This allows you to save the data to a .csv file, as I've done here, which can be handy if you want to view the data in a Microsoft Excel file.

These examples have been limited to exporting data from a table. But you can also use the bcp command to run a Transact-SQL query and export the query results. The following bcp command, for example, includes a SELECT statement that retrieves only the SalesPersonID, FirstName, and LastName columns from the vSalesPerson view:

bcp "SELECT SalesPersonID, FirstName, LastName FROM AdventureWorks.Sales.vSalesPerson"
queryout C:\Data\SalesPerson.csv -c -T -t,

In this scenario, the query, enclosed in quotes, is passed into the bcp command instead of the table name. In addition, the out keyword has been replaced with queryout. But the rest of the command is the same as the preceding example. As a result, the SalesPerson.csv file will now contain only the three specified columns. You can also refine your query further: You can include, for example, a WHERE clause that qualifies which rows are returned from the source table.

Importing data with the bcp utility

The bcp utility makes importing data as simple as exporting data. To run the examples in this section, first run the following T-SQL script to create the SalesPeople table in the AdventureWorks database:

USE AdventureWorks
GO
IF OBJECT_ID (N'SalesPeople', N'U') IS NOT NULL
DROP TABLE dbo.SalesPeople
GO
CREATE TABLE dbo.SalesPeople (
SalesPersonID INT IDENTITY PRIMARY KEY,
FirstName NVARCHAR(50) NOT NULL,
LastName NVARCHAR(50) NOT NULL
)

To import data, you need a source file from which to copy the data. For instance, the following command uses the recently created SalesPerson.csv file to load data into the SalesPeople table:

bcp AdventureWorks.dbo.SalesPeople in C:\Data\SalesPerson.csv -c -T -t,

First, you must specify the target table, followed by the in keyword that replaces out or queryout in this scenario. Next, you must specify the path and file name of the source file, followed by any applicable switches.

When you run this command and view the results, note that while the source file includes the salesperson IDs, these values are not inserted into the SalesPersonID column. The column is defined as an IDENTITY column, so the source data is ignored. To retain the original values, you must add the -E switch to the command, as in the following example:

bcp AdventureWorks.dbo.SalesPeople in C:\Data\SalesPerson.csv -c -T -t, -E

Your table will now include the desired data.

Using a format file

When importing or exporting data, you may find that the structure of your source data does not match the structure of the target data. The columns in a text file, for example, might not be in the same order as those in your target table, or the number of columns may be different. You can address this issue by creating a format file that maps the structures of the source and destination. Let's look at an example to demonstrate how this works.

Suppose you use the following command to export data from the vSalesPerson view to the SalesPeople.txt file:

bcp "SELECT LastName, FirstName, SalesPersonID FROM AdventureWorks.Sales.vSalesPerson"
queryout C:\Data\SalesPeople.txt -c -T -t,

This command uses the same arguments as those in previous examples. Note, however, the order in which the columns are retrieved from the view: LastName, then FirstName, and finally SalesPersonID.

Now suppose you plan to use this file to import data into the SalesPeople table. The columns in the table, however, are defined in a different order from that in the text file. To address this issue, you can create a format file to map the columns from the source to the destination. The following command demonstrates how to create a format file:

bcp AdventureWorks.dbo.SalesPeople format nul -f C:\Data\SalesPeople.fmt -c -T -t,

As previous examples show, the command first specifies the target table. This time, however, the table name is followed by the keywords format nul, which indicate that the bcp utility should create a format file. The -f argument is then used to specify a path and file name for the format file, followed by several switches. As a result, when you run this command, a format file is generated that contains the structure of the SalesPeople table.

The following data shows the content of the SalesPeople.fmt format file that is generated by the above command:

10.0
3
1 SQLCHAR 0 12 "," 1 SalesPersonID ""
2 SQLCHAR 0 100 "," 2 FirstName SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 100 "\r\n" 3 LastName SQL_Latin1_General_CP1_CI_AS


The first row of the file (10.0) identifies the version of bcp that is being used. The second row (3) identifies the number of columns in the table, and the next three rows provide information about the columns themselves:


The first field indicates the order in which the columns should appear in the source document.


The second field shows the source file data type for each column. Because I specified the -c switch when I generated the file, a character data type is used for all fields as they are extracted from the data file. SQL Server will then convert this data into the correct types as the data is inserted.


The third field shows the prefix length for the field, which is used by SQL Server to provide the most compact file storage. A value of 0 is used automatically if you specify the -c switch when creating the format file.


The fourth field indicates the number of bytes for the data type of a particular field.


The fifth field shows how columns and rows are terminated. Because the -t switch was used when creating the format file, the field values of the source file must be terminated by commas.


The sixth field maps to the order in which the columns appear in the SQL Server table.


The seventh and final field provides the collation information for character columns in the SQL server table.

To use the format file to import data into the SalesPeople table, we must modify the file as follows:

10.0
3
1 SQLCHAR 0 100 "," 3 LastName SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 100 "," 2 FirstName SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 12 "\r\n" 1 SalesPersonID ""


As you can see, the order of the columns has been modified to reflect how they appear in the format file. The SalesPersonID column is now listed last and it is terminated with \r\n. The LastName column is now listed first and is terminated with a comma.

After you modify and save the format file, it is ready to use in your bcp commands. The following example demonstrates how to call the format file:

bcp AdventureWorks.dbo.SalesPeople in C:\Data\SalesPeople.txt -f C:\Data\SalesPeople.fmt –T

Note that, when you import data from the SalesPeople.txt file, you must also use the -f switch to call the format file. Also notice that you no longer need to include the -t and -c switches because the information is now included in the format file.

No comments: