Friday, October 24, 2008

Using BULK INSERT to Load a Text File

his example combines dynamic SQL, BULK INSERT and the proper handling of double-quotes to solve a client's problem with loading various text file formats into a database. (This article has been updated through SQL Server 2005.)

One of my clients contacted me recently and said they needed some help creating a stored procedure that imported data from a text file. They wanted the procedure to accept three parameters: PathFileName, OrderID, and FileType. The PathFileName is simply the name and physical location of the source file on the hard drive, the OrderID is generated in the program that calls the procedure and the FileType indicates the format of the data in the source file. The two possible formats for the source data are shown here:

FileType=1 (TxtFile1.txt)

"Kelly","Reynold","kelly@reynold.com"
"John","Smith","bill@smith.com"
"Sara","Parker","sara@parker.com"

FileType=2 (TxtFile2.txt)

Kelly,Reynold,kelly@reynold.com
John,Smith,bill@smith.com
Sara,Parker,sara@parker.com

BULK INSERT

I decided to use BULK INSERT to implement the solution. The BULK INSERT statement was introduced in SQL Server 7 and allows you to interact with bcp (bulk copy program) via a script. In pre-7 versions the only way you could access bcp functionality was from a command prompt. I am not going to list the full syntax of BULK INSERT here (but you can find it here), because it is a little long and most of it does not apply to the problem I am solving. Instead, I will show the valid BULK INSERT statements used to load the data shown above.

BULK INSERT TmpStList FROM 'c:\TxtFile1.txt' WITH (FIELDTERMINATOR = '","')

TmpStList is the target table and TxtFile1.txt is the source data file. The source file is located in the root of the C drive. The FIELDTERMINATOR argument allows you to specify the delimeter used to discern column values.

The valid statement for FileType=2 is shown here:

BULK INSERT tmpStList FROM 'c:\TxtFile2.txt' WITH (FIELDTERMINATOR = ',')

The only difference is the value of the FIELDTERMINATOR argument.
The Solution

The stored procedure used to implement the solution is fairly straight forward once you master the BULK INSERT statement. The only real trick is loading the data that comes to you in FileType=1 format. Because a double-quote starts and ends a data row, it too is loaded in the table. The FIELDTERMINATOR works between columns, not at the beginning or end of a row. To workaround this I simply load the data into a temporary table and then use a CASE statement and the SUBSTRING and DATALENGTH functions to load the correct data in the final table. The FileType=2 data will load as-is, but I still put in the temporary table for consistency (easier programming).

The SQL statements that create the temporary and final table are shown here.

CREATE TABLE StudentList
(
StID int IDENTITY NOT NULL,
StFName varchar(50) NOT NULL,
StLName varchar(50) NOT NULL,
StEmail varchar(100) NOT NULL,
OrderID int NOT NULL
)
go
CREATE TABLE TmpStList
(
stFName varchar (50) NOT NULL,
stLName varchar (50) NOT NULL,
stEmail varchar (100) NOT NULL
)
go

The procedure used to implement the data loading is shown here.

SET QUOTED_IDENTIFIER OFF
go
CREATE PROCEDURE ps_StudentList_Import
@PathFileName varchar(100),
@OrderID integer,
@FileType tinyint
AS

--Step 1: Build Valid BULK INSERT Statement
DECLARE @SQL varchar(2000)
IF @FileType = 1
BEGIN
-- Valid format: "John","Smith","john@smith.com"
SET @SQL = "BULK INSERT TmpStList FROM '"+@PathFileName+"' WITH (FIELDTERMINATOR = '"",""') "
END
ELSE
BEGIN
-- Valid format: John,Smith,john@smith.com
SET @SQL = "BULK INSERT TmpStList FROM '"+@PathFileName+"' WITH (FIELDTERMINATOR = ',') "
END

--Step 2: Execute BULK INSERT statement
EXEC (@SQL)

--Step 3: INSERT data into final table
INSERT StudentList (StFName,StLName,StEmail,OrderID)
SELECT CASE WHEN @FileType = 1 THEN SUBSTRING(StFName,2,DATALENGTH(StFName)-1)
ELSE StFName
END,
SUBSTRING(StLName,1,DATALENGTH(StLName)-0),
CASE WHEN @FileType = 1 THEN SUBSTRING(StEmail,1,DATALENGTH(StEmail)-1)
ELSE StEmail
END,
@OrderID
FROM tmpStList

--Step 4: Empty temporary table
TRUNCATE TABLE TmpStList
go

The first thing you need to know is that SET QUOTED_IDENTIFIER is set to OFF because double-quotes are used to set the value of a variable. Dynamic SQL is used to create the BULK INSERT statement on-the-fly, and double-quotes are required to do this. The final BULK INSERT statement is a function of both the @PathFileName and @FileType parameters. Once built, it is executed with the EXEC() statement and the source data is loaded into the temporary table.

Once the data is in TmpStList, the next step is to load it into the final table. I use the CASE statement to determine the value in the @FileType parameter and manipulate accordingly. When @FileType=1, the SUBSTRING and DATALENGTH functions are used to remove the double-quotes from the StFName and StEmail columns. When FileType=2, the data is loaded as is and no manipulation is required.

After the data is loaded I empty the temporary table with the TRUNCATE TABLE statement. I could have used DELETE to accomplish this, but TRUNCATE TABLE has less of an impact on the transaction log.

The following shows the way to call the procedure specifying a different FileType value for each call.

EXEC ps_StudentList_Import 'c:\TxtFile1.txt',1, 1
EXEC ps_StudentList_Import 'c:\TxtFile2.txt',1, 2

No comments: