Tuesday, November 18, 2008

Avoid cursors in SQL Server with these methods to loop over records

Many articles have beaten up on SQL Server cursors -- database objects that manipulate data in a set on a row-by-row basis -- and I want to add my name to the list of people who wish cursors had never been introduced. But, unfortunately, cursors are a fact of life. Problems with cursors include extending locks, their inability to cache execution plans and CPU/RAM overhead. Many T-SQL programmers and DBAs do not know how to successfully loop over records without the need for cursors. In this tip, I'll share some alternatives to cursors that provide looping functionality.

Method 1: Temp table with identity column

In the first approach, we will use a temp table with an identity column added to allow for row-by-row selection. If you're performing an INSERT/UPDATE/DELETE, be sure to use the explicit transactions. This vastly reduces the load on your log file by committing per loop, and it prevents huge rollbacks in the case of failure.

set nocount on
declare @i int --iterator
declare @iRwCnt int --rowcount
declare @sValue varchar(100)

set @i = 1 --initialize

create table #tbl(ID int identity(1,1), Value varchar(100))

insert into #tbl(Value)
select name
from master..sysdatabases (nolock)

set @iRwCnt = @@ROWCOUNT --SCOPE_IDENTITY() would also work

create clustered index idx_tmp on #tbl(ID) WITH FILLFACTOR = 100
/*

Always do this after the insert, since it's faster to add the index in bulk than to update the index as you write into the temp table. Since you know the data in this column, you can set the fill factor to 100% to get the best read times.

*/
while @i <= @iRwCnt
begin
select @sValue = Value from #tbl where ID = @i

--begin tran
print 'My Value is ' + @sValue --replace with your operations on this value
--commit tran

set @i = @i + 1
end
drop table #tbl


Method 2: Temp table without ID

In the second approach, we use a temp table without an identity column and simply grab the top row to process, then loop until we find no more rows to process. If you're performing an INSERT/UPDATE/DELETE, again, be sure to use the explicit transactions to vastly reduce the load on your log file by committing per loop, which prevents huge rollbacks in the case of failure.

set nocount on

declare @i int --iterator
declare @iRwCnt int --rowcount
declare @sValue varchar(100)
set @i = 1 --initialize

create table #tbl(Value varchar(100))

insert into #tbl(Value)
select name
from master..sysdatabases (nolock)

set @iRwCnt = @@ROWCOUNT --SCOPE_IDENTITY() would also work

create clustered index idx_tmp on #tbl(Value) WITH FILLFACTOR = 100
/*

Always do this after the insert, since it's faster to add the index in bulk than to update the index as you write into the temp table. Since you know the data in this column, you can set the fill factor to 100% to get the best read times.

*/

while @iRwCnt > 0
begin
select top 1 @sValue = Value from #tbl
set @iRwCnt = @@ROWCOUNT --ensure that we still have data

if @iRwCnt > 0
begin

--begin tran
print 'My Value is ' + @sValue --replace with your operations on this value
--commit tran

delete from #tbl where value = @sValue --remove processed record
end
end

drop table #tbl


Method 3: Selecting a comma-delimited list of items

When most developers/DBAs are asked to come up with a list of comma-delimited values from a table, they typically use a cursor or temp table (as above) to loop through the records. However, if you do not need to use a GROUP BY or an ORDER BY, then you can use the method below that operates in batch to handle the task. This cannot be used with GROUP BY DISTINCT, or ORDER BY, because of how SQL Server handles those operations.

Basically, this takes a given variable, and for every row in the table it adds the current value to the variable along with a comma.

declare @vrs varchar(4000)
declare @sTbl sysname
set @sTbl = 'TableName'
set @vrs = ''
select @vrs = @vrs + ', ' + name from syscolumns where id = (select st.id from sysobjects as st where name = @sTbl) order by colorder
set @vrs = right(@vrs, len(@vrs)-2)
print @vrs

This article gives you some good reasons why cursors in SQL Server should be avoided as well as some alternatives that give you looping functionality. Keep in mind that SQL Server is designed around batch processing, so the less you loop, the faster your system will run.

3 comments:

Pankaj said...

Too good.... very useful..

Thanks

Anonymous said...

Great work dude!!

I have a temp table having data in millions and then have to iterate it. @ROWCOUNT iteration method added significant amount in terms of performance.
Tanks you.

Unknown said...

Great work dude!!

I have a temp table having data in millions and then have to iterate it. @ROWCOUNT iteration method added significant amount in terms of performance.
Tanks you.