Sunday, January 18, 2009

Seek or scan?

One very common question that I see on the forums is on index seeks and index scans. A query is resulting in a table/clustered index scan, even though there’s an index on one or more of the columns been searched on.

One of the more common reasons that this happens is because the index in question is not covering, and SQL has determined that the cost of doing the lookups to fetch the extra columns is higher than the cost of scanning the entire table.

If an index does not cover a query, then bookmark lookups are required to get the additional columns, bookmark lookups are run one row at a time, and are seeks on the clustered index. Hence it’s clear that bookmark lookups on a large number of rows are exceedingly expensive and that is why SQL will switch to a clustered index/table scan when lookups are required on a significant percentage of the rows in the table.

So, what consitutes a significant percentage of the rows in the table? 50%? 20%? 10%?

The answer’s often surprising. It’s under 1% of the total rows in the table.

Here’s some test code to demonstrate that.

Setup code:



# CREATE TABLE [dbo].[SeekOrScan](
# [ID] [int] IDENTITY(1,1) NOT NULL primary key,
# SomeNumber int,
# padding char(100)
# ) ON [PRIMARY]
#
# CREATE NonCLUSTERED INDEX [idx_SeekOrScan] ON [dbo].[SeekOrScan] (SomeNumber)


Test code:

1. insert into [SeekOrScan] (SomeNumber)
2. select top 1000000 0
3. from master..spt_values a cross join master..spt_values b
4. where a.name is null and b.name is null
5.
6. update [SeekOrScan] set SomeNumber = ID -- just so we've got sequential numbers for the between
7. GO
8.
9. dbcc freeproccache
10.
11. select * from [SeekOrScan]
12. where somenumber between 1 and 100000 -- 10% of table
13. -- Clustered index scan
14.
15. dbcc freeproccache
16.
17. select * from [SeekOrScan]
18. where somenumber between 1 and 50000 -- 5% of table
19. -- Clustered index scan
20.
21. dbcc freeproccache
22.
23. select * from [SeekOrScan]
24. where somenumber between 1 and 10000 -- 1%
25. -- Clustered index scan
26.
27. dbcc freeproccache
28.
29. select * from [SeekOrScan]
30. where somenumber between 1 and 5000 -- 0.5% of table
31. -- clustered index scan
32.
33. dbcc freeproccache
34.
35. select * from [SeekOrScan]
36. where somenumber between 1 and 3000 -- 0.3% of table
37. -- nonclustered index seek

insert into [SeekOrScan] (SomeNumber) select top 1000000 0 from master..spt_values a cross join master..spt_values b where a.name is null and b.name is null update [SeekOrScan] set SomeNumber = ID -- just so we've got sequential numbers for the between GO dbcc freeproccache select * from [SeekOrScan] where somenumber between 1 and 100000 -- 10% of table -- Clustered index scan dbcc freeproccache select * from [SeekOrScan] where somenumber between 1 and 50000 -- 5% of table -- Clustered index scan dbcc freeproccache select * from [SeekOrScan] where somenumber between 1 and 10000 -- 1% -- Clustered index scan dbcc freeproccache select * from [SeekOrScan] where somenumber between 1 and 5000 -- 0.5% of table -- clustered index scan dbcc freeproccache select * from [SeekOrScan] where somenumber between 1 and 3000 -- 0.3% of table -- nonclustered index seek

So somewhere around 0.4% of the table, the index seek with bookmark lookup became more expensive than a table scan.

No comments: