It is well known that you get better performance if you compare two strings in a binary collation than in a dictionary collation. But is the performance difference significant enough to warrant the use of an explicit COLLATE clause in the string comparison expression? That was a question came up in a conversation I had with a colleague recently.
To get a feel for the extent of the performance difference, I ran several tests comparing two commonly-used collations:
· Latin1_General_BIN, and
· SQL_Latin1_General_CP1_CI_AS
More specifically, I ran the following T-SQL script in a loop for 100 times:
DECLARE @s1 varchar(max), @s2 varchar(max), @i int
DECLARE @dummy int, @dt datetime
DECLARE @length int
SET @length = 1000000 -- or 100, or 1000000
SELECT @s1 = REPLICATE(CAST('a' as varchar(max)), @length)
SELECT @s2 = REPLICATE(CAST('a' as varchar(max)), @length)
SELECT @i = 1, @dt = GETDATE()
WHILE @i < 1000
BEGIN
IF @s1 = @s2 COLLATE SQL_Latin1_General_CP1_CI_AS
--IF @s1 = @s2 COLLATE Latin1_General_BIN
SET @dummy =0
SET @i = @i + 1
END
SELECT DATEDIFF(ms, @dt, GETDATE())
The script was run in a number of scenarios:
· The @length variable was set to 100, 10,000, and 1,000,000 to see the impact of the string length on the comparison method,
· The string comparison was done in the IF clause with either the Latin1_General_BIN collation or the SQL_Latin1_General_CP1_CI_AS collation
The most salient point to note is that as the string length increases, the performance difference between the two comparison methods increases. While there was no performance difference when the strings were 100 bytes long, the binary comparison was about twice as fast as the dictionary comparison when the string length was 10,000 bytes. When the string length increased to 1,000,000, the binary comparison was about seven times faster.
So, if you are comparing two short strings, you probably shouldn’t bother to explicitly cast the comparison into a binary collation. But when the strings can be very long, it can result in a huge performance improvement to explicitly specify a binary collation for string comparison.
Note that string comparison in COLLATE Latin1_General_BIN and string comparison in COLLATE SQL_Latin1_General_CP1_CI_AS have different semantics. But there are many cases where you have control over your data, and the difference in comparison semantics does not matter (e.g. when you know your data is all in lower case).
Nothing is particularly new here. Just want to contribute some data points to the community.
Things borrower must know about Pre Closing Home loan
-
Most people tend to take a home loan for 15 to 20 years as this usually
offers the lowest home loan interest rates. Now to prepay the loan, you
have to p...
No comments:
Post a Comment