Friday, January 30, 2009

Good bye


I'm not sure any words or condolences can make an announcement like this any less painful. News like this hits families and friends directly, and I consider all of you friends and extended family. I don't even want to think about the anger and disappointment everyone feels over there. To say I am shocked and confused is an understatement.

Playing a blame game of what went wrong and where leaves me with no answers or direction. It's easy to blame the current U.S. economy, and perhaps I was somewhat blinded by the fact that it had not already hit our company. I assumed this was hitting companies much bigger than ourselves that may have been overstaffed to start with, and that it was more manufacturing-based than education-based. I didn't connect the massive number of home foreclosures to the taxes that support the educational world (90% comes from state taxes), nor did I think about companies and the taxes they pay on profits. With companies downsizing because there are no profits, there are no corporate taxes being paid either. Here is a local story I found online today as well: http://www.waow.com/Global/Story.asp?S=9442775. I heard more stories like this on the news last night. (Keep in mind Wisconsin is only one of many states we operate in.) It seems like every night I turn on the news there is a new local company cutting jobs, and on the radio it's massive layoffs at companies forecasting billion-dollar losses, shutting down stores and factories.

I hope our time together will be remembered as one of the best experiences of our lives. I hope the programs you have developed have made you better programmers than when you first started, and I hope Suresh and I will be remembered as some of the best managers you have ever worked for. Perhaps that is hard to believe given the most recent events, but I am SO PROUD of the work you have done. As a team, you have shown so much growth through our time together.

Anil, you have been a rock on this team with your programming practices and delivery builds. I think your fundamentals have improved greatly as well as your ability to take complex concepts and ideas (Program Selection on CMT, SystemInfo.dll, Object Oriented Class level project builds) and bring them to life. There is no doubt in my mind that you will go on to bigger and better things in the programming world. You have what it takes to be a great programmer in this world.

It's hard to talk about success when external factors force our working relationships to be closed and terminated. I will never consider the time we spent working together anything less than professional, productive and honorable. I was lucky to work with a staff as great as you have been. I really do hope you stay in touch and keep me up to date on the personal as well as professional experiences in your lives. This is a sad day for Renaissance Learning and one I never want to be part of again. Several at the RLI home offices have also been let go, and over the next few months more could happen. It's a scary time that nobody wants any part of. I heard from my grandparents about the Great Depression and how bad it was. It's a different time now, yet watching so many people lose their jobs, even outside of RLI and the surrounding communities, makes me fear it will be a long and difficult journey back to where we need to be. I just hope we remember what it took to get us there and all the people who were sacrificed in the process.

Anil, if you ever need a reference, don't be afraid to email me. Currently I check my work email more often than my personal one, but as we have seen, things can change way too quickly. So stay in touch at both accounts.

Thank you for everything you have developed for us; it's been more than I could have asked for.

Tuesday, January 27, 2009

Restore a SQL Server 2008 Database into SQL Server 2005

If you've tried to back up a database in SQL Server 2008 and then restore it into SQL Server 2005, you know that the database backups are not backward compatible. However, with SQL Server 2008 Management Studio, you can script data and schemas in SQL Server 2005 mode. This 10-Minute Solution takes you through the steps to back up the Northwind database on SQL Server 2008 and restore it to SQL Server 2005.

Problem
Transferring databases from SQL Server 2008 to SQL Server 2005 is not possible with standard backup and restore facilities.

Solution
Leverage the scripting wizard in SQL Server 2008 to script data as well as schemas in SQL Server 2005 compatibility mode.

Using the "Generate SQL Server Scripts" Wizard
The Northwind database is no longer shipped as part of the SQL Server installation, but you can download it from go.microsoft.com. The data is scripted as INSERT statements.

To create the scripts, you have to run the "Generate SQL Server Scripts" wizard, which you can run within SQL Server Management Studio (once Object Explorer is connected to the appropriate instance) by right clicking on the database and selecting "Tasks –> Generate Scripts."

1. Click "Script all objects in the selected database" (see Figure 2), and then click "Next."

2. Amend the script options: Specifically, set "Script for Server Version" to "SQL Server 2005" and set "Script Data" to "True" (see Figure 3). (SQL Server 2000 is also supported.) If you are putting the database on a new instance for the first time, make sure the "Script Database Create" option is set to "True." Click "Next" when you are happy with the options.

3. Select "Script to file," choose the file name and "Single file," and then click "Next" to see a summary.
4. Click "Finish"; progress messages are displayed while the script runs to completion.
If the generation process fails, then you can use the "Report" option to see why.

5. When the scripting is completed, look for the following lines:

CREATE DATABASE [Northwind] ON PRIMARY

(NAME = N'Northwind', FILENAME =
N'C:\Program Files\Microsoft SQL Server\MSSQL10.SQL2008\MSSQL\DATA\northwnd.mdf' ,
SIZE = 3328KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB )

LOG ON

(NAME = N'Northwind_log', FILENAME =
N'C:\Program Files\Microsoft SQL Server\MSSQL10.SQL2008\MSSQL\DATA\northwnd.ldf' ,
SIZE = 1024KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)

GO

You will need to amend the file paths so they are valid for the computer you are running on. You will also need to comment out the following lines, like this:


--EXEC sys.sp_db_vardecimal_storage_format N'Northwind', N'ON'
--GO


Once you have made these changes, you can run the script in SQL Server 2005 Management Studio to recreate the database in your development environment. You can now test data against SQL Server 2008 and SQL Server 2005.
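As a quick sanity check after running the script (a sketch that assumes the Northwind name used in this example), you can confirm the recreated database is at the SQL Server 2005 compatibility level:

SELECT name, compatibility_level   -- 90 = SQL Server 2005
FROM sys.databases
WHERE name = 'Northwind';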

The Inevitable Limitations
Of course, this technique is not without its limitations. Here are a few to bear in mind:

* The data is insecure, as it is in clear readable text. So if you are using real data, you should delete the file created once you have loaded it into SQL Server 2005. You can regenerate the file from the SQL 2008 backup, if necessary.
* If you have a database with a large amount of data, the script file, of course, will be huge.
* SQL Server 2008 specifics in the source database will not be migrated.

SQL Server 2008 Management Studio Trick

I had a database selected in the Object Explorer window and I had the Object Explorer Details window open. I noticed a little icon at the bottom of the screen:



Then I saw that the bar above it was a moveable bar. So I moved it and saw this:



Whoa! So then I tried a table, HumanResources.Department from AdventureWorks2008:



Which caused me to check a procedure:




Each line has a little icon on the side that lets you copy it, line by line. It’s really just a way to display the basic properties about the object in a readily accessible format. It’s not going to change my life, or yours, but it sure is handy to know a quick way to access some basic information.

F4 is also a lot more handy in SSMS these days than it used to be. My favorite is hitting F4 after clicking on a node in a graphical execution plan, but it works for several other things as well, including tables, procedures, etc. And it doesn't require you to keep Object Explorer Details open (which causes some problems of its own, at least pre-CU3).

SQL Server Query Execution Plan Analysis

If your SQL Server has multiple CPUs, and you have not changed the default setting in SQL Server to limit SQL Server's ability to use all of the CPUs in the server, then the query optimizer will consider using parallelism to execute some queries. Parallelism refers to the ability to execute a query on more than one CPU at the same time. In many cases, a query that runs on multiple processors is faster than a query that only runs on a single processor, but not always.
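To see the server-wide setting the previous sentence refers to, a quick sketch (run with sysadmin rights) is to query the 'max degree of parallelism' option:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism';  -- a run_value of 0 means all available CPUs can be used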

The Query Optimizer will not always use parallelism, even though it potentially can. This is because the Query Optimizer takes a variety of different things into consideration before it decides to use parallelism: for example, how many active concurrent connections are there, how busy is the CPU, is there enough available memory to run parallel queries, how many rows are being processed, and what type of query is being run? Once the Query Optimizer collects all the facts, it decides whether parallelism is best for this particular run of the query. You may find that one time a query runs without parallelism, but that a later execution of the same query uses parallelism.

In some cases, the overhead of using multiple processors is greater than the resource savings of using them. While the query processor does try to weigh the pros and cons of using a parallel query, it doesn't always guess correctly.

If you suspect that parallelism might be hurting the performance of a particular query, you can turn off parallelism for this particular query by using the OPTION (MAXDOP 1) hint.
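For example, a minimal sketch with a hypothetical table and columns; the hint is simply appended to the end of the statement:

SELECT CustomerID, COUNT(*)      -- hypothetical table and columns
FROM dbo.Orders
GROUP BY CustomerID
OPTION (MAXDOP 1);               -- forces a serial plan for this query only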

The only way to know for sure is to test the query both ways, and see what happens. [7.0, 2000, 2005]

Wednesday, January 21, 2009

How can I confirm whether a SQL Server installation is 32-bit or 64-bit (x64)?

And how do you confirm the component type once a particular SQL Server 2005 instance has been installed?

You can obtain the relevant information using a TSQL statement or by referring to the SQL installation directory on that Windows server. Each of these options is outlined below:

Transact-SQL statement:

Execute SELECT @@VERSION from the Query Editor or Query Analyzer, and the results will look like the following:

SQL 90

Microsoft SQL Server 2005 - 9.00.2153.00 (X64) May 9 2006 13:58:37 Copyright (c) 1988-2005 Microsoft Corporation Standard Edition (64-bit) on Windows NT 5.2 (Build 3790: Service Pack 1)

Microsoft SQL Server 2005 - 9.00.2047.00 (Intel X86) Apr 14 2006 01:12:25 Copyright (c) 1988-2005 Microsoft Corporation Standard Edition on Windows NT 5.2 (Build 3790: Service Pack 1)

SQL 80

Microsoft SQL Server 2000 - 8.00.2148 (Intel X86) Jul 7 2005 20:33:10 Copyright (c) 1988-2003 Microsoft Corporation Standard Edition on Windows NT 5.0 (Build 2195: Service Pack 4)

If the installation is 32-bit then, as shown above, you will see 'Intel X86' after the SQL Server version number; for a 64-bit installation you will see 'X64' instead.
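Alternatively, a small sketch that works on SQL Server 2005 and later: SERVERPROPERTY reports the edition, which includes the bitness.

SELECT SERVERPROPERTY('Edition')        AS Edition          -- e.g. 'Standard Edition (64-bit)'
      ,SERVERPROPERTY('ProductVersion') AS ProductVersion;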

SQL Server Installation Directory:

The default installation directory for SQL Server is under \Program Files\. On an x64 operating system, the two flavors can be found in the following locations:

64-bit:
C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn

32-bit:
C:\Program Files (x86)\Microsoft SQL Server\MSSQL.1\MSSQL\Binn

For more information on SQL Server 2005 x64, refer to
http://www.microsoft.com/sqlserver/2005/en/us/64-bit.aspx

How to import an Excel file which has columns with more than 255 characters to SQL Server using SSIS?

When the driver determines that an Excel column contains text data, the driver selects the data type based on the longest value that it samples. If the driver does not discover any values longer than 255 characters in the rows that it samples, it treats the column as a 255-character string column instead of a memo column. Therefore, values longer than 255 characters may be truncated. To import data from a memo column without truncation, you must make sure that the memo column in at least one of the sampled rows contains a value longer than 255 characters, or you must increase the number of rows sampled by the driver to include such a row.

You can increase the number of rows sampled by increasing the value of TypeGuessRows under the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel registry key. The default is 8 rows.

Uninstall and Remove Multiple Database Instances of Microsoft SQL Server 2005

When a database system administrator installs Microsoft SQL Server 2005 and creates several database engine instances, removing and uninstalling those instance components and their corresponding SQL Server services can be a problem. When users try to uninstall Microsoft SQL Server 2005 or remove database instance components using Add or Remove Programs, or Uninstall or change a program (Vista), in Control Panel, only one SQL Server 2005 database engine instance can be selected. After that uninstallation has completed, attempting to run the SQL Server 2005 uninstaller again to cleanly remove any other instances or components fails with an error message saying the uninstaller shortcut entry is invalid and asking whether you want to remove the link.

When you run Add or Remove Programs or Programs and Features (for Windows Vista) in Control Panel, it actually executes the ARPWrapper.exe program that is installed by the Microsoft SQL Server Setup Support Files component, using the /Remove option. However, when ARPWrapper.exe completes its removal task, it also uninstalls the Microsoft SQL Server Setup Support Files component and therefore removes itself, or deletes the reference to the ARPWrapper.exe program. This is what prevents administrators from running the uninstaller again to remove database instances or uninstall SQL Server 2005.

The resolution for removing multiple database engine instances of Microsoft SQL Server 2005 is to reinstall ARPWrapper.exe after removing each component. The following guide provides step-by-step instructions on how to manually uninstall a stand-alone instance or multiple instances of SQL Server 2005.

1. Log in to the server with an administrator user account or the SQL Server 2005 service account.
2. Stop all active SQL Server services in the Services applet in Control Panel, as any active connections may prevent the uninstallation process from finishing successfully.
3. Go to Control Panel and open the Add or Remove Programs applet.
4. Verify that Microsoft SQL Server Setup Support Files appears in the list of installed programs. If it is found, proceed to the next step. If it is not listed, install the component by running the sqlsupport.msi file located in the Servers\setup\ folder of the original SQL Server 2005 installation media (disc, folder, or USB flash drive containing the SQL Server 2005 setup files).
5. Open a command prompt (in Vista, use a command prompt with administrator rights), and run the following command to uninstall the SQL Server components:

"%ProgramFiles%\Microsoft SQL Server\90\Setup Bootstrap\ARPWrapper.exe" /Remove
6. Uninstall the SQL Server instance components, such as the database engine, one at a time.

Uninstall SQL Server 2005 Components


7. Once the removal completes, repeat all of the above steps until all the SQL Server components are uninstalled.
8. Finally, uninstall the Microsoft SQL Server Setup Support Files component using Add or Remove Programs (it should already have been removed, but double-check to be sure).

The above guide assumes that you didn't encounter any other problems and simply want to get rid of all database instances of SQL Server 2005. If you run into other issues, refer to KB909906 for how to manually uninstall SQL Server.

Tuesday, January 20, 2009

25 most dangerous programming errors

The SANS Institute has explanations of the 25 most dangerous programming errors, according to security experts from all over the world working for a number of different computer security organizations. As pointed out early in the article:

The impact of these errors is far reaching. Just two of them led to more than 1.5 million web site security breaches during 2008 - and those breaches cascaded onto the computers of people who visited those web sites, turning their computers into zombies.

The 25 errors, organized by type, are:

Insecure Interaction Between Components


* Improper Input Validation
* Improper Encoding or Escaping of Output
* Failure to Preserve SQL Query Structure
* Failure to Preserve Web Page Structure
* Failure to Preserve OS Command Structure
* Cleartext Transmission of Sensitive Information
* Cross-Site Request Forgery
* Race Condition
* Error Message Information Leak

Risky Resource Management

* Failure to Constrain Operations within the Bounds of a Memory Buffer
* External Control of Critical State Data
* External Control of File Name or Path
* Untrusted Search Path
* Failure to Control Generation of Code
* Download of Code Without Integrity Check
* Improper Resource Shutdown or Release
* Improper Initialization
* Incorrect Calculation

Porous Defenses

* Improper Access Control
* Use of a Broken or Risky Cryptographic Algorithm
* Hard-Coded Password
* Insecure Permission Assignment for Critical Resource
* Use of Insufficiently Random Values
* Execution with Unnecessary Privileges
* Client-Side Enforcement of Server-Side Security

Sunday, January 18, 2009

Seek or scan?

One very common question that I see on the forums is about index seeks and index scans: a query is resulting in a table/clustered index scan, even though there's an index on one or more of the columns being searched on.

One of the more common reasons that this happens is because the index in question is not covering, and SQL has determined that the cost of doing the lookups to fetch the extra columns is higher than the cost of scanning the entire table.

If an index does not cover a query, then bookmark lookups are required to get the additional columns. Bookmark lookups are run one row at a time, as seeks on the clustered index. Hence bookmark lookups on a large number of rows are exceedingly expensive, and that is why SQL will switch to a clustered index/table scan when lookups are required on a significant percentage of the rows in the table.

So, what constitutes a significant percentage of the rows in the table? 50%? 20%? 10%?

The answer’s often surprising. It’s under 1% of the total rows in the table.

Here’s some test code to demonstrate that.

Setup code:



CREATE TABLE [dbo].[SeekOrScan](
    [ID] [int] IDENTITY(1,1) NOT NULL primary key,
    SomeNumber int,
    padding char(100)
) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [idx_SeekOrScan] ON [dbo].[SeekOrScan] (SomeNumber)


Test code:

insert into [SeekOrScan] (SomeNumber)
select top 1000000 0
from master..spt_values a cross join master..spt_values b
where a.name is null and b.name is null

update [SeekOrScan] set SomeNumber = ID -- just so we've got sequential numbers for the between
GO

dbcc freeproccache

select * from [SeekOrScan]
where somenumber between 1 and 100000 -- 10% of table
-- Clustered index scan

dbcc freeproccache

select * from [SeekOrScan]
where somenumber between 1 and 50000 -- 5% of table
-- Clustered index scan

dbcc freeproccache

select * from [SeekOrScan]
where somenumber between 1 and 10000 -- 1%
-- Clustered index scan

dbcc freeproccache

select * from [SeekOrScan]
where somenumber between 1 and 5000 -- 0.5% of table
-- clustered index scan

dbcc freeproccache

select * from [SeekOrScan]
where somenumber between 1 and 3000 -- 0.3% of table
-- nonclustered index seek


So somewhere around 0.4% of the table, the index seek with bookmark lookup became more expensive than a table scan.
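If you want to see the cost the optimizer is avoiding, you can force the nonclustered index on one of the larger ranges and compare the plans (a sketch using the table from the script above; the index hint is for investigation only):

select * from [SeekOrScan] with (index(idx_SeekOrScan))
where somenumber between 1 and 100000 -- 10% of table: a seek plus 100,000 key lookups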

Wednesday, January 14, 2009

Optimizing SQL Server 2008

Optimization in SQL Server 2008
Using parameterized queries is a well known SQL Server Best Practice. This technique ensures caching and reuse of existing query execution plans (instead of constantly compiling new plans), as well as avoiding SQL injection by mandating that input data be data type safe.
See more about SQL Server parameterization Best Practices here: http://blogs.msdn.com/sqlprogrammability/archive/2007/01/13/6-0-best-programming-practices.aspx
An application that I work with presented me with an interesting dilemma: it wanted the benefits of plan reuse, but the parameter values the application initially sends to SQL Server are not representative of the values passed on subsequent re-executions of the statement. SQL Server compiled and cached a ‘good’ plan for the first parameter values. Unfortunately, this had the unintended side effect of caching a poor execution plan for all subsequent parameter values. To make this clearer, let’s look at the following example query:
Select * from t where col1 > @P1 or col2 > @P2 order by col1;
Let’s assume, for simplicity’s sake, that col1 is unique and ever increasing in value, that col2 has 1000 distinct values, that there are 10,000,000 rows in the table, that the clustered index consists of col1, and that a nonclustered index exists on col2.
Imagine the query execution plan created for the following initially passed parameters: @P1= 1 @P2=99
These values would result in an optimal queryplan for the following statement using the substituted parameters:
Select * from t where col1 > 1 or col2 > 99 order by col1;
Now, imagine the query execution plan if the initial parameter values were: @P1 = 6,000,000 and @P2 = 550.
As before, an optimal queryplan would be created after substituting the passed parameters:
Select * from t where col1 > 6000000 or col2 > 550 order by col1;
These two identical parameterized SQL statements would potentially create and cache very different execution plans due to the difference in the initially passed parameter values. However, since SQL Server caches only one execution plan per query, chances are very high that in the first case the query execution plan will utilize a clustered index scan because of the ‘col1 > 1’ parameter substitution, whereas in the second case a query execution plan using an index seek would most likely be created.
Unfortunately if the initial parameter values are similar to the first example above, then a ‘table scan’ execution plan gets created and cached, even though most of the following queries would rather use a plan that contains the index seek.
There are a number of ways to work around this issue:
· Recompile every time the query is executed using the RECOMPILE hint - This can be very CPU intensive and effectively eliminates the benefits of caching queryplans.
· Unparameterize the query – Not a viable option in most cases due to SQL injection risk.
· Hint with specific parameters using the OPTIMIZE FOR hint (however, what value(s) should the app developer use?). This is a great option if the values in the rows are static, that is, not growing in number; in my case, though, the rows were not static. A sketch of this option follows the list below.
· Forcing the use of a specific index
· Use a plan guide – Using any of the recommendations above.
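For reference, here is a quick sketch of the OPTIMIZE FOR hint with specific values, using the example table from above and the second set of parameter values purely for illustration:

Select * from t where col1 > @P1 or col2 > @P2 order by col1
option (OPTIMIZE FOR (@P1 = 6000000, @P2 = 550));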
SQL Server 2008 provides another alternative: OPTIMIZE FOR UNKNOWN

SQL Server 2008 provides a different alternative: the OPTIMIZE FOR UNKNOWN optimizer hint. This hint directs the query optimizer to use the standard algorithms it has always used when no parameter values are passed to the query at all. In this case the optimizer will look at all available statistical data to determine what values to use for the local variables when generating the queryplan, instead of looking at the specific parameter values that were passed to the query by the application.
Full documentation of optimizer hints can be found here:
http://msdn.microsoft.com/en-us/library/ms181714(SQL.100).aspx
Example:
@p1=1, @p2=9998,
Select * from t where col1 > @p1 or col2 > @p2 order by col1
option (OPTIMIZE FOR (@p1 UNKNOWN, @p2 UNKNOWN))
Using this new optimizer hint option has allowed the ISV to generate queries that get the benefits of parameterization, such as plan reuse, while eliminating the problems caused by caching queryplans that were created using atypical initially passed parameter values.
NOTE: This new optimizer hint option, like all optimizer hints, should be used only by experienced developers and database administrators in cases where SQL Server cannot create an optimal plan.
Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv 4 Comments
Friday, October 31, 2008 1:49 PM
SQL Server 2008 : new binary – hex string conversion functionality can dramatically improve related query performance by orders of magnitude.
In previous SQL Server releases it wasn’t possible to convert binary data to string characters in hex format directly, because SQL Server did not have a built-in Transact-SQL command for converting binary data to a hexadecimal string. The Transact-SQL CONVERT command converted binary data to character data in a one-byte-to-one-character fashion: SQL Server would take each byte of the source binary data, convert it to an integer value, and then use that integer value as the ASCII value for the destination character data. This behavior applied to the binary, varbinary, and timestamp datatypes.

The only workarounds were to use either a stored procedure as described in a Knowledge Base Article: "INFO: Converting Binary Data to Hexadecimal String" ( http://support.microsoft.com/kb/104829 ) or by writing a CLR function.

An ISV I work with doesn’t support CLR, so they implemented their own version of a custom convert function in the form of a stored procedure. This one was even faster than everything else they found on the Internet.

NEW – in SQL Server 2008 the CONVERT function was extended to support binary data – hex string conversion. It looks like a tiny improvement, almost not worth mentioning.
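As a quick illustration of the new behavior (a sketch only; the value is arbitrary), style 1 of CONVERT keeps the '0x' prefix and style 2 drops it:

SELECT CONVERT(varchar(max), 0x0123456789A12345, 1);  -- returns '0x0123456789A12345'
SELECT CONVERT(varchar(max), 0x0123456789A12345, 2);  -- returns '0123456789A12345'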

However, for the ISV it was a big step forward as some critical queries need this functionality. Besides the fact that they no longer have to ship and maintain their own stored procedure, a simple repro showed a tremendous performance improvement.

Repro:
=====

I transformed the procedure described in the KB article mentioned above into a simple function. The stored procedure below will create a simple test table with one varbinary column and insert some test rows in 10K packages ( e.g. nr_rows = 100 -> 1 million rows in the table ).

The repro shows two different test cases:
1. insert 0x0 two million times
2. insert 0x0123456789A12345 two million times

The longer the value, the bigger the disadvantage of the stored procedure solution. On my test machine the results of the test queries below were:
(both tests were done with the same SQL Server 2008 instance - no change of any settings)

1. two million times value 0x0

a, using stored procedure : about 3460 logical reads, no disk IO, ~52 secs elapsed time
b, using new convert feature : about 5200 logical reads, no disk IO, < 1 sec elapsed time

2. two million times value 0x0123456789A12345
a, using stored procedure : about 3460 logical reads, no disk IO, ~157 secs elapsed time
b, using new convert feature : about 5200 logical reads, no disk IO, < 1 sec elapsed time

Repro Script:
========

create function sp_hexadecimal ( @binvalue varbinary(255) )
returns varchar(255)
as
begin
    declare @charvalue varchar(255)
    declare @i int
    declare @length int
    declare @hexstring char(16)
    select @charvalue = '0x'
    select @i = 1
    select @length = datalength(@binvalue)
    select @hexstring = '0123456789abcdef'
    while (@i <= @length)
    begin
        declare @tempint int
        declare @firstint int
        declare @secondint int
        select @tempint = convert(int, substring(@binvalue, @i, 1))
        select @firstint = floor(@tempint / 16)
        select @secondint = @tempint - (@firstint * 16)
        select @charvalue = @charvalue +
            substring(@hexstring, @firstint + 1, 1) +
            substring(@hexstring, @secondint + 1, 1)
        select @i = @i + 1
    end
    return ( @charvalue )
end


create procedure cr_conv_test_table ( @value varbinary(16), @nr_rows int )
as
begin
    declare @exist int
    declare @counter int
    set NOCOUNT ON
    set statistics time off
    set statistics io off
    set statistics profile off

    set @exist = ( select count(*) from sys.objects
                   where name = 'conv_test_table' and
                         type = 'U' )
    if( @exist = 1 )
        drop table conv_test_table

    set @exist = ( select count(*) from sys.objects
                   where name = 'conv_test_table_temp' and
                         type = 'U' )
    if( @exist = 1 )
        drop table conv_test_table_temp

    create table conv_test_table ( varbincol varbinary(16) )
    create table conv_test_table_temp ( varbincol varbinary(16) )

    set @counter = 10000
    while @counter > 0
    begin
        insert into conv_test_table_temp values ( @value )
        set @counter = @counter - 1
    end

    set @counter = @nr_rows
    while @counter > 0
    begin
        insert into conv_test_table select * from conv_test_table_temp
        set @counter = @counter - 1
    end
end

-- create 2 million test rows
execute cr_conv_test_table 0x0, 200

set statistics time on
set statistics io on

-- compare runtime of stored procedure with new convert feature
select count(*) from conv_test_table
where dbo.sp_hexadecimal(varbincol) = '0x00'
select count(*) from conv_test_table
where CONVERT(varchar(255),varbincol,1) = '0x00'

-- create 2 million test rows
execute cr_conv_test_table 0x0123456789A12345, 200

set statistics time on
set statistics io on

-- compare runtime of stored procedure with new convert feature
select count(*) from conv_test_table
where dbo.sp_hexadecimal(varbincol) = '0x0123456789A12345'
select count(*) from conv_test_table
where CONVERT(varchar(255),varbincol,1) = '0x0123456789A12345'



Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv 0 Comments
Friday, August 22, 2008 1:54 PM
How to create an autonomous transaction in SQL Server 2008
Many customers and partners, especially those migrating from Oracle, have asked me this question: how do you create an autonomous transaction in SQL Server? It turns out to be a tricky thing to do, since SQL Server doesn't have built-in autonomous transaction support like Oracle does.
An autonomous transaction is essentially a nested transaction where the inner transaction is not affected by the state of the outer transaction. In other words, you can leave the context of the current (outer) transaction and call another transaction (the autonomous transaction). Once you finish work in the autonomous transaction, you can come back and continue within the current transaction. What is done in the autonomous transaction is truly DONE and won't be changed no matter what happens to the outer transaction. To make it easier to understand, here is an example of the described scenario.
BEGIN TRAN OuterTran
INSERT TABLE1
BEGIN “AUTONOMOUS” TRAN InnerTran
INSERT TABLE2
COMMIT “AUTONOMOUS” TRAN InnerTran
ROLLBACK TRAN OuterTran
The above pseudo script is meant to preserve the result of “INSERT TABLE2”. In SQL Server 2008 and prior versions, "ROLLBACK TRAN" (without specifying a savepoint) always rolls back all inner transactions to the outermost "BEGIN TRAN" statement, so the "InnerTran" transaction would be rolled back as well, which is not the desired behavior for this particular scenario.
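For contrast, a minimal sketch using throwaway temp tables shows that a savepoint gives you the opposite behavior: it lets you roll back only the inner work while keeping the outer work, which does not help in this scenario.

CREATE TABLE #t1 (id INT);
CREATE TABLE #t2 (id INT);

BEGIN TRAN OuterTran;
    INSERT INTO #t1 VALUES (1);
    SAVE TRAN InnerWork;
    INSERT INTO #t2 VALUES (1);
    ROLLBACK TRAN InnerWork;      -- undoes only the insert into #t2
COMMIT TRAN OuterTran;

SELECT * FROM #t1;                -- one row survives
SELECT * FROM #t2;                -- empty
DROP TABLE #t1;
DROP TABLE #t2;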
You might wonder why we need an autonomous transaction in the first place. Why can't we just implement two separate transactions so they don't interfere with each other? There are scenarios where people do need logic structured like this; logging errors to a database table is one of the most common. Below is a TSQL script demonstrating a nested transaction where the inner transaction attempts to save the runtime errors in a table.
USE TEMPDB
GO
CREATE TABLE ErrorLogging (logTime DATETIME, msg VARCHAR(255))
CREATE TABLE TestAT (id INT PRIMARY KEY)
GO
CREATE PROCEDURE usp_ErrorLogging
@errNumber INT
AS
INSERT INTO ErrorLogging VALUES (GETDATE(), 'Error ' + CAST(@errNumber AS VARCHAR(8)) +' occurred.')
GO

DECLARE @ERROR AS INT
INSERT INTO TestAT VALUES (1)
BEGIN TRAN OuterTran
INSERT INTO TestAT VALUES (1) -- This will raise primary key constraint violation error

SELECT @ERROR = @@ERROR
IF @ERROR <> 0
BEGIN
BEGIN TRAN InnerTran
EXEC usp_ErrorLogging @ERROR
COMMIT TRAN InnerTran

ROLLBACK TRAN OuterTran
END

IF @@TRANCOUNT > 0
COMMIT TRAN OuterTran
GO
SELECT * FROM TestAT
SELECT * FROM ErrorLogging
GO
If you run the above script against SQL Server, you will see no error message recorded in the "ErrorLogging" table, due to the "ROLLBACK TRAN OuterTran" statement. So, how can we make it work?
In SQL Server 2008, you can implement a loopback linked server to achieve the same goal. For more information about loopback linked server, check Books Online for details (http://msdn.microsoft.com/en-us/library/ms188716.aspx).

USE MASTER
GO
EXEC sp_addlinkedserver @server = N'loopback',@srvproduct = N' ',@provider = N'SQLNCLI', @datasrc = @@SERVERNAME
GO
EXEC sp_serveroption loopback,N'remote proc transaction promotion','FALSE'
Go
Note that 'remote proc transaction promotion' is a new option in SQL Server 2008 which allows you to control whether or not a remote stored procedure call is enlisted in a distributed transaction. When this option is off (FALSE), as set in the above example, the local transaction will not be promoted to a distributed transaction. This is how we are able to separate the outer and inner transactions in an "autonomous transaction" fashion.
The Inner transaction above can be replaced by:
BEGIN TRAN InnerTran
EXEC loopback.tempdb.dbo.usp_ErrorLogging @ERROR
COMMIT TRAN InnerTran
The full working script is in the appendix below. I want to point out that this method of using a loopback linked server might not scale well if it is executed very frequently, and it only works in SQL Server 2008 due to the new 'remote proc transaction promotion' server option discussed above. As always, test before you use it.
If you are looking for alternative ways of creating autonomous transactions on SQL Server 2008 or 2005, you have these options:
A loopback connection from a SQLCLR procedure to start a new transaction. Compared to the more rigid structure required by a loopback linked server, SQLCLR is more flexible and gives you more control over how you handle interaction with the database. If the logic of the autonomous transaction includes computational tasks, SQLCLR's performance on such work is an extra benefit.
Using a table variable to save data within the transaction. Table variables are not affected by transaction rollback and thus serve as a temporary buffer for transaction data. Once the transaction is done, you can copy the data out of the table variable into a permanent table (see the sketch after this list). Table variables have limited scope and are less flexible, and they are usually slower due to the lack of indexes/statistics. However, this is a pure TSQL option with no need to create anything new.
A loopback connection from an extended stored procedure.
However, extended stored procedures are on the SQL Server deprecation list and we strongly recommend NOT using them.
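Here is a minimal sketch of the table variable approach (assumed names; it reuses the ErrorLogging table from the script above):

DECLARE @errBuffer TABLE (logTime DATETIME, msg VARCHAR(255));

BEGIN TRAN;
    INSERT INTO @errBuffer VALUES (GETDATE(), 'Error 2627 occurred.');
ROLLBACK TRAN;                               -- the table variable keeps its row

INSERT INTO ErrorLogging (logTime, msg)
SELECT logTime, msg FROM @errBuffer;         -- persist it outside the rolled-back transaction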
In a future blog, I'll provide sample SQLCLR code and a script using a table variable to create autonomous transactions. I will also compare their performance against the loopback linked server in a scalability test. Stay tuned.
Appendix
USE MASTER
GO
EXEC sp_addlinkedserver @server = N'loopback',@srvproduct = N' ',@provider = N'SQLNCLI', @datasrc = @@SERVERNAME
GO
EXEC sp_serveroption loopback,N'remote proc transaction promotion','FALSE'
EXEC sp_serveroption loopback,N'RPC OUT','TRUE' -- Enable RPC to the given server.
Go
USE TEMPDB
GO
CREATE TABLE ErrorLogging (logTime DATETIME, msg VARCHAR(255))
CREATE TABLE TestAT (id INT PRIMARY KEY)
GO
CREATE PROCEDURE usp_ErrorLogging
@errNumber INT
AS
INSERT INTO ErrorLogging VALUES (GETDATE(), 'Error ' + CAST(@errNumber AS VARCHAR(8)) +' occurred.')
GO

DECLARE @ERROR AS INT
INSERT INTO TestAT VALUES (1)
BEGIN TRAN OuterTran
INSERT INTO TestAT VALUES (1) -- This will raise primary key constraint violation error

SELECT @ERROR = @@ERROR
IF @ERROR <> 0
BEGIN
BEGIN TRAN InnerTran
EXEC loopback.tempdb.dbo.usp_ErrorLogging @ERROR
COMMIT TRAN InnerTran

ROLLBACK TRAN OuterTran
END

IF @@TRANCOUNT > 0
COMMIT TRAN OuterTran
GO
SELECT * FROM TestAT
SELECT * FROM ErrorLogging
GO
Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv 2 Comments
Friday, July 11, 2008 1:51 PM
UPDATE with OUTPUT clause – Triggers – and SQLMoreResults

NOTE: the code in this blog is TSQL instead of ODBC calls. Since ODBC can be hard to understand, and other APIs will have the same basic issues, I decided to use the simpler and more concise TSQL, which should also appeal to a wider audience.

An ISV I work with recently ran into an interesting problem; here is the description and solution.

PROBLEM:
Adding an unexpected trigger caused application code to fail due to incomplete SQL Syntax, and not reading through all returned results.

The ISV wanted to utilize the OUTPUT Clause of the UPDATE statement in their ODBC (SNAC) based application. The OUTPUT clause is very useful in providing data back to the application regarding the row, or rows, which were updated (or: inserted / deleted). In the example I use below, the application is interested in knowing the date/time of the updated row(s).

This could be accomplished by issuing the following statement:
UPDATE T SET COL2 = @Pcol2, COL3 = getdate() OUTPUT CAST(INSERTED.COL3 AS varchar(30)) WHERE COL1 = @Pcol1

The ISV coded up the application expecting a return value for number of rows affected, and if that value was greater than 0 then it also returned the value of the inserted date/time.

This worked well, until an external Partner application added a trigger to the table listed in the UPDATE statement.

Example: CREATE TRIGGER [dbo].[TTrigger1] on [dbo].[T] after update as update t2 set col3 = 0

Now the application failed on the UPDATE statement with the following error message:
[Microsoft][SQL Native Client][SQL Server]The target table 'T' of the DML statement cannot have any enabled triggers if the statement contains an OUTPUT clause without INTO clause.

The error message is self-explanatory, but was a surprise to the ISV application (and the application developer). The developer did not expect a trigger to ever be created on the table.
There are two different methods of getting OUTPUT data from an UPDATE statement:
· UPDATE with the OUTPUT clause only – this returns output results directly as part of the statement. This option cannot be used if the table has any enabled triggers.
· UPDATE with OUTPUT and INTO clauses – this returns the output into a specific table or table variable. This option must be used if there is any possibility the table will have a trigger on it at any point.
· See the following website for the complete OUTPUT clause documentation:
http://msdn.microsoft.com/en-us/library/ms177564.aspx
The developer then utilized the following syntax to send the same statement to SQL Server, and also to get the expected result back: declare @p165 table (col2 varchar(30));UPDATE T SET COL2 = ?, COL3 = getdate() OUTPUT CAST(INSERTED.COL3 AS varchar(30)) into @p165 WHERE COL1 = 1;select * from @p165

Now a subtlety occurred; can you guess what it was? If you guessed that additional results are returned, you are correct.
The ODBC code returned data in a loop utilizing the following API calls: SQLFetch, SQLNumResultCols, SQLRowCount, SQLMoreResults:
· The first results returned were the number of rows affected by the trigger, not the number of rows affected by the UPDATE statement, which was what the application was actually expecting
· The second set of results were the number of rows affected by the UPDATE statement
· The third set of results were the number of rows returned by the SELECT statement reading the table variable
· And finally, the actual data from the updated row(s) – which is what we really wanted in the first place!
So, the lessons to be learned here are:
1. Be aware that triggers will affect your UPDATE statements if utilizing the OUTPUT clause
2. You should utilize the INTO clause to avoid the issue
3. Always use SQLMoreResults to read all of the result-sets that could be returned from SELECT, UPDATE, INSERT, or DELETE statements.
4. Triggers should include the ‘SET NOCOUNT ON’ statement to avoid returning the ‘affected number of rows’.
SOLUTION:
The application was changed to utilize the INTO clause, and SQLMoreResults was used to return all the resulting data. Using SET NOCOUNT ON in trigger logic is also a best practice; it prevents the additional 'rows affected' results from being generated.
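For example, the trigger from the repro script below could be written like this (a sketch) so that it no longer emits its own rows-affected count:

CREATE TRIGGER ttr1 ON t AFTER UPDATE AS
BEGIN
    SET NOCOUNT ON;                 -- suppress the trigger's 'rows affected' message
    UPDATE t2 SET col1 = 0;
END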

Here is a script to duplicate the issues I’ve described:
USE tempdb
GO
------You may want to run this script in steps from comment – to comment
------so you can follow along, instead of running the entire script at once

CREATE TABLE t(
[col1] [int] NOT NULL,
[col2] [varchar](30) NULL,
[col3] [datetime] NULL
) ON [PRIMARY]
GO
insert into t values (1,'abc', getdate())
insert into t values (1,'abc', getdate())
insert into t values (1,'abc', getdate())
GO
select * from t
GO
UPDATE t SET col2 = 'Peter', col3 = getdate()
OUTPUT CAST(INSERTED.col3 AS varchar(30)) WHERE col1 = 1
GO
select * from t
GO
------So far everything is good, Now let’s add the new table and the trigger
CREATE TABLE t2(
[col1] [int] NULL,
[col2] [datetime] NULL
) ON [PRIMARY]
GO
insert into t2 values (2, getdate())
insert into t2 values (2, getdate())
GO
select * from t2
GO
------In this example, the trigger: ttr1 will update the rows
------of a second table: t2
CREATE TRIGGER ttr1 on t after update as update t2 set col1 = 0
GO
------OK, let’s try now with the trigger on
UPDATE t SET col2 = 'Peter', col3 = getdate() OUTPUT CAST(INSERTED.col3 AS varchar(30)) WHERE col1 = 1
GO
------Chances are good you got the following error message
--Msg 334, Level 16, State 1, Line 1
--The target table 't' of the DML statement cannot have any enabled triggers --if the statement contains an OUTPUT clause without INTO clause.
----- let’s fix that now.
declare @p1 varchar(30)
UPDATE t SET col2 = 'Peter', col3 = getdate() OUTPUT CAST(INSERTED.col3 AS varchar(30)) into @p1 WHERE col1 = 1
GO
------Notice this failed as well with the following error message
--Msg 1087, Level 16, State 1, Line 2
--Must declare the table variable "@p1".
------We need to use a table
------for this to work correctly we must use a table or
------a table variable where the ‘INTO’ data will reside,
------and be retrieved from
declare @p1 table (col2 varchar(30))
UPDATE t SET col2 = 'Peter', col3 = getdate() OUTPUT CAST(INSERTED.col3 AS varchar(30)) into @p1 WHERE col1 = 1
select * from @p1
--Now you get what we were originally looking for
-- the date/times of the rows that were updated
--Look at the results under the 'Messages' tab as well...
--you will see the number of rows affected:
-- 2 for the rows inserted as part of the trigger
-- 3 for the rows Updated
-- and 3 for the rows we selected from the table variable
--Now, you can see that the application must utilize SQLMoreResults if it
--wants to return all the valid results.

Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv 1 Comments
Thursday, June 26, 2008 10:13 AM
Use SQL Server replay tools to reproduce and resolve customer issues
For many ISVs that run into issues at customer sites, it is sometimes difficult to isolate the underlying problems, especially in a 24x7 production environment where there are limits on real-time troubleshooting and live debugging. In situations like this, constructing a repro scenario in a separate environment is ideal, both to minimize the impact on the live production system and to speed up the resolution process.

SQL Server Profiler
Allow me to introduce SQL Server Profiler, which offers a replay trace function. It's not something new: first shipped in SQL Server 7.0, the feature has gone through many improvements in later releases. You can use the tool to take a captured trace as input and replay it against test database(s), which helps identify issues that can be reproduced by replaying the events in the trace. Profiler itself uses ODBC only. In SQL Server 2005 and 2008, the replay function can be configured to use multiple threads (up to 64) to replay workloads.
Advantages:
1. SQL Server Profiler is a built-in tool with full support from the Microsoft product team. It works out of the box.
2. Easy to set up and run. Capture a trace using the predefined replay template with all required events, and replay it against the original database(s) (the target machine needs to meet certain requirements: http://msdn2.microsoft.com/en-us/library/ms175525.aspx).
3. In addition to multi-threaded replay, it also provides a step-through option to replay events in the order they were traced.
Disadvantages:
1. Certain events can't be replayed, including replication, events involving GUIDs, session binding events, operations on blobs using bcp, full-text, READTEXT, WRITETEXT, and so on. See BOL for more details (http://msdn2.microsoft.com/en-us/library/ms188238.aspx).
2. The tool does not support multiple machine replay (running multiple instances of Profiler from multiple machines to replay the trace).
3. The Profiler GUI performs client-side tracing, which can be intrusive and generate significant performance overhead when capturing events for replay. Be careful about which events you capture, and consider using server-side tracing (sp_trace_create).

RML Utilities
Starting in SQL Server 2000, the SQL Server Customer Support Services (CSS) team started a project for a similar tool, called OStress, with higher flexibility and scalability to help troubleshoot some of the more challenging SQL problems. The latest version is packaged in the "Replay Markup Language (RML) Utilities," supporting both SQL 2005 and SQL 2000 (http://support.microsoft.com/kb/944837). The tool can replay multi-threaded events as Profiler does, but with multiple-machine replay support, and it can simulate up to 1000 concurrent threads. The package has a component called OStress (just like the old version) that takes a TSQL batch file and "stress" tests it by opening an arbitrary number of connections and iterating the TSQL batch in each connection for a configurable number of loops. This is useful when the workload can be characterized as the same or a similar batch coming from a varying number of users (connections).

Advantages:
1. The tool offers both replay and stress test options.
2. It supports multiple machine replay (multiple instances of OStress) with up to 1000 concurrent threads.
3. OStress supports 7.0, 2000, and 2005 trace formats.
Disadvantages:
1. The tool is provided as-is, with no technical support from Microsoft, but you can submit questions via the contact listed in the tool's readme.
2. Requires extra steps to process the trace file and convert it to RML format before it can be replayed.
3. Does not support MARS replay.

Recommendation
When should you use SQL Profiler and when should you use the RML Utilities? If you have a workload that can be replayed/reproduced with no or low concurrency requirements (fewer than 64 threads), SQL Profiler will do. If you need higher concurrency (more than 64 threads), or have an isolated batch that can be "stress" tested for simulation, use the RML Utilities. Keep in mind that for concurrency replay, a full sync of ordered events is very hard to achieve and no tools exist today that exactly duplicate the original trace, so issues that happened on the traced source server might not reproduce consistently afterwards, even in the same environment.

Both tools above are for database replay. For a simulation test of a multi-tier application environment, consider the load-test tool in Visual Studio Team Edition or third-party vendor products.
Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv 1 Comments
Tuesday, May 27, 2008 10:49 AM
SQL Server Intermittent Connectivity Issue
Recently, many customers of an ISV I work with reported intermittent connectivity issues when running the ISV application on SQL Server. Some customers reported the issue as SQL Server 2005 specific; others stated that they were experiencing the same issue on both SQL Server 2000 and 2005. Due to the intermittent nature and the variation of the issue, it took us quite a while to collect all the data (ODBC trace, netmon trace, SQL trace…), analyse it, and understand the exact cause.
SynAttackProtect
The first issue we found was a subtle Winsock behavior change in Windows 2003 SP1. Windows 2003 SP1 introduces a configurable registry setting, SynAttackProtect, that protects the server from network Denial-of-Service attacks. By default the protection is on. In a SQL Server environment, when the number of simultaneous client connection requests is more than the system can handle and the SQL Server backlog queue is full, the client will receive a 'connection failed' error from SQL Server:
TCP Provider: An existing connection was forcibly closed by the remote host
The SQL Protocols team has a good Blog that explains the detailed interaction between SynAttackProtect setting and SQL Server. See http://blogs.msdn.com/sql_protocols/archive/2006/04/12/574608.aspx.
In Windows 2003, this issue can be worked around by configuring the registry setting to disable SynAttackProtect.
1) Launch regedit.exe
2) Add DWORD value named SynAttackProtect under registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\
3) Set data value to 0
See http://technet2.microsoft.com/windowsserver/en/library/8d3a9f4d-13d1-4280-ac57-30242504d8ba1033.mspx?mfr=true for additional information.
After applying the registry change, two customers reported the intermittent connection issue went away. Both customers had been stress-testing SQL Server. SynAttackProtect is more likely to become an issue in a lab environment where SQL Server is being stress-tested with extreme load. Customers running stress-testing in lab environments should turn off SynAttackProtect. I am not sure I’d recommend proactively turning it off in a production system given the potential security risk. If a production system ever runs into the SynAttackProtect issue, the where/why of the large number of connection requests should be examined first.
Windows “Scalable Networking Pack”
The Windows Scalable Networking Pack was a second network stack change, released as part of Windows 2003 SP1 + KB91222, or Windows 2003 SP2. With the Scalable Networking Pack, the TCP Chimney Offload feature is enabled by default to increase performance. However, the implementations on certain network cards are problematic when TCP Chimney Offload is enabled and can cause intermittent connection drops. When the connection is dropped due to an incompatibility between the network card and the Windows Scalable Networking Pack, the typical error message is
[08S01] [Microsoft][SQL Native Client]Communication link failure
A workaround for this issue is to disable the TCP Chimney Offload feature.
1) Launch regedit.exe
2) Edit DWORD EnableTCPChimney under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters, set data value to 0
3) Edit DWORD EnableRSS under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters, set data value to 0
4) Edit DWORD EnableTCPA under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters, set data value to 0
5) Restart the server
Amongst others, network cards using Broadcom 5708 chipsets are known to have compatibility issues with TCP Chimney Offload. Disabling TCP Chimney Offload fixes the incompatibility issue; however, it also removes the benefit of the Windows TCP stack performance improvements. It should only be used as a workaround until a fix becomes available from the network card vendor.
See support article http://support.microsoft.com/kb/942861 for additional information.
By now, the majority of customers solved the intermittent connectivity issue after applying the SynAttackProtect and/or TCP Chimney Offload changes. Some customers connecting to SQL Server through Citrix have to turn off the TCP Chimney Offload feature on the Citrix server to fix the issue.
Query Timeout
The particular ISV application runs with a configurable query timeout; by default the timeout is set to 1 second. When a query timeout happens, the application retries the query 16 times; if it still fails, the query is submitted again with a nolock hint. How would that affect the connection? At first glance it seems irrelevant, but it does. Here's the sequence of what can happen.
1) The query is submitted to SNAC for execution
2) The execution of the query takes too long so the timeout expires
3) Once the timeout happens, the client attempts to cancel the query and sends an ATTN packet to the server. After sending ATTN, the client waits for the response from the server. A timeout is set for this wait; its value is obtained through SQL_ATTR_CONNECT_TIMEOUT and, if not set, defaults to 120 seconds.
4) The wait for the server to respond also times out. This is treated as an error by the client, and basically the connection is considered dead in this scenario.
5) The client marks the connection as dead and then returns the "Query timeout expired" error message.
6) The application code, on seeing the timeout expired message, attempts to execute the request again on the same connection, but immediately hits the "communication link failure" message because the connection has been deemed dead by the client.

During the investigation, we discovered a regression in SQL Server 2005: SQL Server 2005 may not respond to a query cancel (timeout) request in a timely manner if the query requires an index scan through a large number of pages. Checking for attention requests is delayed when the SQL Server storage engine is busy bringing pages in from disk, resulting in
[08S01] [Microsoft][SQL Native Client]Communication link failure

A hotfix is available to fix the regression, see details in
http://support.microsoft.com/kb/945442.

All the remaining customers had long-running batch queries that index-scan a large table, which was the last piece we needed to complete the puzzle. After applying the hotfix, all customers reported the problem solved.
Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv 2 Comments
Tuesday, March 18, 2008 4:25 PM
Using time zone data in SQL Server 2008

In SQL Server 2008 Microsoft has introduced a number of new date and time data types. One of these is the datetimeoffset data type. This data type includes an offset from UTC time as well as the datetime value, and ensures that the datetime can be retrieved in UTC or a particular timezone based on this offset. There is also a new function, SWITCHOFFSET(), that allows for conversions between different time zones.

An example from SQL Server 2008 Books On Line (BOL):
CREATE TABLE dbo.test
(
ColDatetimeoffset datetimeoffset
);
GO
INSERT INTO dbo.test
VALUES ('1998-09-20 7:45:50.71345 -5:00');
GO
SELECT SWITCHOFFSET (ColDatetimeoffset, '-08:00')
FROM dbo.test;
GO
--Returns: 1998-09-20 04:45:50.7134500 -08:00
SELECT ColDatetimeoffset
FROM dbo.test;
--Returns: 1998-09-20 07:45:50.7134500 -05:00

One of the most common questions we are asked is why we use the offset and not a timezone name. A timezone name is much easier to remember than an offset, and most people do not know an offset without looking it up, making queries more difficult.

Unfortunately, there is currently no international standards authority for timezone names and values. Each system needs to use a scheme of its own choosing, and until there is an international standard it is not feasible for SQL Server to try to provide one; doing so would ultimately cause more problems than it would solve. However, there are a couple of common schemes that are well recognized around the world. One is the dynamic timezone data stored in the Windows Vista registry. This data can be read from the registry into a file, which is then imported into SQL Server.
Another cross-platform standard is the public domain Olson timezone database (http://www.twinsun.com/tz/tz-link.htm). There are many public domain programs for extracting a time zone from these files, but at this time most are programmatic solutions. A programmatic function could be written in the CLR, but to provide full database functionality and queryability, a table is needed.

The programmatic solutions take a date and then apply the many different rules that determine when a zone is in daylight saving time and when it is not. However, there are also historical changes. Daylight saving time rules changed in the US in 2007, meaning that the offset for a particular zone at a particular time differs depending on the year. Then there are times when leap seconds need to be added. Therefore any data-driven solution must also have rows with valid time ranges.

The approach recommended here is to take one of the DLLs found on the web and, instead of providing a programmatic solution around a specific date, write all of the rows out into a database as offsets with their valid ranges. Currently this example uses the Windows standard naming conventions for timezones, with a mapping to the Olson timezone names, but you could easily add other names in other languages as well.

Working with timezones is very complex, and the following is only a suggestion for some ideas on how to use time zone data more effectively. This is an example program (no guarantees) that uses a .NET library to write the data from the Olson tz files in table format, which can then be imported into SQL Server. The .NET timezone library can be found at http://www.babiej.demon.nl/Tz4Net/main.htm, and its authors request a small donation.

Here is some sample code to write to files the timezone data (no guarantees – does not include leap seconds):

// Requires: using System; using System.IO; and the Tz4Net OlsonTimeZone types.
StreamWriter sr = File.CreateText(@"D:\TZMapping.txt");   // one row per time zone
StreamWriter tr = File.CreateText(@"D:\TZZones.txt");     // one row per transition range

string[] zoneNames = OlsonTimeZone.AllNames;
sr.WriteLine("ID\tDaylightName\tStandardName\tRawUtcOffset\tOffsetSeconds\tWin32Id");
tr.WriteLine("ID\tTransitionStart\tTransitionEnd\tDeltaSeconds\tDST");

for (int i = 0; i < zoneNames.Length; i++)
{
    OlsonTimeZone tz = OlsonTimeZone.GetInstanceFromOlsonName(zoneNames[i].ToString());

    // Parent row: one line per time zone.
    sr.Write(i.ToString() + "\t");
    sr.Write(tz.DaylightName.Trim() + "\t");
    sr.Write(tz.StandardName.Trim() + "\t");
    sr.Write(tz.RawUtcOffset.ToString() + "\t");
    sr.Write(tz.RawUtcOffset.TotalSeconds.ToString() + "\t");
    sr.WriteLine(tz.Win32Id == null ? "" : tz.Win32Id.Trim());

    // Child rows: one line per standard/daylight transition range.
    DaylightTime[] times = tz.AllTimeChanges;
    for (int j = 0; j < times.Length; j++)
    {
        tr.Write(i.ToString() + "\t");
        tr.Write(times[j].Start.ToString("yyyy-MM-dd HH:mm:ss") + "\t");
        tr.Write(times[j].End.ToString("yyyy-MM-dd HH:mm:ss") + "\t");
        tr.Write(times[j] is StandardTime ? "0\t" : times[j].Delta.TotalSeconds.ToString() + "\t");
        tr.WriteLine(times[j] is StandardTime ? false.ToString() : true.ToString());
    }
}
tr.WriteLine();
sr.WriteLine();
tr.Close();
sr.Close();

Import the TZMapping file, which will become the parent table, with the ID as the primary key. Your table structure might look like this:
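For example, a table along these lines matches the columns written out by the export program above (the data types are only a suggestion; adjust names and widths as needed):

CREATE TABLE dbo.TZMapping
(
    ID            int           NOT NULL PRIMARY KEY,
    DaylightName  nvarchar(100) NULL,
    StandardName  nvarchar(100) NOT NULL,
    RawUtcOffset  varchar(20)   NULL,
    OffsetSeconds int           NOT NULL,
    Win32Id       nvarchar(100) NULL
);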




Please note: if you use the Flat File Datasource in the Import Data Wizard in SQL Server 2008 Management Studio, you will need to open the Advanced tab and set the source OutputColumnWidth to greater than the default of 50. Then import the TZZones file, which will become the child table, with ID, TransitionStart, and TransitionEnd as the composite primary key and a foreign key reference to the TZMapping table. The TZZones table includes historical timezone data. Joining these new tables to your own data now allows queries that include standard names, Windows IDs, etc.
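A matching child table might look like this (again, only a sketch; the DST column simply records whether the row describes a daylight period):

CREATE TABLE dbo.TZZones
(
    ID              int      NOT NULL REFERENCES dbo.TZMapping (ID),
    TransitionStart datetime NOT NULL,
    TransitionEnd   datetime NOT NULL,
    DeltaSeconds    int      NOT NULL,
    DST             bit      NOT NULL,
    CONSTRAINT PK_TZZones PRIMARY KEY (ID, TransitionStart, TransitionEnd)
);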

For example, offsets can now be retrieved by a preferred name:

select UtcOffset from TZmapping where StandardName = 'US/Pacific (PST)'

The following two queries return different offset amounts for the same day in two different years. This is because the US changed its daylight saving time rules, and the date in March now falls within daylight saving time when it did not before.


select (dbo.TZMapping.OffsetSeconds + dbo.TZZones.DeltaSeconds)/3600
from dbo.TZMapping
join dbo.TZZones
on dbo.TZMapping.id = dbo.TZZones.id
where dbo.TZMapping.StandardName = 'America/Los_Angeles (PST)'
and '2006-03-15'
between dbo.TZZones.TransitionStart
and dbo.TZZones.TransitionEnd



select (dbo.TZMapping.OffsetSeconds + dbo.TZZones.DeltaSeconds)/3600
from dbo.TZMapping
join dbo.TZZones
on dbo.TZMapping.id = dbo.TZZones.id
where dbo.TZMapping.StandardName = 'America/Los_Angeles (PST)'
and '2007-03-15'
between dbo.TZZones.TransitionStart
and dbo.TZZones.TransitionEnd


Again, timezones are a complex area, and each application will need to decide how to handle time zone data to make programs more user friendly. This is just one small example.
Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv 5 Comments
Tuesday, March 18, 2008 3:49 PM
Increase your SQL Server performance by replacing cursors with set operations

You have probably heard many times, from different sources, that as a best practice you should avoid using TSQL cursors.
During a recent visit to a partner we ran into a common cursor case, which I wanted to use as an example to demonstrate why you should avoid TSQL cursors in most cases, and how to convert cursor logic to simple set join operations. There are certain scenarios where using a cursor makes sense; for example, a cursor is ideal for row-by-row processing that can't be accomplished by set-based operations. A cursor is flexible in that it provides a window, or subset, of data that allows manipulation of the data in various ways. Study carefully what you want to achieve on a case-by-case basis before using a cursor, and keep in mind that SQL Server, as a modern RDBMS, performs much better with set operations.
Here is a simplified version of a real cursor that was used to update a big table with over 200 million rows.
DECLARE @EntityId Varchar(16)
DECLARE @PerfId Varchar(16)
DECLARE @BaseId Varchar(16)
DECLARE @UpdateStatus Int

DECLARE outerCursor CURSOR FOR
SELECT EntityId, BaseId
FROM outerTable
--Returns 204,000 rows

OPEN outerCursor
FETCH NEXT FROM outerCursor INTO @EntityId, @BaseId

WHILE @@FETCH_STATUS = 0
BEGIN
DECLARE innerCursor CURSOR FOR
SELECT PRFMR_ID
FROM innerTable
WHERE ENTY_ID = @BaseId

OPEN innerCursor
FETCH NEXT FROM innerCursor INTO @PerfId
SET @UpdateStatus = @@FETCH_STATUS

WHILE @UpdateStatus = 0
BEGIN
UPDATE [200MilRowTable]
SET ENTY_ID = @EntityId
WHERE PRFMR_ID = @PerfId

FETCH NEXT FROM innerCursor INTO @PerfId
SET @UpdateStatus = @@FETCH_STATUS
END

CLOSE innerCursor
DEALLOCATE innerCursor --clean up inner cursor

FETCH NEXT FROM outerCursor INTO @EntityId, @BaseId
END

CLOSE outerCursor
DEALLOCATE outerCursor --clean up outer cursor


You might notice that this is a nested cursor, with 204,000 iterations in total for outerCursor. The innerTable has 10 million rows, but the number of innerCursor iterations varies depending on the @BaseId value from outerCursor. When I arrived at the customer, this cursor had already been running for over a day. The developer was "hoping" that it would finish soon, given another day or two. The problem was nobody knew for sure how much time this thing would need to complete. Well, we can find out how much progress it has made so far to make an educated guess:
SELECT execution_count, st.text
FROM sys.dm_exec_query_stats as qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as st
WHERE st.text like '%FETCH NEXT FROM outerCursor INTO%'

This would tell us how many times the outer cursor has looped already. It returned 107. That’s only around 0.5% (107/204,000) after 30 hours of running. If the trend were to continue, the cursor would need another 8+ months!!!
A rewrite of the batch to take advantage of set operations is as simple as this:
SELECT i.PRFMR_ID, o.EntityId INTO #tempTable
FROM innerTable i JOIN outerTable o ON i.ENTY_ID = o.BaseId
GO
UPDATE m
SET m.ENTY_ID = t.EntityId
FROM [200MilRowTable] m JOIN #tempTable t ON m.PRFMR_ID = t.PRFMR_ID
GO
--note this is only one of a few ways to rewrite.
In this particular case, "SELECT INTO" is minimally logged under the simple recovery model. The two-statement approach also makes the conversion logic easier to understand.
This batch took approximately 17 hours to complete. Between the statements, I also put the database into the simple recovery model and added appropriate indexes to the temp table. I also dropped the indexes on 200MilRowTable that touched ENTY_ID to speed this up; adding the indexes back took another 7 hours. The total time was approximately 24 hours, which is just a small fraction of the original cursor batch. I need to point out that the non-cursor batch uses more resources, since the UPDATE now spawns multiple threads to process in parallel. Remember, our goal here is to make this finish faster, not to worry about how many resources it consumes.
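For reference, the supporting steps mentioned above amount to something like the following sketch (the database name and index names are illustrative, not the customer's actual names):

ALTER DATABASE CustomerDb SET RECOVERY SIMPLE;
GO
-- Support the UPDATE join with an index on the temp table.
CREATE CLUSTERED INDEX IX_tempTable_PRFMR_ID ON #tempTable (PRFMR_ID);
GO
-- Drop (and later recreate) any indexes on the big table that include ENTY_ID.
DROP INDEX IX_200MilRowTable_ENTY_ID ON [200MilRowTable];
GO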
Note: this might not be a perfect example, because the nested cursor magnifies the slow performance.
However, the bottom line is: avoid cursors if possible and use joins / set operations whenever you can.
Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv 2 Comments
Thursday, March 06, 2008 11:33 AM
Appending Data Using SQL 2008 Filestream
SQL Server 2008 has a new feature called Filestream, which allows you to save large binary files in the file system instead of in the database. This is targeted directly at the scenario many document management and web applications have today, where they save some metadata in the database with a column holding the path to the actual file. However, there is no transactional context between the two; therefore, the data in the database can be changed so that it no longer points to the actual location, or the file can be moved without the pointer being updated. Users want to have documents, pictures, video, etc. in the file system, with the streaming support the file system provides, but have the metadata remain consistent. This is exactly what the Filestream feature is targeted at.

When a large binary file gets 'updated', most users/applications replace the file in its entirety (you do not usually update part of an image through the database). Filestream does not currently support in-place updates. Therefore an update to a column with the filestream attribute is implemented by creating a new zero-byte file, which then has the entire new data value written to it. When the update is committed, the file pointer is changed to point to the new file, leaving the old file to be deleted at garbage collection time. This happens at a checkpoint under the simple recovery model, and at a backup or log backup.

Code that updates large value data types (varchar(max), nvarchar(max), and varbinary(max)) may use the .WRITE(offset) UPDATE clause; however, since an in-place update of a column with the Filestream attribute is not supported, the .WRITE(offset) UPDATE clause will error out. Therefore, even though filestream is only a storage attribute, the update code does not work against both storage mechanisms, and requires knowledge of the underlying storage mechanism.
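To make the difference concrete, here is a minimal sketch (the table and column names are hypothetical): the same .WRITE clause that appends to an ordinary varbinary(max) column fails when the target column has the FILESTREAM attribute.

DECLARE @chunk varbinary(max) = 0x0102;

-- Works: append to a plain varbinary(max) column.
UPDATE dbo.Documents
SET PlainBlob.WRITE(@chunk, NULL, NULL)
WHERE DocId = 1;

-- Errors out: partial (in-place) updates are not supported for FILESTREAM columns.
UPDATE dbo.Documents
SET FileStreamBlob.WRITE(@chunk, NULL, NULL)
WHERE DocId = 1;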

Many of the update actions taken against large data types are actually appends, and many logging scenarios continuously append to an ever-growing file. Filestream offers an option for this case, to avoid the scenario where the original data is pulled to the client, the new data is appended, and a new file is written with the combined data, leaving the old file to be garbage collected later.

To avoid this round trip, a device FS control (FSCTL_SQL_FILESTREAM_FETCH_OLD_CONTENT) can be issued to trigger a server-side copy of the old content, which avoids moving the data over the network. This control is not yet documented; note that the DeviceIoControl ControlCode constant needed is 599392. This will be added to the Filestream API documentation at a future date.

Example code:
[DllImport("kernel32.dll", SetLastError = true)]
static extern bool DeviceIoControl(
    IntPtr hDevice,
    uint dwIoControlCode,
    IntPtr lpInBuffer,
    uint nInBufferSize,
    IntPtr lpOutBuffer,
    uint nOutBufferSize,
    ref uint lpBytesReturned,
    IntPtr lpOverlapped);

IntPtr hFile = IntPtr.Zero;
uint bytesReturned = 0;
// hFile = mySafeHandle.DangerousGetHandle(); or something similar

bool result = DeviceIoControl(hFile, 599392, IntPtr.Zero, 0, IntPtr.Zero, 0, ref bytesReturned, IntPtr.Zero);

As Filestream data is minimally logged, if there is no activity other than filestream activity there is usually not enough log activity to trigger a checkpoint, and the old files will not get deleted. Under a non-simple recovery model, the log will not increase, but the database will grow until those files get garbage collected. Garbage collection occurs on a background thread and can impact the server while it runs.

Therefore, it is a best practice to manually issue a checkpoint if you are using the simple recovery model, and to maintain a regular transaction log backup schedule otherwise, to avoid this file growth.
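For example (the database name and backup path are illustrative):

-- Simple recovery model: nudge garbage collection along by issuing a checkpoint.
CHECKPOINT;
GO
-- Full or bulk-logged recovery model: regular log backups keep garbage collection
-- moving and keep the log in check.
BACKUP LOG MyFilestreamDb TO DISK = 'D:\Backups\MyFilestreamDb.trn';
GO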

Other quick notes:

Filestream deletes are significantly faster than blob deletes.
If an application commonly deletes large data values, filestream will be more scalable because only the metadata is changed and the file is garbage collected asynchronously. For instance, the delete of a 6 GB varbinary(max) value that took 6 seconds happens in milliseconds with a filestream delete.

The Filestream maximum size differs between SNAC 9 and SNAC 10.
Files accessed using SNAC 9 have a maximum size of 2 GB; SNAC 10 supports unlimited size.

Note: altering an existing column without the filestream attribute so that it has the filestream attribute is not supported.
This means that moving data from blob storage to filestream storage requires copying the data to a new column declared with the filestream attribute.


Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv 1 Comments
Monday, February 25, 2008 10:57 AM
Now is time to try SQL Server 2008
Did you know that SQL Server 2008 is just around the corner, and that the SQL development team has just publicly released the latest Community Technology Preview (CTP 6) version?
This latest beta version of SQL Server 2008 includes many exciting new features like filtered indexes, sparse columns, data compression, Microsoft Sync, integrated full text search, report rendering in Microsoft Word, data visualization enhancements, extensive Books Online documentation, the first feature pack, and many, many more.
Now is a great time to try it for yourself! Download SQL Server 2008 here.
Also, the SQLCAT teams (including us on the SQLCAT ISV team) have been busy working with our customers and partners implementing pre-release (CTP) builds. We have many lessons learned to share with you; these can be found here.
Additionally, we have a number of great best practices papers coming up:
· Index Defragmentation Best Practices
· Best Practices for using Resource Governor with a mixed (OLTP and Reporting) workload
· Data Compression: Design and Implementation Best Practices, Performance Impact
· XEvents Toolkit
· SQL 2008 Usage-Based Optimization Efficacy (subject to name change)
· Scale-Out Querying with Analysis Services 2008
· Best practices for Data Warehousing in SQL Server 2008


Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv 1 Comments
Friday, June 29, 2007 8:56 AM
Detecting Overlapping Indexes in SQL Server 2005

When SQL Server has an optimal index that satisfies the search predicates of a query, the optimizer performs an index SEEK operation, as opposed to an index (or table) scan, to retrieve the required rows; this is desirable. Based on this, one might be led to believe that having an index for every possible query predicate set would result in all queries executing optimally. While true, keep in mind that indexes need to be maintained whenever the underlying table data in the columns included in the index changes, which amounts to overhead for the database engine. So, as you may guess, there are advantages to having indexes, but having too many can result in excessive overhead. This means you need to carefully evaluate the pros and cons before creating indexes.

In your first evaluation pass you clearly want to avoid the case of having overlapping indexes, as there is no additional value that an overlapping index provides. For example, consider a table 'TabA' and its three associated indexes, created with the following definitions.

CREATE TABLE TabA
( Col1 INT, Col2 INT, Col3 INT, Col4 INT );
GO

CREATE INDEX idx1 ON TabA ( Col1, Col2, Col3 );
CREATE INDEX idx2 ON TabA ( Col1, Col2 );
CREATE INDEX idx3 ON TabA ( Col1 DESC, Col2 DESC );
GO

In the table structure above, the index idx1 is a superset of (overlaps) the index idx2, making idx2 redundant. As you would expect, any query that needs to perform a search on Col1 and Col2 could use index idx1 just as well as index idx2, as seen in the graphical query plan below.
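For instance, a simple query such as the following (a minimal illustration) can be satisfied equally well by idx1 or idx2:

SELECT Col1, Col2
FROM TabA
WHERE Col1 = 10 AND Col2 = 20;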






Such overlapping indexes are often the result of multiple developers working on the same product without evaluating and understanding the existing schema before making additions. Once created, detecting such overlapping indexes in a database can be a laborious task requiring detailed analysis. More importantly, most DBAs do not like to disable or drop indexes because they are not certain which queries the indexes were created to help and fear the side effects the action may have.

The script below uses the new system catalog introduced in SQL Server 2005 to report all duplicate indexes in the current database context.

CREATE FUNCTION dbo.INDEX_COL_PROPERTIES (@TabName nvarchar(128), @IndexId INT, @ColId INT)
RETURNS INT
WITH EXECUTE AS CALLER
AS
BEGIN
DECLARE @IsDescending INT;
SELECT @IsDescending = is_descending_key
FROM sys.index_columns SYSIDXCOLS
WHERE OBJECT_ID(@TabName) = SYSIDXCOLS.object_id
AND @IndexId = SYSIDXCOLS.index_id
AND @ColId = SYSIDXCOLS.key_ordinal;

-- Return the value of @IsDescending as the property
RETURN(@IsDescending);
END;
GO

-- Find Duplicate Indexes in SQL Server Database
CREATE VIEW IndexList_VW AS
SELECT
SYSSCH.[name] AS SchemaName,
SYSOBJ.[name] AS TableName,
SYSIDX.[name] AS IndexName,
SYSIDX.[is_unique] AS IndexIsUnique,
SYSIDX.[type_desc] AS IndexType,
SYSIDX.[is_disabled] AS IsDisabled,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 1 ) AS Column1,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 2 ) AS Column2,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 3 ) AS Column3,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 4 ) AS Column4,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 5 ) AS Column5,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 6 ) AS Column6,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 7 ) AS Column7,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 8 ) AS Column8,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 9 ) AS Column9,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 10 ) AS Column10,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 11 ) AS Column11,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 12 ) AS Column12,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 13 ) AS Column13,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 14 ) AS Column14,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 15 ) AS Column15,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 16 ) AS Column16,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 1 ) AS Column1_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 2 ) AS Column2_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 3 ) AS Column3_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 4 ) AS Column4_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 5 ) AS Column5_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 6 ) AS Column6_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 7 ) AS Column7_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 8 ) AS Column8_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 9 ) AS Column9_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 10 ) AS Column10_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 11 ) AS Column11_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 12 ) AS Column12_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 13 ) AS Column13_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 14 ) AS Column14_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 15 ) AS Column15_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 16 ) AS Column16_Prop
FROM sys.indexes SYSIDX INNER JOIN sys.objects SYSOBJ ON SYSIDX.[object_id] = SYSOBJ.[object_id]
INNER JOIN sys.schemas SYSSCH ON SYSOBJ.schema_id = SYSSCH.schema_id
WHERE SYSIDX.index_id > 0
AND INDEXPROPERTY(SYSOBJ.[object_id], SYSIDX.[name], 'IsStatistics') = 0
GO

SELECT
vwA.SchemaName,
vwA.TableName,
vwA.IndexIsUnique,
vwA.IndexType,
vwA.IndexName,
vwB.IndexName AS OverlappingIndex,
vwA.Column1, vwA.Column2, vwA.Column3, vwA.Column4, vwA.Column5, vwA.Column6, vwA.Column7, vwA.Column8,
vwA.Column9, vwA.Column10, vwA.Column11, vwA.Column12, vwA.Column13, vwA.Column14, vwA.Column15, vwA.Column16
FROM IndexList_VW vwA INNER JOIN IndexList_VW vwB ON vwA.TableName=vwB.TableName
AND vwA.IndexName <> vwB.IndexName
AND vwA.IsDisabled = 0
AND vwB.IsDisabled = 0
AND (vwA.Column1=vwB.Column1 AND vwA.Column1_Prop=vwB.Column1_Prop)
AND ((vwA.Column2=vwB.Column2 AND vwA.Column2_Prop=vwB.Column2_Prop) OR vwA.Column2 IS NULL OR vwB.Column2 IS NULL)
AND ((vwA.Column3=vwB.Column3 AND vwA.Column3_Prop=vwB.Column3_Prop) OR vwA.Column3 IS NULL OR vwB.Column3 IS NULL)
AND ((vwA.Column4=vwB.Column4 AND vwA.Column4_Prop=vwB.Column4_Prop) OR vwA.Column4 IS NULL OR vwB.Column4 IS NULL)
AND ((vwA.Column5=vwB.Column5 AND vwA.Column5_Prop=vwB.Column5_Prop) OR vwA.Column5 IS NULL OR vwB.Column5 IS NULL)
AND ((vwA.Column6=vwB.Column6 AND vwA.Column6_Prop=vwB.Column6_Prop) OR vwA.Column6 IS NULL OR vwB.Column6 IS NULL)
AND ((vwA.Column7=vwB.Column7 AND vwA.Column7_Prop=vwB.Column7_Prop) OR vwA.Column7 IS NULL OR vwB.Column7 IS NULL)
AND ((vwA.Column8=vwB.Column8 AND vwA.Column8_Prop=vwB.Column8_Prop) OR vwA.Column8 IS NULL OR vwB.Column8 IS NULL)
AND ((vwA.Column9=vwB.Column9 AND vwA.Column9_Prop=vwB.Column9_Prop) OR vwA.Column9 IS NULL OR vwB.Column9 IS NULL)
AND ((vwA.Column10=vwB.Column10 AND vwA.Column10_Prop=vwB.Column10_Prop) OR vwA.Column10 IS NULL OR vwB.Column10 IS NULL)
AND ((vwA.Column11=vwB.Column11 AND vwA.Column11_Prop=vwB.Column11_Prop) OR vwA.Column11 IS NULL OR vwB.Column11 IS NULL)
AND ((vwA.Column12=vwB.Column12 AND vwA.Column12_Prop=vwB.Column12_Prop) OR vwA.Column12 IS NULL OR vwB.Column12 IS NULL)
AND ((vwA.Column13=vwB.Column13 AND vwA.Column13_Prop=vwB.Column13_Prop) OR vwA.Column13 IS NULL OR vwB.Column13 IS NULL)
AND ((vwA.Column14=vwB.Column14 AND vwA.Column14_Prop=vwB.Column14_Prop) OR vwA.Column14 IS NULL OR vwB.Column14 IS NULL)
AND ((vwA.Column15=vwB.Column15 AND vwA.Column15_Prop=vwB.Column15_Prop) OR vwA.Column15 IS NULL OR vwB.Column15 IS NULL)
AND ((vwA.Column16=vwB.Column16 AND vwA.Column16_Prop=vwB.Column16_Prop) OR vwA.Column16 IS NULL OR vwB.Column16 IS NULL)
ORDER BY
vwA.TableName, vwA.IndexName
GO

-- Drop function and view created above.
DROP FUNCTION dbo.INDEX_COL_PROPERTIES
GO
DROP VIEW IndexList_VW
GO
Executing the script in a test database with the table (TabA) mentioned above produces output similar to the following.






Indexes idx1 and idx2 are reported as overlapping, and in this case the subset index with fewer columns (idx2) can be disabled without any negative side effects (provided there are no index hints pointing to this index; this is explained in more detail below). This results in the same performance for SELECT statements but eliminates the overhead of the database engine having to maintain two indexes that serve the same general purpose. This is particularly important if the underlying columns (Col1 and Col2) are highly volatile.

A couple of additional points to keep in mind when using this script:
- Overlapping indexes are only those that have the columns in the same order. For example an index created on Col1, Col2, Col3 (in that order) does not overlap with an index created on Col2, Col1, Col3, even though the columns included are the same
- Indexes created on the same set of columns with the same column order, but with different sort order specifications (ASC, DESC) are distinct indexes and are not overlapping
- Indexes with included columns, introduced in SQL Server 2005, are correctly reported as overlapping by the above script. For example, an index created on Col1, Col2 with Col3 as an included column overlaps with the index created on Col1, Col2, Col3. For overlapping indexes that involve included columns it is imperative to disable or drop the index with the higher number of included columns (refer to: http://msdn2.microsoft.com/en-us/library/ms188783(SQL.90).aspx for more information about indexes with included columns)
- Clustered indexes and non-clustered indexes are clearly marked in the script output. If a clustered index is reported as overlapped and you plan to drop it, you may want to take some additional steps to make sure that the overlapping index, or some other index, is created as clustered, because it is a SQL Server best practice to have a clustered index on every table (refer to: http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/clusivsh.mspx for more information)
- It is recommended to either disable a duplicate index or save its definition before dropping it, in order to be able to recreate it if necessary (see the sketch after this list)
- Before disabling or dropping an index you should make sure that there are no index hints in your application that explicitly use the index being disabled, for example: SELECT * FROM TabA WITH (INDEX (idx2)) WHERE col1 = 28. Disabling or dropping an index that is explicitly used via an index hint will result in the query execution failing with an error message 308 or 315.
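As a sketch of that recommendation, using the example indexes above: disabling keeps the index definition in the metadata, so it can be brought back later with a rebuild if it turns out to be needed.

-- Disable the redundant subset index (its definition is preserved).
ALTER INDEX idx2 ON TabA DISABLE;
GO
-- If it is needed after all, re-enable it by rebuilding.
ALTER INDEX idx2 ON TabA REBUILD;
GO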


Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv 2 Comments
Tuesday, June 05, 2007 12:47 PM
New SQL Best Practice Articles now available

Please have a look at four new Best Practices Articles
SQL Server 2005 Predeployment I/O best practices http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/pdpliobp.mspx by SQLISVPM team member Mike Ruthruff
SQL Server 2005 Deployment Guidance for Web Hosting Environments http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/SQL2005DGWHE.mspx by Alex DeNeui
Resolving Common Connectivity Issues in SQL Server 2005 Analysis Services Connectivity Scenarios http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/CISQL2005ASCS.mspx by Carl Rabeler
Implementing Application Failover with Database Mirroring http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/implappfailover.mspx by Michael Thomassy, Sanjay Mishra
Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv 1 Comments
Monday, April 30, 2007 5:25 PM
Will 64-bit increase the performance of my SQL Server application?
With 64-bit servers becoming more mainstream and SQL Server customers running more deployments on native 64-bit Windows, there is a common misperception that running on 64-bit will always provide a performance increase over 32-bit. Many customers are surprised to find that some workloads actually run faster on 32-bit SQL Server than on the native 64-bit version. The assumption that 64-bit will always outperform 32-bit is not true; the outcome is determined largely by the characteristics of the particular workload and hardware. Workloads that have a large volume of small, uniform transactions using relatively simple query plans (OLTP-like in nature), and which are not experiencing memory pressure, may not benefit from running on native 64-bit SQL Server.
Differences between 32-bit and 64-bit
There are differences between 32-bit and 64-bit processes in general which apply to SQL Server. The working set of 64-bit SQL Server is larger than 32-bit, both in code and data, mainly due to the doubled size of pointers, 64-bit instructions, structures that are cacheline aligned, and more. This means 64-bit is more sensitive to the size of onboard chip caches, which may be used less efficiently compared to 32-bit, resulting in a higher Cost Per Instruction (CPI). The end result is that applications which are not memory constrained on 32-bit (especially OLTP) may incur a performance penalty when running on 64-bit on the same hardware. We have observed approximately 10-15% degradation in most cases; however, the impact will depend on the characteristics of the workload and hardware being used.
Which applications will benefit most from 64-bit?
Many SQL Server applications will realize significant performance benefits running on native 64-bit. This platform will also provide much better scalability in the long term, so it is important not to be short-sighted when evaluating minor performance differences between the two platforms. Native 64-bit SQL Server typically benefits memory-hungry applications the most, and also applications requiring memory for items other than data pages. Some of these include:
• Procedure Cache: Lots of stored procedures and dynamic SQL batches (OLTP)

• Workspace Memory: Large number of concurrent hash joins or group by’s; large-scale sorts and index builds (Data Warehouse)

• Connection memory: Large numbers of connections

• Thread memory: High Concurrency OLTP, Complex query plans on multi-CPU machines

• CLR GC Heap memory: CLR SP allocations

• Lock Memory: Large scale OLTP
The 32-bit version of SQL Server allows the use of memory beyond the 2 or 3 GB user-mode virtual address space limit of 32-bit processes through Address Windowing Extensions (AWE); however, there are restrictions on the use of AWE-mapped memory. Specifically, the items above cannot utilize AWE-mapped memory (more on this below).
What to consider when deciding between 32-bit and 64-bit
When considering one platform over the other I would recommend thinking about the following:
1. Processor architecture plays a significant role. The processor used will impact the overhead introduced by running native 64-bit SQL Server on 64-bit Windows; specifically, the amount of cache on the processor. Older generation processors with less L2/L3 cache will likely incur more of a penalty running native 64-bit SQL Server than newer processors with larger amounts of cache. It is important to consider this and, when possible, favor chipsets that offer more cache. The performance gain offered by processors with larger cache sizes varies and can only be determined through testing. An exception to this rule would be AMD processors, which have L2 cache but no L3 cache. The memory architecture of AMD processors is very different from Intel processors, and the lack of L3 cache may be offset by the different architecture of the data transfer between processor and memory. It is important to note that we are not endorsing one processor over the other; we have observed excellent SQL Server performance on both Intel and AMD processors.
2. Memory latency will have a significant impact. The speed of data transfer between processors and memory will have a significant impact on performance. When accessing main memory, latency matters: slower transfer speeds result in higher latencies and a bigger performance penalty for cache misses, while faster transfer speeds have a positive, and often significant, impact on the performance of 64-bit SQL Server. Reducing memory latency is a key design point and strength of Non-Uniform Memory Access (NUMA) server architectures.
3. Consider the specifics of your particular workload. If your workload is running on 32-bit and is not experiencing memory pressure or does not have any of the requirements mentioned above then the workload will likely not benefit from 64-bit.
4. Look for unexpected behaviors when testing. We have seen a number of cases where moving to 64-bit resulted in unexpected behavior as a result of removing the virtual address space limitations. On 32-bit systems it is common to run with AWE enabled to allow SQL Server to utilize memory beyond 2 or 3 GB; this memory, however, can only be used for data cache. On 64-bit this restriction is lifted, and non-data cache memory can grow much larger than was possible on 32-bit. This includes items such as procedure cache, lock memory, connection memory, workspace memory, etc. While beneficial in most cases, there are also cases where this can expose issues with the application's use of SQL Server. Some common scenarios include:
• Extensive growth of the procedure cache as a result of a high volume of non-parameterized SQL statements. Service Pack 2 for SQL Server 2005 has some optimizations to help address these issues, but it is considered a best practice to parameterize queries when possible, and the SP2 optimizations do not replace the need to follow this advice.
• Failure to put a limit on 'max server memory', resulting in system-wide memory pressure. It is important on 64-bit systems to place an upper limit on 'max server memory', leaving sufficient room for the operating system, and to ensure the SQL Server service account is granted the 'Lock pages in memory' security privilege (on 64-bit SQL Server the lock pages in memory privilege is only valid for Enterprise edition). The goal of setting an upper limit is to ensure that there is sufficient memory for the OS, other applications on the machine, and allocations within SQL Server that are satisfied from outside the buffer pool. SQL Server's buffer pool is dynamic and will adjust to external memory pressure; however, being proactive is always a good strategy. It is not uncommon on systems with large amounts of physical memory (>= 16 GB) to leave several GB for these items. Refer to http://support.microsoft.com/kb/918483 for more information.
5. Think long term. Although there are times when a workload will perform slightly slower on native 64-bit, the difference in most cases is not significant (usually 10% or less). It is important not to let a small impact to performance overshadow the long term benefits of 64-bit SQL Server. Native 64-bit will offer significantly better scalability than 32-bit and is the future of computing (especially for database applications). Even if a workload does not require or benefit from the additional virtual address space now, it may in the future.
6. Monitor and test. By monitoring the current performance of your application you may be able to get an idea of whether or not your specific workload will benefit from 64-bit SQL Server. Some potential indicators of memory pressure, which suggest that 64-bit will help, are:
• High rate of I/O coupled with low page life expectancy and/or constant lazy writer activity.
• Constant non-zero values for memory grants pending.
• High rate of temporary objects being created/destroyed.
• High value for stolen pages as a percentage of the total pages consumed by the SQL Server buffer pool.
Ultimately, however, the only way to definitively know whether or not your application will benefit from running native 64-bit is to test.
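As a starting point, a couple of the indicators above can be spot-checked from T-SQL (a sketch only; note that the counter object names carry an instance-specific prefix on named instances):

SELECT [object_name], counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Page life expectancy', 'Memory Grants Pending');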

Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv 3 Comments
Friday, March 23, 2007 8:36 AM
Upgrading to SQL Server 2005 and DEFAULT_SCHEMA setting.
Upgrading to SQL Server 2005 has many benefits. As we look back at the past year and the deployment of SQL Server 2005 with our ISV partners, one important but under-advertised feature in SQL Server 2005 became increasingly visible: the ability to control the dynamic name resolution behavior. This capability applies to workloads where object names are NOT fully qualified with a two- or three-part name (for example: "database.schema.object"). If you already fully qualify your object references, which is a SQL Server best practice, then this blog does not apply to you.

In this post, I will first explain the behavior in SQL Server 2000 and expand on what changed in SQL Server 2005. Then I’ll point out how to identify the issue in your workloads and how to change the DEFAULT_SCHEMA setting to take advantage of the benefits.

Dynamic Name Resolution - SQL Server 2000 vs. 2005
A query such as "select * from table1" in SQL Server 2000 goes through a set of steps to resolve and validate the object references before execution. The search first looks at the identity of the connection executing the query. Let's assume we have a connection through user "user1". For "select * from table1", SQL Server 2000 searches for the object "user1.table1" first. If the object does not exist, the next step is to search for "dbo.table1". These searches complete very quickly but can still be visible to the naked eye in cases where you have a workload with many ad-hoc queries (with today's hardware this typically means over 100 queries per second) that contain not-fully-qualified object references. There is also a chance that, with the new flexibility introduced in SQL Server 2005 around the separation of users and schemas (http://msdn2.microsoft.com/en-us/library/ms190387.aspx), you may see higher name resolution overhead compared to SQL Server 2000. In extreme cases, we have seen the impact of name resolution be 10-15% on throughput.

However, SQL Server 2005 provides a mechanism that gives administrators finer control over name resolution. By setting the default schema of a database principal (a.k.a. user), reflected in the default_schema_name column of the sys.database_principals catalog view, you can change the way name resolution is performed. You can assign a default schema for each database principal so that resolution goes directly to the correct schema when an object name is not fully qualified, bypassing the unnecessary steps.

Reusing the example from above: with DEFAULT_SCHEMA set to 'dbo' in a SQL Server 2005 database, user1 executing 'select * from table1' will resolve immediately to 'dbo.table1' without searching for 'user1.table1'.

In fact, if you use the new CREATE USER syntax in SQL Server 2005, you get 'dbo' as the DEFAULT_SCHEMA by default. So if you are developing in a new database under SQL Server 2005, you are already taking advantage of this name resolution shortcut. However, the behavior for upgraded databases is different. During an upgrade from a previous version of SQL Server, to preserve the same behavior, SQL Server 2005 sets the DEFAULT_SCHEMA to the name of the user. For example, 'user1' gets a DEFAULT_SCHEMA of 'user1' when you upgrade a database from an earlier version to SQL Server 2005, whether by attaching or any other method. The reality is that, in most instances, applications don't use anything but the 'dbo' schema. Nevertheless, it is close to impossible to detect whether your database and queries reference only objects in a specific schema.

Why? Well, it may be that some of your objects and queries are generated dynamically and only parsed right before execution. Obviously, preserving the backward-compatible behavior also means that you still pay the added cost of the late-bound name resolution.

This was exactly the reason why a few of our partners saw higher CPU cycles after the upgrade and experienced slower throughput on SQL Server 2005. In those cases, the DEFAULT_SCHEMA setting got us back the throughput we were expecting from SQL Server 2005. The improvements have been as much as 15% in throughput.

Obviously the best practice still stands: you should fully qualify all object names and not worry about the name resolution cost at all. The reality is, there are still many imperfect applications out there, and this setting helps greatly in those cases.

How do you know if this is an issue for your workload?

Let's take a look at ways we can tell whether this setting could benefit your workload.

The brute force method is to look at your application code, or at a SQL Server 2005 Profiler trace, for the queries that you are executing. If you do not have fully qualified names, and you are executing them through a database principal (user) whose DEFAULT_SCHEMA value is not the intended schema name for the not-fully-qualified objects in your queries, then you have the problem.

Reiterating the example from above: user1, with DEFAULT_SCHEMA='user1', executing 'select * from table1' and resolving to 'dbo.table1' could benefit from setting DEFAULT_SCHEMA to 'dbo'.

The other option is to use sys.dm_exec_cached_plans or sys.dm_exec_query_stats to look at the cached plans. Again, you can go after the text and look for name qualification. Alternatively, you can use the following query to see cached plans that may indicate an issue with not-fully-qualified object names:
SELECT * FROM
(
SELECT e.[text], e.[user_id], e.[dbid]
FROM (
SELECT b.text,c.attribute,c.value
FROM sys.dm_exec_cached_plans AS a
OUTER APPLY sys.dm_exec_sql_text(a.plan_handle) AS b
OUTER APPLY sys.dm_exec_plan_attributes(a.plan_handle) AS c
) AS d
PIVOT (MAX(d.value) FOR d.attribute IN ("user_id", "dbid")) AS e
) AS f
WHERE [dbid] not in (32767) -- leave out the resource database
AND [user_id] not in (-2,1,4) -- leave out the generic plans, plans that belong to dbo and sys

If the resultset is non-empty, your application may be a good candidate for the DEFAULT_SCHEMA change. This is a good shortcut, but it is not a 100% guarantee. Be aware that sys.dm_exec_cached_plans contains only the cached plans. The cache fluctuates under memory pressure, so some queries may get pushed out of the cache due to aging and pressure or, in some cases, may not get cached at all due to cost or security restrictions.

One additional thing to watch for: if you do not see a range of user_ids in the resultset, the benefits from setting the DEFAULT_SCHEMA may not be significant, due to other optimizations and shortcuts in the engine around plan caching.

How do you change the DEFAULT_SCHEMA?
Assuming your database uses objects only in the dbo schema, here is what you would run for each user (shown here for user1 from the example above):
ALTER USER user1 WITH DEFAULT_SCHEMA = dbo

You can validate the current DEFAULT_SCHEMA settings by looking at the default_schema_name column of the sys.database_principals catalog view.
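For example, a quick check might look like this (a sketch; narrow the filter as appropriate for your environment):

SELECT name, type_desc, default_schema_name
FROM sys.database_principals
WHERE type IN ('S', 'U');   -- SQL users and Windows users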
Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv 2 Comments
Thursday, February 22, 2007 11:47 AM
Why can't I GRANT EXECUTE permission on assemblies anymore?
Just a quick note that this is by design, and no, you don't need it.
What are we talking about?
Books Online talks about granting assembly permissions. In SQL Server 2005 you used to be able to run
GRANT EXECUTE ON ASSEMBLY::[assembly_name] TO [principal]
Your database holds your assemblies, and as such you could grant or revoke a limited set of permissions on them. We initially allowed you to grant EXECUTE permission on an assembly, but now when you execute the same statement on a server with the SP2 service pack, you get the following:
Msg 102, Level 15, State 1, Line 0
Incorrect syntax near 'EXECUTE...'.
Why did this change?
Execute permission gave a false sense of security and was really inappropriate. Here's why:
When we shipped SQL Server 2005, this permission was supported in TSQL but never enforced. So you could say "deny execute on assembly" but nothing happened. With this fix removing the support from TSQL, you will not be misled anymore, and we wanted to get this fix to you in a service pack of the same product that introduced this immensely easy way of expressing your logic in assemblies. Going forward, with database upgrades to the next release, any lingering assemblies that display this permission will be automatically corrected, so you don't have to worry about a thing.
So how can I secure assemblies registered on the server?
The assembly is really a container for types, objects, etc., on which permissions can be granted independently. The assembly itself can be marked SAFE, EXTERNAL_ACCESS, or UNSAFE. In addition, you can use the 'trustworthy' property of the database for a similar effect. For a complete list and description of our security model, please refer to CLR Integration Security in Books Online.
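As a quick sketch of those options (the assembly, path, and database names are hypothetical):

-- Register an assembly with an explicit permission set.
CREATE ASSEMBLY MyUtilities
FROM 'C:\assemblies\MyUtilities.dll'
WITH PERMISSION_SET = SAFE;   -- or EXTERNAL_ACCESS / UNSAFE
GO
-- Alternatively, the database TRUSTWORTHY property can be used for a similar effect.
ALTER DATABASE MyDatabase SET TRUSTWORTHY ON;
GO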
Thanks,
Mayank Nagar