Friday, October 24, 2008

Using Views to Enforce Business Rules

A view is most commonly thought of as a SELECT statement. Most developers simply create a view to "group" complex SELECT statements for reuse within other views or stored procedures. It makes typing easier! But the real power of views is their ability to implement business rules.

To fully understand how this is possible, we must go back to basics. A view in the relational model has as much standing as a relation (table). This means it can be treated in EXACTLY the same way. You can insert, update and delete data from a view, add or remove columns and most importantly add constraints to the view.

The relational model has four types of constraints that can be used to implement business models and rules.

1. Domain/Type (Attribute/column)
2. Tuple (Row)
3. Relation (Table)
4. Database (Multiple tables)

SQL Server handles the first three fairly well, but it supports only one type of database constraint: the foreign key.

While extremely useful, the foreign key is the simplest form of database constraint. Practically, most business models will need much more complexity than a simple Parent-Child relationship. This is where views can be used.

Let's use an example where a business has clients that generate invoices. Each invoice belongs to a particular client. The business categorizes their clients based on spending limits. They want to restrict the total of each invoice to ensure that certain clients do not exceed a limit.

This gives us 3 tables:

* Client
* SpendingType
* ClientInvoice

create table SpendingType(
    SpendingType varchar(25) not null primary key,
    InvoiceLimit money not null)
go
create table Client(
    ClientID int not null primary key,
    ClientName varchar(50),
    SpendingType varchar(25) not null references SpendingType (SpendingType))
go
create table ClientInvoice(
    InvoiceID int not null,
    ClientID int not null,
    TotalInvoice money not null,
    primary key (InvoiceID, ClientID),
    foreign key (ClientID) references Client (ClientID))
go

Together these tables form a simple entity-relationship model: Client references SpendingType, and ClientInvoice references Client.

Looking at this model, we can see there is nothing to enforce our "Spending Limit" rule.

Enter the view...

create view ClientInvoice_SpendingConstraint
as
select InvoiceID,
ClientID,
TotalInvoice
from dbo.ClientInvoice CI
where exists
(Select 1
from dbo.Client C
inner join dbo.SpendingType ST on C.SpendingType = ST.SpendingType
where C.ClientID = CI.ClientID
and TotalInvoice <= ST.InvoiceLimit)
with check option

Notice the "with check option". This tells SQL Server to enforce the constraints defined by the view. There are several limitations to creating update-able views which practically can be summarised into 2 golden rules.

1. Express the entire table. Declare all columns in the underlying table in the view definition.
2. Don't touch yourself. Never reference the primary table (in our example the ClientInvoice table) in the constraint (WHERE).

When this view is presented to the user, any INSERT or UPDATE through this view must satisfy our rule. Failure to do so results in an exception being thrown by SQL Server. Because the constraint is expressed as a set, the view handles multi-row inserts and updates effortlessly, preserving statement-level atomicity. Watch the execution plan for the successful insert/update to see how efficiently SQL Server processes the rule.

We can test the view using these statements:

insert SpendingType values ('Standard', 1000)
insert SpendingType values ('Premium', 5000)
insert Client values (1, 'David', 'Standard')
insert Client values (2, 'Peter', 'Premium')
go

--David is under 1000
insert ClientInvoice_SpendingConstraint values (1,1,600)
/* Result: (1 row(s) affected) */

--David is over 1000 (bad)
insert ClientInvoice_SpendingConstraint values (2,1,1600)

/* Result: The attempted insert or update failed because the target view either
specifies WITH CHECK OPTION or spans a view that specifies WITH CHECK OPTION and
one or more rows resulting from the operation did not qualify under the CHECK OPTION constraint. */


--David is 1000 (good)
insert ClientInvoice_SpendingConstraint values (2,1,1000)
/* Result: (1 row(s) affected) */
Go

-- Update that violates the rule.
update ClientInvoice_SpendingConstraint set TotalInvoice = 1001 where InvoiceID = 2

/* Result: The attempted insert or update failed because the target view either
specifies WITH CHECK OPTION or spans a view that specifies WITH CHECK OPTION and
one or more rows resulting from the operation did not qualify under the CHECK OPTION constraint. */

go
--Peter is under 5000
insert ClientInvoice_SpendingConstraint values (3,2,2600)
/* Result: (1 row(s) affected) */
go
Select * from ClientInvoice
Go
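
To see the set-based behaviour mentioned earlier, we can try a multi-row insert in which one row breaks the rule; the whole statement fails and neither row is stored. (The invoice IDs below are arbitrary values chosen for the demo.)

insert ClientInvoice_SpendingConstraint
select 4, 2, 4000 --within Peter's 5000 limit
union all
select 5, 2, 9000 --exceeds Peter's 5000 limit
/* Result: the CHECK OPTION error is raised and no rows are inserted. */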

Most developers would choose a stored procedure or a trigger to implement this rule. But consider the advantages using the view gives:

1. Set based and Optimised. The view is compiled and BCP and BULK INSERT friendly.
2. Abstraction. The view provides the possibility to change business rules very quickly with minimal physical impact.
3. Tool friendly. Extracting view metadata is a very common feature.

The only downside I see is the error that SQL Server throws. The error message is ugly without any detailed information.
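
If a friendlier message is needed, one option (a sketch, assuming SQL Server 2005 or later; error 550 is the CHECK OPTION violation) is to wrap the DML in TRY/CATCH and re-raise your own error:

begin try
    --the values below are arbitrary demo values
    insert ClientInvoice_SpendingConstraint values (9, 1, 99999)
end try
begin catch
    declare @msg nvarchar(2048)
    if error_number() = 550 --CHECK OPTION violation
        set @msg = 'Invoice total exceeds the client''s spending limit.'
    else
        set @msg = error_message()
    raiserror(@msg, 16, 1)
end catch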

Using BULK INSERT to Load a Text File

This example combines dynamic SQL, BULK INSERT and the proper handling of double-quotes to solve a client's problem with loading various text file formats into a database. (This article has been updated through SQL Server 2005.)

One of my clients contacted me recently and said they needed some help creating a stored procedure that imported data from a text file. They wanted the procedure to accept three parameters: PathFileName, OrderID, and FileType. The PathFileName is simply the name and physical location of the source file on the hard drive, the OrderID is generated in the program that calls the procedure and the FileType indicates the format of the data in the source file. The two possible formats for the source data are shown here:

FileType=1 (TxtFile1.txt)

"Kelly","Reynold","kelly@reynold.com"
"John","Smith","bill@smith.com"
"Sara","Parker","sara@parker.com"

FileType=2 (TxtFile2.txt)

Kelly,Reynold,kelly@reynold.com
John,Smith,bill@smith.com
Sara,Parker,sara@parker.com

BULK INSERT

I decided to use BULK INSERT to implement the solution. The BULK INSERT statement was introduced in SQL Server 7 and allows you to interact with bcp (bulk copy program) via a script. In pre-7 versions the only way you could access bcp functionality was from a command prompt. I am not going to list the full syntax of BULK INSERT here (you can find it in SQL Server Books Online), because it is a little long and most of it does not apply to the problem I am solving. Instead, I will show the valid BULK INSERT statements used to load the data shown above.

BULK INSERT TmpStList FROM 'c:\TxtFile1.txt' WITH (FIELDTERMINATOR = '","')

TmpStList is the target table and TxtFile1.txt is the source data file. The source file is located in the root of the C drive. The FIELDTERMINATOR argument allows you to specify the delimiter used to discern column values.

The valid statement for FileType=2 is shown here:

BULK INSERT tmpStList FROM 'c:\TxtFile2.txt' WITH (FIELDTERMINATOR = ',')

The only difference is the value of the FIELDTERMINATOR argument.
The Solution

The stored procedure used to implement the solution is fairly straightforward once you master the BULK INSERT statement. The only real trick is loading the data that comes to you in FileType=1 format. Because a double-quote starts and ends a data row, it too is loaded into the table. The FIELDTERMINATOR works between columns, not at the beginning or end of a row. To work around this I simply load the data into a temporary table and then use a CASE expression and the SUBSTRING and DATALENGTH functions to load the correct data into the final table. The FileType=2 data will load as-is, but I still put it in the temporary table for consistency (easier programming).

The SQL statements that create the temporary and final table are shown here.

CREATE TABLE StudentList
(
StID int IDENTITY NOT NULL,
StFName varchar(50) NOT NULL,
StLName varchar(50) NOT NULL,
StEmail varchar(100) NOT NULL,
OrderID int NOT NULL
)
go
CREATE TABLE TmpStList
(
stFName varchar (50) NOT NULL,
stLName varchar (50) NOT NULL,
stEmail varchar (100) NOT NULL
)
go

The procedure used to implement the data loading is shown here.

SET QUOTED_IDENTIFIER OFF
go
CREATE PROCEDURE ps_StudentList_Import
@PathFileName varchar(100),
@OrderID integer,
@FileType tinyint
AS

--Step 1: Build Valid BULK INSERT Statement
DECLARE @SQL varchar(2000)
IF @FileType = 1
BEGIN
-- Valid format: "John","Smith","john@smith.com"
SET @SQL = "BULK INSERT TmpStList FROM '"+@PathFileName+"' WITH (FIELDTERMINATOR = '"",""') "
END
ELSE
BEGIN
-- Valid format: John,Smith,john@smith.com
SET @SQL = "BULK INSERT TmpStList FROM '"+@PathFileName+"' WITH (FIELDTERMINATOR = ',') "
END

--Step 2: Execute BULK INSERT statement
EXEC (@SQL)

--Step 3: INSERT data into final table
INSERT StudentList (StFName,StLName,StEmail,OrderID)
SELECT CASE WHEN @FileType = 1
            THEN SUBSTRING(StFName,2,DATALENGTH(StFName)-1) --strip leading double-quote
            ELSE StFName
       END,
       StLName, --the middle column never picks up a stray quote
       CASE WHEN @FileType = 1
            THEN SUBSTRING(StEmail,1,DATALENGTH(StEmail)-1) --strip trailing double-quote
            ELSE StEmail
       END,
@OrderID
FROM tmpStList

--Step 4: Empty temporary table
TRUNCATE TABLE TmpStList
go

The first thing you need to know is that SET QUOTED_IDENTIFIER is set to OFF because double-quotes are used to set the value of a variable. Dynamic SQL is used to create the BULK INSERT statement on-the-fly, and double-quotes are required to do this. The final BULK INSERT statement is a function of both the @PathFileName and @FileType parameters. Once built, it is executed with the EXEC() statement and the source data is loaded into the temporary table.

Once the data is in TmpStList, the next step is to load it into the final table. I use the CASE statement to determine the value in the @FileType parameter and manipulate accordingly. When @FileType=1, the SUBSTRING and DATALENGTH functions are used to remove the double-quotes from the StFName and StEmail columns. When FileType=2, the data is loaded as is and no manipulation is required.

After the data is loaded I empty the temporary table with the TRUNCATE TABLE statement. I could have used DELETE to accomplish this, but TRUNCATE TABLE has less of an impact on the transaction log.

The following shows the way to call the procedure specifying a different FileType value for each call.

EXEC ps_StudentList_Import 'c:\TxtFile1.txt',1, 1
EXEC ps_StudentList_Import 'c:\TxtFile2.txt',1, 2

Thursday, October 23, 2008

Tips for tuning SQL Server 2005 to improve reporting performance

There are a few things you can do to configure SQL Server for improved reporting performance. The first two are generic, and the rest pertain to SQL Server Reporting Services:

1. Create plenty of indexes to support your queries. In OLTP systems, we need to be more careful, since updating rows also requires updating all indexes. Therefore, we usually try to keep a balance with the number of indexes we put in place. But since these are reporting tables, it is highly desirable to create indexes to support your report queries; otherwise, the performance might be far from desirable.

2. Create a few "decision support" tables and populate them with data aggregated from the big tables. I often see reports processing millions of rows and then showing just a few rows with a high-level summary of data grouped by specific criteria. In many cases, you can just run that query once (or maybe daily or weekly) and store the results in small aggregate tables. In your reports, you can then use these small tables, and the execution time is much faster. If your reports use aggregations, give this technique serious consideration (see the sketch after this list).

3. Reporting Services supports report caching. How does it work? When you configure a report to use caching (go to Report Properties and click on Execution), Reporting Services only executes the query once. After that, it caches the rendered report for the length of time you specified for caching. On any subsequent report requests, the report is served from the cached copy. Reporting Services creates a cached copy for each unique combination of parameters, so typically, the first person to run that combination of parameters has to wait longer. Any future requests are then returned much faster, and without going to the data source. This can be a huge time saver for reports with queries that take several minutes to execute.

4. Similarly, you can configure a report to be cached and rendered from a snapshot. You can think of it as a pre-processed report. Snapshots can be scheduled to be created off hours, and then the report would be served from the snapshot. This is suitable for reports with no parameters, long execution time, and when your report does not need real-time data.
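
As a minimal sketch of tip 2 (the table, column and source names here are invented for illustration; adapt them to your schema), the expensive grouping query is run once by a scheduled job and its result is stored in a small summary table that the reports then read:

-- Hypothetical monthly sales summary, refreshed nightly by a SQL Agent job.
CREATE TABLE dbo.SalesSummaryMonthly
(
    SalesYear   int   NOT NULL,
    SalesMonth  int   NOT NULL,
    TotalAmount money NOT NULL,
    OrderCount  int   NOT NULL,
    PRIMARY KEY (SalesYear, SalesMonth)
)
GO
-- Refresh step: aggregate the big table once; reports query the small table.
TRUNCATE TABLE dbo.SalesSummaryMonthly
INSERT dbo.SalesSummaryMonthly (SalesYear, SalesMonth, TotalAmount, OrderCount)
SELECT YEAR(OrderDate), MONTH(OrderDate), SUM(TotalDue), COUNT(*)
FROM dbo.SalesOrderHeader
GROUP BY YEAR(OrderDate), MONTH(OrderDate)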

Super Sizing Columns in SQL Server

In SQL Server 2005, columns can also be super-sized thanks to the introduction of the MAX specifier. In previous versions of SQL Server, if an application needed to store string data that would exceed 8000 bytes, the only option available was the TEXT or NTEXT data type. With either of these data types, common operators could not be used, so tasks such as searching and updating the data were complex. With the introduction of the MAX specifier it is now possible to work with large objects in SQL Server in ways that were previously not possible.

With the introduction of the MAX specifier there is no longer a need to perform the complex manipulation of large objects that requires the use of the TEXTPTR operator to determine a pointer to the value before using a set of specialized commands. The following examples show the complexity involved in working with the TEXT and NTEXT data types as a result of having to use a different set of operators. The example below illustrates how to find the first 10 characters of the pr_info column in the pub_info table:

DECLARE @ptrval varbinary(16);

SELECT @ptrval = TEXTPTR(pr_info)
FROM pubs.dbo.pub_info
WHERE pub_id = '0736'

READTEXT pub_info.pr_info @ptrval 0 10;

The following example illustrates how to update the first four characters of the pr_info column, which is a TEXT data type:

DECLARE @ptrval binary(16)

SELECT @ptrval = TEXTPTR(pr_info)
FROM pubs.dbo.pub_info
WHERE pub_id = '0736'

UPDATETEXT pub_info.pr_info @ptrval 0 4 'This'

The introduction of the MAX specifier in SQL Server 2005 allows variable-length columns that were previously limited to 8000 bytes to store large amounts of data. It also means that the NTEXT, TEXT and IMAGE data types are candidates for removal in future versions of SQL Server. Hence, it is recommended that the MAX specifier be used for the storage of large amounts of data.

The MAX specifier increases the maximum storage capacity of the VARCHAR and VARBINARY data types to 2^31-1 bytes, and of NVARCHAR to 2^31-1 bytes (roughly 2^30 characters). VARCHAR(MAX), NVARCHAR(MAX) and VARBINARY(MAX) are collectively called large-value data types. A column defined with the MAX specifier is, in effect, a very large variable-length column. Although the maximum size of the MAX specifier is approximately 2 GB, that is simply the maximum size SQL Server currently supports: if a future version of the product raises the limit, columns defined with the MAX specifier will automatically be able to take advantage of the increase without any modifications. To use the MAX specifier, the word MAX is used in place of a size when you define a column or variable. In the following example, the LargeColumn column in the LargeDataType table is created using the VARCHAR data type with the MAX specifier as the size.

CREATE TABLE dbo.LargeDataType
(
LargeColumn VARCHAR(MAX)
)

There are two ways in which SQL Server may store columns defined with the MAX specifier: either in-page with the other columns in the row, or off-page. When the MAX specifier is used for a column, SQL Server uses its own algorithm to decide whether to keep the value in line within the physical record or to store the value externally to the base record and track it with a pointer. SQL Server stores the data as either a VARCHAR, NVARCHAR or VARBINARY value or as a large object (LOB). If the length of the column is less than or equal to 8000 bytes, SQL Server stores the data in-page; where it is greater than 8000 bytes, SQL Server checks the row size to determine the appropriate storage. If the row size is less than the size of a page (8060 bytes), the data values are stored as in-row data, whereas if the row size is greater than 8060 bytes, the data values are stored as LOB data with only a 16-byte pointer stored in the row. It is possible to override this default behaviour by using the table option called "large value types out of row" so that columns defined with the MAX specifier are always stored as a LOB. The following example illustrates how this option can be enabled by using the system stored procedure sp_tableoption.

EXEC sp_tableoption
'dbo.LargeDataType', 'large value types out of row', 1

When the option is set to 1 (enabled) the data in columns that have been defined with the MAX specifier will always be stored out of row as a LOB with only a 16-byte text pointer stored in the row. Text pointers point to the root node of a tree built of internal pointers that map to the pages in which string fragments are actually stored.

The advantage of storing large-value data types in-row with the other columns in a table is that SQL Server can return a single row with only one I/O operation. If the bulk of SQL Server statements do not return large-value data type columns then the data should be stored out of row. This allows for a greater number of rows to be stored on a data page allowing a greater number of rows to be returned for each I/O operation.

Unlike the TEXT and NTEXT data types, which always store their data off-page, it does not matter where the data is stored for columns defined with the MAX specifier: no matter where the data lives, the column can be treated as a standard variable-length data type. Hence, different operators do not need to be used and all of the standard operators can be used. This means that SQL Server now has a unified programming model for working with regular types and large objects. The restrictions that previously existed for the use of TEXT and NTEXT as variables in stored procedures and functions also no longer exist with large-value data types.

The following example illustrates just a few of the ways that standard operators can now be used with large-value data types.

String concatenation can now be used with large data types. The thing to note in this example is the use of the REPLICATE function. REPLICATE returns a character expression of the same type as the supplied character expression, so if the supplied expression is not CAST to a large data type, the maximum length of the expression returned would be 8000 characters.

INSERT INTO dbo.LargeDataType(LargeColumn)
SELECT 'There is lots of data in this row ' +
REPLICATE(CAST('x' AS VARCHAR(MAX)), 100000)

Updates can be made directly to large data types without the need to use the UPDATETEXT operator:

UPDATE dbo.LargeDataType
SET LargeColumn = REPLACE(LargeColumn, 'lots', 'lots and lots')

Standard string operators such as SUBSTRING can now be used with large data types

SELECT SUBSTRING(LargeColumn, 10, 4)
FROM dbo.LargeDataType

The support for large data types is a valuable new addition to SQL Server. So if you are designing a new application that needs to store values greater than 8000 bytes, you should use the new large-value data types. Not only will they give you a unified programming model, they will also ensure that your application can take advantage of additional storage in future versions of SQL Server, allowing you to really super-size your columns.

Wednesday, October 22, 2008

Performance Tuning Tips for SQL Server Backup and Restore

If you suspect that your backup or restore operations to disk are running at sub-optimal speeds, you can help verify this by using one or more of the following Performance Monitor counters to measure I/O activity during a backup or restore:

* SQL Server Backup Device Object: Device Throughput Bytes/sec: This counter measures how much data is being backed up or restored. While there is no absolute value this counter should show, it should give you an idea of how fast your backups or restores are occurring. If this value appears to be small in relation to how fast you think your I/O system is, then perhaps there is some bottleneck preventing your backups or restores from occurring faster.

* Physical Disk: % Disk Time: As a rule of thumb, the % Disk Time counter should run less than 55%. If this counter exceeds 90% for continuous periods when performing backups or restores (over 10 minutes or so) then your SQL Server may be experiencing an I/O bottleneck. If you suspect a physical disk bottleneck, you may also want to monitor the % Disk Read Time counter and the % Disk Write Time counter in order to help determine if the I/O bottleneck is being mostly caused by reads or writes.

* Physical Disk Object: Avg. Disk Queue Length: If the Avg. Disk Queue Length exceeds 2 for continuous periods when performing backups or restores (over 10 minutes or so) for each disk drive in an array, then you probably have an I/O bottleneck for that array. You will need to calculate this figure because Performance Monitor does not know how many physical drives are in arrays.

If you find that you do have an I/O bottleneck during backups or restores, your options to correct this include increasing the speed of your disk I/O system, reducing the load on your current system by performing backups or restores at less busy times, or backing up to a local tape device or over the network (assuming you are not doing that now). [6.5, 7.0, 2000, 2005]

Tuesday, October 21, 2008

Basic objects of T-SQL in SQL Server 2008

Learn the elementary objects and basic operators supported by the Transact-SQL language in SQL Server 2008 in this excerpt from "Microsoft SQL Server 2008: A Beginner's Guide." You'll find out about the basic objects and operators of T-SQL, including constants, identifiers and delimiters, and you'll also discover the corresponding data types that relate to each object and operator. Author Dusan Petkovic also describes how NULL values, scalar objects and global variables operate in SQL Server 2008.
SQL's Basic Objects

The language of Database Engine, Transact-SQL, has the same basic features as other common programming languages:

* Literal values (also called constants)
* Delimiters
* Comments
* Identifiers
* Reserved keywords

The following sections describe these features.
Literal Values

A literal value is an alphanumerical, hexadecimal, or numeric constant. A string constant contains one or more characters of the character set enclosed in two single straight quotation marks (' ') or double straight quotation marks (" ") (single quotation marks are preferred due to the multiple uses of double quotation marks, as discussed in a moment). If you want to include a single quotation mark within a string delimited by single quotation marks, use two consecutive single quotation marks within the string. Hexadecimal constants are used to represent nonprintable characters and other binary data. Each hexadecimal constant begins with the characters '0x' followed by an even number of characters or numbers. Examples 4.1 and 4.2 illustrate some valid and invalid string constants and hexadecimal constants.

Example 4.1
Some valid string constants and hexadecimal constants follow:

'Philadelphia'
"Berkeley, CA 94710"
'9876'
'Apostrophe is displayed like this: can''t' (note the two consecutive single quotation marks)
0x53514C0D

Example 4.2
The following are not string constants:

'AB'C' (odd number of single quotation marks)
'New York" (same type of quotation mark—single or double—must be used at each end of the string)

The numeric constants include all integer, fixed-point, and floating-point values with and without signs (see Example 4.3).

Example 4.3
The following are numeric constants:
130
–130.00
–0.357E5 (scientific notation—nEm means n multiplied by 10^m)
22.3E-3

A constant always has a data type and a length, and both depend on the format of the constant. Additionally, every numeric constant has a precision and a scale factor. (The data types of the different kinds of literal values are explained later in this chapter.)
Delimiters

In Transact-SQL, double quotation marks have two meanings. In addition to enclosing strings, double quotation marks can also be used as delimiters for so-called delimited identifiers. Delimited identifiers are a special kind of identifier usually used to allow the use of reserved keywords as identifiers and also to allow spaces in the names of database objects.

Note: Differentiation between single and double quotation marks was first introduced in the SQL92 standard. In the case of identifiers, this standard differentiates between regular and delimited identifiers. Two key differences are that delimited identifiers are enclosed in double quotation marks and are case sensitive. (Transact-SQL also supports the use of square brackets instead of double quotation marks.) Single quotation marks are only used for delimiting strings. Generally, delimited identifiers were introduced to allow the specification of identifiers that are otherwise identical to reserved keywords. Specifically, delimited identifiers protect you from using names (identifiers, variable names) that could be introduced as reserved keywords in a future SQL standard. Also, delimited identifiers may contain characters that are normally illegal within identifier names, such as blanks.

In Transact-SQL, the use of double quotation marks is defined using the QUOTED_IDENTIFIER option of the SET statement. If this option is set to ON, which is the default value, an identifier in double quotation marks will be defined as a delimited identifier. In this case, double quotation marks cannot be used for delimiting strings.
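
A short illustration of both behaviours (the table name is made up for the example):

SET QUOTED_IDENTIFIER ON
GO
-- "Order Details" is a delimited identifier: it contains a space, so it could not
-- be used as a regular identifier. [Order Details] would work the same way.
CREATE TABLE "Order Details" (ID int NOT NULL)
GO
SET QUOTED_IDENTIFIER OFF
GO
-- With the option OFF, double quotation marks delimit strings instead.
SELECT "This is now just a string literal"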
Comments

There are two different ways to specify a comment in a Transact-SQL statement. Using the pair of characters /* and */ marks the enclosed text as a comment. In this case, the comment may extend over several lines. Furthermore, the characters -- (two hyphens) indicate that the remainder of the current line is a comment. (The two hyphens -- comply with the ANSI SQL standard, while /* and */ are the extensions of Transact-SQL.)
Identifiers

In Transact-SQL, identifiers are used to identify database objects such as databases, tables, and indices. They are represented by character strings that may include up to 128 characters and can contain letters, numerals, or the following characters: _, @, #, and $. Each name must begin with a letter or one of the following characters: _, @, or #. The character # at the beginning of a table or stored procedure name denotes a temporary object, while @ at the beginning of a name denotes a variable. As indicated earlier, these rules don't apply to delimited identifiers (also known as quoted identifiers), which can contain, or begin with, any character (other than the delimiters themselves).
Reserved Keywords

Each programming language has a set of names with reserved meanings, which must be written and used in the defined format. Names of this kind are called reserved keywords. Transact-SQL uses a variety of such names, which, as in many other programming languages, cannot be used as object names, unless the objects are specified as delimited or quoted identifiers.

Note: In Transact-SQL, the names of all data types and system functions, such as CHARACTER and INTEGER, are not reserved keywords. They can therefore be used for denoting objects. (Do not use data types and system functions as object names! Such a use makes Transact-SQL statements difficult to read and understand.)

Monday, October 20, 2008

Performance tuning tips for database developers

Performance tuning is not easy, but you can go a surprisingly long way with a few basic guidelines.

In theory, performance tuning is done by a DBA. But in practice, the DBA is not going to have time to scrutinize every change made to a stored procedure. Learning to do basic tuning might save you from reworking code late in the game.

Below is my list of the top 15 things I believe developers should do as a matter of course to tune performance when coding. These are the low-hanging fruit of SQL Server performance: they are easy to do and often have a substantial impact. Doing them won’t guarantee lightning-fast performance, but it won’t be slow either.

1. Create a primary key on each table you create and unless you are really knowledgeable enough to figure out a better plan, make it the clustered index (note that if you set the primary key in Enterprise Manager it will cluster it by default).
2. Create an index on any column that is a foreign key. If you know it will be unique, set the flag to force the index to be unique.
3. Don’t index anything else (yet).
4. Unless you need a different behaviour, always owner qualify your objects when you reference them in TSQL. Use dbo.sysdatabases instead of just sysdatabases.
5. Use set nocount on at the top of each stored procedure (and set nocount off) at the bottom.
6. Think hard about locking. If you’re not writing banking software, would it matter that you take a chance on a dirty read? You can use the NOLOCK hint, but it’s often easier to use SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED at the top of the procedure, then reset to READ COMMITTED at the bottom.
7. I know you’ve heard it a million times, but only return the columns and the rows you need.
8. Use transactions when appropriate, but allow zero user interaction while the transaction is in progress. I try to do all my transactions inside a stored procedure.
9. Avoid temp tables as much as you can, but if you need a temp table, create it explicitly using Create Table #temp.
10. Avoid NOT IN, instead use a left outer join - even though it’s often easier to visualize the NOT IN.
11. If you insist on using dynamic SQL (executing a concatenated string), use named parameters and sp_executesql (rather than EXEC) so you have a chance of reusing the query plan; see the sketch after this list. While it’s simplistic to say that stored procedures are always the right answer, it’s also close enough that you won’t go wrong using them.
12. Get in the habit of profiling your code before and after each change. While you should keep in mind the depth of the change, if you see more than a 10-15% increase in CPU, Reads, or Writes it probably needs to be reviewed.
13. Look for every possible way to reduce the number of round trips to the server. Returning multiple resultsets is one way to do this.
14. Avoid index and join hints.
15. When you’re done coding, set Profiler to monitor statements from your machine only, then run through the application from start to finish once. Take a look at the number of reads and writes, and the number of calls to the server. See anything that looks unusual? It’s not uncommon to see calls to procedures that are no longer used, or to see duplicate calls. Impress your DBA by asking him to review those results with you.
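
To illustrate point 11, here is a minimal sketch of parameterised dynamic SQL with sp_executesql (the table and column names are invented for the example):

DECLARE @sql nvarchar(500), @CustomerID int
SET @CustomerID = 42
SET @sql = N'SELECT OrderID, OrderDate FROM dbo.Orders WHERE CustomerID = @CustID'
-- Named parameters let SQL Server reuse the cached plan for every @CustomerID value.
EXEC sp_executesql @sql, N'@CustID int', @CustID = @CustomerID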

If you take these 15 steps, you’ve made a really good first pass.

There’s more to learn next time as we build a model of how your application, the network, and SQL Server all offer the potential for bottlenecks. We will also look at the potential for improving performance and some more steps that you can take without stepping too far into the land of the DBA.

Effect of DBCC CHECKDB and DBCC DBREINDEX on the Transaction log

DBCC DBREINDEX can be used to rebuild one or more indexes for a specific table and it is an offline operation. While this operation is running, the underlying table is unavailable to users of the database. DBCC DBREINDEX rebuilds indexes dynamically. During this operation it restores the page density levels to the original fillfactor (default); or the user can choose another target value for the page density. Internally, running DBCC DBREINDEX is very similar to using TSQL statements to drop and re-create the indexes manually.

Internally, the DBREINDEX process occurs as a single, atomic transaction. To reduce fragmentation, the new indexes must be completely built and in place before the old index pages are released, so performing the rebuild requires adequate free space in the data files. The operation is fully logged by default, and the log is written to disk whenever a CHECKPOINT or BACKUP LOG is initiated. To accommodate the changes, SQL Server looks for free space in the data and log files; if there is insufficient free space in the data file(s), DBCC DBREINDEX may be unable to rebuild the indexes, or the indexes may be rebuilt with logical fragmentation values above zero. The amount of free space needed varies and depends on the number of indexes being created in the transaction.

The DBCC CHECKDB process performs a physical consistency check on indexed views and validates the integrity of every object in a database by collecting facts about the database, scanning the log for any additional changes made, and merging the two sets of information to produce a consistent view of the data at the end of the scan. This process involves extensive locking, and in older versions of SQL Server (6.5 and 7.0) it had the negative effect of taking the database essentially offline. SQL Server 2000 instead uses and reads the database transaction log to get a consistent view, so that CHECKDB can effectively run online. The transaction log is read from the LSN of the 'begin tran' log record of the oldest transaction active at the time the database scan started, up to the LSN at the time the database scan stops. The effects of REDO and UNDO of the transactions are as follows:

* Log records from transactions that commit during that time are used to generate REDO facts. In the scenario of a row insertion record this would produce a REDO fact of 'a row with these index keys was inserted into page A of table B, index C at slot position X'.

* Log records from transactions that rollback or don't commit during that time are used to generate UNDO facts. A row insert record would produce an UNDO fact of 'a row with these index keys was removed from page A of table B, index C at slot position X'.

The REDO and UNDO processing above uses the log extensively, and the rebuild itself can quickly consume the available free space. It is recommended to have at least 60% free space available in the transaction log if you are executing the DBREINDEX and CHECKDB statements on larger tables. It is also a best practice to perform frequent transaction log backups during this operation in order to keep the size of the database data and log files under control.
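
A quick way to keep an eye on log usage while these operations run is shown below; DBCC SQLPERF(LOGSPACE) reports the size and percentage used of every database's log. (The database name and backup path are placeholders.)

-- How full is each transaction log right now?
DBCC SQLPERF(LOGSPACE)

-- Back up the log of the database being reindexed so it does not fill up.
BACKUP LOG Sales TO DISK = 'D:\Backups\Sales_log.bak'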

Sunday, October 19, 2008

Defragmenting Tables in SQL Server

Over a period of time, because of the inserts, updates and deletes, the data and the index pages can get fragmented. Here's a script that can help to defragment the database:


--Re-indexes the specified database
CREATE PROCEDURE usp_DefragDatabase
-- We don't use sysname because it might not be long enough.
-- sysname is 128 chars, so we use double that.
@dbname nvarchar(256)
AS
BEGIN
-- Quote the database name with brackets
DECLARE @quoteddbname nvarchar(256)
set @quoteddbname = quotename( @dbname )

-- The outer EXEC is so we can do USE, not allowed in stored procs
-- The inner EXEC does the actual reindex on each table in the
-- specified database

EXEC('
USE '+ @quoteddbname +'
DECLARE @sTableName sysname
DECLARE PKMS_Tables CURSOR LOCAL FOR
select table_name from information_schema.tables
where table_type = ''base table'' order by 1
OPEN PKMS_Tables
FETCH NEXT FROM PKMS_Tables INTO @sTableName

WHILE @@FETCH_STATUS = 0
BEGIN
select @sTablename = quotename(@sTablename, ''[]'')
EXEC('' DBCC DBREINDEX ( ''+@sTableName+'') WITH NO_INFOMSGS'')
FETCH NEXT FROM PKMS_Tables INTO @sTableName
END
CLOSE PKMS_Tables')
END
GO

Moving Large Table to Different File Group

We had a database that was growing very large, located on a D: drive with a 240 GB capacity. The drive was almost 90% full and running out of disk space. We had other drives on the server with available space, so our immediate solution was to move a few large tables to a different drive.

Here is how we did it. First, we identified the largest tables by using sp_spaceused to report the space used by each table.
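
For reference, this is the sort of check we ran (the table name is just an example; sp_MSforeachtable is an undocumented but widely used helper procedure):

-- Space used by a single table
EXEC sp_spaceused 'dbo.Invoice'

-- Space used by every table in the current database
EXEC sp_MSforeachtable 'EXEC sp_spaceused ''?'''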

Next we decided to create a new filegroup with T-SQL (you can also use SSMS to create a new file group).

ALTER DATABASE SALES ADD FILEGROUP [SECONDERYDATA]

We then added a file on the new drive and assigned it to the new filegroup:

ALTER DATABASE SALES
ADD FILE
( NAME = XFILENAME,
FILENAME = 'new path\SALES.ndf',
SIZE = 1MB,
MAXSIZE = 10MB,
FILEGROWTH = 1MB)
TO FILEGROUP [SECONDERYDATA]
GO

Now the database knows that there is another filegroup it can use for data. Remember that SQL Server will not automatically start creating data in the new filegroup; you have to specify it explicitly.

Now let's see how you can move an existing table that has a clustered index to a different filegroup. First, let's drop the primary key constraint with the MOVE TO option (we are assuming that the clustered index is on the PK).

ALTER TABLE [INVOICE]
DROP CONSTRAINT [INVOICE_PK] WITH (MOVE TO SECONDERYDATA)

After the move, we now recreate the PK Constraint:

ALTER TABLE [INVOICE]
ADD CONSTRAINT [INVOICE_PK] PRIMARY KEY CLUSTERED
( [column name] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [SECONDERYDATA]

Remember when you recreate the PK constraint on the Seconderydata filegroup, all the data in that table will automatically be moved to the Seconderydata filegroup. This will only happen in the case of a table that has a primary key constraint and has a clustered index.

The transfer time depends on the size of the table, and SQL Server will generally lock the entire table, so please do not do this during business hours.

Now all your data for that table will have been moved to the new filegroup. Please remember to shrink the database to make the space available to the operating system.

Thursday, October 16, 2008

Policy-Based Management in SQL Server 2008

Policy-Based Management in SQL Server 2008 allows the database administrator to define policies that tie to database instances and objects. These policies allow the database administrator (DBA) to specify rules for how objects and their properties may be created or modified. An example would be a database-level policy that disallows the AutoShrink property from being enabled for a database. Another example would be a policy that ensures the name of every trigger created on a database table begins with tr_.

As with any new SQL Server technology (or Microsoft technology in general), there is a new object naming nomenclature associated with Policy-Based Management. Below is a listing of some of the new base objects.

Policy

A Policy is a set of conditions specified on the facets of a target. In other words, a Policy is basically a set of rules specified for properties of database or server objects.

Target
A Target is an object that is managed by Policy-Based Management. Targets include objects such as the database instance, a database, a table, a stored procedure, a trigger, or an index.

Facet
A Facet is a property of an object (target) that can be involved in Policy Based Management. An example of a Facet is the name of a Trigger or the AutoShrink property of a database.

Condition
A Condition is the criteria that can be specified for a Target’s Facets. For example, you can set a condition on a Facet that specifies that all stored procedure names in the schema ‘Banking’ must begin with ‘bnk_’.

You can also assign a policy to a category. This allows you to manage a set of policies assigned to the same category. A policy belongs to only one category.

Policy Evaluation Modes
A Policy can be evaluated in a number of different ways:

* On demand - The policy is evaluated only when directly run by the administrator.
* On change: prevent - DDL triggers are used to prevent policy violations.
* On change: log only - Event notifications are used to check a policy when a change is made.
* On schedule - A SQL Agent job is used to periodically check policies for violations.

Advantages of Policy Based Management
Policy-Based Management gives you, as a DBA, much more control over your database procedures. You now have the ability to enforce your paper policies at the database level. Paper policies are great for defining database standards and guidelines; however, it takes time and effort to enforce them, and to enforce them strictly you have to go over your database with a fine-toothed comb. With Policy-Based Management, you can define your policies and rest assured that they will be enforced.

Wednesday, October 15, 2008

Performance Tuning SQL Server Cursors

Using cursors can reduce concurrency and lead to unnecessary locking and blocking. To help avoid this, use the READ_ONLY cursor option if applicable, or if you need to perform updates, try to use the OPTIMISTIC cursor option to reduce locking. Try to avoid the SCROLL_LOCKS cursor option, which reduces concurrency. [6.5, 7.0, 2000, 2005]

*****

When you are done using a cursor, don't just CLOSE it, you must also DEALLOCATE it. Deallocation is required to free up the SQL Server resources used by the cursor. If you only CLOSE the cursor, locks are freed, but SQL Server resources are not. If you don't DEALLOCATE your cursors, the resources used by the cursor will stay allocated, degrading the performance of your server until they are released. [6.5, 7.0, 2000, 2005]
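
The pattern is simply (the cursor, table and column names below are placeholders):

DECLARE @OrderID int
DECLARE curOrders CURSOR READ_ONLY FOR
    SELECT OrderID FROM dbo.Orders
OPEN curOrders
FETCH NEXT FROM curOrders INTO @OrderID
WHILE @@FETCH_STATUS = 0
BEGIN
    -- process @OrderID here
    FETCH NEXT FROM curOrders INTO @OrderID
END
CLOSE curOrders        -- releases the locks held by the cursor
DEALLOCATE curOrders   -- releases the SQL Server resources used by the cursor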

ISNULL() vs. COALESCE()

There are some minor differences between COALESCE() and ISNULL():

* COALESCE() is ANSI standard, so that is an advantage for the purists out there.

* Many consider ISNULL()'s readability and common-sense naming to be an advantage. While I will agree that it is easier to spell and pronounce, I disagree that its naming is intuitive. In other languages such as VB/VBA/VBScript, ISNULL() accepts a single input and returns a single boolean output.

* ISNULL() accepts exactly two parameters. If you want to take the first non-NULL among more than two values, you will need to nest your ISNULL() statements. COALESCE(), on the other hand, can take multiple inputs:

SELECT ISNULL(NULL, NULL, 'foo')

-- yields:
Server: Msg 174, Level 15, State 1, Line 1
The isnull function requires 2 arguments.

SELECT COALESCE(NULL, NULL, 'foo')

-- yields:
----
foo

In order to make this work with ISNULL(), you would have to say:

SELECT ISNULL(NULL, ISNULL(NULL, 'foo'))

* The result of ISNULL() always takes on the datatype of the first parameter (regardless of whether it is NULL or NOT NULL). COALESCE() works more like a CASE expression, which returns a single datatype depending on precedence and accommodating all possible outcomes. For example:

DECLARE @foo VARCHAR(5)
SET @foo = NULL
SELECT ISNULL(@foo, '123456789')

-- yields:
-----
12345

SELECT COALESCE(@foo, '123456789')

-- yields:
---------
123456789

This gets more complicated if you start mixing incompatible datatypes, e.g.:

DECLARE @foo VARCHAR(5), @bar INT
SET @foo = 'foo'
SET @bar = NULL

SELECT ISNULL(@foo, @bar)
SELECT COALESCE(@foo, @bar)

-- yields:

-----
foo

Server: Msg 245, Level 16, State 1, Line 6
Syntax error converting the varchar value 'foo' to a column of data type int.

* A relatively obscure difference is the ability to apply constraints to computed columns that use COALESCE() or ISNULL(). SQL Server views a computed column created with COALESCE() as nullable, whereas one using ISNULL() is not. So:

CREATE TABLE dbo.Try
(
col1 INT,
col2 AS COALESCE(col1, 0)
PRIMARY KEY
)
GO

-- yields:
Server: Msg 8111, Level 16, State 2, Line 1
Cannot define PRIMARY KEY constraint on nullable column in table 'Try'.
Server: Msg 1750, Level 16, State 1, Line 1
Could not create constraint. See previous errors.

Whereas the following works successfully:

CREATE TABLE dbo.Try
(
col1 INT,
col2 AS ISNULL(col1, 0)
PRIMARY KEY
)
GO


* Finally, COALESCE() can generate a less efficient plan in some situations.

Tuesday, October 14, 2008

SQL Server Data Services - SSDS

SQL Server Data Services (SSDS) is a highly scalable and cost-effective on-demand data storage and query processing web service. It is built on robust SQL Server technologies and helps guarantee a business-ready service level agreement covering high availability, performance and security features. Microsoft SSDS is accessible using standards-based protocols (SOAP, REST) for quick provisioning of on-demand data-driven and mashup applications.

Customers can use SQL Server Data Services to store virtually any amount of data in the Cloud. They can query and modify data as required by the specific business scenarios. SQL Server Data Services support standards-based REST and SOAP interfaces designed to work with any Internet-development toolkit. The primary wire format is XML. Developers and service providers can quickly run on-demand applications with ease. The data has flexible schema which can be modified dynamically by the application. The data is provided with high availability and reliability virtually anywhere, anytime.

Technical Benefits are
* Easy access with REST and SOAP protocols
* Flexible, Web 2.0-friendly data model and wire format (XML and others) with query language
* Scale, with virtually no restriction on storage
* High availability, almost always available from any location
* Fast, reliable access, durable backup
* Easy to deploy solutions and modify user data
* Geo replication to help ensure business continuity
* Built on robust Microsoft SQL Server database technologies

Check more on
http://www.microsoft.com/sql/dataservices/default.mspx

Force SQL Server 2005 logins to change their passwords periodically

As you might be aware, SQL Server 2005 includes a new way to enforce password policy and expirations for SQL Server login IDs. This was not the case in previous SQL Server versions. In the past, the only option you had was to use Windows Authentication if you wanted to enforce a password policy, but this was not always possible. Many applications can't use Windows Authentication and SQL Server authentication is the only option.

If you are currently using SQL Server 2000 or earlier and need the ability to enforce a password policy on SQL Server logins, then this is a good incentive to upgrade to SQL Server 2005.

In SQL Server 2005, there are two ways to turn on a password policy for SQL Server logins. You can use either Management Studio's GUI interface, or you can use Transact-SQL.

If you want to use Management Studio to enforce password policies, then when you create a new login or edit an existing login, you have to select which of the following policies you want to enforce:

* Enforce password policy
* Enforce password expiration
* User must change password at next login

If you choose one or more of the above options for each of the SQL Server logins on your SQL Server 2005 instance, then SQL Server 2005 will enforce these rules based on the password rules already set for your domain.

If you want to use Transact-SQL to add password policy enforcement, you can do so using new options available in the CREATE LOGIN or the ALTER LOGIN commands.
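
For example (the login name and password below are placeholders):

-- Create a SQL Server login that follows the Windows password policy, expires
-- according to that policy, and must change its password at first login.
CREATE LOGIN AppUser
    WITH PASSWORD = 'P@ssw0rd!' MUST_CHANGE,
    CHECK_EXPIRATION = ON,
    CHECK_POLICY = ON

-- Or turn policy enforcement on for an existing login.
ALTER LOGIN AppUser WITH CHECK_POLICY = ON, CHECK_EXPIRATION = ON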

For more info
http://www.sqlservercentral.com/articles/SQL+Server+2005+-+Security/sqlserver2005logins/2474/

Sunday, October 12, 2008

Using COALESCE to Build Comma-Delimited String

COALESCE is an ANSI-standard function supported by SQL Server. Here it is used to build a comma-separated value string, i.e. to produce the delimited list with a single SELECT statement. The following code shows how it's done.

DECLARE @EmployeeList varchar(100)

SELECT @EmployeeList = COALESCE(@EmployeeList + ', ', '') +
CAST(Emp_UniqueID AS varchar(5))
FROM SalesCallsEmployees
WHERE SalCal_UniqueID = 1

SELECT @EmployeeList

--Results--

---------
1, 2, 4, 5

The COALESCE function performs the magic here. When @EmployeeList is NULL (the first row processed), it returns an empty string. On subsequent rows, it concatenates the @EmployeeList value with a comma and the current Emp_UniqueID value.

How do I connect to SQL Server on a port other than 1433

Sometimes an added measure of security can be achieved by using ports other than the defaults for server software. SQL Server allows you to specify which port you want it to run on; the default is 1433. Provided you can access your SQL Server through TCP/IP, the following connection string should help you connect to a different port (this example uses port 1510 on the local machine):

<%
cst = "Provider=SQLOLEDB;" & _
"Data Source=[x.x.x.x],[port number];" & _
"Initial Catalog=[dbname];" & _
"Network=DBMSSOCN;" & _
"User Id=[uid];" & _
"Password=[pwd]"

set conn = CreateObject("ADODB.Connection")
conn.open cst
...
%>

Notice that the IP address and port number are separated by a comma, and that TCP/IP is 'forced' by adding network=DBMSSOCN.

Search for special characters (e.g. %) in SQL Server

There are several characters that have special meaning within a SQL query, for example the percent sign (%) in a LIKE query is a wildcard that essentially means "any number of characters can go here." Likewise, the underscore (_) is a wildcard that says "any single character can go here." So what if you are actually looking for a value that contains a literal percent sign? You will end up with bizarre results if you try the following:

SELECT columns FROM table WHERE
column LIKE '%%%'

Instead, you can try one of the following solutions:

SELECT columns FROM table WHERE
column LIKE '%[%]%'

-- or

SELECT columns FROM table WHERE
column LIKE '%\%%' ESCAPE '\'

The first query 'delimits' the special character with square brackets, telling the engine to treat it as a normal literal character instead of a character with special meaning. The second query uses a custom escape character -- you can use any character you like, just be careful that you aren't also expecting to use it as part of the literal string.

Now, you might be wondering, how do I escape a square bracket? If you have something like this:

SELECT columns FROM table WHERE
column LIKE '%[SQL Server Driver]%'

The results won't be what you expect, because an opening square bracket is considered a special character. Surprisingly, you can avoid this problem in much the same way, by one of the following two queries:

SELECT columns FROM table WHERE
column LIKE '%[[]SQL Server Driver]%'

-- or

SELECT columns FROM table WHERE
column LIKE '%\[SQL Server Driver]%' ESCAPE '\'

You can do this replacement at the ASP side, before passing the string in, or within the SQL Server code itself.

How do I enable or disable connection pooling?

Connection pooling is enabled by default for SQL Server and Oracle, so you shouldn't need to configure anything to take advantage of this feature. (Connection pooling allows you to reuse existing connections from a "pool" instead of instantiating and creating a new connection object from scratch.)

The number of seconds before an idle connection in the pool is discarded is stored in the following registry key:

\HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBCINST.INI\\CPTimeout

You will notice that this key is absent for file-based database platforms, such as Access, dBase and Excel. If you are using Access through the Jet/OLEDB driver, you can add the following to the connection string to force connection pooling:

OLE DB Services=-1

Please test the performance of your app with and without this setting.

Microsoft states:

"To make the best use of connection pooling, explicitly close database connections as soon as possible. By default, a connection terminates after your script finishes execution. However, by explicitly closing a connection in your script after it is no longer needed, you reduce demand on the database server and make the connection available to other users."

Thursday, October 9, 2008

How to Performance Tune Microsoft SQL Server During Setup

For best overall performance, locate the database files (.mdf) and log files (.ldf) on separate arrays in your server to isolate potentially conflicting reads and writes. [6.5, 7.0, 2000, 2005]

To store your database files (.mdf), the best performance is gained by storing them on RAID 10 arrays. If this is too expensive, then RAID 5 is most likely the next best bet. Each RAID array (5 or 10) should have as many physical disks in the array as the controller will support. This allows reads and writes to be performed simultaneously on each physical drive in the array, boosting disk I/O. [6.5, 7.0, 2000, 2005]

Wednesday, October 8, 2008

Exists Vs. Count(*) - The battle never ends..

Many of the database applications written today still disregard some basic rules of thumb when it comes to accessing the data. One in particular is the use of COUNT(*) to check whether any rows match some criteria. The technique of using EXISTS over COUNT(*) has been widely publicized and is in pretty much every best-practices document I have come across. So why are database developers still using COUNT(*) instead of EXISTS all over the place? Is it because people just don't believe or recognize the benefits? Or is the concept of EXISTS difficult for new programmers to grasp? Let's see a demo of why EXISTS is almost always a better way to code when you simply need to see if there is at least one row that matches some condition in the WHERE clause. Of course, if you really need to know exactly how many rows match that condition, then COUNT(*) is appropriate, so hopefully this won't confuse anyone in that regard.

Let's use the AdventureWorks database and turn statistics IO on so we can see the number of reads associated with each query. We will then compare COUNT(*) with EXISTS so there is no mistake about how much work is being done in relation to each other.

USE Adventureworks
GO

SET STATISTICS IO ON

GO

First let's look at a situation in which there is an index to satisfy the WHERE clause and there are only 2 matching rows:

IF (SELECT COUNT(*) FROM sales.salesorderdetail
WHERE ProductID = 870 ) > 0

Print 'Yes'

IF EXISTS (SELECT * FROM sales.salesorderdetail WHERE ProductID = 870)
Print 'Yes'

Table 'SalesOrderDetail'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderDetail'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

As we can see from the logical reads, there was only a difference of 1 between the two queries, since the number of matching rows was so small. So in this case there was not a significant difference, although COUNT(*) still cost an extra logical read (3 versus 2).

Now let's do the same but with 4688 matching rows:

IF (SELECT COUNT(*) FROM sales.salesorderdetail
WHERE ProductID = 897 ) > 0

Print 'Yes'

IF EXISTS (SELECT * FROM sales.salesorderdetail WHERE ProductID = 897)
Print 'Yes'

Table 'SalesOrderDetail'. Scan count 1, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderDetail'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Now the cost of the COUNT(*) is over 5 times that of the EXISTS. This still may not seem like a lot, but if you were calling these queries thousands of times a second it would be a big deal. And remember, this index is still pretty small overall.

OK, now let's try this on a column with no index in that same table. There are 357 rows that match, but there are a total of 121,317 rows in the table.

IF (SELECT COUNT(*) FROM sales.salesorderdetail
WHERE ModifiedDate = '20010701 00:00:00.000' ) > 0

Print 'Yes'

IF EXISTS (SELECT * FROM sales.salesorderdetail WHERE ModifiedDate = '20010701 00:00:00.000')
Print 'Yes'

Table 'SalesOrderDetail'. Scan count 1, logical reads 1241, physical reads 0, read-ahead reads 331, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderDetail'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

COUNT(*) is now almost 250 times more expensive than EXISTS. Both queries scanned the table, but the EXISTS needed only a partial scan because it can stop after it finds the very first matching row, whereas the COUNT(*) must read each and every row in the entire table to determine whether it matches the criteria and how many such rows there are. That is the key, folks. The ability to stop working after the first row that meets the criteria of the WHERE clause is what makes EXISTS so efficient. The optimizer knows about this behavior and can factor it in as well. Keep in mind that these tables are relatively small compared to most databases in the real world, so the figures for the COUNT(*) queries would be multiplied many times on larger tables. You could easily get hundreds of thousands of reads or more on tables with millions of rows, but the EXISTS will still need only a few reads on any query that can use an index to satisfy the WHERE clause.

Hopefully this will help persuade those last holdouts who insist on using COUNT(*) everywhere even when EXISTS is the clear choice. One last note: make sure to turn STATISTICS IO off when done.

SET STATISTICS IO OFF

Sunday, October 5, 2008

Deleting Duplicate batches of rows

This code works in SQL Server 2005 and above because it uses the DELETE TOP (N) statement. Although using a cursor is not always a good idea, there are situations where we have to use one.

if object_id('tempdb.dbo.##Employee') IS NOT NULL
DROP TABLE dbo.##Employee

if object_id('tempdb.dbo.##Emp1') IS NOT NULL
DROP TABLE dbo.##Emp1

create table dbo.##Employee
(
EmpName varchar(30)
)

insert into ##Employee(EmpName) values('Abc')
insert into ##Employee(EmpName) values('Abc')
insert into ##Employee(EmpName) values('George')
insert into ##Employee(EmpName) values('Micky')
insert into ##Employee(EmpName) values('Micky')
insert into ##Employee(EmpName) values('Micky')


select empname,(count(1)-1) cnt
into ##Emp1
from ##Employee
group by empname
having count(1) > 1

DECLARE @empname varchar(30), @cnt int
DECLARE cur CURSOR FOR select empname,cnt from ##Emp1
OPEN cur

FETCH NEXT FROM cur INTO @empname, @cnt
WHILE @@FETCH_STATUS = 0
BEGIN

delete top (@cnt) from ##Employee where empname = @empname

FETCH NEXT FROM cur INTO @empname, @cnt
END
CLOSE cur
DEALLOCATE cur

select * from ##Employee
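For what it's worth, on SQL Server 2005 and above the same cleanup can also be done without a cursor by numbering the rows in each duplicate group and deleting everything after the first row. Here is a minimal sketch against the same ##Employee table (the CTE name Dups is just illustrative):

;with Dups as
(
    select EmpName,
           row_number() over (partition by EmpName order by EmpName) as rn
    from ##Employee
)
-- deleting from the CTE deletes the underlying rows; rn = 1 survives in each group
delete from Dups where rn > 1

select * from ##Employee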

Friday, October 3, 2008

Best Way to Calculate Age

Given a "date of birth" value as a datetime type, what is the best way to calculate the current age (in years, as a tinyint value)?

Choose your answer:
Ans 3

The first answer, "DATEDIFF(yy, DateOfBirth, GETDATE())", is very efficient and readable, but what you have to remember about the DATEDIFF() function is that it first parses the specified DATEPART() from each value, then takes the difference of those. The incorrect expectation is to think it calculates the difference of the values first (to the highest precision available), and then returns that difference as specified by the DATEPART. So the first answer would actually return the person's age at the end of the year. The second answer, "FLOOR(CONVERT(decimal(9, 2), DATEDIFF(d, DateOfBirth, GETDATE())) / 365.0)", is far more accurate, because it calculates the difference in days and then divides by 365. This is roughly correct, but doesn't account for leap years, and thus might be off by a handful of days. The third answer first gets the difference in years, then subtracts 1 if the person has not yet had a birthday this year. This answer is 100% accurate, and is the correct response.
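The quiz does not reproduce the third expression, but a minimal sketch of that approach (the variable name and sample date are just for illustration) looks like this:

DECLARE @DateOfBirth datetime
SET @DateOfBirth = '19800615'

SELECT CAST(DATEDIFF(yy, @DateOfBirth, GETDATE())
       - CASE WHEN MONTH(@DateOfBirth) > MONTH(GETDATE())
                OR (MONTH(@DateOfBirth) = MONTH(GETDATE())
                    AND DAY(@DateOfBirth) > DAY(GETDATE()))
              THEN 1 ELSE 0 END AS tinyint) AS Age

The DATEDIFF(yy, ...) part gives the calendar-year difference, and the CASE expression subtracts 1 when the birthday has not yet occurred this year.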

Thursday, October 2, 2008

Best Practice

You receive some data from the UI and need to insert a row into the Customer table if the name received from the UI does not already exist in the database. Which of the options below should be used for this purpose?

1.

IF NOT EXISTS ( SELECT * FROM Customer
WHERE Name = @CustomerName )
INSERT INTO [Customer]
( [Name])
VALUES
( @CustomerName)

2.

INSERT INTO [Customer]
( [Name])
SELECT @CustomerName
WHERE NOT EXISTS ( SELECT * FROM Customer
WHERE Name = @CustomerName )

Correct answer:

Option 2 is the one to use

Explanation:

Option 1 has 2 separate statements. The first one applies locks on the relevant table resources in order to perform the check for existence. As soon as it completes, SQL Server releases those locks. At that moment a parallel SPID may insert a record into the table BEFORE the following INSERT statement has applied its locks.

Therefore the initial check may already be stale by the time the following INSERT starts. If there is a unique constraint on Customer.Name, the INSERT would fail despite the check for existence having passed.

The second option does everything in a single statement, and therefore a single transaction. It applies the locks and holds them until the INSERT has finished its job. Another SPID cannot insert a conflicting row until all locks applied by the INSERT statement on the object are released.

So, the first option is unacceptable; it relies on luck that another user is not doing the same thing at the same time. Of course, the probability of failure is quite low, but it is a possible event. In active transactional systems with hundreds of transactions per second it does not look so improbable.

Script to run the test:

USE pubs
GO
CREATE TABLE Customer (
ID int IDENTITY(1,1) NOT NULL,
Name nvarchar(100) NOT NULL,
UNIQUE (Name)
)
GO
DECLARE @CustomerName nvarchar(100)
SET @CustomerName = 'Customer2'

IF NOT EXISTS ( SELECT * FROM Customer
WHERE Name = @CustomerName )
BEGIN
WAITFOR DELAY '00:00:05'
/* this 5 sec pause lets you insert same row from another QA window:
DECLARE @CustomerName nvarchar(100)
SET @CustomerName = 'Customer2'
INSERT INTO [Customer]
( [Name])
VALUES
( @CustomerName)
*/

INSERT INTO [Customer]
( [Name])
VALUES
( @CustomerName)

END

SET @CustomerName = 'Customer3'
INSERT INTO [Customer]
( [Name])
SELECT @CustomerName
WHERE NOT EXISTS ( SELECT * FROM Customer
WHERE Name = @CustomerName )

GO
select Object_Id('Customer')

DROP TABLE Customer

How to Performance Tune the Microsoft SQL Server tempdb Database

Heavy activity in the tempdb database can drag down your application's performance. This is especially true if you create one or more large temp tables and then query or join them.

To help speed up queries or joins on large temp tables, be sure the AUTOSTATS database option is turned on for tempdb, and then create one or more indexes on these temp tables that can be used by your queries or joins. This means that you will need to create the temp table first and then add the appropriate index(es) to it.

In many cases, you will find that this can substantially speed up your application. But like many performance tips, be sure you test this one to see if it actually helps in your particular situation. In some cases, the overhead of creating the index(es) is greater than the time saved by using them. Only through testing will you know which option is best in your situation. [7.0, 2000, 2005]
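As a rough illustration of the pattern (assuming the AdventureWorks sample database; the temp table and index names below are made up), the idea is to load the temp table first and add the index afterwards:

USE AdventureWorks
GO

CREATE TABLE #RecentOrders
(
    SalesOrderID int NOT NULL,
    CustomerID   int NOT NULL,
    OrderDate    datetime NOT NULL
)

INSERT INTO #RecentOrders (SalesOrderID, CustomerID, OrderDate)
SELECT SalesOrderID, CustomerID, OrderDate
FROM Sales.SalesOrderHeader
WHERE OrderDate >= '20040101'

-- index added after the load so the join below can seek instead of scan
CREATE INDEX ix_RecentOrders_CustomerID ON #RecentOrders (CustomerID)

SELECT c.CustomerID, COUNT(*) AS OrderCount
FROM #RecentOrders AS r
JOIN Sales.Customer AS c ON c.CustomerID = r.CustomerID
GROUP BY c.CustomerID

DROP TABLE #RecentOrders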

Wednesday, October 1, 2008

New T-SQL Programmability Features in SQL Server 2008

Microsoft SQL Server 2008 introduces several important new Transact‑SQL programmability features and enhances some existing ones. This paper covers the new features and enhancements and demonstrates them through code samples. It will familiarize you with the new features by providing a high-level functionality overview but does not cover all technical details. You can find details in SQL Server Books Online.

This paper covers the following key new features:

· Declaring and initializing variables

· Compound assignment operators

· Table value constructor support through the VALUES clause

· Enhancements to the CONVERT function

· New date and time data types and functions

· Large UDTs

· The HIERARCHYID data type

· Table types and table-valued parameters

· The MERGE statement, grouping sets enhancements

· DDL trigger enhancements

· Sparse columns

· Filtered indexes

· Large CLR user-defined aggregates

· Multi-input CLR user-defined aggregates

· The ORDER option for CLR table-valued functions

· Object dependencies

· Change data capture

· Collation alignment with Microsoft® Windows®

· Deprecation

Transact-SQL Delighters

Transact-SQL delighters are small enhancements that, for the most part, help make the programming experience more convenient. This section covers the following Transact-SQL delighters: declare and initialize variables, compound assignment operators, table value constructor support through the VALUES clause, and the enhanced CONVERT function.

Declare and Initialize Variables

Microsoft SQL Server® 2008 enables you to initialize variables inline as part of the variable declaration statement instead of using separate DECLARE and SET statements. This enhancement helps you abbreviate your code. The following code example demonstrates inline initializations using a literal and a function:

DECLARE @i AS INT = 0, @d AS DATETIME = CURRENT_TIMESTAMP;

SELECT @i AS [@i], @d AS [@d];

Compound Assignment Operators

Compound assignment operators help abbreviate code that assigns a value to a column or a variable. The new operators are:

· += (plus equals)

· -= (minus equals)

· *= (multiplication equals)

· /= (division equals)

· %= (modulo equals)

You can use these operators wherever assignment is normally allowed—for example, in the SET clause of an UPDATE statement or in a SET statement that assigns values to variables. The following code example demonstrates the use of the += operator:

DECLARE @price AS MONEY = 10.00;

SET @price += 2.00;

SELECT @price;

This code sets the variable @price to its current value, 10.00, plus 2.00, resulting in 12.00.

Table Value Constructor Support through the VALUES Clause

SQL Server 2008 introduces support for table value constructors through the VALUES clause. You can now use a single VALUES clause to construct a set of rows. One use of this feature is to insert multiple rows based on values in a single INSERT statement, as follows:

USE tempdb;

IF OBJECT_ID('dbo.Customers', 'U') IS NOT NULL DROP TABLE dbo.Customers;

CREATE TABLE dbo.Customers

(

custid INT NOT NULL,

companyname VARCHAR(25) NOT NULL,

phone VARCHAR(20) NOT NULL,

address VARCHAR(50) NOT NULL,

CONSTRAINT PK_Customers PRIMARY KEY(custid)

);

INSERT INTO dbo.Customers(custid, companyname, phone, address)

VALUES

(1, 'cust 1', '(111) 111-1111', 'address 1'),

(2, 'cust 2', '(222) 222-2222', 'address 2'),

(3, 'cust 3', '(333) 333-3333', 'address 3'),

(4, 'cust 4', '(444) 444-4444', 'address 4'),

(5, 'cust 5', '(555) 555-5555', 'address 5');

Note that even though no explicit transaction is defined here, this INSERT statement is considered an atomic operation. So if any row fails to enter the table, the entire INSERT operation fails.

A table value constructor can be used to define table expressions such as derived tables and CTEs, and can be used where table expressions are allowed (such as in the FROM clause of a SELECT statement or as the source table in a MERGE statement). The following example demonstrates using the VALUES clause to define a derived table in the context of an outer SELECT statement:

SELECT *

FROM

(VALUES

(1, 'cust 1', '(111) 111-1111', 'address 1'),

(2, 'cust 2', '(222) 222-2222', 'address 2'),

(3, 'cust 3', '(333) 333-3333', 'address 3'),

(4, 'cust 4', '(444) 444-4444', 'address 4'),

(5, 'cust 5', '(555) 555-5555', 'address 5')

) AS C(custid, companyname, phone, address);

The outer query can operate on this table expression like any other table expression, including joins, filtering, grouping, and so on.

Enhanced CONVERT Function

The CONVERT function is enhanced in SQL Server 2008 to support new conversion options between character strings and binary data types. You determine the conversion option to use by specifying a style number as the third argument. Style 0 is the default behavior that was supported in previous SQL Server versions—this style translates character strings to the binary representation of the ASCII codes for the characters and vice versa.

Styles 1 and 2 introduce new functionality. These styles convert a hex string literal to a binary value that contains the same digits when presented in hex form and vice versa. If you use Style 1 when converting from a character string to a binary value, the input character string should include the 0x prefix; if you use Style 2, it should not. Similarly, if you use Style 1 when converting from a binary value to a character string, the output character string will contain the 0x prefix; if you use Style 2, it will not. The following example demonstrates using styles 1 and 2:

SELECT

CONVERT(VARCHAR(12) , 0x49747A696B , 1) AS [Bin to Char 1],

CONVERT(VARBINARY(5), '0x49747A696B', 1) AS [Char to Bin 1],

CONVERT(VARCHAR(12) , 0x49747A696B , 2) AS [Bin to Char 2],

CONVERT(VARBINARY(5), '49747A696B' , 2) AS [Char to Bin 2];

This code produces the following output:

Bin to Char 1 Char to Bin 1 Bin to Char 2 Char to Bin 2

------------- ------------- ------------- -------------

0x49747A696B 0x49747A696B 49747A696B 0x49747A696B

Date and Time Data Types

Before SQL Server 2008, date and time improvements were probably at the top of the list of the most requested improvements for SQL Server—especially the request for separate date and time data types, but also for general enhanced support for temporal data. SQL Server 2008 introduces four new date and time data types—including DATE, TIME, DATETIME2, and DATETIMEOFFSET—as well as new functions that operate on the new types and enhancements to existing functions.

New Data Types

The four new date and time data types provide a split between date and time, support for a larger date range, improved accuracy, and support for a time zone element. The new data types have standard-compliant semantics and are compatible with the SQL standard. The DATE and TIME data types split the date and time, which in previous versions were consolidated. The DATETIME2 data type is an improved version of DATETIME, providing support for a larger date range and better accuracy. The DATETIMEOFFSET data type is similar to DATETIME2 with the addition of a time zone component. Table 1 describes the new data types, showing their storage in bytes, date-range support, accuracy, recommended entry format for literals, and an example.

Data Type      | Storage (bytes) | Date Range                                                       | Accuracy        | Recommended Entry Format and Example
DATE           | 3               | January 1, 0001, through December 31, 9999 (Gregorian calendar) | 1 day           | 'YYYY-MM-DD', e.g. '2009-02-12'
TIME           | 3 to 5          | (time of day only, no date range)                                | 100 nanoseconds | 'hh:mm:ss.nnnnnnn', e.g. '12:30:15.1234567'
DATETIME2      | 6 to 8          | January 1, 0001, through December 31, 9999                       | 100 nanoseconds | 'YYYY-MM-DD hh:mm:ss.nnnnnnn', e.g. '2009-02-12 12:30:15.1234567'
DATETIMEOFFSET | 8 to 10         | January 1, 0001, through December 31, 9999                       | 100 nanoseconds | 'YYYY-MM-DD hh:mm:ss.nnnnnnn [+|-]hh:mm', e.g. '2009-02-12 12:30:15.1234567 +02:00'

Table 1: New Date and Time Data Types

Note that the format 'YYYY-MM-DD' is language neutral for the new data types, but it is language dependent for the DATETIME and SMALLDATETIME data types. The language-neutral format for those data types is 'YYYYMMDD'.

The three new types that contain a time component (TIME, DATETIME2, and DATETIMEOFFSET) enable you to specify the fractional seconds precision in parentheses following the type name. The default is 7, meaning 100 nanoseconds. If you need a fractional second accuracy of milliseconds, such as three for example, you must explicitly specify it: DATETIME2(3).

Metadata information in SQL Server 2008 reports meaningful precision and scale values for the new date and time data types. In the sys.columns view, the precision attribute describes the total number of characters in the default literal string representation of the value, and the scale describes the number of digits in the fractional part of the seconds. In INFORMATION_SCHEMA.COLUMNS, the DATETIME_PRECISION attribute describes the number of digits in the fractional part of the seconds.
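A quick way to see these reported values for yourself (the dbo.DateTest table below is just a throwaway example) is to query the catalog views:

USE tempdb;

IF OBJECT_ID('dbo.DateTest', 'U') IS NOT NULL DROP TABLE dbo.DateTest;

CREATE TABLE dbo.DateTest
(
  d   DATE           NULL,
  t   TIME(7)        NULL,
  dt2 DATETIME2(3)   NULL,
  dto DATETIMEOFFSET NULL
);

-- precision = characters in the default literal form, scale = fractional-second digits
SELECT name, TYPE_NAME(system_type_id) AS type_name, precision, scale
FROM sys.columns
WHERE object_id = OBJECT_ID('dbo.DateTest');

-- DATETIME_PRECISION = number of digits in the fractional part of the seconds
SELECT COLUMN_NAME, DATA_TYPE, DATETIME_PRECISION
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'DateTest';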

The following code shows an example of using the new types:

DECLARE

@d AS DATE = '2009-02-12',

@t AS TIME = '12:30:15.1234567',

@dt2 AS DATETIME2 = '2009-02-12 12:30:15.1234567',

@dto AS DATETIMEOFFSET = '2009-02-12 12:30:15.1234567 +02:00';

SELECT @d AS [@d], @t AS [@t], @dt2 AS [@dt2], @dto AS [@dto];

The new data types are fully supported by the SQL Server Native Client OLE DB and ODBC providers as well as by ADO.NET in Microsoft Visual Studio® 2008. Table 2 shows the mappings between the new SQL Server 2008 data types and the corresponding client provider types.

SQL            | ODBC                               | OLE DB                         | ADO.NET
DATE           | SQL_TYPE_DATE / SQL_DATE           | DBTYPE_DBDATE                  | DateTime
TIME           | SQL_TIME / SQL_SS_TIME2            | DBTYPE_DBTIME / DBTYPE_DBTIME2 | TimeSpan
DATETIMEOFFSET | SQL_SS_TIMESTAMPOFFSET             | DBTYPE_DBTIMESTAMPOFFSET       | DateTimeOffset
DATETIME2      | SQL_TYPE_TIMESTAMP / SQL_TIMESTAMP | DBTYPE_DBTIMESTAMP             | DateTime

Table 2: Date and Time Data Type Mappings to Client Providers

New and Enhanced Functions

To support the new date and time data types, SQL Server 2008 introduces new functions and enhances existing functions. The new functions are SYSDATETIME, SYSUTCDATETIME, SYSDATETIMEOFFSET, SWITCHOFFSET, and TODATETIMEOFFSET.

SYSDATETIME, SYSUTCDATETIME, and SYSDATETIMEOFFSET return the current system date and time value. SYSDATETIME returns the current date and time as a DATETIME2 value, SYSUTCDATETIME returns the current date and time in UTC as a DATETIME2 value, and SYSDATETIMEOFFSET returns the current date and time along with the system time zone as a DATETIMEOFFSET value. To get only the current date or only the current time, you can cast the value returned from the SYSUTCDATETIME function to DATE or TIME, as the following example shows:

SELECT

CAST(SYSDATETIME() AS DATE) AS [current_date],

CAST(SYSDATETIME() AS TIME) AS [current_time];

Interestingly, when converting a column whose data type contains both a date and a time component to DATE, the SQL Server query optimizer can still rely on index ordering to process a query more efficiently. This is contrary to the usual behavior, where conversion of a column to a different type prevents the optimizer from relying on index order. The new behavior means that the optimizer might consider an index seek for a query filter that has a conversion to DATE. For example, the plan for the following query performs an index seek on the index on the CurrencyRateDate DATETIME column:

USE AdventureWorks;

SELECT FromCurrencyCode, ToCurrencyCode, EndOfDayRate

FROM Sales.CurrencyRate

WHERE CAST(CurrencyRateDate AS DATE) = '20040701';

The SWITCHOFFSET function adjusts an input DATETIMEOFFSET value to a specified time zone, while preserving the UTC value. The syntax is SWITCHOFFSET(datetimeoffset_value, time_zone). For example, the following code adjusts the current system datetimeoffset value to time zone GMT -05:00:

SELECT SWITCHOFFSET(SYSDATETIMEOFFSET(), '-05:00');

So if the current system datetimeoffset value is February 12, 2009 10:00:00.0000000 -08:00, this code returns the value February 12, 2009 13:00:00.0000000 -05:00.

The TODATETIMEOFFSET function sets the time zone offset of an input date and time value. Its syntax is TODATETIMEOFFSET(date_and_time_value, time_zone).

This function is different from SWITCHOFFSET in several ways. First, it is not restricted to a datetimeoffset value as input; rather it accepts any date and time data type. Second, it does not try to adjust the time based on the time zone difference between the source value and the specified time zone but instead simply returns the input date and time value with the specified time zone as a datetimeoffset value.

The main purpose of the TODATETIMEOFFSET function is to convert types that are not time zone aware to DATETIMEOFFSET by the given time zone offset. If the given date and time value is a DATETIMEOFFSET, the TODATETIMEOFFSET function changes the DATETIMEOFFSET value based on the same original local date and time value plus the new given time zone offset.

For example, the current system datetimeoffset value is February 12, 2009 10:00:00.0000000 -08:00, and you run the following code:

SELECT TODATETIMEOFFSET(SYSDATETIMEOFFSET(), '-05:00');

The value February 12, 2009 10:00:00.0000000 -05:00 is returned. Remember that the SWITCHOFFSET function returned February 12, 2009 13:00:00.0000000 -05:00 because it adjusted the time based on the time zone differences between the input (-08:00) and the specified time zone (-05:00).

As mentioned earlier, you can use the TODATETIMEOFFSET function with any date and time data type as input. For example, the following code takes the current system date and time value and returns it as a datetimeoffset value with a time zone of -05:00:

SELECT TODATETIMEOFFSET(SYSDATETIME(), '-05:00');

Functions that accept a date and time part as input now support new parts. The DATEADD, DATEDIFF, DATENAME, and DATEPART functions add support for microseconds and nanoseconds. DATENAME and DATEPART also support the TZoffset part (time zone), with DATEPART also supporting the ISO_WEEK part (ISO 8601 week number).
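For example (a small sketch that is not one of the paper's samples), the new parts can be used as follows:

DECLARE
  @t1 AS TIME = '12:30:15.0000000',
  @t2 AS TIME = '12:30:15.1234567';

SELECT
  DATEDIFF(MICROSECOND, @t1, @t2)         AS diff_in_microseconds,
  DATEDIFF(NANOSECOND, @t1, @t2)          AS diff_in_nanoseconds,
  DATEPART(ISO_WEEK, '20090212')          AS iso_week_number,
  DATEPART(TZoffset, SYSDATETIMEOFFSET()) AS tz_offset_in_minutes;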

Many other functions are enhanced in SQL Server 2008 to support the new date and time types, among them the existing date and time functions, type conversion functions (CAST and CONVERT), set and aggregate functions (such as MAX, MIN), metadata functions (such as TYPEPROPERTY, COLUMNPROPERTY), and system functions (such as DATALENGTH, IS_DATE).

Large UDTs

In SQL Server 2005, user-defined types (UDTs) in the CLR were limited to 8,000 bytes. SQL Server 2008 lifts this limitation and now supports large UDTs. Similar to the built-in large object types that SQL Server supports, large UDTs can now reach up to 2 GB in size. If the UDT value does not exceed 8,000 bytes, the database system treats it as an inline value as in SQL Server 2005. If it exceeds 8,000 bytes, the system treats it as a large object and reports its size as “unlimited.”

One application where large UDTs are required is to support spatial data. For the most part, 8,000 bytes might be enough to represent a value of a spatial data type, but in some cases, the size requirement of such values might exceed 8,000 bytes. SQL Server 2008 introduces the built-in UDTs GEOMETRY and GEOGRAPHY, which are implemented as large CLR UDTs. These new types introduce support for spatial data. The GEOGRAPHY type supports round earth topology, and the GEOMETRY type supports flat earth topology. Spatial data is covered in detail in a separate paper.
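Just to give a flavor of the new spatial UDTs (this snippet is only an illustration; spatial functionality is beyond the scope of this paper):

DECLARE @g AS GEOGRAPHY;
SET @g = geography::Point(47.65, -122.36, 4326);

-- ToString() returns the well-known text form; DATALENGTH shows the serialized size
SELECT @g.ToString() AS point_as_wkt, DATALENGTH(@g) AS size_in_bytes;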

A UDT’s size is defined by the attribute SqlUserDefinedTypeAttribute.MaxByteSize as part of the type’s definition. If this attribute is set to -1, the serialized UDT can reach the same size as other large object types (currently 2 GB); otherwise, the UDT cannot exceed the size specified in the MaxByteSize property.

Client APIs such as OLE DB and ODBC in the SQL Server Native Client and ADO.NET were enhanced to support large UDTs. Down-level clients (SQL Server 2005 and 2000 clients) convert a large UDT to VARBINARY(MAX) and IMAGE, respectively.

HIERARCHYID Data Type

The new HIERARCHYID data type in SQL Server 2008 is a system-supplied CLR UDT that can be useful for storing and manipulating hierarchies. This type is internally stored as a VARBINARY value that represents the position of the current node in the hierarchy (both in terms of parent-child position and position among siblings). You can perform manipulations on the type by using either Transact-SQL or client APIs to invoke methods exposed by the type. Let’s look at indexing strategies for the HIERARCHYID type, how to use the type to insert new nodes into a hierarchy, and how to query hierarchies.

Indexing Strategies

The following code demonstrates how to use the HIERARCHYID type, creating a table named Employees (in the tempdb database for test purposes) with a column called hid that holds the HIERARCHYID value:

USE tempdb;

IF OBJECT_ID('dbo.Employees', 'U') IS NOT NULL DROP TABLE dbo.Employees;

CREATE TABLE dbo.Employees

(

empid INT NOT NULL,

hid HIERARCHYID NOT NULL,

lvl AS hid.GetLevel() PERSISTED,

empname VARCHAR(25) NOT NULL,

salary MONEY NOT NULL,

CONSTRAINT PK_Employees PRIMARY KEY NONCLUSTERED(empid)

);

Notice how the code uses the GetLevel method of the hid column to define the persisted computed column lvl. This method returns the level of the current node in the hierarchy.

The HIERARCHYID value provides topological sorting, meaning that a child’s sort value is guaranteed to be higher than the parent’s sort value. This definition is transitive; that is, a node’s sort value is guaranteed to be higher than all of its ancestors. So if you create an index on the HIERARCHYID column, the index sorts the data in a depth-first manner—all members of the same subtree are close to each other in the leaf of the index. Such an index can efficiently satisfy requests such as returning all descendents of a node.

You can use another indexing strategy called breadth-first, in which you organize all nodes from the same level close to each other in the leaf of the index by having the column representing the level in the hierarchy first in the key list. Requests that can benefit from such an index include getting all nodes from the same level, getting direct subordinates of a node, and so on. The following code creates both types of indexes on the Employees table:

CREATE UNIQUE CLUSTERED INDEX idx_depth_first ON dbo.Employees(hid);

CREATE UNIQUE INDEX idx_breadth_first ON dbo.Employees(lvl, hid);

Inserting New Nodes

To insert a node into the hierarchy, you must first produce a new HIERARCHYID value that represents the correct position in the hierarchy. Use the HIERARCHYID::GetRoot() method to produce the value for the root node. You use the GetDescendant method to produce a value below a given parent. The GetDescendant method accepts two optional HIERARCHYID input values representing the two nodes between which you want to position the new node.

Note that the GetDescendant method does not guarantee that HIERARCHYID values are unique. To enforce uniqueness, you must define a primary key, a unique constraint, or a unique index on the column.

For example, the following code creates the usp_AddEmp stored procedure, which adds a new node to the hierarchy:

IF OBJECT_ID('dbo.usp_AddEmp', 'P') IS NOT NULL DROP PROC dbo.usp_AddEmp;

GO

CREATE PROC dbo.usp_AddEmp

@empid AS INT,

@mgrid AS INT = NULL,

@empname AS VARCHAR(25),

@salary AS MONEY

AS

DECLARE

@hid AS HIERARCHYID,

@mgr_hid AS HIERARCHYID,

@last_child_hid AS HIERARCHYID;

IF @mgrid IS NULL

SET @hid = HIERARCHYID::GetRoot();

ELSE

BEGIN

SET @mgr_hid = (SELECT hid FROM dbo.Employees WHERE empid = @mgrid);

SET @last_child_hid =

(SELECT MAX(hid) FROM dbo.Employees

WHERE hid.GetAncestor(1) = @mgr_hid);

SET @hid = @mgr_hid.GetDescendant(@last_child_hid, NULL);

END

INSERT INTO dbo.Employees(empid, hid, empname, salary)

VALUES(@empid, @hid, @empname, @salary);

GO

If the input employee is the first node in the hierarchy, the procedure uses the HIERARCHYID::GetRoot() method to assign the hid value for the root. Otherwise, the procedure queries the last child hid value of the new employee’s manager; it then invokes the GetDescendant method to produce a value that positions the new node after the last child of that manager. Run the following code to populate the table with a few employees:

EXEC dbo.usp_AddEmp @empid = 1, @mgrid = NULL, @empname = 'David' , @salary = $10000.00;

EXEC dbo.usp_AddEmp @empid = 2, @mgrid = 1, @empname = 'Eitan' , @salary = $7000.00;

EXEC dbo.usp_AddEmp @empid = 3, @mgrid = 1, @empname = 'Ina' , @salary = $7500.00;

EXEC dbo.usp_AddEmp @empid = 4, @mgrid = 2, @empname = 'Seraph' , @salary = $5000.00;

EXEC dbo.usp_AddEmp @empid = 5, @mgrid = 2, @empname = 'Jiru' , @salary = $5500.00;

EXEC dbo.usp_AddEmp @empid = 6, @mgrid = 2, @empname = 'Steve' , @salary = $4500.00;

EXEC dbo.usp_AddEmp @empid = 7, @mgrid = 3, @empname = 'Aaron' , @salary = $5000.00;

EXEC dbo.usp_AddEmp @empid = 8, @mgrid = 5, @empname = 'Lilach' , @salary = $3500.00;

EXEC dbo.usp_AddEmp @empid = 9, @mgrid = 7, @empname = 'Rita' , @salary = $3000.00;

EXEC dbo.usp_AddEmp @empid = 10, @mgrid = 5, @empname = 'Sean' , @salary = $3000.00;

EXEC dbo.usp_AddEmp @empid = 11, @mgrid = 7, @empname = 'Gabriel', @salary = $3000.00;

EXEC dbo.usp_AddEmp @empid = 12, @mgrid = 9, @empname = 'Emilia' , @salary = $2000.00;

EXEC dbo.usp_AddEmp @empid = 13, @mgrid = 9, @empname = 'Michael', @salary = $2000.00;

EXEC dbo.usp_AddEmp @empid = 14, @mgrid = 9, @empname = 'Didi' , @salary = $1500.00;

Querying the Hierarchy

If you query the hid value, you get its binary representation, which is not very meaningful. You can use the ToString method to get a more logical string representation of the value, which shows the path with a slash sign used as a separator between the levels. For example, run the following query to get both the binary and logical representations of the hid value:

SELECT hid, hid.ToString() AS path, lvl, empid, empname, salary

FROM dbo.Employees

ORDER BY hid;

Recall that HIERARCHYID values provide topological sorting and that the GetLevel method produces the level in the hierarchy. Using these, you can easily produce a graphical depiction of the hierarchy—simply sort the rows by hid, and produce indentation based on the lvl column as follows:

SELECT

REPLICATE(' | ', lvl) + empname AS emp,

hid.ToString() AS path

FROM dbo.Employees

ORDER BY hid;

To get all subordinates of an employee (subtree), you can use a method called IsDescendantOf. This method accepts a node’s HIERARCHYID value as input and returns 1 if the queried node is a descendant of the input node. For example, the following query returns all subordinates—direct and indirect—of employee 3:

SELECT C.empid, C.empname, C.lvl

FROM dbo.Employees AS P

JOIN dbo.Employees AS C

ON P.empid = 3

AND C.hid.IsDescendantOf(P.hid) = 1;

You can also use the IsDescendantOf method to return all managers of a given employee. For example, the following query returns all managers of employee 14:

SELECT P.empid, P.empname, P.lvl

FROM dbo.Employees AS P

JOIN dbo.Employees AS C

ON C.empid = 14

AND C.hid.IsDescendantOf(P.hid) = 1;

To get a whole level of subordinates of a certain employee, use the GetAncestor method. This method accepts a number (call it n) as input and returns the HIERARCHYID value of the ancestor of the queried node, n levels above. For example, the following query returns direct subordinates (1 level below) of employee 9:

SELECT C.empid, C.empname

FROM dbo.Employees AS P

JOIN dbo.Employees AS C

ON P.empid = 9

AND C.hid.GetAncestor(1) = P.hid;

Other Supported Methods

You can use several other methods to manipulate the HIERARCHYID data type, including Parse, GetReparentedValue, Read, and Write.

The HIERARCHYID::Parse method converts a canonical string representation of a hierarchical value to HIERARCHYID. This is the same as using CAST(<string value> AS HIERARCHYID).

The GetReparentedValue method helps you reparent nodes. It accepts two arguments—@old_root and @new_root—and returns a value in which the @old_root portion of the path is replaced with @new_root. For example, the node you query currently has the path /1/1/2/3/2/ (logical representation), @old_root is /1/1/, and @new_root is /2/1/4/. The GetReparentedValue method returns /2/1/4/2/3/2/. As mentioned earlier in regard to the GetDescendant method, the GetReparentedValue method also does not guarantee unique HIERARCHYID values. To enforce uniqueness, you must define a primary key, a unique constraint, or a unique index on the column.
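The paper does not include a reparenting example, but a rough sketch against the Employees table created earlier (moving employee 5 and its whole subtree under employee 3) could look like this:

DECLARE
  @old_root   AS HIERARCHYID,
  @new_parent AS HIERARCHYID,
  @last_child AS HIERARCHYID,
  @new_root   AS HIERARCHYID;

SELECT @old_root   = hid FROM dbo.Employees WHERE empid = 5;
SELECT @new_parent = hid FROM dbo.Employees WHERE empid = 3;

-- position the moved subtree after the new parent's current last child
SELECT @last_child = MAX(hid) FROM dbo.Employees WHERE hid.GetAncestor(1) = @new_parent;
SET @new_root = @new_parent.GetDescendant(@last_child, NULL);

UPDATE dbo.Employees
  SET hid = hid.GetReparentedValue(@old_root, @new_root)
WHERE hid.IsDescendantOf(@old_root) = 1;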

The Read and Write methods, available only in CLR code, are used to read from a BinaryReader and write to a BinaryWriter. In Transact-SQL, you simply use the CAST function to convert a binary value to a HIERARCHYID value and vice versa. Similarly, you can use the CAST function to convert a logical string representation of the path to HIERARCHYID and vice versa.
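For example (the path value here is arbitrary):

DECLARE @h AS HIERARCHYID;
SET @h = CAST('/1/2/3/' AS HIERARCHYID);

SELECT @h AS binary_form,
       @h.ToString() AS string_form,
       HIERARCHYID::Parse('/1/2/3/') AS parsed_form;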

Table Types and Table-Valued Parameters

SQL Server 2008 introduces table types and table-valued parameters that help abbreviate your code and improve its performance. Table types allow easy reuse of table definition by table variables, and table-valued parameters enable you to pass a parameter of a table type to stored procedures and functions.

Table Types

Table types enable you to save a table definition in the database and use it later to define table variables and parameters to stored procedures and functions. Because table types let you reuse a table definition, they ensure consistency and reduce chances for errors.

You use the CREATE TYPE statement to create a new table type. For example, the following code defines a table type called OrderIDs in the AdventureWorks database:

USE AdventureWorks;

GO

CREATE TYPE dbo.OrderIDs AS TABLE

( pos INT NOT NULL PRIMARY KEY,

orderid INT NOT NULL UNIQUE );

When declaring a table variable, simply specify the table type name as the data type of the variable. For example, the following code defines a table variable called @T of the OrderIDs type, inserts three rows into the table variable, and then queries it:

DECLARE @T AS dbo.OrderIDs;

INSERT INTO @T(pos, orderid) VALUES(1, 51480),(2, 51973),(3, 51819);

SELECT pos, orderid FROM @T ORDER BY pos;

To get metadata information about table types in the database, query the view sys.table_types.

Table-Valued Parameters

You can now use table types as the types for input parameters of stored procedures and functions. Currently, table-valued parameters are read only, and you must define them as such by using the READONLY keyword.

A common scenario where table-valued parameters are very useful is passing an “array” of keys to a stored procedure. Before SQL Server 2008, common ways to meet this need were based on dynamic SQL, a split function, XML, and other techniques. The approach using dynamic SQL involved the risk of SQL Injection and did not provide efficient reuse of execution plans. Using a split function was complicated, and using XML was complicated and nonrelational. (For details about this scenario, see "Arrays and Lists in SQL Server" by SQL Server MVP Erland Sommarskog at http://www.sommarskog.se/arrays-in-sql.html.)

In SQL Server 2008, you simply pass the stored procedure a table-valued parameter. There is no risk of SQL Injection, and there is opportunity for efficient reuse of execution plans. For example, the following procedure accepts a table-valued parameter of the OrderIDs type with a set of order IDs and returns all orders from the SalesOrderHeader table whose order IDs appear in the input table-valued parameter, sorted by the pos column:

CREATE PROC dbo.usp_getorders(@T AS dbo.OrderIDs READONLY)

AS

SELECT O.SalesOrderID, O.OrderDate, O.CustomerID, O.TotalDue

FROM Sales.SalesOrderHeader AS O

JOIN @T AS T

ON O.SalesOrderID = T.orderid

ORDER BY T.pos;

GO

The following code invokes the stored procedure:

DECLARE @MyOrderIDs AS dbo.OrderIDs;

INSERT INTO @MyOrderIDs(pos, orderid)

VALUES(1, 51480),(2, 51973),(3, 51819);

EXEC dbo.usp_getorders @T = @MyOrderIDs;

Note that when you do not provide a parameter value, a table-valued parameter defaults to an empty table. This is important to stress because this case might be confused with an actual empty table passed to the procedure. Also note that you cannot set variables and parameters of a table type to NULL.
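For example, because of that default, the following call omits the parameter entirely; an empty table is passed, so the procedure simply returns an empty result set:

EXEC dbo.usp_getorders;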

SQL Server 2008 also enhances client APIs to support defining and populating table-valued parameters. Table-valued parameters are treated internally like table variables. Their scope is the batch (procedure, function). They have several advantages in some cases over temporary tables and other alternative methods:

· They are strongly typed.

· SQL Server does not maintain distribution statistics (histograms) for them; therefore, they do not cause recompilations.

· They are not affected by a transaction rollback.

· They provide a simple programming model.

MERGE Statement

The new MERGE statement is a standard statement that combines INSERT, UPDATE, and DELETE actions as a single atomic operation based on conditional logic. Besides being performed as an atomic operation, the MERGE statement is more efficient than applying those actions individually.

The statement refers to two tables: a target table specified in the MERGE INTO clause and a source table specified in the USING clause. The target table is the target for the modification, and the source table data can be used to modify the target.

The semantics (as well as optimization) of a MERGE statement are similar to those of an outer join. You specify a predicate in the ON clause that defines which rows in the source have matches in the target, which rows do not, and which rows in the target do not have a match in the source. You have a clause for each case that defines which action to take—WHEN MATCHED THEN, WHEN NOT MATCHED [BY TARGET] THEN, and WHEN NOT MATCHED BY SOURCE THEN. Note that you do not have to specify all three clauses, but only the ones you need.

As with other modification statements, the MERGE statement also supports the OUTPUT clause, which enables you to return attributes from the modified rows. As part of the OUTPUT clause, you can invoke the $action function, which returns the action that modified the row ('INSERT', 'UPDATE', 'DELETE').

To demonstrate the MERGE statement, the following code creates the tables Customers and CustomersStage in tempdb for test purposes and populates them with sample data:

USE tempdb;

IF OBJECT_ID('dbo.Customers', 'U') IS NOT NULL DROP TABLE dbo.Customers;

CREATE TABLE dbo.Customers

(

custid INT NOT NULL,

companyname VARCHAR(25) NOT NULL,

phone VARCHAR(20) NOT NULL,

address VARCHAR(50) NOT NULL,

CONSTRAINT PK_Customers PRIMARY KEY(custid)

);

INSERT INTO dbo.Customers(custid, companyname, phone, address)

VALUES

(1, 'cust 1', '(111) 111-1111', 'address 1'),

(2, 'cust 2', '(222) 222-2222', 'address 2'),

(3, 'cust 3', '(333) 333-3333', 'address 3'),

(4, 'cust 4', '(444) 444-4444', 'address 4'),

(5, 'cust 5', '(555) 555-5555', 'address 5');

IF OBJECT_ID('dbo.CustomersStage', 'U') IS NOT NULL
DROP TABLE dbo.CustomersStage;

CREATE TABLE dbo.CustomersStage

(

custid INT NOT NULL,

companyname VARCHAR(25) NOT NULL,

phone VARCHAR(20) NOT NULL,

address VARCHAR(50) NOT NULL,

CONSTRAINT PK_CustomersStage PRIMARY KEY(custid)

);

INSERT INTO dbo.CustomersStage(custid, companyname, phone, address)

VALUES

(2, 'AAAAA', '(222) 222-2222', 'address 2'),

(3, 'cust 3', '(333) 333-3333', 'address 3'),

(5, 'BBBBB', 'CCCCC', 'DDDDD'),

(6, 'cust 6 (new)', '(666) 666-6666', 'address 6'),

(7, 'cust 7 (new)', '(777) 777-7777', 'address 7');

The following MERGE statement defines the Customers table as the target for the modification and the CustomersStage table as the source. The MERGE condition matches the custid attribute in the source with the custid attribute in the target. When a match is found in the target, the target customer’s attributes are overwritten with the source customer attributes. When a match is not found in the target, a new row is inserted into the target, using the source customer attributes. When a match is not found in the source, the target customer row is deleted:

MERGE INTO dbo.Customers AS TGT

USING dbo.CustomersStage AS SRC

ON TGT.custid = SRC.custid

WHEN MATCHED THEN

UPDATE SET

TGT.companyname = SRC.companyname,

TGT.phone = SRC.phone,

TGT.address = SRC.address

WHEN NOT MATCHED THEN

INSERT (custid, companyname, phone, address)

VALUES (SRC.custid, SRC.companyname, SRC.phone, SRC.address)

WHEN NOT MATCHED BY SOURCE THEN

DELETE

OUTPUT

$action, deleted.custid AS del_custid, inserted.custid AS ins_custid;

This MERGE statement updates three rows (customers 2, 3, and 5), inserts two rows (customers 6 and 7), and deletes two rows (customers 1 and 4).

The MERGE statement lets you specify an additional predicate in all WHEN clauses (add an AND operator followed by the predicate). For the action to take place, besides the original ON predicate, the additional predicate must also hold true.
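For instance, here is a sketch building on the tables above (not part of the original example) that updates a matched customer only when at least one attribute actually differs:

MERGE INTO dbo.Customers AS TGT
USING dbo.CustomersStage AS SRC
  ON TGT.custid = SRC.custid
WHEN MATCHED AND (   TGT.companyname <> SRC.companyname
                  OR TGT.phone       <> SRC.phone
                  OR TGT.address     <> SRC.address) THEN
  UPDATE SET
    TGT.companyname = SRC.companyname,
    TGT.phone       = SRC.phone,
    TGT.address     = SRC.address;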

Grouping Sets

SQL Server 2008 introduces several extensions to the GROUP BY clause that enable you to define multiple groupings in the same query. These extensions are: the GROUPING SETS, CUBE, and ROLLUP subclauses of the GROUP BY clause and the GROUPING_ID function. The new extensions are standard and should not be confused with the older, nonstandard CUBE and ROLLUP options.

GROUPING SETS, CUBE, and ROLLUP Subclauses

To demonstrate the new extensions, let’s query the Orders table that the following code creates and populates:

USE tempdb;

IF OBJECT_ID('dbo.Orders', 'U') IS NOT NULL DROP TABLE dbo.Orders;

CREATE TABLE dbo.Orders

(

orderid INT NOT NULL,

orderdate DATETIME NOT NULL,

empid INT NOT NULL,

custid VARCHAR(5) NOT NULL,

qty INT NOT NULL,

CONSTRAINT PK_Orders PRIMARY KEY(orderid)

);

INSERT INTO dbo.Orders (orderid, orderdate, empid, custid, qty)

VALUES

(30001, '20060802', 3, 'A', 10), (10001, '20061224', 1, 'A', 12),

(10005, '20061224', 1, 'B', 20), (40001, '20070109', 4, 'A', 40),

(10006, '20070118', 1, 'C', 14), (20001, '20070212', 2, 'B', 12),

(40005, '20080212', 4, 'A', 10), (20002, '20080216', 2, 'C', 20),

(30003, '20080418', 3, 'B', 15), (30004, '20060418', 3, 'C', 22),

(30007, '20060907', 3, 'D', 30);

Without the extensions, a single query normally defines one “grouping set” (a set of attributes to group by) in the GROUP BY clause. If you want to calculate aggregates for multiple grouping sets, you usually need multiple queries. If you want to unify the result sets of multiple GROUP BY queries, each with a different grouping set, you must use the UNION ALL set operation between the queries.

You might need to calculate and store aggregates for various grouping sets in a table. By preprocessing and materializing the aggregates, you can support applications that require fast response time for aggregate requests. However, the aforementioned approach, in which you have a separate query for each grouping set, is very inefficient. This approach requires a separate scan of the data for each grouping set and expensive calculation of aggregates.

With the new GROUPING SETS subclause, you simply list all grouping sets that you need. Logically, you get the same result set as you would by unifying the result sets of multiple queries. However, with the GROUPING SETS subclause, which requires much less code, SQL Server optimizes data access and the calculation of aggregates. SQL Server will not necessarily need to scan data once for each grouping set; plus, in some cases it calculates higher-level aggregates based on lower-level aggregates instead of re-aggregating base data.

For example, the following query uses the new GROUPING SETS subclause to return aggregates for four grouping sets:

SELECT custid, empid, YEAR(orderdate) AS orderyear, SUM(qty) AS qty

FROM dbo.Orders

GROUP BY GROUPING SETS (

( custid, empid, YEAR(orderdate) ),

( custid, YEAR(orderdate) ),

( empid, YEAR(orderdate) ),

() );

The grouping sets defined in this query are (custid, empid, YEAR(orderdate)), (custid, YEAR(orderdate)), (empid, YEAR(orderdate)), and (). The last is an empty grouping set representing ALL. It is similar to an aggregate query with no GROUP BY clause, in which you treat the whole table as a single group.

The two new CUBE and ROLLUP subclauses should be considered abbreviations of the GROUPING SETS subclause. The CUBE subclause produces the power set of the set of elements listed in its parentheses. In other words, it produces all possible grouping sets that can be formed out of the elements listed in parentheses, including the empty grouping set. For example, the following use of CUBE:

CUBE( a, b, c )

is logically equivalent to:

GROUPING SETS((a),(b),(c),(a, b),(a, c),(b, c),(a, b, c),())

For n elements, CUBE produces 2^n grouping sets.

Out of the elements listed in its parentheses, the ROLLUP subclause produces only the grouping sets that have business value, assuming a hierarchy between the elements. For example, the following use of ROLLUP:

ROLLUP( country, region, city )

is logically equivalent to:

GROUPING SETS((country, region, city),(country, region),(country),())

Notice that cases that have no business value, assuming a hierarchy between the elements—such as (city)—were not produced. There might be multiple cities with the same name in the world, and even within the same country, so there is no business value in aggregating them.
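As a concrete illustration against the Orders table above (this query is not one of the paper's samples), the following returns quantities by customer, year, and month together with the ROLLUP-style subtotals and grand total:

SELECT custid, YEAR(orderdate) AS orderyear, MONTH(orderdate) AS ordermonth, SUM(qty) AS qty
FROM dbo.Orders
GROUP BY ROLLUP(custid, YEAR(orderdate), MONTH(orderdate));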

Grouping Sets Algebra

You are not restricted to only one subclause in the GROUP BY clause; you can specify multiple subclauses separated by commas. The comma serves as a product operator, meaning that you get a Cartesian product of the grouping sets represented by each subclause. For example, the following code represents a Cartesian product between two GROUPING SETS subclauses:

GROUPING SETS ( (a, b), (c, d) ), GROUPING SETS ( (w, x), (y, z) )

This code is logically equivalent to:

GROUPING SETS ( (a, b, w, x), (a, b, y, z), (c, d, w, x), (c, d, y, z) )

Bearing in mind that the CUBE and ROLLUP options are simply abbreviations of the GROUPING SETS subclause, you can also use CUBE and ROLLUP subclauses as part of a Cartesian product.

GROUPING_ID Function

The GROUPING_ID function lets you identify the grouping set that each result row belongs to. As input, you provide all attributes that participate in any grouping set. The function produces an integer result that is a bitmap, in which each bit represents a different attribute. This way the function produces a unique integer for each grouping set.

The following query shows an example of using the GROUPING_ID function to identify the grouping set:

SELECT

GROUPING_ID(

custid, empid,

YEAR(orderdate), MONTH(orderdate), DAY(orderdate) ) AS grp_id,

custid, empid,

YEAR(orderdate) AS orderyear,

MONTH(orderdate) AS ordermonth,

DAY(orderdate) AS orderday,

SUM(qty) AS qty

FROM dbo.Orders

GROUP BY

CUBE(custid, empid),

ROLLUP(YEAR(orderdate), MONTH(orderdate), DAY(orderdate));

Notice that in the output, shown here in abbreviated form, each grouping set is represented by a unique integer:

grp_id custid empid orderyear ordermonth orderday qty

------- ------ ----------- ----------- ----------- ----------- -----------

0 C 3 2006 4 18 22

16 NULL 3 2006 4 18 22

24 NULL NULL 2006 4 18 22

25 NULL NULL 2006 4 NULL 22

0 A 3 2006 8 2 10

16 NULL 3 2006 8 2 10

24 NULL NULL 2006 8 2 10

25 NULL NULL 2006 8 NULL 10

0 D 3 2006 9 7 30

16 NULL 3 2006 9 7 30

...

For example, the integer 25 represents the grouping set (orderyear, ordermonth). The bits representing the elements that are part of the grouping set are turned off (ordermonth – 2 and orderyear – 4), and the bits representing the elements that are not part of the grouping set are turned on (orderday – 1, empid – 8, and custid – 16). The integer 25 is achieved by adding the values represented by the bits that are turned on: 1 + 8 + 16 = 25. This feature is especially useful when you need to materialize the aggregates and then query only specific grouping sets. You can cluster the table by the grp_id attribute, which would allow SQL Server to efficiently satisfy a request for a specific grouping set.
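A brief sketch of that idea (the dbo.OrderAggs table and index names below are made up):

IF OBJECT_ID('dbo.OrderAggs', 'U') IS NOT NULL DROP TABLE dbo.OrderAggs;

SELECT
  GROUPING_ID(custid, empid,
              YEAR(orderdate), MONTH(orderdate), DAY(orderdate)) AS grp_id,
  custid, empid,
  YEAR(orderdate)  AS orderyear,
  MONTH(orderdate) AS ordermonth,
  DAY(orderdate)   AS orderday,
  SUM(qty)         AS qty
INTO dbo.OrderAggs
FROM dbo.Orders
GROUP BY
  CUBE(custid, empid),
  ROLLUP(YEAR(orderdate), MONTH(orderdate), DAY(orderdate));

CREATE CLUSTERED INDEX idx_grp_id ON dbo.OrderAggs(grp_id);

-- retrieve only the (orderyear, ordermonth) grouping set identified above as 25
SELECT orderyear, ordermonth, qty
FROM dbo.OrderAggs
WHERE grp_id = 25;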

DDL Trigger Enhancements

In SQL Server 2008, the type of events on which you can now create DDL triggers is enhanced to include stored procedures that perform DDL-like operations. This gives you more complete coverage of DDL events that you can capture with triggers. The XML schema for events is installed as part of the database engine installation in the directory C:\Program Files\Microsoft SQL Server\100\Tools\Binn\schemas\sqlserver\2006\11\events\events.xsd and can also be found at http://schemas.microsoft.com/sqlserver.

Many stored procedures perform DDL-like operations. Before SQL Server 2008, you could not capture their invocation with a trigger. Now you can capture many new events that fire as a result of calls to such procedures. You can find the full list of trappable events in SQL Server Books Online. For example, the stored procedure sp_rename now fires a trigger created on the new RENAME event. To demonstrate this type of trigger, the following code creates a database called testdb and, within it, a trigger on the RENAME event that prints the source and target entity details for test purposes:

USE master;

GO

IF DB_ID('testdb') IS NOT NULL DROP DATABASE testdb;

CREATE DATABASE testdb;

GO

USE testdb;

GO

CREATE TRIGGER trg_testdb_rename ON DATABASE FOR RENAME

AS

DECLARE

@SchemaName AS SYSNAME =

EVENTDATA().value('(/EVENT_INSTANCE/SchemaName)[1]', 'sysname'),

@TargetObjectName AS SYSNAME =

EVENTDATA().value('(/EVENT_INSTANCE/TargetObjectName)[1]', 'sysname'),

@ObjectName AS SYSNAME =

EVENTDATA().value('(/EVENT_INSTANCE/ObjectName)[1]', 'sysname'),

@NewObjectName AS SYSNAME =

EVENTDATA().value('(/EVENT_INSTANCE/NewObjectName)[1]', 'sysname');

DECLARE

@msg AS NVARCHAR(1000) =

N'RENAME event occurred.

SchemaName: ' + @SchemaName + N'

TargetObjectName: ' + @TargetObjectName + N'

ObjectName: ' + @ObjectName + N'

NewObjectName: ' + @NewObjectName;

PRINT @msg;

GO

To test the trigger, the following code creates a table called dbo.T1 with a column called col1 and runs the sp_rename procedure to rename the column to col2:

CREATE TABLE dbo.T1(col1 INT);

EXEC sp_rename 'dbo.T1.col1', 'col2', 'COLUMN';

The trigger on the RENAME event fires and prints the following message:

RENAME event occurred.

SchemaName: dbo

TargetObjectName: T1

ObjectName: col1

NewObjectName: col2

Sparse Columns

Sparse columns are columns that are optimized for the storage of NULLs. To define a column as sparse, specify the SPARSE attribute as part of the column definition. Sparse columns consume no storage for NULLs, even with fixed-length types; however, when a column is marked as sparse, storage of non-NULL values becomes more expensive than usual. Therefore, you should define a column as sparse only when it will store a large percentage of NULLs. SQL Server Books Online provides recommendations for the percentage of NULLs that justify making a column sparse for each data type.

Querying and manipulation of sparse columns is the same as for regular columns, with one exception described later in this paper. For example, the following code creates a table named T1 in tempdb (for test purposes), marks three of its columns with the SPARSE attribute, inserts a couple of rows, and queries the table:

USE tempdb;

IF OBJECT_ID('dbo.T1', 'U') IS NOT NULL DROP TABLE dbo.T1;

CREATE TABLE dbo.T1

(

keycol INT NOT NULL PRIMARY KEY,

col1 VARCHAR(20) NOT NULL,

col2 INT SPARSE NULL,

col3 CHAR(10) SPARSE NULL,

col4 NUMERIC(12, 2) SPARSE NULL

);

INSERT INTO dbo.T1(keycol, col1, col2) VALUES(1, 'a', 10);

INSERT INTO dbo.T1(keycol, col1, col4) VALUES(2, 'b', 20.00);

SELECT keycol, col1, col2, col3, col4

FROM dbo.T1;

There are several restrictions on using sparse columns not covered in this paper; see SQL Server Books Online for complete information.

SQL Server 2008 lets you define a column set that combines all sparse columns of a table into a single XML column. You might want to consider this option when you have a large number of sparse columns in a table (more than 1,024) and operating on them individually might be cumbersome.

To define a column set, specify the following as part of the CREATE TABLE statement:

XML column_set FOR ALL_SPARSE_COLUMNS

For example, the following code recreates the table T1 with a column set named cs:

IF OBJECT_ID('dbo.T1', 'U') IS NOT NULL DROP TABLE dbo.T1;

CREATE TABLE dbo.T1

(

keycol INT NOT NULL PRIMARY KEY,

col1 VARCHAR(20) NOT NULL,

col2 INT SPARSE NULL,

col3 CHAR(10) SPARSE NULL,

col4 NUMERIC(12, 2) SPARSE NULL,

cs XML column_set FOR ALL_SPARSE_COLUMNS

);

You can use the same code that was used earlier to insert rows into the table, and then query it:

INSERT INTO dbo.T1(keycol, col1, col2) VALUES(1, 'a', 10);

INSERT INTO dbo.T1(keycol, col1, col4) VALUES(2, 'b', 20.00);

SELECT keycol, col1, col2, col3, col4

FROM dbo.T1;

But you can also operate on the column set by using XML operations instead of relational operations. For example, the following code inserts a row into the table by using the column set:

INSERT dbo.T1(keycol, col1, cs)

VALUES(3, 'c', '<col3>CCCCCCCCCC</col3><col4>30.00</col4>');

NULL is assumed for a column that is omitted from the XML value, such as col2 in this case.

Note that if you have a column set defined in the table, SELECT * does not return the same result as a SELECT statement with an explicit list of all columns. SELECT * returns all sparse columns as a single XML value in the column set. To demonstrate this, run the following code:

SELECT * FROM dbo.T1;

This code returns the following output:

keycol col1 cs

----------- ---------- ------------------------------------------

1 a <col2>10</col2>

2 b <col4>20.00</col4>

3 c <col3>CCCCCCCCCC</col3><col4>30.00</col4>

If you explicitly list the columns in the SELECT clause, all result columns are returned as relational ones.

Another new feature that can be used in conjunction with sparse columns is filtered indexes. This feature is explained in the next section.

Filtered Indexes and Statistics

SQL Server 2008 introduces filtered indexes and statistics. You can now create a nonclustered index based on a predicate, and only the subset of rows for which the predicate holds true are stored in the index B-Tree. Similarly, you can manually create statistics based on a predicate. The optimizer has the logic to figure out when such filtered indexes and statistics are useful.

Well-designed filtered indexes can improve query performance and plan quality because they are smaller than nonfiltered indexes. Also, filtered statistics—whether created automatically for a filtered index or manually—are more accurate than nonfiltered statistics because they need to cover only a subset of rows from the table.

You can also reduce index maintenance cost by using filtered indexes because there is less data to maintain. This includes modifications against the index, index rebuilds, and the cost of updating statistics. Filtered indexes also obviously reduce storage costs.
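The examples that follow focus on filtered indexes; a filtered statistics object is created the same way, by adding a WHERE clause to CREATE STATISTICS. A quick sketch (the statistics name is arbitrary):

USE AdventureWorks;

CREATE STATISTICS stats_freight_5000_or_more
ON Sales.SalesOrderHeader(Freight)
WHERE Freight >= 5000.00;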

Let’s look at a few examples that demonstrate filtered indexes. The following code creates an index on the CurrencyRateID column in the Sales.SalesOrderHeader table, with a filter that excludes NULLs:

USE AdventureWorks;

GO

CREATE NONCLUSTERED INDEX idx_currate_notnull

ON Sales.SalesOrderHeader(CurrencyRateID)

WHERE CurrencyRateID IS NOT NULL;

Considering query filters, besides the IS NULL predicate that explicitly looks for NULLs, all other predicates exclude NULLs, so the optimizer knows that there is the potential to use the index. For example, the plan for the following query shows that the index is used:

SELECT *

FROM Sales.SalesOrderHeader

WHERE CurrencyRateID = 4;

The CurrencyRateID column has a large percentage of NULLs; therefore, this index consumes substantially less storage than a nonfiltered one on the same column. You can also create similar indexes on sparse columns.

The following code creates a nonclustered index on the Freight column, filtering rows where the Freight is greater than or equal to 5000.00:

CREATE NONCLUSTERED INDEX idx_freight_5000_or_more

ON Sales.SalesOrderHeader(Freight)

WHERE Freight >= 5000.00;

The optimizer considers using an index when a subinterval of the index filtered interval is requested in the query filter:

SELECT *

FROM Sales.SalesOrderHeader

WHERE Freight BETWEEN 5500.00 AND 6000.00;

Filtered indexes can also be defined as UNIQUE and have an INCLUDE clause as with regular nonclustered indexes.
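For example, here is a sketch against a made-up dbo.Products table (created in tempdb for test purposes) that enforces uniqueness only for rows that actually have a serial number and covers the name column for queries that filter on it:

USE tempdb;

CREATE TABLE dbo.Products
(
  productid    INT         NOT NULL PRIMARY KEY,
  name         VARCHAR(50) NOT NULL,
  serialnumber VARCHAR(20) NULL
);

CREATE UNIQUE NONCLUSTERED INDEX idx_serial_notnull
ON dbo.Products(serialnumber)
INCLUDE(name)
WHERE serialnumber IS NOT NULL;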

SQL/CLR Enhancements

Common language runtime (CLR) support is enhanced in several ways in SQL Server 2008. Enhancements to CLR UDTs were described earlier. This section describes enhancements to user-defined aggregates (UDAs) and table-valued functions (TVFs).

Enhancements to User-Defined Aggregates

The new support for large UDTs was covered earlier in this paper. Similarly, SQL Server 2008 introduces support for large UDAs—the maximum size in bytes of a persisted value can now exceed 8,000 bytes and reach up to 2 GB. To allow a UDA to exceed 8,000 bytes, specify -1 in the MaxByteSize attribute; otherwise, you must specify a value that is smaller than or equal to 8,000, in which case the size in bytes cannot exceed the specified value.

UDAs are enhanced in another way as well—they now support multiple inputs. An example of a UDA that can benefit from both enhancements is one that performs string concatenation. The UDA can accept two input parameters: the column holding the string to concatenate and the character to use as a separator. The UDA can return an output larger than 8,000 bytes.

Enhancements to Table-Valued Functions

CLR TVFs now support a new ORDER clause as part of the CREATE FUNCTION DDL statement. You can use this clause to specify column names in the output table when you know that rows will always be returned in that order. This can help the optimizer when you query the table function and rely on those columns for ordering purposes (such as when they are used in ORDER BY, GROUP BY, or DISTINCT, or in an INSERT whose target table has an index in that order).

For example, the following C# code defines a function called fn_split that accepts a string holding a separated list of values plus a separator character, and splits the input string into its individual elements. The function returns a table result with two columns, pos and element, containing a row for each element. The pos column represents the position of the element in the list, and the element column returns the element itself:

using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Collections;
using System.Collections.Generic;

public partial class TVF
{
    // Struct used to represent one element and its position in the list
    struct row_item
    {
        public string item;
        public int pos;
    }

    // Split a separated list of values and return a table
    [SqlFunction(FillRowMethodName = "ArrSplitFillRow",
        DataAccess = DataAccessKind.None,
        TableDefinition = "pos INT, element NVARCHAR(MAX)")]
    public static IEnumerable fn_split(SqlString inpStr,
        SqlString charSeparator)
    {
        string locStr;
        string[] splitStr;
        char[] locSeparator = new char[1];
        locSeparator[0] = charSeparator.Value[0];

        if (inpStr.IsNull)
            locStr = "";
        else
            locStr = inpStr.Value;

        // Split the input, discarding empty entries
        splitStr = locStr.Split(locSeparator,
            StringSplitOptions.RemoveEmptyEntries);

        List<row_item> SplitString = new List<row_item>();
        int i = 1;
        foreach (string s in splitStr)
        {
            row_item r = new row_item();
            r.item = s;
            r.pos = i;
            SplitString.Add(r);
            ++i;
        }
        return SplitString;
    }

    // Fill-row method: maps each row_item to the pos and element columns
    public static void ArrSplitFillRow(
        Object obj, out int pos, out string item)
    {
        pos = ((row_item)obj).pos;
        item = ((row_item)obj).item;
    }
}

The function always returns the rows in pos order; however, you cannot rely on this order when querying the function unless you specify an ORDER BY clause in the outer query.

Assuming that the path for the assembly’s .dll file is C:\TVF\TVF\bin\Debug\TVF.dll, the following code creates the assembly in tempdb for test purposes:

USE tempdb;
CREATE ASSEMBLY TVF FROM 'C:\TVF\TVF\bin\Debug\TVF.dll';

The following code registers two functions based on the CLR fn_split function: fn_split_no_order does not have the ORDER clause, and fn_split_order_by_pos specifies the ORDER clause with pos as the ordering column:

CREATE FUNCTION dbo.fn_split_no_order
  (@string AS NVARCHAR(MAX), @separator AS NCHAR(1))
RETURNS TABLE(pos INT, element NVARCHAR(4000))
EXTERNAL NAME TVF.TVF.fn_split;
GO
CREATE FUNCTION dbo.fn_split_order_by_pos
  (@string AS NVARCHAR(MAX), @separator AS NCHAR(1))
RETURNS TABLE(pos INT, element NVARCHAR(4000))
ORDER (pos)
EXTERNAL NAME TVF.TVF.fn_split;
GO

Now consider the following queries:

SELECT *
FROM dbo.fn_split_no_order(
  N'a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z',
  N',') AS T
ORDER BY pos;

SELECT *
FROM dbo.fn_split_order_by_pos(
  N'a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z',
  N',') AS T
ORDER BY pos;

The first queries the table function that was registered without the ORDER clause, and the second queries the table function that was registered with ORDER(pos). Both request the data sorted by pos. If you examine the execution plans for both queries, you see that the plan for the first query involves sorting, while the plan for the second query does not. Also, the cost estimate for the first plan is about 10 times higher than the cost estimate for the second plan.

Object Dependencies

SQL Server 2008 delivers several objects that provide reliable discovery of object dependencies, replacing the unreliable older sys.sql_dependencies view and the sp_depends stored procedure. The new objects provide information about dependencies that appear in static code, including both schema-bound and non-schema-bound objects as well as cross-database and even cross-server dependencies. The new objects do not cover dependencies that appear in dynamic SQL code or in CLR code.

To see how to query the new object dependency information, first run the following code to create several objects in the tempdb database for test purposes:

USE tempdb;
IF OBJECT_ID('dbo.Proc1', 'P') IS NOT NULL DROP PROC dbo.Proc1;
IF OBJECT_ID('dbo.V1', 'V') IS NOT NULL DROP VIEW dbo.V1;
IF OBJECT_ID('dbo.V2', 'V') IS NOT NULL DROP VIEW dbo.V2;
IF OBJECT_ID('dbo.T1', 'U') IS NOT NULL DROP TABLE dbo.T1;
IF OBJECT_ID('dbo.T2', 'U') IS NOT NULL DROP TABLE dbo.T2;
GO
CREATE PROC dbo.Proc1
AS
SELECT * FROM dbo.T1;
EXEC('SELECT * FROM dbo.T2');
GO
CREATE TABLE dbo.T1(col1 INT);
CREATE TABLE dbo.T2(col2 INT);
GO
CREATE VIEW dbo.V1
AS
SELECT col1 FROM dbo.T1;
GO
CREATE VIEW dbo.V2
AS
SELECT col1 FROM dbo.T1;
GO

SQL Server 2008 introduces three new objects that provide object dependency information: the sys.sql_expression_dependencies catalog view, and the sys.dm_sql_referenced_entities and sys.dm_sql_referencing_entities dynamic management functions (DMFs).

The sys.sql_expression_dependencies view provides object dependencies by name. It replaces the older sys.sql_dependencies view. The following query against sys.sql_expression_dependencies shows all dependencies in the current database:

SELECT
  OBJECT_SCHEMA_NAME(referencing_id) AS srcschema,
  OBJECT_NAME(referencing_id) AS srcname,
  referencing_minor_id AS srcminorid,
  referenced_schema_name AS tgtschema,
  referenced_entity_name AS tgtname,
  referenced_minor_id AS tgtminorid
FROM sys.sql_expression_dependencies;

This query produces the following output:

srcschema srcname srcminorid tgtschema tgtname tgtminorid
--------- ------- ---------- --------- ------- ----------
dbo       Proc1   0          dbo       T1      0
dbo       V1      0          dbo       T1      0
dbo       V2      0          dbo       T1      0

Notice that the query identified all dependencies in static code but not dependencies in dynamic code (the reference to dbo.T2 from dbo.Proc1).

The sys.dm_sql_referenced_entities DMF provides all entities that the input entity references—in other words, all entities that the input entity depends on. For example, the following code returns all entities that dbo.Proc1 depends on:

SELECT
  referenced_schema_name AS objschema,
  referenced_entity_name AS objname,
  referenced_minor_name AS minorname,
  referenced_class_desc AS class
FROM sys.dm_sql_referenced_entities('dbo.Proc1', 'OBJECT');

This code returns the following output:

objschema objname minorname class
--------- ------- --------- ----------------
dbo       T1      NULL      OBJECT_OR_COLUMN
dbo       T1      col1      OBJECT_OR_COLUMN

The output shows that dbo.Proc1 depends on the table dbo.T1 and the column dbo.T1.col1. Again, dependencies that appear in dynamic code are not identified.

The sys.dm_sql_referencing_entities DMF provides all entities that reference the input entity—in other words, all entities that depend on the input entity. For example, the following code returns all entities that depend on dbo.T1:

SELECT
  referencing_schema_name AS objschema,
  referencing_entity_name AS objname,
  referencing_class_desc AS class
FROM sys.dm_sql_referencing_entities('dbo.T1', 'OBJECT');

This code returns the following output:

objschema objname class
--------- ------- ----------------
dbo       Proc1   OBJECT_OR_COLUMN
dbo       V1      OBJECT_OR_COLUMN
dbo       V2      OBJECT_OR_COLUMN

The output shows that dbo.Proc1, dbo.V1, and dbo.V2 depend on dbo.T1.

Change Data Capture

Change data capture is a new mechanism in SQL Server 2008 that enables you to easily track data changes in a table. The changes are read by a capture process from the transaction log and recorded in change tables. Those change tables mirror the columns of the source table and also contain metadata information that can be used to deduce the changes that took place. Those changes can be consumed in a convenient relational format through TVFs.

An extract, transform, and load (ETL) process in SQL Server Integration Services that applies incremental updates to a data warehouse is just one example of an application that can benefit from change data capture.

Here, I’ll demonstrate a simple process of capturing changes against an Employees table in a database called testdb.

Before you can enable tables for change data capture, you must first enable the database for change data capture by using the stored procedure sys.sp_cdc_enable_db. This stored procedure creates in the database several system objects related to change data capture, including the cdc schema, the cdc user, tables, jobs, stored procedures, and functions. To check whether the database is enabled for change data capture, query the is_cdc_enabled column in sys.databases. The following code creates a database called testdb and enables change data capture in the database:

USE master;
IF DB_ID('testdb') IS NOT NULL DROP DATABASE testdb;
CREATE DATABASE testdb;
GO
USE testdb;
EXECUTE sys.sp_cdc_enable_db;
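A quick check against sys.databases confirms that the database is now enabled:

SELECT name, is_cdc_enabled
FROM sys.databases
WHERE name = N'testdb';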

To disable change data capture on the database, use the stored procedure sys.sp_cdc_disable_db.

The capture process starts reading changes from the log and recording them in the change tables as soon as the first table in the database is enabled for change data capture. To enable change data capture for a table, you use the sys.sp_cdc_enable_table stored procedure. Note that SQL Server Agent must be running for the capture process to work. The following code creates a table named Employees in the testdb database and inserts one employee row into the table:

CREATE TABLE dbo.Employees
(
  empid INT NOT NULL,
  name VARCHAR(30) NOT NULL,
  salary MONEY NOT NULL
);

INSERT INTO dbo.Employees(empid, name, salary) VALUES(1, 'Emp1', 1000.00);

The following code enables the table for change data capture:

EXECUTE sys.sp_cdc_enable_table
    @source_schema = N'dbo'
  , @source_name   = N'Employees'
  , @role_name     = N'cdc_Admin';

Because this is the first table in the database that is enabled for change data capture, this code also causes the capture process to start (two jobs will start: cdc.testdb_capture and cdc.testdb_cleanup). The @role_name argument enables you to assign a database role that will be granted access to the change data. If the specified role does not exist, SQL Server creates it. By default, changes are captured for all table columns. To capture changes only in a subset of columns, you can specify the list of columns in the @captured_column_list argument. To get metadata information about the columns included in the capture instance, you can use the sys.sp_cdc_get_captured_columns procedure.
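As a sketch (the capture instance name below is hypothetical; a table can have up to two capture instances), enabling capture for only a subset of columns and then inspecting the captured columns might look like this:

-- Capture only empid and salary in a second, explicitly named capture instance
EXECUTE sys.sp_cdc_enable_table
    @source_schema        = N'dbo'
  , @source_name          = N'Employees'
  , @role_name            = N'cdc_Admin'
  , @capture_instance     = N'dbo_Employees_subset'
  , @captured_column_list = N'empid, salary';

-- List the columns included in that capture instance
EXECUTE sys.sp_cdc_get_captured_columns
  @capture_instance = N'dbo_Employees_subset';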

If you later want to disable change data capture for a table, use the sys.sp_cdc_disable_table stored procedure. And to check whether a table is enabled for change data capture, query the is_tracked_by_cdc column in the sys.tables view.
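For example:

SELECT name, is_tracked_by_cdc
FROM sys.tables
WHERE name = N'Employees';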

To get information about the change data capture configuration for each enabled table, use the stored procedure sys.sp_cdc_help_change_data_capture as follows:

EXECUTE sys.sp_cdc_help_change_data_capture;

The changes are not consumed directly from the change tables but rather through table-valued functions, so you can easily identify and consume only the delta of changes that has not yet been consumed. Changes are requested for a specified range of log sequence numbers (LSNs). SQL Server 2008 also provides the mapping functions sys.fn_cdc_map_time_to_lsn and sys.fn_cdc_map_lsn_to_time, which help you convert a date-time range to a range of LSNs and vice versa.

Separate functions provide all changes within an interval (cdc.fn_cdc_get_all_changes_<capture_instance>) and net changes that took place against distinct rows (cdc.fn_cdc_get_net_changes_<capture_instance>); the latter is available only if that option was enabled through the @supports_net_changes argument of sys.sp_cdc_enable_table.
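The following sketch (assuming the default capture instance name dbo_Employees created earlier, and that some changes have already been captured in the chosen window) maps a date-time range to LSN boundaries and then requests all changes in that range:

-- Map a date-time range to LSN boundaries
DECLARE @begin_time DATETIME = DATEADD(DAY, -1, GETDATE());
DECLARE @end_time   DATETIME = GETDATE();

DECLARE @from_lsn BINARY(10) =
  sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @begin_time);
DECLARE @to_lsn BINARY(10) =
  sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time);

-- Return all captured changes within the LSN range
SELECT __$operation, __$start_lsn, empid, name, salary
FROM cdc.fn_cdc_get_all_changes_dbo_Employees(@from_lsn, @to_lsn, N'all');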

SQL Server 2008 also has a stored procedure named sys.sp_cdc_get_ddl_history that gives you the DDL change history associated with a specified capture instance.

Collation Alignment with Windows

SQL Server 2008 aligns support for collations with Microsoft Windows Server® 2008, Windows Vista®, Windows Server 2003, and Windows® XP Home Edition. The new SQL Server version adds new collations and revises existing collation versions. You can recognize the new and revised collations by finding the number 100 (the internal version number of SQL Server 2008) in their names. You can use the following query to get the list of new and revised collations:

SELECT name, description
FROM sys.fn_helpcollations() AS C
WHERE name LIKE '%100%';

This query returns more than 1,400 rows representing different variations of 84 new collations.

The new collations introduce significant changes. Many of them address East Asian languages that support supplementary characters. Chinese_Taiwan_Stroke_100 and Chinese_Taiwan_Bopomofo_100 now assign culture-correct weight for each character, specifically the Ext. A + B characters.

The new collations also provide string comparisons between supplementary characters based on linguistic sorting, and a binary flag is added for true code-point comparisons (binary-code point).

Note that some collations supported in SQL Server 2005 are deprecated in SQL Server 2008, including the Korean, Hindi, Macedonian, and Lithuanian_Classic Windows collations and the SQL_ALTDiction_CP1253_CS_AS SQL collation. These collations are still supported in SQL Server 2008 for backward compatibility, but they do not show up in the Setup collation list and are not returned when querying the fn_helpcollations function.

Deprecated Features

SQL Server 2008 handles deprecation policy more seriously than previous versions. SQL Server 2008 discontinues support for the following features (partial list):

· 60, 65, and 70 compatibility levels—at a minimum, the database must be at compatibility level 80 (2000)

· DUMP and LOAD commands—use the BACKUP and RESTORE commands instead

· BACKUP LOG WITH TRUNCATE_ONLY, BACKUP LOG WITH NO_LOG, and BACKUP TRANSACTION statements

· The sp_addalias stored procedure—replace aliases with user accounts and database roles

· sp_addgroup, sp_changegroup, sp_dropgroup, and sp_helpgroup stored procedures—use roles instead

· The Surface Area Configuration Tool

· Window calculations (using the OVER clause) are not allowed in the recursive member of recursive common table expressions (CTEs)

SQL Server provides performance counters and trace events to keep track of deprecated feature usage. The SQLServer:Deprecated Features object provides the Usage counter and an instance for each deprecated feature. As for trace events, the Deprecation Final Support event class occurs when you use a feature that will be removed from the next version of SQL Server. The Deprecation Announcement event class occurs when you use a feature that will be removed from a future version of SQL Server.
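These counters are also exposed through the sys.dm_os_performance_counters DMV, so a quick way to inspect deprecated feature usage from T-SQL is a sketch along these lines:

-- One row per deprecated feature, with the number of times it has been used
SELECT instance_name AS deprecated_feature, cntr_value AS usage_count
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Deprecated Features%'
ORDER BY cntr_value DESC;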