Sunday, November 30, 2008

Writing Outer Joins in T-SQL

Occasionally someone will ask for my help with a query, saying that both a right outer join and a left outer join were tried, and still the expected results were not achieved. That made me realize that some developers do not completely understand outer joins and that an article explaining how to use them might help.

Inner Join Review
The most commonly used join is an INNER JOIN. This type of join combines rows from two tables only when they match on the joining condition. Usually the primary key from one table matches a foreign key on another table, but join conditions can be more complex than that.

(Note: Most of the information in this article can be applied to views as well as tables. For simplicity, the word "table" will be used to mean table or view unless stated otherwise. Keys are not defined on views, but the underlying table’s key column or columns are often included in the view. To keep things simple, let's assume that is the case.)

INNER JOIN will retrieve a result row only where there is an exact match between the two tables on the join condition. You will also often see one row from one of the tables matching multiple rows in the other table. For example, one customer can have many orders, and one order can have many order details. The data on the one side will be repeated for each row on the many side. The following query is an example showing how the information from the Sales.SalesOrderHeader table is repeated on each matching row:

SELECT s.SalesOrderID, OrderDate, ProductID
FROM Sales.SalesOrderHeader AS s
INNER JOIN Sales.SalesOrderDetail AS d ON s.SalesOrderID = d.SalesOrderID
ORDER BY s.SalesOrderID, ProductID


Outer Join Introduction
OUTER JOIN is used to join two tables even if there is not a match. An OUTER JOIN can be used to return a list of all the customers and the orders even if no orders have been placed for some of the customers. A keyword, RIGHT or LEFT, is used to specify which side of the join returns all possible rows. I like using LEFT because it makes sense to me to list the most important table first. Except for one example demonstrating RIGHT OUTER JOIN, this article will use left joins. Just a note: the keywords INNER and OUTER are optional.

The next example returns a list of all the customers and the SalesOrderID for the orders that have been placed, if any.

SELECT c.CustomerID, s.SalesOrderID
FROM Sales.Customer c
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID


It uses the LEFT keyword because the Sales.Customer table is located on the left side and we want all rows returned from that table even if there is no match in the Sales.SalesOrderHeader table. This is an important point. Notice also that the CustomerID column is the primary key of the Sales.Customer table and a foreign key in the Sales.SalesOrderHeader table. This means that there must be a valid customer for every order placed. Writing a query that returns all orders and the customers if they match doesn’t make sense. The LEFT table should always be the primary key table when performing a LEFT OUTER JOIN.

If the locations of the tables in the query are switched, the RIGHT keyword is used and the same results are returned:

SELECT c.CustomerID, s.SalesOrderID
FROM Sales.SalesOrderHeader s
RIGHT OUTER JOIN Sales.Customer c ON c.CustomerID = s.CustomerID

Notice that I didn’t change the join condition at all. It doesn’t matter on which side of the equal sign the columns are listed; what matters is where the tables are named in the query.

If I have a LEFT OUTER JOIN, what is returned from the table on the right side of the join where there is not a match? Each column from the right side will return a NULL. Try this query, which lists the non-matching rows first:

SELECT c.CustomerID, s.SalesOrderID
FROM Sales.Customer c
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID
ORDER BY s.SalesOrderID

By adding a WHERE clause to check for a NULL SalesOrderID, you can find all the customers who have not placed an order. My copy of AdventureWorks returns 66 customers with no orders:

SELECT c.CustomerID, s.SalesOrderID
FROM Sales.Customer c
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID
WHERE s.SalesOrderID IS NULL

Occasionally, you will need to be more specific. How can you find all the customers who have not placed an order in 2002? There are several ways to solve this problem. You could create a view of all the orders placed in 2002 and join the view on the Sales.Customer table. Another option is to create a CTE, or Common Table Expression, of the orders placed in 2002. This example shows how to use a CTE to get the required results:

WITH s AS
( SELECT SalesOrderID, customerID
FROM Sales.SalesOrderHeader
WHERE OrderDate between '1/1/2002' and '12/31/2002'
)
SELECT c.CustomerID, s.SalesOrderID
FROM Sales.Customer c
LEFT OUTER JOIN s ON c.customerID = s.customerID
WHERE s.SalesOrderID IS NULL


My favorite technique to solve this problem is much simpler. Additional criteria, in this case filtering on the OrderDate, can be added to the join condition. The query joins all customers to the orders placed in 2002. Then the results are restricted to those where there is no match. This query will return exactly the same results as the previous, more complicated query:

SELECT c.CustomerID, s.SalesOrderID
FROM Sales.Customer c
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID
and s.OrderDate between '1/1/2002' and '12/31/2002'
WHERE s.SalesOrderID IS NULL


Using Aggregates with Outer Joins
Aggregate queries introduce another pitfall to watch out for. The following example is an attempt to list all the customers and the count of the orders that have been placed. Can you spot the problem?

SELECT c.CustomerID, count(*) OrderCount
FROM Sales.Customer c
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID
GROUP BY c.CustomerID
ORDER BY OrderCount


Now the customers with no orders look like they have placed one order. That is because this query is counting the rows returned. To solve this problem, count the SalesOrderID column. NULL values are eliminated from the count.

SELECT c.CustomerID, count(SalesOrderID) OrderCount
FROM Sales.Customer c LEFT OUTER JOIN Sales.SalesOrderHeader s
ON c.CustomerID = s.CustomerID
GROUP BY c.CustomerID
ORDER BY OrderCount


Multiple Joins
Once more than two tables are involved in the query, things get a bit more complicated. When a table is joined to the RIGHT table, a LEFT OUTER JOIN must be used. That is because the NULL rows from the RIGHT table will not match any rows on the new table. An INNER JOIN causes the non-matching rows to be eliminated from the results. If the Sales.SalesOrderDetail table is joined to the Sales.SalesOrderHeader table and an INNER JOIN is used, none of the customers without orders will show up. NULL cannot be joined to any value, not even NULL.

To illustrate this point, when I add the Sales.SalesOrderDetail table to one of the previous queries that checked for customers without orders, I get back no rows at all.

SELECT c.CustomerID, s.SalesOrderID, d.SalesOrderDetailID
FROM Sales.Customer c
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID
INNER JOIN Sales.SalesOrderDetail d ON s.SalesOrderID = d.SalesOrderID
WHERE s.SalesOrderID IS NULL

To get correct results, change the INNER JOIN to a LEFT JOIN.

SELECT c.CustomerID, s.SalesOrderID, d.SalesOrderDetailID
FROM Sales.Customer c
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID
LEFT OUTER JOIN Sales.SalesOrderDetail d ON s.SalesOrderID = d.SalesOrderID
WHERE s.SalesOrderID IS NULL


What about additional tables joined to Sales.Customer, the table on the left? Must outer joins be used? If it is possible that there are some rows without matches, it must be an outer join to guarantee that no results are lost. The Sales.Customer table has a foreign key pointing to the Sales.SalesTerritory table. Every customer’s territory ID must match a valid value in Sales.SalesTerritory. This query returns 66 rows as expected because it is impossible to eliminate any customers by joining to Sales.SalesTerritory:

SELECT c.CustomerID, s.SalesOrderID, t.Name
FROM Sales.Customer c
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID
INNER JOIN Sales.SalesTerritory t ON c.TerritoryID = t.TerritoryID
WHERE s.SalesOrderID IS NULL


Sales.SalesTerritory is the primary key table; every customer must match a valid territory. If you want to write a query that lists all territories, even those that have no customers, an outer join must be used. This time, Sales.Customer is on the right side of the join.

SELECT t.Name, CustomerID
FROM Sales.SalesTerritory t
LEFT OUTER JOIN Sales.Customer c ON t.TerritoryID = c.TerritoryID

SQL Server Connection Strings

Here are some common and not-so-common connection strings for the .NET SqlConnection object, along with .NET sample code and some tricks to increase the supportability of your application.

Trusted Authentication
Data Source=ServerName; Initial Catalog=DatabaseName; Integrated Security=SSPI;

Trusted authentication uses the security credentials of the current user to make the connection to SQL Server. SQL Server uses Windows (or Active Directory) to validate the current user. ServerName can be the name of a server or the name of a SQL Server instance such as Server1\Instance2. ServerName can also be expressed as an IP address. SSPI stands for Security Support Provider Interface (if you were curious).

SQL Server Security Authentication

Data Source=ServerName; Initial Catalog=DatabaseName; User Id=UserName; Password=UserPassword;

In SQL Server authentication, SQL Server stores the username and password. ServerName can be the name of a server or the name of a SQL Server instance such as Server1\Instance2. ServerName can also be expressed as an IP address.

Setting the Application Name
Data Source=ServerName; Initial Catalog=DatabaseName; Integrated Security=SSPI; Application Name=MyAppName;

I often set the Application Name when I construct connection strings. Whatever text you assign to Application Name will appear in several places:

* It will be displayed in Profiler under the Application Name column.
* It will be shown in the output of sp_who2 in the Program Name column.
* It will be shown in the Activity Monitor in the Application column. You can get to the Activity Monitor in SQL Server Management Studio via Management -> Activity Monitor.
* It will appear in the program_name column if you select from master.dbo.sysprocesses (for SQL Server 2000).
* It will appear in the program_name column if you select from sys.dm_exec_sessions (for SQL Server 2005 and later).

Setting the application name makes it very easy to find out which applications are issuing particular SQL statements against my database. Setting the application name can also lead to an increase in the number of connections to your SQL Server. Each client that uses connection pooling will create one pool inside each application per unique connection string. If you use multiple application names, you potentially increase the number of pools and thus the number of connections to SQL Server. I've always found it more beneficial to have the application name than to have a few fewer connections to my database servers.
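
For example, on SQL Server 2005 and later you can see at a glance which sessions belong to a given application (a small sketch; 'MyAppName' stands in for whatever you set in the connection string):

SELECT session_id, login_name, host_name, program_name
FROM sys.dm_exec_sessions
WHERE program_name = 'MyAppName'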

Using MARS (Multiple Active Result Sets)

Data Source=ServerName; Initial Catalog=DatabaseName; Integrated Security=SSPI; MultipleActiveResultSets=True;


If you want to use MARS you'll need to enable it in the connection string.

Sample .NET code
There are two common ways to create a connection string in .NET. The first is to use an explicit connection string.

SqlConnection conn = new SqlConnection();
conn.ConnectionString = "Data Source=L40; Initial Catalog=master; Integrated Security=SSPI;";


The second is to use the Connection String Builder object in .NET to construct a connection string.

SqlConnectionStringBuilder csb = new SqlConnectionStringBuilder();
csb.DataSource = "L40";
csb.InitialCatalog = "master";
csb.IntegratedSecurity = true;

SqlConnection conn = new SqlConnection();
conn.ConnectionString = csb.ToString();

Thursday, November 27, 2008

Table Information View -- No Cursors!

One of the most frequent requests I get is for a procedure that can return sp_SpaceUsed-type information for every table in a database. This is easy enough to do with cursors and dynamic SQL, but after looking at how sp_SpaceUsed worked and how SMO gets the same information, I decided that I could write it without either.

Even better, once I wrote it, I realized that it could easily be rewritten as a view. Now I can reuse it by joining it with other tables and views in new queries or procedures whenever I want. Enjoy!


/*
vwTableInfo - Table Information View

This view displays space and storage information for every table in the database.
Columns are:
Schema
Name
Owner (may be different from Schema)
Columns (count of the max number of columns ever used)
HasClusIdx 1 if table has a clustered index, 0 otherwise
RowCount
IndexKB space used by the table's indexes
DataKB space used by the table's data

16-March-2008, RBarryYoung@gmail.com
*/
--CREATE VIEW vwTableInfo
-- AS
SELECT SCHEMA_NAME(tbl.schema_id) as [Schema]
, tbl.Name
, Coalesce((Select pr.name
From sys.database_principals pr
Where pr.principal_id = tbl.principal_id)
, SCHEMA_NAME(tbl.schema_id)) as [Owner]
, tbl.max_column_id_used as [Columns]
, CAST(CASE idx.index_id WHEN 1 THEN 1 ELSE 0 END AS bit) AS [HasClusIdx]
, Coalesce( ( select sum (spart.rows) from sys.partitions spart
where spart.object_id = tbl.object_id and spart.index_id < 2), 0) AS [RowCount]

, Coalesce( (Select Cast(v.low/1024.0 as float)
* SUM(a.used_pages - CASE WHEN a.type <> 1 THEN a.used_pages WHEN p.index_id < 2 THEN a.data_pages ELSE 0 END)
FROM sys.indexes as i
JOIN sys.partitions as p ON p.object_id = i.object_id and p.index_id = i.index_id
JOIN sys.allocation_units as a ON a.container_id = p.partition_id
where i.object_id = tbl.object_id )
, 0.0) AS [IndexKB]

, Coalesce( (Select Cast(v.low/1024.0 as float)
* SUM(CASE WHEN a.type <> 1 THEN a.used_pages WHEN p.index_id < 2 THEN a.data_pages ELSE 0 END)
FROM sys.indexes as i
JOIN sys.partitions as p ON p.object_id = i.object_id and p.index_id = i.index_id
JOIN sys.allocation_units as a ON a.container_id = p.partition_id
where i.object_id = tbl.object_id)
, 0.0) AS [DataKB]
, tbl.create_date, tbl.modify_date

FROM sys.tables AS tbl
INNER JOIN sys.indexes AS idx ON (idx.object_id = tbl.object_id and idx.index_id < 2)
INNER JOIN master.dbo.spt_values v ON (v.number=1 and v.type='E')
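
Once the CREATE VIEW lines above are uncommented and the view is created, it can be joined and filtered like any other view. For example, to list the largest tables by data size:

SELECT [Schema], Name, [RowCount], DataKB, IndexKB
FROM vwTableInfo
ORDER BY DataKB DESC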

Sunday, November 23, 2008

SQL Server Query Execution Plan Analysis

In most cases, the query optimizer will analyze joins and JOIN the tables using the most efficient join type, and in the most efficient order. But not always. In the graphical query plan you will see icons that represent the different types of JOINs used in the query. In addition, each of the JOIN icons will have two arrows pointing to it. The upper arrow pointing to the JOIN icon represents the outer table in the join, and the lower arrow pointing to the JOIN icon represents the inner table in the join. Follow the arrows back to see the names of the tables being joined.

Sometimes, in queries with multiple JOINs, tracing the arrow back won't reveal a table, but another JOIN. If you place the cursor over the arrows pointing to the upper and lower JOINs, you will see a popup window that tells you how many rows are being sent to the JOIN for processing. The upper arrow should always have fewer rows than the lower arrow. If not, then the JOIN order selected by the query optimizer might be incorrect (see more on this below).

First of all, let's look at JOIN types. SQL Server can JOIN a table using three different techniques: nested loop, hash, and merge. Generally, the fastest type of join is a nested loop, but if that is not feasible, then a hash JOIN or merge JOIN is used (as appropriate), both of which tend to be slower than the nested loop JOIN.

When very large tables are JOINed, a merge join, not a nested loop join, may be the best option. The only way to know is to try both and see which one is the most efficient.

If a particular query is slow, and you suspect it may be because the JOIN type is not the optimum one for your data, you can override the query optimizer's choice by using a JOIN hint. Before you use a JOIN hint, you will want to take some time to learn about each of the JOIN types and how each one works. This is a complicated subject, beyond the scope of this tip.

JOIN order is also selected by the query optimizer, which tries to select the most efficient order in which to JOIN tables. For example, for a nested loop join, the upper table should be the smaller of the two tables. For hash joins, the same is true; the upper table should be the smaller of the two tables. If you feel that the query optimizer is selecting the wrong order, you can override it using JOIN hints.

In many cases, the only way to know for sure if using a JOIN hint to change JOIN type or JOIN order will boost or hinder performance is to give them a try and see what happens. [7.0, 2000, 2005]

Thursday, November 20, 2008

SQL Server Full-Text Search Performance Tuning and Optimization

If you are using the SQL Server Full-Text Search service, Microsoft recommends these two settings for optimum performance:

* The virtual memory (PAGEFILE.SYS file) setting for your operating system should be set to an amount equal to 3 times the amount of physical RAM in the server. If you have a non-dedicated SQL Server (a server running applications in addition to SQL Server) then you will want to add the virtual memory needs of these other applications to the amount calculated above.

* The SQL Server MAX SERVER MEMORY setting should be set manually (dynamic memory allocation is turned off) so that enough virtual memory is left for the Full-Text Search service to run. To achieve this, select a MAX SERVER MEMORY setting that once set, leaves enough virtual memory so that the Full-Text Search service is able to access an amount of virtual memory equal to 1.5 times the amount of physical RAM in the server. This will take some trial and error to achieve this setting.
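
For example, MAX SERVER MEMORY can be set with sp_configure; the 6144 below is only a placeholder, so substitute a value that leaves the Full-Text Search service the headroom described above:

EXEC sp_configure 'show advanced options', 1
RECONFIGURE
GO
EXEC sp_configure 'max server memory (MB)', 6144 -- placeholder value; tune for your server
RECONFIGURE
GO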

To find out how much virtual memory is being used by SQL Server and the Full-Text Search Service, you can use the Task Manager. By default, the Task Manager does not display the amount of virtual memory used by a process. To see this number in Task Manager, you must first go to the "Processes" tab. Once there, select "View", and then "Select Columns". From the "Select Columns" dialog box, click on "Virtual Memory Size", then "OK". Now you will be able to see the amount of virtual memory size used by each process on your server using Task Manager. Use this information to help you tune your server for use with the Full-Text Search service. [7.0, 2000]

Wednesday, November 19, 2008

Reducing SQL Server Locks

Encapsulate all transactions within stored procedures, including both the BEGIN TRANSACTION and COMMIT TRANSACTION statements in the procedure. This provides two benefits that help to reduce blocking locks.

First, it limits the client application and SQL Server to communications before and after the transaction, thus forcing any messages between the client and the server to occur at a time other than when the transaction is running (reducing transaction time).

Second, it prevents the user from leaving an open transaction (holding locks open) because the stored procedure forces any transactions that it starts to complete or abort. [6.5, 7.0, 2000, 2005]
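
For example, here is a minimal sketch of the pattern (the Accounts table, its columns, and the procedure name are hypothetical):

CREATE PROCEDURE usp_TransferFunds
    @FromAccount int,
    @ToAccount int,
    @Amount money
AS
SET NOCOUNT ON

-- Both BEGIN TRANSACTION and COMMIT TRANSACTION live inside the procedure,
-- so no client round trips occur while locks are held.
BEGIN TRANSACTION

UPDATE Accounts SET Balance = Balance - @Amount WHERE AccountID = @FromAccount
IF @@ERROR <> 0
BEGIN
    ROLLBACK TRANSACTION
    RETURN 1
END

UPDATE Accounts SET Balance = Balance + @Amount WHERE AccountID = @ToAccount
IF @@ERROR <> 0
BEGIN
    ROLLBACK TRANSACTION
    RETURN 1
END

COMMIT TRANSACTION
GO

The client simply executes the procedure, so the transaction cannot be left open from the application side.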

Tuesday, November 18, 2008

Avoid cursors in SQL Server with these methods to loop over records

Many articles have beaten up on SQL Server cursors -- database objects that manipulate data in a set on a row-by-row basis -- and I want to add my name to the list of people who wish cursors had never been introduced. But, unfortunately, cursors are a fact of life. Problems with cursors include extended locking, the inability to cache execution plans, and CPU/RAM overhead. Many T-SQL programmers and DBAs do not know how to successfully loop over records without the need for cursors. In this tip, I'll share some alternatives to cursors that provide looping functionality.

Method 1: Temp table with identity column

In the first approach, we will use a temp table with an identity column added to allow for row-by-row selection. If you're performing an INSERT/UPDATE/DELETE, be sure to use explicit transactions. This vastly reduces the load on your log file by committing per loop, and it prevents huge rollbacks in the case of failure.

set nocount on
declare @i int --iterator
declare @iRwCnt int --rowcount
declare @sValue varchar(100)

set @i = 1 --initialize

create table #tbl(ID int identity(1,1), Value varchar(100))

insert into #tbl(Value)
select name
from master..sysdatabases (nolock)

set @iRwCnt = @@ROWCOUNT --SCOPE_IDENTITY() would also work

create clustered index idx_tmp on #tbl(ID) WITH FILLFACTOR = 100
/*

Always do this after the insert, since it's faster to add the index in bulk than to update the index as you write into the temp table. Since you know the data in this column, you can set the fill factor to 100% to get the best read times.

*/
while @i <= @iRwCnt
begin
    select @sValue = Value from #tbl where ID = @i

    --begin tran
    print 'My Value is ' + @sValue --replace with your operations on this value
    --commit tran

    set @i = @i + 1
end
drop table #tbl


Method 2: Temp table without ID

In the second approach, we use a temp table without an identity column and simply grab the top row to process, then loop until we find no more rows to process. If you're performing an INSERT/UPDATE/DELETE, again, be sure to use explicit transactions to vastly reduce the load on your log file by committing per loop, which prevents huge rollbacks in the case of failure.

set nocount on

declare @i int --iterator
declare @iRwCnt int --rowcount
declare @sValue varchar(100)
set @i = 1 --initialize

create table #tbl(Value varchar(100))

insert into #tbl(Value)
select name
from master..sysdatabases (nolock)

set @iRwCnt = @@ROWCOUNT --SCOPE_IDENTITY() would also work

create clustered index idx_tmp on #tbl(Value) WITH FILLFACTOR = 100
/*

Always do this after the insert, since it's faster to add the index in bulk than to update the index as you write into the temp table. Since you know the data in this column, you can set the fill factor to 100% to get the best read times.

*/

while @iRwCnt > 0
begin
    select top 1 @sValue = Value from #tbl
    set @iRwCnt = @@ROWCOUNT --ensure that we still have data

    if @iRwCnt > 0
    begin
        --begin tran
        print 'My Value is ' + @sValue --replace with your operations on this value
        --commit tran

        delete from #tbl where value = @sValue --remove processed record
    end
end

drop table #tbl


Method 3: Selecting a comma-delimited list of items

When most developers/DBAs are asked to come up with a list of comma-delimited values from a table, they typically use a cursor or temp table (as above) to loop through the records. However, if you do not need a GROUP BY or an ORDER BY, you can use the method below, which handles the task as a single set-based operation. It cannot be used with GROUP BY, DISTINCT, or ORDER BY, because of how SQL Server handles those operations.

Basically, this takes a given variable, and for every row in the table it adds the current value to the variable along with a comma.

declare @vrs varchar(4000)
declare @sTbl sysname
set @sTbl = 'TableName'
set @vrs = ''
select @vrs = @vrs + ', ' + name
from syscolumns
where id = (select st.id from sysobjects as st where name = @sTbl)
order by colorder
set @vrs = right(@vrs, len(@vrs)-2)
print @vrs

This article gives you some good reasons why cursors in SQL Server should be avoided as well as some alternatives that give you looping functionality. Keep in mind that SQL Server is designed around batch processing, so the less you loop, the faster your system will run.

Egoless programming: The path to better code

Egoless programming is a way for developers to distance themselves emotionally from their work so that they become open to criticism, often from individuals with less perceived ability.

Some Guidelines

1. Understand and accept that you will make mistakes.
The point is to find them early, before they make it into production. Fortunately, except for the few of us developing rocket guidance software at JPL, mistakes are rarely fatal in our industry. We aren’t surgeons; we can learn, laugh, and move on.

2. You are not your code.
Remember, the entire point of a review is to find problems, and problems will be found. Don’t take it personally when a problem is uncovered.


3. No matter how much karate you know, someone else will always know more.

This fact kept the samurai from indiscriminately attacking people in Imperial Japan. In our less violent times, such an individual can teach you some new moves if you ask. There will always be people who know more than you. Seek and accept input from others, even when you think it’s not needed.


4. Don’t rewrite other programmers’ code.

There’s a fine line between “fixing other programmers’ code” and “rewriting other programmers’ code.” The former implies that a bug or other functionality problem exists and needs to be fixed and can also refer to correcting gross readability problems. The latter, however, refers to changes made to code for the sake of style. Programmers fresh from college are often guilty of this. Things like renaming variables, use of a different construct, recommenting, or gratuitous reformatting of white space fall into this category. Such activities, even with the purest of motive, are high hubris and detrimental to team mentality.

5. Treat people who know less than you with respect, deference, and patience.
Nontechnical people who deal with developers on a regular basis almost universally hold the opinion that we are prima donnas at best and crybabies at worst. Becoming angry only reinforces this perception and teaches people to avoid asking questions. This can only harm your work in the long run.

6. The only constant in the world is change.
Be open to it and accept it with a smile. Look at each change to your requirements, platform, or tool as a new challenge, not as some serious inconvenience to be fought.

7. The only true authority stems from knowledge, not from position.
Knowledge engenders authority, and authority engenders respect—so if you want respect in an egoless environment, cultivate knowledge.

8. Fight for what you believe but gracefully accept defeat.
Understand that sometimes your ideas will be overruled. Even if you do turn out to be right, don’t take revenge or say, “I told you so” more than a few times at most, and don't make your dearly departed idea a martyr or rallying cry.

9. Don’t be “the guy in the room.”
Don’t be the guy coding in the dark office emerging only to buy cola. The guy in the room is out of touch, out of sight, and out of control and has no place in an open, collaborative environment.


Following these rules will prevent needless eggshell walking by your coworkers and will allow them to get back to the one activity everyone in this profession enjoys: creating great things with their computers.

Monday, November 17, 2008

"Nothing has changed" - Determining when a procedure has been altered

Someone needed to determine when a stored procedure was last altered. Without having implemented a series of DDL triggers, how can this be accomplished?

In Microsoft SQL Server, you can easily retrieve this information from the sys.procedures catalog view. The following query demonstrates this.

SELECT
[name]
,modify_date
,create_date
,*
FROM
sys.procedures

Of course, you can take it a step further by limiting the results to a period of time in which you know that no changes should have been made. For example, the following query lists all stored procedures that have been changed since October 1, 2008 (the time I last visited this client).

SELECT
[name]
,modify_date
,create_date
,*
FROM
sys.procedures
WHERE
modify_date > '2008-10-01'

Although this technique will not allow me to identify who made the change, I can at least determine that a change has taken place.
Cheers!

Reading the transaction log

SELECT *
FROM ::fn_dblog(DEFAULT, DEFAULT) AS l
INNER JOIN sysobjects AS so ON so.name = l.[transaction name]

SELECT so.name AS ObjectName,
so.type AS ObjectType,
MAX(CAST(l.[Begin Time] AS DATETIME)) AS LogTime
FROM ::fn_dblog(DEFAULT, DEFAULT) l
inner join sysobjects so on so.name = l.[transaction name]
--where so.type = 'u'
GROUP BY so.name,
so.type
ORDER BY so.name,
so.type

Sunday, November 16, 2008

SQL Server Join Hints

JOIN hints can be used in a query to specify the type of JOIN the Query Optimizer is to use for the execution plan. The JOIN options are:

* Loop
* Merge
* Hash

The syntax for a JOIN hint is (using an INNER JOIN as an example):

FROM table_one INNER [LOOP | MERGE | HASH] JOIN table_two

Here's an example:

FROM header_table INNER LOOP JOIN detail_table

As you can see, the JOIN hint is between the type of JOIN (in this example, INNER) and the JOIN keyword. Only one JOIN hint can be used per JOIN in a query. In addition, JOIN hints can only be used when the ANSI JOIN syntax is used, not the older Microsoft JOIN syntax.

The syntax above is not the only way to add a JOIN hint to a query; you can also use the OPTION clause. The OPTION clause specifies that the hint be applied throughout the entire query. While multiple hints can be added to the OPTION clause, each query hint can be used only once. In addition, only one OPTION clause can be used per query.

Here's an example of using the OPTION clause:

OPTION (LOOP JOIN) or OPTION (MERGE JOIN) or OPTION (HASH JOIN)
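
For example, forcing a hash join on the AdventureWorks query from an earlier post (a sketch only; test before and after, because the optimizer's choice is usually right):

SELECT s.SalesOrderID, d.ProductID
FROM Sales.SalesOrderHeader AS s
INNER JOIN Sales.SalesOrderDetail AS d ON s.SalesOrderID = d.SalesOrderID
OPTION (HASH JOIN)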

The Query Optimizer always tries to identify the fastest way to JOIN tables. The fastest JOIN method is the Loop JOIN, followed by the Merge and the Hash JOIN. While the Query Optimizer always tries to perform a Loop JOIN if possible, it is not always possible, and one of the other two types may have to be used.

Before you attempt to use JOIN hint to specify a JOIN type, you should first take a look at your query, and the indexing of the relevant tables, to see if you can find a way, other than using a hint, to induce the Query Optimizer to use a Loop JOIN. Keep in mind that if you specify a Loop JOIN hint for a JOIN that is currently using a HASH JOIN, that you may get worse, not better performance. So always test JOIN hints to see what their end result in performance really is. [7.0, 2000]

Thursday, November 13, 2008

Reducing SQL Server Deadlocks

When a deadlock occurs, by default, SQL Server chooses a deadlock "victim" by identifying which of the two processes will use the least amount of resources to roll back, and then returns error message 1205.

But what if you don't like default behavior? Can you change it? Yes, you can, by using the following command:

SET DEADLOCK_PRIORITY { LOW | NORMAL | @deadlock_var }

Where:

Low tells SQL Server that the current session should be the preferred deadlock victim, not the session that incurs the least amount of rollback resources. The standard deadlock error message 1205 is returned.

Normal tells SQL Server to use the default deadlock method.

@deadlock_var is a character variable specifying which deadlock method you want to use. Specify "3" for low, or "6" for normal.

This command is set at runtime for a specific user connection. [2000]
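
For example, a session that is willing to be the victim can say so before starting its work:

SET DEADLOCK_PRIORITY LOW

-- or, using the character variable form ('3' = low, '6' = normal):
DECLARE @deadlock_var char(1)
SET @deadlock_var = '3'
SET DEADLOCK_PRIORITY @deadlock_var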

Wednesday, November 12, 2008

The Cost of Function Use In A Where Clause

A common mistake made in writing SQL statements is to wrap the filtering columns of a WHERE clause inside a function. SQL Server performance is very dependent on SQL Server's ability to properly use indexes when it retrieves records. The lifeblood of proper query plan creation is using very deterministic filtering logic on indexed columns (some refer to this as sargable WHERE clauses).

The biggest problem we see is that when a filtering column is wrapped inside a function, the query optimizer does not see the column, and if an index exists on that column, the index likely will not be used (of course, just like anything else in our hi-tech world, this is not always the case).
Take this example

In the CadencedEventCustomer table, we have a nonclustered index on FullName1. The simple query below should use this index and should perform an index seek to find the qualifying records instantly (Seek = FAST).

Select
*
FROM
CadencedEventCustomer
WHERE
isNull(FullName1,'') = 'Ed Jones'

Our intention here (albeit incorrect) is to correctly handle records in which the FullName1 field is null. Our concern is that Null records will be missed (more on this later).

Since we are wrapping the FullName1 column in the IsNull function, the query optimizer doesn't see it and the index is not used.

Here is the query plan produced by SQL Server's query engine:

|--Index Scan(OBJECT:([CIPv3].[dbo].[CadencedEventCustomer].[nci_CadencedEventCustomer_FullName1]), WHERE:(isnull([CIPv3].[dbo].[CadencedEventCustomer].[FullName1],'bob')='Ed Jones'))

We see an Index Scan on the FullName1 index (the index exists, but a seek cannot be performed due to the function wrapping the indexed column). A Scan means the entire table is being searched from beginning to end until SQL happens to find all the records satisfying the condition. On a table with millions of records, this obviously takes a long time.

The subtree cost of this query is 10.58. Subtree cost is a relative cost to SQL Server indicating how difficult this query is for SQL Server to retrieve the data. The higher the relative cost, the more of a hit on performance (CPU, IO, RAM, etc). From my experience, anything in the double digit range is getting pretty spendy and will not perform well in a production environment. In this case, the cost is attributed to the inability of the optimizer to use an index which means SQL Server is having to do a full table scan to return all the data for the query result...OUCH.

If we remove the IsNull() function our query looks like this:

Select
*
FROM
CadencedEventCustomer
WHERE
FullName1 = 'Ed Jones'

This is a much cleaner WHERE clause and SQL Server will now see the indexed column and will use the index correctly:

|--Index Seek(OBJECT:([CIPv3].[dbo].[CadencedEventCustomer].[nci_CadencedEventCustomer_FullName1]), SEEK:([CIPv3].[dbo].[CadencedEventCustomer].[FullName1]='Ed Jones') ORDERED FORWARD)

Notice we now have a SEEK (Fast). The query subtree cost is now at 0.0093 !

Keep in mind, this behavior is true for ALL Functions (Not just IsNull). We should try very hard to not wrap filtering WHERE clause columns within a function.

Now....as for IsNull. The reason we are using IsNull() in the first place is to return records where the value is null in the column (I recommend not allowing nulls in the database, but that topic is for another article).

However, if we look closely at this WHERE clause, we can immediately see that the net result of our query (in this case) is identical whether we use the IsNull function or not.

Select
*
FROM
CadencedEventCustomer
WHERE
isNull(FullName1,'') = 'Ed Jones'

What this query says is return all records where the FullName1 = 'Ed Jones' and IF the FullName1 column in the database holds a Null Value assume that value is '' (an empty string instead of a Null value).

So....Null values will be replaced with '' which still does NOT Equal 'Ed Jones'. The net effect is that records with NULL will NOT be returned.
So if we change our query to this:

Select
*
FROM
CadencedEventCustomer
WHERE
FullName1 = 'Ed Jones'

We would get exactly the same result.

So the question is, when DO we need IsNull?

Well....if the IsNull function is replacing nulls with a value that satisfies the comparison and returns TRUE, then the IsNull function is more useful (still not needed as we'll see below).

Take this example:

Select
*
FROM
CadencedEventCustomer
WHERE
isNull(FullName1,'Ed Jones') = 'Ed Jones'

In this case, if we find Null values in the FullName1 column, we will replace them with 'Ed Jones'. Notice that 'Ed Jones' satisfies our comparison and will return true. So in a sense, this logic is forcing the Query to return Null values instead of filtering them out. In this case, we can argue that the IsNull function is useful.

Now, having said that, although the IsNull() logic is now useful, the problem is that the column is being hidden from the optimizer and we lose the index relief provided by the index.

We can rewrite this SQL statement to get the same result AND to still get index relief for the optimizer.

Essentially what we want is this:

Select All CadencedEventCustomers Where the FullName1 is 'Ed Jones' OR where the FullName1 is a null value.

So the correct rewrite to still obtain index relief is as follows:

Select
*
FROM
CadencedEventCustomer
WHERE
((FullName1 = 'Ed Jones') OR (FullName1 IS NULL))

Now we get the best of both worlds. We get the correct logic producing the results we want, and SQL Server's query engine can see the column, so we get index relief.
A few examples other than IsNull()

I often see Substring() and other string functions used in WHERE clauses.
Take the following
WHERE SUBSTRING(MasterDealer.Name,1,4) = 'Ford'

This is effectively hiding the MasterDealer.Name from the optimizer and is negating the use of any index on the Name column.
The fix
WHERE MasterDealer.Name Like 'Ford%'

This produces the exact same result, yet also provides index relief.
Another example working with dates
WHERE DateDiff(mm,PlacedOnQueue,GetDate()) >= 30

Again, this is a common technique used when working with DateTime data. This is, again, hiding the PlacedOnQueue column from the optimizer.
The fix
WHERE PlacedOnQueue < DateAdd(mm,-30,GetDate())

Nearly the same results (DATEDIFF counts month boundaries crossed, so rows right on the edge can differ slightly), and this time we get index relief.

Keep in mind that this behavior holds true for all functions wrapping columns in your WHERE clause. So any time you feel the need to wrap a WHERE clause filtering column within a function, try really hard to find a way to rewrite the statement without the function.

Tuesday, November 11, 2008

Monitor Database Growth

This code provides a way of monitoring the growth of all your databases within a single instance. The first part is the creation of a monitoring table with the initial load of current databases and sizes. The second part is the SQL that can be put in a scheduled job to automate the growth monitoring.

It is recommended that this job is run weekly or monthly, depending on how fast your databases grow. Also, the code was written for SQL 2005 but can easily be altered for SQL 2000 by changing sys.Master_Files to sysaltfiles and sys.databases to sysdatabases. Make sure to change your column names appropriately if you make this alteration!

Create initial table and load initial data
If exists (Select name from sys.objects where name = 'DBGrowthRate' and Type = 'U')
Drop Table dbo.DBGrowthRate

Create Table dbo.DBGrowthRate (DBGrowthID int identity(1,1), DBName varchar(100), DBID int,
NumPages int, OrigSize decimal(10,2), CurSize decimal(10,2), GrowthAmt varchar(100),
MetricDate datetime)

Select sd.name as DBName, mf.name as FileName, mf.database_id, file_id, size
into #TempDBSize
from sys.databases sd
join sys.master_files mf
on sd.database_ID = mf.database_ID
Order by mf.database_id, sd.name

Insert into dbo.DBGrowthRate (DBName, DBID, NumPages, OrigSize, CurSize, GrowthAmt, MetricDate)
(Select tds.DBName, tds.database_ID, Sum(tds.Size) as NumPages,
Convert(decimal(10,2),(((Sum(Convert(decimal(10,2),tds.Size)) * 8000)/1024)/1024)) as OrigSize,
Convert(decimal(10,2),(((Sum(Convert(decimal(10,2),tds.Size)) * 8000)/1024)/1024)) as CurSize,
'0.00 MB' as GrowthAmt, GetDate() as MetricDate
from #TempDBSize tds
where tds.database_ID not in (Select Distinct DBID from DBGrowthRate)
Group by tds.database_ID, tds.DBName)

Drop table #TempDBSize

Select *
from DBGrowthRate

Code to run weekly to check the growth.
Select sd.name as DBName, mf.name as FileName, mf.database_id, file_id, size
into #TempDBSize2
from sys.databases sd
join sys.master_files mf
on sd.database_ID = mf.database_ID
Order by mf.database_id, sd.name

If Exists (Select Distinct DBName from #TempDBSize2
where DBName in (Select Distinct DBName from DBGrowthRate))
and Convert(varchar(10),GetDate(),112) > (Select Distinct Convert(varchar(10),Max(MetricDate),112) as MetricDate
from DBGrowthRate)
Begin
Insert into dbo.DBGrowthRate (DBName, DBID, NumPages, OrigSize, CurSize, GrowthAmt, MetricDate)
(Select tds.DBName, tds.database_ID, Sum(tds.Size) as NumPages,
dgr.CurSize as OrigSize,
Convert(decimal(10,2),(((Sum(Convert(decimal(10,2),tds.Size)) * 8000)/1024)/1024)) as CurSize,
Convert(varchar(100),(Convert(decimal(10,2),(((Sum(Convert(decimal(10,2),tds.Size)) * 8000)/1024)/1024))
- dgr.CurSize)) + ' MB' as GrowthAmt, GetDate() as MetricDate
from #TempDBSize2 tds
join DBGrowthRate dgr
on tds.database_ID = dgr.DBID
Where DBGrowthID = (Select Distinct Max(DBGrowthID) from DBGrowthRate
where DBID = dgr.DBID)
Group by tds.database_ID, tds.DBName, dgr.CurSize)
End
Else
IF Not Exists (Select Distinct DBName from #TempDBSize2
where DBName in (Select Distinct DBName from DBGrowthRate))
Begin
Insert into dbo.DBGrowthRate (DBName, DBID, NumPages, OrigSize, CurSize, GrowthAmt, MetricDate)
(Select tds.DBName, tds.database_ID, Sum(tds.Size) as NumPages,
Convert(decimal(10,2),(((Sum(Convert(decimal(10,2),tds.Size)) * 8000)/1024)/1024)) as OrigSize,
Convert(decimal(10,2),(((Sum(Convert(decimal(10,2),tds.Size)) * 8000)/1024)/1024)) as CurSize,
'0.00 MB' as GrowthAmt, GetDate() as MetricDate
from #TempDBSize2 tds
where tds.database_ID not in (Select Distinct DBID from DBGrowthRate)
Group by tds.database_ID, tds.DBName)
End

--Select *
--from DBGrowthRate
----Verifies values were entered

Drop table #TempDBSize2

Trace : Who has accessed my SQL 2005 server

This script returns login information from the default trace created in SQL Server 2005. When the sys.server_principals data is NULL, it most likely means the login was allowed in via a Windows group.

sys.traces provides the information for the default trace such as the file path and the max files.

fn_trace_gettable returns the data from trace file(s) in table format.

sys.server_principals is the way you should access server logins in SQL Server 2005, replacing syslogins.

An important thing to note is that the default trace will create up to 100 MB (five 20 MB files) of event data and then begin wrapping. It also creates a new file whenever SQL Server is restarted, so you may not have the full 100 MB of data if you reboot or restart SQL Server often.


SELECT
I.NTUserName,
I.loginname,
I.SessionLoginName,
I.databasename,
Min(I.StartTime) as first_used,
Max(I.StartTime) as last_used,
S.principal_id,
S.sid,
S.type_desc,
S.name
FROM
sys.traces T CROSS Apply
::fn_trace_gettable(T.path, T.max_files) I LEFT JOIN
sys.server_principals S ON
CONVERT(VARBINARY(MAX), I.loginsid) = S.sid
WHERE
T.id = 1 And
I.LoginSid is not null
Group By
I.NTUserName,
I.loginname,
I.SessionLoginName,
I.databasename,
S.principal_id,
S.sid,
S.type_desc,
S.name

default trace enabled Option
Use the default trace enabled option to enable or disable the default trace log files. The default trace functionality provides a rich, persistent log of activity and changes primarily related to the configuration options.

To open the default trace log in the default location:
SELECT *
FROM fn_trace_gettable
('C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\log.trc', default)
GO


When set to 1, the default trace enabled option enables Default Trace. The default setting for this option is 1 (ON). A value of 0 turns off the trace.

The default trace enabled option is an advanced option. If you are using the sp_configure system stored procedure to change the setting, you can change the default trace enabled option only when show advanced options is set to 1. The setting takes effect immediately without a server restart.
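
Putting that together with sp_configure:

EXEC sp_configure 'show advanced options', 1
RECONFIGURE
GO
EXEC sp_configure 'default trace enabled', 1 -- use 0 to turn the trace off
RECONFIGURE
GO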

Performance Tuning SQL Server's Configuration Settings

The default network packet size for clients communicating with SQL Server is 4096 bytes. In most cases, this is the overall best setting. But in some special cases, this SQL Server configuration setting should be changed. If the client regularly transfers large amounts of text or image data to SQL Server, or performs large BCP or DTS operations, then increasing the default size may boost performance because the number of network packets is reduced. On the other hand, if your application only sends and receives small amounts of data, then a smaller packet size can boost responsiveness.

Keep in mind that client software can overrule this setting in SQL Server. Only try changing this setting if you are very familiar with network traffic analysis and have the ability to test the performance, before and after the change. [6.5, 7.0, 2000, 2005]
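
If you do decide to test a different value, the server-wide default can be changed with sp_configure (the 8192 below is purely illustrative), and individual clients can override it with the Packet Size keyword in their connection strings:

EXEC sp_configure 'show advanced options', 1
RECONFIGURE
GO
EXEC sp_configure 'network packet size (B)', 8192 -- illustrative value only
RECONFIGURE
GO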

Monday, November 10, 2008

Shrinking databases

“Order the pages, shuffle the pages.”

Do you ever shrink your data files? I’ve personally never been fond of it, especially for production databases. After all, they’ll simply have to grow again and, especially if the data files are on independent drives, there’s little difference between space free on the drive or space free in the data file. There is also a more insidious reason for not shrinking a database.

Let’s take a very simple database (the creation code is at the end of the post). I have two tables, both with tens of thousands of rows. Both tables have a clustered index on a uniqueidentifier and are heavily fragmented (>99%).

DBCC SHOWCONTIG(LargeTable1) -- 99.30%
DBCC SHOWCONTIG(LargeTable2) -- 99.21%

To fix the fragmentation, rebuild both indexes. That fixes the fragmentation, but now the data file is using almost twice the space necessary.

DBCC ShowFileStats -- 3363 extents total, 1697 used (215 MB total, 106 MB free)

So, shrink the database to release the wasted space back to the OS

DBCC SHRINKDATABASE (TestingShrink, 10) -- Shrink to 10% free

That’s fixed the space issue. But now, have another look at those two indexes that were just rebuilt.

DBCC SHOWCONTIG(LargeTable1)
- Logical Scan Fragmentation ………………: 99.99%

DBCC SHOWCONTIG(LargeTable2)
- Logical Scan Fragmentation ………………: 7.08%

Oops. Not exactly a desired outcome.

When SQL shrinks a data file, it takes extents that are towards the end of the file and moves them to empty places further forward. It does this with no concern over logical order of pages or indexes. Net result, after shrinking a database, many of the indexes in that database will be badly fragmented.

Mainly for this reason, I always recommend that, especially for production databases, the data files get grown as necessary and not shrunk. The space that can be reclaimed from the data file is not worth what the shrink does to page ordering. Especially since, as production databases tend to do, the file will simply be growing again sometime in the future.

All too often I hear of maintenance plans that first rebuild all the indexes, then shrink the data files. That kind of maintenance is worse than useless. The index rebuild uses CPU and time to arrange indexes in logical order and, in the process, often grows the data file. The shrink then uses more time and CPU and often leaves the indexes more fragmented than they were before the rebuild.

Basically, if you’re going to rebuild indexes, don’t shrink the data files. If you’re going to shrink data files, either don’t waste time rebuilding indexes, or do them after the shrink.

Paul Randal wrote a very nice post on the downsides of shrink, entitled “Turn Auto Shrink Off!” Pretty much says it all.

Caveat: There are cases where shrinking data files does make sense: when a process created lots of tables for processing and then dropped them again, after a massive archiving job, or after changing data types in a table to release a large amount of wasted space (more on that another time). Just be aware of the effect of a shrink on the fragmentation of indexes.

Edit: Some more thoughts from Paul Randal on shrinking databases: Autoshrink. Turn it OFF!

Sample Code:

SET NOCOUNT ON
GO

CREATE DATABASE TestingShrink
GO

ALTER DATABASE TestingShrink SET RECOVERY SIMPLE
GO

USE TestingShrink
GO

Create Table LargeTable1 ( -- row size of ~700 (10 rows per page)
    ID BIGINT,
    SomeString CHAR(600),
    Row_ID UNIQUEIDENTIFIER,
    AValue NUMERIC(30,8),
    RandomDate DATETIME
)

Create Table LargeTable2 ( -- row size of ~700 (10 rows per page)
    ID BIGINT,
    SomeString CHAR(600),
    Row_ID UNIQUEIDENTIFIER,
    AValue NUMERIC(30,8),
    RandomDate DATETIME
)
GO

-- ensuring high fragmentation
CREATE CLUSTERED INDEX idx_Large1 on LargeTable1 (Row_ID)
CREATE CLUSTERED INDEX idx_Large2 on LargeTable2 (Row_ID)
GO

DECLARE @i SMALLINT
SET @i = 0
WHILE (@i<8)
BEGIN
    ;WITH DataPopulate (RowNo, Strng,Uniqueid,Num,ADate) AS (
        SELECT 1 AS RowNo, 'abc' as Strng, NewID() AS Uniqueid, rand()*856542 AS Num, DATEADD(dd, FLOOR(RAND()*75454),'1753/01/01')
        UNION ALL
        SELECT rowNo+1, 'abc' as Strng, NewID() AS Uniqueid, rand(RowNo*25411)*856542 AS Num, DATEADD(dd, FLOOR(RAND(RowNo*96322)*85454),'1753/01/01')
        FROM DataPopulate WHERE RowNo<10000
    )
    INSERT INTO LargeTable1
    SELECT * FROM DataPopulate
    OPTION (MAXRECURSION 10000)

    ;WITH DataPopulate (RowNo, Strng,Uniqueid,Num,ADate) AS (
        SELECT 1 AS RowNo, 'abc' as Strng, NewID() AS Uniqueid, rand()*856542 AS Num, DATEADD(dd, FLOOR(RAND()*75454),'1753/01/01')
        UNION ALL
        SELECT rowNo+1, 'abc' as Strng, NewID() AS Uniqueid, rand(RowNo*25411)*856542 AS Num, DATEADD(dd, FLOOR(RAND(RowNo*96322)*85454),'1753/01/01')
        FROM DataPopulate WHERE RowNo<10000
    )
    INSERT INTO LargeTable2
    SELECT * FROM DataPopulate
    OPTION (MAXRECURSION 10000)

    SET @i = @i+1
END
GO

DBCC SHOWCONTIG(LargeTable1) -- 99.30%
DBCC SHOWCONTIG(LargeTable2) -- 99.21%
DBCC showfilestats -- 2467 extents total, 2463 used (157 MB total, 256kb free)
GO

-- Rebuild the indexes. This should grow the database quite a bit.
Alter Index idx_Large1 on LargeTable1 rebuild
Alter Index idx_Large2 on LargeTable2 rebuild
go

DBCC SHOWCONTIG(LargeTable1) -- 0%
DBCC SHOWCONTIG(LargeTable2) -- 1%
DBCC ShowFileStats -- 3363 extents total, 1697 used (215 MB total, 106 MB free)
GO

USE Master
go
DBCC SHRINKDATABASE (TestingShrink, 10) -- Shrink to 10% free
go
use TestingShrink
GO

DBCC ShowFileStats -- 1885 extents total, 1695 used (120 MB total, 12 MB free)
DBCC SHOWCONTIG(LargeTable1) -- 99.99%
DBCC SHOWCONTIG(LargeTable2) --7.08%
GO

USE master
GO

DROP DATABASE TestingShrink
GO

Friday, November 7, 2008

Vardecimal Storage Format

In Service Pack 2, SQL Server 2005 adds a new storage format for numeric and decimal datatypes called vardecimal. Vardecimal is a variable-length representation for decimal types that can save unused bytes in every instance of the row. The biggest savings come from cases where the decimal definition is large (like decimal(38,6)) but the values stored are small (like 0.0), where there are a large number of repeated values, or where the data is sparsely populated.

SQL Server 2005 also includes a stored procedure that can estimate the savings before you enable the new storage format.

master.dbo.sp_estimate_rowsize_reduction_for_vardecimal 'tablename'

To enable vardecimal storage format, you need to first allow vardecimal storage on the database:

exec sys.sp_db_vardecimal_storage_format N'databasename', N'ON'

Once the database option is enabled, you can then turn on vardecimal storage at a table level using the following procedure:

exec sp_tableoption 'tablename', 'vardecimal storage format', 1

Vardecimal storage format carries some overhead due to the complexity inherent in variable-length data processing. However, in IO-bound workloads, the savings in IO bandwidth from more efficient storage can far exceed this processing overhead.

If you would like more information on this topic, updated SQL Server 2005 Books Online for Service Pack 2 contains extensive information on the new vardecimal format.

Thursday, November 6, 2008

Figuring out the most popular queries in seconds

SQL Server 2005 provides Dynamic Management Views (DMVs) that can save you a lot of work. One excellent and helpful DMV is sys.dm_exec_query_stats. In previous versions of SQL Server, to find the highest-impact queries on CPU or IO in the system, you had to walk through a long set of analysis steps, including aggregating the data you collected from Profiler.

With sys.dm_exec_query_stats, you can perform many combinations of query analysis with a single query. Here are some examples:


Find queries suffering most from blocking -
(total_elapsed_time - total_worker_time)

Find queries with most CPU cycles (see the sketch after this list) -
(total_worker_time)

Find queries with most IO cycles -
(total_physical_reads + total_logical_reads + total_logical_writes)

Find most frequently executed queries -
(execution_count)
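
For instance, here is a sketch of the "most CPU cycles" case; the offset arithmetic is a common pattern with this DMV that trims each cached batch down to the individual statement:

SELECT TOP 10
    qs.total_worker_time,
    qs.execution_count,
    SUBSTRING(st.text, qs.statement_start_offset/2 + 1,
        (CASE qs.statement_end_offset WHEN -1 THEN DATALENGTH(st.text)
         ELSE qs.statement_end_offset END - qs.statement_start_offset)/2 + 1) AS statement_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY qs.total_worker_time DESC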


You can find more information on how to use dynamic management views for performance troubleshooting in the “SQL Server 2005 Waits and Queues” whitepaper located at:
http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/performance_tuning_waits_queues.mspx

Find Rarely-Used Indexes

Here is a sample script that lists rarely-used indexes. Because the number and type of accesses are tracked in DMVs, this script can find indexes that are rarely useful. Because the cost of these indexes is incurred during maintenance (e.g., insert, update, and delete operations), the write costs of rarely-used indexes may outweigh their benefits. This script requires Microsoft SQL Server 2005.

Script Code

declare @dbid int
select @dbid = db_id()
select objectname=object_name(s.object_id), s.object_id
, indexname=i.name, i.index_id
, user_seeks, user_scans, user_lookups, user_updates
from sys.dm_db_index_usage_stats s,
sys.indexes i
where database_id = @dbid
and objectproperty(s.object_id,'IsUserTable') = 1
and i.object_id = s.object_id
and i.index_id = s.index_id
order by (user_seeks + user_scans + user_lookups + user_updates) asc

Compare Single-Use and Re-Used Plans

Sample script that compares single use plans to re-used plans. This script requires Microsoft SQL Server 2005.

Script Code

declare @single int, @reused int, @total int

select @single=
sum(case(usecounts)
when 1 then 1
else 0
end),
@reused=
sum(case(usecounts)
when 1 then 0
else 1
end),
@total=count(usecounts)
from sys.dm_exec_cached_plans

select
'Single use plans (usecounts=1)'= @single,
'Re-used plans (usecounts>1)'= @reused,
're-use %'=cast(100.0*@reused / @total as dec(5,2)),
'total usecounts'=@total


select 'single use plan size'=sum(cast(size_in_bytes as bigint))
from sys.dm_exec_cached_plans
where usecounts = 1

How do you measure CPU pressure?

It is important to understand whether CPU pressure is affecting SQL Server performance. This is true even in the case where SQL Server is the only application running on a particular box. The System object Perfmon counter Processor Queue Length is not necessarily an effective way of measuring CPU pressure in SQL Server. To see why this is the case, we first must take a brief (and simplified) look at the SQL Server execution model.

SQL Server uses a User Mode Scheduler (UMS) to control the execution of SQL Server user requests (SPIDs or session_ids). The UMS does not replace the Windows scheduler but rather manages the execution of SQL Server requests (without returning control to Windows). So when SQL Server gets its time slice from the Windows scheduler, the SQL Server UMS manages which user requests are run during this time. In a 4-proc scenario, there will be 4 User Mode Schedulers, one for each CPU. Each UMS uses a number of constructs (queues, lists and worker threads) to govern execution. At any given time, each UMS will have at most a single running user, a runnable queue of requests that are waiting for CPU, a waiter list (for resources such as IO, locks, memory), and a work queue (user requests that are waiting for worker threads).

The runnable queue can be likened to a grocery store with multiple checkout lines. The register clerk is the CPU. There is just one customer checking out, i.e. "running," at any given register. The time spent in the checkout line represents CPU pressure. The longer the line, the longer the waits, hence more CPU pressure.

OK, back to SQL Server. Assume the following for a single UMS: SPID 51 is currently running. The runnable queue consists of SPIDs 60, 55, 87 & 79. The waiter list includes SPIDs 84 & 72, and the work queue is empty. Now assume the running SPID 51 needs physical IO. Two things happen: SPID 51 is moved to the waiter list until the IO is completed, and the next session_id in the runnable queue, SPID 60, begins to run. When the IO is complete, SPID 51 is moved to the bottom of the runnable queue, which now consists of SPIDs 55, 87, 79 & 51.
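
You can observe the depth of the runnable queue directly. Here is a minimal sketch using the sys.dm_os_schedulers DMV (scheduler_id values below 255 are the regular CPU schedulers; higher values are internal hidden schedulers):

select scheduler_id, current_tasks_count, runnable_tasks_count
from sys.dm_os_schedulers
where scheduler_id < 255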

Given this scenario, the total amount of time waiting consists of resource and signal waits. The time waiting for a resource is shown as Resource Waits. The time waiting in the runnable queue for CPU is called Signal Waits. In SQL Server 2005, waits are shown in the Dynamic Management View (DMV) sys.dm_os_wait_stats. The query to measure cpu pressure is as follows:


---- Total waits are wait_time_ms
select signal_wait_time_ms = sum(signal_wait_time_ms)
    , '%signal (cpu) waits' = cast(100.0 * sum(signal_wait_time_ms) / sum(wait_time_ms) as numeric(20,2))
    , resource_wait_time_ms = sum(wait_time_ms - signal_wait_time_ms)
    , '%resource waits' = cast(100.0 * sum(wait_time_ms - signal_wait_time_ms) / sum(wait_time_ms) as numeric(20,2))
from sys.dm_os_wait_stats

You can initialize or clear out SQL Server 2005 waitstats with the DBCC SQLPERF statement shown below. In SQL Server 2000, waits can be seen by looking at DBCC SQLPERF (WAITSTATS).
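
DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR) WITH NO_INFOMSGS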

If you have a SQL-dedicated box, it is conceivable that you could have high signal waits indicating CPU pressure and still have a low processor queue length. In such cases a faster CPU could reduce signal waits, but focusing on the low processor queue length as a measure of CPU contention may lead to the erroneous conclusion that CPU is fine.

In conclusion, if signal waits are a significant percentage of total waits, you have CPU pressure, which may be alleviated by faster or more CPUs. Alternatively, CPU pressure can be reduced by eliminating unnecessary sorts (indexes can avoid sorts in ORDER BY and GROUP BY), joins, and compilations (and re-compilations). If signal waits are not significant, a faster CPU will not appreciably improve performance.

Finding whitespace in a specific column

Is there a quick way to check for any white space in a particular column? White-space characters are the ASCII equivalents of 0-31 (32 if you want to include a regular SPACE). This procedure takes a table name and a column name and returns the matching rows (you can change it to return count(*) instead if you expect a lot of them).

-- This checks for the existence of any ASCII values 0 to 31
-- (32 for regular SPACE) and returns the matching rows. You can
-- easily return a count(*) instead if you want.
create procedure findwhitespace
@tablename sysname,
@colname sysname
as
declare @query varchar(1000)

select @query =
'
declare @whitespace varchar(128),
@ctr int

select @ctr = 0
select @whitespace = ''%[''
while @ctr < 32 -- for now, keep regular SPACE out of check, change to 33 if you want to check for it
begin
select @whitespace = @whitespace + char(@ctr)
select @ctr = @ctr + 1
end
select @whitespace = @whitespace + '']%''

select * from ' +@tablename+ ' where ' +@colname+ ' like @whitespace
'
exec(@query)
go

-- Procedure created, now do a quick test.
create table testtable(testval varchar(10))

insert testtable values('value1')
insert testtable values('value2')
insert testtable values('value3')
update testtable
set testval = testval + char(1)
where testval = 'value2'
update testtable
set testval = testval + ' '
where testval = 'value3'

exec findwhitespace @tablename = 'testtable', @colname = 'testval'

drop table testtable

Tuesday, November 4, 2008

Excel Shortcut keys

ENTER
Complete a cell entry and move down in the selection
ALT+ENTER
Start a new line in the same cell
CTRL+ENTER
Fill the selected cell range with the current entry
SHIFT+ENTER
Complete a cell entry and move up in the selection
TAB
Complete a cell entry and move to the right in the selection
SHIFT+TAB
Complete a cell entry and move to the left in the selection
ESC
Cancel a cell entry
BACKSPACE
Delete the character to the left of the insertion point, or delete the selection
DELETE
Delete the character to the right of the insertion point, or delete the selection
CTRL+DELETE
Delete text to the end of the line
Arrow keys
Move one character up, down, left, or right
HOME
Move to the beginning of the line
F4 or CTRL+Y
Repeat the last action
SHIFT+F2
Edit a cell comment
CTRL+SHIFT+F3
Create names from row and column labels
CTRL+D
Fill down
CTRL+R
Fill to the right
BACKSPACE
Edit the active cell and then clear it, or delete the preceding character in the active cell as you edit cell contents
ENTER
Complete a cell entry
CTRL+SHIFT+ENTER
Enter a formula as an array formula
ESC
Cancel an entry in the cell or formula bar
CTRL+A
Display the Formula Palette after you type a function name in a formula
CTRL+SHIFT+A
Insert the argument names and parentheses for a function after you type a function name in a formula
CTRL+K
Insert a hyperlink
ENTER (in a cell with a hyperlink)
Activate a hyperlink
F2
Edit the active cell and position the insertion point at the end of the line
F3
Paste a defined name into a formula
SHIFT+F3
Paste a function into a formula
F9
Calculate all sheets in all open workbooks
CTRL+ALT+F9
Calculate all sheets in the active workbook
SHIFT+F9
Calculate the active worksheet
= (equal sign)
Start a formula
ALT+= (equal sign)
Insert the AutoSum formula
CTRL+; (semicolon)
Enter the date
CTRL+SHIFT+: (colon)
Enter the time
CTRL+SHIFT+" (quotation mark)
Copy the value from the cell above the active cell into the cell or the formula bar
CTRL+` (single left quotation mark)
Alternate between displaying cell values and displaying cell formulas
CTRL+' (apostrophe)
Copy a formula from the cell above the active cell into the cell or the formula bar
ALT+DOWN ARROW
Display the AutoComplete list
ALT+' (apostrophe)
Display the Style dialog box
CTRL+1
Display the Format Cells dialog box
CTRL+SHIFT+~
Apply the General number format
CTRL+SHIFT+$
Apply the Currency format with two decimal places (negative numbers appear in parentheses)
CTRL+SHIFT+%
Apply the Percentage format with no decimal places
CTRL+SHIFT+^
Apply the Exponential number format with two decimal places
CTRL+SHIFT+#
Apply the Date format with the day, month, and year
CTRL+SHIFT+@
Apply the Time format with the hour and minute, and indicate A.M. or P.M.
CTRL+SHIFT+!
Apply the Number format with two decimal places, thousands separator, and minus sign (–) for negative values
CTRL+SHIFT+&
Apply the outline border
CTRL+SHIFT+_
Remove outline borders
CTRL+B
Apply or remove bold formatting
CTRL+I
Apply or remove italic formatting
CTRL+U
Apply or remove an underline
CTRL+5
Apply or remove strikethrough formatting
CTRL+9
Hide rows
CTRL+SHIFT+( (opening parenthesis)
Unhide rows
CTRL+0 (zero)
Hide columns
CTRL+SHIFT+) (closing parenthesis)
Unhide columns

RUN command shortcuts

I know there are a lot of people who love using the Run prompt or the command prompt rather than the MMC consoles, so here are some quick shortcuts you can use. Enjoy!

Accessibility Controls : access.cpl

Add Hardware Wizard : hdwwiz.cpl

Add/Remove Programs : appwiz.cpl

Administrative Tools : control admintools

Automatic Updates : wuaucpl.cpl

Bluetooth Transfer Wizard : fsquirt

Calculator : calc

Certificate Manager : certmgr.msc

Character Map : charmap

Check Disk Utility : chkdsk

Clipboard Viewer : clipbrd

Command Prompt : cmd

Component Services : dcomcnfg

Computer Management : compmgmt.msc

Date and Time Properties : timedate.cpl

DDE Shares : ddeshare

Device Manager : devmgmt.msc

Direct X Control Panel (If Installed)* : directx.cpl

Direct X Troubleshooter : dxdiag

Disk Cleanup Utility : cleanmgr

Disk Defragment : dfrg.msc

Disk Management : diskmgmt.msc

Disk Partition Manager : diskpart

Display Properties : control desktop

Display Properties : desk.cpl

Display Properties (w/Appearance Tab Preselected) : control color

Dr. Watson System Troubleshooting Utility : drwtsn32

Driver Verifier Utility : verifier

Event Viewer : eventvwr.msc

File Signature Verification Tool : sigverif

Findfast : findfast.cpl

Folders Properties : control folders

Fonts : control fonts

Fonts Folder : fonts

Free Cell Card Game : freecell

Game Controllers : joy.cpl

Group Policy Editor (XP Prof) : gpedit.msc

Hearts Card Game : mshearts

Iexpress Wizard : iexpress

Indexing Service : ciadv.msc

Internet Properties : inetcpl.cpl

IP Configuration (Display Connection Configuration) : ipconfig /all

IP Configuration (Display DNS Cache Contents) : ipconfig /displaydns

IP Configuration (Delete DNS Cache Contents) : ipconfig /flushdns

IP Configuration (Release All Connections) : ipconfig /release

IP Configuration (Renew All Connections) : ipconfig /renew

IP Configuration (Refreshes DHCP & Re-Registers DNS) : ipconfig /registerdns

IP Configuration (Display DHCP Class ID) : ipconfig /showclassid

IP Configuration (Modifies DHCP Class ID) : ipconfig /setclassid

Java Control Panel (If Installed) : jpicpl32.cpl

Java Control Panel (If Installed) : javaws

Keyboard Properties : control keyboard

Local Security Settings : secpol.msc

Local Users and Groups : lusrmgr.msc

Logs You Out Of Windows : logoff

Microsoft Chat : winchat

Minesweeper Game : winmine

Mouse Properties : control mouse

Mouse Properties : main.cpl

Network Connections : control netconnections

Network Connections : ncpa.cpl

Network Setup Wizard : netsetup.cpl

Notepad : notepad

Nview Desktop Manager (If Installed) : nvtuicpl.cpl

Object Packager : packager

ODBC Data Source Administrator : odbccp32.cpl

On Screen Keyboard : osk

Opens AC3 Filter (If Installed) : ac3filter.cpl

Password Properties : password.cpl

Performance Monitor : perfmon.msc

Performance Monitor : perfmon

Phone and Modem Options : telephon.cpl

Power Configuration : powercfg.cpl

Printers and Faxes : control printers

Printers Folder : printers

Private Character Editor : eudcedit

Quicktime (If Installed) : QuickTime.cpl

Regional Settings : intl.cpl

Registry Editor : regedit

Registry Editor : regedt32

Remote Desktop : mstsc

Removable Storage : ntmsmgr.msc

Removable Storage Operator Requests : ntmsoprq.msc

Resultant Set of Policy (XP Prof) : rsop.msc

Scanners and Cameras : sticpl.cpl

Scheduled Tasks : control schedtasks

Security Center : wscui.cpl

Services : services.msc

Shared Folders : fsmgmt.msc

Shuts Down Windows : shutdown

Sounds and Audio : mmsys.cpl

Spider Solitaire Card Game : spider

SQL Client Configuration : cliconfg

System Configuration Editor : sysedit

System Configuration Utility : msconfig

System File Checker Utility (Scan Immediately) : sfc /scannow

System File Checker Utility (Scan Once At Next Boot) : sfc /scanonce

System File Checker Utility (Scan On Every Boot) : sfc /scanboot

System File Checker Utility (Return to Default Setting) : sfc /revert

System File Checker Utility (Purge File Cache) : sfc /purgecache

System File Checker Utility (Set Cache Size to size x) : sfc /cachesize=x

System Properties : sysdm.cpl

Task Manager : taskmgr

Telnet Client : telnet

User Account Management : nusrmgr.cpl

Utility Manager : utilman

Windows Firewall : firewall.cpl

Windows Magnifier : magnify

Windows Management Infrastructure : wmimgmt.msc

Windows System Security Tool : syskey

Windows Update Launches : wupdmgr

Windows XP Tour Wizard : tourstart

Wordpad : write

DAC - Dedicated Admin Connection in SQL Server 2005

Amid the load of new features introduced in SQL Server 2005, this is one feature that interests DBAs the most. If you work with a large number of servers and a large number of databases, you want tight control over the way you monitor and maintain SQL Server instances. In the past I've seen times when people say the server has frozen and they are not able to connect. To eliminate such problems: welcome, SQL Server 2005 DAC.

What is DAC ?

The Dedicated Admin Connection allows a user to connect to SQL Server when normal connection attempts fail, for example, when the server is hanging, out of memory, or in some other bad state where it is not responding to connection requests. DAC is achieved by pre-allocating dedicated resources during server startup, including memory and a scheduler. It is available beginning with SQL Server 2005.

Ways to use DAC ?

There are two ways to use it. From the command prompt you can use sqlcmd, the utility that supersedes osql in SQL Server 2005. It has a new option, -A, that makes the connection a dedicated admin connection.
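
For example, a quick sketch from the command prompt (MyServer is a placeholder for your own server name; -E uses Windows authentication):

sqlcmd -S MyServer -E -A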


More often than not, I am sure you would like to have this in the Enterprise Manager way, i.e. the SQL Server Management Studio interface for SQL Server 2005. To enable the admin connection from SSMS, you need to put ADMIN: before your server's name, so the connection would look like the example below.
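
ADMIN:MyServer

(Again, MyServer is just a placeholder; for a named instance you would use ADMIN:MyServer\InstanceName.)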



I am sure this is a nifty feature that, as DBAs, we would like to capitalize on. It means that at any point in time the DBA always has a window of control available. I think we need to use it, test it, and experience the difference that this feature brings to the table.

Monday, November 3, 2008

Sorting, grouping, and summarizing SQL data

Arranging data in a manner that's meaningful can be a challenge. Sometimes all you need is a simple sort. Often, you need more -- you need groups you can analyse and summarise. Fortunately, SQL offers a number of clauses and operators for sorting, grouping, and summarising. The following tips will help you discern when to sort, when to group, and when and how to summarize.

#1: Bring order with a sort
More often than not, all your data really needs is a little order. SQL's ORDER BY clause organises data in alphabetic or numeric order. Consequently, similar values sort together in what appear to be groups. However, the apparent groups are a result of the sort; they aren't true groups. ORDER BY displays each record whereas a group may represent multiple records.

#2: Reduce similar values into a group
The biggest difference between sorting and grouping is this: Sorted data displays all the records (within the confines of any limiting criteria) and grouped data doesn't. The GROUP BY clause reduces similar values into one record. For instance, a GROUP BY clause can return a unique list of ZIP codes from a source that repeats those values:
SELECT ZIP FROM Customers GROUP BY ZIP

Include only those columns that define the group in both the GROUP BY and SELECT column lists. In other words, the SELECT list must match the GROUP BY list, with one exception: The SELECT list can include aggregate functions. (GROUP BY doesn't allow aggregate functions.)

Keep in mind that GROUP BY won't sort the resulting groups. To arrange groups alphabetically or numerically, add an ORDER BY clause (# 1). In addition, you can't refer to an aliased field in the GROUP BY clause. Group columns must be in the underlying data, but they don't have to appear in the results.

#3: Limit data before it's grouped
You can limit the data that GROUP BY groups by adding a WHERE clause. For instance, the following statement returns a unique list of ZIP codes for just Kentucky customers:
SELECT ZIP FROM Customers WHERE State = 'KY' GROUP BY ZIP

It's important to remember that WHERE filters data before the GROUP BY clause evaluates it.

Like GROUP BY, WHERE doesn't support aggregate functions.

#4: Return all groups
When you use WHERE to filter data, the resulting groups display only those records you specify. Data that fits the group's definition but does not meet the clause's conditions won't make it to a group. Include ALL when you want to include all data, regardless of the WHERE condition. For instance, adding ALL to the previous statement returns all of the ZIP groups, not just those in Kentucky:
SELECT ZIP FROM Customers WHERE State = 'KY' GROUP BY ALL ZIP

As is, the two clauses are in conflict, and you probably wouldn't use ALL in this way. ALL comes in handy when you use an aggregate to evaluate a column. For example, the following statement counts the number of customers in each Kentucky ZIP, while also displaying other ZIP values:
SELECT ZIP, Count(ZIP) AS KYCustomersByZIP FROM Customers WHERE State = 'KY' GROUP BY ALL ZIP

The resulting groups comprise all ZIP values in the underlying data. However, the aggregate column (KYCustomersByZIP) would display 0 for any group other than a Kentucky ZIP.

Remote queries don't support GROUP BY ALL.

#5: Limit data after it's grouped
The WHERE clause (# 3) evaluates data before the GROUP BY clause does. When you want to limit data after it's grouped, use HAVING. Often, the result will be the same whether you use WHERE or HAVING, but it's important to remember that the clauses are not interchangeable. Here's a good guideline to follow when you're in doubt: Use WHERE to filter records; use HAVING to filter groups.

Usually, you'll use HAVING to evaluate a group using an aggregate. For instance, the following statement returns a unique list of ZIP codes, but the list might not include every ZIP code in the underlying data source:
SELECT ZIP, Count(ZIP) AS CustomersByZIP FROM Customers GROUP BY ZIP HAVING Count(ZIP) = 1

Only those groups with just one customer make it to the results.

#6: Get a closer look at WHERE and HAVING
If you're still confused about when to use WHERE and when to use HAVING, apply the following guidelines:

* WHERE comes before GROUP BY; SQL evaluates the WHERE clause before it groups records.
* HAVING comes after GROUP BY; SQL evaluates HAVING after it groups records.
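
To see both in one statement, here is a minimal sketch against the Customers table used in the earlier tips (the Active column is hypothetical):

SELECT State, Count(*) AS CustomerCount
FROM Customers
WHERE Active = 1 -- evaluated per record, before grouping
GROUP BY State
HAVING Count(*) > 10 -- evaluated per group, after grouping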

#7: Summarize grouped values with aggregates
Grouping data can help you analyse your data, but sometimes you'll need a bit more information than just the groups themselves. You can add an aggregate function to summarise grouped data. For instance, the following statement displays a subtotal for each order:
SELECT OrderID, Sum(Cost * Quantity) AS OrderTotal FROM Orders GROUP BY OrderID

As with any other group, the SELECT and GROUP BY lists must match. Including an aggregate in the SELECT clause is the only exception to this rule.

#8: Summarise the aggregate

You can further summarise data by displaying a subtotal for each group. SQL's ROLLUP operator displays an extra record, a subtotal, for each group. That record is the result of evaluating all the records within each group using an aggregate function. The following statement totals the OrderTotal column for each group:
SELECT Customer, OrderNumber, Sum(Cost * Quantity) AS OrderTotal FROM Orders GROUP BY Customer, OrderNumber WITH ROLLUP

The ROLLUP row for a group with two OrderTotal values of 20 and 25 would display an OrderTotal of 45. One record in a ROLLUP result is unique: the grand total, which evaluates all of the records in the entire recordset.

ROLLUP doesn't support DISTINCT in aggregate functions or the GROUP BY ALL clause.

#9: Summarise each column

The CUBE operator goes a step further than ROLLUP by returning totals for each value in each group. The results are similar to ROLLUP, but CUBE includes an additional record for each column in the group. The following statement displays a subtotal for each group and an additional total for each customer:
SELECT Customer, OrderNumber, Sum(Cost * Quantity) AS OrderTotal FROM Orders GROUP BY Customer, OrderNumber WITH CUBE

CUBE gives the most comprehensive summarisation. It not only does the work of both the aggregate and ROLLUP, but also evaluates the other columns that define the group. In other words, CUBE summarises every possible column combination.

CUBE doesn't support GROUP BY ALL.

#10: Bring order to summaries

When the results of a CUBE are confusing (and they usually are), add the GROUPING function as follows:
SELECT GROUPING(Customer), OrderNumber, Sum(Cost * Quantity) AS OrderTotal FROM Orders GROUP BY Customer, OrderNumber WITH CUBE

The results include two additional values for each row:

* The value 1 indicates that the value to the left is a summary value--the result of the ROLLUP or CUBE operator.
* The value 0 indicates that the value to the left is a detail record produced by the original GROUP BY clause.

SQL basics: Query multiple tables

It’s sometimes difficult to know which SQL syntax to use when combining data that spans multiple tables. I’ll discuss some of the more frequently used methods for consolidating queries on multiple tables into a single statement.

The sample queries in this article adhere to the SQL92 ISO standard. Not all database manufacturers follow this standard, and many have made enhancements that can yield unexpected results. If you’re uncertain about support for these concepts in your database, please refer to your manufacturer’s documentation.

SELECT
A simple SELECT statement is the most basic way to query multiple tables. You can call more than one table in the FROM clause to combine results from multiple tables. Here’s an example of how this works:
SELECT table1.column1, table2.column2 FROM table1, table2 WHERE table1.column1 = table2.column1;

In this example, I used dot notation (table1.column1) to specify which table the column came from. If the column in question only appears in one of the referenced tables, you don’t need to include the fully qualified name, but it may be useful to do so for readability.

Tables are separated in the FROM clause by commas. You can include as many tables as needed, although some databases have a limit to what they can efficiently handle before introducing a formal JOIN statement, which is described below.

This syntax is, in effect, a simple INNER JOIN. Some databases treat it exactly the same as an explicit JOIN. The WHERE clause tells the database which fields to correlate, and it returns results as if the tables listed were combined into a single table based on the provided conditions. It’s worth noting that your conditions for comparison don’t have to be the same columns you return as your result set. In the example above, table1.column1 and table2.column1 are used to combine the tables, but table2.column2 is returned.

You can extend this functionality to more than two tables using AND keywords in the WHERE clause. You can also use such a combination of tables to restrict your results without actually returning columns from every table. In the example below, table3 is matched up with table1, but I haven’t returned anything from table3 for display. I’ve merely checked to make sure the relevant column from table1 exists in table3. Note that table3 needs to be referenced in the FROM clause for this example.
SELECT table1.column1, table2.column2 FROM table1, table2, table3 WHERE table1.column1 = table2.column1 AND table1.column1 = table3.column1;

Be warned, however, that this method of querying multiple tables is effectively an implied JOIN. Your database may handle things differently, depending on the optimisation engine it uses. Also, neglecting to define the nature of the correlation with a WHERE clause can give you undesirable results, such as pairing every row in one table with every row in the others, as in a CROSS JOIN.

If you’re comfortable with how your database handles this type of statement, and you’re combining two or just a few tables, a simple SELECT statement will meet your needs.

JOIN
JOIN works in the same way as the SELECT statement above—it returns a result set with columns from different tables. The advantage of using an explicit JOIN over an implied one is greater control over your result set, and possibly improved performance when many tables are involved.

There are several types of JOIN—LEFT, RIGHT, and FULL OUTER; INNER; and CROSS. The type you use is determined by the results you want to see. For example, using a LEFT OUTER JOIN will return all relevant rows from the first table listed, while potentially dropping rows from the second table listed if they don’t have information that correlates in the first table.

This differs from an INNER JOIN or an implied JOIN. An INNER JOIN will only return rows for which there is data in both tables.
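
For comparison, a LEFT OUTER JOIN version of the same generic tables returns every row from table1, with NULLs in column2 where table2 has no match:

SELECT table1.column1, table2.column2 FROM table1 LEFT OUTER JOIN table2
ON table1.column1 = table2.column1;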

Use the following JOIN statement for the first SELECT query above:
SELECT table1.column1, table2.column2 FROM table1 INNER JOIN table2
ON table1.column1 = table2.column1;

Subqueries
Subqueries, or subselect statements, are a way to use a result set as a resource in a query. These are often used to limit or refine results rather than run multiple queries or manipulate the data in your application. With a subquery, you can reference tables to determine inclusion of data or, in some cases, return a column that is the result of a subselect.

The following example uses two tables. One table actually contains the data I’m interested in returning, while the other gives a comparison point to determine what data is actually interesting.
SELECT column1 FROM table1 WHERE EXISTS ( SELECT column1 FROM table2 WHERE table1.column1 = table2.column1 );

One important factor about subqueries is performance. Convenience comes at a price and, depending on the size, number, and complexity of tables and the statements you use, you may want to allow your application to handle processing. Each query is processed separately in full before being used as a resource for your primary query. If possible, creative use of JOIN statements may provide the same information with less lag time.

UNION
The UNION statement is another way to return information from multiple tables with a single query. The UNION statement allows you to perform queries against several tables and return the results in a consolidated set, as in the following example.
SELECT column1, column2, column3 FROM table1 UNION SELECT column1, column2, column3 FROM table2;

This will return a result set with three columns containing data from both queries. By default, the UNION statement will omit duplicates between the tables unless the UNION ALL keyword is used. UNION is helpful when the returned columns from the different tables don’t have columns or data that can be compared and joined, or when it prevents running multiple queries and appending the results in your application code.
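
For instance, keeping duplicate rows is just a matter of adding ALL:

SELECT column1, column2, column3 FROM table1 UNION ALL SELECT column1, column2, column3 FROM table2;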

If your column names don’t match when you use the UNION statement, use aliases to give your results meaningful headers:
SELECT column1, column2 as Two, column3 as Three FROM table1 UNION SELECT column1, column4 as Two, column5 as Three FROM table2;

As with subqueries, UNION statements can create a heavy load on your database server, but for occasional use they can save a lot of time.

Multiple options
When it comes to database queries, there are usually many ways to approach the same problem. These are some of the more frequently used methods for consolidating queries on multiple tables into a single statement. While some of these options may affect performance, practice will help you know when it’s appropriate to use each type of query.

Undocumented SQL Server iteration procedures

I rarely encourage the use of iteration when doing database programming because looping constructs in the database world tend to be harder to manage and much slower than set-based constructs. However, there are times when using iteration inside the database engine is useful.

I'll present a few examples using two undocumented system stored procedures provided by Microsoft. (Note: the examples in this article work in SQL Server 2000 and SQL Server 2005.)

Sp_msforeachdb and sp_msforeachtable allow you to pass TSQL statements that will be executed in a FOR...EACH loop fashion for each database on the instance or for each table in a given database. These procedures are very handy when you're performing sets of operations for everything in your system, such as backing up all user databases or capturing the sizes of each individual table.

sp_msforeachdb

The sp_msforeachdb system stored procedure accepts a TSQL string to be executed against each database that resides on your SQL Server instance. This procedure is especially useful when you're performing database administration and maintenance tasks, such as backup operations. This example loops through each database on the server and prints out the database name:

EXECUTE sp_msforeachdb 'USE ? PRINT DB_NAME()'

The code snippet may be a bit confusing if you haven't seen this in practice. Notice the use of the question mark (?); this character represents the name of the database returned at each iteration of the internal loop. I can use the question mark anywhere in the script where I would normally use the name of the database.

With just a bit of tweaking, I can change the above code into a statement that will create a full backup of all the user databases on the current server instance. For example:

EXECUTE sp_msforeachdb 'USE ?
IF DB_NAME() NOT IN(''master'',''msdb'',''tempdb'',''model'')
BACKUP DATABASE ? TO DISK = ''G:\?.bak'' WITH INIT'

Notice the use of double tick ('') marks; these are used frequently in dynamically built TSQL code and represent a single tick mark inside the string. Single tick marks are commonly used to mark the beginning or end of string literals.

sp_msforeachtable

The sp_msforeachtable system stored procedure is very similar to the sp_msforeachdb procedure except that it loops through all of the tables contained in the current database. This procedure is great for operations such as gathering statistics and bulk operations on sets of tables. In the following example, I use sp_msforeachtable to invoke the stored procedure sp_spaceused and pass the table name.

CREATE TABLE #TableSizes
(
TableName NVARCHAR(255),
TableRows INT,
ReservedSpaceKB VARCHAR(20),
DataSpaceKB VARCHAR(20),
IndexSizeKB VARCHAR(20),
UnusedSpaceKB VARCHAR(20)
)

INSERT INTO #TableSizes
EXEC sp_msforeachtable 'sp_spaceused ''?'''

SELECT * FROM #TableSizes
ORDER BY TableRows DESC

One of the most useful aspects of the code is that it inserts the results from sp_spaceused into a table. Note that I am not calling the sp_spaceused procedure directly; I'm calling it dynamically inside of the loop. Even with this method of code execution, I am able to capture the results and store them in a table for later use.

Moving the Tempdb and Master Database in SQL Server

There have been many occasions where it was necessary to move the database and log files to different drives to increase performance.

When you want to move a typical user database to a different drive for performance or to split out the logs, you run sp_detach_db and sp_attach_db to perform the move. However, when it comes to moving the master and tempdb databases, different rules apply. In this post, let's walk through the process of moving these databases.

Moving the Master Database

In the past I have had to move the master database log file to a different drive. If you ever have to perform this task, follow these steps to move the master database successfully. First, right-click on the SQL Server in Enterprise Manager (EM) and choose Properties. Next, click Startup Parameters.



The following parameters appear in this box:

-d is the fully qualified path for the master database data file.
-e is the fully qualified path for the error log file.
-l is the fully qualified path for the master database log file.
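
For a default instance installed to the standard path, the three entries might look like this (the paths are illustrative; yours will reflect your own installation):

-dC:\Program Files\Microsoft SQL Server\MSSQL\Data\master.mdf
-eC:\Program Files\Microsoft SQL Server\MSSQL\LOG\ERRORLOG
-lC:\Program Files\Microsoft SQL Server\MSSQL\Data\mastlog.ldf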

Now, if you want to move your files, you would remove the current entry and recreate it with the correct path. For example, I am going to move the master database log to C:\Test. At this point, I would delete the -l [path] entry by highlighting the old parameter and choosing Remove.



Next, I would add the entry -l C:\Test\mastlog.ldf and click OK twice. Now stop SQL Server and move mastlog.ldf to its new location.


Note: Please make sure you move the mastlog.ldf to the location you configured in your start-up parameters. If you move it to a location that is not specified in the start-up parameters, SQL Server WILL NOT start.

Finally, start SQL Server and you have successfully moved your Master database.


Moving Tempdb

In order to move the tempdb database, open Query Analyzer and run the following query:

use master
go
alter database tempdb modify file (name = tempdev, filename = 'E:\Sqldata\tempdb.mdf')
go
alter database tempdb modify file (name = templog, filename = 'E:\Sqldata\templog.ldf')
go

Wherever you are moving tempdb is what you specify in the filename = parameter; for example, I could instead move the templog to C:\Test. After running the query, restart SQL Server and then delete the old files.



Moving MSDB Database

In order to move the msdb and model databases, follow these steps. First, right-click the SQL Server name and click Properties. From the General tab, choose Startup Parameters. Next, enter the parameter -T3608. Click OK, then stop and restart SQL Server. After the restart, detach the databases, move the files to their appropriate place, and reattach them.
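
A sketch of the detach/attach step for msdb in Query Analyzer (the E:\Sqldata destination is just an example, and the file names assume the defaults):

use master
go
sp_detach_db 'msdb'
go
-- move msdbdata.mdf and msdblog.ldf to the new drive, then:
sp_attach_db 'msdb', 'E:\Sqldata\msdbdata.mdf', 'E:\Sqldata\msdblog.ldf'
go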

Whenever your drive space is getting tight and you need to move your system database files to different drives, following these simple procedures will allow you to move your databases efficiently and with confidence.