Thursday, November 6, 2008

How do you measure CPU pressure?

t is important to understand whether CPU pressure is affecting SQL Server performance. This is true even in the case where SQL Server is the only application running on a particular box. The System object Perfmon counter Processor Queue length is not necessarily an effective way of measuring CPU pressure in SQL Server. To see why this is the case, we first must take a brief (and simplified) look at the SQL Server Execution Model.

SQL Server uses a User Mode Scheduler (UMS) to control the execution of SQL Server user requests (SPIDs or session_ids). The UMS does not replace the Windows scheduler but rather, manages the execution of SQL Server requests (without returning control to Windows). So when SQL Server gets its time slice from the Windows scheduler, the SQL Server UMS manages what user requests are run during this time. In a 4-proc scenario, there will be 4 User Mode Schedulers, one for each CPU. Each UMS uses a number of constructs (queues, lists and worker threads) to govern execution. At any given time, each UMS will have at most a single running user, a runnable queue of requests that are waiting for CPU, a waiter list (for resources such as IO, locks, memory), and a work queue (user requests that are waiting for worker threads).

The runnable queue can be likened to a grocery analogy where there are multiple check out lines. The register clerk is the CPU. There is just one customer checking out e.g. “running” at any given register. The time spent in the checkout line represents CPU pressure. The longer the line, the longer the waits, hence more CPU pressure.

OK, back to SQL Server. Assume the following for a single UMS: SPID 51 is currently running. The Runnable Queue consists of SPIDs 60, 55, 87 & 79. The Waiter list includes SPIDS 55, 84 & 72. The Work queue is empty. Now, assume the running SPID 51 needs physical IO. Two things will happen. SPID 51 is moved to the wait list until the IO is completed and the next session_id in the runnable queue, SPID 60, begins to run. When the IO is complete, SPID 51 is moved to the bottom of the runnable queue which now consists of SPIDS 55, 87, 79 & 51.

Given this scenario, the total amount of time waiting consists of resource and signal waits. The time waiting for a resource is shown as Resource Waits. The time waiting in the runnable queue for CPU is called Signal Waits. In SQL Server 2005, waits are shown in the Dynamic Management View (DMV) sys.dm_os_wait_stats. The query to measure cpu pressure is as follows:


---- Total waits are wait_time_ms

Select signal_wait_time_ms=sum(signal_wait_time_ms)

,'%signal (cpu) waits' = cast(100.0 * sum(signal_wait_time_ms) / sum (wait_time_ms) as numeric(20,2))

,resource_wait_time_ms=sum(wait_time_ms - signal_wait_time_ms)

,'%resource waits'= cast(100.0 * sum(wait_time_ms - signal_wait_time_ms) / sum (wait_time_ms) as numeric(20,2))

From sys.dm_os_wait_stats

You can initialize or clear out SQL Server 2005 waitstats with the statement dbcc sqlperf ([sys.dm_os_wait_stats],clear) with no_infomsgs. In SQL Server 2000, waits can be seen by looking at DBCC SQLPERF (WAITSTATS).

If you have SQL–dedicated box it is conceivable that you could have high signal waits indicating CPU pressure and still have a low processor queue length. In such cases, a faster CPU could reduce signal waits but just focusing on the low processor queue length as a measure of CPU contention may lead to the erroneous conclusion that CPU is fine.

In conclusion, if Signal Waits are a significant percentage of total waits, you have CPU pressure which may be alleviated by faster or more CPUs. Alternately, CPU pressure can be reduced by eliminating unnecessary sorts (indexes can avoid sorts in order & group by’s) and joins, and compilations (and re-compilations). If Signal Waits are not significant, a faster CPU will not appreciably improve performance.

No comments: