Tuesday, November 3, 2009

Performance counters - How Busy Is It?

A server that's too busy may be unable to respond satisfactorily to client requests. That translates into unhappy users, and, let's face it, an important part of your job as an administrator is to ensure a satisfactory "experience" for the end-users you support. The simplest measure of a system's busyness is Processor(_Total)\% Processor Time, which measures the total utilization of your processor by all running processes. Note that on a multiprocessor machine, Processor(_Total)\% Processor Time actually measures the average processor utilization of your machine (i.e. utilization averaged over all processors).
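Because _Total is an average, one saturated processor can hide behind a modest-looking total. The Python sketch below illustrates this with hypothetical per-processor readings. The function name `total_processor_time` and the sample values are illustrative assumptions, not part of any Windows API; in practice the per-processor values would come from the individual Processor instances (0, 1, 2, and so on) in Performance Monitor.

```python
from statistics import mean


def total_processor_time(per_cpu_percent):
    r"""Approximate Processor(_Total)\% Processor Time as the average
    of the per-processor % Processor Time readings (hypothetical helper)."""
    if not per_cpu_percent:
        raise ValueError("need at least one processor sample")
    return mean(per_cpu_percent)


# Hypothetical snapshot from a four-processor machine:
# one processor is saturated, the other three are lightly loaded.
snapshot = [100.0, 20.0, 10.0, 30.0]
print(total_processor_time(snapshot))  # 40.0
```

A _Total of 40% here looks comfortable, even though one processor is pegged at 100%.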

If you're monitoring this counter and it's running at or near 100% for extended periods, you should drill down to the process level by examining the Process(instance)\% Processor Time counter for the various process instances on your machine. For example, on an IIS web server you might track Process(inetinfo)\% Processor Time, while on an Exchange server a good counter to watch is Process(store)\% Processor Time, and so on.

High processor utilization isn't always a sign of a problem, however. For example, processor utilization typically stays high for the duration of a backup job, especially if the backup program is encrypting or compressing data before writing it to tape. In fact, if your server typically runs at around 70% or 80% processor utilization, this is normally a good sign: it means your machine is handling its load effectively and is not underutilized. Average processor utilization of around 20% or 30%, on the other hand, suggests your machine is underutilized and may be a good candidate for server consolidation using Virtual Server or VMware.
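A minimal sketch of the drill-down logic described above, assuming you have already collected samples of Processor(_Total)\% Processor Time and a snapshot of Process(instance)\% Processor Time values (for example with Performance Monitor). The 95% threshold and the 12-sample run that stand in for "extended periods" are hypothetical choices, as are the helper names and the process names in the example data.

```python
def sustained_high(samples, threshold=95.0, min_consecutive=12):
    """Return True if at least `min_consecutive` consecutive samples
    are at or above `threshold` percent.

    Both defaults are hypothetical: e.g. 12 samples taken every
    5 seconds would treat one minute near 100% as "extended".
    """
    run = 0
    for value in samples:
        run = run + 1 if value >= threshold else 0
        if run >= min_consecutive:
            return True
    return False


def top_processes(process_cpu, n=3):
    """Rank {instance: % Processor Time} readings, busiest first."""
    return sorted(process_cpu.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical data: 15 consecutive _Total samples near 100%,
# plus a per-process snapshot taken during the same period.
total_samples = [97.0] * 15
process_snapshot = {"inetinfo": 62.0, "sqlservr": 20.0, "svchost": 5.0, "System": 2.0}

if sustained_high(total_samples):
    for name, percent in top_processes(process_snapshot):
        print(f"{name}: {percent:.1f}%")
# inetinfo: 62.0%
# sqlservr: 20.0%
# svchost: 5.0%
```

Requiring a consecutive run, rather than a single high reading, keeps short spikes (or a backup window you already expect) from triggering the drill-down.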

Another way to investigate high processor utilization is to break it down into Processor(_Total)\% Privileged Time and Processor(_Total)\% User Time, which respectively show the share of processor time spent executing in kernel mode and in user mode on your machine. If kernel-mode utilization is high, your machine is likely underpowered, as it's too busy handling basic OS housekeeping functions to run other applications effectively. If user-mode utilization is high, your server may be running too many roles, and you should either beef up the hardware by adding another processor or migrate an application or role to another box.
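The kernel/user breakdown can be reduced to a simple comparison. This sketch assumes you pass in the two Processor(_Total) readings taken over the same interval; the `dominant_mode` helper and its 10-point `margin` are hypothetical, chosen only to avoid reacting to small differences, and the advice strings paraphrase the guidance above.

```python
# Hypothetical mapping from the dominant mode to the guidance in the text.
ADVICE = {
    "kernel": "likely underpowered: busy with OS housekeeping",
    "user": "possibly too many roles: add a processor or move a role",
    "mixed": "no clear split: drill down to individual processes",
}


def dominant_mode(privileged_percent, user_percent, margin=10.0):
    """Compare Processor(_Total) % Privileged Time with % User Time.

    Returns "kernel" or "user" when one exceeds the other by at least
    `margin` percentage points (a hypothetical cut-off), else "mixed".
    """
    if privileged_percent - user_percent >= margin:
        return "kernel"
    if user_percent - privileged_percent >= margin:
        return "user"
    return "mixed"


for privileged, user in [(60.0, 35.0), (15.0, 80.0), (45.0, 50.0)]:
    mode = dominant_mode(privileged, user)
    print(f"privileged={privileged}% user={user}% -> {mode}: {ADVICE[mode]}")
```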

If your machine runs several applications or handles several server roles on your network, another way to measure busyness is to look at processor contention, which indicates how much different threads are competing for the attention of the processors on your machine. When too many threads are contending for the same processor, their requests get queued, and the System\Processor Queue Length counter shows how many threads are waiting to execute. If this counter is consistently higher than about 5 when processor utilization approaches 100%, that's a good indication that there is more work (threads ready for execution) than the machine's processors can handle.

This isn't a hard-and-fast indicator, however, because some services, such as IIS 6, pool and manage their own worker threads; on a busy web server, for example, you'd also want to look at counters like ASP\Requests Queued or ASP.NET\Requests Queued. Furthermore, the more services and applications that are active on your server, the busier the processor queue will normally be, so on a multi-role server running near 100% utilization, contention may only be a significant factor once System\Processor Queue Length exceeds something like 10 rather than 5.
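The rule of thumb above can be expressed as a check over paired samples of Processor(_Total)\% Processor Time and System\Processor Queue Length. This is a sketch under assumptions: "near 100%" is taken to mean 95% or more, "consistently" to mean at least 80% of those busy samples, and the `contention_suspected` helper is hypothetical. The queue thresholds of 5 and 10 come from the text above.

```python
def contention_suspected(samples, multi_role=False,
                         busy_percent=95.0, min_fraction=0.8):
    """Flag likely processor contention.

    `samples` is a list of (processor_percent, queue_length) pairs taken
    at the same instants. Only samples at or above `busy_percent` (an
    assumed reading of "near 100%") are considered; contention is flagged
    when the queue length exceeds the threshold in at least `min_fraction`
    of them (an assumed reading of "consistently").
    """
    queue_threshold = 10 if multi_role else 5  # thresholds from the text
    busy_queues = [queue for cpu, queue in samples if cpu >= busy_percent]
    if not busy_queues:
        return False  # never near 100%, so the queue rule doesn't apply
    over = sum(1 for queue in busy_queues if queue > queue_threshold)
    return over / len(busy_queues) >= min_fraction


samples = [(98.0, 7), (99.0, 8), (97.0, 6), (60.0, 1), (99.0, 9)]
print(contention_suspected(samples))                   # True
print(contention_suspected(samples, multi_role=True))  # False
```

The same queue readings count as contention on a single-role box but not on a multi-role server, which mirrors the higher threshold suggested above. On a web server you would still check the ASP or ASP.NET request queues separately.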
