There are a couple of perfmon counters you can track to monitor for signs that your machine's hardware devices are functioning properly. One of these is System\Context Switches/sec, which measures how frequently the processor has to switch from user- to kernel-mode to handle a request from a thread running in user mode. The heavier the workload running on your machine, the higher this counter will generally be, but over long term the value of this counter should remain fairly constant. If this counter suddenly starts increasing however, it may be an indicating of a malfunctioning device, especially if you are seeing a similar jump in the Processor(_Total)\Interrupts/sec counter on your machine. You may also want to check Processor(_Total)\% Privileged Time Counter and see if this counter shows a similar unexplained increase, as this may indicate problems with a device driver that is causing an additional hit on kernel mode processor utilization. In this case you can drill down and maybe find the culprit by examining the Process(instance)\% Processor Time counter for each process instances running on your machine. This won't directly tell you which driver is utilizing processor time, but it may indicate which calling application is indirectly causing the problem and may help you troubleshoot the issue further.
If Processor(_Total)\Interrupts/sec does not correlate well with System\Context Switches/sec however, your sudden jump in context switches may instead mean that your application is hitting its scalability limit on your particular machine and you may need to scale out your application (for example by clustering) or possibly redesign how it handles user mode requests. In any case, it's a good idea to monitor System\Context Switches/sec over a period of time to establish a baseline for this counter, and once you've done this then create a perfmon alert that will trigger when this counter deviates significantly from its observed mean value.
No comments:
Post a Comment