What are the possible causes of slow workstation performance?
We have all run into it before–the dreaded, vague complaint that “the network is slow.” We all know that sometimes it is real, and sometimes it isn’t. It is so hard to take at face value, because often this is all of the information that your users can give you to describe the problem. How do you go about troubleshooting something with both so many possible causes and so much subjectivity?
First, it can help to think through the possible areas that could slow down the user’s experience, and eliminate the simplest causes first. Here are some possibilities:
- A hardware problem on the user’s PC.
- A resource limitation on the user’s PC.
- A hardware problem on a key server.
- A resource usage limitation on a key server.
- An application or service bug or hang.
- A network problem at the physical layer.
- A network congestion issue.
First steps: check out a client PC with the symptoms
Is the problem happening right now? Go and talk to the user or users who made the complaint. Find out what patterns they have noticed about the problem (but be prepared that they may remember incorrectly or have noticed a false pattern). Try to have them narrow it down to a time of day, a frequency, a certain application, and whether it involves a locally hosted server or a service accessed over the Internet.
Check Task Manager on the user’s PC and look at running applications, CPU usage, and memory usage. Make a quick review of the event log in the “Administrative Events” view as well, and note any errors that look like they may be related. Keep in mind that disk or network card issues may be logged as “information” events rather than warnings or errors, and that filtered view will not show them.
Look over the user’s shoulder as they walk you through the steps that they take to reproduce the problem. Don’t assume that the way a user described a problem matches what they actually do. Watching may show you that they perform tasks in an unusual way, and it keeps you from wasting time on the wrong problem (e.g., you may assume that they use Outlook, but in fact they have been using a web client). Avoid judgment at this point.
Run some basic tests in an elevated command prompt.
- Run an nslookup against a local host, a remote host, and a remote host that you don’t expect to have a locally cached response for. Watch for failures and slow responses.
- DNS could be misconfigured at the client or on your domain controller, or a DNS server could simply be offline.
- Ping a few servers, internal and external. Look for packet loss and slow response times. As a rough guide, internal responses should be under 1 ms and external responses somewhere in the 10-50 ms range, with 0% packet loss. High packet loss may indicate congestion or a physical network problem.
- Packet loss across multiple servers on the same network could indicate a physical layer problem. Loss limited to one machine could point to the NIC on the workstation or server, or to high CPU usage on the server or its hypervisor.
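If you want to repeat the name lookup test across several machines, it can be scripted. The following is a minimal Python sketch, not a replacement for nslookup: it times lookups through the operating system's resolver (`socket.getaddrinfo`), so a locally cached answer will look fast, whereas nslookup queries the DNS server directly. The function names and the 200 ms threshold are assumptions made for illustration.

```python
import socket
import time


def time_lookup(host, resolver=socket.getaddrinfo):
    """Return (elapsed_ms, error) for one name lookup of `host`.

    `resolver` defaults to the system resolver; it is a parameter only
    so that the timing logic can be exercised without a network.
    """
    start = time.perf_counter()
    try:
        resolver(host, None)
        error = None
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        error = str(exc)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms, error


def slow_or_failed(results, threshold_ms=200.0):
    """Pick out hosts whose lookup failed or exceeded threshold_ms.

    `results` maps a host name to the (elapsed_ms, error) tuple returned
    by time_lookup. The 200 ms default is an arbitrary starting point.
    """
    return [host for host, (ms, err) in results.items()
            if err is not None or ms > threshold_ms]
```

To use it, build a dictionary such as `{h: time_lookup(h) for h in hosts}` with a local host, a well-known remote host, and a remote host you would not expect to be cached, then pass it to `slow_or_failed`.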
Try restarting the workstation and check physical connections (patch cables, etc.).
Open the tools that the user uses throughout the day. For example, open and save a file on a network drive, send an email, open a web page. Try running a speed test using speedtest.net or your own ISP’s speed test page.
If you use DFS, make sure that the user is set to use the proper DFS replica.
Troubleshooting servers
If the troubleshooting at the user’s desk gave you any insight, check for server problems, starting with the implicated servers. Consider setting up a monitoring server (System Center Operations Manager, Nagios, Zabbix) if possible!
- Open task manager. Check for high CPU usage, and if found identify the process or service that is the culprit. Google can often lead you to documentation to determine if it is safe to restart the process without user impact.
- Check for high memory usage.
- Check IO usage with Performance Monitor (formerly System Monitor), for example the disk queue length counter, and compare against theoretical max performance for your disk configuration.
- Check for free disk space and paging file size.
- Look for unusually high network usage.
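Of the checks above, free disk space is the easiest to script. Here is a small Python sketch using the standard library's `shutil.disk_usage`. The function names and the 10% threshold are assumptions for illustration. Pick a threshold that suits your own volumes.

```python
import shutil


def free_fraction(total_bytes, free_bytes):
    """Fraction of the volume that is still free, from 0.0 to 1.0."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def volumes_low_on_space(paths, min_free=0.10):
    """Return (path, fraction_free) for each volume below min_free."""
    low = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free
        fraction = free_fraction(usage.total, usage.free)
        if fraction < min_free:
            low.append((path, fraction))
    return low
```

For example, `volumes_low_on_space(["C:\\", "D:\\"])` run on the server itself would list any of those two volumes with less than 10% free.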
If a restart of the whole server or a single process seems to resolve the problem, monitor it for a few days and determine if you can add additional resources (virtual CPUs or RAM for a VM). If IO performance is slow, it will be a bigger project to move to faster disks/more disk spindles in a RAID array. Look at alternatives to reduce the load.
Look through the event log on suspect servers. Don’t rely just on the “administrative events” filtered view, because you may miss “information” level events about the disk or network card that can let you know about potential future hardware failures showing up now as just slowness.
Exploring network equipment
Let’s face it–the network is seldom the cause of “the network is slow!” complaints. But every once in a while, it is.
Local area network
Start by reviewing whether you noticed any network issues in your earlier tests. Dropped packets are the biggest symptom, and can be identified with a basic ping test. Identify the scope of the problem: are all users on a single switch experiencing it, or all users in the building? Is it limited to specific services? This can help you determine if the problem is in a network closet, your server room, or at your edge router or even ISP.
- Examine the logs on suspect switches and routers and look for a high number of collisions, CRC errors, or unusually high usage on a non-uplink port. Identify a suspect port and try swapping a cable.
- Look for network loops (try looking for frequent “spanning tree” events in the switch log file). Someone may have plugged both ends of a cable into a single device in a poorly configured network without spanning tree turned on.
- Look for known problems with your particular firmware revision, particularly if there was a recent hardware change (installing new VoIP phones caused a problem with our switches in one odd example, fixed by upgrading firmware).
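When comparing error counters across many ports, it helps to look at the change between two polls rather than the raw totals, since the counters are usually cumulative. The Python sketch below assumes you have already collected two snapshots in a hypothetical format (port name mapped to frames received and CRC errors). How you collect them (switch CLI, SNMP, a management tool) is left open, and the threshold is an arbitrary starting point.

```python
def error_ratios(before, after):
    """Per-port ratio of new CRC errors to new frames between two polls.

    `before` and `after` map a port name to a tuple of cumulative
    counters: (frames_received, crc_errors). Ports with no new frames
    are skipped. Counter resets between polls are not handled here.
    """
    ratios = {}
    for port, (frames_after, errors_after) in after.items():
        frames_before, errors_before = before.get(port, (0, 0))
        new_frames = frames_after - frames_before
        new_errors = errors_after - errors_before
        if new_frames > 0:
            ratios[port] = new_errors / new_frames
    return ratios


def suspect_ports(before, after, max_ratio=0.0001):
    """Return, sorted by name, the ports whose error ratio exceeds max_ratio."""
    ratios = error_ratios(before, after)
    return sorted(port for port, ratio in ratios.items() if ratio > max_ratio)
```

A port that stands out here is a good candidate for the cable swap described above.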
External wide area network
If the problem is linked to Internet usage, you will want to look at your firewall.
- Identify high network traffic compared to the available bandwidth in the firewall logs or a live firewall network utilization view.
- Use a packet capture or firewall specific tools to narrow down the type/source of the traffic.
- Try selectively blocking or rate-limiting traffic if it is possible.
- Call your ISP and perform appropriate troubleshooting on physical hardware, such as modems or cables linking to your Internet connection.
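Comparing traffic against available bandwidth is simple arithmetic once you have two readings of a byte counter. As a sketch in Python, with a hypothetical function name and the assumption that your firewall exposes a cumulative byte counter for the WAN interface:

```python
def utilization_percent(bytes_start, bytes_end, seconds, link_mbps):
    """Average link utilization over an interval, as a percentage.

    bytes_start and bytes_end are two readings of a cumulative byte
    counter taken `seconds` apart. link_mbps is the rated bandwidth of
    the connection in megabits per second.
    """
    if seconds <= 0 or link_mbps <= 0:
        raise ValueError("seconds and link_mbps must be positive")
    bits = (bytes_end - bytes_start) * 8
    mbps = bits / seconds / 1_000_000
    return 100.0 * mbps / link_mbps
```

For instance, 75,000,000 bytes transferred in 60 seconds is 10 Mbps on average, which is 10% of a 100 Mbps connection. Keep in mind that an average over a minute can hide short bursts that saturate the link.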
Advanced workstation troubleshooting
If the problem didn’t appear in a single session at the user’s workstation, and you have done basic troubleshooting on servers, the cause is likely an application or a Windows service on the workstation that misbehaves only intermittently.
Ask a small sample of users to keep a log of problems for a few days. Try running a continuous ping overnight to see if packet loss shows up only intermittently. If desired, to get more information, run a few tests of LAN bandwidth with iperf.
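An overnight ping produces far too many lines to read through, so it helps to summarize the results by hour. The Python sketch below assumes you have already turned your ping log into timestamped samples. That sample format is hypothetical, and plain ping output does not include timestamps unless you add them yourself.

```python
from collections import defaultdict
from datetime import datetime


def loss_by_hour(samples):
    """Summarize packet loss per hour of the day.

    `samples` is an iterable of (timestamp, replied) pairs, where
    timestamp is a datetime and replied is True if the ping got a reply.
    Returns {hour: percent_lost} for every hour that has samples.
    """
    sent = defaultdict(int)
    lost = defaultdict(int)
    for when, replied in samples:
        sent[when.hour] += 1
        if not replied:
            lost[when.hour] += 1
    return {hour: 100.0 * lost[hour] / sent[hour] for hour in sorted(sent)}
```

If one or two hours stand out, compare them against backup windows, scheduled jobs, and the times your users wrote down in their logs.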
Look for patterns in times of reported slowness. Look through the event log on the user’s PC and try to correlate application crashes or slowdowns with the times of reported problems. Try to reproduce the problem on your own PC. Look for activities with high CPU or memory usage that correlate with those times, such as application installation or Windows update installs. Take a look at scheduled tasks, and take some time to think about what might be scheduled automatically that is not in the Scheduled Tasks view (such as Configuration Manager software inventory, application installation, updates scan, etc.). Look through available Windows hotfixes to see if any apply. Disable any extensions in applications such as Microsoft Office and Internet Explorer.
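Correlating reported times with logged events by eye gets tedious with more than a handful of reports. Here is a minimal Python sketch of the idea, assuming you have exported the relevant events into a list of (timestamp, message) pairs. The function name, the data layout, and the 10 minute window are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def events_near_reports(report_times, events, window_minutes=10):
    """Match each reported slowdown to events logged close to it in time.

    `report_times` is a list of datetimes taken from the users' logs.
    `events` is a list of (datetime, message) pairs exported from the
    event log. Returns {report_time: [messages within the window]}.
    """
    window = timedelta(minutes=window_minutes)
    matches = {}
    for reported in report_times:
        matches[reported] = [message for when, message in events
                             if abs(when - reported) <= window]
    return matches
```

An event that turns up next to several separate reports is worth investigating first. Users often round their times, so a fairly generous window is reasonable.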
At the moment you notice the problem yourself:
- Check Sysinternals’ Process Explorer and sort by highest CPU, memory, and disk access (all three can show up as “slowness”).
- Look for processes with suspicious names and run malware scans.
- If you notice high usage of svchost.exe, set the individual services to run under separate service hosts to make it possible to narrow down to a single service.
- Look for hotfixes or documented problems with particular services that show up as exhibiting high CPU or memory usage.
All of this effort can be very time consuming, but how satisfying to play the detective for a few days if you can solve a user’s problem.