The top command has always been very helpful when tracking down load issues on a server. However, when I started using EC2, I noticed that the wa stat was accounting for most of the CPU time. That was never a problem on our other servers, so I had to go look up wa. It turns out that drive access is incredibly slow on EC2, even when using their EBS (which is supposed to be faster). I resolved this by offloading all of the static files to a different box. Requests for the CSS and image files were hogging the I/O enough to cause problems for the PHP processing. Adjusting the APC configuration improved things further, so that there were fewer PHP hits to the hard drives too. At this point, MySQL is the primary I/O culprit remaining.
For reference, here is what each of the CPU stats in top means:
- us -> User CPU time: The time the CPU has spent running users’ processes that are not niced.
- sy -> System CPU time: The time the CPU has spent running the kernel and its processes.
- ni -> Nice CPU time: The time the CPU has spent running users’ processes that have been niced.
- wa -> iowait: Amount of time the CPU has been waiting for I/O to complete.
- hi -> Hardware IRQ: The amount of time the CPU has been servicing hardware interrupts.
- si -> Software Interrupts: The amount of time the CPU has been servicing software interrupts.
- st -> Steal time: The time a virtual CPU has spent waiting while the hypervisor services another virtual processor (virtual machines only).
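
If you want to watch these numbers from a script instead of staring at top, here is a rough sketch of how the percentages can be derived. It assumes a Linux box where the first line of /proc/stat holds the cumulative counters in the order user, nice, system, idle, iowait, irq, softirq, steal. The function names are my own, not part of any tool, and this only approximates what top displays.

```python
# Abbreviations as shown by top, listed in the order the counters
# appear on the aggregate "cpu" line of /proc/stat (Linux assumed).
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def parse_cpu_line(line):
    """Turn the aggregate 'cpu' line into a dict of cumulative tick counts."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels may report fewer columns; treat missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Percentage of time spent in each state between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def read_cpu_sample(path="/proc/stat"):
    """Read one sample of the cumulative counters (hypothetical helper)."""
    with open(path) as stat_file:
        return parse_cpu_line(stat_file.readline())
```

To use it, call read_cpu_sample() twice about a second apart and pass both results to cpu_percentages(). The counters are cumulative since boot, so only the difference between two samples tells you what the CPU is doing right now. A consistently high wa value is the same symptom described above.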