While testing a public AMI, I tracked down a serious problem along this route:
- After a period of intense traffic, Varnish began serving 503 errors.
- It turned out to be a backend error that could be resolved by restarting Apache.
- The Apache log showed no errors.
- I grepped for apache in /var/log/messages and found "apache2 invoked oom-killer".
- oom-killer kills processes when the system runs out of memory. It was apparently choosing Apache in some cases.
- Further research indicated that the public AMI was built without swap space. Seriously. No swap space.
- Rather than adding a partition, I simply added a swap file to resume my testing.
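
The two checks in the steps above (looking for oom-killer activity in the system log, and confirming whether any swap is configured) can be sketched in Python. This is a minimal sketch, not the procedure actually used here: the function names `count_oom_invocations` and `swap_total_kb` are hypothetical, and it assumes the kernel log lines contain the phrase "invoked oom-killer" as quoted above and that swap is reported on the `SwapTotal:` line of /proc/meminfo.

```python
import re
from collections import Counter

# Matches kernel log text such as "apache2 invoked oom-killer".
# The captured name is the process whose allocation triggered the
# oom-killer, which is not necessarily the process that got killed.
OOM_PATTERN = re.compile(r"(\S+) invoked oom-killer")


def count_oom_invocations(log_lines):
    """Tally oom-killer invocations by the name of the invoking process."""
    counts = Counter()
    for line in log_lines:
        match = OOM_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def swap_total_kb(meminfo_text):
    """Return SwapTotal in kB from /proc/meminfo-style text, or None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # Line format: "SwapTotal:       0 kB"
            return int(line.split()[1])
    return None
```

To use it on a live system, pass the open log file (for example /var/log/messages, which may need root to read) to `count_oom_invocations`, and pass the contents of /proc/meminfo to `swap_total_kb`. A result of 0 from the latter means the machine has no swap configured.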