03 June 2012

The EC2 instance became unreachable. The Amazon console dumped the following from the syslog:

[2803865.768008] INFO: rcu_sched_state detected stall on CPU 0 (t=7805190 jiffies) [2803865.768014] INFO: rcu_sched_state detected stall on CPU 1 (t=7805190 jiffies) [2804045.888006] INFO: rcu_sched_state detected stall on CPU 1 (t=7850220 jiffies) [2804045.888007] INFO: rcu_sched_state detected stall on CPU 0 (t=7850220 jiffies) [2804226.008008] INFO: rcu_sched_state detected stall on CPU 0 (t=7895250 jiffies) [2804226.008010] INFO: rcu_sched_state detected stall on CPU 1 (t=7895250 jiffies) [2804406.128007] INFO: rcu_sched_state detected stall on CPU 1 (t=7940280 jiffies) [2804406.128006] INFO: rcu_sched_state detected stall on CPU 0 (t=7940280 jiffies)

After the second reboot, the instance resumed normal operation. Right now, it is an N of 1, but I will report back if this happens again and do some additional troubleshooting at that point -- there was no unusual spike in activity around the time when the instance went down. Unfortunately, I'm not alone with this problem: Google results.

July 6 - A month later, I encountered the same problem with the same resolution after rebooting twice. My faith in AWS has been shaken between this and the power outage a couple weeks ago -- it's been an unfortunate summer so far.

Technologies:


blog comments powered by Disqus