Downtime Reports: How Did They Respond?

I like reading downtime reports, because they show what can go wrong and how people and departments respond to a crisis. Two sites experienced downtime over the weekend: one very well known and one not.

WordPress.com went down over the weekend, disrupting thousands of blogs, including those of VIP subscribers. According to the report, an unscheduled change was made to a router at the data hosting company, and as a result wordpress.com responded to only a fraction of incoming requests. This meant that wordpress.com was not down, just inaccessible to 90% of incoming traffic. The failover mechanism was not activated, presumably because the host itself was not down: the server was running fine, but its ability to serve web pages was hampered.

This suggests the following areas for improvement (speaking generally):

  • Use some form of change control, and test changes when they are made. This unscheduled change very likely affected other customers as well, not just wordpress.com.
  • Monitor not just the server, but paths into the server – everything between the customer and the server.
  • Failover mechanisms should be sensitive to not just server performance, but anything that affects the presenting of web pages to the public (or whatever service is being offered).
  • Relying on a single hosting provider at any one time means that any problem at that provider affects your entire service. Relying on multiple providers in a cluster configuration means that if one provider drops out, your service continues (though slightly degraded).
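The second and third points can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how WordPress.com or its host actually works: the names ProbeResult, success_rate and should_fail_over are hypothetical, the probes are assumed to come from monitoring locations outside the hosting provider, and the 0.5 threshold is an arbitrary placeholder. The idea is that the failover decision looks at how many outside requests actually get a page back, not only at whether the server is alive.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    vantage_point: str  # hypothetical name of the outside location that ran the probe
    served_page: bool   # True if a page actually came back over the public path


def success_rate(results):
    """Fraction of probes that received a page; 0.0 when there are no results."""
    if not results:
        return 0.0
    return sum(1 for r in results if r.served_page) / len(results)


def should_fail_over(server_alive, results, min_success_rate=0.5):
    """Decide on failover from what visitors see, not only from server health.

    min_success_rate is an assumed threshold, not a recommended value.
    """
    if not server_alive:
        return True
    if not results:
        # Assumption: with no outside data, do not fail over blindly.
        return False
    return success_rate(results) < min_success_rate


# A scenario resembling the outage described above: the server is healthy,
# but only 1 in 10 outside probes gets a page back.
probes = [ProbeResult(f"probe-{i}", served_page=(i == 0)) for i in range(10)]
print(should_fail_over(server_alive=True, results=probes))
```

A check that looks only at server_alive would have left the failover idle in this scenario, which matches what the report describes.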

The other site that went down was jdorganizer.com (the web site for Jeri Dansky: Professional Organizer). Since she was a system administrator before becoming a professional organizer, she knows IT. As a user, she had to respond to the outage she experienced (again caused by the data hosting provider).

Jeri explains on her blog what happened, and how she responded as a user of services. She lists the things she learned from the experience, in particular preparing a disaster plan and reviewing it.

Another thing she did was to switch providers when she no longer trusted hers to provide reliable services; being of a technical bent, she was able to make the switch and configure things reasonably easily. She had someone check availability and fixed the problems that arose.

Both of these experiences provide a window into how companies and other users of hosting services can respond when things fail. In both cases the providers failed, and the way their customers responded can help us learn what to do if and when it happens to us.

Kudos to the WordPress.com team for keeping the blogs running, and kudos to both for being willing to tell us what happened (in delightfully complete technical detail…).

2 thoughts on “Downtime Reports: How Did They Respond?”

  1. Hi,

    I enjoy reading these sorts of posts as well. It’s always fascinating to see how others deal with events, from a technical and customer service perspective. Another one I saw recently was from the MicroISV on a Shoestring blog, discussing an outage with their Bingo Card Creator software. It’s at http://www.kalzumeus.com/2010/02/21/i-had-downtime-today-heres-what-im-doing-about-it/ if you’ve never seen it. I thought the way they reached out to the affected customers was quite interesting on that one.

    Jim.

  2. David, I’m glad to hear you found my account useful. It certainly helped that, unlike many small business owners, I manage my own web site. I could make my own call that it was time to move – and then do it.

    I’ve been keeping up with the issue at my prior hosting company, and 10 of the shared servers are STILL not up. People are noting that some machines that came up are failing again. It’s a nasty situation.
