As some of you noticed, including a number of major media organizations, WordPress.com had some unexpected downtime on Thursday evening. Whether you’re eating delicious BBQ, as I was, watching a marathon, or about to post your opus, downtime is an annoying interruption and we hate it.
This had nothing to do with our network providers, or data centers, or aliens; it was completely our fault. A single line, nay, a single out-of-place character, slipped by our normal review and testing and started overwriting settings when triggered. The team immediately took the site down to prevent further damage and to clean up the mess that had been caused. All hands were called on deck.
First we determined that 11.2 million blogs were unaffected by the bug. So we brought those back up. For the remaining 50,000 or so, including some VIPs, we started restoring the lost settings using backups, audit trails, and logs. This was largely automated and we brought blogs back online as they were fixed, but a few final tricky ones were brought back one-by-one by hand because we wanted to make sure everything was in its right place.
For most folks (99%), your site was unavailable for only an hour; the rest came up a bit after that, and we worked on the tricky ones until Friday morning. Fortunately, because of the time of day and the shorter duration, this had a smaller effect on traffic (about 3.9m) than the last time (5.5m).
As a silver lining to this failing of the cloud, we learned a lot. We’ll be using our newfound experience to keep WP.com a safe, stable, and robust place to hang your hat and have your blog call home.
If you have any questions, notice any remaining wonkiness, or just want to say howdy, we’d be happy to hear from you.