As most of you are aware, our services were temporarily unavailable on Monday night and again on Tuesday afternoon. Now that everything is stable, we'd like to explain what we know about what happened and answer the questions that came up most often during the incident. Before we get to the technical details, the most important thing we want to say is that we understand how serious this downtime was, and we offer our sincerest apologies to you and your users for having to go through it. We know many of you run important registrations, campaigns, contests and services through Wufoo and, for some, the timing couldn't have been worse. We completely empathize, and we hope the following explains our position and our options for the future, and helps restore some of the confidence we lost.
### So, what happened exactly?
During both outages, the problem was the same: the data center lost all power, which took down every site and service hosted there, including ours. A power loss on this scale affects all core-level services, which significantly increased the time needed to bring every server back online. Networking on the critical servers had to be brought up first, and then every application server had to go through a crash recovery process.
We're still working with our data center to understand how and why this happened, and how they will prevent it from happening again. When we picked this data center, its power system was one of our key criteria. The main elements of their power equipment (UPS, generator and power control) are all good systems with ample capacity. We're still trying to understand what it was about how these are put together, or how they've been managed, that led to the failure we experienced.
### What triggered the power outage?
In both cases, there was a significant reduction in the voltage to the building from the local electric utility.
### Don't Wufoo's servers have a backup power supply?
We have redundant circuits on two separate power systems. Additionally, our UPS systems turned on as expected and provided power for another hour after the outage occurred. The third and final resort in a failure like this is a generator. Once an outage reaches the level we had earlier this week, we must rely on the company that runs the data center to get emergency systems like the generator up and running. We don't have the details yet, but there was apparently some difficulty switching over to generator power, so the UPS reserves on our systems ran out. Again, we're seeking more details about this.
### Once power came back, why was Wufoo down for so long?
It is tough to prepare for a worst-case scenario, and we can assure you that everyone at BitPusher (our server management team) was moving as fast as they could. Realistically, recovery from a situation like this will always take at least an hour, since all core servers need to be restored and every application server has its own crash recovery process that must complete. On Tuesday, the process went more smoothly and took approximately an hour from the time power was restored.
### What improvements are you going to make?
Continuing in the current environment without significant changes to ensure this won't happen again is unacceptable. We are examining three different directions: better power in the existing facility, moving our infrastructure to a different facility, and working with third-party hosting partners.
We're also working on better strategies and methods for communicating with our users during incidents like this. While we responded (as always) to every single email sent in during the outage with updates as they came in, we also want you to have a way to follow what we're doing behind the scenes without having to contact us.
Many of you followed our updates on our [Twitter page](http://twitter.com/wufoo/) and our [Wufoo Status blog](http://wufoo.tumblr.com), and that approach seemed to work well for everyone. We're now working on enhancing the Wufoo Status blog so that it provides daily updates, a feed of our uptime status, and additional information. More on that to come as we improve those processes.
### Why was there no downtime page?
Normally, Wufoo shows a styled downtime page when a form is unavailable, so that embedded forms still look professional. During these outages, there was no downtime page and every request to our servers timed out. This is because our current downtime system relies on the load balancer, and since the load balancer also lost power, there was nothing left to serve the downtime page. We're currently looking at options for serving downtime pages from another facility.
### Is my data safe?
Yes. There was no data loss during the power outage and all servers successfully completed crash recovery without issue.
### This experience hurt our company. We would like a refund.
We completely understand, and again we apologize for the inconvenience. If you feel these outages directly interfered with your primary purpose for using Wufoo, please contact us via [support](/support/) and we would be happy to refund this month's service.
### End Note
During the outage, many of you were remarkably supportive, and we're extremely grateful for your understanding. We've always believed that we have the best kinds of people using Wufoo, and many of your actions this week proved it. Our entire team sincerely thanks all of you for your patience, and we hope to live up to that goodwill by keeping incidents like this few and far between.
Comments
Thanks for the rundown on the situation. It was a real pain in the *ss, but somehow I think you lot had a worse day at work than most of us. For a service that has been almost flawless to date, it must hurt to have this happen, so no hard feelings. Just please please please don't do it again!
Andy
Posted May 21st, 2009 by Andy Turley.
The twitter update feed was a great way for us to know what was happening. Thanks for keeping everyone in the loop.
Posted May 21st, 2009 by Jonathan.
When I found the site down, I was able to find you on twitter immediately and stay tuned for the latest. Fantastic use of twitter, kept me updated. You have an awesome service and even under this stress you still gave star service and thought outside the square. Well done!
Posted May 21st, 2009 by Kim Buchanan.
Bravo handling what was obviously a nasty situation with a 3rd party datacenter provider. I'm bookmarking this so that if our services ever go through the same disaster scenario, I know who to emulate!
Posted May 21st, 2009 by Tyler Smith.
Thanks for the candid and straightforward explanation. I, for one, will stay with you…
-g
Posted May 21st, 2009 by Gary.
From my perspective, y'all handled this with grace and candor. Your Twitter updates were very helpful and to find your status RSS in the text above is excellent. I'm a big WuFan of you wufooligans, keep up the great work, your "honesty is the best policy" approach to customer service, and I'll keep singing your praises!
Posted May 21st, 2009 by Brett.
All questions about redundancy(!) aside, I overwhelmingly appreciate the updates on Twitter and the straightforward, direct acknowledgment of what happened, why it matters and what the future holds. wufoo is a great service and the consistent updates are always a warm welcome when logging in. I'm one of those people it couldn't have been worse for, but in the end, the response from wufoo has been delightful. Now, how about a second datacenter? 🙂
Posted May 21st, 2009 by Tony Webster.
I think the fact that your service has been flawless for the two years that I've used it made it such a surprise for my staff. Wufoo is so popular at my organization that I've had near uprisings when I've suggested other solutions instead of wufoo for a project.
Although the outage caused some issues, I sounded in control of the situation when, only minutes into the second outage and while on the phone with the C.O.O., I was able to give real-time updates from your Twitter feed. If nothing else, you have provided a perfect example of a benefit of Twitter for my next Social Media presentation. Thank you for such a fantastic service.
Posted May 21st, 2009 by Chris Groves.
Still not working for me out in Singapore.
Posted May 21st, 2009 by Patrick.
The form is there, but I can't enter any info. Hate to be a bother, but can you check on this please? Thanks, woofoo rocks
Patrick, can you please send a link to your form to support@wufoo.com. If the form is being displayed then it sounds like you’re probably experiencing something unrelated to the outage.
Posted May 21st, 2009 by Chris Campbell.
Everyone needs a hug. Great content as usual. Keep it up! Thanks again.
Posted May 21st, 2009 by website design.