- August 14, 2018
- Matthew F. Fox
On August 14th, 2003, exactly 15 years ago, a cascade of human and technical errors led to a blackout that affected most of the Northeast. At least 50 million people, including many in Canada, were left without power for 24 hours, and in some cases for as long as two weeks.
The blackout’s primary cause was a software bug in the alarm system at the control room of FirstEnergy Corporation, an Akron, Ohio–based company. The silent alarm left operators unaware of the need to redistribute load after overloaded transmission lines sagged into overgrown foliage. What should have been a manageable local outage cascaded into a collapse of the regional grid.
I remember the day well; in fact, I remember exactly where I was when the power went out – the men’s room of the office I worked in at the time. I was the IT Manager at a non-profit organization, NFTE, and my office was directly across the hall from the men’s room.
At first, I assumed that we were having a minor power problem in the building – something that would happen from time to time. It became clear that more was going on once the UPS units protecting the servers began to sound the alarm.
The blackout took place in the wake of 9/11 and just a few months after I completed a disaster recovery project for the organization, providing redundancy for most services and applications on the west coast in case we experienced a disaster in the northeast. The disaster recovery system became active as I performed graceful shutdowns of the servers located in the office, and while we didn’t have electricity to access anything on the west coast, email and the organization’s websites were still up and running with little to no interruption.
With the local servers shut down and their backups up and running on the opposite coast, I took a leisurely (not really) walk down 29 flights of stairs, met up with some coworkers, and began my journey home.
My journey home took a detour once it became clear that getting back to Long Island wasn’t going to be easy. There was no mass transit: no subway, no LIRR, no buses. All of the major roadways connecting Manhattan to Long Island were packed with people on foot, and that wasn’t something I wanted to be a part of.
Instead of trying to make my way home, I stayed in the city and had a great night. I spent the evening walking through the Financial District and Chinatown, enjoying free ice cream and drinks that were handed out by shop owners with no available refrigeration, ultimately ending up at a coworker’s apartment in Alphabet City.
The entire area was one huge block party by sundown. Pizzerias with gas-powered ovens were making pizzas and giving them out to anyone within arm’s reach, bakeries were handing out pastries before they could spoil, and battery-powered boomboxes filled the crowded streets with a soundtrack for a hot summer night without power.
Preparation is Key
My day could have been very different if I hadn’t had the appropriate backup systems and redundancies in place. Instead of taking a walk along the east side of the city with an ice cream cone in each hand, I could have been stuck in my office wondering whether my servers and the data stored on them were safe, and whether I’d be able to bring everything back online once the power was restored.
UPS units protected my servers from power-related damage and allowed me to perform graceful shutdowns on each, ensuring that all services terminated properly with no risk of data loss or corruption.
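The UPS-driven graceful shutdown described above can be sketched as a small shell script. This is a minimal illustration, not what NFTE actually ran in 2003: the status tokens (`OB` for on battery, `LB` for low battery) follow the convention used by Network UPS Tools, and the service names are hypothetical placeholders.

```shell
#!/bin/sh
# Hedged sketch: decide whether to begin a graceful shutdown based on a
# UPS status string. "OB"/"LB" are the on-battery/low-battery tokens
# used by Network UPS Tools; service names below are placeholders.

should_shutdown() {
  # Shut down only once the UPS reports low battery; running on battery
  # alone ("OB") is not yet reason to stop services.
  case "$1" in
    *LB*) return 0 ;;   # low battery: start graceful shutdown
    *)    return 1 ;;   # on line power, or on battery with charge left
  esac
}

graceful_shutdown() {
  # Stop services in dependency order so each flushes its data to disk
  # before the host halts. Names are illustrative only.
  echo "stopping: mail-service"
  echo "stopping: file-service"
  echo "halting host"
}

STATUS="OL"   # example input: on line power
if should_shutdown "$STATUS"; then
  graceful_shutdown
else
  echo "power OK: $STATUS"
fi
```

In a real deployment this decision loop would be handled by the UPS monitoring daemon itself, which invokes a shutdown hook like the one above when the battery runs low.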
All major file stores and business applications were mirrored to servers at a datacenter located in California, and were online within minutes of the blackout. A backup Exchange server, also in the California datacenter, was online within seconds of the blackout.
Active Directory was replicated to two separate domain controllers in California, and a terminal server allowed staff around the country to continue their work with minimal interruption.
The next week, the organization’s new CFO came onboard. After learning about how well we were able to handle the blackout, he shared a line with me:
Proper planning prevents poor performance.
I looked at him and said, “Well sure – but in this case, proper planning prevented poor partying.”
We shared a laugh and it was back to work as usual.