‘Maintenance error’ cause of Facebook outage

An interruption to the service of social media network Facebook on 4 October was the result of maintenance-related configuration changes that triggered a large-scale disruption to communication between data centres.

In a blog post, Facebook’s vice president of infrastructure Santosh Janardhan detailed the events that led to Facebook – and its family of apps including Instagram, WhatsApp and Messenger – going offline for more than five hours. During maintenance of the network’s ‘backbone’, a command was issued to assess how much capacity was available. However, the command failed, and an audit tool designed to stop mistaken commands failed to identify the error.

This single fault quickly led to the disconnection of links between data centres and the internet. Further complication was added when Facebook’s engineers were initially unable to restore access because its data centres are heavily protected and employees could not gain immediate entry.

Janardhan said: “Every failure like this is an opportunity to learn and get better, and there’s plenty for us to learn from this one. After every issue, small and large, we do an extensive review process to understand how we can make our systems more resilient. That process is already underway.

“We’ve done extensive work hardening our systems to prevent unauthorized access, and it was interesting to see how that hardening slowed us down as we tried to recover from an outage caused not by malicious activity, but an error of our own making. I believe a trade-off like this is worth it – greatly increased day-to-day security vs. a slower recovery from a hopefully rare event like this. From here on out, our job is to strengthen our testing, drills, and overall resilience to make sure events like this happen as rarely as possible.”
