Explained: The Reason Behind the Facebook Outage
The outage continued until the market closed, and the company's shares fell about 5% below their opening price on Monday. By mid-afternoon, services began to resume after Facebook reportedly sent a team to its Santa Clara data centre to "manually reset" the company's servers.
But what makes the outage unique is how completely Facebook went offline.
Facebook sent out a short tweet in the morning to apologize, saying "some people are having trouble accessing our apps and products." Then reports surfaced that the outage was affecting not just its users, but the company as well. Employees were reportedly unable to enter their office buildings, and staff called it a "snow day" - they were unable to get any work done because the outage affected internal collaboration applications as well.
Facebook did not comment on the cause of the outage, although security experts said the evidence pointed to a problem with the company's network that cut Facebook off from the wider internet and from its own internal systems.
According to John Graham-Cumming, CTO of networking giant Cloudflare, the first signs of trouble appeared around 8:50 am PDT in California. He added that "Facebook disappeared from the internet in a flurry of BGP updates" within a two-minute window, referring to Border Gateway Protocol, the system that networks use to discover the fastest way to send data over the Internet to another network.
The updates were specifically BGP route withdrawals. Essentially, Facebook had sent a message to the Internet that it was closed for business. Without any routes to its network, Facebook was cut off from the rest of the internet, and because of the way the Facebook network is structured, the route withdrawals also took down WhatsApp, Instagram, Facebook Messenger, and everything else inside its digital walls.
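To make the effect of a route withdrawal concrete, here is a minimal sketch in Python. It is a toy model, not how real BGP routers work: the ToyRoutingTable class, the "example-network" name, and the 192.0.2.0/24 prefix (a range reserved for documentation) are all assumptions made for illustration and have nothing to do with Facebook's actual configuration.

```python
import ipaddress


class ToyRoutingTable:
    """Hypothetical, heavily simplified view of the routes a router has learned."""

    def __init__(self):
        # Maps an announced prefix to the name of the network that announced it.
        self.routes = {}

    def announce(self, prefix, origin):
        # A network tells its neighbours: "send traffic for this prefix to me."
        self.routes[ipaddress.ip_network(prefix)] = origin

    def withdraw(self, prefix):
        # A network retracts the announcement; withdrawing an unknown prefix is a no-op.
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Return the origin of the most specific matching prefix, or None."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None  # No route: the traffic has nowhere to go.
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


table = ToyRoutingTable()
table.announce("192.0.2.0/24", "example-network")
before = table.lookup("192.0.2.10")   # route is known

table.withdraw("192.0.2.0/24")
after = table.lookup("192.0.2.10")    # route is gone

print("before withdrawal:", before)
print("after withdrawal:", after)
```

Once the prefix is withdrawn, the lookup finds nothing, which mirrors the situation described above: the servers may still be running, but the rest of the Internet no longer knows how to reach them.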
A few minutes after the BGP routes were withdrawn, users started noticing problems. Internet traffic that should have gone to Facebook was essentially lost and went nowhere, Rob Graham, founder of Errata Security, said in a tweet.
Users noticed that their Facebook applications had stopped working and that websites were not loading, and began reporting problems with DNS, or the domain name system, which is another critical part of how the Internet works. DNS converts human-readable web addresses to machine-readable IP addresses to find where a web page is located on the Internet. Without a way to reach Facebook's servers, applications and browsers kept returning what looked like DNS errors.
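The following Python sketch shows roughly what this looks like from an application's point of view. The describe_lookup function, the unreachable_resolver stand-in, and the service.example hostname are hypothetical names chosen for illustration; only socket.getaddrinfo and socket.gaierror come from Python's standard library. The stand-in resolver simulates a failed lookup so the example runs without touching the network.

```python
import socket


def describe_lookup(hostname, resolver=socket.getaddrinfo):
    """Resolve hostname and return (addresses, error_message).

    On success, addresses is a sorted list of IP strings and error_message is None.
    On failure, addresses is empty and error_message explains what went wrong.
    """
    try:
        results = resolver(hostname, 443)
    except socket.gaierror as exc:
        # This is the kind of error an app sees when a name cannot be resolved.
        return [], f"DNS lookup failed for {hostname}: {exc}"
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in results}), None


def unreachable_resolver(hostname, port):
    # Hypothetical stand-in for a resolver that cannot reach the name servers.
    raise socket.gaierror("name resolution failed")


addresses, error = describe_lookup("service.example", resolver=unreachable_resolver)
print(addresses)
print(error)
```

Calling describe_lookup with the default resolver performs a real lookup through the operating system, so it can be pointed at any hostname to compare a working name with a failing one.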
It is still not known exactly why the BGP routes were withdrawn. BGP, which has been around since the advent of the Internet, can be maliciously manipulated and exploited in ways that can lead to massive outages.
The most likely explanation is that a Facebook configuration update went badly wrong and its effects spread across the internet. A now-deleted Reddit thread, attributed to a Facebook engineer, described a BGP configuration bug before the cause became widely known. Even if the fix is simple, recovery can take anywhere from a few hours to a few days because of the way the Internet functions. Typically, internet service providers update their DNS records every few hours, but it can take several days for changes to propagate fully.
Facebook tweeted around 3:30 p.m. local time: "To the huge community of people and businesses around the world who depend on us: we're sorry. We've been working hard to restore access to our apps and services and are happy to report they are coming back online now. Thank you for bearing with us."