Last week, a major AWS outage took down gaming behemoths like Fortnite and Roblox and caused widespread disruption to important services and programs outside the gaming world. The cause has now been identified: the massive outage was the result of a single software bug.
As initially reported while Amazon was trying to figure out the cause of the 15-hour outage, the problem originated in the DNS management system of DynamoDB, AWS’s database service. As reported by PC Gamer, the issue lay with DNS Enactor, a component of that system that constantly updates domain lookup tables to stay on top of load balancing as conditions change.
The bug surfaced when one DNS Enactor began to lag, having “experienced unusually high delays needing to retry its update on several of the DNS endpoints.” Meanwhile, the DNS Planner continued to generate new plans, and while the first Enactor was delayed, a second DNS Enactor began to implement those plans.
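To illustrate why a lagging worker is risky in a setup like this, here is a minimal Python sketch of the general hazard when two workers apply numbered plans out of order. It is not AWS's code: the `Plan`, `Endpoint`, and `simulate` names, the generation numbers, and the example addresses are all hypothetical, chosen only to show how a late, stale update can clobber a newer one unless the apply step checks plan age.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    generation: int  # hypothetical counter: higher means newer
    records: dict    # hostname -> list of IP addresses


class Endpoint:
    """Toy stand-in for the DNS state of one endpoint (an assumption)."""

    def __init__(self):
        self.applied = None  # the Plan currently in effect, if any

    def apply_unguarded(self, plan):
        # Last writer wins, even if the plan is older than what is in effect.
        self.applied = plan
        return True

    def apply_guarded(self, plan):
        # Refuse any plan that is not newer than the one already in effect.
        if self.applied is not None and plan.generation <= self.applied.generation:
            return False
        self.applied = plan
        return True


def simulate(guarded):
    """Return the generation left in effect after a delayed, stale apply."""
    old_plan = Plan(1, {"service.example": ["192.0.2.1"]})
    new_plan = Plan(2, {"service.example": ["192.0.2.2"]})

    endpoint = Endpoint()
    apply = endpoint.apply_guarded if guarded else endpoint.apply_unguarded

    apply(new_plan)  # the second worker applies the newer plan on time
    apply(old_plan)  # the delayed first worker finally applies its older plan
    return endpoint.applied.generation


if __name__ == "__main__":
    print("unguarded:", simulate(guarded=False))
    print("guarded:", simulate(guarded=True))
```

In this toy model, the unguarded run ends with the older plan (generation 1) in effect, while the guarded run keeps the newer one (generation 2). A real system would also need the check and the write to happen atomically, which this sketch does not attempt.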