On Tuesday, June 8, 2021, a massive internet outage brought down a significant number of websites and applications. Like many such outages, this one was caused by a relatively small web player, Fastly. Fastly operates a content delivery network (CDN) that provides cloud services and local caching for major portions of the internet, so when it went down, the sites that depended on it went down with it.
As your application scales, it also becomes more complex. More scale and more complexity mean a higher risk of problems that could impact availability.
A well-known monitoring company suffered from serious availability problems while it was growing from a small to a midsize company. Its traffic was increasing dramatically, and its infrastructure couldn’t keep up. Worse yet, it didn’t always know when it was having a problem, and it certainly couldn’t predict when the next one would occur.
How do you avoid availability problems in your application? How do you mature your application as you scale so…