O'Reilly Community: The AWS Outage: The Cloud's Shining Moment: if your systems failed in the Amazon cloud this week, it wasn't Amazon's fault. You either deemed an outage of this nature an acceptable risk or you failed to design for Amazon's cloud computing model. ... two dueling architectural models of cloud computing applications: "design for failure" and traditional. ... The Amazon model is the "design for failure" model. Under the "design for failure" model, combinations of your software and management tools take responsibility for application availability. The actual infrastructure availability is entirely irrelevant to your application availability. 100% uptime should be achievable even when your cloud provider has a massive, data-center-wide outage. ... The advantage of the "design for failure" model is that the application developer has total control of their availability with only their data model and volume imposing geographical limitations. The downside of the "design for failure" model is that you must "design for failure" up front. ... Physical redundancy encompasses all traditional "n+1" concepts: redundant hardware, data center redundancy, the ability to do vMotion or equivalents, and the ability to replicate an entire network topology in the face of massive infrastructural failure. ... If you had redundancy across availability zones, you would have survived every outage suffered to date in the Amazon cloud. ... If you had regional redundancy in place, you would have come through the recent outage without any problems except maybe an increased workload for your surviving virtual resources. ... Cloud redundancy enables you to survive the complete loss of a cloud provider. ... Applications built with "design for failure" in mind ... will achieve uptimes you can't dream of with other architectures and survive extreme failures in the cloud infrastructure. ... no humans, no 2am calls, and no outage! ... Netflix, an AWS customer that kept on going because they had proper "design for failure" ...? Try doing that in your private IT infrastructure with the complete loss of a data center.
I should have, but I did not expect this to happen. Servers are known to go down. Heck, PCs crash. The browser freezes. The cloud went down. In a big way. What's next? Datacenters? I think it did happen once. One Google datacenter went down. Correct me if I am not remembering it right. What if Facebook's datacenter in Oregon went down for an hour?
So the cloud went down. And there has been much talk. Amazon Web Services is pretty much the cloud that most of us are privy to. And you thought Jeff Bezos was in the business of selling books.
The cloud should not go down. The cloud cannot go down. It is like a power cut where the generator turns on by itself immediately, so although the power was cut, you did not feel it. The cloud needs that mechanism. Otherwise it is not a proper cloud. The cloud is not like the rest of us. The cloud is not supposed to go down.
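The generator idea, and the "redundancy across availability zones" from the O'Reilly piece, can be sketched in a few lines of Python. This is only a toy illustration, not how Amazon or Netflix actually do it: the zone names, the `down` and `up` handlers, and the `call_with_failover` helper are all made up for the example. Each replica is tried in order, and the caller never notices that the first one was dead.

```python
from typing import Callable, List, Sequence, Tuple


class AllReplicasFailed(Exception):
    """Raised when every replica in the failover list has failed."""


def call_with_failover(
    replicas: Sequence[Tuple[str, Callable[[], str]]]
) -> Tuple[str, str]:
    """Try each (name, handler) pair in order and return the first success.

    Returns (name_of_replica_that_answered, response).
    """
    errors: List[str] = []
    for name, handler in replicas:
        try:
            return name, handler()
        except ConnectionError as exc:
            # This replica is down; record it and move on to the next one.
            errors.append(f"{name}: {exc}")
    raise AllReplicasFailed("; ".join(errors))


def down() -> str:
    """Stand-in for a replica in a zone that is having an outage."""
    raise ConnectionError("zone unreachable")


def up() -> str:
    """Stand-in for a healthy replica."""
    return "200 OK"


# Hypothetical zone names, for illustration only.
replicas = [("zone-a", down), ("zone-b", up)]
served_by, response = call_with_failover(replicas)
print(served_by, response)
```

The generator only helps if it is already installed before the power cut. In the same way, the second replica has to be up and running before the first one fails, which is the "up front" cost the quoted article mentions.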