Amazon S3 outage: A guide to getting over cloud failures
For five hours, the internet was broken and it was Amazon's fault. The Simple Storage Service (S3) outage not only took out the small and medium-sized enterprises that rely on Amazon services, but it also pulled the covers off the underpinnings of big vendors like Apple, exposing to the world that the Apple iCloud wasn't an Apple cloud at all. Instead, it's just a branding label slapped on another cloud vendor's clock cycles.
This nightmare wasn't supposed to happen. Cloud services aren't supposed to go down. More to the point, they aren't supposed to go down for five hours. And most perplexing of all, they're not supposed to go down when the tide is highest during the day, starting around lunch hour on North America's east coast and the start of the workday on the west.
Cloud computing's Fukushima moment
When future historians talk about the rise, fall and plateau of the cloud, the Amazon S3 outage will no doubt be seen as a Fukushima moment.
Some call nuclear a cheap and clean source of energy, and if you don't factor in the million-year maintenance fees for monitoring hazardous waste, there's a good argument to be made for harnessing the atom. Nuclear is clean, compact and has a relatively small geographical footprint compared to something like China's Three Gorges Dam. And, of course, nuclear can generate endless amounts of on-demand power. All of these are pretty compelling reasons to drill for uranium.
Of course, there's a fairly compelling reason to abandon nuclear, and it can be summed up in one word: Fukushima. You can argue the advantages of nuclear power until you're blue in the face, but so long as the long-shot possibility exists that a nuclear meltdown will poison the international food supply and turn the town you live in into a radioactive wasteland, people are going to choose windmills over deuterium-cooled reactors.
A ticking time bomb in the data center
It's the same thing with the Amazon S3 outage. Amazon can promise three nines, four nines or all the nines in the world, but so long as the possibility exists that its service will blow up in the middle of the day because someone with super-user rights typed in an incorrect command from a troubleshooting playbook, organizations that need stable and reliable systems will think twice about going whole hog on Amazon S3.
Amazon admitted that the Amazon S3 outage was due to the fact that its system index had grown to such a monstrous size that nobody really understood it, and nobody in the organization predicted that it could cause such a disastrous problem if restarted. That has left users wondering how many other ticking time bombs exist within the infrastructure. Before the S3 outage, people invested in the Amazon cloud because they were confident in both the technology being used and the manner in which it was managed. With the S3 outage, what was once confidence has been replaced with faith -- and faith isn't a compelling attribute when it comes to IT risk assessment.
Solving the cloud computing problem
Cloud advocates haven't been shy about proselytizing their proposed solution for dealing with future cloud outages. Their answer to the problem is simple: more cloud. After all, the Amazon S3 outage was confined to a single region, US-EAST-1 in Northern Virginia. If customers had leveraged other data centers, there would have been redundancy. Of course, it's a ludicrous assertion, as if software architects are going to build cloud clusters for their applications. It's like saying the best way to deal with the fact that nuclear generators might blow up is to just go and build more nuclear generators.
Alongside the promise of cost savings, the primary force driving organizations into the cloud was the chance to stop setting up convoluted environments of horizontally, vertically and geographically clustered servers just to maintain four nines of uptime. If IT departments can't depend on cloud service providers to solve the availability problem, the value proposition of using something like Amazon S3 dwindles to nothing.
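To put the "nines" in perspective, here is a rough sketch of the arithmetic, not anything taken from Amazon's service-level agreement. It assumes a 365-day year and treats the outage as a flat five hours; the function names are made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(nines: int) -> float:
    """Downtime allowed per year at N nines of availability (3 -> 99.9%)."""
    unavailability = 10 ** (-nines)  # e.g. 0.001 for three nines
    return MINUTES_PER_YEAR * unavailability


def years_of_budget_consumed(outage_minutes: float, nines: int) -> float:
    """How many years' worth of downtime budget a single outage uses up."""
    return outage_minutes / downtime_budget_minutes(nines)


if __name__ == "__main__":
    outage = 5 * 60  # the five-hour outage, in minutes (assumed flat figure)
    for n in (3, 4):
        budget = downtime_budget_minutes(n)
        used = years_of_budget_consumed(outage, n)
        print(f"{n} nines: {budget:.2f} min/year allowed, "
              f"outage uses {used:.2f} years of budget")
```

Under these assumptions, four nines allows a little under an hour of downtime per year, so a single five-hour outage burns through more than five years of that allowance.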
Bringing the cloud home
So, what's the future of cloud computing, now that users realize that a full-scale, daylight-hours crash is always a possibility? The move will be for organizations to start bringing more of their systems back into the local data center. Leveraging cloud computing technologies like OpenStack will allow organizations to build their own in-house data systems where the benefits of cloud computing can be realized without handing over full control to a third-party vendor. Not only does it put control back into the hands of the IT department, but other worries like controlling external costs and dealing with security and auditing are no longer a governance headache.
The other big move will be for organizations to leverage cloud bursting technologies, making use of the cloud only when their in-house systems approach capacity. But using the cloud exclusively will become a thing of the past. The Amazon S3 outage was a Fukushima moment for cloud computing, and it will forever taint the way organizations view the cloud.
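For readers unfamiliar with cloud bursting, the idea can be sketched in a few lines. This is a toy illustration only: the 80 percent threshold, the request counts and the function name are all assumptions, and a real setup would make this decision in a load balancer or scheduler rather than in application code.

```python
def split_workload(requests: int, local_capacity: int, local_in_flight: int,
                   burst_threshold: float = 0.8) -> dict:
    """Keep work in-house until the local data center nears capacity,
    then send only the overflow to a public cloud.

    burst_threshold is the fraction of local capacity that may be used
    before bursting starts (0.8 is an illustrative value).
    """
    # Room left in-house before the bursting threshold is reached.
    local_limit = int(local_capacity * burst_threshold)
    local_room = max(0, local_limit - local_in_flight)

    to_local = min(requests, local_room)
    to_cloud = requests - to_local  # overflow only
    return {"local": to_local, "cloud": to_cloud}


if __name__ == "__main__":
    # Quiet period: everything stays in the local data center.
    print(split_workload(requests=20, local_capacity=100, local_in_flight=10))
    # Peak period: in-house systems are near capacity, so the rest bursts out.
    print(split_workload(requests=30, local_capacity=100, local_in_flight=70))
```

The point of the design is that the public cloud handles the peaks while the baseline load, and control over it, stays in the local data center.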
You can follow Cameron McKenzie: @cameronmcnz
Interested in more of Cameron McKenzie's opinion pieces? Check these out:
- Software ethics and why ‘Uber developer’ stains a professional resume
- It was more than user input error that caused the Amazon S3 outage
- Don’t let fear-mongering drive your adoption of Docker and microservices?
- Stop adding web UI frameworks like JSR-371 to the Java EE spec
- Why the whole '12-Factor app' discussion is a fraud