AWS S3 Outage and Critical Infrastructure Attacks

Yesterday the AWS team released a summary of why their S3 services in the Northern Virginia (US-EAST-1) region were disrupted and some of the steps they’re taking to ensure it doesn’t happen again – in that service region or others.

For an outage that affected more than half the top 100 retail sites on the Internet, the hiccups that occurred because of Amazon AWS’ S3 failures earlier this week went by with relatively little public notice or media scrutiny.

As organizations continue to move their most critical Internet services to the cloud, and new businesses gestate and operate entirely inside the public clouds of a handful of global providers, a question now upon many lips is “how long before public cloud becomes critical infrastructure?”

For a rapidly growing number of organizations and online businesses, failure of the cloud for any length of time will not only cause irreparable damage to their business, but may wipe them out entirely.

Many will rightly point out that if the loss of your cloud provider’s services for a week is an extinction event for a company, then it’s critical that they undertake contingency plans to remove such a deadly dependency. But what about the traditional critical infrastructure services (e.g. Water, Power, etc.) that are also moving much of their customer servicing and personnel management systems to the cloud? How vulnerable are they to such a disruption? Does public cloud receive a critical national infrastructure designation through inheritance?

At some point in the near future (if not already), so much of the first-world economy will be tied to the businesses relying upon public cloud infrastructure that for all economic measurements, the cloud will have to be deemed critical national infrastructure.

Securing such an infrastructure is not a trivial task.

While the integrity and security of the data within public clouds is benchmark grade-“A” work, and the resiliency and recovery options that businesses can subscribe to offer near endless permutations within each major cloud provider’s cache of service offerings, this recent failure of the AWS S3 service in just one region was felt, and found painful, by many.

If such a small coding oversight, human error, or tweak to operations can cause a cascade failure event for a sizable chunk of the AWS public cloud, what damage could be wreaked if someone with intent went on a spree?

I have little doubt that the physical security measures used by Amazon are as close to military grade as you can get. There’s a lot to protect and, perhaps fundamentally to AWS, the public cloud business is absolutely dependent upon customer trust. If I can’t see and touch something, then I’m going to have to have absolute trust that what you’re offering me is true and effectual.

That said, as Chelsea Manning and Edward Snowden have come to epitomize, no installation is secure and no physical deterrent is absolute. As the public cloud becomes, for all intents and purposes, critical national infrastructure, the likelihood and impact of insider threats become more apparent.

Even if we ignore the disgruntled trusted employee and the damage they could wreak, if we look through the lens of high-stakes criminal endeavors or state-directed economic sabotage, then the prospect of holding a trusted engineer’s family hostage (as past criminal groups have done with the families of bank managers and key holders) and forcing actions much more devastating than the AWS S3 outage is well within scope.

A critical national infrastructure designation may help to focus the minds of many involved and promote a heightened level of security awareness and training. I also expect that a new generation of behavioral monitoring and alerting technologies will come to market and focus upon detecting when employees appear to go rogue – performing non-traditional actions, making unanticipated changes to code, physically accessing different parts of buildings, etc.

But, for the time being, any business’s reliance upon public cloud infrastructure needs to be carefully weighed and metered against a loss of availability, with contingencies planned ahead of time.

-- Gunter Ollmann, Founder and Principal @ Ablative Security