Sameer Iyengar

Three Key Themes from the First AWS Developer Conference

December 2012

Amazon pulled out all the stops for its first Amazon Web Services (AWS) developer conference last week. Here are three key themes I noticed from the talks they scheduled to fill the time between the laser light shows, delicious food and abundant drinks.

1. Design for failure (and simulate failures)

I lost count of how many speakers stressed the need to be in multiple availability zones. This is easy to do, and easier still if you design for it from the beginning rather than migrating later. The last few EC2 outages resulted in downtime for many popular sites and some bad publicity for AWS. In response, the team is making it clear that machine failures are expected and a redundant architecture is required.

More importantly, regularly testing failure cases avoids surprises when failures happen in real life. Two strategies:

Make your development environment closely mirror your production configuration, and regularly test machine failures and upgrades there.
Simulate failure in your production systems. For example, take down all the machines in one availability zone and ensure that everything is still working. Netflix has a great tool called Chaos Monkey that makes it easy to run "fire drills" like this.
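
Before running a real fire drill, it can help to check on paper whether the fleet would survive losing a zone. Here is a minimal Python sketch of that check. It is not Chaos Monkey and it calls no AWS API; the fleet layout, the instance names and the `min_required` threshold are all made up for illustration.

```python
from collections import Counter

def surviving_capacity(instances, failed_zone):
    """Count the instances left in each zone after failed_zone goes down."""
    return Counter(zone for zone in instances.values() if zone != failed_zone)

def survives_zone_loss(instances, min_required):
    """For each zone, report whether the fleet still has at least
    min_required instances if that whole zone fails."""
    zones = sorted(set(instances.values()))
    return {
        zone: sum(surviving_capacity(instances, zone).values()) >= min_required
        for zone in zones
    }

# Hypothetical fleet: instance name -> availability zone.
fleet = {
    "web-1": "us-east-1a",
    "web-2": "us-east-1a",
    "web-3": "us-east-1b",
    "web-4": "us-east-1c",
}

# Assume the site needs at least 3 instances to carry its load.
report = survives_zone_loss(fleet, min_required=3)
for zone, ok in report.items():
    print(zone, "survivable" if ok else "NOT survivable")
```

In this made-up fleet, too many machines sit in one zone, so losing that zone drops the fleet below the threshold. That is exactly the kind of surprise a drill is meant to surface early.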

2. Automate (and monitor) everything

The obvious advantage of cloud computing is the ability to scale capacity to meet demand. The hidden advantage is the ability to do this automatically. For example, Auto Scaling allows you to automatically launch and terminate EC2 instances based on defined metrics or on a recurring schedule. This creates an architecture that can respond to unexpected user access patterns the same way that a redundant architecture can automatically respond to unexpected machine failures.
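
To make the idea of metric-driven scaling concrete, here is a rough Python sketch of a threshold rule. It only illustrates the decision logic and is not how Auto Scaling is implemented or configured; the function name, the CPU thresholds and the size limits are assumptions chosen for the example.

```python
def desired_capacity(current, cpu_percent, *, scale_out_at=70.0,
                     scale_in_at=30.0, min_size=2, max_size=10):
    """Return the instance count a simple threshold rule would ask for.

    Adds one instance when average CPU is above scale_out_at, removes one
    when it is below scale_in_at, and always stays within
    [min_size, max_size]. All thresholds are hypothetical.
    """
    if cpu_percent > scale_out_at:
        target = current + 1
    elif cpu_percent < scale_in_at:
        target = current - 1
    else:
        target = current
    return max(min_size, min(max_size, target))

# Walk a hypothetical series of CPU readings through the rule.
capacity = 4
for reading in [85.0, 90.0, 50.0, 20.0]:
    capacity = desired_capacity(capacity, reading)
    print(f"cpu={reading:.0f}% -> {capacity} instances")
```

The minimum and maximum bounds matter as much as the thresholds: the floor keeps enough machines for redundancy, and the ceiling keeps a runaway metric from running up the bill.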

Deployment is another good use of swing capacity. Rather than updating existing machines, launch a new cluster of machines with the new code and point traffic to it. Rolling back is as simple as pointing traffic back to the old machines. If everything is working, tear down the old ones.
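
The switch-and-roll-back flow can be sketched in a few lines of Python. The `Router` class here is a hypothetical stand-in for whatever actually directs traffic (a load balancer or DNS entry), and `is_healthy` is a placeholder for your own health check; neither corresponds to a real AWS API.

```python
class Router:
    """Hypothetical stand-in for the component that directs traffic."""

    def __init__(self, live):
        self.live = live  # name of the cluster currently receiving traffic

def deploy(router, new_cluster, is_healthy):
    """Point traffic at new_cluster and roll back if it fails the check.

    Returns the cluster that ends up idle and can be torn down: the old
    cluster on success, the new one after a rollback.
    """
    old_cluster = router.live
    router.live = new_cluster
    if is_healthy(new_cluster):
        return old_cluster
    router.live = old_cluster  # rollback is just pointing traffic back
    return new_cluster

router = Router("cluster-v1")

# A healthy release: traffic moves to v2, and v1 can be torn down.
idle = deploy(router, "cluster-v2", is_healthy=lambda cluster: True)
print(router.live, idle)

# A bad release: traffic stays on v2, and v3 is the one to tear down.
idle = deploy(router, "cluster-v3", is_healthy=lambda cluster: False)
print(router.live, idle)
```

Nothing is changed in place, so the old cluster stays untouched until the new one has proven itself.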

This has two implications for monitoring:

Careful performance monitoring is the best weapon in architecture tuning. The logistical and financial headache of transitioning to a new configuration no longer exists, so the remaining challenge is to identify the right configuration for your application.
Developers can factor in cost as part of the optimization process. This is especially advantageous to startups that need to manage cash flow.
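
As a small illustration of treating cost as one more tuning input, the Python sketch below picks the cheapest configuration that still meets a latency target. Every number in it is invented: the configuration names, prices and latencies are placeholders for measurements you would collect from your own monitoring.

```python
def cheapest_acceptable(measurements, max_latency_ms):
    """Return the lowest-cost configuration whose measured p95 latency
    is within max_latency_ms, or None if no configuration qualifies."""
    candidates = [m for m in measurements
                  if m["p95_latency_ms"] <= max_latency_ms]
    if not candidates:
        return None
    return min(candidates, key=lambda m: m["instances"] * m["hourly_price"])

# Hypothetical monitoring results for three candidate configurations.
measurements = [
    {"name": "config-a", "instances": 8, "hourly_price": 0.10, "p95_latency_ms": 180},
    {"name": "config-b", "instances": 3, "hourly_price": 0.40, "p95_latency_ms": 120},
    {"name": "config-c", "instances": 2, "hourly_price": 0.25, "p95_latency_ms": 450},
]

best = cheapest_acceptable(measurements, max_latency_ms=200)
print(best["name"] if best else "no configuration meets the target")
```

The cheapest option overall is ruled out because it misses the latency target, which is why the performance data has to come first.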

3. Data, Data, Data

Immediate access to large amounts of processing and storage capability has always appealed to data enthusiasts, but it's clear that companies are making serious progress in this area. For example, Netflix has built a data processing architecture in which S3 is their "data warehouse" and EMR jobs run when they need to interact with the data.

There are many best practices around the extraction and transform stages of a data warehouse pipeline (essentially, save everything to S3 and use EMR). I'm excited to see more progress in the latter two stages: warehousing and analytics.

Warehousing: Treating S3 as a data warehouse is clever but makes on-the-fly queries difficult. Amazon's Redshift aims to fill this gap with a more traditional data warehouse product.
Analytics: This is the area that I'm most excited about, given how cumbersome it is to generate customized reports and charts. Most companies that aren't using a traditional enterprise solution like MicroStrategy have had to invest considerable energy to roll their own solution.

If you want to relive the ~~brainwashing~~ magic, here are some of the highlight sessions:

Fireside Chat with Jeff Bezos
Highly Available Architecture at Netflix by Adrian Cockcroft
Keynote by Werner Vogels
Data Science at Netflix with Elastic MapReduce by Kurt Brown