

The Story of Apollo - Amazon’s Deployment Engine | All Things Distributed

Automated deployments are the backbone of a strong DevOps environment. Without efficient, reliable, and repeatable software updates, engineers need to redirect their focus from developing new features to managing and debugging their deployments. Amazon first faced this challenge many years ago.

When making the move to a service-oriented architecture, Amazon refactored its software into small independent services and restructured its organization into small autonomous teams. Each team took on full ownership of the development and operation of a single service, and they worked directly with their customers to improve it. With this clear focus and control, the teams were able to quickly produce new features, but their deployment process soon became a bottleneck. Manual deployment steps slowed down releases and introduced bugs caused by human error. Many teams started to fully automate their deployments to fix this, but that was not as simple as it first appeared.

Deploying software to a single host is easy. You can SSH into a machine, run a script, get the result, and you’re done. The Amazon production environment, however, is more complex than that. Amazon web applications and web services run across large fleets of hosts spanning multiple data centers. The applications cannot afford any downtime, planned or otherwise. An automated deployment system needs to carefully sequence a software update across a fleet while it is actively receiving traffic. The system also requires the built-in logic to correctly respond to the many potential failure cases.

It didn’t make sense for each of the small service teams to duplicate this work, so Amazon created a shared internal deployment service called Apollo. Apollo’s job was to reliably deploy a specified set of software across a target fleet of hosts. Developers could define their software setup process for a single host, and Apollo would coordinate that update across an entire fleet of hosts. This made it easy for developers to “push-button” deploy their application to a development host for debugging, to a staging environment for tests, and finally to production to release an update to customers. The added efficiency and reliability of automated deployments removed the bottleneck and enabled the teams to rapidly deliver new features for their services.

Over time, Amazon has relied on and dramatically improved Apollo to fuel the constant stream of improvements to our web sites and web services. Thousands of Amazon developers use Apollo each day to deploy a wide variety of software, from Java, Python, and Ruby apps, to HTML web sites, to native code services. In the past 12 months alone, Apollo was used for 50M deployments to development, testing, and production hosts. That’s an average of more than one deployment each second.

The extensive use of Apollo inside Amazon has driven the addition of many valuable features. It can perform a rolling update across a fleet where only a fraction of the hosts are taken offline at a time to be upgraded, allowing an application to remain available during a deployment. If a fleet is distributed across separate data centers, Apollo will stripe the rolling update to simultaneously deploy to an equivalent number of hosts in each location.
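
To make the rolling update with data center striping described above more concrete, here is a minimal Python sketch. It is not Apollo's implementation, whose internals the article does not describe. The `(data_center, hostname)` host representation, the function names `striped_batches` and `rolling_update`, the `per_dc` parameter, and the `deploy` callback are all hypothetical names chosen for illustration. As a simplification, the sketch updates the hosts of a batch one after another rather than simultaneously, and it simply stops at the first batch that reports a failure.

```python
from collections import defaultdict
from itertools import zip_longest
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical host representation: (data_center, hostname).
Host = Tuple[str, str]


def striped_batches(hosts: List[Host], per_dc: int) -> Iterator[List[Host]]:
    """Yield batches holding at most `per_dc` hosts from each data center."""
    if per_dc < 1:
        raise ValueError("per_dc must be at least 1")
    by_dc: Dict[str, List[Host]] = defaultdict(list)
    for host in hosts:
        by_dc[host[0]].append(host)
    # Split each data center's hosts into chunks of size per_dc.
    chunked = [
        [dc_hosts[i:i + per_dc] for i in range(0, len(dc_hosts), per_dc)]
        for dc_hosts in by_dc.values()
    ]
    # Combine the n-th chunk of every data center into one batch, so each
    # batch takes an equal number of hosts offline in each location.
    for chunks in zip_longest(*chunked, fillvalue=[]):
        yield [host for chunk in chunks for host in chunk]


def rolling_update(
    hosts: List[Host], per_dc: int, deploy: Callable[[Host], bool]
) -> List[Host]:
    """Update the fleet batch by batch and return the hosts that succeeded.

    Stops after the first batch containing a failed host, so the remaining
    hosts keep running the old version and keep serving traffic.
    """
    updated: List[Host] = []
    for batch in striped_batches(hosts, per_dc):
        results = [(host, deploy(host)) for host in batch]
        updated.extend(host for host, ok in results if ok)
        if not all(ok for _, ok in results):
            break
    return updated
```

With a fleet of three hosts in each of two data centers and `per_dc=1`, each batch takes one host per location out of service, so two thirds of the capacity in every data center stays online throughout the update. A real system would also need health checks, retries, and rollback, which this sketch leaves out.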
