AWS re:Invent 2016: Lessons Learned from a Year of Using Spot Fleet (CMP205)

Over the last year, Yelp has transitioned its scalable and reliable parallel task execution system, Seagull, from On-Demand and Reserved Instances entirely to Spot Fleet. Seagull runs over 28 million tests per day, launches more than 2.5 million Docker containers per day, and uses over 10,000 vCPUs in Spot Fleet at peak capacity. To deal with rising infrastructure costs for Seagull, we have extended our in-house Auto Scaling Engine called FleetMiser to scale the Spot Fleet in response to demand. FleetMiser has reduced Seagull’s cluster costs by 60% in the past year and saved Yelp thousands of dollars every month.

In this session, we describe how Yelp uses Spot Fleet for Seagull and lessons we’ve learned over the past year, along with our recommendations on how to use it reliably (pro tip: don’t get outbid for your whole Spot Fleet). We conclude by looking at our future plans for extending Spot Fleet usage at Yelp.

via Amazon Web Services

About The Author
- Launched in 2006, Amazon Web Services offers a robust, fully featured technology infrastructure platform in the cloud comprised of a broad set of compute, storage, database, analytics, application, and deployment services from data center locations in the U.S., Australia, Brazil, China, Germany, Ireland, Japan, and Singapore. More than a million customers, including fast-growing startups, large enterprises, and government agencies across 190 countries, rely on AWS services to innovate quickly, lower IT costs and scale applications globally. To learn more about AWS, visit http://aws.amazon.com.

Tell us what you think...