AWS re:Invent 2016: Lessons Learned from a Year of Using Spot Fleet (CMP205)
Over the last year, Yelp has transitioned Seagull, its scalable and reliable parallel task execution system, from On-Demand and Reserved Instances entirely to Spot Fleet. Seagull runs over 28 million tests per day, launches more than 2.5 million Docker containers per day, and uses over 10,000 vCPUs in Spot Fleet at peak capacity. To deal with rising infrastructure costs for Seagull, we have extended FleetMiser, our in-house auto scaling engine, to scale the Spot Fleet in response to demand. FleetMiser has reduced Seagull’s cluster costs by 60% in the past year and saved Yelp thousands of dollars every month.
In this session, we describe how Yelp uses Spot Fleet for Seagull, the lessons we’ve learned over the past year, and our recommendations for using it reliably (pro tip: don’t get outbid for your whole Spot Fleet). We conclude by looking at our future plans for extending Spot Fleet usage at Yelp.
via Amazon Web Services