AWS re:Invent 2016: Zillow: Classification and Recommendation Engines with EMR and Spark (MAC303)

Customers are adopting Apache Spark, an open-source distributed processing framework, on Amazon EMR for large-scale machine learning workloads, especially for applications that power customer segmentation and content recommendation. With Spark ML, a set of machine learning algorithms included with Spark, customers can quickly build and execute massively parallel machine learning jobs. Spark applications can also train models in streaming or batch contexts and can access data from Amazon S3, Amazon Kinesis, Amazon Redshift, and other services. This session explains how to quickly create scalable Spark clusters with Amazon EMR, build and share models using Apache Zeppelin and Jupyter notebooks, and use the Spark ML pipelines API to manage your training workflow. In addition, Jasjeet Thind, Senior Director of Data Science and Engineering at Zillow Group, discusses his organization’s development of personalization algorithms and platforms at scale using Spark on Amazon EMR.

via Amazon Web Services

About The Author
- Launched in 2006, Amazon Web Services offers a robust, fully featured technology infrastructure platform in the cloud, comprising a broad set of compute, storage, database, analytics, application, and deployment services from data center locations in the U.S., Australia, Brazil, China, Germany, Ireland, Japan, and Singapore. More than a million customers, including fast-growing startups, large enterprises, and government agencies across 190 countries, rely on AWS services to innovate quickly, lower IT costs, and scale applications globally. To learn more about AWS, visit http://aws.amazon.com.
