AWS re:Invent 2017: Best Practices for Building a Data Lake in Amazon S3 and Amazon (STG312) – #AWS

Learn how to build a data lake for analytics in Amazon S3 and Amazon Glacier. In this session, we discuss best practices for data curation, normalization, and analysis on Amazon object storage services. We examine ways to reduce or eliminate costly extract, transform, and load (ETL) processes using query-in-place technology, such as Amazon Athena and Amazon Redshift Spectrum. We also review custom analytics integration using Apache Spark, Apache Hive, Presto, and other technologies in Amazon EMR. You’ll also get a chance to hear from Airbnb & Viber about their solutions for Big Data analytics using S3 as a data lake.

via Amazon Web Services

About The Author
- Launched in 2006, Amazon Web Services offers a robust, fully featured technology infrastructure platform in the cloud comprised of a broad set of compute, storage, database, analytics, application, and deployment services from data center locations in the U.S., Australia, Brazil, China, Germany, Ireland, Japan, and Singapore. More than a million customers, including fast-growing startups, large enterprises, and government agencies across 190 countries, rely on AWS services to innovate quickly, lower IT costs and scale applications globally. To learn more about AWS, visit

Tell us what you think...