How to Build a Data Lake in Amazon S3 & Amazon Glacier – AWS Online Tech Talks – #AWS

In this session, we discuss best practices for data ingestion, storage, cataloging, and analysis on Amazon object storage services. We examine ways to reduce or eliminate costly extract, transform, and load (ETL) processes using query-in-place technology, such as Amazon S3 Select, Amazon Glacier Select, Amazon Athena, and Amazon Redshift Spectrum. We also review custom analytics integration using Apache Spark, Apache Hive, Presto, and other technologies in Amazon EMR.
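To give a feel for what query-in-place means, here is a minimal sketch (not taken from the session itself) of filtering a CSV object with Amazon S3 Select, so that only matching rows leave storage. It assumes a boto3 S3 client, a CSV object with a header row, and that S3 Select is available in your account. The helper names `build_select_request` and `select_rows` are hypothetical and used only for illustration.

```python
from typing import Any, Dict, Iterator


def build_select_request(bucket: str, key: str, sql: str) -> Dict[str, Any]:
    """Assemble keyword arguments for the S3 SelectObjectContent call."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": sql,
        # Assumption: the object is CSV and its first row holds column names.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        # Return matching rows as CSV text.
        "OutputSerialization": {"CSV": {}},
    }


def select_rows(client: Any, bucket: str, key: str, sql: str) -> Iterator[str]:
    """Yield decoded chunks of the records that S3 Select streams back.

    `client` is expected to behave like boto3.client("s3").
    """
    response = client.select_object_content(**build_select_request(bucket, key, sql))
    # The payload is an event stream; only "Records" events carry row data.
    for event in response["Payload"]:
        if "Records" in event:
            yield event["Records"]["Payload"].decode("utf-8")
```

With real credentials, this could be called as `select_rows(boto3.client("s3"), bucket, key, "SELECT s.city FROM S3Object s WHERE CAST(s.sales AS INT) > 100")`, where the bucket, key, and column names are placeholders for your own data.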

Learning Objectives:
– Understand the options for building an analytics platform that leverages Amazon S3 & Amazon Glacier
– Learn about the key considerations for ETL and other core analytics functions
– Determine if query-in-place capabilities like Amazon S3 Select, Amazon Glacier Select, Amazon Athena, and Amazon Redshift Spectrum are a good fit for your use case

About The Author
Launched in 2006, Amazon Web Services offers a robust, fully featured technology infrastructure platform in the cloud comprising a broad set of compute, storage, database, analytics, application, and deployment services from data center locations in the U.S., Australia, Brazil, China, Germany, Ireland, Japan, and Singapore. More than a million customers, including fast-growing startups, large enterprises, and government agencies across 190 countries, rely on AWS services to innovate quickly, lower IT costs, and scale applications globally. To learn more about AWS, visit http://aws.amazon.com.
