How to Build a Data Lake in Amazon S3 & Amazon Glacier – AWS Online Tech Talks
In this session, we discuss best practices for data ingestion, storage, cataloging, and analysis on AWS object storage services. We examine ways to reduce or eliminate costly extract, transform, and load (ETL) processes by querying data in place with services such as Amazon S3 Select, Amazon Glacier Select, Amazon Athena, and Amazon Redshift Spectrum. We also review how to integrate custom analytics built with Apache Spark, Apache Hive, Presto, and other technologies running on Amazon EMR.
– Understand the options for building an analytics platform on Amazon S3 & Amazon Glacier
– Learn about the key considerations for ETL and other core analytics functions
– Determine whether query-in-place capabilities like Amazon S3 Select, Amazon Glacier Select, Amazon Athena, and Amazon Redshift Spectrum are a good fit for your use case