Consuming Data Lake
learnandbecurious.cloud/data/decks/Consuming_Data_Lake.pdf
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Axel Larsson, Enterprise Solutions Architect
Joyjeet Banerjee, Enterprise Solutions Architect
9 April 2019
Consuming the Data Lake: Reporting, Analytics, Machine Learning
What have we learned so far
Athena?
Anti-Pattern
Everything
Query
Also an Anti-Pattern
Everything
Query
One tool to rule them all
Where do I start?
• Understand your data
  • Data Structure, Access patterns & characteristics, Temperature, Cost, Size
• Know your audience
  • Business Users, Data Scientists, Developers
• Select the right service
Understand your Data
[Figure: storage categories (In-memory, NoSQL, Search, Object, Archival, Warehouse) plotted by data temperature (Hot data, Warm data, Cold data) against Data Structure (Low to High). Additional axes: Latency, Data volume, Request rate, Cost / GB. Services shown: Amazon ElastiCache, Amazon DynamoDB, Amazon ES, Amazon S3, Amazon Glacier.]
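Understand your Data
[Figure: storage categories (In-Memory, NoSQL, Search, Object, Archival, Warehouse) plotted by data temperature (Hot data, Warm data, Cold data) against Data Structure (Low to High). Additional axes: Latency, Data volume, Request rate, Cost / GB. Service shown for the Warehouse category: Amazon Redshift.]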
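The category-to-service pairing in the figure can be sketched as a small lookup. This is a minimal illustration, not part of the original deck: the category and service names come from the slide, while the temperature tag attached to each entry is an assumed reading of the chart, and the `candidates` helper is hypothetical.

```python
from typing import Dict, List

# Category -> service pairs as named on the slide.
# ASSUMPTION: the "temperature" tags are an approximate reading of the
# figure, not exact positions taken from the deck.
STORAGE_OPTIONS: Dict[str, Dict[str, str]] = {
    "In-memory": {"service": "Amazon ElastiCache", "temperature": "hot"},
    "NoSQL":     {"service": "Amazon DynamoDB",    "temperature": "hot"},
    "Search":    {"service": "Amazon ES",          "temperature": "warm"},
    "Warehouse": {"service": "Amazon Redshift",    "temperature": "warm"},
    "Object":    {"service": "Amazon S3",          "temperature": "warm"},
    "Archival":  {"service": "Amazon Glacier",     "temperature": "cold"},
}


def candidates(temperature: str) -> List[str]:
    """Return the services tagged with the given data temperature."""
    wanted = temperature.lower()
    return [
        option["service"]
        for option in STORAGE_OPTIONS.values()
        if option["temperature"] == wanted
    ]


if __name__ == "__main__":
    for temp in ("hot", "warm", "cold"):
        print(temp, candidates(temp))
```

In practice temperature is only one input; data structure, request rate, and cost per GB from the same figure would narrow the list further.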
Who is your audience?

ROLE: Data Visualizer
PRIORITIES: Creating engaging visual and narrative journeys for analytical solutions
NEEDS: Visualization, Dashboards, Reporting

ROLE: Data Product Manager
PRIORITIES: Manages data as a product. Ensures freshness and consistency of data; understands lineage and compliance needs; treats DS as customers
NEEDS: Reports (data quality, errors)

ROLE: DevOps Engineer
PRIORITIES: Monitoring for reliability, quickly diagnosing deployment or availability issues
NEEDS: Ad hoc querying, Dashboards

ROLE: Data Scientist
PRIORITIES: Makes sense of data, generates and communicates insights to improve or create business processes, creates predictive ML models to support them
NEEDS: Ad hoc querying, Robust ML tools

ROLE: Data Engineer
PRIORITIES: Builds scalable pipelines, transforms and loads data into structures complete with metadata that can be readily consumed by DS
NEEDS: Ad hoc querying, Quick visualization

ROLE: Business Sponsor
PRIORITIES: Vetting the prioritization and ROI, funding projects, providing ongoing feedback
NEEDS: Reporting, Dashboards
Enabling your Consumers
Dashboards, Reports, Ad-Hoc Analysis, Machine Learning
Dashboards
Visual representation of key metrics that change over time
• Data structure - Low
• Usage - Near real-time visualization
• Data temperature - Hot
Available Services: Lambda, DynamoDB + Streams, Elasticsearch, Amazon Kinesis Firehose
Dashboards – Near Real-time
[Architecture diagram. Components shown: Data Lake on Amazon S3 (Raw Bucket, Transformed Data Bucket); ETL with Amazon EMR or AWS Glue; DynamoDB; web serving layer on EC2, Containers, or Serverless; Users.]
Dashboards + Search
[Architecture diagram. Components shown: Data Lake on Amazon S3 (Raw Bucket, Transformed Data Bucket); ETL with Amazon EMR or AWS Glue; DynamoDB; Dynamo Streams; AWS Lambda; Amazon Kinesis Firehose; Amazon Elasticsearch; Users.]
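One way the Lambda piece of this diagram might look is sketched below, assuming the flow is DynamoDB Streams to Lambda to Kinesis Firehose (the extracted slide lists the components but not the arrows). The delivery stream name and the helper names `plain` and `to_firehose_records` are hypothetical. `put_record_batch` is the boto3 Firehose client call; partial failures reported by it are not handled here.

```python
import json
from typing import Any, Dict, List

# Hypothetical delivery stream name; replace with your own.
DELIVERY_STREAM = "dashboard-search-stream"
# Firehose accepts at most 500 records per PutRecordBatch call.
MAX_BATCH = 500


def plain(attr: Dict[str, Any]) -> Any:
    """Convert one DynamoDB-typed attribute (e.g. {"S": "x"}) to a plain value."""
    (type_tag, value), = attr.items()
    if type_tag == "N":
        try:
            return int(value)
        except ValueError:
            return float(value)
    if type_tag == "NULL":
        return None
    if type_tag == "M":
        return {k: plain(v) for k, v in value.items()}
    if type_tag == "L":
        return [plain(v) for v in value]
    return value  # S, BOOL, and other types are passed through unchanged


def to_firehose_records(event: Dict[str, Any]) -> List[Dict[str, bytes]]:
    """Turn INSERT/MODIFY stream records into newline-delimited JSON records."""
    records = []
    for rec in event.get("Records", []):
        if rec.get("eventName") not in ("INSERT", "MODIFY"):
            continue
        # NewImage is present only if the stream view type includes new images.
        image = rec["dynamodb"].get("NewImage", {})
        doc = {key: plain(val) for key, val in image.items()}
        data = (json.dumps(doc, sort_keys=True) + "\n").encode("utf-8")
        records.append({"Data": data})
    return records


def handler(event: Dict[str, Any], context: Any) -> Dict[str, int]:
    """Lambda entry point: forward changed items to Firehose in chunks."""
    import boto3  # imported lazily so the transformation runs without AWS

    firehose = boto3.client("firehose")
    records = to_firehose_records(event)
    for start in range(0, len(records), MAX_BATCH):
        firehose.put_record_batch(
            DeliveryStreamName=DELIVERY_STREAM,
            Records=records[start:start + MAX_BATCH],
        )
    return {"forwarded": len(records)}
```

Keeping the transformation separate from the AWS call lets it be exercised locally with a sample stream event before deploying.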
Reports
Static representations of data rendered at a point in time
• Usage - Point in time data extraction
• Data structure - High
• Data temperature - Cold
Available Services: Amazon Redshift, Athena
Ad Hoc Analysis
Information sought on an as-needed basis
• Usage - Dynamic Data Querying
• Data structure - Case based
• Data temperature - Medium - cold
Available Services: Amazon Redshift, Athena, Amazon EMR, Amazon Elasticsearch
Reports and Ad-Hoc Analysis
[Architecture diagram. Components shown: Data Lake on Amazon S3 (Raw Bucket, Transformed Data Bucket); ETL with Amazon EMR or AWS Glue; Amazon Redshift or Athena; Amazon QuickSight.]
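As a rough sketch of the Athena path in this diagram, the snippet below builds the parameters for an ad hoc query over the transformed bucket and submits it with boto3's `start_query_execution`. The database, table, and results bucket names are hypothetical placeholders, and `build_athena_request` and `run_query` are illustrative helpers rather than anything from the deck.

```python
from typing import Any, Dict


def build_athena_request(sql: str, database: str, output_location: str) -> Dict[str, Any]:
    """Assemble keyword arguments for the Athena StartQueryExecution call."""
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request: Dict[str, Any]) -> str:
    """Submit the query and return its execution id (requires AWS credentials)."""
    import boto3  # imported lazily so the request builder runs without AWS

    athena = boto3.client("athena")
    response = athena.start_query_execution(**request)
    return response["QueryExecutionId"]


if __name__ == "__main__":
    # Hypothetical names: adjust the database, table, and bucket to your setup.
    request = build_athena_request(
        sql="SELECT page, COUNT(*) AS views FROM page_views GROUP BY page",
        database="datalake_transformed",
        output_location="s3://example-athena-results/adhoc/",
    )
    print(request)
```

The query runs asynchronously; a caller would poll `get_query_execution` with the returned id before reading results.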
Machine Learning
Data labeled with outcomes to train prediction models
• Usage - Machine learning data preparation
• Data structure - Case based
• Data temperature - Medium - cold
Available Services: Amazon EMR
Reports and Ad-Hoc Analysis
[Architecture diagram. Components shown: Data Lake on Amazon S3 (Raw Bucket, Transformed Data Bucket); ETL with Amazon EMR or AWS Glue; Amazon EMR; Users.]
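The four consumption patterns and their listed services can be collected into one lookup, which makes it easy to see where a given service recurs. The values below are copied from the slides; the `PATTERNS` layout and the `patterns_using` helper are illustrative assumptions.

```python
from typing import Dict, List

# Profiles and "Available Services" as listed on the slides.
PATTERNS: Dict[str, Dict[str, object]] = {
    "Dashboards": {
        "usage": "Near real-time visualization",
        "structure": "Low",
        "temperature": "Hot",
        "services": ["Lambda", "DynamoDB + Streams", "Elasticsearch",
                     "Amazon Kinesis Firehose"],
    },
    "Reports": {
        "usage": "Point in time data extraction",
        "structure": "High",
        "temperature": "Cold",
        "services": ["Amazon Redshift", "Athena"],
    },
    "Ad Hoc Analysis": {
        "usage": "Dynamic Data Querying",
        "structure": "Case based",
        "temperature": "Medium - cold",
        "services": ["Amazon Redshift", "Athena", "Amazon EMR",
                     "Amazon Elasticsearch"],
    },
    "Machine Learning": {
        "usage": "Machine learning data preparation",
        "structure": "Case based",
        "temperature": "Medium - cold",
        "services": ["Amazon EMR"],
    },
}


def patterns_using(service: str) -> List[str]:
    """Return the consumption patterns that list the given service, sorted by name."""
    return sorted(
        name for name, profile in PATTERNS.items()
        if service in profile["services"]
    )


if __name__ == "__main__":
    for svc in ("Athena", "Amazon EMR", "Amazon Redshift"):
        print(svc, "->", patterns_using(svc))
```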
What else?
Athena?
[Figure: overview of AWS services by category.]
Processing & Analytics
• Batch: EMR (Hadoop, Spark, Presto), Redshift (Data Warehouse), Athena (Query Service), AWS Batch
• Real-time: AWS Lambda, Apache Storm on EMR, Apache Flink on EMR, Spark Streaming on EMR, Elasticsearch Service, Kinesis Analytics, Kinesis Streams
• Predictive
Transactional & RDBMS
• DynamoDB (NoSQL DB), Aurora (Relational Database)
BI & Data Visualization
Kinesis Streams & Firehose
Also shown: ElastiCache, DAX
Thank you!