Consuming Data Lake · learnandbecurious.cloud/data/decks/Consuming_Data_Lake.pdf · BI & Data...

Post on 08-Jul-2020

© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Axel Larsson, Enterprise Solutions Architect
Joyjeet Banerjee, Enterprise Solutions Architect

9 April 2019

Consuming the Data Lake - Reporting, Analytics, Machine Learning

What have we learned so far

Athena?

Anti-Pattern
[Diagram: Everything, Query]

Also an Anti-Pattern
[Diagram: Everything, Query]

One tool to rule them all

Where do I start?

• Understand your data
  • Data structure, access patterns & characteristics, temperature, cost, size
• Know your audience
  • Business users, data scientists, developers
• Select the right service

Understand your Data

[Chart: storage categories positioned by data temperature (hot, warm, cold) and data structure (low to high). Further axes: latency, data volume, request rate, cost / GB.]

• In-memory: Amazon ElastiCache
• Search: Amazon ES
• NoSQL: Amazon DynamoDB
• Object: Amazon S3
• Archival: Amazon Glacier
• Warehouse: Amazon Redshift

Who is your audience?

ROLE: Data Visualizer
PRIORITIES: Creating engaging visual and narrative journeys for analytical solutions
NEEDS: Visualization, dashboards, reporting

ROLE: Data Product Manager
PRIORITIES: Manages data as a product. Ensures freshness and consistency of data; understands lineage and compliance needs; treats data scientists (DS) as customers
NEEDS: Reports (data quality, errors)

ROLE: DevOps Engineer
PRIORITIES: Monitoring for reliability; quickly diagnosing deployment or availability issues
NEEDS: Ad hoc querying, dashboards

ROLE: Data Scientist
PRIORITIES: Makes sense of data, generates and communicates insights to improve or create business processes, creates predictive ML models to support them
NEEDS: Ad hoc querying, robust ML tools

ROLE: Data Engineer
PRIORITIES: Builds scalable pipelines, transforms and loads data into structures complete with metadata that can be readily consumed by DS
NEEDS: Ad hoc querying, quick visualization

ROLE: Business Sponsor
PRIORITIES: Vetting the prioritization and ROI, funding projects, providing ongoing feedback
NEEDS: Reporting, dashboards

Enabling your Consumers
Dashboards – Reports – Ad-Hoc Analysis – Machine Learning

Dashboards
Visual representation of key metrics that change over time
• Data structure - Low
• Usage - Near real-time visualization
• Data temperature - Hot

Available Services:
• Lambda
• DynamoDB + Streams
• Elasticsearch
• Amazon Kinesis Firehose

Dashboards – Near Real-time

[Architecture diagram. Components: Data Lake on Amazon S3 (Raw Bucket, Transformed Data Bucket); ETL with Amazon EMR or AWS Glue; DynamoDB; web serving layer on EC2, containers, or serverless; Users.]

Dashboards + Search

[Architecture diagram. Components: Data Lake on Amazon S3 (Raw Bucket, Transformed Data Bucket); ETL with Amazon EMR or AWS Glue; DynamoDB; Dynamo Streams; AWS Lambda; Amazon Kinesis Firehose; Amazon Elasticsearch; Users.]
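The streaming leg of this diagram (Dynamo Streams, AWS Lambda, Amazon Kinesis Firehose) could be sketched roughly as below. This is a minimal illustration, not the presenters' implementation. It assumes the stream is configured to include new item images, and the delivery stream name `dashboard-search-stream` and the helper names are hypothetical. Partial failures reported by `put_record_batch` are not retried here.

```python
import json


def _plain(attr):
    """Convert one DynamoDB-typed attribute (e.g. {"S": "x"}) to a plain value."""
    (type_tag, value), = attr.items()
    if type_tag == "S":
        return value
    if type_tag == "N":
        # Numbers arrive as strings in stream records.
        return float(value) if "." in value or "e" in value.lower() else int(value)
    if type_tag == "BOOL":
        return value
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [_plain(v) for v in value]
    if type_tag == "M":
        return {k: _plain(v) for k, v in value.items()}
    raise ValueError(f"unsupported DynamoDB type tag: {type_tag}")


def to_firehose_records(event):
    """Turn INSERT/MODIFY stream records into JSON payloads for Firehose."""
    out = []
    for record in event.get("Records", []):
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue  # REMOVE events carry no NewImage
        image = record["dynamodb"]["NewImage"]
        doc = {k: _plain(v) for k, v in image.items()}
        out.append({"Data": json.dumps(doc, sort_keys=True).encode("utf-8")})
    return out


def handler(event, context, firehose=None, stream_name="dashboard-search-stream"):
    """Lambda entry point; `stream_name` is a hypothetical delivery stream."""
    records = to_firehose_records(event)
    if not records:
        return 0
    if firehose is None:
        import boto3  # imported lazily so the pure functions work without boto3
        firehose = boto3.client("firehose")
    # PutRecordBatch accepts at most 500 records per call.
    for start in range(0, len(records), 500):
        firehose.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=records[start:start + 500],
        )
    return len(records)
```

Passing the Firehose client in as an argument keeps the conversion logic testable without AWS credentials; in a deployed function the default `boto3` client would be used.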

Reports
Static representations of data rendered at a point in time
• Usage - Point in time data extraction
• Data structure - High
• Data temperature - Cold

Available Services:
• Amazon Redshift
• Athena

Ad Hoc Analysis
Information sought on an as-needed basis
• Usage - Dynamic data querying
• Data structure - Case based
• Data temperature - Medium - cold

Available Services:
• Amazon Redshift
• Athena
• Amazon EMR
• Amazon Elasticsearch

Reports and Ad-Hoc Analysis

[Architecture diagram. Components: Data Lake on Amazon S3 (Raw Bucket, Transformed Data Bucket); ETL with Amazon EMR or AWS Glue; Amazon Redshift or Athena; Amazon QuickSight.]
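For the Athena path in this architecture, an ad hoc query against the transformed bucket might be submitted along these lines. This is a sketch under assumptions: the helper name `run_athena_query` is made up for illustration, and the database and output location are supplied by the caller. It uses the `start_query_execution` and `get_query_execution` calls of the boto3 Athena client.

```python
import time


def run_athena_query(athena, sql, database, output_location,
                     poll_seconds=1.0, max_polls=60):
    """Start a query and poll until it finishes; return the query execution id."""
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    for _ in range(max_polls):
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        state = status["State"]
        if state == "SUCCEEDED":
            return query_id
        if state in ("FAILED", "CANCELLED"):
            reason = status.get("StateChangeReason", "")
            raise RuntimeError(f"query {query_id} ended in state {state}: {reason}")
        time.sleep(poll_seconds)  # still QUEUED or RUNNING
    raise TimeoutError(f"query {query_id} did not finish after {max_polls} polls")
```

With real credentials the client would come from `boto3.client("athena")`, and the results land as files under the given S3 output location, where a reporting tool or a follow-up `get_query_results` call can read them.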

Machine Learning
Data labeled with outcomes to train prediction models
• Usage - Machine learning data preparation
• Data structure - Case based
• Data temperature - Medium - cold

Available Services:
• Amazon EMR
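The four consumption patterns and their listed services can be gathered into one lookup, which makes the "select the right service" step easy to query. The sketch below only restates what the slides list; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsumptionPattern:
    usage: str
    data_structure: str
    data_temperature: str
    services: tuple


# Values copied from the Dashboards, Reports, Ad Hoc Analysis, and
# Machine Learning slides.
PATTERNS = {
    "dashboards": ConsumptionPattern(
        "Near real-time visualization", "Low", "Hot",
        ("Lambda", "DynamoDB + Streams", "Elasticsearch", "Amazon Kinesis Firehose"),
    ),
    "reports": ConsumptionPattern(
        "Point in time data extraction", "High", "Cold",
        ("Amazon Redshift", "Athena"),
    ),
    "ad hoc analysis": ConsumptionPattern(
        "Dynamic data querying", "Case based", "Medium - cold",
        ("Amazon Redshift", "Athena", "Amazon EMR", "Amazon Elasticsearch"),
    ),
    "machine learning": ConsumptionPattern(
        "Machine learning data preparation", "Case based", "Medium - cold",
        ("Amazon EMR",),
    ),
}


def services_for(pattern: str) -> tuple:
    """Return the services listed for a pattern, e.g. "Ad-Hoc Analysis"."""
    key = pattern.strip().lower().replace("-", " ")
    if key not in PATTERNS:
        raise ValueError(f"unknown consumption pattern: {pattern!r}")
    return PATTERNS[key].services


def patterns_using(service: str) -> list:
    """Return the sorted names of patterns that list the given service."""
    return sorted(name for name, p in PATTERNS.items() if service in p.services)
```

For example, `patterns_using("Athena")` returns the report and ad hoc patterns, which matches the slides' point that no single tool covers every consumer.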

Reports and Ad-Hoc Analysis

[Architecture diagram. Components: Data Lake on Amazon S3 (Raw Bucket, Transformed Data Bucket); ETL with Amazon EMR or AWS Glue; Amazon EMR; Users.]

What else?

Athena?

Processing & Analytics

[Service overview diagram. Labels as they appear on the slide:]

Transactional & RDBMS
• DynamoDB (NoSQL DB)
• Aurora (Relational Database)

BI & Data Visualization

Kinesis Streams & Firehose

Batch
• EMR (Hadoop, Spark, Presto)
• Redshift (Data Warehouse)
• Athena (Query Service)
• AWS Batch

Predictive

Real-time
• AWS Lambda
• Apache Storm on EMR
• Apache Flink on EMR
• Spark Streaming on EMR
• Elasticsearch Service
• Kinesis Analytics, Kinesis Streams

ElastiCache, DAX

Thank you!