Structured, Unstructured and Streaming Big Data on the AWS
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Markku Lepistö
Principal Technology Evangelist, APAC
Structured, Unstructured and Streaming Big Data
on Amazon Web Services
Agenda
1:00pm - 2:00pm  Registration – Lunch & Meet AWS SAs
2:00pm - 2:20pm  Welcome & Introduction
2:20pm - 3:40pm  Structured, unstructured and streaming Big Data on the AWS Platform
3:40pm - 4:00pm  Break
4:00pm - 5:15pm  Building an Amazon Redshift Data warehouse
5:15pm - 5:30pm  Q&A
5:30pm           Close
Big Data End to End Framework
Amazon Redshift
Amazon Kinesis
Amazon S3
Amazon RDS
Impala
Apache Storm
PIG
Amazon Machine Learning
Amazon EMR
Amazon Glacier
Amazon DynamoDB
“I got kicked out of the bookshop last week, because I moved all of the Big Data books into the Religion section”
Ingest Store Process Analyse Data Answers
Simplify Big Data Processing
Databases: Database Data (Sales Data, Customer Data)
Flat Files: File Data (Web Logs, Server Logs)
IoT Device, Android, iOS: Streaming Data (Clickstream data, Sensor data)
INGEST STORE
INGEST
Amazon Redshift
Amazon RDS
STORE
Data Tier
Search Cache Object Store
RDBMS NoSQL Data Warehouse
logging analytics
webscale transactions
rich search hot reads complex queries and transactions
Data Tier
Amazon DynamoDB
Amazon RDS
Amazon ElastiCache
Amazon S3
Amazon Redshift
Amazon CloudSearch
Traditional Relational Database
               Amazon RDS        Amazon Redshift
Scaling        Vertical          Horizontal
Storage        Row               Column
Workload       Transactional     Analytical
Architecture   SMP               MPP
Type           SQL Relational    SQL Relational
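The Row versus Column storage distinction in the comparison above can be illustrated with a toy sketch. It is not how Amazon RDS or Amazon Redshift store data internally; the table contents and the function names are made up for illustration. It only shows why an analytical aggregate over one column reads fewer values from a column layout.

```python
# Toy illustration only: the same three-row table in two layouts.
# Table contents and function names are hypothetical.
rows = [
    {"order_id": 1, "customer": "alice", "amount": 30.0},
    {"order_id": 2, "customer": "bob", "amount": 12.5},
    {"order_id": 3, "customer": "alice", "amount": 7.5},
]

# Column layout: one list per column instead of one record per row.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_amount_row_store(table):
    """Sum 'amount' from a row layout; every field of every row is read."""
    values_read = 0
    total = 0.0
    for row in table:
        values_read += len(row)  # the whole row is fetched
        total += row["amount"]
    return total, values_read


def total_amount_column_store(cols):
    """Sum 'amount' from a column layout; only that column is read."""
    amounts = cols["amount"]
    return sum(amounts), len(amounts)


row_total, row_reads = total_amount_row_store(rows)
col_total, col_reads = total_amount_column_store(columns)
print(row_total, row_reads)
print(col_total, col_reads)
```

Both layouts give the same total, but the column layout reads one value per row instead of three, which is the intuition behind pairing column storage with analytical workloads.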
Databases: Database Data (Sales Data, Customer Data)
Flat Files: File Data (Web Logs, Server Logs)
Event Producer, Android, iOS: Streaming Data (Clickstream data, Sensor data)
Storage
INGEST
Amazon Redshift
Amazon RDS
Application
Amazon S3
STORE
Impala PIG
Amazon EMR
Amazon S3
Amazon Redshift
Amazon EMR
Glacier
Amazon DynamoDB
Amazon Machine Learning
Applications
Amazon Redshift
Scaling Add nodes Automatic
Speed Fastest Fast
Cost Higher Lower
Durability Configurable Built-in
Amazon S3
Stream Processor
INGEST
Amazon Redshift
Amazon RDS
Amazon S3
Amazon Kinesis
STORE
Why Stream Storage?
Sensors Amazon Kinesis
Apache Kafka
Availability Zone (x3)
Data Sources (x5)
Logging
Metrics
Analysis
Processing
S3
DynamoDB
Redshift
Lambda
Amazon Kinesis Stream
Amazon Redshift
               Amazon Kinesis    Apache Kafka
Ordering       Yes               Yes
Persistence    24 Hours          Configurable
Size           50 KB             Configurable
Scaling        High              High
Latency        Low               Low
Managed        Yes               No
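The Ordering and Size rows above can be made concrete with a small sketch. Amazon Kinesis hashes each record's partition key with MD5 into a 128-bit key space and assigns the record to the shard that owns that part of the range, so records sharing a key stay in order on one shard. The sketch below assumes the shards split the key space into equal ranges, which is a simplification; the function names are hypothetical, and the 50 KB limit is the figure quoted on the slide, taken here as 50 * 1024 bytes.

```python
import hashlib

MAX_RECORD_BYTES = 50 * 1024  # record size quoted on the slide; bytes are an assumption
HASH_SPACE = 2 ** 128         # MD5 produces a 128-bit hash key


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, assuming equal hash-key ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    hash_key = int(digest, 16)
    return hash_key * shard_count // HASH_SPACE


def validate_record(data: bytes) -> bytes:
    """Reject payloads larger than the quoted record size."""
    if len(data) > MAX_RECORD_BYTES:
        raise ValueError(f"record is {len(data)} bytes, limit is {MAX_RECORD_BYTES}")
    return data


# Every event for the same player maps to the same shard, preserving its order.
for event in ("login", "level_up", "purchase"):
    print(event, shard_for_key("player-42", shard_count=4))
```

Choosing a partition key with many distinct values (a player or device ID, for example) spreads load across shards while keeping each key's events ordered.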
Amazon Kinesis
“The world of gaming never sleeps. We owe every player a great experience, and AWS is our main tool to make that happen.” - Sami Yliharju, Services Lead
INGEST STORE PROCESS
Event Producer
Android iOS
Databases Amazon Redshift
Amazon Kinesis
Amazon S3
Amazon RDS
Impala
Amazon Redshift
Amazon EMR
Flat Files Database
Data
Event Data
Streaming Data
Interactive
Batch
Streaming
               Amazon Redshift   Hadoop
Scaling        2 PB+             Nodes
Storage        Native            HDFS/S3
BI Tools       High              Medium
Durability     High              High
Latency        Low               Low
Managed        Fully             Semi (EMR)
Amazon Redshift
Nodes
HDFS
Medium
High
Low
Semi (EMR)
Amazon Redshift Impala
INGEST STORE PROCESS
Event Producer
Android iOS
Databases Amazon Redshift
Amazon Kinesis
Amazon S3
Amazon RDS
Impala
Amazon Redshift
Flat Files Database
Data
Event Data
Streaming Data
Interactive
Batch
PIG
Streaming
Amazon EMR
Hadoop
PIG
SQL on Hadoop
Eats anything
New Processing Engine
Amplab Big Data Benchmark https://amplab.cs.berkeley.edu/benchmark/
INGEST STORE PROCESS
Event Producer
Android iOS
Databases Amazon Redshift
Amazon Kinesis
Amazon S3
Amazon RDS
Impala
Amazon Redshift
Apache Storm
Flat Files Database
Data
Event Data
Streaming Data
Interactive
Batch
Streaming
PIG
Amazon EMR
Hadoop
AWS Lambda
INGEST STORE PROCESS
Event Producer
Android iOS
Databases Amazon Redshift
Amazon Kinesis
Amazon S3
Amazon RDS
Impala
Amazon Redshift
Apache Storm
Flat Files Database
Data
Event Data
Streaming Data
Interactive
Batch
Streaming
PIG
ANALYSE
Amazon Machine Learning
Amazon EMR
Hadoop
AWS Lambda
Use Cases
FOMO
Amazon EMR
Hadoop
Amazon Machine Learning
Kinesis Producer
Android iOS
Databases Amazon Redshift
Amazon Kinesis
Amazon S3
Amazon RDS
Impala
Amazon Redshift
Apache Storm
Kinesis Consumer
Flat Files Database
Data
Event Data
Streaming Data
Databases Amazon Redshift
Amazon Redshift
Database Data
SQL Analytics
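As a rough illustration of the SQL Analytics use case (database data loaded into a warehouse and queried with SQL), the sketch below runs an aggregate query locally. Python's built-in sqlite3 module is used purely as a stand-in so the example is runnable; it is not Amazon Redshift, and the sales table and its values are hypothetical.

```python
import sqlite3

# sqlite3 is only a local stand-in for a SQL data warehouse.
# The table name, columns and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("APAC", 120.0), ("EMEA", 80.0), ("APAC", 40.0)],
)

# A typical analytical query: aggregate one measure grouped by a dimension.
query = """
    SELECT region, SUM(amount) AS revenue
    FROM sales
    GROUP BY region
    ORDER BY revenue DESC
"""
revenue_by_region = conn.execute(query).fetchall()
print(revenue_by_region)
```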
Amazon Machine Learning
Kinesis Producer
Android iOS
Databases Amazon Redshift
Amazon Kinesis
Amazon S3
Amazon RDS
Impala
Amazon Redshift
Apache Storm
Kinesis Consumer
Amazon Elastic MapReduce
Flat Files Database
Data
Event Data
Streaming Data
Clickstream Analysis - Batch
Amazon Elastic MapReduce
Event Data
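A batch clickstream job of the kind this use case describes can be sketched as a map, shuffle and reduce over log lines. The sketch below runs in a single Python process rather than on Amazon EMR, and the log format (client address followed by the requested path) is an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical log format: "<client address> <requested path>".
log_lines = [
    "10.0.0.1 /home",
    "10.0.0.2 /products",
    "10.0.0.1 /products",
    "10.0.0.3 /home",
    "10.0.0.2 /products",
]


def map_phase(line):
    """Emit (path, 1) for each page view."""
    _client, path = line.split()
    yield path, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts for one page."""
    return key, sum(values)


def page_views(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


views = page_views(log_lines)
print(views)
```

On a real cluster the map and reduce functions run in parallel across nodes over files in Amazon S3 or HDFS; the logic per record stays the same.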
Amazon EMR
Hadoop
Amazon Machine Learning
Kinesis Producer
Android iOS
Databases Amazon Redshift
Amazon Kinesis
Amazon S3
Amazon RDS
Impala
Amazon Redshift
Apache Storm
Kinesis Consumer
Amazon Elastic MapReduce
Flat Files Database
Data
Event Data
Streaming Data
Clickstream Analysis – Near Real Time
Event Producer
Amazon Kinesis
Amazon S3
Amazon Redshift
Kinesis Consumers
Streaming Data
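For the near-real-time variant, a Kinesis consumer typically aggregates events over short time windows before writing results to Amazon S3 or Amazon Redshift. The sketch below shows one such aggregation, a tumbling-window page count, on an in-memory list. It assumes events arrive in timestamp order (as they would within one shard for one partition key); the function name and event shape are hypothetical, and reading from the stream is left out.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Yield (window_start, Counter of pages) each time a window closes.

    events: iterable of (epoch_seconds, page), assumed to be in time order.
    """
    current_start = None
    counts = Counter()
    for timestamp, page in events:
        start = timestamp - timestamp % window_seconds
        if current_start is None:
            current_start = start
        if start != current_start:
            yield current_start, counts
            current_start, counts = start, Counter()
        counts[page] += 1
    if current_start is not None:
        yield current_start, counts  # flush the last, still-open window


events = [
    (0, "/home"), (3, "/cart"), (9, "/home"),
    (12, "/home"), (15, "/cart"),
    (21, "/home"),
]
windows = list(tumbling_window_counts(events, window_seconds=10))
for window_start, counts in windows:
    print(window_start, dict(counts))
```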
Demo
Real-time Twitter analytics using Amazon Kinesis, AWS Lambda and open source software
vs
Amazon Kinesis
Twitter Stream AWS Lambda
Demo: Live Twitter Feed Analysis
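The demo's code is not included in this transcript, so the following is only a sketch of what a Lambda function consuming the Twitter stream from Kinesis might look like. AWS Lambda delivers Kinesis records base64-encoded under Records[].kinesis.data; everything else here is an assumption, including the JSON payload with a "text" field, the hashtag counting, and the make_test_event helper used to try the handler locally.

```python
import base64
import json
from collections import Counter


def handler(event, context):
    """Count hashtags in a batch of Kinesis records (hypothetical demo logic)."""
    hashtag_counts = Counter()
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        tweet = json.loads(payload)  # assumes each record is one JSON tweet
        for word in tweet.get("text", "").split():
            if word.startswith("#") and len(word) > 1:
                hashtag_counts[word.lower()] += 1
    return dict(hashtag_counts)


def make_test_event(tweets):
    """Build a minimal Kinesis-style event for local testing (helper is hypothetical)."""
    return {
        "Records": [
            {"kinesis": {"data": base64.b64encode(json.dumps(t).encode("utf-8")).decode("ascii")}}
            for t in tweets
        ]
    }


sample_event = make_test_event([
    {"text": "Loving #AWS and #BigData"},
    {"text": "#aws Kinesis demo"},
])
result = handler(sample_event, None)
print(result)
```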
Twitter - On a typical day:
• More than 500 million Tweets sent*
• Average 5,700 TPS
* https://blog.twitter.com/2013/new-tweets-per-second-record-and-how
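The two Twitter figures can be checked against each other with simple arithmetic. The sketch below divides the daily Tweet count by the number of seconds in a day; exactly 500 million Tweets works out to roughly 5,787 per second, the same order as the quoted average of 5,700 TPS.

```python
TWEETS_PER_DAY = 500_000_000      # "more than 500 million Tweets" per day
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400 seconds

average_tps = TWEETS_PER_DAY / SECONDS_PER_DAY
print(round(average_tps))
```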
Thank You!