Big Data on AWS
Services Overview
Bernie Nallamotu | Principal Solutions Architect
So what is it?
When your data sets become so large that you have to start innovating around how to collect, store, organize, analyze and share them
Compute Storage Big Data
100 GB to 1,000 PB
Challenges start at relatively small volumes
Unconstrained data growth
GB, TB, PB, EB, ZB
95% of the 1.2 zettabytes of data in the digital universe is unstructured
70% of this is user-generated content
Unstructured data growth is explosive, with estimates of compound annual growth rate (CAGR) at 62% from 2008 to 2012.
Source: IDC
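
To get a feel for what a 62% CAGR means, the sketch below compounds it over the 2008 to 2012 range. Treating that range as exactly four yearly compounding steps is an assumption; the slide does not say how IDC counted the periods.

```python
def compound_growth(rate: float, periods: int) -> float:
    """Return the overall growth factor after compounding `rate` for `periods` periods."""
    return (1.0 + rate) ** periods


CAGR = 0.62  # from the slide (IDC estimate)
YEARS = 4    # assumption: 2008 -> 2012 counted as four yearly steps

factor = compound_growth(CAGR, YEARS)
print(f"Growth factor over {YEARS} years: {factor:.2f}x")
```

Under that assumption the volume of unstructured data grows to roughly 6.9 times its starting size.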
Where does it come from?
• Web sites: blogs, reviews, emails, pictures
• Social graphs: Facebook, LinkedIn, contacts
• Application server logs: web sites, games
• Sensor data: weather, water, smart grids
• Images/videos: traffic, security cameras
• Twitter: 50m tweets/day, 1,400% growth/year
Why AWS and big data?
Innovation (compute and storage): Amazon S3, Amazon DynamoDB, Amazon Redshift, Spot, HPC, EMR
AWS Worldwide Public Sector Team
Big Data Services
• Amazon EMR (Elastic MapReduce): hosted Hadoop framework
• AWS Data Pipeline: move data among AWS services and on-premises data sources
• Amazon Redshift: petabyte-scale data warehouse service
How do you get your slice of it?
• AWS Direct Connect: dedicated low-latency bandwidth
• Queuing: highly scalable event buffering
• AWS Storage Gateway: sync local storage to the cloud
• AWS Import/Export: physical media shipping
Where do you put your slice of it?
• Amazon Relational Database Service (RDS): fully managed database (MySQL, Oracle, MS SQL Server, PostgreSQL)
• Amazon DynamoDB: NoSQL, schema-less, provisioned-throughput database
• Amazon S3: object datastore, up to 5 TB per object, 99.999999999% durability
• Amazon SimpleDB: NoSQL, schema-less, for smaller datasets
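
The eleven-nines durability figure is easier to grasp as an expected loss rate. The sketch below reads it as the chance that a given object survives one year (an assumption, since the slide does not state the period) and uses a hypothetical bucket of 10 million objects.

```python
from fractions import Fraction

# 99.999999999% durability, read here as the per-object chance of surviving
# one year (assumption: the slide does not spell out the time period).
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year for `object_count` stored objects."""
    return object_count * (1 - DURABILITY)


def years_per_lost_object(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


objects = 10_000_000  # hypothetical bucket size
print(float(expected_annual_losses(objects)))
print(int(years_per_lost_object(objects)))
```

`Fraction` is used instead of floats so that the tiny loss probability is not distorted by rounding.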
Where do you put your slice of it?
• Amazon Glacier: long-term cold storage, from $0.01 per GB/month, 99.999999999% durability
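
As a rough illustration of the Glacier price point, the sketch below turns the "from $0.01 per GB/month" figure into a monthly bill. The 100 TB archive size and the use of binary terabytes (1 TB = 1024 GB) are assumptions, and real rates vary by region and over time.

```python
GLACIER_USD_PER_GB_MONTH = 0.01  # "from" price quoted on the slide
GB_PER_TB = 1024                 # assumption: binary terabytes


def monthly_storage_cost(terabytes: float,
                         usd_per_gb_month: float = GLACIER_USD_PER_GB_MONTH) -> float:
    """Return the storage-only monthly cost in USD (retrieval fees not modelled)."""
    return terabytes * GB_PER_TB * usd_per_gb_month


archive_tb = 100  # hypothetical archive size
print(f"{archive_tb} TB costs about ${monthly_storage_cost(archive_tb):,.2f} per month")
```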
How quick do you need to read it?
Scale, Price, Performance

| Read latency | Service | Notes |
| --- | --- | --- |
| Single-digit ms | Amazon DynamoDB | Social-scale applications; provisioned throughput performance; flexible consistency models |
| 10s-100s ms | Amazon S3 | Any object, any app; 99.999999999% durability; objects up to 5 TB in size |
| <5 hours | Amazon Glacier | Media & asset archives; extremely low cost; S3 levels of durability |
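
The three latency bands can be read as a simple decision rule. The sketch below picks the slowest service that still meets a read-latency budget, on the assumption that slower tiers are cheaper. The function name and the 1-second cut-off for S3 are hypothetical; only the "<5 hours" bound comes from the slide.

```python
def pick_storage_tier(max_read_latency_s: float) -> str:
    """Pick the slowest tier from the slide that still fits the latency budget."""
    if max_read_latency_s >= 5 * 3600:
        # Slide: Glacier retrievals complete in under 5 hours.
        return "Amazon Glacier"
    if max_read_latency_s >= 1.0:
        # Assumption: 1 s comfortably covers the "10s-100s ms" band for S3.
        return "Amazon S3"
    # Tightest budgets fall through to the single-digit-millisecond tier.
    return "Amazon DynamoDB"


for budget in (0.005, 2.0, 6 * 3600):
    print(budget, pick_storage_tier(budget))
```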
Operate at any scale
Scale, Price, Performance
Unlimited data
Data has gravity
…and inertia at volume…
…easier to move applications to the data
Source: http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/
Bring compute capacity to the data
Very large dataset seeks strong & consistent compute for short-term relationship, possibly longer
Flexible compute resources, on demand
Amazon Elastic Compute Cloud (EC2)
• Basic unit of compute capacity
• Range of CPU, memory & local disk options
• 27 instance types available, from micro through cluster compute to SSD-backed
• Vertical scaling, from $0.02/hr

| Feature | Details |
| --- | --- |
| Flexible | Run Windows or Linux distributions |
| Scalable | Wide range of instance types from micro to cluster compute |
| Machine Images | Configurations can be saved as machine images (AMIs) from which new instances can be created |
| Full control | Full root or administrator rights |
| VM Import/Export | Import and export VM images to transfer configurations in and out of EC2 |
| Monitoring | Publishes metrics to CloudWatch |
| Inexpensive | On-demand, Reserved and Spot instance types |
| Secure | Full firewall control via Security Groups |
Elastic capacity as you need it
Usage patterns: On and Off, Fast Growth, Variable peaks, Predictable peaks
Mismatched capacity: WASTE (over-provisioned) or CUSTOMER DISSATISFACTION (under-provisioned)
Chart (capacity over time): elastic cloud capacity and traditional IT capacity versus your IT needs
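
The gap between provisioned capacity and actual need can be made concrete with a little arithmetic. The sketch below uses a hypothetical demand series; `capacity_gap` is an illustrative helper, and the "elastic" case is idealised as tracking demand exactly.

```python
def capacity_gap(demand, capacity):
    """Return (waste, shortfall) summed over all time steps.

    waste     = capacity provisioned but unused
    shortfall = demand that exceeded the provisioned capacity
    """
    waste = sum(max(c - d, 0) for d, c in zip(demand, capacity))
    shortfall = sum(max(d - c, 0) for d, c in zip(demand, capacity))
    return waste, shortfall


demand = [2, 3, 8, 12, 6, 3]  # hypothetical load per time step
fixed = [8] * len(demand)     # traditional IT: one fixed capacity level
elastic = list(demand)        # idealised elastic capacity follows demand

print("fixed:  ", capacity_gap(demand, fixed))
print("elastic:", capacity_gap(demand, elastic))
```

With the fixed level, the quiet periods show up as waste and the peak shows up as shortfall; the idealised elastic case has neither.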
From one instance…
…to thousands
Why AWS and big data?
Innovation (compute and storage): S3, DynamoDB, Redshift, Spot, HPC, EMR
Amazon EMR – Elastic MapReduce
• A key tool in the toolbox to help with ‘Big Data’ challenges
• Makes possible analytics processes previously not feasible
• Cost effective when leveraged with EC2 spot market
• Broad ecosystem of tools to handle specific use cases
What is EMR?
• Map-Reduce engine
• Integrated with tools
• Hadoop-as-a-service
• Massively parallel
• Cost effective AWS wrapper
• Integrated with AWS services
• HDFS: reliable storage
• MapReduce: data analysis
On one EC2 instance: Input file → map → reduce → Output file
Across three EC2 instances: each instance runs its own Input file → map → reduce → Output file
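
The fan-out across instances can be imitated on one machine. In the sketch below, threads stand in for EC2 instances, each running map then reduce over its own input, and a final merge (an assumption, not shown on the slide) combines the per-instance outputs. The input lines and the word-count task are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def run_job(lines):
    """One 'instance': map lines to words, then reduce to per-word counts."""
    mapped = (word.lower() for line in lines for word in line.split())
    return Counter(mapped)


# Hypothetical input files, one per worker.
input_files = [
    ["big data on aws", "data has gravity"],
    ["move compute to the data"],
    ["elastic compute on demand"],
]

# Threads stand in for EC2 instances in this sketch.
with ThreadPoolExecutor(max_workers=len(input_files)) as pool:
    outputs = list(pool.map(run_job, input_files))

# Assumption: a final merge step combines the per-instance output files.
total = sum(outputs, Counter())
print(total["data"], total["compute"])
```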
Map? Reduce?

Input records, with the duration in seconds that the map step derives from each:

| Person | Start | End | Duration (map output) |
| --- | --- | --- | --- |
| Bob | 00:44:48 | 00:45:11 | 23 |
| Charlie | 02:16:02 | 02:16:18 | 16 |
| Charlie | 11:16:59 | 11:17:17 | 18 |
| Charlie | 11:17:24 | 11:17:38 | 14 |
| Bob | 11:23:10 | 11:23:25 | 15 |
| Alice | 16:26:46 | 16:26:54 | 8 |
| David | 17:20:28 | 17:20:45 | 17 |
| Alice | 18:16:53 | 18:17:00 | 7 |
| Charlie | 19:33:44 | 19:33:59 | 15 |
| Bob | 21:13:32 | 21:13:43 | 11 |
| David | 22:36:22 | 22:36:34 | 12 |
| Alice | 23:42:01 | 23:42:11 | 10 |

Reduce output:

| Person | Total |
| --- | --- |
| Alice | 25 |
| Bob | 49 |
| Charlie | 63 |
| David | 29 |
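
The slide's example can be reproduced in a few lines. This is a minimal single-process sketch of the idea, not how Hadoop or EMR actually executes jobs; the function names `map_record` and `reduce_pairs` are illustrative.

```python
from collections import defaultdict
from datetime import datetime

# (person, start, end) records copied from the slide.
RECORDS = [
    ("Bob", "00:44:48", "00:45:11"),
    ("Charlie", "02:16:02", "02:16:18"),
    ("Charlie", "11:16:59", "11:17:17"),
    ("Charlie", "11:17:24", "11:17:38"),
    ("Bob", "11:23:10", "11:23:25"),
    ("Alice", "16:26:46", "16:26:54"),
    ("David", "17:20:28", "17:20:45"),
    ("Alice", "18:16:53", "18:17:00"),
    ("Charlie", "19:33:44", "19:33:59"),
    ("Bob", "21:13:32", "21:13:43"),
    ("David", "22:36:22", "22:36:34"),
    ("Alice", "23:42:01", "23:42:11"),
]


def map_record(record):
    """Map: turn (person, start, end) into (person, duration in seconds)."""
    person, start, end = record
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return person, int(delta.total_seconds())


def reduce_pairs(pairs):
    """Reduce: sum the durations for each person."""
    totals = defaultdict(int)
    for person, duration in pairs:
        totals[person] += duration
    return dict(totals)


totals = reduce_pairs(map(map_record, RECORDS))
print(sorted(totals.items()))
# [('Alice', 25), ('Bob', 49), ('Charlie', 63), ('David', 29)]
```

Because each record is mapped independently, the map step can be spread over many machines; only the reduce step needs to see all values for a given person.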
AWS Elastic MapReduce Architecture
Diagram, built up over several slides: Amazon EMR with HDFS; analytics languages (Pig); data management; connected services Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift and AWS Data Pipeline
Useful Resources & Links
• AWS Big Data: http://aws.amazon.com/big-data
• AWS HPC: http://aws.amazon.com/hpc-applications
• Architecture Center: http://aws.amazon.com/architecture
• Documentation: http://aws.amazon.com/documentation
• Security Center: http://aws.amazon.com/security
• Whitepapers: http://aws.amazon.com/whitepapers
• Resources: http://aws.amazon.com/resources
• Case Studies: http://aws.amazon.com/solutions/case-studies
• Solution Providers: http://aws.amazon.com/solutions/global-solution-providers
• Calculator: http://calculator.s3.amazonaws.com/calc5.html
• TCO Calculator: http://aws.amazon.com/tco-calculator
• AWS Blog: http://aws.typepad.com
• The Power of 60: http://www.powerof60.com