The Enterprise in the Age of Extreme Information (A Empresa na Era da Informação Extrema)
Uploaded by: amazon-web-services-latin-america
Category: Technology
“Algorithms have already written symphonies as moving as those composed by Beethoven, picked through legalese with the deftness of a senior law partner, diagnosed patients with more accuracy than a doctor, written news articles like a seasoned reporter, and driven vehicles on urban highways with better control than a human driver.”
The cloud is the enabler of the new technology trends
We are sincerely eager to
hear your feedback on this
presentation and on re:Invent.
Please fill out an evaluation
form when you have a
chance.
We are constantly producing more data
From all types of industries
“Every market is being transformed by the new digital wave”
http://www.amazon.com.br/Digital-Disruption-Unleashing-Innovation-ebook/dp/B009L7QD1S/
The 3 Vs of big data: volume, velocity, variety
27 TB per day Large Hadron Collider – CERN
The Role of Data is Changing
Until now, the questions you asked drove the data model
The new model is to collect as much data as possible: the “Data-First Philosophy”
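One way to read the data-first idea is schema-on-read: raw events are stored as they arrive, and the questions are applied later. The sketch below is only an illustration under that assumption. The event fields (`user`, `action`, `productId`, `hour`) and the helpers `load_events` and `ask` are hypothetical and do not come from the presentation.

```python
import json

# Raw events kept as-is, one JSON document per line (hypothetical fields).
RAW_EVENTS = """\
{"user": "john", "action": "view", "productId": 7, "hour": 12}
{"user": "ana", "action": "buy", "productId": 7, "hour": 13}
{"user": "john", "action": "view", "productId": 9, "hour": 18}
"""

def load_events(raw):
    """Parse newline-delimited JSON without imposing a schema up front."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]

def ask(events, predicate):
    """Apply a question, expressed as a predicate, to the stored raw events."""
    return [e for e in events if predicate(e)]

events = load_events(RAW_EVENTS)
# Two different questions asked later of the same stored data.
lunch_hits = ask(events, lambda e: e["hour"] in (12, 13))
john_actions = ask(events, lambda e: e["user"] == "john")
```

Because nothing is discarded at collection time, a question nobody anticipated can still be answered from the same raw events.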
Data is the new raw material for any business, on par with capital, people, and labor
[Figure: data versus actionable information; generated data versus data available for analysis]
Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
Data Strategist
lunch hours last year?
select productId, count(*) from page_hits where hour in (12,13) group by productId order by count(*) desc
cat *-1[23] | cut -f3 | sort | uniq -c | sort -rn > out
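For a small data set, the same question can be sketched in Python. This assumes tab-separated log lines with the hour in the first field and the product ID in the third (matching `cut -f3`). The slides do not give the real log layout, and `top_products` is a hypothetical helper.

```python
from collections import Counter

def top_products(lines, hours=(12, 13)):
    """Count hits per product for the given hours, most viewed first.

    Assumed layout (not from the source): hour <TAB> user <TAB> productId.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed lines
        if int(fields[0]) in hours:
            counts[fields[2]] += 1
    return counts.most_common()

sample = [
    "12\tjohn\tp1\n",
    "13\tana\tp1\n",
    "12\tana\tp2\n",
    "18\tjohn\tp2\n",
]
result = top_products(sample)
```

Like the shell pipeline, this is a single pass on a single machine, so it only stays practical while the logs are small.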
Hit <enter>?
1 PB = 10^15 (1,000,000,000,000,000) bytes
1 PB = 231 days at 50 MB/s
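The 231-day figure can be checked with a few lines of arithmetic. The sketch below uses the two numbers from the slide. The `readers` parameter and the value 1000 are hypothetical, added only to show why parallel reading matters, and the function assumes perfect scaling.

```python
PETABYTE = 10**15          # bytes, as defined on the slide
THROUGHPUT = 50 * 10**6    # 50 MB/s, in bytes per second
SECONDS_PER_DAY = 24 * 60 * 60

def days_to_scan(total_bytes, bytes_per_second, readers=1):
    """Days needed to read total_bytes, assuming perfect scaling across readers."""
    return total_bytes / (bytes_per_second * readers) / SECONDS_PER_DAY

single = days_to_scan(PETABYTE, THROUGHPUT)
parallel = days_to_scan(PETABYTE, THROUGHPUT, readers=1000)  # hypothetical cluster size
```

With one reader the scan takes about 231 days. Under the idealized assumption of 1000 readers with no coordination overhead, it drops to a few hours.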
Solution: Massively Parallel Processing
HDFS: reliable storage
MapReduce: data analysis
[Figure: a very large log (e.g. TBs) containing lots of actions by John is split into small pieces, processed in a Hadoop cluster, and the results are aggregated into John's history]
[Figure: four worker nodes, each running map and reduce steps that read an input file and write an output file]
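The split, process, and aggregate flow from these slides can be sketched on one machine. This is not Hadoop's API. Threads stand in for worker nodes, the `user,action` log format is hypothetical, and the function names are chosen only for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def split(lines, pieces):
    """Split the log into roughly equal small pieces."""
    size = max(1, -(-len(lines) // pieces))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]

def map_piece(piece, user="john"):
    """Map step: count the actions of one user within a single piece."""
    counts = Counter()
    for line in piece:
        who, action = line.split(",")
        if who == user:
            counts[action] += 1
    return counts

def reduce_counts(left, right):
    """Reduce step: merge two partial results."""
    return left + right

log = ["john,view", "ana,view", "john,buy", "john,view", "ana,buy", "john,search"]

with ThreadPoolExecutor(max_workers=3) as pool:  # threads stand in for worker nodes
    partials = list(pool.map(map_piece, split(log, 3)))

history = reduce(reduce_counts, partials, Counter())
```

Each piece is processed independently, so the map step can run on as many workers as there are pieces. Only the small partial counts need to be brought together in the reduce step.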
Elastic
On demand
Pay as you go
Focus on YOUR business
[Figure: November traffic against provisioned capacity, annotated with 76% and 24%; vertical axis from 0 to 6,000,000]
“What kind of movies do people like?”
More than 25 Million Streaming Members
50 Billion Events Per Day
30 Million plays every day
2 billion hours of video in 3 months
4 million ratings per day
3 million searches
Device location, time, day, week, etc.
Social data
10 TB of streaming data per day
~1 PB of data stored in Amazon S3
[Figure: Amazon S3 as the central data store]
Wide range of processing languages used
[Figure: S3 feeding a production cluster on EMR]
Data consumed in multiple ways
[Figure: S3 feeding several EMR clusters, including a production cluster and a query cluster, which serve the recommendation engine, ad-hoc analysis, and personalization]
Foursquare…
33 million users, 1.3 million businesses
…generates a lot of data: 3.5 billion check-ins, 15M+ venues, terabytes of log data
Uses EMR for:
Evaluation of new features
Machine learning
Exploratory analysis
Daily customer usage reporting
Long-term trend analysis
Source: IDC Whitepaper, sponsored by Amazon, “The Business Value of Amazon Web Services Accelerates Over Time.” July 2012
70% lower 5-year TCO per app: $0.90M on AWS versus $3.01M on-premises
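The 70% figure follows from the two dollar amounts. The check below assumes that $0.90M is the AWS cost and $3.01M the on-premises cost, which is the only pairing consistent with "70% lower".

```python
on_premises_tco = 3.01  # millions of USD over 5 years, from the slide
aws_tco = 0.90          # millions of USD over 5 years, from the slide (assumed pairing)

reduction = 1 - aws_tco / on_premises_tco
reduction_percent = round(reduction * 100)
```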
50% reduction in analytics costs
[Figure: charts with axes for gender (female, male), age (0 to 80), and day (Thursday to Sunday), with values from 0 to 0.6, for the venues Gorilla Coffee, Gray's Papaya, and Amorino]
Log files
250 EMR clusters spun up and down every week
Challenge: Large amounts of computing resources needed for short periods of time; significant data storage costs
Solution: Clusters of hundreds of nodes on EMR, running 4-5 hours at a time. Leverages the 1000 Genomes public data set on AWS: free access to ~200 TB of genomes for over 2,600 people from 26 populations around the world.
Challenge: Volatile weather is deadly to crops like grapes
Solution: Built a predictive model based on freely available data: 60 years of crop data, 14 TB of soil data, and 1M government Doppler radar points. 50 EMR clusters process new data as it comes into S3 each day, continuously updating the model.
THANK YOU!
http://awshub.com.br
slideshare.net/AmazonWebServicesLATAM
José Papo
Amazon Evangelist
@josepapo