2014 july 24_what_ishadoop
Uploaded by adam-muise
Transcript of 2014 july 24_what_ishadoop
![Page 2: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/2.jpg)
Who am I?
![Page 3: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/3.jpg)
Who is Hortonworks?
![Page 4: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/4.jpg)
We do Hadoop
The leaders of Hadoop’s development
Community driven, Enterprise Focused
Drive Innovation in the platform – We lead the roadmap
100% Open Source – Democratized Access to Data
![Page 5: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/5.jpg)
We do Hadoop successfully.
Support
Professional Services
Training
![Page 6: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/6.jpg)
What is Hadoop? What is everyone talking about?
![Page 7: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/7.jpg)
Data
![Page 8: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/8.jpg)
“Big Data” is the marketing term of the decade in IT
![Page 9: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/9.jpg)
What lurks behind the hype is the democratization of Data, a move to aggregate disparate data silos into one shiny pile of analytic gold.
![Page 10: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/10.jpg)
So what are the problems with Big Data?
![Page 11: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/11.jpg)
Let’s talk challenges…
![Page 12: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/12.jpg)
[Slide graphic: the word “Volume”, repeated a few times]
![Page 13: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/13.jpg)
[Slide graphic: the word “Volume”, repeated more times]
![Page 14: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/14.jpg)
[Slide graphic: the word “Volume”, repeated many more times]
![Page 15: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/15.jpg)
[Slide graphic: the word “Volume”, repeated until it fills the slide]
![Page 16: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/16.jpg)
Storage, Management, Processing all become challenges with Data at Volume
![Page 17: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/17.jpg)
Traditional technologies adopt a divide, drop, and conquer approach
![Page 18: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/18.jpg)
The solution?
[Diagram: data divided among separate stores: EDW, Another EDW, Yet Another EDW, Analytical DB, OLTP]
![Page 19: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/19.jpg)
Ummm…you dropped something
[Diagram: the same stores (EDW, Another EDW, Yet Another EDW, Analytical DB, OLTP), plus a large pile of data that belongs to none of them]
![Page 20: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/20.jpg)
Analyzing the data usually raises more interesting questions…
![Page 21: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/21.jpg)
…which leads to more data
![Page 22: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/22.jpg)
Wait, you’ve seen this before.
[Diagram: piles of data around an “Analytics Sausage Factory”, with still more data coming out]
![Page 23: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/23.jpg)
Data begets Data.
![Page 24: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/24.jpg)
What keeps us from our Data?
![Page 25: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/25.jpg)
“Prices, Stupid passwords, and Boring Statistics.”
- Hans Rosling
http://www.youtube.com/watch?v=hVimVzgtD6w
![Page 26: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/26.jpg)
Your data silos are lonely places.
[Diagram: four separate data silos: EDW, Accounts, Customers, Web Properties]
![Page 27: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/27.jpg)
…Data likes to be together.
[Diagram: the same four data sets (EDW, Accounts, Customers, Web Properties) brought together]
![Page 28: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/28.jpg)
Data likes to socialize too.
[Diagram: EDW, Accounts, Customers, and Web Properties data alongside Machine Data, CDR, Weather Data, and two unlabeled data sets]
![Page 29: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/29.jpg)
New types of data don’t quite fit into your pristine view of the world.
[Diagram: “My Little Data Empire” next to Logs and Machine Data, with question marks]
![Page 30: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/30.jpg)
To resolve this, some people take hints from Lord Of The Rings...
![Page 31: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/31.jpg)
…and create One-Schema-To-Rule-Them-All…
[Diagram: an EDW holding data under a single Schema]
![Page 32: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/32.jpg)
…but that has its problems too.
[Diagram: an EDW with a single Schema, surrounded by data and multiple ETL steps]
![Page 33: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/33.jpg)
What if the data was processed and stored centrally? What if you didn’t need to force it into a single schema? We call it a Data Lake.
[Diagram: Data Sources, a Data Lake with Process steps and multiple Schemas, an EDW, and BI & Analytics]
![Page 34: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/34.jpg)
A Data Lake Architecture enables:
- Landing data without forcing a single schema
- Landing a variety and large volume of data efficiently
- Retaining data for a long period of time with a very low $/TB
- A platform to feed other Analytical DBs
- A platform to execute next gen data analytics and processing applications (SAS, Informatica, Graph Analytics, Machine Learning, SAP, etc…)
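As a rough illustration of landing data without forcing a single schema, the sketch below stores raw records exactly as they arrived and applies a schema only when someone reads them. It is plain Python rather than Hadoop code, and the record layouts, field names, and the `read_with_schema` helper are all hypothetical.

```python
import json

# Raw records landed as-is; each source keeps its own shape (hypothetical data).
landed = [
    '{"source": "web", "user": "u1", "page": "/home"}',
    '{"source": "cdr", "caller": "u1", "seconds": 42}',
    '{"source": "web", "user": "u2", "page": "/cart", "referrer": "ad"}',
]


def read_with_schema(raw_lines, source, fields):
    """Schema-on-read: select one source and project only the requested fields."""
    for line in raw_lines:
        record = json.loads(line)
        if record.get("source") == source:
            # A missing field becomes None instead of causing the record to be rejected.
            yield {name: record.get(name) for name in fields}


# Two consumers read the same landed data through two different schemas.
web_views = list(read_with_schema(landed, "web", ["user", "page"]))
call_times = list(read_with_schema(landed, "cdr", ["caller", "seconds"]))
print(web_views)
print(call_times)
```

Nothing was rejected or reshaped at landing time; the extra `referrer` field simply stays in the raw data until some reader asks for it.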
![Page 35: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/35.jpg)
In most cases, more data is better. Work with the population, not just a sample.
![Page 36: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/36.jpg)
Your view of a client today.
Male
Female
Age: 25-30
Town/City
Middle Income Band
Product Category Preferences
![Page 37: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/37.jpg)
Your view with more data.
Male
Female
Age: 27 but feels old
GPS coordinates
$65-68k per year
Product recommendations
Tea Party
Hippie
Looking to start a business
Walking into Starbucks right now…
A depressed Toronto Maple Leafs fan
Products left in basket indicate drunk Amazon shopper
Gene Expression for Risk Taker
Thinking about a new house
Unhappy with his cell phone plan
Pregnant
Spent 25 minutes looking at tea cozies
![Page 38: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/38.jpg)
So what is the answer?
![Page 39: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/39.jpg)
Enter the Hadoop.
http://www.fabulouslybroke.com/2011/05/ninja-elephants-and-other-awesome-stories/
![Page 40: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/40.jpg)
Hadoop was created because traditional technologies never cut it for Internet properties like Google, Yahoo, Facebook, Twitter, and LinkedIn.
![Page 41: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/41.jpg)
Traditional architecture didn’t scale enough…
[Diagram: three separate stacks, each made up of App servers, DB servers, and a SAN]
![Page 42: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/42.jpg)
Traditional architectures cost too much at that volume…
$/TB
$pecial Hardware
$upercomputing
![Page 43: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/43.jpg)
So what is the answer?
![Page 44: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/44.jpg)
If you could design a system that would handle this, what would it look like?
![Page 45: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/45.jpg)
It would probably need a highly resilient, self-healing, cost-efficient, distributed file system…
[Diagram: a 3×3 grid of Storage nodes]
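To make “resilient” and “self-healing” a little more concrete, here is a toy model in plain Python. It is not HDFS code: the replica count of 3, the round-robin placement, and the `place_blocks` and `heal` helpers are assumptions made for this sketch. Each block is copied to several storage nodes, and when nodes die the missing copies are recreated on surviving nodes.

```python
import itertools

REPLICATION = 3  # assumed number of copies per block for this sketch


def place_blocks(blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin."""
    placement = {}
    ring = itertools.cycle(nodes)
    for block in blocks:
        placement[block] = {next(ring) for _ in range(replication)}
    return placement


def heal(placement, live_nodes, replication=REPLICATION):
    """Forget replicas on dead nodes, then re-copy blocks onto live nodes."""
    live = sorted(live_nodes)
    for block, holders in placement.items():
        holders.intersection_update(live)  # replicas on dead nodes are gone
        if not holders:
            raise RuntimeError(f"block {block} lost every replica")
        for node in live:
            if len(holders) >= replication:
                break
            holders.add(node)  # copy the block from a surviving replica
    return placement


nodes = [f"node{i}" for i in range(1, 10)]  # nine storage nodes
placement = place_blocks(["blk_a", "blk_b", "blk_c", "blk_d"], nodes)

# Two nodes fail; every block still has a surviving copy to heal from.
survivors = [n for n in nodes if n not in ("node2", "node5")]
heal(placement, survivors)
print(placement)
```

The point of the design is that losing a cheap node is routine: as long as one copy of a block survives, the system can restore the full replica count without an operator stepping in.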
![Page 46: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/46.jpg)
It would probably need a completely parallel processing framework that took tasks to the data…
[Diagram: a 3×3 grid of nodes, each with Storage and Processing]
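The idea of taking tasks to the data can be sketched as a tiny map-and-reduce word count. This is plain Python, not the Hadoop MapReduce API; the node names, the log lines, and the `map_local` and `reduce_counts` helpers are hypothetical, and the loop runs sequentially here where a real cluster would run the map step on every node in parallel.

```python
from collections import Counter

# Each node holds its own slice of the data (hypothetical log lines).
node_data = {
    "node1": ["error disk", "ok"],
    "node2": ["ok", "error net", "ok"],
    "node3": ["error disk"],
}


def map_local(lines):
    """Runs where the data lives and returns a small partial count."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def reduce_counts(partials):
    """Merge the partial counts sent back from each node."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Only the small partial results travel between nodes, never the raw lines.
partials = [map_local(lines) for lines in node_data.values()]
totals = reduce_counts(partials)
print(totals["error"], totals["ok"], totals["disk"])  # prints "3 3 2"
```

Shipping a small function to each node and a small result back is what lets this pattern scale: the bulk data never has to cross the network.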
![Page 47: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/47.jpg)
It would probably run on commodity hardware, virtualized machines, and common OS platforms
[Diagram: a 3×3 grid of nodes, each with Storage and Processing]
![Page 48: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/48.jpg)
It would probably be open source so innovation could happen as quickly as possible
![Page 49: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/49.jpg)
It would need a critical mass of users
![Page 50: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/50.jpg)
{Processing + Storage} = {YARN + HDFS}
![Page 51: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/51.jpg)
Want to get your hands dirty?
![Page 52: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/52.jpg)
To do this, we need to install Hadoop, right?
![Page 53: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/53.jpg)
Nope.
![Page 54: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/54.jpg)
Enter the Sandbox.
![Page 55: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/55.jpg)
The Sandbox is ‘Hadoop in a Can’. It contains one copy of each of the Master and Worker node processes used in a cluster, only in a single virtual node.
[Diagram: the 3×3 grid of Storage and Processing nodes collapsed into one Processing + Storage node inside a Linux VM]
![Page 56: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/56.jpg)
Getting started with Sandbox VM:
- Pick your flavor of VM at http://www.hortonworks.com/sandbox
- Start the sandbox VM
- Find the IP displayed
- Go to that IP in a browser, for example http://172.16.130.137
- Register
- Click on ‘Start Tutorials’
- On the left hand nav, click on ‘HCatalog, Basic Pig & Hive Commands’
![Page 57: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/57.jpg)
http://hortonworks.com/hadoop-tutorial/how-to-use-hcatalog-basic-pig-hive-commands/
In this tutorial you can…
- Land files in HDFS
- Assign metadata with HCatalog
- Use SQL with Hive
- Learn to process data with Pig
![Page 58: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/58.jpg)
Hadoop has other open source projects…
![Page 59: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/59.jpg)
Apache Hadoop
Flume
Ambari
HBase
Falcon
MapReduce
HDFS
Sqoop
HCatalog
Pig
Hive
Storm
YARN
Knox
Tez
![Page 60: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/60.jpg)
Hortonworks Data Platform
Flume
Ambari
HBase
Falcon
MapReduce
HDFS
Sqoop
HCatalog
Pig
Hive
Storm
YARN
Knox
Tez
![Page 61: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/61.jpg)
What else are we working on?
hortonworks.com/labs/
![Page 62: 2014 july 24_what_ishadoop](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c6ba814a795938518b459b/html5/thumbnails/62.jpg)
© Hortonworks Inc. 2012: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
There is NO second place
Hortonworks
We do Hadoop.