Riding the Elephant - Hadoop 2.0

Riding the Elephant: Hadoop 2.0
Simon Elliston Ball, Head of Big Data - Red Gate Ventures
@sireb
http://bit.ly/RidingElephants

In the beginning…
Append-only distributed file system (HDFS)
MapReduce
Java
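To make the MapReduce model concrete, here is a minimal single-process sketch of the classic word count. It is not Hadoop code. The names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical. They simulate on one machine the map, shuffle/sort and reduce steps that the framework would distribute across the cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group values by key, sorted by key, as the framework
    does between the map and reduce phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine every value emitted for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Hypothetical driver chaining the three phases in a single process."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["the elephant rides", "the elephant sleeps"]))
    # {'elephant': 2, 'rides': 1, 'sleeps': 1, 'the': 2}
```

On a real cluster the map calls run in parallel, ideally on the nodes that hold the HDFS blocks they read, and the shuffle moves data over the network. In Hadoop itself these steps are written in Java against the `Mapper` and `Reducer` classes.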

More languages
JVM-based (Scala, Groovy, Jython, Clojure)
Streaming (Python, whatever)
HDP for Windows and .NET SDK
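Hadoop Streaming lets any executable act as mapper or reducer. The executable reads records as lines on stdin and writes `key<TAB>value` lines to stdout. The framework sorts map output by key before the reducer sees it. Below is a minimal sketch of a Streaming-style word count in Python. The script name `wc.py` and the `map`/`reduce` command-line argument are assumptions made for this illustration.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' line per word: the text key/value format Streaming reads."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each word. Relies on the input being sorted by key,
    so that equal keys arrive on consecutive lines."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__":
    # Hypothetical convention for this sketch: first argument selects "map" or "reduce".
    role = sys.argv[1] if len(sys.argv) > 1 else "map"
    step = mapper if role == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

Locally, `cat input.txt | python wc.py map | sort | python wc.py reduce` imitates what Streaming does. On a cluster, the same script would be handed to the hadoop-streaming jar through its `-mapper` and `-reducer` options.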

Abstraction
Hive, Pig
Cascading
Scalding
Photo: https://www.flickr.com/photos/puroticorico/
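Tools such as Hive and Pig let you describe the result you want and compile that description into MapReduce jobs. As a rough illustration, consider `SELECT dept, AVG(salary) FROM emp GROUP BY dept`. It decomposes into a map step that keys each row by `dept` and a reduce step that averages each group. This is a toy and does not show how Hive's planner actually works. The `emp` table, its columns and the `group_by_avg` helper are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def group_by_avg(rows: Iterable[Row], key: str, value: str) -> Dict[Any, float]:
    """Hypothetical toy equivalent of: SELECT key, AVG(value) FROM rows GROUP BY key.
    Written as the explicit map -> shuffle -> reduce steps a query compiler might emit."""
    # Map: turn each row into (group key, (partial sum, partial count)).
    mapped: List[Tuple[Any, Tuple[float, int]]] = [
        (row[key], (float(row[value]), 1)) for row in rows
    ]
    # Shuffle: collect the partial (sum, count) pairs for each group key.
    groups: Dict[Any, List[Tuple[float, int]]] = defaultdict(list)
    for group, partial in mapped:
        groups[group].append(partial)
    # Reduce: merge the partials, dividing only once at the end.
    result: Dict[Any, float] = {}
    for group, partials in groups.items():
        total = sum(s for s, _ in partials)
        count = sum(c for _, c in partials)
        result[group] = total / count
    return result


if __name__ == "__main__":
    # Hypothetical "emp" table.
    emp = [
        {"dept": "eng", "salary": 100},
        {"dept": "eng", "salary": 200},
        {"dept": "ops", "salary": 90},
    ]
    print(group_by_avg(emp, "dept", "salary"))  # {'eng': 150.0, 'ops': 90.0}
```

The sketch carries (sum, count) pairs instead of averages. That lets partial results be merged safely, for example by a combiner on the map side, because an average of averages is generally wrong.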

Learning to share the toys
SQL on Hadoop
HBase
Solr on Hadoop
Sharing HDFS…

MapReduce v1
[Diagram: a Job is submitted to the JobTracker on the Head Node. The JobTracker hands Tasks (map or reduce) to the TaskTracker on each Data Node. Every TaskTracker has a fixed set of map slots (m slot 1 … m slot n) and reduce slots (r slot 1 … r slot n).]

MapReduce v1
[Diagram: the same cluster, with each TaskTracker reporting MR Status back to the JobTracker.]
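The fixed map and reduce slots are the key limitation of this design. A TaskTracker's map slots cannot run reduce tasks, and vice versa. The toy calculation below shows how reduce slots sit idle while map tasks queue during a map-heavy phase. It is not JobTracker code, and the `TaskTracker` class, cluster size and task counts are made-up assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskTracker:
    """Hypothetical model of an MRv1 TaskTracker with a fixed map/reduce slot split."""
    map_slots: int
    reduce_slots: int


def slot_utilization(trackers: List[TaskTracker], pending_maps: int, pending_reduces: int) -> float:
    """Fraction of all slots busy in one scheduling round when, as in MapReduce v1,
    map tasks may only use map slots and reduce tasks only reduce slots."""
    total_map = sum(t.map_slots for t in trackers)
    total_reduce = sum(t.reduce_slots for t in trackers)
    busy = min(pending_maps, total_map) + min(pending_reduces, total_reduce)
    return busy / (total_map + total_reduce)


if __name__ == "__main__":
    # Made-up cluster: 3 TaskTrackers, each with 4 map slots and 2 reduce slots.
    cluster = [TaskTracker(map_slots=4, reduce_slots=2) for _ in range(3)]
    # Map phase of a job: 20 map tasks queued, no reduce tasks runnable yet.
    print(slot_utilization(cluster, pending_maps=20, pending_reduces=0))
    # 12 of 18 slots busy (about 0.67) while 8 map tasks keep waiting
```

The single JobTracker also tracks every slot and every task of every job. That is the choke point the Hadoop 2.0 redesign removes.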

Typical Hadoop 1.x setup
[Diagram: HBase, Production, Adhoc]

YARN architecture
[Diagram: a YARN Client talks to the ResourceManager. Each of three Data Nodes runs a Node Manager that hosts Containers, including two Application Masters. One node still has a Free Slot.]
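In YARN, the ResourceManager hands out generic containers instead of typed map or reduce slots. Each application gets its own ApplicationMaster, which itself runs in a container and negotiates for the rest. The sketch below is a single-process toy, not the YARN API. `ResourceManager`, `NodeManager`, `allocate` and `submit` are hypothetical, and the memory-only resource model is a simplification; real YARN containers also account for virtual cores.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NodeManager:
    """Toy NodeManager: tracks free memory (MB) on one data node."""
    name: str
    free_mb: int


@dataclass
class ResourceManager:
    """Toy ResourceManager that grants memory-sized containers on any node with room."""
    nodes: List[NodeManager]
    allocations: Dict[str, List[str]] = field(default_factory=dict)

    def allocate(self, app_id: str, count: int, size_mb: int) -> int:
        """Grant up to `count` containers of `size_mb`; return how many were granted."""
        granted = 0
        for node in self.nodes:
            while granted < count and node.free_mb >= size_mb:
                node.free_mb -= size_mb
                self.allocations.setdefault(app_id, []).append(node.name)
                granted += 1
        return granted

    def submit(self, app_id: str, am_mb: int, tasks: int, task_mb: int) -> int:
        """Start the ApplicationMaster's container, then grant task containers.
        Simplification: in real YARN the ApplicationMaster, not the client,
        requests the task containers. Returns the number of task containers granted."""
        if self.allocate(app_id, 1, am_mb) == 0:
            return 0  # no room even for the ApplicationMaster
        return self.allocate(app_id, tasks, task_mb)


if __name__ == "__main__":
    rm = ResourceManager([NodeManager(f"node{i}", free_mb=8192) for i in range(3)])
    # A MapReduce-like job and an unrelated application draw on the same pool.
    print(rm.submit("mr-job", am_mb=1024, tasks=10, task_mb=1024))     # 10
    print(rm.submit("other-app", am_mb=1024, tasks=20, task_mb=1024))  # 12
    print([n.free_mb for n in rm.nodes])                               # [0, 0, 0]
```

Because containers are not typed, the second application soaks up all the capacity the first one leaves behind. Nothing stays reserved for a task type that has no work queued.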

Advantages
Removing the choke point
60%-150% better cluster utilization
Long-running applications

Operating system for Big Data?
Not quite…
…but a framework for Big Data apps
Security
Data access abstraction

A whole batch of new applications
Storm on YARN
HOYA (HBase on YARN)
Tez (Stinger)
MapReduce 2
Giraph
<Insert your application here>

Spinning YARNs with Spring
Batch applications
Services
Direct to YARN APIs
Spring Data Hadoop abstraction

Why?
Streaming
Machine learning
Graphs
Services
Distributed Shell - Anything.

A higher abstraction
Spark
Hadoop based? … but can run on YARN
In memory ✓
Distributed ✓
Fault tolerant ✓ (RDDs)
Real-time ✓
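RDDs (Resilient Distributed Datasets) get their fault tolerance from lineage. Each dataset remembers how it was derived from its parent, so a lost in-memory partition can be recomputed instead of being restored from a replica. The toy class below illustrates the idea in plain Python. `ToyRDD`, `compute`, `computations` and the manual cache eviction are hypothetical and are not PySpark's API.

```python
from typing import Callable, Dict, List, Optional


class ToyRDD:
    """Hypothetical single-process stand-in for an RDD: partitions plus lineage."""

    def __init__(self, parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[int], int]] = None,
                 source: Optional[List[List[int]]] = None) -> None:
        self.parent = parent      # lineage: the dataset this one was derived from
        self.fn = fn              # transformation applied to each parent element
        self.source = source      # raw partitions, only set on the base dataset
        self.cache: Dict[int, List[int]] = {}  # in-memory partitions
        self.computations = 0     # partitions derived from the parent so far

    def num_partitions(self) -> int:
        return len(self.source) if self.source is not None else self.parent.num_partitions()

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        """Lazy transformation: records lineage, computes nothing yet."""
        return ToyRDD(parent=self, fn=fn)

    def compute(self, i: int) -> List[int]:
        """Return partition i, rebuilding it from the parent if it is not in memory."""
        if i not in self.cache:
            if self.source is not None:
                self.cache[i] = list(self.source[i])
            else:
                self.computations += 1
                self.cache[i] = [self.fn(x) for x in self.parent.compute(i)]
        return self.cache[i]

    def collect(self) -> List[int]:
        return [x for i in range(self.num_partitions()) for x in self.compute(i)]


if __name__ == "__main__":
    base = ToyRDD(source=[[1, 2], [3, 4]])
    squares = base.map(lambda x: x * x)
    print(squares.collect())      # [1, 4, 9, 16]
    del squares.cache[1]          # simulate losing one in-memory partition
    print(squares.collect())      # [1, 4, 9, 16], partition 1 rebuilt from lineage
    print(squares.computations)   # 3: two initial partitions plus one rebuild
```

Only the lost partition is recomputed. Partition 0 is still served from memory, which is what keeps recovery cheap compared with rerunning the whole job.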

Wider sharing
Mesos
[Diagram: Hardware; Mesos; Mesos Framework (Hadoop, Spark, Aurora); HDFS, YARN, MapReduce, HBase etc.]
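Mesos shares a cluster through two-level scheduling. The Mesos master offers an agent's available resources to a framework. The framework's own scheduler then decides which offers to accept and which tasks to launch on them. The sketch below is a toy model of that offer cycle in plain Python, not the Mesos API. The `Offer` and `Framework` classes, the `offer_cycle` function, the framework names and the fixed offer order are assumptions for illustration; the real master uses a pluggable allocator, by default based on Dominant Resource Fairness.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Offer:
    """Resources a (toy) Mesos master offers from one agent."""
    agent: str
    cpus: float
    mem_mb: int


class Framework:
    """Hypothetical framework scheduler: accepts offers until all its tasks are launched."""

    def __init__(self, name: str, tasks: int, cpus_per_task: float, mem_per_task: int) -> None:
        self.name = name
        self.remaining = tasks
        self.cpus_per_task = cpus_per_task
        self.mem_per_task = mem_per_task
        self.launched: List[str] = []

    def resource_offer(self, offer: Offer) -> Offer:
        """Launch as many tasks as fit in the offer; return whatever is left unused."""
        while (self.remaining > 0 and offer.cpus >= self.cpus_per_task
               and offer.mem_mb >= self.mem_per_task):
            offer.cpus -= self.cpus_per_task
            offer.mem_mb -= self.mem_per_task
            self.launched.append(offer.agent)
            self.remaining -= 1
        return offer


def offer_cycle(agents: Dict[str, Offer], frameworks: List[Framework]) -> None:
    """Level 1: the master offers each agent's resources to frameworks in turn.
    Level 2: each framework decides what to run; leftovers pass to the next one."""
    for offer in agents.values():
        for fw in frameworks:
            offer = fw.resource_offer(offer)


if __name__ == "__main__":
    agents = {a: Offer(a, cpus=4.0, mem_mb=8192) for a in ("agent1", "agent2")}
    hadoop = Framework("hadoop", tasks=3, cpus_per_task=1.0, mem_per_task=2048)
    spark = Framework("spark", tasks=4, cpus_per_task=2.0, mem_per_task=2048)
    offer_cycle(agents, [hadoop, spark])
    print(hadoop.launched)  # ['agent1', 'agent1', 'agent1']
    print(spark.launched)   # ['agent2', 'agent2']
```

Two Spark tasks are still waiting at the end of this cycle. In this model they would be placed when later offers arrive. Placement policy lives in each framework, not in the shared layer underneath.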

The new world
Hadoop is more than MapReduce
YARN opens up new paradigms
Infrastructure maturing: better sharing

Hadoop and beyond!

Thank you

Questions?
Simon Elliston Ball, Head of Big Data - Red Gate Ventures
http://bit.ly/RidingElephants