Hadoop 101 v2
Uploaded by: john-berns
Category: Data & Analytics
![Page 1: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/1.jpg)
Hadoop 101: A really quick overview of the concepts…
![Page 2: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/2.jpg)
A few Terabytes of Data...
![Page 3: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/3.jpg)
![Page 4: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/4.jpg)
![Page 5: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/5.jpg)
Text processing--a few hours?
![Page 6: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/6.jpg)
But what if you have more data?
![Page 7: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/7.jpg)
Network Storage--Petabytes!
![Page 9: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/9.jpg)
What if you need compute power for complex algorithms?
![Page 10: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/10.jpg)
8 cores? 16 cores? 64 cores? 512 GB RAM?
![Page 11: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/11.jpg)
A network of commodity computers
![Page 12: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/12.jpg)
Run jobs on PART of the data on each computer, then AGGREGATE the intermediate results from each computer.
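As a rough illustration of "part of the data on each computer, then aggregate", here is a minimal single-machine sketch. The function names (`split_into_parts`, `count_words`, `aggregate`) and the word-count job are assumptions made for the example, not anything from the slides; on a real cluster each part would be processed on a different machine.

```python
from collections import Counter

def split_into_parts(lines, num_workers):
    """Deal the input lines into num_workers roughly equal parts."""
    return [lines[i::num_workers] for i in range(num_workers)]

def count_words(part):
    """The job each computer runs on its own part of the data."""
    counts = Counter()
    for line in part:
        counts.update(line.lower().split())
    return counts

def aggregate(partials):
    """Combine the intermediate results from every computer."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

lines = ["big data big cluster", "data node", "big rack", "name node"]
parts = split_into_parts(lines, num_workers=3)
partials = [count_words(p) for p in parts]  # these would run in parallel on a cluster
totals = aggregate(partials)
print(totals["big"], totals["node"])  # prints: 3 2
```

The aggregated counts are the same as counting the whole input on one machine, which is what makes splitting the work safe for this kind of job.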
![Page 13: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/13.jpg)
Let’s add a computer to manage the process of job delegation, merging the results...
and keeping track of the results...
![Page 14: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/14.jpg)
We also need something to keep track of what files are where, so we know where to find the data that needs to be processed...
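One way to picture this bookkeeping is a pair of lookup tables: file to blocks, and block to the computers holding it. The sketch below is a toy model only; the class `FileCatalog`, the block ids, and the node names are hypothetical and do not reflect how any real system stores its metadata.

```python
from collections import defaultdict

class FileCatalog:
    """Hypothetical stand-in for the 'what files are where' bookkeeping."""

    def __init__(self):
        self.blocks_of = defaultdict(list)  # file path -> ordered block ids
        self.locations = defaultdict(set)   # block id -> computers holding a copy

    def add_block(self, path, block_id, computers):
        """Record that a block of the file lives on the given computers."""
        self.blocks_of[path].append(block_id)
        self.locations[block_id].update(computers)

    def where_is(self, path):
        """Return [(block_id, computers)] so a job can be sent to the data."""
        return [(b, sorted(self.locations[b])) for b in self.blocks_of[path]]

catalog = FileCatalog()
catalog.add_block("/logs/2014.txt", "blk_1", ["node-a", "node-c"])
catalog.add_block("/logs/2014.txt", "blk_2", ["node-b"])
print(catalog.where_is("/logs/2014.txt"))
# prints: [('blk_1', ['node-a', 'node-c']), ('blk_2', ['node-b'])]
```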
![Page 15: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/15.jpg)
When you have a lot of computers, and even more hard drives,
one thing I can guarantee...
![Page 16: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/16.jpg)
Computers will eventually fail.
![Page 18: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/18.jpg)
Hard drives will eventually fail.
![Page 22: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/22.jpg)
Even whole racks will fail.
![Page 23: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/23.jpg)
If a computer fails and you only have one copy of your data...
![Page 24: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/24.jpg)
You will be very, very unhappy.
![Page 25: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/25.jpg)
So let’s store multiple copies of the data. Hard drives are CHEAP!
![Page 29: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/29.jpg)
If one hard drive fails... we are still OK
![Page 30: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/30.jpg)
If one computer fails... we are still OK
![Page 31: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/31.jpg)
Even if a whole rack fails... we are still OK
![Page 32: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/32.jpg)
Once we find a failure, let’s have the system re-copy the lost copies onto healthy machines.
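The idea of spreading copies across racks and re-copying after a failure can be sketched as below. This is a toy model under stated assumptions: three copies per block, a two-rack cluster with made-up node names, and a simple "one node per rack in turn" placement rule chosen for the example rather than taken from any real system.

```python
REPLICAS = 3  # assumed number of copies per block

# Hypothetical cluster layout: rack name -> nodes in that rack
cluster = {
    "rack1": ["r1-n1", "r1-n2"],
    "rack2": ["r2-n1", "r2-n2"],
}

def place_block(cluster, replicas=REPLICAS):
    """Choose nodes for one block, taking one node per rack in turn."""
    queues = [list(nodes) for _, nodes in sorted(cluster.items())]
    chosen = []
    while len(chosen) < replicas and any(queues):
        for queue in queues:
            if queue and len(chosen) < replicas:
                chosen.append(queue.pop(0))
    return set(chosen)

def re_replicate(holders, failed, cluster, replicas=REPLICAS):
    """Forget the failed nodes, then copy the block to spare live nodes."""
    survivors = holders - failed
    live = [n for nodes in cluster.values() for n in nodes if n not in failed]
    spares = [n for n in live if n not in survivors]
    while len(survivors) < replicas and spares:
        survivors.add(spares.pop(0))
    return survivors

holders = place_block(cluster)
print(sorted(holders))                        # ['r1-n1', 'r1-n2', 'r2-n1']
print(sorted(holders - set(cluster["rack1"])))  # rack1 lost, one copy left: ['r2-n1']
print(sorted(re_replicate(holders, {"r2-n1"}, cluster)))
# one node lost, block copied to a spare: ['r1-n1', 'r1-n2', 'r2-n2']
```

Because the copies never all sit in one rack, losing a drive, a computer, or a whole rack still leaves at least one copy to restore from.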
![Page 33: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/33.jpg)
Send the compute job to all nodes.
![Page 34: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/34.jpg)
And let it run on its part of the data…
![Page 38: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/38.jpg)
One is stuck…
![Page 39: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/39.jpg)
We have three copies, so we can rerun the stuck work on another computer that holds the same data
![Page 40: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/40.jpg)
And take the one that finishes fastest
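The "rerun the stuck one elsewhere and take whichever finishes first" idea can be imitated on one machine with threads. The sketch below uses Python's standard `concurrent.futures`; the node names, delays, and the `run_task` helper are invented for illustration, with `time.sleep` standing in for a slow machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def run_task(node, delay, data):
    """Pretend to process `data` on `node`; `delay` simulates machine speed."""
    time.sleep(delay)
    return node, sum(data)

data = [1, 2, 3, 4]

with ThreadPoolExecutor(max_workers=2) as pool:
    # First attempt runs on a node that turns out to be slow.
    primary = pool.submit(run_task, "node-a", 0.5, data)
    attempts = [primary]

    # If it has not finished within a short grace period, treat it as stuck
    # and start a second attempt on another node holding a copy of the data.
    finished, _ = wait([primary], timeout=0.05)
    if not finished:
        attempts.append(pool.submit(run_task, "node-b", 0.01, data))

    # Take whichever attempt finishes first.
    finished, _ = wait(attempts, return_when=FIRST_COMPLETED)
    winner, result = next(iter(finished)).result()

print(winner, result)  # prints: node-b 10
```

Both attempts compute the same answer from identical copies of the data, so it does not matter which one wins; the slower attempt's result is simply ignored.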
![Page 41: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/41.jpg)
Merge sorted sets based on some key…
A-E F-J K-O P-T U-Z
![Page 42: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/42.jpg)
…and write partial results
PART-01 PART-02 PART-03 PART-04 PART-05
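A minimal sketch of merging sorted output by key and writing one partial result per key range, using the A-E … U-Z ranges and PART-01 … PART-05 names from the slides. The helper names and sample records are assumptions, and for simplicity the "files" are in-memory lists merged in one place rather than one merge per computer.

```python
import heapq

# Key ranges as shown on the slide, one per partial result
RANGES = [("A", "E"), ("F", "J"), ("K", "O"), ("P", "T"), ("U", "Z")]

def partition_for(key):
    """Return the index of the range that the key's first letter falls in."""
    first = key[0].upper()
    for i, (low, high) in enumerate(RANGES):
        if low <= first <= high:
            return i
    raise ValueError(f"no range for key {key!r}")

def merge_into_parts(sorted_runs):
    """Merge already-sorted runs from each node and split them by key range."""
    parts = {f"PART-{i + 1:02d}": [] for i in range(len(RANGES))}
    for key, value in heapq.merge(*sorted_runs):
        parts[f"PART-{partition_for(key) + 1:02d}"].append((key, value))
    return parts

runs = [
    [("apple", 1), ("kiwi", 1), ("zebra", 1)],  # sorted output of one node
    [("banana", 1), ("fig", 1), ("pear", 1)],   # sorted output of another node
]
parts = merge_into_parts(runs)
print(parts["PART-01"])  # prints: [('apple', 1), ('banana', 1)]
print(parts["PART-05"])  # prints: [('zebra', 1)]
```

Since every input run is already sorted, `heapq.merge` only needs to compare the heads of the runs, so each part comes out sorted without re-sorting everything.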
![Page 43: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/43.jpg)
Guess what? We’ve just invented Hadoop!
![Page 44: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/44.jpg)
So let’s talk about the pieces of Hadoop.
![Page 45: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/45.jpg)
Data nodes store and manage the data on a single “slave” computer
Data Node
![Page 46: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/46.jpg)
Task trackers manage the compute
Data Node
Task Tracker
![Page 47: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/47.jpg)
Job tracker manages task trackers, ships code to compute nodes
Data Node
Task Tracker
Job Tracker
![Page 48: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/48.jpg)
Name node manages distribution and replication on the data nodes
Data Node
Task Tracker
Job Tracker
Name Node
![Page 49: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/49.jpg)
MapReduce
Task Tracker
Job Tracker
![Page 50: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/50.jpg)
HDFS (Hadoop Distributed File System)
Data Node
Name Node
![Page 51: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/51.jpg)
HDFS
![Page 52: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/52.jpg)
Visual Example
![Page 53: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/53.jpg)
Map
![Page 54: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/54.jpg)
Shuffle
![Page 55: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/55.jpg)
Reduce
![Page 56: Hadoop 101 v2](https://reader037.fdocuments.us/reader037/viewer/2022103015/54c657824a7959e8238b4655/html5/thumbnails/56.jpg)
Putting It All Together
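To tie the map, shuffle, and reduce steps together, here is a small word-count sketch in plain Python. It runs in one process and only mimics the three phases; the function names and sample input are assumptions for the example, not Hadoop's API.

```python
from collections import defaultdict

def map_phase(line):
    """Map: turn one input line into (word, 1) pairs."""
    return [(word, 1) for word in line.lower().split()]

def shuffle_phase(pairs):
    """Shuffle: group all values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

lines = ["Hadoop stores data", "Hadoop processes data"]
mapped = [pair for line in lines for pair in map_phase(line)]
shuffled = shuffle_phase(mapped)
result = dict(reduce_phase(key, values) for key, values in shuffled.items())
print(result)
# prints: {'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```

On a cluster, the map calls would run next to the data on each node, the shuffle would move grouped keys across the network, and each reducer would write its own partial result.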