Hadoop 101 v1
john-berns
HADOOP 101
A really quick overview of what it is
(and why you should care...)
Monday, 30 September, 13
HADOOP IS A
TOOL TO SOLVE
BIG DATA PROBLEMS
HADOOP IS A (ONE KIND OF) TOOL TO SOLVE
(SOME KINDS OF) BIG DATA PROBLEMS
OK....
SO WHAT IS A
BIG DATA PROBLEM?
“A PROBLEM IS A BIG DATA PROBLEM
WHEN THE SIZE OF THE DATA IS PART OF THE PROBLEM”
A few Terabytes of Data...
Text processing--a few hours?
But what if you have more data?
Network Storage--Petabytes!
What if you need compute power for complex algorithms?
8 cores? 16 cores? 64 cores? 512 GB RAM?
More processing in one box--price grows exponentially
[Chart: Single Machine; vertical axis 0 to 300,000, horizontal axis 16 to 128]
Compute cost for many commodity machines scales linearly
[Chart: Single Machine vs. Many Commodity Machines; vertical axis 0 to 300,000, horizontal axis 16 to 128]
A network of commodity computers
Run jobs on part of the data on each one. Compile the results.
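A minimal sketch of that idea, assuming a made-up job (counting lines that contain "ERROR") and using threads to stand in for separate machines. The function names and sample data are illustrative only and are not part of Hadoop.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(records, num_workers):
    """Deal records out round-robin so each worker gets a similar share."""
    return [records[i::num_workers] for i in range(num_workers)]

def count_errors(chunk):
    """The job each worker runs on its own part of the data."""
    return sum(1 for line in chunk if "ERROR" in line)

def run_distributed(records, num_workers=4):
    chunks = split_into_chunks(records, num_workers)
    # Threads stand in for separate machines in this sketch.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_counts = list(pool.map(count_errors, chunks))
    # Compile the partial results into one answer.
    return sum(partial_counts)

log_lines = ["INFO start", "ERROR disk", "INFO ok", "ERROR net", "ERROR cpu"]
total_errors = run_distributed(log_lines, num_workers=3)
print(total_errors)
```

Each worker only ever sees its own chunk, so adding workers shrinks the share each one has to handle.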
Let’s add a computer to manage delegating the jobs and merging the results...
We also need something to keep track of what files are where, so we know which data each computer needs to process...
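A toy version of that bookkeeping might look like the following. The `FileIndex` class, the file name and the node names are all hypothetical; this only illustrates the idea of a lookup from pieces of a file to the machines that hold them.

```python
class FileIndex:
    """Toy bookkeeping: which machine holds which piece of which file."""

    def __init__(self):
        self._locations = {}  # (filename, block_number) -> node name

    def record(self, filename, block_number, node):
        self._locations[(filename, block_number)] = node

    def blocks_of(self, filename):
        """Return {block_number: node} for one file, in block order."""
        found = {b: n for (f, b), n in self._locations.items() if f == filename}
        return dict(sorted(found.items()))

    def work_plan(self, filename):
        """Group a file's blocks by the node that holds them."""
        plan = {}
        for block, node in self.blocks_of(filename).items():
            plan.setdefault(node, []).append(block)
        return plan

index = FileIndex()
index.record("logs.txt", 0, "node-a")
index.record("logs.txt", 1, "node-b")
index.record("logs.txt", 2, "node-a")
plan = index.work_plan("logs.txt")
print(plan)
```

With a plan like this, the managing computer knows which machine to ask about which part of the file.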
When you have a lot of computers, and even more hard drives, one thing I can guarantee...
Computers will eventually fail.
Hard drives will eventually fail.
Even whole racks will fail.
If a computer fails and you only have one copy of your data...
You will be very, very unhappy.
So let’s store multiple copies of the data. Hard drives are CHEAP!
If one hard drive fails... we are still OK
If one computer fails... we are still OK
Even if a whole rack fails... we are still OK
Once we find a failure, let’s have the system make fresh copies to replace the ones that were lost.
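The replication idea on the last few slides can be sketched in a few lines. This is a toy model, not HDFS code: the node names, the rack map, `pick_targets`, `handle_failure` and the copy count of 3 are assumptions made for illustration. It drops failed machines from each block's list of holders, then copies the block to new machines, preferring racks that hold no copy yet.

```python
def pick_targets(live_nodes, rack_of, holders, needed):
    """Pick up to `needed` new nodes, preferring racks with no copy yet."""
    holders = set(holders)
    picked = []
    for _ in range(needed):
        used_racks = {rack_of[n] for n in holders}
        candidates = sorted(n for n in live_nodes if n not in holders)
        if not candidates:
            break  # not enough machines left to restore every copy
        # False sorts before True, so a node on an unused rack wins.
        best = min(candidates, key=lambda n: rack_of[n] in used_racks)
        picked.append(best)
        holders.add(best)
    return picked

def handle_failure(block_holders, failed_nodes, live_nodes, rack_of, replicas=3):
    """Return a new block map with failed nodes removed and copies restored."""
    repaired = {}
    for block, holders in block_holders.items():
        survivors = [n for n in holders if n not in failed_nodes]
        missing = replicas - len(survivors)
        repaired[block] = survivors + pick_targets(live_nodes, rack_of, survivors, missing)
    return repaired

# Hypothetical five-machine cluster spread over three racks.
rack_of = {"a1": "rack-a", "a2": "rack-a", "b1": "rack-b", "b2": "rack-b", "c1": "rack-c"}
block_holders = {"block-0": ["a1", "b1", "c1"], "block-1": ["a2", "b1", "b2"]}

# Simulate losing all of rack-b at once.
failed = {"b1", "b2"}
live = set(rack_of) - failed
repaired = handle_failure(block_holders, failed, live, rack_of)
print(repaired)
```

In this run block-1 had two of its three copies on the failed rack, yet one surviving copy is enough to rebuild the other two.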
Guess what? We’ve just invented Hadoop!
So let’s talk about the pieces of Hadoop.
Data nodes store and manage the data on a single “slave” computer
[Diagram: Data Node]
Task trackers manage the compute
[Diagram: Data Node, Task Tracker]
The job tracker manages the task trackers and ships code to the compute nodes
[Diagram: Data Node, Task Tracker, Job Tracker]
The name node manages distribution and replication of the data across the data nodes
[Diagram: Data Node, Task Tracker, Job Tracker, Name Node]
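One way to picture how these pieces cooperate: the name node knows which machines store each block, and the job tracker uses that to send each task to a machine that already has the data. The sketch below is a heavy simplification under that assumption. `assign_tasks`, the block names and the node names are hypothetical, and real scheduling weighs much more than this.

```python
def assign_tasks(block_locations):
    """Give each block's task to the least-loaded node that stores the block."""
    load = {}        # node -> number of tasks handed out so far
    assignment = {}  # block -> node chosen to process it
    for block in sorted(block_locations):
        holders = block_locations[block]
        # Fewest tasks first; break ties by node name for a stable result.
        node = min(holders, key=lambda n: (load.get(n, 0), n))
        assignment[block] = node
        load[node] = load.get(node, 0) + 1
    return assignment

# What a name node might report: block -> machines holding a copy.
block_locations = {
    "block-0": ["node-a", "node-b"],
    "block-1": ["node-a", "node-c"],
    "block-2": ["node-b", "node-c"],
    "block-3": ["node-a", "node-b"],
}
assignment = assign_tasks(block_locations)
print(assignment)
```

Every task lands on a machine that already holds its block, so the code travels to the data and the data stays where it is.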
MapReduce: Task Tracker, Job Tracker
HDFS (Hadoop Distributed File System): Data Node, Name Node
VISUAL EXAMPLE
MAP
SHUFFLE
REDUCE
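The slide images for these three steps did not survive in the transcript, so here is a stand-in: the classic word count, written in plain Python on one machine. It is not necessarily the example the slides showed, and the function names and sample lines are illustrative assumptions. On a real cluster the map and reduce steps would run on many machines at once.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)

def reduce_phase(key, values):
    """Reduce: collapse one key's values into a single result."""
    return key, sum(values)

lines = ["the quick brown fox", "the lazy dog", "the fox"]

mapped = [pair for line in lines for pair in map_phase(line)]
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(key, values) for key, values in grouped.items())
print(counts)
```

The map step never needs to see more than one line, and the reduce step never needs to see more than one word, which is what lets both be spread across machines.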
PUTTING IT ALL TOGETHER
SO WHAT?
THAT’S WHAT!
OPPORTUNITY!
HADOOP ECOSYSTEM