Hadoop 101 v1

HADOOP 101 A really quick overview of what it is (and why you should care...) Monday, 30 September, 13

description

Given to Hadoop.SG meetup, Sept 25, 2013

Transcript of Hadoop 101 v1

Page 1: Hadoop 101 v1

HADOOP 101

A really quick overview of what it is

(and why you should care...)

Monday, 30 September, 13

Page 2: Hadoop 101 v1

HADOOP IS A

TOOL TO SOLVE

BIG DATA PROBLEMS

Page 3: Hadoop 101 v1

HADOOP IS A (ONE KIND OF) TOOL TO SOLVE

(SOME KINDS OF) BIG DATA PROBLEMS

Page 4: Hadoop 101 v1

OK....

SO WHAT IS A

BIG DATA PROBLEM?

Page 5: Hadoop 101 v1

“A PROBLEM IS A BIG DATA PROBLEM

WHEN THE SIZE OF THE DATA IS PART OF THE PROBLEM”

Page 6: Hadoop 101 v1

A few terabytes of data...

Page 9: Hadoop 101 v1

Text processing--a few hours?

Page 10: Hadoop 101 v1

But what if you have more data?

Page 11: Hadoop 101 v1

Network Storage--Petabytes!

Page 13: Hadoop 101 v1

What if you need compute power for complex algorithms?

Page 14: Hadoop 101 v1

8 cores? 16 cores? 64 cores? 512 GB RAM?

Page 15: Hadoop 101 v1

More processing in one box--price grows exponentially

[Chart: price (axis ticks 0 to 300,000) against machine size (axis ticks 16 to 128); series: Single Machine]

Page 16: Hadoop 101 v1

Compute cost for many commodity machines scales linearly

[Chart: price (axis ticks 0 to 300,000) against machine size (axis ticks 16 to 128); series: Single Machine, Many Commodity Machines]

Page 17: Hadoop 101 v1

A network of commodity computers

Page 18: Hadoop 101 v1

Run jobs on part of the data on each one. Compile the results.
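
The idea on this slide can be sketched in a few lines of Python. This is only an illustration, not how Hadoop itself is implemented: threads stand in for separate machines, and the names `split_into_chunks`, `count_errors`, and `run_job`, along with the sample log lines, are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, n_machines):
    """Deal the records out into n_machines roughly equal parts."""
    return [records[i::n_machines] for i in range(n_machines)]


def count_errors(chunk):
    """The job each 'machine' runs on its own part of the data."""
    return sum(1 for line in chunk if "ERROR" in line)


def run_job(records, n_machines=3):
    chunks = split_into_chunks(records, n_machines)
    # Threads stand in for separate commodity machines in this sketch.
    with ThreadPoolExecutor(max_workers=n_machines) as pool:
        partial_counts = list(pool.map(count_errors, chunks))
    # Compile the partial results into a single answer.
    return sum(partial_counts)


log_lines = [
    "INFO start",
    "ERROR disk full",
    "INFO ok",
    "ERROR timeout",
    "WARN slow",
    "ERROR disk full",
    "INFO done",
]
total_errors = run_job(log_lines)
```

Each worker only ever sees its own chunk, so adding more workers means each one has less data to scan.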

Page 19: Hadoop 101 v1

Let’s add a computer to manage the process of job delegation, merging the results...

Page 20: Hadoop 101 v1

We also need something to keep track of what files are where, so we know what data needs to be computed...
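
One way to picture this bookkeeping is a lookup table from pieces of a file to the machines holding them. The sketch below is a toy, assuming files are cut into numbered blocks; `BlockIndex`, its methods, and the file and node names are hypothetical and are not Hadoop's API.

```python
class BlockIndex:
    """Toy in-memory record of which nodes hold which blocks of a file."""

    def __init__(self):
        # (filename, block number) -> set of node names holding a copy
        self._locations = {}

    def record(self, filename, block_no, node):
        """Note that `node` stores a copy of this block."""
        self._locations.setdefault((filename, block_no), set()).add(node)

    def nodes_for(self, filename, block_no):
        """Return the nodes known to hold this block (empty if unknown)."""
        return sorted(self._locations.get((filename, block_no), set()))

    def blocks_of(self, filename):
        """Return the block numbers recorded for a file."""
        return sorted(b for (f, b) in self._locations if f == filename)


index = BlockIndex()
index.record("clicks.log", 0, "node-a")
index.record("clicks.log", 0, "node-b")
index.record("clicks.log", 1, "node-c")
index.record("users.csv", 0, "node-a")
```

With a table like this, a job over `clicks.log` can look up which machines to send work to instead of searching every disk.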

Page 21: Hadoop 101 v1

When you have a lot of computers, and even more hard drives, one thing I can guarantee...

Page 22: Hadoop 101 v1

Computers will eventually fail.

Page 24: Hadoop 101 v1

Hard drives will eventually fail.

Page 28: Hadoop 101 v1

Even whole racks will fail.

Page 29: Hadoop 101 v1

If a computer fails and you only have one copy of your data...

Page 30: Hadoop 101 v1

You will be very, very unhappy.

Page 31: Hadoop 101 v1

So let’s store multiple copies of the data. Hard drives are CHEAP!

Page 35: Hadoop 101 v1

If one hard drive fails... we are still OK

Page 36: Hadoop 101 v1

If one computer fails... we are still OK

Page 37: Hadoop 101 v1

Even if a whole rack fails... we are still OK

Page 38: Hadoop 101 v1

Once we find a failure, let’s have the system re-copy the lost copies.
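
The store-several-copies-then-repair idea can be sketched as follows. Everything here is an assumption made for illustration: the target of three copies, the round-robin placement, and the names `place_replicas` and `handle_failure`. The sketch also ignores racks, which a real system would need to consider in order to survive a whole-rack failure.

```python
REPLICATION = 3  # assumed target number of copies per block


def place_replicas(blocks, nodes, replication=REPLICATION):
    """Give each block `replication` distinct nodes, chosen round-robin.

    Assumes there are at least `replication` nodes.
    """
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = {nodes[(i + k) % len(nodes)] for k in range(replication)}
    return placement


def handle_failure(placement, failed_node, live_nodes, replication=REPLICATION):
    """Forget the failed node, then re-copy any block that is short of replicas."""
    for holders in placement.values():
        holders.discard(failed_node)
        # Healthy nodes that do not yet hold this block.
        candidates = [n for n in live_nodes if n not in holders]
        while len(holders) < replication and candidates:
            holders.add(candidates.pop(0))
    return placement


nodes = ["n1", "n2", "n3", "n4", "n5"]
placement = place_replicas(["b0", "b1", "b2", "b3"], nodes)

# Node n3 dies: every block it held gets a new copy on a healthy node.
live_nodes = [n for n in nodes if n != "n3"]
placement = handle_failure(placement, "n3", live_nodes)
```

After the repair step every block is back to three copies, so a second failure can be absorbed the same way.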

Page 39: Hadoop 101 v1

Guess what? We’ve just invented Hadoop!

Page 40: Hadoop 101 v1

So let’s talk about the pieces of Hadoop.

Page 41: Hadoop 101 v1

Data nodes store and manage the data on a single “slave” computer

Data Node

Page 42: Hadoop 101 v1

Task trackers manage the compute

Data Node

Task Tracker

Page 43: Hadoop 101 v1

Job tracker manages task trackers, ships code to compute nodes

Data Node

Task Tracker

Job Tracker

Page 44: Hadoop 101 v1

Name node manages distribution and replication on the data nodes

Data Node

Task Tracker

Job Tracker

Name Node
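
With the four pieces in place, shipping code to the compute nodes can be pictured as choosing, for each block of data, a machine that already stores it. The sketch below is a simplification under that assumption: `assign_tasks` and the block and node names are hypothetical, and a real scheduler weighs much more than a simple task count.

```python
def assign_tasks(block_locations):
    """For each block, pick the least-loaded node that already stores it."""
    load = {}        # node -> number of tasks assigned so far
    assignment = {}  # block -> node chosen to process it
    for block in sorted(block_locations):
        holders = sorted(block_locations[block])
        # Prefer the holder with the fewest tasks; ties go to the first name.
        chosen = min(holders, key=lambda node: load.get(node, 0))
        assignment[block] = chosen
        load[chosen] = load.get(chosen, 0) + 1
    return assignment


# Which nodes hold a copy of each block (the name node's knowledge).
block_locations = {
    "b0": {"node-a", "node-b"},
    "b1": {"node-a", "node-c"},
    "b2": {"node-b", "node-c"},
}
assignment = assign_tasks(block_locations)
```

Because each task runs where a copy of its data already lives, only the small piece of code has to travel over the network.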

Page 45: Hadoop 101 v1

Map Reduce

Task Tracker

Job Tracker

Page 46: Hadoop 101 v1

HDFS (Hadoop Distributed File System)

Data Node

Name Node

Page 47: Hadoop 101 v1

VISUAL EXAMPLE

Page 48: Hadoop 101 v1

MAP

Page 49: Hadoop 101 v1

SHUFFLE

Page 50: Hadoop 101 v1

REDUCE
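
The three phases named on the last three slides can be sketched in plain Python with a word count, the usual teaching example. The original slides showed these steps as pictures, so the example data and the function names `map_phase`, `shuffle_phase`, and `reduce_phase` are assumptions made here; this runs on one machine and does not use the Hadoop API.

```python
from collections import defaultdict


def map_phase(line):
    """MAP: turn one input record into (key, value) pairs."""
    return [(word, 1) for word in line.lower().split()]


def shuffle_phase(mapped_pairs):
    """SHUFFLE: bring all values that share a key together."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key, values):
    """REDUCE: collapse the values for one key into a single result."""
    return key, sum(values)


lines = ["big data big problems", "small data small problems", "big cluster"]

mapped = [pair for line in lines for pair in map_phase(line)]
grouped = shuffle_phase(mapped)
word_counts = dict(reduce_phase(key, values) for key, values in grouped.items())
```

Map works on one record at a time and reduce on one key at a time, so both can be spread across many machines; the shuffle is the step that moves data between them.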

Page 51: Hadoop 101 v1

PUTTING IT ALL TOGETHER

Page 52: Hadoop 101 v1

SO WHAT?

Page 53: Hadoop 101 v1

THAT’S WHAT!

Page 54: Hadoop 101 v1

OPPORTUNITY!

Page 55: Hadoop 101 v1

HADOOP ECOSYSTEM