Hadoop 101: North East Wisconsin Code Camp

Post on 16-Jul-2015

Hadoop 101: Cluster Computing Made Easy

Show of Hands

Big Data

Volume

Variety

Velocity

Common Types of Analysis

Text mining

Index building

Graph creation and analysis

Pattern recognition

Collaborative filtering

Prediction Models

Sentiment Analysis

Risk Assessment

Hadoop

Hadoop is a cluster storage and computing framework.

Changing of the Guard

“Scale out guarantees that hardware and software will fail”

“I don’t want to see any more 2001 papers about how awesome my IT team was because they could reshard my database on demand.”

Storage

[Diagram: items A and B, each shown next to three copies]

Tunneling Through the Cost Barrier

Solutions

“In pioneer days they used oxen for heavy pulling, and when one ox couldn’t budge a log, they didn’t try to grow a larger ox. We shouldn’t be trying for bigger computers, but for more systems of computers.”

Rear Admiral Grace Hopper

Cluster Computing Complexities

Process management

Communication

Data movement

Task coordination

Partial failures

Scheduling

Tracking

Robustness

Resilience

Performance

Simplicity

Where Do You Fit?

[Diagram: MapReduce data flow]

Input Split 1, Input Split 2, ... Input Split n

Each split: Record Reader → Mapper → Partitioner

Shuffle and Sort

Each reducer: Reducer → Output Format → Output File

Where Do You Fit?

[Diagram: MapReduce data flow for two splits]

Input Split A, Input Split B

Each split: Record Reader → Mapper → Partitioner

Shuffle and Sort

Each reducer: Reducer → Output Format → Output File
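
The slides that follow cover the Mapper and the Reducer, which suggests those are the pieces you write and the framework handles the rest. Here is a minimal Python sketch of that "rest": partitioning, shuffle and sort, and one output per reducer. It is a single-process simulation and does not use the Hadoop API. The names `partition` and `run_job`, the crc32 hash, and the second input line are assumptions made for illustration.

```python
from collections import defaultdict
import zlib


def partition(key, num_reducers):
    # Hash partitioning: equal keys always go to the same reducer.
    # crc32 is an assumption of this sketch (stable across runs),
    # not the hash Hadoop itself uses.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_job(splits, mapper, reducer, num_reducers=2):
    # One bucket per reducer; each bucket maps key -> list of values.
    buckets = [defaultdict(list) for _ in range(num_reducers)]

    # Map phase: the line index stands in for the record reader's key.
    for split in splits:
        for index, line in enumerate(split):
            for key, value in mapper(index, line):
                buckets[partition(key, num_reducers)][key].append(value)

    # Shuffle and sort, then reduce: each reducer sees its keys in order.
    outputs = []
    for bucket in buckets:
        part = []
        for key in sorted(bucket):
            part.extend(reducer(key, bucket[key]))
        outputs.append(part)
    return outputs  # one list per reducer, standing in for one output file


def word_mapper(index, line):
    for word in line.split():
        yield word, 1


def sum_reducer(key, values):
    yield key, sum(values)


split_a = ["the cat sat on the mat"]
split_b = ["the dog sat"]  # made-up second split for the sketch
parts = run_job([split_a, split_b], word_mapper, sum_reducer)
```

Every key lands in exactly one of the two parts, so merging the parts gives the full word count.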

Mapper Purpose

Sanitize Data

Select Subsets

Convert
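
A rough Python sketch of one mapper doing all three jobs. The "name,amount" record format and the name `clean_mapper` are hypothetical, chosen only to make the three steps visible. This is plain Python, not the Hadoop Mapper API.

```python
def clean_mapper(key, value):
    # Sanitize: trim whitespace and normalise case.
    line = value.strip().lower()

    # Select subsets: skip blank lines and comment lines.
    if not line or line.startswith("#"):
        return

    # Convert: turn "name,amount" text into a (name, int) pair.
    name, _, amount = line.partition(",")
    try:
        yield name.strip(), int(amount)
    except ValueError:
        return  # malformed amount: drop the record


records = ["  Apples, 3 ", "# comment", "", "pears,x", "PEARS,7"]
cleaned = [pair for i, r in enumerate(records) for pair in clean_mapper(i, r)]
```

Only the two well-formed records survive; the comment, the blank line, and the malformed amount are dropped.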

Input Split A → Record Reader → Mapper → Partitioner

Mapper

Input: Key, Value, Context

Output: Key, Value

Word Count Mapper

Input: (Long, Text)

Key: 0

Value: “the cat sat on the mat”

Output: (Text, Long)

Key   Value
the   1
cat   1
sat   1
on    1
the   1
mat   1
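
The same mapper as a minimal Python sketch. This is a plain generator, not Hadoop code: `yield` stands in for writing a pair to the Context, and the name `word_count_mapper` is an assumption.

```python
def word_count_mapper(key, value):
    # key is the record's position in the input; word count ignores it.
    for word in value.split():
        yield word, 1  # emit (word, 1) once per occurrence


pairs = list(word_count_mapper(0, "the cat sat on the mat"))
```

Note that "the" is emitted twice. The mapper does no counting; adding up is left to the reducer.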

Reducer

Input: Key, Values (an iterable), Context

Output: Key, Value

Reducer

Input:

Key   Values
cat   1
mat   1
on    1
sat   1
the   1, 1

Output:

Key   Value
cat   1
mat   1
on    1
sat   1
the   2
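
A minimal Python sketch of the step shown above: group the mapper output by key, then sum each group. It is a single-process simulation, not the Hadoop Reducer API, and the names `shuffle_and_sort` and `word_count_reducer` are assumptions.

```python
from itertools import groupby
from operator import itemgetter


def shuffle_and_sort(pairs):
    # Sort by key, then collect every value that shares a key.
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def word_count_reducer(key, values):
    # values holds every count emitted for this key.
    yield key, sum(values)


mapped = [("the", 1), ("cat", 1), ("sat", 1), ("on", 1), ("the", 1), ("mat", 1)]
grouped = list(shuffle_and_sort(mapped))
reduced = [out for key, values in grouped
           for out in word_count_reducer(key, values)]
```

`grouped` matches the reducer input table and `reduced` matches the output table.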

Reducer

reduce() { }

part-r-00001

Demo

MRUnit

Mapper

Reducer

Run the whole cycle

Platform

Bibliography

Rear Admiral Hopper: http://www.youtube.com/watch?v=1-vcErOPofQ

Mike Olson talk: http://web.archive.org/web/20130729201323id_/http://itc.conversationsnetwork.org/shows/detail4868.html

Large-Scale C++ Software Design by John Lakos: http://www.amazon.com/Large-Scale-Software-Design-John-Lakos/dp/0201633620

Jim Argeropoulos

tenholeharp@gmail.com

@exploremqt

https://github.com/exploremqt