An Introduction to MapReduce 2 and YARN

An Introduction to MapReduce 2 and YARN Tom White April 25, 2012 Seattle Hadoop / Scalability / NoSQL Meetup Wednesday, April 25, 2012

Page 1: An Introduction to MapReduce 2 and YARN

An Introduction to MapReduce 2 and YARN

Tom White

April 25, 2012

Seattle Hadoop / Scalability / NoSQL Meetup

Wednesday, April 25, 2012

Page 2: An Introduction to MapReduce 2 and YARN

First, what's MapReduce 1?

Page 3: An Introduction to MapReduce 2 and YARN

Page 4: An Introduction to MapReduce 2 and YARN

What's wrong with MR1?

Page 5: An Introduction to MapReduce 2 and YARN

Motivation

• Scaling beyond 4,000 nodes

• High availability (HA) of the Job Tracker

• Poor resource utilization

Page 6: An Introduction to MapReduce 2 and YARN

Yet Another Resource Negotiator

Page 7: An Introduction to MapReduce 2 and YARN

Page 8: An Introduction to MapReduce 2 and YARN

Page 9: An Introduction to MapReduce 2 and YARN

Node Manager is a generalized Task Tracker

• Task Tracker

  • fixed number of map or reduce slots

• Node Manager

  • containers with variable resource limits
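The contrast between fixed slots and variable-size containers can be sketched with a toy model. This is not Hadoop code: the class and method names are hypothetical, the node size and slot counts are made-up numbers, and memory is assumed to be the only resource a container is sized by.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of MR1-style fixed slots vs. YARN-style containers. Not Hadoop code. */
class SlotsVsContainers {

    /** MR1-style node: map and reduce slot counts are fixed when the node starts. */
    static int runnableWithSlots(int mapSlots, int reduceSlots, int mapTasks, int reduceTasks) {
        // A map task can only occupy a map slot, a reduce task only a reduce slot,
        // so idle slots of one kind cannot help a backlog of the other kind.
        return Math.min(mapSlots, mapTasks) + Math.min(reduceSlots, reduceTasks);
    }

    /** YARN-style node: one memory pool, carved into containers of the requested sizes. */
    static int runnableWithContainers(int nodeMemoryMb, List<Integer> requestsMb) {
        int freeMb = nodeMemoryMb;
        int granted = 0;
        for (int requestMb : requestsMb) {
            if (requestMb <= freeMb) {   // grant only if the request still fits
                freeMb -= requestMb;
                granted++;
            }
        }
        return granted;
    }

    public static void main(String[] args) {
        // Assumed node: 8 GB, configured under MR1 as 4 map + 4 reduce slots of 1 GB each.
        // Assumed workload: 8 map tasks and no reduce tasks yet (the map phase of a job).
        int withSlots = runnableWithSlots(4, 4, 8, 0);

        List<Integer> requestsMb = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            requestsMb.add(1024);        // each task asks for a 1 GB container
        }
        int withContainers = runnableWithContainers(8192, requestsMb);

        System.out.println("slots: " + withSlots + ", containers: " + withContainers);
        // prints "slots: 4, containers: 8"
    }
}
```

In this model the four reduce slots sit idle during the map phase, while the container-based node can hand the same memory to whichever tasks are waiting.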

Page 10: An Introduction to MapReduce 2 and YARN

Page 11: An Introduction to MapReduce 2 and YARN

Page 12: An Introduction to MapReduce 2 and YARN

MR is user space

YARN is kernel
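One way to read the analogy: the resource layer only hands out capacity and knows nothing about maps or reduces, while MapReduce is just one application that asks it for containers. The sketch below is a toy illustration under that reading. Every name in it (`ToyResourceManager`, `ToyApplication`, the app labels) is hypothetical and none of it is the YARN API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy illustration of the "kernel vs. user space" split. All names are hypothetical. */
class KernelAnalogy {

    /** "Kernel": tracks free memory and grants containers; has no notion of map or reduce. */
    static class ToyResourceManager {
        private int freeMb;
        private final Map<String, Integer> grantedMbByApp = new LinkedHashMap<>();

        ToyResourceManager(int clusterMb) {
            this.freeMb = clusterMb;
        }

        /** Grants a container of the requested size if enough capacity remains. */
        boolean allocate(String appName, int mb) {
            if (mb <= 0 || mb > freeMb) {
                return false;
            }
            freeMb -= mb;
            grantedMbByApp.merge(appName, mb, Integer::sum);
            return true;
        }

        int freeMb() {
            return freeMb;
        }

        int grantedMb(String appName) {
            return grantedMbByApp.getOrDefault(appName, 0);
        }
    }

    /** "User space": any framework that can ask the resource layer for containers. */
    interface ToyApplication {
        void requestResources(ToyResourceManager rm);
    }

    public static void main(String[] args) {
        ToyResourceManager rm = new ToyResourceManager(4096);

        // MapReduce is one client of the resource layer...
        ToyApplication mapReduce = r -> {
            r.allocate("mapreduce", 1024);
            r.allocate("mapreduce", 1024);
        };
        // ...and a different framework is served by exactly the same call.
        ToyApplication other = r -> r.allocate("other-framework", 2048);

        mapReduce.requestResources(rm);
        other.requestResources(rm);

        System.out.println("free MB: " + rm.freeMb());
        // prints "free MB: 0"
    }
}
```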

Page 13: An Introduction to MapReduce 2 and YARN

Bonus Apps

• Distributed shell

• MPI (MAPREDUCE-2911)

• Master-worker (MAPREDUCE-3315)

• Apache Giraph, Hama

Page 14: An Introduction to MapReduce 2 and YARN

Page 15: An Introduction to MapReduce 2 and YARN

Page 16: An Introduction to MapReduce 2 and YARN

Old API ≠ MR1

New API ≠ MR2

Page 17: An Introduction to MapReduce 2 and YARN

        Old API (o.a.h.mapred)    New API (o.a.h.mapreduce)
MR1     ✓                         ✓
MR2     ✓                         ✓

Page 18: An Introduction to MapReduce 2 and YARN

Page 19: An Introduction to MapReduce 2 and YARN

Try out MR2

• Apache Hadoop 0.23.1

• CDH4 Beta 2

Page 20: An Introduction to MapReduce 2 and YARN

MR1

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>1.0.2</version>
</dependency>

MR2

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>0.23.1</version>
</dependency>

Page 21: An Introduction to MapReduce 2 and YARN

TODO

• Still alpha status

• Performance tuning

• Usability bug fixes

• RM recovery

• Security in MR2 not complete

Page 22: An Introduction to MapReduce 2 and YARN

Further Reading

Page 23: An Introduction to MapReduce 2 and YARN

Thank You!
