Google MapReduce


Transcript of Google MapReduce

Page 1: Google  MapReduce

A DISTRIBUTED COMPUTING SOLUTION

Google MapReduce

Page 2: Google  MapReduce

Outlines

Introduction
MapReduce Model
Implementation
Refinements
Q&A

Page 3: Google  MapReduce

Outlines

Introduction
MapReduce Model
Implementation
Refinements
Q&A

Page 4: Google  MapReduce

Introduction

MapReduce is designed to process large amounts of distributed raw data: machine learning, clustering, queries, graph computation… The most significant use to date is indexing.

What are the issues to be concerned with?

Page 5: Google  MapReduce

Introduction

Abstraction
Reliability
  Fault tolerance
  Load balancing
Efficiency
  Parallelization
  Data distribution

Page 6: Google  MapReduce

Brief Ideas

User defines the input data as key/value pairs.

User defines a Mapper function to process key/value pairs into intermediate key/value pairs.

User defines a Reducer function to process intermediate key/value pairs and produce the results.

Page 7: Google  MapReduce

Brief Ideas

Why does it work?

An interface that abstracts the large-scale computation into two main functions.

Automatic parallelization
Robustness

Page 8: Google  MapReduce

Outlines

Introduction
MapReduce Model
Implementation
Refinements
Q&A

Page 9: Google  MapReduce

Model

Page 10: Google  MapReduce

Key/Value Pair

The input data type.
Consists of a key and a value.
Both key and value are of String type.

Page 11: Google  MapReduce

Map function

A user-specified function.
Processes some splits of input key/value pairs and produces intermediate key/value pairs.
The output should be sorted (by key) and flushed to disk. Sorting makes sure pairs with the same key are grouped together.

(key, value)[] map((key, value))

Page 12: Google  MapReduce

Intermediate Key/Value Pairs

The output of mappers.
Intermediate key/value pairs will be further shuffled (grouped) by the reducers. The partitioning is done by a user-specified partitioning function, e.g. Hash(key) mod R.
Now the pairs become "key/value[]" pairs.
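
A minimal sketch of this shuffling/grouping step in Python, assuming all intermediate pairs fit in one process's memory (the real system does this across machines and on-disk files):

from collections import defaultdict

def shuffle(intermediate_pairs):
    # Group intermediate (key, value) pairs so each key maps to a list of values.
    groups = defaultdict(list)
    for key, value in intermediate_pairs:
        groups[key].append(value)
    return groups

# Example: pairs emitted by several mappers for a word-count job.
pairs = [("the", "1"), ("quick", "1"), ("the", "1")]
print(dict(shuffle(pairs)))   # {'the': ['1', '1'], 'quick': ['1']}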

Page 13: Google  MapReduce

Reduce function

Another user-specified function.
Integrates a key and its list of values into the result output.

Usually the result contains only one value, or even none.

value[] reduce((key, value[]))
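
As an illustration of the two signatures above (just the shapes, not Google's actual C++ interface), written as Python type hints:

from typing import Callable, Iterable, List, Tuple

# map:    (key, value)          -> list of intermediate (key, value) pairs
# reduce: (key, list of values) -> list of result values (usually one, sometimes none)
MapFn = Callable[[str, str], List[Tuple[str, str]]]
ReduceFn = Callable[[str, Iterable[str]], List[str]]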

Page 14: Google  MapReduce

Example: Word Counter

map(String key, String value) {
  // key: document name, value: document contents
  for each word w in value {
    EmitIntermediate(w, "1");
  }
}

…then all intermediate pairs having the same w will be shuffled into (w, ("1", "1", …, "1"))

reduce(String key, Iterator values) {
  // key: a word, values: the list of counts emitted for that word
  int result = 0;
  for each v in values {
    result += parseInt(v);
  }
  Emit(toString(result));
}
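
A runnable Python translation of the pseudocode above, as a sketch: EmitIntermediate and Emit are replaced by returned lists, and the shuffle between the two functions is simulated in memory.

from collections import defaultdict

def word_count_map(key, value):
    # key: document name, value: document contents
    return [(word, "1") for word in value.split()]

def word_count_reduce(key, values):
    # key: a word, values: an iterable of counts emitted for that word
    return [str(sum(int(v) for v in values))]

# Tiny end-to-end run on one document.
intermediate = defaultdict(list)
for w, one in word_count_map("doc1", "to be or not to be"):
    intermediate[w].append(one)
for word, counts in intermediate.items():
    print(word, word_count_reduce(word, counts)[0])   # to 2, be 2, or 1, not 1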

Page 15: Google  MapReduce

Other examples

Counting URL access frequency
  Map function processes URL logs and outputs <URL, 1>.
  Reduce function adds together all values for the same URL and emits the count for each URL.
Reverse Web-Link Graph
  Map function outputs <target, source>.
  Reduce function outputs <target, list(source)>.
Inverted Index
  Map function processes each document into <word, documentID>.
  Reduce function sorts the documentIDs and emits <word, list(documentID)>.
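
A sketch of the inverted-index pair from this list; the deduplication of document IDs in the reduce step is a small addition beyond what the slide states.

def inverted_index_map(doc_id, contents):
    # Emit one <word, documentID> pair per word occurrence in the document.
    return [(word, doc_id) for word in contents.split()]

def inverted_index_reduce(word, doc_ids):
    # Sort (and here also deduplicate) the document IDs, emit <word, list(documentID)>.
    return [(word, sorted(set(doc_ids)))]

print(inverted_index_reduce("mapreduce", ["doc2", "doc1", "doc2"]))  # [('mapreduce', ['doc1', 'doc2'])]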

Page 16: Google  MapReduce

Using MapReduce Model

User specifies the parallelization level.
User specifies the map and reduce functions.
User specifies the partitioning function.

Page 17: Google  MapReduce

Outlines

Introduction
MapReduce Model
Implementation
Refinements
Q&A

Page 18: Google  MapReduce

Implementation

A widely used implementation at Google.

Large clusters of commodity PCs, connected by switched Ethernet.
  Machines: dual-processor x86 machines running Linux
  Memory: 2-4 GB per machine
  Networking: 100 Mb~1 Gb/second per machine; may be slower in aggregate because of limited bisection bandwidth
  Storage: inexpensive IDE disks directly attached to the machines
  File system: GFS (high availability and reliability)

Page 19: Google  MapReduce

Execution

The user program calls the MapReduce function.
The execution steps are briefly illustrated below.

Page 20: Google  MapReduce

Execution Flow

Page 21: Google  MapReduce

Execution

1. The MapReduce library in the user program splits the input data files into M splits. Then it starts up copies of the program on a cluster of machines.

M is chosen so that each split will be 16~64 MB per piece. The 16 and 64 MB bounds can be configured by the user.
  Locality optimization
One of the machines will be elected as the master, and the others as workers.

Page 22: Google  MapReduce

Execution

2. The master has M map tasks and R reduce tasks to assign. It picks idle workers and assigns each one a map task or a reduce task.

R can also be manually configured, but it is usually constrained because each reduce worker generates a separate output file.

Usually M and R are chosen to be a multiple of the number of worker machines.
  Better load balancing
  Speeds up recovery when a worker fails
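
To make the choice of M and R concrete, a small sketch with illustrative numbers (the 1 TB input size, 2,000 workers, and 64 MB split size are assumptions for the example, not figures from the slides):

import math

def choose_m(total_input_bytes, split_bytes=64 * 2**20):
    # One map task per input split of roughly 16-64 MB.
    return math.ceil(total_input_bytes / split_bytes)

workers = 2000
M = choose_m(10**12)     # ~14,902 map tasks for 1 TB of input at 64 MB per split
R = 5 * workers          # a small multiple of the worker count -> 10,000 reduce tasks
print(M, R)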

Page 23: Google  MapReduce

Execution

3. The worker that is assigned the ith map task reads the ith input split. It parses the data, feeds the key/value pairs into the map function, and buffers the output in memory.

Page 24: Google  MapReduce

Execution

4. Periodically the buffered pairs are flushed to local disk and partitioned into R groups by the partitioning function (e.g. Hash(key) mod R).

After storing them, the worker notifies the master of the locations where it stored the data.

These locations will be forwarded to the reduce workers later.
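
A minimal sketch of this partitioning step, assuming everything fits in memory and standing in for the real on-disk files (crc32 is used only because Python's built-in hash() is not stable across processes):

from collections import defaultdict
from zlib import crc32

def partition(intermediate_pairs, R):
    # Split one mapper's output into R buckets using hash(key) mod R.
    buckets = defaultdict(list)
    for key, value in intermediate_pairs:
        buckets[crc32(key.encode()) % R].append((key, value))
    # In the real system, each bucket is written to a local file whose
    # location is then reported to the master.
    return buckets

buckets = partition([("apple", "1"), ("pear", "1"), ("apple", "1")], R=4)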

Page 25: Google  MapReduce

Execution

5. The master notifies each reduce worker of the locations of its input. The ith reduce worker then reads the ith group of intermediate key/value pairs from the map workers and sorts the pairs it has read.

Sorting ensures that pairs with the same key are put together.

Sorting is needed because there may be many different keys to process in a given reduce task.

If the amount of data exceeds memory, an external sort is used.

Page 26: Google  MapReduce

Execution

6. After sorting, for each unique key encountered, the reduce worker collects all values that follow the key and feeds the key/values pair to the reduce function. The output is appended to the worker's final output file.
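
A sketch of what one reduce worker does in steps 5 and 6, under the simplifying assumption that its whole partition fits in memory (so the external sort mentioned above is skipped); run_reduce_task and output_path are illustrative names, not the library's API.

from itertools import groupby
from operator import itemgetter

def run_reduce_task(partition_pairs, reduce_fn, output_path):
    # Sort one partition by key, group values per key, and append reduce output to a file.
    partition_pairs.sort(key=itemgetter(0))     # the real system falls back to an external sort
    with open(output_path, "w") as out:
        for key, group in groupby(partition_pairs, key=itemgetter(0)):
            values = [value for _, value in group]
            for result in reduce_fn(key, values):
                out.write(f"{key}\t{result}\n")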

Page 27: Google  MapReduce

Execution

7. When all map tasks and reduce tasks have completed, the master wakes up the user program and the MapReduce call returns.

There will be exactly R output files. Usually the output file names are given by the user program, so no return value is needed.

Page 28: Google  MapReduce

Illustration

[Animated diagram of mappers and reducers: 1. Initializing, 2. Assign tasks, 3. Read inputs, 4. Store and complete, 5. Read intermediate, 6. Output, 7. End operation]

Page 29: Google  MapReduce

Master

The scheduler of the whole MapReduce process.

It pings each worker periodically.
It keeps the state of each task (either map or reduce) in memory. The state has three possible values: { idle, in-progress, completed }.
Workers that complete their task let the master know, so the master knows where to get the intermediate key/value pairs.
It keeps the locations of all intermediate pairs and propagates them to the reduce workers.
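
A minimal sketch of the bookkeeping described above; TaskState and Master are hypothetical names, and the real master also tracks worker identities, ping results, and much more.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TaskState:
    kind: str                                   # "map" or "reduce"
    status: str = "idle"                        # idle -> in-progress -> completed
    worker: Optional[str] = None
    # For completed map tasks: locations of the R intermediate files on that worker's disk.
    intermediate_locations: List[str] = field(default_factory=list)

class Master:
    def __init__(self, M, R):
        self.tasks = ([TaskState("map") for _ in range(M)] +
                      [TaskState("reduce") for _ in range(R)])

    def mark_completed(self, task_id, worker, locations=()):
        task = self.tasks[task_id]
        task.status, task.worker = "completed", worker
        task.intermediate_locations = list(locations)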

Page 30: Google  MapReduce

Fault Tolerance

What can go wrong?
  Preemption: 78.3%
  Exceeded resources: 10.8%
  Crashed: 9.2%
  Machine failure: 1.6%

Worker fails: offline, straggler
Master fails
Input and output file consistency is ensured by the file system (like GFS).

Page 31: Google  MapReduce

Worker Failure (Offline)

The master pings each worker periodically to detect failed workers.

Any in-progress task assigned to a failed worker will be reassigned.

For a failed map worker, any completed map tasks it did (their state is already "completed") are set back to the idle state and can be assigned to other workers again, because their output is lost with the failed worker. All reduce workers are notified of the re-execution.

Page 32: Google  MapReduce

Worker Failure (Straggler)

A straggler is a worker that takes an unusually long time to complete its task.
  Bad disk, resource competition, poor caching behavior…
The master schedules a backup execution of each remaining in-progress task.
  Whichever of the primary or the backup completes the task first wins, and the master simply ignores the later one.

With backup tasks disabled, a MapReduce run takes about 44% longer to accomplish the work.

Page 33: Google  MapReduce

Data Consistency

Each map task generates R temporary files, and each reduce task generates one output file.

Each reduce worker writes to a temporary file, which is then (atomically) renamed to the final output file. The atomicity is guaranteed by the underlying file system.
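
A sketch of the temporary-file-plus-atomic-rename idea using the local filesystem in place of GFS; os.replace provides the atomic rename, and commit_reduce_output is an illustrative helper, not part of the library.

import os

def commit_reduce_output(data, final_path):
    # Write to a temporary file, then atomically rename it to the final output name.
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(data)
    os.replace(tmp_path, final_path)   # atomic; a duplicate commit just overwrites the same file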

Page 34: Google  MapReduce

Master Failure

The master periodically checkpoints its data structures.

If the master dies, a new copy can be started from the last checkpoint.

This is relatively unlikely to happen (there is only a single master).

The real implementation simply aborts the MapReduce operation, which can be retried later.

Page 35: Google  MapReduce

Outlines

Introduction
MapReduce Model
Implementation
Refinements
Q&A

Page 36: Google  MapReduce

Locality

GFS generally has 3 replicas for each file chunk (on different machines).

The master will try to assign a map task to a machine that contains a replica of its input. This locality helps save network bandwidth. If that is impossible, a nearby machine can be selected.

Empirically, a large fraction of the input is read locally.

Page 37: Google  MapReduce

Partitioning Function

Can use the simple hash-modulo method.

Sometimes it is better to partition the data so that the output files are well organized. Ex: Hash(Hostname(URL key)) mod R, so that all URLs from the same host end up in the same output file.
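
A sketch of such a host-aware partitioning function, assuming string URL keys; host_partition is an illustrative name:

from urllib.parse import urlparse
from zlib import crc32

def host_partition(url_key, R):
    # Send every URL from the same host to the same reduce partition.
    host = urlparse(url_key).netloc
    return crc32(host.encode()) % R

print(host_partition("http://example.com/a", 8) == host_partition("http://example.com/b", 8))  # True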

Page 38: Google  MapReduce

Combiner

In some cases there is significant repetition in the intermediate pairs.

A combiner is an optional function applied to the map output before it becomes the intermediate key/value pairs sent over the network. Ex: thousands of <word, 1> pairs can be merged into a single <word, k>.

This saves a lot of network bandwidth between the map workers and the reduce workers.
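
A sketch of a word-count combiner that performs this merging on the map side; it works here because addition is associative and commutative.

from collections import defaultdict

def combine(map_output):
    # Merge repeated keys locally before anything crosses the network.
    counts = defaultdict(int)
    for word, one in map_output:
        counts[word] += int(one)
    return [(word, str(k)) for word, k in counts.items()]

print(combine([("the", "1")] * 3 + [("a", "1")]))   # [('the', '3'), ('a', '1')]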

Page 39: Google  MapReduce

Skipping Bad Records

Sometimes there are bugs in the user code that cause a worker to fail on some records.

When a record causes a failure, the ID (sequence number) of the argument is sent to the master.

If the master sees the same ID fail more than once, the next re-execution simply skips that record. This behavior can be manually turned off.

Page 40: Google  MapReduce

Counter

Sometimes the user needs to count the occurrences of particular events: the number of words processed, the number of Chinese documents processed…

The user can create a Counter object and increment it in the map or reduce function.

The counter values are propagated to the master with the periodic pings. All values are aggregated later. Only values from successful task executions are aggregated.
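
A minimal sketch of the counter facility described above; TaskCounters and aggregate are illustrative names, and in the real system the per-task values ride along on the ping responses.

from collections import Counter

class TaskCounters:
    # Per-task counters, incremented inside the user's map or reduce function.
    def __init__(self):
        self.values = Counter()
    def increment(self, name, amount=1):
        self.values[name] += amount

def aggregate(task_counters, succeeded):
    # Master-side aggregation; only counters from successful task executions count.
    total = Counter()
    for task_id, counters in task_counters.items():
        if task_id in succeeded:
            total.update(counters.values)
    return total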

Page 41: Google  MapReduce

Outlines

Introduction
MapReduce Model
Implementation
Refinements
Q&A

Page 42: Google  MapReduce

Q&A

Any questions?