Google MapReduce


Transcript of Google MapReduce

Page 1: Google  MapReduce

A DISTRIBUTED COMPUTING SOLUTION

Google MapReduce

Page 2: Google  MapReduce

Outlines

Introduction
MapReduce Model
Implementation
Refinements
Q&A

Page 3: Google  MapReduce

Outlines

Introduction
MapReduce Model
Implementation
Refinements
Q&A

Page 4: Google  MapReduce

Introduction

MapReduce is designed to process large amounts of distributed raw data: machine learning, clustering, queries, graph computation… The most significant use to date is indexing.

What are the issues to be concerned with?

Page 5: Google  MapReduce

Introduction

Abstraction
Reliability
  Fault tolerance
  Load balancing
Efficiency
  Parallelization
  Data distribution

Page 6: Google  MapReduce

Brief Ideas

User defines the input data as key/value pairs.

User defines a Mapper function to process key/value pairs into intermediate key/value pairs.

User defines a Reducer function to process intermediate key/value pairs and produce the results.

Page 7: Google  MapReduce

Brief Ideas

Why does it work?

An interface that abstracts the large-scale computation into two main functions.

Automatic parallelization
Robustness

Page 8: Google  MapReduce

Outlines

Introduction
MapReduce Model
Implementation
Refinements
Q&A

Page 9: Google  MapReduce

Model

Page 10: Google  MapReduce

Key/Value Pair

The input data type.
Consists of a key and a value.
Both key and value are of String type.

Page 11: Google  MapReduce

Map function

A user-specified function.
Processes some splits of input key/value pairs and produces intermediate key/value pairs.
The output should be sorted (by key) and flushed to disk. Sorting makes sure pairs with the same key are grouped together.

(key, value)[] map((key, value))

Page 12: Google  MapReduce

Intermediate Key/Value Pairs

The output of mappers.
Intermediate key/value pairs will be further shuffled (grouped) by the reducers. The partitioning is done by a user-specified partitioning function, e.g. Hash(key) mod R.
Now the pairs become "key/value[]" pairs.
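
A minimal sketch of this shuffling/grouping step in Python, assuming all intermediate pairs fit in one process's memory (the real system does this across machines and on-disk files):

from collections import defaultdict

def shuffle(intermediate_pairs):
    # Group intermediate (key, value) pairs so each key maps to a list of values.
    groups = defaultdict(list)
    for key, value in intermediate_pairs:
        groups[key].append(value)
    return groups

# Example: pairs emitted by several mappers for a word-count job.
pairs = [("the", "1"), ("quick", "1"), ("the", "1")]
print(dict(shuffle(pairs)))   # {'the': ['1', '1'], 'quick': ['1']}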

Page 13: Google  MapReduce

Reduce function

Another user-specified function.
Integrates a key and its list of values into the result output.

Usually the result contains only one value, or even none.

value[] reduce((key, value[]))
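
As an illustration of the two signatures above (just the shapes, not Google's actual C++ interface), written as Python type hints:

from typing import Callable, Iterable, List, Tuple

# map:    (key, value)          -> list of intermediate (key, value) pairs
# reduce: (key, list of values) -> list of result values (usually one, sometimes none)
MapFn = Callable[[str, str], List[Tuple[str, str]]]
ReduceFn = Callable[[str, Iterable[str]], List[str]]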

Page 14: Google  MapReduce

Example: Word Counter

map(String key, String value) {
  // key: document name, value: document contents
  for each word w in value {
    EmitIntermediate(w, "1");
  }
}

…then all intermediate pairs having the same w will be shuffled into (w, ("1", "1", …, "1"))

reduce(String key, Iterator values) {
  // key: a word, values: the list of counts emitted for that word
  int result = 0;
  for each v in values {
    result += parseInt(v);
  }
  Emit(toString(result));
}
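
A runnable Python translation of the pseudocode above, as a sketch: EmitIntermediate and Emit are replaced by returned lists, and the shuffle between the two functions is simulated in memory.

from collections import defaultdict

def word_count_map(key, value):
    # key: document name, value: document contents
    return [(word, "1") for word in value.split()]

def word_count_reduce(key, values):
    # key: a word, values: an iterable of counts emitted for that word
    return [str(sum(int(v) for v in values))]

# Tiny end-to-end run on one document.
intermediate = defaultdict(list)
for w, one in word_count_map("doc1", "to be or not to be"):
    intermediate[w].append(one)
for word, counts in intermediate.items():
    print(word, word_count_reduce(word, counts)[0])   # to 2, be 2, or 1, not 1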

Page 15: Google  MapReduce

Other examples

Counting URL access frequency
  Map function processes URL logs and outputs <URL, 1>.
  Reduce function adds together all values for the same URL and emits the count for each URL.
Reverse Web-Link Graph
  Map function outputs <target, source>.
  Reduce function outputs <target, list(source)>.
Inverted Index
  Map function processes each document into <word, documentID>.
  Reduce function sorts the documentIDs and emits <word, list(documentID)>.
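
A sketch of the inverted-index pair from this list; the deduplication of document IDs in the reduce step is a small addition beyond what the slide states.

def inverted_index_map(doc_id, contents):
    # Emit one <word, documentID> pair per word occurrence in the document.
    return [(word, doc_id) for word in contents.split()]

def inverted_index_reduce(word, doc_ids):
    # Sort (and here also deduplicate) the document IDs, emit <word, list(documentID)>.
    return [(word, sorted(set(doc_ids)))]

print(inverted_index_reduce("mapreduce", ["doc2", "doc1", "doc2"]))  # [('mapreduce', ['doc1', 'doc2'])]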

Page 16: Google  MapReduce

Using MapReduce Model

User specifies the parallelization level.
User specifies the map and reduce functions.
User specifies the partitioning function.

Page 17: Google  MapReduce

Outlines

Introduction
MapReduce Model
Implementation
Refinements
Q&A

Page 18: Google  MapReduce

Implementation

A widely used implementation at Google.

Large clusters of commodity PCs, connected by switched Ethernet.
  Machines: dual-processor x86 machines running Linux
  Memory: 2-4 GB per machine
  Networking: 100 Mb~1 Gb/second per machine; may be slower in aggregate because of limited bisection bandwidth
  Storage: inexpensive IDE disks directly attached to the machines
  File system: GFS (high availability and reliability)

Page 19: Google  MapReduce

Execution

The user program calls the MapReduce function.
The execution steps are briefly illustrated below.

Page 20: Google  MapReduce

Execution Flow

Page 21: Google  MapReduce

Execution

1. The MapReduce library in the user program splits the input data files into M splits. Then it starts up copies of the program on a cluster of machines.

M is chosen so that each split will be 16~64 MB per piece. The 16 and 64 MB bounds can be configured by the user.
  Locality optimization
One of the machines will be elected as the master, and the others as workers.

Page 22: Google  MapReduce

Execution

2. The master has M map tasks and R reduce tasks to assign. It picks idle workers and assigns each one a map task or a reduce task.

R can also be manually configured, but it is usually constrained because each reduce worker generates a separate output file.

Usually M and R are chosen to be a multiple of the number of worker machines.
  Better load balancing
  Speeds up recovery when a worker fails
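
To make the choice of M and R concrete, a small sketch with illustrative numbers (the 1 TB input size, 2,000 workers, and 64 MB split size are assumptions for the example, not figures from the slides):

import math

def choose_m(total_input_bytes, split_bytes=64 * 2**20):
    # One map task per input split of roughly 16-64 MB.
    return math.ceil(total_input_bytes / split_bytes)

workers = 2000
M = choose_m(10**12)     # ~14,902 map tasks for 1 TB of input at 64 MB per split
R = 5 * workers          # a small multiple of the worker count -> 10,000 reduce tasks
print(M, R)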

Page 23: Google  MapReduce

Execution

3. The worker that is assigned the ith map task reads the ith input split. It parses the data, feeds the key/value pairs into the map function, and buffers the output in memory.

Page 24: Google  MapReduce

Execution

4. Periodically the buffered pairs are flushed to local disk and partitioned into R groups by the partitioning function (e.g. Hash(key) mod R).

After storing them, the worker notifies the master of the locations where it stored the data.

These locations will be forwarded to the reduce workers later.
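
A minimal sketch of this partitioning step, assuming everything fits in memory and standing in for the real on-disk files (crc32 is used only because Python's built-in hash() is not stable across processes):

from collections import defaultdict
from zlib import crc32

def partition(intermediate_pairs, R):
    # Split one mapper's output into R buckets using hash(key) mod R.
    buckets = defaultdict(list)
    for key, value in intermediate_pairs:
        buckets[crc32(key.encode()) % R].append((key, value))
    # In the real system, each bucket is written to a local file whose
    # location is then reported to the master.
    return buckets

buckets = partition([("apple", "1"), ("pear", "1"), ("apple", "1")], R=4)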

Page 25: Google  MapReduce

Execution

5. The master notifies each reduce worker of the locations of its input. The ith reduce worker then reads the ith group of intermediate key/value pairs from the map workers and sorts the pairs it has read.

Sorting ensures that pairs with the same key are put together.

Sorting is needed because there may be many different keys to process in a given reduce task.

If the amount of data exceeds memory, an external sort is used.

Page 26: Google  MapReduce

Execution

6. After sorting, for each unique key encountered, the reduce worker collects all values that follow the key and feeds the key/values pair to the reduce function. The output is appended to the worker's final output file.
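
A sketch of what one reduce worker does in steps 5 and 6, under the simplifying assumption that its whole partition fits in memory (so the external sort mentioned above is skipped); run_reduce_task and output_path are illustrative names, not the library's API.

from itertools import groupby
from operator import itemgetter

def run_reduce_task(partition_pairs, reduce_fn, output_path):
    # Sort one partition by key, group values per key, and append reduce output to a file.
    partition_pairs.sort(key=itemgetter(0))     # the real system falls back to an external sort
    with open(output_path, "w") as out:
        for key, group in groupby(partition_pairs, key=itemgetter(0)):
            values = [value for _, value in group]
            for result in reduce_fn(key, values):
                out.write(f"{key}\t{result}\n")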

Page 27: Google  MapReduce

Execution

7. When all map tasks and reduce tasks have completed, the master wakes up the user program and the MapReduce call returns.

There will be exactly R output files. Usually the output file names are given by the user program, so no return value is needed.

Page 28: Google  MapReduce

Illustration

[Animated diagram of mappers and reducers: 1. Initializing, 2. Assign tasks, 3. Read inputs, 4. Store and complete, 5. Read intermediate, 6. Output, 7. End operation]

Page 29: Google  MapReduce

Master

The scheduler of the whole MapReduce process.

It pings each worker periodically.
It keeps the state of each task (either map or reduce) in memory. The state has three possible values: { idle, in-progress, completed }.
Workers that complete their task let the master know, so the master knows where to get the intermediate key/value pairs.
It keeps the locations of all intermediate pairs and propagates them to the reduce workers.
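
A minimal sketch of the bookkeeping described above; TaskState and Master are hypothetical names, and the real master also tracks worker identities, ping results, and much more.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TaskState:
    kind: str                                   # "map" or "reduce"
    status: str = "idle"                        # idle -> in-progress -> completed
    worker: Optional[str] = None
    # For completed map tasks: locations of the R intermediate files on that worker's disk.
    intermediate_locations: List[str] = field(default_factory=list)

class Master:
    def __init__(self, M, R):
        self.tasks = ([TaskState("map") for _ in range(M)] +
                      [TaskState("reduce") for _ in range(R)])

    def mark_completed(self, task_id, worker, locations=()):
        task = self.tasks[task_id]
        task.status, task.worker = "completed", worker
        task.intermediate_locations = list(locations)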

Page 30: Google  MapReduce

Fault Tolerance

What can go wrong?
  Preemption: 78.3%
  Exceeded resources: 10.8%
  Crashed: 9.2%
  Machine failure: 1.6%

Worker fails: offline, straggler
Master fails
Input and output file consistency is ensured by the file system (like GFS).

Page 31: Google  MapReduce

Worker Failure (Offline)

The master pings each worker periodically to detect failed workers.

Any in-progress task assigned to a failed worker will be reassigned.

For a failed map worker, any completed map tasks it did (their state is already "completed") are set back to the idle state and can be assigned to other workers again, because their output is lost with the failed worker. All reduce workers are notified of the re-execution.

Page 32: Google  MapReduce

Worker Failure (Straggler)

A straggler is a worker that takes an unusually long time to complete its task.
  Bad disk, resource competition, poor caching behavior…
The master schedules a backup execution of each remaining in-progress task.
  Whichever of the primary or the backup completes the task first wins, and the master simply ignores the later one.

With backup tasks disabled, a MapReduce run takes about 44% longer to accomplish the work.

Page 33: Google  MapReduce

Data Consistency

Each map task generates R temporary files, and each reduce task generates one output file.

Each reduce worker writes to a temporary file, which is then (atomically) renamed to the final output file. The atomicity is guaranteed by the underlying file system.
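
A sketch of the temporary-file-plus-atomic-rename idea using the local filesystem in place of GFS; os.replace provides the atomic rename, and commit_reduce_output is an illustrative helper, not part of the library.

import os

def commit_reduce_output(data, final_path):
    # Write to a temporary file, then atomically rename it to the final output name.
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(data)
    os.replace(tmp_path, final_path)   # atomic; a duplicate commit just overwrites the same file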

Page 34: Google  MapReduce

Master Failure

The master periodically checkpoints its data structures.

If the master dies, a new copy can be started from the last checkpoint.

This is relatively unlikely to happen (there is only a single master).

The real implementation simply aborts the MapReduce operation, which can be retried later.

Page 35: Google  MapReduce

Outlines

Introduction
MapReduce Model
Implementation
Refinements
Q&A

Page 36: Google  MapReduce

Locality

GFS generally has 3 replicas for each file chunk (on different machines).

The master will try to assign a map task to a machine that contains a replica of its input. This locality helps save network bandwidth. If that is impossible, a nearby machine can be selected.

Empirically, a large fraction of the input is read locally.

Page 37: Google  MapReduce

Partitioning Function

Can use the simple hash-modulo method.

Sometimes it is better to partition the data so that the output files are well organized. Ex: Hash(Hostname(URL key)) mod R, so that all URLs from the same host end up in the same output file.
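
A sketch of such a host-aware partitioning function, assuming string URL keys; host_partition is an illustrative name:

from urllib.parse import urlparse
from zlib import crc32

def host_partition(url_key, R):
    # Send every URL from the same host to the same reduce partition.
    host = urlparse(url_key).netloc
    return crc32(host.encode()) % R

print(host_partition("http://example.com/a", 8) == host_partition("http://example.com/b", 8))  # True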

Page 38: Google  MapReduce

Combiner

In some cases there is significant repetition in the intermediate pairs.

A combiner is an optional function applied to the map output before it becomes the intermediate key/value pairs sent over the network. Ex: thousands of <word, 1> pairs can be merged into a single <word, k>.

This saves a lot of network bandwidth between the map workers and the reduce workers.
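
A sketch of a word-count combiner that performs this merging on the map side; it works here because addition is associative and commutative.

from collections import defaultdict

def combine(map_output):
    # Merge repeated keys locally before anything crosses the network.
    counts = defaultdict(int)
    for word, one in map_output:
        counts[word] += int(one)
    return [(word, str(k)) for word, k in counts.items()]

print(combine([("the", "1")] * 3 + [("a", "1")]))   # [('the', '3'), ('a', '1')]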

Page 39: Google  MapReduce

Skipping Bad Records

Sometimes there are bugs in the user code that cause a worker to fail on some records.

When a record causes a failure, the ID (sequence number) of the argument is sent to the master.

If the master sees the same ID fail more than once, the next re-execution simply skips that record. This behavior can be manually turned off.

Page 40: Google  MapReduce

Counter

Sometimes the user needs to count the occurrences of particular events: the number of words processed, the number of Chinese documents processed…

The user can create a Counter object and increment it in the map or reduce function.

The counter values are propagated to the master with the periodic pings. All values are aggregated later. Only values from successful task executions are aggregated.
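
A minimal sketch of the counter facility described above; TaskCounters and aggregate are illustrative names, and in the real system the per-task values ride along on the ping responses.

from collections import Counter

class TaskCounters:
    # Per-task counters, incremented inside the user's map or reduce function.
    def __init__(self):
        self.values = Counter()
    def increment(self, name, amount=1):
        self.values[name] += amount

def aggregate(task_counters, succeeded):
    # Master-side aggregation; only counters from successful task executions count.
    total = Counter()
    for task_id, counters in task_counters.items():
        if task_id in succeeded:
            total.update(counters.values)
    return total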

Page 41: Google  MapReduce

Outlines

Introduction
MapReduce Model
Implementation
Refinements
Q&A

Page 42: Google  MapReduce

Q&A

Any questions?