
Transcript of 13-the mapreduce programming model and implementations slides.pdf

  • 5: MapReduce Theory and Implementation

    Zubair Nabi

    [email protected]

    April 18, 2013


  • Outline

    1 Introduction

    2 Programming Model

    3 Implementation

    4 Refinements

    5 Hadoop



  • Common computations at Google

    Process large amounts of data generated from crawled documents, web request logs, etc.

    Compute the inverted index, the graph structure of web documents, summaries of pages crawled per host, etc.

    Common properties:

    1 Computation is conceptually simple and is distributed across hundreds or thousands of machines to leverage parallelism

    2 Input data is large

    3 The original simple computation is made complex by system-level code to deal with issues of work assignment and distribution, and fault-tolerance


  • Enter MapReduce

    Based on the insights on the previous slide, two Google engineers, Jeff Dean and Sanjay Ghemawat, designed MapReduce in 2004

      - An abstraction that helps the programmer express simple computations
      - Hides the gory details of parallelization, fault-tolerance, data distribution, and load balancing
      - Relies on user-provided map and reduce primitives present in functional languages

    Leverages one key insight: most of the computation at Google involved applying a map operator to each logical record in the input dataset to obtain a set of intermediate key/value pairs, and then applying a reduce operation to all values with the same key, for aggregation



  • Programming Model

    Input: a set of key/value pairs

    Output: a set of key/value pairs

    The user provides the entire computation in the form of two functions: map and reduce


  • User-defined functions

    1 Map
      - Takes an input pair and produces a set of intermediate key/value pairs
      - The framework groups together the intermediate values by key for consumption by the reduce

    2 Reduce
      - Takes as input a key and a list of associated values
      - In the common case, it merges these values to produce a smaller set of values


  • Example: Word Count

    Counting the number of occurrences of each word in a large collection of documents

    1 Map
      - Emits each word and the value 1

    2 Reduce
      - Sums together all counts emitted for a particular word


  • Example: Word Count (2)

    map(String key, String value):
      // key: document name
      // value: document contents
      for each word w in value:
        EmitIntermediate(w, "1");

    reduce(String key, Iterator values):
      // key: a word
      // values: a list of counts
      int result = 0;
      for each v in values:
        result += ParseInt(v);
      Emit(AsString(result));
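    The pseudocode above is from the original paper. As a companion, here is a minimal, runnable Python sketch that simulates the same map, shuffle, and reduce phases in a single process; the function names and the driver are illustrative, not part of any MapReduce API:

      from collections import defaultdict

      def map_fn(key, value):
          # key: document name, value: document contents
          for word in value.split():
              yield word, 1

      def reduce_fn(key, values):
          # key: a word, values: a list of counts
          yield sum(values)

      def run_mapreduce(inputs, map_fn, reduce_fn):
          groups = defaultdict(list)            # the "shuffle": group values by key
          for k, v in inputs:
              for ik, iv in map_fn(k, v):
                  groups[ik].append(iv)
          return {k: list(reduce_fn(k, vs)) for k, vs in sorted(groups.items())}

      docs = [("doc1", "the quick brown fox"), ("doc2", "the lazy dog and the fox")]
      print(run_mapreduce(docs, map_fn, reduce_fn))
      # {'and': [1], 'brown': [1], 'dog': [1], 'fox': [2], 'lazy': [1], 'quick': [1], 'the': [3]}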

  • Types

    User-supplied map and reduce functions have associated types

    1 Map
      - map(k1, v1) → list(k2, v2)

    2 Reduce
      - reduce(k2, list(v2)) → list(v2)

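    The same signatures in Python type-hint form (a sketch; the alias names are illustrative):

      from typing import Callable, TypeVar

      K1 = TypeVar("K1"); V1 = TypeVar("V1")    # input key/value types
      K2 = TypeVar("K2"); V2 = TypeVar("V2")    # intermediate key/value types

      # map(k1, v1) → list(k2, v2)
      Mapper = Callable[[K1, V1], list[tuple[K2, V2]]]

      # reduce(k2, list(v2)) → list(v2)
      Reducer = Callable[[K2, list[V2]], list[V2]]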

  • More applications

    Distributed Grep

    1 Map
      - Emits a line if it matches a user-provided pattern

    2 Reduce
      - Identity function

    Count of URL Access Frequency

    1 Map
      - Similar to the Word Count map; instead of words we have URLs

    2 Reduce
      - Similar to the Word Count reduce

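    Distributed grep in the same Python sketch style as the word count above (the pattern is a stand-in for whatever the user supplies):

      import re

      PATTERN = re.compile(r"ERROR")    # illustrative user-provided pattern

      def map_fn(offset, line):
          # Emit the line itself if it matches the pattern
          if PATTERN.search(line):
              yield line, ""

      def reduce_fn(line, values):
          # Identity: pass each matching line through to the output
          yield line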

  • More applications (2)

    Inverted Index

    1 Map
      - Emits a sequence of <word, document_ID>

    2 Reduce
      - Emits <word, list(document_ID)>

    Distributed Sort

    1 Map
      - Identity

    2 Reduce
      - Identity

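    The inverted index in the same sketch style:

      def map_fn(doc_id, contents):
          # Emit <word, document_ID> once per distinct word in the document
          for word in set(contents.split()):
              yield word, doc_id

      def reduce_fn(word, doc_ids):
          # Emit <word, list(document_ID)>
          yield sorted(doc_ids)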


  • Cluster architecture

    A large cluster of shared-nothing commodity machines connected via Ethernet

    Each node is an x86 system running Linux with local memory

    Commodity networking hardware connected in a tree topology

    As clusters consist of hundreds or thousands of machines, failure is pretty common

    Each machine has local hard drives
      - The Google File System (GFS) runs atop these disks and employs replication to ensure availability and reliability

    Jobs are submitted to a scheduler, which maps the tasks within each job to available machines in the cluster


  • MapReduce architecture

    1 Master: in charge of all metadata, work scheduling and distribution, and job orchestration

    2 Workers: contain slots to execute map or reduce functions


  • Execution

    1 The user writes map and reduce functions and stitches together a MapReduce specification with the location of the input dataset, the number of reduce tasks, and other attributes

    2 The master logically splits the input dataset into M splits, where M = (input dataset size) / (GFS block size); a quick arithmetic sketch follows below
      - The GFS block size is typically a multiple of 64 MB

    3 It then earmarks M map tasks and assigns them to workers. Each worker has a configurable number of task slots. Each time a worker completes a task, the master assigns it more pending map tasks

    4 Once all map tasks have completed, the master assigns R reduce tasks to worker nodes

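    The arithmetic behind step 2, with illustrative sizes:

      GFS_BLOCK_SIZE = 64 * 2**20         # 64 MB per split
      INPUT_DATASET_SIZE = 1 * 2**40      # a hypothetical 1 TB input

      M = INPUT_DATASET_SIZE // GFS_BLOCK_SIZE    # number of map tasks
      print(M)                                    # 16384
      R = 500    # number of reduce tasks, chosen by the user and typically much smaller than M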

  • Mappers

    1 A map worker reads the contents of the input split that it has been assigned

    2 It parses the file, converts it to key/value pairs, and invokes the user-defined map function for each pair

    3 The intermediate key/value pairs produced by the map logic are collected (buffered) in memory

    4 Once the buffered key/value pairs exceed a threshold, they are partitioned (using a partitioning function) into R partitions and written to local disk; the location of each partition is passed to the master (see the sketch after this list)

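    A sketch of the buffer-and-spill behaviour in step 4, assuming pickle files on local disk stand in for the real intermediate file format:

      import pickle
      from collections import defaultdict
      from zlib import crc32

      R = 4                       # number of reduce tasks / partitions
      BUFFER_THRESHOLD = 10_000   # illustrative in-memory buffer limit

      buffer, spill_count = [], 0

      def emit_intermediate(key, value):
          buffer.append((key, value))
          if len(buffer) >= BUFFER_THRESHOLD:
              spill()

      def spill():
          # Partition the buffered pairs into R partitions and write them to local
          # disk; the file locations would then be reported back to the master.
          # crc32 serves as a deterministic hash so every map worker partitions alike.
          global spill_count
          partitions = defaultdict(list)
          for k, v in buffer:
              partitions[crc32(str(k).encode()) % R].append((k, v))
          for r, pairs in partitions.items():
              with open(f"spill{spill_count}_part{r}.bin", "wb") as f:
                  pickle.dump(pairs, f)
          buffer.clear()
          spill_count += 1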

  • Reducers

    1 A reduce worker gets the locations of its input partitions from the master and uses HTTP requests to retrieve them

    2 Once it has read all of its input, it sorts it by key to group together all occurrences of the same key (sketched below)

    3 It then invokes the user-defined reduce for each key, passing it the key and its associated values

    4 The key/value pairs generated by the reduce logic are written to a final output file, which is subsequently stored on the distributed filesystem

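    The sort-then-group logic of steps 2 and 3 in sketch form:

      from itertools import groupby

      def run_reduce(pairs, reduce_fn):
          # Sort all fetched intermediate pairs by key, then hand each key and
          # its list of values to the user-defined reduce
          pairs.sort(key=lambda kv: kv[0])
          output = []
          for key, group in groupby(pairs, key=lambda kv: kv[0]):
              values = [v for _, v in group]
              output.extend((key, out) for out in reduce_fn(key, values))
          return output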

  • Book-keeping by the Master

    The master maintains metadata for all jobs running in the cluster

    For each map and reduce task, it stores the state (pending, in-progress, or completed) and, while the task is in progress, the ID of the worker on which it is executing

    It stores the locations and sizes of the partitions produced by each map task

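    One plausible shape for the per-task metadata (the field names are illustrative, not from the paper):

      from dataclasses import dataclass, field

      @dataclass
      class TaskInfo:
          kind: str                         # "map" or "reduce"
          state: str = "pending"            # pending | in-progress | completed
          worker_id: str | None = None      # set while the task is in progress
          # For completed map tasks: (location, size) of each of the R partitions
          partitions: list[tuple[str, int]] = field(default_factory=list)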

  • Fault-tolerance

    For large compute clusters, failures are the norm rather than the exception

    1 Worker:
      - Each worker sends a periodic heartbeat signal to the master
      - If the master does not receive a heartbeat from a worker within a certain amount of time, it marks the worker as failed
      - In-progress map and reduce tasks are simply re-executed on other nodes; the same goes for completed map tasks, as their output is lost on machine failure
      - Completed reduce tasks are not re-executed, as their output resides on the distributed filesystem

    2 Master:
      - The entire computation is marked as failed
      - But it is simple to keep the master's state soft and re-spawn it

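    A toy sketch of the master's heartbeat-based failure handling (the timeout value and data layout are illustrative):

      import time

      HEARTBEAT_TIMEOUT = 30.0    # seconds of silence before a worker is marked failed

      last_heartbeat = {}         # worker_id -> time of last heartbeat
      pending_tasks = []          # tasks waiting to be (re-)assigned

      def on_heartbeat(worker_id):
          last_heartbeat[worker_id] = time.monotonic()

      def check_workers(assigned):
          # assigned: worker_id -> list of (kind, state, task_id) on that worker
          now = time.monotonic()
          for worker_id, last in list(last_heartbeat.items()):
              if now - last > HEARTBEAT_TIMEOUT:
                  del last_heartbeat[worker_id]
                  for kind, state, task_id in assigned.pop(worker_id, []):
                      # Redo in-progress tasks, and completed map tasks whose output
                      # lived on the failed worker's local disk; completed reduce
                      # output is already safe on the distributed filesystem
                      if state == "in-progress" or (state == "completed" and kind == "map"):
                          pending_tasks.append(task_id)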

  • Locality

    Network bandwidth is a scarce resource in typical clusters

    GFS slices files into 64 MB blocks and stores three replicas of each block across the cluster

    The master exploits this information by scheduling a map task near its input data. Preference is in the order: node-local, rack/switch-local, then any

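    That preference order as a small scheduling sketch (the replica list and rack map are illustrative inputs):

      def pick_worker(replica_nodes, idle_workers, rack_of):
          # Prefer a worker holding a replica of the split (node-local), then a
          # worker in the same rack as a replica (rack-local), then any idle worker
          replicas = set(replica_nodes)
          for w in idle_workers:
              if w in replicas:
                  return w
          replica_racks = {rack_of[n] for n in replicas}
          for w in idle_workers:
              if rack_of[w] in replica_racks:
                  return w
          return next(iter(idle_workers), None)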

  • Speculative re-execution

    Every now and then the entire computation is held up by a straggler task

    Stragglers can arise for a number of reasons, such as machine load, network traffic, and software/hardware bugs

    To deal with stragglers, the master speculatively re-executes slow tasks on other machines

    The task is marked as completed whenever either the primary or the backup finishes its execution

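    The paper's policy is simply to schedule backup executions of whatever tasks remain in progress once the job nears completion; a toy heuristic for picking candidates could look like this (the 20% cut-off is illustrative):

      def pick_stragglers(tasks, frac=0.2):
          # Treat the slowest-progressing in-progress tasks as stragglers;
          # each task is a dict like {"state": ..., "progress": 0.0-1.0}
          in_progress = [t for t in tasks if t["state"] == "in-progress"]
          in_progress.sort(key=lambda t: t["progress"])
          return in_progress[: max(1, int(frac * len(in_progress)))]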

  • Scalability

    Possible to run at multiple scales: from single nodes to data centers with tens of thousands of nodes

    Nodes can be added/removed on the fly to scale up/down



  • Partitioning

    By default, MapReduce uses hash partitioning to partition the key space
      - hash(key) % R

    Optionally, the user can provide a custom partitioning function to, say, mitigate skew or to ensure that certain keys always end up at a particular reduce worker (see the sketch below)

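    Both variants in sketch form; the host-based partitioner mirrors the paper's example of sending all URLs from one host to the same reduce worker (crc32 stands in as a deterministic hash):

      from urllib.parse import urlparse
      from zlib import crc32

      R = 8   # number of reduce tasks

      def default_partition(key):
          # Default: hash partitioning over the key space, hash(key) % R
          return crc32(str(key).encode()) % R

      def host_partition(url_key):
          # Custom: all URLs from the same host end up at the same reducer
          return crc32(urlparse(url_key).netloc.encode()) % R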

  • Combiner function

    For reduce functions that are commutative and associative, the user can additionally provide a combiner function, which is applied to the output of the map for local merging

    Typically, the same reduce function is used as the combiner

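    Word count's reduce is commutative and associative, so it can double as the combiner; a sketch of the local pre-aggregation on a map worker:

      from collections import defaultdict

      def reduce_fn(word, counts):
          yield sum(counts)

      combiner = reduce_fn    # reuse the reduce as the combiner

      def combine_locally(buffered_pairs):
          groups = defaultdict(list)
          for k, v in buffered_pairs:
              groups[k].append(v)
          return [(k, out) for k, vs in groups.items() for out in combiner(k, vs)]

      print(combine_locally([("the", 1), ("fox", 1), ("the", 1)]))
      # [('the', 2), ('fox', 1)] -- fewer pairs cross the network to the reducers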

  • Input/output formats

    By default, the library supports a number of input/output formats
      - For instance, text as input and key/value pairs as output

    Optionally, the user can specify custom input readers and output writers
      - For instance, to read from or write to a database



  • Hadoop

    Open-source implementation of MapReduce, developed by Doug Cutting, originally at Yahoo!, in 2004

    Now a top-level Apache open-source project

    Implemented in Java (Google's in-house implementation is in C++)

    Comes with an associated distributed filesystem, HDFS (a clone of GFS)

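    Hadoop jobs are usually written in Java, but Hadoop Streaming lets any executable that reads stdin and writes stdout act as the mapper or reducer. A word-count sketch in that style (the file names are illustrative, and the streaming jar path varies by installation):

      # mapper.py -- emit one "word<TAB>1" line per word read from stdin
      import sys

      for line in sys.stdin:
          for word in line.split():
              print(f"{word}\t1")

      # reducer.py -- streaming input arrives sorted by key, so consecutive
      # lines share a word and can be grouped directly
      import sys
      from itertools import groupby

      pairs = (line.rstrip("\n").split("\t", 1) for line in sys.stdin)
      for word, group in groupby(pairs, key=lambda kv: kv[0]):
          print(f"{word}\t{sum(int(c) for _, c in group)}")

    Submitted with something like: hadoop jar hadoop-streaming.jar -input in/ -output out/ -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py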

  • References

    Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: Simplified data processing on large clusters. In Proceedings of the 6th Symposium on Operating Systems Design & Implementation (OSDI '04), Vol. 6. USENIX Association, Berkeley, CA, USA.
