Principles of Data Management Lecture #16 (MapReduce & DFS ... · the Google ODSI 2004 talk where...

14
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Principles of Data Management Lecture #16 (MapReduce & DFS for Big Data) Instructor: Mike Carey [email protected] Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2 Today’s News Bulletin Project dates Query execution layer is due on 3/17 Upcoming lectures Today: MapReduce (and distributed file systems) Next week: Wrap-up & review, in-class endterm Other upcoming events The long-lost midterms will appear on Tuesday! Class participation opportunities Teaching evaluations (after Tuesday ) End-of-term opinion survey (watch for it!)

Transcript of Principles of Data Management Lecture #16 (MapReduce & DFS ... · the Google ODSI 2004 talk where...

Page 1: Principles of Data Management Lecture #16 (MapReduce & DFS ... · the Google ODSI 2004 talk where MapReduce was publically born and/or Google’s SOSP 2003 talk on the underlying

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Principles of Data Management

Lecture #16 (MapReduce & DFS for Big Data)

Instructor: Mike Carey [email protected]

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2

Today’s News Bulletin

v  Project dates §  Query execution layer is due on 3/17

v  Upcoming lectures §  Today: MapReduce (and distributed file systems) §  Next week: Wrap-up & review, in-class endterm

v  Other upcoming events §  The long-lost midterms will appear on Tuesday!

v  Class participation opportunities §  Teaching evaluations (after Tuesday J) §  End-of-term opinion survey (watch for it!)

Page 2: Principles of Data Management Lecture #16 (MapReduce & DFS ... · the Google ODSI 2004 talk where MapReduce was publically born and/or Google’s SOSP 2003 talk on the underlying

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3

Motivation

v  Google needed to process web-scale data §  Data much larger than what fits on one machine §  Needed parallel processing to get results in a

reasonable time §  Wanted to use cheap commodity machines to do

the job v  Credits: Some of the following slide content is excerpted from

the Google ODSI 2004 talk where MapReduce was publically born and/or Google’s SOSP 2003 talk on the underlying DFS.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 4

Requirements

v  Solution must §  Scale to 1000s of compute nodes §  Must automatically handle faults §  Provide monitoring of jobs §  Be easy for programmers to use

Page 3: Principles of Data Management Lecture #16 (MapReduce & DFS ... · the Google ODSI 2004 talk where MapReduce was publically born and/or Google’s SOSP 2003 talk on the underlying

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 5

MapReduce Programming model

v  Input and Output are sets of key/value pairs v  Programmer provides two functions

§  map(K1, V1) -> list(K2, V2) • Produces list of intermediate key/value pairs for each

input key/value pair

§  reduce(K2, list(V2)) -> list(K3, V3) • Produces a list of result values for all intermediate values

that are associated with the same intermediate key

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 6

MapReduce Pipeline

Map Shuffle Reduce

Read from DFS Write to DFS

Page 4: Principles of Data Management Lecture #16 (MapReduce & DFS ... · the Google ODSI 2004 talk where MapReduce was publically born and/or Google’s SOSP 2003 talk on the underlying

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 7

MapReduce in Action

Map (k1, v1) à list(k2, v2) •  Processes one input key/value pair •  Produces a set of intermediate key/value pairs

Reduce (k2, list(v2)) list(k3, v3) •  Combines intermediate values for one particular key •  Produces a set of merged output values (usually one)

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 8

MapReduce Architecture

MapReduce MapReduce MapReduce MapReduce

Distributed File System

Network

MapReduce Job Tracker

Page 5: Principles of Data Management Lecture #16 (MapReduce & DFS ... · the Google ODSI 2004 talk where MapReduce was publically born and/or Google’s SOSP 2003 talk on the underlying

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 9

Software Components

v  Job Tracker (Master) §  Maintains Cluster membership of workers §  Accepts MR jobs from clients and dispatches tasks

to workers §  Monitors workers’ progress §  Restarts tasks in the event of failure

v  Task Tracker (Worker) §  Provides an environment to run a task §  Maintains and serves intermediate files between

Map and Reduce phases

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 10

MapReduce Parallelism

 Hash  Partitioning  

Page 6: Principles of Data Management Lecture #16 (MapReduce & DFS ... · the Google ODSI 2004 talk where MapReduce was publically born and/or Google’s SOSP 2003 talk on the underlying

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 11

Example 1: Count Word Occurrences

v  Input: Set of (Document name, Document Contents)

v  Output: Set of (Word, Count(Word)) v  map(k1, v1):

for each word w in v1 emit(w, 1)

v  reduce(k2, v2_list): int result = 0; for each v in v2_list

result += v; emit(k2, result)

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 12

Map

Example 1: Count Word Occurrences

Map

Reduce

Reduce

this is a line

this is another line

another line

yet another line

this, 1 is, 1 a, 1 line, 1 this, 1 is, 1 another, 1 line, 1

another, 1 line, 1

yet, 1 another, 1 line, 1

a, 1 another, 1 is, 1 is, 1

line, 1 line, 1 this, 1 this, 1

another, 1 another, 1

line, 1 line, 1 yet, 1

a, 1 another, 3 is, 2

line, 4 this, 2 yet, 1

Page 7: Principles of Data Management Lecture #16 (MapReduce & DFS ... · the Google ODSI 2004 talk where MapReduce was publically born and/or Google’s SOSP 2003 talk on the underlying

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 13

(Picture borrowed from Shiv Babu @ Duke University)

MapReduce Pipeline Revisited

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 14

Example 2: Equijoins

v  Input: Rows of Relation R, Rows of Relation S v  Output: R join S on R.x = S.y v  map(k1, v1)

if (input == R) emit(v1.x, [“R”, v1])

else emit(v1.y, [“S”, v2])

v  reduce(k2, v2_list) for r in v2_list where r[1] == “R”

for s in v2_list where s[1] == “S” emit(1, result(r[2], s[2]))

Page 8: Principles of Data Management Lecture #16 (MapReduce & DFS ... · the Google ODSI 2004 talk where MapReduce was publically born and/or Google’s SOSP 2003 talk on the underlying

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 15

Other Examples

v  Distributed grep v  Inverted index construction v  Machine learning v  Distributed sort v  Fuzzy join v  … v  Or: A Pig script or a Hive query (which are

then auto-converted to a Hadoop MapReduce job series under the covers) – e.g., at Netflix

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 16

Fault Tolerant Evaluation

v  Task Fault Tolerance is achieved through re-execution ( roll forward, not back!)

v  All consumers consume data only after completely generated by the producer §  This is an important property to isolate faults to

one task

v  Task completion committed through Master v  Cannot handle master failure

Page 9: Principles of Data Management Lecture #16 (MapReduce & DFS ... · the Google ODSI 2004 talk where MapReduce was publically born and/or Google’s SOSP 2003 talk on the underlying

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 17

Task granularity and pipelining

v  Fine granularity tasks §  Many more map tasks than machines

• Minimizes time for fault recovery • Pipelines shuffling with map execution • Better load balancing

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 18

Optimization: Combiners

v  Sometimes partial aggregation is possible on the Map side

v  May cut down the amount of data needing to be transferred to the reducer (significantly in some cases, like grouped aggregation in Hive)

v  combine(K2, list(V2)) -> K2, list(V2) v  For Word Occurrence Count example,

Combine == Reduce (Q: Why?)

Page 10: Principles of Data Management Lecture #16 (MapReduce & DFS ... · the Google ODSI 2004 talk where MapReduce was publically born and/or Google’s SOSP 2003 talk on the underlying

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 19

Map

Example 1: Word Count Revisited (With Combiners)

Map

Reduce

Reduce

this is a line

this is another line

another line

yet another line

this, 1 is, 1 a, 1 line, 1 this, 1 is, 1 another, 1 line, 1

another, 1 line, 1

yet, 1 another, 1 line, 1

a, 1 another, 1 is, 2

line, 2 this, 2

another, 2

line, 2 yet, 1

a, 1 another, 3 is, 2

line, 4 this, 2 yet, 1

ß Inte

rmedi

ate  

Result

s  

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 20

Optimization: Redundant Execution

v  Slow workers lengthen completion time v  Slowness happens because of

§  Other jobs consuming resources §  Bad disks/network etc

v  Solution: Near the end of the job spawn extra copies of long running tasks §  Whichever copy finishes first, wins. §  Kill the rest

v  In Hadoop this is called “speculative execution”

Page 11: Principles of Data Management Lecture #16 (MapReduce & DFS ... · the Google ODSI 2004 talk where MapReduce was publically born and/or Google’s SOSP 2003 talk on the underlying

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 21

Optimization: Locality

v  Task scheduling policy §  Ask DFS (next topic!) for locations of replicas of

input file blocks §  Map tasks scheduled so that input blocks are

machine local or rack local

v  Effect: Tasks read data at local disk speeds v  Without this, rack switches limit data rate

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 22

Distributed (Big!) Filesystem

v  Used as the “store” for MapReduce data v  MapReduce reads its input from DFS and

writes its output to DFS v  Provides a “shared disk” view to applications

using local storage on shared-nothing hardware v  Provides redundancy by replication to protect

from node/disk failures

Page 12: Principles of Data Management Lecture #16 (MapReduce & DFS ... · the Google ODSI 2004 talk where MapReduce was publically born and/or Google’s SOSP 2003 talk on the underlying

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 23

DFS Architecture

Taken from Ghemawat’s SOSP’03 paper (The Google Filesystem)

•  Single Master (with backups) that track DFS file name to chunk mapping •  Several Chunk servers that store chunks on local disks •  Chunk Size ~ 64MB or larger •  Chunks are replicated •  Master only used for chunk lookups – Does not participate in transfer of data

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 24

Chunk Replication

v  Several Replicas of each Chunk §  Replicas usually spread across racks and data centers

to maximize availability §  3 replicas common (local, same rack, different rack)

v  Master tracks location of each replica of a chunk v  When chunk failure is detected, master

automatically rebuilds new replica to maintain replication level

v  Automatically picks chunk servers for new replicas based on utilization

Page 13: Principles of Data Management Lecture #16 (MapReduce & DFS ... · the Google ODSI 2004 talk where MapReduce was publically born and/or Google’s SOSP 2003 talk on the underlying

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 25

MapReduce & DFS: Summary

v  Google laid a foundation for a new flurry of large-scale data storage and processing with their MR and DFS work in the early 2000’s

v  Apache open source versions soon sprung up outside of Google: Hadoop MapReduce & HDFS

v  Today, Big Data use cases are addressed with a mix of parallel RDBMS technologies as well as more “flexible” Hadoop-based technologies

v  So where are we now…?

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 26

(Pig)

Today’s Tangled World

Page 14: Principles of Data Management Lecture #16 (MapReduce & DFS ... · the Google ODSI 2004 talk where MapReduce was publically born and/or Google’s SOSP 2003 talk on the underlying

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 27

(Pig)

Today’s Tangled World

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 28

Additional Reading v  Original MapReduce Paper (*** MUST READ! ***)

§  “Simplified Data Processing on Large Clusters” by Jeffrey Dean and Sanjay Ghemawat in OSDI ’04

v  Original DFS Paper §  “The Google Filesystem” by Sanjay Ghemawat, Howard Gobioff, and

Shun-Tak Leung in SOSP ’03

v  MapReduce vs. Parallel DBMS Papers in CACM (Jan. 2010) §  “MapReduce and Parallel DBMSs: Friends or Foes?” by Michael

Stonebraker, Daniel Abadi, David DeWitt, Sam Madden, Erik Paulson, Andrew Pavlo, and Alexander Rasin

§  “MapReduce: A Flexible Data Processing Tool” by Jeffrey Dean and Sanjay Ghemawat

v  EDBT “Ogres & Onions Keynote” Paper §  “Inside "Big Data Management": Ogres, Onions, or Parfaits?” by Vinayak

Borkar, Michael J. Carey, and Chen Li in EDBT '12 (or watch the movie )