Final Project: Non-Work-Conserving Effects in MapReduce

Non-Work-Conserving Effects in MapReduce: Diffusion Approximation

Ping-Chun Hsieh, 2015.04.27

How to Handle Large Data?

https://www.gtisoft.com/img/DistributedComputing.jpg

One solution is “distributed computing”

MapReduce is one implementation.

What is MapReduce?

• “Job”: a document with several words

1. “Map Task”: word → (word, count) pair

2. “Copy/Shuffle”: distribute the pairs to the Reduce machines

3. “Reduce Task”: count the word frequency, using the word as a key

Example: Finding word frequency

Input: He is in the elite of the elite right now

Map output: (He,1), (is,1), …, (now,1)

Reduce output: …, (the,2), (elite,2), (now,1), …
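The three phases of the word-frequency example can be sketched on a single machine as follows. This is a minimal illustration only: the function names `map_task`, `shuffle`, `reduce_task`, and `word_frequency` are hypothetical, and a real MapReduce system would distribute each phase across machines.

```python
from collections import defaultdict

def map_task(document):
    """Map: emit one (word, 1) pair per word in the document."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Copy/Shuffle: group the emitted counts by word (the key)."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups

def reduce_task(groups):
    """Reduce: sum the counts collected for each key."""
    return {word: sum(counts) for word, counts in groups.items()}

def word_frequency(document):
    """Run the three phases in sequence on one document (one job)."""
    return reduce_task(shuffle(map_task(document)))

counts = word_frequency("He is in the elite of the elite right now")
print(counts["the"], counts["elite"], counts["now"])  # prints: 2 2 1
```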

Main Issue & Outline

Main Issue: Analyze the Map & Reduce queues and design a good scheduling policy [1]

Outline:

• Model of MapReduce

• Load conditions and 3 Tie-Breaking Policies

• Diffusion Approximation

• Finding a Lower Bound

• Sketch of the proof

Model of MapReduce

• Each job has multiple Map tasks and Reduce tasks.

• Map tasks are smaller than Reduce tasks.

• Job i has workload Bi, distributed as B with mean E[B] and variance Var[B].

[Figure: Jobs arrive over time as ~Poi(λ), each with workload Bi distributed as B. The Map queue is an M/G/1 processor-sharing queue. Intermediate data flows from the Map queue to the Reduce queue, labeled “K-server G/G/1 queue”, whose servers 1, 2, …, K run Reduce tasks (R).]

Model and Notations

• Qr(t): # of jobs in the reduce queue.

• Qm(t): # of jobs in the map queue.

• Ri(t): # of running Reduce tasks of job i

• Ri: total # of Reduce tasks of job i

• For task j of job i:

$C_i^j$: copy/shuffle

$D_i^j$: reduce phase

$Z_i = \sum_{j=1}^{R_i} (C_i^j + D_i^j)$: Reduce workload of job i

Traffic Assumptions and Constraint

• Reduce queue:

1. Max-min fairness. Ex: 10 servers with 3 jobs (4,3,3)

2. No “preemption”: jobs cannot be interrupted

• Dependence constraint: Progress of Reduce tasks <= Progress of Map tasks, i.e. $C_i(t) \le B_i(t)$

Reduce queue is NOT work-conserving
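The max-min fairness example (10 servers, 3 jobs, allocation (4,3,3)) can be reproduced with a small sketch. The function `max_min_allocation` is hypothetical; it assumes each job's demand is capped by its number of Reduce tasks, hands out servers one at a time to the job that currently holds the fewest, and breaks ties by list order. It computes only the fair target and ignores the no-preemption rule, under which running tasks would have to finish before servers move.

```python
def max_min_allocation(num_servers, demands):
    """Give servers one by one to the job with the fewest servers so far.

    demands[i] is the largest number of servers job i can use
    (assumed here to be its number of Reduce tasks).
    """
    allocation = [0] * len(demands)
    for _ in range(num_servers):
        # Jobs that can still use another server.
        candidates = [i for i in range(len(demands)) if allocation[i] < demands[i]]
        if not candidates:
            break  # every job is fully served; leftover servers stay idle
        # Fewest servers first; ties go to the job listed first.
        chosen = min(candidates, key=lambda i: (allocation[i], i))
        allocation[chosen] += 1
    return allocation

print(max_min_allocation(10, [10, 10, 10]))  # prints: [4, 3, 3]
print(max_min_allocation(10, [2, 10, 10]))   # prints: [2, 4, 4]
```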

Lightly Loaded

Example: K=10

Current state: 2 jobs in Reduce queue, (5,5)

Suppose: A new job joins the Reduce queue, (?,?,?)

[Figure: Map queue feeding the Reduce queue with servers 1, …, K]

Heavily Loaded

Example: K=10

Current state: >>10 jobs in Reduce queue, (1,…,1)

Suppose: One job finishes and leaves the Reduce queue

How to break the tie?

[Figure: Map queue feeding the Reduce queue with servers 1, …, K]

Study on Tie-Breaking Rules

• Consider 3 rules for Reduce queue:

Policy 1: Choose the one with the smallest remaining Map service

Policy 2: Choose one job randomly (uniform)

Policy 3: Choose the one which starts Map service first

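[Figure: Timeline of Map service for Job 1, Job 2, and Job 3, with time marks TS1, TS2, TS3, TE1, TE2, TE3 and the current time on the axis; Policy 1 and Policy 3 each point to the job they select.]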
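The three tie-breaking rules can be written as one selection function over the candidate jobs. This is a sketch under assumptions: the `MapJob` record, its field names, and the sample numbers are hypothetical and only illustrate how the rules differ.

```python
import random
from dataclasses import dataclass

@dataclass
class MapJob:
    name: str
    map_start_time: float      # time at which the job started Map service
    remaining_map_work: float  # Map workload still to be served

def pick_job(jobs, policy, rng=random):
    """Select the job that receives the vacant Reduce server."""
    if policy == 1:
        # Policy 1: smallest remaining Map service.
        return min(jobs, key=lambda job: job.remaining_map_work)
    if policy == 2:
        # Policy 2: uniformly at random.
        return rng.choice(jobs)
    if policy == 3:
        # Policy 3: the job that started Map service first.
        return min(jobs, key=lambda job: job.map_start_time)
    raise ValueError("policy must be 1, 2, or 3")

jobs = [
    MapJob("Job 1", map_start_time=0.0, remaining_map_work=5.0),
    MapJob("Job 2", map_start_time=1.0, remaining_map_work=2.0),
    MapJob("Job 3", map_start_time=2.0, remaining_map_work=4.0),
]
print(pick_job(jobs, 1).name)  # prints: Job 2
print(pick_job(jobs, 3).name)  # prints: Job 1
```

With these sample values Policy 1 and Policy 3 disagree, which is the situation the tie-breaking study is about.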

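Diffusion Approximation

• Consider a sequence of MapReduce systems, indexed by n:

1. Primitive data: $(B^{(n)}, C^{(n)}, D^{(n)}, R^{(n)})$

2. Reduce workload: $Z_i^{(n)} := \sum_{j=1}^{R_i^{(n)}} (C_i^{(n),j} + D_i^{(n),j})$

3. Limits: $(E[B^{(n)}], Var[B^{(n)}])$ and $(E[R^{(n)}], E[Z^{(n)}], Var[Z^{(n)}])$ converge to finite limits

4. Arrival rate: $\lambda^{(n)} \to \lambda$

• Heavy-traffic assumption: each of the following converges to a finite limit as $n \to \infty$:

Map service: $\sqrt{n}\,(1 - \lambda^{(n)} E[B^{(n)}])$

Reduce service: $\sqrt{n}\,\left(1 - \frac{\lambda^{(n)}}{K} E[(C^{(n)} + D^{(n)}) R^{(n)}]\right)$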

Diffusion Approximation (Cont.)

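• Queue length: $(Q_m^{(n)}, Q_r^{(n)})$

• Diffusion limits: $\hat{Q}_m^{(n)}(t) := Q_m^{(n)}(nt)/\sqrt{n}$, $\hat{Q}_r^{(n)}(t) := Q_r^{(n)}(nt)/\sqrt{n}$

Theorem 1 [4]: (Map queue) $\hat{Q}_m^{(n)}(t) \Rightarrow Q_m(t) = \mathrm{RBM}(Q_m(0) + W_m^*(t))$

where $W_m^*(t)$ is a BM with (1) negative drift and (2) variance determined by the limits of $E[B^{(n)}]$ and $Var[B^{(n)}]$ and by the arrival rate $\lambda$; see [4] for the expressions.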
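RBM in Theorem 1 stands for reflected Brownian motion. As a minimal sketch, assuming the standard one-dimensional reflection at zero, Q(t) = X(t) - min(0, min over s <= t of X(s)), the map can be applied to a discretized free path as below. The function name `reflect` and the sample path are illustrative and not taken from the slides.

```python
def reflect(path):
    """Reflect a discretized path at zero (one-dimensional Skorokhod map).

    Whenever the free path reaches a new minimum below zero, the
    reflected path is pushed back up so that it never goes negative.
    """
    reflected = []
    running_min = 0.0
    for x in path:
        running_min = min(running_min, x)
        reflected.append(x - running_min)
    return reflected

free_path = [1.0, 0.5, -0.5, -1.0, 0.0, -2.0, -1.5]
print(reflect(free_path))  # prints: [1.0, 0.5, 0.0, 0.0, 1.0, 0.0, 0.5]
```

A free path with negative drift keeps hitting new minima, so the reflected path returns to zero repeatedly, as a stable queue length does.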

Relaxed MapReduce & Lower Bound

• Construct a new queue:

1. Assume no dependence constraint: $C_i(t) > B_i(t)$ is possible

Reduce queue now becomes work-conserving

2. A job is always given at least as many servers as in the original queue, for all t.

Jobs are completed no later than in the original queue.

Lower Bound (Cont.)

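Theorem 2: There exists a sequence $\hat{Q}_r^{L,(n)}$ such that $\hat{Q}_r^{L,(n)}(t) \le \hat{Q}_r^{(n)}(t)$ and

(1) $\hat{Q}_r^{L,(n)}(t) \Rightarrow \mathrm{RBM}(W_r^*(t))$

(2) $W_r^*$ is a BM with $W_r^*(0) = 0$, whose drift and variance are determined by K, $\lambda$, and the limits of the Reduce workload moments

(3) $\hat{Q}_r^{L,(n)}(t)$ is independent of $W_m^*(t)$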

Is the lower bound achievable?

Observation

A1: Map queue is M/G/1 and processor-sharing

• The queue is reversible, i.e. the departure process is also Poisson. [2]

• Past departures are independent of Qm(t)

Observation (Cont.)

A2: In heavy traffic, if the Reduce queue processes jobs in the same order as they depart from the Map queue, the Reduce queue will be very close to a FIFO multi-server queue with service time $\sum_{j=1}^{R_i} (C_i^j + D_i^j)$ for job i.

Intuition for Policy 1

Policy 1: Choose the one with the smallest remaining Map service

From A1 & A2: Arrival process of Qr(t) ~ departure process of Qm(t)

$\hat{Q}_r^{(n)}(t) \Rightarrow \mathrm{RBM}(W_r^*(t))$

Reduce queue can be approximated by RBM


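Policy 2: Choose one job randomly (uniform)

Diffusion approximation for Map queue: $Q_m(nt_0) \approx \sqrt{n}\,E[\hat{Q}_m(t_0)] = q_m \sqrt{n} + o(\sqrt{n})$

Suppose 1 vacancy in Reduce queue at $nt_0$

The chosen job has remaining Map workload = $B_e$ [3], where $B_e$ has the distribution

$P(B_e > x) = \int_x^{\infty} \frac{P(B > u)}{E[B]} \, du$

[Figure: Qm(t) against time over the interval from $nt_0$ to $nt_0 + \Delta t$, at scale $n^{1/2}$]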
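Integrating the tail formula for the remaining workload gives the mean E[B_e] = E[B^2] / (2 E[B]), so the expected remaining Map workload of a randomly chosen job grows with the variability of B. A minimal sketch for a discrete workload distribution; the function name `mean_equilibrium` and the two sample distributions are hypothetical.

```python
def mean_equilibrium(values, probs):
    """Return E[B_e] = E[B^2] / (2 E[B]) for a discrete distribution of B."""
    first_moment = sum(p * v for v, p in zip(values, probs))
    second_moment = sum(p * v * v for v, p in zip(values, probs))
    return second_moment / (2 * first_moment)

# Constant workload B = 4: half of the workload remains on average.
print(mean_equilibrium([4.0], [1.0]))            # prints: 2.0
# Same mean 4, but B is 1 or 7 with equal probability.
print(mean_equilibrium([1.0, 7.0], [0.5, 0.5]))  # prints: 3.125
```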

Intuition for Policy 2 (Cont.)

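Let the remaining Map workload of that job be $b_e$

The remaining Map service time of that job ~ $b_e q_m \sqrt{n}$

Reduce queue will grow by ~ $\frac{\lambda b_e q_m \sqrt{n}}{K}$

$\hat{Q}_r(t_0^+) - \hat{Q}_r(t_0^-) = \frac{\lambda b_e q_m}{K}$, a jump in the diffusion limit

[Figure: Qr(t) against time over the interval from $nt_0$ to $nt_0 + \Delta t$, with the increase ΔQr(t) marked]

Why does it matter? Little’s law: the mean delay equals the mean queue length divided by the arrival rate, so a jump in the queue length is a jump in delay.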


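Policy 3: Choose the one which starts Map service first

Depends on the workload distribution B

Special case: B is constant, then Policy 1 = Policy 3

[Figure: two plots of P[B>x] against x, each starting at 1 with a point x* marked on the x-axis; one is labeled “Like Policy 1”, the other “Like Policy 2”.]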

Achieve the Lower Bound

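Theorem 3.

(1) Under policy 1, if B is bounded, then $\hat{Q}_r^{(n)}(t) \Rightarrow \mathrm{RBM}(W_r^*(t))$

(2) Under policy 2, if B is bounded, then $\hat{Q}_r^{(n)}(t) \Rightarrow \mathrm{RBM}(W_r^{**}(t))$

where $\mathrm{RBM}(W_r^{**}(t))$ is modified from $\mathrm{RBM}(W_r^*(t))$ with jumps of random size when $\mathrm{RBM}(W_r^{**}(t))$ hits zero.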

Achieve the Lower Bound (Cont.)

Theorem 3.

(3) Under policy 3, if B is bounded and has a decreasing hazard function, then $\hat{Q}_r^{(n)}(t) \Rightarrow \mathrm{RBM}(W_r^*(t))$ (Like Policy 1)

If B has an increasing hazard function, then $\hat{Q}_r^{(n)}(t) \Rightarrow \mathrm{RBM}(W_r^{**}(t))$ (Like Policy 2)

Remark: Hazard function (failure rate)

$H(x) = \lim_{\Delta x \to 0} \frac{P(x < B \le x + \Delta x)}{\Delta x \, P(B > x)}$
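Whether a hazard function is increasing or decreasing can be checked numerically from the tail P(B > x). The sketch below uses a finite difference in place of the limit in the Remark; the function `hazard` and the three example distributions are illustrative choices and do not come from the slides.

```python
import math

def hazard(survival, x, dx=1e-6):
    """Finite-difference estimate of H(x) from the tail function P(B > x)."""
    tail = survival(x)
    return (tail - survival(x + dx)) / (dx * tail)

def exponential_tail(x):
    """P(B > x) for an exponential workload with rate 2 (constant hazard 2)."""
    return math.exp(-2.0 * x)

def pareto_tail(x):
    """P(B > x) = (1 + x)^-3, with decreasing hazard 3 / (1 + x)."""
    return (1.0 + x) ** -3.0

def uniform_tail(x):
    """P(B > x) for B uniform on [0, 1], with increasing hazard 1 / (1 - x)."""
    return 1.0 - x

for name, tail in [("exponential", exponential_tail),
                   ("pareto", pareto_tail),
                   ("uniform", uniform_tail)]:
    # Two evaluation points are enough to see the trend of each hazard.
    print(name, round(hazard(tail, 0.1), 3), round(hazard(tail, 0.5), 3))
```

The exponential case sits on the boundary between the two regimes, since its hazard is the same at every x.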

Conclusion

• Non-work-conserving effects might occur in the MapReduce system under heavy traffic.

• Under the heavy-traffic assumption, we obtain a lower bound using diffusion approximation.

• The tie-breaking rule should be carefully designed to avoid possible jumps in the queue length.

References

• [1] J. Tan et al., “Non-work-conserving Effects in MapReduce: Diffusion Limit and Criticality,” in Proc. SIGMETRICS, 2014.

• [2] F. P. Kelly. Reversibility and Stochastic Networks. John Wiley & Sons, 1979.

• [3] H. C. Gromoll, “Diffusion approximation for a processor sharing queue in heavy traffic,” Annals of Applied Probability, 14:555–611, 2004.

• [4] A. Lambert, F. Simatos, and B. Zwart. “Scaling limits via excursion theory: Interplay between Crump Mode-Jagers branching processes and processor sharing queues,” The Annals of Applied Probability, 23:2161–2603, 2013.
