Cluster Computing with Dryad Mihai Budiu, MSR-SVC LiveLabs, March 2008.


Page 2: Goal

Page 3: The Dryad Project

http://research.microsoft.com/research/sv/dryad

"Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks", Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly

European Conference on Computer Systems (EuroSys), Lisbon, Portugal, March 21-23, 2007

Page 4: Outline

• Dryad Design
• Implementation
• Policies as Plug-ins
• Building on Dryad

Page 5: Design Space

[Figure: design-space chart. Labels: Throughput, Latency; Internet, Private data center; Data-parallel, Shared memory. Systems plotted: Dryad, Search, HPC, Grid, Transaction.]

Page 6: Data Partitioning

[Figure labels: RAM, DATA, DATA.]

Page 7: 2-D Piping

• Unix Pipes: 1-D
  grep | sed | sort | awk | perl

• Dryad: 2-D
  grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50
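A minimal Python sketch of the 2-D idea, not Dryad's implementation: each stage is a function that runs once per partition, and the data is re-split whenever the next stage has a different width. The helper names (`repartition`, `run_2d_pipeline`) and the round-robin re-split are assumptions made for illustration only.

```python
from typing import Callable, List, Sequence, Tuple

Stage = Callable[[List[str]], List[str]]


def repartition(parts: List[List[str]], width: int) -> List[List[str]]:
    """Round-robin all items of the current partitions into `width` new partitions."""
    out: List[List[str]] = [[] for _ in range(width)]
    i = 0
    for part in parts:
        for item in part:
            out[i % width].append(item)
            i += 1
    return out


def run_2d_pipeline(data: Sequence[str], stages: List[Tuple[Stage, int]]) -> List[str]:
    """Run each (function, width) stage; the function is applied to every partition."""
    parts = [list(data)]
    for fn, width in stages:
        parts = repartition(parts, width)
        # Sequential here; on a cluster each call would be an independent vertex.
        parts = [fn(p) for p in parts]
    return [item for p in parts for item in p]


def grep_a(lines: List[str]) -> List[str]:
    return [line for line in lines if "a" in line]


def upper(lines: List[str]) -> List[str]:
    return [line.upper() for line in lines]


# Analogue of grep^4 | upper^2 | sort^1 on a toy input.
result = run_2d_pipeline(
    ["apple", "kiwi", "banana", "plum", "pear"],
    [(grep_a, 4), (upper, 2), (sorted, 1)],
)
print(result)
```

The stage widths play the role of the superscripts on the slide: they say how many copies of each program run side by side.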

Page 8: Dryad = Execution Layer

Job (Application) ≈ Pipeline
Dryad ≈ Shell
Cluster ≈ Machine

Page 9: Outline

• Dryad Design
• Implementation
• Policies as Plug-ins
• Building on Dryad

Pages 10-14: Virtualized 2-D Pipelines

[Figure sequence across five slides; the diagrams were not captured in the transcript.]

• 2-D DAG
• Multi-machine
• Virtualized

Page 15: Dryad Job Structure

[Figure: job graph for grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50. Vertex labels: grep, sed, sort, awk, perl. Callouts: Input files, Vertices (processes), Channels, Stage, Output files.]

Page 16: Channels

[Figure labels: X, M, Items.]

Finite streams of items

• distributed filesystem files (persistent)
• SMB/NTFS files (temporary)
• TCP pipes (inter-machine)
• memory FIFOs (intra-machine)
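The point of the channel abstraction is that vertex code sees a finite stream of items regardless of the transport. The Python sketch below illustrates this with two interchangeable transports; the class names, the `write`/`read_all` interface and the JSON-lines file format are all assumptions for illustration and are not Dryad's API.

```python
import json
import os
import tempfile
from collections import deque
from typing import Any, Iterator


class MemoryFifoChannel:
    """In-memory FIFO: items never leave the process."""

    def __init__(self) -> None:
        self._items: deque = deque()

    def write(self, item: Any) -> None:
        self._items.append(item)

    def read_all(self) -> Iterator[Any]:
        while self._items:
            yield self._items.popleft()


class TempFileChannel:
    """Temporary-file channel: one JSON value per line, deleted after reading."""

    def __init__(self) -> None:
        fd, self.path = tempfile.mkstemp(suffix=".chan")
        os.close(fd)

    def write(self, item: Any) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(item) + "\n")

    def read_all(self) -> Iterator[Any]:
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                yield json.loads(line)
        os.remove(self.path)  # the stream is finite and consumed once


def double_vertex(inp, out) -> None:
    """Vertex code: identical for every channel type."""
    for item in inp.read_all():
        out.write(item * 2)


def run(channel_type) -> list:
    src, dst = channel_type(), channel_type()
    for i in (1, 2, 3):
        src.write(i)
    double_vertex(src, dst)
    return list(dst.read_all())


mem_result = run(MemoryFifoChannel)
file_result = run(TempFileChannel)
print(mem_result, file_result)
```

Because `double_vertex` depends only on the stream interface, the choice of transport can be made per edge without touching vertex code.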

Page 17: Architecture

[Figure labels: Job manager; cluster; NS; PD (×3); V (×3); control plane; data plane; job schedule; Files, TCP, FIFO, Network.]

Page 18: Staging

1. Build
2. Send .exe
3. Start JM
4. Query cluster resources
5. Generate graph
6. Initialize vertices
7. Serialize vertices
8. Monitor vertex execution

[Figure labels: JM code, vertex code, Cluster services.]

Page 19: Fault Tolerance

Page 20: Outline

• Dryad Design
• Implementation
• Policies and Resource Management
• Building on Dryad

Page 21: Policy Managers

[Figure labels: Job Manager; Stage R (vertices R); Stage X (vertices X); R manager; X manager; R-X manager; Connection R-X.]

Page 22: Duplicate Execution Manager

[Figure: vertices X[0], X[1], X[3] (completed vertices), X[2] (slow vertex), X'[2] (duplicate vertex).]

Duplication Policy = f(running times, data volumes)
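One possible shape for such a policy function, sketched in Python. The slide only states that the policy depends on running times and data volumes; the specific rule below (compare each running vertex against the median seconds-per-byte of completed vertices) and the `slowdown_factor` threshold are assumptions, not the rule Dryad uses.

```python
from statistics import median
from typing import Dict, List, Tuple


def vertices_to_duplicate(
    completed: List[Tuple[float, float]],      # (running_time_s, input_bytes) per finished vertex
    running: Dict[str, Tuple[float, float]],   # name -> (elapsed_s, input_bytes)
    slowdown_factor: float = 2.0,              # hypothetical threshold
) -> List[str]:
    """Return names of running vertices that look like stragglers."""
    if not completed:
        return []  # nothing to compare against yet
    # Typical cost per byte among completed vertices (input sizes assumed non-zero).
    typical_rate = median(t / b for t, b in completed)
    slow = []
    for name, (elapsed, nbytes) in running.items():
        expected = typical_rate * nbytes
        if elapsed > slowdown_factor * expected:
            slow.append(name)
    return sorted(slow)


completed = [(10.0, 100.0), (12.0, 100.0), (11.0, 110.0)]
running = {"X[2]": (40.0, 100.0), "X[4]": (15.0, 100.0)}
dups = vertices_to_duplicate(completed, running)
print(dups)
```

Normalizing by input size keeps a vertex with a large input from being flagged merely because it has more data to process.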

Page 23: Aggregation Manager

[Figure: static graph of S vertices feeding T, and dynamic graph in which S vertices, tagged by rack (# 1, # 2, # 1, # 3, # 3, # 2), feed per-rack A vertices (# 1, # 2, # 3) before T.]
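The dynamic refinement suggested by this slide can be sketched as grouping source vertices by rack and inserting one aggregator per rack. The Python below is an illustration only; `build_aggregation_tree` and the aggregator naming are hypothetical, and a real manager would react to vertices as they complete.

```python
from collections import defaultdict
from typing import Dict, List


def build_aggregation_tree(source_racks: Dict[str, int]) -> Dict[str, List[str]]:
    """Map each per-rack aggregator to the source vertices it should read from."""
    by_rack: Dict[int, List[str]] = defaultdict(list)
    for source, rack in source_racks.items():
        by_rack[rack].append(source)
    return {
        f"A(rack {rack})": sorted(sources)
        for rack, sources in sorted(by_rack.items())
    }


# Rack assignment copied from the slide: # 1 # 2 # 1 # 3 # 3 # 2
sources = {"S0": 1, "S1": 2, "S2": 1, "S3": 3, "S4": 3, "S5": 2}
tree = build_aggregation_tree(sources)
print(tree)
```

Each aggregator combines data inside its rack, so only one stream per rack has to cross rack boundaries to reach T.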

Page 24: Data Distribution (Group By)

[Figure labels: Source (m vertices), Dest (n vertices), m × n.]

Page 25: Range-Distribution Manager

[Figure: static graph with S, D and T vertices, where the T ranges are unknown: [0-?), [?-100). Dynamic graph adds a Hist vertex over the input range [0-100) that yields the split [0-30), [30-100) for the T vertices.]
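A small Python sketch of how a sampled histogram can turn the unknown split `[0-?), [?-100)` into concrete ranges. The quantile rule and the function names are assumptions for illustration; the slide does not say how the boundaries are computed.

```python
import bisect
from typing import List, Sequence


def choose_boundaries(sample: Sequence[int], num_ranges: int) -> List[int]:
    """Pick num_ranges - 1 split points at evenly spaced positions of the sorted sample."""
    ordered = sorted(sample)
    return [ordered[(i * len(ordered)) // num_ranges] for i in range(1, num_ranges)]


def route(key: int, boundaries: List[int]) -> int:
    """Index of the destination whose half-open range contains key."""
    return bisect.bisect_right(boundaries, key)


sample = [5, 12, 18, 22, 27, 30, 45, 60, 75, 90]
boundaries = choose_boundaries(sample, 2)
print(boundaries, route(29, boundaries), route(30, boundaries))
```

With this sample the single boundary is 30, which reproduces the `[0-30), [30-100)` split shown on the slide.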

Page 26: Goal: Declarative Programming

[Figure: static graph (vertex labels X, S, T) and dynamic graph (vertex labels X, S, T, replicated).]

Page 27: Outline

• Dryad Design
• Implementation
• Policies as Plug-ins
• Building on Dryad

Page 28: Software Stack

[Figure: layered software stack. Labels: Windows Server (×4); Cluster Services; Distributed Filesystem; CIFS/NTFS; Dryad; Distributed Shell; PSQL; DryadLINQ; Perl; SQL server; SSIS; C++; C#; legacy code; sed, awk, grep, etc.; Queries; Vectors; Machine Learning. Side label: Job queueing, monitoring.]

Page 29: SkyServer Query 18

select distinct U.ObjID
into results
from photoPrimary U, neighbors N, photoPrimary L
where U.ObjID = N.ObjID
  and L.ObjID = N.NeighborObjID
  and U.ObjID < L.ObjID
  and abs((U.u-U.g)-(L.u-L.g)) < 0.05
  and abs((U.g-U.r)-(L.g-L.r)) < 0.05
  and abs((U.r-U.i)-(L.r-L.i)) < 0.05
  and abs((U.i-U.z)-(L.i-L.z)) < 0.05

[Figure: query plan graph. Vertex labels: U, N, L, X, D, M, S, Y, H; stage widths n and 4n.]

Page 30: SkyServer Q18 Performance

[Figure: line chart. X axis: Number of Computers (0 to 10). Y axis: Speed-up (times) (0.0 to 16.0). Series: Dryad In-Memory, Dryad Two-pass, SQLServer 2005.]

Page 31: DryadLINQ

• Declarative programming
• Integration with Visual Studio
• Integration with .Net
• Type safety
• Automatic serialization
• Job graph optimizations
  – static
  – dynamic
• Conciseness

Page 32: LINQ

Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);

var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };

Page 33: DryadLINQ = LINQ + Dryad

Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);

var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };

[Figure labels: C# (query); Query plan (Dryad job); Vertex code (C#, C#, C#); Data; collection; results.]

Page 34: Sort & Map-Reduce in DryadLINQ

[Figure labels: S, D, Sampl, Sort; ranges [0-100), [0-30), [30-100).]

Page 35: PLINQ

public static IEnumerable<TSource> DryadSort<TSource, TKey>(
    IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IComparer<TKey> comparer,
    bool isDescending)
{
    // Note: isDescending is not used in this slide version.
    return source.AsParallel().OrderBy(keySelector, comparer);
}

Page 36: Machine Learning in DryadLINQ

[Figure: layers. Labels: Dryad; DryadLINQ; Large Vector; Machine learning; Data analysis.]

Page 37: Very Large Vector Library

PartitionedVector<T>
Scalar<T>

[Figure labels: T (elements of each).]

Page 38: Operations on Large Vectors: Map 1

[Figure: f applied to a vector of T, producing a vector of U.]

f preserves partitioning

Page 39: Map 2 (Pairwise)

[Figure: f takes inputs of types T and U and produces V.]

Page 40: Map 3 (Vector-Scalar)

[Figure: f takes inputs of types T and U and produces V.]

Page 41: Reduce (Fold)

[Figure: f repeatedly combines values of type U into a single U.]
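The four vector operations (Map 1, Map 2, Map 3, Reduce) can be illustrated with a toy partitioned vector in Python. This is a sketch under stated assumptions: the class below is not the library's `PartitionedVector<T>`, the method names are invented, and `map2` assumes both vectors are partitioned identically.

```python
from functools import reduce as fold
from typing import Any, Callable, List


class PartitionedVector:
    """Toy vector stored as a list of partitions (each a plain list)."""

    def __init__(self, partitions: List[List[Any]]) -> None:
        self.partitions = [list(p) for p in partitions]

    def map(self, f: Callable[[Any], Any]) -> "PartitionedVector":
        """Map 1: elementwise; the partitioning is preserved."""
        return PartitionedVector([[f(x) for x in p] for p in self.partitions])

    def map2(self, other: "PartitionedVector", f: Callable[[Any, Any], Any]) -> "PartitionedVector":
        """Map 2 (pairwise): combine aligned elements of two vectors."""
        return PartitionedVector(
            [[f(a, b) for a, b in zip(p, q)]
             for p, q in zip(self.partitions, other.partitions)]
        )

    def map_scalar(self, scalar: Any, f: Callable[[Any, Any], Any]) -> "PartitionedVector":
        """Map 3 (vector-scalar): the same scalar is given to every partition."""
        return PartitionedVector([[f(x, scalar) for x in p] for p in self.partitions])

    def reduce(self, f: Callable[[Any, Any], Any]) -> Any:
        """Reduce (fold): fold inside each partition, then fold the partial results.
        f must be associative for the result to be independent of partitioning."""
        partials = [fold(f, p) for p in self.partitions if p]
        return fold(f, partials)


v = PartitionedVector([[1, 2], [3, 4], [5]])
squares = v.map(lambda x: x * x)
sums = v.map2(squares, lambda a, b: a + b)
scaled = v.map_scalar(10, lambda x, s: x * s)
total = v.reduce(lambda a, b: a + b)
print(squares.partitions, sums.partitions, scaled.partitions, total)
```

Only `reduce` moves data between partitions; the three maps work partition by partition, which is why they can run without any communication.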

Page 42: Linear Algebra

T, U, V = $\mathbb{R}$, $\mathbb{R}^{m}$, $\mathbb{R}^{m \times n}$

Page 43: Linear Regression

• Data: $x_t \in \mathbb{R}^{n},\ y_t \in \mathbb{R}^{m}$, $t \in \{1, \dots, n\}$
• Find: $A \in \mathbb{R}^{m \times n}$
• S.t.: $A x_t \approx y_t$

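Page 44: Analytic Solution

$A = \left(\sum_t y_t x_t^T\right)\left(\sum_t x_t x_t^T\right)^{-1}$

[Figure: dataflow over partitions X[0], X[1], X[2] and Y[0], Y[1], Y[2]. Map: X×X^T and Y×X^T per partition. Reduce: Σ over each set of products. The X×X^T sum is inverted ([ ]^-1) and multiplied (*) with the Y×X^T sum to give A.]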

Page 45: Linear Regression Code

Vectors x = input(0), y = input(1);
Matrices xx = x.PairwiseOuterProduct(x);
OneMatrix xxs = xx.Sum();
Matrices yx = y.PairwiseOuterProduct(x);
OneMatrix yxs = yx.Sum();
OneMatrix xxinv = xxs.Map(a => a.Inverse());
OneMatrix A = yxs.Map(xxinv, (a, b) => a.Multiply(b));

$A = \left(\sum_t y_t x_t^T\right)\left(\sum_t x_t x_t^T\right)^{-1}$
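To make the map and reduce steps concrete, here is a pure-Python sketch that evaluates A as the summed y-x outer products times the inverse of the summed x-x outer products, on a tiny data set split into two partitions. The helper names, the 2-dimensional inputs and the hand-written 2×2 inverse are assumptions chosen to keep the example dependency-free; it is not the DryadLINQ code.

```python
from typing import List

Matrix = List[List[float]]


def outer(a: List[float], b: List[float]) -> Matrix:
    """Outer product a b^T."""
    return [[ai * bj for bj in b] for ai in a]


def mat_add(p: Matrix, q: Matrix) -> Matrix:
    return [[a + b for a, b in zip(r, s)] for r, s in zip(p, q)]


def mat_mul(p: Matrix, q: Matrix) -> Matrix:
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


def inverse_2x2(m: Matrix) -> Matrix:
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def partial_sums(xs: List[List[float]], ys: List[List[float]]):
    """Map plus local reduce for one partition: sums of x x^T and y x^T."""
    xx: Matrix = [[0.0, 0.0], [0.0, 0.0]]
    yx: Matrix = [[0.0, 0.0]]
    for x, y in zip(xs, ys):
        xx = mat_add(xx, outer(x, x))
        yx = mat_add(yx, outer(y, x))
    return xx, yx


# Toy data generated from y = 2*x0 + 3*x1, split into two partitions.
partitions = [
    ([[1.0, 0.0], [0.0, 1.0]], [[2.0], [3.0]]),
    ([[1.0, 1.0], [2.0, 1.0]], [[5.0], [7.0]]),
]

xxs: Matrix = [[0.0, 0.0], [0.0, 0.0]]
yxs: Matrix = [[0.0, 0.0]]
for xs, ys in partitions:
    xx, yx = partial_sums(xs, ys)   # would run on separate machines
    xxs = mat_add(xxs, xx)          # global reduce (the two sums)
    yxs = mat_add(yxs, yx)

A = mat_mul(yxs, inverse_2x2(xxs))
print(A)
```

Only the small summed matrices cross partition boundaries, which is what makes the analytic solution a good fit for a map followed by a reduce.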

Page 46: Expectation Maximization (Gaussians)

• 160 lines
• 3 iterations shown

Page 47: Conclusions

• Dryad = distributed execution environment
• Application-independent (semantics oblivious)
• Supports rich software ecosystem
  – Relational algebra
  – Map-reduce
  – LINQ
  – Etc.
• DryadLINQ = A Dryad provider for LINQ
• This is only the beginning!

Page 48: Backup Slides

Page 49: Dryad vs. Map-Reduce

• Many similarities

Dryad                    Map-Reduce
Execution layer          Exe + app. model
Job = arbitrary DAG      Map+sort+reduce
Plug-in policies         Few policies
Program=graph gen.       Program=map+reduce
Complex ( features)      Simple
New (< 2 years)          Mature (> 4 years)
Still growing            Widely deployed
Internal                 Hadoop

Page 50: Small Cluster Support

[Figure: Sort and Merge vertices. Callouts: Grouping vertices; Fast channels.]

Page 51: SkyServer DB query

• Took SQL plan
• Manually coded in Dryad
• Manually partitioned data

[Figure: query plan graph. Vertex labels: U, N, X, D, M, S, Y, H; stage widths n and 4n. Annotations follow.]

u: objid, color
n: objid, neighborobjid
[partition by objid]

select u.color, n.neighborobjid
from u join n
where u.objid = n.objid

(u.color, n.neighborobjid)
[re-partition by n.neighborobjid]
[order by n.neighborobjid]

[distinct]
[merge outputs]

select u.objid
from u join <temp>
where u.objid = <temp>.neighborobjid
  and |u.color - <temp>.color| < d

Pages 52-53: Optimization

[Figure, repeated on both slides: two versions of the query plan graph. Vertex labels: U, N, X, D, M, S, Y, H; stage widths n and 4n.]

Page 54: Query histogram computation

• Input: log file (n partitions)
• Extract queries from log partitions
• Re-partition by hash of query (k buckets)
• Compute histogram within each bucket
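The four steps above can be sketched in a few lines of Python. The log format (`timestamp<TAB>query`), the function names and the use of `zlib.crc32` as the hash are assumptions for illustration; everything runs in one process here, whereas on a cluster each partition and bucket would be a separate vertex.

```python
import zlib
from collections import Counter
from typing import List


def extract_queries(log_partition: List[str]) -> List[str]:
    """Assumed (hypothetical) log format: 'timestamp<TAB>query' per line."""
    return [line.split("\t", 1)[1] for line in log_partition if "\t" in line]


def bucket_of(query: str, k: int) -> int:
    # crc32 is stable across runs, unlike Python's built-in str hash.
    return zlib.crc32(query.encode("utf-8")) % k


def query_histogram(log_partitions: List[List[str]], k: int) -> List[Counter]:
    """Extract, re-partition by hash into k buckets, then count inside each bucket."""
    buckets: List[List[str]] = [[] for _ in range(k)]
    for part in log_partitions:
        for q in extract_queries(part):
            buckets[bucket_of(q, k)].append(q)
    return [Counter(b) for b in buckets]


logs = [["t1\tdryad", "t2\tlinq"], ["t3\tdryad", "t4\tdryad", "malformed"]]
hists = query_histogram(logs, k=3)
total = sum(hists, Counter())
print(total)
```

Because equal queries always hash to the same bucket, each bucket's histogram is final and the k results never need to be merged with one another.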

Page 55: Naïve histogram topology

Legend:
P   parse lines
D   hash distribute
S   quicksort
C   count occurrences
MS  merge sort

[Figure: n Q vertices connected to k R vertices. Insets labelled "Each Q is:" and "Each R is:" show the operators inside each vertex.]

Page 56: Efficient histogram topology

Legend:
P   parse lines
D   hash distribute
S   quicksort
C   count occurrences
MS  merge sort
M   non-deterministic merge

[Figure: n Q' vertices feeding T vertices, which feed k R vertices. Insets labelled "Each Q' is:", "Each T is:" and "Each R is:" show the operators inside each vertex.]
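One way to read the efficient topology is that occurrences are counted close to the data before anything is hash-distributed, so the network carries (query, count) pairs rather than raw queries. The Python sketch below illustrates only that idea; it is an interpretation of the legend, not a reconstruction of the Q', T and R vertices, and the function names are hypothetical.

```python
import zlib
from collections import Counter
from typing import List


def local_counts(queries: List[str]) -> Counter:
    """Count occurrences inside one partition before anything is sent."""
    return Counter(queries)


def distribute(partials: List[Counter], k: int) -> List[Counter]:
    """Hash-distribute (query, count) pairs into k buckets, adding counts of equal queries."""
    buckets = [Counter() for _ in range(k)]
    for partial in partials:
        for query, count in partial.items():
            buckets[zlib.crc32(query.encode("utf-8")) % k][query] += count
    return buckets


partitions = [["a", "a", "b"], ["a", "b", "b", "b"]]
partials = [local_counts(p) for p in partitions]
records_sent = sum(len(p) for p in partials)   # pairs shipped, versus 7 raw queries
merged = sum(distribute(partials, k=2), Counter())
print(records_sent, merged)
```

The saving grows with the amount of repetition in each partition, which is typically large for query logs.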

Page 57: Final histogram refinement

[Figure labels: Q', T, R. Vertex counts: R 450; T 217; also 450, 10,405, 99,713. Data volumes: 33.4 GB, 118 GB, 154 GB, 10.2 TB.]

• 1,800 computers
• 43,171 vertices
• 11,072 processes
• 11.5 minutes