1 Models and Frameworks for Data Intensive Cloud Computing Douglas Thain University of Notre Dame IDGA Cloud Computing 8 February 2011


Models and Frameworks for Data Intensive Cloud Computing

Douglas Thain, University of Notre Dame

IDGA Cloud Computing, 8 February 2011

The Cooperative Computing Lab

We collaborate with people who have large scale computing problems in science, engineering, and other fields.

We operate computer systems on the scale of 1000 cores. (Small)

We conduct computer science research in the context of real people and problems.

We publish open source software that captures what we have learned.

http://www.nd.edu/~ccl


I have a standard, debugged, trusted application that runs on my laptop. A toy problem completes in one hour. A real problem will take a month (I think.)

Can I get a single result faster? Can I get more results in the same time?

Last year, I heard about this grid thing.

This year, I heard about this cloud thing.

What do I do next?


Our Application Communities

Bioinformatics
– I just ran a tissue sample through a sequencing device. I need to assemble 1M DNA strings into a genome, then compare it against a library of known human genomes to find the difference.

Biometrics
– I invented a new way of matching iris images from surveillance video. I need to test it on 1M hi-resolution images to see if it actually works.

Data Mining
– I have a terabyte of log data from a medical service. I want to run 10 different clustering algorithms at 10 levels of sensitivity on 100 different slices of the data.
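To get a feel for the size of the data mining request, the combinations can be enumerated as independent tasks. This is only a sketch: the algorithm, sensitivity, and slice names are hypothetical placeholders, since the slide gives only the counts.

```python
from itertools import product

# Hypothetical names: the slide states only the counts
# (10 algorithms, 10 sensitivity levels, 100 data slices).
algorithms = [f"cluster_alg_{a}" for a in range(10)]
sensitivities = list(range(1, 11))
slices = [f"slice_{s:03d}" for s in range(100)]

def enumerate_tasks(algorithms, sensitivities, slices):
    """Return one independent task description per combination."""
    return [
        {"algorithm": a, "sensitivity": s, "slice": d}
        for a, s, d in product(algorithms, sensitivities, slices)
    ]

tasks = enumerate_tasks(algorithms, sensitivities, slices)
print(len(tasks))  # 10 * 10 * 100 combinations
```

Each entry is independent of the others, which is what makes this kind of workload a natural fit for many machines.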


Why Consider Scientific Apps?

Highly motivated to get a result that is bigger, faster, or higher resolution.

Willing to take risks and move rapidly, but don’t have the effort/time for major retooling.

Often already have access to thousands of machines in various forms.

Security is usually achieved by selecting resources appropriately at a high level.


Cloud Broadly Considered

Cloud: Any system or method that allows me to allocate as many machines as I need in short order:
– My own dedicated cluster. (ssh)
– A shared institutional batch system (SGE)
– A cycle scavenging system (Condor)
– A pay-as-you-go IaaS system (Amazon EC2)
– A pay-as-you-go PaaS system (Google App Engine)

Scalable, Elastic, Dynamic, Unreliable…


greencloud.crc.nd.edu


What they want. What they get.

The Most Common Application Model?

Every program attempts to grow until it can read mail.

- Jamie Zawinski


An Old Idea: The Unix Model

grep pattern < input | sort | uniq > output
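The same composition of small, single-purpose stages can be sketched in Python. The function names mirror the Unix tools, but they are illustrative stand-ins written for this example, and the sample input is made up.

```python
import re

def grep(pattern, lines):
    """Yield only the lines that match the regular expression."""
    regex = re.compile(pattern)
    return (line for line in lines if regex.search(line))

def sort(lines):
    """Return the lines in sorted order."""
    return iter(sorted(lines))

def uniq(lines):
    """Drop adjacent duplicate lines, like the Unix uniq tool."""
    previous = object()  # sentinel that equals no real line
    for line in lines:
        if line != previous:
            yield line
        previous = line

# Made-up sample data standing in for the input file.
sample_input = ["error: disk", "ok", "error: net", "error: disk"]
output = list(uniq(sort(grep("error", sample_input))))
print(output)
```

Each stage knows nothing about the others, so any one of them can be tested, replaced, or moved to another machine on its own.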


Advantages of Little Processes

Easy to distribute across machines.

Easy to develop and test independently.

Easy to checkpoint halfway.

Easy to troubleshoot and continue.

Easy to observe the dependencies between components.

Easy to control resource assignments from an outside process.


Our approach:

Encourage users to decompose their applications into simple programs.

Give them frameworks that can assemble them into programs of massive scale with high reliability.

Working with Abstractions

[Diagram: a compact data structure (sets A1 … An and B1 … Bn, plus a function F) is passed to AllPairs( A, B, F ); a custom workflow engine runs it on a cloud or grid.]

MapReduce

User provides two simple programs:

Map( x ) -> list of (key,value)
Reduce( key, list of (value) ) -> Output

The Map-Reduce implementation puts them together in a way to maximize data parallelism.

Open source implementation: Hadoop
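A minimal single-process sketch of the model, using word counting as the example. It only illustrates how the two user-supplied functions fit together; it is not how Hadoop is implemented, and `map_reduce`, `word_count_map`, and `word_count_reduce` are names invented for this sketch.

```python
from collections import defaultdict

def map_reduce(inputs, map_fn, reduce_fn):
    """Apply map_fn to every input, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for x in inputs:
        for key, value in map_fn(x):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}

def word_count_map(line):
    """Map( x ) -> list of (key, value): emit (word, 1) for each word."""
    return [(word, 1) for word in line.split()]

def word_count_reduce(key, values):
    """Reduce( key, list of values ) -> output: total count for the word."""
    return sum(values)

counts = map_reduce(["a b a", "b c"], word_count_map, word_count_reduce)
print(counts)
```

In a real implementation the map calls and the per-key reduce calls run on different machines; the user-visible contract stays these two small functions.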

[Diagram: map tasks (M) emit values (V) that are grouped under Key0, Key1, … KeyN; reduce tasks (R) consume each group and produce outputs O0, O1, O2.]


Of course, not all science fits into the Map-Reduce model!


All-Pairs Abstraction

AllPairs( set A, set B, function F )

returns matrix M where

M[i][j] = F( A[i], B[j] ) for all i,j

[Diagram: F is applied to every pair of A1 … An and B1 … Bn, filling the result matrix returned by AllPairs(A,B,F).]

allpairs A B F.exe
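The definition translates directly into a sequential sketch. The comparison function here is a toy stand-in chosen for the example; in the biometrics use case F would be an image matching program.

```python
def all_pairs(A, B, F):
    """Return matrix M where M[i][j] = F(A[i], B[j]) for all i, j."""
    return [[F(a, b) for b in B] for a in A]

def toy_distance(a, b):
    """Hypothetical stand-in for a real comparison function."""
    return abs(a - b)

M = all_pairs([1, 2, 3], [10, 20], toy_distance)
print(M)
```

Every cell is independent of every other cell, so the work can be split across machines in any order; the hard part is moving the data in A and B efficiently.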


How Does the Abstraction Help?

The custom workflow engine:
– Chooses right data transfer strategy.
– Chooses the right number of resources.
– Chooses blocking of functions into jobs.
– Recovers from a larger number of failures.
– Predicts overall runtime accurately.

All of these tasks are nearly impossible for arbitrary workloads, but are tractable (not trivial) to solve for a specific abstraction.
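As one illustration of why a fixed abstraction makes these choices tractable, here is a sketch of blocking: grouping many small F calls into fewer, larger jobs. The fixed square block size is an assumption made for the example; the slides do not say how the real engine chooses its blocks.

```python
def block_jobs(num_a, num_b, block_size):
    """Split a num_a x num_b grid of F calls into rectangular jobs.

    Each job is a pair (rows, cols) of index ranges into sets A and B.
    """
    jobs = []
    for i in range(0, num_a, block_size):
        for j in range(0, num_b, block_size):
            rows = range(i, min(i + block_size, num_a))
            cols = range(j, min(j + block_size, num_b))
            jobs.append((rows, cols))
    return jobs

jobs = block_jobs(num_a=5, num_b=4, block_size=2)
cells_covered = sum(len(rows) * len(cols) for rows, cols in jobs)
print(len(jobs), cells_covered)
```

Larger blocks amortize job startup cost and let a worker reuse the same inputs for many comparisons; smaller blocks give more parallelism and lose less work when a job fails.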


Choose the Right # of CPUs


Resources Consumed

All-Pairs in Production

Our All-Pairs implementation has provided over 57 CPU-years of computation to the ND biometrics research group in the first year.

Largest run so far: 58,396 irises from the Face Recognition Grand Challenge. The largest experiment ever run on publicly available data.

Competing biometric research relies on samples of 100-1000 images, which can miss important population effects.

Reduced computation time from 833 days to 10 days, making it feasible to repeat multiple times for a graduate thesis. (We can go faster yet.)
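The scale of these numbers can be checked with a little arithmetic. The comparison count below assumes the 58,396 images were compared against themselves (A = B), which the slide implies but does not state.

```python
irises = 58_396

# Assumption: the image set is compared against itself, so the
# result matrix has irises x irises cells.
comparisons = irises * irises

before_days = 833  # computation time before, from the slide
after_days = 10    # computation time after, from the slide
speedup = before_days / after_days

print(f"{comparisons:,} comparisons, about {speedup:.1f}x faster")
```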


Are there other abstractions?

Wavefront( matrix M, function F(x,y,d) )

returns matrix M such that

M[i,j] = F( M[i-1,j], M[i,j-1], M[i-1,j-1] )

[Diagram: the boundary cells M[0,0] … M[4,0] and M[0,1] … M[0,4] are given; each remaining cell is computed by F from its neighbors x and y and its diagonal d, so the computation sweeps across the matrix as a wavefront.]
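A sequential sketch of the Wavefront recurrence, plus a helper showing which cells could run at the same time. Both functions are illustrations written for this transcript, and the sample F is arbitrary.

```python
def wavefront(M, F):
    """Fill M in place; row 0 and column 0 must already hold boundary values."""
    for i in range(1, len(M)):
        for j in range(1, len(M[i])):
            # x = M[i-1][j], y = M[i][j-1], d = M[i-1][j-1]
            M[i][j] = F(M[i - 1][j], M[i][j - 1], M[i - 1][j - 1])
    return M

def diagonals(n):
    """Group interior cells of an n x n matrix by anti-diagonal.

    Cells within one group do not depend on each other, so each
    group could be computed in parallel once the previous groups finish.
    """
    return [
        [(i, d - i) for i in range(1, n) if 1 <= d - i < n]
        for d in range(2, 2 * n - 1)
    ]

# Boundary of ones, interior unknown (None) until computed.
M = [[1, 1, 1],
     [1, None, None],
     [1, None, None]]
wavefront(M, lambda x, y, d: x + y + d)
print(M)
print(diagonals(3))
```

Unlike All-Pairs, the available parallelism here grows and then shrinks as the wave crosses the matrix, which is one reason it benefits from its own engine.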


What if your application doesn’t fit a regular pattern?


Another Old Idea: Make

part1 part2 part3: input.data split.py
	./split.py input.data

out1: part1 mysim.exe
	./mysim.exe part1 >out1

out2: part2 mysim.exe
	./mysim.exe part2 >out2

out3: part3 mysim.exe
	./mysim.exe part3 >out3

result: out1 out2 out3 join.py
	./join.py out1 out2 out3 > result
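Because every rule names its inputs and outputs, a tool can work out which commands may run concurrently. The sketch below does this for the rules above using Python's standard `graphlib` module (Python 3.9+). It illustrates the idea only and is not a description of how Makeflow itself schedules work.

```python
from graphlib import TopologicalSorter

# Targets mapped to the generated files they depend on, transcribed from
# the rules above. Files that already exist (input.data, split.py,
# mysim.exe, join.py) are omitted. Note that part1, part2 and part3 are
# all produced by one run of split.py.
deps = {
    "part1": set(), "part2": set(), "part3": set(),
    "out1": {"part1"}, "out2": {"part2"}, "out3": {"part3"},
    "result": {"out1", "out2", "out3"},
}

def parallel_stages(deps):
    """Return lists of targets; targets in the same list can be built concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages

stages = parallel_stages(deps)
print(stages)
```

The three simulation runs land in the same stage, so they are the part of this workflow that can be sent to three different machines.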

Makeflow: First Try

[Diagram: Makeflow reads a Makefile plus local files and programs, and submits jobs directly to an Xgrid cluster, a campus Condor pool, a public cloud provider, or a private SGE cluster.]


Problems with the First Try

Software Engineering: too many batch systems with too many slight differences.

Performance: Starting a new job or a VM takes 30-60 seconds.

Stability: An accident could result in you purchasing thousands of cores!

Solution: Overlay our own work management system into multiple clouds.
– Technique used widely in the grid world.

Makeflow: Second Try

[Diagram: Makeflow reads a Makefile plus local files and programs, and submits tasks to hundreds of workers (W) in a personal cloud. Workers are started on a private cluster via ssh, on a campus Condor pool via condor_submit_workers, on a private SGE cluster via sge_submit_workers, and on a public cloud provider.]

Makeflow: Second Try

[Diagram: the makeflow master queues tasks in a work queue and collects finished tasks from 100s of workers dispatched to the cloud.]

bfile: afile prog
	prog afile >bfile

Detail of a single worker:

put prog
put afile
exec prog afile > bfile
get bfile

Two optimizations: Cache inputs and output. Dispatch tasks to nodes with data.
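The two optimizations can be illustrated with a small in-memory simulation. The `Worker` class and the `pick_worker` and `dispatch` functions are hypothetical names for this sketch, not the Work Queue API, and the policy ignores load balancing to keep the example short.

```python
class Worker:
    def __init__(self, name):
        self.name = name
        self.cache = set()   # files already stored on this worker
        self.transfers = 0   # number of files the master had to send

    def run(self, task):
        for f in task["inputs"]:
            if f not in self.cache:     # "put" only what is missing
                self.cache.add(f)
                self.transfers += 1
        self.cache.add(task["output"])  # keep the output cached too

def pick_worker(task, workers):
    """Prefer the worker that already holds the most of the task's inputs."""
    return max(workers, key=lambda w: len(w.cache & set(task["inputs"])))

def dispatch(tasks, workers):
    """Run tasks in order and record which worker produced each output."""
    placement = {}
    for task in tasks:
        worker = pick_worker(task, workers)
        worker.run(task)
        placement[task["output"]] = worker.name
    return placement

workers = [Worker("w1"), Worker("w2")]
tasks = [
    {"inputs": ["prog", "afile"], "output": "bfile"},
    {"inputs": ["prog", "bfile"], "output": "cfile"},
]
placement = dispatch(tasks, workers)
print(placement, [w.transfers for w in workers])
```

The second task needs no transfers at all because it is sent to the worker that already caches both prog and bfile.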


Makeflow Applications

The best part is that I don’t have to learn anything about cloud computing!

- Anonymous Student

I would like to posit that computing’s central challenge, how not to make a mess of it, has not yet been met.

- Edsger Dijkstra


Too much concurrency!

Vendors of multi-core products are pushing everyone to make their code multi-core.

Hence, many applications now attempt to use all available cores at their disposal, without regard to RAM, I/O, Disk…

Two apps running on the same machine almost always conflict in bad ways.

Opinion: Keep the apps simple and sequential, and let the framework handle concurrency.

The 0, 1 … N attitude.

Code designed for a single machine doesn’t worry about resources, because there isn’t any alternative. (Virtual Memory)

But in the cloud, you usually scale until some resource is exhausted!

App developers are rarely trained to deal with this problem. (Can malloc or close fail?)

Opinion: All software needs to do a better job of advertising and limiting resources. (Frameworks could exploit this.)


To succeed, get used to failure.

Any system of 1000s of parts has failures, and many of them are pernicious:
– Black holes, white holes, deadlock…

To discover failures, you need to have a reasonably detailed model of success:
– Output format, run time, resources consumed.

Need to train coders in classic engineering:
– Damping, hysteresis, control systems.

Keep failures at a sufficient trickle, so that everyone must confront them.
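A "model of success" can be as simple as a few expectations checked against every finished task. The field names and thresholds below are hypothetical, chosen only to show the idea with the three properties named above: output format, run time, and resources consumed.

```python
def check_result(result, expected):
    """Return a list of reasons the task looks like a failure (empty if none)."""
    problems = []
    if not result["output"].startswith(expected["output_prefix"]):
        problems.append("unexpected output format")
    shortest, longest = expected["runtime_range_s"]
    if not shortest <= result["runtime_s"] <= longest:
        problems.append("runtime outside expected range")
    if result["peak_memory_mb"] > expected["max_memory_mb"]:
        problems.append("memory limit exceeded")
    return problems

# Hypothetical expectations for one kind of task.
expected = {
    "output_prefix": "MATCH",
    "runtime_range_s": (1.0, 600.0),
    "max_memory_mb": 2048,
}

good = {"output": "MATCH 0.93", "runtime_s": 42.0, "peak_memory_mb": 512}
# A machine that accepts tasks and "finishes" them instantly with no output.
black_hole = {"output": "", "runtime_s": 0.01, "peak_memory_mb": 1}

print(check_result(good, expected))
print(check_result(black_hole, expected))
```

Without the runtime check, a machine that fails every task quickly would look like the fastest worker in the pool and attract even more work.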

Cloud Federation?

10 years ago, we had (still have) multiple independent computing grids, each centered on a particular institution.

Grid federation was widely desired, but never widely achieved, for many technical and social reasons.

But, users ended up developing frameworks to harness multiple grids simultaneously, which was nearly as good.

Same story with clouds?

Cloud Reliability?

High reliability needed for something like a widely shared read-write DB in the cloud.

But in many cases, a VM is one piece in a large system with higher level redundancy.

End to end principle: The topmost layer is ultimately responsible for reliability.

Opinion: There is a significant need for resources of modest reliability at low cost.


Many hard problems remain…

The cloud forces end users to think about their operational budget at a fine scale. Can software frameworks help with this?

Data locality is an unavoidable reality that cannot be hidden. (We tried.) Can we expose it to users in ways that help, rather than confuse them?


A Team Effort

Grad Students
– Hoang Bui
– Li Yu
– Peter Bui
– Michael Albrecht
– Peter Sempolinski
– Dinesh Rajan

Faculty:
– Patrick Flynn
– Scott Emrich
– Jesus Izaguirre
– Nitesh Chawla
– Kenneth Judd

NSF Grants CCF-0621434, CNS-0643229, and CNS 08-554087.

Undergrads
– Rachel Witty
– Thomas Potthast
– Brenden Kokosza
– Zach Musgrave
– Anthony Canino


For More Information

The Cooperative Computing Lab
– http://www.nd.edu/~dthain

Prof. Douglas Thain
– [email protected]