Herodotos Herodotou, Fei Dong, Shivnath Babu - Duke University

Transcript (26 pages)

Page 1: Title Slide

Herodotos Herodotou, Fei Dong, Shivnath Babu
Duke University

Page 2: Analysis in the Big Data Era

Data Parallel Systems
- Scalability
- Fault-tolerance
- Reliability
- Flexibility

Cloud Computing
- Elasticity
- Reduced costs
- Disaster relief
- Highly automated

Page 3: Data Parallel Systems in the Cloud

Power to the (non-expert) users
- Can provision a cluster of choice in minutes
- Can avoid the middleman (system administrator)
- Pay only for the resources used

Burden to the (non-expert & expert) users
- Make decisions at various levels (resources → job)
- Meet workload requirements

Page 4: Executing MR Workloads on the Cloud

Goal: Execute MapReduce workload in < 2 hours and < $10

Decisions:
- Cluster resources: number of nodes, node type (~machine specs)
- Hadoop-level settings: number of map/reduce slots, memory settings, ...
- Job-level settings: number of map/reduce tasks, use of compression, ...

[Diagram: cluster nodes running TaskTrackers (TT) and DataNodes (DN), executing map and reduce tasks over input and output data]

Page 5: Multi-objective Cluster Provisioning

[Charts: Running Time (min) and Cost ($) of the same workload across Amazon EC2 instance types (m1.small, m1.large, m1.xlarge, c1.medium, c1.xlarge), i.e., different machine specs]

Page 6: Moving from Development to Production

Development Cluster
- Smaller, less powerful than the production cluster
- Used to develop & test MapReduce jobs

Moving to the Production Cluster raises:
- How will the jobs behave?
- What settings to use for running the jobs?

Page 7: Cluster Sizing Problem

Determine cluster resources & job-level configuration parameters to meet requirements on execution time & cost for a given workload.
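
One way to state the problem formally (this notation is mine, not from the slides): given a workload W, search over cluster resources r (number and type of nodes) and job-level configurations c, optimizing one objective subject to a constraint on the other, e.g.

    \min_{r,\,c} \; \mathrm{cost}(W, r, c) \quad \text{subject to} \quad \mathrm{time}(W, r, c) \le T_{\max}

or, symmetrically, minimize time(W, r, c) subject to cost(W, r, c) \le C_{\max}. The What-if Engine described later supplies the estimates of time and cost used during this search.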

Page 8: Solution: Elastisizer

Elastisizer, part of the Starfish system (http://www.cs.duke.edu/starfish)

Components:
- Cluster Sizing Query Interface
- Resource Optimizer
- Configuration Optimizer
- What-if Engine
- Profiler

Page 9: Cluster Sizing Query Interface

1. Workload
   - MapReduce jobs
   - Input data properties
   (abstracted using job profiles)

2. Cluster Resources
   - Number of nodes
   - Type of nodes

3. Job Configurations
   - Number of map/reduce tasks
   - Task-level memory allocation

   (2 & 3: either specify or ask for recommendations)

4. Performance Requirements
   - Execution time
   - Monetary cost
   (optimize for one, possibly subject to a constraint on the other)
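
To make the four parts of a query concrete, here is a hypothetical sketch of how such a query could be represented; the class and field names are mine, not the actual Elastisizer interface:

    # Hypothetical sketch of a cluster sizing query; names are illustrative,
    # not the actual Elastisizer API. A field left as None means
    # "ask for a recommendation".
    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class ClusterSpec:
        num_nodes: Optional[int] = None       # e.g., 30
        node_type: Optional[str] = None       # e.g., "m1.xlarge"

    @dataclass
    class JobConfig:
        num_map_tasks: Optional[int] = None
        num_reduce_tasks: Optional[int] = None
        task_memory_mb: Optional[int] = None

    @dataclass
    class ClusterSizingQuery:
        workload: List[str]                   # job profiles abstracting the jobs + input data
        cluster: ClusterSpec = field(default_factory=ClusterSpec)
        job_config: JobConfig = field(default_factory=JobConfig)
        objective: str = "cost"               # optimize "cost" or "time" ...
        max_time_min: Optional[float] = None  # ... subject to a constraint on the other
        max_cost_usd: Optional[float] = None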

Page 10: Cluster Sizing Example Queries

What-if question: How will the jobs behave?
- Specify the workload, the production cluster's resources & particular configuration settings. Ask for performance.

Optimization request: What settings to use?
- Specify the workload and the production cluster's resources. Ask for configuration settings.

[Diagram: both queries are posed from the development cluster and refer to the production cluster]
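
Continuing the hypothetical sketch from Page 9, the two example queries might be expressed as:

    # Uses the hypothetical ClusterSizingQuery/ClusterSpec/JobConfig classes
    # sketched after Page 9 (illustrative only).

    # What-if question: fully specified resources and settings; ask for performance.
    whatif_query = ClusterSizingQuery(
        workload=["profile_TS", "profile_WC"],
        cluster=ClusterSpec(num_nodes=30, node_type="m1.xlarge"),
        job_config=JobConfig(num_map_tasks=240, num_reduce_tasks=60, task_memory_mb=1024),
    )

    # Optimization request: resources fixed, settings left as None ("recommend").
    optimization_query = ClusterSizingQuery(
        workload=["profile_TS", "profile_WC"],
        cluster=ClusterSpec(num_nodes=30, node_type="m1.xlarge"),
        objective="time",
    )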

Page 11: What-if Engine

[Diagram]
Inputs: a what-if question; job profiles; input data properties, cluster resources, and configuration settings (possibly hypothetical)
Output: properties of the hypothetical job

Page 12: Job Profile

Concise representation of program execution
- Records information at the level of "task phases"
- Generated by the Profiler through measurement, or by the What-if Engine through estimation

[Diagram: phases of a map task: Read (input split from DFS), Map, Collect (serialize, partition into the memory buffer), Spill (sort, [combine], [compress]), Merge]

Page 13: Job Profile Fields

Dataflow: amount of data flowing through task phases
- Map output bytes
- Number of spills
- Number of records in buffer per spill

Costs: execution times at the level of task phases
- Read phase time in the map task
- Map phase time in the map task
- Spill phase time in the map task

Dataflow Statistics: statistical information about dataflow
- Width of input key-value pairs
- Map selectivity in terms of records
- Map output compression ratio

Cost Statistics: statistical information about resource costs
- I/O cost for reading from local disk, per byte
- CPU cost for executing the Mapper, per record
- CPU cost for uncompressing the input, per byte
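
As a concrete (and purely illustrative) rendering of these four categories, a job profile could be represented roughly as follows; the field names follow the slide's examples, not the exact Starfish schema:

    # Hypothetical sketch of a job profile; names follow the slide's examples,
    # not the exact Starfish schema.
    from dataclasses import dataclass

    @dataclass
    class JobProfile:
        # Dataflow: amount of data flowing through task phases
        map_output_bytes: int
        num_spills: int
        records_per_spill: int
        # Costs: execution times (ms) at the level of task phases
        read_phase_ms: float
        map_phase_ms: float
        spill_phase_ms: float
        # Dataflow Statistics: statistical information about dataflow
        input_pair_width_bytes: float
        map_selectivity_records: float       # output records / input records
        map_output_compression_ratio: float
        # Cost Statistics: statistical information about resource costs
        local_io_cost_per_byte: float
        mapper_cpu_cost_per_record: float
        uncompress_cpu_cost_per_byte: float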

Page 14: What-if Engine

[Diagram]
Inputs: a what-if question; job profiles; input data properties, cluster resources, and configuration settings (possibly hypothetical)
Inside the engine: Virtual Profile Estimation → virtual job profiles → Task Scheduler Simulator
Output: properties of the hypothetical job
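
A minimal sketch of that flow, with placeholder stand-ins of my own for the two stages (this is not Starfish code; the real engine uses the estimation models of Page 15 and a detailed scheduler simulator):

    def estimate_virtual_profile(job_profile, input_data, cluster, config):
        # Placeholder: the real step applies the cardinality, relative black-box,
        # and white-box models described on Page 15.
        return job_profile

    def simulate_task_scheduling(virtual_profile, cluster, config):
        # Placeholder: the real step simulates waves of map/reduce task execution
        # over the cluster's task slots using the virtual profile's per-phase timings.
        slots = cluster["num_nodes"] * cluster["map_slots_per_node"]
        waves = max(1, -(-config["num_map_tasks"] // slots))   # ceiling division
        return waves * virtual_profile["map_task_minutes"]

    def whatif(job_profile, input_data, cluster, config):
        """Predict running time (min) and monetary cost ($) for a hypothetical setting."""
        vp = estimate_virtual_profile(job_profile, input_data, cluster, config)
        time_min = simulate_task_scheduling(vp, cluster, config)
        cost_usd = cluster["num_nodes"] * cluster["price_per_hour"] * time_min / 60.0
        return time_min, cost_usd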

Page 15: Virtual Profile Estimation

Given the profile for job j, estimate the profile for job j' based on new input data, cluster resources, & configuration.

[Diagram: Profile for j + (Input Data, Resources, Configuration) → (Virtual) Profile for j']
- Dataflow Statistics of j + input data → Cardinality Models → Dataflow Statistics of j'
- Cost Statistics of j + resources → Relative Black-box Models → Cost Statistics of j'
- Dataflow Statistics of j' + configuration → White-box Models → Dataflow of j'
- Dataflow + Cost Statistics of j' → White-box Models → Costs of j'
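
Refining the placeholder from the Page 14 sketch, the four model families compose roughly as follows (my paraphrase of the diagram; the model callables are passed in rather than implemented):

    # Sketch of composing the estimation models; 'models' is a dict of callables:
    # 'cardinality', 'relative', 'wb_dataflow', 'wb_costs'. Illustrative only.

    def estimate_virtual_profile(profile_j, new_input, new_resources, new_config, models):
        # 1. Cardinality models: dataflow statistics for j' from j's statistics + new input data.
        dataflow_stats = models["cardinality"](profile_j["dataflow_stats"], new_input)
        # 2. Relative black-box models: cost statistics for j' on the new cluster resources.
        cost_stats = models["relative"](profile_j["cost_stats"], new_resources)
        # 3. White-box models: dataflow fields for j' from the statistics + configuration.
        dataflow = models["wb_dataflow"](dataflow_stats, new_config)
        # 4. White-box models: per-phase costs for j' from the dataflow + cost statistics.
        costs = models["wb_costs"](dataflow, cost_stats)
        return {"dataflow_stats": dataflow_stats, "cost_stats": cost_stats,
                "dataflow": dataflow, "costs": costs}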

Page 16: Relative Models for Cost Statistics (CS)

[Diagram: CS_dev for job j, measured on dev cluster resources r1 → relative model → CS_prod for job j on prod cluster resources r2]

Example: on the dev cluster, "Local I/O cost / byte" = 17 and "Sort CPU cost / key" = 6; the corresponding values on the prod cluster are unknown and must be predicted.

Training jobs j1, j2, ..., jn are run on both clusters, and their measured cost statistics are fed to a supervised learning algorithm (see the sketch below).
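
A minimal sketch of that training step. The numbers are toy values, and the learner (scikit-learn's RandomForestRegressor, used here because it handles multi-output regression) is a stand-in for whatever supervised algorithm the system actually uses:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    # Each row holds the cost statistics of one training job (e.g., local I/O cost
    # per byte, sort CPU cost per key, ...), measured on the dev and prod clusters.
    # Toy numbers for illustration only.
    cs_dev  = np.array([[17.0, 6.0], [12.0, 5.0], [20.0, 8.0], [15.0, 7.0]])
    cs_prod = np.array([[ 9.0, 3.0], [ 7.0, 2.5], [11.0, 4.0], [ 8.0, 3.5]])

    relative_model = RandomForestRegressor(n_estimators=50, random_state=0)
    relative_model.fit(cs_dev, cs_prod)        # learn the dev -> prod mapping

    # Predict prod-cluster cost statistics for a job profiled only on the dev cluster.
    print(relative_model.predict(np.array([[17.0, 6.0]])))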

Page 17: Generating Training Samples

Requirements
- Good coverage of the prediction space
- Generate samples efficiently

Idea 1: Try to cover the space based on the workload
- Apriori Training Benchmark: run a sample of the jobs in the workload
- Assumes knowledge of the full workload
- Running time grows with workload & data
- New jobs are not represented in the training data

Page 18: Generating Training Samples

Requirements
- Good coverage of the prediction space
- Generate samples efficiently

Idea 2: Focus on the spectrum of resource usage
- Fixed Training Benchmark: run representative MapReduce jobs with different resource usage behavior
- Independent of the actual workload & data

Page 19: Generating Training Samples

Requirements
- Good coverage of the prediction space
- Generate samples efficiently

Idea 3: Focus on the spectrum of resource usage + exploit the job profile abstraction
- Custom Training Benchmark: small, synthetic workload (generation sketched below)
- Tasks within jobs behave differently in terms of CPU, I/O, memory, and network
- Diverse training samples
- No knowledge of the actual workload/input data needed
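
A sketch of how such a custom benchmark could be generated: a handful of synthetic job specifications whose tasks stress CPU, I/O, memory, and the network to different degrees. The parameterization below is illustrative, not the actual benchmark generator:

    import itertools, random

    def custom_training_benchmark(num_jobs=12, seed=42):
        """Return a small, diverse set of synthetic job specs (illustrative only)."""
        rng = random.Random(seed)
        levels = [0.2, 0.5, 0.9]                               # low / medium / high intensity
        space = list(itertools.product(levels, repeat=4))      # (cpu, io, mem, net) combinations
        rng.shuffle(space)
        jobs = []
        for cpu, io, mem, net in space[:num_jobs]:
            jobs.append({
                "cpu_intensity": cpu,       # e.g., extra computation per record
                "io_intensity": io,         # e.g., map output bytes per input byte
                "memory_intensity": mem,    # e.g., buffer usage per task
                "network_intensity": net,   # e.g., shuffle volume per map output byte
                "compression": rng.choice([True, False]),
            })
        return jobs

    for spec in custom_training_benchmark(num_jobs=4):
        print(spec)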

Page 20: Experimental Evaluation

Different scenarios: Cluster × Workload × Data

EC2 Node Type | CPU (EC2 units) | Memory  | I/O Perf. | Cost/hour | #Maps/node | #Reds/node | Max Mem/task
m1.small      | 1 (1 x 1)       | 1.7 GB  | moderate  | $0.085    | 2          | 1          | 300 MB
m1.large      | 4 (2 x 2)       | 7.5 GB  | high      | $0.34     | 3          | 2          | 1024 MB
m1.xlarge     | 8 (4 x 2)       | 15 GB   | high      | $0.68     | 4          | 4          | 1536 MB
c1.medium     | 5 (2 x 2.5)     | 1.7 GB  | moderate  | $0.17     | 2          | 2          | 300 MB
c1.xlarge     | 20 (8 x 2.5)    | 7 GB    | high      | $0.68     | 8          | 6          | 400 MB
cc1.4xlarge   | 33.5 (8)        | 23 GB   | very high | $1.60     | 8          | 6          | 1536 MB
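
Given this table, the monetary cost of a run follows directly from the node count, the hourly price, and the running time. A minimal sketch, assuming the per-instance-hour billing EC2 used at the time (hours rounded up):

    import math

    # Hourly prices taken from the table above.
    PRICE_PER_HOUR = {
        "m1.small": 0.085, "m1.large": 0.34, "m1.xlarge": 0.68,
        "c1.medium": 0.17, "c1.xlarge": 0.68, "cc1.4xlarge": 1.60,
    }

    def run_cost(node_type, num_nodes, running_time_min):
        hours_billed = math.ceil(running_time_min / 60.0)   # round up to full instance-hours
        return num_nodes * PRICE_PER_HOUR[node_type] * hours_billed

    # Example: 30 m1.xlarge nodes running for 100 minutes.
    print(run_cost("m1.xlarge", 30, 100))   # 30 * 0.68 * 2 = 40.8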

Page 21: Experimental Evaluation

Different scenarios: Cluster × Workload × Data

Abbr. | MapReduce Program                       | Domain                | Dataset
CO    | Word Co-occurrence                      | Natural Lang. Proc.   | Wikipedia (10GB – 22GB)
WC    | WordCount                               | Text Analytics        | Wikipedia (30GB – 180GB)
TS    | TeraSort                                | Business Analytics    | TeraGen (30GB – 1TB)
LG    | LinkGraph                               | Graph Processing      | Wikipedia (compressed ~6x)
JO    | Join                                    | Business Analytics    | TPC-H (30GB – 1TB)
TF    | Term Frequency – Inverse Document Freq. | Information Retrieval | Wikipedia (30GB – 180GB)

Page 22: Multi-objective Cluster Provisioning

[Charts: Actual vs. Predicted Running Time (min) and Cost ($) for each EC2 instance type of the target cluster (m1.small, m1.large, m1.xlarge, c1.medium, c1.xlarge)]

Workload: all jobs, 22GB – 180GB per job
Instance type for source cluster: m1.large

Page 23: Moving from Development to Production

Profile a job on the development cluster, optimize for the production cluster
- Development cluster: 10 m1.large nodes (40 EC2 units), ~60 GB
- Production cluster: 30 m1.xlarge nodes (240 EC2 units), ~180 GB

[Chart: Running Time (min) per MapReduce job (TS, WC, LG, JO, TF, CO) under three settings: Rules-of-Thumb, Profile from Development, Profile from Production]

Page 24: Tuning Cluster Size

Profile a job on a 10-node cluster, predict for other sizes
- Profile cluster: 10 m1.large nodes (40 EC2 units), ~60 GB
- Prediction clusters: 5–30 m1.large nodes (40 EC2 units), ~60 GB

[Charts: Actual vs. Predicted Running Time (min) vs. Number of Nodes (5, 10, 20, 30) for LinkGraph and Word Co-occurrence]
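
Reusing the hypothetical whatif() sketch from Page 14, tuning cluster size amounts to profiling once and then sweeping the node count with what-if questions instead of re-running the job at every size:

    # Depends on the whatif() sketch after Page 14; all values below are toy numbers.
    profile = {"map_task_minutes": 2.0}      # would come from the Profiler on the 10-node cluster
    config = {"num_map_tasks": 360}
    for n in (5, 10, 20, 30):
        cluster = {"num_nodes": n, "map_slots_per_node": 3, "price_per_hour": 0.34}  # m1.large
        t_min, cost = whatif(profile, None, cluster, config)
        print(f"{n:2d} nodes: predicted {t_min:5.1f} min, ${cost:.2f}")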

Page 25: Training Benchmarks Evaluation

[Charts: (a) Training Time (min) per instance type (m1.small, m1.large, m1.xlarge, c1.medium, c1.xlarge) for the Apriori, Fixed, and Custom benchmarks; (b) Relative Prediction Error over the Apriori benchmark per instance type (m1.small, m1.xlarge, c1.medium, c1.xlarge) for the Apriori, Fixed, and Custom benchmarks]

Page 26: More info: www.cs.duke.edu/starfish

Starfish components:
- Job-level MapReduce configuration
- Workflow optimization
- Workload management
- Data layout tuning
- Cluster sizing

[Diagram: example workflow of jobs J1, J2, J3, J4]