Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

32
Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University

Transcript of Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

Page 1: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

Starfish: A Self-tuning System for Big Data Analytics

Herodotos Herodotou,

Harold Lim, Fei Dong, Shivnath Babu

Duke University

Page 2: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

2

Analysis in the Big Data Era

9/26/2011

Massive Data

DataAnalysi

s

Insight

Key to Success = Timely and Cost-Effective Analysis

Starfish

Page 3: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

3

Hadoop MapReduce EcosystemPopular solution to Big Data Analytics

9/26/2011

MapReduce Execution Engine

Distributed File System

Hadoop

Java / C++ / R / Python

OozieHivePigElastic

MapReduceJaql

HBase

Starfish

Page 4: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

4

Practitioners of Big Data AnalyticsWho are the users?

Data analysts, statisticians, computational scientists…Researchers, developers, testers…You!

Who performs setup and tuning?The users!Usually lack expertise to tune the system

9/26/2011 Starfish

Page 5: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

5

Tuning ChallengesHeavy use of programming languages for

MapReduce programs (e.g., Java/python)

Data loaded/accessed as opaque files

Large space of tuning choices

Elasticity is wonderful, but hard to achieve (Hadoop has many useful mechanisms, but policies are lacking)

Terabyte-scale data cycles

9/26/2011 Starfish

Page 6: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

6

Our goal: Provide good performance automatically

Starfish: Self-tuning System

9/26/2011

MapReduce Execution Engine

Distributed File System

Hadoop

Java / C++ / R / Python

OozieHivePigElastic

MapReduceJaql

HBase

Starfish

Analytics System

Starfish

Page 7: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

7

What are the Tuning Problems?

9/26/2011

Job-level MapReduce

configuration

Workload management

Datalayout tuning

Cluster sizing

Workflow optimization

J1 J2

J3

J4

Starfish

Page 8: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

8

Starfish’s Core Approach to Tuning

9/26/2011

1) if Δ(conf. parameters) then what …?

2) if Δ(data properties) then what …?

3) if Δ(cluster properties) then what …?

Profiler

Collects concisesummaries of

execution

What-if Engine

Estimates impact of hypothetical

changes on execution

Optimizers

Search through space of tuning choices

Job

WorkflowWorkload

Data layout

Cluster

Starfish

Page 9: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

Starfish Architecture

9/26/2011 9

Profiler What-if Engine

Workflow Optimizer

Workload Optimizer Elastisizer

Job Optimizer

Data ManagerMetadata

Mgr.Intermediate

Data Mgr.Data Layout & Storage Mgr.

Starfish

Page 10: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

10

MapReduce Job Execution

9/26/2011

split 0 map out 0reduce

Two Map Waves One Reduce Wave

split 2 map

split 1 map split 3 map Out 1reduce

job j = < program p, data d, resources r, configuration c >

Starfish

Page 11: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

11

What Controls MR Job Execution?

Space of configuration choices:Number of map tasksNumber of reduce tasksPartitioning of map outputs to reduce tasksMemory allocation to task-level buffersMultiphase external sorting in the tasksWhether output data from tasks should be compressedWhether combine function should be used

9/26/2011

job j = < program p, data d, resources r, configuration c >

Starfish

Page 12: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

12

Effect of Configuration Settings

Use defaults or set manually (rules-of-thumb)Rules-of-thumb may not suffice

9/26/2011

Two-dimensional projection of a multi-dimensional surface(Word Co-occurrence MapReduce Program)

Rules-of-thumb settings

Starfish

Page 13: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

13

MapReduce Job Tuning in a NutshellGoal:

Challenges: p is an arbitrary MapReduce program; c is high-dimensional; …

9/26/2011

),,,(minarg crdpFcSc

opt

),,,( crdpFperf

Profiler

What-if Engine

Optimizer

Runs p to collect a job profile (concise execution summary) of <p,d1,r1,c1>

Given profile of <p,d1,r1,c1>, estimates virtual profile for <p,d2,r2,c2>

Enumerates and searches through the optimization space S efficiently

Starfish

Page 14: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

14

Job ProfileConcise representation of program execution as a jobRecords information at the level of “task phases”Generated by Profiler through measurement or by the

What-if Engine through estimation

9/26/2011

Memory Buffer

Merge

Sort,[Combine],[Compress]

Serialize,Partitionmap

Merge

split

DFS

SpillCollectMapRead

Starfish

Page 15: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

15

Job Profile FieldsDataflow: amount of data flowing through task phasesMap output bytes

Number of spills

Number of records in buffer per spill

9/26/2011

Costs: execution times at the level of task phasesRead phase time in the map task

Map phase time in the map task

Spill phase time in the map task

Dataflow Statistics: statistical information about dataflowWidth of input key-value pairs

Map selectivity in terms of records

Map output compression ratio

Cost Statistics: statistical information about resource costsI/O cost for reading from local disk per byte

CPU cost for executing the Mapper per record

CPU cost for uncompressing the input per byte

Starfish

Page 16: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

16

Generating Profiles by MeasurementGoals

Have zero overhead when profiling is turned offRequire no modifications to HadoopSupport unmodified MapReduce programs written in

Java or Hadoop Streaming/Pipes (Python/Ruby/C++)

Approach: Dynamic (on-demand) instrumentationEvent-condition-action rules are specified (in Java)Leads to run-time instrumentation of Hadoop internalsMonitors task phases of MapReduce job executionWe currently use Btrace (Hadoop internals are in Java)

9/26/2011 Starfish

Page 17: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

17

Generating Profiles by Measurement

9/26/2011

split 0 map out 0reduce

split 1 map

raw data

raw data

raw data

map profile

reduce profile

job profile

Use of Sampling• Profile fewer tasks• Execute fewer tasks

JVM = Java Virtual Machine, ECA = Event-Condition-Action

JVM JVM

JVM

Enable Profiling

ECA rules

Starfish

Page 18: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

18

What-if Engine

Job Oracle

Virtual Job Profile for <p, d2, r2, c2>

What-if Engine

9/26/2011

Task Scheduler Simulator

JobProfile

<p, d1, r1, c1>

Properties of Hypothetical job

Input DataProperties

<d2>

ClusterResources

<r2>

ConfigurationSettings

<c2>

Possibly Hypothetical

Starfish

Page 19: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

19

Virtual Profile Estimation

9/26/2011

Given profile for job j = <p, d1, r1, c1> estimate profile for job j' = <p, d2, r2, c2>

(Virtual) Profile for j'

DataflowStatistics

Dataflow

CostStatistics

Costs

Profile for jInput

Data d2

Confi-guration

c2

Resourcesr2

Costs

White-box Models

CostStatisticsRelative

Black-boxModels

Dataflow

White-box Models

DataflowStatistics

CardinalityModels

Starfish

Page 20: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

20

Job Optimizer

9/26/2011

Best Configuration Settings <copt> for <p, d2, r2>

Subspace Enumeration

Recursive Random Search

Just-in-Time Optimizer

JobProfile

<p, d1, r1, c1>

Input DataProperties

<d2>

ClusterResources

<r2>

What-ifcalls

Starfish

Page 21: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

21

Workflow Optimization Space

9/26/2011

Job-level Configuration

Dataset-level Configuration

Physical

Optimization Space

Logical

Join Selection

Partition Function Selection

Vertical Packing

Inter-job Inter-job

Starfish

Page 22: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

22

Optimizations on TF-IDF Workflow

9/26/2011

LogicalOptimization

M1R1

…D0 <{D},{W}>

J1

D1

D2

D4

M2R2

…<{D, W},{f}>

J2

…<{D},{W, f, c}>

J3, J4

…<{W},{D, t}>

Partition:{D}Sort: {D,W}

M1R1M2R2

…D0 <{D},{W}>

J1, J2

D2

D4

M3R3M4

…<{D},{W, f, c}>

J3, J4

…<{W},{D, t}>

PhysicalOptimization

Reducers= 50Compress = offMemory = 400…

Reducers= 20Compress = onMemory = 300…

LegendD = docname f = frequencyW = word c = countt = TF-IDF

M3R3M4

Starfish

Page 23: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

23

New ChallengesWhat-if challenges:

Support concurrent job execution

Estimate intermediate data properties

Optimization challengesInteractions across jobsExtended optimization spaceFind good configuration

settings for individual jobs

9/26/2011

J1 J2

J3

J4

Workflow

Starfish

Page 24: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

24

Cluster Sizing ProblemUse-cases for cluster sizing

Tuning the cluster size for elastic workloadsWorkload transitioning from development cluster to

production clusterMulti-objective cluster provisioning

GoalDetermine cluster resources & job-level configuration

parameters to meet workload requirements

9/26/2011 Starfish

Page 25: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

25

Multi-objective Cluster Provisioning

9/26/2011

m1.small m1.large m1.xlarge c1.medium c1.xlarge0

200400600800

1,0001,200

Ru

nn

ing

Tim

e (m

in)

m1.small m1.large m1.xlarge c1.medium c1.xlarge0.002.004.006.008.00

10.00

EC2 Instance Type

Cos

t ($

)Cloud enables users to provision clusters in minutes

Starfish

Page 26: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

Experimental Evaluation

9/26/2011 26

Starfish (versions 0.1, 0.2) to manage Hadoop on EC2Different scenarios: Cluster × Workload × Data

EC2 Node Type

CPU: EC2 units

Mem I/O Perf. Cost /hour

#Maps /node

#Reds/node

MaxMem /task

m1.small 1 (1 x 1) 1.7 GB moderate $0.085 2 1 300 MB

m1.large 4 (2 x 2) 7.5 GB high $0.34 3 2 1024 MB

m1.xlarge 8 (4 x 2) 15 GB high $0.68 4 4 1536 MB

c1.medium 5 (2 x 2.5) 1.7 GB moderate $0.17 2 2 300 MB

c1.xlarge 20 (8 x 2.5) 7 GB high $0.68 8 6 400 MB

cc1.4xlarge 33.5 (8) 23 GB very high $1.60 8 6 1536 MB

Starfish

Page 27: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

Experimental Evaluation

9/26/2011 27

Starfish (versions 0.1, 0.2) to manage Hadoop on EC2Different scenarios: Cluster × Workload × Data

Abbr. MapReduce Program Domain Dataset

CO Word Co-occurrence Natural Lang Proc. Wikipedia (10GB – 22GB)

WC WordCount Text Analytics Wikipedia (30GB – 1TB)

TS TeraSort Business Analytics TeraGen (30GB – 1TB)

LG LinkGraph Graph Processing Wikipedia (compressed ~6x)

JO Join Business Analytics TPC-H (30GB – 1TB)

TF Term Freq. - Inverse Document Freq.

Information Retrieval Wikipedia (30GB – 1TB)

Starfish

Page 28: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

28

Job Optimizer Evaluation

9/26/2011

Hadoop cluster: 30 nodes, m1.xlargeData sizes: 60-180 GB

TS WC LG JO TF CO0

10

20

30

40

50

60

Default Set-tings

Rule-based Optimizer

Cost-based Optimizer

MapReduce Programs

Sp

eed

up

ove

r D

efau

lt

Starfish

Page 29: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

29

Estimates from the What-if Engine

9/26/2011

Hadoop cluster: 16 nodes, c1.mediumMapReduce Program: Word Co-occurrenceData set: 10 GB Wikipedia

True surface Estimated surface

Starfish

Page 30: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

30

Profiling Overhead Vs. Benefit

9/26/2011

1 5 10 20 40 60 80 1000

5

10

15

20

25

30

35

Percent of Tasks Profiled

Per

cent

Ove

rhea

d ov

er J

ob

Run

ning

Tim

e w

ith

Pro

fili

ng

Tur

ned

Off

1 5 10 20 40 60 80 1000.0

0.5

1.0

1.5

2.0

2.5

Percent of Tasks Profiled

Spee

dup

over

Job

run

w

ith

RB

O S

etti

ngs

Hadoop cluster: 16 nodes, c1.mediumMapReduce Program: Word Co-occurrenceData set: 10 GB Wikipedia

Starfish

Page 31: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

31

Multi-objective Cluster Provisioning

9/26/2011

m1.small m1.large m1.xlarge c1.medium c1.xlarge0

200400600800

1,0001,200

ActualPredicted

Ru

nn

ing

Tim

e (m

in)

m1.small m1.large m1.xlarge c1.medium c1.xlarge0.002.004.006.008.00

10.00

ActualPredicted

EC2 Instance Type for Target Cluster

Cos

t ($

)

Instance Type for Source Cluster: m1.large

Starfish

Page 32: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University.

32

More info: www.cs.duke.edu/starfish

9/26/2011

Job-level MapReduce

configuration

Workflow optimization

Workload management

Datalayout tuning

Cluster sizing

J1 J2

J3

J4

Starfish