Starfish: A Self-tuning System for Big Data Analytics



Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong, Fatma Bilgen Cetin, Shivnath Babu

Department of Computer Science, Duke University

Starfish Overview

[Figure] Starfish in the Hadoop Ecosystem: Starfish sits on top of the Hadoop analytics system (an extensible MapReduce execution engine over HDFS and other storage engines), alongside Oozie, Hive, Pig, and Elastic MR. Clients include SQL clients, Java clients, and web frontends. Data flows in from sources such as Scribe, Flume, and OLTP systems, and out to middleware, database systems, and key-value stores.

Hadoop is a MAD system for data analytics:
- Magnetism: attracts all sources of data
- Agility: adapts in sync with rapid data evolution
- Depth: supports complex analytics needs

Starfish makes Hadoop MADDER and self-tuning:
- Data-lifecycle-awareness: achieves good performance throughout the data lifecycle
- Elasticity: adjusts resources and operational costs
- Robustness: provides availability and predictability

Just-in-Time Job Optimization

Goal
- Find good settings for the configuration parameters of a MapReduce job
- Good settings depend on job, data, and cluster characteristics

Challenges
- Data remains opaque until it is processed
- Processing is file-based rather than schema-based
- Jobs make heavy use of general-purpose programming languages

Approach
- Profiler: uses dynamic instrumentation to learn performance models (job profiles) for unmodified MapReduce programs
- Sampler: collects statistics about the input, intermediate, and output key-value spaces of a MapReduce job
- What-if Engine: uses a mix of simulation and model-based estimation to predict job performance
- Just-in-Time Optimizer: searches through the high-dimensional space of parameter settings
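A hedged sketch of how these pieces could fit together: the `JobProfile` fields, the cost formula, and the parameter grid below are illustrative assumptions, not Starfish's actual models. They only show a What-if-style runtime estimate being queried by a grid-searching optimizer.

```python
# Hypothetical Profiler -> What-if Engine -> Optimizer loop.
# All cost constants and the spill model are invented for illustration.
from dataclasses import dataclass
from itertools import product

@dataclass
class JobProfile:
    """Per-record costs measured from one profiled run of the job."""
    map_cost_per_rec: float      # seconds of map work per input record
    reduce_cost_per_rec: float   # seconds of reduce work per map-output record
    spill_penalty: float         # extra seconds per record spilled to disk
    records_in: int              # input records
    records_out_per_in: float    # map-output records per input record

def whatif_runtime(profile, num_reducers, sort_buffer_mb):
    """Model-based runtime estimate for one configuration."""
    map_out = profile.records_in * profile.records_out_per_in
    # Assume a smaller sort buffer spills a larger fraction of map output.
    spill_frac = max(0.0, 1.0 - sort_buffer_mb / 512.0)
    map_time = (profile.records_in * profile.map_cost_per_rec
                + map_out * spill_frac * profile.spill_penalty)
    # Reduce-side work is divided across the configured reduce tasks.
    reduce_time = map_out * profile.reduce_cost_per_rec / num_reducers
    return map_time + reduce_time

def jit_optimize(profile, reducer_choices, buffer_choices):
    """Enumerate the (small) parameter grid; return the best setting."""
    return min(product(reducer_choices, buffer_choices),
               key=lambda cfg: whatif_runtime(profile, *cfg))

profile = JobProfile(1e-5, 4e-5, 2e-5, 10_000_000, 1.5)
best = jit_optimize(profile, [1, 4, 16, 64], [64, 128, 256, 512])
print(best)  # the grid point with the lowest estimated runtime
```

In the real system the search space is high-dimensional rather than a tiny grid, which is why the poster frames it as a search problem rather than an enumeration.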

[Figure] Components in the Starfish Architecture: job-level tuning (Just-in-Time Optimizer, Profiler, Sampler, What-if Engine), workflow-level tuning (Workflow-aware Scheduler), workload-level tuning (Workload Optimizer, Elastisizer), and a Data Manager comprising a Metadata Mgr., an Intermediate Data Mgr., and a Data Layout & Storage Mgr.

[Figure] Response surfaces of MapReduce programs in Hadoop.

Workflow-aware Scheduling

Scheduling Objectives
- Ensure a balanced data layout
- Avoid cascading re-execution under node failure or data corruption
- Ensure power-proportional computing
- Adapt to imbalance in load or cost of energy across data centers

Causes of unbalanced data layouts
- Skewed data
- Data-layout-unaware scheduling of tasks
- Addition or removal of nodes without rebalancing

Approach
- Consider interactions between scheduling policies and the block placement policies of the storage system
- Use smart scheduling to perform rebalancing automatically
- Exploit opportunities for collocating data sets
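One way to picture the rebalancing idea above is a greedy, usage-aware placement policy: send each new block (and its replicas) to the currently least-loaded nodes, so job output does not keep piling up on a few data nodes. This is a toy sketch over an assumed node-to-block-count map, not HDFS's or Starfish's actual placement code.

```python
import heapq

def place_blocks(node_usage, num_blocks, replicas=1):
    """Greedily assign each new block's replicas to the least-used nodes.

    node_usage: dict mapping node name -> blocks currently stored there.
    Returns the updated usage map (the input is not mutated).
    """
    usage = dict(node_usage)
    for _ in range(num_blocks):
        # Pick `replicas` distinct nodes with the lowest current usage.
        targets = heapq.nsmallest(replicas, usage, key=usage.get)
        for node in targets:
            usage[node] += 1
    return usage

# Start from the kind of skewed layout a data-layout-unaware
# scheduler can produce (one hot node holding most of the output).
skewed = {"n1": 90, "n2": 10, "n3": 10, "n4": 10}
after = place_blocks(skewed, num_blocks=60)
print(after)  # new blocks spread evenly over the lightly loaded nodes
```

The greedy rule never touches existing blocks; actual rebalancing would also migrate blocks off hot nodes, which is where scheduling and block placement interact.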

[Figure] Unbalanced data layout after executing one MapReduce job.

Workload Optimization

Setting: processing multiple workflows over the same data

Optimization Techniques
- Data-flow sharing
- Materialization
- Reorganization

Challenges
- Interactions of the above techniques with each other and with scheduling, data layout policies, and configuration parameter settings

Jumbo Operator
- Uses a single MapReduce job to process multiple Select-Project-Aggregate operations over a table
- Enables sharing of scans, computation, sorting, shuffling, and output generation
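The scan-sharing idea behind the Jumbo operator can be illustrated with a toy single-pass evaluator: several select-project-aggregate queries are answered in one scan of the table instead of one pass per query. The query representation (predicate, grouping key, aggregated column) is an assumption made for this example, not the operator's real interface.

```python
from collections import defaultdict

def jumbo_scan(rows, queries):
    """rows: iterable of dict records.
    queries: name -> (predicate, group_key_col, agg_col).
    One pass over `rows` feeds every query's grouped sum at once."""
    results = {name: defaultdict(float) for name in queries}
    for row in rows:                                  # the single shared scan
        for name, (pred, key, col) in queries.items():
            if pred(row):                             # per-query Select
                results[name][row[key]] += row[col]   # Project + Aggregate
    return {name: dict(groups) for name, groups in results.items()}

clicks = [
    {"age": 19, "url_type": "Sports", "value": 2.0},
    {"age": 40, "url_type": "News",   "value": 1.0},
    {"age": 40, "url_type": "Sports", "value": 3.0},
]
out = jumbo_scan(clicks, {
    "value_per_age":  (lambda r: r["value"] > 0,           "age", "value"),
    "sports_per_age": (lambda r: r["url_type"] == "Sports", "age", "value"),
})
print(out)
```

In MapReduce terms, the shared scan is the map phase; tagging each emitted pair with its query name likewise lets the queries share one sort, shuffle, and output-writing phase.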

Provisioning for Hadoop Workloads

Goal
- Make provisioning decisions based on workload requirements

Provisioning Choices
- Number of nodes
- Cluster configuration
- Network configuration

Long-term vision
- Hadoop Analytics as a Service
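A minimal sketch of requirement-driven provisioning, assuming per-configuration (runtime, cost) estimates are already available (for instance, from a What-if-style predictor): pick the cheapest cluster configuration that meets a deadline. The function name and the figures below are invented for illustration.

```python
def provision(estimates, deadline_sec):
    """estimates: dict (node_type, num_nodes) -> (runtime_sec, cost_dollars).
    Returns the cheapest configuration meeting the deadline, or None."""
    feasible = {cfg: est for cfg, est in estimates.items()
                if est[0] <= deadline_sec}
    if not feasible:
        return None
    # Cheapest feasible config; break cost ties by the faster runtime.
    return min(feasible, key=lambda cfg: (feasible[cfg][1], feasible[cfg][0]))

# Made-up estimates in the spirit of the Elastic MapReduce charts below.
estimates = {
    ("m1.small", 4):  (9000.0, 1.20),
    ("m1.large", 4):  (4000.0, 2.40),
    ("m1.xlarge", 4): (2500.0, 4.00),
    ("m1.large", 6):  (3000.0, 3.60),
}
print(provision(estimates, deadline_sec=4500.0))
```

The same search can be run with the objective and constraint swapped (fastest configuration under a budget), which is the trade-off the execution-time and cost charts make visible.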

[Figure] Workload performance and pay-as-you-go cost under various cluster configurations on Amazon Elastic MapReduce.

[Figure] Example analytics workload for Amazon Elastic MapReduce: six workflows (I-VI) over three data sets copied from Amazon S3 storage, Users (username, age, ipaddr), GeoInfo (ipaddr, region), and Clicks (username, url, value). The workflows partition Users by age (<20, ≤25, ≤35, >35); filter Clicks on value > 0, on age > 35, and on url of "Sports" type; join the data sets; and count users and clicks per age, per region, and per url type.

[Charts] Execution time (sec) and cost ($) of the workload by node type (m1.small, m1.large, m1.xlarge) and cluster size (2, 4, 6 nodes).

[Chart] Disk usage (%) across data nodes 1-15: initial layout; after a partition job with replication count = 1; after map-only aggregation, both data local and non-data local.

[Charts] Execution time (sec) of Serial, Concurrent, and Jumbo execution plans; the second chart also includes Partitioning.

[Charts] Running time (sec) of TeraSort in Hadoop and of WordCount in Hadoop.