Performance Analysis II - Marco Serafini

Performance Analysis II Marco Serafini COMPSCI 590S Lecture 15

Performance Analysis II

Marco Serafini

COMPSCI 590S Lecture 15

Scalability

[Plot: Speedup against Parallelism, with curves labeled "Ideal" and "Reality"]

• Ideal world
  • Linear scalability
• Reality
  • Bottlenecks
  • For example: central coordinator
• When do we stop scaling?
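The gap between the ideal and the real curve can be illustrated with a toy model. The sketch below assumes that a fixed fraction of the work goes through a central coordinator and cannot be parallelized; this is the classic Amdahl's-law form, brought in here as an illustration rather than taken from the slides, and `serial_fraction` is a hypothetical parameter.

```python
def ideal_speedup(n):
    """Linear scalability: n workers give a speedup of n."""
    return float(n)


def speedup_with_coordinator(n, serial_fraction):
    """Toy model (assumption): `serial_fraction` of the work is serialized
    at a central coordinator, the rest is split evenly across n workers."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


if __name__ == "__main__":
    # Hypothetical setting: 10% of the work passes through the coordinator.
    for n in (1, 2, 4, 8, 16, 64):
        print(n, ideal_speedup(n), round(speedup_with_coordinator(n, 0.1), 2))
```

Under this assumption the speedup can never exceed `1 / serial_fraction`, which gives one possible answer to when adding workers stops paying off.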

Scalability
• Capacity of a system to improve performance by increasing the amount of resources available
  • Typically, resources = processors
• Strong scaling
  • Fixed total problem size, more processors
• Weak scaling
  • Fixed per-processor problem size, more processors

Strong and Weak Scaling
• Strong scaling
  • Fixed total problem size, more processors
• Weak scaling
  • Fixed per-processor problem size, more processors
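The two definitions lead to different efficiency formulas. The following is a minimal sketch using the standard definitions; the function names and the runtimes in the example are hypothetical, not measurements from the lecture.

```python
def strong_scaling_efficiency(t1, tn, n):
    """Fixed total problem size: the ideal runtime on n processors is t1 / n,
    so efficiency is (t1 / n) / tn."""
    return t1 / (n * tn)


def weak_scaling_efficiency(t1, tn):
    """Fixed per-processor problem size: the total problem grows with n,
    so the ideal runtime stays at t1 and efficiency is t1 / tn."""
    return t1 / tn


if __name__ == "__main__":
    # Hypothetical runtimes in seconds.
    print(strong_scaling_efficiency(t1=100.0, tn=25.0, n=4))  # perfect strong scaling
    print(weak_scaling_efficiency(t1=10.0, tn=12.5))          # 20% weak-scaling loss
```

An efficiency of 1.0 corresponds to the ideal curve in both cases; values below 1.0 quantify how far the system falls short.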

Scaling Up and Out
• Scaling Up
  • More powerful server (more cores, memory, disk)
  • Single server (or fixed number of servers)
• Scaling Out
  • Larger number of servers
  • Constant resources per server

What Does This Plot Tell You?

How About Now?

COST
• Configuration that outperforms single thread
• # cores after which we achieve speedup over 1 core

[Plots: single iteration; 10 iterations]
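Given runtime measurements, the COST metric can be computed as the smallest core count at which the parallel system beats the single-threaded run. The sketch below assumes the single-threaded baseline was measured separately; the function name and all numbers are hypothetical.

```python
def cost(single_thread_seconds, system_seconds_by_cores):
    """Return the smallest number of cores at which the system is faster than
    the single-threaded baseline, or None if it never is (unbounded COST)."""
    for cores in sorted(system_seconds_by_cores):
        if system_seconds_by_cores[cores] < single_thread_seconds:
            return cores
    return None


if __name__ == "__main__":
    # Hypothetical measurements: runtime in seconds keyed by core count.
    measured = {1: 1500.0, 16: 400.0, 64: 250.0, 128: 180.0}
    print(cost(300.0, measured))
```

A system that scales well from its own 1-core configuration can still have a high COST if that configuration is much slower than the baseline.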

Possible Reasons for High COST
• Restricted API
  • Limits algorithmic choice
  • Makes assumptions
    • MapReduce: No memory-resident state
    • Pregel: program can be specified as “think-like-a-vertex”
  • BUT also simplifies programming
• Lower end nodes than laptop
• Implementation adds overhead
  • Coordination
  • Cannot use application-specific optimizations

Why not Just a Laptop?
• Capacity
  • Large datasets, complex computations don’t fit in a laptop
• Simplicity, convenience
  • Nobody ever got fired for using Hadoop on a cluster
• Integration with toolchain
  • Example: ETL → SQL → Graph computation on Spark

Disclaimers
• Graph computation is peculiar
  • Some algorithms are computationally complex…
  • Even for small datasets
  • Good use case for single-server implementations
• Machine Learning is too…

Logistic Regression

“While VW can immediately start to update the model as data is read, Spark spends considerable time reading and caching the data, before it can run the first L-BFGS iteration.”

Gradient Boosted Trees

Understanding Bottlenecks

Monotasks
• Decompose data analytics jobs into monotasks
  • Monotask is basic unit of scheduling
  • Each monotask uses only one resource
    • This is the opposite of pipelining
  • Parallelize use of CPU, network, disk
• MonoSpark
  • 9% slower than Spark
  • Performance predictability

Example: Spark
• Non-uniform resource consumption
• Concurrent access to same resources
• Framework has no control over resource access
  • Non-deterministic behavior, hard to debug/predict

Monotasks Principles
• Each monotask uses one resource
  • CPU or network or disk
• Monotasks execute in isolation
  • No interaction or blocking during execution
• Per-resource schedulers control contention
  • Enough concurrency to ensure full capacity, not more
  • For example, one CPU task per core
• Per-resource schedulers have complete control over resource
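These principles can be sketched as a toy scheduler. The example below is only an illustration under stated assumptions, not MonoSpark's implementation: each resource gets its own thread pool whose size caps concurrency, and a task is split into a disk, a CPU, and a network monotask. The class, the three step functions, and the pool sizes are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


class PerResourceSchedulers:
    """One pool per resource; the pool size is that resource's concurrency limit."""

    def __init__(self, cpu_cores, disks, network_slots):
        self.pools = {
            "cpu": ThreadPoolExecutor(max_workers=cpu_cores),  # one CPU monotask per core
            "disk": ThreadPoolExecutor(max_workers=disks),
            "network": ThreadPoolExecutor(max_workers=network_slots),
        }

    def run(self, resource, fn, *args):
        """Run one monotask on the scheduler owning `resource` and wait for it."""
        return self.pools[resource].submit(fn, *args).result()

    def shutdown(self):
        for pool in self.pools.values():
            pool.shutdown()


# Hypothetical monotasks: each one touches exactly one resource.
def read_partition(partition):
    return list(range(partition * 4, partition * 4 + 4))  # stands in for a disk read


def compute(values):
    return sum(values)  # stands in for CPU work


def send(value):
    return ("sent", value)  # stands in for a network transfer


def run_task(sched, partition):
    """A task is a chain of monotasks instead of one pipelined multitask."""
    data = sched.run("disk", read_partition, partition)
    result = sched.run("cpu", compute, data)
    return sched.run("network", send, result)


def run_job(sched, partitions):
    # Driver threads only wait on monotasks; all real work runs in the resource pools.
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as driver:
        return list(driver.map(lambda p: run_task(sched, p), partitions))


if __name__ == "__main__":
    sched = PerResourceSchedulers(cpu_cores=2, disks=1, network_slots=1)
    print(run_job(sched, [0, 1, 2]))
    sched.shutdown()
```

Because every monotask passes through exactly one pool, queue lengths and busy time per resource become directly observable, which is the property the principles aim for.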

Multitasks Execution

Monotask Execution

How to Break Task: Example

Issues?
• Network scheduling is difficult: requires coordination
• Complex dependencies: CPU might wait for disk
• Memory cost
  • Cannot pipeline from disk, need to load all data

Reasoning About Performance
• Assume perfect parallelism/resource utilization
  • They argue that it is a good approximation in MonoSpark
• For each stage
  • Measure utilization per monotask, take average
  • Estimate stage speedup with a different amount of resources
• Ignores
  • Skew
  • Dependencies and ramp-up time (networking → CPU → disk)
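One way to read this per-stage estimate is as a bottleneck model. The sketch below assumes perfect parallelism, takes the total monotask time spent on each resource, divides it by the number of units of that resource, and treats the slowest resource as the stage time. The function names and all numbers are hypothetical; like the approach on the slide, the sketch ignores skew, dependencies, and ramp-up time.

```python
def ideal_stage_time(monotask_seconds, units):
    """Stage time if every resource is perfectly utilized: the bottleneck
    resource (largest total monotask time per unit) determines the stage."""
    return max(monotask_seconds[r] / units[r] for r in monotask_seconds)


def predicted_speedup(monotask_seconds, old_units, new_units):
    """Estimated stage speedup when moving from old_units to new_units."""
    return ideal_stage_time(monotask_seconds, old_units) / ideal_stage_time(
        monotask_seconds, new_units
    )


if __name__ == "__main__":
    # Hypothetical stage: total seconds of monotask work per resource.
    stage = {"cpu": 80.0, "disk": 40.0, "network": 5.0}
    one_disk = {"cpu": 8, "disk": 1, "network": 1}
    two_disks = {"cpu": 8, "disk": 2, "network": 1}
    print(ideal_stage_time(stage, one_disk))              # disk-bound stage
    print(predicted_speedup(stage, one_disk, two_disks))  # effect of a second disk
```

In this hypothetical stage the disk is the bottleneck, so adding CPU cores would predict no speedup at all, while a second disk halves the estimated stage time.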

Different HW Configurations
• Sort with constant I/O cost and decreasing CPU cost
• Effect of adding second disk

Other Use Cases
• Prioritizing optimizations
  • Not trivial at all in concurrent workloads
• Auto-configuration