

Performance Analysis II

Marco Serafini

COMPSCI 590S, Lecture 15


Scalability

[Figure: speedup vs. parallelism, with a linear “Ideal” curve and a flattening “Reality” curve]

• Ideal world
  • Linear scalability
• Reality
  • Bottlenecks
  • For example: a central coordinator
• When do we stop scaling?


Scalability

• Capacity of a system to improve performance by increasing the amount of resources available
• Typically, resources = processors
• Strong scaling
  • Fixed total problem size, more processors
• Weak scaling
  • Fixed per-processor problem size, more processors


Strong and Weak Scaling

• Strong scaling
  • Fixed total problem size, more processors
• Weak scaling
  • Fixed per-processor problem size, more processors
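The two regimes can be contrasted with a small sketch. Strong scaling is judged by speedup at a fixed total problem size; weak scaling by how flat the runtime stays as the problem grows with the processor count. All runtimes below are made-up illustrative numbers, not measurements from the lecture:

```python
# Sketch: strong- vs. weak-scaling metrics from measured runtimes
# (hypothetical numbers for illustration only).

def strong_scaling_speedup(t1, tp):
    """Strong scaling: FIXED total problem size.
    Speedup = T(1 processor) / T(p processors); ideal is p."""
    return t1 / tp

def weak_scaling_efficiency(t1, tp):
    """Weak scaling: FIXED per-processor problem size.
    Ideally T(p) == T(1), so efficiency = T(1) / T(p); ideal is 1.0."""
    return t1 / tp

# Strong scaling: same dataset on 1 vs. 8 processors (hypothetical seconds).
print(strong_scaling_speedup(t1=100.0, tp=16.0))    # 6.25x, below the ideal 8x

# Weak scaling: dataset grows with processors; runtime should stay flat.
print(weak_scaling_efficiency(t1=100.0, tp=125.0))  # 0.8 efficiency
```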


Scaling Up and Out

• Scaling Up
  • More powerful server (more cores, memory, disk)
  • Single server (or fixed number of servers)
• Scaling Out
  • Larger number of servers
  • Constant resources per server


What Does This Plot Tell You?


How About Now?


COST

• Configuration that Outperforms Single Thread
• # cores after which we achieve speedup over 1 core

[Figure: runtime vs. cores, shown for a single iteration and for 10 iterations]
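Given a single-threaded baseline and a runtime curve for the scalable system, the metric can be computed mechanically. A minimal sketch, with made-up runtimes (not the numbers behind the plots):

```python
# Sketch: computing a system's COST from measured runtimes.
# COST = smallest core count at which the scalable system beats a
# competent single-threaded baseline (hypothetical numbers below).

def cost(single_thread_secs, runtimes_by_cores):
    """runtimes_by_cores: {core_count: runtime_secs} for the scalable system.
    Returns the smallest core count that beats the single-threaded baseline,
    or None if the system never wins (unbounded COST)."""
    for cores in sorted(runtimes_by_cores):
        if runtimes_by_cores[cores] < single_thread_secs:
            return cores
    return None

# Hypothetical: a single thread finishes in 300 s; the distributed system
# needs 64 cores before it runs faster than that.
measured = {1: 2000, 16: 700, 64: 250, 128: 140}
print(cost(300, measured))  # 64
```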


Possible Reasons for High COST

• Restricted API
  • Limits algorithmic choice
  • Makes assumptions
    • MapReduce: no memory-resident state
    • Pregel: program must be specified as “think-like-a-vertex”
  • BUT it also simplifies programming
• Lower-end nodes than a laptop
• Implementation adds overhead
  • Coordination
  • Cannot use application-specific optimizations


Why Not Just a Laptop?

• Capacity
  • Large datasets and complex computations don’t fit on a laptop
• Simplicity, convenience
  • Nobody ever got fired for using Hadoop on a cluster
• Integration with toolchain
  • Example: ETL → SQL → graph computation on Spark


Disclaimers

• Graph computation is peculiar
  • Some algorithms are computationally complex…
  • …even for small datasets
  • Good use case for single-server implementations
• Machine learning too…


Logistic Regression

“While VW can immediately start to update the model as data is read, Spark spends considerable time reading and caching the data, before it can run the first L-BFGS iteration.”


Gradient Boosted Trees


Understanding Bottlenecks


Monotasks

• Decompose data analytics jobs into monotasks
  • A monotask is the basic unit of scheduling
  • Each monotask uses only one resource
• This is the opposite of pipelining
  • Parallelize use of CPU, network, disk
• MonoSpark
  • 9% slower than Spark
  • Performance predictability


Example: Spark

• Non-uniform resource consumption
• Concurrent access to the same resources
• Framework has no control over resource access
• Non-deterministic behavior, hard to debug/predict


Monotasks Principles

• Each monotask uses one resource
  • CPU, or network, or disk
• Monotasks execute in isolation
  • No interaction or blocking during execution
• Per-resource schedulers control contention
  • Enough concurrency to ensure full capacity, not more
  • For example, one CPU task per core
• Per-resource schedulers have complete control over their resource
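The per-resource scheduling idea can be sketched as one queue plus a bounded worker pool per resource, e.g. one in-flight CPU monotask per core. This is an illustrative simplification, not MonoSpark's actual implementation; all names here are made up:

```python
# Sketch (not MonoSpark's code): one scheduler per resource, each with a
# fixed concurrency limit, so the framework controls all resource access.
import os
import queue
import threading

class ResourceScheduler:
    """Runs monotasks for a single resource with a fixed concurrency limit."""
    def __init__(self, name, max_concurrent):
        self.name = name
        self.tasks = queue.Queue()
        for _ in range(max_concurrent):
            threading.Thread(target=self._run, daemon=True).start()

    def submit(self, monotask):
        """monotask: a zero-argument callable using only this resource."""
        self.tasks.put(monotask)

    def _run(self):
        while True:
            monotask = self.tasks.get()
            monotask()  # executes in isolation: no blocking on other resources
            self.tasks.task_done()

# One scheduler per resource; the CPU scheduler allows one task per core.
schedulers = {
    "cpu": ResourceScheduler("cpu", max_concurrent=os.cpu_count() or 1),
    "disk": ResourceScheduler("disk", max_concurrent=1),     # one per disk
    "network": ResourceScheduler("network", max_concurrent=1),
}

results = []
schedulers["cpu"].submit(lambda: results.append("cpu monotask done"))
schedulers["cpu"].tasks.join()  # wait for all queued CPU monotasks
print(results)  # ['cpu monotask done']
```

Because each scheduler sees every request for its resource, it can keep exactly enough monotasks in flight to saturate that resource and no more.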


Multitask Execution


Monotask Execution


How to Break Up a Task: Example


Issues?

• Network scheduling is difficult: requires coordination
• Complex dependencies: CPU might wait for disk
• Memory cost
  • Cannot pipeline from disk; need to load all data


Reasoning About Performance

• Assume perfect parallelism/resource utilization
  • They argue that it is a good approximation in MonoSpark
• For each stage
  • Measure utilization per monotask, take the average
  • Estimate stage speedup with a different amount of resources
• Ignores
  • Skew
  • Dependencies and ramp-up time (network → CPU → disk)
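Under the perfect-parallelism assumption, a stage's runtime is set by its bottleneck resource, so changing one resource's capacity only helps until another resource becomes the bottleneck. A hedged sketch of that estimate (an assumption-laden simplification with made-up numbers, not the paper's model):

```python
# Sketch: ideal-case stage runtime = work on the bottleneck resource
# divided by that resource's capacity (hypothetical workload below).

def stage_runtime(work, capacity):
    """work: {resource: total work, e.g. CPU-seconds or bytes}
    capacity: {resource: throughput, e.g. cores or bytes/sec}
    Returns the runtime implied by the bottleneck resource."""
    return max(work[r] / capacity[r] for r in work)

# Hypothetical stage: 400 core-seconds of CPU and 100 GB of disk I/O.
work = {"cpu": 400, "disk": 100e9}
now = stage_runtime(work, {"cpu": 8, "disk": 500e6})      # disk-bound: 200 s
two_disks = stage_runtime(work, {"cpu": 8, "disk": 1e9})  # disk 100 s > cpu 50 s
print(now, two_disks)  # 200.0 100.0
```

Note the same caveats as above: this ignores skew, inter-resource dependencies, and ramp-up time, which is exactly why it is only an approximation.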


Different HW Configurations

• Sort with constant I/O cost and decreasing CPU cost
• Effect of adding a second disk


Other Use Cases

• Prioritizing optimizations
  • Not trivial at all in concurrent workloads
• Auto-configuration