Natural Laws of Software Performance

Post on 27-May-2015


Just like you can't defeat the laws of physics, there are natural laws that ultimately determine software performance. Even the latest technology beta is still bound by Newton's laws, and you can't change the speed of light, even in the cloud!

Transcript of Natural Laws of Software Performance

Natural Laws of Software Performance

The changing face of performance optimization

Who Am I?

• Kendall Miller
• One of the Founders of Gibraltar Software
  – Small Independent Software Vendor Founded in 2008
  – Developers of VistaDB and Gibraltar
  – Engineers, not Sales People
• Enterprise Systems Architect & Developer since 1995
• BSE in Computer Engineering, University of Illinois Urbana-Champaign (UIUC)
• Twitter: @KendallMiller

Traditional Performance Optimization

• Run suspect use cases and find hotspots

• Very Linear
• Finds unexpected framework performance issues
• Final Polishing Step

Algorithms and Asymptotics

• Asymptotic (or "Big Oh") Notation
  – Describes the growth rate of functions
  – Answers the question: does execution time of A grow faster or slower than B's?
• The rules of asymptotic notation say:
  – A term of n^3 will tend to dominate a term of n^2
  – Therefore we can discount coefficients and lower-order terms
  – And so f(n) = n^3 + 6n^2 + 2n + 3 can be expressed as O(n^3)
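The dominance rule is easy to check numerically; a minimal Python sketch of the f(n) from the slide:

```python
# As n grows, f(n) = n^3 + 6n^2 + 2n + 3 is dominated by its n^3 term,
# so the ratio f(n) / n^3 approaches 1.
def f(n):
    return n**3 + 6 * n**2 + 2 * n + 3

for n in (10, 100, 1000, 10000):
    print(n, f(n) / n**3)
```

By n = 10,000 the lower-order terms contribute less than 0.1% of the total.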

You Can’t Optimize out of Trouble

Performance: Add Versus AddRange

[Chart: number of ticks (y-axis, 5,000 to 30,000) versus number of elements added (x-axis, 10 to 10,000,000), comparing Add and AddRange.]
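The slide's chart compares .NET's List&lt;T&gt;.Add against AddRange; the same per-call overhead can be illustrated with Python's analogous list.append versus list.extend. A hedged stand-in sketch, not code from the talk:

```python
import timeit

def add_one_at_a_time(items):
    # Analogue of calling List<T>.Add in a loop: one call per element.
    result = []
    for item in items:
        result.append(item)
    return result

def add_range(items):
    # Analogue of List<T>.AddRange: one bulk call for the whole batch.
    result = []
    result.extend(items)
    return result

data = list(range(100_000))
assert add_one_at_a_time(data) == add_range(data)  # same outcome either way
print("per-element:", timeit.timeit(lambda: add_one_at_a_time(data), number=10))
print("bulk:       ", timeit.timeit(lambda: add_range(data), number=10))
```

The bulk call wins only by a constant factor, which is the slide's point: constant-factor wins can't rescue a bad growth rate.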

So Where Are We?

Laws
• Immutable, invariant over time

Principles
• Highly desirable, best practices evolving over time

Tactics
• Techniques embodying principles in a specific domain

Moore’s Law

Components = Transistors

“The number of components in integrated circuits doubles every year”

Processor Iron Triangle

[Diagram: iron triangle with vertices Clock Speed, Size, and Complexity, constrained by Manufacturing Process, Speed of Light, and Power.]

A Core Explosion

Before you Leap into Optimizing…

• Algorithms are your first step
  – Cores are a constant multiplier; algorithms provide an exponential effect
  – Everything we talk about today disappears into the constant factors that Big-O notation ignores
• Parallel processing on cores can get you a quick boost, trading cost for a modest gain
• Other tricks can get you more (and get more out of parallel)

Fork / Join Parallel Processing

• Split a problem into a number of independent problems

• Process each partition independently (potentially in parallel)

• Merge the results back together to get the final outcome (if necessary)
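The three steps above can be sketched generically in Python (not code from the talk; the .NET APIs appear on a later slide). In CPython the GIL limits CPU-bound speedup from threads, but the split/process/merge structure is the same:

```python
# Fork/join sketch: split a problem into independent chunks, process
# each concurrently, then merge (join) the partial results.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    return sum(chunk)

def parallel_sum(data, workers=4):
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]  # fork
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))  # independent work
    return sum(partials)  # join

print(parallel_sum(list(range(1001))))  # 500500
```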

Fork / Join Examples

• Multicore Processors
• Server Farm
• Web Server
  – Original HTTP servers literally forked a process for each request

Fork / Join in .NET

• System.Threading.ThreadPool
• Parallel.ForEach
• PLINQ
• Parallel.Invoke

Fork / Join Usage

• Tasks that can be broken into "large enough" chunks that are independent of each other
  – Little shared state required to process
• Tasks with a low Join cost

Pipelines

• Partition a task based on stages of processing instead of data for processing

• Each stage of the pipeline processes independently (and typically concurrently)

• Stages are typically connected by queues– Producer (prev stage) & Consumer (next stage)
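A queue-connected pipeline can be sketched in Python (a generic illustration, not the talk's .NET code). Two stages run concurrently, each consuming from the previous stage's queue and producing into the next:

```python
# Pipeline sketch: stages connected by queues; each stage processes
# independently and concurrently. Previous stage = producer, next = consumer.
import queue
import threading

DONE = object()  # sentinel marking the end of the stream

def stage(func, inbox, outbox):
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)  # propagate shutdown downstream
            break
        outbox.put(func(item))

q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
threading.Thread(target=stage, args=(lambda x: x * 2, q1, q2)).start()
threading.Thread(target=stage, args=(lambda x: x + 1, q2, q3)).start()

for n in range(5):
    q1.put(n)
q1.put(DONE)

results = []
while True:
    item = q3.get()
    if item is DONE:
        break
    results.append(item)
print(results)  # [1, 3, 5, 7, 9]
```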

Pipeline Examples

• Order Entry & Order Processing
• Classic Microprocessor Design
  – Break the instruction processing into stages and process one stage per clock cycle
• GPU Design
  – Combines Fork/Join with Pipeline

Pipeline Examples in .NET

• Not the ASP.NET processing Pipeline
  – No parallelism/multithreading/queueing
• Stream Processing
• Map Reduce
• BlockingCollection&lt;T&gt;
• Gibraltar Agent

Pipeline Usage

• Significant shared state between data elements prevents decoupling them

• Linear processing requirements within parts of the workflow

Speculative Processing

• Isn't there something you could be doing?
• Do the work now when you can, throw the results away if they aren't useful
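As a sketch, with a hypothetical expensive_lookup standing in for the slow work: start the likely-needed call early, keep the result if the guess was right, and discard it otherwise.

```python
# Speculative processing sketch: do probable work ahead of time,
# throw the result away if it turns out not to be useful.
from concurrent.futures import ThreadPoolExecutor

def expensive_lookup(key):
    return key.upper()  # stand-in for a slow computation or fetch

def handle_request(predicted_key, actual_key):
    with ThreadPoolExecutor() as pool:
        speculative = pool.submit(expensive_lookup, predicted_key)  # start early
        # ... other request handling would happen here ...
        if predicted_key == actual_key:
            return speculative.result()      # speculation paid off
        return expensive_lookup(actual_key)  # wrong guess: discard and redo

print(handle_request("cart", "cart"))      # speculative result used
print(handle_request("cart", "checkout"))  # speculative result discarded
```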

Speculative Processing Examples

• Microprocessor Branch Prediction• Search Indexing

Speculative Processing Usage

• Shift work from a future, performance-critical operation to an earlier one
• Either always valid (never has to be rolled back) or easy to roll back

Latency – The Silent Killer

• The time for the first bit to get from here to there

Typical LAN: 0.4ms

It’s the Law

• Speed of Light: 3×10^8 m/s
• About 0.240 seconds to geosynchronous orbit and back
• About 1 foot per nanosecond
• 3 GHz: 1/3 ns period = about 4 inches per clock cycle
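The arithmetic checks out; a quick verification (figures rounded, geosynchronous altitude of roughly 35,786 km assumed):

```python
# Checking the slide's speed-of-light numbers.
C = 3e8  # speed of light, m/s (rounded)

# Round trip to geosynchronous orbit (~35,786 km altitude) and back.
geo_m = 35_786_000
print(2 * geo_m / C)      # ~0.239 s

# Distance light covers in one nanosecond.
print(C * 1e-9)           # 0.3 m, i.e. roughly one foot

# Distance covered during one 3 GHz clock period (1/3 ns).
meters = C * (1 / 3e9)
print(meters / 0.0254)    # ~3.94 inches
```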

New York to London: 5,500 km, about 18 ms at the speed of light

TCP Socket Establish: 54 ms

Typical latencies:

  L1 Cache:             2 ns
  Memory:              40 ns
  Local Storage (SSD): 50,000 ns
  LAN:                 381,000 ns
  Internet:            18,000,000 ns

Caching

• Save results of earlier work nearby where they are handy to use again later

• Cheat: Don't make the call
• Cheat More: Apply in front of anything that's time consuming
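Memoization is the simplest form of "don't make the call"; a Python sketch using functools.lru_cache (a generic illustration, not one of the .NET caches named on a later slide):

```python
# Caching sketch: save results of earlier work nearby so repeat
# questions are answered locally instead of paying the cost again.
from functools import lru_cache

calls = 0

@lru_cache(maxsize=1024)
def authoritative_lookup(key):
    global calls
    calls += 1     # stands in for a slow database or network call
    return key * 2

print(authoritative_lookup(21))  # 42: does the real work
print(authoritative_lookup(21))  # 42: served from the cache
print(calls)                     # 1
```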

Why Caching?

• Apps ask a lot of repeating questions
  – Stateless applications even more so
• Answers don't change often
• Authoritative information is expensive
• Loading the world is impractical

Caching in Hardware

• Processor L1 Cache (typically same core)
• Processor L2 (shared by cores)
• Processor L3 (between proc & main RAM)
• Disk Controllers
• Disk Drives
• …

.NET Caching Examples

• ASP.NET Output Cache
• System.Web.Cache (ASP.NET only)
• AppFabric Cache

Go Asynchronous

• Delegate the latency to something that will notify you when it’s complete

• Do other useful stuff while waiting
  – Otherwise you're just being efficient, not faster
• Maximize throughput by scheduling more work than could be completed if there were no stalls
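The payoff of overlapping waits can be sketched with Python's asyncio (a generic illustration of the idea, not the .NET pattern itself): three simulated 0.1 s remote calls complete in roughly 0.1 s total instead of 0.3 s.

```python
# Async sketch: delegate the latency and do other work while waiting,
# so independent waits overlap instead of adding up.
import asyncio
import time

async def remote_call(name):
    await asyncio.sleep(0.1)   # stand-in for network latency
    return name.upper()

async def main():
    start = time.perf_counter()
    results = await asyncio.gather(
        remote_call("a"), remote_call("b"), remote_call("c"))
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(results)  # ['A', 'B', 'C']
print(f"{elapsed:.2f}s for three 0.1s calls")
```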

.NET Async Examples

• Standard Async IO Pattern
• .NET 4 Task&lt;T&gt;
• Combine with Queuing to maximize throughput even without parallelization

Visual Studio Async CTP

• async methods will compile to run asynchronously
• await suspends the method until the async call completes before proceeding, without blocking the calling thread

Batching

• Get your money’s worth out of every latency hit

• Trade off storage for duration
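A minimal batching sketch (the BatchingSender class is hypothetical, invented for illustration): items accumulate in storage until a batch is full, so the per-trip latency cost is paid once for many items.

```python
# Batching sketch: pay the per-trip (latency) cost once for many
# items instead of once per item.
class BatchingSender:
    def __init__(self, batch_size, send):
        self.batch_size = batch_size
        self.send = send          # the expensive per-trip operation
        self.pending = []

    def add(self, item):
        self.pending.append(item)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.send(list(self.pending))  # one trip, many items
            self.pending.clear()

trips = []
sender = BatchingSender(batch_size=3, send=trips.append)
for n in range(7):
    sender.add(n)
sender.flush()  # don't strand the partial final batch
print(trips)  # [[0, 1, 2], [3, 4, 5], [6]]
```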

General Batching Examples

• Shipping – Many packages on one truck
• Train travel
• TCP Sockets

Batching in Code

• SQL Connection Pooling
• HTTP Keep-Alive
• DataSet / Entity Collections
• CSS Sprites

Optimistic Messaging

• Assume it’s all going to work out and just keep sending

• Be ready to step back & go another way when it doesn’t work out
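The slide is about messaging, but the same optimistic pattern appears in concurrency control; a sketch with a hypothetical VersionedStore: proceed assuming no conflict, and go another way (retry) only when that assumption fails.

```python
# Optimistic sketch: assume the operation will work out and keep going;
# detect failure afterward and retry rather than locking up front.
class VersionedStore:
    def __init__(self, value):
        self.value, self.version = value, 0

    def read(self):
        return self.value, self.version

    def write(self, new_value, expected_version):
        if self.version != expected_version:
            return False          # someone else got there first
        self.value, self.version = new_value, self.version + 1
        return True

def increment(store):
    while True:                   # optimistic retry loop
        value, version = store.read()
        if store.write(value + 1, version):
            return

store = VersionedStore(0)
for _ in range(5):
    increment(store)
print(store.value)  # 5
```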

Side Points

• Stateful interaction generally increases the cost of latency
• Minimize Copying
  – It takes blocking time to copy data, introducing latency
• Your Mileage May Vary
  – Latency on a LAN can be dramatically affected by hardware and configuration

Critical Lessons Learned

• Algorithms, Algorithms, Algorithms

• Plan for Latency & Failure

• Explicitly Design for Parallelism

Additional Information:

Websites
– www.GibraltarSoftware.com
– www.eSymmetrix.com

Follow Up
– Kendall.Miller@eSymmetrix.com
– Twitter: @kendallmiller