Natural Laws of Software Performance
Transcript of Natural Laws of Software Performance
The changing face of performance optimization
Who Am I?
• Kendall Miller
• One of the Founders of Gibraltar Software
– Small Independent Software Vendor Founded in 2008
– Developers of VistaDB and Gibraltar
– Engineers, not Sales People
• Enterprise Systems Architect & Developer since 1995
• BSE in Computer Engineering, University of Illinois Urbana-Champaign (UIUC)
• Twitter: @KendallMiller
Traditional Performance Optimization
• Run suspect use cases and find hotspots
• Very linear
• Finds unexpected framework performance issues
• Final polishing step
Algorithms and Asymptotics
• Asymptotic (or "Big O") notation
– Describes the growth rate of functions
– Answers the question: does execution time of A grow faster or slower than B?
• The rules of asymptotic notation say
– A term of n^3 will tend to dominate a term of n^2
– Therefore we can discount coefficients and lower-order terms
– And so f(n) = 6n^2 + 2n + 3 + n^3 can be expressed as O(n^3)
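The claim that coefficients and lower-order terms wash out is easy to check numerically. A minimal sketch (Python used here for brevity, though the talk's own examples are .NET):

```python
# Numerically checking that f(n) = 6n^2 + 2n + 3 + n^3 behaves like n^3:
# the ratio f(n) / n^3 approaches 1 as n grows, so only the n^3 term
# matters asymptotically.
def f(n):
    return 6 * n**2 + 2 * n + 3 + n**3

for n in (10, 1_000, 100_000):
    print(n, f(n) / n**3)  # ratio shrinks toward 1 as n grows
```

For n = 10 the lower-order terms still contribute noticeably, but by n = 100,000 they are lost in rounding, which is exactly why Big O discards them.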
You Can’t Optimize out of Trouble
[Chart: Performance of Add versus AddRange – number of ticks vs. number of elements added, from 10 to 10,000,000]
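The chart compares .NET's `List<T>.Add` called per element against a single `AddRange` call. A rough Python analogue (append in a loop versus one `extend`; this is an illustration of the idea, not the original benchmark):

```python
import timeit

data = list(range(10_000))

def add_one_at_a_time():
    result = []
    for item in data:       # like calling List<T>.Add once per element
        result.append(item)
    return result

def add_range():
    result = []
    result.extend(data)     # like a single List<T>.AddRange call
    return result

assert add_one_at_a_time() == add_range()
print("Add:     ", timeit.timeit(add_one_at_a_time, number=100))
print("AddRange:", timeit.timeit(add_range, number=100))
```

The bulk operation wins because it pays the per-call overhead once instead of once per element; the exact gap depends on runtime and collection implementation.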
So Where Are We?
• Laws – immutable, invariant over time
• Principles – highly desirable; best practices evolving over time
• Tactics – techniques embodying principles in a specific domain
Moore’s Law
Components = Transistors
“The number of components in integrated circuits doubles every year”
Processor Iron Triangle
[Diagram: the processor "iron triangle" – clock speed, size, and complexity, bounded by manufacturing process, speed of light, and power]
A Core Explosion
Before you Leap into Optimizing…
• Algorithms are your first step
– Cores are a constant multiplier; algorithms provide an exponential effect
– Everything we talk about today disappears inside the constants of Big-O analysis
• Parallel processing on cores can get you a quick boost, trading cost for a modest gain
• Other tricks can get you more (and get more out of parallelism)
Fork / Join Parallel Processing
• Split a problem into a number of independent problems
• Process each partition independently (potentially in parallel)
• Merge the results back together to get the final outcome (if necessary)
Fork / Join Examples
• Multicore processors
• Server farms
• Web servers
– Original HTTP servers literally forked a process for each request
Fork / Join in .NET
• System.Threading.ThreadPool
• Parallel.ForEach
• PLINQ
• Parallel.Invoke
Fork / Join Usage
• Tasks that can be broken into "large enough" chunks that are independent of each other
– Little shared state required to process
• Tasks with a low join cost
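The split/process/merge shape described above can be sketched in a few lines. A Python illustration (the talk's .NET equivalents are `Parallel.ForEach` and PLINQ; threads are used here for simplicity, though CPU-bound Python work would normally use processes because of the GIL):

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Process one independent partition (the "forked" unit of work)
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, workers=4):
    # Fork: split the input into independent chunks with no shared state
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Process each chunk (potentially in parallel), then Join:
    # merge the partial results into the final outcome
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))

print(parallel_sum_of_squares(list(range(1000))))
```

Note how cheap the join is here (a single `sum` over a handful of partial results); that low join cost is what makes the problem a good fork/join candidate.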
Pipelines
• Partition a task based on stages of processing instead of data for processing
• Each stage of the pipeline processes independently (and typically concurrently)
• Stages are typically connected by queues
– Producer (previous stage) & consumer (next stage)
Pipeline Examples
• Order entry & order processing
• Classic microprocessor design
– Break instruction processing into stages and process one stage per clock cycle
• GPU design
– Combines fork/join with pipelining
Pipeline Examples in .NET
• Not the ASP.NET processing pipeline
– No parallelism/multithreading/queueing
• Stream processing
• MapReduce
• BlockingCollection<T>
• Gibraltar Agent
Pipeline Usage
• Significant shared state between data elements prevents decoupling them (ruling out fork/join)
• Linear processing requirements within parts of the workflow
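A minimal sketch of the queue-connected stages described above, in Python (the .NET analogue would use `BlockingCollection<T>`; the stage bodies here are placeholders):

```python
import queue
import threading

def pipeline(items):
    """Two-stage pipeline: stages run concurrently, connected by a
    producer/consumer queue."""
    q = queue.Queue(maxsize=8)  # hand-off between the two stages
    results = []

    def stage1():               # e.g. parse / validate (producer)
        for item in items:
            q.put(item * 2)
        q.put(None)             # sentinel: no more work coming

    def stage2():               # e.g. persist / aggregate (consumer)
        while (item := q.get()) is not None:
            results.append(item + 1)

    t1 = threading.Thread(target=stage1)
    t2 = threading.Thread(target=stage2)
    t1.start(); t2.start()
    t1.join(); t2.join()
    return results

print(pipeline([1, 2, 3]))  # → [3, 5, 7]
```

Each element still flows through the stages in order (the linear processing requirement), but the stages themselves overlap in time.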
Speculative Processing
• Isn't there something you could be doing?
• Do the work now when you can; throw the results away if they aren't useful
Speculative Processing Examples
• Microprocessor branch prediction
• Search indexing
Speculative Processing Usage
• Shift work from a future, performance-critical operation to an earlier one
• Either always valid (never has to be rolled back) or easy to roll back
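One way to sketch the idea: start computing every candidate answer before knowing which one will be needed, then keep the useful result and discard the rest. A hypothetical Python illustration (`expensive_lookup` is a stand-in for any slow operation):

```python
from concurrent.futures import ThreadPoolExecutor

def expensive_lookup(key):
    # Stand-in for slow work (a query, a parse, an index build)
    return sum(range(key))

def speculative(candidates, choose_actual_key):
    # Speculate: kick off work for every candidate now, while we're idle
    with ThreadPoolExecutor() as pool:
        futures = {k: pool.submit(expensive_lookup, k) for k in candidates}
        needed = choose_actual_key()      # the real answer arrives later
        # Keep the result we needed; the other results are thrown away
        return futures[needed].result()

print(speculative([10, 20, 30], lambda: 20))  # → sum(range(20)) = 190
```

This trades wasted work (the discarded lookups) for lower latency on the critical path, which only pays off when the speculative work is cheap or the idle capacity is free.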
Latency – The Silent Killer
• The time for the first bit to get from here to there
Typical LAN: 0.4ms
It’s the Law
• Speed of light: 3×10^8 m/s
• About 0.24 seconds to geosynchronous orbit and back
• About 1 foot per nanosecond
• 3 GHz: 1/3 ns period = about 4 inches of light travel
New York to London: 5,500 km ≈ 18 ms at light speed
TCP socket establish: 54 ms
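The slide's numbers follow directly from the speed of light. A back-of-the-envelope check (geosynchronous altitude of ~35,786 km is a well-known figure, not from the slide):

```python
# Back-of-the-envelope checks of the latency figures above.
C = 299_792_458  # speed of light in a vacuum, m/s

# Geosynchronous orbit: ~35,786 km up, so a round trip is twice that.
geo_round_trip_s = 2 * 35_786_000 / C
print(f"{geo_round_trip_s:.3f} s")      # ~0.24 s

# Light covers roughly one foot per nanosecond.
feet_per_ns = C * 1e-9 / 0.3048
print(f"{feet_per_ns:.2f} ft/ns")       # just under 1

# A 3 GHz clock period is 1/3 ns: about 4 inches of light travel.
inches_per_cycle = (1 / 3e9) * C / 0.0254
print(f"{inches_per_cycle:.1f} in")     # ~4

# New York to London (~5,500 km) at light speed, one way.
ny_london_ms = 5_500_000 / C * 1000
print(f"{ny_london_ms:.1f} ms")         # ~18
```

These are hard lower bounds: no protocol, hardware, or optimization gets a signal there faster.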
[Chart: latency comparison on a log scale]
• L1 Cache: 2 ns
• Memory: 40 ns
• Local Storage (SSD): 50,000 ns
• LAN: 381,000 ns
• Internet: 18,000,000 ns
Caching
• Save results of earlier work nearby where they are handy to use again later
• Cheat: don't make the call
• Cheat more: apply in front of anything that's time consuming
Why Caching?
• Apps ask a lot of repeating questions
– Stateless applications even more so
• Answers don't change often
• Authoritative information is expensive
• Loading the world is impractical
Caching in Hardware
• Processor L1 cache (typically same core)
• Processor L2 cache (shared by cores)
• Processor L3 cache (between processor & main RAM)
• Disk controllers
• Disk drives
• …
.NET Caching Examples
• ASP.NET Output Cache
• System.Web.Cache (ASP.NET only)
• AppFabric Cache
Go Asynchronous
• Delegate the latency to something that will notify you when it's complete
• Do other useful stuff while waiting
– Otherwise you're just being efficient, not faster
• Maximize throughput by scheduling more work than could be done if there were no stalls
.NET Async Examples
• Standard async IO pattern
• .NET 4 Task<T>
• Combine with queuing to maximize throughput even without parallelization
Visual Studio Async CTP
• async methods will compile to run asynchronously
• await forces method to stall execution until the async call completes before proceeding
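The async/await model described for the CTP exists in other languages too. A Python `asyncio` sketch of the same shape, where three latency-bound "calls" overlap instead of running back to back (`fetch` and its delays are hypothetical stand-ins for real IO):

```python
import asyncio

async def fetch(name, delay):
    # Stand-in for a latency-bound call (network, disk). Awaiting it
    # yields control, so other work proceeds during the stall.
    await asyncio.sleep(delay)
    return name

async def main():
    # Three 0.1 s waits overlap: total time is ~0.1 s, not ~0.3 s.
    return await asyncio.gather(
        fetch("a", 0.1), fetch("b", 0.1), fetch("c", 0.1)
    )

print(asyncio.run(main()))  # → ['a', 'b', 'c']
```

As the slide warns, `await` alone only stalls politely; the win comes from scheduling other useful work during the stall, which is what `gather` does here.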
Batching
• Get your money’s worth out of every latency hit
• Trade off storage for duration
General Batching Examples
• Shipping – many packages on one truck
• Train travel
• TCP sockets
Batching in Code
• SQL connection pooling
• HTTP Keep-Alive
• DataSet / Entity collections
• CSS sprites
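The batching pattern amounts to buffering work and paying the per-trip latency once per batch instead of once per item. A hypothetical Python sketch (`BatchingSender` and its parameters are illustrative, not from any library):

```python
class BatchingSender:
    """Accumulates messages and sends them in batches, paying the
    per-send latency cost once per batch instead of once per message."""

    def __init__(self, send, batch_size=10):
        self._send = send              # the expensive, latency-bound call
        self._batch_size = batch_size
        self._pending = []             # the storage we trade for speed

    def add(self, message):
        self._pending.append(message)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self):
        if self._pending:
            self._send(self._pending)  # one latency hit for many items
            self._pending = []

sends = []
sender = BatchingSender(sends.append, batch_size=3)
for i in range(7):
    sender.add(i)
sender.flush()
print(sends)  # → [[0, 1, 2], [3, 4, 5], [6]]
```

The `_pending` buffer is the storage-for-duration trade from the slide: items wait longer individually, but total throughput goes up.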
Optimistic Messaging
• Assume it’s all going to work out and just keep sending
• Be ready to step back & go another way when it doesn’t work out
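A sketch of the optimistic shape: send everything without waiting for per-message confirmation, then repair only what failed. Both the transport and its failure pattern here are hypothetical, invented purely to illustrate the control flow:

```python
def optimistic_send(messages, transport):
    """Assume it's all going to work out: keep sending without waiting
    for per-message confirmation, then repair afterward."""
    for m in messages:
        transport.send(m)             # fire and keep sending
    for m in transport.failed():      # find out later what didn't land
        transport.send(m)             # step back and go another way

class FlakyTransport:
    """Hypothetical transport that silently drops every message
    divisible by 3 on its first attempt."""
    def __init__(self):
        self.delivered = []
        self._attempts = {}

    def send(self, m):
        self._attempts[m] = self._attempts.get(m, 0) + 1
        if self._attempts[m] == 1 and m % 3 == 0:
            return                    # lost, but we didn't stop to wait
        self.delivered.append(m)

    def failed(self):
        return [m for m in self._attempts if m not in self.delivered]

t = FlakyTransport()
optimistic_send(list(range(6)), t)
print(sorted(t.delivered))  # → [0, 1, 2, 3, 4, 5]
```

Compared with confirming each message before sending the next, this pays the round-trip latency once at the end rather than once per message, at the cost of the recovery logic.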
Side Points
• Stateful interaction generally increases the cost of latency
• Minimize copying
– It takes blocking time to copy data, introducing latency
• Your mileage may vary
– Latency on a LAN can be dramatically affected by hardware and configuration
Critical Lessons Learned
• Algorithms, Algorithms, Algorithms
• Plan for Latency & Failure
• Explicitly Design for Parallelism
Additional Information:
• Websites
– www.GibraltarSoftware.com
– www.eSymmetrix.com
• Follow up
– [email protected]
– Twitter: @kendallmiller