Data Structures and Algorithms: Analysis of Algorithms
Richard Newman
Players
• Boss/Manager/Customer
  – Wants a cheap solution
  – Cheap = efficient
• Programmer/developer
  – Wants to solve the problem and deliver the system
• Theoretician
  – Wants to understand
• Student
  – Might play any or all of these roles some day
Why Analyze Algorithms?
• Predict performance
• Compare algorithms
• Provide guarantees
• Understand theory
• Practical reason: avoid poor performance!
• Also – avoid logical/design errors
Algorithmic Success Stories
• DFT (Discrete Fourier Transform)
  – Take N samples of a waveform
  – Decompose them into periodic components
• Used in DVD, JPEG, MPEG, MRI, astrophysics, ...
• Brute force: $N^2$ steps
• FFT algorithm: $N \lg N$ steps
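To see where the $N^2$ comes from, here is a minimal brute-force DFT sketch, assuming a real-valued input signal; the class name NaiveDft is hypothetical. Each of the N outputs is a sum over all N samples, which gives the $N^2$ steps quoted above. An FFT computes the same outputs in $N \lg N$ steps.

```java
public class NaiveDft {
    // Hypothetical helper (illustration only): brute-force DFT of a real signal.
    // Returns {re, im}, the real and imaginary parts of the N outputs.
    public static double[][] dft(double[] x) {
        int n = x.length;
        double[] re = new double[n];
        double[] im = new double[n];
        for (int k = 0; k < n; k++) {          // N output frequencies...
            for (int t = 0; t < n; t++) {      // ...each a sum over N samples: N^2 steps
                double angle = -2 * Math.PI * k * t / n;
                re[k] += x[t] * Math.cos(angle);
                im[k] += x[t] * Math.sin(angle);
            }
        }
        return new double[][] {re, im};
    }

    public static void main(String[] args) {
        double[][] X = dft(new double[] {1, 1, 1, 1});
        System.out.printf("DC term: %.3f%n", X[0][0]);   // sum of the samples
    }
}
```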
Algorithmic Success Stories
• N-Body Simulation
  – Simulate gravitational interactions among N bodies
• Brute force: $N^2$ steps
• Barnes-Hut algorithm: $N \lg N$ steps
The Challenge
Will my algorithm be able to solve a problem with a large, practical input?
  – Time
  – Memory
  – Power
Knuth (1970s): use the scientific method to understand performance.
Scientific Method
• Observe some feature of the natural world.
• Hypothesize a model that is consistent with the observations.
• Predict events using the hypothesis.
• Test the predictions experimentally.
• Iterate until the hypothesis and observations agree.
Scientific Method Principles
• Experiments must be reproducible.
• Hypotheses must be falsifiable.
Example: 3-Sum
Given N distinct integers, how many triples sum to exactly zero?
% cat 8ints.txt
8
30 -40 -20 -10 40 0 10 5
% ./ThreeSum 8ints.txt
4
3-Sum Brute-Force Algorithm
count = 0
for i = 0 to N-1
    for j = i+1 to N-1
        for k = j+1 to N-1
            if a[i] + a[j] + a[k] == 0
                count++
return count
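The pseudocode above translates almost line for line into Java. This is a sketch, assuming the input file format from the 3-Sum example (the first integer is N, followed by the N integers); the class name ThreeSum matches the command in the example, but the code itself is an illustration, not the course's program.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class ThreeSum {
    // Illustrative sketch: brute force checks every triple i < j < k (~N^3/6 triples).
    public static int count(int[] a) {
        int n = a.length;
        int count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                for (int k = j + 1; k < n; k++)
                    if ((long) a[i] + a[j] + a[k] == 0)  // long arithmetic avoids int overflow
                        count++;
        return count;
    }

    // Assumed file format: first integer is N, followed by N integers.
    public static void main(String[] args) throws FileNotFoundException {
        try (Scanner in = new Scanner(new File(args[0]))) {
            int n = in.nextInt();
            int[] a = new int[n];
            for (int i = 0; i < n; i++) a[i] = in.nextInt();
            System.out.println(count(a));
        }
    }
}
```

Run as `java ThreeSum 8ints.txt`; on the example file it finds the 4 triples (30, -40, 10), (30, -20, -10), (-40, 40, 0) and (-10, 0, 10).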
Measuring Running Time
• Manually
  – Start a stopwatch when starting the program
  – Stop it when the program finishes
  – Can do this in a script (date)
• Internally
  – Use the C library function time()
  – Can insert calls around the code of interest
  – Avoid timing initialization, etc.
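In Java, the counterpart of inserting time() calls is to read a clock before and after the code of interest. A minimal sketch, assuming System.nanoTime() (a real JDK method that returns a nanosecond counter meant for measuring elapsed time); the Stopwatch class is a hypothetical helper, and the array setup is deliberately left outside the timed region.

```java
public class Stopwatch {
    // Hypothetical helper (illustration only), built on System.nanoTime().
    private final long start;

    /** Starts timing when the stopwatch is created. */
    public Stopwatch() {
        start = System.nanoTime();
    }

    /** Seconds elapsed since the stopwatch was created. */
    public double elapsedTime() {
        return (System.nanoTime() - start) / 1e9;
    }

    public static void main(String[] args) {
        // Initialization (not timed): fill an array with pseudo-random values.
        int n = 1_000_000;
        int[] a = new int[n];
        java.util.Random rnd = new java.util.Random(42);
        for (int i = 0; i < n; i++) a[i] = rnd.nextInt(100);

        // Time only the code of interest: counting zeros.
        Stopwatch timer = new Stopwatch();
        int count = 0;
        for (int i = 0; i < n; i++)
            if (a[i] == 0) count++;
        System.out.println(count + " zeros, " + timer.elapsedTime() + " s");
    }
}
```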
Measuring Running Time
Strategy
• Run the program on various input sizes
• Measure the time for each
• Can do this in a script also
• Plot the results
Tools: http://www.opensourcetesting.org/performance.php
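The strategy can also be scripted inside the program itself. This is a sketch under stated assumptions: inputs are pseudo-random ints in [-1,000,000, 1,000,000] (they may repeat, unlike the "distinct" assumption, which does not matter for timing), and TimeTrials, threeSum and timeTrial are hypothetical names. Only the algorithm is timed, not the input generation.

```java
import java.util.Random;

public class TimeTrials {
    // Hypothetical driver (illustration only).
    private static long sink;  // keeps the JIT from discarding unused results

    /** Brute-force 3-sum: the triple loop from the 3-Sum pseudocode. */
    static int threeSum(int[] a) {
        int n = a.length, count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                for (int k = j + 1; k < n; k++)
                    if ((long) a[i] + a[j] + a[k] == 0) count++;
        return count;
    }

    /** Runs threeSum on n random ints; returns elapsed seconds. */
    static double timeTrial(int n, Random rnd) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = rnd.nextInt(2_000_001) - 1_000_000;
        long start = System.nanoTime();        // time the algorithm, not the setup
        sink += threeSum(a);
        return (System.nanoTime() - start) / 1e9;
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);
        System.out.println("     N   time (s)");
        for (int n = 250; n <= 2000; n *= 2)
            System.out.printf("%6d %9.3f%n", n, timeTrial(n, rnd));
    }
}
```

The printed (N, time) pairs can then be plotted, as the slide suggests.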
Measuring Running Time
What do you think the time will be for an input of size 16,000? Why?

| N | Time (s) |
|---|---|
| 250 | 0.0 |
| 500 | 0.0 |
| 1000 | 0.1 |
| 2000 | 0.8 |
| 4000 | 6.4 |
| 8000 | 51.1 |
| 16000 | ? |
Data Analysis
Standard plot
• Plot running time T(N) vs. input size N
• Use linear scales for both axes
Data Analysis
Log-log plot
• Plot lg T(N) vs. lg N
• If the plot is a straight line, its slope m gives the power:
$$\lg y = m \lg x + b \quad\Longrightarrow\quad y = 2^b x^m$$
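Instead of reading the slope off a plot, one can fit the straight line numerically. A minimal sketch, assuming an ordinary least-squares fit of lg T against lg N (one reasonable choice, not necessarily the method behind the slides); LogLogFit and fit are hypothetical names. On the four nonzero measurements from the table, the fitted slope m comes out close to 3 and $2^b$ close to $10^{-10}$.

```java
public class LogLogFit {
    // Hypothetical helper (illustration only): least-squares line on a log-log plot.
    static double lg(double v) {
        return Math.log(v) / Math.log(2);
    }

    /** Least-squares fit of lg y = m lg x + b; returns {m, b}. */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            double lx = lg(x[i]), ly = lg(y[i]);
            sx += lx;
            sy += ly;
            sxx += lx * lx;
            sxy += lx * ly;
        }
        double m = (n * sxy - sx * sy) / (n * sxx - sx * sx);  // slope
        double b = (sy - m * sx) / n;                          // intercept
        return new double[] {m, b};
    }

    public static void main(String[] args) {
        // Nonzero measurements from the table (a time of 0.0 s has no logarithm).
        double[] N = {1000, 2000, 4000, 8000};
        double[] T = {0.1, 0.8, 6.4, 51.1};
        double[] mb = fit(N, T);
        System.out.printf("m = %.3f, a = 2^b = %.3e%n", mb[0], Math.pow(2, mb[1]));
    }
}
```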
Hypothesis, Prediction, Validation
• Hypothesis: running time is about $10^{-10} N^3$ seconds
• Prediction: T(16,000) = 409.6 s
• Observation: T(16,000) = 410.8 s
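The prediction is simply the hypothesis evaluated at N = 16,000:

$$T(16{,}000) \approx 10^{-10} \times \left(1.6 \times 10^{4}\right)^{3} = 10^{-10} \times 4.096 \times 10^{12} = 409.6 \text{ s}$$

The observed 410.8 s is within about 0.3% of this prediction.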
Doubling Hypothesis
A quick way to estimate the slope m in a log-log plot.
Strategy:
• Double the size of the input for each run
• Run the program on the doubled input sizes
• Measure the time for each
• Take the ratio of successive times
• If the running time is polynomial, the lg of the ratio should converge to the power
Doubling Hypothesis
| N | Time (s) | Ratio | lg ratio |
|---|---|---|---|
| 500 | 0.0 | - | - |
| 1000 | 0.1 | 6.9 | 2.8 |
| 2000 | 0.8 | 7.7 | 2.9 |
| 4000 | 6.4 | 8.0 | 3.0 |
| 8000 | 51.1 | 8.0 | 3.0 |
| 16000 | 410.8 | 8.0 | 3.0 |
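The ratio and lg-ratio columns can be computed mechanically. A minimal sketch, assuming the input size doubles between consecutive entries; DoublingRatios and lgRatios are hypothetical names. It uses the rounded times from the table for N = 2000 to 16000, so its values may differ slightly from the table's columns.

```java
public class DoublingRatios {
    // Hypothetical helper (illustration only).
    /** lg of each ratio times[i] / times[i-1]; assumes N doubles between entries. */
    static double[] lgRatios(double[] times) {
        double[] b = new double[times.length - 1];
        for (int i = 1; i < times.length; i++)
            b[i - 1] = Math.log(times[i] / times[i - 1]) / Math.log(2);
        return b;
    }

    public static void main(String[] args) {
        int[] N = {2000, 4000, 8000, 16000};
        double[] T = {0.8, 6.4, 51.1, 410.8};   // seconds, from the table
        double[] b = lgRatios(T);
        for (int i = 0; i < b.length; i++)
            System.out.printf("%6d  ratio %.2f  lg ratio %.2f%n", N[i + 1], T[i + 1] / T[i], b[i]);
    }
}
```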
Doubling Hypothesis
• Hypothesis: running time is about $a N^b$, with $b = \lg(\text{ratio of running times})$
• Caveat! This cannot identify logarithmic factors
• How to find a? Run a large input, equate its measured time to the hypothesized time $a N^b$ (with b as estimated), then solve for a
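For example, taking b = 3 and the N = 8000 row of the table:

$$a = \frac{T(N)}{N^b} = \frac{51.1}{8000^3} = \frac{51.1}{5.12 \times 10^{11}} \approx 0.998 \times 10^{-10}$$

This agrees with the $10^{-10}$ coefficient used in the hypothesis.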
Experimental Algorithmics
System-independent effects
• Algorithm
• Input data
Experimental Algorithmics
System-dependent effects
• Hardware: CPU, memory, cache, ...
• Software: compiler, interpreter, garbage collection, ...
• System: OS, network, other processes
Experimental Algorithmics
Bad news
• Hard to get precise measurements
Good news
• Easier than in other physical sciences!
• Can run a huge number of experiments
Mathematical Running Time Models
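• Total running time = sum of (cost × frequency) over all operations:
$$T(N) = \sum_{\text{op}} \text{cost}(\text{op}) \cdot \text{freq}(\text{op})$$
• Need to analyze the program to determine the set of operations over which the weighted sum is computed
• Cost depends on the machine and compiler
• Frequency depends on the algorithm and input data
Donald Knuth, 1974 Turing Award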
How to Estimate Constants?

| Operation | Example | Time* (ns) |
|---|---|---|
| integer add | a + b | 2.1 |
| integer multiply | a * b | 2.4 |
| integer divide | a / b | 5.4 |
| floating-point add | a + b | 4.6 |
| floating-point multiply | a * b | 4.2 |
| floating-point divide | a / b | 13.5 |
| sine | Math.sin(theta) | 91.3 |
| arctangent | Math.atan2(y, x) | 129.0 |
| ... | ... | ... |

*Running OS X on a MacBook Pro, 2.2 GHz, 2 GB RAM
Experimental Algorithmics
Observation: most primitive operations take constant time.
Warning: non-primitive operations often do not!
How many instructions, as a function of input size N?
    int count = 0;
    for (int i = 0; i < N; ++i)
        if (a[i] == 0)
            count++;
Experimental Algorithmics
    int count = 0;
    for (int i = 0; i < N; ++i)
        if (a[i] == 0)
            count++;
| Operation | Frequency |
|---|---|
| variable declaration | 2 |
| assignment | 2 |
| less-than compare (<) | N + 1 |
| equal-to compare (==) | N |
| array access | N |
| increment | N to 2N |
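The frequency column can be checked by instrumenting the loop with one counter per operation type. A minimal sketch; OpCount and tally are hypothetical names, and the for loop is unrolled into a while loop so that the final, failing `i < N` test is counted too.

```java
public class OpCount {
    // Hypothetical instrumentation (illustration only).
    /**
     * Runs the zero-counting loop on a, tallying operations as in the table.
     * Returns {lessThan, equalTo, arrayAccess, increment}.
     */
    static long[] tally(int[] a) {
        int N = a.length;
        long lessThan = 0, equalTo = 0, arrayAccess = 0, increment = 0;
        int count = 0;
        int i = 0;
        while (true) {
            lessThan++;                    // evaluate i < N (including the final, failing test)
            if (!(i < N)) break;
            arrayAccess++;                 // read a[i]
            equalTo++;                     // compare a[i] == 0
            if (a[i] == 0) {
                count++;
                increment++;               // count++ (only when a zero is found)
            }
            i++;
            increment++;                   // ++i (every iteration)
        }
        return new long[] {lessThan, equalTo, arrayAccess, increment};
    }

    public static void main(String[] args) {
        long[] t = tally(new int[] {0, 1, 0, 2, 3});     // N = 5, two zeros
        System.out.println(java.util.Arrays.toString(t)); // [6, 5, 5, 7]
    }
}
```

With no zeros the increment count is N; with all zeros it is 2N, matching the "N to 2N" range.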
Counting Frequency: Loops
    int count = 0;
    for (int i = 0; i < N; ++i)
        for (int j = i+1; j < N; ++j)
            if (a[i] + a[j] == 0)
                count++;
How many additions in the loop?
$$(N-1) + (N-2) + \cdots + 3 + 2 + 1 = \tfrac12 N(N-1)$$
The exact number of other operations? Tedious and difficult...
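The $\tfrac12 N(N-1)$ count can be confirmed by running the same double loop with a counter in place of the body. A minimal sketch; PairCount and additions are hypothetical names.

```java
public class PairCount {
    // Hypothetical helper (illustration only).
    /** Number of a[i] + a[j] additions performed by the double loop on N items. */
    static long additions(int N) {
        long adds = 0;
        for (int i = 0; i < N; i++)
            for (int j = i + 1; j < N; j++)
                adds++;                    // one addition per inner-loop iteration
        return adds;
    }

    public static void main(String[] args) {
        for (int N = 10; N <= 10_000; N *= 10)
            System.out.println(N + ": counted " + additions(N)
                    + ", formula " + (long) N * (N - 1) / 2);
    }
}
```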
Experimental Algorithmics
Observation: counting every operation is tedious at best, and the measurements may still be noisy!
Approach: Simplify! Use some basic operation as a proxy, e.g., array accesses.
    int count = 0;
    for (int i = 0; i < N; ++i)
        for (int j = i+1; j < N; ++j)
            if (a[i] + a[j] == 0)
                count++;
Experimental Algorithmics
Observation: lower-order terms become less important as the input size increases (they may still matter for "small" inputs).
Approach: Simplify! Use ~ notation and ignore lower-order terms:
  – N large: they are negligible
  – N small: who cares?
Leading-Term Approximation: Examples
• Ex 1: $\tfrac16 N^3 + 20N + 16 \sim \tfrac16 N^3$
• Ex 2: $\tfrac16 N^3 + 100 N^{4/3} + 56 \sim \tfrac16 N^3$
• Ex 3: $\tfrac16 N^3 - \tfrac12 N^2 + \tfrac13 N \sim \tfrac16 N^3$
Discard lower-order terms: e.g., for Ex 3 with N = 1000, 166.67 million vs. 166.17 million.
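Watching the ratio f(N)/g(N) approach 1 makes the approximation concrete. A minimal sketch using Ex 3, which equals N(N-1)(N-2)/6, the number of triples in 3-Sum; TildeRatio, f and g are hypothetical names.

```java
public class TildeRatio {
    // Hypothetical helper (illustration only).
    /** Ex 3: (1/6)N^3 - (1/2)N^2 + (1/3)N, which equals N(N-1)(N-2)/6. */
    static double f(double n) {
        return n * n * n / 6 - n * n / 2 + n / 3;
    }

    /** Its leading term, (1/6)N^3. */
    static double g(double n) {
        return n * n * n / 6;
    }

    public static void main(String[] args) {
        for (double n = 10; n <= 1e6; n *= 10)
            System.out.printf("N = %9.0f   f(N)/g(N) = %.6f%n", n, f(n) / g(n));
    }
}
```

Since f(N)/g(N) = 1 - 3/N + 2/N², the ratio is already about 0.997 at N = 1000.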
Leading-Term Approximation
Technical definition:
$$f(N) \sim g(N) \quad\text{means}\quad \lim_{N \to \infty} \frac{f(N)}{g(N)} = 1$$
Bottom Line
    int count = 0;
    for (int i = 0; i < N; ++i)
        for (int j = i+1; j < N; ++j)
            if (a[i] + a[j] == 0)
                count++;
How many array accesses in the loop? ~ $N^2$ (two per execution of the if test, which runs ½ N(N-1) times)
Use a cost model and ~ notation!
Example: 3-Sum
    int count = 0;
    for (int i = 0; i < N; ++i)
        for (int j = i+1; j < N; ++j)
            for (int k = j+1; k < N; ++k)
                if (a[i] + a[j] + a[k] == 0)
                    count++;
How many array accesses in the loop? The if statement executes
$$\binom{N}{3} = \frac{N(N-1)(N-2)}{3!} \sim \tfrac16 N^3$$
times, so there are ~ ½ $N^3$ array accesses (3 per execution).
Use a cost model and ~ notation!
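Counting the accesses directly shows the ½ N³ estimate taking hold as N grows. A minimal sketch; TripleCount and arrayAccesses are hypothetical names, and the loop body only tallies the three reads a[i], a[j], a[k] instead of performing them.

```java
public class TripleCount {
    // Hypothetical helper (illustration only).
    /** Array accesses made by the 3-Sum triple loop (3 per if test). */
    static long arrayAccesses(int N) {
        long accesses = 0;
        for (int i = 0; i < N; i++)
            for (int j = i + 1; j < N; j++)
                for (int k = j + 1; k < N; k++)
                    accesses += 3;         // a[i], a[j], a[k]
        return accesses;
    }

    public static void main(String[] args) {
        for (int N = 100; N <= 800; N *= 2) {
            long exact = arrayAccesses(N);
            double tilde = 0.5 * N * N * N;       // ~ (1/2) N^3
            System.out.printf("N = %d: %d accesses, ratio to N^3/2 = %.4f%n",
                    N, exact, exact / tilde);
        }
    }
}
```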
Estimating Discrete Sums
• Take Discrete Math (remember?): telescoping series, inductive proof
• Approximate with an integral (doesn't always work!)
• Use Maple or Wolfram Alpha
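As an illustration of the integral approach (standard results, shown here as worked examples):

$$\sum_{i=1}^{N} i \sim \int_{1}^{N} x\,dx \sim \tfrac12 N^2, \qquad \sum_{i=1}^{N} \frac{1}{i} \sim \int_{1}^{N} \frac{dx}{x} = \ln N$$

$$\sum_{i=1}^{N}\sum_{j=i}^{N}\sum_{k=j}^{N} 1 \sim \int_{1}^{N}\int_{x}^{N}\int_{y}^{N} dz\,dy\,dx = \frac{(N-1)^3}{6} \sim \tfrac16 N^3$$

It can fail for fast-growing terms: $\sum_{i=0}^{N} 2^i = 2^{N+1} - 1 \sim 2 \cdot 2^N$, but $\int_{0}^{N} 2^x\,dx = (2^N - 1)/\ln 2 \sim 1.44 \cdot 2^N$, so the leading constant comes out wrong.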
Takeaway
• In principle, accurate mathematical models are available
• In practice:
  – Formulas can be complicated
  – Advanced math might be needed
  – Models are subject to noise anyway
  – Exact models: leave to experts!
• We will use approximate models
Order-of-Growth Classes

| Order of growth | Name | Typical code | Description | Example | T(2N)/T(N) |
|---|---|---|---|---|---|
| 1 | constant | a = b + c | statement | add two numbers | 1 |
| $\log N$ | logarithmic | while (N > 1) N = N/2 | divide in half | binary search | ~1 |
| $N$ | linear | for (i = 0 to N-1) { ... } | loop | find the maximum | 2 |
| $N \log N$ | linearithmic | see sorting | divide and conquer | mergesort | ~2 |
| $N^2$ | quadratic | for (i = 0 to N-1) for (j = 0 to N-1) { ... } | double loop | check all pairs | 4 |
| $N^3$ | cubic | for (i = 0 to N-1) for (j = 0 to N-1) for (k = 0 to N-1) { ... } | triple loop | check all triples | 8 |
| $2^N$ | exponential | see combinatorial search | exhaustive search | check all subsets | T(N) |
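The last column can be reproduced by evaluating each growth function at N and 2N. A minimal sketch; GrowthRatios, lg and ratio are hypothetical names, and N = 2^20 is an arbitrary "large" size. The exponential row uses a small N because $2^{2^{21}}$ overflows a double; its ratio $2^{2N}/2^N = 2^N$ is T(N) itself.

```java
import java.util.function.DoubleUnaryOperator;

public class GrowthRatios {
    // Hypothetical helper (illustration only).
    static double lg(double x) {
        return Math.log(x) / Math.log(2);
    }

    /** The doubling ratio T(2N) / T(N) for a running-time function T. */
    static double ratio(DoubleUnaryOperator T, double N) {
        return T.applyAsDouble(2 * N) / T.applyAsDouble(N);
    }

    public static void main(String[] args) {
        String[] names = {"1", "log N", "N", "N log N", "N^2", "N^3"};
        DoubleUnaryOperator[] fs = {
            x -> 1.0, x -> lg(x), x -> x, x -> x * lg(x), x -> x * x, x -> x * x * x
        };
        double N = 1 << 20;                              // about one million
        for (int i = 0; i < fs.length; i++)
            System.out.printf("%-8s %.3f%n", names[i], ratio(fs[i], N));
        // 2^N overflows a double at N = 2^20, so use N = 20: the ratio equals 2^N = T(N).
        System.out.println("2^N (N = 20): " + ratio(x -> Math.pow(2, x), 20));
    }
}
```

The logarithmic and linearithmic rows come out slightly above 1 and 2 (21/20 and 2 × 21/20 at this N), which is why the table marks them with ~.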
Order-of-Growth
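Definition: If f(N) ~ c g(N) for some constant c > 0, then the order of growth of f(N) is g(N), written here as f(N) is O(g(N)).
  – Ignores the leading coefficient
  – Ignores lower-order terms
Brassard notation: O(g(N)) is treated as a set of functions. (Strictly, O(g(N)) contains every function that grows no faster than g(N); the set of functions with exactly the same order is Θ(g(N)).)
So the 3-Sum algorithm is order $N^3$.
  – The leading coefficient depends on hardware, compiler, etc.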
Order-of-Growth
Good news! The following set of functions suffices to describe the order of growth of most algorithms:
$1,\ \log N,\ N,\ N \log N,\ N^2,\ N^3,\ 2^N,\ N!$
[Figure: Order-of-Growth]
Binary Search
Goal: Given a sorted array and a key, find the index of the key in the array.
Binary search: compare the key against the middle entry (of what is left):
  – Too small: go left
  – Too big: go right
  – Equal: found
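The three cases map directly onto an iterative loop over the range [lo, hi] that is still left. A minimal sketch; BinarySearch and indexOf are hypothetical names (the JDK's own version is java.util.Arrays.binarySearch), and -1 signals a missing key.

```java
public class BinarySearch {
    // Hypothetical helper (illustration only).
    /** Returns the index of key in the sorted array a, or -1 if it is absent. */
    static int indexOf(int[] a, int key) {
        int lo = 0, hi = a.length - 1;         // a[lo..hi] is what is left
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;      // middle entry (overflow-safe form)
            if      (key < a[mid]) hi = mid - 1;   // too small: go left
            else if (key > a[mid]) lo = mid + 1;   // too big: go right
            else return mid;                       // equal: found
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] a = {6, 13, 14, 25, 33, 43, 51, 53, 64, 72, 84, 93, 95, 96, 97};
        System.out.println(indexOf(a, 33));   // 4
        System.out.println(indexOf(a, 34));   // -1
    }
}
```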
Binary Search Implementation
Trivial to implement?
• First binary search published in 1946
• First bug-free version in 1962
• Bug in Java's Arrays.binarySearch() discovered in 2006!
  http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html
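The 2006 bug was the midpoint computation: `(low + high) / 2` overflows int once low + high exceeds Integer.MAX_VALUE, which happens for very large arrays. A minimal sketch of the failure and two overflow-free fixes; MidpointOverflow, buggyMid and safeMid are hypothetical names, and the index values are chosen only to trigger the overflow (no huge array is allocated).

```java
public class MidpointOverflow {
    // Hypothetical helpers (illustration only).
    /** The classic midpoint: lo + hi can exceed Integer.MAX_VALUE and wrap negative. */
    static int buggyMid(int lo, int hi) {
        return (lo + hi) / 2;
    }

    /** Overflow-free: hi - lo cannot overflow when 0 <= lo <= hi. */
    static int safeMid(int lo, int hi) {
        return lo + (hi - lo) / 2;       // equivalently: (lo + hi) >>> 1
    }

    public static void main(String[] args) {
        int lo = 1_500_000_000, hi = 2_000_000_000;   // indices near the int limit
        System.out.println(buggyMid(lo, hi));          // -397483648
        System.out.println(safeMid(lo, hi));           // 1750000000
    }
}
```

The unsigned shift `>>> 1` also works because it reinterprets the wrapped sum as an unsigned 32-bit value before halving.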
Binary Search – Math Analysis
Proposition: Binary search uses at most $1 + \lg N$ key compares for a sorted array of size N.
Definition: T(N) = number of key compares on a sorted array of size ≤ N.
Recurrence:
$$T(N) \le T(N/2) + 1 \ \text{ for } N > 1, \qquad T(1) = 1$$
Binary Search – Math Analysis
Recurrence: $T(N) \le T(N/2) + 1$ for N > 1, T(1) = 1.
Proof sketch (assume N is a power of 2):
$$\begin{aligned}
T(N) &\le T(N/2) + 1 \\
     &\le T(N/4) + 1 + 1 \\
     &\le T(N/8) + 1 + 1 + 1 \\
     &\;\;\vdots \\
     &\le T(N/N) + \underbrace{1 + 1 + \cdots + 1}_{\lg N} \\
     &= 1 + \lg N
\end{aligned}$$
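The bound can also be checked empirically by counting compares over every possible search. A minimal sketch, assuming each loop iteration counts as one (three-way) key compare, as in the analysis above; BinarySearchCompares, compares and maxCompares are hypothetical names. Even keys are present and odd keys fall in the gaps, so both successful and unsuccessful searches are covered.

```java
public class BinarySearchCompares {
    // Hypothetical instrumentation (illustration only).
    /** Number of 3-way key compares used to search for key in sorted a. */
    static int compares(int[] a, int key) {
        int lo = 0, hi = a.length - 1, c = 0;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            c++;                                   // one compare of key vs. a[mid]
            if      (key < a[mid]) hi = mid - 1;
            else if (key > a[mid]) lo = mid + 1;
            else break;                            // found
        }
        return c;
    }

    /** Worst case over all present keys and all gaps between them. */
    static int maxCompares(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = 2 * i;  // even keys are present
        int max = 0;
        for (int key = -1; key <= 2 * n; key++)    // odd keys are absent
            max = Math.max(max, compares(a, key));
        return max;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000, 1024})
            System.out.printf("N = %4d: worst case %2d, bound 1 + lg N = %.2f%n",
                    n, maxCompares(n), 1 + Math.log(n) / Math.log(2));
    }
}
```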
3-Sum
• Version 0: $N^3$ time, N space
• Version 1: $N^2 \log N$ time, N space
• Version 2: $N^2$ time, N space
3-Sum: $N^2 \log N$ Algorithm
Algorithm
  – Sort the N (distinct) integers
  – For each pair of numbers a[i] and a[j], binary search for -(a[i] + a[j])
Analysis: order of growth is $N^2 \log N$
  – Step 1: $N^2$ using insertion sort
  – Step 2: $N^2 \log N$ with binary search
Can achieve $N^2$ by modifying the binary search step.
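Both versions can be sketched in a few lines, assuming distinct integers small enough that pairwise and triple sums fit in an int; ThreeSumFast and its method names are hypothetical. The first method follows the slide but swaps in the library sort Arrays.sort for insertion sort (sorting is not the bottleneck either way) and uses the real JDK method Arrays.binarySearch, counting a triple only when k > j so each triple is counted once. The second method is one common way to reach $N^2$ (assumed here, not necessarily the modification the slide has in mind): replace the binary search with a two-pointer scan of the sorted remainder.

```java
import java.util.Arrays;

public class ThreeSumFast {
    // Hypothetical class (illustration only); assumes distinct ints with no sum overflow.
    /** N^2 log N: sort, then binary search for -(a[i] + a[j]) for each pair. */
    static int countBinarySearch(int[] input) {
        int[] a = input.clone();
        Arrays.sort(a);
        int n = a.length, count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++) {
                int k = Arrays.binarySearch(a, -(a[i] + a[j]));
                if (k > j) count++;        // count each triple once: i < j < k
            }
        return count;
    }

    /** N^2: for each i, scan the rest of the sorted array from both ends. */
    static int countTwoPointer(int[] input) {
        int[] a = input.clone();
        Arrays.sort(a);
        int n = a.length, count = 0;
        for (int i = 0; i < n - 2; i++) {
            int lo = i + 1, hi = n - 1;
            while (lo < hi) {
                int sum = a[i] + a[lo] + a[hi];
                if (sum == 0) { count++; lo++; hi--; }  // distinct values: move both ends
                else if (sum < 0) lo++;                 // too small: raise the low end
                else hi--;                              // too big: lower the high end
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] a = {30, -40, -20, -10, 40, 0, 10, 5};
        System.out.println(countBinarySearch(a) + " " + countTwoPointer(a));  // 4 4
    }
}
```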
Comparing Programs
Hypothesis: Version 1 is significantly faster in practice than Version 0.

Version 0

| N | Time (s) |
|---|---|
| 1000 | 0.1 |
| 2000 | 0.8 |
| 4000 | 6.4 |
| 8000 | 51.1 |

Version 1

| N | Time (s) |
|---|---|
| 1000 | 0.14 |
| 2000 | 0.18 |
| 4000 | 0.34 |
| 8000 | 0.96 |
| 16000 | 3.67 |
| 32000 | 14.88 |
| 64000 | 59.16 |

Theory works well in practice!
Memory
• Bit: 0 or 1 (binary digit)
• Byte: 8 bits (wasn't always that way)
• Megabyte (MB): 1 million bytes (NIST and networking) or $2^{20}$ bytes (everybody else)
• Gigabyte (GB): 1 billion bytes (NIST and networking) or $2^{30}$ bytes (everybody else)
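The gap between the two conventions grows with each unit:

$$2^{20} = 1{,}048{,}576 \approx 1.049 \times 10^{6}, \qquad 2^{30} = 1{,}073{,}741{,}824 \approx 1.074 \times 10^{9}$$

so the binary megabyte is about 4.9% larger than the decimal one, and the binary gigabyte about 7.4% larger.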
Memory
64-bit machine: assume 8-byte pointers
• Can address more memory
• Pointers use more space
• Some JVMs "compress" ordinary object pointers to 4 bytes to avoid this cost
Typical Memory Usage
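Primitive types

| Type | Bytes |
|---|---|
| boolean | 1 |
| byte | 1 |
| char | 2 |
| int | 4 |
| float | 4 |
| long | 8 |
| double | 8 |

1-D arrays

| Type | Bytes |
|---|---|
| char[] | 2N + 24 |
| int[] | 4N + 24 |
| double[] | 8N + 24 |

2-D arrays

| Type | Bytes |
|---|---|
| char[][] | ~2MN |
| int[][] | ~4MN |
| double[][] | ~8MN |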
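The array rows follow from a simple model, assumed here to match the slides' 64-bit conventions: a 24-byte array header (16 bytes of object overhead, a 4-byte length, and 4 bytes of padding), then the element data, rounded up to a multiple of 8. A 2-D array is modeled as an array of M references to M rows of N entries each. ArrayMemory and its methods are hypothetical names, and real JVMs vary (for example, with compressed pointers).

```java
public class ArrayMemory {
    // Hypothetical model (illustration only) of the slides' 64-bit memory costs.
    static final int HEADER = 24;   // 16 overhead + 4 (length) + 4 (padding)
    static final int REF = 8;       // 64-bit reference, no compressed pointers

    /** Rounds up to the next multiple of 8 (objects are 8-byte aligned). */
    static long pad(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    /** Bytes for a 1-D array of n elements, each elemBytes wide. */
    static long array1D(long n, int elemBytes) {
        return pad(HEADER + n * elemBytes);
    }

    /** Bytes for an M-by-N 2-D array: M references plus M row arrays. */
    static long array2D(long m, long n, int elemBytes) {
        return array1D(m, REF) + m * array1D(n, elemBytes);
    }

    public static void main(String[] args) {
        System.out.println("int[1000]:       " + array1D(1000, 4));        // 4N + 24 = 4024
        System.out.println("double[1000]:    " + array1D(1000, 8));        // 8N + 24 = 8024
        System.out.println("int[1000][1000]: " + array2D(1000, 1000, 4));  // ~4MN
    }
}
```

For int[1000][1000] the model gives 4,032,024 bytes, within 1% of 4MN = 4,000,000, which is why the table states the 2-D costs only up to ~.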
Typical Java Memory Usage
• Object overhead: 16 bytes
• Object reference: 8 bytes
• Padding: each object uses a multiple of 8 bytes
Ex: Date object
    public class Date {
        private int day;
        private int month;
        private int year;
        ...
    }

| Component | Bytes |
|---|---|
| object overhead | 16 |
| day (int) | 4 |
| month (int) | 4 |
| year (int) | 4 |
| padding | 4 |
| total | 32 |
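The same accounting can be written as a one-line formula: overhead plus field bytes, rounded up to a multiple of 8. A minimal sketch under the slide's model (16-byte overhead, 8-byte references); ObjectMemory and objectBytes are hypothetical names, and the Node example is a hypothetical class with one int and two references. The count is shallow: it excludes whatever the references point to.

```java
public class ObjectMemory {
    // Hypothetical model (illustration only) of the slide's object costs.
    static final int OVERHEAD = 16;  // per-object header
    static final int REF = 8;        // object reference

    /** Shallow size of one object whose fields take fieldBytes, padded to a multiple of 8. */
    static long objectBytes(long fieldBytes) {
        return (OVERHEAD + fieldBytes + 7) / 8 * 8;
    }

    public static void main(String[] args) {
        // Date: three int fields (day, month, year).
        System.out.println("Date: " + objectBytes(3 * 4));        // 16 + 12 = 28 -> 32
        // Hypothetical Node: one int field and two references.
        System.out.println("Node: " + objectBytes(4 + 2 * REF));  // 16 + 20 = 36 -> 40
    }
}
```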
Summary
Empirical analysis:
• Execute the program to perform experiments
• Assume a power law and formulate a hypothesis for the running time
• The model allows us to make predictions
Summary
Mathematical analysis:
• Analyze the algorithm to count the frequency of operations
• Use tilde notation to simplify the analysis
• The model allows us to explain behavior
Summary
Scientific method:
• The mathematical model is independent of a particular system; it applies to machines not yet built
• The empirical approach is needed to validate the theory and to make predictions
Next – Lecture 5
Read Chapter 3: Basic data structures