ECE 2162 Evaluation & Metrics - University of Pittsburghjuy9/2162/slides-F2009/2_EvalMetrics.pdf ·...
Transcript of ECE 2162 Evaluation & Metrics - University of Pittsburghjuy9/2162/slides-F2009/2_EvalMetrics.pdf ·...
![Page 1: ECE 2162 Evaluation & Metrics - University of Pittsburghjuy9/2162/slides-F2009/2_EvalMetrics.pdf · 2162 Fall 2009 2 Performance •Two common measures – Latency (how long to do](https://reader031.fdocuments.us/reader031/viewer/2022021911/5c41136493f3c338be2f5d8b/html5/thumbnails/1.jpg)
ECE 2162Evaluation & Metrics
![Page 2: ECE 2162 Evaluation & Metrics - University of Pittsburghjuy9/2162/slides-F2009/2_EvalMetrics.pdf · 2162 Fall 2009 2 Performance •Two common measures – Latency (how long to do](https://reader031.fdocuments.us/reader031/viewer/2022021911/5c41136493f3c338be2f5d8b/html5/thumbnails/2.jpg)
2162 Fall 2009 2
Performance
• Two common measures– Latency (how long to do X)
•Also called response time and execution time– Throughput (how often can it do X)
• Example of car assembly line– Takes 6 hours to make a car
(latency is 6 hours)– A car leaves every 5 minutes
(throughput is 12 cars per hour)– Overlap results in Throughput > 1/Latency
![Page 3: ECE 2162 Evaluation & Metrics - University of Pittsburghjuy9/2162/slides-F2009/2_EvalMetrics.pdf · 2162 Fall 2009 2 Performance •Two common measures – Latency (how long to do](https://reader031.fdocuments.us/reader031/viewer/2022021911/5c41136493f3c338be2f5d8b/html5/thumbnails/3.jpg)
2162 Fall 2009 3
Measuring Performance
• Peak (MIPS, MFLOPS)– Often not useful
• unachievable in practice, or unsustainable
![Page 4: ECE 2162 Evaluation & Metrics - University of Pittsburghjuy9/2162/slides-F2009/2_EvalMetrics.pdf · 2162 Fall 2009 2 Performance •Two common measures – Latency (how long to do](https://reader031.fdocuments.us/reader031/viewer/2022021911/5c41136493f3c338be2f5d8b/html5/thumbnails/4.jpg)
2162 Fall 2009 4
Measuring Performance
• Benchmarks– Real applications and application suites
•E.g., SPEC CPU2000, SPEC2006, TPC-C, TPC-H– Kernels
•“Representative” parts of real applications•Easier and quicker to set up and run•Often not really representative of the entire app
– Toy programs, synthetic benchmarks, etc.•Not very useful for reporting•Sometimes used to test/stress specific
functions/features
![Page 5: ECE 2162 Evaluation & Metrics - University of Pittsburghjuy9/2162/slides-F2009/2_EvalMetrics.pdf · 2162 Fall 2009 2 Performance •Two common measures – Latency (how long to do](https://reader031.fdocuments.us/reader031/viewer/2022021911/5c41136493f3c338be2f5d8b/html5/thumbnails/5.jpg)
2162 Fall 2009 5
SPEC CPU (integer)
“Representative” applications keeps growing with time!
![Page 6: ECE 2162 Evaluation & Metrics - University of Pittsburghjuy9/2162/slides-F2009/2_EvalMetrics.pdf · 2162 Fall 2009 2 Performance •Two common measures – Latency (how long to do](https://reader031.fdocuments.us/reader031/viewer/2022021911/5c41136493f3c338be2f5d8b/html5/thumbnails/6.jpg)
2162 Fall 2009 6
SPEC CPU (floating point)
![Page 7: ECE 2162 Evaluation & Metrics - University of Pittsburghjuy9/2162/slides-F2009/2_EvalMetrics.pdf · 2162 Fall 2009 2 Performance •Two common measures – Latency (how long to do](https://reader031.fdocuments.us/reader031/viewer/2022021911/5c41136493f3c338be2f5d8b/html5/thumbnails/7.jpg)
2162 Fall 2009 7
Price-Performance
![Page 8: ECE 2162 Evaluation & Metrics - University of Pittsburghjuy9/2162/slides-F2009/2_EvalMetrics.pdf · 2162 Fall 2009 2 Performance •Two common measures – Latency (how long to do](https://reader031.fdocuments.us/reader031/viewer/2022021911/5c41136493f3c338be2f5d8b/html5/thumbnails/8.jpg)
2162 Fall 2009 8
TPC Benchmarks
• Measure transaction-processing throughput• Benchmarks for different scenarios
– TPC-C: warehouses and sales transactions– TPC-H: ad-hoc decision support– TPC-W: web-based business transactions
• Difficult to set up and run on a simulator– Requires full OS support, a working DBMS– Long simulations to get stable results
![Page 9: ECE 2162 Evaluation & Metrics - University of Pittsburghjuy9/2162/slides-F2009/2_EvalMetrics.pdf · 2162 Fall 2009 2 Performance •Two common measures – Latency (how long to do](https://reader031.fdocuments.us/reader031/viewer/2022021911/5c41136493f3c338be2f5d8b/html5/thumbnails/9.jpg)
2162 Fall 2009 9
Throughput-Server Perf/Cost
High performanceVery expensive!
![Page 10: ECE 2162 Evaluation & Metrics - University of Pittsburghjuy9/2162/slides-F2009/2_EvalMetrics.pdf · 2162 Fall 2009 2 Performance •Two common measures – Latency (how long to do](https://reader031.fdocuments.us/reader031/viewer/2022021911/5c41136493f3c338be2f5d8b/html5/thumbnails/10.jpg)
2162 Fall 2009 10
CPU Performance Equation (1)
timecycleClock CyclesClock CPU timeCPU ×=
timecycleClock nInstructioPer CyclesCount n Instructio timeCPU ××=
CycleClock Seconds
nInstructioCyclesClock
ProgramnsInstructio
ProgramSeconds timeCPU ××==
Hardware Technology,Organization
Organization, ISA
ISA,CompilerTechnology
A.K.A. The “iron law” of performance
![Page 11: ECE 2162 Evaluation & Metrics - University of Pittsburghjuy9/2162/slides-F2009/2_EvalMetrics.pdf · 2162 Fall 2009 2 Performance •Two common measures – Latency (how long to do](https://reader031.fdocuments.us/reader031/viewer/2022021911/5c41136493f3c338be2f5d8b/html5/thumbnails/11.jpg)
2162 Fall 2009 12
CPU Version
• Program takes 33 billion instructions to run• CPU processes insts at 2 cycles per inst• Clock speed of 3GHz
CycleClock Seconds
nInstructioCyclesClock
ProgramnsInstructio
ProgramSeconds timeCPU ××==
= 22 seconds
Sometimes clock cycle time giveninstead (ex. cycle = 333 ps)
IPC sometimes used instead of CPI
![Page 12: ECE 2162 Evaluation & Metrics - University of Pittsburghjuy9/2162/slides-F2009/2_EvalMetrics.pdf · 2162 Fall 2009 2 Performance •Two common measures – Latency (how long to do](https://reader031.fdocuments.us/reader031/viewer/2022021911/5c41136493f3c338be2f5d8b/html5/thumbnails/12.jpg)
2162 Fall 2009 13
CPU Performance Equation (2)
timecycleClock CyclesClock CPU timeCPU ×=
timecycleClock CPI IC timeCPU n
1iii ××
= ∑
=
For each kind of instruction
How many instructions of this kind are there in the program
How many cycles it takes to execute an instruction of this kind
![Page 13: ECE 2162 Evaluation & Metrics - University of Pittsburghjuy9/2162/slides-F2009/2_EvalMetrics.pdf · 2162 Fall 2009 2 Performance •Two common measures – Latency (how long to do](https://reader031.fdocuments.us/reader031/viewer/2022021911/5c41136493f3c338be2f5d8b/html5/thumbnails/13.jpg)
2162 Fall 2009 14
CPU performance w/ different instructionsInstruction
TypeFrequency CPI
Integer 40% 1.0
Branch 20% 4.0
Load 20% 2.0
Store 10% 3.0
timecycleClock CPI IC timeCPU n
1iii ××
= ∑
=
Total Insts = 50B, Clock speed = 2 GHz
![Page 14: ECE 2162 Evaluation & Metrics - University of Pittsburghjuy9/2162/slides-F2009/2_EvalMetrics.pdf · 2162 Fall 2009 2 Performance •Two common measures – Latency (how long to do](https://reader031.fdocuments.us/reader031/viewer/2022021911/5c41136493f3c338be2f5d8b/html5/thumbnails/14.jpg)
2162 Fall 2009 15
Comparing Performance
• “Machine X is n times faster than Y”
• “Throughput of X is n times that of Y”
ntimeExecution timeExecution
X
Y =
nunit timeper Tasksunit timeper Tasks
Y
X =
![Page 15: ECE 2162 Evaluation & Metrics - University of Pittsburghjuy9/2162/slides-F2009/2_EvalMetrics.pdf · 2162 Fall 2009 2 Performance •Two common measures – Latency (how long to do](https://reader031.fdocuments.us/reader031/viewer/2022021911/5c41136493f3c338be2f5d8b/html5/thumbnails/15.jpg)
2162 Fall 2009 16
If Only it Were That Simple
• “X is n times faster than Y on A”
• But what about different applications(or even parts of the same application)– X is 10 times faster than Y on A, and 1.5 times on
B, but Y is 2 times faster than X on C,and 3 times on D, and…
nX machineon A app of timeExecution Y machineon A app of timeExecution =
So does X have better performance than Y?
Which would you buy?
![Page 16: ECE 2162 Evaluation & Metrics - University of Pittsburghjuy9/2162/slides-F2009/2_EvalMetrics.pdf · 2162 Fall 2009 2 Performance •Two common measures – Latency (how long to do](https://reader031.fdocuments.us/reader031/viewer/2022021911/5c41136493f3c338be2f5d8b/html5/thumbnails/16.jpg)
2162 Fall 2009 17
Summarizing Performance: Averages
• Arithmetic mean (A.M.)
• Geometric mean (G. M.)
• Harmonic mean (H. M.)
nxxx n+++ 21
nnxxx 21
nxxx
n111
21
++
H.M ≤ G.M ≤ A.M
![Page 17: ECE 2162 Evaluation & Metrics - University of Pittsburghjuy9/2162/slides-F2009/2_EvalMetrics.pdf · 2162 Fall 2009 2 Performance •Two common measures – Latency (how long to do](https://reader031.fdocuments.us/reader031/viewer/2022021911/5c41136493f3c338be2f5d8b/html5/thumbnails/17.jpg)
2162 Fall 2009 18
Speedup
Machine A Machine B
Program 1 5 sec 4 sec
Program 2 3 sec 6 sec
What is the speedup of A compared to B on Program 1?
What is the speedup of A compared to B on Program 2?
What is the average speedup?
What is the speedup of A compared to B on Sum(Program1, Program2) ?
![Page 18: ECE 2162 Evaluation & Metrics - University of Pittsburghjuy9/2162/slides-F2009/2_EvalMetrics.pdf · 2162 Fall 2009 2 Performance •Two common measures – Latency (how long to do](https://reader031.fdocuments.us/reader031/viewer/2022021911/5c41136493f3c338be2f5d8b/html5/thumbnails/18.jpg)
2162 Fall 2009 19
Normalizing & the Geometric Mean
• Speedup of A.M. != A.M. of speedup
• Use G.M.:– E.g. speedup of G.M.
– E.g. G.M. of speedup
• Neat property of the geometric mean:Consistent whatever the reference machine
• Do not use the arithmetic mean for normalized execution times
3564
××
nn
ii∏
=1
on timeexecution Normalized
36
54×
![Page 19: ECE 2162 Evaluation & Metrics - University of Pittsburghjuy9/2162/slides-F2009/2_EvalMetrics.pdf · 2162 Fall 2009 2 Performance •Two common measures – Latency (how long to do](https://reader031.fdocuments.us/reader031/viewer/2022021911/5c41136493f3c338be2f5d8b/html5/thumbnails/19.jpg)
2162 Fall 2009 20
CPI/IPC
• Often when making comparisons in comp-arch studies:– Program (or set of) is the same for two CPUs– The clock speed is the same for two CPUs
• So we can just directly compare CPI’s and often we use IPC’s
![Page 20: ECE 2162 Evaluation & Metrics - University of Pittsburghjuy9/2162/slides-F2009/2_EvalMetrics.pdf · 2162 Fall 2009 2 Performance •Two common measures – Latency (how long to do](https://reader031.fdocuments.us/reader031/viewer/2022021911/5c41136493f3c338be2f5d8b/html5/thumbnails/20.jpg)
2162 Fall 2009 21
Average CPI vs. “Average” IPC
• Average CPI =(CPI1 + CPI2 + … + CPIn)/n
• A.M. of IPC = (IPC1 + IPC2 + … + IPCn)/n
• Must use Harmonic Mean to remain ∝ to runtime
Not Equal to (A.M. of CPI)-1 !!!
![Page 21: ECE 2162 Evaluation & Metrics - University of Pittsburghjuy9/2162/slides-F2009/2_EvalMetrics.pdf · 2162 Fall 2009 2 Performance •Two common measures – Latency (how long to do](https://reader031.fdocuments.us/reader031/viewer/2022021911/5c41136493f3c338be2f5d8b/html5/thumbnails/21.jpg)
2162 Fall 2009 22
Harmonic Mean
• H.M.(x1,x2,x3,…,xn) = n
1 + 1 + 1 + … + 1x1 x2 x3 xn
• What in the world is this?– Average of inverse relationships
![Page 22: ECE 2162 Evaluation & Metrics - University of Pittsburghjuy9/2162/slides-F2009/2_EvalMetrics.pdf · 2162 Fall 2009 2 Performance •Two common measures – Latency (how long to do](https://reader031.fdocuments.us/reader031/viewer/2022021911/5c41136493f3c338be2f5d8b/html5/thumbnails/22.jpg)
2162 Fall 2009 23
A.M.(CPI) vs. H.M.(IPC)
• “Average” IPC = 1A.M.(CPI)
= 1CPI1 + CPI2 + CPI3 + … + CPInn n n n
= nCPI1 + CPI2 + CPI3 + … + CPIn
= n1 + 1 + 1 + … + 1 =H.M.(IPC)
IPC1 IPC2 IPC3 IPCn
![Page 23: ECE 2162 Evaluation & Metrics - University of Pittsburghjuy9/2162/slides-F2009/2_EvalMetrics.pdf · 2162 Fall 2009 2 Performance •Two common measures – Latency (how long to do](https://reader031.fdocuments.us/reader031/viewer/2022021911/5c41136493f3c338be2f5d8b/html5/thumbnails/23.jpg)
2162 Fall 2009 24
Amdahl’s Law (1)
new
old
TimeExecution TimeExecution
tEnhancemen with TimeExecution tEnhancemen without TimeExecution Speedup ==
What if enhancement does not enhance everything?
Possiblet when Enhancemen using TimeExecution allat t Enhancemen using without TimeExecution Speedup =
( )
+−= ×
Enhanced
EnhancedEnhancedoldnew Speedup
FractionFraction1TimeExecution TimeExecution
( )
+−
=
Enhanced
EnhancedEnhanced Speedup
FractionFraction1
1 Speedup Overall
Caution: fractionof What?
![Page 24: ECE 2162 Evaluation & Metrics - University of Pittsburghjuy9/2162/slides-F2009/2_EvalMetrics.pdf · 2162 Fall 2009 2 Performance •Two common measures – Latency (how long to do](https://reader031.fdocuments.us/reader031/viewer/2022021911/5c41136493f3c338be2f5d8b/html5/thumbnails/24.jpg)
2162 Fall 2009 25
Amdahl’s Law (2)
• Make the Common Case Fast
( )
+−
=
Enhanced
EnhancedEnhanced Speedup
FractionFraction1
1 Speedup Overall
20 SpeedupEnhanced = 0.1 FractionEnhanced =
( )105.1
201.01.01
1 Speedup =
+−
=
1.2 SpeedupEnhanced = .90 FractionEnhanced =
( )176.1
2.19.09.01
1 Speedup =
+−
=
VS
Important: Principle of localityApprox. 90% of the time spent in 10% of the code
![Page 25: ECE 2162 Evaluation & Metrics - University of Pittsburghjuy9/2162/slides-F2009/2_EvalMetrics.pdf · 2162 Fall 2009 2 Performance •Two common measures – Latency (how long to do](https://reader031.fdocuments.us/reader031/viewer/2022021911/5c41136493f3c338be2f5d8b/html5/thumbnails/25.jpg)
2162 Fall 2009 26
Amdahl’s Law (3)
• Diminishing Returns
2 SpeedupGreen =
21 FractionGreen =
33.1 SpeedupOverall =
Green Phase Blue PhaseTotal Execution Time
Generation 1
2 SpeedupGreen =
31 FractionGreen =
2.1 SpeedupOverall =
Green BlueTotal Execution Time
Generation 2
Generation 3
BlueTotal Execution Time
over Generation 2
over Generation 1
![Page 26: ECE 2162 Evaluation & Metrics - University of Pittsburghjuy9/2162/slides-F2009/2_EvalMetrics.pdf · 2162 Fall 2009 2 Performance •Two common measures – Latency (how long to do](https://reader031.fdocuments.us/reader031/viewer/2022021911/5c41136493f3c338be2f5d8b/html5/thumbnails/26.jpg)
2162 Fall 2009 28
Now Consider Price-Performance
• Without Turbo– Car costs $8,000 to manufacture– Selling price is $12,000 $4K profit per car– If we sell 10,000 cars, that’s $40M in profit
• With Turbo– Car costs extra $3,000– Selling price is $16,000 $5K profit per car– But only a few gear heads buy the car:
•We only sell 400 cars and make $2M in profit
![Page 27: ECE 2162 Evaluation & Metrics - University of Pittsburghjuy9/2162/slides-F2009/2_EvalMetrics.pdf · 2162 Fall 2009 2 Performance •Two common measures – Latency (how long to do](https://reader031.fdocuments.us/reader031/viewer/2022021911/5c41136493f3c338be2f5d8b/html5/thumbnails/27.jpg)
2162 Fall 2009 29
CPU Design is Similar
• What does it cost me to add some performance enhancement?
• How much effective performance do I get out of it?– 100% speedup for small fraction of time wasn’t a big win for the car
example
• How much more do I have to charge for it?– Extra development, testing, marketing costs
• How much more can I charge for it?– Does the market even care?
• How does the price change affect volume?