Post on 21-Dec-2015
15-447 Computer Architecture Fall 2007 ©
September 19, 2007
Karem Sakallahksakalla@qatar.cmu.edu
www.qatar.cmu.edu/~msakr/15447-f07/
CS-447– Computer Architecture
M,W 10-11:20am
Lecture 7Performance (Cont’d)
15-447 Computer Architecture Fall 2007 ©
Today
°Lecture & Discussion
°Next Lecture: Review
°Read the chapters & slides.
°Practice the performance examples in the Patterson book.
Done by nowDone by now
15-447 Computer Architecture Fall 2007 ©
Assessing & Understanding Performance
This chapter discusses how to measure, report, and summarize performance of a computer.
15-447 Computer Architecture Fall 2007 ©
Motivation
It is often helpful to have some yardstick by which to compare systems
• During development to evaluate different algorithms or optimizations
• During purchasing to compare between product offerings
• …
15-447 Computer Architecture Fall 2007 ©
° Measure, Report, and Summarize
° Make intelligent choices
° See through the marketing hype
° Key to understanding underlying organizational motivation
Performance
15-447 Computer Architecture Fall 2007 ©
Performance
Why is some hardware better than others for different programs?
What factors of system performance are hardware related?(e.g., Do we need a new machine, or a new operating system?)
How does the machine's instruction set affect performance?
15-447 Computer Architecture Fall 2007 ©
Which of these airplanes has the best performance?
Airplane Passengers Range (mi) Speed (mph)
Boeing 737-100 101 630 598Boeing 747 470 4150 610BAC/Sud Concorde 132 4000 1350Douglas DC-8-50 146 8720 544
°How much faster is the Concorde compared to the 747? °How much bigger is the 747 than the Douglas DC-8?
15-447 Computer Architecture Fall 2007 ©
°Response Time (latency)— How long does it take for my job to run?— How long does it take to execute a job?— How long must I wait for the database
query?
°Throughput— How many jobs can the machine run at
once?— What is the average execution rate?— How much work is getting done?
Computer Performance
15-447 Computer Architecture Fall 2007 ©
°Elapsed Time
•counts everything (disk and memory accesses, I/O , etc.)
•a useful number, but often not good for comparison purposes
Execution Time
15-447 Computer Architecture Fall 2007 ©
Execution Time°CPU time
•doesn't count I/O or time spent running other programs
•can be broken up into system time, and user time
•Our focus: user CPU time
•time spent executing the lines of code that are "in" our program
15-447 Computer Architecture Fall 2007 ©
°For some program running on machine X,
PerformanceX = 1 / Execution timeX
"X is n times faster than Y"
PerformanceX / PerformanceY = n
Definition of Performance
15-447 Computer Architecture Fall 2007 ©
Definition of Performance
Problem:• machine A runs a program in 20 seconds
• machine B runs the same program in 25 seconds
15-447 Computer Architecture Fall 2007 ©
How to compare the performance? Total Execution Time : A Consistent Summary Measure
Comparing and Summarizing Performance
Computer A Computer BProgram1(sec) 1 10Program2(sec) 1000 100Total time (sec) 1001 110
1.9110
1001
TimeB
Execution
TimeAExecution
AePerformanc
BePerformanc
15-447 Computer Architecture Fall 2007 ©
Clock Cycles
° Instead of reporting execution time in
seconds, we often use cycles
°Clock “ticks” indicate when to start activities
(one abstraction): time
seconds
program
cycles
program
seconds
cycle
15-447 Computer Architecture Fall 2007 ©
Clock cycles
° cycle time = time between ticks = seconds per cycle
° clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec)
A 4 Ghz clock has a 250ps cycle time
15-447 Computer Architecture Fall 2007 ©
CPU Execution Time
rateclockondscycle
onds
cycle
Cycle
SecondsCyclesSeconds
CPU
sec/
sec/
Program
cycles
ProgramProgram
time)cycle(clock x program) afor cyclesclock (CPU
program afor timeexecution
15-447 Computer Architecture Fall 2007 ©
So, to improve performance (everything else being equal) you can either increase or decrease?
________ the # of required cycles for a program, or________ the clock cycle time or, said another way, ________ the clock rate.
How to Improve Performanceseconds
program
cycles
program
seconds
cycle
15-447 Computer Architecture Fall 2007 ©
So, to improve performance (everything else being equal) you can either increase or decrease?
_decrease_ the # of required cycles for a program, or_decrease_ the clock cycle time or, said another way, _increase_ the clock rate.
How to Improve Performanceseconds
program
cycles
program
seconds
cycle
15-447 Computer Architecture Fall 2007 ©
Could assume that # of cycles equals # of instruction
time
1st
inst
ruct
ion
2nd
in
stru
ctio
n
3rd
in
stru
ctio
n
4th
5th
6th ...
How many cycles are required for a program?
This assumption is incorrect, different instructions take different amounts of time on different machines.
15-447 Computer Architecture Fall 2007 ©
° Multiplication takes more time than addition° Floating point operations take longer than integer ones° Accessing memory takes more time than accessing
registers° Important point: changing the cycle time often changes
the number of cycles required for various instructions
time
Different numbers of cycles for different instructions
15-447 Computer Architecture Fall 2007 ©
Now that we understand cycles
Components of Performance Units of Measure
CPU execution time for a program
Seconds for the program
Instruction count Instructions executed for the program
Clock Cycles per Instruction (CPI)
Average number of clock cycles per instruction
Clock cycle time Seconds per clock cycle
15-447 Computer Architecture Fall 2007 ©
CPI
CPU clock cycles = Instructions for a program
x Average clock cycles per Instruction (CPI)
CPU time = Instruction count x CPI x clock cycle time
rateClock
CPIcountnInstructio
15-447 Computer Architecture Fall 2007 ©
Performance
°Performance is determined by execution time
°Do any of the other variables equal performance?
•# of cycles to execute program?
•# of instructions in program?
•# of cycles per second?
•average # of cycles per instruction?
•average # of instructions per second?
°Common pitfall: thinking one of the variables is indicative of performance when it really isn’t.
15-447 Computer Architecture Fall 2007 ©
CPIi : the average number of cycles per instructions for that
instruction class
Ci : the count of the number of instructions of class i executed.
n : the number of instruction classes.
CPU Clock Cyclesn
i 1 clock cycles ( )i iCPU CPI C
== ´å
15-447 Computer Architecture Fall 2007 ©
Example° Instruction Classes:
• Add
• Multiply
°Average Clock Cycles per Instruction:• Add 1cc
• Mul 3cc
°Program A executed:• 10 Add instructions
• 5 Multiply instructions
15-447 Computer Architecture Fall 2007 ©
QuizAn application using a desktop client and a remote server is limited by network performance. What happens to response time and throughput when:
°An extra network channel is added
°Networking software is upgraded to reduce communications delay
°More memory is added to the desktop computer
15-447 Computer Architecture Fall 2007 ©
Formula Summary° T: Execution Time (seconds)
° C: Total Number of Cycles
° f: Clock Frequency (cycles/second)
° I: (Dynamic) Instruction Count
• Ij: Count for Instructions of type j
• Cj: Cycles per Instruction of type j
T = C / fC = I1 x C1 + … + Ik x Ck
I = I1 + I2 + … + Ik
CPI = C / I
T = (I x CPI) / f
15-447 Computer Architecture Fall 2007 ©
Performance Calculation Example: fact(4)
fact:a. pushl %ebp # Setupb. movl %esp,%ebp # Setupc. movl $1,%eax # eax = 1d. movl 8(%ebp),%edx # edx = x
L11:e. imull %edx,%eax # result *= xf. decl %edx # x—g. cmpl $1,%edx # Compare x:1h. jg L11 # if > repeati. movl %ebp,%esp # Finishj. popl %ebp # Finishk. ret # Finish
f = 1GHzInst Type Inst Cycles
1 imull 5
2 decl 2
3 Other 1
Calculate:T, C, I, & CPI when factis executed with input x = 4
15-447 Computer Architecture Fall 2007 ©
Performance Calculation Example: fact(4) fact:a. pushl %ebpb. movl %esp,%ebpc. movl $1,%eaxd. movl 8(%ebp),%edx
L11:e. imull %edx,%eaxf. decl %edxg. cmpl $1,%edh. jg L11i. movl %ebp,%espj. popl %ebpk. ret
Inst Count Cycles
a
b
c
d
e
f
g
h
i
j
k
Total
Inst Type Inst Cycles
1 imull 5
2 decl 2
3 Other 1
15-447 Computer Architecture Fall 2007 ©
° Performance best determined by running a real application
• Use programs typical of expected workload
• Or, typical of expected class of applicationsex: compilers/editors, scientific applications, graphics
° Small benchmarks
• nice for architects and designers
• easy to standardize
• can be abused
Benchmarks
15-447 Computer Architecture Fall 2007 ©
Benchmarks (2)
° SPEC (Standard Performance Evaluation Corporation)
• companies have agreed on a set of real programs and inputs
• valuable indicator of performance (and compiler technology)
• can still be abused
15-447 Computer Architecture Fall 2007 ©
Standard Performance Evaluation Corporation
• SPEC is supported by a number of computer vendors to create
standard sets of benchmarks for modern computer systems.
• The SPEC benchmark sets include CPU performance, graphics,
High-performance computing, Object-oriented computing, Java
applications, Client-server models, Mail systems, File systems, and
Web servers.
15-447 Computer Architecture Fall 2007 ©
SPEC ‘89
°Compiler “enhancements” and performance
0
100
200
300
400
500
600
700
800
tomcatvfppppmatrix300eqntottlinasa7doducspiceespressogcc
BenchmarkCompiler
Enhanced compiler
SPEC p
erform
ance
ratio
15-447 Computer Architecture Fall 2007 ©
SPEC CPU Benchmarks
CINT2000 : the SPEC ratio for the integer benchmark sets
CFP2000 : the SPEC ratio for the floating-point benchmark
sets.
computer measured on the timeexecution the
[300MHz]Sun Ultra5 of timeexecution the ratio SPEC
15-447 Computer Architecture Fall 2007 ©
SPEC 2000Does doubling the clock rate double the performance?
Can a machine with a slower clock rate have better performance?
Clock rate in MHz
500 1000 1500 30002000 2500 35000
200
400
600
800
1000
1200
1400
Pentium III CINT2000
Pentium 4 CINT2000
Pentium III CFP2000
Pentium 4 CFP2000
15-447 Computer Architecture Fall 2007 ©
SPEC 2000Does doubling the clock rate double the performance?
Can a machine with a slower clock rate have better performance?
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
1.6
SPECINT2000 SPECFP2000 SPECINT2000 SPECFP2000 SPECINT2000 SPECFP2000
Always on/maximum clock Laptop mode/adaptiveclock
Minimum power/minimumclock
Benchmark and power mode
Pentium M @ 1.6/0.6 GHz
Pentium 4-M @ 2.4/1.2 GHz
Pentium III-M @ 1.2/0.8 GHz
15-447 Computer Architecture Fall 2007 ©
Execution Time After Improvement =
Execution Time Unaffected +( Execution Time Affected / Amount of Improvement )
Amdahl's Law
15-447 Computer Architecture Fall 2007 ©
Example°Application execution time = 20sec
• 12 seconds are spent performing add operations
° If we improve the add operation to run twice as fast, how much faster will the application run?
15-447 Computer Architecture Fall 2007 ©
Amdahl’s Law° Example:
"Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?"
15-447 Computer Architecture Fall 2007 ©
Execution time after improvement
Amdahl's Law
seconds) 80100(n
seconds 80
16n ,2080
254
100 t improvemenafter time
n
Execution
15-447 Computer Architecture Fall 2007 ©
MIPS (million instructions per second)
Example
610 timeExecution
countn Instructio
MIPS
Code fromInstruction Counts (in billions)
for each instruction set
A (1 CPI) B (2 CPI) C (3 CPI)
Compiler 1 5 1 1
Compiler 2 10 1 1
Clock rate = 4GHz A,B,C : Instruction Classes
• Which code sequence will execute faster according to MIPS?• According to execution time?
15-447 Computer Architecture Fall 2007 ©
CPU clock cycles1 = (5 x 1+1 x 2+1 x 3) x 109 = 10 x 109
CPU clock cycles2 = (10 x 1+1 x 2+1 x 3) x 109 = 15 x 109
Execution time & MIPS
seconds 75.310 4
1015 time2
9
9
Execution
seconds 5.210 4
1010 time1
9
9
Execution
15-447 Computer Architecture Fall 2007 ©
Execution time & MIPS (2)
280010 seconsd 2.5
101)1(5 MIPS
6
9
1
320010 3.75
101)1(10 MIPS
6
9
2
15-447 Computer Architecture Fall 2007 ©
Performance Evaluation
°Performance depends on
• Hardware architecture
• Software environment
°Meaning of performance depends on viewpoint
• User: time
• System Manager: throughput
15-447 Computer Architecture Fall 2007 ©
Performance Evaluation
°Kinds of Performance• Graphics
• Network
• Transactional
• Multi-user system
• I/O
• Scientific/Engineering codes
15-447 Computer Architecture Fall 2007 ©
Example on the MIPS R10K
Prof run at: Tue Apr 28 15:50:26 1998 Command line: prof suboptim.ideal.m28293
109148754: Total number of cycles 0.55974s: Total execution time 77660914: Total number of instructions executed 1.405: Ratio of cycles / instruction 195: Clock rate in MHz R10000: Target processor modelled
cycles(%) cum % secs instrns calls procedure
61901843(56.71) 56.71 0.32 45113360 1 pdot 47212563(43.26) 99.97 0.24 32523280 1 init 31767( 0.03) 100.00 0.00 21523 1 vsum 1069( 0.00) 100.00 0.00 887 3 fflush : : : : : :