Lec 2 Performance
7/31/2019 Lec 2 Performance
Processor Performance
Ajit Pal
Professor, Department of Computer Science and Engineering
Indian Institute of Technology Kharagpur
INDIA-721302
High Performance Computer Architecture
-
Outline
Introduction
Defining Performance
The Iron Law of Processor Performance
Processor performance enhancement
Performance Evaluation Approaches
Performance Reporting
Amdahl's Law
-
Introduction
Performance measurement is important:
  Helps us to determine if one processor (or computer) works faster than another
  Helps us to know how much performance improvement has taken place after incorporating some performance enhancement feature
  Helps to see through the marketing hype!
Provides answers to the following questions:
  Why is some hardware better than others for different programs?
  What factors affect system performance? Hardware, OS, or compiler?
  How does the machine's instruction set affect performance?
-
Defining Performance in Terms of Time
Time is the final measure of computer performance
A computer exhibits higher performance if it executes programs faster
Response time (elapsed time, latency): the individual user's concern
  How long does it take for my job to run?
  How long does it take to execute my job (start to finish)?
  How long must I wait for the database query?
Throughput: the systems manager's concern
  How many jobs can the machine run at once?
  What is the average execution rate?
  How much work is getting done?
-
Execution Time
Elapsed time
  Counts everything (disk and memory accesses, waiting for I/O, running other programs, etc.) from start to finish
  A useful number, but often not good for comparison purposes
  elapsed time = CPU time + wait time (I/O, other programs, etc.)
CPU time
  Doesn't count waiting for I/O or time spent running other programs
  Can be divided into user CPU time and system CPU time (OS calls)
  CPU time = user CPU time + system CPU time
  elapsed time = user CPU time + system CPU time + wait time
Our focus: user CPU time (CPU execution time or, simply, execution time): the time spent executing the lines of code that are in our program
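The split between elapsed time and CPU time can be observed directly. The following is a minimal sketch, assuming Python's standard `time` module; the helper `measure` and the workload `mostly_waiting` are hypothetical names introduced for illustration. Note that `time.process_time()` reports user plus system CPU time together, not user CPU time alone.

```python
import time


def measure(workload):
    """Return (elapsed_seconds, cpu_seconds) for one call of workload()."""
    wall_start = time.perf_counter()   # wall-clock (elapsed) time
    cpu_start = time.process_time()    # user + system CPU time of this process
    workload()
    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return elapsed, cpu


def mostly_waiting():
    """Hypothetical workload: a sleep stands in for I/O wait."""
    time.sleep(0.05)                   # waiting, uses almost no CPU time
    sum(i * i for i in range(10_000))  # a little real computation


elapsed, cpu = measure(mostly_waiting)
print(f"elapsed = {elapsed:.3f} s, CPU = {cpu:.3f} s, wait ~ {elapsed - cpu:.3f} s")
```

For a workload dominated by waiting, the elapsed time should be much larger than the CPU time, which is why elapsed time is often a poor basis for comparing processors.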
-
Measuring Performance
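For some program running on machine X:
$$\text{Performance}_X = \frac{1}{\text{Execution time}_X}$$
"X is n times faster than Y" means:
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$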
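As a small illustration of the definition above, the sketch below computes n from two execution times. The function names and the 10 s and 15 s timings are hypothetical values chosen only for the example.

```python
def performance(execution_time_s):
    """Performance is the reciprocal of execution time."""
    return 1.0 / execution_time_s


def times_faster(time_x_s, time_y_s):
    """Return n such that machine X is n times faster than machine Y."""
    return performance(time_x_s) / performance(time_y_s)


# Hypothetical timings: the same program takes 10 s on X and 15 s on Y.
n = times_faster(10.0, 15.0)
print(f"X is {n:.2f} times faster than Y")
```

With these timings the ratio is 15/10, so X is 1.5 times faster than Y.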
-
The Iron Law of Processor Performance
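$$\text{Processor Performance} = \frac{\text{Time}}{\text{Program}}$$
$$\frac{\text{Time}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}}$$
Instructions/Program (code size): Architecture, the compiler designer's concern
Cycles/Instruction (CPI): Implementation, the processor designer's concern
Time/Cycle (cycle time): Realization, the chip designer's concern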
-
The Iron Law of Processor Performance
Instructions/Program (instruction count)
  Instructions executed, not static code size
  Determined by algorithm, compiler, ISA
Cycles/Instruction (CPI)
  Determined by ISA and CPU organization
  Overlap among instructions reduces this term
Time/Cycle (cycle time)
  Determined by technology, organization, clever circuit design
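The three terms multiply to give the time per program. A minimal sketch follows; the function name `cpu_time` and the numbers in the example (2 billion executed instructions, a CPI of 1.5, a 1 ns cycle) are assumptions made for illustration, not values from the lecture.

```python
def cpu_time(instruction_count, cpi, cycle_time_s):
    """Iron Law: time/program = instructions/program * cycles/instruction * time/cycle."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical program: 2e9 executed instructions, CPI of 1.5, 1 ns cycle time (1 GHz clock).
t = cpu_time(2e9, 1.5, 1e-9)
print(f"CPU time = {t:.2f} s")
```

Under these assumed values the product is 3 seconds; halving any one term while holding the other two fixed would halve the time.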
-
Processor Performance Enhancement
All processor performance enhancement techniques boil down to reducing one or more of these three terms
Some techniques can reduce one term without affecting the others
  Improved hardware technology
  Compiler optimization techniques
  Such performance optimization techniques are preferred
Some techniques can reduce one of the terms but may increase other terms (the terms are inter-related)
  A CISC ISA reduces instruction count but increases CPI
  Loop unrolling reduces instruction count but increases CPI
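Because the terms are inter-related, a technique has to be judged by its effect on the product. The sketch below is one way to check this, expressing each change as a scaling factor; the function name `relative_time` and the factors 0.8 and 1.3 are hypothetical, chosen only to show a case where fewer instructions still means a slower program.

```python
def relative_time(instruction_factor, cpi_factor, cycle_time_factor=1.0):
    """Return new_time / old_time when each Iron Law term is scaled by the given factor."""
    return instruction_factor * cpi_factor * cycle_time_factor


# Hypothetical change: 20% fewer instructions, but CPI rises by 30%.
ratio = relative_time(0.8, 1.3)
print(f"new time / old time = {ratio:.2f}")
```

Here the ratio is above 1, so the assumed change makes the program slower overall even though the instruction count dropped.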
-
MIPS and MFLOPS
Used extensively 30 years back.
MIPS: Millions of Instructions Processed per Second
MFLOPS: Millions of Floating-point Operations completed per Second
$$\text{MIPS} = \frac{\text{Instruction Count}}{\text{Exec. Time} \times 10^6} = \frac{\text{Clock Rate}}{\text{CPI} \times 10^6}$$
-
Problems with MIPS
Three significant problems with using MIPS:
  So severe that some have expanded MIPS as "Meaningless Information about Processing Speed"
Problem 1: MIPS is instruction set dependent.
Problem 2: MIPS varies between programs on the same computer.
Problem 3: MIPS can vary inversely to performance!
Let's look at an example of why MIPS doesn't work
-
A MIPS Example
Consider the following computer:
  The machine runs at 100 MHz.
  Instruction A requires 1 clock cycle, instruction B requires 2 clock cycles, and instruction C requires 3 clock cycles.
Instruction counts (in millions) for each instruction class:
Code from     A (1 cycle)   B (2 cycles)   C (3 cycles)
Compiler 1    5             1              1
Compiler 2    10            1              1
$$\text{CPI} = \frac{\text{CPU Clock Cycles}}{\text{Instruction Count}} = \frac{\sum_{i=1}^{n} \text{CPI}_i \times N_i}{\text{Instruction Count}}$$
-
A MIPS Example
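$$\text{CPI}_1 = \frac{[(5 \times 1) + (1 \times 2) + (1 \times 3)] \times 10^6}{(5 + 1 + 1) \times 10^6} = \frac{10}{7} = 1.43$$
$$\text{MIPS}_1 = \frac{100\ \text{MHz}}{1.43} = 69.9$$
$$\text{CPI}_2 = \frac{[(10 \times 1) + (1 \times 2) + (1 \times 3)] \times 10^6}{(10 + 1 + 1) \times 10^6} = \frac{15}{12} = 1.25$$
$$\text{MIPS}_2 = \frac{100\ \text{MHz}}{1.25} = 80.0$$
In each CPI fraction, the numerator is the cycle count and the denominator is the instruction count.
So, compiler 2 has a higher MIPS rating and should be faster?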
-
A MIPS Example
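Now let's compare CPU time:
$$\text{CPU Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$
$$\text{CPU Time}_1 = \frac{7 \times 10^6 \times 1.43}{100 \times 10^6} = 0.10\ \text{seconds}$$
$$\text{CPU Time}_2 = \frac{12 \times 10^6 \times 1.25}{100 \times 10^6} = 0.15\ \text{seconds}$$
Therefore the code from compiler 1 is faster despite a lower MIPS rating!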
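The whole example can be recomputed in a few lines. This is a minimal sketch using the instruction counts, cycle costs, and 100 MHz clock given in the example; the names `CYCLES_PER_CLASS`, `COUNTS`, and `summarize` are introduced here for illustration only.

```python
CLOCK_RATE_HZ = 100e6                       # the machine runs at 100 MHz
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class for the code produced by each compiler.
COUNTS = {
    "compiler 1": {"A": 5e6, "B": 1e6, "C": 1e6},
    "compiler 2": {"A": 10e6, "B": 1e6, "C": 1e6},
}


def summarize(counts):
    """Return (CPI, MIPS rating, CPU time in seconds) for one instruction mix."""
    instruction_count = sum(counts.values())
    clock_cycles = sum(CYCLES_PER_CLASS[cls] * n for cls, n in counts.items())
    cpi = clock_cycles / instruction_count
    mips = CLOCK_RATE_HZ / (cpi * 1e6)
    cpu_time_s = clock_cycles / CLOCK_RATE_HZ
    return cpi, mips, cpu_time_s


for name, counts in COUNTS.items():
    cpi, mips, cpu_time_s = summarize(counts)
    print(f"{name}: CPI = {cpi:.2f}, MIPS = {mips:.1f}, CPU time = {cpu_time_s:.2f} s")
```

With the unrounded CPI of 10/7, the MIPS rating for compiler 1 comes out at 70.0; the 69.9 in the worked example results from rounding the CPI to 1.43 first. Either way, compiler 2 has the higher MIPS rating and the longer CPU time.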
-
Example: Calculating Overall CPI
Typical Instruction Mix
Operation   ISA    CPI(i)   Freq
ALU         50%    1        (40%)
Load        20%    2        (27%)
Store       10%    2        (13%)
Branch      20%    5        (20%)
$$\text{Overall CPI} = 1 \times 0.40 + 2 \times 0.27 + 2 \times 0.13 + 5 \times 0.20 = 2.2$$
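The overall CPI is a frequency-weighted sum of the per-class CPIs. A minimal sketch, using the CPI(i) and Freq columns of the table above; the names `MIX` and `overall_cpi` are hypothetical:

```python
# (operation, CPI(i), frequency) taken from the CPI(i) and Freq columns of the table.
MIX = [
    ("ALU", 1, 0.40),
    ("Load", 2, 0.27),
    ("Store", 2, 0.13),
    ("Branch", 5, 0.20),
]


def overall_cpi(mix):
    """Weighted sum of per-class CPI, each weighted by its frequency."""
    return sum(cpi * freq for _, cpi, freq in mix)


total = overall_cpi(MIX)
print(f"Overall CPI = {total:.2f}")

# Share of execution cycles spent in each class.
for op, cpi, freq in MIX:
    print(f"{op}: {cpi * freq / total:.0%} of cycles")
```

The per-class shares show where the cycles go: branches are only 20% of the instructions in this mix but account for the largest share of the cycles.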
-
Measuring Performance Using Benchmarks
Five levels of Benchmarks
1. Real Applications
  Examples: compilers/editors, scientific applications, graphics, etc.
  Problem: portability, due to dependence on OS and compiler
2. Modified Applications
  Real applications modified/tailored to improve portability or to test specific features of the CPU
3. Kernels
  Programs that are much simpler than real applications
  Kernels: small, key pieces of real applications
  Examples: Livermore Loops (24 loop kernels), Linpack (linear algebra package)
-
Synthetic Benchmarks
4. Toy benchmarks
  10 to 100 lines of simple programs
  Easy to type and run on almost all computers
  Examples: quick sort, merge sort, etc.
5. Synthetic benchmarks
  Basic principle: analyze the distribution of instructions over a large number of practical programs.
  Synthesize a program that has the same instruction distribution as a typical program; it need not compute something meaningful.
  Dhrystone, Khornerstone, and Linpack are some of the older synthetic benchmarks
-
SPEC
A recently popular approach is to put together collections of benchmarks that measure the performance of a variety of applications
SPEC: System Performance Evaluation Cooperative
  A non-profit organization (www.spec.org)
  CPU-intensive benchmarks for evaluating processor performance of workstations
  Generations: SPEC89, SPEC92, SPEC95, and SPEC2000
  SPEC2000 emphasizes memory system performance
-
SPEC
Sponsored by industry but independent and self-managed; trusted by code developers and machine vendors
Clear guides for testing, see www.spec.org
Regular updates (benchmarks are dropped and new ones added periodically according to relevance)
Specialized benchmarks for particular classes of applications
Can still be abused by selective optimization!
-
SPEC History
First Round: SPEC CPU89
  10 programs yielding a single number
Second Round: SPEC CPU92
  SPEC CINT92 (6 integer programs) and SPEC CFP92 (14 floating point programs)
  compiler flags can be set differently for different programs
Third Round: SPEC CPU95
  new set of programs: SPEC CINT95 (8 integer programs) and SPEC CFP95 (10 floating point)
  single flag setting for all programs
Fourth Round: SPEC CPU2000
  new set of programs: SPEC CINT2000 (12 integer programs) and SPEC CFP2000 (14 floating point)
  single flag setting for all programs
  programs in C, C++, Fortran 77, and Fortran 90
-
CINT2000 (Integer component of SPEC CPU2000)
Program       Language   What It Is
164.gzip      C          Compression
175.vpr       C          FPGA Circuit Placement and Routing
176.gcc       C          C Programming Language Compiler
181.mcf       C          Combinatorial Optimization
186.crafty    C          Game Playing: Chess
197.parser    C          Word Processing
252.eon       C++        Computer Visualization
253.perlbmk   C          PERL Programming Language
254.gap       C          Group Theory, Interpreter
255.vortex    C          Object-oriented Database
256.bzip2     C          Compression
300.twolf     C          Place and Route Simulator
-
CFP2000 (Floating point component of SPEC CPU2000)
Program        Language     What It Is
168.wupwise    Fortran 77   Physics / Quantum Chromodynamics
171.swim       Fortran 77   Shallow Water Modeling
172.mgrid      Fortran 77   Multi-grid Solver: 3D Potential Field
173.applu      Fortran 77   Parabolic / Elliptic Differential Equations
177.mesa       C            3-D Graphics Library
178.galgel     Fortran 90   Computational Fluid Dynamics
179.art        C            Image Recognition / Neural Networks
183.equake     C            Seismic Wave Propagation Simulation
187.facerec    Fortran 90   Image Processing: Face Recognition
188.ammp       C            Computational Chemistry
189.lucas      Fortran 90   Number Theory / Primality Testing
191.fma3d      Fortran 90   Finite-element Crash Simulation
200.sixtrack   Fortran 77   High Energy Physics Accelerator Design
301.apsi       Fortran 77   Meteorology: Pollutant Distribution
-
SPEC CPU2000 Reporting
Refer to the SPEC website www.spec.org for documentation
Any measure that summarizes performance should reflect execution time
Single-number result: arithmetic mean or geometric mean of normalized ratios for each code in the suite
Weighted arithmetic mean summarizes performance while tracking execution time
Report a precise description of the machine (platform)
Report the compiler flag settings
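The two single-number summaries mentioned above can give different answers for the same data. The sketch below computes both from normalized ratios (reference time divided by measured time); the benchmark times are made-up values for illustration, not SPEC results, and the function names are hypothetical.

```python
import math


def normalized_ratios(reference_times, machine_times):
    """Ratio of reference time to measured time for each benchmark (higher is better)."""
    return [ref / t for ref, t in zip(reference_times, machine_times)]


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    # n-th root of the product, computed through logarithms.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical execution times in seconds for three benchmarks.
reference = [100.0, 200.0, 400.0]
machine = [50.0, 200.0, 100.0]

ratios = normalized_ratios(reference, machine)
print(f"ratios = {ratios}")
print(f"arithmetic mean = {arithmetic_mean(ratios):.2f}")
print(f"geometric mean  = {geometric_mean(ratios):.2f}")
```

For these made-up times the ratios are 2, 1, and 4, so the arithmetic mean is about 2.33 while the geometric mean is 2. The single number reported therefore depends on which mean is chosen, which is one reason the reporting rules matter.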
-
Amdahl's Law
Quantifies the overall performance gain due to an improvement in a part of a computation.
Amdahl's Law:
  The performance improvement gained from using some faster mode of execution is limited by the amount of time the enhancement is actually used.
$$\text{Speedup} = \frac{\text{Execution time for the task without enhancement}}{\text{Execution time for the task using enhancement}}$$
-
Amdahl's Law and Speedup
Speedup tells us how much faster a machine will run due to an enhancement.
To use Amdahl's law, two things should be considered:
1st: The fraction of the computation time in the original machine that can use the enhancement
  If a program executes in 30 seconds and 15 seconds of the execution uses the enhancement, fraction = 15/30 = 1/2
2nd: The improvement gained by the enhancement
  If the enhanced task takes 3.5 seconds and the original task took 7 seconds, we say the speedup is 2.
-
Amdahl's Law Equations
$$\text{Execution time}_{new} = \text{Execution time}_{old} \times \left[ (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right]$$
Use the previous equation and solve for speedup:
$$\text{Speedup}_{overall} = \frac{\text{Execution time}_{old}}{\text{Execution time}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$
Don't just try to memorize these equations and plug numbers into them.
It's always important to think about the problem too!
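The overall speedup equation translates directly into code. This is a minimal sketch; the function name `overall_speedup` and its argument checks are assumptions added for illustration. The example reuses the earlier numbers: half of the execution time can use the enhancement, and the enhanced part runs 2 times faster.

```python
def overall_speedup(fraction_enhanced, speedup_enhanced):
    """Amdahl's Law: 1 / ((1 - F) + F / S)."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Fraction 1/2 of the time is enhanced, and that part is 2 times faster.
print(f"overall speedup = {overall_speedup(0.5, 2.0):.3f}")

# Even an enormous speedup of the enhanced half cannot push the overall speedup past 2.
print(f"upper limit     = {overall_speedup(0.5, 1e9):.3f}")
```

The first case gives 1/(0.5 + 0.25), about 1.33, well short of 2. The second shows the limit set by the unenhanced fraction: as the enhanced speedup grows, the overall speedup approaches 1/(1 - 0.5) = 2.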
-
Points to Remember
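Processor performance
$$\frac{\text{Time}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}}$$
  Instructions/Program (code size), Cycles/Instruction (CPI), Time/Cycle (cycle time)
  Terms are inter-related
  Minimize time, which is the product, NOT isolated terms
Use of a benchmark suite to measure performance
Reporting by a single number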
-
Thanks!