1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

44
1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number:

Transcript of 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

Page 1: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

1

COMS 361Computer Organization

Title: Performance

Date: 10/02/2004

Lecture Number: 3

Page 2: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

2

Announcements

• Homework 1– Due next Tuesday 9/07/04

Page 3: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

3

Review

• Computer System Organization• Instruction Set Architecture (ISA)

– Important from the programmers point of view– Allows families of processors

• Hardware Set Architecture (HSA)– Defines operational components and their interconnections– An implementation of an ISA

• Computer Organization– The implementation of an HSA

Page 4: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

4

Outline

• Trends• Performance

– Measures– Comparisons

Page 5: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

5

VLSI Trends: Moore’s Law

• In 1965, Gordon Moore predicted that transistors would continue to shrink, allowing:– Doubled transistor density every 24 months– Doubled performance every 18 months

• History has proven Moore right

• But, is the end in sight?– Physical limitations– Economic limitations

Gordon MooreIntel Co-Founder and Chairmain Emeritus

Page 6: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

6

Microprocessor Trends (Log Scale)

Page 7: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

7

Microprocessor Trends (Intel)

Year Chip L transistors

1971 4004 10µm 2.3K

1974 8080 6µm 6.0K

1976 8088 3µm 29K

1982 80286 1.5µm 134K

1985 80386 1.5µm 275K

1989 80486 0.8µm 1.2M

1993 Pentium® 0.8µm 3.1M

1995 Pentium® Pro 0.6µm 15.5M

1999 Mobile PII 0.25µm 27.4

2000 Pentium® 4 0.18µm 42M

2002 Pentium® 4 (N) 0.13µm 55M

2003 Itanium® 2 (M) 0.13µm 410M

Page 8: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

8

Microprocessor Trends

Alpha (R.I.P)

P4N, G5

0

50

100

150

200

250

300

350

400

450

1970 1975 1980 1985 1990 1995 2000 2005

Tra

nsi

sto

rs (

Mill

ion

s)

Intel

Motorola

DEC/Compaq

IBM

I2M

Page 9: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

9

Microprocessor Trends (Log Scale)

0.001

0.01

0.1

1

10

100

1960 1970 1980 1990 2000

Tra

nsis

tors

(M

illi

on

s)

Intel

Motorola

DEC/Compaq

Alpha (R.I.P)

P4N, G5

G4

I2M

Page 10: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

10

DRAM Memory Trends (Log Scale)

0.0625

0.25

1

4

16

64128

256512

0.01

0.1

1

10

100

1000

1975 1980 1985 1990 1995 2000 2005

Size (Mb)

Page 11: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

11

Performance Trends

Vax 11/780

Page 12: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

12

Summary - Technology Trends

• Processor– Logic capacity increases ~ 30% per year– Clock frequency increases ~ 20% per year– Cost per function decreases ~20% per year

• Memory– DRAM capacity: increases ~ 60% per year

(4x every 3 years)– Speed: increases ~ 10% per year– Cost per bit: decreases ~25% per year

• Disk– Storage capacity increases ~60% per year

Page 13: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

13

Performance

• Measure, Report, and Summarize– Understand major factors of performance– Software performance is a function of

• Which instructions are used• Execution time of the instructions used

Page 14: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

14

Performance

• Performance measurement can be difficult– Complex software– Hardware performance enhancements– Different applications may perform differently on the same

hardware• Graphics application• Computational• Database

• Impossible to analytically compute a systems performance

Page 15: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

15

Performance

• An important attribute of a system– Purchasers– Designers– Sales people

• Stretch the truth• Performance claims may be meaningless for your application

• Focus– Why software performs as it does– How does the instruction set affect performance– What hardware features are responsible for improved

performance

Page 16: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

16

Performance: A Matter of Perspective

• What does this mean?– One computer performs better than another?

• Subtle question• Difficult to answer without an accurate meaning of the term

performance

• Example of a performance issue

Page 17: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

17

Performance: A Matter of Perspective

• Which plane has the best performance?– Concord: fastest– DC-8: longest range– 747: largest capacity

Airplane Passengers Range (mi) Speed (mph)

Boeing 737-100 101 630 598Boeing 747 470 4150 610BAC/Sud Concorde 132 4000 1350Douglas DC-8-50 146 8720 544

Page 18: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

18

Performance: A Matter of Perspective

• Consider speed as a performance measure– Speed is still not precise enough– Concord: fastest plane for a individual– 747: fastest plane for transporting 450 people

• Same program running on two computers– First machine done is the fastest, right?– Shared system:

• The most tasks completed in the least amount of time has the better performance

– Response time: individual user– Throughput: system manager

Page 19: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

19

Performance Equation

• Selfish motivation:– Focus on response time

• Often faster response time means more throughput

– Maximize performance by minimizing response time

– Performance of machine X is better than the performance of machine Y if:

xx timeExecution

1 ePerformanc

yx ePerformanc ePerformanc

yx timeExecution

1

timeExecution

1

xy timeExecution timeExecution

Page 20: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

20

Performance Equation

• Performancex > Performancey

• Means the Execution time(x) < Execution (y)

• How much better one machine wrt another– relative performance measure of two machines

– If X is n times faster than Y, the execution time on Y is n times longer than on X

n ePerformanc

ePerformanc

y

x

n timeExectution

timeExecution

x

y

Page 21: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

21

Measuring Performance

• Time to execute a program is our measure• Which time?

– Obvious measures:• Wall-clock time, response time, or elapsed time• The total time for task completion

• Time-sharing (common)– Elapsed time – Time to complete our task

• CPU time– The time the CPU spends on our task

• User time (time executing the instructions in our program)• System time (OS things: opening files, networking, …)

Page 22: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

22

Measuring Performance

• Use the time command to determine– user time– system time– elapsed time

• Combine user and system time for a macroscopic measure of performance

• Designers– How fast can the hardware perform basic functions

• Moving data, adding, multiplying

– Most computers have a constant speed clock that determines when events happen

• Clock– Ticks, cycles, period, rate, …– Heartbeat of the processor

Page 23: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

23

Measuring Performance

• Clock rate – Measured as the number of clock cycles per unit time

t

1 clock cycle

clock period (measured in time)or

periodclock

1 rateclock

0.25 ns, 250 ps

4 GHz

Page 24: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

24

Orders of Magnitude

thousandths 10-3 milli, m

millionths 10-6 micro, u

billionths 10-9 nano, n

trillionths 10-12 pico, p

• Disk access: ms• Memory access: us• Clock cycle time: ns, ps

Page 25: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

25

Orders of Magnitude

thousands 103 kilo, K

millions 106 mega, M

billions 109 giga, G

trillions 1012 tera, T

• Size of L1 cache: k• Size of Main memory: M, G• Size of a disk drive: G• Size of a disk farm: T

Page 26: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

26

Orders of Magnitude

Scientific Units Computer Science Units

103 = 1,000 thousands kilo, K 210 = 1024

106 = 1,000,000 millions mega, M 220 = 1,048,576

109 = 1,000,000,000 billions giga, G 230 = 1,073,741,824

1012 = 1,000,000,000,000 trillions tera, T 240 = 1,099,511,627,776

Page 27: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

27

Clock/execution time

• Relation between clock rate and execution time– Tells what happens to execution time if the rate changes

• Increase performance– Reduce the number of clocks to execute the program– Reduce the clock period

• Increase the clock rate

periodclock * program

cyclesclock ofnumber timeCPU

rateclock

cyclesclock ofnumber timeCPU

Page 28: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

28

Example

• Our favorite program runs in 10 seconds on computer A, which has a 400 Mhz clock. We are trying to help a computer designer build a new machine B, that will run this program in 6 seconds. The designer can use new (or perhaps more expensive) technology to substantially increase the clock rate, but has informed us that this increase will affect the rest of the CPU design, causing machine B to require 1.2 times as many clock cycles as machine A for the same program. What clock rate should we tell the designer to target?

• Don't Panic, can easily work this out from basic principles

Page 29: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

29

Example

• A given program will require

– some number of instructions (machine instructions)

– some number of cycles

– some number of seconds

• We have a vocabulary that relates these quantities:

– cycle time (seconds per cycle)

– clock rate (cycles per second)

– CPI (cycles per instruction)

a floating point intensive application might have a higher CPI

– MIPS (millions of instructions per second)

this would be higher for a program using simple instructions

Page 30: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

30

Performance

• Performance is determined by execution time• Do any of the other variables equal performance?

– # of cycles to execute program?– # of instructions in program?– # of cycles per second?– average # of cycles per instruction?– average # of instructions per second?

• Common pitfall: thinking one of the variables is indicative of performance when it really isn’t

Page 31: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

31

• Suppose we have two implementations of the same instruction set Architecture (ISA).

• For some program,

• Machine A has a clock cycle time of 10 ns. and a CPI of 2.0

• Machine B has a clock cycle time of 20 ns. and a CPI of 1.2

CPI Example

Page 32: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

32

• What machine is faster for this program, and by how much?

• If two machines have the same ISA which of our quantities (e.g., clock rate, CPI, execution time, # of instructions, MIPS) will always be identical?

CPI Example

Page 33: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

33

• A compiler designer is trying to decide between two code sequences for a particular machine. Based on the hardware implementation, there are three different classes of instructions: Class A, Class B, and Class C, and they require one, two, and three cycles (respectively)

• The first code sequence has 5 instructions:– 2 of A, 1 of B, and 2 of C

• The second sequence has 6 instructions:– 4 of A, 1 of B, and 1 of C

# of Instructions Example

Page 34: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

34

• Which sequence will be faster?• How much?• What is the CPI for each sequence?

# of Instructions Example

Page 35: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

35

• Two different compilers are being tested for a 100 MHz. machine with three different classes of instructions: Class A, Class B, and Class C, which require one, two, and three cycles (respectively). Both compilers are used to produce code for a large piece of software.

The first compiler's code uses– 5 million Class A instructions– 1 million Class B instructions– 1 million Class C instructions

MIPS example

Page 36: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

36

• The second compiler's code uses– 10 million Class A instructions– 1 million Class B instructions– 1 million Class C instructions

• Which sequence will be faster according to MIPS?• Which sequence will be faster according to

execution time?

MIPS example

Page 37: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

37

• Performance best determined by running a real application– Use programs typical of expected workload– Or, typical of expected class of applications

e.g., compilers/editors, scientific applications, graphics, etc.

• Small benchmarks– nice for architects and designers– easy to standardize– can be abused

Benchmarks

Page 38: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

38

• SPEC (System Performance Evaluation Cooperative)– companies have agreed on a set of real program and

inputs– can still be abused (Intel’s “other” bug)– valuable indicator of performance (and compiler

technology)

Benchmarks

Page 39: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

39

SPEC ‘89

• Compiler “enhancements” and performance

0

100

200

300

400

500

600

700

800

tomcatvfppppmatrix300eqntottlinasa7doducspiceespressogcc

BenchmarkCompiler

Enhanced compiler

SP

EC

pe

rfo

rman

ce r

atio

Page 40: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

40

SPEC ‘95

Benchmark Description

go Artificial intelligence; plays the game of Gom88ksim Motorola 88k chip simulator; runs test programgcc The Gnu C compiler generating SPARC codecompress Compresses and decompresses file in memoryli Lisp interpreterijpeg Graphic compression and decompressionperl Manipulates strings and prime numbers in the special-purpose programming language Perlvortex A database program

tomcatv A mesh generation programswim Shallow water model with 513 x 513 gridsu2cor quantum physics; Monte Carlo simulationhydro2d Astrophysics; Hydrodynamic Naiver Stokes equationsmgrid Multigrid solver in 3-D potential fieldapplu Parabolic/elliptic partial differential equationstrub3d Simulates isotropic, homogeneous turbulence in a cubeapsi Solves problems regarding temperature, wind velocity, and distribution of pollutantfpppp Quantum chemistrywave5 Plasma physics; electromagnetic particle simulation

Page 41: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

41

SPEC ‘95

Does doubling the clock rate double the performance?

Can a machine with a slower clock rate have better performance?

Clock rate (MHz)

SP

EC

int

2

0

4

6

8

3

1

5

7

9

10

200 25015010050

Pentium

Pentium Pro

PentiumClock rate (MHz)

SP

EC

fp

Pentium Pro

2

0

4

6

8

3

1

5

7

9

10

200 25015010050

Page 42: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

42

Execution Time After Improvement =

Execution Time Unaffected +( Execution Time Affected / Amount of Improvement )

• Example:

"Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?"

How about making it 5 times faster?

• Principle: Make the common case fast

Amdahl's Law

Page 43: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

43

• Suppose we enhance a machine making all floating-point instructions run five times faster. If the execution time of some benchmark before the floating-point enhancement is 10 seconds, what will the speedup be if half of the 10 seconds is spent executing floating-point instructions?

• We are looking for a benchmark to show off the new floating-point unit described above, and want the overall benchmark to show a speedup of 3. One benchmark we are considering runs for 100 seconds with the old floating-point hardware. How much of the execution time would floating-point instructions have to account for in this program in order to yield our desired speedup on this benchmark?

Example

Page 44: 1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.

44

• Performance is specific to a particular program/s– Total execution time is a consistent summary of performance

• For a given architecture performance increases come from:– increases in clock rate (without adverse CPI affects)– improvements in processor organization that lower CPI– compiler enhancements that lower CPI and/or instruction count

• Pitfall: expecting improvement in one aspect of a machine’s

performance to affect the total performance

• You should not always believe everything you read! Read carefully!

(see newspaper articles, e.g., Exercise 2.37)

Remember