
1/16/99 CS520S99 Introduction

C. Edward Chow

Why study computer architecture?

• To learn the principles for designing processors and systems
• To learn the system configuration trade-offs
  – what size of caches/memory is enough
  – what kind of buses to connect system components
  – what size (speed) of disks to use
• To choose a computer for a set of applications in a project.
• To interpret the benchmark figures given by salespersons.
• To decide which processor chips to use in a system
• To design the system software (compiler, OS) for a new processor?
• To be the leader of a processor design team?
• To learn several machines’ assembly languages?


The Basic Structure of a Computer


Control and Data Flow in Processor

Processor is made up of
• Data operator (Arithmetic and Logic Unit, ALU), D: consumes and combines information into a new meaning
• Control, K: evokes operations of other components


Control is often distributed


Instruction Execution at Register Transfer Level (RTL)

• Consider the detailed execution of the instruction “move &100, %d0” (Moving constant 100 to register d0)

• Assume the instruction was loaded into memory location 1000

• The op code of the move instruction and the register address d0 are encoded in bytes 1000 and 1001

• The constant 100 is stored in bytes 1002 and 1003.


RTL Instruction Execution

• Mpc is set to 1000, pointing at the instruction in memory
• Step 1: Mmar = Mpc; // put pc into mar; prepare to fetch instruction.

[Figure label: 1000]


Update Program Counter

• Step 2: Mpc = Mpc+4; // update program counter;
  move Mpc value to D, D performs +4, move result back to Mpc

[Figure labels: 1000, 1000+2, 1002]


Instruction Fetch

• Step 3: Mir = Mp[Mmar]; // fetch instruction
  send Mmar value to Mp; Mp retrieves move|d0 and sends it back to Mir

Steps 3 and 2 can be done in parallel.

[Figure labels: 1000; Move|d0|100]


Instruction Decoding

• Step 4: Decode Instruction in Mir

[Figure label: Move|d0|100]


RTL Instruction Execution

• Step 5: Mgeneral[0] = Mp[Mir_16-31]; // execute the move of the constant into a general register named d0

Subscript 16-31 denotes bits 16 through 31 of Mir, which contain the constant 100.

[Figure labels: Move|d0|100; 100]
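The five register-transfer steps can be traced with a small simulation. This is only a sketch: the byte encoding (`OPCODE_MOVE`, one byte each for op code and register, big-endian constant) is hypothetical, and it assumes the constant is taken directly from the low half of Mir as an immediate operand.

```python
# Toy register-transfer trace of "move &100, %d0".
# The encoding below is hypothetical and chosen only for illustration.
OPCODE_MOVE = 0x01
D0 = 0

# Mp: the instruction occupies bytes 1000..1003 (op code, register, constant).
memory = {1000: OPCODE_MOVE, 1001: D0, 1002: 0, 1003: 100}
regs = {"pc": 1000, "mar": 0, "ir": 0}
general = [0] * 8  # Mgeneral


def fetch_word(mem, addr):
    """Assemble 4 consecutive bytes into one 32-bit word (big-endian)."""
    word = 0
    for offset in range(4):
        word = (word << 8) | mem[addr + offset]
    return word


regs["mar"] = regs["pc"]                      # Step 1: Mmar = Mpc
regs["pc"] = regs["pc"] + 4                   # Step 2: Mpc = Mpc + 4
regs["ir"] = fetch_word(memory, regs["mar"])  # Step 3: Mir = Mp[Mmar]
opcode = (regs["ir"] >> 24) & 0xFF            # Step 4: decode the fields
reg_index = (regs["ir"] >> 16) & 0xFF
immediate = regs["ir"] & 0xFFFF               # low 16 bits hold the constant
if opcode == OPCODE_MOVE:                     # Step 5: execute the move
    general[reg_index] = immediate

print(general[0], regs["pc"])
```

Steps 2 and 3 touch different registers (Mpc versus Mir), which is why they can proceed in parallel.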


Computer Architecture

The term “computer architecture” was coined by IBM in 1964 for use with the IBM 360. Amdahl, Blaauw, and Brooks [1964] used the term to refer to the programmer-visible portion of the instruction set. They believed that a family of machines of the same architecture should be able to run the same software.

Benefits:
• With a precisely defined architecture, we can have many compatible implementations.
• A program written in the same instruction set can run on all the compatible implementations.


Architecture & Implementation

• Single architecture, multiple implementations: computer family

• Multiple architectures, single implementation: microcode emulator


Computer Architecture Topics

Instruction Set Architecture

Pipelining, Hazard Resolution, Superscalar, Reordering, Prediction, Speculation, Vector, DSP

Addressing, Protection, Exception Handling

L1 Cache

L2 Cache

DRAM

Disks, WORM, Tape

Coherence, Bandwidth, Latency

Emerging Technologies, Interleaving, Bus protocols

RAID

VLSI

Input/Output and Storage

Memory Hierarchy

Pipelining and Instruction Level Parallelism

Adapted from (Prof. Patterson’s CS252S98 viewgraph). Copyright 1998 UCB


Computer Architecture Topics

[Figure labels: P, M, and S blocks around an Interconnection Network]

Topologies, Routing, Bandwidth, Latency, Reliability

Network Interfaces

Shared Memory, Message Passing, Data Parallelism

Processor-Memory-Switch

Multiprocessors

Networks and Interconnections

Adapted from (Prof. Patterson’s CS252S98 viewgraph). Copyright 1998 UCB


CS 520 Course Focus

Understanding the design techniques, machine structures, technology factors, and evaluation methods that will determine the form of computers in the 21st century

Computer Architecture:
• Instruction Set Design
• Organization
• Hardware

[Figure labels around Computer Architecture: Technology, Programming Languages, Operating Systems, History, Applications, Interface Design (ISA), Measurement & Evaluation, Parallelism]

Adapted from (Prof. Patterson’s CS252S98 viewgraph). Copyright 1998 UCB


Function Requirements faced by a computer designer

• Applications
  – General purpose: balanced performance for a range of tasks
  – Scientific: high-performance floating point
  – Commercial: support for COBOL (decimal arithmetic), database/transaction processing

• Level of software compatibility
  – Object code/binary level: no software porting, more hardware design cost
  – Programming language level: avoids the old architecture burden, requires software porting


Function Requirements faced by a computer designer

• Operating System Requirements
  – Size of address space
  – Memory management/Protection (e.g. garbage collection vs. real-time scheduling)
  – Interrupts/traps

• Standards
  – Floating Point (IEEE 754)
  – I/O Bus
  – OS
  – Networks
  – Programming Languages


1988 Computer Food Chain

[Figure labels: PC, Workstation, Minicomputer, Mainframe, Mini-supercomputer, Supercomputer, Massively Parallel Processors]

Adapted from (Prof. Patterson’s CS252S98 viewgraph). Copyright 1998 UCB


1998 Computer Food Chain

[Figure labels: PC, Workstation, Server, Mainframe, Supercomputer, Mini-supercomputer, Massively Parallel Processors, Minicomputer]

Now who is eating whom?

Adapted from (Prof. Patterson’s CS252S98 viewgraph). Copyright 1998 UCB


Why Such Change in 10 years?

• Performance
  – Technology advances
    • CMOS VLSI dominates older technologies (TTL, ECL) in cost AND performance
  – Computer architecture advances improve the low end
    • RISC, superscalar, RAID, …
• Price: Lower costs due to …
  – Simpler development
    • CMOS VLSI: smaller systems, fewer components
  – Higher volumes
    • CMOS VLSI: same dev. cost 10,000 vs. 10,000,000 units
  – Lower margins by class of computer, due to fewer services
• Function
  – Rise of networking/local interconnection technology

Adapted from (Prof. Patterson’s CS252S98 viewgraph). Copyright 1998 UCB


Technology Trends: Microprocessor Capacity

[Figure: Transistors (log scale, 1,000 to 100,000,000) vs. Year (1970 to 2000), labeled Moore’s Law; points for i4004, i8080, i8086, i80286, i80386, i80486, Pentium; a “Graduation Window” is marked]

CMOS improvements:
• Die size: 2X every 3 yrs
• Line width: halve / 7 yrs

• Alpha 21264: 15 million
• Pentium Pro: 5.5 million
• PowerPC 620: 6.9 million
• Alpha 21164: 9.3 million
• Sparc Ultra: 5.2 million

Adapted from (Prof. Patterson’s CS252S98 viewgraph). Copyright 1998 UCB


Memory Capacity (Single Chip DRAM)

[Figure: Bits (log scale, 1,000 to 1,000,000,000) vs. Year (1970 to 2000)]

year   size (Mb)   cycle time
1980   0.0625      250 ns
1983   0.25        220 ns
1986   1           190 ns
1989   4           165 ns
1992   16          145 ns
1996   64          120 ns
2000   256         100 ns

Adapted from (Prof. Patterson’s CS252S98 viewgraph). Copyright 1998 UCB


Technology Trends (Summary)

        Capacity        Speed (latency)
Logic   2x in 3 years   2x in 3 years
DRAM    4x in 3 years   2x in 10 years
Disk    4x in 3 years   2x in 10 years
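To compare these trends on a common scale, each "Nx in Y years" figure can be converted into an equivalent compound rate per year. A minimal sketch, assuming steady compound growth (the helper name `annual_rate` is illustrative):

```python
def annual_rate(factor, years):
    """Compound growth per year equivalent to `factor`x over `years` years."""
    return factor ** (1.0 / years) - 1.0


# (technology, metric) -> (improvement factor, years), as given in the table.
trends = {
    ("Logic", "capacity"): (2, 3),
    ("Logic", "speed"): (2, 3),
    ("DRAM", "capacity"): (4, 3),
    ("DRAM", "speed"): (2, 10),
    ("Disk", "capacity"): (4, 3),
    ("Disk", "speed"): (2, 10),
}

for (tech, metric), (factor, years) in trends.items():
    print(f"{tech:5s} {metric:8s} {annual_rate(factor, years):6.1%} per year")
```

Under this assumption, capacity that grows 4x in 3 years improves by roughly 59% per year, while latency that improves 2x in 10 years gains only about 7% per year, which is the gap the table highlights.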

Adapted from (Prof. Patterson’s CS252S98 viewgraph). Copyright 1998 UCB


Processor Performance Trends

[Figure: performance (log scale, 0.1 to 1000) vs. Year (1965 to 2000) for Microprocessors, Minicomputers, Mainframes, Supercomputers]

Adapted from (Prof. Patterson’s CS252S98 viewgraph). Copyright 1998 UCB


Processor Performance (1.35X before, 1.55X now)

[Figure: performance (0 to 1200) by year (87 to 97), with a 1.54X/yr trend; machines plotted: MIPS M/120, MIPS M 2000, Sun-4/260, IBM RS/6000, HP 9000/750, DEC AXP/500, IBM POWER 100, DEC Alpha 4/266, DEC Alpha 5/300, DEC Alpha 5/500, DEC Alpha 21264/600]

Adapted from (Prof. Patterson’s CS252S98 viewgraph). Copyright 1998 UCB


Performance Trends (Summary)

• Workstation performance (measured in SPECmarks) improves roughly 50% per year (2X every 18 months)

• Improvement in cost performance estimated at 70% per year

Adapted from (Prof. Patterson’s CS252S98 viewgraph). Copyright 1998 UCB


Computer Engineering Methodology

[Figure: three steps]
• Evaluate Existing Systems for Bottlenecks
• Simulate New Designs and Organizations
• Implement Next Generation System

Inputs shown in the figure: Technology Trends, Benchmarks, Workloads, Implementation Complexity

Adapted from (Prof. Patterson’s CS252S98 viewgraph). Copyright 1998 UCB


Measurement and Evaluation

Architecture is an iterative process:
• Searching the space of possible designs
• At all levels of computer systems

[Figure labels: Design, Analysis, Creativity, Cost/Performance Analysis, Good Ideas, Mediocre Ideas, Bad Ideas]

Adapted from (Prof. Patterson’s CS252S98 viewgraph). Copyright 1998 UCB


Measurement Tools

• Benchmarks, Traces, Mixes
• Hardware: Cost, delay, area, power estimation
• Simulation (many levels)
  – ISA, RT, Gate, Circuit
• Queuing Theory
• Rules of Thumb
• Fundamental “Laws”/Principles


Metric of Computer Architecture

• Space measured in bits of representation
• Time measured in bit traffic (memory bandwidth)

Many old frequency and benchmark studies focus on
• dynamic opcode (memory size concern)
• exponent differences of floating point operands (precision)
• length of decimal numbers in business files (memory size)

Trend: space is not much of a concern; speed/time is everything.

Here we focus more on the following two performance metrics:
• Response time = time between start and finish of an event
  – execution time
  – latency
• Throughput = total amount of work done in a given time
  – bandwidth (no. of bits or bytes moved per second)


Metrics of Performance at Different Levels

Levels shown in the figure: Application; Programming Language; Compiler; ISA; Datapath, Control; Function Units; Transistors, Wires, Pins

Metrics shown in the figure:
• Answers per month
• Operations per second
• (Millions) of instructions per second: MIPS
• (Millions) of (FP) operations per second: MFLOP/s
• Megabytes per second
• Cycles per second (clock rate)

Adapted from (Prof. Patterson’s CS252S98 viewgraph). Copyright 1998 UCB


Quantitative principles

Improve means
• increase performance
• decrease execution time

“X is n% faster than Y” means

$$\frac{ExecutionTime_Y}{ExecutionTime_X} = 1 + \frac{n}{100}$$

Quantitative principles
• Make the common case fast
  – Amdahl’s Law
• Locality of reference
  – 90% of execution time in 10% of code
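The "n% faster" definition can be turned into a one-line helper. This is a sketch only; the function name and the sample times are hypothetical.

```python
def percent_faster(time_y, time_x):
    """Return n such that X is n% faster than Y: time_y / time_x = 1 + n/100."""
    return (time_y / time_x - 1.0) * 100.0


# Hypothetical execution times: Y takes 15 s, X takes 10 s.
n = percent_faster(15.0, 10.0)
print(f"X is {n:.0f}% faster than Y")
```

Note that the slower machine's time goes in the numerator, so a machine that halves the execution time is 100% faster, not 50%.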


Amdahl’s Law

Law of diminishing returns

$$Speedup = \frac{ExecutionTimeWithoutEnhancement}{ExecutionTimeWithEnhancement}$$

$$Time_{new} = Time_{old} \times \left[(1 - FractionInEnhancedMode) + \frac{FractionInEnhancedMode}{SpeedupOfEnhancedMode}\right]$$

$$Speedup = \frac{Time_{old}}{Time_{new}} = \frac{1}{(1 - FractionInEnhancedMode) + \frac{FractionInEnhancedMode}{SpeedupOfEnhancedMode}}$$

Example: FractionInEnhancedMode = 0.5 (based on the old system), SpeedupOfEnhancedMode = 2

Time_old = 50 + 50
Time_new = 50 + 25
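The speedup formula can be checked against the 50/50 example with a few lines of Python. A minimal sketch; `amdahl_speedup` is an illustrative name, and the 100-unit old time is the 50 + 50 from the example.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when `fraction_enhanced` of the old time runs `speedup_enhanced`x faster."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


time_old = 100.0                      # 50 units unaffected + 50 units enhanced
speedup = amdahl_speedup(0.5, 2.0)    # half of the time is sped up 2x
time_new = time_old / speedup         # 50 + 25 = 75 units

print(f"speedup = {speedup:.2f}, new time = {time_new:.1f}")
```

Doubling the speed of half the work yields only a 1.33x overall gain, which is why the law is described as one of diminishing returns.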


Amdahl’s Law Result

FractionInEnhancedMode   OverallSpeedup when SpeedupOfEnhancedMode=2   OverallSpeedup when SpeedupOfEnhancedMode=∞
0.99                     1.98                                          100
0.9                      1.82                                          10
0.7                      1.54                                          3.33
0.5                      1.33                                          2
0.3                      1.18                                          1.43
0.1                      1.05                                          1.11
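A table of this kind can be regenerated from Amdahl's Law directly. The sketch below assumes the second speedup column corresponds to an unboundedly fast enhanced mode, modeled with `math.inf`; the function name is illustrative.

```python
import math


def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup from Amdahl's Law."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


print("fraction  speedup(2x)  speedup(inf)")
for fraction in (0.99, 0.9, 0.7, 0.5, 0.3, 0.1):
    with_2x = amdahl_speedup(fraction, 2.0)
    with_inf = amdahl_speedup(fraction, math.inf)  # fraction / inf evaluates to 0.0
    print(f"{fraction:<9} {with_2x:<12.2f} {with_inf:.2f}")
```

The infinite-speedup column equals 1/(1 - fraction), the upper bound on what any enhancement of that fraction can deliver.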


Apply Amdahl’s Law: Example 1

Example 1: Assume that memory access accounts for 90% of the execution time. What is the speedup from replacing a 100ns memory with a 10ns memory? How much faster is the new system?

Answer:
FractionInEnhancedMode = 90% = 0.9
SpeedupOfEnhancedMode = 100ns/10ns = 10

$$OverallSpeedup = \frac{1}{0.1 + \frac{0.9}{10}} = \frac{1}{0.1 + 0.09} = \frac{1}{0.19} = 5.26 = 1 + \frac{426}{100}$$

The new system is 426% faster than the old one.
Is it worthwhile if the high speed memory costs 10 times more?
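The arithmetic of Example 1 can be reproduced as follows. A small sketch with illustrative names; the inputs are the 90% fraction and the 100ns/10ns ratio from the example.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup from Amdahl's Law."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


memory_speedup = 100 / 10                 # 100 ns memory replaced by 10 ns memory
overall = amdahl_speedup(0.9, memory_speedup)
percent_faster = (overall - 1.0) * 100.0  # "n% faster" means speedup = 1 + n/100

print(f"overall speedup = {overall:.2f}, about {percent_faster:.0f}% faster")
```

A 10x faster memory gives a little over 5x overall, because the untouched 10% of the time now accounts for more than half of the new execution time.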


Apply Amdahl’s Law: Example 2

Example 2: Assume that 40% of the time is spent on CPU task; the rest is spent on I/O. Assume we improve CPU and keep I/O speed unchanged.

a) How much faster should new CPU be to have the overall speedup of 1.5?

b) Is that possible to have an overall speedup of 2? Why?

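Solution:

a) Solve for x in

$$1.5 = \frac{1}{(1 - 0.4) + \frac{0.4}{x}}$$

which gives x = 6. The new CPU should be 500% faster.

b) The maximum overall speedup that can be achieved is

$$\frac{1}{1 - 0.4} \approx 1.67$$

Therefore, it is not possible to achieve an overall speedup of 2.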
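Both parts of Example 2 follow from inverting Amdahl's Law for the enhanced-mode speedup. The sketch below uses a hypothetical helper, `required_speedup`, that returns `None` when the target exceeds the limit 1/(1 - fraction).

```python
def required_speedup(fraction_enhanced, target_overall):
    """Enhanced-mode speedup x needed for a target overall speedup, or None if unreachable."""
    # From target = 1 / ((1 - f) + f / x):  f / x = 1 / target - (1 - f)
    remaining = 1.0 / target_overall - (1.0 - fraction_enhanced)
    if remaining <= 0.0:
        return None  # even an infinitely fast enhanced mode is not enough
    return fraction_enhanced / remaining


cpu_fraction = 0.4
x = required_speedup(cpu_fraction, 1.5)       # part a)
limit = 1.0 / (1.0 - cpu_fraction)            # part b): best possible overall speedup
impossible = required_speedup(cpu_fraction, 2.0)

print(f"x = {x:.2f}, limit = {limit:.2f}, speedup of 2 reachable: {impossible is not None}")
```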


Apply Amdahl’s Law: Example 3

Example: A recent study of the bottleneck of a 10 Mbps Ethernet network system showed that only 10% of the execution time of a distributed application was spent on transmitting messages and 90% of the time was spent on application/protocol software execution at the host computers. If we replace Ethernet with 100 Mbps FDDI, which is 900% faster than Ethernet, what will be the speedup of this improvement? What if we use 900% faster hosts?
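One way to work this example is to apply Amdahl's Law twice, once with the network as the enhanced part and once with the hosts. This is a sketch under the assumption that "900% faster" means a 10x speedup of the enhanced part (1 + 900/100); the names are illustrative.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup from Amdahl's Law."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


ten_times = 1 + 900 / 100                            # "900% faster" read as 10x

faster_network = amdahl_speedup(0.10, ten_times)     # only 10% of time is transmission
faster_hosts = amdahl_speedup(0.90, ten_times)       # 90% of time is host software

print(f"faster network: {faster_network:.2f}x, faster hosts: {faster_hosts:.2f}x")
```

Under these assumptions the faster network helps by less than 10%, while the faster hosts give more than a 5x gain, which illustrates "make the common case fast".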


Execution Time

The first performance metric and the best metric.

Measure the time it takes to execute the intended application(s) or the typical workload. The time command can measure an application.

vlsia[93]: time ts9
217.1u 27.2s 8:16 49% 0+27552k 6+3io 26pf+0w

Here is an example which shows how OS and I/O impact the execution time.

For program 1,
Elapsed Time = sum(t1:t11) - t6 - t8
System CPU time = t1 + t3 + t5 + t9 + t11
CPU time = t1 + t3 + t4 + t5 + t9 + t10
User CPU time = t4 + t10
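The fields printed by the C shell `time` command are related: the percentage is user plus system CPU time divided by elapsed time. The sketch below assumes the sample line reads 217.1 s user, 27.2 s system, and 8:16 elapsed; the helper name is illustrative.

```python
def elapsed_seconds(text):
    """Convert an elapsed time such as '8:16' or '1:02:03' into seconds."""
    total = 0
    for part in text.split(":"):
        total = total * 60 + int(part)
    return total


user_cpu = 217.1      # seconds spent in user mode
system_cpu = 27.2     # seconds spent in the OS on behalf of the program
elapsed = elapsed_seconds("8:16")

cpu_share = (user_cpu + system_cpu) / elapsed
print(f"CPU time = {user_cpu + system_cpu:.1f} s of {elapsed} s elapsed ({cpu_share:.0%})")
```

With this reading the computed share matches the 49% in the sample output; the remaining elapsed time is spent waiting on I/O or on other programs.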


CPU Time

$$CPU\ time = IC \times CPI \times ClockCycleTime = \left(\sum_i CPI_i \times I_i\right) \times ClockCycleTime$$

CPI = Clock cycles Per Instruction; I_i is the frequency (number of occurrences) of instruction i in a program; IC = Instruction Count; ClockCycleTime = 1/ClockRate

The CPI figure gives insight into different styles of instruction sets and implementations.

Interdependence among instruction count, CPI, and clock rate:
• Clock rate: hardware technology and organization
• CPI: organization and instruction set architecture
• Instruction count: instruction set architecture and compiler technology

We cannot measure the performance of a computer by any single factor above alone.


Evaluating Instruction Set Design

Example Page 39: 1/4 of ALU and Load instructions replaced by new r->m inst. Assume that the clock cycle time is not changed. Is this a good idea?

Instruction   Frequency Before   Clock cycles Before   Frequency After   Clock cycles After
ALU ops       43%                1                     36.1%             1
Loads         21%                2                     11.4%             2
Stores        12%                2                     13.5%             2
Branches      24%                2                     26.9%             3
New r->m                                               12.1%             2


Evaluate Instruction Design

CPI_old = (0.43*1 + 0.21*2 + 0.12*2 + 0.24*2) = 1.57

CPU time_old = InstructionCount_old * 1.57 * ClockCycleTime_old

$$CPI_{new} = \frac{(0.43 - 0.25 \times 0.43) \times 1 + (0.21 - 0.25 \times 0.43) \times 2 + (0.25 \times 0.43) \times 2 + 0.12 \times 2 + 0.24 \times 3}{1 - 0.25 \times 0.43} = 1.908$$

CPU time_new = (0.893 * InstructionCount_old) * 1.908 * ClockCycleTime_old
             = 1.703 * InstructionCount_old * ClockCycleTime_old

With these assumptions, it is a bad idea to add register-memory instructions.
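The comparison can be recomputed from the instruction mix. A sketch, with illustrative dictionary keys; each replaced ALU operation is assumed to absorb one load, as in the calculation above.

```python
# Instruction mix before the change: type -> (fraction of old instructions, cycles).
old_mix = {"alu": (0.43, 1), "load": (0.21, 2), "store": (0.12, 2), "branch": (0.24, 2)}

replaced = 0.25 * old_mix["alu"][0]   # 1/4 of ALU ops, each paired with one load

# After the change, still expressed as fractions of the OLD instruction count.
new_mix = {
    "alu": (old_mix["alu"][0] - replaced, 1),
    "load": (old_mix["load"][0] - replaced, 2),
    "reg_mem": (replaced, 2),           # new register-memory instruction
    "store": (0.12, 2),
    "branch": (0.24, 3),                # branches now take 3 cycles
}


def cycles_per_old_instruction(mix):
    """Total cycles, normalized by the old instruction count."""
    return sum(fraction * cycles for fraction, cycles in mix.values())


ic_ratio = sum(fraction for fraction, _ in new_mix.values())  # new IC / old IC
cpi_old = cycles_per_old_instruction(old_mix)
cpi_new = cycles_per_old_instruction(new_mix) / ic_ratio
time_ratio = (ic_ratio * cpi_new) / cpi_old                   # same clock cycle time

print(f"CPI old = {cpi_old:.2f}, CPI new = {cpi_new:.3f}, time new/old = {time_ratio:.3f}")
```

The instruction count drops by about 11%, but the higher CPI more than cancels that, so the CPU time grows.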


Estimate CPU time by Σ(CPIi*InstructionCounti)*ClockCycleTime

Program: f=(a-b)/(c-d*e)
MIPS R2000 25MHz

Instructions (op dst, src1, src2)
    lw   $14, 20($sp)
    lw   $15, 16($sp)
    subu $24, $14, $15
    lw   $25, 8($sp)
    lw   $8, 4($sp)
    mul  $9, $25, $8
    lw   $10, 12($sp)
    subu $11, $10, $9
    div  $12, $24, $11
    sw   $12, 0($sp)

IC = InstructionCount = 10
CPI = ClockCyclesPerInstruction
CPIi = ClockCyclesOfInstructionType i
Ii = number of instructions of type i in a program
ClockCycleTime = 1/ClockRate = 1/(25*10^6) = 40*10^-9 sec = 40 nsec

CPIi can be obtained from the processor handbook.

Here we assume no cache misses.


Estimate CPU time by ClockCycleTime*Σ(CPIi*InstructionCounti)

i       Instruction Type   Count Ii   CPIi   CPIi*ICi
1       lw                 5          2      10
2       subu               2          1      2
3       mul                1          1      1
4       div                1          1      1
5       sw                 1          2      2
Total                                        16

CPU Time = 16*40 nsec = 640 nsec
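The same estimate can be produced by counting instruction types in the listing and applying the per-type cycle counts from the table. A minimal sketch; the variable names are illustrative and the cycle counts are the ones assumed on the slide (no cache misses).

```python
from collections import Counter

# Opcodes of the compiled segment for f=(a-b)/(c-d*e), in program order.
program = ["lw", "lw", "subu", "lw", "lw", "mul", "lw", "subu", "div", "sw"]

# Clock cycles per instruction type, as listed in the table.
cycles_per_type = {"lw": 2, "subu": 1, "mul": 1, "div": 1, "sw": 2}

counts = Counter(program)                                   # Ii for each type
total_cycles = sum(counts[op] * cycles_per_type[op] for op in counts)

clock_rate_hz = 25e6                                        # 25 MHz R2000
cpu_time_ns = total_cycles / clock_rate_hz * 1e9

print(f"IC = {len(program)}, cycles = {total_cycles}, CPU time = {cpu_time_ns:.0f} ns")
```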


Other Performance Measures

The only reliable measure of performance is the execution time of real programs. Other attempts:

1. MIPS

$$MIPS = \frac{InstructionCount}{ExecutionTime \times 10^6} = \frac{ClockRate}{CPI \times 10^6}$$

• Depends on instruction set, hard to compare
• MIPS varies with programs on the same computer.

Example 1: the impact of using floating-point hardware on MIPS.
Example 2: the impact of optimizing compiler usage on MIPS.

What affects performance?

• input
• version of programs, compiler, OS, CPU
• optimizing level of compiler
• machine configurations
  – amount of cache, main memory, disks
  – the speed of cache, main memory, disks, and bus.


Myth of MIPS

Example: The effect of an optimizing compiler on the MIPS number. (Page 45)

A machine has a 500 MHz clock rate and the following clock cycles per instruction type. For a program, the relative frequencies of instructions before and after using an optimizing compiler are as shown in the table.

Instruction Type   IC Before Optimization   CPIi   IC After Optimization
ALU ops            86                       1      43
Loads              42                       2      42
Stores             24                       2      24
Branches           48                       2      48

CPI unoptimized = 86/200*1 + 42/200*2 + 24/200*2 + 48/200*2 = 1.57
MIPS unoptimized = (500*10^6)/(1.57*10^6) = 318.5
CPI optimized = 43/157*1 + 42/157*2 + 24/157*2 + 48/157*2 = 1.73
MIPS optimized = (500*10^6)/(1.73*10^6) = 289.0
CPU time unoptimized = 200*1.57*(2*10^-9) = 6.28*10^-7 sec
CPU time optimized = 157*1.73*(2*10^-9) = 5.43*10^-7 sec

The optimized program runs faster even though its MIPS rating is lower.
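The contrast between MIPS and CPU time can be recomputed from the instruction counts. A sketch with illustrative names; because it does not round CPI before dividing, the optimized figures differ slightly in the last digits from hand calculations that use 1.73.

```python
CLOCK_HZ = 500e6
CPI_BY_TYPE = {"ALU ops": 1, "Loads": 2, "Stores": 2, "Branches": 2}

before = {"ALU ops": 86, "Loads": 42, "Stores": 24, "Branches": 48}
after = {"ALU ops": 43, "Loads": 42, "Stores": 24, "Branches": 48}


def rate(counts):
    """Return (CPI, MIPS, CPU time in seconds) for a table of instruction counts."""
    instructions = sum(counts.values())
    cycles = sum(n * CPI_BY_TYPE[kind] for kind, n in counts.items())
    cpi = cycles / instructions
    mips = CLOCK_HZ / (cpi * 1e6)
    cpu_time = cycles / CLOCK_HZ
    return cpi, mips, cpu_time


cpi_before, mips_before, time_before = rate(before)
cpi_after, mips_after, time_after = rate(after)

print(f"unoptimized: CPI={cpi_before:.2f} MIPS={mips_before:.1f} time={time_before:.3g} s")
print(f"optimized:   CPI={cpi_after:.2f} MIPS={mips_after:.1f} time={time_after:.3g} s")
```

The optimizer removes only cheap one-cycle ALU operations, so the average CPI rises and MIPS falls while the total cycle count, and hence the execution time, goes down.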


MFLOPS

For scientific computing MFLOPS is used as a metric:

$$MFLOPS = \frac{NoOfFPOperationInAProgram}{ExecutionTime \times 10^6}$$

Here it emphasizes operations instead of instructions.
• Unfortunately, the set of floating-point operations is not consistent across machines.
• The rating changes with different mix ratios of integer-floating or floating-floating instructions.

The solution is to use a canonical number of floating-point operations for certain types of FP operations, e.g. 1 for (add, sub, compare, mul), 4 for (fdiv, fsqrt), 8 for (arctan, sin, exp).
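The canonical weighting can be sketched as follows. The weights are the ones listed above; the operation counts and the 2-second run time are hypothetical values chosen only to show the calculation.

```python
# Canonical operation weights: 1 for simple ops, 4 for fdiv/fsqrt, 8 for transcendental ops.
WEIGHTS = {
    "add": 1, "sub": 1, "compare": 1, "mul": 1,
    "fdiv": 4, "fsqrt": 4,
    "arctan": 8, "sin": 8, "exp": 8,
}


def native_mflops(op_counts, execution_time_sec):
    """MFLOPS counting every FP operation once."""
    return sum(op_counts.values()) / (execution_time_sec * 1e6)


def normalized_mflops(op_counts, execution_time_sec):
    """MFLOPS counting each FP operation by its canonical weight."""
    weighted_ops = sum(WEIGHTS[op] * n for op, n in op_counts.items())
    return weighted_ops / (execution_time_sec * 1e6)


# Hypothetical program profile.
op_counts = {"add": 4_000_000, "mul": 3_000_000, "fdiv": 500_000, "sin": 250_000}
run_time = 2.0  # seconds

native = native_mflops(op_counts, run_time)
normalized = normalized_mflops(op_counts, run_time)
print(f"native = {native:.3f} MFLOPS, normalized = {normalized:.3f} MFLOPS")
```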


Programs to Evaluate Performance

Real programs: the set of programs to be run forms the workload.

Kernels: key pieces of real programs; isolate features of a machine; Livermore Loops (weighted ops); Linpack

Toy Benchmarks: 10 to 100 lines of code, e.g., quicksort, Sieve, Puzzle

Synthetic Benchmarks: artificially created to match an average execution profile, e.g., Whetstone, Dhrystone

SPEC (System Performance Evaluation Cooperative) Benchmarks 89, 92, 95.

Perfect Club Benchmarks for parallel computations.


SPEC: System Performance Evaluation Cooperative Benchmark

• First Round 1989: 10 programs yielding a single number (“SPECmarks”)
• Second Round 1992: SPECInt92 (6 integer programs) and SPECfp92 (14 floating point programs)
  – Compiler flags unlimited. March 93 of DEC 4000 Model 610:
      spice: unix.c:/def=(sysv,has_bcopy,”bcopy(a,b,c)=memcpy(b,a,c)”
      wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200
      nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas
• Third Round 1995
  – new set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating point)
  – “benchmarks useful for 3 years”
  – Single flag setting for all programs: SPECint_base95, SPECfp_base95


Comparison of Machine Performance

Single Program: execution time

Collection of (n) Programs
1. Total execution time
2. Normalized to a reference machine, compute the TimeRatio of the ith program:
   TimeRatio_i = Time_i / Time_i(ReferenceMachine)

$$\text{arithmetic mean} = \frac{1}{n}\sum_{i=1}^{n} TimeRatio_i$$

$$\text{geometric mean} = \sqrt[n]{\prod_{i=1}^{n} TimeRatio_i}$$

$$\text{harmonic mean} = \frac{n}{\sum_{i=1}^{n} \frac{1}{TimeRatio_i}}$$

The geometric mean is consistent, independent of the reference machine.
The harmonic mean decreases the impact of outliers.
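The three means are easy to compare side by side on a small set of ratios. A minimal sketch; the two sample ratios are hypothetical, and `math.prod` requires Python 3.8 or later.

```python
import math


def arithmetic_mean(ratios):
    return sum(ratios) / len(ratios)


def geometric_mean(ratios):
    return math.prod(ratios) ** (1.0 / len(ratios))


def harmonic_mean(ratios):
    return len(ratios) / sum(1.0 / r for r in ratios)


# Hypothetical time ratios of two programs against a reference machine.
ratios = [2.0, 8.0]

am = arithmetic_mean(ratios)
gm = geometric_mean(ratios)
hm = harmonic_mean(ratios)
print(f"arithmetic = {am:.2f}, geometric = {gm:.2f}, harmonic = {hm:.2f}")
```

For positive values the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean, so the choice of mean alone can change how a machine ranks.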


Summarize Performance Results

Example: Execution of two programs on three machines. Assume Program 1 has 10M floating point operations and Program 2 has 50M floating point operations.

                             Computer A          Computer B     Computer C
Program 1 (sec)              1                   10             20
Program 2 (sec)              100                 50             20
Total Time (sec)             101                 60             40
Native MFLOPS on Program 1   10/1=10             10/10=1        10/20=0.5
Native MFLOPS on Program 2   50/100=0.5          50/50=1        50/20=2.5
Arithmetic Mean              (10+0.5)/2=5.25     (1+1)/2=1      (0.5+2.5)/2=1.5
Geometric Mean               sqrt(10*0.5)=2.24   sqrt(1*1)=1    sqrt(0.5*2.5)=1.12
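The summary rows can be recomputed from the raw execution times. A sketch with illustrative names, using the stated 10M and 50M floating-point operation counts.

```python
import math

MFLOP_PER_PROGRAM = {"Program 1": 10, "Program 2": 50}   # millions of FP operations

seconds = {
    "A": {"Program 1": 1, "Program 2": 100},
    "B": {"Program 1": 10, "Program 2": 50},
    "C": {"Program 1": 20, "Program 2": 20},
}

summary = {}
for machine, times in seconds.items():
    rates = [MFLOP_PER_PROGRAM[p] / times[p] for p in MFLOP_PER_PROGRAM]
    summary[machine] = {
        "total_time": sum(times.values()),
        "arithmetic_mean": sum(rates) / len(rates),
        "geometric_mean": math.prod(rates) ** (1.0 / len(rates)),
    }

for machine, row in summary.items():
    print(f"{machine}: total={row['total_time']} s, "
          f"AM={row['arithmetic_mean']:.2f}, GM={row['geometric_mean']:.2f} MFLOPS")
```

Computer A has the highest mean MFLOPS under both means yet the longest total time, while Computer C finishes the workload first, which is why total execution time is the safer summary.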


Weighted Arithmetic Means

• For a set of n programs, each taking Time_i on one machine, the “equal-time” weights on that machine are

$$w_i = \frac{1}{Time_i \times \sum_{j=1}^{n} \frac{1}{Time_j}}$$

Figure 1.12: W(3) [W(2)] are equal-time weights based on machine A [B]. This is used in Exercise 1.11.

           A       B       C     W(1)   W(2)    W(3)
P1 (sec)   1       10      20    0.5    0.909   0.999
P2 (sec)   1000    100     20    0.5    0.091   0.001
AM:W(1)    500.5   55      20
AM:W(2)    91.82   18.18   20
AM:W(3)    1.998   10.09   20
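The equal-time weights and the weighted means in the table can be reproduced with a short script. A minimal sketch; the function names are illustrative.

```python
def equal_time_weights(times):
    """Weights w_i = 1 / (Time_i * sum_j 1/Time_j); they sum to 1."""
    inverse_sum = sum(1.0 / t for t in times)
    return [1.0 / (t * inverse_sum) for t in times]


def weighted_mean(weights, times):
    return sum(w * t for w, t in zip(weights, times))


# Execution times (sec) of P1 and P2 on each machine.
times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}

weights = {
    "W(1)": [0.5, 0.5],                       # equal weights
    "W(2)": equal_time_weights(times["B"]),   # equal time on machine B
    "W(3)": equal_time_weights(times["A"]),   # equal time on machine A
}

for name, w in weights.items():
    row = "  ".join(f"{m}={weighted_mean(w, t):.2f}" for m, t in times.items())
    print(f"AM:{name}  {row}")
```

With equal-time weights each program contributes the same amount to the weighted mean on the machine the weights were derived from, so no single long-running program dominates there.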


Hints for Homework # 1

Exercise 1.7:
1. Whetstone consists of integer operations besides the floating-point operations.
2. When a floating-point processor is not used, all floating-point operations need to be emulated by integer operations (e.g. shift, and, add, sub, multiply, div...).
3. For different floating-point coprocessors, we will have the same # of integer ops but different # of FP ops.

Exercise 1.11:
a. Use the equal-time weightings formula on Page 26.
b. DEC3000 execution time(ora) = VAX-11/780 Time(ora) / DEC3000 SPECRatio = 7421/165


FP Compilation Results depend on existence of FP coprocessor

Exercise 1.7. Whetstone is a benchmark with both Integer and Floating Point (FP) operations.


Compiling floating-point statement

Here are the generated assembly instructions of a floating-point operation statement in C on DEC3100 (with R2010 floating point unit) using command cc -S

Note that since the R2010 only implements simple floating point add, sub, mult, and div operations, sqrt, exp, and alog are translated as subroutine calls using jal instr. The floating-point division is translated as div.d and will be executed by R2010.

# 7 x=sqrt(exp(alog(x)/t1));
    s.d     $f4, 48($sp)      #load x to fp register f4
    l.d     $f12, 56($sp)     #load t1 to fp register f12
    jal     alog              #call subroutine alog
    move    $16, $2
    mtc1    $16, $f6
    cvt.d.w $f8, $f6          #f8 contains alog(x)
    l.d     $f10, 48($sp)
    div.d   $f12, $f8, $f10
    jal     exp
    mov.d   $f20, $f0
    mov.d   $f12, $f20
    jal     sqrt
    s.d     $f0, 56($sp)


Homework #1

Problems 1.7 and 1.11

Problem A. Program segment: f=(a-b)/(a*b) is compiled into the following MIPS R2000 code.

Instructions (op dst, src1, src2)

lw $14, 20($sp) # a is allocated at M[sp+20]

lw $15, 16($sp) # b is allocated at M[sp+16]

subu $24, $14, $15

mul $9, $14, $15

div $12, $24, $9

sw $12, 0($sp) # f is allocated at M[sp+0]


Homework #1 (Continued)

Assume all the variables are already in the cache (i.e. the processor does not have to go to the main memory for data) and Table 1 contains the clock cycles for each type of instruction when data is in the cache.

What is the execution time (in seconds) of the above segment using an R2000 chip with a 25 MHz clock?

Problem B. Assume the CPU operation accounts for 70% of the time in a system.

a) What is the overall speedup if we improve CPU speed by 100%?

b) How much faster should the new CPU be in order to have the overall speedup of 1.7?

c) Is it possible to have overall speedup of 3 by just improving the CPU?