EE 334 Computer Architecture (jdelgado/EE334/Intro2011.pdf)

Page 1:

School of Electrical Engineering and Computer Science

WASHINGTON STATE UNIVERSITY

EE 334 COMPUTER ARCHITECTURE

Page 2:

José Delgado-Frias EE 334 2

Topics

1. Computer Abstractions and Technology (Chapter 1)

1.1 Programs

1.2 Integrated circuits

1.3 Microprocessors

2. Instruction Set Architecture (Chapter 2)

2.1 Instruction types: Arithmetic, Logic, Branch, Memory

2.2 Operand storage, type and size

2.3 Examples of instruction sets (MIPS)

3. Computer Arithmetic (Chapter 3)

3.1 Number representation

3.2 Addition/Subtraction

3.3 ALU

3.4 Multiplication/Division

3.5 Floating Point

Page 3:

Topics (Cont’d)

4. Processor Architecture (Chapter 4)

4.1 Building a Datapath

4.2 Pipelining principle

4.3 Pipelined datapath and control

4.4 Data hazards and forwarding

4.5 Control (branch) Hazards

4.6 Instruction-level parallelism

4.7 AMD Opteron X4

Page 4:

Topics (Cont’d)

5. Memory Hierarchy (Chapter 5)

5.1 Principle of locality

5.2 Memory hierarchy

5.3 Cache memory

5.4 Virtual Memory

6. Multicore and Multiprocessor (Chapter 7)

6.1 Parallel Processing

6.2 Shared Memory Multiprocessors

6.3 Hardware Multithreading

6.4 Message passing multiprocessors

Page 5:

Textbook

D. A. Patterson and J. L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, Fourth Edition. Morgan Kaufmann Publishers, 2009.

Page 6:

Grade

2 Exams (25% each) 50%

Homework 15%

Project 15%

Final Exam 20%

Page 7:

Instructor

Dr. José Delgado-Frias

• Ph.D. Texas A&M University

• Academic experience:

– Washington State University (2001-present)

– Princeton University (2008-09 Sabbatical leave)

– University of Virginia

– State University of New York (SUNY)-Binghamton

– University of Oxford (England)

Page 8:

Instructor (Cont’d)

Research interests:

– Computer architecture

• Reconfigurable architectures

– VLSI architectures

– Networking

Page 9:

Office Hours

• Wednesday 10:30am – 12 noon (or by appointment)

• Office: EME 502 (Tower)

[email protected]

• Phone: (509)335-1156 Fax: (509)335-3818

Page 10:

Course Objectives

Students in this course will be able to:

• Understand how modern computer systems work.

• Perform quantitative analysis of computer systems.

• Analyze, at the system level, the impact of changes in computer systems.

• Estimate the performance of a computer system.

• Design novel schemes that improve the performance of computer systems.

• Use tools to design modern systems.

• Recognize the need for further learning in this field (life-long learning).

Page 11:

Things you’ll be learning

• How computers work, a basic foundation

• Classic/basic components of a computer.

• Stored program concept: instructions and data

• Issues affecting modern processors (caches, pipelines)

• Principles of locality to be exploited by means of memory hierarchy (L1, L2 & L3 cache; main memory; disk,…).

• Greater performance by means of instruction level parallelism

• Principle of abstraction, used to build systems as layers.

• Compilation vs. interpretation through system layers.

• How to analyze their performance (or how not to!)

• Principles and pitfalls of performance measurement.
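The locality bullet above can be made concrete with a small sketch. The two hypothetical functions below (the names and the 64 x 64 size are assumptions for illustration, not course material) compute the same sum; the row-major version walks memory in address order, which is the access pattern a cache rewards, while the column-major version jumps a full row between consecutive accesses.

```c
#include <stddef.h>

#define ROWS 64
#define COLS 64

/* Row-major traversal: consecutive accesses touch adjacent addresses,
   so each cache line fetched from memory is fully used (spatial locality). */
long sum_row_major(int m[ROWS][COLS])
{
    long total = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            total += m[r][c];
    return total;
}

/* Column-major traversal: consecutive accesses are COLS ints apart,
   so on large arrays most accesses may land on a different cache line. */
long sum_col_major(int m[ROWS][COLS])
{
    long total = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            total += m[r][c];
    return total;
}
```

Both functions return the same value; any speed difference depends on the array size and the cache organization of the machine, so it should be measured rather than assumed.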

Page 12:

Focus

• Our primary focus: the processor (datapath and control)

– Implemented using millions of transistors

– Impossible to understand by looking at each transistor

– We need to:

• Have an overall picture of the system.

• Analyze how the components interact.

• Consider both Hardware and Software.

Page 13:

Grade

95 – 100      A       77 – 79.999   C+
90 – 94.999   A-      73 – 76.999   C
87 – 89.999   B+      70 – 72.999   C-
83 – 86.999   B       60 – 69.999   D
80 – 82.999   B-      Below 60      F

Page 14:

Basic components

Computer

• Processor
– Control (“brain”)
– Datapath (“brawn”)
• Memory (where programs, data live when running)
• Devices
– Input: Keyboard, Mouse
– Output: Display, Printer
– Disk (where programs, data live when not running)

Page 15:

Computer System

Software
• Application (Explorer)
• Operating System
• Compiler
• Assembler

Instruction Set Architecture

Hardware
• Processor, Memory, I/O system
• Datapath & Control
• Digital Design
• Circuit Design
• transistors

Page 16:

Abstraction

• Delving into the depths reveals more information

• An abstraction omits unneeded detail, helps us cope with complexity

What are some of the details that appear in these familiar abstractions?

High-level language program (in C)

    /* Exchange v[k] and v[k+1]. */
    void swap(int v[], int k)
    {
        int temp;
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }

C compiler

Assembly language program (for MIPS)

    swap:
        muli $2, $5, 4
        add  $2, $4, $2
        lw   $15, 0($2)
        lw   $16, 4($2)
        sw   $16, 0($2)
        sw   $15, 4($2)
        jr   $31

Assembler

Binary machine language program (for MIPS)

    00000000101000010000000000011000
    00000000100011100001100000100001
    10001100011000100000000000000000
    10001100111100100000000000000100
    10101100111100100000000000000000
    10101100011000100000000000000100
    00000011111000000000000000001000

Page 17:

Below the program

• High-level language program (in C)

    /* Exchange v[k] and v[k+1]. */
    void swap(int v[], int k)
    {
        int temp;
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }

C compiler

• Assembly language program (MIPS)

    swap:
        sll $2, $5, 2
        add $2, $4, $2
        lw  $15, 0($2)
        lw  $16, 4($2)
        sw  $16, 0($2)
        sw  $15, 4($2)
        jr  $31

assembler

• Machine (object) code (MIPS)

    000000 00000 00101 0001000010000000
    000000 00100 00010 0001000000100000
    . . .
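As a rough illustration of the assembler step on this slide, the sketch below packs the six fields of a MIPS R-type instruction (op, rs, rt, rd, shamt, funct) into one 32-bit word. The function name encode_rtype is a hypothetical helper chosen for this example, not part of any real assembler.

```c
#include <stdint.h>

/* Pack a MIPS R-type instruction into a 32-bit word.
   Field widths, from most to least significant bits:
   op (6), rs (5), rt (5), rd (5), shamt (5), funct (6).
   Each argument is masked to its field width before shifting. */
uint32_t encode_rtype(uint32_t op, uint32_t rs, uint32_t rt,
                      uint32_t rd, uint32_t shamt, uint32_t funct)
{
    return ((op    & 0x3Fu) << 26) |
           ((rs    & 0x1Fu) << 21) |
           ((rt    & 0x1Fu) << 16) |
           ((rd    & 0x1Fu) << 11) |
           ((shamt & 0x1Fu) << 6)  |
            (funct & 0x3Fu);
}
```

For example, sll $2, $5, 2 corresponds to encode_rtype(0, 0, 5, 2, 2, 0), which gives 0x00051080, and add $2, $4, $2 corresponds to encode_rtype(0, 4, 2, 2, 0, 0x20), which gives 0x00821020. These are the same bit patterns as the first two machine-code words on the slide.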

Page 18:

Computer history

Generation -1: The early days ????-1642

Generation 0: Mechanical 1642-1935

Generation 1: Electromechanical 1935-1945

Generation 2: Vacuum tubes 1945-1955

Generation 3: Discrete transistors 1955-1965

Generation 4: Integrated circuits 1965-1980

Generation 5: VLSI 1980-????

Page 19:

Generation 1: Electromechanical (1935-1945)

• Grace Murray Hopper found the first computer bug beaten to death in the jaws of a relay. She glued it into the logbook of the computer and thereafter when the machine stopped (frequently) she told Howard Aiken that they were "debugging" the computer.

Page 20:

Integrated Circuits

• Rapidly changing field:

– vacuum tube → transistor → IC → VLSI

– doubling every 1.5 years: memory capacity, processor speed

• (Due to advances in technology and organization)
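To get a feel for what "doubling every 1.5 years" implies, here is a minimal sketch that computes the cumulative growth factor 2^(elapsed time / doubling period). The function name and the choice to work in whole months and count only completed doublings are assumptions made to keep the arithmetic in integers.

```c
/* Growth factor after `months`, assuming the quantity doubles every
   `doubling_months` (18 for the slide's 1.5 years). Only completed
   doublings are counted. Returns 0 if doubling_months is 0 or if the
   result would not fit in 64 bits. */
unsigned long long growth_factor(unsigned months, unsigned doubling_months)
{
    if (doubling_months == 0)
        return 0;
    unsigned doublings = months / doubling_months;
    if (doublings >= 64)
        return 0;
    return 1ULL << doublings;
}
```

Under this model, 15 years (180 months) contains ten doublings, so growth_factor(180, 18) returns 1024: roughly a thousandfold increase.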

Page 21:

Intel 4004

• In 1971, Ted Hoff produced the Intel 4004 in response to the request from a Japanese company (Busicom) to create a chip for a calculator

• It is the first microprocessor, i.e. the first processor-on-a-chip

Page 22:

Altair

• Developers Edward Roberts, William Yates and Jim Bybee spent 1973-1974 developing the MITS Altair 8800, the first personal computer.

• Priced at $375, it contained 256 bytes of memory, but had no keyboard, no display, and no auxiliary storage device.

• Later, Bill Gates and Paul Allen wrote their first product for the Altair: a BASIC interpreter

Page 23:

Intel 4004

Page 24:

Intel Pentium II

Page 25:

Pentium III (2000)

Page 26:

• PIC: Programmable Interrupt Controller
• E/BBL: External/Back-side Bus Logic
• CLK: Clocking
• L2: Level 2 cache
• DTLB: Data Translation Look-aside Buffer
• DCU: Data Cache Unit
• BTB: Branch Target Buffer
• BAC: Branch Address Calculator
• TAP: Testability Access Port
• IFU: Instruction Fetch Unit
• PMH: Page Miss Handler

Page 27:

• PFU: Packed FPU (MMX)
• SIMD: Packed Floating point
• MOB: Memory Order Buffer
• IEU: Integer Execution Unit
• RAT: Register Alias Table
• FEU: FPU Execution Unit
• MIU: Memory Interface Unit
• RS: Reservation Station
• ID: Instruction Decode
• ROB: Re-Order Buffer
• MS: Micro-instruction Sequencer

Page 28:

Intel Pentium 4

Fall 2002

• 2.80 GHz

• 0.13-micron technology

• 478-pin package

• 512 KB L2 Cache

• 50 Amps

• Vcc = 1.5V

Page 29:

SONY PlayStation 2000

Page 30:

SONY PlayStation 3 Processor

• PPE: Power Processor Element
• SPE: Synergistic Processor Element

Page 33:

Chip Thermal Map

Page 34:

XBOX 360

Page 35:

CPU (Xbox 360)

Page 36:

Intel Core i7

Cores: 4

Threads: 8

Clock: 2.5GHz

Cache:

L1: 32KB (8-way)

L2: 256KB (8-way)

L3: 8MB (16-way)

32nm technology

731M transistors

Page 37:

Why Such Change in 12 years?

• Performance

– Technology Advances

• CMOS VLSI dominates older technologies (TTL, ECL) in cost AND performance

– Computer architecture advances improve the low end

• RISC, superscalar, RAID, …

• Price: Lower costs due to …

– Simpler development

• CMOS VLSI: smaller systems, fewer components

– Higher volumes

• CMOS VLSI: same dev. cost 10,000 vs. 10,000,000 units

– Lower margins by class of computer, due to fewer services

• Function

– Rise of networking/local interconnection technology
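The "same dev. cost 10,000 vs. 10,000,000 units" point is an amortization argument: a fixed development cost is divided across the production run. A minimal sketch, using a hypothetical function name and an assumed development cost (the slide gives no dollar figure):

```c
/* Share of a fixed development cost carried by each unit produced.
   Precondition: units > 0. */
double dev_cost_per_unit(double development_cost, double units)
{
    return development_cost / units;
}
```

With an assumed development cost of $10,000,000, dev_cost_per_unit(1e7, 1e4) is $1,000 per unit at 10,000 units, while dev_cost_per_unit(1e7, 1e7) is $1 per unit at 10,000,000 units.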

Page 38:

Performance

[Figure: performance chart; surviving labels: 1, DEC]

Page 39:

Processor Core Trend

Source: S. Fuller and L. Millett, “Computing Performance: Game Over or Next Level?” IEEE Computer, pp. 31-38, January 2011.

Page 40:

Processor Core Clock Frequency

Source: S. Fuller and L. Millett, “Computing Performance: Game Over or Next Level?” IEEE Computer, pp. 31-38, January 2011.

Page 41:

Metrics of Performance

System levels, from top to bottom:
• Application
• Programming Language
• Compiler
• ISA
• Datapath, Control
• Function Units
• Transistors, Wires, Pins

Metrics:
• Answers per month
• Operations per second
• (millions) of Instructions per second: MIPS
• (millions) of (FP) operations per second: MFLOP/s
• Megabytes per second
• Cycles per second (clock rate)
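Two of the rate metrics listed on this slide are simple ratios. The sketch below uses the usual definitions, MIPS = instruction count / (execution time × 10^6) and MFLOP/s = floating-point operations / (execution time × 10^6); the function names are hypothetical.

```c
/* Millions of instructions per second for one program run.
   Precondition: exec_time_seconds > 0. */
double mips_rating(double instruction_count, double exec_time_seconds)
{
    return instruction_count / (exec_time_seconds * 1e6);
}

/* Millions of floating-point operations per second for one program run.
   Precondition: exec_time_seconds > 0. */
double mflops_rating(double fp_operation_count, double exec_time_seconds)
{
    return fp_operation_count / (exec_time_seconds * 1e6);
}
```

For instance, a run that executes 2 × 10^9 instructions in 1 second rates 2000 MIPS. Because the rating depends on the instruction mix, it is only meaningful when comparing machines that run the same program on the same instruction set.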

Page 42:

CPU time

Factors: Instruction Count, Cycles Per Instruction, Clock Rate

Influenced by: Program, Compiler, Instruction set, Organization, Technology
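The three factors on this slide combine in the standard performance equation used in the course textbook: CPU time = Instruction Count × Cycles Per Instruction / Clock Rate. A minimal sketch, with a hypothetical function name:

```c
/* CPU time in seconds = instruction count * CPI / clock rate (Hz).
   Precondition: clock_rate_hz > 0. */
double cpu_time_seconds(double instruction_count, double cpi,
                        double clock_rate_hz)
{
    return instruction_count * cpi / clock_rate_hz;
}
```

As an illustration with made-up numbers, a program of 10^9 instructions with an average CPI of 2 on a 2 GHz clock takes cpu_time_seconds(1e9, 2.0, 2e9) = 1 second. Halving the CPI or doubling the clock rate would each halve that time, provided the other two factors stay the same.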

Page 43:

Computer Architecture

Computer Architecture = Instruction Set Architecture + Machine Organization + …

Page 44:

Computer Architecture’s Changing Definition

• 1950s to 1960s: Computer Architecture Course: Computer Arithmetic

• 1970s to mid 1980s: Computer Architecture Course: Instruction Set Design, especially ISA appropriate for compilers

• 1990s: Computer Architecture Course: Design of CPU, memory system, I/O system, Multiprocessors

• 2000s: Computer Architecture Course: Design of CPU (Performance & Power), Memory system, I/O, embedded systems, wireless.

Page 45:

Instruction Set Architecture (ISA)

instruction set

Page 46:

Interface Design (ISA)

A good interface:

• Lasts through many implementations (portability, compatibility)

• Is used in many different ways (generality)

• Provides convenient functionality to higher levels

• Permits an efficient implementation at lower levels

[Figure: one Interface with three uses (use, use, use) and three implementations (imp 1, imp 2, imp 3), shown over time]