Introduction to Hardware/Architecture

1

David A. Patterson

http://cs.berkeley.edu/~patterson/talks

[email protected]

University of California, Berkeley, CA 94720-1776

2

Technology Trends: Microprocessor Capacity

[Figure: transistors per chip (log scale, 1,000 to 100,000,000) versus year (1970 to 2000), with data points for the i4004, i8080, i8086, i80286, i80386, i80486, and Pentium]

Moore’s Law: 2X transistors/chip every 1.5 years

Alpha 21264: 15 million

Pentium Pro: 5.5 million

PowerPC 620: 6.9 million

Alpha 21164: 9.3 million

Sparc Ultra: 5.2 million

3

Technology Trends: Processor Performance

[Figure: processor performance (0 to 900) versus year (1987 to 1997), growing 1.54X/yr; data points: Sun-4/260, MIPS M/120, MIPS M2000, IBM RS/6000, HP 9000/750, DEC AXP/500, IBM POWER 100, DEC Alpha 4/266, DEC Alpha 5/300, DEC Alpha 5/500, DEC Alpha 21264/600]

Processor performance increase/yr is mistakenly referred to as Moore’s Law (which is about transistors/chip)

4

5 components of any Computer

Processor (active): Control (“brain”) and Datapath (“brawn”)

Memory (passive): where programs and data live when running

Devices: Input (Keyboard, Mouse); Output (Display, Printer); Disk, Network

5

Computer Technology => Dramatic Change

Processor: 2X in speed every 1.5 years; 1000X performance in last 15 years

Memory: DRAM capacity 2X every 1.5 years; 1000X size in last 15 years. Cost per bit improves about 25% per year

Disk: capacity > 2X in size every 1.5 years; 120X size in last decade. Cost per bit improves about 60% per year

State-of-the-art PC “when you graduate” (1997-2001):

Processor clock speed: 1500 MegaHertz (1.5 GigaHertz)

Memory capacity: 500 MegaBytes (0.5 GigaBytes)

Disk capacity: 100 GigaBytes (0.1 TeraBytes)

New units! Mega => Giga, Giga => Tera
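The doubling rates on this slide compound quickly. The following is a minimal sketch of that arithmetic, assuming a constant doubling period; the function name `growth_factor` and its parameters are illustrative and do not come from the slides.

```python
def growth_factor(years, doubling_period_years=1.5):
    """Total growth after `years` if a quantity doubles every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


fifteen_year_growth = growth_factor(15)  # ten doublings
ten_year_growth = growth_factor(10)      # between six and seven doublings
annual_rate = growth_factor(1)           # growth over a single year

print(f"15 years: {fifteen_year_growth:.0f}X")
print(f"10 years: {ten_year_growth:.0f}X")
print(f"per year: {annual_rate:.2f}X")
```

Ten doublings in 15 years gives 1024X, which is where the "1000X in last 15 years" figure comes from. Doubling exactly every 1.5 years gives roughly 100X per decade, so the disk figure of 120X is consistent with capacity growing somewhat faster than 2X per 1.5 years.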

6

Integrated Circuit Costs

Die cost = Wafer cost / (Dies per Wafer * Die yield)

Die cost goes roughly with the cube of the die area: a larger die means fewer dies per wafer, and yield gets worse as die area grows

[Figure: wafer divided into dies, with flaws marked]

7

Die Yield (1993 data)

Raw Dice Per Wafer

wafer diameter    die area (mm2)
                  100   144   196   256   324   400
6"/15cm           139    90    62    44    32    23
8"/20cm           265   177   124    90    68    52
10"/25cm          431   290   206   153   116    90
die yield         23%   19%   16%   12%   11%   10%

typical CMOS process: α=2, wafer yield=90%, defect density=2/cm2, 4 test sites/wafer

Good Dice Per Wafer (Before Testing!)

wafer diameter    die area (mm2)
                  100   144   196   256   324   400
6"/15cm            31    16     9     5     3     2
8"/20cm            59    32    19    11     7     5
10"/25cm           96    53    32    20    13     9

typical cost of an 8", 4 metal layers, 0.5um CMOS wafer: ~$2000
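The slide does not say how the raw dice counts were obtained. As a sketch only, a commonly used textbook approximation (wafer area divided by die area, minus a correction for partial dies along the circumference, minus the test sites) gives numbers in line with the table. The formula, the function name `raw_dies_per_wafer`, and the rounding down are assumptions here, not something the slide states.

```python
import math


def raw_dies_per_wafer(wafer_diameter_cm, die_area_mm2, test_sites=4):
    """Approximate count of whole dies on a round wafer (assumed formula)."""
    die_area_cm2 = die_area_mm2 / 100.0
    wafer_area_cm2 = math.pi * (wafer_diameter_cm / 2) ** 2
    # Correction term for partial dies lost along the wafer's edge.
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return math.floor(wafer_area_cm2 / die_area_cm2 - edge_loss - test_sites)


DIE_AREAS_MM2 = (100, 144, 196, 256, 324, 400)
for diameter_cm in (15, 20, 25):
    row = [raw_dies_per_wafer(diameter_cm, area) for area in DIE_AREAS_MM2]
    print(f"{diameter_cm} cm wafer: {row}")
```

The die yield row is not recomputed here, since the slide gives the process parameters but not the yield model.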

8

1993 Real World Examples

Chip         Metal   Line   Wafer  Defects  Area   Dies/  Yield  Die
             layers  width  cost   /cm2     (mm2)  wafer         cost
386DX        2       0.90   $900   1.0      43     360    71%    $4
486DX2       3       0.80   $1200  1.0      81     181    54%    $12
PowerPC 601  4       0.80   $1700  1.3      121    115    28%    $53
HP PA 7100   3       0.80   $1300  1.0      196    66     27%    $73
DEC Alpha    3       0.70   $1500  1.2      234    53     19%    $149
SuperSPARC   3       0.70   $1700  1.6      256    48     13%    $272
Pentium      3       0.80   $1500  1.5      296    40     9%     $417

From “Estimating IC Manufacturing Costs,” by Linley Gwennap, Microprocessor Report, August 2, 1993, p. 15
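The die cost column can be checked against the relation from the Integrated Circuit Costs slide: die cost = wafer cost / (dies per wafer * die yield). The sketch below applies it to the rows of the table; the function and variable names are illustrative, and the data values are copied from the table.

```python
# (chip, wafer cost in $, dies per wafer, die yield, die cost in $ as listed)
CHIPS = [
    ("386DX", 900, 360, 0.71, 4),
    ("486DX2", 1200, 181, 0.54, 12),
    ("PowerPC 601", 1700, 115, 0.28, 53),
    ("HP PA 7100", 1300, 66, 0.27, 73),
    ("DEC Alpha", 1500, 53, 0.19, 149),
    ("SuperSPARC", 1700, 48, 0.13, 272),
    ("Pentium", 1500, 40, 0.09, 417),
]


def die_cost(wafer_cost, dies_per_wafer, die_yield):
    """Cost of one good die: wafer cost spread over the dies that work."""
    return wafer_cost / (dies_per_wafer * die_yield)


for name, wafer_cost, dies, die_yield, listed in CHIPS:
    computed = die_cost(wafer_cost, dies, die_yield)
    print(f"{name:12s} computed ${computed:7.2f}   listed ${listed}")
```

Rounded to the nearest dollar, the computed values agree with the listed die costs. The Pentium die has about seven times the area of the 386DX die but costs roughly a hundred times as much, because both dies per wafer and yield fall as area grows.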

9

Processor Trends / History

History of innovations to 2X / 1.5 yr:

Pipelining (helps seconds / clock, or clock rate)

Out-of-Order Execution (helps clocks / instruction)

Superscalar (helps clocks / instruction)
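The two factors named above, seconds / clock and clocks / instruction, multiply with the instruction count to give execution time. The sketch below uses that standard relation; the instruction count, the clock periods, and the clocks-per-instruction values are hypothetical numbers chosen only for illustration.

```python
def cpu_time_seconds(instruction_count, clocks_per_instruction, seconds_per_clock):
    """Execution time = instructions * clocks/instruction * seconds/clock."""
    return instruction_count * clocks_per_instruction * seconds_per_clock


# Hypothetical program and machine, for illustration only.
baseline = cpu_time_seconds(1_000_000, 2.0, 10e-9)
# A deeper pipeline shortens the clock period (seconds / clock).
faster_clock = cpu_time_seconds(1_000_000, 2.0, 5e-9)
# Out-of-order or superscalar execution lowers clocks / instruction.
lower_cpi = cpu_time_seconds(1_000_000, 1.0, 10e-9)

print(f"baseline:     {baseline * 1e3:.1f} ms")
print(f"faster clock: {faster_clock * 1e3:.1f} ms")
print(f"lower CPI:    {lower_cpi * 1e3:.1f} ms")
```

Halving either factor halves the execution time, which is why improvements to clock rate and to clocks / instruction can be combined.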

10

Pipelining is Natural!

° Laundry Example

° Ann, Brian, Cathy, Dave (A, B, C, D) each have one load of clothes to wash, dry, fold, and put away

° Washer takes 30 minutes

° Dryer takes 30 minutes

° “Folder” takes 30 minutes

° “Stasher” takes 30 minutes to put clothes into drawers

11

Sequential Laundry

Sequential laundry takes 8 hours for 4 loads

[Figure: timeline from 6 PM to 2 AM in 30-minute steps; loads A, B, C, D in task order, each starting only after the previous load has finished]

12

Pipelined Laundry: Start work ASAP

Pipelined laundry takes 3.5 hours for 4 loads!

[Figure: timeline from 6 PM to 2 AM in 30-minute steps; loads A, B, C, D in task order, each starting 30 minutes after the previous one]
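The 8-hour and 3.5-hour totals follow from counting 30-minute slots. A minimal sketch, assuming four equal stages (washer, dryer, folder, stasher) and no stalls; the function names are illustrative.

```python
STAGE_MINUTES = 30
STAGES = 4  # washer, dryer, "folder", "stasher"


def sequential_minutes(loads, stages=STAGES, stage_minutes=STAGE_MINUTES):
    """Each load runs through every stage before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads, stages=STAGES, stage_minutes=STAGE_MINUTES):
    """A new load enters the washer as soon as the previous one leaves it."""
    if loads == 0:
        return 0
    # The first load needs `stages` slots; every later load finishes one slot after.
    return (stages + loads - 1) * stage_minutes


print(sequential_minutes(4) / 60, "hours sequential")
print(pipelined_minutes(4) / 60, "hours pipelined")
```

Pipelining does not shorten any single load, which still takes 2 hours; it raises throughput by keeping all four machines busy at once.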

13

Pipeline Hazard: Stall

A depends on D; stall since folder tied up

[Figure: pipelined laundry timeline from 6 PM to 2 AM in 30-minute steps; loads A through F in task order, with a bubble where the pipeline stalls]

14

Out-of-Order Laundry: Don’t Wait

A depends on D; rest continue; need more resources to allow out-of-order

[Figure: pipelined laundry timeline from 6 PM to 2 AM in 30-minute steps; loads A through F in task order, with a bubble for the dependent load while the other loads continue]

15

Superscalar Laundry: Parallel per stage

More resources; does the HW match the mix of parallel tasks?

[Figure: timeline from 6 PM to 2 AM in 30-minute steps; loads A through F in task order, run in two groups labeled (light clothing), (dark clothing), (very dirty clothing)]

16

Superscalar Laundry: Mismatch Mix

Task mix underutilizes extra resources

[Figure: timeline from 6 PM to 2 AM in 30-minute steps; loads A, B, C, D in task order, three labeled (light clothing) and one labeled (dark clothing)]

17

State of the Art: Alpha 21264

15M transistors

2 64KB caches on chip; 16MB L2 cache off chip

Clock <1.7 nsec, or >600 MHz

90 watts

Superscalar: fetches up to 6 instructions/clock cycle, retires up to 4 instructions/clock cycle

Execution out-of-order

18

Other example: Sony Playstation 2

Emotion Engine: 6.2 GFLOPS, 75 million polygons per second (Microprocessor Report, 13:5)

Superscalar MIPS core + vector coprocessor + graphics/DRAM

Claim: “Toy Story” realism brought to games

19

The Goal: Illusion of large, fast, cheap memory

Fact: Large memories are slow, fast memories are small

How do we create a memory that is large, cheap and fast (most of the time)?

Hierarchy of Levels

Similar to Principle of Abstraction: hide details of multiple levels

20

Hierarchy Analogy: Term Paper

Working on paper in library at a desk

Option 1: Every time you need a book

Leave desk to go to shelves (or stacks)

Find the book

Bring one book back to desk

Read the section you are interested in

When done with section, leave desk and go to shelves carrying book

Put the book back on shelf

Return to desk to work

Next time you need a book, go to first step

21

Hierarchy Analogy: Library

Option 2: Every time you need a book

Leave some books on desk after fetching them

Only go to shelves when you need a new book

When you go to shelves, bring back related books in case you need them; sometimes you’ll need to return books not used recently to make space for new books on desk

Return to desk to work

When done, replace books on shelves, carrying as many as you can per trip

Illusion: whole library on your desktop

Buzzword “cache” from French for hidden treasure
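Option 2 can be sketched as a small desk that keeps recently used books and sends back the one not used for the longest time. This is an illustration under stated assumptions: the `Desk` class, its capacity of three, and the strict least-recently-used rule are hypothetical choices, and the "bring back related books" step is not modeled.

```python
from collections import OrderedDict


class Desk:
    """A desk that holds a few books; the least recently used one goes back first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.books = OrderedDict()  # least recently used first, most recent last
        self.trips_to_shelves = 0

    def read(self, title):
        if title in self.books:
            # Book is already on the desk: no trip needed.
            self.books.move_to_end(title)
            return
        # Book is not on the desk: walk to the shelves.
        self.trips_to_shelves += 1
        if len(self.books) >= self.capacity:
            # Make space by returning the book not used for the longest time.
            self.books.popitem(last=False)
        self.books[title] = True


desk = Desk(capacity=3)
for title in ["A", "B", "A", "C", "A", "B", "D", "A"]:
    desk.read(title)

print(desk.trips_to_shelves, "trips for 8 reads; on desk:", list(desk.books))
```

With Option 1 the same eight reads would cost eight trips; keeping three books on the desk cuts that to four.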

22

Why Hierarchy works: Natural Locality

The Principle of Locality: Programs access a relatively small portion of the address space at any instant of time.

[Figure: probability of reference versus address, over the address space from 0 to 2^n - 1]

What programming constructs lead to Principle of Locality?

23

Memory Hierarchy: How Does it Work?

Temporal Locality (Locality in Time): Keep most recently accessed data items closer to the processor

Library Analogy: Recently read books are kept on desk

Block is unit of transfer (like book)

Spatial Locality (Locality in Space): Move blocks consisting of contiguous words to the upper levels

Library Analogy: Bring back nearby books on shelves when you fetch a book; hope that you might need them later for your paper
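The effect of both kinds of locality can be seen by counting how often an access finds its block already in the upper level. The sketch below is a simplification under stated assumptions: the upper level never runs out of space, a block is 8 words, and the three access patterns are made up for illustration.

```python
def hit_rate(addresses, block_words=8):
    """Fraction of accesses whose block was already fetched (no evictions modeled)."""
    fetched_blocks = set()
    hits = 0
    for address in addresses:
        block = address // block_words
        if block in fetched_blocks:
            hits += 1
        else:
            fetched_blocks.add(block)  # first touch: fetch the whole block
    return hits / len(addresses)


sequential = list(range(64))            # walking an array word by word
strided = [i * 8 for i in range(64)]    # touching one word in each block
repeated = [0, 1, 2, 3] * 16            # a loop reusing the same few words

print("sequential:", hit_rate(sequential))
print("strided:   ", hit_rate(strided))
print("repeated:  ", hit_rate(repeated))
```

The sequential walk benefits from spatial locality (one fetch serves eight accesses), the repeated loop benefits from temporal locality, and the strided pattern gets no benefit from either.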

24

Memory Hierarchy Pyramid

[Figure: pyramid of levels in the memory hierarchy, with the Central Processor Unit (CPU) at the top, then Level 1, Level 2, Level 3, . . . , Level n, from “Upper” to “Lower”; the width shows the size of memory at each level; going down, distance from the CPU increases and cost / MB decreases]

(data cannot be in level i unless also in i+1)

25

Big Idea of Memory Hierarchy

Temporal locality: keep recently accessed data items closer to processor

Spatial locality: move contiguous words in memory to upper levels of hierarchy

Uses smaller and faster memory technologies close to the processor

Fast hit time in highest level of hierarchy

Cheap, slow memory furthest from processor

If hit rate is high enough, hierarchy has access time close to the highest (and fastest) level and size equal to the lowest (and largest) level
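How high is "high enough" can be estimated with the usual two-level model: average access time = hit time + miss rate * miss penalty. The model and the numbers below (a 1 ns upper level and a 100 ns penalty for going to the lower level) are assumptions for illustration, not figures from the slides.

```python
def average_access_time(hit_time, miss_penalty, hit_rate):
    """Average time per access for a two-level hierarchy."""
    return hit_time + (1.0 - hit_rate) * miss_penalty


# Hypothetical timings in nanoseconds.
HIT_TIME_NS = 1.0
MISS_PENALTY_NS = 100.0

for rate in (0.90, 0.99, 0.999):
    average = average_access_time(HIT_TIME_NS, MISS_PENALTY_NS, rate)
    print(f"hit rate {rate:.3f}: {average:.2f} ns average")
```

Under these assumed timings, a 90% hit rate still leaves the average about eleven times slower than the fast level, while 99.9% brings it within about 10% of the fast level's speed.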

26

Disk Description / History

1973: 1.7 Mbit/sq. in, 140 MBytes

1979: 7.7 Mbit/sq. in, 2,300 MBytes

source: New York Times, 2/23/98, page C3, “Makers of disk drives crowd even more data into even smaller spaces”

[Figure: disk drive with labeled parts: Sector, Track, Cylinder, Head, Platter, Arm, Embed. Proc. (ECC, SCSI), Track Buffer]

27

Disk History

[Figure: areal density (log scale, 1 to 10000) versus year (1970 to 2000)]

1989: 63 Mbit/sq. in, 60,000 MBytes

1997: 1450 Mbit/sq. in, 2300 MBytes (2.5” diameter)

1997: 3090 Mbit/sq. in, 8100 MBytes (3.5” diameter)

2000: 10,100 Mbit/sq. in, 25,000 MBytes

2000: 11,000 Mbit/sq. in, 73,400 MBytes

source: N.Y. Times, 2/23/98, page C3
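The areal density figures on the two disk history slides imply a compound annual growth rate. The sketch below computes it between consecutive data points and overall. Where a year has two entries, the larger density is used, which is a choice made here for illustration; the function name is also illustrative.

```python
def annual_growth(start_value, end_value, years):
    """Compound growth factor per year between two measurements."""
    return (end_value / start_value) ** (1.0 / years)


# Areal density in Mbit/sq. in, taken from the slides (larger value where two are listed).
DENSITY = {1973: 1.7, 1979: 7.7, 1989: 63, 1997: 3090, 2000: 11000}

years = sorted(DENSITY)
for first, last in zip(years, years[1:]):
    rate = annual_growth(DENSITY[first], DENSITY[last], last - first)
    print(f"{first}-{last}: {rate:.2f}X per year")

overall = annual_growth(DENSITY[1973], DENSITY[2000], 2000 - 1973)
print(f"1973-2000: {overall:.2f}X per year")
```

Over the whole period the density grows by a bit under 40% per year, with noticeably faster growth in the 1990s than in the 1970s and 1980s.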

28

State of the Art: Ultrastar 72ZX

73.4 GB, 3.5 inch disk

2¢/MB

16 MB track buffer

11 platters, 22 surfaces

15,110 cylinders

7 Gbit/sq. in. areal density

17 watts (idle)

0.1 ms controller time

5.3 ms avg. seek (seek 1 track => 0.6 ms)

3 ms = 1/2 rotation

37 to 22 MB/s to media

source: www.ibm.com; www.pricewatch.com; 2/14/00

Latency = Queuing Time + Controller time + Seek Time + Rotation Time + Size / Bandwidth

(Queuing, controller, seek, and rotation times are per access; Size / Bandwidth is per byte)

[Figure: disk drive with labeled parts: Sector, Track, Cylinder, Head, Platter, Arm, Embed. Proc., Track Buffer]
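Plugging the Ultrastar 72ZX figures into the latency formula shows how the per-access terms dominate small transfers. This is a sketch under stated assumptions: queuing time is taken as zero, 1 MB is taken as 10^6 bytes, and the request sizes (4 KB and 1 MB) are hypothetical.

```python
def disk_latency_ms(size_bytes, bandwidth_mb_per_s, queuing_ms=0.0,
                    controller_ms=0.1, seek_ms=5.3, rotation_ms=3.0):
    """Latency = queuing + controller + seek + rotation + size / bandwidth."""
    transfer_ms = size_bytes / (bandwidth_mb_per_s * 1e6) * 1000.0
    return queuing_ms + controller_ms + seek_ms + rotation_ms + transfer_ms


# Hypothetical requests: a small read at the slowest media rate,
# and a large read at the fastest media rate.
small_read_ms = disk_latency_ms(4096, 22)
large_read_ms = disk_latency_ms(1_000_000, 37)

print(f"4 KB read: {small_read_ms:.2f} ms")
print(f"1 MB read: {large_read_ms:.2f} ms")
```

For the 4 KB read, almost all of the time is the fixed 8.4 ms of controller, seek, and rotation time; the transfer itself takes under 0.2 ms. Only for large transfers does Size / Bandwidth become the main term.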

29

A glimpse into the future?

IBM microdrive for digital cameras: 340 MBytes

Disk target in 5-7 years?

30

Questions?

Contact us if you’re interested: email: [email protected]

http://iram.cs.berkeley.edu/
