INEL6067 Technology ---> Limitations & Opportunities Wires -Area -Propagation speed Clock Power VLSI...


Page 1:

INEL6067

Technology ---> Limitations & Opportunities

• Wires

- Area

- Propagation speed

• Clock

• Power

• VLSI

- I/O pin limitations

- Chip area

- Chip crossing delay

- Power

• Cannot make light go any faster

• KISS rule (Keep It Simple, Stupid)

Page 2:

Major theme

• Look at typical applications

• Understand physical limitations

• Make tradeoffs

ARCHITECTURE

Application requirements

Technological constraints

Page 3:

Unfortunately

° Requirements and constraints are often at odds with each other!

° Architecture ---> making tradeoffs

Full connectivity!

Gasp!!!

Page 4:

Putting it all together

° The systems approach

• Lesson from RISCs

• Hardware software tradeoffs

• Functionality implemented at the right level

- Hardware

- Runtime system

- Compiler

- Language, Programmer

- Algorithm

Page 5:

Commercial Computing

° Relies on parallelism for high end

• Computational power determines scale of business that can be handled

° Databases, online-transaction processing, decision support, data mining, data warehousing ...

Page 6:

Scientific Computing Demand

Page 7:

Applications: Speech and Image Processing

[Figure: processing requirement (1 MIPS to 10 GIPS) versus year (1980 to 1995). Applications plotted: sub-band speech coding; 200-word isolated speech recognition; speaker verification; CELP speech coding; ISDN-CD stereo receiver; 5,000-word continuous speech recognition; HDTV receiver; CIF video; 1,000-word continuous speech recognition; telephone number recognition]

• Also CAD, Databases, ...

• 100 processors gets you 10 years, 1000 gets you 20!

Page 8:

Is better parallel arch enough?

° AMBER molecular dynamics simulation program

° Starting point was vector code for Cray-1

° 145 MFLOP on Cray90, 406 for final version on 128-processor Paragon, 891 on 128-processor Cray T3D
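
As a rough check on the figures above, the ratios between the quoted rates can be computed directly. This is only a sketch: the dictionary labels and the helper `relative_rate` are names chosen here for illustration, and the numbers are the MFLOP rates quoted on the slide.

```python
# MFLOP rates quoted on the slide for the AMBER molecular dynamics code.
rates_mflops = {
    "Cray90 (vector)": 145.0,
    "Paragon, 128 processors": 406.0,
    "Cray T3D, 128 processors": 891.0,
}


def relative_rate(rates, baseline):
    """Return each machine's rate divided by the baseline machine's rate."""
    base = rates[baseline]
    return {name: rate / base for name, rate in rates.items()}


ratios = relative_rate(rates_mflops, "Cray90 (vector)")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the Cray90 rate")
```

On these numbers the 128-processor runs reach roughly 2.8 and 6.1 times the Cray90 rate, which is why the slide asks whether a better parallel architecture is enough on its own.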

Page 9:

Summary of Application Trends

° Transition to parallel computing has occurred for scientific and engineering computing

° In rapid progress in commercial computing

• Database and transactions as well as financial

• Usually smaller-scale, but large-scale systems also used

° Desktop also uses multithreaded programs, which are a lot like parallel programs

° Demand for improving throughput on sequential workloads

• Greatest use of small-scale multiprocessors

° Solid application demand exists and will increase

Page 10:

Technology Trends

[Figure: performance (0.1 to 100, log scale) versus year (1965 to 1995) for supercomputers, mainframes, minicomputers, and microprocessors]

° Today the natural building-block is also fastest!

Page 11:

Technology: A Closer Look

[Figure: processor (Proc) with cache ($) attached to an interconnect]

° Basic advance is decreasing feature size (λ)

• Circuits become either faster or lower in power

° Die size is growing too

• Clock rate improves roughly proportional to improvement in λ

• Number of transistors improves like λ² (or faster)

° Performance > 100x per decade

• clock rate < 10x, rest is transistor count

° How to use more transistors?

• Parallelism in processing

- multiple operations per cycle reduces CPI

• Locality in data access

- avoids latency and reduces CPI

- also improves processor utilization

• Both need resources, so tradeoff

° Fundamental issue is resource distribution, as in uniprocessors
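
The "100x per decade, clock rate under 10x" split above can be turned into annual rates with a few lines of arithmetic. This is a sketch only: it treats the slide's bounds as exact values, and `annual_rate` is a helper name introduced here.

```python
def annual_rate(factor, years):
    """Compound annual growth rate that yields `factor` over `years`."""
    return factor ** (1.0 / years) - 1.0


performance_per_decade = 100.0  # slide: "Performance > 100x per decade"
clock_per_decade = 10.0         # slide: "clock rate < 10x"

# Whatever the clock does not explain has to come from using more transistors.
transistor_contribution = performance_per_decade / clock_per_decade

print(f"performance: {annual_rate(performance_per_decade, 10):.1%} per year")
print(f"clock rate:  {annual_rate(clock_per_decade, 10):.1%} per year")
print(f"left to transistor count: {transistor_contribution:.0f}x per decade")
```

Under these assumptions at least a factor of 10 per decade has to come from how the extra transistors are used, which is the tradeoff between parallelism and locality listed above.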

Page 12:

Growth Rates

[Figure, two panels, years 1970 to 2005. Left: clock rate (MHz, 0.1 to 1,000, log scale), growing about 30% per year. Right: transistors per chip (1,000 to 100,000,000, log scale), growing about 40% per year. Processors plotted: i4004, i8008, i8080, i8086, i80286, i80386, R2000, R3000, Pentium, R10000]

Page 13:

Architectural Trends

° Architecture translates technology’s gifts into performance and capability

° Resolves the tradeoff between parallelism and locality

• Current microprocessor: 1/3 compute, 1/3 cache, 1/3 off-chip connect

• Tradeoffs may change with scale and technology advances

° Understanding microprocessor architectural trends

=> Helps build intuition about design issues of parallel machines

=> Shows fundamental role of parallelism even in “sequential” computers

Page 14:

Phases in “VLSI” Generation

[Figure: transistors per chip (1,000 to 100,000,000, log scale) versus year (1970 to 2005), with three phases marked: bit-level parallelism, instruction-level, thread-level (?). Processors plotted: i4004, i8008, i8080, i8086, i80286, i80386, R2000, R3000, Pentium, R10000]

Page 15:

Architectural Trends

° Greatest trend in VLSI generation is increase in parallelism

• Up to 1985: bit-level parallelism: 4-bit -> 8-bit -> 16-bit

- slows after 32-bit

- adoption of 64-bit now under way, 128-bit far (not performance issue)

- great inflection point when 32-bit micro and cache fit on a chip

• Mid 80s to mid 90s: instruction level parallelism

- pipelining and simple instruction sets, + compiler advances (RISC)

- on-chip caches and functional units => superscalar execution

- greater sophistication: out of order execution, speculation, prediction

– to deal with control transfer and latency problems

• Next step: thread-level parallelism

Page 16:

How far will ILP go?

[Figure, two panels. Left: fraction of total cycles (%) versus number of instructions issued (0 to 6+). Right: speedup versus instructions issued per cycle (0 to 15)]

° Infinite resources and fetch bandwidth, perfect branch prediction and renaming

– real caches and non-zero miss latencies

Page 17:

Thread-Level Parallelism “on board”

° Micro on a chip makes it natural to connect many to shared memory

– dominates server and enterprise market, moving down to desktop

° Faster processors began to saturate bus, then bus technology advanced

– today, range of sizes for bus-based systems, desktop to large servers

[Figure: four processors (Proc) connected to one shared memory (MEM)]

Page 18:

What about Multiprocessor Trends?

[Figure: number of processors (0 to 70) versus year (1984 to 1998). Machines plotted: CRAY CS6400, SGI Challenge, Sequent B2100, Sequent B8000, Symmetry81, Symmetry21, Power, SS690MP 140, SS690MP 120, AS8400, HP K400, AS2100, SS20, SE30, SS1000E, SS10, SE10, SS1000, P-Pro, SGI PowerSeries, SE60, SE70, Sun E6000, SC2000E, Sun SC2000, SGI PowerChallenge/XL, Sun E10000]

Page 19:

What about Storage Trends?

° Divergence between memory capacity and speed even more pronounced

• Capacity increased by 1000x from 1980-95, speed only 2x

• Gigabit DRAM by c. 2000, but gap with processor speed much greater

° Larger memories are slower, while processors get faster

• Need to transfer more data in parallel

• Need deeper cache hierarchies

• How to organize caches?

° Parallelism increases effective size of each level of hierarchy, without increasing access time

° Parallelism and locality within memory systems too

• New designs fetch many bits within memory chip; follow with fast pipelined transfer across narrower interface

• Buffer caches most recently accessed data

° Disks too: Parallel disks plus caching
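
One way to see why deeper cache hierarchies help is the standard average-access-time calculation, sketched below. The latencies and miss rates are hypothetical values picked only for illustration; they do not come from the slides, and `average_access_time` is a helper name introduced here.

```python
def average_access_time(levels, memory_latency):
    """Average memory access time for a cache hierarchy.

    levels: list of (hit_time, miss_rate) pairs, level closest to the
            processor first.
    memory_latency: time to reach main memory once the last level misses.
    """
    time = memory_latency
    # Work from the level nearest memory back toward the processor.
    for hit_time, miss_rate in reversed(levels):
        time = hit_time + miss_rate * time
    return time


# Hypothetical numbers, in processor cycles.
one_level = average_access_time([(1, 0.05)], memory_latency=100)
two_level = average_access_time([(1, 0.05), (10, 0.20)], memory_latency=100)
print(one_level, two_level)
```

With these assumed numbers the second cache level cuts the average access time to less than half, even though main memory itself is no faster.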

Page 20:

Economics

° Commodity microprocessors not only fast but CHEAP

• Development costs tens of millions of dollars

• BUT, many more are sold compared to supercomputers

• Crucial to take advantage of the investment, and use the commodity building block

° Multiprocessors being pushed by software vendors (e.g. database) as well as hardware vendors

° Standardization makes small, bus-based SMPs commodity

° Desktop: few smaller processors versus one larger one?

° Multiprocessor on a chip?

Page 21:

Consider Scientific Supercomputing

° Proving ground and driver for innovative architecture and techniques

• Market smaller relative to commercial as MPs become mainstream

• Dominated by vector machines starting in 70s

• Microprocessors have made huge gains in floating-point performance

- high clock rates

- pipelined floating point units (e.g., multiply-add every cycle)

- instruction-level parallelism

- effective use of caches (e.g., automatic blocking)

• Plus economics

° Large-scale multiprocessors replace vector supercomputers

Page 22:

Raw Parallel Performance: LINPACK

[Figure: LINPACK (GFLOPS, 0.1 to 10,000, log scale) versus year (1985 to 1996), with two series: CRAY peak and MPP peak. Machines plotted: Xmp/416(4), Ymp/832(8), nCUBE/2(1024), iPSC/860, CM-2, CM-200, Delta, Paragon XP/S, C90(16), CM-5, ASCI Red, T932(32), T3D, Paragon XP/S MP(1024), Paragon XP/S MP(6768)]

° Even vector Crays became parallel

• X-MP (2-4), Y-MP (8), C-90 (16), T932 (32)

° Since 1993, Cray produces MPPs too (T3D, T3E)

Page 23:

Where is Parallel Arch Going?

[Figure: Application Software and System Software layered above Architecture, which is split into SIMD, Message Passing, Shared Memory, Dataflow, and Systolic Arrays]

• Uncertainty of direction paralyzed parallel software development!

Old view: Divergent architectures, no predictable pattern of growth.

Page 24:

Modern Layered Framework

[Figure: layers, top to bottom]

Parallel applications: CAD, Database, Scientific modeling

Programming models: Multiprogramming, Shared address, Message passing, Data parallel

Communication abstraction (User/system boundary)

Compilation or library; Operating systems support

(Hardware/software boundary)

Communication hardware

Physical communication medium

Page 25:

Summary: Why Parallel Architecture?

° Increasingly attractive

• Economics, technology, architecture, application demand

° Increasingly central and mainstream

° Parallelism exploited at many levels

• Instruction-level parallelism

• Multiprocessor servers

• Large-scale multiprocessors (“MPPs”)

° Focus of this class: multiprocessor level of parallelism

° Same story from memory system perspective

• Increase bandwidth, reduce average latency with many local memories

° Spectrum of parallel architectures makes sense

• Different cost, performance and scalability

Page 33:

History

[Figure: Application Software and System Software layered above Architecture, which is split into SIMD, Message Passing, Shared Memory, Dataflow, and Systolic Arrays]

° Parallel architectures tied closely to programming models

• Divergent architectures, with no predictable pattern of growth.

• Mid 80s revival

Page 34:

Programming Model

° Look at major programming models

• Where did they come from?

• What do they provide?

• How have they converged?

° Extract general structure and fundamental issues

° Reexamine traditional camps from new perspective

[Figure: SIMD, Message Passing, Shared Memory, Dataflow, and Systolic Arrays converging on a Generic Architecture]

Page 35:

Programming Model

° Conceptualization of the machine that programmer uses in coding applications

• How parts cooperate and coordinate their activities

• Specifies communication and synchronization operations

° Multiprogramming

• no communication or synch. at program level

° Shared address space

• like bulletin board

° Message passing

• like letters or phone calls, explicit point to point

° Data parallel

• more regimented, global actions on data

• Implemented with shared address space or message passing
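
The bulletin-board and letter analogies above can be contrasted in a small sketch. Python threads stand in for processors here, and `queue.Queue` stands in for a point-to-point channel; both are assumptions made for illustration, as are all the function and variable names.

```python
import queue
import threading

# Shared address space: both threads name the same object ("bulletin board").
board = {}
posted = threading.Event()


def writer_shared():
    board["x"] = 42   # an ordinary store communicates the value
    posted.set()      # synchronization is a separate operation


def reader_shared(out):
    posted.wait()
    out.append(board["x"])  # an ordinary load picks the value up


# Message passing: the value travels in an explicit send and receive.
mailbox = queue.Queue()


def sender():
    mailbox.put(42)  # explicit send


def receiver(out):
    out.append(mailbox.get())  # explicit receive, blocks until a message arrives


def run(*targets):
    """Start one thread per (function, args) pair and wait for all of them."""
    threads = [threading.Thread(target=fn, args=args) for fn, args in targets]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


shared_result, message_result = [], []
run((writer_shared, ()), (reader_shared, (shared_result,)))
run((sender, ()), (receiver, (message_result,)))
print(shared_result, message_result)
```

In the shared version communication and synchronization are separate operations; in the message version the receive does both at once.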

Page 36:

Adding Processing Capacity

° Memory capacity increased by adding modules

° I/O by controllers and devices

° Add processors for processing!

• For higher-throughput multiprogramming, or parallel programs

[Figure: memory modules (Mem) and I/O controllers (I/O ctrl) attached to an interconnect, together with processors and I/O devices]

Page 37:

Historical Development

[Figure: two organizations built from processors (P), caches ($), memories (M), and I/O controllers (C, I/O): one joined by a crossbar, one joined by a shared bus]

° “Mainframe” approach

• Motivated by multiprogramming

• Extends crossbar used for Mem and I/O

• Processor cost-limited => crossbar

• Bandwidth scales with p

• High incremental cost

- use multistage instead

° “Minicomputer” approach

• Almost all microprocessor systems have bus

• Motivated by multiprogramming, TP

• Used heavily for parallel computing

• Called symmetric multiprocessor (SMP)

• Latency larger than for uniprocessor

• Bus is bandwidth bottleneck

- caching is key: coherence problem

• Low incremental cost
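
The cost and bandwidth contrast between the two approaches can be sketched numerically. The model is deliberately simple and assumes a full crossbar with one switch point per processor-memory pair and a single bus whose fixed bandwidth is divided evenly; the bandwidth figure and function names are hypothetical.

```python
def crossbar_switch_points(processors, memories):
    """A full crossbar needs one switch point per processor-memory pair."""
    return processors * memories


def bus_bandwidth_per_processor(bus_bandwidth, processors):
    """A single shared bus divides its fixed bandwidth among all processors."""
    return bus_bandwidth / processors


BUS_BANDWIDTH = 1000.0  # hypothetical units, for illustration only

for p in (2, 4, 8, 16):
    print(p, crossbar_switch_points(p, p),
          bus_bandwidth_per_processor(BUS_BANDWIDTH, p))
```

Under this model crossbar hardware grows with the square of the machine size, while the bus costs little to extend but leaves each processor a shrinking share of bandwidth, which is why caching becomes key.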

Page 38:

Shared Physical Memory

° Any processor can directly reference any memory location

° Any I/O controller - any memory

° Operating system can run on any processor, or all

• OS uses shared memory to coordinate

° Communication occurs implicitly as result of loads and stores

° What about application processes?

Page 39:

Shared Virtual Address Space

° Process = address space plus thread of control

° Virtual-to-physical mapping can be established so that processes share portions of address space

• User-kernel or multiple processes

° Multiple threads of control on one address space

• Popular approach to structuring OS’s

• Now standard application capability

° Writes to shared address visible to other threads

• Natural extension of uniprocessor model

• conventional memory operations for communication

• special atomic operations for synchronization

- also load/stores
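
A minimal sketch of why synchronization needs atomic operations on top of ordinary loads and stores. Python threads share one address space, and `threading.Lock` is used here as a stand-in for a hardware atomic operation; the counter and function names are made up for the example.

```python
import threading

counter = 0                      # lives in the address space shared by all threads
counter_lock = threading.Lock()  # stands in for a hardware atomic operation


def add_many(n):
    global counter
    for _ in range(n):
        # The read-modify-write must be atomic, or updates can be lost.
        with counter_lock:
            counter += 1


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)
```

The increments themselves are conventional memory operations; only the lock around them is special.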

Page 40:

Structured Shared Address Space

° Ad hoc parallelism used in system code

° Most parallel applications have structured SAS

° Same program on each processor

• shared variable X means the same thing to each thread

[Figure: virtual address spaces for a collection of processes (P0, P1, P2, ..., Pn) communicating via shared addresses. The shared portion of each address space maps to common physical addresses in the machine physical address space and is accessed with loads and stores; each process also has a private portion (P0 private to Pn private)]
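
A structured shared address space in this sense can be sketched with every thread running the same function and differing only in its id. Python threads stand in for processors, and the names `data`, `partial`, `worker`, and `NUM_THREADS` are hypothetical.

```python
import threading

NUM_THREADS = 4                # hypothetical processor count
data = list(range(1, 101))     # shared: every thread sees the same list
partial = [0] * NUM_THREADS    # shared: one result slot per thread


def worker(pid):
    # Same program on every thread; only `pid` differs.
    local_sum = 0              # private to this thread
    for i in range(pid, len(data), NUM_THREADS):
        local_sum += data[i]   # load from the shared portion
    partial[pid] = local_sum   # store to the shared portion


threads = [threading.Thread(target=worker, args=(p,)) for p in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(partial)
print(total)
```

Each thread writes only its own slot of `partial`, so no lock is needed in this particular sketch; the join acts as the synchronization point before the final sum.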