Comparing High-End Computer Architectures for Business Applications Presentation: 493 Track: HP-UX Dr. Frank Baetke HP

Page 1:

Comparing High-End Computer Architectures for Business Applications

Presentation: 493

Track: HP-UX

Dr. Frank Baetke

HP

Page 2:

basics

Page 3:

basic components / nomenclature

cpu

memory

Page 4:

processors: bandwidth & latency

cache

cpu

memory

bus

memory bandwidth: ~0.6 GB/s (low) to 2.7 GB/s (high)

memory latency: ~200 ns to ~1000 ns

on-chip cache latency: ~4 ns

off-chip cache latency: ~30 ns
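
To see why the gap between these latencies matters, the figures can be combined into an average access time. The sketch below is only an illustration: the hit rates (95% on-chip, 80% off-chip) are assumed values that do not come from the slides, and the function name `average_access_ns` is made up for this example. Each access is charged the latency of the level that finally serves it.

```python
# Latency figures quoted on the slide, in nanoseconds.
ON_CHIP_NS = 4.0
OFF_CHIP_NS = 30.0
MEMORY_NS_BEST = 200.0    # low end of the quoted memory latency range
MEMORY_NS_WORST = 1000.0  # high end of the quoted memory latency range


def average_access_ns(on_chip_hit, off_chip_hit, memory_ns):
    """Average latency of one access in a two-level cache hierarchy.

    on_chip_hit  -- fraction of all accesses served by the on-chip cache
    off_chip_hit -- fraction of on-chip misses served by the off-chip cache
    (both hit rates are assumptions, not values from the presentation)
    """
    miss_on_chip = 1.0 - on_chip_hit
    miss_off_chip = 1.0 - off_chip_hit
    return (on_chip_hit * ON_CHIP_NS
            + miss_on_chip * off_chip_hit * OFF_CHIP_NS
            + miss_on_chip * miss_off_chip * memory_ns)


if __name__ == "__main__":
    for label, memory_ns in (("best", MEMORY_NS_BEST), ("worst", MEMORY_NS_WORST)):
        avg = average_access_ns(0.95, 0.80, memory_ns)
        print(f"{label} case memory latency: average access {avg:.1f} ns")
```

With these assumed hit rates, only 1% of accesses reach main memory, yet they raise the average from about 7 ns to about 15 ns when memory latency grows from 200 ns to 1000 ns.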

Page 5:

processors

Page 6:

next generation processor technology

[Figure: processor generations plotted along an "Innovation" axis]

CISC

RISC

OOO SuperScalar
- H/W detects implicit parallelism
- H/W O-O-O scheduling & speculation
- H/W renames 8-32 registers to 64+

Explicitly Parallel Instruction Computing

Multiple Cores & Integrated Interconnects

New features !

New features ?

Processors shown: Alpha EV7, Itanium™ 2, IA-32 Processor Family, PA-8800, POWER4, SPARC-III, MIPS 14K, PA-8700, Alpha EV68, Itanium

Page 7:

PA-8700

3 GFLOP/s RISC processor

Actually a PIM (a processor in memory)

Page 8:

PA-8800 processor

Dual core technology

Page 9:

trend 1:

large on-chip caches

multi-core chips

new architecture Itanium®

Page 10:

Itanium® architecture

formerly IA-64, IPF

Page 11:

IA-64 (IPF) Key Features:

• Massive resources: 2 x 128 64-bit+ registers, n x integer units, m x floating-point units, lots of special registers for branches, predication, loop unrolling etc.

• Speculation: The processor can ‘pre-load’ data into the caches even if the access is potentially illegal, so it speculates that the data are valid. Correctness can later be checked in 1 cycle!

• Explicit Parallelization: The compiler ‘tells’ the processor what can be executed in parallel (and what requires sequential processing).

• Predication: The compiler tells the processor to run (for example) both parts of an ‘IF condition’ in parallel and then discard the ‘wrong part’.

Page 12:

Itanium: Principal CPU-Design

[Figure labels: 128 GRs, 128 FRs, MEMORY, Execution Units]

Massively resourced - large register files

Can use registers for ‘vector-like’ loops

Page 14:

Speculation: preloading data into caches reduces (hides) latency

Memory Latency: ~200 ns to ~1000 ns

On-Chip Cache Latency: ~4 ns

Off-Chip Cache Latency: ~30 ns
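
The latency-hiding effect can be expressed as a one-line model. This is a simplified sketch, not a description of the Itanium pipeline: it assumes that a load issued some time before its result is needed overlaps fully with other work, and the function name `visible_stall_ns` and the lead times used below are illustrative assumptions.

```python
def visible_stall_ns(load_latency_ns, lead_time_ns):
    """Stall seen by the consuming instruction.

    load_latency_ns -- time the load takes to return data
    lead_time_ns    -- how long before the use the load was issued (assumed)
    """
    return max(0.0, load_latency_ns - lead_time_ns)


if __name__ == "__main__":
    memory_latency_ns = 200.0  # low end of the range quoted on the slide
    for lead in (0.0, 150.0, 250.0):  # assumed lead times
        stall = visible_stall_ns(memory_latency_ns, lead)
        print(f"load issued {lead:.0f} ns early: visible stall {stall:.0f} ns")
```

The further ahead the compiler can hoist a load, the smaller the stall that remains visible; once the lead time exceeds the latency, the stall disappears.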

Page 16:

RISC vs. EPIC instruction format

EPIC: Instruction 2 | Instruction 1 | Instruction 0 | Template (128-bit bundle, bits 127..0)

RISC: Instruction 0 | Instruction 0 | Instruction 0 | Instruction 0 (32-bit parcel)

The new instruction format enables scalability w/ compatibility
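
The bundle layout can be sketched as simple bit packing. The field widths used here (a 5-bit template in the lowest bits followed by three 41-bit instruction slots) come from the published IA-64 architecture rather than from the slide, and the helper names `pack_bundle` and `unpack_bundle` are made up for this illustration. The slot values are treated as opaque integers, not real instruction encodings.

```python
TEMPLATE_BITS = 5   # template field, bits 4..0 of the bundle
SLOT_BITS = 41      # each of the three instruction slots
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template, slot0, slot1, slot2):
    """Pack a template and three instruction slots into one 128-bit integer."""
    if not 0 <= template <= TEMPLATE_MASK:
        raise ValueError("template does not fit in 5 bits")
    for slot in (slot0, slot1, slot2):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("instruction slot does not fit in 41 bits")
    return (template
            | (slot0 << TEMPLATE_BITS)
            | (slot1 << (TEMPLATE_BITS + SLOT_BITS))
            | (slot2 << (TEMPLATE_BITS + 2 * SLOT_BITS)))


def unpack_bundle(bundle):
    """Split a 128-bit bundle back into (template, (slot0, slot1, slot2))."""
    template = bundle & TEMPLATE_MASK
    slots = tuple((bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
                  for i in range(3))
    return template, slots


if __name__ == "__main__":
    bundle = pack_bundle(0x10, 1, 2, 3)
    print(f"bundle = 0x{bundle:032x}")
    print(unpack_bundle(bundle))
```

The widths add up to 5 + 3 x 41 = 128 bits. The template is what lets the compiler tell the hardware which slots may execute in parallel.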

Page 17:

Itanium: explicit parallelism

[Figure: two 128-bit bundles (bits 127..0), each with a template; the first holds Instruction 1, Instruction 2, Instruction 3, the second holds Instruction 4, Instruction 5, Instruction 6]

6 instructions/cycle on Itanium or McKinley

Page 18:

Itanium: explicit parallelism 2

[Figure: three 128-bit bundles (bits 127..0), each with a template; the first holds Load Instruction, Load Instruction, Instruction 3; the second holds Load Instruction, Load Instruction, Instruction 6; the third holds Instruction 7, Instruction 8, Instruction 1]

Itanium 2 (ex McKinley) and future processors will allow 4 loads/cycle

Page 20:

IA-64: Predication

Run the THEN and ELSE part in parallel!

[Figure: five 128-bit bundles (bits 127..0), each holding Instruction 2, Instruction 1, Instruction 0 and a template]

Predication allows branches to be eliminated
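
The idea of running both parts and discarding the wrong one can be imitated in ordinary code. The sketch below is a conceptual model only: Python has no predicate registers, so the predicate is represented as the integer 0 or 1, and the function names are hypothetical. The first function uses a branch; the second computes both the THEN and the ELSE result and lets the predicate select one, so no branch is needed.

```python
def branchy_abs_diff(a, b):
    """Reference version: a conventional IF/ELSE with a branch."""
    if a > b:
        return a - b
    else:
        return b - a


def predicated_abs_diff(a, b):
    """Branch-free version modelled on predication.

    Both arms are computed; the predicate p keeps one result
    and discards the other.
    """
    p = int(a > b)        # the compare sets the predicate (1 or 0)
    then_value = a - b    # THEN part, always computed
    else_value = b - a    # ELSE part, always computed
    return p * then_value + (1 - p) * else_value


if __name__ == "__main__":
    for a, b in ((7, 3), (3, 7), (5, 5)):
        print(a, b, branchy_abs_diff(a, b), predicated_abs_diff(a, b))
```

Both functions return the same values; the second trades a small amount of extra work for the removal of a branch that could be mispredicted.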

Page 21:

PA-RISC is the only RISC architecture compatible with IA-64

The IA-64 instruction set has been designed with PA-RISC instructions in mind

IA-64 processors can:

run PA-RISC native code (under HP-UX 11)

run IA-64 native code (HP-UX, Linux, …)

run IA-32 native code (Win64)

Page 22:

running PA-RISC code on Itanium

CPU

Memory

PA-RISC CPU

HP-UX 11

Page 23:

running PA-RISC code on IPF

CPU

Memory

IPF CPU running HP-UX 11

PA-RISC Code

Dyncode Translator

Page 24:

trend 2:

very few instruction set architectures will survive

(no applications on ‘exotic’ architectures!)

Page 25:

servers

Page 26:

A small server with 2 CPUs

one address space – one copy of HP-UX 11.x (J-Class)

Page 27:

SMP advantage: big & small jobs, but the bus may become a bottleneck ...

Page 28:

rp7400 - a hybrid design (still SMP)

Page 29:

Half Dome / Super Dome

Page 30:

SuperDome

• Scalable Architecture
  – to 64 CPUs
  – to 0.5 TB Physical Memory
  – to 192 PCI Busses

• HP-UX 11i
  – resource partitioning
  – near fault-tolerant capabilities
  – on-line add/replace for CPUs, memory and I/O components

• Processor support
  – PA8600 introduction
  – PA-RISC & IA64 roadmap

Page 31:

Cell - A Modular Building Block

• Up to 4 processors
• Up to 16/32 GB memory (4 carriers x 8 GB ea)
• IO channel supports 16 PCI busses, 16 slots
• Cell crossbar provides 8 GB/s
• Cell I/O bandwidth is 1.8 GB/s

[Figure label: Crossbar 8 GB/s]
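
The per-cell figures can be checked against the system-level limits quoted for SuperDome (64 CPUs, 0.5 TB). The short calculation below is a sketch under two assumptions that the slides do not state explicitly: the largest system is built only from fully populated cells, and 1 TB is counted as 1024 GB.

```python
MAX_CPUS = 64        # largest SuperDome configuration in the presentation
CPUS_PER_CELL = 4    # "Up to 4 processors" per cell
GB_PER_CELL = 32     # upper value of "16/32 GB memory" per cell

# Number of cells needed for the largest system (derived, not quoted).
cells = MAX_CPUS // CPUS_PER_CELL

# Memory of a system in which every cell carries the maximum amount.
total_memory_gb = cells * GB_PER_CELL
total_memory_tb = total_memory_gb / 1024  # assumes 1 TB = 1024 GB

if __name__ == "__main__":
    print(f"{cells} cells, {total_memory_gb} GB = {total_memory_tb} TB")
```

Sixteen cells at 32 GB each give 512 GB, which agrees with the 0.5 TB physical memory limit quoted for the full system.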

Page 32:

SuperDome Cell Board

Page 33:

SuperDome / rp8400 (16 CPUs)

Crossbar 32 GB/s

Page 34:

server rp8400

Page 35:

SuperDome (32 CPUs)

Latencies: 208, 301 and 357 ns

Crossbar 32 GB/s

Crossbar 32 GB/s

Page 36:

SuperDome (64 CPUs)

Latencies: 208, 301 and 357 ns

Crossbar 32 GB/s

Crossbar 32 GB/s

Crossbar 32 GB/s

Crossbar 32 GB/s
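
The three latencies can be turned into an average for a 64-CPU system. The sketch below rests on assumptions that are not spelled out in the slides: 208 ns is taken to be an access to memory on the local cell, 301 ns an access to another cell on the same crossbar, and 357 ns an access to a cell behind a different crossbar; accesses are assumed to be spread uniformly over all cells. The cell counts follow from 4 CPUs per cell and 16 CPUs per crossbar.

```python
CELLS_TOTAL = 16         # 64 CPUs / 4 CPUs per cell
CELLS_PER_CROSSBAR = 4   # 16 CPUs per crossbar / 4 CPUs per cell

# Assumed mapping of the quoted latencies to access distances.
LOCAL_NS = 208
SAME_CROSSBAR_NS = 301
OTHER_CROSSBAR_NS = 357


def average_latency_ns():
    """Average memory latency seen by one cell under uniform random access."""
    local_cells = 1
    same_crossbar_cells = CELLS_PER_CROSSBAR - 1
    other_crossbar_cells = CELLS_TOTAL - CELLS_PER_CROSSBAR
    total_ns = (local_cells * LOCAL_NS
                + same_crossbar_cells * SAME_CROSSBAR_NS
                + other_crossbar_cells * OTHER_CROSSBAR_NS)
    return total_ns / CELLS_TOTAL


if __name__ == "__main__":
    print(f"average latency: {average_latency_ns():.1f} ns")
```

Under these assumptions the average comes to about 337 ns, close to the worst case, because 12 of the 16 cells sit behind another crossbar. Placing data near the CPUs that use it moves the figure toward 208 ns.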

Page 37:

trend 3:

SMPs with crossbars

cells & ccNUMA

inherent upgradability

Page 38:

partitions

Page 39:

4 nPartitions (1 x 32, 1 x 16, 2 x 8 CPUs)
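
A partition layout like this one can be checked with a few lines of code. The sketch assumes that nPartitions are built from whole cells of 4 CPUs each and that the machine has 64 CPUs; the helper name `cells_for` is made up for this example.

```python
CPUS_PER_CELL = 4   # from the cell description earlier in the deck
TOTAL_CPUS = 64     # full SuperDome


def cells_for(partition_cpus, cpus_per_cell=CPUS_PER_CELL):
    """Number of whole cells a partition of the given size occupies."""
    if partition_cpus % cpus_per_cell != 0:
        raise ValueError("partition size must be a multiple of the cell size")
    return partition_cpus // cpus_per_cell


# The layout from the slide: 1 x 32, 1 x 16 and 2 x 8 CPUs.
layout_cpus = [32, 16, 8, 8]
layout_cells = [cells_for(cpus) for cpus in layout_cpus]

if __name__ == "__main__":
    print("cells per partition:", layout_cells)
    print("CPUs used:", sum(layout_cpus), "of", TOTAL_CPUS)
```

The four partitions occupy 8, 4, 2 and 2 cells, which together use all 16 cells and all 64 CPUs.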

Page 40:

SuperDome (32 CPUs) & virtual partitions

3 nPartitions (16, 8, 8 CPUs), 6 Copies of HP-UX

Page 42:

rp7400 – with virtual partitions (2 x HP-UX)

Page 43:

trend 4:

partitioning

- GRIDS
- clusters
- hard partitions
- virtual partitions
- schedulers

Page 44:

Example: Sun's UE10000

(not a cell-based architecture)

Page 45:

UE10000: A Special SMP: 560 ns latency

Crossbar 12.6 GB/s

Page 46:

All traffic goes through one crossbar!

560 ns latency (SuperDome has 200 – 350 ns)

Crossbar 12.6 GB/s

Page 47:

Loading multiple domains - one left

Crossbar 12.6 GB/s

Page 48:

Example: Sun's MidFrames

(cell-based architectures)

Page 49:

Sun Fire 3800 Server – Basic Design

[Figure labels: 9.6 GB/s data bandwidth; I/O 1, I/O 2; two links at 4.8 GB/s; four links at 2.4 GB/s]

Page 50:

Sun Fire 3800 Server – Basic Design

Now with Address Routers included ... Still 9.6 GB/s

[Figure labels: 9.6 GB/s data bandwidth; Address Router; I/O 1, I/O 2; two links at 4.8 GB/s; four links at 2.4 GB/s]

Page 51:

Sun Fire 4800 Server – Basic Design

[Figure labels: 9.6 GB/s data bandwidth; I/O 1, I/O 2; two links at 4.8 GB/s; four links at 2.4 GB/s]

Page 52:

Sun Fire 6800 Server – Basic Design

[Figure labels: 9.6 GB/s data bandwidth; I/O 1, I/O 2, I/O 3, I/O 4; four links at 4.8 GB/s; four links at 2.4 GB/s]

Page 53:

System BW Scalability (random access)

[Chart: bandwidth in GB/sec (axis 0 to 70) versus number of cpus (axis 0 to 64); series: Sun Midframe, N4000, Superdome]

Page 54:

trend 5:

multi-OS environments

Page 55:

Example: SuperDome after an upgrade to Itanium

Page 56:

Cell2 - Dramatic Improvements

Page 57:

SuperDome-IPF (64 CPUs)

4 nPartitions (1 x 32, 1 x 16, 2 x 8 CPUs), 1 and 3 vPars for HP-UX

[Figure labels: Windows200x, Linux-64, Win-64, HP-UX 11i V2]

Page 58:

OS/ABI flexibility on HP IPF-Systems

Windows64 (native boot)

MS Win64 Applications compiled for IPF (64bit)

Linux64 (native boot)

MS W2K apps for IA32

HP-UX 11i (native boot)

HP-UX ABI

Linux ABI

Linux 64 Applications compiled for IPF (64bit)

Linux 64 Applications compiled for IPF (64bit)

HP-UX Applications compiled for IPF

PA-Apps

ARIES

Page 59:

SuperDome 64 (IPF) CPUs

4 nPartitions (1 x 32, 1 x 16, 2 x 8 CPUs)

Windows200x

Windows200x

HP-UX 11i V2

... all LINUX64 binaries can be executed here

HP-UX 11i V2

... all LINUX64 binaries can be executed here

Page 60:

Thank You