Dezső Sima Fall 2008 (Ver. 1.0) Sima Dezső, 2008 DP/MP System Architectures.


Page 1

Dezső Sima

Fall 2008

(Ver. 1.0) Sima Dezső, 2008

DP/MP System Architectures

Page 2

Contents

1. The evolution of Intel’s basic microarchitectures

2. Intel’s DP servers

3. Intel’s MP servers

4. AMD’s servers

Page 3

1. The evolution of Intel’s basic microarchitectures

Page 4

1. The evolution of Intel’s basic microarchitectures (1)

Figure: Intel’s Tick-Tock development model [22]

Page 5

1. The evolution of Intel’s basic microarchitectures (2)

Figure: The speed of changes in Intel’s Tick-Tock development model [24]

Page 6

1. The evolution of Intel’s basic microarchitectures (3)

Figure: Key enhancements introduced into the Core2 microarchitecture (vs the Pentium4) [22]

• Wide dynamic execution - 4-wide decode/rename/retire

• Advanced digital media processing - 128-bit wide SSE execution unit

• Improved graphics/MM - New SSE 4.1 instructions

• Smart memory access - Memory disambiguation (speculative loads), hardware prefetching

• Advanced smart cache - Low latency, high BW shared L2 cache

Page 7

1. The evolution of Intel’s basic microarchitectures (4)

Figure: Key enhancements introduced into the Penryn microarchitecture (vs the Core) [23]

Page 8

1. The evolution of Intel’s basic microarchitectures (5)

Figure: Improvements introduced into the Nehalem microarchitecture (vs Penryn) [22]

Page 9

1. The evolution of Intel’s basic microarchitectures (6)

Figure: Hyperthreading in the Nehalem microarchitecture [22]

Page 10

1. The evolution of Intel’s basic microarchitectures (7)

2-level cache hierarchy 3-level cache hierarchy

Figure: 3-level cache hierarchy of Nehalem [22]

Page 11

1. The evolution of Intel’s basic microarchitectures (8)

Figure: Nehalem’s innovations in the system architecture [22]

Page 12

1. The evolution of Intel’s basic microarchitectures (9)

Figure: Nehalem’s innovations in the system architecture [22]

Page 13

1. The evolution of Intel’s basic microarchitectures (10)

QuickPath Interconnect (QPI)

Formerly: Common System Interconnect (CSI)

3.2 GHz, DDR, 20 bits wide in each direction (16-bit data, 4-bit CRC): 12.8 GB/s in each direction

Fastest FSB

400 MHz, QDR, 8 bytes wide: 12.8 GB/s, bidirectional

HyperTransport bus (typical speed and width figures in AMD’s systems)

HT 1.0: 0.8 GHz, DDR, 2 bytes wide: 3.2 GB/s in each direction

HT 2.0: 1.0 GHz, DDR, 2 bytes wide: 4.0 GB/s in each direction

HT 3.0: 2.6 GHz, DDR, 2 bytes wide: 10.4 GB/s in each direction
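The interconnect figures on this slide follow from clock rate × transfers per clock × data width, which yields a value in GB/s. A minimal Python sketch of that arithmetic; the function name and the dictionary layout are illustrative assumptions, and only the 16 data bits of the QuickPath link are counted because the slide assigns the remaining 4 bits to CRC:

```python
def peak_bandwidth_gb_s(clock_ghz, transfers_per_clock, data_bytes):
    """Peak bandwidth in GB/s: clock (GHz) x transfers per clock x bytes per transfer."""
    return clock_ghz * transfers_per_clock * data_bytes

# (clock in GHz, transfers per clock, data width in bytes), as quoted on the slide.
# DDR means 2 transfers per clock, QDR means 4.
LINKS = {
    "QuickPath, each direction": (3.2, 2, 2),
    "Fastest FSB, shared": (0.4, 4, 8),
    "HT 1.0, each direction": (0.8, 2, 2),
    "HT 2.0, each direction": (1.0, 2, 2),
    "HT 3.0, each direction": (2.6, 2, 2),
}

for name, (clock, pumping, width) in LINKS.items():
    print(f"{name}: {peak_bandwidth_gb_s(clock, pumping, width):.1f} GB/s")
```

Note that the FSB value is shared by both directions and by all agents on the bus, whereas the QuickPath and HyperTransport values apply to each direction of a point-to-point link.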

Page 14

2. Intel’s DP Servers

Page 15

2. Intel’s DP servers (1)

Figure: Typical configuration of an early DP-server motherboard based on Intel’s E7500/E7501 (Plunas) chipset

[Block diagram, labels only: two P4-based Xeon (Prestonia) processors on a 400/533 MHz FSB; E7500/E7501 MCH (with RASUM) with two SDRAM interfaces (DDR 200/266, registered, ECC optional, 8/12/16 GB); HI 2.0 links to PCI-X bridges (PCI-X v.2.2; SCSI, SATA and GbE controllers); HI 1.5 link to the ICH3-S (Ultra ATA/100, PCI v.2.2, USB v.1.1, GPIO; video controller with SVGA, MbE controller with LAN); LPC bus to the FWH and the SIO (FD, KB, MS, SP, PP). Bandwidth annotations in the figure: 3200-4264, 1600-2128, 1066, 266, 133, 2*100, ~5, 1.5.]

Page 16

2. Intel’s DP servers (2)

Figure: Typical configuration of an advanced early DP-server motherboard based on Intel’s E7520 (Lindenhurst) chipset

[Block diagram, labels only: two P4-based Xeon (Nocona or Paxville DP) processors on an 800 MHz FSB; E7520 MCH (with RASUM) with two SDRAM interfaces (DDR 266/333, DDR2 400, registered, ECC optional, 16/24/32 GB); PCI E. x8 links (one usable as 2 x x4); PCI-X bridge (PCI-X v.1.0b; SCSI and GbE controllers); HI 1.5 link to the ICH5R (Ultra ATA/100, SATA, PCI v.2.3, USB v.2.0, AC'97 v.2.3, GPIO; video controller with SVGA, MbE controller with LAN); LPC bus to the FWH and the SIO (FD, KB, MS, SP, PP). Bandwidth annotations in the figure: 4000, 3200, 2128-3200, 266, 133, 2*150, 2*100, 60, ~5, ~1.4.]

Page 17

2. Intel’s DP servers (3)

Paxville DP 2.8

2xIrwindale cores/90 nm

Figure: Intel’s Pentium 4 based DC DP server processors [33], [34]

Page 18

2. Intel’s DP servers (4)

Figure: Genealogy of the Xeon Paxville core

Nocona (DP enhanced Prescott): 6/2004, 90 nm, 112 mm², 125 mtrs, mPGA 604. Intel’s first 64-bit Xeon.

Irwindale (DP enhanced Prescott 2M; Nocona with L2 enlarged to 2 MB): 2/2005, 90 nm, 135 mm², 169 mtrs, mPGA 604

Paxville (2 x Irwindale cores): 10/2005, 90 nm, 2 x 135 mm², 2 x 169 mtrs, mPGA 604; marketed as Xeon DP 2.8 and Xeon MP 7020-7041

In contrast: the corresponding desktop processors have the LGA 775 socket.

Sources:

http://www.xbitlabs.com/articles/cpu/display/opteron-xeon-workstation_5.html

http://www.theinquirer.net/default.aspx?article=16879

http://www.gamepc.com/labs/view_content.asp?id=x36o252&page=2

Page 19

2. Intel’s DP servers (5)

Paxville DP 2.8: 2 x Irwindale cores/90 nm

Xeon 5000 (Dempsey): 2 x Cedar Mill/65 nm (65 nm shrink of the Irwindale)

Figure: Intel’s Pentium 4 based DC DP server processors [33], [34]

Page 20

2. Intel’s DP servers (6)

Xeon 5100 (Woodcrest): Core2-based/65 nm

Xeon 5300 (Clovertown): Core2-based/65 nm, 2 x Xeon 5100

Figure: Intel’s Core2 based DC/QC DP server processors [33], [35], [36]

Page 21

2. Intel’s DP servers (7)

Figure: Intel’s Penryn based QC DP server processor/45 nm (Source: Intel)

Xeon 5400(Harpertown)

Page 22

2. Intel’s DP servers (8)

Figure: Contrasting the die shots of the Xeon 5400 and 5300 processors [24]

Page 23

2. Intel’s DP servers (9)

| Series | --- (Paxville DP) | 5000 (Dempsey) | 5100 (Woodcrest) | 5200 (Wolfdale) | 5300 (Clovertown) | 5400 (Harpertown) |
|---|---|---|---|---|---|---|
| Dual/Quad-Core | DC | DC | DC | DC | QC | QC |
| Models | Xeon DP 2.8 | 5030-5080 | 5110-5160 | E5205/E5260/X5275 | E5310-5345/X5355 | E5405-E5472, X5450-X5482 |
| Microarchitecture | Pentium 4 | Pentium 4 | Core2 | Penryn | Core2 | Penryn |
| Core | 2*Irwindale dies | 2*Cedar dies | Single die | | 2*Woodcrest dies | 2*Penryn |
| Intro. | 10/2005 | 5/2006 | 6/2006 | 11/2007 | 11/2006 | 11/2007 |
| Technology | 90 nm | 65 nm | 65 nm | 45 nm | 65 nm | 45 nm |
| Die size | 2*135 mm² | 2*81 mm² | 143 mm² | | 2*143 mm² | 2*107 mm² |
| Nr. of transistors | 2*169 mtrs | 2*188 mtrs | 291 mtrs | | 2*291 mtrs | 2*410 mtrs |
| Fc [GHz] | 2.8 | 2.6-3.73 | 1.6-3.0 | 1.86-3.40 | 1.6-2.66 | 2.00-3.20 |
| L2 | 2*2 MB | 2*2 MB | 4 MB | 6 MB | 2*4 MB | 2*6 MB |
| FSB [MT/s] | 800 | 667/1066 | 1066/1333 | 1333/1600 | 1066/1333 | 1333/1600 |
| TDP [W] | 135 | 95/130 | 65/80 | 65/80 | 80/120 | 80/120/150 |
| Socket | PGA 604 | LGA 771 | LGA 771 | LGA 771 | LGA 771 | LGA 771 |

EM64T

HT --- --- --- ---

ED

VT

EIST (5140 or above)

La Grande --- ---

AMT2 --- ---

Flex Migration --- --- ---

Table: Intel’s DC, QC DP servers

Page 24

2. Intel’s DP servers (10)

Gainestown (Q1/2009): Nehalem-based/45 nm (Socket 1366)

??? (Q1/2010?): Westmere-based/32 nm

(Both 2-way multithreaded)

Figure: Intel’s future DP server processors [21]

Page 25

2. Intel’s DP servers (11)

Figure: Intel’s late Pentium4 based and subsequent DP server platforms

DP platform, Pentium4-based (90/65 nm)

DP cores: Xeon DP 2.8 (Paxville DP), DC, 10/2005: 90 nm/2*169 mtrs, 2*2 MB L2, 800 MT/s, PGA604

DP chipsets: 7520 (Lindenhurst), 6/2004: 800 MT/s, 2 x DDR/DDR2, 16 GB

Page 26

2. Intel’s DP servers (12)

Figure: Evolution of Intel’s DP servers

7520 (Lindenhurst): two processors (Nocona or Paxville DP, SC/DC) on a single FSB, 800 MT/s, 6.4 GB/s; dual DDR2 400 MT/s memory channels, 6.4 GB/s; 24 PCIe lanes, 7.5 GB/s

Page 27

2. Intel’s DP servers (13)

Figure: Typical configuration of an advanced early DP-server motherboard based on Intel’s E7520 (Lindenhurst) chipset

[Block diagram, labels only: two P4-based Xeon (Nocona or Paxville DP) processors on an 800 MHz FSB; E7520 MCH (with RASUM) with two SDRAM interfaces (DDR 266/333, DDR2 400, registered, ECC optional, 16/24/32 GB); PCI E. x8 links (one usable as 2 x x4); PCI-X bridge (PCI-X v.1.0b; SCSI and GbE controllers); HI 1.5 link to the ICH5R (Ultra ATA/100, SATA, PCI v.2.3, USB v.2.0, AC'97 v.2.3, GPIO; video controller with SVGA, MbE controller with LAN); LPC bus to the FWH and the SIO (FD, KB, MS, SP, PP). Bandwidth annotations in the figure: 4000, 3200, 2128-3200, 266, 133, 2*150, 2*100, 60, ~5, ~1.4.]

Page 28

2. Intel’s DP servers (11)

Figure: Intel’s late Pentium4 based and subsequent DP server platforms

DP platform, Pentium4-based (90/65 nm)

DP cores: Xeon DP 2.8 (Paxville DP), DC, 10/2005: 90 nm/2*169 mtrs, 2*2 MB L2, 800 MT/s, PGA604

DP chipsets: 7520 (Lindenhurst), 6/2004: 800 MT/s, 2 x DDR/DDR2, 16 GB

5000 (Bensley) platform, 06/2006, Pentium4/Core2-based (65 nm)

DP cores:

Xeon 5000 (Dempsey), DC, 5/2006: 65 nm/2*188 mtrs, 2*2 MB L2, 667/1066 MT/s, LGA771

Xeon 5100 (Woodcrest), DC, 6/2006: 65 nm/291 mtrs, 4 MB L2, 667/1066 MT/s, LGA771

Xeon 5300 (Clovertown), QC, 11/2006: 65 nm/2*291 mtrs, 2*4 MB L2, 667/1066 MT/s, LGA771

DP chipsets (2 x FSB, 1066 MT/s), 6/2006:

5000P (Blackford): 4 x FBDIMM (DDR2), 64 GB

5000V/Z (Blackford V/Z): 2 x FBDIMM (DDR2), 16 GB

Page 29

2. Intel’s DP servers (14)

Figure: Evolution of Intel’s DP servers

7520 (Lindenhurst): two processors (Nocona or Paxville, SC/DC) on a single FSB, 800 MT/s, 6.4 GB/s; dual DDR2 400 MT/s memory channels, 6.4 GB/s; 24 PCIe lanes, 7.5 GB/s

5000 (Blackford): two processors (Dempsey, Woodcrest or Clovertown, DC) on dual FSBs, 1066 MT/s, 17.1 GB/s; quad FB-DIMM 533 MT/s memory channels, 17.1 GB/s; 24 PCIe lanes, 7.5 GB/s
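The FSB and memory bandwidth figures in this comparison can be reproduced as transfer rate × data width × number of buses or channels. A minimal Python sketch, assuming an 8-byte (64-bit) data path for each FSB and each memory channel; that width is an assumption, not stated on the slide, and the names used are illustrative:

```python
BUS_BYTES = 8  # assumed 64-bit data path per FSB and per memory channel

def aggregate_gb_s(mt_per_s, count, bus_bytes=BUS_BYTES):
    """Aggregate peak bandwidth in GB/s for `count` identical buses or channels."""
    return mt_per_s * bus_bytes * count / 1000.0

# (transfer rate in MT/s, number of buses or channels), as quoted in the figure.
PLATFORMS = {
    "7520 (Lindenhurst)": {"fsb": (800, 1), "memory": (400, 2)},
    "5000 (Blackford)": {"fsb": (1066, 2), "memory": (533, 4)},
}

for name, parts in PLATFORMS.items():
    fsb = aggregate_gb_s(*parts["fsb"])
    mem = aggregate_gb_s(*parts["memory"])
    print(f"{name}: FSB {fsb:.1f} GB/s, memory {mem:.1f} GB/s")
```

Under this assumption both platforms come out balanced: the aggregate memory bandwidth equals the aggregate FSB bandwidth in each generation.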

Page 30

2. Intel’s DP servers (15)

http://www.tyan.com/tempest/training/s5370.pdf

Intel’s Bensley platform [30] (Actually the block diagram of Tyan’s S5370 DP server)

Page 31

2. Intel’s DP servers (16)

Figure: Bensley DP motherboard, with the 5000 (Blackford) chipset (Supermicro X7DB8+) for the Xeon 5000 DC/QC DP processor families [7]

Labelled components: Xeon DC/QC processors (5000 DC, 5100 DC, 5300 QC); 5000P; SBE2; FB-DIMM DDR2, 64 GB

Page 32

Table: Latency and bandwidth scaling of the Intel 5000 platform (2006) vs the earlier generation (2004) [1]

2. Intel’s DP servers (17)

Page 33

2. Intel’s DP servers (11)

Figure: Intel’s late Pentium4 based and subsequent DP server platforms

DP platform, Pentium4-based (90/65 nm)

DP cores: Xeon DP 2.8 (Paxville DP), DC, 10/2005: 90 nm/2*169 mtrs, 2*2 MB L2, 800 MT/s, PGA604

DP chipsets: 7520 (Lindenhurst), 6/2004: 800 MT/s, 2 x DDR/DDR2, 16 GB

5000 (Bensley) platform, 06/2006, Pentium4/Core2-based (65 nm)

DP cores:

Xeon 5000 (Dempsey), DC, 5/2006: 65 nm/2*188 mtrs, 2*2 MB L2, 667/1066 MT/s, LGA771

Xeon 5100 (Woodcrest), DC, 6/2006: 65 nm/291 mtrs, 4 MB L2, 667/1066 MT/s, LGA771

Xeon 5300 (Clovertown), QC, 11/2006: 65 nm/2*291 mtrs, 2*4 MB L2, 667/1066 MT/s, LGA771

DP chipsets (2 x FSB, 1066 MT/s), 6/2006:

5000P (Blackford): 4 x FBDIMM (DDR2), 64 GB

5000V/Z (Blackford V/Z): 2 x FBDIMM (DDR2), 16 GB

5100 (Cranberry Lake) platform, 10/2007, Penryn-based (45 nm)

DP cores:

Xeon 5200 (Wolfdale), DC: 45 nm/850 mtrs, 2*6 MB L2, 1066/1333 MT/s, LGA771

Xeon 5400 (Harpertown), QC, 11/2007: 45 nm/850 mtrs, 2*6 MB L2, 1066/1333 MT/s, LGA771

DP chipsets: 5100 (San Clemente), 10/2007: 2 x FSB, 1333/1066 MT/s; 2 x DDR2, 32/48 GB

Page 34

2. Intel’s DP servers (18)

Figure: The Cranberry Lake platform [19]

Xeon 5400 (QC), Xeon 5200 (DC)

5100 chipset

Page 35

2. Intel’s DP servers (19)

Figure: Intel’s forthcoming Nehalem-based DP server system architecture [31]

QuickPath Interconnect

Integrated memory controller

Page 36

3. Intel’s MP servers

Page 37

3. Intel’s MP servers (1)

Figure: Intel’s Pentium4 based Xeon MP processors [17], [18]

Tulsa

90 nm 65 nm 65 nm

CDM: Cedar Mill core

Potomac Paxville MP

Page 38

3. Intel’s MP servers (2)

Figure: Intel’s Core2 /Penryn based Xeon MP processors [19], [20]

65 nm

45 nm

Core2 based

Penryn based

Page 39

Table: Dual- and Quad-Core Xeon MP-lines

1 Intel’s documentation is contradictory about the L2 cache size: according to the data sheets, models of the 7000 series include 1 or 2 MB L2 caches, whereas the comparison charts show 1 MB L2 caches for all models.

3. Intel’s MP servers (3)

| Series | 7000 (Paxville MP) | 7100 (Tulsa) | 7200 (Tigerton DC) | 7300 (Tigerton QC) | 7400 (Dunnington QC) | 7400 (Dunnington 6C) |
|---|---|---|---|---|---|---|
| Dual/Quad-Core | DC | DC | 2xSC | 2xDC | QC | 6C |
| Models | 7020-7041 | 7110M-7140M / 7110N-7150N | E7210/E7220 | E7310/E7320/E7330/E7340/X7350 | E7420-E7440 | E7450/X7460 |
| Microarchitecture | Netburst | Netburst | Core 2 | Core 2 | Penryn | Penryn |
| Core | 2xIrwindale dies | Cedar Mill-based single die | 2xSC Woodcrest dies | 2xWoodcrest dies | | |
| Intro. | 11/2005 | 8/2006 | 9/2007 | 9/2007 | 9/2008 | 9/2008 |
| Technology | 90 nm | 65 nm | 65 nm | 65 nm | 45 nm | 45 nm |
| Die size | 2*135 mm² | 435 mm² | 2*143 mm² | 2*143 mm² | 503 mm² | 503 mm² |
| Nr. of transistors | 2*169 mtrs | 1328 mtrs | 2*291 mtrs | 2*291 mtrs | 1900 mtrs | 1900 mtrs |
| Fc [GHz] | 2.66-3.0 | 2.5-3.5 | 2.4/2.93 | 1.6/2.13/2.4/2.4/2.93 | 2.13-2.40 | 2.40/2.66 |
| L2 | 2*1/2 MB 1 | 2*1 MB | 2*4 MB | 2*2/2*2/2*3/2*4/2*4 MB | 3*2 MB | 3*3 MB |
| L3 | --- | 4/8/16 MB | --- | --- | 8/12/16 MB | 12/16 MB |
| FSB [MT/s] | 667/800 | 667/800 | 1066 | 1066 | 1066 | 1066 |
| TDP [W] | 95/150 | 95/150 | 80 | 80/80/80/80/130 | 90 | 90/130 |
| Socket | mPGA604 | mPGA604 | mPGA604 | mPGA604 | mPGA 604 | mPGA 604 |

EM64T

HT --- ---

ED

VT

EIST

La Grande --- --- n.a.

AMT2 --- --- (Except E7310) n.a.

Page 40

3. Intel’s MP servers (4)

Figure: Intel’s Nehalem based MP server processor [21]

Page 41

3. Intel’s MP servers (5)

Figure: Evolution of Intel’s Xeon MP-based system architecture (until the appearance of Nehalem)

Preceding NBs: four Xeon MP¹ processors (SC); typically HI 1.5 (266 MB/s)

¹ Xeon MP before Potomac

Page 42

3. Intel’s MP servers (6)

Figure: Former Pentium II/III MP system architecture [32]

Page 43

3. Intel’s MP servers (7)

Figure: Intel’s Xeon-based MP server platforms

Truland platform, 3/2005

MP cores:

Xeon MP (Potomac SC), 3/2005, P4-based/90 nm: 90 nm/675 mtrs, 1 MB L2, 8/4 MB L3, 667 MT/s, mPGA 604

Xeon 7000 (Paxville MP DC), 11/2005, P4-based/90 nm: 90 nm/2x169 mtrs, 2x1 (2) MB L2, no L3, 800/667 MT/s, mPGA 604

Xeon 7100 (Tulsa DC), 8/2006, P4-based/65 nm: 65 nm/1328 mtrs, 2x1 MB L2, 16/8/4 MB L3, 800/667 MT/s, mPGA 604

MP chipsets:

8500 (Twin Castle), 3/2005: 2 x FSB, 667 MT/s; 4 x XMB (2 x DDR2); 32 GB

8501 (?), 4/2006: 2 x FSB, 800 MT/s; 4 x XMB (2 x DDR2); 32 GB

Page 44

3. Intel’s MP servers (8)

Figure: Evolution of Intel’s Xeon MP-based system architecture (until the appearance of Nehalem)

Preceding NBs: four Xeon MP¹ processors (SC); typically HI 1.5 (266 MB/s)

8500 (Twin Castle), Truland platform: four processors (Potomac or Paxville MP, DC/SC); four XMBs; 28 PCIe lanes (7 GB/s) + HI 1.5 (266 MB/s)

Successor platform: Caneland

¹ Xeon MP before Potomac

Page 45

3. Intel’s MP servers (9)

Figure: Intel’s 8501 chipset for MP servers (4/2006) [4]

Processors: Xeon DC MP 7000 (4/2005) or later; DC/QC MP 7000 processors

8501 (North Bridge)

Independent Memory Interface (serial link): 5.33 GB/s inbound BW and 2.67 GB/s outbound BW simultaneously

eXternal Memory Bridge (XMB): intelligent MC, dual memory channels, DDR 266/333/400, 4 DIMMs/channel

Page 46

3. Intel’s MP servers (10)

Figure: Quad socket Intel E8501 chipset based motherboard (Supermicro X6QT8) for the Xeon 7000/7100 DC MP processor families [7]

Labelled components: Xeon DC 7000/7100 processors; E8501 NB; ICH5R SB; FB-DIMM DDR2, 64 GB

Page 47

Figure: Bandwidth bottlenecks in Intel’s 8501 MP server platform [2]

3. Intel’s MP servers (11)

Page 48

3. Intel’s MP servers (12)

Figure: Intel’s Xeon-based MP server platforms

Truland platform, 3/2005

MP cores:

Xeon MP (Potomac SC), 3/2005, P4-based/90 nm: 90 nm/675 mtrs, 1 MB L2, 8/4 MB L3, 667 MT/s, mPGA 604

Xeon 7000 (Paxville MP DC), 11/2005, P4-based/90 nm: 90 nm/2x169 mtrs, 2x1 (2) MB L2, no L3, 800/667 MT/s, mPGA 604

Xeon 7100 (Tulsa DC), 8/2006, P4-based/65 nm: 65 nm/1328 mtrs, 2x1 MB L2, 16/8/4 MB L3, 800/667 MT/s, mPGA 604

MP chipsets:

8500 (Twin Castle), 3/2005: 2 x FSB, 667 MT/s; 4 x XMB (2 x DDR2); 32 GB

8501 (?), 4/2006: 2 x FSB, 800 MT/s; 4 x XMB (2 x DDR2); 32 GB

Caneland platform (7300), 9/2007

MP cores:

Xeon 7200 (Tigerton DC), 9/2007, Core2-based/65 nm: 65 nm/2x291 mtrs, 2x4 MB L2, no L3, 1066 MT/s, mPGA 604

Xeon 7300 (Tigerton QC), 9/2007, Core2-based/65 nm: 65 nm/2x291 mtrs, 2x(4/3/2) MB L2, no L3, 1066 MT/s, mPGA 604

Xeon 7400 (Dunnington 6C), 9/2008, Core2-based/45 nm: 45 nm/1900 mtrs, 9/6 MB L2, 16/12/8 MB L3, 1066 MT/s, mPGA 604

MP chipsets:

7300 (Clarksboro), 9/2007: 4 x FSB, 1066 MT/s; 4 x FBDIMM (DDR2); 512 GB

Page 49

3. Intel’s MP servers (13)

Figure: Evolution of Intel’s Xeon MP-based system architecture (until the appearance of Nehalem)

Preceding NBs: four Xeon MP¹ processors (SC); typically HI 1.5 (266 MB/s)

8500 (Twin Castle), Truland platform: four processors (Potomac² or Paxville MP³, DC/SC); four XMBs; 28 PCIe lanes (7 GB/s) + HI 1.5 (266 MB/s)

7300 (Clarksboro), Caneland platform: four processors (Tigerton or Dunnington, 6C/QC/DC); FB-DIMM (DDR2); 8 PCI-E lanes (2 GB/s) + ESI (1 GB/s)

¹ Xeon MP before Potomac

² First x86-64 MP processor

³ Also supports Cranford (SC) and Tulsa (DC)
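The values 7 and 2 attached to the 28-lane and 8-lane PCIe groups in this figure are consistent with lanes × per-lane rate. A minimal Python sketch, assuming first-generation PCIe at 250 MB/s per lane per direction; the PCIe generation is an assumption, since the slide does not state it, and the function name is illustrative:

```python
PCIE_GEN1_MB_S_PER_LANE = 250  # assumed: PCIe 1.x, per lane and per direction

def pcie_bandwidth_gb_s(lanes, mb_s_per_lane=PCIE_GEN1_MB_S_PER_LANE):
    """Peak one-direction bandwidth in GB/s for a group of PCIe lanes."""
    return lanes * mb_s_per_lane / 1000.0

for chipset, lanes in (("8500 (Twin Castle)", 28), ("7300 (Clarksboro)", 8)):
    print(f"{chipset}: {lanes} lanes, {pcie_bandwidth_gb_s(lanes):.1f} GB/s")
```

By the same rule, a 4-lane link would give 1 GB/s, which matches the remaining value quoted in the figure.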

Page 50

3. Intel’s MP servers (14)

Figure: Intel’s four socket 7300 (Caneland) platform, based on the 7300 (Clarksboro) chipset for the Xeon 7200/7300 DC/QC MP families (9/2007) [6]

Labelled components: Xeon 7200 (Tigerton DC, Core2), DC; Xeon 7300 (Tigerton QC, Core2), QC; FB-DIMM, up to 512 GB

Page 51

3. Intel’s MP servers (15)

Figure: Caneland MP motherboard, with the 7300 (Clarksboro) chipset (Supermicro X7QC3) for the Xeon 7200/7300 DC/QC MP processor families [7]

Labelled components: Xeon 7200 DC / 7300 QC (Tigerton) processors; 7300 NB; SBE2 SB; FB-DIMM DDR2, 192 GB; ATI ES1000 graphics with 32 MB video memory

Page 52

Figure: Performance comparison of the Caneland platform with a quad core Xeon (7300 family) vs the Bensley platform with a dual core Xeon 7140M [13]

3. Intel’s MP servers (16)

Page 53

3. Intel’s MP servers (17)

FB-DIMM(DDR2)

QPI

Figure: Intel’s Nehalem based MP server system architecture [22]

Page 54

4. AMD’s servers

Page 55

4. AMD’s servers (1)

Figure: Basic structure of the Opteron families [8]

UP: Opteron 100/1000; DP: Opteron 200/2000; MP: Opteron 800/8000

[Block diagram, shown for two processors: two cores (CPU0, CPU1), each with a 1 MB L2 cache; System Request Interface; crossbar switch; memory controller (2 x 72 bit); HyperTransport links (0, 1, 2)]

Coherent HyperTransport links: 800/8000: 3 coherent links; 200/2000: 1 coherent link

Page 56

AMD’s 4P/8P Direct Connect server architecture [2]

4. AMD’s servers (2)

Page 57

Figure: Block diagram of a DP QC motherboard (Asus KFSN4-DRE/SAS) for the AMD Opteron 2300 QC family [10]

4. AMD’s servers (3)

Page 58

Figure: DP motherboard (Asus KFSN4-DRE/SAS) for the AMD Opteron 2300 QC family [10]

DDR2

64 GB

2300

Opteron QC DP

nForce 2200 chipset

4. AMD’s servers (4)

Page 59

Figure: Block diagram of a QP QC motherboard (ASUS KFN5-Q/SAS) for AMD’s Opteron 8000 DC/QC families [10]

4. AMD’s servers (5)

Page 60

Figure: 4-socket motherboard (ASUS KFN5-Q/SAS) for the AMD Opteron 8000 DC/QC families [10]

8300

Opteron QC MP

nForce 3600 chipset

DDR2

64 GB

4. AMD’s servers (6)

Page 61

4. AMD’s servers (7)

Figure: Simplified block diagram of the QC Barcelona [25]

Page 62

4. AMD’s servers (8)

Figure: Die shot and floor plan of Barcelona [27]

Page 63

4. AMD’s servers (9)

Figure: Cache architectures of AMD’s QC Barcelona and Shanghai processors [25], [26]

Barcelona (65 nm)

Shanghai (45 nm)

Page 64

4. AMD’s servers (10)

Figure: Die shot of Shanghai [29]

Pin to pin compatible

with Barcelona

6 MB shared L3

Page 65

4. AMD’s servers (11)

Figure: AMD’s roadmap for DP/MP platforms (2000/8000 Series) [28]

(Virtualisation)

Page 66

References

[1]: Radhakrisnan S., Sundaram C. and Cheng K., “The Blackford Northbridge Chipset for the Intel 5000,” IEEE Micro, March/April 2007, pp. 22-33

[2]: Next-Generation AMD Opteron Processor with Direct Connect Architecture – 4P Server Comparison, http://www.amd.com/us-en/assets/content_type/DownloadableAssets/4P_Server_Comparison _PID_41461.pdf

[3]: Intel® 5000P/5000V/5000Z Chipset Memory Controller Hub (MCH) – Datasheet, Sept. 2006, http://www.intel.com/design/chipsets/datashts/313071.htm

[4]: Intel® E8501 Chipset North Bridge (NB) Datasheet, May 2006, http://www.intel.com/design/chipsets/e8501/datashts/309620.htm

[5]: Conway P. and Hughes B., “The AMD Opteron Northbridge Architecture,” IEEE Micro, March/April 2007, pp. 10-21

[6]: Intel® 7300 Chipset Memory Controller Hub (MCH) – Datasheet, Sept. 2007, http://www.intel.com/design/chipsets/datashts/313082.htm

[7]: Supermicro Motherboards, http://www.supermicro.com/products/motherboard/

[8]: Sander B., “AMD Microprocessor Technologies,” 2006, http://www.ewh.ieee.org/r4/chicago/foxvalley/IEEE_AMD_Meeting.ppt

[9]: AMD Quad FX Platform with Dual Socket Direct Connect (DSDC) Architecture, http://www.asisupport.com/ts_amd_quad_fx.htm

[10]: Asustek motherboards, http://www.asus.com.tw/products.aspx?l1=9&l2=39 and http://support.asus.com/download/model_list.aspx?product=5&SLanguage=en-us

Literature (1)

Page 67

[11]: Kanter D., “A Preview of Intel's Bensley Platform (Part I),” Real World Technologies, Aug. 2005, http://www.realworldtech.com/page.cfm?ArticleID=RWT110805135916&p=2

[12]: Kanter D., “A Preview of Intel's Bensley Platform (Part II),” Real World Technologies, Nov. 2005, http://www.realworldtech.com/page.cfm?ArticleID=RWT112905011743&p=7

[13]: Quad-Core Intel® Xeon® Processor 7300 Series Product Brief, Intel, Nov. 2007, http://download.intel.com/products/processor/xeon/7300_prodbrief.pdf

[14]: “AMD Shows Off More Quad-Core Server Processors Benchmark,” X-bit labs, Nov. 2007, http://www.xbitlabs.com/news/cpu/display/20070702235635.html

[15]: AMD, Nov. 2006, http://www.asisupport.com/ts_amd_quad_fx.htm

Literature (2)

[16]: Rusu S., “A Dual-Core Multi-Threaded Xeon Processor with 16 MB L3 Cache,” Intel, 2006, http://ewh.ieee.org/r5/denver/sscs/Presentations/2006_04_Rusu.pdf

[17]: Goto H., Intel Processors, PC Watch, March 04 2005, http://pc.watch.impress.co.jp/docs/2005/0304/kaigai162.htm

[18]: Gilbert J. D., Hunt S., Gunadi D., Srinivas G., “The Tulsa Processor,” Hot Chips 18, 2006, http://www.hotchips.org/archives/hc18/3_Tues/HC18.S9/HC18.S9T1.pdf

[19]: Goto H., IDF 2007 Spring, PC Watch, April 26 2007, http://pc.watch.impress.co.jp/docs/2007/0426/hot481.htm

Page 68

Literature (3)

[20]: Hruska J., “Details slip on upcoming Intel Dunnington six-core processor,” Ars Technica, February 26, 2008, http://arstechnica.com/news.ars/post/20080226-details-slip-on- upcoming-intel-dunnington-six-core-processor.html

[21]: Goto H., 32 nm Westmere arrives in 2009-2010, PC Watch, March 26 2008, http://pc.watch.impress.co.jp/docs/2008/0326/kaigai428.htm

[22]: Singhal R., “Next Generation Intel Microarchitecture (Nehalem) Family: Architecture Insight and Power Management,” IDF Taipei, Oct. 2008, http://intel.wingateweb.com/taiwan08/published/sessions/TPTS001/FA08%20IDF -Taipei_TPTS001_100.pdf

[23]: Smith S. L., “45 nm Product Press Briefing,” IDF Fall 2007, ftp://download.intel.com/pressroom/kits/events/idffall_2007/BriefingSmith45nm.pdf

[24]: Bryant D., “Intel Hitting on All Cylinders,” UBS Conf., Nov. 2007, http://files.shareholder.com/downloads/INTC/0x0x191011/e2b3bcc5-0a37-4d06- aa5a-0c46e8a1a76d/UBSConfNov2007Bryant.pdf

[25]: Barcelona's Innovative Architecture Is Driven by a New Shared Cache, http://developer.amd.com/documentation/articles/pages/8142007173.aspx

[26]: Larger L3 cache in Shanghai, Nov. 13 2008, AMD, http://forums.amd.com/devblog/blogpost.cfm?threadid=103010&catid=271

[27]: Shimpi A. L., “Barcelona Architecture: AMD on the Counterattack,” March 1 2007, Anandtech, http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2939&p=1

Page 69

Literature (4)

[28]: Rivas M., “Roadmap update,” 2007 Financial Analyst Day, Dec. 2007, AMD, http://download.amd.com/Corporate/MarioRivasDec2007AMDAnalystDay.pdf

[29]: Scansen D., “Under the Hood: AMD’s Shanghai marks move to 45 nm node,” EE Times, Nov. 11 2008, http://www.eetimes.com/news/latest/showArticle.jhtml?articleID=212002243

[30]: 2-way Intel Dempsey/Woodcrest CPU Bensley Server Platform, Tyan, http://www.tyan.com/tempest/training/s5370.pdf

[31]: Gelsinger P. P., “Intel Architecture Press Briefing,” 17 March 2008, http://download.intel.com/pressroom/archive/reference/Gelsinger_briefing_0308.pdf

[32]: Mueller S., Soper M. E., Sosinsky B., Server Chipsets, Jun 12, 2006, http://www.informit.com/articles/article.aspx?p=481869

[33]: Goto H., IDF, Aug. 26 2005, http://pc.watch.impress.co.jp/docs/2005/0826/kaigai207.htm

[34]: TechChannel, http://www.tecchannel.de/_misc/img/detail1000.cfm?pk=342850& fk=432919&id=il-74145482909021379

[35]: Intel quadcore Xeon 5300 review, Nov. 13 2006, Hardware.Info, http://www.hardware.info/en-US/articles/amdnY2ppZGWa/Intel_quadcore_Xeon_ 5300_review

Page 70

[36]: Wasson S., Intel's Woodcrest processor previewed, The Bensley server platform debuts, May 23, 2006, The Tech Report, http://techreport.com/articles.x/10021/1