Platforms II.
Dezső Sima, 2011 December (Ver. 1.6)

Transcript

Page 1: Dezső Sima 2011 December

Dezső Sima

2011 December

(Ver. 1.6) Sima Dezső, 2011

Platforms II.

Page 2: Dezső Sima 2011 December

3. Platform architectures

Page 3: Dezső Sima 2011 December

Contents

3.1. Design space of the basic platform architecture•

3.3. DT platforms

Platform architectures•

3.3.1. Design space of the basic architecture of DT platforms

3.3.2. Evolution of Intel’s home user oriented multicore DT platforms

3.3.3. Evolution of Intel’s business user oriented multicore DT platforms

3.4. DP server platforms•

3.4.1. Design space of the basic architecture of DP server platforms

3.4.2. Evolution of Intel’s low cost oriented multicore DP server platforms

3.4.3. Evolution of Intel’s performance oriented multicore DP server platforms

3.2. The driving force for the evolution of platform architectures•

Page 4: Dezső Sima 2011 December

Contents

3.5. MP server platforms

3.5.1. Design space of the basic architecture of MP server platforms

3.5.2. Evolution of Intel’s multicore MP server platforms

3.5.3. Evolution of AMD’s multicore MP server platforms

Page 5: Dezső Sima 2011 December

3.1. Design space of the basic platform architecture

Page 6: Dezső Sima 2011 December

3.1 Design space of the basic platform architecture (1)

Platform architecture covers three aspects:

• Architecture of the processor subsystem
  - Interpreted only for DP/MP systems
  - In SMPs: specifies the interconnection of the processors and the chipset
  - In NUMAs: specifies the interconnections between the processors

• Architecture of the memory subsystem
  - Specifies the point and the layout of the interconnection

• Architecture of the I/O subsystem
  - Specifies the structure of the I/O subsystem (will not be discussed)

Figure: Example: Core 2/Penryn based MP SMP platform. The processors are connected to the MCH by individual FSBs, memory is attached to the MCH via serial FB-DIMM channels, and the chipset consists of two parts, designated as the MCH and the ICH.

Page 7: Dezső Sima 2011 December

3.1 Design space of the basic platform architecture (2)

The notion of basic platform architecture

Figure: Platform architecture comprises the architecture of the processor subsystem, the architecture of the memory subsystem and the architecture of the I/O subsystem; the basic platform architecture covers the processor and memory subsystems only.

Page 8: Dezső Sima 2011 December

3.1 Design space of the basic platform architecture (2)

The notion of basic platform architecture

Figure: (Same figure as on Page 7.) Platform architecture comprises the architectures of the processor, memory and I/O subsystems; the basic platform architecture covers the processor and memory subsystems only.

Page 9: Dezső Sima 2011 December

3.1 Design space of the basic platform architecture (3)

Architecture of the processor subsystem

Interpreted only for DP and MP systems. The interpretation depends on whether the multiprocessor system is an SMP or a NUMA:

• SMP systems: scheme of attaching the processors to the rest of the platform
• NUMA systems: scheme of interconnecting the processors

Figure: Examples. SMP: processors attached over an FSB to the MCH (with memory) and the ICH. NUMA: processors, each with its own memory, interconnected directly with one another.

Page 10: Dezső Sima 2011 December

3.1 Design space of the basic platform architecture (4)

a) Scheme of attaching the processors to the rest of the platform (in case of SMP systems)

• DP platforms: single FSB or dual FSBs
• MP platforms: single FSB, dual FSBs or quad FSBs

Figure: The attachment schemes, shown as two (DP) or four (MP) processors attached over the corresponding number of FSBs to the MCH holding the memory.

Page 11: Dezső Sima 2011 December

3.1 Design space of the basic platform architecture (5)

b) Scheme of interconnecting the processors (in case of NUMA systems)

• Partially connected mesh
• Fully connected mesh

Figure: Four processors (P), each with its own memory, interconnected either by a partially connected mesh (e.g. without the diagonal links) or by a fully connected mesh (every processor linked to every other one).

Page 12: Dezső Sima 2011 December

3.1 Design space of the basic platform architecture (6)

The notion of basic platform architecture

Figure: (Same figure as on Page 7.) Platform architecture comprises the architectures of the processor, memory and I/O subsystems; the basic platform architecture covers the processor and memory subsystems only.

Page 13: Dezső Sima 2011 December

3.1 Design space of the basic platform architecture (7)

Architecture of the memory subsystem (MSS)

Its design space has two dimensions:

• Point of attaching the MSS
• Layout of the interconnection

Page 14: Dezső Sima 2011 December

3.1 Design space of the basic platform architecture (8)

a) Point of attaching the MSS (Memory Subsystem) (1)

Figure: The two basic options: in one platform memory is attached to the MCH, in the other memory is attached to the processor.

Page 15: Dezső Sima 2011 December

3.1 Design space of the basic platform architecture (9)

Point of attaching the MSS – Assessing the basic design options (2)

Attaching memory to the MCH (Memory Control Hub):

• Longer access time (~ 20 – 70 %)
• As the memory controller is on the MCH die, the memory type (e.g. DDR2 or DDR3) and speed grade are not bound to the processor chip design.

Attaching memory to the processor(s):

• Shorter access time (~ 20 – 70 %)
• As the memory controller is on the processor die, the memory type (e.g. DDR2 or DDR3) and speed grade are bound to the processor chip design.

Page 16: Dezső Sima 2011 December

3.1 Design space of the basic platform architecture (10)

Related terminology

Attaching memory to the MCH:
• DT platforms: DT systems with off-die memory controllers
• DP/MP platforms: shared memory DP/MP systems, called SMP systems (Symmetrical Multiprocessors)

Attaching memory to the processor(s):
• DT platforms: DT systems with on-die memory controllers
• DP/MP platforms: distributed memory DP/MP systems, called NUMA systems (systems with non-uniform memory access)

Page 17: Dezső Sima 2011 December

3.1 Design space of the basic platform architecture (11)

Example 1: Point of attaching the MSS in DT systems

• DT system with an off-die memory controller (Intel’s processors before Nehalem): the processor connects over the FSB to the MCH, memory is attached to the MCH, and the ICH completes the chipset.

• DT system with an on-die memory controller (Intel’s Nehalem and subsequent processors): memory is attached directly to the processor.

Page 18: Dezső Sima 2011 December

3.1 Design space of the basic platform architecture (12)

Example 2: Point of attaching the MSS in DP servers

• Attaching memory to the MCH (Intel’s processors before Nehalem):
  - Shared memory DP server, aka Symmetrical Multiprocessor (SMP)
  - Memory does not scale with the number of processors

• Attaching memory to the processor(s) (Intel’s Nehalem and subsequent processors):
  - Distributed memory DP server, aka system with non-uniform memory access (NUMA)
  - Memory scales with the number of processors

Page 19: Dezső Sima 2011 December

3.1 Design space of the basic platform architecture (13)

Figure: Point of attaching the MSS – Examples

Attaching memory to the MCH:
• UltraSPARC II (1C) (~1997)
• AMD’s K7 lines (1C) (1999-2003)
• POWER4 (2C) (2001)
• PA-8800 (2004), PA-8900 (2005) and all previous PA lines
• Montecito (2C) (2006)
• Core 2 Duo line (2C) (2006) and all preceding Intel lines
• Core 2 Quad line (2x2C) (2006/2007), Penryn line (2x2C) (2008)

Attaching memory to the processor(s):
• UltraSPARC III (2001) and all subsequent Sun lines
• Opteron server lines (2C) (2003) and all subsequent AMD lines
• POWER5 (2C) (2005) and subsequent POWER families
• Nehalem lines (4C) (2008) and all subsequent Intel lines
• Tukwila (4C) (2010)

Page 20: Dezső Sima 2011 December

3.1 Design space of the basic platform architecture (14)

b) Layout of the interconnection

Figure: Attaching memory via parallel channels or serial links

• Attaching memory via parallel channels: data are transferred over parallel buses, e.g. 64 data bits plus address, command and control as well as clock signals in each cycle.

• Attaching memory via serial links: data are transferred over point-to-point links in the form of packets, e.g. 16 cycles/packet on a 1-bit wide link or 4 cycles/packet on a 4-bit wide link.
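To make the packet timing above concrete, here is a minimal sketch of the cycle count arithmetic (the 16-bit packet size is just the figure’s illustrative value, not a real protocol parameter):

```python
import math

def cycles_per_packet(packet_bits: int, link_width_bits: int) -> int:
    # A packet of a given size is serialized over a link of a given width
    return math.ceil(packet_bits / link_width_bits)

print(cycles_per_packet(16, 1))  # 16 cycles/packet on a 1-bit wide link
print(cycles_per_packet(16, 4))  # 4 cycles/packet on a 4-bit wide link
```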

Page 21: Dezső Sima 2011 December

3.1 Design space of the basic platform architecture (15)

b1) Attaching memory via parallel channels

The memory controller and the DIMMs are connected

• by a single parallel memory channel
• or by a few memory channels

to synchronous DIMMs, such as SDRAM, DDR, DDR2 or DDR3 DIMMs.

Example 1: Attaching DIMMs via a single parallel memory channel to the memory controller that is implemented on the chipset [45]

Page 22: Dezső Sima 2011 December

3.1 Design space of the basic platform architecture (16)

Example 2: Attaching DIMMs via 3 parallel memory channels to memory controllers implemented on the processor die

(This is actually Intel’s Tylersburg DP platform, aimed at the Nehalem-EP processor and used for up to 6 cores) [46]

Page 23: Dezső Sima 2011 December

3.1 Design space of the basic platform architecture (17)

The number of lines of the parallel channels

The number of lines needed depends on the kind of the memory modules, as indicated below:

• SDRAM: 168-pin
• DDR: 184-pin
• DDR2: 240-pin
• DDR3: 240-pin

All these DIMM modules provide an 8-byte wide datapath and optionally ECC and registering.

Page 24: Dezső Sima 2011 December

3.1 Design space of the basic platform architecture (18)

b2) Attaching memory via serial links

Serial memory links are point-to-point interconnects that use differential signaling. There are two options:

• Serial links attach FB-DIMMs: the processor or the MCH drives serial links connecting a chain of FB-DIMMs; the FB-DIMMs themselves provide buffering and serial/parallel (S/P) conversion.

• Serial links attach S/P converters with parallel channels: the processor or the MCH drives serial links that end in S/P converters, which in turn drive conventional parallel memory channels with their DIMMs.

Page 25: Dezső Sima 2011 December

3.1 Design space of the basic platform architecture (19)

Example 1: FB-DIMM links in Intel’s Bensley DP platform aimed at Core 2 processors-1

Figure: Two 65 nm Pentium 4 Prescott DP (2x1C)/Core 2 (2C/2x2C) processors, i.e. Xeon 5000 (Dempsey, 2x1C), Xeon 5100 (Woodcrest, 2C), Xeon 5300 (Clovertown, 2x2C), Xeon 5200 (Harpertown, 2C) or Xeon 5400 (Harpertown, 2x2C), each on its own FSB to the E5000 MCH; the MCH drives FB-DIMM channels w/ DDR2-533 and connects over ESI to the 631xESB/632xESB IOH.

ESI (Enterprise System Interface): 4 PCIe lanes, 0.25 GB/s per lane (like the DMI interface), providing 1 GB/s transfer rate in each direction.
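As a sanity check on the quoted ESI figures, the per-direction rate is just lanes times per-lane rate (a sketch of the arithmetic, not an Intel API):

```python
# ESI: 4 PCIe lanes, 0.25 GB/s per lane and direction
lanes = 4
gb_per_s_per_lane = 0.25
print(lanes * gb_per_s_per_lane)  # 1.0 GB/s in each direction, as stated above
```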

Page 26: Dezső Sima 2011 December

3.1 Design space of the basic platform architecture (20)

Example 2: SMI links in Intel’s Boxboro-EX platform aimed at the Nehalem-EX processors-1

Figure: Nehalem-EX aimed Boxboro-EX scalable DP server platform (for up to 10 cores). Two Nehalem-EX (8C)/Westmere-EX (10C) processors, i.e. Xeon 6500 (Nehalem-EX, Becton) or Xeon E7-2800 (Westmere-EX), are interconnected by QPI and attached by QPI to the 7500 IOH, which connects over ESI to the ICH10 (with ME). Each processor drives four SMBs over SMI links, and the SMBs drive DDR3-1067 memory.

SMI: serial link between the processor and the SMB
SMB: Scalable Memory Buffer with parallel/serial conversion

Page 27: Dezső Sima 2011 December

3.1 Design space of the basic platform architecture (21)

Example 2: The SMI link of Intel’s Boxboro-EX platform aimed at the Nehalem-EX processors-2 [26]

• The SMI interface builds on the Fully Buffered DIMM architecture with a few protocol changes, such as those intended to support DDR3 memory devices.
• It has the same layout as FB-DIMM links (14 outbound and 10 inbound differential lanes as well as a few clock and control lanes).
• It needs altogether about 50 PCB traces.
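The trace count follows from the differential signaling: every lane needs a wire pair. A minimal sketch of that arithmetic (the number of clock and control lanes is my assumption; the slide only says "a few"):

```python
def smi_trace_count(outbound_lanes=14, inbound_lanes=10, clock_control_lanes=2):
    # Differential signaling: every lane needs two PCB traces
    return (outbound_lanes + inbound_lanes + clock_control_lanes) * 2

print(smi_trace_count())  # 52, i.e. roughly the ~50 PCB traces quoted above
```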

Page 28: Dezső Sima 2011 December

3.1 Design space of the basic platform architecture (22)

Design space of the architecture of the MSS

Figure: The design space is spanned by two dimensions. Point of attaching memory: to the MCH or to the processor(s). Layout of the interconnection: parallel channels attach DIMMs, serial links attach FB-DIMMs, or serial links attach S/P converters with parallel channels. Each of the six fields is illustrated by the corresponding platform sketch.

Page 29: Dezső Sima 2011 December

3.1 Design space of the basic platform architecture (23)

Moving from field to field of the design space of the architecture of the MSS, from left to right and from top to bottom, allows an increasing number of memory channels (nM) to be implemented, as discussed in Section 4.2.5 and indicated in the next figure.

Max. number of memory channels that can be implemented while using particular design options of the MSS

Page 30: Dezső Sima 2011 December

3.1 Design space of the basic platform architecture (24)

Figure: Max. number of memory channels that can be implemented with the particular design options of the MSS (the same design space as in the previous figure, annotated with the achievable number of memory channels).

Page 31: Dezső Sima 2011 December

3.1 Design space of the basic platform architecture (25)

The design space of the basic platform architecture-1

Figure: (Same figure as on Page 7.) Platform architecture comprises the architectures of the processor, memory and I/O subsystems; the basic platform architecture covers the processor and memory subsystems only.

Page 32: Dezső Sima 2011 December

3.1 Design space of the basic platform architecture (26)

The design space of the basic platform architectures-2

Obtained as the combinations of the options available for the main aspects discussed:

Basic platform architecture

• Architecture of the processor subsystem
  - Scheme of attaching the processors (in case of SMP systems)
  - Scheme of interconnecting the processors (in case of NUMA systems)

• Architecture of the memory subsystem (MSS)
  - Point of attaching the MSS
  - Layout of the interconnection

Page 33: Dezső Sima 2011 December

3.1 Design space of the basic platform architecture (27)

Design space of the basic architecture of particular platforms

• Design space of the basic architecture of DT platforms (Section 3.3.1)
• Design space of the basic architecture of DP server platforms (Section 3.4.1)
• Design space of the basic architecture of MP server platforms (Section 3.5.1)

The design spaces of the basic architectures of DT, DP and MP platforms will be discussed next, in Sections 3.3.1, 3.4.1 and 3.5.1, respectively.

Page 34: Dezső Sima 2011 December

3.2. The driving force for the evolution of platform architectures

Page 35: Dezső Sima 2011 December

3.2 The driving force for the evolution of platform architectures (1)

The peak per-processor bandwidth demand of a platform

Let’s consider a single processor of a platform and the bandwidth available to it (BW). The available (peak) memory bandwidth of a processor (BW) is the product of

• the number of memory channels available per processor (nM),
• their width (w), as well as
• the transfer rate of the memory used (fM):

BW = nM x w x fM

• BW needs to be scaled with the peak performance of the processor.
• The peak performance of the processor increases linearly with the core count (nC).

Consequently, the per-processor memory bandwidth (BW) needs to be scaled with the core count (nC).
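As a worked example of the formula, a minimal sketch in Python (the three-channel DDR3-1333 configuration is an illustrative assumption, not taken from the slide):

```python
def peak_bandwidth_gbs(n_m: int, w_bytes: int, f_m_mtps: int) -> float:
    # BW = nM x w x fM; fM in mega transfers/s, result in GB/s
    return n_m * w_bytes * f_m_mtps / 1000.0

# E.g. 3 memory channels, 8 bytes wide, DDR3-1333 (1333 MT/s):
print(peak_bandwidth_gbs(3, 8, 1333))  # ~32 GB/s peak
```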

Page 36: Dezső Sima 2011 December

3.2 The driving force for the evolution of platform architectures (2)

If we assume a constant width for the memory channel (w = 8 bytes), it can be stated that

nM x fM

needs to be scaled with the number of cores, that is, it needs to be doubled approximately every two years.

This statement summarizes the driving force

• for raising the bandwidth of the memory subsystem, and
• at the same time it is also the major motivation for the evolution of platform architectures.

Page 37: Dezső Sima 2011 December

3.2 The driving force for the evolution of platform architectures (3)

The bandwidth wall

• As recently core counts (nC) double roughly every two years, the per-processor bandwidth demand of platforms (BW) also doubles roughly every two years, as discussed before.
• On the other hand, memory speed (fM) doubles only approximately every four years, as indicated e.g. in the next figure for Samsung’s memory technology.

Page 38: Dezső Sima 2011 December

3.2 The driving force for the evolution of platform architectures (4)

Evolution of the memory technology of Samsung [12]

The time span between e.g. DDR-400 and DDR3-1600 is approximately 7 years; this means roughly a doubling of memory speed (fM) every 4 years.
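The quoted doubling period can be checked quickly (the 7-year span is the slide’s own estimate):

```python
import math

speedup = 1600 / 400            # DDR-400 -> DDR3-1600
doublings = math.log2(speedup)  # 2 doublings
print(7 / doublings)            # ~3.5, i.e. fM doubles roughly every 4 years
```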

Page 39: Dezső Sima 2011 December

3.2 The driving force for the evolution of platform architectures (5)

• This causes a widening gap between the bandwidth demand and the bandwidth growth achievable merely by raising memory speed.
• This gap can be designated as the bandwidth wall.

It is the task of the developers of platform architectures to overcome the bandwidth wall by providing the needed number of memory channels.

Page 40: Dezső Sima 2011 December

3.2 The driving force for the evolution of platform architectures (6)

The square root rule of scaling the number of memory channels

It can be shown that when the core count (nC) increases according to Moore’s law and memory subsystems evolve by using the fastest available memory devices, as is typical, then the number of memory channels available per processor needs to be scaled as

nM(nC) = √2 x √nC

to provide a linear scaling of the bandwidth with the core count (nC). The above relationship can be termed the square root rule of scaling the number of memory channels. (Intuitively: as fM doubles every four years while nC doubles every two, fM grows as √nC, so nM must contribute the remaining factor of √nC to keep BW proportional to nC.)

The scaled number of memory channels available per processor (nM(nC)) and the increased device speed (fM) together then provide the needed linear scaling of the per-processor bandwidth (BW) with nC.

Remark

For multiprocessors incorporating nP processors, the total number of memory channels of the platform (NM) amounts to

NM = nP x nM
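A small numeric sketch of the square root rule. Reading the √2 factor as the normalization nM(2) = 2 is my assumption; the point is that BW/nC stays constant, i.e. bandwidth scales linearly with the core count:

```python
import math

def n_m(n_c: int) -> float:
    # Square root rule: nM(nC) = sqrt(2) * sqrt(nC) = sqrt(2 * nC)
    return math.sqrt(2 * n_c)

for n_c in (2, 4, 8, 16):
    f_m = math.sqrt(n_c)      # fM doubles every 4 years while nC doubles every 2
    bw = n_m(n_c) * 8 * f_m   # BW = nM x w x fM, with w = 8 bytes (arbitrary units)
    print(n_c, round(n_m(n_c), 2), round(bw / n_c, 2))  # BW/nC stays constant
```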

Page 41: Dezső Sima 2011 December

3.3. DT platforms

3.3.1. Design space of the basic architecture of DT platforms

3.3.2. Evolution of Intel’s home user oriented multicore DT platforms

3.3.3. Evolution of Intel’s business user oriented multicore DT platforms

Page 42: Dezső Sima 2011 December

3.3 DT platforms

3.3.1 Design space of the basic architecture of DT platforms

Page 43: Dezső Sima 2011 December

3.3.1 Design space of the basic architecture of DT platforms (1)

Figure: Design space of the basic architecture of DT platforms. Its dimensions are the point of attaching the MSS (attaching memory to the MCH vs. attaching memory to the processor) and the layout of the interconnection (parallel channels attach DIMMs, serial links attach FB-DIMMs, serial links attach S/P converters with parallel channels), with the number of memory channels growing along the design space. Intel’s DT platforms populate two fields: Pentium D/EE up to Penryn (up to 4C) attach memory via parallel channels to the MCH, while 1. G. Nehalem up to Sandy Bridge (up to 6C) attach memory via parallel channels to the processor.

Page 44: Dezső Sima 2011 December

3.3.1 Design space of the basic architecture of DT platforms (2)

Evolution of Intel’s DT platforms (Overview)

Figure: Intel’s DT platforms placed into the same design space (point of attaching the MSS vs. layout of the interconnection, with the number of memory channels).

• Attaching memory to the MCH, parallel channels attach DIMMs: Pentium D/EE 2x1C (2005/6), Core 2 2C (2006), Core 2 Quad 2x2C (2007), Penryn 2C/2x2C (2008).
• Attaching memory to the processor, parallel channels attach DIMMs: 1. G. Nehalem 4C (2008), 2. G. Nehalem 4C (2009), Westmere-EP 6C (2010), Westmere-EP 2C+G (2010), Sandy Bridge 2C/4C+G (2011), Sandy Bridge-E 6C (2011).

In DT platforms there is no need for higher memory bandwidth through serial memory interconnection, so the serial design options remained unused.

Page 45: Dezső Sima 2011 December

3.3.2 Evolution of Intel’s home user oriented multicore DT platforms (1)

Figure: Three platform generations.

• Anchor Creek (2005): Pentium D/Pentium EE (2x1C) on an FSB to the 945/955X/975X MCH (up to DDR2-667, 2/4 DDR2 DIMMs with up to 4 ranks), ICH7 attached via DMI.
• Bridge Creek (2006, Core 2 aimed), Salt Creek (2007, Core 2 Quad aimed), Boulder Creek (2008, Penryn aimed): Core 2 (2C)/Core 2 Quad (2x2C)/Penryn (2C/2x2C) on an FSB to the 965/3-/4- Series MCH (up to DDR2-800/DDR3-1067), ICH8/9/10 attached via DMI.
• Tylersburg (2008): 1. gen. Nehalem (4C)/Westmere-EP (6C) with on-die memory controllers (up to DDR3-1067), QPI to the X58 IOH, ICH10 attached via DMI.

Page 46: Dezső Sima 2011 December

3.3.2 Evolution of Intel’s home user oriented multicore DT platforms (2)

Figure: From Tylersburg to Sugar Bay.

• Tylersburg (2008): 1. gen. Nehalem (4C)/Westmere-EP (6C), up to DDR3-1067, QPI to the X58 IOH, ICH10 attached via DMI.
• Kings Creek (2009): 2. gen. Nehalem (4C)/Westmere-EP (2C+G), up to DDR3-1333, 5- Series PCH attached via FDI and DMI.
• Sugar Bay (2011): Sandy Bridge (4C+G), up to DDR3-1333, 6- Series PCH attached via FDI and DMI2.

Page 47: Dezső Sima 2011 December

3.3.2 Evolution of Intel’s home user oriented multicore DT platforms (3)

Figure: From Tylersburg to Waimea Bay.

• Tylersburg (2008): 1. gen. Nehalem (4C)/Westmere-EP (6C), up to DDR3-1067, QPI to the X58 IOH, ICH10 attached via DMI.
• Waimea Bay (2011): Sandy Bridge-E (4C/6C), up to DDR3-1600 (DDR3-1600: up to 1 DIMM per channel; DDR3-1333: up to 2 DIMMs per channel), X79 PCH attached via DMI2.

Page 48: Dezső Sima 2011 December

3.3.3 Evolution of Intel’s business user oriented multicore DT platforms (1)

Figure: Three platform generations, each with a Gigabit Ethernet LAN connection.

• Lyndon (2005): Pentium D/Pentium EE (2x1C) on an FSB to the 945/955X/975X MCH (up to DDR2-667, 2/4 DDR2 DIMMs with up to 4 ranks), ICH7 attached via DMI, 82573E GbE (Tekoe) attached via LCI.
• Averill Creek (2006, Core 2 aimed), Weybridge (2007, Core 2 Quad aimed), McCreary (2008, Penryn aimed): Core 2 (2C)/Core 2 Quad (2x2C)/Penryn (2C/2x2C) on an FSB to the Q965/Q35/Q45 MCH (up to DDR2-800/DDR3-1067) with ME (Management Engine), ICH8/9/10 attached via DMI and C-link, 82566/82567 LAN PHY attached via LCI/GLCI.
• Piketon (2009): 2. gen. Nehalem (4C)/Westmere-EP (2C+G), up to DDR3-1333, Q57 PCH (with ME) attached via FDI and DMI, 82578 GbE LAN PHY attached via PCIe 2.0/SMbus 2.0.

Page 49: Dezső Sima 2011 December

3.3.3 Evolution of Intel’s business user oriented multicore DT platforms (2)

Figure: From Piketon to Sugar Bay.

• Piketon (2009): 2. gen. Nehalem (4C)/Westmere-EP (2C+G), up to DDR3-1333, Q57 PCH (with ME) attached via FDI and DMI, 82578 GbE LAN PHY attached via PCIe 2.0/SMbus 2.0.
• Sugar Bay (2011): Sandy Bridge (4C+G), up to DDR3-1333, Q67 PCH (with ME) attached via FDI and DMI2, GbE LAN attached via PCIe 2.0/SMbus 2.0.

Page 50: Dezső Sima 2011 December

3.4. DP server platforms

3.4.1. Design space of the basic architecture of DP server platforms

3.4.2. Evolution of Intel’s low cost oriented multicore DP server platforms

3.4.3. Evolution of Intel’s performance oriented multicore DP server platforms

Page 51: Dezső Sima 2011 December

3.4 DP server platforms

3.4.1 Design space of the basic architecture of DP server platforms

Page 52: Dezső Sima 2011 December

3.4.1 Design space of the basic architecture of DP server platforms (1)

Figure: Design space of the basic architecture of DP platforms, spanned by the scheme of attaching or interconnecting the processors and the layout of the interconnection, with the number of memory channels (nM) growing along the design space.

• SMP, single FSB, parallel channels attach DIMMs: 90 nm Pentium 4 DP 2x1C (2005); Core 2/Penryn 2C/2x2C (2006/7).
• SMP, dual FSBs, serial links attach FB-DIMMs: 65 nm Pentium 4 DP 2x1C; Core 2/Penryn 2C/2x2C (2006/7).
• NUMA, parallel channels attach DIMMs: Nehalem-EP to Sandy Bridge-EP/EN, up to 8C (2009/11).
• NUMA, serial links attach S/P converters with parallel channels: Nehalem-EX/Westmere-EX 8C/10C (2010/11).

Page 53: Dezső Sima 2011 December

3.4.1 Design space of the basic architecture of DP server platforms (2)

Evolution of Intel’s DP platforms (Overview)

Figure: Intel’s DP platforms, arranged by the scheme of attaching and interconnecting the DP processors (SMP with a single FSB or dual FSBs vs. NUMA), the layout of the interconnection and the number of memory channels. Eff.: efficiency (low cost) oriented, HP: high performance oriented.

SMP platforms:
• Single FSB, parallel channels attach DIMMs: 90 nm Pentium 4 DP 2x1C (2006) (Paxville DP); 65 nm Pentium 4 DP 2x1C and Core 2 2C/Core 2 Quad 2x2C/Penryn 2C/2x2C (2006/2007) (Cranberry Lake, Eff.).
• Dual FSBs, serial links attach FB-DIMMs: 65 nm Pentium 4 DP 2x1C and Core 2 2C/Core 2 Quad 2x2C/Penryn 2C/2x2C (2006/2007) (Bensley, HP).

NUMA platforms:
• Parallel channels attach DIMMs: Nehalem-EP 4C (2009)/Westmere-EP 6C (2010) (Tylersburg-EP, HP); Sandy Bridge-EN 8C (2011) (Romley-EN, Eff.); Sandy Bridge-EP 8C (2011) (Romley-EP, HP).
• Serial links attach S/P converters with parallel channels: Nehalem-EX/Westmere-EX 8C/10C (2010/2011) (Boxboro-EX, HP).

Page 54: Dezső Sima 2011 December

3.4.2 Evolution of Intel’s low cost oriented multicore DP server platforms

Page 55: Dezső Sima 2011 December

3.4.2 Evolution of Intel’s low cost oriented multicore DP server platforms (2)

Evolution from the Pentium 4 Prescott DP aimed DP platform (up to 2 cores) to the Penryn aimed Cranberry Lake DP platform (up to 4 cores)

Figure: 90 nm Pentium 4 Prescott DP aimed DP server platform (for up to 2 C): two 90 nm Pentium 4 Prescott DP (2C) processors, i.e. Xeon DP 2.8 (Paxville DP), on FSBs to the E7520 MCH (DDR-266/333, DDR2-400), ICH5R/6300ESB IOH attached via HI 1.5.

Figure: Penryn aimed Cranberry Lake DP server platform (for up to 4 C): two Core 2 (2C)/Core 2 Quad (2x2C)/Penryn (2C/2x2C) processors, i.e. Xeon 5300 (Clovertown, 2x2C), Xeon 5200 (Harpertown, 2C) or Xeon 5400 (Harpertown, 4C), on an FSB to the E5100 MCH (DDR2-533/667), ICH9R attached via ESI.

HI 1.5 (Hub Interface 1.5): 8 bit wide, 66 MHz clock, QDR, 266 MB/s peak transfer rate.

ESI (Enterprise System Interface): 4 PCIe lanes, 0.25 GB/s per lane (like the DMI interface), providing 1 GB/s transfer rate in each direction.
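The HI 1.5 peak rate follows directly from its parameters (a quick arithmetic sketch):

```python
# Hub Interface 1.5: 8 bits (1 byte) wide, 66 MHz clock, quad data rate (QDR)
width_bytes = 1
clock_mhz = 66
transfers_per_clock = 4
print(width_bytes * clock_mhz * transfers_per_clock)  # 264, i.e. ~266 MB/s with the nominal 66.6 MHz clock
```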

Page 56: Dezső Sima 2011 December

3.4.2 Evolution of Intel’s low cost oriented multicore DP server platforms (3)

Evolution from the Penryn aimed Cranberry Lake DP platform (up to 4 cores) to the Sandy Bridge-EN aimed Romley-EN DP platform (up to 8 cores)

Figure: Penryn aimed Cranberry Lake DP platform (for up to 4 C): two Core 2 (2C/2x2C)/Penryn (2C/4C) processors, i.e. Xeon 5300 (Clovertown, 2x2C), Xeon 5200 (Harpertown, 2C) or Xeon 5400 (Harpertown, 4C), on an FSB to the E5100 MCH (DDR2-533/667), ICH9R attached via ESI.

Figure: Sandy Bridge-EN (Socket B2) aimed Romley-EN DP server platform (for up to 8 cores): two Sandy Bridge-EN (8C) processors, i.e. E5-2400 (Sandy Bridge-EN, 8C), in Socket B2, interconnected by QPI, each with DDR3-1600 attached directly, C600 PCH attached via DMI2.

Page 57: Dezső Sima 2011 December

3.4.3 Evolution of Intel’s performance oriented multicore DP server platforms

Page 58: Dezső Sima 2011 December

3.4.3 Evolution of Intel’s performance oriented multicore DP server platforms (2)

Evolution from the Pentium 4 Prescott DP aimed DP platform (up to 2 cores) to the Core 2 aimed Bensley DP platform (up to 4 cores)

Figure: 90 nm Pentium 4 Prescott DP aimed DP server platform (for up to 2 C): two 90 nm Pentium 4 Prescott DP (2x1C) processors, i.e. Xeon DP 2.8 (Paxville DP), on FSBs to the E7520 MCH (DDR-266/333, DDR2-400), ICH5R/6300ESB IOH attached via HI 1.5.

Figure: Core 2 aimed Bensley DP server platform (for up to 4 C): two 65 nm Pentium 4 Prescott DP (2x1C)/Core 2 (2C/2x2C) processors, i.e. Xeon 5000 (Dempsey, 2x1C), Xeon 5100 (Woodcrest, 2C), Xeon 5300 (Clovertown, 2x2C), Xeon 5200 (Harpertown, 2C) or Xeon 5400 (Harpertown, 2x2C), on dual FSBs to the E5000 MCH (FB-DIMM w/ DDR2-533), 631xESB/632xESB IOH attached via ESI.

HI 1.5 (Hub Interface 1.5): 8 bit wide, 66 MHz clock, QDR, 266 MB/s peak transfer rate.

ESI (Enterprise System Interface): 4 PCIe lanes, 0.25 GB/s per lane (like the DMI interface), providing 1 GB/s transfer rate in each direction.

Page 59: Dezső Sima 2011 December

3.4.3 Evolution of Intel’s performance oriented multicore DP server platforms (3)

Evolution from the Core 2 aimed Bensley DP platform (up to 4 cores) to the Nehalem-EP aimed Tylersburg-EP DP platform (up to 6 cores)

Figure: 65 nm Core 2 aimed high performance Bensley DP server platform (for up to 4 C): two 65 nm Pentium 4 Prescott DP (2C)/Core 2 (2C/2x2C) processors on dual FSBs to the 5000 MCH (FB-DIMM w/ DDR2-533), 631xESB/632xESB IOH attached via ESI.

Figure: Nehalem-EP aimed Tylersburg-EP DP server platform with a single IOH (for up to 6 C): two Nehalem-EP (4C)/Westmere-EP (6C) processors, each with DDR3-1333 attached directly, interconnected by QPI; one 55xx IOH(1) (with ME) attached by QPI to both processors, ICH9/ICH10 attached via ESI and C-link.

Figure: Nehalem-EP aimed Tylersburg-EP DP server platform with dual IOHs (for up to 6 C): as above, but with two 55xx IOHs(1) interconnected by QPI.

(1) First chipset with PCIe 2.0. ME: Management Engine.

Page 60: Dezső Sima 2011 December

3.4.3 Evolution of Intel’s performance oriented multicore DP server platforms (4)

Basic system architecture of the Sandy Bridge-EN and -EP aimed Romley-EN and -EP DP server platforms

Figure: Nehalem-EP aimed Tylersburg-EP DP server platform (for up to 6 cores): two Nehalem-EP (4C)/Westmere-EP (6C) processors, i.e. Xeon 55xx (Gainestown) or Xeon 56xx (Gulftown), each with DDR3-1333 attached directly, interconnected by QPI, 34xx PCH (with ME) attached via DMI.

Figure: Sandy Bridge-EP (Socket R) aimed Romley-EP DP server platform (for up to 8 cores) (LGA 2011): two Sandy Bridge-EP (8C) processors, i.e. E5-2600 (Sandy Bridge-EP, 8C), in Socket R, each with DDR3-1600 attached directly, interconnected by QPI 1.1, C600 PCH attached via DMI2.

Page 61: Dezső Sima 2011 December

3.4.3 Evolution of Intel’s performance oriented multicore DP server platforms (5)

Contrasting the Nehalem-EP aimed Tylersburg-EP DP platform (up to 6 cores) to the Nehalem-EX aimed very high performance scalable Boxboro-EX DP platform (up to 10 cores)

Figure: Nehalem-EP aimed Tylersburg-EP DP server platform (for up to 6 cores): two Nehalem-EP (4C)/Westmere-EP (6C) processors, i.e. Xeon 5500 (Gainestown) or Xeon 5600 (Gulftown), each with DDR3-1333 attached directly, interconnected by QPI, 34xx PCH (with ME) attached via ESI.

Figure: Nehalem-EX aimed Boxboro-EX scalable DP server platform (for up to 10 cores): two Nehalem-EX (8C)/Westmere-EX (10C) processors, i.e. Xeon 6500 (Nehalem-EX, Becton) or Xeon E7-2800 (Westmere-EX), interconnected by QPI and attached by QPI to the 7500 IOH, which connects over ESI to the ICH10 (with ME). Each processor drives four SMBs over SMI links, and the SMBs drive DDR3-1067 memory.

SMI: serial link between the processor and the SMB
SMB: Scalable Memory Buffer with parallel/serial conversion

Page 62: Dezső Sima 2011 December

3.5. MP server platforms

3.5.1. Design space of the basic architecture of MP server platforms

3.5.2. Evolution of Intel’s multicore MP server platforms

3.5.3. Evolution of AMD’s multicore MP server platforms

Page 63: Dezső Sima 2011 December

3.5 MP server platforms

3.5.1 Design space of the basic architecture of MP server platforms

Page 64: Dezső Sima 2011 December

3.5.1 Design space of the basic architecture of MP server platforms (1)

Figure: MP SMP platforms, arranged by the scheme of attaching the processors (single FSB, dual FSBs, quad FSBs) and the layout of the interconnection (parallel channels attach DIMMs, serial links attach FB-DIMMs, serial links attach S/P converters with parallel channels).

• Single FSB, parallel channels attach DIMMs: Pentium 4 MP 1C (2004).
• Dual FSBs, serial links attach S/P converters with parallel channels: 90 nm Pentium 4 MP 2x1C.
• Quad FSBs, serial links attach FB-DIMMs: Core 2/Penryn up to 6C.

Page 65: Dezső Sima 2011 December

3.5.1 Design space of the basic architecture of MP server platforms (2)

Figure: MP NUMA platforms, arranged by the scheme of interconnecting the processors (partially connected mesh vs. fully connected mesh) and the layout of the interconnection; the figure is additionally annotated with the inter-processor bandwidth and the memory bandwidth trends.

• Partially connected mesh, parallel channels attach DIMMs: AMD Direct Connect Architecture 1.0 (2003).
• Fully connected mesh, parallel channels attach DIMMs: AMD Direct Connect Architecture 2.0 (2010).
• Fully connected mesh, serial links attach S/P converters with parallel channels: Nehalem-EX/Westmere-EX up to 10C (2010/11).

Page 66: Dezső Sima 2011 December

3.5.1 Design space of the basic architecture of MP server platforms (3)

Evolution of Intel’s MP platforms (Overview)

Figure: MP platforms, arranged by the scheme of attaching and interconnecting the MP processors, the layout of the interconnection, the number of memory channels and the inter-processor bandwidth.

SMP platforms:
• Single FSB, parallel channels attach DIMMs: Pentium 4 MP 1C (2004) (not named).
• Dual FSBs, serial links attach S/P converters with parallel channels: 90 nm Pentium 4 MP 2x1C (2006) (Truland).
• Quad FSBs, serial links attach FB-DIMMs: Core 2/Penryn up to 6C (2006/2007) (Caneland).

NUMA platforms:
• Partially connected mesh, parallel channels attach DIMMs: AMD DCA 1.0 (2003).
• Fully connected mesh, parallel channels attach DIMMs: AMD DCA 2.0 (2010).
• Fully connected mesh, serial links attach S/P converters with parallel channels: Nehalem-EX/Westmere-EX up to 10C (2010/11) (Boxboro-EX).

Page 67: Dezső Sima 2011 December

3.5.2 Evolution of Intel’s multicore MP server platforms

Page 68: Dezső Sima 2011 December

3.5.2 Evolution of Intel’s multicore MP server platforms (2)

Evolution from the first generation MP servers supporting SC processors to the 90 nm Pentium 4 Prescott MP aimed Truland MP server platform (supporting up to 2 cores)

Figure: Previous Pentium 4 MP aimed MP server platform (for single core processors): four single-core (SC) Xeon MP processors on a shared FSB to the preceding NBs (e.g. DDR-200/266 memory), preceding ICH attached e.g. via HI 1.5 (266 MB/s).

Figure: 90 nm Pentium 4 Prescott MP aimed Truland MP server platform (for up to 2 C): four Pentium 4 Xeon MP (1C/2x1C) processors, i.e. Xeon MP (Potomac, 1C), Xeon 7000 (Paxville MP, 2x1C) or Xeon 7100 (Tulsa, 2C), on dual FSBs to the 8500/8501 MCH, which drives four XMBs (eXternal Memory Bridges) with DDR-266/333 or DDR2-400 channels; ICH5 attached via HI 1.5.

Page 69: Dezső Sima 2011 December

3.5.2 Evolution of Intel’s multicore MP server platforms (3)

Evolution from the 90 nm Pentium 4 Prescott MP aimed Truland MP platform (up to 2 cores) to the Core 2 aimed Caneland MP platform (up to 6 cores)

Figure: 90 nm Pentium 4 Prescott MP aimed Truland MP server platform (for up to 2 C): four Pentium 4 Xeon MP (1C/2x1C) processors, i.e. Xeon MP (Potomac, 1C), Xeon 7000 (Paxville MP, 2x1C) or Xeon 7100 (Tulsa, 2C), on dual FSBs to the 8500(1)/8501 MCH, which drives four XMBs with DDR-266/333 or DDR2-400 channels; ICH5 attached via HI 1.5.

Figure: Core 2 aimed Caneland MP server platform (for up to 6 C): four Core 2 (2C/2x2C)/Penryn (6C) processors, i.e. Xeon 7200 (Tigerton DC, 1x2C), Xeon 7300 (Tigerton QC, 2x2C) or Xeon 7400 (Dunnington, 6C), on quad FSBs to the 7300 MCH with 4 FB-DIMM channels (DDR2-533/667), up to 8 DIMMs per channel; 631xESB/632xESB IOH attached via ESI.

(1) The E8500 MCH supports an FSB of 667 MT/s and consequently only the SC Xeon MP (Potomac).

HI 1.5 (Hub Interface 1.5): 8 bit wide, 66 MHz clock, QDR, 266 MB/s peak transfer rate.

ESI (Enterprise System Interface): 4 PCIe lanes, 0.25 GB/s per lane (like the DMI interface), providing 1 GB/s transfer rate in each direction.

Page 70: Dezső Sima 2011 December

3.5.2 Evolution of Intel’s multicore MP server platforms (4)

Evolution to the Nehalem-EX aimed Boxboro-EX MP platform (that supports up to 10 cores) (In the basic system architecture we show the single IOH alternative)

Figure: Nehalem-EX aimed Boxboro-EX MP server platform (for up to 10 C): four Nehalem-EX (8C)/Westmere-EX (10C) processors, i.e. Xeon 7500 (Nehalem-EX, Becton, 8C) or Xeon E7-4800 (Westmere-EX, 10C), interconnected by QPI in a fully connected mesh; the 7500 IOH (with ME) is attached by QPI and connects over ESI to the ICH10. Each processor drives 2x4 SMI channels to four SMBs, and the SMBs drive DDR3-1067 memory.

SMI: serial link between the processor and the SMBs
SMB: Scalable Memory Buffer (parallel/serial converter)
ME: Management Engine

Page 71: Dezső Sima 2011 December

3.5.3 Evolution of AMD’s multicore MP server platforms [47] (1)

Introduced in the single core K8-based Opteron DP/MP servers (AMD 24x/84x) (6/2003).
Memory: 2 channels DDR-200/333 per processor, 4 DIMMs per channel.

Page 72: Dezső Sima 2011 December

3.5.3 Evolution of AMD’s multicore MP server platforms [47] (2)

Introduced in the 2x6 core K10-based Magny-Cours (AMD 6100) (3/2010).
Memory: 2x2 channels DDR3-1333 per processor, 3 DIMMs per channel.
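Plugging the two AMD configurations into the BW = nM x w x fM formula of Section 3.2 shows the growth of the per-processor peak bandwidth (a rough sketch; the 8-byte channel width is assumed for both):

```python
def peak_bandwidth_gbs(n_m: int, w_bytes: int, f_m_mtps: int) -> float:
    # BW = nM x w x fM, in GB/s
    return n_m * w_bytes * f_m_mtps / 1000.0

print(peak_bandwidth_gbs(2, 8, 333))   # K8 Opteron (2003): ~5.3 GB/s
print(peak_bandwidth_gbs(4, 8, 1333))  # K10 Magny-Cours (2010): ~42.7 GB/s
```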

Page 73: Dezső Sima 2011 December

5. References

Page 74: Dezső Sima 2011 December

5. References (1)

[1]: Wikipedia: Centrino, http://en.wikipedia.org/wiki/Centrino

[2]: Industry Uniting Around Intel Server Architecture; Platform Initiatives Complement Strong Intel IA-32 and IA-64 Targeted Processor Roadmap for 1999, Business Wire, Febr. 24 1999, http://www.thefreelibrary.com/Industry+Uniting+Around+Intel+Server+Architecture%3B+Platform...-a053949226

[3]: Intel Core 2 Duo Processor, http://www.intel.com/pressroom/kits/core2duo/

[4]: Keutzer K., Malik S., Newton R., Rabaey J., Sangiovanni-Vincentelli A., System Level Design: Orthogonalization of Concerns and Platform-Based Design, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 19, No. 12, Dec. 2000, pp. 1-29.

[5]: Krazit T., Intel Sheds Light on 2005 Desktop Strategy, IDG News Service, Dec. 07 2004, http://pcworld.about.net/news/Dec072004id118866.htm

[6]: Perich D., Intel Volume platforms Technology Leadership, Presentation at HP World 2004, http://98.190.245.141:8080/Proceed/HPW04CD/papers/4194.pdf

[7]: Powerful New Intel Server Platforms Feature Array of Enterprise-Class Innovations, Intel’s press release, Aug. 2, 2004, http://www.intel.com/pressroom/archive/releases/2004/20040802comp.htm

[8]: Smith S., Multi-Core Briefing, IDF Spring 2005, San Francisco, Press presentation, March 1 2005, http://www.silentpcreview.com/article224-page2

[9]: An Introduction to the Intel QuickPath Interconnect, Jan. 2009, http://www.intel.com/content/dam/doc/white-paper/quick-path-interconnect-introduction-paper.pdf

[10]: Davis L. PCI Express Bus, http://www.interfacebus.com/PCI-Express-Bus-PCIe-Description.html

Page 75: Dezső Sima 2011 December

5. References (2)

[11]: Ng P. K., “High End Desktop Platform Design Overview for the Next Generation Intel Microarchitecture (Nehalem) Processor,” IDF Taipei, TDPS001, 2008, http://intel.wingateweb.com/taiwan08/published/sessions/TDPS001/FA08%20IDF-Taipei_TDPS001_100.pdf

[12]: Computing DRAM, Samsung.com, http://www.samsung.com/global/business/semiconductor/products/dram/Products_ComputingDRAM.html

[13]: Samsung’s Green DDR3 – Solution 3, 20nm class 1.35V, Sept. 2011, http://www.samsung.com/global/business/semiconductor/Greenmemory/Downloads/Documents/downloads/green_ddr3_2011.pdf

[14]: DDR SDRAM Registered DIMM Design Specification, JEDEC Standard No. 21-C, Page 4.20.4-1, Jan. 2002, http://www.jedec.org

[15]: Datasheet, http://download.micron.com/pdf/datasheets/modules/sdram/SD9C16_32x72.pdf

[16]: Solanki V., „Design Guide Lines for Registered DDR DIMM Module,” Application Note AN37, Pericom, Nov. 2001, http://www.pericom.com/pdf/applications/AN037.pdf

[17]: Fisher S., “Technical Overview of the 45 nm Next Generation Intel Core Microarchitecture (Penryn),” IDF 2007, ITPS001, http://isdlibrary.intel-dispatch.com/isd/89/45nm.pdf

[18]: Razin A., Core, Nehalem, Gesher. Intel: New Architecture Every Two Years, Xbit Laboratories, 04/28/2006, http://www.xbitlabs.com/news/cpu/display/20060428162855.html

[19]: Haas J. & Vogt P., „Fully Buffered DIMM Technology Moves Enterprise Platforms to the Next Level,” Technology@Intel Magazine, March 2005, pp. 1-7

Page 76: Dezső Sima 2011 December

5. References (3)

[20]: „Introducing FB-DIMM Memory: Birth of Serial RAM?,” PCStats, Dec. 23, 2005, http://www.pcstats.com/articleview.cfm?articleid=1812&page=1

[21]: McTague M. & David H., „Fully Buffered DIMM (FB-DIMM) Design Considerations,” Febr. 18, 2004, Intel Developer Forum, http://www.idt.com/content/OSA-S009.pdf

[22]: Ganesh B., Jaleel A., Wang D., Jacob B., Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling, 2007.

[23]: DRAM Pricing – A White Paper, Tachyon Semiconductors, http://www.tachyonsemi.com/about/papers/DRAM%Pricing.pdf

[24]: Detecting Memory Bandwidth Saturation in Threaded Applications, Intel, March 2 2010, http://software.intel.com/en-us/articles/detecting-memory-bandwidth-saturation-in-threaded-applications/

[25]: McCalpin J. D., STREAM Memory Bandwidth, July 21 2011, http://www.cs.virginia.edu/stream/by_date/Bandwidth.html

[26]: Rogers B., Krishna A., Bell G., Vu K., Jiang X., Solihin Y., Scaling the Bandwidth Wall: Challenges in and Avenues for CMP Scaling, ISCA 2009, Vol. 37, Issue 1, pp. 371-382

[27]: Vogt P., Fully Buffered DIMM (FB-DIMM) Server Memory Architecture: Capacity, Performance, Reliability, and Longevity, Febr. 18 2004, http://www.idt.com/content/OSA_S008_FB-DIMM-Arch.pdf

[28]: Wikipedia: Intel X58, 2011, http://en.wikipedia.org/wiki/Intel_X58

Page 77: Dezső Sima 2011 December

5. References (4)

[29]: Sharma D. D., Intel 5520 Chipset: An I/O Hub Chipset for Server, Workstation, and High End Desktop, Hotchips 2009, http://www.hotchips.org/archives/hc21/2_mon/HC21.24.200.I-O-Epub/HC21.24.230.DasSharma-Intel-5520-Chipset.pdf

[30]: DDR2 SDRAM FBDIMM, Micron Technology, 2005, http://download.micron.com/pdf/datasheets/modules/ddr2/HTF18C64_128_256x72F.pdf

[31]: Wikipedia: Fully Buffered DIMM, 2011, http://en.wikipedia.org/wiki/Fully_Buffered_DIMM

[32]: Intel E8500 Chipset eXternal Memory Bridge (XMB) Datasheet, March 2005, http://www.intel.com/content/dam/doc/datasheet/e8500-chipset-external-memory-bridge-datasheet.pdf

[33]: Intel 7500/7510/7512 Scalable Memory Buffer Datasheet, April 2011, http://www.intel.com/content/dam/doc/datasheet/7500-7510-7512-scalable-memory-buffer-datasheet.pdf

[34]: AMD Unveils Forward-Looking Technology Innovation To Extend Memory Footprint for Server Computing, July 25 2007, http://www.amd.com/us/press-releases/Pages/Press_Release_118446.aspx

[35]: Chiappetta M., More AMD G3MX Details Emerge, Aug. 22 2007, Hot Hardware, http://hothardware.com/News/More-AMD-G3MX-Details-Emerge/

[36]: Goto S. H., AMD’s upcoming server platforms, May 20 2008, PC Watch, http://pc.watch.impress.co.jp/docs/2008/0520/kaigai440.htm

[37]: Wikipedia: Socket G3 Memory Extender, 2011, http://en.wikipedia.org/wiki/Socket_G3_Memory_Extender

Page 78: Dezső Sima 2011 December

5. References (5)

[38]: Interfacing to DDR SDRAM with CoolRunner-II CPLDs, Application Note XAPP384, Febr. 2003, Xilinx Inc.

[39]: Ahn J.-H., „Memory Design Overview,” March 2007, Hynix, http://netro.ajou.ac.kr/~jungyol/memory2.pdf

[40]: Ebeling C., Koontz T., Krueger R., „System Clock Management Simplified with Virtex-II Pro FPGAs”, WP190, Febr. 25 2003, Xilinx, http://www.xilinx.com/support/documentation/white_papers/wp190.pdf

[41]: Kirstein B., „Practical timing analysis for 100-MHz digital design,” EDN, Aug. 8, 2002, www.edn.com

[42]: Jacob B., Ng S. W., Wang D. T., Memory Systems, Elsevier, 2008

[43]: Allan G., „The outlook for DRAMs in consumer electronics”, EETIMES Europe Online, 01/12/2007, http://eetimes.eu/showArticle.jhtml?articleID=196901366&queryText=calibrated

[44]: Ebeling C., Koontz T., Krueger R., „System Clock Management Simplified with Virtex-II Pro FPGAs”, WP190, Febr. 25 2003, Xilinx, http://www.xilinx.com/support/documentation/white_papers/wp190.pdf

[45]: Memory technology evolution: an overview of system memory technologies, Technology brief, 9th edition, HP, Dec. 2010, http://h20000.www2.hp.com/bc/docs/support/SupportManual/c00256987/c00256987.pdf

Page 79: Dezső Sima 2011 December

5. References (6)

[46]: Kane L., Nguyen H., Take the Lead with Jasper Forest, the Future Intel Xeon Processor for Embedded and Storage, IDF 2009, July 27 2009, ftp://download.intel.com/embedded/processor/prez/SF09_EMBS001_100.pdf

[47]: The AMD Opteron™ 6000 Series Platform: More Cores, More Memory, Better Value, March 29 2010, http://www.slideshare.net/AMDUnprocessed/amd-opteron-6000-series-platform-press-presentation-final-3564470

[48]: Memory Module Picture 2007, Simmtester, Febr. 21 2007, http://www.simmtester.com/page/news/showpubnews.asp?num=150