Dezső Sima 2012 December (Ver. 1.6) Sima Dezső, 2012 Platforms I.


Page 1: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Dezső Sima

2012 December

(Ver. 1.6) Sima Dezső, 2012

Platforms I.

Page 2: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Contents

1. Introduction to platforms

2. Main components of platforms

3. Platform architectures

4. Memory subsystem design considerations

5. References

Page 3: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

1. Introduction to platforms

1.1. The notion of platform

1.2. Description of particular platforms

1.3. Representation forms of platforms

1.4. Compatibility of platform components

Page 4: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

1.1. The notion of platform

Page 5: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

The notion platform is widely used in different segments of the IT industry, e.g. by IC manufacturers, system providers or even software suppliers, with different interpretations. Here we focus on the platform concept as typically used by system providers.

1.1 The notion of platform

1.1 The notion of platform (1)

Page 6: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Modular (unified) system design and the notion platform

Figure: Intel’s Core 2 Duo (and Core 2 Extreme, the highest speed model) aimed DT platform (the Bridge Creek platform). Blocks and links shown: Core 2 Duo / Core 2 Extreme (2C) processor; FSB (1066/800/533 MT/s); 965 Series MCH; two memory channels (DDR2-800/666/533, two DIMMs per channel); DMI and C-link; ICH8; ME.

Modular system design means that the system architecture is partitioned into a few standard components (modules), such as the processor, the memory control hub (MCH) and the I/O control hub (ICH), that are interconnected by specified (standard) interconnections.

1.1 The notion of platform (2)

Page 7: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Remark

Modular system design became part of scientific research at the end of the 1990s, see e.g. [4].

The need for a modular system design, called platform design, arose in the PC industry when PCI-based system designs were substituted by port-based system designs, about 1998-1999.

1.1 The notion of platform (3)

Page 8: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Figure: Late PCI-based system architecture (~1998), used typically with Pentium II/III (built around Intel’s 440xx chipset). Blocks shown: Pentium II/Pentium III processors, processor bus, system controller, main memory (EDO/SDRAM), AGP, PCI bus, PCI device adapter, peripheral controller (2x IDE/ATA33/66, 2x USB), ISA bus, ISA device adapter (legacy and/or slow devices).

Figure: Early port-based system architecture (~1999), used first with Pentium III (built around Intel’s 810 chipset). Blocks shown: Pentium III processor, processor bus, system controller, main memory (SDRAM), AGP, hub interface, peripheral controller (2x IDE/ATA 33/66/100, 2x/4x USB, AC'97, LPC with Super I/O (KBD, MS, etc.)), PCI bus, PCI device adapter, PCI to ISA bridge, ISA bus, ISA device adapter (legacy devices).

1.1 The notion of platform (4)

Page 9: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Main goals of modular system level design

• to reduce the complexity of designing complex systems by partitioning them into modules,
• to have stable interfaces (at least for a few years) interconnecting the modules,
• in this way to minimize design rework while upgrading a given system design, like moving from one processor generation to the next, and thus
• to shorten the time to market.

Co-design of platform components

Platform components are typically co-designed, announced and delivered as a set.

1.1 The notion of platform (5)

Page 10: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

The notion of platforms

System providers, however, may use the notion platform either in a more general or in a more specific sense.

Interpretation of the notion platform

• Interpretation in a more general sense: a modular system design targeting a given application area, used in terms like DT or MP platforms.
• Interpretation in a more specific sense: a particular modular system architecture developed for a given application area, such as a given DT or MP platform, like Intel’s Sandy Bridge based Sugar Bay DT platform or AMD’s Phenom II X4 based Dragon platform (2008) for gamers (2009).

1.1 The notion of platform (6)

Page 11: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Benefits of the platform concept for computer manufacturers

• With the platform concept in mind, manufacturers like Intel or AMD plan, design and market all key components of a platform, such as the processor or processors and the related chipset, as an integrated entity [5].

• This is beneficial for the manufacturers since it motivates OEMs, as system providers, to buy all key parts of a computer system from the same manufacturer.

1.1 The notion of platform (7)

Page 12: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Benefits of the platform concept for customers

The platform concept is beneficial for the customers as well since an integrated “backbone” of a system architecture promises a more reliable and more cost effective system.

1.1 The notion of platform (8)

Page 13: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

In a more specific sense the notion platform refers to a particular modular system architecture that is developed for a given application area, such as a DT, DP or MP platform.

In this sense the notion platform is interpreted as a standardized backbone of a system architecture developed for a given application area that is typically built up of

• the processor or processors,
• the chipset,
• the memory subsystem (MSS) that is attached by a specific memory interface,
• in some cases, such as in mobile or business oriented DT platforms, also the networking component [7], and
• the buses interconnecting the above components of the platform.

Subsequently, we will focus on the interpretation of the notion platform in this latter sense.

1.1 The notion of platform (9)

Interpretation of the notion platform in a more specific sense

Figure: Basic components of a platform: the processor or processors, the chipset, the memory subsystem, (the LAN controller), and the buses interconnecting the preceding basic components.

Page 14: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

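Example 1: Intel’s Core 2 aimed home user DT platform (Bridge Creek) [3]

Figure: Block diagram of the Bridge Creek platform [3]. Recoverable labels: platform boundary, display card, C-link, 1066 MT/s, 2 DIMMs/channel on each of the two memory channels.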

1.1 The notion of platform (10)

Page 15: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Example 2: Intel’s Nehalem-EX aimed Boxboro-EX MP server platform, assuming 1 IOH

Figure: Four processors (Xeon 7500 (Nehalem-EX, Beckton), 8C / Xeon E7-4800 (Westmere-EX), 10C), interconnected by QPI links, with QPI links to the 7500 IOH; the ICH10 is attached via ESI. The memory (DDR3-1067) is attached to the processors through SMBs over SMI channels (labelled "2x4 SMI channels"). The figure also marks the ME, the platform boundary and the interfaces connecting the platform components.

SMI: Serial link between the processors and the SMBs
SMB: Scalable Memory Buffer (parallel/serial conversion)
ME: Management Engine

1.1 The notion of platform (11)

Page 16: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

The structure of a platform is termed as its architecture (or topology).

It describes the basic components and their interconnections and will be discussed in Section 3.

1.1 The notion of platform (12)

Page 17: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Historical remarks

System providers began using the notion “platform” about 2000, for example for

• Philips’ Nexperia digital video platform (1999),
• Texas Instruments’ (TI) OMAP platform for SOCs (2002),
• Intel’s first generation mobile oriented Centrino platform for laptops, designated as the Carmel platform (3/2003).

Intel contributed significantly to spreading the notion platform when, based on the success of their Centrino platform, they introduced this concept also for their desktops [5] and servers [6], [7] in 2004.

1.1 The notion of platform (13)

Page 18: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Intel’s early server and workstation roadmap from Aug. 2004 [6]

Note

a) This roadmap already makes use of the notion platform without revealing platform names.
b) In 2004 Intel made a transition from 32-bit systems to 64-bit systems.

1.1 The notion of platform (14)

Page 19: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Intel’s multicore platform roadmap announced at the IDF Spring 2005 [8]

Note

This roadmap includes also the particular platform designations for desktops, UP servers etc.

1.1 The notion of platform (15)

Page 20: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

1.2. Description of a particular platform

Page 21: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

1.2 Description of a particular platform (1)

Description of a particular platform

• Detailing the platform architecture

Example: The Tylersburg DT platform (2008), consisting of a processor, an MCH and an ICH.

Page 22: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

1.2 Description of a particular platform (2)

Detailing the platform architecture includes the specification of the architecture (topology) of the processor, the memory and the I/O subsystems (to be discussed in Section 3).

It is concerned with issues such as whether the processors of an MP server are connected to the MCH via an FSB or otherwise, or whether the memory is attached to the system architecture through the MCH or through the processors, etc.

Example: The Tylersburg DT platform (2008) (processor, MCH, ICH).

Page 23: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Description of a particular platform

• Detailing the platform architecture
• Identification of the platform components

Example: The Tylersburg DT platform (2008)

Processor: 1. gen. Nehalem (4C) / Westmere-EP (6C)
MCH: X58 IOH
ICH: ICH10

1.2 Description of a particular platform (3)

Page 24: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

1.2 Description of a particular platform (4)

Description of a particular platform

• Detailing the platform architecture
• Identification of the platform components
• Specification of the interfaces interconnecting the platform components

Example: The Tylersburg DT platform (2008)

Processor: 1. gen. Nehalem (4C) / Westmere-EP (6C), connected via QPI to the
MCH: X58 IOH, connected via DMI to the
ICH: ICH10

Page 25: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Remark

The specification of a platform will be completed by the datasheets of the related platform components.
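The three steps of describing a platform (architecture, components, interfaces) can be captured in a small data structure. The following is only an illustrative sketch: the class names `Link` and `Platform` and the method `interfaces_of` are hypothetical, and the pairing of QPI and DMI with the components follows the Tylersburg example and should be checked against the datasheets.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Link:
    """One interface interconnecting two platform components (hypothetical type)."""
    a: str
    b: str
    interface: str


@dataclass
class Platform:
    """Components plus the interfaces that interconnect them (hypothetical type)."""
    name: str
    components: list
    links: list = field(default_factory=list)

    def interfaces_of(self, component):
        # All interfaces that touch the given component, in declaration order.
        return [l.interface for l in self.links if component in (l.a, l.b)]


# Tylersburg DT platform (2008) as identified in the example above.
tylersburg = Platform(
    name="Tylersburg DT platform (2008)",
    components=["1. gen. Nehalem (4C) / Westmere-EP (6C)", "X58 IOH", "ICH10"],
    links=[
        Link("1. gen. Nehalem (4C) / Westmere-EP (6C)", "X58 IOH", "QPI"),
        Link("X58 IOH", "ICH10", "DMI"),
    ],
)

print(tylersburg.interfaces_of("X58 IOH"))  # ['QPI', 'DMI']
```

Such a record covers only the topology; the remaining details still come from the component datasheets.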

1.2 Description of a particular platform (5)

Page 26: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Dependence of the platform architecture on the platform category

Platforms may be classified according to the target area of application, such as

• Desktop (DT) platforms
• Dual processor (DP) platforms
• Quad processor (MP) platforms
• Mobile platforms

Of course, beyond the above categories further processor categories and related platforms also exist, such as embedded processors and related platforms.

In conformity with the different platform categories, different platform architectures arise as well:

• Architecture of DT platforms
• Architecture of DP platforms
• Architecture of MP platforms
• Architecture of mobile platforms

In these slides platform architectures will be discussed in Section 3, restricted however to DT, DP and MP platforms.

1.2 Description of a particular platform (6)

Page 27: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

1.3. Representation forms of platforms

Page 28: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

1.3 Representation forms of platforms (1)

1.3 Representation forms of platforms

a) Thumbnail representation
b) Extended representation (an arbitrarily chosen representation form in these slides)
c) Block diagram of a platform.

Page 29: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

1.3 Representation forms of platforms (3)

a) Thumbnail representation

It is a concise representation of a particular platform.

In particular, the thumbnail representation

• reveals the platform architecture,
• identifies the basic components of a platform, such as the processor or processors, the chipset, in some cases (e.g. in mobile platforms) also the Gigabit Ethernet controller,
• and specifies the interconnection links (buses) between the platform components.

Example: Intel’s Core 2 Duo aimed home user oriented platform (the Bridge Creek platform)

Figure: Core 2 Duo / Core 2 Extreme (2C) processor; FSB: 1066/800/533 MT/s; 965 Series MCH; two DDR2 channels, DDR2-800/666/533, two DIMMs per channel; DMI and C-link; ICH8; ME.

Page 30: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

b) Extended representation

This kind of representation

• indicates a few additional data of the processor and the chipset (like data of the die, the cache system or the memory),
• reveals the dates of the introduction of platform components, and
• identifies compatibility ranges of processors or chipsets in platforms by encircling compatible components,
• but lacks the graphical representation of the platform.

Example: The Bridge Creek DT platform

DT cores: Core 2-aimed (65 nm), 7/2006
  Core 2 Duo (2C): E6xxx/E4xxx; Core 2 Extreme (2C): X6800
  (Conroe: E6xxx/X6800; Allendale: E4xxx) 1
  65 nm; Conroe: 291 mtrs/143 mm2; Allendale: 167 mtrs/111 mm2
  L2: Conroe 4 MB / Allendale 2 MB
  FSB: X6800/E6xxx: 1066 MT/s; E4xxx: 800 MT/s
  LGA775

MCH: 965 Series (Broadwater), 6/2006
  FSB: 1066/800/533 MT/s
  2 DDR2 channels, DDR2-800/666/533
  4 ranks/channel, 8 GB max.

ICH: ICH8, 6/2006

1 The Allendale is a later stepping (Steppings L2/M0) of the Core 2 (Steppings B2/G0) that typically provided only 2 MB L2 and appeared 1/2007.

1.3 Representation forms of platforms (4)

Page 31: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

1.3 Representation forms of platforms (5)

Example for stating the compatibility range of a platform

The Core 2 Duo aimed DT platform that targets home users (designated as the Bridge Creek platform).

Beyond the target processor this platform may also be used with

• the previous Pentium D/EE and Pentium 4 6x0/6x1/EE lines of processors and
• the subsequent Core 2 Quad line of processors,

as shown in the next slides.


Page 32: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Support of Pentium 4/D/EE processors

The platform also supports Pentium D/EE processors (90/65 nm) and Pentium 4 6x0/6x1/EE processors (90 nm):

Pentium D/EE 8xx (note 1), (Smithfield) 2x1C, 5/2005
  90 nm, 2x115 mtrs, 2x103 mm2
  2x1 MB L2, FSB 800/533 MT/s
  No multithreading, LGA775

Pentium D/EE 9xx (notes 2, 3), (Presler) 2x1C, 1/2006
  65 nm, 2x188 mtrs, 2x81 mm2
  2x2 MB L2, FSB 1066/800 MT/s
  No multithreading, LGA775

Pentium 4 6x0/6x1/EE, (Prescott-2M) 1C, 2/2005
  90 nm, 169 mtrs, 135 mm2
  2 MB L2, FSB 800 MT/s
  Two-way multithreading, LGA775

Note 1: Pentium EE 840 supports only 800 MT/s
Note 2: Pentium D 9xx support only 800 MT/s
Note 3: Pentium EE 955/965 supports only 1066 MT/s

1.3 Representation forms of platforms (6)


Page 33: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Support of Core 2 Quad processors

The platform also supports Core 2 Quad processors (65 nm):

Core 2 Quad (2x2C): Q6xxx (Kentsfield), 11/2006
  65 nm, 2x291 mtrs/2x143 mm2
  2x4 MB L2, FSB 1066 MT/s
  LGA775

1.3 Representation forms of platforms (7)


Page 34: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

c) Block diagram of a platform

Example: The Core 2 aimed home user DT platform (Bridge Creek) (without an integrated display controller) [3]

Figure: Block diagram of the Bridge Creek platform. Recoverable labels: display card, C-link, 1066 MT/s, 2 DIMMs/channel on each of the two memory channels.

1.3 Representation forms of platforms (8)

Page 35: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

1.4. Compatibility of platform components

Page 36: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

1.4 Compatibility of platform components

1.4 Compatibility of platform components (1)

One of the goals of platform based designs is to use stabilized interfaces (at least for a while) to minimize or eliminate design rework while moving from one processor generation to the next [2].

Consequently, assuming platform based designs, platform components such as processors or chipsets of a given line are typically compatible with their previous or subsequent generations, as long as the same interfaces are used and neither interface parameters (such as the FSB speed) nor other implementation requirements (either on the side of the components to be substituted or of the substituting components) restrict this.

Page 37: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

In the discussed DT platform the target processor is the Core 2, which is connected to the MCH by an FSB of 1066/800/533 MT/s. The target processor of the platform, however, can be substituted

• either by processors of three previous generations or
• by processors of the subsequent generation (Core 2 Quad),

since all these processors have FSBs of 533/800/1066 MT/s, as shown before.

1.4 Compatibility of platform components (2)

Limits of compatibility

Nevertheless, the highest performance level Core 2 Quad, termed the Core 2 Extreme Quad, already provided an increased FSB speed of 1333 MT/s and was therefore no longer supported by the Core 2 aimed platform considered.
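The FSB-speed part of this compatibility argument can be sketched as a simple membership test. This is an illustrative sketch only: the function name `fsb_compatible` is hypothetical, and it models nothing but the FSB speed, whereas real compatibility also depends on the other implementation requirements mentioned above.

```python
# FSB speeds (MT/s) supported by the Core 2 aimed DT platform discussed above.
PLATFORM_FSB_MTS = {533, 800, 1066}


def fsb_compatible(processor_fsb_mts, platform_fsb_mts=PLATFORM_FSB_MTS):
    """Return True if the processor's FSB speed is one the platform supports."""
    return processor_fsb_mts in platform_fsb_mts


# FSB speeds taken from the slides; the dictionary itself is only an example.
processors = {
    "Core 2 Duo E6xxx": 1066,
    "Core 2 Duo E4xxx": 800,
    "Core 2 Quad Q6xxx": 1066,
    "Core 2 Extreme Quad": 1333,
}

for name, fsb in processors.items():
    verdict = "supported" if fsb_compatible(fsb) else "not supported"
    print(f"{name}: FSB {fsb} MT/s -> {verdict}")
```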


Page 38: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

2. Basic components of platforms

2.1. Processors

2.2. The memory subsystem

2.3. Buses interconnecting platform components

Page 39: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Basic components of platforms - Overview

As already discussed in Section 1, the notion platform is interpreted as a standardized backbone of a system architecture developed for a given application area that is typically built up of

• the processor or processors,
• the chipset,
• the memory subsystem (MSS) that is attached by a specific memory interface,
• in some cases, such as in mobile or business oriented DT platforms, also the networking component [7], as well as
• the buses interconnecting the above components.

Figure: Basic components of a platform: the processor or processors, the chipset, the memory subsystem, (the LAN controller), and the buses interconnecting the preceding basic components.

Subsequently, we will discuss the following three basic components of platforms:

• Processors (Section 2.1),
• The memory subsystem (Section 2.2) and
• Buses interconnecting platform components (excluding memory buses) (Section 2.3).

Page 40: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

2.1. Processors

Page 41: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

2.1 Processors (1)

Figure 2.1: Overview of Intel’s Tick-Tock model (based on [17])

Intel’s Tick-Tock model (each process technology spans about 2 years) and key microarchitectural features:

180 nm   TOCK   Pentium 4 / Willamette    11/2000   New microarch.
130 nm   TICK   Pentium 4 / Northwood     01/2002   Adv. microarch., hyperthreading
90 nm    TOCK   Pentium 4 / Prescott      02/2004   Adv. microarch., hyperthreading, 64-bit
65 nm    TICK   Pentium 4 / Cedar Mill    01/2006
65 nm    TOCK   Core 2                    07/2006   New microarch., 4-wide core, 128-bit SIMD, no hyperthreading
45 nm    TICK   Penryn family             11/2007
45 nm    TOCK   Nehalem                   11/2008   New microarch., hyperthreading, (inclusive) L3, integrated MC, QPI
32 nm    TICK   Westmere                  01/2010
32 nm    TOCK   Sandy Bridge              01/2011   New microarch., hyperthreading, 256-bit AVX, integr. GPU, ring bus
22 nm    TICK   Ivy Bridge                04/2012
22 nm    TOCK   Haswell

Page 42: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Basic architectures and their related shrinks

Considered from the Pentium 4 Prescott (the third core of Pentium 4) on

Basic architecture       Basic architectures and their shrinks
Pentium 4 (Prescott)     2005 90 nm Pentium 4; 2006 65 nm Pentium 4
Core 2                   2006 65 nm Core 2; 2007 45 nm Penryn
Nehalem                  2008 45 nm Nehalem; 2010 32 nm Westmere
Sandy Bridge             2011 32 nm Sandy Bridge; 2012 22 nm Ivy Bridge
Haswell                  2013 22 nm Haswell

2.1 Processors (2)

Page 43: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Table 2.1: Intel’s Core 2 based and subsequent multicore DT processor lines

Columns: Core/technology; Cores; Intro.; Cache arch.; Interf.

Core 2 (65 nm), interface: FSB
  X6800 Conroe; 2C; 7/2006; 4 MB L2/2C
  E6xxx Conroe; 2C; 7/2006; 2/4 MB L2/2C
  E4xxx Allendale; 2C; 1/2007; 4 MB L2/2C
  E6xxx Allendale; 2C; 7/2007; 4 MB L2/2C
  QX67xx Kentsfield; 2x2C; 11/2006; 4 MB L2/2C
  Q6xxx Kentsfield; 2x2C; 1/2007; 4 MB L2/2C

Penryn (45 nm), interface: FSB
  E8xxx Wolfdale; 2C; 1/2008; 6 MB L2/2C
  E7xxx Wolfdale-3M; 2C; 4/2008; 3 MB L2/2C
  QX9xxx Yorkfield XE; 2x2C; 11/2007; 6 MB L2/2C
  Q9xxx Yorkfield; 2x2C; 1/2008; 6 MB L2/2C
  Q9xxx Yorkfield-6M; 2x2C; 1/2008; 3 MB L2/2C
  Q8xxx Yorkfield-4M; 2x2C; 8/2008; 2 MB L2/2C

1. G. Nehalem-EP (45 nm)
  i7-920-965 Bloomfield; 4C; 11/2008; ¼ MB L2/C, 8 MB L3; QPI

2. G. Nehalem-EP
  i7-8xxx/i5-7xx Lynnfield; 4C; 9/2009; ¼ MB L2/C, 8 MB L3; DMI

Westmere-EP (32 nm)
  i7-9xxX Gulftown; 6C; 3/2010; ¼ MB L2/C, 12 MB L3; QPI
  i7-9xx Gulftown; 6C; 7/2010; ¼ MB L2/C, 12 MB L3; QPI
  i5-6xx/i3-5xx Clarkdale; 2C+G; 1/2010; ¼ MB L2/C, max. 4 MB L2; DMI

Sandy Bridge (32 nm), interfaces as listed: DMI 2.0, DMI 2.0, PCIe 2.0
  i7-39/38xx; 6C; 11/2011; ¼ MB L2/C, 15 MB L3
  i7-26/27xx; 2/4C+G; 1/2011; ¼ MB L2/C, 4/8 MB L3
  i5-23/24/25xx; 2/4C+G; 1/2011; ¼ MB L2/C, 3/6 MB L3
  i3-21xx; 2C+G; 1/2011; ¼ MB L2/C, 3 MB L3

Ivy Bridge (22 nm), interfaces as listed: DMI 2.0, PCIe 3.0, (PCIe 3.0)
  i7-3770; 4C+G; 4/2012; ¼ MB L2/C, 8 MB L3
  i5-33/34/35xx; 2/4C+G; 4/2012; ¼ MB L2/C, 6 MB L3
  i3-32xx; 2C; 9/2012; ¼ MB L2/C, 3 MB L3

2.1 Processors (5)

Page 44: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Table 2.2: Overview of Intel’s multicore DP server processors

Columns: Core/technology; Intro.; DP server processors; Cores and caches

Pentium 4 (Prescott)
  Pentium 4 90 nm; 10/2005; Paxville DP 2.8; 2x1 C, 2 MB L2/C
  Pentium 4 65 nm; 5/2006; 5000 (Dempsey); 2x1 C, 2 MB L2/C

Core 2
  Core 2 65 nm; 6/2006; 5100 (Woodcrest); 1x2 C, 4 MB L2/C
  Core 2 65 nm; 11/2006; 5300 (Clovertown); 2x2 C, 4 MB L2/C
  Penryn 45 nm; 11/2007; 5400 (Harpertown); 2x2 C, 6 MB L2/2C

Nehalem
  Nehalem-EP 45 nm; 3/2009; 5500 (Gainestown); 1x4 C, ¼ MB L2/C, 8 MB L3
  Westmere-EP 32 nm; 3/2010; 56xx (Gulftown); 1x6 C, ¼ MB L2/C, 12 MB L3
  Nehalem-EX 45 nm; 3/2010; 6500 (Beckton); 1x8 C, ¼ MB L2/C, 24 MB L3
  Westmere-EX 32 nm; 4/2011; E7-28xx (Westmere-EX); 1x10 C, ¼ MB L2/C, 30 MB L3

Sandy Bridge
  Sandy Bridge-EN 32 nm; 5/2012; E5-2xxx; 1x8 C, ¼ MB L2/C, 20 MB L3
  Ivy Bridge 22 nm; (no entry)

2.1 Processors (6)

Page 45: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Table 2.2: Overview of Intel’s multicore MP server processors

Columns: Core/technology; Intro.; MP server processors; Cores and caches

Pentium 4 (Prescott)
  Pentium 4 90 nm; 11/2005; Paxville MP; 2x1 C, 2 MB L2/C
  Pentium 4 65 nm; 8/2006; 7100 (Tulsa); 2x1 C, 1 MB L2/C, 16 MB L3

Core 2
  Core 2 65 nm; 9/2007; 7200 (Tigerton DC); 1x2 C, 4 MB L2/C
  Core 2 65 nm; 9/2007; 7300 (Tigerton QC); 2x2 C, 4 MB L2/C
  Penryn 45 nm; 9/2008; 7400 (Dunnington); 1x6 C, 3 MB L2/2C, 16 MB L3

Nehalem
  Nehalem-EP 45 nm; (no entry)
  Westmere-EP 32 nm; (no entry)
  Nehalem-EX 45 nm; 3/2010; 7500 (Beckton); 1x8 C, ¼ MB L2/C, 24 MB L3
  Westmere-EX 32 nm; 4/2011; E7-48xx (Westmere-EX); 1x10 C, ¼ MB L2/C, 30 MB L3

Sandy Bridge
  Sandy Bridge-EP 32 nm; 5/2012; E5-4xxx; 1x8 C, ¼ MB L2/C, 20 MB L3
  Ivy Bridge 22 nm; (no entry)

2.1 Processors (7)

Page 46: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

2.2. The memory subsystem

2.2.1. Key parameters of the memory subsystem

2.2.2. Main attributes of the memory technology used

2.2.2.1. Overview: Main attributes of the memory technology used

2.2.2.2. Memory type

2.2.2.3. Speed grades

2.2.2.4. DIMM density

2.2.2.5. Use of ECC support

2.2.2.6. Use of registering

Page 47: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

2.2.1 Key performance parameters of the memory subsystem (1)

2.2.1 Key performance parameters of the memory subsystem

This issue will be discussed in Section 4.

Page 48: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

2.2.2 Main attributes of the memory technology used

2.2.2.1 Overview: Main attributes of the memory technology used

Main attributes of the memory technology used, with the sections discussing them:

• Memory type (Section 2.2.2.2)
• Speed grade (Section 2.2.2.3)
• DIMM density (Section 2.2.2.4)
• Use of ECC support (Section 2.2.2.5)
• Use of registering (Section 2.2.2.6)

Page 49: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

2.2.2.2 Memory type (1)

2.2.2.2 Memory type

a) Overview: Main DRAM types

Figure: Main DRAM types (DRAMs for general use) with their year of introduction: DRAM (1970), FP (~1974), FPM (1983), EDO (1995), SDRAM (1996), DRDRAM (1999), DDR (2000), DDR2 (2004), FB-DIMM (2006), XDR (2006) 1, DDR3 (2007).

The figure classifies these types as asynchronous or synchronous DRAMs, as DRAMs with parallel or with serial bus connection, and as main stream (commodity) or challenging DRAM types.

1 Used in the Cell BE and the PlayStation 3, but not yet in desktops or servers

Page 50: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

b) Synchronous DRAMs (SDRAM, DDR, DDR2, DDR3)

2.2.2.2 Memory type (2)

Page 51: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

SDRAM to DDR3 DIMMs

SDRAM: 168-pin
DDR: 184-pin
DDR2: 240-pin
DDR3: 240-pin

All these DIMM modules are 8 bytes wide.

2.2.2.2 Memory type (3)

Page 52: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

2.2.2.2 Memory type (4)

Principle of operation of synchronous DRAMs (SDRAM to DDR3 memory chips)

Figure: A DRAM device, consisting of the memory cell array (clocked at fCell) and the I/O buffers (clocked at fCK), attached to the memory controller (MC).

The memory cell array sources/sinks data to/from the I/O buffers

• at a rate of fCell,
• at a width of FW (Fetch Width).

The I/O buffers receive/transmit data from/to the MC. Data transmission takes place

• at a rate of fCK (SDRAM) or 2 x fCK (DDR to DDR3), that is
• on the rising edge of the strobe (CK) for SDRAMs or on both edges of the strobe (DQS) for DDR/DDR2/DDR3.

Page 53: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Sourcing/sinking data by the memory cell array

The memory cell array sources/sinks data to/from the I/O buffers

• at a rate of fCell, where fCell is the clock frequency of the memory cell array,
• at a data width of FW, where FW is the fetch width of the memory cell array.

The core clock frequency of the memory cell array (fCell)

• fCell is 100 to 200 MHz.
• fCK stands in a given ratio to fCell (the clock frequency of the memory cell array) as follows:

DRAM type   fCK
SDRAM       fCell
DDR         fCell
DDR2        2 x fCell
DDR3        4 x fCell

Raising fCell from 100 MHz to 200 MHz characterizes the evolution of each memory technology

• When a new memory technology (e.g. DDR2 or DDR3) appears, fCell is initially 100 MHz; this sets the initial speed grade accordingly (e.g. to 400 MT/s for DDR2 or to 800 MT/s for DDR3).
• As the memory technology evolves, fCell is raised from 100 MHz to 133, 167 and finally to 200 MHz.
• Along with fCell, fCK and the speed grade are raised as well.
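The relation between the cell array clock, the I/O clock and the speed grade can be checked with a few lines of code. This is a minimal sketch: the dictionary and function names are hypothetical, while the ratios and the one-edge/two-edge transfer rule are those stated in the slides.

```python
# Ratio of the I/O clock (fCK) to the cell array clock (fCell).
CK_PER_CELL = {"SDRAM": 1, "DDR": 1, "DDR2": 2, "DDR3": 4}

# Transfers per fCK cycle: SDRAM transfers on the rising edge only,
# DDR to DDR3 transfer on both edges of the strobe.
TRANSFERS_PER_CK = {"SDRAM": 1, "DDR": 2, "DDR2": 2, "DDR3": 2}


def clock_mhz(dram_type, f_cell_mhz):
    """I/O clock frequency fCK in MHz for a given cell array clock."""
    return CK_PER_CELL[dram_type] * f_cell_mhz


def transfer_rate_mts(dram_type, f_cell_mhz):
    """Speed grade in MT/s for a given cell array clock."""
    return TRANSFERS_PER_CK[dram_type] * clock_mhz(dram_type, f_cell_mhz)


for dram_type in ("SDRAM", "DDR", "DDR2", "DDR3"):
    for f_cell in (100, 200):
        print(
            f"{dram_type}: fCell = {f_cell} MHz, "
            f"fCK = {clock_mhz(dram_type, f_cell)} MHz, "
            f"{transfer_rate_mts(dram_type, f_cell)} MT/s"
        )
```

With fCell = 100 MHz this reproduces the initial speed grades quoted above (400 MT/s for DDR2, 800 MT/s for DDR3).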

2.2.2.2 Memory type (5)

Page 54: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

The fetch width (FW) of the memory cell array

It specifies how many times more bits the cell array fetches per column cycle than the data width of the device (xn).

The fetch width (FW) of the memory cell array of synchronous DRAMs is as follows:

DRAM type   FW
SDRAM       1
DDR         2
DDR2        4
DDR3        8

E.g. a 4-bit wide DRAM device (x4 DRAM chip) with a fetch width of 4 (actually a DDR2 DRAM) fetches 4 × 4, that is 16 bits, from the memory cell array in every fCell cycle.
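The fetch width arithmetic can be sketched as follows. The names `FETCH_WIDTH`, `bits_per_cell_cycle` and `per_pin_rate_mts` are hypothetical; the second function rests on the assumption that every data line has to carry FW bits per fCell cycle, so that the cell array and the data lines move the same amount of data.

```python
# Fetch width (FW) of the memory cell array per DRAM type.
FETCH_WIDTH = {"SDRAM": 1, "DDR": 2, "DDR2": 4, "DDR3": 8}


def bits_per_cell_cycle(dram_type, device_width):
    """Bits fetched from the cell array per fCell cycle by an xn device."""
    return FETCH_WIDTH[dram_type] * device_width


def per_pin_rate_mts(dram_type, f_cell_mhz):
    """Transfer rate per data line (MT/s) needed to drain FW bits per fCell cycle."""
    return FETCH_WIDTH[dram_type] * f_cell_mhz


# The x4 DDR2 example from the text: 4 x 4 = 16 bits per fCell cycle.
print(bits_per_cell_cycle("DDR2", 4))

# At fCell = 100 MHz each data line of a DDR3 device runs at 800 MT/s.
print(per_pin_rate_mts("DDR3", 100))
```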

2.2.2.2 Memory type (6)

Page 55: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Examples of the operation of SDRAM to DDR3 devices at a DRAM core clock (fCell) of 100 MHz:

SDRAM (e.g. SDRAM-100): clock (CK) 100 MHz. The memory cell array delivers n bits per fCell cycle to the I/O buffers; n bits are transferred on the rising edges of CK over the data lines (DQ0 - DQn-1) at 100 MT/s.

DDR SDRAM (e.g. DDR-200): clock (CK/CK#) 100 MHz, data strobe (DQS) 100 MHz. The memory cell array delivers 2xn bits per fCell cycle to the I/O buffers; n bits are transferred on both edges of DQS over the data lines (DQ0 - DQn-1) at 200 MT/s.

DDR2 SDRAM (e.g. DDR2-400): clock (CK/CK#) 200 MHz, data strobe (DQS) 200 MHz. The memory cell array delivers 4xn bits per fCell cycle to the I/O buffers; n bits are transferred on both edges of DQS over the data lines (DQ0 - DQn-1) at 400 MT/s.

DDR3 SDRAM (e.g. DDR3-800): clock (CK/CK#) 400 MHz, data strobe (DQS) 400 MHz. The memory cell array delivers 8xn bits per fCell cycle to the I/O buffers; n bits are transferred on both edges of DQS over the data lines (DQ0 - DQn-1) at 800 MT/s.

Page 56: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

The main technique to increase memory speed

Smaller voltage swings result in shorter signal rise/fall times and thus allow higher speed grades, but they leave a lower voltage budget and thus raise the requirements for signal integrity.

Relation between voltage swings and rise/fall times of signals

Q = Cin x V = I x t, hence tR ~ Cin x V / I

Q: Charge on the input capacitance of the line (Cin)
Cin: Input capacitance of the line
V: Voltage
I: Current strength of the driver
tR: Rise time

Memory type   Voltage/Voltage swing
SDRAM         3.3 V
DDR           2.5 V
DDR2          1.8 V
DDR3          1.5 V
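The proportionality tR ~ Cin x V / I can be illustrated numerically. This is a sketch under stated assumptions: the input capacitance of 5 pF and the driver current of 10 mA are hypothetical values chosen only for illustration; the voltages are those listed for SDRAM to DDR3.

```python
def rise_time_ns(c_in_pf, v_swing, i_drive_ma):
    """Approximate rise time from tR ~ Cin x V / I (pF x V / mA gives ns)."""
    return c_in_pf * v_swing / i_drive_ma


# Hypothetical line and driver parameters, for illustration only.
C_IN_PF = 5.0
I_DRIVE_MA = 10.0

VOLTAGES = {"SDRAM": 3.3, "DDR": 2.5, "DDR2": 1.8, "DDR3": 1.5}

for dram_type, volts in VOLTAGES.items():
    t_r = rise_time_ns(C_IN_PF, volts, I_DRIVE_MA)
    print(f"{dram_type}: {volts} V -> tR of about {t_r:.2f} ns")
```

With the capacitance and the driver current held constant, the rise time shrinks in proportion to the voltage swing.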

2.2.2.2 Memory type (9)

Page 57: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Figure 2.4: Signal types used in MMs for control, address and data signals

Signaling used in buses (ordered towards smaller voltage swings):

Single ended signals (typ. voltage swings 3.3-5 V)
  TTL (5 V): FPM/EDO
  LVTTL (3.3 V): FPM/EDO, SDRAM, HI1.5

Voltage referenced signals (VREF; typ. voltage swings 600-800 mV)
  SSTL: SSTL2 (DDR), SSTL1.8 (DDR2), SSTL1.5 (DDR3)
  RSL: RDRAM
  FSB

Differential signals (S+, S-, VCM; typ. voltage swings 200-300 mV)
  LVDS: PCIe, QPI, DMI, ESI, FB-DIMMs
  DRSL: XDR (data)

LVDS: Low Voltage Differential Signaling
LVTTL: Low Voltage TTL
(D)RSL: (Differential) Rambus Signaling Level
SSTL: Stub Series Terminated Logic
VCM: Common Mode Voltage
VREF: Reference Voltage

2.2.2.2 Memory type (9b)

Page 58: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Figure 2.7: Signaling alternatives of buses used with memories

The figure arranges the memory types FPM, EDO, SDRAM, DDR, DDR2, DDR3, RDRAM, FB-DIMM, XDR and XDR2 along two axes:

• Signaling of data lines: single ended (TTL, LVTTL), voltage referenced (RSL, SSTL) or differential (DRSL, LVDS)
• Signaling of command, control and address lines: single ended (TTL, LVTTL), voltage referenced (RSL, SSTL) or differential (DRSL, LVDS)

2.2.2.2 Memory type (10)

Page 59: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Key features of synchronous DRAM devices (SDRAM to DDR3)

Table 2.4: Key features of synchronous DRAM devices

Feature | SDRAM | DDR SDRAM | DDR2 SDRAM | DDR3 SDRAM
JEDEC standard | JESD 21-C Release 4 | JESD 79 | JESD 79-2 | JESD 79-3
Key features | Synchronous, pipelined, burst oriented | Double data rate, 2n prefetch architecture | Double data rate, 4n prefetch architecture | Double data rate, 8n prefetch architecture
Standard, first/last release | JESD 21-C Release 4, 11/1993 | JESD 79, 6/2000; JESD 79E, 5/2005 | JESD 79-2, 9/2003; JESD 79-2C, 5/2006 | JESD 79-3, 6/2007
Device density | 64 Mb | 128 Mb - 1 Gb | 256 Mb - 4 Gb; 256 Mb - 4 Gb | 512 Mb - 8 Gb
Organization | x4/8/16 | x4/8/16 | x4/8/16; x4/8/16 | x4/8/16
Device speed (MT/s) | 66; 100/133 | 200/266; 200/266/333/400 | 400/533/667/800 | 800/1066/1333/1600
Typ. processors | Pentium (3V); Pentium III | P4 (Willamette); P4 (Northwood), P4 (Prescott) | P4 (Prescott), P4 (Presler), Pentium D, Core2 Duo | Core2 Duo to Sandy Bridge
Voltage | 3.3 V | 2.5 V | 1.8 V | 1.5 V
No. of pins on the module | 168 | 184 | 240 | 240

Device density (with organization): 4/16 Mb; 16-256 Mb (x8/16); 64-512 Mb (x8/16); 128-512 Mb (x8/16); 256 Mb - 1 Gb (x8/16); 256 Mb - 1 Gb (x8/16); 512 Mb - 16 Gb

2.2.2.2 Memory type (11)

Page 60: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Figure: Approximate appearance dates and speed grades of DDR DRAMs, as well as the bandwidth provided by a dual channel memory subsystem [12]

2.2.2.2 Memory type (12)

Page 61: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Green and ultra-low power memories

They represent the latest achievements of DRAM technology.

Green memories: memories with lower dissipation

Ultra-low-power DDR3 memories: use a 1.35 V supply voltage instead of 1.50 V to reduce dissipation

2.2.2.2 Memory type (13)

Page 62: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Green and ultra-low power memories: Examples [13]

2.2.2.2 Memory type (14)

Page 63: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

c) FB-DIMMs

Figure: DRAMs for general use, by year of introduction

• Asynchronous DRAMs: DRAM (1970), FP (~1974), FPM (1983), EDO (1995)
• Synchronous DRAMs: SDRAM (1996), DDR (2000), DDR2 (2004), DDR3 (2007)
• Further types shown: DRDRAM (1999), XDR (2006) (1), FB-DIMM (2006)

The figure also distinguishes DRAMs with parallel bus connection from DRAMs with serial bus connection, and main stream DRAM types from challenging DRAM types.

(1) Used in the Cell BE and the PlayStation 3, but not yet in desktops or servers

2.2.2.2 Memory type (15)

Page 64: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Principle of operation

• Introduce packet-based serial transmission (as in the PCIe, SATA and SAS buses)

• Introduce full buffering (registered DIMMs buffer only addresses)

• CRC error checking (cyclic redundancy check)

2.2.2.2 Memory type (16)

Page 65: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

The architecture of FB-DIMM memories [19]

2.2.2.2 Memory type (17)

Page 66: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Figure 2.8: Maximum supported FB-DIMM configuration [20] (6 channels, 8 DIMMs per channel)

2.2.2.2 Memory type (18)

Page 67: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Implementation details (1)

• Serial (differential) transmission between the North Bridge and the DIMMs (each bit needs a pair of wires)

• Number of serial links
   • 14 read lanes (2 wires each)
   • 10 write lanes (2 wires each)

• Clocked at 6 x the data rate of the DDR2 devices,
  e.g. for DDR2-667 DRAMs the clock rate is 6 x 667 MHz ≈ 4 GHz

• Every 12 cycles (that is, every two memory cycles) constitute a packet.

• Read packets (frames, bursts): 168 bits (12 x 14 bits)
   • 144 data bits (equals the number of data bits produced by a 72-bit wide DDR2 module (64 data bits + 8 ECC bits) in two memory cycles)
   • 24 CRC bits

• Write packets (frames, bursts): 120 bits (12 x 10 bits)
   • 98 payload bits
   • 22 CRC bits
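The frame sizes above follow directly from the lane counts and the 12 link cycles per packet. A minimal Python sketch of this arithmetic is given below; the constant and function names are illustrative assumptions.

```python
READ_LANES = 14        # read lanes (2 wires each)
WRITE_LANES = 10       # write lanes (2 wires each)
CYCLES_PER_FRAME = 12  # link cycles that constitute one packet
CLOCK_MULTIPLIER = 6   # link clock relative to the DDR2 data rate


def frame_bits(lanes: int, cycles: int = CYCLES_PER_FRAME) -> int:
    """Bits carried by one frame: one bit per lane per link cycle."""
    return lanes * cycles


def link_rate_mhz(ddr2_data_rate: int) -> int:
    """Link clock rate for a given DDR2 data rate (e.g. 667 for DDR2-667)."""
    return CLOCK_MULTIPLIER * ddr2_data_rate


if __name__ == "__main__":
    read_bits = frame_bits(READ_LANES)
    write_bits = frame_bits(WRITE_LANES)
    # Frame size and the bits left for CRC after 144 data bits / 98 payload bits
    print(read_bits, read_bits - 144)
    print(write_bits, write_bits - 98)
    # Link clock for DDR2-667, roughly 4 GHz
    print(link_rate_mhz(667), "MHz")
```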

2.2.2.2 Memory type (19)

Page 68: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Implementation details (2)

The 98 payload bits of a write frame consist of

• 2 frame type bits,

• 24 bits of command,

• 72 bits for data and commands, according to the frame type, e.g. 72 bits of data, 36 bits of data + one command, or two commands.

Commands

• All commands include a 3-bit FB-DIMM module address to select one of 8 modules.
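How the three payload fields add up to 98 bits can be sketched as a simple bit-packing routine. This is only an illustration: the field order (frame type first, then command, then data) and the function names are assumptions, since the slides give only the field widths.

```python
FRAME_TYPE_BITS = 2
COMMAND_BITS = 24
DATA_BITS = 72
PAYLOAD_BITS = FRAME_TYPE_BITS + COMMAND_BITS + DATA_BITS  # 98 bits in total


def pack_payload(frame_type: int, command: int, data: int) -> int:
    """Pack the three fields into one integer (field order is an assumption)."""
    fields = (
        (frame_type, FRAME_TYPE_BITS),
        (command, COMMAND_BITS),
        (data, DATA_BITS),
    )
    payload = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit into {width} bits")
        payload = (payload << width) | value
    return payload


def unpack_payload(payload: int) -> tuple[int, int, int]:
    """Inverse of pack_payload: returns (frame_type, command, data)."""
    data = payload & ((1 << DATA_BITS) - 1)
    command = (payload >> DATA_BITS) & ((1 << COMMAND_BITS) - 1)
    frame_type = payload >> (DATA_BITS + COMMAND_BITS)
    return frame_type, command, data
```

A 3-bit module address, as carried by every command, can distinguish 2^3 = 8 modules, which matches the maximum of 8 FB-DIMMs per channel.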

2.2.2.2 Memory type (20)

Page 69: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Figure 2.9: Different implementations of FB-DIMMs

FB-DIMM-4300 (DDR2-533 SDRAM): clock speed 133 MHz, data rate 532 MHz, throughput 4300 MB/s

FB-DIMM-5300 (DDR2-667 SDRAM): clock speed 167 MHz, data rate 667 MHz, throughput 5300 MB/s

FB-DIMM-6400 (DDR2-800 SDRAM): clock speed 200 MHz, data rate 800 MHz, throughput 6400 MB/s

Source: PCStats

FB-DIMM data buffer (Advanced Memory Buffer, AMB): manages the read/write operations of the module
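The throughput figures in the module names can be approximated from the data rate, assuming a 64-bit (8-byte) data path per module as on standard DDR2 DIMMs. The sketch below uses an illustrative function name; the module names round the result to the nearest hundred.

```python
def peak_throughput_mb_s(data_rate_mts: int, data_bits: int = 64) -> int:
    """Peak throughput in MB/s: transfers per second times bytes per transfer.

    The 64-bit data path is an assumption (ECC bits are not counted).
    """
    return data_rate_mts * data_bits // 8


if __name__ == "__main__":
    for rate in (533, 667, 800):
        exact = peak_throughput_mb_s(rate)
        # Module names use the value rounded to the nearest hundred
        print(f"DDR2-{rate}: {exact} MB/s, named FB-DIMM-{round(exact, -2)}")
```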

2.2.2.2 Memory type (22)

Page 70: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

(There are two Command/Address buses (C/A) to limit loads of 9 to 36 DRAMs)

Figure 2.10: Block diagram of the AMB [21]

2.2.2.2 Memory type (23)

Page 71: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Necessary routing to connect the north bridge to the DIMM socket

a) In case of a DDR2 DIMM (240 pins): a 3-layer PCB is needed

b) In case of an FB-DIMM (69 pins): a 2-layer PCB is needed (but a third layer is used for power lines)

Figure 2.11: PCB routing [19]

2.2.2.2 Memory type (24)

Page 72: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Assessing benefits and drawbacks of FB-DIMM memories (as compared to DDR2/3 memories)

Benefits of FB-DIMMs

• higher memory size and bandwidth
   • more DIMM modules (up to 8) per channel
   • more memory channels (up to 6)
   • higher memory size (6 x 8 = 48 DIMMs; assuming 8 GB/DIMM, up to 384 GB)

• same bandwidth figures as the DDR2 parts they are based on

Drawbacks of FB-DIMMs

• higher latency

• higher cost

• higher dissipation
  (Typical dissipation figures: DDR2: about 5 W, AMB: about 5 W, FB-DIMM with DDR2: about 10 W)

2.2.2.2 Memory type (25)

Page 73: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Latency [22]

• Due to their additional serialization tasks and daisy-chained nature FB-DIMMs have about 15 % higher overall average latency than DDR2 memories.

Production

The production of FB-DIMMs stopped with DDR2-800 modules; no DDR3-based modules came to the market due to the drawbacks of the technology.

2.2.2.2 Memory type (26)

Page 74: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

2.2.2.3 Speed grades (1)

Overview of the speed grades of DDR DRAMs

Figure: Speed grades of DDR DRAMs and the bandwidth of a dual channel memory subsystem [12]

2.2.2.3 Speed grades

Page 75: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Remark

Speed grades of FSBs and DRAMs were defined at the time when the base clock frequency of the FSBs was 133 MHz (around 2000).

Subsequent speed grades of FSBs and also those of the memories were then chosen as integral multiples of 133 MHz (more precisely 133.33 MHz), such as

266 ≈ 2 x 133, 400 ≈ 3 x 133, 533 ≈ 4 x 133, 667 ≈ 5 x 133, 800 ≈ 6 x 133, 1067 ≈ 8 x 133, 1333 ≈ 10 x 133, 1600 ≈ 12 x 133, etc.
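As a rough check of the relation between the speed grades and the 133 MHz base clock, the following Python sketch computes the nearest integral multiple for each grade. It assumes the base clock is exactly 400/3 MHz (133.33 MHz); the names are illustrative.

```python
BASE_MHZ = 400 / 3  # assumed exact base clock, about 133.33 MHz

SPEED_GRADES = (266, 400, 533, 667, 800, 1067, 1333, 1600)


def base_clock_multiple(speed_grade: int) -> int:
    """Nearest integral multiple of the base clock for a given speed grade."""
    return round(speed_grade / BASE_MHZ)


if __name__ == "__main__":
    for grade in SPEED_GRADES:
        multiple = base_clock_multiple(grade)
        print(f"{grade} ~= {multiple} x 133 (exactly {multiple * BASE_MHZ:.1f})")
```

Under this assumption the grades up to 800 use the multiples 2 to 6, while 1067, 1333 and 1600 correspond to the multiples 8, 10 and 12.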

2.2.2.3 Speed grades (2)

Page 76: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Rate of increasing the transfer rates in synchronous DRAMs

Figure 2.12: The evolution of peak transfer rates of parallel connected synchronous DRAMs as manifested in Intel’s chipsets
(Axes: year, 1996 to 2008; transfer rate in MT/s, logarithmic scale from 10 to 5000. Data points: SDRAM 66, SDRAM 100, SDRAM 133, DDR 266, DDR 333, DDR 400, DDR2 533, DDR2 667, DDR2 800, DDR3 1333, DDR3 1600. Trend: ~10×/10 years.)

2.2.2.3 Speed grades (3)

Page 77: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Memory speed grades used in Intel’s multicore systems

Kind of attaching memory (in Intel’s MC systems, typically)

• Attaching memory by parallel channels
   • Memory is attached to the MCH: up to DDR2-667
   • Memory is attached to the processor(s): up to DDR3-1600

• Attaching memory by serial channels
   • Using FB-DIMMs: up to DDR2-667
   • Using serial channels with S/P converters: up to DDR3-1600/2133

2.2.2.3 Speed grades (4)

Page 78: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

a) Device density

Figure 2.13: Evolution of DRAM densities (Mbit) and no. of units shipped/year (Based on [23])
(Axes: year, 1980 to 2015; device density, 16K to 1G; units shipped, 500 to 2000 x 10^6. Trend: density grows ~4×/4Y.)

2.2.2.4. DIMM density

2.2.2.4 DIMM density (1)

Page 79: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

b) DIMM (module) density

Based on device densities of 1 to 8 Gb and typical device widths of x4 to x16 (bits), DDR2 or DDR3 modules provide typical densities of up to 8 or 16 GB.
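How device density and device width combine into a module density can be sketched as follows. The 64-bit module data bus and the number of ranks are assumptions introduced for illustration (ECC devices are not counted); the function name is hypothetical.

```python
def module_capacity_gb(device_density_gbit: int, device_width_bits: int,
                       ranks: int = 1, data_bus_bits: int = 64) -> float:
    """Module capacity in GB for a given device density and width.

    Assumes a 64-bit data bus filled by identical devices in each rank.
    """
    devices_per_rank = data_bus_bits // device_width_bits
    total_gbit = devices_per_rank * ranks * device_density_gbit
    return total_gbit / 8  # 8 bits per byte


if __name__ == "__main__":
    # 4 Gb x8 devices, two ranks: 8 devices per rank
    print(module_capacity_gb(4, 8, ranks=2), "GB")
    # 4 Gb x4 devices, two ranks: 16 devices per rank
    print(module_capacity_gb(4, 4, ranks=2), "GB")
```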

2.2.2.4 DIMM density (2)

Page 80: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

2.2.2.5 Use of ECC support

2.2.2.5 Use of ECC support (1)

ECC basics (as used in DIMMs)

Implemented as SEC-DED (Single Error Correction, Double Error Detection)

Single bit error correction

For D data bits, P check bits are added.

Figure: The code word (D data bits followed by P check bits)

What is the minimum number of check bits (P) for single bit error correction?

Requirement: $2^P \ge$ the minimum number of states to be distinguished.

Page 81: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

The minimum number of states to be distinguished:

• The bit position of a possible single bit error needs to be specified in the code word consisting of both data and check bits. This requires D + P states.

• One additional state is needed to specify the "no error" state.

Accordingly, the minimum number of states to be distinguished is D + P + 1, and to implement single bit error correction the minimum number of check bits (P) needs to satisfy the requirement:

$2^P \ge D + P + 1$

2.2.2.5 Use of ECC support (2)

Page 82: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Double bit error detection

An additional parity bit is needed to check for an additional error.

Then the minimum number of check bits (CB) needed for SEC-DED is:

CB = P + 1

Since P = CB - 1 and $2^P \ge D + P + 1$:

$2^{CB-1} \ge D + (CB - 1) + 1$, that is, $2^{CB-1} \ge D + CB$

Table 2.5: The number of check bits (CB) needed for D data bits

Data bits (D)   Check bits (CB)
1               3
2-4             4
5-11            5
12-26           6
27-57           7
58-120          8
121-247         9
248-502         10
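The check-bit requirement can be evaluated with a few lines of Python. This is a minimal sketch with illustrative function names: it searches for the smallest P that satisfies 2^P ≥ D + P + 1 and adds one parity bit for double error detection.

```python
def sec_check_bits(data_bits: int) -> int:
    """Smallest P with 2**P >= D + P + 1 (single error correction)."""
    p = 1
    while 2 ** p < data_bits + p + 1:
        p += 1
    return p


def sec_ded_check_bits(data_bits: int) -> int:
    """Check bits for SEC-DED: one extra parity bit on top of SEC (CB = P + 1)."""
    return sec_check_bits(data_bits) + 1


if __name__ == "__main__":
    for d in (8, 16, 32, 64, 128):
        print(f"D = {d}: CB = {sec_ded_check_bits(d)}")
```

For D = 64 the result is 8 check bits, which corresponds to the 72-bit wide ECC modules (64 data bits + 8 ECC bits).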

2.2.2.5 Use of ECC support (3)

Page 83: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Supported memory features of DT and DP/MP platforms

DT memories typically do not support ECC or registered (buffered) DIMMs.

Servers typically make use of registered DIMMs with ECC protection.

2.2.2.5 Use of ECC support (4)

Page 84: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Typical implementation of ECC protected registered DIMMs (used typically in servers)

Figure 2.14: Typical layout of a registered memory module with ECC [14]
(Labels in the figure: ECC, Register, Register, PLL)

Main components

• Two register chips, for buffering the address and command lines
• A PLL (Phase Locked Loop) unit for deskewing clock distribution.

2.2.2.5 Use of ECC support (5)

Page 85: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

2.2.2.6 Use of registering (1)

Problems arising while implementing higher memory capacities

Higher memory capacities need more modules, which causes higher loading of the lines and thus signal integrity problems.

Remedy: buffering address and command lines, phase locked clocking of the modules

2.2.2.6 Use of registering

Page 86: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Registering

Principle: buffering address and control lines

• to reduce signal loading in a memory channel
• in order to increase the number of supported DIMM slots (max. memory capacity), needed first of all in servers.

2.2.2.6 Use of registering (2)

Page 87: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Example: Block diagram of a registered DDR DIMM

Figure 2.17: Example. Block diagram of a registered DDR DIMM [16]
(Components shown: nine SDRAM devices; two PI74SSTV16857 registers, each receiving address/control signals from the motherboard; a PI6CV857 PLL receiving the input clock from the motherboard; data lines running directly from/to the motherboard.)

2.2.2.6 Use of registering (3)

Page 88: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Implementation of registering

By means of a register chip that buffers address and control lines

Figure 2.15: Registered signals in case of an SDRAM memory module [15]
(REGE: Register enable signal)

Note: Data (DQ) and data strobe (DQS) signals are not registered, as only address and control signals are common for all memory chips.

2.2.2.6 Use of registering (4)

Page 89: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Number of register chips required

• Synchronous memory modules (SDRAM to DDR3 DIMMs) have about 20 – 30 address and control lines,

• Register chips usually buffer 14 lines.

Therefore, typically two register chips are needed per memory module [16].

2.2.2.6 Use of registering (5)

Page 90: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Typical layout of registered DIMMs

Figure 2.16: Typical layout of a registered memory module with ECC [14]
(Labels in the figure: ECC, Register, Register, PLL)

• Two register chips, for buffering the address and command lines
• A PLL (Phase Locked Loop) unit for deskewing clock distribution.

2.2.2.6 Use of registering (6)

Page 91: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.


Page 92: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Figure 2.18: Registered DIMM module with ECC [14]

Registered DIMM module with ECC

ECC

2.2.2.6 Use of registering (8)

Page 93: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Typical use of registered DIMMs (RDIMMs): in servers (memory capacities: a few tens of GB to a few hundreds of GB)

Typical use of unregistered DIMMs (UDIMMs): in desktops/laptops (memory capacities: up to a few GB)

2.2.2.6 Use of registering (9)

Page 94: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

2.3. Buses interconnecting platform components

Page 95: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

2.3 Buses interconnecting platform components (1)

2.3 Buses interconnecting platform components

Use of buses in Intel’s DT/DP and MP platforms

• Buses interconnecting processors (in NUMA topologies)
• Buses interconnecting processors to chipsets
• Buses interconnecting MCHs to ICHs (in 2-part chipsets)

Remark
Buses connecting the memory subsystem with the main body of the platforms are memory specific interfaces and will be discussed in Section 4.

Figure: Nehalem-EX aimed Boxboro-EX scalable DP server platform (for up to 10 cores)
(Components shown: two processors, Xeon 6500 (Nehalem-EX, Becton, 8C) or Xeon E7-2800 (Westmere-EX, 10C), connected by QPI to each other and to the 7500 IOH; the ICH10 attached via ESI; ME; four SMBs per processor, attached by SMI links and serving DDR3-1067 memory.)

SMI: Serial link between the processor and the SMB

SMB: Scalable Memory Buffer with parallel/serial conversion

Page 96: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Implementation of buses used in Intel’s DT/DP and MP platforms

Parallel buses

• FSB (Front Side Bus): 64-bit wide; used to interconnect processors to chipsets in previous platforms
• HI1.5: 8-bit wide; used to interconnect MCHs to ICHs in previous platforms

Serial buses (point-to-point interconnection)

• DMI (Direct Media Interface), ESI (Enterprise System Interface), DMI2 (Direct Media Interface, 2nd generation): 4-bit wide (4 PCIe lanes); used to interconnect processors to chipsets or MCHs to ICHs
• QPI (Quick Path Interconnect), QPI1.1 (Quick Path Interconnect v.1.1): 16-bit wide; used to interconnect processors to processors and processors to chipsets

2.3 Buses interconnecting platform components (2)

Page 97: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Buses used in Intel’s DT/DP/MP platforms

The buses are grouped by use (interconnecting processors in NUMA topologies, processors to chipsets, or MCHs to ICHs in 2-part chipsets) and by type (parallel or serial bus).

Parallel buses

• FSB (64-bit: 1993), processors to chipsets: 64-bit wide, ~150 lines, 3.2-12.8 GB/s total in both directions
• HI 1.5 (1999), MCHs to ICHs: 8-bit wide, 16 lines, 266 MB/s total in both directions

Serial buses

• DMI/ESI (2004) (1), MCHs to ICHs: 4 PCIe lanes, 18 lines, 1 GB/s/direction
• DMI/ESI (2008) (2), processors to chipsets (low-cost systems): 4 PCIe lanes, 18 lines, 1 GB/s/direction
• DMI2 (2011): 4 PCIe lanes, 18 lines, 2 GB/s/direction
• QPI (2008), processors to processors and processors to chipsets (high-performance systems): 20 lanes, 84 lines, 9.6/11.72/12.8 GB/s in each direction
• QPI1.1 (2012?): specification n.a.

2.3 Buses interconnecting platform components (3)

Page 98: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Remarks

(1) DMI: Introduced as an interface between the MCH and the ICH, first along with the ICH6, supporting Pentium 4 Prescott processors, in 2004.

(2) DMI: Introduced as an interface between the processors and the chipset, first between Nehalem-EP and the 34xx PCH, in 2008, after the memory controllers were placed on the processor die.

2.3 Buses interconnecting platform components (4)

Page 99: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Signaling used in buses

Figure 2.4: Signal types used in MMs for control, address and data signals

Signal type / signaling system used / typical voltage swings (voltage swings become smaller from single-ended to differential signaling):

• Single ended: TTL (5 V): FPM/EDO; LVTTL (3.3 V): FPM/EDO, SDRAM, HI1.5. Typ. voltage swings: 3.3-5 V
• Voltage referenced (VREF): SSTL: SSTL2 (DDR), SSTL1.8 (DDR2), SSTL1.5 (DDR3); RSL (RDRAM); FSB. Typ. voltage swings: 600-800 mV
• Differential (S+, S-, VCM): LVDS: PCIe, QPI, DMI, ESI, FB-DIMMs; DRSL: XDR (data). Typ. voltage swings: 200-300 mV

LVDS: Low Voltage Differential Signaling
LVTTL: Low Voltage TTL
(D)RSL: (Differential) Rambus Signaling Level
SSTL: Stub Series Terminated Logic
VCM: Common Mode Voltage
VREF: Reference Voltage

2.3 Buses interconnecting platform components (5)

Page 100: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Main features of parallel buses used in Intel’s multicore platforms

FSB/HI 1.5: Bus type interconnects

Feature | FSB | HI 1.5
Typical use | Connecting the processors and the chipset | Connecting MCH and ICH
Introduced | With the Pentium (1993) | With the Pentium III (1999)
Width | 64 bit | 8 bit
Clock | 100-400 MHz | 66 MHz
DDR/QDR | QDR since Pentium 4 (2000) | QDR
Transfer rate | 400-1600 MT/s | 266 MT/s
Bandwidth | 3.2-12.8 GB/s in both directions altogether | 266 MB/s in both directions altogether
Signaling | Voltage referenced data signals | Single-ended data signals
No. of lines | ~150 lines | ~16 lines
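The bandwidth figures of the parallel buses follow from the bus width and the transfer rate. A minimal Python sketch of this calculation is shown below; the function name is an illustrative assumption.

```python
def bus_bandwidth_mb_s(width_bits: int, transfer_rate_mts: int) -> int:
    """Peak bandwidth in MB/s: bytes per transfer times transfers per second."""
    return width_bits * transfer_rate_mts // 8


if __name__ == "__main__":
    # 64-bit FSB at the lowest and highest QDR transfer rates
    print("FSB:", bus_bandwidth_mb_s(64, 400), "to", bus_bandwidth_mb_s(64, 1600), "MB/s")
    # 8-bit HI 1.5 at 266 MT/s (66 MHz, QDR)
    print("HI 1.5:", bus_bandwidth_mb_s(8, 266), "MB/s")
```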

2.3 Buses interconnecting platform components (6)

Page 101: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Main features of serial buses used in Intel’s platforms

DMI/QPI: Point-to-point interconnection

Feature | DMI/ESI | DMI2 | QPI | QPI 1.1
Typical use | To interconnect MCHs and ICHs or processors to chipsets in NUMA platforms (DMI/ESI, DMI2) | To interconnect processors in NUMA topologies or processors to chipsets (QPI, QPI 1.1)
Introduced | In connection with 2. gen. Nehalem in 2008 | In connection with Sandy Bridge in 2011 | In connection with Nehalem-EP in 2008 | In connection with Sandy Bridge in 2012 (?)
Width | 4 PCIe lanes | 4 PCIe 2.0 lanes | 20 lanes | No specification available yet
Clock | 2.5 GHz | 5 GHz | 2.4/2.93/3.2 GHz |
DDR | - | - | DDR |
Encoding | 8b/10b | 8b/10b | no |
Bandwidth/direction | 1 GB/s | 2 GB/s | 9.6/11.72/12.8 GB/s |
Signaling | LVDS | LVDS | LVDS |
No. of lines | 18 lines | 18 lines | 84 lines |
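The bandwidth per direction of the serial buses can be reproduced from the lane count, the clock and the encoding. The sketch below is an illustration with hypothetical function names; for QPI it assumes that 16 of the 20 lanes carry data (as in the QPI signal overview of this section) and that data is transferred on both clock edges.

```python
def pcie_style_gb_s(lanes: int, line_rate_ghz: float,
                    payload_bits: int = 8, coded_bits: int = 10) -> float:
    """Bandwidth per direction in GB/s for DMI-type links with 8b/10b encoding."""
    useful_gbit_s = lanes * line_rate_ghz * payload_bits / coded_bits
    return useful_gbit_s / 8  # 8 bits per byte


def qpi_gb_s(clock_ghz: float, data_lanes: int = 16,
             transfers_per_clock: int = 2) -> float:
    """Bandwidth per direction in GB/s for QPI (no encoding overhead, DDR)."""
    return clock_ghz * transfers_per_clock * data_lanes / 8


if __name__ == "__main__":
    print(f"DMI/ESI: {pcie_style_gb_s(4, 2.5):.1f} GB/s")
    print(f"DMI2:    {pcie_style_gb_s(4, 5.0):.1f} GB/s")
    for clock in (2.4, 2.93, 3.2):
        print(f"QPI at {clock} GHz: {qpi_gb_s(clock):.2f} GB/s")
```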

2.3 Buses interconnecting platform components (7)

Page 102: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Comparing main features of Intel’s FSB and QPI [9]

2.3 Buses interconnecting platform components (8)

GTL+: A kind of voltage referenced signaling

Page 103: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Figure 2.5: LVDS Single Link Interface Circuit [10]

Principle of LVDS signal transmission used in serial buses

2.3 Buses interconnecting platform components (9)

Page 104: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

PCIe packet format (data frames)

Figure: PCI Express Data Frame [10]

The related fields are:

Field    Interpretation
Frame    1-byte Start-of-Frame / End-of-Frame
Seq#     2-byte Sequence Number
Header   16- or 20-byte Header
Data     0-4096-byte Data field
CRC      4-byte ECRC (End-to-End CRC) + 4-byte LCRC (Link CRC) (CRC: Cyclic Redundancy Check)

2.3 Buses interconnecting platform components (10)

Page 105: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

Principle of the QuickPath Interconnect bus (QPI bus)

Figure 2.6: Signals of the QuickPath Interconnect bus (QPI-bus) [11]
(Two unidirectional links, TX and RX; each is labeled 16 data, 2 protocol, 2 CRC.)

2.3 Buses interconnecting platform components (11)

Page 106: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

5. References

Page 107: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

5. References (1)

[1]: Wikipedia: Centrino, http://en.wikipedia.org/wiki/Centrino

[2]: Industry Uniting Around Intel Server Architecture; Platform Initiatives Complement Strong Intel IA-32 and IA-64 Targeted Processor Roadmap for 1999, Business Wire, Febr. 24 1999, http://www.thefreelibrary.com/Industry+Uniting+Around+Intel+Server+Architecture%3B+Platform...-a053949226

[3]: Intel Core 2 Duo Processor, http://www.intel.com/pressroom/kits/core2duo/

[4]: Keutzer K., Malik S., Newton R., Rabaey J., Sangiovanni-Vincentelli A., System Level Design: Orthogonalization of Concerns and Platform-Based Design, IEEE Transactions on Computer-Aided Design of Circuits and Systems, Vol. 19, No. 12, Dec. 2000, pp. 1-29.

[5]: Krazit T., Intel Sheds Light on 2005 Desktop Strategy, IDG News Service, Dec. 07 2004, http://pcworld.about.net/news/Dec072004id118866.htm

[6]: Perich D., Intel Volume platforms Technology Leadership, Presentation at HP World 2004, http://98.190.245.141:8080/Proceed/HPW04CD/papers/4194.pdf

[7]: Powerful New Intel Server Platforms Feature Array Of Enterprise-Class Innovations, Intel’s press release, Aug. 2, 2004, http://www.intel.com/pressroom/archive/releases/2004/20040802comp.htm

[8]: Smith S., Multi-Core Briefing, IDF Spring 2005, San Francisco, Press presentation, March 1 2005, http://www.silentpcreview.com/article224-page2

[9]: An Introduction to the Intel QuickPath Interconnect, Jan. 2009, http://www.intel.com/content/dam/doc/white-paper/quick-path-interconnect-introduction-paper.pdf

[10]: Davis L., PCI Express Bus, http://www.interfacebus.com/PCI-Express-Bus-PCIe-Description.html

Page 108: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

5. References (2)

[11]: Ng P. K., "High End Desktop Platform Design Overview for the Next Generation Intel Microarchitecture (Nehalem) Processor," IDF Taipei, TDPS001, 2008, http://intel.wingateweb.com/taiwan08/published/sessions/TDPS001/FA08%20IDF-Taipei_TDPS001_100.pdf

[12]: Computing DRAM, Samsung.com, http://www.samsung.com/global/business/semiconductor/products/dram/Products_ComputingDRAM.html

[13]: Samsung’s Green DDR3 – Solution 3, 20nm class 1.35V, Sept. 2011, http://www.samsung.com/global/business/semiconductor/Greenmemory/Downloads/Documents/downloads/green_ddr3_2011.pdf

[14]: DDR SDRAM Registered DIMM Design Specification, JEDEC Standard No. 21-C, Page 4.20.4-1, Jan. 2002, http://www.jedec.org

[15]: Datasheet, http://download.micron.com/pdf/datasheets/modules/sdram/SD9C16_32x72.pdf

[16]: Solanki V., "Design Guide Lines for Registered DDR DIMM Module," Application Note AN37, Pericom, Nov. 2001, http://www.pericom.com/pdf/applications/AN037.pdf

[17]: Fisher S., "Technical Overview of the 45 nm Next Generation Intel Core Microarchitecture (Penryn)," IDF 2007, ITPS001, http://isdlibrary.intel-dispatch.com/isd/89/45nm.pdf

[18]: Razin A., Core, Nehalem, Gesher. Intel: New Architecture Every Two Years, Xbit Laboratories, 04/28/2006, http://www.xbitlabs.com/news/cpu/display/20060428162855.html

[19]: Haas J. & Vogt P., "Fully buffered DIMM Technology Moves Enterprise Platforms to the Next Level," Technology Intel Magazine, March 2005, pp. 1-7

Page 109: Dezső Sima 2012 December (Ver. 1.6)  Sima Dezső, 2012 Platforms I.

5. References (3)

[20]: "Introducing FB-DIMM Memory: Birth of Serial RAM?," PCStats, Dec. 23, 2005, http://www.pcstats.com/articleview.cfm?articleid=1812&page=1

[21]: McTague M. & David H., "Fully Buffered DIMM (FB-DIMM) Design Considerations," Febr. 18, 2004, Intel Developer Forum, http://www.idt.com/content/OSA-S009.pdf

[22]: Ganesh B., Jaleel A., Wang D., Jacob B., Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling, 2007,

[23]: DRAM Pricing – A White Paper, Tachyon Semiconductors, http://www.tachyonsemi.com/about/papers/DRAM%Pricing.pdf