Dezső Sima 2011 November (Ver. 1.4) Sima Dezső, 2011 Platforms I.


Contents

1. Introduction to platforms

2. Main components of platforms

3. Platform architectures

4. Memory subsystem design considerations

5. References


1. Introduction to platforms

1.1. The notion of platform

1.2. Description of particular platforms

1.3. Representation forms of platforms

1.4. Compatibility of platform components

1.1. The notion of platform

The notion of platform is widely used in different segments of the IT industry, e.g. by IC manufacturers, system providers and even software suppliers, with different interpretations. Here we focus on the platform concept as typically used by system providers.

1.1 The notion of platform (1)

System providers, however, may use the notion of platform either in a more general or in a more specific sense.

Interpretation of the notion of platform

• In a more general sense: unified system design.

• In a more specific sense: a particular unified system architecture, developed for a given application area, such as a DT or MP platform.

[Figure: Core 2 Duo/Core 2 Extreme (2C) processors connected via the FSB (1066/800/533 MT/s) to the 965 Series MCH (with ME), which provides two DDR2-800/666/533 memory channels with two DIMMs per channel and is linked to the ICH8 via DMI and C-link.]

Intel’s Core 2 Duo (and Core 2 Extreme, the highest speed model) aimed DT platform (the Bridge Creek platform)

Unified system design means that the system architecture is partitioned into a small number of standard components, such as the processor, the memory control hub (MCH) and the I/O control hub (ICH), which are interconnected by specified (standard) interconnections.

Thus the notion of platform designates system architectures with a unified design in the above sense.

1.1 The notion of platform (2)

Interpretation of the notion of platform in a more general sense

Remark

The need for a unified system design, called platform design, arose in the PC industry at the time when PCI-based system designs were replaced by port-based system designs, around 1998-1999.

1.1 The notion of platform (3)

[Figure: Late PCI-based system architecture (~1998), used typically with Pentium II/III and built around Intel’s 440xx chipset. Components: Pentium II/Pentium III processors on the processor bus; system controller with main memory (EDO/SDRAM) and AGP; PCI bus with PCI device adapters; peripheral controller with 2x IDE/ATA33/66 and 2x USB; ISA bus with ISA device adapters (legacy and/or slow devices).]

[Figure: Early port-based system architecture (~1999), used first with Pentium III and built around Intel’s 810 chipset. Components: Pentium III processor on the processor bus; system controller with main memory (SDRAM) and AGP, connected via the hub interface to the peripheral controller with 2x IDE/ATA 33/66/100, 2x/4x USB, AC'97 and LPC (Super I/O: KBD, MS, etc.); PCI bus with PCI device adapters; PCI to ISA bridge and ISA bus with ISA device adapters (legacy devices).]

1.1 The notion of platform (4)

In a more specific sense the notion of platform refers to a particular unified system architecture that is developed for a given application area, such as a DT, DP or MP platform.

In this sense the notion of platform is interpreted as a standardized backbone of a system architecture developed for a given application area, typically built up of

• the processor or processors,
• the chipset,
• in some cases, such as in mobile or business oriented DT platforms, also the networking component [7],
• the buses interconnecting the above components of the platform, and
• the memory subsystem (MSS), attached by a specific memory interface.

Subsequently, we will focus on the interpretation of the notion of platform in this latter sense.

[Figure: Basic components of a platform: the processor or processors, the chipset, (the LAN controller), the memory subsystem and the buses interconnecting the preceding basic components.]

1.1 The notion of platform (6)

Interpretation of the notion of platform in a more specific sense

The primary goals of introducing unified system designs are

• to minimize design rework while moving from one processor generation to the next,
• to stabilize interfaces for server and desktop designs [2] and
• to shorten the time to market.

1.1 The notion of platform (5)

Example 1: Intel’s Core 2 aimed home user DT platform (Bridge Creek) [3]

[Figure: block diagram of the platform (labels: 1066 MT/s, 2 DIMMs/channel, C-link, display card)]

1.1 The notion of platform (8)

Example 2: Intel’s Nehalem-EX aimed Boxboro-EX MP server platform, assuming 1 IOH

[Figure: four Xeon 7500 (Nehalem-EX, Beckton, 8C) or Xeon E7-4800 (Westmere-EX, 10C) processors interconnected by QPI links, with QPI links to the 7500 IOH (with ME), which is linked to the ICH10 via ESI; DDR3-1067 memory is attached via SMI channels (2x4 SMI channels) and SMBs. The figure marks the platform and the interfaces connecting the platform components.]

SMI: Serial link between the processors and SMBs
SMB: Scalable Memory Buffer (parallel/serial conversion)
ME: Management Engine

1.1 The notion of platform (9)

The structure of a platform is termed its architecture (or topology).

It describes the basic components and their interconnections and will be discussed in Section 3.

1.1 The notion of platform (9)

Main goals of system level design are

• to reduce the complexity of designing complex systems by partitioning them,
• in this way, to reduce the time-to-market of products,
• to be able to enhance system components (such as processors) in an upward compatible way as long as the same interfaces (e.g. an FSB with a given max. frequency) are used.

• The platform concept and platform based design will be considered as part of system level design.
• It became a topic of scientific research at the end of the 1990s, see e.g. [4].

Many facets of the platform concept

The platform concept as seen from the point of view of the system developers

Co-design of platform components

Platform components are typically co-designed, announced and delivered as a set.

1.1 The notion of platform (10)

• With the platform concept in mind, manufacturers like Intel or AMD plan, design and market all key components of a platform, such as the processor or processors and the related chipset, as an integrated entity [5].

• This is beneficial for the manufacturers since it motivates OEMs, as system providers, to buy all key parts of a computer system from the same manufacturer.

The platform concept as seen from the point of view of the manufacturers

1.1 The notion of platform (11)

The platform concept as seen from the point of view of the customers

The platform concept is beneficial for the customers as well, since an integrated “backbone” of a system architecture promises a more reliable and more cost-effective system.

1.1 The notion of platform (12)

Historical remarks

System providers began using the notion “platform” around 2000, for example:

• Philips’ Nexperia digital video platform (1999),
• Texas Instruments’ (TI) OMAP platform for SOCs (2002),
• Intel’s first generation mobile oriented Centrino platform for laptops, designated as the Carmel platform (3/2003).

Intel contributed significantly to spreading the notion of platform when, based on the success of their Centrino platform, they introduced this concept also for their desktops [5] and servers [6], [7] in 2004.

1.1 The notion of platform (13)

Intel’s early server and workstation roadmap from Aug. 2004 [6]

Note

a) This roadmap already makes use of the notion of platform without revealing platform names.
b) In 2004 Intel made a transition from 32-bit systems to 64-bit systems.

1.1 The notion of platform (14)

Intel’s multicore platform roadmap announced at the IDF Spring 2005 [8]

Note

This roadmap also includes the particular platform designations for desktops, UP servers etc.

1.1 The notion of platform (15)


1.2. Description of a particular platform

Description of a particular platform

• Detailing the platform architecture

Example: The Tylersburg DT platform (2008)

1.2 Description of a particular platform (1)

[Figure: the Tylersburg platform with its processor, MCH and ICH]

Detailing the platform architecture includes the architecture of the processor, memory and I/O subsystems (to be discussed in Section 3).

1.2 Description of a particular platform (2)

Example: The Tylersburg DT platform (2008)

[Figure: the Tylersburg platform with its processor, MCH and ICH]

It is concerned with issues such as whether the processors of an MP server are connected to the MCH via an FSB or otherwise, or whether the memory is attached to the system architecture through the MCH or through the processors.

Description of a particular platform

• Detailing the platform architecture
• Identification of the platform components

Example: The Tylersburg DT platform (2008)

[Figure: 1st gen. Nehalem (4C)/Westmere-EP (6C) processor (Processor), X58 IOH (MCH) and ICH10 (ICH)]

1.2 Description of a particular platform (3)

Description of a particular platform

• Detailing the platform architecture
• Identification of the platform components
• Specification of the interfaces interconnecting the platform components

Example: The Tylersburg DT platform (2008)

[Figure: 1st gen. Nehalem (4C)/Westmere-EP (6C) processor (Processor) connected via QPI to the X58 IOH (MCH), which is connected via DMI to the ICH10 (ICH)]

1.2 Description of a particular platform (4)

Remark

The specification of a platform is completed by the datasheets of the related platform components.

1.2 Description of a particular platform (5)
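The three steps of describing a particular platform (detailing its architecture, identifying its components and specifying the interfaces between them) can be captured in a small data model. The Python sketch below is purely illustrative: the classes `Component`, `Link` and `Platform` and their methods are hypothetical and not an Intel format; the data are those of the Tylersburg example.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical model of a platform description: identified components
# plus the interfaces (buses) that interconnect them.


@dataclass(frozen=True)
class Component:
    name: str
    role: str  # e.g. "processor", "MCH" or "ICH"


@dataclass(frozen=True)
class Link:
    bus: str  # interface name, e.g. "QPI" or "DMI"
    a: str    # name of one connected component
    b: str    # name of the other connected component


@dataclass
class Platform:
    name: str
    components: list[Component] = field(default_factory=list)
    links: list[Link] = field(default_factory=list)

    def check_links(self) -> None:
        """Raise if an interface refers to a component that has not been identified."""
        names = {c.name for c in self.components}
        for link in self.links:
            for end in (link.a, link.b):
                if end not in names:
                    raise ValueError(f"{link.bus} link refers to unknown component {end!r}")

    def interface_between(self, x: str, y: str) -> str | None:
        """Return the bus directly connecting two components, or None."""
        for link in self.links:
            if {link.a, link.b} == {x, y}:
                return link.bus
        return None


CPU = "1st gen. Nehalem (4C)/Westmere-EP (6C)"
tylersburg = Platform(
    name="Tylersburg DT platform (2008)",
    components=[
        Component(CPU, "processor"),
        Component("X58 IOH", "MCH"),
        Component("ICH10", "ICH"),
    ],
    links=[Link("QPI", CPU, "X58 IOH"), Link("DMI", "X58 IOH", "ICH10")],
)
tylersburg.check_links()
print(tylersburg.interface_between(CPU, "X58 IOH"))  # QPI
```

Keeping components and links separate mirrors the distinction between identifying the platform components and specifying their interfaces; `check_links` catches an interface that points to a component that was never identified.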

Dependence of the platform architecture on the platform category

Platforms may be classified according to the target area of application, such as

• desktop (DT) platforms,
• mobile platforms,
• dual processor (DP) platforms and
• quad processor (MP) platforms.

In conformity with the different platform categories, different platform architectures also arise: the architectures of DT, mobile, DP and MP platforms.

Of course, beyond the above categories further processor categories and related platforms also exist, such as embedded processors and related platforms.

In these slides platform architectures will be discussed in Section 3, restricted, however, to DT, DP and MP platforms.

1.2 Description of a particular platform (6)


1.3. Representation forms of platforms

1.3 Representation forms of platforms (1)

a) Thumbnail representation
b) Roadmap-like representation (an arbitrarily chosen representation form in these slides)
c) Block diagram of a platform

a) Thumbnail representation

It is a concise representation of a particular platform. In particular, the thumbnail representation

• reveals the platform architecture,

• identifies the basic components of a platform, such as the processor or processors, the chipset and, in some cases (e.g. in mobile platforms), also the Gigabit Ethernet controller,

• and specifies the interconnection links (buses) between the platform components.

Example: Intel’s Core 2 Duo aimed home user oriented platform (the Bridge Creek platform)

[Figure: Core 2 Duo/Core 2 Extreme (2C) processors, FSB at 1066/800/533 MT/s, 965 Series MCH (with ME), two DDR2-800/666/533 channels with two DIMMs per channel, DMI and C-link to the ICH8]

1.3 Representation forms of platforms (3)

Example for stating the compatibility range of a platform

The Core 2 Duo aimed DT platform that targets home users (designated as the Bridge Creek platform).

[Figure: thumbnail of the platform: Core 2 Duo/Core 2 Extreme (2C) processors, FSB at 1066/800/533 MT/s, 965 Series MCH (with ME), two DDR2-800/666/533 channels with two DIMMs per channel, DMI and C-link to the ICH8]

[Roadmap: Bridge Creek DT platform, Core 2-aimed (65 nm)
• DT core (7/2006): Core 2 Duo (2C): E6xxx/E4xxx, Core 2 Extreme (2C): X6800; E6xxx/X6800¹: Conroe, E4xxx¹: Allendale; 65 nm; Conroe: 291 mtrs/143 mm², Allendale: 167 mtrs/111 mm²; L2: Conroe 4 MB, Allendale 2 MB; FSB: X6800/E6xxx 1066 MT/s, E4xxx 800 MT/s; LGA775
• MCH (6/2006): 965 Series (Broadwater); FSB 1066/800/533 MT/s; 2 DDR2 channels, DDR2-800/666/533, 4 ranks/channel, 8 GB max.
• ICH (6/2006): ICH8]

Beyond the target processor this platform may also be used with

• the previous Pentium D/EE and Pentium 4 6x0/6x1/EE and

• the subsequent Core 2 Quad lines of processors,

as shown in the next slides.

¹ The Allendale is a later stepping (steppings L2/M0) of the Core 2 (steppings B2/G0) that typically provided only 2 MB L2 and appeared in 1/2007.

1.3 Representation forms of platforms (5)


Support of Pentium 4/D/EE processors

[Roadmap: earlier DT cores supported by the Bridge Creek platform, shown together with its Core 2-aimed row (Core 2 Duo/Core 2 Extreme, 965 Series (Broadwater) MCH, ICH8):
• Pentium 4 6x0/6x1/EE (Prescott-2M) 1C (2/2005): 90 nm, 169 mtrs, 135 mm², 2 MB L2, 800 MT/s, two-way multithreading, LGA775
• Pentium D/EE 8xx¹ (Smithfield) 2x1C (5/2005): 90 nm, 2x115 mtrs, 2x103 mm², 2x1 MB L2, 800/533 MT/s, no multithreading, LGA775
• Pentium D/EE 9xx²,³ (Presler) 2x1C (1/2006): 65 nm, 2x188 mtrs, 2x81 mm², 2x2 MB L2, 1066/800 MT/s, no multithreading, LGA775
Labels: "Supports also Pentium 4 6x0/6x1/EE processors/90 nm", "Supports also Pentium D/EE processors/90/65 nm".]

¹ Pentium EE 840 supports only 800 MT/s.
² Pentium D 9xx supports only 800 MT/s.
³ Pentium EE 955/965 supports only 1066 MT/s.

1.3 Representation forms of platforms (6)

Support of Core 2 Quad processors

[Roadmap: subsequent DT core supported by the Bridge Creek platform, shown together with its Core 2-aimed row: Core 2 Quad (2x2C) Q6xxx (Kentsfield) (11/2006): 65 nm, 2x291 mtrs/2x143 mm², 2x4 MB L2, 1066 MT/s, LGA775. Label: "Supports also Core 2 Quad processors/65 nm".]

1.3 Representation forms of platforms (7)

c) Block diagram of a platform

Example: The Core 2 aimed home user DT platform (Bridge Creek) (without an integrated display controller) [3]

[Figure: block diagram of the platform (labels: 1066 MT/s, 2 DIMMs/channel, C-link, display card)]

1.3 Representation forms of platforms (8)


1.4. Compatibility of platform components


1.4 Compatibility of platform components (1)

One of the goals of platform based designs is to use stabilized interfaces (at least for a while) in order to minimize or eliminate design rework while moving from one processor generation to the next [2]. Consequently, in platform based designs, platform components such as processors or chipsets of a given line are typically compatible with their previous or subsequent generations, as long as the same interfaces are used and neither the interface parameters (such as the FSB speed) nor other implementation requirements (either on the side of the components to be substituted or of the substituting components) restrict this.

In the discussed DT platform the target processor is the Core 2, which is connected to the MCH by an FSB with 1066/800/533 MT/s. The target processor of the platform, however, can be substituted

• either by processors of three previous generations or

• by processors of the subsequent generation (Core 2 Quad),

since all these processors have FSBs of 533/800/1066 MT/s, as shown before.

1.4 Compatibility of platform components (2)

Limits of compatibility

Nevertheless, the highest performance level Core 2 Quad, termed the Core 2 Extreme Quad, already provided an increased FSB speed of 1333 MT/s and was therefore no longer supported by the Core 2 aimed platform considered.
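The compatibility rule above can be expressed as a simple check: a processor line can substitute the target processor if every FSB speed it uses is among those supported by the 965 Series MCH. The sketch below assumes that the FSB speed is the only limiting factor (the other implementation requirements mentioned above are ignored); the processor data are taken from the roadmap slides, and the function name `fsb_compatible` is hypothetical.

```python
# FSB speeds (MT/s) supported by the 965 Series MCH of the Bridge Creek platform
PLATFORM_FSB = {533, 800, 1066}

# FSB speeds (MT/s) of candidate processor lines, as listed in the roadmap slides
PROCESSORS = {
    "Pentium 4 6x0/6x1/EE (Prescott-2M)": {800},
    "Pentium D/EE 8xx (Smithfield)": {533, 800},
    "Pentium D/EE 9xx (Presler)": {800, 1066},
    "Core 2 Duo E6xxx / Core 2 Extreme X6800 (Conroe)": {1066},
    "Core 2 Duo E4xxx (Allendale)": {800},
    "Core 2 Quad Q6xxx (Kentsfield)": {1066},
    "Core 2 Extreme Quad (1333 MT/s FSB)": {1333},
}


def fsb_compatible(cpu_fsb: set[int], platform_fsb: set[int] = PLATFORM_FSB) -> bool:
    """True if every FSB speed used by the processor line is supported by the platform."""
    return cpu_fsb <= platform_fsb


for name, fsb in PROCESSORS.items():
    status = "supported" if fsb_compatible(fsb) else "not supported"
    print(f"{name}: {status}")
```

Only the Core 2 Extreme Quad fails the check, since its 1333 MT/s FSB is not among the speeds of the platform.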

[Figure: thumbnail of the Bridge Creek platform: Core 2 Duo/Core 2 Extreme (2C) processors, FSB at 1066/800/533 MT/s, 965 Series MCH (with ME), two DDR2-800/666/533 memory channels with two DIMMs per channel, DMI and C-link to the ICH8]

2. Basic components of platforms

2.1. Processors

2.2. Buses interconnecting platform components

2.3. The memory subsystem

As already discussed in Section 1, the notion of platform is interpreted as a standardized backbone of a system architecture developed for a given application area, typically built up of

• the processor or processors,
• the chipset,
• in some cases, such as in mobile or business oriented DT platforms, also the networking component [7],
• the buses interconnecting the above components of the platform, and
• the memory subsystem (MSS), attached by a specific memory interface.

[Figure: Basic components of a platform: the processor or processors, the chipset, (the LAN controller), the memory subsystem and the buses interconnecting the preceding basic components.]

Basic components of platforms - Overview

Subsequently, we will discuss the following three basic components of platforms:

• Processors (Section 2.1),
• Buses interconnecting platform components (excluding memory buses) (Section 2.2) and
• The memory subsystem (Section 2.3).

2.1. Processors


2.1 Processors (1)

Figure 2.1: Overview of Intel’s Tick-Tock model (based on [17])

[Figure content: key microarchitectural features along Intel’s Tick-Tock model, in which a TICK (shrink to a new process technology) and a TOCK (new microarchitecture) alternate, each process generation spanning about 2 years:
• 180 nm: Pentium 4/Willamette (11/2000): new microarch.
• 130 nm: Pentium 4/Northwood (01/2002): adv. microarch., hyperthreading
• 90 nm: Pentium 4/Prescott (02/2004): adv. microarch., hyperthreading, 64-bit
• 65 nm: TICK Pentium 4/Cedar Mill (01/2006); TOCK Core 2 (07/2006): new microarch., 4-wide core, 128-bit SIMD, no hyperthreading
• 45 nm: TICK Penryn (11/2007); TOCK Nehalem (11/2008): new microarch., hyperthreading, (inclusive) L3, integrated MC, QPI
• 32 nm: TICK Westmere (01/2010); TOCK Sandy Bridge (01/2011): new microarch., hyperthreading, 256-bit AVX, integr. GPU, ring bus]

Basic architectures and their related shrinks
(considered from the Pentium 4 Prescott, the third core of the Pentium 4, on)

Basic architecture | Basic architecture and its shrink
Pentium 4 (Prescott) | 2005, 90 nm: Pentium 4 | 2006, 65 nm: Pentium 4
Core 2 | 2006, 65 nm: Core 2 | 2007, 45 nm: Penryn
Nehalem | 2008, 45 nm: Nehalem | 2010, 32 nm: Westmere
Sandy Bridge | 2011, 32 nm: Sandy Bridge | 2012, 22 nm: Ivy Bridge

2.1 Processors (2)

In 2003 Intel shifted the focus of their processor development from raw performance to performance per watt, as stated in a slide from 4/2006, see below.

Figure 2.3: Intel’s plan to develop their manufacturing technology and processor lines, revealed at a shareholders’ meeting back in 4/2006 [18]

2.1 Processors (4)

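Basic arch. | Core/technology | Processor (core) | Cores | Intro. | Cache arch. | Interf.
Core 2 | Core 2, 65 nm | X6800 (Conroe) | 2C | 7/2006 | 4 MB L2/2C | FSB
Core 2 | Core 2, 65 nm | E6xxx (Conroe) | 2C | 7/2006 | 2/4 MB L2/2C | FSB
Core 2 | Core 2, 65 nm | E4xxx (Allendale) | 2C | 1/2007 | 2 MB L2/2C | FSB
Core 2 | Core 2, 65 nm | E6xxx (Allendale) | 2C | 7/2007 | 2 MB L2/2C | FSB
Core 2 | Core 2, 65 nm | QX67xx (Kentsfield) | 2x2C | 11/2006 | 4 MB L2/2C | FSB
Core 2 | Core 2, 65 nm | Q6xxx (Kentsfield) | 2x2C | 1/2007 | 4 MB L2/2C | FSB
Core 2 | Penryn, 45 nm | E8xxx (Wolfdale) | 2C | 1/2008 | 6 MB L2/2C | FSB
Core 2 | Penryn, 45 nm | E7xxx (Wolfdale-3M) | 2C | 4/2008 | 3 MB L2/2C | FSB
Core 2 | Penryn, 45 nm | QX9xxx (Yorkfield XE) | 2x2C | 11/2007 | 6 MB L2/2C | FSB
Core 2 | Penryn, 45 nm | Q9xxx (Yorkfield) | 2x2C | 1/2008 | 6 MB L2/2C | FSB
Core 2 | Penryn, 45 nm | Q9xxx (Yorkfield-6M) | 2x2C | 1/2008 | 3 MB L2/2C | FSB
Core 2 | Penryn, 45 nm | Q8xxx (Yorkfield-4M) | 2x2C | 8/2008 | 2 MB L2/2C | FSB
Nehalem | 1st gen. Nehalem-EP, 45 nm | i7-920-965 (Bloomfield) | 4C | 11/2008 | ¼ MB L2/C, 8 MB L3 | QPI
Nehalem | 2nd gen. Nehalem-EP, 45 nm | i7-8xx/i5-7xx (Lynnfield) | 4C | 9/2009 | ¼ MB L2/C, 8 MB L3 | DMI
Nehalem | Westmere-EP, 32 nm | i7-9xxX (Gulftown) | 6C | 3/2010 | ¼ MB L2/C, 12 MB L3 | QPI
Nehalem | Westmere-EP, 32 nm | i7-9xx (Gulftown) | 6C | 7/2010 | ¼ MB L2/C, 12 MB L3 | QPI
Nehalem | Westmere-EP, 32 nm | i5-6xx/i3-5xx (Clarkdale) | 2C+G | 1/2010 | ¼ MB L2/C, max. 4 MB L3 | DMI
Sandy Bridge | Sandy Bridge, 32 nm | i7-26/27/28/29xx | 2/4C+G | 1/2011 | ¼ MB L2/C, 4/8 MB L3 | DMI2
Sandy Bridge | Sandy Bridge, 32 nm | i5-23/24/25xx | 2/4C+G | 1/2011 | ¼ MB L2/C, 3/6 MB L3 | DMI2
Sandy Bridge | Sandy Bridge, 32 nm | i3-21/23xx | 2C+G | 1/2011 | ¼ MB L2/C, 3 MB L3 | DMI2

Table 2.1: Intel’s Core 2 based and subsequent multicore DT processor lines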

2.1 Processors (5)

Basic arch. | Core/technology | Intro. | DP server processor | Cores, cache
Pentium 4 (Prescott) | Pentium 4, 90 nm | 10/2005 | Paxville DP 2.8 | 2x1C, 2 MB L2/C
Pentium 4 (Prescott) | Pentium 4, 65 nm | 5/2006 | 5000 (Dempsey) | 2x1C, 2 MB L2/C
Core 2 | Core 2, 65 nm | 6/2006 | 5100 (Woodcrest) | 1x2C, 4 MB L2/C
Core 2 | Core 2, 65 nm | 11/2006 | 5300 (Clovertown) | 2x2C, 4 MB L2/C
Core 2 | Penryn, 45 nm | 11/2007 | 5400 (Harpertown) | 2x2C, 6 MB L2/2C
Nehalem | Nehalem-EP, 45 nm | 3/2009 | 5500 (Gainestown) | 1x4C, ¼ MB L2/C, 8 MB L3
Nehalem | Westmere-EP, 32 nm | 3/2010 | 56xx (Gulftown) | 1x6C, ¼ MB L2/C, 12 MB L3
Nehalem | Nehalem-EX, 45 nm | 3/2010 | 6500 (Beckton) | 1x8C, ¼ MB L2/C, 24 MB L3
Nehalem | Westmere-EX, 32 nm | 4/2011 | E7-28xx (Westmere-EX) | 1x10C, ¼ MB L2/C, 30 MB L3
Sandy Bridge | Sandy Bridge, 32 nm | 1/2011 | |
Sandy Bridge | Ivy Bridge, 22 nm | 11/2012? | |

Table 2.2: Overview of Intel’s multicore DP server processors

2.1 Processors (6)

Basic arch. | Core/technology | Intro. | MP server processor | Cores, cache
Pentium 4 (Prescott) | Pentium 4, 90 nm | 11/2005 | Paxville MP | 2x1C, 2 MB L2/C
Pentium 4 (Prescott) | Pentium 4, 65 nm | 8/2006 | 7100 (Tulsa) | 2x1C, 1 MB L2/C, 16 MB L3
Core 2 | Core 2, 65 nm | 9/2007 | 7200 (Tigerton DC) | 1x2C, 4 MB L2/C
Core 2 | Core 2, 65 nm | 9/2007 | 7300 (Tigerton QC) | 2x2C, 4 MB L2/C
Core 2 | Penryn, 45 nm | 9/2008 | 7400 (Dunnington) | 1x6C, 3 MB L2/2C, 16 MB L3
Nehalem | Nehalem-EP, 45 nm | | |
Nehalem | Westmere-EP, 32 nm | | |
Nehalem | Nehalem-EX, 45 nm | 3/2010 | 7500 (Beckton) | 1x8C, ¼ MB L2/C, 24 MB L3
Nehalem | Westmere-EX, 32 nm | 4/2011 | E7-48xx (Westmere-EX) | 1x10C, ¼ MB L2/C, 30 MB L3
Sandy Bridge | Sandy Bridge, 32 nm | /2011 | |
Sandy Bridge | Ivy Bridge, 22 nm | 11/2012 | |

Table 2.3: Overview of Intel’s multicore MP server processors

2.1 Processors (7)


2.2. Buses interconnecting platform components

2.2 Buses interconnecting platform components (1)

Use of buses in Intel’s DT/DP and MP platforms:

• buses interconnecting processors (in NUMA topologies),
• buses interconnecting processors to chipsets,
• buses interconnecting MCHs to ICHs (in 2-part chipsets).

Remark: Buses connecting the memory subsystem with the main body of the platforms are memory specific interfaces and will be discussed in Section 4.

[Figure: Nehalem-EX aimed Boxboro-EX scalable DP server platform (for up to 10 cores): two Xeon 6500 (Nehalem-EX, Beckton, 8C) or Xeon E7-2800 (Westmere-EX, 10C) processors, interconnected by QPI and each connected by QPI to the 7500 IOH (with ME), which is linked to the ICH10 via ESI; DDR3-1067 memory is attached to the processors via SMI links and SMBs.]

SMI: Serial link between the processor and the SMB
SMB: Scalable Memory Buffer with parallel/serial conversion

Implementation of buses used in Intel’s DT/DP and MP platforms

• Parallel buses
  - FSB (Front Side Bus), 64-bit wide: used to interconnect processors to chipsets in previous platforms
  - HI1.5 (Hub Interface 1.5), 8-bit wide: used to interconnect MCHs to ICHs in previous platforms
• Serial buses (point-to-point interconnections)
  - DMI (Direct Media Interface), ESI (Enterprise System Interface) and DMI2 (Direct Media Interface, 2nd gen.), 4-bit wide (4 PCIe lanes): used to interconnect processors to chipsets or MCHs to ICHs
  - QPI (QuickPath Interconnect) and QPI1.1 (QuickPath Interconnect v1.1), 16-bit wide: used to interconnect processors to processors and processors to chipsets

2.2 Buses interconnecting platform components (2)

Buses used in Intel’s DT/DP/MP platforms

The buses interconnect processors (in NUMA topologies), processors to chipsets, or MCHs to ICHs (in 2-part chipsets). For connecting processors to chipsets, DMI/ESI and DMI2 serve low-cost systems and QPI serves high-performance systems.

• FSB (64-bit: 1993), parallel bus: 64-bit wide, ~150 lines, 3.2-12.8 GB/s total in both directions
• HI 1.5 (1999), parallel bus: 8-bit wide, 16 lines, 266 MB/s total in both directions
• DMI/ESI (2004¹; 2008²), serial bus: 4 PCIe lanes, 18 lines, 1 GB/s/direction
• DMI2 (2011), serial bus: 4 PCIe lanes, 18 lines, 2 GB/s/direction
• QPI (2008), serial bus: 20 lanes, 84 lines, 9.6/11.72/12.8 GB/s in each direction
• QPI1.1 (2012?), serial bus: specification n.a.

2.2 Buses interconnecting platform components (3)

Remarks

¹ DMI was first introduced in 2004 as an interface between the MCH and the ICH, along with the ICH6, supporting Pentium 4 Prescott processors.

² DMI was first introduced as an interface between the processor and the chipset in 2008, between the Nehalem-EP and the 34xx PCH, after the memory controllers had been moved onto the processor die.

2.2 Buses interconnecting platform components (4)

Figure 2.4: Signal types used in MMs for control, address and data signals

[Figure content: signaling systems used in buses
• Single-ended signals, typ. voltage swings 3.3-5 V: TTL (5 V): FPM/EDO; LVTTL (3.3 V): FPM/EDO, SDRAM, HI1.5
• Voltage referenced signals (compared against VREF), typ. voltage swings 600-800 mV: SSTL: SSTL2 (DDR), SSTL1.8 (DDR2), SSTL1.5 (DDR3); RSL (RDRAM); FSB
• Differential signals (S+ and S- around VCM), typ. voltage swings 200-300 mV: LVDS: PCIe, QPI, DMI, ESI, FB-DIMMs; DRSL: XDR (data)
Moving from single-ended to differential signaling, the voltage swings become smaller.]

LVDS: Low Voltage Differential Signaling; LVTTL: Low Voltage TTL; (D)RSL: (Differential) Rambus Signaling Level; SSTL: Stub Series Terminated Logic; VCM: Common Mode Voltage; VREF: Reference Voltage

2.2 Buses interconnecting platform components (5)

Main features of parallel buses used in Intel’s MC platforms

Feature | FSB | HI 1.5
Typical use | Connecting the processors and the chipset | Connecting MCH and ICH
Introduced | With the Pentium (1993) | With the Pentium III (1999)
Width | 64 bit | 8 bit
Clock | 100-400 MHz | 66 MHz
DDR/QDR | QDR since Pentium 4 (2000) | QDR
Transfer rate | 400-1600 MT/s | 266 MT/s
Bandwidth | 3.2-12.8 GB/s in both directions altogether | 266 MB/s in both directions altogether
Signaling | Voltage referenced data signals | Single-ended data signals
No. of lines | ~150 lines | ~16 lines

FSB/HI 1.5: Bus type interconnects
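The bandwidth figures in the table follow from width × transfer rate, where the transfer rate is the bus clock multiplied by the number of transfers per clock (4 for a QDR bus). A minimal sketch, assuming that the 66 MHz HI 1.5 clock is the nominal 66.67 MHz and using decimal units (1 GB/s = 10^9 B/s); the function name is hypothetical.

```python
def bus_bandwidth_bytes_per_s(width_bits: int, clock_hz: float, transfers_per_clock: int) -> float:
    """Peak bandwidth of a parallel bus: width (bytes) x clock x transfers per clock."""
    transfer_rate = clock_hz * transfers_per_clock  # transfers per second (T/s)
    return (width_bits / 8) * transfer_rate


# FSB: 64-bit wide, 100-400 MHz clock, QDR (4 transfers per clock)
fsb_low = bus_bandwidth_bytes_per_s(64, 100e6, 4)   # 400 MT/s
fsb_high = bus_bandwidth_bytes_per_s(64, 400e6, 4)  # 1600 MT/s

# HI 1.5: 8-bit wide, nominal 66.67 MHz clock, QDR
hi15 = bus_bandwidth_bytes_per_s(8, 200e6 / 3, 4)

print(f"FSB: {fsb_low / 1e9:.1f}-{fsb_high / 1e9:.1f} GB/s")  # FSB: 3.2-12.8 GB/s
print(f"HI 1.5: {hi15 / 1e6:.1f} MB/s")                      # HI 1.5: 266.7 MB/s
```

The HI 1.5 result of about 266.7 MB/s corresponds to the 266 MB/s quoted in the table; both buses are shared by the two directions, so these figures are totals rather than per-direction values.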

2.2 Buses interconnecting platform components (6)

Main features of serial buses used in Intel’s MC platforms

Feature | DMI/ESI | DMI2 | QPI | QPI 1.1
Typical use | To interconnect MCHs and ICHs, or processors to chipsets in NUMA platforms | As DMI/ESI | To interconnect processors in NUMA topologies, or processors to chipsets | As QPI
Introduced | In connection with 2nd gen. Nehalem in 2008 | In connection with Sandy Bridge in 2011 | In connection with Nehalem-EP in 2008 | In connection with Sandy Bridge in 2012 (?)
Width | 4 PCIe lanes | 4 PCIe 2.0 lanes | 20 lanes | No specification available yet
Clock | 2.5 GHz | 5 GHz | 2.4/2.93/3.2 GHz | n.a.
DDR | – | – | DDR | n.a.
Encoding | 8b/10b | 8b/10b | none | n.a.
Bandwidth/direction | 1 GB/s | 2 GB/s | 9.6/11.72/12.8 GB/s | n.a.
Signaling | LVDS | LVDS | LVDS | n.a.
No. of lines | 18 lines | 18 lines | 84 lines | n.a.

DMI/QPI: Point-to-point interconnection
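The per-direction bandwidth figures of the serial bus table can be reproduced from the lane count, the per-lane rate and the line coding. The sketch below is illustrative only: it assumes that the DMI "clock" values are per-lane transfer rates in GT/s, and that 16 of the 20 QPI lanes carry data (as in the QPI signal description of this section), with DDR clocking and no 8b/10b encoding:

```python
def serial_link_bandwidth_gb_s(data_lanes: int, gt_per_s: float,
                               payload_bits: int = 8, line_bits: int = 10) -> float:
    """Payload bandwidth per direction in GB/s.

    gt_per_s: bits transferred per second per lane, in GT/s.
    payload_bits/line_bits: line-code efficiency, 8/10 for 8b/10b,
    1/1 for a link without encoding.
    """
    return data_lanes * gt_per_s * payload_bits / line_bits / 8


if __name__ == "__main__":
    print("DMI :", serial_link_bandwidth_gb_s(4, 2.5), "GB/s")
    print("DMI2:", serial_link_bandwidth_gb_s(4, 5.0), "GB/s")
    # QPI: DDR clocking doubles the clock rate, 16 data lanes, no encoding
    for clock_ghz in (2.4, 2.93, 3.2):
        bw = serial_link_bandwidth_gb_s(16, 2 * clock_ghz, 1, 1)
        print(f"QPI @ {clock_ghz} GHz: {bw:.2f} GB/s")
```

The 8b/10b encoding of DMI costs 20 % of the raw lane rate, which is why 4 lanes at 2.5 GT/s yield 1 GB/s rather than 1.25 GB/s.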

2.2 Buses interconnecting platform components (7)


Comparing main features of Intel’s FSB and QPI [9]

2.2 Buses interconnecting platform components (8)

GTL+: A kind of voltage referenced signaling

Figure 2.5: LVDS Single Link Interface Circuit [10]

Principle of LVDS signal transmission used in serial buses

2.2 Buses interconnecting platform components (9)

PCI Express Data Frame [10]

PCIe packet format (data frames)

The related fields are:

Field | Size | Interpretation
Frame | 1 byte | Start-of-Frame/End-of-Frame
Seq# | 2 bytes | Sequence number
Header | 16 or 20 bytes | Header
Data | 0-4096 bytes | Data field
CRC | 4 + 4 bytes | ECRC (End-to-End CRC) + LCRC (Link CRC) (CRC: Cyclic Redundancy Check)
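To see how much of a frame is overhead, the field sizes of the table can be added up. A minimal sketch, assuming one Start-of-Frame and one End-of-Frame byte per frame and treating the ECRC as optional (the PCIe specification allows end-to-end CRC to be omitted); header sizes are the 16 or 20 bytes given in the table, and the function names are hypothetical:

```python
FRAME_START = 1  # Start-of-Frame byte
FRAME_END = 1    # End-of-Frame byte
SEQ_NUM = 2      # sequence number
ECRC = 4         # end-to-end CRC (optional)
LCRC = 4         # link CRC


def frame_bytes(payload: int, header: int = 16, with_ecrc: bool = True) -> int:
    """Total size of one data frame in bytes, from the field sizes of the table."""
    if header not in (16, 20):
        raise ValueError("header must be 16 or 20 bytes")
    if not 0 <= payload <= 4096:
        raise ValueError("payload must be 0..4096 bytes")
    crc = LCRC + (ECRC if with_ecrc else 0)
    return FRAME_START + SEQ_NUM + header + payload + crc + FRAME_END


def efficiency(payload: int, header: int = 16, with_ecrc: bool = True) -> float:
    """Fraction of the frame occupied by payload."""
    return payload / frame_bytes(payload, header, with_ecrc)


if __name__ == "__main__":
    for size in (64, 256, 1024, 4096):
        print(f"{size:5d} B payload: {efficiency(size):.1%} of the frame")
```

With a 16-byte header and both CRCs the fixed overhead is 28 bytes per frame, so small payloads are comparatively inefficient.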

2.2 Buses interconnecting platform components (10)

A QPI link consists of two unidirectional links (TX and RX); each of them has 20 lanes: 16 data, 2 protocol and 2 CRC lanes.

Figure 2.6: Signals of the QuickPath Interconnect bus (QPI-bus) [11]

Principle of the QuickPath Interconnect bus (QPI bus)

2.2 Buses interconnecting platform components (11)

2.3. The memory subsystem

2.3.1. Key parameters of the memory subsystem

2.3.2. Main attributes of the memory technology used

2.3.2.1. Overview: Main attributes of the memory technology used

2.3.2.2. Memory type

2.3.2.3. Speed grades

2.3.2.4. DIMM density

2.3.2.5. Use of ECC support

2.3.2.6. Use of registering

2.3.1 Key performance parameters of the memory subsystem (1)

This issue will be discussed in Section 4.

2.3.2 Main attributes of the memory technology used

2.3.2.1 Overview: Main attributes of the memory technology used

Main attributes of the memory technology used:

• Memory type (Section 2.3.2.2)
• Speed grades (Section 2.3.2.3)
• DIMM density (Section 2.3.2.4)
• Use of ECC support (Section 2.3.2.5)
• Use of registering (Section 2.3.2.6)

2.3.2.2 Memory type (1)

a) Overview: Main DRAM types

DRAMs for general use (DRAM, 1970), with year of introduction:

• Asynchronous DRAMs (commodity DRAMs): FP (~1974), FPM (1983), EDO (1995)
• Synchronous DRAMs
  • Main stream DRAM types (commodity DRAMs)
    • DRAMs with parallel bus connection: SDRAM (1996), DDR (2000), DDR2 (2004), DDR3 (2007)
    • DRAMs with serial bus connection: FB-DIMM (2006)
  • Challenging DRAM types: DRDRAM (1999), XDR (2006)¹

¹ Used in the Cell BE and the PlayStation 3, but not yet in desktops or servers


b) Synchronous DRAMs (SDRAM, DDR, DDR2, DDR3)

2.3.2.2 Memory type (2)

SDRAM to DDR3 DIMMs

• SDRAM: 168-pin
• DDR: 184-pin
• DDR2: 240-pin
• DDR3: 240-pin

All these DIMM modules are 8-byte wide.

2.3.2.2 Memory type (3)

Principle of operation of synchronous DRAMs (SDRAM to DDR3 memory chips)

A DRAM device consists of a memory cell array and I/O buffers; it is connected to the memory controller (MC).

• The memory cell array sources/sinks data to/from the I/O buffers
  • at a rate of fCell,
  • at a width of FW.
• The I/O buffers receive/transmit data from/to the MC (data transmission)
  • at a rate of fCK (SDRAM) or 2 × fCK (DDR to DDR3),
  • on the rising edge of the clock (CK) for SDRAMs or on both edges of the data strobe (DQS) for DDR/DDR2/DDR3.

2.3.2.2 Memory type (4)

Sourcing/sinking data by the memory cell array

The memory cell array sources/sinks data to/from the I/O buffers
• at a rate of fCell, where fCell is the clock frequency of the memory cell array, and
• at a data width of FW, where FW is the fetch width of the memory cell array.

The core clock frequency of the memory cell array (fCell)

• fCell is 100 to 200 MHz.
• It stands in a given ratio to the clock frequency of the memory device (fCK) as follows:

  DRAM type | fCK
  SDRAM | fCell
  DDR | fCell
  DDR2 | 2 × fCell
  DDR3 | 4 × fCell

• When a new memory technology (e.g. DDR2 or DDR3) appears, fCell is initially 100 MHz; this sets the initial speed grade accordingly (e.g. to 400 MT/s for DDR2 or to 800 MT/s for DDR3).
• As the memory technology evolves, fCell is raised from 100 MHz to 133, 167 and finally 200 MHz.
• Along with fCell, fCK and thus the final speed grade are raised as well.

Raising fCell from 100 MHz to 200 MHz characterizes the evolution of each memory technology.
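The ratio table amounts to a lookup plus a multiplication. A minimal sketch; the names are hypothetical, and the nominal 133 and 167 MHz steps are taken as exactly 400/3 and 500/3 MHz so that no rounding creeps in:

```python
from fractions import Fraction

# fCK / fCell ratio per DRAM type (from the table above)
FCK_PER_FCELL = {"SDRAM": 1, "DDR": 1, "DDR2": 2, "DDR3": 4}

# Cell-array clock steps in MHz; the nominal 133 and 167 MHz are 400/3 and 500/3.
FCELL_STEPS_MHZ = [Fraction(100), Fraction(400, 3), Fraction(500, 3), Fraction(200)]


def device_clock_mhz(dram_type: str, f_cell_mhz) -> Fraction:
    """Device clock fCK (MHz) for a given cell-array clock fCell (MHz)."""
    return FCK_PER_FCELL[dram_type] * Fraction(f_cell_mhz)


if __name__ == "__main__":
    for dram in FCK_PER_FCELL:
        clocks = [round(float(device_clock_mhz(dram, f)), 1) for f in FCELL_STEPS_MHZ]
        print(f"{dram:5s} fCK (MHz): {clocks}")
```

Exact fractions keep, for example, the DDR3 clock at 133⅓ MHz core frequency as 1600/3 MHz instead of a rounded float.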

2.3.2.2 Memory type (5)

The fetch width (FW) of the memory cell array

The fetch width specifies how many times more bits the cell array fetches per column cycle than the data width of the device (xn).

E.g. a 4-bit wide DRAM device (x4 DRAM chip) with a fetch width of 4 (actually a DDR2 DRAM) fetches 4 × 4 = 16 bits from the memory cell array in every fCell cycle.

The fetch width (FW) of the memory cell array of synchronous DRAMs is as follows:

DRAM type | FW
SDRAM | 1
DDR | 2
DDR2 | 4
DDR3 | 8
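The example above generalizes to any device width: the bits moved between the cell array and the I/O buffers per fCell cycle are FW × n. A small illustrative sketch (function name hypothetical), restricted to the x4/x8/x16 organizations listed in Table 2.4:

```python
FETCH_WIDTH = {"SDRAM": 1, "DDR": 2, "DDR2": 4, "DDR3": 8}  # FW from the table above


def bits_per_cell_cycle(dram_type: str, device_width: int) -> int:
    """Bits an xn device fetches from its cell array in one fCell cycle."""
    if device_width not in (4, 8, 16):
        raise ValueError("expected an x4, x8 or x16 device")
    return FETCH_WIDTH[dram_type] * device_width


if __name__ == "__main__":
    for dram in FETCH_WIDTH:
        print(dram, [bits_per_cell_cycle(dram, w) for w in (4, 8, 16)])
```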

2.3.2.2 Memory type (6)

Transferring data between the I/O buffers and the memory controller

Data transmission between the I/O buffers and the memory controller is clocked by a frequency of fCK.

Data transmission occurs
• for SDRAMs at the rising edge of the clock signal (CK),
• for DDR/DDR2/DDR3 at both edges of the data strobe signal (DQS), designated as Double Data Rate transfer.

The resulting final transfer rate (speed grade) is
• fCK for SDRAMs,
• 2 × fCK for DDR/DDR2/DDR3.

Accordingly, typical speed grade ranges cover
• 100 to 200 MT/s for SDRAM devices,
• 200 to 400 MT/s for DDR devices,
• 400 to 800 MT/s for DDR2 devices and
• 800 to 1600 MT/s for DDR3 devices.
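The speed grade ranges follow from fCell = 100 to 200 MHz together with the fCK/fCell ratios and fetch widths given earlier. The sketch below also checks that the I/O side (transfers per CK × fCK) moves exactly what the cell array fetches (FW × fCell); the function names are hypothetical:

```python
FETCH_WIDTH = {"SDRAM": 1, "DDR": 2, "DDR2": 4, "DDR3": 8}
FCK_PER_FCELL = {"SDRAM": 1, "DDR": 1, "DDR2": 2, "DDR3": 4}
TRANSFERS_PER_CK = {"SDRAM": 1, "DDR": 2, "DDR2": 2, "DDR3": 2}  # rising edge vs. both edges


def speed_grade_mts(dram_type: str, f_cell_mhz: int) -> int:
    """Transfer rate (MT/s) of a device whose cell array runs at f_cell_mhz."""
    f_ck = FCK_PER_FCELL[dram_type] * f_cell_mhz
    rate = TRANSFERS_PER_CK[dram_type] * f_ck
    # The I/O side must drain exactly what the cell array fetches: FW x fCell.
    if rate != FETCH_WIDTH[dram_type] * f_cell_mhz:
        raise RuntimeError("I/O rate and fetch rate do not match")
    return rate


def speed_grade_range(dram_type: str, f_cell_min: int = 100, f_cell_max: int = 200):
    """(lowest, highest) speed grade for fCell between f_cell_min and f_cell_max."""
    return speed_grade_mts(dram_type, f_cell_min), speed_grade_mts(dram_type, f_cell_max)


if __name__ == "__main__":
    for dram in FETCH_WIDTH:
        low, high = speed_grade_range(dram)
        print(f"{dram}: {low}-{high} MT/s")
```

The consistency check holds for all four types, which is exactly why the fetch width doubles whenever the fCK/fCell ratio doubles.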

2.3.2.2 Memory type (7)

Examples (DRAM core clock fCell = 100 MHz in each case):

• SDRAM-100: clock (CK) 100 MHz; n bits are fetched from the memory cell array per fCell cycle and transferred on the rising edges of CK over the data lines (DQ0 - DQn-1): 100 MT/s.
• DDR-200: clock (CK/CK#) 100 MHz, data strobe (DQS) 100 MHz; 2×n bits are fetched per fCell cycle and transferred on both edges of DQS over the data lines (DQ0 - DQn-1): 200 MT/s.
• DDR2-400: clock (CK/CK#) 200 MHz, data strobe (DQS) 200 MHz; 4×n bits are fetched per fCell cycle and transferred on both edges of DQS over the data lines (DQ0 - DQn-1): 400 MT/s.
• DDR3-800: clock (CK/CK#) 400 MHz, data strobe (DQS) 400 MHz; 8×n bits are fetched per fCell cycle and transferred on both edges of DQS over the data lines (DQ0 - DQn-1): 800 MT/s.

Relation between voltage swings and rise/fall times of signals

The charge on the input capacitance of a line is

$$Q = C_{in} V = I \, t_R \quad\Rightarrow\quad t_R \approx \frac{C_{in} V}{I}$$

where Q: charge on the input capacitance of the line, Cin: input capacitance of the line, V: voltage (swing), I: current strength of the driver, tR: rise time.

Smaller voltage swings are the main technique to increase memory speed:
• smaller voltage swings → shorter signal rise/fall times → higher speed grades,
• but also → lower voltage budget → higher requirements for signal integrity.

Voltage by memory type: SDRAM 3.3 V, DDR 2.5 V, DDR2 1.8 V, DDR3 1.5 V
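A rough numeric illustration of tR ≈ Cin × V / I. The 2 pF line capacitance and 10 mA drive current are hypothetical values chosen only for illustration, and the supply voltages listed above are treated as proxies for the voltage swings, with Cin and I held constant:

```python
def rise_time_s(c_in_farad: float, swing_volt: float, drive_current_amp: float) -> float:
    """t_R ~ C_in * V / I: time to charge the line capacitance through the swing."""
    return c_in_farad * swing_volt / drive_current_amp


def relative_rise_time(v_new: float, v_old: float) -> float:
    """Rise-time ratio of two swings, with C_in and I unchanged."""
    return v_new / v_old


if __name__ == "__main__":
    # Hypothetical line: 2 pF input capacitance driven with 10 mA
    print(f"{rise_time_s(2e-12, 1.5, 10e-3) * 1e12:.0f} ps at 1.5 V")
    # Supply voltages used as stand-ins for the swings (assumption)
    print(f"DDR3 vs. SDRAM: {relative_rise_time(1.5, 3.3):.2f} x the rise time")
```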

2.3.2.2 Memory type (9)

Figure 2.7: Signaling alternatives of buses used with memories

Memory type | Signaling of data lines | Signaling of command, control and address lines
FPM/EDO | Single ended (TTL, LVTTL) | Single ended (TTL, LVTTL)
SDRAM | Single ended (LVTTL) | Single ended (LVTTL)
DDR/DDR2/DDR3 | Voltage ref. (SSTL) | Voltage ref. (SSTL)
RDRAM | Voltage ref. (RSL) | Voltage ref. (RSL)
XDR/XDR2 | Differential (DRSL) | Voltage ref. (RSL)
FB-DIMM | Differential (LVDS) | Differential (LVDS)

2.3.2.2 Memory type (10)

Table 2.4: Key features of synchronous DRAM devices

Feature | SDRAM | DDR SDRAM | DDR2 SDRAM | DDR3 SDRAM
JEDEC standard | JESD 21-C Release 4 | JESD 79 | JESD 79-2 | JESD 79-3
Key features | Synchronous, pipelined, burst oriented | Double data rate, 2n prefetch architecture | Double data rate, 4n prefetch architecture | Double data rate, 8n prefetch architecture
Standard, first/last release | JESD 21-C Release 4: 11/1993 | JESD 79: 6/2000; JESD 79E: 5/2005 | JESD 79-2: 9/2003; JESD 79-2C: 5/2006 | JESD 79-3: 6/2007

Device density 64 Mb 128 Mb - 1Gb 256 Mb - 4 Gb 256 Mb – 4 Gb 512 Mb – 8Gb

Organization x4/8/16 x4/8/16 x4/8/16 x4/8/16 x4/8/16

Device speed (MT/s) | 66, 100/133 | 200/266; 200/266/333/400 | 400/533/667/800 | 800/1066/1333/1600

Device density 4/16 Mb 16-256 Mb

x8/16 64-512 Mb

x8/16 128-512 Mb

x8/16 256 Mb – 1 Gb

x8/16 256 Mb – 1 Gb

x8/16 512 Mb – 16 Gb

Typ. processors Pentium

(3V) Pentium III

P4 (Willamette)

P4 (Northwood) P4 (Prescott)

P4 (Prescott) P4 (Presler) Pentium D Core2 Duo

Core2 Duo to Sandy Bridge

Voltage | 3.3 V | 2.5 V | 1.8 V | 1.5 V

No. of pins on the module | 168 | 184 | 240 | 240

Key features of synchronous DRAM devices (SDRAM to DDR3)

2.3.2.2 Memory type (11)

Approximate appearance dates and speed grades of DDR DRAMs as well as the bandwidth provided by a dual-channel memory subsystem [12]
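Peak bandwidth figures of this kind are obtained as channels × channel width × transfer rate. A minimal sketch, assuming 8-byte-wide channels (as stated for the DIMMs in this section) and decimal gigabytes; whether the referenced chart rounds its figures the same way is not known:

```python
BYTES_PER_TRANSFER = 8  # each DIMM channel is 64 data bits (8 bytes) wide


def peak_bandwidth_gb_s(speed_grade_mts: int, channels: int = 2) -> float:
    """Peak memory bandwidth in GB/s (1 GB = 10**9 bytes) for `channels` channels."""
    return channels * BYTES_PER_TRANSFER * speed_grade_mts / 1000


if __name__ == "__main__":
    for grade in (400, 800, 1066, 1333, 1600):
        print(f"{grade} MT/s, dual channel: {peak_bandwidth_gb_s(grade):.1f} GB/s")
```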

2.3.2.2 Memory type (12)

Green and ultra-low power memories

• Green memories: lower-dissipation memories
• Ultra-low-power DDR3 memories: use of a 1.35 V supply voltage instead of 1.50 V to reduce dissipation

They represent the latest achievements of DRAM memory technology.
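To get a feel for the gain from 1.35 V, the usual first-order CMOS model P ∝ C × V² × f can be applied. This is an approximation added for illustration, not a figure from the source: it covers only dynamic (switching) power with capacitance and frequency unchanged, so actual module savings may differ:

```python
def relative_dynamic_power(v_new: float, v_old: float) -> float:
    """Dynamic power ratio under P ~ C * V**2 * f, with C and f unchanged."""
    return (v_new / v_old) ** 2


if __name__ == "__main__":
    ratio = relative_dynamic_power(1.35, 1.50)
    print(f"1.35 V vs. 1.50 V: {ratio:.2f} x, about {1 - ratio:.0%} less switching power")
```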

2.3.2.2 Memory type (13)

Green and ultra-low power memories - Examples [13]

2.3.2.2 Memory type (14)

8-byte wide memory modules (DIMMs)

• SDRAM: 168-pin
• DDR: 184-pin
• DDR2: 240-pin
• DDR3: 240-pin

Each module type has its own keying (notch) position.

2.3.2.2 Memory type (15)

Types of DIMMs

Rank (logical module):
• a 64 data bits or 64 data + 8 ECC bits wide memory block;
• all devices of a rank are activated by the same chip select (CS) signal.

DIMM (physical module): the physical carrier of ½, 1 or 2 ranks.

• Single-sided DIMMs: DRAM devices are placed only on one DIMM side.
• Double-sided DIMMs: DRAM devices are placed on both DIMM sides.

Single-sided/double-sided DIMMs
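Since a rank is 64 (or 72 with ECC) bits wide, the number of devices per rank follows directly from the device width. An illustrative sketch (function name hypothetical); x8 devices with ECC give the 9-device ranks of the example that follows:

```python
def devices_per_rank(device_width: int, ecc: bool = False) -> int:
    """Number of xN DRAM devices that make up one 64-bit (or 72-bit ECC) rank."""
    rank_width = 72 if ecc else 64  # 64 data bits (+ 8 ECC bits)
    if rank_width % device_width:
        raise ValueError(f"x{device_width} devices do not tile a {rank_width}-bit rank")
    return rank_width // device_width


if __name__ == "__main__":
    for width in (4, 8, 16):
        print(f"x{width}: {devices_per_rank(width)} devices per rank")
    print(f"x8 with ECC: {devices_per_rank(8, ecc=True)} devices per rank")
```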

2.3.2.2 Memory type (16)

Examples of single-sided and double-sided DIMMs with single or dual ranks [45]

9 × 8-bit DDR devices

2.3.2.2 Memory type (17)


Example: Traditional way of attaching DIMMs via a parallel channel to the MCH [45]

2.3.2.2 Memory type (18)

Example 2: Attaching DIMMs via 3 parallel memory channels to memory controllers implemented on the processor die

(This is actually Intel’s Tylersburg DP platform, aimed at the Nehalem-EP processor and used with processors of up to 6 cores) [46]

2.3.2.2 Memory type (18)

c) FB-DIMMs

2.3.2.2 Memory type (19)

Principle of operation

• Introduce packet-based serial transmission (like in the PCI-E, SATA and SAS buses).
• Introduce full buffering (registered DIMMs buffer only the address and control signals).
• CRC error checking (cyclic redundancy check).

2.3.2.2 Memory type (20)


The architecture of FB-DIMM memories [19]

2.3.2.2 Memory type (21)

Figure 2.8: Maximum supported FB-DIMM configuration [20] (6 channels/8 DIMMs)

2.3.2.2 Memory type (22)

• Serial (differential) transmission between the north bridge and the DIMMs (each bit needs a pair of wires).

• Read packets (frames, bursts): 12 × 14 = 168 bits:
  • 144 data bits (equal to the number of data bits produced by a 72-bit wide DDR2 module (64 data bits + 8 ECC bits) in two memory cycles),
  • 24 CRC bits.

• Every 12 cycles (that is, every two memory cycles) constitute a packet.

• Write packets (frames, bursts): 12 × 10 = 120 bits:
  • 98 payload bits,
  • 22 CRC bits.

• Clocked at 6 × the data rate of the DDR2, e.g. for a DDR2-667 DRAM the clock rate is 6 × 667 MHz ≈ 4 GHz.

• Number of serial links:
  • 14 read lanes (2 wires each),
  • 10 write lanes (2 wires each).

Implementation details (1)
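The frame figures above are mutually consistent, which a few lines of arithmetic can confirm. The sketch below is a simplification that assumes every read frame carries data (no idle or status frames), and the names are hypothetical. For DDR2-667 it yields about 5336 MB/s of read data, in line with the FB-DIMM-5300 designation:

```python
BIT_TIMES_PER_FRAME = 12        # one frame spans 12 bit times (= two memory cycles)
READ_LANES, WRITE_LANES = 14, 10
READ_DATA_BITS = 144            # 2 x 72-bit (64 data + 8 ECC) DDR2 transfers
WRITE_PAYLOAD_BITS = 98


def frame_layout():
    """(read frame bits, read CRC bits, write frame bits, write CRC bits)."""
    read_bits = BIT_TIMES_PER_FRAME * READ_LANES
    write_bits = BIT_TIMES_PER_FRAME * WRITE_LANES
    return (read_bits, read_bits - READ_DATA_BITS,
            write_bits, write_bits - WRITE_PAYLOAD_BITS)


def lane_rate_mts(ddr2_rate_mts: float) -> float:
    """Serial lane bit rate: 6 x the DDR2 data rate."""
    return 6 * ddr2_rate_mts


def read_bandwidth_mb_s(ddr2_rate_mts: float, include_ecc: bool = False) -> float:
    """Peak read bandwidth carried by the read lanes, in MB/s."""
    frames_per_us = lane_rate_mts(ddr2_rate_mts) / BIT_TIMES_PER_FRAME  # million frames/s
    bits = READ_DATA_BITS if include_ecc else READ_DATA_BITS * 64 // 72
    return frames_per_us * bits / 8


if __name__ == "__main__":
    print("frame layout:", frame_layout())
    for rate in (533, 667, 800):
        print(f"DDR2-{rate}: {read_bandwidth_mb_s(rate):.0f} MB/s read data")
```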

2.3.2.2 Memory type (23)

The 98 payload bits of a write frame consist of:

• 2 frame type bits,
• 24 bits of command,
• 72 bits for data and commands, according to the frame type, e.g. 72 bits of data, 36 bits of data + one command, or two commands.

Commands

• All commands include a 3-bit FB-DIMM module address to select one of 8 modules.

Implementation details (2)

2.3.2.2 Memory type (24)

FB-DIMM-4300 (DDR2-533 SDRAM): clock speed 133 MHz, data rate 532 MT/s, throughput 4300 MB/s

FB-DIMM-5300 (DDR2-667 SDRAM): clock speed 167 MHz, data rate 667 MT/s, throughput 5300 MB/s

FB-DIMM-6400 (DDR2-800 SDRAM): clock speed 200 MHz, data rate 800 MT/s, throughput 6400 MB/s

FB-DIMM data buffer (Advanced Memory Buffer, AMB): manages the read/write operations of the module

Figure 2.9: Different implementations of FB-DIMMs [48]

2.3.2.2 Memory type (25)

The notch (keying) differs from that of DDR2 DIMMs.

(There are two command/address (C/A) buses to limit the load presented by the 9 to 36 DRAMs of a module.)

Figure 2.10: Block diagram of the AMB, including its S/P (serial/parallel) converter [21]

2.3.2.2 Memory type (26)

Necessary routing to connect the north bridge to the DIMM socket

a) In case of a DDR2 DIMM (240 pins): a 3-layer PCB is needed.

b) In case of an FB-DIMM (69 pins): a 2-layer PCB is needed (but a third layer is used for power lines).

Figure 2.11: PCB routing [19]

2.3.2.2 Memory type (27)

Assessing benefits and drawbacks of FB-DIMM memories (as compared to DDR2/3 memories)

Benefits of FB-DIMMs: higher memory size and bandwidth

• more DIMM modules (up to 8) per channel,
• more memory channels (up to 6),

resulting in a higher memory size (6 × 8 = 48 DIMMs; assuming 8 GB/DIMM, up to 384 GB).

Drawbacks of FB-DIMMs

• higher latency,
• higher cost,
• higher dissipation (typical dissipation figures: DDR2: about 5 W, AMB: about 5 W, FB-DIMM with DDR2: about 10 W).

Per module, FB-DIMMs provide the same bandwidth figures as the DDR2 parts they are based on.

2.3.2.2 Memory type (28)

Latency [22]

• Due to their additional serialization tasks and daisy-chained nature, FB-DIMMs have about 15 % higher overall average latency than DDR2 memories.

Production

The production of FB-DIMMs stopped with DDR2-800 modules; no DDR3-based FB-DIMMs came to the market due to the drawbacks of the technology.

2.3.2.2 Memory type (29)

2.3.2.3 Speed grades (1)

Overview of the speed grades of DDR DRAMs, with the bandwidth of a dual-channel memory subsystem [12]

Remark

Speed grades of FSBs and DRAMs were defined at the time when the base clock frequency of the FSBs was 133 MHz (around 2000).

Subsequent speed grades of FSBs and of memories were then chosen as integral multiples of 133 MHz (more precisely 133⅓ MHz), such as

266 ≈ 2 × 133, 400 ≈ 3 × 133, 533 ≈ 4 × 133, 667 ≈ 5 × 133, 800 ≈ 6 × 133, 1067 ≈ 8 × 133, 1333 ≈ 10 × 133, 1600 ≈ 12 × 133 etc.
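The integral multiples can be checked with exact arithmetic, using 133⅓ MHz (= 400/3 MHz) as the base. A minimal sketch; the function name is hypothetical, and the 1 % tolerance is an assumption to absorb the rounding in the nominal grade names:

```python
from fractions import Fraction

BASE_MHZ = Fraction(400, 3)  # the "133 MHz" FSB base clock is really 133 1/3 MHz


def multiple_of_base(speed_grade: int) -> int:
    """Integral multiple k with speed_grade ~= k x 133 1/3 (within 1 %)."""
    k = round(Fraction(speed_grade) / BASE_MHZ)
    if abs(k * BASE_MHZ - speed_grade) > Fraction(speed_grade, 100):
        raise ValueError(f"{speed_grade} is not close to a multiple of 133 1/3")
    return k


if __name__ == "__main__":
    for grade in (266, 400, 533, 667, 800, 1067, 1333, 1600):
        print(f"{grade} ~= {multiple_of_base(grade)} x 133")
```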

2.3.2.3 Speed grades (2)

Figure 2.12: The evolution of peak transfer rates of parallel connected synchronous DRAMs as manifested in Intel’s chipsets (transfer rate in MT/s on a logarithmic scale vs. year, 1996-2008; data points SDRAM-66, SDRAM-100, SDRAM-133, DDR-266, DDR-333, DDR-400, DDR2-533, DDR2-667, DDR2-800, DDR3-1333 and DDR3-1600; trend: ~10× per 10 years)

Rate of increasing the transfer rates in synchronous DRAMs

2.3.2.3 Speed grades (3)

2.3.2.4 DIMM density (1)

a) Device density

Figure 2.13: Evolution of DRAM densities (Mbit) and no. of units shipped/year (Based on [23]) (device generations from 16K to 1G over the years 1980-2015; units shipped in millions per year; density trend: ~4× per 4 years)
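A trend of about 4× per 4 years can be turned into the time needed between two density generations. An illustrative sketch (names hypothetical) that applies the trend of Figure 2.13 uniformly, which real generations follow only approximately:

```python
import math

KBIT, MBIT, GBIT = 2 ** 10, 2 ** 20, 2 ** 30


def years_between(density_from: int, density_to: int,
                  factor: float = 4, period_years: float = 4) -> float:
    """Years to grow from one device density to another at `factor` per `period_years`."""
    return period_years * math.log(density_to / density_from) / math.log(factor)


if __name__ == "__main__":
    pairs = [(64 * KBIT, GBIT, "64K -> 1G"), (MBIT, 256 * MBIT, "1M -> 256M")]
    for start, end, label in pairs:
        print(f"{label}: about {years_between(start, end):.0f} years")
```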

2.3.2.4 DIMM density (2)

b) DIMM (module) density

Based on
• typical device densities of 1 to 8 Gb and
• typical device widths of x4 to x16 (bits),

DDR2 or DDR3 modules provide typical densities of up to 8 or 16 GB.
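Module densities follow from device density × devices per rank × number of ranks. A minimal sketch for non-ECC modules (ECC devices add check bits, not capacity); the function name and the parameter combinations are illustrative:

```python
def module_capacity_gb(device_gbit: int, device_width: int, ranks: int = 1) -> float:
    """Capacity (GB) of a non-ECC DIMM built from xN devices, 64 data bits per rank."""
    if 64 % device_width:
        raise ValueError("device width must divide 64")
    devices_per_rank = 64 // device_width
    return ranks * devices_per_rank * device_gbit / 8  # Gbit -> GB


if __name__ == "__main__":
    print(module_capacity_gb(4, 8, ranks=2), "GB")  # 2 ranks of 8 x 4 Gb x8 devices
    print(module_capacity_gb(8, 4, ranks=1), "GB")  # 1 rank of 16 x 8 Gb x4 devices
```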

2.3.2.5 Use of ECC support (1)

ECC basics (as used in DIMMs)

ECC is implemented as SEC-DED (Single Error Correction, Double Error Detection).

Single bit error correction

For D data bits, P check bits are added; together they form the code word (D data bits followed by P check bits).

What is the minimum number of check bits (P) needed for single bit error correction?

Requirement: $2^{P}$ ≥ the minimum number of states to be distinguished.

The minimum number of states to be distinguished:

• The bit position of a possible single bit error in the code word, which consists of both data and check bits, must be specified. This requires D + P states.
• One additional state is needed to specify the „no error” state.

Thus the minimum number of states to be distinguished is D + P + 1.

Accordingly, to implement single bit error correction, the minimum number of check bits (P) needs to satisfy the requirement:

$$2^{P} \ge D + P + 1$$
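The smallest P satisfying this requirement can be found by simply incrementing P until the inequality holds. A minimal sketch (function name hypothetical):

```python
def min_check_bits_sec(data_bits: int) -> int:
    """Smallest P with 2**P >= D + P + 1 (single bit error correction)."""
    p = 0
    while 2 ** p < data_bits + p + 1:
        p += 1
    return p


if __name__ == "__main__":
    for d in (4, 8, 16, 32, 64, 128):
        print(f"D = {d:3d}: P = {min_check_bits_sec(d)}")
```

For 64 data bits this gives P = 7; one more bit is needed for double error detection.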

2.3.2.5 Use of ECC support (2)

Double bit error detection

An additional parity bit is needed to check for an additional error. Then the minimum number of check bits (CB) needed for SEC-DED is:

$$CB = P + 1$$

Since $2^{P} \ge D + P + 1$ and $P = CB - 1$:

$$2^{CB-1} \ge D + (CB - 1) + 1 = D + CB$$

Table 2.5: The number of check bits (CB) needed for D data bits

Data bits (D) | Check bits (CB)
1 | 3
2-4 | 4
5-11 | 5
12-26 | 6
27-57 | 7
58-120 | 8
121-247 | 9
248-502 | 10
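To illustrate how CB = P + 1 check bits achieve SEC-DED, here is a sketch of an extended Hamming code, a standard construction that meets the bounds above; the source does not state which code actual DIMMs or memory controllers use, and the function names are hypothetical. Check bits sit at the power-of-two positions of the code word and one extra overall parity bit is kept at index 0; for 64 data bits this yields a 72-bit word:

```python
def min_check_bits_secded(data_bits: int) -> int:
    """CB = P + 1, with P the smallest value satisfying 2**P >= D + P + 1."""
    p = 0
    while 2 ** p < data_bits + p + 1:
        p += 1
    return p + 1


def encode(data: list) -> list:
    """Extended Hamming code word for a list of 0/1 data bits.

    word[0] holds the overall parity bit; positions 1..n form a Hamming code
    with check bits at the power-of-two positions (1, 2, 4, ...).
    """
    p = min_check_bits_secded(len(data)) - 1
    n = len(data) + p
    word = [0] * (n + 1)
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):                  # not a power of two: data position
            word[pos] = next(bits)
    for i in range(p):
        mask = 1 << i                        # check bit i covers positions with bit i set
        word[mask] = sum(word[pos] for pos in range(1, n + 1) if pos & mask) % 2
    word[0] = sum(word[1:]) % 2              # extra bit for double error detection
    return word


def decode(word: list):
    """Return ("ok" | "corrected", data) or ("double_error", None)."""
    n = len(word) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if word[pos]:
            syndrome ^= pos                  # XOR of the positions holding a 1
    overall = sum(word) % 2                  # 1 if the total parity is violated
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome <= n:
        status = "corrected"                 # single error at position `syndrome`
        word = list(word)
        word[syndrome] ^= 1                  # syndrome 0: the overall parity bit flipped
    else:
        return "double_error", None          # two errors (or another uncorrectable pattern)
    data = [word[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return status, data


if __name__ == "__main__":
    data = [1, 0, 1, 1] * 16                 # 64 data bits
    word = encode(data)
    print(len(word), "bits in the code word")  # 64 data + 8 check bits
    word[17] ^= 1                            # inject a single-bit error
    print(decode(word)[0])
```

The syndrome pinpoints a single flipped bit, while the overall parity bit distinguishes one error (odd parity) from two errors (even parity with a non-zero syndrome).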

2.3.2.5 Use of ECC support (3)

Support of ECC and registering in DT and DP/MP platforms

• DT memories typically support neither ECC nor registered (buffered) DIMMs.

• Servers typically use registered DIMMs with ECC protection.

2.3.2.5 Use of ECC support (4)

Figure 2.14: Typical layout of a registered memory module with ECC [14]

Typical implementation of ECC protected registered DIMMs (used typically in servers)

Main components:
• two register chips for buffering the address and command lines,
• a PLL (Phase Locked Loop) unit for deskewing clock distribution.

2.3.2.5 Use of ECC support (5)

2.3.2.6 Use of registering (1)

Problems arising while implementing higher memory capacities

Higher memory capacities need more modules → higher loading of the lines → signal integrity problems.

Remedy: buffering the address and command lines and phase-locked clocking of the modules.

Registering

Principle: buffering the address and control lines
• to reduce signal loading in a memory channel,
• in order to increase the number of supported DIMM slots (max. memory capacity), needed first of all in servers.

2.3.2.6 Use of registering (2)

Implementation of registering

Registering is implemented by means of a register chip that buffers the address and control lines.

Figure 2.15: Registered signals in case of an SDRAM memory module [15]

REGE: Register enable signal

Note: Data (DQ) and data strobe (DQS) signals are not registered, as only the address and control signals are common to all memory chips.

2.3.2.6 Use of registering (3)

Number of register chips required

• Synchronous memory modules (SDRAM to DDR3 DIMMs) have about 20-30 address and control lines.
• Register chips usually buffer 14 lines.

Hence, typically two register chips are needed per memory module [16].
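The count follows from dividing the number of address and control lines by the register width and rounding up. A small sketch, assuming 14-line register chips as stated above (function name hypothetical):

```python
import math

REGISTER_WIDTH = 14  # lines buffered by one register chip (typical value from the text)


def register_chips_needed(addr_ctrl_lines: int, register_width: int = REGISTER_WIDTH) -> int:
    """Register chips needed to buffer all address and control lines of a module."""
    return math.ceil(addr_ctrl_lines / register_width)


if __name__ == "__main__":
    for lines in (20, 24, 28, 30):
        print(f"{lines} lines -> {register_chips_needed(lines)} register chips")
```

Two chips cover up to 28 lines; a module at the very top of the 20-30 line range would need a third 14-line register or a wider one.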

2.3.2.6 Use of registering (4)

Figure 2.17: Example. Block diagram of a registered DDR DIMM [16] (nine SDRAM devices; two PI74SSTV16857 registers buffering the address/control lines from the motherboard; a PI6CV857 PLL fed by the input clock from the motherboard; data lines connected directly from/to the motherboard)

Example: Block diagram of a registered DDR DIMM

2.3.2.6 Use of registering (5)

Figure 2.16: Typical layout of a registered memory module with ECC [14]

Typical layout of registered DIMMs:
• two register chips for buffering the address and command lines,
• a PLL (Phase Locked Loop) unit for deskewing clock distribution.

2.3.2.6 Use of registering (6)

Figure 2.18: Registered DIMM module with ECC [14]

Registered DIMM module with ECC

2.3.2.6 Use of registering (7)

Typical use of registered DIMMs (RDIMMs): in servers (memory capacities: a few tens of GB to a few hundreds of GB)

Typical use of unregistered DIMMs (UDIMMs): in desktops/laptops (memory capacities: up to a few GB)

2.3.2.6 Use of registering (9)


5. References


[1]: Wikipedia: Centrino, http://en.wikipedia.org/wiki/Centrino

[2]: Industry Uniting Around Intel Server Architecture; Platform Initiatives Complement Strong Intel IA-32 and IA-64 Targeted Processor Roadmap for 1999, Business Wire, Febr. 24 1999, http://www.thefreelibrary.com/Industry+Uniting+Around+Intel+Server+Architecture%3B+Platform...-a053949226

[3]: Intel Core 2 Duo Processor, http://www.intel.com/pressroom/kits/core2duo/

[4]: Keutzer K., Malik S., Newton R., Rabaey J., Sangiovanni-Vincentelli A., System Level Design: Orthogonalization of Concerns and Platform-Based Design, IEEE Transactions on Computer-Aided Design of Circuits and Systems, Vol. 19, No. 12, Dec. 2000, pp. 1-29.

[5]: Krazit T., Intel Sheds Light on 2005 Desktop Strategy, IDG News Service, Dec. 07 2004, http://pcworld.about.net/news/Dec072004id118866.htm

[6]: Perich D., Intel Volume platforms Technology Leadership, Presentation at HP World 2004, http://98.190.245.141:8080/Proceed/HPW04CD/papers/4194.pdf

[7]: Powerful New Intel Server Platforms Feature Array Of Enterprise-Class Innovations, Intel’s Press release, Aug. 2, 2004, http://www.intel.com/pressroom/archive/releases/2004/20040802comp.htm

[8]: Smith S., Multi-Core Briefing, IDF Spring 2005, San Francisco, Press presentation, March 1 2005, http://www.silentpcreview.com/article224-page2

[9]: An Introduction to the Intel QuickPath Interconnect, Jan. 2009, http://www.intel.com/content/dam/doc/white-paper/quick-path-interconnect-introduction-paper.pdf

[10]: Davis L., PCI Express Bus, http://www.interfacebus.com/PCI-Express-Bus-PCIe-Description.html


[11]: Ng P. K., “High End Desktop Platform Design Overview for the Next Generation Intel Microarchitecture (Nehalem) Processor,” IDF Taipei, TDPS001, 2008, http://intel.wingateweb.com/taiwan08/published/sessions/TDPS001/FA08%20IDF-Taipei_TDPS001_100.pdf

[12]: Computing DRAM, Samsung.com, http://www.samsung.com/global/business/semiconductor/products/dram/Products_ComputingDRAM.html

[13]: Samsung’s Green DDR3 – Solution 3, 20nm class 1.35V, Sept. 2011, http://www.samsung.com/global/business/semiconductor/Greenmemory/Downloads/Documents/downloads/green_ddr3_2011.pdf

[14]: DDR SDRAM Registered DIMM Design Specification, JEDEC Standard No. 21-C, Page 4.20.4-1, Jan. 2002, http://www.jedec.org

[15]: Datasheet, http://download.micron.com/pdf/datasheets/modules/sdram/SD9C16_32x72.pdf

[16]: Solanki V., “Design Guide Lines for Registered DDR DIMM Module,” Application Note AN37, Pericom, Nov. 2001, http://www.pericom.com/pdf/applications/AN037.pdf

[17]: Fisher S., “Technical Overview of the 45 nm Next Generation Intel Core Microarchitecture (Penryn),” IDF 2007, ITPS001, http://isdlibrary.intel-dispatch.com/isd/89/45nm.pdf

[18]: Razin A., Core, Nehalem, Gesher. Intel: New Architecture Every Two Years, Xbit Laboratories, 04/28/2006, http://www.xbitlabs.com/news/cpu/display/20060428162855.html

[19]: Haas J. & Vogt P., “Fully Buffered DIMM Technology Moves Enterprise Platforms to the Next Level,” Technology@Intel Magazine, March 2005, pp. 1-7

[20]: “Introducing FB-DIMM Memory: Birth of Serial RAM?,” PCStats, Dec. 23, 2005, http://www.pcstats.com/articleview.cfm?articleid=1812&page=1

[21]: McTague M. & David H., “Fully Buffered DIMM (FB-DIMM) Design Considerations,” Febr. 18, 2004, Intel Developer Forum, http://www.idt.com/content/OSA-S009.pdf

[22]: Ganesh B., Jaleel A., Wang D., Jacob B., Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling, 2007.

[23]: DRAM Pricing – A White Paper, Tachyon Semiconductors, http://www.tachyonsemi.com/about/papers/DRAM%Pricing.pdf