Dezső Sima

September 2008

(Ver. 1.0) Sima Dezső, 2008

1. Macroarchitecture and performance parameters of MMs

Overview

1. Introduction

2. Macroarchitecture of main memories

3. Key performance parameters of main memories

4. References

1. Introduction (1)

Scope

General purpose main memories, i.e. main memories used in desktops, servers and laptops

Figure: Main memories on motherboards: Server [77], Desktop [32]

1. Introduction (2)

1. Introduction (3)

Figure: Different kinds of memory modules

Layout of main memories

• Macroarchitecture of the main memory

• Layout of the memory modules

Figure: Main dimensions of the layout of main memories

1. Introduction (4)

2. Macroarchitecture of main memories

2.1 Introduction

2.2 Attachment policy

2.3 Point of attachment

2.4 Number of memory controllers

2.5 Number of memory channels

2.6 Attributes of memory channels

Macroarchitecture of main memories

Example 1

Figure: Single channel main memory attached via the FSB and the north bridge (blocks shown: processor with core, L2 controller, L2 and FSB controller; FSB; north bridge; one memory channel with memory modules)

2.1. Introduction (1)

Example 2

Figure: Dual channel main memory attached via the FSB and the north bridge (blocks shown: dual core processor with L2 controller, L2 and FSB controller; FSB; north bridge; two memory channels with memory modules)

2.1. Introduction (2)

Example 3

Figure: Single channel main memory attached via a dedicated memory controller (blocks shown: processor with core, L2, interconnection network (crossbar), bus controller with IO-bus and memory controller; one memory channel with memory modules)

2.1. Introduction (3)

Example 4

Figure: Dual channel main memory attached via a dedicated memory controller (blocks shown: dual core processor with L2 caches, system request queue, interconnection network (crossbar), bus controller with IO-bus and memory controller; two memory channels with memory modules)

2.1. Introduction (4)

Macroarchitecture of main memories

• Attachment policy
• Point of attachment
• No. of mem. controllers (in case of direct attachment)
• No. of mem. channels
• Attributes of mem. channels

Figure: Main dimensions of the macroarchitecture of main memories

2.1. Introduction (5)

Attachment policy

Direct attachment Indirect attachment

POWER4 (2001)

UltraSPARC IV+ (2005)

POWER5 (2004)

Montecito (2006) UltraSPARC T1 (2005)

UltraSPARC IV (2004)

Athlon 64 X2 line (2005)

PA-8800 (2004)

PA-8900 (2005)

Core Duo line (2006)

Indirect attachment:
• Longer access times (~20-30%)
• Independence from memory technology and speed

Direct attachment:
• Shorter access times (~20-30%)
• Dependence on memory technology and speed

POWER6 (2007)

Figure: Attachment policy

2.2. Attachment policy (1)

Indirect attachment: attachment via the FSB and north bridge (mem. control hub)

Direct attachment: attachment via mem. controller(s)

Opteron line (2003)

Barcelona (2007)

Cell BE (2006)

Figure: Indirect attachment of the main memory to the syst. architecture (blocks shown: cores, L2 controller, L2, FSB controller, FSB, north bridge, memory), e.g. Core Duo (2006), Core 2 Duo (2006)

Figure: Direct attachment of the main memory to the syst. architecture (blocks shown: cores with L2 caches, System Request Queue, crossbar IN, bus controller with HT-bus, memory controller, memory), e.g. Athlon 64 X2 (2005)

2.2. Attachment policy (2)

The point of attachment

• The highest cache level (via an IN)
  2-level caches: the IN connecting the L2 cache
  3-level caches: the IN connecting the L3 cache
  The M. c. is usually connected in this way if the highest cache level is inclusive.

• Between the two highest cache levels (via the IN connecting these levels)
  2-level caches: the IN connecting the L1 and L2 caches
  3-level caches: the IN connecting the L2 and L3 caches
  The M. c. is usually connected in this way if the highest cache level is exclusive.

Figure: Possible points of attachment of main memory to the system architecture

2.3. Point of attachment (1)

Interrelationship between the inclusion policy of L3 caches and the point of attachment

Figure: Point of attachment with an inclusive L3 and with an exclusive L3 (labels in the figure: data missing in L2/L3 (high traffic); replaced, modified data (low traffic); replaced lines; lines missing in L2 are reloaded and deleted from L3). Examples shown: Montecito (2006), POWER4 (2001), UltraSPARC IV+ (2005), POWER5 (2004)

2.3. Point of attachment (2)

2.3. Point of attachment (3)

Figure: Examples for attaching memory via the highest cache level

• In case of a two-level cache hierarchy: Core 2 Duo (2006), attached via the FSB; Athlon 64 X2 (2005), attached via the system request queue, crossbar and on-chip memory controller
• In case of a three-level cache hierarchy: Montecito (2006), with split L2 I/L2 D caches and an L3 per core, attached via the FSB

2.3. Point of attachment (4)

Figure: Examples for attaching memory via the interconnection network connecting the two highest cache levels

• In case of a two-level cache hierarchy: UltraSPARC T1 (2005) (blocks shown: cores 0 to 7, crossbar, L2 banks, four memory controllers with their memories, bus controller to the JBus)
• In case of a three-level cache hierarchy (exclusive L3): UltraSPARC IV+ (2005) (blocks shown: cores, L2, interconnection network, L3 tags/controller, L3 data, memory controller with memory, bus controller to the Fire Plane bus)

Number of memory controllers (in case of direct attachment)

• Single memory controller. Typ. use: usual implementations. E.g. POWER5 (2004), K8-based processors (2006)
• Dual memory controllers. Typ. use: a few recent designs. E.g. POWER6 (2007), Barcelona (2007)
• Quad memory controllers. Typ. use: exceptional designs. E.g. UltraSPARC T1 (2005), UltraSPARC T2 (2007)

Figure: Number of memory controllers (in case of direct attachment)

2.4. Number of memory controllers (1)

Figure: Block diagrams of the POWER5 and POWER6 processors [57]

2.4. Number of memory controllers (2)

Figure: Block diagrams of AMD’s K8 and Barcelona processors [58]

2.4. Number of memory controllers (3)

Figure: Block diagram of the UltraSPARC T2 (Niagara-2) [59]

2.4. Number of memory controllers (4)

Number of memory channels (per north bridge/memory controller)

• Single memory channel. E.g. Intel’s 845/848 chipset families for P4 desktops and earlier desktop chipsets. Typ. use: early desktops
• Dual memory channels. E.g. Intel’s 865 and higher chipset families for P4 desktops, Intel’s P4 based DP server chipsets, Cell BE. Typ. use: recent desktops, single core DP/MP servers
• Quad memory channels. E.g. Intel’s 5000 (Bensley) and 7000 (Caneland) platforms for Core Duo DC and MC processors. Typ. use: recent DC and QC DP/MP servers with FB-DIMM memory

Figure: Number of memory channels supported per north bridge/memory controller

2.5. Number of memory channels (1)

Figure: Block diagram of an early P4 desktop having a single memory channel (Intel 845 chipset) [49]

2.5. Number of memory channels (2)

Example 1

Figure: Block diagram of a more advanced P4 desktop including dual memory channels (Intel’s 975 chipset) [50]

2.5. Number of memory channels (3)

Example 2

Figure: Block diagram of an early P4-based DP server including dual memory channels (Supermicro’s E7520 chipset based X6DH8-G2/X6DHE-G2 motherboard) [51]

Example 3

2.5. Number of memory channels (4)

Memory Interface Controller (MIC)

• Dual XDR(TM) memory channels
• Interleaved addressing in the channels
• The MIC can be configured to support only a single channel
• ECC support (32 + 4 bits)

2.5. Number of memory channels (5)

Dual 36-bit wide XDR channels

Figure: Basic blocks of the Cell BE processor [60]

Memory bandwidth at 3.2 Gb/s transfer rate:

$3.2\,\mathrm{Gb/s} \times 2 \times 4\,\mathrm{B} = 25.6\,\mathrm{GB/s}$

2.5. Number of memory channels (6)

Remark

In dual channel configurations (or in general, in case of multiple memory channels) a scheme is needed to define the allocation of memory addresses to the individual channels.

Allocation of addresses to the individual channels

Interleaved mode

• Addresses are allocated alternately to the channels at 64 B boundaries, assuming 64 B long cache lines. Two consecutive cache lines can be retrieved simultaneously.
• Both memory channels must be populated with modules having the same size (e.g. 1 GB).
• Provides maximum performance in real applications.

Asymmetric mode

• Addresses start in the first channel and are allocated to this channel up to its highest rank. Then addresses continue in the second channel.
• No need to populate both channels, or to populate them with the same size.
• In real applications, performance is limited to single channel performance.

Figure: Address allocation alternatives to the individual channels
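The two allocation schemes can be sketched as simple address mappings. This is only an illustration under stated assumptions: the function names, the returned (channel, offset) pair and the byte-granular addressing are hypothetical and are not taken from any chipset documentation; only the 64 B interleaving granularity comes from the slide.

```python
from __future__ import annotations

LINE_SIZE = 64  # bytes per cache line, as assumed on the slide


def interleaved_channel(addr: int, n_channels: int = 2) -> tuple[int, int]:
    """Interleaved mode: consecutive 64 B lines alternate between channels.

    Returns (channel index, byte offset within that channel).
    """
    line = addr // LINE_SIZE
    channel = line % n_channels
    offset = (line // n_channels) * LINE_SIZE + addr % LINE_SIZE
    return channel, offset


def asymmetric_channel(addr: int, channel_sizes: list[int]) -> tuple[int, int]:
    """Asymmetric mode: fill the first channel completely, then the next one.

    channel_sizes holds the populated capacity of each channel in bytes
    (hypothetical parameter; the sizes need not be equal).
    """
    base = 0
    for channel, size in enumerate(channel_sizes):
        if addr < base + size:
            return channel, addr - base
        base += size
    raise ValueError("address beyond installed memory")
```

In the interleaved sketch two neighbouring cache lines always land in different channels, which is why they can be fetched simultaneously; in the asymmetric sketch a contiguous working set normally stays within one channel.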

2.5. Number of memory channels (7)

Example 4

Figure: Block diagram of Intel’s 5000 (Bensley) DP platform for DC/QC Core 2 Duo processors including quad memory channels [52]. Blocks shown: Xeon 5000 (Dempsey, Netburst, DC), 5100 (Woodcrest, Core 2, DC) and 5300 (Clovertown, Core 2, QC) processors; 5000 (Blackford) north bridge; FB-DIMM memory, up to 64 GB. In workstations the snoop filter eliminates snoop traffic to the graphics port.

2.5. Number of memory channels (8)

Example 5

Figure: Block diagram of Intel’s 7300 (Caneland) MP platform for DC/QC Core 2 Duo processors including quad memory channels [53]. Blocks shown: Xeon 7200 (Tigerton DC, Core 2) dual core and 7300 (Tigerton QC, Core 2) quad core processors; FB-DIMM memory, up to 512 GB.

Figure: Maximum supported FB-DIMM configuration (6 channels/8 DIMMs) [54]

Remark

The FB-DIMM technology supports up to 6 memory channels with 8 DIMMs each [54]; nevertheless, actual implementations typically support only four DIMMs.

2.5. Number of memory channels (9)

Attributes of memory channels

Supported type of mem. modules

Supported no. of mem. modules

Supported no. of ranks per mem. module

Supported attributes of DRAM devices

Figures: Attributes of memory channels

2.6. Attributes of memory channels (1)

Supported type of memory modules

• Memory modules of the same DRAM type (usual implementation), e.g. DDR2
• Memory modules of different DRAM types (DRAM type A, DRAM type B), e.g. DDR and DDR2, in order to provide a choice and an evolution path in times of memory technology transitions (e.g. while DDR2 technology replaces DDR technology)

Figure: Type of memory modules supported on the memory channel(s)

2.6. Attributes of memory channels (2)

Example

Intel’s 915P/G chipsets support dual memory channels with either DDR or DDR2 technology. A single memory module is supported per channel (with one or two memory ranks on it).

Accordingly, a mainboard based on the 915G chipset, such as MSI’s 915G Combo mainboard, is designated as a combo mainboard.

2.6. Attributes of memory channels (3)

Note: Motherboards that allow a choice between two different DRAM types are termed Combo boards.

Figure: MSI’s 915G Combo motherboard (based on Intel’s 915G chipset) [61]. Marked in the figure: north bridge of the 915G chipset, 4 DIMM slots

2.6. Attributes of memory channels (4)

Figure: DIMM slots of MSI’s 915G Combo motherboard [61]. Marked in the figure: DDR2 and DDR slots; two DDR or DDR2 channels with a single DIMM slot on each channel

2.6. Attributes of memory channels (5)

Supported number of memory modules

It depends on the

• DRAM connection technology

• DRAM speed

• Number of ranks mounted onto the memory module(s).

2.6. Attributes of memory channels (6)

The maximum number of supported memory modules depends heavily on the memory connection technology, that is, whether the modules are connected

• via a parallel bus (as in case of SDRAM, DDR, DDR2, DDR3 modules) or
• via a serial bus (as in case of FBDIMM modules).

Number of memory modules supported per memory channel

• Modules connected via a parallel bus (e.g. SDRAM, DDR, DDR2, DDR3 modules): 1-4 memory modules
• Modules connected via a serial bus (FBDIMM modules): 6-8 memory modules

Figure: Number of memory modules vs memory connection technology in synchronous DRAMs

2.6. Attributes of memory channels (7)

Dependency on the memory connection technology

Remarks

1. Early chipsets supporting low speed, 1 or 4 Byte wide asynchronous DRAMs often allowed 4 to 8 memory modules to be attached.

2.6. Attributes of memory channels (8)

2. The Pentium processor provided a 64-bit wide datapath, so early (430 family) chipsets typically supported two pairs of 32-bit wide FPM/EDO modules.

Dependency on the memory speed

For higher transfer rates

• skews,
• jitter and
• reflections (caused by impedance mismatch while terminating transmission lines)

increasingly impede signal integrity.

Obviously, the more memory modules are present on a channel, the more serious the signal integrity problems that arise.

Higher transfer rates limit the number of memory modules that can be supported on a memory channel.

2.6. Attributes of memory channels (9)

Figure: Scaling down the number of supported DIMMs per channel with increasing data rates (assuming two ranks per DIMM) [62]

2.6. Attributes of memory channels (10)

Figure: Scaling down the number of PCI-X slots with increasing PCI-X bus speed [55]

2.6. Attributes of memory channels (11)

Levelling off channel capacity for synchronous DRAMs

With

• increasing device densities
• but a decreasing number of modules supported by memory channels at higher transfer rates,

the maximum memory capacity per memory channel remains roughly the same for synchronous DRAM devices [66].

But increasing server performance doubles memory capacity demand about every two years [66].

Figure: Channel capacity of synchronous DRAMs vs memory capacity demand [66]

2.6. Attributes of memory channels (12)

2.6. Attributes of memory channels (13)

Increasing server capacity demand calls for memory technologies with higher capacity potential, such as DRAM technologies with serial bus connection, like FB-DIMM.

Dependency on the number of ranks mounted onto the memory modules

Dual memory ranks mounted on the memory modules result in higher bus loading, and may reduce the maximum number of supported memory slots.

E.g. the north bridge of Intel’s 815 chipset supports at 133 MHz memory speed

• up to three SDRAM DIMMs with just a single rank or
• up to two SDRAM DIMMs with dual ranks.

2.6. Attributes of memory channels (14)

Number of memory modules supported per memory channel

• 1-2 memory modules. Typical use: desktops/entry level servers
• 6-8 memory modules. Typical use: DP/MP servers with FBDIMM mem. modules

Figure: Number of memory modules supported per memory channel by Intel’s P4/Core 2 Duo north bridges

2.6. Attributes of memory channels (15)

2.6. Attributes of memory channels (16)

Figure: Example 1. P4 based desktop motherboard (MSI’s 915G Combo motherboard with Intel’s 915G chipset) [61]. Marked in the figure: 4 DIMM slots (DDR2 and DDR); two DDR or DDR2 channels with a single DIMM slot on each channel

Figure: Example 2. P4-based entry-level DP server motherboard (Supermicro’s P8SCT with Intel’s E7221 chipset) [63]. Marked in the figure: CPU, MCH (E7221), 4 DIMM slots (Ch. A, Ch. B); two DDR2 channels with two DIMM slots on each channel

2.6. Attributes of memory channels (17)

Figure: Example 3. Block diagram of a Core 2 based four-processor MP server (Supermicro’s X7QC3 with Intel’s 7300 north bridge) [64]. Blocks shown: Xeon 7200 DC/7300 QC (Tigerton) processors, 7300 NB, ESB2 SB, ATI ES1000 graphics with 32 MB video memory; 4 DDR2 FB-DIMM channels with 6 DIMM slots on each channel, up to 192 GB

2.6. Attributes of memory channels (18)

2.6. Attributes of memory channels (19)

Figure: Example 3. Core 2 based four-processor MP server motherboard (Supermicro’s X7QC3 with Intel’s 7300 north bridge) [64]. Marked in the figure: 4 DDR2 FB-DIMM channels with 6 DIMM slots on each channel

Figure: Example 4. Block diagram of Intel’s Core 2 based 7300 (Caneland) MP platform with the 7300 (Clarksboro) chipset (9/2007) [65]. Blocks shown: Xeon 7200 (Tigerton DC, Core 2) dual core and 7300 (Tigerton QC, Core 2) quad core processors; four DDR2 FB-DIMM channels with 8 DIMM slots on each channel, up to 512 GB

2.6. Attributes of memory channels (20)

Supported number of ranks per memory module

Rank: logical unit

• A rank consists of a set of DRAM devices (of a given width) that are needed to achieve the expected data width of the memory module.
  E.g. a 64-bit wide rank consists of eight 8-bit wide or four 16-bit wide DRAM devices.
• DRAM devices constituting a rank are mounted side by side onto a memory module.
• A rank usually covers one side of the memory module (using x8 or x16 devices), but 64-bit wide ranks built up of x4 devices (16 devices) typically cover both sides.
• Optionally, a rank may include an additional DRAM device to hold ECC bits.
• All devices of a rank share the address and the command bus.
• All devices of a rank are selected by the same CS (Chip Select) signal, whereas different ranks have different CS signals.

A memory rank is sometimes also designated as a row.

2.6. Attributes of memory channels (21)
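The relation between rank width and device width can be checked with a few lines of Python. This is a minimal sketch; the function name is hypothetical and the optional ECC device mentioned on the slide is deliberately left out.

```python
def devices_per_rank(rank_width_bits: int, device_width_bits: int) -> int:
    """Number of DRAM devices needed to build one rank of the given data width."""
    if rank_width_bits % device_width_bits != 0:
        raise ValueError("rank width must be a multiple of the device width")
    return rank_width_bits // device_width_bits


# 64-bit wide rank built from x4, x8 and x16 devices
for width in (4, 8, 16):
    print(f"x{width}: {devices_per_rank(64, width)} devices")
```

The x4 case yields 16 devices, which matches the remark that such ranks typically occupy both sides of the module.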

2.6. Attributes of memory channels (22)

Figure: Connecting ranks to the memory controller [68]

Memory module: physical unit

• A memory module is basically a printed circuit card that carries one or more ranks and fits into a memory slot of the motherboard.
• Memory modules may be populated either on one side or on both sides.

A memory module may contain

• a single rank on one of its sides,
• a single rank on both of its sides, or
• two ranks, one on each of its sides.

2.6. Attributes of memory channels (23)

Figure: Example 1: One 64-bit wide DDR3 SO-DIMM rank consisting of four 16-bit DRAM devices that are mounted on one side of the module [67]

2.6. Attributes of memory channels (24)

Figure: Example 2: One 64-bit wide DDR3 SO-DIMM rank consisting of eight 8-bit DRAM devices that are mounted on both sides of the module [67]

2.6. Attributes of memory channels (25)

Figure: Example 3: Two 64-bit wide DDR3 SO-DIMM ranks, each consisting of four 16-bit DRAM devices, that are mounted on both sides of the module [67]

2.6. Attributes of memory channels (26)

Supported number of ranks per memory module

• Dual ranks are supported per mem. module: typical implementation
• A single rank is supported per mem. module: in few cases, usually as a restriction for higher DRAM speeds

Figure: Supported number of ranks (rows) per memory module

2.6. Attributes of memory channels (27)

Examples

a) The north bridge of Intel’s 815 chipset supports

• up to three SDRAM-100 DIMMs with dual ranks or
• up to three SDRAM-133 DIMMs with just a single rank or
• up to two SDRAM-133 DIMMs with dual ranks.

b) The north bridge of Intel’s P35 chipset for Core 2 Duo processors supports

• up to two DDR2-800/667 or DDR3-1066/800 DIMMs with dual ranks.

Supported attributes of DRAM devices

• DRAM type
• DRAM width
• DRAM density
• DRAM speed

Figure: Supported attributes of DRAM devices

2.6. Attributes of memory channels (28)

2.6. Attributes of memory channels (29)

DRAM types (for general use)

DRAM type    Year of intro.
DRAM         1970
FP           ~1974
FPM          1983
EDO          1995
SDRAM        1996
DRDRAM       1999
DDR          2000
DDR2         2004
FBDIMM       2006
XDR          2006 (1)
DDR3         2007

The figure classifies these types as asynchronous vs. synchronous DRAMs, as DRAMs with parallel vs. serial bus connection, and as main stream vs. challenging DRAM types.

(1) Used in the Cell BE and the PlayStation 3, but not yet in desktops or servers

Figure: DRAM types for general use

(Described in Sections 4, 5, 6 of the Chapter DRAM devices)

DRAM width

North bridges/memory controllers specify the width of supported DRAM devices.

Most recent north bridges/memory controllers support x8 and x16 DRAM devices.

DRAM density

North bridges/memory controllers specify supported DRAM densities.

Example 1

The north bridge of Intel’s 815 chipsets for Pentium III processors supports SDRAM devices with 16Mb/64Mb/128Mb/256Mb densities.

2.6. Attributes of memory channels (30)

Example 2

The north bridge of Intel’s Series 3 chipset family for Core 2 Duo and Core 2 Quad processors supports DDR2 and DDR3 devices with 512Mb and 1Gb densities.

DRAM speed

North bridges/memory controllers also specify supported DRAM speeds.

Example: Supported DRAM speeds of the north bridges (MCH/GMCH) of Intel’s 845xx family of chipsets.

Figure: Intel’s 845xx (Brookdale) chipset family. Members shown: 845, 845GL, 845G, 845E, 845GV, 845GE, 845PE (introduced between 9/01 and 10/02). Features listed per member: FSB speed (400 MHz or 533/400 MHz), HT support (not supported or supported), memory (single channel SDR/DDR SDRAM, unbuffered, max. memory 2 GB) and DRAM speed (PC133, DDR 266/200 or DDR 333/266).

Another example:

The north bridge of Intel’s Series 3 chipsets for Core 2 Duo and Core 2 Quad processors supports DDR2 devices with 667/800 MT/s or DDR3 devices with 800/1066 MT/s transfer rates.

2.6. Attributes of memory channels (31)

3. Key performance parameters of main memories

3.1 Memory capacity

3.2 Memory bandwidth

3.3 Memory latency

Memory capacity (CM)

$C_M = n_{CU} \times n_{CH} \times n_M \times n_R \times C_R$

with

nCU: No. of north bridges/memory control units
nCH: No. of memory channels per north bridge/control unit
nM: No. of memory modules per channel
nR: No. of ranks per memory module
CR: Rank capacity (device density x no. of DRAM devices)

E.g. the Core 2 based P35 chipset supports up to two memory channels with up to two dual-ranked memory modules per channel, with 8 x8 devices of 512 Mb or 1 Gb density per rank.

The resulting maximum memory capacity is:

$C_{Mmax} = 1 \times 2 \times 2 \times 2 \times (8 \times 1\,\mathrm{Gb}) = 8 \times 1\,\mathrm{GB} = 8\,\mathrm{GB}$
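The capacity formula is easy to evaluate programmatically. The sketch below is illustrative only; the function and constant names are hypothetical, and binary units (1 Gb = 2^30 bits) are assumed.

```python
GBIT = 2**30       # bits in 1 Gb (binary units assumed)
GBYTE = 8 * GBIT   # bits in 1 GB


def rank_capacity_bits(n_devices: int, density_bits: int) -> int:
    """Rank capacity = number of DRAM devices per rank x device density."""
    return n_devices * density_bits


def memory_capacity_bits(n_cu: int, n_ch: int, n_m: int, n_r: int, c_r_bits: int) -> int:
    """Memory capacity = control units x channels x modules x ranks x rank capacity."""
    return n_cu * n_ch * n_m * n_r * c_r_bits


# P35 example from the slide: 2 channels, 2 dual-ranked modules per channel,
# 8 x8 devices of 1 Gb density per rank
p35_bits = memory_capacity_bits(1, 2, 2, 2, rank_capacity_bits(8, 1 * GBIT))
print(p35_bits // GBYTE, "GB")
```

Keeping the rank capacity as a separate step makes it explicit that the "1 Gb x8" term in the worked example is the capacity of one rank (1 GB).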

3.1. Memory capacity (1)

3.1. Memory capacity (2)

Crucial factors limiting the maximum capacity of main memories

• nM: No. of memory modules supported per memory channel

• CR: Rank capacity (device density x no. of DRAM devices/rank).

Number of memory modules supported per memory channel

• Modules connected via a parallel bus (e.g. SDRAM, DDR, DDR2, DDR3 modules): 1-4 memory modules. Higher transfer rates limit the number of mem. modules typically to one or two.
• Modules connected via a serial bus (FBDIMM modules): 6-8 memory modules

Figure: Number of memory modules supported by memory channel

3.1. Memory capacity (3)

Rank capacity (CR)

$C_R = n_D \times D$

with

nD: Number of DRAM devices/rank
D: Device density

Number of DRAM devices/rank

Typically: up to 8

E.g. a one-sided (single rank) DDR3 memory module built up of 8 devices

3.1. Memory capacity (4)

Device density

Figure: Evolution of DRAM densities (Mbit) and no. of units shipped/year (Based on [35]). Horizontal axis: year (1980 to 2015); vertical axis: units shipped (10^6, 0 to 2000); one curve per device density (16K, 64K, 256K, 1M, 4M, 16M, 64M, 256M, 1G). Density: ~4×/4Y

3.1. Memory capacity (5)

Ranks typically include up to 8 DRAM devices.

Typical maximum main memory sizes (CMmax) of recent Core 2 based desktops, assuming:

• 2 memory channels
• 1 module per channel
• dual ranked modules
• populated with 8 x8 DDR2 or DDR3 devices of 1 Gb density (1 GB per rank):

$C_{Mmax} = 1 \times 2 \times 1 \times 2 \times 1\,\mathrm{GB} = 4\,\mathrm{GB}$

Typical maximum main memory sizes of recent Core 2 based servers, assuming:

• 4 memory channels
• 6 modules per channel
• dual ranked modules
• populated with 8 x8 FB-DIMM DDR2 devices of 4 Gb density (4 GB per rank):

$C_{Mmax} = 1 \times 4 \times 6 \times 2 \times 4\,\mathrm{GB} = 192\,\mathrm{GB}$

3.1. Memory capacity (6)

The rate of increasing DRAM densities

In accordance with Moore’s law (saying that the transistor count per chip doubles about every 24 months), DRAM densities evolve by about 4x/4 years.

For the same number of control units/modules/ranks, the maximum size of main memories also increases by about 4x/4 years.

3.1. Memory capacity (7)

Bandwidth of memory systems

Total bandwidth (BW) provided by a memory system:

$BW = n_{CU} \times n_{CH} \times T \times W_M$

with

nCU: No. of north bridges/memory control units
nCH: No. of memory channels per north bridge/control unit
T: Transfer rate of the module (no. of data transfers/sec)
WM: Data width of the memory modules

E.g. a memory system with a single, dual channel controller and 8 Byte wide DDR2 800 modules provides a total bandwidth of:

$BW = 1 \times 2 \times 800\,\mathrm{MT/s} \times 8\,\mathrm{B} = 12.8\,\mathrm{GB/s}$

Obviously, processors with an increasing number of cores require increasingly higher memory bandwidth.

3.2. Memory bandwidth (8)
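As a quick check of the bandwidth formula, the following sketch evaluates it for the DDR2 800 example. The function name and the choice of MT/s and bytes as input units are assumptions made for illustration; decimal units (1 GB/s = 10^9 B/s) are used.

```python
def memory_bandwidth_bytes_per_s(n_cu: int, n_ch: int,
                                 transfer_rate_mt_s: int, width_bytes: int) -> int:
    """Total bandwidth = control units x channels x transfer rate x module width."""
    return n_cu * n_ch * transfer_rate_mt_s * 1_000_000 * width_bytes


# Single dual-channel controller with 8 Byte wide DDR2 800 modules
ddr2_800 = memory_bandwidth_bytes_per_s(1, 2, 800, 8)
print(ddr2_800 / 1e9, "GB/s")
```

The same function reproduces the Cell BE figure quoted in this deck when called with two channels, 3200 MT/s and 4 bytes per channel.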

The min. column cycle time (tCCD) of the memory cell array

tCCD (Core column delay) is the min. time interval between consecutive Reads or Writes.

Figure: The interpretation of tCCD [36]

3.2. Memory bandwidth (10)

Remark

tCCD is also designated as the Read/Write command to Read/Write command delay.

Figure: The evolution of the column cycle time (tCCD) in different SDRAM types (ns) [37]

3.2. Memory bandwidth (11)

Note: The min. column cycle time (tCCD) of synchronous DRAMs is:

SDRAM: 7.5 ns
DDR/DDR2/DDR3: 5 ns

The crucial factor limiting the memory bandwidth of the main memory:

Transfer rate of the memory module (no. of data transfers/sec)

The transfer rate of the memory module (T) equals the transfer rate of the DRAM devices used.

The peak transfer rate (Tmax) of synchronous DRAM devices:

$T_{max} = \frac{1}{t_{CCD}} \times FW$

with

tCCD: Min. column cycle time of the memory cell array
FW: Fetch width of the memory cell array

3.2. Memory bandwidth (9)

The fetch width (FW) of the memory cell array

specifies how many times more bits the cell array fetches per column cycle than the data width of the device.

E.g. an x4 DRAM chip with a fetch width of 4 (actually a DDR2 DRAM) fetches 4 × 4, that is 16 bits, from the memory cell array per column cycle.

The fetch width (FW) of the memory cell array of synchronous DRAMs is:

DRAM type    FW
SDRAM        1
DDR          2
DDR2         4
DDR3         8

3.2. Memory bandwidth (12)

The peak transfer rates of the different DRAM technologies are:

$T_{max} = \frac{1}{t_{CCD}} \times FW$

SDRAM: 1/7.5 ns x 1 = 133 MT/s
DDR: 1/5 ns x 2 = 400 MT/s
DDR2: 1/5 ns x 4 = 800 MT/s
DDR3: 1/5 ns x 8 = 1600 MT/s
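The peak transfer rates listed above follow directly from the column cycle times and fetch widths given on the preceding slides. A minimal sketch, with hypothetical table and function names:

```python
# Values as given on the slides: tCCD in ns, fetch width FW (dimensionless)
T_CCD_NS = {"SDRAM": 7.5, "DDR": 5.0, "DDR2": 5.0, "DDR3": 5.0}
FETCH_WIDTH = {"SDRAM": 1, "DDR": 2, "DDR2": 4, "DDR3": 8}


def peak_transfer_rate_mt_s(dram_type: str) -> float:
    """Tmax = 1/tCCD x FW; 1/ns equals 1000 MHz, so the result is in MT/s."""
    return 1000.0 / T_CCD_NS[dram_type] * FETCH_WIDTH[dram_type]


for dram_type in T_CCD_NS:
    print(f"{dram_type}: {peak_transfer_rate_mt_s(dram_type):.0f} MT/s")
```

Since tCCD stays at 5 ns from DDR onwards, every doubling of the peak rate in this model comes from doubling the fetch width.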

3.2. Memory bandwidth (13)

3.2. Memory bandwidth (14)

Figure: The evolution of peak transfer rates of synchronous DRAMs in Intel’s chipsets. Horizontal axis: year (1996 to 2008); vertical axis: transfer rate (MT/s, logarithmic scale from 10 to 5000). Data points: SDRAM 66, SDRAM 100, SDRAM 133, DDR 266, DDR 333, DDR 400, DDR2 533, DDR2 667, DDR2 800, DDR3 1067. Trend: ~10×/10 years

The evolution of peak transfer rates of synchronous DRAMs

Peak transfer rates evolve by ≈ 10x/10 years, that means doubling in 3-4 years.

3.2. Memory bandwidth (15)

Sources of the evolution

The introduction of new synchronous DRAM technologies (SDRAM/DDR/DDR2/DDR3).

More specifically, the more and more advanced approaches to improve first of all

• signaling (by using SSTL_2/1.8/1.5, differential CK/DQS)
• synchronisation (by using source synchronisation, DLLs to align CK with DQs etc.) and
• line terminations (by using ODT, dynamic ODT, ZQ calibration etc.)
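The doubling period implied by a growth rate of 10x per 10 years can be derived from the exponential growth law. The short sketch below assumes steady exponential growth; the function name is hypothetical.

```python
import math


def doubling_time_years(growth_factor: float, period_years: float) -> float:
    """Years needed to double, given growth by growth_factor every period_years."""
    return period_years * math.log(2) / math.log(growth_factor)


# Peak transfer rates: about 10x per 10 years
print(f"{doubling_time_years(10, 10):.2f} years")
```

Under this assumption the result is close to 3 years, i.e. at the lower end of the 3-4 years quoted for peak transfer rates.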

The evolution of processor clock frequencies vs transfer rates of main memories in mainstream processors

3.2. Memory bandwidth (16)

The evolution of processor clock frequencies (fC) in desktops

Figure: Evolution of clock frequencies in Intel’s desktop processors. Horizontal axis: year of first volume shipment (1978 to 2005); vertical axis: clock frequency fc (MHz, logarithmic scale from 1 to 5000). Data points: 8088, 286, 386, 486, 486-DX2, 486-DX4, Pentium, Pentium Pro, Pentium II, Pentium III, Pentium 4. Trend lines: ~10×/10 years, ~100×/10 years, then leveling off

3.2. Memory bandwidth (17)

3.2. Memory bandwidth (21)


The evolution of processor clock frequencies (fC) in desktops between 1995-2003

Figure: Evolution of clock frequencies in Intel’s desktop processors. Horizontal axis: year of first volume shipment (1978 to 2005); vertical axis: clock frequency fc (MHz, logarithmic scale). Trend in this period: ~100×/10 years

3.2. Memory bandwidth (19)


3.2. Memory bandwidth (20)

In the time period of about 1995 - 2003

• clock frequencies rose at a rate of ≈ 100x/10 years, whereas
• transfer rates of main memories rose only at a rate of ≈ 10x/10 years,

so the gap between clock frequencies and memory transfer rates became continuously wider.

In this time period higher clock rates were the main source of higher processor performance, but higher processor performance invokes higher memory traffic.

Thus in this time period a strong motivation arose to increase the bandwidth of main memories by increasing the width of the datapath to the main memory, first of all by introducing dual memory channels.

Dual memory channels became commonplace even in desktops.

3.2. Memory bandwidth (22)
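How fast the gap between clock frequency and memory transfer rate widens can be estimated from the two growth rates quoted for 1995 to 2003. The sketch below assumes steady exponential growth at exactly 100x and 10x per decade; the function name and the 8-year span are illustrative assumptions.

```python
def growth_over(factor_per_decade: float, years: float) -> float:
    """Growth factor accumulated over the given number of years."""
    return factor_per_decade ** (years / 10)


years = 8  # roughly 1995 to 2003
clock_growth = growth_over(100, years)
memory_growth = growth_over(10, years)
gap_growth = clock_growth / memory_growth
print(f"clock x{clock_growth:.1f}, memory x{memory_growth:.1f}, gap x{gap_growth:.1f}")
```

With these assumed rates the ratio of clock frequency to memory transfer rate itself grows by a factor of 10 per decade.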

The evolution of processor clock frequencies (fC) in desktops after about 2003

Figure: Evolution of clock frequencies in Intel’s desktop processors after about 2003. Horizontal axis: year of first volume shipment (1978 to 2005); vertical axis: clock frequency fc (MHz, logarithmic scale). Trend after about 2003: leveling off

3.2. Memory bandwidth (24)

In the time period of about 2003 - 2005

After about 2003, however, clock frequencies became saturated (due to hitting the thermal wall), and single core processors represented the mainline until about 2005.

In this period the gap between clock frequencies and memory transfer rates became narrower.

3.2. Memory bandwidth (23)

Beginning with about 2005

Nevertheless, beginning with ~2005 the era of multicores emerged, with the core count doubling about every two years.

A new scenario becomes dominant, with steadily increasing bandwidth/transfer rate requirements.

The status quo in increasing bandwidth/transfer rates

3.2. Memory bandwidth (25)


3.2. Memory bandwidth (26)

Figure: Evolution of the bandwidth of dual-channel synchronous DRAM memory systems [56]

Figure: Evolution of transfer rates (per pin bandwidth figures) of different DRAM types [40]

3.2. Memory bandwidth (27)

Memory latency

• Device level memory latency

• System level memory latency

3.3. Memory latency (1)

Figure: Estimated maximum and minimum read latencies of DRAM devices (ns)

3.3. Memory latency (2)

1 Read latency of DRAM, FPM, EDO and BEDO parts = tRAC (Row access time (time from row address until data valid)) Read latency of SDRAM parts = CL + tRCD (CAS Latency + Row to Column delay)2 The 815 chipset supports SDRAMs while the 820 RDRAMs3A new revision of the 845 supports DDRs instead of SDRAMs

Axes: read latency (ns, scale up to 200) versus year (1981 - 2007). Annotation rows: desktop processor (PC AT, 386 DX, 486 DX, P, PII, PIII, P4, Core2), chipset, typical DRAM chips (bits), DRAM type (FPM, EDO, SDRAM, RDRAM, DDR, DDR2, DDR3).
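Footnote 1 of the figure gives the read latency of SDRAM parts as CL + tRCD. Both values are specified in memory clock cycles, so they have to be multiplied by the cycle time to obtain nanoseconds. The sketch below shows this conversion; the function name and the 5-5-5 timings used for DDR2-800 are illustrative assumptions, not values read from the figure.

```python
def sdram_read_latency_ns(cas_latency_cycles, trcd_cycles, clock_mhz):
    """Device-level read latency of an SDRAM part in ns.

    Read latency = CL + tRCD, both given in memory clock cycles.
    clock_mhz is the memory bus clock; for DDR-type parts this is
    half the transfer rate (e.g. 400 MHz for DDR2-800).
    """
    cycle_time_ns = 1000.0 / clock_mhz
    return (cas_latency_cycles + trcd_cycles) * cycle_time_ns


if __name__ == "__main__":
    # Assumed timings: DDR2-800 with CL = 5 and tRCD = 5 at 400 MHz
    print(sdram_read_latency_ns(5, 5, 400))  # 25.0
    # Assumed timings: PC133 SDRAM with CL = 3 and tRCD = 3 at 133 MHz
    print(round(sdram_read_latency_ns(3, 3, 133), 1))
```

Although the transfer rate of the assumed DDR2 part is several times that of the SDRAM part, its read latency in nanoseconds is lower only by a much smaller factor.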

Figure: Estimated typical system-level memory latency in x86-based PCs (in ns)

Axes: memory latency (ns, scale up to 300) versus year (1981 - 2008). Annotation rows: desktop processor (PC (8088), AT (286), 386 DX, 486 DX, P, PPro, PII, PIII, P4, Core2), chipset, typical DRAM parts (bits), DRAM type (DRAM, FPM, EDO, SDRAM, RDRAM, DDR, DDR2, DDR3).

3.3. Memory latency (3)

Figure 5.1c: System-level memory latencies in x86-based PCs (in proc. clock cycles)

Axes: memory latency in processor clock cycles (scale 1 - 1000) versus year (1981 - 2008). Annotation rows: desktop processor (PC (8088), AT (286), 386 DX, 486 DX, P, PPro, PII, PIII, P4, Core2), chipset, typical DRAM parts (bits), DRAM type (DRAM, FPM, EDO, SDRAM, RDRAM, DDR, DDR2, DDR3).
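A latency given in ns is converted into processor clock cycles by multiplying it with the processor clock frequency. The sketch below illustrates this; the function name and the example latencies and frequencies are assumptions chosen for illustration, not values read from the figures.

```python
def latency_in_cycles(latency_ns, core_clock_ghz):
    """System-level memory latency expressed in processor clock cycles.

    cycles = latency [ns] * clock frequency [GHz],
    since 1 GHz corresponds to one cycle per ns.
    """
    return latency_ns * core_clock_ghz


if __name__ == "__main__":
    # Assumed example: 200 ns latency at a 0.025 GHz (25 MHz) clock
    print(latency_in_cycles(200, 0.025))  # 5.0
    # Assumed example: 100 ns latency at a 3 GHz clock
    print(latency_in_cycles(100, 3.0))    # 300.0
```

The example shows why the latency measured in cycles can grow by orders of magnitude even while the latency measured in ns is halved: the clock frequency rises much faster than the latency falls.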

3.3. Memory latency (4)

[1]: 64MB Apple G3 Beige 168p SDRAM DIMM, http://www.memoryx.net/apl168s64.html

[2]: 4, 8 MEG x 32 DRAM SIMMs, Micron, http://www.pjrc.com/mp3/simm/datasheet.html

[3]: 168 Pin, PC133 SDRAM Registered DIMM Design Specification, JEDEC Standard No. 21-C, Page 4.20.2

[4]: 184 Pin Unbuffered DDR SDRAM DIMM Family, JEDEC Standard No. 21-C, Page 4.5.10

[5]: Direct Rambus DRAM RIMM Module, 512 MB, MC-4R512FKE6D, Elpida, http://pdf1.alldatasheet.com/datasheet-pdf/view/60081/ELPIDA/MC-4R512FKE6D.html

[6]: DDR2 SDRAM UDIMM Features, Micron, http://www.micron.com/products/modules/udimm/partlist

[7]: DDR3 SDRAM UDIMM Features, Micron, http://www.micron.com/products/modules/udimm/partlist

[8]: DDR2 SDRAM FBDIMM Features, Micron, http://www.micron.com/products/modules/fbdimm/partlist

[9]: Torres G., „Memory Tutorial”, July 19, 2005, Hardwaresecrets, http://www.hardwaresecrets.com/article/167/1

[10]: Besedin D., „First look at DDR3”, Digit-life, June 29, 2007, http://www.digit-life.com/articles2/mainboard/ddr3-rmma.html

4. References (1)

[11]: http://www.hardwaresecrets.com/fullimage.php?image=2862

4. References (2)

[12]: http://cgi.ebay.com/Vintage-Microsoft-8-Bit-ISA-PC-RAM-Card-W-Gold-5150_W0QQitemZ310017171151QQcmdZViewItem

[13]: http://www.hardwaresecrets.com/fullimage.php?image=2856

[14]: http://www.memex.com.au/images/72psimm.jpg

[15]: Ahn J.-H., „DRAM Operation & Architecture,” 2007. 9. 10., Hynix, http://netro.ajou.ac.kr/~jungyol/memory2.pdf

[16]: http://www.twinmos.com/dram/dram_p_dt_ddr.htm#s

[17]: http://www.twinmos.com/dram/images/photo_dt_ddr2.jpg

[18]: http://www.twinmos.com/dram/dram_p_dt_ddr3_1333.htm#s

[19]: http://item.express.ebay.com/16mb-EDO-3-3V-72-Pin-SODIMM-LAPTOP-RAM-LAPTOP-16mb-EDO_W0QQitemZ230060958674QQihZ013QQcmdZExpressItem

[20]: http://www.twinmos.com/dram/dram_p_nb_sdr_sodimm.htm

[21]: http://www.cdw.com/shop/products/default.aspx?EDC=915882

[22]: http://laptoping.com/category/laptop-memory

[23]: http://www.twinmos.com/dram/images/photo_dt_ddr2.jpg

[24]: http://www.twinmos.com/dram/images/photo_dt_ddr2.jpg

[25]: Datasheet, Micron, http://download.micron.com/pdf/datasheets/modules/sdram/SD18C32_64_128x72D.pdf

[25]: Datasheet, Micron, http://download.micron.com/pdf/datasheets/modules/sdram/sd18c32_64_128x72.pdf

[26]: Datasheet, Micron, http://download.micron.com/pdf/datasheets/modules/ddr/DDF18C64_128x72D.pdf

[27]: Datasheet, Micron, http://download.micron.com/pdf/datasheets/modules/ddr2/HTF18C64_128_256x72D.pdf

[28]: Datasheet, Micron, http://download.micron.com/pdf/datasheets/modules/ddr3/JSF18C256x72PD.pdf

[29]: PLL Clock Driver for 2.5V DDR-SDRAM Memory, Datasheet, Pericom, Febr. 2003, http://www.pericom.com/pdf/datasheets/PI6CV857.pdf

[30]: PC2100 and PC1600 DDR SDRAM Registered DIMM Design Specification, JEDEC Standard No. 21-C, Page 4.20.4-1, Revision 1.3, Jan. 2002, http://www.jedec.org/download/search/4_20_04R13.PDF

[31]: Supermicro Motherboards, http://www.supermicro.com/products/motherboard/

[32]: http://www.pricegrabber.com/search_getprod.php/masterid=3191326

[33]: Definition of CDCV857 PLL Clock Driver for Registered DDR DIMM Applications, JESD82, JEDEC, July 2000

4. References (3)

[34]: http://www.tranzistoare.ro/datasheets2/32/327037_1.pdf

[35]: DRAM Pricing – A White Paper, Tachyon Semiconductors, http://www.tachyonsemi.com/about/papers/DRAM%Pricing.pdf

[36]: 16 Mb Synchronous DRAM, MT48LC4M4A1/A2, MT48LC2M8A1/A2 Micron, http://datasheet.digchip.com/297/297-04447-0-MT48LC2M8A1.pdf

[37]: Rhoden D., „The Evolution of DDR”, Via Technology Forum, 2005, http://www.via.com.tw/en/downloads/presentations/events/vtf2005/vtf05hd_inphi.pdf

4. References (4)

[39]: Van Roon T., „What exactly is a PLL?,” April 2006, http://www.uoguelph.ca/~antoon/gadgets/pll/pll.html

[38]: Haskill, „The Love/Hate relationship with DDR SDRAM Controllers,” Mosaid, Oct. 2006, http://www.mosaid.com/corporate/products-services/ip/SDRAM_Controller_whitepaper_Oct_2006.pdf

[40]: Choi J. H., „High Speed DRAM,” Memory Division, Samsung, 2004, http://asic.postech.ac.kr/1.Nrl/2.NRL%20Seminar/invitation/041208ChoiJH.pdf

[42]: Interfacing to DDR SDRAM with CoolRunner-II CPLDs, Application Note XAPP384, Febr. 2003, Xilinx Inc.

[41]: Introduction to Xilinx, Xilinx FPGA Design Workshop, http://www.eas.asu.edu/~kchatha/cse320_f07/xilinx_intro.ppt

[44]: Tam S., „Single Error Correction and Double Error Detection,” Xilinx Application Note XAPP645 (v.2.2), Aug. 2006, http://www.xilinx.com/support/documentation/application_notes/xapp645.pdf

[45]: DDR SDRAM Registered DIMM Design Specification, JEDEC Standard No. 21-C, Page 4.20.4-1, Jan. 2002, http://www.jedec.org

[46]: Understanding DDR3 Serial Presence Detect (SPD) Table, July 17, 2007, Simmtester, http://www.simmtester.com/PAGE/news/showpubnews.asp?num=153

[47]: DDR2 DIMM SPD Definition, August 25, 2006, http://docmemory.com/page/news/showpubnews.asp?num=141

[48]: Memory Module Serial Presence-Detect, TN-04-42, Micron, 2002 http://download.micron.com/pdf/technotes/TN_04_42_C.pdf

[43]: 64-bit Flow-Thru Error Detection and Correction Unit, IDT49C466, Integrated Device Technology Inc., 1999, http://www.digchip.com/datasheets/parts/datasheet/222/IDT49C466.php

4. References (5)

[49]: Intel 845 Chipset: 82845 Memory Controller Hub (MCH) for DDR, Datasheet, Jan. 2002, Intel, No. 298604-001

[51]: Supermicro X6DH8-G2, X6DHE-G2 Mainboards User’s Manual, Rev. 1.1b, June 2007, SUPER MICRO Computer Inc.

[50]: Intel 975X Express Chipset: 82975X Memory Controller Hub (MCH), Datasheet, Nov. 2005, Intel, No. 310158-001

[54]: „Introducing FB-DIMM Memory: Birth of Serial RAM?,” PCStats, Dec. 23, 2005, http://www.pcstats.com/articleview.cfm?articleid=1812&page=1

[55]: PCI Technology overview, Febr. 2003, http://www.digi.com/pdf/prd_msc_pcitech.pdf

[56]: DDR3 SDRAM, Samsung, http://www.samsung.com/global/business/semiconductor/products/dram/Products_DDR3SDRAM.html

[57]: Le H. Q. et al., „IBM POWER6 microarchitecture,” IBM J. R&D, Vol. 51, No. 6, 2007, pp. 639-662

[58]: Kanter D., „Inside Barcelona: AMD's Next Generation,” May 2007, http://www.realworldtech.com/includes/templates/articles.cfm?ArticleID=RWT051607033728&mode=print

[59]: Golla R., „Niagara2: A Highly Threaded Server-on-a-Chip,” Oct. 2006 http://www.opensparc.net/pubs/preszo//06/04-Sun-Golla.pdf

[60]: Hofstee P., „Tutorial: Hardware and Software Architectures for the CELL BROADBAND ENGINE processor”, IBM Corp., September 2005 http://www.crest.gatech.edu/conferences/cases2005/pdf/Cell-tutorial.pdf

4. References (6)

[52]: Intel® 5000P/5000V/5000Z Chipset Memory Controller Hub (MCH) – Datasheet, Sept. 2006. http://www.intel.com/design/chipsets/datashts/313071.htm

[53]: Intel® 7300 Chipset Memory Controller Hub (MCH) – Datasheet, Sept. 2007, http://www.intel.com/design/chipsets/datashts/313082.htm

[63]: http://www.supermicro.com/manuals/motherboard/E7221/MNL-0776.pdf

[65]: Intel® 7300 Chipset Memory Controller Hub (MCH) – Datasheet, Sept. 2007, http://www.intel.com/design/chipsets/datashts/313082.htm

[64]: http://www.supermicro.com/manuals/motherboard/7300/MNL-0955.pdf

[66]: Vogt P., „Fully Buffered DIMM (FB-DIMM) Server Memory Architecture,” Febr. 18, 2004, Intel Developer Forum, http://www.idt.com/content/OSA_S008_FB-DIMM-Arch.pdf

[67]: 204-Pin DDR3 SDRAM Unbuffered SO-DIMM Design Specification, JEDEC Standard No. 21C, Page 4.20.18-1

[68]: Jacob B. & Wang D., „Memory Systems: Circuits, Architecture and Performance Analysis,” Lecture notes, University of Maryland, ENEE759H, Spring 2005

[69]: Datasheet, http://download.micron.com/pdf/datasheets/modules/sdram/SD9C16_32x72.pdf

[70]: Solanki V., „Design Guide Lines for Registered DDR DIMM Module,” Application Note AN37, Pericom, Nov. 2001, http://www.pericom.com/pdf/applications/AN037.pdf

4. References (7)

[61]: 915 P/G Combo Mainboard (MS-7058) Manual, May 2004, MSI

[62]: Haas J. & Vogt P., „Fully-Buffered DIMM Technology Moves Enterprise Platforms to the Next Level,” Technology@Intel Magazine, http://www.intel.com/technology/magazine/computing/fully-buffered-dimm-0305.htm