Dezső Sima
September 2008
(Ver. 1.0) Sima Dezső, 2008
1. Macroarchitecture and performance parameters of MMs
Overview
1. Introduction
2. Macroarchitecture of main memories
3. Key performance parameters of main memories
4. References
General purpose main memories, i.e. main memories used in desktops, servers and laptops
1. Introduction (1)
Scope
Layout of main memories
Macroarchitecture of the main memory
Layout of the memory modules
Figure: Main dimensions of the layout of main memories
1. Introduction (4)
2. Macroarchitecture of main memories
2.1 Introduction
2.2 Attachment policy
2.3 Point of attachment
2.4 Number of memory controllers
2.5 Number of memory channels
2.6 Attributes of memory channels
Macroarchitecture of main memories
Example 1
Figure: Single channel main memory attached via the FSB and the north bridge
(Block diagram: in each processor the core and the L2 cache with its L2 controller connect via the FSB controller to the FSB; the north bridge attaches the memory modules through a single memory channel.)
2.1. Introduction (1)
Example 2
Figure: Dual channel main memory attached via the FSB and the north bridge
(Block diagram: dual-core processors with a shared L2 cache connect via the FSB controller to the FSB; the north bridge attaches the memory modules through two memory channels.)
2.1. Introduction (2)
Example 3
Figure: Single channel main memory attached via a dedicated memory controller
(Block diagram: in each processor the core and the L2 cache connect via an interconnection network (crossbar) to the B. c., which drives the IO-bus, and to the M. c., which attaches the memory modules through a single memory channel.)
2.1. Introduction (3)
Example 4
Figure: Dual channel main memory attached via a dedicated memory controller
(Block diagram: in each processor two cores with their L2 caches connect via the System Request Queue and an interconnection network (crossbar) to the B. c., which drives the IO-bus, and to the M. c., which attaches the memory modules through two memory channels.)
2.1. Introduction (4)
Macroarchitecture of main memories
• Attachment policy
• Point of attachment
• No. of memory controllers (in case of direct attachment)
• No. of memory channels
• Attributes of memory channels
Figure: Main dimensions of the macroarchitecture of main memories
2.1. Introduction (5)
Attachment policy
Direct attachment Indirect attachment
POWER4 (2001)
UltraSPARC IV+ (2005)
POWER5 (2005)
Montecito (2006) UltraSPARC T1 (2005)
UltraSPARC IV (2004)
Athlon 64 X2 line (2005)
PA-8800 (2004)
PA-8900 (2005)
Core Duo line (2006)
Indirect attachment:
• Longer access times (by ~20-30%)
• Independence from memory technology and speed
Direct attachment:
• Shorter access times (by ~20-30%)
• Dependence on memory technology and speed
POWER6 (2007)
Figure: Attachment policy
2.2. Attachment policy (1)
Indirect attachment: attachment via the FSB and the north bridge (memory control hub)
Direct attachment: attachment via memory controller(s)
Opteron line (2003)
Barcelona (2007)
Cell BE (2006)
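Figure: Indirect attachment of the main memory to the system architecture (examples: Core Duo (2006), Core 2 Duo (2006); the cores and the L2 cache connect via the FSB controller and the FSB to the north bridge, which attaches the memory)
Figure: Direct attachment of the main memory to the system architecture (example: Athlon 64 X2 (2005); the cores with their L2 caches connect via the System Request Queue and an interconnection network (crossbar) to the on-chip M. c., which attaches the memory, and to the B. c. driving the HT-bus)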
2.2. Attachment policy (2)
The point of attachment
• The highest cache level (via an IN)
  2-level caches: the IN connecting the L2 cache
  3-level caches: the IN connecting the L3 cache
  The M. c. is usually connected in this way if the highest cache level is inclusive.
• Between the two highest cache levels (via the IN connecting these levels)
  2-level caches: the IN connecting the L1 and L2 caches
  3-level caches: the IN connecting the L2 and L3 caches
  The M. c. is usually connected in this way if the highest cache level is exclusive.
Figure: Possible points of attachment of main memory to the system architecture
2.3. Point of attachment (1)
Interrelationship between the inclusion policy of L3 caches and the point of attachment
Cases compared: inclusive L3 vs. exclusive L3
Figure labels: data missing in L2/L3 (high traffic); replaced lines; replaced, modified data (low traffic); lines missing in L2 are reloaded and deleted from L3
Examples shown: Montecito (2006), POWER4 (2001), UltraSPARC IV+ (2004), POWER5 (2004)
2.3. Point of attachment (2)
2.3. Point of attachment (3)
In case of a two-level cache hierarchy: Core 2 Duo (2006), Athlon 64 X2 (2005)
In case of a three-level cache hierarchy: Montecito (2006)
Figure: Examples for attaching memory via the highest cache level
2.3. Point of attachment (4)
In case of a two-level cache hierarchy: UltraSPARC T1 (2005)
In case of a three-level cache hierarchy (exclusive L3): UltraSPARC IV+ (2005)
Figure: Examples for attaching memory via the interconnection network connecting the two highest cache levels
Number of memory controllers (in case of direct attachment)
• Single memory controller: usual implementations. E.g. POWER5 (2004), K8-based processors (2006)
• Dual memory controllers: a few recent designs. E.g. POWER6 (2007), Barcelona (2007)
• Quad memory controllers: exceptional designs. E.g. UltraSPARC T1 (2005), UltraSPARC T2 (2007)
Figure: Number of memory controllers (in case of direct attachment)
2.4. Number of memory controllers (1)
Figure: Block diagrams of the POWER5 and POWER6 processors [57]
2.4. Number of memory controllers (2)
Figure: Block diagrams of AMD’s K8 and Barcelona processors [58]
2.4. Number of memory controllers (3)
Number of memory channels (per north bridge/memory controller)
• Single memory channel. E.g. Intel’s 845/848 chipset families for P4 desktops and earlier desktop chipsets. Typ. use: early desktops
• Dual memory channels. E.g. Intel’s 865 and higher chipset families for P4 desktops, Intel’s P4 based DP server chipsets, Cell BE. Typ. use: recent desktops, single core DP/MP servers
• Quad memory channels. E.g. Intel’s 5000 (Bensley) and 7000 (Caneland) platforms for Core Duo DC and MC processors. Typ. use: recent DC and QC DP/MP servers with FB-DIMM memory
Figure: Number of memory channels supported per north bridge/memory controller
2.5. Number of memory channels (1)
Figure: Block diagram of an early P4 desktop having a single memory channel (Intel 845 chipset) [49]
2.5. Number of memory channels (2)
Example 1
Figure: Block diagram of a more advanced P4 desktop including dual memory channels (Intel’s 975 chipset) [50]
2.5. Number of memory channels (3)
Example 2
Figure: Block diagram of an early P4-based DP server including dual memory channels (Supermicro’s E7520 chipset based X6DH8-G2/X6DHE-G2 motherboard) [51]
Example 3
2.5. Number of memory channels (4)
Memory Interface Controller (MIC)
• Dual XDR™ memory channels
• Interleaved addressing in the channels
• The MIC can be configured to support only a single channel
• ECC support (32 + 4 bits)
2.5. Number of memory channels (5)
Dual 36-bit wide XDR channels
Figure: Basic blocks of the Cell BE processor [60]
Memory bandwidth at 3.2 Gb/s transfer rate per data pin:
3.2 GT/s x 2 channels x 4 B = 25.6 GB/s
2.5. Number of memory channels (6)
Remark
In dual channel configurations (or in general, in case of multiple memory channels) a scheme is needed to define the allocation of memory addresses to the individual channels.
Allocation of addresses to the individual channels
Interleaved mode
• Addresses are allocated alternately to the channels at 64 B boundaries, assuming 64 B long cache lines. Two consecutive cache lines can be retrieved simultaneously.
• Both memory channels must be populated with modules of the same size (e.g. 1 GB).
• Provides maximum performance in real applications.
Asymmetric mode
• Addresses start in the first channel and are allocated to this channel up to its highest rank. Then addresses continue in the second channel.
• No need to populate both channels, or to populate them with the same size.
• In real applications, performance is limited to single channel performance.
Figure: Address allocation alternatives to the individual channels
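The two allocation schemes can be illustrated with a minimal sketch. It assumes 64 B cache lines as in the text; the function names and the channel sizes in the usage lines are hypothetical, chosen only for illustration, and real chipsets may apply further address mapping steps (ranks, banks) that are not modelled here.

```python
from typing import Sequence

CACHE_LINE = 64  # bytes; the 64 B cache line length assumed in the text


def interleaved_channel(addr: int, n_channels: int = 2) -> int:
    """Interleaved mode: consecutive cache lines alternate between the channels."""
    return (addr // CACHE_LINE) % n_channels


def asymmetric_channel(addr: int, channel_sizes: Sequence[int]) -> int:
    """Asymmetric mode: the first channel is filled completely, then the next one."""
    base = 0
    for channel, size in enumerate(channel_sizes):
        if addr < base + size:
            return channel
        base += size
    raise ValueError("address lies beyond the installed memory")


GB = 1 << 30  # bytes

# Four consecutive cache lines in interleaved mode.
print([interleaved_channel(a) for a in (0, 64, 128, 192)])

# Hypothetical asymmetric population: 1 GB in channel 0, 2 GB in channel 1.
print([asymmetric_channel(a, [1 * GB, 2 * GB]) for a in (0, GB - 1, GB, 3 * GB - 1)])
```

In interleaved mode two neighbouring cache lines always fall into different channels, which is why they can be retrieved simultaneously; in asymmetric mode a contiguous working set usually stays within one channel.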
Example 4
Processors: Xeon 5000 (Dempsey, Netburst), DC; 5100 (Woodcrest, Core 2), DC; 5300 (Clovertown, Core 2), QC
Chipset: 5000 (Blackford)
Memory: FB-DIMM, up to 64 GB
In workstations the snoop filter eliminates snoop traffic to the graphics port.
Figure: Block diagram of Intel’s 5000 (Bensley) DP platform for DC/QC Core 2 Duo processors including quad memory channels [52]
2.5. Number of memory channels (7)
Example 5
Processors: Xeon 7200 (Tigerton DC, Core 2), DC; 7300 (Tigerton QC, Core 2), QC
Memory: FB-DIMM, up to 512 GB
Figure: Block diagram of Intel’s 7300 (Caneland) MP platform for DC/QC Core 2 Duo processors including quad memory channels [53]
2.5. Number of memory channels (8)
Figure: Maximum supported FB-DIMM configuration [54]
(6 channels/8 DIMMs)
Remark
The FB-DIMM technology supports up to 6 memory channels with up to 8 DIMMs each [54]; nevertheless, actual implementations typically support only four DIMMs.
2.5. Number of memory channels (9)
Attributes of memory channels
Supported type of mem. modules
Supported no. of mem. modules
Supported no. of ranks per mem. module
Supported attributes of DRAM devices
Figures: Attributes of memory channels
2.6. Attributes of memory channels (1)
Supported type of memory modules
• Memory modules of the same DRAM type (e.g. DDR2): usual implementation
• Memory modules of different DRAM types (DRAM type A and DRAM type B, e.g. DDR and DDR2): in order to provide a choice and an evolution path in times of memory technology transitions (e.g. while DDR2 technology replaces DDR technology)
Figure: Type of memory modules supported on the memory channel(s)
2.6. Attributes of memory channels (2)
Example
Intel’s 915P/G chipsets support dual memory channels with either DDR or DDR2 technologies. Per channel a single memory module is supported (with one or two memory ranks on each).
Accordingly, a mainboard based on the 915G chipset, such as MSI’s 915G Combo mainboard, is designated as a combo mainboard.
2.6. Attributes of memory channels (3)
Note: Motherboards allowing to choose from two different DRAM types are termed Combo boards.
Figure: MSI’s 915G Combo motherboard (based on Intel’s 915G chipset) [61]
Labelled parts: north bridge of the 915G chipset; 4 DIMM slots
2.6. Attributes of memory channels (4)
Figure: DIMM slots of MSI’s 915G Combo motherboard [61]
Labelled parts: DDR2 and DDR slots; two DDR or DDR2 channels with a single DIMM slot on each channel
2.6. Attributes of memory channels (5)
Supported number of memory modules
It depends on the
• DRAM connection technology
• DRAM speed
• Number of ranks mounted onto the memory module(s).
2.6. Attributes of memory channels (6)
The maximum number of supported memory modules depends heavily on the memory connection technology, that is whether the modules are connected
• via a parallel bus (as in the case of SDRAM, DDR, DDR2, DDR3 modules) or
• via a serial bus (as in the case of FBDIMM modules).
Number of memory modules supported per memory channel
• Modules connected via a parallel bus (e.g. SDRAM, DDR, DDR2, DDR3 modules): 1-4 memory modules
• Modules connected via a serial bus (FBDIMM modules): 6-8 memory modules
Figure: Number of memory modules vs memory connection technology in synchronous DRAMs
2.6. Attributes of memory channels (7)
Dependency on the memory connection technology
Remarks
1. Early chipsets supporting low speed, 1 or 4 Byte wide asynchronous DRAMs often allowed 4-8 memory modules to be attached.
2.6. Attributes of memory channels (8)
2. The Pentium processor provided a 64-bit wide datapath. So early (430 family) chipsets supported typically two pairs of 32-bit wide FPM/EDO modules.
Dependency on the memory speed
Higher transfer rates limit the number of memory modules that can be supported on a memory channel.
For higher transfer rates
• skews,
• jitter and
• reflections (caused by impedance mismatch while terminating transmission lines)
impede signal integrity more and more.
Obviously, the more memory modules are present on a channel, the more serious the signal integrity problems that arise.
2.6. Attributes of memory channels (9)
Figure: Scaling down the number of supported DIMMs per channel with increasing data rates (assuming two ranks per DIMM) [62]
2.6. Attributes of memory channels (10)
Figure: Scaling down the number of PCI-X slots with increasing PCI-X bus speed [55]
2.6. Attributes of memory channels (11)
Levelling off of channel capacity for synchronous DRAMs
With
• increasing device densities
• but a decreasing number of modules supported by memory channels at higher transfer rates,
the maximum memory capacity per memory channel remains roughly the same for synchronous DRAM devices [66].
But increasing server performance doubles memory capacity demand about every two years [66].
Figure: Channel capacity of synchronous DRAMs vs memory capacity demand [66]
2.6. Attributes of memory channels (12)
2.6. Attributes of memory channels (13)
Increasing server capacity demand calls for memory technologies with higher capacity potential, such as DRAM technologies with serial bus connection, like FB-DIMM.
Dependency on the number of ranks mounted onto the memory modules
Dual memory ranks mounted on the memory modules result in higher bus loading, and may reduce the maximum number of supported memory slots.
E.g. the north bridge of Intel’s 815 chipset supports at 133 MHz memory speed
• up to three SDRAM DIMMs with just a single rank or
• up to two SDRAM DIMMs with dual ranks.
2.6. Attributes of memory channels (14)
Number of memory modules supported per memory channel
• 1-2 memory modules. Typical use: desktops/entry level servers
• 6-8 memory modules. Typical use: DP/MP servers with FBDIMM memory modules
Figure: Number of memory modules supported per memory channel by Intel’s P4/Core 2 Duo north bridges
2.6. Attributes of memory channels (15)
2.6. Attributes of memory channels (16)
Figure: Example 1. P4 based desktop motherboard (MSI’s 915G Combo motherboard with Intel’s 915G chipset) [61]
Labelled parts: 4 DIMM slots (DDR2 and DDR); two DDR or DDR2 channels with a single DIMM slot on each channel
Figure: Example 2. P4-based entry-level DP server motherboard (Supermicro’s P8SCT with Intel’s E7221 chipset) [63]
Labelled parts: CPU; MCH (E7221); 4 DIMM slots (Ch. A and Ch. B): two DDR2 channels with two DIMM slots on each channel
2.6. Attributes of memory channels (17)
Figure: Example 3. Block diagram of a Core 2 based four-processor MP server (Supermicro’s X7QC3 with Intel’s 7300 north bridge) [64]
2.6. Attributes of memory channels (18)
Labelled parts: Xeon 7200 DC/7300 QC (Tigerton) processors; 7300 NB; SBE2 SB; 4 DDR2 FB-DIMM channels with 6 DIMM slots on each channel, up to 192 GB; ATI ES1000 graphics with 32 MB video memory
2.6. Attributes of memory channels (19)
Figure: Example 3. Core 2 based four-processor MP server motherboard (Supermicro’s X7QC3 with Intel’s 7300 north bridge) [64]
Labelled parts: 4 DDR2 FB-DIMM channels with 6 DIMM slots on each channel
Figure: Example 4. Block diagram of Intel’s Core 2 based 7300 (Caneland) MP platform with the 7300 (Clarksboro) chipset (9/2007) [65]
Processors: Xeon 7200 (Tigerton DC, Core 2), DC; 7300 (Tigerton QC, Core 2), QC
Memory: four DDR2 FB-DIMM channels with 8 DIMM slots on each channel, up to 512 GB
2.6. Attributes of memory channels (20)
Rank: logical unit
• A rank consists of the set of DRAM devices (of a given width) that is needed to achieve the expected data width of the memory module.
E.g. a 64-bit wide rank consists of eight 8-bit wide or four 16-bit wide DRAM devices.
• DRAM devices constituting a rank are mounted side by side onto a memory module.
• Optionally, a rank may include an additional DRAM device to hold ECC bits.
• All devices of a rank share the address and the command bus.
• All devices of a rank are selected by the same CS (Chip Select) signal, whereas different ranks have different CS signals.
A memory rank is sometimes also designated as a row.
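The relation between rank width and device width can be written down as a small sketch. The function name is hypothetical, and the treatment of ECC (8 extra bits per 64-bit rank, rounded up to whole devices) is an assumption made for illustration; the text itself only states that one additional device may hold the ECC bits.

```python
def devices_per_rank(rank_width_bits: int, device_width_bits: int,
                     ecc_bits: int = 0) -> int:
    """Number of DRAM devices needed to build one rank of the given data width.

    ecc_bits is an illustrative assumption: extra bits that are rounded up
    to whole devices of the same width.
    """
    if rank_width_bits % device_width_bits != 0:
        raise ValueError("rank width must be a multiple of the device width")
    data_devices = rank_width_bits // device_width_bits
    ecc_devices = -(-ecc_bits // device_width_bits)  # ceiling division
    return data_devices + ecc_devices


print(devices_per_rank(64, 8))               # eight x8 devices
print(devices_per_rank(64, 16))              # four x16 devices
print(devices_per_rank(64, 4))               # sixteen x4 devices
print(devices_per_rank(64, 8, ecc_bits=8))   # eight data devices plus one ECC device
```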
2.6. Attributes of memory channels (21)
Supported number of ranks per memory module
Memory module: physical unit
• A memory module is basically a small printed circuit card that carries one or more ranks and fits into a memory slot of the motherboard.
• Memory modules may be populated either on one side or on both sides.
• A rank usually covers one side of the memory module (using x8 or x16 devices), but 64-bit wide ranks built up of x4 devices (16 devices) typically cover both sides.
A memory module may contain
• a single rank on one of its sides,
• a single rank on both of its sides, or
• two ranks, one on each of its sides.
2.6. Attributes of memory channels (23)
Figure: Example 1. One 64-bit wide DDR3 SO-DIMM rank consisting of four 16-bit DRAM devices that are mounted on one side of the module [67]
2.6. Attributes of memory channels (24)
Figure: Example 2. One 64-bit wide DDR3 SO-DIMM rank consisting of eight 8-bit DRAM devices that are mounted on both sides of the module [67]
2.6. Attributes of memory channels (25)
Figure: Example 3. Two 64-bit wide DDR3 SO-DIMM ranks, each consisting of four 16-bit DRAM devices, that are mounted on both sides of the module [67]
2.6. Attributes of memory channels (26)
Supported number of ranks per memory module
• Dual ranks are supported per memory module: typical implementation
• A single rank is supported per memory module: in few cases, usually as a restriction for higher DRAM speeds
Figure: Supported number of ranks (rows) per memory module
2.6. Attributes of memory channels (27)
Examples
a) The north bridge of Intel’s 815 chipset supports
• up to three SDRAM-100 DIMMs with dual ranks or
• up to three SDRAM-133 DIMMs with just a single rank or
• up to two SDRAM-133 DIMMs with dual ranks.
b) The north bridge of Intel’s P35 chipset for Core 2 Duo processors supports
• up to two DDR2-800/667 or DDR3-1066/800 DIMMs with dual ranks.
Supported attributes of DRAM devices
DRAM width DRAM density DRAM speed
Figure: Supported attributes of DRAM devices
2.6. Attributes of memory channels (28)
DRAM type
2.6. Attributes of memory channels (29)
DRAM types (for general use), with the year of introduction
• Asynchronous DRAMs: DRAM (1970), FP (~1974), FPM (1983), EDO (1995)
• Synchronous DRAMs: SDRAM (1996), DRDRAM (1999), DDR (2000), DDR2 (2004), FBDIMM (2006), XDR (2006)1, DDR3 (2007)
The figure further distinguishes DRAMs with parallel bus connection from DRAMs with serial bus connection, and mainstream DRAM types from challenging DRAM types.
1 Used in the Cell BE and the PlayStation 3, but not yet in desktops or servers
Figure: DRAM types for general use
(Described in Sections 4, 5, 6 of the Chapter DRAM devices)
DRAM width
North bridges/memory controllers specify the width of supported DRAM devices.
Most recent north bridges/memory controllers support x8 and x16 DRAM devices.
DRAM density
North bridges/memory controllers specify supported DRAM densities.
Example 1
The north bridge of Intel’s 815 chipsets for Pentium III processors supports SDRAM devices with 16Mb/64Mb/128Mb/256Mb densities.
Example 2
The north bridge of Intel’s Series 3 chipset family for Core 2 Duo and Core 2 Quad processors supports DDR2 and DDR3 devices with 512Mb and 1Gb densities.
DRAM speed
North bridges/memory controllers also specify supported DRAM speeds.
2.6. Attributes of memory channels (30)
Example: Supported DRAM speeds of the north bridges (MCH/GMCH) of Intel’s 845xx family of chipsets (Brookdale)
(Table: members 845, 845GL, 845GV, 845G, 845E, 845GE and 845PE, introduced between 9/01 and 10/02. Common features: single channel SDR/DDR SDRAM (unbuffered), max. memory 2 GB. Rows: FSB (400 MHz or 533/400 MHz), HT support (not supported or supported), DRAM speed (PC133; DDR 266/200; PC133, DDR 266/200; DDR 333/266).)
Another example:
The north bridge of Intel’s Series 3 chipsets for Core 2 Duo and Core 2 Quad processors supports DDR2 devices with 667/800 MT/s or DDR3 devices with 800/1066 MT/s transfer rates.
2.6. Attributes of memory channels (31)
3. Key performance parameters of main memories
3.1 Memory capacity
3.2 Memory bandwidth
3.3 Memory latency
Memory capacity (CM)
CM = nCU x nCH x nM x nR x CR
with
nCU: No. of north bridges/memory control units
nCH: No. of memory channels per north bridge/control unit
nM: No. of memory modules per channel
nR: No. of ranks per memory module
CR: Rank capacity (device density x no. of DRAM devices)
E.g. The Core 2 based P35 chipset supports up to two memory channels with up to two dual-ranked memory modules per channel, with 8 x8 devices of 512 Mb or 1 Gb density per rank.
The resulting maximum memory capacity is:
CMmax = 1 x 2 x 2 x 2 x (8 x 1 Gb) = 8 GB
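The capacity formula can be checked with a short sketch. The function and variable names are hypothetical; the two configurations are the ones given in the text (the P35 desktop example, and a four-channel FB-DIMM server with 6 dual-ranked modules per channel and 4 Gb devices).

```python
GBIT = 1 << 30   # bits
GBYTE = 1 << 30  # bytes


def rank_capacity_bytes(n_devices: int, density_bits: int) -> int:
    """CR = no. of DRAM devices per rank x device density, converted to bytes."""
    return n_devices * density_bits // 8


def memory_capacity_bytes(n_cu: int, n_ch: int, n_m: int, n_r: int,
                          c_r_bytes: int) -> int:
    """CM = nCU x nCH x nM x nR x CR."""
    return n_cu * n_ch * n_m * n_r * c_r_bytes


# P35: 1 control unit, 2 channels, 2 modules/channel, 2 ranks/module, 8 devices of 1 Gb.
p35 = memory_capacity_bytes(1, 2, 2, 2, rank_capacity_bytes(8, 1 * GBIT))
print(p35 // GBYTE, "GB")

# Server: 1 control unit, 4 channels, 6 modules/channel, 2 ranks/module, 8 devices of 4 Gb.
server = memory_capacity_bytes(1, 4, 6, 2, rank_capacity_bytes(8, 4 * GBIT))
print(server // GBYTE, "GB")
```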
3.1. Memory capacity (1)
3.1. Memory capacity (2)
Crucial factors limiting the maximum capacity of main memories
• nM: No. of memory modules supported per memory channel
• CR: Rank capacity (device density x no. of DRAM devices/rank).
Number of memory modules supported per memory channel
• Modules connected via a parallel bus (e.g. SDRAM, DDR, DDR2, DDR3 modules): 1-4 memory modules. Higher transfer rates limit the number of memory modules typically to one or two.
• Modules connected via a serial bus (e.g. FBDIMM modules): 6-8 memory modules
Figure: Number of memory modules supported per memory channel
3.1. Memory capacity (3)
Rank capacity (CR)
CR = nD x D
with nD: Number of DRAM devices/rank
D: Device density
Number of DRAM devices/rank
Typically: up to 8
E.g. a one-sided (single rank) DDR3 memory module built up of 8 devices
3.1. Memory capacity (4)
Figure: Evolution of DRAM densities (Mbit) and no. of units shipped/year (Based on [35])
3.1. Memory capacity (5)
(Axes: year, 1980-2015; device density, 16 Kbit to 1 Gbit; units shipped per year, in 10^6. Trend: density grows by ~4×/4 years.)
Device density
Ranks typically include up to 8 DRAM devices.
Typical maximum main memory sizes (CMmax) of recent Core 2 based desktops, assuming:
• 2 memory channels
• 1 module per channel
• dual ranked modules
• ranks populated with 8 x8 DDR2 or DDR3 devices of 1 Gb density (1 GB per rank):
CMmax = 1 x 2 x 1 x 2 x 1 GB = 4 GB
Typical maximum main memory sizes of recent Core 2 based servers, assuming:
• 4 memory channels
• 6 modules per channel
• dual ranked modules
• ranks populated with 8 x8 FB-DIMM DDR2 devices of 4 Gb density (4 GB per rank):
CMmax = 1 x 4 x 6 x 2 x 4 GB = 192 GB
3.1. Memory capacity (6)
The rate of increasing DRAM densities
In accordance with Moore’s law (saying that the transistor count per chip doubles about every 24 months), DRAM densities evolve by about 4x/4 years.
For the same number of control units/modules/ranks, the maximum size of main memories therefore also increases by about 4x/4 years.
3.1. Memory capacity (7)
Bandwidth of memory systems
Total bandwidth (BW) provided by a memory system:
BW = nCU x nCH x T x WM
with
nCU: No. of north bridges/memory control units
nCH: No. of memory channels per north bridge/control unit
T: Transfer rate of the module (no. of data transfers/sec)
WM: Data width of the memory modules
E.g. A memory system with a single, dual channel controller and 8 Byte wide DDR2 800 modules provides a total bandwidth of:
BW = 1 x 2 x 800 MT/s x 8 B = 12.8 GB/s
Processors with an increasing number of cores obviously require increasingly higher memory bandwidth.
3.2. Memory bandwidth (8)
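The bandwidth formula can be evaluated with a minimal sketch. The function name is hypothetical; the first configuration is the DDR2 800 example from the text, the second is the dual-channel XDR interface of the Cell BE (3.2 GT/s, 4 Byte data width per channel) quoted earlier. Decimal units are used, as in the text.

```python
def memory_bandwidth_mb_s(n_cu: int, n_ch: int, transfer_rate_mt_s: float,
                          width_bytes: int) -> float:
    """BW = nCU x nCH x T x WM, with T in MT/s and WM in bytes, giving MB/s."""
    return n_cu * n_ch * transfer_rate_mt_s * width_bytes


# Single dual-channel controller with 8 Byte wide DDR2 800 modules.
ddr2_800 = memory_bandwidth_mb_s(1, 2, 800, 8)
print(ddr2_800 / 1000, "GB/s")

# Cell BE: dual XDR channels at 3200 MT/s with 4 data bytes per channel.
cell_be = memory_bandwidth_mb_s(1, 2, 3200, 4)
print(cell_be / 1000, "GB/s")
```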
Figure: The interpretation of tCCD [36]
3.2. Memory bandwidth (10)
The min. column cycle time (tCCD) of the memory cell array
tCCD (Core column delay) is the min. time interval between consecutive Reads or Writes.
Remark
tCCD is also designated as the Read/Write command to Read/Write command delay.
Figure: The evolution of the column cycle time (tCCD) in different SDRAM types (ns) [37]
3.2. Memory bandwidth (11)
Note: The min. column cycle time (tCCD) of synchronous DRAMs is:
SDRAM: 7.5 ns
DDR/DDR2/DDR3: 5 ns
The crucial factor limiting the memory bandwidth of the main memory:
Transfer rate of the memory module (no. of data transfers/sec)
The transfer rate of the memory module (T) equals the transfer rate of the DRAM devices used.
The peak transfer rate (Tmax) of synchronous DRAM devices:
Tmax = 1/tCCD x FW
with tCCD: Min. column cycle time of the memory cell array
FW: Fetch width of the memory cell array
3.2. Memory bandwidth (9)
The fetch width (FW) of the memory cell array
specifies how many times more bits the cell array fetches per column cycle than the data width of the device.
E.g. an x4 DRAM chip with a fetch width of 4 (actually a DDR2 DRAM) fetches 4 × 4, that is 16 bits, from the memory cell array per column cycle.
The fetch width (FW) of the memory cell array of synchronous DRAMs is:
DRAM type   FW
SDRAM       1
DDR         2
DDR2        4
DDR3        8
3.2. Memory bandwidth (12)
The peak transfer rates of the different DRAM technologies are:
Tmax = 1/tCCD x FW
SDRAM: 1/7.5 ns x 1 = 133 MT/s
DDR: 1/5 ns x 2 = 400 MT/s
DDR2: 1/5 ns x 4 = 800 MT/s
DDR3: 1/5 ns x 8 = 1600 MT/s
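The peak transfer rates listed above can be reproduced with a short sketch that uses only the tCCD and FW values given in the text; the dictionary and function names are hypothetical.

```python
# Min. column cycle time (ns) and fetch width per DRAM type, as given in the text.
T_CCD_NS = {"SDRAM": 7.5, "DDR": 5.0, "DDR2": 5.0, "DDR3": 5.0}
FETCH_WIDTH = {"SDRAM": 1, "DDR": 2, "DDR2": 4, "DDR3": 8}


def peak_transfer_rate_mt_s(dram_type: str) -> float:
    """Tmax = 1/tCCD x FW; with tCCD in ns, 1000/tCCD gives column cycles in MHz."""
    return 1000.0 / T_CCD_NS[dram_type] * FETCH_WIDTH[dram_type]


for dram_type in ("SDRAM", "DDR", "DDR2", "DDR3"):
    print(dram_type, round(peak_transfer_rate_mt_s(dram_type)), "MT/s")
```

Since tCCD stays at 5 ns from DDR to DDR3, the doubling of the peak transfer rate per generation in this model comes entirely from the doubled fetch width.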
3.2. Memory bandwidth (13)
3.2. Memory bandwidth (14)
Figure: The evolution of peak transfer rates of synchronous DRAMs in Intel’s chipsets
(Axes: transfer rate in MT/s, logarithmic scale, vs. year, 1996-2008. Data points: SDRAM 66, SDRAM 100, SDRAM 133, DDR 266, DDR 333, DDR 400, DDR2 533, DDR2 667, DDR2 800, DDR3 1067. Trend: ~10×/10 years.)
The evolution of peak transfer rates of synchronous DRAMs
Peak transfer rates evolve by ≈ 10x/10 years, which means doubling in 3-4 years.
3.2. Memory bandwidth (15)
Sources of the evolution
• the introduction of new synchronous DRAM technologies (SDRAM/DDR/DDR2/DDR3)
More specifically, the more and more advanced approaches to improve first of all
• signaling (by using SSTL_2/1.8/1.5, differential CK/DQS)
• synchronisation (by using source synchronisation, DLLs to align CK with DQs etc.) and
• line terminations (by using ODT, dynamic ODT, ZQ calibration etc.)
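Growth rates such as "≈ 10x/10 years" can be converted into doubling times with a one-line formula. The sketch below assumes steady exponential growth, which is an idealisation; the function name is hypothetical.

```python
import math


def doubling_time_years(growth_factor: float, period_years: float) -> float:
    """Years needed to double, assuming steady exponential growth."""
    return period_years * math.log(2) / math.log(growth_factor)


print(round(doubling_time_years(10, 10), 2))   # peak transfer rates, ~10x/10 years
print(round(doubling_time_years(100, 10), 2))  # a ~100x/10 years trend
print(round(doubling_time_years(4, 4), 2))     # DRAM densities, ~4x/4 years
```

Under this assumption a 10x/10 years trend doubles roughly every 3 years, the lower end of the 3-4 years quoted above, and 4x/4 years corresponds to doubling every 2 years.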
The evolution of processor clock frequencies vs transfer rates of main memories in mainstream processors
3.2. Memory bandwidth (16)
Figure: Evolution of clock frequencies in Intel’s desktop processors
(Axes: clock frequency fC in MHz, logarithmic scale, vs. year of first volume shipment, 1978-2005. Processors shown: 8088, 286, 386, 486, 486-DX2, 486-DX4, Pentium, Pentium Pro, Pentium II, Pentium III, Pentium 4. Trend annotations: ~10×/10 years, ~100×/10 years, leveling off.)
The evolution of processor clock frequencies (fC) in desktops
3.2. Memory bandwidth (17)
3.2. Memory bandwidth (21)
Figure: The evolution of peak transfer rates of synchronous DRAMs in Intel’s chipsets
Figure: Evolution of clock frequencies in Intel’s desktop processors
The evolution of processor clock frequencies (fC) in desktops between 1995-2003
3.2. Memory bandwidth (19)
In the time period of about 1995 - 2003
• clock frequencies rose at a rate of ≈ 100x/10 years, whereas
• transfer rates of main memories rose only at a rate of ≈ 10x/10 years.
3.2. Memory bandwidth (20)
In the time period of about 1995 - 2003 the gap between clock frequencies and memory transfer rates therefore became continuously wider.
In this time period higher clock rates were the main source of higher processor performance, but higher processor performance invokes higher memory traffic.
Thus, in this time period a strong motivation arose to increase the bandwidth of main memories by increasing the width of the datapath to the main memory, first of all by introducing dual memory channels.
Dual memory channels became commonplace even in desktops.
3.2. Memory bandwidth (22)
Figure: Evolution of clock frequencies in Intel’s desktop processors after about 2003
The evolution of processor clock frequencies (fC) in desktops after about 2003
3.2. Memory bandwidth (24)
In the time period of about 2003 - 2005
After about 2003, however, clock frequencies became saturated (due to meeting the thermal wall), and single core processors represented the mainline until about 2005.
The gap between clock frequencies and memory transfer rates became narrower.
3.2. Memory bandwidth (23)
Beginning with about 2005
Nevertheless, beginning with ~2005 the era of multicores emerged, with the core count doubling about every two years.
A new scenario becomes dominant, with steadily increasing bandwidth/transfer rate requirements.
3.2. Memory bandwidth (26)
Figure: Evolution of the bandwidth of dual-channel synchronous DRAM memory systems [56]
Figure: Evolution of transfer rates (per pin bandwidth figures) of different DRAM types [40]
3.2. Memory bandwidth (27)
Figure: Estimated maximum and minimum read latencies of DRAM devices (ns)
3.3. Memory latency (2)
(Axes: read latency in ns vs. year, 1981-2007, annotated with the desktop processor (PC AT to Core 2), the chipset, the typical DRAM chip density and the DRAM type (FPM, EDO, SDRAM, RDRAM, DDR, DDR2, DDR3).)
1 Read latency of DRAM, FPM, EDO and BEDO parts = tRAC (row access time, i.e. the time from row address until data valid); read latency of SDRAM parts = CL + tRCD (CAS latency + row to column delay)
2 The 815 chipset supports SDRAMs while the 820 supports RDRAMs
3 A new revision of the 845 supports DDRs instead of SDRAMs
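The read latency definition used for SDRAM parts (CL + tRCD) is expressed in clock cycles, so it has to be multiplied by the clock period to obtain nanoseconds. The sketch below shows the conversion; the timing values (a 400 MHz memory clock with CL = 5 and tRCD = 5 cycles) are assumptions chosen for illustration and are not taken from the figure.

```python
def sdram_read_latency_ns(cl_cycles: int, t_rcd_cycles: int,
                          clock_mhz: float) -> float:
    """Read latency = (CL + tRCD) x clock period, with the period in ns."""
    clock_period_ns = 1000.0 / clock_mhz
    return (cl_cycles + t_rcd_cycles) * clock_period_ns


# Assumed example timing: 400 MHz memory clock, CL = 5, tRCD = 5.
print(sdram_read_latency_ns(5, 5, 400.0), "ns")
```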
Figure: Estimated typical system-level memory latency in x86-based PCs (in ns)
(Axes: memory latency in ns vs. year, 1981-2008, annotated with the desktop processor (from the 8088 to the Core 2), the chipset, the typical DRAM part density and the DRAM type (DRAM, FPM, EDO, SDRAM, RDRAM, DDR, DDR2, DDR3).)
3.3. Memory latency (3)
Figure 5.1c: System-level memory latencies in x86-based PCs (in proc. clock cycles)
(Axes: memory latency in processor clock cycles, logarithmic scale, vs. year, 1981-2008, annotated with the desktop processor (from the 8088 to the Core 2), the chipset, the typical DRAM part density and the DRAM type (DRAM, FPM, EDO, SDRAM, RDRAM, DDR, DDR2, DDR3).)
3.3. Memory latency (4)
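The two system-level latency figures show the same quantity in two units: nanoseconds and processor clock cycles. A minimal sketch of the conversion between them follows, assuming latency in cycles is simply latency in ns multiplied by the core clock frequency. The function name `latency_in_cycles` and the sample (latency, clock) pairs are hypothetical illustrations, not values read from the figures.

```python
def latency_in_cycles(latency_ns: float, clock_mhz: float) -> float:
    """Convert a memory latency in nanoseconds to processor clock cycles."""
    # cycles = time * frequency; 1 ns * 1 MHz = 1e-9 s * 1e6 1/s = 1e-3 cycles
    return latency_ns * clock_mhz / 1000.0


# Hypothetical (latency in ns, core clock in MHz) pairs, for illustration only.
examples = [(200.0, 10.0), (100.0, 3000.0)]

for ns, mhz in examples:
    cycles = latency_in_cycles(ns, mhz)
    print(f"{ns:.0f} ns at {mhz:.0f} MHz = {cycles:.0f} cycles")
```

Under these assumed inputs, halving the latency in ns while raising the clock 300-fold still increases the latency in cycles from 2 to 300, which is why the two plots can trend in opposite directions.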