Advanced Memory Technology - #1 Factor for Energy Efficient System -
2012. 09
Kenny Han
Samsung Semiconductor Inc.
Legal Disclaimer
This presentation is intended to provide information concerning the supercomputer and memory industries. We do our best to make sure that information presented is accurate and fully up-to-date. However, the presentation may be subject to technical inaccuracies, information that is not up-to-date or typographical errors. As a consequence, Samsung does not in any way guarantee the accuracy or completeness of information provided in this presentation. Samsung reserves the right to make improvements, corrections and/or changes to this presentation at any time.
The information in this presentation or accompanying oral statements may include forward-looking statements. These forward-looking statements include all matters that are not historical facts, statements regarding Samsung Electronics' intentions, beliefs or current expectations concerning, among other things, market prospects, growth, strategies, and the industry in which Samsung operates. By their nature, forward-looking statements involve risks and uncertainties, because they relate to events and depend on circumstances that may or may not occur in the future. Samsung cautions you that forward-looking statements are not guarantees of future performance and that the actual developments of Samsung, the market, or industry in which Samsung operates may differ materially from those made or suggested by the forward-looking statements contained in this presentation or in the accompanying oral statements. In addition, even if the information contained herein or the oral statements are shown to be accurate, those developments may not be indicative of developments in future periods.
Contents
1. Introduction
2. DRAM Requirements in Exa-Scale Computing
3. DRAM Technology Evolution & Challenges
4. DRAM Solution Consideration
5. SSD for Storage Solution
6. New Memory Technology : STT-MRAM
7. Summary
Why Exa-Scale Computing?
Data explosion by social media, on-line games, cloud computing, …
Over 1 zettabyte in 2010, and still growing rapidly
[Figure: Created data (exabytes) and #1 supercomputer performance (PFLOPS) by year, 2010 to 2019, log scale. Annotations: created data breaks through 1 zettabyte; supercomputer performance breaks through exa-FLOPS.]
[Source: IDC (2011), Top500.org (2011), and estimation by extrapolation]
Memory Wall
Widening of the gap between CPU and Memory performance
• Memory becomes more important than ever
• Requires larger capacity, higher performance, and better power efficiency
[Figure: Access and cycle times in ns (log scale), 1980 to 2015, for effective CPU cycle time, CPU cycle time, SRAM access time, DRAM access time, flash SSD access time, and disk seek time. Annotation: DRAM vs. CPU.]
Exa-Scale [Carnegie Mellon Univ. & Samsung estimation]
• Memory capacity (system): ~10PB
• Memory BW (node): 0.5~1TB/s
• Power (system): ~20MW
2. DRAM Requirements in Exa-Scale Computing
Capacity Requirements in Exa-Scale Computing
Requires more than 70x memory capacity
Memory Capacity Requirements (chart timeline: now to 2018)
                          Peta-flops    20 Peta-flops   Exa-flops
Memory capacity/system    100~200TB     >5x (~750TB)    >70x (~10PB)
Memory capacity/node      2~4GB         >4x (~16GB)     >32x (~128GB)
DRAM component capacity   512Mb~1Gb     >2x (~2Gb)      >8x (~8Gb)
[Source: "Memory systems for PetaFlop to ExaFlop class machines" by IBM, 2007 & 2010]
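As a rough cross-check of the capacity figures above, the sketch below divides node capacity by DRAM component capacity to see how many components a node implies at each stage. It uses only the approximate values quoted on the slide; taking the upper end of the peta-flops ranges (4GB, 1Gb) is an arbitrary choice for illustration, and ECC or spare devices are ignored.

```python
# Approximate values from the capacity figures above.
# Assumption: the upper end of the peta-flops ranges (4GB node, 1Gb component).
GBIT_PER_GBYTE = 8

stages = {
    "Peta-flops": {"node_gbyte": 4, "component_gbit": 1},
    "20 Peta-flops": {"node_gbyte": 16, "component_gbit": 2},
    "Exa-flops": {"node_gbyte": 128, "component_gbit": 8},
}


def components_per_node(node_gbyte, component_gbit):
    """DRAM components needed to reach the node capacity (no ECC or spares)."""
    return node_gbyte * GBIT_PER_GBYTE // component_gbit


counts = {name: components_per_node(**s) for name, s in stages.items()}
for name, n in counts.items():
    print(f"{name}: {n} components per node")
```

Under these assumptions the component count per node keeps rising, because node capacity grows faster than component density.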
Bandwidth Requirements in Exa-Scale Computing
Requires more than 100x bandwidth per node
Memory Bandwidth Requirements (chart timeline: now to 2018)
             Peta-flops    20 Peta-flops     Exa-flops
Gb/s/pin     400~600Mbps   3.7x (~1.6Gbps)   12.5x (~5.3Gbps)
GB/s/node    10~20GB/s     7.5x (~100GB/s)   100x (~1.4TB/s)
[Source: "Memory systems for PetaFlop to ExaFlop class machines" by IBM, 2007 & 2010]
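The two growth factors above can be combined to see how much of the node bandwidth has to come from something other than faster pins. This is a back-of-the-envelope sketch using only the quoted factors and the approximate exa-scale values; it assumes every data pin runs at the quoted speed.

```python
# Growth factors quoted above (exa-flops relative to peta-flops).
node_bw_growth = 100.0    # GB/s per node
pin_speed_growth = 12.5   # Gb/s per pin

# Whatever per-pin speed does not cover must come from more data pins
# (wider, stacked, or additional interfaces).
pin_count_growth = node_bw_growth / pin_speed_growth
print(f"Implied growth in data pins per node: {pin_count_growth:.0f}x")

# Absolute check with the approximate exa-scale values: ~1.4TB/s at ~5.3Gbps per pin.
exa_pins = 1.4e12 * 8 / 5.3e9
print(f"Data pins per node at exa-scale: ~{exa_pins:.0f}")
```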
Power Efficiency Requirements in Exa-Scale Computing
Requires at least doubled memory power efficiency
• Larger capacity and higher bandwidth require extreme power efficiency
Memory Power Projection (chart timeline: now to 2018)
           Peta-flops   20 Peta-flops   Exa-flops
System     0.2~0.4MW    6x              37x
Node       5~10W        4x              16x
DRAM       0.2~0.4W     2x              4x
Power Efficiency (W/Gbps and W/GB, from Peta-flops to Exa-flops; chart timeline: now to 2018)
Labelled values, in chart order: 1x, 1x, ~0.5x, 1x, ~0.6x, ~0.3x
[Source: "Memory systems for PetaFlop to ExaFlop class machines" by IBM, 2007 & 2010]
3. DRAM Technology Evolution & Challenges
Memory Module Capacity Trend
Introduction of larger-capacity mono dies is being delayed
• Technology difficulty & tremendous investment
• Disruptive technology is necessary to increase module capacity further
[Figure: Highest module capacity with mono-component, in GB, by year: 16GB with 4Gb (2011), 32GB with 8Gb @ 20nm class (2015), 64GB with 16Gb @ 10nm class (2018). Subject to cost/energy efficiency, scaling, …]
Performance Trend
Feeding data with enough BW will be the first challenge for exa-scale computing
[Figure: Highest DRAM speed in each year, in Mbps, 2011 to 2018, rising from 1333 through 1600, 1866, 2133, 2400/2667, and 3200 to 3600 and 4200. Generation labels: DDR4; DDR5(?) & new I/F (optic). Annotated limits: memory I/F & channel environment; physical channel environment & stacking technology. Subject to cost/energy efficiency, scaling, …]
Channel Environment Challenge
Limitation of physical channel environment
• Is the multi-drop bus architecture still valid beyond DDR4, to support > 25.6GB/s per channel?
New approach for memory sub-system
• Differential signaling or serial link?
• New bus architecture such as point-to-point, daisy-chained, …
• New technology such as optical I/O
             DDR4          Post DDR4*
Speed/pin    1.6~3.2Gbps   ~6.4Gbps
BW/Ch.       ~25.6GB/s     ~51.2GB/s
* Expectation
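The channel bandwidth figures above follow directly from the per-pin speed. The sketch below reproduces them, assuming the standard 64-bit DDR data bus per channel (ECC bits excluded); that bus width is an assumption, since the slide does not state it.

```python
# Assumption: 64 data bits per channel, the usual DDR channel width (ECC excluded).
DATA_BITS_PER_CHANNEL = 64


def channel_bandwidth_gbyte_s(pin_speed_gbps, data_bits=DATA_BITS_PER_CHANNEL):
    """Peak channel bandwidth in GB/s from per-pin speed in Gbps."""
    return pin_speed_gbps * data_bits / 8


for label, speed in [("DDR4 (top speed)", 3.2), ("Post DDR4 (expected)", 6.4)]:
    print(f"{label}: {channel_bandwidth_gbyte_s(speed):.1f} GB/s per channel")
```

Doubling bandwidth per channel at a fixed bus width therefore means doubling pin speed, which is where the multi-drop channel becomes the limit.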
Memory Power Efficiency Trend
Aggressive process shrink provides better power efficiency and performance
20nm class 4Gb 1.35V can offer the most efficient power usage
Memory power consumption as measured in a 96GB server, considering 8 hours active and 16 hours idle (Source: Samsung Lab.)
Technology    Interface   Density   Voltage   Power    Savings
50nm class    DDR3        2Gb       1.5V      65.3W    -
40nm class    DDR3        2Gb       1.5V      42.8W    34% vs. 50nm class
30nm class    DDR3        2Gb       1.35V     33.6W    21% vs. 40nm class
20nm class    DDR3        4Gb       1.35V     21.8W    35% vs. 30nm class, 67% vs. 50nm class
20nm class    DDR3        4Gb       1.25V*    20.3W
* Optional solution
Generation labels on the chart: Green DDR3-G1, Green DDR3-G2, Green DDR3-G3
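The savings percentages above can be recomputed from the measured wattages. The sketch below pairs each wattage with a process generation in the order that makes the quoted percentages come out; that pairing is inferred from the arithmetic and is not stated explicitly on the slide.

```python
# Measured memory power in the 96GB server, as listed above (watts).
# Assumption: the pairing of wattage to generation is inferred from the savings figures.
power_w = {
    "50nm 2Gb 1.5V": 65.3,
    "40nm 2Gb 1.5V": 42.8,
    "30nm 2Gb 1.35V": 33.6,
    "20nm 4Gb 1.35V": 21.8,
}


def saving_pct(before_w, after_w):
    """Power saving of after_w relative to before_w, in percent."""
    return (before_w - after_w) / before_w * 100


names = list(power_w)
# Generation-over-generation savings, rounded to whole percent.
step_savings = [
    round(saving_pct(power_w[a], power_w[b])) for a, b in zip(names, names[1:])
]
# Saving of the newest generation against the oldest.
total_saving = round(saving_pct(power_w[names[0]], power_w[names[-1]]))
print(step_savings, total_saving)
```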
Power Efficiency Limitation Problem
Efficiency improvement by process shrink alone is limited
Additional technology helps reduce power consumption further
[Figure: Memory power in mW/GB (log scale, 10 to 1000) by process generation: 40nm class, 30nm class, 20nm class, 10nm class, sub-10nm class (2011, 2015, 2018). Traditional power reduction: CAGR ('10-'14) of -36%. Beyond that, more new technology is needed (sub-1.0V, TSV, optic I/O): CAGR ('14-'20) of -12% with process shrink only and -18% with new technology.]
* Estimated with 8hr active and 16hr standby, considering speed and supply voltage by generation
[Source: Samsung]
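To make the CAGR figures above concrete, the sketch below compounds them from a normalized starting point. The 2010 baseline of 1.0 is an assumption for illustration, as is reading -12% as the "only process shrink" curve and -18% as the "with new technology" curve.

```python
def project(start_value, cagr, years):
    """Compound a yearly rate (e.g. -0.12 for -12%) over a number of years."""
    return start_value * (1 + cagr) ** years


# Relative power per GB, normalized to 1.0 in 2010 (normalization is an assumption).
p2014 = project(1.0, -0.36, 4)              # traditional power reduction, '10 to '14
shrink_only_2020 = project(p2014, -0.12, 6)  # assumed: process shrink only, '14 to '20
new_tech_2020 = project(p2014, -0.18, 6)     # assumed: with new technology, '14 to '20

print(f"2014: {p2014:.3f} of the 2010 level")
print(f"2020, shrink only: {shrink_only_2020:.3f}")
print(f"2020, with new technology: {new_tech_2020:.3f}")
```

Under these assumptions, new technology leaves power per GB at roughly two thirds of the shrink-only level by 2020.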
Technology Scaling Challenges
DRAM is not free any more
• Scaling approaches a theoretical limit
• Technology difficulty & large investment
Very difficult to achieve economical scaling
[Figure: Cost/bit and investment per wafer by process generation (40nm class, 30nm class, 20nm class, 10nm class, sub-10nm class; 2011, 2015, 2018), with a transition period marked.]
Requires close collaboration between all parties: CPU, memory, S/W, etc.
4. DRAM Solution Consideration
TSV (Through-Silicon Via)
TSV offers lower power consumption and higher performance by hiding electrical loading
Cost is still a concern: even once the technology matures, an intrinsic cost overhead remains
        TSV                                   Conventional Stacking
Pros    • Short interconnection (< ~50um)     Low cost & mature technology
        • Lower profile
        • More interconnects (>1000ea)
Cons    High cost                             • Long loop wires
                                              • Higher profile
                                              • Limited number of interconnects
                                              • Overhang
Power Comparison: traditional RDIMM vs. 3DS-TSV RDIMM
☞ Merits: 3DS-TSV consumes 17% less power (17% saving)
* Measured with 32GB RDIMM @ 2DPC
Optical Interface
Optical links have mainly been used for long-distance communication
Interface power (I/O & termination) is becoming dominant
• The memory sub-system is a relatively short-channel environment
• Potential for ~60% better power efficiency than the DDR3 interface
• Opportunity to expand the number of slots and support high pin speeds
• How can a power-efficient, low-cost optical I/O solution be implemented?
I/F power getting dominant (share of power by process generation; Source: Samsung)
                          40nm class   30nm class   20nm class
I/O power & termination   29%          34%          40%
RCD power                 18%          21%          24%
Core power                53%          45%          36%
Optical I/O: lower power per B/W (Source: Hot Chips 2011)
[Figure: Power cost (mW/Gbps, 0 to 40) vs. bandwidth per pin (Gbps, 0 to 12) for DDR2 0.8Gbps, DDR3 1.6Gbps, XDR 3.2Gbps, GDDR5 5.0Gbps, and optical 10Gbps, with a 60% annotation for optical.]
Controller Offloading
Potential advantages by additional functionalities in logic die
• Distributed small scale computing
• Reduced controller complexity & increased performance
• Better system power efficiency with reduced data traffic
• Additional logic to enhance device reliability and DRAM scaling extension
• Supporting heterogeneous memories: DDR3, DDR4, PRAM, Flash, MRAM, etc.
[Figure: Memory cube built with TSV, with a technology-agnostic high-speed memory interface and offloaded control & computing.]
Memory Sub-System Candidate
It is important to identify a memory sub-system that satisfies the capacity, performance, and power-efficiency requirements of HPC
[Figure: Candidate memory sub-system with two levels (L1, L2). Labels: memory cube with optical I/F (high BW & power efficiency); memory cube module or DDR4 on multi-drop channels Ch.0 and Ch.1 (large capacity & power efficiency); additional layer with higher-BW, small-capacity DRAM.]
5. SSD for Storage Solution
The Performance Gap : Flash Is A Good Fit
Huge latency gap between memory and disk drives
• HDD mechanical access time has improved only 1.3x in 13 years
• Flash storage is a good gap filler to narrow the distance
[Figure: The latency gap between memory and disk limits system performance. Flash storage is a good candidate to fill the gap as an "I/O memory tier" (typically 50us ~ 300us latency).]
SSD Performance Today & Tomorrow
SSD provides far higher performance than disk drives
• Unlocking the true potential of NAND flash with host interface enhancements
• >20x lower latency, >100x IOPS, and >3x sequential performance
[Figure: Latency & IOPS for 7.2K RPM HDD, 15K RPM HDD, and SSD: rotational latency and average seek time in ms (0 to 16) and IOPS (0 to 12000). SSD latency is <0.3ms; SSD IOPS is >100x.]
[Figure: Interface & performance: throughput in MB/s (0 to 5000) by year for HDD and SSD over SATA, SAS, and PCIe x4.]
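The rotational part of disk latency follows from spindle speed alone: on average the head waits half a revolution. The sketch below computes it for the two disk classes named above and compares it with the <0.3ms SSD latency bound from the slide. Seek time is deliberately left out because the slide's values for it are not recoverable, so the ratios printed here understate the full latency gap.

```python
SSD_LATENCY_BOUND_MS = 0.3  # upper bound quoted on the slide


def avg_rotational_latency_ms(rpm):
    """Average rotational latency: half a revolution, in milliseconds."""
    return 0.5 * 60_000 / rpm


for rpm in (7200, 15000):
    latency = avg_rotational_latency_ms(rpm)
    ratio = latency / SSD_LATENCY_BOUND_MS
    print(f"{rpm} RPM: {latency:.2f} ms rotational latency, "
          f"at least {ratio:.1f}x the SSD bound (seek time excluded)")
```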
Power Saving with SSD
Enhanced energy efficiency and smaller footprint through SSD
• Higher single-device performance reduces the number of drives
• A case study shows a 73% power reduction with about half the number of drives
Case Study (Source: EMC, 2011; storage capacity: 100TB)
Configuration                  Drives                           Power
HDD only (aggregated SAS)      100%: 15K RPM 600GB SAS x 167    2730 W
HDDs + SSDs (tiered storage)   5%: 200GB SSD x 25               736 W
                               95%: 7.2K RPM 2TB SATA x 48
Result: 73% reduction per system, smaller footprint (~1/2 the number of drives)
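The case-study numbers above are easy to check. The sketch below recomputes the power reduction, the drive-count ratio, and the raw capacity of each configuration from the drive counts and sizes on the slide; decimal units (1TB = 1000GB) are an assumption.

```python
# Drive lists are (count, capacity in GB), taken from the case study above.
hdd_only = {"power_w": 2730, "drives": [(167, 600)]}
tiered = {"power_w": 736, "drives": [(25, 200), (48, 2000)]}


def capacity_tb(config):
    """Raw capacity in TB, assuming 1TB = 1000GB."""
    return sum(count * gb for count, gb in config["drives"]) / 1000


def drive_count(config):
    return sum(count for count, _ in config["drives"])


power_reduction_pct = (
    (hdd_only["power_w"] - tiered["power_w"]) / hdd_only["power_w"] * 100
)
drive_ratio = drive_count(tiered) / drive_count(hdd_only)

print(f"Power reduction: {power_reduction_pct:.0f}%")
print(f"Drives: {drive_count(hdd_only)} -> {drive_count(tiered)} ({drive_ratio:.2f}x)")
print(f"Capacity: {capacity_tb(hdd_only):.1f}TB vs {capacity_tb(tiered):.1f}TB")
```

Both configurations land at roughly 100TB, so the power and footprint savings are not bought with less capacity.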
NAND Flash Innate Characteristics
NAND flash management technique is different
• NAND characteristics
- No overwrite
- Page operation but block erase
• NAND suffers from
- P/E cycle
- Data retention
[Figure: NAND innate characteristics vs. a high-performance & reliable SSD, bridged by sophisticated NAND flash management technology.]
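The constraints listed above (no overwrite, page-level program, block-level erase, limited P/E cycles) can be illustrated with a toy model. This is a sketch for intuition only: the class, the four-page block, and the naive copy-erase-rewrite update are hypothetical, and real SSD controllers avoid this pattern by writing updates out of place and remapping.

```python
class NandBlock:
    """Toy model of one NAND block: pages are programmed once and erased together."""

    def __init__(self, pages_per_block=4):  # tiny size, for illustration only
        self.pages = [None] * pages_per_block  # None means erased
        self.pe_cycles = 0

    def program(self, page, data):
        # No overwrite: a programmed page cannot be programmed again.
        if self.pages[page] is not None:
            raise ValueError("page already programmed: erase the block first")
        self.pages[page] = data

    def erase(self):
        # Erase works on the whole block and consumes one P/E cycle.
        self.pages = [None] * len(self.pages)
        self.pe_cycles += 1


def update_page(block, page, data):
    """Naive in-place update: copy out valid pages, erase, then rewrite them."""
    snapshot = list(block.pages)
    snapshot[page] = data
    block.erase()
    for index, value in enumerate(snapshot):
        if value is not None:
            block.program(index, value)


block = NandBlock()
block.program(0, "A")
block.program(1, "B")
update_page(block, 0, "A2")
print(block.pages, block.pe_cycles)
```

Changing one page costs a full block erase plus a rewrite of every valid page, which is why management technology has such a large effect on both performance and endurance.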
Managing Endurance
Maintaining P/E cycle endurance is increasingly challenging as the process shrinks
• Extending SSD lifetime requires sophisticated HW & SW technologies
- Continued evolution in controller technology
- Adaptive management & tuned NAND flash
- Application awareness
[Figure: NAND flash endurance in P/E cycles (log scale) by year, 2004 to 2016, against the SSD requirement. SLC endurance: 100K, then 50K~100K. MLC endurance: 10K, 5K, 3K, 1.5K. As the shrink rate slows down, reliability degrades and performance deteriorates.]
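A first-order way to see why falling P/E cycles matter is to estimate drive lifetime from endurance. The sketch below uses a common simplified model (total NAND writes = capacity x P/E cycles, with perfect wear leveling); the drive capacity, daily write volume, and write amplification factor are hypothetical values chosen only for illustration, and only the P/E cycle counts come from the endurance chart above.

```python
def lifetime_years(capacity_gb, pe_cycles, host_writes_gb_per_day, write_amplification):
    """Simplified lifetime estimate, assuming perfect wear leveling."""
    total_nand_writes_gb = capacity_gb * pe_cycles
    nand_writes_gb_per_day = host_writes_gb_per_day * write_amplification
    return total_nand_writes_gb / nand_writes_gb_per_day / 365


# Hypothetical drive and workload, for illustration only.
CAPACITY_GB = 200
HOST_WRITES_GB_PER_DAY = 500
WRITE_AMPLIFICATION = 3.0

for pe_cycles in (10_000, 5_000, 3_000, 1_500):  # MLC endurance values from the chart
    years = lifetime_years(
        CAPACITY_GB, pe_cycles, HOST_WRITES_GB_PER_DAY, WRITE_AMPLIFICATION
    )
    print(f"{pe_cycles} P/E cycles: ~{years:.1f} years")
```

In this model, lifetime scales linearly with P/E cycles and inversely with write amplification, which is the lever that controller and firmware techniques act on.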
How To Utilize SSD Value
Active collaboration is the key to capturing SSD value in HPC
• SSD characteristics vary too much for SSD to be a generic solution for all storage needs
• Engineered systems accelerate the benefits of flash while avoiding its pitfalls
[Figure: From simple replacement (of 7.2K rpm and 15K rpm disk drives) to transformation, through active collaboration across NAND flash, controller, F/W, and host system & application: new architecture, algorithm, software, file system, application dependence, metadata, …]
6. New Memory Technology : STT-MRAM
Non-volatile and comparable to DRAM
• Highly desirable solution, but is the technology mature?
[Figure: Radar chart comparing DRAM and STT-MRAM (outer is better) on retention, technology maturity, speed (RD/WT), standby power, endurance, and bit cost (scalability).]
[Figure: STT-MRAM cell structure.]
How to Capture Non-Volatility Value?
[Figure: Memory hierarchy today (CPU registers, SRAM cache, DRAM, storage/disk) and with STT-MRAM in place of DRAM (CPU registers, SRAM cache, STT-MRAM, storage/disk).]
Simple replacement of DRAM? [Many hurdles]
• Initial large investment
• Technology difficulty
• Process maturity
• Etc…
Collaboration to establish STT-MRAM as a value-added device
• Maximize the values and advantages of non-volatility in STT-MRAM
• Build up an eco-system: H/W, S/W including OS, …
- Software overhead minimization
- New architecture combining memory and storage
- New applications with non-volatile buffers
• Additional value creation will offset the large initial investment
7. Summary
Exa-scale computing requires a large-capacity, high-bandwidth, and power-efficient memory sub-system
DRAM technology is getting more difficult and expensive
A revolutionary and innovative memory solution is mandatory
• Continued scaling, TSV, optical I/O, controller offloading, …
SSD is a good candidate for an efficient storage system
• Very high-performance and power-efficient device
• Different characteristics require different optimization
STT-MRAM has very high potential, but the technology is not yet mature
Call to action for active and strong collaboration between all parties