NVRAMOS 2008, Jeju Island, April 25, 2008
FRASH: Exploiting Storage Class Memory in Hierarchical File System
Youjip Won
Dept. of Electronics & Computer Engineering, Hanyang University
Seoul, Korea
1/61Youjip Won
Outlines
Motivation
Byte Addressable NVRAM and Flash
Maintaining Flash as Storage Device: Log-Structured vs. FTL
FRASH: Hybrid File System for Hierarchical Storage
Performance Analysis
Conclusion & Future Work
Background
Advancement of Large Scale NAND Flash
Advancement of Byte Addressable NVRAM
Does the current file system technology
effectively exploit their physical characteristics?
NAND Flash Trend
Source: Samsung Electronics website
Byte-addressable NVRAM Technology Trend
Source: FRAM: Nikkei Elec., MRAM: NEDO(Japan)
[Figure: NVRAM technology trend. Density [Mbit] of MRAM and FRAM by year, 2004 to 2014, with an annotation of 0.5 GByte/Chip.]
NVRAMs
Items              FRAM          PRAM           NOR              NAND
Byte Addressable   Yes           Yes            Yes (read only)  No
Non-volatility     Yes           Yes            Yes              Yes
Read               85 ns         62 ns          85 ns            16 us
Write/Erase        85 ns / none  300 ns / none  6.5 us / 700 ms  200 us / 2 ms
Power Consumption  Low           High           High             High
Capacity           Low           Middle         Middle           High
Endurance          1E15          >1E7           100K             100K
Unit Cell: [Figure: unit cell circuit of each memory type, labeled with B/L, W/L, P/L, VSS, and Source.]
Some Facts on Solid State Disks
Device         Sequential  Random 8KB  Price  Power     iops/$  iops/watt
SCSI 15k rpm   75 MBps     200 iops    $500   15 watt   0.5     13
SATA 10k rpm   60 MBps     100 iops    $150   8 watt    0.7     12
Flash - read   53 MBps     2,800 iops  $400   0.9 watt  7.0     3,100
Flash - write  36 MBps     27 iops     $400   0.9 watt  0.07    30
Source: “Flash Disk Opportunity for Server-Application”, Microsoft Research
Excerpt from S. Kang, “NVRAM for Write Buffer in SSD”, NVRAMOS 2007, Jeju, Korea
Limitation
FLASH
Slow write or mount delay!
Page write/block erase: slow write performance
Byte-addressable NVRAM
Small and expensive!
FRAM: still needs more density
PRAM: write speed is slow.
MRAM: power consumption
Objective
New file system for hybrid storage of
Byte addressable NVRAM
NAND Flash
Related works
Booting time acceleration
snapshotboot: longer unmount time
RFFS: mount time is subject to flash device size
MNFS: large block size
yaffs2/3
NVRAM: MRAMFS, HERMES, PRIMS, CONQUEST
Memory file system: RAMFS, TMPFS
Objective
New File System for byte addressable
NVRAM and Flash
Faster mount time
Robust against crash
Faster I/O
Limitation of Flash Memory: Write/Erase
Page write/block erase
Erase before write on dirty page.
Write unit and erase unit are different.
Initial status:                1 1 1 1 1 1 1 1
Write 0xCC (11001100):         1 1 0 0 1 1 0 0
Write 0xF0 (11110000):         1 1 0 0 0 0 0 0  (can't write: a bit cannot go from 0 back to 1)
Erase block (initial status):  1 1 1 1 1 1 1 1
Wear-out
There is an upper bound on the number of write/erase cycles of a
flash memory block.
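The write sequence on this slide can be reproduced with a toy model. This is a minimal sketch, assuming that programming behaves as a bitwise AND with the stored value; the class name FlashByte is hypothetical and models a single byte rather than a whole page.

```python
class FlashByte:
    """Toy model of one flash byte: programming can only clear bits (1 -> 0)."""

    ERASED = 0xFF

    def __init__(self):
        self.value = self.ERASED

    def program(self, data):
        # Programming ANDs the new data into the cell: a 0 bit never returns to 1.
        self.value &= data
        return self.value == data  # True only if the stored value matches the request

    def erase(self):
        # Erase (done per block on a real device) resets every bit to 1.
        self.value = self.ERASED


cell = FlashByte()
ok_first = cell.program(0xCC)    # erased cell: succeeds, value is 0xCC
ok_second = cell.program(0xF0)   # would need 0 -> 1 transitions: fails
after_second = cell.value        # 0xCC & 0xF0 = 0xC0, the third state on the slide
cell.erase()                     # back to 0xFF, ready to be programmed again
```

The failed second write is why a dirty page has to be erased before it is written again.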
NAND Flash
1 Block = 528 Bytes × 32 Pages = 16 K + 512 Bytes (erase unit)
1 Page = 512 Bytes + 16 Bytes = 528 Bytes (write unit)
Data area: 512 Bytes
Spare area: 16 Bytes
Write unit and erase unit are different!!
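The block and page sizes above follow from simple arithmetic. A short check, using only the numbers on the slide (the constant names are illustrative):

```python
PAGE_DATA_BYTES = 512     # data area of one page
PAGE_SPARE_BYTES = 16     # spare area of one page
PAGES_PER_BLOCK = 32

page_bytes = PAGE_DATA_BYTES + PAGE_SPARE_BYTES           # write unit
block_bytes = page_bytes * PAGES_PER_BLOCK                # erase unit
block_data_bytes = PAGE_DATA_BYTES * PAGES_PER_BLOCK      # the "16 K" part
block_spare_bytes = PAGE_SPARE_BYTES * PAGES_PER_BLOCK    # the "+ 512 Bytes" part
```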
Characteristics of Flash Memory
Out-place update: Erase-before-write
Basic Operations
Read: Page (512 bytes)
Write (program): Page
Erase: Block (1 Page × 32)
Asymmetric cost
Read: 20 µs
Write: 200 µs
Erase: 2 ms
Limitations of Flash Memory: Garbage collection
(1) Select the dirtiest block (valid page counts in the example: 4, 6, 3, ...)
(2) Copy the valid data pages
(3) Erase the block
[Figure legend: free, valid, and invalid pages.]
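The three garbage collection steps can be sketched as follows. This is an illustrative greedy policy, assuming one spare erased block is always available as the copy target; the function name and the list-of-states representation are hypothetical, not taken from any particular file system.

```python
FREE, VALID, INVALID = "free", "valid", "invalid"


def garbage_collect(blocks, spare):
    """Greedy GC over `blocks` (lists of page states); `spare` indexes an erased block."""
    # (1) Select the dirtiest block: the one with the most invalid pages.
    candidates = [i for i in range(len(blocks)) if i != spare]
    victim = max(candidates, key=lambda i: blocks[i].count(INVALID))
    # (2) Copy the valid pages into the spare (already erased) block.
    valid = blocks[victim].count(VALID)
    pages = len(blocks[victim])
    blocks[spare] = [VALID] * valid + [FREE] * (pages - valid)
    # (3) Erase the victim block; it can serve as the next spare.
    blocks[victim] = [FREE] * pages
    return victim


# Three 8-page blocks with 4, 6, and 3 valid pages, plus one erased spare block.
blocks = [
    [VALID] * 4 + [INVALID] * 4,
    [VALID] * 6 + [INVALID] * 2,
    [VALID] * 3 + [INVALID] * 5,
    [FREE] * 8,
]
victim = garbage_collect(blocks, spare=3)
```

The block with only 3 valid pages is chosen because it frees the most space for the least copying.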
NAND Flash File System
Hide the erase operation from the upper layer.
Efficient Garbage collection
Maintain performance
Two approaches for NAND Flash File System
Flash Translation layer(FTL): Samsung, Intel,…
Log Structured File System: Android(Google)
FTL: Applications → File System → FTL (Flash Translation Layer) → Flash Memory
Flash File System: Applications → Flash File System → Flash Memory
Flash Translation Layer (FTL)
[Figure: Flash Translation Layer (FTL). The file system talks to a controller (ROM, SRAM) that holds the mapping table and the FTL code and issues physical addresses to the NAND flash device.]
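The role of the mapping table can be illustrated with a page-level sketch. This is a simplification under stated assumptions: real FTLs use block-level or hybrid mapping and run garbage collection, and the class SimpleFTL with its naive allocator is hypothetical.

```python
class SimpleFTL:
    """Page-level mapping sketch: logical page -> physical page, updated out of place."""

    def __init__(self, num_physical_pages):
        self.mapping = {}                     # logical page number -> physical page number
        self.flash = [None] * num_physical_pages
        self.invalid = set()                  # physical pages holding stale data
        self.next_free = 0                    # naive allocator: next never-written page

    def write(self, logical, data):
        if self.next_free >= len(self.flash):
            raise RuntimeError("no free page; garbage collection would run here")
        old = self.mapping.get(logical)
        if old is not None:
            self.invalid.add(old)             # old copy becomes garbage, not overwritten
        self.flash[self.next_free] = data
        self.mapping[logical] = self.next_free
        self.next_free += 1

    def read(self, logical):
        return self.flash[self.mapping[logical]]


ftl = SimpleFTL(num_physical_pages=4)
ftl.write(7, b"v1")
ftl.write(7, b"v2")   # same logical page, new physical page
```

The file system above the FTL keeps addressing logical page 7 and never sees the erase operation.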
Flash Translation Layer(FTL)
Excerpt from D. Lee, “An Efficient Buffer Management Scheme for Implementing a B-Tree on NAND Flash Memory”, NVRAMOS 2007, Jeju, Korea
Flash Translation Layer (FTL)
Strength
Can use conventional file systems
Weakness
Encumbered by patents
Software FTL on Linux performs poorly.
Hardware FTL has a power consumption problem.
Log Structured File System
Out-of-place update
All pending writes are buffered in memory into a single
segment.
The segment is flushed to the disk as a log in order to use
the full disk bandwidth.
Need a map to find file metadata
Long mount delay
Because all file metadata is scattered all over the log
Need a space management mechanism
Segment cleaner (garbage collector)
Native file system approach terminology
Object: File/directory
PATI (Physical Address Translation Information)
logical address → physical address
File metadata (e.g. inode)
Object type, Name, File size, Etc.
Page metadata (PM)
Block status: Information about bad block
Data status: Information about invalid page
Page ECC, Page information tuple
Data Structures in native File system for FLASH
[Figure: main memory holds a tree of Objects (linked to their parent), each with its Physical Address Translation Information; the flash device holds file metadata pages, file data pages, and empty pages.]
Data Structures in main memory
[Figure: directory structure in main memory. Object(/) has children file1 and dir1 (siblings); dir1 has child file2. Each Object carries PATI that points to its FM and Data pages on the flash device, e.g. FM(/), FM(file1), Data(file1), FM(dir1), FM(file2), Data(file2).]
(FM: File Metadata)
Data Structures in Flash Device
Flash device: each File Metadata page and each Data page is stored with its Page Metadata (PM).
Page Metadata (PM)
  Block Status Information
  Data Status Information
  Page ECC
  Page Information Tuple
Page Information Tuple
  file_number
  file_page_number
  file_byte_count
  version
  ECC
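The page metadata record can be written down as a packed structure. This is a sketch only: the field names follow the slide, but every field width below is an assumption, chosen so that the record fits the 16-byte spare area; the slides do not give the actual layout.

```python
import struct
from dataclasses import dataclass

# Hypothetical field widths (little-endian, no padding), 16 bytes in total.
PM_FORMAT = "<BB3sIHHB2s"


@dataclass
class PageMetadata:
    block_status: int        # bad block information
    data_status: int         # invalid page information
    page_ecc: bytes          # ECC over the page data (3 bytes assumed)
    file_number: int
    file_page_number: int
    file_byte_count: int
    version: int
    tuple_ecc: bytes         # ECC over the tuple itself (2 bytes assumed)

    def pack(self):
        return struct.pack(PM_FORMAT, self.block_status, self.data_status,
                           self.page_ecc, self.file_number, self.file_page_number,
                           self.file_byte_count, self.version, self.tuple_ecc)

    @classmethod
    def unpack(cls, raw):
        return cls(*struct.unpack(PM_FORMAT, raw))


pm = PageMetadata(0xFF, 0xFF, b"\x00\x00\x00", 12, 3, 512, 1, b"\x00\x00")
raw = pm.pack()
```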
Mount Operation: disk → in-core
[Figure: at mount time the PM of every page on the flash device (Data and FM pages alike) is scanned to build the Objects and the PATI (physical page addresses) in main memory.]
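The scan-based mount can be sketched as a loop over every physical page. This is a minimal sketch, assuming that the copy with the highest version wins when the same file page appears more than once; the function name and the dict representation of a PM are hypothetical.

```python
def mount_scan(flash_pages):
    """Rebuild in-core PATI by scanning the page metadata of every flash page.

    `flash_pages` is indexed by physical page address; each entry is None
    (empty page) or a dict with file_number, file_page_number, and version.
    """
    pati = {}       # file_number -> {file_page_number: physical page address}
    newest = {}     # (file_number, file_page_number) -> highest version seen
    for phys, pm in enumerate(flash_pages):       # cost grows with device size
        if pm is None:
            continue
        key = (pm["file_number"], pm["file_page_number"])
        if key not in newest or pm["version"] > newest[key]:
            newest[key] = pm["version"]
            pati.setdefault(pm["file_number"], {})[pm["file_page_number"]] = phys
    return pati


pages = [
    {"file_number": 1, "file_page_number": 0, "version": 1},
    {"file_number": 2, "file_page_number": 0, "version": 1},
    None,
    {"file_number": 1, "file_page_number": 0, "version": 2},  # newer copy of page 0
]
pati = mount_scan(pages)
```

Because the loop visits every page, mount time depends on the size of the flash device rather than on the amount of data stored.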
Objective
Use byte-addressable RAM and FLASH
Exploit their physical characteristics!!!
Better file system
FRASH: File system for FRAM and Flash
Design of Hierarchical file system for Hybrid storage
Design choice
Location of each file system component
Device-friendly data structure of file system component
View on the byte-addressable NVRAM: Block device vs. RAM
Location of file system component
Device Friendly Data Structure
B-NVRAM: memory vs. storage
[Figure: three placements of B-NVRAM in the hierarchy between DRAM and FLASH, treated either as part of main memory or as a block device.]
Design Choice 1: Metadata at NVRAM
Objective: Reduce the file system mount delay
File metadata and page metadata in b-NVRAM
Avoid flash scan overhead in file system mount
Data in NAND FLASH NVRAM
Issue
Synchronization overhead between the levels of the storage
hierarchy (byte-addressable NVRAM and NAND Flash)
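A back-of-the-envelope estimate shows why avoiding the flash scan matters. This is a rough sketch, not a measurement: it uses the 20 us page read quoted earlier and the 128 Mbyte NAND flash and 180 ns FRAM from the implementation slide, and it assumes one full page read per flash page, one 16-byte PM record per page, and one FRAM access per byte.

```python
FLASH_BYTES = 128 * 1024 * 1024   # 128 Mbyte NAND flash
PAGE_BYTES = 512                  # data area of one page
FLASH_PAGE_READ_S = 20e-6         # 20 us per page read
FRAM_ACCESS_S = 180e-9            # 180 ns per FRAM access
PM_BYTES = 16                     # assumption: one 16-byte PM record per page

pages = FLASH_BYTES // PAGE_BYTES
# Scanning the flash: assumed to cost one page read per page.
flash_scan_s = pages * FLASH_PAGE_READ_S
# Scanning PM records in FRAM: assumed one access per byte (pessimistic for a wider bus).
fram_scan_s = pages * PM_BYTES * FRAM_ACCESS_S
```

Under these assumptions the flash scan takes about 5.2 s for the whole device and the FRAM scan well under 1 s; the measured results later in the deck are the authoritative numbers.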
Metadata at NVRAM(without modification)
[Figure: structure and scan operation. NAND Flash holds Data and File metadata pages; FRAM holds a PM part, a pointer part, and a file metadata part; the scan builds File Information in SDRAM. A PM entry either has no pointer or has a pointer to the corresponding file metadata.]
Issues
We do not need ECC for data at FRAM.
Remove one level of indirection for accessing FRAM data.
Page Metadata (PM) with ECC
  Block Status Information
  Data Status Information
  Page ECC
  Page Information Tuple: file_number, file_page_number, file_byte_count, version, ECC
Page Metadata (PM) with index
  Block Status Information
  Data Status Information
  Page ECC
  Page Information Tuple: file_number, file_page_number, file_byte_count, version, index
Metadata at NVRAM(without ECC in PM)
[Figure: the flash device holds Data and File Metadata pages; the NVRAM holds a page metadata part (PM entries with index pointers) and a file metadata part, with unused entries empty; a SCAN of the NVRAM builds the Objects and physical page addresses in main memory.]
(PM: Page Metadata)
B-NVRAM data structure
[Figure: B-NVRAM layout in three states. The spare area holds tags with chunkId = 0 and topPointer entries, the header area holds Object Headers, and the remaining entries are empty. States shown: after creating file 1, after creating file 2, after deletion of file 2.]
Design Choice 2: right Data Structure
Why disk data structure in b-NVRAM?
File system mount
Scan the metadata (FRAM, NAND, or whatever),
parse it, and translate it into a RAM-friendly structure, a.k.a.
in-core data structure (file system mount).
What is the right data structure for b-NVRAM?
Exploiting Byte-addressability
Use RAM-like representation of file system in byte addressable
NVRAM.
File system accesses byte-addressable NVRAM directly
instead of accessing RAM.
Issues
Synchronization problem. Flash memory still has metadata.
Performance degradation: B-NVRAM is slower than DRAM.
RAM-like Data Structure
Device Info Structure: Partition status information
Page Bitmap Array: Information about whether each page is in use
Block Info: Block Status Information
Layout in NVRAM
  Fixed size: Device Info Structure, Page Bitmap Array, Block Info
  Dynamic size: Object Info linked list, PAT Info linked list
(PAT: Physical Address Translation)
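Of the structures listed above, the page bitmap array is the simplest to illustrate. This is a minimal sketch, assuming one bit per flash page with 1 meaning in use; the class name and methods are hypothetical.

```python
class PageBitmap:
    """One bit per flash page: 1 = in use, 0 = free."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        self.bits = bytearray((num_pages + 7) // 8)   # fixed size for a given device

    def set_in_use(self, page, in_use=True):
        byte, bit = divmod(page, 8)
        if in_use:
            self.bits[byte] |= 1 << bit
        else:
            self.bits[byte] &= ~(1 << bit) & 0xFF

    def in_use(self, page):
        byte, bit = divmod(page, 8)
        return bool(self.bits[byte] & (1 << bit))

    def first_free(self):
        for page in range(self.num_pages):
            if not self.in_use(page):
                return page
        return None


bitmap = PageBitmap(num_pages=32)   # e.g. the 32 pages of one block
bitmap.set_in_use(0)
bitmap.set_in_use(1)
bitmap.set_in_use(0, in_use=False)
```

Because the array is a flat, fixed-size byte region, it can live in byte-addressable NVRAM and be updated in place without any parsing step.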
Design Choice 3: Block Device vs. RAM
B-NVRAM: shall we see it as Block device or RAM?
Both of them?
In fact, neither of them!!!
Final Design
All page metadata and file metadata are moved from flash
memory to FRAM.
Flash memory no longer holds file metadata or page
metadata.
File system does not need to access flash memory when
a metadata operation is executed.
Final Design
Exploit RAM!
Maintain all data structures used by kernel in FRAM
Copy (not mount) to DRAM at mount time.
DRAM data is copied to FRAM at unmount time.
Synchronization between FRAM and DRAM is coarse.
No defense or recovery method is implemented for crash
conditions.
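The copy-based mount and unmount can be sketched in a few lines. This is an illustration only, assuming the kernel structures are one contiguous region; the class FrashImage and its byte arrays are hypothetical stand-ins for the FRAM region and its DRAM copy.

```python
class FrashImage:
    """Sketch: in-core structures live in NVRAM and are copied, not rebuilt, at mount."""

    def __init__(self, nvram_size):
        self.nvram = bytearray(nvram_size)   # stands in for the FRAM region
        self.dram = None                     # working copy, valid only while mounted

    def mount(self):
        # Mount time: a plain copy from NVRAM to DRAM; no flash scan, no parsing.
        self.dram = bytearray(self.nvram)

    def unmount(self):
        # Unmount time: the DRAM copy is written back to NVRAM in one step.
        self.nvram[:] = self.dram
        self.dram = None


fs = FrashImage(nvram_size=16)
fs.mount()
fs.dram[0] = 0xAB          # a metadata update touches DRAM only
stale = fs.nvram[0]        # NVRAM still holds the old value until unmount
fs.unmount()
```

The stale value read before unmount is the coarse synchronization the slide mentions: a crash between the update and the unmount would lose the change.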
Final Design
[Figure: final layout across the Flash Device and the NVRAM. Storage system part: Data, File Metadata, and Page Metadata. Main memory part: Device Info, Page Bitmap Array, Block Info, Object Info, and PAT Info.]
Mount Operation at Final Design
[Figure: Flash Device, NVRAM, and Main Memory. Device Info, Page Bitmap Array, Block Info, Object Info, and PAT Info are copied from NVRAM to main memory at mount time and copied back at unmount time; the Objects in main memory hold physical page addresses.]
Implementation
Hardware: SMDK 2440
Core clock: 400 MHz
Memory bus: 100 MHz
OS: Linux 2.6 kernel
Hierarchical Storage
64 Mbit FRAM, 5.6 MHz (180 ns)
128 Mbyte NAND Flash (Smart Media Card)
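The FRAM figures above are consistent with each other, as a quick check shows. The constant names are illustrative; the numbers are those on the slide.

```python
FRAM_CYCLE_S = 180e-9               # 180 ns access cycle
FRAM_BITS = 64 * 1024 * 1024        # 64 Mbit

fram_rate_mhz = 1 / FRAM_CYCLE_S / 1e6              # about 5.56, quoted as 5.6 MHz
fram_mbytes = FRAM_BITS // 8 // (1024 * 1024)       # 64 Mbit expressed in Mbyte
```

At 8 Mbyte, the FRAM is one sixteenth the size of the 128 Mbyte NAND flash it is paired with.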
Memory Map in SMDK2440
Base address  Bank
0x3800_0000   SDRAM (Bank7), ends at 0x4000_0000
0x3000_0000   SDRAM (Bank6)
0x2800_0000   SROM (Bank5)
0x2000_0000   SROM (Bank4)
0x1800_0000   SROM (Bank3)
0x1000_0000   SROM (Bank2)
0x0800_0000   SROM (Bank1)
0x0000_0000   Boot Internal SRAM (Bank0)
FRAM: labeled next to the SROM banks in the figure.
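The bank layout is regular enough to decode an address arithmetically. This is a sketch assuming a uniform bank size of 0x0800_0000 (128 Mbyte), which is inferred from the spacing of the base addresses above; the function names are hypothetical.

```python
BANK_SIZE = 0x0800_0000   # 128 Mbyte per bank, inferred from the base address spacing
BANK_KIND = ["Boot Internal SRAM"] + ["SROM"] * 5 + ["SDRAM"] * 2   # Bank0..Bank7


def bank_base(bank):
    """Base address of a bank number."""
    return bank * BANK_SIZE


def decode(addr):
    """Return (bank number, bank kind, offset within the bank) for a physical address."""
    if not 0 <= addr < 8 * BANK_SIZE:
        raise ValueError("address outside Bank0..Bank7")
    bank, offset = divmod(addr, BANK_SIZE)
    return bank, BANK_KIND[bank], offset


sdram_hit = decode(0x3000_0010)   # falls in SDRAM (Bank6) at offset 0x10
```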
Memory Map in SMDK2440
[Photo: SMDK2440 board with the S3C2440 CPU; FRAM artwork carrying 8MB FRAM, attached through the memory extension pin.]
• Effect of Maintaining Metadata at FRAM
• With and without ECC
• Redundancy issue
Effect of Removing ECC and level of indirection
[Figure: mount latency [ms] versus number of files (50, 500, 5000) for YAFFS, FRASH 1.0, and FRASH 1.5; the largest labeled value is 3180 ms.]
Effect of Removing ECC and level of indirection: metadata update
[Figure: number of files/sec for creation and deletion at file sizes of 0KByte, 1KByte, 4KByte, and 10KByte; series YAFFS, FRASH 1.0, and FRASH 1.5.]
• RAM-friendly data structure
• RAM-like data structure
• Updating FRAM, not DRAM
Mount latency
[Figure: mounting time (s) versus total file count (0 to 8000); series YAFFS and FRASH2.]
Metadata update
[Figure: files/sec for creation and deletion at file sizes of 0KByte, 1KByte, 4KByte, and 10KByte; series YAFFS and FRASH2.]
Data I/O
[Figure: data I/O throughput (Kbyte/sec) for Read and Write; series YAFFS and FRASH2; labeled values 2561, 2409, 653, and 344.]
Overall
Yaffs
Choice 1: Metadata in b-NVRAM, no ECC in b-NVRAM
Choice 2: Direct access on b-NVRAM
Choice 3: Adaptive Layer Selection
Mount Delay with different partition sizes
[Figure: mount time (real time, 1/100 s) versus partition size (10 to 100 Mbytes); series YAFFS, First Conceptual Design, Second Conceptual Design, and Final Design.]
Mount Delay with different number of files
[Figure: mount time (real time, 1/100 s) versus total number of files (0 to 8000); series YAFFS, First Conceptual Design, Second Conceptual Design, and Final Design.]
File Creation (LMBENCH)
[Figure: file creation rate (files/sec) at file sizes of 0KByte, 1KByte, 4KByte, and 10KByte; series YAFFS, First Conceptual Design, Second Conceptual Design, and Final Design.]
File Deletion (LMBENCH)
[Figure: file deletion rate (files/sec) at file sizes of 0KByte, 1KByte, 4KByte, and 10KByte; series YAFFS, First Conceptual Design, Second Conceptual Design, and Final Design.]
Sequential Read/Write (LMBENCH)
[Figure: sequential write and read throughput (Kbytes/sec); series YAFFS, First Conceptual Design, Second Conceptual Design, and Final Design.]
Sequential Read/Write (Iozone)
[Figure: sequential write and read throughput (Kbytes/sec); series YAFFS, First Conceptual Design, Second Conceptual Design, and Final Design.]
Conclusion
Log Structured Approach for Flash
Long Mount Delay
FRASH
Hierarchical File System with Flash and NVRAM
Fast scan operation
Elimination of scan operation
Faster Mount Time
Better Performance
Q&A
Thank you.