An Empirical Study of File Systems on NVM - MSST...

Priya Sehgal, Sourav Basu, Kiran Srinivasan, Kaladhar Voruganti

NetApp Inc.

© 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use

Motivation

Need a thin software stack to access data from fast NVM

Persistent memory abstractions

NVM optimized file systems

What characteristics of traditional block based file systems are good for NVM?

Can traditional file systems be fine tuned using mount and format options?

Can they be optimized with minor changes?

How does the performance of traditional file systems compare with that of an NVM-optimized one?

What file system features help improve performance on NVM?


Overview

Related Work

Experimental Methodology

Experimental Results

Recommendation and Conclusion


Related Work

POSIX interface, re-designing the file system for NVM

• PMFS

• SCMFS

• BPFS

POSIX library interposers or bypass file system

• Moneta-D

• Bankshot

• Aerie

New Programming Models and Data-Structures

• Mnemosyne

• PMem-Lib

• NV-Heaps

• NV-Tree

• CDDS

Adjust operating system, storage stack or file system configurations

• Lee et al

• Our work


Experimental Methodology: Workloads

FileServer: Moderate directory depth, moderate number of large files, data and metadata operations

WebProxy: Flat namespace, large number of small files, data and metadata operations

OLTP: Data intensive

Key-Value Store: Uses memory-mapped semantics (loads/stores)

Experimental Methodology: File System Characteristics (varied using mount and format options)

Inode Structure: Linear vs B+ Tree

Block Size: Fixed vs Variable sized extent

Layout/Update: In-place vs Log-structured vs Hybrid

Allocation Strategy: Immediate vs Delayed

Parallel Allocation (Concepts like Allocation/Block group)

Journal: Ordered vs Write-Back vs Data

Execute-in-place(XIP)

File Systems Evaluated

Ext2, Ext3, Ext4, XFS, F2FS, NILFS2, PMFS
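The slides do not list the exact mount and format options used, so the following is only a sketch of how a few of the characteristics above could be varied. It builds (but does not run) command lines using options that exist in mainline tools: ext4 `data=` journal modes, ext4 `nodelalloc`, `mkfs.xfs -d agcount=`, `mkfs.ext4 -b`, and the ext2 `xip` mount option found in older kernels. The device `/dev/ram0`, the mount point `/mnt/nvm`, and the specific values are assumptions for illustration.

```python
import shlex

DEVICE = "/dev/ram0"     # assumed RAM disk standing in for NVM
MOUNTPOINT = "/mnt/nvm"  # hypothetical mount point


def mkfs_cmd(fs, *options):
    """Build a mkfs command line as an argument list."""
    return [f"mkfs.{fs}", *options, DEVICE]


def mount_cmd(fs, *mount_options):
    """Build a mount command line; mount options are comma-joined."""
    cmd = ["mount", "-t", fs]
    if mount_options:
        cmd += ["-o", ",".join(mount_options)]
    return cmd + [DEVICE, MOUNTPOINT]


# Illustrative configurations only; the values are not taken from the slides.
CONFIGS = {
    "journal: write-back (ext4)": (mkfs_cmd("ext4"), mount_cmd("ext4", "data=writeback")),
    "journal: data (ext4)": (mkfs_cmd("ext4"), mount_cmd("ext4", "data=journal")),
    "allocation: immediate (ext4)": (mkfs_cmd("ext4"), mount_cmd("ext4", "nodelalloc")),
    "block size: 4 KB (ext4)": (mkfs_cmd("ext4", "-b", "4096"), mount_cmd("ext4")),
    "parallel allocation: 16 AGs (xfs)": (mkfs_cmd("xfs", "-d", "agcount=16"), mount_cmd("xfs")),
    "execute-in-place (ext2, older kernels)": (mkfs_cmd("ext2"), mount_cmd("ext2", "xip")),
}

for name, (mkfs, mount) in CONFIGS.items():
    print(name)
    print("  " + shlex.join(mkfs))
    print("  " + shlex.join(mount))
```

Printing the commands rather than executing them keeps the sketch safe to run on any machine; an actual run would need root privileges and a RAM-backed block device.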

Experimental Methodology: Experimental Setup

[Figure: experimental stack. Labels: Workload/Benchmark (User Space); File System, Block Driver, XIP (Kernel); RamDisk, NVM simulated (DRAM).]

Limitations:

• NVM characteristics not simulated

• Recoverability not considered


FileServer (Throughput)

[Figure: FileServer throughput in ops/sec (x 1000) per file system; higher is better. 100K files of size 128KB.]

• XIP support, atomic updates and fine-grained logging (chart annotations: 1.5x, 2x, 1.3x)

• XIP support only for data in Ext2/3 (chart annotations: 1.4x, 3x)

• Increased AG count increased parallelism for metadata-intensive workloads (chart annotation: 31%)

• F2FS solves the following problems of NILFS2: wandering tree and garbage collection (chart annotation: 3.1x)

WebProxy (Throughput)

[Figure: WebProxy throughput in ops/sec (x 1000) per file system; higher is better. 500K files of size 32KB. The first chart is at directory depth 0.7 (flat namespace); in the later charts the first column is directory depth 0.7 and the second column is directory depth 2.1.]

• Hash tree (indexed directory) is missing in Ext2 and PMFS (chart annotations at directory depth 0.7: 100x, 41x, 45x)

• Lookup is fast for increased directory depth

• PMFS scalability is affected by allocation from a single linked list

• Delayed allocation is not good for small files (chart annotation: 1.6x)

OLTP Database (TPC-C on MySQL)

[Figure: normalized tpmC per file system; higher is better. 400 warehouses, total size 38GB.]

• XIP-FS performs best

• Next level are buffered file systems: 13-22% worse than XIP

• Worst-performing file systems: 32-42% worse than XIP

• Increased AG count increases indirections for large files (chart annotation: 10%)
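As a small worked example, the "13-22% worse" and "32-42% worse" bands for the OLTP workload can be converted into normalized tpmC values. This is only a sketch: it assumes the XIP file system is the baseline at 1.0, which the slides do not state explicitly, and the helper name `normalized_range` is hypothetical.

```python
def normalized_range(worse_low_pct, worse_high_pct, baseline=1.0):
    """Return (low, high) normalized throughput for an 'X-Y% worse' band.

    Assumes the baseline (taken here to be the XIP file system) is 1.0.
    """
    low = baseline * (1 - worse_high_pct / 100)
    high = baseline * (1 - worse_low_pct / 100)
    return round(low, 2), round(high, 2)


buffered = normalized_range(13, 22)  # buffered file systems
worst = normalized_range(32, 42)     # worst-performing file systems

print("buffered FS:", buffered)  # (0.78, 0.87)
print("worst FS:   ", worst)     # (0.58, 0.68)
```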

Key-Value Stores (YCSB on MongoDB): Latency

[Figures: normalized latency per file system for each workload; lower is better. 10 million records, total size 36GB.]

Workload C (READ=100%)

• Reduced page faults blur the performance difference

Workload A (READ=50%, UPDATE=50%)

• msync and fsync are costly in non-XIP file systems

• Lower performing workload affects other workloads

Workload E (SCAN=95%, INSERT=5%)

• Effect of page faults shows up in case of loads (SCAN)
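The key-value results hinge on memory-mapped access (loads and stores) and on the cost of msync/fsync. The sketch below illustrates that access pattern with Python's standard `mmap` module, where `flush()` issues an msync on the mapping. It is not the benchmark used in the study: the file size, the one-store-per-page pattern, and the helper name `timed_update` are assumptions, and the timings depend entirely on the machine and file system it runs on.

```python
import mmap
import os
import tempfile
import time

PAGE = mmap.PAGESIZE
SIZE = 64 * PAGE  # small illustrative file


def timed_update(path, sync_each_store):
    """Store one byte per page through a shared mapping.

    If sync_each_store is true, msync that page after every store;
    otherwise msync once at the end.
    """
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), SIZE) as m:
            start = time.perf_counter()
            for off in range(0, SIZE, PAGE):
                m[off] = 0x41           # plain store into the mapping
                if sync_each_store:
                    m.flush(off, PAGE)  # msync() on this page only
            m.flush()                   # msync() of the whole mapping
            return time.perf_counter() - start


fd, path = tempfile.mkstemp()
os.ftruncate(fd, SIZE)
os.close(fd)

batched = timed_update(path, sync_each_store=False)
eager = timed_update(path, sync_each_store=True)

with open(path, "rb") as f:
    data = f.read()
os.remove(path)

print(f"one msync at the end: {batched:.6f} s")
print(f"msync after each store: {eager:.6f} s")
```

On a buffered file system each msync has to write dirty page-cache pages back through the block layer, which is the overhead the slide attributes to non-XIP file systems.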


Recommendation and Conclusion

Recommendation for traditional and new file systems

In-place update layout

Execute In Place

Simple and parallel allocation strategy

Fixed sized data blocks


Thank you
