Direct-FUSE: Removing the Middleman for High-Performance FUSE File System Support
Yue Zhu*, Teng Wang*, Kathryn Mohror+, Adam Moody+, Kento Sato+, Muhib Khan*, Weikuan Yu*
Florida State University*   Lawrence Livermore National Laboratory+
ROSS'18
Outline
• Background & Motivation
• Design
• Performance Evaluation
• Conclusion
Introduction
• High-performance computing (HPC) systems need efficient file systems to support large-scale scientific applications
  – Different file systems are used for different kinds of data within a single job
  – Both kernel- and user-level file systems can be used by applications
  – Because of the development complexity, reliability, and portability issues of kernel-level file systems, user-level file systems are increasingly leveraged for special-purpose I/O workloads
• Filesystem in Userspace (FUSE)
  – A software interface for Unix-like operating systems
  – Allows non-privileged users to create their own file systems without modifying kernel code
  – The user-defined file system runs as a separate process in user space
  – Examples: SSHFS, GlusterFS client, FusionFS (BigData'14); a minimal sketch of such a file system follows below
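Below is a minimal sketch of what such a user-space file system can look like with the libfuse high-level API. None of this code is from the slides, and it assumes libfuse 3 (with libfuse 2 the getattr callback drops the fuse_file_info argument). It exposes a single read-only file; every access to it takes the kernel round trip shown on the next slide.

```c
/* Minimal read-only FUSE file system sketch (assumes libfuse 3). */
#define FUSE_USE_VERSION 31
#include <fuse3/fuse.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>

static const char *hello_path = "/hello";
static const char *hello_data = "Hello from user space!\n";

/* Report a root directory and one read-only regular file. */
static int hello_getattr(const char *path, struct stat *st,
                         struct fuse_file_info *fi)
{
    (void) fi;
    memset(st, 0, sizeof(*st));
    if (strcmp(path, "/") == 0) {
        st->st_mode = S_IFDIR | 0755;
        st->st_nlink = 2;
        return 0;
    }
    if (strcmp(path, hello_path) == 0) {
        st->st_mode = S_IFREG | 0444;
        st->st_nlink = 1;
        st->st_size = (off_t) strlen(hello_data);
        return 0;
    }
    return -ENOENT;
}

static int hello_open(const char *path, struct fuse_file_info *fi)
{
    if (strcmp(path, hello_path) != 0)
        return -ENOENT;
    if ((fi->flags & O_ACCMODE) != O_RDONLY)
        return -EACCES;
    return 0;
}

/* Each read arrives here after crossing VFS and the FUSE kernel module. */
static int hello_read(const char *path, char *buf, size_t size, off_t off,
                      struct fuse_file_info *fi)
{
    (void) fi;
    size_t len = strlen(hello_data);
    if (strcmp(path, hello_path) != 0)
        return -ENOENT;
    if ((size_t) off >= len)
        return 0;
    if (off + size > len)
        size = len - (size_t) off;
    memcpy(buf, hello_data + off, size);
    return (int) size;
}

static const struct fuse_operations hello_ops = {
    .getattr = hello_getattr,
    .open    = hello_open,
    .read    = hello_read,
};

int main(int argc, char *argv[])
{
    /* Runs as an ordinary unprivileged process; the kernel forwards
     * file operations on the mount point to the callbacks above. */
    return fuse_main(argc, argv, &hello_ops, NULL);
}
```

Once mounted (e.g., on /tmp/mnt), any program can read the exposed file through ordinary POSIX calls.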
How does FUSE Work?
• Execution path of a function call
  – Sending the request to the user-level file system process:
    App program → VFS → FUSE kernel module → User-level file system process
  – Returning the data back to the application program:
    User-level file system process → FUSE kernel module → VFS → App program
[Diagram: the application program and the user-level file system sit in user space; the VFS, FUSE kernel module, page cache, Ext4, and storage device sit in kernel space]
FUSE File System vs. Native File System

                              FUSE File System   Native File System
# User-Kernel Mode Switches          4                   2
# Context Switches                   2                   0
# Memory Copies                      2                   1

[Diagram: same FUSE vs. native data path as on the previous slide]
Number of Context Switches & I/O Bandwidth
• The complexity added to the FUSE file system execution path degrades I/O bandwidth
  – tmpfs: a file system that stores data in volatile memory
  – FUSE-tmpfs: a FUSE file system deployed on top of tmpfs
  – The dd micro-benchmark and the perf system profiling tool are used to gather I/O bandwidth and the number of context switches
  – Experiment method: continually issue 1000 writes

  Block Size   Write Bandwidth                  # Context Switches
  (KB)         FUSE-tmpfs (MB/s)  tmpfs (GB/s)  FUSE-tmpfs  tmpfs
  4            163                1.3           1012        7
  16           372                1.6           1012        7
  64           519                1.7           1012        7
  128          549                2.0           1012        7
  256          569                2.4           2012        7
Breakdown of Metadata & Data Latency
• The actual file system operations (i.e., metadata or data operations) occupy only a small fraction of the total execution time
  – Tests are run on tmpfs and FUSE-tmpfs
  – Real Operation (in metadata operations): the time spent performing the operation itself
  – Data Movement: the actual write time within a complete write function call
  – Overhead: everything besides the above two, e.g., the time spent on context switches
[Fig. 1. Time Expense in Metadata Operations: latency (ns) of create and close on tmpfs vs. FUSE-tmpfs, split into real operation and overhead]
[Fig. 2. Time Expense in Data Operations: latency (ns) vs. transfer size (1-256 KB), split into data movement and overhead]
Existing Solution and Our Approach
• How can we reduce the overheads of FUSE?
  – Build an independent user-space library to avoid going through the kernel (e.g., IndexFS (SC'14), FusionFS)
  – However, this approach cannot support multiple FUSE libraries with distinct file paths and file descriptors
• We propose Direct-FUSE to support multiple backend I/O services for a single application (see the usage sketch below)
  – We adapted libsysio to our purpose in Direct-FUSE
    · libsysio, developed by the Scalability team at Sandia National Laboratories, provides POSIX-like file I/O and name-space support for remote file systems from an application's user-level address space
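As a usage sketch of this idea, the application keeps issuing ordinary POSIX calls and a path prefix selects the backend. The prefix syntax comes from the sshfs:/sshfs/test.txt example later in the slides; the assumption that the application simply links against the adapted-libsysio wrappers is ours, not stated in the slides.

```c
/* Application-facing view under Direct-FUSE (illustrative only): standard
 * POSIX calls, with the backend chosen by the path prefix. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];

    /* Routed to the SSHFS backend by its "sshfs:" prefix. */
    int fd = open("sshfs:/sshfs/test.txt", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Served entirely in user space: no FUSE kernel module crossing and
     * no context switch into a separate file system process. */
    ssize_t n = read(fd, buf, sizeof(buf));
    if (n > 0)
        fwrite(buf, 1, (size_t) n, stdout);

    close(fd);
    return 0;
}
```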
Outline
• Background & Motivation
• Design
• Performance Evaluation
• Conclusion
The Overview of Direct-FUSE
• Direct-FUSE mainly consists of three components:
  1. Adapted-libsysio
     – Intercepts file paths and file descriptors to identify the backend service
     – Simplifies the metadata and data execution paths of the original libsysio
  2. lightweight-libfuse (not the real libfuse)
     – Abstracts the file system operations of the backend services into unified APIs
  3. Backend services
     – Provide the defined file system operations (e.g., FusionFS)
[Diagram: Direct-FUSE architecture: the application program calls into Adapted-libsysio, which dispatches through lightweight-libfuse to backend services such as FUSE-Ext4 and the FusionFS client (backed by FusionFS servers); Ext4 is shown alongside as the native kernel file system]
Path and File Descriptor Operations
• To facilitate intercepting file system operations for multiple backends, the operations are divided into two categories (see the dispatch sketch below):
  1. File path operations
     i. Intercept the prefix and path (e.g., sshfs:/sshfs/test.txt) and return the mount information
     ii. Look up the corresponding inode based on the mount information, and redirect to the defined operations
  2. File descriptor operations
     i. Find the open-file record for the given file descriptor
        · The open-file record contains pointers to the inode, the current stream position, etc.
     ii. Redirect to the defined operations based on the inode info in the open-file record
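A hypothetical sketch of these two dispatch paths follows. The mount table, open-file table, and all identifiers (backend_ops, dfuse_open, dfuse_read) are invented for illustration and are not the actual Direct-FUSE or libsysio symbols.

```c
/* Illustrative dispatch logic only; not the real Direct-FUSE code. */
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_MOUNTS 8
#define MAX_FILES  256

struct backend_ops {                  /* unified API exposed via lightweight-libfuse */
    int (*open)(const char *path, int flags, void **handle);
    long long (*read)(void *handle, char *buf, size_t size, long long off);
};

struct mount {                        /* one entry per mounted backend */
    const char *prefix;               /* e.g. "sshfs:" */
    struct backend_ops *ops;
};

struct open_file {                    /* open-file record, indexed by fd */
    struct mount *mnt;                /* NULL means the slot is free */
    void *handle;                     /* backend-private inode/handle */
    long long pos;                    /* current stream position */
};

static struct mount mounts[MAX_MOUNTS];
static struct open_file files[MAX_FILES];

/* 1. File path operation: match the prefix, then hand the remaining path
 *    to the backend named by the matching mount entry. */
static int dfuse_open(const char *path, int flags)
{
    for (int i = 0; i < MAX_MOUNTS && mounts[i].prefix != NULL; i++) {
        size_t len = strlen(mounts[i].prefix);
        if (strncmp(path, mounts[i].prefix, len) != 0)
            continue;
        for (int fd = 0; fd < MAX_FILES; fd++) {
            if (files[fd].mnt != NULL)
                continue;
            int rc = mounts[i].ops->open(path + len, flags, &files[fd].handle);
            if (rc < 0)
                return rc;
            files[fd].mnt = &mounts[i];
            files[fd].pos = 0;
            return fd;
        }
        return -EMFILE;               /* no free open-file record */
    }
    return -ENOENT;                   /* no backend owns this prefix */
}

/* 2. File descriptor operation: find the open-file record and redirect to
 *    the backend recorded in it. */
static long long dfuse_read(int fd, char *buf, size_t size)
{
    if (fd < 0 || fd >= MAX_FILES || files[fd].mnt == NULL)
        return -EBADF;
    struct open_file *f = &files[fd];
    long long n = f->mnt->ops->read(f->handle, buf, size, f->pos);
    if (n > 0)
        f->pos += n;
    return n;
}
```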
Requirements for New Backends
• Interact with the FUSE high-level APIs
• Be packaged as an independent user-space library
  – The library contains the FUSE file system operations, an initialization function, and an unmount function
  – If a backend passes specialized data to the FUSE module via fuse_mount(), that data has to be made global for later file system operations
• Be implemented in C/C++, or be binary compatible with C/C++
(A sketch of such a backend library follows below.)
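A hypothetical shape for such a backend library is sketched here. The myfs_* names are illustrative, not the actual interface expected by Direct-FUSE, and the stubs assume libfuse 3 signatures.

```c
/* Illustrative backend library skeleton (assumes libfuse 3 signatures). */
#define FUSE_USE_VERSION 31
#include <fuse3/fuse.h>
#include <errno.h>
#include <sys/stat.h>

/* Per the slide: data that a stock FUSE backend would hand to fuse_mount()
 * as private data must instead be kept global, because the operations are
 * later invoked directly rather than through the libfuse event loop. */
static void *myfs_private_data;

/* The FUSE high-level operations; a real backend fills these in. */
static int myfs_getattr(const char *path, struct stat *st,
                        struct fuse_file_info *fi)
{
    (void) path; (void) st; (void) fi;
    return -ENOSYS;
}

static int myfs_open(const char *path, struct fuse_file_info *fi)
{
    (void) path; (void) fi;
    return -ENOSYS;
}

static int myfs_read(const char *path, char *buf, size_t size, off_t off,
                     struct fuse_file_info *fi)
{
    (void) path; (void) buf; (void) size; (void) off; (void) fi;
    return -ENOSYS;
}

/* The operation table that lightweight-libfuse calls into. */
const struct fuse_operations myfs_ops = {
    .getattr = myfs_getattr,
    .open    = myfs_open,
    .read    = myfs_read,
};

/* Initialization entry point, called when the backend is mounted. */
int myfs_init(const char *mount_arg)
{
    (void) mount_arg;
    myfs_private_data = NULL;   /* set up connections, caches, ... */
    return 0;
}

/* Unmount entry point, called when the backend is torn down. */
void myfs_unmount(void)
{
    /* release myfs_private_data, flush state, ... */
    myfs_private_data = NULL;
}
```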
Outline
• Background & Motivation
• Design
• Performance Evaluation
• Conclusion
Experimental Methodology
• We compare the bandwidth of Direct-FUSE against a local FUSE file system and the native file system, on disk and in memory, using IOzone
  – Disk
    · Ext4-fuse: a FUSE file system overlying Ext4
    · Ext4-direct: Ext4-fuse with the FUSE kernel bypassed via Direct-FUSE
    · Ext4-native: the original Ext4 on disk
  – Memory
    · tmpfs-fuse, tmpfs-direct, and tmpfs-native are the in-memory counterparts of the three disk tests
• We also compare the I/O bandwidth of a distributed FUSE file system with Direct-FUSE
  – FusionFS: a distributed file system that supports metadata- and write-intensive operations
Sequential Write Bandwidth
• Direct-FUSE achieves bandwidth comparable to the native file system
  – Ext4-direct outperforms Ext4-fuse by 16.5% on average
  – tmpfs-direct outperforms tmpfs-fuse by at least 2.15x
[Figure: sequential write bandwidth (MB/s, log scale) vs. write transfer size (4-1024 KB) for Ext4-fuse, Ext4-direct, Ext4-native, tmpfs-fuse, tmpfs-direct, and tmpfs-native]
Sequential Read Bandwidth
• Similar to sequential writes, the read bandwidth of Direct-FUSE is comparable to the native file system
  – Ext4-direct outperforms Ext4-fuse by 2.5% on average
  – tmpfs-direct outperforms tmpfs-fuse by at least 2.26x
[Figure: sequential read bandwidth (MB/s, log scale) vs. read transfer size (4-1024 KB) for Ext4-fuse, Ext4-direct, Ext4-native, tmpfs-fuse, tmpfs-direct, and tmpfs-native]
Distributed I/O Bandwidth
• Direct-FUSE outperforms FusionFS in write bandwidth and shows comparable read bandwidth
  – Writes benefit more from bypassing the FUSE kernel
• Direct-FUSE delivers scalability similar to the original FusionFS
[Figure: write (left) and read (right) bandwidth (MB/s, log scale) vs. number of nodes (1-16) for fusionfs and direct-fusionfs]
Overhead Analysis
• The dummy read/write occupies less than 3% of the complete I/O function time in Direct-FUSE, even when the I/O size is very small
  – Dummy write/read: no actual data movement; the call returns as soon as it reaches the backend service
  – Real write/read: the actual Direct-FUSE read and write I/O calls
[Figure: latency (ns, log scale) of dummy vs. real write (left) and dummy vs. real read (right) across transfer sizes from 1 B to 1 KB]
Conclusions
• We have revealed and analyzed the context-switch counts and time overheads in FUSE metadata and data operations
• We have designed and implemented Direct-FUSE, which avoids crossing the kernel boundary and supports multiple FUSE backends simultaneously
• Our experimental results indicate that Direct-FUSE achieves significant performance improvements over the original FUSE file systems
Sponsors of This Research