What we'll be covering
● What's in a filesystem?
● Recent activity in Open Source filesystems
● The consequences for the RHEL user
● Future expectations?
So, what's in a filesystem? What we'll be looking at in this talk:
● Local disk filesystems
● Clustered filesystems
● Not distributed filesystems

Related technologies:
● Storage infrastructure
● Cluster infrastructure
Open Source Filesystems
An overview of the “state of the art”

Who is involved in development?
● The traditional Linux hobbyist
● Dedicated Linux companies/organisations
● Commercial
● OSDL
● Partner companies
● Government organisations

Who is involved in testing?
● The Linux kernel community
● Cutting-edge distributions
● Alpha / Beta cycles for RHEL
Open Source Filesystems: New Core Features
The Linux 2.6 kernel brings:

Security
● Extended attributes: ACLs, SELinux
● NFS security extensions
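As a quick illustration of the extended-attribute interface the 2.6 kernel exposes, the sketch below sets and reads back a user-namespace attribute via Python's os.setxattr/os.getxattr (Linux-only calls; the file name is arbitrary and the underlying filesystem must have xattr support, which on ext3 of this era meant the user_xattr mount option):

```python
import os

def xattr_roundtrip(path="xattr_demo.txt"):
    """Set and read back a user-namespace extended attribute.

    Assumes a Linux filesystem with xattr support (enabled via the
    user_xattr mount option on ext3; on by default on later filesystems).
    """
    open(path, "w").close()
    try:
        # Attributes in the "user." namespace need no special privilege.
        os.setxattr(path, "user.comment", b"created for the demo")
        value = os.getxattr(path, "user.comment")
        names = os.listxattr(path)
    finally:
        os.remove(path)
    return value, names

print(xattr_roundtrip())
```

The same attributes are what ACL and SELinux support are built on: POSIX ACLs live in the "system.posix_acl_access" attribute and SELinux labels in "security.selinux".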
Scalability
● New IO subsystem:
  ● Very large (>2TB) filesystems
  ● Much-improved SCSI layer
● New locking infrastructures for massive SMP scalability
● Core “RCU” lockless data structures
● Ext3 locking and fragmentation improvements

Manageability
● LVM2-based virtual driver stack
● Filesystem online resize
Open Source Filesystems: Add-ons
It's not just the core kernel distribution that is progressing.

Reiserfs4 local filesystem

Clustered filesystems outside the core kernel:
● Sistina / Red Hat's GFS
● CFS's Lustre
● Oracle's OCFS/OCFS2

Leads to a very different storage model from traditional local-disk filesystems
Ideal for SANs
LVM2-based cluster LVM under development
So what's new for the RHEL ext3 user?
Ext3 features at a glance:

● Backwards compatibility is guaranteed, of course!
● Forwards compatibility at the kernel level
● RHEL3 e2fsck will not understand all RHEL4 features
● Undesired features can be removed at will
                          2.4 kernel   RHEL-3     RHEL-4
Functionality:
Max. Filesystem size      1TB          2TB (U5)   8TB (U1)
Max. File Size            2TB          2TB        2TB
Extended Attributes       No           Yes        Yes
POSIX ACLs                No           Yes        Yes
SELinux labels            No           No         Yes
Online Resize             No           No         Yes
Performance:
Tree-based Directories    No           No         Yes
Reservations              No           No         Yes
SMP scaling improvements  No           No         Yes
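The maximum-file-size rows in the table describe on-disk format limits, not amounts of data one must actually write; a sparse file can probe such a limit cheaply. A minimal sketch (the path is arbitrary; assumes a filesystem with large-file and sparse-file support):

```python
import os

def sparse_probe(path="sparse_demo.bin", offset=1 << 40):
    """Create a sparse file whose apparent size is offset + 1 bytes."""
    with open(path, "wb") as f:
        f.seek(offset)   # skip 1 TiB without writing it
        f.write(b"\0")   # a single byte actually allocated at the far end
    st = os.stat(path)
    os.remove(path)
    return st.st_size, st.st_blocks

size, blocks = sparse_probe()
print(size)    # apparent size: 1 TiB + 1 byte
print(blocks)  # typically tiny: only the written block is allocated
```

On a 2.4-era filesystem capped at 2TB files, seeking past the limit would fail with EFBIG rather than succeed as it does here.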
Ext3 performance: “htree” directory indexing

[Chart: sequential directory performance with and without htree; create, ls, and delete of 1,000 to 1,000,000 files, plus CPU time for creates; y-axis: time in seconds, log scale (0.01–100,000).]
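The shape of the benchmark behind this chart can be reproduced with a small script: create N files in one directory, scan it, then delete it, timing each phase as N grows. A minimal sketch (file counts reduced from the slide's 1,000–1,000,000 range; it measures whatever filesystem hosts the temp directory, not htree specifically):

```python
import os
import shutil
import tempfile
import time

def bench_directory(n):
    """Time create / scan / delete of n files in a single directory."""
    d = tempfile.mkdtemp()
    timings = {}

    t0 = time.perf_counter()
    for i in range(n):
        open(os.path.join(d, f"file{i}"), "w").close()
    timings["create"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    names = os.listdir(d)
    timings["scan"] = time.perf_counter() - t0
    assert len(names) == n

    t0 = time.perf_counter()
    shutil.rmtree(d)
    timings["delete"] = time.perf_counter() - t0
    return timings

for n in (1000, 10000):
    print(n, bench_directory(n))
```

Without an index, each create and delete is a linear scan of the directory, so total time grows quadratically with n; htree's hashed tree keeps per-operation cost roughly constant, which is what the log-scale chart shows.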
Ext3 performance: reservations

[Charts: ext3 read and write performance with and without block reservations; x-axis: number of threads (1, 2, 4, 8); y-axis: megabytes/second (0–100).]
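The reservations comparison above is a multi-writer throughput test; its shape can be sketched as a script that writes fixed-size files from 1–8 concurrent threads and reports aggregate MB/s (thread and file counts are illustrative; the reservation behaviour itself lives in the kernel and is not visible from this script):

```python
import os
import shutil
import tempfile
import threading
import time

def write_file(path, mb):
    """Write `mb` megabytes of zeros to `path`."""
    block = b"\0" * (1 << 20)
    with open(path, "wb") as f:
        for _ in range(mb):
            f.write(block)

def throughput(n_threads, mb_per_thread=4):
    """Aggregate write throughput (MB/s) with n_threads concurrent writers."""
    d = tempfile.mkdtemp()
    threads = [
        threading.Thread(target=write_file,
                         args=(os.path.join(d, f"writer{i}.dat"), mb_per_thread))
        for i in range(n_threads)
    ]
    t0 = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - t0
    shutil.rmtree(d)
    return (n_threads * mb_per_thread) / elapsed

for n in (1, 2, 4, 8):
    print(n, "threads:", round(throughput(n), 1), "MB/s")
```

Without reservations, concurrent writers allocating blocks one at a time interleave their allocations and fragment each other's files; reserving a window of blocks per file keeps each writer's data contiguous, which is why the gap in the chart widens with thread count.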
The future for ext3
There's still lots to look forward to!

Larger on-disk “inodes” (file structures):
● Extended attributes in the inode
● Larger/more fine-grained metadata: timestamps, block/link counts etc.

Extent maps:
● Far more efficient mapping of really large files
● 48-bit block pointers (1 Exabyte filesystem size)
● (But backup and fsck can become slow!)

Performance improvements:
● Background deletes
● Deferred deallocations
● Multi-page operations
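The 1 Exabyte figure follows directly from the pointer width: with 48-bit block numbers and 4 KiB blocks (the common default, assumed here rather than stated on the slide), the addressable size is 2^48 × 2^12 = 2^60 bytes, i.e. 1 EiB. A quick check:

```python
# 48-bit block pointers, 4 KiB (2**12 byte) blocks.
block_pointer_bits = 48
block_size = 4096

max_bytes = (2 ** block_pointer_bits) * block_size
print(max_bytes == 2 ** 60)                  # exactly 1 EiB
print(max_bytes // (2 ** 40), "TiB")         # same limit in TiB
```

By comparison, ext3's 32-bit block pointers with the same block size top out at 2^32 × 2^12 = 16 TiB, which is why the feature table earlier shows single-digit-terabyte limits.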
RHEL users and GFS
GFS features at a glance:

All open source: http://sources.redhat.com/cluster/
Packaged and included in Fedora Core 4

                                          ext3          GFS            GFS2
Supports internal disks                   Yes           No (1)         No (1)
Supports external disks (FC/iSCSI/SAN)    Yes           Yes            Yes
Requires cluster infrastructure (2)       No            Yes            Yes
Coordinates multiple concurrent mounts    No            Yes            Yes
Incurs cluster locking overhead           No            Yes            Yes
Online resize                             Yes (RHEL4)   Yes            Yes
Extended attributes / ACLs                Yes           Yes            Yes
SELinux attributes                        Yes (RHEL4)   No             Yes
“Ordered data” integrity                  Yes           No             Yes
Static inode placement                    Yes           Yes            No
Max filesystem size                       8TB (RHEL4)   8 Exabyte (3)  8 Exabyte (3)

Notes:
(1) Sharing internal disks is possible via gnbd, but introduces a SPOF
(2) Cluster infrastructure includes: lock manager, membership/connection manager, fencing agents
(3) Theoretical maximum, untested! Limited to 16TB on 32-bit platforms
The future for GFS
Work still going on with both GFS and GFS2:

2.6 (RHEL4) port of GFS
GFS and GFS2 to be able to use the Distributed Lock Manager (DLM)

GFS2 features:
● Online shrink; defragment (planned)
● Ordered data mode
● Performance improvements:
  ● Fuzzy “df” statfs
  ● Faster directory scans, synchronous IO
● SELinux attributes
Ongoing...
We've seen some common themes:

Performance, performance, performance
Scaling up:
● Large SMP systems
● Large filesystems
Big business contributing to ongoing development
Advanced cluster support
Compatibility/migration

Some of the challenges:
● “Lost” projects like InterMezzo
● Harder and harder for hobbyists to do proper testing
● Limited real-time support for now

The development model is proving extremely scalable and sustainable!