EDUCATION
Advanced Data Sharing - Survey of Networked File Systems & File Servers
Jonathan Goldick, ONStor
Philippe Nicolas, Brocade
Survey of Networked File Systems and File Servers © 2007 Storage Networking Industry Association. All Rights Reserved.
SNIA Legal Notice
• The material contained in this tutorial is copyrighted by the SNIA.
• Member companies and individuals may use this material in presentations and literature under the following conditions:
  – Any slide or slides used must be reproduced without modification
  – The SNIA must be acknowledged as source of any material used in the body of any document containing material from these presentations.
• This presentation is a project of the SNIA Education Committee.
Abstract
Survey of Networked File Systems and File Servers
With all of the new advances in file system and file server technology, how do you know which ones are best for you? This presentation provides a framework for evaluating file system approaches and a look at how each approach is evolving. Topics include a survey of local, SAN, clustered, NAS, global, and wide area file systems; how application characteristics should affect your choice of file system; and performance, scalability, ease of use, data management, deployment, maintenance, and cost considerations.
Agenda
• File Services
• How to Evaluate File Systems
• Comparison of File System Types
• Conclusion
What This Session Will Not Cover
• Volume Managers
• Databases
• Storage Layer – Block Services
• Security Models
Check out SNIA Tutorial: Object-Based Storage Device (OSD) – Architecture and System
Check out SNIA Tutorial: NAS & iSCSI Technology Overview
The SNIA Shared Storage Model
[Figure: the SNIA Shared Storage Model. Layers, top to bottom: Application; File/record layer (Database (DBMS), File system (FS)); Block layer (block aggregation at Host, Network, and Device); Storage devices (disks, …). Other labels: Storage domain; Services. Services listed: Discovery, monitoring; Resource mgmt, configuration; Security, billing; Redundancy mgmt (backup, …); High availability (fail-over, …); Capacity planning.]
File Services
• Accommodate Data Types
  – Structured: Database
  – Semi-Structured: email
  – Unstructured: Text, Excel, image files, etc.
• Provide Interface to Storage
• Manage Files
  – Backup
  – Provisioning
  – Availability
• Allow Data Sharing
What is a File System?
• A management system that exports a hierarchy of files and directories with a simple and constrained set of access methods.
• Access methods follow one of a small number of semantic models (POSIX, NTFS, etc.).
• Coordinates access to data and state information between multiple requests.
• Manages disk utilization on behalf of requests.
• Maintains metadata integrity and restores it in the event of a failure.
File System Components
[Figure: block diagram of file system components. Component labels: Application Access Methods; Access Mediator; Metadata Methods; Data Methods; Metadata Cache; Data Cache; Transaction Manager; Recovery Logic; Block Allocator; ILM; Auditing; Snapshots; Volume Access Mediator; Volume Management; Interface to Storage. Annotation labels: Interface to External Applications; Lock Management, Access Control; Inodes, Directories, etc.; File Data; Fast Failure Recovery Logic; Data Placement Strategy; Cluster/SAN FS Access Management; LUN(s), Logical Blocks, RAID, Mirroring, Striping, etc.; SCSI, FC, etc.]
File System Types
1. Local File System
   a) Host-based, single operating system
   b) Co-located with application server
   c) Many types with unique formats, feature mix
2. Distributed File System
   a) Remote, network access
   b) Semantics are limited subset of local file systems
   c) Cooperating file servers
   d) May include integrated replication
3. Shared (SAN and Clustered) File Systems
   a) Host-based file systems
   b) Hosts access all data
   c) Co-located with application server for performance
4. Clustered Distributed File System
   a) Each file server runs a SAN/Clustered file system
   b) Global name space enables access to all data
5. Wide Area File System
   a) Distributed file system
   b) Improved performance over long latency networks
   c) Deployed as appliances in a hub-spoke model
Evaluating File Systems
• Does it fit the Application Characteristics?
  – Does the application even support the file system?
  – Is it optimized for the type of operations that are important to the application?
• Performance & Scalability
  – Does the file system meet the latency and throughput requirements?
  – Can it scale up to the expected workload and deal with growth?
  – Can it support the number of files and total storage needed?
• Data Management
  – What kind of features does it include? Backup, Replication, Snapshots, ILM, …
Evaluating File Systems
• Security
  – Does it conform to the security requirements of your company?
  – Does it integrate with your security services?
  – Does it have Auditing, Access Control and at what granularity?
• Ease of Use
  – Does it require training the end users or changing applications to perform well?
  – Can it be easily administered in small and large deployments?
  – Does it have centralized monitoring, reporting?
  – How hard is it to recover from a software or hardware failure and how long does it take?
  – How hard is it to upgrade or downgrade the software and is it live?
Application Characteristics
Workload Profiles            (A)              (B)        (C)   (D)        (E)
1. Latency Sensitive         High             Med        Low   Low        High
2. Throughput                High read/write  High read  Low   High read  High write
3. Concurrent sharing        High             High       Low   High read  Low
4. Caching (re-read rate)    High             High       High  Low        Low

Typical Applications:
(A) OLTP
(B) Small Data Mart
(C) Home Directory
(D) Large Scale Streaming (Web Farm)
(E) High Frequency Meta Data Update (small file create/delete)
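The workload profile table can also be held as data, so that a candidate workload can be matched against profiles (A) through (E). The following is only a minimal sketch: the dictionary layout, the key names, and the `matching_profiles` helper are illustrative assumptions, not part of the tutorial.

```python
# Workload profiles (A)-(E), transcribed from the Application Characteristics table.
# Key names ("latency", "throughput", ...) are assumptions chosen for this sketch.
PROFILES = {
    "A": {"app": "OLTP", "latency": "High", "throughput": "High read/write",
          "sharing": "High", "caching": "High"},
    "B": {"app": "Small Data Mart", "latency": "Med", "throughput": "High read",
          "sharing": "High", "caching": "High"},
    "C": {"app": "Home Directory", "latency": "Low", "throughput": "Low",
          "sharing": "Low", "caching": "High"},
    "D": {"app": "Large Scale Streaming (Web Farm)", "latency": "Low",
          "throughput": "High read", "sharing": "High read", "caching": "Low"},
    "E": {"app": "High Frequency Meta Data Update", "latency": "High",
          "throughput": "High write", "sharing": "Low", "caching": "Low"},
}


def matching_profiles(**traits):
    """Return the profile letters whose rows equal every given trait (hypothetical helper)."""
    return sorted(
        letter
        for letter, row in PROFILES.items()
        if all(row[key] == value for key, value in traits.items())
    )
```

For example, `matching_profiles(latency="High")` selects the latency-sensitive profiles, which can then be compared against the file system types discussed later.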
Performance & Scalability
• Performance
  – Throughput
  – Read / write access patterns
  – Impact of data protection mechanisms, operations
• Scalability
  – Number of files, directories, file systems
  – Performance, recovery time
  – Simultaneous and active users
Data Management
• Backup
  – Performance
  – Backup vendors; native agent vs. network-based
  – Data de-duplication – backup once
• Replication
  – Multiple read-only copies
  – Optimization for performance over network
  – Data de-duplication – transfer once
Data Management
• Quotas
  – Granularity
    • User quotas
    • Group quotas
    • Directory tree quotas
    • Nested directory tree quotas
  – Extended quota features
  – Ease of set up
  – Native vs. external servers
  – Scalability with increasing number of files
  – Quota per user vs. quota per file system per user
Data Management
• Information Lifecycle Management (ILM)
  – Lots of features, differing definitions
  – Can enforce compliance and auditing rules
  – Cost & performance vs. impact of lost/altered data
Check out SNIA Tutorial: The Secret Sauce of ILM – The Professional ILM
Check out SNIA Tutorial: ILM: Tiered Services and the Need for Classification
Security Considerations
• Authentication
  – Support and to what degree
• Authorization
  – Granularity by access types
  – Need for client-side software
  – Performance impact of large scale ACL changes
• Auditing
  – Controls
  – Audit log full condition
  – Login vs. login attempt vs. data access
  – Digitally signed audit trails
Security Considerations
• Virus Scanning
  – Preferred vendor supported?
  – Performance & scalability
  – External vs. file server-side virus scanning
• Vulnerabilities
  – Security & data integrity vulnerabilities vs. performance & cost
  – Compromised file system
    • One client
    • One file server
  – Detection
  – Packet sniffing
Ease of Use
• End-User
  – Local file systems vs. any other
• Deployment & Maintenance
  – Implementation
  – Scalability of management
  – File system migration between servers
  – Automatic provisioning
  – Centralized monitoring, reporting, phone-home
  – Hardware failure recovery
  – Single points of failure
  – Performance monitoring
How Do They Stack Up?
• Local File System
• Distributed File System
• Shared File System
• Clustered Distributed File System
• Wide Area File System
Local File System
• Key Characteristics
  – Easy to use
  – Scale up via processor performance
  – Cannot scale file services independently of application services
  – Most feature-rich data management tools but you cannot offload data management
    • Quota Management for Windows
    • HSM/ILM
    • Data Classification
  – Islands of storage
    • Low utilization rates
    • Non-scalable management
Local File System
• Target Applications (or Best Suited for Applications)
  – Productivity software applications such as MS Excel, Word, text editors
  – Personal file services for Unix, Windows & Linux applications
  – Small and medium size databases used for semi-structured applications (e.g., email systems such as MS Exchange, Lotus Notes)
  – Document Management Systems
  – Tightly integrated Web applications (web servers, application servers, etc.)
• Interesting Developments
  – Content aware file systems
  – Transactional semantics on top of file systems (WinFS)
  – Encryption and steganography
  – Data de-duplication
Distributed File System
[Figure: clients reach a file server over NAS protocols (NFS, CIFS, AFS, WebDAV, HTTP) for data and control access; the file server is attached to shared disks through a storage network.]
Distributed File System
• Key Characteristics
  – Often purpose-built file servers
  – No real standardization for file sharing across Unix (NFS) and Windows (CIFS)
  – Scales independently of application services
  – Performance limited to that of a single file server
  – Reduces (does not eliminate) islands of storage
  – Replication sometimes built in
  – Global name space through external service
  – Less featured 3rd party data management tools
  – Strong network security supported
  – NAS historically suffered a variety of security vulnerabilities
  – Implementations evolved to leverage block storage technology
Distributed File System
• Target Applications (or Best Suited for Applications)
  – NFS, CIFS, static HTTP file serving
  – Productivity applications that can store files over a networked share (e.g., MS Excel, Word, text editors)
  – Generic home directory files
  – Large scale software development with file sharing
  – Reference data archiving
  – Image file sharing – PACS
Distributed File System
• Complementary Products
  – Network compression products
  – NDMP for server-less backup
• Interesting Developments
  – NFSv4 (v4.1 soon)
  – NFS RDMA
  – Parallel NFS
  – Native ILM
  – Content Addressable Storage
  – Virtualized file servers
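Shared File System
• SAN File System
[Figure: SAN file system. Elements shown: application hosts (App.) with client software, an NFS/CIFS server, a metadata server on the data network (LAN), and shared disks on the storage network. Arrows labeled: File Request, Block list, Data Access.]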
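Shared File System
• Cluster File System
[Figure: cluster file system. Elements shown: a cluster of web server hosts (one labeled First Host), each with the cluster file system and cluster volume manager layers; heartbeat and lock management between hosts; a storage network connecting the hosts to shared disks.]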
Shared File System
• Key Characteristics
  – Tremendous scalability
  – Highest data throughput
  – Applications must be cluster-ready
  – Less featured/unsupported 3rd party data management tools
  – Offload data management tasks
  – Secure all application hosts
  – Downtime to upgrade cluster of hosts
  – Limited host operating system versions
  – Lead time for certification of new operating systems/versions
  – Single master server can be a single point of failure
Shared File System
• Target Applications (or Best Suited for Applications)
  – Applications that need large files to be shared by multiple processes in a workflow (e.g., scientific computations, video post-production rendering, vector analysis, seismic data analysis)
  – Database applications for OLTP (e.g., Oracle 9i RAC)
  – High performance computing applications (e.g., rendering, grid computing, financial analysis, computer-aided design)
  – Highly scalable Web serving
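Clustered Distributed File System
[Figure: clustered distributed file system. Elements shown: clients reaching a cluster of file servers over NAS protocols; the file servers (one labeled First Host) share a cluster file system with lock management and heartbeat, over a cluster volume manager marked as an optional layer; a storage network connects the file servers to shared disks.]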
Clustered Distributed File System
• Key Characteristics
  – Advantages of distributed file systems + performance and scalability of a Shared File System
  – Span a number of file servers
  – File servers may not have full read/write access to all files
  – Usually includes integrated global name space feature
  – No host operating system compatibility issues
  – Upgrade complexity less than with shared file systems
  – Mostly an NFS solution today
• Target Applications (or Best Suited for Applications)
  – Scalable NFS-based file services
  – High throughput when reading & writing large files
  – Require concurrent data access to files and data
    • Seismic data analysis, CAD & E-CAD design simulations, digital image rendering, pre & post-production video
NAS Aggregation – WAFS (Core + Edge)
[Figure: WAFS deployment labeled "Recent Method". Data center: a file server / NFS/CIFS server with shared disks on a storage network, NFS/CIFS clients on the data network (LAN), and a WAFS core appliance. Remote offices: NFS/CIFS clients on their own LANs, each site with a WAFS edge appliance. Clients and servers use NAS protocols locally; the edge and core appliances are connected across the WAN.]
• Private protocol
• Excellent in Read caching mode
• Write-through is preferable
Wide Area File System
• Key Characteristics
  – LAN-like performance over the WAN
    • Optimized TCP/IP file sharing protocols
    • Storage and data caching for reducing file access latencies
  – Cached data enables high read performance
  – Hub-spoke model of remote office data services
  – Consolidation of data management and file services
  – Little concurrent read/write sharing of files across sites
  – Scalability dependent on remote file servers' native file system
  – Application aware data caching beyond simple unstructured files
    • E.g., Microsoft Exchange
  – Remote NAS file servers to provide data management capabilities
  – Security
    • Secure communications (encryption) at the network layer
  – Remote NAS file servers provide authentication and authorization mechanisms
  – Complexity at data center vs. managing systems at all remote sites
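The read caching and write-through behavior noted for WAFS edge appliances can be sketched in a few lines. This is only an illustration under stated assumptions: the `WriteThroughCache` class is hypothetical, a plain dictionary stands in for the data-center file server, and real appliances use private protocols that are not modeled here.

```python
class WriteThroughCache:
    """Toy edge cache: reads are served locally when possible, writes reach the origin first."""

    def __init__(self, origin):
        self.origin = origin      # dict standing in for the data-center file server (assumption)
        self.cache = {}           # local copies held at the remote office
        self.origin_reads = 0     # counts round trips that a WAN would make slow

    def read(self, path):
        # Only a cache miss pays the WAN round trip.
        if path not in self.cache:
            self.origin_reads += 1
            self.cache[path] = self.origin[path]
        return self.cache[path]

    def write(self, path, data):
        # Write-through: the origin is updated before the local copy,
        # so the data center always holds the authoritative version.
        self.origin[path] = data
        self.cache[path] = data
```

Repeated reads of the same path cost one origin access, which is why cached data gives high read performance, while every write still crosses the WAN.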
Wide Area File System
• Target Applications (or Best Suited for Applications)
  – Distributed software development for a variety of applications (e.g., Computer Aided Design)
  – File sharing applications (e.g., home directory, document management)
  – Email messaging systems (e.g., MS Exchange)
  – Web applications
  – Distributed print services
Conclusion
• There are a number of things to consider when choosing a file system or server.
  – Will the application work as desired?
  – Will it perform and scale?
  – Does it have the required data management services?
  – Is it secure enough?
  – Is it easy to use and manage?
• There is no single solution that is superior in all cases.
Q&A / Feedback
• Please send any questions or comments on this presentation to SNIA: [email protected]
Many thanks to the following individuals for their contributions to this tutorial.
SNIA Education Committee
David Black, EMC
Philippe Nicolas, Brocade
Narayan Venkat, ONStor
Elaine Silber, Firefly Comm.
Jonathan Goldick, ONStor
Appendix
Performance & Scalability Notes
• Performance
  – Throughput is basically how fast you can stream data in and out of the file system/server. More file servers and better disk striping will give better results. Be mindful of the difference between out-of-cache and sustained numbers; vendors commonly quote the former.
  – If your applications are largely random in their data access patterns, as is common with many small files, caching won't help much and read-ahead will hurt. Look for systems that are optimized for metadata writes and can stripe across many disks.
  – How does performance change when doing data management operations like snapshot and backup? These are regular and recurring tasks, so take them into account when evaluating the performance of the system.
  – Does the system do read after write? This hurts performance but ensures that the data made it to disk as intended. If your storage is not highly reliable, and your application is risk averse, this may be worth it.
• Scalability
  – How many files and directories can really fit in a single (logical) file system? This is not just addressable storage. Consider how long it would take to recover from a disk subsystem outage/failure.
  – Most file systems have a performance cliff on directory size. Hundreds of thousands or millions of files in a single directory work poorly on most systems.
  – How many simultaneous, and active, users can the system really support? Watch for quotes of supported users that don't actually say "active".
  – How many file systems are supported, and is performance the same when load is spread across many of them as when it is spread across just a few? If you are planning on consolidating your infrastructure, how many file systems would that be?
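One way to check an existing tree against the directory-size cliff mentioned above is to count files per directory. The sketch below is a minimal example; the `large_directories` helper and the threshold value are assumptions for illustration, and the point at which a given file system degrades has to be measured rather than assumed.

```python
import os


def large_directories(root, threshold):
    """Yield (path, file_count) for each directory under root holding more than threshold files."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if len(filenames) > threshold:
            yield dirpath, len(filenames)
```

A call such as `list(large_directories("/export/home", 100000))` (the path is a made-up example) would list directories worth splitting up before a migration.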
Data Management Notes
• Backup
  – How does performance change when doing backup? If it halves performance, this is a critical deciding factor. Does the cache get wiped out? Does the network bandwidth get used up?
  – Can you offload backup to other servers with some shared access to data?
  – Does it support a disk-to-disk-to-tape model?
  – Does your preferred backup vendor support it?
  – Note that native agent backup applications often have far more features and capabilities than network-based ones.
  – Data de-duplication is a real plus here. This is where multiple blocks with the same data only get backed up once.
• Replication
  – Some file systems have native support for making read-only copies of all or part of a directory tree. These replicas are generally mountable on other file servers and are used for disaster protection and content distribution. This is a simple way of getting very high read throughput without the complexity of SAN, cluster, or global file systems. If your application can meaningfully use data that is, say, at most a few minutes out of date, this is a very scalable alternative. Most content delivery systems fall into this category. As a rule this scales linearly with the number of file servers since there is little overhead.
  – Be mindful of how many file servers can mount the same replica.
  – Look for systems that work with network optimizers and compression.
  – Look for systems that send the least amount of data over the network. Some send any file that changed, some send any changed blocks, and some send only the changed bytes.
  – Data de-duplication is a real plus here. This is where multiple blocks with the same data only get transferred once.
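The de-duplication idea described above, where blocks with the same data are stored or sent only once, can be sketched as follows. This is an illustration only: the fixed 4096-byte block size, the use of SHA-256 as the block fingerprint, and the `dedup_blocks` and `rebuild` helpers are assumptions, not a description of any vendor's implementation.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for this sketch


def dedup_blocks(data, block_size=BLOCK_SIZE):
    """Split data into fixed-size blocks and keep each distinct block once.

    Returns (store, recipe): store maps a digest to the block bytes,
    recipe is the ordered list of digests needed to rebuild the data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # a repeated block is not stored again
        recipe.append(digest)
    return store, recipe


def rebuild(store, recipe):
    """Reassemble the original bytes from the block store and the recipe."""
    return b"".join(store[digest] for digest in recipe)
```

Only the contents of `store` need to be backed up or transferred; the recipe is small by comparison, which is where the savings come from when many blocks repeat.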
Data Management Notes
• Quotas
  – Make sure the quota features support the level of granularity you require.
    • User quotas
    • Group quotas
    • Directory tree quotas
    • Nested directory tree quotas
  – Does the file system directly, or through a 3rd party add-on, support extended quota features like a file-type quota? Think of a quota on all .MP3 files. There are many quota-like features out there with a variety of supported policies, reporting mechanisms, and automation.
  – What kind of policy infrastructure is offered and how hard is it to set up?
  – Does it run natively on the file server or does it require external servers? External servers may not offer the same level of high-availability, security, and scalability capabilities as the file servers they manage.
  – Just as in Backup, native agents often have more capabilities than remote ones. It's not fast to determine when to block the creation of an .MP3 file by calling out to an external server. Since this would increase latencies a great deal, it often isn't done.
  – Does the software create a separate shadow copy of the entire directory tree? Most non-native systems do this and have scalability problems when the number of files gets large. Basically the software is keeping a duplicate of the directory tree, often in an external relational database.
  – In NAS environments where CIFS and NFS are both present, is there one quota for a user, or do CIFS usage and NFS usage count separately? Note that when there is a single quota, there is some agent doing mappings between Windows and Unix domains. This gives a good user experience, but quotas can be down for a long time when a rebuild needs to be done, and you can get a load spike on your domain controllers. As a rule, directory tree quotas have lower overhead and don't suffer long rebuild periods.
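A directory tree quota and a file-type quota, as described above, both come down to summing usage under a tree. The sketch below assumes hypothetical `tree_usage` and `over_quota` helpers and an external scan of the tree; it also shows why non-native quota tools scale poorly, since every check walks every file.

```python
import os


def tree_usage(root, suffix=None):
    """Sum file sizes in bytes under root; if suffix is given, count only names ending with it."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Case-insensitive match so ".MP3" and ".mp3" count against the same quota.
            if suffix is None or name.lower().endswith(suffix):
                total += os.path.getsize(os.path.join(dirpath, name))
    return total


def over_quota(root, limit_bytes, suffix=None):
    """Return True when usage under root (optionally for one file type) exceeds limit_bytes."""
    return tree_usage(root, suffix) > limit_bytes
```

A native implementation would instead update a counter as files are written, which avoids the full walk; that difference is the native vs. external trade-off raised in the notes.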
Data Management Notes
• Information Lifecycle Management (ILM)
  – This can encompass a lot of very different features.
  – Often used to describe traditional Hierarchical Storage Management (HSM) but with a disk-to-disk instead of a disk-to-tape model. In reality there are many views as to what constitutes ILM. SNIA to the rescue ☺
  – File systems play a key role in ILM. They mediate access to data, so they are a clear point at which to enforce compliance and auditing rules. They control block allocation, so they can decide initial placement strategies. They own the name space, so they can efficiently implement retention policies.
  – New file system models have recently emerged to tackle some of the core ILM problems, content addressable storage (CAS) being a prime example. In CAS a signature based on the contents of a file is used instead of a file name; if the contents change, so does the signature. These systems are generally much slower than traditional NAS because they are doing a great deal of extra data integrity checks, including read after write. The cost and performance penalty must be weighed against the regulatory hit you will take if data is lost or altered.
Check out SNIA Tutorial: The Secret Sauce of ILM – The Professional ILM
Check out SNIA Tutorial: ILM: Tiered Services and the Need for Classification
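The content addressable storage idea in the ILM notes, where a signature derived from the contents replaces the file name, can be illustrated with a toy in-memory store. The `ContentStore` class is hypothetical, and SHA-256 is an assumed choice of signature; real CAS products differ in both.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressable store (illustrative only)."""

    def __init__(self):
        self._objects = {}

    def put(self, content):
        # The address is computed from the bytes themselves, not chosen by the caller.
        address = hashlib.sha256(content).hexdigest()
        self._objects[address] = content
        return address

    def get(self, address):
        content = self._objects[address]
        # Integrity check on read: recompute the signature and compare it to the address.
        if hashlib.sha256(content).hexdigest() != address:
            raise ValueError("stored content does not match its address")
        return content
```

Storing the same bytes twice yields the same address, and any change to the bytes yields a different one; the extra hashing on every read and write is one source of the performance penalty mentioned above.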
Security Notes
• Authentication
  – All network file servers need to authenticate users, or trust clients not to lie to them (most NFS deployments).
  – Most environments have domain authentication services; it is an important consideration whether a file system supports them and to what degree. Note that most NAS authentication schemes support multiple levels; watch for servers that only support the weaker strength choices.
• Authorization
  – File systems all have mechanisms to control access to a file. These range from something as simple as the Unix rwx mode bits to complex ACL(s) and compliance-related filters.
  – Make sure the authorization model has the required granularity in terms of access types.
  – If end users need to be able to get/set the authorization fields, watch for systems that require special client-side software to function.
  – If you regularly change ACL(s) over large numbers of files, you can hit performance problems over NAS. NAS protocols are not optimized for recursive security changes. Compare a tree ACL update on a local NTFS file system to the same operation over CIFS. Companies that have strict security compliance requirements on ACL(s) often change them regularly when employees come and go.
• Auditing
  – Auditing of data accesses has been commonly available in Windows for many years.
  – Often included in ILM offerings.
  – Watch for controls on who can control/access the audit facility.
  – What happens if the audit log is full? Does the system stop serving data?
  – Are logins and login attempts audited as well or only data accesses?
  – Are audit trails digitally signed?
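The Unix rwx mode bits mentioned under Authorization are the simplest of these mechanisms: three permission flags for each of owner, group, and other. The sketch below decodes them with Python's standard `stat` module; the `describe_mode` helper is an assumption made for illustration.

```python
import stat


def describe_mode(mode):
    """Return the ls-style permission string and per-class read/write/execute flags."""
    classes = {
        "owner": (stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR),
        "group": (stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP),
        "other": (stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH),
    }
    flags = {
        who: {"r": bool(mode & r), "w": bool(mode & w), "x": bool(mode & x)}
        for who, (r, w, x) in classes.items()
    }
    return stat.filemode(mode), flags
```

Nine bits cannot express rules such as "this one user may append but not delete", which is the granularity gap that ACLs fill and the reason to check the access types a file system's authorization model supports.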
Security Notes
• Virus Scanning
  – Does the file system support your preferred virus vendor?
  – What kind of performance penalty is incurred? A 50% loss of read throughput is not uncommon.
  – Can external virus scan servers be used, and if so, how many? External servers are regularly used in the NAS world and allow you to add more virus scan resources for scalability. The downside is that not all vendors support this model of operation. With the rise of automated updates of virus definition files on clients, does file server-side virus scanning matter as much as it used to?
• Vulnerabilities
  – All file systems have some security and data integrity vulnerabilities. To remove them all has a huge performance and complexity cost. The important thing is to have your eyes open when looking at the tradeoffs you are making.
  – Is the file system compromised when any client on the network is? What if one file server is compromised, how far does that extend?
  – Is there any way to know it has happened?
  – Can someone sniffing network packets easily read the data?
Ease of Use Notes
• End-User
– Local file systems that come bundled with the host operating system are the least likely to have application problems or require end user training. Virtually any other choice has some issue that can affect the end user.
– When possible choose something that is standards-based. While this is no guarantee of success, it increases your odds.
• Deployment & Maintenance
– How long does the initial implementation take?
– Does management scale with the number of servers?
– How hard is it to upgrade/downgrade the software? Is it a live upgrade?
– Can file system load be migrated between servers to avoid performance bottlenecks? Is this live? Is it policy-driven?
– Does the system have any automatic provisioning? Being able to set some policies and have the system provide some level of automation is a serious labor-saving device. File systems only run out of space at 2am ☺
– Does it have centralized monitoring, reporting, phone-home support? This is a major part of scaling management with the number of servers.
– When there is a hardware failure how long does it take to recover? Can there be partial data access while a disk system is down?
– Are there any single points of failure?
– Can you find out where your performance is going? Hot files, hot clients, etc.
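As a rough illustration of the automatic provisioning question above, the sketch below evaluates a simple space policy against a mounted file system. The `SpacePolicy` class, its thresholds, and the "warn"/"grow" action names are hypothetical and not taken from any product; a real system would attach its own actions (alerting, volume growth) to each result.

```python
import shutil
from dataclasses import dataclass


@dataclass
class SpacePolicy:
    """Hypothetical policy: thresholds are fractions of total capacity."""
    warn_at: float = 0.80   # start alerting the administrator
    grow_at: float = 0.90   # request more capacity automatically


def evaluate(used: int, total: int, policy: SpacePolicy) -> str:
    """Return the action the policy calls for at the given utilization."""
    if total <= 0:
        raise ValueError("total capacity must be positive")
    utilization = used / total
    if utilization >= policy.grow_at:
        return "grow"
    if utilization >= policy.warn_at:
        return "warn"
    return "ok"


def check_path(path: str, policy: SpacePolicy) -> str:
    """Apply the policy to the file system that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return evaluate(usage.used, usage.total, policy)
```

Run from a scheduler, a check like `check_path("/export/home", SpacePolicy())` is one way to find out about a full file system before 2am rather than at it.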
Distributed File System
• Key Characteristics
– These are often purpose-built file servers with a focus on consolidation and data management.
– Standards-based network file systems work with most, but definitely not all, applications. There is no real standardization when it comes to file sharing across Unix (NFS) and Windows (CIFS). Not all distributed file systems provide the same sharing semantics.
– Can scale up file services independently of application services.
– The maximum performance on a file system does not scale beyond what a single file server can provide.
– Consolidates data management tasks onto fewer servers. Reduces, but does not eliminate, islands of storage.
– Replication is sometimes built in.
– Usually must be coupled with an external service to get a global name space. NAS aggregation is not generally available by default.
– 3rd party data management tools generally have fewer features, when they work at all. There is rarely support for native agents.
– File servers can be secured in a physical and networking sense.
– Strong network security can be put in place, but the servers can also be run wide open.
– NAS as typically deployed has historically suffered a variety of security vulnerabilities. CIFS has been open to “man in the middle” attacks where passwords are compromised. NFS without Kerberos is only as secure as the least secure client on the network.
– Implementations have evolved from the file server implementing block data protection on proprietary storage to file servers leveraging sophisticated disk array and block storage virtualization technology.
Shared File System
• Key Characteristics
– Can scale up enormously, often to hundreds or even thousands of nodes.
– Provides the highest throughput to data.
– Often performs poorly on small files and latency-sensitive applications due to locking overhead. Really targeted at very large files and streaming applications.
– Applications must be cluster-ready. They must be aware that the same data is accessible from multiple hosts at the same time. They must handle locking and synchronization using either an out of band protocol or file system primitives. They can also avoid sharing conflicts by construction.
– Not all systems guarantee cache consistency on the cluster members. This can be a source of application compatibility problems.
– 3rd party data management tools may work but are often unsupported because the file systems are so new and are not widely deployed. The focus here is performance and scalability, not data management.
– You can offload data management tasks to dedicated members of the cluster.
– Since all application hosts have full access to the data, all of them must be secured.
– Upgrading a cluster of hosts is often painful and has significant downtime.
– Generally only runs on a small set of host operating system versions.
– Certification of new operating systems, and new operating system releases, can lag by months.
– Watch for systems that have a single master server as this can be a single point of failure.
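To make the "cluster-ready" requirement concrete, here is a minimal sketch of an application using a file system primitive for synchronization: it takes an exclusive POSIX advisory lock before appending to a shared file. The function name `append_record` is hypothetical. Whether such a lock is actually honored across cluster members depends on the shared file system and how it is mounted, so treat that as an assumption to verify, not a guarantee. The `fcntl` module is available on Unix-like hosts only.

```python
import fcntl
import os


def append_record(path: str, record: str) -> None:
    """Append one line to a shared file while holding an exclusive advisory lock."""
    with open(path, "a", encoding="utf-8") as f:
        # Blocks until no other cooperating process holds the lock.
        fcntl.lockf(f, fcntl.LOCK_EX)
        try:
            f.write(record + "\n")
            f.flush()
            # Push the data to stable storage before releasing the lock so
            # the next lock holder does not race with buffered writes.
            os.fsync(f.fileno())
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
```

Advisory locks only protect against other writers that also take the lock, which is why every application touching the shared data has to be written with the cluster in mind.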
Clustered Distributed File System
• Key Characteristics
– Combines the advantages of distributed file systems with the performance and scalability advantages of a Shared File System.
– Allows a single NAS file system, and even a single directory, to span a number of file servers.
– Not all solutions allow all file servers to have full read/write access to all files; some use a proxy model wherein all updates route through a single host.
– Usually comes with an integrated global name space feature. This allows any file to be uniquely named no matter what file server is contacted.
– Does not suffer from host operating system compatibility issues.
– Upgrade can still be complex but is much more manageable than with shared file systems.
– Mostly an NFS solution today.
• Target Applications (or Best Suited for Applications)
– Scalable NFS-based file services for applications that require high throughput when reading & writing large files.
– High performance computing applications (seismic data analysis, CAD & E-CAD design simulations, digital image rendering, pre & post-production video) that require concurrent data access to files and data.
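The global name space idea can be sketched as a lookup table that maps one logical path to whichever file server holds the data. This is only an illustration under assumed names: the `GlobalNamespace` class, the server names, and the export paths are hypothetical, and real products implement this inside the file servers rather than in client code.

```python
from pathlib import PurePosixPath
from typing import Dict, Tuple


class GlobalNamespace:
    """Maps logical path prefixes to (server, export) pairs."""

    def __init__(self) -> None:
        self._mounts: Dict[PurePosixPath, Tuple[str, str]] = {}

    def add(self, prefix: str, server: str, export: str) -> None:
        self._mounts[PurePosixPath(prefix)] = (server, export)

    def resolve(self, path: str) -> Tuple[str, str]:
        """Return (server, physical path) using the longest matching prefix."""
        logical = PurePosixPath(path)
        # Walk from the full path up toward the root, so the first hit
        # is the most specific (longest) registered prefix.
        for candidate in [logical, *logical.parents]:
            if candidate in self._mounts:
                server, export = self._mounts[candidate]
                rest = logical.relative_to(candidate)
                return server, str(PurePosixPath(export) / rest)
        raise KeyError(f"no file server registered for {path}")
```

With `/projects` registered on one server and `/projects/cad` on another, a client asks for the same logical name regardless of which server it contacts, and the table decides where the file physically lives.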
Wide Area File System
• Unique Characteristics
– Goal is to give LAN-like performance over the WAN using optimized TCP/IP file sharing protocols and storage and data caching for reducing file access latencies.
– Read performance tends to be quite high because data is cached in remote locations.
– Works in conjunction with existing file servers to provide file consistency and coherency across a WAN.
– Targets a hub-spoke model of remote office data services.
– Enables consolidation of data management services to a remote data center.
– The ultimate goal is to remove the need for a file server or backup administrator from the remote offices.
– As a rule the expectation is that there will be little or no concurrent read/write sharing of files across sites.
– Scalability can be excellent, but is dependent on the scalability of the native file system on the remote file servers.
– Some products include application-aware data caching beyond simple unstructured files, Microsoft Exchange being a major example.
– All WAFS implementations rely on the remote NAS file servers to provide data management capabilities such as backup (e.g., snapshot, rapid restores), replication, and quota management.
– WAFS implementations provide capabilities for secure communications (encryption) at the network layer.
– They leverage the authentication and authorization mechanisms available on the remote NAS file servers.
– In general, deployment and maintenance of WAFS solutions add complexity to administering file services at the data center but make up for it in benefits at the remote offices.
• Target Applications (or Best Suited for Applications)
– Distributed software development for a variety of applications (e.g., Computer Aided Design)
– File sharing applications (e.g., home directory, document management)
– Email messaging systems (e.g., MS Exchange)
– Web applications
– Distributed print services
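The caching behavior behind the read performance claim can be sketched as a read-through cache that revalidates against the data center before serving a cached copy. This is a simplified illustration, not how any particular WAFS product works: the `EdgeCache` class and its `fetch` and `version` callables are hypothetical stand-ins for the expensive full read and the cheap consistency check over the WAN.

```python
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Read-through cache for a remote-office appliance (illustrative only)."""

    def __init__(self,
                 fetch: Callable[[str], bytes],
                 version: Callable[[str], int]) -> None:
        self._fetch = fetch        # full file read from the data center (expensive)
        self._version = version    # small round trip returning a change counter
        self._store: Dict[str, Tuple[int, bytes]] = {}
        self.wan_reads = 0         # number of full reads that crossed the WAN

    def read(self, path: str) -> bytes:
        current = self._version(path)
        cached = self._store.get(path)
        if cached is not None and cached[0] == current:
            return cached[1]       # served locally, no bulk WAN transfer
        data = self._fetch(path)
        self.wan_reads += 1
        self._store[path] = (current, data)
        return data
```

Repeated reads of an unchanged file cost only the version check, which is why the approach fits workloads with little concurrent read/write sharing across sites.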