CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – –...

36
CephFS and Samba CephFS and Samba Scale-out file serving for the Masses Scale-out file serving for the Masses David Disseldorp David Disseldorp Senior Software Engineer Senior Software Engineer [email protected] [email protected] Jan Fajerski Jan Fajerski Senior Software Engineer Senior Software Engineer [email protected] [email protected]

Transcript of CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – –...

Page 1: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2

CephFS and SambaCephFS and SambaScale-out file serving for the MassesScale-out file serving for the Masses

David DisseldorpDavid DisseldorpSenior Software EngineerSenior Software [email protected]@suse.com

Jan FajerskiJan FajerskiSenior Software EngineerSenior Software [email protected]@suse.com

Page 2: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2

https://jan--f.github.io/19_04_susecon/https://jan--f.github.io/19_04_susecon/

Page 3: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2

AgendaAgenda•

Quick intro

Spotlight on a few (new) features

Samba Gateway

Performance numbers

Page 4: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2

Ceph IntroductionCeph Introduction•

Distributed storage system based on RADOS

Scalability by design

Fault tolerant

Self-healing and self-managing

Unified storage cluster

Object storage

Block Storage

File system (POSIX compatible)

Your applicaton

Page 5: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2

Ceph ArchitectureCeph Architecture

Page 6: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2

CephFS ArchitectureCephFS Architecture

Page 7: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2

CephFS features (a selection)CephFS features (a selection)

Page 8: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2

Multiple MDS daemonsMultiple MDS daemonsHigh availability and scalabilityHigh availability and scalability

A failed MDS will bring service down

Many clients and many files can overwhelm MDS cache

Directory tree is partitioned into ranks - max_mds defaults to 1

Additional MDS daemons will join the cluster as standby's

• increase MDS count

systemctl start [email protected] # start an additional mds

ceph fs set <cephfs> max_mds 2

Page 9: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2

Multi MDS configurationMulti MDS configurationFailover handling parametersFailover handling parameters

mds_beacon_grace

mds_standby_replay

mds_replay_interval

mds_standby_for_rank

mds_standby_for_name

Control rank partitioningControl rank partitioning

Automatic partitioning based on load - hard to beat

Pin directories (and its sub-directories) to a certain rank

Spread periodic loads

Prevent particular load from impacting others

Page 10: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2

Extended attributesExtended attributesMany options can be controlled at runtime via Many options can be controlled at runtime via setxattr

CaveatsCaveats

File and dir layout are inherited at creation time - don't apply to existing inodes

Clients need the p flag in their cephx key to set quota and layout restrictions

setfattr -n <attribute.name> -v 1000000 /some/dir # set to 1 milliongetfattr -n <attribute.name> /some/dir # read xattrsetfattr -x <attribute.name> /some/dir # remove xattr

ceph.dir.pinceph.quota.[max_bytes|max_files]ceph.[file|dir].layout.[pool|pool_namespace|stripe_unit|stripe_count|object_size]

Page 11: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2

QuotasQuotas•

Directory based

Set with xattrs

LimitationLimitation

Cooperative - needs client cooperation

Imprecise - rule of thumb: Clients will be stopped within 10s of reaching a quota

Kernel support require >=4.17 kernel and mimic+ cluster

Path restrictions can interner - Clients need read permission on quota directory

Snapshots of since deleted data does not count - see bug #24284

Page 12: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2

SnapshotsSnapshots•

Snapshot a directory tree

Fast creation - data writeback is asynchronous

Magic .snap directory

Clients need s flag in their cephx key

mkdir .snap/my_snapshot # Take a snapshot called my_snapshotls .snap/ # List snapshotsrmdir .snap/my_snapshot # Remove snapshot

client.0 key: AQAz7EVWygILFRAAdIcuJ12opU/JKyfFmxhuaw== caps: [mds] allow rw, allow rws path=/bar caps: [mon] allow r caps: [osd] allow rw tag cephfs data=cephfs_a

Page 13: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2

SambaSamba

Page 14: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2

Samba IntroductionSamba Introduction•

File and print server

SMB / CIFS, SMB2 and SMB3+ dialects

Authentication

NTLMv2 and Kerberos

Identity mapping

Windows SIDs to uids and gids

Active Directory domain member

Page 15: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2

Samba Gateway for CephFSSamba Gateway for CephFS•

Samba VFS layer for filesystem abstraction

Ceph VFS module provides libcephfs callouts

One or more nodes "proxy" SMB traffic through to CephFS

Samba performs authentication and ID mapping

static CephX credentials per Samba gateway

Page 16: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2

Samba Gateway for CephFSSamba Gateway for CephFS

Page 17: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2

Samba Clustering with CTDBSamba Clustering with CTDB•

Clustered Trivial Database (CTDB)

Share state across multiple Samba nodes

Key-value store

Reliable messaging

Active / Active

HA features

Monitoring and failover

Page 18: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2

Clustering with CTDBClustering with CTDB•

Record location master and data master

Elected recovery master monitors state of cluster

Performs database recovery if necessary

Cluster-wide mutex used to prevent split brain

Uses Ceph RADOS object lock

connections to public IPs are tracked

reset on IP failover

gratuitous ARP and "Tickle" clients

Page 19: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2

Clustering with CTDBClustering with CTDB

Page 20: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2

Gateway limitationsGateway limitations•

Cross protocol access

Performance

Leases / oplocks

Clustered node failover

Load balancing

Page 21: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2

DemonstrationDemonstration

Page 22: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2

PerformancePerformance

Page 23: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2

Basic hardware performanceBasic hardware performanceCluster hardwareCluster hardware

Ceph on 8 node

5 OSD nodes – 24 cores – 128 GB RAM

16 OSD daemons per node – 1 per SSD, 5 per NVME

3 MON/MDS nodes – 24 cores – 128 GB RAM

9 client nodes

16 cores – 64 GB RAM

Basic networking performanceBasic networking performance

25G networking, 25G to 100G interfaces

Client node -> OSD node 25Gbit/s

Multiple client nodes -> OSD node 72 Gbit/s

OSD Node -> OSD Node 68 Gbit/s

OSD devicesOSD devices

2 x Intel® SSD DC P3700 Series 800GB, 1/2 Height PCIe 3.0, 20nm, MLC

5 x Intel® SSD DC S3700 Series 400GB, 2.5in SATA 6Gb/s, 25nm, MLC

Page 24: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2

Work loadsWork loadsExperimentsExperiments

Each client (thread) operates on 100 4MB files

Sequential IO in 4k and 4M blocksizes

fsync every 64 IO operations

10 minutes runtime with 2 minutes ramp time

7 clients, various process counts per client

Page 25: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2
Page 26: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2
Page 27: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2
Page 28: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2
Page 29: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2
Page 30: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2
Page 31: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2
Page 32: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2
Page 33: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2
Page 34: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2

Thank you - Questions?Thank you - Questions?

Many Thanks to , and Adam Spiers Florian Haas Hakim El-Hattab and contributors

Page 35: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2

https://jan--f.github.io/19_04_susecon/https://jan--f.github.io/19_04_susecon/

Page 36: CephFS and SambaCephFS and Samba...Samba IntrSamba Introdoductionuction • – • – • – – File and print server SMB / CIFS, SMB2 and SMB3+ dialects Authentication NTLMv2

fio example jobfio example job[global]directory=/run/ceph_bench/1:/run/ceph_bench/2:/run/ceph_bench/3:/run/ceph_bench/4:/run/ceph_bench/5:unlink=1wait_for_previous=1filesize=4mnrfiles=100ramp_time=2mtime_based=1runtime=10m#size shouldn't be reached due to runtimesize=100Gfsync=64numjobs=64