Transcript of CERNVM-FS: BEYOND SOFTWARE DISTRIBUTION · 2018-10-29
CERNVM-FS: BEYOND SOFTWARE DISTRIBUTION
RADU POPESCU, CERNVM-FS TEAM
GENERIC COMPONENTS OF THE ESCIENCE INFRASTRUCTURE ECOSYSTEM, OCT 2018, AMSTERDAM
CERNVM-FS IN A NUTSHELL
A FILE SYSTEM APPROACH TO DISTRIBUTING SOFTWARE
• FUSE-based, independent mount points, e.g. /cvmfs/sft.cern.ch
• Clients have a read-only view; a single writer publishes into the repository
• Immutable, content-addressed storage
• HTTP transport; access and caching on demand
[Diagram: the CernVM-FS client stack. Basic system utilities access /cvmfs through the OS kernel and FUSE; the client reads through a file system memory buffer (~100 MB) and a CernVM-FS persistent cache (~20 GB), backed by a global HTTP cache hierarchy in front of the repository (HTTP or S3, ~1-10 TB).]
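The client side of this picture is driven by a small configuration file. A minimal sketch of /etc/cvmfs/default.local, where the proxy URL is a placeholder:

```shell
# /etc/cvmfs/default.local -- minimal client configuration (sketch)
CVMFS_REPOSITORIES=sft.cern.ch                    # repositories to mount under /cvmfs
CVMFS_HTTP_PROXY="http://squid.example.org:3128"  # site HTTP proxy (placeholder URL)
CVMFS_QUOTA_LIMIT=20000                           # persistent cache limit in MB (~20 GB)
```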
CERNVM-FS @ 10 YEARS
SOME STATS
• ~100 000 clients on the WLCG
• Largest repositories: O(10^8) files, O(TB) content size
• 85 monitored repositories
• Platforms:
• Architectures: x86_64, i386, ARM (aarch64)
• Linux: RHEL, Ubuntu LTS, Debian, SLES
• macOS
• Experimental: Raspberry Pi, RISC-V!
• Latest version: CernVM-FS 2.5.1
SOFTWARE VS DATA
There are differences when storing and distributing data vs software:
• File size
• Number of files
• Access policies
• Storage location
• Change frequency
Large data distributions require a separate infrastructure, to avoid impacting the software distribution use case.
BEYOND SOFTWARE DISTRIBUTION WITH CERNVM-FS
• Conditions data distribution for ALICE and LHCb
• OSG’s StashCache for working sets < 10 TB
• Using LIGO data on opportunistic resources in OSG
EXTERNAL DATA DISTRIBUTION
A set of features that allows distributing the namespace of an existing data repository (e.g. LIGO):
• Grafting (only store file metadata)
• Uncompressed files
• HTTPS access (with authorisation plugins)
• A separate infrastructure needs to be operated for this!
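Grafting publishes only the metadata of files whose data lives elsewhere: a small graft file accompanies an empty placeholder of the same name, and `cvmfs_swissknife graft` can generate such files from real data. A hedged sketch, with the file name and all field values invented for illustration:

```shell
# .cvmfsgraft-frame.gwf -- hypothetical graft file accompanying an empty
# placeholder "frame.gwf"; the size and checksum below are illustrative only.
size=134217728
checksum=da39a3ee5e6b4b0d3255bfef95601890afd80709
```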
AUTHORISATION HELPER PLUGINS
• External processes communicate with the CernVM-FS FUSE module over stdin/stdout
• Grant or deny read access to processes based on uid, gid, and “membership”
• Support for X.509 proxy certificates
• SciTokens support is coming in CernVM-FS 2.6.0, thanks, D. Weitzel!
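On the client side, the helper used for a protected repository can be overridden through configuration; a sketch, where the helper binary path is hypothetical:

```shell
# Sketch: force a specific authz helper for a protected repository.
# The binary path is hypothetical; CVMFS_AUTHZ_HELPER overrides the helper
# requested by the repository itself.
CVMFS_AUTHZ_HELPER=/usr/libexec/cvmfs/authz/my_authz_helper
```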
FILE STORAGE IN CVMFS
• Files are stored as compressed, content-addressed chunks
• The maximum chunk size can be configured
• A rolling checksum algorithm maximises chunk reuse
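The content-addressed naming itself is easy to sketch with standard tools: an object's SHA-1 hash determines its path in the backend store, with the first two hex digits used as a subdirectory. This illustrates the scheme, not the exact CernVM-FS on-disk format, which also adds type suffixes:

```shell
# Sketch: map a file's content to a content-addressed object path,
# data/<first 2 hex digits>/<remaining 38 digits>.
object_path() {
    hash=$(sha1sum "$1" | awk '{print $1}')
    printf 'data/%s/%s\n' "$(printf '%s' "$hash" | cut -c1-2)" \
                          "$(printf '%s' "$hash" | cut -c3-)"
}

printf 'hello' > /tmp/demo_object
object_path /tmp/demo_object
```

Because names derive purely from content, identical chunks deduplicate automatically and cached objects never need invalidation.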
VERSIONING FEATURES
• Snapshots: All the snapshots in a repository’s history are accessible through the .snapshots virtual directory (similar to ZFS)
• Branching: organise repository snapshots into a non-linear history
[Diagram: snapshots A → B → C form the default branch; snapshot B', forked from B, sits on a separate "fix" branch.]
THE SNAPSHOT DIRECTORY
λ ls -al /cvmfs/cernvm-prod.cern.ch/.cvmfs/snapshots
total 381
dr-xr-xr-x 97 cvmfs staff 97 Sep 12 10:15 .
dr-xr-xr-x 3 cvmfs staff 97 Sep 12 10:15 ..
drwxr-xr-x 7 cvmfs staff 4096 Jan 13 2014 HEAD
drwxr-xr-x 4 cvmfs staff 4096 Jan 13 2014 cernvm-system-3.1.0.0
drwxr-xr-x 4 cvmfs staff 4096 Jan 13 2014 cernvm-system-3.1.1.0
drwxr-xr-x 4 cvmfs staff 4096 Jan 13 2014 cernvm-system-3.1.1.1
drwxr-xr-x 4 cvmfs staff 4096 Jan 13 2014 cernvm-system-3.1.1.2
CLIENT CACHE
• The CernVM-FS FUSE client caches content-addressed blocks locally
• Cache can be configured to suit different use-cases or environments: disk cache, external cache plugin, tiered cache
• At CSCS Lugano, CernVM-FS is running on Piz Daint (Cray XC40/50, #6 Top500), using disk cache on GPFS
EXTERNAL CACHE PLUGIN API
• A cache plugin is an external process that communicates with the main CernVM-FS client process through a socket (UNIX or network), using a well-defined protocol
[Diagram: the CVMFS FUSE module communicates over a socket with the cache plugin process, which is built from a plugin main, the libcvmfs_cache library, and a backend library.]
CACHE PLUGIN API
• The plugins support different operating environments, such as diskless compute nodes in HPC
• Current plugins: in-memory cache, RamCloud (low latency key-value store), XRootD
• There is a library for developing new plugins
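Selecting a plugin is done in the client configuration; a sketch, where the instance name, command line, and socket path are chosen for illustration:

```shell
# Sketch: route the client cache through an external plugin process.
# "myplugin" and the paths below are illustrative names.
CVMFS_CACHE_PRIMARY=myplugin
CVMFS_CACHE_myplugin_TYPE=external
CVMFS_CACHE_myplugin_CMDLINE=/usr/libexec/cvmfs/cache/cvmfs_cache_ram,/etc/cvmfs/cache-ram.conf
CVMFS_CACHE_myplugin_LOCATOR=unix=/var/run/cvmfs/cache.socket
```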
TARBALL INGESTION (IN CERNVM-FS 2.6.0)
• Allows directly publishing the contents of an archive
# cvmfs_server ingest -t archive.tar sft.cern.ch
• Good performance; avoids passing through the disk
• No OverlayFS needed
• No way to run any scripts in the same transaction
• A core component for container image distribution (unpacked.cern.ch)
• Thanks, Simone Mosciatti!
PUBLISHING TO CVMFS REPOS
SINGLE PUBLISHER PER REPOSITORY
[Diagram: users reach the release manager machine (CernVM-FS FUSE client plus cvmfs_server tools) over SSH; it writes to the authoritative storage via NFS or S3, and stratum 1 servers serve clients over HTTP.]
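On the release manager, a publication is a transaction; a minimal session sketch, with the repository name and paths as examples:

```shell
# Open a transaction: the /cvmfs mount point becomes writable (union FS)
cvmfs_server transaction sft.cern.ch

# Stage new content into the repository
cp -r /tmp/new-release /cvmfs/sft.cern.ch/lcg/releases/

# Publish: new and changed files are chunked, compressed and uploaded,
# and a new repository snapshot becomes visible to clients
cvmfs_server publish sft.cern.ch
```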
SINGLE PUBLISHER PER REPOSITORY
DISADVANTAGES
• No support for concurrent writing
• Shell access needed on the machine with direct access to the repository storage
• No fine-grained access control
• Possible performance issues for very large change-sets
PUBLISHING TO CVMFS REPOS
MULTIPLE PUBLISHERS
[Diagram: several user-facing release manager machines (each with the CernVM-FS FUSE client and cvmfs_server tools) publish through the CVMFS service API of a repository gateway, which alone writes to the authoritative storage and stratum 1.]
PUBLISHING TO CVMFS REPOS
HORIZONTALLY SCALING THE PUBLICATION PROCESS
• Scale-out to multiple release manager machines, process changes concurrently
• New use cases become possible, such as multi-tenant repositories or a containerised release manager
• More flexibility, but at an increased maintenance cost for the user; optional
S3 REPOSITORY BACKEND
• In addition to locally-mounted volumes, CernVM-FS can use S3-compatible object stores as repository storage backend
• Available S3 object stores: AWS S3, Google Cloud Storage, Ceph (CERN S3 Service), Minio (for testing)
• Advantages: availability, scalability, uploading with multiple streams
• Compatible with CernVM-FS repository gateway
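Creating an S3-backed repository takes a small settings file plus a cvmfs_server invocation; a sketch in which the credentials, bucket, endpoint, and repository name are all placeholders:

```shell
# /etc/cvmfs/s3.conf -- S3 backend settings (all values are placeholders)
CVMFS_S3_HOST=s3.example.org
CVMFS_S3_BUCKET=cvmfs-data
CVMFS_S3_ACCESS_KEY=ACCESSKEY
CVMFS_S3_SECRET_KEY=SECRETKEY

# Create a repository on the S3 backend; -w gives the URL from which
# clients will read the published objects
cvmfs_server mkfs -s /etc/cvmfs/s3.conf \
  -w http://s3.example.org/cvmfs-data my.repo.example.org
```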
XCACHE AS AN HTTP PROXY (EXPERIMENTAL)
• XCache is an XRootD configuration that provides a high-performance file proxy
• Accessed using XRootD or HTTP, ingests from XRootD or HTTP (new ingestion plugin)
• With HTTP ingestion plugin it can be inserted non-intrusively between CernVM-FS strata and clients
• Use cases: high-performance site-level cache, better than Squid for large files?
• All the pieces available in XRootD 4.9
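From the client's point of view, an XCache instance with the HTTP ingestion plugin is just another forward proxy; a sketch, where the hostname and port are placeholders:

```shell
# Sketch: point clients at a site-level XCache instead of (or in front of)
# a Squid; the hostname and port are illustrative.
CVMFS_HTTP_PROXY="http://xcache.example.org:8000"
```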
SUMMARY
• CernVM-FS has a few useful features in a data distribution context
• Most features are orthogonal, can be mixed and matched
• CernVM-FS is not meant to replace existing data distribution solutions like XRootD, but complement them
• S3 object stores are becoming a preferred storage backend, we aim to have the best integration possible
• Looking forward to CernVM-FS 2.6.0 in Q1 2019
THANK YOU!