Transcript of Efficient data maintenance in GlusterFS using databases
- 1. Efficient data maintenance in GlusterFS using databases. Joseph Fernandes, Dan Lambright
- 2. Who we are: Joseph Fernandes (Senior Engineer, Red Hat Storage) and Dan Lambright (Principal Engineer, Red Hat Storage)
- 3. Agenda: quick GlusterFS overview; data maintenance challenges; existing solutions; proposed solution: optimized database; case study: GlusterFS data cache tier; lessons learned; what's next
- 4. What is GlusterFS? A distributed file system; software-defined NAS; runs over TCP/IP or RDMA; accessed through the native client, SMB, or NFS
- 5. What is data maintenance? Tasks performed on data for protection, performance, and optimal storage utilization
- 6. Challenges in data maintenance: data maintenance imposes overhead on CPU, memory, storage, and network. It therefore needs fast search, rich metadata, and distribution with load balancing.
- 7. Existing solutions: file system crawl; file system log; metadata databases; in-memory inode caches
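A file system crawl, the first existing solution listed, visits every file on each pass. The following is a minimal sketch, assuming the maintenance task is "find files not accessed within an idle threshold"; the function name and threshold are hypothetical.

```python
import os
import time


def crawl_for_cold_files(root, max_idle_seconds, now=None):
    """Walk every file under root and return the paths whose last access
    time is older than max_idle_seconds (a full-namespace crawl)."""
    now = time.time() if now is None else now
    cold = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except FileNotFoundError:
                # File removed between listing and stat; skip it.
                continue
            if now - st.st_atime > max_idle_seconds:
                cold.append(path)
    return sorted(cold)
```

Every pass stats every file, so the cost grows with the size of the namespace rather than with how much data actually changed. Note also that st_atime is only as fresh as the mount options allow (for example relatime or noatime).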
- 8. Proposed solution: optimized DB for GlusterFS
- 9. Optimized DB for GlusterFS: record now, consume later; a database optimized for fast recording; good querying capabilities; an embedded database; crash consistent (eventually)
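The "record now, consume later" idea can be sketched with Python's built-in sqlite3 module, itself an embedded database. This illustration rests on assumptions: the io_log table, its columns, and the HeatRecorder class are hypothetical, and gfid stands in for Gluster's per-file identifier. The IO path only appends rows, and aggregation waits until a consumer asks.

```python
import sqlite3
import time


class HeatRecorder:
    """Hypothetical 'record now, consume later' store on embedded SQLite.
    The IO path only appends rows; aggregation is deferred to query time."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS io_log ("
            " gfid TEXT NOT NULL, op TEXT NOT NULL, ts REAL NOT NULL)"
        )

    def record(self, gfid, op, ts=None):
        # Record now: a single cheap INSERT, no read-modify-write.
        self.conn.execute(
            "INSERT INTO io_log (gfid, op, ts) VALUES (?, ?, ?)",
            (gfid, op, time.time() if ts is None else ts),
        )

    def commit(self):
        self.conn.commit()

    def hottest(self, since, limit=10):
        # Consume later: aggregate the log only when a scanner asks.
        rows = self.conn.execute(
            "SELECT gfid, COUNT(*) AS hits FROM io_log"
            " WHERE ts >= ? GROUP BY gfid ORDER BY hits DESC, gfid LIMIT ?",
            (since, limit),
        )
        return rows.fetchall()
```

Append-only inserts keep the IO path free of lookups; the price is paid at query time, and the log grows until something prunes it.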
- 10. LibgfDB: API abstraction; rich search filters; performance optimization options
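To show what an API abstraction with search filters buys, here is a hypothetical Python interface. DataStore, QueryFilter, FileRecord, and DictDataStore are invented names and are not libgfdb's actual C API. Callers such as scanners depend only on the abstract methods, so the backend can be swapped without touching them.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FileRecord:
    gfid: str
    reads: int = 0
    writes: int = 0
    last_access: float = 0.0


@dataclass
class QueryFilter:
    # Hypothetical filter fields; None means "don't filter on this".
    min_reads: Optional[int] = None
    min_writes: Optional[int] = None
    accessed_since: Optional[float] = None
    accessed_before: Optional[float] = None

    def matches(self, rec: FileRecord) -> bool:
        if self.min_reads is not None and rec.reads < self.min_reads:
            return False
        if self.min_writes is not None and rec.writes < self.min_writes:
            return False
        if self.accessed_since is not None and rec.last_access < self.accessed_since:
            return False
        if self.accessed_before is not None and rec.last_access >= self.accessed_before:
            return False
        return True


class DataStore(ABC):
    """Callers program against this interface; the backend is pluggable."""

    @abstractmethod
    def record_io(self, gfid: str, is_write: bool, ts: float) -> None: ...

    @abstractmethod
    def query(self, flt: QueryFilter) -> List[str]: ...


class DictDataStore(DataStore):
    """Toy in-memory backend; a SQL backend would turn the filter into WHERE."""

    def __init__(self) -> None:
        self._recs: Dict[str, FileRecord] = {}

    def record_io(self, gfid: str, is_write: bool, ts: float) -> None:
        rec = self._recs.setdefault(gfid, FileRecord(gfid))
        if is_write:
            rec.writes += 1
        else:
            rec.reads += 1
        rec.last_access = max(rec.last_access, ts)

    def query(self, flt: QueryFilter) -> List[str]:
        return sorted(g for g, r in self._recs.items() if flt.matches(r))
```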
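- 11. Gluster brick data maintenance (diagram): IO from the Gluster client reaches the brick and passes through the CTR xlator to the POSIX xlator; the CTR xlator issues inserts/updates to the LIBGFDB datastore, and data maintenance scanners read it back through LIBGFDB queries.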
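The brick-side flow in the diagram can be mimicked with two Python classes standing in for xlators. This is a sketch under assumptions: PosixLayer, CtrLayer, scanner_query, and the heat table are hypothetical, and real xlators are C modules loaded in the brick process. The point is the division of labor: the CTR-like layer updates the datastore on the IO path, while scanners query the datastore instead of crawling the brick.

```python
import sqlite3
import time


class PosixLayer:
    """Stand-in for the POSIX xlator: performs the actual file write."""

    def writev(self, path, data):
        with open(path, "ab") as f:
            return f.write(data)


class CtrLayer:
    """Stand-in for the CTR xlator: passes the call down, then records heat."""

    def __init__(self, child, db):
        self.child = child
        self.db = db
        db.execute(
            "CREATE TABLE IF NOT EXISTS heat ("
            " path TEXT PRIMARY KEY, writes INTEGER NOT NULL, wtime REAL NOT NULL)"
        )

    def writev(self, path, data):
        result = self.child.writev(path, data)  # do the real IO first
        # Insert-or-update the heat record for this file.
        self.db.execute(
            "INSERT OR IGNORE INTO heat (path, writes, wtime) VALUES (?, 0, ?)",
            (path, time.time()),
        )
        self.db.execute(
            "UPDATE heat SET writes = writes + 1, wtime = ? WHERE path = ?",
            (time.time(), path),
        )
        self.db.commit()
        return result


def scanner_query(db, min_writes):
    """Data maintenance scanner: reads heat from the DB, not the file system."""
    rows = db.execute(
        "SELECT path FROM heat WHERE writes >= ? ORDER BY path", (min_writes,)
    )
    return [r[0] for r in rows]
```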
- 12. Datastore optimization (SQLite3): PRAGMA page_size: align the page size; PRAGMA cache_size: increase the cache size; PRAGMA journal_mode: change to WAL; PRAGMA wal_autocheckpoint: checkpoint less often; PRAGMA synchronous: set to NORMAL; PRAGMA auto_vacuum: set to NONE
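The PRAGMA list above maps directly onto SQLite settings. Below is a minimal sketch using Python's sqlite3 module. The specific numbers (4096-byte pages, a cache of roughly 64 MB, an autocheckpoint every 10000 pages) are illustrative assumptions, not values taken from GlusterFS.

```python
import sqlite3


def open_tuned_db(path, wal_autocheckpoint=10000, cache_pages=-64000):
    """Open an SQLite datastore with write-oriented settings.
    The numeric values are illustrative assumptions."""
    conn = sqlite3.connect(path)
    # page_size and auto_vacuum only take effect on a fresh, empty database,
    # so set them before switching to WAL or creating any table.
    conn.execute("PRAGMA page_size = 4096")    # assumed to match FS block size
    conn.execute("PRAGMA auto_vacuum = NONE")  # no vacuum work on deletes
    mode = conn.execute("PRAGMA journal_mode = WAL").fetchone()[0]
    if mode != "wal":
        raise RuntimeError(f"WAL not available, got journal_mode={mode}")
    # In WAL mode, NORMAL syncs around checkpoints rather than on every commit.
    conn.execute("PRAGMA synchronous = NORMAL")
    # A negative cache_size is a budget in KiB rather than a page count.
    conn.execute(f"PRAGMA cache_size = {int(cache_pages)}")
    # Let the WAL grow to this many pages before an automatic checkpoint.
    conn.execute(f"PRAGMA wal_autocheckpoint = {int(wal_autocheckpoint)}")
    return conn
```

With synchronous=NORMAL in WAL mode, a power loss can roll back the most recent commits but does not corrupt the database, which matches the "crash consistent (eventually)" goal. WAL cannot be enabled on an in-memory database, hence the check on the returned mode.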
- 13. Datastore optimization (SQLite3), diagram: Insert/Update goes to the buffer cache; sync appends it to the write-ahead log (WAL), which is indexed through a shared-memory file; checkpoint copies the WAL contents into the database file.
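The write path in the diagram can be observed directly. This sketch, in which the function name and table are hypothetical, disables automatic checkpoints, commits a batch of inserts, confirms that the -wal and -shm files exist, and then runs a TRUNCATE checkpoint. That checkpoint copies the logged pages into the main database file and empties the WAL.

```python
import os
import sqlite3


def wal_lifecycle_demo(path, rows=100):
    """Write rows in WAL mode, then checkpoint them into the main DB file.
    Returns (wal_bytes_before, wal_bytes_after, busy_flag, shm_existed)."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode = WAL")
    conn.execute("PRAGMA wal_autocheckpoint = 0")  # disable automatic checkpoints
    conn.execute("CREATE TABLE t (k INTEGER PRIMARY KEY, v TEXT)")
    with conn:  # one transaction; on commit its pages are appended to the WAL
        conn.executemany(
            "INSERT INTO t (k, v) VALUES (?, ?)",
            ((i, "x" * 100) for i in range(rows)),
        )
    # Readers locate pages in the WAL through the -shm (shared-memory) index.
    shm_existed = os.path.exists(path + "-shm")
    wal_before = os.path.getsize(path + "-wal")
    # Checkpoint: copy WAL pages into the DB file, then truncate the WAL.
    busy, _log_pages, _ckpt_pages = conn.execute(
        "PRAGMA wal_checkpoint(TRUNCATE)"
    ).fetchone()
    wal_after = os.path.getsize(path + "-wal")
    conn.close()
    return wal_before, wal_after, busy, shm_existed
```

Checkpointing less often, as on the PRAGMA slide, means fewer of these copy steps at the cost of a larger WAL between them.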
- 14. Cache tiering (Gluster 3.7 feature). Tiering: a logical volume composed of diverse storage units, e.g. secure / nonsecure or compressed / uncompressed. Cache tiering: fast storage acts as a cache for slow storage, e.g. fast SSD over slow HDD, or a fast 2x-replicated tier over a slow erasure-coded tier. What goes in the cache? The DB tracks usage patterns, and files migrate between tiers according to usage. Migration is slow.
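Deciding what goes in the cache from tracked usage can be sketched as a pure function over heat records. The Heat record, the thresholds, and the promote/demote rules below are hypothetical; the slide only says that the DB tracks usage patterns and that files migrate according to usage.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Heat:
    path: str
    tier: str           # "hot" or "cold": where the file currently lives
    hits: int           # reads + writes seen in the current window
    last_access: float  # seconds since the epoch


def plan_migrations(records: List[Heat], now: float,
                    promote_hits: int = 3,
                    demote_idle: float = 3600.0) -> Tuple[List[str], List[str]]:
    """Return (promote, demote) path lists.

    Hypothetical policy: a cold-tier file with at least promote_hits
    accesses in the window is promoted; a hot-tier file idle for longer
    than demote_idle seconds is demoted."""
    promote = sorted(r.path for r in records
                     if r.tier == "cold" and r.hits >= promote_hits)
    demote = sorted(r.path for r in records
                    if r.tier == "hot" and now - r.last_access > demote_idle)
    return promote, demote
```

Because migration is slow, thresholds like these are usually set conservatively so that files do not bounce between tiers.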
- 15. Policies for smart migration: file size; access rate; migration frequency; breaking files into chunks (the Gluster sharding feature)
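The policy knobs listed on the slide can be combined in one filter. Everything here is hypothetical: the Candidate fields, the rate unit (accesses per second), the per-cycle cap, and the shard_count helper. The helper only illustrates why sharding helps: a large file becomes many shard-sized pieces, each small enough to pass a size limit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    path: str
    size: int        # bytes
    accesses: int    # accesses observed in the last window
    window_s: float  # length of that window in seconds


def select_for_promotion(cands: List[Candidate], max_size: int,
                         min_rate: float, max_per_cycle: int) -> List[str]:
    """Skip files too large to move cheaply, require a minimum access
    rate, and cap how many files one migration cycle may move."""
    eligible = [
        c for c in cands
        if c.size <= max_size and c.window_s > 0
        and c.accesses / c.window_s >= min_rate
    ]
    # Hottest first, so the per-cycle budget goes to the busiest files.
    eligible.sort(key=lambda c: (-c.accesses / c.window_s, c.path))
    return [c.path for c in eligible[:max_per_cycle]]


def shard_count(size: int, shard_block: int) -> int:
    """Number of shard-sized pieces (ceiling division, at least one)."""
    return max(1, -(-size // shard_block))
```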
- 16. Tier architecture (diagram): on the client side, the Tier xlator sits over a HOT DHT and a COLD DHT, with the replication xlator and other client xlators below. On the server side, the HOT tier and COLD tier bricks each have a POSIX xlator, a CTR xlator, other server xlators, brick storage, and a heat data store. Files move between tiers by promotion (cold to hot) and demotion (hot to cold).
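Promotion and demotion are data moves between the two tiers, which is why migration is slow. The sketch below uses two local directories as stand-ins for the hot and cold tiers and a dict as the heat data store. The migrate function is hypothetical and is not how Gluster moves data internally. It copies first, publishes atomically, and removes the source last, so an interrupted move always leaves at least one complete copy.

```python
import os
import shutil


def migrate(name, src_tier_dir, dst_tier_dir, heat_store, dst_label):
    """Move one file between two directories standing in for the tiers,
    then record its new location in a dict standing in for the heat store.
    Promotion is cold -> hot; demotion is hot -> cold."""
    src = os.path.join(src_tier_dir, name)
    dst = os.path.join(dst_tier_dir, name)
    tmp = dst + ".migrating"
    shutil.copy2(src, tmp)  # copy data and metadata to the destination tier
    os.replace(tmp, dst)    # atomically publish it under its final name
    os.remove(src)          # only then drop the source copy
    heat_store[name] = dst_label
    return dst
```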
- 17. Lessons learned: DB updates can be expensive; DB queries may have scalability problems.
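Because DB updates on the IO path are expensive, one common mitigation is to coalesce them in memory and flush them in batches. This sketch assumes that an in-memory transaction log such as the iMeTaL option under "What's next" works roughly this way; the slides do not describe its design, and BufferedHeatLog is a hypothetical name.

```python
import sqlite3
from collections import Counter


class BufferedHeatLog:
    """Hypothetical write buffer: IO-path updates only touch an in-memory
    Counter; flush() folds them into SQLite in a single transaction, so
    N updates to the same file cost one row update instead of N."""

    def __init__(self, conn, flush_every=1000):
        self.conn = conn
        self.flush_every = flush_every
        self.pending = Counter()
        self.buffered = 0
        conn.execute(
            "CREATE TABLE IF NOT EXISTS heat ("
            " gfid TEXT PRIMARY KEY, hits INTEGER NOT NULL)"
        )

    def record(self, gfid):
        self.pending[gfid] += 1
        self.buffered += 1
        if self.buffered >= self.flush_every:
            self.flush()

    def flush(self):
        with self.conn:  # one transaction for the whole batch
            for gfid, n in self.pending.items():
                self.conn.execute(
                    "INSERT OR IGNORE INTO heat (gfid, hits) VALUES (?, 0)",
                    (gfid,))
                self.conn.execute(
                    "UPDATE heat SET hits = hits + ? WHERE gfid = ?",
                    (n, gfid))
        self.pending.clear()
        self.buffered = 0
```

The trade-off is that buffered updates are lost if the process dies before flush(), which is acceptable only when the store needs to be crash consistent eventually rather than exact.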
- 18. What's next: Libgfdb performance options: iMeTaL (in-memory transaction log), PeTaL (persistent transaction log), SQLite3 database sharding. Ceph tier implementation: Bloom filters.
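Ceph's cache tiering tracks recent object accesses in hit sets, and a hit set can be a Bloom filter (the `hit_set_type bloom` pool option). Here is a minimal Bloom filter in Python; m_bits, k, the SHA-256 double-hashing scheme, and the recency_count helper are illustrative choices, not Ceph's implementation.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k bit positions per key over an m-bit array.
    False positives are possible; false negatives are not."""

    def __init__(self, m_bits=8192, k=4):
        self.m = m_bits
        self.k = k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key):
        # Derive k positions from one SHA-256 digest (double hashing).
        d = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(d[:8], "little")
        h2 = int.from_bytes(d[8:16], "little") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def recency_count(key, filters):
    """How many of the recent per-interval filters report seeing this key."""
    return sum(key in f for f in filters)
```

Keeping one filter per time interval and counting how many recent intervals saw a key gives an approximate access recency in fixed memory, at the price of occasional false positives.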
- 19. Feature page: http://www.gluster.org/community/documentation/index.php/Features/
Gluster GitHub: https://github.com/gluster/glusterfs
Email: Joseph Fernandes, Dan Lambright
- 20. Thank you