How Do We Do Ceph @ CSC
#whoami
Karan Singh
System Specialist, Cloud Storage
CSC - IT Center for Science, Finland
• Author of Learning Ceph – Packt Publishing, 2015
• Author of Ceph Cookbook – Packt Publishing, 2016
• Technical reviewer for Mastering Ceph – Packt Publishing, 2016
• www.ksingh.co.in – tune in for my blog
CSC-IT Center For Science
• Founded in 1971
• Finnish non-profit organization, funded by the Ministry of Education
• Connected Finland to the Internet in 1988
• Most powerful academic computing facility in the Nordics
• ISO 27001:2013 certified
• Public cloud offering: Pouta Cloud Services
• More information
  o https://www.csc.fi/
  o https://research.csc.fi/cloud-computing
CSC Cloud Offering
• Pouta Cloud Service [IaaS]
  o cPouta – public cloud, general purpose
  o ePouta – public cloud, purpose-built for sensitive data
• Built using OpenStack
• Uses upstream OpenStack packages, no distribution
• Storage: both Ceph and non-Ceph
Our Need for Ceph
• To build our own storage – not to buy a black box
• Software-defined, using commodity hardware
• Unified – block, object, (file)
• Tightly integrates with OpenStack
• Open source, no vendor lock-in
• Scalable and highly available
Our Need for Ceph
• Remove the SPOF for storage in OpenStack
• OpenStack alone is too complex – let's make it a bit less so
  o By using Ceph for storage needs
• To stay up to date with the community
  o Ceph is the most used storage backend for OpenStack
• Need for object storage
Storage Complexity
[Diagram: today's storage silos – an enterprise array exposing LUNs behind Gateway-1 and Gateway-2 as storage for Cinder, local disks on the compute nodes as storage for Nova instances, and NFS as storage for Glance, spread across the OpenStack compute and controller nodes.]
This is why we chose Ceph
• One storage to rule them all
• Goes hand in hand with OpenStack
• Supports instance live migration, CoW (copy-on-write) clones (sketched below)
• Bonus for using Ceph
  o OpenStack Manila (shared filesystem) – on the way
http://www.slideshare.net/ircolle/what-is-a-ceph-and-why-do-i-care-openstack-storage-colorado-openstack-meetup-october-14-2014
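To make the copy-on-write point concrete: Glance, Cinder and Nova drive this kind of RBD snapshot/clone workflow through librbd, so new volumes and instance disks are near-instant clones instead of full copies. A minimal sketch with made-up pool and image names:

  # Hypothetical pool/image names; the OpenStack drivers issue the equivalent librbd calls
  rbd snap create images/base-image@golden
  rbd snap protect images/base-image@golden
  rbd clone images/base-image@golden vms/instance-0001-disk
  rbd flatten vms/instance-0001-disk   # optional: break the CoW dependency later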
Ceph Infrastructure
Production Cluster
• 10 x HP DL380
  o E5-2450, 8 cores, 2.10 GHz
  o 24 GB memory
  o 12 x 3 TB SATA
  o 2 x 40GbE
• Ceph Firefly 0.80.8
• CentOS 6.6 (kernel 3.10.69)
• 360 TB raw

Test Cluster
• 5 x HP DL380
  o E5-2450, 8 cores, 2.10 GHz
  o 24 GB memory
  o 12 x 3 TB SATA
  o 2 x 40GbE
• Ceph Hammer 0.94.3
• CentOS 6.6 (kernel 3.10.69)
• 180 TB raw

Development Cluster
• 4 x HP SL4540
  o 2 x E5-2470, 8 cores, 2.30 GHz
  o 192 GB memory
  o 60 x 4 TB SATA
  o 2 x 10GbE
• Ceph Hammer 0.94.3
• CentOS 6.6 (kernel 3.10.69)
• 960 TB raw
ePouta Cloud Service
Ceph Infrastructure (cont.)
Pre-Production Cluster
• 4 x HP SL4540
  o 2 x E5-2470, 8 cores, 2.30 GHz
  o 192 GB memory
  o 60 x 4 TB SATA
  o 2 x 10GbE
• Object storage service
• Ceph Firefly 0.80.10
• CentOS 6.5 (kernel 2.6.32)
• 240 OSDs / 870 TB available
cPouta Cloud Service
Fujitsu Eternus CD10000
• 4 x Primergy RX300 S8
  o 2 x E5-2640, 8 cores, 2.00 GHz
  o 128 GB memory
  o 1 x 10GbE / 1 x 40GbE
  o 15 x 900 GB SAS 2.5" 10K
  o 1 x 800 GB Fusion ioDrive2 PCIe SSD
• 4 x Eternus JX40 JBOD
  o 24 x 900 GB SAS 2.5" 10K
• Ceph Firefly 0.80.7
• CentOS 6.6 (kernel 3.10.42)
• 156 OSDs / 126 TB available
Proof of Concept
Our toolkit for Ceph
• OS deployment, package management
  o Spacewalk
• Ansible
  o End-to-end system configuration
  o Network, kernel, packages, OS tuning, NTP
  o Metric collection, monitoring, central logging, etc.
  o Entire Ceph deployment
  o System / Ceph administration
• Performance metrics & dashboards
  o collectd, Graphite, Grafana
• Monitoring and log management
  o OpsView, ELK stack
• Version control
  o Git, GitHub
Live Demo
Near Future
• CSC Espoo DC [ePouta Cloud Storage]
  o Next 8-12 months -> 3 PB raw
  o Introduce a storage POD layout for scalability & a better failure domain
  o Dedicated monitor node
  o SSD journals
  o Erasure coding (see the sketch below)
• CSC Kajaani DC [cPouta Cloud Storage]
  o Early next year -> add ~850 TB of new capacity (total ~1.8 PB raw)
  o Enable full OpenStack support (Nova, Glance, Cinder, Swift)
  o Erasure coding
• Miscellaneous
  o Multi-DC replication [Espoo – Kajaani]
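As a rough illustration of the erasure-coding item above (not our final profile; the k/m values, failure domain and PG count are placeholders), setting up an EC pool on a Firefly/Hammer cluster looks roughly like this:

  # Define an EC profile: 4 data + 2 coding chunks, one chunk per rack
  ceph osd erasure-code-profile set ec-4-2 k=4 m=2 ruleset-failure-domain=rack
  # Create an erasure-coded pool that uses the profile (PG count is illustrative)
  ceph osd pool create ecpool 1024 1024 erasure ec-4-2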
Long Term
Build a Ceph environment that is:
• Multi-petabyte (~10 PB usable)
• Hyper-scalable
• Multi-rack fault tolerant
Storage PODs:
• Currently a design on paper
• Still thinking about the best approach
• Interested to know what others are doing
Disks, Nodes, Racks
[Diagram: disks grouped into a storage node, storage nodes grouped into racks.]
More racks ... hyper scale
[Diagram: a single Ceph cluster spanning many racks – how to manage it effectively?]
Storage POD
• A storage POD is a group of racks
• Ease of management in a hyper-scale environment
• Scalable, modular design
• Can sustain multi-rack failure
• CRUSH failure-domain changes required (sketched below)
• Primary copy -> one POD
• Secondary & tertiary copies -> the other two PODs
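The CRUSH change could look roughly like the following: introduce a 'pod' bucket type between rack and root, group racks under POD buckets, and add a rule that puts each of the three replicas in a different POD. This is a sketch of the idea, not our deployed map:

  # Export and decompile the current CRUSH map
  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt
  # In crushmap.txt (sketch): add a 'pod' bucket type, move racks under pod-1/pod-2/pod-3,
  # and add a replicated rule along the lines of:
  #   step take default
  #   step choose firstn 0 type pod
  #   step chooseleaf firstn 1 type host
  #   step emit
  crushtool -c crushmap.txt -o crushmap.new
  ceph osd setcrushmap -i crushmap.new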
Storage POD in action
[Diagram: one Ceph cluster spanning three storage PODs (POD-1, POD-2, POD-3), each built from racks.]
Scaling up Multi Rack
[Diagram: the same cluster scaling up – more racks added within POD-1, POD-2 and POD-3.]
Scaling up…even more racks
[Diagram: scaling further – even more racks per POD, still a single Ceph cluster.]
Scaling up…several PODs
[Diagram: the cluster scaled out across several PODs.]
Some Recommendations
• Monitor nodes
  o Use dedicated monitor nodes; avoid sharing them with OSDs
  o Use SSDs for the Ceph monitor LevelDB
• OSD nodes
  o Avoid overloading your SSD journals, or you might not get what you expect (see the sketch below)
  o Node preference:
    #1 Thin node (10-16 disks)
    #2 Thick node (16-30 disks)
    #3 Fat node (> 30 disks)
  o If using fat nodes, use several of them
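On the journal point: each SSD can only absorb the combined write stream of a few OSD journals before it becomes the bottleneck. A hedged example of preparing one OSD with its journal on a separate SSD (we drive this through Ansible in practice, and the device names below are made up):

  # Data on a SATA disk, journal on an SSD shared by only a handful of OSDs
  ceph-disk prepare /dev/sdb /dev/sdc     # /dev/sdc = SSD; ceph-disk carves a journal partition on it
  ceph-disk activate /dev/sdb1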
Operational Experience
• Use dedicated disks for the OS, OSD data & OSD journal (can be shared)
• Plan your requirements well and choose the PG count wisely for a production cluster
  o Increasing the PG count is one of the most intensive operations
  o Decreasing the PG count is not allowed
• Ceph version upgrades / rolling upgrades work like a charm
• For thick and fat OSD nodes, tune the kernel (see the sketch below)
  o kernel.pid_max = 4194303
  o kernel.threads-max = 200000
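The kernel settings above can be applied as shown here (and persisted in /etc/sysctl.conf so they survive a reboot); the usual community rule of thumb for the PG count is included as a comment with illustrative numbers:

  # More threads/PIDs for nodes carrying many OSD daemons
  sysctl -w kernel.pid_max=4194303
  sysctl -w kernel.threads-max=200000

  # PG rule of thumb: ~ (number of OSDs * 100) / replica count, rounded up to a power of two
  # e.g. 240 OSDs with 3 replicas -> 240 * 100 / 3 = 8000 -> 8192 PGs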
Operational Experience
• If you are seeing blocked ops / slow OSDs / slow requests, don't worry, you are not alone (triage sketched below)
  o ceph health detail -> find the OSD -> find the node -> check EVERYTHING on that node -> mark it out
  o If the problem is on most of the nodes -> check the NETWORK
    Interface errors, MTU, configuration, network blocking, architecture, switch logs, removing an interface, bonding.
    Even a cable change worked for us (we upgraded the switch firmware and the cable type became unsupported)
• Tune CRUSH to the optimal parameters
  o # ceph osd crush tunables optimal
  o Caution: this will trigger a lot of data movement
• Ceph recovery/backfilling can starve your clients of IO; you may want to throttle it:
ceph tell osd.\* injectargs '--osd_recovery_max_active 1 --osd_recovery_max_single_start 1 --osd_recovery_op_priority 50 --osd_recovery_max_chunk 1048576 --osd_recovery_threads 1 --osd_max_backfills 1 --osd_backfill_scan_min 4 --osd_backfill_scan_max 8'
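A hedged sketch of the blocked-ops triage loop from the first bullet above (the OSD id is only an example):

  # Which requests are slow/blocked, and which OSDs are implicated?
  ceph health detail | grep -i 'slow\|blocked'
  # Map an offending OSD id to its host and CRUSH location
  ceph osd find 42          # example OSD id
  # After checking everything on that node, take the OSD out so data re-replicates
  ceph osd out 42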
[Screenshots: #1 – cluster health OK; #2]
Operational Experience
• Increasing the filestore max_sync and min_sync values helped to a certain extent (see the sketch below)
  o filestore_max_sync_interval = 140
  o filestore_min_sync_interval = 100
• A firmware upgrade on the network switches, together with replacing the physical network cables, fixed the issue.
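For reference, a sketch of where those filestore values can live: they can go in the [osd] section of ceph.conf (applied on restart) or be injected into running OSDs, mirroring the injectargs pattern shown earlier.

  # ceph.conf, [osd] section:
  #   filestore_max_sync_interval = 140
  #   filestore_min_sync_interval = 100
  # or at runtime:
  ceph tell osd.\* injectargs '--filestore_max_sync_interval 140 --filestore_min_sync_interval 100'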
Advice: Always check your network TWICE!!!
THANK YOU