MySQL on Ceph
Transcript of MySQL on Ceph
MySQL and Ceph
2:20pm – 3:10pm, Room 203

MySQL in the Cloud: Head-to-Head Performance Lab
1:20pm – 2:10pm, Room 203
WHOIS
Brent Compton and Kyle Bader, Storage Solution Architectures, Red Hat
Yves Trudeau, Principal Architect, Percona
AGENDA
MySQL on Ceph
• Why MySQL on Ceph
• Ceph Architecture
• Tuning: MySQL on Ceph
• HW Architectural Considerations

MySQL in the Cloud: Head-to-Head Performance Lab
• MySQL on Ceph vs. AWS
• Head-to-head: Performance
• Head-to-head: Price/performance
• IOPS performance nodes for Ceph
Why MySQL on Ceph
WHY MYSQL ON CEPH? MARKET DRIVERS
• Ceph: #1 block storage for OpenStack clouds
• MySQL: #4 workload on OpenStack (#1-3 often use databases too!)
• 70% of apps on OpenStack use LAMP
• Ceph: leading open-source SDS
• MySQL: leading open-source RDBMS
WHY MYSQL ON CEPH? OPS EFFICIENCY
• Shared, elastic storage pool
• Dynamic DB placement
• Flexible volume resizing
• Live instance migration
• Backup to object pool
• Read replicas via copy-on-write snapshots
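The read-replicas-via-copy-on-write idea above maps to a short RBD workflow; the pool and image names (`mysql`, `master-data`, `replica1-data`) are illustrative:

```shell
# Snapshot the primary's data volume
rbd snap create mysql/master-data@replica-base

# A snapshot must be protected before it can be cloned
rbd snap protect mysql/master-data@replica-base

# The copy-on-write clone becomes the read replica's volume;
# it shares unmodified blocks with the parent snapshot
rbd clone mysql/master-data@replica-base mysql/replica1-data
```

Because the clone only stores blocks that diverge from the snapshot, spinning up a replica is nearly instant and initially consumes almost no extra space.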
WHY MYSQL ON CEPH? PUBLIC CLOUD FIDELITY
• Hybrid Cloud requires familiar platforms
• Developers want platform consistency
• Block storage, like the big kids
• Object storage, like the big kids
• Your hardware, datacenter, staff
WHY MYSQL ON CEPH? HYBRID CLOUD REQUIRES HIGH IOPS
Ceph provides:
• Spinning Block – General Purpose
• Object Storage – Capacity
• SSD Block – High IOPS
CEPH ARCHITECTURE
ARCHITECTURAL COMPONENTS

RADOS: A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors
LIBRADOS: A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
RGW: A web services gateway for object storage, compatible with S3 and Swift
RBD: A reliable, fully-distributed block device with cloud platform integration
CEPHFS: A distributed file system with POSIX semantics and scale-out metadata
(diagram: APP consumes RGW/LIBRADOS, HOST/VM consumes RBD, CLIENT consumes CEPHFS)
CEPH OSD

RADOS CLUSTER
RADOS COMPONENTS
OSDs
• 10s to 10000s in a cluster
• Typically one per disk
• Serve stored objects to clients
• Intelligently peer for replication & recovery
Monitors
• Maintain cluster membership and state
• Provide consensus for distributed decision-making
• Deployed as a small, odd number
• Do not serve stored objects to clients
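The "small, odd number" guidance is plain majority arithmetic, not Ceph code: consensus needs more than half of the monitors alive, so an even count adds cost without adding failure tolerance. A minimal sketch:

```python
def quorum_size(monitors: int) -> int:
    """Smallest majority of the monitor set (consensus needs > half alive)."""
    return monitors // 2 + 1

def failures_tolerated(monitors: int) -> int:
    """How many monitors can fail while a majority still survives."""
    return monitors - quorum_size(monitors)

# 3 and 4 monitors both tolerate only one failure; 5 tolerates two.
for n in (1, 2, 3, 4, 5):
    print(f"{n} monitors: quorum={quorum_size(n)}, tolerates {failures_tolerated(n)} failure(s)")
```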
WHERE DO OBJECTS LIVE?

A METADATA SERVER?
CALCULATED PLACEMENT
EVEN BETTER: CRUSH
(diagram: objects map to PLACEMENT GROUPS (PGs), which map onto the CLUSTER)
CRUSH IS A QUICK CALCULATION
DYNAMIC DATA PLACEMENT
CRUSH:
• Pseudo-random placement algorithm
  • Fast calculation, no lookup
  • Repeatable, deterministic
• Statistically uniform distribution
• Stable mapping
  • Limited data migration on change
• Rule-based configuration
  • Infrastructure topology aware
  • Adjustable replication
  • Weighting
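The properties above can be demonstrated with a deliberately simplified stand-in for CRUSH: hash the object name to a PG, then pick OSDs by rendezvous (highest-random-weight) scoring. This is not the real CRUSH algorithm (no topology rules or weights), but it shows the same deterministic, lookup-free, limited-migration behavior; all names are illustrative:

```python
import hashlib

def pg_for_object(name: str, num_pgs: int) -> int:
    """Hash an object name to a placement group (PG): fast, no lookup table."""
    h = int.from_bytes(hashlib.md5(name.encode()).digest()[:8], "big")
    return h % num_pgs

def osds_for_pg(pg, osds, replicas=3):
    """Rendezvous-style selection: every OSD gets a pseudo-random but
    repeatable score for this PG, and the top `replicas` scores win.
    Removing one OSD only remaps the PGs that OSD was actually serving
    (limited data migration on change)."""
    ranked = sorted(osds,
                    key=lambda o: hashlib.md5(f"{pg}:{o}".encode()).digest(),
                    reverse=True)
    return ranked[:replicas]

osds = ["osd.0", "osd.1", "osd.2", "osd.3", "osd.4"]
pg = pg_for_object("rbd_data.volume1.chunk0001", 128)
print(pg, osds_for_pg(pg, osds))  # same answer on every client, every time
```

Any client with the OSD list computes the same placement independently, which is why no metadata server sits on the data path.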
DATA IS ORGANIZED INTO POOLS
(diagram: the CLUSTER is carved into POOLS containing PGs: POOL A, POOL B, POOL C, POOL D)
ACCESS METHODS
ACCESSING A RADOS CLUSTER
(diagram: the application links LIBRADOS, which talks to the RADOS CLUSTER over a native socket)
RADOS ACCESS FOR APPLICATIONS
LIBRADOS
• Direct access to RADOS for applications
• C, C++, Python, PHP, Java, Erlang
• Direct access to storage nodes
• No HTTP overhead
![Page 27: My SQL on Ceph](https://reader033.fdocuments.us/reader033/viewer/2022061307/588267021a28ab470c8b4a77/html5/thumbnails/27.jpg)
STORING VIRTUAL DISKS
RADOS CLUSTER
![Page 28: My SQL on Ceph](https://reader033.fdocuments.us/reader033/viewer/2022061307/588267021a28ab470c8b4a77/html5/thumbnails/28.jpg)
STORING VIRTUAL DISKS
RADOS CLUSTER
![Page 29: My SQL on Ceph](https://reader033.fdocuments.us/reader033/viewer/2022061307/588267021a28ab470c8b4a77/html5/thumbnails/29.jpg)
STORING VIRTUAL DISKS
RADOS CLUSTER
![Page 30: My SQL on Ceph](https://reader033.fdocuments.us/reader033/viewer/2022061307/588267021a28ab470c8b4a77/html5/thumbnails/30.jpg)
PERCONA ON KRBD
RADOS CLUSTER
![Page 31: My SQL on Ceph](https://reader033.fdocuments.us/reader033/viewer/2022061307/588267021a28ab470c8b4a77/html5/thumbnails/31.jpg)
TUNING MYSQL ON CEPH
TUNING FOR HARMONY: OVERVIEW

Tuning MySQL
• Buffer pool > 20%
• Flush each Tx or batch?
• Parallel doublewrite-buffer flush

Tuning Ceph
• RHCS 1.3.2, tcmalloc 2.4
• 128M thread cache
• Co-resident journals
• 2-4 OSDs per SSD
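A sketch of where those knobs live; values are illustrative (the pool size must be fitted to your dataset), and the tcmalloc setting follows the RHCS-era sysconfig convention from the slide:

```ini
# my.cnf (illustrative sizes)
[mysqld]
innodb_buffer_pool_size = 8G        # target > 20% of the active dataset
innodb_flush_log_at_trx_commit = 1  # 1 = flush every Tx; 2 defers fsync to ~1s batches

# /etc/sysconfig/ceph (tcmalloc 2.4 thread cache, 128M per the slide)
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728
```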
TUNING FOR HARMONY: SAMPLE EFFECT OF MYSQL BUFFER POOL ON TpmC
(chart: TpmC vs. time in seconds, one data point per minute; 64x MySQL instances on a Ceph cluster, each with 25x TPC-C warehouses; series for 1%, 5%, 25%, 50%, and 75% buffer pool)
TUNING FOR HARMONY: SAMPLE EFFECT OF MYSQL Tx FLUSH ON TpmC
(chart: TpmC vs. time in seconds, one data point per minute; 64x MySQL instances on a Ceph cluster, each with 25x TPC-C warehouses; series for batch Tx flush (1 sec) vs. per-Tx flush)
TUNING FOR HARMONY: SAMPLE EFFECT OF CEPH TCMALLOC VERSION ON TpmC
(chart: TpmC vs. time in seconds, one data point per minute; 64x MySQL instances on a Ceph cluster, each with 25x TPC-C warehouses; series for per-Tx flush vs. per-Tx flush with tcmalloc v2.4)
TUNING FOR HARMONY: CREATING A SEPARATE POOL TO SERVE IOPS WORKLOADS
Creating multiple pools in the CRUSH map
• Distinct branch in OSD tree
• Edit CRUSH map, add SSD rules
• Create pool, set crush_ruleset to SSD rule
• Add Volume Type to Cinder
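On a pre-Luminous (RHCS 1.3-era) cluster, those steps look roughly like the following; the pool name, ruleset id, and volume-type names are illustrative, and the CRUSH-map edit itself is elided:

```shell
# 1. Confirm the SSD hosts form their own branch of the OSD tree
ceph osd tree

# 2. (Edit the CRUSH map to add an ssd root and rule, then re-inject it)

# 3. Create the pool and point it at the SSD rule
ceph osd pool create mysql-ssd 128 128
ceph osd pool set mysql-ssd crush_ruleset 1

# 4. Expose the pool to OpenStack as a Cinder volume type
cinder type-create high-iops
cinder type-key high-iops set volume_backend_name=ceph-ssd
```

Database volumes requested with the `high-iops` type then land only on SSD-backed OSDs, while capacity workloads keep using the default spinning pool.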
TUNING FOR HARMONY: IF YOU MUST USE MAGNETIC MEDIA
Reducing seeks on magnetic pools
• RBD cache is safe
• RAID Controllers with write-back cache
• SSD Journals
• Software caches
HW ARCHITECTURE CONSIDERATIONS
ARCHITECTURAL CONSIDERATIONS: UNDERSTANDING THE WORKLOAD
Traditional Ceph Workload
• $/GB
• PBs
• Unstructured data
• MB/sec
MySQL Ceph Workload
• $/IOP
• TBs
• Structured data
• IOPS
NEXT UP
MySQL in the Cloud: Head-to-Head Performance Lab
2:20pm – 3:10pm, Room 203