MyRocks in the Wild Wild West! - percona.com
© 2019 Percona 1
Alkin Tezuysal
MyRocks in the Wild Wild West! Alternate Storage Engine for MySQL
Sr. Technical Manager
FOSDEM – Feb 1st 2020
Who are we?
@ask_dba - Alkin Tezuysal
Born to Sail, Forced to Work
❖ Open Source Database Evangelist
❖ Global Database Operations Expert
❖ Storyteller
❖ Inspiring Technical and Strategic Leader
❖ Creative Team Builder
❖ Speaker, Mentor, and Coach
Agenda
• Intro and basics
• Advanced internals and limitations
• Benchmarks
• Tuning suggestions
• Conclusion
Overview of MyRocks
❖ What’s MyRocks?
  ● Storage engine for MySQL
  ● Based on RocksDB, a fork of LevelDB
  ● Persistent key-value store
  ● Implemented at Facebook and introduced in 2016
  ● Used by FB in production
  ● Was only available as source code at first
Overview of MyRocks
❖ What’s MyRocks?
  ● Percona Server:
    ○ Announced for Q1 2017
    ○ Fully supported: 5.7.20, 8.0
  ● MariaDB:
    ○ Plugin alpha since 10.2.5
    ○ Stable since 10.3.7 / 10.2.16
  ● Getting more mature
  ● Not widely used
Overview of MyRocks
❖ Based on LSM tree
❖ Optimized for writes
❖ Space-efficient
❖ Fast data load (with correct setup)
❖ Fast read-free replication
❖ No foreign keys, no serializable
❖ No FullText or Spatial keys
❖ MyRocks has TTL for data
LSM vs B-tree
Image credit (B+ tree, LSM): http://www.benstopford.com/2015/02/14/log-structured-merge-trees/
LSM vs B-tree
LSM: write-optimized | B-tree: read-optimized
Sequential writes first | In-place
Compaction in background | Live tree re-balancing
Fast access only to leaves in the fast levels (memory, L0) | Fast access to all leaves
InnoDB vs MyRocks
❖ MyRocks: better writes
❖ MyRocks: 2-5x smaller than InnoDB
❖ InnoDB supports FKs and Serializable
❖ InnoDB supports XA
❖ The two engines handle locking differently
InnoDB vs MyRocks
❖ InnoDB can be used with advanced replication: Galera, Percona XtraDB Cluster, Group Replication
❖ InnoDB supports STATEMENT and MIXED binlog format
❖ MyRocks doesn’t support transactions larger than available memory
Why use MyRocks engine?
❖ Large datasets
  ➢ Larger than memory available
    ■ 100G is not that large
  ➢ Multiple indexes
❖ Write-intensive load
❖ Mostly point selects* (it’s complicated)
❖ No FKs / Serializable / XA required
Why use MyRocks engine?
© Vadim Tkachenko “How to Rock with MyRocks”
Why use MyRocks engine?
❖ Costs
  ➢ Cloud costs specifically
  ➢ Good for Flash
  ➢ Resource utilization
https://www.percona.com/blog/2019/07/19/assessing-mysql-performance-amongst-aws-options-part-two/
Installation and Configuration
❖ Easily installed for Percona Server with percona-release.
# yum install Percona-Server-server-57.x86_64
# yum install Percona-Server-rocksdb-57.x86_64
# ps-admin --enable-rocksdb
mysql> SHOW ENGINES;
ROCKSDB | YES | RocksDB storage engine
mysql> CREATE TABLE test (id INT PRIMARY KEY) ENGINE=ROCKSDB;
Query OK, 0 rows affected (0.03 sec)
❖ No downtime required
Installation and Configuration
❖ Configuration options can be reviewed
mysql> SHOW VARIABLES LIKE 'rocksdb%';
rocksdb_block_cache_size: 536870912
rocksdb_default_cf_options: compression=kLZ4Compression;bottommost_compression=kLZ4Compression
❖ Percona Server 8.0 brings a lot of improvements to defaults
Installation and Configuration
❖ Some things are configurable per column family
CREATE TABLE t1 (a INT, b INT,
  PRIMARY KEY (a) COMMENT 'cfname=cf1',
  KEY kb (b) COMMENT 'cfname=cf2') ENGINE=ROCKSDB;
rocksdb_override_cf_options='cf1={compression=kNoCompression};cf2={compression=kZSTD}'
Differences between distributions
❖ Compression
  ➢ Facebook: none, depends on what you compile with
  ➢ Percona Server: Zlib, ZSTD, LZ4, LZ4HC
  ➢ MariaDB: Snappy, Zlib (+ LZ4, LZ4HC on Ubuntu)
❖ Data file location
  ➢ Facebook and Percona Server: $datadir/.rocksdb
  ➢ MariaDB: $datadir/#rocksdb
❖ Gap lock detection
  ➢ Percona Server and Facebook: yes (FB off by default)
  ➢ MariaDB: no
Advanced Internals and Limitations
❖ MemTable
❖ WAL (Write Ahead Log)
❖ Leveled LSM Structure
❖ Compaction
❖ Column Family
❖ … and more
MyRocks Engine Architecture
[Diagram labels: Memory, Persistent Storage, Write Request, WAL, Active MemTable, Switch, MemTable, Flush, SST Files, Compaction]
How does LSM handle writes?
[Diagram: INSERT INTO .. -> WAL/MemTable -> Sort -> New SST; Existing SSTs -> Merge & Compact -> New SST]
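The write path on this slide can be illustrated with a toy model. What follows is a minimal Python sketch, not MyRocks or RocksDB code: the class name TinyLSM, the entry-count flush threshold, and the single-step "merge everything" compaction are assumptions made purely for illustration.

```python
class TinyLSM:
    """Toy LSM store (hypothetical): WAL + MemTable in front of sorted, immutable SSTs."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit  # assumed flush threshold, counted in entries
        self.wal = []        # append-only log of (key, value) writes
        self.memtable = {}   # in-memory buffer holding the newest writes
        self.ssts = []       # newest first; each SST is a sorted list of (key, value)

    def put(self, key, value):
        self.wal.append((key, value))   # 1. sequential append to the WAL
        self.memtable[key] = value      # 2. update the active MemTable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # 3. sort the MemTable and write it out as a new immutable SST
        if not self.memtable:
            return
        self.ssts.insert(0, sorted(self.memtable.items()))
        self.memtable = {}
        self.wal = []  # flushed entries no longer need the WAL

    def compact(self):
        # 4. merge all SSTs into one, keeping only the newest value per key
        merged = {}
        for sst in reversed(self.ssts):  # oldest first, so newer values overwrite
            merged.update(sst)
        self.ssts = [sorted(merged.items())] if merged else []

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sst in self.ssts:  # newest first; linear scan keeps the sketch short
            for k, v in sst:
                if k == key:
                    return v
        return None


db = TinyLSM(memtable_limit=2)
for k, v in [(1, "a"), (2, "b"), (1, "c"), (3, "d")]:
    db.put(k, v)
runs_before = len(db.ssts)  # two flushes have produced two SSTs
db.compact()                # merge them into a single sorted run
print(runs_before, db.ssts)
```

Note that the overwrite of key 1 costs only an append and a MemTable update at write time; the stale value is discarded later, during compaction.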
MemTable(s)
❖ Store writes in MyRocks
  ➢ Associated with each column family
  ➢ Changes go to WAL
  ➢ Limited to 64 MB
Ref: https://blog.pythian.com/exposing-myrocks-internals-via-system-variables-part-1-data-writing/
WAL (Write Ahead Log)
❖ Immediate writes
❖ Acts as redo log
LSM Leveled Compaction
Ref: https://www.percona.com/live/17/sites/default/files/slides/MyRocks_Tutorial.pdf
Compaction
❖ LSM compaction on row level is better
  ➢ Aligned to OS sector (4 KB unit)
  ➢ Negligible OS page alignment overhead
❖ Percona Server uses LZ4 as default algorithm
  ➢ All levels compressed
  ➢ Zstd available
  ➢ Column families allow per table/index compression settings
Compression Results
Column Family
❖ Provides query atomicity between different key spaces.
  ➢ MemTables and SST files
  ➢ Shared transaction logs
❖ Index mapping is 1 to N
❖ MyRocks configuration parameters are per CF
❖ Index Comment per CF
LSM on Disk
❖ InnoDB (write amplification on B+ tree)
  ➢ Lower write penalty vs reduced fragmentation
  ➢ B+ tree fragmentation over space
  ➢ Compression issues
❖ Higher read penalty
❖ Good fit for write-heavy workloads
LSM on Flash
❖ Pros
  ➢ Smaller space with compression
  ➢ Lower write amplification
❖ Cons
  ➢ Higher read penalty
❖ Good fit for write-heavy workloads
MyRocks Engine Architecture
[Diagram labels: Memory, Persistent Storage, Read Request, WAL, Active MemTable (Bloom Filter), MemTable (Bloom Filter), Block Cache, SST Files; index and Bloom filters cached]
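To illustrate why the read path consults Bloom filters before touching SST files, here is a small Python sketch. It is a toy model under stated assumptions, not RocksDB's implementation: the BloomFilter class, the SHA-256 based hashing, the filter size, and the read() helper are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter (hypothetical): may return false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=6):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity over compactness

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with the probe index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def build_sst(data):
    """Pair a key/value dict (standing in for an SST file) with its Bloom filter."""
    bloom = BloomFilter()
    for key in data:
        bloom.add(key)
    return bloom, data


def read(key, memtable, ssts):
    """Look in the MemTable first, then in SSTs newest first.

    Returns (value, files_read), where files_read counts SSTs actually opened.
    """
    if key in memtable:
        return memtable[key], 0
    files_read = 0
    for bloom, data in ssts:
        if not bloom.might_contain(key):
            continue  # filter says "definitely absent": skip this file entirely
        files_read += 1
        if key in data:
            return data[key], files_read
    return None, files_read


ssts = [build_sst({"k3": "z"}), build_sst({"k1": "x", "k2": "y"})]  # newest first
value, files_read = read("k1", memtable={}, ssts=ssts)
print(value, files_read)
```

The filter can only say "possibly present" or "definitely absent", so a lookup for a key stored in an older file usually skips the newer files without reading them.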
Data Structure & Query Optimizer
❖ Supports Primary and Secondary Keys
  ➢ PK is clustered, single step lookup
  ➢ FK not supported
❖ Tablespaces don’t exist
❖ Online DDL not possible
❖ Fast on scanning forward, slow on ORDER BY DESC
❖ Reverse column families can make DESC scan fast
Data Structure & Query Optimizer
❖ Optimizer Statistics
  ➢ Table statistics (rocksdb_table_stats_sampling_pct; the default value is 10%)
  ➢ Index cardinality
  ➢ Records-in-range estimates
  ➢ SHOW ENGINE ROCKSDB STATUS\G
  ➢ Case Sensitive and Binary Collations
    ■ CREATE TABLE myrocks ENGINE=ROCKSDB COLLATE latin1_bin
Data Structure & Query Optimizer
❖ Optimizer Statistics
  ➢ SST files store index statistics
    ■ Idx name, size, # of rows, disk space, deletes
    ■ Distinct # of keys
  ➢ Calculated during flush/compaction
    ■ Ability to force using ANALYZE TABLE syntax (small tables)
  ➢ Multi Range Read (MRR) is not supported
Data Dictionary
❖ Column Family ID
❖ Index ID
❖ Global Index ID: Column Family ID + Index ID
❖ Information Schema
Locking & Isolation Levels
❖ Row locking
  ➢ Read-Committed
  ➢ Repeatable-Read
❖ Gap Lock - Not Supported
  ➢ Error on statement for Repeatable-Read
  ➢ Percona Server will detect and error out
Replication
❖ RBR binlog_format=ROW
  ➢ Large binlogs
  ➢ No triggers on slaves
  ➢ Schema incompatibilities
❖ SBR causes issues with Gap Locks
  ➢ Can use on slaves
  ➢ If safe set rocksdb_unsafe_for_binlog=1
Backup and Recovery
❖ XtraBackup
  ➢ Only in 8.0 with xtrabackup 8.0.6+
  ➢ Optimized for InnoDB and MyRocks
  ➢ No partial backups for MyRocks
❖ Mariabackup
  ➢ 10.2.16+, 10.3.8+
  ➢ No partial backups for MyRocks
Backup and Recovery
❖ myrocks_hotbackup
  ➢ Original backup tool
  ➢ Doesn’t work with 8.0
  ➢ Copies RocksDB checkpoint + WAL
  ➢ MyRocks only, won’t do anything for InnoDB
  ➢ Supports rolling checkpoint
    ■ Less WAL to apply on restore till replication
Backup and Recovery
❖ mysqldump
  ➢ Optimization can be enabled for import
  ➢ rocksdb_bulk_load=1
  ➢ mysqldump in Percona Server detects MyRocks automatically
❖ Snapshots
  ➢ Quite difficult to do right when mixing engines
  ➢ MyRocks: checkpoint + WAL
Crash recovery
❖ Corrupted immutable files: not recoverable
❖ WAL file: recoverable
  ➢ Variable rocksdb_wal_recovery_mode
    ■ 1: Fail to start, do not recover
    ■ 0: If corrupted last entry: truncate and start
    ■ 2: Truncate everything after corrupted entry
    ■ 3: Truncate only corrupted entry (unsafe)
Tool compatibility
Percona tools generally work with MyRocks
Tool | Status | Notes
PMM | Supported | Built-in dashboards for MyRocks
xtrabackup | Supported | Since xtrabackup 8.0.6 (MySQL 8.0 only)
pt-online-schema-change | Partial | Only in read committed
pt-table-checksum | Not supported | Only ROW is supported by MyRocks
pt-table-sync | Not supported | Only ROW is supported by MyRocks
Benchmarks
Tuning suggestions
❖ Directory Structure
  ➢ All files are under the .rocksdb directory
  ➢ No file-per-table option (not even per db)
  ➢ Log file verbosity is high
❖ Beware: bulk load is problematic
  ➢ Set rocksdb_bulk_load=1
  ➢ Set rocksdb_commit_in_the_middle=1
Tuning suggestions
❖ Memory Cache Blocks
  ➢ rocksdb_block_cache_size - SHOW ENGINE ROCKSDB STATUS
❖ Direct IO (bypass OS cache)
  ➢ rocksdb_use_direct_reads=ON
  ➢ rocksdb_use_direct_io_for_flush_and_compaction=ON
Tuning suggestions
❖ Simulation cache
  ➢ rocksdb_sim_cache_size
    ■ Simulates block cache (for reads)
    ■ Set to larger/smaller value (restart)
    ■ Costs ~2% of that value
    ■ SHOW ENGINE ROCKSDB STATUS\G
      ● rocksdb.sim.block.cache.hit COUNT : 346684
      ● rocksdb.sim.block.cache.miss COUNT : 86667
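The two simulation-cache counters above become useful once they are turned into a hit ratio. A minimal Python sketch of that arithmetic follows; the helper name hit_ratio is hypothetical, and the numbers are the ones shown on the slide.

```python
def hit_ratio(hits, misses):
    """Fraction of lookups served from the cache; 0.0 when there were no lookups."""
    total = hits + misses
    return hits / total if total else 0.0


# Counter values copied from the SHOW ENGINE ROCKSDB STATUS output on the slide.
sim_hits = 346684
sim_misses = 86667

ratio = hit_ratio(sim_hits, sim_misses)
print(f"simulated block cache hit ratio: {ratio:.1%}")  # prints 80.0%
```

Comparing this figure with the hit ratio of the real block cache gives a rough indication of whether resizing rocksdb_block_cache_size to the simulated size would pay off.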
Tuning suggestions
❖ Background jobs
  ➢ rocksdb_max_background_jobs=<num_cpu_cores/4>
  ➢ rocksdb_max_total_wal_size=4G
❖ Better compression
  ➢ rocksdb_block_size=16384
Tuning suggestions
❖ Memory limits
  ➢ rocksdb_db_write_buffer_size
❖ Unless using Percona Server 8.0 with optimized defaults
  ➢ rocksdb_default_cf_options
    ■ Use 8.0 defaults, at least enable bloom filters
    ■ block_based_table_factory={filter_policy=bloomfilter:10:false;};
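As a rough guide to what the 10 in bloomfilter:10:false buys, assuming it denotes bits per key, the textbook Bloom filter estimate can be evaluated directly. The Python sketch below uses the standard approximation p = (1 - e^(-k/b))^k for b bits per key and k hash probes; the function name is hypothetical, and real RocksDB filters may pick a different number of probes, so treat the result as an estimate only.

```python
import math


def bloom_false_positive_rate(bits_per_key, num_hashes=None):
    """Textbook estimate of a Bloom filter's false positive rate.

    If num_hashes is omitted, use the theoretical optimum k = b * ln 2,
    rounded to the nearest integer (an assumption, not a RocksDB setting).
    """
    if num_hashes is None:
        num_hashes = max(1, round(bits_per_key * math.log(2)))
    return (1.0 - math.exp(-num_hashes / bits_per_key)) ** num_hashes


for bits in (5, 10, 15):
    print(bits, f"{bloom_false_positive_rate(bits):.4f}")
```

Under these assumptions, 10 bits per key gives a false positive rate a little under 1%, which means most lookups for keys absent from an SST file skip that file.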
Conclusion
❖ Big datasets over 100 GB
❖ Multiple indexes
❖ Write-intensive workloads
❖ Concurrent reads without range scans
❖ Cloud efficient and cheaper to run
  ➢ Less IOPS, Memory, Storage
❖ Write and Read immediately
Special Thanks to...
❖ Yoshinori Matsunobu @matsunobu
❖ Vadim Tkachenko @VadimTk
❖ Sveta Smirnova @svetsmirnova
❖ Mark Callaghan for doing the extensive research and development.
❖ Engineering, Experts and Services Teams at Percona
Q&A
Credits & References
https://www.slideshare.net/matsunobu/myrocks-deep-dive
https://blog.pythian.com/exposing-myrocks-internals-via-system-variables-part-1-data-writing/
https://www.percona.com/resources/webinars/how-rock-myrocks
https://mariadb.com/kb/en/library/optimizer-statistics-in-myrocks/
http://smalldatum.blogspot.com/2017/12/myrocks-innodb-and-tokudb-summary.html