Espresso Database Replication with Kafka, Tom Quiggle
-
Upload
confluent -
Category
Engineering
-
view
1.462 -
download
1
Transcript of Espresso Database Replication with Kafka, Tom Quiggle
![Page 1: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/1.jpg)
Distributed Data Systems 1 ©2016 LinkedIn Corporation. All Rights Reserved.
ESPRESSO Database Replication with KafkaTom QuigglePrincipal Staff Software [email protected]/in/tquiggle@TomQuiggle
![Page 2: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/2.jpg)
Distributed Data Systems 2 ©2016 LinkedIn Corporation. All Rights Reserved.
ESPRESSO Overview– Architecture– GTIDs and SCNs– Per-instance replication (0.8)– Per-partition replication (1.0)
Kafka Per-Partition Replication– Requirements– Kafka Configuration– Message Protocol– Producer– Consumer
Q&A
Agenda
![Page 3: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/3.jpg)
Distributed Data Systems 3 ©2016 LinkedIn Corporation. All Rights Reserved.
ESPRESSO OverviewESPRESSO Database Replication with Kafka
![Page 4: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/4.jpg)
Distributed Data Systems 4 ©2016 LinkedIn Corporation. All Rights Reserved.
Hosted, Scalable, Data as a Service (DaaS) for LinkedIn’s Online Structured Data Needs
Databases are partitioned Partitions distributed across available hardware HTTP proxy routes requests to appropriate database node Apache Helix provides centralized cluster management
ESPRESSO1
1. Elastic, Scalable, Performant, Reliable, Extensible, Stable, Speedy and Operational
![Page 5: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/5.jpg)
Distributed Data Systems 5 ©2016 LinkedIn Corporation. All Rights Reserved.
ESPRESSO Architecture
![Page 6: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/6.jpg)
Distributed Data Systems 6 ©2016 LinkedIn Corporation. All Rights Reserved.
GTIDs and SCNs
MySQL 5.6 Global Transaction Identifier Unique, monotonically increasing, identifier for each transaction
committed GTID :== source_id:transaction_id ESPRESSO conventions
– source_id encodes database name and partition number– transaction_id is a 64 bit numeric value
High Order 32 bits is generation count Low order 32 bit are sequence within generation
– Generation increments with every change in mastership– Sequence increases with each transaction– We refer to a transaction_id component as a Sequence Commit Number (SCN)
![Page 7: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/7.jpg)
Distributed Data Systems 7 ©2016 LinkedIn Corporation. All Rights Reserved.
GTIDs and SCNs
Example binlog transaction:SET @@SESSION.GTID_NEXT= 'hash(db_part):(gen<<32 + seq)';SET TIMESTAMP=<seconds_since_Unix_epoch>BEGINTable_map: `db_part`.`table1` mapped to number 1234Update_rows: table id 1234BINLOG '...'BINLOG '...'Table_map: `db_part`.`table2` mapped to number 5678Update_rows: table id 5678BINLOG '...'COMMIT
![Page 8: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/8.jpg)
Distributed Data Systems 8 ©2016 LinkedIn Corporation. All Rights Reserved.
Node 1
P1 P2 P3
Node 2
P1 P2 P3
Node 3
P1 P2 P3
Node 1
P4 P5 P6
Node 2
P4 P5 P6
Node 3
P4 P5 P6
ESPRESSO: 0.8 Per-Instance Replication
Master
Slave
Offline
![Page 9: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/9.jpg)
Distributed Data Systems 9 ©2016 LinkedIn Corporation. All Rights Reserved.
ESPRESSO: 0.8 Per-Instance Replication
Node 1
P1 P2 P3
Node 2
P1 P2 P3
Node 3
P1 P2 P3
Node 1
P4 P5 P6
Node 2
P4 P5 P6
Node 3
P4 P5 P6
Master
Slave
Offline
![Page 10: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/10.jpg)
Distributed Data Systems 10 ©2016 LinkedIn Corporation. All Rights Reserved.
Node 1
P1 P2 P3
Node 2
P1 P2 P3
Node 3
P1 P2 P3
Node 1
P4 P5 P6
Node 2
P4 P5 P6
Node 3
P4 P5 P6
Master
Slave
Offline
ESPRESSO: 0.8 Per-Instance Replication
![Page 11: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/11.jpg)
Distributed Data Systems 11 ©2016 LinkedIn Corporation. All Rights Reserved.
Issues with Per-Instance Replication
Poor resource utilization – only 1/3 of nodes service application requests
Partitions unnecessarily share fate Cluster expansion is an arduous process Upon node failure, 100% of the traffic is redirected to one node
![Page 12: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/12.jpg)
Distributed Data Systems 12 ©2016 LinkedIn Corporation. All Rights Reserved.
ESPRESSO: 1.0 Per-Partition ReplicationPer-Instance MySQL replication replaced with Per-Partition Kafka
HELIX
P4:Master: 1Slave: 3…
EXTERNALVIEW
Node 1Node 2Node 3
LIVEINSTANCESNode 1
P1 P2
P4
P3
P5 P6
P9 P10
Node 2
P5 P6
P8
P7
P1 P2
P11 P12
Node 3
P9 P10
P12
P11
P3 P4
P7 P8
Kafka
![Page 13: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/13.jpg)
Distributed Data Systems 13 ©2016 LinkedIn Corporation. All Rights Reserved.
Cluster ExpansionInitial State with 12 partitions, 3 storage nodes, r=2
HELIX
EXTERNALVIEW
Node 1Node 2Node 3
LIVEINSTANCESNode 1
P1 P2
P4
P3
P5 P6
P9 P10
Node 2
P5 P6
P8
P7
P1 P2
P11 P12
Node 3
P9 P10
P12
P11
P3 P4
P7 P8
Master
Slave
Offline
P4:Master: 1Slave: 3…
![Page 14: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/14.jpg)
Distributed Data Systems 14 ©2016 LinkedIn Corporation. All Rights Reserved.
Cluster ExpansionAdding Node: Helix Sends OfflineToSlave for new partitions
HELIX
EXTERNALVIEW
Node 1Node 2Node 3Node 4
LIVEINSTANCESNode 1
P1 P2
P4
P3
P5 P6
P9 P10
Node 2
P5 P6
P8
P7
P1 P2
P11 P12
Node 3
P9 P10
P12
P11
P3 P4
P7 P8
Node 4
P4 P8
P1
P12
P7 P9
P4:Master: 1Slave: 3Offline: 4…
![Page 15: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/15.jpg)
Distributed Data Systems 15 ©2016 LinkedIn Corporation. All Rights Reserved.
Cluster ExpansionOnce a new partition is ready, transfer ownership and drop old
HELIX
EXTERNALVIEW
Node 1Node 2Node 3Node 4
LIVEINSTANCESNode 1
P1 P2 P3
P5 P6
P9 P10
Node 2
P5 P6
P8
P7
P1 P2
P11 P12
Node 3
P9 P10
P12
P11
P3 P4
P7 P8
Node 4
P4 P8
P1
P12
P7 P9
P4:Master: 4Slave: 3…
![Page 16: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/16.jpg)
Distributed Data Systems 16 ©2016 LinkedIn Corporation. All Rights Reserved.
Cluster ExpansionContinue migration of master and slave partitions
HELIX
EXTERNALVIEW
Node 1Node 2Node 3Node 4
LIVEINSTANCESNode 1
P1 P2 P3
P5 P6
P9 P10
Node 2
P5 P6 P7
P2
P11 P12
Node 3
P9 P10
P12
P11
P3 P4
P7 P8
Node 4
P4 P8
P1
P12
P7 P9
P9:Master: 3Slave: 1Offline: 4…
![Page 17: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/17.jpg)
Distributed Data Systems 17 ©2016 LinkedIn Corporation. All Rights Reserved.
Cluster ExpansionRebalancing is complete after last partition migration
HELIX
EXTERNALVIEW
Node 1Node 2Node 3Node 4
LIVEINSTANCESNode 1 Node 2
Node 3 Node 4
P4 P8
P1
P12
P7 P9
P5 P6
P2
P7
P11 P12
P9 P10
P3
P11
P4 P8
P1 P2
P5
P3
P6 P10
P9:Master: 4Slave: 3…
![Page 18: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/18.jpg)
Distributed Data Systems 18 ©2016 LinkedIn Corporation. All Rights Reserved.
Node FailoverDuring failure or planned maintenance, promote slaves to master
HELIX
EXTERNALVIEW
Node 1Node 2Node 3Node 4
LIVEINSTANCESNode 1 Node 2
Node 3 Node 4
P4 P8
P1
P12
P7 P9
P5 P6
P2
P7
P11 P12
P9 P10
P3
P11
P4 P8
P1 P2
P5
P3
P6 P10
P9:Master: 4Slave: 3…
![Page 19: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/19.jpg)
Distributed Data Systems 19 ©2016 LinkedIn Corporation. All Rights Reserved.
Node FailoverDuring failure or planned maintenance, promote slaves to master
HELIX
EXTERNALVIEW
Node 1Node 2Node 4
LIVEINSTANCESNode 1 Node 2
Node 3 Node 4
P4 P8
P1
P12
P7 P9
P5 P6
P2
P7
P11 P12
P9 P10
P3
P11
P4 P8
P1 P2
P5
P3
P6 P10
P9:Master: 4Offline: 3…
![Page 20: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/20.jpg)
Distributed Data Systems 20 ©2016 LinkedIn Corporation. All Rights Reserved.
Advantages of Per-Partition Replication
Better hardware utilization– All nodes service application requests
Mastership hand-off done in parallel After node failure, can restore full replication factor in parallel Cluster expansion is as easy as:
– Add node(s) to cluster– Rebalance
Single platform for all Change Data Capture– Internal replication– Cross-colo replication– Application CDC consumers
![Page 21: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/21.jpg)
Distributed Data Systems 21 ©2016 LinkedIn Corporation. All Rights Reserved.
Kafka Per-Partition Replication
ESPRESSO Database Replication with Kafka
![Page 22: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/22.jpg)
Distributed Data Systems 22 ©2016 LinkedIn Corporation. All Rights Reserved.
Kafka for Internal Replication
![Page 23: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/23.jpg)
Distributed Data Systems 23 ©2016 LinkedIn Corporation. All Rights Reserved.
Requirements
Delivery Must Be: Guaranteed In-Order Exactly Once (sort of)
![Page 24: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/24.jpg)
Distributed Data Systems 24 ©2016 LinkedIn Corporation. All Rights Reserved.
Broker Configuration
Replication factor = 3(most LinkedIn clusters use 2)
min.isr=2 Disable unclean leader elections
![Page 25: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/25.jpg)
Distributed Data Systems 25 ©2016 LinkedIn Corporation. All Rights Reserved.
B – Begin txnE – End txnC – Control
Message ProtocolMaster
MySQL
ProducerConsumer
Slave
MySQL
ProducerConsumer
3:101B,E
3:102B
3:102 3:102E
3:100B,E
3:103B,E
3:104B
3:104DB_0: 3:104E
![Page 26: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/26.jpg)
Distributed Data Systems 26 ©2016 LinkedIn Corporation. All Rights Reserved.
Message Protocol – Mastership HandoffOld Master
MySQL
ProducerConsumer
Promoted Slave
MySQL
ProducerConsumer
3:101B,E
3:102B
3:102 3:102E
3:100B,E
3:103B,E
3:104B
3:104DB_0: 3:104E
4:0C
![Page 27: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/27.jpg)
Distributed Data Systems 27 ©2016 LinkedIn Corporation. All Rights Reserved.
Message Protocol – Mastership HandoffMaster
MySQL
ProducerConsumer
Promoted Slave
MySQL
ProducerConsumer
Consumed own control message
3:101B,E
3:102B
3:102 3:102E
3:100B,E
3:103B,E
3:104B
3:104DB_0: 3:104E
4:0C
![Page 28: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/28.jpg)
Distributed Data Systems 28 ©2016 LinkedIn Corporation. All Rights Reserved.
Message Protocol – Mastership HandoffOld Master
MySQL
ProducerConsumer
Master
MySQL
ProducerConsumer
Enable writes with new gen
3:101B,E
3:102B
3:102 3:102E
3:100B,E
3:103B,E
3:104B
3:104DB_0: 3:104E
4:0C
4:0B
![Page 29: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/29.jpg)
Distributed Data Systems 29 ©2016 LinkedIn Corporation. All Rights Reserved.
Kafka Producer Configuration
acks = “all” retries = Integer.MAX_VALUE block.on.buffer.full=true max.in.flight.requests.per.connection=1 linger=0 On non-retryable exception:
– destroy producer– create new producer– resume from last checkpoint
![Page 30: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/30.jpg)
Distributed Data Systems 30 ©2016 LinkedIn Corporation. All Rights Reserved.
Kafka Producer CheckpointingMaster
MySQL
ProducerConsumer
Slave
MySQL
ProducerConsumer
3:101B,E
3:102B
3:102 3:102E
3:100B,E
3:103B,E
3:104B
3:104
Can’tCheckpoint
Here
Periodically writes (SCN, Kafka Offset) to MySQL tableMay only checkpoint offset at end of valid transaction!
![Page 31: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/31.jpg)
Distributed Data Systems 31 ©2016 LinkedIn Corporation. All Rights Reserved.
Kafka Producer CheckpointingMaster
MySQL
ProducerConsumer
Slave
MySQL
ProducerConsumer
3:101B,E
3:102B
3:102 3:102E
3:100B,E
3:103B,E
3:104B
3:104
Producer checkpoint will lag current producer Kafka OffsetKafka Offset obtained from callback
LastCheckpoint
Here
![Page 32: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/32.jpg)
Distributed Data Systems 32 ©2016 LinkedIn Corporation. All Rights Reserved.
Kafka Producer CheckpointingMaster
MySQL
ProducerConsumer
Slave
MySQL
ProducerConsumer
Last
CheckpointHere
send()FAILS
X
3:101B,E
3:102B
3:102 3:102E
3:100B,E
3:103B,E
3:104B
3:104 3:104
![Page 33: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/33.jpg)
Distributed Data Systems 33 ©2016 LinkedIn Corporation. All Rights Reserved.
Kafka Producer CheckpointingMaster
MySQL
ProducerConsumer
Slave
MySQL
ProducerConsumer
Recreate producer and resume from last checkpoint
Resume From
Checkpoint
Messages will be replayed
3:102B
3:101B,E
3:102B
3:102 3:102E
3:100B,E
3:103B,E
3:104B
3:104
![Page 34: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/34.jpg)
Distributed Data Systems 34 ©2016 LinkedIn Corporation. All Rights Reserved.
Kafka Producer CheckpointingMaster
MySQL
ProducerConsumer
Slave
MySQL
ProducerConsumer
Kafka stream now contains replayed transactions(possibly including partial transactions)
Can Checkpoint
Here
Replayed Messages
3:101B,E
3:102B
3:102 3:102E
3:100B,E
3:103B,E
3:104B
3:104 3:102B
3:102 3:102E
3:103B,E
3:104B
3:104
![Page 35: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/35.jpg)
Distributed Data Systems 35 ©2016 LinkedIn Corporation. All Rights Reserved.
Partition 3
Kafka Consumer
Uses Low Level Consumer Consume Kafka partitions slaved on node
Partition 1
Partition 2
Kafka Broker A
Kafka Broker B
Kafka Consumer
poll()
Consumer Thread
EspressoKafkaConsumer
EspressoReplicationApplier
MySQL
P1
P2
P3
ApplierThreads
![Page 36: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/36.jpg)
Distributed Data Systems 36 ©2016 LinkedIn Corporation. All Rights Reserved.
Kafka ConsumerMaster
MySQL
ProducerConsumer
Slave
MySQL
ProducerConsumer
3:101B,E
3:102B
3:102 3:102E
3:100B,E
3:103B,E
3:104B
3:104
Slave updates (SCN, Kafka Offset) row for every committed txn
3:101@2
![Page 37: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/37.jpg)
Distributed Data Systems 37 ©2016 LinkedIn Corporation. All Rights Reserved.
Kafka ConsumerMaster
MySQL
ProducerConsumer
Slave
MySQL
ProducerConsumer
Client only applies messages with SCN greater than last committed
Replayed Messages
3:101B,E
3:102B
3:102 3:102E
3:100B,E
3:103B,E
3:104B
3:104 3:102B
3:102 3:102E
3:103B,E
3:104B
3:104
BEGINTransaction
3:104
3:103@6
![Page 38: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/38.jpg)
Distributed Data Systems 38 ©2016 LinkedIn Corporation. All Rights Reserved.
Kafka ConsumerMaster
MySQL
ProducerConsumer
Slave
MySQL
ProducerConsumer
Incomplete transaction is rolled back
3:101B,E
3:102B
3:102 3:102E
3:100B,E
3:103B,E
3:104B
3:104 3:102B
3:102 3:102E
3:103B,E
3:104B
3:104
ROLLBACK3:104
Replayed Messages
3:103@6
![Page 39: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/39.jpg)
Distributed Data Systems 39 ©2016 LinkedIn Corporation. All Rights Reserved.
Kafka ConsumerMaster
MySQL
ProducerConsumer
Slave
MySQL
ProducerConsumer
Client only applies messages with SCN greater than last committed
3:101B,E
3:102B
3:102 3:102E
3:100B,E
3:103B,E
3:104B
3:104 3:102B
3:102 3:102E
3:104B
3:104
SKIP3:102..3:10
3
Replayed Messages
3:103@6
3:103B,E
![Page 40: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/40.jpg)
Distributed Data Systems 40 ©2016 LinkedIn Corporation. All Rights Reserved.
Kafka ConsumerMaster
MySQL
ProducerConsumer
Slave
MySQL
ProducerConsumer
3:101B,E
3:102B
3:102 3:102E
3:100B,E
3:103B,E
3:104B
3:104 3:102B
3:102 3:102E
3:104B
3:104
Replayed Messages
BEGIN3:104
(again)
3:104E
3:103@6
3:103B,E
![Page 41: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/41.jpg)
Distributed Data Systems 41 ©2016 LinkedIn Corporation. All Rights Reserved.
Zombie Write Filtering
What if stalled master continues writing after transition?
![Page 42: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/42.jpg)
Distributed Data Systems 42 ©2016 LinkedIn Corporation. All Rights Reserved.
Zombie Write FilteringMASTER
MySQL
ProducerConsumer
Slave
MySQL
ProducerConsumer
3:102B
3:102 3:102E
3:103B,E
3:104B
3:104
Master Stalled
![Page 43: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/43.jpg)
Distributed Data Systems 43 ©2016 LinkedIn Corporation. All Rights Reserved.
Zombie Write FilteringMaster
MySQL
ProducerConsumer
Promoted Slave
MySQL
ProducerConsumer
3:102B
3:102 3:102E
3:103B,E
3:104B
3:104
Master Stalled4:0C
Helix sends SlaveToMaster transition to one of the slaves
![Page 44: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/44.jpg)
Distributed Data Systems 44 ©2016 LinkedIn Corporation. All Rights Reserved.
Zombie Write FilteringMaster
MySQL
ProducerConsumer
New Master
MySQL
ProducerConsumer
Master Stalled3:102
B3:102 3:102
E3:103B,E
3:104B
3:104 4:0C
4:1B,E
4:2B
4:2E
Slave becomes master and starts taking writes
![Page 45: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/45.jpg)
Distributed Data Systems 45 ©2016 LinkedIn Corporation. All Rights Reserved.
Zombie Write FilteringMaster
MySQL
ProducerConsumer
New Master
MySQL
ProducerConsumer
3:102B
3:102 3:102E
3:103B,E
3:104B
3:104 4:0C
4:1B,E
4:2B
4:2E
3:104E
3:105B,E
Stalled Master resumes and sends binlog entries to Kafka
![Page 46: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/46.jpg)
Distributed Data Systems 46 ©2016 LinkedIn Corporation. All Rights Reserved.
Zombie Write FilteringERROR
MySQL
ProducerConsumer
New Master
MySQL
ProducerConsumer
3:102B
3:102 3:102E
3:103B,E
3:104B
3:104 4:0C
4:1B,E
4:2B
4:2E
3:104E
3:105B,E
4:3B,E
Former master goes into ERROR stateZombie writes filtered by all consumers based on increasing SCN rule
![Page 47: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/47.jpg)
Distributed Data Systems 47 ©2016 LinkedIn Corporation. All Rights Reserved.
Current Status
ESPRESSO Database Replication with Kafka
![Page 48: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/48.jpg)
Distributed Data Systems 48 ©2016 LinkedIn Corporation. All Rights Reserved.
ESPRESSO Kafka Replication: Current Status
Pre-Production integration environment migrated to Kafka replication 8 production clusters migrated (as of 4/11) Migration will continue through Q3 of 2016 Average replication latency < 90ms
![Page 49: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/49.jpg)
Distributed Data Systems 49 ©2016 LinkedIn Corporation. All Rights Reserved.
Conclusions
Configure Kafka for reliable, at least once, delivery. See:
http://www.slideshare.net/JiangjieQin/no-data-loss-pipeline-with-apache-kafka-49753844
Carefully control producer and consumer checkpoints along txn boundaries
Embed sequence information in message stream to implement exactly-once application of messages
![Page 50: Espresso Database Replication with Kafka, Tom Quiggle](https://reader031.fdocuments.us/reader031/viewer/2022021500/58f1e9081a28ab072f8b45c3/html5/thumbnails/50.jpg)
Distributed Data Systems
Even our workspace isHorizontally Scalable!