
How we used Kafka to scale our Database Infrastructure

Basavaiah Thambara (Basu), Staff Site Reliability Engineer

( https://www.linkedin.com/in/basavaiaht )

Today’s agenda

Introduction to Espresso

Espresso - Replication

Espresso with MySQL Replication

Espresso with Kafka Replication

Advantages of Using Kafka

How Kafka Based Replication Works

Conclusion & References

Espresso

Document store built on MySQL, combining RDBMS and key-value store features

Consistent and partition tolerant (CP)

Espresso : Features

Multi-colo writes

Bulk import/export

Secondary Indexing

Schema Evolution

Change data capture

Espresso : Use Cases (LinkedIn)

LinkedIn Profiles

LinkedIn Invitations

LinkedIn InMails, etc.

Espresso : Current Scale

O(100) Clusters

O(10K) Servers

O(100) Databases

O(PB) Data

O(M) Peak QPS

Espresso : Basic Architecture

● Client/Application

● Router

● Helix

● Zookeeper

● Storage node
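The slides list the components but not how a request travels through them. As a minimal sketch, assuming hash-based partitioning and a Helix-maintained routing table (backed by ZooKeeper), a router conceptually maps a document key to a partition and that partition to its current master storage node. All class, method, and host names below are illustrative assumptions, not Espresso's actual API.

```java
import java.util.Map;

// Illustrative sketch of what the router conceptually does:
// key -> partition -> master storage node. The routing table is
// assumed to be maintained by Helix and stored in ZooKeeper.
public class RouterSketch {

    // partitionId -> hostname of the storage node currently mastering it
    private final Map<Integer, String> partitionToMaster;
    private final int numPartitions;

    public RouterSketch(Map<Integer, String> partitionToMaster, int numPartitions) {
        this.partitionToMaster = partitionToMaster;
        this.numPartitions = numPartitions;
    }

    // Hash-based partitioning of the document key (assumption).
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // A write is routed to the master of the key's partition.
    String storageNodeFor(String key) {
        return partitionToMaster.get(partitionFor(key));
    }

    public static void main(String[] args) {
        RouterSketch router = new RouterSketch(
                Map.of(0, "storage-node-1", 1, "storage-node-2", 2, "storage-node-3"), 3);
        System.out.println(router.storageNodeFor("member:12345"));
    }
}
```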

Espresso : Replication Requirements

Read Scaling

Backups

High Availability

Disaster Recovery

Multi-colo writes

Espresso : Local Replication

● MySQL Replication

● 3 Copies

● Per Node Replication

● Node Failure

(Diagram: Master-Slave Replication — 3 copies of each partition; P1–P3 on Nodes 1–3, P4–P6 on Nodes 4–6, ..., Q1–Q3 on Nodes N-2 to N.)

Legacy Architecture

Espresso : Cross Colo Replication (Legacy)

● Databus

● Data Replicator

● Colo failure

(Diagram: legacy cross-colo setup — in both the online and remote data centers, clients reach storage nodes through a router and API servers, with Databus and a Data Replicator shipping changes between the colos.)

Limitations : Per Instance Replication

Poor Resource Utilization

Cross Colo Replication (Legacy)

Limitations : Per Instance Replication

● Databus

■ tightly coupled to storage node

■ operational complexity

■ Uses SSDs, higher cost to serve

● Cluster expansion is painful

■ Lots of manual steps

■ Needs databus expansion

■ Requires downtime


Limitations: Per Instance Replication

● Upon master failure, a single node gets all the traffic

● Human intervention needed to bring up slaves

● A slave-less situation might lead to an outage

(Diagrams: Master Failure and Slaves Failure scenarios — partitions P1–P3 spread over Nodes 1–3, with one node offline.)

Espresso : Replication Using Kafka

● Per partition replication

● Flexible partition placement

● Every node serves traffic

● Data Replicator uses Kafka

New architecture

Advantages: Per Partition Replication

1. Better resource utilization.

(Charts: Resource Utilization, legacy vs. per-partition replication.)

2. Easy cluster expansion.

Cluster Expansion

Initial cluster state: 12 partitions, 3 storage nodes, replication factor = 3.

Adding a node: Helix sends Offline → Slave transitions for the partitions placed on the new node.

Once the partitions on the new node are ready, ownership is transferred and the old copies are dropped.

Cluster state after expansion: 12 partitions, 4 storage nodes, replication factor = 3.
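As a rough illustration of why per-partition placement makes expansion simple, the sketch below spreads the 12 partitions (replication factor 3) from the example round-robin across 3 and then 4 nodes. Helix's actual rebalancer does considerably more (minimizing data movement, driving the Offline → Slave → Master transitions), so treat this purely as an illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: round-robin placement of 12 partitions (RF=3)
// over 3 and then 4 storage nodes, mirroring the expansion example
// in the slides.
public class PlacementSketch {

    static List<List<Integer>> place(int partitions, int replicationFactor, int nodes) {
        List<List<Integer>> perNode = new ArrayList<>();
        for (int n = 0; n < nodes; n++) perNode.add(new ArrayList<>());
        int slot = 0;
        for (int p = 0; p < partitions; p++) {
            for (int r = 0; r < replicationFactor; r++) {
                perNode.get(slot % nodes).add(p); // consecutive slots land on distinct nodes
                slot++;
            }
        }
        return perNode;
    }

    public static void main(String[] args) {
        System.out.println("3 nodes: " + place(12, 3, 3)); // 12 partition replicas per node
        System.out.println("4 nodes: " + place(12, 3, 4)); // 9 partition replicas per node
    }
}
```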

Advantages: Per Partition Replication

Node failure:

■ parallel mastership handoff

■ parallel restore of slaves

3. No human intervention.

4. Databus complexity eliminated.

5. Cost savings.

6. Single platform.

Implementing Kafka based replication

1. Requirements

● Internal replication

● Cross colo replication

● Change capture for nearline

● Guaranteed Delivery

● Exactly Once (sort of)

● In-Order

2. Solution

● Broker and producer config

● Implement

Broker config

● Kafka broker config

■ replication factor = 3

■ min.isr = 2

■ Disabled unclean leader elections
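These broker-side guarantees can also be expressed per topic. The sketch below uses Kafka's AdminClient to create a replication topic with replication factor 3, min.insync.replicas=2, and unclean leader election disabled; the topic name, partition count, and bootstrap servers are placeholders, not the values Espresso actually uses.

```java
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

// Sketch: create a replication topic whose settings match the broker
// config on the slide (RF=3, min.isr=2, unclean leader election off).
public class ReplicationTopicSetup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("espresso-replication", 12, (short) 3)
                    .configs(Map.of(
                            "min.insync.replicas", "2",
                            "unclean.leader.election.enable", "false"));
            admin.createTopics(java.util.List.of(topic)).all().get();
        }
    }
}
```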


Producer Config

● acks = "all"

● Infinite retries

● block.on.buffer.full = true

● max.in.flight.requests.per.connection = 1

● linger.ms = 0

● on non-retryable exception:

■ destroy producer

■ create new producer

■ resume from last checkpoint
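A minimal sketch of a producer configured along these lines follows. Note that block.on.buffer.full comes from the older producer client used at the time; in current clients the closest knob is max.block.ms, so the snippet only mentions it in a comment. The topic name and bootstrap servers are placeholders.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Sketch matching the settings above: acks=all, effectively infinite
// retries, one in-flight request per connection to preserve ordering,
// and no batching delay. With current clients a full buffer blocks the
// producer for up to max.block.ms instead of block.on.buffer.full.
public class ReplicationProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);
        props.put(ProducerConfig.LINGER_MS_CONFIG, 0);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        try {
            // Key by partition so all changes for a partition stay in order.
            producer.send(new ProducerRecord<>("espresso-replication", "partition-1", "change-event")).get();
        } catch (Exception nonRetryable) {
            // Per the slide: destroy the producer, create a new one,
            // and resume publishing from the last checkpoint.
        } finally {
            producer.close();
        }
    }
}
```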

Global Transaction Identifier

● Global transaction identifier (GTID)

● Unique
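The slides do not show the GTID's structure, but the checkpoints that appear later (3:101@2, 3:103@6) suggest something like a mastership generation plus a per-partition sequence number, tracked alongside a Kafka offset. The record below is an assumed shape for illustration only, not Espresso's actual format.

```java
// Assumed GTID shape for illustration: a mastership generation plus a
// monotonically increasing sequence number, which together identify a
// transaction uniquely and totally order transactions within a partition.
record Gtid(int generation, long sequence) implements Comparable<Gtid> {
    @Override
    public int compareTo(Gtid other) {
        int byGeneration = Integer.compare(generation, other.generation);
        return byGeneration != 0 ? byGeneration : Long.compare(sequence, other.sequence);
    }

    @Override
    public String toString() {
        return generation + ":" + sequence; // e.g. "3:101"
    }
}
```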

Replication flow

Message protocol

(Diagram: replication messages flowing between the source and replica MySQL instances via Kafka.)

Message protocol - Mastership Handoff

(Diagram sequence: the mastership-handoff steps of the message protocol.)

Checkpointing - Producer

(Diagram sequence: producer-side checkpointing against the source MySQL.)

Checkpointing - Consumer

(Diagram: consumer-side checkpointing; checkpoint shown as 3:101@2.)
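The consumer-side checkpointing diagrams are not preserved in this transcript, but a pattern consistent with the stated requirements (guaranteed delivery, in-order, "exactly once, sort of") is to persist the last applied GTID together with its Kafka offset, seek to that offset on restart, and skip anything at or below the checkpoint. The sketch below assumes that pattern; the helper methods and names are hypothetical placeholders.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

// Assumed consumer-side checkpointing pattern (not Espresso's actual code):
// resume from the last checkpointed Kafka offset and drop records whose
// sequence number has already been applied, so redelivered messages are
// applied at most once even though Kafka delivers them at least once.
public class ReplicationConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "espresso-replica-sketch");
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        // Placeholder checkpoint recovered from the replica's local store:
        // last applied sequence 101 at Kafka offset 2 (cf. "3:101@2" on the slide).
        long checkpointedOffset = 2L;
        long lastAppliedSequence = 101L;

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("espresso-replication", 1);
            consumer.assign(List.of(tp));
            consumer.seek(tp, checkpointedOffset); // resume from the checkpoint

            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    long sequence = extractSequence(record);
                    if (sequence <= lastAppliedSequence) {
                        continue; // already applied; skip duplicates after a restart
                    }
                    applyToMySql(record);
                    lastAppliedSequence = sequence;
                    saveCheckpoint(sequence, record.offset());
                }
            }
        }
    }

    // Placeholders: parse the GTID sequence from the message, apply the
    // change to the local MySQL instance, and persist GTID@offset.
    static long extractSequence(ConsumerRecord<String, String> record) { return 0L; }
    static void applyToMySql(ConsumerRecord<String, String> record) { }
    static void saveCheckpoint(long sequence, long offset) { }
}
```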

Producer Failure

(Diagram sequence: producer failure and recovery; checkpoints 3:101@2 and 3:103@6 shown.)

Zombie Writes

(Diagram sequence: the zombie-writes scenario and how it is handled.)
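The zombie-writes diagrams are likewise not preserved, but the risk they usually depict is a deposed master that keeps publishing stale changes. One way to fence such writes, consistent with the mastership-handoff and GTID material above, is to track the current mastership generation and discard anything from an older one; the snippet below is an assumed illustration of that idea, not LinkedIn's actual mechanism.

```java
// Assumed fencing rule for illustration: each message carries the
// mastership generation of the node that produced it. The consumer
// tracks the highest generation seen (bumped during mastership
// handoff) and drops messages from older generations, so a "zombie"
// old master cannot overwrite data written by the new master.
public class ZombieWriteFence {

    private int currentGeneration = 0;

    // Called when a mastership-handoff message announces a new generation.
    void onMastershipHandoff(int newGeneration) {
        currentGeneration = Math.max(currentGeneration, newGeneration);
    }

    // Returns true if the message should be applied, false if it is a zombie write.
    boolean accept(int messageGeneration) {
        return messageGeneration >= currentGeneration;
    }

    public static void main(String[] args) {
        ZombieWriteFence fence = new ZombieWriteFence();
        fence.onMastershipHandoff(4);        // generation bumped by the new master
        System.out.println(fence.accept(3)); // false: write from the old master is dropped
        System.out.println(fence.accept(4)); // true: write from the current master is applied
    }
}
```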

Conclusion

● LinkedIn leveraged Kafka to scale Espresso

● Kafka helped unify data pipelines

● Reduced operational complexity

● Saved $$$

Q&A?