New Neo4j Auto HA Cluster

Neo4j High Availability: New Auto-Cluster, Michael Hunger - @mesirii

description

In this talk, Michael Hunger sheds some light on the new High Availability architecture for the popular Neo4j graph database. We look at the different variants of the Paxos protocol, master failover strategies, and cluster management state handling. This piece of infrastructure poses non-trivial challenges in distributed consensus-finding, which makes it an interesting session for anyone into scalable systems.

Transcript of New Neo4j Auto HA Cluster

Page 1: New Neo4j Auto HA Cluster

Neo4j High Availability

New Auto-Cluster

Michael Hunger - @mesirii

Page 2: New Neo4j Auto HA Cluster

High Availability Cluster

๏Neo4j Enterprise

๏Master-Slave Replication

๏read-scaling and fault-tolerance

๏eventual consistency

•write to master (push_factor)

•write to slaves

Page 3: New Neo4j Auto HA Cluster

3 Separate Concerns (I)

๏Cluster Management

•Members join/leave/heartbeat

๏Failover

•Master Election

•Distribution of Master-Status

Page 4: New Neo4j Auto HA Cluster

3 Separate Concerns (II)

๏Replication

•synchronized id-generation

•distributed locks

•pull, push of transactions

•initial store synchronization

Page 5: New Neo4j Auto HA Cluster

Pre 1.9 - Zookeeper

Page 6: New Neo4j Auto HA Cluster

Pre 1.9

๏Apache Zookeeper took care of concerns

•Cluster Management

‣new members register with ZK

•Failover

‣ZK stores Master and last TX-Id

‣ZK uses ZAB to determine the new Master and distribute the information

Page 7: New Neo4j Auto HA Cluster

HA Cluster

[Diagram: an HA cluster with one Master, two Slaves, and one RO-Slave, plus three Coordinators]

Page 8: New Neo4j Auto HA Cluster

Pre 1.9 - Problems

๏Additional setup and operations of a separate component

๏unreliable operation / hiccups

๏long-term stability

๏no dynamic reconfiguration of the ZK cluster, which is important for cloud setups

Page 9: New Neo4j Auto HA Cluster

Post 1.9 - Neo4j Auto Cluster

Page 10: New Neo4j Auto HA Cluster

Replace Zookeeper!?

๏Implement Multi-Paxos ourselves

๏simple, testable code

๏only covers

•cluster management,

•master election

Page 11: New Neo4j Auto HA Cluster

HA Cluster

Page 12: New Neo4j Auto HA Cluster

What is Paxos?

๏reliable consensus making

๏broadcasting

๏works even with unreliable communication

•lost messages

•delays, out-of-order delivery

๏does not guarantee progress

Page 13: New Neo4j Auto HA Cluster

What is Paxos?

Page 14: New Neo4j Auto HA Cluster

Implementation

๏everything is a State Machine

•SM = stateless enums + context

•Message = type enum + payload

•State = enum instance

•Transition = handle() messages, switch on msg-type, implement logic
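
The bullets above can be sketched roughly as follows. This is a minimal illustration in Python, not the Neo4j code: the names `MessageType`, `Message`, `Context`, `DemoState`, and `StateMachine` are all hypothetical, and the join/leave logic is invented purely to show the pattern (enum states without fields of their own, a separate context object, and a `handle()` that switches on the message type).

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Any, List


class MessageType(Enum):
    # Hypothetical message types, chosen only for this illustration.
    JOIN = auto()
    LEAVE = auto()


@dataclass(frozen=True)
class Message:
    """Message = type enum + payload."""
    type: MessageType
    payload: Any = None


@dataclass
class Context:
    """All mutable data lives here, so the states themselves stay stateless."""
    members: List[str] = field(default_factory=list)


class DemoState(Enum):
    """State = enum instance; each instance handles messages via handle()."""
    START = auto()
    JOINED = auto()

    def handle(self, context: Context, message: Message) -> "DemoState":
        # Transition: switch on the message type, apply logic, return next state.
        if message.type is MessageType.JOIN:
            context.members.append(message.payload)
            return DemoState.JOINED
        if self is DemoState.JOINED and message.type is MessageType.LEAVE:
            if message.payload in context.members:
                context.members.remove(message.payload)
            return DemoState.JOINED if context.members else DemoState.START
        return self  # messages that do not apply in this state are ignored


class StateMachine:
    """SM = stateless enum states + context."""

    def __init__(self, initial: DemoState, context: Context) -> None:
        self.state = initial
        self.context = context

    def receive(self, message: Message) -> None:
        self.state = self.state.handle(self.context, message)
```

Because the enum instances hold no data, the same state objects can be shared by any number of state machines; everything that differs between them sits in the context.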

Page 15: New Neo4j Auto HA Cluster

Implementation (II)

๏everything is a State Machine

•use timeouts for reliability

•handle failing messages

•decouple network and time

‣for testability

•listeners react to messages and interact with the outside world, sync or async

Page 16: New Neo4j Auto HA Cluster

Implementation (II)

๏Paxos (3 roles)

•Proposer-SM

•Acceptor-SM

•Learner-SM

๏Cluster

•Heartbeat

[Diagram: the Paxos state machines (Proposer, Acceptor, Learner) alongside Heartbeat and ClusterState]

Page 17: New Neo4j Auto HA Cluster

Multi-Paxos (happy path)

[Diagram: message flow between Proposer, Acceptor (2 * f + 1), Learner, and another Learner. Messages shown: PREPARE, PROMISE, ACCEPT, ACCEPTED or REJECTED, LEARN, LEARN REQ, LEARN FAIL. Steps shown: Proposer checks and stores responses and, if a quorum is met, cancels its TIMEOUT; Acceptor asks "matches promise?" and either stores the value (ACCEPTED) or rejects (REJECTED); Learner handles out-of-order messages, stores the value, and delivers all valid values (atomic broadcast); on LEARN TIMEOUT, when a value is missing, a LEARN REQ goes to another Learner, which answers LEARN if it has the value or LEARN FAIL if it does not know.]

Page 18: New Neo4j Auto HA Cluster

Multi-Paxos (happy path)

[Diagram: the same message flow as on the previous page, with an additional PROPOSE message. Participants: Proposer, Acceptor (2 * f + 1), Learner, other Learner. Messages shown: PROPOSE, PREPARE, PROMISE, ACCEPT, ACCEPTED or REJECTED, LEARN, LEARN REQ, LEARN FAIL, with TIMEOUT and LEARN TIMEOUT handling.]
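
The PREPARE / PROMISE / ACCEPT / ACCEPTED-or-REJECTED exchange in the diagram can be illustrated with a textbook single-instance Paxos round. This is a hedged sketch in Python, not the Neo4j implementation: the `Acceptor` class, the `propose` helper, and the integer ballot numbers are assumptions made for the example, the round runs synchronously without timeouts or learners, and Multi-Paxos would run one such instance per slot.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                    # highest ballot promised so far
    accepted_ballot: int = -1             # ballot of the accepted value, -1 if none
    accepted_value: Optional[Any] = None

    def on_prepare(self, ballot: int) -> Tuple[str, Any]:
        # Promise only ballots newer than anything promised before.
        if ballot > self.promised:
            self.promised = ballot
            return ("PROMISE", (self.accepted_ballot, self.accepted_value))
        return ("REJECTED", self.promised)

    def on_accept(self, ballot: int, value: Any) -> Tuple[str, Any]:
        # "Matches promise?": store the value only if no newer ballot was promised.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted_ballot = ballot
            self.accepted_value = value
            return ("ACCEPTED", ballot)
        return ("REJECTED", self.promised)


def propose(acceptors: List[Acceptor], ballot: int, value: Any) -> Optional[Any]:
    """Run one PREPARE/ACCEPT round; return the chosen value, or None on failure."""
    majority = len(acceptors) // 2 + 1

    replies = [a.on_prepare(ballot) for a in acceptors]
    promises = [body for kind, body in replies if kind == "PROMISE"]
    if len(promises) < majority:
        return None  # no quorum of promises

    # If some acceptor already accepted a value, the proposer must adopt the
    # one with the highest ballot instead of its own value.
    prev_ballot, prev_value = max(promises, key=lambda bv: bv[0])
    chosen = prev_value if prev_ballot >= 0 else value

    replies = [a.on_accept(ballot, chosen) for a in acceptors]
    accepted = [body for kind, body in replies if kind == "ACCEPTED"]
    return chosen if len(accepted) >= majority else None
```

With three acceptors, a first proposal with ballot 1 gets its value chosen; a later proposal with ballot 2 and a different value ends up re-proposing the already chosen value, and a stale proposal that reuses ballot 1 is rejected.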

Page 19: New Neo4j Auto HA Cluster

Acceptor State Machine

Page 20: New Neo4j Auto HA Cluster

Heartbeat State Machine

Page 21: New Neo4j Auto HA Cluster

Implementation (III)

๏HA Implementation uses state machines as infrastructure

๏notifications via listeners

๏piggyback heartbeat on messages

๏master election

•(all - failed) members have to agree

•Paxos broadcast needs a quorum of the total
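
The quorum arithmetic behind these two bullets, together with the "2 * f + 1" acceptor count from the Multi-Paxos diagram, can be made concrete with a few lines of Python. The function names are hypothetical, and the sketch assumes a simple majority quorum over the total cluster size.

```python
def quorum_size(total_members: int) -> int:
    """Smallest majority of the total cluster size."""
    return total_members // 2 + 1


def tolerated_failures(total_members: int) -> int:
    """Largest f such that total_members >= 2 * f + 1."""
    return (total_members - 1) // 2


def can_reach_quorum(total_members: int, failed_members: int) -> bool:
    """The (all - failed) members must still form a quorum of the total."""
    return total_members - failed_members >= quorum_size(total_members)
```

For example, a cluster of 3 needs 2 members to agree and survives 1 failure, and a cluster of 5 needs 3 and survives 2. Going from 3 to 4 members raises the quorum to 3 without tolerating any additional failure, which is why odd cluster sizes are the usual choice.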

Page 22: New Neo4j Auto HA Cluster

Multi-Paxos

๏everything is a State Machine

•use timeouts for reliability

•handle failing messages

•decouple network and time

‣for testability

•listeners react to messages and interact with the outside world, sync or async

Page 23: New Neo4j Auto HA Cluster

Unit-Testing

•Mock Time

‣fast running tests despite timeouts

•Mock Network

‣simulate delays, failing messages
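
One way to get both properties is to have the code under test talk to a clock and a network that the test controls. The sketch below is an illustration in Python under that assumption; `FakeClock` and `FakeNetwork` are hypothetical names, not classes from the Neo4j test suite.

```python
import heapq
from typing import Any, Callable, List, Tuple


class FakeClock:
    """Virtual time: timeouts fire when the test advances the clock."""

    def __init__(self) -> None:
        self.now = 0
        self._seq = 0  # tie-breaker so callbacks are never compared
        self._timers: List[Tuple[int, int, Callable[[], None]]] = []

    def set_timeout(self, delay: int, callback: Callable[[], None]) -> None:
        self._seq += 1
        heapq.heappush(self._timers, (self.now + delay, self._seq, callback))

    def advance(self, delta: int) -> None:
        target = self.now + delta
        # Fire every timer due up to the target time, in deadline order.
        while self._timers and self._timers[0][0] <= target:
            deadline, _, callback = heapq.heappop(self._timers)
            self.now = deadline
            callback()
        self.now = target


class FakeNetwork:
    """In-memory network that can delay or drop messages."""

    def __init__(self, clock: FakeClock, delay: int = 1,
                 drop: Callable[[Any], bool] = lambda message: False) -> None:
        self.clock = clock
        self.delay = delay
        self.drop = drop
        self.delivered: List[Any] = []

    def send(self, message: Any) -> None:
        if self.drop(message):
            return  # simulate a lost message
        self.clock.set_timeout(self.delay,
                               lambda: self.delivered.append(message))
```

A test can then send messages, register a timeout, and step through virtual time with `advance()`, so a scenario that would take seconds of real waiting completes immediately.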

Page 24: New Neo4j Auto HA Cluster

Unit-Test-Example

Page 25: New Neo4j Auto HA Cluster

Setup

•Config

•Video

•Auto-Setup Script (Demo)

Page 26: New Neo4j Auto HA Cluster

Thank You - Questions?
