MAHADEV KONAR Apache ZooKeeper. What is ZooKeeper? A highly available, scalable, distributed...


Page 1: MAHADEV KONAR Apache ZooKeeper. What is ZooKeeper? A highly available, scalable, distributed coordination kernel.

MAHADEV KONAR

Apache ZooKeeper

Page 2:

What is ZooKeeper?

A highly available, scalable, distributed coordination kernel

Page 3:

Use Cases

» Leader Election
» Group Membership
» Work Queues
» Event Notifications/workflow management
» Configuration Management
» Cluster Management
» Sharding

Page 4:

What is ZooKeeper again?

File API without partial reads/writes
No renames
Ordered updates and strong persistence guarantees
Conditional updates (version)
Watches for data changes
Ephemeral znodes
Generated file names
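Ephemeral znodes and generated file names are the two features that the leader election use case usually builds on. The sketch below is a toy in-memory model, not the ZooKeeper client API: `ElectionNode`, `join`, `expire`, and the `n_` prefix are hypothetical names chosen for illustration. It assumes the common recipe in which each candidate creates an ephemeral child with a generated, zero-padded sequence suffix, and the owner of the lowest-numbered child is the leader.

```python
# Toy model of leader election, NOT the ZooKeeper client API.
# Assumption: generated names are a prefix plus a 10-digit zero-padded counter.

class ElectionNode:
    def __init__(self, prefix="n_"):
        self.prefix = prefix
        self.counter = 0      # per-parent sequence counter
        self.children = {}    # child name -> owning session id

    def join(self, session_id):
        """Create an ephemeral, sequentially named child for this session."""
        name = f"{self.prefix}{self.counter:010d}"
        self.counter += 1
        self.children[name] = session_id
        return name

    def expire(self, session_id):
        """Drop every ephemeral child owned by an expired session."""
        self.children = {
            n: s for n, s in self.children.items() if s != session_id
        }

    def leader(self):
        """Return the session owning the lowest-numbered child, or None."""
        if not self.children:
            return None
        # Zero padding makes lexicographic order equal numeric order.
        return self.children[min(self.children)]


election = ElectionNode()
a = election.join("session-a")
b = election.join("session-b")
first_leader = election.leader()
election.expire("session-a")      # the leader's session goes away
second_leader = election.leader()
```

When the first session expires, its child disappears with it and the next-lowest child takes over without any explicit hand-off.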

Page 5:

Data Model

Hierarchical namespace

Each znode has data and children

data is read and written in its entirety

[Figure: example znode tree rooted at /, containing the znodes apps, users, locks, servers, app1, read-1, master, regionserver]
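As a rough illustration of the data model, the namespace can be pictured as a map from absolute paths to whole data payloads. This is a toy sketch only; the `children` helper and the particular nesting of znodes shown here are assumptions made for the example, since the slide's figure does not survive in the transcript.

```python
# Toy namespace: absolute path -> full data payload (bytes).
# The nesting chosen here is an assumption for illustration.
tree = {
    "/": b"",
    "/apps": b"",
    "/apps/app1": b"config-v1",
    "/locks": b"",
    "/locks/read-1": b"",
}

def children(tree, path):
    """Return the sorted names of the direct children of `path`."""
    prefix = path if path.endswith("/") else path + "/"
    return sorted(
        p[len(prefix):]
        for p in tree
        if p.startswith(prefix)
        and p != prefix
        and "/" not in p[len(prefix):]
    )

# A write replaces the whole payload; there is no partial update.
tree["/apps/app1"] = b"config-v2"

root_children = children(tree, "/")
app_children = children(tree, "/apps")
```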

Page 6:

ZooKeeper API

String create(path, data, acl, flags)

void delete(path, expectedVersion)

Stat setData(path, data, expectedVersion)

(data, Stat) getData(path, watch)

Stat exists(path, watch)

String[] getChildren(path, watch)
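The `expectedVersion` argument is what turns `setData` and `delete` into conditional updates. The following is a minimal in-memory sketch of that behavior, not the real client: `ToyStore` and `BadVersion` are hypothetical names, ACLs, flags, and watches are left out, and the only ZooKeeper convention assumed is that a version of -1 matches any version.

```python
class BadVersion(Exception):
    """Raised when expected_version does not match the znode's version."""


class ToyStore:
    """Toy stand-in for a ZooKeeper server; path -> (data, version)."""

    def __init__(self):
        self.nodes = {}

    def create(self, path, data):
        if path in self.nodes:
            raise KeyError(f"node exists: {path}")
        self.nodes[path] = (data, 0)   # new znodes start at version 0
        return path

    def get_data(self, path):
        return self.nodes[path]        # (data, version)

    def set_data(self, path, data, expected_version):
        _, version = self.nodes[path]
        if expected_version not in (-1, version):
            raise BadVersion(path)
        self.nodes[path] = (data, version + 1)
        return version + 1

    def delete(self, path, expected_version):
        _, version = self.nodes[path]
        if expected_version not in (-1, version):
            raise BadVersion(path)
        del self.nodes[path]


store = ToyStore()
store.create("/cfg", b"v0")
_, seen_by_a = store.get_data("/cfg")   # both clients read version 0
_, seen_by_b = store.get_data("/cfg")
new_version = store.set_data("/cfg", b"from-a", seen_by_a)
try:
    store.set_data("/cfg", b"from-b", seen_by_b)   # stale version
    conflict = False
except BadVersion:
    conflict = True
```

The second writer has to re-read the znode and retry, which is the usual optimistic-concurrency pattern built on versioned updates.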

Page 7:

ZooKeeper Service

All servers store a copy of the data (in memory)
A leader is elected at startup
Followers service clients, all updates go through leader
Update responses are sent when a majority of servers have persisted the change
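The majority rule above has a simple arithmetic consequence for sizing the service. The helper names below are hypothetical; the sketch only assumes that "majority" means strictly more than half of the servers.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1

def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)

def is_committed(acks, ensemble_size):
    """True once enough servers have persisted the change."""
    return acks >= quorum_size(ensemble_size)

# ensemble size -> (quorum size, tolerated failures)
table = {n: (quorum_size(n), tolerated_failures(n)) for n in (3, 4, 5, 7)}
```

Under this rule a fourth server raises the quorum size without tolerating any additional failure, which is why odd server counts are the usual choice.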

[Figure: ZooKeeper Service, showing a group of servers, one of them the Leader, with clients connected to the servers]

Page 8:

ZooKeeper and HBase

Master Failover
Region Servers and Master discovery via ZooKeeper
HBase clients connect to ZooKeeper to find configuration data
Region Servers and Master failure detection

Page 9:

HBase and ZooKeeper as of now!

[Figure: znode tree with the nodes /, root-region-server, rs, master]

• Master
  • If more than one master, they fight
• Root Region Server
  • This znode holds the location of the server hosting the root of all tables in HBase
• rs
  • A directory in which there is a znode per HBase region server
  • Region Servers register themselves with ZooKeeper when they come online
  • On Region Server failure (detected via ephemeral znodes and notification via ZooKeeper), the master splits the edits out per region

shutdown
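The failure-detection path described on this slide (ephemeral znodes plus a notification to the master) can be sketched with a toy model. Nothing here is HBase or ZooKeeper code: `ToyRsDirectory`, `watch_children`, and `expire_session` are hypothetical names, and the callback only records which region servers disappeared, where a real master would start splitting their edits. The one assumption borrowed from ZooKeeper is that a watch fires once and must be set again afterwards.

```python
class ToyRsDirectory:
    """Toy stand-in for the rs directory of ephemeral znodes."""

    def __init__(self):
        self.znodes = {}     # region server name -> owning session id
        self.watchers = []   # one-shot callbacks

    def register(self, name, session_id):
        """A region server comes online and creates its ephemeral znode."""
        self.znodes[name] = session_id

    def watch_children(self, callback):
        """Ask to be told once when the set of children changes."""
        self.watchers.append(callback)

    def expire_session(self, session_id):
        """Remove the session's znodes and fire pending watches once."""
        gone = sorted(n for n, s in self.znodes.items() if s == session_id)
        for name in gone:
            del self.znodes[name]
        if gone:
            pending, self.watchers = self.watchers, []
            for callback in pending:
                callback(sorted(self.znodes))
        return gone


rs_dir = ToyRsDirectory()
rs_dir.register("rs1", "s1")
rs_dir.register("rs2", "s2")
known = set(rs_dir.znodes)   # the master's view before the failure
recovered = []

def on_change(current):
    # Anything the master knew about that is no longer listed has failed.
    recovered.extend(sorted(known - set(current)))

rs_dir.watch_children(on_change)
rs_dir.expire_session("s2")  # rs2 stalls or dies; its session expires
```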

Page 10:

Common Problems/Error Cases

Garbage Collection at the Region Servers
  Causes ZooKeeper clients to stall
  Session expiry
Low throughput and connection loss
  Mostly due to under-provisioned ZooKeeper instances
  Disk and memory usage
Bad usage example: NameNode, RegionServer, JobTracker, ZooKeeper running on the same node

Page 11:

Release 3.3.0: what's in it for HBase?

Allow configuration of session timeout min/max bounds

HBase needs large session timeouts
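To see why configurable bounds matter for large timeouts, here is a small sketch of how a requested session timeout could be clamped. The function name is hypothetical and this is not server code; it assumes the documented defaults of a minimum of 2 x tickTime and a maximum of 20 x tickTime, with a tickTime of 2000 ms used as the example value.

```python
def negotiate_session_timeout(requested_ms, tick_time_ms=2000,
                              min_ms=None, max_ms=None):
    """Clamp a client's requested timeout into the server's bounds."""
    low = 2 * tick_time_ms if min_ms is None else min_ms
    high = 20 * tick_time_ms if max_ms is None else max_ms
    return max(low, min(requested_ms, high))

# A client asking for 60 s is capped at 40 s with the default bounds.
default_cap = negotiate_session_timeout(60000)
# Raising the maximum bound lets the 60 s request through unchanged.
raised_cap = negotiate_session_timeout(60000, max_ms=120000)
# A very small request is raised to the minimum bound.
too_small = negotiate_session_timeout(1000)
```

A client that stalls for longer than the negotiated value, for example during a long garbage collection pause, loses its session, so a higher maximum gives such clients more headroom.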

Improved logging information to detect issues

Improved debugging tools
Improved documentation
Improved performance and robustness
Queue implementation available

Page 12:

Upcoming 3.4 release

No ConnectionLoss
Use Netty - allow encryption
Testing
  Mockito
  More backwards compatibility testing

Page 13:

More ZooKeeper in HBase?

Table Schema and state in ZooKeeper read only, online

Region Server state transitions via ZooKeeper

Store region assignment in ZooKeeper for each Region Server

http://wiki.apache.org/hadoop/ZooKeeper/HBaseUseCases

Page 14:

Questions?