ZooKeeper and Embedded ZooKeeper Support for IBM InfoSphere Streams V4.0

© 2015 IBM Corporation ZooKeeper And Embedded ZooKeeper IBM InfoSphere Streams Version 4.0 Yip-Hing Ng Senior Software Engineer Streams Platform Team [email protected]

Transcript of ZooKeeper and Embedded ZooKeeper Support for IBM InfoSphere Streams V4.0

Page 1: ZooKeeper and Embedded ZooKeeper Support for IBM InfoSphere Streams V4.0

© 2015 IBM Corporation

ZooKeeper And Embedded ZooKeeper

IBM InfoSphere Streams Version 4.0

Yip-Hing Ng

Senior Software Engineer

Streams Platform Team

[email protected]

Page 2: ZooKeeper and Embedded ZooKeeper Support for IBM InfoSphere Streams V4.0

Important Disclaimer

THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.

WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.

IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.

IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.

NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:

• CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR

• ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.

IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Page 3: ZooKeeper and Embedded ZooKeeper Support for IBM InfoSphere Streams V4.0

Agenda

Apache ZooKeeper Overview

ZooKeeper Architecture

ZooKeeper Data Model

ZooKeeper Consistency Guarantees

Embedded ZooKeeper

External ZooKeeper

ZooKeeper Guidelines/Best Practices

Questions

Page 4: ZooKeeper and Embedded ZooKeeper Support for IBM InfoSphere Streams V4.0

Apache ZooKeeper Overview

A highly scalable, open source, distributed coordination service for distributed applications

Key component and prerequisite of Streams Version 4.0

– Requires v3.4.6 or above

Apache Software Foundation

– Used in Apache Hadoop and HBase projects

Provides a set of primitives to implement higher level constructs in a distributed system, such as:

– Configuration maintenance

– Synchronization

– Leader Election

– Groups and Naming services

– Work Queues

High Availability

– Replication

Page 5: ZooKeeper and Embedded ZooKeeper Support for IBM InfoSphere Streams V4.0

ZooKeeper Architecture

[Diagram: a ZooKeeper ensemble of three servers: a follower on Host A, the leader on Host B, and a follower on Host C. Six clients connect to the ensemble, five reading and one writing.]

Page 6: ZooKeeper and Embedded ZooKeeper Support for IBM InfoSphere Streams V4.0

ZooKeeper Data Model

Hierarchical namespace (similar to a distributed file system)

Each node, called a znode, can have its own data and child nodes

A path is represented as a canonical absolute path (no relative paths), e.g.: /app1/p1

Each znode maintains a stat structure

– Version (conditional update)

– ACL

Watcher for data change notification, single trigger

Znode Types

– Persistent: exists until explicitly deleted

– Ephemeral: deleted when the session expires; not allowed to have children

– Sequential: can be persistent or ephemeral; monotonic sequence counter, helpful for synchronization, e.g.: /app2/p1-0000000001

[Diagram: znode tree. The root / has children /app1 and /app2; /app1 has children /app1/p1, /app1/p2, and /app1/p3; /app2 has child /app2/p1.]
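
The znode types and the version-based conditional update described on this slide can be illustrated with a small in-memory model. This is only a sketch: ToyTree and Znode are hypothetical names, not the ZooKeeper client API, and the model leaves out ACLs, watches, and networking. It assumes sequence numbers are zero-padded to ten digits, matching the /app2/p1-0000000001 example above.

```python
class Znode:
    """One node of the toy tree."""

    def __init__(self, data=b"", owner=None):
        self.data = data
        self.version = 0      # incremented by every successful set()
        self.owner = owner    # session id for ephemeral znodes, else None
        self.next_seq = 0     # counter used for this node's sequential children


class ToyTree:
    """In-memory stand-in for the znode namespace (illustration only)."""

    def __init__(self):
        self.nodes = {"/": Znode()}

    def create(self, path, data=b"", session=None, sequential=False):
        parent_path = path.rsplit("/", 1)[0] or "/"
        parent = self.nodes[parent_path]      # KeyError if the parent is missing
        if parent.owner is not None:
            raise ValueError("ephemeral znodes cannot have children")
        if sequential:
            # Append a zero-padded, monotonically increasing counter to the name.
            path = f"{path}{parent.next_seq:010d}"
            parent.next_seq += 1
        if path in self.nodes:
            raise ValueError(f"{path} already exists")
        self.nodes[path] = Znode(data, owner=session)
        return path

    def set(self, path, data, expected_version):
        node = self.nodes[path]
        if expected_version != node.version:
            raise ValueError("version mismatch")  # conditional update rejected
        node.data = data
        node.version += 1
        return node.version

    def expire_session(self, session):
        # Ephemeral znodes disappear together with the session that created them.
        doomed = [p for p, n in self.nodes.items()
                  if n.owner is not None and n.owner == session]
        for path in doomed:
            del self.nodes[path]


tree = ToyTree()
tree.create("/app2")
print(tree.create("/app2/p1-", sequential=True))
print(tree.create("/app2/p1-", sequential=True))
```

Passing a stale expected_version to set() raises an error, which is how a conditional update lets two writers detect that they raced on the same znode.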

Page 7: ZooKeeper and Embedded ZooKeeper Support for IBM InfoSphere Streams V4.0

ZooKeeper Consistency Guarantees

Sequential Consistency

Updates are applied in the order they are received by ZooKeeper

Atomicity

All or nothing, no partial results

Reliability

Once an update has been applied, it persists from that time forward until it is overwritten by another update

Timeliness

The client view is guaranteed to be up to date within a certain time bound

Single System Image

A client sees the same view of the service regardless of the ZooKeeper server it connects to

Page 8: ZooKeeper and Embedded ZooKeeper Support for IBM InfoSphere Streams V4.0

Embedded ZooKeeper

Managed by Streams (start, stop, etc.) to simplify the Streams prerequisites

Basic domain creation by Domain Manager or via streamtool, e.g.:

streamtool mkdomain -d streamsdomain1 --embeddedzk

Primarily used for a single-node developer environment. It is not recommended for a production environment.

A supervisor (watchdog) process runs side by side with embedded ZooKeeper

Can be manually started or stopped via streamtool (when there is no active domain)

e.g.: To start it: streamtool embeddedzk --start

e.g.: To stop it: streamtool embeddedzk --stop

e.g.: To get its status: streamtool embeddedzk --status

Page 9: ZooKeeper and Embedded ZooKeeper Support for IBM InfoSphere Streams V4.0

Embedded ZooKeeper (cont.)

Embedded ZooKeeper configuration can be set via streamtool

ZooKeeper server configuration parameters are prefixed with: streams.zookeeper.property.

e.g.: To change the client port to 21810:

streamtool setbootproperty streams.zookeeper.property.clientPort=21810

Default Embedded ZooKeeper dataDir location

$HOME/.streams/var/embeddedzk/datadir

Default Embedded ZooKeeper and ZKMonitor log/trace file location

$HOME/.streams/var/embeddedzk
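
The naming rule above (a ZooKeeper server parameter prefixed with streams.zookeeper.property.) can be sketched as a small helper that builds one streamtool setbootproperty command line per setting, in the same form as the clientPort example. The helper name is hypothetical, and which ZooKeeper parameters Streams accepts through this prefix is not stated on the slide, so only clientPort is used here.

```python
PREFIX = "streams.zookeeper.property."


def setbootproperty_commands(zk_settings):
    """Return one 'streamtool setbootproperty' command line per ZooKeeper setting.

    zk_settings maps a ZooKeeper server parameter name (e.g. "clientPort")
    to the value it should take. The commands are only built, not executed.
    """
    return [
        f"streamtool setbootproperty {PREFIX}{name}={value}"
        for name, value in sorted(zk_settings.items())
    ]


for command in setbootproperty_commands({"clientPort": 21810}):
    print(command)
```

Building the strings rather than running them keeps the sketch safe to try on a machine without Streams installed.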

Page 10: ZooKeeper and Embedded ZooKeeper Support for IBM InfoSphere Streams V4.0

Embedded ZooKeeper (cont.)

[Diagram: a single host running embedded ZooKeeper with the ZK Monitor, the Controller, and the Streams components Audit Log, SWS, JMX, AAS, SAM, SRM, View, SCH, HC, and APP.]

Page 11: ZooKeeper and Embedded ZooKeeper Support for IBM InfoSphere Streams V4.0

External ZooKeeper

Not managed by Streams

Standalone or replicated mode

Specify the STREAMS_ZKCONNECT environment variable or the streamtool --zkconnect option, e.g.:

streamtool mkdomain -d streamsdomain2 --zkconnect zkserver1:2181,zkserver2:2181,zkserver3:2181

Enterprise domain, used for multiple users and hosts

For reliability and high availability in a production environment, it is recommended to run ZooKeeper as an ensemble of ZooKeeper servers.

ZooKeeper Ensemble

Writes

– All writes go through the leader

– Global ordering (zxid)

Reads

– Served from memory

– Followers follow the leader (a follower can lag behind the leader, but is eventually consistent)
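
The write and read behaviour listed above can be sketched with a toy model: every write goes through the leader and receives the next zxid, and a follower that has not yet caught up serves older data from memory. This is a simplification under stated assumptions: ToyEnsemble and Server are hypothetical names, the zxid is a plain counter, and the quorum acknowledgement that a real ensemble performs before committing a write is left out.

```python
class Server:
    def __init__(self):
        self.data = {}
        self.last_zxid = 0    # highest transaction id applied so far

    def apply(self, zxid, key, value):
        if zxid != self.last_zxid + 1:
            raise ValueError("updates must be applied in zxid order")
        self.data[key] = value
        self.last_zxid = zxid


class ToyEnsemble:
    def __init__(self, follower_count=2):
        self.leader = Server()
        self.followers = [Server() for _ in range(follower_count)]
        self.log = []         # ordered list of (zxid, key, value)

    def write(self, key, value):
        # All writes go through the leader, which assigns the global order.
        zxid = self.leader.last_zxid + 1
        self.leader.apply(zxid, key, value)
        self.log.append((zxid, key, value))
        return zxid

    def catch_up(self, follower):
        # Replay, in order, every logged update the follower has not seen yet.
        for zxid, key, value in self.log[follower.last_zxid:]:
            follower.apply(zxid, key, value)


ensemble = ToyEnsemble()
ensemble.write("/app1/p1", "v1")
ensemble.write("/app1/p1", "v2")
lagging = ensemble.followers[0]
print(lagging.data)           # still empty: this follower lags the leader
ensemble.catch_up(lagging)
print(lagging.data)           # now matches the leader
```

Because a follower only ever applies updates in zxid order, a lagging read can be stale but never shows updates out of order.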

Page 12: ZooKeeper and Embedded ZooKeeper Support for IBM InfoSphere Streams V4.0

External ZooKeeper (Standalone Mode)

[Diagram: a single host running a standalone ZooKeeper server, the Controller, and the Streams components Audit Log, SWS, JMX, AAS, SAM, SRM, View, SCH, HC, and APP.]

Page 13: ZooKeeper and Embedded ZooKeeper Support for IBM InfoSphere Streams V4.0

External ZooKeeper (Replicated Mode)

[Diagram: Hosts A, B, and C are each shown with a ZooKeeper server (follower on Host A, leader on Host B, follower on Host C), a Controller, and the AAS, SAM, Audit Log, JMX, SRM, SCH, and View services; Host A also shows SWS. Hosts D, E, and F each show a Controller, HC, and APP.]

Page 14: ZooKeeper and Embedded ZooKeeper Support for IBM InfoSphere Streams V4.0

ZooKeeper Guidelines/Best Practices

The ZooKeeper Admin Guide does not recommend standalone mode in a production environment. ZooKeeper runs as an ensemble of ZooKeeper servers. For reliability and availability, run ZooKeeper on at least 3 hosts. Running ZooKeeper on 5 hosts is preferred.

For optimal performance and response time, run the ZooKeeper server on a dedicated machine, and use a dedicated device for the transaction log.

Having a supervisory process that manages each of the ZooKeeper server processes ensures that if the ZooKeeper process exits abnormally, it is restarted automatically and rejoins the cluster.

If you use the default ZooKeeper configuration, ZooKeeper does not remove old snapshots and log files that are stored in the data directory. To configure automatic purging of the old files, you can use the autopurge.snapRetainCount and autopurge.purgeInterval parameters.

Ensure that the value of the maxClientCnxns configuration parameter is high enough to avoid the loss of connections.
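
The 3-host and 5-host recommendation above follows from majority quorum arithmetic: an ensemble stays available only while more than half of its servers are up. The sketch below shows that calculation; the function names are illustrative and not part of any ZooKeeper or Streams tool.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (1, 3, 4, 5):
    print(n, "servers: quorum", quorum_size(n),
          "tolerates", tolerated_failures(n), "failure(s)")
```

Three servers tolerate one failure and five tolerate two, while four tolerate no more than three do, which is why odd ensemble sizes are the usual choice.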

Page 15: ZooKeeper and Embedded ZooKeeper Support for IBM InfoSphere Streams V4.0

ZooKeeper Guidelines/Best Practices (cont.)

ZooKeeper keeps data in memory and in a persistent store. The amount of data that InfoSphere Streams stores in ZooKeeper depends on the application runtime size. A typical amount is three times the application description language (ADL) file size.

The default Java™ heap size for ZooKeeper is the JVM default for the system. If the maximum heap size is not sufficient for the ZooKeeper runtime system and data in memory, increase the size by using the JVMFLAGS environment variable.

Tune JVM GC flags to avoid long garbage collection pauses (Parallel/CMS/Incremental GC)

To avoid disk swapping, ensure that the Java heap size is less than the unused physical memory.

The ZooKeeper Administrator’s Guide recommends having a dedicated disk for the dataLogDir directory that is separate from the dataDir directory. Set the dataLogDir parameter in the ZooKeeper-installation-directory/conf/zoo.cfg file.

Periodically backing up the ZooKeeper data and data log directory is a good practice. Recovering from backups might be necessary in case a catastrophic failure, such as a corrupted disk, occurs.
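
As a rough illustration of the zoo.cfg settings named in these guidelines, the sketch below renders a configuration fragment with dataDir and dataLogDir on separate paths, automatic purging enabled, and a raised connection limit. The parameter names are standard ZooKeeper settings, but every path and value here is a hypothetical placeholder to be replaced with figures suited to the actual hosts.

```python
def render_zoo_cfg(settings):
    """Render a dict of ZooKeeper settings as zoo.cfg 'name=value' lines."""
    return "".join(f"{name}={value}\n" for name, value in settings.items())


example_settings = {
    "dataDir": "/var/lib/zookeeper/data",   # snapshots (hypothetical path)
    "dataLogDir": "/zk-txnlog/log",         # transaction log on its own device (hypothetical path)
    "autopurge.snapRetainCount": 3,         # snapshots to keep when purging
    "autopurge.purgeInterval": 24,          # purge interval in hours; 0 disables purging
    "maxClientCnxns": 200,                  # illustrative value only
}

print(render_zoo_cfg(example_settings), end="")
```

The rendered text is meant to be reviewed and merged by hand into the conf/zoo.cfg file mentioned above, not written over an existing configuration.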

Page 16: ZooKeeper and Embedded ZooKeeper Support for IBM InfoSphere Streams V4.0

ZooKeeper Guidelines/Best Practices (cont.)

If a ZooKeeper follower throws an exception indicating that it has failed to follow the leader, the cause may be:

Network issues

Disk I/O contention

A ZooKeeper snapshot that is too large

This can be resolved by:

Monitoring the network

Reducing I/O contention

Increasing initLimit and syncLimit on all ZooKeeper servers and restarting them
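
When raising initLimit and syncLimit, it helps to remember that both are counted in ticks, so the effective timeout is the limit multiplied by tickTime (in milliseconds). The sketch below does that arithmetic; the function name is illustrative, and the values used are those of the sample configuration shipped with ZooKeeper, not a recommendation for Streams.

```python
def follower_timeouts_ms(tick_time_ms, init_limit, sync_limit):
    """Convert initLimit and syncLimit from ticks to milliseconds."""
    return {
        # Time a follower is given to connect to the leader and load its state.
        "init_timeout_ms": tick_time_ms * init_limit,
        # How far behind the leader a follower may fall before it is dropped.
        "sync_timeout_ms": tick_time_ms * sync_limit,
    }


print(follower_timeouts_ms(tick_time_ms=2000, init_limit=10, sync_limit=5))
```

A large snapshot mainly stresses the initLimit window, because the follower has to transfer and load the whole snapshot before it can start following.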

Page 17: ZooKeeper and Embedded ZooKeeper Support for IBM InfoSphere Streams V4.0

Questions?