Distributed Coordination with Python
![Page 1: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/1.jpg)
DISTRIBUTED COORDINATION WITH PYTHON
Ben Bangert
Mozilla
![Page 2: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/2.jpg)
Tools of the Trade
![Page 3: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/3.jpg)
DISTRIBUTED COORDINATION IS NOT...
• Distributed Databases (Cassandra, Riak)
• Distributed Computing (Hadoop, etc.)
• Distributed Event Analysis (Storm)
![Page 4: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/4.jpg)
The Common Element
![Page 5: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/5.jpg)
Apache ZooKeeper
![Page 6: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/6.jpg)
ZooKeeper is a centralized service for maintaining configuration information,
naming, providing distributed synchronization, and providing group services.
![Page 7: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/7.jpg)
ZOOKEEPER
![Page 8: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/8.jpg)
WHY NOT USE...
• Memcached?
• MongoDB?
• Postgres/MySQL?
![Page 9: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/9.jpg)
![Page 10: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/10.jpg)
![Page 11: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/11.jpg)
Hierarchical data structure in znodes
![Page 12: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/12.jpg)
![Page 13: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/13.jpg)
• Session Based
• Znode watches
• Ephemeral and Sequential Znodes
![Page 14: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/14.jpg)
EPHEMERAL ZNODES
• Last for the duration of the client session
• Session dies when the connection is closed or expires
• Can’t have children znodes
![Page 15: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/15.jpg)
SEQUENTIAL ZNODES
• Supply a node name (or not), get node name back with a trailing sequence number (0001, 0002, 0003, etc.)
• Can be combined with ephemeral flag
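A minimal sketch of combining the two flags, assuming `client` is an already started kazoo `KazooClient`. The `/workers` parent, the `worker-` prefix, and both function names are hypothetical. The parsing helper relies on ZooKeeper's documented ten-digit, zero-padded counter format (the slide abbreviates it as 0001, 0002, ...).

```python
def register_worker(client, parent="/workers", prefix="worker-", data=b""):
    """Create an ephemeral, sequential znode and return its full path.

    `client` is assumed to be a started kazoo KazooClient; the parent path
    and prefix are hypothetical names chosen for this example.
    """
    client.ensure_path(parent)
    # ZooKeeper appends a zero-padded counter to the supplied name and
    # deletes the node when this client's session ends.
    return client.create(
        f"{parent}/{prefix}", data, ephemeral=True, sequence=True
    )


def sequence_number(node_name):
    """Return the trailing counter of a sequential znode name.

    Assumes the standard ten-digit suffix that ZooKeeper appends.
    """
    return int(node_name[-10:])
```

The returned path is worth keeping: it is the only way for the client to recognize its own node among the parent's children later.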
![Page 16: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/16.jpg)
BASIC COMMANDS
• create(PATH, DATA...)
• get(PATH...)
• get_children(PATH...)
• set(PATH, DATA...)
• delete(PATH...)
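As a rough illustration, the five commands map onto kazoo client methods of the same names. The sketch below assumes `client` is a started `KazooClient`; the `/demo/app` path and the function name are made up for the example.

```python
def basic_commands_demo(client, path="/demo/app"):
    """Exercise create, get, set, get_children and delete on one znode.

    `client` is assumed to be a started kazoo KazooClient; `path` is a
    hypothetical znode path used only for illustration.
    """
    parent = path.rsplit("/", 1)[0]

    # create(PATH, DATA...): makepath=True also creates missing parents.
    client.create(path, b"v1", makepath=True)

    # get(PATH...): returns a (data, stat) tuple; data is bytes.
    original, _stat = client.get(path)

    # set(PATH, DATA...): overwrite the node's data.
    client.set(path, b"v2")
    updated, _stat = client.get(path)

    # get_children(PATH...): child names relative to the parent.
    children = client.get_children(parent)

    # delete(PATH...): remove the node again.
    client.delete(path)
    return original, updated, children
```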
![Page 17: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/17.jpg)
![Page 18: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/18.jpg)
PYTHON CLIENTS
• txzookeeper
• kazoo
  • unified client that works with gevent
  • implements wire protocol in pure Python
![Page 19: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/19.jpg)
USE KAZOO
![Page 20: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/20.jpg)
EASY TO USE
from kazoo.client import KazooClient

client = KazooClient()
client.start()
![Page 21: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/21.jpg)
USE CASES
![Page 22: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/22.jpg)
CONFIGURATION
• Store settings in node data
• Organize node structure
• Set watches on nodes of interest
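One way these three points might fit together is sketched below: settings stored as JSON in a znode, with kazoo's `DataWatch` recipe delivering updates. The JSON encoding, the function name, and the callback shape are assumptions for illustration, not something the slides prescribe; `client` is assumed to be a started `KazooClient`.

```python
import json


def watch_config(client, path, on_change):
    """Call on_change(settings) now and each time the znode's data changes.

    `client` is assumed to be a started kazoo KazooClient. Storing the
    settings as a JSON object is an assumption made for this example.
    """
    def _handle(data, stat):
        # data is None when the node does not exist (or has been deleted).
        settings = json.loads(data.decode("utf-8")) if data else {}
        on_change(settings)

    # ZooKeeper watches fire once; the DataWatch recipe re-registers the
    # watch after every event so the callback keeps receiving updates.
    client.DataWatch(path, _handle)
```

A caller might pass `current.update` as `on_change` to keep a local dict of settings in sync.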
![Page 23: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/23.jpg)
![Page 24: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/24.jpg)
PARTY MEMBERSHIP
• Join a party, find out who else is around
• Elect a leader if desired
• Recipe in Kazoo
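A short sketch of the kazoo recipes this slide refers to, assuming `client` is a started `KazooClient`. `Party` and `Election` are kazoo recipe names; the wrapper functions, paths, and identifiers are hypothetical.

```python
def join_party(client, path, identifier):
    """Join the party at `path` and return (party, current member ids).

    `client` is assumed to be a started kazoo KazooClient.
    """
    party = client.Party(path, identifier)
    party.join()
    # Iterating a Party yields the identifiers of everyone currently joined.
    return party, sorted(party)


def lead_when_elected(client, path, identifier, lead):
    """Block until this client wins the election, then call lead()."""
    election = client.Election(path, identifier)
    # run() waits for leadership and invokes the callable while leading.
    election.run(lead)
```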
![Page 25: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/25.jpg)
LOCKS
• Lock a resource for a single client
• Lock a resource for multiple clients (Semaphore)
• Hard to write properly
• Recipe in Kazoo
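The kazoo recipes can be used as context managers. The sketch below assumes `client` is a started `KazooClient`; the wrapper names and their parameters are hypothetical, while `Lock` and `Semaphore` (with its `max_leases` argument) are kazoo recipes.

```python
def run_exclusively(client, path, identifier, work):
    """Run work() while holding the distributed lock at `path`."""
    lock = client.Lock(path, identifier)
    # Entering the block waits for the lock; leaving it releases the lock.
    with lock:
        return work()


def run_with_lease(client, path, identifier, max_leases, work):
    """Run work() while holding one of `max_leases` semaphore leases."""
    semaphore = client.Semaphore(path, identifier, max_leases=max_leases)
    with semaphore:
        return work()
```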
![Page 26: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/26.jpg)
BUILDING HIGHER LEVEL ABSTRACTIONS ON ZOOKEEPER
![Page 27: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/27.jpg)
CAVEAT
![Page 28: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/28.jpg)
DO NOT IMPLEMENT YOURSELF
USE THE RECIPE
![Page 29: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/29.jpg)
![Page 30: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/30.jpg)
BASIC STEPS
• Create lock parent node if needed
• Create ephemeral+sequence node under parent, store node name returned
• Get children of lock node
• Sort children list by sequence number
• First child in the list has the lock!
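Purely to illustrate the steps above (the talk's caveat stands: real code should use the kazoo recipe), here is a simplified, non-blocking sketch. It assumes `client` is a started `KazooClient`; the `/locks/resource` path, the `lock-` prefix, and the function names are hypothetical, and it omits waiting, retries, and failure handling.

```python
def sequence_of(node_name):
    """Trailing counter of a sequential znode (assumes ten-digit suffix)."""
    return int(node_name[-10:])


def try_lock(client, parent="/locks/resource", prefix="lock-"):
    """Return (my_node, holds_lock) after a single attempt.

    Illustrative only: a real lock must also wait its turn and cope with
    connection loss, which the kazoo Lock recipe already does.
    """
    # 1. Create the lock parent node if needed.
    client.ensure_path(parent)
    # 2. Create an ephemeral+sequence node and remember the returned name.
    my_path = client.create(
        f"{parent}/{prefix}", b"", ephemeral=True, sequence=True
    )
    my_node = my_path.rsplit("/", 1)[1]
    # 3. Get the children of the lock node.
    children = client.get_children(parent)
    # 4. Sort the children by sequence number.
    children.sort(key=sequence_of)
    # 5. The first child in the list has the lock.
    return my_node, children[0] == my_node
```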
![Page 31: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/31.jpg)
THINGS TO WATCH OUT FOR
• Avoid the thundering herd, use watches only when needed
• When our node isn’t the lowest, watch the one in front of us
• Only one client wanting a lock is ‘woken’ when the lock is released by a different client
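The "watch the one in front of us" rule comes down to picking a single predecessor. A small sketch, with a hypothetical function name and assuming node names end in ZooKeeper's ten-digit sequence counter:

```python
def node_to_watch(children, my_node):
    """Return the child directly ahead of my_node, or None if we are first.

    Watching only this predecessor means a released lock wakes exactly one
    waiting client instead of every client watching the parent node.
    """
    ordered = sorted(children, key=lambda name: int(name[-10:]))
    position = ordered.index(my_node)
    if position == 0:
        return None  # Lowest sequence number: we already hold the lock.
    return ordered[position - 1]
```

With children `lock-0000000010`, `lock-0000000011`, and `lock-0000000012`, the client owning `...11` watches `...10`, and the client owning `...12` watches `...11`.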
![Page 32: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/32.jpg)
HANDLING FAILURE
![Page 33: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/33.jpg)
ROBUST CODE TAKES EFFORT
• What happens when a server fails?
• What happens when the client fails?
• What happens when we don’t know if the server has failed?
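One possible way to handle the "we don't know" case is a connection state listener that pauses work while the connection is uncertain. The sketch below is an assumption-laden illustration: `ConnectionGuard` is a hypothetical helper, and the three string constants are meant to mirror the values of `kazoo.client.KazooState` so the example does not need kazoo installed.

```python
import threading

# Assumed to match kazoo.client.KazooState.SUSPENDED / CONNECTED / LOST.
SUSPENDED, CONNECTED, LOST = "SUSPENDED", "CONNECTED", "LOST"


class ConnectionGuard:
    """Track whether it is currently safe to act on ZooKeeper-held state."""

    def __init__(self):
        self.safe_to_work = threading.Event()
        self.session_lost = threading.Event()

    def listener(self, state):
        # Listeners should return quickly, so this only flips flags that
        # the main thread checks.
        if state == CONNECTED:
            self.safe_to_work.set()
        elif state == SUSPENDED:
            # Connection interrupted: the session may or may not survive,
            # so stop acting as if locks and ephemeral nodes still exist.
            self.safe_to_work.clear()
        elif state == LOST:
            # Session expired: ephemeral nodes and locks are gone.
            self.safe_to_work.clear()
            self.session_lost.set()
```

It would be registered with `client.add_listener(guard.listener)`; the main loop can then wait on `safe_to_work` before each unit of work and shut down once `session_lost` is set.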
![Page 34: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/34.jpg)
STOPPING WHEN UNCERTAIN
![Page 35: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/35.jpg)
A BIT BETTER VERSION...
![Page 36: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/36.jpg)
EVEN BETTER
![Page 37: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/37.jpg)
FAILURE WILL HAPPEN
• Fail fast, fail completely.
• Session expiration is a good time to sys.exit
• Always include jitter (kazoo includes jitter on its connection and command retry operations)
• Consider what exceptions can occur in any code relying on a distributed system
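To make the jitter point concrete, here is a generic sketch of exponential backoff with randomized delays. It is not kazoo's own retry implementation; the function name, parameters, and default values are all assumptions chosen for illustration.

```python
import random


def retry_delays(attempts, base=0.1, factor=2.0, cap=5.0, rng=random):
    """Return one randomized sleep duration (in seconds) per retry attempt.

    Each delay is drawn uniformly between 0 and an exponentially growing
    ceiling, so clients that failed together do not all retry together.
    """
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * factor ** attempt)
        delays.append(rng.uniform(0, ceiling))
    return delays
```

Without the random component, every client that lost its connection at the same moment would reconnect at the same moment too, which is another form of the thundering herd.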
![Page 38: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/38.jpg)
• Distributed systems are hard
• Use existing battle-proven tools (ZooKeeper, Kazoo)
• Always consider everything that can fail, and how
• Be wary of tools that don’t tell you how they fail
• Read Kyle Kingsbury’s Jepsen posts to see examples of systems failing: http://aphyr.com/tags/jepsen
![Page 39: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/39.jpg)
FIN
![Page 40: Distributed Coordination with Python](https://reader036.fdocuments.us/reader036/viewer/2022081413/54821564b4af9f4b418b4751/html5/thumbnails/40.jpg)
QUESTIONS?