Improving Robustness in Distributed Systems
Per Bergqvist (per@synapse.se)
Erlang User Conference 2001
(courtesy CellPoint Systems AB)
Design base
- Cluster of cooperating hosts
- Erlang and C
- COTS hardware based
- Unix based (i.e. Solaris or Linux)
- 10/100/1000 base-T back plane ("system area network")
Cluster
- Shared, distributed system configuration
- Each host has ONE cluster controller
- Dispatch and supervise worker tasks
- Master cluster controller: holds configuration database (persistent replica)
- Slave cluster controller: gets configuration from master cluster controllers
- Cluster is DOWN when all master cluster controllers are inaccessible
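The DOWN rule above can be sketched as a small predicate. This is only a minimal model and not the original implementation; the function name and the host names are hypothetical.

```python
def cluster_is_up(masters, reachable):
    """Return True while at least one master cluster controller is reachable.

    masters   -- hosts running a master cluster controller (hypothetical names)
    reachable -- hosts that currently answer on the back plane
    """
    return any(host in reachable for host in masters)


masters = {"host_a", "host_b"}
print(cluster_is_up(masters, {"host_a", "host_c"}))  # one master still reachable
print(cluster_is_up(masters, {"host_c"}))            # only a slave reachable
```

Slave controllers do not count towards the result, since they only hold configuration obtained from a master.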
Typical system
[Figure: network diagram; labels: Firewall, Switch, Traffic, Control]
Cluster Key Benefits
Single system view
Enforces decoupling of parts of O&M from actual traffic processing
Implementing a cluster
- Cluster -> Host -> Node -> NodeData
- Cluster global parameters
- Subscription mechanisms for configuration changes
- Mnesia as configuration database on master cluster controllers
- Home-brewed configuration distribution to slave controllers (NOT using Mnesia)
- (Worker) node supervision
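The Cluster -> Host -> Node -> NodeData hierarchy and the subscription mechanism could be modelled roughly as follows. This is a sketch under assumptions: all class, field, and method names are hypothetical, and the slides do not show how the real controllers store or distribute the data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    name: str
    data: Dict[str, str] = field(default_factory=dict)  # the "NodeData" level


@dataclass
class Host:
    name: str
    nodes: Dict[str, Node] = field(default_factory=dict)


@dataclass
class Cluster:
    params: Dict[str, str] = field(default_factory=dict)  # cluster global parameters
    hosts: Dict[str, Host] = field(default_factory=dict)
    _subscribers: List[Callable[[str, str], None]] = field(default_factory=list)

    def subscribe(self, callback):
        """Register a callback to be told about configuration changes."""
        self._subscribers.append(callback)

    def set_param(self, key, value):
        """Change a global parameter and notify every subscriber."""
        self.params[key] = value
        for callback in self._subscribers:
            callback(key, value)


cluster = Cluster()
cluster.hosts["host_a"] = Host("host_a", {"worker1": Node("worker1", {"role": "traffic"})})
seen = []
cluster.subscribe(lambda key, value: seen.append((key, value)))
cluster.set_param("log_level", "debug")
print(seen)
```

In the system described here the subscribers would be processes on other hosts, so notification would travel over the distribution layer rather than through a local callback.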
Mnesia gotchas
- First distributed node startup
- Disallow writes when not all replicas are accessible
- Use a timeout on table load, then force load
... BUT ...
TCP based distribution
Network partitioning
Network parameters
- Align TCP retransmission intervals w/ Erlang heartbeats
- Align TCP and IP rerouting parameters
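To see why the two timers need aligning, it helps to put rough numbers side by side. The sketch below is a simplified model, not a description of any particular TCP stack: it assumes plain exponential backoff with a cap, and the parameter names are hypothetical. The heartbeat window of T minus 25% to T plus 25% follows the current Erlang/OTP kernel documentation for `net_ticktime`; that the 2001 release behaved identically is an assumption.

```python
def tcp_give_up_time(initial_rto, max_rto, retries):
    """Seconds until a sender gives up, assuming plain exponential backoff."""
    total = 0.0
    rto = initial_rto
    for _ in range(retries + 1):   # wait after the original send and after each retry
        total += min(rto, max_rto)
        rto *= 2
    return total


def tick_window(net_ticktime):
    """Window in which an unresponsive node is declared down (T -/+ 25%)."""
    quarter = net_ticktime / 4
    return net_ticktime - quarter, net_ticktime + quarter


print(tcp_give_up_time(1.0, 60.0, 5))  # 1 + 2 + 4 + 8 + 16 + 32 = 63.0
print(tick_window(60))                 # (45.0, 75.0)
```

With these example values the TCP layer and the Erlang heartbeat give up within the same window; if the intervals are chosen independently, one layer can declare the peer dead long before the other has noticed anything.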
Typical system II: Dual back plane
[Figure: dual back plane network diagram; labels: Firewall, Switch, Traffic, Control]
Erlang multi-homing problem
[Figure: Host A, Host B, Host C]
Multi-home Erlang w/ TCP
- Add an alias interface to the loopback i/f
- Patch TCP distribution to bind to the alias
- Publish the alias interface via (all wanted) real hw i/f's
  - Method 1: Static routes and gratuitous/proxy ARP
  - Method 2: Use a new (routing) protocol
ARP method
- Implement a utility to:
  - broadcast unsolicited ARP responses
  - respond to ARP requests for the alias i/f address
- Add static routes on all far end systems
- NOTE: all real i/f's need to be on the same IP subnet
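As an illustration of the first task of such a utility, the sketch below builds the bytes of an unsolicited (gratuitous) ARP reply for an alias address. It is not the utility from the talk: the function name, the MAC address, and the IP address are hypothetical, and the frame is only constructed, not sent, since sending needs a raw socket and the matching privileges.

```python
import socket
import struct


def gratuitous_arp_reply(mac, ip):
    """Build an unsolicited ARP reply announcing that `ip` is at `mac`."""
    hw = bytes.fromhex(mac.replace(":", ""))
    addr = socket.inet_aton(ip)
    broadcast = b"\xff" * 6
    # Ethernet header: destination, source, EtherType 0x0806 (ARP)
    ethernet = broadcast + hw + struct.pack("!H", 0x0806)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                # hardware type: Ethernet
        0x0800,           # protocol type: IPv4
        6, 4,             # hardware / protocol address lengths
        2,                # opcode: reply
        hw, addr,         # sender hardware / protocol address
        broadcast, addr,  # target hardware address, same IP as the sender
    )
    return ethernet + arp


frame = gratuitous_arp_reply("02:00:00:00:00:01", "192.0.2.10")
print(len(frame))  # 42
```

Calling the function once per real interface, each time with that interface's MAC address and the same alias IP, gives the announcements the far end needs to learn where the alias currently lives.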
New routing protocol
- Broadcast (Ethernet frames) what you have, including interface priority
- Let the far end select path based on what/when they receive
- Far end dynamically sets up host routes
- Use short retransmission intervals
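The far-end selection step could look something like the sketch below. The slides do not specify the message format or the selection rule, so everything here is an assumption: the class and method names are hypothetical, a lower priority number is taken to mean "preferred", and an interface is dropped once no announcement has arrived within `max_age` seconds.

```python
class PathTable:
    """Far-end view of the interfaces a peer has announced for its alias."""

    def __init__(self, max_age):
        self.max_age = max_age   # seconds before an announcement expires
        self.paths = {}          # interface -> (priority, last_seen)

    def announce(self, interface, priority, now):
        """Record a received broadcast for one of the peer's interfaces."""
        self.paths[interface] = (priority, now)

    def best(self, now):
        """Return the preferred live interface: lowest priority number, then most recent."""
        alive = {
            iface: (prio, seen)
            for iface, (prio, seen) in self.paths.items()
            if now - seen <= self.max_age
        }
        if not alive:
            return None
        return min(alive, key=lambda iface: (alive[iface][0], -alive[iface][1]))


table = PathTable(max_age=0.5)
table.announce("eth0", priority=1, now=0.0)
table.announce("eth1", priority=2, now=0.0)
print(table.best(now=0.2))                    # eth0
table.announce("eth1", priority=2, now=0.6)   # eth0 has stopped announcing
print(table.best(now=0.9))                    # eth1
```

Whenever `best` changes, the far end would install a host route to the alias address through the chosen interface. Short retransmission intervals keep `max_age`, and with it the failover time, small.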
Erlang multi-homing resolved?
[Figure: Host A, Host B, Host C]
Summing up
- Erlang can support multihoming with some additional work
- By using a loopback alias i/f, link failure becomes a routing problem (the peer-to-peer association is kept intact)
- Solaris TCP/IP stack parameters are:
  - hard to find (only in out-of-date app. notes)
  - hard to set "right"
  - host global
- A distribution mechanism with built-in support for multi-homing is preferred