Yair Amir, 10 Dec 02
Distributed Systems Research that Makes a Difference
Center for Networking and Distributed Systems, Johns Hopkins University
With: Baruch Awerbuch, R. Sean Borgstrom, Ryan Caudy, Claudiu Danilov, Ashima Munjal, Cristina Nita-Rotaru, Theo Schlossnagle, Jonathan Stanton, Ciprian Tutu
www.cnds.jhu.edu
Yair Amir
Distributed Systems
• A distributed system is:
  – A collection of computers.
  – Connected by a network.
  – With a common goal.
• We at CNDS care about:
  – High availability.
  – High performance.
Hopkins/CNDS Research Objective
• Understand the problem: distributed communication and information access.
  – Availability, consistency, performance tradeoffs…
  – Local / wide area networks, latency, throughput…
• Invent correct and efficient algorithmic solutions to cope with the problematic aspects of distributed systems.
• Develop a small set of generic software tools encapsulating these solutions.
• Use these software tools to create a paradigm shift in the way distributed systems are actually built:
  – Through actual use of these software tools.
  – Through education.
Outline
• CNDS vision.
• Group communication.
  – What is it?
  – The Spread toolkit.
  – Distributed logging with Spread.
• Metacomputers.
  – Static and dynamic resource allocation.
  – The Cost-Benefit framework.
  – Backhand: load balancing a web cluster.
• Wackamole.
  – High availability for clusters.
  – High availability for edge routers.
• Making a difference.
Group Communication: A different communication paradigm
[Figure: clients (C) that belong to groups a, b, c, and d; one message is sent to group a and another to group b.]
Group Communication: A different communication paradigm
• Handles potential problems in the network: message loss, machine downtime, network partitions, security threats.
• Delivery guarantees: Unreliable, Reliable, Safe (stable).
• Message ordering: Unordered, FIFO, Agreed order.
• Membership services.
• Group security: Unsecure, Signed, Encrypted.
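FIFO ordering only promises that messages from the same sender are delivered in the order that sender sent them. A minimal sketch of the receiving side, assuming a hypothetical message format in which each message carries its sender id and a per-sender sequence number; it is a generic hold-back queue, not Spread's implementation, and it does not model loss recovery.

```python
from collections import defaultdict


class FifoReceiver:
    """Delivers each sender's messages in the order that sender sent them.

    Assumes a hypothetical message format: sender id plus a per-sender
    sequence number starting at 0. Loss recovery is not modeled.
    """

    def __init__(self) -> None:
        self.next_seq: dict[str, int] = defaultdict(int)  # next expected number per sender
        self.held: dict[str, dict[int, str]] = defaultdict(dict)  # early arrivals, held back
        self.delivered: list[tuple[str, str]] = []  # (sender, payload) in delivery order

    def on_receive(self, sender: str, seq: int, payload: str) -> None:
        if seq < self.next_seq[sender]:
            return  # duplicate of a message that was already delivered
        self.held[sender][seq] = payload
        # Deliver the contiguous run that starts at the next expected number.
        while self.next_seq[sender] in self.held[sender]:
            ready = self.held[sender].pop(self.next_seq[sender])
            self.delivered.append((sender, ready))
            self.next_seq[sender] += 1


receiver = FifoReceiver()
receiver.on_receive("A", 1, "a1")  # arrives early: held back
receiver.on_receive("B", 0, "b0")  # delivered at once
receiver.on_receive("A", 0, "a0")  # releases a0, then a1
print(receiver.delivered)  # [('B', 'b0'), ('A', 'a0'), ('A', 'a1')]
```

Agreed order is stronger: every member must also interleave different senders' messages identically, which per-sender FIFO alone does not provide.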
What Can You Do With Group Communication?
• Efficient server replication:
  – High throughput.
  – High availability.
• And also:
  – Distributed system management.
  – Conferencing.
  – Collaborative design.
  – Distributed simulations.
  – …
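Replication works because every replica applies the same operations in the same agreed order, so deterministic replicas end in the same state. The sketch below gets agreed order from a hypothetical central sequencer; that is a swapped-in, simpler technique chosen for illustration (Spread reaches agreed order with its own token-based ring protocol), and it ignores sequencer failure.

```python
import itertools


class Sequencer:
    """Hypothetical central sequencer that stamps each message with a global number."""

    def __init__(self) -> None:
        self._counter = itertools.count()

    def stamp(self, op: tuple[str, int]) -> tuple[int, tuple[str, int]]:
        return next(self._counter), op


class Replica:
    """A replica that applies operations strictly in global-sequence order."""

    def __init__(self) -> None:
        self.state: dict[str, int] = {}
        self.next_seq = 0
        self.pending: dict[int, tuple[str, int]] = {}

    def receive(self, seq: int, op: tuple[str, int]) -> None:
        self.pending[seq] = op
        while self.next_seq in self.pending:
            key, value = self.pending.pop(self.next_seq)
            self.state[key] = value  # deterministic update: last write in agreed order wins
            self.next_seq += 1


sequencer = Sequencer()
updates = [sequencer.stamp(("x", 1)), sequencer.stamp(("x", 2)), sequencer.stamp(("y", 7))]
r1, r2 = Replica(), Replica()
for seq, op in updates:            # r1 sees the messages in order
    r1.receive(seq, op)
for seq, op in reversed(updates):  # r2 sees them in reverse network order
    r2.receive(seq, op)
print(r1.state == r2.state, r1.state)  # True {'x': 2, 'y': 7}
```

Without the agreed order, r2 would have applied ("x", 2) before ("x", 1) and ended with x = 1.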
Spread: A Group Communication Toolkit
• Client–daemon architecture.
• Very simple API (basically 6 calls): C/C++, Java, Perl, Python…
• Cross platform: Windows, Solaris, Alpha, MacOS, embedded…
• Part of OpenLinux, Debian, FreeBSD.
http://www.spread.org
[Figure: clients (C) connect to Spread daemons (S); the clients belong to groups a, b, c, and d.]
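The six-call API follows a connect / join / leave / multicast / receive / disconnect pattern; in Spread's C library these are SP_connect, SP_join, SP_leave, SP_multicast, SP_receive, and SP_disconnect. The sketch below is a hypothetical in-process toy, not Spread's API or a binding to it: its method names and behavior are assumptions that only illustrate the client–daemon shape and open group semantics (a sender does not need to join a group to send to it). There is no networking, ordering, or failure handling.

```python
from collections import defaultdict, deque
from typing import Optional


class ToyDaemon:
    """Hypothetical in-process stand-in for a group-communication daemon."""

    def __init__(self) -> None:
        self.groups: dict[str, set[str]] = defaultdict(set)  # group -> member clients
        self.mailboxes: dict[str, deque] = {}                  # client -> pending messages

    def connect(self, client: str) -> None:
        self.mailboxes[client] = deque()

    def disconnect(self, client: str) -> None:
        for members in self.groups.values():
            members.discard(client)
        del self.mailboxes[client]

    def join(self, client: str, group: str) -> None:
        self.groups[group].add(client)

    def leave(self, client: str, group: str) -> None:
        self.groups[group].discard(client)

    def multicast(self, sender: str, group: str, msg: str) -> None:
        # Open group semantics: the sender does not have to be a member of `group`.
        for member in sorted(self.groups[group]):
            self.mailboxes[member].append((sender, group, msg))

    def receive(self, client: str) -> Optional[tuple[str, str, str]]:
        box = self.mailboxes[client]
        return box.popleft() if box else None


daemon = ToyDaemon()
for name in ("web1", "logger1", "logger2"):
    daemon.connect(name)
daemon.join("logger1", "logs")
daemon.join("logger2", "logs")
daemon.multicast("web1", "logs", "GET /index.html 200")  # web1 is not a member
print(daemon.receive("logger1"))  # ('web1', 'logs', 'GET /index.html 200')
print(daemon.receive("web1"))     # None
```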
A Spread Overlay Network
• Uses membership knowledge to optimize performance.
[Figure: overlay network connecting sites UCSB, OSU, Rutgers, Mae East, Hopkins, and CNDS, annotated with hardware broadcast, hardware multicast, and IP multicast.]
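One way membership knowledge can save bandwidth: if every daemon knows which overlay sites currently host members of a group, a message only needs to cross links that lead toward such sites. A minimal sketch, assuming the overlay links form a tree (a graph with cycles would need a visited set) and using hypothetical site names; it illustrates the idea and is not Spread's routing algorithm.

```python
from typing import Optional


def forwarding_links(adjacency: dict[str, list[str]], source: str,
                     member_sites: set[str]) -> list[tuple[str, str]]:
    """Overlay links a group message from `source` must cross.

    Assumes the overlay links form a tree (no cycles). A link is used only if
    the subtree behind it contains at least one site hosting a group member.
    """
    links: list[tuple[str, str]] = []

    def visit(node: str, parent: Optional[str]) -> bool:
        wanted = node in member_sites
        for neighbor in adjacency[node]:
            if neighbor != parent and visit(neighbor, node):
                links.append((node, neighbor))
                wanted = True
        return wanted

    visit(source, None)
    return sorted(links)


# Hypothetical five-site overlay tree; only site D hosts members of the group.
overlay = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E"], "D": ["B"], "E": ["C"]}
print(forwarding_links(overlay, "A", {"D"}))  # [('A', 'B'), ('B', 'D')]
```

Without membership knowledge the message would have to reach all five sites; here the C and E branch is pruned.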
Distributed Logging
• Think of a cluster of 50 (web) servers.
• Operation logs for all of the servers should be consolidated in one logical place.
• The aggregate log should be resilient (so it must be replicated).
Distributed Logging (cont.)
Today, mainly a service for local area clusters.
[Figure: a (web) cluster.]
Useful Spread properties:
• Scalability with the number of groups.
• Open group semantics: good support for publish-subscribe.
• Membership services.
• Efficient for small messages.
• Small footprint: low overhead.
Implementation
• Option 1: Distributed syslogd.
  – Adding a Spread group as a potential destination of the operating system logger, syslogd (in addition to file, screen, socket).
• Option 2: mod_log_spread.
  – An Apache module enabling logging through Spread (courtesy of George Schlossnagle).
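Option 1 treats a group as one more log destination. Python's standard logging module has the same shape, since a logger fans each record out to several handlers, so the idea can be sketched there by analogy (this is not syslogd or mod_log_spread). The `publish(group, message)` transport is a hypothetical in-memory stand-in, not a Spread binding; the two subscribed "replicas" show why the aggregate log survives the loss of one logging machine.

```python
import logging
from typing import Callable


class GroupHandler(logging.Handler):
    """Hypothetical logging destination that publishes each record to a group.

    `publish(group, message)` stands in for a group-communication client;
    it is an assumption of this sketch, not a real Spread binding.
    """

    def __init__(self, publish: Callable[[str, str], None], group: str) -> None:
        super().__init__()
        self.publish = publish
        self.group = group

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self.publish(self.group, self.format(record))
        except Exception:
            self.handleError(record)


# In-memory stand-in: two replicated loggers subscribed to the same group.
replicas: dict[str, list[str]] = {"logger1": [], "logger2": []}


def publish(group: str, message: str) -> None:
    for log in replicas.values():
        log.append(f"[{group}] {message}")


web_log = logging.getLogger("web1")
web_log.setLevel(logging.INFO)
web_log.addHandler(logging.StreamHandler())  # an ordinary "screen" destination
group_handler = GroupHandler(publish, "access_logs")
group_handler.setFormatter(logging.Formatter("%(name)s %(message)s"))
web_log.addHandler(group_handler)  # the group is just one more destination

web_log.info("GET /index.html 200")
print(replicas["logger1"])  # ['[access_logs] web1 GET /index.html 200']
```

With a real group-communication transport providing agreed order, every replica would also record entries from different servers in the same order.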
The Real Challenge
• Surprisingly, not the basic protocols.
• But:
  – Hostile environment: average load on the servers may top 100!
  – They keep adding these servers...
  – Has to be rock-solid, running constantly for days.
====================================================
Status at bp12 V3.12 (state 1, gstate 1) after 81113 seconds :
Membership : 34 procs in 1 segments, leader is bp12
rounds : 2796352 tok_hurry : 8537 memb change: 5
sent pack: 625516 recv pack : 27229735 retrans : 553
u retrans: 460 s retrans : 93 b retrans : 0
My_aru: 21829811 Aru : 21829718 Highest seq: 21829811
Sessions : 67 Groups : 2 Window : 60
Deliver M: 27552100 Deliver Pk: 27855251 Pers Window: 15
Delta Mes: 8076 Delta Pack: 8166 Delta sec : 10
====================================================
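Two counters in this status output, My_aru and Aru, relate to Safe delivery. Assuming the usual ring-protocol reading (an interpretation, not stated on the slide), Aru is "all received up to": the highest sequence number every member is known to have received, while My_aru is that value for this daemon alone. A message can be delivered as Safe once its sequence number is at or below Aru; here 21829811 − 21829718 = 93 messages had reached bp12 but were not yet known to be received by all members. A minimal sketch of that rule, with hypothetical values for the other members:

```python
def all_received_up_to(member_arus: dict[str, int]) -> int:
    """Highest sequence number known to be received by every member."""
    return min(member_arus.values())


def safe_to_deliver(buffered: list[int], member_arus: dict[str, int]) -> list[int]:
    """Buffered sequence numbers that every member has received (stable)."""
    aru = all_received_up_to(member_arus)
    return sorted(seq for seq in buffered if seq <= aru)


# bp12's value comes from the status output; bp07 and bp30 are hypothetical.
arus = {"bp12": 21829811, "bp07": 21829718, "bp30": 21829790}
print(all_received_up_to(arus))                               # 21829718
print(21829811 - all_received_up_to(arus))                    # 93
print(safe_to_deliver([21829717, 21829718, 21829719], arus))  # [21829717, 21829718]
```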
Distributed Logging (cont.)
• Once you have it running, there is more:
  – Real-time monitoring.
  – Real-time data mining and customization.
Static and Dynamic Distributed Resource Allocation
We looked at two metacomputing systems: PVM and Mosix.
• PVM (Parallel Virtual Machine):
  – Static assignment of jobs to machines.
  – Default: round-robin assignment policy.
  – Widely used.
• Mosix:
  – Dynamic process migration.
  – Main objective is load balancing, with some ad hoc heuristics for memory depletion.
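The cost of ignoring load can be seen with a toy placement. The sketch contrasts a load-blind round-robin policy with a load-aware "least loaded machine" policy, assuming hypothetical job sizes known in advance. It is not PVM's code, and it models only initial placement: Mosix instead migrates already running processes, which this sketch does not attempt.

```python
from itertools import cycle


def round_robin(jobs: list[float], machines: int) -> list[float]:
    """Load-blind static placement: job i goes to machine i mod n."""
    load = [0.0] * machines
    for job, m in zip(jobs, cycle(range(machines))):
        load[m] += job
    return load


def least_loaded(jobs: list[float], machines: int) -> list[float]:
    """Load-aware placement: each job goes to the currently lightest machine."""
    load = [0.0] * machines
    for job in jobs:
        m = min(range(machines), key=load.__getitem__)  # ties go to the lowest index
        load[m] += job
    return load


jobs = [8.0, 1.0, 8.0, 1.0, 8.0, 1.0]  # hypothetical job sizes, alternating heavy/light
print(round_robin(jobs, 2))   # [24.0, 3.0]
print(least_loaded(jobs, 2))  # [17.0, 10.0]
```

An unlucky arrival pattern puts every heavy job on one machine under round robin, while the load-aware policy keeps the busiest machine at 17 units instead of 24.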
A Cost-Benefit Approach to Distributed Resource Allocation
[Figure: machines 1, 2, ..., n, each with Resource A (CPU), Resource B (memory), and Resource C (I/O); for every resource, cost rises with usage.]
$$\mathrm{cost}(r) = n^{\mathrm{utilization}(r)}$$
Machine_cost = cost(CPU) + cost(memory) + cost(I/O)
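Reading n as the number of machines (as the 1, 2, ..., n machine labels suggest), the cost of a resource grows exponentially with its utilization, from 1 when idle to n when saturated, so adding work to an already busy resource is expensive. The sketch evaluates that formula and adds one natural way to use it: send a job to the machine whose Machine_cost increases least. That assignment rule and the job/utilization representation are assumptions for illustration, not a transcription of EPVM or EMosix.

```python
import math


def resource_cost(utilization: float, n: int) -> float:
    """Cost of one resource at utilization in [0, 1]: n ** utilization."""
    return n ** utilization


def machine_cost(util: dict[str, float], n: int) -> float:
    """Machine_cost = cost(CPU) + cost(memory) + cost(I/O)."""
    return sum(resource_cost(u, n) for u in util.values())


def choose_machine(job: dict[str, float], machines: dict[str, dict[str, float]]) -> str:
    """Pick the machine whose cost rises least when it takes the job (hypothetical rule)."""
    n = len(machines)  # assumption: n is the number of machines
    best, best_delta = "", math.inf
    for name, util in machines.items():
        after = {r: u + job.get(r, 0.0) for r, u in util.items()}
        if any(u > 1.0 for u in after.values()):
            continue  # the job does not fit on this machine
        delta = machine_cost(after, n) - machine_cost(util, n)
        if delta < best_delta:
            best, best_delta = name, delta
    if not best:
        raise ValueError("no machine can take the job")
    return best


cluster = {
    "m1": {"cpu": 0.90, "memory": 0.20, "io": 0.10},  # busy CPU
    "m2": {"cpu": 0.10, "memory": 0.20, "io": 0.10},  # mostly idle
}
print(choose_machine({"cpu": 0.05}, cluster))  # m2
```

Because the cost curve is convex, the same 0.05 of CPU costs more on the busy machine (about 0.066 here) than on the idle one (about 0.038).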
New Assignment Policy
Based on the Cost-Benefit framework, we have created two additional policies:
• Enhanced PVM (EPVM).
• Enhanced Mosix (EMosix).
We compared the performance of the four policies according to the average slowdown criterion.
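Assuming the conventional definition of slowdown (a job's completion time under the policy being evaluated, divided by its run time on a dedicated, unloaded machine), the criterion averages that ratio over all J jobs:

```latex
\mathrm{slowdown}(j) = \frac{T_j^{\text{policy}}}{T_j^{\text{dedicated}}},
\qquad
\overline{\mathrm{slowdown}} = \frac{1}{J} \sum_{j=1}^{J} \mathrm{slowdown}(j)
```

For example, three jobs with slowdowns 1, 2, and 3 give an average slowdown of 2; a value of 1 would mean no job was delayed by sharing the cluster.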
Simulation: PVM / EPVM
[Figure: average slowdown (lower is better), PVM on the x-axis vs. Enhanced PVM on the y-axis, both axes from 0 to 80.]
Simulation: Mosix / EMosix
[Figure: average slowdown (lower is better), MOSIX on the x-axis vs. Enhanced MOSIX on the y-axis, both axes from 0 to 70.]
Real Life: PVM / EPVM
[Figure: average slowdown (lower is better), PVM on the x-axis vs. Enhanced PVM on the y-axis, both axes from 0 to 35.]
Interesting: Mosix / EPVM
[Figure: average slowdown (lower is better), Mosix on the x-axis vs. Enhanced PVM on the y-axis, both axes from 0 to 30.]
How general is the framework? Managing a Web farm
• Applying the same concept to manage the resources of a Web farm.
• Web servers in the farm may have different capabilities. Tasks may have different requirements.
• Hard problems:
  – Relatively short-lived tasks.
  – Stale information (welcome to the real world).
  – Inaccurate information (welcome to the real world; did we say that?).
Backhand: Load Balancing a Web Cluster
• Peer-to-peer, cost-based resource allocation decisions.
• The necessary “machinery” is implemented as an Apache module to minimize overhead.
• Linux, Solaris, FreeBSD, Windows*
• Part of SuSE Linux and OpenLinux.
• www.backhand.org
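In a peer-to-peer design, each server makes its own cost-based decision from a possibly stale view of its peers. A hypothetical sketch of one such per-request decision: ignore peers whose last report is older than a cutoff, then pick the cheapest of the rest. The PeerInfo fields and the max_age rule are assumptions for illustration; mod_backhand's actual decision logic is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class PeerInfo:
    name: str
    cost: float         # cost-based score last reported by the peer (lower is better)
    reported_at: float  # time of that report, in seconds


def pick_server(peers: list[PeerInfo], now: float, max_age: float) -> str:
    """Choose the cheapest peer, ignoring reports older than `max_age` seconds.

    If every report is stale, fall back to the stale data rather than fail.
    Raises ValueError if `peers` is empty.
    """
    fresh = [p for p in peers if now - p.reported_at <= max_age]
    candidates = fresh or peers
    return min(candidates, key=lambda p: p.cost).name


view = [
    PeerInfo("web1", cost=3.0, reported_at=100.0),
    PeerInfo("web2", cost=1.5, reported_at=90.0),   # cheapest, but 11 s old at t=101
    PeerInfo("web3", cost=2.0, reported_at=99.5),
]
print(pick_server(view, now=101.0, max_age=5.0))   # web3
print(pick_server(view, now=101.0, max_age=20.0))  # web2
```

The cutoff trades off the two "real world" problems: too strict and good servers are ignored; too loose and the balancer keeps chasing a load figure that has long since changed.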
Backhand in Action
Wackamole
Wackamole: Dynamic Cluster with N-way Fail-over
• In contrast to available gateway solutions (Alteon, BIG-IP, Cisco’s LocalDirector, …).
• Exploits Spread’s strong membership semantics to build a distributed agreement protocol that strictly distributes the cluster’s public IP addresses among the currently connected servers.
• Useful for locally replicated web servers, FTP servers, DNS servers, and even fail-over routers.
• Transparent!
www.wackamole.org
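Because every connected server receives the same membership view, each can compute the IP-to-server map on its own and all of them will agree, so every public address has exactly one owner. A minimal sketch of one such deterministic rule (keep current owners, hand orphaned addresses to the least-loaded server), assuming all servers also start from the same previous map. It illustrates the idea and is not Wackamole's actual protocol, which must also acquire and release the addresses on the network; that part is omitted.

```python
def assign_vips(vips: list[str], members: list[str],
                previous: dict[str, str]) -> dict[str, str]:
    """Map every virtual IP to exactly one currently connected server.

    Deterministic: servers that see the same membership and the same previous
    map compute the same result without exchanging further messages.
    """
    if not members:
        return {}
    alive = set(members)
    # Keep addresses whose owner is still connected, to avoid needless moves.
    result = {ip: owner for ip, owner in previous.items() if ip in vips and owner in alive}
    load = {m: 0 for m in sorted(alive)}
    for owner in result.values():
        load[owner] += 1
    # Hand each orphaned address to the least-loaded server (ties broken by name).
    for ip in sorted(vips):
        if ip not in result:
            owner = min(load, key=lambda m: (load[m], m))
            result[ip] = owner
            load[owner] += 1
    return result


vips = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]  # documentation addresses
before = assign_vips(vips, ["s1", "s2"], {})
after = assign_vips(vips, ["s1", "s3"], before)  # s2 failed, s3 joined
print(before)
print(after)
```

Only the two addresses that s2 owned move (both to the newcomer s3); s1 keeps its addresses, so its clients are not disturbed.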
Wackamole Architecture
Wackamole for Clusters
Wackamole for Edge Routers
Wackamole Performance
• Out-of-the-box average for fail-over: 12 seconds.
• Tuned average for fail-over: 2.5 seconds.
• Voluntary fail-over (maintenance, etc.): up to 250 milliseconds, most of the time within 10 milliseconds.
[Figure: Fail-over Latency: seconds (y-axis, 0 to 14) vs. number of servers (x-axis, 1 to 11), for out-of-the-box Spread and tuned Spread.]
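Involuntary fail-over cannot begin until the failure is noticed, which is one plausible reason tuning helps: shorter failure-detection timeouts mean earlier detection, at the price of more false suspicions on overloaded machines. The sketch below is a generic heartbeat detector with a simulated clock, not Spread's membership protocol; the 10 s and 2 s timeouts are hypothetical and are not the settings behind the 12 s and 2.5 s figures.

```python
class HeartbeatDetector:
    """Generic timeout-based failure detector (illustrative, not Spread's protocol)."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout                # seconds of silence before suspecting a peer
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, peer: str, now: float) -> None:
        self.last_seen[peer] = now

    def suspected(self, now: float) -> set[str]:
        return {p for p, t in self.last_seen.items() if now - t > self.timeout}


def time_to_detect(timeout: float, step: float = 0.5, horizon: float = 30.0) -> float:
    """Simulate: peer s1 falls silent at t=0; return when it is first suspected."""
    detector = HeartbeatDetector(timeout)
    detector.heartbeat("s1", 0.0)
    t = 0.0
    while t <= horizon:
        if "s1" in detector.suspected(t):
            return t
        t += step
    raise RuntimeError("failure not detected within horizon")


print(time_to_detect(10.0))  # 10.5
print(time_to_detect(2.0))   # 2.5
```

Voluntary fail-over skips this wait entirely, since a server leaving for maintenance announces its departure instead of falling silent.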
Making a Difference
[Figure: number of downloads (y-axis, 0 to 6000) from Oct-97 to Oct-02 for Spread, Backhand, and Wackamole, annotated “10,000 actual sites discovered” and “100 actual sites discovered”.]
Backhand-powered web site discovery thanks to Netcraft.