Reintroducing Consistency in Cloud Settings


Page 1: Reintroducing Consistency in Cloud Settings

Reintroducing Consistency in Cloud Settings

Ken Birman, Cornell University

Page 2: Reintroducing Consistency in Cloud Settings


Massive Cloud Platforms


Page 3: Reintroducing Consistency in Cloud Settings


… and a Live, Collaborative Web

The “realtime web”

Simple ways to create and share collaboration and social network applications

[Try it! http://liveobjects.cs.cornell.edu]

Examples: Live Objects, Google "Wave", JavaScript/AJAX, Silverlight, JavaFX, Adobe Flex and AIR, etc.


Page 4: Reintroducing Consistency in Cloud Settings

Rediscovering a Lost World…

Cloud computing entails building massive distributed systems. They use replicated data, sharded relational databases, parallelism.

Brewer's "CAP theorem": must sacrifice Consistency for Availability & Performance. Cloud providers believe this theorem.

My view:

We gave up on consistency too easily.

Long ago, we knew how to build reliable, consistent distributed systems.

Page 5: Reintroducing Consistency in Cloud Settings

Why do people believe in CAP?

Partly, superstition….

… albeit backed by some painful experiences

Page 6: Reintroducing Consistency in Cloud Settings

Consistency can hurt! Don't believe me? Just ask the people who really know…

Page 7: Reintroducing Consistency in Cloud Settings


eBay’s Five Commandments

As described by Randy Shoup at LADIS 2008

Thou shalt…
1. Partition Everything
2. Use Asynchrony Everywhere
3. Automate Everything
4. Remember: Everything Fails
5. Embrace Inconsistency


Page 8: Reintroducing Consistency in Cloud Settings


Vogels at the Helm

Werner Vogels is CTO at Amazon.com… His first act? He banned reliable multicast*!

Amazon was troubled by platform instability.

Vogels decreed: all communication via SOAP/TCP.

This was slower… but stability matters more than speed.

* Amazon was (and remains) a heavy pub-sub user


Page 10: Reintroducing Consistency in Cloud Settings

… embodied into Azure

Applications structured as stateless tasks.

Azure decides when and how much to replicate them, and can pull the plug as often as it likes.

Any consistent state lives in backend servers running SQL Server… but application design tools encourage developers to run locally if possible.

Page 14: Reintroducing Consistency in Cloud Settings

Why fear consistency?

They reason this way:

Systems that make guarantees put those guarantees first and struggle to achieve them.

For example, any reliability property forces a system to retransmit lost messages, use acks, etc.

But modern computers often become unreliable as a symptom of overload… so these consistency mechanisms will make things worse, by increasing the load just when we want to ease off!

So consistency (of any kind) is a "root cause" for meltdowns, oscillations, thrashing.

Page 15: Reintroducing Consistency in Cloud Settings


Where does it come from?

Transactions that update replicated data

Atomic broadcast or other forms of reliable multicast protocols

Distributed 2-phase locking mechanisms


Page 16: Reintroducing Consistency in Cloud Settings

If we rule out such mechanisms…

Our systems become “eventually” consistent but can lag far behind reality

Thus application developers are urged to not assume consistency and to avoid anything that will break if inconsistency occurs

Page 17: Reintroducing Consistency in Cloud Settings


A Consistency Property: Virtual Synchrony

Synchronous runs: indistinguishable from a non-replicated object that saw the same updates (like Paxos)

Virtually synchronous runs are indistinguishable from synchronous runs

[Diagram: timelines for processes p, q, r, s, t over time 0–70, comparing a synchronous execution with a virtually synchronous execution, alongside a non-replicated reference execution that applies the updates A=3, B=7, B=B-A, A=A+1.]
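To make the property concrete, here is a minimal sketch (my illustration, not Isis2 code): replicas that receive the reference execution's updates in the same order necessarily end in the same state, which is exactly what a synchronous run guarantees.

// Minimal sketch: replicas that apply the same updates in the same order
// end up in identical states.
using System;
using System.Collections.Generic;

class Replica
{
    public int A, B;
    public void Apply(Action<Replica> update) { update(this); }
}

class VirtualSynchronyDemo
{
    static void Main()
    {
        // The ordered update sequence from the reference execution.
        var updates = new List<Action<Replica>> {
            r => r.A = 3,
            r => r.B = 7,
            r => r.B = r.B - r.A,
            r => r.A = r.A + 1
        };

        var replicas = new[] { new Replica(), new Replica(), new Replica() };
        foreach (var u in updates)          // same updates...
            foreach (var r in replicas)     // ...delivered to every replica in the same order
                r.Apply(u);

        foreach (var r in replicas)
            Console.WriteLine("A=" + r.A + ", B=" + r.B);   // every replica prints A=4, B=4
    }
}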

Page 18: Reintroducing Consistency in Cloud Settings

When virtual synchrony ruled… During the 1990's, Isis was a big success.

The French Air Traffic Control System, New York Stock Exchange, and US Navy AEGIS are some blue-chip examples that used (or still use!) Isis.

But there were hundreds of less high-profile users.

However, it was not a huge commercial success. Focus was on server replication, and in those days few companies had big server pools.

Page 19: Reintroducing Consistency in Cloud Settings

Under market pressures, Isis faded away…

Leaving a collection of weaker products that, nonetheless, were sometimes highly toxic

For example, publish-subscribe message bus systems that use IPMC are notorious for massive disruption of data centers!

Among systems with strong consistency models, only Paxos is widely used in cloud systems (but its role is strictly for locking)

[Chart: messages/s versus time (s), showing oscillatory throughput in a data center disrupted by IPMC-based pub-sub.]

Page 20: Reintroducing Consistency in Cloud Settings


Dangers of Inconsistency

Inconsistency causes bugs. Clients would never be able to trust servers… a free-for-all.

Weak or "best effort" consistency? Strong security guarantees demand consistency.

Would you trust a medical electronic-health-records system, or a bank, that used "weak consistency" for better scalability?

"My rent check bounced? That can't be right!"


[Cartoon: a bounced rent check from Tommy Tenant to Jason Fane Properties for 1150.00, Sept 2009.]

Page 21: Reintroducing Consistency in Cloud Settings

Challenges

To reintroduce consistency we need:

A scalable model
▪ Should this be the Paxos model? The old Isis one?

A high-performance implementation
▪ Can handle massive replication for individual objects
▪ Massive numbers of objects
▪ Won't melt down under stress
▪ Not prone to oscillatory instabilities or resource exhaustion problems

Page 22: Reintroducing Consistency in Cloud Settings

ReIntroducing Isis2

I'm reincarnating group communication!

Basic idea: imagine the distributed system as a world of "live objects", somewhat like files. They float in the network and hold data when idle. Programs "import" them as needed at runtime.
▪ The data is replicated, but every local copy is accurate
▪ Updates and locking via distributed multicast; reads are purely local; failure detection is automatic & trustworthy

Page 23: Reintroducing Consistency in Cloud Settings

How will Isis2 look?

A library… highly asynchronous…

Group g = new Group("/amazon/something");
g.register(UPDATE, myUpdtHandler);
g.cast(UPDATE, "John Smith", new_salary);

public void myUpdtHandler(string empName, double salary)
{ …. }
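A hedged extension of this example (my own sketch: lookupSalary and the dictionary are invented; only Group, register, cast, and the handler come from the slide): because every member delivers the same ordered updates, the handler can maintain a local copy of the data, and reads never touch the network.

// Sketch only: the handler keeps a local replica of the data. Since every
// member delivers the same ordered updates, each local copy stays accurate,
// so a read is just a dictionary lookup, no messages needed.
// (Assumes "using System.Collections.Generic;".)
Dictionary<string, double> salaries = new Dictionary<string, double>();

public void myUpdtHandler(string empName, double salary)
{
    salaries[empName] = salary;        // apply the replicated update locally
}

public double lookupSalary(string empName)   // hypothetical helper, purely local read
{
    return salaries[empName];
}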

Page 24: Reintroducing Consistency in Cloud Settings

Example: Parallel search

Just ask all the members to do “their share” of work:

Replies = g.query(LOOKUP, "Name=*Smith");
g.callback(myReplyHndlr, Replies, typeof(double));

public void lookup(string who)
{
    // divide work into viewSize() chunks;
    // this replica will search chunk # getMyRank()
    reply(myAnswer);
}

public void myReplyHndlr(double[] whatTheyFound) { … }

Page 25: Reintroducing Consistency in Cloud Settings

Example: Parallel search

Group g = new Group("/amazon/something");
g.register(LOOKUP, myLookup);

Replies = g.query(LOOKUP, "Name=*Smith");
g.callback(myReplyHndlr, Replies, typeof(double));

public void myLookup(string who)
{
    // divide work into viewSize() chunks;
    // this replica will search chunk # getMyRank()
    …
    reply(myAnswer);
}

public void myReplyHndlr(double[] fnd)
{
    foreach (double d in fnd) avg += d;
    …
}
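One concrete way to fill in the "divide work into viewSize() chunks" comment above (my sketch: viewSize(), getMyRank(), and reply() are the calls named on the slide; the data source and matching rule are invented):

// Sketch: each member computes its own slice of the work using its rank in
// the current view. LoadLocalShardOfNames() is a hypothetical data source.
string[] names = LoadLocalShardOfNames();

int chunks = viewSize();                            // one chunk per group member
int mine = getMyRank();                             // this member's chunk index
int perChunk = (names.Length + chunks - 1) / chunks;
int start = mine * perChunk;
int end = Math.Min(names.Length, start + perChunk);

double myAnswer = 0;
for (int i = start; i < end; i++)
    if (names[i].EndsWith("Smith")) myAnswer += 1;  // count matches in my chunk only

reply(myAnswer);                                    // return this member's partial result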

Page 26: Reintroducing Consistency in Cloud Settings

Key points

The group is just an object. Users don't experience sockets… marshalling… preprocessors… protocols…

As much as possible, they just provide arguments as if this were a kind of RPC, but with no preprocessor. Sometimes they provide a list of types and Isis does a callback.

Groups have replicas… handlers… a "current view" in which each member has a "rank".

Page 27: Reintroducing Consistency in Cloud Settings

Virtual synchrony vs. Paxos: can't we just use Paxos?

In recent work (a collaboration with MSR Silicon Valley) we've merged the models. Our model "subsumes" both…

This new model is more flexible: Paxos is really used only for locking. Isis can be used for locking, but can also replicate data at very high speeds, with dynamic membership, and support other functionality.

Isis2 will be much faster than Paxos for most group replication purposes (1000x or more).

[Building a Dynamic Reliable Service. Ken Birman, Dahlia Malkhi, and Robbert van Renesse. Available as a 2009 technical report, in submission to SOCC 10 and ACM Computing Surveys.]

Page 28: Reintroducing Consistency in Cloud Settings

Later… can offer "tools":

Unbreakable TCP connections that terminate in groups; [Burgess '10] describes Robert Burgess' new r-TCP solution. Groups use some form of state machine replication scheme.

State transfer and persistence

Locking, other coordination paradigms

2PC and transactional 1-copy SR

Publish-subscribe with topic or content filtering (or both)

Page 29: Reintroducing Consistency in Cloud Settings

Building it won't be easy! Isis2 has a lot in common with an operating system and is internally very complex.

The distributed communication layer manages multicast, flow control, reliability, failure sensing.

Agreement protocols track group membership, maintain group views, implement virtual synchrony.

Infrastructure services build messages, handle callbacks, keep groups healthy.

Page 30: Reintroducing Consistency in Cloud Settings

Core of the challenge

To scale really well we need to take full advantage of the hardware: IPMC

But IPMC was the root cause of the oscillation shown on the prior slide

Page 31: Reintroducing Consistency in Cloud Settings

Managed IPMC space

Traditional IPMC systems can overload the router, melt down.

The issue is that routers have a small "space" for active IPMC addresses.

In [Vigfusson et al. '09] we show how to use optimization to manage the IPMC space.

In effect, it merges similar groups while respecting limits on the routers and switches.

[Chart: the traditional approach melts down at ~100 groups.]

Page 32: Reintroducing Consistency in Cloud Settings


Channel Aggregation

Algorithm by Vigfusson, Tock [HotNets 09, LADIS 2008, Submission to Eurosys 10]

Uses a k-means clustering algorithm. The generalized problem is NP-complete, but the heuristic works well in practice.
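As a rough illustration of such a heuristic (my simplification, not the published MCMD algorithm: the cluster count, iteration bound, and all names are invented), a k-means-style pass over binary user-interest vectors could look like this; topics that land in the same cluster become candidates to share one physical IPMC address.

// Sketch of k-means-style aggregation over binary "user-interest" vectors.
// Each topic is a 0/1 vector: entry i is 1 if receiver i subscribes to it.
using System;
using System.Linq;

static class ChannelAggregation
{
    static double Dist(double[] c, int[] t)
    {
        return c.Select((ci, i) => (ci - t[i]) * (ci - t[i])).Sum();
    }

    // topics: one binary interest vector per topic; k: IPMC addresses available
    public static int[] Cluster(int[][] topics, int k)
    {
        const int iters = 10;
        var rnd = new Random(0);
        int n = topics.Length, d = topics[0].Length;
        // initialize centroids from k randomly chosen topics
        var centroids = Enumerable.Range(0, k)
            .Select(unused => topics[rnd.Next(n)].Select(x => (double)x).ToArray())
            .ToArray();
        var assign = new int[n];

        for (int it = 0; it < iters; it++)
        {
            // assignment step: each topic joins the nearest centroid
            for (int t = 0; t < n; t++)
                assign[t] = Enumerable.Range(0, k)
                    .OrderBy(c => Dist(centroids[c], topics[t])).First();

            // update step: centroid = mean interest vector of its topics
            for (int c = 0; c < k; c++)
            {
                var members = Enumerable.Range(0, n).Where(t => assign[t] == c).ToList();
                if (members.Count == 0) continue;
                for (int j = 0; j < d; j++)
                    centroids[c][j] = members.Average(t => (double)topics[t][j]);
            }
        }
        return assign;   // assign[t] = cluster (candidate shared IPMC address) for topic t
    }
}

A real assignment would additionally enforce the hard limits on receiver filtering and on the number of IPMC addresses described on the next slide.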


Page 33: Reintroducing Consistency in Cloud Settings


Optimization Questions

Assign IPMC and unicast addresses such that:
• % receiver filtering stays within a bound (hard constraint)
• network traffic is minimized
• # IPMC addresses stays within a bound (hard constraint)

• Prefers sender load over receiver load
• Intuitive control knobs as part of the policy

(Dr. Multicast)


Page 34: Reintroducing Consistency in Cloud Settings


MCMD Heuristic

Topics in "user-interest" space

[Diagram: example topics (FGIF, BEER GROUP, FREE FOOD) plotted as binary user-interest vectors, e.g. (1,1,1,1,1,0,1,0,1,0,1,1) and (0,1,1,1,1,1,1,0,0,1,1,1).]


Page 35: Reintroducing Consistency in Cloud Settings


MCMD Heuristic

Topics in "user-interest" space

[Diagram: clusters of similar topics, each assigned a physical IPMC address: 224.1.2.3, 224.1.2.4, 224.1.2.5.]


Page 36: Reintroducing Consistency in Cloud Settings


MCMD Heuristic

Topics in "user-interest" space

[Diagram: candidate groupings annotated with their filtering cost (MAX …) and sending cost.]


Page 37: Reintroducing Consistency in Cloud Settings


MCMD Heuristic

Topics in "user-interest" space

[Diagram: filtering cost (MAX …) and sending cost annotations; topics too dissimilar to merge are marked for unicast.]


Page 38: Reintroducing Consistency in Cloud Settings


MCMD Heuristic

Topics in "user-interest" space

[Diagram: final assignment — dense clusters map to IPMC addresses 224.1.2.3, 224.1.2.4, and 224.1.2.5, while the remaining topics fall back to unicast.]


Page 39: Reintroducing Consistency in Cloud Settings


Using the Solution

[Diagram: processes address logical IPMC (L-IPMC) groups; the heuristic maps these onto actual multicast transmissions.]

• Processes use "logical" IPMC addresses
• Dr. Multicast transparently maps these to true IPMC addresses or 1:1 UDP sends
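To make the mapping concrete, here is a hedged sketch (not Dr. Multicast's code: the class, port number, and method names are invented): a sender always addresses a logical group, and the library either emits a single IPMC packet or falls back to one UDP send per member.

// Hypothetical mapping layer: a logical group is either bound to a real
// IPMC address or expanded into point-to-point UDP sends.
using System.Collections.Generic;
using System.Net;
using System.Net.Sockets;

class LogicalGroup
{
    public IPAddress PhysicalIpmc;            // non-null if the optimizer granted an IPMC address
    public List<IPEndPoint> Members = new List<IPEndPoint>();
}

class MulticastMapper
{
    private readonly UdpClient udp = new UdpClient();
    private readonly Dictionary<string, LogicalGroup> groups = new Dictionary<string, LogicalGroup>();

    public void Send(string logicalGroup, byte[] payload)
    {
        LogicalGroup g = groups[logicalGroup];
        if (g.PhysicalIpmc != null)
        {
            // one physical IPMC transmission covers every member
            // (port 5000 is arbitrary for this sketch)
            udp.Send(payload, payload.Length, new IPEndPoint(g.PhysicalIpmc, 5000));
        }
        else
        {
            // fall back to 1:1 UDP sends, one per member
            foreach (IPEndPoint member in g.Members)
                udp.Send(payload, payload.Length, member);
        }
    }
}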


Page 40: Reintroducing Consistency in Cloud Settings


Effectiveness?

We looked at various group scenarios

Most of the traffic is carried by <20% of groups

For IBM WebSphere, Dr. Multicast achieves an 18x reduction in physical IPMC addresses

[Dr. Multicast: Rx for Data Center Communication Scalability.  Ymir Vigfusson, Hussam Abu-Libdeh, Mahesh Balakrishnan, Ken Birman, and Yoav Tock.  LADIS 2008.  November 2008. Full paper submitted to Eurosys 10.]


Page 41: Reintroducing Consistency in Cloud Settings

Hierarchical acknowledgements

For small groups, reliable multicast protocols directly ack/nack the sender.

For large ones, use the QSM technique: tokens circulate within a tree of rings. Acks travel around the rings and aggregate over the members they visit (an efficient token encodes the data). This scales well even with many groups.

Isis2 uses this mode for groups with more than 25 members, with each ring containing ~25 nodes.

[Quicksilver Scalable Multicast (QSM). Krzys Ostrowski, Ken Birman, and Danny Dolev. Network Computing and Applications (NCA'08), July 08. Boston.]
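For intuition, a toy version of the ring aggregation (my simplification, not QSM's token format): as the token circulates, each member folds in the highest sequence number it has received contiguously, so one circuit tells the sender what the whole ring has stably received.

// Sketch: aggregate acknowledgements by circulating a token around a ring.
// The token keeps the minimum contiguous sequence number over the members
// it visits; in a tree of rings, inner-ring results would be aggregated the
// same way one level up.
using System;

class AckToken
{
    public long MinContiguousSeq = long.MaxValue;
}

class RingMember
{
    public long HighestContiguousSeq;   // locally delivered without gaps

    public AckToken OnToken(AckToken token)
    {
        token.MinContiguousSeq = Math.Min(token.MinContiguousSeq, HighestContiguousSeq);
        return token;                    // forwarded to the next member in the ring
    }
}

class RingDemo
{
    static void Main()
    {
        var ring = new[] {
            new RingMember { HighestContiguousSeq = 120 },
            new RingMember { HighestContiguousSeq = 118 },
            new RingMember { HighestContiguousSeq = 121 }
        };

        var token = new AckToken();
        foreach (var m in ring) token = m.OnToken(token);   // one circuit of the ring

        Console.WriteLine("Ring-stable through seq " + token.MinContiguousSeq);  // 118
    }
}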

Page 42: Reintroducing Consistency in Cloud Settings


Flow Control: AJIL

Needed to prevent bursts of multicast from overrunning receivers.

The AJIL protocol imposes limits on the IPMC rate. AJIL monitors the aggregated multicast rate and uses optimization to apportion bandwidth.

If the limit is exceeded, the user perceives a "slower" multicast channel.

[Ajil: Distributed Rate-limiting for Multicast Networks.  Hussam Abu-Libdeh, Ymir Vigfusson, Ken Birman, and Mahesh Balakrishnan (Microsoft Research, Silicon Valley).  Cornell University TR.  Dec 08.]
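For intuition only, a hedged sketch of the sender-side effect (a plain token bucket, not the AJIL protocol itself; the rate constants are invented): once the granted IPMC rate is used up, sends simply wait, so the application perceives a slower channel instead of overrunning receivers.

// Sketch: a token-bucket limiter at the sender. AJIL itself apportions the
// aggregate limit across senders via optimization; here the per-sender rate
// is just a fixed parameter for illustration.
using System;
using System.Threading;

class IpmcRateLimiter
{
    private readonly double ratePerSec;     // granted multicast rate (msgs/sec)
    private readonly double burst;          // bucket capacity
    private double tokens;
    private DateTime last = DateTime.UtcNow;

    public IpmcRateLimiter(double ratePerSec, double burst)
    {
        this.ratePerSec = ratePerSec;
        this.burst = burst;
        tokens = burst;
    }

    // Blocks until a send credit is available; the caller experiences a
    // "slower" multicast channel rather than loss downstream.
    public void AcquireSendCredit()
    {
        while (true)
        {
            var now = DateTime.UtcNow;
            tokens = Math.Min(burst, tokens + (now - last).TotalSeconds * ratePerSec);
            last = now;
            if (tokens >= 1.0) { tokens -= 1.0; return; }
            Thread.Sleep(10);   // back off briefly and retry
        }
    }
}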


Page 43: Reintroducing Consistency in Cloud Settings


AJIL in action…

AJIL reacts rapidly to load surges, stays close to targets (and we’re improving it steadily)

Makes it possible to eliminate almost all IPMC message loss within the datacenter!


Page 44: Reintroducing Consistency in Cloud Settings

Summary?

Challenges and Solutions

Challenge: Distributed computing is hard and our target developers have limited skills.
Solution: Make group communication look as natural to the developer as building a .NET GUI.

Challenge: Raw performance is critical to success.
Solution: Consistency at the "speed of light" by using lossless IPMC to send updates.

Challenge: IPMC can trigger resource exhaustion and loss by entering "promiscuous" mode, overrunning receivers.
Solution: Optimization-based management of IPMC addresses reduces the number of IPMC groups 100:1. The AJIL flow control scheme prevents overload.

Challenge: Users will generate massive numbers of groups, not just high rates of events.
Solution: Aggregation, aggregation, aggregation… all automated and transparent to users.

Challenge: Reliable protocols in massive groups result in ack implosions.
Solution: For big groups, deploy hierarchical ack/nack rings (idea from Quicksilver).

Challenge: Many existing group communication systems are insecure.
Solution: Use replicated group keys to secure membership and sensitive data.

Challenge: What about C++ and Python on Linux?
Solution: Port the platform to Linux with Mono, then offer C++/Python support using remoting.

Page 45: Reintroducing Consistency in Cloud Settings

Summary?

Isis2 is coming soon… initially on .NET

Developers will think of distributed groups very much as they think of objects in C#. A friendly, easy-to-understand model. And under the surface, theoretically rigorous. Yet fast and secure too.

All the complexities of distributed computing are swept into this library… users have a very insulated and easy experience

Page 46: Reintroducing Consistency in Cloud Settings

How can non-C# users access it?

.NET supports ~40 languages, all of which can call Isis2 directly

On Linux, we’ll do a Mono port and then build an outboard server that offers a remoted library interface

C++ and other Linux languages/applications will simply run off this server, unless they are comfortable running under Mono of course

Page 47: Reintroducing Consistency in Cloud Settings

Why did we opt for C# in .NET? The code extensively leverages:

The reflection capabilities of C#, even when called from one of the other .NET languages

The component architecture of .NET, which means users will already have the right "mind set"

Powerful prebuilt data types such as HashSets

All of this makes Isis2 simpler and more robust; roughly a 3x improvement compared to older C/C++ version of Isis!

Page 48: Reintroducing Consistency in Cloud Settings

Status report?

Building this system (myself) as a sabbatical project… code is mostly written

Goal is to run this system on 500 to 500,000 node systems, with millions of object groups

The initial byte-code-only version will be released under a FreeBSD license.