HPTS Keynote, October 2001
Invariant Boundaries
Dr. Eric A. Brewer
Professor, UC Berkeley
Co-Founder & Chief Scientist, Inktomi

Our Perspective
Inktomi builds two distributed systems:
– Global Search Engines
– Distributed Web Caches
Based on scalable cluster & parallel computing technology
But very little use of classic DS research...

“Distributed Systems” don’t work...
There exist working DS:
– Simple protocols: DNS, WWW
– Inktomi search, Content Delivery Networks
– Napster, Verisign, AOL
But these are not classic DS:
– Not distributed objects
– No RPC
– No modularity
– Complex ones are single owner (except phones)

Concept: Invariant Boundaries
Claim: we don’t understand boundaries
Solution: make “Invariant Boundary” explicit
Goal: simpler, easier, faster to correctness
[Figure: a boundary; inside it the invariant holds, outside it the invariant may not hold]

Three Basic Issues
Where is the state?
Consistency vs. Availability
Communication Boundaries

Santa Clara Cluster
• Very uniform
• No monitors
• No people
• No cables
• Working power
• Working A/C
• Working BW

Boundaries for Kinds of State
Stateless
– Front ends
Immutable state
Soft State (rebuild on restart)
Durable Single-writer (e.g. user data)
– Emissaries, horizontal partitioning
Durable Multi-writer (MW)
– Fiefdoms, DBMS

Persistent State is HARD
Classic DS focus on the computation, not the data
– this is WRONG, computation is the easy part
Data centers exist for a reason
– can’t have consistency or availability without them
Other locations are for caching only:
– proxies, basestations, set-top boxes, desktops
– phones, PDAs, …
Distributed systems can’t ignore location
Invariant Boundary is small

Berkeley Ninja Architecture
[Figure: the Ninja architecture. Labels:
– Base: scalable, highly-available platform for persistent-state services
– AP (Active Proxy): bootstraps thin devices into infrastructure, runs mobile code
– Internet
– Workstations & PCs
– PDAs (e.g. IBM Workpad), cellphones, pagers, etc.
State annotations on the figure: consistent shared; soft-state, immutable state; single user]

Three Basic Issues
Where is the state?
Consistency vs. Availability
Communication Boundaries

Data is only consistent *inside*
[Figure: two consistency boundaries, “Consistent A” and “Consistent B”; data outside them is not consistent (“reference” data)]

The CAP Theorem
– Consistency
– Availability
– Tolerance to network Partitions
Theorem: You can have at most two of these invariants for any shared-data system
Corollary: consistency boundary must choose A or P

Forfeit Partitions
[Figure: CAP diagram (Consistency, Availability, Tolerance to network Partitions)]
Examples
– Single-site databases
– Cluster databases
– LDAP
– Fiefdoms
Traits
– 2-phase commit
– cache validation protocols
– The “inside”

Forfeit Availability
[Figure: CAP diagram (Consistency, Availability, Tolerance to network Partitions)]
Examples
– Distributed databases
– Distributed locking
– Majority protocols
Traits
– Pessimistic locking
– Make minority partitions unavailable
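
The "make minority partitions unavailable" trait can be sketched as a quorum test. This is a minimal illustration, not a protocol from the talk; the function name `can_serve` and the replica counts are assumptions chosen for the example.

```python
def can_serve(reachable_replicas: int, total_replicas: int) -> bool:
    """A partition may serve requests only if it holds a strict majority."""
    return reachable_replicas > total_replicas // 2


# A hypothetical 5-replica system split 3/2 by a network partition:
majority_side = can_serve(3, 5)  # True: keeps serving and stays consistent
minority_side = can_serve(2, 5)  # False: refuses requests (unavailable)
```

At most one side of any split can hold a strict majority, so two partitions never both accept writes. The price is that an even split (2 of 4) leaves no side available.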

Forfeit Consistency
[Figure: CAP diagram (Consistency, Availability, Tolerance to network Partitions)]
Examples
– Coda
– Web caching
– DNS
– Emissaries
Traits
– expirations/leases
– conflict resolution
– Optimistic
– The “outside”
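
The expirations trait can be illustrated with a small cache whose entries carry a time-to-live, roughly in the spirit of web caching or DNS. This is a sketch under stated assumptions: the class `ExpiringCache` and its `clock` parameter are hypothetical names, and the injectable clock exists only to make the behavior easy to exercise.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ExpiringCache:
    """Serves possibly stale data until an entry's time-to-live runs out."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (value, expiry time)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, key: str, value: str) -> None:
        self._entries[key] = (value, self._clock() + self._ttl)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: caller must refetch from the origin
            return None
        return value  # may be stale relative to the origin
```

The cache stays available during a partition from the origin, but only offers data that was current at some point within the last `ttl_seconds`.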

ACID vs. BASE
ACID
– Strong consistency
– Isolation
– Focus on “commit”
– Nested transactions
– Availability?
– Conservative (pessimistic)
– Difficult evolution (e.g. schema)
– “small” Invariant Boundary
– The “inside”
BASE
– Weak consistency: stale data OK
– Availability first
– Best effort
– Approximate answers OK
– Aggressive (optimistic)
– “Simpler” and faster
– Easier evolution (XML)
– “wide” Invariant Boundary
– Outside consistency boundary
but it’s a spectrum

Consistency Boundary Summary
Can have consistency & availability within a cluster. No partitions within boundary!
OS/Networking better at A than C
Databases better at C than A
Wide-area databases can’t have both
Disconnected clients can’t have both

Three Basic Issues
Where is the state?
Consistency vs. Availability
Communication Boundaries

The Boundary
The interface between two modules
– client/server, peers, libraries, etc…
Basic boundary = the procedure call
– thread traverses the boundary
– two sides are in the same address space
What invariants don’t hold across?
[Figure: the two sides of a boundary, labeled C and S]

Different Address Spaces
What if the two sides are NOT in the same address space?
– IPC or LRPC
Can’t do pass-by-reference (pointers)
– Most IPC screws this up: pass by value-result
– There are TWO copies of args not one
What if they share some memory?
– Can pass pointers, but…
– Need synchronization between client/server
– Not all pointers can be passed
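
The "two copies of args" point can be made concrete by simulating the address-space boundary with deep copies. This is only a sketch: `local_call`, `ipc_call` and `server_append` are hypothetical names, and a real IPC layer would serialize the arguments rather than call `copy.deepcopy`.

```python
import copy
from typing import List


def server_append(args: List[str]) -> List[str]:
    """Runs on the 'server' side; mutates whatever list object it was handed."""
    args.append("added-by-server")
    return args


def local_call(args: List[str]) -> List[str]:
    # Same address space: caller and callee share one object (pass-by-reference).
    return server_append(args)


def ipc_call(args: List[str]) -> List[str]:
    # Different address spaces, simulated: arguments are copied in and the
    # result is copied back out (pass by value-result), so two copies exist.
    server_copy = copy.deepcopy(args)
    return copy.deepcopy(server_append(server_copy))
```

With `local_call` the caller's list changes as a side effect. With `ipc_call` the caller's list is untouched and the change is visible only in the returned copy, so code that relied on the side effect silently breaks once the call crosses the boundary.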

Partial Failure
Can the two sides fail independently?
– RPC, IPC, LRPC
Can’t be transparent (like RPC) !!
New exceptions (other side gone)
Idempotent calls?
– Use Transaction Ids (to solve replay problem)
Reclaim local resources
– e.g. kernels leak sockets over time => reboot
RPC tries to hide these issues (but fails)
Use Level 4/7 switches to hide failures?
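
One way to read "use Transaction Ids (to solve replay problem)": the server remembers which ids it has already applied, so a retry after a lost reply does not run the operation twice. The `Account` class and its `deposit` method below are illustrative assumptions, not an API from the talk.

```python
from typing import Dict


class Account:
    """Server-side state that remembers which transaction ids it has applied."""

    def __init__(self) -> None:
        self.balance = 0
        # transaction id -> balance that was returned for it
        self._applied: Dict[str, int] = {}

    def deposit(self, txn_id: str, amount: int) -> int:
        if txn_id in self._applied:
            # Replay of a request already executed: return the saved result
            # instead of applying the deposit a second time.
            return self._applied[txn_id]
        self.balance += amount
        self._applied[txn_id] = self.balance
        return self.balance
```

A client that times out can resend the same `txn_id` safely. The sketch never forgets old ids, which ties into the question of how long a server should remember a client.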

Resource Allocation
How to reclaim resources allocated for client?
– Usually timeout exception cleans up
– Release locks? (must track them!)
– How to avoid long delays while holding resources?
How long to remember client?
– Delayed responses (past timeout) must be ignored
Problem with leases:
– Great for servers, but…
– Client’s lease may expire mid operation
– Hard to make client updates atomic with multiple leases (2PC?)
– Which things have leases? (can be hidden)
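
A server-side lease table shows why leases are "great for servers": reclaiming resources becomes a local decision based on time. The sketch below assumes a hypothetical `LeaseTable` class with an injected `clock` function. It covers only the server's view and does not address the client-side problems listed above.

```python
from typing import Callable, Dict, List


class LeaseTable:
    """Tracks one lease per client and reclaims those that have expired."""

    def __init__(self, duration: float, clock: Callable[[], float]) -> None:
        self._duration = duration
        self._clock = clock
        self._expiry: Dict[str, float] = {}  # client id -> lease expiry time

    def grant(self, client: str) -> None:
        self._expiry[client] = self._clock() + self._duration

    def is_valid(self, client: str) -> bool:
        # Requests arriving after expiry must be rejected or ignored.
        return client in self._expiry and self._clock() < self._expiry[client]

    def reclaim_expired(self) -> List[str]:
        """Drop expired leases; return the clients whose resources can be freed."""
        now = self._clock()
        expired = sorted(c for c, t in self._expiry.items() if now >= t)
        for client in expired:
            del self._expiry[client]
        return expired
```

A client in the middle of a multi-step operation gets no warning here: `is_valid` simply flips to false, which is the "lease may expire mid operation" problem.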

Trust the other side?
What if we don’t trust the other side?
– Or partial trust (legal contract), or malicious?
– Have to check args, no pointer passing
Limited release of information (leaks)
Kernels get this right:
– copy/check args
– use opaque references (e.g. File Descriptors)
Most systems do not
– TCP, Napster, web browsers
Security boundaries tend to be explicit
– Holes come from services!
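
The kernel pattern of "copy/check args" plus opaque references can be imitated in a few lines. This is a loose analogy rather than how any particular kernel works; `HandleTable` and the starting handle value are assumptions made for the example.

```python
import itertools
from typing import Dict


class HandleTable:
    """Maps small integer handles to private objects, like file descriptors."""

    def __init__(self) -> None:
        self._objects: Dict[int, bytearray] = {}
        self._next = itertools.count(3)  # arbitrary first handle value

    def open(self) -> int:
        handle = next(self._next)
        self._objects[handle] = bytearray()
        return handle  # the caller never sees the underlying object

    def write(self, handle: int, data: bytes) -> int:
        # Check every argument: the caller is not trusted.
        if not isinstance(handle, int) or handle not in self._objects:
            raise ValueError("bad handle")
        if not isinstance(data, (bytes, bytearray)):
            raise TypeError("data must be bytes")
        # Copy the data in; keep no reference to caller-owned memory.
        self._objects[handle].extend(bytes(data))
        return len(data)
```

Because the caller holds only an integer, it cannot forge a pointer to internal state, and a stale or invented handle fails the lookup instead of corrupting anything.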

Multiplexing clients?
Does the server have to:
– Deal with high concurrency?
– Say “no” sometimes (graceful degradation)
– Treat clients equally (fairness)
– Bill for resources (and have audit trail)
– Isolate clients’ performance, data, …
These all affect the boundary definition
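
Saying "no" sometimes can be as simple as capping the number of requests in flight. The `AdmissionController` below is a hypothetical sketch of that single item; it does not attempt fairness, billing, or isolation.

```python
import threading


class AdmissionController:
    """Admits at most max_in_flight concurrent requests and rejects the rest."""

    def __init__(self, max_in_flight: int) -> None:
        self._max = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_admit(self) -> bool:
        with self._lock:
            if self._in_flight >= self._max:
                return False  # say "no" rather than queue without bound
            self._in_flight += 1
            return True

    def release(self) -> None:
        with self._lock:
            if self._in_flight > 0:
                self._in_flight -= 1
```

A rejected caller gets an immediate answer it can act on, which makes "server busy" part of the boundary definition rather than a hidden timeout.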

Boundary evolution?
Can the two sides be updated independently? (NO)
The DLL problem...
Boundaries need versions
Negotiation protocol for upgrade?
Promises of backward compatibility?
Affects naming too (version number)
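
A negotiation protocol for upgrades could, in its simplest form, pick the highest version both sides support. The function `negotiate_version` and the integer version numbers are assumptions for illustration; real protocols differ in how they advertise and fall back.

```python
from typing import Iterable, Optional


def negotiate_version(client_versions: Iterable[int],
                      server_versions: Iterable[int]) -> Optional[int]:
    """Return the highest version supported by both sides, or None if none."""
    common = set(client_versions) & set(server_versions)
    return max(common) if common else None
```

For example, a client offering versions 1 to 3 and a server offering 2 to 4 settle on 3. A `None` result is the explicit "these two sides cannot talk" outcome that an unversioned boundary would only discover at run time.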

Example: protocols vs. APIs
Protocols have been more successful than APIs
Some reasons:
– protocols are pass by value
– protocols designed for partial failure
– not trying to look like local procedure calls
– explicit state machine, rather than call/return (this exposes exceptions well)
Protocols still not good at trust, billing, evolution
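
The "explicit state machine, rather than call/return" point can be sketched with a transition table for one request. The states and event names here are invented for the example and do not correspond to any specific protocol.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    WAITING = auto()
    DONE = auto()
    FAILED = auto()


# Failure events are ordinary transitions, not hidden behind a return value.
TRANSITIONS = {
    (State.IDLE, "send"): State.WAITING,
    (State.WAITING, "reply"): State.DONE,
    (State.WAITING, "timeout"): State.FAILED,
    (State.WAITING, "peer_closed"): State.FAILED,
}


def step(state: State, event: str) -> State:
    """Return the next state, or raise if the event is illegal in this state."""
    key = (state, event)
    if key not in TRANSITIONS:
        raise ValueError(f"event {event!r} not allowed in state {state.name}")
    return TRANSITIONS[key]
```

In a call/return API the timeout and peer-closed cases would be exceptions that callers tend to forget. In the table they are first-class rows that a reader or a test can enumerate.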

Example: XML
XML doesn’t solve any of these issues
It is RPC with an extensible type system
It makes evolution better?
– two sides need to agree on schema
– can ignore stuff you don’t understand
– Must agree on meaning, not just tags
Can mislead us to ignore/postpone the real issues
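
"Can ignore stuff you don't understand" looks roughly like the reader below, which keeps the elements it knows and skips the rest. The `user` document, the `KNOWN_FIELDS` set, and `read_user` are hypothetical; only the `xml.etree.ElementTree` calls are standard library.

```python
import xml.etree.ElementTree as ET
from typing import Dict

KNOWN_FIELDS = {"name", "email"}  # hypothetical schema the reader agreed to


def read_user(xml_text: str) -> Dict[str, str]:
    """Extract known child elements and silently skip unknown ones."""
    root = ET.fromstring(xml_text)
    return {
        child.tag: (child.text or "")
        for child in root
        if child.tag in KNOWN_FIELDS
    }
```

A newer sender can add a `nickname` element without breaking this reader. What the reader cannot detect is a sender that keeps the `email` tag but changes what it means, which is the "must agree on meaning, not just tags" caveat.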

Example: services
Claim: you can’t magically convert a class to a service
Behavior depends on the boundaries that callers cross…
– Trusted? Multiplexed? Partial failure? Namespaces?
Shouldn’t TRY to be transparent
Instead: make it easier to state boundary assumptions (and check them)

Annotated Interfaces
IDL can annotate interfaces:
– “Timeout” => client may not respond
– “Trusted” => no malicious clients
– “Malicious” => client may be hostile
– “Multiplexed” => many simultaneous callers
– “Idempotent”
– Etc.
Could be checked statically in some cases, dynamically in others
Annotations + Boundaries => fewer bugs
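
The talk proposes annotations in an IDL; as a stand-in, the sketch below uses Python decorators. Everything here is an assumption for illustration: the `boundary` decorator, the `call_across` helper, and the choice of which checks happen when. The contradiction check at decoration time plays the role of a static check, and the validation check at call time plays the role of a dynamic one.

```python
from typing import Any, Callable


def boundary(*annotations: str) -> Callable:
    """Attach boundary annotations to a function, rejecting contradictory sets."""
    tags = frozenset(annotations)
    if {"trusted", "malicious"} <= tags:
        # Checked once, when the interface is declared.
        raise ValueError("contradictory annotations: trusted and malicious")

    def decorate(func: Callable) -> Callable:
        func.boundary = tags
        return func

    return decorate


def call_across(func: Callable, *args: Any, validated: bool = False) -> Any:
    """Dynamic check: refuse unvalidated arguments on a 'malicious' boundary."""
    tags = getattr(func, "boundary", frozenset())
    if "malicious" in tags and not validated:
        raise PermissionError("arguments must be validated before crossing")
    return func(*args)


@boundary("malicious", "idempotent")
def lookup(key: str) -> str:
    return key.upper()
```

The annotations also serve as documentation: a reader of `lookup` learns the boundary assumptions from its declaration instead of from its implementation.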

Lessons for Applications
Make boundaries very explicit
– Not just client/server
– Independent systems
– Third-party software
– Third-party services (RPC to vendors/partners)
Have a few big modules…
– Otherwise too many boundaries, and no invariants
– Examples: Apache, Oracle, Inktomi search engine
– Big Modules are well supported
– Big Modules justify their cost (Lampson)

Partial checklist
What is shared? (namespace, schema?)
What kind of state in each boundary?
How would you evolve an API?
Lifetime of references? Expiration impact?
Graceful degradation as modules go down?
External persistent names?
Consistency semantics and boundary?

Conclusions
Most systems are fragile
Root causes:
– False transparency: assuming locality, trust, privacy…
– Implicitly changing boundaries: consistency, partial failure, …
Some of the causes:
– focus on computation, not data
– ignoring location distinctions
– overestimating consistency boundary
– degraded boundaries (RPC is not a PC)
Invariant Boundaries
– Help understanding, documentation
– Simplify detection
– Simpler and easier than full specifications (but weaker)

ACID vs. BASE
DBMS research is about ACID (mostly)
But we forfeit “C” and “I” for availability, graceful degradation, and performance
This tradeoff is fundamental.
BASE:
– Basically Available
– Soft-state
– Eventual consistency