Building and Programming the Cloud, Mysore, Jan 2010
Accountable distributed systems and the accountable cloud
Peter Druschel
Joint work with Andreas Haeberlen (1), Petr Kuznetsov (2), Rodrigo Rodrigues
(1) University of Pennsylvania
(2) TU Berlin/Deutsche Telekom Labs

Outline
  Why accountability?
  A definition
  A practical implementation: PeerReview
  Accountability in the Cloud
  Technical Challenges
  Conclusion

What is the problem?
  Multiple administrative domains (federated, p2p)
  Multiple stakeholders (hosting, Web)
    different actors, somewhat different interests
    lack of global visibility, control
  Complex faults
    software faults, misconfiguration, negligence, disgruntled employees, outside attacks, manipulation
  Lack of transparency

Learning from the 'offline' world
  Relies heavily on accountability to deal with faults, misbehavior
  Example: Banking

    Requirement             Solution
    Commitment              Signed receipts
    Tamper-evident record   Double-entry bookkeeping
    Inspections             Audits

  Record can be used to (manually)
    detect problems
    identify the responsible party
    convince that a problem does (not) exist

What does accountability mean in distributed systems?
  1. Tamper-evident record of each node's actions
  2. (Automated) audit for fault detection, localization
  3. Evidence to convince a third party that a fault has (not) occurred
  Accountability provides
    transparency
    trust
    incentives to avoid faults
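
One common way to make a record tamper-evident is a hash chain, in which each entry commits to everything recorded before it. The sketch below is an illustration under stated assumptions, not the mechanism from the talk: the names `TamperEvidentLog`, `entry_hash` and `verify_chain`, the JSON encoding, the all-zero starting value and the choice of SHA-256 are all hypothetical.

```python
import hashlib
import json


def entry_hash(prev_hash: str, seq: int, action: str) -> str:
    """Hash of one log entry, bound to the hash of the previous entry."""
    payload = json.dumps([prev_hash, seq, action]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class TamperEvidentLog:
    """Append-only log in which every entry commits to all earlier entries."""

    GENESIS = "0" * 64  # assumed fixed starting value for the chain

    def __init__(self) -> None:
        self.entries = []  # list of (seq, action, hash) tuples

    def append(self, action: str) -> str:
        prev = self.entries[-1][2] if self.entries else self.GENESIS
        seq = len(self.entries)
        h = entry_hash(prev, seq, action)
        self.entries.append((seq, action, h))
        # The returned top hash is what a node could sign and hand out
        # as a commitment to its whole history so far.
        return h


def verify_chain(entries, genesis: str = TamperEvidentLog.GENESIS) -> bool:
    """Recompute the chain; an edited, dropped or reordered entry breaks it."""
    prev = genesis
    for expected_seq, (seq, action, h) in enumerate(entries):
        if seq != expected_seq or entry_hash(prev, seq, action) != h:
            return False
        prev = h
    return True
```

An auditor would call `verify_chain` on the entries a node hands over. Note that cutting entries off the end of the log leaves a valid shorter chain, so that case is only detectable by comparing against a top hash the node committed to earlier.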

Ideal accountability
  Our goal is to automatically
    detect faults
    identify the faulty nodes
    convince others that a node is (or is not) faulty
  Can we build a system that provides the following guarantee?
    Whenever a node is faulty, the system generates a proof of misbehavior against that node
  Fault := Node deviates from expected behavior

Can we detect all faults?
  Problem: Faults that affect only a node's internal state
    Would require online trusted probes at each node
  Focus on observable faults: Faults that affect a correct node
    Can detect observable faults without requiring trusted components
  [Figure: nodes A and C, message X]

Can we always get a proof?
  Problem: He-said-she-said
    Three possible causes:
      A never sent X
      B refuses to acknowledge X
      X was delayed by the network
    Cannot get proof of misbehavior!
  Generalize to verifiable evidence:
    a proof of misbehavior, or
    a challenge that a faulty node cannot answer
  What if the challenged node does not respond?
    Does not prove a fault, but node is suspected until it responds
  [Figure: nodes A, B and C; A: "I sent X!", B: "I never received X!"]
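
The two kinds of verifiable evidence can be sketched as a small piece of bookkeeping: an unanswered challenge only makes a node suspected, and the suspicion clears once the node responds, whereas a proof of misbehavior is permanent. The `Witness` class and the three status names below are hypothetical names chosen for illustration, and the sketch assumes the caller has already checked that an answer or a proof is valid.

```python
from enum import Enum


class Status(Enum):
    TRUSTED = "trusted"      # no evidence against the node
    SUSPECTED = "suspected"  # at least one challenge is still unanswered
    EXPOSED = "exposed"      # a proof of misbehavior exists


class Witness:
    """Tracks evidence about other nodes (illustrative sketch only)."""

    def __init__(self) -> None:
        self.open_challenges = {}  # node -> set of unanswered challenge ids
        self.proofs = {}           # node -> list of proofs of misbehavior

    def challenge(self, node: str, challenge_id: str) -> None:
        self.open_challenges.setdefault(node, set()).add(challenge_id)

    def answer(self, node: str, challenge_id: str) -> None:
        # Assumes the answer was validated before this call.
        self.open_challenges.get(node, set()).discard(challenge_id)

    def record_proof(self, node: str, proof: str) -> None:
        # Assumes the proof was validated before this call.
        self.proofs.setdefault(node, []).append(proof)

    def status(self, node: str) -> Status:
        if self.proofs.get(node):
            return Status.EXPOSED
        if self.open_challenges.get(node):
            return Status.SUSPECTED
        return Status.TRUSTED
```

In the he-said-she-said case, C could challenge B to acknowledge X. A correct B can always answer, so it stays suspected only for as long as the network delays the exchange.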

Practical accountability
  Requirement for an accountable distributed system:
    Whenever a fault is observed by a correct node, the system eventually generates verifiable evidence against a faulty node
  This is useful
    Any (!) fault that affects a correct node is eventually detected and linked to a faulty node
  It can be implemented in practice

PeerReview
  Adds accountability to a given system
    Implemented as a library
    Provides tamper-evident record
    Detects faults via state-machine replay
  Assumptions:
    1. Nodes can be modeled as deterministic state machines
    2. There is a trusted reference implementation of the state machines
    3. Correct nodes can eventually communicate
    4. Nodes can sign messages
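
Fault detection via state-machine replay follows from assumptions 1 and 2: because nodes are deterministic, an auditor can feed the logged inputs to the trusted reference implementation and compare what it produces with what the node logged. The sketch below is a minimal illustration of that idea, not PeerReview's actual interface; `reference_counter` is a made-up toy service and `audit_by_replay` is a hypothetical helper name.

```python
from typing import Callable, List, Optional, Tuple

# A deterministic state machine: (state, input) -> (new_state, output)
Transition = Callable[[int, str], Tuple[int, str]]


def reference_counter(state: int, msg: str) -> Tuple[int, str]:
    """Toy reference implementation: 'inc' increments, anything else reads."""
    if msg == "inc":
        state += 1
    return state, f"count={state}"


def audit_by_replay(
    log: List[Tuple[str, str]],
    reference: Transition,
    initial_state: int,
) -> Optional[int]:
    """Replay logged inputs through the reference implementation.

    `log` holds (input, output-the-node-claims-it-sent) pairs.
    Returns the index of the first entry whose logged output differs from
    the reference output, or None if the whole log is consistent.
    """
    state = initial_state
    for i, (msg, logged_output) in enumerate(log):
        state, expected = reference(state, msg)
        if expected != logged_output:
            return i
    return None
```

A non-`None` result pinpoints the log entry where the node deviated from the expected behavior. Combined with a signed, tamper-evident log, that entry is the kind of evidence a third party could re-check by running the same replay.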

PeerReview is widely applicable
  App #1: NFS server in the Linux kernel
    Many small, latency-sensitive requests
    Faults: Tampering with files, Lost updates, Metadata corruption, Incorrect access control
  App #2: Overlay multicast
    Transfers large volume of data
    Faults: Freeloading, Tampering with content
  App #3: P2P email
    Complex, large, decentralized
    Faults: Denial of service, Attacks on DHT routing, Censorship
  Details in [Haeberlen et al., SOSP’07]
  NetReview [Haeberlen et al. NSDI’08]

How much does PeerReview cost?
  Log storage
    10 – 100 GByte per month, depending on application
  Message signatures
    Message latency (e.g. 1.5ms RTT with RSA-1024)
    CPU overhead (embarrassingly parallel)
  Log/authenticator transfer, replay overhead
    Depends on # witnesses
    Can be deferred to exploit bursty/diurnal load patterns
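
To put the log storage figure in perspective, it can be converted into an average log growth rate. The helper below is back-of-the-envelope arithmetic only; it assumes a 30-day month and decimal units (1 GByte = 10^9 bytes), neither of which is stated in the talk.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month


def sustained_rate_kb_per_s(gbytes_per_month: float) -> float:
    """Average log growth in kB/s, assuming 1 GByte = 1e9 bytes."""
    bytes_per_second = gbytes_per_month * 1e9 / SECONDS_PER_MONTH
    return bytes_per_second / 1e3


if __name__ == "__main__":
    for gb in (10, 100):
        print(f"{gb} GByte/month is about {sustained_rate_kb_per_s(gb):.1f} kB/s")
```

Under these assumptions the quoted range corresponds to an average of a few kB/s up to a few tens of kB/s of log data per node.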

Split administration in the Cloud
  [Figure: Alice, Bob, Alice's customers]
  What if there is a problem?
    Bug in Alice's software
    Subtle differences between Alice and Bob's environments
    ...
    Bug in Bob's software
    Insufficient resource allocation
    Hacker attack
    ...

Split administration: Alice's perspective
  [Figure: Alice, Bob, Alice's customers]
  If something is wrong, how will I know?
  How can I tell if it's my software or the cloud?
  If it's the cloud, how can I convince Bob?

Split administration: Bob's perspective
  [Figure: Alice, Bob, Alice's customers]
  If something is wrong, how will I know?
  How can I tell if it's the cloud or Alice's software?
  If it's Alice's software, how can I convince Alice?

An idealized solution
  What if we had an oracle that Alice and Bob could ask about problems?
    Completeness: If the cloud is faulty, the oracle will say so
    Accuracy: If the cloud is not faulty, the oracle will say so
    Verifiability: The oracle produces evidence that would convince a third party
  [Figure: Alice, Bob, Alice's customers, Oracle]

The accountable cloud
  Idea: Make cloud accountable
    Cloud records its actions in a tamper-evident log
    Alice can audit the log and check for faults
    Use log to construct evidence that a fault does (not) exist
  Should work even if one party was compromised!
  [Figure: Alice, Bob, Alice's customers, tamper-evident log]

Discussion
  Is this too pessimistic? Cloud isn't malicious!
    Hacker attacks, software bugs, operator error, malicious client, …
    Difficult to come up with a more restrictive fault model
    Without provable properties, evidence has little value
  Why would a provider want to deploy this?
    Attractive to prospective customers (peace of mind)
    Helps in handling customer complaints, resolving disputes

Is the technology ready?
  Cloud accountability should
    Have provable guarantees
    Work for most cloud applications
    Require no changes to application code
    Cover a wide spectrum of properties
    Have reasonable overhead
  Can existing techniques deliver this?
    CATS, Repeat&Compare, AIP, PeerReview, NetReview, AudIt, ...
  More work is needed!

Work in progress: AVM
  Goal: Provide accountability for arbitrary binary executables
  Idea: Accountable virtual machine (AVM)
    Cloud records enough data to enable deterministic replay
    Alice can replay log against a reference implementation
    Can audit any part of the hosted execution
  [Figure: Alice, Bob, virtual machine]

Challenges
  Complete state-machine replay expensive
    limit to spot checks, investigation of suspected faults
    multi-core replay is hard
    replay log against an abstract model?
  Checking performance properties
  Checking information flow
  Lots of research opportunities
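
One way to picture spot checks is to have the recorded execution include periodic state snapshots, so that an auditor can replay a single segment between two snapshots instead of the whole log. The sketch below is only an illustration under that assumption: the talk does not say how spot checks would be implemented, and `step`, `record_snapshots` and `spot_check` are hypothetical names with a toy transition function.

```python
from typing import List, Tuple


def step(state: int, event: int) -> int:
    """Toy deterministic transition: fold one event into the state."""
    return (state * 31 + event) % 1_000_003


def record_snapshots(
    events: List[int], initial_state: int, interval: int
) -> List[Tuple[int, int]]:
    """Run the execution and keep (event_index, state) every `interval` events.

    The first snapshot is the initial state; the last covers the final event.
    """
    snapshots = [(0, initial_state)]
    state = initial_state
    for i, ev in enumerate(events, start=1):
        state = step(state, ev)
        if i % interval == 0 or i == len(events):
            snapshots.append((i, state))
    return snapshots


def spot_check(
    events: List[int], snapshots: List[Tuple[int, int]], segment: int
) -> bool:
    """Replay only the events between snapshot `segment` and the next one."""
    (start, state), (end, claimed) = snapshots[segment], snapshots[segment + 1]
    for ev in events[start:end]:
        state = step(state, ev)
    return state == claimed
```

The cost of one check is proportional to the snapshot interval rather than to the length of the execution. The trade-off is that a fault confined to a segment nobody samples goes unnoticed, which is why full replay would still be needed when investigating a suspected fault.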

Summary
  Accountability is a useful capability in distributed systems
    tamper-evident record
    fault detection and localization
    evidence
  Proposal: the accountable cloud
    Can verify correct operation, produce evidence
    Provable guarantees: solid foundation for both players
  Challenges remain

Questions?