![Page 1: Implementation and Evaluation of a Protocol for Recording Process Documentation in the Presence of Failures Zheng Chen and Luc Moreau zc05r@ecs.soton.ac.uk.](https://reader036.fdocuments.us/reader036/viewer/2022062717/56649e415503460f94b33980/html5/thumbnails/1.jpg)
Implementation and Evaluation of a Protocol for Recording Process
Documentation in the Presence of Failures
Zheng Chen and Luc Moreau
zc05r@ecs.soton.ac.uk
University of Southampton
Outline
Motivation
Protocol Overview
Implementation
Experimental Setup
Experimental Results & Analysis
Conclusions & Future Work
The provenance of a data product refers to the process that led to that data product
Process documentation is a computer-based representation of a past process for determining provenance
Process documentation consists of a set of p-assertions
Process documentation is stored in provenance stores
Provenance is obtained by querying provenance stores
PReP (Groth 04-08)
A protocol to record process documentation
Multiple provenance stores are interlinked to enable retrievability of distributed process documentation
[Figure: Actor1 to Actor4 exchanging invocation and result messages; invocation and result p-assertions recorded in provenance stores PS1 to PS4; links between the stores forming a pointer chain]
Failures
Provenance store crash, communication failures
We do not consider application failures, e.g. actor crash
Poor quality process documentation: incomplete, disconnected
[Figure: Actor1 to Actor4 exchanging invocation and result messages; invocation and result p-assertions and links in provenance stores PS1 to PS4; broken pointer chain]
Requirements
Guaranteed Recording: After a process completes, the entire documentation of the process must eventually be recorded in provenance stores
Link Accuracy: All the links recorded during a process must eventually be accurate to enable retrievability of distributed documentation
Efficient Recording: The protocol should be efficient and introduce minimal overhead
F-PReP
A protocol for recording process documentation in the presence of failures
Derives from PReP to inherit its generic nature
Introduces an Update Coordinator to facilitate updating links (we assume the coordinator does not crash)
Actor’s side:
  Uses timeout and retransmission to record p-assertions
  Chooses alternative provenance stores in case of failures
  Requests the coordinator to update links
Provenance store:
  Replies with an acknowledgement only after it has successfully recorded p-assertions in its persistent storage
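The actor-side behaviour listed on this slide can be sketched roughly as follows. This is a minimal illustration in Python, not the authors' Java client library: the classes `InMemoryStore` and `Coordinator`, the exception `StoreUnavailable`, and the parameters `max_retries` and `timeout_s` are all hypothetical names introduced here, and a lost acknowledgement is simulated rather than timed.

```python
from dataclasses import dataclass, field


class StoreUnavailable(Exception):
    """Hypothetical signal that no acknowledgement arrived before the timeout."""


@dataclass
class InMemoryStore:
    """Stand-in for a provenance store; fail_times simulates lost acknowledgements."""
    name: str
    fail_times: int = 0
    recorded: list = field(default_factory=list)

    def record(self, batch, timeout_s):
        # timeout_s is unused in this stub; a real client would wait that long.
        if self.fail_times > 0:
            self.fail_times -= 1
            raise StoreUnavailable(self.name)
        self.recorded.extend(batch)  # a real store would persist before acknowledging
        return "ack"


@dataclass
class Coordinator:
    """Stand-in for the update coordinator; it only collects repair requests."""
    repair_requests: list = field(default_factory=list)

    def request_repair(self, old_store, new_store):
        self.repair_requests.append((old_store, new_store))


def record_with_remedies(batch, primary, alternates, coordinator,
                         max_retries=3, timeout_s=1.0):
    """Retransmit to the primary store, then fall back to alternative stores."""
    for store in [primary, *alternates]:
        for _ in range(max_retries):
            try:
                store.record(batch, timeout_s)
            except StoreUnavailable:
                continue  # no acknowledgement: retransmit to the same store
            if store is not primary:
                # Links pointing at the primary are now stale; request an update.
                coordinator.request_repair(primary.name, store.name)
            return store.name
    raise RuntimeError("no provenance store acknowledged the batch")
```

In this sketch the repair request is only issued once an alternative store has acknowledged the batch, so the coordinator is never asked to point links at a store that does not hold the p-assertions.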
F-PReP
[Figure: Actor1 to Actor4 exchanging invocation and result messages; invocation and result p-assertions and links in provenance stores PS1, PS2, PS3, PS4 and PS2’; a repair request to the Update Coordinator and two updates; pointer chain]
Implementation
Provenance Store:
  Implemented as a Java Servlet
  Backend store (Berkeley DB)
  Disk cache
  Flushing OS buffers to disk before providing an ack to actor
  Update Plug-In
Client Side Library:
  Remedial actions that cope with failures
  Multithreading for the creation and recording of p-assertions
  A local file store (Berkeley DB) for temporarily maintaining p-assertions
Update Coordinator:
  Implemented as a Java Servlet
  Berkeley DB is also employed to maintain request information
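The ordering "flush OS buffers to disk, then acknowledge" can be illustrated with a short sketch. It is written in Python against a plain append-only file purely for illustration; the actual store described here is a Java Servlet backed by Berkeley DB, and the function name `record_durably` and the "ack" return value are assumptions.

```python
import os


def record_durably(path, payload: bytes) -> str:
    """Append payload to path and acknowledge only after it has been synced."""
    with open(path, "ab") as f:
        f.write(payload)
        f.flush()             # push Python's user-space buffer to the OS
        os.fsync(f.fileno())  # ask the OS to write its buffers to the device
    return "ack"              # returned only once the sync call has completed
```

Acknowledging before the sync would be faster, but an operating system crash could then lose p-assertions that the actor already believes are recorded.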
Performance Study
Throughput of provenance store and coordinator
Scalability of update coordinator
Failure-free recording performance
Overhead of taking remedial actions
Performance impact on application
Experimental Setup
Iridis cluster (over 1000 processor cores)
Gigabit Ethernet
Tomcat 5.0 container
Berkeley DB Java Edition database
Java 1.5
A generator is used on an actor's side to inject random failure events:
  Failure to submit a batch of p-assertions to a provenance store
  Failure to receive an acknowledgement from a provenance store before a timeout
The generator produces failure events according to a failure rate, i.e., the number of failure events per total number of recordings
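A failure generator of this kind might look like the sketch below. It is an assumption-laden illustration, not the generator used in the experiments: the function name `plan_failures`, the two event labels, the seed parameter, and the choice to express the failure rate as a fraction between 0 and 1 are all introduced here.

```python
import random

# Hypothetical labels for the two injected failure events described above.
FAILURE_KINDS = ("submit_failed", "ack_timeout")


def plan_failures(total_recordings, failure_rate, seed=0):
    """Return {recording index: failure kind} for a run of total_recordings."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    rng = random.Random(seed)  # seeded so that a run can be repeated
    n_failures = round(total_recordings * failure_rate)
    # Pick which recordings fail, without replacement, then pick a kind for each.
    indices = rng.sample(range(total_recordings), n_failures)
    return {i: rng.choice(FAILURE_KINDS) for i in indices}
```

Sampling a fixed number of indices keeps the realised failure count equal to the requested rate, which is one reading of "number of failure events per total number of recordings"; drawing each recording independently with that probability would be an equally plausible alternative.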
1. Provenance Store (PS) Throughput
Setup: up to 512 clients sending 10k p-assertions to 1 PS in 10 min
Hypothesis: Disk cache may reduce a provenance store's throughput.
Result: 20% decrease in throughput
2. Coordinator Throughput
Setup: up to 512 clients sending 100 requests to 1 coordinator in 10 min
Hypothesis: The coordinator’s throughput is high.
Result: 30,000*100 repair requests accepted in 10 min
3. Throughput Experiment with Failures (1 client)
Setup: 1 client sending 10k p-assertions to 1 PS; 1 alt. PS and 1 coordinator used in the case of failures
Hypothesis: (a) Resending to the same PS is preferred over an alt. PS for transient failures
(b) The update coordinator is not a bottleneck.
A client sends at most 200*100 repair requests. (Maximum is seen when failure rate is 50%.)
Coordinator throughput: 30,000*100 req/10min
This implies that the coordinator can support a large number of clients (50 - 100?) without being a bottleneck.
4. Throughput Experiment with Failures (128 clients)
Setup: 128 clients sending 10k p-assertions to 1 PS; 1 alt. PS and 1 coordinator used in the case of failures
Hypothesis: (a) Resending to an alt. PS is preferred to the same PS
(b) The coordinator is not a bottleneck.
128 clients send at most 750*100 repair requests. (Maximum is seen when failure rate is 50%.)
Coordinator throughput: 30,000*100 req/10min
This implies that the coordinator can support a large number of clients without being a bottleneck.
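The "not a bottleneck" argument in experiments 3 and 4 is a ratio between the coordinator's measured capacity and the peak repair-request load. The sketch below works that ratio out from the figures on the slides. It assumes, which the slides do not state, that the peak request counts were observed over the same 10 minute window as the coordinator throughput; the constant and function names are introduced here.

```python
COORDINATOR_CAPACITY = 30_000 * 100  # repair requests accepted in 10 min (experiment 2)
ONE_CLIENT_PEAK = 200 * 100          # peak for 1 client at a 50% failure rate (experiment 3)
CLIENTS_128_PEAK = 750 * 100         # peak for 128 clients combined (experiment 4)


def headroom(capacity, load):
    """How many times the observed load fits into the measured capacity."""
    return capacity / load


single = headroom(COORDINATOR_CAPACITY, ONE_CLIENT_PEAK)     # 150.0
combined = headroom(COORDINATOR_CAPACITY, CLIENTS_128_PEAK)  # 40.0
```

Under that assumption, the coordinator's capacity is 150 times the peak load of a single client and 40 times the combined peak load of 128 clients, so the more cautious estimate of 50 to 100 clients sits below the single-client bound.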
5. Failure-free Recording Performance
Setup: 1 client recording 10,000 10k p-assertions to 1 PS; 100 p-assertions shipped in a single batch
Hypothesis: Disk cache causes overhead.
Results: (a) 900 10k p-assertions may be lost if PS’s OS crashes. (PReP)
(b) 13.8% overhead, compared to PReP
6. Overhead of Taking Remedial Actions
Setup: 1 client recording 100 p-assertions to 1 PS; 1 alt. PS and 1 coordinator used in the case of failures
Hypothesis: Remedial actions have acceptable overhead.
Result: <10% overhead, compared to failure-free recording time
7. Performance Impact on Application
Amino Acid Compressibility Experiment (ACE)
High performance and fine grained, thus representative
One run of ACE: 20 parallel jobs; 54,000 interactions/job
Extremely detailed process documentation
1.08 GB p-assertions/job in 25 minutes
Recording Performance in ACE
Setup: 5 PS and 1 coordinator; multithreading for creation and recording of p-assertions
Hypothesis: F-PReP has acceptable recording overhead.
Results: (a) Similar overhead (12%) to PReP on application performance when no failure occurs
(b) Timeout and queue management affect performance.
Impact of Queue Management on Performance
Hypothesis: Flow control on the queue affects performance.
Conclusions: (a) The result supports our hypothesis.
(b) We can monitor the queue and take actions, e.g., employing the local file store.
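One way to read "monitor the queue and take actions" is a queue that diverts new p-assertions to local storage once it grows past a threshold. The sketch below is only an illustration of that idea: the class `SpillingQueue`, the `high_water` threshold and the JSON-lines spill file are assumptions made here, whereas the client library described in this talk uses a Berkeley DB local file store.

```python
import json
from collections import deque


class SpillingQueue:
    """In-memory queue that spills to a local file once it reaches a threshold."""

    def __init__(self, spill_path, high_water=1000):
        self.spill_path = spill_path
        self.high_water = high_water
        self.memory = deque()
        self.spilled = 0

    def put(self, p_assertion):
        if len(self.memory) >= self.high_water:
            # Queue is full: append to the local file instead of blocking the actor.
            with open(self.spill_path, "a", encoding="utf-8") as f:
                f.write(json.dumps(p_assertion) + "\n")
            self.spilled += 1
        else:
            self.memory.append(p_assertion)

    def drain_spill(self):
        """Read back spilled p-assertions in order and empty the spill file."""
        try:
            with open(self.spill_path, encoding="utf-8") as f:
                items = [json.loads(line) for line in f]
        except FileNotFoundError:
            return []
        open(self.spill_path, "w").close()  # truncate after reading
        self.spilled = 0
        return items
```

The trade-off in this sketch is that the application thread never waits on a full queue, at the cost of extra local disk writes while the provenance store is slow or unreachable.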
8. Quality of Recorded Process Documentation
Setup: Using F-PReP and PReP to record p-assertions; querying PS to verify recorded documentation
Results: (a) PReP: incomplete; F-PReP: complete
(b) PReP: irretrievable; F-PReP: retrievable
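The "retrievable" criterion can be pictured as following links from one provenance store to the next until the chain ends. The sketch below is a simplified illustration, assuming a hypothetical in-memory layout where each store maps to its p-assertions and a single outgoing link; the slides do not describe the actual link representation or query interface.

```python
def follow_chain(stores, start):
    """Collect p-assertions by following links from start.

    Returns (collected p-assertions, True if the chain was followed to its end).
    """
    collected, seen, current = [], set(), start
    while current is not None:
        if current in seen or current not in stores:
            return collected, False  # cycle, or a link to an unavailable store
        seen.add(current)
        collected.extend(stores[current]["assertions"])
        current = stores[current]["link"]
    return collected, True
```

With an intact chain the function returns every p-assertion and `True`; when a link points at a store that is missing from `stores`, it returns only what was reachable and `False`, which corresponds to the incomplete and irretrievable outcome reported for PReP.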
Conclusions & Future Work
The coordinator does not affect an actor’s recording performance.
In an application, F-PReP has similar recording overhead to PReP on application performance when there is no failure.
Although it introduces overhead in the presence of failures, we believe the overhead is still acceptable, given that it can record high quality (i.e., complete and retrievable) process documentation.
We are currently investigating how to create process documentation when an application has its own fault tolerance schemes to tolerate application level failures.
In future work, we plan to make use of the process documentation recorded in the presence of failures to diagnose failures.
Questions?
Thank you!