Parallel Computation of Skyline Queries
Verification
COSC6490A Fall 2007 Slawomir Kmiec
Presentation Outline
Skyline Concepts
The Parallel Algorithm
JPF Experience
JPF Issues
Abstraction
Results
Future Work
Summary
Questions
Skyline Concepts
In a set of points (or records), identify the points that are not worse than any of the others on a given set of their attributes.
Name        Rating   Avg. Price
Parthenon   5        $45.00
Olympus     4        $40.00
Coliseum    4        $30.00
Pyramid     3        $25.00
Bombay      5        $35.00
Paris       5        $40.00
Roma        4        $35.00
Palermo     3        $30.00
Point p_a is said to dominate point p_b if, for all i such that 1 ≤ i ≤ d (where d is the number of attributes), we have x_i(p_a) ≤ x_i(p_b), and at least one of those inequalities is strict.
A point p in a set S is a skyline point if it is not dominated by any other point in S. The skyline of S is denoted sky(S).
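The two definitions above can be sketched in Java as follows. This is a minimal, sequential illustration and not the presented implementation: the class name SkylineSketch and its methods are hypothetical, and because the definition treats smaller values as better, the rating is stored negated (an assumption made here so that a higher rating counts as better).

```java
import java.util.ArrayList;
import java.util.List;

class SkylineSketch {
    // True if a dominates b: a is no worse in every attribute and strictly
    // better in at least one (smaller values are treated as better).
    static boolean dominates(double[] a, double[] b) {
        boolean strict = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) {
                return false;
            }
            if (a[i] < b[i]) {
                strict = true;
            }
        }
        return strict;
    }

    // Naive quadratic skyline: keep every point that no other point dominates.
    static List<double[]> skyline(List<double[]> points) {
        List<double[]> result = new ArrayList<>();
        for (double[] p : points) {
            boolean dominated = false;
            for (double[] q : points) {
                if (dominates(q, p)) {
                    dominated = true;
                    break;
                }
            }
            if (!dominated) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Each point is {-rating, price}; the rating is negated (assumption).
        List<double[]> hotels = new ArrayList<>();
        hotels.add(new double[] {-5, 45.00}); // Parthenon
        hotels.add(new double[] {-4, 40.00}); // Olympus
        hotels.add(new double[] {-4, 30.00}); // Coliseum
        hotels.add(new double[] {-3, 25.00}); // Pyramid
        hotels.add(new double[] {-5, 35.00}); // Bombay
        hotels.add(new double[] {-5, 40.00}); // Paris
        hotels.add(new double[] {-4, 35.00}); // Roma
        hotels.add(new double[] {-3, 30.00}); // Palermo
        for (double[] p : skyline(hotels)) {
            System.out.println("rating " + (int) (-p[0]) + ", price " + p[1]);
        }
    }
}
```

Under that convention, the skyline of the table consists of Coliseum, Pyramid, and Bombay: every other entry is matched or beaten on both attributes by one of these three.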
The Parallel Algorithm (A)
Principles:
→ data is divided equally and distributed
→ a local skyline is computed at each peer
→ the size of the local skyline is shared with peers
→ if the combined results fit on any processor then
   → local skylines are exchanged with peers
   → processor p_i picks the i-th chunk of the combined skyline and eliminates the points in it that the combined skyline dominates
   → local results are sent to the central process
→ end // of processing
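The chunk step of this branch can be sketched as below. This is a sequential simulation under stated assumptions, not the presented code: the class ChunkMergeSketch, the integer point representation, and the ceiling-division chunking are all hypothetical choices made for illustration, and the loop in globalSkyline stands in for the n processors working in parallel.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkMergeSketch {
    // True if a dominates b (smaller values are treated as better).
    static boolean dominates(int[] a, int[] b) {
        boolean strict = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) {
                return false;
            }
            if (a[i] < b[i]) {
                strict = true;
            }
        }
        return strict;
    }

    // Work of processor i out of n: take the i-th chunk of the combined local
    // skylines and keep only the points that no combined point dominates.
    static List<int[]> filterChunk(List<int[]> combined, int i, int n) {
        int size = (combined.size() + n - 1) / n; // ceiling division (assumption)
        int from = Math.min(i * size, combined.size());
        int to = Math.min(from + size, combined.size());
        List<int[]> kept = new ArrayList<>();
        for (int[] p : combined.subList(from, to)) {
            boolean dominated = false;
            for (int[] q : combined) {
                if (dominates(q, p)) {
                    dominated = true;
                    break;
                }
            }
            if (!dominated) {
                kept.add(p);
            }
        }
        return kept;
    }

    // Central process: concatenate the filtered chunks of all n processors.
    static List<int[]> globalSkyline(List<int[]> combined, int n) {
        List<int[]> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            result.addAll(filterChunk(combined, i, n));
        }
        return result;
    }
}
```

Because every chunk is checked against the whole combined set, the union of the filtered chunks is the skyline of the combined set, and no point is examined by more than one processor.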
The Parallel Algorithm (A cont.)
The Parallel Algorithm (B)
Principles (continued):
→ else // combined results do not fit on some p_i
   → loop until the required number of results is available or all p_i have finished do
      → each processor p_i picks a random set of points (in proportion to the size of its local skyline)
      → this set is submitted to all peers, which mark the points that they dominate; the marked points are returned to the sender
      → each processor p_i collects the points it submitted to peers, removes the marked ones from the original set, and sends the remaining ones to the central processor
   → end loop
→ end // of processing
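One round of this loop, seen from a single processor, might look like the sketch below. It is a sequential illustration only: the class SampleRoundSketch and its methods are hypothetical, peers are modelled as plain lists rather than remote processes, and removing the sampled points from the local skyline after each round is an assumption made here so that repeated rounds make progress.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class SampleRoundSketch {
    // True if a dominates b (smaller values are treated as better).
    static boolean dominates(int[] a, int[] b) {
        boolean strict = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) {
                return false;
            }
            if (a[i] < b[i]) {
                strict = true;
            }
        }
        return strict;
    }

    // Peer side: return the submitted points that this peer's skyline dominates.
    static List<int[]> markDominated(List<int[]> submitted, List<int[]> peerSkyline) {
        List<int[]> marked = new ArrayList<>();
        for (int[] p : submitted) {
            for (int[] q : peerSkyline) {
                if (dominates(q, p)) {
                    marked.add(p);
                    break;
                }
            }
        }
        return marked;
    }

    // Sender side: pick a random sample, let every peer mark it, and return
    // the unmarked points, which would go to the central processor.
    static List<int[]> round(List<int[]> local, List<List<int[]>> peers,
                             int sampleSize, Random rnd) {
        List<int[]> shuffled = new ArrayList<>(local);
        Collections.shuffle(shuffled, rnd);
        int count = Math.min(sampleSize, shuffled.size());
        List<int[]> sample = new ArrayList<>(shuffled.subList(0, count));
        local.removeAll(sample); // assumption: sampled points are not re-submitted
        for (List<int[]> peer : peers) {
            sample.removeAll(markDominated(sample, peer));
        }
        return sample;
    }
}
```

The removeAll calls rely on array identity, which is sufficient here because the marked points are the same array objects that were submitted.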
The Parallel Algorithm (B cont.)
JPF Experience
getting JPF
getting JPF to run
the Eclipse way
the Linux way
incremental examples
configuration options
JPF value-added services
JPF Issues
• independent processors
  - restricted to threads
• eliminate native code classes
  - no Swing, Sockets, NIO, Regex (Eclipse)
  - out of 15 just java.util.ArrayList left
  - eliminate Socket-oriented developed classes
• search-state-space reduction
  - input: 10 points
  - 2 worker threads
  - operation abstraction
  - output discarded
Abstraction
• 2 types of developed classes left
  - SkylineMain and SkylineWorker: workflow classes
  - “Handler” classes: request handling classes
[Class diagram: SkylineMain, SkylineMainListener, SkylineMainHandler, SkylineWorker, SkylineWorkerListener, SkylineWorkerHandler, shown together with Thread, Socket, and ServerSocket]
Abstraction (cont.)
• high volume of work:
  - due to a lot of original code
• removed all GUI:
  - remove Swing and AWT elements
• asynchronous Socket messaging done as:
  - keep references to workers instead of addresses
  - eliminate the “Listener” classes
  - each message done as an instance of the handler
  - create a handler for the destination worker
  - execute the synchronous (blocking) part of data sending
  - start the handler to execute the asynchronous processing
  - each type of message split into a synchronous and an asynchronous part
• file IO done as:
  - store parameters as static constants
  - store input data as an array
  - replace input scanning with referencing the array
  - display or discard output
• String.split() method (Regex) done as:
  - re-done as a String manipulation method
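A replacement for the regex-based String.split() could look like the sketch below. The class SplitSketch and its split method are hypothetical and are not taken from the verified application; the sketch uses only indexOf, substring, and java.util.ArrayList, in line with the restrictions listed on the JPF Issues slide.

```java
import java.util.ArrayList;
import java.util.List;

class SplitSketch {
    // Splits text on a single delimiter character using only indexOf and
    // substring, so no java.util.regex classes are involved.
    static String[] split(String text, char delimiter) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        int end;
        while ((end = text.indexOf(delimiter, start)) >= 0) {
            parts.add(text.substring(start, end));
            start = end + 1;
        }
        parts.add(text.substring(start)); // remainder after the last delimiter
        return parts.toArray(new String[0]);
    }
}
```

Unlike String.split(), this version keeps trailing empty fields, which is usually acceptable for fixed-format input records.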
Results
• issues reported
  - different issues at different settings
  - large volume of output to be analyzed
• uncaught-exception conditions
  - issues regarding un-synchronized access
  - the above reported as IllegalMonitorStateException
• dead-lock conditions
  - issues regarding termination conditions
• PreciseRaceDetector
  - “Unprotected Variable Access” severe warnings
• possibly more
  - it ran for a long time with no other errors
  - it did not finish in the time given
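For readers unfamiliar with IllegalMonitorStateException, the sketch below shows one common way it arises in Java: calling notifyAll() (or wait()) on an object without holding its monitor. The slides do not say which call triggered the exception in the application, so the class MonitorSketch is a generic, hypothetical illustration rather than a reproduction of the reported issue.

```java
class MonitorSketch {
    private final Object lock = new Object();
    private boolean done = false;

    // Incorrect: notifyAll() is called without owning the monitor of lock,
    // so the JVM throws IllegalMonitorStateException.
    void signalUnsafely() {
        done = true;
        lock.notifyAll();
    }

    // Correct: the flag update and the notification both happen while the
    // calling thread holds the monitor of lock.
    void signalSafely() {
        synchronized (lock) {
            done = true;
            lock.notifyAll();
        }
    }

    boolean isDone() {
        synchronized (lock) {
            return done;
        }
    }
}
```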
Future Work
• atomize code
  - wrap code fragments into atomic operations
• protect shared variable access
  - use locks or synchronized blocks
  - re-run PreciseRaceDetector
• run it for an extended period of time
  - to search the complete state space
• analyze the applicability of issues found
  - with respect to the original application
  - not as a result of the abstraction or transformation
• reduce shared data interaction
  - handlers create private data structures that are quickly accepted by the corresponding main process
  - this will allow greater robustness and redundancy
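The "protect shared variable access" item could be approached along the lines of the sketch below, in which every read and write of a shared counter goes through a synchronized block. The class SharedResultsSketch and its members are hypothetical and only illustrate the general technique; they are not the application's handler classes.

```java
class SharedResultsSketch {
    private final Object lock = new Object();
    private int received = 0; // shared between worker threads

    // Every write to the shared counter holds the same lock.
    void record() {
        synchronized (lock) {
            received++;
        }
    }

    // Reads take the lock as well, so they see the latest value.
    int received() {
        synchronized (lock) {
            return received;
        }
    }

    // Starts the given number of threads, each recording perThread results,
    // waits for all of them, and returns the final count.
    static int run(int threads, int perThread) {
        SharedResultsSketch shared = new SharedResultsSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    shared.record();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return shared.received();
    }
}
```

With the synchronized blocks in place, the final count equals threads times perThread; without them, concurrent increments could be lost.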
Summary
• JPF is a flexible and complex tool
• JPF is memory- and time- intensive
• JPF is a valuable verification tool
• the application had to be changed extensively to work with JPF
• potential issues were found by JPF
• verification = value-added service
  - extra testing
  - code refinement (robustness)
Questions
???