The University of Southern Queensland
Stijn Dekeyser, Griffith U. Seminar, October ’04

Towards a new approach to Document Collaboration
Stijn Dekeyser, USQ, Australia
Jan Hidders, U. Antwerp, Belgium
A Concurrency Control mechanism for XML databases

Contents
Part I
• Asynchronous collaborative work
  – cvs, change tracking, cscw, ems
• Synchronous collaborative work
• A new approach: use cases and clients
• Discussion
Part II
• Introduction: XML
• Running Example
• Classic CC methods
• Data Models
  – User Data Model
  – Scheduler Data Model
• Path Lock Schemes
  – Propagation (PL-PROP)
  – Satisfiability (PL-SAT)
• Schedulers
  – Commit scheduler
  – Conflict scheduler

Part I

Asynchronous collaborative work
• CVS
  – Version repository + concurrency
  – Text, line-based
  – Human intervention
• Change Tracking
  – MS Office, OOo et al.
  – Human mgt + intervention

Asynchronous collaborative work
• CSCW & EMS
  – Many different systems
  – Docs distributed wholly at all sites
  – Messages update docs
  – Human intervention needed

Synchronous collaborative work
• CSCW & EMS
• XML-enabled RDBMS
  – Traditional table locking
• Native XML databases
  – Document-based locking

A new approach: use cases
• Document Authoring
  – May move section while updating
• CMS
  – XForms causes overwrite
• Design & art
  – May update same parts!
• Programming
  – Change function name globally

A new approach: servers
– Implementation:
  • Native or XML-enabled db
  • Use of Path Locks in transactions
  • Document collaboration protocol (dcp)
– Type:
  • Enhanced web server, e.g. in Apache
  • or P2P implementation, e.g. in Word
– Extra features:
  • Access Control (Elena Ferrari et al.)
    – Elements contain user rights
  • Version management (Epic)
    – Elements contain version information

A new approach: clients
• General purpose XML editors
  – E.g. Epic or XMLSpy
• Specific purpose XML editors
  – E.g. Autocad or Excel
• Issues:
  – Intelligently query section to be updated
  – Commit when possible
  – Refresh content

Discussion

Part II

Introduction: XML
• XML is an evolution of document language technology
  – 1969: GML (Generalized Markup Language)
  – 1974: SGML (Standard …), Goldfarb
  – 1986: ISO standard
  – 1989: HTML, Berners-Lee, Berglund, Cailliau at CERN
• XML much simpler than SGML (10% of spec)
• Now: much more data stored as XML
  – Enter the XML-DBMS age…

Running example (1/2)
<document id="0">
  <person id="1" age="55">
    <name> Peter </name>
    <addr> Parklane 7 </addr>
    <child>
      <person id="3" age="22">
        <name> John </name>
        <addr> Unistreet 1 </addr>
        <hobby> swimming </hobby>
        <hobby> cycling </hobby>
      </person>
    </child>
    <child>
      <person id="4" age="7">
        <name> David </name>
        <addr> Parklane 7 </addr>
      </person>
    </child>
  </person>
  <person id="2" age="43">
    <name> Mary </name>
    <addr> Parklane 7 </addr>
    <hobby> painting </hobby>
  </person>
</document>

Running example (2/2)
Queries:
• /document/person//hobby
• //child//hobby
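
As a rough illustration, the two queries can be run against the running example with Python's standard `xml.etree.ElementTree`. This is only a sketch: ElementTree supports a limited XPath subset, so the absolute paths are rewritten relative to the `<document>` element, and the commas between attributes are dropped so that the document parses. The names `DOC`, `q1` and `q2` are illustrative.

```python
import xml.etree.ElementTree as ET

# The running example, with attribute commas dropped so it is well-formed.
DOC = """
<document id="0">
  <person id="1" age="55">
    <name>Peter</name><addr>Parklane 7</addr>
    <child>
      <person id="3" age="22">
        <name>John</name><addr>Unistreet 1</addr>
        <hobby>swimming</hobby><hobby>cycling</hobby>
      </person>
    </child>
    <child>
      <person id="4" age="7">
        <name>David</name><addr>Parklane 7</addr>
      </person>
    </child>
  </person>
  <person id="2" age="43">
    <name>Mary</name><addr>Parklane 7</addr>
    <hobby>painting</hobby>
  </person>
</document>
"""

root = ET.fromstring(DOC)  # root is the <document> element

# /document/person//hobby, written relative to <document>
q1 = [h.text for h in root.findall("./person//hobby")]

# //child//hobby
q2 = [h.text for h in root.findall(".//child//hobby")]

print(q1)
print(q2)
```

The first query reaches every hobby below a top-level person, including Mary's; the second reaches only hobbies that sit somewhere below a child element.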

Classic CC methods (1/3)
Table locking
• How: On update, whole table is locked
• Precludes phantoms
• XML: parent-child relation in 1(*) table
• Example:
  – Query: //child//hobby
  – Update that should be allowed: change hobby element not occurring under child
  – Not possible when entire table is locked

Classic CC methods (2/3)
Predicate locking
• How
  – Locks in form of predicate: name=“person”
  – Predicate indicates what has been read
• Example:
  – Query: /document/person//hobby
  – Update (ok): create person under root element
  – Update (~ok): create hobby under this person
  – Both are not possible since 1st predicate locks all persons under the root

Classic CC methods (3/3)
Hierarchical locking
• How
  – Lock granule -> intention lock on ancestors
  – Change granule X -> exclusive lock on X
Tree locking
• How
  – Lock node -> lock parent of node
  – Add node under X -> exclusive lock on X
And ... query //A//B requires shared-locks on entire tree

User Data Model (1/2)
Data Model
• (XPath-tree) Tree with labelled nodes
  – NB: we ignore ordering of children
Path Expressions
• Sequence of tag names and wild-cards (*)
• Separated by / (child) and // (descendants).
  – person/child
  – *//person/child

User Data Model (2/2)
Query
• Q(n,p): yields set of nodes which are reachable from n via path expression p.
  – Q(n,*//hobby)
Addition
• A(n,a): add node with name a under n
  – A(n,hobby)
  – Fails if n is not there, yields new node.
Deletion
• D(n): delete n
  – Fails if n has children.
Commit
• C(): end of transaction
Node-correctness: Thou shalt only use nodes which you have obtained via an addition or via a query.
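
The operations above can be sketched on an in-memory tree. This is a minimal illustration, not the authors' implementation: the `Node` class and the helpers `parse` and `descendants` are hypothetical, `//` is read as "descendant at depth one or more", and the check "fails if n is not there" is omitted because every node handed to `A` in this sketch exists by construction.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by content
class Node:
    label: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def parse(path):
    """Split 'a//b/c' into [('child','a'), ('desc','b'), ('child','c')]."""
    parts = re.split(r"(//|/)", path)
    steps = [("child", parts[0])] if parts[0] else []
    for sep, name in zip(parts[1::2], parts[2::2]):
        steps.append(("desc" if sep == "//" else "child", name))
    return steps


def descendants(n):
    """All nodes strictly below n."""
    for c in n.children:
        yield c
        yield from descendants(c)


def Q(n, path):
    """Q(n,p): nodes reachable from n via the path expression."""
    current = [n]
    for axis, name in parse(path):
        nxt = []
        for m in current:
            pool = m.children if axis == "child" else descendants(m)
            for c in pool:
                if (name == "*" or c.label == name) and c not in nxt:
                    nxt.append(c)
        current = nxt
    return current


def A(n, a):
    """A(n,a): add a node labelled a under n and yield the new node."""
    child = Node(a, parent=n)
    n.children.append(child)
    return child


def D(n):
    """D(n): delete n; fails if n has children (n is assumed not to be the root)."""
    if n.children:
        raise ValueError("deletion fails: node has children")
    n.parent.children.remove(n)


# Node-correctness: every node used below was obtained via A or Q.
root = Node("root")
doc = A(root, "document")
person = A(doc, "person")
hobby = A(person, "hobby")
found = Q(root, "*//hobby")
```

Here `found` holds the single hobby node: `*` matches `document` and `//hobby` then matches any hobby below it.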

Scheduler’s Data Model (1/2)
• Instance Graph
  – Acyclic graph with labelled nodes
  – Nodes labelled with a delete set:
    • Identifiers of transactions which deleted the node.
• Actual Instance
  – Subgraph of instance graph formed by the nodes with an empty delete set
  – Is always an XPath-tree

Scheduler’s Data Model (2/2)
Query
• Q(n,p): yields a set of nodes which are – in the actual instance – reachable from n via path expression p.
Addition
• A(n,a): adds node with name a under n
  – Empty delete set
  – Fails if n is not in the actual instance.
Deletion
• D(n): add transaction to the delete set of n
  – Fails if n has children in the actual instance.
Commit
• C(): delete nodes with transaction in delete set
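
A small sketch of the delete-set idea, under stated assumptions: `SNode`, `add`, `delete` and `commit` are hypothetical names, transactions are identified by plain strings, and commit simply unlinks every node whose delete set mentions the committing transaction. Locking is left out entirely.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(eq=False)  # identity comparison, as nodes are unique objects
class SNode:
    label: str
    parent: Optional["SNode"] = None
    children: List["SNode"] = field(default_factory=list)
    delete_set: Set[str] = field(default_factory=set)  # ids of deleting transactions


def in_actual_instance(n):
    """A node belongs to the actual instance when its delete set is empty."""
    return not n.delete_set


def actual_children(n):
    return [c for c in n.children if in_actual_instance(c)]


def add(n, a):
    """A(n,a): the new node starts with an empty delete set."""
    if not in_actual_instance(n):
        raise ValueError("addition fails: n is not in the actual instance")
    child = SNode(a, parent=n)
    n.children.append(child)
    return child


def delete(txn, n):
    """D(n): only mark n; the node stays in the instance graph until commit."""
    if actual_children(n):
        raise ValueError("deletion fails: n has children in the actual instance")
    n.delete_set.add(txn)


def commit(txn, root):
    """C(): unlink the nodes below root whose delete set contains txn."""
    for c in list(root.children):
        commit(txn, c)
        if txn in c.delete_set:
            root.children.remove(c)


root = SNode("root")
a = add(root, "a")
b = add(a, "b")
delete("t1", b)          # b leaves the actual instance but is still in the graph
delete("t1", a)          # allowed now: a has no children in the actual instance
marked_only = list(root.children)   # a is still physically present
commit("t1", root)
```

Keeping deleted nodes in the graph until commit is what lets the scheduler still refer to them while other transactions hold locks on them.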

PL propagation scheme (1/3)
Read Locks
● rl(n, p)
● e.g. rl(n12, //a//b)
Required read locks:
● For Q(n,p) request rl(n,p) and do ....
● Read lock propagation:
  1) rl(n, a/p) -> rl(n', p) if n' is an a-child of n
  2) rl(n, */p) -> rl(n', p) if n' is a child of n
  3) rl(n, a//p) -> rl(n', *//p) and rl(n', p) if n' is an a-child of n
  4) rl(n, *//p) -> rl(n', *//p) and rl(n', p) if n' is a child of n
● Recomputing propagation on update is very easy (!)
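
The four propagation rules can be sketched as a short recursive procedure. This is an illustration under assumptions: the `Node` class, `split_head` and `propagate` are hypothetical names, locks are stored as `(node, path)` pairs in a set, and a lock whose path is a single step is taken to propagate no further. The tree is a cut-down version of the running example without the name and addr elements.

```python
class Node:
    def __init__(self, label, *children):
        self.label = label
        self.children = list(children)


def split_head(p):
    """Split 'a//rest' into ('a', '//', 'rest'); a single step gives (p, None, '')."""
    i = p.find("/")
    if i == -1:
        return p, None, ""
    if p.startswith("//", i):
        return p[:i], "//", p[i + 2:]
    return p[:i], "/", p[i + 1:]


def propagate(n, p, locks):
    """Record rl(n, p), then apply rules 1-4 to the children of n."""
    if (n, p) in locks:
        return
    locks.add((n, p))
    head, sep, rest = split_head(p)
    if sep is None:              # single step left: nothing to push down
        return
    for c in n.children:
        if head != "*" and c.label != head:
            continue             # rules 1 and 3 need an a-child
        if sep == "//":          # rules 3 and 4 also pass on *//p
            propagate(c, "*//" + rest, locks)
        propagate(c, rest, locks)


def paths_at(n, locks):
    return {p for m, p in locks if m is n}


john = Node("person", Node("hobby"), Node("hobby"))
child = Node("child", john)
peter = Node("person", child)
mary = Node("person", Node("hobby"))
document = Node("document", peter, mary)
root = Node("root", document)

locks = set()
propagate(root, "*//child//hobby", locks)
```

With this input the `child` node ends up holding four read locks, while Mary's `person` node holds only the two that still wait for a child element further down.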

PL propagation scheme (2/3)
Example: [Figure: the running-example tree (Doc. root, document, person, child, age, name, addr, hobby), with nodes annotated by the read locks propagated from *//child//hobby: *//child//hobby, child//hobby, *//hobby and hobby.]

PL propagation scheme (3/3)
Write Locks
– wl(n, a) or wl(n, *)
Required write locks:
– For A(n,a) request:
  • wl(n,a)
– For D(n) request:
  • wl(n, *) if n exists
  • wl(n', a) if n is an a-child of n'
Conflict rules:
– wl(n, *) and wl(n, a) conflict with rl(n, *) and rl(n, a)
– All others do not. (... write-write conflicts?)
Number of locks:
• updates: O(1)
• queries: O(|p|.|G|)

Path-lock satisfiability scheme
Read locks
● rl(n, p) (see PL-Prop)
● For Q(n,p) only rl(n,p) is necessary
Write locks
● wl(n,a) and wl(n,*)
● For A(n,a) and D(n) necessary as with PL-Prop.
Conflict rules
● wl(n,a) and wl(n,*) conflict with rl(n',p) if
  1) there is a path from n' to n with label-list L and
  2) L/a (or L/*) satisfies the path expression p
Number of locks:
• updates: O(1)
• queries: O(1)
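
The satisfiability test in the conflict rule can be sketched as a matcher between a label list and a path expression. This is one possible reading, not the authors' algorithm: `parse`, `satisfies` and `conflicts` are hypothetical names, L is taken to be the labels of the nodes strictly below n' down to and including n, `//` means "descendant at depth one or more", and a `*` coming from wl(n,*) is treated as matching any tag name.

```python
import re


def parse(path):
    """Split 'a//b/c' into [('child','a'), ('desc','b'), ('child','c')]."""
    parts = re.split(r"(//|/)", path)
    steps = [("child", parts[0])] if parts[0] else []
    for sep, name in zip(parts[1::2], parts[2::2]):
        steps.append(("desc" if sep == "//" else "child", name))
    return steps


def satisfies(labels, path):
    """True if the whole label list is matched by the path expression."""
    steps = parse(path)

    def go(i, j):
        # labels[i:] must be consumed exactly by steps[j:]
        if j == len(steps):
            return i == len(labels)
        if i == len(labels):
            return False
        axis, name = steps[j]
        hit = name == "*" or labels[i] == "*" or labels[i] == name
        if hit and go(i + 1, j + 1):
            return True
        # a descendant step may skip this label and try further down
        return axis == "desc" and go(i + 1, j)

    return go(0, 0)


def conflicts(write_label, labels_to_n, read_path):
    """wl(n, write_label) against rl(n', read_path); labels_to_n is L."""
    return satisfies(list(labels_to_n) + [write_label], read_path)


# rl(root, *//child//hobby) held by a reader; n' is the document root.
under_john = ["document", "person", "child", "person"]
under_mary = ["document", "person"]
blocked = conflicts("hobby", under_john, "*//child//hobby")
allowed = not conflicts("hobby", under_mary, "*//child//hobby")
```

Adding a hobby under John (who sits below a child element) conflicts with the reader, whereas adding one under Mary does not, which is the update the table-locking slide says should be allowed.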

Commit scheduler
• Transactions make requests for operations:
  – Query, Addition, Deletion, Commit, Roll-back
• Scheduler accepts request only if
  1) the operation does not fail, and
  2) the required locks do not conflict with existing locks
• On Commit, the following locks are released:
  – those of the committing transaction, and
  – those on the nodes deleted by the transaction
• On Roll-back, the following locks are released:
  – those of the transaction being rolled back

Conflict Scheduler
• Scheduler has a Dependency Graph (DG):
  – arrow t1 --> t2 if a lock of an operation of transaction t1 conflicts with a lock of a preceding operation of t2
• Scheduler accepts request only if
  1) the operation does not fail, and
  2) no cycles appear in the DG
  – A commit of t1 is not accepted if in the DG arrows depart from t1.
  – A roll-back of t1 leads to a roll-back of t2 if t2 --> t1 in the DG.
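
The dependency-graph bookkeeping can be sketched as follows. This is a minimal illustration with hypothetical names (`DependencyGraph`, `add_dependency`, `commit`, `rollback`); lock conflict detection is assumed to happen elsewhere and only its result, an arrow t1 --> t2, is recorded here. Cascading the roll-back transitively is an assumption of this sketch.

```python
class DependencyGraph:
    def __init__(self):
        self.edges = {}  # t1 -> set of t2 with an arrow t1 --> t2

    def _reaches(self, src, dst):
        """Depth-first search: is dst reachable from src along arrows?"""
        stack, seen = [src], set()
        while stack:
            t = stack.pop()
            if t == dst:
                return True
            if t in seen:
                continue
            seen.add(t)
            stack.extend(self.edges.get(t, ()))
        return False

    def add_dependency(self, t1, t2):
        """Record t1 --> t2; refuse (return False) if a cycle would appear."""
        if t1 == t2:
            return True  # a transaction does not depend on itself
        if self._reaches(t2, t1):
            return False
        self.edges.setdefault(t1, set()).add(t2)
        return True

    def _remove(self, t):
        self.edges.pop(t, None)
        for targets in self.edges.values():
            targets.discard(t)

    def commit(self, t):
        """A commit of t is not accepted while arrows depart from t."""
        if self.edges.get(t):
            return False
        self._remove(t)
        return True

    def rollback(self, t):
        """Roll back t and every transaction with an arrow into a rolled-back one."""
        rolled, stack = [], [t]
        while stack:
            cur = stack.pop()
            if cur in rolled:
                continue
            rolled.append(cur)
            stack.extend(s for s, targets in self.edges.items() if cur in targets)
        for cur in rolled:
            self._remove(cur)
        return rolled


g = DependencyGraph()
g.add_dependency("t2", "t1")              # t2 conflicts with an earlier operation of t1
cycle_refused = not g.add_dependency("t1", "t2")
t2_must_wait = not g.commit("t2")
t1_commits = g.commit("t1")
t2_commits = g.commit("t2")
```

In this run the request that would close the cycle is refused, `t2` cannot commit before `t1`, and once `t1` has committed the arrow disappears and `t2` may commit as well.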

Conclusion and Further Research
• Commit and conflict schedulers guarantee serialisability
• Complexity is decided by the size of the document / instance
• Is order of children a problem?
  – simulation
  – write-write conflicts
• What about the relocation of subtrees?
  – Identity of nodes to be taken into account?
• No use / knowledge of (entire) instance?
  – Use of DataGuide
  – Instance Independent CC for SSD