University of Lübeck, Germany
Institute of Information Systems
Incremental Validation of String-Based XML Data in Databases, File Systems and Streams
Beda Christoph Hammerschmidt³, Christian Werner², Ylva Brandt², Volker Linnemann¹, Sven Groppe¹, Stefan Fischer²
¹ Institute of Information Systems, University of Lübeck, Germany
² Institute of Telematics, University of Lübeck, Germany
³ Oracle Corp., Redwood Shores, California, USA
Incremental Validation of String-based XML Data © Volker Linnemann et al. 2.10.2007
Table of Contents
1. Introduction and Motivation
2. The XML Validation Problem
3. Efficiently Validating Updates
4. Experiments
5. Conclusion
1. Introduction and Motivation
• XML Data is important in many applications
• Valid XML data increases the correctness of applications
• Validity according to an XML DTD or an XML Schema
1. Introduction and Motivation
• In case of an update:
  – Revalidation of the whole document is time-consuming
  – Solution: Incremental Validation
[Figure: XML document; only the changed part is validated]
1. Introduction and Motivation
• Some approaches for partial validation exist, but:
  – most of them are DOM-based, i.e. they work on a tree of nodes
    • DOM: inherently well-formed
• We focus on the string representation of XML data as it is used in
  – XML column types
  – Message systems (SOAP)
  – SQLXML update commands
  → Sequence of tags and values
2. The XML Validation Problem
• XML Schema:
2. The XML Validation Problem
• Regular Tree Grammar of XML Schema:
G = (N,T,P,S)
N: set of Nonterminal Symbols
T: set of Terminal Symbols
P: set of Production Rules
S: set of Start Symbols, S ⊆ N
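A minimal sketch of how the four-tuple could be held in code. The two-element grammar (A, B, a, b) and the class and method names are illustrative assumptions, not the example from the slides; each production is written in the usual regular-tree-grammar form X → t(r), where r is a regular expression over N.

```python
from dataclasses import dataclass


@dataclass
class RegularTreeGrammar:
    nonterminals: set   # N
    terminals: set      # T (element names)
    productions: dict   # P: nonterminal -> (terminal, content model over N)
    start: set          # S

    def is_well_formed(self) -> bool:
        """Check S ⊆ N and that every rule is headed by known symbols."""
        heads_ok = set(self.productions) <= self.nonterminals
        tags_ok = all(tag in self.terminals
                      for tag, _ in self.productions.values())
        return self.start <= self.nonterminals and heads_ok and tags_ok


# Hypothetical grammar: A -> a(B?), B -> b(A A), start symbol A
G = RegularTreeGrammar(
    nonterminals={"A", "B"},
    terminals={"a", "b"},
    productions={"A": ("a", "B?"), "B": ("b", "A A")},
    start={"A"},
)
```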
2. The XML Validation Problem
• Example:
2. The XML Validation Problem
• Set of Finite State Machines generated out of a regular tree grammar
• Example:
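XML Schema Aware Pushdown Automaton (PDA)
Input: <a> <b> <a></a> <a></a> </b> </a>
Stack after each tag (top of stack leftmost):
(start)  Z
<a>      q0
<b>      r0 q1
<a>      q0 r1 q1
</a>     r1 q1
<a>      q0 r2 q1
</a>     r2 q1
</b>     q1
</a>     Stack empty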
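The stack trace on this slide can be reproduced with a small pushdown validator. This is a sketch under assumptions: the transition tables below are hypothetical content-model automata (a may contain an optional b, b contains exactly two a), chosen only so that the run matches the states shown here. The real automata are generated from the schema, as described in the paper. The names `START`, `DELTA`, `ACCEPTING`, `ROOTS` and `run_pda` are illustrative.

```python
import re

# Hypothetical automata: start state per element, transitions on child
# elements, and accepting states mapped to the element they may close.
START = {"a": "q0", "b": "r0"}
DELTA = {("q0", "b"): "q1", ("r0", "a"): "r1", ("r1", "a"): "r2"}
ACCEPTING = {"q0": "a", "q1": "a", "r2": "b"}
ROOTS = {"a"}


def run_pda(xml):
    """Return (is_valid, trace); each stack is listed top first."""
    stack = ["Z"]
    trace = ["Z"]
    for slash, name in re.findall(r"<(/?)(\w+)>", xml):
        if slash:
            # Closing tag: the top state must be accepting for this element.
            if not stack or ACCEPTING.get(stack[0]) != name:
                return False, trace
            stack.pop(0)
        elif stack == ["Z"]:
            # Root element replaces the initial stack symbol.
            if name not in ROOTS:
                return False, trace
            stack = [START[name]]
        else:
            # Opening tag: advance the parent's automaton, push the child's.
            if not stack or (stack[0], name) not in DELTA:
                return False, trace
            stack[0] = DELTA[(stack[0], name)]
            stack.insert(0, START[name])
        trace.append(" ".join(stack) or "Stack empty")
    return not stack, trace


ok, trace = run_pda("<a><b><a></a><a></a></b></a>")
```

Each tag costs one table lookup and one push or pop, so a run is linear in the number of tags it reads.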
3. Efficiently Validating Updates
• Element State Index
3. Efficiently Validating Updates
The Element/State-Index references the XML data and the PDA states for the document.
[Figure: the fragment <b> <c> 27 </c> </b> with the PDA stack after each token (r0, s0 r2, s1 r2, r2, Stack empty) and a sample index entry: /a/b/c, 7, OpenC, s0]
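One way such an index could be filled during a full validation pass is sketched below. The layout (path mapped to a list of offset, event and state triples) and the use of character offsets are assumptions for illustration. The automata are the hypothetical ones for elements a and b, not the schema behind this slide, and the input is assumed to be valid already.

```python
import re

# Hypothetical automata: a may contain an optional b, b contains two a.
START = {"a": "q0", "b": "r0"}
DELTA = {("q0", "b"): "q1", ("r0", "a"): "r1", ("r1", "a"): "r2"}


def build_index(xml):
    """Map each element path to (offset, event, state) for its opening tags.

    Assumes xml is valid; an invalid child raises KeyError in DELTA.
    """
    index, path, stack = {}, [], []
    for match in re.finditer(r"<(/?)(\w+)>", xml):
        slash, name = match.groups()
        if slash:
            path.pop()
            stack.pop(0)
            continue
        if stack:
            # Advance the parent's automaton before pushing the child's.
            stack[0] = DELTA[(stack[0], name)]
        stack.insert(0, START[name])
        path.append(name)
        entry = (match.start(), "Open" + name.upper(), stack[0])
        index.setdefault("/" + "/".join(path), []).append(entry)
    return index


index = build_index("<a><b><a></a><a></a></b></a>")
```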
3. Efficiently Validating Updates
• Finding the update position in the XML data using the index
3. Efficiently Validating Updates
How efficient is the incremental validation?
– The PDA is generated only once for the XML Schema
– Time for the validation is linear in the size of the updated part; it is independent of the total size of the document
– Time for the index update is also linear in the size of the updated part, except for updating the offset list
But: the offset list is not needed for validating the update; it is used only for searching
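The linear-time claim can be illustrated with a sketch that validates only a replacement fragment. It assumes the index can supply the parent's automaton state just before the replaced element and just after it. The automata are the hypothetical ones for elements a and b, and `step` and `validate_update` are illustrative names, not the paper's API.

```python
import re

# Hypothetical automata: a may contain an optional b, b contains two a.
START = {"a": "q0", "b": "r0"}
DELTA = {("q0", "b"): "q1", ("r0", "a"): "r1", ("r1", "a"): "r2"}
ACCEPTING = {"q0": "a", "q1": "a", "r2": "b"}


def step(stack, slash, name):
    """Apply one tag to a stack (top first); return new stack or None."""
    if slash:
        ok = bool(stack) and ACCEPTING.get(stack[0]) == name
        return stack[1:] if ok else None
    if not stack or (stack[0], name) not in DELTA:
        return None
    return [START[name], DELTA[(stack[0], name)]] + stack[1:]


def validate_update(fragment, state_before, state_after):
    """Check a replacement element without reading the rest of the document.

    The fragment is accepted if it moves the parent's automaton from
    state_before to state_after, exactly as the replaced element did.
    """
    stack = [state_before]
    for slash, name in re.findall(r"<(/?)(\w+)>", fragment):
        stack = step(stack, slash, name)
        if stack is None:
            return False
    return stack == [state_after]


# Replacing the <b> element inside <a>: the parent goes from q0 to q1.
accepted = validate_update("<b><a></a><a></a></b>", "q0", "q1")
rejected = validate_update("<b><a></a></b>", "q0", "q1")
```

Only the tags of the fragment are scanned, so the work grows with the size of the update and not with the size of the stored document.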
4. Experiments
• Time to validate the XMark Sample Data
• Updated Element: 20 kB
  – Xenia global: PDA without incremental update
  – Xenia local: PDA with incremental update
5. Conclusion
• Incremental validation by using a Pushdown Automaton (PDA):
  – Costs are linear in the size of the update operation
  – Validation is performed before updating the data
    → no invalid data
• In the paper:
  – formalism for generating the PDA
  – element/state index in detail
5. Conclusion
• Directions for Future Work
  – Optimize Index Update
  – Index only for selected paths
    → Index Selection Problem
  – Update Index only when needed
Thank you for your attention!