University of Lübeck, Germany Institute of Information Systems Incremental Validation of String- Based XML Data in Databases, File Systems and Streams Beda Christoph Hammerschmidt 3 , Christian Werner 2 , Ylva Brandt 2 , Volker Linnemann 1 , Sven Groppe 1 , Stefan Fischer 2 1 Institute of Information Systems U of Lübeck, Germany 2 Institute of Telematics U of Lübeck, Germany 3 Oracle Corp. Redwood Shores California, USA


University of Lübeck, Germany
Institute of Information Systems

Incremental Validation of String-Based XML Data in Databases, File Systems and Streams

Beda Christoph Hammerschmidt³, Christian Werner², Ylva Brandt², Volker Linnemann¹, Sven Groppe¹, Stefan Fischer²

¹ Institute of Information Systems, U of Lübeck, Germany
² Institute of Telematics, U of Lübeck, Germany
³ Oracle Corp., Redwood Shores, California, USA


Incremental Validation of String-based XML Data © Volker Linnemann et al., 2.10.2007

Table of Contents

1. Introduction and Motivation

2. The XML Validation Problem

3. Efficiently Validating Updates

4. Experiments

5. Conclusion


1. Introduction and Motivation

• XML Data is important in many applications

• Valid XML data increases the correctness of applications

• Validity according to an XML DTD or an XML Schema


1. Introduction and Motivation

• In case of an update:
  – Revalidation of the whole document is time consuming
  – Solution: Incremental Validation

[Figure: XML document; validate the changed part only]


1. Introduction and Motivation

• Some approaches for partial validation exist, but:
  – most of them are DOM-based, i.e. they work on a tree of nodes
  – DOM: inherently well formed

• We focus on the string representation of XML data as it is used in
  – XML column types
  – Message Systems (SOAP)
  – SQLXML update commands

→ Sequence of tags and values


2. The XML Validation Problem

• XML Schema:


2. The XML Validation Problem

• Regular Tree Grammar of XML Schema:

G = (N, T, P, S)

N: set of Nonterminal Symbols
T: set of Terminal Symbols
P: set of Production Rules
S: set of Start Symbols, S ⊆ N
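As a rough illustration of the tuple G = (N, T, P, S), the sketch below stores a regular tree grammar as plain Python data. The class name, the field names, the encoding of content models as regular expressions over single-letter nonterminals, and the tiny example grammar are all assumptions made for this sketch; none of them come from the slides.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RegularTreeGrammar:
    nonterminals: frozenset   # N
    terminals: frozenset      # T (element names)
    productions: dict         # P: nonterminal -> (terminal, content model)
    start: frozenset          # S, must be a subset of N

    def __post_init__(self):
        # Enforce S ⊆ N and that every rule only uses known symbols.
        if not self.start <= self.nonterminals:
            raise ValueError("start symbols must be nonterminals")
        for lhs, (terminal, _model) in self.productions.items():
            if lhs not in self.nonterminals or terminal not in self.terminals:
                raise ValueError(f"unknown symbol in rule for {lhs}")


def children_ok(grammar, nonterminal, children):
    """Check a child sequence against the content model of one rule.

    Assumption: nonterminals are single letters, so the content model can
    be matched as an ordinary regular expression over the joined names.
    """
    _terminal, model = grammar.productions[nonterminal]
    return re.fullmatch(model, "".join(children)) is not None


# Hypothetical grammar: A -> a(B?), B -> b(A A)
G = RegularTreeGrammar(
    nonterminals=frozenset("AB"),
    terminals=frozenset("ab"),
    productions={"A": ("a", "B?"), "B": ("b", "AA")},
    start=frozenset("A"),
)
```

Here `children_ok(G, "B", ["A", "A"])` accepts a `b` element with two `a` children, while a single child is rejected.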


2. The XML Validation Problem

• Example:


2. The XML Validation Problem

• Set of Finite State Machines generated out of a regular tree grammar

• Example:


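XML Schema Aware Pushdown Automaton (PDA)

Input: <a> <b> <a></a> <a></a> </b></a>

Stack contents after each tag (top of stack first):

(start)   Z
<a>       q0
<b>       r0 q1
<a>       q0 r1 q1
</a>      r1 q1
<a>       q0 r2 q1
</a>      r2 q1
</b>      q1
</a>      Stack empty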
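The stack trace on this slide can be reproduced with a small tag-level pushdown automaton. The following is a minimal sketch, not the construction from the paper: the transition table, the accepting states and the helper names (`START`, `DELTA`, `FINAL`, `OWNER`, `ROOTS`, `validate`) are assumptions inferred from the trace, and text content between tags is ignored.

```python
import re

# Hypothetical automata, inferred from the stack trace on the slide.
START = {"a": "q0", "b": "r0"}      # element name -> start state of its automaton
DELTA = {("q0", "b"): "q1",         # inside <a>: one optional <b>
         ("r0", "a"): "r1",         # inside <b>: exactly two <a>
         ("r1", "a"): "r2"}
FINAL = {"q0", "q1", "r2"}          # states in which an element may be closed
OWNER = {"q0": "a", "q1": "a", "r0": "b", "r1": "b", "r2": "b"}
ROOTS = {"a"}                       # element names allowed as document root

TAG = re.compile(r"<(/?)(\w+)>")


def validate(xml, trace=None):
    """Return True if the tag sequence is accepted (stack empty at the end)."""
    stack = ["Z"]                   # top of stack is the last list element
    for closing, name in TAG.findall(xml):
        if not stack:               # a tag after the root element was closed
            return False
        top = stack[-1]
        if closing:
            # The current element may only close in an accepting state.
            if top not in FINAL or OWNER[top] != name:
                return False
            stack.pop()
        else:
            if name not in START:
                return False
            if top == "Z":          # opening the root replaces the bottom marker
                if name not in ROOTS:
                    return False
                stack.pop()
            else:                   # advance the parent's automaton by one child
                nxt = DELTA.get((top, name))
                if nxt is None:
                    return False
                stack[-1] = nxt
            stack.append(START[name])
        if trace is not None:
            trace.append(" ".join(reversed(stack)))   # top of stack first
    return not stack
```

Passing a list as `trace` collects the stack contents after each tag, so the run can be compared with the slide line by line.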


3. Efficiently Validating Updates

• Element State Index


3. Efficiently Validating Updates

The Element/State-Index referencing XML data and PDA states for the document

<c> 27 </c>

<b> <c> 27 </c> </b>

PDA stack contents (top of stack first): r0 | s0 r2 | s1 r2 | r2 | Stack empty

Index entry: /a/b/c 7 OpenC s0
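One way to picture an element/state index is as a map from element paths to string offsets and saved PDA stacks, so that an update can be checked by running the automaton over the new fragment only. The sketch below is a loose illustration under several assumptions: the automata, the entry layout (`start`, `end`, `before`, `after`), the function names, and the acceptance rule (the stack after the new fragment must equal the stack after the old element) are all hypothetical, and paths are assumed to be unique.

```python
import re

# Hypothetical automata for a document shaped like <a><b><c>27</c></b></a>.
START = {"a": "q0", "b": "r0", "c": "s0"}
DELTA = {("q0", "b"): "q1", ("r0", "c"): "r2", ("s0", "#text"): "s1"}
FINAL = {"q1", "r2", "s1"}
OWNER = {"q0": "a", "q1": "a", "r0": "b", "r2": "b", "s0": "c", "s1": "c"}

TOKEN = re.compile(r"<(/?)(\w+)>|([^<\s][^<]*)")


def classify(m):
    if m.group(3) is not None:
        return "text", None
    return ("close" if m.group(1) else "open"), m.group(2)


def step(stack, kind, name):
    """Apply one token to a stack (tuple, top last); None means invalid."""
    if kind == "close":
        if not stack or stack[-1] not in FINAL or OWNER[stack[-1]] != name:
            return None
        return stack[:-1]
    symbol = "#text" if kind == "text" else name
    if stack:                                  # advance the enclosing element
        nxt = DELTA.get((stack[-1], symbol))
        if nxt is None:
            return None
        stack = stack[:-1] + (nxt,)
    elif kind == "text":
        return None
    if kind == "open":
        if name not in START:
            return None
        stack = stack + (START[name],)
    return stack


def build_index(xml):
    """Validate the whole document once and record offsets and stacks."""
    index, names, stack = {}, [], ()
    for m in TOKEN.finditer(xml):
        kind, name = classify(m)
        before, stack = stack, step(stack, kind, name)
        if stack is None:
            raise ValueError(f"invalid document at offset {m.start()}")
        if kind == "open":
            names.append(name)
            index["/" + "/".join(names)] = {"start": m.start(), "before": before}
        elif kind == "close":
            entry = index["/" + "/".join(names)]
            entry["end"], entry["after"] = m.end(), stack
            names.pop()
    return index


def validate_update(index, path, fragment):
    """Check a replacement for the element at `path`, reading only the fragment."""
    entry = index[path]
    stack = entry["before"]
    for m in TOKEN.finditer(fragment):
        stack = step(stack, *classify(m))
        if stack is None:
            return False
    return stack == entry["after"]
```

With `index = build_index("<a><b><c>27</c></b></a>")`, a call such as `validate_update(index, "/a/b/c", "<c>42</c>")` touches only the fragment, which matches the claim that validation cost depends on the size of the update rather than on the size of the document.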


3. Efficiently Validating Updates

• Finding the update position in the XML data using the index


3. Efficiently Validating Updates

How efficient is the incremental validation?

– PDA is generated only once for the XML Schema
– Time for the validation is linear in the size of the updated part; it is independent of the total size of the document
– Time for the index update is also linear in the size of the updated part, except for updating the offset list

But: the offset list is not needed for validating the update; it is used only for searching
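To see why the offset list is the one part of the index update that is not bounded by the size of the update, consider a sorted list of tag offsets. Assuming (this is an interpretation, not stated on the slide) that the list stores absolute string positions, every offset behind the updated region has to be shifted when the replacement has a different length. The function name and parameters below are hypothetical.

```python
from bisect import bisect_left


def shift_offsets(offsets, old_end, delta):
    """Shift all offsets at or after `old_end` by `delta`, in place.

    `offsets` is a sorted list of absolute tag positions, `old_end` is the
    end of the replaced region in the old string, and `delta` is the change
    in length. Returns how many entries had to be touched.
    """
    first = bisect_left(offsets, old_end)      # O(log n) to find the position
    for i in range(first, len(offsets)):       # O(n) entries to rewrite
        offsets[i] += delta
    return len(offsets) - first


# Tag offsets in "<a><b><c>27</c></b></a>"
offsets = [0, 3, 6, 11, 15, 19]
# Replace the text "27" (positions 9..11) by "12345": length grows by 3.
touched = shift_offsets(offsets, old_end=11, delta=3)
```

The number of touched entries depends on how much of the document follows the update, not on the size of the update itself.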


4. Experiments

• Time to validate the XMark Sample Data

• Updated Element: 20 kB
  – Xenia global: PDA with no incremental update
  – Xenia local: PDA with incremental update


5. Conclusion

• Incremental validation by using a Pushdown Automaton (PDA):
  – Costs are linear in the size of the update operation
  – Validation is performed before updating the data
    → no invalid data

• In the paper:
  – formalism for generating the PDA
  – element/state index in detail


5. Conclusion

• Directions for Future Work
  – Optimize Index Update
  – Index only for selected paths
    → Index Selection Problem
  – Update Index only when needed

Thank you for your attention!
