Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

41
Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu Presented by Claudia Levin

description

Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu. Presented by Claudia Levin. Introduction. Here are investigated Incremental validation algorithm for XML document presented as DTD (Data Type Definition) in O(m log n) . - PowerPoint PPT Presentation

Transcript of Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Page 1: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Incremental Validation of XML Documents

Yannis PapakonstantinouVictor Vianu

Presented by Claudia Levin

Page 2: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Introduction

Here are investigated

Incremental validation algorithm for XML document presented as DTD (Data Type Definition) in O(m log n).

Incremental validation algorithm for XML schema in O(m log2 n).

Using the auxiliary structure of size O(n) for both.

Page 3: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Example of an XML document<dealer>

<UsedCars>

<ad>

<model>Honda</model>

<year>92</year>

</ad>

</UsedCars>

<NewCars>

<ad>

<model>BMW</model>

</ad>

</NewCars>

</dealer>

Page 4: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

An XML document as Labeled Ordered Tree

Dealer

UsedCars NewCars

Ad Ad AdAd

Model Year ModelModelModelYear

Honda 92 Subaru 99 BMW Mazda

Page 5: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Abstraction of Document Type Definitions (DTDs)

The basic mechanism for specifying the type of XML documents.

root : dealer

dealer → UC NC

UC → ad*

NC → ad*

ad → model (year| ε)

model → ε

year → ε

Page 6: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Specialized DTD abstraction (XML Schema)

A specialized DTD is a 4-tuple ‹,t,d,μ› where is a finite alphabet of labels, t is a finite alphabet of types, d is a DTD over and μ is a mapping from t to .

Page 7: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Specialized DTD (XML Schema) example

root : dt

dt → UCt NCt μ(dt) = dealer

UCt → adu* μ(UCt) = UC

NCt → adn* μ(NCt) = NC

adu → mt yt μ(adu) = ad

adn → mt μ(adn) = ad

mt → ε μ(mt) = model

yt → ε μ(yt) = year

Page 8: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Specialized DTD example

Dealer

UC NC

Ad Ad AdAd

Model Year ModelModelModelYear

dt

NCt

adu

ytytmt mt mt mt

aduadn adn

UCt

Page 9: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Incremental Validation Problem

Given a specialized DTD , a tree sat(), and a sequence of updates to yielding another tree ’, we wish to efficiently check if ’ sat().

Use and maintain the auxiliary structure () to help in the validation.

Page 10: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Update types

Replace the current label of a specified node by another label;

Insert a new leaf node after a specified node;

Insert a new leaf node as the first child of a specified node;

Delete a specified leaf node.

Page 11: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Node label renaming u(ai,b)

r

ai-1 ai ai+1

c1 c2 cn

… …

Page 12: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

New node inserting

Insert air

ai-1 ai ai+1

… …

Page 13: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Deleting of a node

Delete air

ai-1 ai ai+1

… …

Page 14: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Warmup: incremental validation of Strings

Check the validity of a string a1 … an with respect to NFA N = ‹,Q,Q0,F,δ› after a sequence of element renames u(ai1,b1)…u(aim,bm), where i1 < i2 <…< im.

Validating the new string from scratch by running it throw N takes O(n |Q2| log|Q|)

Page 15: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Incremental validation of Strings (the first attempt)

Consider a single renaming u(i,b) for 1≤i≤n.

Pre(i)= δ(q0,a1…ai-1)

Post(i)={s | δ(s,ai+1…an) F}

bs2

Pre(i)Post(i)

S2 δ(b,s1)

Page 16: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Definition of Transition Relation

For each I,j 1≤ I < j ≤ n

Ti,j = {‹p,q› | p,q Q, q δ(p, ai…aj)}

δb = {<r,s>| r,s Q, s δ(r,b)}

qai+1

pai

aj

sb

r

Page 17: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Checking of validity with Transition Relation

The updated string

a1…ai1-1b1ai1+1…aim-1bmaim+1…an

is valid iff

<qo,f> To(i1-1)o δb1

o T(i1+1)(i2-1) o …oT(im+1)(n)

Time complexity here is O(m|Q2| log |Q|)

Page 18: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Divide-and-conquer validation with Transition Relation Tree

Validates a sequence of m renamings to a string of length n.

The time taken is O(m|Q|2 log|Q| log n) The auxiliary structure size is O(|Q|2 n)

Page 19: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Transition Relation Tree example

Τ18

Τ14 Τ58

Τ12 Τ34 Τ56 Τ78

Τ11 Τ22 Τ33 Τ 44 Τ55 Τ66 Τ77 Τ88

a1 a2a3 a4 a5 a6 a7 a8

The number of nodes in T1n is 2n-1. Its depth is log n.

Page 20: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Label renaming with Transition Relation Tree

Consider a1…an L(n) and a sequence of renames u(i1,b1), …,u(im, bm), where i1<i2<…<im. The updated string is a1…ai1b1ai+1…ai,m-1 bm ai,m+1…am .

The relations Tij which are affected by the updates are those laying on the path from a leaf changed to the root of Tn.

The number of relations changed is at most mlogn.

Page 21: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Label Renaming by Divide-and-Conquer approach in O(log n)

U(a3,b)Τ18

Τ14 Τ58

Τ12 Τ34 Τ56 Τ78

Τ11 Τ22 Τ33 Τ 44 Τ55 Τ66 Τ77 Τ88

a1 a2 a3 a4 a5 a6 a7 a8b

Page 22: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Dealing with inserts and deletes: Why B-trees?

Inserts and deletes cause the position of the nodes in the string to change.

The length of the string and the set of relevant intervals used to construct Tn are now dynamic.

Tree should continue to be balanced and have depth O(log n)

Page 23: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

B-trees

3 cells in each node; The cell is either empty or contains a set

Ts corresponding to some subsequence s of the string.

At most one of the 3 cells in a node can be empty.

Each nonempty cell is either at a leaf or has one node as a child.

Page 24: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

B-Trees for dealing with inserts and deletes in O(log n)

Tsa,Tsb,Tsc

Ts1,Ts2 Ts3,Ts5,Ts6 Ts7,Ts9

n1 n3 n5 n6 n7 n9n2

Tsa = Ts1 o Ts2 Tsb = Ts3 o Ts5 o Ts6

Page 25: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Validation with B-trees with respect to NFA N = ‹,Q,Q0,F,δ‹

When T for the updated string is computed, check that for some f F, <qo,f> belongs to the composition of the sets Ts in the cells of the root node of T.

The cost of checking is O(|Q|2 log|Q|)

Page 26: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Insertion to a Transition Relation Tree

Insertion of nodes n4 and n8

Tsa,Tsb,Tsc

Ts1,Ts2 Ts3,Ts5,Ts6 Ts7,Ts9

n1 n3 n5 n6 n7 n9n2

n8n4

Page 27: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Insertion to a Transition Relation Tree

Insertion of nodes n4 and n8

Tse,Tsf

Ts1,Ts2 Ts7,Ts8,Ts9

n1 n3 n5 n6 n7 n9n2 n8n4

Ts3,Ts4 Ts5,Ts6

Tsb’’,TscTsa,Tsb’

Page 28: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

B-Tree validation algorithm costs

Renaming: update propagates from the leaf to the root – O(log n) updates.

Insertion or deletion: may involve splits and merges of the cells all the way to the root. The worst case complexity is O(|Q|2 log|Q| log n)

Page 29: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Incremental DTD validation

d → r(d)root

d …

a1 ai-1 ai ai+1 an

c1 c2 c3 c4

… …

bv

Page 30: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Incremental DTD validation

The auxiliary structure maintained: for each sequence of siblings in the tree the transition relations Ts of the divide-and-conquer algorithm are preemptively computed.

The auxiliary structure size is at most O(|| |d|2 |T|), where |T| is the size of T and |d|=max{|ra| | a → ra d}

The total validation time is O(m || |d|2 log |d| log |T|)

Page 31: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Specialized DTDs: a first attempt

Tree T is valid iff root(d) types(root(T))r

v

ai-1 ai ai+1

c1 c2 c3 c4 cn

… …

types(r)

types(v)

types(an)types(ai)

b

Page 32: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Specialized DTDs: a first attempt

The auxiliary structure size is the same as for DTDs, at most O(|| |d|2 |T|), where |T| is the size of T and |d|=max{|ra| | a → ra d} .

The total validation time for DTD is O(m || |d|2 log |d| log |T|).

The total validation time for specialized DTD is O(m |t| |d|2 log |d| depth(T) log |T|).

Page 33: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Binary tree encoding of unranked tree

a

bd

jk

e

f h

g i

a

b kj

#

d

c

#

#

#c# #

e

f

#

i

#h

#

#

#

g #

Page 34: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Binary tree encoding of unranked tree

One of the standard encodings in the literature (F.Neven. Automata, Logic and XML. In Computer Science Logic, 2002)

Lemma: For each specialized DTD = ‹,t,d,μ› there exists a BNTA A over #

whose number of states is O(|t||d|) such that (A) = {enc(T) | T sat(),

Page 35: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Principle lines

a

b kj

#

d #

#

#c# #

e

f

#

i

#h

#

#

#

g #

Page 36: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

From BNTA to NFA on principal lines

a

b kj

#

d #

#

#c# #

e

f

#

i

#h

#

#

#

g #

TC, b d, Tj

Tg, f

Page 37: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

From BNTA to NFA on principal lines

a b d e f ih

c

k

j g

Tc, b d, Tj Tg, f

Page 38: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

NFA construction

We’ll construct NFA N which accepts the string an…a1 iff NTFA A = ‹#,Q,Q0,qf,δ› accepts enc(T)

Let NFA N = ‹’,Q,q0,F’,δ’›, where ’= {#} υ (Q x ) υ ( x Q), F’= {qf}, and δ’(#,q0) = Q0; δ’(‹a,S›,q) = υq’ S

δ(a,q,q’) for a ; δ’(q,‹a,S›) = υq’ S

δ(a,q’,q) for a ;

Page 39: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Line rearrangement for insertions and deletions

vv’

ll0

l’

l’’

Page 40: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Complexity Results

Given sequence of m updates for DTD XML abstraction we get

The auxiliary structure size is at most O(|| |d|2 |T|), where |T| is the size of T and |d|=max{|ra| | a → ra d}

The total validation time is O(m || |d|2 log |d| log |T|)

Page 41: Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu

Complexity Results

Given sequence of m updates for specialized DTD (XML schema) we get

The auxiliary structure size is at most O(|| |d|2 |T|);

The total validation time is

O(m |t|2 |d|2 log (|t||d|) log2 |T|)