Querying Streaming XML Data
Date posted: 19-Dec-2015

Page 1: Querying Streaming XML Data. Layout of the presentation  Introduction  Common Problems faced  Solution proposed  Basic Building blocks of the solution.

Querying Streaming XML Data

Layout of the presentation

• Introduction
• Common Problems faced
• Solution proposed
• Basic Building blocks of the solution
• How to build up a solution to a given query
• Features of the system

Streaming XML

• XML – standard for information exchange.
• Some XML documents are only available in streaming format.
• Streaming is like reading data from a tape drive.
• Used in stock market, news, and network statistics feeds.
• Predecessor systems were used to filter documents.

Structure of an XPath Query

• Consists of a location path and an output expression (name).
• The location path consists of a closure axis (//), a node test (book), and a predicate (year>2000).

e.g. //book[year>2000]/name

Features of our Approach

• Efficient
• Easy-to-understand design
• Design of BPDT is tricky

Common Problems faced

 1. <root>
 2.   <pub>
 3.     <book id="1">
 4.       <price> 12.00 </price>
 5.       <name> First </name>
 6.       <author> A </author>
 7.       <price type="discount"> 10.00 </price>
 8.     </book>
 9.     <book id="2">
10.       <price> 14.00 </price>
11.       <name> Second </name>
12.       <author> A </author>
13.       <author> B </author>
14.       <price type="discount"> 12.00 </price>
15.     </book>
16.     <year> 2002 </year>
17.   </pub>
18. </root>

Query: /pub[year=2002]/book[price<11]/author

Evaluating the query over the stream:

• Element satisfies the path.
• Line 4 (price 12.00): Failure??
• Line 7 (price 10.00): Test passed. But year=2002?
• Lines 12-13 (book 2): Buffer both A & B.
• Book 2 (prices 14.00 and 12.00): Failed price<11. Remove.
• Line 16 (year 2002): Test passed. Output.
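
The buffering behaviour described on this slide can be sketched with a small SAX handler. This is a hand-written illustration for this one query only, not the BPDT/HPDT construction from the talk. The class name AuthorQuery and its fields are hypothetical, and, as on the slide, <root> is treated as the document root so that /pub matches its pub children. The sketch decides each book at its end tag, which is a simplification.

```python
import xml.sax

DOC = b"""<root>
<pub>
<book id="1">
<price> 12.00 </price>
<name> First </name>
<author> A </author>
<price type="discount"> 10.00 </price>
</book>
<book id="2">
<price> 14.00 </price>
<name> Second </name>
<author> A </author>
<author> B </author>
<price type="discount"> 12.00 </price>
</book>
<year> 2002 </year>
</pub>
</root>"""

PUB = ["root", "pub"]
BOOK = PUB + ["book"]


class AuthorQuery(xml.sax.ContentHandler):
    """Hypothetical evaluator for /pub[year=2002]/book[price<11]/author."""

    def __init__(self):
        super().__init__()
        self.path = []         # names of the open elements, outermost first
        self.text = []         # character chunks of the current element
        self.year_ok = False   # [year=2002] already satisfied for this pub?
        self.price_ok = False  # [price<11] already satisfied for this book?
        self.book_buf = []     # authors of the current book
        self.pub_buf = []      # authors whose book passed; year still unknown
        self.output = []       # authors emitted as results

    def startElement(self, name, attrs):
        self.path.append(name)
        self.text = []
        if self.path == PUB:
            self.year_ok, self.pub_buf = False, []
        elif self.path == BOOK:
            self.price_ok, self.book_buf = False, []

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        value = "".join(self.text).strip()
        self.text = []
        if self.path == BOOK + ["author"]:
            self.book_buf.append(value)            # buffer for the future
        elif self.path == BOOK + ["price"]:
            if float(value) < 11:
                self.price_ok = True               # one passing price is enough
        elif self.path == BOOK:
            if self.price_ok and self.year_ok:
                self.output.extend(self.book_buf)  # everything known: output
            elif self.price_ok:
                self.pub_buf.extend(self.book_buf) # wait for the year
            self.book_buf = []                     # failed price<11: remove
        elif self.path == PUB + ["year"]:
            if float(value) == 2002:
                self.year_ok = True
                self.output.extend(self.pub_buf)   # test passed: output
                self.pub_buf = []
        elif self.path == PUB:
            self.pub_buf = []                      # year never matched: discard
        self.path.pop()


def run(doc: bytes) -> list:
    handler = AuthorQuery()
    xml.sax.parseString(doc, handler)
    return handler.output


print(run(DOC))  # ['A']
```

Only author A of the first book is output: the authors of the second book are buffered and then discarded at line 15, and A is held until the year arrives at line 16.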

Problems caused by closure axis

 1. <root>
 2.   <pub>
 3.     <book>
 4.       <name> X </name>
 5.       <author> A </author>
 6.     </book>
 7.     <book>
 8.       <name> Y </name>
 9.       <pub>
10.         <book>
11.           <name> Z </name>
12.           <author> B </author>
13.         </book>
14.         <year> 1999 </year>
15.       </pub>
16.     </book>
17.     <year> 2002 </year>
18.   </pub>
19. </root>

Query: //pub[year=2002]//book[author]//name

pub [year=2002]      book [author]
Line 2   True        Line 7    False
Line 2   True        Line 10   True
Line 9   False       Line 10   True

• pub at line 9 (year 1999): Fails year=2002
• pub at line 2 (year 2002): Passes year=2002

Problems caused by closure axis

 1. <root>
 2.   <pub>
 3.     <book>
 4.       <name> X </name>
 5.       <author> A </author>
 6.     </book>
 7.     <book>
 8.       <name> Y </name>
 9.       <author> B </author>
10.       <pub>
11.         <book>
12.           <name> Z </name>
13.           <author> B </author>
14.         </book>
15.         <year> 1999 </year>
16.       </pub>
17.     </book>
18.     <year> 2002 </year>
19.   </pub>
20. </root>

Query: //pub[year=2002]//book[author]//name

Let's add an author to the second book (line 9). Result?
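
One way to check the matching patterns is to enumerate them offline. The sketch below is not a streaming evaluator: it loads the whole document with ElementTree and lists, for every name element, each (pub, book) ancestor pair together with the outcome of the two predicates. The function names patterns and result are hypothetical, and the documents are the slide documents with whitespace removed.

```python
import xml.etree.ElementTree as ET

DOC_A = """<root><pub>
  <book><name>X</name><author>A</author></book>
  <book><name>Y</name>
    <pub><book><name>Z</name><author>B</author></book><year>1999</year></pub>
  </book>
  <year>2002</year>
</pub></root>"""

# Same document with an author added to the second book.
DOC_B = DOC_A.replace("<name>Y</name>", "<name>Y</name><author>B</author>")


def patterns(doc: str) -> dict:
    """Map each name to a list of (pub_ok, book_ok), one per (pub, book) ancestor pair."""
    root = ET.fromstring(doc)
    parent = {child: p for p in root.iter() for child in p}
    found = {}
    for name in root.iter("name"):
        chain = []                       # ancestors of this name, innermost first
        node = name
        while node in parent:
            node = parent[node]
            chain.append(node)
        pairs = []
        for i, book in enumerate(chain):
            if book.tag != "book":
                continue
            for pub in chain[i + 1:]:    # only pubs that enclose this book
                if pub.tag != "pub":
                    continue
                pub_ok = any(y.text.strip() == "2002" for y in pub.findall("year"))
                book_ok = book.find("author") is not None
                pairs.append((pub_ok, book_ok))
        found[name.text.strip()] = pairs
    return found


def result(doc: str) -> list:
    """Names for which at least one pattern satisfies both predicates."""
    return [n for n, pairs in patterns(doc).items() if any(p and b for p, b in pairs)]


print(patterns(DOC_A)["Z"])  # [(False, True), (True, True), (True, False)]
print(result(DOC_A))         # ['X', 'Z']
print(result(DOC_B))         # ['X', 'Y', 'Z']
```

Name Z has three matching patterns, and only one of them satisfies both predicates. With the extra author, the second book satisfies [author] as well, so Y joins the result.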

Handling XML Stream

• Input – well-formed XML stream.
• Use the SAX API to parse XML.
• Events belong to:
    Begin = {(a, attrs, d)}
    End = {(/a, d)}
    Text = {(a, text(), d)}
• XML stream: {e1, e2, …, ei, …} | ei ∈ Begin ∪ End ∪ Text
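
As an illustration of these event types, the following sketch turns SAX callbacks into tuples shaped like the Begin, End, and Text events above. The class name EventCollector is hypothetical. Giving the document element depth 1 and dropping whitespace-only text are assumptions of this sketch, not something the slides state.

```python
import xml.sax


class EventCollector(xml.sax.ContentHandler):
    """Collect Begin (a, attrs, d), End (/a, d) and Text (a, text, d) events."""

    def __init__(self):
        super().__init__()
        self.open = []     # names of the currently open elements
        self.chunks = []   # character data not yet emitted
        self.events = []

    def _flush_text(self):
        # SAX may deliver text in several chunks, so join them before emitting.
        text = "".join(self.chunks).strip()
        self.chunks = []
        if text and self.open:
            self.events.append((self.open[-1], text, len(self.open)))

    def startElement(self, name, attrs):
        self._flush_text()
        self.open.append(name)
        self.events.append((name, dict(attrs.items()), len(self.open)))

    def characters(self, content):
        self.chunks.append(content)

    def endElement(self, name):
        self._flush_text()
        self.events.append(("/" + name, len(self.open)))
        self.open.pop()


def events(doc: bytes) -> list:
    handler = EventCollector()
    xml.sax.parseString(doc, handler)
    return handler.events


for event in events(b'<pub><book id="1"><name>First</name></book></pub>'):
    print(event)
```

The depth d is simply the number of open elements when the event is produced, so a begin event and its matching end event carry the same depth.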

Grammar for XPath Queries

Q  → N+ [/O]
N  → [/ | //] tag [F]
F  → '[' FO [OP constant] ']'
FO → @attribute | tag [@attribute] | text()
O  → @attribute | text()
OP → > | ≥ | = | < | ≤ | ≠ | contains

XPath query of the form N1N2…Nn/O

Cannot handle reverse axes or positional functions.
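
A minimal sketch of a parser for this query form, splitting a query into its location steps N1…Nn and the optional output expression O. The names Step and parse and the regular expressions are hypothetical, and the sketch assumes the ASCII operator spellings >=, <= and != in place of ≥, ≤ and ≠.

```python
import re
from typing import NamedTuple, Optional

STEP = re.compile(r"(//|/)(\w+)(?:\[([^\]]*)\])?")
OUTPUT = re.compile(r"/(@\w+|text\(\))$")
PRED = re.compile(
    r"(@\w+|text\(\)|\w+(?:@\w+)?)\s*(?:(>=|<=|!=|>|<|=|contains)\s*(.+))?"
)


class Step(NamedTuple):
    axis: str                   # "/" (child) or "//" (closure)
    tag: str                    # node test
    predicate: Optional[tuple]  # (FO, OP, constant); OP and constant may be None


def parse(query: str):
    """Return (steps, output) for a query of the form N1 N2 ... Nn [/O]."""
    output = None
    m = OUTPUT.search(query)
    if m:                                   # trailing /@attribute or /text()
        output = m.group(1)
        query = query[:m.start()]
    steps, pos = [], 0
    while pos < len(query):
        m = STEP.match(query, pos)
        if not m:
            raise ValueError(f"cannot parse {query[pos:]!r}")
        axis, tag, pred = m.groups()
        predicate = None
        if pred is not None:
            p = PRED.fullmatch(pred.strip())
            if not p:
                raise ValueError(f"unsupported predicate {pred!r}")
            predicate = p.groups()
        steps.append(Step(axis, tag, predicate))
        pos = m.end()
    if not steps:
        raise ValueError("a query needs at least one location step")
    return steps, output


steps, output = parse("/pub[year>2000]/book[author]/name/text()")
for step in steps:
    print(step)
print(output)  # text()
```

Under this grammar a trailing step such as /name is parsed as a location step, so a query like //book[year>2000]/name has no separate output expression in this sketch.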

Solution to Query

Query: /pub[year=2002]/book[price<11]/author

[Figure: PDA and PDT]

Basic PushDown Transducer (BPDT)

• Similar to a pushdown automaton
• Actions defined on transition arcs
• Finite set of states
    A start state
    A set of final states
• Set of input symbols
• Set of stack symbols

Building a BPDT

Query: /pub[year>2000]/book[author]/name/text()

Consider location step: /book[author]

• Book – Author: Buffer for future: Begin event of Author.
• Book – Author: Remove from Buffer: End event of Book.
• Book – Author: Output result if predicates true: Begin event of Author.

Basic Building Blocks

XPath Expression: /tag[child]

Buffer Operations needed

• Enqueue(x): Adds x to the end of the queue.
• Clear(): Removes all items from the queue.
• Flush(): Outputs all items in the queue in FIFO order.
• Upload(): Moves all items to the end of the queue of a parent BPDT.
• No Dequeue operation needed.
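
The four operations can be sketched as a small queue class. The name ResultBuffer and the parent attribute are hypothetical, and flush() returns the items to the caller rather than writing them to an output stream, which is a choice made for this sketch.

```python
from typing import Any, List, Optional


class ResultBuffer:
    """FIFO buffer of potential results, optionally linked to a parent buffer."""

    def __init__(self, parent: Optional["ResultBuffer"] = None):
        self.parent = parent
        self.items: List[Any] = []

    def enqueue(self, x: Any) -> None:
        """Add x to the end of the queue."""
        self.items.append(x)

    def clear(self) -> None:
        """Remove all items: a predicate failed, so they can never be results."""
        self.items = []

    def flush(self) -> List[Any]:
        """Output all items in FIFO order and empty the queue."""
        out, self.items = self.items, []
        return out

    def upload(self) -> None:
        """Move all items to the end of the parent's queue."""
        if self.parent is None:
            raise ValueError("upload() needs a parent buffer")
        self.parent.items.extend(self.items)
        self.items = []


pub_buffer = ResultBuffer()
book_buffer = ResultBuffer(parent=pub_buffer)
book_buffer.enqueue("A")    # author seen, predicates still undecided
book_buffer.upload()        # book predicate passed, pub predicate still open
print(pub_buffer.flush())   # ['A'] once the pub predicate passes
```

Items only ever leave a buffer all at once, through clear(), flush() or upload(), which is why no dequeue operation is needed.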

Basic Building Blocks

XPath Expression: /tag[@attr=val]

Basic Building Blocks

XPath Expression: /tag[text()=val]

Basic Building Blocks

XPath Expression: /tag[child@attr=val]

Basic Building Blocks

XPath Expression: /tag[child=val]
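
The five building-block templates differ in which event can first make the predicate true. The sketch below classifies a predicate's filter operand accordingly. The function name template and the event wording are hypothetical; the mapping follows from where attributes and text appear in the event stream and is not taken from the slide figures.

```python
def template(fo: str, has_comparison: bool) -> tuple:
    """Return (template, event at which the predicate can first become true)."""
    if fo.startswith("@"):
        # Attributes arrive with the begin event of the element itself.
        return ("/tag[@attr=val]", "begin event of tag")
    if fo == "text()":
        return ("/tag[text()=val]", "text event of tag")
    if "@" in fo:
        # child@attr: the attribute arrives with the child's begin event.
        return ("/tag[child@attr=val]", "begin event of child")
    if has_comparison:
        return ("/tag[child=val]", "text event of child")
    # Bare existence test: seeing the child begin is enough.
    return ("/tag[child]", "begin event of child")


for fo, cmp in [("@id", True), ("text()", True), ("price@type", True),
                ("year", True), ("author", False)]:
    print(fo, "->", template(fo, cmp))
```

Only the attribute test on the element itself is decided immediately; in the other four cases results may need to be buffered until a later event arrives.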

A sample BPDT

Query: /pub[year>2000]
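
A rough sketch of how such a transducer could step through an event stream. The state names START, NA (predicate not yet known) and TRUE, the event tuples, and the function pub_year_states are all hypothetical; this is a simplification for illustration and not the BPDT drawn on the slide. It assumes pub is the document element.

```python
def pub_year_states(events):
    """Yield the state reached after each event for /pub[year>2000]."""
    state = "START"
    stack = []                              # names of the open elements
    for kind, name, *rest in events:
        if kind == "begin":
            stack.append(name)
            if stack == ["pub"]:
                state = "NA"                # inside pub, predicate undecided
        elif kind == "text":
            if state == "NA" and stack == ["pub", "year"] and float(rest[0]) > 2000:
                state = "TRUE"              # predicate satisfied
        elif kind == "end":
            stack.pop()
            if not stack:
                state = "START"             # pub closed, start over
        yield state


stream = [
    ("begin", "pub"),
    ("begin", "year"),
    ("text", "year", "2002"),
    ("end", "year"),
    ("end", "pub"),
]
print(list(pub_year_states(stream)))  # ['NA', 'NA', 'TRUE', 'TRUE', 'START']
```

With a year of 1999 the transducer would stay in NA until pub closes, at which point anything buffered for that pub would be cleared.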

Building a solution

HPDT for Query: //pub[year>2000]//book[author]//name/text()

HPDT Structure

Each BPDT in the HPDT has:

• Position (l, k): l = depth of the BPDT in the HPDT, k = sequence number from right to left.
    The BPDT at position (i-1, k) has a right child BPDT at position (i, 2k), connected to its NA state.
    The BPDT at position (i-1, k) has a left child BPDT at position (i, 2k+1), connected to its True state.
    A BPDT at position (i, 2^i - 1) means the predicates in all higher-level BPDTs evaluate to true.
• Buffer – potential results
• Stack – stack of elements (SAX events)
• Depth vector
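
The position numbering can be checked with a few lines of arithmetic. The sketch assumes the root BPDT sits at position (0, 0); the function names children and all_true_position are hypothetical.

```python
def children(l: int, k: int):
    """Return (right_child, left_child) positions of the BPDT at (l, k).

    The right child hangs off the NA state, the left child off the True state.
    """
    return (l + 1, 2 * k), (l + 1, 2 * k + 1)


def all_true_position(depth: int):
    """Follow the True (left) child from the root down to the given depth."""
    position = (0, 0)                   # assumed position of the root BPDT
    for _ in range(depth):
        _, position = children(*position)
    return position


for i in range(5):
    print(i, all_true_position(i), (i, 2 ** i - 1))
```

Taking the True branch at every level maps k to 2k+1 each time, which gives k = 2^i - 1 at depth i. This is why position (i, 2^i - 1) corresponds to all higher-level predicates being true.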

Example Query

 1. <root>
 2.   <pub>
 3.     <book>
 4.       <name> X </name>
 5.       <author> A </author>
 6.     </book>
 7.     <book>
 8.       <name> Y </name>
 9.       <pub>
10.         <book>
11.           <name> Z </name>
12.           <author> B </author>
13.         </book>
14.         <year> 1999 </year>
15.       </pub>
16.     </book>
17.     <year> 2002 </year>
18.   </pub>
19. </root>

Query: //pub[year=2002]//book[author]//name

root   pub   book   name
1      2     7      11
1      2     10     11
1      9     10     11

3 paths from $1 to $14

System Features

Columns: Name, Support, Streaming, Multiple Predicates, Closure, Buffered Predicate Evaluation

XSQ-F      XPath    X X X X
XSQ-NC     XPath    X X X
XMLTK      XPath    X X
XQEngine   XQuery   X X
Galax      XQuery   X X
Joost      STX      X X

Reference

Feng Peng and Sudarshan S. Chawathe. XPath Queries on Streaming Data. In SIGMOD 2003.

Thank You

???