Oracle Use of Adaptive Programming Compilation Technology

High Performance XML Data Retrieval

Mark V. Scardina, Group Product Manager & XML Evangelist, Oracle Corporation
Jinyu Wang, Senior Product Manager, Oracle Corporation

Agenda

• Why XPath for Data Retrieval?
• Current XML Data Retrieval Strategies and Issues
• High Performance XPath Requirements
• Design of Extractor for XPath
• Extractor Use Cases

Why XPath for Data Retrieval?

• W3C Standard for XML Document Navigation since 1999 (XPath 1.0)
• Support for XML Schema Data Types in 2.0
• Support for Functions and Operators in 2.0
• Underlies XSLT, XQuery, DOM, XForms, XPointer

Current Standards-based Data Retrieval Strategies

• Document Object Model (DOM) Parsing
• Simple API for XML Parsing (SAX)
• Java API for XML Parsing (JAXP)
• Streaming API for XML Parsing (StAX)

Data Retrieval Using DOM Parsing

Advantages
– Dynamic random access to entire document
– Supports XPath 1.0

Disadvantages
– DOM in-memory footprint up to 10x doc size
– No planned support for XPath 2.0
– Redundant node traversals for multiple XPaths
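
To make the DOM trade-off concrete, here is a minimal sketch using the standard JAXP DOM and javax.xml.xpath APIs. The class name DomXPathDemo, the method countMatches, and the sample document are assumptions made for illustration; they are not part of the presentation. The whole document is parsed into memory first, and each XPath is then evaluated independently against the tree.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: DOM parsing followed by XPath 1.0 evaluation.
class DomXPathDemo {
    // Builds the full in-memory tree, then counts the nodes each XPath selects.
    static int[] countMatches(String xml, String... xpaths) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            int[] counts = new int[xpaths.length];
            for (int i = 0; i < xpaths.length; i++) {
                // Every expression is evaluated on its own against the same tree.
                NodeList nodes = (NodeList) xpath.evaluate(
                        xpaths[i], doc, XPathConstants.NODESET);
                counts[i] = nodes.getLength();
            }
            return counts;
        } catch (Exception e) {
            throw new IllegalStateException("DOM/XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<A><B><C><D/></C><C/></B><B/></A>";
        int[] counts = countMatches(xml, "/A/B/C/D", "/A/B/C");
        System.out.println(counts[0] + " " + counts[1]); // prints "1 2"
    }
}
```

Both expressions share the /A/B/C prefix, yet nothing in this approach lets the second evaluation reuse work done for the first.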

DOM-based XPath Data Retrieval

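[Figure: DOM tree with elements A, B, C, D, E, and F. Nodes are labeled 1, 2, or both, according to which XPath evaluation visits them.]

XPath expressions:
1. /A/B/C/D
2. /A/B/C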

Data Retrieval using SAX/StAX Parsing

Advantages
– Stream-based processing for managed memory
– Broadcast events for multicasting (SAX)
– Pull parsing model for ease of programming and control (StAX)

Disadvantages
– No maintenance of hierarchical structure
– No XPath support, either 1.0 or 2.0
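
The missing hierarchy can be illustrated with a small StAX sketch. The parser reports only a flat sequence of events, so the application has to keep its own stack of open elements to know where it is in the document. The class StaxPathDemo and the method elementPaths are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class: pull parsing with manually tracked element paths.
class StaxPathDemo {
    // Returns the absolute path of every element, in document order.
    static List<String> elementPaths(String xml) {
        List<String> paths = new ArrayList<>();
        Deque<String> open = new ArrayDeque<>(); // open elements, root first
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // The parser gives only the name; the path is our own bookkeeping.
                    open.addLast(reader.getLocalName());
                    paths.add("/" + String.join("/", open));
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    open.removeLast();
                }
            }
            reader.close();
        } catch (Exception e) {
            throw new IllegalStateException("StAX parsing failed", e);
        }
        return paths;
    }
}
```

Calling elementPaths("<A><B><C/></B><B/></A>") yields /A, /A/B, /A/B/C, /A/B. Memory use stays proportional to document depth, but any XPath matching has to be built on top of this bookkeeping.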

High Performance Requirements

• Retrieve XML data with managed memory resources
• Support for documents of all sizes
• Handle multiple XPaths with minimum node traversals
• Support DTD and Schema-based XML documents

Extractor for XPath

• Stream-based processing utilizing SAX
• Support for DTDs and XML Schemas
• Implements Publish/Subscribe model for scalability
• Handles multiple XPaths simultaneously
• Supports XPath 1.0; extended to 2.0

Extractor’s Publish/Subscribe Processing Model

[Figure: SAX events and registered XPath expressions with their content handlers are the inputs to the Extractor, which produces the outputs.]
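
The publish/subscribe idea can be sketched with plain SAX. This is not the Extractor API: the class MiniExtractor and its register and extract methods are hypothetical, it handles only simple absolute paths without predicates, namespaces, // or *, and it delivers only an element's direct text to the subscriber.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: subscribers register a path, SAX events publish matches.
class MiniExtractor extends DefaultHandler {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();
    private final Deque<String> path = new ArrayDeque<>();        // open element names
    private final Deque<StringBuilder> text = new ArrayDeque<>(); // text per open element

    // Subscribe a handler to an absolute path such as /catalog/book/title.
    void register(String absolutePath, Consumer<String> handler) {
        subscribers.computeIfAbsent(absolutePath, k -> new ArrayList<>()).add(handler);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.addLast(qName);
        text.addLast(new StringBuilder());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!text.isEmpty()) {
            text.peekLast().append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String current = "/" + String.join("/", path);
        String value = text.removeLast().toString();
        // Publish the element's direct text to every handler registered for this path.
        for (Consumer<String> handler :
                subscribers.getOrDefault(current, Collections.emptyList())) {
            handler.accept(value);
        }
        path.removeLast();
    }

    // Runs one streaming pass over the document, serving all registrations at once.
    void extract(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), this);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
    }
}
```

A caller would register, for example, "/catalog/book/title" with a handler such as titles::add and then call extract once; every registered path is served by the same single pass over the input.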

Extractor’s Function Blocks

• Initialization: registration of XPaths/Handlers
• XPath Compilation: compiles and builds index graphs
• XPath Tracking: maintains XPath state and matches doc XPaths with the indexed XPaths
• Output: sends matching XPath start/stop events along with the XML data

Extractor’s Function Blocks

[Figure: Extractor function blocks. Inputs are XML delivered as SAX events, XPath expressions with content handlers, and DTD/XSD.]

• Initialization: register XPath expressions and the instances of content handlers
• Compilation: compile XPath expressions and build index graph
• Runtime: track XPath and perform XPath matching when receiving SAX events
• Output: Registered Handler A ... Registered Handler Z, XMLSequenceBuilder, XMLSAXSerializer; results are delivered as XML Sequences, Files, or SAX

Initialization

• Registration of absolute XPaths
• Support for XML Namespaces for differentiation
• Registration of execution handlers using XContentHandler()
• Built-in Handlers for ease of use
  – XMLSequenceBuilder()
  – XMLSAXSerializer()

XPath Compilation

• XPath streamability evaluation
• Streamable isAll=true/false option
  – True: Process only streamable XPaths
  – False: Buffer data as needed
• Build XPath Predicate Table
• Build Index Tree
  – XPath Dependency Tree (w/o DTD/XSD)
  – Data Model Tree (w/ DTD/XSD) using existing validation engine

Compilation of Each XPath Predicate

[Figure: A Predicate Table is built for each location XPath (Location XPath-1 ... Location XPath-n).]

Columns: Predicate XPath | Evaluation Attributes | Status

Example rows for one location XPath:
– Predicate XPath-1 | Value | TRUE, FALSE, or Pending
– Predicate XPath-2 | Related to Predicate XPath-3 | TRUE, FALSE, or Pending
– Predicate XPath-4 | Value | TRUE, FALSE, or Pending

Status values: TRUE (matched), FALSE (not-matched), Pending (yet-to-be-matched)

• Also used for isAll = True condition
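
The three-state status column can be sketched as follows. This is an illustrative assumption about how such a table might behave during streaming, not the Extractor's implementation: the class PredicateTableDemo and its methods are hypothetical, every predicate is a simple "path equals value" test, all predicates are combined with AND, and the "Related to" column is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a per-location predicate table with TRUE/FALSE/PENDING status.
class PredicateTableDemo {
    enum Status { TRUE, FALSE, PENDING }

    private static final class Entry {
        final String predicatePath;
        final String expectedValue;
        Status status = Status.PENDING; // nothing is known before data arrives

        Entry(String predicatePath, String expectedValue) {
            this.predicatePath = predicatePath;
            this.expectedValue = expectedValue;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void addPredicate(String predicatePath, String expectedValue) {
        entries.add(new Entry(predicatePath, expectedValue));
    }

    // Called as streamed data arrives; a matching value resolves the entry to TRUE.
    void observe(String path, String value) {
        for (Entry e : entries) {
            if (e.status == Status.PENDING
                    && e.predicatePath.equals(path)
                    && e.expectedValue.equals(value)) {
                e.status = Status.TRUE;
            }
        }
    }

    // Called when the located element ends; unresolved entries can no longer match.
    void closeScope() {
        for (Entry e : entries) {
            if (e.status == Status.PENDING) {
                e.status = Status.FALSE;
            }
        }
    }

    // FALSE wins over PENDING, and PENDING wins over TRUE.
    Status overall() {
        Status result = Status.TRUE;
        for (Entry e : entries) {
            if (e.status == Status.FALSE) {
                return Status.FALSE;
            }
            if (e.status == Status.PENDING) {
                result = Status.PENDING;
            }
        }
        return result;
    }
}
```

While overall() is PENDING, a streaming processor cannot yet decide whether to emit the located node, which is where buffering becomes necessary.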

Runtime XPath Matching

• State Machine tokenizes and tracks
  – In-scope Namespaces
  – Current Element Name
  – Current Element Attributes
  – Node Position Relative to Siblings
  – Number of Child Elements
• Implemented as a Stack
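
A stack-based tracker of this kind might look like the sketch below, which covers only element names, positions among same-named siblings, and child counts (namespaces and attributes are left out). The class PositionTracker, its Frame type, and the track method are hypothetical names used for illustration.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: one stack frame per open element, updated from SAX events.
class PositionTracker extends DefaultHandler {
    private static final class Frame {
        final String step;                                      // for example "B[2]"
        final Map<String, Integer> seenByName = new HashMap<>(); // child name -> count
        int childCount;                                         // child elements so far

        Frame(String step) {
            this.step = step;
        }
    }

    private final Deque<Frame> stack = new ArrayDeque<>();
    // Positional path -> number of child elements, recorded when the element ends.
    final Map<String, Integer> childCounts = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        int position = 1;
        Frame parent = stack.peekLast();
        if (parent != null) {
            parent.childCount++;
            // 1-based position among siblings with the same name.
            position = parent.seenByName.merge(qName, 1, Integer::sum);
        }
        stack.addLast(new Frame(qName + "[" + position + "]"));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        StringBuilder path = new StringBuilder();
        for (Frame f : stack) {
            path.append('/').append(f.step);
        }
        childCounts.put(path.toString(), stack.peekLast().childCount);
        stack.removeLast();
    }

    static Map<String, Integer> track(String xml) {
        PositionTracker tracker = new PositionTracker();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), tracker);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return tracker.childCounts;
    }
}
```

For the input <A><B/><B><C/></B></A>, track records /A[1]/B[1] with 0 children, /A[1]/B[2]/C[1] with 0, /A[1]/B[2] with 1, and /A[1] with 2. The stack never grows beyond the document depth.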

XPath Index Tree (w/o DTD/XSD)

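[Figure: XPath expressions and the XPath dependency tree built from them.]

XPath Expressions:
1. /A/B/C/D
2. /A/B//P
3. /A/B/C
4. //P
5. //P/*/Q

XPath Dependency Tree:
– / → A → B → C (matches 3) → D (matches 1)
– / → A → B → X (fake node for //) → P (matches 2)
– / → X (fake node for //) → P (matches 4) → X (fake node for /*) → Q (matches 5)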
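
One way to build such a shared tree is a trie keyed by XPath steps, sketched below. The class DependencyTreeDemo and its methods are hypothetical; the sketch assumes simple location paths made of element names, * and //, represents the fake nodes by the step strings "//" and "*", and ignores predicates.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: XPaths with common prefixes share nodes in one tree.
class DependencyTreeDemo {
    static final class Node {
        final Map<String, Node> children = new LinkedHashMap<>();
        final List<Integer> matches = new ArrayList<>(); // ids of XPaths ending here
    }

    final Node root = new Node();

    // Splits "/A/B//P" into [A, B, //, P]; "//" becomes its own (fake) step.
    static List<String> steps(String xpath) {
        List<String> steps = new ArrayList<>();
        int i = 0;
        while (i < xpath.length()) {
            if (xpath.startsWith("//", i)) {
                steps.add("//");
                i += 2;
            } else if (xpath.charAt(i) == '/') {
                i += 1;
            }
            int j = xpath.indexOf('/', i);
            if (j < 0) {
                j = xpath.length();
            }
            if (j > i) {
                steps.add(xpath.substring(i, j));
            }
            i = j;
        }
        return steps;
    }

    // Adds one XPath, reusing existing nodes along any shared prefix.
    void add(int id, String xpath) {
        Node node = root;
        for (String step : steps(xpath)) {
            node = node.children.computeIfAbsent(step, k -> new Node());
        }
        node.matches.add(id);
    }

    // Follows the given steps from the root; returns null if the path is absent.
    Node find(String... path) {
        Node node = root;
        for (String step : path) {
            if (node == null) {
                return null;
            }
            node = node.children.get(step);
        }
        return node;
    }

    // Counts the nodes in a subtree, including the subtree root.
    static int size(Node node) {
        int n = 1;
        for (Node child : node.children.values()) {
            n += size(child);
        }
        return n;
    }
}
```

Adding the five expressions above produces 10 nodes below the root, compared with 17 steps if each expression were stored separately, so a single walk of the tree per incoming element can serve all registered XPaths.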

Dependency Tree Traversal

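[Figure: Dependency tree traversal. The XPath dependency tree for the five expressions (1. /A/B/C/D, 2. /A/B//P, 3. /A/B/C, 4. //P, 5. //P/*/Q) is shown with its fake nodes for // and /* labeled X', X'', and X'''. Beside it, the XPath stack lists, for each open element (A, B, C, D, P), the dependency-tree nodes matched so far, including the fake nodes that remain active.]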

Synchronous Data Model Traversal

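[Figure: Synchronous data model traversal. The XML Document Tree and the XML Data Model (both with elements A, B, C, D, E, and F) are traversed in step, with matches marked for the XPaths /A/B/C/D and /A/B/C.]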

Extractor Output

• XContentHandler()
  – Execution of Registered Content Handlers
• XMLSequenceBuilder()
  – Built-in Handler
  – Presents Result Set as XMLSequence Object
  – Contains a Linked List of XMLItems
• XMLSAXSerializer()
  – Built-in Handler
  – Serializes output to PrintWriter or OutputStream

Extractor Use Cases

• Content Management
• Web Services
• XSLT/XQuery Implementation

Extractor Content Management Use Case

[Figure: Content management use case with Oracle Database.]

• Pre-Process: XPath expressions (/a/b, /a/b/@c, ...) are stored in an XPaths Table, keyed by DTD/XML Schema (XML system/public ID or XML Schema Location URL)
• Initialize: the Extractor retrieves the XPath expressions over JDBC and registers handlers (Registered Handler A ... Registered Handler Z)
• Process: a SAXParser feeds the XML to the Extractor, and the content handlers process the data, using a JDBC Connection Pool to reach the MetaData Tables and the Doc Table

Extractor Web Service Use Case

[Figure: Web service use case.]

• Clients (Client 1, Client 2, Client 3, ...) register data subscriptions using XPath with a Service Broker over SOAP
• Initialize: XPaths and ContentHandlers are registered with the Extractor
• A Web Service Client invokes the web service over SOAP; the WS response (XML) passes through a SAXParser to the Extractor
• Clients subscribing to the data receive it over SOAP

Extractor XSLT/XQuery Use Case

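[Figure: XSLT/XQuery use case. XSLT and XQuery resolve XPaths through an XPath Bridge, which registers each XPath with the Extractor. A SAXParser feeds the XML to the Extractor, and XMLSequenceBuilder returns the results as an XMLSequence object.]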

Oracle XML Resources

• Oracle Technology Network
  – http://otn.oracle.com
  – Downloads, Demos, Samples, Papers
  – XML Support Forum
• Oracle Database 10g XML & SQL: Design, Build, & Manage XML Applications in Java, C, C++, & PL/SQL
  – Covers all of Oracle XML technology
  – BetaBook Forum on OTN
  – Available in May from Bookstores
