XML Publishing


Page 1: XML Publishing

XML Publishing

- Introduction
- General approach
- XPERRANTO
- SilkRoute
- Microsoft SQL 2000
- Summary

Page 2: XML Publishing

Introduction

What is XML Publishing?

XML Publishing is the task of transforming relational data into XML for the purpose of exchange over the Internet.

More specifically, publishing XML data involves joining tables, selecting and projecting the data that needs to be exported, creating XML hierarchies, and processing values in an application-specific manner.

Page 3: XML Publishing

Introduction

Why do we need XML Publishing?

- Most business data is stored in relational database systems.

- XML is a standard for exchanging business data on the web.

- XML is a simple, platform-independent, Unicode-based syntax for which simple and efficient parsers are widely available.

- XML can represent structured data and also provides a uniform syntax for semi-structured data and marked-up content.

Page 4: XML Publishing

Introduction

Two data models:

Relational data
- fragmented into many flat relations
- normalized
- proprietary

XML data
- nested
- un-normalized
- public (450 schemas at www.biztalk.org)

Page 5: XML Publishing

General Approach

Create XML views over relational data. Each XML view can provide an alternative, application-specific view of the underlying relational data. Through these XML views, business partners can access existing relational data as though it were in some industry-standard XML format.

Page 6: XML Publishing

Virtual vs. Materialized

Materialized XML Publishing

Materialize the entire XML view on request and return the resulting XML document.

Virtual XML Publishing

Support queries over XML views and return only the data that user applications actually request.

Page 7: XML Publishing

Virtual vs. Materialized

Materialized XML Publishing
- applications can access all the data without interfering with the relational engine
- the XML view needs to be refreshed periodically
- inefficient in some cases

Virtual XML Publishing
- guarantees data freshness
- leverages the processing power of relational engines
- translating an XML query over an XML view into SQL may be complex
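The trade-off above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `customer` table, its two rows, and the helper names (`make_db`, `rows_to_xml`, `materialized`, `virtual`) are hypothetical and not part of any of the systems discussed. The materialized variant tags the whole table and filters the XML afterwards; the virtual variant translates the request into SQL so that only matching rows leave the database.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_db():
    """Build a tiny in-memory database (hypothetical sample data)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
    db.executemany("INSERT INTO customer VALUES (?, ?)",
                   [(1, "Smith Construction"), (2, "Western Builders")])
    return db


def rows_to_xml(rows):
    """Tag (id, name) tuples as <customers><customer id="...">name</customer></customers>."""
    root = ET.Element("customers")
    for cid, name in rows:
        elem = ET.SubElement(root, "customer", id=str(cid))
        elem.text = name
    return root


def materialized(db, prefix):
    """Materialize the whole view, then filter the XML outside the database."""
    view = rows_to_xml(db.execute("SELECT id, name FROM customer ORDER BY id"))
    return [c.text for c in view if c.text.startswith(prefix)]


def virtual(db, prefix):
    """Push the predicate into SQL so only matching rows are tagged."""
    rows = db.execute(
        "SELECT id, name FROM customer WHERE name LIKE ? ORDER BY id",
        (prefix + "%",))
    return [c.text for c in rows_to_xml(rows)]


db = make_db()
print(materialized(db, "Smith"))
print(virtual(db, "Smith"))
```

Both calls return the same customers for this data; the difference lies in how much data is tagged and where the filtering happens.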

Page 8: XML Publishing

Middleware System

Interface between Relational Database and User Application

- defines and manages XML views

- translates incoming XML queries into SQL and submits them to the database system

- receives the queries’ results, then translates them back into XML terms.

Page 9: XML Publishing

Figure 1 A high-level architecture of middleware system

Components: Applications (Web/Intranet); Middleware System, consisting of XML Query Processor, XML Views Manager, and XML Tagger; RDBMS.

Data flows: View Definition, View Description, User XML Queries, SQL Queries, Tuple Streams, Result XML Documents.

Page 10: XML Publishing

XPERRANTO vs. SilkRoute

IBM XPERRANTO

- pure XML, single query language approach.

- XML views are defined in an XML query language that uses the type system of XML Schema.

SilkRoute

- XML views are defined using a declarative query language called RXL (Relational to XML Transformation Language).

Page 11: XML Publishing

XPERRANTO vs. SilkRoute

XPERRANTO

- users need only be familiar with XML

- both relational data and meta-data can be represented and queried in the same framework

- can publish object-relational structures

- pushes all relational logic down to the database engine

Page 12: XML Publishing

Figure 2 XPERRANTO Architecture

Components: Query Translation (XML-QL Parser, Query Rewrite, SQL Translation, connected by XQGM); XML View Services; XML Schema Generator; XML Tagger; O-R Database (SQL Query Processor, Stored Tables, System Catalog).

Data flows: View Definition, View Description, SQL Queries, Data Tuples, Catalog Info., XML Schema, XML Result.

Page 13: XML Publishing

Example 1: Relational Schema vs. XML View Schema

DDL (Data Definition Language) for O-R Schema in SQL99 Terms

Create Table book AS (bookID CHAR(30), name VARCHAR(255), publisher VARCHAR(30))

Create Table publisher AS (name VARCHAR(30), address VARCHAR(255))

Create Type author_type AS (bookID CHAR(30), first VARCHAR(30), last VARCHAR(30))

Create Table author OF author_type (REF IS ssn USER GENERATED)

Page 14: XML Publishing

XML View Schema over Example O-R database

<simpleType name="string255" source="string">
  <maxlength value="255"/>
</simpleType>

<simpleType name="string30" source="string">
  <maxlength value="30"/>
</simpleType>

Derived from "Create Table book AS...":

<complexType name="bookTupleType">
  <element name="bookID" type="string30"/>
  <element name="name" type="string255"/>
  <element name="publisher" type="string30"/>
</complexType>

<complexType name="bookSetType">
  <element name="bookTuple" type="bookTupleType" maxOccurs="*"/>
</complexType>

<element name="book" type="bookSetType"/>

Derived from "Create Type author_type AS...":

<complexType name="author_type">
  <element name="bookID" type="string30"/>
  <element name="first" type="string30"/>
  <element name="last" type="string30"/>
</complexType>

Derived from "Create Table author OF ...":

<complexType name="authorTupleType" source="author_type" derivedBy="extension">
  <attribute name="ssn" type="ID"/>
</complexType>

<complexType name="authorSetType">
  <element name="authorTuple" type="authorTupleType" maxOccurs="*"/>
</complexType>

<element name="author" type="authorSetType"/>

Page 15: XML Publishing

Default XML View over Example O-R database

<db>
  <book>
    <row><bookID>…</bookID><name>…</name><publisher>…</publisher></row>
    <row><bookID>…</bookID><name>…</name><publisher>…</publisher></row>
    …
  </book>
  <author>
    <row><ssn>…</ssn><bookID>…</bookID><first>…</first><last>…</last></row>
    <row><ssn>…</ssn><bookID>…</bookID><first>…</first><last>…</last></row>
    …
  </author>
  <publisher>
    …similar to <book> and <author>
  </publisher>
</db>

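The shape of the default view (one element per table, one <row> per tuple, one child element per column) can be reproduced with a short Python sketch. It assumes SQLite as a stand-in database and a made-up book row; the function name `default_view` is hypothetical and this is not how XPERRANTO itself is implemented.

```python
import sqlite3
import xml.etree.ElementTree as ET


def default_view(db, tables):
    """Return <db><table><row><col>value</col>...</row>...</table>...</db>."""
    root = ET.Element("db")
    for table in tables:
        table_elem = ET.SubElement(root, table)
        cursor = db.execute(f'SELECT * FROM "{table}"')
        columns = [d[0] for d in cursor.description]  # column names of the table
        for row in cursor:
            row_elem = ET.SubElement(table_elem, "row")
            for column, value in zip(columns, row):
                ET.SubElement(row_elem, column).text = str(value)
    return root


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE book (bookID TEXT, name TEXT, publisher TEXT)")
db.execute("INSERT INTO book VALUES ('b1', 'XML Basics', 'Acme Press')")  # made-up row
view = default_view(db, ["book"])
print(ET.tostring(view, encoding="unicode"))
```

Because the mapping is purely structural, it needs no per-table configuration; any application-specific nesting is left to user-defined views on top of this default view.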
Page 16: XML Publishing

Example 2: From XQuery to SQL

Figure: XPERRANTO Query Engine. Stages: XQuery Parser, Query Rewrite & View Composition, Computation Pushdown, Tagger Runtime. Data flows: XQuery into the parser; XQGM between the stages; SQL Query to the RDBMS; Tuples back from the RDBMS; Tagger Graph to the Tagger Runtime; Query Result out.

Page 17: XML Publishing

A Purchase Order Database and its Default View

order
id   custname             custnum
10   Smith Construction   7734
9    Western Builders     7725

item
oid  desc        cost
10   generator   8000
10   backhoe     24000

payment
oid  due       amt
10   1/10/01   20000
10   6/10/01   12000

<db>
  <order>
    <row><id>10</id><custname>Smith Construction</custname><custnum>7734</custnum></row>
    <row><id>9</id><custname>Western Builders</custname><custnum>7725</custnum></row>
  </order>
  <item>
    <row><oid>10</oid><desc>generator</desc><cost>8000</cost></row>
    <row><oid>10</oid><desc>backhoe</desc><cost>24000</cost></row>
  </item>
  <payment>
    …similar to <order> and <item>
  </payment>
</db>

Page 18: XML Publishing

XML Purchase Order

<order id="10">
  <customer>Smith Construction</customer>
  <items>
    <item description="generator">
      <cost>8000</cost>
    </item>
    <item description="backhoe">
      <cost>24000</cost>
    </item>
  </items>
  <payments>
    <payment due="1/10/01">
      <amount>20000</amount>
    </payment>
    <payment due="6/10/01">
      <amount>12000</amount>
    </payment>
  </payments>
</order>
<order id="9">…</order>

User-defined XML “orders” view

create view orders as (
  for $order in view("default")/order/row
  return
    <order id=$order/id>
      <customer>$order/custname</customer>
      <items>
        for $item in view("default")/item/row
        where $order/id = $item/oid
        return
          <item description=$item/desc>
            <cost>$item/cost</cost>
          </item>
      </items>
      <payments>
        for $payment in view("default")/payment/row
        where $order/id = $payment/oid
        return
          <payment due=$payment/due>
            <amount>$payment/amt</amount>
          </payment>
        sortby(@due)
      </payments>
    </order>
)
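To make the semantics of the view concrete, the following Python sketch evaluates the same nested loops directly over in-memory rows taken from the purchase order example. It is a naive, assumption-laden illustration: the outer <orders> wrapper element is added only so the result is a single document, the payment dates are sorted as plain strings, and the function name `orders_view` is hypothetical. XPERRANTO does not evaluate views this way; it translates them into SQL.

```python
import xml.etree.ElementTree as ET

# Rows of the default view, copied from the purchase order example.
ORDERS = [{"id": 10, "custname": "Smith Construction"},
          {"id": 9, "custname": "Western Builders"}]
ITEMS = [{"oid": 10, "desc": "generator", "cost": 8000},
         {"oid": 10, "desc": "backhoe", "cost": 24000}]
PAYMENTS = [{"oid": 10, "due": "6/10/01", "amt": 12000},
            {"oid": 10, "due": "1/10/01", "amt": 20000}]


def orders_view(orders, items, payments):
    """Nest each order's items and payments, mirroring the view definition."""
    root = ET.Element("orders")  # wrapper element is an assumption of this sketch
    for order in orders:
        order_elem = ET.SubElement(root, "order", id=str(order["id"]))
        ET.SubElement(order_elem, "customer").text = order["custname"]

        items_elem = ET.SubElement(order_elem, "items")
        for item in items:
            if item["oid"] == order["id"]:  # where $order/id = $item/oid
                item_elem = ET.SubElement(items_elem, "item",
                                          description=item["desc"])
                ET.SubElement(item_elem, "cost").text = str(item["cost"])

        payments_elem = ET.SubElement(order_elem, "payments")
        matching = [p for p in payments if p["oid"] == order["id"]]
        for payment in sorted(matching, key=lambda p: p["due"]):  # sortby(@due)
            pay_elem = ET.SubElement(payments_elem, "payment", due=payment["due"])
            ET.SubElement(pay_elem, "amount").text = str(payment["amt"])
    return root


view = orders_view(ORDERS, ITEMS, PAYMENTS)
print(ET.tostring(view, encoding="unicode"))
```

The nested loops scan every item and payment once per order, which is exactly the kind of correlated evaluation that the later query processing stages try to avoid.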

Page 19: XML Publishing

XQuery over “orders” view

for $order in view("orders")
where $order/customer/text() like "Smith%"
return $order

This query is the input to the XQuery Parser.

Page 20: XML Publishing

Query Parsing

XQGM (XML Query Graph Model)

- extension of a SQL internal query representation called Query Graph Model (QGM).

- consists of a set of operators and functions that are designed to capture the semantics of an XML query.

Page 21: XML Publishing

OPERATOR   DESCRIPTION
Table      Represents a table in a relational database
Project    Computes results based on its input
Select     Restricts its input
Join       Joins two or more inputs
Groupby    Applies aggregate functions and grouping
Orderby    Sorts input based on column values
Union      Unions two or more inputs
Unnest     Applies super-scalar functions to input
View       Represents a view
Function   Represents an XQuery function

Part of the XML Functions and Operators in XQGM

#    XML FUNCTION                 OPERATORS         DESCRIPTION
1    cr8Elem(Tag, Atts, Clist)    Project           Creates an element with tag name Tag, attribute list Atts, and contents Clist
2    cr8AttList(A1,…,An)          Project           Creates a list of attributes from the attributes passed as parameters
3    cr8XMLFragList(C1,…,Cn)      Project           Creates an XML fragment list from the content (element/text) parameters
4    aggXMLFrags(C)               Groupby           Aggregate function that creates an XML fragment list from content inputs
5    getTagName(Elem)             Project, Select   Returns the element name of Elem
6    getAttributes(Elem)          Project, Select   Returns the list of attributes of Elem
7    getAttName(Att)              Project, Select   Returns the name of attribute Att
8    isElement(E)                 Select            Returns true if E is an element, returns false otherwise
9    isText(T)                    Select            Returns true if T is text, returns false otherwise
10   unnest(List)                 Unnest            Superscalar function that unnests a list

Page 22: XML Publishing

Figure: XQGM for the XML Orders View (operator boxes numbered 1 to 11 in the original diagram)

- table: order ($id, $custname); table: item ($oid, $desc, $cost); table: payment ($oid, $due, $amt)
- select: $oid = $id (once over item, once over payment)
- project: $item = <item> …; project: $pmt = <payment> …
- groupby: $items = aggXMLFrags($item)
- groupby: orderby (on $due): $pmts = aggXMLFrags($pmt)
- join (correlated), correlation on order.id: $id, $custname, $items, $pmts
- project: $order = <order id=$id> <custname>$custname</custname> <items>$items</items> <payments>$pmts</payments> </order>
- view result: $order

Page 23: XML Publishing

Figure: XQGM for the Query over Orders View (operator boxes numbered 1 to 8 in the original diagram)

Query:

for $order in view("orders")
where $order/customer/text() like "Smith%"
return $order

Operators, in data-flow order:

- View: orders, producing $order
- project: $elems = getContents($order)
- Unnest: $elem = unnest($elems)
- select: isElement($elem) and getTagName($elem) = "customer"
- project: $vals = getContents($elem)
- Unnest: $val = unnest($vals)
- select: isText($val) and $val like "Smith%"
- join (correlated), correlation on $order, returning $order
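The operator chain of this query graph can be traced with a small Python model. Everything here is an assumption made for illustration: elements are modelled as dictionaries, text as strings, and the functions only borrow their names from the XQGM function table; `smith_orders` is a hypothetical helper, not part of XPERRANTO.

```python
def cr8Elem(tag, atts, clist):
    """Create an element value from a tag, an attribute dict, and a content list."""
    return {"tag": tag, "atts": atts, "contents": list(clist)}


def isElement(value):
    return isinstance(value, dict)


def isText(value):
    return isinstance(value, str)


def getTagName(elem):
    return elem["tag"]


def getContents(elem):
    return elem["contents"]


def smith_orders(orders):
    """Apply project, unnest, select, project, unnest, select, then the join."""
    result = []
    for order in orders:                        # view: orders
        matched = False
        for elem in getContents(order):         # project + unnest
            if isElement(elem) and getTagName(elem) == "customer":  # select
                for val in getContents(elem):   # project + unnest
                    if isText(val) and val.startswith("Smith"):     # like "Smith%"
                        matched = True
        if matched:                             # correlated join back to $order
            result.append(order)
    return result


orders = [
    cr8Elem("order", {"id": "10"}, [cr8Elem("customer", {}, ["Smith Construction"])]),
    cr8Elem("order", {"id": "9"}, [cr8Elem("customer", {}, ["Western Builders"])]),
]
print([o["atts"]["id"] for o in smith_orders(orders)])
```

Evaluated literally like this, the query first constructs every <order> element and then takes it apart again, which is the redundancy that view composition is meant to remove.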

Page 24: XML Publishing

View Composition

After the query parsing stage, the query's XQGM is composed with the XQGM of the views it references (here, the orders view). Rewrite optimizations are then performed to eliminate the construction of intermediate XML fragments and to push down predicates.

Page 25: XML Publishing

View Composition

Composition Rules

#    FUNCTION        COMPOSES WITH               REDUCTION
1    getTagName      cr8Elem(Tag, Atts, Clist)   Tag
2    getAttributes   cr8Elem(Tag, Atts, Clist)   Atts
3    getContents     cr8Elem(Tag, Atts, Clist)   Clist
4    getAttName      cr8Att(Name, Val)           Name
5    getAttValue     cr8Att(Name, Val)           Val
6    isElement       cr8Elem(Tag, Atts, Clist)   True
7    isElement       Other than cr8Elem          False
8    isText          PCDATA                      True
9    isText          Other than PCDATA           False
10   unnest          aggXMLFrags(C)              C
11   unnest          cr8XMLFragList(C1,…,Cn)     C1 ∪ … ∪ Cn
12   unnest          cr8AttList(A1,…,An)         A1 ∪ … ∪ An
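A few of these rules can be tried out as symbolic rewrites. The sketch below is a simplification under stated assumptions: expressions are nested Python tuples of the form (function name, argument, ...), variables are plain strings, only rules 1, 2, 3 and 10 are implemented, and the name `reduce_expr` is hypothetical.

```python
def reduce_expr(expr):
    """Apply rules 1-3 and 10 bottom-up to a nested-tuple expression."""
    if not isinstance(expr, tuple):
        return expr  # a variable or constant, nothing to reduce
    name, *args = expr
    args = [reduce_expr(a) for a in args]
    inner = args[0] if args and isinstance(args[0], tuple) else None
    if inner and inner[0] == "cr8Elem":
        _, tag, atts, clist = inner
        if name == "getTagName":      # rule 1
            return tag
        if name == "getAttributes":   # rule 2
            return atts
        if name == "getContents":     # rule 3
            return clist
    if inner and name == "unnest" and inner[0] == "aggXMLFrags":
        return inner[1]               # rule 10
    return (name, *args)


constructed = ("cr8Elem", "order", "$atts", ("aggXMLFrags", "$item"))
print(reduce_expr(("getTagName", constructed)))
print(reduce_expr(("unnest", ("getContents", constructed))))
```

In the second call the element construction disappears entirely: unnesting the contents of a freshly built element reduces to the variable that was aggregated into it, so no intermediate XML fragment has to be created.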

Page 26: XML Publishing

Figure: Predicate pushdown

The diagram repeats the XQGM of the orders view (tables order, item, and payment; the selects on $oid = $id; the projects building $item and $pmt; the groupby operators computing $items and $pmts; the correlated join on order.id; and the project building $order = <order id=$id>…) and adds the query's operators on top of it: a correlated join returning $order and Select: $custname like “Smith%”. The selection on $custname is shown pushed down from the query into the view, toward the order table.

Page 27: XML Publishing

Computation Pushdown

The goal in this phase of query processing is to push all data- and memory-intensive operations down to the relational engine as an efficient SQL query. Two techniques are available:

1. Query Decorrelation

2. Tagger Pull-up

Page 28: XML Publishing

Query Decorrelation

Complex expressions in XQuery can be represented using correlations. However, earlier work has shown that executing correlated XML queries over a relational database leads to poor performance, so query decorrelation is a necessary step for efficient XML query execution.
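The idea can be illustrated in plain SQL, run here through Python's sqlite3 module. This is a sketch under assumptions: SQLite stands in for the relational engine, COUNT stands in for the XML aggregate aggXMLFrags, and the table contents follow the purchase order example. The correlated form is conceptually evaluated once per order row, while the decorrelated form uses one outer join followed by a group-by.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE "order" (id INTEGER, custname TEXT);
    CREATE TABLE item (oid INTEGER, "desc" TEXT, cost INTEGER);
    INSERT INTO "order" VALUES (10, 'Smith Construction'), (9, 'Western Builders');
    INSERT INTO item VALUES (10, 'generator', 8000), (10, 'backhoe', 24000);
""")

# Correlated form: the subquery refers to the current row of "order".
CORRELATED = """
    SELECT o.id,
           (SELECT COUNT(*) FROM item i WHERE i.oid = o.id) AS n_items
    FROM "order" o
    ORDER BY o.id
"""

# Decorrelated form: one left outer join, then a group-by on the order id.
DECORRELATED = """
    SELECT o.id, COUNT(i.oid) AS n_items
    FROM "order" o LEFT OUTER JOIN item i ON i.oid = o.id
    GROUP BY o.id
    ORDER BY o.id
"""

correlated_rows = db.execute(CORRELATED).fetchall()
decorrelated_rows = db.execute(DECORRELATED).fetchall()
print(correlated_rows)
print(decorrelated_rows)
```

The outer join matters: an inner join would drop orders without items, whereas the correlated form still returns them with an empty aggregate.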

Page 29: XML Publishing

Figure: XQGM after Decorrelation (operator boxes numbered 1 to 13 in the original diagram)

- table: order ($id, $custname); table: item ($oid, $desc, $cost); table: payment ($oid, $due, $amt)
- Select: $custname like “Smith%” (over order)
- join: $oid = $id (item with the selected orders; payment with the selected orders)
- project: $item = <item> …; project: $pmt = <payment> …
- groupby (on $id): $items = aggXMLFrags($item)
- groupby: orderby (on $due): $pmts = aggXMLFrags($pmt)
- left outer join: $id = $id; right outer join: $id = $id (in place of the correlated join)
- project: $order = <order> …

Page 30: XML Publishing

Tagger Pull-up

This step comes right after query decorrelation. It separates the tagger operations from the SQL operations before the SQL queries are generated.

Relational operations are pushed to the bottom of the graph. SQL statements are generated and sent to the relational engine for execution.

XML construction functions are pulled up to the top of the query graph and transformed into a “tagger run-time” graph, which produces the result XML documents.

Page 31: XML Publishing

Figure: XQGM after Tagger Pull-up (operator boxes numbered 1 to 8 in the original diagram)

Tagger run-time graph:

- input: $id, $custname
- input: $oid = $id ($desc, $cost); input: $oid = $id ($due, $amt)
- merge: $item = <item> …; merge: $pmt = <payment> …
- aggregate: $items = aggXMLFrags($item); aggregate: $pmts = aggXMLFrags($pmt)
- merge: $order = <order> … (correlation on id)

SQL queries sent to the relational engine:

select o.id, o.custname
from order o
where o.custname like 'Smith%'
order by o.id

select i.oid, i.desc, i.cost
from item i, order o
where o.custname like 'Smith%'
  and i.oid = o.id
order by o.id

select p.oid, p.due, p.amt
from payment p, order o
where o.custname like 'Smith%'
  and p.oid = o.id
order by o.id, p.due
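Because all three result streams arrive sorted by order id, a tagger can assemble the document in a single forward pass. The Python sketch below shows one way to do that; it is an assumption-based illustration, not XPERRANTO's tagger. The streams are hard-coded lists shaped like the three query results, the <orders> wrapper element is added only to obtain one document, and `take_group` and `tag_orders` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Tuple streams shaped like the three query results, each sorted by order id.
ORDER_STREAM = [(10, "Smith Construction")]
ITEM_STREAM = [(10, "generator", 8000), (10, "backhoe", 24000)]
PAYMENT_STREAM = [(10, "1/10/01", 20000), (10, "6/10/01", 12000)]


def take_group(stream, pos, oid):
    """Collect consecutive rows starting at pos whose first field equals oid."""
    rows = []
    while pos < len(stream) and stream[pos][0] == oid:
        rows.append(stream[pos])
        pos += 1
    return rows, pos


def tag_orders(order_stream, item_stream, payment_stream):
    """Merge the sorted streams on the order id, emitting one <order> per order."""
    root = ET.Element("orders")  # wrapper element is an assumption of this sketch
    i_pos = p_pos = 0
    for oid, custname in order_stream:
        order = ET.SubElement(root, "order", id=str(oid))
        ET.SubElement(order, "customer").text = custname

        items = ET.SubElement(order, "items")
        rows, i_pos = take_group(item_stream, i_pos, oid)
        for _, desc, cost in rows:
            item = ET.SubElement(items, "item", description=desc)
            ET.SubElement(item, "cost").text = str(cost)

        payments = ET.SubElement(order, "payments")
        rows, p_pos = take_group(payment_stream, p_pos, oid)
        for _, due, amt in rows:
            payment = ET.SubElement(payments, "payment", due=due)
            ET.SubElement(payment, "amount").text = str(amt)
    return root


doc = tag_orders(ORDER_STREAM, ITEM_STREAM, PAYMENT_STREAM)
print(ET.tostring(doc, encoding="unicode"))
```

Each stream position only moves forward, so the tagger never needs to buffer more than the rows of the current order. This relies on every stream being sorted on the same key, which is why all three queries order by o.id.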

Page 32: XML Publishing

SilkRoute Approach

Page 33: XML Publishing

Figure: SilkRoute’s Architecture

Components: Applications (Web/Intranet); SilkRoute, consisting of Query Composer, Plan Generator, and XML Tagger; RDBMS.

Data flows: Source Description (XML), Virtual View or Materialized View (RXL), User XML Queries, Query (RXL), SQL Queries, Tuple Streams, XML Template, Result XML Documents.

Page 34: XML Publishing

SilkRoute Approach

- The database administrator starts by writing an RXL query that defines the XML view of the database. It is called the view query.

- A materialized view is fed directly into the Plan Generator, which generates a set of SQL queries and one XML template.

- A virtual view is first composed by the Query Composer with a user query, resulting in another RXL query, which is then fed into the Plan Generator.

- The SQL queries are sent to the RDBMS server, which returns one sorted tuple stream per SQL query.

- The XML Tagger merges the tuple streams and produces the XML document, which is returned to the application.

Page 35: XML Publishing

Query Composer

This component takes a user XML-QL query and composes it with the RXL view query, resulting in a new RXL query that combines fragments of the view query and the user query. It works in a way similar to the Query Parser and Query Rewrite components in XPERRANTO.

Page 36: XML Publishing

Plan Generator

This component in SilkRoute uses a greedy optimization algorithm to choose an optimal set of SQL queries for a given RXL view definition. The algorithm bases its decisions on query cost estimates provided by the relational engine. It can return more than one plan; these plans are meant to be integrated with additional optimization algorithms that optimize specific parameters, such as network traffic or server load. Details of the greedy algorithm can be found in:

Efficient evaluation of XML middle-ware queries. M. Fernandez et al.

Page 37: XML Publishing

XML Publishing: SQL Server

Two approaches

SQL-centric approach
- extends SQL queries to perform the transformation. The extension to the SQL query is called “FOR XML”.

Virtual XML views approach
- uses the XDR (XML-Data Reduced) schema language to define virtual XML views over the relational database, which are then queried with XPath.

Page 38: XML Publishing

XML Publishing: SQL Server, SQL-centric approach

Three modes
- RAW mode
- AUTO mode
- EXPLICIT mode

Page 39: XML Publishing

XML Publishing: SQL Server, RAW Mode

SELECT Customers.CustomerID, OrderID
FROM Customers LEFT OUTER JOIN Orders
ON Customers.CustomerID = Orders.CustomerID
FOR XML RAW

<row CustomerID="ALFKI" OrderID="10643"/>
<row CustomerID="ALFKI" OrderID="10692"/>
<row CustomerID="ANATR" OrderID="10308"/>
. . . .

• flat XML
• default tag and attribute names

Page 40: XML Publishing

XML Publishing: SQL Server, AUTO Mode

SELECT Customers.CustomerID, OrderID
FROM Customers LEFT OUTER JOIN Orders
ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerID
FOR XML AUTO

<Customers CustomerID="ALFKI">
  <Orders OrderID="10643"/>
  <Orders OrderID="10692"/>
</Customers>
<Customers CustomerID="ANATR">
  <Orders OrderID="10308"/>
</Customers>
. . . .

• default tag and attribute names
• no differently typed sibling elements
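The difference between the flat RAW output and the nested AUTO output can be mimicked in a few lines of Python. This is an emulation for illustration only, not SQL Server's implementation: the rows are the three tuples from the example output, NULL order ids from the outer join are not handled, and `for_xml_raw` and `for_xml_auto` are hypothetical names.

```python
from itertools import groupby
from xml.sax.saxutils import quoteattr

# Join result sorted by CustomerID; values copied from the example output.
ROWS = [("ALFKI", "10643"), ("ALFKI", "10692"), ("ANATR", "10308")]


def for_xml_raw(rows):
    """One flat <row/> element per tuple, as in RAW mode."""
    return [f"<row CustomerID={quoteattr(c)} OrderID={quoteattr(o)}/>"
            for c, o in rows]


def for_xml_auto(rows):
    """Nest <Orders/> under one <Customers> element per run of equal CustomerID."""
    out = []
    for customer, group in groupby(rows, key=lambda r: r[0]):
        out.append(f"<Customers CustomerID={quoteattr(customer)}>")
        out.extend(f"  <Orders OrderID={quoteattr(o)}/>" for _, o in group)
        out.append("</Customers>")
    return out


print("\n".join(for_xml_raw(ROWS)))
print("\n".join(for_xml_auto(ROWS)))
```

The nesting in `for_xml_auto` depends on the rows arriving grouped by customer, which is why the AUTO query sorts its result: `groupby` only merges adjacent rows with the same key.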

Page 41: XML Publishing

XML Publishing: SQL Server, EXPLICIT Mode

• nested XML
• user-defined tags and attributes

Idea: write SQL queries with complex column names

Ad hoc, order-dependent semantics

Page 42: XML Publishing

XML Publishing: SQL Server, Virtual XML Views

The core mechanism for providing XML views over relational data is the annotated schema, which consists of a schema description of the XML view plus annotations that describe how the XML schema constructs map to relational schema constructs. An XPath query, together with the annotated schema, is then translated into a FOR XML query that returns only the data required by the query.
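The translation step can be pictured with a deliberately tiny Python sketch. All of it is hypothetical: the `MAPPING` dictionary stands in for an annotated schema, only XPath expressions of the form /Element[@attr='value'] are recognized, and the generated statement with its `?` placeholder is illustrative text rather than the SQL that SQL Server 2000 actually produces.

```python
import re

# Hypothetical annotations: XML element -> table, XML attribute -> column.
MAPPING = {
    "Customers": {"table": "Customers", "attributes": {"CustomerID": "CustomerID"}},
}

# Matches a single step of the form /Element[@attr='value'].
STEP = re.compile(r"^/(\w+)\[@(\w+)='([^']*)'\]$")


def xpath_to_sql(xpath, mapping=MAPPING):
    """Translate /Element[@attr='value'] into a parameterized FOR XML query."""
    match = STEP.match(xpath)
    if match is None:
        raise ValueError(f"unsupported XPath: {xpath}")
    element, attribute, value = match.groups()
    entry = mapping[element]
    column = entry["attributes"][attribute]
    sql = (f"SELECT * FROM {entry['table']} "
           f"WHERE {column} = ? FOR XML AUTO")
    return sql, (value,)


print(xpath_to_sql("/Customers[@CustomerID='ALFKI']"))
```

The point of the sketch is that the XPath predicate ends up in the WHERE clause, so only the requested customer is read and tagged instead of the whole view.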

Page 43: XML Publishing

Summary

IBM XPERRANTO

- pure XML, single query language approach.

- XML views are defined in an XML query language that uses the type system of XML Schema.

SilkRoute

- XML views are defined using a declarative query language called RXL (Relational to XML Transformation Language).

Microsoft SQL 2000

- Supports queries over XML views, but the support is limited because queries are specified in XPath, which is a subset of XQuery.

Page 44: XML Publishing

Future Work

IBM XPERRANTO

- provide support for insertable and updateable XML views

- push tagging inside the database system

SilkRoute

- find better algorithms for translating RXL into efficient SQL and for minimizing composed RXL views

Microsoft SQL 2000

- determine whether query composition and decomposition are possible for the complete XQuery language or for only a subset of the language