DB2 9 zOS pureXML Advanced Topics - OOWidgets 9 zOS pureXML Advanced Topics.pdf · – SQLCODE =...

© 2010 IBM Corporation

Information Management

Advanced Topics:pureXML in DB2 9 for z/OSGuogen (Gene) Zhang, IBMMay 2012

© 2010 IBM Corporation2


Agenda

� Usage scenarios

� Hints and tips of using XML features

� XML Indexing and access paths

� XML performance, monitoring and tuning

� Additional best practices

�



Usage Scenarios

� Directly processing XML (for data in motion)– SEPA/UNIFI, ACORD, FIXML, FpML, MIMSO, XBRL, …– DJXDM, HR-XML, HL7, ARTS, HIPAA, NewsML, XForms, …– Insurance policy, contract, purchase order, emails, etc.

� Versatile schemas and enable end-user customizableapplications.

� Object persistence (single XML column v.s. many tables)

� Sparse attribute values (null v.s. absence)

� Migration from legacy data model (network, hierarchical, relational)

� Generating web pages: XHTML

� Provide/consume Web Services (SOAP), support SOA

� …



Customer Experiences

� Insurance, financial, banking, government, healthcare, telecom, manufacturing, research, transportation …

� References or public information:– ZIVIT: Tax processing– GAD: Banking, XBRL & SEPA– BOA Merrill Lynch: Finance– Temenos T24: universal banking application– …

� From LUW:– NY State: Tax processing– UCLA Health System: medical records– …

� All sorts of XML usages: flexibility

� More uptake since V10 GA



Usage Scenarios: Directly Processing XML

Financial Services XML Standards

�XML for data in motion - storing what is exchanged

–SEPA/UNIFI, ACORD, FIXML, FpML, MIMSO, XBRL, …

–DJXDM, HR-XML, HL7, ARTS, HIPAA, NewsML, XForms, …

–Insurance policy, contract, purchase order, emails, etc.

X9NACHAECCHO

UNIFI (ISO 20022)



XML Event Log for Auditing

� Event log keeps records of diverse events, including:– TIMESTAMP, SESSIONID, USERID, REQUEST, RESULTSTATUS, SOURCE, EVENTTYPE, ACTION,

RESPONSE etc.

� An XML column cancontain the diversedata specific to eachevent type.

� Many XML indexescan be created to servediverse query needswith good performance.

SystemServices

Event Log

Auditing

xml



Output XSLT

Feeds

JSON

Any text formatincluding XML, PDF

� Flexibility in Web service input and output formats

� Alleviates ‘top-down’ Web service format requirements

Usage Scenarios: Generating web pages: XHTML,...



XML as Front to Backend/Core Systems

XML

DB2 pureXML

relationalXML

XML

XML

Backend 1

Backend 2

Backend 3

Backend n

Interface Tables 1

Interface Tables 2

Interface Tables 3

Interface Tables n

Physical tables or logical views

MQ

FTP

HTTP

Need to handle XML data, but full normalization is overkill.

Great fit for pureXML



Agenda

� Usage scenarios





�



What You Can Do with pureXML

� Create tables with XML columns or alter table add XML columns

� Insert XML data, optionally validated against schemas

� Create indexes on XML data

� Efficiently search XML data

� Extract XML data

� Decompose XML data into relational data or create relational view

� Construct XML documents from relational and XML data

� Handle XML objects in all the utilities and tools

XMLDOC

XML Column

XMLIndex

XML

- Managing XML data the same way as relational data



XML Column fundamental

�Store XML documents up to 2GB each

�Store any well-formed documents–<a/>: smallest well-formed document ("not valid")

–"" (empty string): is not a well-formed document–Bytes of 00x are not legal XML characters,–Blanks are legal, and can appear at the end of a doc.–NULL: legal value for XML (and only reasonable default)–XML column is always NULLable.

�XML document can be validated against schema.

�Zero, one or more index entries from one document

�Dynamic typing at query time



"Fun" with SQL/XML and XPath

� XMLSERIALIZE(XMLQUERY('/po/@number' PASSING po))

SQL Error.– Use XMLCAST(XMLQUERY('/po/@number' PASSING po) as INT) to get

attribute value in SQL

� Dept-name='Shipping & Receiving'

XMLSERIALIZE => 'Shipping & Receiving'

XMLCAST => 'Shipping & Receiving'

� /po/items/item[price >= 10 and price <= 20]

not necessarily between 10 and 20– <item><price>5</price><price>25</price></item>

� /po/items/item[price[. >=10 and . <= 20]] this is between

� Compare: /po/items/item[@price >=10 and @price <= 20]

� Compare: /a/b[c=5], /a/b[.//c=5], /a/b[//c=5]



Using SPUFI or JCL

SELECT Cid, InfoFROM DSN8910.CUSTOMERWHERE XMLEXISTS ('declare default element namespace "http://posample.org";//addr[city="Toronto"]' passing INFO)

� XML and XPath are case-sensitive: CAP off/case mixed– SQLCODE = -16002, ERROR: AN XQUERY EXPRESSION HAS AN

UNEXPECTED TOKEN DEFAULT FOLLOWING DECLARE.

� Terminal session CCSID setting has to be consistent with application encoding scheme as “[” and “]” have different code points in different code pages.– SQLCODE = -16002, ERROR: AN XQUERY EXPRESSION HAS AN

UNEXPECTED TOKEN FOLLOWING "Toronto".



FETCH CONTINUE for XML and LOB

� No size associated with XML values

� Hard to allocate large memory

� Shortcomings with LOB Locator

� New FETCH CONTINUE statements: (one of two ways)– DECLARE CURSOR1 CURSOR FOR SELECT C2 FROM T1;– OPEN CURSOR1;– FETCH WITH CONTINUE CURSOR1 into :clobhv;– if (sqlcode >= 0) & sqlcode <> 100– Loop if truncation occurs until lob/xml complete (total length)– FETCH CURRENT CONTINUE CURSOR1 into :clobhv;– Consume :clobhv content– end loop

� Another way is to use FETCH … INTO DESCRIPTOR :SQLDA



Examples of XPath - Typing

� No cast is needed: “Find all the products in the Catalog with RegPrice> 100”XMLQUERY(‘/Catalog/Categories/Product[RegPrice > 100]’PASSING XCatalog)

� Cast is needed: “Find all the products on sale in the Catalog”XMLQUERY(‘/Catalog/Categories/Product[RegPrice > xs:double(SalePrice) ]’ PASSING XCatalog)

� No cast is needed: “Find all the products with more than 10% discount in the Catalog”XMLQUERY(‘/Catalog/Categories/Product[RegPrice * 0.9 > SalePrice ]’ PASSING XCatalog)



Examples of XPath - Cardinality

� No cardinality problem: “Find all the products in the Catalog with RegPrice > $price”XMLQUERY(‘/Catalog/Categories/Product[RegPrice > $price]’PASSING XCatalog, 200 as “price”)

� To avoid cardinality violation: “Find all the products on sale in the Catalog”XMLQUERY(‘/Catalog/Categories/Product[RegPrice > SalePrice/xs:double(.) ) ]’ PASSING XCatalog)

� To avoid cardinality violation: “Find all the products with more than 10% discount in the Catalog”XMLQUERY(‘/Catalog/Categories/Product[RegPrice/(. * 0.9) > SalePrice ]’ PASSING XCatalog)



Tricks in Generating XML test data� Generate from relational data using constructors

INSERT INTO MYTABLE SELECT XMLDOCUMENT(XMLELEMENT(…. XMLAGG(…) ) ) FROM …

� Repeat a document 1000 times just to create volume (applies to any other data)

INSERT INTO MYTABLEWITH T(n) AS (SELECT 1 FROM SYSIBM.SYSDUMMYUUNION ALLSELECT n+1 FROM T WHERE n <1000)SELECT n, '<doc>xml testing document </doc>' FROM T;

� Stuff a document with long values

INSERT INTO MYTABLE VALUES(1,XMLPARSE(DOCUMENT CLOB('<A> testing doc')

|| CLOB(REPEAT('1',32000)) || CLOB(REPEAT('2',32000)) || CLOB(REPEAT('3',10000)) || CLOB('</A>')

) );

<doc>xml testing document </doc>1000

…..






<A> testing doc11111111111………..1111111122222222222………..22222222233333333333………333333</A>



Pattern matching for strings using regular expressions

� Three important functions in XPath using regular expressions: fn:matches, fn:replace, fn:tokenize

� fn:replace(): to replace some patterns with a replacement string.For example, strip spaces and dashes: fn:replace($x, "[ \-]", "")Remove non-digit characters: fn:replace($x, "[^0-9]+", "")

� fn:matches(): check if a string matches a patternFor example: all digits starting with "4" fn:matches($y, "^4[0-9]+$")

� fn:tokenize(): break a string into a sequenceFor example, split a sequence of keywords separated by a space: fn:tokenize($x, " ")



Returning items from a string

� Items separated by space or comma, can be used to pass an array (use XMLAGG to construct it)

� Using space: SELECT * FROM XMLTABLE('fn:tokenize($x, " ")'PASSING '12 34 56 78' as "x" COLUMNS A INT PAHT '.') as XT

A

12

34

56

78

� Using comma: SELECT * FROM XMLTABLE('fn:tokenize($x, ",")'PASSING '12 AB, 34 CD, 56, 78' as "x" COLUMNS B VARCHAR(10) PATH '.') as XT

B

12 AB

34 CD

56

78



Flexible Parameter Passing using XML

� Passing flexible structures, or arrays

� Passing a flexible structure

CALL MyProc1(XMLELEMENT(NAME "Array",XMLFOREST('1001' as "prodID",

'1003' as "prodID",

'1080' as "prodID",

'1088' as "prodID")) );

In MyProc1(x):

SELECT …

FROM XMLTABLE('Array' passing xcolumn prodID int PATH 'prodID'), …

CALL myProc2(XMLELEMENT(NAME "Struct",XMLFOREST('1001' as "id", 'prod_name' as "pname", 123.45 as "qty"));

In myProc2(x):

SELECT …

FROM XMLTABLE('Struct' passing xcolumn id int PATH 'id', pname varchar(20) PATH 'pname', qty decfloatPATH 'qty'), …

1088

1080

1003

1001

123.45'prod_name'1001



Simulating Arrays and Structures

� Flexibility of XML allows for simulation of arrays and structures.

� Regular structures can be represented as a table using XMLTABLE function.

<products><produc>

<id>196</id><name>z</name><qty>100</qty>

</product><product><id>P7</id><name>P</name><qty>10000</qty></product>

…</products>

XML Representation

XMLTABLE('/products/product' PASSING XDATA COLUMNS"id" VARCHAR(10),"name" VARCHAR(20),"qty" DECIMAL(10,2) )

/products/product[1]/id

/products/product[1]/name

/products/product[1]/qty

/products/product[2]/id

/products/product[2]/name

/products/product[2]/qty

…

ProductsID: 196NAME: zQTY: 100

ID: P7NAME: PQTY: 10000

ID: I5NAME: iQTY: 1000

XMLTableXPath to refer to the values

User View



Usage Scenario: DB2 as Web Services Consumer� Invoke light-weight SOAPHTTP UDF from SQL, consume response in the

same query conveniently.

� Tooling support for DB2 in RAD – Create a SQL UDF specifically for a WS operation from WSDL– Creates ‘wrapper’ SQL UDF to hide XML/relational mapping.

How much is EUR1000 worth in USD?SELECT 1000 *

XMLCAST( XMLQUERY('$d//*:ConversionRateResult' PASSING

XMLPARSE (DOCUMENT

DB2XML.SOAPHTTPNV('http://www.webservicex.net/CurrencyConvertor.asmx',

'http://www.webserviceX.NET/ConversionRate',

'<?xml version="1.0" encoding="utf-8"?>

<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xmlns:xsd="http://www.w3.org/2001/XMLSchema"

xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">

<soap:Body>

<ConversionRate xmlns="http://www.webserviceX.NET/">

<FromCurrency>EUR</FromCurrency>

<ToCurrency>USD</ToCurrency>

</ConversionRate>

</soap:Body>

</soap:Envelope>')

) AS "d") AS DECIMAL(10,5))

FROM SYSIBM.SYSDUMMYU#



Scenario: End-user Customizable Apps

� ISVs or large enterprise ITs need to serve diverse needs.

� Information such as product spec, customer info varies drastically.

� Use one XML column to contain the customizable data.

<product><spec><unit>box</unit><unitprice>8.5</unitprice><color>black</color><size>8 x 10</size><standards>

<std>ISO910001</std></standards>

</spec></product>

XML Representation

Using XSLT or XForms or your own framework based on XHTML

/product/spec/unit

/product/spec/unitprice

/product/spec/color

/product/spec/size

/product/spec/battery

/product/spec/voltage

/product/spec/standards/std

product

spec

unit

unit price

Color

Size

Battery

Voltage

Standards

DisplayXPath to refer to the valuesUser View



Extracting values from XML for Hybrid Store using Triggers

� CUST(ID, NAME, CITY, ZIP, INFO): extract NAME, CITY, ZIP from INFO (XML)

� CREATE TRIGGER ins_cust

AFTER INSERT ON cust

REFERENCING NEW AS newrow

FOR EACH ROW MODE DB2SQL

BEGIN ATOMIC

update cust

set (name, city, zip) =

(select X.name, X.city, X.zip

from cust, XMLTABLE('customerinfo' PASSING CUST.INFO

COLUMNS

name varchar(30) PATH 'name',

city varchar(20) PATH 'addr/city',

zip varchar(12) PATH 'addr/pcode-zip') as X

where cust.id = newrow.id

)

where cust.id = newrow.id;

END #



Agenda

� Usage scenarios





�



XML Indexes: examples<?xml version="1.0"?>

<purchaseOrder orderDate="1999-10-20">

<shipTo country="US">

<name>Alice Smith</name>

. . .

</shipTo>

<billTo country="US">

<name>Robert Smith</name>

. . .

</billTo>

<comment>Hurry, my lawn is going wild!</comment>

<items>

<item partNum="872-AA">

<desc>Lawnmower</desc>

<quantity>1</quantity>

<USPrice>148.95</USPrice>

<comment>Confirm this is electric</comment>

</item>


<desc>Baby Monitor</desc>



<shipDate>2003-05-21</shipDate>

</item>

</items>

</purchaseOrder>

<?xml version="1.0"?>

<purchaseOrder orderDate="1999-10-20">

<shipTo country="US">

<name>Alice Smith</name>

. . .

</shipTo>

<billTo country="US">

<name>Robert Smith</name>

. . .

</billTo>

<comment>Hurry, my lawn is going wild!</comment>

<items>


<desc>Lawnmower</desc>



<comment>Confirm this is electric</comment>

</item>


<desc>Baby Monitor</desc>



<shipDate>2003-05-21</shipDate>

</item>

</items>

</purchaseOrder>

CREATE INDEX IDX4 ON PurchaseOrders(XMLPO)Generate Keys Using XMLPATTERN

‘/purchaseOrder/items/item/desc’as SQL VARCHAR(100);

CREATE INDEX IDX1 ON PURCHASEORDERS(XMLPO)Generate Keys Using XMLPATTERN

‘/purchaseOrder/shipTo/name’as SQL VARCHAR(40);


‘/purchaseOrder/billTo/name’as SQL VARCHAR(40);


‘/purchaseOrder/items/item/@partNum’as SQL VARCHAR(10);

CREATE INDEX IDX5 ON PurchaseOrders(XMLPO)Generate Keys Using XMLPATTERN

‘/purchaseOrder/items/item/USPrice’as SQL DECFLOAT;



Something Special for XML Index

� The number of keys for each document (each base row) depends on the document and XMLPattern.

� For a numeric index, if a string from a document cannot be converted into a number, it is ignored.

– <a><b>X</b><b>5</b></a>, XMLPattern ‘/a/b’ as SQL Decfloat. Only one entry ‘5’ in the index.

� For a string (VARCHAR(n)) index, if a key value is longer than the limit, INSERT or CREATE INDEX will fail.

� Restriction: Index key value cannot span multiple rows. Always safe to index leaf nodes with short values.



XML Index Exploitation For XMLEXISTS/XMLTABLE

� XML Index may eligible to evaluate XPath predicates in the XMLEXISTS and row XPath predicates in the XMLTABLE functions.

– Will not evaluate XPath predicate in the XMLQUERY function.

� An XML index is eligible to evaluate an XMLEXISTS, if– Rule 1 (Type match):

• The data type used in the XPath predicates must match the data type of the XML index. That is, a string comparison requires a string type XML index, and a numeric comparison requires a numeric type XML index.

– Rule 2 (Node containment): • The XML index contains the XPath predicate. That is, the XMLPattern of the

XML index must be equally or more general than the XPath predicate.

Even if these requirements are satisfied, DB2 optimizer can still decide NOT to use an eligible index!



Type match requirement

� String comparison requires string XML index; Numeric comparison requires numeric XML index.

� String XML index may be used to evaluated existential predicates.

NoNoXMLEXISTS(‘/a/b/c’)DECFLOAT/a/b/c

YESNoXMLEXISTS(‘/a/b[c<100]’)VARCHAR/a/b/c

YESXMLEXISTS(‘/a/b[c=“IBM”]’VARCHAR/a/b/c

YESXMLEXISTS(‘/a/b[c<100’]DECFLOAT/a/b/c

NOXMLEXISTS(‘/a/b[c=“IBM”]DECFLOAT/a/b/c

YESNOXMLEXISTS(‘/a/b/c’)VARCHAR/a/b/c

Requirement Satisfied?

Qualify for Existential?

Type match ? PredicateData type of index

XML Index Pattern



Node Containment Requirement

� Same set (Exact Match). – the XML index pattern is the same as the XPath predicate.

� Superset (Partial Match)– the XML index pattern is more general than the querying XPath, i.e., the index

contains a super set of data than the querying XPath needs. – More general:

•descedant-or-self (//) is more general than child axis (/)•wildcard (*) for name test is more general than “named” name test.

SubsetXMLEXISTS(‘/a/*[c]’)/a/b/c

Subset. XMLEXISTS(‘/a//b[c]’)/a/b/c

Superset.XMLEXISTS(‘/a/b[c]’)/a/*/c

Superset XMLEXISTS(‘/a/b[c]’)/a//b/c

Same set.XMLEXISTS(‘/a/*[c]’)/a/*/c

Same set. XMLEXISTS(‘/a//b[c]’)/a//b/c

Same set. XMLEXISTS(‘/a/b[c]’)/a/b/c

Requirement satisfied ? TypePredicateXML index



Index Match for Multiple Index Access (“and” predicate)

SELECT ORD_NO

FROM PurchaseOrders

WHERE

XMLEXISTS(‘/purchaseOrder/items/item[desc=“Baby

Monitor and USPrice < 100”]’

PASSING XMLPO);

CREATE INDEX IDX1 ON PurchaseOrders(XMLPO)

GENERATE KEY USINGXMLPATTERN

‘/purchaseOrder/items/item/desc’AS SQL VARCHAR(20);

and

desc = “..” USPrice < 100

CREATE INDEX IDX2 ON PurchaseOrders(XMLPO) GENERATE KEY USING XMLPATTERN

‘/purchaseOrder/items/item/USPrice’ AS SQL DECFLOAT;

Single index access plan ( using IDX2)Only IDX2 created

Multiple index access plan ( using IDX1 ANDing with IDX2).

Both IDX1 and IDX2 created.

Single index access plan ( using IDX1)Only IDX1 created

Rule for “and” predicate: any individual branch can match an XML index, and get an index access plan.



Query processing using index ANDing in a picture

XML IDX1XML IDX1 XML IDX2XML IDX2

DOCID list 1 DOCID list 2

INTERSECT

DOCID list

DOCID IDXDOCID IDX

RID list

Base Table NODEID IDXNODEID IDX1 2

34

56

Base Table

XMLColDocID …

B+treeB+tree

DocID index

Internal XML Table

B+treeB+tree

NodeID index

B+treeB+tree

XML index (user)

XMLDataDOCID MIN_NODEID

1

2

3

1

2

2

3

02

02

0208

02

Base Table

XMLColDocID …

B+treeB+tree

DocID index

Internal XML Table

B+treeB+tree

NodeID index

B+treeB+tree

XML index (user)

XMLDataDOCID MIN_NODEID

1

2

3

1

2

2

3

02

02

0208

02

XMLEXISTS('/a/b[c<5 and d="good"]' …) or XMLEXISTS('/a/[…]/d[…]…' …)or XMLEXISTS(…) AND XMLEXISTS(…) <= same column or different columns



Index Match for Multiple Index Access (“or” predicate)SELECT ORD_NO

FROM PurchaseOrders

WHERE

XMLEXISTS(‘/purchaseOrder/items/item[desc=“Baby

Monitor or USPrice < 100”]’

PASSING XMLPO);


GENERATE KEY USINGXMLPATTERN

‘/purchaseOrder/items/item/desc’AS SQL VARCHAR(20);

or

desc = “..” USPrice < 100

CREATE INDEX IDX2 ON PurchaseOrders(XMLPO) GENERATE KEY USING XMLPATTERN

‘/purchaseOrder/items/item/USPrice’ AS SQL DECFLOAT;

No index will be used. Only IDX2 created

Multiple index access plan ( using IDX1 ORing with IDX2).

Both IDX1 and IDX2 created.

No index will be used.Only IDX1 created

Rule for “or” predicate: Every branch has to match an XML index to have a multiple index access plan. Otherwise, no index access (table scan).



EXPLAIN Access Paths

CREATE TABLE PurchaseOrders

(ORD_NO INT,

XMLPO XML);


GENERATE KEY USING XMLPATTERN

'/purchaseOrder/items/item/desc' AS SQL VARCHAR(20);



'/purchaseOrder/items/item/USPrice' AS SQL DECFLOAT;

EXPLAIN ALL SET QUERYNO = 1 FOR

SELECT ORD_NO

FROM PurchaseOrders

WHERE

XMLEXISTS('/purchaseOrder/items/item[desc = “Baby Monitor”]' PASSING XMLPO);

TABLE DDL

INDEX DDL

EXPLAIN QUERY

EXPLAIN TABLE

0IDX1 DXPURCHASEORDERS 0111

MIXOPSEQ ACCESSNAME ACCESSTYPE TNAME METHOD PLANNO QBLOCKNOQUERYNO



EXPLAIN Access PathsEXPLAIN ALL SET QUERYNO = 2 FOR

SELECT ORD_NO FROM PurchaseOrders

WHERE

XMLEXISTS('/purchaseOrder/items/item[desc=“Baby Monitor” and USPrice < 100]'

PASSING XMLPO);

EXPLAIN ALL SET QUERYNO = 1 FOR

SELECT ORD_NO FROM PurchaseOrders

WHERE

XMLEXISTS('/purchaseOrder/items/item[desc=“Baby Monitor” or USPrice < 100]'

PASSING XMLPO);

3DIPURCHASEORDERS0112

2IDX2DXPURCHASEORDERS0112

0MPURCHASEORDERS 0112


MIXOPSEQ

ACCESSNAME ACCESSTYPE TNAME METHOD PLANNO QBLOCKNOQUERYNO

3DUPURCHASEORDERS0112




MIXOPSEQ




Exploring XML Index For XMLTABLE

� XPath predicate in Row expression of XMLTABLE can explore XML index– Column XPath predicate will NOT use any XML index !

� The index matching criteria of XMLTABLE is the same as that of XMLEXISTS

SELECT X.*

FROM PurchaseOrders,

XMLTABLE(‘/purchaseOrder/items/item[desc=“Baby Monitor”]’PASSING PurchaseOrders.XMLPO

COLUMNS “quantity” INTEGER,

“desc” VARCHAR(20)) X;

… XMLPATTERN ‘/purchareOrder/items/item/desc’ AS SQL VARCHAR(20);

SELECT X.*

FROM PurchaseOrders,

XMLTABLE(‘/purchaseOrder/items/item’

PASSING PurchaseOrders.XMLPO

COLUMNS “quantity” INTEGER,

“item” XML PATH ‘.[desc=“Baby Monitor”]’) X;



XML index for XML joins

�TPoX Q7: Find the most expensive order of a customer

� SELECT …FROM custacc, orderWHERE … AND XMLEXISTS(

'declare default element namespace"http://www.fixprotocol.org/FIXML-4-4";declare namespace c="http://tpox-benchmark.com/custacc";$odoc/FIXML/Order[@Acct=

$cadoc/c:Customer/c:Accounts/c:Account/@id] 'PASSING cadoc AS "cadoc", odoc AS "odoc")

�Join between orders and account

�XML index XMLPATTERN: '/FIXML/Order/@Acct'

�PK55783/PK81260 enhanced to use XML index for the joins



XML Index for Non-Existential Predicate (missing an element or attribute) (PK80732/PK80735)

XMLEXISTS('/purchaseOrder/items/item[fn:not(shipDate)]' PASSING XMLPO);

XMLPODOCID

<purchaseOrder orderDate=“2009-10-05">. . .

<items><item partNum="872-AA">

<desc>Lawnmower</desc><quantity>1</quantity><shipDate>2010-05-01</shipDate>

</item></items></purchaseOrder>

. . .1001

1002

1003

<purchaseOrder orderDate=“2009-10-05">. . .<items>

<item partNum=“926-AA"><desc>Baby Monitor</desc><quantity>1</quantity>

</item><items>

</purchaseOrder>


<items><item partNum=“957-BB">

<desc>Air Conditioner</desc><quantity>2</quantity> <shipDate>2010-01-01</shipDate>

</item><items>

</purchaseOrder>

. . .

. . .

CREATE INDEX IDX1 ON PurchaseOrders(XMLPO) GENERATE KEY USING XMLPATTERN '/purchaseOrder/items/item/fn:exists(shipDate)' AS SQL VARCHAR(1);

1003T

1001T

1002F

…DocIDKey

XMLEXISTS('/purchaseOrder/items/item[fn:exists(shipDate)="F"]' PASSING XMLPO);

XML Index scan returns DocID = 1002

Internal implementation



XML Index for Existential Predicate (containing an element or attribute)XMLEXISTS('/purchaseOrder/items/item[fn:exists(shipDate)]' PASSING XMLPO);

XMLPODOCID


<items><item partNum="872-AA">

<desc>Lawnmower</desc><quantity>1</quantity><shipDate>2010-05-01</shipDate>

</item></items></purchaseOrder>

. . .1001

1002

1003

<purchaseOrder orderDate=“2009-10-05">. . .<items>

<item partNum=“926-AA"><desc>Baby Monitor</desc><quantity>1</quantity>

</item><items>

</purchaseOrder>


<items><item partNum=“957-BB">

<desc>Air Conditioner</desc><quantity>2</quantity> <shipDate>2010-01-01</shipDate>

</item><items>

</purchaseOrder>

. . .

. . .

CREATE INDEX IDX1 ON PurchaseOrders(XMLPO) GENERATE KEY USING XMLPATTERN '/purchaseOrder/items/item/fn:exists(shipDate)' AS SQL VARCHAR(1);

1003T

1001T

1002F

…DocIDKey

XMLEXISTS('/purchaseOrder/items/item[fn:exists(shipDate)="T"]' PASSING XMLPO);

XML Index scan returns DocID = 1001, 1003

Internal Implementation



Case-insensitive Search with XML indexes

� Use fn:upper-case() in the XPath predicate for Case-insensitive comparison.

� Use fn:upper-case() in the XMLPattern to create eligible XML index.

SELECT ORD_NOFROM PurchaseOrdersWHERE XMLEXISTS('/purchaseOrder/items/item[fn:upper-case(desc)=‘BABY MONITOR’]‘PASSING XMLPO);

CREATE INDEX IDX2 ON PurchaseOrders(XMLPO) Generate Keys

Using XMLpattern '/purchaseOrder/items/item/desc/fn:upper-case(.)'

AS SQL VARCHAR(20);



XML Index + Relational IndexCREATE INDEX IDX1 ON PurchaseOrders(XMLPO)


'/purchaseOrder/items/item/desc'AS SQL VARCHAR(20);

EXPLAIN ALL SET QUERYNO = 6 FORSELECT P.ORD_NOFROM PurchaseOrders PWHERE

Ord_NO = 100AND

XMLEXISTS('/purchaseOrder/items/item[desc = “Baby Monitor”]' PASSING XMLPO) ;

CREATE INDEX IDX_ORDNO on PurchaseOrders(ORD_NO);

3MIPURCHASEORDERS0116



1IDX_ORDNOMXPURCHASEORDERS0116

MIXOPSEQ


Multi-index access. Always followed by MX, MI or MUM

Intersection of multiple indexesMI

Multi-index scan on referenced indexMX



Avoid XMLEXISTS XPath scan after XML indexes

� XMLEXISTS predicates are stage 2 predicates.

� The XPath in XMLEXISTS is re-evaluated by QSCAN after XML index matches.– For two reasons: not exact match, and possible update

� Evaluating XPath using scan is quite expensive.

� Under certain condition, re-evaluation can be avoid and performance improved:1. The XPath in the predicates matches exactly with the XPath for

the index. (at prepare time)2. No update to the XML document since the start of the index

scan. (at runtime)



Agenda

� Usage scenarios





�



TPoX Benchmark

TPoX

0

500

1000

1500

2000

2500

PK81260 PK80732 10 PK80732 20 PK80732 30 PK80732 40

0.00%

10.00%

20.00%

30.00%

40.00%

50.00%

60.00%

70.00%

80.00%

90.00%

100.00%

ETR (tps)

cpu busy (%)

z10, 5 CPs

Number of users

October, 2009



XML performance facts

�Compress well, 3-5x, and minimal impact to CPU and elapsed time.

–Need to REORG to make compression to take effect, not just LOAD

� Insert and select linearly scalable.

�We probably have the most efficient parser and validating parser on z/OS. Parsing usually very small percentage for the entire workload. Validation is 2-3x more expensive.

�Each XML index adds overhead 10-15%, longer steps and // more expensive.

�XML queries using XML indexes have very good performance.



Performance Monitoring and Tuning

� Since XML native storage is built on top of regular tablespace structure, there are no special changes in DB2 Performance Expert to support XML other than minor points - such as new XML locks (type x’35’).

� XML performance problem can be analyzed through accounting traces and performance traces.

� There is a new LOAD MODULE for XML: DSNNXML

� XML indexes have the same consideration as other indexes.

� The REORG utility should be used to maintain order and free space.

� Run RUNSTATS for statistics to help pick XML indexes.



XML Query Performance Issues

■ Numerous XML performance enhancements since V9 GA

■ DOCID index page latch contention due to sequential DOCIDs

■ PM44210/PM44216

■ New zparm XML_RANDOMIZE_DOCID

■ 85% of the performance issues relate to:– Query execution plans– Index usage– Proper coding of SQL/XML and XQuery expressions

(Best Practices section)



How to obtain and analyze XML query plans

■ Create Explain tables– Use member DSNTESC of the SDSNSAMP library

– Option E from menu of DB2 admin tool (DSN_STATEMNT_TABLE)

– Use Visual Explain• Optim Development Studio

• IBM DB2 Optimization Service Center for DB2 for z/OS(OSC)

■ Gather explain information– Use SPUFI – prefix query with EXPLAIN PLAN SET QUERYNO

– SELECT from PLAN_TABLE



Use RUNSTATS

� Use RUNSTATS to collect statistics for XML data and indexes so the optimizer can pick the right access methods

LISTDEF DBACORDTSLIST INCLUDE TABLESPACES DATABASE DBACORD

RUNSTATS TABLESPACE LIST DBACORDTSLIST TABLE(ALL) INDEX(ALL)



Agenda

� Usage scenarios




� Additional best practices�



Best Practices

� Tip 1: Choose the right table and storage design

� Tip 2: Choose the right XML document granularity

� Tip 3: Be aware of XML schema validation overhead

� Tip 4: Avoid encoding conversion during XML insert and retrieval

� Tip 5: In XPath expressions, use fully specified paths as much as possible

� Tip 6: Define lean XML indexes

� Tip 7: Put document filtering predicates in XMLEXISTS instead ofXMLQUERY

� Tip 8: Use square brackets [ ] to avoid Boolean predicates in XMLEXISTS

� Tip 9: Use RUNSTATS to collects statistics for XML data and indexes

� Tip 10: Use SQL/XML publishing views to expose relational data as XML

� Tip 11: Use XMLTABLE views to expose XML as relational data

� Tip 12: Use SQL/XML statements with parameter markers and host vars



Tip 1: Decision making: XML input => storage

Regulatory Requirements

Intact Digital Signature Significant Data Flexible

Search in XML

Never

LOBVARCHARVARBIN

(preserve whitespace) (strip whitespace) (Relational/XML)

Yes Return XMLalways

Yes

XML withXML indexes

No

Light Reporting

StructuresRegularFixed

Relational

Complex Flexible

XML withXMLTABLE()

Yes

Heavy Analytics

Relational withXML

(can bematerialized)



Some considerations

� Tedious normalization and frustrated changes of schema are an indicator for using native XML.

� Store hybrid or redundant data in relational/XML, when– Fully normalized storage is an overkill– Referential integrity: extract into relation columns– Store in XML, but materialize frequently used fields in relational

for heavy analytic applications– Document size is big (more later)

� Use compression for XML data always



Table Design

� Mixed document types in one table– Flexibility in exchange of

overhead (such as index maintenance)

� Separate tables for different document types– to avoid overhead

XMLDOCID DOCTYPE

XMLDOCID

DOCTYPE1TAB

XMLDOCID

DOCTYPE2TAB

DOCTAB



Table Space Size Consideration

� Basic XML storage is about 0.3 (strip whitespace w/ compression) to 1.5 (preserve ws w/o compression) of original doc size

� An XML table space always use 16KB pages.– For non range-partitioned base table spaces, PBG table space is used

for XML.

� Range-partitioned base table spaces: XML partitioning follows base table partitioning.

� The number of rows to fit into a relational partition is limited by the number of documents to fit into an XML partition.– For example, 4K doc size, 32GB partition can roughly store 8M

documents (or 7M to be safe).



Tip 2: Choose the right XML document granularity

� Small vs. large documents? (KBs vs. MBs)XML Indexes filter at the document level

� Smaller documents tend to perform betterBut, rule of thumb:

Document granularity should match thepredominant granularity of access



Document Granularity: Example<order date=‘2004-11-05'>

<customer>Doe</customer><part key='82' >

<quantity>5</quantity><price>5.00</price>

</part><part key='83' >


</part></order>

<order date=‘2004-11-06'><customer>Doe</customer><part key=‘19' >


</part><part key=‘48' >


</part></order>

<allorders><order date=‘2004-11-05'>

<customer>Doe</customer><part key='82' >


</part><part key='83' >


</part></order><order date=‘2004-11-06'><customer>Doe</customer><part key=‘19'>


</part><part key=‘48'>


</part></order>

</allorders>

<part key='83' ><quantity>11</quantity><price>19.95</price>

</part>

<part key=‘48' ><quantity>1</quantity><price>24.95</price>

</part>

<part key='82' ><quantity>5</quantity><price>5.00</price>

</part>

<part key=‘19' ><quantity>23</quantity><price>1.99</price>

</part>

<order date=‘2004-11-05'><customer>Doe</customer><part key='82‘/><part key='83‘/>

</order>

<order date=‘2004-11-06'><customer>Doe</customer><part key=‘19‘/><part key=‘48‘/>

</order>



Tip 3: Beware of Schema Validation Overhead

create table dept(deptID char(8), deptdoc xml);

Validation is optional, and per document (per row):insert into dept values (?, ?)insert into dept values (?, dsn_xmlvalidate(?, ?))

Validation increases CPU time for inserts, and reduces throughput.

Use schema validation if needed.Avoid schema validation for highest possible insert performance.

No validation

with validation



Tip 4: Avoid encoding conversion

■ Internally encoded XML: encoding derived from the data, e.g. Unicode Byte-Order Mark or optional XML declaration: <?xml version="1.0" encoding="UTF-8" ?>

■ Externally encoded XML: application encoding determines XML encoding if character type variables are used

■ Internally encoded XML with UTF-8 is preferred

– CLI: use SQL_C_BINARY data buffers rather than SQL_C_CHAR, SQL_C_DBCHAR, SQL_C_WCHAR

– Java: use binary stream (setBinaryStream) rather than string (setString).

– COBOL: SQL BLOB



Tip 5: Use fully specified paths if possible■ As much as possible possible, use fully specified XPath expression

rather than wildcards, e.g.

– /customerinfo/phone instead of //phone

– /customerinfo/addr/state instead of /customerinfo/*/state

<customerinfo Cid="1004"><name>Matt Foreman</name><addr country="Canada">

<street>1596 Baseline</street><city>Toronto</city><state>Ontario</state><pcode>M3Z-5H9</pcode>

</addr><phone type="work">905-555-4789</phone><phone type="home">416-555-3376</phone><assistant>

<name>Peter Smith</name><phone type="home">416-555-3426</phone>

</assistant></customerinfo>



Tip 6: Lean XML Indexes

• create unique index idx1 on customer(info) generate key using xmlpattern '/customerinfo/@Cid' as sql decfloat;

• create index idx2 on customer(info) generate key using xmlpattern '/customerinfo/name' as sql varchar(40);






create index idx3 on customer(info) generate key using xmlpattern '//name' as sql varchar(40);

create table customer( info XML);

create index idx4 on customer(info)generate key using xmlpattern '/customerinfo/phone' as sql varchar(40);

LUW: “as sql double”zOS: “as sql decfloat”



Tip 6: Lean XML Indexes

• create unique index idx1 on customer(info) generate key using xmlpattern '/customerinfo/@Cid' as sql decfloat;

• create index idx2 on customer(info) generate key using xmlpattern '/customerinfo/name' as sql varchar(40);






create index idx3 on customer(info) generate key using xmlpattern '//name' as sql varchar(40);


create index idx4 on customer(info) generate key using xmlpattern '//text()' as sql varchar(40);

Don’t index everything!Very expensive for

insert, update, delete !

Avoid:



Tip 6: Lean XML Indexesand Indexing non-leaf Nodes



</addr>(…)

</customerinfo>Typically not useful !

…xmlpattern '/customerinfo/addr' as sql varchar(128);

Single index entry. Key value = concatenation of all text nodes under “addr”:

(/customerinfo/addr, “1596 BaselineTorontoOntarioM3Z-5H9”)

� Better: 4 separate indexes !

�…xmlpattern '/customerinfo/addr/street' as sql varchar(50);

�…xmlpattern '/customerinfo/addr/city' as sql varchar(40);

�…xmlpattern '/customerinfo/addr/state' as sql varchar(25);

�…xmlpattern '/customerinfo/addr/pcode' as sql varchar(10);



Tip 7 & 8 : Put document filtering predicates in XMLEXISTS instead of XMLQUERY & Use square brackets [ ] to avoid Boolean predicates in XMLEXISTS

■ XMLQUERY function in a SELECT clause does not filter documents or rows, does not use indexes

■ Document/Row-filtering predicates must be in XMLEXISTS in the WHERE clause

■ Predicates in XMLEXISTS must be in square brackets



SQL/XML with XMLQUERY

<customerinfo><name>Matt Foreman</name><phone>905-555-4789</phone>

</customerinfo>

<customerinfo><name>Peter Jones</name><phone>905-123-9065</phone>

</customerinfo>

<customerinfo><name>Mary Poppins</name><phone>905-890-0763</phone>

</customerinfo>

• select xmlquery(‘$i/customerinfo[phone = “905-555-4789”]/name’ passing info as “i”)

� from customer

� select xmlquery(‘$i/customerinfo/name’ passing info as “i”) � from customer� where xmlexists(‘$i/customerinfo[phone = “905-555-4789”]’ passing info as “i”)

<name>Matt Foreman</name>


customer table:

1 record(s) selected



Can usean index!

Can not use an index!



� select xmlquery(‘$i/customerinfo/name’ passing info as “i”) � from customer� where xmlexists(‘$i/customerinfo[phone = “905-555-4789”]’ passing info as “i”)

SQL/XML with XMLEXISTS� select xmlquery(‘$i/customerinfo/name’ passing info as “i”)

from customerwhere xmlexists(‘$i/customerinfo/phone = “905-555-4789”’ passing info as “i”)



<name>Peter Jones</name>

<name>Mary Poppins</name>

customer table:



<customerinfo><name>Matt Foreman</name><phone>905-555-4789</phone>

</customerinfo>

<customerinfo><name>Peter Jones</name><phone>905-123-9065</phone>

</customerinfo>

<customerinfo><name>Mary Poppins</name><phone>905-890-0763</phone>

</customerinfo>


Can usean index!

Can not use an index!

True or false, not empty!



Tip 9: Use RUNSTATS on XML data!

■ RUNSTATS does collect statistics for XML data and XML indexes!

■ The optimizer does use these stats!

LISTDEF DBACORDTSLIST INCLUDE TABLESPACES DATABASE DBACORD

RUNSTATS TABLESPACE LIST DBACORDTSLIST TABLE(ALL) INDEX(ALL)



Tip 10: Use SQL/XML publishing views to expose relational data as XML

■ SQL/XML publishing functions hidden in a view

■ create table unit( unitID char(8), name char(20), manager varchar(20));

■ create view UnitView(unitID, name, unitdoc) as select unitID, name, XMLELEMENT(NAME "Unit",

XMLELEMENT(NAME "ID", u.unitID),

XMLELEMENT(NAME "UnitName", u.name),

XMLELEMENT(NAME "Mgr", u.manager) )

from unit u;



Tip 10: Use SQL/XML publishing views to expose relational data as XML

� Queries that perform sub-optimally

select unitdoc from UnitViewwhere xmlexists('$i/Unit[ID = "WWPR"]' passing unitdocas "i");

� Query that performs well: filter on relational

select unitdoc from UnitViewwhere UnitID = "WWPR";

In a nutshell, include relational columns in a SQL/XML publishing view, and when querying the view express any predicates on those columns rather than on the constructed XML.



Tip 11: Use XMLTABLE views to expose XML data in relational format

■ Values returned from XML documents in tabular format

■ create table customer(info XML);

■ create view myview(CustomerID, Name, Zip, Info) as SELECT T.*, info

FROM customer, XMLTABLE ('$c/customerinfo' passing info as “c”

COLUMNS

“CID” INTEGER PATH './@Cid',

“Name” VARCHAR(30) PATH './name',

“Zip” CHAR(12) PATH './addr/pcode' ) as T;



Tip 11: Use XMLTABLE views to expose XML data in relational format

■ Query with an XML predicate– May perform sub-optimally

select CustomerID,Name

from myview

where Zip = “95141”;

■ Will perform wellselect CustomerID, Name

from myviewwhere xmlexists('$i/customerinfo[addr/pcode ' “95141”] passing info as “i”);

In a nutshell, be careful with XMLTABLE views which expose XML data in relational form. When possible, include additional columns in the view definition so that filtering predicates can be expressed on those columns instead of the XMLTABLE columns.



Tip 12: Use Parameter markers and host vars for fast XML queries

select info from customer wherexmlexists('$i/customerinfo[phone = "905-555-4789"]' passing info as "i")

select info from customer where xmlexists('$i/customerinfo[phone = $p]'

passing info as "i", cast(? as varchar(12)) as "p")

select info from customer where xmlexists('$i/customerinfo[phone = $p]'

passing info as "i", :vchostvar as "p")



Summary

� Usage scenarios



� XML performance

� Performance monitoring and tuning


DB2 9 zOS pureXML Advanced Topics - OOWidgets 9 zOS pureXML Advanced Topics.pdf · – SQLCODE =...

Documents

Transcript of DB2 9 zOS pureXML Advanced Topics - OOWidgets 9 zOS pureXML Advanced Topics.pdf · – SQLCODE =...