8/6/2019 IDUG - 07 Programming Support for PureXML
IBM Software Group
Leveraging pureXML in your application
IDUG India
Manoj K [email protected]
May, 2007
IBM Software Group | DB2 Information Management Software
Agenda
XML Data Management: Different Ways
DB2 pureXML: An Innovative Way
XML Data Storage
Query Language Support
Application Programming Support
Use Cases
XML Data Management: Traditional Approach
XML Data Management
DB2 pureXML: An Innovative Way
XML Data management
Let's start with an example...
Here is some data from an ordinary delimited flat file:
47; John Doe; 58; Peter Pan; Database systems; 29; SQL; relational
What does this data mean?
What is XML?
John Doe
Peter Pan
Database systems
29
SQL
relational
XML = eXtensible Markup Language
XML is "self-describing data"
XML: Describes data
HTML: Describes display
XML vs. Relational
XML (element values):
10 CHRISTINE SMITH 408-463-4963 52750.00
27 MICHAEL THOMPSON 41250.00

Relational:
Department
DEPTID  DEPTNAME
15      Sales

Employee
DEPTID  EMPNO  FIRSTNAME  LASTNAME  PHONE         SALARY
15      27     MICHAEL    THOMPSON  NULL          41250
15      10     CHRISTINE  SMITH     408-463-4963  52750

Relational           XML
Set oriented         Sequences (ordered!)
Structure            Semi-structured
Strong schema        Schema-chaos
Strongly typed       Optionally typed
Tabular data model   XML data model
Flat                 Nested, hierarchical
3 value logic        2 value logic
"Null"               Not there at all
ANSI/ISO             W3C
Why XML?
Flexibility, Flexibility, Flexibility !
XML is a very flexible data model:
for structured data, semi-structured data, schema-less data
Easy to extend: define new tags as needed
XML is self-describing: any XML parser can "understand" it!
Easy to transform XML documents into other formats (HTML, etc.)
XML is vendor and platform independent
Easy to share XML between applications, businesses, processes, ...
Easy to "validate" XML, i.e. to check compliance with a schema: any validating XML parser can do it!
XML can be a better choice than relational for...
Data that's inherently hierarchical or nested in nature
Example: Medical data, Bill-of-materials, etc., OO & Multi-value
Data sets with sparsely populated attributes
Example: FIXML, FpML, Customer profiles
Schema evolution
Example: Frequently changing services/products/processes
Variable schemas, many schemas
Example: Data integration, consolidation of diverse data sources
Combining structured & unstructured data
Example: CM, Life Sciences, News & Media
Importance of XML data?
More and more XML data generated every day
XML is pervasive in all kinds of organizations
Almost every sector has XML-based standards
Where is your XML?
In files: storage not managed and not secure
In LOBs: content and business value locked up
Shred to tables: complex and fragile mapping
In XML DB: scalability & integration concerns
XML Data Needs Relational Maturity
XML Data Needs Protection
  Backup and recovery features to ensure continuity
  Data is protected using database security
Simplified XML Data Access
  Centrally store and access difficult-to-retrieve data
  SQL or XQuery can be used to retrieve data
  Join XML data with its related relational data
Search Speed
  Search documents quickly and efficiently using the proven search optimization engine of a mature database
Optimize Existing Investments
  Use existing technology infrastructure and skills to store and manage both relational and XML data
DB2
DB2 9 pureXML
DB2 9: Hybrid Data Server (pureXML and Relational Storage)
[Diagram: a client application with the DB2 client talks to the DB2 9 server. SQL + SQL/XML arrive through the relational interface and XQuery through the XML interface; the hybrid DB2 engine works on DB2 storage that holds both relational and XML data.]
Relational and XML data are stored differently, but closely linked
Seamlessly join relational and XML data
DB2 9 Summary of pureXML Support
XML as a native data type
PureXML storage and indexing
XQuery and SQL/XML support
XML Schema Repository
Schema validation
Application Support
Java, C/C++, .NET, PHP, etc.
Visual Tooling, Control Center Enhancements
Annotated schema shredding
DB2 Utilities: Import/Export, HADR, etc.
and more
DB2 9: Secure and resilient infrastructure for a new breed of agile applications
pureXML Usage Scenarios
pureXML Usage Scenarios
1. Industry standards and data exchange applications
2. Web services, SOA data transport and message persistence
3. Business object / transaction record
4. Integration of diverse data sources
5. Forms and workflow processing
6. Document storage and querying
7. XML Feeds and Web 2.0 Syndication
8. Mapping XML in relational applications
9. Better data model for certain types of data
10. Rapid application prototyping and development
and many more!
1. Industry standards and data exchange applications
Banking: IFX, OFX, SWIFT, SPARCS, MISMO +++
Financial Markets: FIXML, MDDL, RIXML, FpML +++
Insurance: ACORD XML for P&C, Life +++
Chemical & Petroleum: Chemical eStandards, CyberSecurity, PDX Standard +++
Healthcare: HL7, DICOM, SNOMED, LOINC, SCRIPT +++
Life Sciences: MIAME, MAGE, LSID, HL7, DICOM, CDIS, LAB, ADaM +++
Retail: IXRetail, UCCNET, EAN-UCC, ePC Network +++
Electronics: PIPs, RNIF, Business Directory, Open Access Standards +++
Automotive: ebXML, other B2B Stds.
Telecommunications: eTOM, NGOSS, etc., Parlay Specification +++
Energy & Utilities: IEC Working Group 14, Multiple Standards (CIM, Multispeak)
Cross Industry: PDES/STEPml, SMPI Standards, RFID, DOD XML +++
Case 1: Industry Standard FIXML
Buying 1000 Shares of IBM Stock...
Old FIX Protocol:
8=FIX.4.2^9=251^35=D^49=AFUNDMGR^56=ABROKER^34=2^52=20030615-01:14:49^11=12345^1=111111^63=0^64=20030621^21=3^110=1000^111=50000^55=IBM^48=459200101^22=1^54=1^60=20030615-01:14:49^38=5000^40=1^44=15.75^15=USD^59=0^10=127
New FIXML Protocol: extensible, lower application development & maintenance cost
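To make the contrast concrete, here is a minimal Java sketch of what an application has to do with the old tag=value format: split the message into fields, then rely on outside knowledge of what each numeric tag means. The class name FixMessage and the shortened sample message are illustrative assumptions, and the ^ delimiter follows the slide's rendering (FIX on the wire uses the SOH control character).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of any FIX library.
class FixMessage {
    // Splits a tag=value message into an ordered tag -> value map.
    // '^' is the delimiter as printed on the slide (an assumption for this sketch).
    static Map<String, String> parse(String message) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : message.split("\\^")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                fields.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Shortened sample: 35=D is a new order, 55 the symbol, 38 the quantity.
        Map<String, String> order =
            parse("8=FIX.4.2^35=D^55=IBM^54=1^38=5000^44=15.75^15=USD");
        // Nothing in the data says what 55 or 38 mean; the reader must already know.
        System.out.println("Symbol (tag 55): " + order.get("55"));
        System.out.println("Quantity (tag 38): " + order.get("38"));
    }
}
```

The numeric tags carry no meaning on their own, which is the gap that FIXML's named elements and attributes are meant to close.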
Case 2: FpML (Derivative Trading)
Financial Products Markup Language
XML vocabulary for describing derivatives, their trades, and their risks
Derivatives: risk-shifting agreement, based on any tradable instrument (interest rate, stock, index, currency, ...)
OTC (over-the-counter) derivatives: privately negotiated, no standards, customized contracts
Large variety & rapid changes in derivative products & transactions
Not manageable in a relational database schema
Derivatives Trading before FpML:
Highly manual, i.e. error-prone and of poor timeliness
No automated system
Fear: system not able to handle variety & rapid change
Fear: too costly to build automated trading system
Fear: system is obsolete by the time it's implemented
Solution: XML-based trading system, automated, able to evolve rapidly -> FpML
Benefits of FpML
Integration of trading services across diverse systems and applications
HW & SW independence
Lower system implementation & maintenance cost
Higher trading volumes with higher accuracy
Increased business opportunities
Reduced operational risks
Case 3: ACORD Insurance Industry
ACORD = Agent-Company Organization for Research and
Development
Non-profit standards body for insurance data exchange
1970: Forms development for property & casualty insurance
1980: EDI standards for P&C industry
1996: Standards for Life Insurance
2000+:
XML-based standards for P&C, Life, Reinsurance
Data and application integration
Real-time information exchange for B2B and B2C
Why is ACORD moving to XML?
eBusiness and Internet-based business: connecting back offices, agents, brokers, consumers, etc.
Diversified & multi-channel distribution
Streamlined & simplified data transfer
Straight-through processing of applications & claims
Cross platform and cross-system data exchange
Integration of diverse data sources
Extensible: for hybrid & aggregate insurance products
DB2 pureXML Quick Start Samples
Each Quick Start Sample provides instructions for:
  Creating a sample database
  Registering standard industry schemas
  Inserting and validating sample XML messages
  Building XML indexes and querying the stored data
Sample packages are available right now at http://www.alphaworks.ibm.com/tech/purexml/download
  Acord.zip: Insurance Industry
  Cdisc.zip: Clinical Data
  Fixml.zip: Financial Trading
  FpML.zip: Financial Derivatives
  Mismo.zip: Mortgages
  ZosAcord.zip: Financial Trading on z/OS platform
  ZosFixml.zip: Financial Derivatives on z/OS platform
And many more to come soon...
2. XML - the foundation for SOA and Web Services
XML is the transport for messages and data in SOA
XML DBs can provide SOA data services
[Diagram: Service Requestor and Service Provider exchanging XML]
SOA messages/data often need to be persisted
Temporary Cache
Audit Logs
Compliance Records
Insight
3. XML Transaction Records / Business Objects
Transactions being conducted as XML
Within SOA environments
Between value chain members
Need to store the transaction record and query later
Many business objects being represented as XML
Purchase orders
Invoices
Insurance policies
Need to store XML business objects intact
4. Integration of Diverse Data Sources
XML database as integration hub
  XML schema flexibility: integrate data with differing formats
  XQuery language: excellent for joining different data sources
Integration using SOA environments
  Services Oriented Integration (SOI)
[Diagram: DB2 9 as the hub connecting applications, services, employee/customer portals, suppliers, distributors, partners, and agencies]
5. Forms and their processing
Forms exist for virtually all types of goods and services
  Insurance applications, bank loans, tax filings, ...
Paper forms being replaced by electronic forms
Online forms are becoming XML based, e.g. XForms
Store entire form (XML document) as a whole in XML database rather than shred into relational columns
[Diagram: a broker's application form stored in DB2 9, serving status and audit requests]
6. Document storage and querying
Document-centric XML mostly has unstructured data
  Can contain some structured elements
  E.g. Legal Contracts, Manuals, etc.
Application managed document processing
[Diagram: Contract Performance Management on DB2 9. Contract content, structured and unstructured: dates, prices, liabilities, milestones, quantities, certificates. Procurement / Sales / Legal / Finance create, update, and manage contracts; business executives and analysts search, report, and analyze.]
7. XML Feeds and Syndication
Syndication is the heartbeat of Web 2.0
RSS/ATOM feeds are encapsulated as XML
Use an XML database for serving and storing feeds
E.g. Stock ticker feeds, inventory feeds, etc.
[Diagram: DB2 9 behind a web server acting as ATOM/RSS provider, consumed by a web server acting as ATOM/RSS reader]
8. Mapping XML for relational applications
Shredding may be ok if:
  Simple data / schema not complicated
  XML is merely a transport, i.e. XML structure not relevant
  Existing SQL apps have only relational APIs, e.g. BI apps, reporting tools
DB2 9 Annotated Schema Shredding
[Diagram: an XML document with the values Acme and 12.99 is shredded by DB2 9 into a relational row]
ID   Name  Price
129  Acme  12.99
Insight
9. XML as a better data model
XML provides a better data model for many new apps
  Flexibility, schema versatility, hierarchical nature
Semi-structured or unstructured data
  E.g. healthcare records, biological data, contracts, insurance claims, etc.
Inherently hierarchical, nested or complex data
  E.g. manuals, books, catalogs, bills of materials, land records, etc.
Data with changing or evolving schemas
  E.g. forms, changing industry standard documents, new product versions, etc.
Data with Null, Multiple or Unknown values
  E.g. phone numbers (home, office, mobile) in patient records, etc.
A pureXML database is a natural choice for XML data
10. pureXML for Rapid Application Prototyping and Development
Represent multiple elements as a single object, e.g. Purchase Order
Relational:
  Many tables: Customer, Product, Shipping, ...
  Normalization
  Foreign key relationships
  Insert involves many columns
  Complex queries with joins
  Conform to column definition
XML:
  Single Purchase Order column
  Easily access individual elements
Write less code with pureXML!
Who uses DB2 pureXML?
Profile: U.S. State Tax office
Challenge: Need to store/manage thousands of different tax forms in a database, changing every year. Today they use 640 generic columns in RDBMS.
  Have 3600 different tax forms: Schema Diversity
  Typically not every field in a form is used: Sparse Data
  Many forms change every year: Schema Evolution
  A case for XML!
Status: Chose DB2 pureXML
  Much simpler storage and processing of tax forms in XML format
  Handles schema diversity, schema changes, and sparsity
  On AIX: reduces cost and dependency on mainframe/Cobol skills
Typical Current Usage: Relational Database
Each form has a different set of fields (schema)
Solution 1: Thousands of tables, i.e. one per form? Considered not feasible
  Too many tables to maintain
  Relational schema would deteriorate over time
  Not sufficiently flexible and extensible
Solution 2: Single table whose rows can store any form
  100s of generic columns. Ouch!
Generic columns -> XML
Current relational storage: inefficient, anonymous columns, requires complex mappings in the application

col1  col2  col3      col4    col5  ...  col1000
134   NULL  11/23/05  NULL    NULL  ...  NULL
NULL  276   NULL      NULL    Yes   ...  NULL
12    NULL  NULL      99.99   NULL  ...  NULL
NULL  NULL  NULL      123.23  NULL  ...  No

XML: Avoids sparsity. Proper data labeling. 2 columns, not 1000. Transformable. Extensible. Simplifies mapping.
New XML format: only the populated fields of a row are stored, as labeled elements (for the first row: 134 and 11/23/05)
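As a rough illustration of why the XML form avoids sparsity, the following Java sketch turns one sparse row into a document that contains only its populated fields. The class name SparseRowToXml and the field names (amount, spouseIncome, filingDate, deduction) are hypothetical stand-ins, since the slide's real element names are not recoverable.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: class and field names are assumptions, not from the slides.
class SparseRowToXml {
    // Emits one labeled element per populated field and skips NULL fields entirely.
    static String toXml(String rootName, Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<" + rootName + ">");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            if (f.getValue() != null) {
                xml.append('<').append(f.getKey()).append('>')
                   .append(escape(f.getValue()))
                   .append("</").append(f.getKey()).append('>');
            }
        }
        return xml.append("</").append(rootName).append('>').toString();
    }

    // Minimal escaping of XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("amount", "134");        // col1 in the generic-column layout
        row.put("spouseIncome", null);   // col2: NULL, so not emitted
        row.put("filingDate", "11/23/05");
        row.put("deduction", null);
        System.out.println(toXml("form", row));
    }
}
```

A row with two populated values yields a document with two elements, regardless of how many generic columns the relational layout would need.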
Storebrand's Service Oriented Architecture (SOA)
[Diagram: the Storebrand Integration Architecture connects business lines (Life Insurance, YTP Pensions, ITP Pensions, Investments, Banking, Mortgage), shared services (Customer, Business Services, Archive, Data Warehouse, Process Management), and channels (Internet, WAP, Financial adviser, Call Center); every connection carries XML.]
Business goals: improve customer focus, manage costs, access data 24/7, speed time to market for new products, increase product customization, combine products into packages
Profile: NA Bank
  Moving to a Service Oriented Architecture
  Creating a flexible, on demand information architecture
Challenge: Send data from operational system as XML message and store in repository. Downstream applications retrieve data as needed via web services.
Requirements:
1. Load 500,000 XML documents per day
  Achieved in less than 1 hour on DB2 9 pureXML
  One major vendor unable to even load all the data
2. Queries:
  Retrieve XML doc for any specific trade (by trade number)
  Retrieve all trades for a counterparty
  Retrieve all trades by trade create time
  Retrieve all trades by maturity date range
  Retrieve trades for a given acquire day range, and trade number range
  All transactional queries completed sub-second per record
Status: Held PoC with DB2 pureXML
  Met all criteria
  Performance was significantly faster than customer expected
  Win in competitive environment
Some of our partners
Profile: Nextance, IBM business partner, provides XML-based enterprise contract management software.
Challenge: Need to store/manage contract fragments and allow users to compose contract documents from these.
Status:
  Prior experience with XML in other database
  Successful migration to DB2 9 pureXML
  Major production deployments using DB2/XML under way
  Simplified application development
  Better schema evolution
  Better scalability
  Very enthusiastic about DB2 9
"It is exciting to see IBM evolve to embrace XML."
-- Tiffany Riley, VP
"Approximately 90% of the valuable data in contractual agreements are unstructured. XML technology is essential in tapping into this hidden reservoir of information, enabling companies to actively manage and maximize customer, supplier and partner relationships."
-- Nextance press release, 11/05
Profile: Skytide, IBM business partner providing OLAP-style analytics for XML.
Challenge: Need to integrate structured, semi-structured and unstructured data for business analytics.
Status:
  Enabled for Viper XML
  Schema flexibility
  First-class support for analytics over XML data
  Offering free Skytide / DB2 PoC
"The combination of industrial strength database management for native XML by DB2 Viper and Skytide's ability to provide direct multidimensional analysis of XML data, removes two key barriers to widespread adoption of XML and the transformation of this data into actionable business information."
-- Joseph Rozenfeld, VP of Skytide
"With DB2 9, we've seen a 5 to 10 times performance improvement over a non-database environment."
-- Keith Feingold, CEO, Skytide
Profile: Inxight Software, IBM business partner providing entity and fact extraction from text data.
Challenge: Need to store large amounts of XML data at a very high rate, and it needs to be immediately query-ready.
Status:
  Enabling SDX for Viper XML
  Very impressed with DB2's XML performance, especially insert and indexing throughput
  Estimates a 10x development productivity improvement by avoiding shredding and schema mapping
"Inxight has found IBM DB2's pureXML to be highly performant and an excellent complement to our extreme high-throughput SmartDiscovery Extraction Server (SDX). [...] Viper allows us to quickly and efficiently store information in a rich XML representation. Inxight's integration process with DB2/XML has been a smooth one, requiring minimal effort."
-- Renzo Lazzarato, VP of Advanced Development, Inxight Software
More about DB2 9 pureXML
XML Databases
XML-enabled Databases
  The core data model is not XML (but e.g. relational)
  Mapping between XML data model and DB's data model is required, or XML is stored as text
  E.g.: DB2 XML Extender (V7, V8)
pureXML Databases
  Use the hierarchical XML data model to store and process XML internally
  No mapping, no storage as text
  Storage format = processing format
  E.g.: Viper
XML-Enabled Databases: Two Main Options
CLOB/Varchar: XML documents are stored in a varchar or CLOB column; selected elements/attributes are extracted into side tables (regular tables for faster lookup)
Shredding (decomposition): a shredder with a fixed mapping decomposes XML documents into regular relational tables
Problems of XML-enabled Databases
CLOB storage:
  Query evaluation & sub-document level access requires costly XML parsing: too slow!
Shredding:
  Mapping from XML to relational often too complex
  Often requires dozens or hundreds of tables
  Complex multi-way joins to reconstruct documents
  XML schema changes break the mapping: no schema flexibility!
  For example: changing an element from single- to multi-occurrence requires normalization of relational schema & data
Shredding: A simple case
XML (element values):
10 CHRISTINE SMITH 408-463-4963 52750.00
27 MICHAEL THOMPSON 406-463-1234 41250.00

Department
DEPTID  DEPTNAME
15      Sales

Employee
DEPTID  EMPNO  FIRSTNAME  LASTNAME  PHONE         SALARY
15      27     MICHAEL    THOMPSON  406-463-1234  41250
15      10     CHRISTINE  SMITH     408-463-4963  52750
Shredding: A schema change
Employees are now allowed to have multiple phone numbers
XML (element values):
10 CHRISTINE SMITH 408-463-4963 415-010-1234 52750.00
27 MICHAEL THOMPSON 406-463-1234 41250.00

Department
DEPTID  DEPTNAME
15      Sales

Employee
DEPTID  EMPNO  FIRSTNAME  LASTNAME  PHONE         SALARY
15      27     MICHAEL    THOMPSON  406-463-1234  41250
15      10     CHRISTINE  SMITH     408-463-4963  52750

Phone
EMPNO  PHONE
27     406-463-1234
10     415-010-1234
10     408-463-4963

Requires:
  Normalization of existing data!
  Modification of the mapping
  Change of applications
Costly!
XML A First Class Citizen
Data Definition
  create table dept(deptID char(8), deptdoc xml);
Insert
  insert into dept(deptID, deptdoc) values (?,?)
Retrieve
  select deptdoc from dept where deptID = ?
Query
  select deptID, xmlquery('$d/dept/name' passing deptdoc as "d") from dept where deptID = 'PR27';
SQL as the primary language
XML data is stored in XML-typed columns in tables
create table dept (deptID char(8), ..., deptdoc xml);
XML is stored in a parsed hierarchical format
Relational columns are stored in relational format
[Diagram: Native XML Storage in DB2 storage; a dept row with deptID PR27 and its deptdoc XML document]
XML Storage: Regions Index
Regions index
  System defined, default component of XML storage layer
  Maps nodeIDs to regions & pages
  Allows intelligent prefetching
Reuse RDBMS features
  Pages
  Buffer pools
  Tablespaces
  Prefetching
  Locking
[Diagram: regions index pointing to pages]
What is SQL/XML?
Extension of the SQL language standard (ANSI/ISO)
XML Data Type
XML publishing functions (relational data -> XML)
Conversion functions: XML type <-> char/varchar/clob
Integration of SQL and XQuery languages
Other functions, e.g. validation, parsing, serialization
SQL/XML: Use SQL to produce XML from Relational Data
Start with:
firstname  lastname  dept
CHRISTINE  SMITH     A00
VINCENZO   BARELLI   A00
MICHAEL    JOHNSON   B01
SEAN       LEE       A00

SELECT
  XMLELEMENT (NAME "Department",
    XMLATTRIBUTES (e.dept AS "name"),
    XMLAGG (XMLELEMENT (NAME "emp", e.firstname))
  ) AS "dept_list"
FROM employee e
WHERE ...
GROUP BY e.dept;

Produce:
dept_list
<Department name="A00"><emp>CHRISTINE</emp><emp>VINCENZO</emp><emp>SEAN</emp></Department>
<Department name="B01"><emp>MICHAEL</emp></Department>
XMLTABLE: Return XML in tabular format
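Input document (structure as implied by the path expressions below):
<dept>
  <employee id="901">
    <name><first>John</first><last>Doe</last></name>
    <office>344</office>
  </employee>
  <employee id="902">
    <name><first>Peter</first><last>Pan</last></name>
    <office>216</office>
  </employee>
</dept>

SELECT X.* FROM dept,
  XMLTABLE ('$d/dept/employee' passing deptdoc as "d"
    COLUMNS
      empID     INTEGER     PATH '@id',
      firstname VARCHAR(30) PATH 'name/first',
      lastname  VARCHAR(30) PATH 'name/last',
      office    INTEGER     PATH 'office') AS X

Result:
empID  firstname  lastname  office
901    John       Doe       344
902    Peter      Pan       216

The XMLTABLE call is SQL/XML; the quoted path expressions are XQuery.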
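The row-and-column mapping that XMLTABLE performs can be tried outside the database. The sketch below uses the JDK's XPath API, not DB2, and applies the same row path and column paths to an in-memory document. The class name XmlTableSketch and the sample document are assumptions built from the paths and result values on this slide.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Client-side analogy to XMLTABLE; this does not call DB2.
class XmlTableSketch {
    // Sample document assumed from the slide's paths and result table.
    static final String DEPT_DOC =
        "<dept>"
      + "<employee id=\"901\"><name><first>John</first><last>Doe</last></name>"
      + "<office>344</office></employee>"
      + "<employee id=\"902\"><name><first>Peter</first><last>Pan</last></name>"
      + "<office>216</office></employee>"
      + "</dept>";

    // One String[] per employee: {empID, firstname, lastname, office}.
    static List<String[]> rows(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Row-generating expression, like '$d/dept/employee' in XMLTABLE.
            NodeList employees = (NodeList) xpath.evaluate(
                "/dept/employee", new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
            List<String[]> result = new ArrayList<>();
            for (int i = 0; i < employees.getLength(); i++) {
                Node e = employees.item(i);
                // Column paths are evaluated relative to each row node.
                result.add(new String[] {
                    xpath.evaluate("@id", e),
                    xpath.evaluate("name/first", e),
                    xpath.evaluate("name/last", e),
                    xpath.evaluate("office", e)
                });
            }
            return result;
        } catch (XPathExpressionException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        for (String[] row : rows(DEPT_DOC)) {
            System.out.println(String.join(" ", row));
        }
    }
}
```

In DB2 the same mapping runs inside the engine, so the result can be joined with relational tables directly; the Java version only shows how each column path is resolved against a row node.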
XQuery: The FLWOR Expression
FOR: iterates through a sequence, binds a variable to items
LET: binds a variable to a sequence
WHERE: eliminates items of the iteration
ORDER: reorders items of the iteration
RETURN: constructs query results

XQUERY for $movie in xmlcolumn('movies.doc')
  let $actors := $movie//actor
  where $movie/duration > 90
  order by $movie/@year
  return {$movie/title, $actors}

Result (element values): Chicago; Renee Zellweger, Richard Gere, Catherine Zeta-Jones
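For readers coming from Java, the FLWOR clauses correspond closely to a filter/sort/map pipeline. The sketch below is an analogy over in-memory objects rather than XQuery itself. The Movie class, the sample year and duration values, and the two extra films are hypothetical; only the title Chicago and its actors come from the slide.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Analogy only: mirrors for/let/where/order by/return with Java streams.
class FlworSketch {
    static final class Movie {
        final String title;
        final int year;
        final int duration; // minutes
        final List<String> actors;

        Movie(String title, int year, int duration, List<String> actors) {
            this.title = title;
            this.year = year;
            this.duration = duration;
            this.actors = actors;
        }
    }

    // Hypothetical sample data; year and duration values are assumptions.
    static List<Movie> sampleMovies() {
        return Arrays.asList(
            new Movie("Chicago", 2002, 113,
                Arrays.asList("Renee Zellweger", "Richard Gere", "Catherine Zeta-Jones")),
            new Movie("Short Film", 2005, 45, Arrays.asList("Actor B")),
            new Movie("Older Feature", 1999, 120, Arrays.asList("Actor A")));
    }

    static List<String> query(List<Movie> movies) {
        return movies.stream()                                      // for $movie in ...
            .filter(m -> m.duration > 90)                           // where $movie/duration > 90
            .sorted(Comparator.comparingInt((Movie m) -> m.year))   // order by $movie/@year
            .map(m -> m.title + ": " + String.join(", ", m.actors)) // let $actors, return
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        query(sampleMovies()).forEach(System.out::println);
    }
}
```

The difference that matters in practice is where the work happens: the XQuery version runs inside DB2 against stored documents, while this pipeline needs every object in application memory first.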
Mixing SQL and XQuery: Calling SQL from XQuery
Identifying XML data by column:
  for $d in xmlcolumn('DEPT.DEPTDOC') ...
  always operates on the entire column!
Identifying XML data via a select statement (leverage predicates/indexes on relational columns):
  for $d in sqlquery("select deptdoc from dept") ...
  for $d in sqlquery("select deptdoc from dept where deptID = 'PR27'") ...
  for $d in sqlquery("select deptdoc from dept where deptID LIKE 'PR%'") ...
  for $d in sqlquery("select dept.deptdoc from dept, unit
      where dept.deptID = unit.ID and unit.headcount > 200") ...
XML Indexing: Examples
create table dept(deptID char(8) primary key, deptdoc xml);

create unique index idx2 on dept(deptdoc) generate key
  using xmlpattern '/dept/employee/@id' as sql double;
create index idx3 on dept(deptdoc) generate key
  using xmlpattern '/dept/employee/name' as sql varchar(35);

Sample document (element values): employee John Doe, 408 555 1212, 344; employee Peter Pan, 408 555 9918, 216

Other patterns:
  xmlpattern '//name' as sql varchar(35);   (index on ALL name elements)
  xmlpattern '//@*' as sql double;   (index on ALL numeric attributes)
  xmlpattern '/dept/employee//text()' as sql varchar(128);   (all text nodes under employee)
  xmlpattern 'declare namespace m="http://www.myself.com/"; /m:dept/m:employee/m:name' as sql varchar(45);
XML Schema Repository (XSR)
XSR = New DB2 catalog tables + Command Line and Stored Procedure Interfaces
Catalog: SYSCAT.XSROBJECTS, SYSCAT.XSROBJECTCOMPONENTS, SYSCAT.XSROBJECTAUTH, SYSCAT.XSROBJECTHIERARCHIES
Stored procedures:
  XSR_REGISTER (dbschema, identifier, schemalocation, xsd, docproperty)
  XSR_ADDSCHEMADOC (dbschema, identifier, schemalocation, xsd, docproperty)
  XSR_COMPLETE (dbschema, identifier, schemaproperties, isusedforshred)
The XSR:
  Stores XML Schema documents, assigns identifiers
  Keeps track of relationships between schema documents
  Keeps precompiled schema grammars
  Provides mapping from schema location to schema identifier
Validation using XML Schemas
create table dept(deptID char(8), deptdoc xml);
Validation is optional, and per document (per row):
No validation:
  insert into dept values (?, ?)
With validation:
  insert into dept values (?, xmlvalidate(?))
Can override the schema location found in the document and reference a schema from DB2's schema repository:
Schema referenced by identifier:
  insert into dept values (?, xmlvalidate(? according to xmlschema id dept.schema1))
Schema referenced by namespace URI:
  insert into dept values (?, xmlvalidate(? according to xmlschema uri 'http://my.dept.com'))
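Because validation is optional and decided per document, an application may also want to check a document before sending it to the database. The following sketch uses the JDK's javax.xml.validation API on the client side; it is not DB2's XMLVALIDATE. The class name ValidateSketch and the tiny dept schema are assumptions made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Client-side validation sketch; schema and class name are illustrative assumptions.
class ValidateSketch {
    // Hypothetical schema: a dept element containing exactly one name element.
    static final String DEPT_XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"dept\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"name\" type=\"xs:string\"/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Returns true when the document is well-formed and valid against DEPT_XSD.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(DEPT_XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed or invalid document
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<dept><name>Sales</name></dept>"));
        System.out.println(isValid("<dept><title>Sales</title></dept>"));
    }
}
```

Validating in the database instead keeps the schema in one place (the XSR) and lets each insert pick the schema by identifier or namespace URI, as the statements above show.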
API Support for XML in V9.1
New XML type support added to APIs including:
JDBC, SQLJ, .NET, CLI, Embedded SQL, PHP
SQL/XML supported by all APIs
XQuery supported by all APIs
  Result sequence will be treated as a result set
  Each item will be treated as a row
Language Bindings
Java Example
// Uses java.sql.PreparedStatement, java.sql.ResultSet, java.io.InputStream;
// con is an open java.sql.Connection.
PreparedStatement stmt1 = con.prepareStatement(
    "select deptdoc from dept where id = '001'");
ResultSet rs = stmt1.executeQuery();
rs.next();
// The three reads below are alternatives for the same column.
// Get the first returned document as a string
String xmlString = rs.getString(1);
// As a binary stream
InputStream is = rs.getBinaryStream(1);
// As an XML object
com.ibm.db2.jcc.DB2Xml xml = (com.ibm.db2.jcc.DB2Xml) rs.getObject(1);
rs.close();
stmt1.close();
Utilities & Tools for XML in DB2
XML Import & Export
Runstats collects stats for XML data
XML type support in SQL stored procedures
XML columns supported by HADR
XML columns supported by backup/restore
XQuery Builder GUI
GUI for XML Schema annotations for shredding
XML Index Definition GUI
Control Center extensions for XML
XML Schema Flexibility in DB2 9
Document validation for zero, one, or many schemas per XML column:
(a) No schema
(b) One schema
(c) Schema V1 & Schema V2
(d) Documents w/ and w/o schema
(e) Any mix you want!
Most Databases only support (a) and (b). DB2 9 allows (a) through (e).
pureXML Insert vs. Shredding Performance
[Chart: XML insert vs. shred performance in seconds for three approaches: DB2 XML Extender shred, DB2 new shred, and DB2 pureXML insert. The percentage labels 14.2% and 3.4% appear on the chart.]
Shredding: fixed mapping into regular relational tables; documents shredded to 87 columns in 12 tables.
pureXML insert: 1 XML column, 1 table, with an XML index.
pureXML vs CLOB Query Performance
Query 2: XML vs CLOB column (10 concurrent users)
[Chart: query response time (elapsed time in seconds, lower is better) against the number of documents, for DB2 pureXML and CLOB storage]
Query 2: Retrieves one document based on a single search condition. No index is used.
The larger the table, the bigger the performance benefit of pureXML!
Constantly High XML Insert Throughput
AIX 5.3, P-Series P5-560Q, 8 CPUs, TotalStorage DS8100
http://www.ibm.com/developerworks/db2/library/techarticle/dm-0606schiefer
[Chart: XML Document Insert Rate over Time. Y-axis: inserts per second (0 to 6000). X-axis: number of documents already inserted, in millions (0 to 30).]
100 concurrent clients inserting FIXML order documents, with XML index building, at ~30GB/hour.
Challenges Solved by DB2 V9 with pureXML
Data Query
  Need the ability to query any element in the XML document -> Native storage defines each field
  Need to quickly retrieve sets of data -> SQL or XQuery to retrieve sets of data
Shredding
  Need to remove the complexity around shredding -> No more shredding
Standard XML Technology
  XQuery and XPath -> Easy to learn; use same technology for application and database
Flexibility
  Need the ability to change any data element at any time -> Schema evolution allows multiple schemas
Summary
XML is widely accepted and increasingly used in customer applications and solutions
  Integration, SOA, Document Management
  Driving business processes and workflows
XML based applications need robust, scalable enterprise class database capabilities
Structure clash between XML and relational data models
DB2 Viper provides pureXML support in the database engine
Resources
DB2 pureXML Enablement web site: http://www-03.ibm.com/developerworks/wikis/display/db2xml/Home
News and Success stories
Books and magazine issues
Technical papers and articles
Webcasts and demonstrations
Free software downloads
Education
Emails:
[email protected] : Thuan Bui
[email protected] : Mallarswami Nonvinkere