
9th International Conference on Information Technology (ICIT'06), IEEE, Bhubaneswar, India, December 18-21, 2006

A generic prototype for storing and querying XML documents in RDBMS using model mapping methods

Saeed Hassan Hisbani

PAF-Karachi Institute of Economics & Technology

Korangi Creek, Karachi-75190, Sindh, Pakistan

Abstract

This paper proposes a generic prototype for storing and querying XML data in any RDBMS using model mapping methods. As an example, the prototype uses SUCXENT, a recently published model mapping method, together with freely available technologies: MySQL, PhpMyAdmin and PHP classes. Because the solution is generic, any model mapping strategy can be used to translate XML documents into a relational schema, any RDBMS can serve as the backend, and any programming language can be used for processing and the front end.

Because the prototype uses SUCXENT, it is expected to be efficient for query processing, especially recursive XML queries, and for updating. An XQuery query processor is used instead of XPath for querying XML data, since XQuery is becoming the standard XML query language.

1. Introduction & Problem definition

XML (eXtensible Markup Language) is emerging as the universal standard format for data exchange, and XML data management is becoming increasingly significant. The importance of relational data cannot be overlooked, however: a significant amount of today's business data, including e-commerce transactions, is stored in relational databases, so an essential part of many applications that use XML as a data exchange format is transferring data between relational databases and XML documents. There is therefore a need to manage XML data and relational data together, seamlessly and efficiently. Native XML databases usually have limited support for relational data. XML-enabled databases from vendors such as IBM, Oracle and Microsoft have mature and proven techniques for relational data processing, but their XML extensions are not yet mature. In these vendor-specific RDBMSs, database administrators (DBAs) have to specify how XML data is mapped into the system, or the XML storage is tailored to one particular system and hard-coded to default mappings on behalf of the users, so the mappings cannot be used with any other relational backend. Solving these problems requires a generic prototype that acts as a mediator for storing and querying XML data in any RDBMS, independent of any particular vendor.

Some existing schema-dependent solutions [1], [2] have the drawback that even a small change in the logical structure of the XML documents affects the database schema, and further problems occur during updating because their strategies depend on the DTD or XML Schema. These proposals cannot provide a generic solution. The middleware in [3] uses mapping strategies such as Edge [4] and XRel [5], which are slower and less efficient than SUCXENT [6], the strategy used here. Furthermore, these solutions have no support for XQuery, which is becoming the standard XML query language, and they do not support recursive XML queries as efficiently as the proposed solution. To address these issues, a generic prototype is proposed that acts as an efficient mediator between XML and relational data. As an example, the prototype uses the model mapping method SUCXENT [6] to map XML documents into fixed relations, and freely available technologies: MySQL as the RDBMS, and PhpMyAdmin and PHP classes for the front end, processing and query translation. Because the solution is generic, any model mapping strategy, any RDBMS and any programming language can be used.

2. SUCXENT (Schema UnConscious XML ENabled sysTem)

SUCXENT [6] is a model-mapping, path-oriented approach. It is chosen because it significantly outperforms other state-of-the-art model mapping approaches such as Edge [4], XRel [5] and XParent [7] in storage size, insertion time, extraction time and path expression queries. The SUCXENT schema is as follows:

Document (DocId, Name)
Path (PathId, PathExp, CPathId)
PathValue (DocId, PathId, LeafOrder, CPathId, BranchOrder, BranchOrderSum, LeafValue)
TextContent (DocId, PathId, LeafOrder, CPathId, BranchOrder, BranchOrderSum, Text)
DocumentRValue (DocId, Level, RValue)

The SUCXENT work [6] also investigates the performance of recursive XML queries. For recursive queries, SUCXENT outperforms existing schema-oblivious techniques, such as XParent, by up to 15 times and Shared-Inlining [8], a schema-conscious approach, by up to 8 times. In addition, SUCXENT can reconstruct shredded documents up to 2 times faster than Shared-Inlining [8]. The main reasons SUCXENT performs better than existing approaches are as follows.

1. Significantly lower storage size and, consequently, lower I/O cost associated with query processing.

2. Fewer joins in the corresponding SQL queries.

For further details please refer to [6].
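As a rough illustration of how a path-oriented mapping stores a document in fixed relations, the following Python sketch creates the five relations listed above and shreds the leaf elements of a small XML document into Path and PathValue. It is a simplified sketch, not the SUCXENT algorithm from [6]: SQLite stands in for MySQL, the column types, the "/"-separated path format, and the helper names shred and leaf_values are assumptions, and CPathId, BranchOrder, BranchOrderSum and RValue are left unset because their computation is defined in [6] and not reproduced here.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Relation and column names follow the schema listed above; column types are assumptions.
SCHEMA = """
CREATE TABLE Document (DocId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Path (PathId INTEGER PRIMARY KEY, PathExp TEXT UNIQUE, CPathId INTEGER);
CREATE TABLE PathValue (DocId INTEGER, PathId INTEGER, LeafOrder INTEGER, CPathId INTEGER,
                        BranchOrder INTEGER, BranchOrderSum INTEGER, LeafValue TEXT);
CREATE TABLE TextContent (DocId INTEGER, PathId INTEGER, LeafOrder INTEGER, CPathId INTEGER,
                          BranchOrder INTEGER, BranchOrderSum INTEGER, "Text" TEXT);
CREATE TABLE DocumentRValue (DocId INTEGER, "Level" INTEGER, RValue INTEGER);
"""


def shred(conn, name, xml_text):
    """Store every leaf element of xml_text as one PathValue row and return the new DocId.

    Simplification: attributes and mixed content are ignored, and the
    order-related columns (BranchOrder, BranchOrderSum, CPathId) stay NULL.
    """
    doc_id = conn.execute("INSERT INTO Document (Name) VALUES (?)", (name,)).lastrowid
    leaf_order = 0

    def walk(elem, prefix):
        nonlocal leaf_order
        path = prefix + "/" + elem.tag  # assumed path format, e.g. /catalog/book/title
        children = list(elem)
        if children:
            for child in children:
                walk(child, path)
            return
        # Each distinct root-to-leaf path is stored once in Path and reused by PathId.
        conn.execute("INSERT OR IGNORE INTO Path (PathExp) VALUES (?)", (path,))
        path_id = conn.execute(
            "SELECT PathId FROM Path WHERE PathExp = ?", (path,)
        ).fetchone()[0]
        leaf_order += 1
        conn.execute(
            "INSERT INTO PathValue (DocId, PathId, LeafOrder, LeafValue) VALUES (?, ?, ?, ?)",
            (doc_id, path_id, leaf_order, (elem.text or "").strip()),
        )

    walk(ET.fromstring(xml_text), "")
    return doc_id


def leaf_values(conn, path_exp):
    """Answer a simple path expression by looking up the path and joining to its values."""
    return conn.execute(
        "SELECT d.Name, v.LeafValue FROM PathValue v "
        "JOIN Path p ON p.PathId = v.PathId "
        "JOIN Document d ON d.DocId = v.DocId "
        "WHERE p.PathExp = ? ORDER BY v.DocId, v.LeafOrder",
        (path_exp,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
doc_id = shred(
    conn,
    "books.xml",
    "<catalog><book><title>XML</title><price>10</price></book>"
    "<book><title>SQL</title><price>12</price></book></catalog>",
)
titles = leaf_values(conn, "/catalog/book/title")
print(titles)
```

In this sketch the relational schema stays the same whatever the structure of the XML document, and a path expression is answered by matching one row in Path rather than joining one table per element level.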

3. Prototype’s architecture and its implementation

The main idea for the implementation of the prototype is taken from [3], with modifications and enhancements such as XQuery support and the use of the most efficient available mapping strategy, SUCXENT, instead of a slower one. MySQL, a relatively simple database system, is chosen because it is free of cost, open source and easily available. The implementation adds collection support to the MySQL database. A collection is a set of XML documents stored in a fixed schema of tables; in this case, the tables are those proposed by SUCXENT [6]. The MySQL database model itself does not change, but from the user's point of view a database contains not only tables but also collections. Users cannot modify or access the tables of a collection directly; they know only the names and types of the existing collections. They can create, drop or browse collections, and they can insert, browse or delete XML documents in collections. Independent PHP classes can be created and embedded into PhpMyAdmin, a web-based interface between MySQL and its users that provides all database operations through a user-friendly web interface. After the implementation, the database becomes a repository for both XML documents and relational data. The prototype’s detailed architecture is shown in Fig. 3.1.
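The collection abstraction described above can be sketched as a thin layer over an unmodified relational database. The following Python example is an illustrative sketch only, not the prototype's PHP code: SQLite stands in for MySQL, and the class name CollectionManager, the catalog table xml_collections, the table-name prefix convention and the column types are all assumptions made for this example. Document shredding is omitted; a document is only registered in the collection's Document table.

```python
import sqlite3

# Fixed relations created for every collection; names follow the SUCXENT schema,
# column types are assumptions.
ORDER_COLUMNS = ("DocId INTEGER, PathId INTEGER, LeafOrder INTEGER, CPathId INTEGER, "
                 "BranchOrder INTEGER, BranchOrderSum INTEGER, ")
RELATIONS = {
    "Document": "DocId INTEGER PRIMARY KEY, Name TEXT",
    "Path": "PathId INTEGER PRIMARY KEY, PathExp TEXT, CPathId INTEGER",
    "PathValue": ORDER_COLUMNS + "LeafValue TEXT",
    "TextContent": ORDER_COLUMNS + '"Text" TEXT',
    "DocumentRValue": 'DocId INTEGER, "Level" INTEGER, RValue INTEGER',
}


class CollectionManager:
    """Hypothetical helper: users see named collections, never the underlying tables."""

    def __init__(self, conn):
        self.conn = conn
        # Catalog of existing collection names and types (an assumed design).
        conn.execute(
            "CREATE TABLE IF NOT EXISTS xml_collections (name TEXT PRIMARY KEY, type TEXT)"
        )

    def _table(self, collection, relation):
        # Only registered collection names are ever interpolated into SQL.
        found = self.conn.execute(
            "SELECT 1 FROM xml_collections WHERE name = ?", (collection,)
        ).fetchone()
        if found is None:
            raise KeyError(f"unknown collection: {collection}")
        return f'"{collection}__{relation}"'

    def create(self, name, ctype="SUCXENT"):
        if not name.isidentifier():
            raise ValueError("collection names must be simple identifiers")
        self.conn.execute("INSERT INTO xml_collections VALUES (?, ?)", (name, ctype))
        for relation, columns in RELATIONS.items():
            self.conn.execute(f"CREATE TABLE {self._table(name, relation)} ({columns})")

    def drop(self, name):
        for relation in RELATIONS:
            self.conn.execute(f"DROP TABLE {self._table(name, relation)}")
        self.conn.execute("DELETE FROM xml_collections WHERE name = ?", (name,))

    def collections(self):
        return self.conn.execute(
            "SELECT name, type FROM xml_collections ORDER BY name"
        ).fetchall()

    def insert_document(self, collection, doc_name):
        table = self._table(collection, "Document")
        return self.conn.execute(
            f"INSERT INTO {table} (Name) VALUES (?)", (doc_name,)
        ).lastrowid

    def browse(self, collection):
        table = self._table(collection, "Document")
        return self.conn.execute(f"SELECT DocId, Name FROM {table} ORDER BY DocId").fetchall()

    def delete_document(self, collection, doc_id):
        # Path holds no DocId, so only the per-document relations are cleaned.
        for relation in ("Document", "PathValue", "TextContent", "DocumentRValue"):
            table = self._table(collection, relation)
            self.conn.execute(f"DELETE FROM {table} WHERE DocId = ?", (doc_id,))


manager = CollectionManager(sqlite3.connect(":memory:"))
manager.create("invoices")
first_id = manager.insert_document("invoices", "invoice-001.xml")
print(manager.collections(), manager.browse("invoices"))
```

Because the layer only issues ordinary CREATE, DROP, INSERT, SELECT and DELETE statements, the database engine itself needs no modification, which is the property the prototype relies on to remain independent of the backend.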

4. Conclusions

This prototype can be used as an affordable and quick solution until XML data processing matures. The key to this approach is storing XML documents in a relational database, providing a user interface for XML manipulation, and adding an XQuery query processor for XML querying. The prototype implemented in this study can be used with any other database management system, as it does not require any modification to the DBMS itself. It provides collections, or XML repositories, to store XML documents in a database. The prototype is flexible enough to accommodate any more efficient schema-oblivious mapping strategy proposed in the future by adding it as a new collection. It can also be used as a benchmark tool by researchers to compare various model mapping schemas by adding them as collections.

References

[1] R. Bourret, C. Bornhövd, A. Buchmann. A Generic Load/Extract Utility for Data Transfer between XML Documents and Relational Databases. Second International Workshop on Advanced Issues of E-Commerce and Web-Based Information Systems (WECWIS 2000).

[2] N. Zhou, G. Mihaila, D. Meliksetian. XML Data Mediator: Integrated Solution for XML Roundtrip from XML to Relational. Proceedings of the 13th International World Wide Web Conference (WWW 2004), May 17-22, 2004, New York, New York, USA. ACM 1-58113-912-8/04/0005.

[3] Z. Şevkli, M. Mercan, A. Kurt. A Middleware Approach to Storing and Querying XML Documents in Relational Databases. Lecture Notes in Computer Science (Springer, ISSN 0302-9743), Vol. 3261, Oct. 2004, pp. 223-233.

[4] D. Florescu, D. Kossmann. Storing and Querying XML Data Using an RDBMS. IEEE Data Engineering Bulletin 22(3), pp. 27-34, 1999.

[5] M. Yoshikawa, T. Amagasa. XRel: A Path-Based Approach to Storage and Retrieval of XML Documents Using Relational Databases. ACM Transactions on Internet Technology, 2001.

[6] S. Prakash, S. S. Bhowmick, S. Mardia. SUCXENT: An Efficient Path-based Approach to Store and Query XML Documents. Proceedings of the 15th DEXA, Zaragoza, Spain, 2004.

[7] H. Jiang, H. Lu, W. Wang, J. X. Yu. Path Materialization Revisited: An Efficient Storage Model for XML Data. Proceedings of the Thirteenth Australasian Conference on Database Technologies, 2002, pp. 85-94. Australian Computer Society, Inc.

[8] J. Shanmugasundaram, K. Tufte et al. Relational Databases for Querying XML Documents: Limitations and Opportunities. VLDB 1999.

[Figure: block diagram. Labels: PhpMyAdmin MySQL Web Interface Program; XQuery Class (XQuery Parser, XQuerytoSQL); Collection Class (Collection Creator, Collection Remover, Collection Cleaner, Collection Insertion); XML Document Class (Document Parser, Document Loader, Document Browser, Document Deletion); Extended MySQL RDBMS for XML, with RDBMS Support (Table 1, Table 2, Table 3, ..., Table n) and XML Support using SUCXENT (Collections: Document, Path, PathValue, TextContent, DocumentRValue).]

Fig. 3.1 Prototype’s Architecture

9th International Conference on Information Technology (ICIT'06), 0-7695-2635-7/06 $20.00 © 2006