XML and Its Applications Ben Y. Zhao, CS294-7 Spring 1999.


XML and Its Applications

Ben Y. Zhao, CS294-7

Spring 1999


Overview: The XML Language

What is XML
Document Type Definitions
XML and DTD Example
XML APIs: DOM and SAX
Pros and Cons


What is XML?

Extensible Markup Language (XML) is a simplified subset of Standard Generalized Markup Language (SGML).

Tags can be arbitrarily named, and can be used to encode semantic information about enclosed data.

Documents can be well-formed, or optionally validated against Document Type Definitions.

XML documents can be componentized and distributed across networks.

Creates a self-describing, text-based framework around text and binary data
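A minimal sketch of what "well-formed" means in practice, using Python's standard `xml.etree.ElementTree` (a parser chosen for this illustration; the slides do not prescribe one). The tag names `inventory` and `widget` and the helper `is_well_formed` are invented for the example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if text parses as well-formed XML (no DTD validation)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Arbitrary tag names describe the data they enclose.
good = "<inventory><widget price='4.50'>sprocket</widget></inventory>"
# The closing tags are swapped, so the elements do not nest properly.
bad = "<inventory><widget>sprocket</inventory></widget>"

print(is_well_formed(good))
print(is_well_formed(bad))

# Once parsed, the tag, attributes, and text are all available by name.
root = ET.fromstring(good)
widget = root.find("widget")
print(widget.tag, widget.get("price"), widget.text)
```

The check only covers syntax (proper nesting, quoting, a single root); whether `widget` is allowed inside `inventory` is a validity question that needs a DTD.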


Document Type Definitions

DTD: Concise structural definition of an XML document type.

DTDs are optional: well-formed + DTD => valid.

DTDs provide a way to enforce structural constraints on XML documents.

Optional elements and attributes add flexibility to support evolving schemas.

DTDs can be embedded in the XML document, making the document self-contained for validation.
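The difference between well-formed and valid can be seen with a non-validating parser. The sketch below assumes Python's standard `xml.dom.minidom`, which is built on expat and does not validate; the small PERSON vocabulary is used purely for illustration. The document embeds its DTD as an internal subset and deliberately violates it, yet it still parses because it is well-formed.

```python
from xml.dom import minidom

# The DTD is embedded as an internal subset, so the document carries its own rules.
# This instance breaks them on purpose: PERSON requires at least one EMAIL.
DOC = """<?xml version="1.0"?>
<!DOCTYPE PERSON [
  <!ELEMENT PERSON (NAME, (EMAIL)+)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT EMAIL (#PCDATA)>
]>
<PERSON><NAME>Ben Zhao</NAME></PERSON>"""

# expat reads the DOCTYPE but does not check the content model,
# so parsing succeeds even though the document is not valid.
doc = minidom.parseString(DOC)
print(doc.doctype.name)
print(len(doc.getElementsByTagName("EMAIL")))
```

A validating parser given the same input would be expected to reject it; that extra check is what the DTD adds on top of well-formedness.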


XML and DTD Example (open editor)

<?xml version="1.0" encoding="US-ASCII"?>
<!ELEMENT PERSON (NAME, (EMAIL)+)>
<!ELEMENT NAME (FIRSTNAME, LASTNAME)>
<!ELEMENT FIRSTNAME (#PCDATA)>
<!ELEMENT LASTNAME (#PCDATA)>
<!ELEMENT EMAIL (#PCDATA)>

<?xml version="1.0"?>
<PERSON>
  <NAME>
    <FIRSTNAME>Ben</FIRSTNAME>
    <LASTNAME>Zhao</LASTNAME>
  </NAME>
  <EMAIL>[email protected]</EMAIL>
  <EMAIL>[email protected]</EMAIL>
</PERSON>
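To make the content models in the PERSON DTD above concrete, here is a hand-written check that mirrors two of its declarations. This is only a sketch: `check_person` is a hypothetical helper, not a general DTD validator, it does not verify the `#PCDATA` declarations, and the e-mail addresses are placeholders.

```python
from typing import List
import xml.etree.ElementTree as ET

def check_person(person: ET.Element) -> List[str]:
    """Mirror the PERSON and NAME content models; return a list of problems."""
    problems = []
    tags = [child.tag for child in person]
    if person.tag != "PERSON":
        problems.append("root element must be PERSON")
    # <!ELEMENT PERSON (NAME, (EMAIL)+)>: one NAME, then one or more EMAIL.
    if len(tags) < 2 or tags[0] != "NAME" or any(t != "EMAIL" for t in tags[1:]):
        problems.append("PERSON must contain NAME followed by one or more EMAIL")
    # <!ELEMENT NAME (FIRSTNAME, LASTNAME)>: exactly these two, in this order.
    name = person.find("NAME")
    if name is not None and [c.tag for c in name] != ["FIRSTNAME", "LASTNAME"]:
        problems.append("NAME must contain FIRSTNAME then LASTNAME")
    return problems

VALID = """<PERSON>
  <NAME><FIRSTNAME>Ben</FIRSTNAME><LASTNAME>Zhao</LASTNAME></NAME>
  <EMAIL>first@example.org</EMAIL>
  <EMAIL>second@example.org</EMAIL>
</PERSON>"""
NO_EMAIL = ("<PERSON><NAME><FIRSTNAME>Ben</FIRSTNAME>"
            "<LASTNAME>Zhao</LASTNAME></NAME></PERSON>")

print(check_person(ET.fromstring(VALID)))
print(check_person(ET.fromstring(NO_EMAIL)))
```

The DTD states the same rules declaratively in two lines, which is the point of having a schema language rather than per-application checking code.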


XML APIs: DOM and SAX

DOM (Document Object Model)
  – Provides a definitive API for accessing hierarchical description languages like XML and HTML
  – Specifies interfaces for accessing all parts of a document
  – Includes inheritance, typing, and constants

SAX (Simple API for XML)
  – An event-driven parser API
  – API reports parsing events to the application via callbacks
  – Optimized for parsing large documents by eliminating the need to keep the tree structure in memory
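The two styles can be compared side by side on the same task. The sketch below uses Python's standard `xml.dom.minidom` and `xml.sax` modules as stand-ins for DOM and SAX implementations; the function names, the handler class, and the sample addresses are made up for this example.

```python
import xml.sax
from xml.dom import minidom

DOC = ("<PERSON><NAME>Ben</NAME>"
       "<EMAIL>a@example.org</EMAIL><EMAIL>b@example.org</EMAIL></PERSON>")

def emails_via_dom(text):
    """DOM: build the whole tree in memory, then navigate it freely."""
    tree = minidom.parseString(text)
    return [node.firstChild.data for node in tree.getElementsByTagName("EMAIL")]

class EmailHandler(xml.sax.ContentHandler):
    """SAX: the parser calls back per event; we keep only what we need."""
    def __init__(self):
        super().__init__()
        self.emails = []
        self._buffer = None  # collects text only while inside an EMAIL element

    def startElement(self, name, attrs):
        if name == "EMAIL":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "EMAIL":
            self.emails.append("".join(self._buffer))
            self._buffer = None

def emails_via_sax(text):
    handler = EmailHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.emails

print(emails_via_dom(DOC))
print(emails_via_sax(DOC))
```

The DOM version is shorter but holds every node at once; the SAX version needs explicit state (`_buffer`) yet its memory use depends on what the handler stores, not on document size.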


Pros and Cons

Pros
  – Simple (human readable)
  – Standard (easy to integrate, widely adopted)
  – Portable (cross-platform data exchange)
  – Flexible (handles complex data)
  – Extensible (dynamic data model)

Cons
  – Text-based means space consuming
  – Standardization is still a problem to be solved
  – Evolutionary model means ill-defined core functionality
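The space cost of a text-based encoding can be illustrated with a rough comparison. The sketch below is only an illustration under assumed data: the `readings` values and element names are invented, and the binary layout (four unsigned 32-bit integers) is one arbitrary choice among many.

```python
import struct
import xml.etree.ElementTree as ET

readings = [17, 42, 1999, 65535]

# Binary: four unsigned 32-bit integers, big-endian, with no framing at all.
binary = struct.pack(">4I", *readings)

# XML: the same values, each wrapped in a self-describing element.
root = ET.Element("readings")
for value in readings:
    ET.SubElement(root, "reading").text = str(value)
text = ET.tostring(root)

print(len(binary), len(text))

# The XML form is larger, but it can be decoded without knowing the layout.
decoded = [int(elem.text) for elem in ET.fromstring(text)]
print(decoded == readings)
```

The trade-off matches the two columns above: the markup is what makes the data human readable and portable, and it is also what makes it bigger.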


Overview: Current XML Efforts

XML Tools
Evolving Recommendations
Industry Databases
XML Query Languages
Research Query Engines
Relevance to Systems Research


XML Tools

Parsers
  – Existing parsers support DOM or SAX
  – Varying XML compliance and performance

Editors for XSL, XML and DTDs

Browsers

Converters
  – Applications that convert from and to XML

Document Management
  – Lightweight searching and indexing tools
  – Difference engines


Related Evolving Recommendations

Namespaces: qualifying names with URI references

XML-data: defines an XML vocabulary for schemas (definitions of characteristics of classes of objects)

XLink
  – XML Linking language, sophisticated link styles

XPointers
  – XML Pointers to all parts of XML documents

RDF (Resource Description Framework)
  – Model for using XML to describe metadata on the web

DCD (Document Content Description)
  – XML-data + RDF
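As an illustration of the Namespaces item above, the sketch below shows two `title` elements that would collide without qualification. It assumes Python's standard `xml.etree.ElementTree`; the `catalog` vocabulary and the `example.org` URIs are invented for the example.

```python
import xml.etree.ElementTree as ET

DOC = """<catalog xmlns:bk="http://example.org/books"
                  xmlns:mu="http://example.org/music">
  <bk:title>XML Basics</bk:title>
  <mu:title>Blue Train</mu:title>
</catalog>"""

root = ET.fromstring(DOC)

# ElementTree expands each prefix into its URI, written as {uri}localname,
# so the two title elements end up with distinct qualified names.
tags = [child.tag for child in root]
print(tags)

# The prefix used in a query is local to the query; only the URI must match.
ns = {"b": "http://example.org/books"}
book_titles = [elem.text for elem in root.findall("b:title", ns)]
print(book_titles)
```

The prefix (`bk`, `mu`, or `b`) is just shorthand; identity comes from the URI reference, which is what lets independently designed vocabularies be mixed in one document.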


Industry XML Databases

ObjectStore eXcelon
  – Middle tier server that imports from different DB stores
  – XQL queries applied to integrated data
  – Provides “cache server” for XML imported from heterogeneous DB backends
  – Focuses on web applications as access methods to DBs

Poet XML Repository
  – Object oriented database with standard DB functionality, with OQL
  – Focuses on use of XML to facilitate EDI


XML Query Languages

XML-QL (AT&T, Inria, U.Wash.)
  – Very similar to SQL
  – Optimizations and other DB techniques applicable
  – Data integration and conversion from heterogeneous sources

XQL (Microsoft)
  – Based on the XSL transformation language
  – Context based and XML-specific query matching
  – Departure from the database-centric SQL format

LOREL (Stanford)
  – See notes from last week’s LOREL presentation
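To give a feel for path- and context-based matching, as opposed to SQL-style queries over tables, the sketch below uses the limited XPath subset in Python's standard `xml.etree.ElementTree`. This is a stand-in only: it is not XQL, XML-QL, or LOREL syntax, and the `people` document is invented for the example.

```python
import xml.etree.ElementTree as ET

DOC = """<people>
  <PERSON dept="cs"><NAME>Ana</NAME><EMAIL>ana@example.org</EMAIL></PERSON>
  <PERSON dept="ee"><NAME>Raj</NAME></PERSON>
  <PERSON dept="cs"><NAME>Lin</NAME><EMAIL>lin@example.org</EMAIL></PERSON>
</people>"""
root = ET.fromstring(DOC)

# Path match: every NAME nested directly under a PERSON.
all_names = [e.text for e in root.findall("./PERSON/NAME")]

# Predicate on an attribute: only PERSON elements in the cs department.
cs_names = [p.findtext("NAME") for p in root.findall("./PERSON[@dept='cs']")]

# Predicate on a child: only PERSON elements that contain an EMAIL.
with_email = [p.findtext("NAME") for p in root.findall("./PERSON[EMAIL]")]

print(all_names, cs_names, with_email)
```

The queries are phrased in terms of where an element sits in the document tree, which is the sense in which such languages are context based.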


Research Query Engines

LORE (Stanford)
  – Based on the LOREL query language
  – A feature-rich DB approach to XML storage and query, with context-free indexing, path indexing through dataguides, query optimizations, and views

XSet (UCB, Ninja)
  – Streamlined XML search engine implemented in Java
  – Focus on high performance rather than feature set
  – Small size favors integration into low-level applications
  – Research issues on next slide
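The general idea behind path indexing can be sketched in a few lines: record, for every distinct label path from the root, which elements it reaches, so a path query becomes a dictionary lookup instead of a tree walk. This is a simplified illustration of the concept only, not LORE's DataGuide implementation or XSet's index; `build_path_index` and the sample document are hypothetical.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

def build_path_index(root):
    """Map each root-to-element label path to the elements reached by it."""
    index = defaultdict(list)

    def visit(elem, prefix):
        path = prefix + "/" + elem.tag
        index[path].append(elem)
        for child in elem:
            visit(child, path)

    visit(root, "")
    return index

DOC = """<people>
  <PERSON><NAME>Ana</NAME><EMAIL>ana@example.org</EMAIL></PERSON>
  <PERSON><NAME>Raj</NAME></PERSON>
  <PERSON><NAME>Lin</NAME><EMAIL>lin@example.org</EMAIL></PERSON>
</people>"""

index = build_path_index(ET.fromstring(DOC))

# The set of keys is a summary of the paths that actually occur in the data.
paths = sorted(index)
print(paths)

# A path query is now a single lookup.
names = [e.text for e in index["/people/PERSON/NAME"]]
print(names)
```

The key set doubles as a structural summary of the data, which is useful when no DTD is available to describe it.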


Relevance to Systems Work

FSML: XML meta-index for fast access to files

Distributed Service Discovery (Ninja SDS)
  – Service descriptions encoded in XML

Semantically Enhanced Web searching

Data exchange across heterogeneous platforms

Low overhead scripting language for thin clients

User preferences
  – Embedded logic and scripting inside XML


Discussion

XML: flexible description language with optional DTD validation

Provides flexible framework for marking data with inferred semantics

Provides additional push towards standardization, but not as a result of the language itself

Are the benefits of the XML movement due to something intrinsic in the language?

Description language of choice? Pervasive among future applications?