XML and Its Applications Ben Y. Zhao, CS294-7 Spring 1999.
XML and Its Applications
Ben Y. Zhao, CS294-7
Spring 1999
Overview: The XML Language
What is XML
Document Type Definitions
XML and DTD Example
XML APIs: DOM and SAX
Pros and Cons
What is XML?
Extensible Markup Language is a simplified subset of Standard Generalized Markup Language.
Tags can be arbitrarily named, and can be used to encode semantic information about enclosed data.
Documents can be well-formed, or optionally validated against Document Type Definitions.
XML documents can be componentized, and be distributed across networks.
Creates a self-describing, text-based framework around text and binary data
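A minimal sketch of what "well-formed" means in practice, assuming Python's standard-library ElementTree parser (not part of the original slides). The helper name `is_well_formed` and the sample strings are illustrative only.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Tags are arbitrarily named, but they must nest properly.
good = "<PERSON><NAME>Ben</NAME></PERSON>"
bad = "<PERSON><NAME>Ben</PERSON></NAME>"  # closing tags overlap instead of nesting

print(is_well_formed(good))
print(is_well_formed(bad))
```

This checks well-formedness only; it says nothing about validity against a DTD.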
Document Type Definitions
DTD: Concise structural definition of an XML document type.
DTDs are optional: well-formed + DTD => valid
DTDs provide a way to enforce XML documents’ compliance with structural constraints.
Optional elements and attributes add flexibility to support evolving schemas.
DTDs can be embedded in the XML document, providing a self-sufficient validating document.
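As a rough illustration of an embedded DTD, the sketch below puts the declarations in an internal subset and parses the document with Python's `xml.dom.minidom`. The simplified element set and the address are placeholders, not taken from the slides. Note that the underlying expat parser is non-validating: it reads the DTD but does not enforce it.

```python
import xml.dom.minidom

# The DTD lives inside the document itself, in the DOCTYPE internal subset.
DOC = """<?xml version="1.0"?>
<!DOCTYPE PERSON [
  <!ELEMENT PERSON (NAME, (EMAIL)+)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT EMAIL (#PCDATA)>
]>
<PERSON><NAME>Ben Zhao</NAME><EMAIL>ben@example.org</EMAIL></PERSON>"""

dom = xml.dom.minidom.parseString(DOC)

# The parser exposes the declared document type alongside the content.
doctype_name = dom.doctype.name
root_name = dom.documentElement.tagName
print(doctype_name, root_name)
```

A validating parser would additionally check the element content against the declarations.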
XML and DTD Example (open editor)
<?xml version="1.0" encoding="US-ASCII"?>
<!ELEMENT PERSON (NAME, (EMAIL)+)>
<!ELEMENT NAME (FIRSTNAME, LASTNAME)>
<!ELEMENT FIRSTNAME (#PCDATA)>
<!ELEMENT LASTNAME (#PCDATA)>
<!ELEMENT EMAIL (#PCDATA)>

<?xml version="1.0"?>
<PERSON>
<NAME>
<FIRSTNAME>Ben</FIRSTNAME>
<LASTNAME>Zhao</LASTNAME>
</NAME>
<EMAIL>[email protected]</EMAIL>
<EMAIL>[email protected]</EMAIL>
</PERSON>
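The content model in the DTD above (a NAME followed by one or more EMAIL elements) can be checked by hand, which shows what a validating parser would enforce. This is a hand-written sketch for this one document type, not a general DTD validator. The function name and the addresses are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

VALID = """<PERSON>
  <NAME><FIRSTNAME>Ben</FIRSTNAME><LASTNAME>Zhao</LASTNAME></NAME>
  <EMAIL>ben@example.org</EMAIL>
  <EMAIL>zhao@example.org</EMAIL>
</PERSON>"""

# Well-formed, but violates (NAME, (EMAIL)+): there is no EMAIL element.
INVALID = "<PERSON><NAME><FIRSTNAME>Ben</FIRSTNAME><LASTNAME>Zhao</LASTNAME></NAME></PERSON>"

def matches_person_model(person: ET.Element) -> bool:
    """Check PERSON (NAME, (EMAIL)+) and NAME (FIRSTNAME, LASTNAME) by hand."""
    tags = [child.tag for child in person]
    if person.tag != "PERSON" or len(tags) < 2:
        return False
    if tags[0] != "NAME" or any(tag != "EMAIL" for tag in tags[1:]):
        return False
    return [child.tag for child in person[0]] == ["FIRSTNAME", "LASTNAME"]

print(matches_person_model(ET.fromstring(VALID)))
print(matches_person_model(ET.fromstring(INVALID)))
```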
XML APIs: DOM and SAX
DOM (Document Object Model)
– Provides a definitive API for accessing hierarchical description languages like XML and HTML
– Specifies interfaces for accessing all parts of a document
– Includes inheritance, typing, and constants
SAX (Simple API for XML)
– An event-driven parser API
– API reports parsing events to the application via callbacks
– Optimized for parsing large documents by eliminating the need to keep the tree structure in memory
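To make the contrast concrete, here is a small sketch that extracts EMAIL values twice: once through a DOM tree and once through SAX callbacks. It assumes Python's standard `xml.dom.minidom` and `xml.sax` modules. The document, addresses, and helper names are illustrative.

```python
import xml.dom.minidom
import xml.sax

DOC = ("<PERSON><NAME>Ben Zhao</NAME>"
       "<EMAIL>ben@example.org</EMAIL><EMAIL>zhao@example.org</EMAIL></PERSON>")

def emails_dom(text):
    """DOM: build the whole tree in memory, then navigate it."""
    dom = xml.dom.minidom.parseString(text)
    return [node.firstChild.data for node in dom.getElementsByTagName("EMAIL")]

class EmailHandler(xml.sax.ContentHandler):
    """SAX: react to parse events; no tree is ever built."""
    def __init__(self):
        super().__init__()
        self.emails = []
        self._buffer = None  # collects text only while inside an EMAIL element

    def startElement(self, name, attrs):
        if name == "EMAIL":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "EMAIL":
            self.emails.append("".join(self._buffer))
            self._buffer = None

def emails_sax(text):
    handler = EmailHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.emails

print(emails_dom(DOC))
print(emails_sax(DOC))
```

Both functions return the same list; the SAX version keeps only the current text buffer in memory, which is the property the slide highlights for large documents.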
Pros and Cons
Pros
– Simple (human readable)
– Standard (easy to integrate, widely adopted)
– Portable (cross-platform data exchange)
– Flexible (handles complex data)
– Extensible (dynamic data model)
Cons
– Text-based means space consuming
– Standardization is still a problem to be solved
– Evolutionary model means ill-defined functionality core
Overview: Current XML Efforts
XML Tools
Evolving Recommendations
Industry Databases
XML Query Languages
Research Query Engines
Relevance to Systems Research
XML Tools
Parsers
– Existing parsers support DOM or SAX
– Varying XML compliance and performance
Editors for XSL, XML and DTDs
Browsers
Converters
– Applications that convert from and to XML
Document Management
– Lightweight searching and indexing tools
– Difference engines
Related Evolving Recommendations
Namespaces: qualifying names with URI references
XML-Data: defines XML vocabulary for schemas (definitions of characteristics of classes of objects)
XLink
– XML Linking language, sophisticated link styles
XPointers
– XML Pointers to all parts of XML documents
RDF (Resource Description Framework)
– Model for using XML to describe metadata on the web
DCD (Document Content Description)
– XML-Data + RDF
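As an illustration of the Namespaces item above, the sketch below shows two elements with the same local name kept apart by their namespace URIs. It assumes Python's ElementTree. The URIs and element content are made-up placeholders.

```python
import xml.etree.ElementTree as ET

# Two vocabularies both define NAME; the URI reference disambiguates them.
DOC = """<doc xmlns:p="http://example.org/person"
     xmlns:c="http://example.org/company">
  <p:NAME>Ben Zhao</p:NAME>
  <c:NAME>Ninja</c:NAME>
</doc>"""

root = ET.fromstring(DOC)

# ElementTree expands each prefix into its URI: {uri}localname
tags = [child.tag for child in root]
person_name = root.find("{http://example.org/person}NAME").text
company_name = root.find("{http://example.org/company}NAME").text

print(tags)
print(person_name, company_name)
```

The prefixes `p` and `c` are only shorthand; the qualified name is what identifies the element.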
Industry XML Databases
ObjectStore eXcelon
– Middle-tier server that imports from different DB stores
– XQL queries applied to integrated data
– Provides “cache server” for XML imported from heterogeneous DB backends
– Focuses on web applications as access methods to DBs
Poet XML Repository
– Object-oriented database with standard DB functionality, with OQL
– Focuses on use of XML to facilitate EDI
XML Query Languages
XML-QL (AT&T, Inria, U.Wash.)
– Very similar to SQL
– Optimizations and other DB techniques applicable
– Data integration and conversion from heterogeneous sources
XQL (Microsoft)
– Based on the XSL transformation language
– Context based and XML-specific query matching
– Departure from the database-centric SQL format
LOREL (Stanford)
– See notes from last week’s LOREL presentation
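To give a feel for context-based, path-style matching as opposed to SQL-style queries, here is a sketch using the limited XPath subset in Python's ElementTree. This is not XQL, XML-QL, or LOREL syntax. The `dept` attribute, element content, and addresses are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

DOC = """<PEOPLE>
  <PERSON dept="cs"><NAME>Ben</NAME><EMAIL>ben@example.org</EMAIL></PERSON>
  <PERSON dept="ee"><NAME>Ann</NAME><EMAIL>ann@example.org</EMAIL></PERSON>
</PEOPLE>"""

root = ET.fromstring(DOC)

# The query is a path through the document: PERSON elements whose dept
# attribute is 'cs', then their EMAIL children.
cs_emails = [e.text for e in root.findall("./PERSON[@dept='cs']/EMAIL")]

# Every EMAIL at any depth, regardless of its parent.
all_emails = [e.text for e in root.findall(".//EMAIL")]

print(cs_emails)
print(all_emails)
```

The query follows the document's structure rather than joining tables, which is roughly the departure from SQL that the XQL bullet describes.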
Research Query Engines
LORE (Stanford)
– Based on the LOREL query language
– A feature-rich DB approach to XML storage and query, with context-free indexing, path indexing through dataguides, query optimizations, and views
XSet (UCB, Ninja)
– Streamlined XML search engine implemented in Java
– Focus on high performance rather than feature set
– Small size favors integration into low-level applications
– Research issues on next slide
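The general idea behind path indexing, mentioned for LORE above, can be sketched as a map from each distinct label path to the elements reachable by it. This is a toy illustration only, not how LORE's dataguides or XSet are actually implemented. The function name and sample data are hypothetical.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

DOC = ("<PERSON><NAME><FIRSTNAME>Ben</FIRSTNAME><LASTNAME>Zhao</LASTNAME></NAME>"
       "<EMAIL>ben@example.org</EMAIL><EMAIL>zhao@example.org</EMAIL></PERSON>")

def build_path_index(root):
    """Map every label path (e.g. /PERSON/EMAIL) to the elements it reaches."""
    index = defaultdict(list)

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        index[path].append(elem)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return dict(index)

index = build_path_index(ET.fromstring(DOC))

# A path lookup is now a dictionary access instead of a tree traversal.
emails = [e.text for e in index["/PERSON/EMAIL"]]
print(sorted(index))
print(emails)
```

The set of keys also serves as a compact summary of the document's structure.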
Relevance to Systems Work
FSML: XML meta-index for fast access to files
Distributed Service Discovery (Ninja SDS)
– Service descriptions encoded in XML
Semantically enhanced Web searching
Data exchange across heterogeneous platforms
Low-overhead scripting language for thin clients
User preferences
– Embedded logic and scripting inside XML
Discussion
XML: flexible description language with optional DTD validation
Provides flexible framework for marking data with inferred semantics
Provides additional push towards standardization, but not as a result of the language itself
Are the benefits of the XML movement due to something intrinsic in the language?
Description language of choice? Pervasive among future applications?