XML Parsers Overview Types of parsers Using XML parsers SAX DOM DOM versus SAX Products ...

XML Parsers

Overview

Types of parsers

Using XML parsers

SAX

DOM

DOM versus SAX

Products

Conclusion

Types of Parsers

There are several different ways to categorise parsers:

– Validating versus non-validating parsers

– Parsers that support the Document Object Model (DOM)

– Parsers that support the Simple API for XML (SAX)

– Parsers written in a particular language (Java, C++, Perl, etc.)

Non-validating Parsers

• Speed and efficiency: it takes a significant amount of effort for an XML parser to process a DTD and make sure that every element in an XML document follows the rules of the DTD.

• If you only want to find tags and extract information, use a non-validating parser.
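A minimal sketch of what "non-validating" means in practice, assuming Python's standard `xml.dom.minidom` module (which is built on the non-validating expat parser). The document and its DTD are hypothetical: the DTD says `<note>` must contain a `<to>`, the body contains a `<from>` instead, and the parser still accepts it because it checks well-formedness only.

```python
from xml.dom import minidom

# Hypothetical document: well-formed, but invalid against its own DTD,
# because <note> is declared to contain <to> and contains <from> instead.
INVALID_BUT_WELL_FORMED = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to)>
  <!ELEMENT to (#PCDATA)>
]>
<note><from>Alice</from></note>"""

# A non-validating parser does not compare the content against the DTD,
# so parsing succeeds and the data can be extracted.
doc = minidom.parseString(INVALID_BUT_WELL_FORMED)
sender = doc.getElementsByTagName("from")[0].firstChild.data
print(sender)
```

A validating parser would be expected to reject the same input.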

Using XML Parsers

• Three basic steps to use an XML parser

– Create a parser object

– Pass your XML document to the parser

– Process the results

• Generally, writing out XML is outside the scope of parsers (though some may implement proprietary mechanisms)
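The three steps above can be sketched as follows, assuming Python's standard `xml.sax` module. The `ElementNameCollector` class and the sample document are hypothetical and exist only for illustration.

```python
import io
import xml.sax


class ElementNameCollector(xml.sax.ContentHandler):
    """Hypothetical handler that records element names in document order."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


XML_BYTES = b"<catalog><book>DOM</book><book>SAX</book></catalog>"

# Step 1: create a parser object.
parser = xml.sax.make_parser()
handler = ElementNameCollector()
parser.setContentHandler(handler)

# Step 2: pass the XML document to the parser.
parser.parse(io.BytesIO(XML_BYTES))

# Step 3: process the results.
print(handler.names)
```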

Parsing XML

Two established APIs:

– SAX (Simple API for XML)

• Defines handlers whose methods are called as the XML is parsed

– DOM (Document Object Model)

• Defines a logical tree representing the parsed XML

Parsing XML: DOM

• Document Object Model

• standard API for accessing and creating XML data

• tree-based

• programming language independent

• developed by W3C

• whole document is read into memory

• read and write
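As an illustration of "read and write", here is a short sketch assuming Python's `xml.dom.minidom`. The sample document is hypothetical. `createElement`, `createTextNode` and `appendChild` are standard DOM methods, while `toxml()` is a minidom convenience for serialisation rather than part of the core DOM.

```python
from xml.dom import minidom

# The whole (hypothetical) document is read into memory as a tree.
doc = minidom.parseString("<catalog><item>A</item></catalog>")
root = doc.documentElement

# Read: navigate from the root element down to the text of the first item.
first_item_text = root.firstChild.firstChild.data

# Write: new nodes are created through the Document and attached to the tree.
new_item = doc.createElement("item")
new_item.appendChild(doc.createTextNode("B"))
root.appendChild(new_item)

# Serialise the modified tree (minidom-specific helper).
serialized = root.toxml()
print(first_item_text)
print(serialized)
```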

Creating a DOM Tree

• A DOM implementation will have a method to pass an XML file to a factory object, which returns a Document object that represents the whole document (the root element is reached through it)

• After this, you may use the standard DOM interfaces to interact with the XML structure

[Figure: diagram with the labels "API" and "Application"]
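A minimal sketch of creating a DOM tree, assuming Python's `xml.dom.minidom`. Here the module-level `parseString` function plays the role of the factory described above; other implementations expose this step differently. The sample document is hypothetical.

```python
from xml.dom import Node, minidom

# The "factory" step: hand the XML to the implementation, get a Document back.
doc = minidom.parseString("<catalog><book id='b1'>XML</book></catalog>")

# The Document node stands for the whole document, not for the root element.
is_document = doc.nodeType == Node.DOCUMENT_NODE

# The root element is reached through the Document.
root = doc.documentElement
print(is_document, root.tagName)
```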

Parsing XML: DOM

[Figure: an XML file and the corresponding DOM tree]

DOM Interfaces

• The DOM defines several interfaces

– Node: the base data type of the DOM

– Element: represents an element

– Attr: represents an attribute of an element

– Text: the content of an element or attribute

– Document: represents the entire XML document

A Document object is often referred to as a DOM tree
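The interfaces listed above can be seen on a small tree. This is a sketch assuming Python's `xml.dom.minidom`, where every node carries a `nodeType` constant from `xml.dom.Node`; the sample document is hypothetical.

```python
from xml.dom import Node, minidom

doc = minidom.parseString('<book id="b1">XML Basics</book>')

element = doc.documentElement          # Element node for <book>
attr = element.getAttributeNode("id")  # Attr node for id="b1"
text = element.firstChild              # Text node holding "XML Basics"

# Every object above is a Node; nodeType tells the specific interfaces apart.
for node in (doc, element, attr, text):
    print(node.nodeName, node.nodeType)

print(attr.value, text.data)
```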

DOM Level

• DOM Level 1

– basic functionality for document navigation and manipulation

• DOM Level 2

– includes a style sheet object model

– defines an event model

– provides support for XML namespaces

• DOM Level 3

– still under development

– addresses document loading and saving

– content model (DTDs and schemas) with document validation support
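As one example of what DOM Level 2 adds, the sketch below uses a namespace-aware lookup. It assumes Python's `xml.dom.minidom`, which implements `getElementsByTagNameNS`. The namespace URI `urn:example:inventory` and the document are hypothetical.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<root xmlns:inv="urn:example:inventory"><inv:item/><item/></root>'
)

# Namespace-aware lookup: match on (namespace URI, local name), not on prefix.
qualified = doc.getElementsByTagNameNS("urn:example:inventory", "item")

for node in qualified:
    print(node.prefix, node.localName, node.namespaceURI)
```

Only the `inv:item` element is returned; the unprefixed `<item/>` is in no namespace and does not match.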

Parsing XML: SAX

• Simple API for XML

• API for accessing XML data

• event based

• programming language independent

• the application has to store any fragments it needs in memory itself

• read only
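Because a SAX parser keeps nothing, the application decides which fragments to hold on to. The sketch below assumes Python's `xml.sax`; the `TitleCollector` class and the sample catalog are hypothetical. It stores only the text of `<title>` elements and lets everything else pass by.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Hypothetical handler that keeps only the text of <title> elements."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # None means "not inside a <title>"

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect and join later.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._buffer = None


XML_BYTES = (
    b"<catalog>"
    b"<book><title>DOM</title><price>10</price></book>"
    b"<book><title>SAX</title><price>12</price></book>"
    b"</catalog>"
)

handler = TitleCollector()
xml.sax.parseString(XML_BYTES, handler)
print(handler.titles)
```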

Parsing XML: SAX

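• SAX is an interface to the XML parser based on streaming and call-backs

• You need to supply a handler that implements the call-back methods (in Java, by extending the HandlerBase class in SAX 1 or DefaultHandler in SAX 2):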

• startDocument, endDocument

• startElement, endElement

• characters

• warning, error, fatalError
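The call-backs listed above can be observed with a handler that simply logs them. This is a sketch assuming Python's `xml.sax`, whose `ContentHandler` uses the same method names; `EventLogger` and the sample documents are hypothetical. The second call relies on the default error handler, which turns `fatalError` into a raised `SAXParseException`.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Hypothetical handler that records each call-back as it fires."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startElement(self, name, attrs):
        self.events.append("startElement " + name)

    def endElement(self, name):
        self.events.append("endElement " + name)

    def characters(self, content):
        self.events.append("characters " + content)


logger = EventLogger()
xml.sax.parseString(b"<note><to>Bob</to></note>", logger)
for event in logger.events:
    print(event)

# Malformed input (mismatched tags) is reported as a fatal error.
try:
    xml.sax.parseString(b"<note><to>Bob</note>", EventLogger())
    well_formed = True
except xml.sax.SAXParseException:
    well_formed = False
print(well_formed)
```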

Parsing XML: SAX

[Figure: an XML file and the corresponding SAX calls]

SAX versus DOM

DOM:

• read and write

• need to move back and forth in data

• document is human created

SAX:

• read only

• huge data or streams

• data is machine generated

DOM pro and contra

PRO

• The file is parsed only once.

• High navigation abilities: this is the aim of the DOM design.

CONTRA

• More memory needed since the XML tree is in memory.
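The navigation advantage can be sketched as follows, assuming Python's `xml.dom.minidom` and a hypothetical catalog. The document is parsed once, and the in-memory tree is then queried and walked in any direction.

```python
from xml.dom import minidom

# Parsed once; every query below works on the in-memory tree.
doc = minidom.parseString(
    "<catalog>"
    "<book><title>DOM</title></book>"
    "<book><title>SAX</title></book>"
    "</catalog>"
)

# Downwards: all <book> elements, then the title text inside each.
books = doc.getElementsByTagName("book")
titles = [b.getElementsByTagName("title")[0].firstChild.data for b in books]

# Upwards and sideways from the second book.
second = books[1]
parent_name = second.parentNode.tagName
previous_title = second.previousSibling.getElementsByTagName("title")[0].firstChild.data

print(titles, parent_name, previous_title)
```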

SAX pro and contra

PRO

• Low memory needs since the XML file is never entirely in memory

• Can deal with XML streams

CONTRA

• There is no random access: the file has to be parsed from the start to reach any node. Thus, fetching the 10 nodes of a catalog one at a time means parsing the same file 10 times.

• Poor navigation abilities: no easy way to get the children of a given node or the list of "B" nodes

SAX versus DOM

• If your document is very large and you only need a few elements - use SAX

• If you need to process many elements and perform operations on XML - use DOM

• If you need to access the XML many times - use DOM

Parser Products

• Xerces-J / Xerces-C++ (Apache)

• James Clark’s XP (Java)

• IBM XML4J / XML4C

• Java Project X (Sun)

• Oracle’s XML Parser for Java

• MSXML (Microsoft)

• Dan Connolly’s XML Parser (Python)

• …

Conclusion

• The parser is a key building block for every XML application.

• When building XML applications, you have to think about how you will handle large chunks of data

• Choosing between SAX and DOM is not always trivial

The End

Questions?

Thank you!