eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java language

eXtensible Markup Language APIs in Java 1.6
Simple and efficient XML parsing using Java language
Wojciech Podgórski
http://podgorski.wordpress.com
April 8, 2008

description

The presentation describes modern ways of parsing XML documents in Java. It shows different approaches to the same problem and compares their capabilities, advantages and disadvantages. It also previews what to expect from Java 7 in the context of XML.

Transcript of eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java language


Presentation outline

1 Introduction: What is parsing; Different ways of parsing documents

2 XML APIs in Java: SAX, DOM, StAX

3 Capabilities and performance comparison

4 CASE STUDY: Parsing Really Simple Syndication (RSS) doc

5 What next? Alternatives to APIs, Java SE 7.0 features

6 Summary

7 Further reading...


Parsing definition

Parsing, more formally called syntactic analysis, is the process of analyzing a sequence of tokens to determine grammatical structure with respect to a given formal grammar.

Source: http://en.wikipedia.org/wiki/Parsing


We can distinguish three main models of parsing XML documents. They differ in the mechanism used to traverse between nodes and in the approach to processing XML data. The models are:

SAX - Simple API for XML

DOM - Document Object Model

StAX - Streaming API for XML


That’s not all! There are other approaches, which won’t be described in this presentation.

JAXB - Java Architecture for XML Binding
A technology that provides the ability to marshal Java objects into XML and the reverse, i.e. to unmarshal XML elements back into Java objects. It works on top of another parser (mostly streaming parsers).


Javolution
A library providing a real-time StAX-like implementation which does not force object creation and has a smaller effect on memory footprint and garbage collection, using e.g. lookup tables for retrieving and reusing data.

VTD-XML - Virtual Token Descriptor for XML
A collection of efficient processing technologies, centered around a non-extractive, ‘document-centric’ parsing technique called VTD. Supports random access and XPath.


SAX as a processing model

SAX should first be considered as a specific processing mechanism, rather than a simple API. SAX represents an event-driven architecture: the parser performs an operation each time a particular event occurs.

To handle these occurrences, the user defines a number of callback methods, which the parser calls when it encounters an element (or other content) in the document.
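The callback mechanism described above can be sketched as follows. This is only a minimal illustration: the class name SaxEventLog and the sample XML are made up for the example, and the parser is obtained through JAXP rather than any particular implementation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records every callback the parser triggers.
class SaxEventLog extends DefaultHandler {
    final List<String> events = new ArrayList<String>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may deliver text in several chunks; this sketch logs each non-blank chunk.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            events.add("text:" + text);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    // Parses an XML string and returns the recorded events in document order.
    static List<String> parse(String xml) throws Exception {
        SaxEventLog handler = new SaxEventLog();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<food><name>Avocado Dip</name></food>"));
    }
}
```

The application never asks the parser for data; it only reacts to the calls it receives, which is why the events arrive strictly in document order.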


Figure: Top-down parsing in SAX API


In Java, the SAX API is a collection of classes and interfaces which should be implemented while constructing an XML parser. The package containing this collection is:

org.xml.sax.*


Figure: org.xml.sax.* package class diagram


Basic class structure

    import org.xml.sax.InputSource;
    import org.xml.sax.XMLReader;
    import org.xml.sax.helpers.XMLReaderFactory;

    // Declare document URI
    String xmlURI = "http://example.com/report.xml";

    // Create reader instance
    XMLReader reader = XMLReaderFactory.createXMLReader();

    // Set implementation class of ContentHandler
    reader.setContentHandler(new MyContentHandler());

    // Resolve document source
    InputSource inputSource = new InputSource(xmlURI);

    // Parse document
    reader.parse(inputSource);


Different SAX implementations

    // Alternative ways to obtain an XMLReader (use one of them)

    // Xerces implementation
    XMLReader reader = new org.apache.xerces.parsers.SAXParser();

    // JAXP implementation: the factory has to be instantiated first,
    // and the XMLReader is obtained from the SAXParser
    SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    XMLReader reader = parser.getXMLReader();

    // Piccolo implementation
    XMLReader reader = new com.bluecast.xml.Piccolo();


Other SAX features

SAX provides a number of interfaces for correct data handling. Some of them process not only the content of the document, but also its structure.

Interfaces such as:

ErrorHandler

EntityResolver

DTDHandler

also analyze the structure of the document, looking for possible errors, entity links, or DTD declarations (notations and unparsed entities).
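As an illustration of the first of these interfaces, the sketch below registers an ErrorHandler and reports the first problem the parser finds. The class name WellFormednessCheck and the message format are assumptions made for this example only.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

class WellFormednessCheck {
    // Returns null when the document parses cleanly,
    // otherwise a message containing the line of the first reported problem.
    static String firstProblem(String xml) throws Exception {
        final String[] problem = new String[1];
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) {
                // warnings are ignored in this sketch
            }

            public void error(SAXParseException e) {
                record(e);
            }

            public void fatalError(SAXParseException e) throws SAXException {
                record(e);
                throw e; // stop parsing after a fatal error
            }

            private void record(SAXParseException e) {
                if (problem[0] == null) {
                    problem[0] = "line " + e.getLineNumber() + ": " + e.getMessage();
                }
            }
        });
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // the problem has already been recorded by the handler
        }
        return problem[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstProblem("<a><b/></a>"));
        System.out.println(firstProblem("<a><b></a>"));
    }
}
```

EntityResolver and DTDHandler are registered in the same way, through setEntityResolver and setDTDHandler on the XMLReader.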


Advanced SAX features I

The SAX API is considered a very flexible solution, mainly because it can be configured with properties and features.

    void setProperty(String propertyID, Object value);
    void setFeature(String featureID, boolean state);

Properties and features modify the parser’s behaviour while it processes a document. For example, we can switch on validation of the document against the DTD or schema related to it (a check for well-formedness is always performed).
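A short sketch of how features might be set in practice is shown below. The two feature identifiers are the standard SAX2 names; the class SaxFeatureDemo, the helper trySetFeature and the example.com identifier are hypothetical and exist only for this illustration.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

class SaxFeatureDemo {
    // Standard SAX2 feature identifiers.
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    static final String VALIDATION = "http://xml.org/sax/features/validation";

    // Tries to set a feature; returns false if this parser does not know or support it.
    static boolean trySetFeature(XMLReader reader, String featureID, boolean state) {
        try {
            reader.setFeature(featureID, state);
            return true;
        } catch (SAXNotRecognizedException e) {
            return false;
        } catch (SAXNotSupportedException e) {
            return false;
        }
    }

    static XMLReader newReader() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = newReader();
        System.out.println("namespaces set: " + trySetFeature(reader, NAMESPACES, true));
        System.out.println("validation set: " + trySetFeature(reader, VALIDATION, true));
        // A made-up identifier, to show how an unknown feature is rejected.
        System.out.println("unknown set: "
                + trySetFeature(reader, "http://example.com/no-such-feature", true));
    }
}
```

Catching the two exceptions separately matters because a parser may recognize a feature and still refuse to change it, for instance in the middle of a parse.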


Advanced SAX features II

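Among many other interesting SAX features, one is very important and radically extends SAX capabilities. The XMLFilter interface allows us to create a cascade of filters, each performing a different processing operation on the event stream. Because the whole cascade runs in a single pass over the document, several operations can be applied without parsing the document again.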

Figure: Cascade processing using XMLFilter interface
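One stage of such a cascade might look like the sketch below, which builds on org.xml.sax.helpers.XMLFilterImpl. The filter class SkipElementFilter and the sample element names are assumptions for illustration; the filter drops one element type and passes everything else on to the next handler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: removes one element type (and everything inside it) from the event stream.
class SkipElementFilter extends XMLFilterImpl {
    private final String skippedName;
    private int depth = 0; // greater than 0 while inside a skipped element

    SkipElementFilter(XMLReader parent, String skippedName) {
        super(parent);
        this.skippedName = skippedName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (depth > 0 || qName.equals(skippedName)) {
            depth++;
            return;
        }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (depth > 0) {
            depth--;
            return;
        }
        super.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (depth == 0) {
            super.characters(ch, start, length);
        }
    }

    // Parses the XML through the filter and returns the element names that reach the final handler.
    static List<String> visibleElements(String xml, String skippedName) throws Exception {
        final List<String> names = new ArrayList<String>();
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        XMLReader filtered = new SkipElementFilter(parser, skippedName);
        filtered.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                names.add(qName);
            }
        });
        filtered.parse(new InputSource(new StringReader(xml)));
        return names;
    }
}
```

Since a filter is itself an XMLReader, another filter can take it as its parent, which is how longer cascades are assembled.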


What SAX cannot do... I

Q: Why do we need other mechanisms, if SAX is so good?


A: SAX has some serious limitations due to its sequential data access.


What SAX cannot do... II

SAX parses data from beginning to end and does not allow going back. It also has some other drawbacks:

it is unable to modify content or structure of document

it cannot access specific or random elements

it cannot access sibling elements

it is not serializable


So it seems that it is useless. THAT’S NOT TRUE! (see the comparison section). Every issue mentioned above can be resolved by a SAX complement...


DOM as a processing model

The Document Object Model is based on a completely different idea. It does not parse the document and react to specific events (though it is able to); instead, it builds a tree based on the document’s structure and stores it in memory as an object. Because of this, every node in the tree is always available and can be accessed later, many times. Moreover, the structure stored in memory can easily be transformed in many ways.


DOM architecture I

DOM, in contrast to SAX, is a standard developed by the W3C (1). Due to standardization it has a strict architecture divided into levels, each containing required and optional modules.

To claim support for a level, an application must implement all the requirements of the claimed level and of the levels below it. There are 3 levels; the newest (DOM 3) was published in 2004 and is the current release of the DOM specification.

Every level has its core, which is the root element for the other modules (figure).

(1) A reference to the standard can be found on the W3C site.


Figure: Document Object Model architecture (Adapted from original W3C specification)


In Java, DOM has a different structure than SAX. Almost every type representing the Document Object Model is an interface derived from the org.w3c.dom.Node interface.

Such a framework allows very simple data manipulation and traversal between the nodes contained in the tree structure. It is essential to understand how elements are stored in the tree (figure).

For example, if we want to read text data from element A, we should get its child node containing the text, rather than extract the content of element A itself.
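The point about text being stored in a child node can be checked with a small sketch like the one below. The class name DomTextDemo and its helper methods are hypothetical; the element name A follows the example in the text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomTextDemo {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Reads the text of an element by visiting its first child, which is expected to be a text node.
    static String textOf(Element element) {
        Node child = element.getFirstChild();
        if (child != null && child.getNodeType() == Node.TEXT_NODE) {
            return child.getNodeValue();
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        Element a = parse("<A>hello</A>").getDocumentElement();
        System.out.println("value of element A: " + a.getNodeValue());
        System.out.println("value of its text child: " + textOf(a));
    }
}
```

For an element node, getNodeValue() is defined to return null, so the text really has to be taken from the child (or gathered with the DOM Level 3 convenience method getTextContent()).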


Figure: org.w3c.dom.* package class diagram From [1]


Basic class structure using Java implementation

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;

    String docURI = "http://example.org/nutrition.xml";
    // get new DocumentBuilderFactory
    DocumentBuilderFactory docBuilderFactory =
        DocumentBuilderFactory.newInstance();
    // get new DocumentBuilder
    DocumentBuilder docBuilder =
        docBuilderFactory.newDocumentBuilder();
    // initialize document with null
    Document doc = null;
    // parse document
    doc = docBuilder.parse(docURI);
    // extract root element and
    // normalize whole tree (optional)
    doc.getDocumentElement().normalize();


Accessing elements

    NodeList elements = null;
    // get "food" elements
    elements = doc.getElementsByTagName("food");
    for (int i = 0; i < elements.getLength(); i++)
    {
        // get "Avocado Dips"; getNodeName() would always return "food",
        // so the element's text content is checked instead
        String foodText = elements.item(i).getTextContent();
        if (foodText.contains("Avocado Dip"))
        {
            NodeList l = elements.item(i).getChildNodes();
            for (int j = 0; j < l.getLength(); j++)
                // print out calories
                if (l.item(j).getNodeName().equals("calories"))
                    System.out.println(l.item(j).getTextContent());
        }
    }


Modifying elements

    ...
    if (l.item(j).getNodeName().equals("calories"))
    {
        // text content is a String, so it has to be parsed, not cast
        int cal = Integer.parseInt(l.item(j).getTextContent().trim());
        // if food avocado dip has more than 300 cal.
        if (cal > 300)
        {
            Node avocadoDip = l.item(j).getParentNode();
            // replace it with low fat food
            Element newFood = doc.createElement("LowFatFood");
            // replaceChild must be called on the direct parent of the replaced node
            avocadoDip.getParentNode().replaceChild(newFood, avocadoDip);
        }
    }
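A self-contained variant of the replacement shown above might look as follows. It is a sketch under assumptions: the document layout (a food element with a calories child holding a plain number) and the names DomReplaceDemo and replaceHighCalorie are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomReplaceDemo {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Replaces every <food> whose <calories> value exceeds the limit with an empty <LowFatFood/>.
    // Returns the number of replaced elements.
    static int replaceHighCalorie(Document doc, int limit) {
        NodeList foods = doc.getElementsByTagName("food");
        // Copy first: the NodeList is live and shrinks while nodes are being replaced.
        List<Element> snapshot = new ArrayList<Element>();
        for (int i = 0; i < foods.getLength(); i++) {
            snapshot.add((Element) foods.item(i));
        }
        int replaced = 0;
        for (Element food : snapshot) {
            NodeList calories = food.getElementsByTagName("calories");
            if (calories.getLength() == 0) {
                continue;
            }
            int value = Integer.parseInt(calories.item(0).getTextContent().trim());
            if (value > limit) {
                // The replacement is performed by the parent of the node being replaced.
                food.getParentNode().replaceChild(doc.createElement("LowFatFood"), food);
                replaced++;
            }
        }
        return replaced;
    }
}
```

Taking a snapshot of the node list before modifying the tree avoids skipping elements, because lists returned by getElementsByTagName reflect changes to the document immediately.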


Different DOM implementations

    // Xerces DOM implementation
    DOMParser p = new org.apache.xerces.parsers.DOMParser();
    p.parse(new InputSource(xmlURI));
    Document doc = p.getDocument();

    // JDOM DOM implementation
    // (in JDOM 1.x, DOMBuilder converts an existing W3C DOM tree, here: doc)
    DOMBuilder builder = new org.jdom.input.DOMBuilder();
    org.jdom.Document d = builder.build(doc);
    // it's org.jdom.Document not org.w3c.dom.Document!

    // dom4j DOM implementation
    SAXReader reader = new org.dom4j.io.SAXReader();
    org.dom4j.Document document = reader.read(xmlURI);
    // it's org.dom4j.Document not org.w3c.dom.Document!


Advanced DOM features I

DOM provides many advanced functionalities through modules specified in the standard (mainly Level 3 modules). Some of them:

The MutationEvents module provides methods for listening for changes

The LS and LS-Async modules provide methods for various kinds of serialization

The Validation module provides methods for real-time validation


Advanced DOM features II

When using a particular API, it is important to check which modules are implemented, and in which version. To do this, we can use:

    boolean hasFeature(String feature, String version);
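A possible way to call this method is sketched below, using the DOMImplementation object obtained from a JAXP DocumentBuilder. The class name DomFeatureCheck is made up, and the answers for individual modules depend on the DOM implementation that is installed.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;

class DomFeatureCheck {
    // Asks the DOM implementation behind the default JAXP builder whether it supports a module.
    static boolean supports(String feature, String version) throws Exception {
        DOMImplementation impl = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .getDOMImplementation();
        return impl.hasFeature(feature, version);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Core 2.0: " + supports("Core", "2.0"));
        System.out.println("LS 3.0: " + supports("LS", "3.0"));
        System.out.println("MutationEvents 2.0: " + supports("MutationEvents", "2.0"));
    }
}
```

Running the check once at start-up, before relying on an optional module, keeps the failure close to its cause instead of surfacing later as an unsupported operation.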


Streaming API for XML - a different approach

The third approach to processing XML data is based on the idea of treating incoming information about events as a stream.

The Streaming API for XML uses a technique called pull parsing, which provides sequential access to the document by adapting the iterator design pattern. The association with java.util.Iterator is not accidental, because part of the API (XMLEventReader) extends this interface.

Page 41: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

StAX architecture

StAX in Java is divided into two (theoretically) separate APIs:

cursor API, represented by the XMLStreamReader and XMLStreamWriter interfaces. Regarded as the fastest and most efficient solution.

event API, represented by the XMLEventReader and XMLEventWriter interfaces. Regarded as a simple and flexible solution.

Both are specified in JSR 173 and contained in javax.xml.stream.*

Page 42: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

Difference from the SAX event-driven architecture

The common view that the StAX API is similar to SAX is wrong.

The SAX architecture provides a number of interfaces whose callbacks handle incoming events as the parser pushes them. The StAX event API provides methods for iterating through the event stream, so the application pulls each event itself and handles the specific occurrences it cares about.

Moreover, StAX is a symmetric read/write API, which also allows elements to be modified and stored (written back out).
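The difference in control flow can be sketched side by side. In the hypothetical class below (all names are chosen for illustration), the SAX variant hands a handler to the parser and is called back for every element, while the StAX variant runs its own loop and can stop as soon as it has what it needs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical comparison class, not part of the original presentation.
class PushVersusPull {

    /** SAX (push): the parser drives and calls startElement() for every element. */
    static List<String> namesWithSax(String xml) {
        final List<String> names = new ArrayList<String>();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String localName,
                                                 String qName, Attributes atts) {
                            names.add(qName);
                        }
                    });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    /** StAX (pull): our loop drives and returns at the first start element. */
    static String firstNameWithStax(String xml) {
        try {
            XMLStreamReader cur =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (cur.hasNext()) {
                    if (cur.next() == XMLStreamConstants.START_ELEMENT) {
                        return cur.getLocalName(); // stop early, the rest is never parsed
                    }
                }
                return null;
            } finally {
                cur.close();
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<nutrition><food><name>Chocolate ice cream</name></food></nutrition>";
        System.out.println(namesWithSax(xml));
        System.out.println(firstNameWithStax(xml));
    }
}
```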

Page 43: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

Basic class structure

    /* Creating readers... */
    import java.io.InputStream;
    import java.net.URL;
    import javax.xml.stream.*;

    // creating the input factory
    // ("if" is a reserved word in Java, so the variable is called "factory")
    String xmlURI = "http://example.org/nutrition.xml";
    XMLInputFactory factory = XMLInputFactory.newInstance();

    // cursor API reader; the document is fetched from the URI
    // (a StringReader over xmlURI would parse the address text itself)
    InputStream in1 = new URL(xmlURI).openStream();
    XMLStreamReader cur = factory.createXMLStreamReader(in1);

    // event API reader; each reader needs its own input stream
    InputStream in2 = new URL(xmlURI).openStream();
    XMLEventReader event = factory.createXMLEventReader(in2);

Page 44: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

Identifying events I

The main issue when using StAX is how to identify the event that has just occurred. There are many ways to do that; the simplest is to check the integer constant associated with the event (cursor API). The constants are declared in the XMLStreamConstants interface (https://java.sun.com/webservices/docs/1.5/api/javax/xml/stream/XMLStreamConstants.html). For example:

1 - START_ELEMENT

2 - END_ELEMENT

3 - PROCESSING_INSTRUCTION

And so on...
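Because the raw integers are hard to read in logs, a small lookup is often handy. The sketch below is an assumption of how one might write it (the class EventNames is not part of StAX); it names a handful of the constants and traces the events of a short document with the cursor API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of the original presentation.
class EventNames {

    /** Translates an XMLStreamConstants value into a readable name. */
    static String nameOf(int eventType) {
        switch (eventType) {
            case XMLStreamConstants.START_ELEMENT:          return "START_ELEMENT";
            case XMLStreamConstants.END_ELEMENT:            return "END_ELEMENT";
            case XMLStreamConstants.PROCESSING_INSTRUCTION: return "PROCESSING_INSTRUCTION";
            case XMLStreamConstants.CHARACTERS:             return "CHARACTERS";
            case XMLStreamConstants.COMMENT:                return "COMMENT";
            case XMLStreamConstants.START_DOCUMENT:         return "START_DOCUMENT";
            case XMLStreamConstants.END_DOCUMENT:           return "END_DOCUMENT";
            default:                                        return "OTHER(" + eventType + ")";
        }
    }

    /** Lists the event names seen while reading the document with the cursor API. */
    static List<String> trace(String xml) {
        List<String> seen = new ArrayList<String>();
        try {
            XMLStreamReader cur =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                seen.add(nameOf(cur.getEventType())); // state before the first next()
                while (cur.hasNext()) {
                    seen.add(nameOf(cur.next()));
                }
            } finally {
                cur.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(trace("<name>Chocolate ice cream</name>"));
    }
}
```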

Page 45: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

Accessing elements by iterator I (cursor API)

    int startElem = XMLStreamConstants.START_ELEMENT;
    // while there is a next event
    while (cur.hasNext())
    {
        // fetch the event type
        int eventType = cur.next();
        System.out.println(eventType);
        // if the event type is START_ELEMENT, print the element's name
        // (getElementText() would work only for text-only elements:
        // it throws XMLStreamException when it meets a child element)
        if (eventType == startElem)
            System.out.println(cur.getLocalName());
    }

Page 46: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

Identifying events II

In the event API, identifying events is a bit different. XMLEventReader provides the methods:

    XMLEvent nextEvent();
    boolean hasNext();

So, to identify the caught event, we must analyse the XMLEvent object returned by the first method. Once again there are a few ways to do that. The method returning the event type can be called:

    int getEventType();

Or we can test whether the event is of a certain type with one of the "is" methods (for example isStartElement() or isCharacters()).
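A short sketch of the "is" methods in use. The class EventKinds and its describe() method are invented for this illustration; the isXxx()/asXxx() pairs themselves belong to javax.xml.stream.events.XMLEvent and avoid explicit casts.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical example class, not part of the original presentation.
class EventKinds {

    /** Rebuilds a simplified textual view of the document from its events. */
    static String describe(String xml) {
        StringBuilder out = new StringBuilder();
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    XMLEvent e = reader.nextEvent();
                    if (e.isStartElement()) {
                        out.append('<')
                           .append(e.asStartElement().getName().getLocalPart())
                           .append('>');
                    } else if (e.isCharacters()) {
                        out.append(e.asCharacters().getData());
                    } else if (e.isEndElement()) {
                        out.append("</")
                           .append(e.asEndElement().getName().getLocalPart())
                           .append('>');
                    }
                    // StartDocument, EndDocument and other events are ignored here
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException ex) {
            throw new IllegalStateException(ex);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("<name>Chocolate ice cream</name>"));
    }
}
```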

Page 47: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

Accessing elements by iterator II (event API)

    // while there is a next event
    while (event.hasNext())
    {
        XMLEvent e = event.nextEvent();
        // identify the event by its type!
        if (e instanceof StartElement)
        {
            // cast the event to the specific element type
            StartElement se = (StartElement) e;
            QName name = se.getName();
            // print the element name
            System.out.println(name.getLocalPart());
        }
    }

Page 48: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

Advanced iteration methods

Both StAX APIs provide more complex iteration methods.

    // in both readers (XMLStreamReader returns an int event type instead)
    XMLEvent nextTag();
    // only in XMLEventReader
    XMLEvent peek();
    // only in XMLStreamReader
    void require(int type, String nsURI, String localN);

The first method moves the cursor, skipping insignificant events (white space, comments, processing instructions), until it reaches the start or end of an element. The second allows checking the next event before moving the cursor. The third compares the current cursor position with the expected event type, namespace and name, and throws an exception if they differ. All methods are well documented and should be reviewed by the reader.
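The three methods can be tried out on the nutrition document used elsewhere in this presentation. The class AdvancedIteration below is only a sketch; the element names nutrition, food and name are taken from the writing examples, and passing null as the namespace to require() means the namespace is not checked.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.XMLEvent;

// Hypothetical example class, not part of the original presentation.
class AdvancedIteration {

    /** Uses nextTag() and require() to walk down to nutrition/food/name. */
    static String firstFoodName(String xml) {
        try {
            XMLStreamReader cur =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                cur.nextTag(); // jump to the root element
                cur.require(XMLStreamConstants.START_ELEMENT, null, "nutrition");
                cur.nextTag();
                cur.require(XMLStreamConstants.START_ELEMENT, null, "food");
                cur.nextTag();
                cur.require(XMLStreamConstants.START_ELEMENT, null, "name");
                return cur.getElementText(); // safe here: <name> holds text only
            } finally {
                cur.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Shows that peek() reveals the upcoming event without consuming it. */
    static boolean peekMatchesNext(String xml) {
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            try {
                reader.nextEvent();                 // consume StartDocument
                XMLEvent upcoming = reader.peek();  // look ahead
                XMLEvent actual = reader.nextEvent();
                return upcoming.getEventType() == actual.getEventType();
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<nutrition><food><name>Chocolate ice cream</name></food></nutrition>";
        System.out.println(firstFoodName(xml));
        System.out.println(peekMatchesNext(xml));
    }
}
```

If the document does not have the expected shape, require() throws XMLStreamException, which makes it a cheap structural assertion while parsing.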

Page 49: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

EventFilters and StreamFilters I

The StAX API allows filtered readers to be created. It is not necessary to write complex stream handlers to process specific events. The only thing that has to be done is to implement one (or both) of two interfaces, each containing a single method. Interfaces:

    EventFilter (for XMLEventReader)
    StreamFilter (for XMLStreamReader)

Methods:

    public boolean accept(XMLEvent event)          // EventFilter
    public boolean accept(XMLStreamReader reader)  // StreamFilter

Page 50: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

EventFilters and StreamFilters II

Implementing a filter is simple:

    public class CharFilter implements EventFilter
    {
        public boolean accept(XMLEvent event)
        {
            return (event.getEventType() ==
                    XMLStreamConstants.CHARACTERS);
        }
    }

The filter above accepts only character events (text content).
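To put a filter to work it has to be attached to a reader, which XMLInputFactory.createFilteredReader() does. The sketch below is an assumed usage example (the class CharFilterDemo is hypothetical and defines the same characters-only filter inline so that the block is self-contained).

```java
import java.io.StringReader;
import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical example class, not part of the original presentation.
class CharFilterDemo {

    /** Returns the concatenated text content of the document, skipping all markup events. */
    static String textOnly(String xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLEventReader raw = factory.createXMLEventReader(new StringReader(xml));
            // Wrap the raw reader; only events accepted by the filter reach the loop below.
            XMLEventReader filtered = factory.createFilteredReader(raw, new EventFilter() {
                public boolean accept(XMLEvent event) {
                    return event.isCharacters();
                }
            });
            StringBuilder text = new StringBuilder();
            try {
                while (filtered.hasNext()) {
                    text.append(filtered.nextEvent().asCharacters().getData());
                }
            } finally {
                filtered.close();
            }
            return text.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textOnly("<food><name>Chocolate ice cream</name></food>"));
    }
}
```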

Page 51: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

Writing elements I

StAX, as a symmetric API that handles both input and output, is able to write XML data. It provides two interfaces to do that:

    XMLEventWriter (extends XMLEventConsumer)
    XMLStreamWriter

The basic difference between them is that XMLEventWriter has less functionality: it only adds ready-made XMLEvent objects, while XMLStreamWriter has a dedicated write method for each kind of content.

Page 52: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

Writing elements II

    // using XMLStreamWriter
    OutputStream console = System.out;
    XMLOutputFactory of = XMLOutputFactory.newInstance();
    XMLStreamWriter sw = of.createXMLStreamWriter(console);
    sw.writeStartDocument("1.0");
    // create a document with one meal
    sw.writeStartElement("nutrition");
    sw.writeStartElement("food");
    sw.writeStartElement("name");
    sw.writeCharacters("Chocolate ice cream");
    sw.writeEndElement();
    sw.writeEndElement();
    sw.writeEndElement();
    sw.writeEndDocument();
    // make sure buffered output reaches the console
    sw.flush();

Page 53: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

Writing elements III

    // the same using XMLEventWriter
    OutputStream console = System.out;
    XMLEventFactory xef = XMLEventFactory.newInstance();
    XMLOutputFactory of = XMLOutputFactory.newInstance();
    XMLEventWriter ew = of.createXMLEventWriter(console);
    ew.add(xef.createStartDocument("UTF-8", "1.0"));
    // arguments: prefix, namespace URI, local name
    // (empty strings mean no prefix and no namespace)
    ew.add(xef.createStartElement("", "", "nutrition"));
    ew.add(xef.createStartElement("", "", "food"));
    ew.add(xef.createStartElement("", "", "name"));
    ew.add(xef.createCharacters("Chocolate ice cream"));
    // end elements have to be named as well
    ew.add(xef.createEndElement("", "", "name"));
    ew.add(xef.createEndElement("", "", "food"));
    ew.add(xef.createEndElement("", "", "nutrition"));
    ew.add(xef.createEndDocument());
    ew.flush();
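The symmetry of the API can be checked with a round trip: write a document into memory with XMLStreamWriter and read it back with XMLStreamReader. The sketch below assumes a StringWriter as the target instead of the console; the class RoundTrip and its method names are illustrative only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example class, not part of the original presentation.
class RoundTrip {

    /** Writes a one-meal nutrition document and returns it as text. */
    static String write(String foodName) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter sw = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            sw.writeStartDocument("1.0");
            sw.writeStartElement("nutrition");
            sw.writeStartElement("food");
            sw.writeStartElement("name");
            sw.writeCharacters(foodName); // special characters such as & are escaped
            sw.writeEndElement();
            sw.writeEndElement();
            sw.writeEndElement();
            sw.writeEndDocument();
            sw.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads the text of the first name element, or null if there is none. */
    static String readName(String xml) {
        try {
            XMLStreamReader cur =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (cur.hasNext()) {
                    if (cur.next() == XMLStreamConstants.START_ELEMENT
                            && "name".equals(cur.getLocalName())) {
                        return cur.getElementText();
                    }
                }
                return null;
            } finally {
                cur.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = write("Fish & chips");
        System.out.println(xml);
        System.out.println(readName(xml));
    }
}
```

The ampersand in the sample value is there on purpose: the writer escapes it on output and the reader resolves it again, so the value survives the trip unchanged.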

Page 54: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

XmlPull

XmlPull is the ancestor of StAX. Although StAX has become the popular standard for parsing XML data, XmlPull has not been retired. Thanks to its light weight (the JAR file is only 9 kB), XmlPull is well suited to devices with limited memory. It is often used in developing mobile applications.

http://www.xmlpull.org/

Page 55: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

Comparing capabilities I

Developing an application that processes XML data always involves choosing a parser.

Selecting the proper API is essential to the success of the project, although the choice is not an easy one. Before making the decision, ask yourself a few questions:

What needs to be done (using the parser)?

Is the application platform-dependent? If so, what is the platform?

Is it a distributed system?

Page 59: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

Comparing capabilities II

Page 60: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

Benchmarks I

Figures: From http://piccolo.sourceforge.net/bench.html

Page 61: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

Benchmarks II

Figures: From http://piccolo.sourceforge.net/bench.html

Page 62: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

Benchmarks III

Figures: From http://www.xml.com/lpt/a/1702

Page 63: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

Benchmarks IV

Figure: From: http://www.ximpleware.com/benchmark1.html

Page 64: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

CASE STUDY

Parsing Really Simple Syndication documents

Page 65: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

RSS definition

RSS is a family of Web feed formats used to publish frequently updated content. An RSS document (which is called a "feed", "web feed" or "channel") contains either a summary of content from an associated web site or the full text, stored as XML. RSS makes it possible for people to keep up with web sites in an automated manner that can be piped into applications or filtered displays.

Source: http://en.wikipedia.org/wiki/RSS

Page 66: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

The initials "RSS" are used to refer to the following formats:

Really Simple Syndication (RSS 2.0)

RDF Site Summary (RSS 1.0 and RSS 0.90)

Rich Site Summary (RSS 0.91)

When creating a solution for reading or writing RSS documents, we must remember that RSS is not a formal standard and has no XML Schema document (or DTD) describing its structure! The only reference can be found at:

http://www.rssboard.org/rss-specification
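Since there is no schema to validate against, a reader has to rely on the element names given in the specification. The following is only a sketch of how item titles could be pulled from an RSS 2.0 feed with the StAX cursor API; it is not the code of the plugin presented next, and the class RssTitles and the sample feed are invented for illustration. Only the element names rss, channel, item and title come from the RSS 2.0 specification.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class, not part of the original presentation.
class RssTitles {

    /** Collects the title of every item, ignoring the channel's own title. */
    static List<String> itemTitles(String feedXml) {
        List<String> titles = new ArrayList<String>();
        try {
            XMLStreamReader cur =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(feedXml));
            try {
                boolean insideItem = false;
                while (cur.hasNext()) {
                    int type = cur.next();
                    if (type == XMLStreamConstants.START_ELEMENT) {
                        if ("item".equals(cur.getLocalName())) {
                            insideItem = true;
                        } else if (insideItem && "title".equals(cur.getLocalName())) {
                            // assumes the title holds plain text (or CDATA) only
                            titles.add(cur.getElementText());
                        }
                    } else if (type == XMLStreamConstants.END_ELEMENT
                            && "item".equals(cur.getLocalName())) {
                        insideItem = false;
                    }
                }
            } finally {
                cur.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return titles;
    }

    public static void main(String[] args) {
        // Made-up sample feed used only for this demonstration.
        String feed =
                "<rss version=\"2.0\"><channel><title>Sample feed</title>"
              + "<item><title>First post</title></item>"
              + "<item><title>Second post</title></item>"
              + "</channel></rss>";
        System.out.println(itemTitles(feed));
    }
}
```

Pull parsing fits this task because the loop keeps only the small amount of state it needs (whether it is inside an item) and never builds a tree of the whole feed.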

Page 67: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

The Code

Presenting jNivo RSS Exterior Plugin v.0.1

Page 68: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

Every API presented so far can be seen as difficult to learn and use. This is partly true: the XML APIs in Java have rather difficult syntax, and hundreds of classes and interfaces that have to be handled to process XML data.

Another issue is that there are several standards:

javax.xml.stream.* (StAX, JSR-173)

org.w3c.dom.* (DOM standard)

org.xml.sax.* (SAX standard)

JAXP

Page 69: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

Mark Reinhold (Chief Engineer for the Java Platform, Standard Edition, at Sun Microsystems) suggested a different way of expressing XML in the Java language (Java Technical Session 3441, TS-3441).

Built in:     java.lang.String    "foo"
New type:     java.lang.XML       <foo\> (syntax!)
New package:  java.lang.xml.*     (XML literals!)

Page 70: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

Proposed syntax I

Figure: From [3]

Page 71: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

Proposed syntax II

Figure: From [3]

Page 72: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

Much more...

Obviously the new syntax is not just syntactic sugar: it helps to enforce the proper structure of the document and prevents instructions from being issued in the wrong order.

Mark Reinhold also proposed:

datatype coders

collections

hybrid event/tree API

accessing by XPath

And more! His blog:

http://blogs.sun.com/mr/

Page 73: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

Three different approaches to XML parsing

SAX - keywords: event-based, callback model, fast, cannot modify structure, interface-based API

DOM - keywords: builds tree in memory, divided into modules, rather slow, can generate and modify documents

StAX - keywords: pull parsing, events caught from a stream, consistent code!, can be used on mobile devices (XmlPull)

RSS parsing? It is difficult to decide on a parsing model; the most efficient choice is an already implemented API, for example ROME

http://rome.dev.java.net

Page 74: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

[1] Brett McLaughlin, Justin Edelson. Java & XML. O'Reilly Media, 3rd edition, 1 December 2006.

[2] Cay S. Horstmann, Gary Cornell. Core Java, Volume II: Advanced Features. Prentice Hall PTR, 8th edition, 7 April 2008.

[3] Mark Reinhold. Integrating XML into the Java Programming Language, TS-3441. http://developers.sun.com/learning/javaoneonline/sessions/2006/TS-3441/index.htm

Page 75: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

[4] Jurgen Salecker. Hybrid Parser Architectural Pattern. http://developerlife.com/tutorials/?p=53

[5] Various APIs documentation (for starters it's good to search Wikipedia...):

Xerces 2 Java Parser: http://xerces.apache.org/xerces2-j/

JAXP reference implementation: https://jaxp.dev.java.net/

XOM - XML Object Model: http://www.xom.nu/

JDOM - Java Document Object Model: http://www.jdom.org/

StAX - Streaming API for XML: http://stax.codehaus.org/

VTD-XML - new way of processing XML: http://vtd-xml.sourceforge.net/

AND OTHERS...

Page 76: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

Questions?

Why?...

What if?...

Page 77: eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java lanaguage

THANK YOU
