Cspp51038
![Page 1: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/1.jpg)
Cspp51038
Parsing XML into programming languages
![Page 2: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/2.jpg)
Parsing XML
• Goal: read XML files into data structures in programming languages
• Possible strategies
– Parse by hand with some reusable libraries
– Parse into generic tree structure
– Parse as sequence of events
– Automagically parse to language-specific objects
![Page 3: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/3.jpg)
Parsing by-hand
• Advantages
– Complete control
– Good if simple needs – build off of regex package
• Disadvantages
– Must write the initial code yourself, even if it becomes generalized
– Pretty tedious and error prone
– Gets very hard when using schema or DTD to validate
– No one does this anymore
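To make the "build off of regex package" idea concrete, here is a minimal sketch of by-hand parsing. The class name HandParseSketch and the <title> element are assumptions made up for illustration; the pattern only copes with attribute-free elements containing plain text, which is exactly the fragility listed under Disadvantages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HandParseSketch {
    // Matches <title>...</title> with no attributes and no nested markup (assumed input shape).
    private static final Pattern TITLE = Pattern.compile("<title>([^<]*)</title>");

    // Returns the trimmed text of every <title> element found in the input.
    public static List<String> titles(String xml) {
        List<String> found = new ArrayList<>();
        Matcher m = TITLE.matcher(xml);
        while (m.find()) {
            found.add(m.group(1).trim());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(titles("<books><book><title>XML Basics</title></book></books>"));
    }
}
```

Anything beyond this (attributes, entities, CDATA, validation) has to be handled by more hand-written code, which is why the later strategies are usually preferred.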
![Page 4: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/4.jpg)
Parsing into generic tree structure
• Advantages
– Industry-wide, language-neutral W3C standard exists called DOM (Document Object Model)
– Learning DOM for one language makes it easy to learn for any other
– As of JAXP 1.2, support for Schema
– Have to write much less code to get XML to something you want to manipulate in your program
• Disadvantages
– Non-intuitive API, doesn’t take full advantage of Java
– Still quite a bit of work
![Page 5: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/5.jpg)
What is JAXP?
• JAXP: Java API for XML Processing
– In the Java language, the definitions of these standard APIs (together with the XSLT API) comprise a set of interfaces known as JAXP
– Java also provides standard implementations together with a vendor pluggability layer
– Some of these come standard with the J2SDK; others are only available with the Web Services Developer Pack
– We will study these shortly
![Page 6: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/6.jpg)
Another alternative
• JDOM: Native Java published API for representing XML as tree
• Like DOM but much more Java-specific, object oriented
• However, not supported by other languages
• Also, no support for schema
• Dom4j another alternative
![Page 7: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/7.jpg)
JAXB
• JAXB: Java Architecture for XML Binding
• Defines an API for automagically representing XML schema as collections of Java classes.
• Most convenient for application programming
• Will cover next class
![Page 8: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/8.jpg)
DOM
![Page 9: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/9.jpg)
About DOM
• Stands for Document Object Model
• A World Wide Web Consortium (w3c) standard
• The standard keeps adding new features – Level 3 Core became a W3C Recommendation in April 2004
• We’ll cover most of the basics. There’s always more, and it’s always changing.
![Page 10: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/10.jpg)
DOM abstraction layer in Java -- architecture
Returns specific parser implementation
org.w3c.dom.Document
Emphasis is on allowing vendors to supply their own DOM implementation without requiring changes to source code
![Page 11: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/11.jpg)
Sample Code
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
/* set some factory options here */
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(xmlFile);
A factory instance is the parser implementation. Can be changed with a runtime System property. The JDK has a default. Xerces much better.
From the factory one obtains an instance of the parser.
xmlFile can be a java.io.File, an InputStream, etc.
For reference: javax.xml.parsers.DocumentBuilderFactory, javax.xml.parsers.DocumentBuilder, org.w3c.dom.Document. Notice that the Document class comes from the W3C-specified bindings.
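A self-contained version of the sample code might look like the sketch below. It parses from a String instead of a file so it runs without any input on disk; the class name DomParseSketch and the sample document are assumptions for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DomParseSketch {
    // Factory -> builder -> Document, the same three steps as on the slide.
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);      // example of a factory option
        factory.setIgnoringComments(true);    // another factory option
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<course id='51038'><title>XML</title></course>");
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

Swapping the InputSource for a java.io.File or an InputStream leaves the rest of the code unchanged.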
![Page 12: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/12.jpg)
Validation
• Note that by default the parser will not validate against a schema or DTD
• As of JAXP 1.2, Java provides a default parser that can handle most schema features
• See next slide for details on how to set up
![Page 13: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/13.jpg)
Important: Schema validation
String JAXP_SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";
Next, you need to configure DocumentBuilderFactory to generate a namespace-aware, validating parser that uses XML Schema:
…
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
factory.setValidating(true);
try {
    factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
} catch (IllegalArgumentException x) {
    // Happens if the parser does not support JAXP 1.2
    ...
}
![Page 14: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/14.jpg)
Associating document with schema
• An XML file can be associated with a schema in two ways
1. Directly in the XML file, in the regular way
2. Programmatically from Java
• The latter is done as:
– factory.setAttribute(JAXP_SCHEMA_SOURCE, new File(schemaSource));
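Putting the validation setup and the programmatic schema association together, a sketch could look like this. Assumptions made for illustration: the class name, the tiny inline schema, and passing the schema as an InputStream rather than a File (the JAXP schemaSource property accepts either). An ErrorHandler is registered because validation errors are otherwise only reported to the console.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class SchemaValidationSketch {
    static final String JAXP_SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String JAXP_SCHEMA_SOURCE = "http://java.sun.com/xml/jaxp/properties/schemaSource";
    static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";

    // Hypothetical schema: a single <count> element holding an integer.
    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='count' type='xs:int'/></xs:schema>";

    // Parses the document with schema validation on and returns the validation error messages.
    public static List<String> validate(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
        factory.setAttribute(JAXP_SCHEMA_SOURCE,
            new ByteArrayInputStream(SCHEMA.getBytes(StandardCharsets.UTF_8)));
        DocumentBuilder builder = factory.newDocumentBuilder();
        final List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors.add(e.getMessage());   // collect instead of aborting the parse
            }
        });
        builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate("<count>3</count>"));
        System.out.println(validate("<count>abc</count>"));
    }
}
```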
![Page 15: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/15.jpg)
A few notes
• Factory allows ease of switching parser implementations
– Java provides a simple DOM implementation, but much better to use vendor-supplied when doing serious work
– Xerces, part of the Apache project, is installed on the cluster as an Eclipse plugin. We’ll use it next week.
– Note that some properties are not supported by all parser implementations.
![Page 16: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/16.jpg)
Document object
• Once a Document object is obtained, rich API to manipulate.
• First call is usually Element root = doc.getDocumentElement();
This gets the root element of the Document as an instance of the Element class
• Note that Element subclasses Node and has methods getNodeType(), getNodeName(), getNodeValue(), and getChildNodes()
![Page 17: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/17.jpg)
Types of Nodes
• Note that there are many types of Nodes (i.e. subclasses of Node): Attr, CDATASection, Comment, Document, DocumentFragment, DocumentType, Element, Entity, EntityReference, Notation, ProcessingInstruction, Text
Each of these has a special and non-obvious associated type, value, and name.
Standards are language-neutral and are specified on chart on following slide
Important: keep this chart nearby when using DOM
![Page 18: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/18.jpg)
| Node | nodeName | nodeValue | attributes | nodeType |
|---|---|---|---|---|
| Attr | Attribute name | Value of attribute | null | 2 |
| CDATASection | #cdata-section | CDATA content | null | 4 |
| Comment | #comment | Comment content | null | 8 |
| Document | #document | null | null | 9 |
| DocumentFragment | #document-fragment | null | null | 11 |
| DocumentType | Doc type name | null | null | 10 |
| Element | Tag name | null | NamedNodeMap | 1 |
| Entity | Entity name | null | null | 6 |
| EntityReference | Name of entity referenced | null | null | 5 |
| Notation | Notation name | null | null | 12 |
| ProcessingInstruction | Target | Entire content excluding the target | null | 7 |
| Text | #text | Actual text | null | 3 |
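The chart can be checked directly from Java. The sketch below (class name and sample document are assumptions) prints the type, name, and value of a few node kinds using getNodeType(), getNodeName(), and getNodeValue().

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NodeChartSketch {
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Formats a node as "<nodeType> <nodeName> <nodeValue>".
    public static String describe(Node node) {
        return node.getNodeType() + " " + node.getNodeName() + " " + node.getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<item id='7'>tea<!--hot--></item>");
        Element root = doc.getDocumentElement();
        System.out.println(describe(doc));                        // Document row
        System.out.println(describe(root));                       // Element row
        System.out.println(describe(root.getAttributeNode("id"))); // Attr row
        System.out.println(describe(root.getFirstChild()));       // Text row
        System.out.println(describe(root.getLastChild()));        // Comment row
    }
}
```

Note that an Element's text lives in a child Text node, so getNodeValue() on the Element itself is null.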
![Page 19: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/19.jpg)
DOM Exercise
Write a function to do a depth-first printout of the node information of a given XML file, called as: recursePrint(root);
Assume you have access to the following:
printNodeInfo(Node node): prints the name, type, and value of the input node.
boolean Node.hasChildNodes(): to check if a node has any children
NodeList Node.getChildNodes(): to get a list of all child nodes
Node NodeList.item(int num): to select the num’th child node
public static void recursePrint(Node node){ }
![Page 20: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/20.jpg)
DOM Exercise Answer
public static void recursePrint(Node node){
    printNodeInfo(node);
    if (!node.hasChildNodes()) return;
    NodeList nodes = node.getChildNodes();
    for (int i = 0; i < nodes.getLength(); ++i){
        recursePrint(nodes.item(i));
    }
}
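For experimenting with the exercise, here is a runnable variant. It is a sketch, not the course solution: the class name, the extra depth and StringBuilder parameters, and the two-space indentation are assumptions added so the traversal order is visible in the output.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RecursePrintSketch {
    // Depth-first traversal: visit the node, then each child in document order.
    public static void recursePrint(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.getNodeName()).append('\n');
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            recursePrint(children.item(i), depth + 1, out);
        }
    }

    // Parses the XML text and returns the indented node names, starting at the root element.
    public static String dump(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        recursePrint(doc.getDocumentElement(), 0, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(dump("<a><b><c/></b><d>text</d></a>"));
    }
}
```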
![Page 21: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/21.jpg)
Transforming XML
![Page 22: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/22.jpg)
The JAXP Transformation Packages
• JAXP Transformation APIs:
– javax.xml.transform
  • This package defines the factory class you use to get a Transformer object. You then configure the transformer with input (Source) and output (Result) objects, and invoke its transform() method to make the transformation happen. The source and result objects are created using classes from one of the other three packages.
– javax.xml.transform.dom
  • Defines the DOMSource and DOMResult classes that let you use a DOM as an input to or output from a transformation.
– javax.xml.transform.sax
  • Defines the SAXSource and SAXResult classes that let you use a SAX event generator as input to a transformation, or deliver SAX events as output to a SAX event processor.
– javax.xml.transform.stream
  • Defines the StreamSource and StreamResult classes that let you use an I/O stream as an input to or output from a transformation.
![Page 23: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/23.jpg)
Transformer Architecture
![Page 24: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/24.jpg)
Writing DOM to XML
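import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class WriteDOM{
    public static void main(String[] argv) throws Exception{
        // Parse the input file into a DOM tree
        File f = new File(argv[0]);
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.parse(f);

        // A Transformer created without a stylesheet copies source to result unchanged
        TransformerFactory tFactory = TransformerFactory.newInstance();
        Transformer transformer = tFactory.newTransformer();
        DOMSource source = new DOMSource(document);
        StreamResult result = new StreamResult(System.out);
        transformer.transform(source, result);
    }
}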
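The same wiring can write to a String instead of System.out, which is handy for tests. This is a sketch under assumptions: the class and method names are made up, and the XML declaration is suppressed with OutputKeys.OMIT_XML_DECLARATION only to keep the output short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class WriteDomSketch {
    // Serializes a DOM tree to text with an identity Transformer.
    public static String toXml(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Parses text into a DOM and writes it back out again.
    public static String roundTrip(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return toXml(doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("<a><b>x</b></a>"));
    }
}
```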
![Page 25: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/25.jpg)
Creating a DOM from scratch
• Sometimes you may want to create a DOM tree directly in memory. This is done with:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.newDocument();
![Page 26: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/26.jpg)
Manipulating Nodes
• Once the root node is obtained, typical tree methods exist to manipulate other elements:
boolean node.hasChildNodes()
NodeList node.getChildNodes()
Node node.getNextSibling()
Node node.getParentNode()
String node.getNodeValue();
String node.getNodeName();
String node.getTextContent();
void setNodeValue(String nodeValue);
Node insertBefore(Node newChild, Node refChild);
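A short sketch of building a tree in memory and then rearranging it with these methods. The element names echo the sequence/number example used later for JDOM; the class name, the "kind" attribute, and the helper values() are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class DomBuildSketch {
    // Builds <sequence kind="fibonacci"><number>3</number><number>5</number></sequence>.
    public static Document build() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("sequence");
        doc.appendChild(root);

        Element five = doc.createElement("number");
        five.setTextContent("5");
        root.appendChild(five);

        Element three = doc.createElement("number");
        three.setTextContent("3");
        root.insertBefore(three, five);   // place 3 ahead of the existing child 5

        root.setAttribute("kind", "fibonacci");
        return doc;
    }

    // Walks the root's children with getFirstChild()/getNextSibling().
    public static List<String> values(Document doc) {
        List<String> out = new ArrayList<>();
        for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            out.add(n.getTextContent());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(values(build()));
    }
}
```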
![Page 27: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/27.jpg)
JDOM
![Page 28: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/28.jpg)
JDOM Motivation (from Elliotte Rusty Harold)
• Unfortunately DOM suffers from a number of design flaws and limitations that make it less than ideal as a Java API for processing XML
– DOM had to be backwards compatible with the hackish, poorly thought out, unplanned object models used in third generation web browsers.
– DOM was designed by a committee trying to reconcile differences between the object models implemented by Netscape, Microsoft, and other vendors. They needed a solution that was at least minimally acceptable to everybody, which resulted in an API that’s maximally acceptable to no one.
– DOM is a cross-language API defined in IDL, and thus limited to those features and classes that are available in essentially all programming languages, including not fully object-oriented scripting languages like JavaScript and Visual Basic. It is a lowest common denominator API. It does not take full advantage of Java, nor does it adhere to Java best practices, naming conventions, and coding standards.
– DOM must work for both HTML (not just XHTML, but traditional malformed HTML) and XML.
![Page 29: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/29.jpg)
Some sample JDOM
Creating the element <fibonacci/>
In JDOM:
Element element = new Element("fibonacci");
In DOM:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
DOMImplementation impl = builder.getDOMImplementation();
Document doc = impl.createDocument(null, "Fibonacci_Numbers", null);
Element element = doc.createElement("fibonacci");
In JDOM, adding text and an attribute:
Element element = new Element("fibonacci");
element.setText("8");
element.setAttribute("index", "6");
Extremely simple and intuitive!
![Page 30: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/30.jpg)
More JDOM
• To create this element
<sequence>
<number>3</number>
<number>5</number>
</sequence>
Element element = new Element("sequence");
Element firstNumber = new Element("number");
Element secondNumber = new Element("number");
firstNumber.setText("3");
secondNumber.setText("5");
element.addContent(firstNumber);
element.addContent(secondNumber);
![Page 31: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/31.jpg)
Parsing XML file with JDOM
import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.io.IOException;
import java.util.*;

public class ElementLister {
    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("Usage: java ElementLister URL");
            return;
        }
        SAXBuilder builder = new SAXBuilder();
        try {
            Document doc = builder.build(args[0]);
            Element root = doc.getRootElement();
            listChildren(root, 0);
        }
        // indicates a well-formedness error
        catch (JDOMException e) {
            System.out.println(args[0] + " is not well-formed.");
            System.out.println(e.getMessage());
        }
        catch (IOException e) {
            System.out.println(e);
        }
    }

    public static void listChildren(Element current, int depth) {
        printSpaces(depth);
        System.out.println(current.getName());
        List children = current.getChildren();
        Iterator iterator = children.iterator();
        while (iterator.hasNext()) {
            Element child = (Element) iterator.next();
            listChildren(child, depth+1);
        }
    }

    private static void printSpaces(int n) {
        for (int i = 0; i < n; i++) {
            System.out.print(' ');
        }
    }
}
![Page 32: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/32.jpg)
SAX
Simple API for XML
![Page 33: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/33.jpg)
About SAX
• SAX in Java is hosted on SourceForge
• SAX is not a w3c standard
• Originated purely in Java
• Other languages have chosen to implement in their own ways based on this prototype
![Page 34: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/34.jpg)
SAX vs. …
• Please don’t compare unrelated things:
– SAX is an alternative to DOM, but realize that DOM is often built on top of SAX
– SAX and DOM do not compete with JAXP
– They do both compete with JAXB implementations
![Page 35: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/35.jpg)
How a SAX parser works
• SAX parser scans an xml stream on the fly and responds to certain parsing events as it encounters them.
• This is very different than digesting an entire XML document into memory.
• Much faster, requires less memory.
• However, need to reparse if you need to revisit data.
![Page 36: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/36.jpg)
Obtaining a SAX parser
• Important classes
javax.xml.parsers.SAXParserFactory;
javax.xml.parsers.SAXParser;
javax.xml.parsers.ParserConfigurationException;
//get the parser
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
//parse the document
saxParser.parse( new File(argv[0]), handler);
![Page 37: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/37.jpg)
DefaultHandler
• Note that an event handler has to be passed to the SAX parser.
• This must implement the interface
org.xml.sax.ContentHandler;
• Easier to extend the adapter
org.xml.sax.helpers.DefaultHandler
![Page 38: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/38.jpg)
Overriding Handler methods
• Most important methods to override
– void startDocument()
  • Called once when document parsing begins
– void endDocument()
  • Called once when parsing ends
– void startElement(...)
  • Called each time an element begin tag is encountered
– void endElement(...)
  • Called each time an element end tag is encountered
– void characters(...)
  • Called zero or more times between startElement and endElement calls to deliver accumulated character data; the parser decides how the text is chunked
![Page 39: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/39.jpg)
startElement
• public void startElement(
String namespaceURI, //if namespace assoc
String sName, //nonqualified name
String qName, //qualified name
Attributes attrs) //list of attributes
• Attribute info is obtained by querying Attributes objects.
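As a sketch of querying the Attributes object, the handler below records every attribute it sees as element@name=value. The class name and the output format are assumptions; the parser is left non-namespace-aware (the factory default), so qName carries the tag name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class AttributeSketch extends DefaultHandler {
    private final List<String> seen = new ArrayList<>();

    @Override
    public void startElement(String namespaceURI, String sName, String qName, Attributes attrs) {
        // Attributes is index-based: getLength(), getQName(i), getValue(i)
        for (int i = 0; i < attrs.getLength(); i++) {
            seen.add(qName + "@" + attrs.getQName(i) + "=" + attrs.getValue(i));
        }
    }

    // Parses the XML text and returns one entry per attribute encountered.
    public static List<String> attributesOf(String xml) throws Exception {
        AttributeSketch handler = new AttributeSketch();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(attributesOf("<student id='1' name='Ann'><grade term='fall'/></student>"));
    }
}
```

Attribute values can also be looked up by name with attrs.getValue("id"), which returns null when the attribute is absent.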
![Page 40: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/40.jpg)
Characters
• public void characters(
char buf[], //buffer of chars accumulated
int offset, //begin element of chars
int len) //number of chars
• Note, characters may be called more than once between begin tag / end tag
• Also, mixed-content elements require careful handling
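Because characters() may fire several times for one element, a common pattern is to append into a buffer and only read it in endElement(). The sketch below assumes a hypothetical document with <name> elements; the class name is made up as well.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        buffer.setLength(0);                 // start fresh for each element
    }

    @Override
    public void characters(char[] buf, int offset, int len) {
        buffer.append(buf, offset, len);     // may be one of several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("name".equals(qName)) {
            names.add(buffer.toString().trim());
        }
    }

    // Returns the text of every <name> element in document order.
    public static List<String> namesIn(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(namesIn("<students><student><name>Ann</name></student></students>"));
    }
}
```

Resetting the buffer on every startElement is only safe for elements without mixed content; mixed-content documents usually need a stack of buffers instead.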
![Page 41: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/41.jpg)
Entity references
• Recall that entity references are special character sequences for referring to characters that have special meaning in XML syntax
– ‘<’ is &lt;
– ‘>’ is &gt;
• In SAX these are automatically converted and passed to the characters stream unless they are part of a CDATA section
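A small sketch that shows both behaviours: an entity reference in ordinary text arrives in characters() already converted, while the same sequence inside a CDATA section is passed through literally. The class and method names are assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EntitySketch {
    // Concatenates all character data reported by the SAX parser.
    public static String text(String xml) throws Exception {
        final StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] buf, int offset, int len) {
                out.append(buf, offset, len);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(text("<a>1 &lt; 2</a>"));            // entity is converted
        System.out.println(text("<a><![CDATA[1 &lt; 2]]></a>")); // CDATA is left as written
    }
}
```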
![Page 42: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/42.jpg)
Choosing a Parser
• Choosing your parser implementation
– If no other factory class is specified, the default SAXParserFactory class is used. To use a different manufacturer's parser, you can change the value of the system property that points to it. You can do that from the command line, like this:
java -Djavax.xml.parsers.SAXParserFactory=yourFactoryHere ...
• The factory name you specify must be a fully qualified class name (all package prefixes included). For more information, see the documentation of the newInstance() method of the SAXParserFactory class.
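To see which implementation is actually picked up, a quick check such as the following can help. It is only a sketch: the class name is made up, and the printed factory class depends on the JDK and classpath in use.

```java
import javax.xml.parsers.SAXParserFactory;

public class ParserChoiceSketch {
    // Name of the concrete factory class that newInstance() resolved to.
    public static String factoryInUse() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // null unless -Djavax.xml.parsers.SAXParserFactory=... was given
        System.out.println("property: " + System.getProperty("javax.xml.parsers.SAXParserFactory"));
        System.out.println("factory:  " + factoryInUse());
    }
}
```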
![Page 43: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/43.jpg)
Validating SAX Parsers
String JAXP_SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";
Next, you need to configure SAXParserFactory to generate a namespace-aware, validating parser that uses XML Schema. With SAX the schema language is set as a property on the SAXParser, since SAXParserFactory has no setAttribute method:
…
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setNamespaceAware(true);
factory.setValidating(true);
SAXParser saxParser = factory.newSAXParser();
try {
    saxParser.setProperty(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
} catch (SAXNotRecognizedException x) {
    // Happens if the parser does not support JAXP 1.2
    ...
}
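A runnable sketch of schema validation with SAX, under assumptions: the class name and the tiny inline schema are invented, and the schema is supplied through the JAXP schemaSource property as an InputStream. The properties are set on the SAXParser with setProperty(), which is the SAX counterpart of the factory attribute used for DOM.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxValidationSketch {
    static final String JAXP_SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String JAXP_SCHEMA_SOURCE = "http://java.sun.com/xml/jaxp/properties/schemaSource";
    static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";

    // Hypothetical schema: a single <count> element holding an integer.
    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='count' type='xs:int'/></xs:schema>";

    // Parses with validation on and returns how many validation errors were reported.
    public static int countErrors(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();
        parser.setProperty(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
        parser.setProperty(JAXP_SCHEMA_SOURCE,
            new ByteArrayInputStream(SCHEMA.getBytes(StandardCharsets.UTF_8)));
        final int[] errors = {0};
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
            new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors[0]++;             // validation errors arrive here
                }
            });
        return errors[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countErrors("<count>3</count>"));
        System.out.println(countErrors("<count>abc</count>"));
    }
}
```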
![Page 44: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/44.jpg)
Transforming arbitrary data structures using SAX and Transformer
![Page 45: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/45.jpg)
Goal
• Now that we know SAX and a little about Transformations, there are some cool things we can do.
• One immediate thing is to create xml files from plain text files using the help of a faux SAX parser
• Turns out to be more robust than doing by hand
![Page 46: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/46.jpg)
Transformers
• Recall that transformers easily let us go between any source and result by arbitrary wirings of
– StreamSource / StreamResult
– SAXSource / SAXResult
– DOMSource / DOMResult
• We used this to write a DOM tree to an XML file
• Now we will use a SAXSource together with a StreamResult to convert our text file
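As one more example of such a wiring, a StreamSource can be connected to a DOMResult, which amounts to parsing text into a DOM tree through the Transformer. The class and method names below are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

public class WiringSketch {
    // StreamSource in, DOMResult out: the identity transform builds the tree.
    public static Document streamToDom(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        DOMResult result = new DOMResult();   // an empty result gets a new Document
        transformer.transform(new StreamSource(new StringReader(xml)), result);
        return (Document) result.getNode();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(streamToDom("<a><b/></a>").getDocumentElement().getNodeName());
    }
}
```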
![Page 47: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/47.jpg)
Strategy
• We construct our own SAXParser – ie a class that implements the XMLReader interface
• This class must have a parse method (among others)
• We use parse to read our input file and fire the appropriate SAX events, rather than handcoding the Strings ourselves.
![Page 48: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/48.jpg)
Main snippet
public static void main(String[] argv) throws Exception {
    // Create SAX “parser”
    StudentReader saxReader = new StudentReader();
    // Create transformer
    TransformerFactory tFactory = TransformerFactory.newInstance();
    Transformer transformer = tFactory.newTransformer();
    // Use text file as Transformer source
    FileReader fr = new FileReader("students.txt");
    BufferedReader br = new BufferedReader(fr);
    InputSource inputSource = new InputSource(br);
    SAXSource source = new SAXSource(saxReader, inputSource);
    // Use text (System.out) as result
    StreamResult result = new StreamResult(System.out);
    transformer.transform(source, result);
}
![Page 49: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/49.jpg)
XMLReader implementation
• To have a valid SAXSource we need a class that implements XMLReader interface
public void parse(InputSource input)
public void setContentHandler(ContentHandler handler)
public ContentHandler getContentHandler()
...
•Shown are the important methods for a simple app
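The course's StudentReader is not shown in these slides, so the following is only a sketch of what such a faux parser could look like. Assumptions: the class name LineReaderSketch, an input format of one student name per line, and the element names students/student. The remaining XMLReader methods are simple stubs, which is enough for an identity Transformer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.*;
import org.xml.sax.helpers.AttributesImpl;

public class LineReaderSketch implements XMLReader {
    private ContentHandler handler;
    private ErrorHandler errorHandler;
    private EntityResolver resolver;
    private DTDHandler dtdHandler;

    // Reads plain text and fires the SAX events a real parser would fire for the XML form.
    @Override
    public void parse(InputSource input) throws IOException, SAXException {
        if (handler == null) throw new SAXException("no ContentHandler set");
        BufferedReader in = new BufferedReader(input.getCharacterStream());
        Attributes none = new AttributesImpl();
        handler.startDocument();
        handler.startElement("", "students", "students", none);
        String line;
        while ((line = in.readLine()) != null) {
            String name = line.trim();
            if (name.isEmpty()) continue;
            handler.startElement("", "student", "student", none);
            handler.characters(name.toCharArray(), 0, name.length());
            handler.endElement("", "student", "student");
        }
        handler.endElement("", "students", "students");
        handler.endDocument();
    }

    @Override
    public void parse(String systemId) throws SAXException {
        throw new SAXException("this sketch only reads character streams");
    }

    @Override public void setContentHandler(ContentHandler h) { handler = h; }
    @Override public ContentHandler getContentHandler() { return handler; }
    @Override public void setErrorHandler(ErrorHandler h) { errorHandler = h; }
    @Override public ErrorHandler getErrorHandler() { return errorHandler; }
    @Override public void setEntityResolver(EntityResolver r) { resolver = r; }
    @Override public EntityResolver getEntityResolver() { return resolver; }
    @Override public void setDTDHandler(DTDHandler h) { dtdHandler = h; }
    @Override public DTDHandler getDTDHandler() { return dtdHandler; }
    // Features and properties are accepted and ignored in this sketch.
    @Override public void setFeature(String name, boolean value) { }
    @Override public boolean getFeature(String name) { return false; }
    @Override public void setProperty(String name, Object value) { }
    @Override public Object getProperty(String name) { return null; }

    // Wires the faux parser into a SAXSource and lets the Transformer write the XML.
    public static String toXml(String text) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        SAXSource source = new SAXSource(new LineReaderSketch(),
            new InputSource(new StringReader(text)));
        StringWriter out = new StringWriter();
        transformer.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml("Ann\nBob\n"));
    }
}
```

Letting the Transformer serialize the events means escaping and well-formedness are handled by the library rather than by hand-built Strings.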
![Page 50: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/50.jpg)
See Course Examples for details
![Page 51: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/51.jpg)
JAXB
Java Architecture for XML Binding
![Page 52: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/52.jpg)
What is JAXB?
• JAXB defines the behavior of a standard set of tools and interfaces that automatically generate java class files from XML schema
• JAXB is a framework or architecture, not an implementation.
• Sun provides a reference implementation of JAXB with the Web Services Developer Pack, available as a separate download: http://java.sun.com/webservices/downloads/webservicespack.html
![Page 53: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/53.jpg)
JAXB vs. DOM and SAX
• JAXB is a higher level construct than DOM or SAX
– DOM represents XML documents as generic trees
– SAX represents XML documents as generic event streams
– JAXB represents XML documents as Java classes with properties that are specific to the particular XML document
  • E.g. book.xml becomes Book.java with getTitle, setTitle, etc.
• JAXB thus requires almost no knowledge of XML to be able to programmatically process XML documents!
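To illustrate the idea of binding without depending on a JAXB installation, the sketch below writes by hand what a binding compiler would generate and fill in automatically. It does not use JAXB at all: the Book class, the <book>/<title> document shape, and the fromXml() helper are assumptions for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class BindingIdeaSketch {
    // Document-specific class: this is the kind of type a binding tool produces.
    public static class Book {
        private String title;
        public String getTitle() { return title; }
        public void setTitle(String title) { this.title = title; }
    }

    // Hand-written "unmarshalling" via DOM; with JAXB this mapping code is generated.
    public static Book fromXml(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        Book book = new Book();
        book.setTitle(doc.getElementsByTagName("title").item(0).getTextContent());
        return book;
    }

    public static void main(String[] args) throws Exception {
        Book book = fromXml("<book><title>Learning XML</title></book>");
        System.out.println(book.getTitle());
    }
}
```

Application code then works with getTitle() and setTitle() only; the generic tree calls stay hidden inside the mapping layer.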
![Page 54: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/54.jpg)
High-level comparison
• Before diving into details of JAXB, it’s good to see a bird’s-eye-view of the difference between JAXB and SAX and/or DOM-like parsers
• Study the books/ examples under the examples/jaxb directory on the course website
![Page 55: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/55.jpg)
JAXB steps
• We start by assuming that you have a valid installation of java web services developers pack version 3. We cover these installation details later
• Using JAXB then requires several steps:
1. Run the binding compiler on the schema file to automagically produce the appropriate Java class files
2. Compile the Java class files (ant tool helps here)
3. Study the autogenerated API to learn what Java types have been created
4. Create a program that unmarshals an XML document into these elementary data structures
![Page 56: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/56.jpg)
Running binding compiler
• <install_dir>/jaxb/bin/xjc.sh -p test.jaxb books.xsd -d work
– xjc.sh : executes binding compiler
– -p test.jaxb : place resulting class files in package test.jaxb
– books.xsd : run compiler on schema books.xsd
– -d work : place resulting files in directory called work/
• Note that this creates a huge number of files that together represent the content of the books.xsd schema as a set of Java classes
• It is not necessary to know all of these classes. We’ll study them only at a high level so we can understand how to use them
![Page 57: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/57.jpg)
Example: students.xsd
![Page 58: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/58.jpg)
Generated interfaces
• xjc.sh -p test.lottery students.xsd
• This generates the following interfaces
– test/lottery/ObjectFactory.java
  • Contains methods for generating instances of the interfaces
– test/lottery/Students.java
  • Represents the root node <students>
– test/lottery/StudentsType.java
  • Represents the unnamed type of each student object
![Page 59: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/59.jpg)
Generated implementations
• Each interface is implemented in the impl directory
– test/lottery/impl/StudentsImpl.java
  • Vendor-specific implementation of the Students interface
– test/lottery/impl/StudentsTypeImpl.java
  • Vendor-specific implementation of the StudentsType interface
![Page 60: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/60.jpg)
Compilation
• Next, the generated classes must be compiled:
– javac students/*.java students/impl/*.java
• CLASSPATH requires many jar files:
– jaxb/lib/*.jar
– jwsdp-shared/lib/*.jar
– jaxp/lib/**/*.jar
• Note: an ant buildfile (like a Java makefile) makes this much easier. More on this later
![Page 61: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/61.jpg)
Generated docs
• Java API docs for these classes are generated in
– students/docs/api/*.html
• After bindings are generated, one usually works directly through these API docs to learn how to access/manipulate the XML data.
![Page 62: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/62.jpg)
Sample Programs
![Page 63: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/63.jpg)
Sample Programs
• Easiest way to learn is to cover certain generic sample cases. These are all on the course website under ace104/lesson6/examples
• Summary of examples:
– student/
  • Use JAXB to read an XML document composed of a single student complex type
– student/
  • Same, but for an XML document composed of a sequence of such student types of indefinite length
– purchaseOrder/
  • Another read example, but for a more complex schema
![Page 64: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/64.jpg)
Sample programs, cont
• Course examples, cont
– create-marshal
  • Purchase-order example modified to create in memory and write to XML
– modify-marshal
  • Purchase-order example modified to read XML, change it and write back to XML
• Study these examples!
![Page 65: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/65.jpg)
Some additional JAXB details
![Page 66: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/66.jpg)
Binding Data Types
• Default Java datatype bindings can be found at: http://java.sun.com/webservices/docs/1.3/tutorial/doc/JAXBWorks5.html
• These defaults can be changed if required for an application
• Also, name bindings are fairly standard changes of names to things acceptable in the Java programming language
• See other binding rules on subsequent pages
![Page 67: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/67.jpg)
Default binding rules summary
• The JAXB binding model follows the default binding rules summarized below:
• Bind the following to Java package:
– XML Namespace URI
• Bind the following XML Schema components to Java content interface:
– Named complex type
– Anonymous inlined type definition of an element declaration
• Bind to typesafe enum class:
– A named simple type definition with a basetype that derives from "xsd:NCName" and has enumeration facets.
• Bind the following XML Schema components to a Java Element interface:
– A global element declaration to an Element interface.
– Local element declaration that can be inserted into a general content list.
• Bind to Java property:
– Attribute use
– Particle with a term that is an element reference or local element declaration.
• Bind model group with a repeating occurrence and complex type definitions with mixed {content type} to:
– A general content property; a List content-property that holds Java instances representing element information items and character data items.
![Page 68: Cspp51038](https://reader035.fdocuments.us/reader035/viewer/2022081508/56813a9d550346895da2995c/html5/thumbnails/68.jpg)
End