Chapter 24

XML

CHAPTER GOALS

• Understanding XML elements and attributes

• Understanding the concept of an XML parser

• Being able to read and write XML documents

• Being able to design Document Type Definitions for XML documents

XML

• Stands for Extensible Markup Language

• Lets you encode complex data in a form that the recipient can parse easily

• Is independent from any programming language

XML Encoding of Coin Data

<coin>
   <value>0.5</value>
   <name>half dollar</name>
</coin>

Advantages of XML

• XML files are readable by both computers and humans

• XML formatted data is resilient to change

o It is easy to add new data elements

o Old programs can process the old information in the new data format
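The resilience claim above can be sketched in code. This is a minimal illustration, not part of the original slides: the class name CoinReader, the method readValue, and the extra <year> element are all assumptions made for the example. Code written for the original coin format keeps working when a newer document adds an element it does not know about.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical reader that only knows about the <value> element.
class CoinReader
{
   /** Returns the number stored in the first <value> element. */
   static double readValue(String xml) throws Exception
   {
      DocumentBuilder builder
         = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      Document doc = builder.parse(
         new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      Element value = (Element) doc.getElementsByTagName("value").item(0);
      return Double.parseDouble(value.getTextContent().trim());
   }

   public static void main(String[] args) throws Exception
   {
      String oldFormat
         = "<coin><value>0.5</value><name>half dollar</name></coin>";
      // A newer format adds a <year> element (an assumption for this sketch).
      String newFormat
         = "<coin><value>0.5</value><name>half dollar</name><year>1964</year></coin>";
      System.out.println(readValue(oldFormat));
      System.out.println(readValue(newFormat));
   }
}
```

Both calls read the same value, because the reader looks elements up by name and ignores everything else.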

Differences Between XML and HTML

• Both are descendants of SGML (Standard Generalized Markup Language)

• XML is a simplified version of SGML

• XML is very strict but HTML (as used today) is not

• XML tells what the data means; HTML tells how to display data

Differences Between XML and HTML

• XML tags are case-sensitive

o <LI> is different from <li>

• Every XML start tag must have a matching end tag

• If a tag has no end tag, it must end in />

o <img src="hamster.jpeg"/>

• XML attribute values must be enclosed in quotes

o <img src="hamster.jpeg" width="400" height="300"/>
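The strictness rules above can be checked with the standard JAXP parser. The following is a sketch under assumptions: the class name WellFormedCheck and the helper isWellFormed are invented for illustration. It treats any SAXException raised during parsing as "not well formed".

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string is well-formed XML.
class WellFormedCheck
{
   static boolean isWellFormed(String xml)
   {
      try
      {
         DocumentBuilder builder
            = DocumentBuilderFactory.newInstance().newDocumentBuilder();
         // DefaultHandler rethrows fatal errors without printing them.
         builder.setErrorHandler(new DefaultHandler());
         builder.parse(new InputSource(new StringReader(xml)));
         return true;
      }
      catch (SAXException e)
      {
         return false; // the parser rejected the document
      }
      catch (ParserConfigurationException | IOException e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      System.out.println(isWellFormed("<ul><li>one</li></ul>"));      // matching tags
      System.out.println(isWellFormed("<ul><LI>one</li></ul>"));      // case mismatch
      System.out.println(isWellFormed("<img src=\"hamster.jpeg\"/>")); // self-closing tag
      System.out.println(isWellFormed("<img src=\"hamster.jpeg\">")); // missing end tag
      System.out.println(isWellFormed("<img src=hamster.jpeg/>"));    // unquoted attribute
   }
}
```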

Structure of an XML Document

• An XML data set is called a document

• The document starts with a header

<?xml version="1.0"?>

• The data are contained in a root element

<?xml version="1.0"?>
<purse>
   more data
</purse>

• The document contains elements and text

Structure of an XML Document

• An XML element has one of two forms

<elementTag optional attributes> contents </elementTag>

or

<elementTag optional attributes/>

• The contents can be elements or text or both

• An example of an element with both elements and text (mixed content):

<p>Use XML for <strong>robust</strong> data formats.</p>

• Avoid mixed content for data descriptions

Structure of an XML Document

• An element can have attributes

• The a element in HTML has an href attribute

<a href="http://java.sun.com"> ... </a>

• An attribute has a name (such as href) and a value

• The attribute value is enclosed in either single or double quotes

• An attribute is intended to provide information about the element content

<value currency="USD">0.5</value>

or

<value currency="EUR">0.5</value>

• An element can have multiple attributes

Parsing XML Documents

• A parser is a program that

o Reads a document

o Checks whether it is syntactically correct

o Takes some action as it processes the document

• There are two kinds of XML parsers

o SAX (Simple API for XML)

o DOM (Document Object Model)

Parsing XML Documents

• SAX parser

o Event-driven

o It calls a method you provide to process each construct it encounters

o More efficient for handling large XML documents

• DOM parser

o Builds a tree that represents the document

o When the parser is done, you can analyze the tree

o Easier to use for most applications
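The rest of the chapter uses DOM, so here is a brief sketch of the event-driven SAX style for comparison. The class name ItemCounter and the method countItems are assumptions for this example. The parser calls startElement once for each start tag, and the handler counts the <item> tags without ever building a tree.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX handler that counts <item> elements.
class ItemCounter extends DefaultHandler
{
   private int count = 0;

   // Called by the parser for every start tag it encounters.
   @Override
   public void startElement(String uri, String localName,
      String qName, Attributes attributes)
   {
      if (qName.equals("item")) count++;
   }

   static int countItems(String xml) throws Exception
   {
      SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
      ItemCounter handler = new ItemCounter();
      parser.parse(
         new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
         handler);
      return handler.count;
   }

   public static void main(String[] args) throws Exception
   {
      System.out.println(countItems("<items><item/><item/></items>"));
   }
}
```

Because nothing is kept in memory beyond the counter, this style suits very large documents. The price is that the program must track any context it needs by itself.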

JAXP

• Stands for Java API for XML Processing

• Provides a standard mechanism for DOM parsers to read and create documents

• Part of Java 1.4 and above

• Earlier versions require additional libraries to be downloaded

Parsing XML Documents

• The Document interface describes the tree structure of an XML document

• A DocumentBuilder can generate an object of a class that implements the Document interface

• Get a DocumentBuilderFactory by calling the static newInstance method of the DocumentBuilderFactory class

• Call the newDocumentBuilder method of the factory to get a DocumentBuilder

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();

Parsing XML Documents

• To read a document from a file

String fileName = . . . ;
File f = new File(fileName);
Document doc = builder.parse(f);

• To read a document from a URL on the Internet (DocumentBuilder has no parse method that takes a URL object; pass the URL as a string)

String urlName = . . . ;
Document doc = builder.parse(urlName);

• To read from an input stream

InputStream in = . . . ;
Document doc = builder.parse(in);
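The input sources listed above can be tried without a network connection. This is a minimal sketch in which the class name ParseSources and its helper methods are assumptions. It parses the same small document once from a temporary file and once from an in-memory input stream.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helpers that wrap the parse(File) and parse(InputStream) overloads.
class ParseSources
{
   static DocumentBuilder newBuilder() throws Exception
   {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder();
   }

   static Document fromFile(File f) throws Exception
   {
      return newBuilder().parse(f);
   }

   static Document fromStream(InputStream in) throws Exception
   {
      return newBuilder().parse(in);
   }

   public static void main(String[] args) throws Exception
   {
      byte[] xml = "<purse><coin/></purse>".getBytes(StandardCharsets.UTF_8);

      // Parse from a temporary file.
      File f = File.createTempFile("purse", ".xml");
      f.deleteOnExit();
      Files.write(f.toPath(), xml);
      System.out.println(fromFile(f).getDocumentElement().getTagName());

      // Parse from an input stream.
      try (InputStream in = new ByteArrayInputStream(xml))
      {
         System.out.println(fromStream(in).getDocumentElement().getTagName());
      }
   }
}
```

For a document on the Internet, the parse(String uri) overload of DocumentBuilder accepts the URL as a string.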

Parsing XML Documents

• You can inspect or modify the document

• The document tree consists of nodes

• Two node types are Element and Text

• Element and Text are subinterfaces of the Node interface

An XML Document

<?xml version="1.0"?>
<items>
   <item>
      <product>
         <description>Ink Jet Refill Kit</description>
         <price>29.95</price>
      </product>
      <quantity>8</quantity>
   </item>
   <item>
      <product>
         <description>4-port Mini Hub</description>
         <price>19.95</price>
      </product>
      <quantity>4</quantity>
   </item>
</items>

Tree View of XML Document

Parsing XML Documents

• Start inspection of the tree by getting the root element

Element root = doc.getDocumentElement();

• To get the child elements of an element

o Use the getChildNodes method of the Element interface

o The nodes are stored in an object of a class that implements the NodeList interface

• Use a NodeList to visit the child nodes of an element

o The getLength method gives the number of nodes

o The item method gets an item in the node list

• Code to get a child node

NodeList nodes = root.getChildNodes();
int i = . . . ; // a value between 0 and nodes.getLength() - 1
Node child = nodes.item(i);

• The XML parser keeps all white space if you don't use a DTD

o You can include a test to ignore the white space
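The white space test mentioned above can be sketched as follows. The class name ChildElements and its methods are assumptions for illustration. The helper walks the NodeList and keeps only the nodes that are elements, so the Text nodes holding line breaks and indentation are skipped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper that filters the white space nodes out of a child list.
class ChildElements
{
   static Document parse(String xml) throws Exception
   {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
         .parse(new InputSource(new StringReader(xml)));
   }

   /** Returns only those children of parent that are elements. */
   static List<Element> childElements(Element parent)
   {
      List<Element> result = new ArrayList<>();
      NodeList nodes = parent.getChildNodes();
      for (int i = 0; i < nodes.getLength(); i++)
      {
         Node child = nodes.item(i);
         if (child instanceof Element) result.add((Element) child);
      }
      return result;
   }

   public static void main(String[] args) throws Exception
   {
      Element root = parse("<items>\n  <item/>\n  <item/>\n</items>")
         .getDocumentElement();
      // All child nodes, including the white space Text nodes
      System.out.println(root.getChildNodes().getLength());
      // Element children only
      System.out.println(childElements(root).size());
   }
}
```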

Parsing XML Documents

• Get an element name with the getTagName method

Element priceElement = . . . ;
String name = priceElement.getTagName();

• To find the value of the currency attribute

String attributeValue = priceElement.getAttribute("currency");

• You can also iterate through all attributes

o Use a NamedNodeMap

o Each attribute is stored in a Node
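Iterating through the attributes might look like the sketch below. The class name AttributeLister and the method attributesOf are assumptions. It copies each attribute's name and value from the NamedNodeMap into a sorted map.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper that collects all attributes of an element.
class AttributeLister
{
   static Element parseRoot(String xml) throws Exception
   {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
         .parse(new InputSource(new StringReader(xml)))
         .getDocumentElement();
   }

   /** Returns the attributes of e as a map from name to value, sorted by name. */
   static Map<String, String> attributesOf(Element e)
   {
      Map<String, String> result = new TreeMap<>();
      NamedNodeMap attributes = e.getAttributes();
      for (int i = 0; i < attributes.getLength(); i++)
      {
         Node attribute = attributes.item(i);
         result.put(attribute.getNodeName(), attribute.getNodeValue());
      }
      return result;
   }

   public static void main(String[] args) throws Exception
   {
      Element img = parseRoot(
         "<img src=\"hamster.jpeg\" width=\"400\" height=\"300\"/>");
      System.out.println(img.getTagName());
      System.out.println(img.getAttribute("width"));
      System.out.println(attributesOf(img));
   }
}
```

Note that getAttribute returns an empty string, not null, when the element has no attribute with the given name.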

Parsing XML Documents

• Some elements have children that contain text

• The document builder creates nodes of type Text

• If you don't use mixed content elements

o Any element containing text has a single Text child node

o Use the getFirstChild method to get it

o Use the getData method to read the text

• To determine the price stored in the price element

Element priceNode = . . . ;
Text priceData = (Text) priceNode.getFirstChild();
String priceString = priceData.getData();
double price = Double.parseDouble(priceString);
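A complete version of the price lookup is sketched below, with an assumed class name PriceReader and a small in-memory document. It reads the single Text child of a <price> element and converts it to a number.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical helper that reads the numeric text of a <price> element.
class PriceReader
{
   static Element parseRoot(String xml) throws Exception
   {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
         .parse(new InputSource(new StringReader(xml)))
         .getDocumentElement();
   }

   /** Assumes the element has exactly one child, a Text node holding a number. */
   static double readPrice(Element priceElement)
   {
      Text priceData = (Text) priceElement.getFirstChild();
      return Double.parseDouble(priceData.getData().trim());
   }

   public static void main(String[] args) throws Exception
   {
      Element price = parseRoot("<price currency=\"USD\">29.95</price>");
      System.out.println(readPrice(price));
   }
}
```

For an empty element such as <price/>, getFirstChild returns null, so production code would need a null check before calling getData.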

File ItemListParser.java

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.SAXException;

/**
   An XML parser for item lists
*/
public class ItemListParser
{
   /**
      Constructs a parser that can parse item lists
   */
   public ItemListParser()
      throws ParserConfigurationException
   {
      DocumentBuilderFactory factory
         = DocumentBuilderFactory.newInstance();
      builder = factory.newDocumentBuilder();
   }

   /**
      Parses an XML file containing an item list
      @param fileName the name of the file
      @return an array list containing all items in the XML file
   */
   public ArrayList parse(String fileName)
      throws SAXException, IOException
   {
      File f = new File(fileName);
      Document doc = builder.parse(f);

      // get the <items> root element

      Element root = doc.getDocumentElement();
      return getItems(root);
   }

   /**
      Obtains an array list of items from a DOM element
      @param e an <items> element
      @return an array list of all <item> children of e
   */
   private static ArrayList getItems(Element e)
   {
      ArrayList items = new ArrayList();

      // get the <item> children

      NodeList children = e.getChildNodes();
      for (int i = 0; i < children.getLength(); i++)
      {
         Node childNode = children.item(i);
         if (childNode instanceof Element)
         {
            Element childElement = (Element)childNode;
            if (childElement.getTagName().equals("item"))
            {
               Item c = getItem(childElement);
               items.add(c);
            }
         }
      }
      return items;
   }

   /**
      Obtains an item from a DOM element
      @param e an <item> element
      @return the item described by the given element
   */
   private static Item getItem(Element e)
   {
      NodeList children = e.getChildNodes();
      Product p = null;
      int quantity = 0;
      for (int j = 0; j < children.getLength(); j++)
      {
         Node childNode = children.item(j);
         if (childNode instanceof Element)
         {
            Element childElement = (Element)childNode;
            String tagName = childElement.getTagName();
            if (tagName.equals("product"))
               p = getProduct(childElement);
            else if (tagName.equals("quantity"))
            {
               Text textNode = (Text)childElement.getFirstChild();
               String data = textNode.getData();
               quantity = Integer.parseInt(data);
            }
         }
      }
      return new Item(p, quantity);
   }

   /**
      Obtains a product from a DOM element
      @param e a <product> element
      @return the product described by the given element
   */
   private static Product getProduct(Element e)
   {
      NodeList children = e.getChildNodes();
      String name = "";
      double price = 0;
      for (int j = 0; j < children.getLength(); j++)
      {
         Node childNode = children.item(j);
         if (childNode instanceof Element)
         {
            Element childElement = (Element)childNode;
            String tagName = childElement.getTagName();
            Text textNode = (Text)childElement.getFirstChild();

            String data = textNode.getData();
            if (tagName.equals("description"))
               name = data;
            else if (tagName.equals("price"))
               price = Double.parseDouble(data);
         }
      }
      return new Product(name, price);
   }

   private DocumentBuilder builder;
}

File ItemListParserTest.java

import java.util.ArrayList;

/**
   This program parses an XML file containing an item list.
   It prints out the items that are described in the XML file.
*/
public class ItemListParserTest
{
   public static void main(String[] args) throws Exception
   {
      ItemListParser parser = new ItemListParser();
      ArrayList items = parser.parse("items.xml");
      for (int i = 0; i < items.size(); i++)
      {
         Item anItem = (Item)items.get(i);
         System.out.println(anItem.format());
      }
   }
}

Creating XML Documents

• We can build a Document object in a Java program and then save it as an XML document

• We need a DocumentBuilder object to create a new, empty document

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.newDocument(); // empty document

• The Document interface has methods to create elements and text nodes

Creating XML Documents

• To create an element, use the createElement method and pass it a tag

Element itemElement = doc.createElement("item");

• To create a text node, use createTextNode and pass it a string

Text quantityText = doc.createTextNode("8");

• Use the setAttribute method to add an attribute to the tag

priceElement.setAttribute("currency", "USD");

Creating XML Documents

• To construct the tree structure of a document

o Start with the root

o Add children with appendChild

• To build an XML tree that describes an item

// create elements
Element itemElement = doc.createElement("item");
Element productElement = doc.createElement("product");
Element descriptionElement = doc.createElement("description");
Element priceElement = doc.createElement("price");
Element quantityElement = doc.createElement("quantity");
Text descriptionText = doc.createTextNode("Ink Jet Refill Kit");
Text priceText = doc.createTextNode("29.95");
Text quantityText = doc.createTextNode("8");

// add elements to the document
doc.appendChild(itemElement);
itemElement.appendChild(productElement);
itemElement.appendChild(quantityElement);
productElement.appendChild(descriptionElement);
productElement.appendChild(priceElement);
descriptionElement.appendChild(descriptionText);
priceElement.appendChild(priceText);
quantityElement.appendChild(quantityText);

Creating XML Documents

• Use a Transformer to write an XML document to a stream

• Create a transformer

Transformer t = TransformerFactory.newInstance().newTransformer();

• Create a DOMSource from your document

• Create a StreamResult from your output stream

• Call the transform method of your transformer

t.transform(new DOMSource(doc), new StreamResult(System.out));
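The same Transformer steps can write into a string instead of System.out, which is convenient for testing. In this sketch the class name XmlWriter and its methods are assumptions. It also sets the standard OMIT_XML_DECLARATION output property so that only the element markup appears in the result.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper that serializes a DOM document to a string.
class XmlWriter
{
   /** Builds a tiny document: <item><quantity>8</quantity></item> */
   static Document sampleDocument() throws Exception
   {
      Document doc = DocumentBuilderFactory.newInstance()
         .newDocumentBuilder().newDocument();
      Element itemElement = doc.createElement("item");
      Element quantityElement = doc.createElement("quantity");
      quantityElement.appendChild(doc.createTextNode("8"));
      itemElement.appendChild(quantityElement);
      doc.appendChild(itemElement);
      return doc;
   }

   static String toXml(Document doc) throws Exception
   {
      Transformer t = TransformerFactory.newInstance().newTransformer();
      // Leave out the <?xml ...?> header so only the elements are written.
      t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
      StringWriter out = new StringWriter();
      t.transform(new DOMSource(doc), new StreamResult(out));
      return out.toString();
   }

   public static void main(String[] args) throws Exception
   {
      System.out.println(toXml(sampleDocument()));
   }
}
```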

File ItemListBuilder.java

import java.util.ArrayList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

/**
   Builds a DOM document for an array list of items.
*/
public class ItemListBuilder
{
   /**
      Constructs an item list builder.
   */
   public ItemListBuilder()
      throws ParserConfigurationException
   {
      DocumentBuilderFactory factory
         = DocumentBuilderFactory.newInstance();
      builder = factory.newDocumentBuilder();
   }

   /**
      Builds a DOM document for an array list of items.
      @param items the items
      @return a DOM document describing the items
   */
   public Document build(ArrayList items)
   {
      doc = builder.newDocument();
      Element root = createItemList(items);
      doc.appendChild(root);
      return doc;
   }

   /**
      Builds a DOM element for an array list of items.
      @param items the items
      @return a DOM element describing the items
   */
   private Element createItemList(ArrayList items)
   {
      Element itemsElement = doc.createElement("items");
      for (int i = 0; i < items.size(); i++)
      {
         Item anItem = (Item)items.get(i);
         Element itemElement = createItem(anItem);
         itemsElement.appendChild(itemElement);
      }
      return itemsElement;
   }

   /**
      Builds a DOM element for an item.
      @param anItem the item
      @return a DOM element describing the item
   */
   private Element createItem(Item anItem)
   {
      Element itemElement = doc.createElement("item");
      Element productElement
         = createProduct(anItem.getProduct());
      Text quantityText = doc.createTextNode(
         "" + anItem.getQuantity());
      Element quantityElement = doc.createElement("quantity");
      quantityElement.appendChild(quantityText);

      itemElement.appendChild(productElement);
      itemElement.appendChild(quantityElement);
      return itemElement;
   }

   /**
      Builds a DOM element for a product.
      @param p the product
      @return a DOM element describing the product
   */
   private Element createProduct(Product p)
   {
      Text descriptionText
         = doc.createTextNode(p.getDescription());
      Text priceText = doc.createTextNode("" + p.getPrice());

      Element descriptionElement
         = doc.createElement("description");
      Element priceElement = doc.createElement("price");

      descriptionElement.appendChild(descriptionText);
      priceElement.appendChild(priceText);

      Element productElement = doc.createElement("product");

      productElement.appendChild(descriptionElement);
      productElement.appendChild(priceElement);

      return productElement;
   }

   private DocumentBuilder builder;
   private Document doc;
}

File ItemListBuilderTest.java

import java.util.ArrayList;
import org.w3c.dom.Document;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

/**
   This program tests the item list builder. It prints the
   XML file corresponding to a DOM document containing a list
   of items.
*/
public class ItemListBuilderTest
{
   public static void main(String[] args) throws Exception
   {
      ArrayList items = new ArrayList();
      items.add(new Item(new Product("Toaster", 29.95), 3));
      items.add(new Item(new Product("Hair dryer", 24.95), 1));

      ItemListBuilder builder = new ItemListBuilder();
      Document doc = builder.build(items);
      Transformer t = TransformerFactory
         .newInstance().newTransformer();
      t.transform(new DOMSource(doc),
         new StreamResult(System.out));
   }
}

Document Type Definitions

• A DTD is a set of rules for correctly formed documents of a particular type

o Describes the legal attributes for each element type

o Describes the legal child elements for each element type

• Legal child elements are described with an ELEMENT rule

<!ELEMENT items (item*)>

• The items element (the root in this case) can have 0 or more item elements

• Definition of an item node

<!ELEMENT item (product, quantity)>

• Children of the item node must be a product node followed by a quantity node

Document Type Definitions

• Definition of product node

<!ELEMENT product (description, price)>

• The other nodes

<!ELEMENT quantity (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT price (#PCDATA)>

• #PCDATA stands for parsed character data, which is just text

o Can contain any characters

o Special characters have to be encoded when they occur in character data
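When a document is produced through the DOM and a Transformer, as in the earlier slides, the encoding of special characters is handled by the library. The sketch below uses an assumed class name EscapeDemo and assumed method names. It writes a text node containing & and < and then parses the result back to recover the original text. The entity names &amp; and &lt; are the standard XML encodings for those two characters.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo of automatic encoding and decoding of special characters.
class EscapeDemo
{
   /** Wraps the text in a <description> element and returns the XML markup. */
   static String describe(String text) throws Exception
   {
      Document doc = DocumentBuilderFactory.newInstance()
         .newDocumentBuilder().newDocument();
      Element e = doc.createElement("description");
      e.appendChild(doc.createTextNode(text));
      doc.appendChild(e);

      Transformer t = TransformerFactory.newInstance().newTransformer();
      t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
      StringWriter out = new StringWriter();
      t.transform(new DOMSource(doc), new StreamResult(out));
      return out.toString();
   }

   /** Parses the markup and returns the decoded text of the root element. */
   static String readBack(String xml) throws Exception
   {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
         .parse(new InputSource(new StringReader(xml)))
         .getDocumentElement().getTextContent();
   }

   public static void main(String[] args) throws Exception
   {
      String xml = describe("AT&T <hub>");
      System.out.println(xml);           // the & and < appear encoded
      System.out.println(readBack(xml)); // the original text comes back
   }
}
```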

Encodings for Special Characters

DTD for Item List

<!ELEMENT items (item)*>

<!ELEMENT item (product, quantity)>

<!ELEMENT product (description, price)>

<!ELEMENT quantity (#PCDATA)>

<!ELEMENT description (#PCDATA)>

<!ELEMENT price (#PCDATA)>

Regular Expressions for Element Content

Document Type Definitions

• A DTD gives you control over the allowed attributes of an element

<!ATTLIST Element Attribute Type Default>

• Type can be any sequence of character data, specified as CDATA

• Type can also specify a finite number of choices

<!ATTLIST price currency (USD | EUR | JPY) #REQUIRED>

Common Attribute Types

Attribute Defaults

Document Type Definitions

• #IMPLIED keyword means you can supply an attribute or not.

<!ATTLIST price currency CDATA #IMPLIED >

• If you omit the attribute, the application processing the XML data implicitly assumes some default value

• You can specify a default to be used if the attribute is not specified

<!ATTLIST price currency CDATA "USD" >
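The effect of a default value can be observed from Java. In this sketch the class name DefaultAttributeDemo and the method currencyOf are assumptions, and the DTD is placed inside the document so that the example is self-contained. When the currency attribute is omitted, the parser supplies the default declared in the ATTLIST rule.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo of a DTD-supplied attribute default.
class DefaultAttributeDemo
{
   // Internal DTD that declares "USD" as the default currency.
   static final String DTD
      = "<!DOCTYPE price ["
      + "<!ELEMENT price (#PCDATA)>"
      + "<!ATTLIST price currency CDATA \"USD\">"
      + "]>";

   /** Parses a <price> element under the DTD above and returns its currency. */
   static String currencyOf(String priceXml) throws Exception
   {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
         .parse(new InputSource(new StringReader(DTD + priceXml)));
      return doc.getDocumentElement().getAttribute("currency");
   }

   public static void main(String[] args) throws Exception
   {
      System.out.println(currencyOf("<price>29.95</price>"));
      System.out.println(currencyOf("<price currency=\"EUR\">29.95</price>"));
   }
}
```

The first call yields the default from the DTD, the second the value given in the document.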

Parsing with Document Type Definitions

• Specify a DTD with every XML document

• Instruct the parser to check that the document follows the rules of the DTD

• Then the parser can be more intelligent about parsing

• If the parser knows that the children of an element are elements, it can suppress white spaces

Parsing with Document Type Definitions

• An XML document can reference a DTD in one of two ways

• The document may contain the DTD

• The document may refer to a DTD stored elsewhere

• A DTD is introduced with a DOCTYPE declaration

Parsing with Document Type Definitions

• If the document contains the DTD, the declaration looks like this:

<!DOCTYPE rootElement [ rules ]>

• Example

<?xml version="1.0"?>
<!DOCTYPE items [
<!ELEMENT items (item*)>
<!ELEMENT item (product, quantity)>
<!ELEMENT product (description, price)>
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT price (#PCDATA)>
]>
<items>
   <item>
      <product>
         <description>Ink Jet Refill Kit</description>
         <price>29.95</price>
      </product>
      <quantity>8</quantity>
   </item>
   <item>
      <product>
         <description>4-port Mini Hub</description>
         <price>19.95</price>
      </product>
      <quantity>4</quantity>
   </item>
</items>

Parsing with Document Type Definitions

• If the DTD is stored outside the document, use the SYSTEM keyword inside the DOCTYPE declaration

• This indicates that the system must locate the DTD

• The location of the DTD follows the SYSTEM keyword

• A DOCTYPE declaration can point to a local file

<!DOCTYPE items SYSTEM "items.dtd">

• A DOCTYPE declaration can point to a URL

<!DOCTYPE items SYSTEM "http://www.mycompany.com/dtds/items.dtd">

Parsing with Document Type Definitions

• When your XML document has a DTD, use validation when parsing

• Then the parser will check that all child elements and attributes conform to the ELEMENT and ATTLIST rules in the DTD

• The parser reports an error if the document is invalid; install an ErrorHandler on the DocumentBuilder if you want the parse method to throw an exception for validity errors

• Use the setValidating method of the DocumentBuilderFactory before calling the newDocumentBuilder method

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(. . .);
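A validating parse can be sketched as follows. The class name ValidatingParse and the method isValid are assumptions. A validating DocumentBuilder reports validity errors to its ErrorHandler, and the default handler in the JDK's bundled parser typically just prints them, so this sketch installs a handler that rethrows them. That way an invalid document makes parse fail with a SAXException.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper that validates an item list against the chapter's DTD.
class ValidatingParse
{
   static final String DTD
      = "<!DOCTYPE items ["
      + "<!ELEMENT items (item*)>"
      + "<!ELEMENT item (product, quantity)>"
      + "<!ELEMENT product (description, price)>"
      + "<!ELEMENT quantity (#PCDATA)>"
      + "<!ELEMENT description (#PCDATA)>"
      + "<!ELEMENT price (#PCDATA)>"
      + "]>";

   /** Returns true if the given <items> markup conforms to the DTD. */
   static boolean isValid(String itemsXml)
   {
      try
      {
         DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
         factory.setValidating(true);
         DocumentBuilder builder = factory.newDocumentBuilder();
         // Turn validity errors into exceptions instead of console messages.
         builder.setErrorHandler(new ErrorHandler()
            {
               public void warning(SAXParseException e) {}
               public void error(SAXParseException e) throws SAXException { throw e; }
               public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
         builder.parse(new InputSource(new StringReader(DTD + itemsXml)));
         return true;
      }
      catch (SAXException e)
      {
         return false;
      }
      catch (ParserConfigurationException | IOException e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      String good = "<items><item><product><description>Toaster</description>"
         + "<price>29.95</price></product><quantity>3</quantity></item></items>";
      String bad = "<items><item><quantity>3</quantity></item></items>";
      System.out.println(isValid(good));
      System.out.println(isValid(bad)); // <product> is missing
   }
}
```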

Parsing with Document Type Definitions

• If the parser validates the document with a DTD, you can avoid validity checks in your code

• You can tell the parser to ignore white space in non-text elements

factory.setValidating(true);
factory.setIgnoringElementContentWhitespace(true);

• If the parser has access to a DTD, it can fill in defaults for attributes
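The white space setting can be observed by counting child nodes. In this sketch the class name WhitespaceDemo, the method childCount, and the simplified DTD (in which item is declared EMPTY) are assumptions made for the example. The same indented document is parsed twice, once with default settings and once with validation and setIgnoringElementContentWhitespace turned on.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo: a DTD lets the parser drop white space between elements.
class WhitespaceDemo
{
   static final String XML
      = "<!DOCTYPE items ["
      + "<!ELEMENT items (item*)>"
      + "<!ELEMENT item EMPTY>"
      + "]>"
      + "<items>\n  <item/>\n  <item/>\n</items>";

   /** Returns the number of child nodes of the root element. */
   static int childCount(boolean ignoreWhitespace) throws Exception
   {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setValidating(ignoreWhitespace);
      factory.setIgnoringElementContentWhitespace(ignoreWhitespace);
      Element root = factory.newDocumentBuilder()
         .parse(new InputSource(new StringReader(XML)))
         .getDocumentElement();
      return root.getChildNodes().getLength();
   }

   public static void main(String[] args) throws Exception
   {
      System.out.println(childCount(false)); // elements plus white space Text nodes
      System.out.println(childCount(true));  // elements only
   }
}
```

With the white space removed, code can index the children directly, as the revised ItemListParser does with children.item(0) and children.item(1).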

File ItemListParser.java

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.SAXException;

/**
   An XML parser for item lists
*/
public class ItemListParser
{
   /**
      Constructs a parser that can parse item lists
   */
   public ItemListParser()
      throws ParserConfigurationException
   {
      DocumentBuilderFactory factory
         = DocumentBuilderFactory.newInstance();
      factory.setValidating(true);
      factory.setIgnoringElementContentWhitespace(true);
      builder = factory.newDocumentBuilder();
   }

   /**
      Parses an XML file containing an item list
      @param fileName the name of the file
      @return an array list containing all items in the XML file
   */
   public ArrayList parse(String fileName)
      throws SAXException, IOException
   {
      File f = new File(fileName);
      Document doc = builder.parse(f);

      // get the <items> root element

      Element root = doc.getDocumentElement();
      return getItems(root);
   }

   /**
      Obtains an array list of items from a DOM element
      @param e an <items> element
      @return an array list of all <item> children of e
   */
   private static ArrayList getItems(Element e)
   {
      ArrayList items = new ArrayList();

      // get the <item> children

      NodeList children = e.getChildNodes();
      for (int i = 0; i < children.getLength(); i++)
      {
         Element childElement = (Element)children.item(i);
         Item c = getItem(childElement);
         items.add(c);
      }
      return items;
   }

   /**
      Obtains an item from a DOM element
      @param e an <item> element
      @return the item described by the given element
   */
   private static Item getItem(Element e)
   {
      NodeList children = e.getChildNodes();

      Product p = getProduct((Element)children.item(0));

      Element quantityElement = (Element)children.item(1);
      Text quantityText
         = (Text)quantityElement.getFirstChild();
      int quantity = Integer.parseInt(quantityText.getData());

      return new Item(p, quantity);
   }

   /**
      Obtains a product from a DOM element
      @param e a <product> element
      @return the product described by the given element
   */
   private static Product getProduct(Element e)
   {
      NodeList children = e.getChildNodes();

      // the DTD guarantees that <description> comes first, then <price>
      Element descriptionElement = (Element)children.item(0);
      Text descriptionText
         = (Text)descriptionElement.getFirstChild();
      String description = descriptionText.getData();

      Element priceElement = (Element)children.item(1);
      Text priceText
         = (Text)priceElement.getFirstChild();
      double price = Double.parseDouble(priceText.getData());

      return new Product(description, price);
   }

   private DocumentBuilder builder;
}

File ItemListParserTest.java

import java.util.ArrayList;

/**
   This program parses an XML file containing an item list.
   The XML file should reference the items.dtd
*/
public class ItemListParserTest
{
   public static void main(String[] args) throws Exception
   {
      ItemListParser parser = new ItemListParser();
      ArrayList items = parser.parse("items.xml");
      for (int i = 0; i < items.size(); i++)
      {
         Item anItem = (Item)items.get(i);
         System.out.println(anItem.format());
      }
   }
}