DOM Transparency No. 1 DOM (Document Object Model) Cheng-Chia Chen.

Date posted: 19-Dec-2015



What is DOM?

DOM (Document Object Model):
  • A tree-based data model of XML documents
  • An API for XML document processing
    – Cross-language (language neutral): defined in terms of CORBA IDL
    – Language-specific bindings supplied for ECMAScript, Java, …


Document Object Model

Defines how XML and HTML documents are represented as objects in programs

  • W3C Standard
  • Defined in IDL; thus language independent
  • HTML as well as XML
  • Writing as well as reading
  • Covers everything except internal and external DTD subsets


Trees

  • An XML document can be represented as a tree.
  • It has a root.
  • It has nodes.
  • It is amenable to recursive processing.


DOM (Document Object Model)

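What is the tree view of this document?

<?xml version="1.0" encoding="UTF-8" ?>
<TABLE>
  <TBODY>
    <TR> <TD>紅樓夢 </TD> <TD>曹雪芹 </TD> </TR>
    <TR> <TD>三國演義 </TD> <TD>羅貫中 </TD> </TR>
  </TBODY>
</TABLE>

(The cells list two classic Chinese novels and their authors: Dream of the Red Chamber by Cao Xueqin, and Romance of the Three Kingdoms by Luo Guanzhong.)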


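Tree view (DOM view) of an XML Document

[Figure: the DOM tree of the document above. The document node is the root; below it, element nodes (TABLE, TBODY, the two TR rows, and their TD cells) nest as in the markup, and the text nodes 紅樓夢, 曹雪芹, 三國演義, and 羅貫中 are the leaves.]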
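As a sketch of how a program sees this tree, the hypothetical class TreeDump below parses the table with the JDK's JAXP parser and prints one node per line, indented by depth. It assumes the markup is written without whitespace between tags: with the spacing of the listing above, a non-validating parser also reports whitespace-only text nodes, which the figure does not show.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TreeDump {

    // Parse an XML string into a DOM Document (checked exceptions are wrapped).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    // Append one line per node, indented two spaces per level of depth.
    static void dump(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                out.append("(document)");
                break;
            case Node.ELEMENT_NODE:
                out.append('<').append(node.getNodeName()).append('>');
                break;
            case Node.TEXT_NODE:
                out.append('"').append(node.getNodeValue()).append('"');
                break;
            default:
                out.append(node.getNodeName());
        }
        out.append('\n');
        // Recurse into the children in document order.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            dump(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        String xml = "<TABLE><TBODY>"
                + "<TR><TD>紅樓夢</TD><TD>曹雪芹</TD></TR>"
                + "<TR><TD>三國演義</TD><TD>羅貫中</TD></TR>"
                + "</TBODY></TABLE>";
        StringBuilder out = new StringBuilder();
        dump(parse(xml), 0, out);
        System.out.print(out);
    }
}
```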


DOM Evolution

  • DOM Level 0: the informal, pre-standard object models of early browsers
  • DOM Level 1: a W3C Standard
  • DOM Level 2: a W3C Standard
  • DOM Level 3, W3C Standards:
    – Document Object Model (DOM) Level 3 Core Specification
    – Document Object Model (DOM) Level 3 Load and Save Specification
    – Document Object Model (DOM) Level 3 Validation Specification
  • DOM Level 3, W3C Working Group Notes:
    – Document Object Model (DOM) Level 3 XPath Specification Version 1.0
    – Document Object Model (DOM) Level 3 Views and Formatting Specification
    – Document Object Model (DOM) Level 3 Events Specification Version 1.0
  • W3C DOM Working Group; W3C DOM Tech Reports


DOM Implementations for Java

  • Apache XML Project's Xerces/Crimson parsers:
    – Xerces2-J: http://xml.apache.org/xerces2-j/index.html
    – Xerces-J 1.x: http://xml.apache.org/xerces-j/index.html (hibernated)
    – Crimson: http://xml.apache.org/crimson/ (hibernated; the default implementation in Java 1.4)
  • Sun's Java API for XML: http://java.sun.com/products/xml
  • Oracle: http://technet.oracle.com/tech/xml
  • GNU JAXP: http://www.gnu.org/software/classpathx/jaxp/jaxp.html


Modules

Modules:
  • Core: org.w3c.dom (Levels 1-3)
  • HTML: org.w3c.dom.html (Level 2)
  • Views: org.w3c.dom.views (Level 2)
  • StyleSheets: org.w3c.dom.stylesheets
  • CSS: org.w3c.dom.css
  • Events: org.w3c.dom.events (Level 2)
  • Traversal: org.w3c.dom.traversal (Level 2)
  • Range: org.w3c.dom.ranges (Level 2)
  • XPath, Load and Save, Validation (Level 3)

Only the Core, Traversal, XPath, Load and Save, and Validation modules really apply to XML. The others are for HTML.


DOM Trees

  • The entire document is represented as a tree.
  • A tree contains nodes.
  • Some nodes may contain other nodes (depending on node type).
  • Each document node contains:
    – zero or one doctype nodes
    – one root element node
    – zero or more comment and processing instruction nodes


org.w3c.dom

17 interfaces: Attr, CDATASection, CharacterData, Comment, Document, DocumentFragment, DocumentType, DOMImplementation, Element, Entity, EntityReference, NamedNodeMap, Node, NodeList, Notation, ProcessingInstruction, Text

plus one exception: DOMException

Plus a bunch of HTML stuff in org.w3c.dom.html and other packages


The DOM Interface Hierarchy

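Fundamental interfaces: DOMException, DOMImplementation, Node, NodeList, NamedNodeMap, Document, DocumentFragment, Element, Attr, CharacterData, Text, Comment

Extended (XML) interfaces: CDATASection, DocumentType, Notation, Entity, EntityReference, ProcessingInstruction

Inheritance: Document, DocumentFragment, DocumentType, Element, Attr, CharacterData, Notation, Entity, EntityReference, and ProcessingInstruction all extend Node; Text and Comment extend CharacterData; CDATASection extends Text. DOMImplementation, NodeList, NamedNodeMap, and DOMException stand outside the Node hierarchy.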


Steps to use DOM

  1. Create a parser using library-specific code.
  2. Use the parser to parse the document and return a DOM org.w3c.dom.Document object. The entire document is stored in memory.
  3. Use DOM methods and interfaces to extract data from this object.


Parsing documents with a (Xerces) DOM Parser Example

import org.apache.xerces.parsers.*;   // Xerces-J
// import com.sun.org.apache.xerces.internal.parsers.*;   // JDK-internal copy; not exported by the java.xml module on Java 9+
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

public class DOMParserMaker {

  public static void main(String[] args) {
    DOMParser parser = new DOMParser();
    for (int i = 0; i < args.length; i++) {
      try {
        // Read the entire document into memory
        parser.parse(args[i]);
        Document d = parser.getDocument();
        // work with the document...
      }
      catch (SAXException e) { System.err.println(e); }
      catch (IOException e) { System.err.println(e); }
    }
  }
}


Parsing process using JAXP

  1. javax.xml.parsers.DocumentBuilderFactory.newInstance() creates a DocumentBuilderFactory.
  2. Configure the factory.
  3. The factory's newDocumentBuilder() method creates a DocumentBuilder.
  4. Configure the builder.
  5. The builder parses the document and returns a DOM org.w3c.dom.Document object. The entire document is stored in memory.
  6. DOM methods and interfaces are used to extract data from this object.


JAXP's DOM pluggability mechanism


Parsing documents with a JAXP DocumentBuilder

import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

public class JAXPParserMaker {

  public static void main(String[] args) {
    try {
      DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
      builderFactory.setNamespaceAware(true);
      DocumentBuilder parser = builderFactory.newDocumentBuilder();
      for (int i = 0; i < args.length; i++) {
        try {
          // Read the entire document into memory
          Document d = parser.parse(args[i]);
          // work with the document...
        }
        catch (SAXException e) { System.err.println(e); }
        catch (IOException e) { System.err.println(e); }
      } // end for
    }
    catch (ParserConfigurationException e) {
      System.err.println("You need to install a JAXP aware parser.");
    }
  }
}


The Node Interface

package org.w3c.dom;

public interface Node {

  // NodeType
  public static final short ELEMENT_NODE                = 1;
  public static final short ATTRIBUTE_NODE              = 2;
  public static final short TEXT_NODE                   = 3;
  public static final short CDATA_SECTION_NODE          = 4;
  public static final short ENTITY_REFERENCE_NODE       = 5;
  public static final short ENTITY_NODE                 = 6;
  public static final short PROCESSING_INSTRUCTION_NODE = 7;
  public static final short COMMENT_NODE                = 8;
  public static final short DOCUMENT_NODE               = 9;
  public static final short DOCUMENT_TYPE_NODE          = 10;
  public static final short DOCUMENT_FRAGMENT_NODE      = 11;
  public static final short NOTATION_NODE               = 12;


The Node interface

Node Property

public String getNodeName();

public String getNodeValue() throws DOMException;

public void setNodeValue(String value) throws DOMException;

public short getNodeType();

public String getNamespaceURI();

public String getPrefix();

public void setPrefix(String prefix)    throws DOMException;

public String getLocalName();


The Node interface

Tree navigation

public Node getParentNode();

public NodeList getChildNodes();

public Node getFirstChild();

public Node getLastChild();

public Node getPreviousSibling();

public Node getNextSibling();

public NamedNodeMap getAttributes();

public Document getOwnerDocument();

public boolean hasChildNodes();

public boolean hasAttributes();


Node navigation

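[Figure: navigating from a node (this): parentNode leads up to its parent; previousSibling and nextSibling lead to its neighbours; childNodes holds its children, from firstChild to lastChild.]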
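These properties combine naturally into loops. The sketch below (class and method names are hypothetical; the JDK's JAXP parser is assumed) walks forward with firstChild/nextSibling and backward with lastChild/previousSibling. The whitespace between the <td> tags shows up as Text children, which is why the backward count is larger than the number of elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class Navigation {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    // Forward walk with firstChild/nextSibling, keeping element children only.
    static List<String> childElementNames(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                names.add(c.getNodeName());
            }
        }
        return names;
    }

    // Backward walk with lastChild/previousSibling, counting every child node.
    static int countChildrenBackwards(Node parent) {
        int n = 0;
        for (Node c = parent.getLastChild(); c != null; c = c.getPreviousSibling()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Node tr = parse("<tr> <td>a</td> <td>b</td> </tr>").getDocumentElement();
        System.out.println(childElementNames(tr));      // [td, td]
        System.out.println(countChildrenBackwards(tr)); // 5 (2 td elements + 3 whitespace text nodes)
        // parentNode takes us back up from any child
        System.out.println(tr.getFirstChild().getNextSibling().getParentNode() == tr); // true
    }
}
```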


The Node interface

Tree Modification

public Node insertBefore (Node newNode, Node refNode)    throws DOMException;

public Node replaceChild (Node newNode, Node refNode)    throws DOMException;

public Node removeChild(Node node)    throws DOMException;

public Node appendChild(Node newNode)    throws DOMException;


Node manipulation

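[Figure: a node (this) whose childNodes run from firstChild to lastChild, with refNode among them, and a newNode to be added. this.insertBefore(newNode, refNode) places newNode just before refNode; this.replaceChild(newNode, refNode) puts newNode in refNode's place; this.appendChild(newNode) adds newNode after lastChild.]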
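A minimal sketch of the four modification methods on an in-memory document, assuming the JDK's default JAXP implementation; the element names are arbitrary. The comments track the children of root after each call. Note that appending a node which is already in the tree moves it rather than copying it.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class Manipulation {

    // Node names of the children of parent, separated by spaces.
    static String childNames(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(c.getNodeName());
        }
        return sb.toString();
    }

    static String demo() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("root");
        doc.appendChild(root);

        Element a = doc.createElement("a");
        Element c = doc.createElement("c");
        root.appendChild(a);                           // a
        root.appendChild(c);                           // a c
        root.insertBefore(doc.createElement("b"), c);  // a b c  (new node goes before refNode c)
        root.replaceChild(doc.createElement("z"), a);  // z b c  (a is detached and returned)
        root.appendChild(root.getFirstChild());        // b c z  (an existing child is moved)
        root.removeChild(c);                           // b z
        return childNames(root);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // b z
    }
}
```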


The Node interface

Utilities

public Node cloneNode(boolean deep);

public void normalize(); Puts all Text nodes in the subtree into normal form: merges adjacent Text nodes and removes empty ones.

public boolean isSupported(String feature, String version); Tests whether the DOM implementation implements a specific feature and that feature is supported by this node.
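A short sketch of cloneNode() and normalize(), assuming the JDK's JAXP implementation (the class name Utilities is hypothetical): a shallow clone drops the children, a deep clone copies them, and normalize() merges three adjacent Text nodes into one.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class Utilities {

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Build <p> with three adjacent Text children: "Hello", ", ", "world".
    static Element threeTextNodes(Document doc) {
        Element p = doc.createElement("p");
        p.appendChild(doc.createTextNode("Hello"));
        p.appendChild(doc.createTextNode(", "));
        p.appendChild(doc.createTextNode("world"));
        return p;
    }

    public static void main(String[] args) {
        Element p = threeTextNodes(newDocument());

        System.out.println(p.cloneNode(false).getChildNodes().getLength()); // 0: element only
        System.out.println(p.cloneNode(true).getChildNodes().getLength());  // 3: whole subtree

        p.normalize();                                       // merge adjacent Text nodes
        System.out.println(p.getChildNodes().getLength());   // 1
        System.out.println(p.getFirstChild().getNodeValue()); // Hello, world
    }
}
```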


The NodeList Interface

package org.w3c.dom;

public interface NodeList {

public Node item(int index);

public int getLength();

}


The NamedNodeMap interface

public interface NamedNodeMap {

public Node getNamedItem(String name); // by nodeName

public Node setNamedItem(Node arg) throws DOMException;

// insert/replace node if nodeName== arg.getNodeName()

public Node removeNamedItem(String name) throws DOMException;

public Node item(int index);

public int getLength();

// Introduced in DOM Level 2:

public Node getNamedItemNS(String namespaceURI, String localName);

public Node setNamedItemNS(Node arg) throws DOMException;

public Node removeNamedItemNS(String namespaceURI, String localName) throws DOMException;

}
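Attributes are the main use of NamedNodeMap in XML processing. This sketch (hypothetical class name, JDK parser assumed) copies an element's attributes into a sorted map, because the item order of a NamedNodeMap is not specified, and then removes one attribute through the map, which is live.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class AttributeMap {

    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    // Iterate the map by index and collect nodeName -> nodeValue, sorted by name.
    static Map<String, String> attributes(Element e) {
        NamedNodeMap attrs = e.getAttributes();
        Map<String, String> result = new TreeMap<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i); // an Attr node
            result.put(attr.getNodeName(), attr.getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Element img = parseRoot("<img src='logo.png' alt='Logo' width='80'/>");
        System.out.println(attributes(img)); // {alt=Logo, src=logo.png, width=80}

        NamedNodeMap attrs = img.getAttributes();
        System.out.println(attrs.getNamedItem("alt").getNodeValue()); // Logo
        attrs.removeNamedItem("width");                                // live: updates the element
        System.out.println(img.hasAttribute("width"));                // false
    }
}
```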


NodeReporter

import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

public class NodeReporter {

  public static void main(String[] args) {
    try {
      DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
      DocumentBuilder parser = builderFactory.newDocumentBuilder();
      NodeReporter iterator = new NodeReporter();
      for (int i = 0; i < args.length; i++) {
        try {
          // Read the entire document into memory
          Document doc = parser.parse(args[i]);
          iterator.followNode(doc);
        }
        catch (SAXException ex) {
          System.err.println(args[i] + " is not well-formed.");
        }
        catch (IOException ex) {
          System.err.println(ex);
        }
      }
    }
    catch (ParserConfigurationException ex) {
      System.err.println("You need to install a JAXP aware parser.");
    }
  } // end main


  // note use of recursion
  public void followNode(Node node) {
    processNode(node);
    if (node.hasChildNodes()) {
      NodeList children = node.getChildNodes();
      for (int i = 0; i < children.getLength(); i++) {
        followNode(children.item(i));
      }
    }
  }

  public void processNode(Node node) {
    String name = node.getNodeName();
    String type = typeName[node.getNodeType()];
    System.out.println("Type " + type + ": " + name);
  }


Type2TypeName

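  // typeName[t] is the display name of node-type constant t (ELEMENT_NODE = 1 ... NOTATION_NODE = 12)
  public String[] typeName = new String[] {
    "Unknown Type",
    "Element", "Attribute", "Text",
    "CDATA Section", "Entity Reference",
    "Entity", "Processing Instruction",
    "Comment", "Document",
    "Document Type Declaration",
    "Document Fragment",
    "Notation"
  };
} // end class NodeReporter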


Values of NodeName, NodeValue and attributes in a Node

Interface              nodeName                    nodeValue                       attributes
Attr                   name of attribute           value of attribute              null
CDATASection           #cdata-section              content of the CDATA section    null
Comment                #comment                    content of the comment          null
Document               #document                   null                            null
DocumentFragment       #document-fragment          null                            null
DocumentType           document type name          null                            null
Element                tag name                    null                            NamedNodeMap
Entity                 entity name                 null                            null
EntityReference        name of entity referenced   null                            null
Notation               notation name               null                            null
ProcessingInstruction  target                      content excluding the target    null
Text                   #text                       content of the text node        null
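To check a few rows of the table, this sketch (hypothetical class NameValueTable, JDK parser assumed) prints nodeName, nodeValue, and whether attributes is non-null for every node of a small document containing a processing instruction, an element, a text node, and a comment. Attribute nodes are not visited because they are not children of their element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NameValueTable {

    // One line per node in document order: nodeName | nodeValue | (attributes != null)
    static void report(Node node, StringBuilder out) {
        out.append(node.getNodeName()).append(" | ")
           .append(node.getNodeValue()).append(" | ")
           .append(node.getAttributes() != null).append('\n');
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            report(c, out);
        }
    }

    static String reportOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            report(doc, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(reportOf("<?pi some data?><a x='1'>hi<!--note--></a>"));
        // #document | null | false
        // pi | some data | false
        // a | null | true
        // #text | hi | false
        // #comment | note | false
    }
}
```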


The Document Node

The root node representing the entire document; not the same as the root element.

Contains:
  • one element node
  • zero or more processing instruction nodes
  • zero or more comment nodes
  • zero or one document type nodes


The Document Interface

package org.w3c.dom;

public interface Document extends Node {

public DocumentType getDoctype();

public DOMImplementation getImplementation();

public Element getDocumentElement();

public NodeList getElementsByTagName(String tagname);

public NodeList getElementsByTagNameNS(String namespaceURI, String localName);

public Element getElementById(String elementId);


The Document Interface

// Factory methods
public Element createElement(String tagName) throws DOMException;
public Element createElementNS(String namespaceURI, String qName) throws DOMException;
public DocumentFragment createDocumentFragment();
public Text createTextNode(String data);
public Comment createComment(String data);
public CDATASection createCDATASection(String data) throws DOMException;
public ProcessingInstruction createProcessingInstruction(String target, String data) throws DOMException;
public Attr createAttribute(String name) throws DOMException;
public Attr createAttributeNS(String namespaceURI, String qName) throws DOMException;
public EntityReference createEntityReference(String name) throws DOMException;
public Node importNode(Node importedNode, boolean deep) throws DOMException;
}
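A sketch that builds a small document using only the factory methods above plus appendChild() and setAttribute(). The element names, the xml-stylesheet instruction, and the text content are illustrative; the empty Document comes from JAXP's DocumentBuilder.newDocument().

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class FactoryMethods {

    // <?xml-stylesheet ...?><book id="b1"><!-- ... --><title>...</title><code><![CDATA[...]]></code></book>
    static Document buildBook() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        // A processing instruction may precede the root element.
        doc.appendChild(doc.createProcessingInstruction("xml-stylesheet",
                "href=\"book.css\" type=\"text/css\""));
        Element book = doc.createElement("book");
        book.setAttribute("id", "b1");
        doc.appendChild(book);

        book.appendChild(doc.createComment(" generated in memory "));
        Element title = doc.createElement("title");
        title.appendChild(doc.createTextNode("Processing XML with Java"));
        book.appendChild(title);
        Element code = doc.createElement("code");
        code.appendChild(doc.createCDATASection("if (a < b && c > d) {}")); // no escaping needed
        book.appendChild(code);
        return doc;
    }

    public static void main(String[] args) {
        Document doc = buildBook();
        System.out.println(doc.getDocumentElement().getTagName());                 // book
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());  // 3
        System.out.println(doc.getElementsByTagName("title").item(0).getTextContent());
    }
}
```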


Element Nodes

Represents a complete element, including its start-tag, end-tag, and content.

Content may contain:
  • Element nodes
  • ProcessingInstruction nodes
  • Comment nodes
  • Text nodes
  • CDATASection nodes
  • EntityReference nodes


The Element Interface

public String getTagName();   // same as getNodeName()

public NodeList getElementsByTagName(String name);
public NodeList getElementsByTagNameNS(String uri, String localName);

public String getAttribute(String name);
public String getAttributeNS(String uri, String localName);
public void setAttribute(String name, String value) throws DOMException;
public void setAttributeNS(String uri, String qName, String value) throws DOMException;
public void removeAttribute(String name) throws DOMException;
public void removeAttributeNS(String uri, String localName) throws DOMException;

public Attr getAttributeNode(String name);
public Attr getAttributeNodeNS(String namespaceURI, String localName);
public Attr setAttributeNode(Attr newAttr) throws DOMException;
public Attr setAttributeNodeNS(Attr newAttr) throws DOMException;
public Attr removeAttributeNode(Attr oldAttr) throws DOMException;
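A sketch of the most common Element methods, assuming a namespace-aware JAXP parser (needed for the *NS methods to see namespace URIs). The catalog markup and the urn:example namespace are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ElementAttributes {

    // Parse with namespace awareness so the *NS methods work.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element catalog = parseRoot("<catalog xmlns:x='urn:example'>"
                + "<item sku='A1' x:note='sale'/><item sku='B2'/></catalog>");

        NodeList items = catalog.getElementsByTagName("item"); // descendants named "item"
        Element first = (Element) items.item(0);
        System.out.println(items.getLength());                           // 2
        System.out.println(first.getAttribute("sku"));                   // A1
        System.out.println(first.getAttributeNS("urn:example", "note")); // sale
        System.out.println("[" + first.getAttribute("missing") + "]");   // [] (absent => empty string)

        first.setAttribute("sku", "A1-rev");            // replaces the existing value
        first.removeAttributeNS("urn:example", "note");
        System.out.println(first.getAttributes().getLength()); // 1
    }
}
```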


Example application

UserLand's RSS-based list of weblogs, at http://static.userland.com/weblogMonitor/logs.xml (or locally, xml/rsslogs.xml):

<?xml version="1.0"?>
<!-- <!DOCTYPE foo SYSTEM "http://msdn.microsoft.com/xml/general/htmlentities.dtd"> -->
<weblogs>
  <log>
    <name>MozillaZine</name>
    <url>http://www.mozillazine.org</url>
    <changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl>
    <ownerName>Jason Kersey</ownerName>
    <ownerEmail>…</ownerEmail>
    <description>THE source for news on the Mozilla Organization. DevChats, Reviews, Chats, Builds, Demos, Screenshots, and more.</description>
    <imageUrl></imageUrl>
    <adImageUrl>http://static.userland.com/weblogMonitor/ads/…</adImageUrl>
  </log>
  …
</weblogs>


DOM Design

  • Goal: find all URLs in the logs.
  • The character data of each url element needs to be read. Everything else can be ignored.
  • The getElementsByTagName() method in Document gives us a quick list of all the url elements.


The program: WeblogsDOM.java
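The source of WeblogsDOM.java is not included here. The following is a hypothetical stand-in that follows the design above: it collects the text of every url element with getElementsByTagName(). It reads a file name or URL from the command line (for example xml/rsslogs.xml) and otherwise falls back to a tiny inline sample. Because getElementsByTagName("url") matches names exactly, changesUrl elements are not included.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WeblogsSketch {

    static final String SAMPLE = "<weblogs><log><name>MozillaZine</name>"
            + "<url>http://www.mozillazine.org</url>"
            + "<changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl></log></weblogs>";

    static Document parseXml(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    static Document parseUri(String uri) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(uri);
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse " + uri, e);
        }
    }

    // The text of every <url> element, in document order.
    static List<String> urls(Document doc) {
        List<String> result = new ArrayList<>();
        NodeList urlElements = doc.getElementsByTagName("url");
        for (int i = 0; i < urlElements.getLength(); i++) {
            // getTextContent() (DOM Level 3) concatenates all descendant text
            result.add(urlElements.item(i).getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = args.length > 0 ? parseUri(args[0]) : parseXml(SAMPLE);
        for (String url : urls(doc)) {
            System.out.println(url);
        }
    }
}
```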


CharacterData interface

Represents things that are basically text holders

Super interface of Text, Comment, and CDATASection


The CharacterData Interface

package org.w3c.dom;

public interface CharacterData extends Node {
  // content retrieval
  public String getData() throws DOMException;
  public int getLength();
  public String substringData(int offset, int count) throws DOMException;

  // content modification
  public void setData(String data) throws DOMException;
  public void appendData(String arg) throws DOMException;
  public void insertData(int offset, String arg) throws DOMException;
  public void deleteData(int offset, int count) throws DOMException;
  public void replaceData(int offset, int count, String arg) throws DOMException;
}


Text Nodes

Represents the text content of an element or attribute

Contains only pure text, no markup

Parsers will return a single maximal text node for each contiguous run of pure text

Editing may change this; normalize() merges adjacent text nodes again


The Text Interface

package org.w3c.dom;

public interface Text extends CharacterData {

public Text splitText(int offset) throws DOMException;

}
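A sketch of the CharacterData editing methods followed by Text.splitText(), assuming the JDK's JAXP implementation (the class name is hypothetical). The comments track the text after each call; offsets count UTF-16 characters from 0.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

public class TextEditing {

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element p = doc.createElement("p");
        Text t = doc.createTextNode("Hello world");
        p.appendChild(t);

        t.appendData("!");          // "Hello world!"
        t.insertData(5, ",");       // "Hello, world!"
        t.replaceData(7, 5, "DOM"); // "Hello, DOM!"
        t.deleteData(5, 1);         // "Hello DOM!"
        System.out.println(t.getData() + " / " + t.substringData(6, 3)); // Hello DOM! / DOM

        // splitText keeps "Hello" in t and inserts " DOM!" as t's next sibling.
        Text rest = t.splitText(5);
        System.out.println(p.getChildNodes().getLength());                   // 2
        System.out.println("[" + t.getData() + "][" + rest.getData() + "]"); // [Hello][ DOM!]
    }
}
```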


CDATA section Nodes

Represents a CDATA section like this example from a hypothetical SVG tutorial:

<p>You can use a default <code>xmlns</code> attribute to avoid

having to add the svg prefix to all your elements:</p>

<![CDATA[

<svg xmlns="http://www.w3.org/2000/svg"

width="12cm" height="10cm">

<ellipse rx="110" ry="130" />

<rect x="4cm" y="1cm" width="3cm" height="6cm" />

</svg>

]]>

No children


The CDATASection Interface

package org.w3c.dom;

// no additional methods other than those from Text

public interface CDATASection extends Text {

}


DocumentType Nodes

Represents a document type declaration

Has no children


The DocumentType Interface

package org.w3c.dom;

public interface DocumentType extends Node {

  public String getName();
  public NamedNodeMap getEntities();
  public NamedNodeMap getNotations();
  public String getPublicId();
  public String getSystemId();
  public String getInternalSubset();
}


Example

<!DOCTYPE html PUBLIC

"-//W3C//DTD XHTML 1.0 Strict//EN"

"DTD/xhtml1-strict.dtd">

name = "html", publicId = "-//W3C//DTD XHTML 1.0 Strict//EN", systemId = "DTD/xhtml1-strict.dtd"


Attr Nodes

Represents an attribute

Contains: Text nodes and EntityReference nodes


The Attr Interface

package org.w3c.dom;

public interface Attr extends Node {

public String getName();

public boolean getSpecified(); // false => the value is a default supplied by the DTD

public String getValue();

public void setValue(String value) throws DOMException;

public Element getOwnerElement();

// namespaceURI, prefix, localName inherited from Node

}


ProcessingInstruction Nodes

Represents a processing instruction like

<?robots index="yes" follow="no"?>

No children


The ProcessingInstruction Interface

package org.w3c.dom;

public interface ProcessingInstruction extends Node {

public String getTarget();

public String getData();

public void setData(String data) throws DOMException;

}

Ex: <?robots index="yes" follow="no" ?> target = [robots] data = [index="yes" follow="no" ]
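A sketch that reads the target and data of the robots instruction with a JAXP parser (the class name is hypothetical, and the trailing <html/> root exists only to make the input well-formed). Without whitespace before ?>, the data carries no trailing space.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

public class PiDemo {

    // Return the first processing instruction that is a direct child of the document node.
    static ProcessingInstruction firstPi(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
        for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
                return (ProcessingInstruction) n;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ProcessingInstruction pi = firstPi("<?robots index=\"yes\" follow=\"no\"?><html/>");
        System.out.println("target = [" + pi.getTarget() + "]"); // target = [robots]
        System.out.println("data = [" + pi.getData() + "]");     // data = [index="yes" follow="no"]
        pi.setData("index=\"no\"");  // the data is writable; there is no setTarget()
        System.out.println(pi.getNodeValue());                   // index="no"
    }
}
```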


Comment Nodes

Represents a comment like this example from the XML 1.0 spec:

<!--* This is a comment -->

No children

The Comment Interface

package org.w3c.dom;

public interface Comment extends CharacterData { }

Note: Text, CDATASection, and Comment are all subinterfaces of CharacterData (CDATASection indirectly, through Text), so they can use all the methods defined in it.


Notation


Notation, Entity and EntityReference

public interface Notation extends Node {
  public String getPublicId();
  public String getSystemId();
}

public interface Entity extends Node {   // general entities, parsed or unparsed
  public String getPublicId();           // null if not specified
  public String getSystemId();           // null if not specified
  public String getNotationName();       // unparsed entities only; null for parsed entities
}
// An Entity's replacement text is stored as its read-only childNodes, if available.

public interface EntityReference extends Node { }
// The referenced entity's contents are the (read-only) children of this node;
// nodeName contains the name of the referenced entity.


DOMException

  • A runtime exception, but you should catch it.
  • The error code is accessible from the public code field.
  • The error code gives more detailed information (e.g. after import static org.w3c.dom.DOMException.*;):
    – INDEX_SIZE_ERR: Index or size is negative, or greater than the allowed value.
    – DOMSTRING_SIZE_ERR: The specified range of text does not fit into a String.
    – HIERARCHY_REQUEST_ERR: Attempt to insert a node somewhere it doesn't belong.
    – WRONG_DOCUMENT_ERR: A node is used in a different document than the one that created it (and that document doesn't support it).
    – INVALID_CHARACTER_ERR: An invalid or illegal character is specified, such as in a name.
    – NO_DATA_ALLOWED_ERR: Attempt to add data to a node which does not support data.


DOMException

    – NO_MODIFICATION_ALLOWED_ERR: Attempt to modify a read-only object.
    – NOT_FOUND_ERR: Attempt to reference a node in a context where it does not exist.
    – NOT_SUPPORTED_ERR: The implementation does not support the type of object requested.
    – INUSE_ATTRIBUTE_ERR: Attempt to add an attribute to an element that already has that attribute.
    – INVALID_STATE_ERR: An attempt is made to use an object that is not, or no longer, usable.
    – SYNTAX_ERR: An invalid or illegal string is specified.
    – INVALID_MODIFICATION_ERR: An attempt to modify the type of the underlying object.
    – NAMESPACE_ERR: An attempt is made to create or change an object in a way which is incorrect with regard to namespaces.
    – INVALID_ACCESS_ERR: A parameter or an operation is not supported by the underlying object.
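A sketch that provokes three of these codes on purpose and reads them from the public code field. It assumes the JDK's built-in implementation, which checks names and tree structure by default; codeOf() is a hypothetical helper.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

public class DomErrors {

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Run an action and return the DOMException code it raised, or -1 if it succeeded.
    static short codeOf(Runnable action) {
        try {
            action.run();
            return -1;
        } catch (DOMException e) {
            return e.code; // public field holding one of the *_ERR constants
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        doc.appendChild(doc.createElement("root"));
        Document other = newDocument();

        // An XML name cannot start with a digit.
        short c1 = codeOf(() -> doc.createElement("1bad"));
        // A document may have only one root element.
        short c2 = codeOf(() -> doc.appendChild(doc.createElement("second")));
        // A node created by another document cannot be inserted directly (importNode first).
        short c3 = codeOf(() -> doc.getDocumentElement().appendChild(other.createElement("x")));

        System.out.println(c1 == DOMException.INVALID_CHARACTER_ERR); // true
        System.out.println(c2 == DOMException.HIERARCHY_REQUEST_ERR); // true
        System.out.println(c3 == DOMException.WRONG_DOCUMENT_ERR);    // true
    }
}
```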


The DOMImplementation interface

Creates new Document objects

Creates new DocumentType objects

Tests features supported by this implementation


DOMImplementation interface

package org.w3c.dom;

public interface DOMImplementation {

  public boolean hasFeature(String feature, String version);
  public Object getFeature(String feature, String version);   // DOM Level 3
  public DocumentType createDocumentType(String qName, String publicID, String systemID) throws DOMException;
  public Document createDocument(String uri, String qName, DocumentType doctype) throws DOMException;
}
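With JAXP, a DOMImplementation can be obtained portably from DocumentBuilder.getDOMImplementation() instead of a parser-specific class. This sketch (hypothetical class name) uses it to create an XHTML document carrying the doctype from the DocumentType example; the standard org.w3c.dom.DOMImplementation.createDocumentType() takes three arguments.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;

public class ImplDemo {

    static DOMImplementation implementation() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    static Document newXhtmlDocument() {
        DOMImplementation impl = implementation();
        DocumentType doctype = impl.createDocumentType("html",
                "-//W3C//DTD XHTML 1.0 Strict//EN", "DTD/xhtml1-strict.dtd");
        // createDocument also creates the root element <html> in the given namespace.
        return impl.createDocument("http://www.w3.org/1999/xhtml", "html", doctype);
    }

    public static void main(String[] args) {
        Document doc = newXhtmlDocument();
        System.out.println(doc.getDoctype().getName());                 // html
        System.out.println(doc.getDoctype().getPublicId());             // -//W3C//DTD XHTML 1.0 Strict//EN
        System.out.println(doc.getDocumentElement().getNamespaceURI()); // http://www.w3.org/1999/xhtml
        System.out.println("XML 2.0 supported: " + implementation().hasFeature("XML", "2.0"));
    }
}
```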


org.apache.xerces.dom.DOMImplementationImpl

The Xerces-specific class that implements DOMImplementation

package org.apache.xerces.dom;

public class DOMImplementationImpl implements DOMImplementation {

  // factory method
  public static DOMImplementation getDOMImplementation();

  public boolean hasFeature(String feature, String version);
  public Object getFeature(String feature, String version);
  public DocumentType createDocumentType(String qName, String publicID, String systemID) throws DOMException;
  public Document createDocument(String uri, String qName, DocumentType doctype) throws DOMException;
}


Examples of creating DOM documents in the memory

FibonacciDOM.java using Xerces-j

FibonacciJAXP.java using JAXP.


Which modules and features are supported?

A DOM application can use the hasFeature() method of the DOMImplementation interface to determine whether a module is supported or not.

XML Module: "XML"

HTML Module: "HTML"

Views Module: "Views"

StyleSheets Module: "StyleSheets"

CSS Module: "CSS"

CSS (extended interfaces) Module: "CSS2"

Events Module: "Events"

User Interface Events (UIEvent interface) Module: "UIEvents"

Mouse Events Module: "MouseEvents"

Mutation Events Module: "MutationEvents"

HTML Events Module: "HTMLEvents"

Traversal Module: "Traversal"

Range Module: "Range"


Which modules are supported?

import org.apache.xerces.dom.DOMImplementationImpl;
import org.w3c.dom.*;
import java.io.*;

public class ModuleChecker {

  public static void main(String[] args) {
    // parser dependent
    DOMImplementation implementation = DOMImplementationImpl.getDOMImplementation();
    String[] features = { "XML", "HTML", "Views", "StyleSheets", "CSS", "CSS2",
                          "Events", "UIEvents", "MouseEvents", "MutationEvents",
                          "HTMLEvents", "Traversal", "Range" };
    for (int i = 0; i < features.length; i++) {
      if (implementation.hasFeature(features[i], "2.0")) {
        System.out.println("Implementation supports " + features[i]);
      }
      else {
        System.out.println("Implementation does not support " + features[i]);
      }
    }
  }
}


The result

> java ModuleChecker
Implementation supports XML
Implementation does not support HTML
Implementation does not support Views
Implementation does not support StyleSheets
Implementation does not support CSS
Implementation does not support CSS2
Implementation supports Events
Implementation does not support UIEvents
Implementation does not support MouseEvents
Implementation supports MutationEvents
Implementation does not support HTMLEvents
Implementation supports Traversal
Implementation supports Range
>


Serialization

The process of taking an in-memory DOM tree and converting it to a stream of characters that can be written onto an output stream

Not a standard part of DOM Level 2 (DOM Level 3 Load and Save later standardized serialization through LSSerializer).

The org.apache.xml.serialize package:

public interface DOMSerializer
public interface Serializer
public abstract class BaseMarkupSerializer extends Object
    implements DocumentHandler, org.xml.sax.misc.LexicalHandler, DTDHandler,
               org.xml.sax.misc.DeclHandler, DOMSerializer, Serializer
public class HTMLSerializer extends BaseMarkupSerializer
public final class TextSerializer extends BaseMarkupSerializer
public final class XHTMLSerializer extends HTMLSerializer
public final class XMLSerializer extends BaseMarkupSerializer


Example

A DOM program that writes Fibonacci numbers onto System.out

FibonacciDOMSerializer.java
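FibonacciDOMSerializer.java itself is not reproduced here. As a hypothetical stand-in, this sketch builds a Fibonacci document in memory and serializes it with the standard JAXP identity Transformer rather than org.apache.xml.serialize. The element names Fibonacci_Numbers and fibonacci, the index attribute, and the fibonacci.dtd system identifier are assumptions; the exact indentation of the output can differ between JDK versions.

```java
import java.io.StringWriter;
import java.math.BigInteger;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class FibonacciSketch {

    // <Fibonacci_Numbers><fibonacci index="1">1</fibonacci>...</Fibonacci_Numbers>
    static Document build(int count) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("Fibonacci_Numbers");
            doc.appendChild(root);
            BigInteger low = BigInteger.ONE, high = BigInteger.ONE;
            for (int i = 1; i <= count; i++) {
                Element number = doc.createElement("fibonacci");
                number.setAttribute("index", Integer.toString(i));
                number.appendChild(doc.createTextNode(low.toString()));
                root.appendChild(number);
                BigInteger next = low.add(high); // advance the sequence
                low = high;
                high = next;
            }
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serialize with an identity transform: UTF-8, indented, with a SYSTEM doctype.
    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "fibonacci.dtd");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(build(10)));
    }
}
```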


OutputFormat

For pretty-formatted output.

package org.apache.xml.serialize;

public class OutputFormat extends Object {

  // [ ... ] marks optional parameters
  public OutputFormat( [String method, String encoding, boolean indenting] )
  public OutputFormat( [Document doc,] String encoding, boolean indenting )

  // A property listed as "String method" abbreviates the accessor pair:
  //   public String getMethod();  public void setMethod(String method)
  // Other public properties:
  //   int indent, lineWidth;
  //   boolean indenting, OmitXMLDeclaration, Standalone, PreserveSpace;
  //   String encoding, version, mediaType, DoctypePublic, DoctypeSystem, LineSeparator;

  public void setDoctype(String publicID, String systemID)

  // Elements whose text children should be output as CDATA
  public String[] getCDataElements()
  public boolean isCDataElement(String tagName)
  public void setCDataElements(String[] cdataElements)


OutputFormat

  // Non-escaping elements: their text children are output without character references
  public String[] getNonEscapingElements()
  public boolean isNonEscapingElement(String tagName)
  public void setNonEscapingElements(String[] nonEscapingElements)

  // Last printable character in the encoding
  public char getLastPrintable()

  // Query methods
  public static String whichMethod(Document doc)
  public static String whichDoctypePublic(Document doc)
  public static String whichDoctypeSystem(Document doc)
  public static String whichMediaType(String method)
}


Better formatted output

UTF-8 encoding, indentation, word wrapping, and a document type declaration:

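import java.io.IOException;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;

// fibonacci (a Document) and root (its root Element) were built earlier in the program
try {
  // Now that the document is created, we need to *serialize* it
  OutputFormat format = new OutputFormat(fibonacci, "UTF-8", true);
  format.setLineSeparator("\r\n");
  format.setLineWidth(72);
  format.setDoctype(null, "fibonacci.dtd");
  XMLSerializer serializer = new XMLSerializer(System.out, format);
  serializer.serialize(root);
}
catch (IOException e) { System.err.println(e); }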

> java domexample.PrettyFibonacciDOMSerializer
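Roughly the same settings can be requested through the standard JAXP Transformer; this is a sketch, not a drop-in replacement. OutputKeys.ENCODING, INDENT and DOCTYPE_SYSTEM cover the encoding, indentation and document type declaration. Assumptions: the indent-amount property name below is specific to the Xalan-derived transformer bundled with the JDK, and the standard output properties have no counterpart for setLineSeparator() or setLineWidth(). The sample document and class name are illustrative.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class PrettyOutputSketch {

    // A tiny stand-in for the Fibonacci document (names are illustrative).
    static Document sample() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("Fibonacci_Numbers");
        doc.appendChild(root);
        for (int i = 1; i <= 2; i++) {
            Element number = doc.createElement("fibonacci");
            number.setAttribute("index", Integer.toString(i));
            number.setTextContent("1");                            // F(1) = F(2) = 1
            root.appendChild(number);
        }
        return doc;
    }

    static String prettyPrint(Document doc, String systemId) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");         // ~ encoding "UTF-8"
        t.setOutputProperty(OutputKeys.INDENT, "yes");             // ~ indenting = true
        t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, systemId);  // ~ setDoctype(null, systemId)
        // Assumption: JDK/Xalan-specific property; other processors may ignore it.
        t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(prettyPrint(sample(), "fibonacci.dtd"));
    }
}
```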


DOM-based XML pretty printer

import java.io.IOException;
import org.apache.xerces.parsers.DOMParser;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class DOMPrettyPrinter {

  public static void main(String[] args) {
    DOMParser parser = new DOMParser();
    for (int i = 0; i < args.length; i++) {
      try {
        // Read the entire document into memory
        parser.parse(args[i]);
        Document document = parser.getDocument();

        // Set the output format and serialize
        OutputFormat format = new OutputFormat(document, "UTF-8", true);
        format.setLineSeparator("\r\n");
        format.setIndenting(true);
        format.setIndent(2);
        format.setLineWidth(72);
        format.setPreserveSpace(false);

        XMLSerializer serializer = new XMLSerializer(System.out, format);
        serializer.serialize(document);
      } catch (SAXException e) {
        System.err.println(e);
      } catch (IOException e) {
        System.err.println(e);
      }
    }
  } // end main
}


Notes

Using the DOM to write documents automatically maintains well-formedness constraints.

Validity is not automatically maintained.
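The difference can be observed directly. In the sketch below (class and element names are illustrative), the DOM implementation bundled with the JDK, with its default error checking, rejects an illegal element name and a second root element by throwing a DOMException, because both would break well-formedness. An element that some DTD might forbid is accepted silently, since no DTD or schema is consulted while the tree is built.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

public class WellFormednessSketch {

    // Returns the DOMException code raised by the action, or -1 if none was raised.
    static short errorCode(Runnable action) {
        try {
            action.run();
            return -1;
        } catch (DOMException e) {
            return e.code;
        }
    }

    static short[] runChecks() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("book"));
        return new short[] {
            // "1title" is not a legal XML name: INVALID_CHARACTER_ERR (5)
            errorCode(() -> doc.createElement("1title")),
            // A document may have only one root element: HIERARCHY_REQUEST_ERR (3)
            errorCode(() -> doc.appendChild(doc.createElement("second"))),
            // No DTD is consulted while building, so any child element is accepted: -1
            errorCode(() -> doc.getDocumentElement().appendChild(doc.createElement("anything")))
        };
    }

    public static void main(String[] args) throws Exception {
        for (short code : runChecks()) {
            System.out.println(code);
        }
    }
}
```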


References

Most of the content of this presentation comes from: http://www.cafeconleche.org/slides/sd2004west/saxdom

Processing XML with Java, Elliotte Rusty Harold, Chapters 9-13:
- Chapter 9, The Document Object Model
- Chapter 10, Creating New XML Documents with DOM
- Chapter 11, The Document Object Model Core
- Chapter 12, The DOM Traversal Module
- Chapter 13, Output from DOM

DOM Level 2 Core Specification

DOM Level 2 Traversal and Range Specification


JAXP (Java API for XML Processing) for DOM


DOMParsers and DOMImplementations

Problems:
- How do we get a DOM Document object from an XML document?
  Get a DOM parser, parse the XML document, and then get the DOM document.
- How do we construct DOM objects directly in a program?
  Get a DOMImplementation and invoke createDocument() to get the initial DOM document.
- How do we get a DOM object from an XML document and modify it in a program?
  Get a DOM document by parsing the XML document, use the factory methods of Document to create nodes, and use Node methods to add them to the result tree.

[Figure: XML Document → DOMParser → DOM Document]
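The first and third tasks can be sketched with the standard JAXP API described later in these slides, in place of a Xerces-specific parser. The input reuses the TABLE/TBODY/TR/TD shape of the earlier tree example; the book titles and class name are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class ParseAndModifySketch {

    static Document parseAndModify(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();

        // Task 1: parse the XML document into a DOM Document.
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        // Task 3: create new nodes with the Document factory methods...
        Element row = doc.createElement("TR");
        Element title = doc.createElement("TD");
        title.appendChild(doc.createTextNode("Water Margin"));
        row.appendChild(title);

        // ...and attach them to the existing tree with Node methods.
        Element body = (Element) doc.getElementsByTagName("TBODY").item(0);
        body.appendChild(row);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseAndModify(
                "<TABLE><TBODY><TR><TD>Dream of the Red Chamber</TD></TR></TBODY></TABLE>");
        System.out.println(doc.getElementsByTagName("TR").getLength()); // prints 2
    }
}
```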


Using Apache Xerces for DOM

XML2DOM:
// DOM parser implementation class: org.apache.xerces.parsers.DOMParser
DOMParser parser = new DOMParser();
parser.setFeature("http://xml.org/sax/features/validation", true);
parser.setFeature("http://xml.org/sax/features/namespaces", true);
…
parser.parse(url_or_inputSource);
Document doc = parser.getDocument();
DOMImplementation impl = doc.getImplementation();

Construct a DOM from scratch:
// DOMImplementation class: org.apache.xerces.dom.DOMImplementationImpl
DOMImplementation dm = new DOMImplementationImpl();
// or: dm = DOMImplementationImpl.getDOMImplementation(); // Xerces-specific, not part of the DOM API
Document doc = dm.createDocument(…);
Element e = doc.createElement(…);
Attr attr = doc.createAttributeNS(…);
Text txt = doc.createTextNode("…");
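DOMImplementationImpl is a Xerces class. As a hedged, portable alternative, the sketch below obtains a DOMImplementation through JAXP (DocumentBuilder.getDOMImplementation(), covered later) and then uses only standard DOM Level 2 calls: createDocument(), createElementNS(), setAttributeNS() and createTextNode(). The namespace URI, element names and class name are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class FromScratchSketch {

    static final String NS = "http://example.com/books";   // hypothetical namespace URI

    static Document build() throws Exception {
        DOMImplementation impl = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().getDOMImplementation();
        // createDocument(namespaceURI, qualifiedName, doctype) also creates the root element.
        Document doc = impl.createDocument(NS, "bk:books", null);
        Element root = doc.getDocumentElement();

        Element book = doc.createElementNS(NS, "bk:book");
        book.setAttributeNS(null, "id", "b1");               // unprefixed attribute, no namespace
        book.appendChild(doc.createTextNode("Romance of the Three Kingdoms"));
        root.appendChild(book);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = build();
        Element book = (Element) doc.getElementsByTagNameNS(NS, "book").item(0);
        System.out.println(book.getAttribute("id") + ": " + book.getTextContent());
    }
}
```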


JAXP (Java API for XML Processing) 1.2
Sun’s Java API for XML Processing, with three modules:
- DOM processing
- SAX processing
- Transformation

5 packages:
1. javax.xml.parsers
   Provides classes allowing the processing of XML documents. Two types of pluggable parsers are supported: SAX (Simple API for XML) and DOM (Document Object Model).
2. javax.xml.transform (+ …)
   APIs for processing transformation instructions and performing a transformation from source to result.
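As a sketch of the "from source to result" idea: with no stylesheet, TransformerFactory.newTransformer() returns an identity transformer, so pairing a StreamSource with a DOMResult turns XML text into a DOM tree (a DOMSource with a StreamResult goes the other way). The input string and class name are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

public class TransformToDomSketch {

    static Document toDom(String xml) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        DOMResult result = new DOMResult();   // no node supplied: the transformer creates a Document
        identity.transform(new StreamSource(new StringReader(xml)), result);
        return (Document) result.getNode();
    }

    public static void main(String[] args) throws Exception {
        Document doc = toDom("<TABLE><TBODY><TR><TD>Journey to the West</TD></TR></TBODY></TABLE>");
        System.out.println(doc.getDocumentElement().getTagName()); // prints TABLE
    }
}
```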


JAXP’s DOM pluggability mechanism


JAXP API for DOM

javax.xml.parsers.DocumentBuilder
- Using this class, an application programmer can obtain a Document from XML.

javax.xml.parsers.DocumentBuilderFactory
- A factory class for obtaining a DocumentBuilder.
- An abstract class; a concrete subclass instance can be obtained with the static method DocumentBuilderFactory.newInstance().
- The desired capabilities of the parser can be specified by setting the various properties of the obtained factory instance.


Example code snippet

import java.io.IOException;
import javax.xml.parsers.*;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

DocumentBuilder builder;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
factory.setValidating(true);
String location = "http://myserver/mycontent.xml";
try {
  builder = factory.newDocumentBuilder();
  Document doc1 = builder.parse(location);
  Document doc2 = builder.newDocument(); // empty document
} catch (SAXException se) {
  // handle error
} catch (IOException ioe) {
  // handle error
} catch (ParserConfigurationException pce) {
  // handle error
}


javax.xml.parsers.DocumentBuilder

abstract DOMImplementation getDOMImplementation() Obtain an instance of a DOMImplementation object.

abstract Document newDocument() Obtain a new instance of a DOM Document object to build a DOM tree with.

abstract boolean isNamespaceAware() Indicates whether or not this parser is configured to understand namespaces.

abstract boolean isValidating() Indicates whether or not this parser is configured to validate XML documents.

Document parse(File | InputSource | InputStream [, systemId] | uriString) Parse the content of the given source as an XML document and return a new DOM Document object.

abstract void setEntityResolver(EntityResolver er) Specify the EntityResolver to be used to resolve entities present in the XML document to be parsed.

abstract void setErrorHandler(ErrorHandler eh) Specify the ErrorHandler to be used to report errors present in the XML document to be parsed.
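A small sketch of setErrorHandler() with a validating builder. The documents and their internal DTD subset are made up for illustration. Validity errors are reported to ErrorHandler.error(), which does not stop parsing unless the handler throws, so this handler just collects the messages; fatalError() rethrows because the document is not well-formed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class ValidationErrorsSketch {

    static List<String> validityErrors(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);                       // DTD validation
        DocumentBuilder builder = factory.newDocumentBuilder();

        List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { errors.add("warning: " + e.getMessage()); }
            // Validity errors arrive here; parsing continues afterwards.
            public void error(SAXParseException e) { errors.add("error: " + e.getMessage()); }
            // Well-formedness errors: give up.
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        String dtd = "<!DOCTYPE note [<!ELEMENT note (to)><!ELEMENT to (#PCDATA)>]>";
        // <from> is not declared, and <note> must contain exactly one <to>.
        validityErrors(dtd + "<note><from>me</from></note>").forEach(System.out::println);
    }
}
```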


javax.xml.parsers.DocumentBuilderFactory

Object getAttribute(String name) void setAttribute(String name, Object value)

Allows users to set/get specific attributes on the underlying implementation.

boolean isIgnoringComments(), setIgnoringComments(boolean) Indicates whether or not the factory is configured to produce parsers which ignore comments.

Other properties: IgnoringElementContentWhitespace; ExpandEntityReferences; Coalescing (merge CDATA sections and adjacent text into a single text node); NamespaceAware; Validating.

abstract DocumentBuilder newDocumentBuilder() Creates a new instance of a DocumentBuilder using the currently configured parameters.

static DocumentBuilderFactory newInstance() Obtain a new instance of a DocumentBuilderFactory.
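The effect of two of these properties can be checked with a small sketch (the sample document and class name are made up). Parsed with default settings, the root below has four children (Text, CDATASection, Text, Comment); with Coalescing and IgnoringComments enabled, it has a single Text node "abc".

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class FactoryOptionsSketch {

    static final String SAMPLE = "<r>a<![CDATA[b]]>c<!--note--></r>";

    // Parses SAMPLE and reports how many child nodes the root element ends up with.
    static int rootChildCount(boolean coalescing, boolean ignoringComments) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setCoalescing(coalescing);               // merge CDATA into adjacent text
        factory.setIgnoringComments(ignoringComments);   // drop Comment nodes
        Element root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)))
                .getDocumentElement();
        return root.getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootChildCount(false, false)); // 4
        System.out.println(rootChildCount(true, true));   // 1
    }
}
```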


How DocumentBuilderFactory finds its implementation

1. Use the javax.xml.parsers.DocumentBuilderFactory system property.

2. Use the same property in the file %JAVA_HOME%/lib/jaxp.properties in the JRE directory.

3. Look for the class name in the file META-INF/services/javax.xml.parsers.DocumentBuilderFactory in jars available to the runtime.

4. Use the platform default DocumentBuilderFactory implementation, which is "org.apache.crimson.jaxp.DocumentBuilderFactoryImpl" for JDK 1.4 and "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl" for JDK 1.5.
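The first step of this lookup can be observed with a short sketch. It assumes a stock JDK, whose built-in implementation class is com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl (the JDK 1.5 name above, still used by later JDKs). The property is set only around the call and then restored, so the normal search order applies again afterwards; the class name is illustrative.

```java
import javax.xml.parsers.DocumentBuilderFactory;

public class FactoryLookupSketch {

    static final String PROPERTY = "javax.xml.parsers.DocumentBuilderFactory";
    // Assumption: the JDK's built-in implementation class.
    static final String JDK_IMPL = "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl";

    // Step 1 of the lookup: the system property names the implementation class.
    static String lookupWithProperty(String className) {
        String previous = System.getProperty(PROPERTY);
        System.setProperty(PROPERTY, className);
        try {
            return DocumentBuilderFactory.newInstance().getClass().getName();
        } finally {
            if (previous == null) System.clearProperty(PROPERTY);
            else System.setProperty(PROPERTY, previous);
        }
    }

    public static void main(String[] args) {
        // Without the property: jaxp.properties, META-INF/services, then the platform default.
        System.out.println(DocumentBuilderFactory.newInstance().getClass().getName());
        System.out.println(lookupWithProperty(JDK_IMPL));
    }
}
```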