Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan...

83
Representing Web Data: XML CSI 3140 WWW Structures, Techniques and Standards

Transcript of Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan...

Page 1: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Representing Web Data:XML

CSI 3140

WWW Structures, Techniques and Standards

Page 2: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 2

XML

Example XML document:

An XML document is one that follows

certain syntax rules (most of which we

followed for XHTML)

Page 3: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3

XML Syntax

An XML document consists of

Markup

Tags, which begin with < and end with >

References, which begin with & and end with ;

Character, e.g. &#x20;

Entity, e.g. &lt;

The entities lt, gt, amp, apos, and quot are recognized in every XML document.

Other XHTML entities, such as nbsp, are only recognized in other XML documents if they are defined in the DTD

Character data: everything not markup

Page 4: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 4

XML Syntax

Comments

Begin with <!--

End -->

Must not contain –

CDATA section

Special element the entire content of which is interpreted

as character data, even if it appears to be markup

Begins with <![CDATA[

Ends with ]]> (illegal except when ending CDATA)

Page 5: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 5

XML Syntax

The CDATA section

is equivalent to the markup

Page 6: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 6

XML Syntax

< and & must be represented by references

except

When beginning markup

Within comments

Within CDATA sections

Page 7: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 7

XML Syntax

Element tags and elements

Three types

Start, e.g. <message>

End, e.g. </message>

Empty element, e.g. <br />

Start and end tags must properly nest

Corresponding pair of start and end element tags plus

everything in between them defines an element

Character data may only appear within an element

Page 8: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 8

XML Syntax

Start and empty-element tags may contain

attribute specifications separated by white

space

Syntax: name = quoted value

quoted value must not contain <, can contain &

only if used as start of reference

quoted value must begin and end with matching

quote characters („ or “)

Page 9: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 9

XML Syntax

Element and attribute names are case

sensitive

XML white space characters are space,

carriage return, line feed, and tab

Page 10: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 10

XML Documents

A well-formed XML document

follows the XML syntax rules and

has a single root element

Well-formed documents have a tree structure

Many XML parsers (software for

reading/writing XML documents) use tree

representation internally

Page 11: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 11

XML Documents

An XML document is written according to

an XML vocabulary that defines

Recognized element and attribute names

Allowable element content

Semantics of elements and attributes

XHTML is one widely-used XML

vocabulary

Another example: RSS (rich site summary)

Page 12: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 12

XML Documents

Page 13: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 13

XML Documents

Page 14: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 14

XML Documents

Valid names and content for an XML

vocabulary can be specified using

Natural language

XML DTDs (Chapter 2)

XML Schema (Chapter 9)

If DTD is used, then XML document can

include a document type declaration:

Page 15: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 15

XML Documents

Two types of XML parsers:

Validating

Requires document type declaration

Generates error if document does not

Conform with DTD and

Meet XML validity constraints

Example: every attribute value of type ID must be unique within the document

Non-validating

Checks for well-formedness

Can ignore external DTD

Page 16: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 16

XML Documents

Good practice to begin XML documents with

an XML declaration

Minimal example:

If included, < must be very first character of the

document

To override default UTF-8/UTF-16 character

encoding, include encoding declaration

following version:

Page 17: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 17

XML Documents

Internal subset of DTD

Entity vsn will be defined by any XML parser,

validating or not

Declaration of

internal subset of DTD

Page 18: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 18

XML Namespaces

XML Namespace: Collection of element and

attribute names associated with an XML vocabulary

Namespace Name: Absolute URI that is the name

of the namespace

Ex: http://www.w3.org/1999/xhtml is the namespace

name of XHTML 1.0

Default namespace for elements of a document is

specified using a form of the xmlns attribute:

Page 19: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 19

XML Namespaces

Another form of xmlns attribute known as a

namespace declaration can be used to

associate a namespace prefix with a

namespace name:Namespace prefix

Namespace declaration

Page 20: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 20

XML Namespaces

Example use of namespace prefix:

Page 21: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 21

XML Namespaces

In a namespace-aware XML application, all

element and attribute names are considered qualified

names

A qualified name has an associated expanded name that

consists of a namespace name and a local name

Ex: item is a qualified name with expanded name

<null, item>

Ex: xhtml:a is a qualified name with expanded name

<http://www.w3.org/1999/xhtml, a>

Page 22: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 22

XML Namespaces

Other namespace usage:A namespace can be declared and used on the same element

Page 23: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 23

XML Namespaces

Other namespace usage:

A namespace prefix can be redefined for

an element and its content

These elements belong to http://www.example.org/namespace

Page 24: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 24

JavaScript and XML

JavaScript DOM can be used to process

XML documents

JavaScript XML Dom processing is often

used with XMLHttpRequest

Host object that is a constructor for other host

objects

Sends an HTTP request to server, receives back

an XML document

Page 25: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 25

JavaScript and XML

Example use:

Previous visit count servlet: must reload

document to see updated count

Visit count with XMLHttpRequest: browser

will automatically update the visit count

periodically without reloading the entire page

Page 26: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Jackson, Web Technologies: A Computer Science Perspective, © 2007 Prentice-Hall, Inc. All rights reserved. 0-13-185603-0

26

JavaScript and XMLDocument

generated by

GET request to

VisitCountUpdate

servlet

JavaScript file using

XMLHttpRequest object

span that will be updated by JavaScript code

Page 27: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Jackson, Web Technologies: A Computer Science Perspective, © 2007 Prentice-Hall, Inc. All rights reserved. 0-13-185603-0

27

JavaScript and XML

XMLHttpRequest

request is processed

by doPost() method of

servlet

Response is XML document

Current visit count is returned as content

of count XML element

Page 28: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 28

JavaScript and XML

Page 29: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 29

JavaScript and XML

Typical code for creating an instance of XMLHttpRequest

Page 30: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 30

JavaScript and XML

Return immediately after sending request

(asynchronous behavior)

Function called as

state of connection

changes

Body of request (empty in this example)

Page 31: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 31

JavaScript and XML

Indicates response received

successfully

Root of

returned

XML

document

Page 32: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 32

JavaScript and XML

Ajax: Asynchronous JavaScript and XML

Combination of

(X)HTML

XML

CSS

JavaScript

JavaScript DOM (HTML and XML)

XMLHttpRequest in asynchronous mode

Page 33: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 33

Java-based DOM

Java DOM API defined by org.w3c.dom

package

Semantically similar to JavaScript DOM

API, but many small syntactic differences

Nodes of DOM tree belong to classes such as

Node, Document, Element, Text

Non-method properties accessed via methods

Ex: parentNode accessed by calling

getParentNode()

Page 34: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 34

Java-based DOM

Methods such as

getElementsByTagName() return

instance of NodeList

getLength() method returns # of items

item() method returns an item

Page 35: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 35

Java-based DOM

Example: program to count link elements

in an RSS document:

Page 36: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 36

Java-based DOM

Imports:

From Java

API for XML

Processing

(JAXP)

Page 37: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 37

Java-based DOM

Default parser is non-validating and non-

namespace-aware.

Overriding:

Also setValidating(true)

Page 38: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 38

Java-based DOM

Namespace-aware versions of methods end

in NS:Namespace name

Local name

Page 39: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 39

SAX

Page 40: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 40

SAX

Page 41: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 41

SAX

Page 42: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 42

SAX

Used if not namespace-aware or

if qualified name does not belong

to any namespace.

Page 43: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 43

SAX

Page 44: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 44

Transformations

JAXP provides API for transforming

between DOM, SAX, and Stream (text)

representations of XML documents

Example:

Input from stream to DOM

Modify DOM

Output as stream

Page 45: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 45

Transformations

Page 46: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 46

Transformations

Page 47: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 47

Transformations

“SAX output” means that a SAX event

handler is called:

Example: the code

feeds the XML document represented by DOM

document through the SAX event handler

CountElementsHelper()

Page 48: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 48

XSL

The Extensible Stylesheet Language (XSL)

is an XML vocabulary typically used to

transform XML documents from one form to

another form

XSL document

Input XML

document XSLT ProcessorOutput XML

document

Page 49: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 49

XSL

XSL

markup

Everything in the body

of the document that is

not XSL markup is

template data

Example XSL document

HelloWorld.xsl

Page 50: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 50

XSL

Input XML document HelloWorld.xml:

Output XML document:

Page 51: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 51

XSL

Page 52: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 52

XSL

Components of XSL:

XSL Transformations (XSLT): defines XSL

namespace elements and attributes

XML Path Language (XPath): used in many

XSL attribute values (ex: child::message)

XSL Formatting Objects (XSL-FO): XML

vocabulary for defining document style (print-

oriented)

Page 53: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 53

XPath

XPath operates on a tree representation of an XML document

Similar to DOM tree in that nodes have different types (element, text, comment, etc.)

Unlike DOM, attributes are also nodes in the XPath tree

Root of XPath tree called document root

One of the children of the document root, called the document element, is the root element of the XML document

Page 54: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 54

XPath

Location path: expression representing one

or more XPath tree nodes

/ represents document root

child::message is an example of a location

step and has two parts:

Axis name Node test

Page 55: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 55

XPath

Attribute nodes are

only seen along the attribute axis

XSLT specifies

context node

Page 56: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 56

XPath

Node test:

Name test: qualified name representing an

element (or attribute, for attribute axis) type

Example: child::message uses a name test

May use * as wildcard name test

Node-type test:

text(): true if node is a text node

comment(): true if node is a comment node

node(): true of any node

Page 57: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 57

XPath

A location step can have one or more

predicates that act as filters:

This predicate

applied first

Page 58: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 58

XPath

Page 59: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 59

XPath

Abbreviations:

Axis defaults to child if not specified

child::para = para

@ can be used in place of attribute::

attribute::display = @display

parent::node() = ..

self::node() = .

Page 60: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 60

XPath

A location path is one or more location steps

separated by /

Ex: child::para/child::strong (or

just para/strong)

Ex: para/strong/emph

Page 61: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 61

XPath

Evaluating a two-step location path:

Evaluate first location step, producing node list L1

For each node ni in L1

Temporarily set context node to ni

Evaluate second location step, producing node list Li

Result is union of Lis

Continue process for paths with more steps

Page 62: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 62

XPath

If body is context node, then:

para/strong represents {s1,s2,s4}

para[strong] represents {p1, p3}

Page 63: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 63

XPath

An absolute location path begins with / and uses

the document root as the context node

Ex: /body/para represents all para nodes that are

children of body which is child of document root

Ex: / represents list consisting only of the document

root

A relative location path does not begin with / and

uses an element determined by the application (e.g.,

XSLT) as the context node

Ex: body/para

Page 64: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 64

XPath

Another abbreviation:

/descendant-or-self::node()/ = //

Examples:

//strong: All strong elements in document

.//strong: All strong elements that are

descendants (or self) relative to the context node

Page 65: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 65

XPath

Combining node lists:

Use | to represent union of node lists produced

by individual location paths

Ex: strong|descendant::emph

represents all nodes that are either

children of the context node of type strong; or

descendants of the context node of type emph

Page 66: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 66

XSLT

Template

rule

Pattern of template rule

Template

of template

rule

Page 67: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 67

XSLT

XSLT processor deals with three XPath

trees:

Input trees: source and style-sheet

Elements containing only white space are normally

not included in either input tree (exception:

xsl:text element)

White space retained within other elements

Output tree: result

Page 68: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 68

XSLT

XSLT processing (high level):

Construct input trees

Initialize empty result tree

Search source tree for a node that is matched by a

template rule, i.e., a node that is contained in the node list

represented by the pattern of some template rule

Instantiate the template of the matching template rule in

the result tree

Context node for XPath expressions is matched node

Page 69: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 69

XSLT

Matches source tree document root

Context node for relative location

path is source tree document root

Page 70: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 70

XSLT

Page 71: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 71

XSLT

Restrictions on XPath in template rule

pattern (value of match attribute):

Only child and attribute axes are allowed

directly (can indirectly use descendant-or-

self axis via // notation)

XPath expression must evaluate to a node list

(some XPath expressions are functions that

produce string values)

Page 72: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 72

XSLT

Page 73: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 73

XSLT

Page 74: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 74

XSLT

Page 75: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 75

XSLT

Page 76: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 76

XSLT

Adding attributes:

Page 77: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 77

XSLTSource document elements are in a namespace

We want to copy

all h1 elements

plus all of their

descendants

Page 78: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 78

XSLT

XPath expressions must

use qualified names (because source

includes namespace)

Copy

element

and content

Page 79: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 79

XSLT

Template markup

in the result tree becomes

Most browsers will not accept this notation!

XSLT does not recognize &nbsp;

Solution:

Page 80: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 80

XSLT

Adding XML special characters to the result

Template:

Result:

disable-output-escaping also applies to value-of

Becomes & on input

Page 81: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 81

XSLT

Output formatting:

Add xml:space=“preserve” to

transform element of template to retain white

space

Use xsl:output element:

Page 82: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 82

XML and Browsers

An XML document can contain a processing

instruction telling a browser to:

Apply XSLT to create an XHTML document:

Page 83: Representing Web Data: XML - Engineeringgvj/Courses/CSI3140/lectures/XML.pdf · Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 3 XML Syntax An XML document

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 83

XML and Browsers

An XML document can contain a processing

instruction telling a browser to:

Apply CSS to style the XML document: