XML for .NET

44
XML for .NET Session 1 Introduction to XML Introduction to XSLT Programmatically Reading XML Documents Introduction to XPATH

description

XML for .NET. Session 1 Introduction to XML Introduction to XSLT Programmatically Reading XML Documents Introduction to XPATH. Introduction to XML. XML stands for e X tensible M arkup L anguage. XML is a way of specifying self-describing data in a human-readable format. - PowerPoint PPT Presentation

Transcript of XML for .NET

Page 1: XML for .NET

XML for .NET

Session 1Introduction to XML Introduction to XSLT

Programmatically Reading XML DocumentsIntroduction to XPATH

Page 2: XML for .NET

Introduction to XML XML stands for eXtensible Markup

Language. XML is a way of specifying self-describing

data in a human-readable format. XML’s structure is like that of HTML – it

consists of nested tags, or elements. Elements can contain text data, attributes, and other elements.

XML is ideal for describing hierarchical data.

Page 3: XML for .NET

The Components of XML XML document can be comprised

of the following components: Processing Directives A Root Element (required) Elements

Attributes Comments

Page 4: XML for .NET

An Example XML Document

<?xml version="1.0" encoding="UTF-8" ?><books>

<!-- Each book is represented by a <book> element -->

<book price="34.95">  <title>Teach Yourself ASP 3.0 in 21

Days</title><authors>

<author>Mitchell</author>   <author>Atkinson</author>  

</authors>  <year>1999</year>  

</book></books>

Page 5: XML for .NET

An Example XML Document The first line of an XML document is

referred to as a processing directive, and indicates the version of the XML standard being used and the encoding.<?xml version="1.0" encoding="UTF-8" ?>

In general, processing directives have the following syntax:

<?processing directive content ?>

Page 6: XML for .NET

All XML Documents must have precisely one root element. The first element in the XML document is referred to as the root element.

For the books example, the root element is <books>:

<?xml version="1.0" encoding="UTF-8" ?><books> …</books>

An Example XML Document

Page 7: XML for .NET

An Example XML Document XML documents can contain an arbitrary

number of elements within the root element. Elements have a name, can have an arbitrary number of attributes, and can contain either text content or further elements.

The generic syntax of an element is given as:

<elementName attribute1="value1" … attributeN="valueN">text content or further elements

</elementName>

Page 8: XML for .NET

An Example XML Document

For example, in the books XML document, the <book> element contains an attribute named price and three child elements: <title>, <authors>, and <year>.

<book price="34.95">  <title>…</title><authors> …</authors>  <year>…</year>   </book>

Page 9: XML for .NET

An Example XML Document The <title> and <year> elements

contain text content.

<title>Teach Yourself ASP 3.0 in 21 Days</title>

<year>1999</year>

Page 10: XML for .NET

An Example XML Document The <authors> element contains two

<author> child elements. The <author> elements contain text content, specifying the author(s) of the book:

<authors><author>Mitchell</author>   <author>Atkinson</author>  

</authors>

Page 11: XML for .NET

An Example XML Document XML comments are delimited by

<!-- and -->. XML comments are ignored by an XML parser and merely make the XML document easier to read for a human.<!-- Each book is represented by a <book> element -->

Page 12: XML for .NET

XML Formatting Rules XML is case sensitive. XML Documents must adhere to

specific formatting rules. An XML document that adheres to such rules is said to be well-formed.

These formatting rules are presented over the next few slides.

Page 13: XML for .NET

Matching Tags XML elements must consist of both an

opening tag and a closing tag. For example, the element <book> has an opening tag - <book> - and must also have a matching closing tag - </book>.

For elements with no contents, you can use the following shorthand notation:<elementName optionalAttributes />

Page 14: XML for .NET

Elements Must be Properly Nested Examples of properly nested elements:

<book price="39.95">  <title>XML and ASP.NET</title></book>

Example of improperly nested elements:<book price="39.95">  <title>XML and ASP.NET</book> </title>

Page 15: XML for .NET

Attribute Values Must Be Quoted Recall that elements may have an

arbitrary number of attributes, which are denoted as:<elementName attribute1="value1" ... attributeN="valueN"> … </elementName>

Note that the attribute values must be delimited by quotation marks. That is, the following is malformed XML:

<book price=39.95> … </book>

Page 16: XML for .NET

Illegal Characters for Text Content Recall that elements can contain text

content or other elements. For example, the <title> element contains text, providing the title of the book: <title>XML for ASP.NET</title>

There are five characters that cannot appear within the text content: <, >, &, ", and '.

Page 17: XML for .NET

Replacing the Illegal Characters with Legal Ones

If you need to use any of those four illegal characters in the text portion of an element, replace them with their equivalent legal value:

Replace This

With This

< &lt;> &gt;& &amp;" &quot;' &apos;

Page 18: XML for .NET

Namespaces Element names in an XML document can be

invented by the XML document’s creator. Due to this naming flexibility, when

computer programs work with different XML documents there may be naming conflicts.

For example, appending two XML documents together that use the same element name for different data representations leads to a naming conflict.

Page 19: XML for .NET

Naming Conflicts For example, if the following two XML

documents were to be merged, there would be a naming conflict:

<bug> <genus>Homoptera</genus> <species>Pseudococcidae</species></bug>

<bug> <date>2003-05-22</date> <description>The button, when clicked, raises a GPF exception.</description></bug>

Page 20: XML for .NET

Solving Name Conflicts with Namespaces A namespace is a unique name

that is “prefixed” to each element name to uniquely identify it.

To create namespaces, use the xmlns attribute:

<namespace-prefix:elementName xmlns:namespace-prefix="namespace">

E.g.:<animal:bug xmlns:animal="http://www.bugs.com/">

Page 21: XML for .NET

Namespaces Note that the xmlns namespace attribute contains

two parts:1. The namespace-prefix and,2. The namespace value.

The namespace-prefix can be any name. The namespace-prefix is then used to prefix element names to indicate that the element belongs to a particular namespace.

The namespace for an element automatically transfers to its children elements, unless explicitly specified otherwise.

Page 22: XML for .NET

Namespaces The namespace value must be a

globally unique string. Typically, URLs are used since they are unique to companies/developers. However, any string value will suffice, so long as it is unique.

Page 23: XML for .NET

The Default Namespace To save typing in the namespace-

prefix for every element, a default namespace can be specified using xmlns without the namespace-prefix part, like:

<elementName xmlns="namespace">

Page 24: XML for .NET

Namespaces in Action In later sections on XSLT and XSD we’ll see

namespaces in use in real-world XML situations…

Note that most of our XML examples will not use namespaces. Always using namespaces is a good habit to get into, but can add confusion/clutter to examples when learning… (Hence their omission where possible.)

Page 25: XML for .NET

Comparing XML to HTML HTML, or HyperText Markup

Language, is the syntax used to describe a Web page’s appearance.

At first glance, there appears to be a lot in common between HTML and XML – their syntax appears similar, they both consist of elements, attributes, and textual content, and so on.

Page 26: XML for .NET

Comparing XML to HTML However, XML and HTML are more different than

they are alike. With XML, you must create your own elements. With

HTML, your elements must come from a predefined set of legal elements (<b>, <p>, <body>, etc.)

XML is used solely to describe data; other technologies must be used if you want to present the XML in a formatted manner. HTML describes primarily presentation. That is, you use the <b> tag to make something bold, not to indicate its meaning.

Page 27: XML for .NET

Comparing XML to HTML XML documents must be well-formed,

while HTML documents are not confined to such rigorous rules.

The following legal HTML is illegal XML (why?):

<p align=center><b>I said, “Hello, World!”

</P></b>

•Tags improperly nested

•Unquoted attribute value

•<p> tag and </P> do not match (due to case-sensitivity)

•Contains illegal characters (") in text

Page 28: XML for .NET

Comparing XML to HTML Realize that XML is not a

replacement to HTML. XML and HTML have different aims – XML was created to describe data, HTML was designed to present data.

Page 29: XML for .NET

Comparing XML to Traditional Databases Imagine that we have information

about books stored in two formats:1. In a traditional database, say Microsoft

SQL Server 2000.2. In an XML document

What advantages does having the data represented as XML provide?

Page 30: XML for .NET

XML’s Benefits Over Traditional Databases XML data is human-readable. A human

glancing at the XML document can determine what data each element and attribute indicates.

XML is self-describing. With a traditional database, you would need to view the table schemas to determine the purpose of the data. With XML, the purpose is reflected by the names of the elements and their nesting.

Page 31: XML for .NET

XML’s Benefits Over Traditional Databases Also, since XML is naturally hierarchical,

determining the relationship among entities is much simpler in an XML document than in a relational database.

XML data is simply text content, meaning the data is platform-neutral. For example, the XML book data could be utilized by a program running on a Linux box, whereas the raw SQL Server 2000 data is only useful to a Windows box running SQL Server 2000.

Page 32: XML for .NET

Creating Our First XML Document Imagine that we wanted to be able to

catalog our music collection. Being interested in learning XML, we decide to create an XML document that stores the information about our music library.

First, we must decide what elements our XML document will consist of…

Page 33: XML for .NET

Creating Our First XML Document All XML documents need a root element.

What should we call our root element? Now, our root element will contain an

arbitrary number of children elements. Each child element will represent a single item in our music library. What should these child elements be named? Should they have any attributes? What elements should they contain?(Diagram XML document on board.)

Page 34: XML for .NET

Creating the XML Document with Visual Studio .NET

Visual Studio .NET makes it easy to create XML files. Go to File > New > File and select the XML File option from the New File dialog box:

Page 35: XML for .NET

Creating the XML Document with Visual Studio .NET This will create a blank XML file

with the <?xml …?> preprocessing directive.

At this point, you can type in the elements for the XML document. Note that after typing in an opening tag, VS.NET automatically adds the corresponding closing tag.

Page 36: XML for .NET

A Peek Ahead: Defining the XML Document’s Schema

Realize that we have yet to formally specify the structure of the XML document. In our discussions we arrived at modeling our music library using a certain set of XML elements, but this particular data model is known only to us.

XML documents can contain schemas that describe their structure. In the next session we’ll discuss XML Schemas in greater detail. For now, just observe that Visual Studio .NET allows us to enter any XML content we want, and we can enter content that violates the schema we verbally agreed on just a few moments ago.

Page 37: XML for .NET

An Hierarchical Way to Think of XML Documents When working with XML documents, it

helps to think of these documents as a hierarchy, where the top of the hierarchy is the root element.

At each “level” in the hierarchy there are elements, attributes, or text content.

This hierarchy-model of XML documents is commonly used in accessing XML documents programmatically, and hence is worth examining.

Page 38: XML for .NET

What is the DOM? DOM stands for Document Object Model,

and it’s a model that can be used to describe an XML document.

The DOM expresses the XML document as a hierarchy of elements, where each element can have zero to many children elements.

The text content and attributes of an element are expressed as its children as well.

Page 39: XML for .NET

Why Discuss the DOM? Being able to think of an XML document

in terms of the DOM helps in understanding how XPath, XSLT, and related concepts work.

Also, later we will examine how to access an XML document programmatically. When working with XML documents this way, the XML documents are presented as a DOM.

Page 40: XML for .NET

Example XML File

<?xml version="1.0" encoding="UTF-8" ?><books>

<book price="34.95">  <title>TYASP 3.0</title><authors>

<author>Mitchell</author>   </authors> 

</book><book price=“29.95"> 

<title>ASP.NET Tips</title><authors>

<author>Mitchell</author><author>Walther</author><author>Seven</author>

</authors>  </book>

</books>

Page 41: XML for .NET

The DOM View of the XML Document

Page 42: XML for .NET

Understanding the DOM With the DOM model we can think of each XML

element as having an arbitrary number of children elements.

Children element are comprised of both other elements as well as the text content and attributes.

Therefore, to visit every element in an XML document we could start with the root element and visit each of its children. Then, for each child being visited, we can recursively visit all of its children.

If this makes zero sense, don’t worry, we’ll discuss this in more detail later!

Page 43: XML for .NET

Summary In this presentation we examined XML

basics, including the components of XML and the rules for an XML document to be considered well-formed.

We compared XML to HTML and traditional data stores.

We created our first XML document using Visual Studio .NET.

We discussed the Document Object Model.

Page 44: XML for .NET

Questions?