XML for .NET

XML for .NET

Session 1Introduction to XML Introduction to XSLT

Programmatically Reading XML DocumentsIntroduction to XPATH

Introduction to XML XML stands for eXtensible Markup

Language. XML is a way of specifying self-describing

data in a human-readable format. XML’s structure is like that of HTML – it

consists of nested tags, or elements. Elements can contain text data, attributes, and other elements.

XML is ideal for describing hierarchical data.

The Components of XML XML document can be comprised

of the following components: Processing Directives A Root Element (required) Elements

Attributes Comments

An Example XML Document

<?xml version="1.0" encoding="UTF-8" ?><books>



<book price="34.95"> <title>Teach Yourself ASP 3.0 in 21

Days</title><authors>

<author>Mitchell</author> <author>Atkinson</author>

</authors> <year>1999</year>

</book></books>

An Example XML Document The first line of an XML document is

referred to as a processing directive, and indicates the version of the XML standard being used and the encoding.<?xml version="1.0" encoding="UTF-8" ?>

In general, processing directives have the following syntax:

<?processing directive content ?>

All XML Documents must have precisely one root element. The first element in the XML document is referred to as the root element.

For the books example, the root element is <books>:

<?xml version="1.0" encoding="UTF-8" ?><books> …</books>

An Example XML Document XML documents can contain an arbitrary

number of elements within the root element. Elements have a name, can have an arbitrary number of attributes, and can contain either text content or further elements.

The generic syntax of an element is given as:

<elementName attribute1="value1" … attributeN="valueN">text content or further elements

</elementName>

For example, in the books XML document, the <book> element contains an attribute named price and three child elements: <title>, <authors>, and <year>.

<book price="34.95"> <title>…</title><authors> …</authors> <year>…</year> </book>

An Example XML Document The <title> and <year> elements

contain text content.

<title>Teach Yourself ASP 3.0 in 21 Days</title>

<year>1999</year>

An Example XML Document The <authors> element contains two

<author> child elements. The <author> elements contain text content, specifying the author(s) of the book:

<authors><author>Mitchell</author> <author>Atkinson</author>

</authors>

An Example XML Document XML comments are delimited by

. XML comments are ignored by an XML parser and merely make the XML document easier to read for a human.

XML Formatting Rules XML is case sensitive. XML Documents must adhere to

specific formatting rules. An XML document that adheres to such rules is said to be well-formed.

These formatting rules are presented over the next few slides.

Matching Tags XML elements must consist of both an

opening tag and a closing tag. For example, the element <book> has an opening tag - <book> - and must also have a matching closing tag - </book>.

For elements with no contents, you can use the following shorthand notation:<elementName optionalAttributes />

Elements Must be Properly Nested Examples of properly nested elements:

<book price="39.95"> <title>XML and ASP.NET</title></book>

Example of improperly nested elements:<book price="39.95"> <title>XML and ASP.NET</book> </title>

Attribute Values Must Be Quoted Recall that elements may have an

arbitrary number of attributes, which are denoted as:<elementName attribute1="value1" ... attributeN="valueN"> … </elementName>

Note that the attribute values must be delimited by quotation marks. That is, the following is malformed XML:

<book price=39.95> … </book>

Illegal Characters for Text Content Recall that elements can contain text

content or other elements. For example, the <title> element contains text, providing the title of the book: <title>XML for ASP.NET</title>

There are five characters that cannot appear within the text content: <, >, &, ", and '.

Replacing the Illegal Characters with Legal Ones

If you need to use any of those four illegal characters in the text portion of an element, replace them with their equivalent legal value:

Replace This

With This

< <> >& &" "' '

Namespaces Element names in an XML document can be

invented by the XML document’s creator. Due to this naming flexibility, when

computer programs work with different XML documents there may be naming conflicts.

For example, appending two XML documents together that use the same element name for different data representations leads to a naming conflict.

Naming Conflicts For example, if the following two XML

documents were to be merged, there would be a naming conflict:

<bug> <genus>Homoptera</genus> <species>Pseudococcidae</species></bug>

<bug> <date>2003-05-22</date> <description>The button, when clicked, raises a GPF exception.</description></bug>

Solving Name Conflicts with Namespaces A namespace is a unique name

that is “prefixed” to each element name to uniquely identify it.

To create namespaces, use the xmlns attribute:

<namespace-prefix:elementName xmlns:namespace-prefix="namespace">

E.g.:<animal:bug xmlns:animal="http://www.bugs.com/">

Namespaces Note that the xmlns namespace attribute contains

two parts:1. The namespace-prefix and,2. The namespace value.

The namespace-prefix can be any name. The namespace-prefix is then used to prefix element names to indicate that the element belongs to a particular namespace.

The namespace for an element automatically transfers to its children elements, unless explicitly specified otherwise.

Namespaces The namespace value must be a

globally unique string. Typically, URLs are used since they are unique to companies/developers. However, any string value will suffice, so long as it is unique.

The Default Namespace To save typing in the namespace-

prefix for every element, a default namespace can be specified using xmlns without the namespace-prefix part, like:

<elementName xmlns="namespace">

Namespaces in Action In later sections on XSLT and XSD we’ll see

namespaces in use in real-world XML situations…

Note that most of our XML examples will not use namespaces. Always using namespaces is a good habit to get into, but can add confusion/clutter to examples when learning… (Hence their omission where possible.)

Comparing XML to HTML HTML, or HyperText Markup

Language, is the syntax used to describe a Web page’s appearance.

At first glance, there appears to be a lot in common between HTML and XML – their syntax appears similar, they both consist of elements, attributes, and textual content, and so on.

Comparing XML to HTML However, XML and HTML are more different than

they are alike. With XML, you must create your own elements. With

HTML, your elements must come from a predefined set of legal elements (, , <body>, etc.)

XML is used solely to describe data; other technologies must be used if you want to present the XML in a formatted manner. HTML describes primarily presentation. That is, you use the tag to make something bold, not to indicate its meaning.

Comparing XML to HTML XML documents must be well-formed,

while HTML documents are not confined to such rigorous rules.

The following legal HTML is illegal XML (why?):

I said, “Hello, World!”



•Tags improperly nested

•Unquoted attribute value

• tag and do not match (due to case-sensitivity)

•Contains illegal characters (") in text

Comparing XML to HTML Realize that XML is not a

replacement to HTML. XML and HTML have different aims – XML was created to describe data, HTML was designed to present data.

Comparing XML to Traditional Databases Imagine that we have information

about books stored in two formats:1. In a traditional database, say Microsoft

SQL Server 2000.2. In an XML document

What advantages does having the data represented as XML provide?

XML’s Benefits Over Traditional Databases XML data is human-readable. A human

glancing at the XML document can determine what data each element and attribute indicates.

XML is self-describing. With a traditional database, you would need to view the table schemas to determine the purpose of the data. With XML, the purpose is reflected by the names of the elements and their nesting.

XML’s Benefits Over Traditional Databases Also, since XML is naturally hierarchical,

determining the relationship among entities is much simpler in an XML document than in a relational database.

XML data is simply text content, meaning the data is platform-neutral. For example, the XML book data could be utilized by a program running on a Linux box, whereas the raw SQL Server 2000 data is only useful to a Windows box running SQL Server 2000.

Creating Our First XML Document Imagine that we wanted to be able to

catalog our music collection. Being interested in learning XML, we decide to create an XML document that stores the information about our music library.

First, we must decide what elements our XML document will consist of…

Creating Our First XML Document All XML documents need a root element.

What should we call our root element? Now, our root element will contain an

arbitrary number of children elements. Each child element will represent a single item in our music library. What should these child elements be named? Should they have any attributes? What elements should they contain?(Diagram XML document on board.)

Creating the XML Document with Visual Studio .NET

Visual Studio .NET makes it easy to create XML files. Go to File > New > File and select the XML File option from the New File dialog box:

Creating the XML Document with Visual Studio .NET This will create a blank XML file

with the <?xml …?> preprocessing directive.

At this point, you can type in the elements for the XML document. Note that after typing in an opening tag, VS.NET automatically adds the corresponding closing tag.

A Peek Ahead: Defining the XML Document’s Schema

Realize that we have yet to formally specify the structure of the XML document. In our discussions we arrived at modeling our music library using a certain set of XML elements, but this particular data model is known only to us.

XML documents can contain schemas that describe their structure. In the next session we’ll discuss XML Schemas in greater detail. For now, just observe that Visual Studio .NET allows us to enter any XML content we want, and we can enter content that violates the schema we verbally agreed on just a few moments ago.

An Hierarchical Way to Think of XML Documents When working with XML documents, it

helps to think of these documents as a hierarchy, where the top of the hierarchy is the root element.

At each “level” in the hierarchy there are elements, attributes, or text content.

This hierarchy-model of XML documents is commonly used in accessing XML documents programmatically, and hence is worth examining.

What is the DOM? DOM stands for Document Object Model,

and it’s a model that can be used to describe an XML document.

The DOM expresses the XML document as a hierarchy of elements, where each element can have zero to many children elements.

The text content and attributes of an element are expressed as its children as well.

Why Discuss the DOM? Being able to think of an XML document

in terms of the DOM helps in understanding how XPath, XSLT, and related concepts work.

Also, later we will examine how to access an XML document programmatically. When working with XML documents this way, the XML documents are presented as a DOM.

Example XML File

<?xml version="1.0" encoding="UTF-8" ?><books>

<book price="34.95"> <title>TYASP 3.0</title><authors>

<author>Mitchell</author> </authors>

</book><book price=“29.95">

<title>ASP.NET Tips</title><authors>

<author>Mitchell</author><author>Walther</author><author>Seven</author>

</authors> </book>

</books>

The DOM View of the XML Document

Understanding the DOM With the DOM model we can think of each XML

element as having an arbitrary number of children elements.

Children element are comprised of both other elements as well as the text content and attributes.

Therefore, to visit every element in an XML document we could start with the root element and visit each of its children. Then, for each child being visited, we can recursively visit all of its children.

If this makes zero sense, don’t worry, we’ll discuss this in more detail later!

Summary In this presentation we examined XML

basics, including the components of XML and the rules for an XML document to be considered well-formed.

We compared XML to HTML and traditional data stores.

We created our first XML document using Visual Studio .NET.

We discussed the Document Object Model.

Questions?

XML for .NET

Documents

Transcript of XML for .NET