SGML, HTML, XML: Do We Really Need All That?

SGML, HTML, XML: Do We Really Need All That? ISMT Multimedia Fall 2002 Dr Vojislav B Mišić


Page 1: SGML, HTML, XML: Do We Really Need All That?

SGML, HTML, XML: Do We Really Need All That?

ISMT Multimedia, Fall 2002
Dr Vojislav B Mišić

Page 2: SGML, HTML, XML: Do We Really Need All That?

Lecture Overview

What is a markup language?
HTML markup: what’s good, what’s wrong
Extensions to HTML (dHTML and style sheets, XML and XSL, …)
XML
  Basic elements
  Well-formed vs. valid XML
  Writing a DTD
  Examples of XML

Page 3: SGML, HTML, XML: Do We Really Need All That?

Markup languages

What is markup?
  Text (actual contents of the document) is interspersed with markings
Markup is related to the text
  notes on the content
  notes on text presentation
  but virtually anything can be marked (remember Fermat’s last theorem?)
Markup language allows separation of concerns: content vs. presentation

Page 4: SGML, HTML, XML: Do We Really Need All That?

Standards for markup

SGML (ISO 8879, derived from IBM’s GML) – a standardized way to write other markup languages (actually, a meta-language)
SGML-based language is specified using a DTD (Document Type Definition)
SGML is not really a user-friendly language, hence its use was rather limited, even though software support for it does exist

Page 5: SGML, HTML, XML: Do We Really Need All That?

Other markup languages

TeX (Knuth) is another widely used markup language
Performs extremely well for complex texts with
  mathematical formulas and symbols
  cross-references
  different typefaces
  foreign languages

Page 6: SGML, HTML, XML: Do We Really Need All That?

A TeX example

\begin{equation}\label{coh1}
\Psi (S) = \displaystyle \frac{\displaystyle \sum_{x \in R (S)} \left( \# S_w (x) - 1 \right)}
  {\displaystyle \sum_{x \in R (S)} \left( \# S - 1 \right)}
\end{equation}

Page 7: SGML, HTML, XML: Do We Really Need All That?

HTML

HTML (HyperText Markup Language) is the language of the Internet
Allows platform-independent browsing
Text-only at first, media later
Hyperlinks, limited visual formatting
However, it is far from perfect, and is gradually being replaced (current version: 4.01)

Page 8: SGML, HTML, XML: Do We Really Need All That?

HTML markup

First you write the text, then add appropriate markup tags
Tags can describe logical entities
  Headings of different levels: H1, H2, …
  Lists and list elements (UL, OL, LI)
But tags can also describe visual effects (display rendering)
  Bold and italic text (B, I)
  Font and typeface changes

Page 9: SGML, HTML, XML: Do We Really Need All That?

If you make an error…

Anything not recognized as correct HTML is essentially ignored
The browser skips the markup it does not recognize, treats the rest as plain text and displays it directly
In this manner, users are still able to see most of the content, albeit without proper formatting
Your opinion: is this good or bad?
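The error tolerance described on this slide can be imitated with Python's standard `html.parser` module, which, like a browser, does not stop on unknown or unclosed tags. This is only a sketch: the tag `BLINKY` and the helper `visible_text` are made up for illustration, and a real browser does considerably more recovery work.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Records every start tag it meets and collects the character data."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Unknown tags arrive here like any other tag; nothing is rejected.
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def visible_text(source):
    """Return (text, tags) for a possibly broken HTML fragment (hypothetical helper)."""
    collector = TextCollector()
    collector.feed(source)
    collector.close()
    return "".join(collector.text), collector.tags


# BLINKY is not an HTML tag, and H1 and P are never closed.
broken = "<H1>Title<BLINKY>still shown</BLINKY><P>unclosed paragraph"
text, tags = visible_text(broken)
print(text)
print(tags)
```

The parser raises no error: the text survives even though the markup around it is wrong, which is the behavior the slide asks you to judge.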

Page 10: SGML, HTML, XML: Do We Really Need All That?

HTML editing

HTML source is ASCII and essentially layout independent
  Plain text editors can be used
  You can put extra white space to your heart’s content, with no effect on what is displayed by the browser
Most browsers allow you to view and save the HTML source of the document displayed – the quickest way to learn HTML
HTML is interpreted – editing changes are displayed (almost) instantly

Page 11: SGML, HTML, XML: Do We Really Need All That?

HTML on the Internet

HTML browsers can display graphics and other media objects
  Although HTML by itself provides only the most primitive support for multimedia
Tags can specify target URLs (hyperlinks)
Error tolerance ensures that anyone with a browser (any browser) can access HTML documents
… all of which made HTML the language of choice for hypertext on the Internet

Page 12: SGML, HTML, XML: Do We Really Need All That?

More HTML features

Visual formatting is allowed but not forced
  you can specify a typeface, but the browser will substitute another one of its own choice if the one specified is not available
User can easily change the presentation
  just resize window and select different fonts/sizes
Browser differences (IE vs. Navigator) – actually, not very important any more

Page 13: SGML, HTML, XML: Do We Really Need All That?

HTML Interactivity

Interactivity at first limited to hyperlinks
Forms introduced later (Navigator 3)
Form support still limited, most often client- or server-side scripting is required
Proliferation of scripting languages
  CGI scripts
  JavaScript and JScript (more details later)
  VBScript, ASP
  Perl

Page 14: SGML, HTML, XML: Do We Really Need All That?

Is HTML a Good Markup Language?

Logical and visual formatting capabilities together
  Some people argue for cleaner separation of logical from visual formatting
  Others want more author control
Many extensions (some proprietary)
Changes generally lean towards greater author control over document rendering – more direct formatting instructions included

Page 15: SGML, HTML, XML: Do We Really Need All That?

Dynamic HTML

Commercial term – there is no such thing as a dHTML standard
Combination of HTML with new technologies
  Stylesheets add greater author control
  Scripting allows improved interactivity, including user input
  Even simple animations are possible
As always, not quite compatible extensions by Microsoft and Netscape

Page 16: SGML, HTML, XML: Do We Really Need All That?

HTML styles

In standard HTML, logical markup tags (such as <H1>) have predefined properties for
  Typeface
  Font size
  Mode
  Line spacing
Properties cannot be changed, and we cannot define our own tags
The only way is to use a (possibly way too long) sequence of appropriate primitive tags every time – not a very convenient solution

Page 17: SGML, HTML, XML: Do We Really Need All That?

Stylesheets to the rescue

Cascading Style Sheets (CSS): cleaner separation of presentation from actual content
Style: a named set of properties that define presentation of a chunk of text (character, paragraph, …)
Styles are present in text processing software (WinWord) but in some markup languages as well (TeX)
CSS is used with HTML, but it’s not HTML – although browsers know how to handle them together

Page 18: SGML, HTML, XML: Do We Really Need All That?

CSS Syntax

A CSS-compatible stylesheet contains a set of rules, each with a selector (name), a number of properties and their values
Rules can be
  Inline (within an HTML tag, in document body)
  Embedded (in the head of an HTML document)
  External, in a separate file which is then linked or imported into an HTML document
Position of the rule defines the scope of its effect on the document

Page 19: SGML, HTML, XML: Do We Really Need All That?

CSS Selectors

HTML selectors – text portions of HTML tags
Class selectors – can be applied to any HTML tag
ID selectors – usually applied only once per page to a particular HTML tag
Type of HTML tag defines the scope of CSS properties
  Block level (DIV, LI, H1)
  Inline (B, FONT, TT)
  Replaced tags (IMG)

Page 20: SGML, HTML, XML: Do We Really Need All That?

CSS Properties

Always of the form property:value;
Categories of properties control
  Typefaces (fonts, size, mode)
  Text (kerning, leading, alignment)
  Lists (bullets, indentation)
  Colors (borders, text, rules, background)
  Margins
  Positioning of individual elements

Page 21: SGML, HTML, XML: Do We Really Need All That?

CSS Rule with an HTML selector

Effective redefinition of HTML tags, e.g.:

    B { font: bold 18pt times, serif; text-decoration: underline; }

Redefines the <B> (boldface) tag throughout the rest of the document
Don’t forget to close the brace!

Page 22: SGML, HTML, XML: Do We Really Need All That?

CSS Rule with a class selector

Independent style, applicable to any HTML tag:

    .extra { font-size: 28pt; }
    .huge { font-size: 48pt; }

Class selector must be referred to within the HTML tag:

    <B class="extra">Extra</B>
    <B class="huge">HUGE</B>

Page 23: SGML, HTML, XML: Do We Really Need All That?

CSS Rule with a class selector

May be linked to a specific HTML tag:

    p.mini { font-size: 8pt; }
    p.big { font-size: 14pt; }

Class selector may be applied to this HTML tag only:

    <P class="mini">mini</P>
    <P class="big">BIG</P>

Page 24: SGML, HTML, XML: Do We Really Need All That?

CSS Rule with an ID selector

Another independent style, applicable to any HTML tag:

    #area1 { position: relative; margin-left: 9em; color: red; }

ID is specified within the HTML tag:

    <SPAN ID="area1"> ... </SPAN>

Page 25: SGML, HTML, XML: Do We Really Need All That?

More on CSS selectors

Several CSS selectors may share the same definition, and individual selectors may get additional properties separately
CSS rules can refer to tags nested within other tags, e.g.,

    P B { background: pink; }

redefines the <B> tag only when encountered within the <P> tag

Page 26: SGML, HTML, XML: Do We Really Need All That?

Adding CSS to your document

Within a style container in the document head:

    <HEAD>
    <STYLE TYPE="text/css">
    <!--
    CSS rules go here
    -->
    </STYLE>
    </HEAD>

HTML comment tags hide the CSS rules from non-CSS browsers

Page 27: SGML, HTML, XML: Do We Really Need All That?

Importing CSS into your document

Create a separate file, stylefile.css, then write

    <HEAD>
    <LINK REL="stylesheet" TYPE="text/css" HREF="stylefile.css">
    </HEAD>

Several files may be added in this manner

Page 28: SGML, HTML, XML: Do We Really Need All That?

More on CSS

Comments go between matched pairs of /* and */, on a single line or across several lines (CSS has no // comment syntax)
A stylesheet file may import another stylesheet file with the statement

    @import url(stylefile);

Rules from all linked and imported stylesheets are combined (the “cascade” in the name CSS)
But: among otherwise equal rules, the last rule listed wins!
Also: beware of browser differences!

Page 29: SGML, HTML, XML: Do We Really Need All That?

More CSS capabilities

Font selection
Text control
List properties
Background properties
Absolute and relative positioning (but this is very dangerous!)
Visibility (which probably has little use by itself – but it can be quite useful when changed through appropriate scripts)
Stacking (vertical) order

Page 30: SGML, HTML, XML: Do We Really Need All That?

Document Object Model

DOM describes the structure of an HTML document as a hierarchy
Thus allowing a script written in a suitable language to access and manipulate only selected element (or elements) within that document

    document.images.b1.src="button_on.gif"

describes a path from root or top (which is the document itself) to a particular element – an image file
Then, a script can manipulate this element (e.g., hide, show, replace, move, …) in response to certain events
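The same idea of walking from the document root to one element and changing it can be tried outside a browser. The sketch below uses Python's standard `xml.dom.minidom`; the page fragment, the second image and the helper `find_image` are assumptions made for illustration, with only the name `b1` and the file `button_on.gif` taken from the slide.

```python
from xml.dom.minidom import parseString

# Hypothetical page fragment; only "b1" and "button_on.gif" come from the slide.
page = parseString(
    "<html><body>"
    '<img name="b1" src="button_off.gif"/>'
    '<img name="b2" src="logo.gif"/>'
    "</body></html>"
)


def find_image(document, name):
    """Search the hierarchy below the document root for the <img> with this name."""
    for image in document.getElementsByTagName("img"):
        if image.getAttribute("name") == name:
            return image
    return None


# Equivalent in spirit to document.images.b1.src = "button_on.gif"
button = find_image(page, "b1")
button.setAttribute("src", "button_on.gif")
print(button.toxml())
```

Only the selected element changes; the rest of the document tree is left untouched, which is what makes DOM scripting practical for rollover buttons and similar effects.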

Page 31: SGML, HTML, XML: Do We Really Need All That?

XML

eXtensible Markup Language: a simplified (easier, more consistent) version of SGML
XML-compliant languages defined with appropriate DTDs
XML parsers signal syntax errors (unlike HTML) – use of authoring tools implied
current uses (with more to follow)
  SMIL for synchronized multimedia
  RDF (Resource Description Framework) for describing resources

Page 32: SGML, HTML, XML: Do We Really Need All That?

What is XML?

A method for putting structured data in a text file
Data stored on disk can be in binary or text format
  Binary formats are often more concise
  Text format allows human inspection
XML is a set of rules/guidelines/conventions for designing text formats for such data, to produce files that are
  Easy to generate and read (by a computer)
  Unambiguous and platform-independent
  Extensible, easy to localize/internationalize

Page 33: SGML, HTML, XML: Do We Really Need All That?

XML looks like HTML but isn't HTML

XML makes use of
  tags (words bracketed by '<' and '>') and
  attributes (of the form name="value")
HTML specifies what each tag & attribute means (and often how the text between them will look in a browser)
XML uses the tags only to delimit pieces of data – and leaves the interpretation to the application

Page 34: SGML, HTML, XML: Do We Really Need All That?

XML is text, but isn't meant to be read

XML files are text files, but they are not made for human readers
Text format allows experts (such as programmers) to more easily debug applications
Text format allows the use of a simple text editor to fix a broken XML file
Rules for XML files much stricter than for HTML
Applications are not allowed to try to second-guess the creator of a broken XML file – if the file is broken, just stop and issue an error message

Page 35: SGML, HTML, XML: Do We Really Need All That?

XML is verbose, but that is not a problem

XML is a text format and uses tags to delimit the data
Therefore, XML files are nearly always larger than comparable binary formats
But disk space isn't as expensive anymore as it used to be, and compression/decompression can be fast and reliable
Communication protocols can compress data on the fly, thus saving bandwidth as effectively as a binary format
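The claim about compression is easy to check on a small scale. The sketch below builds a repetitive XML document and compresses it with Python's standard `zlib`; the CATALOG/ITEM record layout is invented for the experiment, and the exact ratio will depend on the data.

```python
import zlib

# Hypothetical record layout, repeated to mimic a typical tag-heavy data file.
records = "".join(
    f"<ITEM><NAME>item{i}</NAME><PRICE>{i}.99</PRICE></ITEM>" for i in range(200)
)
document = f"<CATALOG>{records}</CATALOG>".encode("utf-8")

compressed = zlib.compress(document)
ratio = len(compressed) / len(document)
print(f"{len(document)} bytes of XML, {len(compressed)} bytes compressed ({ratio:.0%})")

# Decompression restores the document byte for byte.
restored = zlib.decompress(compressed)
```

Because the same tag names repeat throughout the file, a general-purpose compressor removes most of the overhead that the tags add.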

Page 36: SGML, HTML, XML: Do We Really Need All That?

XML is … good

XML is license-free
XML is platform-independent
XML is well-supported
Choosing XML is a lot like choosing SQL
  you still have to build your own database and your own programs/procedures that manipulate it
  but there are many tools available and many people that can help you
XML isn't always the best solution, but it is always worth considering …

Page 37: SGML, HTML, XML: Do We Really Need All That?

XML is a family of technologies

XML: the specification that defines what "tags" and "attributes" are
XLink describes a standard way to add hyperlinks to an XML file
CSS is applicable to XML as it is to HTML
XSL: an advanced language for style sheets (presentation and manipulation)
XSLT: a transformation language
SMIL: Synchronized Multimedia Integration Language
… and others

Page 38: SGML, HTML, XML: Do We Really Need All That?

Well-formed vs. valid XML

Well-formed documents comply with XML well-formedness constraints, which require that
  Elements properly nest within each other
  Elements use other markup syntax correctly
XML allows you to use elements of your own naming: ESSAY, SECTION, PARAGRAPH, NOTE, IMPORTANT
… unlike HTML, which forces all documents into a fixed document type

Page 39: SGML, HTML, XML: Do We Really Need All That?

Writing XML One, Two

XML Declaration: declares the nature of XML documents to document readers

    <?xml version="1.0" standalone="yes"?>
    <?xml version="1.0" standalone="no"?>
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>

Root element: contains all other elements (i.e., the rest of the document)
  Root element is synonymous with your document type
  Root element cannot be repeated

Page 40: SGML, HTML, XML: Do We Really Need All That?

An XML example

    <?xml version="1.0" standalone="yes"?>
    <TRIVIA>
      <MATH>
        <QUESTION>What is the square root of 25</QUESTION>
        <ANSWER>5</ANSWER>
      </MATH>
      <GENERAL>
        <QUESTION>What is the season after Summer</QUESTION>
        <ANSWER>Fall</ANSWER>
        <ANSWER>Autumn</ANSWER>
      </GENERAL>
    </TRIVIA>
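Since XML leaves interpretation to the application, it may help to see an application read the trivia file. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the dictionary layout `quiz` is an arbitrary choice, not something the XML prescribes.

```python
import xml.etree.ElementTree as ET

trivia_xml = """<?xml version="1.0" standalone="yes"?>
<TRIVIA>
  <MATH>
    <QUESTION>What is the square root of 25</QUESTION>
    <ANSWER>5</ANSWER>
  </MATH>
  <GENERAL>
    <QUESTION>What is the season after Summer</QUESTION>
    <ANSWER>Fall</ANSWER>
    <ANSWER>Autumn </ANSWER>
  </GENERAL>
</TRIVIA>"""

root = ET.fromstring(trivia_xml)  # the root element, TRIVIA

# Map each category (child of the root) to its question and accepted answers.
quiz = {}
for category in root:
    question = category.findtext("QUESTION")
    answers = [answer.text.strip() for answer in category.findall("ANSWER")]
    quiz[category.tag] = (question, answers)

for name, (question, answers) in quiz.items():
    print(f"{name}: {question}? {' / '.join(answers)}")
```

The tags only delimit the data; deciding that ANSWER elements are alternatives to be accepted is entirely the program's reading of the file.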

Page 41: SGML, HTML, XML: Do We Really Need All That?

Rules for XML elements

All elements must have opening and closing (start and end) tags

    <MATH> ... </MATH>

There are exceptions – an empty element may be written as a single tag like

    <QUESTION ... />

Case matters – XML is case-sensitive
Proper tag nesting must be observed
You can add whitespace to your heart’s content – it is ignored in processing

Page 42: SGML, HTML, XML: Do We Really Need All That?

XML Writing

Describe content with elements of your own naming
Invent a new element each time you introduce content that significantly differs from any previous
More elements = greater control you will have later, when you use it
Add attributes to elements
Attributes describe the content or behavior of elements

Page 43: SGML, HTML, XML: Do We Really Need All That?

Another Example

    <?xml version="1.0" standalone="yes"?>
    <HELP>
      <TITLE>XML Help</TITLE>
      <QUERY area="XML">
        <QUESTION>Where do I start?</QUESTION>
        <ANSWER>Start with your root element. Break your document down into parts, fill them in, repeat.</ANSWER>
      </QUERY>
      <QUERY area="XML">
        <QUESTION>Are my element names well chosen?</QUESTION>
      </QUERY>
    </HELP>

Page 44: SGML, HTML, XML: Do We Really Need All That?

XML Writing 4

Parsing: checking well-formedness

Well-formed:

    <PRICE>$57.80</PRICE>
    <PET><CAT type="Cornish Rex">Cat nests properly within PET.</CAT></PET>

Not well-formed:

    <WEATHER>Foggy                        no closing tag
    <LEVEL>Intermediate<LEVEL>            improper tag
    <PASSWORD>planetB612</PASSWD>         wrong spelling
    <DISTANCE TYPE=KM 120</DISTANCE>      missing closing bracket
    <CAR><engine>engine does not nest properly within CAR</CAR></engine>      improper nesting
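These fragments can be fed to a real parser to confirm which ones it accepts. The sketch below uses Python's standard `xml.etree.ElementTree`, which reports any well-formedness error as `ParseError`; the helper name `is_well_formed` and the extra case-mismatch sample are additions for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the parser accepts the fragment, False if it reports an error."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


samples = [
    '<PRICE>$57.80</PRICE>',
    '<PET><CAT type="Cornish Rex">Cat nests properly within PET.</CAT></PET>',
    '<WEATHER>Foggy',                        # no closing tag
    '<LEVEL>Intermediate<LEVEL>',            # second tag opens instead of closing
    '<PASSWORD>planetB612</PASSWD>',         # end tag spelled differently
    '<DISTANCE TYPE=KM 120</DISTANCE>',      # start tag never closed with '>'
    '<CAR><engine>text</CAR></engine>',      # improper nesting
    '<Math>5</MATH>',                        # added sample: case mismatch
]

results = [is_well_formed(sample) for sample in samples]
for sample, ok in zip(samples, results):
    print("ok " if ok else "BAD", sample)
```

Unlike an HTML browser, the parser makes no attempt to repair the broken fragments; it stops at the first error, as the rules for XML require.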

Page 45: SGML, HTML, XML: Do We Really Need All That?

Valid XML

Valid XML – unlike merely well-formed XML – requires a Document Type Definition
DTD: a set of rules that a particular document type must follow
The rules state the name and contents of each element, and the contexts in which a particular element can and must exist
DTD enables communication with databases
Valid XML documents may be accompanied by style sheets for proper presentation

Page 46: SGML, HTML, XML: Do We Really Need All That?

What’s in a DTD

Two essential structures: the element and the attribute
Root element: contains all other elements
Contents of other elements defined recursively starting from the root, until you reach text-level elements, e.g.,

    <!ELEMENT NAME CONTENT>

Elements may have attributes, which are defined in a separate attribute-list declaration, e.g.,

    <!ATTLIST ELEMENT-NAME NAME CDATA #IMPLIED>

Page 47: SGML, HTML, XML: Do We Really Need All That?

Writing a DTD

<!ELEMENT novel (preface,chapter+,biography?,criticalessay*)>

<!ELEMENT preface (paragraph+)>

<!ELEMENT chapter (title,paragraph+,section+)>

<!ELEMENT section (title,paragraph+)>

<!ELEMENT biography (title,paragraph+)>

<!ELEMENT criticalessay (title,section+)>

<!ELEMENT paragraph (#PCDATA|keyword)*>

<!ELEMENT title (#PCDATA|keyword)*>

<!ELEMENT keyword (#PCDATA)>

Page 48: SGML, HTML, XML: Do We Really Need All That?

DTD Declarations (1): Element type declaration

Each element type includes a name, content, and possibly a set of attributes
A document can contain many conforming elements of that type
Sequence: ordered list of components (,)
Choice: alternative components (|)
Components may be optional (?)
Components may be required and repeatable (+)
Components may be optional and repeated (*)
Mixed-content declarations must include #PCDATA, parsed character data (i.e., text), as their first member
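The sequence, choice and repetition operators behave much like those of regular expressions, and a toy translation makes that concrete. The sketch below is not a validator: the functions `content_model_to_regex` and `conforms` are hypothetical helpers that handle only element names, `,`, `|`, `?`, `+`, `*` and parentheses, and ignore #PCDATA, EMPTY and ANY. The content model is the `novel` declaration from the "Writing a DTD" slide.

```python
import re

NAME = re.compile(r"[A-Za-z_][\w.-]*")


def content_model_to_regex(model):
    """Translate a simple DTD content model into a compiled regular expression."""
    compact = re.sub(r"\s+", "", model)
    # Each element name becomes a group matching "name " (name plus a separator);
    # '|', '?', '+', '*' and parentheses already mean the same thing in a regex.
    pattern = NAME.sub(lambda m: "(?:" + re.escape(m.group(0)) + " )", compact)
    # A DTD sequence (,) is plain concatenation in a regex.
    return re.compile(pattern.replace(",", ""))


def conforms(model_regex, children):
    """Check a list of child element names against the translated content model."""
    encoded = "".join(name + " " for name in children)
    return model_regex.fullmatch(encoded) is not None


novel = content_model_to_regex("(preface,chapter+,biography?,criticalessay*)")

print(conforms(novel, ["preface", "chapter", "chapter"]))
print(conforms(novel, ["preface", "chapter", "biography", "criticalessay"]))
print(conforms(novel, ["chapter"]))
print(conforms(novel, ["preface", "chapter", "criticalessay", "biography"]))
```

A novel needs exactly one preface followed by at least one chapter; the biography is optional and any number of critical essays may follow, but only in that order.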

Page 49: SGML, HTML, XML: Do We Really Need All That?

DTD Declarations (2): Attribute List Declarations

Much more variation here
String type attributes (CDATA): virtually unconstrained text strings
Enumeration attributes: require a list of options to pick from
Attribute defaults:
  #REQUIRED, required
  #IMPLIED, optional
  #FIXED "value", a fixed value
  "value", a default but overridable value
Usage:

    <ELEMENT-NAME NAME="value">

Page 50: SGML, HTML, XML: Do We Really Need All That?

An Attribute List Example

    <!ELEMENT MEMO (TO,FROM,SUBJECT,BODY,SIGN)>
    <!ATTLIST MEMO importance (HIGH|MEDIUM|LOW) "LOW">
    <!ELEMENT TO (#PCDATA)>
    <!ELEMENT FROM (#PCDATA)>
    <!ELEMENT SUBJECT (#PCDATA)>
    <!ELEMENT BODY (P+)>
    <!ELEMENT P (#PCDATA)>
    <!ELEMENT SIGN (#PCDATA)>
    <!ATTLIST SIGN signatureFile CDATA #IMPLIED
                   email CDATA #REQUIRED>
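Attribute defaults can be observed with a parser that reads the internal DTD. The sketch below uses Python's standard `xml.parsers.expat`, which is non-validating (it would not complain about a missing #REQUIRED attribute) but does supply declared default values. The memo content, the e-mail address and the helper `collect_attributes` are invented for illustration.

```python
import xml.parsers.expat

memo = """<?xml version="1.0"?>
<!DOCTYPE MEMO [
<!ELEMENT MEMO (TO,FROM,SUBJECT,BODY,SIGN)>
<!ATTLIST MEMO importance (HIGH|MEDIUM|LOW) "LOW">
<!ELEMENT TO (#PCDATA)>
<!ELEMENT FROM (#PCDATA)>
<!ELEMENT SUBJECT (#PCDATA)>
<!ELEMENT BODY (P+)>
<!ELEMENT P (#PCDATA)>
<!ELEMENT SIGN (#PCDATA)>
<!ATTLIST SIGN signatureFile CDATA #IMPLIED
               email CDATA #REQUIRED>
]>
<MEMO><TO>Class</TO><FROM>Instructor</FROM><SUBJECT>DTDs</SUBJECT>
<BODY><P>Read the slides.</P></BODY>
<SIGN email="instructor@example.org">VM</SIGN></MEMO>"""


def collect_attributes(document):
    """Parse the document and return {element name: attributes as seen by the application}."""
    seen = {}
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        seen[name] = dict(attrs)

    parser.StartElementHandler = start
    parser.Parse(document, True)
    return seen


attributes = collect_attributes(memo)
print(attributes["MEMO"])   # importance was not written in the document
print(attributes["SIGN"])   # signatureFile is #IMPLIED and simply absent
```

The MEMO start tag carries no attributes, yet the application receives `importance` with the declared default, while the optional `signatureFile` does not appear at all.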

Page 51: SGML, HTML, XML: Do We Really Need All That?

XML Writing

Add an XML declaration
Valid XML documents must include the appropriate DTD
  either as a set of internal definitions,
      <!DOCTYPE NAME [ definitions ]>
  as a reference to an external DTD file,
      <!DOCTYPE NAME SYSTEM "file">
  or both simultaneously
      <!DOCTYPE NAME SYSTEM "file" [ definitions ]>
DTD enables the parser to check validity of the document (errors are NOT permitted!)

Page 52: SGML, HTML, XML: Do We Really Need All That?

Writing and Parsing Valid XML

First suggestion: use a specialized editor
  Lots of choices, some of which are free
Second suggestion: use a validating parser
  Again, lots of choices are available, mostly in Java, some in C++, Perl, JavaScript
  IE5 includes an XML parser (not quite up to the standard, yet)
XML interfaces to be included in standard DBMS systems: Oracle, DB2, MS SQL Server

Page 53: SGML, HTML, XML: Do We Really Need All That?

SMIL

Synchronized Multimedia Integration Language
based on XML specification, endorsed by W3C
  http://www.w3.org/TR/PR-smil
integration of a set of independent media objects into a synchronized presentation
enables authors to describe
  temporal behavior of a presentation
  spatial layout of the presentation
  hyperlinks between media objects

Page 54: SGML, HTML, XML: Do We Really Need All That?

Basic elements of a SMIL specification

smil element can have an id attribute, and it can contain body and head children elements
head contains information not related to temporal behavior
head can contain the following children: layout, switch (but not both), and meta (zero or more)
layout determines how the elements in the body are positioned on an abstract rendering surface (audio or visual)
if no layout is specified, the rendering is implementation dependent
Alternative layouts specified with a switch element

Page 55: SGML, HTML, XML: Do We Really Need All That?

Basic elements (III)

each element has an id and a type
element type specifies the layout language used in the layout element (default: text/smil-basic-layout)
  the default type information contains region and root-layout elements
  non-default type information is simply character data
SMIL basic layout is a subset of the visual rendering model
only positionable media object elements are controlled by the SMIL basic layout

Page 56: SGML, HTML, XML: Do We Really Need All That?

A region example

A text element is set to a 5 pixel distance from the top border of the rendering window:

    <smil>
      <head>
        <layout>
          <region id="a" top="5" />
        </layout>
      </head>
      <body>
        <text region="a" .../>
      </body>
    </smil>
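How a player might connect a media object in the body to its region in the head can be sketched in a few lines of Python with the standard `xml.etree.ElementTree`. The `src="caption.txt"` attribute stands in for the part the slide elides and is purely hypothetical, as is the `placements` dictionary.

```python
import xml.etree.ElementTree as ET

# The src value is a made-up placeholder for the attributes elided on the slide.
smil = """<smil>
  <head>
    <layout>
      <region id="a" top="5" />
    </layout>
  </head>
  <body>
    <text region="a" src="caption.txt" />
  </body>
</smil>"""

root = ET.fromstring(smil)

# Index the regions declared in the layout by their id.
regions = {region.get("id"): region for region in root.iter("region")}

# For each media object in the body, look up the region it refers to.
placements = {}
for media in root.find("body"):
    region = regions[media.get("region")]
    placements[media.get("src")] = int(region.get("top", "0"))

print(placements)
```

The layout lives in the head and the media objects in the body; the `region` attribute is the only link between the two.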

Page 57: SGML, HTML, XML: Do We Really Need All That?

Meta attributes define properties of a document

each meta element specifies a single property/value pair
the list of properties is open-ended
authoring tools should ensure that all meta elements have a title with meaningful description
information related to temporal and linking behavior of the document
  Parallel/sequential playback of the children
  Complex synchronization possible
  Synchronization alternatives possible

Page 58: SGML, HTML, XML: Do We Really Need All That?

Hyperlinking elements

navigational links between elements
links are unidirectional and single-headed
SMIL supports name fragment identifiers and the '#' connector (just like HTML – http://foo.com/some/path#anchor1)
the a element used as in HTML – associates a link with a complete media object only
  New link (presentation) can replace the old one
  New link (presentation) can be added to the old one
  New link (presentation) can pause the old one

Page 59: SGML, HTML, XML: Do We Really Need All That?

Summary

XML is “HTML done right”
Widespread use in many areas: web publishing, document processing, multimedia, B2B electronic commerce …
Tools added daily
Database connection: crucial for success

Page 60: SGML, HTML, XML: Do We Really Need All That?

XML links

www.w3c.org
http://www.software.ibm.com/xml/
http://msdn.microsoft.com/xml/
www.xml.org
www.xml.com
…