Structure in documents: an introduction - Text · PDF fileStructure in documents: an...

36
Structure in documents: an introduction text matters Structure in documents: an introduction Being an introduction to the use of databases and markup languages to help the designer make electronic and paper documents work harder and be more useful, concentrating on the markup language called XML.

Transcript of Structure in documents: an introduction - Text · PDF fileStructure in documents: an...

Page 1: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

Structure

text matters

an introduction

f databases and markup ke electronic and paper re useful, concentrating L.

in documents: an introduction

Structure in documents:

Being an introduction to the use olanguages to help the designer madocuments work harder and be moon the markup language called XM

Page 2: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

Structure

text matters

ture

tic use of type, space meaning to a human

atic use of electronic eaning to a machine.

finitions and schemas

in documents: an introduction Visible and invisible structure

Visible and invisible struc

Visible structure: regular, systemaand colour to expose a document’suser. Codified in• grids• style manuals

Invisible structure: regular, systemmarkers to expose a document’s mCodified in• database tables and schemas• SGML/XML Document Type De

Page 3: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

Structure

text matters

t?

o:

r different audiences

their meaningystems

in documents: an introduction Why is structure important?

Why is structure importan

A structured document allows us t• design sites rather than pages• publish different ‘slices’ of data fo• auto-publish new data• navigate documents according to• exchange information between s

Page 4: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

Two over

text matters

al systems

and SGML add labels . The labels are often

s of information into ructure. The labels exist r than the document. In to most databases.

lapping structural systems Why is structure important?

Two overlapping structur

Markup languages such as XML about structure within documentsdirectly human-readable.

Databases separate different typedifferent fields within an overall stwithin the database software rathefact ‘document’ is an alien concept

Page 5: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

Why XM

text matters

L – it is the new

ation between many

r the exchange of information systems

L matters Why is structure important?

Why XML matters

XML is the replacement for HTMlanguage of the Web

XML is a way of exchanging informdifferent computer systems

XML is the mandatory standard foinformation between government

Page 6: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

What XM

text matters

rmation

Fo ppearance

For IS p different computer

L is Why is structure important?

What XML is

A standard way of marking-up info

r publishers: a way of separating content from a

rofessionals: a way of exchanging data betweensystems.

Page 7: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

What XM

text matters

L is How it works

How it works

Page 8: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

What XM

text matters

XML an ML.

separate formatting a web page – but with

ia separate formatting haviour of a web page –

L is How it works

d HTML XML is a markup language like HT

Like HTML, XML can be used (viarules) to control the typography ofgreater control and flexibility.

Like DHTML, XML can be used (vrules) to control the layout and bebut much more powerfully.

Page 9: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

What XM

text matters

ed to move information s across the web.

HT text and pictures

X information – text and

Web browser

Foreign’ computer

L is How it works

Unlike HTML, XML is ideally suitdirectly between computer system

ML describes The appearance of information –

ML describes The structure and relationships ofpictures and data

XML file

Formatting rules

Page 10: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

What XM

text matters

What Xlike

rounding text, pictures

a single schema – to e throughout the world:

L is How it works

ML looks XML – like HTML – works by surand other information with ‘tags’:

<tag>This is some text</tag>

Tags in HTML In HTML there is one set of tags –cover everything on every web pag

<H1>Heading</H1><P>And a paragraph</P>

Page 11: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

What XM

text matters

articular class of as, each specific to a

tags called <item>,

gs called <prelims>,

L is How it works

Tags in XML In XML there is a set of tags for a pdocuments: there are many schemparticular class of document:

A schema for invoices might have <price>, <quantity>, etc

A schema for books might have ta<body>, <chapter>, etc

Page 12: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

What XM

text matters

Sch ment or a class of

ocuments (instances) in

eaningfully to each

rm to one (or more)

L is How it works

emas in XML A schema describes a class of docuinformation.

A schema says what will be in all dthe class.

Schemas allow computers to talk mother

Almost all XML documents confoschema

Page 13: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

What XM

text matters

ld-wide (see http://te, local, or confined to a p or project.

is critical to the success

L is How it works

Some schemas are public and worwww.schema.net/), some are privaparticular industry or interest grou

Making or using the right schema of any project.

Page 14: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

What XM

text matters

L is Putting XML on the web

Putting XML on the web

Page 15: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

What XM

text matters

How XMreache

ished’ direct in XML.

ehind the scenes’ – the rms it to HTML, and

systems to the webshing’ projects

L is Putting XML on the web

L s the web

Today, very few web sites are ‘publ

Instead, if XML is used, it works ‘bweb server reads the XML, transfosends the HTML to the user.

XML is mostly used today as • an intermediate format joining IT• a way of delivering ‘parallel publi

Page 16: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

What XM

text matters

Joining

s to the public or other eutral’ way of sending

Web browser

IT system

L is Putting XML on the web

IT systems tothe web

Websites which knit ‘live’ IT systembusinesses use XML as a ‘system-nand receiving information.

IT system

XML to common

IT system

schema

Web server

Page 17: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

What XM

text matters

traditional IT ‘back-

ck-end’ systems changeL

an using ASP/JSP etctand XMLifferent HTML flavours

L is Putting XML on the web

For IS professionals• ‘Middleware’ to create XML from

end’ systems is easy to write• XML can stay the same when ‘ba• the web can be used to send XM

For web managers• Designing for XML is simpler th• Web servers increasingly unders• Servers can transform XML to d

on the fly

Page 18: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

What XM

text matters

Delivepublish

on must be available in ats at the same time.

t can be printed (via a onal print) and ent management system.

Web browser

Printing

L is Putting XML on the web

ring ‘paralleling’ projects

Projects where the same informatiprint, web, and possibly other form

The same XML ‘master’ documenPostScript formatter and conventipublished via a web server or cont

Author

XML master

Designer

Web server

document

Formatter

Page 19: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

What XM

text matters

ts can be very different in to the PostScript changing the XML

sers but they are not yet ML publishing, except

.

:

n 6 Preview 1 or higher)

L is Putting XML on the web

The ‘look and feel’ of the documen– design and interaction are built-formatter and web server without master document.

Direct to web Most users have XML-aware browcommon enough to allow ‘direct’ Xfor specialist/controlled audiences

XML-compatible browsers include• Internet Explorer 5.0 or higher• Netscape Navigator soon (Versio• Opera 4.0 or higher

Page 20: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

XML and

text matters

no information about e kind of content.

system (for example, a owser) which uses es Cascading Style

this kind of information

design Putting XML on the web

XML and design

XML documents generally containdesign – just information about th

Design is ‘applied’ by a formattingweb server or sometimes a web brformatting rules, just as HTML usSheets (CSS).

The formatting system says ‘make look like this on the web’.

Page 21: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

XML and

text matters

ules

e and Transformation

t.

design Standards for formatting rules

Standards for formatting r

• CSS – Cascading Style Sheets• XSL – eXtensible Style Language• XSLT – eXtensible Style Languag

Each is more powerful than the las

Page 22: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

XML and

text matters

CSS ar to web designers now, HTML files. The

ut and typographyr XSL/XSLT

design Standards for formatting rules

Cascading Style Sheets are familiand work well with XML as well asstandard:• is stable• is firmly focused on the web• allows quite good control of layo• works with HTML in the browse• is a ‘transition’ standard towards

Page 23: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

XML and

text matters

XSL & X Xtensible Style new standards which L) files. They:

the web and also (soon?)

and typographyell as designd send transformed

design Standards for formatting rules

SLT eXtensible Style Language and eLanguage – Transformation are work only with XML (and XHTM• are not quite stable standards• can be used to control design on

for print.• allow excellent control of layout • allow selection and ordering as w• work with XML on the server an

HTML or XML to the user.

Page 24: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

Structure

text matters

f tables of information nd columns are called

ional: one database uses ere are other types.

colour

green

l orange

blue

in databases Standards for formatting rules

Structure in databases

The traditional database consists oin which rows are called records afields.

Most databases these days are relatmany tables with relationships. Th

name size

record1 Fred large

record2 Natalie smal

record3 Fiona large

Page 25: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

Structure

text matters

rnal working of ported language – SQL

r putting information in

pecific tools) to add

call information from a

in databases Talking to databases

Talking to databases

There are no standards for the intedatabases, but there is a widely-sup(Structured Query Language) – foand getting it out of the database.

You can use SQL (or application-smarkup to database output.

You can use ASP or JSP or PHP todatabase with suitable drivers.

Page 26: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

Structure

text matters

arkup – either or PostScript, or SGML. It’s

utput marked-up

d-up documents, widespread adoption of stems.

in databases Markup and databases

Markup and databases

Most databases don’t understand mprocedural markup such as HTMLstructural markup such as XML or• (quite) easy to get databases to o

documents. • herder to get them to read marke

though this is changing with the XML throughout information sy

Page 27: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

Database

text matters

hing systems

tabase-based publishing

-centred publishing systems Markup and databases

Database-centred publis

Two main technologies used in dasystems:• Relational or flat databases• Object repositories

Page 28: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

Database

text matters

blishing

t, surrounded by input ilters, or exporting to

hers are relational BMS) used in many

ediaSurface and

base and web

-centred publishing systems Conventional database publishing

Conventional database pu

Database used as the master forma(editing) and export (publishing) fsuitable publishing software

Oracle, Informix, IBM DB2 and otdatabase management systems (RDweb-publishing systems such as MStoryServer.

Lotus Notes/Domino is a ‘flat’ datapublishing system.

Page 29: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

Database

text matters

PostScriptprint

L website

L (further cessing)

nt formattware

-centred publishing systems Conventional database publishing

DatabaseAuthors and

editors in XMLdocument

Authors andeditors direct

HTM

XMpro

Prisof

Page 30: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

Database

text matters

Plus po n such as catalogues,

ms for authors/editors and/or XMLs’ on data reflecting

als

-centred publishing systems Conventional database publishing

ints • ideal for ‘table-based’ informatioprice lists and statistics

• straightforward to build web-for• straightforward to export HTML• easy to make ‘live transformation

changes instantly• well-understood by IS profession

Page 31: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

Database

text matters

Minus p uments (!)espoke formatting hing process may be

nnectionthoring is unpleasant

-centred publishing systems Conventional database publishing

oints • Not suited to ‘document-like’ doc• Output to print often difficult: b

software or separate print-publisrequired

• Remote authors need live web co• For most purposes, web-form au

Page 32: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

Database

text matters

r format, surrounded by ing) filters, or exporting

ata structures as well as l in dealing with XML.

for revision control

x-hive

-centred publishing systems Object repositories

Object repositories

Object database used as the masteinput (editing) and export (publishto suitable publishing software.

Object databases deal with ‘tree’ d‘table’ data structures. This is usefu

Normally contain ‘workflow’ tools

Oracle IFS, Zope, Frontier, POET:

Page 33: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

Database

text matters

PostScriptprint

website

further ssing)

ormatare

-centred publishing systems Object repositories

DatabaseAuthors and

editors in XMLdocument

Authors andeditors direct

HTML

XML (proce

Print fsoftw

DTD(s)

Page 34: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

Database

text matters

Plus poryagementted’ XML

-centred publishing systems Object repositories

ints • Object databases are cool• System design is simple – in theo• Usually suited for workflow man• Work well with ‘document-orien

Page 35: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

Object r

text matters

Minus p m

oreky/expensive

epositories

oints • Not many products to choose fro• Can be expensive• Tools are scarce or young, theref• Implementing projects can be ris

Page 36: Structure in documents: an introduction - Text · PDF fileStructure in documents: an introduction text matters ... Structure in documents: ... Object databases deal with ‘tree’

Who doe

text matters

esigners and IS rk increasingly closely

or most designerse too complex for most

ortant: getting e important (and harder

s what Object repositories

Who does what

Web publishing already involves dprofessionals. They will have to wobecause:• XSL and XSLT are too complex f• user interaction and branding ar

IS professionals• e-everything is getting more imp

everything right is more and morand harder)