Structure in documents: an introduction - Text · PDF fileStructure in documents: an...
Transcript of Structure in documents: an introduction - Text · PDF fileStructure in documents: an...
Structure
text matters
an introduction
f databases and markup ke electronic and paper re useful, concentrating L.
in documents: an introduction
Structure in documents:
Being an introduction to the use olanguages to help the designer madocuments work harder and be moon the markup language called XM
Structure
text matters
ture
tic use of type, space meaning to a human
atic use of electronic eaning to a machine.
finitions and schemas
in documents: an introduction Visible and invisible structure
Visible and invisible struc
Visible structure: regular, systemaand colour to expose a document’suser. Codified in• grids• style manuals
Invisible structure: regular, systemmarkers to expose a document’s mCodified in• database tables and schemas• SGML/XML Document Type De
Structure
text matters
t?
o:
r different audiences
their meaningystems
in documents: an introduction Why is structure important?
Why is structure importan
A structured document allows us t• design sites rather than pages• publish different ‘slices’ of data fo• auto-publish new data• navigate documents according to• exchange information between s
Two over
text matters
al systems
and SGML add labels . The labels are often
s of information into ructure. The labels exist r than the document. In to most databases.
lapping structural systems Why is structure important?
Two overlapping structur
Markup languages such as XML about structure within documentsdirectly human-readable.
Databases separate different typedifferent fields within an overall stwithin the database software rathefact ‘document’ is an alien concept
Why XM
text matters
L – it is the new
ation between many
r the exchange of information systems
L matters Why is structure important?
Why XML matters
XML is the replacement for HTMlanguage of the Web
XML is a way of exchanging informdifferent computer systems
XML is the mandatory standard foinformation between government
What XM
text matters
rmation
Fo ppearance
For IS p different computer
L is Why is structure important?
What XML is
A standard way of marking-up info
r publishers: a way of separating content from a
rofessionals: a way of exchanging data betweensystems.
What XM
text matters
L is How it works
How it works
What XM
text matters
XML an ML.
separate formatting a web page – but with
ia separate formatting haviour of a web page –
L is How it works
d HTML XML is a markup language like HT
Like HTML, XML can be used (viarules) to control the typography ofgreater control and flexibility.
Like DHTML, XML can be used (vrules) to control the layout and bebut much more powerfully.
What XM
text matters
ed to move information s across the web.
HT text and pictures
X information – text and
Web browser
Foreign’ computer
L is How it works
Unlike HTML, XML is ideally suitdirectly between computer system
ML describes The appearance of information –
ML describes The structure and relationships ofpictures and data
XML file
Formatting rules
‘
What XM
text matters
What Xlike
rounding text, pictures
a single schema – to e throughout the world:
L is How it works
ML looks XML – like HTML – works by surand other information with ‘tags’:
<tag>This is some text</tag>
Tags in HTML In HTML there is one set of tags –cover everything on every web pag
<H1>Heading</H1><P>And a paragraph</P>
What XM
text matters
articular class of as, each specific to a
tags called <item>,
gs called <prelims>,
L is How it works
Tags in XML In XML there is a set of tags for a pdocuments: there are many schemparticular class of document:
A schema for invoices might have <price>, <quantity>, etc
A schema for books might have ta<body>, <chapter>, etc
What XM
text matters
Sch ment or a class of
ocuments (instances) in
eaningfully to each
rm to one (or more)
L is How it works
emas in XML A schema describes a class of docuinformation.
A schema says what will be in all dthe class.
Schemas allow computers to talk mother
Almost all XML documents confoschema
What XM
text matters
ld-wide (see http://te, local, or confined to a p or project.
is critical to the success
L is How it works
Some schemas are public and worwww.schema.net/), some are privaparticular industry or interest grou
Making or using the right schema of any project.
What XM
text matters
L is Putting XML on the web
Putting XML on the web
What XM
text matters
How XMreache
ished’ direct in XML.
ehind the scenes’ – the rms it to HTML, and
systems to the webshing’ projects
L is Putting XML on the web
L s the web
Today, very few web sites are ‘publ
Instead, if XML is used, it works ‘bweb server reads the XML, transfosends the HTML to the user.
XML is mostly used today as • an intermediate format joining IT• a way of delivering ‘parallel publi
What XM
text matters
Joining
s to the public or other eutral’ way of sending
Web browser
IT system
L is Putting XML on the web
IT systems tothe web
Websites which knit ‘live’ IT systembusinesses use XML as a ‘system-nand receiving information.
IT system
XML to common
IT system
schema
Web server
What XM
text matters
traditional IT ‘back-
ck-end’ systems changeL
an using ASP/JSP etctand XMLifferent HTML flavours
L is Putting XML on the web
For IS professionals• ‘Middleware’ to create XML from
end’ systems is easy to write• XML can stay the same when ‘ba• the web can be used to send XM
For web managers• Designing for XML is simpler th• Web servers increasingly unders• Servers can transform XML to d
on the fly
What XM
text matters
Delivepublish
on must be available in ats at the same time.
t can be printed (via a onal print) and ent management system.
Web browser
Printing
L is Putting XML on the web
ring ‘paralleling’ projects
Projects where the same informatiprint, web, and possibly other form
The same XML ‘master’ documenPostScript formatter and conventipublished via a web server or cont
Author
XML master
Designer
Web server
document
Formatter
What XM
text matters
ts can be very different in to the PostScript changing the XML
sers but they are not yet ML publishing, except
.
:
n 6 Preview 1 or higher)
L is Putting XML on the web
The ‘look and feel’ of the documen– design and interaction are built-formatter and web server without master document.
Direct to web Most users have XML-aware browcommon enough to allow ‘direct’ Xfor specialist/controlled audiences
XML-compatible browsers include• Internet Explorer 5.0 or higher• Netscape Navigator soon (Versio• Opera 4.0 or higher
XML and
text matters
no information about e kind of content.
system (for example, a owser) which uses es Cascading Style
this kind of information
design Putting XML on the web
XML and design
XML documents generally containdesign – just information about th
Design is ‘applied’ by a formattingweb server or sometimes a web brformatting rules, just as HTML usSheets (CSS).
The formatting system says ‘make look like this on the web’.
XML and
text matters
ules
e and Transformation
t.
design Standards for formatting rules
Standards for formatting r
• CSS – Cascading Style Sheets• XSL – eXtensible Style Language• XSLT – eXtensible Style Languag
Each is more powerful than the las
XML and
text matters
CSS ar to web designers now, HTML files. The
ut and typographyr XSL/XSLT
design Standards for formatting rules
Cascading Style Sheets are familiand work well with XML as well asstandard:• is stable• is firmly focused on the web• allows quite good control of layo• works with HTML in the browse• is a ‘transition’ standard towards
XML and
text matters
XSL & X Xtensible Style new standards which L) files. They:
the web and also (soon?)
and typographyell as designd send transformed
design Standards for formatting rules
SLT eXtensible Style Language and eLanguage – Transformation are work only with XML (and XHTM• are not quite stable standards• can be used to control design on
for print.• allow excellent control of layout • allow selection and ordering as w• work with XML on the server an
HTML or XML to the user.
Structure
text matters
f tables of information nd columns are called
ional: one database uses ere are other types.
colour
green
l orange
blue
in databases Standards for formatting rules
Structure in databases
The traditional database consists oin which rows are called records afields.
Most databases these days are relatmany tables with relationships. Th
name size
record1 Fred large
record2 Natalie smal
record3 Fiona large
Structure
text matters
rnal working of ported language – SQL
r putting information in
pecific tools) to add
call information from a
in databases Talking to databases
Talking to databases
There are no standards for the intedatabases, but there is a widely-sup(Structured Query Language) – foand getting it out of the database.
You can use SQL (or application-smarkup to database output.
You can use ASP or JSP or PHP todatabase with suitable drivers.
Structure
text matters
arkup – either or PostScript, or SGML. It’s
utput marked-up
d-up documents, widespread adoption of stems.
in databases Markup and databases
Markup and databases
Most databases don’t understand mprocedural markup such as HTMLstructural markup such as XML or• (quite) easy to get databases to o
documents. • herder to get them to read marke
though this is changing with the XML throughout information sy
Database
text matters
hing systems
tabase-based publishing
-centred publishing systems Markup and databases
Database-centred publis
Two main technologies used in dasystems:• Relational or flat databases• Object repositories
Database
text matters
blishing
t, surrounded by input ilters, or exporting to
hers are relational BMS) used in many
ediaSurface and
base and web
-centred publishing systems Conventional database publishing
Conventional database pu
Database used as the master forma(editing) and export (publishing) fsuitable publishing software
Oracle, Informix, IBM DB2 and otdatabase management systems (RDweb-publishing systems such as MStoryServer.
Lotus Notes/Domino is a ‘flat’ datapublishing system.
Database
text matters
PostScriptprint
L website
L (further cessing)
nt formattware
-centred publishing systems Conventional database publishing
DatabaseAuthors and
editors in XMLdocument
Authors andeditors direct
HTM
XMpro
Prisof
Database
text matters
Plus po n such as catalogues,
ms for authors/editors and/or XMLs’ on data reflecting
als
-centred publishing systems Conventional database publishing
ints • ideal for ‘table-based’ informatioprice lists and statistics
• straightforward to build web-for• straightforward to export HTML• easy to make ‘live transformation
changes instantly• well-understood by IS profession
Database
text matters
Minus p uments (!)espoke formatting hing process may be
nnectionthoring is unpleasant
-centred publishing systems Conventional database publishing
oints • Not suited to ‘document-like’ doc• Output to print often difficult: b
software or separate print-publisrequired
• Remote authors need live web co• For most purposes, web-form au
Database
text matters
r format, surrounded by ing) filters, or exporting
ata structures as well as l in dealing with XML.
for revision control
x-hive
-centred publishing systems Object repositories
Object repositories
Object database used as the masteinput (editing) and export (publishto suitable publishing software.
Object databases deal with ‘tree’ d‘table’ data structures. This is usefu
Normally contain ‘workflow’ tools
Oracle IFS, Zope, Frontier, POET:
Database
text matters
PostScriptprint
website
further ssing)
ormatare
-centred publishing systems Object repositories
DatabaseAuthors and
editors in XMLdocument
Authors andeditors direct
HTML
XML (proce
Print fsoftw
DTD(s)
Database
text matters
Plus poryagementted’ XML
-centred publishing systems Object repositories
ints • Object databases are cool• System design is simple – in theo• Usually suited for workflow man• Work well with ‘document-orien
Object r
text matters
Minus p m
oreky/expensive
epositories
oints • Not many products to choose fro• Can be expensive• Tools are scarce or young, theref• Implementing projects can be ris
Who doe
text matters
esigners and IS rk increasingly closely
or most designerse too complex for most
ortant: getting e important (and harder
s what Object repositories
Who does what
Web publishing already involves dprofessionals. They will have to wobecause:• XSL and XSLT are too complex f• user interaction and branding ar
IS professionals• e-everything is getting more imp
everything right is more and morand harder)