DMT Week 3

Leiden University. The university to discover. DMT Week 3 Adriaan van der Weel and Peter Verhaar

Page 1: DMT Week 3

DMT Week 3

Adriaan van der Weel and Peter Verhaar

Page 2: DMT Week 3

Where do we stand?

Page 3: DMT Week 3

Principles of markup
- HTML:
  - Document instance (your CV)
  - Stylesheet (CSS)
- XML:
  - Application
  - Document instance (your CV)
  - Stylesheet (CSS)
  - DTD/Schema
  - Add: Prologue (XML decl.; DTD)

Page 4: DMT Week 3

Text and markup

Page 5: DMT Week 3

Knowledge representation
- Structure and content
- Ontology:
  - What knowable things exist
  - What relationships hold between them
- Tree diagram:
  - The book has structure and content: chapters, paragraphs, footnotes, etc.
  - XML represents structure and content
- Various ontologies, hence various DTDs
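The tree-diagram view can be made concrete with a minimal Python sketch using the standard library's xml.etree.ElementTree. The element names book, chapter, p and note (and the attribute n) are hypothetical, chosen only to mirror "chapters, paragraphs, footnotes"; a real project would take its names from a DTD such as TEI's. The function outline shows structure only; the last line pulls out content.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-ontology for a book: the element names are illustrative only.
BOOK = """<book>
  <chapter n="1">
    <p>First paragraph.<note>A footnote.</note></p>
    <p>Second paragraph.</p>
  </chapter>
  <chapter n="2">
    <p>Third paragraph.</p>
  </chapter>
</book>"""


def outline(elem, depth=0):
    """Return the tree as indented lines: structure (tags, attributes) only."""
    attrs = "".join(f' {k}="{v}"' for k, v in elem.attrib.items())
    lines = ["  " * depth + f"<{elem.tag}{attrs}>"]
    for child in elem:  # children in document order
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(BOOK)
print("\n".join(outline(root)))          # structure: the tree diagram
print([p.text for p in root.iter("p")])  # content: paragraph text
```

The same content could be poured into a different tree (another ontology, another DTD) without changing a word of the text.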

Page 6: DMT Week 3

XML Basics 1
- Elements: <p>...</p>
- Attributes: <title type="play">...</title>
- Entities:
  - Character: &#xE8; = è
  - General entities, referencing:
    • Chunks of text defined elsewhere
    • Text or image files, etc.
    • E.g., <p>The &BTCP; aims to ... </p>
- Well-formedness, validation
- Prologue (XML decl.; DTD)
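Entities, the prologue and well-formedness can be tried out with Python's standard-library parser. In this minimal sketch the entity name uni and its replacement text are hypothetical; the parser resolves the character reference &#xE8; and the declared general entity, but rejects an unquoted attribute value and an undeclared entity such as &BTCP;. Note that ElementTree only checks well-formedness: it does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

# Prologue: XML declaration plus an internal DTD subset declaring one general entity.
# The entity "uni" and its replacement text are hypothetical examples.
DOC = """<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ENTITY uni "Leiden University">
]>
<doc>
  <title type="play">Hamlet</title>
  <p>Biblioth&#xE8;que</p>
  <p>&uni; is the university to discover.</p>
</doc>"""


def is_well_formed(text):
    """True if the parser accepts the text (well-formedness only, no DTD validation)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


root = ET.fromstring(DOC)
print(root.find("title").get("type"))    # attribute value
print([p.text for p in root.iter("p")])  # character and general entity references resolved
print(is_well_formed('<title type="play">Hamlet</title>'))  # quoted attribute value
print(is_well_formed('<title type=play>Hamlet</title>'))    # unquoted: not well-formed
print(is_well_formed("<p>The &BTCP; aims to ...</p>"))      # BTCP not declared here
```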

Page 7: DMT Week 3

XML Basics 2
- Open standard (cf. de facto standard):
  - Publicly available
  - Royalty-free
  - Fully and publicly documented
  - NB: ‘Who owns your data?’
- (Lower) ASCII and Unicode:
  - Platform independent
  - Software independent
  - Device independent
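One practical consequence: any Unicode character can be written in a pure lower-ASCII file as a character reference, and every XML parser will read it back as the same character. A minimal Python sketch of that round trip, using only the standard library (the helper to_ascii_xml is hypothetical):

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def to_ascii_xml(text):
    """Escape markup characters, then write every non-ASCII character as a
    hexadecimal character reference, so the result is pure (lower) ASCII."""
    escaped = escape(text)  # & < > become &amp; &lt; &gt;
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in escaped)


word = "Bibliothèque"
ascii_form = to_ascii_xml(word)
print(ascii_form)                # è becomes the reference &#xE8;
print(word.encode("utf-8"))      # in UTF-8, è takes two bytes: \xc3\xa8
# Parsing turns the reference back into the same Unicode character.
print(ET.fromstring(f"<p>{ascii_form}</p>").text == word)
```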

Page 8: DMT Week 3

Open standards 1
- Open standards in a networking world
  - Why?
  - Which? E.g., the Internet Protocol Suite:
    - Link layer (physical/data link, e.g., Ethernet)
    - Internet layer (addressing and routing packets between networks), e.g., IP
    - Transport layer, e.g., TCP
    - Application layer, e.g., HTTP, SMTP, FTP
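The layers can be watched at work on a single machine. In this minimal Python sketch (standard library only), a tiny HTTP server and client talk over a TCP connection to the IPv4 loopback address 127.0.0.1. Assumptions: the handler class and message text are made up for the demo, and the "link layer" here is the operating system's virtual loopback interface rather than Ethernet.

```python
import http.client
import socketserver
import threading
from http.server import BaseHTTPRequestHandler


class HelloHandler(BaseHTTPRequestHandler):
    """Application layer: answer every GET with a short plain-text HTTP response."""

    def do_GET(self):
        body = b"Hello from the application layer"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=us-ascii")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_once():
    """Serve and fetch one HTTP request over TCP/IP on the loopback interface."""
    # Transport and internet layers: a TCP socket on the IPv4 loopback address.
    # Port 0 asks the operating system for any free port.
    server = socketserver.TCPServer(("127.0.0.1", 0), HelloHandler)
    server.timeout = 5  # handle_request() gives up if no client connects
    host, port = server.server_address
    worker = threading.Thread(target=server.handle_request)  # exactly one request
    worker.start()
    try:
        conn = http.client.HTTPConnection(host, port, timeout=5)
        conn.request("GET", "/")
        resp = conn.getresponse()
        result = (resp.status, resp.read().decode("ascii"))
        conn.close()
    finally:
        worker.join()
        server.server_close()
    return result


print(fetch_once())
```

Because every layer follows a published standard, the client and server need not be written by the same party or run the same software.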

Page 9: DMT Week 3

Open standards 2
- E.g.:
  - File format: PDF, TXT
  - Programming language: PHP
  - Operating system: Linux
  - Style language: CSS, XSLT
  - Markup metalanguage: SGML, XML
  - Markup language: DocBook, HTML, EAD, TEI

Page 10: DMT Week 3

TEI basics
- Text Encoding Initiative, 1987
- Text exchange in the humanities
- TEI is a DTD
  - More precisely: a collection of DTD fragments, or modules
- Platform and software independent (ASCII); open standard; open source
- Used in an XML application (diagram)
- Document ‘instances’ should be validated against the TEI DTD
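Validation proper needs a validating parser; Python's standard library does not validate against DTDs. The sketch below therefore only illustrates the difference between well-formed and valid: it hand-checks two rules of the TEI document model (the root element is TEI, and its first child is teiHeader), assuming TEI P5 conventions with the namespace http://www.tei-c.org/ns/1.0. The function name skeleton_problems is hypothetical, and the check is no substitute for validating against the TEI DTD or schema.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"  # TEI P5 namespace, in ElementTree's {uri}tag form

MINIMAL_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample</title></titleStmt>
      <publicationStmt><p>Unpublished</p></publicationStmt>
      <sourceDesc><p>Born digital</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><p>Hello, TEI.</p></body></text>
</TEI>"""


def skeleton_problems(xml_text):
    """Hand-check two TEI skeleton rules. NOT DTD/schema validation: it only shows
    that a well-formed document can still break the document model."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
    problems = []
    if root.tag != TEI + "TEI":
        problems.append("root element is not TEI")
    child_tags = [child.tag for child in root]
    if child_tags[:1] != [TEI + "teiHeader"]:
        problems.append("first child is not teiHeader")
    return problems


print(skeleton_problems(MINIMAL_TEI))
# Well-formed, but the header is missing:
print(skeleton_problems('<TEI xmlns="http://www.tei-c.org/ns/1.0"><text/></TEI>'))
```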

Page 11: DMT Week 3

TEI DTD
- The TEI DTD is modular. We use:

<!DOCTYPE TEI PUBLIC "-//TEI P5//DTD Main Document Type//EN"
  "http://www.tei-c.org/release/xml/tei/schema/dtd/tei.dtd" [
  <!ENTITY % TEI.header "INCLUDE">
  <!ENTITY % TEI.core "INCLUDE">
  <!ENTITY % TEI.textstructure "INCLUDE">
  <!ENTITY % TEI.transcr "INCLUDE">
  <!ENTITY % TEI.linking "INCLUDE">
  <!ENTITY % TEI.namesdates "INCLUDE">
]>

- Each parameter entity declared here with the value "INCLUDE" switches on one module of the DTD

http://www.tei-c.org/release/xml/tei/schema/dtd/

Page 12: DMT Week 3

Why this rigmarole?

- Print (‘Order of the Book’):
  - Author’s brain > Book > reader’s brain
  - Instrument: typography
- Digital (‘Digital Order’?):
  - Author’s brain > Computer > reader’s brain
  - Instrument: markup, for both typography (= form) and content
- So: we need to make text ‘intelligent’, i.e., explicit enough for a computer to process

Page 13: DMT Week 3

Using the computer / UM
- Author’s brain > Computer > reader’s brain
- Vary output format (paper, PDF, HTML, mobile phone, etc.)
- Exchange
- Reuse
- Search and select
- Count
- Change content (order) and form
- Etcetera
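Several of these uses follow directly from having one marked-up source. A minimal Python sketch, assuming a hypothetical source with chapter, title, p and term elements: the same tree is rendered as HTML and as plain text, marked terms are selected, and paragraphs are counted. A real pipeline would more likely use XSLT or a TEI toolchain; plain Python is used here only to keep the example self-contained.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical marked-up source; the element names are illustrative only.
SOURCE = """<chapter>
  <title>Text and markup</title>
  <p>Markup makes structure explicit.</p>
  <p>Structure can be <term>searched</term> and <term>counted</term>.</p>
</chapter>"""

root = ET.fromstring(SOURCE)


def to_html(chapter):
    """Output format 1: HTML."""
    parts = [f"<h1>{escape(chapter.findtext('title'))}</h1>"]
    for p in chapter.iter("p"):
        parts.append(f"<p>{escape(''.join(p.itertext()))}</p>")
    return "\n".join(parts)


def to_plain_text(chapter):
    """Output format 2, from the same source: plain text."""
    title = chapter.findtext("title")
    paras = ["".join(p.itertext()) for p in chapter.iter("p")]
    return "\n\n".join([title.upper()] + paras)


print(to_html(root))
print(to_plain_text(root))
print([t.text for t in root.iter("term")])  # search and select: marked terms
print(len(root.findall("p")))               # count: paragraphs
```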

Page 14: DMT Week 3

New research questions?
- Chris Anderson (author of The Long Tail), in Wired: ‘The End of Theory’
  - But: the need for hypotheses remains
- But: humanities data:
  - Quantity: not such a wealth of data. Bitty. Discontinuous.
  - Quality: narrative, evaluative, ambiguous, subjective, conceptual
- Who decides the agenda? Need to lead, rather than follow.