A Spot of TEI

February 4th, 2013 A spot of TEI Hugh Cayless, NYU [email protected] follow me on Twitter: @hcayless

description

Presentation on TEI (particularly as it relates to the Integrating Digital Papyrology project). Given at U. of South Carolina Center for Digital Humanities, 2/4/2013

Transcript of A Spot of TEI

Page 1: A Spot of TEI

February 4th, 2013

A spot of TEI

Hugh Cayless, NYU

[email protected]

Follow me on Twitter: @hcayless

Page 2: A Spot of TEI

Who am I?

✤ Ph.D. in Classics, M.S. in Information Science

✤ Worked as a software engineer for the last 12 years or so

✤ the last 4 have been for NYU doing Digital Classics and similar cultural heritage digital access projects

✤ recently elected to the TEI Technical Council.

✤ One of the founders of EpiDoc, a TEI-based standard for encoding ancient inscriptions (and now papyri too).

Page 3: A Spot of TEI

What am I talking about?

✤ How we use TEI/XML in projects

✤ Why TEI?

✤ Current projects

Page 4: A Spot of TEI

Integrating Digital Papyrology

✤ Unification of several long-running projects:

✤ Duke Databank of Documentary Papyri (DDbDP)

✤ Heidelberg Gesamtverzeichnis (HGV), a directory of Greek documentary papyri

✤ Advanced Papyrological Information System (APIS)

✤ Bibliographie Papyrologique (BP)

✤ Trismegistos (TM)

Page 5: A Spot of TEI

State of play at the beginning

✤ DDbDP: TEI SGML files

✤ HGV: Filemaker Pro database + web interface

✤ APIS: idiosyncratic text-based catalog + images + web interface

✤ BP: database only, published annually in print/on disk

✤ TM: database + web interface

✤ TM is a going concern, working with IDP, but with no plans to be subsumed by it

Page 6: A Spot of TEI

What we did

✤ DDbDP: converted TEI SGML to EpiDoc (TEI) XML

✤ HGV: converted to EpiDoc XML

✤ APIS: converted to EpiDoc XML

✤ BP: converted to TEI <bibl> fragments

✤ TM: inserted TM ids into IDP documents, generated linkages to TM site

Page 7: A Spot of TEI

Structure

✤ The core of the system is just TEI files in a Git repository.

✤ These are transformed, using XSLT, into RDF, HTML, plain text, and “add” documents for our search index.

✤ They are pulled into an editing workflow system as needed, which allows editing the files using a web form or (for texts) a non-XML syntax based on papyrological/epigraphic editing conventions.

✤ An automated process syncs data from the editor’s repo and a Github repo, and publishes them to the site.
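
The "one source, several outputs" idea on this slide can be sketched in a few lines. The project itself uses XSLT; the Python below is only a stand-in that illustrates the principle, and the sample document, function names, and HTML shape are all hypothetical rather than taken from papyri.info.

```python
import xml.etree.ElementTree as ET
from html import escape

TEI = "{http://www.tei-c.org/ns/1.0}"  # TEI namespace in ElementTree notation

# Hypothetical, heavily simplified TEI document (not a real papyri.info record).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Sample letter</title></titleStmt></fileDesc></teiHeader>
  <text><body><div type="edition"><ab><lb n="1"/>opto deos<lb n="2"/>ego enim valeo</ab></div></body></text>
</TEI>"""


def edition_lines(root):
    """Return the edition text as a list of lines, splitting at each <lb/>."""
    ab = root.find(f".//{TEI}ab")
    lines, current = [], ab.text or ""
    for child in ab:
        if child.tag == TEI + "lb":
            # A line break closes the line collected so far.
            if current.strip():
                lines.append(current.strip())
            current = ""
        else:
            current += "".join(child.itertext())
        current += child.tail or ""
    if current.strip():
        lines.append(current.strip())
    return lines


def to_plain_text(root):
    """Output 1: plain text, one papyrus line per text line."""
    return "\n".join(edition_lines(root))


def to_html(root):
    """Output 2: a minimal HTML fragment built from the same source."""
    title = root.find(f".//{TEI}title").text
    items = "".join(f"<li>{escape(line)}</li>" for line in edition_lines(root))
    return f"<h1>{escape(title)}</h1><ol>{items}</ol>"


root = ET.fromstring(SAMPLE)
print(to_plain_text(root))
print(to_html(root))
```

Both outputs are derived from the same file, so a correction made once in the TEI source flows into every published form.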

Page 8: A Spot of TEI

Or, visually

[Diagram of the system. Labels shown: Canonical Git Repo; Github Repo; Github; Git Repos; Editor Database; Numbers Server; Papyri.info Git Repo; Navigator Interface; search API; SPARQL API; XSLT API; Editor; Automated Document Sync; Leiden+ Conversion; API; Search Engine]

Page 9: A Spot of TEI

So why TEI?

✤ Lots of reasons:

✤ Granular control over records

✤ Attribution

✤ Multiple outputs

✤ Mixture of controlled and free-form data

✤ Relatively easy to obtain / create tools

✤ Engaged and responsive community

Page 10: A Spot of TEI

What I’m working on now

✤ Fixing the TEI Pointer spec

✤ Annotation of documents to mark things like personal and place names

✤ Linguistic annotation

✤ Linking text and image

Page 11: A Spot of TEI

Some examples

✤ http://papyri.info/ddbdp/cpr;8;72

✤ fine-grained attribution / version control (click on “Editorial History” and “Detailed” at the bottom of the text)

✤ http://papyri.info/ddbdp/c.ep.lat;;218

✤ What’s going on underneath?

Page 12: A Spot of TEI

r  ̣[  ̣  ̣  ̣]  ̣c  ̣[  ̣  ̣ Aelio Fel]ici pluṛ[imam] ṣạ[lutem] opto deos · ut mi[hi v]ạleas · quod ṃẹ[um votum est] ego enim · valeọ coṛpọṛe   ̣  ̣  ̣[ -ca.?- ] te non videọ rog̣ọ ṇe · fac ̣ịaṣ [ -ca.?- ] f  ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣[- ca.9 -]uma  ̣[ -ca.?- ]   ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣ [ -ca.?- ]vAelio Felici

Beginning of a letter marked up according to the Leiden Conventions

Page 13: A Spot of TEI

r  ̣[  ̣  ̣  ̣]  ̣C  ̣[  ̣  ̣–ca.9– ]ICIPLUṚ[. . . .] ṢẠ[. . . . .] OPTODEOS · UTMI[. . . ]ẠLEAS · QUODṂẸ[ –ca.10– ] EGOENIM · VALEỌCOṚPỌṚE   ̣  ̣  ̣[ -ca.?- ] TENONVIDEỌROG̣ỌṆE · FAC̣ỊAṢ [ -ca.?- ] F ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣[- ca.9 -]UMA  ̣[ -ca.?- ]   ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣ [ -ca.?- ]vAELIO FELICI

The same letter, diplomatic(ish) edition

Page 14: A Spot of TEI

<div xml:lang="la" type="edition" xml:space="preserve"><div n="r" type="textpart"><!--milestone unit="4"--><ab><lb n="1"/><gap reason="illegible" quantity="1" unit="character"/><gap reason="lost" quantity="3" unit="character"/><gap reason="illegible" quantity="1" unit="character"/>c<gap reason="illegible" quantity="1" unit="character"/><gap reason="lost" quantity="2" unit="character"/><supplied reason="lost"> Aelio Fel</supplied>ici plu<unclear>r</unclear><supplied reason="lost">imam</supplied><lb n="2"/><unclear>sa</unclear><supplied reason="lost">lutem</supplied><lb n="3"/>opto deos <g type="middot"/> ut mi<supplied reason="lost">hi v</supplied><unclear>a</unclear>leas <g type="middot"/> quod <unclear>me</unclear><supplied reason="lost">um votum est</supplied><lb n="4"/>ego enim <g type="middot"/> vale<unclear>o</unclear> co<unclear>r</unclear>p<unclear>or</unclear>e <gap reason="illegible" quantity="3" unit="character"/><gap reason="lost" extent="unknown" unit="character"/><lb n="5"/>te non vide<unclear>o</unclear> ro<unclear>go</unclear> <unclear>n</unclear>e <g type="middot"/> fa<unclear>ci</unclear>a<unclear>s</unclear> <gap reason="lost" extent="unknown" unit="character"/><lb n="6"/>f<gap reason="illegible" quantity="4" unit="character"/><gap reason="illegible" quantity="4" unit="character"/><gap reason="illegible" quantity="2" unit="character"/><gap reason="lost" quantity="9" unit="character"/>uma<gap reason="illegible" quantity="1" unit="character"/><gap reason="lost" extent="unknown" unit="character"/><lb n="7"/><gap reason="illegible" quantity="4" unit="character"/><gap reason="illegible" quantity="4" unit="character"/><gap reason="illegible" quantity="2" unit="character"/> <gap reason="lost" extent="unknown" unit="character"/></ab></div><div n="v" type="textpart"><!--milestone unit="4"--><ab><lb n="1"/>Aelio Felici </ab></div></div>

The same letter marked up in EpiDoc (TEI) XML
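
To make the relationship between the EpiDoc markup and the Leiden-style display concrete, here is a minimal sketch in Python that turns lines 3 and 4 of the letter back into bracket-and-underdot notation. It is not the project's XSLT, and the rendering rules are simplified assumptions: plain dots for illegible characters, `[...]` for lost characters of known quantity, and `[ -ca.?- ]` for a loss of unknown extent.

```python
import xml.etree.ElementTree as ET

UNDERDOT = "\u0323"  # combining dot below, marks an unclear letter
MIDDOT = "\u00b7"    # interpunct written on the papyrus

# Lines 3 and 4 of the letter, copied from the EpiDoc markup on this slide.
SAMPLE = (
    '<ab><lb n="3"/>opto deos <g type="middot"/> ut mi'
    '<supplied reason="lost">hi v</supplied><unclear>a</unclear>leas '
    '<g type="middot"/> quod <unclear>me</unclear>'
    '<supplied reason="lost">um votum est</supplied>'
    '<lb n="4"/>ego enim <g type="middot"/> vale<unclear>o</unclear> '
    'co<unclear>r</unclear>p<unclear>or</unclear>e '
    '<gap reason="illegible" quantity="3" unit="character"/>'
    '<gap reason="lost" extent="unknown" unit="character"/></ab>'
)


def render(el):
    """Render an element and everything inside it as Leiden-style text."""
    inner = (el.text or "") + "".join(
        render(child) + (child.tail or "") for child in el
    )
    if el.tag == "supplied":
        # Text restored by the editor goes in square brackets.
        return "[" + inner + "]"
    if el.tag == "unclear":
        # Each doubtful letter gets a dot underneath.
        return "".join(ch + UNDERDOT if ch.strip() else ch for ch in inner)
    if el.tag == "gap":
        if el.get("extent") == "unknown":
            return "[ -ca.?- ]"
        dots = "." * int(el.get("quantity"))
        return "[" + dots + "]" if el.get("reason") == "lost" else dots
    if el.tag == "g" and el.get("type") == "middot":
        return MIDDOT
    if el.tag == "lb":
        return "\n"
    return inner


leiden = render(ET.fromstring(SAMPLE)).strip()
print(leiden)
```

Every display decision lives in `render`, not in the source, so a different convention only needs a different function.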

Page 15: A Spot of TEI

The same letter, visualization of the tree structure of the XML

Page 16: A Spot of TEI

✤ What is the text and what is the markup?

✤ There is no text, only readings. EpiDoc allows you to produce models of readings.

✤ Slicing the text up into bits isn’t adulterating it, it just adds hooks for transforming the text in useful ways.
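
As a rough illustration of those "hooks", the sketch below derives a diplomatic-style reading from the same kind of markup: capitals, no word spaces, and dots in place of anything the editor supplied. The rules are simplified assumptions for the sake of the example, not EpiDoc's actual stylesheets, and the function names are made up.

```python
import xml.etree.ElementTree as ET

UNDERDOT = "\u0323"  # combining dot below
MIDDOT = "\u00b7"    # interpunct

# Part of line 3 of the letter, using the same elements as the EpiDoc slide.
SAMPLE = (
    '<ab>opto deos <g type="middot"/> ut mi'
    '<supplied reason="lost">hi v</supplied><unclear>a</unclear>leas</ab>'
)


def squeeze(text):
    """Upper-case the text and drop word spaces, as the papyrus has none."""
    return "".join(text.split()).upper()


def diplomatic(el):
    """Render what is physically on the papyrus, not the editor's reading."""
    if el.tag == "supplied":
        # Restored text is not on the papyrus: show only how much is missing.
        missing = len("".join("".join(el.itertext()).split()))
        return "[" + "." * missing + "]"
    if el.tag == "g" and el.get("type") == "middot":
        return " " + MIDDOT + " "
    parts = [squeeze(el.text or "")]
    for child in el:
        parts.append(diplomatic(child))
        parts.append(squeeze(child.tail or ""))
    inner = "".join(parts)
    if el.tag == "unclear":
        return "".join(ch + UNDERDOT if ch.strip() else ch for ch in inner)
    return inner


reading = diplomatic(ET.fromstring(SAMPLE))
print(reading)
```

The markup is untouched; only the function walking it changes, which is the sense in which one encoded text can model more than one reading.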

Page 17: A Spot of TEI

How to get involved

✤ Mailing list: [email protected]

✤ http://listserv.brown.edu/archives/cgi-bin/wa?SUBED1=tei-l&A=1

✤ http://listserv.brown.edu/archives/cgi-bin/wa?A0=tei-l

✤ TEI Sourceforge:

✤ Report a bug: http://sourceforge.net/tracker/?func=add&group_id=106328&atid=644062

✤ Make a feature request: http://sourceforge.net/tracker/?func=add&group_id=106328&atid=644065

✤ IRC: #tei-c on http://freenode.net/