A Spot of TEI
February 4th, 2013
A spot of TEI
Hugh Cayless, [email protected]
Follow me on Twitter: @hcayless
Who am I?
✤ Ph.D. in Classics, M.S. in Information Science
✤ Worked as a software engineer for the last 12 years or so
✤ The last 4 have been at NYU, doing Digital Classics and similar cultural heritage digital access projects
✤ Recently elected to the TEI Technical Council
✤ One of the founders of EpiDoc, a TEI-based standard for encoding ancient inscriptions (and now papyri too).
What am I talking about?
✤ How we use TEI/XML in projects
✤ Why TEI?
✤ Current projects
Integrating Digital Papyrology
✤ Unification of several long-running projects:
✤ Duke Databank of Documentary Papyri (DDbDP)
✤ Heidelberg Gesamtverzeichnis (HGV, a directory of Greek documentary papyri)
✤ Advanced Papyrological Information System (APIS)
✤ Bibliographie Papyrologique (BP)
✤ Trismegistos (TM)
State of play at the beginning
✤ DDbDP: TEI SGML files
✤ HGV: Filemaker Pro database + web interface
✤ APIS: idiosyncratic text-based catalog + images + web interface
✤ BP: database only, published annually in print/on disk
✤ TM: database + web interface
✤ TM is a going concern, working with IDP, but with no plans to be subsumed by it
What we did
✤ DDbDP: converted TEI SGML to EpiDoc (TEI) XML
✤ HGV: converted to EpiDoc XML
✤ APIS: converted to EpiDoc XML
✤ BP: converted to TEI <bibl> fragments
✤ TM: inserted TM ids into IDP documents, generated linkages to TM site
Structure
✤ The core of the system is just TEI files in a Git repository.
✤ These are transformed, using XSLT, into RDF, HTML, plain text, and add documents for our search index.
✤ They are pulled into an editing workflow system as needed, which allows editing the files using a web form or (for texts) a non-XML syntax based on papyrological/epigraphic editing conventions.
✤ An automated process syncs data from the editor’s repo and a GitHub repo, and publishes it to the site.
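The "one source, several outputs" step can be sketched in a few lines. The project itself uses XSLT for this; the sketch below swaps in Python's standard library purely for illustration. The miniature TEI file, the `sample.1` identifier, and the field names `id`, `title`, and `text` are all assumptions, and the Solr-style `<add>` envelope is a guess at what "add documents" refers to, since the slides do not name the search engine.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical, heavily abbreviated TEI file (not schema-valid).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc>
    <titleStmt><title>Sample letter</title></titleStmt>
    <publicationStmt><idno type="filename">sample.1</idno></publicationStmt>
  </fileDesc></teiHeader>
  <text><body><div type="edition"><ab>opto deos ut mi<supplied
    reason="lost">hi v</supplied>aleas</ab></div></body></text>
</TEI>"""


def plain_text(root):
    """Output 1: the edition text with all markup stripped."""
    edition = root.find(f".//{{{TEI_NS}}}div[@type='edition']")
    # Join every text node, then normalize runs of whitespace.
    return " ".join("".join(edition.itertext()).split())


def search_add_doc(root):
    """Output 2: a Solr-style <add> document (field names are assumed)."""
    fields = {
        "id": root.find(f".//{{{TEI_NS}}}idno").text,
        "title": root.find(f".//{{{TEI_NS}}}title").text,
        "text": plain_text(root),
    }
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        ET.SubElement(doc, "field", name=name).text = value
    return ET.tostring(add, encoding="unicode")


root = ET.fromstring(SAMPLE)
print(plain_text(root))
print(search_add_doc(root))
```

Both outputs are derived from the same file, so a correction committed to the Git repository flows into every derived form the next time the transforms run.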
Or, visually
[Diagram: system architecture. Labeled components: Canonical Git Repo, Github Repo, Github, Git Repos, Editor Database, Numbers Server, Papyri.info Git Repo, Navigator Interface, search API, SPARQL API, XSLT API, Editor, Automated Document Sync, Leiden+ Conversion, API, Search Engine]
So why TEI?
✤ Lots of reasons:
✤ Granular control over records
✤ Attribution
✤ Multiple outputs
✤ Mixture of controlled and free-form data
✤ Relatively easy to obtain / create tools
✤ Engaged and responsive community
What I’m working on now
✤ Fixing the TEI Pointer spec
✤ Annotation of documents to mark things like personal and place names
✤ Linguistic annotation
✤ Linking text and image
Some examples
✤ http://papyri.info/ddbdp/cpr;8;72
✤ fine-grained attribution / version control (click on “Editorial History” and “Detailed” at the bottom of the text)
✤ http://papyri.info/ddbdp/c.ep.lat;;218
✤ What’s going on underneath?
r ̣[ ̣ ̣ ̣] ̣c ̣[ ̣ ̣ Aelio Fel]ici pluṛ[imam] ṣạ[lutem] opto deos · ut mi[hi v]ạleas · quod ṃẹ[um votum est] ego enim · valeọ coṛpọṛe ̣ ̣ ̣[ -ca.?- ] te non videọ rog̣ọ ṇe · fac ̣ịaṣ [ -ca.?- ] f ̣ ̣ ̣ ̣ ̣ ̣ ̣ ̣ ̣ ̣[- ca.9 -]uma ̣[ -ca.?- ] ̣ ̣ ̣ ̣ ̣ ̣ ̣ ̣ ̣ ̣ [ -ca.?- ]vAelio Felici
Beginning of a letter marked up according to the Leiden Conventions
r ̣[ ̣ ̣ ̣] ̣C ̣[ ̣ ̣–ca.9– ]ICIPLUṚ[. . . .] ṢẠ[. . . . .] OPTODEOS · UTMI[. . . ]ẠLEAS · QUODṂẸ[ –ca.10– ] EGOENIM · VALEỌCOṚPỌṚE ̣ ̣ ̣[ -ca.?- ] TENONVIDEỌROG̣ỌṆE · FAC̣ỊAṢ [ -ca.?- ] F ̣ ̣ ̣ ̣ ̣ ̣ ̣ ̣ ̣ ̣[- ca.9 -]UMA ̣[ -ca.?- ] ̣ ̣ ̣ ̣ ̣ ̣ ̣ ̣ ̣ ̣ [ -ca.?- ]vAELIO FELICI
The same letter, diplomatic(ish) edition
<div xml:lang="la" type="edition" xml:space="preserve"><div n="r" type="textpart"><!--milestone unit="4"--><ab><lb n="1"/><gap reason="illegible" quantity="1" unit="character"/><gap reason="lost" quantity="3" unit="character"/><gap reason="illegible" quantity="1" unit="character"/>c<gap reason="illegible" quantity="1" unit="character"/><gap reason="lost" quantity="2" unit="character"/><supplied reason="lost"> Aelio Fel</supplied>ici plu<unclear>r</unclear><supplied reason="lost">imam</supplied><lb n="2"/><unclear>sa</unclear><supplied reason="lost">lutem</supplied><lb n="3"/>opto deos <g type="middot"/> ut mi<supplied reason="lost">hi v</supplied><unclear>a</unclear>leas <g type="middot"/> quod <unclear>me</unclear><supplied reason="lost">um votum est</supplied><lb n="4"/>ego enim <g type="middot"/> vale<unclear>o</unclear> co<unclear>r</unclear>p<unclear>or</unclear>e <gap reason="illegible" quantity="3" unit="character"/><gap reason="lost" extent="unknown" unit="character"/><lb n="5"/>te non vide<unclear>o</unclear> ro<unclear>go</unclear> <unclear>n</unclear>e <g type="middot"/> fa<unclear>ci</unclear>a<unclear>s</unclear> <gap reason="lost" extent="unknown" unit="character"/><lb n="6"/>f<gap reason="illegible" quantity="4" unit="character"/><gap reason="illegible" quantity="4" unit="character"/><gap reason="illegible" quantity="2" unit="character"/><gap reason="lost" quantity="9" unit="character"/>uma<gap reason="illegible" quantity="1" unit="character"/><gap reason="lost" extent="unknown" unit="character"/><lb n="7"/><gap reason="illegible" quantity="4" unit="character"/><gap reason="illegible" quantity="4" unit="character"/><gap reason="illegible" quantity="2" unit="character"/> <gap reason="lost" extent="unknown" unit="character"/></ab></div><div n="v" type="textpart"><!--milestone unit="4"--><ab><lb n="1"/>Aelio Felici </ab></div></div>
The same letter marked up in EpiDoc (TEI) XML
The same letter, visualization of the tree structure of the XML
✤ What is the text and what is the markup?
✤ There is no text, only readings. EpiDoc allows you to produce models of readings.
✤ Slicing the text up into bits isn’t adulterating it; it just adds hooks for transforming the text in useful ways.
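A minimal sketch of what those hooks buy you: the same marked-up line rendered two ways, once as an edited (Leiden-style) text and once as a rough diplomatic text. This is not the papyri.info stylesheet logic. The `render` function and its rules are simplified assumptions that handle only the four element types in this fragment and assume `<supplied>` and `<unclear>` contain nothing but text.

```python
import xml.etree.ElementTree as ET

DOT_BELOW = "\u0323"  # combining dot below, marks an uncertain letter

# Line 3 of the letter, wrapped in <ab> so it parses on its own.
FRAGMENT = (
    '<ab>opto deos <g type="middot"/> ut mi'
    '<supplied reason="lost">hi v</supplied><unclear>a</unclear>leas</ab>'
)


def render(elem, diplomatic=False):
    """Render a fragment as edited text or as a diplomatic transcript."""

    def style(text):
        if not text:
            return ""
        # Diplomatic mode: capitals, no word division.
        return text.replace(" ", "").upper() if diplomatic else text

    def walk(node):
        if node.tag == "g" and node.get("type") == "middot":
            body = " · " if diplomatic else "·"
        elif node.tag == "supplied":
            inner = node.text or ""
            if diplomatic:
                # Show only how many letters are lost, not the restoration.
                inner = "." * len(inner.replace(" ", ""))
            body = f"[{inner}]"
        elif node.tag == "unclear":
            body = "".join(ch + DOT_BELOW for ch in style(node.text))
        else:
            body = style(node.text) + "".join(walk(c) for c in node)
        return body + style(node.tail)

    return walk(elem)


root = ET.fromstring(FRAGMENT)
print(render(root))                   # edited text
print(render(root, diplomatic=True))  # diplomatic text
```

Neither output is stored anywhere. Both are computed from the one encoded reading, which is the sense in which the markup adds hooks rather than altering the text.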
How to get involved
✤ Mailing list: [email protected]
✤ http://listserv.brown.edu/archives/cgi-bin/wa?SUBED1=tei-l&A=1
✤ http://listserv.brown.edu/archives/cgi-bin/wa?A0=tei-l
✤ TEI SourceForge:
✤ Report a bug:
✤ http://sourceforge.net/tracker/?func=add&group_id=106328&atid=644062
✤ Make a feature request:
✤ http://sourceforge.net/tracker/?func=add&group_id=106328&atid=644065
✤ IRC: #tei-c on http://freenode.net/