TEI Conference - CVCE

Florentina Armaselu – DHLab, Centre virtuel de la connaissance sur l’Europe (CVCE), Luxembourg, www.cvce.eu

From a Small-Scale Digital Edition to a TEI Publication Framework in Modern European History

Text Encoding Initiative (TEI) Conference and Members’ Meeting. Connect, Animate, Innovate. 28 to 31 October 2015. Université Lumière Lyon 2

Transcript of TEI Conference - CVCE


Page 2: TEI Conference - CVCE

Summary

1. The WEU-DIPLO pilot project
2. Transviewer, towards a TEI publication framework
3. Discussion
4. References

Page 3: TEI Conference - CVCE

Part I. The WEU-DIPLO pilot project

Page 4: TEI Conference - CVCE

Overview of the WEU-DIPLO project

1. Goal: XML-TEI encoding, corpus analysis and Web publication of institutional documents of the W.E.U. (Western European Union):
• Topics: armament production, standardization, control in the period from 1954 to 1982;
• Source: Archives nationales de Luxembourg, W.E.U. collection.

2. Initial format:
• digitized versions (JPG) of typewritten materials (one file per page).

3. Size:

                  Number of documents             Number of pages
Category          Total   EN   FR   FR proc.*     Total   EN    FR    FR proc.*
Note                 89   43   46          37       395   191   204         155
Minutes              30   15   15          15       256   138   118         118
Memorandum            3    1    2           2        16     7     9           9
Study                 2    0    2           1        12     0    12           8
Discourse             1    0    1           0         4     0     4           0
Draft protocol        2    1    1           0         4     2     2           0
Total               127   60   67          55       687   338   349         290

*proc. = processed
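The totals row of the size table can be cross-checked with a few lines of Python. This is only a sketch: the figures are copied from the table, while the names ROWS, REPORTED_TOTAL, column_sums and processed_share are illustrative assumptions and not part of the project tooling.

```python
# Figures copied from the size table. Column order (an assumption of this sketch):
# documents, EN docs, FR docs, FR docs processed,
# pages, EN pages, FR pages, FR pages processed
ROWS = {
    "Note": (89, 43, 46, 37, 395, 191, 204, 155),
    "Minutes": (30, 15, 15, 15, 256, 138, 118, 118),
    "Memorandum": (3, 1, 2, 2, 16, 7, 9, 9),
    "Study": (2, 0, 2, 1, 12, 0, 12, 8),
    "Discourse": (1, 0, 1, 0, 4, 0, 4, 0),
    "Draft protocol": (2, 1, 1, 0, 4, 2, 2, 0),
}
REPORTED_TOTAL = (127, 60, 67, 55, 687, 338, 349, 290)


def column_sums(rows):
    """Sum each column over all document categories."""
    return tuple(sum(column) for column in zip(*rows.values()))


def processed_share(total):
    """Fraction of French pages that were processed (last column / FR pages)."""
    return total[7] / total[6]


computed = column_sums(ROWS)
print("totals match:", computed == REPORTED_TOTAL)
print(f"French pages processed: {processed_share(computed):.1%}")
```

The same check confirms that, in every row, the English and French counts add up to the category total.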

Page 5: TEI Conference - CVCE

Overview of the WEU-DIPLO project: workflow


Page 6: TEI Conference - CVCE

Overview of the WEU-DIPLO project: page structure (header, content, footer). ©WEU-UEO

Page 7: TEI Conference - CVCE

Microsoft Word Styling – WEU-DIPLO

• Headers, footers
• Headings, line breaks, paragraphs

Page 8: TEI Conference - CVCE

Conversion and enrichment (XSLT, manual, NER)

OxGarage (DOCX to TEI P5)

oXygen XML Editor
• XSLT transformation (metadata, structure);
• manual enrichment (semantics – discourse of country/institutional representatives)

GATE (Named Entity Recognition)
• training phase (Gazetteer List Collector)
• annotation phase (names of persons, organisations, places, functions, events, products; dates)

oXygen XML Editor
• XSLT (GATE XML to TEI P5 transformation)
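The last step, turning named-entity annotations into TEI elements, was done with XSLT in the project. The sketch below shows the same idea in Python as a stand-in, not the project's stylesheet. It assumes a simplified inline annotation format in which each entity is wrapped in an element named after its type. The type names Person, Organization, Location and Date, the mapping table and the sample sentence are all assumptions for illustration. The targets persName, orgName, placeName and date are standard TEI P5 elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from annotation types to TEI P5 element names.
ANNOTATION_TO_TEI = {
    "Person": "persName",
    "Organization": "orgName",
    "Location": "placeName",
    "Date": "date",
}


def annotations_to_tei(xml_text):
    """Rename inline annotation elements to their TEI equivalents."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag in ANNOTATION_TO_TEI:
            elem.tag = ANNOTATION_TO_TEI[elem.tag]
            # Drop tool-specific attributes; a fuller version might map them.
            elem.attrib.clear()
    return ET.tostring(root, encoding="unicode")


# Invented sample input, not taken from the corpus.
sample = (
    "<p>Archives of the <Organization>Western European Union</Organization>, "
    "<Location>Luxembourg</Location>, <Date>1954</Date>.</p>"
)
print(annotations_to_tei(sample))
```

Annotation types listed on the slide but absent from the mapping (functions, events, products) would need their own TEI targets, which this sketch leaves open.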

Page 9: TEI Conference - CVCE

XML-TEI Encoding: WEU-DIPLO – metadata; layout (header). ©WEU-UEO

@@hAuthor

@@hArchNum

@@hStampConfid

@@hDocRef

@@hOrigDate

@@hOrigLang

@@hVersion
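The labels on this slide read like style names applied to the header fields of each page. Assuming they survive the DOCX to TEI conversion as rend values on paragraphs (an assumption, not something the slides confirm), the metadata could be collected as in the Python sketch below. The field names on the right of the mapping and the sample values are placeholders invented for illustration.

```python
import xml.etree.ElementTree as ET

# Style labels from the slide mapped to hypothetical metadata field names.
STYLE_TO_FIELD = {
    "@@hAuthor": "author",
    "@@hArchNum": "archive_number",
    "@@hStampConfid": "confidentiality_stamp",
    "@@hDocRef": "document_reference",
    "@@hOrigDate": "original_date",
    "@@hOrigLang": "original_language",
    "@@hVersion": "version",
}


def collect_header_metadata(xml_text):
    """Gather the text of paragraphs whose rend value is a known header style."""
    root = ET.fromstring(xml_text)
    metadata = {}
    for p in root.iter("p"):
        field = STYLE_TO_FIELD.get(p.get("rend"))
        if field is not None:
            metadata[field] = "".join(p.itertext()).strip()
    return metadata


# Placeholder input without the TEI namespace, to keep the sketch short.
sample = """<body>
  <p rend="@@hAuthor">W.E.U.</p>
  <p rend="@@hOrigLang">FR</p>
  <p rend="@@Paragraph">Body text.</p>
</body>"""
print(collect_header_metadata(sample))
```

A real TEI file declares the namespace http://www.tei-c.org/ns/1.0, so the element lookup would need the qualified name.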

Page 10: TEI Conference - CVCE

XML-TEI Encoding: WEU-DIPLO – Structure (headings, paragraphs, line breaks); semantics (named entities, discourse). ©WEU-UEO

@@Heading2

@@Paragraph

@@LineBreak

@@Names

@@Discourse

Page 11: TEI Conference - CVCE

XML-TEI Encoding: WEU-DIPLO – transcription features (Pierazzo, 2011)


Page 12: TEI Conference - CVCE

Part II. Transviewer, towards a TEI publication framework

Page 13: TEI Conference - CVCE

Context: The CVCE’s ePublications

• Treaties; official declarations and meeting reports; letters; notes; press articles; images, video and audio archives related to European integration history

Page 14: TEI Conference - CVCE

Transviewer overview

1. Transviewer concept:
• XML-TEI transformation/visualisation on the fly, in the browser
• flexible framework for the publication of XML-TEI documents in European integration history;

2. Technologies:
• XML, HTML, XSLT, CSS and JavaScript

3. Tested platforms:
• EVT (Edition Visualization Technology): http://sourceforge.net/projects/evt-project/
• KILN: http://kiln.readthedocs.org/en/latest/#
• TEIBoilerplate: http://dcl.ils.indiana.edu/teibp/
• Versioning Machine: http://v-machine.org/
• XTF (eXtensible Text Framework): http://xtf.cdlib.org/about/
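Transviewer performs the TEI-to-HTML transformation in the browser with XSLT. As a rough illustration of what such a transformation does, here is a Python stand-in, not the Transviewer code. The tag mapping, the tei- class prefix and the sample fragment are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET
from html import escape

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical mapping of TEI block elements to HTML tags; anything else
# becomes a <span> that CSS can style through its class name.
BLOCKS = {"div": "div", "head": "h2", "p": "p"}


def render(elem):
    """Recursively turn a TEI element into an HTML string."""
    name = elem.tag.replace(TEI_NS, "")
    if name == "lb":
        html = "<br/>"
    else:
        inner = escape(elem.text or "") + "".join(render(child) for child in elem)
        tag = BLOCKS.get(name, "span")
        html = f'<{tag} class="tei-{name}">{inner}</{tag}>'
    # Text following the element belongs to the parent's content.
    return html + escape(elem.tail or "")


# Invented sample fragment.
sample = (
    '<div xmlns="http://www.tei-c.org/ns/1.0"><head>Note</head>'
    "<p>First line<lb/>second line, <orgName>W.E.U.</orgName></p></div>"
)
print(render(ET.fromstring(sample)))
```

Because the HTML is generated from the XML at display time, a change to the TEI source shows up in the publication without an intermediate export step.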

Page 15: TEI Conference - CVCE

Transviewer prototype

Implementation (adaptation and in-house development):
• side-by-side view digital facsimile and transcription (EVT model)
• third-party libraries:
o BookReader: tool designed to provide online access to scanned books
o Saxon-CE: support for XSLT 2.0 transformation in the browser
• in-house development (configuration, frames and buttons layout/actions, transcription rendering, third-party libraries calls)
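A side-by-side view needs to know which stretch of transcription belongs to which page image. One common way to record this in TEI is a pb (page break) element with n and facs attributes. Whether Transviewer relies on exactly this encoding is an assumption here. The Python sketch below splits a transcription at each pb and pairs the text with its image file. The file names and sample text are invented.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def pages_with_facsimiles(tei_text):
    """Split the transcription at each <pb/> and pair the text with its image."""
    root = ET.fromstring(tei_text)
    pages = []

    def add_text(text):
        # Text before the first <pb/> has no page to belong to and is skipped.
        if text and pages:
            pages[-1]["text"] += text

    def walk(elem):
        if elem.tag == TEI_NS + "pb":
            pages.append({"n": elem.get("n"), "facs": elem.get("facs"), "text": ""})
        else:
            add_text(elem.text)
            for child in elem:
                walk(child)
        add_text(elem.tail)

    walk(root)
    for page in pages:
        page["text"] = " ".join(page["text"].split())  # normalise whitespace
    return pages


# Invented sample with hypothetical image file names (one file per page).
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<pb n="1" facs="page-001.jpg"/><p>First page text.</p>
<pb n="2" facs="page-002.jpg"/><p>Second page <hi>text</hi>.</p>
</body></text></TEI>"""

for page in pages_with_facsimiles(sample):
    print(page["n"], page["facs"], page["text"])
```

With such a list, turning a page in the facsimile panel only requires looking up the entry with the same page number for the transcription panel.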

Page 16: TEI Conference - CVCE

Transviewer experiments – digital facsimile/transcription side-by-side view. ©WEU-UEO

Page 17: TEI Conference - CVCE

Transviewer experiments – digital facsimile/transcription side-by-side view. Werner – handwritten notes

Page 18: TEI Conference - CVCE

Transviewer experiments (simulation) – video/audio and transcription synchronisation. Werner – interviews
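The slide names the synchronisation only as a simulation. One simple way such a feature could work is to look up, for the current playback time, the transcript segment that started most recently. The Python sketch below assumes segments sorted by start time. The segment data and the function name segment_at are hypothetical.

```python
from bisect import bisect_right

# Hypothetical transcript segments: (start time in seconds, text),
# sorted by start time.
SEGMENTS = [
    (0.0, "Opening question."),
    (12.5, "First answer."),
    (47.0, "Follow-up question."),
]


def segment_at(segments, playback_time):
    """Return the index of the segment active at playback_time, or None."""
    starts = [start for start, _ in segments]
    index = bisect_right(starts, playback_time) - 1
    return index if index >= 0 else None


current = segment_at(SEGMENTS, 20.0)
print(SEGMENTS[current][1])
```

A player would call this on each time update and highlight the returned segment in the transcription panel.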

Page 19: TEI Conference - CVCE

Transviewer features – panels layouts


Page 20: TEI Conference - CVCE

Transviewer features – transcription format


Page 21: TEI Conference - CVCE

Transviewer features – panels interlinking


Page 22: TEI Conference - CVCE

Part III. Discussion

Page 23: TEI Conference - CVCE

Discussion

“By teaching an edition how to swim, I mean endowing an edition not only with a store of factual knowledge concerning the work presented, but also with the capability of dealing gracefully with the mutability of the electronic medium, by exploiting the possibilities for reader-controlled changes to the edition’s presentation and by adapting successfully to rapid changes in the hardware and software environment.” (Sperberg-McQueen, 2009)

1. Transviewer prototype questions:
• flexible enough to support different types of documents in European integration history and different user requirements;
• modular architecture to allow gradual development and customisation according to the needs of the projects;
• balance manual interventions/automatic processing (XSLT, NER);
• XML transformation on the fly (no need for intermediary formats/steps, changes to the XML already part of the publication).

Page 24: TEI Conference - CVCE

Discussion

3. Issues:
• BookReader – use of an older version of jQuery library;
• non-uniform support of Saxon-CE for XSLT 2.0 transformation in the browsers;
• need for batch conversion to XML-TEI (potential adaptation of OxGarage for batch processing).

4. Ongoing/future work for further development:
• evaluation (technology – technical experts; usability tests – experts in European integration studies);
• development of new modules (multi-panels, audio/video transcription, etc.) and tests with more project samples;
• integration into the existing CVCE’s Website architecture:
o Back End;
o Front End.

Page 25: TEI Conference - CVCE

Discussion

Scaling in a publication framework would imply not only teaching your editions “how to swim” but also how to swim together.

Thank you!

Page 26: TEI Conference - CVCE

References

• BookReader: https://openlibrary.org/dev/docs/bookreader
• EVT (Edition Visualization Technology): http://sourceforge.net/projects/evt-project/
• GATE: https://gate.ac.uk/
• KILN: http://kiln.readthedocs.org/en/latest/#
• OxGarage: http://www.tei-c.org/oxgarage/
• Pierazzo, Elena. 2011. “A rationale of digital documentary editions”. In LLC. The Journal of Digital Scholarship in the Humanities, Vol. 26, No. 4, December 2011, pp. 463-477.
• http://www.scholarlyediting.org/2014/essays/essay.pierazzo.html.
• Saxon-CE: http://www.saxonica.com/ce/user-doc/1.1/index.html
• Sperberg-McQueen, C.M. 2009. “How to teach your edition how to swim”. In LLC. The Journal of Digital Scholarship in the Humanities, Vol. 24, No. 1, April 2009. Oxford Journals.
• TEI (Text Encoding Initiative): http://www.tei-c.org
• TEIBoilerplate: http://dcl.ils.indiana.edu/teibp/
• Versioning Machine: http://v-machine.org/
• XTF (eXtensible Text Framework): http://xtf.cdlib.org/about/