Special applications for Digital Libraries: computer-aided philological and linguistic analysis of digital documents Istituto di Linguistica Computazionale – Pisa Andrea Bozzi NEH/CNR Meeting Washington DC October 5, 2007

Page 1: Special applications for Digital Libraries: computer-aided philological and linguistic analysis of digital documents Istituto di Linguistica Computazionale.

Special applications for Digital Libraries: computer-aided philological and linguistic analysis of digital documents

Istituto di Linguistica Computazionale – Pisa

Andrea Bozzi

NEH/CNR Meeting, Washington DC, October 5, 2007

Page 2:

Presentation contents

1. An EU-supported system for Greek papyrology;

2. A special application for browsing and searching demotic documents on ostraka;

3. A philological workstation for digital medieval manuscripts;

4. CHLT-LEMLAT (EC-NSF project) to perform lemmatization of Latin texts;

5. How to integrate all these modules in a web-based open source application.

Page 3:

Presentation contents

1. An EU-supported system for Greek papyrology;

2. A special application for browsing and searching demotic documents on ostraka;

3. A philological workstation for digital medieval manuscripts;

4. CHLT-LEMLAT (EC-NSF project) to perform lemmatization of Latin texts;

5. How to integrate all these modules in a web-based open source application.

Page 4:

The philological workstation: image and text transcription

Page 5:

Image segmentation and semi-automatic word linking

Page 6:

Annotations and critical apparatus

Page 7:

Wordforms list and specific indexes
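
A wordform list with an index of occurrences can be pictured with a short sketch. The Python below is only an illustration under simple assumptions (regex tokenization, lowercase folding). The function name `build_wordform_index` is hypothetical, and the slides do not describe how the workstation actually builds its lists and indexes. The sample text is the opening of Sallust's De coniuratione Catilinae, with punctuation omitted.

```python
import re
from collections import defaultdict


def build_wordform_index(lines):
    """Return {wordform: [(line_number, word_position), ...]}, sorted by wordform.

    Assumption: a wordform is any run of word characters, compared in lowercase.
    """
    index = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        # Number the words within each line starting from 1.
        for pos, match in enumerate(re.finditer(r"\w+", line), start=1):
            index[match.group().lower()].append((line_no, pos))
    # Sort alphabetically so the result reads like a printed wordform list.
    return dict(sorted(index.items()))


if __name__ == "__main__":
    sample = ["Omnis homines qui sese student", "praestare ceteris animalibus"]
    for wordform, places in build_wordform_index(sample).items():
        print(wordform, places)
```

Each entry pairs a wordform with the line and word position of every occurrence, which is the minimum needed to jump from an index entry back to the transcription.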

Page 8:

The web philological workstation to manage documents of the Istituto Papirologico Vitelli in Florence (restricted use)

Page 9:

Presentation contents

1. An EU-supported system for Greek papyrology;

2. A special application for browsing and searching demotic documents on ostraka;

3. A philological workstation for digital medieval manuscripts;

4. CHLT-LEMLAT (EC-NSF project) to perform lemmatization of Latin texts;

5. How to integrate all these modules in a web-based open source application.

Page 10:

OMM 1381: E. Bresciani, S. Pernigotti, M.C. Betrò, Ostraka demotici da Narmuti, Pisa, 1983, pp. 16-18;

OMM 300: P. Gallo, Ostraca demotici e ieratici dall’archivio bilingue di Narmouthis, Pisa, 1997, pp. 113-114;

OMM 393: R. Pintaudi, P.J. Sijpesteijn, Ostraka greci da Narmuthis, Pisa, 1993, p. 40.

Special system for teaching and retrieving linguistic information from demotic texts on ostraka

Page 11:

The digital image archive and the table of demotic signs

Page 12:

Search results: see the parts highlighted in blue (arrow), where the selected symbol has been found

Page 13:

Presentation contents

1. An EU-supported system for Greek papyrology;

2. A special application for browsing and searching demotic documents on ostraka;

3. A philological workstation for digital medieval manuscripts;

4. CHLT-LEMLAT (EC-NSF project) to perform lemmatization of Latin texts;

5. How to integrate all these modules in a web-based open source application.

Page 14:

Textual criticism for medieval manuscripts

Link to the list of collated sources

Page 15:

Selection of the variant eixens

Evaluation of the variant reading in the collated source

Page 16:

Recording of the variant Eixens in the critical apparatus

Page 17:

Variants search in different ancient printed editions of the same work

Link to the list of collated books

Page 18:

Image of the corresponding page

Page 19:

Presentation contents

1. An EU-supported system for Greek papyrology;

2. A special application for browsing and searching demotic documents on ostraka;

3. A philological workstation for digital medieval manuscripts;

4. CHLT-LEMLAT (EC-NSF project) to perform lemmatization of Latin texts;

5. How to integrate all these modules in a web-based open source application.

Page 20:

Lemmatization results (C. Sallustius Crispus, De coniuratione Catilinae, 1-2)

Page 21:

Lemmatization results of selected wordforms

Page 22:

Presentation contents

1. An EU-supported system for Greek papyrology;

2. A special application for browsing and searching demotic documents on ostraka;

3. A philological workstation for digital medieval manuscripts;

4. CHLT-LEMLAT (EC-NSF project) to perform lemmatization of Latin texts;

5. How to integrate all these modules in a web-based open source application.

Page 23:

Pinakes 3.0

http://pinakes.imss.fi.it

• Aim: web-based open source application to manage cultural heritage historical data in digital format.

• Partners:

– Fondazione Rinascimento Digitale, Florence;

– Istituto e Museo della Storia della Scienza, Florence;

– Ministero per i Beni Culturali, Rome;

– CNR, Istituto di Linguistica Computazionale, Pisa.

Page 24:

Technology

– Programming language: Java (JDK 1.5).

– Servlet engine: Tomcat 5.5.x + Apache HTTP Connectors.

– Web server: Apache httpd server 2.2.x.

– Web application framework: Jakarta Struts.

– Web service framework: Apache Axis 1.4.

– Database engine: Postgres 8.1.

– Programming environment: NetBeans 5.5.1.

– Final development: Hibernate 3.2.5.

Page 25:

Standards

• DCMI (Dublin Core Metadata Initiative)

• TEI (Text Encoding Initiative)

• OWL (Web Ontology Language)

• RDF/XML (Resource Description Framework, XML syntax)

• SPARQL (query language for RDF)

• UTF-8 (Unicode Transformation Format).