1

The Digital du Cange:

Moldy Old Tomes Make an Internet Comeback

Andrew Gollan and Ross Scaife

Modern and Classical Languages, Literatures, and Cultures

February 23, 2005

2

How do we move this 17th century resource into the 21st century?

3

Major Periods of Latin Literature

Archaic: down to c. 100 BCE
Classical: c. 100 BCE to c. 200 CE
Late Antique: c. 200 CE to 500 CE
Medieval: c. 500 CE to c. 1400 CE
Renaissance (Humanistic): c. 1400 CE to c. 1700
Neolatin: c. 1700 CE to present

4

Some Lexica for Archaic and Classical Latin

Thesaurus linguae latinae (TLL). 1900+
Forcellini. Totius latinitatis lexicon. 1755-1887
Lewis and Short. 1879
Oxford Latin Dictionary. 1968-1982

5

Some Lexica for Late Antique and Medieval Latin

Souter: A glossary of Later Latin to 600 A.D. 1964
C. du Fresne, seigneur Du Cange. Glossarium ad scriptores mediae et infimae latinitatis, 10 vols. 1678-1887
J.F. Niermeyer. Mediae latinitatis lexicon minus.... 1964-76

6

Some Lexica for Humanistic and Neo-Latin

Egger (Vatican): Lexicon recentis Latinitatis: a dictionary of contemporary Latin. 1992
Hoven. Lexique de la prose latine de la renaissance. 1994

7

Problems with the status quo

Physical access extremely limited: rare and expensive resources
Thesaurus Linguae Latinae (TLL)
    moving at a snail’s pace
    inadequate application of computers
Copyright restrictions vs. open access
Desirability of corpus-based approach

8

Latin on line so far

What about all the databases Young Library subscribes to?
David Packard’s PHI: more promising
Perseus Latin corpus: available but limited
Lewis and Short
    Available in TEI-XML from Perseus
    Limited electronic extensions in place

9

Our goals and principles

Incremental approach adding value at each level
Push the envelope technically in how one goes about lexicography
Network effect: leverage the distributed community
Open Access licensing of all data (Creative Commons)

10

Where we are now: Stage I

Indexed page images (scans, OCR, and CGI)
Complete transcriptions with simple markup
Gradually more elaborate markup of original lexica
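
A transcription with simple markup might look something like the output of the sketch below. The slides do not say which markup scheme Stage I uses; the element names here (`entry`, `form`, `orth`, `sense`, `def`) are borrowed from the TEI dictionaries module as an assumption, and `mark_up` and its sample arguments are hypothetical.

```python
import xml.etree.ElementTree as ET


def mark_up(headword, gloss):
    """Wrap one transcribed entry in minimal TEI-style markup (assumed scheme)."""
    entry = ET.Element("entry")
    form = ET.SubElement(entry, "form")
    ET.SubElement(form, "orth").text = headword   # the lemma as printed
    sense = ET.SubElement(entry, "sense")
    ET.SubElement(sense, "def").text = gloss      # the definition text
    return entry


# Placeholder strings stand in for a real transcribed entry.
xml_text = ET.tostring(mark_up("headword", "gloss text"), encoding="unicode")
print(xml_text)
```

More elaborate markup (citations, cross-references, etymologies) could then be layered onto the same entries without retranscribing them.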

11

What we want to do: Stage II

Merge the parallel lemmata in the separate lexica: this is the hard part!
    Concatenate lemmata
    Map common citations
    Apply WordNet or similar approach to group semantically related definitions
Create interface that allows humans to clean up the resulting mess
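
The first two merging steps could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Entry` record, the `normalize` spelling rule (lowercase, j to i, v to u), and the placeholder sample data are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One dictionary entry (hypothetical record layout)."""
    source: str                      # which lexicon the entry comes from
    lemma: str                       # headword as printed
    definition: str
    citations: set = field(default_factory=set)


def normalize(lemma):
    """Fold spelling variants so parallel lemmata share a key (assumed rule)."""
    return lemma.lower().replace("j", "i").replace("v", "u")


def merge(entries):
    """Concatenate entries from separate lexica under one normalized lemma."""
    merged = defaultdict(list)
    for e in entries:
        merged[normalize(e.lemma)].append(e)
    return dict(merged)


def common_citations(group):
    """Citations shared by every entry in a merged group."""
    sets = [e.citations for e in group]
    return set.intersection(*sets) if sets else set()


# Placeholder data: definitions and citation labels are invented stand-ins.
entries = [
    Entry("Lexicon A", "Juvenis", "definition from A", {"cit-1"}),
    Entry("Lexicon B", "iuuenis", "definition from B", {"cit-1", "cit-2"}),
    Entry("Lexicon A", "liber", "definition from A", {"cit-3"}),
]
groups = merge(entries)
shared = common_citations(groups["iuuenis"])
```

Grouping semantically related definitions is not attempted here; an exact-match key like this will also produce false merges (homographs) and missed ones, which is where the human clean-up interface comes in.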

12

What we want to do: Stage III

Implement methods for collaboration on extending this resource
Multiple perspectives: from a single source, generate thesauri, lexica for particular authors or genres or locales or periods
Die happy.
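
Generating a period-specific lexicon from a single source could be sketched as below. The period boundaries follow the approximate dates on the "Major Periods of Latin Literature" slide; the `(lemma, year)` pair layout, the function names, and the sample data are hypothetical.

```python
# Each tuple is (period name, first year of the period); negative years are BCE.
# Boundaries are the approximate ("c.") dates from the Major Periods slide.
PERIODS = [
    ("Archaic", float("-inf")),
    ("Classical", -100),
    ("Late Antique", 200),
    ("Medieval", 500),
    ("Renaissance", 1400),
    ("Neolatin", 1700),
]


def period_of(year):
    """Return the name of the latest period that starts at or before `year`."""
    name = PERIODS[0][0]
    for candidate, start in PERIODS:
        if year >= start:
            name = candidate
    return name


def lexicon_for(entries, period):
    """Select the lemmata attested in one period from the shared source."""
    return sorted(lemma for lemma, year in entries if period_of(year) == period)


# Placeholder entries: (lemma, year of an attestation), invented for illustration.
entries = [("lemma-a", -50), ("lemma-b", 850), ("lemma-c", 1200), ("lemma-d", 1650)]
medieval = lexicon_for(entries, "Medieval")
```

The same filter pattern would extend to authors, genres, or locales once entries carry those attributes.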

13

UK: a good locale for this work

Strong interest in post-Classical Latin among faculty and graduate students
History of projects in humanities computing related to classical languages and cultures

14

A propitious moment…

Computational humanists standardizing on XML (mostly TEI) markup of texts
Powerful tools emerging for working with XML, e.g. editors and XML-aware databases
Also: important recent advances in computational linguistics (machine learning systems)
OCR getting much better

15

Challenges of networked scholarship

Experts needed to
    improve merging of our electronic lexica
    extend them with new information
Credit and recognition based on tracking and evaluation of incremental contributions to scholarship
Emerging forms of evaluation