Multilingual & Semantic Interoperability in Cultural Heritage Information Systems

Vivien Petras

Berlin School of Library and Information Science

12 March 2013, W3C Multilingual Web Workshop

The Europeana Use Case

Contents

•  Europeana: Multilingual Collections & Users
•  Multilingual Interoperability
•  Semantic Enrichment
•  Preview: New Enrichment Plans
•  Playing with Europeana Data

Image: http://www.europeana.eu/portal/record/08535/D53FE7B7621E65A5E01E16E3D72785C68F2E2059.html

Europeana

•  15.2 million images
•  10 million texts
•  450,000 sound files
•  170,000 video files

> 2,200 institutions
> 30 countries

Europeana Multilingual Collections

German 18%
Multilingual 12%
French 11%
Dutch 10%
Swedish 9%
Spanish 8%
English 7%
Norwegian 6%
Polish 6%
Italian 6%
Finnish 3%
Danish 2%
Hungarian 1%
Slovenian 1%

→ Most Europeana objects are language-independent (e.g. images), but the metadata is multilingual.

Multilingual Europeana Users

•  Native language browser: 69%
•  Native language Google (entry point): 91%
•  Native language objects: 43% (SV 77%, DE 71%)

→ Native language use increases as soon as native language content increases.

Gäde, Maria (forthcoming). “User Behavior through the Language Glass” – Language-specific Behavior in Multilingual Digital Libraries.

Image: http://www.europeana.eu/resolve/record/9200105/AF5C65B3CC6A71CC0E4FF6FE5AAEB4CDAA1873C9

Multilingual Interface in 31 Languages

•  users seem to assume that the chosen interface language also affects search

Query Result Filtering by Language

•  language of record vs. language of content
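
The distinction between the two languages can be sketched in a few lines of Python. This is only an illustration, not Europeana's implementation: the `Record` class, its field names and the sample records are hypothetical, and the sketch assumes that each record carries one language code for its metadata and, where applicable, one for the object's content.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    """Hypothetical record: one language for the metadata, one for the content."""
    title: str
    record_language: str             # language of the metadata description
    content_language: Optional[str]  # language of the object; None for e.g. images


def filter_by_language(records: List[Record], lang: str, field: str = "record") -> List[Record]:
    """Keep records whose metadata ('record') or object ('content') language is `lang`."""
    if field not in ("record", "content"):
        raise ValueError("field must be 'record' or 'content'")
    attr = "record_language" if field == "record" else "content_language"
    return [r for r in records if getattr(r, attr) == lang]


# Invented sample data: three records described in German.
records = [
    Record("Stadtansicht (Fotografie)", "de", None),   # image, no content language
    Record("Französischer Brief (Handschrift)", "de", "fr"),
    Record("Brief an einen Freund", "de", "de"),
]

by_record = filter_by_language(records, "de", field="record")
by_content = filter_by_language(records, "de", field="content")
print(len(by_record), len(by_content))  # 3 1
```

Filtering on the record language returns all three items, while filtering on the content language returns only the German-language letter. This is why the two notions need to be kept apart in a result filter.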

Document Translation

•  general MT – not domain-specific

Query Translation – Planned for 2013

•  How many languages?
•  How much user interaction?

•  enrichment with four entity types: concept (GEMET Thesaurus), agent (DBpedia), period (Semium time ontology), place (GeoNames)

Semantic Enrichment

Poisonous India…

Enrichment Challenges

•  Metadata quality & sparsity

•  Vocabulary ambiguity

–  domain (GEMET): Druck (German) = print | pressure

–  language: Strom (German) = electrical power | strom (Czech) = tree

–  context: Córdoba = Spain | Argentina

Olensky, M., Stiller, J., Dröge, E. (2012). Poisonous India or the Importance of a Semantic and Multilingual Enrichment Strategy. In: Proc. of MTSR 2012: Metadata and Semantics Research Conference, Nov. 2012, Cádiz, Spain.
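
The language and domain ambiguities listed above can be reproduced with a toy label lookup. The sketch below is an assumption-laden illustration, not the Europeana enrichment pipeline: `VOCAB` is a hypothetical three-entry vocabulary built from the slide's own examples, and both matching functions are invented for this purpose.

```python
# Hypothetical mini-vocabulary: (lowercased label, language) -> concept.
# Entries are taken from the examples on the slide, not from real GEMET data.
VOCAB = {
    ("strom", "de"): "electrical power",
    ("strom", "cs"): "tree",
    ("druck", "de"): "pressure",  # the only sense this toy vocabulary offers
}


def match_ignoring_language(term):
    """Naive matching: compare labels only, ignoring the language of the record."""
    key = term.casefold()
    return sorted(concept for (label, _), concept in VOCAB.items() if label == key)


def match_with_language(term, lang):
    """Language-aware matching: the label must carry the record's language."""
    return VOCAB.get((term.casefold(), lang))


print(match_ignoring_language("Strom"))    # ['electrical power', 'tree']
print(match_with_language("strom", "cs"))  # tree
print(match_with_language("Druck", "de"))  # pressure
```

Restricting the match to the record's language resolves the Strom/strom case. It does not help with the domain case: a German record about a print ("Druck") is still linked to "pressure", because the vocabulary only covers that sense.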

Image: http://www.europeana.eu/portal/record/03919/FCD38BDE7A03579F24BEDA5D157943B75BB36F11.html

Preview: New Enrichment Plans

→ transition to linked data-based Europeana Data Model (EDM)

•  links to contextual vocabularies from providers
•  enrich during ingestion

Playing with Europeana Data

•  CHiC: Cultural Heritage in CLEF
   → Europeana data (XML) & queries / 13 languages
   → ad-hoc retrieval / semantic enrichment tasks
   → Submission deadline: 14 April 2013
   → http://www.culturalheritageevaluation.org

•  Europeana Linked Open Data
   → RDF file dumps in EDM (Europeana Data Model)
   → SPARQL endpoint
   → CC0 open license
   → http://data.europeana.eu/

•  Contact: [email protected]
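
As a starting point for experimenting with the SPARQL endpoint, the sketch below builds a query for titles in a given language and the matching request URL. Several things here are assumptions rather than facts from the slides: the endpoint URL is a placeholder (the slides give only http://data.europeana.eu/), the function names are invented, and the query assumes that titles are attached to `ore:Proxy` resources as language-tagged `dc:title` literals. No request is sent.

```python
from urllib.parse import urlencode

# Standard namespace URIs for Dublin Core elements and OAI-ORE terms.
PREFIXES = (
    "PREFIX dc: <http://purl.org/dc/elements/1.1/>\n"
    "PREFIX ore: <http://www.openarchives.org/ore/terms/>\n"
)


def build_title_query(lang, limit=10):
    """Build a SPARQL SELECT for titles tagged with the language code `lang`."""
    if not lang.isalpha():
        raise ValueError("lang must be a plain language code such as 'de'")
    return (
        PREFIXES
        + "SELECT ?proxy ?title WHERE {\n"
        + "  ?proxy a ore:Proxy ;\n"
        + "         dc:title ?title .\n"
        + f'  FILTER (lang(?title) = "{lang}")\n'
        + "}\n"
        + f"LIMIT {int(limit)}"
    )


def build_request_url(endpoint, query):
    """Encode the query as a GET request in the style of the SPARQL protocol."""
    return endpoint + "?" + urlencode({"query": query})


# Placeholder endpoint (assumption); replace it with the real SPARQL endpoint URL.
ENDPOINT = "http://example.org/sparql"

query = build_title_query("de", limit=5)
url = build_request_url(ENDPOINT, query)
print(query)
print(url)
```

The resulting URL can be passed to any HTTP client. Whether the language filter returns results depends on how consistently the provider metadata carries language tags.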

Image: http://www.europeana.eu/resolve/record/03486/DF559A7721E55BAE5BF5095FB9AA55406C0269C4