Multilingual & Semantic Interoperability in Cultural Heritage Information Systems
Vivien Petras
Berlin School of Library and Information Science
12 March 2013 W3C Multilingual Web Workshop
The Europeana Use Case
Contents
• Europeana: Multilingual Collections & Users
• Multilingual Interoperability
• Semantic Enrichment
• Preview: New Enrichment Plans
• Playing with Europeana Data
Image: http://www.europeana.eu/portal/record/08535/D53FE7B7621E65A5E01E16E3D72785C68F2E2059.html
Europeana
• 15.2 million images
• 10 million texts
• 450,000 sound files
• 170,000 video files
• More than 2,200 institutions
• More than 30 countries
Europeana Multilingual Collections
German 18%
Multilingual 12%
French 11%
Dutch 10%
Swedish 9%
Spanish 8%
English 7%
Norwegian 6%
Polish 6%
Italian 6%
Finnish 3%
Danish 2%
Hungarian 1%
Slovenian 1%
→ Most Europeana objects are language-independent (e.g. images), but the metadata is multilingual.
Multilingual Europeana Users
• Native language browser: 69%
• Native language Google (entry point): 91%
• Native language objects: 43% (SV 77%, DE 71%)
→ Native language use increases as soon as native language content increases.
Gäde, Maria (forthcoming). “User Behavior through the Language Glass” – Language-specific Behavior in Multilingual Digital Libraries.
Image: http://www.europeana.eu/resolve/record/9200105/AF5C65B3CC6A71CC0E4FF6FE5AAEB4CDAA1873C9
Semantic Enrichment
• concept (GEMET Thesaurus), agent (DBpedia), period (Semium time ontology), place (GeoNames)
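The pairing of entity types with target vocabularies on this slide can be sketched as a simple routing table. This is a minimal illustration, not Europeana's actual enrichment pipeline: the field-to-entity-type mapping (`FIELD_TO_ENTITY`), the function name `plan_enrichment`, and the sample record are assumptions made for the example. Only the entity-type-to-vocabulary pairs come from the slide.

```python
# Entity type -> target vocabulary, as listed on the slide.
VOCABULARY_FOR = {
    "concept": "GEMET Thesaurus",
    "agent": "DBpedia",
    "period": "Semium time ontology",
    "place": "GeoNames",
}

# Assumption for illustration: which metadata field carries which entity type.
FIELD_TO_ENTITY = {
    "dc:subject": "concept",
    "dc:creator": "agent",
    "dc:date": "period",
    "dc:coverage": "place",
}


def plan_enrichment(record):
    """Return (field, value, vocabulary) triples for every enrichable value."""
    plan = []
    for field, values in record.items():
        entity = FIELD_TO_ENTITY.get(field)
        if entity is None:
            continue  # this field is not enriched in the sketch
        for value in values:
            plan.append((field, value, VOCABULARY_FOR[entity]))
    return plan


# Hypothetical record, invented for the example.
record = {
    "dc:title": ["View of Córdoba"],
    "dc:subject": ["print"],
    "dc:coverage": ["Córdoba"],
}
plan = plan_enrichment(record)
```

The sketch only decides which vocabulary a value should be matched against. The matching itself is where the challenges on the next slide arise.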
Enrichment Challenges
• Metadata quality & sparsity
• Vocabulary ambiguity
– domain: GEMET "print" → (German) "Druck" → "pressure"
– language: "electrical power" → (German) "Strom" → (Czech) "strom" = "tree"
– context: Córdoba = Spain | Argentina
Olensky, M., Stiller, J., Dröge, E. (2012). Poisonous India or the Importance of a Semantic and Multilingual Enrichment Strategy. In: Proc. of MTSR 2012: Metadata and Semantics Research Conference, Nov. 2012, Cádiz, Spain.
Image: http://www.europeana.eu/portal/record/03919/FCD38BDE7A03579F24BEDA5D157943B75BB36F11.html
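The ambiguity cases above can be reproduced with a toy label index. The following is a minimal sketch, assuming enrichment works by matching metadata strings against vocabulary labels. The index entries are taken from the slide's examples, while the language codes, the function name `candidates`, and the matching strategy are illustrative assumptions.

```python
from collections import defaultdict

# Toy label index built from the slide's examples: (label, language, meaning).
# Real vocabularies are far larger; these entries are illustrative only.
LABELS = [
    ("Druck", "de", "print"),
    ("Druck", "de", "pressure"),
    ("Strom", "de", "electrical power"),
    ("strom", "cs", "tree"),
    ("Córdoba", "es", "place in Spain"),
    ("Córdoba", "es", "place in Argentina"),
]

# Case-insensitive lookup table: lowercased label -> [(language, meaning), ...]
index = defaultdict(list)
for label, lang, meaning in LABELS:
    index[label.lower()].append((lang, meaning))


def candidates(term, lang=None):
    """Return meanings whose label matches `term`, optionally filtered by language."""
    hits = index.get(term.lower(), [])
    if lang is not None:
        hits = [hit for hit in hits if hit[0] == lang]
    return [meaning for _, meaning in hits]


no_language = candidates("Strom")         # German and Czech readings collide
czech_only = candidates("Strom", "cs")    # the language filter keeps one reading
domain_clash = candidates("Druck", "de")  # language alone cannot resolve this
```

Knowing the metadata language resolves the "Strom" case, but the "Druck" and "Córdoba" cases still return two candidates each, which is why domain and context information are needed as well.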
Preview: New Enrichment Plans
→ transition to linked data-based Europeana Data Model (EDM)
• links to contextual vocabularies from providers
• enrich during ingestion
Playing with Europeana Data
• CHiC: Cultural Heritage in CLEF
→ Europeana data (XML) & queries / 13 languages
→ ad-hoc retrieval / semantic enrichment tasks
→ Submission deadline: 14 April 2013
→ http://www.culturalheritageevaluation.org
• Europeana Linked Open Data
→ RDF file dumps in EDM (Europeana Data Model)
→ SPARQL endpoint
→ CC0 open license
→ http://data.europeana.eu/
• Contact: vivien.petras@ibi.hu-berlin.de
Image: http://www.europeana.eu/resolve/record/03486/DF559A7721E55BAE5BF5095FB9AA55406C0269C4
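For readers who want to try the SPARQL access mentioned above, a query for titles in a given language might be assembled as follows. This is a sketch under stated assumptions: the slides do not give the endpoint address, so `ENDPOINT` is a placeholder, and it is assumed (not confirmed by the source) that titles are stored as `dc:title` literals carrying language tags. The helper names `titles_query` and `request_url` are hypothetical.

```python
from urllib.parse import urlencode

# Placeholder only: the slides do not give the endpoint address.
ENDPOINT = "http://example.org/sparql"


def titles_query(language, limit=10):
    """Build a SPARQL query for resources with a dc:title tagged `language`."""
    return (
        "PREFIX dc: <http://purl.org/dc/elements/1.1/>\n"
        "SELECT ?resource ?title WHERE {\n"
        "  ?resource dc:title ?title .\n"
        f'  FILTER (lang(?title) = "{language}")\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def request_url(endpoint, query):
    """Encode the query as a GET request URL (SPARQL protocol `query` parameter)."""
    return endpoint + "?" + urlencode({"query": query})


query = titles_query("de", limit=5)
url = request_url(ENDPOINT, query)
```

The sketch only builds the request URL and sends nothing over the network. The resulting URL could be fetched with any HTTP client once the real endpoint address is substituted.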