Building an ecosystem of networked references
-
Upload
hugo-manguinhas -
Category
Technology
-
view
263 -
download
0
Transcript of Building an ecosystem of networked references
Building an ecosystem of networked referencesHugo Manguinhas | DBpedia Community Meeting 2016
Europeana has many data challenges
Building an ecosystem of networked referencesCC BY-SA
We aggregate very heterogeneous metadata:
• More than 48M objects
• 3,500 galleries, libraries, archives and museums
• 50 languages
• From all EU countries
• Level of quality varies greatly• Huge amount of references to places, agents,
concepts, time
Europeana Linked Data StrategyMotivation
Building an ecosystem of networked referencesCC BY-SA
• Improve user experience• support better ways of searching and navigating
through the collections, eliminating ambiguity and clarifying the meaning of descriptions
• better adapt to the language of the user
• by improving the interlinking of data• brings more context to the objects
• alleviates polysemy issues
• better language coverage
• Contributes to build a web of data ('knowledge graph') that third parties can use to improve their users' experience
Europeana Linked Data StrategyOur efforts and lines of work
Building an ecosystem of networked referencesCC BY-SA
• Europeana Data Model (EDM) offers a base for linking data
• We apply an enrichment strategy to link source data to reference data, incl. DBpedia
• Encourage data providers to contribute their own vocabularies so that we can benefit from data links made at data providers’ level
• We encourage alignment activities between domain vocabularies
Europeana Linked Data StrategyVocabularies currently provided to Europeana
CC BY-SA
Building an ecosystem of networked references
Europeana Linked Data StrategyEuropeana also hosts vocabularies
CC BY-SA
Building an ecosystem of networked references
Europeana Linked Data StrategyA strategy for Entities
Building an ecosystem of networked referencesCC BY-SA
• As a cornerstone for our strategy we are building an "Entity Collection"• A service that acts as a centralized point of reference and
access to data about contextual entities• Caching and curating data from the wider Linked Open Data
cloud• A sort of Europeana "knowledge graph"
The Entity CollectionUse Cases
CC BY-SA
Building an ecosystem of networked references
Europeana Collections Portal
● Findability: users can look for entities, not only records (Entity-Based Search)
● Understandability: Entity Pages group and present all assertions about an entity
● Exploration: Navigation along relationships becomes possible
Crowdsourcing
● Objects can be annotated with references to entities
● A controlled vocabulary for client applications
Enrichment of Provider’s Data
● A controlled vocabulary to help identify named references to entities
Republication for Re-use
● Entities can be republished as an open source to the community
Entity Collection
The Entity CollectionWhy DBpedia?
CC BY-SA
Building an ecosystem of networked references
• It offers labels in about 124 languages through all its language editions of which 48 match the languages that Europeana supports
• It gives fairly complete and accurate descriptive metadata about entities
• Works great as a “pivot” vocabulary, providing further links to other vocabularies such as Wikidata and Freebase
Entity Collection
The Entity CollectionIntegrating DBpedia resources
CC BY-SA
Building an ecosystem of networked references
RDFdumps
48 Language Editions
(~3.6GB/GZ)
http://data.dws.informatik.uni-mannheim.de/dbpedia/2014/
TripleStore
MongoDBSPARQL
Dumps were carefully selected and downloaded for all languages that Europeana supports...
SPARQL queries select DBpedia resources for each EDM Contextual Class
Each DBpedia resource is converted to EDM using XSLT and further filtered
loaded...
The Entity CollectionSome statistics for DBpedia
CC BY-SA
Building an ecosystem of networked references
Entity Class Target vocabulary Size
Places GeoNames 140,097
Concepts DBpedia 5,284
GEMET 280
Agents DBpedia 161,209
Time Semium Time 2,566
The Entity CollectionDBpedia resource for “Mozart” in our data
CC BY-SA
Building an ecosystem of networked references
<http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart> a <http://www.europeana.eu/schemas/edm/Agent> ; skos:prefLabel " वोल्फ़गांक आमडेयुस मोत्सार्ट�"@hi, "Волфганг Амадеус Моцарт"@sr, "Волфганг Амадеус Моцарт"@mk, "Волфганг Амадеус Моцарт"@bg, "Вольфганг Амадэй Моцарт"@be, "Βόλφγκανγκ Αμαντέους Μότσαρτ"@el, "Моцарт, Вольфганг Амадей"@ru, "Volfgangs Amadejs Mocarts"@lv, "Wolfgang Amadeusz Mozart"@pl, " 볼프강 아마데우스 모차르트 "@ko, " וואלפגאנג ,yi, "Volfqanq Amadey Mosart"@az, "Вольфганг Амадей Моцарт"@uk, "Wolfgang Amadeus Mozart"@nl@"אמאדעוס מאצארט"Wolfgang Amadeus Mozart"@es, "Wolfgang Amadeus Mozart"@et, "Wolfgang Amadeus Mozart"@en, " ვოლფგანგ ამადეუს
"@ka, "Wolfgang Amadeus Mozart"@de, "Wolfgang Amadeus Mozart"@ga, "Wolfgang Amadeus Mozart"@bs, "Wolfgang მოცარტიAmadeus Mozart"@hu, "Wolfgang Amadeus Mozart"@no, "Wolfgang Amadeus Mozart"@tr, "Wolfgang Amadeus Mozart"@ca, "Wolfgang Amadeus Mozart"@ro, "Wolfgang Amadeus Mozart"@hr, "Wolfgang Amadeus Mozart"@gd, "Wolfgang Amadeus Mozart"@gl, "Wolfgang Amadeus Mozart"@cs, "Wolfgang Amadeus Mozart"@cy, "Wolfgang Amadeus Mozart"@sq, "Wolfgang Amadeus Mozart"@it, "Wolfgang Amadeus Mozart"@is, "Wolfgang Amadeus Mozart"@eu, "Wolfgang Amadeus Mozart"@da, "Wolfgang Amadeus Mozart"@sk, "Wolfgang Amadeus Mozart"@sl, "Wolfgang Amadeus Mozart"@fr, "Wolfgang Amadeus Mozart"@sv, "Wolfgang Amadeus Mozart"@fi, "Wolfgang Amadeus Mozart"@pt, "Volfgangas Amadėjus Mocartas"@lt, "ヴォルフガング・アマデウス・モーツァルト "@ja, " "@hy, "Վոլֆգանգ Ամադեուս Մոցարտ أماديوس " ,he@"וולפגנג אמדאוס מוצרט فولفغانغar, "沃尔夫冈@"موتسارت ·阿马德乌斯 ·莫扎特 "@zh ; skos:altLabel "Mozart, Johann Chrysostom Wolfgang Amadeus"@en, "Mozart, Wolfgang Amadeus"@en ; dc:identifier "32197206" ; rdaGr2:placeOfBirth <http://dbpedia.org/resource/Austria>, <http://dbpedia.org/resource/Salzburg> ; rdaGr2:placeOfDeath <http://dbpedia.org/resource/Vienna>, <http://dbpedia.org/resource/Austria> ; edm:end "1791-12-05" ; rdaGr2:dateOfDeath "1791-12-05" ; rdaGr2:dateOfBirth "1756-01-27" ; rdaGr2:biographicalInformation … ; owl:sameAs <http://en.dbpedia.org/resource/Wolfgang_Amadeus_Mozart>, <http://zitgist.com/music/artist/b972f589-fb0e-474e-b64a-803b0364fa75>,<http://wikidata.org/entity/Q254>, <http://www4.wiwiss.fu-berlin.de/gutendata/resource/people/Mozart_Wolfgang_Amadeus_1756-1791>, <http://sw.cyc.com/concept/Mx4rvlD9e5wpEbGdrcN5Y29ycA>, <http://yago-knowledge.org/resource/Wolfgang_Amadeus_Mozart>, <http://wikidata.dbpedia.org/resource/Q254>, <http://rdf.freebase.com/ns/m.082db> … .
Preferred labels for 48 languages
Coreference links to 6 other datasets(e.g. Freebase, Wikidata)
Inter-linking information
The Entity CollectionIs DBpedia enough?
CC BY-SA
• Not enough coreferencing information to other vocabularies• particularly to the ones we receive from data providers
(e.g. MIMO)
• Labels and values are not always accurate and normalized• need for better reference data (e.g. VIAF)
Building an ecosystem of networked references
The Entity CollectionOur roadmap for the next years
CC BY-SA
Building an ecosystem of networked references
• Generate Europeana URIs for Entities
• Make entity services and data available via an API, and further integrate existing components
• Integrate vocabularies that can further improve• entity descriptions and multilingual coverage (e.g.
VIAF)• linking between entities (e.g. Wikidata)
• Integrate alignments• particularly, links between local/domain vocabularies to
pivot vocabularies
Linking Metadata to MIMO with CultuurLinkAbout CultuurLink
CC BY-SA
Building an ecosystem of networked references
• Online tool for aligning SKOS vocabularies, http://cultuurlink.beeldengeluid.nl
• Successor to EuropeanaConnect's Amalgame• Developed by Spinque• With support of the Network Digital Heritage
• Semi-automatic discovery of alignments• Ability to define complex strategies• Manual assessment and export of the results
Linking Metadata to MIMO with CultuurLinkThe Europeana Sounds Experiment
CC BY-SA
Building an ecosystem of networked references
• Scope• Evaluate MIMO as target vocabulary for enrichment of subject
fields• Evaluate the potential of CultuurLink for alignment of
vocabularies
• Participants• British Library, CREM, MMSH, NISV (6 collections in total)
• Work done• A SKOS vocabulary was generated for each collection from
the labels found within subject fields• Each participant used CultuurLink to discover alignments by
designing and testing different strategies
Linking Metadata to MIMO with CultuurLinkExample from CNRS
CC BY-SA
Building an ecosystem of networked references
see http://cultuurlink.beeldengeluid.nl
Linking Metadata to MIMO with CultuurLinkResults of the experiment
CC BY-SA
Building an ecosystem of networked references
• There were a couple of issues with metadata quality• matching previous observations: enrichment works better
when metadata quality is good
• The feedback was positive for CultuurLink • Applying different strategies showed to be crucial for
discovering alignments between concepts • ... taking into account different features of the data
(prefered / alternative labels, different languages, vernacular labels, etc.)
• Providers have realized the interest and feasibility of linking to a richer, more multilingual pivot vocabulary such as MIMO
Linking Metadata to MIMO with CultuurLinkDemo
CC BY-SA
Building an ecosystem of networked references
http://cultuurlink.beeldengeluid.nl
Conclusion
CC BY-SA
Building an ecosystem of networked references
• A Strategy for Entities is a “must” for Europeana
• There is no “one fits all” vocabulary
• We have a long way to go…• ...but we are making progress