Building an ecosystem of networked references

Building an ecosystem of networked references Hugo Manguinhas | DBpedia Community Meeting 2016


Page 1: Building an ecosystem of networked references

Building an ecosystem of networked references
Hugo Manguinhas | DBpedia Community Meeting 2016

Page 2: Building an ecosystem of networked references

Europeana has many data challenges

Building an ecosystem of networked references | CC BY-SA

We aggregate very heterogeneous metadata:

• More than 48M objects

• 3,500 galleries, libraries, archives and museums

• 50 languages

• From all EU countries

• Level of quality varies greatly

• Huge amount of references to places, agents, concepts, time

Page 3: Building an ecosystem of networked references

Europeana Linked Data Strategy: Motivation


• Improve user experience
  • support better ways of searching and navigating through the collections, eliminating ambiguity and clarifying the meaning of descriptions
  • better adapt to the language of the user

• by improving the interlinking of data
  • brings more context to the objects
  • alleviates polysemy issues
  • better language coverage

• Contributes to building a web of data ('knowledge graph') that third parties can use to improve their users' experience

Page 4: Building an ecosystem of networked references

Europeana Linked Data Strategy: Our efforts and lines of work


• Europeana Data Model (EDM) offers a base for linking data

• We apply an enrichment strategy to link source data to reference data, incl. DBpedia

• We encourage data providers to contribute their own vocabularies so that we can benefit from data links made at the data providers’ level

• We encourage alignment activities between domain vocabularies

Page 5: Building an ecosystem of networked references

Europeana Linked Data Strategy: Vocabularies currently provided to Europeana


Page 6: Building an ecosystem of networked references

Europeana Linked Data Strategy: Europeana also hosts vocabularies


Page 7: Building an ecosystem of networked references

Europeana Linked Data Strategy: A strategy for Entities


• As a cornerstone for our strategy we are building an "Entity Collection"
  • A service that acts as a centralized point of reference and access to data about contextual entities
  • Caching and curating data from the wider Linked Open Data cloud
  • A sort of Europeana "knowledge graph"

Page 8: Building an ecosystem of networked references

The Entity Collection: Use Cases


Europeana Collections Portal

● Findability: users can look for entities, not only records (Entity-Based Search)

● Understandability: Entity Pages group and present all assertions about an entity

● Exploration: Navigation along relationships becomes possible

Crowdsourcing

● Objects can be annotated with references to entities

● A controlled vocabulary for client applications

Enrichment of Provider’s Data

● A controlled vocabulary to help identify named references to entities

Republication for Re-use

● Entities can be republished as an open source to the community


Page 9: Building an ecosystem of networked references

The Entity Collection: Why DBpedia?


• It offers labels in about 124 languages across all its language editions, 48 of which match the languages that Europeana supports

• It gives fairly complete and accurate descriptive metadata about entities

• Works great as a “pivot” vocabulary, providing further links to other vocabularies such as Wikidata and Freebase
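
To illustrate why broad label coverage matters for adapting to the user's language, here is a minimal sketch of picking a display label from an entity's multilingual preferred labels. The function name `pick_label` and the fallback-to-English policy are assumptions made for illustration, not a description of how Europeana selects labels; the sample labels are taken from the Mozart entity shown later in this deck.

```python
from typing import Mapping, Optional, Sequence


def pick_label(labels: Mapping[str, str],
               preferred: Sequence[str],
               fallback: str = "en") -> Optional[str]:
    """Return the first label matching the user's language preferences.

    `labels` maps a two-letter language code to a preferred label.
    `preferred` is an ordered list of language tags, e.g. ["pl-PL", "en"].
    """
    for lang in preferred:
        # Compare on the primary subtag only, so "pl-PL" can match "pl".
        primary = lang.split("-")[0].lower()
        if primary in labels:
            return labels[primary]
    # None of the preferred languages is covered: use the fallback, if any.
    return labels.get(fallback)


# A few preferred labels copied from the Mozart example in this deck.
mozart = {
    "en": "Wolfgang Amadeus Mozart",
    "pl": "Wolfgang Amadeusz Mozart",
    "lv": "Volfgangs Amadejs Mocarts",
    "lt": "Volfgangas Amadėjus Mocartas",
}

print(pick_label(mozart, ["pl-PL", "en"]))  # label in the user's first language
print(pick_label(mozart, ["xx"]))           # no match, falls back to English
```

The more languages the reference vocabulary covers, the less often the fallback branch is taken.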

Page 10: Building an ecosystem of networked references


The Entity Collection: Integrating DBpedia resources


[Diagram: RDF dumps (48 language editions, ~3.6 GB gzipped) → triple store → SPARQL → MongoDB]

Dumps: http://data.dws.informatik.uni-mannheim.de/dbpedia/2014/

Dumps were carefully selected and downloaded for all languages that Europeana supports...

SPARQL queries select DBpedia resources for each EDM Contextual Class

Each DBpedia resource is converted to EDM using XSLT and further filtered

loaded...
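
The SPARQL selection step described above could look roughly like the sketch below, which builds one query per EDM contextual class. The mapping from EDM classes to DBpedia ontology classes, the function name `build_query`, and the paging parameters are all assumptions for illustration; the queries actually used by Europeana are not shown in the slides.

```python
# Hypothetical mapping from EDM contextual classes to DBpedia ontology
# classes; the real selection criteria are not given in the presentation.
EDM_TO_DBPEDIA = {
    "edm:Agent": "dbo:Person",
    "edm:Place": "dbo:Place",
}

PREFIXES = "PREFIX dbo: <http://dbpedia.org/ontology/>\n"


def build_query(edm_class: str, limit: int = 100, offset: int = 0) -> str:
    """Build a SPARQL SELECT query listing resources for one EDM class."""
    dbo_class = EDM_TO_DBPEDIA[edm_class]
    return (
        PREFIXES
        + "SELECT DISTINCT ?resource WHERE {\n"
        + f"  ?resource a {dbo_class} .\n"
        + "}\n"
        + "ORDER BY ?resource\n"            # stable order so paging is repeatable
        + f"LIMIT {limit} OFFSET {offset}"
    )


print(build_query("edm:Agent", limit=10))
```

Each resource returned by such a query would then go through the XSLT conversion to EDM mentioned above.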

Page 11: Building an ecosystem of networked references

The Entity Collection: Some statistics for DBpedia


Entity Class   Target vocabulary   Size
Places         GeoNames            140,097
Concepts       DBpedia             5,284
               GEMET               280
Agents         DBpedia             161,209
Time           Semium Time         2,566

Page 12: Building an ecosystem of networked references

The Entity Collection: DBpedia resource for “Mozart” in our data


<http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart> a <http://www.europeana.eu/schemas/edm/Agent> ;
  skos:prefLabel "वोल्फ़गांक आमडेयुस मोत्सार्ट"@hi,
    "Волфганг Амадеус Моцарт"@sr, "Волфганг Амадеус Моцарт"@mk, "Волфганг Амадеус Моцарт"@bg,
    "Вольфганг Амадэй Моцарт"@be, "Βόλφγκανγκ Αμαντέους Μότσαρτ"@el, "Моцарт, Вольфганг Амадей"@ru,
    "Volfgangs Amadejs Mocarts"@lv, "Wolfgang Amadeusz Mozart"@pl, "볼프강 아마데우스 모차르트"@ko,
    "וואלפגאנג אמאדעוס מאצארט"@yi,
    "Volfqanq Amadey Mosart"@az, "Вольфганг Амадей Моцарт"@uk, "Wolfgang Amadeus Mozart"@nl,
    "Wolfgang Amadeus Mozart"@es, "Wolfgang Amadeus Mozart"@et, "Wolfgang Amadeus Mozart"@en,
    "ვოლფგანგ ამადეუს მოცარტი"@ka,
    "Wolfgang Amadeus Mozart"@de, "Wolfgang Amadeus Mozart"@ga, "Wolfgang Amadeus Mozart"@bs,
    "Wolfgang Amadeus Mozart"@hu, "Wolfgang Amadeus Mozart"@no, "Wolfgang Amadeus Mozart"@tr,
    "Wolfgang Amadeus Mozart"@ca, "Wolfgang Amadeus Mozart"@ro, "Wolfgang Amadeus Mozart"@hr,
    "Wolfgang Amadeus Mozart"@gd, "Wolfgang Amadeus Mozart"@gl, "Wolfgang Amadeus Mozart"@cs,
    "Wolfgang Amadeus Mozart"@cy, "Wolfgang Amadeus Mozart"@sq, "Wolfgang Amadeus Mozart"@it,
    "Wolfgang Amadeus Mozart"@is, "Wolfgang Amadeus Mozart"@eu, "Wolfgang Amadeus Mozart"@da,
    "Wolfgang Amadeus Mozart"@sk, "Wolfgang Amadeus Mozart"@sl, "Wolfgang Amadeus Mozart"@fr,
    "Wolfgang Amadeus Mozart"@sv, "Wolfgang Amadeus Mozart"@fi, "Wolfgang Amadeus Mozart"@pt,
    "Volfgangas Amadėjus Mocartas"@lt, "ヴォルフガング・アマデウス・モーツァルト"@ja,
    "Վոլֆգանգ Ամադեուս Մոցարտ"@hy,
    "וולפגנג אמדאוס מוצרט"@he,
    "فولفغانغ أماديوس موتسارت"@ar,
    "沃尔夫冈·阿马德乌斯·莫扎特"@zh ;
  skos:altLabel "Mozart, Johann Chrysostom Wolfgang Amadeus"@en, "Mozart, Wolfgang Amadeus"@en ;
  dc:identifier "32197206" ;
  rdaGr2:placeOfBirth <http://dbpedia.org/resource/Austria>, <http://dbpedia.org/resource/Salzburg> ;
  rdaGr2:placeOfDeath <http://dbpedia.org/resource/Vienna>, <http://dbpedia.org/resource/Austria> ;
  edm:end "1791-12-05" ;
  rdaGr2:dateOfDeath "1791-12-05" ;
  rdaGr2:dateOfBirth "1756-01-27" ;
  rdaGr2:biographicalInformation … ;
  owl:sameAs <http://en.dbpedia.org/resource/Wolfgang_Amadeus_Mozart>,
    <http://zitgist.com/music/artist/b972f589-fb0e-474e-b64a-803b0364fa75>,
    <http://wikidata.org/entity/Q254>,
    <http://www4.wiwiss.fu-berlin.de/gutendata/resource/people/Mozart_Wolfgang_Amadeus_1756-1791>,
    <http://sw.cyc.com/concept/Mx4rvlD9e5wpEbGdrcN5Y29ycA>,
    <http://yago-knowledge.org/resource/Wolfgang_Amadeus_Mozart>,
    <http://wikidata.dbpedia.org/resource/Q254>,
    <http://rdf.freebase.com/ns/m.082db> … .

Preferred labels for 48 languages

Coreference links to 6 other datasets (e.g. Freebase, Wikidata)

Inter-linking information
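
As a rough illustration of how the coreference links can be used for pivoting, the sketch below groups the `owl:sameAs` URIs of the Mozart entity by host name and leaves out links that stay within DBpedia. Treating the host name as a stand-in for "dataset" is a simplifying assumption, and the function name `external_datasets` is hypothetical.

```python
from urllib.parse import urlparse

# owl:sameAs targets listed for the Mozart entity on this slide
# (the slide elides further links with "…").
same_as = [
    "http://en.dbpedia.org/resource/Wolfgang_Amadeus_Mozart",
    "http://zitgist.com/music/artist/b972f589-fb0e-474e-b64a-803b0364fa75",
    "http://wikidata.org/entity/Q254",
    "http://www4.wiwiss.fu-berlin.de/gutendata/resource/people/Mozart_Wolfgang_Amadeus_1756-1791",
    "http://sw.cyc.com/concept/Mx4rvlD9e5wpEbGdrcN5Y29ycA",
    "http://yago-knowledge.org/resource/Wolfgang_Amadeus_Mozart",
    "http://wikidata.dbpedia.org/resource/Q254",
    "http://rdf.freebase.com/ns/m.082db",
]


def external_datasets(uris):
    """Group coreference URIs by host, skipping DBpedia's own editions."""
    by_host = {}
    for uri in uris:
        host = urlparse(uri).netloc
        if host == "dbpedia.org" or host.endswith(".dbpedia.org"):
            continue  # a link to another DBpedia edition, not an external dataset
        by_host.setdefault(host, []).append(uri)
    return by_host


links = external_datasets(same_as)
for host in sorted(links):
    print(host, len(links[host]))
```

For the eight links listed on the slide, two point to other DBpedia editions and the remaining ones fall under six distinct hosts.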

Page 13: Building an ecosystem of networked references

The Entity Collection: Is DBpedia enough?


• Not enough coreferencing information to other vocabularies
  • particularly to the ones we receive from data providers (e.g. MIMO)

• Labels and values are not always accurate and normalized
  • need for better reference data (e.g. VIAF)


Hugo Manguinhas
Is it authoritative data? What is the best way to name it...?
Timothy Hill
As far as I can tell (I'm still digging into this) it's inferred from NLP techniques. So perhaps a bullet point like 'Relatively relationship-poor', with a sub-point 'Data-mining and other algorithmic techniques can allow rich relations to be inferred (e.g. BabelNet)'. Though perhaps leave it out, as I'm still not 100% sure of what magic BabelNet is doing.
Page 14: Building an ecosystem of networked references

The Entity Collection: Our roadmap for the next years


• Generate Europeana URIs for Entities

• Make entity services and data available via an API, and further integrate existing components

• Integrate vocabularies that can further improve
  • entity descriptions and multilingual coverage (e.g. VIAF)
  • linking between entities (e.g. Wikidata)

• Integrate alignments
  • particularly, links from local/domain vocabularies to pivot vocabularies

Antoine Isaac
I think you can make the slide a bit lighter by removing 'progressive integration w/ existing components'. This is quite a natural thing to do, nobody will be surprised... Same for the API thing. Actually the API could be seen as a special case of integration. So at least I'd argue for the two bullets to be merged (if not removed).
Hugo Manguinhas
I have added a last point about alignments to the roadmap... I think it may be enough to introduce the next part of the presentation... what do you think?
Hugo Manguinhas
or shall it go to last and add here instead a slide about the problem of local/domain vocabulary linking
Antoine Isaac
No I think it should be ok like this
Hugo Manguinhas
Do you like the phrase "Support and Integrate..."? Personally, I am not fond of it, but I lack a better wording...
Antoine Isaac
No we don't support
Hugo Manguinhas
Shall I just say "Integrate alignment efforts"?
Anonymous
Well integrate means that we load them in the entity collection: is this the case? If not then just 'encourage'
Hugo Manguinhas
The end goal is to integrate... if we can encourage better...
Anonymous
Integrating alignments sounds acceptable in the medium term. Integrating 'actions', 'plans' or 'services' (this is how I understand the word 'efforts' in this context) seems less feasible, even in the long term.
Page 15: Building an ecosystem of networked references

Linking Metadata to MIMO with CultuurLink: About CultuurLink


• Online tool for aligning SKOS vocabularies, http://cultuurlink.beeldengeluid.nl

• Successor to EuropeanaConnect's Amalgame
  • Developed by Spinque
  • With support of the Network Digital Heritage

• Semi-automatic discovery of alignments
  • Ability to define complex strategies
  • Manual assessment and export of the results
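
To give a feel for what a semi-automatic alignment strategy involves, here is a minimal sketch that first tries exact matches on normalized labels and then falls back to fuzzy string matching, producing candidates for manual assessment. This is not CultuurLink's implementation: the two-step strategy, the 0.85 threshold, the function names and the `local:` / `mimo:` identifiers are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize(label: str) -> str:
    """Lower-case a label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def align(source, target, threshold=0.85):
    """Propose (source_id, target_id, score, method) alignment candidates.

    `source` and `target` map concept identifiers to lists of labels.
    """
    # Index target concepts by normalized label (first concept wins on ties).
    target_index = {}
    for tid, labels in target.items():
        for label in labels:
            target_index.setdefault(normalize(label), tid)

    candidates = []
    for sid, labels in source.items():
        best = None
        for label in labels:
            norm = normalize(label)
            if norm in target_index:           # step 1: exact label match
                best = (sid, target_index[norm], 1.0, "exact")
                break
            for tnorm, tid in target_index.items():   # step 2: fuzzy match
                score = SequenceMatcher(None, norm, tnorm).ratio()
                if score >= threshold and (best is None or score > best[2]):
                    best = (sid, tid, score, "fuzzy")
        if best is not None:
            candidates.append(best)
    return candidates


# Hypothetical identifiers and labels, for illustration only.
source = {"local:1": ["Violin"], "local:2": ["Clarinets"], "local:3": ["field recording"]}
target = {"mimo:a": ["violin", "violon"], "mimo:b": ["clarinet", "clarinette"]}

result = align(source, target)
for candidate in result:
    print(candidate)
```

Fuzzy candidates are the ones that most need the manual assessment step before export; concepts with no candidate (here `local:3`) would call for a different strategy or manual mapping.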

Page 16: Building an ecosystem of networked references

Linking Metadata to MIMO with CultuurLink: The Europeana Sounds Experiment


• Scope
  • Evaluate MIMO as target vocabulary for enrichment of subject fields
  • Evaluate the potential of CultuurLink for alignment of vocabularies

• Participants
  • British Library, CREM, MMSH, NISV (6 collections in total)

• Work done
  • A SKOS vocabulary was generated for each collection from the labels found within subject fields
  • Each participant used CultuurLink to discover alignments by designing and testing different strategies
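
The step of generating a SKOS vocabulary from subject-field labels can be sketched as follows. This is only an assumed approach: the base URI, the hash-based concept identifiers and the function name `to_skos` are hypothetical, and the slides do not say how the experiment actually minted concepts.

```python
import hashlib

BASE = "http://example.org/vocab/"  # hypothetical base URI


def to_skos(collection_id: str, subject_values, lang: str) -> str:
    """Turn the distinct subject labels of one collection into SKOS (Turtle)."""
    # Deduplicate labels case-insensitively, keeping the first spelling seen.
    distinct = {}
    for value in subject_values:
        label = " ".join(value.split())
        if label:
            distinct.setdefault(label.casefold(), label)

    scheme = f"<{BASE}{collection_id}>"
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"{scheme} a skos:ConceptScheme .",
        "",
    ]
    for key, label in distinct.items():
        # Derive a stable local identifier from the normalized label.
        concept_id = hashlib.sha1(key.encode("utf-8")).hexdigest()[:10]
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f"<{BASE}{collection_id}/{concept_id}> a skos:Concept ;")
        lines.append(f'    skos:prefLabel "{escaped}"@{lang} ;')
        lines.append(f"    skos:inScheme {scheme} .")
    return "\n".join(lines)


ttl = to_skos("demo", ["Violin", "violin ", "", "Flute"], "en")
print(ttl)
```

A vocabulary produced this way has one concept per distinct label and no hierarchy, which is enough as input for label-based alignment against a richer target such as MIMO.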

Page 17: Building an ecosystem of networked references

Linking Metadata to MIMO with CultuurLink: Example from CNRS


see http://cultuurlink.beeldengeluid.nl

Page 18: Building an ecosystem of networked references

Linking Metadata to MIMO with CultuurLink: Results of the experiment


• There were a couple of issues with metadata quality
  • matching previous observations: enrichment works better when metadata quality is good

• The feedback was positive for CultuurLink
  • Applying different strategies proved crucial for discovering alignments between concepts
  • ... taking into account different features of the data (preferred / alternative labels, different languages, vernacular labels, etc.)

• Providers have realized the interest and feasibility of linking to a richer, more multilingual pivot vocabulary such as MIMO

Page 19: Building an ecosystem of networked references

Linking Metadata to MIMO with CultuurLink: Demo


http://cultuurlink.beeldengeluid.nl

Page 20: Building an ecosystem of networked references

Conclusion


• A Strategy for Entities is a “must” for Europeana

• There is no “one-size-fits-all” vocabulary

• We have a long way to go…
  • ...but we are making progress

Valentine Charles
I think the conclusion should emphasize the integration issues of the different tools and their impact on data. In a way, maybe those aspects could come out better throughout the presentation. We have put this LOD strategy in place because of all the different challenges we have to face and the fact that there is no solution that addresses them all at the same time.
Hugo Manguinhas
I have added a point about integration of components to the roadmap slide... here I would like to state that there is no vocabulary that fulfils all our needs and that we thus need to integrate with more than one... what do you think?
Valentine Charles
This is indeed the challenge from a vocabulary point of view, but the challenge is also one of tool integration
Page 21: Building an ecosystem of networked references