Keynote csws2013
Linked Data for Digital Heritage and History
Victor de Boer
VU University Amsterdam
Keynote, CSWS 2013, Shanghai
About me
Victor de Boer
Assistant professor at VU University Amsterdam
Domain-driven Semantic Technologies, Linked Data
Cultural Heritage
Digital History
Linked Data for Development
Linked Data is “a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.” (Wikipedia)
The evolution of Science
Antonie van Leeuwenhoek’s microscope (17th C.)
Large Hadron Collider in Switzerland (21st C.)
Why Linked Data for E-science
Large amounts of data
Efficient analysis, data mining
Sharing data, information and knowledge between scientists
Across continents
Across disciplines
OpenPhacts explorer
http://www.openphacts.org/
But what about the humanities?
Cultural Heritage
MultimediaN E-Culture project
• Museums have increasingly nice websites
• But: most of them are driven by stand-alone collection databases
• Data is isolated, both syntactically and semantically
• If users can do cross-collection search, the individual collections become more valuable!
• Semantic Search
http://e-culture.multimedian.nl/
Search for objects which are linked via concepts (semantic link)
China
Kanton
PartOf
Query “China”
Use the type of semantic link to provide meaningful presentation of the search results
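The Kanton/China example can be sketched in a few lines of Python. This is an illustration only, not the E-Culture implementation: the toy triples, the `ex:` identifiers and the `semantic_search` helper are assumptions made for the sketch.

```python
# Toy triple store: (subject, predicate, object). All ex: identifiers are hypothetical.
TRIPLES = [
    ("ex:Kanton", "ex:partOf", "ex:China"),
    ("ex:Kanton", "rdfs:label", "Kanton"),
    ("ex:China", "rdfs:label", "China"),
    ("ex:view_of_canton", "ex:depicts", "ex:Kanton"),
    ("ex:view_of_canton", "rdfs:label", "View of Canton, with two Dutch ships"),
]


def concepts_labelled(label):
    """Concepts whose rdfs:label equals the query string (case-insensitive)."""
    return {s for s, p, o in TRIPLES
            if p == "rdfs:label" and o.lower() == label.lower()}


def parts_of(concept):
    """The concept itself plus everything linked to it by a chain of partOf links."""
    found, todo = {concept}, [concept]
    while todo:
        current = todo.pop()
        for s, p, o in TRIPLES:
            if p == "ex:partOf" and o == current and s not in found:
                found.add(s)
                todo.append(s)
    return found


def semantic_search(query):
    """Return (object, matched place, link type) so results can be grouped by link type."""
    results = []
    for concept in concepts_labelled(query):
        for place in parts_of(concept):
            for s, p, o in TRIPLES:
                if p == "ex:depicts" and o == place:
                    link = "direct" if place == concept else "via partOf"
                    results.append((s, place, link))
    return results


hits = semantic_search("China")
print(hits)
```

Keeping the link type in each result is what allows the presentation layer to say why an object about Kanton was returned for the query “China”.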
Rijksmuseum: View of Canton, with two Dutch ships
Semantic Search
Vocabulary alignment
• In large virtual collections there are always multiple vocabularies, each with its own perspective
– In multiple languages
– You can’t just merge them
• But you can use vocabularies jointly by defining a limited set of links
• It is surprising what you can do with just a few links
Vocabulary alignment
“Easel-pieces”
RMA concept “Schilderij”
RMA is the thesaurus of Rijksmuseum
AAT artefact type “Easel Piece” “Painting”
AAT is Getty’s Art & Architecture Thesaurus
http://e-culture.multimedian.nl/
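A minimal sketch of how a single alignment link lets two vocabularies be used jointly. The identifiers, the object lists and the choice of `skos:exactMatch` as the link type are assumptions for illustration; only the concept names “Schilderij” and “Painting” come from the slide.

```python
# One alignment link between the two thesauri (hypothetical identifiers).
ALIGNMENT = [("rma:schilderij", "skos:exactMatch", "aat:painting")]

# Each collection indexes its objects with its own vocabulary (made-up objects).
COLLECTIONS = {
    "rijksmuseum": [("rm:object1", "rma:schilderij")],
    "other_museum": [("om:object7", "aat:painting"), ("om:object8", "aat:sculpture")],
}


def equivalents(concept):
    """The concept plus concepts one exactMatch link away, in either direction."""
    same = {concept}
    for s, _, o in ALIGNMENT:
        if s == concept:
            same.add(o)
        if o == concept:
            same.add(s)
    return same


def cross_collection_search(concept):
    """Find objects in every collection indexed with the concept or an equivalent."""
    wanted = equivalents(concept)
    return sorted(obj
                  for objects in COLLECTIONS.values()
                  for obj, indexed_with in objects
                  if indexed_with in wanted)


found = cross_collection_search("aat:painting")
print(found)
```

The vocabularies stay separate; only the link is added. The sketch follows one hop, so a longer chain of matches would need a transitive walk.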
Amsterdam Museum as Linked Open Data
Amsterdam Museum
• Formerly Amsterdam Historic Museum
– “The rich collection of works of art, objects and archaeological finds brings to life the fortunes of Amsterdammers of days gone by and today.”
• In March 2010 published their whole collection online
– 70,000 objects
– CC license
Requirements for conversion and linking
• Transparent conversion and linking of the data
– Use of provenance and reproducibility
• Keep original complexities of the data while making it interoperable with other (Europeana) data
• Retain the relation to original data
Methods
ClioPatria
XMLRDF
1. XML ingestion (OAI)
2. Direct transformation to ‘crude’ RDF
3. Interactive RDF restructuring
4. Create a metadata mapping schema
5. Align vocabularies with external sources
6. Publish as Linked Data
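Steps 2 and 3 can be sketched as follows. This is not the XMLRDF tool (which runs inside ClioPatria); it is a stand-alone Python illustration, and the sample record, its element names and the rename rule are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical source record; element names are made up for the sketch.
RECORD = """
<record id="22093">
  <title>Portrait</title>
  <content.person.name>Job Cohen</content.person.name>
</record>
"""


def crude_rdf(xml_text, prefix="am:"):
    """Step 2: one triple per child element, keeping the XML names as predicates."""
    root = ET.fromstring(xml_text.strip())
    subject = f"{prefix}obj_{root.get('id')}"
    return [(subject, prefix + child.tag, (child.text or "").strip())
            for child in root]


def restructure(triples, renames):
    """Step 3: rewrite predicates with hand-written rules; unknown ones pass through."""
    return [(s, renames.get(p, p), o) for s, p, o in triples]


crude = crude_rdf(RECORD)
clean = restructure(crude, {"am:content.person.name": "am:contentPersonName"})
for triple in clean:
    print(triple)
```

Because the crude conversion loses nothing and the rewrite rules are explicit, the whole conversion can be inspected and re-run, which matches the transparency and reproducibility requirements.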
Amalgame
Tools
ClioPatria.swi-prolog.org
ClioPatria is powered by SWI-Prolog
XMLRDF rewriting rules examples
Mapping to popular vocabularies
am:obj_22093 “Job Cohen” am:contentPersonName
rdfs:subPropertyOf
dcterms:subject
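A minimal sketch of what the `rdfs:subPropertyOf` mapping buys: a client that only knows `dcterms:subject` still finds the value stored under the museum-specific property. The tiny in-memory store and the helper names are assumptions; the triple itself is the one on the slide.

```python
# Mapping schema and data as plain triples.
SCHEMA = [("am:contentPersonName", "rdfs:subPropertyOf", "dcterms:subject")]
DATA = [("am:obj_22093", "am:contentPersonName", "Job Cohen")]


def sub_properties(prop):
    """prop plus all of its direct and indirect sub-properties."""
    found, todo = {prop}, [prop]
    while todo:
        current = todo.pop()
        for s, p, o in SCHEMA:
            if p == "rdfs:subPropertyOf" and o == current and s not in found:
                found.add(s)
                todo.append(s)
    return found


def values(subject, prop):
    """Values of prop for subject, including values stored under sub-properties."""
    props = sub_properties(prop)
    return [o for s, p, o in DATA if s == subject and p in props]


subjects = values("am:obj_22093", "dcterms:subject")
print(subjects)
```

The original property is kept in the data, so the museum's own distinctions stay available while generic clients see the interoperable layer.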
Amalgame alignment platform
• Semi-automatic linking
– Simple automatic techniques,
– chained together by hand
• Transparent and interactive
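The idea of chaining simple matchers can be sketched as below. This is not Amalgame's API: the two matchers, the `chain` helper and the sample concepts are hypothetical, chosen only to show how each step works on what the previous step left unmapped.

```python
def normalise(label):
    """Lower-case and treat hyphens as spaces."""
    return " ".join(label.lower().replace("-", " ").split())


def exact_label_match(source, target):
    """Map source concepts to target concepts with an identical label."""
    by_label = {label: tid for tid, label in target.items()}
    return {sid: by_label[label] for sid, label in source.items() if label in by_label}


def normalised_label_match(source, target):
    """Map concepts whose labels are equal after normalisation."""
    by_label = {normalise(label): tid for tid, label in target.items()}
    return {sid: by_label[normalise(label)]
            for sid, label in source.items() if normalise(label) in by_label}


def chain(source, target, matchers):
    """Run matchers in order; each one only sees the still-unmapped source concepts."""
    mapping, remaining = {}, dict(source)
    for matcher in matchers:
        new = matcher(remaining, target)
        # Record which technique produced each link, for transparency.
        mapping.update({sid: (tid, matcher.__name__) for sid, tid in new.items()})
        remaining = {sid: lbl for sid, lbl in remaining.items() if sid not in new}
    return mapping, remaining


source = {"a:1": "Painting", "a:2": "easel-piece", "a:3": "Drawing"}
target = {"b:1": "Painting", "b:2": "Easel piece"}
mapping, unmapped = chain(source, target, [exact_label_match, normalised_label_match])
print(mapping)
print(unmapped)
```

Whatever ends up in `unmapped` is what a person would inspect before deciding on the next step in the chain.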
Amsterdam Museum as Linked Open Data
http://thedatahub.org/dataset/amsterdam-museum-as-edm-lod
E-history (digital history)
BiographyNet
(Narrative) historical methodology
• Historical facts are derived mainly from archival findings and existing literature
• Historians put them together into a narrative/synthesis
– The Narrative: a historical synthesis which cannot be scientifically proven (only made likely), based on facts which can be proven or falsified. There is necessarily a creative element in drawing up a narrative
Slides by BiographyNet team
Where do eScience and Biographical History meet?
• Quantitative analyses of a larger group of people (prosopography). Surpassing the anecdotal.
• Finding relations/networks between people which are otherwise hard to detect
“Johan Rudolph Thorbecke werd in 1798 geboren op 14 januari in Zwolle en komt uit een half-Duitse…” (English: “Johan Rudolph Thorbecke was born in 1798 on 14 January in Zwolle and comes from a half-German…”)
Linked Data for BiographyNet
Thorbecke
Biographical Description
Provenance Meta Data
NNBW
Person Meta Data
“Thorbecke”
Biography Parts
Birth 1798
Event
Biographical Description
Enrichment NLP Tool
Person Meta Data
Event Birth
Johan Rudolph Thorbecke werd in 1798 geboren op 14 januari in Zwolle en komt uit een half-Duitse…
Zwolle 1798-01-14
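A toy sketch of the enrichment step: turning the biography sentence into a structured birth event that keeps a pointer to its source. The regular expression is a stand-in for the NLP tool, which the slides do not describe, and the field names are assumptions.

```python
import re

DUTCH_MONTHS = {
    "januari": 1, "februari": 2, "maart": 3, "april": 4, "mei": 5, "juni": 6,
    "juli": 7, "augustus": 8, "september": 9, "oktober": 10, "november": 11,
    "december": 12,
}

# Matches the phrasing of this one example sentence only.
BIRTH = re.compile(
    r"werd in (?P<year>\d{4}) geboren op (?P<day>\d{1,2}) "
    r"(?P<month>[a-z]+) in (?P<place>[A-Z]\w+)"
)


def extract_birth(text, source):
    """Return a birth event with provenance, or None if the pattern does not match."""
    m = BIRTH.search(text)
    if m is None or m["month"] not in DUTCH_MONTHS:
        return None
    date = f"{int(m['year']):04d}-{DUTCH_MONTHS[m['month']]:02d}-{int(m['day']):02d}"
    return {
        "type": "Birth",
        "date": date,
        "place": m["place"],
        "source": source,                # provenance: which biography
        "extracted_by": "regex-sketch",  # provenance: which tool
    }


text = ("Johan Rudolph Thorbecke werd in 1798 geboren op 14 januari in Zwolle "
        "en komt uit een half-Duitse…")
event = extract_birth(text, source="NNBW")
print(event)
```

Recording both the source and the extracting tool on every event is what lets a historian trace an automatically derived fact back to the text it came from.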
Prototype under development
The information provided by the first system can be used to:
1. Identify alternative descriptions of events (same time, location and/or participants)
2. Identify relations between events (same locations & time, consequent events, same participants, etc.)
3. Initial networks of people
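Uses 1 and 3 can be sketched as below, under strong simplifying assumptions: events are plain dictionaries, two descriptions count as alternatives only when type, date, place and participants all agree (stricter than the “and/or” on the slide), and the second source and the third event are invented placeholders.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical extracted events; "source_B", "person_A", "person_B" and "place_X"
# are placeholders, not data from the project.
EVENTS = [
    {"id": "e1", "source": "NNBW", "type": "Birth", "date": "1798-01-14",
     "place": "Zwolle", "participants": {"Thorbecke"}},
    {"id": "e2", "source": "source_B", "type": "Birth", "date": "1798-01-14",
     "place": "Zwolle", "participants": {"Thorbecke"}},
    {"id": "e3", "source": "source_B", "type": "Meeting", "date": "1840-05-01",
     "place": "place_X", "participants": {"person_A", "person_B"}},
]


def alternative_descriptions(events):
    """Group event ids that share type, date, place and participants."""
    groups = defaultdict(list)
    for e in events:
        key = (e["type"], e["date"], e["place"], frozenset(e["participants"]))
        groups[key].append(e["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


def person_network(events):
    """Undirected edges between people who take part in the same event."""
    edges = set()
    for e in events:
        for a, b in combinations(sorted(e["participants"]), 2):
            edges.add((a, b))
    return edges


alternatives = alternative_descriptions(EVENTS)
network = person_network(EVENTS)
print(alternatives)
print(network)
```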
http://www.biographynet.nl
Verrijkt Koninkrijk
History of German-occupied Dutch society (1940-1945)
Published between 1969 and 1991 in 14 volumes, 30 parts, 18,000 pages
1. Digitization
2. Open Data
3. Enriched access with Linked Open Data
Het Koninkrijk der Nederlanden in de Tweede Wereldoorlog
(The Kingdom of the Netherlands During World War II)
country, collection, doc-type, volume, chapter, section, sub-section, paragraph
Back of the Book Vocabulary +
Named Entity Vocabulary
SKOS vocabularies as stepping stones
http://semanticweb.cs.vu.nl/verrijktkoninkrijk/
niod:Blitzkrieg
niod:parRef
niod:oai_wo2_niod_nl_rec_102045 dct:subject
http://resolver.verrijktkoninkrijk.nl/nl.vk.d.reg.4.1386
botb:Blitzkrieg
skos:exactMatch
skos:exactMatch
skos:exactMatch
botb:sjanghai
dbpedianl:sjanghai
dbpedia:sjanghai
owl:sameAs
Шанхай
Thượng Hải
上海市
Xangai
Šanghaj
Shanghai
Shanghai
rdfs:label
dbpedia:
Shanghai_Jiao_Tong_University
dbp:is_city_of
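A sketch of the stepping-stone idea: starting from the book's own index term, follow `skos:exactMatch` and `owl:sameAs` links to reach the multilingual labels held elsewhere. The link order below is one plausible reading of the diagram, and attaching all labels to `dbpedia:sjanghai` is an assumption.

```python
IDENTITY_LINKS = {"skos:exactMatch", "owl:sameAs"}

# One plausible reading of the diagram (assumed, not confirmed by the slides).
LINKS = [
    ("botb:sjanghai", "skos:exactMatch", "dbpedianl:sjanghai"),
    ("dbpedianl:sjanghai", "owl:sameAs", "dbpedia:sjanghai"),
]

# rdfs:label values shown on the slide, assumed to hang off the DBpedia resource.
LABELS = {
    "dbpedia:sjanghai": ["Shanghai", "上海市", "Шанхай", "Thượng Hải", "Xangai", "Šanghaj"],
}


def linked_resources(start):
    """Everything reachable from start over identity links, in either direction."""
    seen, todo = {start}, [start]
    while todo:
        current = todo.pop()
        for s, p, o in LINKS:
            if p not in IDENTITY_LINKS:
                continue
            if s == current:
                neighbour = o
            elif o == current:
                neighbour = s
            else:
                continue
            if neighbour not in seen:
                seen.add(neighbour)
                todo.append(neighbour)
    return seen


def labels_for(start):
    """All labels of all resources linked to start."""
    return sorted({label
                   for resource in linked_resources(start)
                   for label in LABELS.get(resource, [])})


labels = labels_for("botb:sjanghai")
print(labels)
```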
SELECT * WHERE {
  ?s skos:prefLabel ?pl .
  ?s skos:closeMatch ?geo .
  ?geo gn:parentADM1 ?prov .
  ?prov gn:name ?provname .
  ?s niod:pageRef ?pref .
}
[Chart: NE index vs. BotB index, axis from 0 to 12000]
Geographical analysis using background knowledge from GeoNames
Results are links to paragraphs
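To make the join in the query concrete, the sketch below evaluates the same five triple patterns over a handful of in-memory triples and counts references per province. All data values (`botb:place_a`, `geo:1`, “Province X”, the `par:` identifiers) are placeholders; a real run would send the query to the project's SPARQL endpoint instead.

```python
from collections import Counter

# Placeholder triples shaped like the data the query expects.
TRIPLES = [
    ("botb:place_a", "skos:prefLabel", "Place A"),
    ("botb:place_a", "skos:closeMatch", "geo:1"),
    ("geo:1", "gn:parentADM1", "geo:prov_x"),
    ("geo:prov_x", "gn:name", "Province X"),
    ("botb:place_a", "niod:pageRef", "par:1"),
    ("botb:place_a", "niod:pageRef", "par:2"),
    ("botb:unlinked", "skos:prefLabel", "Unlinked term"),
    ("botb:unlinked", "niod:pageRef", "par:3"),
]


def objects(subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def run_query():
    """Nested-loop evaluation of the five patterns; one dict per solution row."""
    rows = []
    for s, p, pl in TRIPLES:
        if p != "skos:prefLabel":
            continue
        for geo in objects(s, "skos:closeMatch"):
            for prov in objects(geo, "gn:parentADM1"):
                for provname in objects(prov, "gn:name"):
                    for pref in objects(s, "niod:pageRef"):
                        rows.append({"s": s, "pl": pl, "geo": geo, "prov": prov,
                                     "provname": provname, "pref": pref})
    return rows


rows = run_query()
per_province = Counter(row["provname"] for row in rows)
print(per_province)
```

Terms without a GeoNames link drop out of the result, which is why the alignment step determines how much of the text the geographical analysis can see.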
SPARQL for R
National-Socialist 29%
Social-Democrat 21%
Protestant 13%
Liberal 12%
R-Catholic 12%
Communist 8%
Jewish 5%
Pillar1      Pillar2      Co
Liber.       Protestant   0.29
Protestant   R-Cath.      0.22
Liber.       R-Cath.      0.21
Comm         Soc-dem      0.20
Liber.       Soc-dem      0.15
Dutch Ships and Sailors
gz:Mercuur
1782
gz:Buijksloot
gz:Batavia
gz:Claas Roem
voc:Claas Roem
voc:Buijksloot
1752 das:Mercuur
das:Departure
das:Roem, Klaas
19-12-1780 das:Texel
das:Arrival
20-7-1781 das:Batavia
das:Voyage1
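The diagram suggests that the same person appears under different spellings in the three datasets (“Claas Roem”, “Roem, Klaas”). A crude sketch of how candidate links could be proposed is given below; the normalisation rule (reorder “Last, First”, lower-case, treat c as k) is an assumption for illustration and not the project's method.

```python
def normalise(name):
    """Crude key: 'Last, First' -> 'first last', lower-case, c written as k."""
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        name = f"{first} {last}"
    return " ".join(name.lower().replace("c", "k").split())


# Person names as they appear per dataset prefix on the slide.
RECORDS = {
    "gz": "Claas Roem",
    "voc": "Claas Roem",
    "das": "Roem, Klaas",
}


def link_candidates(records):
    """Group datasets whose person names share a normalised key."""
    groups = {}
    for dataset, name in records.items():
        groups.setdefault(normalise(name), []).append(dataset)
    return {key: datasets for key, datasets in groups.items() if len(datasets) > 1}


candidates = link_candidates(RECORDS)
print(candidates)
```

A name match alone is weak evidence; the ship names and dates in the diagram would be natural extra conditions before accepting a link.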
Web of Data
DataLab
Lessons Learned
• Be humble, transparent and interactive in your data conversion and linking
• Retain complexities of the data and establish layers of interoperability
• A little semantics goes a long way… and so does a small number of links
• Make sure your solutions and tools fit the methodology of the field
• Show added benefit for scientific research and (unexpected) re-use
• Linked Data is a good fit for Humanities research
Image credits
• Wikipedia lemmas
• Flickr images (cc-licensed)
– RMTip21
– Argonne National Laboratory
– thegarethwiscombe
• “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
• http://blogs.voanews.com/science-world/tag/cern/
• Gezicht op Canton, Vingboons-atlas, Bussum 1981, p. 35 VOC Kenniscentrum
Links
• http://semanticweb.cs.vu.nl
• http://biographynet.nl
• http://e-culture.multimedian.nl
• http://cliopatria.swi-prolog.org