Page 1

Linked Data for Digital History

Connecting Data for Research

Victor de Boer

With input from Christophe Guéret, Serge ter Braake, Niels Ockeloen, Antske Fokkens, Dirk Roorda, Lora Aroyo, Johan Oomen, Oana Inel, Jan Wielemaker, Jeroen Entjes

Page 2

Victor de Boer

Web & Media Group, CS, Vrije Universiteit Amsterdam

Netherlands Institute for Sound and Vision

Cultural Heritage

Digital History

Linked Data for Development

Page 3

Digital History

Sub-discipline of digital humanities

Part of the historian's effort has moved from physical archives to digital ones

Cross-domain collaboration

Img: www.doaks.org, www.dkrz.de

Page 5

“That is great. I would love that…

…but my research questions are slightly different.”

Img: Monty Python

Page 6

Aging: Data versus Tool

C. Guéret based on http://redmonk.com/jgovernor/2007/04/05/why-applciations-are-like-fish-and-data-is-like0wine/

Page 7

Even better

Do not bake the data into the tool; treat the data as an end product.

Build tools on top of the data.

Make sure others can do so as well.

Fig: C. Guéret

Page 8

Linked Data for Digital History

• Represent heterogeneous datasets, each with its own data model, in a common format: Resource Description Framework (RDF)

– Link what can be linked

• Re-use and re-usability

• Linked Data is the (technically) best way to publish and share your (research) data

OBJECT

EVENT

PLACE

TIME

PERSON

CONCEPT

PROVENANCE
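
The idea of keeping each dataset's own model while sharing one format can be sketched in a few lines of Python. This is only an illustration: the `a:` and `b:` identifiers, the ship name and the year are made up, and plain tuples stand in for RDF triples (a real project would more likely use an RDF library and a triple store).

```python
# Minimal sketch: two datasets keep their own vocabularies but share one
# triple format (subject, predicate, object). All identifiers and values
# below are hypothetical and not taken from any real dataset.

Triple = tuple  # (subject, predicate, object), all plain strings here

# Dataset A describes a ship with its own property names.
dataset_a = {
    ("a:ship-1", "a:name", "Example Ship"),
    ("a:ship-1", "a:type", "a:type-x"),
}

# Dataset B describes a voyage with different property names.
dataset_b = {
    ("b:voyage-7", "b:vessel", "b:vessel-42"),
    ("b:voyage-7", "b:departure", "1750"),
}

# "Link what can be linked": one extra triple states that both
# identifiers denote the same ship.
links = {("a:ship-1", "owl:sameAs", "b:vessel-42")}

# Because everything is a triple, integration is a set union.
graph = dataset_a | dataset_b | links


def same_as(graph, node):
    """Return node plus every node directly linked to it by owl:sameAs."""
    aliases = {node}
    for s, p, o in graph:
        if p == "owl:sameAs":
            if s == node:
                aliases.add(o)
            if o == node:
                aliases.add(s)
    return aliases


def voyages_of(graph, ship):
    """Find voyages in dataset B for a ship identified in dataset A."""
    aliases = same_as(graph, ship)
    return sorted(s for s, p, o in graph if p == "b:vessel" and o in aliases)


print(voyages_of(graph, "a:ship-1"))
```

Neither dataset had to change its own property names; the single link triple is what makes the cross-dataset question answerable.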

Page 9

Some examples

Page 10

Dutch Ships and Sailors

Page 11

The Problem: ((Maritime) historical) data is not integrated

Page 12

KB NEWSPAPERS

Dutch-Asiatic Shipping

“VOC Opvarenden”

Jur Leinenga

Matthias van Rossum

Elbing voyages

Archangel voyages

Page 13

DIFFERENT but LINKED DATA MODELS BASED ON COMPETENCY QUESTIONS

Fig: RDF graph of an example record. Labels shown:

Classes: dss:Record, gzmvoc:Telling, dss:Ship, gzmvoc:Schip, dss:ShipType, gzmvoc:Scheepstype, das:Voyage

Resources: gzmvoc:telling-1046-De_Berkel, gzmvoc:schip-1046-De_Berkel, gzmvoc:type-Ship, das:voyage-1918_61, __bnode_1

Properties: dss:has_ship, gzmvoc:schip, rdfs:label, dss:scheepsnaam, gzmvoc:scheepsnaam, dss:has_shiptype, gzmvoc:has_shiptype, gzmvoc:scheepstype, gzmvoc:aziatischeBemanning, dss:azRegistratieKop, gzmvoc:azAantalMatrozen, gzmvoc:telling, gzmvoc:heeft DAS heenreis

Literals: "1046", “Schip”, “De Berkel”, “21”, “Moorse mattroosen”

Page 14

ACCESS IT AT HTTP://DUTCHSHIPSANDSAILORS.NL/DATA

OR

HTTP://SEMANTICWEB.CS.VU.NL/DSS

# Prefix declarations are omitted on the slide.
SELECT * WHERE {
  ?record dss:hasOriginalScan ?scan .
  ?record dss:has_kb_link ?kblink .
  ?record mdb:schip ?schip .
  ?schip mdb:scheepstype ?shiptype .
  ?shiptype skos:exactMatch ?em .
  # zero or more skos:broader steps up to aat:kustvaarders
  ?em skos:broader* aat:kustvaarders .
}
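
The `skos:broader*` pattern in the query is a SPARQL property path meaning "zero or more `skos:broader` steps". A rough Python sketch of that matching rule follows. It is not how a triple store implements it, and the `ex:` concept hierarchy is invented for the example rather than taken from the AAT.

```python
# Sketch of what `?em skos:broader* <target>` matches: every concept from
# which <target> is reachable in zero or more skos:broader steps.
# The hierarchy below is hypothetical.

broader = {
    "ex:smallCoaster": ["ex:coasters"],
    "ex:coasters": ["ex:sailingVessels"],
    "ex:frigate": ["ex:warships"],
}


def broader_closure(broader, concept):
    """Return all concepts reachable via zero or more broader steps."""
    seen = {concept}          # zero steps: the concept itself matches
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in broader.get(current, ()):
            if parent not in seen:   # guard against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


def is_under(broader, concept, ancestor):
    """True if `concept skos:broader* ancestor` would match."""
    return ancestor in broader_closure(broader, concept)


print(is_under(broader, "ex:smallCoaster", "ex:coasters"))
print(is_under(broader, "ex:frigate", "ex:coasters"))
```

Because the path allows zero steps, a ship type mapped directly to the target concept matches as well as one mapped to any narrower concept.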

Page 15

Data analysis and visualisation

Page 16

DIVE

Page 17

MEDIA HISTORIANS AND RESEARCHERS

Img: Media researcher Lars Arve Røssland of the University of Bergen. (Photo: Andreas R. Graven)

EXPLORATIVE SEARCH

Digital Hermeneutics: The combination of digital (Web) technology and theory of interpretation

Page 18

DATA: OPENIMAGES.EU and DELPHER.NL

Page 19

ENTITY EXTRACTION

CROWDTRUTH.ORG

EVENTS CROWDSOURCING AND LINKING TO CONCEPTS THROUGH CROWDTRUTH.ORG

SEGMENTATION & KEYFRAMES

LINKING EVENTS AND CONCEPTS TO KEYFRAMES

Page 20

DATA CONNECTED IN KNOWLEDGE GRAPH

DIVE:MEDIA OBJECT

SEM:EVENT

SEM:PLACE

SEM:TIME

SEM:ACTOR

SKOS:CONCEPT

OA:ANNOTATION

LINKS TO EUROPEANA

LINKS TO DBPEDIA

Page 21

“DIGITAL SUBMARINE” INTERFACE

DIVE.BEELDENGELUID.NL

Page 22

BiographyNet

Starting point: Biography Portal of the Netherlands; www.biografischportaal.nl

125,000 short biographical descriptions with limited metadata from 23 Dutch biographical dictionaries (~76,000 individuals)

What kind of historical questions can be answered with these data with the help of computational methods?

Biographynet.nl

Page 23

Linked Data for BiographyNet

Example text: “Johan Rudolph Thorbecke was born in 1798 on 14 January in Zwolle and comes from a half-German…”

Fig: the person Thorbecke with two biographical descriptions

Biographical Description (NNBW): Provenance Meta Data; Person Meta Data (“Thorbecke”); Biography Parts; Event: Birth, 1798

Biographical Description (Enrichment NLP Tool): Person Meta Data; Event: Birth, Zwolle, 1798-01-14

Biographynet.nl
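
The structure on this slide, one person with several biographical descriptions that each carry their own provenance, could be modelled roughly as follows. The class and field names are assumptions made for this sketch and are not the BiographyNet schema; only the Thorbecke values come from the slide.

```python
# Sketch: one person, several biographical descriptions, each keeping
# its own provenance. Class and field names are hypothetical.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Event:
    kind: str                      # e.g. "Birth"
    date: str                      # as precise as the source allows
    place: Optional[str] = None    # unknown in the original metadata


@dataclass
class BiographicalDescription:
    provenance: str                # who or what produced this description
    events: List[Event] = field(default_factory=list)


@dataclass
class Person:
    name: str
    descriptions: List[BiographicalDescription] = field(default_factory=list)

    def births(self) -> List[Tuple[str, Event]]:
        """Return (provenance, event) pairs so each claim stays traceable."""
        return [
            (d.provenance, e)
            for d in self.descriptions
            for e in d.events
            if e.kind == "Birth"
        ]


thorbecke = Person(
    name="Thorbecke",
    descriptions=[
        # Original metadata from the dictionary: year only.
        BiographicalDescription("NNBW", [Event("Birth", "1798")]),
        # Enrichment extracted from the text by an NLP tool.
        BiographicalDescription(
            "Enrichment NLP Tool", [Event("Birth", "1798-01-14", "Zwolle")]
        ),
    ],
)

for source, event in thorbecke.births():
    print(source, event.date, event.place)
```

Keeping the enriched description next to the original, instead of overwriting it, lets a historian see which claims come from the dictionary and which from the tool.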

Page 24

Provenance in BiographyNet

Ensure credibility of the demonstrator, to evaluate its performance and to improve the academic status of the tool

Information involved: Sources, but also: NER input data, etc.

Processes involved: All steps in enrichment, aggregation…

People involved: Who was responsible for pipeline, tool, …

Biographynet.nl

*Daniel Garijo, Yolanda Gil; http://www.opmw.org/model/p-plan

Page 25

Interface for historians

Biographynet.nl

Page 26

Framework: generic solutions with historians

1. Preprocess, clean, model, link and enrich data in collaboration with domain experts;

2. Access heterogeneous datasets in a convenient way to get an intuition of the character and anomalies of the (linked) data;

3. Perform arbitrary queries to retrieve results relevant to their research questions;

4. Verify the veracity of query results by following provenance links to the original material;

5. Retrieve and analyze the data with the tool of their preference;

6. Republish and share results.

Page 27

Historical tool criticism

… willingness from historians to invest the time to learn about computer processes (at least the basic principles)

Possibilities for education at universities to bridge the gap between computer science and humanities studies and make tool criticism an integral part of students' curricula

“Why do we still teach history students to decipher 17th-century handwriting, but not SQL?”

Page 28

Thank you!

Victor de Boer

@victordeboer

Page 29

Verrijkt Koninkrijk

Page 30

National-Socialist; 29%

Social-Democrat; 21%

Protestant; 13%

Liberal; 12%

Roman-Catholic; 12%

Communist; 8%

Jewish; 5%

http://semanticweb.cs.vu.nl/verrijktkoninkrijk/

http://search.loedejongdigitaal.nl/

Page 31

Results are links to paragraphs

Page 32

re-usability

http://qhp.science.uva.nl/