Dutch Ships and Sailors Project @ WAI 2014

Dutch Ships and Sailors Victor de Boer - WAI - 17-3-2014

Description: VU Weekly AI Meeting (WAI) talk presenting the current status of the CLARIN-DSS Dutch Ships and Sailors project.

Transcript of Dutch Ships and Sailors Project @ WAI 2014

Page 1: Dutch Ships and Sailors Project @ WAI 2014

Dutch Ships and Sailors

Victor de Boer - WAI - 17-3-2014

Page 2

Dutch History = Maritime history

Page 3

The problem

25+ maritime datasets, all heterogeneous

Page 4

• CLARIN Call 4 project (9 months, ends April)
  – VU Hist (Matthias van Rossum)
  – Huygens ING (Jur Leinenga)
  – VU CS (me)

• Inventory of maritime DBs
• Create a Linked Data cloud for a subset
  – Link places, persons, ships, concepts, events
• Link to KB newspapers
• Reusable components

Dutch Ships and Sailors

Page 5


Datasets on ‘ships’, ‘places’, ‘persons’:
• VOC Opvarenden
• Dutch-Asiatic Shipping
• Generale Zeemonsterrollen
• Monsterrollen Noordelijke Scheepvaart

Textual historical data on ‘ship movements’, ‘events’:
• Historische Kranten (historical newspapers, KB)

DSS data sources

Page 6

Matthias van Rossum investigated the relations between European and Asian sailors under the Dutch East India Company (Verenigde Oost-Indische Compagnie, VOC; 1602-1795) and found that they were very much on an equal footing. This contrasts sharply with the later 19th-century situation, when Asian sailors worked in an unequal and sometimes less free position, with worse treatment and pay. Work under the VOC was, moreover, characterized by a pragmatic multiculturalism.

Matthias van Rossum – Generale Zeemonsterrollen VOC

Page 7

Jur Leinenga – Monsterrollen Noordelijke provincies (muster rolls of the northern provinces)

Monsterrollen database 1803-1937: muster rolls (monsterrollen) are crew lists recording the name, rank, wages, place of residence and age of every sailor on board, as well as the name, type and size of the ship. […] for Groningen and Friesland, they only begin in the nineteenth century. They give us a glimpse of the working lives of sailors in the nineteenth and early twentieth centuries.

Page 8

Dutch Ships and Sailors

Page 9

Why Linked Data?

Page 10

Why Linked Data?

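[Diagram "Web of Data": records from three datasets (prefixes gz:, voc: and das:) that describe the same entities.
Mercuur: gz:Mercuur, das:Mercuur
Claas Roem: gz:Claas Roem, voc:Claas Roem, das:Roem, Klaas
Buijksloot: gz:Buijksloot, voc:Buijksloot
Batavia: gz:Batavia, das:Batavia
das:Voyage1: das:Departure (19-12-1780, das:Texel), das:Arrival (20-7-1781, das:Batavia)
Other values shown: 1782, 1752]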
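The slide's point is that the same ship, person or place appears in several datasets under different identifiers, and Linked Data lets those records be connected. The sketch below assumes the connections are owl:sameAs triples stored as plain Python tuples. The slide does not name the linking property, and the specific links are illustrative. A union-find structure groups every set of linked identifiers into one entity.

```python
# Minimal sketch: group identifiers that refer to the same real-world entity.
# Assumption: cross-dataset links are owl:sameAs triples; the links below are
# illustrative, not taken from the project data.

SAME_AS = "owl:sameAs"

triples = [
    ("gz:Claas Roem", SAME_AS, "voc:Claas Roem"),
    ("voc:Claas Roem", SAME_AS, "das:Roem, Klaas"),
    ("gz:Mercuur", SAME_AS, "das:Mercuur"),
    ("gz:Batavia", SAME_AS, "das:Batavia"),
]


def same_as_clusters(triples):
    """Return a list of sets, each holding identifiers linked by owl:sameAs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for s, p, o in triples:
        if p == SAME_AS:
            union(s, o)

    clusters = {}
    for node in parent:
        clusters.setdefault(find(node), set()).add(node)
    return list(clusters.values())


if __name__ == "__main__":
    for cluster in same_as_clusters(triples):
        print(sorted(cluster))
```

Each resulting set can then be treated as one entity in cross-dataset queries. The original dataset-specific identifiers are kept.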

Page 11

Why Linked Data?

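[Diagram: person classes from the individual datasets, mdb:Persoon, das:Persoon, gzmvoc:Schipper, mdb:Begunstigde and mdb:Opvarende (person, person, skipper, beneficiary, person on board), shown together with the project-level class dss:Person and foaf:Person.]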
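Mapping each dataset's person classes onto a shared class lets a single query retrieve persons from every source. The sketch below assumes the slide's links are rdfs:subClassOf statements. The exact hierarchy used here is an assumption: mdb:Opvarende and mdb:Begunstigde sit under mdb:Persoon, the dataset classes sit under dss:Person, and dss:Person sits under foaf:Person. The instance identifiers are hypothetical.

```python
# Minimal sketch of RDFS subclass reasoning (rules rdfs9 and rdfs11) over
# plain tuples. Assumption: the hierarchy below; instance names are hypothetical.

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

schema = [
    ("mdb:Persoon", SUBCLASS, "dss:Person"),
    ("das:Persoon", SUBCLASS, "dss:Person"),
    ("gzmvoc:Schipper", SUBCLASS, "dss:Person"),
    ("mdb:Opvarende", SUBCLASS, "mdb:Persoon"),
    ("mdb:Begunstigde", SUBCLASS, "mdb:Persoon"),
    ("dss:Person", SUBCLASS, "foaf:Person"),
]

data = [
    ("mdb:opvarende-1", TYPE, "mdb:Opvarende"),
    ("das:persoon-1", TYPE, "das:Persoon"),
    ("gzmvoc:schipper-1", TYPE, "gzmvoc:Schipper"),
]


def superclasses(cls, schema):
    """All classes reachable from cls via rdfs:subClassOf (transitive)."""
    direct = {}
    for s, p, o in schema:
        if p == SUBCLASS:
            direct.setdefault(s, set()).add(o)
    seen, stack = set(), [cls]
    while stack:
        for sup in direct.get(stack.pop(), ()):
            if sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def instances_of(cls, data, schema):
    """Subjects typed as cls directly or through any subclass."""
    return {
        s for s, p, o in data
        if p == TYPE and (o == cls or cls in superclasses(o, schema))
    }


if __name__ == "__main__":
    print(sorted(instances_of("dss:Person", data, schema)))
```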

Page 12

Why Linked Data?

[Diagram:
mdb:Schip1 mdb:scheepsType mdb:Kof
das:ShipX das:typeOfShip das:Kofship
mdb:scheepsType rdfs:subPropertyOf dss:has_shipType
das:typeOfShip rdfs:subPropertyOf dss:has_shipType]
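With both dataset-specific properties declared as sub-properties of dss:has_shipType, a reasoner can answer a query on dss:has_shipType for both datasets. The triples below are the ones from the slide. The sketch applies RDFS rule rdfs7 (if p rdfs:subPropertyOf q and s p o, then s q o) over plain Python tuples. It stands in for a real RDF store's reasoner and is not the project's infrastructure.

```python
# Minimal sketch of RDFS rule rdfs7 applied to a fixed point:
# (p rdfs:subPropertyOf q) and (s p o)  =>  (s q o)

SUBPROP = "rdfs:subPropertyOf"

graph = [
    ("mdb:Schip1", "mdb:scheepsType", "mdb:Kof"),
    ("das:ShipX", "das:typeOfShip", "das:Kofship"),
    ("mdb:scheepsType", SUBPROP, "dss:has_shipType"),
    ("das:typeOfShip", SUBPROP, "dss:has_shipType"),
]


def infer_subproperties(graph):
    """Return the graph plus every triple entailed by rdfs7 (repeated until stable)."""
    triples = set(graph)
    while True:
        supers = {(p, q) for p, rel, q in triples if rel == SUBPROP}
        new = {
            (s, q, o)
            for s, p, o in triples
            for sub, q in supers
            if p == sub
        } - triples
        if not new:
            return triples
        triples |= new


def subject_object_pairs(triples, prop):
    """All (subject, object) pairs for one property."""
    return {(s, o) for s, p, o in triples if p == prop}


if __name__ == "__main__":
    inferred = infer_subproperties(graph)
    print(sorted(subject_object_pairs(inferred, "dss:has_shipType")))
```

Repeating the rule until nothing new appears also handles chains of sub-properties. The original dataset-specific triples are left untouched.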

Page 13

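[Diagram: the ship-type statements mdb:Schip1 mdb:scheepsType mdb:Kof and das:ShipX das:typeOfShip das:Kofship, with three skos:exactMatch links among mdb:Kof, das:Kofship and the AAT concept Aat:Kof; the AAT concept Aat:Platbodems (flat-bottomed boats) is also shown.]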

Why Linked Data?
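Once local ship types are matched to a shared thesaurus concept, a query can target that concept instead of each dataset's own vocabulary. The sketch below assumes the skos:exactMatch links connect mdb:Kof and das:Kofship to Aat:Kof. It also assumes ship types are read through the shared property dss:has_shipType, as after sub-property inference. skos:exactMatch is treated as symmetric and transitive, as SKOS defines it. The function names are hypothetical.

```python
# Minimal sketch: find ships whose type matches a thesaurus concept through
# skos:exactMatch. Assumption: the exactMatch links and triples below.

EXACT = "skos:exactMatch"
SHIP_TYPE = "dss:has_shipType"

graph = {
    ("mdb:Schip1", SHIP_TYPE, "mdb:Kof"),
    ("das:ShipX", SHIP_TYPE, "das:Kofship"),
    ("mdb:Kof", EXACT, "Aat:Kof"),
    ("das:Kofship", EXACT, "Aat:Kof"),
}


def exact_matches(concept, graph):
    """Concepts equivalent to `concept` (exactMatch is symmetric and transitive)."""
    neighbours = {}
    for s, p, o in graph:
        if p == EXACT:
            neighbours.setdefault(s, set()).add(o)
            neighbours.setdefault(o, set()).add(s)
    seen, stack = {concept}, [concept]
    while stack:
        for nxt in neighbours.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def ships_of_type(concept, graph):
    """Ships whose dss:has_shipType is any concept equivalent to `concept`."""
    types = exact_matches(concept, graph)
    return {s for s, p, o in graph if p == SHIP_TYPE and o in types}


if __name__ == "__main__":
    print(sorted(ships_of_type("Aat:Kof", graph)))
```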

Page 14

Why Linked Data?

• Heterogeneous models, one data format
  – Link what can be linked
• Keep specificity, allow integration at project level
• Links to other sources: re-use knowledge
• Extensible
• Allow multiple levels of semantic enrichment/normalization
  – through Named Graphs
  – Provenance
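Keeping each enrichment or normalization step in its own named graph, with provenance attached to the graph, lets users choose which layers to trust. The sketch below stores quads (subject, predicate, object, graph) as plain Python tuples. The graph names, resource names, provenance fields and the filter are all hypothetical illustrations. They are not the project's actual provenance model.

```python
# Minimal sketch: quads (s, p, o, graph) plus per-graph provenance records.
# All graph names, resources and provenance values are hypothetical.

from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    derived_by: str  # tool or person that produced the graph
    method: str      # how the graph was produced


quads = [
    ("mdb:Schip1", "mdb:scheepsType", "mdb:Kof", "graph:mdb-conversion"),
    ("mdb:Kof", "skos:exactMatch", "Aat:Kof", "graph:aat-alignment"),
    ("mdb:aanmonstering-1", "ex:newspaperLink", "ex:article-1", "graph:kb-links-v3"),
]

provenance = {
    "graph:mdb-conversion": Provenance("conversion script", "conversion"),
    "graph:aat-alignment": Provenance("alignment tool", "vocabulary alignment"),
    "graph:kb-links-v3": Provenance("linking algorithm v3", "automatic linking"),
}


def triples_from(quads, provenance, accept):
    """Triples from those graphs whose provenance satisfies accept(prov)."""
    return {
        (s, p, o)
        for s, p, o, g in quads
        if g in provenance and accept(provenance[g])
    }


if __name__ == "__main__":
    # Example policy: leave out automatically generated links.
    trusted = triples_from(quads, provenance, lambda prov: prov.method != "automatic linking")
    for t in sorted(trusted):
        print(t)
```

Because the provenance is attached to whole graphs, the same data can be queried under different trust policies without copying or deleting triples.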

Page 15

Methods

1. XML ingestion (OAI)
2. Direct transformation to ‘crude’ RDF
3. Interactive RDF restructuring
4. Create a metadata mapping schema
5. Align vocabularies with external sources
6. Publish as Linked Data

Tools: ClioPatria, XMLRDF, Amalgame
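Step 2 maps XML directly onto ‘crude’ RDF, before any restructuring takes place. The sketch below illustrates that idea only. It is not XMLRDF's rule language or output. It assumes a hypothetical flat XML record layout and a hypothetical `crude:` prefix. Each record element becomes a resource, and each child element becomes a property named after its tag.

```python
# Minimal sketch of a direct XML-to-"crude"-RDF transformation.
# Assumptions: the record layout, the "crude:" prefix and the subject naming
# scheme are hypothetical; this is not how XMLRDF itself works.

import xml.etree.ElementTree as ET

XML = """
<records>
  <record id="gron_nsm-1868-2">
    <datum>1868-01-21</datum>
    <schip>Frouwke</schip>
    <scheepstype>Smak</scheepstype>
    <naam>Harm Klaassens Heins</naam>
    <rang>kapitein</rang>
    <leeftijd>46</leeftijd>
  </record>
</records>
"""


def xml_to_crude_triples(xml_text, prefix="crude:"):
    """One resource per <record>; one literal-valued property per child element."""
    root = ET.fromstring(xml_text)
    triples = []
    for record in root.iter("record"):
        subject = prefix + "record-" + record.get("id")
        triples.append((subject, "rdf:type", prefix + "Record"))
        for child in record:
            value = (child.text or "").strip()
            if value:  # skip empty elements
                triples.append((subject, prefix + child.tag, value))
    return triples


if __name__ == "__main__":
    for t in xml_to_crude_triples(XML):
        print(t)
```

The crude output mirrors the source structure one-to-one. This keeps the conversion simple and lossless and leaves modelling decisions to the later restructuring step.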

Page 16

Noordelijke Monsterrollen

Page 17
Page 18

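Model

[Diagram: the model for one muster record (aanmonstering), mdb:aanmonstering-gron_nsm-1868-2, shown with
– a ship, gzmvoc:schip-gron_nsm-1868-2-Frouwke (Frouwke, Smak)
– a person, gzmvoc:persoon-gron_nsm-1868-2-Harm_Klaassens_Heins (Harm Klaassens Heins)
– a person contract, gzmvoc:persoonscontract-gron_nsm-1868-2-Harm_Klaassens_Heins (rank "kapitein", i.e. captain)
Other values shown: "1868-01-21", "66", Leeftijd (age) 46]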
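The model splits one flat muster entry into separate resources for the muster, the ship, the person and the person's contract. Each resource gets a URI minted from the record ID plus a name. The sketch below illustrates that restructuring step (step 3 on the Methods slide). Only the URI patterns and dss:has_shipType come from the slides. The input field names and the `ex:` linking properties are hypothetical.

```python
# Minimal sketch: restructure one flat muster entry into linked resources.
# URI patterns follow the slide; input field names and "ex:" properties are
# hypothetical.

def slug(text):
    """Make a URI-safe fragment: 'Harm Klaassens Heins' -> 'Harm_Klaassens_Heins'."""
    return "_".join(text.split())


def restructure(record_id, row):
    muster = f"mdb:aanmonstering-{record_id}"
    ship = f"gzmvoc:schip-{record_id}-{slug(row['ship_name'])}"
    person = f"gzmvoc:persoon-{record_id}-{slug(row['full_name'])}"
    contract = f"gzmvoc:persoonscontract-{record_id}-{slug(row['full_name'])}"
    return [
        (muster, "ex:date", row["date"]),
        (muster, "ex:ship", ship),
        (muster, "ex:contract", contract),
        (ship, "ex:name", row["ship_name"]),
        (ship, "dss:has_shipType", row["ship_type"]),
        (contract, "ex:person", person),
        (contract, "ex:rank", row["rank"]),
        (person, "ex:name", row["full_name"]),
    ]


if __name__ == "__main__":
    row = {
        "date": "1868-01-21",
        "ship_name": "Frouwke",
        "ship_type": "Smak",
        "full_name": "Harm Klaassens Heins",
        "rank": "kapitein",
    }
    for t in restructure("gron_nsm-1868-2", row):
        print(t)
```

Separating the contract from the person leaves room for one person to have several contracts across musters. That separation is what later record linkage relies on.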

Page 19
Page 20

Conversion: Generale Zeemonsterrollen

Page 21

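Model

[Diagram: a count record (telling), gzmvoc:telling-3659-Marsseveen, shown with the ship gzmvoc:schip-3659-Marsseveen (Marsseveen, Schip) and the skipper gzmvoc:schipper-3659-Tollen (Gerrit van der Tollen). Literal values:
– "NB: Ervaren onderstuurman Thomas Aldermark (Stokholm, 32 g, Meijenberg 1734), derdewaak Pieter Terduijn (Altena, 26 g, Opperdoes 1735)" (a note on an experienced second mate, Thomas Aldermark, and a third mate, Pieter Terduijn)
– "5188 -> F6095"
– "21 gemeene zoldaaten" (21 common soldiers)]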

Page 22
Page 23

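[Diagram: a second count record, gzmvoc:telling-7271-Marsseveen, shown with the existing ship resource gzmvoc:schip-3659-Marsseveen (Marsseveen, Schip, "5188 -> F6095") and the value "55 soldaten" (55 soldiers); a count record for another ship, gzmvoc:telling-2881-Eendracht, shown with the skipper gzmvoc:schipper-2881-Tollen (Gerrit v.d.Tollen).]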

Page 24

[Diagram: the same as on Page 23, with a question mark added after Gerrit v.d.Tollen (gzmvoc:schipper-2881-Tollen), questioning whether he is the same skipper as Gerrit van der Tollen (gzmvoc:schipper-3659-Tollen).]
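Deciding whether "Gerrit v.d.Tollen" and "Gerrit van der Tollen" are the same skipper starts with making the spelling variants comparable. The sketch below assumes a small, hand-made set of patterns for abbreviated Dutch surname particles (tussenvoegsels). The patterns and function names are hypothetical. Equal normalized names are evidence for identity, not proof. The question mark on the slide reflects exactly that uncertainty.

```python
# Minimal sketch: normalize abbreviated Dutch name particles so spelling
# variants compare equal. The pattern list is a small hypothetical example.

import re

# Order matters: the longer abbreviation must be expanded before the shorter.
PARTICLE_PATTERNS = [
    (re.compile(r"\bv\.\s*d\.\s*"), "van der "),
    (re.compile(r"\bv\.\s*"), "van "),
]


def normalize_name(name):
    """Lowercase, expand abbreviated particles, collapse whitespace."""
    text = name.lower()
    for pattern, replacement in PARTICLE_PATTERNS:
        text = pattern.sub(replacement, text)
    return " ".join(text.split())


def same_name(a, b):
    """True if two names are equal after normalization."""
    return normalize_name(a) == normalize_name(b)


if __name__ == "__main__":
    print(same_name("Gerrit v.d.Tollen", "Gerrit van der Tollen"))
```

A name match like this would typically be combined with other evidence, such as overlapping dates or the same ship, before a link between the two skipper resources is asserted.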

Page 25

Identifying ships – Robin Ponstein

• Identify ships within a dataset
  – Based on: name, size, type, destinations, etc.
  – Background knowledge
• Gold standard created by Jur Leinenga
• Baseline algorithm: 74%
• How dataset-specific is this task?
• Save results as separate graphs, with provenance

Date       | ShipName   | ShipType | ShipSize | HomePort | CurrentPort          | Captain
1852-02-27 | Alberdiena | kof      | NULL     | NULL     | Noorwegen (N)        | Wolkammer Albert Augustinus
1852-07-31 | Alberdina  | kof      | NULL     | Farmsum  | Friedrichstadt (D)   | Wolkammer Albert A.
1861-09-30 | Alberdina  | kof      | 98       | NULL     | Gdansk, Danzig (PL)  | Wolkammer Albert Augustinus
1870-03-08 | Alberdina  | brik     | 222      | NULL     | NULL                 | Wolkammer Albert Augustinus
1875-09-22 | Alberdina  | bark     | 309      | NULL     | Oostzee              | Wolkammer Augustinus
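The table shows why ship identification is hard. The same vessel may be spelled Alberdiena or Alberdina, and the recorded type and size vary across records. The sketch below is a hypothetical pairwise matching rule. It is not the baseline algorithm whose 74% score appears on the slide. It combines name similarity, measured with Python's difflib.SequenceMatcher, with agreement on the captain's surname and ship type. The weights and the 0.8 threshold are assumptions.

```python
# Minimal sketch: decide whether two records describe the same ship.
# The weights and the 0.8 threshold are hypothetical choices.

from difflib import SequenceMatcher

records = [
    {"date": "1852-02-27", "name": "Alberdiena", "type": "kof", "captain": "Wolkammer Albert Augustinus"},
    {"date": "1852-07-31", "name": "Alberdina", "type": "kof", "captain": "Wolkammer Albert A."},
    {"date": "1861-09-30", "name": "Alberdina", "type": "kof", "captain": "Wolkammer Albert Augustinus"},
    {"date": "1870-03-08", "name": "Alberdina", "type": "brik", "captain": "Wolkammer Albert Augustinus"},
    {"date": "1875-09-22", "name": "Alberdina", "type": "bark", "captain": "Wolkammer Augustinus"},
]


def name_similarity(a, b):
    """String similarity between 0 and 1, case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def surname(captain):
    # Assumption: the surname comes first, as in the table above.
    return captain.split()[0].lower()


def match_score(r1, r2):
    """Weighted evidence (0..1) that r1 and r2 describe the same ship."""
    score = 0.6 * name_similarity(r1["name"], r2["name"])
    score += 0.3 * (surname(r1["captain"]) == surname(r2["captain"]))
    # The recorded type varies across these records, so it is weighted lightly.
    score += 0.1 * (r1["type"] == r2["type"])
    return score


def same_ship(r1, r2, threshold=0.8):
    return match_score(r1, r2) >= threshold


if __name__ == "__main__":
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            print(records[i]["date"], records[j]["date"],
                  round(match_score(records[i], records[j]), 2))
```

Fixed weights like these are easy to inspect against a gold standard. Dataset-specific variation (for example, whether ports or sizes are filled in) would likely call for different weights per source.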

Page 26

Linking to Historical newspapers - Andrea Bravo Balado

• Using existing data about ships to link to news items in a collection of historical newspapers

• Performing limited information extraction to enrich existing records

• Features: ship name, time intervals, captain’s names, ship type, named entities, keywords, background knowledge
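Several of the features above can be combined into a simple candidate filter for newspaper items. The sketch below assumes hypothetical article records with a publication date and OCR text, a hypothetical one-year window around the record date, and plain substring tests. It shows how ship name, time interval, captain's name and ship type can work together. It does not reproduce the features or weights actually used in the project.

```python
# Minimal sketch: score a newspaper article as a candidate link for a ship
# record. Article structure, window size and scoring are hypothetical.

from datetime import date, timedelta


def candidate_score(record, article, window_days=365):
    """Number of agreeing features; 0 means the article is not a candidate."""
    text = article["text"].lower()
    if record["ship_name"].lower() not in text:
        return 0  # the ship name is required
    score = 1
    if abs(article["date"] - record["date"]) <= timedelta(days=window_days):
        score += 1
    if record["captain_surname"].lower() in text:
        score += 1
    if record["ship_type"].lower() in text:
        score += 1
    return score


if __name__ == "__main__":
    record = {"ship_name": "Frouwke", "ship_type": "Smak",
              "captain_surname": "Heins", "date": date(1868, 1, 21)}
    # Hypothetical article; the text is invented for illustration.
    article = {"date": date(1868, 3, 2), "text": "Frouwke, Heins"}
    print(candidate_score(record, article))
```

Plain substring tests can produce false matches, especially for short type names such as "kof". A fuller version would match on tokens or named entities.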

Page 27

Current status

• Input data set: Noordelijke Monsterrollen

• “Semi-supervised learning”
  – Multiple versions of the algorithm
  – Evaluation done by an expert (Jur Leinenga)

• Current version: 94% precision; 9,739 records have one or more links

Example: http://purl.org/collections/nl/dss/mdb/aanmonstering-del_gem-1879-101

Page 28

Short demo

http://semanticweb.cs.vu.nl/dss/home

Page 29

“To do”

• Example application (map)
• Query interface

• Provenance
  – How to represent (un)certainty for graphs?

• Link records to source images

• Infrastructure @ Huygens ING

• Link to other VU Hist data sources!
  – DATATHON 2-4-2014!

Page 30

DataLab

Questions?

Victor de Boer - WAI - 17-3-2014