datos.bne.es: Publishing and Consuming

datos.bne.es: Publishing and consuming
Daniel Vila Suero
Ontology Engineering Group, Universidad Politécnica de Madrid
Acknowledgements: OEG members, BNE team (Elena Escolano, Marina Jimenez Piano, Ana Manchado, Mar Hernández Agustí, Ricardo Santos and others)
2nd Linked Open Data Conference from the Cataloguing and Indexing Group in Scotland (CIGS)
Edinburgh, 21st September 2012

Description

Talk at the 2nd Linked Open Data Conference of the Cataloguing and Indexing Group in Scotland (CIGS), held in Edinburgh, Scotland, on 21st September 2012.

Transcript of datos.bne.es: Publishing and Consuming

Page 1: datos.bne.es: Publishing and Consuming


Page 2: datos.bne.es

Page 3: Background

•  An initiative of the Biblioteca Nacional de España (BNE) together with OEG-UPM (Madrid).

•  Multidisciplinary effort: librarians, computer scientists, linguists…

•  Close collaboration between library experts and computer scientists.

•  Initiated as a small-scale proof of concept: the "Cervantes dataset", using IFLA vocabularies (FRBR, ISBD) and others (MADS, RDA…)

Page 4: Main goals

•  Perform the transformation incrementally and iteratively

•  Develop a system where library experts can define and assess the mappings to RDF independently of the IT staff

•  Be vocabulary-agnostic (BNE uses FRBR as its core model, but the system would also allow it to use, for example, RDA)

•  Have a clear picture of the source data before starting the transformation (this helps detect possible deficiencies in the source data)

Page 5: Some figures

•  Total number of authority records: 4.100.000
•  Total number of bibliographic records: 2.390.140
•  Total number of RDF triples: 58.053.215
•  Number of links (15% of authorities): 587.520
•  Linked sources:
   -  VIAF
   -  SUDOC (French Collective University Catalogue), France
   -  GND (German National Library authorities), Germany
   -  LIBRIS, Sweden
   -  DBpedia
   -  Soon: BnF, BNB, German Bibliographie

Page 6: Some statistics

Number of resources per class:
•  Manifestation: 2.390.103
•  Work: 1.969.526
•  Person: 1.163.764
•  Expression: 1.114.719
•  Thema: 497.644
•  Corporate Body: 282.879

Page 7: Some statistics

[Bar chart; category labels are not preserved in the transcript. Values shown: 2.129.222, 1.246.773, 1.054.736, 85.347, 78.561, 16.462, 755]

Page 8: Publishing

Page 9: Our data model

[Diagram of the data model; the legend distinguishes classes, object properties and datatype properties (elements)]
•  Classes: frbr:Work, frbr:Expression, frbr:Manifestation, frbr:Person, frbr:Corporate Body and frsad:Thema
•  Object properties: is creator of / is created by; is realized through / is realization of; is embodied in / is embodiment of; is realized by / is realizer of; has subject / is subject of; is part of; is subordinate of
•  Datatype properties (elements) taken from the frbr, frad, frsad and isbd vocabularies
•  Prefixes:
   -  frbr: http://iflastandards.info/ns/fr/frbr/frbrer/
   -  frad: http://iflastandards.info/ns/fr/frad/
   -  frsad: http://iflastandards.info/ns/fr/frsad/
   -  isbd: http://iflastandards.info/ns/isbd/elements/
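To make the model concrete, here is a minimal sketch that writes a few statements in Turtle using the prefixes above. The resources under `ex:` are hypothetical, and the class and property local names (`frbr:Work`, `frbr:isCreatorOf`, ...) simply mirror the labels on the slides; the published IFLA namespaces may use different identifiers.

```python
"""Minimal sketch: serialize a few data-model statements as Turtle (stdlib only)."""

# Prefixes as listed on the data-model slide.
PREFIXES = {
    "frbr": "http://iflastandards.info/ns/fr/frbr/frbrer/",
    "frad": "http://iflastandards.info/ns/fr/frad/",
    "frsad": "http://iflastandards.info/ns/fr/frsad/",
    "isbd": "http://iflastandards.info/ns/isbd/elements/",
}
EX = "http://example.org/"  # hypothetical namespace for the example resources

# Local names mirror the slide labels (assumption: not necessarily the IFLA identifiers).
TRIPLES = [
    ("ex:cervantes", "a", "frbr:Person"),
    ("ex:quijote", "a", "frbr:Work"),
    ("ex:cervantes", "frbr:isCreatorOf", "ex:quijote"),
    ("ex:quijote", "frbr:isRealizedThrough", "ex:quijote-spa"),
    ("ex:quijote-spa", "frbr:isEmbodiedIn", "ex:manifestation-1"),
]


def to_turtle(triples, prefixes, ex_base=EX):
    """Return a Turtle document: prefix declarations, a blank line, then one triple per line."""
    lines = [f"@prefix {name}: <{iri}> ." for name, iri in prefixes.items()]
    lines.append(f"@prefix ex: <{ex_base}> .")
    lines.append("")
    lines.extend(f"{s} {p} {o} ." for s, p, o in triples)
    return "\n".join(lines) + "\n"


print(to_turtle(TRIPLES, PREFIXES))
```

The chain Person, Work, Expression, Manifestation follows the "is creator of", "is realized through" and "is embodied in" relationships listed above.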

Page 10: Transformation process

•  How can the mapping process be made easier for library experts?
   1.  Use a familiar and intuitive interface: spreadsheets
   2.  Work only on what is actually in the database: pre-process the records to build the spreadsheets
•  A 3-step process, with 3 different spreadsheets:
   1.  Classification: is it a Person? A Work? A Manifestation?
   2.  Annotation: name, birth date, title, language of expression
   3.  Relation: find relationships between entities (e.g. a Person is creator of a certain Work)
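A minimal sketch of the pre-processing idea, assuming the records are already parsed into (tag, subfields) pairs: it counts every field/subfield combination actually present in the data and writes one spreadsheet row per combination, leaving an empty column for the librarian's mapping. The record layout, column names and sample data are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical toy input: each record is a list of (tag, [(subfield code, value), ...]).
RECORDS = [
    [("100", [("a", "Cervantes Saavedra, Miguel de"), ("t", "Don Quijote de la Mancha")])],
    [("100", [("a", "Cervantes Saavedra, Miguel de")])],
    [("100", [("a", "Vega, Lope de")])],
]


def profile(records):
    """Count each field/subfield pattern actually present (e.g. '100 $a $t') and keep a sample value."""
    counts = Counter()
    examples = {}
    for record in records:
        for tag, subfields in record:
            pattern = tag + " " + " ".join("$" + code for code, _ in subfields)
            counts[pattern] += 1
            examples.setdefault(pattern, " ".join(value for _, value in subfields))
    return counts, examples


def to_spreadsheet(counts, examples):
    """Render one CSV row per pattern, most frequent first, with an empty column for the mapping."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["pattern", "occurrences", "example", "maps_to"])
    for pattern, n in counts.most_common():
        writer.writerow([pattern, n, examples[pattern], ""])
    return buf.getvalue()


counts, examples = profile(RECORDS)
print(to_spreadsheet(counts, examples))
```

Because only patterns that occur in the records become rows, librarians never map fields that the catalogue does not use, and the occurrence counts give an early picture of the source data.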

Page 11

[Diagram: the pre-processing step, from MARC 21 data through the MARC 21 structure (headings, subfields and their content) to RDFS/OWL; librarians manually define the mappings]
•  MARC 21 data: 100 $a Cervantes Saavedra, Miguel de $t Don Quijote de la Mancha
•  Headings found by pre-processing: 100 $a (content: "Cervantes Saavedra, Miguel de") and 100 $a $t (content: "Cervantes Saavedra, Miguel de $t Don Quijote de la Mancha")
•  Mappings defined by librarians:
   -  100 $a maps to the class frbr:Person
   -  100 $a $t maps to the class frbr:Work
   -  100 $a maps to the datatype property frbr:nameOfPerson
   -  100 $t maps to the datatype property frbr:titleOfWork
   -  The variation 100 $a + $t maps to the object property frbr:isCreatorOf (the Person is creator of the Work)
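The Classification, Annotation and Relation steps applied to this example can be sketched as follows. This is not the datos.bne.es tool: the mapping tables hard-code the mappings shown on the slide, while the URI base, the URI-minting scheme (a SHA-1 hash of the heading) and the dict representation of field 100 are assumptions for illustration.

```python
import hashlib

BASE = "http://example.org/resource/"  # hypothetical URI base

# Classification: heading pattern (subfields of field 100) -> class, as on the slide.
CLASS_MAP = {("a",): "frbr:Person", ("a", "t"): "frbr:Work"}
# Annotation: (class, subfield) -> datatype property, as on the slide.
DATATYPE_MAP = {
    ("frbr:Person", "a"): "frbr:nameOfPerson",
    ("frbr:Work", "t"): "frbr:titleOfWork",
}
# Relation: the variation 100 $a + $t links the Person to the Work.
RELATION_MAP = {("frbr:Person", "frbr:Work"): "frbr:isCreatorOf"}


def mint(heading: str) -> str:
    """Mint a deterministic (hypothetical) URI from a heading string."""
    return BASE + hashlib.sha1(heading.encode("utf-8")).hexdigest()[:12]


def map_field_100(subfields: dict) -> list:
    """Classify, annotate and relate the entities found in one MARC 100 field."""
    entities = {}
    triples = []
    for pattern, cls in CLASS_MAP.items():
        if all(code in subfields for code in pattern):
            heading = " ".join(f"${c} {subfields[c]}" for c in pattern)
            uri = mint(heading)
            entities[cls] = uri
            triples.append((uri, "rdf:type", cls))  # classification
            for (dcls, code), prop in DATATYPE_MAP.items():
                if dcls == cls and code in subfields:
                    triples.append((uri, prop, subfields[code]))  # annotation
    for (src, dst), prop in RELATION_MAP.items():
        if src in entities and dst in entities:
            triples.append((entities[src], prop, entities[dst]))  # relation
    return triples


field_100 = {"a": "Cervantes Saavedra, Miguel de", "t": "Don Quijote de la Mancha"}
for triple in map_field_100(field_100):
    print(triple)
```

Keeping the mappings in plain tables, separate from the code, reflects the goal of letting library experts change them without IT involvement.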

Page 12: Mapping process

Open mappings at: http://bne.linkeddata.es/mapping-marc21

Page 13: Mapping process

Page 14: Mapping process

Page 15: Still a lot of work to do

•  We cover only the core FRBR relationships

•  A significant number of manifestations are not yet linked to their expressions; we are currently looking at more sophisticated clustering techniques

•  Manifestations are not yet linked to the corresponding digitized materials in the digital library (Biblioteca Digital Hispánica); the next version (to be published this year) will contain these links

•  The classification step can be further automated
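For illustration only, a naive baseline for attaching manifestations to candidate expressions is to group them by normalized title and language. This is a simple stand-in, not one of the more sophisticated techniques mentioned above; the record fields (`id`, `title`, `lang`) and the sample records are hypothetical.

```python
import unicodedata
from collections import defaultdict


def normalize(title):
    """Strip accents and punctuation, lowercase, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in no_accents.lower())
    return " ".join(cleaned.split())


def cluster(manifestations):
    """Group manifestation ids by (normalized title, language) as candidate expressions."""
    groups = defaultdict(list)
    for m in manifestations:
        groups[(normalize(m["title"]), m["lang"])].append(m["id"])
    return dict(groups)


# Hypothetical sample records.
MANIFESTATIONS = [
    {"id": "m1", "title": "Don Quijote de la Mancha", "lang": "spa"},
    {"id": "m2", "title": "Don Quijote de la Mancha.", "lang": "spa"},
    {"id": "m3", "title": "DON QUIJOTE DE LA MANCHA", "lang": "spa"},
    {"id": "m4", "title": "Don Quichotte de la Manche", "lang": "fre"},
]
print(cluster(MANIFESTATIONS))
```

A key this simple splits manifestations whose titles vary beyond case and punctuation, and merges different works that share a title, which is why better clustering is needed.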

Page 16: Consuming

Page 17: Perspectives

•  2 different perspectives:
   -  Systems and applications:
      •  SPARQL endpoint
      •  Linked Data API
   -  End-user interfaces
•  Plus an interesting side effect:
   -  By applying FRBR and the RDF mappings we can improve the catalogue (and we did)
•  Using standard web technologies and more intuitive models, we open the door to:
   -  Data analytics and cleansing, catalogue enrichment, reuse by smaller institutions…
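Applications can query a SPARQL endpoint using the SPARQL 1.1 Protocol, which allows sending a query with HTTP GET in the `query` parameter. The sketch below only builds the request and does not send it; the endpoint URL is an assumption, and the property names mirror the slide labels rather than confirmed identifiers.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://datos.bne.es/sparql"  # hypothetical endpoint URL

# Works ranked by number of manifestations; property names mirror the slide labels.
QUERY = """PREFIX frbr: <http://iflastandards.info/ns/fr/frbr/frbrer/>
SELECT ?work (COUNT(?manifestation) AS ?n)
WHERE {
  ?work frbr:isRealizedThrough ?expression .
  ?expression frbr:isEmbodiedIn ?manifestation .
}
GROUP BY ?work
ORDER BY DESC(?n)
LIMIT 10
"""


def sparql_get_request(endpoint, query):
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = sparql_get_request(ENDPOINT, QUERY)
print(req.full_url)
```

To execute it against a live endpoint, the request can be passed to `urllib.request.urlopen`.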

Page 18: Graph analysis example

[Graph of Miguel de Cervantes, the work Don Quijote de la Mancha (frbr:Work) and clusters of manifestations by work and language; numbers in parentheses are numbers of resources]
•  Don Quijote de la Mancha, Spanish manifestations (840)
•  Don Quijote de la Mancha, English manifestations (247)
•  Don Quijote de la Mancha, French manifestations (213)
•  Don Quijote de la Mancha, German manifestations (49)
•  Novelas Ejemplares, Spanish manifestations (303)
•  Entremeses, Spanish manifestations (86)

Edges: frbr:Person frbr:isCreatorOf frbr:Work; frbr:Work frbr:isRealizedThrough frbr:Expression; frbr:Expression frbr:isEmbodiedIn frbr:Manifestation

Using open-source tools, for example Gephi: http://bne.linkeddata.es/graphvis
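Gephi can import an edge list as a spreadsheet with `Source` and `Target` columns, plus an optional `Weight`. A minimal sketch that writes the Don Quijote part of the graph above in that layout, using the counts shown on the slide as edge weights; the node labels and the choice of weights are illustrative assumptions.

```python
import csv
import io

# Manifestation counts per language for Don Quijote, as shown on the slide.
QUIJOTE_BY_LANGUAGE = [("Spanish", 840), ("English", 247), ("French", 213), ("German", 49)]


def gephi_edge_list(author, work, by_language):
    """Return CSV text with Source/Target/Weight columns for a Gephi edges table."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerow([author, work, 1])  # creator edge, unit weight
    for language, count in by_language:
        writer.writerow([work, f"{work} ({language} manifestations)", count])
    return buf.getvalue()


print(gephi_edge_list("Miguel de Cervantes", "Don Quijote de la Mancha", QUIJOTE_BY_LANGUAGE))
```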

Page 19: Enabling access to systems and apps

Linked Data API: http://datos.bne.es/frontend/persons

Page 20: Flexible access to data

Out of the box:
•  Search by every field
•  Access clusters of resources
•  Filtering
•  Paging
•  Serving multiple formats: XML, Turtle, JSON
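A minimal sketch of how a client might combine filtering, paging and format selection against the API listed above. It only builds the request. The parameter names (`_page`, `_pageSize`, and a field name such as `name` used as a filter) follow the conventions of the Linked Data API specification and are an assumption for this deployment, as is choosing the format through the HTTP `Accept` header.

```python
from urllib.parse import urlencode
from urllib.request import Request

API = "http://datos.bne.es/frontend/persons"  # URL from the slide (2012)

# Media types for the formats listed on the slide.
MEDIA_TYPES = {
    "xml": "application/xml",
    "turtle": "text/turtle",
    "json": "application/json",
}


def api_request(fmt, page=1, page_size=10, **filters):
    """Build (but do not send) a filtered, paged request; parameter names are assumptions."""
    if fmt not in MEDIA_TYPES:
        raise ValueError(f"unsupported format: {fmt}")
    params = dict(filters)
    params.update({"_page": page, "_pageSize": page_size})
    url = API + "?" + urlencode(params)
    return Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


req = api_request("turtle", page=2, name="Cervantes")
print(req.full_url, req.get_header("Accept"))
```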

Page 21: Different views over the data

[Screenshots: HTML and XML views of the data]

Page 22: End-user interfaces

Current linked data opens the door to:
•  Re-ranking OPAC results
•  Better clustering of results
•  Recommendation
•  Enriching data with other sources
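As one hypothetical illustration of re-ranking, OPAC results could be reordered by how many manifestations each work has in the linked data, a crude popularity signal chosen only for this sketch. The counts are the Spanish-manifestation figures from the graph-analysis slide; the heuristic itself is an assumption, not a feature described for datos.bne.es.

```python
def rerank(results, manifestation_counts):
    """Order works by descending manifestation count; works without a count go last."""
    # sorted() is stable, so works with equal counts keep their original relative order.
    return sorted(results, key=lambda work: -manifestation_counts.get(work, 0))


# Spanish-manifestation counts shown on the graph-analysis slide.
COUNTS = {"Don Quijote de la Mancha": 840, "Novelas Ejemplares": 303, "Entremeses": 86}

opac_results = ["Entremeses", "Novelas Ejemplares", "La Galatea", "Don Quijote de la Mancha"]
print(rerank(opac_results, COUNTS))
```

A real ranking would combine such a signal with text relevance rather than replace it.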