Post on 15-Jul-2020
|
LDIF – LINKED DATA INTEGRATION FRAMEWORK
Christian Becker
Andrea Matteini
|
WHAT IS LINKED DATA?
• Raw data (RDF)
• Accessible on the web
• Data can link to other data sources
• Benefits: Ease of access and re-use; enables discovery
[Figure: data sources A, B, C, D and E, each containing several "Thing" nodes, connected to one another by data links]
|
LINKING OPEN DATA CLOUD
As of September 2011
Legend: Media, Geographic, Publications, Government, Cross-domain, Life sciences, User-generated content
[Figure: Linking Open Data cloud diagram showing the published datasets (for example DBpedia, GeoNames, MusicBrainz, UniProt, Freebase) and the links between them]
Source: http://lod-cloud.net
|
TYPES OF LINKED DATA
• Internal Linked Data
• Open, Public Data (LOD Cloud)
• Commercial Linked Data (coming soon?)
... AND WHAT YOU CAN DO WITH THEM
• Provide interfaces on top of them
• Augment your website
• Integrate them into your application logic
• Create specialized data marts
|
AUGMENT YOUR WEBSITE: BBC
BBC online properties make intensive use of data from Wikipedia and MusicBrainz
|
DATA MARTS: NEUROWIKI
• NeuroWiki creates views over gene, drug and disease data from four RDF data sources
• Provides navigation and composition tools for accessing and mining the data
|
LINKED DATA CHALLENGES
• Data sources that overlap in content ...
• ... use a wide range of different RDF vocabularies
• ... use different identifiers for the same real-world entity
• Implications:
• Queries are usually hand-crafted against individual sources – no different than an API
• Improvised or manual merging of entities
• Integrating public datasets with internal databases poses the same problems
OPTIONAL { ?ow fb:location.location.containedby [ ot:preferredLabel ?city_fb_con ] } .
OPTIONAL { ?ow dbp-prop:location ?loc .
           ?loc rdf:type umbel-sc:City ;
                ot:preferredLabel ?city_db_loc }
OPTIONAL { ?ow dbp-ont:city [ ot:preferredLabel ?city_db_cit ] }
Source: http://www.readwriteweb.com/archives/the_modigliani_test_for_linked_data.php
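The query fragment above probes three different properties for the same fact (the city of a work). A minimal Python sketch of the equivalent client-side merging follows. It assumes each source record has already been fetched as a dictionary; the property list is modelled on the fragment, and the sample records are invented for illustration.

```python
# Property names modelled on the query fragment above. The sample records
# are invented for illustration and do not come from any real dataset.
CITY_PROPERTIES = [
    "fb:location.location.containedby",
    "dbp-prop:location",
    "dbp-ont:city",
]


def first_city_label(record):
    """Return the first city label found under any known property, else None."""
    for prop in CITY_PROPERTIES:
        value = record.get(prop)
        if value:
            return value
    return None


records = [
    {"fb:location.location.containedby": "Paris"},
    {"dbp-ont:city": "New York"},
    {"rdfs:label": "Unknown work"},  # no city property at all
]
labels = [first_city_label(r) for r in records]
print(labels)
```

Every additional vocabulary means another entry in the property list, which is the maintenance burden the slide describes.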
|
LDIF – LINKED DATA INTEGRATION FRAMEWORK
• Open source (Apache License, Version 2.0)
• Collaboration between Freie Universität Berlin and mes|semantics
• Supported in part by Vulcan Inc. as part of its Project Halo and by the EU FP7 project LOD2 - Creating Knowledge out of Interlinked Data (Grant No. 257943).
1. Collect data: Managed download and update
2. Translate data into a single target vocabulary
3. Resolve identifier aliases into local target URIs
4. Output
|
LDIF PIPELINE
Step 1 of 4: Collect data
Supported data sources:
• RDF dumps (various formats)
• SPARQL endpoints
• Crawling Linked Data
|
LDIF PIPELINE
Step 2 of 4: Translate data
Data sources use a wide range of different RDF vocabularies
• Source terms: dbpedia-owl:City, schema:Place, fb:location.citytown
• Target term: local:City
R2R
• Mappings expressed in RDF (Turtle)
• Simple mappings using OWL / RDFS statements (x rdfs:subClassOf y)
• Complex mappings with SPARQL expressivity
• Transformation functions
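The idea behind the translation step can be sketched in a few lines of Python. This is only an illustration of class-term rewriting under stated assumptions, not R2R itself: real R2R mappings are written in Turtle, and the mapping table, function name and sample triples below are hypothetical. schema:Place is deliberately left out of the table, since it is broader than a city and would presumably need a conditional (complex) mapping.

```python
# Assumed mapping table: source class -> target class, using terms named on
# the slide. This is a stand-in for R2R mappings, which are Turtle documents.
CLASS_MAPPINGS = {
    "dbpedia-owl:City": "local:City",
    "fb:location.citytown": "local:City",
}


def translate(triples, mappings=CLASS_MAPPINGS):
    """Rewrite the object of rdf:type triples into the target vocabulary."""
    translated = []
    for subject, predicate, obj in triples:
        if predicate == "rdf:type" and obj in mappings:
            obj = mappings[obj]
        translated.append((subject, predicate, obj))
    return translated


# Hypothetical input triples from two sources.
source_triples = [
    ("dbpedia:Berlin", "rdf:type", "dbpedia-owl:City"),
    ("fb:en.berlin", "rdf:type", "fb:location.citytown"),
    ("dbpedia:Berlin", "rdfs:label", "Berlin"),
]
target_triples = translate(source_triples)
for triple in target_triples:
    print(triple)
```

After translation, both sources describe their resource with the same class, so downstream code only needs to know the target vocabulary.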
|
LDIF PIPELINE
Step 3 of 4: Resolve identities
Data sources use different identifiers for the same entity
Silk
• Profiles expressed in XML
• Supports various comparators and transformations
[Figure: the label "Berlin" at 52° 31′ N, 13° 24′ E has several candidate matches (Berlin, Germany; Berlin, CT; Berlin, MD; Berlin, NJ; Berlin, MA); comparing coordinates yields Berlin = Berlin, Germany. Other diagram labels: rdf:type wiki:Gene, wiki:IsInvolvedIn]
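The Berlin example can be approximated with a small Python sketch that combines a label comparison with a distance comparison. This is not Silk's API or its XML profile format; the function names, the 50 km threshold and the candidate coordinates (rough values chosen for illustration) are all assumptions.

```python
import math


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def resolve(entity, candidates, max_km=50.0):
    """Return labels of candidates that share the name and lie within max_km."""
    matches = []
    for cand in candidates:
        same_name = cand["label"].lower().startswith(entity["label"].lower())
        close = distance_km(entity["lat"], entity["lon"],
                            cand["lat"], cand["lon"]) <= max_km
        if same_name and close:
            matches.append(cand["label"])
    return matches


# 52° 31′ N, 13° 24′ E from the slide, converted to decimal degrees.
berlin = {"label": "Berlin", "lat": 52.5167, "lon": 13.4}
# Candidate coordinates are approximate and only illustrative.
candidates = [
    {"label": "Berlin, Germany", "lat": 52.52, "lon": 13.41},
    {"label": "Berlin, CT", "lat": 41.62, "lon": -72.75},
    {"label": "Berlin, NJ", "lat": 39.79, "lon": -74.93},
]
matches = resolve(berlin, candidates)
print(matches)
```

The label alone matches every candidate; only the additional coordinate comparison narrows the result to a single entity.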
|
LDIF PIPELINE
Step 4 of 4: Output
Output options:
• N-Quads
• N-Triples
• SPARQL Update Stream
• Provenance tracking using Named Graphs
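To illustrate how N-Quads output and named graphs support provenance tracking, here is a minimal Python sketch. It handles IRI terms only (no literals or blank nodes), and all example.org IRIs and function names are hypothetical; it is not LDIF's writer.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def to_nquad(subject, predicate, obj, graph):
    """Serialize one statement whose terms are all IRIs as an N-Quads line."""
    return "<{}> <{}> <{}> <{}> .".format(subject, predicate, obj, graph)


def graphs_for(subject, statements):
    """Return the sorted named graphs (sources) that mention a subject."""
    return sorted({g for s, _p, _o, g in statements if s == subject})


# Hypothetical integrated statements; the fourth element names the graph
# that records which source the statement came from.
statements = [
    ("http://example.org/resource/Berlin", RDF_TYPE,
     "http://example.org/vocab/City", "http://example.org/graph/source-a"),
    ("http://example.org/resource/Berlin", RDF_TYPE,
     "http://example.org/vocab/City", "http://example.org/graph/source-b"),
]
lines = [to_nquad(*statement) for statement in statements]
for line in lines:
    print(line)
```

Dropping the fourth term from each line gives the N-Triples form, at the cost of losing the information about where each statement originated.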
|
LINKED DATA APPLICATION ARCHITECTURE
[Figure: layered architecture diagram with the following labels]
• Application Layer: Application Code
• Data Access, Integration and Storage Layer (LDIF 0.4): Web Data Access Module, Data Translation Module, Identity Resolution Module, Quality Evaluation Module, Integrated Web Data, SPARQL or RDF API
• Web of Data, accessed over HTTP
• Publication Layer: LD Wrapper (Database A), LD Wrapper (Database B), RDF/XML, CMS (RDFa)
|
VERSIONS
• In-memory
  • keeps all intermediate results in memory
  • fast, but scalability is limited by local RAM
• RDF Store (TDB)
  • stores intermediate results in a Jena TDB RDF store
  • can process more data than the in-memory version, but does not scale across machines
• Cluster (Hadoop)
  • scales by parallelizing work across multiple machines using Hadoop
  • can process a virtually unlimited amount of data
|
BENCHMARKS
KEGG GENES VS. UNIPROT (CLUSTER)
[Chart: benchmark runs at 300M triples and 3.6B triples]
|
NEXT STEPS
• Support for Amazon Elastic MapReduce
• UIs for managing workflow and mappings
• Additional import and output modules
• Quality Evaluation and Data Fusion Module
|
Q & A
|
THANKS!
• Website: http://bit.ly/ldifweb
• Google Group: http://bit.ly/ldifgroup
• http://mes-semantics.com