Introduction to linked data and the semantic web
Linked data and its role in the semantic web
Dave Reynolds, Epimorphics Ltd
@der42
Roadmap
image: Leo Oosterloo @ flickr.com
What is linked data?
Examples
Modelling
Access
Strengths and weaknesses
Other topics
Linked data intro
Linked data ...
publishing data on the web ...
... to enable integration, linking and reuse across silos
Can’t we just publish data as files?
  PDF: easy to read and publish
  Excel: allows further processing and analysis
  CSV: processing without need for proprietary tools
But ...
  structure of data not explained
  no connection between different data sets, silos
  static and fixed – can’t retrieve just slices relevant to problem
Linked data
Apply the principles of the web to publication of data
The web:
  is a global network of pages
  each identified by a URL
  fetching a URL gives a document
  pages connected by links
  open, anyone can say anything about anything else
Linked data
Apply the principles of the web to publication of data
The linked data web:
  is a global network of things
  each identified by a URI
  fetching a URI gives a set of statements
  things connected by typed links
  open, anyone can say anything about anything else
Linked data is “data you can click on”
Example: schools information
[Figure: graph of linked resources, built up over four slides, showing the statements below]
http://education.data.gov.uk/id/school/401874
  label: “Cardiff High School”
  phase: school:PhaseOfEducation_Secondary (label: “Secondary”)
  district: http://statistics.data.gov.uk/id/local-authority-district/00PT (label: “Cardiff”)
http://statistics.data.gov.uk/id/local-authority-district/00PT
  same as: http://data.ordnancesurvey.co.uk/id/7000000000025484
http://data.ordnancesurvey.co.uk/id/7000000000025484
  label, contains ward, contains parish
  extent: GML: 310499.4 184176.6 310476.5 ...
Linked data principles
  Use URIs as names for things
  Use HTTP URIs so that people can look up those names
  When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
  Include links to other URIs, so that they can discover more things
Pattern of application of semantic web stack
Linked open data cloud: 2007
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
Linked open data cloud: 2009
Linked open data cloud: 2010
Data.gov.uk – linked datasets and APIs
Data.gov.uk – visualizations on top of linked data
Ordnance survey
Environment agency - data, API, visualizations
BBC – integration and site design
E-commerce and rich snippets
Overstock.com
Peek-cloppenburg.de
Internal use
Open?
Linked open data = linked data + open data
Modelling
Thing, entity, concept ... resource
  resource being described
    abstract concept
    real world thing
    data item, particular measurement
    document
  identify by URI
  provide information by making statements about those resources
  identifier NOT a container, c.f. UML
  open schema
    critical to open extensibility and integration
    similar to Entity-Attribute-Value modelling
Modelling – RDF – Resource Description Framework
Statement, triple, logical assertion
Subject Predicate Object
Modelling – RDF
Statement, triple, logical assertion

Subject                                        Predicate                                   Object
some school                                    has a name/label                            some literal
http://education.data.gov.uk/id/school/401874  has a name/label                            “Cardiff High School”
http://education.data.gov.uk/id/school/401874  http://www.w3.org/2000/01/rdf-schema#label  “Cardiff High School”
school:401874                                  rdfs:label                                  “Cardiff High School”

where
  school: = http://education.data.gov.uk/id/school/
  rdfs: = http://www.w3.org/2000/01/rdf-schema#
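The prefix shorthand can be sketched in a few lines of Python. This is only an illustration: the prefix table is taken from the slide, while the `expand` helper is a hypothetical name, not part of any RDF library.

```python
# Prefix table from the slide; expand() is a hypothetical helper for illustration.
PREFIXES = {
    "school": "http://education.data.gov.uk/id/school/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

def expand(name, prefixes=PREFIXES):
    """Turn a prefixed name such as 'rdfs:label' into a full URI."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {name!r}")
    return prefixes[prefix] + local

# The abbreviated triple from the slide; the object is a plain literal.
subject, predicate, obj = ("school:401874", "rdfs:label", "Cardiff High School")
full_triple = (expand(subject), expand(predicate), obj)
```

Expanding the subject and predicate gives back the full URIs shown in the earlier rows; the literal object is left untouched.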
Modelling – RDF
Statement, triple, logical assertion

Subject        Predicate                   Object
school:401874  rdfs:label                  “Cardiff High School”
school:401874  ont:districtAdministrative  la:00PT
la:00PT        rdfs:label                  “Cardiff”
la:00PT        rdfs:label                  “Caerdydd”@cy

[Figure: the same statements drawn as a graph, with school:401874 and la:00PT as nodes, their labels as literal nodes, and ont:districtAdministrative as the link between them]
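One way to picture a set of triples in code is as a plain Python set of 3-tuples. The sketch below assumes a hypothetical `Literal` type to carry the optional language tag; real RDF toolkits have richer term models.

```python
from collections import namedtuple

# A literal with an optional language tag, e.g. "Caerdydd"@cy (hypothetical type).
Literal = namedtuple("Literal", ["value", "lang"])

graph = {
    ("school:401874", "rdfs:label", Literal("Cardiff High School", None)),
    ("school:401874", "ont:districtAdministrative", "la:00PT"),
    ("la:00PT", "rdfs:label", Literal("Cardiff", None)),
    ("la:00PT", "rdfs:label", Literal("Caerdydd", "cy")),
}

def labels(graph, subject, lang=None):
    """Return the rdfs:label values of subject that carry the given language tag."""
    return sorted(
        o.value
        for s, p, o in graph
        if s == subject and p == "rdfs:label"
        and isinstance(o, Literal) and o.lang == lang
    )

def describe(graph, subject):
    """Return every (predicate, object) pair stated about subject."""
    return {(p, o) for s, p, o in graph if s == subject}
```

For example, `labels(graph, "la:00PT", "cy")` picks out the Welsh label, and `describe(graph, "school:401874")` returns the two statements made about the school. Adding a statement is just adding a tuple, which is the sense in which the schema is open.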
RDF Syntaxes
  RDF/XML
    normative
  Turtle
    more human readable/writable
    being standardized
  RDFa
    embed in (X)HTML
  [others omitted]
Modelling – RDF
RDF/XML syntax

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ont="http://education.data.gov.uk/def/school/"
         xmlns:la="http://statistics.data.gov.uk/id/local-authority-district/"
         xmlns:school="http://education.data.gov.uk/id/school/"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://education.data.gov.uk/id/school/401874">
    <rdfs:label>Cardiff High School</rdfs:label>
    <ont:districtAdministrative>
      <rdf:Description rdf:about="http://statistics.data.gov.uk/id/local-authority-district/00PT">
        <rdfs:label>Cardiff</rdfs:label>
      </rdf:Description>
    </ont:districtAdministrative>
  </rdf:Description>
</rdf:RDF>
Modelling – RDF
Turtle syntax

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix school: <http://education.data.gov.uk/id/school/> .
@prefix ont: <http://education.data.gov.uk/def/school/> .
@prefix la: <http://statistics.data.gov.uk/id/local-authority-district/> .

school:401874 rdfs:label "Cardiff High School" ;
    ont:districtAdministrative la:00PT .

la:00PT rdfs:label "Cardiff" .
Modelling
Vocabularies
  so far no actual models, let alone semantics
  want to define
    types of thing : Class
    what you can say about them : Property
  encode definitions in more RDF and publish at the corresponding URIs
  link from data to data model
  reuse published vocabularies to enable integration
  freely combine different vocabularies or new ones
Modelling – vocabularies
Logical modelling
  modelling the domain, not a particular data structure
  what exists
  what is asserted?
  what can you deduce from that?
  not about constraints as such
  monotonic, open world
controlled vocabulary
taxonomy
thesaurus
ontology
Ontology
Modelling – vocabularies
  unfamiliar terminology but related to
    information architecture and conceptual modelling
    domain-driven design
    ... and yes knowledge representation
Modelling – RDFS
RDF vocabulary description language
classes, types and type hierarchy
[Figure: class hierarchy graph, built up over four slides, showing the statements below]

ont:School              rdf:type         rdfs:Class
ont:School              rdfs:label       “School”
ont:WelshEstablishment  rdf:type         rdfs:Class
ont:WelshEstablishment  rdfs:subClassOf  ont:School
school:401874           rdf:type         ont:WelshEstablishment

school:401874           rdf:type         ont:School
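The type hierarchy supports a simple inference: an instance of a subclass is also an instance of the superclass. A minimal sketch follows, assuming (as the diagram appears to show) that ont:WelshEstablishment is a subclass of ont:School. The `infer_types` function is illustrative and covers only this one RDFS rule.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

triples = {
    ("ont:School", RDF_TYPE, "rdfs:Class"),
    ("ont:WelshEstablishment", RDF_TYPE, "rdfs:Class"),
    # Assumed direction of the subclass link, read off the slide diagram.
    ("ont:WelshEstablishment", SUBCLASS, "ont:School"),
    ("school:401874", RDF_TYPE, "ont:WelshEstablishment"),
}

def infer_types(triples):
    """Add (x rdf:type C2) whenever (x rdf:type C1) and (C1 rdfs:subClassOf C2)."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat until no new statement appears
        changed = False
        subclass_of = [(s, o) for s, p, o in inferred if p == SUBCLASS]
        for x, p, c1 in list(inferred):
            if p != RDF_TYPE:
                continue
            for sub, sup in subclass_of:
                new = (x, RDF_TYPE, sup)
                if sub == c1 and new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred

closure = infer_types(triples)
```

The closure contains the extra statement that school:401874 has type ont:School, which was never asserted directly. Looping until nothing changes also handles longer subclass chains.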
Modelling – RDFS
RDF vocabulary description language
properties, property hierarchy
[Figure: property hierarchy graph showing the statements below]

ont:headOf        rdf:type            rdf:Property
ont:headOf        rdfs:subPropertyOf  ont:staffAt
person:JoeBloggs  ont:headOf          school:401874

person:JoeBloggs  ont:staffAt         school:401874
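The property hierarchy works the same way as the class hierarchy. The sketch below assumes that ont:headOf is a sub-property of ont:staffAt, which is the reading the diagram suggests. The function name is hypothetical.

```python
SUBPROP = "rdfs:subPropertyOf"

triples = {
    # Assumed direction: being head of a school implies being staff at it.
    ("ont:headOf", SUBPROP, "ont:staffAt"),
    ("person:JoeBloggs", "ont:headOf", "school:401874"),
}

def infer_superproperties(triples):
    """Add (s Q o) whenever (s P o) and (P rdfs:subPropertyOf Q)."""
    inferred = set(triples)
    while True:
        # Map each property to its declared super-properties.
        parents = {}
        for s, p, o in inferred:
            if p == SUBPROP:
                parents.setdefault(s, set()).add(o)
        new = {
            (s, q, o)
            for s, p, o in inferred
            for q in parents.get(p, ())
        } - inferred
        if not new:
            return inferred
        inferred |= new

closure = infer_superproperties(triples)
```

A query for staff at school:401874 over `closure` would then find person:JoeBloggs, even though only the more specific ont:headOf statement was published.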
Modelling – RDFS
RDF vocabulary description language
class/property relations
  domain
  range
Already have power to do some vocabulary mapping
  declare classes or properties from different vocabularies to be equivalent:
    A rdfs:subClassOf B
    B rdfs:subClassOf A
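A rough sketch of both ideas on this slide: domain and range let you infer types from property use, and mutual subclassing acts as an equivalence declaration. The ex:Person and other:Human classes and the domain and range declarations are hypothetical examples, not taken from the schools vocabulary.

```python
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

triples = {
    # Hypothetical domain and range declarations for ont:headOf.
    ("ont:headOf", DOMAIN, "ex:Person"),
    ("ont:headOf", RANGE, "ont:School"),
    ("person:JoeBloggs", "ont:headOf", "school:401874"),
    # Mutual subclassing between classes from two hypothetical vocabularies.
    ("ex:Person", SUBCLASS, "other:Human"),
    ("other:Human", SUBCLASS, "ex:Person"),
}

def infer_domain_range(triples):
    """Type subjects by the rdfs:domain and objects by the rdfs:range of each property used."""
    inferred = set()
    for prop, rel, cls in triples:
        if rel not in (DOMAIN, RANGE):
            continue
        for s, p, o in triples:
            if p == prop:
                inferred.add((s if rel == DOMAIN else o, TYPE, cls))
    return inferred

def equivalent(a, b, triples):
    """Two classes are equivalent when each is declared a subclass of the other."""
    return (a, SUBCLASS, b) in triples and (b, SUBCLASS, a) in triples

types = infer_domain_range(triples)
```

Note that domain and range here generate new type statements; they do not reject data that lacks them, which is one of the differences from a conventional schema language.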
Modelling – OWL
richer modelling and semantics
  axioms on properties
    transitive, symmetric, inverseOf, ...
    functional, inverse functional
    equivalent property
  axioms on classes
    intersection, union, disjoint, equivalent
  restrictions on classes
    some value from, all values from, cardinality, has value, one of, keys
  axioms on individuals
    same as, different from, all different
  imports
Modelling – OWL
supports much richer modelling
  consistency checking of model
  consistency checking of data
    some surprises if used to schema languages
    open world, no unique name assumption
    can extend to closed world checking
  inference
    classification
    inferred relationships
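One of the surprises for people used to schema languages can be sketched as follows. With no unique name assumption, two values for a functional property are not a validation error; the reasoner concludes that the two names refer to the same thing. This sketch assumes, purely for illustration, that ont:districtAdministrative is declared functional and that the school links to both district URIs. The `os:` prefix is a hypothetical abbreviation of http://data.ordnancesurvey.co.uk/id/.

```python
def infer_same_as(triples, functional_properties):
    """Where a subject has several values for a functional property, infer that
    the values name the same thing instead of reporting a violation."""
    values = {}
    for s, p, o in sorted(triples):
        if p in functional_properties:
            values.setdefault((s, p), []).append(o)
    inferred = set()
    for objects in values.values():
        for other in objects[1:]:
            inferred.add((objects[0], "owl:sameAs", other))
    return inferred

triples = {
    ("school:401874", "ont:districtAdministrative", "la:00PT"),
    # Hypothetical second statement using the Ordnance Survey identifier.
    ("school:401874", "ont:districtAdministrative", "os:7000000000025484"),
}
same = infer_same_as(triples, {"ont:districtAdministrative"})
```

A schema validator would flag the second value as a cardinality violation. Here the outcome is a new owl:sameAs statement between the two district identifiers, unless they have been declared different from each other.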
Modelling
Spectrum of goals and styles
Lightweight vocabularies
  simple modelling
  just enough agreement to get useful work done
  removing boundaries to enable information to be found and connected
  global consistency not possible
  a little semantics goes a long way
Rich ontological models
  rich domain models need expressivity
  consistency is critical
  make complex inferences you can rely on, across data you trust
  knowledge is power
Modelling
Ontology reuse
  invest in complete ontology for a domain
    rich but general model, may be modular inside
    strong “ontological commitment”
    e.g. medical ontologies
  reuse small, common, vocabularies
    FOAF, SIOC, Dublin Core, Org ...
    pick and choose classes and properties you need
    fill in a few missing links for your domain
  generic reusable vocabularies
    Data cube vocabulary
Accessing all this data
  link following
    HTTP GET, follow links, aggregate relevant statements
  query
    SPARQL

SPARQL
  core idea is pattern matching
    graph patterns with variables
    any subgraph which matches yields row of bindings
  syntax based on Turtle syntax for RDF
  web API endpoints
  lots of power
    filters, optionals, named graphs
    sub-queries, property chains, aggregation
    federated query, update, construct
[Figure: graph pattern ?school, linked by ont:districtAdministrative to [ ], linked by rdfs:label to “Cardiff”]
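The pattern-matching idea can be sketched without a SPARQL engine. The toy matcher below is an illustration only and supports nothing beyond basic triple patterns. The pattern on the slide corresponds roughly to the SPARQL fragment `?school ont:districtAdministrative [ rdfs:label "Cardiff" ]`; here the blank node `[ ]` is written as an extra variable, `?district`.

```python
def is_var(term):
    """Variables are written SPARQL-style, with a leading question mark."""
    return isinstance(term, str) and term.startswith("?")

def match_pattern(pattern, triple, binding):
    """Try to extend binding so that pattern matches triple; return None on failure."""
    binding = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or check it against its existing value.
            if binding.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return binding

def query(graph, patterns):
    """Return every variable binding under which all patterns match the graph."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in graph
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings

graph = [
    ("school:401874", "rdfs:label", "Cardiff High School"),
    ("school:401874", "ont:districtAdministrative", "la:00PT"),
    ("la:00PT", "rdfs:label", "Cardiff"),
]

rows = query(graph, [
    ("?school", "ont:districtAdministrative", "?district"),
    ("?district", "rdfs:label", "Cardiff"),
])
```

Each matching subgraph yields one row of bindings; on this data there is a single row, binding `?school` to school:401874 and `?district` to la:00PT. Real engines add indexing and query planning on top of the same idea.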
Accessing all this data
  link following
    HTTP GET, follow links, aggregate relevant statements
  query
    SPARQL
  linked data API
    RESTful API onto linked data resources
    simple query, usable without RDF stack, web dev friendly
    easy to layer visualizations and UIs on top
  third parties
    search engines and aggregators e.g. Sindice, sameAs.org
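Link following can be sketched as a small crawl. To stay self-contained, the example below replaces HTTP GET with a dictionary lookup. The `WEB` table, the `fetch` function, the `os:` prefix and the owl:sameAs statement are assumptions made for illustration, loosely based on the schools example. Literals are marked by surrounding double quotes so they can be told apart from URIs.

```python
# Stand-in for the web: each URI maps to the statements served at that URI.
WEB = {
    "school:401874": [
        ("school:401874", "rdfs:label", '"Cardiff High School"'),
        ("school:401874", "ont:districtAdministrative", "la:00PT"),
    ],
    "la:00PT": [
        ("la:00PT", "rdfs:label", '"Cardiff"'),
        ("la:00PT", "owl:sameAs", "os:7000000000025484"),
    ],
}

def fetch(uri):
    """Hypothetical dereference step; a real client would issue an HTTP GET."""
    return WEB.get(uri, [])

def is_uri(term):
    """In this sketch, anything not wrapped in double quotes is a URI."""
    return not term.startswith('"')

def crawl(start, max_hops=2):
    """Breadth-first link following: fetch, collect statements, follow object links."""
    seen, frontier, collected = {start}, [start], set()
    for _ in range(max_hops):
        next_frontier = []
        for uri in frontier:
            for s, p, o in fetch(uri):
                collected.add((s, p, o))
                if is_uri(o) and o not in seen:
                    seen.add(o)
                    next_frontier.append(o)
        frontier = next_frontier
    return collected

statements = crawl("school:401874")
```

Starting from the school URI, two hops are enough to gather the statements about the school and its district. A real crawler would also need to decide which links are worth following and when to stop.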
Semantic web layer cake
Strengths and weaknesses
image: spcbrass @ flickr.com
Strengths
data integration
  use of global identifiers (URIs)
  composable – statements v. containers, schemaless
  linking, vocabulary mapping
extensible, incremental, decentralized, resilient
  no global ontology/schema to develop or maintain
  freely add terms from other vocabularies
  open world assumption
modelling and data entwined
  link data to models, data in context
  use same technology to share, manage, extend models
  supports inference and classification
rich access routes
  web linking, download, query, web APIs
Weaknesses
complexity of the stack
  alphabet soup – RDF, RDFS, OWL, SPARQL, RIF ..
  unfamiliar “ontology”, “logical entailment”
  lots of arcane details
  RDF/XML syntax
  • use the parts you need
  • tooling e.g. Linked Data API
  • core ideas not that complex
performance of schema-less stores
  optimization challenges
  • technology improving steadily
  • hybrid solutions
limited validation and constraints
  • closed world checkers
cost of modelling, ontology development
  • ontology reuse
  • generic ontologies (data cube)
  • tools
no inbuilt notions of time, uncertainty
  • model on top
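As an illustration of the "closed world checkers" idea, the sketch below treats the data in hand as complete and reports schools that have no label. The constraint itself and the second school are hypothetical; under the open world reading a missing label is simply unknown, not an error.

```python
def missing_labels(triples, cls):
    """Closed world check: list instances of cls with no rdfs:label in the data."""
    instances = {s for s, p, o in triples if p == "rdf:type" and o == cls}
    labelled = {s for s, p, o in triples if p == "rdfs:label"}
    return sorted(instances - labelled)

triples = {
    ("school:401874", "rdf:type", "ont:School"),
    ("school:401874", "rdfs:label", "Cardiff High School"),
    ("school:999999", "rdf:type", "ont:School"),  # hypothetical, unlabelled
}
problems = missing_labels(triples, "ont:School")
```

A check of this kind sits alongside the open world model rather than inside it, which is why it is described as an extension and not part of the core stack.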
Wrapping up
image: erika g. @ flickr.com
Things we missed out
RDF nuances
  blank nodes, containers and collections
  named graphs
linked data nuances
  URI for thing v. web page, content negotiation, httpRange-14
  URI architecture
OWL nuances
  OWL species, serializations, lots of details
Other technologies in the stack
  SPARQL update, rules (RIF), GRDDL, POWDER, GeoSPARQL, RDB mapping, triple/quad stores
Embedding structured data in markup
  RDFa, microformats, microdata, schema.org and all that
Hot topics
Government linked data
  identifiers to seed linked data
  data publication
  transparency, improving services, economic growth
structured data and search engines
  rich snippets, structured results, SEO
  search => question answering
user interfaces
  visualization, exploration, exploiting linking
data as a service
fin.
Spare
Case study: Local government payments
data model, publish, use