Introduction to Semantic Web
What? Why? How? So far? Next?
Frank van Harmelen
AI Department
Vrije Universiteit Amsterdam
Creative Commons License: allowed to share & remix, but must attribute & non-commercial
Who am I
Frank van Harmelen
• Prof in AI at Vrije Universiteit Amsterdam
• Knowledge Representation
• Early Semantic Web Projects (> 1999)
• Co-designed OWL
• Tech advisor of Aduna (Sesame)
• Scientific Director of LarKC (Large Knowledge Collider)
I know nothing about image analysis…
Who are you?
• who knows roughly what Semantic Web is?
• who has heard of RDF & OWL?
• who has studied RDF & OWL?
• who has used RDF & OWL?
• who expects ever to use RDF & OWL?
• who is a logician?
• who is a KR researcher?
• who is a Web researcher?
• who is an image researcher?
General idea of the Semantic Web
General idea of Semantic Web
Make current web more machine accessible (currently all the intelligence is in the user)
Motivating use-cases
• search
• personalisation
• semantic linking
• data integration
• web services
• ...
General idea of Semantic Web
Do this by:
1. Making data and meta-data available on the Web in machine-understandable form (formalised)
2. Structuring the data and meta-data in ontologies
These are non-trivial design decisions. Alternative would be:
Make current web more machine accessible(currently all the intelligence is in the user)
What’s wrong with the Web?
linked web-pages, written by people, written for people, used only by people...
Many of these pages already come from data, usable by computers! But we can’t link the data...
Linked data, usable by computers! Useful for people!
"Web of Data" (TBL)
1. expose data on the web (“facts”) in interoperable form (RDF)
2. expose knowledge on the web with interoperable semantics (ontologies, RDF Schema, OWL)
3. apply lightweight inference for
• interoperability
• query answering
• search
• unexpected reuse
• …
Semantic Web
Not just data, also knowledge
All of this:
• Low expressivity logic (RDF)
• That allows some inference: property inheritance, domain/range inference
Some of this:
• Medium expressive logic (OWL)
• That allows more inference: (in)equality, number restrictions, datatypes
Desideratum: On the Web of Data, anyone can say anything about anything
• Need for total decoupling of
  • data
  • vocabulary
  • meta-data
[Diagram: the data item x, the type T (<village>) and the statement [<x> IsOfType <T>] have different owners & locations]
Two versions of Semantic Web story:
V1: Semantic Web = annotated Web; 1 & 2 are embedded in text & images on the Web
V2: Semantic Web = Web of Data; 1 & 2 live in dedicated repositories (triple stores)
Why is this hard?
machine accessible meaning (What it’s like to be a machine)
[Figure: text annotated with the tags <name>, <symptoms>, <drug>, <drugadministration>, <disease>, <treatment> and the relations IS-A and alleviates]
META-DATA
What is meta-data?
• it's just data
• it's data describing other data
• it's meant for machine consumption
[Figure labels: disease, name, symptoms, drug, administration]
What is required?
Required are:
1. one or more standard vocabularies, so search engines, producers and consumers all speak the same language
2. a standard syntax, so meta-data can be recognised as such
3. lots of resources with meta-data attached
Bluffer’s Guide to RDF & RDF Schema
Bluffer’s Guide to RDF
• Express relations between things: (Subject, Predicate, Object)
• Results in labelled network (“graph”)
• All labels are actually web-addresses (URIs)
• You can “ping” any label and find out more
• Bits of the graph can live at physically different locations & have different owners
[Diagram: Frank is linked by AuthorOf to x and to y; a publishedBy edge leads to MIT]
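A minimal sketch of the labelled graph on this slide, written as a set of (subject, predicate, object) tuples in plain Python. The `http://example.org/` namespace is hypothetical (the slides give no concrete URIs), and attaching `publishedBy` to `x` rather than `y` is an assumption, since the diagram does not survive in enough detail to tell.

```python
# Hypothetical namespace: the slides do not give concrete URIs.
EX = "http://example.org/"

# Each edge of the graph is one (subject, predicate, object) triple.
triples = {
    (EX + "Frank", EX + "AuthorOf", EX + "x"),
    (EX + "Frank", EX + "AuthorOf", EX + "y"),
    # Assumption: which book is published by MIT is not recoverable from the slide.
    (EX + "x", EX + "publishedBy", EX + "MIT"),
}


def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


# Follow the AuthorOf edges that leave the node for Frank.
books = objects(triples, EX + "Frank", EX + "AuthorOf")
```

Because every label is a full URI, two such sets kept at different locations can be combined with a plain set union without any renaming.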
Bluffer’s Guide to RDF Schema
• Types for subjects & objects & predicates
• Types organised in a hierarchy
• Inheritance of properties
[Diagram: the graph of Frank, x, y, AuthorOf, publishedBy and MIT, annotated with the types author, book, publisher, person, artifact and man]
So what’s special about RDF(S)?
• Statements about an identifier can be distributed:
  <owl:Individual ID="CENTRAL-COAST" />
  <owl:Individual rdf:about="CENTRAL-COAST">
    <type rdf:resource="#CALIFORNIA-REGION"/>
  </owl:Individual>
• No unique name assumption
• No closed world assumption
Remember web-style decoupling
Remember:
• Need for total decoupling of
  • data
  • vocabulary
  • meta-data
[Diagram: the data item x, the type T (<village>) and the statement [<x> IsOfType <T>] have different owners & locations]
RDF(S) has a (very small) formal semantics
• Defines what other statements are implied by a given set of RDF(S) statements
• Ensures mutual agreement on minimal content between parties without further contact
• In the form of “entailment rules”
• Very simple to compute (and not explosive in practice)
RDF(S) semantics: examples
Aspirin isOfType Painkiller
Painkiller subClassOf Drug
⇒ Aspirin isOfType Drug
aspirin alleviates headache
alleviates range symptom
⇒ headache isOfType symptom
RDF(S) semantics
X R Y + R domain T ⇒ X IsOfType T
X R Y + R range T ⇒ Y IsOfType T
T1 SubClassOf T2 + T2 SubClassOf T3 ⇒ T1 SubClassOf T3
X IsOfType T1 + T1 SubClassOf T2 ⇒ X IsOfType T2
Semantics = predictable inference
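The four entailment rules can be sketched as a small forward-chaining loop: keep applying the rules until no new statement appears. This is an illustrative sketch in plain Python, not a full RDFS reasoner. The function name `rdfs_closure` and the plain-string predicate names are assumptions made for the example, and the fourth rule is taken to conclude `X IsOfType T2`, which is what the aspirin example requires.

```python
def rdfs_closure(triples):
    """Apply the four rules from the slide until a fixed point is reached."""
    inferred = set(triples)
    while True:
        new = set()
        for (s, p, o) in inferred:
            for (s2, p2, o2) in inferred:
                # X R Y + R domain T  =>  X IsOfType T
                if p2 == "domain" and s2 == p:
                    new.add((s, "IsOfType", o2))
                # X R Y + R range T  =>  Y IsOfType T
                if p2 == "range" and s2 == p:
                    new.add((o, "IsOfType", o2))
                # T1 SubClassOf T2 + T2 SubClassOf T3  =>  T1 SubClassOf T3
                if p == "SubClassOf" and p2 == "SubClassOf" and o == s2:
                    new.add((s, "SubClassOf", o2))
                # X IsOfType T1 + T1 SubClassOf T2  =>  X IsOfType T2
                if p == "IsOfType" and p2 == "SubClassOf" and o == s2:
                    new.add((s, "IsOfType", o2))
        if new <= inferred:
            # Nothing new was derived, so the closure is complete.
            return inferred
        inferred |= new


# The two examples from the slides, with predicate spelling normalised.
facts = {
    ("Aspirin", "IsOfType", "Painkiller"),
    ("Painkiller", "SubClassOf", "Drug"),
    ("aspirin", "alleviates", "headache"),
    ("alleviates", "range", "symptom"),
}
closure = rdfs_closure(facts)
```

The loop always terminates because the rules only combine names that already occur in the input, so the set of derivable triples is finite. Two parties running these rules on the same statements reach the same closure, which is the sense in which the inference is predictable.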
Bluffer’s Guide to OWL
OWL: things RDF Schema can’t do
• equality
• enumeration
• number restrictions
  • single-valued/multi-valued
  • optional/required values
• inverse, symmetric, transitive
• boolean algebra
  • union, complement…
Layered language
OWL Lite:
• Classification hierarchy
• Simple constraints
OWL DL:
• Maximal expressiveness
• While maintaining tractability
• Standard formalisation
OWL Full:
• Very high expressiveness
• Losing tractability
• Non-standard formalisation
• All syntactic freedom of RDF (self-modifying)
Syntactic layering
Semantic layering
[Diagram: nested layers Lite, DL, Full]
Language Layers
OWL Full
• Allow meta-classes etc
OWL DL
• Negation
• Disjunction
• Full cardinality
• Enumerated types
OWL Lite
• (sub)classes, individuals
• (sub)properties, domain, range
• conjunction
• (in)equality
• cardinality 0/1
• datatypes
• inverse, transitive, symmetric
• hasValue
• someValuesFrom
• allValuesFrom
RDF Schema
Backward compatibility with RDF
<owl:Class rdf:ID="City">
  <rdfs:subClassOf rdf:resource="#GeographicEntity"/>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#ruler"/>
      <owl:allValuesFrom rdf:resource="#Mayor"/>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>
OWL agents understand everything…
<owl:Class rdf:ID="City">
  <rdfs:subClassOf rdf:resource="#GeographicEntity"/>
  <daml:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#ruler"/>
      <daml:toClass rdf:resource="#Mayor"/>
    </daml:Restriction>
  </daml:subClassOf>
</owl:Class>
OWL agents understand everything… others still understand the most important aspects
Backward compatibility with RDF
OWL also has a formal semantics
• Defines what other statements are implied by a given set of statements
• Ensures mutual agreement on content (both minimal and maximal) between parties without further contact
• Can be used for integrity/consistency checking
• Hard to compute (and rarely/sometimes/always explosive in practice)
OWL semantics: minimal
vanGogh isOfType Impressionist
Impressionist subClassOf Painter
⇒ vanGogh isOfType Painter
vanGogh painter-of sunflowers
painter-of domain painter
⇒ vanGogh isOfType painter
OWL semantics: maximal
vanGogh isOfType Impressionist
Impressionist disjointFrom Cubist
⇒ NOT: vanGogh isOfType Cubist
painted-by has-cardinality 1
sun-flowers painted-by vanGogh
Picasso different-individual-from vanGogh
⇒ NOT: sun-flowers painted-by Picasso
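The "maximal" reading is what makes integrity checking possible: once the statements rule something out, adding it anyway is a detectable contradiction. The sketch below is a hand-written check for just the two patterns on this slide (disjoint classes and cardinality 1), in plain Python; it is not an OWL reasoner, and the function name `find_violations` and the plain-string vocabulary are assumptions made for the example.

```python
def find_violations(triples):
    """Return descriptions of the contradictions found in a set of triples."""
    violations = []
    # Disjointness: no individual may belong to two disjoint classes.
    for (c1, p, c2) in triples:
        if p != "disjointFrom":
            continue
        for (x, p2, t) in triples:
            if p2 == "isOfType" and t == c1 and (x, "isOfType", c2) in triples:
                violations.append(f"{x} isOfType both {c1} and {c2}")
    # Cardinality 1: a subject may not have two values known to be different.
    for (prop, p, n) in triples:
        if p != "has-cardinality" or n != 1:
            continue
        for (s, p2, a) in triples:
            for (s2, p3, b) in triples:
                if (p2 == prop and p3 == prop and s == s2
                        and (a, "different-individual-from", b) in triples):
                    violations.append(f"{s} {prop} both {a} and {b}")
    return violations


slide_facts = {
    ("vanGogh", "isOfType", "Impressionist"),
    ("Impressionist", "disjointFrom", "Cubist"),
    ("painted-by", "has-cardinality", 1),
    ("sun-flowers", "painted-by", "vanGogh"),
    ("Picasso", "different-individual-from", "vanGogh"),
}

# Adding either statement that the slide marks as NOT produces a violation.
with_cubist = slide_facts | {("vanGogh", "isOfType", "Cubist")}
with_picasso = slide_facts | {("sun-flowers", "painted-by", "Picasso")}
```

Note that the cardinality check only fires because Picasso and vanGogh are explicitly stated to be different. Without a unique name assumption, two values for a single-valued property would otherwise be taken to name the same individual.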
Remember: Required are
1. standard vocabularies
2. a standard syntax
3. lots of resources with meta-data attached
Ontologies: real life examples
• handcrafted
  • music: CDnow (2410/5), MusicMoz (1073/7)
  • biomedical: SNOMED (200k), GO (15k), Emtree (45k+190k), Systems biology
• ranging from lightweight (Yahoo, UNSPSC, Open directory (400k)) to heavyweight (Cyc (300k))
• ranging from small (METAR) to large (UNSPSC)
Biomedical ontologies (a few..)
MeSH
• Medical Subject Headings, National Library of Medicine
• 22.000 descriptions
EMTREE
• Commercial Elsevier, Drugs and diseases
• 45.000 terms, 190.000 synonyms
UMLS
• Integrates 100 different vocabularies
SNOMED
• 200.000 concepts, College of American Pathologists
Gene Ontology
• 15.000 terms in molecular biology
NCBI Cancer Ontology
• 17,000 classes (about 1M definitions)
Remember: Required are
1. standard vocabularies
2. a standard syntax
3. lots of resources with meta-data attached
Who makes the meta-data?
Don’t throw away what we already have:
• Databases (Amazon.com)
• Navigation structures
• meta-data in documents (Office, Acrobat, MP3, jpg)
As spin-off on what we already do
• MIT Media Lab photo annotator
Automated analysis
• Text, Images, Video
Summary so far
Linked Data/Semantic Web
Identification
• Uniform Resource Identifier (URI)
• Global identifier (NB: persistent!)
• Looks like a URL, is often an Internationalized Resource Identifier (IRI)
Description
• Resource Description Framework (RDF)
• RDF Schema (RDFS)
• Simple Knowledge Organization System (SKOS)
• Web Ontology Language (OWL)
Querying
• RDF triple stores
• SPARQL Query Language
What does RDF look like?
• The data model is a (directed) graph
• Each data item is a ‘resource’ with a URI as its identifier
• Each property is a binary relation, a ‘triple’:
  • between resources: <subjectURI, predicateURI, objectURI>
  • between a resource and a ‘literal’: <subjectURI, predicateURI, “literal value”>
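The two triple shapes, resource to resource and resource to literal, can be sketched in plain Python together with a wildcard lookup of the kind a triple store offers. Everything named here is an assumption for illustration: the `http://example.org/` URIs, the `Literal` wrapper class, and the `match` helper are not taken from any RDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """A plain value in object position, as opposed to a resource URI."""
    value: str


EX = "http://example.org/"  # hypothetical namespace

graph = {
    # <subjectURI, predicateURI, objectURI>
    (EX + "Amsterdam", EX + "isOfType", EX + "City"),
    # <subjectURI, predicateURI, "literal value">
    (EX + "Amsterdam", EX + "name", Literal("Amsterdam")),
}


def match(graph, s=None, p=None, o=None):
    """Return all triples that fit the pattern; None acts as a wildcard."""
    return {
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


# Everything said about Amsterdam, then only the literal-valued objects.
about = match(graph, s=EX + "Amsterdam")
literal_values = {o for (_, _, o) in about if isinstance(o, Literal)}
```

Keeping literals in a separate type means a string value can never be mistaken for an identifier that other datasets might link to.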
Why is this a Web of data?
• Global unique identifiers
• Reuse of identifiers in other datasets
  • For data: (two sources say something about ‘Amsterdam’)
  • For schema: (two sources each use the same concept ‘City’)
• This reuse builds “links” between datasets
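How reuse of identifiers links datasets can be sketched with two small triple sets that were written independently. The sources, the `http://example.org/` namespace and the predicate names are hypothetical; the point is only that merging is a set union and that the shared URIs are where the two graphs join.

```python
EX = "http://example.org/"  # hypothetical shared namespace

# Two independently published datasets.
source_a = {
    (EX + "Amsterdam", EX + "locatedIn", EX + "Netherlands"),
    (EX + "Rotterdam", EX + "isOfType", EX + "City"),
}
source_b = {
    (EX + "Amsterdam", EX + "isOfType", EX + "City"),
    (EX + "Utrecht", EX + "isOfType", EX + "City"),
}


def terms(graph):
    """Return every identifier used anywhere in the graph."""
    return {term for triple in graph for term in triple}


# Identifiers used by both sources: these are the "links" between them.
shared = terms(source_a) & terms(source_b)

# Merging needs no mapping step, because the identifiers are global.
merged = source_a | source_b
about_amsterdam = {t for t in merged if t[0] == EX + "Amsterdam"}
```

Here the shared identifiers cover both cases from the slide: `Amsterdam` is reused as data, while `City` and `isOfType` are reused as schema.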
Does this work in practice?
already many billions of facts & rules
Linked Open Data cloud
[Figure: Linked Open Data cloud, with callouts:]
• Encyclopedia
• Geographic names (millions)
• Names of artists & art works (10.000’s)
• Scientific bibliographies
• Hierarchical dictionaries (UK, FR, NL)
• Life-science databases
• Any CD ever recorded (almost)
• Basic facts on every country on the planet
• Common sense rules & facts (100.000’s)
May ‘09 estimate: > 4.2 billion triples + 140 million interlinks
It gets bigger every month
And remember: not just data
All of this:
• Low expressivity logic (RDF/RDFS)
• That allows some inference: property inheritance, domain/range inference
Some of this:
• Medium expressive logic (OWL)
• That allows more inference: (in)equality, number restrictions, datatypes
Nice in the lab, but are you getting anywhere in practice?
Semantic Web
News Quiz
• Google
• Reuters
• New York Times
• Microsoft
• Zemanta
• Obama Government
• BBC (music, worldcup, wildlife)
• BestBuy.com
• Facebook
Challenges
What to do when success is becoming a problem?
• Heterogeneity: ontology mapping, instance identification
• Scale (10^10 statements)
• Dynamics, versioning (Flickr: 3000 pictures/minute, Wikipedia: 100 edits/minute)
• Trust, attribution, provenance
• Multimedia
In both directions