The Semantic Web

A progress report and some observations

Pat Hayes, IHMC


The vision of the Semantic Web

• The WWW is a planet-wide system linking computers which enables people to communicate, establish links and publish content to one another.

• The SW plans to use it to do this with machine-usable content, so that software can read it, draw conclusions from it and act on it.


• Possible applications include B2B, services, improved WWW access, integrated data handling, Global Mind…

The vision of the Semantic Web

• The WWW is: Moore’s law + Optic fiber + HTTP + HTML (+ extra software goodies such as Javascript).

• The SW will use the first two, and rely on the third for now.

• But it needs a new ‘semantic HTML’, i.e. a standard reference language for expressing content. This is where most of the effort has gone so far.

Semantic markup languages

• There are several candidate languages now being used or proposed:

RDF and RDFS, extended by OWL-Full: intensional, non-well-founded semantics.

OIL and DAML+OIL, leading to OWL-DL: extensional, ‘layered’ semantics.

W3C semantic markup languages

RDF RDFS

OWL-Full

Uniform and very simple syntactic model, processable by simple XML engines.

Intensional, non-well-founded semantics.

All RDF/RDFS/OWL assertions are encoded as sets of triples of the form

aaa RRR bbb .

which means RRR(aaa, bbb); all variables (blank nodes) are existential; all other names are urirefs or literals.

The rest of the family consists of semantic extensions to this basic RDF model.


(There is also a very ugly XML serial syntax.)

<ex:Mary> <ownershipOntologies:had> _:ll .

_:ll <rdf:type> <ex:Lamb> .

_:ll <dimensionOntologies:size> <ex:Little> .
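Read as logic, the three triples above say that there exists something Mary had which is a lamb and is little. A minimal Python sketch of that reading, assuming a simplified syntax with one triple per line and no spaces inside terms (real N-Triples parsing is more involved); `parse_triples`, `name` and `to_logic` are hypothetical helpers:

```python
def parse_triples(text):
    """Split each non-blank line 'aaa RRR bbb .' into a (subject, predicate, object) tuple."""
    triples = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 4 or parts[-1] != ".":
            raise ValueError(f"expected 'aaa RRR bbb .', got {line!r}")
        s, p, o, _dot = parts
        triples.append((s, p, o))
    return triples


def name(term):
    """Drop the angle brackets around a uriref such as <ex:Mary>."""
    return term[1:-1] if term.startswith("<") and term.endswith(">") else term


def to_logic(triples):
    """Render a graph as one existentially quantified conjunction of RRR(aaa, bbb) atoms.

    Blank nodes (_:...) become the existential variables."""
    blanks = sorted({t for s, _, o in triples for t in (s, o) if t.startswith("_:")})
    body = " ∧ ".join(f"{name(p)}({name(s)}, {name(o)})" for s, p, o in triples)
    return f"∃ {' '.join(blanks)} . {body}" if blanks else body


graph = """
<ex:Mary> <ownershipOntologies:had> _:ll .
_:ll <rdf:type> <ex:Lamb> .
_:ll <dimensionOntologies:size> <ex:Little> .
"""
print(to_logic(parse_triples(graph)))
# ∃ _:ll . ownershipOntologies:had(ex:Mary, _:ll) ∧ rdf:type(_:ll, ex:Lamb) ∧ dimensionOntologies:size(_:ll, ex:Little)
```

Only the blank node is quantified; the urirefs stay as constants, which is what lets other documents refer to the same things.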

W3C semantic markup languages

RDF RDFS

OWL-Full

Users are expected to define classes and use classes and properties defined by other users. The urirefs used as names constitute the ‘links’ between ontologies, eg

_:xx dc:title "My Diary" .

_:xx dc:author _:yy .

_:yy rdf:type biocat:HumanBeing .

_:yy w3:mailbox "…" .

_:yy usgov:ssNumber "567881962"^^xsd:string .

Many of these RDF ontologies already exist (c. 10⁶ lines of RDF).
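Because urirefs are shared across documents, combining information from several sources is just a union of their triples, except that blank nodes are local to each source and must first be renamed apart (RDF calls this a merge). A sketch with triples as plain string tuples; the `_:bN` renaming scheme and the `biocat:Mammal` triple are hypothetical:

```python
from itertools import count


def _rename(term, renaming, fresh):
    """Map a blank node to a fresh one (consistently within one source); keep urirefs and literals."""
    if not term.startswith("_:"):
        return term
    if term not in renaming:
        renaming[term] = f"_:b{next(fresh)}"
    return renaming[term]


def merge(*graphs):
    """Union of several graphs, with each graph's blank nodes standardized apart."""
    fresh = count()
    merged = set()
    for graph in graphs:
        renaming = {}  # new map per source: _:yy in two sources are different things
        for s, p, o in graph:
            merged.add((_rename(s, renaming, fresh), p, _rename(o, renaming, fresh)))
    return merged


diary = [
    ("_:xx", "dc:title", '"My Diary"'),
    ("_:xx", "dc:author", "_:yy"),
    ("_:yy", "rdf:type", "biocat:HumanBeing"),
]
# Hypothetical second source that also uses the name biocat:HumanBeing.
biocat = [
    ("biocat:HumanBeing", "rdfs:subClassOf", "biocat:Mammal"),
    ("_:yy", "rdf:type", "biocat:HumanBeing"),
]
for triple in sorted(merge(diary, biocat)):
    print(triple)
```

The shared name `biocat:HumanBeing` connects the two sources, while the two `_:yy` nodes stay distinct (`_:b1` and `_:b2`).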

Universal resource identifiers

• Links on the WWW are mostly URLs (global file address scheme), but also URNs and others.

• Key SW idea is that a URI locates the ‘owner’ of any name, ie the authoritative source of information about the intended meaning.

• NB, the URI is usually not the intended denotation.

• The names are the links.
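As a rough illustration of a name pointing at its ‘owner’: for an HTTP uriref, dropping the fragment gives the document to consult, and the authority part names the host that serves it. A sketch using Python’s standard `urllib.parse`; the helper `owner_of` is hypothetical and ignores URNs and other schemes:

```python
from urllib.parse import urldefrag, urlsplit


def owner_of(uriref):
    """Return (authority, document) for a uriref such as .../rdf-schema#subClassOf.

    The name denotes a relation, not this document: the document is only where an
    authoritative description of the name's intended meaning is expected to live."""
    document, _fragment = urldefrag(uriref)
    return urlsplit(document).netloc, document


print(owner_of("http://www.w3.org/2000/01/rdf-schema#subClassOf"))
# ('www.w3.org', 'http://www.w3.org/2000/01/rdf-schema')
```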

W3C semantic markup languages

RDF RDFS

OWL-Full

RDFS has vocabulary for talking about properties (binary relations), membership in classes, subclass and subproperty relationships, eg

rdf:Property rdf:type rdf:Class .

rdf:Class rdf:type rdf:Class .

ph:FatherOf rdfs:subPropertyOf ph:ancestorOf .

Two different classes can have the same members…classes can contain themselves…

<ex:Mary> <prop:had> _:xxx .

_:xxx <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> _:ll .

_:ll <http://www.w3.org/2000/01/rdf-schema#subClassOf> <ex:Lamb> .

_:ll <http://www.w3.org/2000/01/rdf-schema#subClassOf> <ex:Little> .

<ex:Lamb> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/01/rdf-schema#Class> .
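A subproperty triple such as `ph:FatherOf rdfs:subPropertyOf ph:ancestorOf .` licenses conclusions about instances. A sketch of two RDFS entailment rules (numbered rdfs5 and rdfs7 in the W3C RDF Semantics rule list) run to a fixed point over string triples; `ex:a` and `ex:b` are hypothetical:

```python
SUBPROP = "rdfs:subPropertyOf"


def subproperty_closure(triples):
    """Apply, until nothing new appears:
    rdfs5: p subPropertyOf q, q subPropertyOf r  =>  p subPropertyOf r
    rdfs7: p subPropertyOf q, x p y              =>  x q y"""
    graph = set(triples)
    while True:
        subs = {(p, q) for p, rel, q in graph if rel == SUBPROP}
        new = {(p, SUBPROP, r) for p, q in subs for q2, r in subs if q == q2}
        new |= {(x, q, y) for x, rel, y in graph for p, q in subs if rel == p}
        new -= graph
        if not new:
            return graph
        graph |= new


facts = {
    ("ph:FatherOf", SUBPROP, "ph:ancestorOf"),
    ("ex:a", "ph:FatherOf", "ex:b"),
}
print(("ex:a", "ph:ancestorOf", "ex:b") in subproperty_closure(facts))  # True
```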

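<!-- Things that had exactly one thing, and that thing is a LittleLamb -->
<owl:Class rdf:ID="OwnersOfOneLittleLamb">
  <owl:intersectionOf rdf:parseType="Collection">
    <owl:Restriction>
      <owl:onProperty rdf:resource="prop:had"/>
      <owl:cardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">1</owl:cardinality>
    </owl:Restriction>
    <owl:Restriction>
      <owl:onProperty rdf:resource="prop:had"/>
      <owl:someValuesFrom rdf:resource="#LittleLambs"/>
    </owl:Restriction>
  </owl:intersectionOf>
</owl:Class>
<!-- Mary, a Person, had MarysLamb -->
<Person rdf:about="ex:Mary">
  <prop:had rdf:resource="#MarysLamb"/>
</Person>
<!-- LittleLambs: lambs whose size values are all ex:Small -->
<owl:Class rdf:ID="LittleLambs">
  <owl:intersectionOf rdf:parseType="Collection">
    <owl:Class rdf:about="ex:Lamb"/>
    <owl:Restriction>
      <owl:onProperty rdf:resource="ex:size"/>
      <owl:allValuesFrom>
        <owl:Class>
          <owl:oneOf rdf:parseType="Collection">
            <owl:Thing rdf:about="ex:Small"/>
          </owl:oneOf>
        </owl:Class>
      </owl:allValuesFrom>
    </owl:Restriction>
  </owl:intersectionOf>
</owl:Class>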

W3C semantic markup languages

RDF RDFS

OWL-Full

RDF: basic assertions (existential conjunctive binary positive logic); containers (bags, sequences, lists), XML literals, reification, …

RDFS: classes, subclass, subproperty; property ranges and domains; Literals corresponding to all XML Schema datatypes (strings, numbers, dates, etc…)

OWL: Notions of transitive, symmetric, functional properties; union, intersection and complement of classes; explicit class constructors; equality and inequality; classes defined by restrictions on properties.

W3C semantic markup languages

RDF RDFS

OWL-Full

OWL reasoning is much more complex than ‘bare’ RDF, yet OWL is all expressed as RDF triples. The extra complexity comes from extra OWL semantic conditions, mostly on the properties, eg.

ppp rdf:type owl:SymmetricProperty .

aaa ppp bbb .

owl-entails

bbb ppp aaa .
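The symmetric-property condition above can be read as a forward rule over triples. A minimal sketch using the slide’s own placeholder names; it ignores the RDF restriction that literals cannot appear as subjects:

```python
def symmetric_closure(triples):
    """Add 'bbb ppp aaa .' for every 'aaa ppp bbb .' whose ppp is declared
    an owl:SymmetricProperty, repeating until nothing new is added.
    (Sketch only: does not check that bbb is allowed in subject position.)"""
    graph = set(triples)
    while True:
        symmetric = {s for s, p, o in graph
                     if p == "rdf:type" and o == "owl:SymmetricProperty"}
        new = {(o, p, s) for s, p, o in graph if p in symmetric} - graph
        if not new:
            return graph
        graph |= new


g = {("ppp", "rdf:type", "owl:SymmetricProperty"), ("aaa", "ppp", "bbb")}
print(("bbb", "ppp", "aaa") in symmetric_closure(g))  # True
```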


The extra complexity comes from extra OWL semantic conditions, but these can all be expressed by giving a translation from OWL/RDF into first-order logic.

Lbase as a foundation formalism

RDF triples written using RDF/RDFS/OWL vocabularies are given an Lbase translation of triples, which is used together with RDF axioms, RDFS axioms and OWL-Full axioms.


(A subset of CL adapted for SW use)
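As an illustration of what such a translation looks like (written in the spirit of Lbase, not quoted from its actual axiom set): each triple becomes an atom, blank nodes become existentially quantified variables, and vocabulary-specific meaning is carried by axioms such as the two below. The second axiom uses the variable p both as an argument and as a relation, which CL-style syntax permits. Roughly, one graph then entails another when the translation of the first, together with the axioms, entails the translation of the second.

```latex
\begin{align*}
&\mathrm{aaa}\ \mathrm{RRR}\ \mathrm{bbb}\ .\;\longmapsto\;\mathrm{RRR}(\mathrm{aaa},\mathrm{bbb})\\
&\forall x\,y\,z\;\bigl(\mathtt{rdfs{:}subClassOf}(x,y)\land\mathtt{rdf{:}type}(z,x)\rightarrow\mathtt{rdf{:}type}(z,y)\bigr)\\
&\forall p\,x\,y\;\bigl(\mathtt{rdf{:}type}(p,\mathtt{owl{:}SymmetricProperty})\land p(x,y)\rightarrow p(y,x)\bigr)
\end{align*}
```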


OWL-DL: restricted syntactic constructions.

• Individual/literal/class/property vocabularies separated

• No classes of classes, properties of properties, etc.

• Extensional; need to distinguish OWL-DL from RDFS categories.

OWL-Lite: restricted vocabulary; allows frame-like notation.

OWL-Full: same syntactic freedom as RDF.
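One OWL-DL restriction, vocabulary separation, can be approximated by guessing each name’s category from how it is used and flagging names that land in more than one. A rough sketch only: a real OWL-DL species check covers much more, the usage-based categorization is an assumption, and `ex:Species` and `ex:Dolly` are hypothetical:

```python
CLASS_MARKERS = {"owl:Class", "rdfs:Class"}
PROPERTY_MARKERS = {"rdf:Property", "owl:ObjectProperty", "owl:DatatypeProperty"}


def categories(triples):
    """Guess, from usage, whether each name is a class, a property or an individual."""
    cats = {}

    def mark(term, cat):
        cats.setdefault(term, set()).add(cat)

    for s, p, o in triples:
        if p == "rdf:type":
            mark(o, "class")
            if o in CLASS_MARKERS:
                mark(s, "class")
            elif o in PROPERTY_MARKERS:
                mark(s, "property")
            else:
                mark(s, "individual")
        elif p == "rdfs:subClassOf":
            mark(s, "class")
            mark(o, "class")
        elif p == "rdfs:subPropertyOf":
            mark(s, "property")
            mark(o, "property")
        else:
            mark(p, "property")
            mark(s, "individual")
            if not o.startswith('"'):  # literals are left uncategorized
                mark(o, "individual")
    return cats


def separation_violations(triples):
    """Names used in more than one category, which OWL-DL (unlike OWL-Full) forbids."""
    return {t for t, cats in categories(triples).items() if len(cats) > 1}


# ex:Lamb is used both as a class and as an instance of a class of classes.
triples = [
    ("ex:Lamb", "rdf:type", "owl:Class"),
    ("ex:Lamb", "rdf:type", "ex:Species"),
    ("ex:Dolly", "rdf:type", "ex:Lamb"),
]
print(separation_violations(triples))  # {'ex:Lamb'}
```

The same triples are unproblematic in OWL-Full, which keeps the syntactic freedom of RDF.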

State of play

Final RDF/RDFS specs now being produced (published about now)

OWL being finalized now, published in next few weeks.

See W3C website for details

DAML and OIL deployed, esp. by DARPA intelligence community and DAML-S.

SCL initiative is a ‘fast-track’ effort to define a better Lbase: a subset of CL (Common Logic) adapted to SW uses and integrated with RDF/RDFS/OWL.

A small ad-hoc international working group has been formed and we plan to have a draft standard proposal written by July 2003.


Watch this space…


How is the SW going to work?

OK, so you put some machine-readable stuff on your website. Now what?

Hopefully, someone is going to do something useful with it, such as putting you in touch with customers more effectively, finding your website more efficiently, or drawing some useful conclusions.

All of these assume some kind of collusion between the publisher and the user, but they also assume a detachment of purpose. In general, the writer of the content does not know what the information is going to be used for.

Transmitting content

The writer of the content does not know what the information is going to be used for.

What can the writer assume about the way the information is used? In general, no more than is in the spec. But logical semantics only supplies truth-conditions, and these provide only a very minimal constraint upon use, even with the strongest possible assumptions.

Transmitting content

What content is in fact transmitted? Idea of “social meaning” is central, but new for AI/KR

Eg.

A: gobshite rdf:type rdfs:Class .

A: gobshite rdfs:comment "A gobshite is a contemptible person who habitually tells lies." .

B: Irish rdfs:subClassOf A#gobshite .

C: http://www.coginst.uwf.edu/~phayes rdf:type B#Irish .

rdfs-entails:

http://www.coginst.uwf.edu/~phayes rdf:type A#gobshite .
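The entailment above follows from the RDFS rule for rdfs:subClassOf (rdfs9 in the W3C RDF Semantics rule list; rdfs11, transitivity, is included too). A sketch over string triples; note that the rules never consult rdfs:comment, so the ‘social meaning’ in A’s comment plays no part in the inference:

```python
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"


def subclass_closure(triples):
    """Apply until nothing new appears:
    rdfs11: C subClassOf D, D subClassOf E  =>  C subClassOf E
    rdfs9:  C subClassOf D, x type C        =>  x type D"""
    graph = set(triples)
    while True:
        subs = {(c, d) for c, p, d in graph if p == SUBCLASS}
        new = {(c, SUBCLASS, e) for c, d in subs for d2, e in subs if d == d2}
        new |= {(x, TYPE, d) for x, p, c in graph if p == TYPE
                for c2, d in subs if c == c2}
        new -= graph
        if not new:
            return graph
        graph |= new


PH = "http://www.coginst.uwf.edu/~phayes"
graph = {
    ("A#gobshite", TYPE, "rdfs:Class"),
    ("A#gobshite", "rdfs:comment",
     '"A gobshite is a contemptible person who habitually tells lies."'),
    ("B#Irish", SUBCLASS, "A#gobshite"),
    (PH, TYPE, "B#Irish"),
}
print((PH, TYPE, "A#gobshite") in subclass_closure(graph))  # True
```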

Transmitting content

Logical semantics only supplies truth-conditions, and these provide only a very minimal constraint upon use, even with the strongest possible assumptions.

And we cannot even make the strongest assumptions: we cannot assume a shared meaning when software agents are involved, since they have access only to the surface forms.


Is this the right thing to be working on?

• Moore’s law + Optic fiber + HTTP + HTML

So far we have been focusing on the ‘semantic HTML’ based on XML. But we also need a ‘semantic HTTP’ to support negotiation of meaning and content.

We cannot even assume a shared meaning

<ex:Mary> <prop:age> "10" .

What does this literal mean? It seems obvious…

a. It means the number ten. Then it would be impossible to represent property values which were strings or binary numbers.

b. It means the character string ‘10’.

c. It means both the number and the string. Then the range of the property would be a set of pairs, and there is no way to say that in RDF.

d. It means either the number or the string. Then the range of the property wouldn’t be well-defined.

e. It doesn’t mean anything unless associated with a datatype, and then what it means depends on the datatype. Then two identical literals might mean different things, so one could not identify them.

The answer adopted: b. It means the character string ‘10’.

<ex:Mary> <prop:age> "10" .

To mean the number ten, use a typed literal:

<ex:Mary> <prop:age> "10"^^<xsd:integer> .
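Under the adopted reading, a plain literal denotes its own character string, and a typed literal denotes whatever the datatype’s lexical-to-value mapping assigns to its lexical form. A sketch with a small hypothetical table standing in for a few XML Schema datatypes (real XML Schema lexical spaces are stricter than Python’s `int` and `Decimal` parsers):

```python
from decimal import Decimal

# Hypothetical stand-ins for the lexical-to-value mappings of a few XML Schema datatypes.
LEXICAL_TO_VALUE = {
    "xsd:string": str,
    "xsd:integer": int,
    "xsd:decimal": Decimal,
    "xsd:boolean": {"true": True, "1": True, "false": False, "0": False}.__getitem__,
}


def literal_value(lexical, datatype=None):
    """Plain literal: the string itself. Typed literal: the datatype's value for it."""
    if datatype is None:
        return lexical
    return LEXICAL_TO_VALUE[datatype](lexical)


print(repr(literal_value("10")))                 # '10'
print(repr(literal_value("10", "xsd:integer")))  # 10
# Two different lexical forms, one value:
print(literal_value("010", "xsd:integer") == literal_value("10", "xsd:integer"))  # True
```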

The scary part of this story is that it took a group of fewer than ten reasonably intelligent, dedicated people more than seven months of intensive effort to get to this point, and nobody is really happy with the result.

(Is the chandelier in the room or part of the room?)

Tougher case: different universes of discourse.

What is the complement of a class? Eg what is in the class of US non-citizens?

What is the range of a quantifier? When integrating information from various sources, we have to assume that the quantifiers range over (at least) the union of the universes assumed by the different sources.

Many data archives and sources are built assuming a restricted universe. We need universe-protection mechanisms.
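The complement question can be made concrete: the same class has different complements depending on the universe it is taken relative to, so merging sources silently changes what ‘non-citizen’ covers. A sketch with hypothetical names; restricting to a declared universe is shown as one possible, assumed form of universe protection:

```python
def complement(cls, universe):
    """Everything in the universe that is not in the class."""
    return universe - cls


us_citizens = {"ex:alice"}
source_a = {"ex:alice", "ex:bob", "ex:carol"}  # hypothetical source: people living in the US
source_b = {"ex:Paris", "ex:Fido"}             # hypothetical source: places and pets
merged = source_a | source_b

print(sorted(complement(us_citizens, source_a)))  # ['ex:bob', 'ex:carol']
print(sorted(complement(us_citizens, merged)))    # ['ex:Fido', 'ex:Paris', 'ex:bob', 'ex:carol']

# One possible protection: take the complement only within the universe the source assumed.
print(sorted(complement(us_citizens, merged & source_a)))  # ['ex:bob', 'ex:carol']
```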

owl:Class vs. rdfs:Class in OWL-DL


A tougher case; time and change.

Different propositions are true at different times

Do we associate times ….with assertions (tense)

….with relations (situation reasoning)

….with physical things (4-d spatiotemporal reasoning)?

Ans: yes.

Philosophical/ontological debates have been extremely heated, and the moral for the SW is that it is impossible to legislate a correct standard answer.

The ‘standards’ do not agree

• “Individual: unique existence with a particular space-time extension.” [ISO 15926-2] Individuals are 4-d and have locations and times; things and processes are classified under same common categorization. Standard in process industry ontologies, eg EPISTLE (http://www.epistle.ws/)

• “Under the concept of Physical, we have the disjoint concepts of Object and Process. … the SUMO assumes a so called 3D orientation, rather than a 4D orientation.” [Proposed IEEE Standard Upper Merged Ontology, 2001 (http://suo.ieee.org/)] Arbitrary choice made by software engineers.

• “Whereas 1stOrderEntities exist in time and space 2ndOrderEntities occur or take place, rather than exist.” [EuroWordNet (expertContrib/ewntop.zip)] Based on ‘endurantist’ ideas derived originally from Aristotle. Common in linguistic analyses.

A tougher case; time and change.

Different propositions are true at different times

Do we associate times ….with assertions (tense)

P(a, b) true at t

….with relations (situation reasoning)

P(a, b, t)

….with physical things (4-d spatiotemporal reasoning)

P( s(a,t), s(b,t) )


Even when translated into Lbase, these will not interface easily.


These vary by how far down the logical syntax you place the time parameter.

Moral: let it ‘float’ and allow the unification algorithm to match across levels.

(Basic rule: a parameter cannot govern any expression containing it.)

Same trick works for simple spatial reasoning, situational reasoning, etc.
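A toy illustration of letting the time parameter ‘float’: rather than the unification-based matching proposed above, this sketch simply normalizes each of the three forms to a common (relation, arguments, time) triple and compares the results. Terms are nested tuples, `s` is the slice function from the slide, and treating the last argument of P(a, b, t) as the time is an assumption:

```python
def is_slice(term):
    """True for a temporal slice term s(x, t), written ("s", x, t)."""
    return isinstance(term, tuple) and len(term) == 3 and term[0] == "s"


def normalize(expr):
    """Float the time parameter to the top level: return (relation, args, time)."""
    if expr[0] == "@":  # P(a, b) @ t : time attached to the assertion
        _, (rel, *args), t = expr
        return rel, tuple(args), t
    rel, *args = expr
    if args and all(is_slice(a) for a in args):  # P(s(a,t), s(b,t)) : time on the things
        times = {a[2] for a in args}
        if len(times) != 1:
            raise ValueError("slices taken at different times cannot be floated together")
        return rel, tuple(a[1] for a in args), times.pop()
    return rel, tuple(args[:-1]), args[-1]  # P(a, b, t) : time as an extra argument


forms = [
    ("@", ("P", "a", "b"), "t"),
    ("P", "a", "b", "t"),
    ("P", ("s", "a", "t"), ("s", "b", "t")),
]
print({normalize(f) for f in forms})  # {('P', ('a', 'b'), 't')}
```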


This requires altering the logical machinery. That violates the academic work-boundary rules, so it is very hard to achieve in a committee setting.

Maybe this does have something to do with language after all…

We cannot legislate a WW ontology standard, we have to allow different ways of representing the same content to co-exist and communicate with each other.

When publishing content we cannot know exactly how the reader will make use of it.

We have to expect to find misunderstandings and the need to negotiate intended meanings.

We need to do for machine agents what nature did for human agents.

Why are almost all XML-based languages unreadable?

XML was designed as a TEXT MARKUP language. The tags describe the tagged text, providing ‘metadata’.

However, it is now widely used as a structure description/specification language. In this use, the tags describe the same structure that is exemplified by the syntactic structure of the XML itself. It is being used to describe itself, in effect, like a dancer giving a running commentary on her own movements.

<sentence type="simpleActive">
<subject>
<nounPhrase type="definite">
<article>The</article>
<noun type="singular" class="animateEntity">cat</noun>
</nounPhrase></subject><verb type="active" tense="simplePast">sat</verb>
<object><phrase type="locative">
<preposition>on</preposition>
<nounPhrase type="indefinite">
<article>a</article>
<noun type="singular" class="SurfaceObject">mat</noun>
</nounPhrase>
</phrase>
</object>
</sentence>

423 characters


The cat sat on a mat.

21 characters

<rdfs:Class rdf:ID="elephant"> <rdfs:subClassOf> <rdfs:Class rdf:about="#animal"/> </rdfs:subClassOf> <rdfs:subClassOf> <daml:Restriction> <daml:onProperty rdf:resource="#eats"/> <daml:toClass> <rdfs:Class rdf:about="#plant"/> </daml:toClass> </daml:Restriction> </rdfs:subClassOf>

<rdfs:subClassOf> <daml:Restriction> <daml:onProperty rdf:resource="#colour"/> <daml:hasValue> <daml:ConcreteTypeExpression>EQUAL ``grey'' </daml:ConcreteTypeExpression> </daml:hasValue> </daml:Restriction> </rdfs:subClassOf></rdfs:Class>

591 characters

elephant_s are animal_s

[ which eats plant

which color is ‘grey’ ]

62 characters