Web3.0 and Language Resources Marta Sabou Knowledge Media Institute (KMi) The Open University...

Post on 20-Jan-2016


Web3.0 and Language Resources

Marta Sabou

Knowledge Media Institute (KMi)

The Open University

Exploiting Semantic Web Ontologies:

An Experimental Report


Outline

• The Semantic Web
  – Online ontologies
  – Gateways to the Semantic Web
• Exploiting the Semantic Web
  – Relation discovery
  – Open Domain Question Answering
  – Folksonomy Enrichment
• Outlook for Language Technology

Scientific American, May 2001:

The Semantic Web

Tim Berners-Lee:
– “an extension of the current web (1) in which information is given well-defined meaning (2), better enabling computers and people to work in cooperation (3).”

1. The SW will gradually evolve out of the existing Web; it does not compete with the current WWW

2. Represent Web content in a form that is more easily machine-processable

3. An open platform allowing information to be shared and processed

Ontology

Metadata

UoD

<rdf:RDF>
  <channel rdf:about="http://watson.kmi.open.ac.uk/blog">
    <title>Elementaries - The Watson Blog</title>
    <link>http://watson.kmi.open.ac.uk:8080/blog/</link>
    <description>"Oh dear! Where the Semantic Web is going to go now?" -- imaginary user 23</description>
    <language>en</language>
    <copyright>Watson team</copyright>
    <lastBuildDate>Thu, 01 Mar 2007 13:49:52 GMT</lastBuildDate>
    <generator>Pebble (http://pebble.sourceforge.net)</generator>
    <docs>http://backend.userland.com/rss</docs>
    …

<rdf:RDF>
  <foaf:Image rdf:about='http://static.flickr.com/132/400582453_e1e1f8602c.jpg'>
    <dc:title>Zen wisteria</dc:title>
    <dc:description></dc:description>
    <foaf:page rdf:resource='http://www.flickr.com/photos/xcv/400582453/'/>
    <foaf:topic rdf:resource='http://www.flickr.com/photos/tags/vittelgarden/'/>
    <foaf:topic rdf:resource='http://www.flickr.com/photos/tags/wisteria/'/>
    <dc:creator>
      <foaf:Person><foaf:name>Mathieu d'Aquin</foaf:name>
      …

<rdf:RDF>
  <owl:Ontology rdf:about="">
    <owl:imports rdf:resource="http://usefulinc.com/ns/doap#"/>
  </owl:Ontology>
  <j.1:Organization rdf:ID="KMi">
    <rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">The Knowledge Media Institute of the Open University, Milton Keynes UK</rdfs:comment>
  </j.1:Organization>
  <j.1:Document rdf:ID="KMiWebSite">
  …

[Figure labels: DOAP, FOAF, DC, RSS, TAP, WordNet, NCI, Galen, Music, …]

SW = A Conceptual Layer over the web

SW is Heterogeneous!

Interlinked, Semantic Data on the Web

[Figure panels: 2007, 2008, 2009]

Semantic Web Gateways

Search engines for semantic data: they collect, index and provide access to online semantic data.

10K ontologies

50 million semantic documents

250K ontologies and metadata

Semantic Web Status

Online semantic data now constitutes the largest and most heterogeneous knowledge resource known in AI/KR.

Semantic Web Gateways offer a way to access this data easily.

So, the question is: how to use it? How to make the most of it?

Next Generation Semantic Web Applications

Dynamically retrieving, exploiting and combining relevant semantic resources from the SW, at large

Gateway to the Semantic Web

IEEE Intelligent Systems 23(3), pp. 20-28, May/June 2008

• Key aspects of the paradigm
• Tech. Infrastructure
• Concrete Applications

Outline

• The Semantic Web
  – Online ontologies
  – Gateways to the Semantic Web
• Exploiting the Semantic Web
  – Relation discovery
  – Open Domain Question Answering
  – Folksonomy Enrichment
• Outlook for Language Technology

[Figure: Scarlet takes Concept_A (e.g., Supermarket) and Concept_B (e.g., Building), accesses the Semantic Web, and deduces the semantic relation between them.]

- SCARLET - relation discovery on the SW

- http://scarlet.open.ac.uk/

- Automatically selects and combines multiple online ontologies to derive a relation

Relation Discovery

M. Sabou, M. d’Aquin, E. Motta, “Using the Semantic Web as Background Knowledge in Ontology Mapping”, Ontology Mapping Workshop, ISWC’06.

Two strategies

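Deriving relations from (A) one ontology and (B) across ontologies.

[Figure, (A) Strategy 1: Scarlet relates Supermarket and Building using one ontology on the Semantic Web; concepts shown: Supermarket, Shop, PublicBuilding, Building, linked by ⊆.]

[Figure, (B) Strategy 2: Scarlet relates Cholesterol and OrganicChemical across ontologies on the Semantic Web; concepts shown: Cholesterol, Steroid, Lipid, OrganicChemical, linked by ⊆, with Steroid appearing in both ontologies and matched by ≡.]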

Experiment

Matching two large-scale agricultural thesauri:
• AGROVOC
  – UN’s Food and Agriculture Organisation (FAO) thesaurus
  – 28,174 descriptor terms
  – 10,028 non-descriptor terms
• NALT
  – US National Agricultural Library Thesaurus
  – 41,577 descriptor terms
  – 24,525 non-descriptor terms

M. Sabou, M. d’Aquin, E. Motta, “Exploring the Semantic Web as Background Knowledge in Ontology Matching”, Journal of Data Semantics, 2008.

Results - S1

226 Used Ontologies - S1

http://139.91.183.30:9090/RDF/VRP/Examples/tap.rdf

http://reliant.teknowledge.com/DAML/SUMO.daml

http://reliant.teknowledge.com/DAML/Mid-level-ontology.daml

http://reliant.teknowledge.com/DAML/Economy.daml

http://gate.ac.uk/projects/htechsight/Technologies.daml

Results - S2

306 Used Ontologies - S2

http://139.91.183.30:9090/RDF/VRP/Examples/tap.rdf

http://reliant.teknowledge.com/DAML/SUMO.daml

http://a.com/ontology

http://reliant.teknowledge.com/DAML/Mid-level-ontology.daml

http://www.dannyayers.com/2003/08/udef.rdfs

http://gate.ac.uk/projects/htechsight/Technologies.daml

http://reliant.teknowledge.com/DAML/Economy.daml

Evaluation

• Manual assessment of 1000 mappings (15%)

• Performed for both strategies
• Evaluators:
  – Researchers in the area of the Semantic Web
  – 10 people split into two groups

Evaluation - Precision

• S1

• S2

Indicative Comparison with Other Techniques

• Traditional Matching (only eq.): 54% - 83%
• Using a single, pre-selected domain ontology: 76%
• Using the entire Web (via Google): 38% - 50%
• Using pre-selected, domain texts: 53% - 75%
• Using dynamically selected ontologies: 70%

The Semantic Web offers high quality data that can be used to improve ontology matching.

Evaluation - Error Analysis S1

Error Analysis S2 old

Subsumption as generic relation.

Subsumption as part-whole.

Subsumption as role.

Findings (1)

• Online ontologies are good enough to provide performance values comparable with other methods
• All relations have a formal “explanation”

BUT:
• Sparseness in domain coverage
• Several modeling errors, most often the misuse of subsumption

Outline

• The Semantic Web
  – Online ontologies
  – Gateways to the Semantic Web
• Exploiting the Semantic Web
  – Relation discovery
  – Open Domain Question Answering
  – Folksonomy Enrichment
• Outlook for Language Technology

PowerAqua

Natural language question

Answers from online semantic data

Open-domain QA by exploring semantic data available online.

Findings (2)

• Online ontologies allowed answering 69% of our question set

BUT:
• Weakly populated
  – Most ontologies do not have enough instances
• Sparseness in domain coverage
  – Only 20% of the IR TREC topics covered
• Limited amount of non-taxonomic relations
• Low quality:
  – Several modeling errors, most often the misuse of subsumption
  – Unclear labels
  – Missing domain and range information

Outline

• The Semantic Web
  – Online ontologies
  – Gateways to the Semantic Web
• Exploiting the Semantic Web
  – Relation discovery
  – Open Domain Question Answering
  – Folksonomy Enrichment
• Outlook for Language Technology

Search in Tag Spaces

Let’s find photos of “animals which live in the water”

Query: Animal Water

[Figure: grid of returned photos, labelled Dog, Bird, Tiger, Cat and Landscape]

5/24 ≈ 21% relevant

Bring in the SW…

[Figure: ontology fragment with the concepts Animal, Mammal, Marine Mammal, Dolphin, Seal, Whale, Sea Elephant, Fish, FreshwaterFish, SaltwaterFish, Body of Water, Sea and Ocean, connected by subclass links and livesIn relations]

Animal Water

<Animal livesIn Water>

<Dolphin>or<Seal>or<“Sea Elephant”>or<Whale>
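The expansion from the keyword query to the list of concrete tags can be sketched as follows. This is a minimal illustration and not the system described in the talk: the dictionaries `SUBCLASS_OF`, `LIVES_IN` and `KEYWORD_TO_CONCEPT`, and the function `expand_query`, are hypothetical, and the toy facts only cover the marine mammal branch of the figure so that the result matches the expanded query on the slide.

```python
# Toy ontology fragment (assumed for illustration; not taken from a real ontology).
SUBCLASS_OF = {
    "Mammal": "Animal",
    "Marine Mammal": "Mammal",
    "Dolphin": "Marine Mammal",
    "Seal": "Marine Mammal",
    "Sea Elephant": "Marine Mammal",
    "Whale": "Marine Mammal",
    "Sea": "Body of Water",
    "Ocean": "Body of Water",
}
LIVES_IN = {"Marine Mammal": "Sea"}

# Hypothetical anchoring of query keywords to ontology concepts.
KEYWORD_TO_CONCEPT = {"Animal": "Animal", "Water": "Body of Water"}


def ancestors(concept):
    """Return the concept followed by all of its superclasses."""
    chain = [concept]
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        chain.append(concept)
    return chain


def habitats(concept):
    """Collect livesIn targets stated on the concept or inherited from superclasses."""
    return {LIVES_IN[c] for c in ancestors(concept) if c in LIVES_IN}


def expand_query(subject_keyword, place_keyword):
    """Expand <subject livesIn place> into the most specific matching concepts."""
    subject = KEYWORD_TO_CONCEPT[subject_keyword]
    place = KEYWORD_TO_CONCEPT[place_keyword]
    concepts = set(SUBCLASS_OF) | set(SUBCLASS_OF.values())
    leaves = concepts - set(SUBCLASS_OF.values())
    return [
        c for c in sorted(leaves)
        if subject in ancestors(c)
        and any(place in ancestors(h) for h in habitats(c))
    ]


print(" or ".join(expand_query("Animal", "Water")))
```

Each returned concept name can then be used as a tag in the photo search, which is what turns the two-keyword query into a disjunction of specific tags.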

Results

dolphin

seal

whale

sea elephant

18/24 = 75% relevant

FLOR - Folksonomy enrichment

[Figure, tag cloud: kitten furry pets cow whiskers whale eye cat cute feline water deer primate bear lion rodent elephant fur ocean rabbit sea grass cute tree goat seal gorilla brown marine wild white cats eyes park animals otter mammal animal zoo nature dolphin farm]

[Figure: the tags are linked to an ontology structure with the concepts Animal, Mammal, Marine Mammal, Terrestrial Mammal, Dolphin, Seal, Whale, Sea Elephant, Tiger, Lion, Body of Water, Sea and Ocean; Marine Mammal hasHabitat Sea]

FLOR - Experiment

[Figure: the tag cloud is enriched into two structures, Structure_WN (from WordNet) and Structure_SW, which are compared on richness of structure; the corresponding search interfaces, Interface_WN and Interface_SW, are compared on increase in search results.]

Findings (3)

• SW covers (some) multilingual tags
• SW covers novel tags

BUT:
• On average, SW leads to fewer senses than WordNet per tag
• On average, SW leads to a weaker structure than obtained from WordNet

YET:
• Better results obtained when Structure_SW is used for querying
  – Better alignment between tags and online concepts
  – Less fine-grained structure

Findings

• Good results obtained for relation discovery, open domain QA, improvement of search in folksonomies
• Large scale
  – More than 10K ontologies and growing!
  – Larger than any knowledge source in KR/AI
• Heterogeneous
  – W.r.t. size, quality of conceptualization, etc.
• Constantly evolving
  – Covers new terms that don’t (yet) appear in WordNet
• Multi-domain
• Multilingual
• Tools and APIs exist to allow its exploration

However…

• Domain coverage is still rather limited
• Ontology quality affects some applications:
  – Modeling errors
  – Few non-taxonomic relations
  – Unclear labels for ontology entities
  – Weakly populated
  – Fewer senses than in WordNet
  – Lack of domain and range information

Outline

• The Semantic Web
  – Online ontologies
  – Gateways to the Semantic Web
• Exploiting the Semantic Web
  – Relation discovery
  – Open Domain Question Answering
  – Folksonomy Enrichment
• Outlook for Language Technology

The Web as a LR

Web 1.0

• Web-based relatedness
  – Cilibrasi & Vitanyi, 2007
• Verifying semantic relations
  – Cimiano et al., 2004

The Web as a LR

Web 1.0 + Web 2.0

• Wikipedia-based relatedness
  – Strube et al., 2006
• Folksonomy-based relatedness
  – Stumme et al., 2008
• Web-based relatedness
  – Cilibrasi & Vitanyi, 2007
• Verifying semantic relations
  – Cimiano et al., 2004

The Web as a LR

[Figure: the tag cloud linked to an ontology structure (Animal, Mammal, Marine Mammal, Terrestrial Mammal, Dolphin, Seal, Whale, Sea Elephant, Tiger, Lion, Body of Water, Sea, Ocean; relation hasHabitat)]

• Web-based relatedness
  – Cilibrasi & Vitanyi, 2007
• Verifying semantic relations
  – Cimiano et al., 2004
• Wikipedia-based relatedness
  – Strube et al., 2006
• Folksonomy-based relatedness
  – Stumme et al., 2008

Besides deepening research on the frontier of Web 2.0 and LRs, the next important wave is in exploring Web 3.0 resources.

Web 1.0 + Web 2.0 + Web 3.0

LT <---> SW

• LT <--- SW:
  – Complementary to existing LRs
    • Additional senses, novel terms and relations
  – Combine with other LRs
  – How to explore redundancy of knowledge?
  – How to explore heterogeneity?
• LT ---> SW: Can LT methods help to:
  – Increase domain coverage?
  – Detect modeling errors?
    • E.g., by checking evidence from Web, Wikipedia
  – Improve anchoring?
    • E.g., WSD methods

Thank you!


Strategy 2 - Definition

Principle: If no ontology is found that contains both terms, then combine information from multiple ontologies to find a mapping.

[Figure: the relation rel between A and B is derived from the Semantic Web, where one ontology contains A’ and C and another contains C’ and B’ with a relation rel between them.]

Details:
(1) Select all ontologies containing A’ equivalent to A
(2) For each ontology containing A’:
  (a) if A’ ⊆ C, find a relation between C and B.
  (b) if A’ ⊇ C, find a relation between C and B.

Rules:
(r1) A’ ⊆ C ∧ C ⊆ B ⇒ A ⊆ B
(r2) A’ ⊆ C ∧ C ≡ B ⇒ A ⊆ B
(r3) A’ ⊆ C ∧ C ⊥ B ⇒ A ⊥ B
(r4) A’ ⊇ C ∧ C ⊇ B ⇒ A ⊇ B
(r5) A’ ⊇ C ∧ C ≡ B ⇒ A ⊇ B
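The two-step lookup of Strategy 2 can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the Scarlet implementation: the function `strategy2`, the `RULES` table layout and the toy `ONTOLOGIES` data are hypothetical, anchoring is reduced to exact label match, and the assignment of each axiom to a named ontology is assumed for the example.

```python
# Relation symbols as used on the slides.
EQ, SUB, SUP, DISJ = "≡", "⊆", "⊇", "⊥"

# (relation of A' to C, relation of C to B) -> (derived relation of A to B, rule label)
RULES = {
    (SUB, SUB): (SUB, "r1"),
    (SUB, EQ): (SUB, "r2"),
    (SUB, DISJ): (DISJ, "r3"),
    (SUP, SUP): (SUP, "r4"),
    (SUP, EQ): (SUP, "r5"),
}

# Hypothetical toy data: each ontology is a list of (concept, relation, concept) triples.
ONTOLOGIES = {
    "pizza-to-go": [("Ham", SUB, "Meat")],
    "SUMO": [("Meat", SUB, "Food")],
    "wine.owl": [("Meat", DISJ, "Seafood")],
}


def strategy2(a, b, ontologies):
    """Derive relations between a and b by chaining facts from two ontologies."""
    derived = []
    for first_name, first_triples in ontologies.items():
        for a_prime, rel_ac, c in first_triples:
            if a_prime != a:  # anchoring simplified to exact label match
                continue
            # Look for the intermediate concept C and the target B in any ontology.
            for second_name, second_triples in ontologies.items():
                for c_prime, rel_cb, b_prime in second_triples:
                    if (c_prime, b_prime) == (c, b) and (rel_ac, rel_cb) in RULES:
                        relation, rule = RULES[(rel_ac, rel_cb)]
                        derived.append((relation, rule, first_name, second_name))
    return derived


print(strategy2("Ham", "Food", ONTOLOGIES))
print(strategy2("Ham", "Seafood", ONTOLOGIES))
```

Returning the rule label and the two ontology names alongside the relation keeps a trace of how each mapping was derived, in the spirit of the formal "explanation" mentioned in the findings.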

Strategy 2 - Examples

Ex1: Chicken vs. Food (midlevel-onto, Tap)
  Chicken ⊆ Poultry ∧ Poultry ⊆ Food ⇒ Chicken ⊆ Food (r1)

Ex2: Ham vs. Food (pizza-to-go, SUMO)
  Ham ⊆ Meat ∧ Meat ⊆ Food ⇒ Ham ⊆ Food (r1)

(Same results for Duck, Goose, Turkey)

Ex3: Ham vs. Seafood (pizza-to-go, wine.owl)
  Ham ⊆ Meat ∧ Meat ⊥ Seafood ⇒ Ham ⊥ Seafood (r3)


Context: Ontology Matching

– Label similarity methods
  • e.g., Full_Professor = FullProfessor
– Structure similarity methods
  • Using taxonomic/property related information

New paradigm: use of background knowledge
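As a point of reference for the label similarity methods mentioned above, here is a minimal sketch of how Full_Professor and FullProfessor can be recognised as the same label. The functions `normalize` and `labels_match` are hypothetical names, and real matchers typically combine this kind of normalisation with string distance measures, which are not shown.

```python
import re


def normalize(label):
    """Lowercase a label and split it on camelCase boundaries, underscores, hyphens and spaces."""
    # Insert a space wherever a lowercase letter or digit is followed by an uppercase letter.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
    tokens = re.split(r"[\s_\-]+", spaced.strip())
    return " ".join(token.lower() for token in tokens if token)


def labels_match(first, second):
    """Treat two labels as equivalent when their normalised forms are identical."""
    return normalize(first) == normalize(second)


print(labels_match("Full_Professor", "FullProfessor"))
```

Exact matching after normalisation only yields equivalence candidates, which is one reason background knowledge is brought in to find subsumption and disjointness relations.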

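[Figure: A and B are anchored to A’ and B’ in the background knowledge (external source); the relation R found between A’ and B’ is taken as the relation R between A and B.]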

External Source = One Ontology

Aleksovski et al. EKAW’06
• Map (anchor) terms into concepts from a richly axiomatized domain ontology
• Derive a mapping based on the relation of the anchor terms

Assumes that a suitable (rich, large) domain ontology (DO) is available.

Strategy 1 - Definition

Find ontologies that contain equivalent classes for A and B and use their relationship in the ontologies to derive the mapping.

[Figure: the relation rel between A and B is derived from the Semantic Web, where the ontologies O1, O2, …, On contain the pairs A1’ and B1’, A2’ and B2’, …, An’ and Bn’.]

For each ontology use these rules:
A’ ≡ B’ ⇒ A ≡ B
A’ ⊆ B’ ⇒ A ⊆ B
A’ ⊇ B’ ⇒ A ⊇ B
A’ ⊥ B’ ⇒ A ⊥ B

These rules can be extended to take into account indirect relations between A’ and B’, e.g., between parents of A’ and B’:
A’ ⊆ C’ ∧ C’ ⊥ B’ ⇒ A ⊥ B
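The per-ontology rules of Strategy 1, including the extension through a parent of A’, can be sketched as below. This is a simplified illustration rather than the actual Scarlet code: the function `strategy1`, the `INVERSE` table and the toy `ONTOLOGIES` facts are hypothetical, and anchoring is again reduced to exact label match.

```python
# Relation symbols as used on the slides, with the inverse of each relation.
INVERSE = {"≡": "≡", "⊆": "⊇", "⊇": "⊆", "⊥": "⊥"}

# Hypothetical toy data: each ontology is a list of (concept, relation, concept) triples.
ONTOLOGIES = {
    "O1": [("Supermarket", "⊆", "Building")],
    "O2": [("Building", "⊇", "Supermarket")],
    "O3": [("Ham", "⊆", "Meat"), ("Seafood", "⊥", "Meat")],
}


def strategy1(a, b, ontologies):
    """Return (ontology, relation) pairs for a and b, looking inside one ontology at a time."""
    found = []
    for name, triples in ontologies.items():
        facts = set(triples)
        # Direct rules: A' rel B' => A rel B, reading each triple in either direction.
        relations = [rel for x, rel, y in triples if (x, y) == (a, b)]
        relations += [INVERSE[rel] for x, rel, y in triples if (x, y) == (b, a)]
        # Extended rule: A' ⊆ C' and C' ⊥ B' => A ⊥ B.
        parents = [y for x, rel, y in triples if x == a and rel == "⊆"]
        for c in parents:
            if (c, "⊥", b) in facts or (b, "⊥", c) in facts:
                relations.append("⊥")
        found.extend((name, rel) for rel in relations)
    return found


print(strategy1("Supermarket", "Building", ONTOLOGIES))
print(strategy1("Ham", "Seafood", ONTOLOGIES))
```

Because every ontology is consulted independently, the same pair can receive several answers; how such answers are combined or checked for agreement is left out of this sketch.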

External Source = Web

van Hage et al. ISWC’05
• Rely on Google and an online dictionary in the food domain to extract semantic relations between candidate terms using IR techniques

[Figure: the relation rel between A and B is derived from the Web plus an online dictionary using IR methods]

Precision increases significantly if domain-specific sources are used: 50% - Web; 75% - domain texts.

Does not rely on a rich DO