Danica Damljanović, Milan Agatonović, Hamish Cunningham contact: danica@dcs.shef.ac.uk Natural Language Interfaces to Ontologies: Combining Syntactic Analysis and Ontology-Based Lookup through the User Interaction

Page 1: Danica Damljanović, Milan Agatonović, Hamish Cunningham contact: danica@dcs.shef.ac.uk Natural Language Interfaces to Ontologies: Combining Syntactic Analysis.

Danica Damljanović, Milan Agatonović, Hamish Cunningham

contact: danica@dcs.shef.ac.uk

Natural Language Interfaces to Ontologies: Combining Syntactic Analysis and Ontology-Based Lookup through the User Interaction

ESWC 2010

WEB OF DATA

Large datasets such as Linked Open Data are available

How can we use these data?

Modigliani test: “tell me the locations of all the original paintings of Modigliani” (Richard MacManus, ReadWriteWeb)

03 JUNE 2010

PASSING MODIGLIANI TEST

Source: http://blog.larkc.eu/, “LDSR Passes the Modigliani Test for Semantic Web”. Generating this SPARQL query took more than one hour:

PREFIX fb: <http://rdf.freebase.com/ns/>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbp-prop: <http://dbpedia.org/property/>
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX umbel-sc: <http://umbel.org/umbel/sc/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ot: <http://www.ontotext.com/>

SELECT DISTINCT ?painting_l ?owner_l ?city_fb_con ?city_db_loc ?city_db_cit
WHERE {
  ?p fb:visual_art.artwork.artist dbpedia:Amedeo_Modigliani ;
     fb:visual_art.artwork.owners [ fb:visual_art.artwork_owner_relationship.owner ?ow ] ;
     ot:preferredLabel ?painting_l .
  ?ow ot:preferredLabel ?owner_l .
  OPTIONAL { ?ow fb:location.location.containedby [ ot:preferredLabel ?city_fb_con ] } .
  OPTIONAL { ?ow dbp-prop:location ?loc .
             ?loc rdf:type umbel-sc:City ;
                  ot:preferredLabel ?city_db_loc }
  OPTIONAL { ?ow dbp-ont:city [ ot:preferredLabel ?city_db_cit ] }
}

PASSING MODIGLIANI TEST: FUTURE

“tell me the locations of all the original paintings of Modigliani”

BUT, OTHERS HAVE ALREADY DONE IT?

[Figure: quadrant chart. Quadrant labels: low precision, high recall; low precision, low recall; high precision, high recall; high precision, low recall. Other labels: large datasets (several domains); small datasets (narrow domain); simple factual questions; complex questions.]

(Damljanović and Bontcheva, 2009)

FREYA (FEEDBACK, REFINEMENT, EXTENDED VOCABULARY AGGREGATOR)

Increase recall by generating a dialog whenever an “unknown” term appears in the question

Increase precision by generating a dialog whenever one term refers to more than one concept in the ontology

The dialog is generated by combining the language of the user and the ontology

Learn from the dialog
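
The two dialog triggers and the learning step listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not FREyA's implementation: the `LOOKUP` table, the `learned` store, and the function names `needs_dialog` and `record_choice` are hypothetical and chosen only for this example.

```python
from typing import Dict, List, Optional

# Hypothetical lookup table: question term -> matching ontology concepts.
LOOKUP: Dict[str, List[str]] = {
    "new york": ["geo:City", "geo:State"],  # one term, two concepts
    "state": ["geo:State"],                 # one term, one concept
}

# Choices remembered from earlier dialogs ("learn from the dialog").
learned: Dict[str, str] = {}


def needs_dialog(term: str) -> Optional[str]:
    """Return why a dialog is needed for this term, or None if it is resolved."""
    if term in learned:
        return None
    candidates = LOOKUP.get(term, [])
    if not candidates:
        return "unknown term"    # dialog intended to raise recall
    if len(candidates) > 1:
        return "ambiguous term"  # dialog intended to raise precision
    return None


def record_choice(term: str, concept: str) -> None:
    """Store the user's answer so the same dialog is not asked again."""
    learned[term] = concept
```

In this sketch, `needs_dialog("population")` reports an unknown term, `needs_dialog("new york")` reports an ambiguous term, and after `record_choice("new york", "geo:City")` no further dialog is requested for that term.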

FREYA WORKFLOW

[Figure: FREyA workflow diagram. Labels: NL query; POCs (Potential Ontology Concepts); OCs (Ontology Concepts); triples; SPARQL; answer; learn.]

FINDING POCS

FINDING OCS

MAPPING POC TO OCS

[Figure: mapping diagram. Labels: POC (twice); new york; population; geo:City; geo:State; geo:cityPopulation.]

NEW YORK IS A CITY

NEW YORK IS A STATE

THE USER CONTROLS THE OUTPUT

[Figure: mapping diagram. Labels: POC (three times); point; state; area; geo:isLowestPointOf; geo:LoPoint; geo:loElevation; geo:State; geo:stateArea; min; max.]

WHAT IS THE LOWEST POINT OF THE STATE WITH THE LARGEST AREA?

TRIPLES:
?firstJoker - geo:isLowestPointOf - geo:State
geo:State - (max) geo:stateArea - ?lastJoker

SPARQL:
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select ?firstJoker ?p0 ?c1 ?p2 ?lastJoker
where {
  { { ?c1 ?p0 ?firstJoker } UNION { ?firstJoker ?p0 ?c1 } .
    filter (?p0=<http://www.mooney.net/geo#isLowestPointOf>) . }
  ?c1 rdf:type <http://www.mooney.net/geo#State> .
  ?c1 ?p2 ?lastJoker .
  filter (?p2=<http://www.mooney.net/geo#stateArea>) .
}
ORDER BY DESC(xsd:double(?lastJoker))

however...

TRIPLES:
?firstJoker - (min) geo:loElevation - geo:LoPoint
geo:LoPoint - ?joker3 - geo:State
geo:State - (max) geo:stateArea - ?lastJoker

SPARQL:
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select ?firstJoker ?p0 ?c1 ?joker3 ?c2 ?p3 ?lastJoker
where {
  ?c1 ?p0 ?firstJoker .
  filter (?p0=<http://www.mooney.net/geo#loElevation>) .
  ?c1 rdf:type <http://www.mooney.net/geo#LoPoint> .
  {{ ?c2 ?joker3 ?c1 } UNION { ?c1 ?joker3 ?c2 }}
  ?c2 rdf:type <http://www.mooney.net/geo#State> .
  ?c2 ?p3 ?lastJoker .
  filter (?p3=<http://www.mooney.net/geo#stateArea>) .
}
ORDER BY ASC(xsd:double(?firstJoker)) DESC(xsd:double(?lastJoker))

the answer for both is Death Valley
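
One way to read the two queries above is that each (min) or (max) annotation on a triple becomes an ORDER BY key: ascending for min, descending for max. The sketch below illustrates that reading only. The function name `order_by_clause` and the (function, variable) pair format are assumptions made for this example, and nothing here is taken from FREyA's actual query generator.

```python
from typing import List, Tuple

# Each key is (function chosen by the user, SPARQL variable holding the value).
OrderKey = Tuple[str, str]

# Assumed mapping from the user's choice to a SPARQL sort direction.
DIRECTION = {"min": "ASC", "max": "DESC"}


def order_by_clause(keys: List[OrderKey]) -> str:
    """Build an ORDER BY clause from (function, variable) pairs, in order."""
    if not keys:
        return ""
    parts = [f"{DIRECTION[func]}(xsd:double(?{var}))" for func, var in keys]
    return "ORDER BY " + " ".join(parts)
```

With `[("max", "lastJoker")]` this yields the ORDER BY clause of the first query, and with `[("min", "firstJoker"), ("max", "lastJoker")]` it yields the clause of the second.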

FREYA: A NATURAL LANGUAGE INTERFACE TO ONTOLOGIES

http://gate.ac.uk/freya

EVALUATION

• correctness

• ranked suggestions

• learning

EVALUATION: CORRECTNESS

Mooney GeoQuery dataset: 250 questions

incorrect: 19
2 dialogs: 32
1 dialog: 127
no dialog: 72

Precision = Recall = 92.4%
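
The reported 92.4% can be checked from the four counts above. The short calculation below assumes the counts pair with the legend labels in the order shown (19 incorrect, 32 with two dialogs, 127 with one dialog, 72 with no dialog). Precision and recall coincide if every question receives an answer, which is assumed here because both then reduce to correct / total.

```python
# Counts from the chart; the pairing of counts to labels is inferred
# from their order on the slide and is an assumption.
counts = {"incorrect": 19, "2 dialogs": 32, "1 dialog": 127, "no dialog": 72}

total = sum(counts.values())           # all questions in the dataset
correct = total - counts["incorrect"]  # answered correctly, with or without dialog
accuracy = correct / total             # equals both precision and recall here

print(f"{correct}/{total} = {accuracy:.1%}")  # prints "231/250 = 92.4%"
```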

EVALUATION: SUGGESTIONS RANKING

Mooney GeoQuery dataset: 250 questions

Manually labelled correct rankings

Mean Reciprocal Rank (MRR): 0.81
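
For reference, Mean Reciprocal Rank averages 1/rank of the correct suggestion over all dialogs, so a value of 1.0 means the correct suggestion was always ranked first. A minimal sketch follows. The function name and the convention of using `None` for a missing correct suggestion are assumptions of this example, and the sample ranks in the usage note are made up, not taken from the evaluation.

```python
from typing import List, Optional


def mean_reciprocal_rank(ranks: List[Optional[int]]) -> float:
    """Compute MRR from 1-based ranks of the correct suggestion.

    ranks[i] is the position of the correct suggestion in dialog i,
    or None if the correct suggestion was not listed (contributes 0).
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(0.0 if r is None else 1.0 / r for r in ranks) / len(ranks)
```

For example, ranks of 1, 2 and 4 give (1 + 0.5 + 0.25) / 3, which is about 0.58.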

EVALUATION: LEARNING

103 questions correctly answered by engaging the user in one dialog

MRR 0.72

MRR improved from 0.72 to 0.78

NEXT STEPS

Passing the Modigliani test

Exploring unknown data structures with FREyA, especially if they are large

LDSR: DBpedia, Freebase, Geonames, UMBEL, WordNet, CIA World Factbook, Lingvoj, MusicBrainz

http://ontotext.com/ldsr

User-centric evaluation

Contact: danica@dcs.shef.ac.uk

THANK YOU FOR YOUR ATTENTION! QUESTIONS?

Thanks to Abraham Bernstein and Esther Kaufmann from the University of Zurich for sharing the Mooney dataset in OWL format with us, and to J. Mooney from the University of Texas for making this dataset publicly available.

REFERENCES

Damljanović, D., Bontcheva, K.: Towards Enhanced Usability of Natural Language Interfaces to Knowledge Bases. In Devedzic, V. and Gasevic, D. (Eds.), Special Issue on Semantic Web and Web 2.0, Annals of Information Systems, Springer-Verlag, 2009.
