Posted on 05-Jul-2015
Natural Language Interface: Challenges and
Partial Solutions
NURFADHLINA MOHD SHAREF (PhD)
Postdoctoral Fellow
Knowledge Technology Group
Centre of Artificial Intelligence
Faculty of Technology and
Information Science
Universiti Kebangsaan Malaysia
fadhlinams81@gmail.com
Outline
• Part 1: Introduction to Semantic Web
  – RDF
  – OWL
  – SPARQL
• Part 2: Natural Language Interface
  – Semantic Web Search Engine
  – NLI Applications
  – Challenges and Partial Solutions
  – Potential Works
• Part 3: Practical Examples
  – Mooney's Geography Dataset
  – Automatic SPARQL Construction for Natural Language-based Search in Semantic Database
Part 1
• Introduction to Semantic Web
– RDF
– OWL
– SPARQL
Semantic Web: "a web of data that can be processed directly and indirectly by machines" (Tim Berners-Lee)
RDF (Resource Description Framework)
• Talk about resources
  – Resources can be pretty much anything
  – Resources are identified by Uniform Resource Identifiers (URIs)
  – Things (in a broad sense) are labelled with URIs
  – URIs act as globally valid names
  – Sets of names are organized in vocabularies
  – Vocabularies are demarcated by namespaces
• Information is encoded in triples = subject-predicate-object patterns
  – Malaysia has capital Kuala Lumpur
  – Participant has course Semantic Technology
Taken from: http://www.w3.org/2009/Talks/1030-Philadelphia-IH/Tutorial.ppt
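The subject-predicate-object pattern above can be sketched as plain tuples; a minimal illustration (the URIs are made up for the example, not taken from any real vocabulary):

```python
# Hedged sketch: RDF statements as (subject, predicate, object) tuples.
# The example.org URIs are illustrative placeholders acting as
# globally valid names for the two triples from the slide.

EX = "http://example.org/"

triples = [
    (EX + "Malaysia", EX + "hasCapital", EX + "KualaLumpur"),
    (EX + "Participant", EX + "hasCourse", EX + "SemanticTechnology"),
]

# Every statement is exactly one subject-predicate-object triple:
for s, p, o in triples:
    print(s, p, o)
```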
[Figure: RDF graph. The resource/subject http://.../KualaLumpur is linked by the property/predicate :hasShoppingMall to the objects KLCC and Imbi_Plaza, both ShoppingMall literals. From Feigenbaum]
RDF Example
[Figure: RDF/XML document, with callouts marking the XML declaration and the namespace. The elements artist, country, company, price, and year, properties of the resource, are defined in the http://www.recshop.fake/cd# namespace.]
From: http://www.w3.org/TR/1998/WD-rdf-schema/
[Figure: RDF Schema example showing the rdf:type, rdfs:subClassOf, and rdfs:subPropertyOf relations]
Ontology in Information Science
• An ontology is an engineering artefact consisting of:
  – A vocabulary used to describe (a particular view of) some domain
  – An explicit specification of the intended meaning of the vocabulary; often includes classification-based information
  – Constraints capturing background knowledge about the domain
• Ideally, an ontology should:
  – Capture a shared understanding of a domain of interest
  – Provide a formal and machine-manipulable model
OWL
• Built on top of RDF
• For processing information on the web
• Designed to be interpreted by computers
• Not designed to be read by people
• Written in XML
• A W3C standard
• Based on predecessors (DAML+OIL)
• A Web Language: based on RDF(S)
• An Ontology Language: based on logic
OWL vs RDF
• OWL and RDF serve much the same purpose, but OWL is a stronger language with greater machine interpretability than RDF.
• OWL comes with a larger vocabulary and stronger syntax than RDF:
  – specific relations between classes, cardinality, equality, richer typing of properties, characteristics of properties, and enumerated classes.
• OWL comes in three increasingly expressive layers designed for different groups of users:
  – OWL Lite, OWL DL, and OWL Full
OWL Ontology
KualaLumpurInfo.owl
[Figure: RDF/OWL graph of KualaLumpurInfo.owl. http://.../KualaLumpur has :hasShoppingMall links to KLCC and Imbi_Plaza (instances of ShoppingMall, an rdfs:subClassOf Thing) and :hasPublicTransport links to land and rail; ERL and LRT are rail instances related by owl:equivalentOf; ExpressGrocers and Seven Eleven appear as further instances; :railTransport is an rdfs:subPropertyOf :hasPublicTransport, with its rdfs:domain and rdfs:range shown.]
[Figure: introductory SPARQL example; the query result:]

name | mbox
Johnny Lee Outlaw | <mailto:jlow@example.com>
Peter Goodguy | <mailto:peter@example.org>
The SPARQL Query Language: Operator AND (".")

SELECT ?name ?faculty
WHERE {
  ?teacher rdf:type Teachers .
  ?teacher name ?name .
  ?teacher faculty ?faculty .
}

[Figure: data graph with two Teachers instances, t1 (name "Joe", faculty "CS") and t2 (name "Fred", faculty "CS")]

Result:

?name | ?faculty
Joe | "CS"
Fred | "CS"
The SPARQL Query Language: Operator FILTER

SELECT ?name ?faculty
WHERE {
  ?teacher rdf:type Teachers .
  ?teacher name ?name .
  ?teacher faculty ?faculty .
  FILTER (?name = "Joe")
}

[Figure: data graph with two Teachers instances, t1 (name "Joe", faculty "CS") and t2 (name "Fred", faculty "CS")]

Result:

?name | ?faculty
Joe | "CS"
The SPARQL Query Language: Operator OPTIONAL

SELECT ?name ?faculty ?title
WHERE {
  ?teacher rdf:type Teachers .
  ?teacher name ?name .
  ?teacher faculty ?faculty .
  OPTIONAL {
    ?teacher title ?title .
  }
}

[Figure: data graph with two Teachers instances, t1 (name "Joe", faculty "CS") and t2 (name "Fred", faculty "CS", title "Professor")]

Result:

?name | ?faculty | ?title
Joe | "CS" |
Fred | "CS" | "Professor"
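The AND (".") operator above can be understood as a join over variable bindings: each triple pattern yields bindings, and consecutive patterns keep only the bindings that agree. A minimal sketch over the Teachers data (a toy matcher, not a real SPARQL engine; names follow the slides):

```python
# Toy triple store mirroring the Teachers example from the slides.
TRIPLES = [
    ("t1", "rdf:type", "Teachers"),
    ("t2", "rdf:type", "Teachers"),
    ("t1", "name", "Joe"),
    ("t2", "name", "Fred"),
    ("t1", "faculty", "CS"),
    ("t2", "faculty", "CS"),
]

def match(pattern, binding):
    """Yield extended bindings for one triple pattern ('?x' marks variables)."""
    for triple in TRIPLES:
        new = dict(binding)
        ok = True
        for p, t in zip(pattern, triple):
            if p.startswith("?"):
                if p in new and new[p] != t:
                    ok = False
                    break
                new[p] = t
            elif p != t:
                ok = False
                break
        if ok:
            yield new

def select(patterns, variables):
    """Join all patterns (the '.' operator) and project the given variables."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(pattern, b)]
    return [tuple(b[v] for v in variables) for b in bindings]

# SELECT ?name ?faculty WHERE { ?t rdf:type Teachers . ?t name ?name . ?t faculty ?faculty }
rows = select(
    [("?t", "rdf:type", "Teachers"), ("?t", "name", "?name"), ("?t", "faculty", "?faculty")],
    ["?name", "?faculty"],
)
print(sorted(rows))  # [('Fred', 'CS'), ('Joe', 'CS')]
```

FILTER would simply drop bindings failing a predicate, and OPTIONAL would keep a binding even when its extra pattern finds no match.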
Part 2
• Natural Language Interface
– Semantic Web Search Engine
– NLI Applications
– Challenges and Partial Solutions
– Potential Works
Semantic Web Search Engine
• Aims to understand the intent of the searcher and return results in the context of the query's meaning.
• Distinguished from standard search engines in that the source documents are RDF, OWL and RDF-extended HTML documents.
• E.g: Swoogle, Serene, Watson
Natural Language Interface (NLI)
• Allows users to query in human-like sentences, without requiring them to be aware of the underlying schema, vocabulary and query language
• Famous for question answering
• Three types of NLI:
  – with structured data such as databases and ontologies
  – with semi- or unstructured data such as text documents
  – interactive settings such as conversational systems
• Approaches:
  – Controlled Natural Language for query construction
  – Visual-based query construction
  – NL query mapping to triple representation
NLI Example - NLPReduce
NLI Example – Semantic Crystal
NLI Example – GINO & Ginseng
NLI Example - Querix
NLI Example - AquaLog
NLI Example - PowerAqua
NLI Example - FREyA
Comparison
System | Year | Input type | Synonym support | Syntactic analysis | Calculate string similarity | Clarification dialogue | Learnability | Support KB heterogeneity
SemanticCrystal | 1993 | Graphical-based query | NO | NO | NO | NO | NO | NO
GINO / Ginseng | 2006 | Controlled natural language based interface | WordNet | YES | NO | NO | NO | NO
Querix | 2006 | Query by example | WordNet | NO | NO | YES | NO | NO
NLPReduce | 2007 | Keywords, sentence fragments and full sentences | NO | NO | NO | NO | NO | NO
QuestIO | 2008 | Full natural language | Gazetteer | YES | YES | NO | NO | NO
ORAKEL | 2008 | Factual question | Lexicon | NO | NO | NO | NO | NO
AquaLog / PowerAqua | 2010 | Full natural language | WordNet, Lexicon | YES | YES | YES | NO | YES
FREyA | 2012 | Full natural language | WordNet | YES | NO | YES | YES | NO
NLI Implementation
• Query: “Who wrote The Neverending Story?”
• PowerAqua triple:
  <[person, organization], wrote, Neverending Story>
• Triple matching from DBpedia:
  <Writer, IS A, Person>
  <Writer, author, The Neverending Story>
• Answer: "Michael Ende"
NLI Challenges (Unger et al., 2012)
1.
(a) Which cities have more than three universities?
(b) <[cities],more than,universities three>
(c) SELECT ?y WHERE {
      ?x rdf:type onto:University . ?x onto:city ?y .
    }
    GROUP BY ?y
    HAVING (COUNT(?x) > 3)
2.
(a) Who produced the most films?
(b) <[person,organization], produced,most films>
(c) SELECT ?y WHERE {
      ?x rdf:type onto:Film . ?x onto:producer ?y .
    }
    GROUP BY ?y
    ORDER BY DESC(COUNT(?x)) OFFSET 0 LIMIT 1
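Queries like these are often produced by filling slots in a query template matched to the parsed question, in the spirit of Unger et al. (2012). A minimal sketch of that idea; the template shape and slot names here are illustrative, not the paper's actual templates:

```python
# Hedged sketch of template-based SPARQL construction: an aggregate
# template whose slots (class, property, threshold) are filled from
# the parsed question. Slot names are assumptions for illustration.

AGGREGATE_TEMPLATE = (
    "SELECT ?y WHERE {{\n"
    "  ?x rdf:type {cls} .\n"
    "  ?x {prop} ?y .\n"
    "}}\n"
    "GROUP BY ?y\n"
    "HAVING (COUNT(?x) > {n})"
)

def fill(cls, prop, n):
    """Instantiate the aggregate template with concrete slot values."""
    return AGGREGATE_TEMPLATE.format(cls=cls, prop=prop, n=n)

# "Which cities have more than three universities?"
print(fill("onto:University", "onto:city", 3))
```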
NLI Challenges
• Layer 1: Query understanding
– E.g: Complex query: negation, subqueries, arithmetic operation, etc
• Layer 2: Query-KB granularity homogenisation
– E.g: Different format/styles
– E.g: Mismatch in concept name
• Layer 3: Result presentation
– E.g: ranking
Query Understanding
• Input type
– Current
• Guided query: controlled natural language, query indicator (e.g: WH-terms)
• Graphical query construction
– Problem
• Confusing
• Requires a degree of background knowledge
• Constrained search
Query Understanding
• Compositional Density
  – Current
    • Triples generated by PowerAqua for "Give me five albums by Pink Lloyd":
      – <[albums, five], null, Lloyd Pink>
      – <[five], null, albums>
      – <[Pink], null, Lloyd>
  – Potential Works
    • Negation (e.g: not, outside, except)
    • Arithmetic (e.g: sum of, how many, largest)
    • Auxiliary (e.g: largest, latest, top)
Query Understanding
• Ambiguity Reduction
– Triple identification
– Stanford Parser
– WordNet
– Similarity Matching
– Clarification Dialogue
– Entity Identification
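Similarity matching, one of the disambiguation aids listed above, can be sketched with a plain string-similarity ratio. This toy example matches a query term against KB concept names; the concept list and threshold are illustrative assumptions:

```python
from difflib import SequenceMatcher

# Hypothetical sketch: map a query term to the closest KB concept name
# by string similarity. CONCEPTS and the 0.6 threshold are assumptions
# for illustration, not part of any real NLI system.

CONCEPTS = ["City", "Capital", "State", "Mountain", "Lake", "River", "Road"]

def best_match(term, concepts=CONCEPTS, threshold=0.6):
    """Return the most similar concept name, or None below the threshold."""
    scored = [(SequenceMatcher(None, term.lower(), c.lower()).ratio(), c)
              for c in concepts]
    score, concept = max(scored)
    return concept if score >= threshold else None

print(best_match("cities"))  # 'City'
```

Real systems combine this with WordNet synonyms and, when the score is too low, a clarification dialogue with the user.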
Types of queries (Ferre & Hermann, 2011)
• Visualization – exploration of the facet hierarchy
• Selection – count or list items that have a particular feature
• Path – follow a path of properties
• Disjunction – requires the use of unions
• Negation – requires the use of exclusions
• Inverse – requires crossing the inverse of properties
• Cycle – requires the use of co-reference variables (naming and reference navigation links)
Query-KB granularity homogenisation
• KB variation
– Format (e.g: RDF, OWL)
– Style (e.g: with/without schema)
– Concept names (e.g: length)
– Query-triple conversion
– Sources supported (single/multi sources, LOD)
• Disambiguation
  – WordNet, Similarity Matching, Clarification Dialogue
Result Understanding
• Ranking result
– List vs. finite answer
– Degree of confidence / hit score
– Learnability
Part 3
• Practical Examples
– Mooney's Geography Dataset
– Automatic SPARQL Construction for Natural Language-based Search in Semantic Database
Geography.owl

Classes: City, Capital, State, HiPoint, LoPoint, Mountain, Lake, River, Road

DataTypeProperty:
Name | Domain | Range
cityPopulation | City | float
statePopulation | State | float
statePopDensity | State | float
abbreviation | State | string
stateArea | State | float
lakeArea | Lake | float
height | Mountain | float
hiElevation | HiPoint | float
loElevation | LoPoint | float
length | River | float
number | Road | float

ObjectProperty:
Name | Domain | Range
borders | State | State
isCityOf | City | State
hasCity | State | City
isCapitalOf | Capital | State
hasCapital | State | Capital
isMountainOf | Mountain | State
hasMountain | State | Mountain
isHighestPointOf | HiPoint | State
hasHighPoint | State | HiPoint
isLowestPointOf | LoPoint | State
hasLowPoint | State | LoPoint
isLakeOf | Lake | State
hasLake | State | Lake
runsThrough | River | State
hasRiver | State | River
passesThrough | Road | State
hasRoad | State | Road
• Can you tell me the capital of texas?
• Give me all the states of usa?
• Give me the cities in texas?
• Give me the cities which are in texas?
• Give me the lakes in california?
• Give me the states that border utah?
• Give me the number of rivers in california?
• How many citizens in alabama?
• How many citizens live in california?
• Give me the longest river that passes through the us?
• How many citizens does the biggest city have in the usa?
• Give me the largest state?
• Could you tell me what is the highest point in the state of oregon?
• Count the states which have elevations lower than what alabama has?
• How big is texas?
• How big is the city of new york?
• How many colorado rivers are there?
• How high is guadalupe peak?
• How high is mount mckinley?
• How many cities named austin are there in the usa?
• How large is texas?
• How long is rio grande?
• How long is the colorado river?
• How long is the mississippi?
• How long is the mississippi river?
• How long is the mississippi river in miles?
• How many capitals does rhode island have?
• How many cities does texas have?
• How many cities does the usa have?
• How high are the highest points of all the states?
• How high is the highest point in america?
• How high is the highest point in montana?
• How high is the highest point in the largest state?
• How large is the largest city in alaska?
• How long is the longest river in california?
• How long is the longest river in the usa?
• How long is the shortest river in the usa?
• How many big cities are in pennsylvania?
Approach
• Can you tell me the capital of texas?
  – POS: Can/MD you/PRP tell/VB me/PRP the/DT capital/NN of/IN texas/NNS ?/.
  – Triple: <capital, ?, texas>
  – SPARQL:
    PREFIX geo:<http://www.mooney.net/geo#>
    SELECT ?s
    WHERE { ?s geo:isCapitalOf geo:texas . }
  – Answer: geo:austinTx
• Give me all the states of usa?
  – POS: Give/VB me/PRP all/PDT the/DT states/NNS of/IN usa/NN ?/.
  – Triple: <states, ?, usa>
  – SPARQL:
    PREFIX geo:<http://www.mooney.net/geo#>
    SELECT ?s
    WHERE { ?s a geo:State . }
  – Answer: geo:kansas, geo:rhodeIsland, geo:montana, geo:tennessee, geo:arkansas, geo:newMexico, … (all the states)
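The POS-to-SPARQL heuristic above can be sketched in a few lines: pick the noun words from the tagged question, form the <concept, ?, instance> triple, and look up a matching ObjectProperty. This is a hedged toy version, not the actual system; the property table here covers only the one example:

```python
# Hedged sketch of the heuristic pipeline: POS tags -> triple -> SPARQL.
# PROPERTIES is a toy (concept, instance) -> predicate table covering one
# Geography.owl case; the real system's KB lookup is more involved.

PROPERTIES = {("capital", "texas"): "isCapitalOf"}

def nouns(tagged):
    """Return words tagged as nouns (NN/NNS) from 'word/TAG' tokens."""
    return [w for w, t in (tok.rsplit("/", 1) for tok in tagged.split())
            if t.startswith("NN")]

def to_sparql(tagged):
    """Build a SPARQL query from the first two nouns of the question."""
    concept, instance = nouns(tagged)[:2]       # e.g. <capital, ?, texas>
    pred = PROPERTIES.get((concept, instance))
    return ("PREFIX geo:<http://www.mooney.net/geo#>\n"
            "SELECT ?s WHERE { ?s geo:%s geo:%s . }" % (pred, instance))

q = "Can/MD you/PRP tell/VB me/PRP the/DT capital/NN of/IN texas/NNS ?/."
print(to_sparql(q))
```

Note how quickly this breaks: the second example ("Give me all the states of usa?") needs a class query rather than a property lookup, which is exactly the kind of gap the "More to Do" slide raises.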
POS tagging
• Give/VB me/PRP the/DT cities/NNS which/WDT are/VBP in/IN texas/NNS ?/.
• Give/VB me/PRP the/DT lakes/NNS in/IN california/NN ?/.
• Give/VB me/PRP the/DT states/NNS that/WDT border/NN utah/NN ?/.
• Give/VB me/PRP the/DT number/NN of/IN rivers/NNS in/IN california/NN ?/.
• How/WRB many/JJ citizens/NNS in/IN alabama/NN ?/.
More to Do
• Domain dependent/independent?
• Are heuristics based on POS tags and KB compliance enough for SPARQL generation?
• More complex queries
– Arithmetic operation (COUNT, SUB-QUERY)
– Aggregation (requires FILTER, OPTIONAL, HAVING)
– Auxiliary (e.g: latest, earliest)
Conclusion
• NLI is a potential area
• Highlight: ambiguity reduction, query understanding, query-KB matching
• Focus: SPARQL generation and optimization
• Potential sub-area: negation, arithmetic, temporal, complex queries
References
• Ferré, S., & Hermann, A. (2011). Semantic search: Reconciling expressive querying and exploratory search. In ISWC'11: Proceedings of the 10th International Conference on The Semantic Web (pp. 177-192).
• Unger, C., Bühmann, L., Lehmann, J., Ngonga Ngomo, A.-C., Gerber, D., & Cimiano, P. (2012). Template-based question answering over RDF data. In Proceedings of the 21st International Conference on World Wide Web (WWW '12), 639. New York, NY, USA: ACM Press. doi:10.1145/2187836.2187923
Contact
Nurfadhlina Mohd Sharef
• Postdoctoral Fellow, Knowledge Technology Group, Centre of Artificial Intelligence, Universiti Kebangsaan Malaysia (Room 4.4, Level 4, Block H)
• Department of Computer Science, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia (Room C2.08, Level 2, Block C)
• fadhlinams81@gmail.com