Semantic Web & Natural Language

Posted on 05-Jul-2015


Description

A system called a natural language interface, which transforms a user's natural language question into a SPARQL query. Find related papers here: https://sites.google.com/site/fadhlinams81/publication


Natural Language Interface: Challenges and Partial Solutions

NURFADHLINA MOHD SHAREF (PhD)

Postdoctoral Fellow

Knowledge Technology Group

Centre of Artificial Intelligence

Faculty of Technology and Information Science

Universiti Kebangsaan Malaysia

fadhlinams81@gmail.com

Outline

• Part 1: Introduction to Semantic Web
  – RDF
  – OWL
  – SPARQL

• Part 2: Natural Language Interface
  – Semantic Web Search Engine
  – NLI Applications
  – Challenges and Partial Solutions
  – Potential Works

• Part 3: Practical Examples
  – Mooney's Geography Dataset
  – Automatic SPARQL Construction for Natural Language-based Search in Semantic Database

Part 1

• Introduction to Semantic Web

– RDF

– OWL

– SPARQL

Semantic Web: “a web of data that can be processed directly and indirectly by machines” (Tim Berners-Lee)

RDF (Resource Description Framework)

• Talks about resources
  – Resources can be pretty much anything
  – Resources are identified by Uniform Resource Identifiers (URIs)
  – Things (in a broad sense) are labelled with URIs
  – URIs act as globally valid names
  – Sets of names are organized in vocabularies
  – Vocabularies are demarcated by namespaces

• Information is encoded in triples = subject-predicate-object patterns
  – Malaysia has capital Kuala Lumpur
  – Participant has course Semantic Technology

Taken from: http://www.w3.org/2009/Talks/1030-Philadelphia-IH/Tutorial.ppt
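As a rough illustration (not part of the slides), triples can be modelled as plain (subject, predicate, object) tuples with a tiny wildcard matcher; the predicate names below are invented for the example.

```python
# Two triples mirroring the slide's examples; names here are illustrative only.
triples = [
    ("Malaysia", "hasCapital", "KualaLumpur"),
    ("Participant", "hasCourse", "SemanticTechnology"),
]

def match(pattern, data):
    """Return triples matching an (s, p, o) pattern; None acts as a wildcard."""
    return [t for t in data
            if all(p is None or p == v for p, v in zip(pattern, t))]

print(match(("Malaysia", None, None), triples))
# -> [('Malaysia', 'hasCapital', 'KualaLumpur')]
```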

[Figure (from Feigenbaum): an RDF graph — the resource/subject http://.../KualaLumpur is linked by the property/predicate :hasShoppingMall to the objects KLCC and Imbi_Plaza, whose names are literals; ShoppingMall is the class.]

RDF Example

[Figure: an RDF/XML document, with the XML declaration and namespace declarations highlighted. The properties of the resource — the elements artist, country, company, price, and year — are defined in the http://www.recshop.fake/cd# namespace.]

From: http://www.w3.org/TR/1998/WD-rdf-schema/

[Figure: an RDF Schema class hierarchy — resources related by rdf:type, rdfs:subClassOf and rdfs:subPropertyOf.]

Ontology in Information Science

• An ontology is an engineering artefact consisting of:
  – A vocabulary used to describe (a particular view of) some domain
  – An explicit specification of the intended meaning of the vocabulary (often includes classification-based information)
  – Constraints capturing background knowledge about the domain

• Ideally, an ontology should:
  – Capture a shared understanding of a domain of interest
  – Provide a formal and machine-manipulable model

OWL

• built on top of RDF
• for processing information on the web
• designed to be interpreted by computers, not to be read by people
• written in XML
• a W3C standard
• based on predecessors (DAML+OIL)
• a Web Language: based on RDF(S)
• an Ontology Language: based on logic

OWL vs RDF

• OWL and RDF are much the same thing, but OWL is a stronger language with greater machine interpretability than RDF.

• OWL comes with a larger vocabulary and stronger syntax than RDF:
  – specific relations between classes, cardinality, equality, richer typing of properties, characteristics of properties, and enumerated classes.

• OWL comes in three increasingly expressive layers that are designed for different groups of users:
  – OWL Lite, OWL DL, and OWL Full

OWL Ontology


KualaLumpurInfo.owl

[Figure: the KualaLumpurInfo.owl ontology graph — http://.../KualaLumpur :hasShoppingMall KLCC and Imbi_Plaza, and :hasPublicTransport land and rail (with ERL and LRT as rail instances, and ExpressGrocers and Seven Eleven as further individuals). The diagram is annotated with rdf:type, rdfs:subClassOf, rdfs:subPropertyOf (:railTransport), rdfs:domain, rdfs:range and owl:equivalentOf, with Thing as the top class.]

[Example data — people with name and mbox properties:]

name | mbox
Johnny Lee Outlaw | <mailto:jlow@example.com>
Peter Goodguy | <mailto:peter@example.org>

The SPARQL Query Language

Operator AND (".")

SELECT ?name ?faculty
WHERE {
  ?teacher rdf:type Teachers .
  ?teacher name ?name .
  ?teacher faculty ?faculty .
}

Result:

?name | ?faculty
Joe   | "CS"
Fred  | "CS"

[Figure: the queried graph — teachers t1 and t2, with names Joe and Fred and faculty "CS".]
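The join behaviour of "." can be sketched in Python. This is a simplified evaluator, not how a real SPARQL engine works; the data mirrors the slide's graph (t1/Joe and t2/Fred, both in "CS").

```python
# Toy triple store matching the slide's graph.
triples = [
    ("t1", "rdf:type", "Teachers"), ("t1", "name", "Joe"),  ("t1", "faculty", "CS"),
    ("t2", "rdf:type", "Teachers"), ("t2", "name", "Fred"), ("t2", "faculty", "CS"),
]

def solve(patterns, data):
    """Evaluate a basic graph pattern. Each pattern is (s, p, o); strings
    starting with '?' are variables. Returns a list of variable bindings."""
    solutions = [{}]
    for s, p, o in patterns:
        next_solutions = []
        for binding in solutions:
            for ts, tp, to in data:
                new = dict(binding)
                ok = True
                for term, value in ((s, ts), (p, tp), (o, to)):
                    if term.startswith("?"):
                        # A variable must bind consistently across patterns.
                        if new.setdefault(term, value) != value:
                            ok = False
                    elif term != value:
                        ok = False
                if ok:
                    next_solutions.append(new)
        solutions = next_solutions
    return solutions

rows = solve([("?teacher", "rdf:type", "Teachers"),
              ("?teacher", "name", "?name"),
              ("?teacher", "faculty", "?faculty")], triples)
print([(r["?name"], r["?faculty"]) for r in rows])  # Joe and Fred, both "CS"
```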

The SPARQL Query Language

Operator FILTER

SELECT ?name ?faculty
WHERE {
  ?teacher rdf:type Teachers .
  ?teacher name ?name .
  ?teacher faculty ?faculty .
  FILTER (?name = "Joe")
}

Result:

?name | ?faculty
Joe   | "CS"

[Figure: the queried graph — teachers t1 (Joe) and t2 (Fred), both with faculty "CS"; the filter keeps only Joe.]
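FILTER can be pictured as a post-processing step over the bindings produced by the graph pattern; a minimal sketch (not an actual engine implementation):

```python
# Bindings as they would come out of the AND-pattern above.
solutions = [{"?name": "Joe", "?faculty": "CS"},
             {"?name": "Fred", "?faculty": "CS"}]

def apply_filter(solutions, predicate):
    """FILTER semantics: drop every binding for which the predicate is false."""
    return [b for b in solutions if predicate(b)]

filtered = apply_filter(solutions, lambda b: b["?name"] == "Joe")
print(filtered)  # only the Joe binding survives
```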

The SPARQL Query Language

Operator OPTIONAL

SELECT ?name ?faculty ?title
WHERE {
  ?teacher rdf:type Teachers .
  ?teacher name ?name .
  ?teacher faculty ?faculty .
  OPTIONAL {
    ?teacher title ?title .
  }
}

Result:

?name | ?faculty | ?title
Joe   | "CS"     |
Fred  | "CS"     | "Professor"

[Figure: the queried graph — teachers t1 (Joe, "CS") and t2 (Fred, "CS"); only t2 has a title triple, "Professor".]
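OPTIONAL behaves like a left outer join: required bindings are kept even when the optional pattern finds no match. A rough sketch with the slide's data:

```python
# Required bindings from the non-optional patterns; only t2 has a title triple.
required = [{"?teacher": "t1", "?name": "Joe"},
            {"?teacher": "t2", "?name": "Fred"}]
titles = {"t2": "Professor"}

def optional_join(solutions, lookup, var):
    """Left-outer-join semantics: extend a binding when the optional pattern
    matches, but keep the binding (with var unbound) when it does not."""
    out = []
    for b in solutions:
        extended = dict(b)
        if b["?teacher"] in lookup:      # optional pattern matched
            extended[var] = lookup[b["?teacher"]]
        out.append(extended)             # kept either way
    return out

rows = optional_join(required, titles, "?title")
# Joe has no ?title binding; Fred is bound to "Professor"
```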

Part 2

• Natural Language Interface

– Semantic Web Search Engine

– NLI Applications

– Challenges and Partial Solutions

– Potential Works

Semantic Web Search Engine

• aims to understand the intent of the searcher and return results in the context of the query meaning.

• distinguished from standard search engines because the source documents are RDF, OWL and RDF-extended HTML documents.

• E.g: Swoogle, Serene, Watson

Natural Language Interface (NLI)

• allows users to query in human-like sentences, without requiring them to be aware of the underlying schema, vocabulary and query language

• famous for question answering

• three types of NLI:
  – with structured data such as databases and ontologies,
  – with semi- or unstructured data such as text documents,
  – in an interactive setting as a conversational system

• Approaches:
  – Controlled Natural Language for query construction
  – Visual-based query construction
  – NL query mapping to triple representation

NLI Example - NLPReduce

NLI Example – Semantic Crystal

NLI Example – GINO & Ginseng

NLI Example - Querix

NLI Example - AquaLog

NLI Example - PowerAqua

NLI Example - FREyA

Comparison

System | Year | Input type | Synonym support | Syntactic analysis | String similarity | Clarification dialogue | Learnability | KB heterogeneity
--- | --- | --- | --- | --- | --- | --- | --- | ---
SemanticCrystal | 1993 | Graphical-based query | NO | NO | NO | NO | NO | NO
GINO/Ginseng | 2006 | Controlled natural language based interface | WordNet | YES | NO | NO | NO | NO
Querix | 2006 | Query by example | WordNet | NO | NO | YES | NO | NO
NLPReduce | 2007 | Keywords, sentence fragments and full sentences | NO | NO | NO | NO | NO | NO
QuestIO | 2008 | Full natural language | Gazetteer | YES | YES | NO | NO | NO
ORAKEL | 2008 | Factual question | Lexicon | NO | NO | NO | NO | NO
AquaLog/PowerAqua | 2010 | Full natural language | WordNet, Lexicon | YES | YES | YES | NO | YES
FREyA | 2012 | Full natural language | WordNet | YES | NO | YES | YES | NO

NLI Implementation

• Query: “Who wrote The Neverending Story?”

• PowerAqua triple: <[person, organization], wrote, Neverending Story>

• Triple matching from DBpedia:
  – <Writer, IS A, Person>
  – <Writer, author, The Neverending Story>

• Answer: “Michael Ende”
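A minimal sketch of the idea (not PowerAqua's actual algorithm): map the query relation "wrote" to the ontology property "author" through a synonym table, then look up the subject. The KB triples and the synonym table below are invented for illustration.

```python
# Toy knowledge base and synonym table (illustrative, not DBpedia's schema).
kb = [("Michael_Ende", "author", "The_Neverending_Story")]
synonyms = {"wrote": {"author", "writer"}}

def answer(relation, obj):
    """Return subjects of KB triples whose property is the relation itself
    or one of its synonyms, and whose object matches."""
    candidates = synonyms.get(relation, {relation})
    return [s for s, p, o in kb if p in candidates and o == obj]

print(answer("wrote", "The_Neverending_Story"))  # -> ['Michael_Ende']
```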

NLI Challenges (Unger et al., 2012)

1. (a) Which cities have more than three universities?
   (b) <[cities], more than, universities three>
   (c) SELECT ?y WHERE {
         ?x rdf:type onto:University .
         ?x onto:city ?y .
       }
       GROUP BY ?y
       HAVING (COUNT(?x) > 3)

2. (a) Who produced the most films?
   (b) <[person, organization], produced, most films>
   (c) SELECT ?y WHERE {
         ?x rdf:type onto:Film .
         ?x onto:producer ?y .
       }
       GROUP BY ?y
       ORDER BY DESC(COUNT(?x)) OFFSET 0 LIMIT 1
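To make concrete what the first aggregation query computes, here is a rough Python equivalent with invented data (not from the slides): group universities by city and keep the cities with more than three.

```python
from collections import Counter

# Invented university -> city assignments.
city_of = {"U1": "CityA", "U2": "CityA", "U3": "CityA", "U4": "CityA",
           "U5": "CityB"}

# GROUP BY city, then HAVING (COUNT > 3).
counts = Counter(city_of.values())
cities = [c for c, n in counts.items() if n > 3]
print(cities)  # -> ['CityA']
```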

NLI Challenges

• Layer 1: Query understanding

– E.g: Complex queries: negation, subqueries, arithmetic operations, etc.

• Layer 2: Query-KB granularity homogenisation

– E.g: Different format/styles

– E.g: Mismatch in concept name

• Layer 3: Result presentation

– E.g: ranking

Query Understanding

• Input type

– Current

• Guided query: controlled natural language, query indicator (e.g: WH-terms)

• Graphical query construction

– Problem

• Confusing

• Requires a degree of background knowledge

• Constrained search

Query Understanding

• Compositional Density
  – Current
    • Triples generated by PowerAqua for "Give me five albums by Pink Lloyd":
      – <[albums, five], null, Lloyd Pink>
      – <[five], null, albums>
      – <[Pink], null, Lloyd>
  – Potential Works
    • Negation (e.g: not, outside, except)
    • Arithmetic (e.g: sum of, how many, largest)
    • Auxiliary (e.g: largest, latest, top)

Query Understanding

• Ambiguity Reduction

– Triple identification

– Stanford Parser

– WordNet

– Similarity Matching

– Clarification Dialogue

– Entity Identification

Types of queries (Ferre & Hermann, 2011)

• Visualization – exploration of the facet hierarchy

• Selection – count or list items that have a particular feature

• Path – subjects had to follow a path of properties

• Disjunction – required the use of unions

• Negation – required the use of exclusions

• Inverse – required the crossing of the inverse of properties

• Cycle – required the use of co-reference variables (naming and reference navigation links)

Query-KB granularity homogenisation

• KB variation

  – Format (e.g: RDF, OWL)

  – Style (e.g: with/without schema)

  – Concept names (e.g: length)

  – Query-triple conversion

  – Sources supported (single/multi sources, LOD)

• Disambiguation

  – WordNet, Similarity Matching, Clarification Dialogue

Result Understanding

• Ranking result

– List vs. finite answer

– Degree of confidence / hit score

– Learnability

Part 3

• Practical Examples

– Mooney's Geography Dataset

– Automatic SPARQL Construction for Natural Language-based Search in Semantic Database

Geography.owl

Classes: City, Capital, State, HiPoint, LoPoint, Mountain, Lake, River, Road

DataTypeProperties:

Name | Domain | Range
cityPopulation | City | float
statePopulation | State | float
statePopDensity | State | float
abbreviation | State | string
stateArea | State | float
lakeArea | Lake | float
height | Mountain | float
hiElevation | HiPoint | float
loElevation | LoPoint | float
length | River | float
number | Road | float

ObjectProperties:

Name | Domain | Range
borders | State | State
isCityOf | City | State
hasCity | State | City
isCapitalOf | Capital | State
hasCapital | State | Capital
isMountainOf | Mountain | State
hasMountain | State | Mountain
isHighestPointOf | HiPoint | State
hasHighPoint | State | HiPoint
isLowestPointOf | LoPoint | State
hasLowPoint | State | LoPoint
isLakeOf | Lake | State
hasLake | State | Lake
runsThrough | River | State
hasRiver | State | River
passesThrough | Road | State
hasRoad | State | Road

Can you tell me the capital of texas?
Give me all the states of usa?
Give me the cities in texas?
Give me the cities which are in texas?
Give me the lakes in california?
Give me the states that border utah?
Give me the number of rivers in california?
How many citizens in alabama?
How many citizens live in california?
Give me the longest river that passes through the us?
How many citizens does the biggest city have in the usa?
Give me the largest state?
Could you tell me what is the highest point in the state of oregon?
Count the states which have elevations lower than what alabama has?
How big is texas?
How big is the city of new york?
How many colorado rivers are there?
How high is guadalupe peak?
How high is mount mckinley?
How many cities named austin are there in the usa?
How large is texas?
How long is rio grande?
How long is the colorado river?
How long is the mississippi?
How long is the mississippi river?
How long is the mississippi river in miles?
How many capitals does rhode island have?
How many cities does texas have?
How many cities does the usa have?
How high are the highest points of all the states?
How high is the highest point in america?
How high is the highest point in montana?
How high is the highest point in the largest state?
How large is the largest city in alaska?
How long is the longest river in california?
How long is the longest river in the usa?
How long is the shortest river in the usa?
How many big cities are in pennsylvania?

Approach

• Can you tell me the capital of texas?

  – POS: Can/MD you/PRP tell/VB me/PRP the/DT capital/NN of/IN texas/NNS ?/.

  – Triple: <capital, ?, texas>

  – SPARQL:

    PREFIX geo:<http://www.mooney.net/geo#>
    SELECT ?s
    WHERE { ?s geo:isCapitalOf geo:texas . }

  – Answer: geo:austinTx

• Give me all the states of usa?

  – POS: Give/VB me/PRP all/PDT the/DT states/NNS of/IN usa/NN ?/.

  – Triple: <states, ?, usa>

  – SPARQL:

    PREFIX geo:<http://www.mooney.net/geo#>
    SELECT ?s
    WHERE { ?s a geo:State . }

  – Answer: geo:kansas, geo:rhodeIsland, geo:montana, geo:tennessee, geo:arkansas, geo:newMexico, … (all the states)
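The heuristic above can be sketched as follows. This is a hedged illustration, not the actual system: the `property_for` lookup table stands in for the real KB-compliance step and is purely hypothetical.

```python
# Output of a POS tagger for the first example question.
pos = "Can/MD you/PRP tell/VB me/PRP the/DT capital/NN of/IN texas/NNS ?/."

# Heuristic: the nouns (NN/NNS tags) become the triple's predicate and argument.
nouns = [tok.split("/")[0] for tok in pos.split()
         if tok.split("/")[1].startswith("NN")]
triple = (nouns[0], "?", nouns[1])          # ('capital', '?', 'texas')

# Illustrative KB-compliance lookup: map the noun to an ontology property.
property_for = {"capital": "isCapitalOf"}

sparql = ("PREFIX geo:<http://www.mooney.net/geo#>\n"
          "SELECT ?s\n"
          "WHERE { ?s geo:%s geo:%s . }"
          % (property_for[triple[0]], triple[2]))
print(sparql)
```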

POS tagging

• Give/VB me/PRP the/DT cities/NNS which/WDT are/VBP in/IN texas/NNS ?/.

• Give/VB me/PRP the/DT lakes/NNS in/IN california/NN ?/.

• Give/VB me/PRP the/DT states/NNS that/WDT border/NN utah/NN ?/.

• Give/VB me/PRP the/DT number/NN of/IN rivers/NNS in/IN california/NN ?/.

• How/WRB many/JJ citizens/NNS in/IN alabama/NN ?/.

More to Do

• Domain dependent/independent?

• Are heuristics based on POS tags and KB compliance enough for SPARQL generation?

• More complex queries

– Arithmetic operation (COUNT, SUB-QUERY)

– Aggregation (requires FILTER, OPTIONAL, HAVING)

– Auxiliary (e.g: latest, earliest)

Conclusion

• NLI is a promising area

• Highlight: ambiguity reduction, query understanding, query-KB matching

• Focus: SPARQL generation and optimization

• Potential sub-area: negation, arithmetic, temporal, complex queries

References

• Ferre, S., & Hermann, A. (2011). Semantic Search: Reconciling Expressive Querying and Exploratory Search. In Proceedings of the 10th International Semantic Web Conference (ISWC '11) (pp. 177-192).

• Unger, C., Bühmann, L., Lehmann, J., Ngonga Ngomo, A.-C., Gerber, D., & Cimiano, P. (2012). Template-based Question Answering over RDF Data. In Proceedings of the 21st International Conference on World Wide Web (WWW '12) (p. 639). New York, NY, USA: ACM Press. doi:10.1145/2187836.2187923

Contact

Nurfadhlina Mohd Sharef

• Postdoctoral Fellow, Knowledge Technology Group, Centre of Artificial Intelligence, Universiti Kebangsaan Malaysia (Room 4.4, Level 4, Block H)

• Department of Computer Science, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia (Room C2.08, Level 2, Block C)

• fadhlinams81@gmail.com