ANGIE in wonderland
Uploaded by: inria-oak
Category: Data & Analytics
Transcript of ANGIE in wonderland
Motivating example
Long-term goal: new intelligent applications, such as applications that automatically compute vacation plans
Example:
• I would like to travel for 3 weeks in South America
• Visit UNESCO sites
• Old palaces
Automatic computation of vacation plans
• Personal calendar: Web Services API
• Travel-related books: Web Services API
• Flights: Web Services API
• Countries, cities, airports: Web Services API
Web Service APIs available on the Web
ProgrammableWeb.com counts >12000 APIs from various domains:
• Search (3200 APIs)
• Social (3000 APIs)
• Traveling (1200 APIs)
• Music (1000 APIs)
• Financial (1200 APIs), Science (600 APIs), Weather (300 APIs)
Query examples
• Places in Peru listed as UNESCO heritage
• Books written by South American Nobel Prize Winners
• Memorial houses of Brazilian Kings
Our research
• Query Evaluation using Web Service APIs
• Mapping Web Services to Knowledge Bases
[Figure: SUSIE combines Web services with the WWW; ANGIE combines Web services with a KB; DORIS relates Web services to a knowledge base]
Problem Description
Given a query Q against
• a knowledge base (KB)
• a set of Web services F
• a bound Max for the number of Web service calls
compute answers for Q using at most Max calls
Representing functions of Web Service APIs
A function is a named parameterized conjunctive query where
• Inputs must be bound to entities before the call execution
• Outputs are bound as the result of the call
• Relations are from a global schema (knowledge base schema)
[Figure: getChildren as a graph, with inputs parent and p_place, outputs ?child and ?c_place, and edges birthplace and hasChild]
getChildren(parent, p_place, ?child, ?c_place) :-
    birthplace(parent, p_place),
    hasChild(parent, ?child),
    birthplace(?child, ?c_place)
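A minimal sketch of how such a function definition could be held in code, assuming atoms are encoded as (relation, subject, object) triples. The class name Function and the methods can_call and instantiate are hypothetical helpers for illustration, not part of ANGIE.

```python
from dataclasses import dataclass

# Assumption: an atom is a (relation, subject, object) triple of strings.
Atom = tuple[str, str, str]


@dataclass(frozen=True)
class Function:
    """A named, parameterized conjunctive query over the KB schema."""
    name: str
    inputs: tuple[str, ...]    # must be bound before the call
    outputs: tuple[str, ...]   # bound by the call result
    body: tuple[Atom, ...]     # relations from the global schema

    def can_call(self, bindings: dict[str, str]) -> bool:
        # A call is only possible once every input has a value.
        return all(param in bindings for param in self.inputs)

    def instantiate(self, bindings: dict[str, str]) -> list[Atom]:
        # Substitute bound parameters into the body; unbound ones stay as is.
        return [(rel, bindings.get(s, s), bindings.get(o, o))
                for rel, s, o in self.body]


get_children = Function(
    name="getChildren",
    inputs=("parent", "p_place"),
    outputs=("?child", "?c_place"),
    body=(
        ("birthplace", "parent", "p_place"),
        ("hasChild", "parent", "?child"),
        ("birthplace", "?child", "?c_place"),
    ),
)
```

With only parent bound, can_call returns False, which mirrors the rule that inputs must be bound to entities before the call is executed.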
Query example
Function: getChildren(parent, p_place, ?child, ?c_place)
Query: birthplace(Pedro II of Brazil, ?place)
Baseline Solution (aiming at completeness)
[Figure: getChildren is called exhaustively over many entities, for example Isabella of Austria (birthplace: Brussels) and Queen Victoria of the UK (birthplace: Kensington Palace, London), alongside Pedro II of Brazil (birthplace: Palace of São Cristóvão, Rio de Janeiro)]
But I only have a small budget of calls!
ANGIE Algorithm: the bang for the buck
[Figure: starting from the query birthplace(Pedro II of Brazil, ?place), the getChildren pattern is matched backward along hasChild edges: Pedro I of Brazil hasChild Pedro II of Brazil, and Juan VI of Portugal hasChild Pedro I of Brazil. Places shown: Ajuda, Lisbon (Juan VI of Portugal); Queluz Palace, Lisbon (Pedro I of Brazil); Palace of São Cristóvão, Rio de Janeiro (Pedro II of Brazil)]
Property
For a pipeline of calls:
W1 < W2 < … < Wi < … < Wn < Q
where the inputs are extracted using the local queries
Q1^KB, Q2^KB, …, Qi^KB, …, Qn^KB
If the knowledge base has answers for Qi^KB, then execute only Wi, …, Wn
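The property can be illustrated with a short sketch. It assumes a pipeline is a list of (call, local query) pairs and that a predicate kb_has_answers tells whether the KB already answers a local query. Both names are hypothetical, and picking the latest such position i is an assumption made here to skip as many calls as possible.

```python
def calls_to_execute(pipeline, kb_has_answers):
    """Return the suffix Wi..Wn of the pipeline that still has to run.

    pipeline: list of (call, local_query) pairs, ordered W1..Wn.
    kb_has_answers: predicate on a local query Qi^KB (hypothetical).
    """
    # Scan from the end: the latest i whose inputs the KB can already
    # supply lets us skip W1..W(i-1).
    for i in range(len(pipeline) - 1, -1, -1):
        _call, local_query = pipeline[i]
        if kb_has_answers(local_query):
            return [call for call, _ in pipeline[i:]]
    # No local query has answers: no call can be given its inputs.
    return []


pipeline = [
    ("getChildren(Juan VI of Portugal)", "Q1"),
    ("getChildren(Pedro I of Brazil)", "Q2"),
]
answered = {"Q1", "Q2"}
remaining = calls_to_execute(pipeline, lambda q: q in answered)
```

Here the KB already answers Q2, so only the second call remains; if it answered only Q1, both calls would run.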
Web call composition graph
[Figure: call composition graph over YAGO. The query asks for ?place (birthplace) and ?id (hasId) of a person. Functions shown: getChildren, getPersonId, getInfoByPersonId. Pipeline shown: GetChildren(Juan VI of Portugal, Ajuda), GetChildren(Pedro I of Brazil), Pedro II of Brazil, GetPersonId, id_Pedro-II, GetInfoByPersonId]
[Figure: Books of French Nobel Prize winners. Axes: number of answers, number of calls. Series: DF, F-RDF, F-RDF-R, ANGIE, ANGIE-cost]
Experiments
50 real Web services from 3 domains:
• Music
• Books
• Movies
ANGIE: Active Knowledge & Interaction Exploration
Query Mediator
• Dynamically computes the Web calls that answer the query
RDF Warehouse
• The local KB stores the results of all executed Web calls
• Stored call results may speed-up the evaluation of related queries
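A minimal sketch of how a mediator could combine a call budget with a local store of call results. The class Mediator, its fields, and the dictionary used as a stand-in for the RDF warehouse are all assumptions for illustration; the real system stores results in an RDF KB.

```python
class Mediator:
    """Executes Web calls under a budget and remembers their results."""

    def __init__(self, max_calls):
        self.max_calls = max_calls
        self.calls_made = 0
        # Stand-in for the RDF warehouse: (function name, inputs) -> results.
        self.warehouse = {}

    def call(self, name, inputs, service):
        key = (name, inputs)
        if key in self.warehouse:
            # Stored results are reused and cost no Web call.
            return self.warehouse[key]
        if self.calls_made >= self.max_calls:
            # Budget exhausted: the call is not executed.
            return None
        self.calls_made += 1
        results = service(*inputs)
        self.warehouse[key] = results
        return results


def fake_get_children(parent):
    # Hypothetical stub standing in for a real Web service.
    return {"Pedro I of Brazil": ["Pedro II of Brazil"]}.get(parent, [])


mediator = Mediator(max_calls=1)
first = mediator.call("getChildren", ("Pedro I of Brazil",), fake_get_children)
again = mediator.call("getChildren", ("Pedro I of Brazil",), fake_get_children)
```

The second request is served from the store, so a related query issued later does not spend budget on the same call.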
Active Knowledge: Dynamically Enriching RDF Knowledge Bases by Web Services. With F. M. Suchanek, G. Kasneci, T. Neumann, W. Yuan, G. Weikum. SIGMOD 2010
Problem: Asymmetric accesses
• Consider a source publishing only the Web service:
getLeaderInfo(leader, type, country)
• And the queries:
Q1: getLeaderInfo(Pablo II, ?, ?)     easy
Q2: getLeaderInfo(?, ?, Brazil)       impossible
Q3: getLeaderInfo(?, king, Brazil)    impossible
Trying every entry of a DB of leaders: 1 million calls, and two will succeed
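The asymmetry can be sketched as a check on binding patterns. The sketch assumes that leader is the only input parameter of getLeaderInfo, which is what the examples suggest; the helper is_callable and the "?" marker for unbound arguments are illustrative conventions.

```python
def is_callable(input_positions, arguments):
    """A call is possible only if every input position holds a value."""
    return all(arguments[i] != "?" for i in input_positions)


# Assumption: getLeaderInfo(leader, type, country) takes `leader` as input.
LEADER_INFO_INPUTS = (0,)

queries = {
    "Q1": ("Pablo II", "?", "?"),
    "Q2": ("?", "?", "Brazil"),
    "Q3": ("?", "king", "Brazil"),
}

status = {name: is_callable(LEADER_INFO_INPUTS, args)
          for name, args in queries.items()}
```

Q2 and Q3 bind only outputs, so the service cannot be invoked for them without first obtaining candidate leaders from somewhere else.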
Our Approach: Use the Web as an Oracle
Example: implement “get head by country and type”
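1) Turn the inputs (King, Brazil) into the keyword query “King of Brazil?”
2) Apply information extraction (IE) to the HTML results to obtain candidates: Lula, Pedro I, Pedro II
3) Verify each candidate with getLeaderInfo:
• Pedro I: King, Brazil
• Pedro II: King, Brazil
• Lula: President, Brazil (rejected)
3 calls and 2 will succeed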
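A sketch of the oracle idea under stated assumptions: search and extract are hypothetical stand-ins for a Web search and an information extraction step, and get_leader_info stands in for the Web service. None of them are real APIs; the stubs below return canned data matching the example.

```python
def oracle_get_candidates(head_type, country, search, extract, get_leader_info):
    """Guess candidates from the Web, then verify each with the service."""
    keyword_query = f"{head_type} of {country}"
    candidates = extract(search(keyword_query))
    verified, calls = [], 0
    for person in candidates:
        calls += 1  # one Web service call per candidate
        if get_leader_info(person) == (head_type, country):
            verified.append(person)
    return verified, calls


# Hypothetical stubs with canned data from the example.
def search(keyword_query):
    return "<html>Lula ... Pedro I ... Pedro II</html>"


def extract(html):
    return ["Lula", "Pedro I", "Pedro II"]


def get_leader_info(person):
    known = {
        "Lula": ("President", "Brazil"),
        "Pedro I": ("King", "Brazil"),
        "Pedro II": ("King", "Brazil"),
    }
    return known.get(person)


verified, calls = oracle_get_candidates(
    "King", "Brazil", search, extract, get_leader_info)
```

The extraction step may be noisy because the Web service filters out wrong candidates, at the price of one call per candidate.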
Model oracles as functions
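[country, head-type], keyword query “[type] of [country]”, HTML, Information Extraction (IE), [outputs (verified by WS)]
oracleGetCandidates(person, type, country)
[Figure: oracleGetCandidates as a graph, with output ?person, inputs type and country, and edges headOf (to country) and type]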
New Query
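[Figure: query graph asking for the ?inauguration date of the leader who is headOf Brazil with type King]
Available functions:
getInaugurationDay(leader, date)
oracleGetCandidates(leader, type, country)
[Figure: getInaugurationDay as a graph with the edge inauguration from leader to date; oracleGetCandidates as a graph with the edges headOf (to country) and type. Example values shown: Pedro I of Brazil, 10 March 1826]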
Consider the additional Web services
getCurrentLeader(country, leader)
[Figure: getCurrentLeader as a graph with the edge headOf between leader and country]
getPredecessor(leader, pLeader, pType, pDate, pCountry)
[Figure: getPredecessor as a graph with the edges predecessor, headOf, type and inauguration]
Relevant but inefficient results
getCurrentLeader(Brazil, leader1, type1, date1)
getPredecessor(leader1, leader2, type2, date2, country2)
getPredecessor(leader2, leader3, type3, date3, country3)
getPredecessor(leader3, leader, type4, date4, country4)
[Figure: query graph with the edges headOf (Brazil), type (King) and inauguration]
Smart calls vs. relevant but “guess” plans
[Figure: query graph with the edges headOf (Brazil), type (King) and inauguration]
“Guess” plan: getCurrentLeader(Brazil), then getPredecessor(leader) repeated along predecessor edges
Smart plan: oracleGetCandidates(Brazil, King), then getInaugurationDay(leader)
Smart calls
Given a call Wi that belongs to a plan W1,… Wi,… Wn we say Wi is a smart call if its consequences are:
• either included in the union of the consequences of the previous functions Wi-1, ... W1
• or are atoms of the query
Property:
If a plan consists of only smart calls, and if every call has results, then the plan will deliver an answer for the query.
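One literal reading of this definition can be sketched as follows. It assumes that consequences and query atoms are encoded as (relation, subject, object) triples whose variables have already been renamed to match the query, so that a plain membership test suffices. The function smart_flags and this encoding are assumptions for illustration, not the SUSIE implementation.

```python
def smart_flags(plan, query_atoms):
    """For each call in the plan, tell whether it is smart.

    plan: list of (call name, list of consequence atoms), in order W1..Wn.
    query_atoms: set of atoms of the query.
    """
    previous = set()  # union of consequences of the earlier calls
    flags = []
    for _name, consequences in plan:
        allowed = previous | set(query_atoms)
        flags.append(all(atom in allowed for atom in consequences))
        previous |= set(consequences)
    return flags


query = {
    ("headOf", "?leader", "Brazil"),
    ("type", "?leader", "King"),
    ("inauguration", "?leader", "?date"),
}

smart_plan = [
    ("oracleGetCandidates", [("headOf", "?leader", "Brazil"),
                             ("type", "?leader", "King")]),
    ("getInaugurationDay", [("inauguration", "?leader", "?date")]),
]

guess_plan = [
    ("getCurrentLeader", [("headOf", "?leader1", "Brazil")]),
    ("getPredecessor", [("predecessor", "?leader1", "?leader"),
                        ("headOf", "?leader", "Brazil")]),
]

smart_result = smart_flags(smart_plan, query)
guess_result = smart_flags(guess_plan, query)
```

In the guess plan, neither call passes: the current leader is not known to be the queried leader, and the predecessor atom appears neither in the query nor in earlier consequences.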
Experiments
50 Web services from three domains:
• Books
  • isbndb.org
  • librarything.com
  • abebooks.com
• Movies
  • internetvideoarchive.com (IVA)
• Music
  • musicbrainz.org
  • last.fm
  • discogs.com
  • lyricWiki.org
Evaluation results
Get prize winners                                TD   ANGIE   SUSIE
Nobel Prize in Literature                         0       0      14
Golden Pen Award                                  0       0      11
Franz Kafka Prize                                 0       0       5
American Book Medal                               0       0      16
Jerusalem Prize                                   0       0      11

Get books of winners of prize                    TD   ANGIE   SUSIE
Nobel Prize Literature                            0       0     198
Golden Pen Award                                  0       0     228
Franz Kafka Prize                                 0       0     132
Jerusalem Prize                                   0       0     220

Get books of winners by prize and country        TD   ANGIE   SUSIE
Nobel Prize Literature, France                    0       0     144
Franz Kafka Prize, UK                             0       0      79
Related Work: Answering Queries using Views
• Maximal contained rewritings (MCR)
  • Plans computing the largest number of answers
• Approaches based on reducing the number of irrelevant calls
  • Benedikt et al., PODS 2011, VLDB 2012
  • S. Kambhampati, JIIS 2004
• SUSIE does not target maximal contained rewritings
  • Relevant calls for MCR include all calls that might return results
• Smart calls are a subset of relevant calls.
SUSIE
• Addressed the problem of asymmetric accesses
• A novel approach to answer such queries where the inputs for the Web service call are extracted on the fly, from the Web
• New evaluation algorithm that prioritizes smart calls
• An experimental evaluation using a representative set of queries and real data sources
SUSIE: Search Using Services and Information Extraction. With F. M. Suchanek, W. Yuan, G. Weikum. ICDE 2013
Ongoing work
Given a query Q and a set of functions F, compute all smart plans (plans for which it can be proven that they return answers)
Web Service API
• Web Services for applications ≅ Web forms for humans
• An API = a collection of Web services
• A Web service
  • expects bindings for input parameters
  • returns structured data: XML or JSON
<geonames>
  <country>
    <ccode>AR</ccode>
    <cname>Argentina</cname>
    <isonumeric>032</isonumeric>
    <fipscode>ARG</fipscode>
    <continent>SA</continent>
    <continentName>Argentina</continentName>
    <capital>Buenos Aires</capital>
    <cities>
      <city>
        <name>Buenos Aires</name>
Goals
For every Web service:
1) Compute a parameterized query (relations are from the KB)
2) Compute an XSLT transformation script that turns every XML call result into results for the parameterized query
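To make the second goal concrete, here is a sketch of such a transformation written in Python with the standard xml.etree.ElementTree module instead of XSLT. The sample document is a hypothetical, well-formed fragment modeled on the geonames result, and the output covers only three of the parameters (name, capital, city) to keep the example short.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified call result modeled on the geonames example.
SAMPLE = """
<geonames>
  <country>
    <cname>Argentina</cname>
    <capital>Buenos Aires</capital>
    <cities>
      <city><name>Buenos Aires</name></city>
      <city><name>Córdoba</name></city>
    </cities>
  </country>
</geonames>
"""


def to_tuples(xml_text):
    """Flatten one call result into (name, capital, city) tuples."""
    root = ET.fromstring(xml_text)
    rows = []
    for country in root.iter("country"):
        name = country.findtext("cname").strip()
        capital = country.findtext("capital").strip()
        # One output tuple per city, repeating the country-level values.
        for city in country.iter("city"):
            rows.append((name, capital, city.findtext("name").strip()))
    return rows


rows = to_tuples(SAMPLE)
```

Each nested city yields its own tuple, which is the same flattening that a per-service XSLT script would have to perform.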
1) Parameterized query for getCountryByName
getCountryByName(country, name, time-zone, capital, type, lat, lng, city, c_lat, c_lng)
[Figure: query graph for getCountryByName over the KB relations label, hasCapital, hasTimeZone, hasCity, type, latitude and longitude]
[Figure: XML tree of the result of getCountryByName(Argentina), with root r and text nodes “Argentina”, “ARS”, “Buenos Aires”, “Republic”, “-34”, “-64”, and city subtrees “Buenos Aires” (“-34”, “-64”) and “Córdoba” (“-31.40833”, “-64.18388”)]
[Figure: XML tree of the result for Romania, with root r and text nodes “Romania”, “GMT+2”, “Bucharest”, “Republic”, “44.4”, “26.1”, and city subtrees “Bucharest” (“44.4”, “26.1”) and “Rm Valcea” (“45.1”, “24”)]
2) An XSLT transformation for all call results
getCountryByName(Romania, GMT+2, Bucharest, Republic, 44.4, 26.1, Bucharest, 44.4, 26.1)
getCountryByName(Romania, GMT+2, Bucharest, Republic, 44.4, 26.1, Rm Valcea, 45.1, 24)
General Challenges
• Heterogeneity: every Web service has its own schema for outputs
• Schemas are unknown
  • >85% of Web services are implemented using REST
  • REST Web services do not expose schema descriptions
Our approach: use the overlap between Web services & Knowledge Bases
Intuition
[Figure: text nodes of the XML result for Argentina are matched against the KB: URI1 has label “Argentina”, URI2 has label “Buenos Aires”, and URI1 hasCapital URI2, so the root r of the result is aligned with URI1]
Three-step algorithm
1) Align root-to-text-node paths with paths from the input in the KB
2) Compute class and relation alignment candidates satisfying functional constraints
3) For each candidate, compute transformation functions and check inclusion and equivalence for the non-functional relations
Observation:
The first 2 steps alone lead to a precision/recall of around 90%
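Step 1 can be sketched as follows, under simplifying assumptions: the KB side is given as a dictionary from a relation path (starting at the input entity) to the single literal it reaches, and a match requires exact string equality. The helpers text_paths and align, and the sample data, are illustrative and not taken from DORIS.

```python
import xml.etree.ElementTree as ET


def text_paths(elem, prefix=""):
    """Yield (root-to-text-node path, text) pairs of an XML tree."""
    path = f"{prefix}/{elem.tag}"
    text = (elem.text or "").strip()
    if text:
        yield path, text
    for child in elem:
        yield from text_paths(child, path)


def align(xml_text, kb_paths):
    """Match XML paths to KB paths that lead to the same literal.

    kb_paths: dict from a tuple of relations to a literal (assumption).
    """
    alignment = {}
    for xml_path, text in text_paths(ET.fromstring(xml_text)):
        for kb_path, literal in kb_paths.items():
            if literal == text:
                alignment.setdefault(xml_path, []).append(kb_path)
    return alignment


# Hypothetical call result and KB paths from the entity for Argentina.
result = ("<country><cname>Argentina</cname>"
          "<capital>Buenos Aires</capital></country>")
kb_paths = {
    ("label",): "Argentina",
    ("hasCapital", "label"): "Buenos Aires",
}
alignment = align(result, kb_paths)
```

Matching on literals first sidesteps the fact that KB entities are URIs that never appear in call results.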
DORIS: Some experimental results
More than 50 Web services from 4 domains
• Books
• Movies
• Music
• Geo data
KB         Precision              Recall
           Classes   Relations    Classes   Relations
YAGO       0.92      0.91         0.96      0.93
DBpedia    0.89      0.88         0.98      0.95
BNF        1         1            1         1
Summary
• Addressed the problem of inferring views
• An instance based approach to the schema matching problem
• An experimental evaluation using real Web sources
DORIS: Discovering ontological relations in sources. With Mary Koutraki, Dan Vodislav. In preparation
Same plan as a graph
[Figure: the query asks for a headOfState of Brazil with type King. getCurrentHeadOfState(Brazil) returns Dilma Rousseff (President, 1 January 2011); getPredecessor then returns Lula da Silva (President, Brazil, 1 January 2003); getPredecessor then returns Henrique Cardoso (President, Brazil, 1 January 1995)]
IE: Authors who won prize X
Prize            Precision   Recall
National Book    38%         59%
Phoenix          62%         44%
Jerusalem        23%         52%
Pulitzer         78%         79%
Franz Kafka      25%         73%
Prix Femina      31%         13%
Prix Decembre    28%         6%
Nobel Prize      41%         29%
Golden Pen       25%         73%
Challenges of an instance-based approach
• XML elements do not correspond to entities in KB
• Entities in KB are URIs and are not to be found in call results
• What is an entity in the XML call result?
• Spurious matches (Argentina is a capital and also a person)
Idea: align properties expressed as text or literals first