Biperpedia: An ontology of Search Application
Biperpedia: An Ontology for Search Applications
04/08/2023 08:25 AM
Copyright © By Harsh Kevadia, Dipen Shah and Nancy Sukhadia
Presented By
Dipen Shah 110420107064
Harsh Kevadia 110420107049
Nancy Sukhadia 110420107025
Introduction
Search engines make significant efforts to recognize queries that can be answered by structured data and invest heavily in creating and maintaining high-precision databases.
While these databases have a relatively wide coverage of entities, the number of attributes they model (e.g., GDP, CAPITAL, ANTHEM) is relatively small.
Introduction (Cont.)
We describe Biperpedia, an ontology with 1.6M (class, attribute) pairs and 67K distinct attribute names.
Biperpedia extracts attributes from the query stream, and then uses the best extractions to seed attribute extraction from text.
For every attribute Biperpedia saves a set of synonyms and text patterns in which it appears, thereby enabling it to recognize the attribute in more contexts.
In addition to a detailed analysis of the quality of Biperpedia, we show that it can increase the number of Web tables whose semantics we can recover by more than a factor of 4 compared with Freebase (freebase.com).
Introduction (Cont.)
We describe Biperpedia, an ontology of binary attributes that contains up to two orders of magnitude more attributes than Freebase.
An attribute in Biperpedia (see Figure 1) is a relationship between a pair of entities (e.g., CAPITAL of countries), between an entity and a value (e.g., COFFEE PRODUCTION), or between an entity and a narrative (e.g., CULTURE).
Biperpedia is concerned with attributes at the schema level.
Extracting actual values for these attributes is a subject of a future effort.
Biperpedia is a best-effort ontology in the sense that not all the attributes it contains are meaningful.
Introduction (Cont.)
Biperpedia includes a set of constructs that facilitates query and text understanding.
In particular, Biperpedia attaches to every attribute a set of common misspellings of the attribute, its synonyms (some of which may be approximate), other related attributes (even if the specific relationship is not known), and common text phrases that mention the attribute.
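The constructs attached to each attribute can be pictured as a small record. The sketch below is only an illustration: the class name `Attribute`, its field names, and the helper `build_index` are hypothetical, and the sample data (including "author" as a synonym) is made up for the example rather than taken from Biperpedia.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    """One schema-level attribute attached to a class (hypothetical layout)."""
    name: str                                         # e.g. "writer"
    domain_class: str                                 # e.g. "books"
    misspellings: set = field(default_factory=set)    # common misspellings
    synonyms: set = field(default_factory=set)        # may be approximate
    related: set = field(default_factory=set)         # relationship type unknown
    mentions: set = field(default_factory=set)        # text phrases mentioning it

    def surface_forms(self):
        """All strings under which the attribute may be recognized."""
        return {self.name} | self.misspellings | self.synonyms


def build_index(attributes):
    """Map (class, surface form) to the canonical attribute name."""
    index = {}
    for attr in attributes:
        for form in attr.surface_forms():
            index[(attr.domain_class, form.lower())] = attr.name
    return index


# Made-up sample data for illustration only.
writer = Attribute(
    name="writer",
    domain_class="books",
    misspellings={"writter"},
    synonyms={"author"},
)
index = build_index([writer])
print(index[("books", "writter")])  # prints "writer"
```

Storing misspellings and synonyms next to the canonical name is what lets a lookup resolve a noisy query term to the same attribute.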
Agenda
Section 2: defines our problem setting.
Section 3: describes the architecture of Biperpedia.
Section 4: describes how we extract attributes from the query stream.
Section 5: describes how we extract additional attributes from text.
Section 6: describes how we merge the attribute extractions and enhance the ontology with synonyms.
Section 7: evaluates the attribute quality.
Section 8: describes an algorithm for placing attributes in the hierarchy.
Section 9: describes how we use Biperpedia to improve our interpretation of Web tables.
Section 10: describes related work.
Section 11: concludes.
Problem Definition
The goal of Biperpedia is to find schema-level attributes that can be associated with classes of entities.
For example, we want to discover CAPITAL, GDP (gross domestic product), LANGUAGES SPOKEN, and HISTORY as attributes of COUNTRIES.
Biperpedia is not concerned with the values of the attributes. That is, we are not trying to find the specific GDP of a given country.
It Solves the Problem in the Following Steps:
Name, domain class, and range:
Synonyms and misspellings:
Related attributes and mentions:
Provenance:
Differences from a traditional ontology:
Evaluation:
The Biperpedia System
The Biperpedia extraction pipeline is shown in Figure 2. At a high level, the pipeline has two phases.
In the first phase, we extract attribute candidates from multiple data sources, and in the second phase we merge the extractions and enhance the ontology by finding synonyms, related attributes, and the best classes for attributes.
The pipeline is implemented as a FlumeJava pipeline. (FlumeJava is a Java library for building data-parallel pipelines.)
Biperpedia Extraction Pipeline
Query Stream Extraction
Find Candidate Attribute
Reconcile to Freebase
InstanceCount(C,A)
QueryCount(C,A)
Remove co-reference mentions
Output attribute candidates
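A rough sketch of the counting part of these steps is given below. Several things in it are assumptions, not details from the slides: the single query pattern "A of E", the toy dictionary `ENTITY_CLASS` standing in for reconciliation to Freebase, reading InstanceCount(C,A) as the number of distinct entities of class C seen with A, and reading QueryCount(C,A) as the number of matching queries. Removal of co-reference mentions is not covered.

```python
import re
from collections import defaultdict

# Toy entity-to-class lookup standing in for reconciliation to Freebase.
ENTITY_CLASS = {"france": "countries", "brazil": "countries", "hamlet": "books"}

# Assumed query pattern "A of E" (attribute of entity).
PATTERN = re.compile(r"^(?P<attr>.+?) of (?P<entity>.+)$")


def extract_candidates(queries):
    """Return (InstanceCount, QueryCount) dictionaries keyed by (class, attribute)."""
    instances = defaultdict(set)    # (C, A) -> distinct entities seen with A
    query_count = defaultdict(int)  # (C, A) -> number of matching queries
    for q in queries:
        m = PATTERN.match(q.lower().strip())
        if not m:
            continue
        cls = ENTITY_CLASS.get(m.group("entity"))
        if cls is None:
            continue  # entity could not be reconciled
        key = (cls, m.group("attr"))
        instances[key].add(m.group("entity"))
        query_count[key] += 1
    instance_count = {k: len(v) for k, v in instances.items()}
    return instance_count, dict(query_count)


queries = [
    "capital of france",
    "capital of brazil",
    "Capital of France",
    "author of hamlet",
    "weather today",
]
instance_count, query_count = extract_candidates(queries)
print(instance_count[("countries", "capital")])  # prints 2
print(query_count[("countries", "capital")])     # prints 3
```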
Extraction From Web Text
Noun and Verb (Concept)
Extraction via distant supervision
Attribute classification
Extraction Via Distant Supervision
The figure shows the yield of the top induced extraction patterns. Although we induce more than 2500 patterns, the top 200 patterns account for more than 99% of the extractions.
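The "top patterns account for most extractions" observation is a cumulative coverage computation. The snippet below shows how such a figure could be derived; the function name `top_k_coverage` is hypothetical and the counts are made-up numbers, not the data behind the slide.

```python
def top_k_coverage(pattern_counts, k):
    """Fraction of all extractions produced by the k highest-yield patterns."""
    ordered = sorted(pattern_counts, reverse=True)
    total = sum(ordered)
    if total == 0:
        return 0.0
    return sum(ordered[:k]) / total


# Made-up extraction counts per pattern, for illustration only.
counts = [500, 300, 150, 40, 5, 3, 2]
print(top_k_coverage(counts, 3))  # prints 0.95
```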
Separation By Attribute Type
Attribute Classification
Example
Synonym Detection
For spell correction, we rely on the search engine. Given an attribute A of a class C, we examine the spell corrections that the search engine would propose for the query “C A”.
If one of the corrections is an attribute A’ of C, then we deem A to be a misspelling of A’.
For example, given the attribute WRITTER of class BOOKS, the search engine will propose "books writer" as a spell correction of "books writter".
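The misspelling rule above can be sketched as follows. The search engine's spell corrector is not available here, so `suggest` is a hypothetical callable that maps a query string to a list of proposed corrections; `toy_suggest` is a stand-in with one hard-coded entry taken from the example.

```python
def find_misspellings(cls, attributes, suggest):
    """Return {misspelled attribute: corrected attribute} for one class.

    `suggest` is a hypothetical callable: query string -> list of the spell
    corrections a search engine would propose for that query.
    """
    known = set(attributes)
    prefix = f"{cls} "
    result = {}
    for attr in attributes:
        for corrected_query in suggest(f"{cls} {attr}"):
            if not corrected_query.startswith(prefix):
                continue
            candidate = corrected_query[len(prefix):]
            # A is a misspelling of A' only if A' is itself an attribute of C.
            if candidate != attr and candidate in known:
                result[attr] = candidate
                break
    return result


def toy_suggest(query):
    """Stand-in for the search engine's spell corrector (assumption)."""
    table = {"books writter": ["books writer"]}
    return table.get(query, [])


found = find_misspellings("books", ["writer", "writter", "publisher"], toy_suggest)
print(found)  # prints {'writter': 'writer'}
```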
Attribute Quality
DBpedia: a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web.
Experimental setting
Overall quality
Experimental Setting
Overall Quality
Three evaluators determine whether an attribute is good or bad for its class.
1. Rank by Query
2. Rank by Text
3. Precision (specifies the fraction of attributes that were labelled as good)
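As an illustration of the precision measure, the sketch below computes precision over the top k attributes of a ranking. How the three evaluators' labels are combined is not stated on the slide; majority vote is assumed here. The function names, labels and ranking are all made up for the example.

```python
def is_good(votes):
    """Majority vote over evaluator labels (True = good). Assumed rule."""
    return sum(votes) * 2 > len(votes)


def precision_at_k(ranked_attributes, labels, k):
    """Fraction of the top-k ranked attributes that were labelled as good."""
    top = ranked_attributes[:k]
    if not top:
        return 0.0
    good = sum(1 for attr in top if is_good(labels[attr]))
    return good / len(top)


# Made-up labels from three evaluators, for illustration only.
labels = {
    "capital": (True, True, True),
    "gdp": (True, True, False),
    "photos": (False, False, True),
    "anthem": (True, True, True),
}
ranked_by_query = ["capital", "photos", "gdp", "anthem"]
print(precision_at_k(ranked_by_query, labels, 2))  # prints 0.5
print(precision_at_k(ranked_by_query, labels, 4))  # prints 0.75
```

The same function applies to a ranking by text: only the order of `ranked_attributes` changes.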
Finding The Best Class
Biperpedia attaches an attribute to every class in the hierarchy.
For a more modular ontology, or for attributes that can be contributed to Freebase, the best class needs to be found.
Example
Placement Algorithm
How can we decide which class is the best one for an attribute?
The algorithm traverses, in a bottom-up fashion, each tree of classes for which A has been marked as relevant.
Equation:
$$S_{query}(C, A) = \frac{\mathrm{InstanceCount}(C, A)}{\max_{A^*}\{\mathrm{InstanceCount}(C, A^*)\}}$$
The support $S_{query}(C, A)$ is the ratio between the number of instances of C that have A and the maximal number of instances for any attribute of C.
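A small worked example of the support ratio may help. The function name `support` and the dictionary layout are assumptions, and the instance counts are made-up numbers.

```python
def support(instance_count, cls, attr):
    """InstanceCount(C, A) divided by the largest InstanceCount(C, A*) for class C."""
    counts_for_class = [n for (c, _), n in instance_count.items() if c == cls]
    if not counts_for_class:
        return 0.0
    return instance_count.get((cls, attr), 0) / max(counts_for_class)


# Made-up instance counts, for illustration only.
instance_count = {
    ("countries", "capital"): 180,
    ("countries", "gdp"): 200,
    ("countries", "anthem"): 50,
}
print(support(instance_count, "countries", "capital"))  # prints 0.9
print(support(instance_count, "countries", "anthem"))   # prints 0.25
```

The most widely held attribute of a class always has support 1, so the ratio measures how an attribute compares with the best-covered attribute of the same class.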
Placement Algorithm (Cont.)
Which class should be chosen when there are several siblings with sufficient support?
Diversity measure for the siblings.
n>1,
Algorithm
Evaluation
We can check whether the assignment of the attribute is exact or not.
Precision Measures:
M_exact: ratio of the number of exact assignments to all assignments.
M_approx: ratio of the number of approximate assignments to all assignments. Note that an approximate assignment is still valuable because a human curator would only have to consider a small neighbourhood of classes to find the exact match.
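The two ratios can be computed from a list of per-assignment judgments. The sketch assumes each assignment gets exactly one label out of "exact", "approx" or "wrong", so an exact assignment is not also counted as approximate; that reading, the function name `placement_precision`, and the sample labels are assumptions.

```python
from collections import Counter


def placement_precision(judgments):
    """Return (M_exact, M_approx) for a list of 'exact'/'approx'/'wrong' labels."""
    total = len(judgments)
    if total == 0:
        return 0.0, 0.0
    counts = Counter(judgments)
    return counts["exact"] / total, counts["approx"] / total


# Made-up judgments, for illustration only.
m_exact, m_approx = placement_precision(["exact", "exact", "approx", "wrong"])
print(m_exact, m_approx)  # prints 0.5 0.25
```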
Results
Best Result when Θ = 0.9
Algorithm outperforms by more than 50%.
Interpreting Web Tables
Biperpedia is useful if it can improve search applications.
There are millions of high-quality HTML tables on the Web with very diverse content.
One of the major challenges with Web tables is to understand the attributes that are represented in the tables.
Mapping Algorithm
Interpretation Quality
The Representative column shows the number of tables for which at least one correct representative attribute was found.
The Overall (P/R) column shows the average precision/recall over all mappings.
The Avg. P/R per table columns compute the precision/recall per table and then average over all the tables.
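The difference between the overall and the per-table figures is the difference between pooling all mappings and averaging table by table. The sketch below illustrates this with made-up counts; the tuple layout (correct mappings, proposed mappings, expected mappings) and the function names are assumptions.

```python
def precision_recall(correct, proposed, expected):
    """Precision = correct/proposed, recall = correct/expected."""
    p = correct / proposed if proposed else 0.0
    r = correct / expected if expected else 0.0
    return p, r


def overall_pr(tables):
    """Pool the mappings of all tables, then compute one precision/recall."""
    correct = sum(t[0] for t in tables)
    proposed = sum(t[1] for t in tables)
    expected = sum(t[2] for t in tables)
    return precision_recall(correct, proposed, expected)


def avg_pr_per_table(tables):
    """Compute precision/recall per table, then average over the tables."""
    if not tables:
        return 0.0, 0.0
    prs = [precision_recall(*t) for t in tables]
    n = len(prs)
    return sum(p for p, _ in prs) / n, sum(r for _, r in prs) / n


# Made-up (correct, proposed, expected) counts for two tables.
tables = [(8, 10, 10), (1, 1, 4)]
print(overall_pr(tables))        # pooled: 9/11 precision, 9/14 recall
print(avg_pr_per_table(tables))  # averaged per table
```

A large table dominates the pooled numbers, while the per-table average gives every table the same weight, which is why the two can differ noticeably.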
Comparison with Freebase
The first set of columns shows the number of mappings to Biperpedia attributes, the number that were mapped to Freebase attributes, and the ratio between them.
The second set of columns shows these numbers for mappings to representative attributes.
Error Analysis
Noisy tokens in the surrounding text and page title.
Incorrect string matching against column headers.
Table is too specific.
Not enough information.
Evaluator disagreement.
Biperpedia is too small.
Error Analysis (Cont.)
Conclusion
Biperpedia is an ontology for search applications that extends Freebase with attributes extracted from the query stream and Web text. It enables interpreting more than four times as many Web tables as is possible with Freebase. The approach can be applied to any query stream, possibly with different results.
Thank You
Q/A!