Query-Driven Knowledge Base Completion with Multimodal Fusion

Page 1

Query-Driven Knowledge Base Completion

with Multimodal Fusion

Yang Peng

Data Science Research Lab

University of Florida

Page 2

Outline

• Introduction

• Recap of Work before Candidacy

• Knowledge Base Completion

– Query-Driven Web-Based Question Answering

– Augmented Rule Inference based on Question

Answering

– Ensemble Fusion

• Conclusions

Page 3

Introduction

• Multimodal Data

Page 4

The University of Florida (commonly referred to as Florida or UF) is an American public land-grant, sea-grant, and space-grant research university on a 2,000-acre (8.1 km²) campus in Gainesville, Florida. It is a senior member of the State University System of Florida and traces its origins to 1853, and has operated continuously on its Gainesville campus since September 1906.

https://en.wikipedia.org/wiki/University_of_Florida

Page 5

Introduction

• Multimodal Data

– Unstructured text

– Structured knowledge

– Images

– etc.

Page 6

Multimodal Fusion

• Multimodal fusion is the use of algorithms to

combine information from different kinds of data

with the purpose of achieving better performance

• This dissertation focuses on employing multimodal

fusion on multimodal data to improve performance

for various tasks, as well as providing scalability and

high efficiency

Page 7

Recap

• Discovered the correlative and complementary relations between different modalities

• Multimodal data can either provide additional information or emphasize the same information

• Multimodal ensemble fusion model for word sense disambiguation and information retrieval

– Achieve better performance than image-only or text-only approaches

Page 8

Recap

• Streaming fact extraction

– Process 5 TB of text data in less than 1 hour

– Extract hundreds of thousands of facts for 13 relations

• Scalable image retrieval on Hadoop

– Process millions of images in less than 2 hours

– Propose two distributed K-Means algorithms, hundreds of times faster than normal K-Means

Page 9

Knowledge Bases

• Huge knowledge bases such as Freebase,

NELL, and YAGO

• A large knowledge graph of entities, relations

and facts

Page 10

Google Knowledge Graph

Page 11

Knowledge Bases

• Facts stored in triples

– <subject, relation, object>, e.g., <Marvin_Minsky, wasBornIn, New_York_City>

– Marvin Lee Minsky (August 9, 1927 -- January 24, 2016) was an American cognitive scientist concerned largely with research of artificial intelligence (AI), co-founder of the Massachusetts Institute of Technology’s AI laboratory, and author of several texts concerning AI and philosophy.

https://en.wikipedia.org/wiki/Marvin_Minsky

Page 12

Knowledge Base Completion

• Huge knowledge bases such as Freebase,

NELL, and YAGO

• Despite large size, they are highly incomplete

Page 13

Knowledge Base Completion

Page 14

Knowledge Base Completion

• Knowledge Base Completion

– Knowledge base completion (KBC) is the task of filling the gaps in knowledge bases in a targeted way

Page 15

Knowledge Base Completion

• Knowledge Base Completion

– Knowledge base completion (KBC) is the task of filling the gaps in knowledge bases in a targeted way

– <subject, relation, ?>: given the subject and relation, what is the corresponding object value(s)?

– Example

• Query: <Marvin_Minsky, wasBornIn, ?>

• Answer: New_York_City
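
The query form above can be illustrated with a minimal sketch. The `Triple` type, the toy `kb` contents and the `lookup` helper are hypothetical names introduced only for illustration; they are not part of the system described in the slides.

```python
from typing import NamedTuple, Optional

class Triple(NamedTuple):
    subject: str
    relation: str
    obj: Optional[str]  # None stands for the unknown slot "?" in a query

# Toy knowledge base with a single stored fact (illustrative data only).
kb = {Triple("Marvin_Minsky", "diedIn", "Boston")}

def lookup(kb, query):
    """Return the objects stored for (subject, relation); empty if the fact is missing."""
    return sorted(t.obj for t in kb
                  if t.subject == query.subject and t.relation == query.relation)

query = Triple("Marvin_Minsky", "wasBornIn", None)
is_gap = lookup(kb, query) == []  # an empty result marks a gap that KBC should fill
```

A query whose lookup comes back empty is exactly the case that knowledge base completion targets.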

Page 16

Related Work

• Knowledge Base Construction/Population

– Fact extraction from unstructured text datasets

– Human workers to add information

Page 17

Related Work

• Knowledge Base Construction/Population

– Fact extraction from unstructured text datasets

– Human workers to add information

• Drawbacks

– Time-consuming

– Intense human labor

Page 18

Related Work

• Inference and Learning

– Logical rules

– Probabilistic graphical models

– Embeddings and Matrix Factorization

Page 19

Related Work Cont.

• Inference and Learning

– Logical rules

– Probabilistic graphical models

– Embeddings and Matrix Factorization

• Drawbacks

– Learning effective, efficient and expressive models

is difficult

– Knowledge bases are highly incomplete

Page 20

Related Work Cont.

• Question Answering

– Robert West, Evgeniy Gabrilovich, Kevin Murphy, et al. 2014. Knowledge base completion via search-based question answering. In Proceedings of the 23rd International Conference on World Wide Web (WWW '14)

– Translate structured queries into natural language questions and use an in-house question answering system to get answers

Page 21

Related Work Cont.

• Question Answering

– Robert West, Evgeniy Gabrilovich, Kevin Murphy, et al. 2014. Knowledge base completion via search-based question answering. In Proceedings of the 23rd International Conference on World Wide Web (WWW '14)

– Translate structured queries into natural language questions and use an in-house question answering system to get answers

• Drawbacks

– Low recall for unpopular relations and entities

Page 22

Main Problems with Related Work

• Uses either only unstructured textual

information or only structured information

– Lower recall/precision

• Batch-oriented KBC

– User query answers may still be missing

Page 23

Our Approach

• Propose a query-driven pipeline with multimodal fusion

for knowledge base completion

• Enhance web-based question answering and rule

inference using both unstructured text and structured

knowledge for knowledge base completion

• To the best of our knowledge, this is the first system for query-time knowledge base completion fusing unstructured and structured data

Page 24

Novelty

• Fuse unstructured text and structured

knowledge bases to improve performance

• Combine novel text-based approaches and

knowledge-based approaches

• Query-driven vs batch-oriented

Page 25

System Overview

• Design a query-driven web-based question

answering system to extract missing facts from

textual snippets crawled from the Web

Page 26

System Overview

• Design a query-driven web-based question

answering system to extract missing facts from

textual snippets crawled from the Web

• Build an augmented rule inference system

based on web-based question answering,

logical rules and existing facts inside KBs to

infer missing facts

Page 27

System Overview

• Design a query-driven web-based question answering system to extract missing facts from textual snippets crawled from the Web

• Build an augmented rule inference system based on web-based question answering, logical rules and existing facts inside KBs to infer missing facts

• Use ensemble fusion approaches to combine question answering and rule inference

Page 28

System Pipeline

[Diagram: the query <subject, relation, ?> is sent to both Web-Based Question Answering and Rule Inference. Web-Based Question Answering takes relevant textual snippets from the Web and metadata, entity semantics, etc. from the Knowledge Bases. Rule Inference takes logical rules, existing facts, etc. from the Knowledge Bases. Each component outputs candidate answers, which Ensemble Fusion combines into the final results.]

Page 30

Web-Based Question Answering (WebQA)

• Transform queries into natural language questions, search the Web, and extract candidate answers from snippets

Page 31

WebQA Pipeline

[Diagram of Web-Based Question Answering: <subject, relation, ?> → Question Generation → set of questions → Data Collection → set of snippets → Answer Extraction → set of candidate answers → Answer Ranking → ranked answers with confidence scores]

Page 32

Example: WebQA Pipeline

Query: <Marvin_Minsky, wasBornIn, ?>

• Question Generation

– Questions: Marvin Minsky born; Marvin Minsky birth; …

• Data Collection

– Snippets: Marvin Lee Minsky was born in New York City, to an eye surgeon father, Henry, and to a mother, Fannie ...

• Answer Extraction

– Candidate Answers: New_York_City, Boston, …

• Answer Ranking

– Ranked Answers: New_York_City: 0.95, Boston: 0.4, …

Page 33

Question Generation

• Use question templates to transform queries to questions

– Each relation has multiple question templates

– wasBornIn: born, birth and birthplace
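
Template-based question generation can be sketched as follows. The `TEMPLATES` table and the `generate_questions` function are hypothetical; the sketch assumes a question is simply the entity name followed by a template keyword, which matches the example questions shown in the slides ("Marvin Minsky born", "Marvin Minsky birth").

```python
# Hypothetical template table: relation -> keywords appended to the subject name.
TEMPLATES = {"wasBornIn": ["born", "birth", "birthplace"]}

def generate_questions(subject, relation, templates=TEMPLATES):
    """Turn a <subject, relation, ?> query into keyword-style search questions."""
    name = subject.replace("_", " ")  # KB identifier -> surface form
    return [f"{name} {keyword}" for keyword in templates.get(relation, [])]

questions = generate_questions("Marvin_Minsky", "wasBornIn")
```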

Page 34

Data Collection

• Use search engines to crawl snippets by searching questions on the Web

– Another example snippet: Marvin Minsky - A.M. Turing Award Winner, BIRTH: New York City, August 9, 1927. DEATH: Boston, January 24, 2016 ...

Page 35

Answer Extraction

• Extract noun phrases from snippets and link them to entities in KBs, for example:

– New_York_City: a city entity

– Henry_Minsky: a person entity

– Category filtering

Page 36

Answer Ranking

• Extract multimodal features from textual snippets and knowledge bases for candidate answers and use a classifier to classify and rank them

Page 37

Details on Answer Ranking (I)

Multimodal Features

• Fuse information from unstructured textual snippets and structured knowledge bases

– count: the number of times a candidate answer appears in snippets

– average rank: the average rank of the snippets a candidate answer appears in

– keyword count: the number of times question keywords appear together with the candidate answer in snippets

– context similarity between the context of the candidate answer and the abstract of the query entity

– abstract similarity between the abstracts of the candidate answer and the query entity

– relatedness between the candidate answer and the query entity inside the knowledge base
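
The three snippet-based features (count, average rank, keyword count) can be sketched in a few lines. This is a simplified illustration, not the system's implementation: `snippet_features` is a hypothetical helper, candidates are matched by plain case-insensitive substring search instead of entity linking, and "appearing together" is assumed to mean "in the same snippet".

```python
def snippet_features(candidate, keywords, snippets):
    """snippets: list of (rank, text) pairs, where rank 1 is the top search result."""
    count = 0          # occurrences of the candidate across all snippets
    ranks = []         # ranks of the snippets that mention the candidate
    keyword_count = 0  # keyword occurrences in snippets that mention the candidate
    cand = candidate.lower()
    for rank, text in snippets:
        lowered = text.lower()
        hits = lowered.count(cand)
        if hits == 0:
            continue
        count += hits
        ranks.append(rank)
        keyword_count += sum(lowered.count(k.lower()) for k in keywords)
    average_rank = sum(ranks) / len(ranks) if ranks else float("inf")
    return {"count": count, "average_rank": average_rank, "keyword_count": keyword_count}

snippets = [
    (1, "Marvin Lee Minsky was born in New York City, to an eye surgeon father"),
    (2, "BIRTH: New York City, August 9, 1927. DEATH: Boston, January 24, 2016"),
]
features = snippet_features("New York City", ["born", "birth"], snippets)
```

The knowledge-based features (context similarity, abstract similarity, relatedness) need entity abstracts and KB statistics and are left out of this sketch.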

Page 38

Details on Answer Ranking (II)

Classification & Ranking

• Imbalanced datasets

– # negative samples / # positive samples > 30

– Resampling (better)

– Cost reweighting: false negatives have higher cost

• Classifiers

– Logistic regression (best)

– Decision tree

– Support vector machines
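
One common way to resample an imbalanced training set is random undersampling of the negative class. The slides do not say which resampling scheme was used, so the sketch below is only an assumption; `undersample`, its `ratio` parameter and the toy data are hypothetical.

```python
import random

def undersample(samples, ratio=1.0, seed=0):
    """samples: list of (features, label) pairs, label 1 = correct answer.

    Keeps every positive sample and at most ratio * (#positives) negatives.
    """
    rng = random.Random(seed)
    positives = [s for s in samples if s[1] == 1]
    negatives = [s for s in samples if s[1] == 0]
    keep = min(len(negatives), int(ratio * len(positives)))
    balanced = positives + rng.sample(negatives, keep)
    rng.shuffle(balanced)
    return balanced

# Toy data with a 30:1 negative-to-positive ratio.
data = [([i], 0) for i in range(60)] + [([100], 1), ([101], 1)]
balanced = undersample(data)
```

The balanced list can then be passed to any classifier, such as the logistic regression model the slides report as best.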

Page 39

Query-Driven Techniques

• Question Template Selection

• Query-Driven Snippet Filtering

Page 40

Question Template Selection

• Each relation has multiple templates

– Increase the chance of finding correct answers

• Issuing all possible questions is problematic

– Long running time

– Performance deterioration

Page 41

Question Template Selection

• Propose a greedy selection algorithm to select

a minimal set of templates with the highest

performance

• Greedy strategy was the best strategy

according to previous work (WWW’14)

Page 42

Greedy Selection Algorithm

• Four templates, t1, t2, t3, t4, in descending order of

individual performance

• Previous approach greedily selects the templates

according to their performance ranking

– {t1}

– {t1, t2}

– {t1, t2, t3}

– {t1, t2, t3, t4}

Page 43

Greedy Selection Algorithm

• However, previous work ignored the

correlation between templates

• Our greedy strategy is to choose the

templates which work best together

– {t1}

– {t1, t3}: {t1} works better with t3 than t2 and t4

– {t1, t3, t4}: {t1, t3} works better with t4 than t2

– {t1, t3, t4, t2}
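
The selection order above can be reproduced with a greedy forward selection that, at each step, adds the template giving the best combined score. This is a sketch under stated assumptions: `greedy_select` and `evaluate` are hypothetical names, and the toy `COVERAGE` sets stand in for a real measurement such as MAP on training queries.

```python
def greedy_select(templates, evaluate):
    """Greedy forward selection.

    evaluate(subset) returns a score for a list of templates.
    Returns the nested sequence of selected sets with their scores.
    """
    selected, remaining, history = [], list(templates), []
    while remaining:
        # Pick the template that works best together with those already chosen.
        best = max(remaining, key=lambda t: evaluate(selected + [t]))
        selected.append(best)
        remaining.remove(best)
        history.append((list(selected), evaluate(selected)))
    return history

# Toy stand-in for performance: which queries each template answers correctly.
COVERAGE = {"t1": {1, 2, 3, 4}, "t2": {1, 2, 3}, "t3": {5, 6}, "t4": {7}}

def evaluate(subset):
    return len(set().union(*(COVERAGE[t] for t in subset)))

history = greedy_select(["t1", "t2", "t3", "t4"], evaluate)
```

In the toy data t2 is strong on its own but redundant with t1, so it is picked last. Cutting the history at the point where the score stops improving gives a small template set.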

Page 44

Greedy Selection Algorithm

Page 45

Datasets, Benchmark and Metric

• Dataset: Yago2 knowledge base

– more than 10 million entities (like persons, organizations, cities, etc.) and more than 120 million facts about these entities

• Query Benchmark

– 8 relations: wasBornIn, diedIn, hasChild, hasCapital, isMarriedTo, isCitizenOf, graduatedFrom, hasAcademicAdvisor

– Randomly select 500 queries for training and 100 queries for testing from Yago for each relation

Page 46

Datasets, Benchmark and Metric

• Mean average precision (MAP) as metric

– Mean of average precision scores over all queries

– Average precision (AP): considering both precision

and ranking to evaluate a ranked list of answers

• Ranked list: {1, 0, 0, 1, 0, 0}

• AP = (1/1 + 2/4) / 2 = 0.75
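
The worked example can be checked with a short sketch. The function names are illustrative; following the slide's example, AP is normalized by the number of correct answers that appear in the ranked list.

```python
def average_precision(ranked_labels):
    """ranked_labels: 1 for a correct answer, 0 otherwise, in ranked order."""
    hits, total = 0, 0.0
    for position, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            total += hits / position  # precision at this correct answer
    return total / hits if hits else 0.0

def mean_average_precision(ranked_lists):
    """Mean of AP over all queries."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)

ap = average_precision([1, 0, 0, 1, 0, 0])  # (1/1 + 2/4) / 2
```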

Page 47

Template Selection Evaluation

K is the number of questions

Page 48

Template Selection Evaluation

• 2 or 3 templates are enough for most relations

• We use far fewer templates than previous work to achieve better performance

• Compare four relations with previous work

– Robert West, Evgeniy Gabrilovich, Kevin Murphy, et al. 2014. Knowledge base completion via search-based question answering. In Proceedings of the 23rd International Conference on World Wide Web (WWW '14). ACM, New York, NY, USA, 515-526. DOI: http://dx.doi.org/10.1145/2566486.2568032

Page 49

Results for Template Selection

The performance of WebQA vs previous work (WWW’14 paper), measured by mean average precision

• isCitizenOf: WebQA 0.45 (2 questions); WWW'14 0.93 (8 questions)

• wasBornIn: WebQA 0.75 (2 questions); WWW'14 0.67 (32 questions)

• hasChild: WebQA 0.24 (2 questions); WWW'14 0.18 (8 questions)

• isMarriedTo: WebQA 0.52 (3 questions); WWW'14 0.5 (8 questions)

WWW’14 used top searched entities while we use randomly selected entities.

Page 50

Query-Driven Snippet Filtering

• 50 snippets for each question (WWW’14), 100 – 150

snippets for each KBC query

• Entity linking on all snippets is time-consuming

– Server lookups

– Multi-threading is not enough

• Not all snippets are good

Page 51

Query-Driven Snippet Filtering

• Features

– The original rank of a snippet returned by a search engine

– A boolean indicator of whether the keyword in the question template appears in the snippet

– The number of words of entity names appearing in the snippet

• Logistic regression with resampling

Page 52

Results for Snippet Filtering

[Bar chart on a 0 to 1 axis comparing WebQA with 10, 20, 30 and all snippets against WWW'14; values listed per series in chart order]

• 10 snippets: 0.7, 0.21, 0.48, 0.39, 0.31, 0.45, 0.19, 0.1

• 20 snippets: 0.71, 0.21, 0.5, 0.4, 0.38, 0.48, 0.22, 0.16

• 30 snippets: 0.7, 0.24, 0.51, 0.41, 0.4, 0.51, 0.22, 0.18

• All snippets: 0.75, 0.24, 0.52, 0.45, 0.43, 0.52, 0.25, 0.21

• WWW'14: 0.67, 0.18, 0.5, 0.93

Page 53

WebQA Efficiency

• Bottleneck

– Network delay

– Server-side delay

• Solution

– Use far fewer questions

– Snippet filtering

Page 54

Runtime Results for Query-Driven Techniques

Average runtime of WebQA for relation wasBornIn with 3 questions

• 150 snippets: 12.5 seconds

• 50 snippets: 4.1 seconds

• 30 snippets: 3.2 seconds

• 20 snippets: 3.1 seconds

Page 55

System Pipeline (same diagram as on Page 28)

Page 56

Rule Inference

• Logical rules pre-learned from knowledge

bases to infer missing facts

• Horn-clause rules

isMarriedTo(x,y), hasChild(y,z) => hasChild(x,z); 0.62

– Body/Premise: isMarriedTo(x,y), hasChild(y,z); each atom is a literal

– Head/Conclusion: hasChild(x,z)

– Confidence score: 0.62

Page 57

Rules

• Length-1 rules

– e.g. diedIn(x, City_A) => wasBornIn(x, City_A);

confidence score = 0.12

• Length-2 rules

– e.g. isMarriedTo(x, y), hasChild(y, z) => hasChild(x,

z); confidence score = 0.62
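
A rule of this shape can be held in a small data structure. The `Rule` class and `Atom` alias below are hypothetical names used only to illustrate body, head and confidence score; the two instances mirror the example rules in the slides, with the length-1 rule written with a variable y in place of City_A.

```python
from dataclasses import dataclass
from typing import Tuple

Atom = Tuple[str, str, str]  # (relation, first argument, second argument)

@dataclass(frozen=True)
class Rule:
    body: Tuple[Atom, ...]  # premise literals
    head: Atom              # conclusion
    confidence: float

    @property
    def length(self):
        """Number of body literals (1 for length-1 rules, 2 for length-2 rules)."""
        return len(self.body)

    def __str__(self):
        def fmt(atom):
            return f"{atom[0]}({atom[1]}, {atom[2]})"
        body = ", ".join(fmt(a) for a in self.body)
        return f"{body} => {fmt(self.head)}; {self.confidence}"

rule1 = Rule((("diedIn", "x", "y"),), ("wasBornIn", "x", "y"), 0.12)
rule2 = Rule((("isMarriedTo", "x", "y"), ("hasChild", "y", "z")),
             ("hasChild", "x", "z"), 0.62)
```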

Page 58

Ordinary Rule Inference (OrdRI)

• Facts of KBs are stored in database tables

• Use only existing facts inside knowledge bases

– SQL query

• Each relation has multiple rules

– Parallelization using multi-threading
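
Evaluating a length-2 rule over facts stored in a table amounts to a self-join. The sketch below uses SQLite with a hypothetical single-table schema `facts(subject, relation, object)` and placeholder entities; the actual schema and database used in the system are not given in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (subject TEXT, relation TEXT, object TEXT)")
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", [
    ("Person_A", "isMarriedTo", "Person_B"),  # placeholder facts
    ("Person_B", "hasChild", "Person_C"),
])

# isMarriedTo(x, y), hasChild(y, z) => hasChild(x, z), evaluated as a self-join on y.
LENGTH2_SQL = """
SELECT DISTINCT f2.object
FROM facts AS f1
JOIN facts AS f2 ON f1.object = f2.subject
WHERE f1.subject = ?
  AND f1.relation = 'isMarriedTo'
  AND f2.relation = 'hasChild'
"""

def infer_children(conn, x, rule_confidence=0.62):
    """Return {z: score} for hasChild(x, z) inferred from existing facts only."""
    rows = conn.execute(LENGTH2_SQL, (x,)).fetchall()
    return {z: rule_confidence for (z,) in rows}

answers = infer_children(conn, "Person_A")
```

Since each rule maps to one independent query, the rules of a relation can be run in parallel threads, as the slide notes.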

Page 59

Ordinary Rule Inference (OrdRI)

• Fusion of results from multiple rules

– Sum score

– Maximum score

– Logistic regression (average confidence score,

total # of rules, etc.)
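
The sum-score and maximum-score options can be sketched as below. `fuse_rule_scores` is a hypothetical helper; the logistic regression option is omitted because its exact feature set is only partly listed in the slides.

```python
def fuse_rule_scores(per_rule_scores, method="sum"):
    """per_rule_scores: list of {candidate: score} dicts, one per rule.

    Returns candidates ranked by fused score, highest first.
    """
    fused = {}
    for scores in per_rule_scores:
        for candidate, score in scores.items():
            if method == "sum":
                fused[candidate] = fused.get(candidate, 0.0) + score
            elif method == "max":
                fused[candidate] = max(fused.get(candidate, 0.0), score)
            else:
                raise ValueError(f"unknown method: {method}")
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

per_rule = [{"A": 0.5, "B": 0.25}, {"A": 0.25}]
by_sum = fuse_rule_scores(per_rule, "sum")
by_max = fuse_rule_scores(per_rule, "max")
```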

Page 60

Problem

• Using only existing facts in knowledge bases

– Knowledge bases are highly incomplete, hence

many premises or body predicates of rules are

missing

• Solution

– Use WebQA to extract missing body literals from

the Web and use rules to infer missing head facts

– Query-driven

Page 61

Augmented Rule Inference with WebQA (AugRI)

Single-Literal Processing: works for length-1 rules, such as diedIn(x, y) => wasBornIn(x, y)

[Flowchart: is diedIn(Marvin_Minsky, y) in the KB? If yes, extract from KB; if no, use WebQA. Output: y values with confidence scores.]

Page 62

Augmented Rule Inference with WebQA (AugRI)

Two-Literal Processing: works for length-2 rules, such as isMarriedTo(x, y), hasChild(y, z) => hasChild(x, z)

[Flowchart: Single-Literal Processing of isMarriedTo(Marvin_Minsky, y) yields y1, …, yn; Single-Literal Processing is then applied to each of hasChild(y1, z), …, hasChild(yn, z). Output: z values with confidence scores.]

Page 63

Answer Confidence Score

• Length-1 rule

– diedIn(x, y) => wasBornIn(x, y); 0.12

– $score(y) = score(diedIn(x, y)) \times score_r$, where $score_r$ is the confidence score of the rule

– If diedIn(x,y) exists inside KBs, its confidence is 1; else its confidence is the score returned by WebQA

• Length-2 rule

– isMarriedTo(x, y), hasChild(y, z) => hasChild(x, z); 0.62

– $score(z) = score_r \times \sum_{y} score(isMarriedTo(x, y)) \times score(hasChild(y, z))$
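
The scoring for both rule lengths can be sketched as follows. All names are hypothetical, the WebQA component is replaced by a stub, and the sketch assumes that for length-2 rules the contributions of the intermediate values y are summed before being multiplied by the rule confidence.

```python
def literal_scores(kb, webqa, relation, subject):
    """Single-literal processing: KB facts get confidence 1, otherwise ask WebQA."""
    known = kb.get((relation, subject))
    if known:
        return {obj: 1.0 for obj in known}
    return webqa(relation, subject)

def score_length1(kb, webqa, body_relation, x, rule_conf):
    """score(y) = score(body(x, y)) * rule confidence."""
    body = literal_scores(kb, webqa, body_relation, x)
    return {y: s * rule_conf for y, s in body.items()}

def score_length2(kb, webqa, rel1, rel2, x, rule_conf):
    """score(z) = rule confidence * sum over y of score(rel1(x, y)) * score(rel2(y, z))."""
    totals = {}
    for y, s1 in literal_scores(kb, webqa, rel1, x).items():
        for z, s2 in literal_scores(kb, webqa, rel2, y).items():
            totals[z] = totals.get(z, 0.0) + s1 * s2
    return {z: rule_conf * s for z, s in totals.items()}

# Toy data: one fact in the KB, everything else answered by a stubbed WebQA.
kb = {("isMarriedTo", "Person_A"): ["Person_B"]}
WEB_ANSWERS = {("hasChild", "Person_B"): {"Person_C": 0.5},
               ("diedIn", "Person_A"): {"City_A": 0.5}}

def webqa_stub(relation, subject):
    return WEB_ANSWERS.get((relation, subject), {})

born = score_length1(kb, webqa_stub, "diedIn", "Person_A", 0.12)
children = score_length2(kb, webqa_stub, "isMarriedTo", "hasChild", "Person_A", 0.62)
```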

Page 64

Fusion of Multiple Rules

• Similar to ordinary rule inference

Page 65

Query-Driven Optimization

• Length-2 Rules

– Could possibly launch too many WebQA queries

for the second literal

Page 66

Query-Driven Optimization

• Length-2 Rules

– Could possibly launch too many WebQA queries

for the second literal

• Use threshold to filter out low-confidence

results from the first step

• Only use top k answers from the first step

• Empirical parameter learning
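
The two pruning steps can be sketched together. `prune_first_step` and its default `threshold` and `k` values are hypothetical; the slides state only that the parameters are learned empirically.

```python
def prune_first_step(scored, threshold=0.3, k=2):
    """Keep first-literal answers above the threshold, then only the top k of them.

    scored: {y: confidence} from the first literal of a length-2 rule.
    Each surviving y triggers at most one WebQA query for the second literal.
    """
    kept = [(y, s) for y, s in scored.items() if s >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]

first_step = {"a": 0.9, "b": 0.5, "c": 0.4, "d": 0.1}
survivors = prune_first_step(first_step)
```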

Page 67

Results Comparing Ordinary Rule Inference and Augmented Rule Inference

MAP of OrdRI vs AugRI on the Yago benchmark

• isMarriedTo: OrdRI 0.96; AugRI 0.96

• wasBornIn: OrdRI 0.12; AugRI 0.49

• isCitizenOf: OrdRI 0.08; AugRI 0.53

• hasChild: OrdRI 0.22; AugRI 0.24

• diedIn: OrdRI 0.2; AugRI 0.3

• graduatedFrom: OrdRI 0.06; AugRI 0.17

Page 68: Query-Driven Knowledge Base Completion with Multimodal Fusionypeng/data/dp.pdf · 4 The University of Florida (commonly referred to as Florida or UF) is an American public land- grant,

System Pipeline

[Figure: pipeline diagram. A query <subject, relation, ?> is sent to both Web-Based Question Answering and Rule Inference. The Web supplies relevant textual snippets; Knowledge Bases supply metadata, entity semantics, logical rules, and existing facts. Each component passes candidate answers to Ensemble Fusion, which produces the final results.]


Ensemble Fusion

• Candidate answers with confidence scores from WebQA and AugRI

• Ensemble approaches

– Linear rule

– Maximum rule

– Sum rule

– Logistic regression
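The four ensemble approaches can be sketched in Python as below. The slides name the rules without defining them, so the formulas here are the common textbook forms and should be read as assumptions; the weights, logistic coefficients, answer strings, and confidences are all hypothetical placeholders.

```python
import math


def linear_rule(s_qa, s_ri, weight=0.5):
    """Weighted combination of the two confidence scores (weight is illustrative)."""
    return weight * s_qa + (1.0 - weight) * s_ri


def maximum_rule(s_qa, s_ri):
    """Take the larger of the two confidence scores."""
    return max(s_qa, s_ri)


def sum_rule(s_qa, s_ri):
    """Add the two confidence scores."""
    return s_qa + s_ri


def logistic_rule(s_qa, s_ri, w_qa=2.0, w_ri=2.0, bias=-1.0):
    """Logistic regression over the two scores.

    The coefficients are placeholders; in practice they would be fitted on labeled data.
    """
    z = w_qa * s_qa + w_ri * s_ri + bias
    return 1.0 / (1.0 + math.exp(-z))


def fuse(webqa, augri, rule):
    """Combine two {answer: confidence} dicts; a missing answer counts as 0."""
    answers = set(webqa) | set(augri)
    fused = {a: rule(webqa.get(a, 0.0), augri.get(a, 0.0)) for a in answers}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical candidate answers from the two components.
webqa = {"Gainesville": 0.7, "Orlando": 0.2}
augri = {"Gainesville": 0.5, "Miami": 0.3}
ranked = fuse(webqa, augri, maximum_rule)
```

Passing a different rule function to `fuse` swaps the fusion strategy without changing how candidates from the two components are merged.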


Ensemble Fusion of Question Answering and Rule Inference

KBC performance of individual approaches and fusion approaches (measured by MAP). WebQA is conducted with 30 snippets.

[Figure: bar chart of MAP (axis 0 to 1) for the relations wasBornIn, hasChild, isCitizenOf, isMarriedTo, diedIn, and graduatedFrom; series: WWW'14, WebQA, AugRI, WebQA + AugRI]


System Efficiency

• Fast on average, about 4 to 5 seconds per query, because query-driven optimization ensures that only a few WebQA queries are issued

• Future improvements

– Use in-house entity linking

– Use faster search engines


Conclusions

• Propose a query-driven pipeline with multimodal fusion for knowledge base completion, fusing unstructured and structured information

• Design a novel web-based question answering (WebQA) system with multimodal features, question template selection, and query-driven snippet filtering

• Build a novel rule inference system with logical rules, WebQA, and query-driven optimization

• Achieve state-of-the-art performance

• Provide fast real-time responses to user queries on-the-fly


Publications

• Query-Driven Knowledge Base Completion with Multimodal Fusion. Yang Peng, Daisy Zhe Wang. To be submitted to VLDB 2018.

• Multimodal Learning for Web Information Extraction. Dihong Gong, Daisy Zhe Wang, Yang Peng. In Proceedings of ACM Multimedia, October 2017.

• Multimodal Ensemble Fusion for Disambiguation and Retrieval. Yang Peng, Xiaofeng Zhou, Daisy Zhe Wang, Ishan Patwa, Dihong Gong, Chunsheng Victor Fang. IEEE Multimedia Magazine, April-June 2016.


Publications

• Scalable Image Retrieval with Multimodal Fusion. Yang Peng, Xiaofeng Zhou, Daisy Zhe Wang, Chunsheng Victor Fang. Proceedings of the 29th International FLAIRS Conference, May 2016.

• Probabilistic Ensemble Fusion for Multimodal Word Sense Disambiguation. Yang Peng, Daisy Zhe Wang, Ishan Patwa, Dihong Gong. Proceedings of the IEEE ISM, December 2015.

• Streaming Fact Extraction for Wikipedia Entities at Web-Scale. Morteza Shahriari Nia*, Christan Grant*, Yang Peng*, Daisy Zhe Wang, Milenko Petrovic. (* authors contributed equally) Proceedings of the 27th International FLAIRS Conference, May 2014.

• University of Florida Knowledge Base Acceleration Notebook. Morteza Shahriari Nia, Christan Grant, Yang Peng, Daisy Zhe Wang, Milenko Petrovic. Proceedings of TREC 2013, November 2013.


Acknowledgments

Committee

• Prof. Daisy Zhe Wang

• Prof. Shigang Chen

• Prof. Sartaj Sahni

• Prof. Sanjay Ranka

• Prof. Tan Wong

Collaborators & Lab-mates

• Dr. Kun Li

• Dr. Christan Grant

• Dr. Morteza Shahriari Nia

• Dr. Yang Chen

• Xiaofeng Zhou

• Sean Goldberg

• Miguel Rodríguez

• Dihong Gong

• Ali Sadeghian

• etc.


Thank you!

• Questions?
