Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge...

92
Expanding the YAGO knowledge base Rebele The YAGO knowledge base Outline Using YAGO for the humanities Adding Words to Regexes Answering Queries with Unix Shell Conclusion 1/41 Expanding the YAGO knowledge base Thomas Rebele Télécom ParisTech 2018-07-19

Transcript of Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge...

Page 1: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Conclusion

1/41

Expanding the YAGO knowledge base

Thomas Rebele

Télécom ParisTech

2018-07-19

Page 2: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

What is a knowledgebase?

What is YAGO?

Involvement

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Conclusion

What is a knowledge base?

2/41

Albert EinsteinMileva Maric

marriedAlfred Kleinerhas advisor

Nobel Prize in Physics

won prize

Page 3: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

What is a knowledgebase?

What is YAGO?

Involvement

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Conclusion

What is a knowledge base?

2/41

Albert EinsteinMileva Maric

marriedAlfred Kleinerhas advisor

Nobel Prize in Physics

won prize

Applications of knowledge bases:

◮ Question answering

◮ Semantic search

◮ Text analysis

◮ Machine translation

Page 4: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

What is a knowledgebase?

What is YAGO?

Involvement

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Conclusion

What is YAGO?

3/41

◮ Knowledge base with 10 million entities and >210 million facts

◮ Automatically extracted from Wikipedia, Wordnet, and Geonames

◮ Multilingual facts from 10 languages

◮ Focus on precision

◮ Developed by Max-Planck Institute for Informatics and TélécomParisTech

Page 5: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

What is a knowledgebase?

What is YAGO?

Involvement

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Conclusion

What is YAGO?

4/41

Languages

EnglishGermanFrench

Albert Einstein

Albert Einstein was aphysicist. His work in-fluenced the philosophyof science. He developedthe theory of relativity.

Albert Einstein

Spouse(s) Mileva MaricDoctoral advisor(s) Alfred Kleiner

Categories: Nobel laureates in Physics

Albert EinsteinMileva Maric

marriedAlfred Kleinerhas advisor

Nobel Prize in Physics

won prize

automaticextraction

Page 6: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

What is a knowledgebase?

What is YAGO?

Involvement

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Conclusion

Involvement

5/41

◮ I joined the project in 2014

◮ Maintenance and development

◮ Contributed to open source release in 2017 athttps://github.com/yago-naga/yago3/

◮ Coordinated / contributed to the evaluation

◮ ground truth: Wikipedia◮ 98% facts of the sample were correct

Publication: ISWC 2016 (resource paper)

Thomas Rebele Fabian Suchanek Johannes Hoffart Joanna Biega Erdal Kuzey Gerhard Weikum

YAGO is very accurate. But how complete is it?

Page 7: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Conclusion

6/41

Contributions:

Extracting more information aboutresidences, gender, birth and death dates

Repairing regular expressionsby adding missing words

Preprocessing tabular databy transforming queries to Bash scripts

Page 8: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Conclusion

6/41

Contributions:

Extracting more information aboutresidences, gender, birth and death dates

Repairing regular expressionsby adding missing words

Preprocessing tabular databy transforming queries to Bash scripts

Page 9: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Problem statement

Extensions

Place of residence

Gender

Evaluation

Births per month

Life span over time

Relative populationsize

Summary

Adding Words toRegexes

AnsweringQueries with UnixShell

Conclusion

Using YAGO for the humanities: Problem statement

7/41

◮ Every person lives somewhere,but YAGO knows the residence only for 30% of the people

◮ Every person has a gender,but YAGO knows the gender only for 64% of the people

How can we make YAGO more complete?

Page 10: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Problem statement

Extensions

Place of residence

Gender

Evaluation

Births per month

Life span over time

Relative populationsize

Summary

Adding Words toRegexes

AnsweringQueries with UnixShell

Conclusion

Using YAGO for the humanities: Place of residence

8/41

Plato

Plato was a philoso-pher. He founded theAcademy in Athens. Helaid the foundation forphilosophy.

Plato

Birthplace Athens

Categories: 420s BC births | 340s BCdeaths | Greek philosopher | Greekmale wrestler | Austrian writer

residence

previous approach

Page 11: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Problem statement

Extensions

Place of residence

Gender

Evaluation

Births per month

Life span over time

Relative populationsize

Summary

Adding Words toRegexes

AnsweringQueries with UnixShell

Conclusion

Using YAGO for the humanities: Place of residence

8/41

Plato

Plato was a philoso-pher. He founded theAcademy in Athens. Helaid the foundation forphilosophy.

Plato

Birthplace Athens

Categories: 420s BC births | 340s BCdeaths | Greek philosopher | Greekmale wrestler | Austrian writer

residence

previous approach

AustriaGreece Greece

mapping of

5900 demonyms

Page 12: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Problem statement

Extensions

Place of residence

Gender

Evaluation

Births per month

Life span over time

Relative populationsize

Summary

Adding Words toRegexes

AnsweringQueries with UnixShell

Conclusion

Using YAGO for the humanities: Place of residence

8/41

Plato

Plato was a philoso-pher. He founded theAcademy in Athens. Helaid the foundation forphilosophy.

Plato

Birthplace Athens

Categories: 420s BC births | 340s BCdeaths | Greek philosopher | Greekmale wrestler | Austrian writer

residence

previous approach

AustriaGreece Greece

mapping of

5900 demonyms

Greece: 2Austria: 1

Page 13: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Problem statement

Extensions

Place of residence

Gender

Evaluation

Births per month

Life span over time

Relative populationsize

Summary

Adding Words toRegexes

AnsweringQueries with UnixShell

Conclusion

Using YAGO for the humanities: Place of residence

8/41

Plato

Plato was a philoso-pher. He founded theAcademy in Athens. Helaid the foundation forphilosophy.

Plato

Birthplace Athens

Categories: 420s BC births | 340s BCdeaths | Greek philosopher | Greekmale wrestler | Austrian writer

residence

previous approach

AustriaGreece Greece

mapping of

5900 demonyms

Greece: 2Austria: 1

Page 14: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Problem statement

Extensions

Place of residence

Gender

Evaluation

Births per month

Life span over time

Relative populationsize

Summary

Adding Words toRegexes

AnsweringQueries with UnixShell

Conclusion

Using YAGO for the humanities: Gender

9/41

Extract gender:

Languages

EnglishGermanFrench

Albert Einstein

Albert Einstein was aphysicist. His work in-fluenced the philosophyof science. He developedthe theory of relativity.

Albert Einstein

Categories: Male scientist | Swissphysicists

From pronoun:

◮ YAGO’s originalalgorithm

◮ Count pronouns (he,him / she, her)

◮ Assign genderaccordingly

Page 15: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Problem statement

Extensions

Place of residence

Gender

Evaluation

Births per month

Life span over time

Relative populationsize

Summary

Adding Words toRegexes

AnsweringQueries with UnixShell

Conclusion

Using YAGO for the humanities: Gender

9/41

Extract gender:

Languages

EnglishGermanFrench

Albert Einstein

Albert Einstein was aphysicist. His work in-fluenced the philosophyof science. He developedthe theory of relativity.

Albert Einstein

Categories: Male scientist | Swissphysicists

From category

From pronoun:

◮ YAGO’s originalalgorithm

◮ Count pronouns (he,him / she, her)

◮ Assign genderaccordingly

Page 16: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Problem statement

Extensions

Place of residence

Gender

Evaluation

Births per month

Life span over time

Relative populationsize

Summary

Adding Words toRegexes

AnsweringQueries with UnixShell

Conclusion

Using YAGO for the humanities: Gender

9/41

Extract gender:

Languages

EnglishGermanFrench

Albert Einstein

Albert Einstein was aphysicist. His work in-fluenced the philosophyof science. He developedthe theory of relativity.

Albert Einstein

Categories: Male scientist | Swissphysicists

From category From first name:

◮ Countmales/females foreach first name

◮ Assign names togender accordingly

From pronoun:

◮ YAGO’s originalalgorithm

◮ Count pronouns (he,him / she, her)

◮ Assign genderaccordingly

Page 17: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Using YAGO for the humanities: Evaluation

10/41

◮ Compare extraction process on Wikipedia dump from 2017-02-20

◮ Extracted on 11 languages

◮ Evaluate precision based on a sample of 100 people

ExtractionYAGO

beforeRecall

YAGO

nowRecall Precision DBpedia (en)

Place of

residence 0.7m 30% 2.1m 91% (+201%) 97% (*) 0.7m

Gender 1.5m 64% 2.0m 87% (+35%) 98% 4k

Birth dates 1.6m 69% 1.7m 74% (+8%) 100% 0.8m

Death dates 0.7m 33% 0.8m 36% (+10%) 100% 0.3m

Table: Coverage and precision of our methods.Recall relative to total number of people in YAGO (2.2m).

m million k thousand(*) 6% of anachronistic residencies (e.g., German Empire instead of Germany)

Page 18: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Problem statement

Extensions

Place of residence

Gender

Evaluation

Births per month

Life span over time

Relative populationsize

Summary

Adding Words toRegexes

AnsweringQueries with UnixShell

Conclusion

Using YAGO for the humanities: Births per month

11/41

0 2 4 6 8 10 12

7.5%

8%

8.5%

9%

Month

Rel

ativ

eb

irth

s

YAGO - all

National Center for Health Statistics

Figure: Births per month in the United States between 2003 and 2015(with the Student’s t confidence interval at α = 95%).

Page 19: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Problem statement

Extensions

Place of residence

Gender

Evaluation

Births per month

Life span over time

Relative populationsize

Summary

Adding Words toRegexes

AnsweringQueries with UnixShell

Conclusion

Using YAGO for the humanities: Births per month

11/41

0 2 4 6 8 10 12

7.5%

8%

8.5%

9%

Month

Rel

ativ

eb

irth

s

YAGO - all

National Center for Health Statistics

Figure: Births per month in the United States between 2003 and 2015(with the Student’s t confidence interval at α = 95%).

Languages

EnglishEuskara

Relative age effect

The relative age effectdescribes a bias. Peopleborn early in the selec-tion period of sports oracademia are more likelyto succeed.

Categories: Ageism|Epidemiology

Page 20: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Problem statement

Extensions

Place of residence

Gender

Evaluation

Births per month

Life span over time

Relative populationsize

Summary

Adding Words toRegexes

AnsweringQueries with UnixShell

Conclusion

Using YAGO for the humanities: Births per month

11/41

0 2 4 6 8 10 12

7.5%

8%

8.5%

9%

Month

Rel

ativ

eb

irth

s

YAGO - no sportsmen

National Center for Health Statistics

Figure: Births per month in the United States between 2003 and 2015(with the Student’s t confidence interval at α = 95%).

Page 21: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Problem statement

Extensions

Place of residence

Gender

Evaluation

Births per month

Life span over time

Relative populationsize

Summary

Adding Words toRegexes

AnsweringQueries with UnixShell

Conclusion

Using YAGO for the humanities: Life span over time

12/41

100 500 1000 1500 1900

45

50

55

60

65

70

75

80

85

Year

Med

ian

age

male

female

Figure: Median age over time, by year of birth(with the Student’s t confidence interval at α = 95%).

Page 22: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Problem statement

Extensions

Place of residence

Gender

Evaluation

Births per month

Life span over time

Relative populationsize

Summary

Adding Words toRegexes

AnsweringQueries with UnixShell

Conclusion

Using YAGO for the humanities: Relative population size

13/41

−3000 −2500 −2000 −1500 −1000 −500 0 500 1000 1500 20000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.81

Year

Rel

ativ

ep

op

ula

tio

nsi

ze

Egypt

Babylonian-EmpireSyria

ChinaGreece

Ancient-RomeBritainFrance

ItalyGermany

United-States

Figure: Relative population size, by century. The y-axis is scaled by a quadraticfunction.

Page 23: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Problem statement

Extensions

Place of residence

Gender

Evaluation

Births per month

Life span over time

Relative populationsize

Summary

Adding Words toRegexes

AnsweringQueries with UnixShell

Conclusion

Using YAGO for the humanities: Summary

14/41

◮ Extension of YAGO:◮ More people with residences (+201%, 97% precison)◮ More people with genders (+35%, 98% precision)◮ More birth and death dates (+8%/10%, 100% precision)

◮ Case studies:◮ Births per month◮ Life span over time◮ Relative population size over time

◮ Interdisciplinary project

Publication: ISWC 2017 (workshop paper)

Thomas Rebele Arash Nekoei Fabian Suchanek

We often had to repair regular expressions (e.g., for matching dates).Can we automate this step?

100 500 1000 1500 1900

50

60

70

80

Year

Med

ian

age

male

female

Page 24: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Problem statement

Extensions

Place of residence

Gender

Evaluation

Births per month

Life span over time

Relative populationsize

Summary

Adding Words toRegexes

AnsweringQueries with UnixShell

Conclusion

Using YAGO for the humanities: Summary

15/41

Contributions:

Extracting more information aboutresidences, gender, birth and death dates

Repairing regular expressionsby adding missing words

Preprocessing tabular databy transforming queries to Bash scripts

Page 25: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

Introduction

Problem statement

What is new in ourapproach

Approximate regexmatching

Finding the gaps

Add missing parts

Feedback function

Experiments

Summary

AnsweringQueries with UnixShell

Conclusion

Adding Words to Regexes: Introduction

16/41

Why does YAGO not knowthe ISBN numbers of my books?

◮ We want to find ISBN numbers in Wikipedia to include it in YAGO

◮ We try the regex ISBN(978|979)?\d{10}

◮ I978-2-1234-5680-3

Page 26: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

Introduction

Problem statement

What is new in ourapproach

Approximate regexmatching

Finding the gaps

Add missing parts

Feedback function

Experiments

Summary

AnsweringQueries with UnixShell

Conclusion

Adding Words to Regexes: Introduction

16/41

Why does YAGO not knowthe ISBN numbers of my books?

◮ We want to find ISBN numbers in Wikipedia to include it in YAGO

◮ We try the regex ISBN(978|979)?\d{10}

◮ Why does the regex not find I978-2-1234-5680-3 ?

◮ How can we modify the regex automatically to match the word?

Page 27: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

Introduction

Problem statement

What is new in ourapproach

Approximate regexmatching

Finding the gaps

Add missing parts

Feedback function

Experiments

Summary

AnsweringQueries with UnixShell

Conclusion

Adding Words to Regexes: Problem statement

17/41

Problem statement, first try:

Given

◮ a regular expression r and ISBN(978|979)?\d{10}

◮ a set of positive examples E+, { I978-2-1234-5680-3 }find a regular expression r′ such that

◮ L(r) ⊆ L(r′)

◮ E+ ⊆ L(r′)

′ = .∗

Page 28: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

Introduction

Problem statement

What is new in ourapproach

Approximate regexmatching

Finding the gaps

Add missing parts

Feedback function

Experiments

Summary

AnsweringQueries with UnixShell

Conclusion

Adding Words to Regexes: Problem statement

17/41

Problem statement, first try:

Given

◮ a regular expression r and ISBN(978|979)?\d{10}

◮ a set of positive examples E+, { I978-2-1234-5680-3 }find a regular expression r′ such that

◮ L(r) ⊆ L(r′)

◮ E+ ⊆ L(r′)

Solution:

r′ = .∗

Page 29: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

Introduction

Problem statement

What is new in ourapproach

Approximate regexmatching

Finding the gaps

Add missing parts

Feedback function

Experiments

Summary

AnsweringQueries with UnixShell

Conclusion

Adding Words to Regexes: Problem statement

18/41

Problem statement:

Given

◮ a regular expression r, ISBN(978|979)?\d{10}

◮ a set of positive examples E+, { I978-2-1234-5680-3 }◮ a set of negative examples E−, { 0612345678 }

find a regular expression r′ such that

◮ L(r) ⊆ L(r′)

◮ E+ ⊆ L(r′)

◮ L(r′) ∩ E− is small

◮ ′ ≥ ≈

◮ ′ ≥

Page 30: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

Introduction

Problem statement

What is new in ourapproach

Approximate regexmatching

Finding the gaps

Add missing parts

Feedback function

Experiments

Summary

AnsweringQueries with UnixShell

Conclusion

Adding Words to Regexes: Problem statement

18/41

Problem statement:

Given

◮ a regular expression r, ISBN(978|979)?\d{10}

◮ a set of positive examples E+, { I978-2-1234-5680-3 }◮ a set of negative examples E−, { 0612345678 }

find a regular expression r′ such that

◮ L(r) ⊆ L(r′)

◮ E+ ⊆ L(r′)

◮ L(r′) ∩ E− is small

Evaluation:

◮ Precision of r′ ≥ or ≈ precision of r

◮ Recall of r′ ≥ recall of r(w.r.t. the intended meaning of the regex)

Page 31: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

Introduction

Problem statement

What is new in ourapproach

Approximate regexmatching

Finding the gaps

Add missing parts

Feedback function

Experiments

Summary

AnsweringQueries with UnixShell

Conclusion

Adding Words to Regexes: What is new in our approach

19/41

Previous approaches:

regex E+ E− regex+ + −→

Our approach:

regex E+ E− regex+ + −→

Rationale: creating a large set of positive examples is difficult

Page 32: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

Introduction

Problem statement

What is new in ourapproach

Approximate regexmatching

Finding the gaps

Add missing parts

Feedback function

Experiments

Summary

AnsweringQueries with UnixShell

Conclusion

Adding Words to Regexes: Approximate regex matching

20/41

Step 1: match string and regex approximately [Myers et al. 1989]

.

.

I S B N

?

|

.

9 7 8

.

9 7 9

.

\d \d ... \d \d

I 9 7 8 - 2 - 1 2 3 4 - 5 6 8 0 - 3

...

Page 33: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

Introduction

Problem statement

What is new in ourapproach

Approximate regexmatching

Finding the gaps

Add missing parts

Feedback function

Experiments

Summary

AnsweringQueries with UnixShell

Conclusion

Adding Words to Regexes: Finding the gaps

21/41

Step 2: find the gaps

◮ Between regex leaves

.

.

I S B N

?

|

.

9 7 8

.

9 7 9

.

\d \d ... \d \dS B N

I 9 7 8 - 2 - 1 2 3 4 - 5 6 8 0 - 3

...

Page 34: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

Introduction

Problem statement

What is new in ourapproach

Approximate regexmatching

Finding the gaps

Add missing parts

Feedback function

Experiments

Summary

AnsweringQueries with UnixShell

Conclusion

Adding Words to Regexes: Finding the gaps

21/41

Step 2: find the gaps

◮ Between regex leaves

◮ Between characters of the string

.

.

I S B N

?

|

.

9 7 8

.

9 7 9

.

\d \d ... \d \d

I 9 7 8 - 2 - 1 2 3 4 - 5 6 8 0 - 3-

...

Page 35: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

Introduction

Problem statement

What is new in ourapproach

Approximate regexmatching

Finding the gaps

Add missing parts

Feedback function

Experiments

Summary

AnsweringQueries with UnixShell

Conclusion

Adding Words to Regexes: Add missing parts

22/41

Step 3 (simple approach): adapt regex, so that it includes the missingparts

.

.

I S B N

?

|

.

9 7 8

.

9 7 9

.

\d \d ... \d \dS B N

I 9 7 8 - 2 - 1 2 3 4 - 5 6 8 0 - 3

...

.

.

I ?

.

S B N

?

|

.

9 7 8

.

9 7 9

.

\d \d ... \d \d

Page 36: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

Introduction

Problem statement

What is new in ourapproach

Approximate regexmatching

Finding the gaps

Add missing parts

Feedback function

Experiments

Summary

AnsweringQueries with UnixShell

Conclusion

Adding Words to Regexes: Add missing parts

22/41

Step 3 (simple approach): adapt regex, so that it includes the missingparts

.

.

I S B N

?

|

.

9 7 8

.

9 7 9

.

\d \d ... \d \d

I 9 7 8 - 2 - 1 2 3 4 - 5 6 8 0 - 3-

...

.

.

I ?

.

S B N

?

|

.

9 7 8

.

9 7 9

?

-

.

\d \d ... \d \d

Page 37: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

Introduction

Problem statement

What is new in ourapproach

Approximate regexmatching

Finding the gaps

Add missing parts

Feedback function

Experiments

Summary

AnsweringQueries with UnixShell

Conclusion

Adding Words to Regexes: Add missing parts

22/41

Step 3 (simple approach): adapt regex, so that it includes the missingparts

.

.

I S B N

?

|

.

9 7 8

.

9 7 9

.

\d \d ... \d \d

I 9 7 8 - 2 - 1 2 3 4 - 5 6 8 0 - 3-

...

.

.

I ?

.

S B N

?

|

.

9 7 8

.

9 7 9

?

-

.

\d ?

-

\d ... \d \d

Page 38: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

Introduction

Problem statement

What is new in ourapproach

Approximate regexmatching

Finding the gaps

Add missing parts

Feedback function

Experiments

Summary

AnsweringQueries with UnixShell

Conclusion

Adding Words to Regexes: Add missing parts

22/41

Step 3 (simple approach): adapt regex, so that it includes the missingparts

.

.

I S B N

?

|

.

9 7 8

.

9 7 9

.

\d \d ... \d \d

I 9 7 8 - 2 - 1 2 3 4 - 5 6 8 0 - 3-

...

.

.

I ?

.

S B N

?

|

.

9 7 8

.

9 7 9

?

-

.

\d ?

-

\d ... \d ?

-

\d

Page 39: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

Introduction

Problem statement

What is new in ourapproach

Approximate regexmatching

Finding the gaps

Add missing parts

Feedback function

Experiments

Summary

AnsweringQueries with UnixShell

Conclusion

Adding Words to Regexes: Add missing parts

23/41

Step 3 (adaptive approach): adapt regex, so that it includes the missingparts

Exemplarily for a concatenation:

.

a b ... c d

gs

2g1 g

e

2g3

{g1, g2, g3}

.

a

?

b ... c d

{g1, gs

2} {g e

2, g3}

Page 40: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

Introduction

Problem statement

What is new in ourapproach

Approximate regexmatching

Finding the gaps

Add missing parts

Feedback function

Experiments

Summary

AnsweringQueries with UnixShell

Conclusion

Adding Words to Regexes: Add missing parts

23/41

Step 3 (adaptive approach): adapt regex, so that it includes the missingparts

Exemplarily for a concatenation:

.

a b ... c d

gs

2g1 g

e

2g3

{g1, g2, g3}

.

a

?

b ... c d

{g1, gs

2} {g e

2, g3}

Page 41: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

Introduction

Problem statement

What is new in ourapproach

Approximate regexmatching

Finding the gaps

Add missing parts

Feedback function

Experiments

Summary

AnsweringQueries with UnixShell

Conclusion

Adding Words to Regexes: Feedback function

24/41

Now we want to find URLs:

◮ We try regex r = http://[a-zA-Z\.]+

◮ It does not find s = wikipedia.org

◮ Repaired regex r′ = (http://)?[a-zA-Z\.]+

Problem:

◮ r′ finds all words

◮ Precision drops

( ′) = | − ∩ ( ′)| ≤ α| − ∩ ( )|

◮ ( ′) =http://[a-zA-Z\.]+|wikipedia.org

Page 42: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

Introduction

Problem statement

What is new in ourapproach

Approximate regexmatching

Finding the gaps

Add missing parts

Feedback function

Experiments

Summary

AnsweringQueries with UnixShell

Conclusion

Adding Words to Regexes: Feedback function

24/41

Now we want to find URLs:

◮ We try regex r = http://[a-zA-Z\.]+

◮ It does not find s = wikipedia.org

◮ Repaired regex r′ = (http://)?[a-zA-Z\.]+

Problem:

◮ r′ finds all words

◮ Precision drops

Solution: use feedback on set of negative examples E−

◮ Determine the parts of the regex that we can make optional

◮ We use the number of false positives, i.e.,

f (r′) = |E− ∩ L(r′)| ≤ α|E− ∩ L(r)|

◮ If f (r′) = false, add the word as disjunction instead:http://[a-zA-Z\.]+|wikipedia.org

Page 43: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

Introduction

Problem statement

What is new in ourapproach

Approximate regexmatching

Finding the gaps

Add missing parts

Feedback function

Experiments

Summary

AnsweringQueries with UnixShell

Conclusion

Adding Words to Regexes: Experiments

25/41

Input data:

◮ Datasets:ReLIE [Li et al., 2008],Enron [Babbar et al., 2010], andWikipedia infobox attributes

◮ In total 8 tasks (e.g., phone numbers, software names, dates)

◮ In total 52 regexes

Page 44: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

Introduction

Problem statement

What is new in ourapproach

Approximate regexmatching

Finding the gaps

Add missing parts

Feedback function

Experiments

Summary

AnsweringQueries with UnixShell

Conclusion

Adding Words to Regexes: Experiments

26/41

Input data:

◮ Datasets:ReLIE [Li et al., 2008],Enron [Babbar et al., 2010], andWikipedia infobox attributes

◮ In total 8 tasks (e.g., phonenumbers, software names, dates)

◮ In total 52 regexes

Approaches:

◮ Dis: r|s1| · · · |sn

◮ Star: .*

◮ B&S: [Babbar et al., 2010](reimplementation)

◮ Simple

◮ Adaptive

baseline adaptive

measure original dis star B&S simple α = 1.0 α = 1.1 α = 1.20

F1 55 55 21 40 56 60 60 60recall 66 67 62 35 69 75 76 77precision 64 64 14 71 64 63 63 63

length 56 270 2 3929 250 76 80 81

Table: Averaged measures for the different systems. Length is # of characters ofthe regex.

Page 45: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

Introduction

Problem statement

What is new in ourapproach

Approximate regexmatching

Finding the gaps

Add missing parts

Feedback function

Experiments

Summary

AnsweringQueries with UnixShell

Conclusion

Adding Words to Regexes: Summary

27/41

Summary:

◮ Algorithm for adding missing words to regexes

◮ Increases recall, while keeping precision stable

◮ Source code available athttps://github.com/thomasrebele/regex-repair

Future work:

◮ Decrease dependency on E−

◮ Add a generalization step as postprocessing

Publications: ISWC 2017 (demo), PAKDD 2018 (full paper)

Thomas Rebele Katerina Tzompanaki Fabian Suchanek

Now that we have all this data, how can we process it efficiently?

Page 46: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

Introduction

Problem statement

What is new in ourapproach

Approximate regexmatching

Finding the gaps

Add missing parts

Feedback function

Experiments

Summary

AnsweringQueries with UnixShell

Conclusion

Adding Words to Regexes: Summary

28/41

Contributions:

Extracting more information aboutresidences, gender, birth and death dates

Repairing regular expressionsby adding missing words

Preprocessing tabular databy transforming queries to Bash scripts

Page 47: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Motivation

29/41

How can I findall my academic ancestors?

Albert Einstein

Relativity

teaches

Alfred Kleiner

Statistical physics

teaches

hasAdvisor

Johann Müller

Electromagnetism

teaches

hasAdvisor

Page 48: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Idea

30/41

QSPARQL / OWL

qresult

database

qTSV/n-triples/

/

σπ

Page 49: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Idea

30/41

QSPARQL / OWL

qresult

database

qTSV/n-triples/

/

σπ

Page 50: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Idea

30/41

QSPARQL / OWL

qresult

database

qTSV/n-triples/

/

&Bash script

Q1. Datalog

σπ2. Algebra

÷3. Optimize

^4. Translate

/

Page 51: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Idea

30/41

QSPARQL / OWL

qresult

database

qTSV/n-triples/

/

&Bash script

Q1. Datalog

σπ2. Algebra

÷3. Optimize

^4. Translate

/

Page 52: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Approach

31/41

QSPARQL / OWL

&Bash script

Q1. Datalog

σπ2. Algebra

÷3. Optimize

^4. Translate

Query "Who are Einstein’s academic ancestors?" in SPARQL:

SELECT ?Y WHERE {<Einstein> <hasAdvisor>+ ?Y

}

Translating the query to Datalog (simplified):

facts(X, Y, Z) :~ read_ntriples input

adv(X, Y) :- facts(X, "hasAdvisor", Y).

result(Y) :- adv("Einstein", Y).result(Y) :- result(X), adv(X, Y).

Page 53: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Approach

32/41

QSPARQL / OWL

&Bash script

Q1. Datalog

σπ2. Algebra

÷3. Optimize

^4. Translate

facts(X, Y, Z) :~ read_ntriples input

adv(X, Y) :- facts(X, "hasAdvisor", Y).

result(Y) :- adv("Einstein", Y).result(Y) :- result(X), adv(X, Y).

µx

π2

σ1=Einstein

π1,3

σ2=hasAdvisor

input

π3

⋊⋉1=1

x π1,3

σ2=hasAdvisor

input

( ) ( )

( )

( , )

( , )( , )

( , ., )( , ., )

( )

( , , )

( ) ( , )( , )

Page 54: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Approach

32/41

QSPARQL / OWL

&Bash script

Q1. Datalog

σπ2. Algebra

÷3. Optimize

^4. Translate

facts(X, Y, Z) :~ read_ntriples input

adv(X, Y) :- facts(X, "hasAdvisor", Y).

result(Y) :- adv("Einstein", Y).result(Y) :- result(X), adv(X, Y).

µx

π2

σ1=Einstein

π1,3

σ2=hasAdvisor

input

π3

⋊⋉1=1

x π1,3

σ2=hasAdvisor

input

( ) ( )

( )

( , )

( , )( , )

( , ., )( , ., )

( )

( , , )

( )

projection

( , )( , )

selection

Page 55: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Approach

32/41

QSPARQL / OWL

&Bash script

Q1. Datalog

σπ2. Algebra

÷3. Optimize

^4. Translate

facts(X, Y, Z) :~ read_ntriples input

adv(X, Y) :- facts(X, "hasAdvisor", Y).

result(Y) :- adv("Einstein", Y).result(Y) :- result(X), adv(X, Y).

µx

π2

σ1=Einstein

π1,3

σ2=hasAdvisor

input

π3

⋊⋉1=1

x π1,3

σ2=hasAdvisor

input

( ) ( )

( )

( , )

( , )( , )

( , ., )( , ., )

( )

( , , )

( )

projection

( , )( , )

selection

Page 56: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Approach

32/41

QSPARQL / OWL

&Bash script

Q1. Datalog

σπ2. Algebra

÷3. Optimize

^4. Translate

facts(X, Y, Z) :~ read_ntriples input

adv(X, Y) :- facts(X, "hasAdvisor", Y).

result(Y) :- adv("Einstein", Y).result(Y) :- result(X), adv(X, Y).

µx

π2

σ1=Einstein

π1,3

σ2=hasAdvisor

input

π3

⋊⋉1=1

x π1,3

σ2=hasAdvisor

input

( ) ( )

( )

( , )

( , )( , )

( , ., )( , ., )

( )

join

( , , )

( ) ( , )( , )

Page 57: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Approach

32/41

QSPARQL / OWL

&Bash script

Q1. Datalog

σπ2. Algebra

÷3. Optimize

^4. Translate

facts(X, Y, Z) :~ read_ntriples input

adv(X, Y) :- facts(X, "hasAdvisor", Y).

result(Y) :- adv("Einstein", Y).result(Y) :- result(X), adv(X, Y).

µx

π2

σ1=Einstein

π1,3

σ2=hasAdvisor

input

π3

⋊⋉1=1

x π1,3

σ2=hasAdvisor

input

least fixed point

( ) ( )

( )

( , )

( , )( , )

( , ., )( , ., )

( )

( , , )

recursive call

( ) ( , )( , )

Page 58: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Approach

32/41

QSPARQL / OWL

&Bash script

Q1. Datalog

σπ2. Algebra

÷3. Optimize

^4. Translate

facts(X, Y, Z) :~ read_ntriples input

adv(X, Y) :- facts(X, "hasAdvisor", Y).

result(Y) :- adv("Einstein", Y).result(Y) :- result(X), adv(X, Y).

µx

π2

σ1=Einstein

π1,3

σ2=hasAdvisor

input

π3

⋊⋉1=1

x π1,3

σ2=hasAdvisor

input

( ) ( )

( )

( , )

( , )( , )

( , s., )( , s., )

( )

( , , )

( ) ( , )( , )

Page 59: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Approach

32/41

QSPARQL / OWL

&Bash script

Q1. Datalog

σπ2. Algebra

÷3. Optimize

^4. Translate

facts(X, Y, Z) :~ read_ntriples input

adv(X, Y) :- facts(X, "hasAdvisor", Y).

result(Y) :- adv("Einstein", Y).result(Y) :- result(X), adv(X, Y).

µx

π2

σ1=Einstein

π1,3

σ2=hasAdvisor

input

π3

⋊⋉1=1

x π1,3

σ2=hasAdvisor

input

( )

( )

( )

( , )

( , )( , )

( , s., )( , s., )

( )

( , , )

( ) ( , )( , )

Page 60: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Approach

32/41

QSPARQL / OWL

&Bash script

Q1. Datalog

σπ2. Algebra

÷3. Optimize

^4. Translate

facts(X, Y, Z) :~ read_ntriples input

adv(X, Y) :- facts(X, "hasAdvisor", Y).

result(Y) :- adv("Einstein", Y).result(Y) :- result(X), adv(X, Y).

µx

π2

σ1=Einstein

π1,3

σ2=hasAdvisor

input

π3

⋊⋉1=1

x π1,3

σ2=hasAdvisor

input

( )

( )

( )

( , )

( , )( , )

( , s., )( , s., )

( )

( , , )

( )

( , )( , )

Page 61: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Approach

32/41

QSPARQL / OWL

&Bash script

Q1. Datalog

σπ2. Algebra

÷3. Optimize

^4. Translate

facts(X, Y, Z) :~ read_ntriples input

adv(X, Y) :- facts(X, "hasAdvisor", Y).

result(Y) :- adv("Einstein", Y).result(Y) :- result(X), adv(X, Y).

µx

π2

σ1=Einstein

π1,3

σ2=hasAdvisor

input

π3

⋊⋉1=1

x π1,3

σ2=hasAdvisor

input

( )

( )

( )

( , )

( , )( , )

( , s., )( , s., )

( )

( , , )

( ) ( , )( , )

Page 62: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Approach

32/41

QSPARQL / OWL

&Bash script

Q1. Datalog

σπ2. Algebra

÷3. Optimize

^4. Translate

facts(X, Y, Z) :~ read_ntriples input

adv(X, Y) :- facts(X, "hasAdvisor", Y).

result(Y) :- adv("Einstein", Y).result(Y) :- result(X), adv(X, Y).

µx

π2

σ1=Einstein

π1,3

σ2=hasAdvisor

input

π3

⋊⋉1=1

x π1,3

σ2=hasAdvisor

input

( )

( )

( )

( , )

( , )( , )

( , s., )( , s., )

( )

( , , )

( ) ( , )( , )

Page 63: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Approach

32/41

QSPARQL / OWL

&Bash script

Q1. Datalog

σπ2. Algebra

÷3. Optimize

^4. Translate

facts(X, Y, Z) :~ read_ntriples input

adv(X, Y) :- facts(X, "hasAdvisor", Y).

result(Y) :- adv("Einstein", Y).result(Y) :- result(X), adv(X, Y).

µx

π2

σ1=Einstein

π1,3

σ2=hasAdvisor

input

π3

⋊⋉1=1

x π1,3

σ2=hasAdvisor

input

( )

( )

( )

( , )

( , )( , )

( , s., )( , s., )

( )

( , , )

( ) ( , )( , )

Page 64: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Approach

32/41

QSPARQL / OWL

&Bash script

Q1. Datalog

σπ2. Algebra

÷3. Optimize

^4. Translate

facts(X, Y, Z) :~ read_ntriples input

adv(X, Y) :- facts(X, "hasAdvisor", Y).

result(Y) :- adv("Einstein", Y).result(Y) :- result(X), adv(X, Y).

µx

π2

σ1=Einstein

π1,3

σ2=hasAdvisor

input

π3

⋊⋉1=1

x π1,3

σ2=hasAdvisor

input

( ) ( )

( )

( , )

( , )( , )

( , s., )( , s., )

( )

( , , )

( ) ( , )( , )

Page 65: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Approach

32/41

QSPARQL / OWL

&Bash script

Q1. Datalog

σπ2. Algebra

÷3. Optimize

^4. Translate

facts(X, Y, Z) :~ read_ntriples input

adv(X, Y) :- facts(X, "hasAdvisor", Y).

result(Y) :- adv("Einstein", Y).result(Y) :- result(X), adv(X, Y).

µx

π2

σ1=Einstein

π1,3

σ2=hasAdvisor

input

π3

⋊⋉1=1

x π1,3

σ2=hasAdvisor

input

( ) ( )

( )

( , )

( , )( , )

( , s., )( , s., )

( )

( , , )

( ) ( , )( , )

Page 66: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Approach

33/41

QSPARQL / OWL

&Bash script

Q1. Datalog

σπ2. Algebra

÷3. Optimize

^4. Translate

Optimizations:

µx

π2

σ1=Einstein

π1,3

σ2=hasAdvisor

input

π3

⋊⋉1=1

x π1,3

σ2=hasAdvisor

input

µx

π3

σ1=Einstein

σ2=hasAdvisor

input

π3

⋊⋉1=1

x π1,3

σ2=hasAdvisor

input

Page 67: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Approach

33/41

QSPARQL / OWL

&Bash script

Q1. Datalog

σπ2. Algebra

÷3. Optimize

^4. Translate

Optimizations:

µx

π2

σ1=Einstein

π1,3

σ2=hasAdvisor

input

π3

⋊⋉1=1

x π1,3

σ2=hasAdvisor

input

µx

π3

σ1=Einstein

σ2=hasAdvisor

input

π3

⋊⋉1=1

δx π1,3

σ2=hasAdvisor

input

Page 68: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Approach

33/41

QSPARQL / OWL

&Bash script

Q1. Datalog

σπ2. Algebra

÷3. Optimize

^4. Translate

Optimizations:

µx

π2

σ1=Einstein

π1,3

σ2=hasAdvisor

input

π3

⋊⋉1=1

x π1,3

σ2=hasAdvisor

input

µx

sort

π3

σ1=Einstein

σ2=hasAdvisor

input

π3

⋊⋉1=1

sort1

δx

sort1

π1,3

σ2=hasAdvisor

input

Page 69: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Approach

33/41

QSPARQL / OWL

&Bash script

Q1. Datalog

σπ2. Algebra

÷3. Optimize

^4. Translate

Optimizations:

µx

π2

σ1=Einstein

π1,3

σ2=hasAdvisor

input

π3

⋊⋉1=1

x π1,3

σ2=hasAdvisor

input

µx

π3

σ1=Einstein

σ2=hasAdvisor

input

sort

π3

⋊⋉1=1

sort1

δx

sort1

π1,3

σ2=hasAdvisor

input

(1st)

Page 70: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Approach

33/41

QSPARQL / OWL

&Bash script

Q1. Datalog

σπ2. Algebra

÷3. Optimize

^4. Translate

Optimizations:

µx

π2

σ1=Einstein

π1,3

σ2=hasAdvisor

input

π3

⋊⋉1=1

x π1,3

σ2=hasAdvisor

input

£ a

sort1

π1,3

σ2=hasAdvisor

input

µx

π3

σ1=Einstein

σ2=hasAdvisor

input

sort

π3

⋊⋉1=1

sort1

δx

a

(1st)

Page 71: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Approach

34/41

QSPARQL / OWL

&Bash script

Q1. Datalog

σπ2. Algebra

÷3. Optimize

^4. Translate

£ a

sort1

π1,3

σ2=adv.

input

µx

π3

σ1=Einstein

σ2=adv.

input

sort

π3

⋊⋉1=1

sort1

δx

a

(1st)

awk ’( $1 == "Einstein"

&& $2 == "hasAdvisor"){ print $3 >> "b" }

($2 == "hasAdvisor"){ print $1 FS $3 >> "pre_a" }

’ <(read_ntriples input)

# lock a(

sort -k 1 pre_a > a# unlock a

) &

Page 72: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Approach

34/41

QSPARQL / OWL

&Bash script

Q1. Datalog

σπ2. Algebra

÷3. Optimize

^4. Translate

£ a

sort1

π1,3

σ2=adv.

input

µx

π3

σ1=Einstein

σ2=adv.

input

sort

π3

⋊⋉1=1

sort1

δx

a

(1st)

awk ’( $1 == "Einstein"

&& $2 == "hasAdvisor"){ print $3 >> "b" }

($2 == "hasAdvisor"){ print $1 FS $3 >> "pre_a" }

’ <(read_ntriples input)

# lock a(

sort -k 1 pre_a > a# unlock a

) &

Page 73: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Approach

34/41

QSPARQL / OWL

&Bash script

Q1. Datalog

σπ2. Algebra

÷3. Optimize

^4. Translate

£ a

sort1

π1,3

σ2=adv.

input

µx

π3

σ1=Einstein

σ2=adv.

input

sort

π3

⋊⋉1=1

sort1

δx

a

(1st)

awk ’( $1 == "Einstein"

&& $2 == "hasAdvisor"){ print $3 >> "b" }

($2 == "hasAdvisor"){ print $1 FS $3 >> "pre_a" }

’ <(read_ntriples input)

# lock a(

sort -k 1 pre_a > a# unlock a

) &

Page 74: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Approach

34/41

QSPARQL / OWL

&Bash script

Q1. Datalog

σπ2. Algebra

÷3. Optimize

^4. Translate

£ a

sort1

π1,3

σ2=adv.

input

µx

π3

σ1=Einstein

σ2=adv.

input

sort

π3

⋊⋉1=1

sort1

δx

a

(1st)

awk ’( $1 == "Einstein"

&& $2 == "hasAdvisor"){ print $3 >> "b" }

($2 == "hasAdvisor"){ print $1 FS $3 >> "pre_a" }

’ <(read_ntriples input)

# lock a(

sort -k 1 pre_a > a# unlock a

) &

Page 75: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Approach

34/41

QSPARQL / OWL

&Bash script

Q1. Datalog

σπ2. Algebra

÷3. Optimize

^4. Translate

£ a

sort1

π1,3

σ2=adv.

input

µx

π3

σ1=Einstein

σ2=adv.

input

sort

π3

⋊⋉1=1

sort1

δx

a

(1st)

awk ’( $1 == "Einstein"

&& $2 == "hasAdvisor"){ print $3 >> "b" }

($2 == "hasAdvisor"){ print $1 FS $3 >> "pre_a" }

’ <(read_ntriples input)

# lock a(

sort -k 1 pre_a > a# unlock a

) &

Page 76: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Approach

34/41

QSPARQL / OWL

&Bash script

Q1. Datalog

σπ2. Algebra

÷3. Optimize

^4. Translate

£ a

sort1

π1,3

σ2=adv.

input

µx

π3

σ1=Einstein

σ2=adv.

input

sort

π3

⋊⋉1=1

sort1

δx

a

(1st)

while# ...sort -k 1 -u

<( # wait for ajoin -1 1 -2 1 -o 2.2

<(sort -k 1 -u delta)a

)

# ...[ -s delta ];do continue; done

Page 77: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Approach

34/41

QSPARQL / OWL

&Bash script

Q1. Datalog

σπ2. Algebra

÷3. Optimize

^4. Translate

£ a

sort1

π1,3

σ2=adv.

input

µx

π3

σ1=Einstein

σ2=adv.

input

sort

π3

⋊⋉1=1

sort1

δx

a

(1st)

while# ...sort -k 1 -u

<( # wait for ajoin -1 1 -2 1 -o 2.2

<(sort -k 1 -u delta)a

)

# ...[ -s delta ];do continue; done

Page 78: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Optimization

35/41

How can I find all professors?

Professor(X) :- Person(X),teachesCourse(X,Y).

Professor(X) :- advisorOf(X,Y),Professor(Y).

Person(X) :- Employee(X).Person(X) :- Professor(X).

Professor(X) :- Professor(X),teachesCourse(X,Y).

Page 79: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Optimization

35/41

How can I find all professors?

Professor(X) :- Person(X),teachesCourse(X,Y).

Professor(X) :- advisorOf(X,Y),Professor(Y).

Person(X) :- Employee(X).Person(X) :- Professor(X).

Professor(X) :- Professor(X),teachesCourse(X,Y).

Page 80: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Optimization

35/41

How can I find all professors?

Professor(X) :- Person(X),teachesCourse(X,Y).

Professor(X) :- advisorOf(X,Y),Professor(Y).

Person(X) :- Employee(X).Person(X) :- Professor(X).

Combining the first and the last rule leads to

Professor(X) :- Professor(X),teachesCourse(X,Y).

Page 81: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Optimization

35/41

How can I find all professors?

Professor(X) :- Person(X),teachesCourse(X,Y).

Professor(X) :- advisorOf(X,Y),Professor(Y).

Person(X) :- Employee(X).Person(X) :- Professor(X).

Combining the first and the last rule leads to

Professor(X) :- Professor(X),teachesCourse(X,Y).

Page 82: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Optimization

36/41

µx

π1

⋊⋉1=1

employee x

teachesCourse

π1

⋊⋉2=1

advisorOf x

( ) (?)

( ) (?)

( )

( , , ?)

( )

( )

(?)

(?, , )

( )

Page 83: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Optimization

36/41

µx

π1

⋊⋉1=1

employee x

teachesCourse

π1

⋊⋉2=1

advisorOf x

( )

(?)

( )

(?)

( )

( , , ?)

( )

( )

(?)

(?, , )

( )

Page 84: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Optimization

36/41

µx

π1

⋊⋉1=1

employee x

teachesCourse

π1

⋊⋉2=1

advisorOf x

( )

(?)

( )

(?)

( )

( , , ?)

( )

( )

⇒ superfluous

(?)

(?, , )

( )

Page 85: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Optimization

36/41

µx

π1

⋊⋉1=1

employee x

teachesCourse

π1

⋊⋉2=1

advisorOf x

( ) (?)

( ) (?)

( )

( , , ?)

( )

( )

⇒ superfluous

(?)

(?, , )

( )

Page 86: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Optimization

36/41

µx

π1

⋊⋉1=1

employee x

teachesCourse

π1

⋊⋉2=1

advisorOf x

( ) (?)

( ) (?)

( )

( , , ?)

( )

( )

⇒ superfluous

(?)

(?, , )

( )

⇒ necessary

Page 87: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Optimization

36/41

µx

π1

⋊⋉1=1

employee x

teachesCourse

π1

⋊⋉2=1

advisorOf x

(c1) (?)

(c1) (?)

(c1)

(c1, c1, ?)

(c1)

(c1)

⇒ superfluous

(?)

(?, c1, c1)

(c1)

⇒ necessary

Page 88: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Answering Queries with Unix Shell: Experiments

37/41

◮ Dataset: LUBM university benchmark

◮ 14 different queries

◮ Competitors: Datalog-based systems (DLV, Souffle, RDFox),Triple stores (Jena, Stardog, Virtuoso),Database management systems (MonetDB, Postgres)

Number of finished queries within time limit

BashDLV

Souffle

RDFoxJena

Stardog

Virtuoso

MonetDB*

Postgres*

0

5

10

14

#o

ffi

nis

hed

qu

erie

s

LUBM 10

* = we folded the TBox into the query

Page 89: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Answering Queries with Unix Shell: Experiments

38/41

101 102 103

100

101

102

103

# of universities

Ru

nti

me

(s)

Bash DLV RDFox

Stardog Souffle Jena

Virtuoso MonetDB Postgres

Page 90: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Experiments

39/41

dataset Bash RDFox BigDatalog Stardog Virtuoso

LiveJournal 117 70 532 941 -orkut 225 121 1838 1123 -friendster 16306 - - - -

Table: Runtime for the reachability query, in seconds.

Page 91: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Motivation

Idea

Approach

Optimization

Experiments

Summary

Conclusion

Answering Queries with Unix Shell: Summary

40/41

Summary:

◮ Preprocess large datasets without installing software

◮ Supports subset of SPARQL / OWL and Datalog as query language

◮ Try it online athttps://www.thomasrebele.org/projects/bashlog

◮ Source code available athttps://github.com/thomasrebele/bashlog

Future work:

◮ Numerical comparisons

◮ Aggregations (e.g., max, count)

Publication: ISWC 2018 (full paper)

Thomas Rebele Thomas P. Tanon Fabian Suchanek

Page 92: Expanding the YAGO knowledge base - Thomas Rebele · YAGO knowledge base Rebele The YAGO knowledge base What is a knowledge base? What is YAGO? Involvement Outline Using YAGO for

Expanding theYAGO knowledge

base

Rebele

The YAGOknowledge base

Outline

Using YAGO forthe humanities

Adding Words toRegexes

AnsweringQueries with UnixShell

Conclusion

Conclusion

41/41

This thesis showed how to extend YAGO along several axes:

◮ Improve completeness w.r.t. people

◮ Automatically repairing of its regular expressions

◮ Preprocessing queries using only a Bash shell

Other accomplishments:

◮ Source code of all contributions is available online

◮ Publications at ISWC 2016 (resource paper),ISWC 2017 (demo, workshop), PAKDD 2018 (full paper),ISWC 2018 (full paper)

Future work:

◮ More studies on human society using facts from YAGO (ongoing)

◮ Combine YAGO and Wikidata

◮ Queries with numerical comparisons and aggregations