Combining Information Extraction, Deductive Reasoning and Machine Learning for Relation Prediction

Xueyan Jiang (2), Yi Huang (1, 2), Maximilian Nickel (2), Volker Tresp (1, 2)

(1) Siemens AG, Corporate Technology, Munich, Germany

(2) Ludwig Maximilian University of Munich, Munich, Germany

May 31, 2012

1 / 22

Introduction

Relation prediction in RDF graphs

RDF graph: a knowledge base in the form of a triple store

Task: predict the truth value of an instance of a relation, i.e., of a statement expressed as an RDF triple

3 / 22

Introduction

Knowledge base: existing triples plus new triples derived from deductive reasoning

4 / 22

Introduction

Unstructured contextual information: Wikipedia pages, web pages, text in literals

5 / 22

Motivation

Common approaches for relation prediction

IE (Information Extraction)
Data source: unstructured data, such as texts or images
Limitation: unstructured information may not be available

DR (Deductive Reasoning)
Data source: a set of axioms
Limitation: can only derive a subset of the true statements; difficult to deal with uncertainty

ML (Machine Learning)
Data source: a set of true statements
Limitation: the data must contain relevant statistical structure
Advantage: can express statistical dependencies between relations and can handle incomplete data

Proposal

Combine IE, DR and ML in a principled way to make use of all knowledge sources for relation prediction

7 / 22

Outline

Matrix Representation for an RDF Graph

Proposed Framework for Combining IE, DR and ML

Prediction of relations from unstructured information (IE step)
Derivation of relations from the knowledge base (DR step)
Combination of IE step and DR step
Derivation of confidence values for predicted relations using a probabilistic latent factor model (ML step)

8 / 22

Matrix Representation for an RDF Graph

We construct a matrix X from the RDF graph
Each subject is represented as a row
Each column represents a (p, o) pair, i.e., a predicate-object pair
A matrix element X(s,p,o) is equal to one if the corresponding triple is known to exist and is equal to zero otherwise
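A minimal sketch of this matrix construction in Python with NumPy, assuming the triples are given as (subject, predicate, object) string tuples. The entity and predicate names in the example (writerA, bornIn, ...) are hypothetical, and the alphabetical ordering of rows and columns is an arbitrary choice of this sketch.

```python
import numpy as np


def build_matrix(triples):
    """Build the subject x (predicate, object) indicator matrix X.

    Returns (X, subjects, columns): X[i, k] == 1 iff the triple
    (subjects[i], columns[k][0], columns[k][1]) is in `triples`, else 0.
    """
    triples = set(triples)
    subjects = sorted({s for s, _, _ in triples})        # one row per subject
    columns = sorted({(p, o) for _, p, o in triples})    # one column per (p, o) pair
    row = {s: i for i, s in enumerate(subjects)}
    col = {po: k for k, po in enumerate(columns)}
    X = np.zeros((len(subjects), len(columns)), dtype=np.int8)
    for s, p, o in triples:
        X[row[s], col[(p, o)]] = 1                       # known triple -> 1
    return X, subjects, columns


# Hypothetical toy graph
triples = [
    ("writerA", "bornIn", "Munich"),
    ("writerA", "nationality", "Germany"),
    ("writerB", "bornIn", "Paris"),
]
X, subjects, columns = build_matrix(triples)
print(subjects)   # ['writerA', 'writerB']
print(columns)    # [('bornIn', 'Munich'), ('bornIn', 'Paris'), ('nationality', 'Germany')]
print(X)
```

Note that a zero entry only means the triple is not known to exist; it is not evidence that the triple is false.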

10 / 22

Proposed Framework for Combining IE, DR and ML

Prediction of relations from unstructured information (IE step)

In principle, any IE system can be used
In our approach, we build a classifier to predict $P(X = 1 \mid IE) \Leftrightarrow P(X = 1 \mid \mathrm{text}_{subject}, \mathrm{text}_{object})$
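The slides leave the classifier open. As one possible instance, here is a minimal sketch that assumes a binary bag-of-words representation of the subject and object texts (kept in separate feature slots) and a logistic regression trained by plain gradient descent. The feature design, the learning-rate and regularization settings, the helper names and the toy gene/disease texts are all hypothetical, not the authors' setup.

```python
import numpy as np


def tokens(text_subject, text_object):
    # Prefix words so subject and object vocabularies stay separate.
    return ({"s:" + w for w in text_subject.lower().split()}
            | {"o:" + w for w in text_object.lower().split()})


def featurize(text_subject, text_object, vocab):
    x = np.zeros(len(vocab))
    for t in tokens(text_subject, text_object):
        if t in vocab:                 # words unseen in training are ignored
            x[vocab[t]] = 1.0
    return x


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=500, l2=1e-3):
    """Batch gradient descent on the L2-regularised logistic loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        err = sigmoid(X @ w + b) - y   # gradient of the loss w.r.t. the logits
        w -= lr * (X.T @ err / len(y) + l2 * w)
        b -= lr * err.mean()
    return w, b


# Hypothetical labelled (gene text, disease text, X) examples
train = [
    ("insulin receptor gene", "diabetes", 1),
    ("insulin secretion gene", "diabetes", 1),
    ("keratin structural gene", "diabetes", 0),
    ("collagen structural gene", "diabetes", 0),
]
all_tokens = set().union(*(tokens(s, o) for s, o, _ in train))
vocab = {t: i for i, t in enumerate(sorted(all_tokens))}
X_train = np.array([featurize(s, o, vocab) for s, o, _ in train])
y_train = np.array([label for _, _, label in train], dtype=float)
w, b = train_logreg(X_train, y_train)


def p_ie(text_subject, text_object):
    """Estimate of P(X = 1 | text_subject, text_object)."""
    return sigmoid(featurize(text_subject, text_object, vocab) @ w + b)


print(p_ie("insulin transport gene", "diabetes"))  # above 0.5
print(p_ie("keratin gene", "diabetes"))            # below 0.5
```

Since any IE system can be used at this point, a stronger text classifier could replace this toy model without changing the rest of the pipeline.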

11 / 22

Proposed Framework for Combining IE, DR and ML

Derivation of relations from the knowledge base (DR step)

Knowledge base: the known triples plus the triples added via deductive reasoning (computation of the deductive closure)
Any reasoner can be used
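To make the computation of the deductive closure concrete, a minimal sketch of naive forward chaining in plain Python. The two rules (transitivity of a hypothetical locatedIn predicate, and deriving a hypothetical bornInCountry triple from the city of birth) and all entity names are illustrative assumptions; in practice a standard reasoner would take this role.

```python
def deductive_closure(triples, rules):
    """Naive forward chaining: apply all rules until a fixpoint is reached."""
    kb = set(triples)
    while True:
        new = set()
        for rule in rules:
            new |= rule(kb) - kb       # keep only genuinely new triples
        if not new:
            return kb                  # nothing left to derive
        kb |= new


def transitive_located_in(kb):
    # (a locatedIn b) and (b locatedIn c)  =>  (a locatedIn c)
    located = [(s, o) for s, p, o in kb if p == "locatedIn"]
    return {(a, "locatedIn", c) for a, b in located for b2, c in located if b == b2}


def country_of_birth(kb):
    # (x bornIn city), (city locatedIn c), (c type Country)  =>  (x bornInCountry c)
    countries = {s for s, p, o in kb if p == "type" and o == "Country"}
    located = {(s, o) for s, p, o in kb if p == "locatedIn"}
    return {(x, "bornInCountry", c)
            for x, p, city in kb if p == "bornIn"
            for city2, c in located if city2 == city and c in countries}


facts = {
    ("writerA", "bornIn", "Munich"),
    ("Munich", "locatedIn", "Bavaria"),
    ("Bavaria", "locatedIn", "Germany"),
    ("Germany", "type", "Country"),
}
kb = deductive_closure(facts, [transitive_located_in, country_of_birth])
print(sorted(kb - facts))
# [('Munich', 'locatedIn', 'Germany'), ('writerA', 'bornInCountry', 'Germany')]
```

The second derived triple needs two rounds: the transitivity rule must first produce (Munich, locatedIn, Germany) before the country-of-birth rule can fire.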

13 / 22

Proposed Framework for Combining IE, DR and ML

Combination of IE step and DR step:

$$P(X = 1 \mid IE, DR) = \max\big(P(X = 1 \mid IE),\ P(X = 1 \mid DR)\big)$$
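A minimal NumPy sketch of this element-wise rule, assuming P(X = 1 | IE) is available as a matrix of classifier outputs and that P(X = 1 | DR) is 1 for triples in the knowledge base (known or derived) and 0 otherwise. This 0/1 reading of the DR term is an assumption of the sketch, and the numbers are made up.

```python
import numpy as np


def combine_ie_dr(p_ie, x_dr):
    """P(X=1 | IE, DR) = max(P(X=1 | IE), P(X=1 | DR)), entry by entry.

    p_ie: matrix of IE probabilities.
    x_dr: 0/1 matrix, 1 where the triple is in the knowledge base
          (assumed here to be the DR probability).
    """
    return np.maximum(p_ie, x_dr.astype(float))


p_ie = np.array([[0.9, 0.2],
                 [0.1, 0.4]])
x_dr = np.array([[0, 1],
                 [0, 0]])
print(combine_ie_dr(p_ie, x_dr))
```

Under this reading, a triple that is known or derived always receives probability 1, and every other entry keeps its IE estimate.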

16 / 22

Proposed Framework for Combining IE, DR and ML

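Derivation of confidence values for predicted relations using a probabilistic latent factor model (ML step)

Model description
We define a new parameterization with a continuous $f_{i,k}$ such that $\mathrm{sig}(f_{i,k}) = P(X_{i,k} = 1 \mid IE, DR)$, where $\mathrm{sig}(x) = 1/(1 + e^{-x})$ is the logistic sigmoid
For each subject entity $e_i$ we introduce a $d$-dimensional latent variable $h_i \sim \mathcal{N}(0, I)$
For each subject entity $e_i$, $\alpha_i$ is generated via $\alpha_i = A h_i$, where $A$ has $d$ columns
Then we assume $f_{i,k} = \alpha_{i,k} + \varepsilon_{i,k}$, with independent Gaussian noise $\varepsilon_{i,k} \sim \mathcal{N}(0, \sigma^2)$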
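To illustrate the generative model, a NumPy sketch that samples rows $f_i$ from it. The noise distribution $\varepsilon_{i,k} \sim \mathcal{N}(0, \sigma^2)$, the sizes and the random matrix A are assumptions chosen for illustration. Integrating out $h_i$ gives $f_i \sim \mathcal{N}(0, A A^\top + \sigma^2 I)$, which the sketch checks empirically.

```python
import numpy as np


def sample_f(A, sigma2, n, rng):
    """Draw n vectors f_i (as rows) from the latent factor model.

    h_i ~ N(0, I_d),  alpha_i = A h_i,  f_i = alpha_i + eps_i,
    where eps_i ~ N(0, sigma2 * I) is an assumption of this sketch.
    """
    D, d = A.shape                               # A has d columns
    H = rng.standard_normal((n, d))              # rows are h_i
    alpha = H @ A.T                              # rows are alpha_i = A h_i
    eps = rng.normal(0.0, np.sqrt(sigma2), size=(n, D))
    return alpha + eps


rng = np.random.default_rng(0)
A = 0.5 * rng.standard_normal((5, 2))            # hypothetical: 5 columns, d = 2
sigma2 = 0.1
F = sample_f(A, sigma2, n=200_000, rng=rng)

# Marginally f_i ~ N(0, A A^T + sigma2 I); compare with the empirical second moment.
emp_cov = F.T @ F / F.shape[0]
model_cov = A @ A.T + sigma2 * np.eye(5)
print(np.abs(emp_cov - model_cov).max())         # small; shrinks as n grows
```

The low-rank-plus-isotropic-noise covariance is the probabilistic PCA form, which is why the maximum likelihood solution can be expressed through the principal eigenvectors of $C$.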

17 / 22

Proposed Framework for Combining IE, DR and ML

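The maximum likelihood solution can be written as

$$\hat{\alpha}_i = U_d \,\mathrm{diag}_d\!\left(\frac{\lambda_j - \hat{\sigma}^2}{\lambda_j}\right) U_d^\top f_i$$

where the columns of $U_d$ are the $d$ principal eigenvectors of the covariance matrix $C = F^\top F$ (with $F$ the matrix whose $i$-th row is $f_i^\top$), with eigenvalues $\lambda_1, \ldots, \lambda_d$

Then $P(X_{i,k} = 1 \mid IE, DR, ML) = \mathrm{sig}(\hat{\alpha}_{i,k})$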
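A NumPy sketch of the ML step under two stated assumptions: entries of P(X = 1 | IE, DR) that are exactly 0 or 1 (e.g. from DR) are clipped before the logit so that $f_{i,k}$ stays finite, and $\hat{\sigma}^2$ is estimated as the mean of the discarded eigenvalues of $C$, as in probabilistic PCA. Neither choice is specified on the slides; the clipping threshold and the toy matrix are hypothetical.

```python
import numpy as np


def sig(x):
    return 1.0 / (1.0 + np.exp(-x))


def logit(p, eps=1e-3):
    # Assumption: clip so that probabilities of exactly 0 or 1 give finite f.
    p = np.clip(p, eps, 1.0 - eps)
    return np.log(p / (1.0 - p))


def ml_step(P, d):
    """Map P(X=1 | IE, DR) to P(X=1 | IE, DR, ML) = sig(alpha_hat).

    P: (subjects x columns) matrix; d: latent dimension, 0 < d < columns.
    """
    F = logit(P)                          # rows f_i with sig(f_ik) = P_ik
    C = F.T @ F                           # C = F^T F
    lam, U = np.linalg.eigh(C)            # eigenvalues in ascending order
    lam, U = lam[::-1], U[:, ::-1]        # sort descending
    sigma2 = lam[d:].mean()               # assumption: PPCA-style noise estimate
    shrink = (lam[:d] - sigma2) / lam[:d] # diag_d((lambda_j - sigma^2) / lambda_j)
    U_d = U[:, :d]
    # Row i equals (U_d diag(shrink) U_d^T f_i)^T, i.e. alpha_hat_i.
    alpha_hat = F @ U_d @ np.diag(shrink) @ U_d.T
    return sig(alpha_hat)


# Hypothetical toy input: subject 2 resembles subjects 0 and 1,
# but its column-1 entry is uncertain (0.5).
P = np.array([[0.99, 0.99, 0.01],
              [0.99, 0.99, 0.01],
              [0.99, 0.50, 0.01],
              [0.01, 0.01, 0.99]])
Q = ml_step(P, d=1)
print(Q.round(2))
```

In this toy input, the uncertain entry of subject 2 is pulled well above 0.5 because subject 2 follows the same pattern as subjects 0 and 1. Since the shrinkage factors are ratios of eigenvalues of the same matrix, scaling $C$ by the number of rows would not change the result.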

18 / 22

Experiments

Predicting gene-disease relationships using LOD’s Linked Life Data and BIO2RDF (2462 genes, 331 diseases)

Target: for a given gene, predict likely diseases
IE: text fields from literals

19 / 22

Experiments

YAGO2 experiment: prediction of writers’ nationalities
ML: 354 writers, 4 countries, city of birth
ML + AGG: additionally include as columns the country of birth, derived from the city of birth using geo reasoning (DR)
IE: unstructured data from the writers’ Wikipedia pages
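A minimal sketch of the ML + AGG feature construction in plain Python, assuming a city-to-country lookup table stands in for the geo reasoning. The function name, the writers, cities and column names are all hypothetical.

```python
def add_country_columns(writers, born_in, country_of_city):
    """Build ML + AGG rows: (bornIn, city) columns plus derived country columns.

    born_in: writer -> city of birth (the plain ML columns).
    country_of_city: city -> country (hypothetical stand-in for geo reasoning).
    Returns (columns, rows) with one 0/1 row per writer.
    """
    cities = sorted(set(born_in.values()))
    countries = sorted({country_of_city[c] for c in cities if c in country_of_city})
    columns = [("bornIn", c) for c in cities] + [("bornInCountry", k) for k in countries]
    index = {col: j for j, col in enumerate(columns)}
    rows = []
    for w in writers:
        row = [0] * len(columns)
        city = born_in[w]
        row[index[("bornIn", city)]] = 1
        if city in country_of_city:                      # derived (aggregated) column
            row[index[("bornInCountry", country_of_city[city])]] = 1
        rows.append(row)
    return columns, rows


writers = ["w1", "w2", "w3"]
born_in = {"w1": "Munich", "w2": "Hamburg", "w3": "Lyon"}
country_of_city = {"Munich": "Germany", "Hamburg": "Germany", "Lyon": "France"}
columns, rows = add_country_columns(writers, born_in, country_of_city)
for w, r in zip(writers, rows):
    print(w, r)
```

Here w1 and w2 share no city-of-birth column but do share the derived country column, which gives the latent factor model a common pattern to pick up.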

20 / 22

Conclusion

IE: Exploit unstructured information

DR: Exploit axiomatic knowledge

ML: Exploit statistical patterns

We proposed an efficient way to combine ML, IE and DR in a probabilistic model

21 / 22

Thanks!

22 / 22