Part 1: Knowledge Graphs
Part 2: Knowledge Extraction
Part 3: Graph Construction
Part 4: Critical Analysis
Tutorial Outline
1. Knowledge Graph Primer [Jay]
2. Knowledge Extraction from Text
   a. NLP Fundamentals [Sameer]
   b. Information Extraction [Bhavana]
Coffee Break
3. Knowledge Graph Construction
   a. Probabilistic Models [Jay]
   b. Embedding Techniques [Sameer]
4. Critical Overview and Conclusion [Bhavana]
[Figure: Text → NLP → Annotated text → Information Extraction → Extraction graph.
Text: "John was born in Liverpool, to Julia and Alfred Lennon."
Annotated text: entity types John (Person), Liverpool (Location), Julia (Person), Alfred Lennon (Person); POS tags John/NNP was/VBD born/VBN in/IN Liverpool/NNP to/TO Julia/NNP and/CC Alfred/NNP Lennon/NNP; other mentions of the same entities elsewhere in the text, such as "Lennon", "John Lennon", "Mrs. Lennon", "his mother", "his father", "Alfred", "he" and "the Pool".
Extraction graph: (John Lennon, birthplace, Liverpool), (John Lennon, childOf, Julia Lennon), (John Lennon, childOf, Alfred Lennon).]
Information Extraction
3 IMPORTANT SUB-PROBLEMS
CATEGORIES OF IE TECHNIQUES
KNOWLEDGE FUSION
IE SYSTEMS IN PRACTICE
Information Extraction
3 CONCRETE SUB-PROBLEMS
Defining domain
Learning extractors
Scoring the facts
3 LEVELS OF SUPERVISION
Supervised
Semi-supervised
Unsupervised
Defining Domain: Manual
[Figure: a manually defined ontology. Everything has the subsets Animals and Food; Animals has the subsets Mammals and Reptiles; Food has the subsets Fruits and Vegetables. Edges mark Subset and Disjoint relations between categories.]
[Toward an Architecture for Never-Ending Language Learning, Carlson et al. AAAI 2010]
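Subset and Disjoint edges are useful because they can be checked mechanically. A minimal Python sketch, assuming a hypothetical dictionary encoding of the figure (the names `SUBSET`, `DISJOINT`, `ancestors` and `violations` are illustrative, and which sibling pairs are Disjoint is an assumption): labels are inherited upward along Subset edges, and a label set that lands on both sides of a Disjoint edge is flagged.

```python
# Hypothetical encoding of the figure's ontology: child category -> parent (Subset edges).
SUBSET = {
    "Animals": "Everything", "Food": "Everything",
    "Mammals": "Animals", "Reptiles": "Animals",
    "Fruits": "Food", "Vegetables": "Food",
}
# Assumed Disjoint edges between sibling categories (the figure's exact edges may differ).
DISJOINT = {
    frozenset({"Animals", "Food"}),
    frozenset({"Mammals", "Reptiles"}),
    frozenset({"Fruits", "Vegetables"}),
}


def ancestors(category: str) -> set[str]:
    """A category plus every category it is (transitively) a subset of."""
    result = {category}
    while category in SUBSET:
        category = SUBSET[category]
        result.add(category)
    return result


def violations(labels: set[str]) -> set[frozenset]:
    """Disjoint pairs broken by an entity that carries all of these labels."""
    implied = set().union(*(ancestors(c) for c in labels))
    return {pair for pair in DISJOINT if pair <= implied}


print(sorted(ancestors("Mammals")))  # ['Animals', 'Everything', 'Mammals']
print(violations({"Mammals"}))       # set()
```

Labeling an entity as both Mammals and Fruits would be rejected because the inherited labels Animals and Food are disjoint; this is the kind of constraint that makes manual ontologies lead to high-precision extractions.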
Defining Domain: Manual
[Figure: the same ontology (Animals with Mammals and Reptiles; Food with Fruits and Vegetables) extended with the typed relation Animal-eats-Food.]
[Toward an Architecture for Never-Ending Language Learning, Carlson et al. AAAI 2010]
• Highly semantic ontology
• Leads to high precision extractions
• Expensive to create
• Requires domain experts
Defining Domain: Semi-automatic
• A subset of types is manually defined
• Semi-supervised learning (SSL) methods discover new types from unlabeled data
[Figure: a manually defined seed ontology (Everything → Animals {Mammals, Reptiles}, Food {Fruits, Vegetables}) is extended with discovered types: Beverages under Food, and a new Location branch with Country and City.]
[Exploratory Learning, Dalvi et al., ECML 2013] [Hierarchical Semi-supervised Classification with Incomplete Class Hierarchies, Dalvi et al., WSDM 2016]
Defining Domain: Semi-automatic
• Assume: types and the type hierarchy are manually defined, e.g. River, City, Food, Chemical, Disease, Bacteria
• Relations are automatically discovered using clustering methods

| Discovered relation | Patterns | Seed instances |
|---|---|---|
| River-in heart of-City | "in heart of", "in the center of", "which flows through" | (Seine, Paris), (Nile, Cairo), (Tiber river, Rome), (River Arno, Florence) |
| Food-to produce-Chemical | "to produce", "to make", "to form" | (Salt, Chlorine), (Sugar, Carbon dioxide), (Protein, Serotonin) |
| Disease-caused by-Bacteria | "caused by", "is the causative agent of", "is the cause of" | (pneumonia, legionella), (mastitis, staphylococcus aureus), (gonorrhea, Neisseria gonorrhoeae) |
[Discovering Relations between Noun Categories, Mohamed et al., EMNLP 2011]
• Easier to derive types using existing resources
• Relations are discovered from the corpus
• Leads to moderate precision extractions
• Partially semantic ontology
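A rough Python sketch of the clustering idea, under stated assumptions: patterns that connect many of the same argument pairs are taken to express the same relation, so each resulting cluster is one candidate relation. The Jaccard similarity, the single-link merging, the 0.25 threshold and the statistics in `river_city` (including the `near` pattern) are all illustrative; this is not the exact procedure of Mohamed et al.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_patterns(pattern_pairs: dict[str, set], threshold: float = 0.25) -> list[tuple]:
    """Single-link clustering of patterns by overlap of the argument pairs they connect.

    pattern_pairs maps a pattern to the (arg1, arg2) pairs it was seen with,
    for one pair of types (here River, City). Each cluster is one candidate relation.
    """
    cluster_of = {p: {p} for p in pattern_pairs}
    for a, b in combinations(sorted(pattern_pairs), 2):
        if cluster_of[a] is not cluster_of[b] and jaccard(pattern_pairs[a], pattern_pairs[b]) >= threshold:
            merged = cluster_of[a] | cluster_of[b]
            for p in merged:
                cluster_of[p] = merged
    return sorted({tuple(sorted(c)) for c in cluster_of.values()})


# Hypothetical corpus statistics for River-?-City patterns.
river_city = {
    "in heart of": {("Seine", "Paris"), ("Nile", "Cairo")},
    "in the center of": {("Seine", "Paris"), ("Tiber river", "Rome")},
    "which flows through": {("Nile", "Cairo"), ("Tiber river", "Rome"), ("River Arno", "Florence")},
    "near": {("Thames", "Oxford")},
}
print(cluster_patterns(river_city))
# [('in heart of', 'in the center of', 'which flows through'), ('near',)]
```

Single-link merging lets "which flows through" join through its overlap with the other two patterns even though no single pair of them shares more than one instance.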
Defining Domain: Automatic
• Any noun phrase is a candidate entity
• Any verb phrase is a candidate relation
[Open Information Extraction from the Web, Banko et al., IJCAI 2007]
• Cheapest way to induce types/relations from a corpus
• Little expert annotation needed
• Limited semantics
• Leads to noisy extractions
Learning Extractors: Manual
• Human-defined, high-precision extraction patterns for each relation
Person-member of-Band
Patterns: <PERSON> works for <BAND>; <PERSON> is part of <BAND>
Extracted relation instances: (John Lennon, The Beatles), (Brian Jones, The Rolling Stones)
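To make the idea concrete, here is a small Python sketch that turns the two hand-written patterns into regular expressions. It assumes, purely for illustration, that a run of capitalized words can stand in for a <PERSON> or <BAND> mention (`NAME`, `PATTERNS` and `extract_member_of` are hypothetical names); a real system would match entity-typed spans from an NER tagger instead.

```python
import re

# Hypothetical regex stand-in for an entity slot: one or more capitalized words.
NAME = r"((?:[A-Z][a-z]+ )*[A-Z][a-z]+)"

# The two hand-written Person-member of-Band patterns from the slide.
PATTERNS = [
    re.compile(NAME + r" works for " + NAME),
    re.compile(NAME + r" is part of " + NAME),
]


def extract_member_of(text: str) -> set[tuple[str, str]]:
    """Apply every hand-written pattern and collect (person, band) pairs."""
    facts = set()
    for pattern in PATTERNS:
        facts.update(pattern.findall(text))
    return facts


text = "John Lennon is part of The Beatles. Brian Jones works for The Rolling Stones."
print(sorted(extract_member_of(text)))
# [('Brian Jones', 'The Rolling Stones'), ('John Lennon', 'The Beatles')]
```

Every extraction comes from a pattern a human wrote, which is where the high precision comes from; anything phrased differently is simply missed.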
Learning Extractors: Semi-supervised
[Figure: Bootstrapping. Starting from seed instances, the set of relation instances (I) and the set of extraction patterns (P) grow in a loop: extract the patterns that occur around relation instances in I, then apply the patterns in P to extract more relation instances.]
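The bootstrapping loop can be sketched in a few lines of Python. This is a deliberately naive version, assuming that a pattern is just the literal text between the two arguments and that matching sentences have the form "<arg1> <pattern> <arg2>."; the function name and toy sentences are hypothetical, and it promotes every match rather than only the top-k.

```python
import re


def bootstrap(sentences, seeds, iterations=2):
    """Toy pattern/instance bootstrapping for one binary relation.

    A "pattern" is simply the words between the two arguments of a known
    instance; a sentence matches a pattern if it has the form
    "<arg1> <pattern> <arg2>." (an assumption that keeps the sketch short).
    """
    instances = set(seeds)  # I: relation instances, starting from the seeds
    patterns = set()        # P: extraction patterns
    for _ in range(iterations):
        # Extract patterns that occur around relation instances in I.
        for sentence in sentences:
            for arg1, arg2 in instances:
                m = re.search(re.escape(arg1) + r" (.+?) " + re.escape(arg2), sentence)
                if m:
                    patterns.add(m.group(1))
        # Apply patterns in P to extract more relation instances.
        for sentence in sentences:
            for pattern in patterns:
                m = re.fullmatch(r"(.+?) " + re.escape(pattern) + r" (.+?)\.?", sentence)
                if m:
                    instances.add((m.group(1), m.group(2)))
    return instances, patterns


sentences = [
    "John Lennon is part of The Beatles.",
    "Brian Jones works for The Rolling Stones.",
    "Ringo Starr is part of The Beatles.",
    "Nick Mason works for Pink Floyd.",
]
seeds = {("John Lennon", "The Beatles"), ("Brian Jones", "The Rolling Stones")}
instances, patterns = bootstrap(sentences, seeds)
print(sorted(patterns))  # ['is part of', 'works for']
```

Because every learned pattern is applied without any scoring, a single noisy pattern would be free to pull wrong instances into I, and those instances would in turn yield more noisy patterns.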
Learning Extractors: Semi-supervised
[Toward an Architecture for Never-Ending Language Learning, Carlson et al. AAAI 2010]
Person-member of-Band
[Figure: bootstrapping for Person-member of-Band. Seed relation instances (John Lennon, Beatles) and (Brian Jones, The Rolling Stones) → learn patterns: <PERSON> works for <BAND>; <PERSON> is part of <BAND>; <BAND> includes <PERSON>; <BAND> was admired by <PERSON> → apply patterns → candidate facts such as (Ringo Starr, The Beatles) and (Nick Mason, Pink Floyd) → add the top-k instances and repeat.]
Semantic drift! A pattern such as "<BAND> was admired by <PERSON>" also matches fans rather than members; once its wrong extractions are added to the instance set, later iterations learn further patterns that drift away from the intended relation.
Learning Extractors: Interactive
[Open information extraction to KBP relations in 3 hours, Soderland et al., TAC KBP 2013]
Person-member of-Band
[Figure: the bootstrapping loop with a human in it. From seed relation instances (John Lennon, Beatles) and (Brian Jones, The Rolling Stones), patterns are learned and labeled by the user: <PERSON> works for <BAND> (+), <PERSON> is part of <BAND> (+), <BAND> was invited by <PERSON> (-), <BAND>'s manager <PERSON> (-). Only the correct patterns are applied; the resulting candidate facts are labeled too, (Nick Mason, Pink Floyd) (+) and (Allen Klein, The Beatles) (-), and only positive instances are fed back.]
Helps reduce semantic drift!
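A minimal Python sketch of one interactive round, assuming the user's +/- judgements are available as two callbacks (`approve_pattern`, `approve_fact`; both names are hypothetical, as is the toy `MATCHES` table). It shows where the human sits in the loop: rejected patterns are never applied, and rejected facts never re-enter the seed set.

```python
from typing import Callable

Fact = tuple[str, str]


def interactive_round(
    patterns: list[str],
    apply_pattern: Callable[[str], set[Fact]],
    approve_pattern: Callable[[str], bool],
    approve_fact: Callable[[Fact], bool],
) -> tuple[list[str], set[Fact]]:
    """One human-in-the-loop round: only user-approved patterns are applied,
    and only user-approved candidate facts become new positive instances."""
    kept = [p for p in patterns if approve_pattern(p)]
    candidates = {fact for p in kept for fact in apply_pattern(p)}
    return kept, {fact for fact in candidates if approve_fact(fact)}


# Hypothetical pattern matches; the +/- labels mimic the figure.
MATCHES = {
    "<PERSON> works for <BAND>": {("Nick Mason", "Pink Floyd"), ("Allen Klein", "The Beatles")},
    "<PERSON> is part of <BAND>": {("Nick Mason", "Pink Floyd")},
    "<BAND>'s manager <PERSON>": {("Allen Klein", "The Beatles")},
}
good_patterns = {"<PERSON> works for <BAND>", "<PERSON> is part of <BAND>"}
bad_facts = {("Allen Klein", "The Beatles")}

kept, positives = interactive_round(
    list(MATCHES),
    apply_pattern=lambda p: MATCHES[p],
    approve_pattern=lambda p: p in good_patterns,
    approve_fact=lambda f: f not in bad_facts,
)
print(kept)       # ['<PERSON> works for <BAND>', '<PERSON> is part of <BAND>']
print(positives)  # {('Nick Mason', 'Pink Floyd')}
```

Even an approved pattern ("works for") can produce a wrong fact, which is why the sketch asks for judgements at both levels.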
Learning Extractors: Unsupervised
• Identify candidate relations: for each verb, find the longest sequence of words such that the syntactic and lexical constraints are satisfied
• Identify arguments for each relation: for each identified relation phrase r, find the closest noun phrases to the left and right of r that satisfy certain syntactic constraints
[Identifying Relations for Open Information Extraction, Fader et al., EMNLP 2011]
Syntactic constraint: the relation phrase must match a regular expression over POS tags
Lexical constraint: based on |distinct arguments| a relation phrase takes, i.e. a valid relation phrase must take at least a minimum number of distinct arguments in a large corpus
Learning Extractors: Unsupervised
[Identifying Relations for Open Information Extraction, Fader et al., EMNLP 2011]
Hudson was born in Hampstead, which is a suburb of London.
e1: (Hudson, was born in, Hampstead)
e2: (Hampstead, is a suburb of, London)
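The two steps can be sketched in Python on the example sentence. This is a simplified ReVerb-style extractor under explicit assumptions: the Penn Treebank POS tags are supplied by hand, noun phrases are approximated by runs of noun tags, and the relation regex `V | V P | V W* P` (adjacent matches merged) is a rendering of the syntactic constraint as described by Fader et al., not their code. "which" is skipped as a left argument only because it is not tagged as a noun; `lexical_filter` and its threshold are illustrative.

```python
import re
from collections import defaultdict


def tag_class(tag: str) -> str:
    """Coarse classes for the relation-phrase regex (V, W, P; O = other)."""
    if tag.startswith("VB"):
        return "V"
    if tag in {"IN", "RP", "TO"}:
        return "P"  # preposition, particle, infinitive marker
    if tag.startswith(("NN", "JJ", "RB", "PRP")) or tag == "DT":
        return "W"  # noun, adjective, adverb, pronoun, determiner
    return "O"


# Syntactic constraint: V | V P | V W* P, with adjacent matches merged.
RELATION = re.compile(r"(?:V(?:W*P)?)+")


def noun_chunks(tags: list[str]) -> list[tuple[int, int]]:
    """Spans of maximal runs of noun tags (a crude stand-in for NP chunking)."""
    spans, start = [], None
    for i, tag in enumerate(tags + ["END"]):
        if tag.startswith("NN"):
            if start is None:
                start = i
        elif start is not None:
            spans.append((start, i))
            start = None
    return spans


def extract(tagged: list[tuple[str, str]]) -> list[tuple[str, str, str]]:
    words = [w for w, _ in tagged]
    tags = [t for _, t in tagged]
    classes = "".join(tag_class(t) for t in tags)
    chunks = noun_chunks(tags)
    triples = []
    for m in RELATION.finditer(classes):
        start, end = m.span()
        left = [c for c in chunks if c[1] <= start]  # candidates left of the relation
        right = [c for c in chunks if c[0] >= end]   # candidates right of the relation
        if left and right:
            (l0, l1), (r0, r1) = left[-1], right[0]  # the closest ones
            triples.append((" ".join(words[l0:l1]), " ".join(words[start:end]), " ".join(words[r0:r1])))
    return triples


def lexical_filter(triples, min_distinct_args: int):
    """Lexical constraint: keep relation phrases seen with enough distinct argument pairs."""
    args = defaultdict(set)
    for a1, rel, a2 in triples:
        args[rel].add((a1, a2))
    return [t for t in triples if len(args[t[1]]) >= min_distinct_args]


sentence = [("Hudson", "NNP"), ("was", "VBD"), ("born", "VBN"), ("in", "IN"),
            ("Hampstead", "NNP"), (",", ","), ("which", "WDT"), ("is", "VBZ"),
            ("a", "DT"), ("suburb", "NN"), ("of", "IN"), ("London", "NNP"), (".", ".")]
print(extract(sentence))
# [('Hudson', 'was born in', 'Hampstead'), ('Hampstead', 'is a suburb of', 'London')]
```

On a single sentence every relation phrase has one argument pair, so the lexical constraint only becomes meaningful when counts are aggregated over a large corpus.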
Scoring the candidate facts
• Human-defined scoring function, or a scoring function learned with supervised ML from a large amount of training data {expensive, high precision}
• Only a small amount of training data is available: scoring is refined over multiple iterations using both labeled and unlabeled data
• Completely automatic (self-training):
  Confidence(extraction pattern) ∝ (# unique instances it could extract)
  Score(candidate fact) ∝ (# distinct extraction patterns that support it)
  {cheap, leads to semantic drift}
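The self-training formulas translate directly into counting. A minimal Python sketch, assuming extractions arrive as (pattern, fact) pairs (the function name and data are hypothetical); it uses raw counts, which satisfies the ∝ in the formulas up to a constant factor.

```python
from collections import defaultdict


def self_training_scores(extractions):
    """extractions: (pattern, fact) pairs, one per match found in the corpus.

    Confidence(pattern) = number of unique facts the pattern extracted.
    Score(fact)         = number of distinct patterns that extracted the fact.
    Raw counts are used; any positive rescaling keeps the same proportionality.
    """
    facts_by_pattern = defaultdict(set)
    patterns_by_fact = defaultdict(set)
    for pattern, fact in extractions:
        facts_by_pattern[pattern].add(fact)
        patterns_by_fact[fact].add(pattern)
    confidence = {p: len(fs) for p, fs in facts_by_pattern.items()}
    score = {f: len(ps) for f, ps in patterns_by_fact.items()}
    return confidence, score


# Hypothetical matches for Person-member of-Band.
matches = [
    ("<PERSON> is part of <BAND>", ("Ringo Starr", "The Beatles")),
    ("<PERSON> works for <BAND>", ("Ringo Starr", "The Beatles")),
    ("<PERSON> works for <BAND>", ("Nick Mason", "Pink Floyd")),
    ("<PERSON> works for <BAND>", ("Nick Mason", "Pink Floyd")),  # repeated match
]
confidence, score = self_training_scores(matches)
print(confidence)  # {'<PERSON> is part of <BAND>': 1, '<PERSON> works for <BAND>': 2}
print(score)       # {('Ringo Starr', 'The Beatles'): 2, ('Nick Mason', 'Pink Floyd'): 1}
```

Nothing in these counts checks whether a fact is true: a generic pattern that matches many unrelated pairs earns high confidence, which is why this option is cheap but prone to semantic drift.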
Impact of early supervision
Stages: Defining domain → Extractors for each relation of interest → Scoring the candidate facts
• Puts constraints on the space of possibly true extractions
• Early removal of noisy extraction patterns can avoid semantic drift in later stages
• Enables inheritance and mutual exclusion at the extractor level
• Domain expertise needed
Effect of supervision on extractions
[Figure: trade-off across levels of supervision between precision and human effort on one side, and recall and speed on the other.]
Categories of IE Techniques
1. Narrow domain patterns
2. Ontology based extraction
3. Interactive extraction
4. Open domain IE
5. Hybrid approach (Adding structure to OpenIE KB)
(1) Narrow domain patterns
[Figure: a dependency-path pattern for headOf, as in "Arg1, DT CEO of Arg2". Arg1 (Person) -appos-> "CEO"; "CEO" -det-> DT; "CEO" -nmod-> Arg2 (Organization); Arg2 -case-> "of". A match implies (Arg1, headOf, Arg2), i.e. Person headOf Organization.]
Defining domain
Learning extractors
Scoring candidate facts
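A narrow-domain pattern like this is a small subgraph match over the dependency parse. The Python sketch below assumes a hand-written parse of a made-up sentence ("Smith, the CEO of Acme"; the names, parse tuples and `TYPES` lookup are hypothetical, and a real system would get them from a parser and an NER tagger). It checks the appos, nmod and case edges plus the Person/Organization argument types before emitting headOf.

```python
# Hypothetical dependency parse of "Smith, the CEO of Acme":
# (index, word, head index, dependency label); head -1 marks the root.
PARSE = [
    (0, "Smith", -1, "root"),
    (1, ",", 0, "punct"),
    (2, "the", 3, "det"),
    (3, "CEO", 0, "appos"),
    (4, "of", 5, "case"),
    (5, "Acme", 3, "nmod"),
]
# Hypothetical entity types from an NER step.
TYPES = {"Smith": "Person", "Acme": "Organization"}


def head_of_facts(parse, types, title="CEO"):
    """Match Arg1 -appos-> title -nmod-> Arg2 -case-> "of"; emit (Arg1, headOf, Arg2)."""
    facts = []
    for idx, word, head, label in parse:
        if word != title or label != "appos" or head < 0:
            continue
        arg1 = parse[head][1]  # token indices equal list positions
        for j, arg2, h2, l2 in parse:
            has_of = any(h == j and l == "case" and w.lower() == "of" for _, w, h, l in parse)
            if h2 == idx and l2 == "nmod" and has_of:
                if types.get(arg1) == "Person" and types.get(arg2) == "Organization":
                    facts.append((arg1, "headOf", arg2))
    return facts


print(head_of_facts(PARSE, TYPES))  # [('Smith', 'headOf', 'Acme')]
```

The type check is what keeps the pattern narrow: the same syntactic shape with a Location in the Arg2 slot produces nothing.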
(1) Narrow domain patterns
Defining domain
Learning extractors
Scoring candidate facts
(2) Ontology based extraction
[Figure: the manually defined ontology (Everything → Animals {Mammals, Reptiles}, Food {Fruits, Vegetables}) with Subset and Disjoint edges and the relation Animal-eats-Food.]
Defining domain
(2) Ontology based extraction
[Figure: Bootstrapping (instances (I) → extract patterns → patterns (P) → apply patterns) runs under ontological constraints given by the Subset and Disjoint edges of the ontology (Everything → Animals {Mammals, Reptiles}, Food {Fruits, Vegetables}).]
[Toward an Architecture for Never-Ending Language Learning, Carlson et al. AAAI 2010]
(2) Ontology based extraction
[Figure: Coupled Bootstrap learning. One bootstrapping loop (instances (I) → extract patterns → patterns (P) → apply patterns) runs per category, for Animal, Mammal and Reptile; the loops are coupled by the constraints Mammal Subset Animal, Reptile Subset Animal, and Mammal Disjoint Reptile.]
[Toward an Architecture for Never-Ending Language Learning, Carlson et al. AAAI 2010]
(2) Ontology based extraction
[Figure: Coupled Bootstrap learning for a relation. Bootstrapping loops for Animal, Food and the relation Animal eats Food are coupled through argument types: a candidate (Arg1, eats, Arg2) requires Arg1 ISA Animal and Arg2 ISA Food.]
[Toward an Architecture for Never-Ending Language Learning, Carlson et al. AAAI 2010]
Learning extractors
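The coupling can be sketched as constraints applied on top of independent per-category and per-relation extractors. This Python version is a hard-constraint simplification under stated assumptions, not the exact promotion rule of the coupled learner in Carlson et al.; the ontology fragment (`PARENT`, `MUTEX`, `RELATION_TYPES`) and the proposals are hypothetical. Mutual exclusion rejects candidates claimed by two disjoint categories, Subset edges propagate accepted instances upward, and argument type checking filters relation instances.

```python
# Hypothetical fragment of the ontology used for coupling.
PARENT = {"Mammal": "Animal", "Reptile": "Animal"}  # Subset edges
MUTEX = {frozenset({"Mammal", "Reptile"})}          # Disjoint edges
RELATION_TYPES = {"eats": ("Animal", "Food")}       # argument types


def promote_categories(proposals):
    """proposals: category -> candidates proposed by that category's own extractor.

    A candidate is rejected if a mutually exclusive category also proposes it;
    accepted candidates are inherited by every super-category.
    """
    accepted = {}
    for cat, candidates in proposals.items():
        for inst in candidates:
            clash = any(
                frozenset({cat, other}) in MUTEX and inst in others
                for other, others in proposals.items()
                if other != cat
            )
            if clash:
                continue
            c = cat
            while c is not None:
                accepted.setdefault(c, set()).add(inst)
                c = PARENT.get(c)
    return accepted


def promote_relation(relation, pairs, accepted):
    """Argument type checking: keep (x, y) only if x and y carry the required types."""
    t1, t2 = RELATION_TYPES[relation]
    return {(x, y) for x, y in pairs
            if x in accepted.get(t1, set()) and y in accepted.get(t2, set())}


proposals = {
    "Mammal": {"lion", "crocodile"},
    "Reptile": {"crocodile", "gecko"},
    "Animal": {"goat"},
    "Food": {"meat", "grass"},
}
accepted = promote_categories(proposals)
print(sorted(accepted["Animal"]))  # ['gecko', 'goat', 'lion']
eats = {("lion", "meat"), ("goat", "grass"), ("crocodile", "meat"), ("grass", "goat")}
print(sorted(promote_relation("eats", eats, accepted)))  # [('goat', 'grass'), ('lion', 'meat')]
```

"crocodile" is rejected because the Mammal and Reptile extractors disagree, and that rejection also blocks (crocodile, meat) at the relation level: one extractor's error is caught by another.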
(2) Ontology based extraction
• Self-training for scoring candidate facts
  • Confidence(extraction pattern) ∝ (# unique instances it could extract)
  • Score(candidate fact) ∝ (# distinct extraction patterns that support it)
[Toward an Architecture for Never-Ending Language Learning, Carlson et al. AAAI 2010]
Scoring candidate facts
(2) Ontology based extraction
Defining domain
Learning extractors
Scoring candidate facts
(3) Interactive Extraction
Person-member of-Band
[Figure: bootstrapping with user feedback. From seed relation instances (John Lennon, Beatles) and (Brian Jones, The Rolling Stones), patterns are learned and labeled by the user: <PERSON> works for <BAND> (+), <PERSON> is part of <BAND> (+), <BAND> was invited by <PERSON> (-), <BAND>'s manager <PERSON> (-). Only the correct patterns are applied; the resulting candidate instances are labeled too, (Nick Mason, Pink Floyd) (+) and (Allen Klein, The Beatles) (-), and only positive instances are fed back.]
Defining domain
Learning extractors
Scoring candidate facts
[IKE - An Interactive Tool for Knowledge Extraction, Dalvi et al., AKBC 2015]
(3) Interactive Extraction
Defining domain
Learning extractors
Scoring candidate facts
Can we do Web-scale IE?
1. Narrow domain patterns
2. Ontology based extraction
3. Interactive extraction
4. Open domain IE
5. Hybrid approach (Adding structure to OpenIE KB)
Approaches 1-3: assume expert input; biased towards high precision; high cost
(4) Open domain IE
Open domain: any NP is a candidate entity; any VP is a candidate relation
Hudson was born in Hampstead, which is a suburb of London.
Scoring based on a classifier (features: POS tags, dependency parse, ...)
(Hudson, was born in, Hampstead): 0.88
(Hampstead, is a suburb of, London): 0.9
[Identifying Relations for Open Information Extraction, Fader et al., EMNLP 2011]
Defining domain
Learning extractors
Scoring candidate facts
(4) Open domain IE
Defining domain
Learning extractors
Scoring candidate facts
Pros and Cons of Open domain IE
• The open domain IE paradigm can easily be applied
  • to a large-scale corpus
  • to a new domain (no training data needed)
• Main disadvantages
  • Poor aggregation: it does not detect different surface forms of the same entity or relation
  • Lack of semantics: OpenIE merely tells us how many times a lexical fact occurred in a corpus
(5) Hybrid approach (adding structure to Open IE KB)
[Figure: Open IE KB → cluster noun phrases and cluster verb phrases → canonicalized KB.]
[Canonicalizing Open Knowledge Bases, Galárraga et al., CIKM 2014]
(5) Hybrid approach
• Clustering entities
• Clustering relations
[Canonicalizing Open Knowledge Bases, Galárraga et al., CIKM 2014]
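A minimal Python sketch of the entity-clustering step, assuming a very simple similarity: token-set Jaccard after dropping articles, with single-link merging via union-find at a 0.5 threshold. Canonicalization work in this line (the CIKM 2014 paper compares several similarity functions) uses richer signals, so `cluster_noun_phrases`, the stopword list, the threshold and the example phrases are illustrative assumptions.

```python
from itertools import combinations

STOPWORDS = {"the", "a", "an"}  # assumed, minimal list


def tokens(phrase: str) -> set[str]:
    return {w for w in phrase.lower().split() if w not in STOPWORDS}


def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_noun_phrases(phrases: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Single-link clustering: merge two phrases whose token overlap reaches the threshold."""
    parent = {p: p for p in phrases}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(phrases, 2):
        if jaccard(tokens(a), tokens(b)) >= threshold:
            parent[find(a)] = find(b)
    clusters: dict[str, list[str]] = {}
    for p in phrases:
        clusters.setdefault(find(p), []).append(p)
    return sorted(sorted(c) for c in clusters.values())


phrases = ["The Beatles", "Beatles", "The Rolling Stones", "Rolling Stones", "Stones", "Pink Floyd"]
print(cluster_noun_phrases(phrases))
# [['Beatles', 'The Beatles'], ['Pink Floyd'], ['Rolling Stones', 'Stones', 'The Rolling Stones']]
```

Each cluster would become one canonical entity in the KB. Token overlap alone is crude (it would also merge "Alfred Lennon" with "Lennon"), which is why attribute-based signals matter in practice.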
(5) Hybrid approach
[Discovering Semantic Relations from the Web and Organizing them with PATTY, SIGMOD 2013]
[Figure: relations from an OpenIE KB are typed using an existing type hierarchy (e.g. YAGO, Freebase), and the typed relations are clustered into Relation-1 cluster ... Relation-n cluster.]
(5) Hybrid approach
Defining domain
Learning extractors
Scoring candidate facts
Open domain IE
Distant supervision to add structure
Categories of IE Techniques
1. Narrow domain patterns
2. Ontology based extraction
3. Interactive extraction
4. Open domain IE
5. Hybrid approach (Adding structure to OpenIE KB)
Approaches 1-3: assume expert input; biased towards high precision; high cost
Approaches 4-5: no expert annotations; biased towards high recall; low cost
Knowledge fusion
[Figure: each sub-problem (defining domain, learning extractors, scoring candidate facts) can be handled manually, semi-automatically or automatically, either with a single extractor or by fusing multiple extractors.]
Multiple extractors
• Extractor 1: text patterns that extract ISA relations, e.g. the coupled pattern learner
• Extractor 2: wrappers learned for HTML pages that extract ISA relations from structured text
Knowledge fusion schemes
• Voting (AND vs OR of extractors)
• Co-training (multiple extraction methods)
• Multi-view learning (multiple data sources)
• Classification
(1) Voting Schemes
• AND of two extractors
  • For a candidate extraction to be promoted to a fact in the KB, both extractors should support the fact
  • score(fact) = Min(score_extractor1(fact), score_extractor2(fact))
• OR of two extractors
  • For a candidate extraction to be promoted to a fact in the KB, at least one of the extractors should support the fact
  • score(fact) = Max(score_extractor1(fact), score_extractor2(fact))
• Hand-coded heuristic rules
  • E.g. (at least one extractor has confidence > 0.9) or (two extractors support the fact with confidence > 0.6) ...
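The three voting schemes are small enough to write out directly. A minimal Python sketch, assuming each extractor reports a confidence in [0, 1] and that 0.0 stands for "did not propose the fact"; the function names are hypothetical, and the thresholds are the example values from the heuristic rule.

```python
def and_score(s1: float, s2: float) -> float:
    """AND: both extractors must support the fact, so the weaker score counts."""
    return min(s1, s2)


def or_score(s1: float, s2: float) -> float:
    """OR: one supporting extractor suffices, so the stronger score counts."""
    return max(s1, s2)


def heuristic_promote(scores: list[float]) -> bool:
    """Example rule: one extractor > 0.9, or at least two extractors > 0.6."""
    return any(s > 0.9 for s in scores) or sum(s > 0.6 for s in scores) >= 2


# A score of 0.0 means the extractor did not propose the fact (an assumption).
for s1, s2 in [(0.95, 0.0), (0.7, 0.65), (0.8, 0.0)]:
    print(and_score(s1, s2), or_score(s1, s2), heuristic_promote([s1, s2]))
# 0.0 0.95 True
# 0.65 0.7 True
# 0.0 0.8 False
```

AND trades recall for precision (a fact seen by only one extractor scores 0.0), OR does the opposite, and the heuristic rule sits in between.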
(2) Co-training
[Figure: Extractor A extracts Instance Set A, which is used to acquire patterns for Extractor B; Extractor B extracts Instance Set B, which is used to acquire patterns for Extractor A.]
[Combining Labeled and Unlabeled Data with Co-Training, Blum and Mitchell, COLT 1998]
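A set-based Python sketch of the loop in the figure, under simplifying assumptions: each extractor "learns" by trusting every feature seen with its training instances, and all of its extractions are passed to the other extractor. Blum and Mitchell's co-training instead trains probabilistic classifiers and adds only the most confident examples; `co_train` and the toy views (`text_view`, `html_view`) are hypothetical.

```python
def co_train(seeds, text_view, html_view, rounds=2):
    """Set-based co-training sketch for one category.

    text_view / html_view map an instance to its features in that view
    (text patterns vs. HTML list ids). Each extractor trusts the features of
    the instances it was trained on and extracts every instance sharing one;
    its output then becomes training data for the other extractor.
    """
    train_a, train_b = set(seeds), set(seeds)
    for _ in range(rounds):
        trusted_a = set().union(*(text_view.get(i, set()) for i in train_a))
        found_a = {i for i, feats in text_view.items() if feats & trusted_a}
        trusted_b = set().union(*(html_view.get(i, set()) for i in train_b))
        found_b = {i for i, feats in html_view.items() if feats & trusted_b}
        train_b |= found_a  # Extractor A's instances train Extractor B
        train_a |= found_b  # Extractor B's instances train Extractor A
    return train_a | train_b


text_view = {"Paris": {"cities such as X"}, "Lyon": {"cities such as X"}, "Nice": {"X has a harbour"}}
html_view = {"Lyon": {"list:french-cities"}, "Nice": {"list:french-cities"}, "Apple": {"list:companies"}}
print(sorted(co_train({"Paris"}, text_view, html_view)))  # ['Lyon', 'Nice', 'Paris']
```

"Nice" is reachable only through the HTML view, and only after the text view has found "Lyon"; neither extractor alone would find it from the single seed.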
(3) Multi-view learning
• Task: entity typing
• Each entity can be represented using two independent data views
[Multi-View Hierarchical Semi-supervised Learning by Optimal Assignment of Sets of Labels to Instances, Dalvi et al., in preparation, link]
Entity: Carnegie Mellon University
(3) Multi-view learning
[Figure: an extractor for view A and an extractor for view B jointly produce instance labels by maximizing the score of the label assignment while minimizing disagreement between the views; the parameters of each view are then updated.]
[Multi-View Hierarchical Semi-supervised Learning by Optimal Assignment of Sets of Labels to Instances, Dalvi et al., in preparation, link]
(4) Classification
[Dong, Xin et al. "Knowledge vault: a web-scale approach to probabilistic knowledge fusion." KDD (2014)]
[Figure: extractors over text documents (TXT), HTML tables (TBL) and HTML trees (DOM) produce features per candidate fact per extractor (# sources, avg score, ...); a classifier maps them to P(candidate fact = true).]
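The fusion step can be sketched as an ordinary probabilistic classifier over per-extractor features. This Python sketch assumes a logistic model with made-up weights (`WEIGHTS`, `BIAS`, and the helper names are hypothetical; Knowledge Vault's actual classifier and feature transforms may differ). It only shows the shape of the computation: the number of sources and the average score for each of TXT, TBL and DOM go in, and P(candidate fact = true) comes out.

```python
import math

EXTRACTORS = ("TXT", "TBL", "DOM")


def fact_features(extractions: dict[str, list[float]]) -> list[float]:
    """Per extractor: number of sources and average score for one candidate fact."""
    feats = []
    for name in EXTRACTORS:
        scores = extractions.get(name, [])
        feats.append(float(len(scores)))
        feats.append(sum(scores) / len(scores) if scores else 0.0)
    return feats


def p_true(feats: list[float], weights: list[float], bias: float) -> float:
    """Logistic model: P(candidate fact = true)."""
    z = bias + sum(w * x for w, x in zip(weights, feats))
    return 1.0 / (1.0 + math.exp(-z))


# Made-up weights; in practice they would be learned from labeled facts.
WEIGHTS = [0.3, 2.0, 0.3, 1.5, 0.2, 1.0]
BIAS = -3.0

strong = fact_features({"TXT": [0.9, 0.8], "TBL": [0.95]})
weak = fact_features({"DOM": [0.3]})
print(round(p_true(strong, WEIGHTS, BIAS), 2), round(p_true(weak, WEIGHTS, BIAS), 2))  # 0.74 0.08
```

A fact supported by several sources across two extractors ends up far more probable than one seen once with a low score, without any hand-coded voting rule.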
Information Extraction
3 IMPORTANT SUB-PROBLEMS
CATEGORIES OF IE TECHNIQUES
KNOWLEDGE FUSION
IE SYSTEMS IN PRACTICE
57
IE systems in practice
• ConceptNet
• NELL
• Knowledge Vault
• Open IE
58
ConceptNet
59
ConceptNet is a freely available semantic network, designed to help computers understand the meanings of the words people use.
Its knowledge was derived from the contributions of thousands of people.
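A semantic network of this kind is a graph of word-level concepts joined by labeled, weighted relations. The Python sketch below models that shape in memory. The URI style (`/c/en/...` for concepts, `/r/...` for relations) mirrors ConceptNet 5's naming, but the class, its methods, the edges, and the weights are hypothetical illustrations, not ConceptNet's API or data.

```python
from collections import defaultdict
from typing import DefaultDict, List, NamedTuple, Optional


class Edge(NamedTuple):
    start: str     # concept URI such as "/c/en/guitar"
    rel: str       # relation URI such as "/r/IsA"
    end: str       # concept URI
    weight: float  # strength of the assertion (illustrative values below)


class SemanticNetwork:
    """A tiny in-memory semantic network: concepts joined by labeled, weighted edges."""

    def __init__(self) -> None:
        self._out: DefaultDict[str, List[Edge]] = defaultdict(list)

    def add(self, start: str, rel: str, end: str, weight: float = 1.0) -> None:
        self._out[start].append(Edge(start, rel, end, weight))

    def related(self, concept: str, rel: Optional[str] = None) -> List[str]:
        """Concepts one outgoing edge away, strongest first, optionally filtered by relation."""
        edges = [e for e in self._out.get(concept, []) if rel is None or e.rel == rel]
        return [e.end for e in sorted(edges, key=lambda e: e.weight, reverse=True)]


net = SemanticNetwork()
net.add("/c/en/guitar", "/r/IsA", "/c/en/musical_instrument", 2.0)
net.add("/c/en/guitar", "/r/UsedFor", "/c/en/play_music", 3.0)
net.add("/c/en/guitar", "/r/HasA", "/c/en/string", 1.0)
print(net.related("/c/en/guitar"))  # ['/c/en/play_music', '/c/en/musical_instrument', '/c/en/string']
print(net.related("/c/en/guitar", "/r/IsA"))  # ['/c/en/musical_instrument']
```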
Never-Ending Language Learning (NELL)
60
[Never-Ending Learning, Mitchell et al., AAAI 2015]
Knowledge Vault
61
[Architecture diagram taken from Kevin Murphy’s slides]
Open IE (KnowItAll)
62
[Architecture diagram taken from Oren Etzioni’s slides]
63
[Table: IE systems at a glance. Columns: Defining domain, Learning extractors, Scoring candidate facts, Fusing extractors.]
64
[Table: IE systems at a glance. Rows: ConceptNet, NELL, Knowledge Vault, Open IE. Columns: Defining domain, Learning extractors, Scoring candidate facts, Fusing extractors. Cell entries include “Heuristic rules” and “Classifier”.]
Thank You
SEE YOU AFTER THE COFFEE BREAK!
66