Incremental Knowledge Base - Semantic Scholar… · Can benefit from incremental techniques Two...
Incremental Knowledge Base Construction Using DeepDive
Jaeho Shin, Sen Wu, Feiran Wang, Christopher De Sa, Ce Zhang, Christopher Ré
Presented by: Usama and Andrew
Slides adapted from Ofir Ymir
Takeaways
● Knowledge base construction is iterative
○ Can benefit from incremental techniques
● Two methods for incremental inference
○ Sampling
○ Variational
○ Decided using rule-based optimizer
● DeepDive is a Knowledge Base Construction engine
○ Uses database and machine learning techniques
○ Faster and more accurate than experts
Introduction
❑ What does DeepDive do?
“System to extract value from dark data.”
Extract complex relationships between entities and form facts about these entities.
Wisci(-pedia)
Introduction
❑ What is a KBC system?
A system that carries out knowledge base construction (KBC): the process of populating a knowledge base (KB) with facts (or assertions) extracted from data (e.g., text, audio, video, tables, diagrams, ...).
❑ What is a Knowledge Base?
A technology used to store information.
The Goal
Turn unstructured information into a high-quality structured knowledge base.
Quality
❑ How do we assess quality?
• Precision – how often a claimed tuple is correct
• Recall – how many of the possible tuples to extract were actually extracted
https://commons.wikimedia.org/wiki/User:Walber
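A minimal sketch can make the two measures concrete. It assumes the extracted tuples and the ground-truth tuples are both available as Python sets; the function name `precision_recall` and the sample tuples are hypothetical and are not part of DeepDive.

```python
def precision_recall(extracted, truth):
    """Return (precision, recall) for a collection of extracted tuples."""
    extracted, truth = set(extracted), set(truth)
    correct = extracted & truth
    # Precision: fraction of claimed tuples that are correct.
    precision = len(correct) / len(extracted) if extracted else 0.0
    # Recall: fraction of the tuples that could be extracted that actually were.
    recall = len(correct) / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical has_spouse tuples, for illustration only.
truth = {("Barack Obama", "Michelle Obama"), ("A", "B"), ("C", "D"), ("E", "F")}
extracted = {("Barack Obama", "Michelle Obama"), ("A", "B"), ("X", "Y")}
p, r = precision_recall(extracted, truth)
print(p, r)
```

Here two of the three claimed tuples are correct, and two of the four true tuples were found.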
Datalog review
● parent(bill, mary).
● Bill is the parent of Mary
● ancestor(X,Y) :- parent(X,Y).
● ancestor(X,Y) :- parent(X,Z),ancestor(Z,Y).
https://en.wikipedia.org/wiki/Datalog
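The two ancestor rules can be read as a fixpoint computation. The sketch below evaluates them naively in Python: it applies the rules repeatedly until no new facts appear. The function name `ancestors` and the extra fact parent(mary, john) are assumptions made for illustration; real Datalog engines use more efficient strategies.

```python
def ancestors(parent_facts):
    """Naive bottom-up evaluation of the two ancestor rules."""
    # ancestor(X,Y) :- parent(X,Y).
    ancestor = set(parent_facts)
    while True:
        # ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).
        derived = {
            (x, y)
            for (x, z) in parent_facts
            for (z2, y) in ancestor
            if z == z2
        }
        if derived <= ancestor:  # fixpoint reached: nothing new was derived
            return ancestor
        ancestor |= derived


# parent(mary, john) is a hypothetical extra fact added for the example.
parent = {("bill", "mary"), ("mary", "john")}
result = ancestors(parent)
print(sorted(result))
```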
KBC Example
One may use DeepDive to build an application that extracts spouse relations from sentences on the Web.
Output
A tuple in the has_spouse table representing a fact, for example, that “Barack Obama” is married to “Michelle Obama”.
KBC Terminology
Entity – Object in the real world (the person Barack)
Entity-Level Relation – Relation (e.g. has_spouse) among entities
Mention – Reference in the sentence to an entity (the word “Barack”)
Mention-Level Relation – Relation (e.g. has_spouse) among mentions
Entity Linking – The process of mapping mentions to entities (the word “Barack” refers to the entity Barack)
DeepDive – End to End framework
Building a KBC system:
Input – Collection of unstructured data ranging from text documents to existing but incomplete KBs.
Output – Relational database containing facts extracted from the input.
DeepDive – End to End framework
The developer (user) develops the orange parts
DeepDive – 2 main phases
❑ Grounding
SQL queries produce a data structure called a factor graph, which describes a set of nodes and how they are correlated.
❑ Inference
Statistical inference using standard techniques on the factor graph. The output of the inference phase is the marginal probability of every tuple in the database.
Data flow
1. Data Preprocessing
2. Extraction
3. Factor Graph Generation
4. Statistical Inference and Learning
5. Error Analysis
The diagram groups these steps into Phase 1 and Phase 2.
Data Preprocessing
DeepDive takes input data (articles in text format) and loads them into a relational database:
• By default, DeepDive stores all documents in the database, one sentence per row, with markup produced by standard natural language processing (NLP) pre-processing tools.
• The output is sentence-level information, including the words in each sentence, POS tags, named entity tags, etc.
Extraction
DeepDive processes the data to create entities. It performs entity linking, feature extraction, and distant supervision to create the variables (nodes) on which it will then perform inference.
• The results of extraction are then used to build the factor graph according to rules specified by the user.
Extraction
DeepDive executes 2 types of queries:
• Candidate mapping – SQL queries that produce possible mentions, entities, and relations.
• Feature extractors – Associate features with candidates, e.g., “… and his wife…” in the input file.
Extraction
Candidate mapping:
Simple SQL queries with user-defined functions, with low precision but high recall. If candidate mapping misses a fact, then DeepDive has no chance to extract it.
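The high-recall, low-precision idea can be sketched as follows. In DeepDive the candidate mapping is a SQL query with user-defined functions; the Python below is only a stand-in that assumes person mentions have already been found in a sentence. The function name `spouse_candidates` and the sample mentions are hypothetical.

```python
from itertools import combinations


def spouse_candidates(sentence_id, person_mentions):
    """Pair every two person mentions in one sentence as a has_spouse candidate.

    Deliberately permissive: most pairs are wrong (low precision), but a true
    spouse pair that shares a sentence is never missed (high recall).
    """
    return [(sentence_id, m1, m2) for m1, m2 in combinations(person_mentions, 2)]


mentions = ["Barack Obama", "Michelle Obama", "Joe Biden"]
candidates = spouse_candidates(1, mentions)
for candidate in candidates:
    print(candidate)
```

Later stages (features, weights, inference) are responsible for filtering out the incorrect pairs.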
Extraction
Feature extraction (2 ways):
• User-defined functions – Specified by the user.
• Weights – Which weight should be used for a given mention in a sentence.
Extraction
Weight (example):
=> Which weight should be used for a given phrase in a sentence, e.g. “and his wife”
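A feature extractor of this kind can be sketched as a small user-defined function. The sketch assumes a sentence is already tokenized and each mention is given as a token span; `words_between`, the sample sentence, and the weight value 1.5 are all hypothetical. In DeepDive the weights are learned rather than written by hand.

```python
def words_between(tokens, start1, end1, start2, end2):
    """Feature: the phrase between two mentions given as token spans [start, end)."""
    # Order the two spans so the slice always runs left to right.
    left_end, right_start = (end1, start2) if end1 <= start2 else (end2, start1)
    return " ".join(tokens[left_end:right_start])


tokens = "Barack Obama and his wife Michelle Obama attended".split()
feature = words_between(tokens, 0, 2, 5, 7)

# Hypothetical weight table keyed by feature; unseen features get weight 0.
weights = {"and his wife": 1.5}
weight = weights.get(feature, 0.0)
print(feature, weight)
```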
Extraction
Distant Supervision:
A popular technique for creating evidence in KBC systems. We collect examples from an existing database for the relation we want to extract, then use these examples to automatically generate our training data.
For example, the database contains the fact: “Barack Obama and Michelle Obama are married”.
We take this fact and label each pair of "Barack Obama" and "Michelle Obama" mentions that appear in the same sentence as a positive example for our marriage relation.
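The labeling step can be sketched in a few lines. The sketch assumes candidates are (sentence_id, mention1, mention2) tuples and that the existing database is a list of known spouse pairs; `distant_labels` and the sample data are hypothetical. Candidates without a match are left unlabeled (None) so they can be treated as query variables.

```python
def distant_labels(candidates, known_spouses):
    """Label a candidate True when its pair of names matches a known fact."""
    # frozenset makes the match independent of the order of the two names.
    known = {frozenset(pair) for pair in known_spouses}
    return {
        cand: (True if frozenset(cand[1:]) in known else None)
        for cand in candidates
    }


known_spouses = [("Barack Obama", "Michelle Obama")]
candidates = [
    (1, "Michelle Obama", "Barack Obama"),
    (1, "Barack Obama", "Joe Biden"),
]
labels = distant_labels(candidates, known_spouses)
print(labels)
```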
Factor Graph Generation
A factor graph is a type of probabilistic graphical model. It has two types of nodes:
• Variables – Either evidence variables when their value is known, or query variables when their value should be predicted.
• Factors – Define the relationships between variables in the graph. Each factor can be connected to many variables and comes with a factor function to define the relationship between these variables.
Factor Graph Generation
For example, if a factor node is connected to two variable nodes A and B, then a possible factor function could be imply(A,B) => if (A = 1) then (B = 1).
Each factor function has an associated weight. The weight is the confidence we have in the relationship expressed by the factor function.
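To see how a weighted factor shapes probabilities, the sketch below enumerates all four worlds of a two-variable graph with a single imply(A,B) factor. It assumes the factor function returns 1 when the implication holds and 0 otherwise, and that a world's probability is proportional to exp(weight × factor value); the function names and the weight value are hypothetical.

```python
import math
from itertools import product


def imply(a, b):
    """Factor function imply(A,B): violated only when A = 1 and B = 0."""
    return 0 if (a == 1 and b == 0) else 1


def marginals(weight):
    """Enumerate every world of (A, B) and return (P(A=1), P(B=1))."""
    worlds = list(product([0, 1], repeat=2))
    scores = [math.exp(weight * imply(a, b)) for a, b in worlds]
    z = sum(scores)  # normalization constant over all worlds
    p_a = sum(s for (a, _), s in zip(worlds, scores) if a == 1) / z
    p_b = sum(s for (_, b), s in zip(worlds, scores) if b == 1) / z
    return p_a, p_b


print(marginals(0.0))  # weight 0: the factor has no effect
print(marginals(2.0))  # positive weight: worlds violating A => B become less likely
```

Exhaustive enumeration is only feasible for tiny graphs, which is why sampling is used in practice.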
Factor Graph Generation
The developer (user) writes SQL queries that:
1. Instruct the system about which variables to create
2. Specify how to connect them using factor functions
These queries usually involve tables (evidence) from the extraction step.
Factor Graph Generation
Grounding is the process of writing the graph to disk so that it can be used to perform inference.
DeepDive writes the graph to a set of five files: one for variables, one for factors, one for edges, one for weights, and one for metadata useful to the system. These files use a special format so that the sampler can accept them as input.
Data flow example (steps 1-3)
I. The sentence is processed into words, POS tags, and named entity tags
II. Extracting (1) mentions of person and location, (2) candidate relations of has_spouse, and (3) features of candidate relations (e.g. words between mentions)
III. DeepDive uses rules written by developers (like inference_rule_1) to build a factor graph
Statistical Inference and Learning
The final step performs marginal inference on the factor graph variables to learn the probabilities of different values.
The inference step takes the grounded graph (i.e., the five files written during the grounding step) as input, along with a number of arguments that specify the parameters for the learning procedure.
The values of the factor weights specified in inference rules are calculated. These weights represent, intuitively, the confidence in the rule.
Statistical Inference and Learning
After inference, the results are written into a set of database tables.
The developer (user) can get results via a SQL query and perform error analysis to improve results.
Error Analysis
At the end of learning and inference, we have the marginal probability for each candidate fact.
Error analysis is the process of understanding the most common mistakes (incorrect extractions, overly specific features, candidate mistakes, etc.) and deciding how to correct them.
The error analysis is written by the developer (user) using standard SQL queries.
DeepDive - Advantages
➢ No reference to the underlying machine learning algorithms. This enables debugging the system independently from the algorithms (inference phase).
➢ Developers write code, e.g. feature extraction, in familiar languages (such as Python, SQL, Scala).
➢ Familiar languages allow standard tools to inspect and visualize the data.
➢ The developer constructs an end-to-end system and then refines the quality of the system.
Semantics of DeepDive
● DeepDive is a set of rules with weights
○ In learning, weights are found to maximize probability of evidence
○ In inference, weights are known
● n(γ, I): support of a rule γ in possible world I
● sign = 1 if boolean query q() is in I and -1 otherwise
● w(γ, I): weight of γ in possible world I
● w(γ, I) > 0: world is more likely
● w(γ, I) < 0: world is less likely
● g: supports multiple semantics (linear, ratio, logical)
● Ratio is generally the best option
● Each semantics affects efficiency
Semantics of DeepDive
● Pr[I]: probability distribution over the possible worlds, normalized over all possible worlds J
Trump Semantics Example
“Donald Trump’s hair is real”
“Donald Trump wears a live ferret”
● Solution? Vote:
● Produces equations:
Trump example (cont.)
Suppose |Up| = 1,000,000 and |Down| = 999,900
Linear: P[q()] ≅ 1
Ratio: P[q()] ≅ 0.5
Logical: P[q()] = 0.5
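The three values can be reproduced numerically under some assumptions that are not spelled out on the slide. The sketch assumes each Up vote supports q() with weight +1 and each Down vote with weight -1, so that P[q()] = e^W / (e^-W + e^W) with W = g(|Up|) - g(|Down|), and it assumes g(n) = n for linear, g(n) = log(1 + n) for ratio, and g(n) = 1 if n > 0 else 0 for logical. These definitions are a reading of the DeepDive paper, and the names `G` and `prob_q` are hypothetical.

```python
import math

# Assumed definitions of g for the three semantics.
G = {
    "linear": lambda n: float(n),
    "ratio": lambda n: math.log(1 + n),
    "logical": lambda n: 1.0 if n > 0 else 0.0,
}


def prob_q(up, down, semantics):
    """P[q()] = e^W / (e^-W + e^W), with W = g(up) - g(down)."""
    g = G[semantics]
    w = g(up) - g(down)
    # Equivalent to the formula above, written to avoid overflow for large |W|.
    if w >= 0:
        return 1.0 / (1.0 + math.exp(-2.0 * w))
    e = math.exp(2.0 * w)
    return e / (1.0 + e)


for name in ("linear", "ratio", "logical"):
    print(name, prob_q(1_000_000, 999_900, name))
```

Under linear semantics the absolute difference of 100 votes dominates, while ratio and logical semantics see two nearly balanced camps.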
Incremental KBC
To help the KBC system developer be more efficient, incremental techniques are applied to the grounding and inference steps of KBC execution. The approach to incrementally maintaining a KBC system runs in 2 phases:
• Incremental grounding – The goal is to evaluate an update to the DeepDive program to produce the “delta” of the modified factor graph, i.e. the modified variables ΔV and factors ΔF.
• Incremental inference – Given (ΔV,ΔF), the goal is to run statistical inference on the changed factor graph.
Standard technique for delta rules
DeepDive is based on SQL, so we are able to take advantage of decades of work on incremental view maintenance. The input to this phase is the same as in the grounding phase: a set of SQL queries and the user schema.
The output of this phase is how the grounding changes, i.e. a set of modified variables ΔV and their factors ΔF.
Since V and F are simply views over the database, any common view maintenance technique can be applied to the incremental grounding.
Standard technique for delta rules
DeepDive uses an algorithm named DRed, which handles both additions and deletions.
In DRed, for each relation (table) R_i in the user’s schema, we create a delta relation R_i^δ with the same schema as R_i plus a count column. For each tuple (row) t, t.count represents the number of derivations of t in R_i.
Standard technique for delta rules
On an update, DeepDive updates delta relations in 2 steps:
1. For tuples in the delta relation, DeepDive directly updates the corresponding counts.
2. A SQL query called a “delta rule”, written by the developer, is executed; it processes these counts to generate the modified variables ΔV and factors ΔF.
* The overhead of DRed is modest and the gain is substantial.
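The count-based bookkeeping can be illustrated with a small sketch. It is not DeepDive's SQL delta rules: it only assumes that each derived tuple carries a derivation count and that an update arrives as a map from tuple to count change. A tuple is reported as added or removed only when its count crosses zero. The name `apply_delta` and the sample tuples are hypothetical.

```python
from collections import Counter


def apply_delta(counts, delta):
    """Update derivation counts in place and return (added, removed) tuples."""
    added, removed = set(), set()
    for tup, change in delta.items():
        before = counts[tup]  # Counter returns 0 for unseen tuples
        after = before + change
        if after < 0:
            raise ValueError(f"more deletions than derivations for {tup}")
        if after == 0:
            counts.pop(tup, None)
        else:
            counts[tup] = after
        if before == 0 and after > 0:
            added.add(tup)  # first derivation: the tuple appears
        elif before > 0 and after == 0:
            removed.add(tup)  # last derivation gone: the tuple disappears
    return added, removed


counts = Counter({("s1", "Barack", "Michelle"): 2, ("s2", "A", "B"): 1})
delta = {
    ("s1", "Barack", "Michelle"): -1,  # one of two derivations is deleted
    ("s2", "A", "B"): -1,              # the only derivation is deleted
    ("s3", "C", "D"): +1,              # a new derivation appears
}
added, removed = apply_delta(counts, delta)
print(added, removed)
```

Only the tuples in `added` and `removed` would need to be reflected in the modified variables and factors; the tuple that still has one derivation left is untouched.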
Novel technique for incremental maintenance inference
Given a set of (ΔV,ΔF), the goal is to compute the new distribution. We split the problem into 2 phases:
1. In the Materialization phase, we are given access to the entire DeepDive program and we attempt to store information about the original distribution, denoted Pr^(0).
2. In the Inference phase, we get the input from the Materialization phase and the (ΔV,ΔF). Our goal is to perform inference with respect to the changed distribution, denoted Pr^(Δ).
We present 3 techniques for the incremental inference phase on a factor graph:
Strawman: Complete Materialization
Materialization phase – We explicitly store the value of the probability under the original distribution for every possible world I. This approach has perfect accuracy, but storing all possible worlds takes an exponential amount of space and time in the number of variables in the original factor graph.
Inference phase – We use Gibbs sampling:
Even if the distribution has changed from the original distribution Pr^(0) to the updated distribution Pr^(Δ), we only need access to the stored Pr^(0) and to the new factors in ΔF to perform a Gibbs update.
We get a speed improvement because we don’t need to access all factors in the original factor graph and perform a computation with them, since we can look them up in the stored Pr^(0).
Gibbs sampling: Markov chain Monte Carlo algorithm for obtaining observations from a probability distribution
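A plain Gibbs sampler over boolean variables can be sketched as follows. This is a generic illustration, not DeepDive's sampler and not the incremental variant: it assumes each factor is a (weight, function) pair whose function returns 0 or 1 for a world, and it re-evaluates every factor on each update for simplicity. The name `gibbs_marginals` and the example factor are hypothetical.

```python
import math
import random


def gibbs_marginals(n_vars, factors, n_sweeps=20000, seed=0):
    """Estimate P(x_i = 1) for boolean variables by Gibbs sampling.

    factors: list of (weight, func) pairs, where func(world) returns 0 or 1.
    """
    rng = random.Random(seed)
    world = [0] * n_vars
    counts = [0] * n_vars
    for _ in range(n_sweeps):
        for i in range(n_vars):
            # Score the world with x_i = 0 and with x_i = 1, all else fixed.
            score = {}
            for value in (0, 1):
                world[i] = value
                score[value] = sum(w * f(world) for w, f in factors)
            # Conditional probability of x_i = 1 given the other variables.
            p_one = 1.0 / (1.0 + math.exp(score[0] - score[1]))
            world[i] = 1 if rng.random() < p_one else 0
        for i in range(n_vars):
            counts[i] += world[i]
    return [c / n_sweeps for c in counts]


# One imply(A, B) factor with a hypothetical weight of 2.0.
factors = [(2.0, lambda world: 0 if (world[0] == 1 and world[1] == 0) else 1)]
estimate = gibbs_marginals(2, factors)
print(estimate)
```

A practical sampler would only touch the factors adjacent to the variable being resampled, which is where the savings of the incremental approach come from.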
Sampling Approach
In this approach, we sample a set of possible worlds from the original distribution instead of storing all possible worlds. However, because the samples come from the original distribution, which differs from the updated distribution, we cannot use them directly, so we use a standard Metropolis-Hastings scheme to ensure convergence to the updated distribution.
Metropolis-Hastings: MCMC method to simulate multivariate distributions
● The distribution of the next sample depends only on the current sample value
● Each sample is either accepted or rejected with some probability
Sampling approach inference
Propose stored samples and use an “acceptance test” to decide whether to accept them
The acceptance test can be evaluated without the entire factor graph
Only (ΔV,ΔF) is necessary
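Why only the delta is needed can be seen in a small independence Metropolis-Hastings sketch. It assumes the stored samples were drawn from the original distribution and that the updated distribution equals the original one multiplied by exp(total weight of the new factors). Under those assumptions the original probabilities cancel in the acceptance ratio, leaving only the new factors. The name `incremental_mh`, the single-variable example, and the weight 1.0 are hypothetical.

```python
import math
import random


def incremental_mh(stored_samples, delta_weight, seed=0):
    """Reuse samples from the original distribution as proposals.

    delta_weight(world) returns the total weight contributed by the new
    factors for that world; the original factor graph is never consulted.
    """
    rng = random.Random(seed)
    current = stored_samples[0]
    chain = []
    for proposal in stored_samples[1:]:
        # Acceptance test: min(1, exp(delta(proposal) - delta(current))).
        log_ratio = delta_weight(proposal) - delta_weight(current)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = proposal
        chain.append(current)
    return chain


# Original distribution: one boolean variable, uniform over {0, 1}.
sample_rng = random.Random(1)
stored = [(sample_rng.randint(0, 1),) for _ in range(20000)]

# Update: a new factor adds weight 1.0 to worlds where the variable is 1.
chain = incremental_mh(stored, lambda world: 1.0 * world[0])
estimate = sum(world[0] for world in chain) / len(chain)
print(estimate)  # should be near e / (1 + e)
```

Each stored sample is consumed once, which is why the number of materialized samples limits how long this approach can be used.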
Variational Approach
Rather than storing the exact original distribution, we store a factor graph with fewer factors that approximates the original distribution. On a smaller graph, running inference and learning is faster.
Use log-determinant relaxation to select factors
Variational Approach
Inference: apply the (ΔV,ΔF) update to the approximated graph
Run inference and learning on the resulting new graph
Rule-based optimizer
First generate as many samples as possible, then:
1. If update does not change structure of the factor graph ⇒ sampling
2. If update modifies evidence ⇒ variational
3. If update introduces new features ⇒ sampling
4. If samples run out ⇒ variational
Experiments and evaluation
● Used DeepDive for paleontology, geology, a defense contractor, and a KBC competition.
● Evaluated the results using double-blind experiments.
● 2 of the systems fared comparably to or better than experts.
○ Probably due to better recall
● Was the best system out of 45 submissions in the KBC competition.
Influence of incremental strategy
● Their main contribution is to introduce incremental updates to the knowledge base and model.
● Took snapshots of different DeepDive iterations
● Compared the snapshots to understand the role of incremental techniques in improving development speed.
Datasets and workloads
Tested DeepDive for 5 KBC systems:
News, genomics, adversarial, pharmacogenomics, paleontology
News as an example
KB of relations between persons, locations, and organizations.
Input: news articles, web pages
Relations: HasSpouse, MemberOf, etc.
Rules for news
System design
● Scala and C++
● Greenplum for SQL
● Feature extractor in Python
● Inference, learning, incremental maintenance in C++
● 4 CPUs (each 12-core), 1 TB of RAM, 12 1-TB hard drives
Performance and quality comparison
Two DeepDive versions:
● RERUN: Runs DeepDive from scratch
● INCREMENTAL: Uses the full strength of DeepDive's incremental techniques discussed earlier
● Result: Incremental maintenance speeds up development of high-quality KBC systems with little impact on quality
![Page 59: Incremental Knowledge Base - Semantic Scholar€¦ · Can benefit from incremental techniques Two methods for incremental inference Sampling Variational Decided using rule-based optimizer](https://reader035.fdocuments.us/reader035/viewer/2022071102/5fdafccab228bb7baa0644f4/html5/thumbnails/59.jpg)
Measures for comparison
1. Time taken by the system
2. F1 score
F1 score = 2 * (precision * recall) / (precision + recall)
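The F1 formula above can be sketched in a few lines of Python. This is only an illustration: the function names `f1_score` and `precision_recall`, and the idea of comparing extracted facts against a gold set, are assumptions made for the example and are not taken from DeepDive itself.

```python
from typing import Set, Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * (precision * recall) / (precision + recall)


def precision_recall(extracted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Precision and recall of a set of extracted facts against a gold set.

    Both sets are hypothetical inputs used only for illustration.
    """
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

Because F1 is a harmonic mean, it stays low unless precision and recall are both reasonably high, which is why it is a common single-number quality measure for extraction systems.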
![Page 60: Incremental Knowledge Base - Semantic Scholar€¦ · Can benefit from incremental techniques Two methods for incremental inference Sampling Variational Decided using rule-based optimizer](https://reader035.fdocuments.us/reader035/viewer/2022071102/5fdafccab228bb7baa0644f4/html5/thumbnails/60.jpg)
Execution time comparison
● Competition-winning F1 score of 0.36
● 22x faster, 6 hours vs 30 minutes
![Page 61: Incremental Knowledge Base - Semantic Scholar€¦ · Can benefit from incremental techniques Two methods for incremental inference Sampling Variational Decided using rule-based optimizer](https://reader035.fdocuments.us/reader035/viewer/2022071102/5fdafccab228bb7baa0644f4/html5/thumbnails/61.jpg)
Quality comparison
● Similar end-to-end quality
● 99% of high-confidence facts (P > 0.9) from RERUN also appear in INCREMENTAL, and vice versa
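One way to check this kind of agreement between two runs is sketched below. It assumes each run's output is available as a mapping from a fact identifier to its marginal probability; that representation and the names `high_confidence` and `overlap_fraction` are hypothetical and chosen only for the example.

```python
from typing import Dict, Set


def high_confidence(marginals: Dict[str, float], threshold: float = 0.9) -> Set[str]:
    """Facts whose marginal probability is strictly above the threshold."""
    return {fact for fact, p in marginals.items() if p > threshold}


def overlap_fraction(source: Dict[str, float],
                     other: Dict[str, float],
                     threshold: float = 0.9) -> float:
    """Fraction of high-confidence facts in `source` that are also
    high-confidence in `other`. Returns 1.0 if `source` has none."""
    source_facts = high_confidence(source, threshold)
    if not source_facts:
        return 1.0
    return len(source_facts & high_confidence(other, threshold)) / len(source_facts)
```

Calling `overlap_fraction(rerun, incremental)` and `overlap_fraction(incremental, rerun)` gives the two directions of the "and vice versa" comparison.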
![Page 62: Incremental Knowledge Base - Semantic Scholar€¦ · Can benefit from incremental techniques Two methods for incremental inference Sampling Variational Decided using rule-based optimizer](https://reader035.fdocuments.us/reader035/viewer/2022071102/5fdafccab228bb7baa0644f4/html5/thumbnails/62.jpg)
Efficiency of evaluating updates
● Comparison of both strategies when a new update arrives
● Two execution times are considered:
○ Time for feature extraction and grounding
○ Time for inference and learning
● 360x speedup on rule FE1 for feature extraction and grounding
![Page 63: Incremental Knowledge Base - Semantic Scholar€¦ · Can benefit from incremental techniques Two methods for incremental inference Sampling Variational Decided using rule-based optimizer](https://reader035.fdocuments.us/reader035/viewer/2022071102/5fdafccab228bb7baa0644f4/html5/thumbnails/63.jpg)
Inference and learning
![Page 64: Incremental Knowledge Base - Semantic Scholar€¦ · Can benefit from incremental techniques Two methods for incremental inference Sampling Variational Decided using rule-based optimizer](https://reader035.fdocuments.us/reader035/viewer/2022071102/5fdafccab228bb7baa0644f4/html5/thumbnails/64.jpg)
Materialization time
● Incremental took 12 hours
● 2x more samples than rerun
● Only needs to be done once at the start
● Amortized cost
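The amortization argument can be made concrete with a small break-even calculation. In this sketch only the 12-hour materialization cost comes from the slide; the per-update times passed in the usage note are illustrative placeholders, not measurements, and `break_even_updates` is a hypothetical helper name.

```python
import math


def break_even_updates(materialization_hours: float,
                       rerun_hours_per_update: float,
                       incremental_hours_per_update: float) -> int:
    """Smallest number of updates after which the one-off materialization
    cost has been paid back by the cheaper incremental updates."""
    saving = rerun_hours_per_update - incremental_hours_per_update
    if saving <= 0:
        raise ValueError("incremental updates must be cheaper than reruns")
    return math.ceil(materialization_hours / saving)
```

For example, with placeholder costs of 2.0 hours per rerun and 0.25 hours per incremental update, `break_even_updates(12, 2.0, 0.25)` gives 7: seven reruns cost 14 hours, while materialization plus seven incremental updates costs 13.75 hours.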
![Page 65: Incremental Knowledge Base - Semantic Scholar€¦ · Can benefit from incremental techniques Two methods for incremental inference Sampling Variational Decided using rule-based optimizer](https://reader035.fdocuments.us/reader035/viewer/2022071102/5fdafccab228bb7baa0644f4/html5/thumbnails/65.jpg)
Lesion studies
● Understand the performance trade-off
● Disabled one component of DeepDive and kept the others untouched
● Disabled either the sampling approach or the variational approach
● NoWorkLoad first runs the sampling approach, then the variational approach
● Using different materialization techniques for different groups of variables helps performance.
![Page 66: Incremental Knowledge Base - Semantic Scholar€¦ · Can benefit from incremental techniques Two methods for incremental inference Sampling Variational Decided using rule-based optimizer](https://reader035.fdocuments.us/reader035/viewer/2022071102/5fdafccab228bb7baa0644f4/html5/thumbnails/66.jpg)
Practical applications
http://deepdive.stanford.edu/showcase/apps
![Page 67: Incremental Knowledge Base - Semantic Scholar€¦ · Can benefit from incremental techniques Two methods for incremental inference Sampling Variational Decided using rule-based optimizer](https://reader035.fdocuments.us/reader035/viewer/2022071102/5fdafccab228bb7baa0644f4/html5/thumbnails/67.jpg)
![Page 68: Incremental Knowledge Base - Semantic Scholar€¦ · Can benefit from incremental techniques Two methods for incremental inference Sampling Variational Decided using rule-based optimizer](https://reader035.fdocuments.us/reader035/viewer/2022071102/5fdafccab228bb7baa0644f4/html5/thumbnails/68.jpg)
![Page 69: Incremental Knowledge Base - Semantic Scholar€¦ · Can benefit from incremental techniques Two methods for incremental inference Sampling Variational Decided using rule-based optimizer](https://reader035.fdocuments.us/reader035/viewer/2022071102/5fdafccab228bb7baa0644f4/html5/thumbnails/69.jpg)
References
• Incremental Knowledge Base Construction Using DeepDive
• http://deepdive.stanford.edu/
![Page 70: Incremental Knowledge Base - Semantic Scholar€¦ · Can benefit from incremental techniques Two methods for incremental inference Sampling Variational Decided using rule-based optimizer](https://reader035.fdocuments.us/reader035/viewer/2022071102/5fdafccab228bb7baa0644f4/html5/thumbnails/70.jpg)
Questions