Multi-view Exploratory Learning for AKBC Problems
Bhavana Dalvi and William W. Cohen, School of Computer Science, Carnegie Mellon University
Motivation
The traditional EM method for SSL jointly learns the missing labels of unlabeled data points as well as the model parameters.
We consider two extensions of traditional EM for SSL:
1. Modeling a new latent variable, unobserved classes, by dynamically introducing new classes when appropriate.
2. Assigning multiple labels from multiple levels of a class hierarchy while satisfying ontological constraints, and considering multiple data views.
Our proposed framework combines structural search for the best class hierarchy with SSL, reducing the semantic drift associated with erroneously grouping unanticipated classes with expected classes.
Conclusions
Exploratory learning helps reduce the semantic drift of seeded classes. It becomes more powerful in conjunction with multiple data views and a class hierarchy, when these are imposed as soft constraints on the label vectors.
It can be applied to multiple AKBC tasks such as macro-reading, gloss finding, and ontology extension.
Datasets and code can be downloaded from: www.cs.cmu.edu/~bbd/exploratory_learning
Acknowledgements: This work is supported by a Google PhD Fellowship in Information Extraction and a Google Research Grant.
Multi-view Exploratory EM
Inputs: N: #data points; the data points represented in multiple views; a few labeled seeds for each of k seed classes; an initial set of class constraints.
Outputs: Parameters for the k seed and m newly added classes (each class has a representation in every data view); the set of class constraints among the k+m classes; labels for the N data points.
Initialize the model with a few seeds per class.
Iterate till convergence (of data likelihood and number of classes):
  E step (iteration t): Predict labels for unlabeled data points.
    For i = 1 : N
      Compute per-class scores for data point x_i with CombineMultiViewScore()
      If NewClassCreationCriterion(scores) holds:
        Create a new class, assign x_i to it, and UpdateConstraints()
      Assign labels to x_i with OptimalLabelAssignment()
  M step: Re-compute model parameters using seeds and predicted labels for unlabeled data points.
  The number of classes might increase in each iteration. Check whether the model selection criterion is satisfied; if not, revert to the model from iteration t-1.
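As an illustration, the whole loop might look like the following minimal, single-view sketch. The centroid model, the softmax posterior, and the `tau` threshold are assumptions made for the example, not the released implementation:

```python
import numpy as np

def exploratory_em(X_seed, y_seed, X_unlab, k, n_iter=10, tau=0.05):
    """Single-view Exploratory EM sketch with a seeded-k-means-style model:
    classes are centroids, posteriors are a softmax of negative distances.
    tau is a hypothetical near-uniformity threshold (MinMax-style criterion)."""
    labels = np.full(len(X_unlab), -1)
    # Initialize one centroid per seed class from the labeled seeds.
    centroids = [X_seed[y_seed == c].mean(axis=0) for c in range(k)]
    for t in range(n_iter):
        # E step: label every unlabeled point, creating new classes on the fly.
        for i, x in enumerate(X_unlab):
            d = np.array([np.linalg.norm(x - c) for c in centroids])
            p = np.exp(-d)
            p /= p.sum()                      # posterior over current classes
            if p.max() - p.min() < tau:       # near-uniform: no class fits well
                centroids.append(x.copy())    # dynamically introduce a class
                labels[i] = len(centroids) - 1
            else:
                labels[i] = int(p.argmax())
        # M step: re-estimate centroids from seeds plus predicted labels.
        X = np.vstack([X_seed, X_unlab])
        y = np.concatenate([y_seed, labels])
        centroids = [X[y == c].mean(axis=0) if np.any(y == c) else centroids[c]
                     for c in range(len(centroids))]
        # A model selection criterion (e.g. AICc, below) would decide here
        # whether to keep the newly added classes or revert; omitted for brevity.
    return labels, centroids
```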
Modeling Unobserved Classes
Dynamically introducing new classes
Hypothesis: Dynamically inducing clusters of data points that do not belong to any of the seeded classes will reduce the semantic drift.
For each data point, we compute the posterior distribution of it belonging to each of the existing classes [Dalvi et al., ECML'13].
Criterion 1: MinMax. If the posterior is close to uniform, i.e., no existing class fits the point clearly better than the others, create a new class/cluster.
Criterion 2: JS (Jensen-Shannon divergence). If the JS divergence between the posterior and the uniform distribution over the k classes is below a threshold, create a new class/cluster (a sketch of this criterion follows the list below).
For hierarchical classification we also need to decide where to place this newly created class:
- Divide and conquer (DAC) method for extending a tree-structured ontology [Dalvi et al., AKBC 2013]
- Extension of DAC to extend a generic ontology with subset and mutual exclusion constraints (OptDAC) [Dalvi and Cohen, under review]
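For illustration, the JS criterion might be implemented as below; the threshold value is a hypothetical choice, not taken from the paper:

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def should_create_new_class(posterior, threshold=0.1):
    """JS criterion: a posterior close to the uniform distribution over the
    k existing classes means no class fits the point well.
    The threshold value is hypothetical."""
    k = len(posterior)
    uniform = np.full(k, 1.0 / k)
    return js_divergence(np.asarray(posterior), uniform) < threshold
```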
Model Selection
This step makes sure that we do not create too many new classes. We tried the BIC, AIC, and AICc criteria, and extended AIC (AICc) worked best for our tasks.
AICc(g) = AIC(g) + 2v(v+1) / (n - v - 1)
Here g is the model being evaluated, L(g) the log-likelihood of the data given g (so AIC(g) = 2v - 2L(g)), v the number of free parameters of the model, and n the number of data points.
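A direct transcription of the criterion (function name is illustrative):

```python
def aicc(log_likelihood, v, n):
    """Corrected AIC: AICc(g) = AIC(g) + 2v(v+1)/(n - v - 1),
    where AIC(g) = 2v - 2 L(g). Lower is better; requires n > v + 1."""
    aic = 2 * v - 2 * log_likelihood
    return aic + 2 * v * (v + 1) / (n - v - 1)
```

In the EM loop above, a model with newly added classes would be kept only if its AICc does not worsen relative to the model from iteration t-1.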
Multiple Data Views
Each data point and each class centroid (or classifier) has a representation in every view.
E.g., in the noun-phrase classification task, we consider co-occurrences of NPs in text sentences (View-1) and HTML tables (View-2).
Combining scores from multiple views (a sketch of the first two rules follows below):
- Sum-Score: addition of per-view scores
- Prod-Score: product of per-view scores
- MaxAgree: maximize agreement between per-view label assignments [Dalvi and Cohen, in submission]
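A minimal sketch of Sum-Score and Prod-Score, assuming each view yields a per-class score vector; MaxAgree requires solving an optimization problem and is not shown:

```python
import numpy as np

def sum_score(view_scores):
    """Sum-Score: add the per-class score vectors across views."""
    return np.sum(view_scores, axis=0)

def prod_score(view_scores):
    """Prod-Score: multiply the per-class score vectors across views."""
    return np.prod(view_scores, axis=0)

# One row of class scores per view, e.g. text sentences vs. HTML tables.
view_scores = np.array([[0.7, 0.2, 0.1],   # View-1: text sentences
                        [0.5, 0.4, 0.1]])  # View-2: HTML tables
label = int(np.argmax(prod_score(view_scores)))  # class 0 here
```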
Incorporating Multiple Views and Ontological Constraints
Each data point is assigned a bit vector of labels. Subset and mutual exclusion constraints determine which candidate bit vectors are consistent.
GLOFIN: A mixed integer program is solved for each data point to get the optimal label vector. [Dalvi et al., WSDM 2015]
Optimized Divide and Conquer (OptDAC): Here we combine 1) a divide-and-conquer, top-down strategy to detect and place new categories in the ontology with 2) the mixed integer programming technique (GLOFIN) to select the optimal set of labels for a data point, consistent with the ontological constraints.
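For illustration, checking whether a candidate label bit vector satisfies the two constraint types could look like this; the constraint representation is an assumption for the example, not the paper's:

```python
def is_consistent(labels, subset, mutex):
    """labels: dict class -> 0/1 bit; subset: (child, parent) pairs where
    child=1 implies parent=1; mutex: pairs that cannot both be 1."""
    for child, parent in subset:
        if labels.get(child, 0) and not labels.get(parent, 0):
            return False
    for a, b in mutex:
        if labels.get(a, 0) and labels.get(b, 0):
            return False
    return True

# Example: 'state' is a kind of 'location'; 'food' and 'location' are exclusive.
subset = [("state", "location"), ("country", "location")]
mutex = [("food", "location")]
print(is_consistent({"state": 1, "location": 1}, subset, mutex))  # True
print(is_consistent({"state": 1, "food": 1}, subset, mutex))      # False: 'location' bit missing
```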
AKBC Tasks
Macro-reading (Explore-EM)
Semi-supervised classification of noun phrases into categories, using distributional features. Exploratory learning can reduce the semantic drift of seed classes. [Dalvi et al., ECML 2013]
Micro-reading
Task: to classify an entity mention using context-specific features. Clustering NIL entities for the KBP entity discovery and linking (EDL) task. [Mazaitis et al., KBP 2014]
Multi-view Hierarchical SSL (MaxAgree)
The MaxAgree method exploits clues from different data views. We define multi-view clustering as an optimization problem and compare various methods for combining scores across views. MaxAgree is more robust than Prod-Score when we vary the difference in performance between the views.
Our proposed Hier-MaxAgree method can incorporate both the clues from multiple views and the ontological constraints. [Dalvi and Cohen, in submission]
On entity classification for the NELL KB, our proposed Hier-MaxAgree method gave state-of-the-art performance.
Different Document Representations
Naïve Bayes: Assumes a multinomial distribution for feature occurrences; explicitly models the class prior.
Seeded K-Means: Similarity based on cosine distance between centroids and data points.
Seeded von Mises-Fisher: SSL method for data distributed on the unit hypersphere (see the sketch below).
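The latter two representations both operate on direction rather than magnitude. A small, illustrative sketch of the shared preprocessing and centroid scoring:

```python
import numpy as np

def l2_normalize(X):
    """Project document vectors onto the unit hypersphere, as used by both
    seeded k-means with cosine distance and seeded von Mises-Fisher."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def cosine_scores(X_unit, centroids_unit):
    """Cosine similarity of each unit-norm document to each unit-norm centroid."""
    return X_unit @ centroids_unit.T
```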
Automatic Gloss Finding for KBs (GLOFIN)
We developed the GLOFIN method, which takes a gloss-free KB and a large collection of glosses and automatically matches glosses to entities in the KB. [Dalvi et al., WSDM 2015]
Glosses with only one candidate KB entity (unambiguous glosses) are used as training data to train a hierarchical classification model for the categories in the KB. Ambiguous glosses are then disambiguated based on the KB category they are assigned to.
Our method outperformed SVM and a label propagation baseline, especially when the amount of training data is small.
In future: apply GLOFIN to word sense disambiguation w.r.t. the WordNet synset hierarchy.
Hierarchical Exploratory Learning (OptDAC)
We proposed OptDAC, which can do hierarchical SSL in the presence of incomplete class ontologies. It employs a mixed integer programming formulation to find optimal label assignments for a data point, while traversing the class ontology in top-down fashion to detect whether a new class needs to be added and where to place it. [Dalvi and Cohen, under review]
[Figure: Precision, recall, and F1 (y-axis 10-80) comparing SVM, Label Propagation, and GLOFIN-Naïve-Bayes]
[Figure: Macro-averaged F1 score (y-axis 40-70) vs. training percentage (x-axis 5-30) for Concatenation, Co-training, Sum-Score, Prod-Score, and Hier-MaxAgree]
Correlation of the performance improvement over the best view w.r.t. the difference in performance between views:

Method       Coefficient   P-value
Prod-Score   -0.59         0.01
MaxAgree     -0.05         0.82
[Figure legend: Text-patterns + Ontology-1; Text-patterns + Ontology-2; HTML-tables + Ontology-1; HTML-tables + Ontology-2]
[Figure: An example of an ontology extended by OptDAC, with nodes Root, Food, Location, Country, State, Vegetable, Condiment, and a newly created class C8; the entity 'Coke' is shown with edge probabilities 1.0, 0.1/0.9, and 0.55/0.45]
[Figure: Example use case of Exploratory EM on the 20 Newsgroups dataset (#seed classes = 6)]