Semi-Supervised Learning
Barnabas Poczos
Slides Courtesy: Jerry Zhu, Aarti Singh
Supervised Learning

[Diagram: labeled training data from a feature space X and a label space Y is fed to a learning algorithm, which outputs a prediction rule f: X → Y.]

Goal: The optimal predictor (the Bayes rule) depends on the unknown joint distribution P_XY, so instead we learn a good prediction rule from the training data.
Labeled and Unlabeled Data

Labels come from a human expert, special equipment, or an experiment, so labeled data is expensive and scarce; unlabeled data is cheap and abundant.

Example label spaces:
▪ Protein crystallization images: “Crystal”, “Needle”, “Empty”
▪ Handwritten digits: “0”, “1”, “2”, …
▪ Document topics: “Sports”, “News”, “Science”, …
Free-of-cost labels?

Luis von Ahn: Games with a purpose (reCAPTCHA). A word that is challenging for OCR (Optical Character Recognition) is shown to a human user; by typing it, you provide a free label!
Semi-Supervised Learning

[Diagram: in supervised learning (SL) the learning algorithm sees only labeled data; in semi-supervised learning (SSL) it sees labeled data plus unlabeled data.]

Goal: Learn a better prediction rule than is possible from the labeled data alone.
Semi-Supervised Learning in Humans
Can unlabeled data help?

Assume each class is a coherent group (e.g., a Gaussian). Then unlabeled data can help identify the decision boundary more accurately.

[Figure: positive labeled data, negative labeled data, and unlabeled data; the supervised decision boundary estimated from the few labeled points alone differs from the semi-supervised decision boundary, which follows the shape of the unlabeled clusters.]
Can unlabeled data help?

[Figure: handwritten digit images (“0”, “1”, “2”, …) arranged in an embedding where visually similar digits lie close together.]

“Similar” data points have “similar” labels. This embedding can be done by manifold learning algorithms.
Some SSL Algorithms
▪ Self-Training
▪ Generative methods, mixture models
▪ Graph-based methods
▪ Co-Training
▪ Semi-supervised SVM
▪ Many others
Notation

▪ Input x ∈ X, label y ∈ Y
▪ Labeled data: (x1, y1), …, (xl, yl)
▪ Unlabeled data: xl+1, …, xl+u, available during training
▪ Usually l ≪ u: unlabeled data vastly outnumbers labeled data
Self-training

The simplest SSL method: use the model’s own predictions as labels.
1. Train a classifier f on the labeled data L.
2. Use f to predict labels for the unlabeled data U.
3. Move the most confidently predicted examples (x, f(x)) from U to L.
4. Repeat until U is empty or no confident predictions remain.
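A minimal Python sketch of this loop, assuming a scikit-learn-style base learner; the LogisticRegression base classifier and the 0.95 confidence threshold are illustrative choices, not part of the slides:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab, threshold=0.95, max_rounds=20):
    # start from a classifier trained on the labeled data alone
    clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    for _ in range(max_rounds):
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break  # nothing we trust enough; stop
        # move confident examples, with their pseudo-labels, into L
        pseudo = clf.classes_[proba[confident].argmax(axis=1)]
        X_lab = np.vstack([X_lab, X_unlab[confident]])
        y_lab = np.concatenate([y_lab, pseudo])
        X_unlab = X_unlab[~confident]
        clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)  # retrain
    return clf
```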
Self-training Example: Propagating 1-NN

[Figure: at each step, the unlabeled point closest to the current labeled set receives its nearest labeled neighbor’s label, so labels propagate outward one point at a time. A single outlier can cause a wrong label to propagate through an entire region.]
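A sketch of propagating 1-NN, assuming points are rows of NumPy arrays; the function name is illustrative:

```python
import numpy as np
from scipy.spatial.distance import cdist

def propagate_1nn(X_lab, y_lab, X_unlab):
    # repeatedly label the unlabeled point closest to the labeled set
    X_lab, y_lab = list(X_lab), list(y_lab)
    X_unlab = list(X_unlab)
    while X_unlab:
        D = cdist(np.asarray(X_unlab), np.asarray(X_lab))
        u, l = np.unravel_index(D.argmin(), D.shape)
        X_lab.append(X_unlab.pop(u))   # absorb the closest unlabeled point
        y_lab.append(y_lab[l])         # ...with its nearest neighbor's label
    return np.asarray(X_lab), np.asarray(y_lab)
```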
Mixture Models for Labeled Data

Model each class with its own density: p(x, y | θ) = p(y | θ) p(x | y, θ). Estimate the parameters θ (class priors and class-conditional densities) from the labeled data. The decision for any test point x not in the labeled dataset follows the Bayes rule: in the binary case, predict class 1 if P(y = 1 | x, θ) > 1/2, and class 0 otherwise.
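A sketch of this recipe with one Gaussian per class; the helper names are illustrative, and the covariance estimate assumes enough labeled points per class:

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_class_gaussians(X, y):
    # MLE of prior, mean, covariance for each class from labeled data
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X),           # class prior P(y = c)
                     Xc.mean(axis=0),            # class mean
                     np.cov(Xc, rowvar=False))   # class covariance
    return params

def bayes_predict(params, x):
    # Bayes rule: argmax_c P(y = c) * p(x | y = c)
    scores = {c: pi * multivariate_normal.pdf(x, mu, cov)
              for c, (pi, mu, cov) in params.items()}
    return max(scores, key=scores.get)
```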
Mixture Models for SSL Data

With unlabeled data, maximize the joint likelihood of both parts: each labeled point contributes p(xi, yi | θ), while each unlabeled point contributes the marginal p(xj | θ) = Σy p(xj, y | θ), since its label is unobserved.
Mixture Models: SL vs SSL

[Figure: the maximum-likelihood mixture fit from labeled data alone vs. the fit from labeled and unlabeled data together; the unlabeled points pull the estimated component means, so the SSL decision boundary better matches the true class structure.]
Gaussian Mixture Models

A Gaussian mixture model (GMM) with K components: p(x | θ) = Σk πk N(x | μk, Σk), with mixing weights πk ≥ 0, Σk πk = 1, and one Gaussian component per class.
EM for Gaussian Mixture Models

The marginal likelihood of the unlabeled data cannot be maximized in closed form, so use EM:
▪ E-step: compute each unlabeled point’s responsibility γjk = P(y = k | xj, θ) under the current parameters; labeled points keep responsibility 1 for their observed class.
▪ M-step: re-estimate πk, μk, Σk from all points, weighted by the responsibilities.
▪ Iterate until the log-likelihood converges; EM finds a local maximum.
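A compact EM sketch for a class-per-component Gaussian mixture with labeled and unlabeled data; it assumes integer class labels 0..K−1 and adds a small ridge to the covariances for numerical stability:

```python
import numpy as np
from scipy.stats import multivariate_normal

def ssl_gmm_em(X_lab, y_lab, X_unlab, n_classes, n_iter=100):
    X = np.vstack([X_lab, X_unlab])
    n, d = X.shape
    # responsibilities: one-hot (fixed) for labeled rows, uniform to start
    R = np.full((n, n_classes), 1.0 / n_classes)
    R[:len(y_lab)] = np.eye(n_classes)[y_lab]
    for _ in range(n_iter):
        # M-step: responsibility-weighted MLE of priors, means, covariances
        Nk = R.sum(axis=0)
        pi = Nk / n
        mu = (R.T @ X) / Nk[:, None]
        cov = []
        for k in range(n_classes):
            Xc = X - mu[k]
            cov.append((R[:, k, None] * Xc).T @ Xc / Nk[k]
                       + 1e-6 * np.eye(d))  # small ridge for stability
        # E-step: update responsibilities of the unlabeled rows only
        for k in range(n_classes):
            R[len(y_lab):, k] = pi[k] * multivariate_normal.pdf(X_unlab, mu[k], cov[k])
        R[len(y_lab):] /= R[len(y_lab):].sum(axis=1, keepdims=True)
    return pi, mu, cov
```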
Assumption for GMMs

The key assumption is that the model is correct: the data really is generated by the assumed mixture, with one component per class. When this holds, unlabeled data helps pin down the component parameters. When the model is misspecified, unlabeled data can hurt, and the semi-supervised fit can be worse than using the labeled data alone.
Related: Cluster and Label

Cluster the full dataset (labeled and unlabeled points together) with any clustering algorithm, then label each cluster by a majority vote of the labeled points that fall inside it.
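A sketch using k-means as the clusterer; the choice of KMeans and the integer-label assumption are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_and_label(X_lab, y_lab, X_unlab, n_clusters):
    X = np.vstack([X_lab, X_unlab])
    assign = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    # fall back to the overall majority label (assumes integer labels)
    y_pred = np.full(len(X_unlab), np.bincount(y_lab).argmax())
    for c in range(n_clusters):
        members = assign == c
        lab_members = y_lab[members[:len(y_lab)]]
        if len(lab_members) > 0:
            # majority vote among the labeled points in this cluster
            vals, counts = np.unique(lab_members, return_counts=True)
            y_pred[members[len(y_lab):]] = vals[counts.argmax()]
    return y_pred
```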
Graph-Based Methods

Assumption: similar unlabeled data have similar labels.
Graph Regularization: Similarity Graphs

Similarity graphs model local neighborhood relations between data points: nodes are the labeled and unlabeled points, and edge weights wij encode similarity, e.g., via a k-NN graph with Gaussian weights wij = exp(−‖xi − xj‖² / (2σ²)).
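A sketch of one common construction (a symmetric k-NN graph with Gaussian weights); the dense distance computation is for clarity, not efficiency:

```python
import numpy as np

def gaussian_knn_graph(X, k=10, sigma=1.0):
    # dense pairwise squared distances (clear, not memory-efficient)
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-D2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # keep each point's k nearest neighbors (skip self), then symmetrize
    nn = np.argsort(D2, axis=1)[:, 1:k + 1]
    mask = np.zeros_like(W, dtype=bool)
    mask[np.arange(len(X))[:, None], nn] = True
    return W * (mask | mask.T)
```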
Graph Regularization

Assumption: nodes connected by heavy edges tend to have similar labels. If data points i and j are similar (i.e., the weight wij is large), then their labels should be similar: fi ≈ fj. This gives the objective

    min over f:   Σi=1..l loss(fi, yi)  +  λ Σi,j wij (fi − fj)²

where the first term is the loss on the labeled data (e.g., mean square or 0-1) and the second is a graph-based smoothness prior over both labeled and unlabeled data.
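One classic way to minimize the smoothness term with the labeled values clamped is the harmonic-function solution of Zhu, Ghahramani & Lafferty (2003), sketched below for binary labels in {0, 1}:

```python
import numpy as np

def harmonic_solution(W, lab_idx, y_lab):
    # unnormalized graph Laplacian L = D - W
    L = np.diag(W.sum(axis=1)) - W
    n = W.shape[0]
    unlab_idx = np.setdiff1d(np.arange(n), lab_idx)
    f = np.zeros(n)
    f[lab_idx] = y_lab  # clamp labeled nodes to their labels
    # the minimizer of sum_ij w_ij (f_i - f_j)^2 with labeled f clamped
    # solves the linear system  L_uu f_u = -L_ul f_l
    f[unlab_idx] = np.linalg.solve(
        L[np.ix_(unlab_idx, unlab_idx)],
        -L[np.ix_(unlab_idx, lab_idx)] @ f[lab_idx])
    return f  # threshold at 1/2 to get binary predictions
```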
Co-training

Co-training (Blum & Mitchell, 1998; Mitchell, 1999) assumes that (i) the features can be split into two sets, and (ii) each sub-feature set is sufficient to train a good classifier.
• Initially, two separate classifiers are trained on the labeled data, one on each sub-feature set.
• Each classifier then classifies the unlabeled data and “teaches” the other classifier with the few unlabeled examples (and their predicted labels) it feels most confident about.
• Each classifier is retrained with the additional training examples provided by the other, and the process repeats.
Co-training Algorithm (Blum & Mitchell, 1998)
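A sketch of the co-training loop, assuming two feature views X1, X2 of the same examples and a scikit-learn base learner; the names and per-round budget are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def co_train(X1, X2, y, lab_idx, rounds=10, per_round=5):
    y = y.copy()
    lab = set(map(int, lab_idx))
    clf1 = LogisticRegression(max_iter=1000)
    clf2 = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        lab_arr = np.fromiter(lab, dtype=int)
        unlab = np.setdiff1d(np.arange(len(y)), lab_arr)
        if len(unlab) == 0:
            break
        # retrain each view's classifier on the shared labeled pool
        clf1.fit(X1[lab_arr], y[lab_arr])
        clf2.fit(X2[lab_arr], y[lab_arr])
        for clf, X in ((clf1, X1), (clf2, X2)):
            if len(unlab) == 0:
                break
            # each classifier donates its most confident pseudo-labels
            conf = clf.predict_proba(X[unlab]).max(axis=1)
            top = unlab[np.argsort(conf)[-per_round:]]
            y[top] = clf.predict(X[top])
            lab.update(map(int, top))
            unlab = np.setdiff1d(unlab, top)
    return clf1, clf2
```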
Semi-Supervised SVMs

Semi-supervised SVMs (S3VMs, also called transductive SVMs) push the decision boundary into a low-density region: in addition to maximizing the margin on the labeled points, they penalize unlabeled points that fall inside the margin. The resulting objective is non-convex and is typically optimized with heuristics or local search.
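A standard form of the S3VM objective (notation follows common presentations and may differ from the slide); the third term is a hinge-like “hat loss” that pushes unlabeled points out of the margin:

```latex
\min_{w,\,b}\;\; \frac{1}{2}\|w\|^2
  \;+\; C \sum_{i=1}^{l} \max\!\bigl(0,\; 1 - y_i\,(w^\top x_i + b)\bigr)
  \;+\; C^{*} \sum_{j=l+1}^{l+u} \max\!\bigl(0,\; 1 - \bigl|\,w^\top x_j + b\,\bigr|\bigr)
```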
Semi-Supervised Learning
▪ Generative methods
▪ Graph-based methods
▪ Co-Training
▪ Semi-Supervised SVMs
▪ Many other methods
SSL algorithms can use unlabeled data to help improve prediction accuracy if the data satisfies appropriate assumptions.