Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10....

36
Semi - Supervised Learning Barnabas Poczos Slides Courtesy: Jerry Zhu, Aarti Singh

Transcript of Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10....

Page 1: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Semi-Supervised Learning

Barnabas Poczos

Slides Courtesy: Jerry Zhu, Aarti Singh

Page 2: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Supervised Learning

Learning algorithm

Labeled

Goal:

Feature Space Label Space

Optimal predictor (Bayes Rule) depends on unknown PXY, so instead

learn a good prediction rule from training data

2

Page 3: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Labeled and Unlabeled data

Human expert/

Special equipment/

Experiment

“Crystal” “Needle” “Empty”

Cheap and abundant ! Expensive and scarce !

“0” “1” “2” …

“Sports”

“News”

“Science”

3

Page 4: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Free-of-cost labels?

Luis von Ahn: Games with a purpose (ReCaptcha)

Word challenging to OCR (Optical Character Recognition)

You provide a free label!

4

Page 5: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Semi-Supervised learning

Supervised learning (SL)

Semi-Supervised learning (SSL)

Learning algorithm

Goal: Learn a better prediction rule than based on labeled data alone.

“Crystal”

5

Page 6: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Semi-Supervised learning in Humans

6

Page 7: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Can unlabeled data help?

Assume each class is a coherent group (e.g. Gaussian)

Then unlabeled data can help identify the boundary more accurately.

Positive labeled data

Negative labeled data

Unlabeled data

Supervised Decision Boundary

Semi-Supervised Decision Boundary

7

Page 8: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Can unlabeled data help?

35

8

7

9

4

2

1

53

8

7

9

4

2

1

“0” “1” “2” …

“Similar” data points have “similar” labels8

This embedding can be done by manifold learning algorithms

Page 9: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Some SSL Algorithms

▪ Self-Training

▪ Generative methods, mixture models

▪ Graph-based methods

▪ Co-Training

▪ Semi-supervised SVM

▪ Many others

9

Page 10: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Notation

10

Page 11: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Self-training

11

Page 12: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Self-training Example

Propagating 1-NN

12

Page 13: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled
Page 14: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled
Page 15: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Mixture Models for Labeled Data

15

Page 16: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Mixture Models for Labeled Data

> 1/2<

Estimate the

parameters from the

labeled data

Decision for any test

point not in the labeled

dataset

16

Page 17: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Mixture Models for Labeled Data

17

Page 18: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Mixture Models for SSL Data

18

Page 19: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Mixture Models

19

Page 20: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Mixture ModelsSL vs SSL

20

Page 21: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Mixture Models

21

Page 22: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Gaussian Mixture Models

22

Page 23: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

EM for Gaussian Mixture Models

23

Page 24: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Assumption for GMMs

24

Page 25: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Assumption for GMMs

25

Page 26: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Assumption for GMMs

26

Page 27: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Related: Cluster and Label

27

Page 28: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

28

Page 29: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Graph Based Methods

Assumption: Similar unlabeled data have similar labels.

29

Page 30: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Graph RegularizationSimilarity Graphs: Model local neighborhood relations between data points

30

Assumption:

Nodes connected by heavy edges

tend to have similar label

Page 31: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Graph Regularization

If data points i and j are similar (i.e. weight wij is large), then their labels

are similar fi = fj

Loss on labeled data

(mean square,0-1)

Graph based smoothness prior

on labeled and unlabeled data

31

Page 32: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Co-training

Page 33: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Co-training (Blum & Mitchell, 1998) (Mitchell, 1999) assumes that (i) features can be split into two sets; (ii) each sub-feature set is sufficient to train a good classifier.

• Initially two separate classifiers are trained with the labeled data, on the two sub-feature sets respectively.

• Each classifier then classifies the unlabeled data, and ‘teaches’ the other classifier with the few unlabeled examples (and the predicted labels) they feel most confident.

• Each classifier is retrained with the additional training examples given by the other classifier, and the process repeats.

33

Co-training Algorithm

Page 34: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Co-training AlgorithmBlum & Mitchell’98

Page 35: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Semi-Supervised SVMs

35

Page 36: Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

Semi-Supervised Learning

▪ Generative methods

▪ Graph-based methods

▪ Co-Training

▪ Semi-Supervised SVMs

▪ Many other methods

SSL algorithms can use unlabeled data to help improve prediction accuracy if data satisfies appropriate assumptions

36