1
Semi-Supervised Training for Appearance-Based Statistical Object Detection Methods
Charles Rosenberg
Thesis Oral, May 10, 2004
Thesis Committee: Martial Hebert (co-chair), Sebastian Thrun (co-chair), Henry Schneiderman, Avrim Blum, Tom Minka (Microsoft Research)
2
Motivation: Object Detection
• Modern object detection systems “work”.
• Lots of manually labeled training data required.
• How can we reduce the cost of training data?
Example eye detections from the Schneiderman detector.
3
Approach: Semi-Supervised Training
• Supervised training: requires costly fully labeled data.
• Semi-supervised training: combines fully and weakly labeled data.
• Goal: Develop a semi-supervised approach for the object detection problem and characterize the issues involved.
4
What is Semi-Supervised Training?
• Supervised Training
– Standard training approach
– Training with fully labeled data
• Semi-Supervised Training
– Training with a combination of fully labeled data and unlabeled or weakly labeled data
• Weakly Labeled Data
– Certain label values unknown
– E.g. object is present, but location and scale unknown
– Labeling is relatively “cheap”
• Unlabeled Data
– No label information known
5
Issues for Object Detection
• What semi-supervised approaches are applicable?
– Ability to handle the unique aspects of the object detection problem
– Compatibility with existing detector implementations
• What are the practical concerns?
– Object detector interactions
– Training data issues
– Detector parameter settings
• What kind of performance gain is possible?
– How much labeled training data is needed?
6
Contributions
• Devised an approach that achieves substantial performance gains through semi-supervised training.
• Comprehensive evaluation of semi-supervised training applied to object detection.
• Detailed characterization and comparison of semi-supervised approaches used.
7
Presentation Outline
• Introduction
• Background
• Semi-supervised Training Approach
• Analysis: Filter Based Detector
• Analysis: Schneiderman Detector
• Conclusions and Future Work
8
What is Unique About Object Detection?
• Complex feature set
– high dimensional, continuous, with a complex distribution
• Large inherent variation
– lighting, viewpoint, scale, location, etc.
• Many examples per training image
– many negative examples and a very small number of positive examples
• Negative examples are free.
• Large class overlap
– the object class is a “subset” of the clutter class
9
Background
• Graph-Based Approaches
– A graph is constructed to represent the labeled and unlabeled data relationships – the construction method is important.
– Edges in the graph are weighted according to a distance measure.
– Blum, Chawla, ICML 2001. Szummer, Jaakkola, NIPS 2001. Zhu, Ghahramani, Lafferty, ICML 2003.
• Information Regularization
– Explicit about the information transferred from P(X) to P(Y|X)
– Szummer, Jaakkola, NIPS 2002. Corduneanu, Jaakkola, UAI 2003.
• Multiple Instance Learning
– Addresses multiple examples per data element
– Dietterich, Lathrop, Lozano-Perez, AI 97. Maron, Lozano-Perez, NIPS 1998. Zhang, Goldman, NIPS 2001.
• Transduction, other methods…
10
Presentation Outline
• Introduction
• Background
• Semi-supervised Training Approach
• Analysis: Filter Based Detector
• Analysis: Schneiderman Detector
• Conclusions and Future Work
11
Semi-Supervised Training Approaches
• Expectation-Maximization (EM)
– Batch algorithm
• All data processed each iteration
– Soft class assignments
• Likelihood distribution over class labels
• Distribution recomputed each iteration
• Self-Training
– Incremental algorithm
• Data added to the active pool at each iteration
– Hard class assignments
• Most likely class assigned
• Labels do not change once assigned
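The soft-versus-hard distinction above can be illustrated with a small numpy sketch (the likelihood numbers are purely illustrative): EM keeps a responsibility distribution over classes for each weakly labeled example, while self-training commits to the most likely class and keeps that label.

```python
import numpy as np

# Per-example class likelihoods P(x | class) for 3 weakly labeled
# examples and 2 classes (object, clutter) -- illustrative numbers.
likelihoods = np.array([[0.9, 0.1],
                        [0.4, 0.6],
                        [0.2, 0.8]])

# EM-style soft assignments: normalize into responsibilities,
# recomputed every iteration as the model changes.
soft = likelihoods / likelihoods.sum(axis=1, keepdims=True)

# Self-training-style hard assignments: commit to the argmax class.
hard = likelihoods.argmax(axis=1)

print(soft)   # each row sums to 1
print(hard)   # [0 1 1]
```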
12
Semi-Supervised Training with EM
Train the initial detector model with the initial labeled data set.
Repeat for a fixed number of iterations or until convergence:
• Expectation step
– Run the detector on the weakly labeled set and compute the most likely detection.
– Compute the expected statistics of the fully labeled examples and of the weakly labeled examples weighted by class likelihoods.
• Maximization step
– Update the parameters of the detection model.
Dempster, Laird, Rubin, 1977. Nigam, McCallum, Thrun, Mitchell, 1999.
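The EM loop above can be sketched on a toy problem (synthetic 1-D data, unit-variance Gaussian class models, numpy only; this is in the spirit of Nigam et al., not the thesis detector): labeled examples contribute with hard labels, unlabeled examples with soft responsibilities.

```python
import numpy as np

# Toy semi-supervised EM: two classes with unknown means, unit
# variance assumed; a few labeled points plus an unlabeled pool.
rng = np.random.default_rng(0)
labeled_x = np.concatenate([rng.normal(0, 1, 25), rng.normal(4, 1, 25)])
labeled_y = np.array([0] * 25 + [1] * 25)
unlabeled = np.concatenate([rng.normal(0, 1, 200), rng.normal(4, 1, 200)])

def gauss(x, mu):
    # Unnormalized unit-variance Gaussian likelihood.
    return np.exp(-0.5 * (x - mu) ** 2)

# Initialize means from the labeled data only.
mu = np.array([labeled_x[labeled_y == 0].mean(),
               labeled_x[labeled_y == 1].mean()])

for _ in range(20):
    # E-step: soft responsibilities for the unlabeled pool.
    lik = np.stack([gauss(unlabeled, mu[0]), gauss(unlabeled, mu[1])])
    resp = lik / lik.sum(axis=0)
    # M-step: update means from labeled (hard) + unlabeled (soft) counts.
    for k in (0, 1):
        w_sum = (labeled_y == k).sum() + resp[k].sum()
        x_sum = labeled_x[labeled_y == k].sum() + (resp[k] * unlabeled).sum()
        mu[k] = x_sum / w_sum

print(mu)  # close to the true means (0 and 4)
```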
13
Semi-Supervised Training with Self-Training
Repeat until the weakly labeled data is exhausted or until some other stopping criterion is met:
• Train the detector model with the labeled data set.
• Run the detector on the weakly labeled set and compute the most likely detection.
• Score each detection with the selection metric.
• Select the m best scoring examples and add them to the labeled training set.
Nigam, Ghani, 2000. Moreno, Agarwal, ICML 2003.
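The loop above can be sketched on toy 2-D data (numpy only; the nearest-class-mean "detector" and the margin-style selection score are illustrative stand-ins, not the thesis detector): each round retrains, scores the pool, and promotes the m best examples with hard labels that never change.

```python
import numpy as np

# Toy self-training: two well-separated 2-D classes.
rng = np.random.default_rng(1)
X_lab = np.vstack([rng.normal(0, 0.5, (5, 2)), rng.normal(3, 0.5, (5, 2))])
y_lab = np.array([0] * 5 + [1] * 5)
X_pool = np.vstack([rng.normal(0, 0.5, (40, 2)), rng.normal(3, 0.5, (40, 2))])

m = 10  # examples promoted per iteration
while len(X_pool) > 0:
    # "Train": class means from the current labeled set.
    means = np.stack([X_lab[y_lab == k].mean(axis=0) for k in (0, 1)])
    # "Run detector": distance of each pool point to each class mean.
    d = np.linalg.norm(X_pool[:, None, :] - means[None, :, :], axis=2)
    pred = d.argmin(axis=1)
    # Selection metric: margin between the two class distances
    # (larger margin = more confident hard label).
    score = np.abs(d[:, 0] - d[:, 1])
    best = np.argsort(score)[::-1][:m]
    # Add the m best scoring examples with their hard labels.
    X_lab = np.vstack([X_lab, X_pool[best]])
    y_lab = np.concatenate([y_lab, pred[best]])
    keep = np.setdiff1d(np.arange(len(X_pool)), best)
    X_pool = X_pool[keep]

print(len(X_lab))  # 90: all 80 pool examples absorbed
```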
14
Self-Training Selection Metrics
• Detector Confidence
– Score = detection confidence
– Intuitively appealing
– Can prove problematic in practice
• Nearest Neighbor (NN) Distance
– Score = minimum distance between the detection and the labeled examples
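The NN distance metric above reduces to a one-liner; a minimal sketch with illustrative feature vectors (Euclidean distance here for simplicity):

```python
import numpy as np

# Nearest-neighbor selection metric: a candidate's score is its
# minimum distance to the already-labeled examples.
def nn_score(candidate, labeled):
    """Minimum Euclidean distance from candidate to any labeled example."""
    return np.min(np.linalg.norm(labeled - candidate, axis=1))

labeled = np.array([[0.0, 0.0], [1.0, 0.0]])
print(nn_score(np.array([0.0, 2.0]), labeled))  # 2.0
print(nn_score(np.array([1.0, 1.0]), labeled))  # 1.0
```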
15
Selection Metric Behavior
[Animation: the Confidence Metric and the Nearest-Neighbor (NN) Metric compared on points from class 1, class 2, and unlabeled data]
21
Semi-Supervised Training & Computer Vision
• EM Approaches
– S. Baluja. Probabilistic Modeling for Face Orientation Discrimination: Learning from Labeled and Unlabeled Data. NIPS 1998.
– R. Fergus, P. Perona, A. Zisserman. Object Class Recognition by Unsupervised Scale-Invariant Learning. CVPR 2003.
• Self-Training
– A. Selinger. Minimally Supervised Acquisition of 3D Recognition Models from Cluttered Images. CVPR 2001.
• Summary
– Reasonable performance improvements reported
– “One of” experiments
– No insight into the issues or general application
22
Presentation Outline
• Introduction
• Background
• Semi-supervised Training Approach
• Analysis: Filter Based Detector
• Analysis: Schneiderman Detector
• Conclusions and Future Work
23
Filter Based Detector
[Diagram: Input Image → Filter Bank (filters f_i) → Feature Vector x_i → Gaussian Mixture Models: Object GMM (M_o components) and Clutter GMM (M_c components)]
24
Filter Based Detector Overview
• Input Features and Model
– Features = output of 20 filters at each pixel location
– Generative model = a separate Gaussian Mixture Model for the object and clutter classes
– A single model is used for all locations on the object
• Detection
– Compute the filter responses and the likelihood under the object and clutter models at each pixel location
– A “spatial model” is used to aggregate pixel responses into object level responses
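The per-pixel scoring step can be sketched as a log likelihood ratio under two diagonal-covariance GMMs (numpy only; the GMM parameters and the 4-dimensional feature vectors below are illustrative stand-ins for the 20 trained filter responses):

```python
import numpy as np

def gmm_log_lik(x, weights, means, variances):
    """Log likelihood of feature vectors x (N, D) under a diagonal GMM."""
    comp = []
    for w, mu, var in zip(weights, means, variances):
        log_norm = -0.5 * np.sum(np.log(2 * np.pi * var))
        log_pdf = log_norm - 0.5 * np.sum((x - mu) ** 2 / var, axis=1)
        comp.append(np.log(w) + log_pdf)
    # Log-sum-exp over mixture components.
    return np.logaddexp.reduce(np.stack(comp), axis=0)

D = 4  # stand-in for the 20 filter responses
obj = (np.array([0.5, 0.5]), [np.full(D, 2.0), np.full(D, 3.0)],
       [np.ones(D), np.ones(D)])
clu = (np.array([1.0]), [np.zeros(D)], [np.ones(D)])

pixels = np.array([np.full(D, 2.5), np.zeros(D)])  # object-like, clutter-like
log_ratio = gmm_log_lik(pixels, *obj) - gmm_log_lik(pixels, *clu)
print(log_ratio)  # positive for the object-like pixel, negative for clutter
```

The spatial model would then aggregate such per-pixel log ratios into an object-level response.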
25
Spatial Model
[Diagram: training images and object masks used to build the spatial model; example detection with its log likelihood ratio map]
26
Typical Example Filter Model Detections
Sample Detection Plots
Log Likelihood Ratio Plots
27
Filter Based Detector Overview
• Fully Supervised Training
– Fully labeled example = image + pixel mask
– Gaussian Mixture Model parameters trained
– Spatial model trained from the pixel masks
• Semi-Supervised Training
– Weakly labeled example = image containing the object
– Initial model is trained using the fully labeled object and clutter data
– The spatial model and the clutter class model are fixed once trained with the initial labeled data set
– EM and self-training variants are evaluated
28
Self-Training Selection Metrics
• Confidence based selection metric
– Selection score is the detector odds ratio:
  P(Y = Object | X) / P(Y = Clutter | X)
• Nearest neighbor (NN) selection metric
– Selection score is the distance to the closest labeled example
– The distance is based on a model of each weakly labeled example
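The odds ratio above follows from Bayes' rule given class likelihoods and priors; a minimal sketch (the likelihood values and the 0.5 default prior are illustrative):

```python
# Confidence selection metric: posterior odds of object vs. clutter,
# P(Object | X) / P(Clutter | X), from class likelihoods via Bayes' rule.
def odds_ratio(lik_obj, lik_clu, prior_obj=0.5):
    """Posterior odds of object vs. clutter for one detection."""
    return (lik_obj * prior_obj) / (lik_clu * (1.0 - prior_obj))

print(odds_ratio(0.8, 0.2))  # 4.0  -> confident object
print(odds_ratio(0.1, 0.4))  # 0.25 -> likely clutter
```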
29
Filter Based Experiment Details
• Training Data
– 12 images of a desktop telephone + clutter, viewpoints +/- 90 degrees
– Roughly constant scale and lighting conditions
– 96 images of clutter only
• Experimental Variations
– 12 repetitions with different fully / weakly labeled training data splits
• Testing Data
– 12 images, disjoint set, similar imaging conditions
[Example images: correct detection, incorrect detection]
30
Example Filter Model Results
[Panels: Labeled Data Only, Expectation-Maximization, Self-Training Confidence Metric, Self-Training NN Metric]
31
Single Image Semi-Supervised Results
• Labeled Only = 26.7%
• Expectation-Maximization = 19.2%
• Confidence Metric = 34.2%
• 1-NN Selection Metric = 47.5%
32
Two Image Semi-Supervised Results
• Labeled Data Only + Near Pair = 52.5%
• 4-NN Metric + Near Pair = 85.8%
[Example image pairs: Reference, Close, Near, Far]
33
Presentation Outline
• Introduction
• Background
• Semi-supervised Training Approach
• Analysis: Filter Based Detector
• Analysis: Schneiderman Detector
• Conclusions and Future Work
34
Example Schneiderman Face Detections
35
Schneiderman Detector Details
• Detection process: Wavelet Transform → Feature Construction → Classifier → Search over Location + Scale
• Training process: Wavelet Transform → Feature Search → Feature Selection / AdaBoost
• Classifier output:
  log [P(F_1 | object) / P(F_1 | clutter)] + … + log [P(F_n | object) / P(F_n | clutter)]
Schneiderman 98, 00, 03, 04
36
Schneiderman Detector Training Data
• Fully Supervised Training
– Fully labeled examples with landmark locations
• Semi-Supervised Training
– Weakly labeled example = image containing the object
– Initial model is trained using the fully labeled data
– Variants of self-training are evaluated
37
Self-Training Selection Metrics
• Confidence based selection metric
– Classifier output / odds ratio:
  Score = log [P(F_1 | object) / P(F_1 | clutter)] + … + log [P(F_r | object) / P(F_r | clutter)]
• Nearest Neighbor selection metric
– Preprocessing g = high pass filter + normalized variance
– Mahalanobis distance to the closest labeled example:
  Score(W_i) = Min_j Mah(g(W_i), g(L_j))
  where W_i is the candidate image and the L_j are the labeled images
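The Score(W_i) computation above can be sketched directly (numpy only; the 2-D feature vectors and the diagonal covariance stand in for the preprocessed window features g(·) and the trained covariance):

```python
import numpy as np

def mahalanobis(x, y, cov_inv):
    """Mahalanobis distance between two feature vectors."""
    d = x - y
    return float(np.sqrt(d @ cov_inv @ d))

def nn_mahalanobis_score(candidate, labeled, cov_inv):
    """Min Mahalanobis distance from a candidate to the labeled examples."""
    return min(mahalanobis(candidate, l, cov_inv) for l in labeled)

cov_inv = np.linalg.inv(np.diag([1.0, 4.0]))  # illustrative feature variances
labeled = [np.array([0.0, 0.0]), np.array([4.0, 0.0])]
print(nn_mahalanobis_score(np.array([0.0, 2.0]), labeled, cov_inv))  # 1.0
```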
38
Schneiderman Experiment Details
• Training Data
– 231 images from the FERET data set and the web
– Multiple eyes per image = 480 training examples
– 80 synthetic variations – position, scale, orientation
– Native object resolution = 24x16 pixels
– 15,000 non-object examples from clutter images
39
Schneiderman Experiment Details
• Evaluation Metric
– Detections within +/- 0.5 object radius and +/- 1 scale octave are counted as correct
– Area under the ROC curve (AUC) performance measure
• ROC curve = Receiver Operating Characteristic curve
• Detection rate (in percent) vs. false positive count
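The AUC measure above amounts to trapezoidal integration of the detection-rate-vs-false-positive curve; a minimal sketch with illustrative operating points (normalized by the x-axis span so a perfect detector scores 1.0):

```python
# Trapezoidal area under a detection-rate vs. false-positive-count
# ROC curve, normalized to [0, 1]. Operating points are illustrative.
def auc(fp_counts, det_rates):
    """Normalized trapezoidal area under the ROC curve."""
    area = 0.0
    for i in range(1, len(fp_counts)):
        dx = fp_counts[i] - fp_counts[i - 1]
        area += 0.5 * (det_rates[i] + det_rates[i - 1]) * dx
    return area / (fp_counts[-1] - fp_counts[0])

fp = [0, 5, 10, 20]        # false positives at each operating point
dr = [0.2, 0.6, 0.8, 0.9]  # corresponding detection rates
print(auc(fp, dr))         # 0.7
```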
40
Schneiderman Experiment Details
• Experimental Variations
– 5-10 runs with random data splits per experiment
• Experimental Complexity
– Training the detector = one iteration
– One iteration = 12 CPU hours on a 2 GHz class machine
– One run = 10 iterations = 120 CPU hours = 5 CPU days
– One experiment = 10 runs = 50 CPU days
– All experiments took approximately 3 CPU years
• Testing Data
– Separate set of 44 images with 102 examples
41
Example Detection Results
Fully Labeled Data Only
Fully Labeled + Weakly Labeled Data
42
Example Detection Results
Fully Labeled Data Only
Fully Labeled + Weakly Labeled Data
43
When can weakly labeled data help?
• It can help in the “smooth” regime.
• Three regimes of operation: saturated, smooth, failure
[Plot: performance vs. fully labeled data set size – full-data-normalized AUC vs. fully labeled training set size (log scale), with the failure, smooth, and saturated regions marked]
44
Performance of Confidence Metric Self-Training
• Improved performance over a range of data set sizes.
• Not all improvements significant at the 95% level.
[Plot: confidence metric self-training AUC performance – full-data-normalized AUC vs. fully labeled training set size (24, 30, 34, 40, 48, 60)]
45
Performance of NN Metric Self-Training
• Improved performance over a range of data set sizes.
• All improvements significant at the 95% level.
[Plot: NN metric self-training AUC performance – full-data-normalized AUC vs. fully labeled training set size (24, 30, 34, 40, 48, 60)]
46
MSE Metric Changes to Self-Training Behavior
[Plots: base-data-normalized AUC vs. iteration number – confidence metric performance vs. iteration, NN metric performance vs. iteration]
• The NN metric performance trend is level or upwards.
47
Example Training Image Progression
[Training images added at iterations 1-2]
Confidence Metric AUC: 0.822, 0.770, 0.798
NN Metric AUC: 0.822, 0.867, 0.882
48
Example Training Image Progression
[Training images added at iterations 3-5]
Confidence Metric AUC: 0.759, 0.745, 0.798
NN Metric AUC: 0.922, 0.931, 0.906
49
How much weakly labeled data is used?
• It is relatively constant over the initial data set size.
[Plots: weakly labeled data set size, and ratio of weakly to fully labeled data, vs. fully labeled training set size (24, 30, 34, 40, 48, 60)]
50
Presentation Outline
• Introduction
• Background
• Semi-supervised Training Approach
• Analysis: Filter Based Detector
• Analysis: Schneiderman Detector
• Conclusions and Future Work
51
Contributions
• Devised an approach that achieves substantial performance gains through semi-supervised training.
• Comprehensive evaluation (3 CPU years) of semi-supervised training applied to object detection.
• Detailed characterization and comparison of the semi-supervised approaches used – much more analysis and many more details appear in the thesis.
52
Future Work
• Enabling the use of training images with clutter for context
– Context priming
• A. Torralba, P. Sinha. ICCV 2001. A. Torralba, K. Murphy, W. Freeman, M. Rubin. ICCV 2003.
• Training with weakly labeled data only
– Online robot learning
– Mining the web for object detection
• K. Barnard, D. Forsyth. ICCV 2001.
• K. Barnard, P. Duygulu, N. de Freitas, D. Forsyth, D. Blei, M. Jordan. JMLR 2003.
53
Conclusions
• Semi-supervised training can be practically applied to object detection to good effect.
• Self-training approach can substantially outperform EM.
• Selection metric is crucial for self-training performance.
56
Filter Model Results
• Key Points
– Batch EM does not provide a performance increase
– Self-training provides a performance increase
– 1-NN and 4-NN metrics work better than confidence
– “Near Pair” accuracy is highest

Algorithm | Single Image Accuracy | Close Pair Accuracy | Near Pair Accuracy | Far Pair Accuracy
Full Data Set | 100.0% | 100.0% | 100.0% | 100.0%
True Location | 86.7% | 95.8% | 98.3% | 98.3%
Labeled Only | 26.7% | 40.8% | 52.5% | 50.8%
Batch EM | 19.2% | 35.8% | 52.5% | 54.2%
Confidence Metric | 34.2% | 48.3% | 73.3% | 52.5%
1-NN Metric | 47.5% | 64.2% | 82.5% | 70.8%
4-NN / 40-MM Metric | 53.3% | 69.2% | 85.8% | 76.7%
57
Weakly Labeled Point Performance
Does confidence metric self-training improve point performance?
• Yes – over a range of data set sizes.
58
Weakly Labeled Point Performance
Does MSE metric self-training improve point performance?
• Yes – to a significant level over a range of data set sizes.
59
Schneiderman Features
60
Schneiderman Detection Process
61
Sample Schneiderman Face Detections
63
Simulation Data
Labeled and Unlabeled Data Hidden Labels
64
Simulation Data
Nearest Neighbor Confidence Metric
65
Simulation Data
Model Based Confidence Metric
67
Future Work – Mining the Web
Green regions are “Not-Clinton”.
“Clinton” Colors
“Not-Clinton” Colors
68
Future Work – Mining the Web
Green regions are “Not-Flag”.
“Flag” Colors
“Not-Flag” Colors