Online and Batch Learning of Pseudo-Metrics


Page 1: Online and Batch Learning  of Pseudo-Metrics

Learning of Pseudo-Metrics. Slide 1

Online and Batch Learning of Pseudo-Metrics

Shai Shalev-Shwartz

Hebrew University, Jerusalem

Joint work with

Yoram Singer, Google Inc.

Andrew Y. Ng, Stanford University


Motivating Example


Our Technique

• Map instances into a space in which distances correspond to labels


Outline

• Distance learning setting

• Large margin for distances

• An online learning algorithm

• Online loss analysis

• A dual version

• Experiments:
  • Online: document filtering
  • Batch: handwritten digit recognition


Problem Setting

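• Training examples $(x, x', y)$:
  • two instances $x, x' \in \mathbb{R}^n$
  • similarity label $y \in \{+1, -1\}$ ($+1$: similar, $-1$: dissimilar)

• Hypothesis class: pseudo-metrics $d_A(x, x') = \sqrt{(x - x')^\top A (x - x')}$, where $A$ is an $n \times n$ symmetric positive semi-definite matrix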
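A minimal sketch of this hypothesis class in Python with NumPy (the function name `pseudo_metric` is our own, not from the talk). It illustrates why $d_A$ is only a pseudo-metric: when $A$ is singular, two distinct instances can be at distance zero.

```python
import numpy as np

def pseudo_metric(A, x, x_prime):
    """d_A(x, x') = sqrt((x - x')^T A (x - x')) for a symmetric PSD matrix A."""
    v = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
    # max(..., 0) guards against tiny negative values from floating-point round-off
    return float(np.sqrt(max(v @ A @ v, 0.0)))

# A rank-deficient PSD matrix: it ignores the second coordinate entirely
A = np.array([[1.0, 0.0],
              [0.0, 0.0]])
x = np.array([1.0, 2.0])
x_prime = np.array([1.0, 5.0])
print(pseudo_metric(A, x, x_prime))          # 0.0, although x != x'
print(pseudo_metric(np.eye(2), x, x_prime))  # 3.0, the plain Euclidean distance
```

With $A = I$ the pseudo-metric reduces to the Euclidean distance; learning $A$ is what lets the distance ignore or emphasize directions.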


Large Margin for Pseudo-Metrics

• Sample S is $\gamma$-separated w.r.t. a metric


Batch Formulation

s.t.

s.t.


Pseudo-metric Online Learning Algorithm (POLA)

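For $\tau = 1, 2, \ldots$

• Get two instances $x_\tau, x'_\tau$

• Calculate distance $d_\tau^2 = d_{A_\tau}(x_\tau, x'_\tau)^2$

• Predict $\hat{y}_\tau = \mathrm{sign}(b_\tau - d_\tau^2)$

• Get true label $y_\tau$ and suffer hinge loss $\ell_\tau(A_\tau, b_\tau) = \max\{0,\; y_\tau(d_\tau^2 - b_\tau) + 1\}$

• Update matrix and threshold to $(A_{\tau+1}, b_{\tau+1})$

If $y_\tau = +1$: we want that $d_\tau^2 \le b_\tau - 1$

If $y_\tau = -1$: we want that $d_\tau^2 \ge b_\tau + 1$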
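The prediction and loss steps of the loop can be sketched as follows. This is an illustrative sketch, not the authors' code; it assumes labels $y = +1$ for similar and $y = -1$ for dissimilar pairs, the prediction $\hat{y} = +1$ iff $d^2 \le b$ (ties broken toward "similar"), and the hinge loss $\max\{0, y(d^2 - b) + 1\}$, which is zero exactly when $y(b - d^2) \ge 1$. The name `pola_predict_and_loss` is hypothetical.

```python
import numpy as np

def pola_predict_and_loss(A, b, x, x_prime, y):
    """Prediction and hinge loss for one example (x, x', y), y in {+1, -1}.
    y = +1 (similar): we want d^2 <= b - 1; y = -1 (dissimilar): d^2 >= b + 1."""
    v = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
    d2 = float(v @ A @ v)                  # squared pseudo-metric distance
    y_hat = 1 if d2 <= b else -1           # predict 'similar' iff below the threshold
    loss = max(0.0, y * (d2 - b) + 1.0)    # zero iff y * (b - d2) >= 1
    return y_hat, loss

# d^2 = 1 with threshold b = 2: predicted similar; a dissimilar label costs 2
print(pola_predict_and_loss(np.eye(2), 2.0, [1.0, 0.0], [0.0, 0.0], -1))  # (1, 2.0)
```

Using the squared distance keeps the loss linear in $A$, which is what makes the update a projection onto a half-space.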


Core Update: Two Projections

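• Start with $(A_\tau, b_\tau)$

• An example $(x_\tau, x'_\tau, y_\tau)$ defines a half-space: the set of all $(A, b)$ with zero loss, $y_\tau(b - d_A(x_\tau, x'_\tau)^2) \ge 1$ (a half-space because $d_A(x, x')^2 = (x - x')^\top A (x - x')$ is linear in $A$)

• $(\hat{A}_\tau, \hat{b}_\tau)$ is the projection of $(A_\tau, b_\tau)$ onto this half-space

• $A_{\tau+1}$ is the projection of $\hat{A}_\tau$ onto the PSD cone

[Figure: the PSD cone and the half-space of all zero-loss matrices]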
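A sketch of the two-projection update under stated assumptions (not the authors' implementation; `project_to_psd` and `pola_update` are hypothetical names). With $v = x - x'$ and $(A, b)$ treated as one vector under the norm $\|A\|_F^2 + b^2$, the projection onto the zero-loss half-space is $\hat{A} = A - y\alpha\, v v^\top$, $\hat{b} = b + y\alpha$ with $\alpha = \ell / (\|v\|^4 + 1)$; this raises the margin $y(b - d^2)$ by exactly $\ell$. The PSD projection clips negative eigenvalues to zero. Clipping the threshold to $b \ge 1$ is an extra assumption of this sketch that keeps the condition $d^2 \le b - 1$ satisfiable.

```python
import numpy as np

def project_to_psd(M):
    """Frobenius-norm projection of a symmetric matrix onto the PSD cone:
    keep the eigenvectors, clip negative eigenvalues to zero."""
    M = (M + M.T) / 2.0                          # symmetrize against round-off
    eigvals, eigvecs = np.linalg.eigh(M)
    return (eigvecs * np.maximum(eigvals, 0.0)) @ eigvecs.T

def pola_update(A, b, x, x_prime, y):
    """One POLA-style step on the example (x, x', y), y in {+1, -1}.
    Step 1: project (A, b) onto {(A, b) : y * (b - v^T A v) >= 1}, v = x - x'.
    Step 2: project A onto the PSD cone; clip b to b >= 1 (assumed here)."""
    v = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
    d2 = v @ A @ v
    loss = max(0.0, y * (d2 - b) + 1.0)
    if loss == 0.0:
        return A, b                               # already inside the half-space
    alpha = loss / ((v @ v) ** 2 + 1.0)           # ||(v v^T, 1)||^2 = ||v||^4 + 1
    A_hat = A - y * alpha * np.outer(v, v)
    b_hat = b + y * alpha
    return project_to_psd(A_hat), max(b_hat, 1.0)

# A dissimilar pair that is currently too close: d^2 = 1 but b = 5
A_new, b_new = pola_update(np.eye(2), 5.0, [1.0, 0.0], [0.0, 0.0], -1)
print(b_new)  # 2.5  (and A_new is approximately diag(3.5, 1))
```

For $y = -1$ the half-space step only adds a PSD rank-one term, so the eigenvalue clipping can change $\hat{A}$ only for similar pairs ($y = +1$).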


Online Learning

• Goal – minimize cumulative loss

• Why online?
  • Online processing tasks (e.g., text filtering)
  • Simple to implement
  • Memory and run-time efficient
  • Worst-case bounds on the performance
  • Online-to-batch conversions


Online Loss Bound

• sequence of examples s.t.

• any fixed matrix and threshold

• Then,

Loss bound does not depend on dimension

Loss suffered by …

“Complexity” of …


Incorporating Kernels

• Matrix A can be written as ,

where

• Therefore:
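One way to see why kernels apply, sketched under an assumption: if $A$ lies in the span of outer products of past difference vectors $v_i = x_i - x'_i$, i.e. $A = \sum_{i,j} \alpha_{ij} v_i v_j^\top$, then $d_A(x, x')^2 = w^\top \alpha\, w$ with $w_i = \langle v_i, x - x' \rangle$, and each $w_i$ expands into four inner products that a kernel $k$ can replace. The coefficient matrix `alpha` and the function `kernel_sq_distance` are hypothetical; the check uses the linear kernel, where the result must match the explicit matrix.

```python
import numpy as np

def kernel_sq_distance(alpha, pairs, x, x_prime, kernel):
    """d_A(x, x')^2 for the ASSUMED form A = sum_{i,j} alpha[i, j] v_i v_j^T,
    with v_i = x_i - x'_i, computed from kernel evaluations only."""
    # w_i = <v_i, x - x'>, expanded into four kernel evaluations
    w = np.array([kernel(xi, x) - kernel(xi, x_prime)
                  - kernel(xpi, x) + kernel(xpi, x_prime)
                  for xi, xpi in pairs])
    return float(w @ alpha @ w)

# Check with the linear kernel, where A can also be built explicitly
rng = np.random.default_rng(0)
pairs = [(rng.normal(size=3), rng.normal(size=3)) for _ in range(4)]
V = np.array([xi - xpi for xi, xpi in pairs])   # row i is v_i
M = rng.normal(size=(4, 4))
alpha = M @ M.T                                 # symmetric PSD coefficients
A = V.T @ alpha @ V                             # sum_{i,j} alpha[i, j] v_i v_j^T
x, x_prime = rng.normal(size=3), rng.normal(size=3)
v = x - x_prime
explicit = float(v @ A @ v)
via_kernel = kernel_sq_distance(alpha, pairs, x, x_prime, np.dot)
print(np.isclose(explicit, via_kernel))  # True
```

Swapping `np.dot` for a nonlinear kernel (for example a Gaussian kernel) gives the pseudo-metric in the corresponding feature space without ever forming $A$.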


Online Experiments

• Task: document filtering according to topics

• Dataset: Reuters-21578
  • 10,000 documents
  • Documents labeled as relevant or irrelevant
  • Only a few relevant documents (1%–10% of the entire set)

• Algorithms:
  • POLA
  • 1-Nearest Neighbor (1-NN)
  • Perceptron algorithm
  • Perceptron Algorithm with Uneven Margins (PAUM) (Li, Zaragoza, Herbrich, Shawe-Taylor, Kandola)


POLA for Document Filtering

• Get a document

• Calculate distance to relevant documents observed so far using current matrix

• Predict: document is relevant iff the distance to the closest relevant document is smaller than the current threshold

• Get true label

• Update matrix and threshold
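The prediction rule for filtering can be sketched as below. It is a hypothetical helper, not the authors' code; it compares the squared distance with the threshold (matching the $d^2$ versus $b$ convention of the hinge loss, an assumption here) and predicts "irrelevant" before any relevant document has been seen. The update step is omitted because the slide does not say which pair is used for it.

```python
import numpy as np

def predict_relevant(A, b, doc, relevant_docs):
    """Predict 'relevant' iff the squared pseudo-metric distance from `doc` to the
    closest relevant document seen so far is below the threshold b."""
    if not relevant_docs:
        return False  # assumption: nothing relevant seen yet -> predict irrelevant
    doc = np.asarray(doc, dtype=float)
    d2 = min(float((doc - r) @ A @ (doc - r)) for r in relevant_docs)
    return d2 < b

A, b = np.eye(2), 2.0
seen = [np.array([0.0, 0.0])]
print(predict_relevant(A, b, [1.0, 0.0], seen))  # True  (d^2 = 1 < 2)
print(predict_relevant(A, b, [2.0, 0.0], seen))  # False (d^2 = 4)
```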


Document Filtering Results

• Each blue point corresponds to one topic

• Y-axis designates the error of POLA

• Points beneath the black diagonal line mean that POLA wins

[Figure: three scatter plots, one point per topic, of POLA error (y-axis) against 1-NN error, Perceptron error, and PAUM error (x-axis)]


Batch Experiments

• Task: handwritten digit recognition

• Dataset: MNIST
  • 45 binary classification problems (all pairs of digits)
  • 10,000 training examples
  • 10,000 test examples

• Algorithms: k-NN with various metrics:
  • Pseudo-metric learned by POLA
  • Euclidean distance
  • Metric induced by Fisher Discriminant Analysis (FDA)
  • Metric learned by Relevant Component Analysis (RCA) (Bar-Hillel, Hertz, Shental, and Weinshall)
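A minimal sketch of k-NN under a pseudo-metric, assuming squared distances $(x - x_i)^\top A (x - x_i)$ and majority vote; `knn_predict` and the toy data are hypothetical. With $A = I$ it is ordinary Euclidean k-NN, so swapping in a learned $A$ is the only change needed to compare metrics.

```python
import numpy as np
from collections import Counter

def knn_predict(A, X_train, y_train, x, k=3):
    """k-NN with d_A^2(x, x_i) = (x - x_i)^T A (x - x_i); A = I gives Euclidean k-NN."""
    diffs = X_train - x                                # row i is x_i - x
    d2 = np.einsum("ij,jk,ik->i", diffs, A, diffs)     # squared distances to all points
    nearest = np.argsort(d2, kind="stable")[:k]
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]

# Toy data: the label depends on coordinate 0; coordinate 1 is noise
X_train = np.array([[0.0, 0.0], [5.0, 10.0]])
y_train = np.array([0, 1])
query = np.array([1.0, 9.0])
print(knn_predict(np.eye(2), X_train, y_train, query, k=1))             # 1 (fooled by noise)
print(knn_predict(np.diag([1.0, 0.0]), X_train, y_train, query, k=1))  # 0
```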


MNIST Results

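[Figure: three scatter plots, one point per binary problem, of POLA error (y-axis) against the error with Euclidean distance, FDA, and RCA (x-axis)]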

RCA was applied after using PCA as a pre-processing step

• Each blue point corresponds to one binary classification problem

• Y-axis designates the error of POLA

• Points beneath the black diagonal line mean that POLA wins


Toy problem

A color-coded matrix of Euclidean distances between pairs of images


Metric found by POLA
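The color-coded matrices on these toy-problem slides show distances between all pairs of images. A sketch of how such a matrix can be computed for any PSD matrix $A$ ($A = I$ gives the squared Euclidean version); `pairwise_sq_distances` is a hypothetical helper, and it returns squared distances, which order pairs the same way as distances.

```python
import numpy as np

def pairwise_sq_distances(X, A=None):
    """Matrix D with D[i, j] = (x_i - x_j)^T A (x_i - x_j) for rows x_i of X.
    A=None means A = I, i.e. squared Euclidean distances."""
    X = np.asarray(X, dtype=float)
    if A is None:
        A = np.eye(X.shape[1])
    G = X @ A @ X.T                        # G[i, j] = x_i^T A x_j
    sq = np.diag(G)
    D = sq[:, None] + sq[None, :] - 2.0 * G
    return np.maximum(D, 0.0)              # clip tiny negatives from round-off

X = np.array([[0.0, 0.0], [3.0, 4.0]])
D_euclid = pairwise_sq_distances(X)
D_learned = pairwise_sq_distances(X, np.diag([1.0, 0.0]))
```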


Mapping found by POLA

• Our Pseudo-metrics:
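Because $A$ is PSD it factors as $A = W^\top W$ (for example $W = \Lambda^{1/2} U^\top$ from the eigendecomposition $A = U \Lambda U^\top$), so $d_A(x, x') = \|Wx - Wx'\|$: the pseudo-metric is the Euclidean distance after the linear map $x \mapsto Wx$, into a space whose dimension is the rank of $A$. A sketch of extracting such a map (the helper `mapping_from_psd` and its tolerance are assumptions, not from the talk):

```python
import numpy as np

def mapping_from_psd(A, tol=1e-10):
    """Return W with A = W^T W, keeping only directions with eigenvalue > tol.
    Row j of W is sqrt(lambda_j) * u_j^T, so x -> W @ x maps into rank(A) dimensions."""
    eigvals, eigvecs = np.linalg.eigh((A + A.T) / 2.0)
    keep = eigvals > tol
    return np.sqrt(eigvals[keep])[:, None] * eigvecs[:, keep].T

rng = np.random.default_rng(1)
B = rng.normal(size=(2, 3))
A = B.T @ B                        # a rank-2 PSD matrix on R^3
W = mapping_from_psd(A)            # shape (2, 3): maps R^3 into R^2
x, x_prime = rng.normal(size=3), rng.normal(size=3)
d_A = np.sqrt((x - x_prime) @ A @ (x - x_prime))
d_mapped = np.linalg.norm(W @ x - W @ x_prime)
```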


Mapping found by POLA


Summary and Extensions

• An online algorithm for learning pseudo-metrics

• Formal properties, good experimental results

Extensions:

• Alternative regularization schemes to the Frobenius norm

• “Learning to learn”:
  • Learning a metric from one set of classes and applying it to another set of related classes
