Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure


Transcript of Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure

Page 1

Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure

AISTATS '07, San Juan, Puerto Rico. Ruslan Salakhutdinov and Geoffrey E. Hinton.

Presenter:

WooSung Choi ([email protected])

DataKnow. Lab, Korea Univ.

Page 2

Background

(k-) Nearest Neighbor Query

Page 3

kNN (k-Nearest Neighbor) Query

[Figure: example points in the x-y plane illustrating a kNN query]

Page 4

kNN (k-Nearest Neighbor) Classification

Salakhutdinov, Ruslan, and Geoffrey E. Hinton. "Learning a nonlinear embedding by preserving class neighbourhood structure." International Conference on Artificial Intelligence and Statistics, 2007.

NN    Class
1-NN  6
2-NN  6
3-NN  6
4-NN  6
5-NN  0

Result of 5-NN classification: 6 (4 of the 5 nearest neighbours, 80%)
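As a concrete illustration (not from the slides), here is a minimal kNN-classification sketch in Python; the toy points, labels, and query are invented so that the 5-NN vote reproduces the table above.

```python
from collections import Counter

import numpy as np

def knn_classify(X_train, y_train, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    dists = np.linalg.norm(X_train - query, axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(dists)[:k]
    votes = Counter(y_train[nearest])
    label, count = votes.most_common(1)[0]
    return label, count / k

# Toy data mirroring the table: four points labeled 6 near the query,
# one point labeled 0 far away.
X_train = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.2], [1.2, 1.1], [5.0, 5.0]])
y_train = np.array([6, 6, 6, 6, 0])
label, confidence = knn_classify(X_train, y_train, np.array([1.0, 1.0]), k=5)
print(label, confidence)  # -> 6 0.8
```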

Page 5

Motivating Example
• MNIST
  Dimensionality: 28 x 28 = 784
  50,000 training images
  10,000 test images
• Error: 2.77%
• Query response: 108 ms

Page 6

Reality Check
• Curse of dimensionality
  Indexing methods show poor performance when the number of dimensions is high:
  [Qin Lv et al., "Image Similarity Search with Compact Data Structures" @ CIKM '04]
  [Roger Weber et al., "A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces" @ VLDB '98]

Page 7

Locality Sensitive Hashing, Data Sensitive Hashing

Method                     | Curse of dimensionality | Recall | Considers data distribution | Underlying technique
Scan                       | X (none)                | 1      | △                           | N/A
R-Tree-based solution      | O (severe)              | 1      | O                           | index: tree
Locality Sensitive Hashing | △ (less severe)         | –      | X                           | hashing + mathematics
Data Sensitive Hashing     | △ (less severe)         | –      | O                           | hashing + machine learning

Page 8

Abstract

Page 9

Abstract
• How to pre-train and fine-tune a multilayer neural network (MNN)
  to learn a nonlinear transformation from the input space to a low-dimensional feature space
  in which kNN classification performs well
• Can be improved using unlabeled data

Page 10

Introduction

Page 11

Notation
• Transformation to a low-dimensional feature space
  Input vectors: $x_a \in \mathbb{R}^D$
  Transformation function: $f(x \mid W)$, parameterized by the weights $W$
  Output vectors: $f(x_a \mid W)$
• Similarity measure
  Input vectors: $x_a$, $x_b$
  Output: $d_{ab} = \lVert f(x_a \mid W) - f(x_b \mid W) \rVert^2$

Page 12

Objective (informal)

• Goal: learn a nonlinear transformation $f(\cdot \mid W)$ from the input space to a low-dimensional feature space in which kNN classification performs well.

Page 13

Objective (formal)

• Goal: maximize the expected number of correctly classified points on the training data,
  $\mathcal{O}_{NCA} = \sum_{a} \sum_{b \,:\, c_a = c_b} p_{ab}$

Page 14

Related Work: Linear Transformation
• Linear transformation [8,9,18]
  Weakness: limited number of parameters
    e.g., to map a 784-dimensional input to a 30-dimensional output, $W$ must be a 30 x 784 matrix (23,520 parameters);
    in this paper: 785x500 + 501x500 + 501x2000 + 2001x30 parameters (each +1 is a bias unit; checked in the sketch below)
  Weakness: cannot model higher-order correlations
• Deep Autoencoder [14], DBN [12]
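The parameter counts above are plain arithmetic and can be verified in a couple of lines; this is just a check, not code from the paper.

```python
# Linear map from 784-dim input to 30-dim output (no bias): 23,520 parameters.
print(30 * 784)  # 23520

# The deep encoder in this paper, counting one bias unit per layer input.
layers = [(785, 500), (501, 500), (501, 2000), (2001, 30)]
print(sum(i * o for i, o in layers))  # 1705030
```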

Page 15

In This Paper
• Nonlinear transformation
  Overview
    Pre-training: similar to [12,14]
      Stack of RBMs: RBM1 784-500, RBM2 500-500, RBM3 500-2000, RBM4 2000-30
    Fine-tuning: backpropagation to maximize the objective function,
      i.e., maximize the expected number of correctly classified points on the training data (a shape sketch follows below)
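A minimal sketch of the encoder shape after pre-training. The RBM pre-training itself is omitted, the weights here are random placeholders, and sigmoid hidden units with a linear 30-unit code layer are an assumption (matching the autoencoder setup of [14]), not something stated on this slide.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# Layer sizes from the slide: 784 -> 500 -> 500 -> 2000 -> 30.
sizes = [784, 500, 500, 2000, 30]
weights = [rng.normal(0, 0.01, (i, o)) for i, o in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(o) for o in sizes[1:]]

def f(x, W=weights, b=biases):
    """Map an input vector to the 30-dim code, f(x | W)."""
    h = x
    for Wl, bl in zip(W[:-1], b[:-1]):
        h = sigmoid(h @ Wl + bl)   # sigmoid hidden layers
    return h @ W[-1] + b[-1]       # linear code layer

code = f(rng.random(784))
print(code.shape)  # (30,)
```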

Page 16

Objective (formal)

• Goal: maximize the expected number of correctly classified points on the training data,
  $\mathcal{O}_{NCA} = \sum_{a} \sum_{b \,:\, c_a = c_b} p_{ab}$

Page 17

2. Learning Nonlinear NCA

Neighbourhood Component Analysis

Page 18

Notation

Symbol | Definition
$a, b$ | indices of training cases
$x_a$ | training vector ($D$-dimensional data)
$c_a \in \{1, 2, \dots, C\}$ | label of training vector $x_a$
$(x_a, c_a)$ | labeled training cases
$f(x_a \mid W)$ | output of the multilayer neural network parameterized by $W$
$d_{ab} = \lVert f(x_a \mid W) - f(x_b \mid W) \rVert^2$ | Euclidean distance metric
$p_{ab} = \exp(-d_{ab}) / \sum_{z \neq a} \exp(-d_{az})$ | the probability that point $a$ selects its neighbour $b$ in the transformed feature space

Example (point $a$ and four other training points, indexed by $z$):

$\exp(-d_{az})$: 1, 0.3678, 0.0497, 0.0002, 0.0002 (i.e. $d_{az} \approx$ 0, 1, 3, 8.5, 8.5)
$p_{az}$: 0, 0.88, 0.11, 0, 0

$p_{ab} = \dfrac{0.3678}{0.3678 + 0.0497 + 0.0002 + 0.0002} \approx 0.88$ for the nearest neighbour $b$.
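The $p_{az}$ row is just a softmax over negative distances with the self-term removed. A small sketch (mine, not the authors' code) reproducing the slide's numbers:

```python
import numpy as np

def neighbour_probs(d):
    """p_ab = exp(-d_ab) / sum_{z != a} exp(-d_az); d[0] is a's distance to itself."""
    e = np.exp(-d)
    e[0] = 0.0                      # a point never selects itself
    return e / e.sum()

# Distances recovered from the slide's exp(-d) row: 1, 0.3678, 0.0497, 0.0002, 0.0002.
d = -np.log(np.array([1.0, 0.3678, 0.0497, 0.0002, 0.0002]))
print(neighbour_probs(d).round(2))
# -> [0.   0.88 0.12 0.   0.  ]   (the slide truncates 0.119 to 0.11)
```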

Page 19

Notation

Symbol | Definition
$p_{ab}$ | the probability that point $a$ selects its neighbour $b$ in the transformed feature space
$p(c_a = k) = \sum_{b \,:\, c_b = k} p_{ab}$ | the probability that point $a$ belongs to class $k$
$\mathcal{O}_{NCA} = \sum_a \sum_{b \,:\, c_a = c_b} p_{ab}$ | the expected number of correctly classified points on the training data

Example (continuing the previous page; the first slot is point $a$ itself):

class labels: N/A, 3, 3, 2, 1
$p_{az}$: 0, 0.88, 0.11, 0, 0

$p(c_a = 3) = 0.88 + 0.11 = 0.99$, $p(c_a = 2) = 0$, $p(c_a = 1) = 0$
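Continuing the sketch, $p(c_a = k)$ just sums $p_{ab}$ over the neighbours with label $k$; this reproduces the slide's example (illustrative code, not from the paper).

```python
import numpy as np

# Neighbour-selection probabilities from the previous page
# (index 0 is point a itself) and the neighbours' class labels.
p_a = np.array([0.0, 0.88, 0.11, 0.0, 0.0])
labels = np.array([-1, 3, 3, 2, 1])  # -1 marks point a's own slot (N/A)

def class_prob(p, labels, k):
    """p(c_a = k) = sum of p_ab over neighbours b with label k."""
    return p[labels == k].sum()

for k in (3, 2, 1):
    print(k, round(class_prob(p_a, labels, k), 2))  # 3 -> 0.99, 2 -> 0.0, 1 -> 0.0
```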

Page 20

Learning Rule
• Backpropagation to maximize $\mathcal{O}_{NCA}$
• Derivation: the gradient of the objective with respect to the network outputs is
  $\dfrac{\partial \mathcal{O}_{NCA}}{\partial f(x_a \mid W)} = 2 \sum_{b \neq a} \big[\, p_{ab}\,(\pi_a - \delta_{c_a c_b}) + p_{ba}\,(\pi_b - \delta_{c_a c_b}) \,\big] \big( f(x_a \mid W) - f(x_b \mid W) \big)$
  where $\pi_a = \sum_{b \,:\, c_b = c_a} p_{ab}$ and $\delta_{c_a c_b} = 1$ if $c_a = c_b$, else $0$.

Page 21

Learning Rule
• Backpropagation to maximize $\mathcal{O}_{NCA}$
• Derivation: standard backpropagation propagates this gradient through the network.
  Output layer: $\delta^{(L)} = \dfrac{\partial \mathcal{O}_{NCA}}{\partial f(x_a \mid W)}$ (for linear code units; multiply by $\sigma'(z^{(L)})$ otherwise)
  Inner layers: $\delta^{(l)} = \big(W^{(l+1)}\big)^{\!\top} \delta^{(l+1)} \odot \sigma'(z^{(l)})$

Page 22

Learning Rule
• Backpropagation to maximize $\mathcal{O}_{NCA}$
• Derivation (continued): the weight gradients follow from the layer deltas, $\dfrac{\partial \mathcal{O}_{NCA}}{\partial W^{(l)}} = \delta^{(l)} \, h^{(l-1)\top}$, where $h^{(l-1)}$ is the previous layer's activation; a numerical check of the output-layer gradient is sketched below.
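The slides' derivation details did not survive extraction, so as a sanity check on the gradient formula above, here is a small sketch (my own, not the authors' code) that compares the analytic $\partial \mathcal{O}_{NCA} / \partial f(x_a \mid W)$ against finite differences, treating the code vectors directly as free variables.

```python
import numpy as np

def probs(Y):
    """p_ab over squared Euclidean distances between code vectors."""
    D = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    E = np.exp(-D)
    np.fill_diagonal(E, 0.0)            # a point never selects itself
    return E / E.sum(axis=1, keepdims=True)

def O_nca(Y, c):
    """Expected number of correctly classified points."""
    P = probs(Y)
    return (P * (c[:, None] == c[None, :])).sum()

def grad_O_nca(Y, c):
    """Analytic dO/dY using the formula from page 20."""
    P = probs(Y)
    same = (c[:, None] == c[None, :]).astype(float)
    pi = (P * same).sum(axis=1)          # pi_a = p(correct class for a)
    M = P * (pi[:, None] - same)         # p_ab (pi_a - delta_ab)
    C = M + M.T                          # both roles of y_a in the objective
    diff = Y[:, None, :] - Y[None, :, :]
    return 2 * (C[:, :, None] * diff).sum(axis=1)

rng = np.random.default_rng(1)
Y = rng.normal(size=(6, 3))
c = np.array([0, 0, 1, 1, 2, 2])

# Finite-difference check of one coordinate: the two numbers should match.
eps = 1e-6
Yp, Ym = Y.copy(), Y.copy()
Yp[0, 0] += eps
Ym[0, 0] -= eps
print(grad_O_nca(Y, c)[0, 0], (O_nca(Yp, c) - O_nca(Ym, c)) / (2 * eps))
```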

Page 23

Details
• Pre-training
  Mini-batches, each containing 100 cases
  Epochs: 50
• Fine-tuning
  Method: conjugate gradients on larger mini-batches of 5,000, with three line searches performed for each mini-batch
  Epochs: 50
• Dataset
  60,000 training images, 10,000 for validation

Page 24

Experiment

Page 25

Result

Page 26

Result

Page 27

Appendix

Regularized Nonlinear NCA

Page 28

Regularized Nonlinear NCA
• Objective (as given in the paper): maximize $C = \lambda \, \mathcal{O}_{NCA} + (1 - \lambda)(-E)$, where $E$ is the reconstruction error of the corresponding autoencoder and $0 \le \lambda \le 1$ trades off the two terms, so unlabeled data can improve the embedding.

Page 29

Application

• Learn compact binary codes that allow efficient retrieval
  Gist descriptor + Locality Sensitive Hashing scheme + nonlinear NCA
  Dataset: LabelMe, 22,000 images; labels: {human, woman, man, etc.}

Torralba, Antonio, Rob Fergus, and Yair Weiss. "Small codes and large image databases for recognition." IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.

http://labelme2.csail.mit.edu/Release3.0/browserTools/php/publications.php

Page 30

Neural Network

Toy Example: AND gate, XOR gate

Page 31

AND gate

[Figure: a single neuron with inputs $x$, $y$, and a bias input 1, weights $w_0$, $w_1$, $w_2$, and sigmoid output $sigm(z)$]

Truth table:

x y | t
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1

$z = x \cdot w_0 + y \cdot w_1 + 1 \cdot w_2$

$sigm(x) = \dfrac{1}{1 + e^{-x}}$
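A sketch of this single sigmoid neuron with hand-picked weights that realize AND; the weight values are illustrative, not from the slide.

```python
import numpy as np

def sigm(x):
    return 1.0 / (1.0 + np.exp(-x))

# z = x*w0 + y*w1 + 1*w2; weights chosen so sigm(z) ~ x AND y.
w0, w1, w2 = 20.0, 20.0, -30.0

for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    z = x * w0 + y * w1 + 1 * w2
    print(x, y, round(sigm(z)))  # -> 0, 0, 0, 1
```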

Page 32

XOR gate

Truth table:

x y | t
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 0

[Figure: a two-layer network. Hidden unit 1: $z_1 = x \cdot w_{00} + y \cdot w_{01} + 1 \cdot w_{02}$, output $sigm(z_1)$. Hidden unit 2: $z_2 = x \cdot w_{10} + y \cdot w_{11} + 1 \cdot w_{12}$, output $sigm(z_2)$. Output unit: $z_3 = sigm(z_1) \cdot w_{20} + sigm(z_2) \cdot w_{21} + 1 \cdot w_{22}$, final output $sigm(z_3)$.]
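And the two-layer version for XOR, with hand-picked hidden units computing approximately OR and NAND (again, illustrative weights only; a trained network would learn its own values).

```python
import numpy as np

def sigm(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hidden unit 1 ~ OR(x, y), hidden unit 2 ~ NAND(x, y),
# output ~ AND(h1, h2) = XOR(x, y).
W_hidden = np.array([[20.0, -20.0],    # weights on x  (w00, w10)
                     [20.0, -20.0]])   # weights on y  (w01, w11)
b_hidden = np.array([-10.0, 30.0])     # biases        (w02, w12)
w_out = np.array([20.0, 20.0])         # (w20, w21)
b_out = -30.0                          # w22

for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = sigm(np.array([x, y]) @ W_hidden + b_hidden)
    print(x, y, round(sigm(h @ w_out + b_out)))  # -> 0, 1, 1, 0
```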