EigenTransfer: A Unified Framework for Transfer Learning

Wenyuan Dai, Ou Jin, Gui-Rong Xue, Qiang Yang and Yong Yu

Shanghai Jiao Tong University & Hong Kong University of Science and Technology

Outline

◦ Motivation
◦ Problem Formulation
◦ Graph Construction
◦ Simple Review on Spectral Analysis
◦ Learning from Graph Spectra
◦ Experimental Results
◦ Conclusion

Motivation

A variety of transfer learning tasks have been investigated.

◦ Lifelong Learning (Thrun, 1996)
◦ Multi-task Learning (Caruana, 1997)
◦ Cross-domain Learning (Wu et al., 2004)
◦ Cross-category Learning (Raina et al., 2006)
◦ Self-taught Learning (Raina et al., 2007)

General Framework

Difference
◦ Different tasks
◦ Different approaches & algorithms

Common
◦ Auxiliary Data, Target Data (Training) and Target Data (Test) share common parts or relations

We can have a graph:

[Figure: graph with node groups Features, Auxiliary Data, Training Data, Test Data and Labels, leading to a New Representation]

We can get a new representation of the Training Data and Test Data by spectral analysis of this graph.

A traditional non-transfer learner can then be applied to the new representation.

Problem Formulation

◦ Target Training Data: $\{x_t^{(i)}\}_{i=1}^{n}$, with labels
◦ Target Test Data: $\{x_u^{(i)}\}_{i=1}^{k}$, without labels
◦ Auxiliary Data: $\{x_a^{(i)}\}_{i=1}^{m}$

Task
◦ Cross-domain Learning
◦ Cross-category Learning
◦ Self-taught Learning

Graph Construction

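Cross-domain Learning

◦ $x_t^{(i)}$ -($\phi_{i,j}$)- $f^{(j)}$
◦ $x_u^{(i)}$ -($\phi_{i,j}$)- $f^{(j)}$
◦ $x_a^{(i)}$ -($\phi_{i,j}$)- $f^{(j)}$
◦ $x_t^{(i)}$ -(1)- $C^{(j)}$
◦ $x_a^{(i)}$ -(1)- $C^{(j)}$

Cross-category Learning

◦ $x_t^{(i)}$ -($\phi_{i,j}$)- $f^{(j)}$
◦ $x_u^{(i)}$ -($\phi_{i,j}$)- $f^{(j)}$
◦ $x_a^{(i)}$ -($\phi_{i,j}$)- $f^{(j)}$
◦ $x_t^{(i)}$ -(1)- $C_t^{(j)}$
◦ $x_a^{(i)}$ -(1)- $C_a^{(j)}$

Self-taught Learning

◦ $x_t^{(i)}$ -($\phi_{i,j}$)- $f^{(j)}$
◦ $x_u^{(i)}$ -($\phi_{i,j}$)- $f^{(j)}$
◦ $x_a^{(i)}$ -($\phi_{i,j}$)- $f^{(j)}$
◦ $x_t^{(i)}$ -(1)- $C_t^{(j)}$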

[Figure: the Doc-Token matrix (rows: Doc; columns: Token) is placed into an adjacency matrix whose rows and columns are Doc, Feature and Label; several blocks of the adjacency matrix are 0.]
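The construction above can be sketched in code. The following is a minimal illustration, not the authors' implementation: the function name `build_adjacency`, the node ordering (documents, then features, then labels) and the toy counts are all assumptions made for the example.

```python
import numpy as np

def build_adjacency(doc_feature, doc_label):
    """Symmetric adjacency matrix over [docs | features | labels] nodes.

    doc_feature: (n_docs, n_features) non-negative edge weights
                 (for example, term counts).
    doc_label:   (n_docs, n_labels) 0/1 indicators; a row is all zeros
                 for a document with no label edge (such as a test document).
    """
    n_docs, n_feat = doc_feature.shape
    n_lab = doc_label.shape[1]
    n = n_docs + n_feat + n_lab
    W = np.zeros((n, n))
    f0, l0 = n_docs, n_docs + n_feat      # start of feature / label blocks
    W[:n_docs, f0:l0] = doc_feature       # doc-feature edges
    W[:n_docs, l0:] = doc_label           # doc-label edges with weight 1
    return W + W.T                        # mirror to make the graph undirected

# Toy data (made up): 2 training docs, 1 test doc, 1 auxiliary doc.
doc_feature = np.array([[2, 0, 1],
                        [0, 1, 0],
                        [1, 1, 0],
                        [0, 3, 1]], dtype=float)
doc_label = np.array([[1, 0],
                      [0, 1],
                      [0, 0],             # test doc: no label edge
                      [0, 1]], dtype=float)
W = build_adjacency(doc_feature, doc_label)
print(W.shape)
```

Only the doc-feature and doc-label blocks are filled in this sketch, so all other blocks stay zero.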

Simple Review on Spectral Analysis

G is an undirected weighted graph with weight matrix W, where $W_{ij} = W_{ji} \ge 0$.

D is a diagonal matrix, where $D_{ii} = \sum_j W_{ij}$.

Unnormalized graph Laplacian matrix:

$L = D - W$

Normalized graph Laplacians:

$L_{sym} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2}$

$L_{rw} = D^{-1} L = I - D^{-1} W$
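As a small numerical check of these definitions, the sketch below computes the three Laplacians for a three-node path graph. The helper name `graph_laplacians` and the toy graph are assumptions for illustration, and the code assumes every node has positive degree.

```python
import numpy as np

def graph_laplacians(W):
    """Return (L, L_sym, L_rw) for a symmetric, non-negative weight matrix W.

    Assumes every node has positive degree, so D is invertible.
    """
    d = W.sum(axis=1)                     # D_ii = sum_j W_ij
    L = np.diag(d) - W                    # unnormalized Laplacian
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]   # D^{-1/2} L D^{-1/2}
    L_rw = L / d[:, None]                 # D^{-1} L (row i divided by D_ii)
    return L, L_sym, L_rw

# Path graph on three nodes, 0 - 1 - 2, with unit weights.
W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
L, L_sym, L_rw = graph_laplacians(W)
print(L)
```

Each row of `L` and of `L_rw` sums to zero, which is a quick sanity check when building these matrices by hand.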

Calculate the first k eigenvectors $v_1, v_2, \ldots, v_k$.

The new representation:

[Figure: matrix with rows Node1, Node2, Node3, Node4, … and columns v1, v2, v3; the Node2 row is the new feature vector of Node2.]
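The row-per-node reading of the eigenvector matrix can be illustrated as follows. This sketch assumes that "first k" means the eigenvectors with the k smallest eigenvalues, and it uses the unnormalized Laplacian on a made-up graph of two triangles joined by one weak edge; `first_k_eigenvectors` is a hypothetical helper name.

```python
import numpy as np

def first_k_eigenvectors(L, k):
    """Eigenpairs of symmetric L for the k smallest eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(L)  # eigh returns ascending eigenvalues
    return eigvals[:k], eigvecs[:, :k]

# Toy graph (made up): triangles {0, 1, 2} and {3, 4, 5}, weakly linked.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
L = np.diag(W.sum(axis=1)) - W

eigvals, V = first_k_eigenvectors(L, k=2)
node2_features = V[1]   # row of the second node ("Node2" on the slide)
print(node2_features)
```

In this toy graph the second column of `V` takes one sign on the first triangle and the opposite sign on the second, so nodes that are closely connected receive similar feature vectors.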

Learning from Graph Spectra

◦ Graph G
◦ Adjacency matrix of G: $W$
◦ Graph Laplacian of G: $L = D - W$
◦ Solve the generalized eigenproblem: $Lv = \lambda Dv$
◦ The first k eigenvectors form a new feature representation.
◦ Apply traditional learners such as NB, SVM

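[Figure: the adjacency matrix W over Doc, Feature and Label nodes gives the Laplacian L; its eigenvectors v1, v2 give new rows for the Train, Test, Auxiliary, Feature and Label nodes; the v1, v2 rows of Train are used to train a Classifier, which is applied to the v1, v2 rows of Test.]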
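The whole pipeline can be sketched end to end on a toy graph. Several things here are assumptions rather than details from the slides: "first k" is taken to mean the k smallest eigenvalues, the generalized problem $Lv = \lambda Dv$ is solved through the substitution $v = D^{-1/2}u$ (which turns it into the symmetric problem $L_{sym}u = \lambda u$), a nearest-centroid rule stands in for NB/SVM, and all counts are made up.

```python
import numpy as np

def eigen_representation(W, k):
    """First k generalized eigenvectors of L v = lambda D v (smallest lambda).

    Uses v = D^{-1/2} u with u an eigenvector of L_sym = I - D^{-1/2} W D^{-1/2}.
    Assumes every node has positive degree.
    """
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, U = np.linalg.eigh(L_sym)    # ascending eigenvalues
    V = d_inv_sqrt[:, None] * U[:, :k]    # map back: v = D^{-1/2} u
    return eigvals[:k], V

def nearest_centroid_predict(X_train, y_train, X_test):
    """Stand-in for NB/SVM: assign each test row to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

# Toy graph (made-up data): 6 docs, 5 features, 2 labels.
doc_feature = np.array([
    [2, 1, 0, 0, 1],   # doc 0: target training, class 0
    [0, 0, 2, 1, 1],   # doc 1: target training, class 1
    [1, 2, 0, 0, 1],   # doc 2: target test (true class 0)
    [0, 0, 1, 2, 1],   # doc 3: target test (true class 1)
    [3, 1, 0, 0, 0],   # doc 4: auxiliary, class 0
    [0, 0, 3, 1, 0],   # doc 5: auxiliary, class 1
], dtype=float)
doc_label = np.array([[1, 0], [0, 1], [0, 0], [0, 0], [1, 0], [0, 1]],
                     dtype=float)

n_docs, n_feat = doc_feature.shape
n = n_docs + n_feat + doc_label.shape[1]
W = np.zeros((n, n))
W[:n_docs, n_docs:n_docs + n_feat] = doc_feature   # doc-feature edges
W[:n_docs, n_docs + n_feat:] = doc_label           # doc-label edges
W = W + W.T

eigvals, V = eigen_representation(W, k=2)
train_idx, test_idx = [0, 1], [2, 3]
y_train = np.array([0, 1])
pred = nearest_centroid_predict(V[train_idx], y_train, V[test_idx])
print(pred)
```

The classifier only ever sees the rows of `V` that belong to target training and test documents; the auxiliary documents influence the result solely through the shape of the eigenvectors.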

The only remaining problem is the computation time.

Fortunately:
◦ Matrix L is sparse
◦ There are fast algorithms for solving eigenproblems on sparse matrices (Lanczos)

The final computational cost is linear in $nz(L) \cdot k$, where $nz(L)$ is the number of nonzero entries of L.
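A sparse solver makes this concrete. The sketch below is one possible setup, not the authors' code: it assumes SciPy is available and uses `scipy.sparse.linalg.eigsh`, which wraps ARPACK's implicitly restarted Lanczos method. It asks for the largest eigenvalues of $A = D^{-1/2} W D^{-1/2}$, which correspond to the smallest eigenvalues of $L_{sym} = I - A$; the random toy graph and the name `sparse_first_k` are assumptions.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def sparse_first_k(W, k):
    """k smallest eigenpairs of L_sym = I - D^{-1/2} W D^{-1/2} for sparse W.

    Assumes W is symmetric with positive degrees and k < number of nodes.
    """
    d = np.asarray(W.sum(axis=1)).ravel()
    D_inv_sqrt = sp.diags(1.0 / np.sqrt(d))
    A = D_inv_sqrt @ W @ D_inv_sqrt
    # Largest algebraic eigenvalues of A <-> smallest eigenvalues of I - A.
    vals, U = eigsh(A, k=k, which="LA")
    order = np.argsort(1.0 - vals)        # sort ascending in terms of L_sym
    return (1.0 - vals)[order], U[:, order]

# Toy sparse graph (made up): a ring plus random extra edges.
rng = np.random.default_rng(0)
n = 60
upper = np.triu(rng.random((n, n)) < 0.1, k=1).astype(float)
ring = np.eye(n, k=1)
ring[0, n - 1] = 1.0                      # close the ring; all degrees > 0
W_dense = np.maximum(upper, ring)
W_dense = W_dense + W_dense.T
W = sp.csr_matrix(W_dense)

vals, U = sparse_first_k(W, k=3)
print(W.nnz, "nonzero entries out of", n * n)
print(vals)
```

Each Lanczos step is dominated by one sparse matrix-vector product, whose cost grows with the number of stored nonzero entries rather than with the square of the number of nodes.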

Experiments

Basic Process

[Figure: experimental pipeline. Elements shown: Sample (15 positive instances and 15 negative instances); Training Data, Test Data and Auxiliary Data; New Training Data and New Test Data; Classifier (NB/SVM/TSVM); CV; Baseline; Result; Repeat 10 times; Calculate average.]
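One possible reading of this protocol is a repeated-sampling loop: draw 15 positive and 15 negative labeled instances, evaluate, repeat 10 times, and average. The sketch below follows that reading. The function names, the `evaluate` callback and the placeholder error value are hypothetical, and reporting a standard deviation next to the mean is an assumption about what a "±" figure would denote.

```python
import random
from statistics import mean, stdev

def run_trials(positives, negatives, evaluate,
               n_per_class=15, n_trials=10, seed=0):
    """Repeat: sample a small labeled training set, evaluate, then average.

    evaluate(train) must return an error rate in [0, 1]. It stands in for
    "build the new representation, train NB/SVM/TSVM, score on the test data".
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n_trials):
        train = ([(x, 1) for x in rng.sample(positives, n_per_class)]
                 + [(x, 0) for x in rng.sample(negatives, n_per_class)])
        errors.append(evaluate(train))
    return mean(errors), stdev(errors)

# Placeholder evaluator: records its input and returns a fixed dummy value.
calls = []
def dummy_evaluate(train):
    calls.append(train)
    return 0.25   # placeholder error rate, not a real result

positives = list(range(100))        # hypothetical instance ids
negatives = list(range(100, 200))
avg, sd = run_trials(positives, negatives, dummy_evaluate)
print(avg, sd)
```

Fixing the seed keeps the sampled training sets identical across methods, which makes the comparison between a baseline and a transfer method paired rather than independent.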

Cross-domain Learning Data

◦ SRAA
◦ 20 Newsgroups (Lang, 1995)
◦ Reuters-21578

Target data and auxiliary data share the same categories (top directories), but belong to different domains (sub-directories).

[Figure: Cross-domain result with NB. Bar chart over the data sets cdl-sraa1, cdl-sraa2, cdl-20ng1 to cdl-20ng6, cdl-reuters1 to cdl-reuters3, and average; vertical axis from 0 to 0.4; series: Non-Transfer, Simple Combine, EigenTransfer.]

[Figure: Cross-domain result with SVM. Bar chart over the data sets cdl-sraa1, cdl-sraa2, cdl-20ng1 to cdl-20ng6, cdl-reuters1 to cdl-reuters3, and average; vertical axis from 0 to 0.45.]


[Figure: Cross-domain result with TSVM. Bar chart over the data sets cdl-sraa1, cdl-sraa2, cdl-20ng1 to cdl-20ng6, cdl-reuters1 to cdl-reuters3, and average; vertical axis from 0 to 0.4.]


Cross-domain result on average

Classifier   Non-Transfer   Simple Combine   EigenTransfer
NB           0.250±0.036    0.239±0.000      0.134±0.031
SVM          0.190±0.039    0.213±0.000      0.095±0.018
TSVM         0.140±0.038    0.145±0.000      0.101±0.019
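To put these averages on a common scale, the relative change from Non-Transfer to EigenTransfer can be computed directly from the table. This is only arithmetic on the reported means; it assumes the values are error rates (lower is better), which the slide does not state explicitly, and it ignores the ± terms.

```python
# Mean values copied from the cross-domain table above.
non_transfer = {"NB": 0.250, "SVM": 0.190, "TSVM": 0.140}
eigen_transfer = {"NB": 0.134, "SVM": 0.095, "TSVM": 0.101}

def relative_reduction(baseline, value):
    """Fraction by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline

reductions = {name: relative_reduction(non_transfer[name], eigen_transfer[name])
              for name in non_transfer}
for name, r in reductions.items():
    print(f"{name}: {r:.1%}")
```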

Cross-category Learning Data

◦ 20 Newsgroups (Lang, 1995)
◦ Ohscal data set from OHSUMED (Hersh et al., 1994)

Randomly select two categories as target data. Take the other categories as auxiliary labeled data.

[Figure: Cross-category result with NB. Bar chart over the data sets ccl-20ng1 to ccl-20ng5, ccl-ohs1 to ccl-ohs5, and average; vertical axis from 0 to 0.3; series: Non-Transfer, EigenTransfer.]

[Figure: Cross-category result with SVM. Bar chart over the data sets ccl-20ng1 to ccl-20ng5, ccl-ohs1 to ccl-ohs5, and average; vertical axis from 0 to 0.25.]


[Figure: Cross-category result with TSVM. Bar chart over the data sets ccl-20ng1 to ccl-20ng5, ccl-ohs1 to ccl-ohs5, and average; vertical axis from 0 to 0.25.]


Cross-category result on average

Classifier   Non-Transfer   EigenTransfer
NB           0.186±0.038    0.099±0.025
SVM          0.131±0.032    0.065±0.016
TSVM         0.104±0.010    0.091±0.013

Self-taught Learning Data

◦ 20 Newsgroups (Lang, 1995)
◦ Ohscal data set from OHSUMED (Hersh et al., 1994)

Randomly select two categories as target data. Take the other categories as auxiliary data without labels.

[Figure: Self-taught result with NB. Bar chart over the data sets stl-20ng1 to stl-20ng5, stl-ohs1 to stl-ohs5, and average; vertical axis from 0 to 0.35.]


[Figure: Self-taught result with SVM. Bar chart over the data sets stl-20ng1 to stl-20ng5, stl-ohs1 to stl-ohs5, and average; vertical axis from 0 to 0.25.]


[Figure: Self-taught result with TSVM. Bar chart over the data sets stl-20ng1 to stl-20ng5, stl-ohs1 to stl-ohs5, and average; vertical axis from 0 to 0.25.]


Self-taught result on average

Classifier   Non-Transfer   EigenTransfer
NB           0.189±0.038    0.107±0.032
SVM          0.126±0.030    0.070±0.017
TSVM         0.106±0.011    0.098±0.024

[Figure: Effect of the number of Eigenvectors]

[Figure: Labeled Target Data]

Conclusion

◦ We proposed a general transfer learning framework.
◦ It can model a variety of existing transfer learning problems and solutions.
◦ Our experimental results show that it can greatly outperform non-transfer learners in many experiments.

Thank you!
