Transfer Learning with Applications to Text Classification Jing Peng Computer Science Department.
Transcript of Transfer Learning with Applications to Text Classification Jing Peng Computer Science Department.
[Page 1]
Transfer Learning with Applications to Text Classification
Jing Peng, Computer Science Department
[Page 2]
Machine learning:
study of algorithms that
① improve performance P
② on some task T
③ using experience E
Well defined learning task: <P,T,E>
[Page 3]
Learning to recognize targets in images:
[Page 4]
Learning to classify text documents:
[Page 5]
Learning to build forecasting models:
[Page 6]
Growth of Machine Learning
Machine learning is the preferred approach to:
① speech processing
② computer vision
③ medical diagnosis
④ robot control
⑤ news article processing
⑥ …
This machine learning niche is growing because of:
① improved machine learning algorithms
② lots of available data
③ software too complex to code by hand
④ …
[Page 7]
Learning: least squares methods

Given a sample $z = \{(x_i, y_i)\}_{i=1}^{m}$, learning focuses on minimizing the empirical error:

$$f_z = \arg\min_{f \in H} \frac{1}{m} \sum_{i=1}^{m} \big(f(x_i) - y_i\big)^2$$

where $f_\rho$ represents the target function and $H$ is the hypothesis space, with best-in-class function

$$f_H = \arg\min_{f \in H} \int_X (f - f_\rho)^2 \, d\rho.$$

Approximation error: $\int_X (f_H - f_\rho)^2 \, d\rho$.

Estimation error: $S(z, H)$, the excess error of $f_z$ over $f_H$.

The total error decomposes as

$$\int_X (f_z - f_\rho)^2 \, d\rho = S(z, H) + \int_X (f_H - f_\rho)^2 \, d\rho.$$
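The empirical minimizer above can be sketched with a toy least-squares fit (a hypothetical example: the affine hypothesis space and the synthetic data are made up for illustration):

```python
import numpy as np

# Toy sample z = {(x_i, y_i)}_{i=1}^m drawn around a linear target f_rho(x) = 2x + 1.
rng = np.random.default_rng(0)
m = 50
x = rng.uniform(0, 1, m)
y = 2 * x + 1 + 0.01 * rng.normal(size=m)

# Hypothesis space H: affine functions f(x) = w*x + b.
# f_z = argmin_{f in H} (1/m) * sum_i (f(x_i) - y_i)^2, solved in closed form.
A = np.column_stack([x, np.ones(m)])
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)

empirical_risk = np.mean((A @ np.array([w, b]) - y) ** 2)
print(w, b, empirical_risk)
```

With low noise, the empirical minimizer recovers the target closely; the remaining gap to $f_\rho$ is exactly the estimation plus approximation error above.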
[Page 8]
Main challenges:
1. Transfer learning
2. High dimensionality (more than 4,000 features)
3. Overlapping feature sets (fewer than 80% of the features are the same)
4. A solution with performance bounds
Transfer Learning with Applications to Text Classification
[Page 9]
Standard Supervised Learning
New York Times training data (labeled) → classifier → New York Times test data (unlabeled): 85.5% accuracy.
[Page 10]
In reality…

Labeled New York Times data is not available! The classifier is trained on labeled Reuters data instead and tested on unlabeled New York Times data: 64.1% accuracy.
[Page 11]
Domain difference ⇒ performance drop

| train | test | accuracy | setting |
|---|---|---|---|
| New York Times | New York Times | 85.5% | ideal |
| Reuters | New York Times | 64.1% | realistic |
[Page 12]
High-dimensional data transfer

High-dimensional data: text categorization, image classification. The number of features in our experiments is more than 4,000.

Challenges: high dimensionality. There are more features than training examples, and Euclidean distance becomes meaningless.
[Page 13]
Why dimension reduction?

[Figure: maximum distance DMAX vs. minimum distance DMIN]
[Page 14]
Curse of dimensionality

[Figure: distance contrast shrinking as the number of dimensions grows]
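The shrinking contrast between DMAX and DMIN can be reproduced with a small simulation (illustrative only; the point counts, dimensions, and uniform distribution are arbitrary choices, not the slides' setup):

```python
import numpy as np

def distance_contrast(dim, n=200, seed=0):
    """Relative contrast (DMAX - DMIN) / DMIN among n random points."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform(size=(n, dim))
    # Euclidean distances from the first point to all the others.
    d = np.linalg.norm(pts[1:] - pts[0], axis=1)
    return (d.max() - d.min()) / d.min()

low, high = distance_contrast(2), distance_contrast(1000)
print(low, high)  # contrast collapses as dimensionality grows
```

In high dimensions the nearest and farthest neighbors become nearly equidistant, which is why raw Euclidean distance loses its discriminating power.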
[Page 16]
High-dimensional data transfer (continued)

Are the feature sets completely overlapping? No: fewer than 80% of the features are the same.

Are the marginal distributions not closely related? Then transferable structures are harder to find, and a proper similarity definition is needed.
[Page 17]
PAC (Probably Approximately Correct) learning requirement
Training and test distributions must be the same
[Page 18]
Transfer between high dimensional overlapping distributions
• Overlapping distributions: data from the two domains may not come from the same part of the space; at best the regions overlap.
[Page 19]
Transfer between high-dimensional overlapping distributions

• Overlapping distributions: data from the two domains may not lie in exactly the same feature space; at best the spaces overlap.

|   | x    | y | z   | label |
|---|------|---|-----|-------|
| A | ?    | 1 | 0.2 | +1    |
| B | 0.09 | ? | 0.1 | +1    |
| C | 0.01 | ? | 0.3 | -1    |
[Page 23]
Problems with overlapping distributions: the overlapping features alone may not provide sufficient predictive power.

|   | f1   | f2 | f3  | label |
|---|------|----|-----|-------|
| A | ?    | 1  | 0.2 | +1    |
| B | 0.09 | ?  | 0.1 | +1    |
| C | 0.01 | ?  | 0.3 | -1    |

Hard to predict correctly.
[Page 27]
Overlapping distributions: use the union of all features and fill in the missing values with "zeros"?

|   | f1   | f2 | f3  | label |
|---|------|----|-----|-------|
| A | 0    | 1  | 0.2 | +1    |
| B | 0.09 | 0  | 0.1 | +1    |
| C | 0.01 | 0  | 0.3 | -1    |

Does it help?
[Page 30]
Transfer between high-dimensional overlapping distributions

D²{A, B} = 0.0181 > D²{A, C} = 0.0101

With zero filling, A is misclassified into the class of C instead of B.
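The misleading effect of zero filling can be checked directly (a toy recomputation of the running example; the distances here are over the full zero-filled union of features, so the absolute values may differ from the slides, but the ordering is the point):

```python
import numpy as np

# Zero-filled feature vectors for the running example (missing values -> 0).
A = np.array([0.0, 1.0, 0.2])   # A's f1 is missing, filled with 0
B = np.array([0.09, 0.0, 0.1])  # B's f2 is missing, filled with 0
C = np.array([0.01, 0.0, 0.3])  # C's f2 is missing, filled with 0

d2_AB = np.sum((A - B) ** 2)
d2_AC = np.sum((A - C) ** 2)
print(d2_AB, d2_AC)  # d2_AB > d2_AC: A looks closer to C, the wrong class
```

The zero fills dominate the comparison, so a nearest-neighbor rule puts A with C even though A and B share the same label.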
[Page 33]
Transfer between high dimensional overlapping distributions
When one uses the union of overlapping and non-overlapping features and replaces the missing values with "zeros", the distance between the two marginal distributions p(x) can become asymptotically very large as a function of the non-overlapping features: they become the dominant factor in the similarity measure.
[Page 34]
High dimensionality can undermine important features
Transfer between high dimensional overlapping distributions
[Page 35]
Transfer between high dimensional overlapping distributions
[Page 36]
Transfer between high dimensional overlapping distributions
The "blues" are closer to the "greens" than to the "reds".
[Page 37]
LatentMap: a two-step correction

1. Missing-value regression: brings the marginal distributions closer.
2. Latent-space dimensionality reduction: brings the marginal distributions even closer, ignores unimportant, noisy, and "error-imported" features, and identifies transferable substructures across the two domains.
[Page 38]
Missing-value regression: predict the missing values (recall the previous example).

1. Project onto the overlapped features.
2. Map from z to x, using a relationship found by regression.

D{img(A'), B} = 0.0109 < D{img(A'), C} = 0.0125

A is now correctly classified into the same class as B.
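The regression step can be sketched with ordinary least squares on the overlapping feature (a toy version of the example; the slides' exact regression model and resulting distances may differ):

```python
import numpy as np

# B and C observe both x (f1) and z (f3); A observes only z.
z_src = np.array([0.1, 0.3])    # z values for B, C
x_src = np.array([0.09, 0.01])  # x values for B, C
z_A = 0.2                       # A's observed overlapping feature

# Fit x = b0 + b1*z on the source points, then impute A's missing x.
A_mat = np.column_stack([np.ones_like(z_src), z_src])
(b0, b1), *_ = np.linalg.lstsq(A_mat, x_src, rcond=None)
x_A = b0 + b1 * z_A
print(x_A)  # the imputed value replaces the zero fill
```

Replacing the arbitrary zero with a regression-based estimate keeps A's coordinates in the same range as B's and C's, which is what brings the marginal distributions closer.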
[Page 45]
Dimensionality reduction: stack the out-domain and in-domain word vectors into a single word-vector matrix X, aligning the overlapping features, with the missing values filled in by the regression step.
[Page 50]
Dimensionality reduction

• Project the word-vector matrix onto its most important, inherent subspace. With the truncated SVD $X_{d \times t} \approx U_k \Sigma_k V_k^T$, the low-dimensional representation is

$$\Sigma_k^{-1} U_k^T X = V_k^T.$$
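The projection can be sketched with numpy's SVD (a minimal sketch: the matrix here is random and k is an arbitrary choice, not the paper's setting):

```python
import numpy as np

rng = np.random.default_rng(0)
d, t, k = 50, 30, 5            # d features, t documents, k latent dimensions
X = rng.normal(size=(d, t))    # word-vector matrix (rows: features, cols: docs)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
Uk, sk, Vkt = U[:, :k], s[:k], Vt[:k, :]

# Low-dimensional representation: Sigma_k^{-1} U_k^T X, which equals V_k^T.
low_dim = np.diag(1.0 / sk) @ Uk.T @ X
print(low_dim.shape)  # (k, t): each document is now a k-dimensional vector
```

Because the columns of U are orthonormal, the projection reduces exactly to the top-k right singular vectors, so each of the t documents gets a k-dimensional coordinate in the latent subspace.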
[Page 53]
Solution (high dimensionality): recall the previous example. Before the correction, the blues are closer to the greens than to the reds; after the correction, the blues are closer to the reds than to the greens.
Properties

It brings the marginal distributions of the two domains closer: first in the high-dimensional space (Section 3.2), then further in the low-dimensional space (Theorem 3.2).

It brings the conditional distributions of the two domains closer: nearby instances from the two domains have similar conditional distributions (Section 3.3).

It reduces the domain-transfer risk: the risk of the nearest-neighbor classifier can be bounded in transfer-learning settings (Theorem 3.3).
[Page 59]
Experiment (I)

Data sets:
- 20 Newsgroups: 20,000 newsgroup articles
- SRAA (simulated/real auto/aviation): 73,128 articles from 4 discussion groups (simulated auto racing, simulated aviation, real autos, and real aviation)
- Reuters-21578: 21,578 Reuters news articles (1987)
[Page 60]
First fill in the "gap", then use a kNN classifier for classification.

Example split on 20 Newsgroups: each top-level category (e.g., comp, rec) is divided into out-domain and in-domain subcategories (comp.sys vs. comp.graphics, rec.sport vs. rec.auto).
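The evaluation loop can be sketched with a plain 1-nearest-neighbor classifier in a shared low-dimensional space (a hypothetical, self-contained sketch with synthetic vectors, not the paper's actual pipeline or data):

```python
import numpy as np

def knn_predict(train_X, train_y, test_X):
    """1-nearest-neighbor prediction with Euclidean distance."""
    preds = []
    for x in test_X:
        d = np.linalg.norm(train_X - x, axis=1)
        preds.append(train_y[np.argmin(d)])
    return np.array(preds)

# Synthetic out-domain (training) and in-domain (test) vectors, assumed to be
# already mapped into a shared low-dimensional space.
rng = np.random.default_rng(0)
train_X = np.vstack([rng.normal(0, 0.3, (20, 5)), rng.normal(2, 0.3, (20, 5))])
train_y = np.array([0] * 20 + [1] * 20)
test_X = np.vstack([rng.normal(0.2, 0.3, (10, 5)), rng.normal(1.8, 0.3, (10, 5))])
test_y = np.array([0] * 10 + [1] * 10)

acc = np.mean(knn_predict(train_X, train_y, test_X) == test_y)
print(acc)
```

Once the two domains share a low-dimensional representation, even this simple classifier transfers well; that is the setting in which the risk bound of Theorem 3.3 applies.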
[Page 61]
Baseline methods:
- naïve Bayes, logistic regression, SVMs
- Knn-Reg: missing values filled, but without SVD
- pLatentMap: SVD, but with missing values left as 0
[Page 62]
The baselines Knn-Reg and pLatentMap each omit one of the two steps, to justify both steps of our framework.
[Page 63]
Learning Tasks
[Page 64]
Experiment (II): overall performance, 10 wins, 1 loss.
[Page 65]
Experiment (III)

Compared with knnReg (missing values filled, but without SVD): 8 wins, 3 losses.

Compared with pLatentMap (SVD, but without filling missing values): 8 wins, 3 losses.
[Page 66]
Conclusion

Problem: high-dimensional, overlapping domain transfer (text and image categorization).

Step 1: fill in the missing values. This brings the two domains' marginal distributions closer.

Step 2: SVD dimension reduction. This brings the two marginal distributions even closer (Theorem 3.2) and clusters points from the two domains, making the conditional distribution transferable (Theorem 3.3).
[Page 67]