Transfer Learning for Low-resource Natural...
![Page 1: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/1.jpg)
Yuan Zhang
January 30, 2017
Transfer Learning for Low-resource Natural Language Analysis
![Page 2: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/2.jpg)
Low-resource Problem
• Top-performing systems need large amounts of annotated data
[Bar chart: Dependency Parsing Accuracy on English vs. Size of Training Data. 3.6k tokens: 74.2; 950k tokens: 90.3]
![Page 4: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/4.jpg)
Low-resource Scenarios
Low-resource Languages:
English annotations: > 1 million tokens
Malagasy annotations: ~1,000 tokens
Low-resource Domains:
News articles: > 100k sentences
Medical: ~500 sentences
![Page 5: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/5.jpg)
Our Work: Transfer Learning
• Use rich resources in related source tasks to improve target performance
Source: resource-rich (Similar Languages, Related Domains)
Target: resource-poor, low accuracy
[Chart: improvement via transfer learning on the target, from 74 to 78]
![Page 7: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/7.jpg)
Challenges in Transfer: Multilingual
• Part-of-speech (POS) tagging: different vocabulary
Source: English
a red apple
DET ADJ NOUN
Target: French
une pomme rouge
(a) (apple) (red)
DET NOUN ADJ
• Dependency parsing: different word ordering
![Page 9: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/9.jpg)
Challenges in Transfer: Monolingual
• Domain transfer: different writing-style
Source: Restaurant reviews
The fries were undercooked
Target: Hotel reviews
The room rained water from above
• Aspect transfer: different aspects in the same domain
Source Aspect: IDC
Target Aspect: LVI
FINAL DIAGNOSIS: BREAST (LEFT) … INVASIVE DUCTAL CARCINOMA (IDC) Tumor size: num x num x num cm Grade: 3. Lymphatic vessel invasion (LVI): Not identified. Blood vessel invasion: Suspicious. Margin of invasive carcinoma …
![Page 11: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/11.jpg)
General Setup: Low-resource Transfer
• No annotations for the target task
            Source   Target
Labeled     ✔        ✖
Unlabeled   ✔        ✔
• No parallel data, or a few word translation pairs
• Low level of human effort
  ✦ Existing external resources
  ✦ No feature engineering
Contribution: Improve low-resource transfer in multilingual and monolingual scenarios
![Page 12: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/12.jpg)
Our Approach
Multilingual Transfer:
• Hierarchical tensors for dependency parsing
  - Prior knowledge incorporation without feature engineering
• Multilingual embeddings for POS tagging
Monolingual Transfer:
• Adversarial networks for aspect transfer
![Page 13: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/13.jpg)
Multilingual Transfer for Dependency Parsing
Train on Source Languages: English, Spanish, …
Test on Target Language: French
Je mange une pomme rouge
(I) (eat) (a) (apple) (red)
✴ sentences are non-parallel
[Diagram: a dependency parser trained on the source languages parses the target sentence]
![Page 14: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/14.jpg)
Non-lexical Transfer via Universal POS
Train on Source Languages:
English: PRON VERB DET ADJ NOUN
Spanish: PRON VERB DET NOUN ADJ
…
Test on Target Language:
French: PRON VERB DET NOUN ADJ
[Diagram: the dependency parser takes universal POS tags as input instead of words]
![Page 15: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/15.jpg)
Challenge: Different Word Ordering
Train on Source Languages:
English: PRON VERB DET ADJ NOUN
Spanish: PRON VERB DET NOUN ADJ
…
Test on Target Language:
French: PRON VERB DET NOUN ADJ
![Page 16: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/16.jpg)
Solution: Linguistic Typology
• Form of typological features
Typological Feature                 English     French
87A: Order of Noun and Adjective    ADJ-NOUN    NOUN-ADJ
• Idea of selective transfer
French: 87A=NOUN-ADJ
Spanish: 87A=NOUN-ADJ
English: 87A=ADJ-NOUN ❌
![Page 20: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/20.jpg)
Utilizing Typology Knowledge
[Diagram: approaches placed on two axes, Knowledge Utilization (Low to High) and Engineering Effort (Manual to Automatic)]
• Traditional approach: manual feature engineering
• Tensor scoring: invalid features violating prior knowledge
• Our approach: hierarchical tensor with prior knowledge
![Page 23: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/23.jpg)
Traditional Approach: Feature Engineering
• Manually conjoin standard parsing features with typological features (Täckström et al., 2013)
f100(·) = I{head POS=NOUN, modifier POS=ADJ, direction=Right, 87A=NOUN-ADJ}
✴ 87A: code of noun-adjective typological feature
• Features are selectively shared
English: 87A=ADJ-NOUN, f100(NOUN ADJ) = 0 ❌
Spanish: 87A=NOUN-ADJ, f100(NOUN ADJ) = 1
French: 87A=NOUN-ADJ, f100(NOUN ADJ) = 1
• In practice, need to manually construct hundreds of features
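The conjoined indicator feature on this slide can be sketched in a few lines of Python. This is only an illustration: the `Arc` type, the `TYPOLOGY_87A` table and the function name `f100` mirror the slide's example and are assumptions, not code from Täckström et al. (2013).

```python
from dataclasses import dataclass

# Hypothetical typology table for feature 87A (order of noun and adjective),
# restricted to the three languages shown on the slide.
TYPOLOGY_87A = {"English": "ADJ-NOUN", "Spanish": "NOUN-ADJ", "French": "NOUN-ADJ"}


@dataclass(frozen=True)
class Arc:
    head_pos: str
    modifier_pos: str
    direction: str  # "Left" or "Right"


def f100(arc: Arc, language: str) -> int:
    """Indicator: 1 only if the arc pattern and the language's 87A value both match."""
    return int(
        arc.head_pos == "NOUN"
        and arc.modifier_pos == "ADJ"
        and arc.direction == "Right"
        and TYPOLOGY_87A[language] == "NOUN-ADJ"
    )


arc = Arc(head_pos="NOUN", modifier_pos="ADJ", direction="Right")
for language in ("English", "Spanish", "French"):
    print(language, f100(arc, language))
```

The feature fires for Spanish and French but not for English, so whatever weight it learns from Spanish data is shared with French only. Each such conjunction has to be written by hand, which is the engineering cost the slide points to.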
![Page 24: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/24.jpg)
Tensor Scoring Method
• Represent arc features in a tensor view (e.g., 4-way tensor)
• Automatically capture all possible feature combinations
[Diagram: 4-way tensor with one axis per feature group]
head POS: VERB, NOUN, ADV, ADJ, …
modifier POS: VERB, NOUN, ADV, ADJ, …
direction: LEFT, RIGHT, NULL
typology: NOUN-ADJ, ADJ-NOUN, NULL
![Page 25: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/25.jpg)
Low-rank Feature Representation
• Avoid parameter explosion via low-rank factorization
• Learn feature mappings to a low-rank representation
head POS feature vector (1 × d) × parameter matrix (d × r) = low-rank representation (1 × r)
![Page 28: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/28.jpg)
Low-rank Feature Representation
• Compute low-rank representation of an arc via element-wise product
head POS ⊙ modifier POS ⊙ direction ⊙ typology = low-rank representation of an arc
• Compute arc score as:
S(h → m) = e_0 + e_1 + e_2 + ··· + e_r
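A minimal NumPy sketch of this low-rank scoring, under stated assumptions: the vocabulary sizes, the rank `r`, and the random parameter matrices are all hypothetical stand-ins (in the real model the matrices are learned). It also checks that the low-rank score equals the corresponding entry of the explicit 4-way tensor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary sizes (d) for each feature group, and rank r.
sizes = {"head": 12, "modifier": 12, "direction": 3, "typology": 3}
r = 5

# One d x r parameter matrix per feature group (random here, learned in practice).
params = {name: rng.normal(size=(d, r)) for name, d in sizes.items()}


def one_hot(index: int, size: int) -> np.ndarray:
    vec = np.zeros(size)
    vec[index] = 1.0
    return vec


def arc_score(indices: dict) -> float:
    """Project each 1 x d feature vector to 1 x r, multiply element-wise, then sum."""
    e = np.ones(r)
    for name, idx in indices.items():
        e = e * (one_hot(idx, sizes[name]) @ params[name])
    return float(e.sum())


arc = {"head": 1, "modifier": 3, "direction": 1, "typology": 0}
score = arc_score(arc)

# The same score read off the explicit 4-way tensor that the factors define.
full = np.einsum("ar,br,cr,dr->abcd", *(params[n] for n in sizes))
print(score, full[1, 3, 1, 0])
print("low-rank parameters:", sum(d * r for d in sizes.values()),
      "full tensor entries:", full.size)
```

With these toy sizes the factorized form stores 150 numbers while the explicit tensor has 1296 entries; the gap widens quickly as more feature groups are added.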
![Page 30: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/30.jpg)
Issue of Tensor Methods
• Capture invalid feature combinations and assign non-zero weights
• Should avoid directly taking tensor-product between typology and others
[Diagram: 4-way tensor with one axis per feature group]
head POS: VERB, NOUN, ADV, ADJ, …
modifier POS: VERB, NOUN, ADV, ADJ, …
direction: LEFT, RIGHT, NULL
typology: NOUN-ADJ, ADJ-NOUN, NULL
Invalid Combination: feature combination {VERB, NOUN, LEFT, ADJ-NOUN}
![Page 31: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/31.jpg)
Avoid Product Operation
head POS ⊙ modifier POS ⊙ direction ⊙ typology ❌
![Page 32: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/32.jpg)
Target Feature Combination
• Union of different feature groups
head POS ⊙ modifier POS ⊙ direction
typology
(the two groups are not combined)
![Page 34: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/34.jpg)
Solution: Hierarchical Structure
• Element-wise sum operation over different representations of the same set of atomic features
(head POS ⊙ modifier POS ⊙ direction) + typology = mixed representation
- head POS ⊙ modifier POS ⊙ direction: traditional representation over head, modifier and direction
- typology: typology representation over head, modifier and direction
- element-wise sum: mixed representation over head, modifier and direction
![Page 37: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/37.jpg)
Solution: Hierarchical Structure
label ⊙ [(head POS ⊙ modifier POS ⊙ direction) + typology]: representation over head, modifier, direction and label
label typology: typology representation over head, modifier, direction and label, e.g. subject-verb
The two representations are combined by element-wise sum.
![Page 38: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/38.jpg)
Solution: Hierarchical Structure
low-rank representation of an arc =
head context POS ⊙ modifier context POS ⊙ { label ⊙ [ (head POS ⊙ modifier POS ⊙ direction) + typology ] + label typology }
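The hierarchy built up over the preceding slides (product, sum, product, sum, product) can be sketched in NumPy as follows. Everything concrete here is an assumption made for illustration: the group sizes, the rank, the random parameters, and the simplification that each context group is a single index. It is a reading of the slide diagram, not the author's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
r = 4  # hypothetical rank

# Hypothetical sizes of each atomic feature group.
sizes = {
    "head": 12, "modifier": 12, "direction": 2,
    "typology": 6, "label": 40, "label_typology": 6,
    "head_context": 12, "modifier_context": 12,
}
params = {name: rng.normal(size=(d, r)) for name, d in sizes.items()}


def embed(name: str, index: int) -> np.ndarray:
    """Row lookup, equivalent to multiplying a one-hot vector by the d x r matrix."""
    return params[name][index]


def hierarchical_representation(f: dict) -> np.ndarray:
    e = embed("head", f["head"]) * embed("modifier", f["modifier"]) * embed("direction", f["direction"])
    e = e + embed("typology", f["typology"])              # element-wise sum, not product
    e = embed("label", f["label"]) * e
    e = e + embed("label_typology", f["label_typology"])  # element-wise sum
    e = embed("head_context", f["head_context"]) * embed("modifier_context", f["modifier_context"]) * e
    return e


def arc_score(f: dict) -> float:
    return float(hierarchical_representation(f).sum())


features = {"head": 0, "modifier": 1, "direction": 1, "typology": 2, "label": 5,
            "label_typology": 3, "head_context": 4, "modifier_context": 7}
print(arc_score(features))
```

Because typology enters through a sum rather than a product with head, modifier and direction, no weight is ever attached to a combination of typology with those three groups.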
![Page 40: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/40.jpg)
Algebraic Interpretation
• Algebraically equal to the sum of three multiway tensors with shared parameters
• Capture three groups of feature combinations
1. head context POS ⊙ modifier context POS ⊙ head POS ⊙ modifier POS ⊙ direction ⊙ label
2. head context POS ⊙ modifier context POS ⊙ label ⊙ typology
3. head context POS ⊙ modifier context POS ⊙ label typology
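Writing the hierarchical representation out makes the three-tensor claim explicit. The symbols below are shorthand introduced here, not the slide's notation: h, m, d, l are the low-rank vectors for head POS, modifier POS, direction and label; h_c and m_c are the context vectors; t_u and t_l are the typology and label-typology vectors; ⊙ is the element-wise product, which distributes over the element-wise sum.

```latex
\begin{aligned}
e &= h_c \odot m_c \odot \bigl( l \odot ( h \odot m \odot d + t_u ) + t_l \bigr) \\
  &= \underbrace{h_c \odot m_c \odot l \odot h \odot m \odot d}_{\text{group 1}}
   + \underbrace{h_c \odot m_c \odot l \odot t_u}_{\text{group 2}}
   + \underbrace{h_c \odot m_c \odot t_l}_{\text{group 3}}
\end{aligned}
```

Summing the entries of e therefore gives the sum of three multiway tensor scores that reuse the same parameter matrices, and no term multiplies t_u with h, m or d.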
![Page 41: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/41.jpg)
Avoid Invalid Features
• Assign zero weights to invalid features
• Exclude the combination of typology with head, modifier and direction
[Diagram: head POS ⊙ modifier POS ⊙ direction and typology are not combined]
✴ Weight of {head POS=VERB, mod POS=NOUN, typology=ADJ-NOUN} is 0
![Page 42: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/42.jpg)
Parameter Initialization and Learning
Algebraic view: Compute the gradient for each multiway tensor and take the sum
Tensor initialization: Use iterative power methods
Parameter learning: Adopt online learning with passive-aggressive algorithm
Other details: Follow previous work (Lei et al., 2015)
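For readers unfamiliar with passive-aggressive learning, here is a generic sketch of one update step for a plain linear scorer. It is an assumption-laden illustration: the slide does not say which passive-aggressive variant is used, and the parser's actual parameters are tensor factors rather than a single weight vector `w`. The function name and the `C` cap (the PA-I variant) are choices made here.

```python
import numpy as np


def passive_aggressive_step(w, phi_gold, phi_pred, cost, C=1.0):
    """One PA-I style update for a linear scorer w . phi(x, y).

    phi_gold / phi_pred: feature vectors of the gold and predicted structures.
    cost: structured cost of the prediction (e.g. number of wrong heads).
    """
    delta = phi_gold - phi_pred
    loss = max(0.0, cost - float(w @ delta))  # margin violation
    norm_sq = float(delta @ delta)
    if loss == 0.0 or norm_sq == 0.0:
        return w  # passive: the margin is already satisfied
    tau = min(C, loss / norm_sq)  # aggressive: smallest step that fixes it, capped by C
    return w + tau * delta


w = np.zeros(4)
phi_gold = np.array([1.0, 0.0, 1.0, 0.0])
phi_pred = np.array([0.0, 1.0, 1.0, 0.0])
w_new = passive_aggressive_step(w, phi_gold, phi_pred, cost=1.0)
print(w_new)
```

After this single step the gold structure outscores the prediction by exactly the required margin, so repeating the step on the same example leaves the weights unchanged.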
![Page 43: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/43.jpg)
Experimental Setup
Dataset: Universal Dependency Treebank v2.0
- 10 languages
- Universal POS tags (12 tags)
- Stanford dependency labels (40 labels)
Baselines:
- Direct transfer (McDonald et al., 2005)
- Feature-based transfer (Täckström et al., 2013)
- Traditional multiway tensor
![Page 47: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/47.jpg)
Unsupervised Results
• Setting: no annotations in the target language
Averaged Unlabeled Attachment Score (UAS):
Direct Transfer   67.8
NT-Select         71.5
Multiway          72
Ours              72.6
• NT-Select: our model without the tensor component, corresponding to prior feature-based method (Täckström et al., 2013)
• Multiway: traditional multiway tensor without hierarchical structure
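As a reminder of what the metric measures, the sketch below computes UAS for a single sentence: the percentage of tokens whose predicted head matches the gold head, ignoring dependency labels. The head indices for the example sentence are hypothetical, and the slide's figures are averages over the target languages rather than single-sentence scores.

```python
def unlabeled_attachment_score(gold_heads, pred_heads):
    """Percentage of tokens whose predicted head index equals the gold head index."""
    if len(gold_heads) != len(pred_heads):
        raise ValueError("gold and predicted sequences must have the same length")
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return 100.0 * correct / len(gold_heads)


# "Je mange une pomme rouge": hypothetical head indices (0 = root, tokens numbered from 1).
gold = [2, 0, 4, 2, 4]
pred = [2, 0, 4, 2, 2]  # last token attached to the wrong head
uas = unlabeled_attachment_score(gold, pred)
print(uas)
```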
![Page 48: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/48.jpg)
Semi-supervised Results
• Setting: 50 annotated sentences in the target language
• Sup50: trained only on the 50 sentences in the target language
[Bar chart: Averaged Unlabeled Attachment Score (UAS); systems: Sup50, Direct Transfer, NT-Select, Multiway, Ours; bar values in descending order: 77.9, 76.9, 76.2, 75.6, 73.4]
![Page 49: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/49.jpg)
Summary
• Modeling: we present a hierarchical tensor that effectively uses linguistic prior knowledge
• Performance: our model outperforms the state-of-the-art approach and traditional tensors
• Limitation: our model heavily relies on non-lexical transfer via universal POS tags
Next part: lexical-level multilingual transfer
![Page 50: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/50.jpg)
Our Approach
Multilingual Transfer:
• Hierarchical tensors for dependency parsing
• Multilingual embeddings for POS tagging
  - Effective multilingual transfer with ten translation pairs
Monolingual Transfer:
• Adversarial networks for aspect transfer
![Page 55: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/55.jpg)
Multilingual Transfer of POS Tagging
Tagging Accuracy on German:
Supervised, 700k tokens (Brants, 2000): 98.2
Multilingual Transfer, 2m parallel sentences (Das and Petrov, 2011): 82.8
Prototype-driven, 14 prototypes (Haghighi et al., 2006): 25.5
Multilingual Transfer, Ten Translation Pairs, No parallel sentences: ❓
How little parallel data is necessary to enable multilingual transfer?
![Page 58: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/58.jpg)
Our Work
• Task: multilingual transfer of part-of-speech (POS) tagging
• Data:
            Source   Target
Labeled     ✔        ✖
Unlabeled   ✔        ✔      (non-parallel data)
Ten Translation Pairs
. || .
, || ,
der || the
die || the
in || in
und || and
dem || the
von || from
- || -
zu || to
POS Accuracy on German:
Prototype (Haghighi et al., 2006): 25.5
Ours: 68.7
![Page 59: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/59.jpg)
Our Two-step Method
1. Learn coarse mapping between embeddings via ten translation pairs
2. Refine embedding transformations and model parameters via unsupervised learning on the target language
41
![Page 60: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/60.jpg)
Coarse Mapping between Embeddings
• Goal: find a linear transformation from target to source embedding space
• Objective: minimize the distance between translation pairs
[Figure: monolingual embeddings. Source (English): dog, cat, red, is. Target (German): Hund (dog), Katze (cat), rot (red), ist (is).]
42
![Page 61: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/61.jpg)
Translation Pairs
red || rot
cat || Katze
dog || Hund
42
![Page 62: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/62.jpg)
Too many degrees of freedom
• dimension: 20
• # pairs: 10
• degree of freedom: 10
42
![Page 63: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/63.jpg)
Solutions need to be constrained!
42
![Page 64: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/64.jpg)
Our Solution: Isometric Constraints
• Transformation is an isometric (orthonormal) matrix
• Transformation preserves angles and lengths (cosine similarity) of word vectors, thus preserving semantic relations
Isometric Constraints: $P^\top P = I$
[Figure: the monolingual target (German) embeddings Hund (dog), Katze (cat), rot (red), ist (is) are mapped by $P$ onto the source (English) embeddings dog, cat, red, is (isometric solution). Translation pairs: red || rot, cat || Katze, dog || Hund.]
43
![Page 65: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/65.jpg)
$\cos\langle \text{cat}, \text{dog} \rangle \approx \cos\langle \text{Katze}, \text{Hund} \rangle$, $\cos\langle \text{dog}, \text{red} \rangle \approx \cos\langle \text{Hund}, \text{rot} \rangle$
43
![Page 66: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/66.jpg)
44
![Page 67: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/67.jpg)
45
![Page 68: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/68.jpg)
• Use the steepest descent algorithm (Abrudan et al., 2008)
46
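A minimal sketch of fitting an orthonormal map from a handful of translation pairs, assuming NumPy. The slides optimize this with the steepest descent algorithm of Abrudan et al. (2008); the sketch below swaps in the closed-form orthogonal Procrustes solution via SVD, which targets the same constrained least-squares objective. The function name `fit_isometric_map` and the random toy data are illustrative and not taken from the talk.

```python
import numpy as np

def fit_isometric_map(target_vecs, source_vecs):
    """Return an orthonormal P minimizing sum_i ||P t_i - s_i||^2.

    target_vecs, source_vecs: arrays of shape (n_pairs, dim); row i of each
    array holds the two sides of translation pair i.
    """
    # Cross-covariance between the source and target sides of the pairs.
    m = source_vecs.T @ target_vecs            # (dim, dim)
    u, _, vt = np.linalg.svd(m)
    return u @ vt                              # orthonormal by construction

# Toy data: 10 pairs in 20 dimensions, the sizes quoted in the slides.
rng = np.random.default_rng(0)
dim, n_pairs = 20, 10
true_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
target = rng.normal(size=(n_pairs, dim))
source = target @ true_rotation.T              # s_i = R t_i

P = fit_isometric_map(target, source)
mapped = target @ P.T                          # row i is P t_i

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Cosine similarity between two words is unchanged by the map.
cos_before = cosine(target[0], target[1])
cos_after = cosine(mapped[0], mapped[1])
```

With fewer pairs than dimensions the map is not unique, but any solution returned here satisfies $P^\top P = I$ and therefore keeps cosine similarities intact.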
![Page 69: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/69.jpg)
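Validation of Isometric Constraints
• Validation for $\cos\langle \text{cat}, \text{dog} \rangle \approx \cos\langle \text{Katze}, \text{Hund} \rangle$, $\cos\langle \text{dog}, \text{red} \rangle \approx \cos\langle \text{Hund}, \text{rot} \rangle$
• Verify whether nearest neighbors are preserved after translation
  ✦ For 50% of word pairs, $k \le 2$
[Figure: English: nearest neighbor (dog, cat). German: k-th ($k \le 2$) nearest neighbor? (Hund (dog), Katze (cat)).]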
47
![Page 70: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/70.jpg)
  ✦ For 90% of word pairs, $k \le 10$
[Figure: English: nearest neighbor (dog, cat). German: k-th ($k \le 10$) nearest neighbor? (Hund (dog), Katze (cat)).]
47
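The neighbor-preservation check can be sketched as follows, assuming NumPy and two embedding matrices whose rows are aligned (row i of each matrix is the same word in the two languages). For every word, the code takes its nearest neighbor on the English side and asks at which rank the translated neighbor appears on the German side. The helper names and the synthetic data are assumptions for illustration; the 50% and 90% figures in the slides come from real embeddings, not from this toy.

```python
import numpy as np

def neighbour_order_and_ranks(vectors):
    """Cosine-similarity neighbour lists and 0-based ranks for every word."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit.T
    np.fill_diagonal(sims, -np.inf)            # a word is never its own neighbour
    order = np.argsort(-sims, axis=1)          # order[i, 0] = nearest neighbour of i
    n = len(vectors)
    ranks = np.empty_like(order)
    ranks[np.arange(n)[:, None], order] = np.arange(n)[None, :]
    return order, ranks

def preserved_fraction(source_vecs, target_vecs, k):
    """Fraction of words whose source nearest neighbour is within the top k on the target side."""
    order, _ = neighbour_order_and_ranks(source_vecs)
    _, ranks = neighbour_order_and_ranks(target_vecs)
    nearest = order[:, 0]
    rank_in_target = ranks[np.arange(len(source_vecs)), nearest] + 1   # 1-based rank
    return float(np.mean(rank_in_target <= k))

rng = np.random.default_rng(0)
english = rng.normal(size=(50, 20))
rotation, _ = np.linalg.qr(rng.normal(size=(20, 20)))
german_exact = english @ rotation.T            # a perfectly isometric "translation"
german_noisy = german_exact + 0.5 * rng.normal(size=english.shape)

frac_exact = preserved_fraction(english, german_exact, k=1)
frac_k2 = preserved_fraction(english, german_noisy, k=2)
frac_k10 = preserved_fraction(english, german_noisy, k=10)
```

Under an exact isometry every nearest neighbor is preserved; with noise the fraction can only grow as k is relaxed, which is the pattern the slide reports for k ≤ 2 and k ≤ 10.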
![Page 71: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/71.jpg)
Direct Transfer Model
• Supervised source language HMM
  ✦ Feature-based HMM (Berg-Kirkpatrick et al., 2010)
  ✦ Word embeddings as emission features
Source: $p(x \mid y) \propto \exp\{v_x^\top \mu_y\}$
Direct Transfer (Target): $p_{dt}(x \mid y) \propto \exp\{v_x^\top P \mu_y\}$
48
![Page 72: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/72.jpg)
Coarse mapping is not accurate
48
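The two emission formulas can be illustrated with a short NumPy sketch. It assumes the proportionality is normalized over the vocabulary for each tag (a softmax), that `word_vecs` holds one embedding $v_x$ per row and `tag_vecs` one vector $\mu_y$ per row, and that $P$ is already given. The function name and toy sizes are hypothetical.

```python
import numpy as np

def emission_probs(word_vecs, tag_vecs, P=None):
    """p(x | y) proportional to exp(v_x^T P mu_y), normalized over words x.

    word_vecs: (vocab, dim), tag_vecs: (n_tags, dim), P: (dim, dim) or None
    (None means the identity, i.e. the source-language model).
    Returns an array of shape (n_tags, vocab) whose rows sum to 1.
    """
    if P is None:
        P = np.eye(word_vecs.shape[1])
    scores = (word_vecs @ P @ tag_vecs.T).T            # scores[y, x] = v_x^T P mu_y
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(scores)
    return probs / probs.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
vocab, n_tags, dim = 6, 3, 4
source_words = rng.normal(size=(vocab, dim))
tag_vectors = rng.normal(size=(n_tags, dim))
P, _ = np.linalg.qr(rng.normal(size=(dim, dim)))       # some orthonormal matrix

# Toy target embeddings chosen so that v_target = P v_source; the transferred
# scores v_target^T P mu_y then equal the source scores v_source^T mu_y.
target_words = source_words @ P.T

source_probs = emission_probs(source_words, tag_vectors)
transfer_probs = emission_probs(target_words, tag_vectors, P)
```

In this idealized toy the transferred distribution matches the source one exactly; with a map estimated from ten pairs it would only be approximate, which is the motivation for the refinement step.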
![Page 73: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/73.jpg)
Our Two-step Method
1. Learn coarse mapping between embeddings via ten translation pairs
2. Refine embedding transformations and model parameters via unsupervised learning on the target language
49
![Page 74: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/74.jpg)
Unsupervised Target Language HMM
• Use the direct transfer model (based on the coarse mapping) to initialize and regularize the unsupervised tagger on the target language
• Refine mapping via global linear transformation $M$ and local non-linear adjustment $\theta_{x,y}$
$p(x \mid y) \propto \exp\{v_x^\top P M \mu_y + \theta_{x,y}\}$
50
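A small NumPy sketch of the refined emission score, assuming the same per-tag softmax normalization as a standard feature-based HMM. Here `M` is the global linear correction and `theta` holds one local offset $\theta_{x,y}$ per (tag, word) cell; the array layout, the function name and the toy values are assumptions, and the unsupervised learning of `M` and `theta` is not shown.

```python
import numpy as np

def refined_emission_probs(word_vecs, tag_vecs, P, M, theta):
    """p(x | y) proportional to exp(v_x^T P M mu_y + theta[y, x])."""
    scores = (word_vecs @ P @ M @ tag_vecs.T).T + theta   # shape (n_tags, vocab)
    scores = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    return probs / probs.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
vocab, n_tags, dim = 5, 3, 4
words = rng.normal(size=(vocab, dim))
tags = rng.normal(size=(n_tags, dim))
P, _ = np.linalg.qr(rng.normal(size=(dim, dim)))

# With M = I and theta = 0 the model reduces to direct transfer.
M = np.eye(dim)
theta = np.zeros((n_tags, vocab))
baseline = refined_emission_probs(words, tags, P, M, theta)

# A local adjustment raises the probability of word 2 under tag 0 only.
theta_adjusted = theta.copy()
theta_adjusted[0, 2] = 2.0
adjusted = refined_emission_probs(words, tags, P, M, theta_adjusted)
```

Starting from `M = I` and `theta = 0` mirrors the slide's point that the direct transfer model initializes the target tagger before refinement.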
![Page 75: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/75.jpg)
[Figure: coarse mapping between the German and English embeddings, learned from the translation pairs.]
50
![Page 76: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/76.jpg)
Global: $M$. Local: $\theta_{x,y}$.
[Figure: the coarse mapping learned from the translation pairs is refined by unsupervised learning.]
50
![Page 77: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/77.jpg)
51
![Page 78: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/78.jpg)
52
![Page 79: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/79.jpg)
Experimental Setup
• Datasets: Universal Dependency Treebank v1.2
  ✦ Source: English
  ✦ Target (Indo-European): Danish, German, Spanish
  ✦ Target (non-Indo-European): Finnish, Hungarian, Indonesian
• Universal tagset: 14 tags (noun, verb, adjective, etc.)
• Word embeddings: 20-dimensional vectors trained on Wiki dumps using word2vec
53
![Page 80: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/80.jpg)
Indo-European Results
Averaged Accuracy on Indo-European Languages: Prototype (Haghighi et al., 2006) 31.8, Direct Transfer 60.9, Ours Full 72.9
54
![Page 81: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/81.jpg)
Non-Indo-European Results
Averaged Accuracy on non-Indo-European Languages: Prototype (Haghighi et al., 2006) 27.6, Direct Transfer 57.7, Ours Full 62.1
55
![Page 82: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/82.jpg)
Prediction of Linguistic Typology
• Task: predict whether a language is verb-object or object-verb (five typological properties)
• Features: bigrams and trigrams of POS tags
[Figure: bar chart (axis 40 to 100): Prototype 60, Direct Transfer 66.7, Ours Full 80, Gold 93.]
56
![Page 83: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/83.jpg)
Impact of Amount of Supervision
[Figure: accuracy on German (y-axis, 20 to 90) against the number of translation pairs or prototypes (x-axis: 10, 20, 50, 100, 150, 200, 500, 1000). Curves: Prototype, Direct Transfer, Ours Full (shown as Transfer+EM in the plot legend).]
• Ours Full with 10 pairs = 150 prototypes
![Page 84: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/84.jpg)
• Prototype improves with a large amount of annotations
![Page 85: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/85.jpg)
Summary
• Modeling: ten translation pairs are sufficient to enable multilingual transfer for POS tagging
• Performance: our model significantly outperforms the direct transfer and the prototype-driven method
58
![Page 86: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/86.jpg)
Our Approach
Multilingual Transfer:
• Hierarchical tensors for dependency parsing
• Multilingual embeddings for POS tagging
Monolingual Transfer:
• Adversarial networks for aspect transfer
  - Joint aspect-driven encoding and domain adversarial training
59
![Page 87: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/87.jpg)
Aspect Transfer in Pathology Report
Pathology report:
FINAL DIAGNOSIS: BREAST (LEFT) … INVASIVE DUCTAL CARCINOMA (IDC) Tumor size: num x num x num cm Grade: 3. Lymphatic vessel invasion (LVI): Not identified. Blood vessel invasion: Suspicious. Margin of invasive carcinoma …
Diagnosis results:
IDC: Positive
LVI: Negative
Transfer:
Source: IDC
Target: LVI
60
![Page 88: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/88.jpg)
Challenge
Same report; Different key sentences
Source Aspect: IDC Target Aspect: LVI
• Traditional methods will fail because they always induce the same representation for the same input
FINAL DIAGNOSIS: BREAST (LEFT) … INVASIVE DUCTAL CARCINOMA (IDC) Tumor size: num x num x num cm Grade: 3. Lymphatic vessel invasion (LVI): Not identified. Blood vessel invasion: Suspicious. Margin of invasive carcinoma …
61
![Page 89: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/89.jpg)
Available Supervision
| | Source | Target |
|---|---|---|
| Labeled Data | ✔ | ❌ |
| Unlabeled Data | ✔ | ✔ |
| Relevance Rules | ✔ | ✔ |
• Relevance rules: common names of aspects
  - ALH: Atypical Lobular Hyperplasia, ALH
  - IDC: Invasive Ductal Carcinoma, IDC
62
![Page 90: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/90.jpg)
Transfer Assumption: Aspects Are Related
63
• Different aspects share the same label set: positive/negative
  IDC: Positive
  LVI: Negative
![Page 91: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/91.jpg)
• Common words are directly transferable
  Invasive Carcinoma is present (Label: Positive)
  Lymphatic vessel invasion: present (Label: Positive)
![Page 92: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/92.jpg)
• Aspect-specific words are not directly transferable (e.g., Invasive Ductal Carcinoma vs. Lymphatic Vessel Invasion)
  - Goal: map them to invariant representations
![Page 93: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/93.jpg)
Key Idea: Aspect-driven Encoding
• Leverage relevance rules to learn to identify key sentences
• Learn differential representations for different aspects from the same input
[Figure: the same report ("…… INVASIVE CARCINOMA Tumor size: Grade: 3. Lymphatic vessel invasion: Not identified. ……") is encoded once into a source aspect representation and once into a target aspect representation.]
![Page 94: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/94.jpg)
Reduce aspect transfer to standard domain adaptation
![Page 95: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/95.jpg)
Key Idea: Domain-Adversarial
• Use domain-adversarial training for learning invariant representations
• Jointly train a domain classifier D
  - Objective: source aspect/domain and target aspect/domain representations are not separable by the domain classifier
![Page 96: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/96.jpg)
Overall Framework: Three Components
[Figure: a pathology report goes through the document encoder to give a document representation, which feeds both the label predictor (document label y) and the domain classifier (domain label d).]
66
![Page 97: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/97.jpg)
66
![Page 98: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/98.jpg)
Sentence Embedding
• Apply a CNN to each sentence
[Figure: the word vectors $x_0, x_1, x_2, x_3$ of "ductal carcinoma is identified" are convolved into hidden states $h_1, h_2$, which are max-pooled into the sentence embedding.]
67
![Page 99: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/99.jpg)
• Improve adversarial training by reconstruction
  Reconstruction of $x_2$: $\hat{x}_2 = \tanh(W^c h_2 + b^c)$
67
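The sentence encoder and the reconstruction term can be sketched in NumPy as below. The window width of 3 is an assumption chosen so that four words yield two hidden states, matching the $x_0 \dots x_3$ and $h_1, h_2$ of the slide figure; the tanh inside the convolution, the parameter shapes and all names are likewise illustrative. Only the reconstruction formula $\hat{x}_2 = \tanh(W^c h_2 + b^c)$ is taken from the slide.

```python
import numpy as np

def sentence_embedding(word_vecs, W, b, width=3):
    """1-D convolution over word windows followed by max-pooling.

    word_vecs: (n_words, d_in); W: (d_hid, width * d_in); b: (d_hid,)
    Returns (hidden, embedding): hidden is (n_windows, d_hid) and embedding
    is the element-wise maximum over windows, shape (d_hid,).
    """
    n_windows = len(word_vecs) - width + 1
    windows = np.stack([word_vecs[i:i + width].reshape(-1) for i in range(n_windows)])
    hidden = np.tanh(windows @ W.T + b)        # activation choice is an assumption
    return hidden, hidden.max(axis=0)

def reconstruct_word(h, W_c, b_c):
    """x_hat = tanh(W^c h + b^c), the reconstruction formula on the slide."""
    return np.tanh(W_c @ h + b_c)

rng = np.random.default_rng(0)
d_in, d_hid, width = 5, 3, 3
x = rng.normal(size=(4, d_in))                 # "ductal carcinoma is identified"
W = rng.normal(size=(d_hid, width * d_in))
b = np.zeros(d_hid)
W_c = rng.normal(size=(d_in, d_hid))
b_c = np.zeros(d_in)

hidden, embedding = sentence_embedding(x, W, b, width)
x2_hat = reconstruct_word(hidden[1], W_c, b_c)  # hidden[1] plays the role of h_2
```

The second window covers $x_1, x_2, x_3$, so its hidden state is the natural one from which to reconstruct the centre word $x_2$.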
![Page 100: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/100.jpg)
Aspect-relevance Prediction
• Predict relevance score based on sentence embeddings
• Train on relevance rules (e.g., names of IDC, LVI)
[Figure: each sentence of the pathology report is embedded and given a predicted relevance score, e.g. "INVASIVE CARCINOMA Tumor size … Grade: 3." with r = 1.0 and "Lymphatic vessel invasion: Not identified." with r = 0.0.]
![Page 101: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/101.jpg)
[Figure: the relevance rules supply the ground truth for the predicted relevance scores (ground truth = 1 for the first sentence, ground truth = 0 for the second).]
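One way to turn the relevance rules into sentence-level training targets is plain name matching, sketched below with the standard library only. The rule table copies the aspect names listed in the slides; matching them as case-insensitive whole words is an assumption about how the rules are applied, and `rule_label` is a hypothetical helper.

```python
import re

# Aspect names as listed in the slides' relevance-rule table.
RELEVANCE_RULES = {
    "DCIS": ["DCIS", "Ductal Carcinoma In-Situ"],
    "LCIS": ["LCIS", "Lobular Carcinoma In-Situ"],
    "IDC": ["IDC", "Invasive Ductal Carcinoma"],
    "ALH": ["ALH", "Atypical Lobular Hyperplasia"],
}

def rule_label(sentence, aspect):
    """Return 1.0 if the sentence mentions a name of the aspect, else 0.0."""
    for name in RELEVANCE_RULES[aspect]:
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            return 1.0
    return 0.0

sentences = [
    "INVASIVE DUCTAL CARCINOMA (IDC) Tumor size: num x num x num cm Grade: 3.",
    "Lymphatic vessel invasion (LVI): Not identified.",
]
idc_targets = [rule_label(s, "IDC") for s in sentences]
```

These 0/1 targets would then supervise the relevance predictor, which can generalize beyond the literal names once it is trained on sentence embeddings.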
![Page 102: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/102.jpg)
Aspect-driven Document Encoding
• Combine sentence vectors based on relevance weights
[Figure: the sentence embeddings of the pathology report, weighted by their predicted relevance scores (r = 1.0, r = 0.0), are merged by a weighted combination into the document representation.]
![Page 103: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/103.jpg)
• Add a transformation layer at the end
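The weighted combination can be sketched in a few lines of NumPy. Normalizing the relevance scores so they sum to one and using a tanh layer as the final transformation are both assumptions; the slides only state that sentence vectors are combined by relevance weight and that a transformation layer follows. All names are illustrative.

```python
import numpy as np

def document_representation(sentence_embs, relevance, W=None, b=None):
    """Relevance-weighted average of sentence embeddings, optionally transformed.

    sentence_embs: (n_sentences, d); relevance: (n_sentences,) scores in [0, 1].
    """
    weights = relevance / (relevance.sum() + 1e-8)   # normalization is an assumption
    doc = weights @ sentence_embs
    if W is not None:
        doc = np.tanh(W @ doc + b)                    # hypothetical transformation layer
    return doc

sentence_embs = np.array([[1.0, 2.0, 3.0],
                          [9.0, 9.0, 9.0]])
relevance = np.array([1.0, 0.0])                      # r = 1.0 and r = 0.0 as in the figure
doc = document_representation(sentence_embs, relevance)
```

With scores of 1.0 and 0.0 the document vector is just the embedding of the relevant sentence, so changing the aspect changes the representation of the same report.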
![Page 104: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/104.jpg)
Document Label Predictor
• Shared by both source and target aspects
• Trained on labeled data in the source aspect
• Objective: predict labels
[Figure: pathology report → document representation → ReLU → Softmax → document label y.]
70
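A minimal NumPy sketch of a ReLU-plus-softmax label predictor. The slide names only the two activations, so the single hidden layer, the layer sizes and the parameter names are assumptions.

```python
import numpy as np

def predict_label_probs(doc_repr, W1, b1, W2, b2):
    """Document representation -> ReLU hidden layer -> softmax over labels."""
    hidden = np.maximum(0.0, W1 @ doc_repr + b1)
    logits = W2 @ hidden + b2
    logits = logits - logits.max()                # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

rng = np.random.default_rng(0)
d_doc, d_hid, n_labels = 8, 4, 2                  # two labels: positive / negative
W1, b1 = rng.normal(size=(d_hid, d_doc)), np.zeros(d_hid)
W2, b2 = rng.normal(size=(n_labels, d_hid)), np.zeros(n_labels)

label_probs = predict_label_probs(rng.normal(size=d_doc), W1, b1, W2, b2)
```

Because the same parameters are applied to source and target documents, the predictor can only transfer if the two sets of document representations look alike, which is what the adversary is meant to enforce.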
![Page 105: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/105.jpg)
Domain Classifier and Adversary
• Learn domain-invariant representations
• Train on both labeled and unlabeled data
• Objective: predict domains
• Adversary objective: fail the domain classifier
[Figure: the document representation feeds a second ReLU and Softmax branch that outputs the domain label d, next to the label predictor that outputs the document label y.]
71
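The two opposing objectives can be written down numerically as follows. This sketch assumes the common formulation of domain-adversarial training in which the encoder minimizes the label loss minus a weighted domain loss while the domain classifier minimizes the domain loss itself; the weight `lam`, the cross-entropy losses and all names are assumptions rather than details from the slides.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of integer labels under row-wise probabilities."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))

def adversarial_objectives(label_probs, labels, domain_probs, domains, lam=0.1):
    """Return (encoder_objective, classifier_objective), both to be minimized."""
    label_loss = cross_entropy(label_probs, labels)      # labeled source documents only
    domain_loss = cross_entropy(domain_probs, domains)   # source and target documents
    encoder_objective = label_loss - lam * domain_loss   # encoder tries to fail the classifier
    classifier_objective = domain_loss                   # classifier tries to predict domains
    return encoder_objective, classifier_objective

label_probs = np.array([[0.9, 0.1], [0.2, 0.8]])
labels = np.array([0, 1])
domains = np.array([0, 0, 1, 1])                         # 0 = source, 1 = target

confident = np.array([[0.95, 0.05], [0.9, 0.1], [0.1, 0.9], [0.05, 0.95]])
confused = np.full((4, 2), 0.5)                          # classifier at chance level

enc_confident, clf_confident = adversarial_objectives(label_probs, labels, confident, domains)
enc_confused, clf_confused = adversarial_objectives(label_probs, labels, confused, domains)
```

The encoder objective is lower when the domain classifier is at chance, so minimizing it pushes source and target representations toward being inseparable.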
![Page 106: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/106.jpg)
Pathology Dataset
• Aspect-transfer on breast cancer pathology reports from hospitals such as MGH
FINAL DIAGNOSIS: BREAST (LEFT) … INVASIVE DUCTAL CARCINOMA Grade: 3. Lobular Carcinoma In-situ: Not identified. Blood vessel invasion: Suspicious. …
Source: IDC
Target: LCIS
72
![Page 107: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/107.jpg)
• Statistics and relevance rules:

| Aspects | #Labeled | #Unlabeled | Relevance Rules |
|---|---|---|---|
| DCIS | 23.8k | 96.6k (all aspects) | DCIS, Ductal Carcinoma In-Situ |
| LCIS | 10.7k | | LCIS, Lobular Carcinoma In-Situ |
| IDC | 22.9k | | IDC, Invasive Ductal Carcinoma |
| ALH | 9.2k | | ALH, Atypical Lobular Hyperplasia |

  ✦ 500 reports for testing
72
![Page 108: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/108.jpg)
Review Dataset
• Domain transfer for sentiment analysis: positive or negative
• Common words (e.g. excellent) are directly transferable, but domain-specific words are not
Source: Hotel (TripAdvisor)
- This place was excellent!
- In the second bedroom it literally rained water from above …
Target: Restaurant (Yelp)
- Excellent food.
- The fries were undercooked and thrown haphazardly into the sauce holder …
73
![Page 109: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/109.jpg)
• Statistics and relevance rules:

| Domains | #Labeled | #Unlabeled | Relevance Rules |
|---|---|---|---|
| Hotel | 100k | 100k | Five aspects, 290 keywords (Wang et al., 2011) |
| Restaurant | - | 200k | (only one overall aspect) |

  ✦ 2k reviews for testing
73
![Page 110: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/110.jpg)
Results on Pathology Dataset
[Bar chart: averaged accuracy over 6 transfer scenarios. mSDA: 67.1; Ours-Full: 94.1]
• mSDA: marginalized stacked denoising autoencoder (Chen et al., 2012)
![Page 111: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/111.jpg)
Results on Pathology Dataset
[Bar chart: averaged accuracy over 6 transfer scenarios. mSDA: 67.1; Ours-NA: 81.3; Ours-Full: 94.1]
• Ours-NA: our model without adversarial training
![Page 112: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/112.jpg)
Results on Pathology Dataset
[Bar chart: averaged accuracy over 6 transfer scenarios. mSDA: 67.1; Ours-NA: 81.3; Ours-NR: 69.8; Ours-Full: 94.1]
• Ours-NR: our model without aspect-relevance scoring
![Page 113: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/113.jpg)
Results on Pathology Dataset
[Bar chart: averaged accuracy over 6 transfer scenarios. mSDA: 67.1; Ours-NA: 81.3; Ours-NR: 69.8; Ours-Full: 94.1; In-domain: 96.9]
• In-domain: supervised training with in-domain annotations
![Page 114: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/114.jpg)
Results on Review Dataset
[Bar chart: averaged accuracy over 5 transfer scenarios. Systems: mSDA, Ours-NA, Ours-NR, Ours-Full, In-domain. Bar values shown: 93.4, 86.4, 87.3, 83.9, 81.6]
• Ours-NR and Ours-Full are the two best performing systems
• Relevance scoring has little impact because aspects are highly correlated
![Page 115: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/115.jpg)
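Impact of Reconstruction
[Bar chart: average accuracy on the pathology dataset, grouped by +adversarial / -adversarial, with and without reconstruction. With reconstruction: 94.1 (+adversarial), 81.3 (-adversarial). Without reconstruction: bar values 78.6 and 89.5]
• The same observation on the review dataset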
![Page 116: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/116.jpg)
Reason behind Improvement
[Heat-maps, three panels: (-adversarial, -reconstruction); (+adversarial, -reconstruction); (+adversarial, +reconstruction)]
• Heat-map: each row corresponds to a document vector
  - Top: source domain; Bottom: target domain
• Adversarial training removes lots of information
![Page 117: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/117.jpg)
Reason behind Improvement
• The reconstruction loss improves both the richness and diversity of the learned representations
![Page 118: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/118.jpg)
Case Study of Learned Representations
Restaurant Reviews
• the fries were undercooked and thrown haphazardly into the sauce holder . the shrimp was over cooked and just deep fried . … even the water tasted weird .
![Page 119: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/119.jpg)
Case Study of Learned Representations
Nearest Hotel Reviews by Ours-Full: learns to map domain-specific words
• the room was old . … we did n’t like the night shows at all . …
• however , the decor was just fair . … in the second bedroom it literally rained water from above .
✦ distance measured by cosine similarity between representations
![Page 120: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/120.jpg)
Case Study of Learned Representations
Nearest Hotel Reviews by Ours-NA: only captures common sentiment phrases
• rest room in this restaurant is very dirty . …
• the only problem i had was that … i was very ill with what was suspected to be food poison
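The nearest-review lookup in this case study can be sketched as follows, assuming each review is already encoded as a fixed-length vector. The toy 3-dimensional vectors, the review ids, and the helper `nearest_reviews` are illustrative assumptions; they stand in for the learned document representations, which the slides do not show.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_reviews(query_vec, candidates, k=2):
    """Return the ids of the k candidates most similar to query_vec."""
    ranked = sorted(
        candidates,
        key=lambda rid: cosine_similarity(query_vec, candidates[rid]),
        reverse=True,
    )
    return ranked[:k]

# Toy vectors standing in for learned representations (made-up numbers).
restaurant_vec = [0.9, 0.1, 0.0]
hotel_vecs = {
    "hotel_a": [0.8, 0.2, 0.1],
    "hotel_b": [0.0, 1.0, 0.0],
    "hotel_c": [1.0, 0.0, 0.1],
}
top2 = nearest_reviews(restaurant_vec, hotel_vecs, k=2)
```

Because cosine similarity ignores vector length, two reviews are close whenever their representations point in a similar direction, regardless of magnitude.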
![Page 121: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/121.jpg)
Summary
• Modeling: an aspect-augmented adversarial network for cross-aspect and cross-domain transfer tasks.
• Performance: our model significantly improves over the mSDA baseline and our model variants on a pathology and a review dataset
![Page 122: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/122.jpg)
Contributions
Multilingual Transfer:
• Hierarchical tensors for dependency parsing
  - Prior knowledge incorporation without feature engineering
• Multilingual embeddings for POS tagging
  - Effective multilingual transfer with ten translation pairs
Monolingual Transfer:
• Adversarial networks for aspect transfer
  - Joint aspect-driven encoding and domain adversarial training
![Page 123: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/123.jpg)
Thank you!
![Page 124: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/124.jpg)
![Page 125: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/125.jpg)
Backup Slides
![Page 126: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/126.jpg)
Typological Features
Word ordering: five features, e.g.
• Order of Subject and Verb (82A)
• Order of Adjective and Noun (87A)
Typological feature templates: eight templates, e.g.
• direction, 82A, head POS=VERB, modifier POS=NOUN, label=SUBJ
• direction, 87A, head POS=NOUN, modifier POS=ADJ
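As a rough illustration, the two example templates could be instantiated for a single head-modifier arc as below. The function name `typological_features`, the string encoding of the features, and the typological values in the usage lines (`"SV"`, `"Noun-Adjective"`) are assumptions made for this sketch, not the parser's actual feature code.

```python
def typological_features(head_pos, mod_pos, label, direction, wals):
    """Instantiate the two example templates for one head-modifier arc.

    `wals` maps a typological feature ID (e.g. "87A") to the language's value.
    A template only fires when the arc matches its POS/label pattern.
    """
    feats = []
    # Template: direction, 87A, head POS=NOUN, modifier POS=ADJ
    if head_pos == "NOUN" and mod_pos == "ADJ":
        feats.append(f"dir={direction}|87A={wals['87A']}|head=NOUN|mod=ADJ")
    # Template: direction, 82A, head POS=VERB, modifier POS=NOUN, label=SUBJ
    if head_pos == "VERB" and mod_pos == "NOUN" and label == "SUBJ":
        feats.append(
            f"dir={direction}|82A={wals['82A']}|head=VERB|mod=NOUN|label=SUBJ"
        )
    return feats

# Hypothetical language profile: subject before verb, adjective after noun.
wals_profile = {"82A": "SV", "87A": "Noun-Adjective"}

adj_arc = typological_features("NOUN", "ADJ", "AMOD", "RIGHT", wals_profile)
subj_arc = typological_features("VERB", "NOUN", "SUBJ", "LEFT", wals_profile)
other_arc = typological_features("VERB", "ADV", "ADVMOD", "RIGHT", wals_profile)
```

In this sketch an arc that matches neither pattern yields no typological feature, which is the sense in which a template ties a typological property to the arcs it actually describes.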
![Page 127: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/127.jpg)
Feature Weights of Multiway Model
Weights of valid features:
• head POS=NOUN, mod POS=ADJ, 87A=ADJ-NOUN: 2.24 × 10⁻³
Weights of invalid features:
• head POS=VERB, mod POS=NOUN, 87A=ADJ-NOUN: 8.88 × 10⁻⁴
• head POS=NOUN, mod POS=NOUN, 87A=ADJ-NOUN: 9.48 × 10⁻⁴
Multiway model assigns non-zero weights to invalid features
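One way to see why this can happen, assuming the multiway model scores a feature combination as a product of per-mode parameters (shown here for a single rank-1 component): every combination of entries gets a weight, whether or not the combination is linguistically meaningful. The numbers and the helper `multiway_weight` below are made up for illustration and are not the learned parameters behind the weights on this slide.

```python
# Made-up per-mode parameters for one rank-1 component (illustrative only).
head_pos = {"NOUN": 0.05, "VERB": 0.03}
mod_pos = {"ADJ": 0.4, "NOUN": 0.25}
typology = {"87A=ADJ-NOUN": 0.1}

def multiway_weight(head, mod, typ):
    """Implicit weight of a feature combination under one rank-1 component:
    the product of the entries selected in each mode."""
    return head_pos[head] * mod_pos[mod] * typology[typ]

# 87A describes adjective-noun order, so only the NOUN/ADJ arc is a valid pairing.
valid = multiway_weight("NOUN", "ADJ", "87A=ADJ-NOUN")
# The other arcs reuse the same per-mode entries, so their products are
# non-zero as well, even though 87A says nothing about them.
invalid_verb_noun = multiway_weight("VERB", "NOUN", "87A=ADJ-NOUN")
invalid_noun_noun = multiway_weight("NOUN", "NOUN", "87A=ADJ-NOUN")
```

Under this assumption, driving an invalid combination to exactly zero would require zeroing a shared per-mode entry, which would also remove valid combinations that use the same entry.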
![Page 128: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/128.jpg)
Impact of Embedding Dimensions and Window Size
[Line chart: accuracy (y-axis, 55 to 70) vs. embedding dimension (x-axis: 10, 20, 50, 100, 200), one curve each for window=1 and window=5]
• Train embeddings with different dimensions and context window size
• Small window size favors POS tagging
![Page 129: Transfer Learning for Low-resource Natural …people.csail.mit.edu/yuanzh/papers/phd_defense.pdfTransfer Learning for Low-resource Natural Language Analysis Low-resource Problem •](https://reader033.fdocuments.us/reader033/viewer/2022042116/5e942778a902c5724d63b2bb/html5/thumbnails/129.jpg)
Impact of Embedding Dimensions and Window Size
• Performance drops with either smaller or larger dimensions
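To make the window-size parameter concrete, the sketch below enumerates the (target, context) pairs that a symmetric context window produces, as skip-gram-style embedding training typically does. The function `context_pairs` and the toy sentence are illustrative assumptions; the slides do not say which embedding toolkit was used.

```python
def context_pairs(tokens, window):
    """Return (target, context) pairs within `window` positions on each side."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

sentence = ["the", "old", "dog", "barked", "loudly"]
narrow = context_pairs(sentence, window=1)  # immediate neighbours only
wide = context_pairs(sentence, window=5)    # spans the whole toy sentence
```

With window=1 each word is paired only with its immediate neighbours, which tend to carry local syntactic cues; a common explanation for the small-window result is that such contexts are the most informative for POS tagging.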