
Semi-Supervised Learning & Summary

Advanced Statistical Methods in NLP, Ling 572

March 8, 2012


Roadmap
Semi-supervised learning:
Motivation & perspective
Yarowsky’s model
Co-training

Summary


Semi-supervised Learning


Motivation
Supervised learning:
Works really well, but needs lots of labeled training data

Unsupervised learning:
No labeled data required, but may not work well and may not learn the desired distinctions
E.g., unsupervised parsing techniques fit the data, but don’t correspond to linguistic intuition


Solution
Semi-supervised learning:

General idea:
Use a small amount of labeled training data
Augment it with a large amount of unlabeled training data
Use the information in the unlabeled data to improve the models

Many different semi-supervised machine learners:
Variants of supervised techniques: semi-supervised SVMs, CRFs, etc.
Bootstrapping approaches: Yarowsky’s method, self-training, co-training


Label the First Use of “Plant”

Biological example: There are more kinds of plants and animals in the rainforests than anywhere else on Earth. Over half of the millions of known species of plants and animals live in the rainforest. Many are found nowhere else. There are even plants and animals in the rainforest that we have not yet discovered.

Industrial example: The Paulus company was founded in 1938. Since those days the product range has been the subject of constant expansions and is brought up continuously to correspond with the state of the art. We’re engineering, manufacturing and commissioning world-wide ready-to-run plants packed with our comprehensive know-how. Our Product Range includes pneumatic conveying systems for carbon, carbide, sand, lime and many others. We use reagent injection in molten metal for the…


Word Sense Disambiguation
Application of lexical semantics

Goal: Given a word in context, identify the appropriate sense
E.g., “plants and animals in the rainforest”

Crucial for real syntactic & semantic analysis
Correct sense can determine:
Available syntactic structure
Available thematic roles, correct meaning, ...


Disambiguation Features
Key: What are the features?

Part of speech: of the word and its neighbors
Morphologically simplified form
Words in the neighborhood
Question: How big a neighborhood? Is there a single optimal size? Why?
(Possibly shallow) syntactic analysis: e.g., predicate-argument relations, modification, phrases
Collocation vs. co-occurrence features (sketched below):
Collocation: words in a specific relation (predicate-argument, 1 word +/-)
Co-occurrence: bag of words
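As a small illustration of the last distinction, the sketch below extracts both feature types for a target word in a tokenized sentence. The function names, feature encoding, and window size are my own choices, not part of the slides.

```python
# Minimal sketch: collocational vs. co-occurrence features for a target word.

def collocational_features(tokens, i):
    """Words in a specific positional relation to the target (here +/-1)."""
    left = tokens[i - 1] if i > 0 else "<s>"
    right = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return {"w-1=" + left.lower(), "w+1=" + right.lower()}

def cooccurrence_features(tokens, i, window=5):
    """Unordered bag of words within a +/-window neighborhood of the target."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return {"bow=" + w.lower() for j, w in enumerate(tokens[lo:hi], lo) if j != i}

tokens = "plants and animals in the rainforest".split()
print(collocational_features(tokens, 0))   # target word: "plants"
print(cooccurrence_features(tokens, 0))
```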


WSD Evaluation
Ideally, end-to-end evaluation with a WSD component:
Demonstrates the real impact of the technique in a system
Difficult, expensive, and still application-specific

Typically, intrinsic, sense-based evaluation:
Accuracy, precision, recall
SENSEVAL/SEMEVAL: all-words and lexical-sample tasks

Baseline: most frequent sense (a small sketch follows below)
Topline: human inter-rater agreement: 75-80% for fine-grained senses; ~90% for coarse-grained
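For concreteness, a minimal sketch of the most-frequent-sense baseline for a single target word; the tiny data set here is made up for illustration.

```python
# Minimal sketch: most-frequent-sense (MFS) baseline accuracy for one word.
from collections import Counter

# (word, gold sense) pairs for the lexical-sample target "plant" (toy data).
labeled = [("plant", "industrial"), ("plant", "biological"),
           ("plant", "industrial"), ("plant", "industrial")]

mfs = Counter(sense for _, sense in labeled).most_common(1)[0][0]
accuracy = sum(sense == mfs for _, sense in labeled) / len(labeled)
print(f"MFS baseline sense: {mfs}, accuracy: {accuracy:.2f}")
```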


Minimally Supervised WSD
Yarowsky’s algorithm (1995)

Bootstrapping approach: use a small labeled seed set to iteratively train

Builds on 2 key insights:
One Sense Per Discourse:
A word appearing multiple times in a text has the same sense
Corpus of 37,232 “bass” instances: always a single sense per discourse
One Sense Per Collocation:
Local phrases select a single sense
Fish -> Bass1
Play -> Bass2


Yarowsky’s Algorithm
Training decision lists:
1. Pick seed instances & tag them
2. Find collocations: word to the left, word to the right, word within +/-K
   (A) Calculate the informativeness of each collocation on the tagged set and order the rules
   (B) Tag new instances with the rules
   (C) Apply one sense per discourse
   (D) If instances are still unlabeled, go to 2
3. Apply one sense per discourse

Disambiguation: first rule matched (see the sketch below)
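The ordering criterion is left as a figure in the slides; in Yarowsky (1995) the rules are ranked by the magnitude of the log-likelihood ratio log(P(Sense_A | collocation) / P(Sense_B | collocation)), computed with smoothing on the currently tagged set. The sketch below is a simplified, illustrative version of one training-plus-tagging round; the feature set, smoothing constant, threshold, and function names are my assumptions, not the original implementation.

```python
# Minimal sketch of one round of a Yarowsky-style decision list (illustrative).
import math
from collections import defaultdict

def collocations(tokens, i):
    """Collocational features for the target word at position i."""
    feats = []
    if i > 0:
        feats.append(("w-1", tokens[i - 1].lower()))
    if i + 1 < len(tokens):
        feats.append(("w+1", tokens[i + 1].lower()))
    for j in range(max(0, i - 5), min(len(tokens), i + 6)):  # word within +/-5
        if j != i:
            feats.append(("win", tokens[j].lower()))
    return feats

def train_decision_list(labeled, alpha=0.1):
    """labeled: list of (tokens, target_index, sense), senses 'A'/'B'."""
    counts = defaultdict(lambda: {"A": 0.0, "B": 0.0})
    for tokens, i, sense in labeled:
        for f in collocations(tokens, i):
            counts[f][sense] += 1
    rules = []
    for f, c in counts.items():
        # Order rules by the absolute (smoothed) log-likelihood ratio.
        score = abs(math.log((c["A"] + alpha) / (c["B"] + alpha)))
        sense = "A" if c["A"] > c["B"] else "B"
        rules.append((score, f, sense))
    return sorted(rules, reverse=True)

def classify(rules, tokens, i, threshold=0.5):
    """Disambiguation: the first (highest-scoring) matching rule wins."""
    feats = set(collocations(tokens, i))
    for score, f, sense in rules:
        if f in feats and score >= threshold:
            return sense
    return None  # abstain; the instance stays unlabeled for the next round
```

In the full algorithm, instances labeled above the threshold join the seed set, one sense per discourse propagates labels within each document, and the decision list is retrained until the remaining instances stay unlabeled.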


Yarowsky Decision List


Iterative Updating



Sense Choice With Collocational Decision Lists
Create an initial decision list, with rules ordered by informativeness

Check nearby word groups (collocations):
Biology: “animal” within 2-10 words
Industry: “manufacturing” within 2-10 words

Result: correct selection
95% on pairwise tasks


Self-Training
Basic approach:
Start off with a small labeled training set
Train a supervised classifier on the training set
Apply the new classifier to the residual unlabeled training data
Add the ‘best’ newly labeled examples to the labeled training set
Iterate


Self-Training
Simple, right?

Devil in the details: which instances are ‘best’ to add?
Highest confidence? Probably accurate, but probably adds little new information to the classifier
Most different? Probably adds information, but may not be accurate
Compromise: use the most different, highly confident instances (see the sketch below)
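The loop itself is short; a minimal sketch follows. It assumes a scikit-learn-style classifier factory (fit, predict_proba, classes_) and uses a plain confidence cutoff for ‘best’; the slides’ stronger recommendation, preferring instances that are both confident and different from the current training data, would replace that cutoff with a diversity-aware selection.

```python
# Minimal self-training loop (a sketch, not the slides' exact procedure).
# `make_classifier` is assumed to return an object with scikit-learn-style
# fit(X, y), predict_proba(X), and classes_.
import numpy as np

def self_train(make_classifier, X_lab, y_lab, X_unlab,
               threshold=0.95, max_iter=10):
    X_lab, y_lab = np.asarray(X_lab), np.asarray(y_lab)
    X_unlab = np.asarray(X_unlab)
    for _ in range(max_iter):
        clf = make_classifier()
        clf.fit(X_lab, y_lab)                  # train on current labeled set
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)
        conf = proba.max(axis=1)               # classifier confidence
        pred = proba.argmax(axis=1)
        keep = conf >= threshold               # 'best' = most confident here
        if not keep.any():
            break
        # Move newly labeled examples from the unlabeled pool to training data.
        X_lab = np.vstack([X_lab, X_unlab[keep]])
        y_lab = np.concatenate([y_lab, clf.classes_[pred[keep]]])
        X_unlab = X_unlab[~keep]
    return clf
```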


Co-Training
Blum & Mitchell, 1998

Basic intuition: “two heads are better than one”

Ensemble classifier: uses results from multiple classifiers

Multi-view classifier: uses different views of the data (feature subsets)
Ideally, the views should be:
Conditionally independent given the label
Individually sufficient: each view carries enough information to learn


Co-training Set-up
Create two views of the data:
Typically partition the feature set by type
E.g., predicting speech emphasis:
View 1: acoustics (loudness, pitch, duration)
View 2: lexicon, syntax, context

Some approaches use learners of different types instead

In practice, the views may not truly be conditionally independent, but co-training often works pretty well anyway


Co-training Approach
Create a small labeled training data set
Train two (supervised) classifiers on the current training data, using different views
Use the two classifiers to label the residual unlabeled instances
Select the ‘best’ newly labeled data to add to the training data, adding instances labeled by C1 to the training data for C2, and vice versa
Iterate (a sketch follows below)
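A minimal sketch of this loop, under the same assumptions as the self-training sketch (scikit-learn-style classifiers, confidence-threshold selection), with the two views held in parallel arrays:

```python
# Minimal co-training loop (a sketch under assumptions: two feature views,
# scikit-learn-style classifiers, confidence-threshold selection).
import numpy as np

def co_train(make_clf1, make_clf2, X1_lab, X2_lab, y_lab,
             X1_unlab, X2_unlab, threshold=0.95, max_iter=10):
    y_lab1 = np.asarray(y_lab)            # labels seen by classifier 1
    y_lab2 = np.asarray(y_lab)            # labels seen by classifier 2
    X1_lab, X2_lab = np.asarray(X1_lab), np.asarray(X2_lab)
    X1_unlab, X2_unlab = np.asarray(X1_unlab), np.asarray(X2_unlab)
    for _ in range(max_iter):
        c1, c2 = make_clf1(), make_clf2()
        c1.fit(X1_lab, y_lab1)            # view 1 (e.g., acoustic features)
        c2.fit(X2_lab, y_lab2)            # view 2 (e.g., lexical/syntactic features)
        if len(X1_unlab) == 0:
            break
        p1, p2 = c1.predict_proba(X1_unlab), c2.predict_proba(X2_unlab)
        conf1, conf2 = p1.max(axis=1), p2.max(axis=1)
        # Instances confidently labeled by C1 become training data for C2, v.v.
        add_to_2 = conf1 >= threshold
        add_to_1 = conf2 >= threshold
        if not (add_to_1.any() or add_to_2.any()):
            break
        X2_lab = np.vstack([X2_lab, X2_unlab[add_to_2]])
        y_lab2 = np.concatenate([y_lab2, c1.classes_[p1.argmax(axis=1)[add_to_2]]])
        X1_lab = np.vstack([X1_lab, X1_unlab[add_to_1]])
        y_lab1 = np.concatenate([y_lab1, c2.classes_[p2.argmax(axis=1)[add_to_1]]])
        remaining = ~(add_to_1 | add_to_2)  # drop anything either side used
        X1_unlab, X2_unlab = X1_unlab[remaining], X2_unlab[remaining]
    return c1, c2
```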


Graphically

Figure from Jeon & Liu, 2011


More Devilish Details
Questions for co-training:

Which instances are ‘best’ to add to training? Most confident? Most different? Random? Many approaches combine these criteria

How many instances to add per iteration? A threshold by count, or by confidence value?

How long to iterate? A fixed number of rounds? Until classifier confidence passes a threshold? Etc.


Co-training Applications
Applied to many language-related tasks

Blum & Mitchell’s paper: academic home page classification
95% accuracy: 12 pages labeled, 788 classified

Sentiment analysis

Statistical parsing

Prominence recognition

Dialog classification


Learning Curves: Semi-supervised vs. Supervised

[Figure: accuracy (roughly 66-84%) plotted against the number of labeled examples (9, 12, 24, 50, 100, 300) for a supervised and a semi-supervised learner.]


Semi-supervised Learning
Umbrella term for machine learning techniques that:
Use a small amount of labeled training data
Augmented with information from unlabeled data

Can be very effective:
Training on ~10 labeled samples can yield results comparable to training on 1000s

Can be temperamental:
Sensitive to the data, the learning algorithm, and design choices
Hard to predict the effects of the amount of labeled data, unlabeled data, etc.


Summary


Review
Introduction: entropy, cross-entropy, and mutual information
Classic machine learning algorithms: decision trees, kNN, Naïve Bayes
Discriminative machine learning algorithms: MaxEnt, CRFs, SVMs
Other models: TBL, EM, semi-supervised approaches


General Methods
Data organization: training, development, and test data splits
Cross-validation: parameter tuning, evaluation
Feature selection: wrapper methods, filtering, weighting
Beam search


Tools, Data, & Tasks
Tools: Mallet, libSVM
Data: 20 Newsgroups (text classification), Penn Treebank (POS tagging)


Beyond 572
Ling 573:
‘Capstone’ project class: integrates material from the 57* classes
More ‘real world’: project teams, deliverables, repositories

Ling 575s:
Speech technology: Michael Tjalve (Th, 4pm)
NLP on mobile devices: Scott Farrar (T, 4pm)

Ling and other electives