Transcript of "Unsupervised Approaches to Sequence Tagging, Morphology Induction, and Lexical Resource Acquisition" (Reza Bosaghzadeh & Nathan Schneider, LS2 ~ 1 December 2008)

Page 1

Unsupervised Approaches to Sequence Tagging, Morphology Induction, and Lexical Resource Acquisition

Reza Bosaghzadeh & Nathan Schneider

LS2 ~ 1 December 2008

Page 2

Unsupervised methods:
– Sequence labeling (part-of-speech tagging)
– Morphology induction
– Lexical resource acquisition

She ran to the station quickly

pronoun verb preposition det noun adverb

un-supervise-d learn-ing

Page 3

Part of Speech (POS) Tagging

• Prototype-driven model
– Cluster words based on a few examples per tag
– (Haghighi and Klein 2006)

• Contrastive estimation
– Favor positive examples at the expense of implicit negative examples
– (Smith and Eisner 2005)

Page 4

Prototype-Driven Tagging: Overview

[Diagram] Unlabeled Data + Prototype List (prototype words for each target label) → Annotated Data

(Haghighi and Klein 2006)

Page 5

Prototypes

Newly remodeled 2 Bdrms/1 Bath, spacious upper unit, located in Hilltop Mall area. Walking distance to shopping, public transportation, schools and park. Paid water and garbage. No dogs allowed.

Information Extraction: Classified Ads

Prototype List:
FEATURE: kitchen, laundry
LOCATION: near, close
TERMS: paid, utilities
SIZE: large, feet
RESTRICT: cat, smoking

(Haghighi and Klein 2006)

Page 6

Prototypes


Prototype List (English POS):

NN: president    IN: of
VBD: said        NNS: shares
CC: and          TO: to
NNP: Mr.         PUNC: .
JJ: new          CD: million
DET: the         VBP: are

(Haghighi and Klein 2006)

Page 7

Where to get Prototypes

• 3 prototypes per tag

• Automatically extracted by frequency

(Haghighi and Klein 2006)
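To make the extraction step concrete, here is a minimal sketch in Python; the tiny tagged corpus is an illustrative stand-in for the paper's setup, which draws the most frequent words per tag from annotated data.

```python
# A minimal sketch: pick the 3 most frequent words per tag as prototypes.
# The tiny `tagged` corpus below is illustrative, not from the paper.
from collections import Counter, defaultdict

tagged = [("the", "DET"), ("president", "NN"), ("said", "VBD"),
          ("the", "DET"), ("shares", "NNS"), ("are", "VBP"),
          ("a", "DET"), ("state", "NN"), ("said", "VBD")]

by_tag = defaultdict(Counter)
for word, tag in tagged:
    by_tag[tag][word] += 1

prototypes = {tag: [w for w, _ in counts.most_common(3)]
              for tag, counts in by_tag.items()}
print(prototypes)  # e.g. {'DET': ['the', 'a'], 'NN': ['president', 'state'], ...}
```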

Page 8

Prototypes

• Tie each word to its most similar prototype
• Features are the same as in Smith & Eisner (2005): trigram tagger; word type, suffixes up to length 3, contains-hyphen, contains-digit, initial capitalization
• Similarity (Schütze 1993): SVD dimensionality reduction; cosine between context vectors

(Haghighi and Klein 2006)
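A minimal sketch of the similarity computation on a toy corpus, assuming hypothetical data and prototype choices; it follows the recipe above (context-count vectors, SVD reduction, cosine, tie to the nearest prototype) rather than the paper's exact features.

```python
# Schütze-style word similarity: SVD-reduced context vectors + cosine.
import numpy as np

corpus = "the dog ran the cat ran the dog sat".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

# Count left/right context words within a +/-1 window.
C = np.zeros((len(vocab), 2 * len(vocab)))
for i, w in enumerate(corpus):
    if i > 0:
        C[idx[w], idx[corpus[i - 1]]] += 1                 # left context
    if i < len(corpus) - 1:
        C[idx[w], len(vocab) + idx[corpus[i + 1]]] += 1    # right context

# SVD dimensionality reduction, keeping k dimensions.
k = 3
U, S, Vt = np.linalg.svd(C, full_matrices=False)
X = U[:, :k] * S[:k]

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

# Hypothetical prototype list: word -> tag.
prototypes = {"dog": "NN", "ran": "VBD", "the": "DET"}
for w in vocab:
    best = max(prototypes, key=lambda p: cos(X[idx[w]], X[idx[p]]))
    print(w, "->", prototypes[best])
```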

Page 9

Prototypes – Language Variation

Pros:
• Can get prototypes for popular languages
• Doesn't require a tagging dictionary

Cons:
• Needs a tag set
• Needs prototypes for each tag

Page 10

Contrastive Estimation
• Already discussed in class
• Key idea:
– Mutating training examples often gives ungrammatical (negative) sentences
– During training, shift probability mass from generated negative examples to given positive examples

(Smith and Eisner 2005)
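As an illustration, a minimal sketch of one neighborhood function in this spirit: swapping adjacent words (TRANS1 in Smith and Eisner's terminology) turns an observed sentence into mostly ungrammatical implicit negative examples.

```python
# TRANS1 neighborhood: every single adjacent-word swap of the input.
def trans1_neighborhood(sentence):
    words = sentence.split()
    neighbors = []
    for i in range(len(words) - 1):
        swapped = words[:i] + [words[i + 1], words[i]] + words[i + 2:]
        neighbors.append(" ".join(swapped))
    return neighbors

# Most of these neighbors are ungrammatical; probability mass is shifted
# away from them toward the observed sentence during training.
for n in trans1_neighborhood("She ran to the station quickly"):
    print(n)
```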

Page 11

Unsupervised POS Tagging: Current Results

[Results chart]

Best supervised results (CRFs): 99.5%!

Page 12

Unsupervised methods:
– Sequence labeling (part-of-speech tagging)
– Morphology induction
– Lexical resource acquisition

She ran to the station quickly

pronoun verb preposition det noun adverb

un-supervise-d learn-ing

Page 13

Unsupervised Approaches to Morphology

• Morphology refers to the internal structure of words
– A morpheme is a minimal meaningful linguistic unit
– Morpheme segmentation is the process of dividing words into their component morphemes

unsupervised learning

Page 14

Unsupervised Approaches to Morphology

• Morphology refers to the internal structure of words
– A morpheme is a minimal meaningful linguistic unit
– Morpheme segmentation is the process of dividing words into their component morphemes
un-supervise-d learn-ing
– Word segmentation is the process of finding word boundaries in a stream of speech or text
unsupervisedlearningofnaturallanguage

Page 15

Unsupervised Approaches to Morphology

• Morphology refers to the internal structure of words
– A morpheme is a minimal meaningful linguistic unit
– Morpheme segmentation is the process of dividing words into their component morphemes
un-supervise-d learn-ing
– Word segmentation is the process of finding word boundaries in a stream of speech or text
unsupervised learning of natural language

Page 16

ParaMor: Monson et al. (2007, 2008)

• Learns inflectional paradigms from raw text
– Requires only the vocabulary of a corpus
– Looks at word counts of substrings, and proposes (stem, suffix) pairings based on type frequency (see the sketch below)
• 3-stage algorithm
– Stage 1: candidate paradigms based on frequencies
– Stages 2–3: refinement of the paradigm set via merging and filtering
• Paradigms can be used for morpheme segmentation or stemming
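A minimal sketch of the Stage-1 intuition, under simplified assumptions: every split point of every word proposes a (stem, suffix) pair, and suffixes that attach to many distinct stems (high type frequency) look paradigm-like. ParaMor's actual search is more constrained than this.

```python
# Rank candidate suffixes by how many distinct stems they attach to.
from collections import defaultdict

vocab = ["hablar", "hablo", "hablamos", "hablan",
         "bailar", "bailo", "bailamos", "bailan"]

suffix_to_stems = defaultdict(set)
for word in vocab:
    for i in range(1, len(word)):        # every non-empty stem/suffix split
        stem, suffix = word[:i], word[i:]
        suffix_to_stems[suffix].add(stem)

# Note the ties: -ar, -o, -amos, -an (stems habl/bail) compete with spurious
# splits like -lar, -lo (stems hab/bai) -- exactly the ambiguity on the
# following slides that the later stages must resolve.
for suffix, stems in sorted(suffix_to_stems.items(),
                            key=lambda kv: -len(kv[1]))[:6]:
    print(f"-{suffix}: {sorted(stems)}")
```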

Page 17

ParaMor: Monson et al. (2007, 2008)

speak      dance
hablar     bailar
hablo      bailo
hablamos   bailamos
hablan     bailan
…          …

• A sampling of Spanish verb conjugations

Page 18

ParaMor: Monson et al. (2007, 2008)

speak      dance
hablar     bailar
hablo      bailo
hablamos   bailamos
hablan     bailan
…          …

• A proposed paradigm with stems {habl, bail} and suffixes {-ar, -o, -amos, -an}

Page 19

ParaMor: Monson et al. (2007, 2008)

speak      dance      buy
hablar     bailar     comprar
hablo      bailo      compro
hablamos   bailamos   compramos
hablan     bailan     compran
…          …          …

• Same paradigm as on the previous slide, but with stems {habl, bail, compr}

Page 20

ParaMor: Monson et al. (2007, 2008)

speak      dance
hablar     bailar
hablo      bailo
hablamos   bailamos
hablan     bailan
…          …

• From just this list, other paradigm analyses (which happen to be incorrect) are possible

Page 21

ParaMor: Monson et al. (2007, 2008)

speak      dance
hablar     bailar
hablo      bailo
hablamos   bailamos
hablan     bailan
…          …

• Another possibility: stems {hab, bai}, suffixes {-lar, -lo, -lamos, -lan}

Page 22

ParaMor: Monson et al. (2007, 2008)

speak      dance      buy
hablar     bailar     comprar
hablo      bailo      compro
hablamos   bailamos   compramos
hablan     bailan     compran
…          …          …

• Spurious segmentations: this paradigm doesn't generalize to comprar (or most verbs)

Page 23

ParaMor: Monson et al. (2007, 2008)

• What if not all conjugations were in the corpus?

speak      dance      buy
hablar     bailar     comprar
           bailo      compro
hablamos   bailamos   compramos
hablan
…          …          …

Page 24

ParaMor: Monson et al. (2007, 2008)

• We have two similar paradigms that we want to merge

speak      dance      buy
hablar     bailar     comprar
           bailo      compro
hablamos   bailamos   compramos
hablan
…          …          …

Page 25

ParaMor: Monson et al. (2007, 2008)

speak      dance      buy
hablar     bailar     comprar
hablo      bailo      compro
hablamos   bailamos   compramos
hablan     bailan     compran
…          …          …

• This amounts to smoothing, or "hallucinating" out-of-vocabulary items
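A minimal sketch of the merge intuition, with a hypothetical Jaccard-overlap criterion standing in for ParaMor's actual merging heuristics: two candidate paradigms whose suffix sets largely overlap are collapsed, "hallucinating" the unseen forms.

```python
# Merge two candidate paradigms if their suffix sets overlap enough.
def merge(p1, p2, threshold=0.5):
    stems1, sufs1 = p1
    stems2, sufs2 = p2
    overlap = len(sufs1 & sufs2) / len(sufs1 | sufs2)   # Jaccard similarity
    if overlap >= threshold:
        return (stems1 | stems2, sufs1 | sufs2)
    return None

p1 = ({"habl"}, {"ar", "amos", "an"})           # hablo unseen in the corpus
p2 = ({"bail", "compr"}, {"ar", "o", "amos"})   # bailan, compran unseen

# The merged paradigm predicts e.g. habl+o and compr+an even though
# neither form occurred in the corpus.
print(merge(p1, p2))
```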

Page 26

ParaMor: Monson et al. (2007, 2008)

• A heuristic-based, deterministic algorithm can learn inflectional paradigms from raw text
• Paradigms can be used straightforwardly to predict segmentations
– Combining the outputs of ParaMor and Morfessor (another system) won the segmentation task at MorphoChallenge 2008 for every language: English, Arabic, Turkish, German, and Finnish
• Currently, ParaMor assumes suffix-based morphology

Page 27

• Word segmentation results – comparison of the Goldwater unigram DP and bigram HDP models [chart]
• See Narges & Andreas's presentation for more on this model

Goldwater et al. (2006; in submission)

Page 28

Multilingual Morpheme Segmentation: Snyder & Barzilay (2008)

speak (ES)   speak (FR)
hablar       parler
hablo        parle
hablamos     parlons
hablan       parlent
…            …

• Abstract morphemes cross languages: (ar, er), (o, e), (amos, ons), (an, ent), (habl, parl)
• Considers parallel phrases and tries to find morpheme correspondences
• Stray morphemes don't correspond across languages
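A toy heuristic sketch (not the paper's Bayesian model) that surfaces the same correspondences from a handful of hypothetical parallel word pairs: count candidate stem pairs over all split points, keep the one that recurs across the whole list, and read the suffix correspondences off the residue.

```python
from collections import Counter

parallel = [("hablar", "parler"), ("hablo", "parle"),
            ("hablamos", "parlons"), ("hablan", "parlent")]

# Count candidate (Spanish stem, French stem) pairs over all split points.
stem_counts = Counter()
for es, fr in parallel:
    for i in range(1, len(es) + 1):
        for j in range(1, len(fr) + 1):
            stem_counts[(es[:i], fr[:j])] += 1

# Prefer stem pairs that recur across the list, breaking ties by length;
# here this selects (habl, parl).
(es_stem, fr_stem), _ = max(
    stem_counts.items(),
    key=lambda kv: (kv[1], len(kv[0][0]) + len(kv[0][1])))

print("stem correspondence:", (es_stem, fr_stem))
for es, fr in parallel:
    # Prints (ar, er), (o, e), (amos, ons), (an, ent) -- as on the slide.
    print("suffix correspondence:", (es[len(es_stem):], fr[len(fr_stem):]))
```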

Page 29

Morphology papers: inputs & outputs

Page 30

Unsupervised methods:
– Sequence labeling (part-of-speech tagging)
– Morphology induction
– Lexical resource acquisition

She ran to the station quickly

pronoun verb preposition det noun adverb

un-supervise-d learn-ing

Page 31

Lexical Resource Acquisition

• Learning bilingual lexicons from monolingual corpora (Haghighi et al. 2008)

• Narrative Event Chain Acquisition (Chambers and Jurafsky 2008)

• Labeling semantic roles for verbs (Grenager and Manning 2006)

Page 32

Learning Bilingual Lexicons from Monolingual Corpora

[Diagram] A matching m between source-text words s (state, world, name, nation, …) and target-text words t (estado, mundo, nombre, política, …)

(Haghighi et al. 2008)

Page 33

Bilingual Lexicons: Data Representation

Source text: state
– Orthographic features (1.0 each): #st, tat, te#
– Context features: world 5.0, politics 20.0, society 10.0

Target text: estado
– Orthographic features (1.0 each): #es, sta, do#
– Context features: mundo 10.0, política 17.0, sociedad 6.0

(Haghighi et al. 2008)

Uses CCA (canonical correlation analysis), trained with hard EM, to find the best matching
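A minimal sketch of one matching step, with made-up feature vectors and a seed matching; the real system alternates CCA projection with hard-EM re-matching over full vocabularies. Requires scikit-learn.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

src_words = ["state", "world", "name", "nation"]
tgt_words = ["estado", "mundo", "nombre", "nacion"]

# Rows: words; columns: illustrative orthographic + context counts.
X = np.array([[1., 0., 5., 20., 10.],
              [0., 1., 3., 12., 7.],
              [0., 0., 8., 2., 1.],
              [1., 1., 6., 9., 4.]])
Y = np.array([[1., 0., 10., 17., 6.],
              [0., 1., 4., 14., 8.],
              [0., 0., 9., 1., 2.],
              [1., 1., 7., 8., 5.]])

# Fit CCA on the current matching (here: the identity seed matching),
# then project both vocabularies into one shared space.
cca = CCA(n_components=2)
cca.fit(X, Y)
Xc, Yc = cca.transform(X, Y)

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

# Hard "E-step": re-match each source word to its nearest target word.
for i, s in enumerate(src_words):
    j = max(range(len(tgt_words)), key=lambda j: cos(Xc[i], Yc[j]))
    print(s, "->", tgt_words[j])
```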

Page 34

Feature Experiments

[Chart] Precision: Edit Dist 61.1, Ortho 80.1, Context 80.2, MCCA 89.0

• MCCA: orthographic and context features

4k EN-ES Wikipedia Articles (Haghighi et al. 2008)

Page 35

Narrative events: Chambers & Jurafsky (2008)

• Given a corpus, identifies related events that constitute a "narrative" and (when possible) predicts their typical temporal ordering
– E.g.: a CRIMINAL PROSECUTION narrative, with verbs arrest, accuse, plead, testify, acquit/convict
• Key insight: related events tend to share a participant in a document
– The common participant may fill different syntactic/semantic roles with respect to verbs: arrest.OBJECT, accuse.OBJECT, plead.SUBJECT
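A minimal sketch of that insight, in the spirit of the paper's PMI-based score, over hypothetical pre-extracted (verb, role) events that share a protagonist within each document.

```python
import math
from collections import Counter

# Each document: (verb, role) events filled by the same participant.
docs = [
    [("arrest", "OBJ"), ("accuse", "OBJ"), ("plead", "SUBJ")],
    [("arrest", "OBJ"), ("accuse", "OBJ"), ("convict", "OBJ")],
    [("accuse", "OBJ"), ("plead", "SUBJ"), ("acquit", "OBJ")],
]

single, joint = Counter(), Counter()
for doc in docs:
    for e in doc:
        single[e] += 1
    for i in range(len(doc)):
        for j in range(i + 1, len(doc)):
            joint[frozenset([doc[i], doc[j]])] += 1

n_single = sum(single.values())
n_joint = sum(joint.values())

def pmi(e1, e2):
    """Pointwise mutual information between two narrative events."""
    p_joint = joint[frozenset([e1, e2])] / n_joint
    p1, p2 = single[e1] / n_single, single[e2] / n_single
    return math.log(p_joint / (p1 * p2)) if p_joint > 0 else float("-inf")

print(pmi(("arrest", "OBJ"), ("accuse", "OBJ")))  # high: same narrative
```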

Page 36

Narrative events: Chambers & Jurafsky (2008)

• A temporal classifier can reconstruct pairwise canonical event orderings, producing a directed graph for each narrative
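A minimal sketch of turning pairwise before/after decisions into one canonical order, assuming hypothetical classifier output and an acyclic event graph.

```python
import graphlib  # Python 3.9+

# Hypothetical pairwise decisions from a temporal classifier: a before b.
before = [("arrest", "accuse"), ("accuse", "plead"), ("plead", "convict")]

# Build a predecessor graph: each event depends on the events before it.
graph = {}
for a, b in before:
    graph.setdefault(b, set()).add(a)

# One canonical ordering consistent with all pairwise decisions.
print(list(graphlib.TopologicalSorter(graph).static_order()))
# ['arrest', 'accuse', 'plead', 'convict']
```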

Page 37

Semantic roles: Grenager & Manning (2006)

• From dependency parses, a generative model predicts semantic roles corresponding to each verb's arguments, as well as their syntactic realizations
– PropBank-style: ARG0, ARG1, etc. per verb (roles do not necessarily correspond across verbs)
– Learned syntactic patterns of the form: (subj=give.ARG0, verb=give, np#1=give.ARG2, np#2=give.ARG1) or (subj=give.ARG0, verb=give, np#2=give.ARG1, pp_to=give.ARG2)
• Used for semantic role labeling
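A minimal sketch of applying such learned patterns (often called linkings) to label roles, with hypothetical structures; the actual model is generative and probabilistic rather than a lookup.

```python
# Hypothetical learned patterns for "give": syntactic slot -> role.
linkings = {
    "give": [
        {"subj": "ARG0", "np#1": "ARG2", "np#2": "ARG1"},
        {"subj": "ARG0", "np#2": "ARG1", "pp_to": "ARG2"},
    ]
}

def label(verb, args):
    """Pick the first learned pattern whose syntactic slots match."""
    for linking in linkings.get(verb, []):
        if set(linking) == set(args):
            return {phrase: linking[slot] for slot, phrase in args.items()}
    return None

print(label("give", {"subj": "Mary", "np#2": "a book", "pp_to": "to John"}))
# {'Mary': 'ARG0', 'a book': 'ARG1', 'to John': 'ARG2'}
```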

Page 38

“Semanticity”: Our proposed scale of semantic richness

• text < POS < syntax/morphology/alignments < coreference/semantic roles/temporal ordering < translations/narrative event sequences

• We score each model's inputs and outputs on this scale, and call the input-to-output increase "semantic gain"
– Haghighi et al.'s bilingual lexicon induction wins in this respect, going from raw text to lexical translations
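A minimal sketch of the scale as code; the ordering comes from the slide, while the numeric scores are illustrative.

```python
# Semanticity levels per the slide's ordering (numbers are illustrative).
SEMANTICITY = {"text": 0, "POS": 1, "syntax": 2, "morphology": 2,
               "alignments": 2, "coreference": 3, "semantic roles": 3,
               "temporal ordering": 3, "translations": 4,
               "narrative event sequences": 4}

def semantic_gain(inputs, outputs):
    """Input-to-output increase in semantic richness."""
    return (max(SEMANTICITY[o] for o in outputs)
            - max(SEMANTICITY[i] for i in inputs))

# Haghighi et al.: raw text in, lexical translations out.
print(semantic_gain(["text"], ["translations"]))  # 4, the largest gain
```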

Page 39

Robustness to language variation

• About half of the papers we examined had English-only evaluations
• We considered which techniques were most adaptable to other (esp. resource-poor) languages. Two main factors:
– Reliance on existing tools/resources for preprocessing (parsers, coreference resolvers, …)
– Any linguistic specificity in the model (e.g. suffix-based morphology)

Page 40

Summary

We examined three areas of unsupervised NLP:

1. Sequence tagging: How can we predict POS (or topic) tags for words in sequence?

2. Morphology: How are words put together from morphemes (and how can we break them apart)?

3. Lexical resources: How can we identify lexical translations, semantic roles and argument frames, or narrative event sequences from text?

In eight recent papers we found a variety of approaches, including heuristic algorithms, Bayesian methods, and EM-style techniques.

Page 41

Thanks to Noah and Kevin for their feedback on the paper; Andreas and Narges for their collaboration on the presentations; and all of you for giving us your attention!

Questions?

un-supervise-d learn-ing

hablar bailar
hablo bailo
hablamos bailamos
hablan bailan

subj=give.ARG0 verb=give np#1=give.ARG2 np#2=give.ARG1