Mono- and bilingual modeling of selectional preferences
Sebastian Padó
Institute for Computational Linguistics
Heidelberg University
(joint work with Katrin Erk, Ulrike Pado, Yves Peirsman)
Some context
• Computational lexical semantics: modeling the meaning of words and phrases
• Distributional approach
  • Observe the usage of words in corpora
  • Robustness: Broad coverage, manageable complexity
  • Flexibility: Corpus choice determines model
[Diagram: Knowledge, Corpus, Structure]
Methods: Distributional semantics
Phenomena: Semantic relations in bilingual dictionaries
Application: Predictions of plausibility judgments
Plausibility of Verb-Relation-Argument Triples
Verb  Relation  Argument  Plausibility
eat   subject   customer  6.9
eat   object    customer  1.5
eat   subject   apple     1.0
eat   object    apple     6.4
• Central aspect of language
  • Selectional preferences [Katz & Fodor 1963, Wilks 1975]
  • Generalization of lexical similarity
  • Incremental language processing [McRae & Matsuki 2009]
  • Disambiguation [Toutanova et al. 2005], applicability of inference rules [Pantel et al. 2007], SRL [Gildea & Jurafsky 2002]
Modelling Plausibility
• Approximating plausibility by frequency
• Two lexical variables: Frequency of most triples is zero
• Implausibility or sparse data?
  • Generalization based on an ontology (WordNet) [Resnik 1996]
  • Generalization based on vector space [Erk, Padó, and Padó 2010]
English corpus
(eat, obj, apple) 100
(eat, obj, hat) 1
(eat, obj, telephone) 0
(eat, obj, caviar) 0
(eat, obj, apple): highly plausible
(eat, obj, hat): somewhat plausible
(eat, obj, telephone): ?
(eat, obj, caviar): ?
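The sparse-data problem on this slide can be made concrete with a minimal sketch. It assumes plausibility is approximated by the relative frequency of an argument within its (verb, relation) slot; the function name `relative_frequency` is hypothetical, and only the counts come from the slide.

```python
from collections import Counter

# Toy counts copied from the slide; a real model would read them from a parsed corpus.
triple_counts = Counter({
    ("eat", "obj", "apple"): 100,
    ("eat", "obj", "hat"): 1,
})

def relative_frequency(triple, counts):
    """Share of the (verb, relation) slot's observations taken by this argument."""
    verb, rel, _ = triple
    slot_total = sum(c for (v, r, _), c in counts.items() if (v, r) == (verb, rel))
    # Counter returns 0 for unseen triples, so unseen arguments score 0.0.
    return counts[triple] / slot_total if slot_total else 0.0

for arg in ["apple", "hat", "telephone", "caviar"]:
    print(arg, relative_frequency(("eat", "obj", arg), triple_counts))
```

Both telephone and caviar receive a score of zero, so frequency alone cannot separate an implausible argument from a plausible but unseen one.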
Semantic Spaces
• Characterization of word meaning through profile over occurrence contexts [Salton, Wang, and Yang 1974, Landauer & Dumais 1997, Schütze 1998]
• Geometrically: Vector in high-dimensional space
• High vector similarity implies high semantic similarity
  • Nearest neighbors = synonyms
            cultiver  rouler
mandarine   5         1
clémentine  4         1
voiture     1         20
[Figure (Fr): mandarine, clémentine, and voiture as vectors over the dimensions cultiver and rouler]
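As an illustration of the table above, the following sketch compares the three co-occurrence vectors. Cosine is assumed as the similarity measure here (the slides do not name one), and the helper names are hypothetical.

```python
import math

# Co-occurrence counts from the slide; dimensions are (cultiver, rouler).
vectors = {
    "mandarine": [5, 1],
    "clémentine": [4, 1],
    "voiture": [1, 20],
}

def cosine(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def nearest_neighbor(word, space):
    """Most similar other word in the space."""
    return max((w for w in space if w != word),
               key=lambda w: cosine(space[word], space[w]))

print(nearest_neighbor("mandarine", vectors))
```

With these counts, mandarine and clémentine point in almost the same direction, while voiture is dominated by the rouler dimension.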
Similarity-based generalization [Pado, Pado & Erk 2010]
• Plausibility is average vector space similarity to seen arguments
  • (v, r, a): verb – relation – argument head word triple
  • seenargs: set of argument head words seen in the corpus
  • wt: weight function
  • Z: normalization constant
  • sim: semantic (vector space) similarity
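The definitions above can be read as a weighted average of sim(a, a') over the seen arguments a'. The sketch below follows that reading under stated assumptions: Z is taken to be the sum of the weights, sim is cosine, the default weight is uniform, and the two-dimensional vectors are invented for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity, standing in for the slide's sim (an assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def plausibility(arg, seen_args, space, wt=lambda a: 1.0):
    """Weighted average similarity of `arg` to the seen arguments of one (v, r) slot."""
    z = sum(wt(a) for a in seen_args)  # normalization constant Z (assumed: sum of weights)
    if z == 0:
        return 0.0
    return sum(wt(a) * cosine(space[arg], space[a]) for a in seen_args) / z

# Hypothetical vectors; a real space would be built from corpus co-occurrences.
space = {
    "apple": [9, 1],
    "orange": [8, 1],
    "caviar": [6, 2],
    "telephone": [1, 9],
}
seen_objects_of_eat = ["apple", "orange"]

print(plausibility("caviar", seen_objects_of_eat, space))
print(plausibility("telephone", seen_objects_of_eat, space))
```

Unlike raw frequency, this assigns the unseen argument caviar a higher score than the unseen argument telephone, because caviar lies closer to the seen objects of eat. A frequency-based `wt` could be passed in place of the uniform default.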
Geometrical interpretation
[Figure: Peter, husband, child, orange, apple, breakfast, caviar, and telephone as points in vector space, with regions labeled Seen subjects of “eat” and Seen objects of “eat”]
Evaluation
• Triples with human plausibility ratings [McRae et al. 1996]
  • Evaluation: Correlation of model predictions with human judgments
  • Spearman’s ρ = 1: perfect correlation; ρ = 0: no correlation
• Result: Vector space model almost attains the quality of the “deep” model, at 98% coverage

Model                               Coverage  Spearman’s ρ
Resnik 1996 [ontology-based]        100%      0.123 n.s.
EPP [vector space-based]            98%       0.325 ***
U. Pado et al. 2006 [“deep” model]  78%       0.415 ***
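For reference, the evaluation measure can be sketched in a few lines: Spearman's ρ is the Pearson correlation of the two rank vectors. The human ratings below are the four eat triples from the earlier slide; the model scores are invented for illustration, and the function names are hypothetical.

```python
def average_ranks(values):
    """Ranks 1..n in ascending order; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def spearman_rho(x, y):
    return pearson(average_ranks(x), average_ranks(y))

human = [6.9, 1.5, 1.0, 6.4]   # ratings from the slide
model = [0.8, 0.3, 0.1, 0.9]   # hypothetical model scores
print(spearman_rho(human, model))
```

Because only ranks matter, the model does not need to reproduce the rating scale, only the ordering of the triples.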
From one to many languages…
• Vector space model reduces the need for language resources to predict plausibility judgments
  • No ontologies
• Still necessary: Observations of triples, target words
  • Large, accurately parsed corpus
  • Problematic for basically all languages except English
• Can we extend our strategy to new languages?
Resnik [Brockmann & Lapata 2002]  TIGER + GermaNet  ρ = .37
EPP [Pado & Peirsman 2010]        HGC               ρ = .33
Predicting plausibility for new languages
• Transfer with a bilingual lexicon [Koehn and Knight 2002]
  • Cross-lingual knowledge transfer
• Print dictionaries are problematic
  • Instead: acquire from distributional data
cultiver – grow
pomme – apple
(cultiver, Obj, pomme)
English model
English corpus
(grow, obj, apple): highly plausible
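The transfer step in this example can be sketched as a lexicon lookup followed by a query to the English model. This is a minimal illustration: the lexicon entries come from the slide, the score table stands in for a trained English model, its value is invented, and relation labels are assumed to be shared across languages.

```python
# Lexicon entries from the slide.
lexicon = {"cultiver": "grow", "pomme": "apple"}

# Hypothetical stand-in for an English plausibility model.
english_scores = {("grow", "obj", "apple"): 0.9}

def english_model(triple):
    return english_scores.get(triple, 0.0)

def translate_triple(triple, lexicon):
    """Translate verb and argument; keep the relation label unchanged (assumption)."""
    verb, rel, arg = triple
    if verb not in lexicon or arg not in lexicon:
        return None
    return (lexicon[verb], rel, lexicon[arg])

def transfer_plausibility(triple, lexicon, model):
    """Score a source-language triple with the English model, or None if untranslatable."""
    translated = translate_triple(triple, lexicon)
    return None if translated is None else model(translated)

print(transfer_plausibility(("cultiver", "obj", "pomme"), lexicon, english_model))
```

Returning `None` for untranslatable triples keeps missing lexicon coverage distinct from a genuinely low plausibility score.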
Bilingual semantic space
• Joint semantic space for words from both languages [Rapp 1995, Fung & McKeown 1997]
  • Dimensions are bilingual word pairs, can be bootstrapped
  • Frequencies observable from comparable corpora
• Nearest neighbors: Cross-lingual synonyms ⟷ Translations
           (cultiver, grow)  (rouler, drive)
mandarine  5                 1
mandarin   4                 2
car        1                 20
[Figure (Fr, E): mandarine, mandarin, and car as vectors over the dimensions cultiver/grow and rouler/drive]
Nearest neighbors in bilingual space
• Similar usages / context profiles do not necessarily indicate synonymy
       (cultiver, grow)  (rouler, drive)
pear   5                 1
pomme  4                 2
car    1                 20
[Figure (Fr, E): pear, pomme, and car as vectors over the dimensions cultiver/grow and rouler/drive]
• Bilingual case: Peirsman & Pado (2011)
  • Lexicon extraction for EN/DE and EN/NL
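The pear/pomme example can be reproduced with a small sketch that ranks English words by their similarity to a French word in the joint space. Cosine similarity and the function name `ranked_translations` are assumptions; the counts are the ones shown on the slide.

```python
import math

# Counts from the slide; dimensions are the pairs (cultiver, grow) and (rouler, drive).
french = {"pomme": [4, 2]}
english = {"pear": [5, 1], "car": [1, 20]}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def ranked_translations(word, source_space, target_space):
    """Target-language words ordered from most to least similar to `word`."""
    query = source_space[word]
    return sorted(target_space,
                  key=lambda w: cosine(query, target_space[w]),
                  reverse=True)

print(ranked_translations("pomme", french, english))
```

Here pear ranks first for pomme although it is a co-hyponym of the correct translation rather than a synonym, which is the effect the slide describes.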
Evaluation against Gold Standard
• Evaluation of nearest cross-lingual neighbors against a translators’ dictionary
Analysis of 200 noun pairs (EN-DE)
Meta-Relation               Relation     Frequency  Example
Synonymy (50%)                           99         Verhältnis - relationship
Semantic similarity (16%)   Antonymy     1          Inneres - exterior
                            Co-Hyponymy  15         Straßenbahn - bus
                            Hyponymy     3          Kunstwerk - painting
                            Hypernymy    15         Dramatiker - poet
Semantic relatedness (19%)               39         Kapitel - essay
Errors (14%)                             28         DDR-Zeit – trainee
Similarity by relation
How to proceed?
• Classical reaction: Focus on cross-lingual synonyms
  • Aggressive filtering of nearest-neighbor lists
  • Risk: Sparse data issues
• Our hypothesis (preliminary version):
  • Non-synonymous pairs still provide information about bilingual similarity
  • Should be exploited for cross-lingual knowledge transfer
  • Experimental validation: Vary number of synonyms, observe effect on cross-lingual knowledge transfer
Varying the number of neighbors
• Nearest neighbors: 50% are synonyms
• Further neighbors: quick decline to 10% synonyms
Experimental setup
rouler – drive
bagnole – jalopy, banger, car
(bagnole, subj, rouler)
English model
English corpus
Consider plausibilities for:
(jalopy, subj, drive)
(banger, subj, drive)
(car, subj, drive)
Details
• Model:
  • English model: trained on BNC as before
  • Bilingual lexicon extracted from BNC and Stuttgarter Nachrichtenkorpus HGC as comparable corpora
  • Prediction based on n nearest English neighbours for German argument
• Evaluation:
  • 90 German (v,r,a) triples with human plausibility ratings [Brockmann & Lapata 2003]
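One way to read "prediction based on n nearest English neighbours" is to score each candidate translation of the argument with the English model and combine the scores. The sketch below averages them, which is an assumption (the slides do not say how the scores are combined). It reuses the bagnole example from the setup slide, written in (verb, relation, argument) order, with invented scores.

```python
# Verb lexicon and ranked argument neighbours from the setup slide.
verb_lexicon = {"rouler": "drive"}
arg_neighbors = {"bagnole": ["jalopy", "banger", "car"]}

# Hypothetical English plausibility scores.
english_scores = {
    ("drive", "subj", "jalopy"): 0.5,
    ("drive", "subj", "banger"): 0.2,
    ("drive", "subj", "car"): 0.9,
}

def english_model(triple):
    return english_scores.get(triple, 0.0)

def knn_transfer(triple, verb_lexicon, arg_neighbors, model, n=3):
    """Mean English plausibility over the n nearest neighbours of the argument."""
    verb, rel, arg = triple
    candidates = arg_neighbors.get(arg, [])[:n]
    if verb not in verb_lexicon or not candidates:
        return None  # no translation available
    scores = [model((verb_lexicon[verb], rel, c)) for c in candidates]
    return sum(scores) / len(scores)

for n in (1, 2, 3):
    print(n, knn_transfer(("rouler", "subj", "bagnole"),
                          verb_lexicon, arg_neighbors, english_model, n=n))
```

Varying `n` in this way mirrors the experimental variable: larger values admit more non-synonymous neighbours into the prediction.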
Results – EN-DE
                        1 NN  2 NN  3 NN  4 NN  5 NN
Translated English EPP  0.34  0.41  0.44  0.46  0.40

Model                              Resources                     Spearman’s ρ
Resnik [Brockmann & Lapata 2002]   TIGER corpus, German WordNet  .37
EPP German [Pado & Peirsman 2010]  HGC corpus parsed with PCFG   .33

• Result: Transfer model significantly better than monolingual model, but only if non-synonymous neighbors are included
Results: Details
                          1 NN  2 NN  3 NN  4 NN  5 NN
English EPP (all)         0.34  0.41  0.44  0.46  0.40
English EPP (subjects)    0.53  0.51  0.56  0.56  0.55
English EPP (objects)     0.58  0.61  0.61  0.64  0.58
English EPP (pp objects)  0.33  0.45  0.45  0.46  0.42
Sources of the positive effect
• Non-synonyms are in fact informative for plausibility translation
• Semantically similar verbs: eat – munch – feast
  • Similar events, similar arguments [Fillmore et al. 2003, Levin 1993]
• Semantically related verbs: peel – cook – eat
  • Schemas/narrative chains: shared participants [Schank & Abelson 1977, Chambers & Jurafsky 2009]
Our hypothesis with qualifications
• Using non-synonymous translation pairs is helpful
  1. if transferred knowledge is lexical
     • Many infrequently observed datapoints
  2. if knowledge is stable across semantically related/similar word pairs
     • Counterexample: polarity/sentiment judgments
     • food – feast – grub
     • Parallel experiment: best results for single nearest neighbor
Summary
• Plausibility can be modeled with fairly shallow methods
  • Seen head words plus generalization in vector space
  • Precondition: accurately parsed corpus
• If unavailable: Transfer from better-endowed language
  • Translation through automatically induced lexicons
• Transfer of knowledge about certain phenomena can benefit from non-synonymous translations
  • Corresponding to monolingual results from QA [Harabagiu et al. 2000], paraphrases [Lin & Pantel 2001], entailment [Dagan et al. 2006], …