Implications and applications of corpus-based analysis Christiane Fellbaum Princeton University.

Page 1:

Implications and applications of corpus-based analysis

Christiane Fellbaum

Princeton University

Page 2:

Implications

• Linguist is no longer his/her own corpus
• Corpus data don’t necessarily agree with introspection, intuition
• Broad speaker community may reveal linguist’s idiolect

Page 3:

Implications

• New research methodology requires new analyses
• Statistical, “soft” rules rather than hard “yes/no” rules
• “Messy” theories (hence remaining resistance to CL!)

Page 4:

Applications

Gain better theoretical understanding of linguistic phenomena

For lexical semantics work:
-- New challenges for lexicographic representation
-- Natural Language Processing applications, e.g., text understanding, language generation

Page 5:

Two examples

• Large-scale corpus analysis of German VP idioms

• Discovery of scales

Page 6:

Corpus-based study of idioms

Linguists, lexicographers, psycholinguists assume that idioms are fixed

kein Blatt vor den Mund nehmen

No leaf in front of the mouth take

“speak freely and frankly”

Non-compositional, opaque

Page 7:

Corpus data show

• Morphosyntactic variation:
Ein Blatt nehmen sie dabei vor keinen Mund
A leaf take they in front of no mouth
(topicalization, shift of negation)

• Lexical variation:
Ein Regierungssprecher ist ein Mann,
A government spokesman is a man
der sich 100 Blaetter vor den Mund nimmt
who 100 leaves in front of his mouth takes

• No theory of idiom grammar/representation accounts for all phenomena
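A corpus search for such variants cannot rely on the fixed citation form. The following is a minimal sketch, not the method used in the study: it assumes a corpus already split into sentences and a hand-made list of surface forms for each idiom component (the names `COMPONENTS`, `CANONICAL` and `find_idiom_candidates` are hypothetical). It keeps every sentence in which all components co-occur and marks whether the canonical string is present.

```python
import re

# Hypothetical, hand-listed surface forms for each idiom component.
# A real study would rely on lemmatization or a morphological analyzer.
COMPONENTS = {
    "Blatt": r"\bBl(?:att|ätter|aetter)\b",
    "Mund": r"\bMund\b",
    "nehmen": r"\b(?:nehmen|nimmt|nahm|nahmen|genommen)\b",
}

# The citation form of the idiom, without the verb.
CANONICAL = re.compile(r"\bkein Blatt vor den Mund\b")


def find_idiom_candidates(sentences):
    """Return (sentence, label) pairs for sentences containing all components.

    The label is "canonical" if the citation form occurs literally,
    otherwise "variant" (a candidate that still needs manual inspection).
    """
    results = []
    for sentence in sentences:
        if all(re.search(p, sentence) for p in COMPONENTS.values()):
            label = "canonical" if CANONICAL.search(sentence) else "variant"
            results.append((sentence, label))
    return results


# Toy input: the two variant sentences come from the slide,
# the other two are made up for illustration.
SENTENCES = [
    "Sie nimmt kein Blatt vor den Mund.",
    "Ein Blatt nehmen sie dabei vor keinen Mund.",
    "der sich 100 Blaetter vor den Mund nimmt",
    "Das Blatt fiel vom Baum.",
]

for sentence, label in find_idiom_candidates(SENTENCES):
    print(label, "|", sentence)
```

Co-occurrence alone over-generates (literal uses of Blatt, Mund and nehmen in one sentence would also match), so the "variant" hits are only candidates for manual checking.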

Page 8:

Discovering scales

• Scalar adjectives (Sheinman & Tokunaga 2009; Schulam & Fellbaum 2010)

…terrible-lousy-bad-mediocre-good-great-outstanding…

• Gradable emotions (Fellbaum & Mathieu 2010)

…alarm-frighten-scare-terrify…

• Where on the scale are these words placed? What is their relative position (their strength, intensity)?

Page 9:

Discovering scales

Corpus searches with a seed pair reveal lexical-semantic patterns for asymmetry, such as

X even Y (Y is stronger than X)
If not X, at least Y (X>Y)
X but not Y (X is weaker than Y)

Patterns can be applied to all members of a scale to establish their relative order
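The pattern-based ordering can be sketched as follows. This is an illustrative toy, not the procedure of the cited papers: the three patterns are the ones listed on the slide, while the regular expressions (including the optional comma), the toy corpus, the net-score ranking and the names `PATTERNS`, `count_evidence` and `order_scale` are assumptions made for the example.

```python
import re
from collections import Counter
from itertools import permutations

# Each entry: (regex template, which slot holds the stronger word).
# The templates mirror the three patterns on the slide.
PATTERNS = [
    (r"\b{x},? even {y}\b", "y"),              # X even Y: Y is stronger
    (r"\bif not {x},? at least {y}\b", "x"),   # If not X, at least Y: X is stronger
    (r"\b{x},? but not {y}\b", "y"),           # X but not Y: Y is stronger
]


def count_evidence(corpus, words):
    """Count pattern hits for every ordered word pair.

    Returns a Counter keyed by (weaker, stronger) tuples.
    """
    votes = Counter()
    for x, y in permutations(words, 2):
        for template, stronger in PATTERNS:
            regex = re.compile(
                template.format(x=re.escape(x), y=re.escape(y)),
                re.IGNORECASE,
            )
            hits = sum(len(regex.findall(sentence)) for sentence in corpus)
            if hits:
                pair = (x, y) if stronger == "y" else (y, x)
                votes[pair] += hits
    return votes


def order_scale(words, votes):
    """Rank words from weakest to strongest by a simple net score.

    Assumption: a word gains a point for each hit in which it is the
    stronger member and loses one for each hit in which it is the weaker.
    """
    score = {w: 0 for w in words}
    for (weaker, stronger), n in votes.items():
        score[stronger] += n
        score[weaker] -= n
    return sorted(words, key=lambda w: score[w])


# Made-up toy corpus; a real study would query a large corpus.
CORPUS = [
    "The food was good, even great.",
    "If not outstanding, at least great.",
    "The show was good but not outstanding.",
]
WORDS = ["outstanding", "good", "great"]

votes = count_evidence(CORPUS, WORDS)
print(order_scale(WORDS, votes))
```

With sparse or conflicting hits a net score can produce ties or cycles, so the output is best read as a starting hypothesis about the scale rather than a final ordering.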

Page 10:

Conclusion

Corpus may reveal linguistic data that
• challenge current theories
• escape introspection