

School of Computing, Faculty of Engineering

A comparative study of the tagging of adverbs in modern English corpora

Owen Nancarrow,

Language research group


Introduction


NLP uses CORPORA in which words are PoS-TAGGED, i.e. tagged with their Parts of Speech: noun, verb, adjective, preposition ...

Tags also mark subcategories, e.g. singular/plural and common/proper nouns

Brown, LOB, BNC and ICE-GB: four English corpora with related but different tagsets; their adverb tags differ particularly markedly

Adverb is a “dustbin” category (“if we’re not sure which PoS, then call it an adverb?”); its subcategories are inconsistent between corpora, and even within a single corpus

We present a detailed analysis grounded in descriptions of adverbs in ELT (English Language Teaching) textbooks


Four sets of related English corpora


Corpora compared in this thesis


Thomson and Martinet (1969)

Traditional adverb subcategories in ELT grammar textbooks:

“... There are seven kinds of adverbs

1 of manner: e.g. quickly, bravely, happily, hard, fast, well

2 of place: e.g. here, there, everywhere, up, down, near, by

3 of time: e.g. now, soon, yet, still, then, today

4 of frequency: e.g. twice, often, never, always, occasionally

5 of degree: e.g. very, fairly, rather, quite, too, hardly

6 interrogative: e.g. when? where? why?

7 relative: e.g. when, where, why ...”

(Thomson and Martinet, 1969: 38)
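As a minimal sketch (our illustration, not part of the study's method), the quoted classification can be stored as a lookup table from subcategory to example words. The table holds only the examples quoted above, and the function name `subcategories` is hypothetical. Even this toy lookup shows why a word list alone cannot assign a subcategory: *when*, *where* and *why* each fall into two classes, so context must decide.

```python
# Thomson and Martinet's (1969: 38) seven adverb subcategories, holding only
# the example words quoted above (not a complete lexicon).
TM_EXAMPLES: dict[str, list[str]] = {
    "manner": ["quickly", "bravely", "happily", "hard", "fast", "well"],
    "place": ["here", "there", "everywhere", "up", "down", "near", "by"],
    "time": ["now", "soon", "yet", "still", "then", "today"],
    "frequency": ["twice", "often", "never", "always", "occasionally"],
    "degree": ["very", "fairly", "rather", "quite", "too", "hardly"],
    "interrogative": ["when", "where", "why"],
    "relative": ["when", "where", "why"],
}


def subcategories(word: str) -> list[str]:
    """Return every subcategory whose example list contains `word` (case-insensitive)."""
    w = word.lower()
    return [cat for cat, words in TM_EXAMPLES.items() if w in words]


if __name__ == "__main__":
    for w in ["quickly", "when", "there", "table"]:
        print(w, subcategories(w))
    # quickly ['manner']
    # when ['interrogative', 'relative']
    # there ['place']
    # table []
```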


Problems with tagging adverbs

To complicate matters, some words are ambiguous between adverb and another part of speech, e.g.:

“Some words can be used as either prepositions or adverbs.

The most important words of this type are:

in, on, up, down, off, near, through, along, across, under, round”

(Thomson and Martinet, 1969: 52)
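A common ELT rule of thumb is that a preposition takes an object (a noun phrase or pronoun) while an adverb does not. The sketch below applies that rule to the words in the list above, assuming each token already carries a coarse tag from a hypothetical set (`DET`, `NOUN`, `PRON`, ...) and inspecting only the next token. The names `prep_or_adv` and `OBJECT_START` are illustrative, not taken from any tagger.

```python
# Words listed by Thomson and Martinet (1969: 52) as either preposition or adverb.
AMBIGUOUS = {"in", "on", "up", "down", "off", "near", "through",
             "along", "across", "under", "round"}

# Hypothetical coarse tags that can begin a noun-phrase object.
OBJECT_START = {"DET", "NOUN", "PRON", "PROPN", "ADJ", "NUM"}


def prep_or_adv(tokens: list[tuple[str, str]], i: int) -> str:
    """Guess 'PREP' or 'ADV' for the ambiguous word at position i.

    tokens is a list of (word, coarse_tag) pairs. Only the following
    token's tag is inspected, so this is a crude heuristic, not a tagger.
    """
    word = tokens[i][0].lower()
    if word not in AMBIGUOUS:
        raise ValueError(f"{word!r} is not in the ambiguous word list")
    # A following noun-phrase start suggests the word has an object: preposition.
    if i + 1 < len(tokens) and tokens[i + 1][1] in OBJECT_START:
        return "PREP"
    # No object follows (end of sentence, punctuation, verb, ...): adverb.
    return "ADV"


if __name__ == "__main__":
    ran_up = [("He", "PRON"), ("ran", "VERB"), ("up", "?"),
              ("the", "DET"), ("hill", "NOUN")]
    looked_up = [("He", "PRON"), ("looked", "VERB"), ("up", "?"), (".", "PUNCT")]
    print(prep_or_adv(ran_up, 2))     # PREP
    print(prep_or_adv(looked_up, 2))  # ADV
```

A phrasal verb such as *looked up the word* defeats the heuristic: *up* is followed by a determiner, so it is guessed to be a preposition even though it behaves as an adverb particle. Cases like this are one reason why corpus tagsets disagree.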

Other problems also arise, e.g.:

some adverbs are tagged inconsistently;

combined words (e.g. here’s) are quasi-adverbs; ...


Adverb or preposition


Inconsistent tagging in Brown
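One way to surface candidates for inconsistent tagging is to profile which tags each word type receives across a tagged corpus, and flag words that receive an adverb tag as well as some other tag. The sketch below assumes the corpus is available as (word, tag) pairs. The sample uses Brown-style tags (`RB` adverb, `RP` particle, `QL` qualifier, `IN` preposition) but is toy data invented for illustration, not drawn from Brown, and the function names are hypothetical. A flagged word is only a candidate: genuine ambiguity, such as adverb/preposition *up*, is flagged too, so each case still needs manual inspection.

```python
from collections import Counter, defaultdict
from typing import Iterable


def tag_profile(tagged: Iterable[tuple[str, str]]) -> dict[str, Counter]:
    """Count how often each lower-cased word type receives each tag."""
    profile: defaultdict[str, Counter] = defaultdict(Counter)
    for word, tag in tagged:
        profile[word.lower()][tag] += 1
    return dict(profile)


def flag_inconsistent(profile: dict[str, Counter],
                      adverb_tags: set[str]) -> dict[str, Counter]:
    """Word types given an adverb tag at least once and more than one tag overall."""
    return {
        word: counts
        for word, counts in profile.items()
        if len(counts) > 1 and any(tag in adverb_tags for tag in counts)
    }


if __name__ == "__main__":
    # Toy data with Brown-style tags; NOT taken from the Brown corpus.
    sample = [("up", "RP"), ("up", "IN"), ("Up", "RP"),
              ("very", "QL"), ("very", "QL"), ("often", "RB")]
    # Assumption: QL (qualifier) counts as an adverb tag here, since ELT
    # grammars treat degree words such as "very" as adverbs.
    flagged = flag_inconsistent(tag_profile(sample), {"RB", "RP", "QL"})
    print(flagged)  # {'up': Counter({'RP': 2, 'IN': 1})}
```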


Combined adverb tags in Brown
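Brown gives a contracted form a single combined tag whose components are joined with `+`; we assume here, for illustration, that *here’s* is tagged `RB+BEZ` (adverb + *is*). A minimal sketch for finding combined tags that contain an adverb component is to split on `+` and test each part. The function names and the small token list are hypothetical, invented for this example rather than extracted from the corpus.

```python
from collections import Counter
from typing import Iterable


def split_combined(tag: str) -> list[str]:
    """Split a '+'-joined combined tag (e.g. 'RB+BEZ') into its parts."""
    return tag.split("+")


def combined_adverb_tags(tagged: Iterable[tuple[str, str]],
                         adverb_tags: set[str]) -> Counter:
    """Count combined tags (containing '+') with at least one adverb component."""
    counts: Counter = Counter()
    for _word, tag in tagged:
        parts = split_combined(tag)
        if len(parts) > 1 and any(p in adverb_tags for p in parts):
            counts[tag] += 1
    return counts


if __name__ == "__main__":
    # Toy tokens with assumed Brown-style combined tags; not corpus data.
    sample = [("here's", "RB+BEZ"), ("how's", "WRB+BEZ"),
              ("there's", "EX+BEZ"), ("Here's", "RB+BEZ"), ("quickly", "RB")]
    print(split_combined("RB+BEZ"))                    # ['RB', 'BEZ']
    print(combined_adverb_tags(sample, {"RB", "WRB"}))  # Counter({'RB+BEZ': 2, 'WRB+BEZ': 1})
```

In the toy sample, `EX+BEZ` is not counted because `EX` (existential *there*) is not among the adverb tags passed in, and plain `RB` is skipped because it is not a combined tag.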


Synoptic table


Conclusions

Other studies have compared English corpus tagsets (e.g. van Halteren 1999; Atwell et al. 2000; Jurafsky and Martin 2000), but to our knowledge none has focused on adverbs or examined differences in subcategorization in such detail.

Tagset standards should include this level of detail.

The approach in this thesis provides a methodology for examining subcategorizations in other corpus tagsets and/or other grammatical categories.