Post on 28-Mar-2015

School of Computing, Faculty of Engineering

A comparative study of the tagging of adverbs in modern English corpora

Owen Nancarrow, Language research group

Introduction

NLP uses CORPORA whose words are PoS-TAGGED, i.e. labelled with Parts of Speech: noun, verb, adjective, preposition ...

Tags also encode subcategories, e.g. singular/plural and common/proper for nouns

Brown, LOB, BNC and ICE-GB: four English corpora with related but different tag-sets; adverbs are particularly different

Adverb is a “dustbin” category (“if we’re not sure which PoS then call it an adverb?”); subcategories are inconsistent between corpora, even within one corpus

We present a detailed analysis, grounded in descriptions of adverbs in ELT (English Language Teaching) textbooks

Four sets of related English corpora

Corpora compared in this thesis

Thomson and Martinet (1969)

Traditional adverb subcategories in ELT grammar textbooks:

“... There are seven kinds of adverbs

1 of manner: e.g. quickly, bravely, happily, hard, fast, well

2 of place: e.g. here, there, everywhere, up, down, near, by

3 of time: e.g. now, soon, yet, still, then, today

4 of frequency: e.g. twice, often, never, always, occasionally

5 of degree: e.g. very, fairly, rather, quite, too, hardly

6 interrogative: e.g. when? where? why?

7 relative: e.g. when, where, why ...”

(Thomson and Martinet, 1969: 38)
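
The seven kinds quoted above can be written down as a small lookup table. The following is only a minimal sketch: the example words are those in the quotation, while the names `ADVERB_KINDS`, `kinds_of` and `ambiguous_examples` are hypothetical helpers introduced here for illustration. It shows that even the textbook examples overlap, since the interrogative and relative lists share the same words.

```python
# Example words copied from the Thomson and Martinet quotation above.
# The dictionary and function names are illustrative assumptions.
ADVERB_KINDS = {
    "manner": ["quickly", "bravely", "happily", "hard", "fast", "well"],
    "place": ["here", "there", "everywhere", "up", "down", "near", "by"],
    "time": ["now", "soon", "yet", "still", "then", "today"],
    "frequency": ["twice", "often", "never", "always", "occasionally"],
    "degree": ["very", "fairly", "rather", "quite", "too", "hardly"],
    "interrogative": ["when", "where", "why"],
    "relative": ["when", "where", "why"],
}


def kinds_of(word):
    """Return every kind whose example list contains the word (case-insensitive)."""
    w = word.lower()
    return [kind for kind, examples in ADVERB_KINDS.items() if w in examples]


def ambiguous_examples():
    """Return the example words listed under more than one kind, sorted."""
    words = {w for examples in ADVERB_KINDS.values() for w in examples}
    return sorted(w for w in words if len(kinds_of(w)) > 1)
```

Here `kinds_of("when")` returns both `"interrogative"` and `"relative"`, so a word form alone does not determine its subcategory.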

Problems with tagging adverbs

To complicate matters, some adverbs are ambiguous, e.g.:

“Some words can be used as either prepositions or adverbs.

The most important words of this type are:

in, on, up, down, off, near, through, along, across, under, round”

(Thomson and Martinet, 1969: 52)
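
As a rough illustration, the words in this quotation can be used to flag tokens whose tag needs a closer look. This is a sketch under simple assumptions (whitespace-tokenised input, case-insensitive matching); the names `AMBIGUOUS` and `flag_ambiguous` are hypothetical, and the function only flags candidates, it does not decide between adverb and preposition.

```python
# Words that Thomson and Martinet list as usable as either
# prepositions or adverbs (copied from the quotation above).
AMBIGUOUS = {
    "in", "on", "up", "down", "off", "near",
    "through", "along", "across", "under", "round",
}


def flag_ambiguous(tokens):
    """Return (index, token) pairs for tokens that could be adverb or preposition."""
    return [(i, tok) for i, tok in enumerate(tokens) if tok.lower() in AMBIGUOUS]


# Illustrative sentence (an assumption, not taken from any of the corpora).
sentence = "He walked across the bridge and looked down".split()
flags = flag_ambiguous(sentence)  # [(2, 'across'), (7, 'down')]
```

In the illustrative sentence, "across" is followed by a noun phrase while "down" is not, which is the kind of context a tagger or annotator has to weigh for each flagged token.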

ALSO:

Other problems, e.g.: some adverbs are tagged inconsistently;

combined words (e.g. here’s) are quasi-adverbs; ...
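
One way to look for candidate inconsistencies is to count, for each word, how often it receives each tag. The sketch below assumes the corpus is available as (word, tag) pairs. The toy data is invented for illustration and is not drawn from Brown, although RB, RP and IN are Brown-style tags for adverb, adverb/particle and preposition. A word with several tags is only a candidate for checking: genuine ambiguity also produces more than one tag.

```python
from collections import Counter, defaultdict


def tag_distribution(tagged_tokens):
    """Map each lower-cased word to a Counter of the tags it was given."""
    dist = defaultdict(Counter)
    for word, tag in tagged_tokens:
        dist[word.lower()][tag] += 1
    return dist


def multi_tagged(tagged_tokens, tags_of_interest=None):
    """Return words that received more than one (relevant) tag, with counts."""
    result = {}
    for word, counts in tag_distribution(tagged_tokens).items():
        kept = {
            tag: n
            for tag, n in counts.items()
            if tags_of_interest is None or tag in tags_of_interest
        }
        if len(kept) > 1:
            result[word] = kept
    return result


# Invented toy data, for illustration only.
TOY = [("up", "RP"), ("Up", "IN"), ("up", "RP"), ("quickly", "RB"), ("here", "RB")]
candidates = multi_tagged(TOY, {"RB", "RP", "IN"})  # {'up': {'RP': 2, 'IN': 1}}
```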

Adverb or preposition

Inconsistent taggings in Brown

Combined adverb tags in Brown

Synoptic table

Conclusions

Other studies have compared English corpus tagsets (e.g. van Halteren 1999, Atwell et al. 2000, Jurafsky and Martin 2000), but none to our knowledge has focused on adverbs or examined differences in sub-categorization in such detail.

Tagset standards should include this level of detail.

The approach in this thesis provides a methodology to follow in examining sub-categorizations in other corpus tagsets, and/or other grammatical categories.