Guy Aston SSLMIT, University of Bologna [email protected] The learner as corpus designer.


Guy Aston

SSLMIT, University of Bologna

[email protected]

The learner as corpus designer

… or the art of fruit salads

Learner uses of corpora

• Form-focussed (data-driven learning)
• Meaning-focussed (learning the culture)
• Skill-focussed (reading practice)
• Browsing environment (serendipity)
• Reference tool for other tasks (reading/writing aid)

Why make your own corpus?

• You can devise your own recipe
• You know what’s in it
• You learn how to do it
• Can be fun
• Can provide practice in language use

The raw ingredients

Devising your own recipe

• Only the text-type(s) you want
• Only the texts you want
• The quantity you want

… small and specialised is beautiful

You know what’s in it

• Top-down knowledge of corpus
• Top-down knowledge of texts

You learn how to do it

• Can be a useful skill for many language workers
  – technical writers
  – translators
  – teachers

• Can make you a more critical corpus user

It can be fun

• Provides a challenge
• Gives sense of achievement/satisfaction

Practice in language use

• Design/construction/evaluation of corpora can be communicative activities

Why use standard corpora?

• Less effort
• More reliable
• Better packaging
• You don’t want to learn to make your own

Less effort

More reliable

• if it’s well designed

• if it fits your needs

Better packaging

• Metatextual information

• Annotation

• Corpus-specific software

You don’t want to learn to make your own?

A compromise strategy: make your own subcorpus

• assemble using the pre-prepared ingredients of a larger corpus

or in other words… go to a (fruit) salad bar

(Pick ’n’ mix with the BNC)

You have a choice of

• text-types
• individual texts

• selection by pre-determined criteria

• selection by hand

… or both

You know what went in

• so top-down processing is easier

Little effort

• in comparison with making your own

Good packaging

• Metatextual information
• Linguistic annotation
• Can use software designed for full corpus
• Indexed

You get to learn

• what are(n’t) useful subcorpora
• what are(n’t) useful design criteria
• how to do it

It can be fun

• challenge / achievement / satisfaction

You can talk about its

• design / construction / evaluation

Talking about fruit salad

BNC Sampler: KC2


And now to details …

the Sampler awaits!

You can create subcorpora of

• specific corpus texts
• texts containing solutions to a query
• encoded categories of texts
• your own categories of texts

and compare them with

• other subcorpora
• the full corpus

Text analysis: selecting

Choosing specific texts

Viewing the index


Party policies (will/shall be + VVN)

Or, to return to our fruit salad text …

A bad language subcorpus: texts containing solutions to a query
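As a rough illustration of what "texts containing solutions to a query" means, the Python sketch below keeps only those texts in which a search pattern matches at least once. It is not the BNC software's own mechanism: the `select_texts` helper, the text ids and the sample sentences are all invented for illustration, and the query is a plain regular expression.

```python
import re

def select_texts(corpus, pattern):
    """Return the ids of texts with at least one match for the query pattern."""
    query = re.compile(pattern, re.IGNORECASE)
    return [text_id for text_id, text in corpus.items() if query.search(text)]

# Hypothetical toy corpus: ids and sentences are made up, not BNC data.
corpus = {
    "T01": "Oh, I never thought of that.",
    "T02": "The committee approved the proposal.",
    "T03": "Well, oh dear, that is a shame.",
}

# Texts containing the word "oh" form the subcorpus.
subcorpus_ids = select_texts(corpus, r"\boh\b")
print(subcorpus_ids)
```

The subcorpus is simply the list of matching text ids, which could then be searched or counted separately from the rest of the collection.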

Choosing the bad language texts


collocates of f.*k.*

collocates of f_ words

collocates of oh
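Collocate lists like those named above can be approximated by counting the words that fall within a few positions of the search word. The following is a minimal Python sketch under that assumption; the `collocates` function, the window size and the sample sentence are hypothetical, and the slides do not say how the original lists were computed or ranked.

```python
import re
from collections import Counter

def collocates(text, node, span=3):
    """Count words occurring within `span` tokens either side of `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            # Words to the left and right of the node, excluding the node itself.
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

# Invented sample, not corpus data.
sample = "Oh dear, oh dear. Oh well, never mind."
oh_collocates = collocates(sample, "oh", span=1)
print(oh_collocates.most_common())
```

Raw co-occurrence counts favour very common words; corpus tools usually offer a statistical score as well, which this sketch leaves out.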


Making subcorpora using encoded categories

• ‘context-governed’ spoken texts
  – monologue: 17 texts
  – dialogue: 29 texts

Monologue vs Dialogue

More frequent in M*
  – could
  – had
  – he
  – know
  – their
  – were
  – when
  – who
  – your

More frequent in D*
  – 'll
  – 'm
  – any
  – no
  – pounds
  – right
  – yeah
  – yes

*ranked 20+ positions higher in first 100 words
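The "ranked 20+ positions higher in first 100 words" criterion can be sketched in Python as a comparison of frequency ranks between two word lists. The function names, the treatment of words missing from the second list (ranked just below its last word) and the toy data are assumptions made for illustration, not a description of the procedure actually used.

```python
from collections import Counter

def ranks(tokens, top):
    """Map each of the `top` most frequent words to its rank (1 = most frequent)."""
    return {w: r for r, (w, _) in enumerate(Counter(tokens).most_common(top), start=1)}

def ranked_higher(tokens_a, tokens_b, top=100, min_gap=20):
    """Words in A's top list that rank at least `min_gap` places higher than in B."""
    rank_a = ranks(tokens_a, top)
    rank_b = ranks(tokens_b, len(set(tokens_b)))  # full ranking for B
    absent = len(rank_b) + 1  # assumption: unseen words rank just below B's last word
    return sorted(w for w, r in rank_a.items() if rank_b.get(w, absent) - r >= min_gap)

# Toy data with small thresholds; real use would keep top=100, min_gap=20.
mono = "the the the the he he he had had could".split()
dia = "the the the the yeah yeah yeah yes yes he".split()
higher_in_mono = ranked_higher(mono, dia, top=4, min_gap=2)
print(higher_in_mono)
```

Comparing ranks rather than raw counts means the two subcorpora do not need to be the same size.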

Investigating the differences

• no occurrences of all right in monologue

• when you’re / you’ll / you’d / you’ve are more common in monologue than when we’re / we’ll / we’d / we’ve; vice versa in dialogue

you and we

            you    we
Monologue   4253   2014
Dialogue    6635   4949
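Because the two subcorpora differ in size, the raw counts are easier to compare as a ratio of you to we within each one. The short Python calculation below assumes the fused figures read as 4253 and 2014 for monologue and 6635 and 4949 for dialogue; that split is an interpretation of the slide, not something it states.

```python
# Counts as read from the slide (assumed split of the fused digits).
counts = {
    "monologue": {"you": 4253, "we": 2014},
    "dialogue": {"you": 6635, "we": 4949},
}

# Occurrences of "you" for every occurrence of "we" in each subcorpus.
you_per_we = {name: c["you"] / c["we"] for name, c in counts.items()}

for name, ratio in you_per_we.items():
    print(f"{name}: {ratio:.2f} occurrences of 'you' per 'we'")
```

On this reading, you outnumbers we by roughly two to one in monologue and by about four to three in dialogue.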

Subcorpora using your own categories

David Lee’s book genres

• academic non-fiction (13 texts)
• non-academic non-fiction (15 texts)
• prose fiction (13 texts)

Distinctive -ly adverbs of:

• academic non-fiction
  – accordingly, essentially, eventually, largely, namely, notably, respectively, surprisingly
• non-academic non-fiction
  – effectively, merely, normally, obviously, possibly, specially
• prose fiction
  – carefully, quietly, slightly, slowly, softly, surely, truly
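One way to arrive at lists of this kind is to compare how often each -ly word occurs per token in one genre against the others. The Python sketch below is only an assumption about how such a comparison might be set up: the `distinctive` function, the factor of 2 and the toy sentences are invented, and matching on the spelling "ly" is crude, so a real query would more likely rely on the corpus's part-of-speech annotation.

```python
from collections import Counter

def ly_rates(tokens):
    """Relative frequency of each word ending in -ly (spelling match only)."""
    ly_counts = Counter(t for t in tokens if t.endswith("ly"))
    return {w: n / len(tokens) for w, n in ly_counts.items()}

def distinctive(genres, target, factor=2.0):
    """-ly words at least `factor` times as frequent in `target` as in every other genre."""
    rates = {g: ly_rates(toks) for g, toks in genres.items()}
    result = []
    for word, rate in rates[target].items():
        others = [rates[g].get(word, 0.0) for g in genres if g != target]
        if all(rate >= factor * other for other in others):
            result.append(word)
    return sorted(result)

# Invented toy genres, not Lee's categories or BNC data.
genres = {
    "academic": "the results were largely consistent namely really largely".split(),
    "fiction": "she spoke softly and slowly and really quietly".split(),
}
academic_ly = distinctive(genres, "academic")
print(academic_ly)
```

Words that are equally common in both genres, such as "really" in the toy data, drop out because they fail the factor test.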

largely (academic non-fiction)

it (academic non-fiction)

To conclude …

Working with subcorpora can allow

• study/comparison of forms/meanings in particular texts/text-types
• better-focussed reading practice
• more appropriate reference tools for particular tasks
• more focussed browsing

Subcorpora

• may not be representative (but nor is most language learning data)
• are good for forming hypotheses to be tested more widely
• will allow more interesting uses when extracted from a larger corpus

Making your own provides

• better preparation and motivation for corpus use
• more critical awareness
• lots to talk about

Enjoy!