Guy Aston
SSLMIT, University of Bologna
[email protected]
The learner as corpus designer
Date posted: 19-Dec-2015
Learner uses of corpora
• Form-focussed (data-driven learning)
• Meaning-focussed (learning the culture)
• Skill-focussed (reading practice)
• Browsing environment (serendipity)
• Reference tool for other tasks (reading/writing aid)
Why make your own corpus?
• You can devise your own recipe
• You know what’s in it
• You learn how to do it
• Can be fun
• Can provide practice in language use
Devising your own recipe
• Only the text-type(s) you want
• Only the texts you want
• The quantity you want
… small and specialised is beautiful
You learn how to do it
• Can be a useful skill for many language workers
– technical writers
– translators
– teachers
• Can make you a more critical corpus user
It can be fun
• Provides a challenge
• Gives sense of achievement/satisfaction
Practice in language use
• Design/construction/evaluation of corpora can be communicative activities
Why use standard corpora?
• Less effort
• More reliable
• Better packaging
• You don’t want to learn to make your own
A compromise strategy: make your own subcorpus
• assemble using the pre-prepared ingredients of a larger corpus
or in other words… go to a (fruit) salad bar
You have a choice of
• text-types
• individual texts
• selection by pre-determined criteria
• selection by hand
… or both
You know what went in
• so top-down processing is easier
Little effort
• in comparison with making your own
Good packaging
• Metatextual information
• Linguistic annotation
• Can use software designed for full corpus
• Indexed
You get to learn
• what are(n’t) useful subcorpora
• what are(n’t) useful design criteria
• how to do it
It can be fun
• challenge / achievement / satisfaction
You can talk about its
• design / construction / evaluation
You can create subcorpora of
• specific corpus texts
• texts containing solutions to a query
• encoded categories of texts
• your own categories of texts
and compare them with
• other subcorpora
• the full corpus
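The four ways of creating a subcorpus listed above can be sketched as simple filters over a list of texts. This is only an illustration: the `Text` record, its fields, and the function names are hypothetical, not the interface of any particular corpus tool, and a "query" is reduced here to a case-insensitive substring match.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Text:
    """Hypothetical record for one corpus text."""
    text_id: str    # identifier of the text
    category: str   # category already encoded in the corpus
    body: str       # the text itself


def by_ids(corpus: List[Text], ids: Iterable[str]) -> List[Text]:
    """Specific corpus texts, selected by hand."""
    wanted = set(ids)
    return [t for t in corpus if t.text_id in wanted]


def by_query(corpus: List[Text], query: str) -> List[Text]:
    """Texts containing at least one solution to a query (substring match)."""
    q = query.lower()
    return [t for t in corpus if q in t.body.lower()]


def by_encoded_category(corpus: List[Text], category: str) -> List[Text]:
    """Texts belonging to a category already encoded in the corpus."""
    return [t for t in corpus if t.category == category]


def by_own_category(corpus: List[Text],
                    classify: Callable[[Text], str],
                    label: str) -> List[Text]:
    """Texts that your own classification function assigns to `label`."""
    return [t for t in corpus if classify(t) == label]


corpus = [
    Text("A01", "monologue", "Right, today I want to talk about corpora."),
    Text("A02", "dialogue", "Yeah, all right, that is ten pounds."),
    Text("A03", "dialogue", "Yes, I'll see you when we've finished."),
]
dialogue = by_encoded_category(corpus, "dialogue")
short_texts = by_own_category(
    corpus, lambda t: "short" if len(t.body) < 40 else "long", "short")
```

Each function returns an ordinary list, so a subcorpus can be compared with another subcorpus or with the full `corpus` using the same counting code.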
Making subcorpora using encoded categories
• ‘context-governed’ spoken texts
– monologue: 17 texts
– dialogue: 29 texts
Monologue vs Dialogue
More frequent in M*
– could, had, he, know, their, were, when, who, your
More frequent in D*
– 'll, 'm, any, no, pounds, right, yeah, yes
*ranked 20+ positions higher in first 100 words
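One possible reading of the "ranked 20+ positions higher in first 100 words" criterion is sketched below: build a frequency-ranked word list for each subcorpus, keep the top 100, and report the words whose rank in one list is at least 20 places better than in the other. The tokeniser is a crude assumption (lower-cased letter runs, with clitics such as 'll split off), and treating a word absent from the other top-100 list as rank 101 is also an assumption, not something the slides state.

```python
import re
from collections import Counter


def rank_table(texts, top_n=100):
    """Map each of the top_n most frequent word forms to its 1-based rank."""
    counts = Counter()
    for text in texts:
        # Crude tokeniser (assumption): letter runs, plus clitics like 'll, 'm.
        counts.update(re.findall(r"[a-z]+|'[a-z]+", text.lower()))
    # Sort by descending frequency, breaking ties alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word: rank
            for rank, (word, _) in enumerate(ranked[:top_n], start=1)}


def ranked_higher(sub_a, sub_b, top_n=100, min_gap=20):
    """Words in sub_a's top_n ranked at least min_gap places higher than in sub_b.

    A word missing from sub_b's top_n is treated as rank top_n + 1 (assumption).
    """
    ranks_a = rank_table(sub_a, top_n)
    ranks_b = rank_table(sub_b, top_n)
    return sorted(word for word, rank in ranks_a.items()
                  if ranks_b.get(word, top_n + 1) - rank >= min_gap)


# Toy data only; real subcorpora would be the monologue and dialogue texts.
monologue = ["he had a plan", "he had it"]
dialogue = ["yeah yes right", "yeah yes"]
more_in_m = ranked_higher(monologue, dialogue, top_n=3, min_gap=2)
```

Calling `ranked_higher(dialogue, monologue)` gives the list for the other direction.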
Investigating the differences
• no occurrences of all right in monologue
• when you’re / you’ll / you’d / you’ve is more common in monologue than when we’re / we’ll / we’d / we’ve; vice-versa in dialogue
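A comparison like the when you’re / when we’re one can be approximated with a regular expression over each subcorpus. The sketch below is an assumption about how such counts might be obtained from plain text, not the query actually used; the function name is hypothetical, and it accepts both straight and curly apostrophes and an optional space before the clitic.

```python
import re


def count_when_pronoun(texts, pronoun):
    """Count 'when <pronoun>' followed by one of the clitics 're, 'll, 'd, 've."""
    pattern = re.compile(
        r"\bwhen\s+" + re.escape(pronoun) + r"\s?['’](?:re|ll|d|ve)\b",
        re.IGNORECASE,
    )
    return sum(len(pattern.findall(text)) for text in texts)


# Toy data only; real input would be the monologue and dialogue subcorpora.
monologue = ["When you're ready, tell me when you've finished."]
dialogue = ["We can talk when we'll meet, or when you’d like."]

you_in_m = count_when_pronoun(monologue, "you")
we_in_m = count_when_pronoun(monologue, "we")
you_in_d = count_when_pronoun(dialogue, "you")
we_in_d = count_when_pronoun(dialogue, "we")
```

Raw counts are only comparable between subcorpora of similar size; otherwise they would need normalising, for example per thousand words.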
Subcorpora using your own categories
David Lee’s book genres
• academic non-fiction (13 texts)
• non-academic non-fiction (15 texts)
• prose fiction (13 texts)
Distinctive -ly adverbs of:
• academic non-fiction
– accordingly, essentially, eventually, largely, namely, notably, respectively, surprisingly
• non-academic non-fiction
– effectively, merely, normally, obviously, possibly, specially
• prose fiction
– carefully, quietly, slightly, slowly, softly, surely, truly
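The slides do not say how "distinctive" was measured. As a rough sketch under that caveat, one could call a word ending in -ly distinctive of a genre when its relative frequency there is at least some multiple of its relative frequency in the other genres combined. The threshold values and function names below are hypothetical, and matching on the -ly ending alone is crude: without part-of-speech tags it also picks up non-adverbs such as "family".

```python
import re
from collections import Counter


def ly_counts(texts):
    """Return (Counter of -ly word forms, total number of word tokens)."""
    counts = Counter()
    total = 0
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        total += len(words)
        counts.update(w for w in words if w.endswith("ly"))
    return counts, total


def distinctive_ly(genres, min_count=2, ratio=2.0):
    """For each genre, list -ly words that are relatively more frequent there.

    A word qualifies if it occurs at least min_count times in the genre and
    its relative frequency is at least `ratio` times its relative frequency
    in all other genres combined (both thresholds are assumptions).
    """
    stats = {name: ly_counts(texts) for name, texts in genres.items()}
    result = {}
    for name, (counts, total) in stats.items():
        others = [s for other, s in stats.items() if other != name]
        other_total = sum(t for _, t in others)
        chosen = []
        for word, count in counts.items():
            if count < min_count:
                continue
            other_count = sum(c[word] for c, _ in others)
            rel_here = count / total
            rel_other = other_count / other_total if other_total else 0.0
            if rel_here >= ratio * rel_other:
                chosen.append(word)
        result[name] = sorted(chosen)
    return result


# Toy data only; real input would be the three book-genre subcorpora.
genres = {
    "academic": ["namely this and namely that largely"],
    "fiction": ["she spoke softly and softly he left slowly"],
}
distinctive = distinctive_ly(genres)
```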
Working with subcorpora can allow
• study/comparison of forms/meanings in particular texts/text-types
• better-focussed reading practice
• more appropriate reference tools for particular tasks
• more focussed browsing
Subcorpora
• may not be representative (but nor is most language learning data)
• are good for forming hypotheses to be tested more widely
• will allow more interesting uses when extracted from a larger corpus
Making your own provides
• better preparation and motivation for corpus use
• more critical awareness
• lots to talk about