Teachers’ Top 10 Uses For a Language Corpus Saturday, May 18, 9-10h * PLUS Breakout Session at...
1
Teachers’ Top 10 Uses For a Language Corpus
Saturday, May 18, 9-10h * PLUS Breakout Session at 10h15 <HERE> *
Sunshine State TESOL 2013
«Expanding Traditions: Merging Methodology & Technology»
ORLANDO, FLORIDA
Tom Cobb
Didactique des langues
Université du Québec à Montréal
FIND THIS PPT AT WWW.LEXTUTOR.CA/CV/SS-TESOL.PPT
2
Who?
Tom Cobb teaches Applied Linguistics at a French
university in Montreal. His main interest is adapting the
computer tools of linguists to the needs of language teachers
through his website www.lextutor.ca.
Tom worked abroad for many years (Saudi Arabia,
Oman, Hong Kong) before returning to North America,
and continues to work as a consultant in both developed and
developing countries (Japan, Niger, Benin, Barbados). His
research writings are available at lextutor.ca/cv/ .
3
SS-TESOL - Blurb
A “language corpus” is a sampled collection of written or spoken texts large enough to represent part of a language (medical, economic) or even a language as a whole. The applied linguistics literature is full of references to research involving corpora, and ESL teacher-training courses exhort new teachers to get familiar with corpora and use them for various purposes in their teaching. But - when teachers get into the classroom, do they follow this advice? And if so what do they use a corpus for?
The Lextutor website (www.lextutor.ca) offers teachers access to several corpora and checks how they use them. User data shows that more than 1,000 users (mainly teachers) consult a corpus on Lextutor each day. This data, along with email queries and conference presentations, makes it clear what teachers are using corpora for, and has made it possible to evolve the tools in line with teachers’ needs and goals. My talk will outline the 10 main reasons teachers consult a concordance.
4
Really ?
5
6
7
8
What?
• What is a corpus?
• Why do we need corpora?
• What difference do they make?
• What is “the corpus revolution”?
Or, “Is there a corpus revolution?”
>>>> A brief primer on CORPORA before we get to teachers’ uses
9
Corpora – what are they?
11
Dr Johnson A Dictionary of the English Language
Longman 1755 Based on quotations from literature
copied onto many slips of paper
But using literature has some problems
Early corpora
12
120 years later - James Murray, OED 1879 – REAL LANGUAGE examples sent in by post - Oxford City Post Office sets up a special sub-branch for OED
1960s - Enter The Computer
13
14
15
What is a corpus? NOT just «a lot of text»!
A large collection of language in use, but
…Assembled systematically, according to explicit criteria
of representativeness
How large? Depends on the goal
16
Goals and sizes
Linguistics goal - to represent entire language
• 100 million wds still under-represents common collocations
Pedagogical goal – Ss meet common words, structures
• 1 million words gives 10 hits for frequent words
Applied linguistics goal – trace an acquisition feature
• 100,000-word Learner Corpora are common
17
Drilling down into…
Pedagogical goal – Ss meet common grammar and vocab
Grammar – 1 million is adequate
– All structures get many hits
Lexis
• Basic vocab – 1 million gives 10 hits @ 2k level
• Main collocations – 1 million gives the main ones
– Torrential rain?
• “Raining cats and dogs”? – 1 billion gives 5 hits
• Identify specialist lexis – 200,000 may be enough
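The size figures on the two slides above follow from simple proportion: expected hits are roughly corpus size times the item's rate per million words. A minimal sketch, assuming hypothetical rates chosen only to reproduce the slide's figures (they are not measured values):

```python
def expected_hits(corpus_words: int, rate_per_million: float) -> float:
    """Expected number of occurrences of an item in a corpus of a given size."""
    return corpus_words * rate_per_million / 1_000_000

# Hypothetical rates (assumptions for illustration only):
# a word near the 2k level at about 10 occurrences per million words,
# and a rare idiom at about 0.005 per million (that is, 5 per billion).
WORD_RATE = 10.0
IDIOM_RATE = 0.005

for size in (200_000, 1_000_000, 100_000_000, 1_000_000_000):
    print(f"{size:>13,} words: "
          f"word ~{expected_hits(size, WORD_RATE):g} hits, "
          f"idiom ~{expected_hits(size, IDIOM_RATE):g} hits")
```

Read this way, a 1-million-word corpus is enough for frequent words, while a rare idiom only surfaces a handful of times even at a billion words.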
18
A growth industry
Brown 1970 ................. 1,000,000 wds
http://icame.uib.no/brown/bcm.html
BNC 1994 ................. 100,000,000 wds
www.natcorp.ox.ac.uk
COCA (BYU) 2013 ........ 450,000,000 wds
Contemporary corpus U.S. English 1990-2012
http://corpus.byu.edu/coca/
Cambridge Int’l 2002 .... 1,000,000,000 wds
www.cambridge.org./elt/corpus/international_corpus.htm
19
Design / composition e.g., Brown (1970s)
Page from Lextutor
20
What does a corpus represent?
A language as a whole
• BNC
Or a part
• Cancode oral, COCA, MICASE academic
Or of an individual
• Jack London’s collected works
Or a group of individuals
– Class of ESL learners
21
How do we read a corpus?
Cannot read it naturally
– Defeats the goal
Needs the help of a search technology
• concordance
• index
• frequency list
• many others
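Two of the search technologies listed above, the concordance and the frequency list, can be sketched in a few lines. This is an illustrative toy, not Lextutor's implementation; the sample text, the `tokenize` and `kwic` helpers, and the context width are all assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def kwic(tokens: list[str], keyword: str, width: int = 3) -> list[tuple[str, str, str]]:
    """Key Word In Context: return (left context, keyword, right context) per hit."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

# Hypothetical mini-corpus, for illustration only.
sample = ("The rain fell all day. Torrential rain flooded the road, "
          "and the rain did not stop.")
tokens = tokenize(sample)

# Concordance: every hit lined up on the keyword.
for left, key, right in kwic(tokens, "rain"):
    print(f"{left:>25}  [{key}]  {right}")

# Frequency list: most common word forms first.
print(Counter(tokens).most_common(3))
```

Lining the hits up on the keyword is what lets a reader scan for recurring neighbours such as "torrential".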
22
Concordancers
http://www.lextutor.ca/concordancers/concord_e.html
23
Corpora – why do we need them?
24
Why do we need corpora?
A. Corpus work is sexy
B. We have computers – let’s use them
C. Linguistic intuitions are unreliable
25
Linguistic intuitions are notoriously unreliable
Demo 1: Do you think however is more common in spoken or in written language?
By how much? (3 to 1… etc)
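Answering "by how much?" means comparing rates, not raw counts, because spoken and written corpora are rarely the same size. A minimal sketch, assuming two tiny made-up samples (a real comparison would use full spoken and written corpora):

```python
import re

def rate_per_million(text: str, word: str) -> float:
    """Occurrences of `word` per million tokens of `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return tokens.count(word) / len(tokens) * 1_000_000

# Hypothetical samples, invented for illustration only.
written = ("The results were clear. However, the sample was small. "
           "However, the effect held.")
spoken = ("well yeah it was fine but you know however you look at it "
          "we were late so yeah")

w = rate_per_million(written, "however")
s = rate_per_million(spoken, "however")
print(f"written: {w:,.0f} per million")
print(f"spoken:  {s:,.0f} per million")
print(f"ratio written/spoken: {w / s:.2f} to 1")
```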
26
http://www.lextutor.ca/range/range_corpus/
27
Demo 2: What are the main senses of back and which is most common?
• By what factor?
http://www.lextutor.ca/concordancers/concord_e.html
28
29
30
Demo 3: Can you rank order these roughly by frequency band?
0-2k, 3k-5k, 6k-10k, 11k-15k
http://www.lextutor.ca/freq/train/
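A frequency band is just a word's rank in a frequency-ordered list, cut into blocks of 1,000. A minimal sketch of the grouping used in this demo; the ranks assigned to the example words are hypothetical placeholders, not values from any real list:

```python
def k_band(rank: int) -> int:
    """Map a 1-based frequency rank to its 1k band (1-1000 -> 1, 1001-2000 -> 2, ...)."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    return (rank - 1) // 1000 + 1

def demo_group(rank: int) -> str:
    """Map a rank to the four groupings shown on the slide."""
    band = k_band(rank)
    if band <= 2:
        return "0-2k"
    if band <= 5:
        return "3k-5k"
    if band <= 10:
        return "6k-10k"
    if band <= 15:
        return "11k-15k"
    return "beyond 15k"

# Hypothetical ranks, for illustration only.
ranks = {"back": 150, "however": 400, "torrential": 9200, "sensorium": 14500}
for word, rank in ranks.items():
    print(f"{word:<12} rank {rank:>6}  band {k_band(rank):>2}k  group {demo_group(rank)}")
```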
31
Try one? http://www.lextutor.ca/freq/train/
32
Many linguistic intuitions are unreliable
Implicit patterns are extremely slow to extract from input
N. Ellis, J. Hulstijn
… because of the severe limitations on what we can see and remember
… unaided
And if pattern perception is slow and unreliable for Native Speakers…
How much slower for LEARNERS ?!
33
34
Not only linguistic intuitions are problematic
For every appearance, many possible explanations
Stand outside on a starry evening, what does it look like?
35
The role of the computer in modern science is well known. In disciplines like physics and biology, the computer's ability to store and process inhumanly large amounts of information has disclosed patterns and regularities in nature beyond the limits of normal human experience. Similarly in language study, computer analysis of large texts reveals facts about language that are not limited to what people can experience, remember, or intuit. In the natural sciences, however, the computer merely continues the extension of the human sensorium that began 200 years ago with the telescope and microscope. But language study did not have its telescope or microscope. The computer is its first analytical tool, making feasible for the first time a truly empirical science of language.
– Cobb 1999
36
Before the computer, linguists could only study small samples of language at a time because of their limitations of their powers of observation and their memories. Even scholars who relentlessly collected instances of usage all their lives only had a few examples of any particular pattern, and there was no way of telling what they had missed.
Sinclair, 2003, p. ix
37
Most sciences - supplemented by technologies from the 15th century
BIOLOGY..……….microscope ASTRONOMY..…..telescope NAVIGATION.……astrolabe etc
Language study – late 20th century –
….machine readable corpora
38
Corpus Findings – Very Good News for ESL
39
Fabled Core of English is close to disclosure through 35 yrs of corpus work
Main lexis + coverage
• 2000 wd families = 80%, Carrol et al 76
Main collocations in BNC-speech
• 84 HF collocations belong in 1k list, Shin & Nation 2007
Main phrasal verbs
• 25 ph vbs = 1/3 of all ph vbs in BNC, Gardner & Davies, 2007
Main morphologies
• Bauer & Nation, 1993
Main stress patterns
• Murphy & Kandil
Cf. All this coming together at the same time as the human genome, also a corpus project
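A coverage figure such as "2000 word families = 80%" is the share of running words in a text accounted for by the most frequent items. A minimal sketch of that calculation, assuming a toy text and counting word forms rather than word families (grouping forms into families would need a family list, which is not modelled here):

```python
from collections import Counter

def coverage(tokens: list[str], top_n: int) -> float:
    """Share of all running words covered by the top_n most frequent word forms."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(top_n))
    return covered / len(tokens)

# Hypothetical toy text, for illustration only.
tokens = "the cat sat on the mat and the dog sat on the cat".split()

for n in (1, 4, 7):
    print(f"top {n} forms cover {coverage(tokens, n):.0%} of the text")
```

Even in a toy text the curve rises steeply at first and then flattens, which is the pattern behind the 2000-family figure.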
40
Numerous errors are now corrected (in principle)
Definitions no longer harder than the defined word
Simple present no longer automatically the first verb tense taught
Written language no longer the model for spoken language
Status of multi-word units is reinstated
Grammar no longer taught …
• via unknown lexis
• as unconnected to lexis
41
Thus the “corpus revolution”
Dictionaries Grammars Courses Studies
42
This is all great, but… What do teachers do
with corpora?
<<< Back to 10 main uses of Lextutor corpora with ESL learners
44
1. The obvious use – source of examples for the teacher
• Teacher finds examples to show students
– Words
– Structures
– Discourse features
• Find sentences for test questions
– within a rough-tuned level
– within a domain
««-- MEANS WE CAN GO LIVE EASILY FROM THIS PLACE
45
Display words, collocations, structures in classroom
50
Conclusion: most of “What it means to know a word” can be shown in a million-word corpus
Nation’s 18 kinds of word knowledge
51
• Uses 2-9 are concordancing in a task context
– Where teachers set up concordances for learners to use independently
– because they achieve some goal by doing so
• Payoff for looking through multi-examples
• These were independent uses of concordances
– Later incorporated in dedicated interfaces
52
EXAMPLE: A student writer wants to describe a teacher as “one of the best teacher…”
2. Corpus as a writing resource
53
A writing resource click-linked to learner’s text
56
4-5 : Corpus as a reading resource
Expand the text
• Via concordancer hooked up to learner’s text
– With potential payoff in strategy development
58
4. Give lexical info while reading
• Or, develop lexical strategies while reading
• Or, meta-lexical competence… etc
63
7. Group-made concs for collab-vocab
• Learners contribute concordance lines
• Since there are too many words to learn alone…
64
7a. Facilitate transfer of word knowledge
• to novel context
67
9. Snapshot of a set of learner essays
• Error patterns?
• Are recently learned words coming through in production?
• Are new structures coming through?
– Correctly?
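The second question above, whether recently learned words are coming through in production, can be approximated by counting how many essays in a set use each target word. A minimal sketch, assuming invented essays and an invented target list; it matches exact word forms only and says nothing about whether a word is used correctly:

```python
import re

def essays_using(essays: list[str], target_words: list[str]) -> dict[str, int]:
    """For each target word, count how many essays contain it at least once."""
    usage = {word: 0 for word in target_words}
    for essay in essays:
        forms = set(re.findall(r"[a-z']+", essay.lower()))
        for word in target_words:
            if word in forms:
                usage[word] += 1
    return usage

# Hypothetical learner essays and recently taught words, for illustration only.
essays = [
    "My teacher is one of the best teacher. She can analyze any text.",
    "We analyze the data and then we interpret the results.",
    "I like my class because the teacher is kind.",
]
targets = ["analyze", "interpret", "hypothesis"]

for word, n in essays_using(essays, targets).items():
    print(f"{word:<12} used in {n} of {len(essays)} essays")
```

Words that never appear (here "hypothesis") are candidates for recycling in class; the error in the first essay would still need a concordance view to spot.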
69
10. And, under development
• By popular demand
• From my best Google-hitting paper (1997)
• Scope out a level by contextual inference
– Like in L1 but with support
70
Any research supporting all of this?
• CONCORDANCE AS A READING RESOURCE
– Cobb, T. (2009). Internet and literacy in the developing world: Delivering the teacher with the text. In K. Parry (Ed.), Literacy for All in Africa Vol. 2: Reading in Africa: Beyond the School. Kampala: African Book Collective.
• CONCORDANCE AS WRITING FEEDBACK
– Gaskell, D., & Cobb, T. (2004). Can learners use concordance feedback for writing errors? System, 32(3), 301-319.
• LEARNER-BUILT CONCORDANCE FOR VOCAB DEVELOPMENT
– Horst, M., Cobb, T., & Nicolae, I. (2005). Expanding Academic Vocabulary with a Collaborative On-line Database. Language Learning & Technology, 9(2), 90-110.
• CONCORDANCE INVESTIGATION OF LEARNER PRODUCTION
– Cobb, T. (2003). Analyzing late interlanguage with learner corpora: Quebec replications of three European studies. Can. Modern Language Review, 59(3), 393-423.
• CONCORDANCE FOR SCOPING OUT A K-LEVEL
– Cobb, T. (1997). Is there any measurable learning from hands-on concordancing? System, 25(3), 301-315.
– Cobb, T., & Horst, M. (2011). Does Word Coach coach words? CALICO, 28(3), 639-661.
MORE AT LEXTUTOR.CA/CV/