

Qiufang Wen

The national research center for foreign language education, BFSU

Chinese learner corpora and second language research

The 2006 International Symposium of Computer-Assisted Language Learning

June 2-4, 2006, Beijing

Topics to be addressed

•English corpora of Chinese learners

•Corpus-based studies on English learners in mainland China

•Several corpus-based studies on English learners’ interlanguage by myself or together with my colleagues

•Advantages and disadvantages of corpus-based studies on the interlanguage

Topic One

English corpora of

Chinese learners

•Chinese Learner English Corpus (CLEC)

•College Learners’ Spoken English Corpus (COLSEC)

•Spoken and Written English Corpus of Chinese Learners (SWECCL)

–Version 1

–Version 2 (under construction)

•Bilingual Corpus of Chinese English Learners (BICCEL): under construction

1. Chinese Learner English Corpus (CLEC) by Gui & Yang in 2003

•Written corpus: 1 million

•Timed and untimed compositions

•Levels of proficiency

– Middle school students

– Non-English majors (Band 4)

– Non-English majors (Band 6)

– English majors (Band 4)

– English majors (Band 8)

•Error-tagged

Two Types of English Learners in University

[Diagram: two columns, each running from Year 1 to Year 4. English majors: Band 4 and Band 8. Non-English majors: Band 2, Band 4 and Band 6.]

2. College Learners’ Spoken English Corpus (COLSEC) by Yang & Wei in 2005

•Tokens: 0.7 million

•Source: National spoken English test for non-English majors

•Test items

– Teacher-student conversation

– Student-student discussion

– Teacher-student discussion

•Data format: written transcripts

3. Spoken and Written English Corpus of Chinese Learners (SWECCL) by Wen, Wang & Liang in 2005 (Version 1)

SWECCL

WECCL           SECCL
1.18 million    1.46 million

Spoken (SECCL)

•Source of data

– National spoken English test: 1996-2002

– Second-year English majors

•Data format

– Digital sounds as well as transcripts of the speeches

National spoken English test for English majors (Band 4)

•Test format

– Test in a lab

•The number of testees annually

– 2006: more than 16,000

– Expected to reach 50,000 in the future

•Scoring procedures

– A random sample (30-35 tapes)

– Two raters scoring each tape independently

•Number of subjects

– 6 groups from each year (1996-2002)

– 42 groups (30-35 students each) = about 1,400 students

– About 230 hours of speech
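The subject count above can be checked with a few lines of arithmetic. This is only a sketch: it assumes that "30/35" means each sampled group holds between 30 and 35 students, and the variable names are illustrative rather than taken from the corpus documentation.

```python
# Figures from the slide; variable names are illustrative.
test_years = range(1996, 2003)        # 1996-2002 inclusive, i.e. 7 years
groups_per_year = 6
group_size_low, group_size_high = 30, 35   # assumed reading of "30/35"

n_groups = groups_per_year * len(test_years)
students_low = n_groups * group_size_low
students_high = n_groups * group_size_high

print(f"{n_groups} groups, between {students_low} and {students_high} students")
```

The slide's figure of about 1,400 students falls inside this range, which suggests most groups were close to the upper size.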

•Testing items

Testing items

Task         Content                         Preparation time
Retelling    A story                         Listen twice but no preparation    3 min.
Monologue    Personal experience             3 min.                             3 min.
Role play    About an issue in daily life    3 min.                             4 min.

The structure of SECCL

[Tree diagram: SECCL divides into text files and sound files (1996-2002). Remaining node labels: Tagged, Raw, Special, Article, Past Tense, Whole, Task, Year, Task A, Task B, Task C.]

The written component

Written: Year 1, Year 2, Year 3, Year 4

The written component

•Source of data

– Timed compositions in class (40 minutes, no fewer than 300 words)

– Take-home compositions (no word limit)

•Types of compositions

– Argumentative (a list of topics provided)

– Narrative

SWECCL in 2007 (Version 2)

SWECCL

WECCL          SECCL
Two million    Two million

SECCL (Version 2)

•2003-2006 National Spoken English Test for second-year English majors (Band 4)

•2000-2006 National Spoken English Test for 4th-year English majors, Band 8 (Task 3)

•Longitudinal data (2001-2004)

Spoken (Band 8)

•Testing item (Task C)

– Make a comment on a given topic

•Data format

– Digital sounds as well as transcripts of the speeches

Spoken (Longitudinal)

•72 students, 56 students

•40 hours’ speech

                        Year 1    Year 2    Year 3    Year 4
Data collection time    2001      2002      2003      2004

Tasks

•Reading aloud

•Retelling a story

•Talking on a given topic (Narrative)

•Talking on a given topic (Argumentative)

•Conversation (Role play)

•Discussion on a given topic

4. Bilingual Corpus of Chinese English Learners (BICCEL)

BICCEL

Spoken                        Written
E-C            C-E            E-C            C-E
0.5 million    0.5 million    0.5 million    0.5 million

Spoken component of BICCEL

•National Oral English test (Band 8)

– The 4th year English majors

– Interpreting from English to Chinese (Task A)

– Interpreting from Chinese to English (Task B)

– 2001-2005: 1100 testees

Written component of BICCEL

•Source of data: in-class assignment

–E-C and C-E translation

–Across the 3rd and 4th years

–30 universities across the country

Topic Two

A brief review of corpus-based studies on Chinese learner English

Sources

•China National Knowledge Infrastructure (CNKI) (online journals)

•Digital dissertation database

Corpus-based studies in mainland China

Year     Articles    Dissertations
2006     9           7
2005     40          28
2004     29          17
2003     8           5
2002     6           5
2001     6           1
2000     1           0
Total    99          63

Research areas

Area            Articles    Dissertations    Total
Phonological    5           1                6
Lexical         43          48               91
Grammatical     27          8                35
Discourse       8           2                10
Others          16          4                20
Total           99          63               162
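The counts in the research-areas table can be turned into shares to show how strongly lexical studies dominate. The sketch below uses only the figures from the table; the dictionary layout and variable names are illustrative.

```python
# (articles, dissertations) per research area, copied from the table.
areas = {
    "Phonological": (5, 1),
    "Lexical": (43, 48),
    "Grammatical": (27, 8),
    "Discourse": (8, 2),
    "Others": (16, 4),
}

total_articles = sum(a for a, _ in areas.values())
total_dissertations = sum(d for _, d in areas.values())
grand_total = total_articles + total_dissertations

# Percentage of all studies that fall into each area.
shares = {
    name: round(100 * (a + d) / grand_total, 1)
    for name, (a, d) in areas.items()
}

for name, share in shares.items():
    print(f"{name:<13}{sum(areas[name]):>4}{share:>7.1f}%")
```

The column totals computed this way agree with the table (99 articles, 63 dissertations, 162 studies), and lexical studies account for more than half of the total.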

Conferences & workshop

•The International conference on “Corpus Linguistics” 25-27 October, 2003

•The First National Symposium on corpus linguistics and ELT Education 11-13 October, 2004

•Workshop on the use of corpus in teaching and research 17-19 March, 2006

Topic Three

Several corpus-based studies on English learners’ interlanguage by myself or together with my colleagues

Study One

Features of oral style in English compositions of advanced Chinese EFL learners

Wen, Q. F., Ding, Y. R. & Wang, W. Y. 2003. Foreign Language Teaching & Research (4): 268-274.

Study Two

A Study on Frequency Adverbs Used by Advanced English Learners in China

Wen, Q. F. & Ding, Y. R. 2004. Modern Foreign Languages (2): 141-147.

Study Three

An analysis of English majors’ abstracting abilities through their English compositions

Wen, Q. F. & Liu, R. Q. 2006. Foreign Languages (2)

Study Four

•A longitudinal study on the developmental features of speaking vocabulary by English majors in mainland China

Wen, Q. F. 2006. Foreign Language Teaching and Research (3).

Study Five

•A comparison of developmental features of speaking and writing vocabulary by English majors

•Wen, Q. F. 2006. Foreign Languages and Foreign Language Teaching (4)

Study Six

Patterns of change in speaking vocabulary development by English majors

Study Two

A Study on Frequency Adverbs Used by Advanced English Learners in China

Wen, Q. F. & Ding, Y. R. 2004. Modern Foreign Languages (2): 141-147.

Frequency Adverbs

•Adverbs used for describing “how often” something happens

•never, sometimes, usually, always

Top Twenty Frequency Adverbs

•Most frequently used by native speakers according to the analyses of the British National Corpus (BNC) by Leech, Rayson and Wilson (2001)

Top Twenty Frequency Adverbs (TTFAs)

Level of vocabulary: frequency adverbs (No.)

1000-word level: never, always, often, ever, *sometimes, usually, once, generally, hardly, no longer, increasingly, *twice, in general, occasionally, mostly (15)

2000-word level: frequently, rarely, regularly (3)

Academic word list: normally, constantly (2)

Common features

•All high-frequency words

•Different frequencies in speech and writing except sometimes and twice

(Leech et al. 2001)

A comparison of TTFAs in speech and writing

•The overall difference: TTFAs are more likely to occur in writing than in speech.

•The specific differences

– Speech: never, always, ever, normally

– Neutral: sometimes, twice

– Writing: 14 words

Previous corpus-based studies

•e.g. Altenberg & Granger, 2001; Cobb, 2002; Ringbom, 1998; Wen, Ting & Wang, 2003

•Conflicting finding one: overuse vs. underuse

Examples

•Overuse high-frequency words in writing (Cobb, 2001)

•Overuse modal verbs (Aijmer, 2002)

•Underuse adverbial connectors (Altenberg & Tapper, 1998)

•No study on frequency adverbs

Conflicting finding two

•Tend to use written style features in their speech

•Tend to use a mixed register in either speech or in writing

•Tend to use oral style features in their writing

•Did not compare the use of high-frequency words in speech with writing

General purposes of this study

Whether Chinese EFL learners simply overuse the TTFAs or overuse some while underusing others

Whether they use the TTFAs similarly or differently when their speech is compared with their writing

Research questions

• Do they overuse or underuse the TTFAs differently between speech and writing?

• Do they differ more from native speakers in writing or in speaking with regard to the use of the TTFAs?

• Do they demonstrate a similar pattern of writing-speaking difference as native speakers in the use of the TTFAs?

Data for analysis

The learner corpus: the corpus of English majors in China (955,043 words)

– Spoken (SECCL): 473,408 words

– Written (CLEC): 481,635 words

The native-speaker corpus: the British National Corpus (BNC) (100 million words)

– Spoken (BNCS): 10 million words

– Written (BNCW): 90 million words

Data analysis

Four comparisons

• Learners’ speech and native speakers’ speech: SECCL vs. BNCS

• Learners’ writing and native speakers’ writing: CLEC vs. BNCW

• Difference between learners’ and native speakers’ speech, and difference between learners’ and native speakers’ writing: SECCL vs. BNCS and CLEC vs. BNCW

• Difference between learners’ speech and writing, and difference between native speakers’ speech and writing: SECCL vs. CLEC and BNCS vs. BNCW
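Because the learner and native-speaker corpora differ greatly in size, comparisons of this kind are usually made on normalised frequencies. The slides do not say which statistic the study used, so the following is only a sketch of one common approach: frequencies per million words plus a log-likelihood test with the conventional 3.84 cut-off (p < 0.05, one degree of freedom). The corpus sizes come from the "Data for analysis" slide; the function names, the cut-off, and any counts passed in are assumptions for illustration.

```python
import math

# Corpus sizes in words, as given on the "Data for analysis" slide.
CORPUS_SIZE = {
    "SECCL": 473_408,
    "CLEC": 481_635,
    "BNCS": 10_000_000,
    "BNCW": 90_000_000,
}


def per_million(count, size):
    """Normalise a raw count to occurrences per million words."""
    return count * 1_000_000 / size


def log_likelihood(count_a, size_a, count_b, size_b):
    """Log-likelihood statistic for one word observed in two corpora."""
    total = count_a + count_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def tendency(learner_count, learner, native_count, native, critical=3.84):
    """Label a word as overused, underused or similar in the learner corpus."""
    size_l, size_n = CORPUS_SIZE[learner], CORPUS_SIZE[native]
    if log_likelihood(learner_count, size_l, native_count, size_n) < critical:
        return "similar"
    learner_rate = per_million(learner_count, size_l)
    native_rate = per_million(native_count, size_n)
    return "overuse" if learner_rate > native_rate else "underuse"


# Hypothetical raw counts, not figures from the study.
print(tendency(300, "SECCL", 4000, "BNCS"))
```

Applied to each of the twenty adverbs, a procedure of this kind would yield the overuse, underuse and similar groupings that the comparisons call for.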

Results (1)

TTFA use in learners’ spoken corpus (SECCL)

Tendency: words

Overuse: always, once, often, sometimes, usually, hardly (6 words/407 occurrences)

Underuse: normally, never, ever, twice, generally, in general, occasionally, no longer, constantly, increasingly (10 words/48 occurrences)

Results (2)

TTFA use in learners’ written corpus (CLEC)

Tendency: words

Overuse: always, sometimes, usually, no longer, never, once, often, generally, mostly (9 words/125 occurrences)

Underuse: constantly, occasionally, ever, regularly, rarely, frequently, twice, increasingly, normally (9 words/37 occurrences)

Results (3)

Comparison of learners’ speech with their writing in TTFA use (Overuse)

Tendency: SECCL > BNCS (Spoken) (6)
Words: always, once, often, sometimes, usually, hardly
Frequency difference: 407

Tendency: CLEC > BNCW (Written) (9)
Words: always, sometimes, usually, no longer, never, once, often, generally, mostly
Frequency difference: 125

Results (3)

Comparison (Underuse)

Tendency: SECCL < BNCS (Spoken) (10)
Words: normally, never, ever, twice, generally, in general, occasionally, no longer, constantly, increasingly
Frequency difference: -48

Tendency: CLEC < BNCW (Written) (9)
Words: normally, increasingly, twice, frequently, rarely, regularly, ever, occasionally, constantly
Frequency difference: -37

Results (3)

Comparison (identical or similar)

Tendency: SECCL ≈ BNCS (Spoken) (4)
Words: frequently, regularly, rarely, mostly
Frequency difference: -4

Tendency: CLEC ≈ BNCW (Written) (2)
Words: in general, hardly
Frequency difference: 3

Results (4)

Speaking-writing differences in TTFA use in the CEMIC and the BNC

Register-neutral
BNC: twice, sometimes (2)
CEMIC: constantly, never, regularly, rarely, increasingly, normally (6)

Spoken-register sensitive
BNC: never, always, normally, ever (4)
CEMIC: always, once, often, sometimes, hardly (5)

Written-register sensitive
BNC: often, once, no longer, generally, increasingly, usually, frequently, hardly, rarely, regularly, constantly, in general, occasionally, mostly (14)
CEMIC: no longer, generally, usually, in general, ever, mostly, occasionally, frequently, twice (9)
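The two classifications in Results (4) can be compared word by word to see how far the learners' register pattern matches the native speakers'. The sketch below simply encodes the word lists from the slides as sets; the dictionary keys and the helper `register_of` are illustrative names, not part of the study.

```python
# Register classification of the 20 TTFAs, copied from Results (4).
BNC = {
    "neutral": {"twice", "sometimes"},
    "spoken": {"never", "always", "normally", "ever"},
    "written": {
        "often", "once", "no longer", "generally", "increasingly", "usually",
        "frequently", "hardly", "rarely", "regularly", "constantly",
        "in general", "occasionally", "mostly",
    },
}
CEMIC = {
    "neutral": {"constantly", "never", "regularly", "rarely",
                "increasingly", "normally"},
    "spoken": {"always", "once", "often", "sometimes", "hardly"},
    "written": {"no longer", "generally", "usually", "in general", "ever",
                "mostly", "occasionally", "frequently", "twice"},
}


def register_of(word, table):
    """Return the register category a word is listed under."""
    for register, words in table.items():
        if word in words:
            return register
    raise KeyError(word)


all_words = set().union(*BNC.values())
same = sorted(w for w in all_words
              if register_of(w, BNC) == register_of(w, CEMIC))

print(f"{len(same)} of {len(all_words)} TTFAs share a register category:")
print(", ".join(same))
```

On these lists, fewer than half of the twenty adverbs fall into the same register category in both corpora, and only one of them (always) is speech-sensitive in both.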

Summary (1)

•English majors in China tend to overuse and underuse certain TTFAs in their speech and writing. The overuse tendency is stronger than the underuse tendency in both speech and writing.

Summary (2)

•The overuse tendency is more marked in their speech than in their writing while the underuse tendency is also slightly stronger in speech than in writing. Some of the overused or underused TTFAs in speech are the same as those in writing but others are different.
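The overlap mentioned in this summary can be read directly off the word lists in Results (1) and Results (2). The sketch below encodes those lists as sets and intersects them; the variable names are illustrative.

```python
# Overused and underused TTFAs, copied from Results (1) and Results (2).
speech = {
    "overuse": {"always", "once", "often", "sometimes", "usually", "hardly"},
    "underuse": {"normally", "never", "ever", "twice", "generally",
                 "in general", "occasionally", "no longer", "constantly",
                 "increasingly"},
}
writing = {
    "overuse": {"always", "sometimes", "usually", "no longer", "never",
                "once", "often", "generally", "mostly"},
    "underuse": {"constantly", "occasionally", "ever", "regularly", "rarely",
                 "frequently", "twice", "increasingly", "normally"},
}

overused_in_both = speech["overuse"] & writing["overuse"]
underused_in_both = speech["underuse"] & writing["underuse"]
# Words that flip direction: underused in speech but overused in writing.
flipped = speech["underuse"] & writing["overuse"]

print("Overused in both:", sorted(overused_in_both))
print("Underused in both:", sorted(underused_in_both))
print("Underused in speech, overused in writing:", sorted(flipped))
```

Five adverbs are overused and six underused in both modes, while never, generally and no longer change direction between speech and writing.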

Summary (3)

•Chinese English majors demonstrate a pattern of speaking-writing difference that is opposite to that shown in the native speakers’ corpus: they tend to use more TTFAs in their speech than in their writing while native speakers tend to use more TTFAs in their writing than in their speech. This shows that Chinese EFL learners use TTFAs without awareness of their register differences.

Possible reasons

•Limited vocabulary (Table 1b)

•Use them as “time buyers”

•Without equivalents readily available in Chinese

Topic Four

Advantages and disadvantages of corpus-based studies on SLA

Advantage One

•A large sample stored electronically and open to the public

– Validity and reliability (replicable)

– Possible for a diachronic study

Advantage Two

•Using computer software such as WordSmith

– Effectiveness and efficiency

Advantage Three

•Understand the learner language from a different perspective

– Correct vs. incorrect

– More acceptable vs. less acceptable

– Frequency

• Overuse

• Underuse

• Non-use

Disadvantages

Can               Cannot
Product           Process
Productive        Receptive
Group patterns    Individual differences
Language use      Language knowledge

Closing Remark

•The number of researchers is increasing

•Constructing different types of corpora

•Carrying out corpus-based studies

•Findings useful for textbook writers as well as for practitioners

Thank you!!!