A Corpus Based Study of Connectors in Student Writing


A corpus-based study of connectors in student writing

Research from the International Corpus of English in Hong Kong (ICE-HK)

Kingsley Bolton, Gerald Nelson, and Joseph Hung
The University of Hong Kong / The Chinese University of Hong Kong

This paper focuses on connector usage in the writing of university students in Hong Kong and in Great Britain, and presents results based on the comparison of data from the Hong Kong component (ICE-HK) and the British component (ICE-GB) of the International Corpus of English (ICE). While previous studies of Hong Kong student writing have dealt with the ‘underuse’, ‘overuse’, and ‘misuse’ of connectors, this study confines itself to the analysis of underuse and overuse, and is especially concerned with methodological issues relating to the accurate measurement of these concepts. Specifically, it takes as its benchmark of overuse and underuse the frequency of connectors in professional academic writing, in this case the data in the ICE-GB corpus. The results show that measured in this way, both groups of students – native speakers and non-native speakers alike – overuse a wide range of connectors. The results offer no evidence of significant underuse.

Keywords: connectors, student writing, ESL writing, academic writing, cohesion, coherence

International Journal of Corpus Linguistics. © John Benjamins Publishing Company

1. Introduction

This paper presents research findings from the analysis of corpus linguistic data generated by the International Corpus of English project in Hong Kong (ICE-HK).¹ The research reported here focuses on the comparison of the use of connectors in the writing of university students in Hong Kong and Britain, and presents results derived from the comparison of data from ICE-HK with data from the British component of the International Corpus of English project (ICE-GB). Whereas previous studies of Hong Kong student academic writing have dealt with issues relating to ‘underuse’, ‘overuse’, and ‘misuse’, the present study confines itself to the analysis of underuse and overuse, and also includes a detailed discussion of methodological issues. Such issues include, first, the identification of sets of ‘connectors’ in written academic texts, and, second, the quantitative units of analysis and comparison used in such studies. A key argument here is that the methodological design of such comparison studies has an important effect on the validity and comparability of the results.

2. Previous research

A substantial amount of previous research has been carried out on the analysis of patterns of connector (or ‘connective’) usage in student writing. Much of this research has been evidently motivated by the need to teach English-language learners in a ‘second-language’ (ESL) or ‘foreign-language’ (EFL) environment, while other research has also been carried out in the context of rhetoric and composition teaching programmes in North America. A relatively early study by Neuner (1987), for example, investigated the use of cohesive devices in ‘good’ and ‘poor’ freshman essays at a US college. While his paper does not deal exclusively with the use of connectors, Neuner highlights the ways in which cohesion in essay writing is achieved through a variety of cohesive devices, including ‘chains’ of reference, conjunctions, and lexical ties (Neuner 1987:101).

Much of the other research on this topic has been concerned with the writing of students of English as a second/foreign language. Three recent academic papers have previously investigated the broad area of interest of this current study, i.e. the use of connectives in academic writing by Hong Kong Chinese students, for whom English is generally a language learned/acquired at school. Crewe (1990:316–317) sets out to examine ‘the misuse and overuse of logical connectors’ through the study of the writings of ESL students at The University of Hong Kong. Crewe notes that, in the Hong Kong context, expressions such as on the contrary are frequently misused, and argues that such misuse may result from pedagogic practice in textbooks and teaching that relies on paradigmatic lists of connectors. Equally significant, if not more worrying, for Crewe is the ‘overuse’ of connectives; he cites one student writer who packs a chain of expressions such as moreover, indeed, as a matter of fact, in actuality, however, nevertheless into the space of just three short paragraphs of prose. Crewe observes that the overuse of such devices might even be seen as a way of ‘disguising poor writing’ (Crewe 1990:321). In his conclusion, he again focuses on overuse, noting that:

Over-use at best clutters up the text unnecessarily, and at worst causes the thread of the argument to zigzag about, as each connective points it in a different direction. Non-use is always preferable to misuse because all readers, native-speaker or non-native-speaker, can mentally construe logical links in the argument if they are not explicit, whereas misuse always causes comprehension problems and may be so impenetrable as to defy normal decoding.

(Crewe 1990:324)

Field and Yip (1992) use an experimental approach to study ‘internal conjunctive cohesion’ in the ESL writing of senior secondary/high school students at Form Six Level in Hong Kong. In this study they compare the use of connectors and other cohesive devices in the essays of three groups of Hong Kong students (67 students in all) with those used in the essays of ‘L1’ students from Sydney, Australia (29 students). The working hypothesis for this study was that ‘Cantonese students writing in English use more conjunctive cohesive devices in the organization of their essays than students at a similar educational level who are native English speakers’ (Field & Yip 1992:15). Following Halliday and Hasan (1976), the authors adopt a four-way classification of cohesive devices in terms of additive (also, and, furthermore, etc.), adversative (but, however, on the other hand, etc.), causal (hence, thus, etc.) and temporal (next, etc.) categories.

The results of Field and Yip’s analysis again suggest that ‘L2’ writers from Hong Kong tended to ‘overuse’ such devices compared with the L1 Australian group, and they comment that:

The high frequency of devices in L2 and even in L1 scripts may be due to the limited time provided for completion of the task. Content had to be devised quickly and writers may have relied on organizational devices to shape the essay rather than a strong development of their thought. The . . . educational level of the writers, who would have little essay writing experience, may also account for an overall high use. (Field & Yip 1992:24)

They then note particular problems in the use of the connectors on the other hand, moreover, furthermore and besides, among Hong Kong Chinese students. In the first case, they state that on the other hand is frequently used to make an additional point, with no indication of an implied contrast, suggesting transfer from an approximate equivalent in the Cantonese L1 of student writers. In the case of the other three connectors, moreover, furthermore and besides, Field and Yip note that in their data these occurred only in the essays of Hong Kong students and invariably in initial position. Besides was seen as a particular problem, as an example of misuse rather than overuse. They suggest that besides is common in (L1) English speech, but not in formal written English, and propose that ‘it would be best to discourage the use of besides in essay writing’ (Field & Yip 1992:26). They summarize findings of their research as follows:

It has shown that Cantonese speakers use far more devices than their native speaker counterparts, that many of them choose expressions that seldom appear in the writing of L1 students of a similar age and educational level and that they tend to choose the initial paragraph and sentence position rather than to place devices within the sentence. ESL writers in this study tended to choose from a wider range of ICCs [internal conjunctive cohesion devices, i.e. connectors] than those who acquired the language naturally. An awareness of the variety of devices acquired from second language teaching led many writers to overuse them and sometimes to misuse them.

(Field & Yip 1992:27)

The third Hong Kong study of this topic to appear in recent years is that of Milton and Tsang (1993), who adopt a corpus-based approach to the study of student writing, drawing on data which at that time formed part of a four-million-word (now larger) corpus of learner English, the Hong Kong University of Science and Technology (HKUST) corpus of learner English. The particular subset of data used for their analysis comprised 2,000 assignments written by around 800 first-year undergraduates, together with 206 examination scripts from the composition section of the Hong Kong Examination Authority’s ‘A’ level Use of English examination (the equivalent to a high-school exit test in North America). Milton and Tsang’s study attempts to compare the use of connectors among Hong Kong students with that found in three ‘native-speaker’ corpora, i.e. the Brown Corpus, the Lancaster-Oslo/Bergen (LOB) Corpus, and another corpus of their own which consists of computer science textbooks.

Following the categorization of Celce-Murcia and Larsen-Freeman (1983), Milton and Tsang chose to study the occurrence and distribution of 25 single-word logical connectors, which they classified as additive (also, moreover, furthermore, besides, actually, alternatively, regarding, similarly, likewise, namely), adversative (nevertheless, although), causal (because, therefore, consequently), and sequential (firstly, secondly, previously, afterward(s), eventually, finally, lastly, anyhow, anyway). On the basis of the comparison of results from the HKUST corpus and the L1 corpora, the researchers identified 13 connectors which are regularly overused by Hong Kong students, i.e.

also, moreover, furthermore, besides, regarding, namely, nevertheless,although, because, therefore, firstly, secondly, lastly

Of these, they calculate that the connectors with the six highest rates of overuse are lastly (used on average 17.4 times more frequently in the Hong Kong data compared with that of the L1 corpora), besides (with a ratio of 16.8), moreover (14.9), secondly (12.9), firstly (12.5), and consequently (11.2). Their analysis of student difficulties in this aspect of essay-writing suggests that there are two main problem areas, i.e. redundant use (‘overuse’), and misuse. By ‘redundant use’ they mean that ‘the logical connector is not necessary; its presence does not contribute to the coherence of the text’. ‘Misuse’ occurs when ‘the use of the logical connector is misleading; another cohesive device should have been used; the logical connector is placed inappropriately . . . [which] is related to loose organisation and faulty logic within the text’ (Milton & Tsang 1993:228). To illustrate redundant use, they cite the following example with moreover:

Any animal or insects need to generate their next generation with no exception. Moreover, the very first step is to date an opposite sex.

(Milton & Tsang 1993:228)

As an example of misuse, they focus on the use of therefore, which should, they assert, be used ‘as a causal logical connector . . . where the cause precedes the effect’. Thus, the following example of misuse is an instance of faulty logic ‘where therefore is used to force a conclusion from unsupported assumptions’:

In conclusion, beside the methods mentioned above, there are many other methods of courtship and they are interesting. Therefore, its better for us to contact more the nature. (Milton & Tsang 1993:230)

In their conclusion, Milton and Tsang reiterate that ‘[t]here is a high ratio of overuse of the entire range of logical connectors in our students’ writing, in comparison to published English’, although they concede that distributional patterns may also be affected by such factors as ‘genre’ and ‘variety’ (Milton & Tsang 1993:239).

Another very relevant study that employed a corpus-based approach to this issue is that of Granger and Tyson (1996). In this study, the researchers analysed data from a large-scale corpus of learner English, the International Corpus of Learner English (ICLE), adopting an approach that they call ‘contrastive interlanguage analysis’. Using corpus techniques, Granger and Tyson set out to compare a sample of almost 90,000 words from the French mother-tongue sub-component of ICLE (i.e. French students writing in English) with a 78,000-word sample of essay writing by native speakers of English. Their hypothesis was that the French learners would overuse connectors in their essay writing. Granger and Tyson selected 108 connectors derived from Quirk et al.’s (1985) identification and description of such cohesive devices. Using TACT concordancing software, they then extracted all instances of these connectors in their data, and went on to calculate overall frequencies for these items, both raw frequencies and frequencies per 100,000 words (Granger & Tyson 1996:19–20). On the basis of frequency analysis, they were then able to proceed to the analysis of stylistic, semantic and syntactic misuse.

The overall initial hypothesis concerning overuse was not supported by their results, but a number of rather subtle patterns did emerge with reference to particular groups of connectors. Their analysis suggests that overuse does occur in the case of both corroborative (indeed, of course, in fact) and additive (moreover) connectors. Conversely, they were also able to show that there was an evident underuse of contrastive connectors such as however, though, and yet. Instances of overuse involved the connectors actually, indeed, of course, moreover, e.g., for instance, namely, and on the contrary. Instances of underuse were found in the cases of however, instead, though, yet, hence, then, therefore, and thus. Granger and Tyson then proceed to argue that examples of overuse may be plausibly related to patterns of transfer from French, substantiating this claim by reference to the results from their German sub-corpus. Later in their paper, they go on to look at issues relating to stylistic sensitivity, i.e. the [mis]use of informal connectors such as anyway, anyhow in formal essays, as well as syntactic issues, notably the tendency of learners to front such connectors to sentence-initial position (Granger & Tyson 1996:20–24). Their paper ends with a plea for ‘much more research’, and for research oriented towards ‘contrastive interlanguage investigations’, noting that contrastive explanations need to be supplemented by analyses of ‘universal’ factors. On a pedagogic note, the authors end with the hope that ‘heightened awareness of the semantic, stylistic and syntactic properties of connectors will lead students to think more carefully about the ideas these connectors are linking’ (Granger & Tyson 1996:26).

3. Methodological issues

At least three major methodological issues arise from the previous research reviewed above. These are (i) the identification of linguistic items as ‘connectors’; (ii) the calculation of the ‘ratio of occurrence’ (or ‘ratio of frequency’) of logical connectors in corpus-based studies, and (iii) the measurement of ‘overuse’, ‘underuse’ and ‘misuse’ of connectors using quantitative techniques.

With reference to the first issue, i.e. the identification of linguistic items as ‘connectors’, it is remarkable that most researchers in previous studies appear to take the identification of such items as uncontroversial and given. For example, Field and Yip (1992) base their analysis on Halliday and Hasan’s (1976) classification, while Milton and Tsang (1993) adopt a framework from Celce-Murcia and Larsen-Freeman (1983), and Granger and Tyson (1996) avail themselves of a list of connectors derived from Quirk et al. (1985). In the course of our own research, it became clear that such lists of connectors were neither uncontroversial nor finite, and we were therefore moved to question a methodology that relied purely on pre-existing categorizations.

The second issue, the measurement of the ‘ratio of occurrence’ of logical connectors in corpus-based studies, also arose through our reading of the literature. As indicated above, the Crewe (1990) paper eschews a quantitative methodology completely, while the other studies in this field showed a distinct mismatch in the units of quantitative analysis used in corpus-based comparison studies. This is particularly true of the methods adopted by various researchers to calculate the ‘ratio of occurrence’ of connectors in their linguistic data.

Field and Yip’s (1992) data-analysis relied on, first, a raw frequency count of the number of ‘conjunctive cohesive devices’ or connectors in terms of instances per L1 (‘English as a first language’) or per L2 (‘English as a second language’) group, and, second, the percentage of such connectors across the four categories of ‘additive’, ‘adversative’, ‘causal’, and ‘temporal’. No ‘ratio of occurrence’ facilitating comparison across individuals and groups is noted. In the case of Milton and Tsang’s (1993) study, the term ‘ratio of occurrence’ is employed, although this is calculated simply by dividing the number of identified connectors by the number of words in the corpus. Granger and Tyson (1996) obtain a raw frequency count of the target connectors in a native-speaker (L1) and a foreign/second language learner (L2) writing corpus, and then proceed to calculate a ‘ratio of occurrence’ based on the frequency of occurrence of connectors per 100,000 words of text.


Of particular methodological interest here is the great variation in the methods of calculating the ‘ratio of occurrence’ of connectors in these previous studies. Both Milton and Tsang (1993) and Granger and Tyson (1996) adopt a word-based calculation. In the former case, the ‘ratio of occurrence’ is calculated from the simple division of the number of logical connectors by the total number of words in the corpus, while in the latter, it is calculated as the number of logical connectors per 100,000 words. To the authors of the present study, this method of calculating the ratio of occurrence appears fundamentally flawed. The primary function of connectors in academic texts is surely that of relating linguistic units at the sentential level and beyond, and yet a word-based method of calculation ignores this fact. For example, a sample which contains 100 sentences will offer at least twice as many opportunities to use a connector as a sample with only 50 sentences, regardless of the number of words in each sample. We would thus argue that it is illogical to base the calculation of such a ratio on the fundamental unit of the word, and in the present study we therefore adopted the sentence as the basic unit of analysis (see the next section below).
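The contrast between word-based and sentence-based normalization can be illustrated with a minimal sketch. The two samples below are hypothetical (the figures are assumed for illustration and are not drawn from ICE or from any of the studies reviewed), and the function names are illustrative only. Both samples have the same number of words and the same number of connectors, but a different number of sentences.

```python
def per_100k_words(connectors: int, words: int) -> float:
    """Word-based rate: connectors per 100,000 words."""
    return connectors / words * 100_000


def per_1000_sentences(connectors: int, sentences: int) -> float:
    """Sentence-based rate: connectors per 1,000 sentences."""
    return connectors / sentences * 1_000


# Hypothetical samples (assumed figures, for illustration only):
# identical word and connector counts, different sentence counts.
sample_a = {"connectors": 10, "words": 2_000, "sentences": 100}
sample_b = {"connectors": 10, "words": 2_000, "sentences": 50}

for name, s in (("A", sample_a), ("B", sample_b)):
    word_rate = per_100k_words(s["connectors"], s["words"])
    sent_rate = per_1000_sentences(s["connectors"], s["sentences"])
    print(f"sample {name}: {word_rate:.1f} per 100,000 words, "
          f"{sent_rate:.1f} per 1,000 sentences")
```

On the word-based measure the two hypothetical samples are indistinguishable, whereas on the sentence-based measure sample B uses connectors at twice the rate of sample A, because it has half as many sentences in which a connector could occur.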

Following on from this, we were also concerned about the various measures previously adopted to arrive at calculations of the ‘overuse’, and ‘underuse’ of connectors. Crewe’s essential argument in his study is that in his data, Hong Kong students’ overuse of connectors may be motivated by their ‘trying to impose surface logicality on a piece of writing where no deep logicality exists’ but that the result is typically a ‘clutter’ that ‘makes the argument extremely tortuous’ (Crewe 1990:320). This is, however, an impressionistic judgement, and no quantitative data is presented. Field and Yip (1992) do provide a quantitative analysis, but the basis of this is a comparison of a ‘native speaker group’ (Australian schoolchildren) with ‘three groups of Cantonese speakers’ (schoolchildren from Hong Kong). Milton and Tsang compare connector usage in the academic writing of Hong Kong high school and university students with that found in ‘three native speaker corpora’, i.e. the American Brown corpus, the Lancaster-Oslo/Bergen (LOB) corpus, and a Hong Kong corpus of extracts from first-year university computer textbooks (Milton & Tsang 1993:220). Granger and Tyson’s methodology relies on a comparison of ‘non-native’ against ‘native’ corpus data, specifically, a corpus of native essay writing compiled from the essays of British students (Granger & Tyson 1996:18–19).

On consideration, it seemed to the authors of this current study that these last three comparative studies all had methodological problems. Milton and Tsang’s study compares Hong Kong student academic writing against two general corpora, Brown and LOB (containing texts from newspapers, literature, popular writing, etc.), and against a very narrowly defined corpus of computer textbooks. Both Field and Yip and Granger and Tyson compare ‘non-native’ student academic writing with ‘native’ student academic writing. In this latter case, the assumption is that the best ‘target model’ for ‘non-native’ or ESL students is the writing of other students, those from a ‘native-speaking’ country (however that is defined). Again, we would challenge that assumption, and would instead argue that a better set of control data would be provided by a corpus of published academic writing in English. The target norm in academic writing, for both ‘native’ and ‘non-native’ students, is better defined as academic writing itself, and the best texts for comparison are clearly those already published in international English-language academic journals.

4. The present study

In this study, our data consists of 10 untimed essays and 10 timed examination scripts written by undergraduate Hong Kong students. The data comprises 2,755 sentences (46,460 words), and is part of the Hong Kong component of the International Corpus of English (ICE-HK). In addition, we examine the corresponding data from the British component of the International Corpus of English (ICE-GB), comprising 2,471 sentences and 42,587 words.

With reference to the three methodological concerns identified above, i.e. the identification of linguistic items as ‘connectors’, the measurement of the ‘ratio of occurrence’ of connectors in our data, and the calculation of ‘overuse’ and ‘underuse’ of connectors, a number of measures were adopted in order to avoid inconsistencies in the research method. First, the list of connectors we chose to identify and investigate was not derived from pre-existing categorizations provided by Halliday and Hasan (1976), Quirk et al. (1985), or similar pedagogic and reference grammars, but devised by ourselves by analyzing the subset of academic writing taken from the ICE-GB corpus. This consists of 40 samples, taken from academic papers and books across a range of disciplines, published between 1990 and 1993 inclusive. It comprises 85,628 words, in 4,507 sentences. Here, our approach was initially to identify the connectors used by text authors in the academic writing component of ICE-GB as a valid starting point for the analysis which followed. This approach had the important advantage of giving us a reliable and non-arbitrary list of connectors to form the basis of our study. We found that the use of this more ‘realistic’ list of connectors greatly improved the accuracy of the analysis which followed, as it was then possible to use this same list as a benchmark when calculating instances of ‘overuse’ or ‘underuse’.

Table 1 below shows a complete list of the connectors found in the academic writing data in ICE-GB, together with their raw frequencies, and their frequencies per sentence (multiplied by 1000). The list contains a total of 54 connectors; we include the complete list here in order to show the wide variety of connectors that are available, although many of them have very low frequency.

Table 1. All connectors in the academic writing category, ICE-GB corpus (The figures in parentheses are raw frequencies. Total number of sentences = 4,507; total number of words = 85,628)

Connector                                                     Frequency per sentence (× 1000)

however                                                       20.4 (92)
therefore                                                     10.7 (48)
but, then                                                     8.7 (39)
thus                                                          7.8 (35)
indeed                                                        5.5 (25)
and, so                                                       4.0 (18)
in fact                                                       3.6 (16)
hence                                                         3.3 (15)
moreover                                                      2.4 (11)
consequently, first, on the other hand                        2.2 (10)
rather                                                        2.0 (9)
instead                                                       1.6 (7)
nevertheless                                                  1.3 (6)
again, in other words, nonetheless, secondly, second          1.1 (5)
as a result, on the whole, though                             0.9 (4)
at the same time, firstly, on the contrary, on the one hand   0.7 (3)
alternatively, conversely, furthermore, in contrast           0.4 (2)
above all, accordingly, also, at any rate, by comparison,     0.2 (1)
  by contrast, finally, first of all, in any case, in effect,
  in short, in sum, in the event, in the first place, in total,
  in turn, lastly, or, overall, still, yet

TOTAL                                                         107.8 (486)

The other major methodological consideration is the calculation of the ‘ratio of occurrence’ (termed ‘ratio of frequency’ in this study). As may be seen from Table 1, the base unit for our analysis is the sentence, for the reasons we explain in the previous section of this paper. The frequency of connectors per 100,000 words, as presented by Granger and Tyson, is therefore, we contend, not an appropriate measure of connector frequency. In all cases, our frequencies per sentence are multiplied by 1,000, in order to avoid very small decimal figures.
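As a worked illustration of the ratio of frequency, the following sketch recomputes a handful of the Table 1 values from their raw frequencies and the sentence count of the academic writing sample (4,507). The function and variable names are illustrative assumptions, not part of the original study.

```python
ACADEMIC_SENTENCES = 4_507  # ICE-GB academic writing sample, as reported above

# Raw frequencies for a few connectors, as given in Table 1.
raw_frequencies = {"however": 92, "thus": 35, "indeed": 25, "moreover": 11}


def ratio_of_frequency(raw: int, sentences: int) -> float:
    """Occurrences per sentence, multiplied by 1,000."""
    return raw / sentences * 1_000


for connector, raw in raw_frequencies.items():
    rf = ratio_of_frequency(raw, ACADEMIC_SENTENCES)
    print(f"{connector:<10} {rf:5.1f} ({raw})")
```

Rounded to one decimal place, these four ratios come to 20.4, 7.8, 5.5 and 2.4 respectively, matching the figures listed for these connectors in Table 1.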

The next stage of our analysis was thus to compare the frequencies of these connectors in the writing of Hong Kong students (from ICE-HK) and in the writing of British students (from ICE-GB). For ease of comparison, Table 2 also reproduces the data in Table 1 for academic writing.²

With reference to Table 2 below, we can see that in each of the student datasets, 19 of the connectors that are used in academic writing have a score of zero. The following connectors are not used at all by either group of students:

on the whole, on the one hand, in contrast, at any rate, by contrast, in sum, in the event, in total, or, still

Both groups of students use a smaller number of different connectors than their academic counterparts, so we would expect some overuse of these. The total figures in Table 2 show that this is indeed the case. Both the Hong Kong students and the British students overuse these particular connectors. However, this overuse is much greater on the part of Hong Kong students, who use these connectors more than twice as much as academic writers (239.6 per 1,000 sentences, compared with 107.8).

The Hong Kong students show much greater differences, in terms of frequency, from the academic norm. The most overused connector in the Hong Kong data is so, with a difference of +31.6 from the academic norm. In the British data, however is most overused, with a corresponding difference of +20.5. In the case of other overused connectors, the differences from the academic norm are comparatively high for Hong Kong students, and comparatively low for British students. This can be seen in Table 3, which shows the top ten most overused connectors from each dataset, together with the difference from the academic norm, and the mean difference from the norm. For the top 10 rank positions, the Hong Kong data shows a mean difference from the academic norm of +11.8 – almost twice that for the British data (+6.7).

Page 12: A Corpus Based Study of Connectors in Student Writing

K. Bolton, G. Nelson, and J. Hung

Table 2. Connectors in students’ writing, in comparison with academic writing (The +/– columns show the difference between the relevant value and the value in academic writing; a positive value denotes overuse, a negative value denotes underuse. Rf = Ratio of frequency, per 1000 sentences)

| Connector | Hong Kong Freq. | Hong Kong Rf | Hong Kong (+/–) | Great Britain Freq. | Great Britain Rf | Great Britain (+/–) | Academic Freq. | Academic Rf |
|---|---|---|---|---|---|---|---|---|
| however | 65 | 23.6 | +3.2 | 101 | 40.9 | +20.5 | 92 | 20.4 |
| therefore | 52 | 18.9 | +8.2 | 47 | 19.0 | +8.4 | 48 | 10.7 |
| but | 47 | 17.1 | +8.4 | 14 | 5.7 | –3.0 | 39 | 8.7 |
| then | 45 | 16.3 | +7.7 | 28 | 11.3 | +2.7 | 39 | 8.7 |
| thus | 50 | 18.1 | +10.4 | 36 | 14.6 | +6.8 | 35 | 7.8 |
| indeed | 4 | 1.5 | –4.1 | 11 | 4.5 | –1.1 | 25 | 5.5 |
| and | 77 | 27.9 | +24.0 | 11 | 4.5 | +0.5 | 18 | 4.0 |
| so | 98 | 35.6 | +31.6 | 40 | 16.2 | +12.2 | 18 | 4.0 |
| in fact | 22 | 8.0 | +4.4 | 9 | 3.6 | +0.1 | 16 | 3.6 |
| hence | 8 | 2.9 | –0.4 | 9 | 3.6 | +0.3 | 15 | 3.3 |
| moreover | 28 | 10.2 | +7.7 | 1 | 0.4 | –2.0 | 11 | 2.4 |
| consequently | 3 | 1.1 | –1.1 | 2 | 0.8 | –1.4 | 10 | 2.2 |
| first | 8 | 2.9 | +0.7 | 1 | 0.4 | –1.8 | 10 | 2.2 |
| on the other hand | 20 | 7.3 | +5.0 | 0 | 0.0 | –2.2 | 10 | 2.2 |
| rather | 3 | 1.1 | –0.9 | 2 | 0.8 | –1.2 | 9 | 2.0 |
| instead | 1 | 0.4 | –1.2 | 2 | 0.8 | –0.7 | 7 | 1.6 |
| nevertheless | 5 | 1.8 | +0.5 | 3 | 1.2 | –0.1 | 6 | 1.3 |
| again | 0 | 0.0 | –1.1 | 4 | 1.6 | +0.5 | 5 | 1.1 |
| in other words | 5 | 1.8 | +0.7 | 1 | 0.4 | –0.7 | 5 | 1.1 |
| nonetheless | 1 | 0.4 | –0.7 | 2 | 0.8 | –0.3 | 5 | 1.1 |
| secondly | 8 | 2.9 | +1.8 | 4 | 1.6 | +0.5 | 5 | 1.1 |
| second | 6 | 2.2 | +1.1 | 0 | 0.0 | –1.1 | 5 | 1.1 |
| as a result | 9 | 3.3 | +2.4 | 5 | 2.0 | +1.1 | 4 | 0.9 |
| on the whole | 0 | 0.0 | –0.9 | 0 | 0.0 | –0.9 | 4 | 0.9 |
| though | 1 | 0.4 | –0.5 | 8 | 3.2 | +2.4 | 4 | 0.9 |
| at the same time | 0 | 0.0 | –0.7 | 1 | 0.4 | –0.3 | 3 | 0.7 |
| firstly | 13 | 4.7 | +4.1 | 13 | 5.3 | +4.6 | 3 | 0.7 |
| on the contrary | 2 | 0.7 | +0.1 | 0 | 0.0 | –0.7 | 3 | 0.7 |
| on the one hand | 0 | 0.0 | –0.7 | 0 | 0.0 | –0.7 | 3 | 0.7 |
| alternatively | 0 | 0.0 | –0.4 | 1 | 0.4 | 0.0 | 2 | 0.4 |
| conversely | 0 | 0.0 | –0.4 | 1 | 0.4 | 0.0 | 2 | 0.4 |
| furthermore | 10 | 3.6 | +3.2 | 15 | 6.1 | +5.6 | 2 | 0.4 |
| in contrast | 0 | 0.0 | –0.4 | 0 | 0.0 | –0.4 | 2 | 0.4 |
| above all | 1 | 0.4 | +0.1 | 0 | 0.0 | –0.2 | 1 | 0.2 |
| accordingly | 2 | 0.7 | +0.5 | 0 | 0.0 | –0.2 | 1 | 0.2 |
| also | 43 | 15.6 | +15.4 | 7 | 2.8 | +2.6 | 1 | 0.2 |
| at any rate | 0 | 0.0 | –0.2 | 0 | 0.0 | –0.2 | 1 | 0.2 |
| by comparison | 0 | 0.0 | –0.2 | 1 | 0.4 | +0.2 | 1 | 0.2 |
| by contrast | 0 | 0.0 | –0.2 | 0 | 0.0 | –0.2 | 1 | 0.2 |
| finally | 11 | 4.0 | +3.8 | 4 | 1.6 | +1.4 | 1 | 0.2 |
| first of all | 1 | 0.4 | +0.1 | 0 | 0.0 | –0.2 | 1 | 0.2 |
| in any case | 0 | 0.0 | –0.2 | 1 | 0.4 | +0.2 | 1 | 0.2 |
| in effect | 0 | 0.0 | –0.2 | 1 | 0.4 | +0.2 | 1 | 0.2 |
| in short | 2 | 0.7 | +0.5 | 0 | 0.0 | –0.2 | 1 | 0.2 |
| in sum | 0 | 0.0 | –0.2 | 0 | 0.0 | –0.2 | 1 | 0.2 |
| in the event | 0 | 0.0 | –0.2 | 0 | 0.0 | –0.2 | 1 | 0.2 |
| in the first place | 2 | 0.7 | +0.5 | 0 | 0.0 | –0.2 | 1 | 0.2 |
| in total | 0 | 0.0 | –0.2 | 0 | 0.0 | –0.2 | 1 | 0.2 |
| in turn | 0 | 0.0 | –0.2 | 4 | 1.6 | +1.4 | 1 | 0.2 |
| lastly | 2 | 0.7 | +0.5 | 4 | 1.6 | +1.4 | 1 | 0.2 |
| or | 0 | 0.0 | –0.2 | 0 | 0.0 | –0.2 | 1 | 0.2 |
| overall | 0 | 0.0 | –0.2 | 1 | 0.4 | +0.2 | 1 | 0.2 |
| still | 0 | 0.0 | –0.2 | 0 | 0.0 | –0.2 | 1 | 0.2 |
| yet | 5 | 1.8 | +1.6 | 0 | 0.0 | –0.2 | 1 | 0.2 |
| TOTAL | 660 | 239.6 | +131.7 | 395 | 159.9 | +52.0 | 486 | 107.8 |
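The arithmetic behind the Rf and +/– columns can be sketched as follows. This is a minimal illustration, not the authors' own procedure: the function names are hypothetical, and the sentence counts are assumptions back-calculated from the TOTAL row (660 / 239.6 × 1000 ≈ 2755 for Hong Kong, 486 / 107.8 × 1000 ≈ 4508 for academic writing), since this section does not state them.

```python
def ratio_of_frequency(freq: int, n_sentences: int) -> float:
    """Occurrences of a connector per 1,000 sentences (Rf)."""
    return freq / n_sentences * 1000


def difference_from_norm(student_rf: float, academic_rf: float) -> float:
    """Positive values denote overuse, negative values underuse."""
    return student_rf - academic_rf


# ASSUMPTION: sentence counts inferred from the TOTAL row of Table 2,
# not figures given in this section.
HK_SENTENCES = 2755
ACADEMIC_SENTENCES = 4508

# Connector "so": 98 occurrences (Hong Kong), 18 occurrences (academic).
so_hk = ratio_of_frequency(98, HK_SENTENCES)
so_academic = ratio_of_frequency(18, ACADEMIC_SENTENCES)

print(round(so_hk, 1))                                     # 35.6
print(round(so_academic, 1))                               # 4.0
print(round(difference_from_norm(so_hk, so_academic), 1))  # 31.6
```

Under these assumed sentence counts the sketch reproduces the printed row for so. Other rows may differ from such a recomputation in the last decimal place, because the printed differences appear to have been derived from unrounded ratios.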

Table 3. The top 10 most overused connectors, with their differences from the academic norm

| Rank | Hong Kong | Great Britain |
|---|---|---|
| 1 | so (+31.6) | however (+20.5) |
| 2 | and (+24.0) | so (+12.2) |
| 3 | also (+15.4) | therefore (+8.4) |
| 4 | thus (+10.4) | thus (+6.8) |
| 5 | but (+8.4) | furthermore (+5.6) |
| 6 | therefore (+8.2) | firstly (+4.6) |
| 7 | moreover (+7.7) | then (+2.7) |
| 7 | then (+7.7) | also (+2.6) |
| 9 | on the other hand (+5.0) | though (+2.4) |
| 10 | in fact (+4.4) | finally, in turn, lastly (+1.4) |
| Mean difference | +11.8 | +6.7 |
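As a quick check on the mean differences, the rounded values printed in Table 3 can be averaged directly. This is only a sketch of the arithmetic; the list names are hypothetical, and the three-way tie at rank 10 in the British column is assumed to count as a single value of +1.4.

```python
from statistics import mean

# Differences from the academic norm, as printed in Table 3.
hk_top10 = [31.6, 24.0, 15.4, 10.4, 8.4, 8.2, 7.7, 7.7, 5.0, 4.4]
# ASSUMPTION: the tied rank-10 items (finally, in turn, lastly) count once.
gb_top10 = [20.5, 12.2, 8.4, 6.8, 5.6, 4.6, 2.7, 2.6, 2.4, 1.4]

hk_mean = mean(hk_top10)
gb_mean = mean(gb_top10)

print(round(hk_mean, 1), round(gb_mean, 1))
```

On these rounded inputs the British mean comes out at about +6.7, as reported, while the Hong Kong mean comes out at about +12.3 rather than the printed +11.8. The source does not explain the gap, so the recomputed figure should be treated as indicative only; either way, the Hong Kong mean remains close to twice the British one.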

The connectors so and and are particularly overused by the Hong Kong students. The British students also overuse so, though the largest single share of the British overuse is attributable to the frequency of however. At 40.9 instances per 1,000 sentences, the British students use however about twice as much as academic writers (20.4). In contrast, the Hong Kong students (23.6) are quite close to the academic norm in the use of this connector. Both groups of students overuse therefore and thus. The figures for but, moreover and on the other hand are also worth noting. These connectors are overused by the Hong Kong students, and slightly underused by the British students. Instances of connector overuse are illustrated in the following excerpts from ICE-HK (Excerpt 1) and ICE-GB (Excerpt 2). Spelling errors are in the originals.

Excerpt 1: Student writing from ICE-HK
(ICE-HK-W1A-014: Timed examination script in Music, by a student in a Hong Kong university)

<#98:1> So, we can see that the British opera now become more humanity, not only reflects the king or Queen.
<#99:1> And, in the hamony, the development is the mostly use with tonality.
<#100:1> (i.e. with Center key).
<#101:1> Besides also use with the Aeolian Dolian and Phygian mode for the hamony.
<#102:1> And modulation is also fully used.
<#103:1> However, the techniques of the 20th century such as the atonality, bitonality, are also used.
<#104:1> And in Britten opera’s one technique he has used is interesting i.e. reconile the hostile key by enharmonic mean.
<#105:1> In the Peter Grimes, the last Prologue, Peter and Ellen first meet and sing in the F minor and E major.
<#106:1> However, later, Peter sing in A flat and G sharp and as a result they sing in the unison.
<#107:1> On the other hand, the using of orchestra is developied.

Excerpt 2: Student writing from ICE-GB
(ICE-GB-W1A-020: Timed examination script in Geology, by a student at a British university)

<#42:2> Intrusive igneous rocks include such formations as sills.
<#43:2> These are generally concordant to the rocks.
<#44:2> (However on distinguishing between sills ands and lava flows sills may transgress the bedding planes).
<#45:2> Sills generally form along bedding planes, and may extend for many miles.
<#46:2> Therefore on erosion of the rocks after uplift and the various processes of denudation resulting in the topography, the sill may be exposed as a linear feature.
<#47:2> It may also be exposed as a wide plane.
<#48:2> However this is rare as generally rocks are deformed after deposition and rarely remain horizontal.
<#49:2> Dykes are another igneous intrusion which generally form from sills.
<#50:2> They are a vertical, discordant feature which cut across the bedding planes.
<#51:2> Therefore on a horizontal plane they are linear features at practically 90° to the surface.

The connectors which have a score of zero in the student datasets are instances of non-use, rather than underuse (see Table 2). Most of them also have fairly low frequencies in academic writing. A notable exception to this is on the other hand, which is not used at all by the British students, but is overused by the Hong Kong students. In the Hong Kong data, the figures for indeed, consequently, and again indicate some underuse. However, across the entire range of connectors, the figures for underuse are noticeably lower than those for overuse. In summary, Table 2 shows considerable levels of overuse and much smaller levels of underuse, on the part of all the students.
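The three-way distinction drawn here (non-use, underuse, overuse) can be expressed as a small decision rule. The sketch below is an illustration only: the function name is hypothetical, and it assumes that any non-zero difference from the academic Rf counts as over- or underuse, since the paper does not state a threshold.

```python
def classify_usage(student_freq: int, student_rf: float, academic_rf: float) -> str:
    """Label a connector's usage relative to the academic norm."""
    # A connector that never occurs is treated as non-use, not underuse.
    if student_freq == 0:
        return "non-use"
    # ASSUMPTION: no tolerance band; any difference from the norm counts.
    difference = student_rf - academic_rf
    if difference > 0:
        return "overuse"
    if difference < 0:
        return "underuse"
    return "at norm"


# Values taken from Table 2 (frequency, Rf, academic Rf).
print(classify_usage(0, 0.0, 2.2))   # on the other hand, British students
print(classify_usage(20, 7.3, 2.2))  # on the other hand, Hong Kong students
print(classify_usage(4, 1.5, 5.5))   # indeed, Hong Kong students
```

With the Table 2 values, on the other hand is labelled as non-use for the British students and overuse for the Hong Kong students, while indeed is labelled as underuse in the Hong Kong data.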

What is also significant is the fact that the results of this present study confirm a number of findings from the two earlier Hong Kong based studies, but contradict many more. For example, our results do agree with Field and Yip’s (1992) finding that on the other hand and besides³ occur only in the Hong Kong student data, and that moreover is somewhat overused by the Hong Kong group. For moreover, the results for Hong Kong student writing indicate an Rf (ratio of frequency) of 10.2, compared to an Rf of 0.4 for the ICE-GB student group, and an Rf of 2.4 for the ICE-GB academic writing group (see Table 2 above). However, there is no support for their contention that furthermore is overused by Hong Kong students in comparison with a ‘native speaker’ group of students. In fact, in our data, the Rf for this connector is 3.6 occurrences per 1,000 sentences, in comparison with a figure of 6.1 for the British students in ICE-GB, and a figure of 0.4 for the ICE-GB academic writing group. More significantly, perhaps, Field and Yip’s earlier study also fails to clearly profile the high frequencies of the ‘top five’ connectors used by Hong Kong student writers (Table 3), i.e. so (difference from the academic norm, +31.6), and (+24.0), also (+15.4), thus (+10.4), and but (+8.4). Our results in Table 3 also directly contradict those of Milton and Tsang (1993:226), whose rank ordering of overused connectors gives the following result: lastly (1), besides (2), moreover (3), secondly (4), and firstly (5).

. Conclusion

In this paper, we have presented the results of a systematic analysis of connectors in the writing of students in Hong Kong (as represented in the ICE-HK corpus), and students in Britain (ICE-GB), using a corpus of published academic writing from ICE-GB as a benchmark against which to measure ‘overuse’ and ‘underuse’.

The results presented here indicate clearly that the overuse of connectors is not confined to non-native speakers, but is a prominent feature of students’ writing generally. Both non-native (Hong Kong) students and native (British) students use a considerably smaller number of different connectors in their writing than professional academics. As a result, both sets of students tend to overuse those connectors within their repertoire, and this overuse is much greater in the corpus of Hong Kong student writing, particularly with items such as so, and, also, thus, and but. In the British data, overuse is most marked with the items however, so, therefore, thus, and furthermore. Our analysis raises a number of critical issues concerning the methodology used in corpus-based studies related to the identification of target norms of linguistic behaviour, and the measurement of ‘ratios of occurrence’ (or ‘ratios of frequency’). A central argument in this paper is that the precise methodology of studies of this kind has a direct effect on the validity and comparability of the results attained. The methodological inconsistencies of previous studies, as indicated above, may well explain why the results presented in this study tend to directly contradict those obtained by previous researchers in Hong Kong.

Notes

1. The ICE Hong Kong project has been supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. HKU7174/000H). The ICE Hong Kong project aims to collect, computerize, and analyze one million words of spoken and written Hong Kong English from the 1990s. Each word will be labelled for its wordclass (noun, verb, etc.) and sample speech recordings will be digitized and aligned to the transcriptions. The ICE corpus will be the most comprehensive database of Hong Kong English ever compiled. This research is being conducted in parallel with nineteen other national or regional varieties of English from around the world, including Australia, Canada, East Africa, Great Britain, Malaysia, New Zealand, South Africa, the Caribbean, the Philippines, and the United States (Greenbaum 1996:3–5).

2. Unlike the ICE-GB corpus, the Hong Kong corpus has not yet been POS-tagged, so all results for Hong Kong are based on a manual examination of the data. In order to ensure consistency with the results from ICE-GB, the following procedures have been followed:

(a) And, but, and or are counted as connectors only when they occur in sentence-initial position.

(b) In the case of then, we distinguish between adverbial then, which expresses temporal sequence, as in [1], and connector then, which is used to develop the argument, as in [2] and [3]:

[1] The simple sugar formed is then fermented by yeast to form alcohol. [ICE-HK-W1A-016]

[2] According to the above evidence, we find that women usually are under men’s control in working sphere. Then how about the role taking of women in family? [ICE-HK-W1A-008]

[3] The result of these injunctions, then, was to promote the constant accumulation of capital [...] [ICE-HK-W1A-003]
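Procedure (a) lends itself to a simple automated check, sketched below. This is a hypothetical illustration rather than the manual procedure the authors describe: the function name is invented, and the input is assumed to be a list of sentences already split and stripped of corpus markup. The distinction between adverbial and connector then in procedure (b) depends on meaning and is not attempted here.

```python
import re
from collections import Counter

# Items counted as connectors only in sentence-initial position.
SENTENCE_INITIAL_ONLY = ("and", "but", "or")


def count_sentence_initial(sentences):
    """Count sentences whose first word is and, but, or or."""
    counts = Counter()
    for sentence in sentences:
        # Skip leading punctuation or brackets, then take the first word.
        match = re.match(r"\W*([A-Za-z]+)", sentence)
        if match and match.group(1).lower() in SENTENCE_INITIAL_ONLY:
            counts[match.group(1).lower()] += 1
    return counts


# Sentences taken from Excerpt 1 (ICE-HK-W1A-014), markup removed.
sample = [
    "And, in the hamony, the development is the mostly use with tonality.",
    "And modulation is also fully used.",
    "In the Peter Grimes, the last Prologue, Peter and Ellen first meet "
    "and sing in the F minor and E major.",
]
counts = count_sentence_initial(sample)
print(counts["and"])  # 2
```

The third sample sentence contains and three times, but none of these is sentence-initial, so only the first two sentences contribute to the count.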

3. In the ICE-HK data, on the other hand occurs 20 times (7.3 per 1000 sentences). In each instance, it occurs without the corresponding connector on the one hand. This confirms Field and Yip’s (1992) observation that it is misused – and not simply overused – by Hong Kong students, who use it to add information, without any expression of contrast. In the same data, besides occurs 30 times (10.9 per 1000 sentences), in each case in sentence-initial position, and often in paragraph-initial position, again with no apparent expression of contrast. The following example illustrates the misuse of this connector:

<#92:1> For example, in Britten opera’s A Midsummer Night’s Dream and Tippett’s Midsummer Marriages are the subject from Shakesperian The Midsummer Night’s Dream.
<#94:1> Besides, Tippett’s The knot Garden also from Shakesperian’s All wells that end wells </p><p>
<#95:1> Besides they also deal with the contrast between the collective activities and the loneliness and misery of discontented individual e.g. Peter Grimes.
[ICE-HK-W1A-014]

References

Celce-Murcia, M., & D. Larsen-Freeman (1983). The grammar book: An ESL/EFL teacher’s course. Rowley, Mass.: Newbury House.

Crewe, W. (1990). The illogic of logical connectives. ELT Journal, 44 (4), 316–325.

Field, Y., & L. M. O. Yip (1992). A comparison of internal conjunctive cohesion in the English essay writing of Cantonese speakers and native speakers of English. RELC Journal, 23 (1), 15–28.

Granger, S., & S. Tyson (1996). Connector usage in the English essay writing of native and non-native EFL speakers of English. World Englishes, 15 (1), 17–27.

Greenbaum, S. (1996). Comparing English World-Wide. Oxford: Clarendon Press.

Halliday, M. A. K., & R. Hasan (1976). Cohesion in English. London: Longman.

Milton, J., & E. S. C. Tsang (1993). A corpus-based study of logical connectors in EFL students’ writing: directions for future research. In R. Pemberton & E. S. C. Tsang (Eds.), Studies in Lexis (pp. 215–246). Hong Kong: The Hong Kong University of Science and Technology Language Centre.

Neuner, J. L. (1987). Cohesive ties and chains in good and poor freshman essays. Research in the Teaching of English, 21, 92–103.

Quirk, R., S. Greenbaum, G. Leech, & J. Svartvik (1985). A Comprehensive Grammar of the English Language. London: Longman.