Noun+noun Collocations in Learner Writing


    Journal of English for Academic Purposes 20 (2015) 103–113

    Contents lists available at ScienceDirect

    Journal of English for Academic Purposes

    journal homepage: www.elsevier.com/locate/jeap

    Noun–noun collocations in learner writing

    Jean Parkinson
    School of Linguistics and Applied Language Studies, Victoria University of Wellington, New Zealand

    E-mail address: [email protected].

    Article info

    Article history:
    Received 7 November 2014
    Received in revised form 26 July 2015
    Accepted 6 August 2015
    Available online xxx

    Keywords:
    Corpus analysis
    Collocations
    Noun–noun phrases
    Influence of L1
    EFL compared to ESL

    Abstract

    Studies of collocations to date have emphasised use and learning of noun–verb and adjective–noun collocations. This study uses three sub-corpora of the ICLE corpus to investigate use of noun–noun collocations by learners in their academic writing. The literature to date has focused on contexts where English is being learnt as a foreign language rather than as a second language. The study therefore compares the influence of ESL and EFL learning contexts on learner use of noun–noun collocations. Findings are that accuracy of noun–noun phrases is significantly greater in the writing of ESL learners. A second question considered is what influence the presence or absence of noun–noun phrases in the first language (L1) has on learner use of these phrases in English. For this purpose, production of noun–noun phrases in written English by L1 Mandarin writers (a language that permits noun–noun phrases) is compared to writing by L1 Spanish writers (a language that does not allow noun–noun phrases). Findings are that learners whose L1 permits noun–noun phrases produce significantly more of them in English than learners whose L1 does not. Problems that learners had in forming noun–noun phrases are discussed qualitatively, and implications for EAP teaching are suggested.

    © 2015 Elsevier Ltd. All rights reserved.

    1. Introduction

    This study examines use of noun–noun phrases in academic writing by learners of English. Noun phrases are of interest to English for academic purposes teachers, because of the tendency in academic writing to increase conciseness by packing information into noun phrases. Condensing a clause into a noun–noun phrase is one way that such conciseness is achieved. Examples of noun phrases in which nouns pre-modify the head noun include air pollution and electricity shortage.

    Previous findings concerning noun modifiers in learner writing (Parkinson & Musgrave, 2014) indicated differences between less and more proficient student writers. Certain noun modifiers, including nouns, prepositional phrases, and appositive noun phrases were used significantly more frequently by more proficient student writers. Less proficient writers relied more heavily on adjectives to modify nouns in their writing. The focus in the present study is to investigate the use of noun–noun phrases by non-native writers more closely, and to consider what possible factors besides proficiency may influence their use of noun–noun phrases.

    To do this, I use three sub-corpora of the International Corpus of Learner English (ICLE) (Granger, Dagneaux, Meunier, & Paquot, 2009) to compare three groups of similar proficiency. In doing so I seek to consider the relevance firstly of whether English is being learnt as a foreign or second language, and secondly the influence of the nature of the noun phrase in the learners' L1.

    http://dx.doi.org/10.1016/j.jeap.2015.08.003
    1475-1585/© 2015 Elsevier Ltd. All rights reserved.

    J. Parkinson / Journal of English for Academic Purposes 20 (2015) 103–113

    2. Literature review

    I begin this section by reviewing studies of collocation, before looking at restrictions on formation of noun–noun phrases in English.

    2.1. Collocations

    Collocations have been considered from a phraseological perspective and from a frequency-based perspective. From a phraseological perspective, word combinations are considered along a continuum from fixed idioms through words that combine with a restricted set of other words (e.g. television set/programme/viewers) to words that appear to combine freely with a range of other words.

    Nesselhauf's (2003) phraseology-based study of verb–noun combinations in the written English of the German sub-corpus of ICLE found that learners made considerably more mistakes with combinations without word for word correspondence in the German and English combinations. This is of relevance to my study, in which one of the three data sets concerns a language (Mandarin) which allows noun–noun phrases. The L1 in such a case may influence use in the L2. Nesselhauf (2005), for example, found that around 50% of inappropriate verb–noun collocations could be traced to the learners' L1, and Laufer and Waldman (2011), studying verb–noun collocations, also found L1 influence in about 50% of atypical verb–noun collocations.

    Laufer and Waldman (2011) found that learners used fewer different collocations than native speakers. Advanced learners used more collocations than less proficient learners but, in both more advanced and less proficient learners, around one third of collocations were atypical of NS usage. However, Thewissen (2013) reported that advanced learners produced a larger number of near-hits compared to intermediate learners. This supports a suggestion by Schmitt and Carter (2004, p. 5) that learning of these chunks is not all or nothing, but that they can initially be learnt incompletely. This notion of incomplete, gradual learning is also found in a review by Boers and Lindstromberg (2012), which suggests that uptake of collocations as a result of meaning-focused input alone is incremental, requiring many encounters with the same phrase. From the opposite perspective, Wray (2008) suggests that formulaic chunks may initially be learnt whole and only analysed into their component parts later if necessary. Evidence of the learning of formulaic language holistically is suggested by Boers and Lindstromberg (2009), who found that L2 learners doing dictation exercises hear and write unfamiliar lexical phrases as single non-words. Although my study is not about learning collocations, I review these studies here because they shed light on certain erroneous collocations in my data sets.

    In contrast with a phraseological definition of collocation, a frequency-based definition identifies collocations as combinations in which two words are more likely to co-occur than would be expected based on the statistical frequency of each word. In a study of adjective–noun collocations in writing by both native and non-native speakers (NNS), Siyanova and Schmitt (2008) found in NNS writing a mixture of collocations that are frequent (>5 times) in the BNC, those infrequent in the BNC ([…] 3 is suggested by Hunston (2002) as a significant collocation threshold.

    As I show below, the data indicate that the L2 writers in the study had a number of problems in producing noun–noun phrases. To assist in describing and in explaining these problems, I briefly consider an aspect of English noun–noun phrases: the case of plural pre-modifying nouns.

    2.2. Plural nouns as pre-modifiers in noun–noun phrases in English

    One problem experienced by writers in this study is that of using a plural pre-modifying noun when a singular noun is appropriate. Most pre-modifying nouns in English are singular (Biber, Johansson, Leech, Conrad, & Finegan, 1999), making it difficult for learners to know when a plural pre-modifier is appropriate. In deciding this, it is possible that a writer's L1 may influence them. Languages differ as to how plural meaning is marked. Some languages mark all nouns for number using grammatical morphemes. In many such languages, including Spanish, noun modifiers agree with the head noun for number and gender (e.g. in Spanish el chico alto 'the tall boy'; la chica alta 'the tall girl'; las chicas altas 'the tall girls'). Similarly, in Tswana, adjectives must show concord with nouns both in class and number (e.g. monna yo moleele 'the man is tall' or 'the tall man' but banna ba baleele 'the men are tall' or 'the tall men').

    When learners whose L1 shows number agreement between noun modifiers and head noun produce phrases such as *bombs blasts, we might hypothesise that they have been influenced by their L1 to make the pre-modifier agree in number with the head noun. However, languages such as Mandarin have no nominal endings for singular and plural nouns and plurality is not indicated with inflection, so for L1 speakers of these languages, explaining phrases like *bombs blasts as an attempt to make the noun modifier and head noun agree in number is less convincing.

    Based on this difference, we might hypothesise that writers in whose L1 noun modifiers must agree with the nouns they modify (e.g. Tswana and Spanish) might try to make pre-modifying nouns agree in number with the noun they modify. My research design allows investigation of this possibility. A corpus by L1 speakers of Mandarin (in which there is no number agreement between nouns and their modifiers) is compared to corpora of writing by L1 speakers of Spanish and by L1 Tswana writers. In both of these languages, noun modifiers must agree in number with the head noun. The comparison of corpora produced by writers from Spain (an EFL context) and L1 Tswana writers from South Africa (an ESL context) is designed to indicate the influence, if any, of an EFL compared to an ESL learning context.

    3. Methods

    Argument essays from the ICLE corpus written by L1 speakers of Mandarin, Spanish and Tswana form the three sub-corpora for this study. This corpus consists entirely of argumentative essays in response to essay titles suggested by the Centre for English Corpus Linguistics at the Université catholique de Louvain (Paquot, 2012a). The proficiency level of writing in these corpora is reported as being higher intermediate to advanced level (Granger et al., 2009, p. 11). The Mandarin and Spanish writers acquired English as a foreign language, while the Tswana writers acquired it as a second language. It should be mentioned that the ESL context of the Tswana writers is one that is typical in contexts such as South Africa, India, Nigeria and other ex-colonies of Britain, in which the influence of the colonial language is still very substantial. The colonial language is used in education, commerce and administration. In such contexts English arguably plays a similar role to Latin in medieval Europe. Discussing different understandings of what the term ESL means, Nayar (1997) distinguishes the above ESL context from the ESL contexts which may be more familiar to some readers, in which migrants to an English country (e.g. Australia) learn English. In South Africa, English, the L1 of only 10% of the population (Census in brief, 2012), is the language of government, business, and most significantly, of most schools. As van Rooy (2009) notes, although the Tswana writers of this sub-corpus used English as a medium of instruction at school and university, they live in a part of South Africa where exposure to L1 speakers of English is minimal, and use of English outside of the classroom or official contexts may be limited. This is typical of Nayar's (1997) first ESL context.

    To attain sub-corpora of similar sizes, the age of the writers of the Spanish sub-corpus was selected as 21 or less, while the age of the writers of the Tswana sub-corpus was selected as 22 or less. The first 400 words of one hundred essays were included in each sub-corpus, making each of the three sub-corpora 40 000 words in length. All texts were kept to the same length because part of what I wish to measure is lexical diversity, and this is sensitive to the length of the text; the longer the text, the more likely the writer is to repeat a particular word or phrase. All three sub-corpora drew on writing on a limited range of topics. Table 1 shows a mean per-text count of noun–noun combinations in each sub-corpus.

    Table 1
    Mean noun–noun combinations per text in the sub-corpora.

    Measure | Mandarin sub-corpus | Spanish sub-corpus | Tswana sub-corpus | t-Test M/Sp | t-Test M/T | t-Test Sp/T
    Mean total noun–noun combinations per 400 word text | 6.43 (SD 4.52) | 2.36 (SD 2.05) | 4.52 (SD 3.72) | p < 0.0001 | p = 0.0013 | p < 0.0001
    Mean unique noun–noun combinations per 400 word text | 5.10 (SD 3.52) | 2.13 (SD 1.76) | 3.64 (SD 2.71) | p < 0.0001 | p = 0.0012 | p < 0.0001
    Mean unique appropriate combinations per 400 word text | 4.06 (SD 3.03) | 1.57 (SD 1.52) | 3.03 (SD 2.45) | p < 0.0001 | p = 0.0089 | p < 0.0001
    Mean unique inappropriate combinations per 400 word text | 1.05 (SD 1.22) | 0.56 (SD 0.78) | 0.61 (SD 0.71) | p = 0.0009 | p = 0.0021 | p = 0.6364 ns

    The three sub-corpora were tagged with the CLAWS5 tagset using the automatic tagging service provided by the University Centre for Computer Corpus Research on Language at the University of Lancaster (CLAWS part-of-speech tagger for English, n.d.). Instances of noun–noun phrases were identified using WordSmith 5. Proper nouns such as Animal Farm (the novel) and Richards Bay Minerals were omitted from the count.

    Following Durrant and Schmitt (2009), each noun–noun combination in the three sub-corpora was compared to its use in large native-speaker corpora: the BNC and the COCA. I elected to use both corpora, as it is not clear that either British or American English has a predominant influence in the three contexts. A mutual information (MI) score of 3 (as calculated by the Brigham Young University Corpora webpage (Davies, n.d.)) or more was used as a significant collocation threshold (Hunston, 2002), and all noun–noun combinations with an MI score less than three are referred to as noun–noun phrases. The noun–noun combinations were thus divided into five categories:

    1. Frequent collocations. These were noun–noun phrases that have an MI score greater than 3 and are also frequent in English. For the purposes of this study 'frequent in English' was taken as being attested more than 5 times in the BNC or more than 25 times in the COCA.

    2. Infrequent collocations. These were noun–noun phrases that have an MI score greater than 3, but are relatively less frequent (1–5 times in the BNC or 1–25 times in the COCA).

    3. Noun–noun phrases found in the BNC or COCA, with an MI < 3. As a high MI score reflects a pair of words for which the frequency of occurrence is a high proportion of the overall frequency of either of the pair (Collins wordbanks online, 2008), noun–noun pairs with MI < 3 may be pairs in which the individual words collocate with a range of other words. In this category I included all noun–noun phrases with MI < 3, no matter how frequent they were.

    As Siyanova and Schmitt (2008) note, however, merely because a lexical phrase is not attested in a large corpus, this does not mean that it is unacceptable. Transient formations may also be acceptable. Therefore, all noun–noun combinations not confirmed by the BNC or COCA were examined in context by eleven raters, and rated as either appropriate or inappropriate. A further two categories are:

    4. Noun–noun phrases not attested in the BNC or COCA, but which were judged acceptable by 8 out of 11 raters who were L1 English speakers. Examples of lexical phrases rated as acceptable but which were unattested in the BNC or COCA are cartoon language, tourism student, and hippy ideology.

    5. Noun–noun phrases not attested in the BNC or COCA and which were judged unacceptable by four or more raters out of 11.

    The eleven raters (of whom the author was one) included speakers of New Zealand English (7), British English (2), Canadian English (1) and South African English (1). Raters were all either teachers of writing, or Linguistics students. Using Randolph's (2008) online kappa calculator, inter-rater reliability was calculated at kappa = 0.694, just short of the 0.7 that Randolph (2008) views as adequate agreement. Because noun–noun combinations are used in Mandarin, translation from the L1 may influence use of noun–noun combinations in English. To quantify this influence, a Mandarin rater judged whether or not the inappropriate combinations used by the Mandarin writers are translations or at least influenced by existing Mandarin noun–noun combinations.

    Log-likelihood values (LL) were calculated (using Rayson's (n.d.) online calculator) in order to test for significance of differences between the three sub-corpora (see Tables 2–5). Log-likelihood was chosen over a chi-squared test following Rayson and Garside (2000, p. 2), who report that the chi-squared value becomes unreliable when the expected frequency is less than 5, which is the case with some of the comparisons I make.

    4. Quantitative results

    This section presents a quantitative comparison between the three sub-corpora. Table 1 indicates variation between and within sub-corpora in the number of noun–noun combinations per 400 word text. Table 2 shows the variation in frequency of noun–noun combinations in the three sub-corpora. Table 3 considers the different (unique) noun–noun combinations in each of the three sub-corpora (lexical diversity); it categorises variation in frequency in the five categories of noun–noun phrase outlined in the previous section. Again considering the different noun–noun combinations (lexical diversity) in each of the three sub-corpora, Table 4 quantifies grammatically and lexically inappropriate noun–noun combinations in each sub-corpus. Table 5 considers a particular type of grammatically inappropriate noun–noun phrase: those that show problems of inappropriate agreement between noun pre-modifier and head noun.

    Table 1 indicates mean number of total noun–noun combinations per 400 word text. The standard deviation (SD) indicates fairly wide differences between texts in each corpus. Some writers produced no noun–noun combinations, while a minority produced as many as 19.

    Table 1 also shows that the writing by L1 Mandarin writers was significantly the most lexically diverse with regard to noun–noun combinations; on average they used 5.1 unique noun–noun combinations per 400 word text compared to 3.6 by the L1 Tswana writers, who in turn used significantly more than the Spanish writers (2.1). This trend of Mandarin > Tswana > Spanish was in evidence also for use of unique appropriate noun–noun combinations. However, although the mean inappropriate combinations per 400 word text by the Mandarin writers was significantly greater than both the other groups, the mean use of inappropriate combinations by the Spanish and Tswana writers was not significantly different from each other. In sum then, the Mandarin writers produced a significantly greater mean number of appropriate and inappropriate combinations than the other two groups. The Tswana writers in turn produced more appropriate combinations per text than the Spanish writers, but did not produce more inappropriate combinations than did the Spanish writers.

    Table 2
    Noun–noun combinations in three sub-corpora.

    Measure | Mandarin corpus (raw) | Spanish corpus (raw) | Tswana corpus (raw) | LL M/Sp | LL M/T | LL Sp/T
    Words in sub-corpus | 40 000 | 40 000 | 40 000
    Total noun–noun combinations | 633 | 233 | 445 | 191.96, p < 0.0001 | 32.95, p < 0.0001 | 67.41, p < 0.0001
    Lexical diversity: total unique noun–noun combinations | 397 | 178 | 289 | 85.55, p < 0.0001 | 17.07, p < 0.0001 | 26.64, p < 0.0001
    Unique appropriate combinations | 298 | 126 | 232 | 71.83, p < 0.0001 | 8.24, p < 0.01 | 31.86, p < 0.0001
    Unique inappropriate combinations | 99 | 52 | 57 | 14.83, p < 0.001 | 11.45, p < 0.001 | 0.23 ns

    Table 3
    Appropriacy of unique noun–noun phrases in three corpora (lexical diversity).

    Category | Mandarin raw | Mandarin % | Spanish raw | Spanish % | Tswana raw | Tswana % | LL M/Sp | LL M/T | LL Sp/T
    Total unique noun–noun combinations | 397 | | 178 | | 289 |
    1. Unique appropriate combinations. Of which: | 298 | 75 | 126 | 71 | 232 | 80 | 0.31 ns | 1.31 ns | 1.55 ns
    1a. Combinations absent from BNC/COCA but judged appropriate by L1 raters | 11 | 3 | 1 | 1 | 20 | 7 | 3.61 ns | 6.29, p < 0.05 | 13.08, p < 0.001
    1b. Combinations attested in BNC or COCA, MI < 3 | 68 | 17 | 31 | 17 | 38 | 13 | 0.01 ns | 1.74 ns | 1.33 ns
    1c. Infrequent collocations: 1–5 times in BNC or 1–25 times in COCA; MI > 3 | 27 | 7 | 12 | 7 | 36 | 12 | 0.00 ns | 5.73, p < 0.05 | 3.72 ns
    1d. More frequent collocations: >5 times in BNC or >25 times in COCA; MI > 3 | 192 | 48 | 82 | 46 | 138 | 48 | 0.14 ns | 0.01 ns | 0.07 ns
    2. Unique inappropriate noun–noun combinations | 99 | 25 | 52 | 29 | 57 | 20 | 0.84 ns | 2.03 ns | 4.14, p < 0.05

    Table 4
    The nature of inappropriate noun–noun combinations in the three sub-corpora.

    Category | Mandarin | Spanish | Tswana | LL M/Sp | LL M/T | LL Sp/T | % Inter-rater agreement
    Total unique noun–noun combinations | 397 | 178 | 289
    Unique inappropriate N–N combinations. Of which: | 99 | 52 | 57 | 0.84 ns | 2.03 ns | 4.14, p < 0.05
    1. Lexically inappropriate phrases | 38 | 23 | 27 | 1.25 ns | 0.01 ns | 1.29 ns | 90%
    2. Grammatically problematic phrases: | 61 | 29 | 30 | 0.07 ns | 3.22 ns | 2.96 ns
    2a. Adj-N would be more appropriate | 32 | 10 | 10 | 1.05 | 6.19, p < 0.05 | 1.16 ns | 89%
    2b. Possessive noun–noun phrase would be more appropriate | 15 | 5 | 10 | 0.35 ns | 0.05 ns | 0.15 ns | 87%
    2c. N-PP would be more appropriate | 5 | 7 | 5 | 3.82 ns | 0.25 ns | 2.00 ns | 81%
    2d. Inappropriately singular or plural pre-modifier or head noun (see Table 5) | 9 | 7 | 5 | 1.15 ns | 0.24 ns | 2.00 ns | 85%

    Table 5
    Number agreement between inappropriately plural or singular pre-modifying nouns and head nouns.

    Category | Mandarin corpus | Spanish corpus | Tswana corpus | LL M/Sp | LL M/T | LL Sp/T
    Total unique noun–noun combinations in three 40 000 word corpora. Of which: | 397 | 178 | 289
    Plural rather than singular pre-modifying noun used | 2 | 5 | 4 | 4.83, p < 0.05 | 1.47 ns | 1.12 ns
    Singular rather than plural pre-modifying noun used | 2 | 0 | 0 | 1.48 ns | 2.19 ns | 0.00
    Singular head noun is used instead of plural | 4 | 2 | 1 | 0.02 ns | 1.10 ns | 1.00 ns
    Plural head noun is used instead of singular | 1 | 0 | 0 | 0.74 ns | 1.09 ns | 0.00
    Total inappropriately singular or plural pre-modifier or head noun | 9 | 7 | 5 | 1.15 ns | 0.24 ns | 2.00 ns
    Total inappropriately singular or plural phrases in which the nouns agree in number | 7 | 5 | 4 | 0.61 ns | 0.15 ns | 1.12 ns

    Table 2 distinguishes total number of noun–noun phrases in each corpus and also distinguishes the number of unique noun–noun phrases (lexical diversity). The second of these is a better reflection of writer usage in each sub-corpus, because the essays in the three corpora are on a limited selection of topics, and these cue multiple uses of the same noun–noun phrases. Considering diversity of noun–noun combinations, frequency of unique noun–noun phrases (compared to words in the sub-corpus) is significantly higher in the Mandarin sub-corpus than in either of the other two sub-corpora. In summary, relative frequency of noun–noun phrases is Mandarin > Tswana > Spanish.
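
    The distinction between total and unique combinations amounts to counting tokens versus types. The following is a minimal sketch, assuming each noun–noun combination has already been extracted as a (pre-modifier, head) pair. The function name and the sample pairs other than air pollution are invented for illustration and are not taken from the study's data.

```python
from collections import Counter

def token_and_type_counts(pairs):
    """Return (total occurrences, number of distinct combinations)."""
    # Lower-case both nouns so that "Air pollution" and "air pollution"
    # are counted as one type (an assumption about normalisation).
    counts = Counter((mod.lower(), head.lower()) for mod, head in pairs)
    return sum(counts.values()), len(counts)

# Hypothetical sample: three uses of one phrase plus two other phrases.
sample = [
    ("air", "pollution"), ("Air", "pollution"), ("traffic", "jam"),
    ("car", "accident"), ("air", "pollution"),
]
tokens, types = token_and_type_counts(sample)
print(tokens, types)  # 5 3
```

    A topic that cues the same phrase repeatedly raises the token count but not the type count, which is why the unique count is treated as the better indicator of writer usage.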

    Table 2 also shows that L1 Mandarin writers produce a greater frequency of both appropriate and inappropriate noun–noun phrases than the other two groups. In addition, appropriate noun–noun combinations were significantly more frequent in the Tswana corpus than in the Spanish corpus. In summary, for appropriate noun–noun combinations, Mandarin > Spanish; Tswana > Spanish. For inappropriate noun–noun combinations, Mandarin > Spanish; Mandarin > Tswana.
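
    The log-likelihood comparisons in Table 2 can be sketched as follows. This is an illustration that assumes the usual two-corpus log-likelihood formula (observed frequencies compared with frequencies expected under equal rates); the function name is hypothetical, and the code is not the online calculator itself. With the Table 2 counts it gives values close to those reported.

```python
import math

def log_likelihood(freq_a: int, size_a: int, freq_b: int, size_b: int) -> float:
    """Two-corpus log-likelihood for one item's frequency in corpora A and B."""
    total = freq_a + freq_b
    # Frequencies expected if the item occurred at the same rate in both corpora.
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Table 2, Mandarin vs Spanish, 40 000 words in each sub-corpus.
ll_total = log_likelihood(633, 40_000, 233, 40_000)   # total combinations
ll_unique = log_likelihood(397, 40_000, 178, 40_000)  # unique combinations
print(round(ll_total, 2), round(ll_unique, 2))
```

    Unlike chi-squared, this statistic remains usable when one of the counts is small, which matters for the sparser rows of Tables 3–5.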

    Table 3 considers the lexical diversity in each of the sub-corpora. As a proportion of the different noun–noun phrases in each corpus, 80% of the Tswana combinations were appropriate in some degree, the proportion being 75% for the Mandarin and 71% for the Spanish writers. These differences are not, however, significant.

    Table 3 also categorises these phrases according to their frequency in the BNC or COCA. Of the different noun–noun phrases they produce, all three groups produce similar frequencies of appropriate noun–noun combinations (category 1, Table 3). The L1 Tswana writers produce a significantly greater frequency than do either the Mandarin or Spanish writers of noun–noun phrases that are not in the BNC or COCA but which are judged appropriate by L1 raters (1a, Table 3). In addition, Tswana writers produce a significantly greater frequency of appropriate combinations which are infrequent noun–noun collocations (MI > 3, 1–5 times in the BNC or 1–25 times in the COCA) than the Mandarin writers (1c, Table 3). All three groups produce similar proportions (46–48%) of noun–noun collocations that are frequent in the BNC or COCA (1d). Tswana writers produce a significantly lower frequency of inappropriate noun–noun combinations (category 2) than the Spanish writers. In sum, the Tswana writers produced a significantly greater proportion than either of the other two groups of acceptable combinations absent from the BNC/COCA; they also produce more noun–noun collocations infrequent in the BNC/COCA than do the Mandarin writers. Tswana writers also produced a lower proportion of inappropriate combinations than the other two groups, but only in the case of the Spanish writers was this difference significant.
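
    The sorting of combinations into the Table 3 categories can be sketched as a small decision procedure. This is an illustration built from the thresholds stated in the text, not the procedure actually run for the study. The function name is hypothetical, an MI of exactly 3 is assumed to count as a collocation, and an unattested combination is assumed to be acceptable when at least 8 of the 11 raters accept it.

```python
from typing import Optional

def categorise(mi: Optional[float], bnc_freq: int, coca_freq: int,
               raters_accepting: int = 0) -> str:
    """Assign a noun-noun combination to a Table 3 category label."""
    attested = bnc_freq > 0 or coca_freq > 0
    if not attested:
        # Unattested combinations are decided by the raters (assumed: 8 of 11).
        return "1a" if raters_accepting >= 8 else "2"
    if mi is None:
        raise ValueError("an attested combination needs an MI score")
    if mi < 3:
        return "1b"  # attested, but the nouns combine widely with other words
    # Assumption: MI of exactly 3 is treated as meeting the threshold.
    if bnc_freq > 5 or coca_freq > 25:
        return "1d"  # more frequent collocation
    return "1c"      # infrequent collocation

print(categorise(4.1, bnc_freq=12, coca_freq=40))               # 1d
print(categorise(4.1, bnc_freq=3, coca_freq=10))                # 1c
print(categorise(None, bnc_freq=0, coca_freq=0, raters_accepting=9))  # 1a
```

    Checking attestation first keeps the rater judgment separate from the corpus evidence, so a combination is only sent to raters when neither corpus confirms it.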

    The inappropriate noun–noun combinations fell into a range of types, as reflected in Table 4. Some were lexically inappropriate (Table 4, category 1), and in NS English they would be replaced either with existing collocations (e.g. crimes/criminal acts rather than the inappropriate crime doings in my data) or with another word (e.g. homicides rather than blood crimes). Table 4 shows that fewer than half of the inappropriate phrases in each sub-corpus were lexically inappropriate. In coding, the lexically inappropriate phrases were identified as ones where most raters reformulated the phrase lexically (e.g. home violence/domestic violence) rather than changing it grammatically.

    Other inappropriate noun–noun combinations were ones in which appropriate lexis had been selected, but they deviated grammatically from those in native English. Table 4 shows the different grammatical problems. These include use of a pre-modifying noun where most raters judged a pre-modifying adjective to be more appropriate (category 2a, Table 4) (e.g. democracy revolution), use of a pre-modifying noun where a possessive noun was more appropriate (2b, Table 4) (e.g. people needs), use of a pre-modifying noun where a post-modifying prepositional phrase was more appropriate (2c, Table 4) (e.g. imagination capacity), and inappropriately singular or plural pre-modifying noun or head noun (2d, Table 4) (e.g. traffic condition; tattoo equipments).

    Most inappropriate noun–noun combinations have more than one possible appropriate realisation in English. For example, in place of the inappropriate phrase religion alienation, which was found in the data, acceptable phrases in English would be both the adjective–noun combination religious alienation and the noun-prepositional combination alienation from religion. To categorise them, I relied on the 11 raters to suggest what phrase they would use to replace noun–noun phrases they viewed as inappropriate. Inter-rater percentage agreement is also reflected in Table 4.
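
    Agreement among several raters can be quantified in more than one way. The sketch below computes overall pairwise agreement and a free-marginal multirater kappa of the kind mentioned in the Methods. The formula is the usual free-marginal definition and is assumed here, the function names are hypothetical, and the rating counts are invented. It is not known whether the percentages in Table 4 were computed in this way.

```python
def observed_agreement(counts):
    """Proportion of agreeing rater pairs, averaged over all items.

    counts[i][j] is the number of raters who put item i in category j;
    every item must be rated by the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    agreeing = sum(c * c for row in counts for c in row) - n_items * n_raters
    return agreeing / (n_items * n_raters * (n_raters - 1))

def free_marginal_kappa(counts):
    """Chance-corrected agreement with chance fixed at 1 / number of categories."""
    n_categories = len(counts[0])
    p_observed = observed_agreement(counts)
    p_chance = 1 / n_categories
    return (p_observed - p_chance) / (1 - p_chance)

# Invented example: 3 phrases, 11 raters, categories (appropriate, inappropriate).
ratings = [[11, 0], [8, 3], [4, 7]]
print(round(observed_agreement(ratings), 3), round(free_marginal_kappa(ratings), 3))
```

    Fixing chance agreement at one over the number of categories means the statistic does not depend on how often each category happens to be chosen, which suits tasks where raters are free to assign any proportion of items to either category.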

    Table 4 is based on the total unique noun–noun combinations and the frequency of different types of inappropriate noun–noun phrase within these. There is little variation in these frequencies between the 3 groups of writers when it comes to either lexical or grammatical inappropriacy, but overall the frequency of problematic combinations in the Tswana sub-corpus is significantly lower than that in the Spanish corpus. In addition, the Mandarin writers had a significantly greater tendency than did the Tswana writers to use a noun–noun combination when native raters judged an adjective–noun phrase to be more appropriate.

    The issue of inappropriate use of a plural or singular pre-modifying noun (2d, Table 4) has relevance to the question of whether the presence or absence in the L1 of agreement between head nouns and pre-modifiers influences use in English. If this is the case, we would expect Tswana and Spanish writers to produce more such combinations than the Mandarin writers, whose L1 has no such agreement. Table 5 considers such inappropriate noun–noun phrases that show problems of inappropriate agreement between noun pre-modifier and head noun. In general there were no significant differences between the three groups in the number of these phrases in which the nouns had erroneously been made to agree in number. However, as Table 5 shows, the L1 Mandarin writers were significantly less likely than the L1 Spanish writers to inappropriately make pre-modifying nouns plural. As Table 5 shows, though, this is the single piece of evidence that writers whose L1 does not allow noun–noun combinations were more likely to make the nouns inappropriately agree in number. The number of instances of inappropriate number agreement was probably too low in this corpus to reach any conclusion on this issue.

    In the next section, I consider the implications of these results. I then provide a qualitative discussion of the noun–noun usage regarded as inappropriate in English by the L1 raters.

    5. Discussion of quantitative results

    This section explores possible explanations for the differences between the three sub-corpora which are reflected in Tables 1–5. In general, Table 1 (noun–noun combinations per 400 word text) demonstrates that Mandarin writers produce more appropriate and more inappropriate unique noun–noun combinations per 400 word text than do the other two groups. Tswana writers produce more appropriate noun–noun combinations than Spanish writers. I explore each of these findings below.
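
    The per-text comparisons in Table 1 can be roughly checked from the reported means and standard deviations alone. The sketch below computes a Welch t statistic. Which t-test variant was used is not stated, so this choice is an assumption; with 100 texts in every group the statistic is the same as for the pooled-variance test. The function name is hypothetical.

```python
import math

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (t statistic, approximate degrees of freedom) from summary data."""
    var_a = sd_a ** 2 / n_a  # squared standard error of mean A
    var_b = sd_b ** 2 / n_b  # squared standard error of mean B
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df

# Table 1, mean total combinations per 400 word text, 100 texts per group.
t_m_sp, df_m_sp = welch_t(6.43, 4.52, 100, 2.36, 2.05, 100)  # Mandarin vs Spanish
t_m_t, df_m_t = welch_t(6.43, 4.52, 100, 4.52, 3.72, 100)    # Mandarin vs Tswana
print(round(t_m_sp, 2), round(t_m_t, 2))
```

    Both statistics are large relative to their degrees of freedom, which is in line with the small p values reported in Table 1.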

    As regards the two EFL groups, nouns as pre-modifiers are found in Mandarin but are absent from Spanish. Because the Mandarin writers are used to using this resource in Mandarin, they may be more sensitive to its use in what they hear and read in English; in addition their greater use (output) in English of noun–noun combinations may assist them in learning the rules associated with English use. Existence of noun–noun phrases in Mandarin may not be the only factor in this difference, but it is likely to be a relevant one. Noun–noun phrases in the L1 may also be a factor in the significantly greater frequency of inappropriate noun–noun phrases the Mandarin group produced compared to the other two groups. Judgment by an L1 Mandarin rater indicated that 71% of inappropriate noun–noun combinations in the Mandarin sub-corpus showed influence of L1 noun–noun phrases; examples are window closet (rather than shop window), spirit stanchion (spiritual support) and meteor rain (meteor shower). This coincides with the finding of L1 influence on production of atypical collocations by Laufer and Waldman (2011).

    As regards the EFL and ESL contexts of learning English, Tswana, like Spanish, does not permit noun–noun phrases. Tswana writers' greater production of appropriate noun–noun phrases compared to that of Spanish writers indicates that Tswana writers' ESL language learning context, a context in which English has been consistently used as the language of instruction in all school subjects and in which English is the dominant language of the media, business and officialdom, has given the Tswana writers greater exposure to English and thus predisposed them by comparison with Spanish writers to the use of noun–noun phrases. The production of a significantly lower overall frequency of inappropriate noun–noun phrases by the Tswana writers than by the Mandarin writers (see Tables 1 and 2) and a significantly lower proportion of noun–noun combinations that are inappropriate than the Spanish writers (see Table 3) is likely to be strongly influenced by the context of their learning of English.

    Table 2 (overall production of noun-noun combinations in each sub-corpus) supports the findings of Table 1 in that both appropriate and inappropriate noun-noun combinations are significantly more frequent in the Mandarin sub-corpus as a whole than in the other two sub-corpora. In addition, the appropriate noun-noun combinations are significantly more frequent in the Tswana sub-corpus as a whole than in the Spanish sub-corpus, but the Tswana and Spanish writers produce similar frequencies of inappropriate combinations. This again supports my above claim that presence of noun-noun phrases in the L1 prompts their use in the L2, and that the greater exposure to English consequent on learning in an ESL context similarly prompts their use in the L2. Thus the finding of variation across three sub-corpora in the frequency of noun-noun phrases produced by three groups of equal proficiency suggests that both L1 and context of learning predispose writers to use of noun-noun combinations.
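For readers who want to see what a claim such as "significantly more frequent in one sub-corpus than another" involves computationally, the following is a minimal sketch only. It assumes a log-likelihood comparison of the kind offered by the calculator listed in the references (Rayson, n.d.; Rayson & Garside, 2000). The function name, the counts and the corpus sizes are hypothetical and are not figures from Tables 1-3.

```python
import math

def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood (G2) for one item's frequency in two corpora.

    freq_a, freq_b: observed counts of the item (hypothetical here).
    size_a, size_b: total word counts of the two corpora.
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected counts if the item were equally common in both corpora
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Chi-square critical value for 1 degree of freedom at p < 0.05
CRITICAL_P05 = 3.84

# Hypothetical example: 100 vs 50 noun-noun phrases in two 10,000-word sub-corpora
ll = log_likelihood(100, 10_000, 50, 10_000)
print(round(ll, 2), ll > CRITICAL_P05)
```

A value above the critical threshold would be read as a significant difference in frequency between the two sub-corpora; identical relative frequencies give a value of zero.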

    Of the noun-noun phrases they produce, the Tswana writers produce a greater frequency than Spanish or Mandarin writers of phrases that are not in the BNC or COCA but which are judged appropriate by L1 raters (category 1a, Table 3). The Tswana writers thus appear to be better able to invent or use flexibly their own appropriate noun-noun phrases than are the other two groups. A possible reason for this is that the Tswana writers, possibly because of their greater exposure to English than either of the other two groups, are better able to judge the appropriacy of their own invented noun-noun combinations. Tswana writers also produced a significantly greater proportion of noun-noun collocations that are infrequent in native speaker (NS) usage than did the Mandarin writers (see category 1c, Table 3). Again this may relate to the ESL context of their learning, suggesting that they have been exposed to these less frequent noun-noun collocations more often than have the other two groups.

    In sum, although both groups learnt English in an EFL context, it seems that Mandarin writers are more aware of noun-noun phrases as a possible resource in expressing meaning than are the Spanish writers. Although noun-noun combinations are absent from the L1 of both the Tswana and the Spanish writers, the ESL Tswana writers use noun-noun combinations more than Spanish writers (but less than Mandarin writers). The Tswana writers are also able to judge the appropriacy of their own apparently invented combinations somewhat better than the other two groups.

    Regarding noun-noun collocations (MI > 3) that are frequently produced by native speakers (i.e. >5 times in the BNC or >25 times in the COCA), Table 3 (category 1d) shows that the proportions produced by the Mandarin (48%), Spanish (46%) and Tswana writers (48%) are not significantly different. It is notable that these rates are similar to the finding of 45% by Siyanova and Schmitt (2008) for frequent adjective-noun collocations. However, although my means of identifying collocations (appearance in the BNC or COCA) was similar to theirs (appearance in BNC), my inclusion of the COCA as a reference corpus is likely to identify a greater proportion of noun-noun phrases as collocations than theirs. It therefore seems likely that this similarity may be coincidental. This again suggests the usefulness of further research comparing appropriate usage of collocations of different types by the same group of writers.
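The thresholds named in the paragraph above (MI > 3, and more than 5 occurrences in the BNC or more than 25 in the COCA) can be read as a simple decision rule. The sketch below is illustrative only. It assumes the standard pointwise mutual information formula, MI = log2(observed / expected); the function names and every count used are hypothetical, and the sketch is not a description of the procedure actually followed in the study.

```python
import math

def mutual_information(pair_freq, freq_modifier, freq_head, corpus_size):
    """Pointwise MI of a two-word combination: log2(observed / expected).

    pair_freq must be greater than zero; all counts here are hypothetical.
    """
    # Expected co-occurrence count if the two nouns were independent
    expected = freq_modifier * freq_head / corpus_size
    return math.log2(pair_freq / expected)

def is_frequent_collocation(mi, bnc_freq, coca_freq):
    """Apply the thresholds quoted in the text: MI > 3 and
    (>5 occurrences in the BNC or >25 occurrences in the COCA)."""
    return mi > 3 and (bnc_freq > 5 or coca_freq > 25)

# Hypothetical counts for one noun-noun combination in a 1,000,000-word corpus
mi = mutual_information(pair_freq=10, freq_modifier=100, freq_head=100,
                        corpus_size=1_000_000)
print(is_frequent_collocation(mi, bnc_freq=10, coca_freq=0))
```

Because the rule accepts a phrase that passes either frequency threshold, adding the COCA as a second reference corpus can only widen the set of phrases counted as frequent collocations, which is the caution raised above about comparing the proportions with earlier studies.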

    6. Results and discussion of qualitative analysis

    As shown in Table 4, of the inappropriate noun-noun combinations in my sub-corpora, some had grammatical problems while others were lexically problematic. I will consider the grammatical problems first. More than half of the inappropriate noun-noun phrases in each sub-corpus are grammatically inappropriate. Table 4 shows the different grammatical problems. These include use of a pre-modifying noun where the raters judged that a pre-modifying adjective or a possessive noun would be more appropriate, use of a pre-modifying noun where a post-modifying prepositional phrase would be more appropriate, and an inappropriately singular or plural pre-modifying noun or head noun.

    In the first category a head noun was pre-modified by a noun, when a pre-modifying adjective phrase would be used to express this meaning in English (example 1).

    1. We are subject to capitalism manipulation thanks to television (Sp) (capitalist manipulation)

    Another category was head nouns modified by a noun when a pre-modifying possessive noun was appropriate (example 2).

    2. However it is a real life in many people view (Ma) (people's view)

    Another sub-type of inappropriate combinations of the grammatical sort comprises noun-noun combinations where the noun modifier was inappropriately plural (3), or inappropriately singular (4). These shed some light on whether the presence or absence of number agreement in noun phrases in the L1 erroneously leads learners to expect that there should be agreement in number between the nouns in a noun-noun phrase.

    3. Moreover, most of the employees who have universities degrees agree that what they learn is seldom used. (Ma) (university degrees)

    4. Many companies advertise on television in order to increase the sale volume (Ma) (sales volume)

    My hypothesis was that the Spanish and Tswana writers would be more inclined to make the noun modifier agree with the head noun (the case for noun modifiers in Spanish and Tswana) than the writers whose L1 is Mandarin, where such agreement does not occur. As I note above in my discussion of Table 5, there was a significantly lower tendency for Mandarin compared to Spanish writers to inappropriately make pre-modifying nouns plural. As I discuss above, this is the only evidence that writers whose L1 does not allow noun-noun combinations were more likely to make the nouns inappropriately agree in number; the low incidence of this inappropriate usage in the three corpora and the moderate size of the corpora used make a conclusion on this issue tentative.

    Also included in Table 5 are frequencies of noun-noun phrases in which the noun modifier was inappropriately singular. These all concerned use of nouns as pre-modifiers which are invariably plural in English. For the Mandarin writers, these phrases included sport channel and sport match. Because plural noun pre-modifiers are rare, it makes sense for writers to use a singular pre-modifier as the default choice.

    Table 5 also included the small number of cases where the head noun was inappropriately plural (5) or singular (6).

    5. This notion is too narrow and does no present many invaluable aspects of university lives (Ma) (university life)

    6. Women were supposed to take care of the family, to take care of the home fire. (Sp) (home fires)

    Moving on to lexically inappropriate combinations, it is useful firstly to consider all lexically new formulations produced by writers of the three sub-corpora, whether appropriate or inappropriate. Such new formulations are ones that do not appear in the BNC or COCA. Some were considered appropriate by the raters (category 1a, Table 3). Other new formulations were considered lexically inappropriate by raters (last row of Table 4); raters made lexical (rather than grammatical) changes to one or both nouns. Comparing these two categories of lexically new formulations, more were rated unacceptable in English than acceptable. Inventing noun-noun collocations is therefore risky for learners, because invented phrases are more likely to be inappropriate than appropriate.

    In lexically inappropriate combinations, one of the nouns in an existing noun-noun collocation is replaced with a word that does not usually collocate with the other noun. In example 7, the pre-modifying adjective in an English collocation, domestic violence, is replaced with a synonym, home, producing a phrase that is not a collocation in English; in example 8 it is the head noun in rehabilitation programme/facility that is replaced, producing rehabilitation school.

    7. Nearly 30 percent women are facing home violence or quarrels with family members (Ma) (domestic violence)

    8. Another way of counselling may be to attend rehabilitation school to be taught about how they should behave. (Ts) (rehabilitation programmes)

    Other inappropriate phrases were formulations that either use highly infrequent words (9) or invent another phrase for an existing collocation (10). As discussed above, both of these examples, from the Mandarin sub-corpus, are likely to be the result of translation from Mandarin.

    9. Royalty are still a vital part of Britain society And most people's heart, it still the spirit stanchion of the Britain (Ma) (spiritual support)

    10. Sometimes maybe you don't know what thing is when you find the new type of goods which appear on the shop's window closet. (Ma) (shop window)

    Some lexically problematic combinations are interesting in their ability to shed light on the process of learning of collocations. In these it appeared that the writer had gone some way towards learning an existing noun-noun chunk and that these writers partly know the collocation. Such instances exemplify Schmitt and Carter's (2004) incompletely learnt collocations; with more exposure it is possible that writers will learn the collocation more fully. However, it is also possible that such erroneously learnt collocations may be retained; Wray (2012, p. 248) gives the examples of (native speaker) confusion between streets/streaks ahead and off his own back/bat. Some of these imperfectly learnt collocations are ones where either the modifying or head noun is replaced by a word that sounds like the intended word. In example 11, numbers and members sound similar, as do diary and daily in 12, which also look similar. As discussed above, Boers and Lindstromberg (2009) argue that this is evidence for holistic learning of collocations. Although these few examples cannot be said to provide substantial evidence for this claim, they do suggest that further research with a bigger corpus might be valuable.

    11. The murderer's family numbers can go to prison and look in them instead of crying for their death (Ma) (family members)

    12. Television has become an important factor in our diary life (Sp) (daily life)

    13. Numbers of educational programmes are small in contrast with a large number of advertisements and soup operas (Ch) (soap operas)

    Soup operas in 13, although probably a spelling mistake, is included as it shows one example of the collocation soap opera having been imperfectly learnt. Another instance of imperfect learning of the same collocation by another learner is reflected in example 14. It is possible that soda show (also bubbly/soapy) in 15 may also refer to soap operas.

    14. When they watch the bubble operas they think that they are in need of relaxing (Ma) (soap operas)

    15. We relax ourselves by watching the entertainment programmes on television such as soda show, television series and movies. (Ma) (soap opera)

    In contrast with family numbers, which suggests the possibility of holistic learning of collocations, bubble operas and soda show provide evidence of analytical learning. The writers of these remember that soap operas have something to do with soap/bubbles, but choose the nouns bubble and soda to represent a characteristic of soap (bubbles) instead of soap itself. Bubble operas has not been retrieved whole from memory by the writer but rather constructed analytically from the partially remembered characteristic of the phrase. Similarly, if elephant hide was intended by elephant peel in example 16, the writer has constructed this noun-noun phrase analytically by using a synonym (peel) for skin/hide (as in banana peel).

    16. He changed his elephant peel for a hen without thinking if maybe the peel could have more value (Sp) (elephant hide?)

    The Tswana writers also used phrases that are found in the L2 variety of English that is widely spoken in Southern Africa. I would speculate that these are stable collocations in this variety, but this would need further comparison with a large South African corpus. These combinations included cousin sister, veld fire, ID book, and Technikon campus. These combinations were all judged unacceptable by the raters, but are known to the author as acceptable South African usage.

    7. Conclusion

    This study has produced statistically significant quantitative findings about use of noun-noun phrases by three groups of non-native writers of English; the findings shed light on the influence of the presence of noun-noun phrases in the L1 and also the influence of the context in which English has been learnt. The study has supplemented these quantitative findings with a qualitative discussion of the different categories of noun-noun phrase that are inappropriate in English. Before examining the main findings from these two parts of this study, I note that this study is limited to a focus on use rather than learning of noun-noun collocations. Nevertheless this focus provides a number of insights, which I summarise below. Some of these insights are of interest to EAP teachers.

    The study sought to test the possibility firstly that the nature of the L1 influences production of noun-noun phrases in English, and secondly that context of learning (ESL or EFL) will have an influence on the learning of these phrases. In testing these, the frequency of production of noun-noun phrases per 400-word text, the proportion of phrases of different categories in each sub-corpus as a whole, their accuracy, and the extent to which noun-noun combinations produced are collocations, i.e. frequent in the language of native speakers, were considered.

    With regard to per text frequency, this study found significant differences in the frequencies produced by writers of different L1s. The L1 Mandarin writers, an EFL group whose L1 allows noun-noun combinations, produced the greatest frequency of noun-noun phrases, followed by the L1 Tswana writers, an ESL group whose L1 does not allow noun-noun combinations. In the writing by L1 Spanish writers, an EFL group whose L1 does not allow these phrases, their frequency was the lowest. I have argued that the presence of noun-noun phrases in the L1 makes learners more inclined to use them in English. My findings suggest too that the greater exposure afforded by learning in an ESL context also increases use in the L2.

    Although the L1 Mandarin writers produced more noun-noun phrases than the L1 Tswana writers, who produced more than the Spanish writers, the proportion of appropriate phrases was not significantly different in the three groups; however, the L1 Mandarin writers produced a significantly greater proportion of inappropriate noun-noun phrases than the other two groups. Direct translation from the L1 appears to have played some role here; an L1 Mandarin rater judged L1 influence in 71% of the inappropriate combinations.

    The proportion of noun-noun collocations (MI > 3) that are frequently used by native speakers (>5 times in BNC or >25 times in COCA) was about the same in the writing of the three groups. L1 Tswana writers produced a significantly greater proportion of noun-noun collocations (i.e. MI > 3) which are used less frequently by native English speakers than did the Mandarin writers. Given their English-medium education, the Tswana learners are likely to have been exposed more often to each of the noun-noun collocations, giving them an advantage in acquiring them. Notable too is the significantly greater ability of the Tswana writers to produce their own acceptable combinations absent from the BNC or COCA.

    Influence of the L1 was apparent in the greater production of noun-noun combinations by the L1 Mandarin writers. More evidence of influence of the L1 was the greater tendency of L1 writers of Spanish (which marks pre-modifiers to agree with the noun in number) compared to L1 writers of Mandarin (which does not) to inappropriately make pre-modifying nouns plural. I tentatively conclude that, by comparison with L1 Mandarin writers, their L1 has led to the Spanish writers over-marking the plural. This has relevance for EAP teachers.

    Another finding of relevance to EAP teachers whose students' L1 is Mandarin is their somewhat greater tendency (by comparison with the Tswana writers) to use a noun-noun phrase when an adjective-noun phrase would be more appropriate. This difficulty may stem from the fact that nouns are often used as adjectives (e.g. blueberry muffin).

    Also of relevance to EAP is that learners did not always signal explicitly the possessive relationship between the nouns in a noun-noun phrase. This may be because the meaning relationship between nouns in noun-noun phrases is usually implicit (Biber et al., 1999, p. 590) and not always signalled using a possessive noun (e.g. gun power; motor vehicles), making this confusing for learners.

    For the EAP teacher, the fact that new noun-noun formulations by learners are more likely to be inappropriate than appropriate is difficult to respond to. Should learners be encouraged only to use noun-noun phrases that they know, rather than inventing their own? Or does using them, even if inappropriately, promote sensitivity to them and a greater level of noticing of these forms in future reading? The fact that Mandarin writers used a significantly greater number than Spanish writers, but with little increase in the proportion used accurately, suggests not. That the Tswana writers, who had most exposure to noun-noun phrases, used them significantly more accurately, suggests that repeated exposure is the best way to increase accuracy. This issue would benefit from further research.

    As discussed above, further research that compares use of different types of collocations (e.g. adj-noun compared to noun-noun) within the same population would also be of value. My comparison of the present findings with previous studies of adjective-noun and verb-noun collocations has not been conclusive, because of differences in the ways in which collocations are identified in different studies, and variation between studies in the L1 of writers.

    Acknowledgements

    My thanks to Jill Musgrave, Anna Siyanova, and two anonymous reviewers for their valuable comments on an earlier version of this article.

    References

    Bahns, J., & Eldaw, M. (1993). Should we teach EFL students collocations? System, 21, 101-114.

    Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of spoken and written English. Edinburgh: Pearson Education Limited.

    Boers, F., & Lindstromberg, S. (2009). Optimizing a lexical approach to instructed second language acquisition. Basingstoke, UK: Palgrave Macmillan.

    Boers, F., & Lindstromberg, S. (2012). Experimental and intervention studies on formulaic sequences in a second language. Annual Review of Applied Linguistics, 32, 83-110.

    Census in brief. (2012). Statistics South Africa. Retrieved on 9 March 2014 from http://www.statssa.gov.za/Census2011/Products/Census_2011_Census_in_brief.pdf

    CLAWS part-of-speech tagger for English. (n.d.). Retrieved from http://ucrel.lancs.ac.uk/claws/.

    Collins Wordbanks Online. (2008). A guide to statistics: t-Score and mutual information. Retrieved on 3rd April 2014 from http://wordbanks.harpercollins.co.uk/Docs/Help/statistics.html.

    Davies, M. (n.d.). Corpus.byu.edu. Retrieved on 26 July 2015 from http://corpus.byu.edu/.

    Durrant, P., & Schmitt, N. (2009). To what extent do native and non-native writers make use of collocations? International Review of Applied Linguistics in Language Teaching, 47(2), 157-177.

    Durrant, P., & Schmitt, N. (2010). Adult learners' retention of collocations from exposure. Second Language Research, 26(2), 163-188.

    Granger, S., Dagneaux, E., Meunier, F., & Paquot, M. (2009). In International corpus of learner English (pp. 198-204). Presses Universitaires de Louvain.

    Hunston, S. (2002). Corpora in applied linguistics. Cambridge University Press.

    Laufer, B., & Waldman, T. (2011). Verb-noun collocations in second language writing: a corpus analysis of learners' English. Language Learning, 61(2), 647-672.

    Nayar, P. B. (1997). ESL/EFL dichotomy today: language politics or pragmatics? TESOL Quarterly, 31(1), 9-37.

    Nesselhauf, N. (2003). The use of collocations by advanced learners of English and some implications for teaching. Applied Linguistics, 24, 223-242.

    Nesselhauf, N. (2005). Collocations in a learner corpus. Amsterdam: John Benjamins Publishing.

    Paquot, M. (2012a). Corpus collection guidelines. Retrieved on 26th July 2015 from https://www.uclouvain.be/en-317607.html.

    Paquot, M., & Granger, S. (2012b). Formulaic language in learner corpora. Annual Review of Applied Linguistics, 32, 130-149.

    Parkinson, J., & Musgrave, J. (2014). Development of noun phrase complexity in the writing of English for Academic Purposes students. Journal of English for Academic Purposes, 14, 48-59.

    Randolph, J. J. (2008). Online kappa calculator. Retrieved September 20, 2014, from http://justusrandolph.net/kappa/.

    Rayson, P. (n.d.). Log-likelihood calculator. Retrieved May 3, 2014 from http://ucrel.lancs.ac.uk/llwizard.html.

    Rayson, P., & Garside, R. (2000, October). Comparing corpora using frequency profiling. In Proceedings of the workshop on comparing corpora (pp. 1-6). Association for Computational Linguistics.

    van Rooy, B. (2009). The status of English in South Africa. In S. Granger, E. Dagneaux, F. Meunier, & M. Paquot (Eds.), International corpus of learner English (pp. 198-204). Presses Universitaires de Louvain.

    Schmitt, N., & Carter, R. (2004). Formulaic sequences in action. Formulaic sequences: acquisition, processing and use. In N. Schmitt (Ed.), Formulaic sequences (pp. 1-22). Amsterdam: John Benjamins Publishing.

    Siyanova, A., & Schmitt, N. (2008). L2 learner production and processing of collocation: a multi-study perspective. Canadian Modern Language Review/La Revue canadienne des langues vivantes, 64(3), 429-458.

    Thewissen, J. (2013). Capturing L2 accuracy developmental patterns: insights from an error-tagged EFL learner corpus. The Modern Language Journal, 97(S1), 1-25.

    Wray, A. (1999). Formulaic language in learners and native speakers. Cambridge University Press.

    Wray, A. (2008). Formulaic language: Pushing the boundaries. Oxford University Press.

    Wray, A. (2012). What do we (think we) know about formulaic language? An evaluation of the current state of play. Annual Review of Applied Linguistics, 32, 231-254.

    Jean Parkinson teaches Applied Linguistics and TESOL at Victoria University of Wellington in New Zealand. Her research interests are academic writing, spoken and written genres in Science, and Corpus Linguistics.

    J. Parkinson / Journal of English for Academic Purposes 20 (2015) 103-113
