Elisabete Ferreira MA Thesis - DiVA portal
Degree Project, Master’s Level
Anaphoric demonstratives in student academic writing: A cross-disciplinary study of (un)attended this and these
Author: Elisabete Ferreira
Supervisor: Annelie Ädel
Examiner: Jonathan White
Subject/main field of study: English for Academic Purposes
Course code: EN3077
Credits: 15
Date of examination: 13/06/19
At Dalarna University it is possible to publish the student thesis in full text in DiVA. The publishing is open access, which means the work will be freely accessible to read and download on the internet. This will significantly increase the dissemination and visibility of the student thesis. Open access is becoming the standard route for spreading scientific and academic information on the internet. Dalarna University recommends that both researchers as well as students publish their work open access. I give my/we give our consent for full text publishing (freely accessible on the internet, open access):
Yes ☒ No ☐
Dalarna University – SE-791 88 Falun – Phone +4623-77 80 00
Abstract
Cohesive devices such as anaphoric reference play an important role in written discourse. This
thesis investigates the extent to which the anaphoric demonstratives this and these are used as
determiners (‘attended’) or pronouns (‘unattended’) by first-year undergraduate students from
four different academic disciplines. Data extracted from the British Academic Written English
(BAWE) corpus were analysed quantitatively to determine the frequency of use of attended and
unattended this/these across disciplines, as well as qualitatively to examine the types of nominal
and verbal structures that follow the demonstratives. When compared to findings from previous
studies, novice student writers were found to employ this/these as pronouns to a larger extent
than both students at a more advanced level and research article writers. It was also observed
that the determiners this and these pattern differently, each selecting largely distinct attending
nouns. In addition, comparison of the results for each subcorpus shows that even though
there are some differences between the four disciplines, these differences are not as great as
might be expected and do not indicate a clear distinction between ‘hard’ and ‘soft’ sciences.
While the influence of genre has not been scrutinised, other possible explanations are proposed,
relating to the educational context and level of study in association with the range of lexical
choices available to novice student writers.
Keywords: cohesion, demonstratives, anaphoric reference, disciplinary variation, student writing
Table of Contents
1 Introduction
1.1 Aim and Research Questions
2 Review of the Literature
2.1 Academic Discourse(s)
2.1.1 English for general or specific academic purposes
2.1.2 Disciplinary variation in academic writing
2.1.2.1 ‘Common core’ versus discipline-specific vocabulary
2.2 Discourse Grammar: Cohesion in Written Discourse
2.2.1 Cohesive devices and reference patterns
2.2.2 Demonstratives as determiners or pronouns
2.2.3 Attended and unattended this/these in academic writing
3 Material and Methodology
3.1 Data
3.2 Method of Analysis
3.2.1 Models of classification
4 Results and Discussion
4.1 Overall Frequencies of Attended and Unattended this/these
4.2 Most Frequent Nouns and Noun Types Following this/these
4.3 Most Frequent Verbs and Verb Types Following this/these
4.4 Distribution of Attended and Unattended this/these across Disciplines
4.5 Distribution of Nouns/Verbs and Types across Disciplines
5 Conclusion
References
Appendix 1 List of nouns following this/these
Appendix 2 List of nouns per type
Appendix 3 List of verbs following this/these
Appendix 4 Complete list of nouns per discipline
Appendix 5 Complete list of verbs per discipline
1 Introduction
In order to construct a rhetorically effective text, writers adopt several strategies. One of them
is the use of cohesive devices to structure the text, connect ideas and maintain a clear flow of
information, which facilitates understanding and helps build a strong and convincing argument.
The importance of these devices is reflected in the attention paid in the literature to specific
linguistic items such as linking adverbials or connectors (e.g. Charles, 2011; Hinkel, 2001;
Granger & Tyson, 1996). While connectors have been researched to some extent, cohesive
items such as the demonstratives this/these and that/those have been less explored. As markers
of anaphoric reference, demonstratives help to establish “contextual ties between ideas” (Hinkel, 2001,
p. 112) and play an important role in text cohesion.
Whether it is appropriate to leave an anaphoric demonstrative (in particular this)
‘unattended’, that is, not followed by a noun, has been a matter of debate in academic writing
for decades (Swales, 2005; Geisler, Kaufer & Steinberg, 1985). The circumstances that lead
writers to choose between the pronominal form of a demonstrative and the determiner form
followed by a noun have received considerable attention in the North American approach to
composition and writing instruction (e.g. Moskovit, 1983; Geisler et al., 1985). In addition,
style manuals and textbooks (e.g. American Psychological Association, 2010; Glenn & Gray,
2013; Swales & Feak, 2012) continue to draw attention to the potential ambiguity and confusion
that writers can create by using an unattended demonstrative, placing an unnecessary burden
on the reader. Teachers, too, tend to insist that students make anaphoric reference clear so as
not to disrupt the cohesion of a text, and they often “scribble in the margin
‘this what?’” (Swales, 2005, p. 2).
In recent years, research has shifted away from a prescriptive approach to the use of
(un)attended this/these towards descriptive examinations of how these forms are
actually used in patterns of anaphoric reference, especially in academic writing. This
development has been strongly supported by the increasing use of corpora and corpus tools.
Much of the emphasis has been placed, however, on published research articles (Gray & Cortes,
2011; Gray, 2010; Swales, 2005) and comparatively few studies have investigated how
demonstrative determiners and pronouns are used by student writers; even fewer have targeted
a non-North American context (e.g. Petch-Tyson, 2000). Those studies that have focused on
student writing have prioritised advanced or upper-level students (e.g. Wulff, Römer & Swales,
2012; Römer & Wulff, 2010). A comparison of their findings with results from an investigation
of attended and unattended demonstratives in novice writing could provide useful information
about their usage at different levels of study.
In comparison with the relative scarcity of studies on anaphoric demonstratives, a
phenomenon that has been somewhat more attended to in recent years is the systematic variation
within academic discourse. Disciplinary variation and the importance of a discipline-specific
identity have been widely acknowledged, as research has demonstrated that academic discourse
varies to a large extent across genres and disciplinary
communities (e.g. Hyland, 2004, 2006; Bhatia, 2002; Becher & Trowler, 2001). Students at
both undergraduate and postgraduate level are also expected to follow certain linguistic
conventions and develop disciplinary discourses in order to successfully progress through
tertiary education (Hyland, 2006, p. 39). The disciplinary perspective is thus also relevant to
explore in student writing (e.g. Nesi & Gardner, 2012; Staples, Egbert, Biber & Gray, 2016;
Gardner, Nesi & Biber, 2018).
1.1 Aim and Research Questions
The overarching aim of this study is to investigate the use of the demonstratives this/these in a
representative collection of texts produced by first-year undergraduate students from different
academic disciplines. It seeks to determine how frequently this/these are used as determiners or
pronouns, as well as the most common nouns and verbs that follow these demonstratives, and
whether any disciplinary variation can be found. For these purposes, the frequencies of use, the
most frequent nouns and verbs, and their respective types will be analysed, followed by a comparison
of the results across disciplines. The study will be guided by the following research questions:
1) To what degree are attended and unattended this/these used in first-year undergraduate
student writing?
2) Which nouns and noun types (concrete, deictic, shell, other abstract nouns) most
frequently follow attended this/these?
3) Which verbs and verb types (lexical, primary, modal) most frequently follow unattended
this/these?
4) To what extent, if any, do student writers in different disciplines differ with respect to
the frequency of use of attended and unattended this/these?
5) How are the most frequently co-occurring nouns/verbs and respective types distributed
across academic disciplines?
To address these research questions, the next section will review literature on academic writing
related to disciplinary discourses and cohesion, as well as provide an overview of previous
corpus-based research on the use of demonstratives as cohesive resources. Section 3 will then
describe the material and methodology employed in this study, after which the results of the
analysis of the data will be presented and discussed in section 4. Finally, a summary of the main
findings, as well as limitations and suggestions for future research will follow in the concluding
section.
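Research questions 1 and 4 depend on telling attended from unattended uses apart and on comparing counts across subcorpora of different sizes. The sketch below illustrates one possible way of operationalising this; it is not the procedure used in this study. It assumes that the texts have already been part-of-speech tagged with Penn Treebank-style tags (e.g. NN, NNS, JJ, CD), treats this/these as attended when a noun follows, optionally after adjectives or numerals, and normalises raw counts per 10,000 words, a base chosen here purely for illustration. The function names are hypothetical.

```python
from collections import Counter

# Assumption: each sentence is a list of (word, tag) pairs with Penn Treebank-style tags.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
# Pre-modifiers that may sit between the determiner and its head noun,
# e.g. "this important point", "these two factors".
PREMODIFIER_TAGS = {"JJ", "JJR", "JJS", "CD"}
DEMONSTRATIVES = {"this", "these"}


def classify_demonstratives(tagged):
    """Label each this/these in a tagged sentence as attended or unattended.

    Returns (demonstrative, label, next_word) triples, where next_word is the
    head noun for attended uses and the immediately following token otherwise.
    """
    results = []
    for i, (word, _tag) in enumerate(tagged):
        form = word.lower()
        if form not in DEMONSTRATIVES:
            continue
        j = i + 1
        # Skip adjectives and numerals that may precede the head noun.
        while j < len(tagged) and tagged[j][1] in PREMODIFIER_TAGS:
            j += 1
        if j < len(tagged) and tagged[j][1] in NOUN_TAGS:
            results.append((form, "attended", tagged[j][0].lower()))
        else:
            # No noun head: pronominal ("unattended") use, e.g. "This is ..."
            following = tagged[i + 1][0].lower() if i + 1 < len(tagged) else None
            results.append((form, "unattended", following))
    return results


def per_10k(count, total_words):
    """Normalise a raw frequency to occurrences per 10,000 words."""
    return count * 10_000 / total_words


sentence = [
    ("This", "DT"), ("approach", "NN"), ("shows", "VBZ"), ("that", "IN"),
    ("these", "DT"), ("two", "CD"), ("factors", "NNS"), ("matter", "VBP"),
    (".", "."), ("This", "DT"), ("is", "VBZ"), ("important", "JJ"), (".", "."),
]
labels = classify_demonstratives(sentence)
print(labels)  # [('this', 'attended', 'approach'), ('these', 'attended', 'factors'), ('this', 'unattended', 'is')]
print(Counter(label for _, label, _ in labels))  # Counter({'attended': 2, 'unattended': 1})
print(per_10k(150, 50_000))  # 30.0
```

A heuristic of this kind would still require manual checking of concordance lines, since tagging errors would otherwise feed directly into the counts.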
2 Review of the Literature
2.1 Academic Discourse(s)
Academic discourse broadly concerns “the ways of thinking and using language which exist in
the academy” (Hyland, 2009, p. 1). It is through discourse, in its different realisations, that
academics collaborate, communicate, create and disseminate knowledge. Academic discourse
is used by members of different social groups, often referred to as ‘discourse communities’,
within which they develop specialised knowledge and competence in the communication
practices of their particular disciplines (Paltridge, 2002, p. 15; see also Becher & Trowler,
2001). According to Swales (1990, p. 9), “discourse communities are sociorhetorical networks
that form in order to work towards sets of common goals”. A discourse community is further
identified by a particular way of communicating, its members possessing a certain level of
expertise and familiarity with the specialised vocabulary and genres that are relevant for their
communicative purposes (Swales, 1990, pp. 24-27). Different academic discourse communities
thus have distinct ways of using language and doing things. While it has been somewhat
contested, the concept of discourse community nevertheless “proves useful in identifying how
writers’ rhetorical choices depend on purposes, setting and audience” (Hyland, 2009, p. 66) of
any given discourse, which in turn can help us understand how those communicative purposes
or goals are achieved by different groups.
Research on academic discourse in recent years has demonstrated that “the discourses of
the academy are enormously diverse” (Hyland, 2004, p. x) with respect to both genre and
discipline (e.g. Bhatia, 2002; Hyland, 2004), resulting in an increasing awareness of multiple
‘academic discourses’ that contrasts with the previously-held perception of “a single,
monolithic ‘academic English’” (Hyland, 2009, p. ix). The implications and value of this
research, for language teaching in general and English for Academic Purposes pedagogy in
particular, are widely recognised (Flowerdew, 2002; Hyland, 2006).
Within academic discourse, particular attention has been paid to written discourse. As the
“main way scholarship is transmitted” (Flowerdew, 2016, p. 6), writing is essential for the
dissemination of knowledge through publication by professional academics. At the same time,
writing plays an equally essential role in the display of knowledge for assessment purposes by
students and “is probably the single most important skill necessary for academic success” (Nesi
& Gardner, 2012, p. 3). In spite of, or because of, this role, it represents a challenge for novices
in academia. Academic writing in English specifically is usually thought to be a complex,
elaborated and rather abstract form of discourse, characterised by long and unfamiliar words
(Biber & Gray, 2010). Certain features that are typically associated with academic texts include
high lexical density and nominalisation (compressed nominal or phrasal structures); a formal,
detached and impersonal style; source use and intertextuality (McCarthy, Matthiessen, & Slade,
2010, p. 55; Paltridge, 2002, pp. 136-137; see also Biber, 2006; Biber & Gray, 2010). Some of
these features make academic language not only difficult to read and understand but also
intimidating for students, especially novice writers (Flowerdew, 2016, p. 7; Biber, 2006).
Among the multiple approaches to the study of academic discourse, findings from corpus-
based investigations have contributed considerably to the description and characterisation of
academic discourse in general, and academic writing in particular. In addition to research on
the distinctive features of academic discourse, the different registers of university language
have been examined (e.g. Biber, 2006), as has the diversity of academic genres and their
interplay with the various disciplinary discourses (e.g. Hyland, 2004). Large corpora of
academic texts, such as the British Academic Written English (BAWE) corpus, have also been
used to investigate different genres and features of student academic writing. Nesi and Gardner
(2012), for instance, have identified, from the wide variety of university assignment genres
across several disciplines, five different social functions, which are linked to the different stages
of a degree: ‘demonstrating knowledge and understanding’; ‘critical evaluation and developing
arguments’; ‘developing research skills’; ‘preparing for professional practice’; and ‘writing for
oneself and others’. These functions provide a better understanding of the type of
knowledge that students need to demonstrate to meet tutors’ expectations for different tasks, as
well as of the various linguistic features that are characteristic of each. Also using data from the
BAWE corpus, Staples et al. (2016) have examined how students’ writing develops in terms of
grammatical complexity through levels of study, mediated by genre and discipline, and found
that their writing becomes more complex as students advance in their studies. Taking a different
approach, Gardner et al. (2018) have also explored the interaction between several situational
variables in student writing. Using a multidimensional analysis, they identified the way certain
linguistic features are differently clustered or dispersed along four dimensions across
disciplines, levels of study and genres in the BAWE corpus.
2.1.1 English for general or specific academic purposes
The increasingly pervasive use of English in academia and research in the past decades has led
to the emergence of an area of study in Applied Linguistics designated English for Academic
Purposes (EAP). One of the main branches of English for Specific Purposes (ESP), EAP
specifically “covers language research and instruction that focuses on the communicative needs
and practices of individuals working in academic contexts” (Hyland & Shaw, 2016, p. 1). The
main focus of EAP instruction is the learner and their needs in preparation for tasks or work in
academic environments, while taking into account the demands of specific academic
disciplines. One of the goals of EAP is thus to equip students with skills in and awareness of
different disciplinary conventions, which will help them produce more genre-oriented and
discipline-specific writing in English (Flowerdew, 2002).
The growing importance of English as a medium of university instruction alongside the
globalisation of higher education has led to the expansion of the role and focus areas of EAP,
which has in turn generated the “tension between a general EAP addressing a generalized academic
discourse and dedicated disciplinary-discourse instruction” (Hyland & Shaw, 2016, p. 6). As a
result, two main approaches to EAP instruction have emerged, English for General Academic
Purposes (EGAP) and English for Specific Academic Purposes (ESAP). EGAP is concerned
with the general academic skills and ‘common core’ features of academic language that are
needed by students in all disciplines, whereas ESAP addresses the specific needs of students of
particular disciplines (Flowerdew, 2016, p. 7). Often associated with different levels of study,
EGAP courses provide undergraduates and taught graduates with the generic academic skills
they require, while research postgraduates and academics have more discipline-specific needs
that can be met through ESAP instruction (Flowerdew, 2016, p. 8). The fact that many fields of
study are becoming more interdisciplinary, leading to higher demands on students to acquire
competence in more than one discipline, adds, however, another layer of complexity (Bhatia,
2002, p. 27; Flowerdew, 2016, p. 8).
The ongoing debate over the issue of specificity in ESP/EAP has centred on the
question of whether there are transferable or common skills and language features that justify a
general approach, or whether the focus should lie on providing students with discipline-specific
instruction (Hyland, 2002, p. 385; see also Shutz, 2013). As discussed in more detail in Section
2.1.2.1, a main area of focus of EAP where this question is particularly relevant and remains
central is academic vocabulary (generic versus discipline-specific), in particular the creation
of word lists based on frequency, for which the use of corpora has been instrumental (Nesi,
2016). Corpora have been increasingly used by EAP researchers and practitioners as sources of
authentic language use for the analysis of large samples of academic text in different registers
and contexts (e.g. Biber et al., 1999; Biber, 2006; Hyland, 2004; Nesi & Gardner, 2012; Gardner
et al., 2018). In addition to being used as research tools, corpora are also useful resources for
EAP writing instruction and materials development (Nesi, 2016; Hunston, 2002).
2.1.2 Disciplinary variation in academic writing
The concept of variation is relevant not only for distinguishing academic discourse from other types
of discourse but also within academic discourse itself, which varies according to different parameters
such as genre, register, mode (spoken/written), and discipline. Of particular interest here,
discipline is a complex concept to define, not least because it varies over time and geographical
location due to the “changing nature of knowledge domains” (Becher & Trowler, 2001, p. 41).
Hyland (2004, p. 1) suggests that disciplines are primarily defined by disciplinary-approved
practices that members of a local discourse community adopt and are competent in. Along the
same lines, for Becher and Trowler (2001) academic communities (or ‘tribes’) and disciplinary
knowledge (the ‘territories’) are “inseparably intertwined” (p. 23). In other words, belonging to
a discipline means sharing common ways of producing and communicating knowledge.
Disciplines differ in terms of preferred genres, how knowledge and arguments are
constructed, as well as the vocabulary or terminology used. For instance, the sciences typically
emphasise the construction of knowledge through proof in an objective way and value
precision, whereas the humanities favour the strength of argument and an interpretive discourse
that attempts to describe and understand familiar human experiences, while the social sciences
are seen as falling somewhere in between (Hyland, 2009, p. 63). It follows that disciplinary
knowledge mainly “reflect[s] real-world differences in subject matter” (Becher & Trowler,
2001). The distinction between ‘hard’ (sciences/technology) and ‘soft’ (humanities/social
sciences) disciplines that generally reflects common perceptions does not, however, imply a
clear-cut division between disciplinary groupings (Hyland, 2009, p. 64).
Gaining an awareness of discipline-specific conventions alongside developing
disciplinary knowledge is therefore essential for successful student academic writing
(Flowerdew, 2016, p. 8; Hyland & Tse, 2007, pp. 248-249). As university students are exposed
to and learn to master the different communication skills that are specific to their target
disciplines, they also become socialised into the discourse communities of the various academic
disciplines (e.g. Nesi & Gardner, 2012; Hyland, 2006; Becher & Trowler, 2001).
Corpus research has contributed to revealing the extent of disciplinary variation in
academic writing both within and across disciplines (e.g. Hyland, 2004, 2006; Hyland & Tse,
2007). Advanced student writing in particular has been found to reflect to some extent the
differences in disciplinary discourses that students gradually come to recognise and use as they
progress through their studies (Staples et al., 2016; Gardner et al., 2018).
2.1.2.1 ‘Common core’ versus discipline-specific vocabulary
The fact that academic discourse can be distinguished from other types of discourse by the
prominent use of certain linguistic features (e.g. Biber, 2006) could suggest that there is a
common set of general academic skills or language features that cuts across disciplines.
Different disciplines will, however, vary in the extent to which they adhere to those features
and in how they use them (e.g. Hyland & Tse, 2007), especially considering the variety of
academic writing tasks and assignments that, for instance, university students are expected to
do in different disciplines (Nesi & Gardner, 2012). In addition, in terms of terminology or
vocabulary specifically, it is not clear whether there is a large enough ‘common core’
vocabulary which is characteristic of academic discourse and similarly used in all disciplines
(Hyland & Tse, 2007, p. 243).
Research on academic writing has underlined the distinction between technical or
specialised (discipline-specific) terminology and general academic vocabulary, that is, those
“items which are reasonably frequent in a wide range of academic genres but are relatively
uncommon in other kinds of texts” (Hyland & Tse, 2007, p. 235; see also Hyland, 2002). The
latter in particular has been the focus of lists of words specific to academic discourse, such as
the Academic Word List (Coxhead, 2000), the Academic Vocabulary List (Gardner & Davies,
2013), and the Academic Formulas List (Simpson-Vlach & Ellis, 2010), which attempt to
capture the most important vocabulary that university students should master (Nesi, 2016).
The usefulness of such vocabulary lists has been debated over time and the question
remains “whether it is useful for learners to possess a general academic vocabulary […] because
it may involve considerable learning effort with little return” (Hyland & Tse, 2007, p. 236).
Different positions have been taken on this point, with some authors arguing for the importance
of making lists of general academic vocabulary accessible to meet university students’ needs
(e.g. Simpson-Vlach & Ellis, 2010; Gardner & Davies, 2013), while others claim that many
words or phrases have different meanings or phraseological patterns in different disciplines and
that it is more important for students to be made aware of those distinctive uses in their own target
disciplines (e.g. Hyland & Tse, 2007; Hyland, 2008).
Hyland and Tse (2007), for instance, question “the assumption that a single inventory can
represent the vocabulary of academic discourse and be so valuable to all students irrespective
of their field of study” (p. 238) on the basis that their investigation shows that the extent
to which lexical items in the AWL are used across disciplines varies considerably. They also
highlight the differences in meaning and collocational environments that particular items (e.g.
volume, attribute) exhibit in different disciplines. Hyland (2008) further adds that disciplines
show different preferences in their use of lexical bundles (or highly frequent collocations), with
fewer than half of the top 50 identified bundles being common to the four disciplines under
investigation (Biology, Electrical Engineering, Applied Linguistics, and Business Studies). In
support of a discipline-specific approach to EAP teaching, Hyland concludes that these findings
undermine “the widely held assumption that there is a single core vocabulary needed for
academic study” (p. 20). In contrast, Simpson-Vlach and Ellis (2010, p. 509) identify a
“common core of academic formulas that do transcend disciplinary boundaries” (the Academic
Formulas List) by using a ‘formula teaching worth’ measure that combines frequency and MI
statistics in the ranking of formulas. While acknowledging the need for further research on
disciplinary variation, they argue that differences in operationalisation in Hyland’s (2008) study
could explain their contrasting results. Simpson-Vlach and Ellis (2010) maintain that the AFL
is an “empirically derived, pedagogically useful list” of frequently recurrent formulaic
sequences across academic genres and disciplines that occur significantly more in (spoken and
written) academic discourse than in non-academic discourse, and hence could be considered
representative of an academic style of discourse. Other studies (Shutz, 2013; Römer & Wulff,
2012) lend further support to the relevance of a general approach to teaching academic
vocabulary by demonstrating that there are a number of verbs (related to the research activity,
e.g. reporting information, describing data and results) and nouns (mainly metadiscoursal and
related to methodology) that are highly frequent in and common to several disciplines.
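The comparison reported by Hyland (2008) rests on identifying each discipline’s most frequent lexical bundles and checking how many are shared. The sketch below shows the general logic only. It uses four-word sequences, a length commonly used in bundle research, but the frequency and range thresholds, the naive whitespace tokenisation and the toy texts are hypothetical simplifications, not the settings of either Hyland (2008) or Simpson-Vlach and Ellis (2010).

```python
from collections import Counter


def ngrams(tokens, n=4):
    """All contiguous n-word sequences in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_bundles(texts, n=4, top=50, min_freq=2, min_texts=2):
    """Most frequent n-grams in one discipline's texts.

    Hypothetical thresholds: a bundle must occur at least min_freq times in
    total and in at least min_texts different texts (a simple range criterion).
    """
    freq = Counter()
    spread = Counter()
    for text in texts:
        grams = ngrams(text.lower().split(), n)  # naive whitespace tokenisation
        freq.update(grams)
        spread.update(set(grams))  # count each n-gram once per text
    ranked = [g for g, f in freq.most_common() if f >= min_freq and spread[g] >= min_texts]
    return set(ranked[:top])


def shared_bundles(bundles_by_discipline):
    """Bundles that appear in the top lists of every discipline."""
    sets = list(bundles_by_discipline.values())
    return set.intersection(*sets) if sets else set()


corpus = {
    "Biology": ["on the other hand the results show",
                "on the other hand the cells grow"],
    "Business": ["on the other hand the market fell",
                 "at the end of the day on the other hand"],
}
bundles = {d: top_bundles(texts) for d, texts in corpus.items()}
print(shared_bundles(bundles))  # {('on', 'the', 'other', 'hand')}
```

An overlap figure computed in this way is sensitive to the chosen thresholds and to how frequencies are normalised across corpora of different sizes, which is one reason why studies operationalising bundles differently can reach contrasting conclusions.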
The growing literature on general versus discipline-specific EAP instruction suggests that
the two approaches are not mutually exclusive; rather, a
combination would suit a larger number of students, depending on specific teaching contexts
and stages. In addition to the specific uses of vocabulary (both individual words and multiword
units) in different disciplines that students need to be exposed to and acquire, there is still an
important set of academic words or phrases that are shared across disciplines, which reflect the
activities and functions that are typical of ‘academic’ work.
2.2 Discourse Grammar: Cohesion in Written Discourse
A text can be defined as “the verbal record of a communicative act” (Brown & Yule, 1983,
p. 6), meaning that it is an instance of language in use rather than language as an abstract system
of meanings and grammatical relations. A stretch of language of any length can be identified as
a unit of meaning, or text, if it constitutes a unified whole, made up of certain resources that
create texture, within a specific communicative context (Halliday & Hasan, 1976). Cohesion is
what ties a text together, that is, the network of grammatical and lexical relations that connect
various parts of a text (Halliday & Hasan, 1976):
Cohesion occurs where the INTERPRETATION of some element in the
discourse is dependent on that of another. The one PRESUPPOSES the other, in
the sense that it cannot be effectively decoded except by recourse to it. (p. 4;
authors’ emphases).
These surface meaning relations enable the reader to understand and interpret the content of the
text. By analysing these relations or ties between dependent elements, the purpose and structure
of the text and its component parts can be more easily identified.
Cohesion also makes writing flow, or “mov[e] from one statement in a text to the next”
(Swales & Feak, 2012, p. 30), by creating and reinforcing connections at different structural
levels—sentence, paragraph and discourse. Within these, cohesion in writing can be established
through various language features, including information structure and thematic progression at
a macro-level, and lexicogrammatical cohesive devices, such as repetition, linking words, and
pronominal reference, at a more micro-level.
The notion of information structure typically concerns the development from ‘given’ (or
old) to ‘new’, which is the unmarked pattern of organisation of information in English. Given
information is that which is already known to the hearer/reader or otherwise “recoverable either
anaphorically or situationally” (Halliday, 1967, p. 211, as cited in Brown & Yule, 1983, p. 179),
whereas new information is not. In other words, given information represents shared knowledge
and provides a reference point to which new information can be related (Bloor & Bloor, 2004,
p. 66). This given-to-new pattern facilitates understanding of the information conveyed by the
speaker/writer and clearly indicates what they consider the most important information. The
early placement of given information (i.e. in the subject position) “establishes a content
connection backward and provides a forward content link that establishes the context” (Swales
& Feak, 2012, p. 31).
Information structure is closely related to the thematic structure established within a
clause by its constituents, ‘theme’ and ‘rheme’. In the context of writing, the former is what the
clause is about, or its “starting point”, and the latter what is said about the theme, that is, the
new element or piece of information being introduced (Paltridge, 2012, p. 129; Bloor & Bloor,
2004, p. 73). The theme then serves both as a point of orientation by connecting back to previous
stretches of text and as a point of departure by connecting forward and contributing to the
development of later stretches (Bloor & Bloor, 2004, p. 73). The relationship between theme
and rheme, for example the way a theme develops a topic introduced by a previous rheme, also
contributes to the texture of a text. Thematic progression, a “key way in which information flow
is created in a text” (Paltridge, 2012, p. 131; italics in original), allows the writer to maintain a
continuity of ideas and the reader to follow the ideas in a text.
Writing cohesively thus implies ensuring a smooth flow of information as well as a clear
connection between ideas and dependent elements within a text. The lexicogrammatical devices
that can be used to establish cohesive ties will be the focus of the following section.
2.2.1 Cohesive devices and reference patterns
In their classic model of cohesion in English, Halliday and Hasan (1976) identify five main
cohesive devices which contribute to the texture of a text: reference, substitution, ellipsis,
conjunction, and lexical cohesion. Of particular interest for this study, reference and ellipsis
require the reader to recover information at a given point in the text, either by retrieving it from
or by relating it to a relevant part of the text. While ellipsis involves retrieval of information that can
be presupposed, reference creates textual cohesion by linking elements (referents) that enable
recovery of information from the immediately preceding or subsequent context. More
specifically, reference can be defined as the relationship of identity which enables the reader to
trace entities or events in a text. It comprises a set of grammatical and discoursal resources that
allow the writer to refer back (anaphorically) to something that appeared before in the text or
forward to something that is yet to be introduced (Halliday & Hasan, 1976):
[…] the specific nature of the information that is signalled for retrieval. In the
case of reference the information to be retrieved is the referential meaning, the
identity of the particular thing or class of things that is being referred to; and the
cohesion lies in the continuity of reference, whereby the same thing enters into
the discourse a second time. (p. 31)
Maintaining continuity of reference, or chains of reference, enables the reader to interpret and
follow the flow of information as intended by the writer. As “sequences of noun phrases all
referring to the same thing […] in a relation of co-reference” (Biber, Johansson, Leech, Conrad
& Finegan, 1999, p. 234; emphasis in original), chains of reference contribute to a great extent
to text cohesion. Co-reference can be expressed through different linguistic forms.
Halliday and Hasan (1976) distinguish between three types of reference: personal,
demonstrative, and comparative. Personal reference involves using personal pronouns,
possessive determiners and possessive pronouns to “refer to something by specifying its
function or role in the speech situation” (p. 44). Through demonstrative reference, the referent
is identified by means of adverbial and nominal demonstratives in terms of location (in space
or time) and proximity. Comparative reference is indirect and includes two types of comparison,
general (likeness or unlikeness) and particular (quantity or quality), which are expressed by
adjectives and adverbs.
These types of reference can refer to the context of the situation (exophorically) or to
entities mentioned within a text (endophorically). Endophoric reference is established in two
main ways, namely through the use of anaphoric and cataphoric expressions (Brown & Yule,
1983, p. 192; Bloor & Bloor, 2004, p. 96). Anaphora refers to the pattern of reference by which
a word or phrase is used to refer back to someone or something (antecedent) that has already
been mentioned in the text. In contrast, cataphora is the process by which some
linguistic items refer forward to someone or something coming later in the text. In this study,
only anaphoric reference is of interest as textual cataphoric cohesion is much less common
(Halliday & Hasan, 1976, p. 68).
The demonstratives (this, these, that and those) are an important means of establishing
cohesive and referential relations in discourse, as will be discussed in more detail below.
2.2.2 Demonstratives as determiners or pronouns
Described as forms of “verbal pointing” (Halliday & Hasan, 1976, p. 57), the demonstratives
this, these, that and those constitute a category of deictics or deictic expressions, that of spatial
deixis. The term “deixis”, meaning “pointing” in ancient Greek, generally refers to pointing through
language, that is, to those linguistic forms (e.g. this and that, here and now) that are tied to the situational
or textual context shared by the speaker/hearer or writer/reader, which means that context is
essential for their identification and interpretation. As important cohesive resources, deictic
expressions serve not only to point to something but also to establish anaphoric or cataphoric
reference to a preceding or following part of the discourse. Even though deictic expressions are
typically associated with spoken discourse, in writing the text itself can provide the context that
is needed to interpret these “pointing” expressions. In fact, deictics such as this are common
not only in conversation but also in academic writing (Swales, 2005, p. 1; Biber et al.,
1999, p. 349; see also Biber, 2006, p. 15), where their referential meaning is determined by the
textual context in which they occur, in terms of relative proximity (this, these) or relative
remoteness (that, those).
In establishing anaphoric reference between a referring expression and an antecedent,
demonstratives can be used either independently as pronouns or ‘Heads’ (e.g. This means
that…) or dependently as determiners or ‘Modifiers’ preceding a head noun (e.g. This
explanation…). What Halliday and Hasan (1976) call the selective nominal demonstratives (this,
these, that and those) are distinguished in terms of number (singular versus plural) and
proximity (near versus distant), as shown in Table 1 below.
Table 1. Distinctive features of the demonstratives
Proximity | Singular | Plural
Near | this | these
Distant | that | those
According to Biber et al. (1999, p. 349), “proximity is insufficient to account for the
distribution of the demonstrative pronouns”, which varies by register: that is more common in
conversation, whereas this, these and those are more often employed in academic prose. The
high frequency of the demonstratives this and these as determiners and as pronouns in academic
writing can be explained by “their use in marking immediate textual reference” (Biber et al.,
1999, p. 349). A further distinction between demonstrative pronouns and determiners relates to the
anaphoric distance normally associated with each form: the shorter anaphoric distance
of demonstrative pronouns contrasts with the larger anaphoric distance of demonstrative
determiners. This in turn is related to the “relationship between explicitness and anaphoric
distance” (Biber et al., 1999, p. 240) that further distinguishes the demonstrative forms when
used pronominally as Heads or followed by a noun as Modifiers: the greater the distance
between the referring expression and its antecedent, the greater the possibility of creating
ambiguity. Halliday and Hasan also highlight the notion that singular demonstrative Heads (this
and that) are associated with extended text reference (cf. ‘situation reference’ in Petch-Tyson,
2000) in addition to referring anaphorically to a single referential point, but “in either case the
effect is cohesive” (Halliday & Hasan, 1976, p. 67).
This study focuses on the ‘near’ demonstratives this and these, which are characteristic
of academic writing (Biber, 2006, p. 15), and in particular on their distinctive anaphoric uses
as determiners or as pronouns in establishing connections between parts of a text. More
specifically, this and these are often used in given-to-new patterns to refer back to part or all
of the preceding sentence (or even several previous sentences), thus contributing to
textual cohesion. Phrases or clauses beginning with this/these followed by a noun (or ‘summary
word’) “summarize what has already been said and pick up where the previous sentence has
ended” (Swales & Feak, 2012, p. 43), as illustrated in example (1) below. Nesi and Gardner
(2012, p. 110) also draw attention to the summarising role of the anaphoric demonstrative this
followed by a lexical verb, such as This clearly shows that or This illustrates that, in essays
across disciplines in the BAWE corpus. The following examples illustrate how this can be used
to connect pieces of information and create text cohesion both as a determiner (1) and as a
pronoun (2).1
(1) When children reach school, they are required to have a metalinguistic awareness and
understanding of the language. This ability demands that they think about language, its
uses and the rules that govern it, in order to read and write. (LING_6045a)
(2) Without the vocabulary it has learnt during the acquisition of its first 50 words, a child
could not progress to this stage, and these words could not have been acquired without
the experimentation of sound production in the babbling phase. This shows that all stages
in the speech development process are important and should be viewed as equally
significant. (LING_6067b)
1 All examples are taken from the data used for this study and are identified by an abbreviation of the subcorpus (e.g. LING) followed by the Document ID number (e.g. 6045a) as referenced in the BAWE corpus documentation. Any typographical or grammatical errors in the original text were kept.
2.2.3 Attended and unattended this/these in academic writing
As previously noted, the anaphoric demonstratives this and these are important elements in
given-to-new information structuring of texts, offering a convenient way of cohesively “getting
out of a sentence and into another” (Swales, 2005, p. 6). They can, however, serve other
functions. As pronouns (or “unattended”), they can be an economical and effective means of
offering an explanation or conveying information in reference to preceding discourse of varying
lengths. When supported by a noun (or “attended”), they can pinpoint the focal point of a
proposition, while simultaneously allowing the writer to add clarity or interpretation of the
referent (Swales & Feak, 2012, pp. 43-48; Geisler et al., 1985, p. 151). These different functions
and the “long and unfinished story” (Swales, 2005, p. 4) of the attended and unattended forms
of the anaphoric demonstratives can be summarised in terms of a trade-off between economy
and clarity, as well as in terms of rhetorical opportunities (Geisler et al., 1985):
Out of control, the unattended this points everywhere and nowhere; under
control, it is the lanuage’s [sic] routine for creating a topic out of a central
prediction [sic], pointing to it, bringing it in to focus, and discussing it; all done
in one stroke, gracefully, economically, and without names. (p. 153)
While Swales (2005) argues that a “tacit sense of the tradeoff between economy and clarity […]
probably only comes with considerable writing experience” (p. 14), the choice between clear
and economical reference is particularly relevant for university students, who are expected to
learn how to write clear, unambiguous texts using an ‘academic’ style of discourse that is
typically compressed and inexplicit (Biber & Gray, 2010, p. 19).
Despite the potential effectiveness of both determiner and pronominal forms of the
demonstratives, academic style guides and textbooks often advise against or recommend the
careful use of unattended this (e.g. American Psychological Association, 2010; Glenn & Gray,
2013; Swales & Feak, 2012). The APA manual, for instance, refers to the pronominal forms
this, that, these, and those as “the most troublesome” and advises writers to “[e]liminate
ambiguity by writing, for example, this test, that trial, these participants, and those reports”
(2010, p. 68). In a section called “This and Summary Phrases”, Swales and Feak (2012) also
recommend as best practice that the demonstrative this be followed by a noun whenever “there
is a possibility your reader will not understand what this is referring to […] so that your meaning
is clear” (p. 43). They add, however, in a final commentary section that was not included in
previous editions of the textbook, that “there are occasions when ‘unattended’ this (no
following noun) is perfectly reasonable” (p. 48). This additional commentary, as noted by the
authors, is based on the findings of a corpus-based study on advanced student academic writing
(Wulff et al., 2012).
Recent research on published academic prose (Gray, 2010; Gray & Cortes, 2011; Swales,
2005) corroborates these findings by reporting on the common use of this and these as pronouns
in empirical research articles across academic disciplines. One possible explanation relates to
the requirement placed on academics during the editorial and revision process to submit shorter texts, and
a “simple way of doing this is to ‘de-attend’ instances of this” (Swales, 2005, p. 14). It could
also be argued that this increasing awareness and use of the pronominal demonstratives (and in
particular of this) in academic writing might be related to a shift in academic English towards
a less explicit expression of meaning that has been found in recent studies (e.g. Biber & Gray,
2010). In contrast to the (mostly implicit) reference to elements in the situational context in
spoken discourse where pronouns (including deictics) abound, academic written discourse is
often claimed to be “maximally explicit in meaning” (Biber & Gray, 2010, p. 11). In writing,
deictic pronouns are used for textual reference, pointing to a specific part of the text (or
proposition). Failure to use referring expressions that clearly identify their antecedents could
create ambiguity and cause confusion, particularly for non-experts (Biber & Gray, 2010).
The demonstratives this and these have also been associated with a perceived increase in
informality in research articles, although such informality is not as prevalent as generally thought (Hyland & Jiang,
2017; Biber & Gray, 2010, p. 17). From a list of ten features typically considered ‘informal’
and proscribed in academic writing, Hyland and Jiang (2017) found that unattended reference
was one of the three main features that influenced the overall results the most. Despite their
high frequency, anaphoric unattended pronouns (this, these, that, those, it) showed a declining
trend over time in the four disciplines chosen to represent the hard and soft sciences (Applied
Linguistics, Sociology, Engineering, and Biology). It is unclear, however, to what extent
each of the anaphoric pronouns declined (or increased). Adopting the same list of ten categories
of informality, Lee, Bychkovska and Maxwell (2019) compared the use of informal language
in argumentative essays by native English and non-native undergraduate students using data
extracted from the MICUSP and COLTE corpora of student writing.2 They found that both
groups frequently used features of informality, in particular anaphoric unattended pronouns,
which represented over 47% and 55% of all informal features in the native and non-native
student corpora, respectively. Mixed patterns were found, however, concerning the individual
pronouns preferred by each group, with native students using unattended this significantly more
than non-native students, while the remaining pronouns (these, that, those, it) occurred
significantly more frequently in the non-native corpus. Lee et al. (2019) argue that the native
students use “a broader range of informal features, particularly those that have become
relatively legitimized in academic writing such as […] unattended this” (p. 152). It could be
added that not only first language but also level of study/expertise is an important explanatory
factor, since the native students, as senior undergraduates, are presumably more exposed to and
more aware of research writing practices than non-native students from first-year writing
courses. Further studies would help “to determine whether it is the L1, writing experience,
reader (i.e., subject-matter or composition instructor), writing task, or a combination of these
factors that affect undergraduate students’ stylistic choices” (Lee et al., 2019, p. 152).
2 The Michigan Corpus of Upper-Level Student Papers (MICUSP) contains 830 A-graded, upper-level papers of different genres from 16 academic disciplines. The Corpus of Ohio Learner and Teacher English (COLTE) is a large collection of English as a second language (ESL) student writing and teacher-written feedback, compiled at Ohio University.
3 Material and Methodology
This chapter presents the material used for the study and describes the quantitative research
methods employed and the models of classification used in the qualitative analysis.
3.1 Data
The material for this study was extracted from the British Academic Written English (BAWE)
corpus, which comprises texts in a wide range of university genres (e.g. essays, research reports,
case studies) from four levels of study (first-year undergraduate to taught master’s level) across
30 main academic disciplines, totalling approximately 6.5 million words.3 The corpus includes
coursework assignments produced by native and non-native English speakers at four British
universities; all assignments were awarded ‘merit’ or ‘distinction’ grades, and hence the writing can be
considered to match the standards set by subject tutors (Nesi & Gardner, 2012, p. 6).
The documentation provided with the BAWE corpus includes a spreadsheet with
metadata about its content (e.g. discipline, grade, first language of the author). By using the
metadata filters, the corpus data was restricted to texts by first-year native English speaker
students from four disciplines representing each of the four broad disciplinary groups—Arts
and Humanities, Social Sciences, Life Sciences, and Physical Sciences.4 Within each group, the
discipline with the largest amount of first-year data available was selected:
Linguistics (LING), Law (LAW), Biological Sciences (BIO) and Engineering (ENG). The
composition of the four subcorpora that were created is shown in Table 2.
3 “The data in this study come from the British Academic Written English (BAWE) corpus, which was developed at the Universities of Warwick, Reading and Oxford Brookes under the directorship of Hilary Nesi and Sheena Gardner (formerly of the Centre for Applied Linguistics, Warwick), Paul Thompson (formerly of the Department of Applied Linguistics, Reading) and Paul Wickens (School of Education, Oxford Brookes), with funding from the ESRC (RES-000-23-0800).” (https://warwick.ac.uk/fac/soc/al/research/collections/bawe/how_to_cite_bawe). Additional information about the corpus is available at www.coventry.ac.uk/BAWE.
4 Two main groups can be distinguished corresponding to the general distinction between ‘hard’ and ‘soft’ sciences: the natural and physical sciences (Biological Sciences and Engineering) versus the humanities and social sciences (Linguistics and Law).
Table 2. Overview of the subcorpora
Disciplinary group | Discipline | Number of words | Number of texts | Average text length (words)
Arts and Humanities | Linguistics | 40,812 | 25 | 1,632
Social Sciences | Law | 56,630 | 26 | 2,178
Life Sciences | Biological Sciences | 72,492 | 42 | 1,726
Physical Sciences | Engineering | 58,857 | 36 | 1,635
Total | | 228,791 | 129 |
Note: No editing was made to the original texts; any footnotes or reference lists are included in the word count.
As can be seen, there is a certain imbalance between the four subcorpora, with the
Biological Sciences subcorpus being larger than the rest. In addition, genre could not be
controlled for due to the wide variation of the assignments and the unequal distribution of data
available across disciplines and genres (see Gardner & Nesi, 2013, for details on the
classification of texts in the BAWE corpus into 13 ‘genre families’ according to their purpose
and generic structure). Each subcorpus comprises different types of texts, but predominantly:
essay in Linguistics; essay and critique in Law; methodology recount and explanation in the
Biological Sciences; and methodology recount in Engineering—closely matching the general
distribution of these genres across disciplines and levels of study in the BAWE corpus (Nesi &
Gardner, 2012, pp. 51-52). This means that the results of the present study could be mediated
not only by discipline but also by genre. Despite these limitations, the selected data can be
considered representative of proficient native-English student writing in each of the selected
disciplines. The generalisability of the findings is, however, limited by the small size of the
data sets.
3.2 Method of Analysis
Corpus linguistics has been used by researchers from different fields as a methodology to study
language in real-life contexts through corpora of authentic texts and the use of computer
software (e.g. concordance tools). A corpus can be defined as “an electronically stored
collection of samples of naturally occurring language” (Hunston, 2006, p. 234) designed for a
particular purpose. Several tools and methods, such as word lists, keywords and lists of
collocates, can be used to explore corpora and investigate specific linguistic features or
recurrent phraseology (Nesi, 2016; Hunston, 2002). While predominantly quantitative in nature
(based on frequency data), corpus-based analyses are often combined with a close examination
of concordance lines that show the context in which words co-occur, allowing researchers to
test hypotheses and identify patterns of language use in large amounts of data that could go
unnoticed if relying only on intuition or introspection.
This study employed quantitative corpus-based methods to determine the frequencies of
occurrence and the extent of variation or similarity in the use of the demonstratives this and
these in the data sets detailed in the previous section. A further analysis of the context in which
this and these occur as a pronoun or as a determiner was carried out to determine the main types
of verbs and nouns that follow the demonstratives.
As a first step, each of the four corpora was searched for the terms this and these using
the concordance tool AntConc (Anthony, 2019) and the total number of occurrences per corpus
was recorded (see Section 4.1). The output of each search was saved as a text file and imported
into an Excel spreadsheet for subsequent analysis and annotation of the data.
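For readers who wish to replicate the search-and-export step without AntConc, it can be approximated in a few lines of code. The sketch below is a minimal illustration and does not describe how AntConc works internally. It assumes plain-text input, uses a case-insensitive whole-word regular expression for this and these, and returns key-word-in-context (KWIC) lines of the kind shown in a concordance view. The function names `concordance` and `count_by_form` are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Whole-word, case-insensitive match for the two demonstratives.
# \b stops words such as "thesis" or "thistle" from matching.
DEMONSTRATIVE = re.compile(r"\b(this|these)\b", re.IGNORECASE)


def concordance(text: str, width: int = 40) -> List[Tuple[str, str, str]]:
    """Return (left context, node word, right context) for every hit."""
    hits = []
    for match in DEMONSTRATIVE.finditer(text):
        start, end = match.span()
        # Flatten line breaks so each hit fits on one concordance line.
        left = text[max(0, start - width):start].replace("\n", " ")
        right = text[end:end + width].replace("\n", " ")
        hits.append((left, match.group(0), right))
    return hits


def count_by_form(text: str) -> Dict[str, int]:
    """Raw frequency of each form, folding case (This and this both count as this)."""
    counts = {"this": 0, "these": 0}
    for match in DEMONSTRATIVE.finditer(text):
        counts[match.group(0).lower()] += 1
    return counts


sample = ("This ability demands that they think about language. "
          "These words could not have been acquired. This shows that all stages matter.")
print(count_by_form(sample))  # {'this': 2, 'these': 1}
for left, node, right in concordance(sample, width=20):
    print(f"{left:>20} [{node}] {right}")
```

Each tuple could then be written to a tab-separated file with Python's csv module and imported into a spreadsheet for manual coding.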
Next, the context of each concordance line was analysed to distinguish and code each this
and these as a pronoun or a determiner, after which their frequencies (raw and normalised) and
percentages were calculated and their distribution across disciplines determined (see Sections
4.1 and 4.4). Some instances were excluded from further analysis, including (3) below, where
the demonstrative this was used for cataphoric reference. Additionally, concordance lines where
this or these were part of quoted text or linguistic examples, such as (4) and (5), were also
excluded.
(3) Alan Freeman in "Truth and Mystification in Legal Scholarship" says this: […]
(LAW_0086d)
(4) Denning effectively overruled this, disregarding the doctrine of precedent, and stating
that "beneficial interests in the matrimonial, or furniture, belongs to one or the other
absolutely, or it is clear that they intended to hold it in definite shares. The court will give
effect to these intentions;" striking an accommodating approach. (LAW_0209a)
(5) In a phrasal exchange we get I got into this guy with a discussion for I got into a discussion
with this guy. Perhaps the speaker meant to say I got into this discussion with a guy.
(LING_6010c)
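The raw-to-normalised conversion and the percentages mentioned above are simple arithmetic. A normalised frequency divides the raw count by the number of words in the corpus and multiplies the result by 1,000. The sketch below uses the combined corpus size from Table 2 and raw counts reported in Table 4. The function names are illustrative. The half-up rounding rule is an assumption about how "rounded to the nearest whole number" was applied; Python's built-in round() would instead round exact halves to the nearest even number.

```python
import math

TOTAL_WORDS = 228_791  # all four subcorpora combined (Table 2)


def per_thousand(raw: int, corpus_words: int) -> float:
    """Normalised frequency: occurrences per 1,000 words of running text."""
    return raw / corpus_words * 1000


def percentage(part: int, whole: int) -> int:
    """Percentage of `whole` made up by `part`, rounded half up to a whole number."""
    return math.floor(part / whole * 100 + 0.5)


# Normalised frequencies of this and these (cf. Table 4)
print(round(per_thousand(2046, TOTAL_WORDS), 2))  # 8.94
print(round(per_thousand(501, TOTAL_WORDS), 2))   # 2.19
# Share of sentence-initial instances (cf. Table 4)
print(percentage(870, 2046))  # 43
print(percentage(174, 501))   # 35
```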
The subsequent step was to identify and classify the verbs and nouns that follow this/these as
pronouns or determiners, respectively, according to the models of classification described in
the next section. Figure 1 illustrates the coding of this in a sample of the data.
Figure 1. Example of coding of a data sample from the Engineering corpus
In the case of verbs, these were classified only when the pronoun served as the subject of the
verb, which means that, for instance, are in example (6) below was coded (verb: be; type of verb:
primary), whereas are in (7) was not analysed further since its subject is the noun implications.
(6) However, this patient speaks fluently with quite long utterances and few pauses. Again,
these are typical signs of Wernicke's aphasia (LING_6067f)
(7) It makes sense that this could work vice versa with words altering the meaning of pictures.
The implications of this are huge. Moving away from the sphere of music videos and the
potential of multi-modal texts is huge and important (LING_6018a)
With regard to nouns, the frequency counts and classification of types apply to head nouns only,
where the demonstrative was part of a noun phrase with pre- or post-modification, except for
compound nouns. For instance, in examples (8) and (9) below, only the head nouns module and
dilution were counted and coded. In the case of coordinated noun phrases with two head nouns,
only the first one was recorded. In addition, nouns that were misspelled were excluded from
annotation.
(8) If there was a fire in the house then this module will receive a high value in the heat
section and so call for the fire brigade. This software module would have different levels
of action. (ENG_0228f)
(9) This end point dilution gives the HA titre of the virus, which was 6400 HA units/ml and
there was 2.88 x 10¹⁰ virus particles/ml. (BIO_0006b)
To make the coding of the noun types more manageable, for this part of the analysis only a
subset of the data was used, namely those instances of this/these in sentence-initial position,
which constitute approximately 40% of the total amount of data (see Table 4 in Section 4.1).
The specific challenges related to the classification of noun types will be further discussed in
the following section.
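Restricting the noun analysis to sentence-initial instances requires some way of deciding where a sentence starts. The following rough sketch flags this position automatically under two assumptions. First, a sentence boundary is taken to be a full stop, question mark or exclamation mark followed by whitespace. Second, "sentence-initial" is taken to mean that no word (only opening quotes or brackets) precedes the demonstrative in its sentence. A naive splitter of this kind mis-splits at abbreviations such as e.g. or et al., so its output would still need manual checking. The function name `positions` is hypothetical.

```python
import re
from typing import List, Tuple

# Naive sentence boundary: ., ! or ? followed by whitespace (an assumption).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
DEMONSTRATIVE = re.compile(r"\b(this|these)\b", re.IGNORECASE)
# Characters allowed before a sentence-initial demonstrative (space, quotes, bracket).
OPENERS = " \"'(\u201c\u2018"


def positions(text: str) -> List[Tuple[str, bool]]:
    """Return (form, is_sentence_initial) for each this/these in the text."""
    results = []
    for sentence in SENTENCE_END.split(text.strip()):
        for match in DEMONSTRATIVE.finditer(sentence):
            preceding = sentence[:match.start()].strip(OPENERS)
            results.append((match.group(0).lower(), preceding == ""))
    return results


sample = ("The implications of this are huge. "
          "This shows that all stages are important. These words matter.")
flags = positions(sample)
print(flags)
share = sum(initial for _, initial in flags) / len(flags)
print(f"{share:.0%} sentence-initial")
```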
Finally, after the data coding was completed, the frequencies of the nouns and verbs that
follow this/these, and of their respective types, were calculated for each corpus and for the data as a whole, after
which their usage across disciplines was compared (see Sections 4.2, 4.3 and 4.5).
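The final tallying step can be illustrated with a small data structure standing in for one row of the annotation spreadsheet. The sketch assumes a hypothetical record layout (corpus, form, use, following word, category); the actual columns of the spreadsheet shown in Figure 1 may differ. Counting per subcorpus and for the data as a whole then amounts to grouping by corpus and tallying either the following words or their types. The two LING rows are modelled on examples (2) and (6). The ENG rows are invented for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class CodedLine:
    """One annotated concordance line (hypothetical spreadsheet layout)."""
    corpus: str               # subcorpus code, e.g. "LING"
    form: str                 # "this" or "these"
    use: str                  # "determiner" or "pronoun"
    following: Optional[str]  # head noun or first verb; None if not coded
    category: Optional[str]   # noun type or verb type; None if not coded


def tally(lines: Iterable[CodedLine], use: str,
          key: str = "category") -> Dict[str, Counter]:
    """Count values of `key` per subcorpus and overall ("ALL") for one use."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for line in lines:
        value = getattr(line, key)
        if line.use != use or value is None:
            continue  # different use, or excluded from coding
        counts[line.corpus][value] += 1
        counts["ALL"][value] += 1
    return counts


rows = [
    CodedLine("LING", "this", "pronoun", "shows", "lexical"),  # cf. example (2)
    CodedLine("LING", "these", "pronoun", "are", "primary"),   # cf. example (6)
    CodedLine("ENG", "this", "pronoun", "is", "primary"),      # hypothetical row
    CodedLine("ENG", "this", "determiner", "module", None),    # noun type left uncoded here
]
verb_types = tally(rows, "pronoun")
print(verb_types["ALL"].most_common())
print(tally(rows, "pronoun", key="following")["ENG"])
```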
3.2.1 Models of classification
For pronominal uses of this/these, the first verb following the demonstrative was classified as
lexical (e.g. give, explain), primary (be, have, and do), or modal (e.g. may, will), according to
the three main verb classes in the Longman Grammar of Spoken and Written English (Biber et
al., 1999, p. 358). This framework was employed to distinguish between the different types of
verbs found in the data and to determine which types are most used with unattended instances
of this/these overall and whether any disciplinary preferences can be found.
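Biber et al.'s primary and modal verb classes are small, closed sets. The coding of the first verb can therefore be sketched as a lookup: any verb not listed as primary or modal falls into the lexical class. The word-form lists below are an assumption for illustration. They cover only the central modals and the inflected forms of be, have and do, so a semi-modal such as have to would be coded as primary. The function `verb_class` is hypothetical and presupposes that the coder has already identified the first verb.

```python
# Primary verbs: the forms of be, have and do (assumed list of word forms).
PRIMARY = {
    "be", "am", "is", "are", "was", "were", "been", "being",
    "have", "has", "had", "having",
    "do", "does", "did", "done", "doing",
}
# Central modal verbs only; semi-modals such as "have to" are not covered.
MODAL = {"can", "could", "may", "might", "must", "shall", "should", "will", "would"}


def verb_class(first_verb: str) -> str:
    """Classify the first verb after a pronominal this/these as modal, primary or lexical."""
    word = first_verb.lower()
    if word in MODAL:
        return "modal"
    if word in PRIMARY:
        return "primary"
    return "lexical"  # any other verb is treated as lexical


for verb in ["shows", "are", "may", "illustrates", "has"]:
    print(verb, verb_class(verb))
```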
The noun taxonomy in Gray’s (2010) study was initially adopted to classify the nouns
modified by this/these as concrete, shell, abstract, and species/quantifier. To these, a category
used by Gray and Cortes (2011), deictic nouns, was later added to account for the prevalence
in the analysed data of nouns referring to the text itself or to parts of it (e.g. this essay, this
paper, this graph), as could be expected in university assignments where students often refer
to their own writing. In contrast, species/quantifier nouns were too infrequent in the data (less
than half a percent of the total occurrences) to represent a separate category and were thus
grouped under “other abstract nouns”. Table 3 below shows the four noun types adopted in this
study (concrete, deictic, shell, other abstract nouns), based on a combination of the categories
used by Gray (2010) and Gray and Cortes (2011).
Table 3. Definitions and examples of noun types, adapted from Gray (2010) and Gray and Cortes (2011)
Noun type | Definition | Examples
Concrete nouns | Nouns that represent physical entities or objects that can be touched, heard or seen. | e.g. apparatus, card, kit, student, specimen
Deictic nouns | Nouns that orient the reader by pointing to the overall text or a specific part of it, or to extralinguistic elements. | e.g. study, article, figure, section
Shell nouns | Abstract nouns that summarise or encompass preceding information and carry it into the next clause or sentence. | e.g. method, result, model, finding, issue, analysis
Other abstract nouns | Nouns that refer to concepts or ideas that cannot be measured or observed. | e.g. requirement, value, level, range, type, kind
The above definitions of each category are useful to a certain extent; however, it became
clear early on in the coding that there is no straightforward one-to-one correspondence between
form and meaning (see examples (16) to (18) in Section 4.5). Shell nouns in particular (also
referred to in the literature as ‘general nouns’, ‘anaphoric nouns’, ‘carrier nouns’ or ‘signalling
nouns’) could only be coded as such if they performed the function of “encapsulation of
meanings expressed in prior discourse” (Gray & Cortes, 2011, p. 34). Distinguishing between
shell and other abstract nouns thus proved to be the most time-consuming and complex part of
the classification, which mostly rested on the interpretation of the specific meaning of each
noun based on its use in context. This contrasts with Gray’s operationalisation of shell nouns
as those that had been identified in previous research, which could have had some influence on
the results reported (2010, p. 181). Given the overlap between these noun types and the degree
of subjectivity involved in their analysis, it remains to be determined whether such a distinction
can be objectively and effectively operationalised.
4 Results and Discussion
This chapter presents the main findings of this study as follows: in Section 4.1 the overall
frequencies of this/these as determiners and pronouns are analysed; the nouns, verbs, and
respective types that follow the demonstratives are then identified in Sections 4.2 and 4.3;
finally, Sections 4.4 and 4.5 compare the distribution across academic disciplines of pronominal
and determiner uses of this/these, as well as of the most frequent nouns, verbs and types of
nouns and verbs.
4.1 Overall Frequencies of Attended and Unattended this/these
A total of 2,547 instances of this (2,046) and these (501) were found in the corpora of student
writing analysed in this study, as can be seen in Table 4 below. The singular form is much more
frequent, as might be expected due to the predominant reference to abstract entities and
concepts in academic prose, occurring on average nearly nine times for every thousand words
in the data. This contrasts with the average of six times per thousand words reported for
published research articles (Swales, 2005, p. 1). The number of texts further reinforces this
difference between the singular and plural forms: this is present at least once in each of the texts
that comprise the corpus, whereas these occurs in only 116 of the 129 texts. In contrast, the two forms
are similar in terms of the position they take in the sentence, with both this and these occurring
more often in a medial or final position in the sentence than sentence-initially.
Table 4. Number of occurrences of this and these (percentages rounded to the nearest whole number)
Demonstrative | Raw frequency | Frequency per 1,000 words | Number of texts | Sentence-initial (%)
this | 2,046 | 8.94 | 129/129 | 870 (43)
these | 501 | 2.19 | 116/129 | 174 (35)
Total | 2,547 | | | 1,044 (41)
A further distinction between this and these is found in their use as determiners or pronouns.
The figures in Table 5 show that this is employed fairly evenly as determiner and as pronoun,
which contrasts with the overwhelming use of these followed by a noun in the data.
Table 5. Frequencies of this/these as determiners and pronouns (percentages rounded to the nearest whole number)
Demonstrative | Determiner | % | Pronoun | %
this | 1,063 | 52 | 970 | 47
these | 390 | 78 | 108 | 21
Total | 1,453 | | 1,078 |
As in other studies of student academic writing (Wulff et al., 2012; Petch-Tyson, 2000; Römer & Wulff, 2010), these results show that first-year undergraduates do not observe the prescriptive guidelines that generally advise against leaving a demonstrative unattended. They also indicate that student writers use pronominal this more frequently than research article writers do (e.g. Gray, 2010; Gray & Cortes, 2011). On the other hand, the results for this differ somewhat from those reported by Wulff et al. (2012) and Petch-Tyson (2000), although no direct comparison can be made owing to differences in operationalisation. Wulff et al. (2012) identified more cases of attended than unattended this (57% versus 43%) in papers by advanced-level students (MICUSP corpus), but they counted only instances in sentence-initial position rather than in all positions in the sentence. A much higher percentage of demonstrative determiners (64%) than of pronouns (36%) was also reported by Petch-Tyson (2000), who compared the use of both this/these and that/those by American university students (a subset of the LOCNESS corpus) and English as a foreign language (EFL) students.5
5 The Louvain Corpus of Native English Essays (LOCNESS) comprises general argumentative essays written by American and British students.
It could be argued that the educational context (North American in the case of the above-mentioned studies, British in the case of the material used for the present study) might play a role in the degree to which novice student writers use this/these as determiners or pronouns. Greater exposure in North American contexts to style manuals, writing handbooks and textbooks (e.g. American Psychological Association, 2010; Glenn & Gray, 2013; Swales & Feak, 2012) that offer explicit guidance on what constitutes ‘good’ academic writing could then provide at least part of the explanation for the differences in the frequencies of attended versus unattended this/these observed in this study. The high percentage of unattended this could also result from the more limited range of academic vocabulary available to first-year undergraduates compared with more advanced students, which could lead first-year students to resort to somewhat less complex structures when constructing their arguments. This possible explanation might be further supported by an analysis of the most frequent nominal and verbal structures following this and these, which is reported in the next sections.
4.2 Most Frequent Nouns and Noun Types Following this/these
Overall, 1,423 head noun occurrences (counted by lemma) were identified in the data (the complete list is provided in Appendix 1), with a substantial number of lemmas (272) occurring only once.6 Table 6 shows the most common nouns (ten or more instances) following this and these combined, which account for approximately one third of the total number of occurrences. Most are methodology- or results-related nouns (e.g. method, process, result) or nouns that relate to student activity or type of task (e.g. essay, report), but nouns with general meaning (e.g. case, point, way) are also found, which take on specific meanings in context. The latter include nouns that occur in prepositional phrases with anaphoric function, such as “in this case” and “in this way”, which can be considered semi-fixed expressions and occur relatively frequently in the data (for example, there are 36 instances of in this case). It is also interesting to note that only seven nouns occur 20 or more times: case, experiment, essay, point, way, theory and value. The composition of this short list reflects the main uses of the demonstratives: to refer not only to elements of the real world, but also to discourse-level elements.
6 The results reported on throughout Section 4 refer to lemmas for both nouns and verbs.
These findings are generally consistent with those of other studies on (un)attended this
both in student and professional academic writing. Just over a third of the nouns in Table 6
appear in the top 25 head nouns attending this in MICUSP papers, as reported by Römer and
Wulff (2010). A number of those are also part of the top 50 attendant nouns that Swales (2005)
found to be most frequent in a corpus of research articles representing ten different disciplines.
Both studies also point out the prominence in their data of metadiscoursal nouns, as well as
nouns referring to methodology, which seem to be not only frequent but also common to several
disciplines.
Table 6. The most frequent nouns following this/these (cut-off frequency of 10)
Noun | Frequency
case | 56
experiment | 47
essay | 37
point | 29
way | 25
theory | 24
value | 20
method | 17
model | 17
process | 17
laboratory | 16
result | 16
data | 14
investigation | 14
reaction | 14
report | 14
approach | 13
organism | 13
stage | 13
area | 12
idea | 12
question | 12
time | 12
gene | 11
offence | 11
type | 11
feature | 10
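The claims that the nouns in Table 6 cover about a third of all occurrences and that only seven of them reach 20 instances can be checked by summing the table. The sketch assumes that the 1,423 figure in Section 4.2 counts head-noun occurrences (tokens) rather than distinct lemmas, which is the reading under which the one-third claim can be computed; the constant names are illustrative.

```python
# Head nouns with ten or more occurrences after this/these (Table 6)
TABLE6 = {
    "case": 56, "experiment": 47, "essay": 37, "point": 29, "way": 25,
    "theory": 24, "value": 20, "method": 17, "model": 17, "process": 17,
    "laboratory": 16, "result": 16, "data": 14, "investigation": 14,
    "reaction": 14, "report": 14, "approach": 13, "organism": 13,
    "stage": 13, "area": 12, "idea": 12, "question": 12, "time": 12,
    "gene": 11, "offence": 11, "type": 11, "feature": 10,
}
TOTAL_OCCURRENCES = 1423  # assumed: head-noun occurrences reported in Section 4.2

covered = sum(TABLE6.values())
share = covered / TOTAL_OCCURRENCES
frequent = [noun for noun, freq in TABLE6.items() if freq >= 20]

print(covered, f"{share:.1%}")  # 507 35.6%
print(frequent)  # ['case', 'experiment', 'essay', 'point', 'way', 'theory', 'value']
```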
In line with the disparate results for this and these reported in Section 4.1, differences can also be observed in how the two forms pattern with individual nouns. As can be seen in Table 7, there is almost no overlap between the ten most frequent noun lemmas that co-occur with the singular and plural forms: only experiment (underlined) appears in both lists, albeit with different frequencies. In addition, the nouns that follow this are predominantly abstract and coincide to a large extent with the top half of the list of most common nouns overall shown in Table 6 above, whereas the list for these includes several concrete nouns (e.g. cell, gene, organism, people, bacterium), mostly nouns specific to the Biological Sciences corpus, which could be partly due to the larger size of that corpus. Relatedly, the predominance of methodology recounts in the data (particularly in the hard disciplines) may explain why result is the most frequent noun lemma attending these, considering that the main identified purpose of those assignments is to “demonstrate/develop familiarity with […] conventions for recording experimental findings” (Gardner & Nesi, 2013, p. 39).
Table 7. The most frequent noun lemmas following this versus these
Noun (this) | Frequency | Noun (these) | Frequency
case | 53 | result | 14
experiment | 43 | value | 11
essay | 37 | model | 9
point | 27 | cell | 8
way | 25 | gene | 7
theory | 21 | variable | 7
laboratory | 16 | error | 6
method | 16 | organism | 5
process | 16 | people | 5
investigation | 14 | bacterium | 4
report | 14 | difference | 4
 | | experiment | 4
 | | feature | 4
 | | number | 4
 | | question | 4
 | | stage | 4
 | | task | 4
 | | word | 4
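The observation that the two columns of Table 7 barely overlap can be made concrete with a set intersection, and the Jaccard index (intersection size divided by union size) condenses it into a single overlap score. The Jaccard index is an illustrative measure added here, not one used in the thesis; the lists include the tied nouns shown in the table.

```python
# Noun lemmas in the two columns of Table 7 (ties at the cut-off included)
THIS_TOP = {"case", "experiment", "essay", "point", "way", "theory",
            "laboratory", "method", "process", "investigation", "report"}
THESE_TOP = {"result", "value", "model", "cell", "gene", "variable", "error",
             "organism", "people", "bacterium", "difference", "experiment",
             "feature", "number", "question", "stage", "task", "word"}

shared = THIS_TOP & THESE_TOP
# Jaccard index: size of the intersection divided by size of the union
jaccard = len(shared) / len(THIS_TOP | THESE_TOP)

print(sorted(shared))    # ['experiment']
print(f"{jaccard:.3f}")  # 0.036
```

A score this close to zero (1 shared lemma out of 28 distinct ones) is consistent with the claim that the singular and plural forms select largely different nouns.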
As indicated in Section 3.2, a subset of the data consisting of sentence-initial this/these
occurrences was used for the analysis of noun types. A total of 428 head nouns were classified
as (i) concrete, (ii) deictic, (iii) shell or (iv) other abstract nouns.7 The complete list of head
nouns ordered by type is provided in Appendix 2. As can be seen in Figure 2 below, while
deictic (e.g. essay, table, graph) and concrete nouns (e.g. plant, organism, device) constitute
approximately a third of the data, the majority of nouns fall into the other two categories: shell
nouns (38%) and other abstract nouns (31%).
The predominance of shell nouns (e.g. evidence, idea, process, approach) and other abstract nouns (e.g. energy, evolution, density, equity) indicates that most occurrences of sentence-initial this and these modify nouns that represent abstract entities and concepts frequently employed in academic writing. These this/these + noun combinations are cohesive demonstrative structures used to point back to previously mentioned information or referents, either by repeating the same noun or by using a more general noun that encapsulates the preceding text.
7 Refer to Section 3.2 for a complete description of how the selection of head nouns was made; for example, specialised terminology such as “word spurt”, “gene transfer” and “mens rea” was excluded.
Figure 2. Percentage of types of nouns overall (percentages rounded to the nearest whole number)
[Figure 2: shell 38%, other abstract 31%, deictic 17%, concrete 15%]
The fact that deictic nouns are slightly more common in the data than concrete nouns could be explained by the type of texts included in the corpora, which mostly comprise essays and methodology reports; the recurrent use of nouns related to the assignment itself is therefore perhaps not surprising. Further investigation using other types of texts could help clarify the role of genre as a variable in the use of anaphoric demonstratives.
Figure 3 displays the distribution of the noun types that follow this and these separately. The patterns here are somewhat mixed when compared with the overall percentages in Figure 2. While the trend line for this closely reflects the overall distribution, the use of other abstract nouns and concrete nouns with these is more salient, in line with the most common noun lemmas attending each form listed in Table 7.
Figure 3. Percentage of types of nouns following this and these separately (percentages rounded to the nearest whole number)
[Figure 3: this: shell 41%, other abstract 28%, deictic 19%, concrete 12%; these: shell 29%, other abstract 38%, deictic 11%, concrete 23%]
Reporting on expert writers’ use of this/these in Applied Linguistics and Engineering, Gray and Cortes (2011) identified a similar distribution of noun types (decreasing from shell to concrete nouns) but with different proportions; however, their results for this and these are combined and based on discipline, and also include a third abstract noun category. Similarly, although no direct comparison can be made, Gray (2010) found shell and abstract nouns to be predominant in her corpora of research articles in two disciplines (Education and Sociology), with concrete nouns coming last. The findings of both studies and those of the present study reflect the pervasive use in academic writing of abstract nouns, which contribute to the complexity and formality typically associated with an academic style of discourse (Biber & Gray, 2010). They further suggest that novice students are generally meeting expectations regarding the types of nouns characteristic of academic English.
4.3 Most Frequent Verbs and Verb Types Following this/these
Overall, 111 different verbs were identified in the data, the majority (67) occurring only once
and 14 with more than ten occurrences. The complete list of verbs is provided in Appendix 3.
As can be seen in Table 8, the verb lemma BE (i.e. including all its forms) most frequently
follows this and these in the student corpora, accounting for 373 instances of the total 865 that
were classified for verb type.8 The fact that copula be is one of the “specific verb categories that
are typical of academic prose” (Biber, 2006, p. 14) provides one explanation for the prominence
of BE in the data.
Table 8. The most frequent verbs following this/these
Verb | Frequency
BE | 373
would | 48
can | 39
MEAN | 38
may | 30
HAVE | 25
could | 21
DO | 21
SUGGEST | 20
SHOW | 16
will | 16
Note: Small caps identify lemmas; primary verbs are in bold, lexical verbs are shown underlined.
8 Refer to Section 3.2 for a complete description of how the selection of verbs was made.
From the analysis of the most common verbs in the data, some trends appear to emerge:
the striking frequency of the verb BE in all its forms; a wide variety of lexical verbs, with only
a small number of them occurring often; and a few recurrent patterns, such as this is because
(see example 15) and this is due to (see example 12). These trends seem to indicate that students
consider the pronominal this + BE (especially in sentence-initial position, which accounts for
118 out of 334 instances) a ‘prefabricated’, fixed structure that conveniently allows them to
explain or give a reason for a previous point without resorting to more specific verb or noun
choices with potentially added shades of meaning. In addition, the most frequent lexical verbs
(MEAN, SUGGEST and SHOW) all have in common that they can be readily used to explain or
interpret findings (e.g. this means that, this suggests that, this shows that), as part of
constructing an argument, as can be seen in examples (10) to (12) below.
These findings are aligned with those of Wulff et al.’s (2012) investigation in that they
also found MEAN and SUGGEST to be predominantly used in clusters with unattended this in the
sort of interpretive or explanatory activity that is common in student writing. Wulff et al. further
indicate that their “results point towards an ongoing delexicalization of this + verb clusters like
this is and this means into textual organization markers” (p. 129). Interestingly, formulaic
sequences containing this is also feature prominently in Simpson-Vlach and Ellis’ (2010) list
of core formulas (AFL) in academic speech and writing.
(10) The method followed is as described in the laboratory manual. In estimating the
concentration of the prepared ovalbumin 0.01ml of the preparation was taken, rather than
the suggested 0.1ml. This was because at the suggested concentration, the absorbance at
280nm was off the scale of the machine. (BIO_0032c)
(11) Bacterial DNA can be exchanged between different strains allowing new allele
combinations to be produced. This means that a 'wild type' gene can be passed to a strain
with a mutation in this gene and through recombination the gene activity can be restored.
(BIO_0007a)
(12) The production of polyclonal antibodies usually requires a large amount of pure protein,
whereas monoclonal antibodies require smaller amounts of impure protein. This is due
to the fact that polyclonal antibodies are produced by many different B lymphocytes, and
polyclonal preparations contain a mixture of antibodies that recognise different parts of
the protein. This means that more amounts of pure protein will be needed in order to
segregate the specific polyclonal antibody needed. (BIO_6119b)
A closer look at the concordance lines also reveals that pronominal this appears to be used
more frequently to refer anaphorically to longer antecedents, rather than single noun phrases,
as reported by other studies (e.g. Gray, 2010; Gray & Cortes, 2011; Wulff et al., 2012). This is
especially salient in the case of sentence-initial this, which often introduces paragraph-ending
sentences that summarise, explain, or give a reason for information presented in previous
sentences, such as evidence from experiments or descriptions of procedures (cf. Nesi &
Gardner, 2012, p. 110 on the role of this in the development of arguments). Wulff et al. (2012)
also found that this is clusters occur “relatively much more often in the middle and final sections
of paragraphs and texts” (p. 145).
Example (13) shows this is why being used to present what was described in the preceding four sentences of the paragraph as the reason for the point that follows, while (14) illustrates the use of the this shows that pattern to relate evidence to a claim. Uses of this as both determiner and pronoun can be seen in a single extract from Law in (15): while the noun phrase this definition refers anaphorically to a delimited antecedent (a quotation), this is because can be interpreted as referring to the entire previous sentence.
(13) Looking at the two main choices, aluminium and copper, both create a stable oxide on
their surface once exposed to the elements. The oxides are stable at constant temperature.
However, copper oxides grow if the copper is heated. If the copper is heated for a lengthy
amount of time, then eventually the metal will harden and break. This is why it is
especially important to make sure the copper has a good connection, therefore higher
tolerances in the joining processes need to be observed. (ENG_0023a)
(14) Different kinds of units such as features, segments, morphemes, words, and even
phrases are involved in speech errors where they might be exchanged, anticipated and
produced earlier than intended, or persevered and repeated. This shows that we construct
syntactic phrases and sentences in a buffer memory, where error may occur, before
articulating them instead of retrieving one word from the mental lexicon and say it one at
a time without involving any planning (Fromkin, Rodman & Hyams, 2003; Garman,
1990). (LING_6055a)
(15) Patteson J, in Thomas v. Thomas stated: “...consideration means something which is of
some value in the eye of the law.” This definition was used as binding precedent for
many years, and it can be said that Williams does not contradict this, but does in fact
agree with it. This is because now, the 'eye of the law' believes the practical, not the legal
benefit or detriment to be of value, and this is therefore an argument against Williams
creating ambiguity. (LAW_0143b)
The distribution of the most common verbs presented in Table 8 above is reflected to some extent in the overall frequency of the three verb types used in the classification. Figure 4 shows the percentage of each type of verb following this/these in the data. Primary verbs (BE, HAVE and DO) account for nearly half of the occurrences (419), as could be expected given their functions as both main and auxiliary verbs; of these, BE is by far the most frequent, representing 89%. The second most common type comprises a large variety of lexical verbs (289), most of which are used infrequently. While modal auxiliary verbs account for only 18% (157) of the total, five of them appear among the ten most frequently used verbs shown in Table 8.
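The percentages in Figure 4 and the 89% share of BE among primary verbs follow directly from the counts given in the paragraph above. A minimal sketch of that arithmetic, with illustrative constant names:

```python
# Verb-type counts reported in Section 4.3 (Figure 4)
VERB_TYPES = {"primary": 419, "modal": 157, "lexical": 289}
BE_COUNT = 373  # occurrences of the lemma BE (Table 8)

total = sum(VERB_TYPES.values())  # all verbs classified for type
percentages = {vtype: round(100 * n / total) for vtype, n in VERB_TYPES.items()}
be_share_of_primary = round(100 * BE_COUNT / VERB_TYPES["primary"])

print(total)                # 865
print(percentages)          # {'primary': 48, 'modal': 18, 'lexical': 33}
print(be_share_of_primary)  # 89
```

The three type counts add up to the 865 verbs classified for type, so the rounded percentages match those shown in Figure 4.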
Figure 4. Percentage of types of verbs (percentages rounded to the nearest whole number)
[Figure 4: primary 48%, modal 18%, lexical 33%]
4.4 Distribution of Attended and Unattended this/these across Disciplines
Table 9 shows the distribution of the demonstratives across the four disciplines represented in the data. Neither the individual nor the combined figures for this and these indicate a clear distinction between the so-called ‘hard’ sciences (Biological Sciences and Engineering) and ‘soft’ sciences (Law and Linguistics) in terms of frequency of occurrence.9 There is, however, some disciplinary variation in the preference for anaphoric demonstrative structures, which is more evident when individual disciplines are considered. While Engineering and Biological Sciences show the highest combined frequencies per thousand words, they are followed very closely by Linguistics. The most pronounced discrepancy is found between these three disciplines and Law, which could suggest that Law students favour the demonstratives this and these less and may be using other cohesive devices to establish anaphoric reference. Looking at this and these separately, the same pattern is repeated for the plural form, with a smaller difference between Engineering and Biological Sciences, whereas this shows a slightly different pattern: Linguistics and Biological Sciences swap positions, with Linguistics showing the second-highest normalised frequency, and the discrepancy between Law and the other disciplines is less marked.
9 Refer to Table 2 in Section 3.1 for details on the selected hard and soft disciplines according to their grouping in the BAWE corpus.
The figures in Table 9 indicate a highly significant association (at the p < 0.0001 level) between discipline and the relative use of this versus these, although the effect size is small (Cramér’s V = 0.0943).10
Table 9. Raw and normalised frequencies (per 1,000 words) of this and these per discipline
Discipline | this(a) Raw | this(a) Normalised | these(a) Raw | these(a) Normalised | this+these Raw | this+these Normalised
Linguistics | 360 | 8.82 | 96 | 2.35 | 456 | 11.17
Law | 402 | 7.10 | 56 | 0.99 | 458 | 8.09
Biological Sciences | 631 | 8.70 | 189 | 2.61 | 820 | 11.31
Engineering | 640 | 10.87 | 157 | 2.67 | 797 | 13.54
Total | 2,033 | | 498 | | 2,531 |
a Chi-squared = 22.53 (df = 3); p-value = 0.00005063138 (p < 0.0001); effect size measure: Cramér’s V = 0.0943
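The test reported in Table 9 was run with the UCREL Significance Test System. As an independent check, the sketch below recomputes Pearson’s chi-squared (without continuity correction) and Cramér’s V from the 2 × 4 table of raw this/these counts by discipline. Assumptions: the function names are hypothetical, and the p-value uses the closed-form upper tail of the chi-squared distribution that holds only for 3 degrees of freedom. Run as written, it reproduces the reported chi-squared of 22.53 (df = 3) and V of 0.0943, which indicates that the test compares the proportions of this versus these across the disciplines.

```python
import math

# Raw counts from Table 9; columns: Linguistics, Law, Biological Sciences, Engineering
OBSERVED = [
    [360, 402, 631, 640],  # this
    [96, 56, 189, 157],    # these
]


def chi_squared(table):
    """Pearson chi-squared statistic (no continuity correction) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, n


def cramers_v(stat, n, n_rows, n_cols):
    """Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    return math.sqrt(stat / (n * (min(n_rows, n_cols) - 1)))


def p_value_df3(stat):
    """Upper-tail chi-squared probability; this closed form is valid only for df = 3."""
    half = stat / 2
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * stat / math.pi) * math.exp(-half)


stat, df, n = chi_squared(OBSERVED)
if df != 3:
    raise ValueError("p_value_df3 only applies to tables with 3 degrees of freedom")
v = cramers_v(stat, n, len(OBSERVED), len(OBSERVED[0]))
p = p_value_df3(stat)
print(f"chi-squared = {stat:.2f} (df = {df}), p = {p:.2e}, Cramér's V = {v:.4f}")
```

For tables with other degrees of freedom, a general chi-squared survival function (for example from SciPy) would replace the df = 3 closed form.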
A comparison of the combined frequencies of attended and unattended this/these across
the four corpora also reveals some disciplinary variation. As can be seen in Figure 5 below, the
Biological Sciences show a clearer preference for using this/these as determiners rather than
pronouns. The difference is much less pronounced, however, for the other disciplines, whose
frequencies reflect the overall frequencies of occurrence that were presented in Section 4.1.
10 Statistical significance testing performed using the UCREL Significance Test System.
Figure 5. Percentage of this/these as determiners and pronouns per discipline (percentages rounded to the nearest whole number)
[Figure 5: determiner / pronoun: Linguistics 55 / 45; Law 53 / 47; Biological Sciences 64 / 36; Engineering 55 / 45]
When considered individually, this and these again exhibit different patterns when used as determiners or pronouns, as can be seen in Table 10, which provides the normalised frequencies of (un)attended this/these by discipline. The singular demonstrative is attended by a noun or left unattended in similar proportions, except in Biological Sciences, as illustrated above in Figure 5. It appears that biology students tend to make their reference to an antecedent more explicit than students of the other three disciplines. In contrast, these is used as a determiner at least two and a half times as often as it is used as a pronoun in every discipline, and more than three times as often on average.
Looking more closely at the frequencies for this, it is noteworthy that in two seemingly opposite disciplines, Linguistics and Engineering, pronominal this is actually slightly more frequent than the determiner form. A more detailed, functional analysis of the extended environments (including antecedents) that co-occur with (un)attended this in these disciplines would be needed, however, to determine whether this similarity in frequency is also matched in terms of discourse functions.
Table 10. Normalised frequencies (per 1,000 words) of attended and unattended this and these per discipline, including results of statistical significance testing
Discipline | this(a) Attended | this(a) Unattended | these(b) Attended | these(b) Unattended | this+these(c) Attended | this+these(c) Unattended
Linguistics | 4.36 | 4.46 | 1.79 | 0.56 | 6.15 | 5.02
Law | 3.57 | 3.53 | 0.71 | 0.28 | 4.27 | 3.81
Biological Sciences | 5.12 | 3.59 | 2.08 | 0.52 | 7.20 | 4.11
Engineering | 5.30 | 5.57 | 2.14 | 0.53 | 7.44 | 6.10
a Chi-squared = 15.76 (df = 3); p-value = 0.001270433 (p < 0.01); effect size measure: Cramér’s V = 0.0880
b Chi-squared = 2.48; p-value = 0.4787066 (not significant, p > 0.05); effect size measure: Cramér’s V = 0.0706
c Chi-squared = 20.02; p-value = 0.0001684888 (p < 0.001); effect size measure: Cramér’s V = 0.0889
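Because each pair of attended and unattended rates in Table 10 is normalised by the same discipline word count, dividing one by the other gives the determiner-to-pronoun ratio for these directly. The sketch below (illustrative names only) computes that ratio per discipline from the rounded published rates.

```python
# Normalised frequencies (per 1,000 words) of attended vs unattended "these", Table 10
THESE = {
    "Linguistics": (1.79, 0.56),
    "Law": (0.71, 0.28),
    "Biological Sciences": (2.08, 0.52),
    "Engineering": (2.14, 0.53),
}

# Within one discipline both rates share the same word count, so their ratio
# equals the ratio of raw determiner to pronoun counts (up to rounding)
ratios = {disc: attended / unattended for disc, (attended, unattended) in THESE.items()}
mean_ratio = sum(ratios.values()) / len(ratios)

for disc, ratio in ratios.items():
    print(f"{disc}: {ratio:.1f}")
print(f"unweighted mean: {mean_ratio:.1f}")
```

The ratios range from about 2.5 (Law) to about 4.0 (Biological Sciences and Engineering), with an unweighted mean of about 3.4; the raw totals in Table 5 (390 determiner versus 108 pronoun uses) give a ratio of about 3.6.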
In general, the above results for the distribution of attended and unattended this/these point to some cross-disciplinary variation, similar to the findings of Wulff et al. (2012), who found some degree of variation, albeit not consistent, across disciplines in the frequencies of this in advanced student writing. Their results, however, apply to final-year undergraduates and graduate students, so level of study could be an important explanatory factor. Wulff et al. (2012) also add that “DISCIPLINE and LEVEL interact in quite intricate ways” (p. 141; emphasis in original).
4.5 Distribution of Nouns/Verbs and Types across Disciplines
The distribution across disciplines generally reflects the overall findings for nouns and noun types, although some differences are evident in the most frequently used nouns and noun types. Table 11 shows the most common nouns in each of the four disciplines (see Appendix 4 for a complete list); the nouns that appear with this/these in at least three disciplines are shown in bold, and those that occur in two are underlined.
Table 11. The most frequent noun lemmas following this/these, by discipline
Linguistics: essay (11), case (8), study (8), feature (7), stage (7), variable (7), question (6), task (6), claim (5), idea (5), point (5), way (5)
Law: case (32), approach (13), offence (11), essay (9), decision (7), rule (6), presumption (5), act (4), argument (4), requirement (4), right (4), theory (4), view (4)
Biological Sciences: experiment (22), theory (17), essay (16), organism (13), process (13), reaction (13), way (12), gene (11), method (9), point (8), temperature (8), bacterium (7), cell (7), enzyme (7), condition (6), region (6)
Engineering: experiment (21), value (18), laboratory (16), model (16), data (13), point (13), report (13), case (11), result (10), investigation (7), method (6), problem (6), error (5), industry (5), number (5), way (5)
Note: The nouns that appear in at least three disciplines are shown in bold; those in two disciplines are underlined.
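The bold and underlined marking described in the note to Table 11 cannot be reproduced in plain text, but it can be recovered by counting how many disciplines list each noun. A minimal sketch, assuming the marking is based only on the nouns shown in the table (frequencies omitted; names are illustrative):

```python
from collections import Counter

# Head nouns listed for each discipline in Table 11
TABLE11 = {
    "Linguistics": ["essay", "case", "study", "feature", "stage", "variable",
                    "question", "task", "claim", "idea", "point", "way"],
    "Law": ["case", "approach", "offence", "essay", "decision", "rule",
            "presumption", "act", "argument", "requirement", "right",
            "theory", "view"],
    "Biological Sciences": ["experiment", "theory", "essay", "organism",
                            "process", "reaction", "way", "gene", "method",
                            "point", "temperature", "bacterium", "cell",
                            "enzyme", "condition", "region"],
    "Engineering": ["experiment", "value", "laboratory", "model", "data",
                    "point", "report", "case", "result", "investigation",
                    "method", "problem", "error", "industry", "number", "way"],
}

# Count in how many disciplines each noun is listed
spread = Counter(noun for nouns in TABLE11.values() for noun in set(nouns))
in_three_or_more = sorted(n for n, k in spread.items() if k >= 3)
in_exactly_two = sorted(n for n, k in spread.items() if k == 2)

print(in_three_or_more)  # ['case', 'essay', 'point', 'way']
print(in_exactly_two)    # ['experiment', 'method', 'theory']
```

None of the four nouns shared by three disciplines appears in all four: essay is absent from the Engineering list, case from Biological Sciences, and point and way from Law.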
As previously pointed out in Section 4.2, the type of task or assignment seems to play a role in the use of nominal structures, since the deictic noun essay is among the most common nouns in three of the four disciplines. The other nouns shared by at least three disciplines are case, which has different meanings in different disciplines, as reflected in its classification as both a shell noun and an other abstract noun, and point and way, which are general nouns whose meaning depends on the specific context in which they occur and which are mostly used as shell nouns to encapsulate previously presented information. The following examples show typical uses of case in different disciplines; not surprisingly, in Law case is predominantly used in the sense of ‘legal action’, as opposed to ‘an instance of a particular situation’ in the examples from Linguistics and Engineering.
(16) His impact on contractual licensee's rights to occupation was finally negated by the
Court of Appeal in Ashburn Anstalt v. Arnold. This case necessitated a full review of the
status of contractual licences. (LAW_0069b)
(17) In line 26 the word reaction precedes the phrase “set in”. Standing alone the word
'reaction' can be said to have either a negative or a positive prosody however in this case
once again it appears to have a negative prosody. (LING_6183a)
(18) The IPOD Oximeter module consists of a peripheral probe, with a microprocessor unit.
In this case, the peripheral probe contains two IR sensors, and the trace is displayed on a
computer screen. (ENG_0347e)
The remaining nouns are more discipline-specific or related to methodology, such as feature,
variable, question and task in Linguistics; offence, decision, act, and right in Law; organism,
reaction, gene and cell in Biological Sciences; laboratory, model, data, and report in
Engineering. This corresponds to the predominance of metadiscoursal and methodology-related
nouns found in Römer and Wulff’s (2010) case study on attended and unattended this in
advanced student writing (MICUSP).
The distribution of noun types in Figure 6 below also shows some cross-disciplinary variation, particularly for shell and concrete nouns, whose frequencies differ the most between the four disciplines. As might be expected, the highest frequencies of concrete nouns are found in Biological Sciences and Engineering, owing to these disciplines’ focus on physical entities or real-world phenomena (Becher & Trowler, 2001, p. 36), while Linguistics and Law show little or no use of this type of noun. In contrast, Law students favour shell nouns (60%) to a larger extent than students in the other disciplines, in particular Biological Sciences (33%) and Engineering (29%), where shell nouns are used least often. Deictic nouns are used least in Law (11%) and more often in the other disciplines, particularly Linguistics (20%) and Engineering (21%).
The pervasive use of shell nouns in the data, in particular in the soft disciplines, and their role in text cohesion would warrant a more detailed qualitative analysis to determine whether they are used for the same purposes in different disciplines and whether that use develops over time.
Figure 6. Percentage of types of nouns per discipline (percentages rounded to the nearest whole number)
[Figure 6: shell / other abstract / deictic / concrete: Linguistics 48 / 26 / 20 / 6; Law 60 / 29 / 11 / 0; Biological Sciences 33 / 26 / 15 / 26; Engineering 29 / 37 / 21 / 13]
The following examples illustrate typical uses of the most distinctive noun types in the soft and hard disciplines: shell nouns and concrete nouns, respectively. Two of the most frequent shell nouns in Law and Linguistics, approach and evidence, are seen in examples (19) and (20), where the noun takes its meaning from the antecedent; examples (21) and (22) show the concrete nouns organism and sensors in Biological Sciences and Engineering.
(19) In Hine v. Hine, Lord Denning M.R. argued that the courts would have discretionary use of this section, in allocating shares of such assets, in cases where the acquisition of property was achieved by joint efforts (which he argued would include non-financial contributions such as child-care). This approach, typical of Denning, was firmly rejected by the House of Lords in Pettitt v. Pettitt. (LAW_0069b)
(20) The chimpanzees also seem to possess the ability to understand new combinations of
signs that they have not come across before and to express themselves spontaneously i.e.
without waiting for a stimulus which provokes them to respond. This evidence suggests
that the chimps did indeed develop a grasp of simple language. (LING_6067b)
(21) The most mitochondrial-like bacterial genome is that of Rickettsia prowazekii, the cause
of louse borne typhus. This organism has a genome of more than a million bp in size and
contains 834 protein encoding genes. (BIO_0032a)
(22) An extensive system would have a lot of sensors placed around the home. These sensors
would give information to the processing unit where the situation of the elderly person
would be determined. (ENG_0228f)
Overall, these results show greater variation than the similar percentages of noun types in Applied Linguistics and Engineering research articles reported by Gray and Cortes (2011). It remains unclear to what degree this difference reflects the influence of discipline or of genre, since undergraduate student writing differs considerably from research article writing in terms of purpose.
In terms of the most common verbs following this/these, no salient cross-disciplinary differences are found. As can be seen in Table 12 (see Appendix 5 for a complete list), BE is the most frequent verb in all four corpora, and the auxiliaries DO, may and would also appear in every discipline. The only lexical verb that appears in all disciplines is MEAN, confirming its importance in providing explanations or interpretations in student writing (cf. Wulff et al., 2012). A number of other verbs appear in most of the corpora, which points to a broad similarity across the disciplines. In fact, only a few verbs are among the most frequent in a single discipline only: LINK (Linguistics), PROVIDE and INCLUDE (Biological Sciences) and PRODUCE (Engineering). None of these, however, can be said to be particularly characteristic of the discipline in which it occurs. Gray and Cortes (2011) also found that only a few verbs in their data seem to be discipline-specific, further suggesting that the majority of the verbs are instead genre-specific “in that they represent processes and states typical in research articles but do not require prior knowledge of the discipline in order to be comprehended” (p. 38). The results shown in Table 12, however, seem to indicate that these verbs are not only genre-specific but register-specific, since they appear quite frequently in academic writing in general (cf. Schutz, 2013), including student writing, rather than being specific to research articles.
Table 12. The most frequent verb lemmas following this/these, by discipline
Linguistics      Law             Biological Sciences    Engineering
BE (75)*         BE (79)*        BE (115)*              BE (104)*
SUGGEST (9)      HAVE (8)        can (14)               would (25)*
can (7)          MEAN (7)*       MEAN (13)*             can (16)
DO (7)*          would (7)*      would (11)*            MEAN (14)*
SHOW (6)         DO (5)*         may (8)*               may (13)*
may (5)*         SUGGEST (5)     OCCUR (7)              could (11)
would (5)*       may (4)*        PROVIDE (5)†           HAVE (11)
could (4)        could (3)       will (5)               CAUSE (8)
HAVE (4)         LEAD TO (3)     ALLOW (4)              will (8)
LINK (4)†        will (3)        CAUSE (4)              ALLOW (6)
MEAN (4)*                        DO (4)*                SHOW (6)
                                 INCLUDE (4)†           DO (5)*
                                 LEAD TO (4)            OCCUR (4)
                                 SHOW (4)               PRODUCE (4)†
                                 SUGGEST (4)
Note: Capitals identify lemmas. Verbs that appear in all disciplines are marked with an asterisk (*);
those that appear in only one discipline are marked with a dagger (†).
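The verbs that Table 12 identifies as occurring in all four disciplines, or in only one, can be checked mechanically with set operations. The Python sketch below is illustrative only and is not the procedure used in the study: it copies the verb lists from Table 12 (counts omitted) and counts how many disciplines list each verb.

```python
from collections import Counter

# Most frequent verbs following this/these, per discipline (copied from Table 12).
top_verbs = {
    "Linguistics": ["BE", "SUGGEST", "can", "DO", "SHOW", "may", "would",
                    "could", "HAVE", "LINK", "MEAN"],
    "Law": ["BE", "HAVE", "MEAN", "would", "DO", "SUGGEST", "may", "could",
            "LEAD TO", "will"],
    "Biological Sciences": ["BE", "can", "MEAN", "would", "may", "OCCUR",
                            "PROVIDE", "will", "ALLOW", "CAUSE", "DO",
                            "INCLUDE", "LEAD TO", "SHOW", "SUGGEST"],
    "Engineering": ["BE", "would", "can", "MEAN", "may", "could", "HAVE",
                    "CAUSE", "will", "ALLOW", "SHOW", "DO", "OCCUR", "PRODUCE"],
}

# For each verb, count the number of disciplines whose list contains it.
spread = Counter(verb for verbs in top_verbs.values() for verb in set(verbs))

in_all = sorted(v for v, n in spread.items() if n == len(top_verbs))
in_one = sorted(v for v, n in spread.items() if n == 1)

print(in_all)  # ['BE', 'DO', 'MEAN', 'may', 'would']
print(in_one)  # ['INCLUDE', 'LINK', 'PRODUCE', 'PROVIDE']
```

Because only the top verbs per discipline are listed, "only one discipline" here means "in only one discipline's top list"; a verb may still occur at lower frequencies elsewhere (Appendix 5).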
The distribution of verb types displayed in Figure 7 also shows minor variation, again
mostly between the hard and soft disciplines: the figures for Linguistics and Law match almost
entirely, and those for Engineering and Biological Sciences differ only slightly in the
proportion of primary versus modal verbs. This suggests that, in contrast to the results for nouns
and noun types reported above, novice student writing does not appear to reflect as much
disciplinary specificity with regard to the verb types used in cohesive structures with this/these.
Figure 7. Percentage of types of verbs per discipline (percentages rounded to the nearest whole
number)
              Linguistics   Law   Biological Sciences   Engineering
Primary            57        58           47                 40
Modal              14        13           16                 25
Lexical            30        30           36                 35
Looking specifically at how the most frequent primary verb (BE) and the two most
common lexical verbs overall (MEAN and SUGGEST) are distributed across the disciplines, some
interesting patterns emerge. A search for this + BE reveals that this structure is used more
often in the hard disciplines (59%) than in the soft disciplines (41%). In contrast, the structure
this + MEAN/SUGGEST + that, which performs the explanatory function discussed earlier, occurs
predominantly as this means that in Biological Sciences and Engineering texts (21 out of 24
instances), and mainly as this suggests that in Linguistics and Law (11 out of 14 instances).
These frequencies seem to indicate distinct verb preferences between the hard and soft sciences
that could reflect differences in how knowledge is constructed, based on evidence or on tentative
interpretation, respectively. It could be argued, however, that genre also plays an important
role in this distribution, since a large proportion of texts in the Biological Sciences and
Engineering corpora belong to the Explanation genre family (Gardner & Nesi, 2013).
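As a check on the hard/soft split for BE reported above, the 59% and 41% figures can be recovered from the BE counts in Table 12, which sum to the 373 BE tokens listed in Appendix 3. The following is a minimal Python sketch, assuming that the percentages are shares of these raw counts; it does not normalise for the different number of verb tokens in each subcorpus.

```python
# BE tokens following this/these in each discipline (from Table 12).
be_counts = {
    "Linguistics": 75,
    "Law": 79,
    "Biological Sciences": 115,
    "Engineering": 104,
}
# Grouping used in the discussion: 'hard' vs 'soft' disciplines.
HARD = {"Biological Sciences", "Engineering"}

total = sum(be_counts.values())
hard = sum(n for discipline, n in be_counts.items() if discipline in HARD)
soft = total - hard

# Raw shares of all BE tokens, rounded to whole percentages.
hard_pct = round(100 * hard / total)
soft_pct = round(100 * soft / total)

print(total, hard, soft, hard_pct, soft_pct)  # 373 219 154 59 41
```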
5 Conclusion
The present study has investigated the extent to which the anaphoric demonstratives this and
these are used as determiners (attended) or pronouns (unattended) by first-year undergraduate
students from four different academic disciplines, and has analysed the nominal and verbal
structures that follow the demonstratives. Based on data extracted from the BAWE corpus, the
following five research questions were answered:
1) To what degree are attended and unattended this/these used in first-year undergraduate
student writing?
The demonstratives this and these are used to varying degrees as determiners and pronouns in
the data. While these occurs predominantly with an attendant noun, this is followed by a noun
and by a verb in almost equal proportions.
2) Which nouns and noun types (concrete, deictic, shell, other abstract nouns) most
frequently follow attended this/these?
Overall, this and these are predominantly attended by methodology- or results-related nouns
(e.g. experiment, method, result) and nouns that relate to student activity or type of task (e.g.
essay, report), as well as other more general nouns (e.g. case, way) that occur mostly in ‘semi-
fixed’ expressions with an anaphoric function such as “in this case” and “in this way”. When
considered individually, this and these mainly pattern with different nouns. With respect to noun
types, shell and other abstract nouns predominate in the data, followed by deictic and concrete
nouns, as might be expected given that academic discourse tends to be abstract and to include a
great deal of metadiscoursal reference.
3) Which verbs and verb types (lexical, primary, modal) most frequently follow unattended
this/these?
The verb lemma BE accounts for 43% of the verb tokens in the data (373 of 865; see Appendix 3)
and is widely used in seemingly ‘prefabricated’, fixed structures such as this + BE to present a
reason for a previous point. A great variety of lexical verbs can be found, but only a small
number of them occur frequently, including MEAN, SUGGEST and SHOW, which appear to be used to
explain or interpret findings when constructing arguments. The distribution of verb types
closely reflects that of the most common verbs, with primary verbs occurring most frequently
(48%), followed by lexical verbs (33%) and modal verbs (18%).
4) To what extent, if any, do student writers in different disciplines differ with respect to
the frequency of use of attended and unattended this/these?
Some disciplinary variation can be found in the use of the demonstratives, with Law students
employing these cohesive devices to a smaller extent than students of the other three disciplines.
The percentages of (un)attended this/these also show some variation between the disciplines,
namely a clearer preference in Biological Sciences for using this/these with an attendant noun
rather than without one. The individual distributions for this and these indicate different patterns
for the demonstratives across disciplines. The demonstrative this is used to a similar extent as
a determiner and pronoun in Law, Linguistics and Engineering, and more often as a determiner
in Biological Sciences, while these is predominantly used as a determiner in all disciplines.
5) How are the most frequently co-occurring nouns/verbs and respective types distributed
across academic disciplines?
Cross-disciplinary variation is more salient in the distribution of nouns/noun types than of
verbs/verb types. While three of the most frequently used nouns appear in all four disciplines,
the majority are either related to methodology or more discipline-specific. In terms of noun
types, there is a striking predominance of shell nouns in Law, which contrasts with their sparser
use in the Biological Sciences and Engineering. The results for verbs and types of verbs,
on the other hand, show minor disciplinary variation, with a large number of overlapping verbs.
The main difference is found in the distribution of verb types between ‘soft’ and ‘hard’ sciences.
The findings of this study stand in contrast to the prescriptivist view of unattended this
as an unwanted phenomenon that should be corrected in students’ writing. First-year
undergraduates who are native speakers of English, as represented by the BAWE corpus data, do
use this/these as pronouns to a large extent in their assignments, although with varying
frequency in different disciplines. On the basis of comparison with previous studies (e.g.
Petch-Tyson, 2000; Römer & Wulff, 2010; Wulff et al., 2012), possible explanations for these
results were proposed that relate to the educational context and level of study, together with
the range of lexical choices available to novice student writers.
Some differences were also observed in the distribution of nouns, verbs and their respective
types across disciplines, which may be explained by factors other than discipline. Given the
predominance of different genres in the different subcorpora (essay and critique in Linguistics
and Law; methodology recount and explanation in the Biological Sciences and Engineering),
the influence of genre on the results needs to be taken into account. Future studies that are
able to tease apart these interrelated variables could provide valuable insights.
The small size of the data sets is another limitation of the present study, which prevents any
generalisable conclusions from being drawn. Further research using larger corpora could also
extend this study by comparing the use of the demonstratives across levels of study (early to
advanced undergraduates) or expertise (student versus professional academic writing), which
could help reveal possible patterns of development in cohesive writing using anaphoric
demonstratives.
Likewise, an investigation of (un)attended this/these in academic writing by non-native
English-speaking novice writers would complement the findings of the present study. The role of
the anaphoric demonstratives this/these in different communicative contexts, as well as a
comparison with their distal counterparts that/those, could also be examined in future studies.
By establishing what is done in actual learning contexts, better-informed EAP writing
instruction and reference materials can be developed in line with current practices (Hunston,
2002; Nesi, 2016). The results of this empirical study on anaphoric (un)attended this/these as
important cohesive devices in successful academic writing by first-year undergraduates could
be of value to both teachers and students. The fact that unattended this in particular seems to be
such a widespread phenomenon in novice student writing merits further attention by EAP
practitioners and researchers. It is important to draw attention to all the choices available to
students when selecting resources that can enhance text cohesion while balancing clarity and
economy of expression.
References
American Psychological Association. (2010). Publication manual of the American Psychological
Association (6th ed.). Washington, DC: American Psychological Association.
Anthony, L. (2019). AntConc (Version 3.5.8) [Computer software]. Tokyo, Japan: Waseda
University. Available from http://www.laurenceanthony.net/software
Becher, T., & Trowler, P. (2001). Academic tribes and territories: Intellectual enquiry and the
culture of disciplines. London: The Society for Research into Higher Education & Open
University Press.
Bhatia, V. K. (2002). A generic view of academic discourse. In J. Flowerdew (Ed.), Academic
discourse (pp. 21–39). London: Longman.
Biber, D. (2006). University language: A corpus-based study of spoken and written registers.
Amsterdam: John Benjamins.
Biber, D., & Gray, B. (2010). Challenging stereotypes about academic writing: Complexity,
elaboration, explicitness. Journal of English for Academic Purposes, 9(1), 2–20.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). The Longman grammar
of spoken and written English. Harlow: Pearson Education.
Bloor, T., & Bloor, M. (2004). The functional analysis of English: A Hallidayan approach.
London: Arnold.
Brown, G., & Yule, G. (1983). Discourse analysis. Cambridge: Cambridge University Press.
Charles, M. (2011). Adverbials of result: Phraseology and functions in the Problem–Solution
pattern. Journal of English for Academic Purposes, 10(1), 47–60.
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213–238.
Flowerdew, J. (2002). Introduction: Approaches to the analysis of academic discourse in
English. In J. Flowerdew (Ed.), Academic discourse (pp. 1–17). London: Longman.
Flowerdew, J. (2016). English for Specific Academic Purposes (ESAP) writing: Making the
case. Writing and Pedagogy, 8, 1–32.
Gardner, D., & Davies, M. (2013). A new academic vocabulary list. Applied Linguistics,
35(3), 305–327.
Gardner, S., & Nesi, H. (2013). A classification of genre families in university student writing.
Applied Linguistics, 34(1), 25–52.
Gardner, S., Nesi, H., & Biber, D. (2018). Discipline, level, genre: Integrating situational
perspectives in a new MD analysis of university student writing. Applied Linguistics.
Advance online publication.
Geisler, C., Kaufer, D. S., & Steinberg, E. R. (1985). The unattended anaphoric “this”: When
should writers use it? Written Communication, 2(2), 129–155.
Glenn, C., & Gray, L. (2013). The writer’s Harbrace handbook. International edition (5th ed.).
Boston, MA: Wadsworth Cengage Learning.
Granger, S., & Tyson, S. (1996). Connector usage in the English essay writing of native and
non-native EFL speakers of English. World Englishes, 15(1), 17–27.
Gray, B. (2010). On the use of demonstrative pronouns and determiners as cohesive devices: A
focus on sentence-initial this/these in academic prose. Journal of English for Academic
Purposes, 9(3), 167–183.
Gray, B., & Cortes, V. (2011). Perception vs. evidence: An analysis of this and these in academic
prose. English for Specific Purposes, 30(1), 31–43.
Halliday, M. A. K., & Hasan, R. (1976). Cohesion in English. London: Longman.
Hinkel, E. (2001). Matters of cohesion in L2 academic texts. Applied Language Learning,
12(2), 111–132.
Hunston, S. (2002). Corpora in applied linguistics. Cambridge: Cambridge University Press.
Hunston, S. (2006). Corpus linguistics. In K. Brown (Ed.), Encyclopedia of language &
linguistics (2nd ed., pp. 234–248).
Hyland, K. (2002). Specificity revisited: How far should we go now? English for Specific
Purposes, 21(4), 385–395.
Hyland, K. (2004). Disciplinary discourses: Social interactions in academic writing. Ann
Arbor, MI: University of Michigan Press.
Hyland, K. (2006). Disciplinary differences: Language variation in academic discourses. In K.
Hyland & M. Bondi (Eds.), Academic discourse across disciplines (pp. 17–45). Frankfurt:
Peter Lang.
Hyland, K. (2008). As can be seen: Lexical bundles and disciplinary variation. English for
Specific Purposes, 27(1), 4–21.
Hyland, K. (2009). Academic discourse: English in a global context. London: Continuum.
Hyland, K., & Jiang, F. (2017). Is academic writing becoming more formal? English for Specific
Purposes, 45, 40–51.
Hyland, K., & Shaw, P. (2016). Introduction. In K. Hyland & P. Shaw (Eds.), The Routledge
handbook of English for academic purposes (pp. 1–14). London/New York: Routledge.
Hyland, K., & Tse, P. (2007). Is there an “academic vocabulary”? TESOL Quarterly, 41,
235–253.
Lee, J. J., Bychkovska, T., & Maxwell, J. D. (2019). Breaking the rules? A corpus-based
comparison of informal features in L1 and L2 undergraduate student writing. System,
80, 143–153.
McCarthy, M., Matthiessen, C., & Slade, D. (2010). Discourse analysis. In N. Schmitt (Ed.),
An introduction to applied linguistics (pp. 53–69). London: Arnold Publishers.
Moskovit, L. (1983). When is broad reference clear? College Composition and Communication,
34(4), 454–469.
Nesi, H. (2016). Corpus studies in EAP. In K. Hyland & P. Shaw (Eds.), The Routledge
handbook of English for academic purposes (pp. 206–217). London/New York:
Routledge.
Nesi, H., & Gardner, S. (2012). Genres across the disciplines: Student writing in higher
education. Cambridge, UK: Cambridge University Press.
Paltridge, B. (2012). Discourse analysis: An introduction. London: Continuum.
Petch-Tyson, S. (2000). Demonstrative expressions in argumentative discourse: A computer
corpus-based comparison of non-native and native English. In S. Botley & A. M.
McEnery (Eds.), Corpus-based and computational approaches to discourse anaphora
(pp. 43–64). Amsterdam: John Benjamins.
Römer, U., & Wulff, S. (2010). Applying corpus methods to writing research: Explorations of
MICUSP. Journal of Writing Research, 2(2), 99–127.
Schutz, N. (2013). How specific is English for Academic Purposes? A look at verbs in business,
linguistics and medical research articles. In G. Andersen & K. Bech (Eds.), English
corpus linguistics: Variation in time, space and genre (pp. 237–257). Amsterdam:
Rodopi Publishers.
Simpson-Vlach, R., & Ellis, N. (2010). An academic formulas list: New methods in
phraseology research. Applied Linguistics, 31(4), 487–512.
Staples, S., Egbert, J., Biber, D., & Gray, B. (2016). Academic writing development at the
university level: Phrasal and clausal complexity across level of study, discipline, and
genre. Written Communication, 33(2), 149–183.
Swales, J. (1990). Genre analysis: English in academic and research settings. Cambridge:
Cambridge University Press.
Swales, J. (2005). Attended and unattended “this” in academic writing: A long and unfinished
story. ESP Malaysia, 11(1), 1–15.
Swales, J., & Feak, C. (2012). Academic writing for graduate students: Essential tasks and
skills (3rd ed.). Ann Arbor, MI: University of Michigan Press.
Wulff, S., Römer, U., & Swales, J. (2012). Attended/unattended this in academic student
writing: Quantitative and qualitative perspectives. Corpus Linguistics and Linguistic
Theory, 8, 129–157.
Appendix 1
Complete list of head nouns (lemmas) following this/these in the data (480 types / 1423 tokens).
Frequency (rank): noun lemmas
56 (1): case
47 (2): experiment
37 (3): essay
29 (4): point
25 (5): way
24 (6): theory
20 (7): value
17 (8): method, model, process
16 (8): laboratory
16 (12): result
14 (13): data, investigation, reaction, report
13 (17): approach, organism, stage
12 (20): area, idea, question, time
11 (24): gene, offence, type
10 (27): feature
9 (28): rule
8 (29): cell, error, graph, market, problem, region, study, task, temperature, variable
7 (39): bacterium, behaviour, decision, difference, enzyme, procedure, table, technique, view
6 (48): claim, condition, evidence, knowledge, number, part, period, phrase, requirement
5 (57): argument, code, discussion, figure, hypothesis, industry, issue, people, phase, presumption, section, situation, stain, word
4 (71): act, association, assumption, change, concept, conclusion, definition, device, fact, form, group, information, level, lifecycle, module, organelle, page, phenomenon, project, property, right, sequence, structure
3 (94): ability, adaptation, antibiotic, application, arrangement, article, aspect, barrier, boundary, calculation, characteristic, company, cost, criticism, design, disease, effect, element, example, explanation, factor, finding, formula, fusion, image, instance, line, load, material, molecule, notion, paper, pathogen, plant, practical, reason, reptile, research, respect, sort, sound, strategy, technology, threat, transfer
2 (140): beetle, DNA, action, amoeba, animal, assignment, bird, bond, book, bottle, car, chain, charge, chimpanzee, choice, compartment, compound, consideration, context, demand, development, diagram, distinction, equation, exercise, field, fraction, frame, framework, function, genome, implant, increase, interaction, interest, intron, kind, lab, lion, machine, measurement, mechanism, motor, need, optimum, piece, plate, position, product, protein, provision, rate, relationship, sample, scenario, sense, sentence, set, sign, signal, software, song, source, state, statement, subject, system, thickness, undulipodia, vehicle, year
1 (209): acquisition, addition, advancement, advert, advertisement, aerobe, agreement, aim, alteration, alternative, amount, analogy, analysis, anticipation, apparatus, appendix, approximation, array, attachment, attempt, background, balance, basis, benefit, block, blockage, breakthrough, building, capsule, carrier, cassette, catalysis, categorisation, channel, children, circumstance, clumping, coil, collection, colony, combination, commonplace, comparison, competency, component, composition, conformation, constant, construction, containment, convention, conviction, cooperation, corpus, cover, craftmanship, creature, crime, crista, crossing, cursor, cycle, danger, death, defect, defence, degradation, density, description, detail, dialect, difficulty, diffusion, digression, dilution, direction, disc, discovery, disorder, distortion, division, doctrine, dome, drink, drum, duty, electron, end, energy, enhancement, equity, era, establishment, event, evolution, examination, excitement, experience, expression, extension, fear, female, firm, fluid, foetus, force, format, foundation, freedom, gesture, gland, graphology, grounding, grouping, growth, habit, histone, honour, hormone, hunk, imagery, imbalance, imposition, imprint, inaccuracy, incident, inconsistency, individual, infection, informant, infusion, interview, investment, item, jurisdiction, justification, law, licence, linguist, link, locality, male, manipulation, manufacturer, measure, member, membrane, methodology, mist, mixture, mode, moment, movement, multiplicity, muscle, mutation, nap, nature, negation, nest, objection, opinion, outbreak, overview, oxaloacetate, pH, pair, papule, paradox, parasite, particle, patch, patient, pattern, percentage, perspective, pesticide, photograph, picture, pilus, plot, polymer, polynomial, polypeptide, positioning, possibility, precaution, precautions, preference, prevention, pride, principle, promoter, proposal, protection, quanta, quantity, quote, radius, ratio, rationalisation, reasoning, receiver, receptor, recycling, rejection, reliance, repressor, residency, resistance, respirator, response, responsibility, restriction, revelation, reversion, risk, robot, route, routine, scale, scene, search, sensitivity, sensor, separateness, setting, shape, shift, size, skill, society, solution, species, speculation, speech, spike, split, stance, standard, step, stock, story, strain, strength, style, substance, subunit, success, sugar, suggestion, summary, symbol, terminal, thesis, thylakoid, titre, tradition, transition, trend, tube, unit, use, utterance, variation, version, vocalisation, voltage, waiting, weight, work
Appendix 2
Complete list of head nouns classified as shell, other abstract, deictic, and concrete.
Shell: ability, action, adaptation, advancement, alteration, analysis, approach, area, arrangement, aspect, association, assumption, behaviour, benefit, boundary, calculation, case, categorisation, change, characteristic, claim, combination, comparison, composition, condition, conformation, construction, cooperation, criticism, decision, definition, demand, development, diffusion, discovery, division, doctrine, enhancement, error, establishment, evidence, example, experience, explanation, fact, feature, finding, fraction, habit, honour, hypothesis, idea, imbalance, inconsistency, information, issue, justification, line, measure, method, model, need, offence, pattern, phenomenon, point, precaution, problem, procedure, property, question, rationalisation, reaction, reasoning, region, relationship, requirement, responsibility, restriction, result, revelation, separateness, sequence, situation, speech, stage, story, strategy, strength, task, technique, theory, time, tradition, transfer, use, value, view, way
Other abstract: ability, act, approach, area, barrier, blockage, bond, breakthrough, case, chain, choice, clumping, code, company, consideration, conviction, cursor, data, death, density, design, difference, DNA, energy, equity, error, evolution, feature, form, formula, graph, graphology, idea, inaccuracy, industry, infusion, investment, jurisdiction, knowledge, law, licence, lifecycle, load, mechanism, model, module, mutation, negation, notion, number, objection, page, percentage, phrase, polynomial, presumption, quanta, quantity, question, radius, rate, reaction, recycling, region, relationship, repressor, residency, scale, sensitivity, sequence, set, shape, size, song, strategy, structure, study, system, theory, transfer, type, value, vocalisation, voltage, weight
Deictic: data, diagram, essay, experiment, figure, graph, image, interview, investigation, lab, laboratory, line, overview, page, paper, photograph, phrase, picture, point, practical, project, question, quote, report, result, section, sentence, stain, study, table, task
Concrete: antibiotic, apparatus, bacterium, beetle, capsule, cassette, cell, colony, compartment, compound, crista, device, dilution, disc, dome, drum, enzyme, female, fluid, frame, gene, hormone, implant, individual, male, manufacturer, material, member, membrane, mist, mixture, molecule, motor, muscle, nest, organism, oxaloacetate, papule, parasite, plant, protein, reptile, sensor, species, spike, stain, sugar, terminal, thylakoid, undulipodia
Appendix 3
Complete list of verbs (lemmas) following this/these in the data (111 types / 865 tokens).
Frequency (rank): verb lemmas
373 (1): be
48 (2): would
39 (3): can
38 (4): mean
30 (5): may
25 (6): have
21 (7): could, do
20 (9): suggest
16 (10): show, will
14 (12): cause
11 (13): allow, occur
9 (15): lead to
8 (16): include
7 (17): provide
6 (18): involve, produce, result
5 (21): create, increase, indicate, seem
4 (25): demonstrate, give, highlight, link, reflect
3 (30): affect, bring, contain, prevent, use
2 (35): add, apply, go, help, might, need, prove, stop, work
1 (44): address, affirm, assign, assist, assume, attribute, build, bypass, change, combine, concentrate, contend, cover, decrease, depend, deter, dilute, disadvantage, display, drive, eliminate, enable, end up, enforce, equate, exclude, exist, explain, extend, fail, fall, forego, generate, happen, hold, illustrate, induce, lack, limit, look, lower, make, mix, must, offer, open, place, play, pose, preclude, present, refer, remove, render, revise, rouse, run, save, secrete, send, set, simplify, start, strip, tend, undermine, verge
Appendix 4
Alphabetically ordered list of noun lemmas per discipline (types/tokens).
Biological Sciences (221/501): ability, acquisition, action, adaptation, addition, aerobe, amoeba, animal, antibiotic, area, arrangement, array, aspect, association, assumption, attachment, bacterium, balance, behaviour, benefit, bird, block, blockage, bond, capsule, carrier, case, catalysis, cell, chain, change, characteristic, charge, claim, clumping, collection, colony, compartment, compound, concept, conclusion, condition, conformation, constant, containment, cooperation, cover, creature, crista, cycle, danger, data, death, dilution, discovery, discussion, disease, division, DNA, electron, element, end, energy, enhancement, enzyme, error, essay, event, evidence, evolution, exercise, experiment, extension, fact, factor, feature, female, figure, finding, foetus, form, fraction, fusion, gene, genome, gland, graph, group, growth, histone, hormone, hypothesis, idea, imbalance, imprint, increase, individual, infection, information, interaction, intron, investigation, lifecycle, lion, male, material, mechanism, membrane, method, model, molecule, moment, movement, mutation, need, nest, number, opinion, optimum, organelle, organism, outbreak, oxaloacetate, page, paper, papule, parasite, part, particle, patch, pathogen, pattern, period, pesticide, pH, phase, phenomenon, photograph, pilus, plant, plate, plot, point, polymer, polypeptide, position, possibility, practical, precautions, prevention, pride, problem, procedure, process, product, promoter, property, protein, quanta, quantity, rate, reaction, reason, receptor, recycling, region, relationship, reliance, report, repressor, reptile, residency, respect, result, reversion, route, sample, scenario, section, sensitivity, separateness, sequence, size, solution, song, source, species, speculation, spike, split, stage, stain, step, stock, strategy, structure, substance, subunit, sugar, suggestion, system, table, technique, technology, temperature, theory, thickness, thylakoid, time, titre, tradition, transfer, trend, tube, type, undulipodia, value, version, view, vocalisation, way
Engineering (187/434): ability, adaptation, advancement, alteration, analogy, apparatus, appendix, application, approximation, area, argument, arrangement, assignment, assumption, barrier, beetle, behaviour, book, bottle, boundary, breakthrough, building, calculation, car, case, cassette, cell, change, code, coil, company, comparison, competency, component, conclusion, cost, craftmanship, crossing, cursor, data, defect, demand, density, design, detail, development, device, diagram, difference, diffusion, direction, disc, dome, drum, effect, element, equation, era, error, essay, exercise, experience, experiment, explanation, expression, fear, feature, figure, firm, fluid, force, formula, frame, freedom, graph, grounding, group, grouping, habit, honour, idea, image, implant, inaccuracy, industry, information, infusion, instance, investigation, investment, issue, kind, knowledge, lab, laboratory, law, level, load, locality, machine, manufacturer, market, material, measurement, member, method, methodology, mist, mixture, model, module, motor, muscle, nap, number, objection, overview, page, pair, part, people, percentage, period, phenomenon, picture, point, polynomial, positioning, problem, procedure, process, product, project, property, question, quote, radius, rate, ratio, reaction, reason, region, relationship, report, requirement, resistance, respect, respirator, response, result, risk, routine, rule, section, sensor, set, setting, shape, signal, situation, software, sort, source, stage, state, strain, strategy, strength, structure, summary, system, table, task, technology, terminal, theory, time, type, value, variable, vehicle, view, voltage, way, weight, word, year
Law (113/240): act, agreement, aim, alternative, analysis, approach, area, argument, article, assumption, attempt, basis, case, categorisation, charge, choice, circumstance, commonplace, concept, conclusion, consideration, construction, context, convention, conviction, cost, crime, criticism, decision, defence, definition, degradation, discussion, distinction, doctrine, duty, effect, equity, essay, establishment, examination, example, fact, foundation, frame, idea, imposition, incident, inconsistency, increase, instance, interest, investigation, issue, jurisdiction, justification, knowledge, licence, line, measure, method, nature, negation, notion, offence, page, period, perspective, phrase, piece, point, position, precaution, preference, presumption, principle, procedure, proposal, protection, provision, question, rationalisation, reason, reasoning, rejection, requirement, respect, responsibility, restriction, revelation, right, rule, scenario, section, sense, sentence, shift, situation, society, stance, standard, statement, subject, theory, thesis, threat, time, transition, type, use, view, way, word
Linguistics (130/251): ability, action, advert, advertisement, amount, anticipation, area, article, assignment, association, background, behaviour, boundary, case, change, channel, children, chimpanzee, choice, claim, code, combination, composition, concept, context, corpus, definition, demand, description, development, dialect, difference, difficulty, digression, discussion, disorder, distortion, drink, error, essay, evidence, example, excitement, experiment, explanation, fact, factor, feature, field, finding, form, format, framework, function, gesture, graphology, hunk, idea, image, imagery, informant, instance, interview, investigation, issue, item, knowledge, level, line, linguist, link, manipulation, method, mode, multiplicity, need, notion, paper, paradox, part, patient, people, period, phase, phenomenon, phrase, piece, point, process, question, receiver, research, result, robot, rule, scale, scene, search, sense, sentence, sequence, set, sign, skill, sort, sound, speech, stage, story, structure, study, style, subject, success, symbol, task, technique, theory, time, two, type, unit, utterance, variable, variation, view, waiting, way, word, work
Appendix 5
Alphabetically ordered list of verb lemmas per discipline (types/tokens).
Biological Sciences (49/256): affect, allow, be, bring, bypass, can, cause, change, contain, could, create, demonstrate, deter, dilute, disadvantage, do, drive, end, exist, explain, generate, have, highlight, include, increase, indicate, induce, involve, lead to, may, mean, occur, play, pose, prevent, produce, provide, refer, reflect, result, secrete, show, start, stop, suggest, up, verge, will, would
Engineering (60/298): add, allow, assign, assist, assume, be, can, cause, combine, concentrate, could, cover, create, decrease, demonstrate, depend, display, do, eliminate, enforce, equate, exclude, give, happen, have, help, highlight, include, increase, indicate, involve, look, lower, make, may, mean, might, mix, need, occur, offer, open, place, produce, prove, provide, render, result, save, seem, send, set, show, simplify, strip, suggest, use, will, work, would
Law (40/160): add, address, affirm, allow, apply, attribute, be, bring, build, can, cause, contend, could, create, do, extend, fail, fall, give, go, have, hold, increase, indicate, involve, lead to, may, mean, must, preclude, present, reflect, remove, revise, rouse, seem, suggest, undermine, will, would
Linguistics (29/152): affect, be, can, could, demonstrate, do, enable, forego, have, help, highlight, illustrate, include, increase, indicate, lack, lead to, limit, link, may, mean, need, reflect, run, seem, show, suggest, tend, would