Research Methods Workshop

Introducing Corpus Linguistics Techniques (1):

Making the Most of the VIEW

A reminder

• Corpus Linguistics is a methodology, which tends to:

– involve the analysis of “actual” language use in natural texts (but the analysis of literary texts is also possible)

– utilise a large and principled collection of natural texts, known as a “corpus”, as the basis for analysis

– make extensive use of computers, utilising both automatic and interactive techniques

– depend on both quantitative and qualitative analytical techniques: “The goal of corpus-based investigations is not simply to report quantitative findings, but to explore the importance of these findings for learning about the patterns of language use”

(adapted from Biber et al. 1998: 4-5)

Concordancing

= An alphabetical listing of the words in a text, given together with the contexts in which they appear.

– The most common form of concordance today is the Keyword-in-Context (KWIC) index:

Figure 1: Concordance of poor in Tale of Two Cities, Book 1

1320   taste it is that such  poor  cattle always have in their mouths
 948          of sparing the  poor  child the inheritance of any part of
 778    small property of my  poor  father, whom I never saw--so long
1870    desolate, while your  poor  heart pined away, weep for it
1947            Miss, if the  poor  lady had suffered so intensely
1884          the love of my  poor  mother hid his torture from me
1615  stockings, and all his  poor  tatters of clothes, had, in a long
1577       faded away into a  poor  weak stain. So sunken and
1001      on your way to the  poor  wronged gentleman, and, with a
1036     detachment from the  poor  young lady, by laying a brawny
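For anyone curious about what a concordancer does behind the scenes, here is a minimal sketch in Python. It is an illustration only, not how any particular tool is implemented. The function name kwic, the width parameter and the sample string (stitched together from three fragments of Figure 1, so not a real passage) are all assumptions made for the example.

```python
import re

def kwic(text, node, width=4):
    """Return (position, left, node, right) rows for each occurrence of `node`."""
    # Very rough tokenisation: runs of letters, apostrophes and hyphens
    tokens = re.findall(r"[A-Za-z'-]+", text)
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((i, left, tok, right))
    # Sort on the right-hand context so similar continuations line up
    rows.sort(key=lambda r: r[3].lower())
    return rows

# Illustrative sample only: fragments of Figure 1 joined into one string
sample = ("the love of my poor mother hid his torture from me "
          "and if the poor lady had suffered so intensely "
          "on your way to the poor wronged gentleman")

for pos, left, node, right in kwic(sample, "poor"):
    print(f"{pos:>4}  {left:>20}  {node}  {right}")
```

Sorting on the right-hand context is what puts the lines in alphabetical order of the word after the search word, as in Figure 1.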

What do concordancers let you do?

– Let you look at a word in context, see how common it is, and see the style associated with it.

– Let you compare your usage with that of others (very useful in EFL)

– Let you compare usage across different genres/registers (very useful in ESL)

– More advanced users can explore attitudes (the thought processes that lie behind the words)

The recall problem: Although concordancers allow you to specify search words, it’s worth remembering that …

– Some tools will only give you the results for what you said you were looking for, which may not be the same thing as what you thought you were looking for.

– You notice only what you get back; you will not notice what you did not find.
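A small sketch of the recall problem, using Python's standard re module. The three sentences are invented for illustration and the helper name hits is hypothetical. The point is simply that an exact-word search silently misses related forms.

```python
import re

# Invented example sentences (not corpus data)
sentences = [
    "Shares slumped after the announcement.",
    "The slump in demand continued.",
    "Analysts fear further slumps.",
]

def hits(pattern, lines):
    """Return the lines in which the regular expression `pattern` matches."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [line for line in lines if rx.search(line)]

exact = hits(r"\bslump\b", sentences)     # only the exact form "slump"
broad = hits(r"\bslump\w*\b", sentences)  # slump, slumps, slumped, ...

print(len(exact), "hit(s) for the exact word")
print(len(broad), "hit(s) once inflected forms are included")
```

Nothing in the first result warns you that two relevant sentences were left out, which is why it pays to think about what you asked for as well as what came back.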

Becoming familiar with VIEW

= Variation In English Words and Phrases

You can find it at: http://corpus.byu.edu/bnc/

So what’s so good about VIEW?

– It allows you to quickly and easily search for a wide range of words and phrases of English in the 100 million word British National Corpus (BNC).

– The BNC represents modern English of the late 20th century.

– As with some other BNC interfaces, you can search for words and phrases by

• exact word or phrase

• wildcard or part of speech

• combinations of word/phrase and wildcard/part of speech.

– Time permitting, we’re going to master the first two on the list.

Search: ‘corporation’

Clicking on the word brings up a concordance

We can search the whole of the BNC, or just a small part of it (e.g. W_commerce), but remember to tick the “limit” box!

KWIC concordance of ‘corporation’ (in w_commerce)

Sorting our entries …

We can sort our entries according to ‘left’ and ‘right’ context by adding an * to the left or right of the search word.

However, if we want to look at the same results, we have to pick the appropriate register in the left-hand column … e.g. w_commerce
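One way to think about the * searches is as a count of the words that sit immediately to the left or right of the search word. The sketch below works on that assumption; it is not a description of how VIEW itself is built. The function name neighbours and the sample sentence are invented for illustration.

```python
from collections import Counter

def neighbours(tokens, node, offset):
    """Count the words found `offset` positions from each occurrence of `node`
    (+1 = the next word, as in "node *"; -1 = the previous word, as in "* node")."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        j = i + offset
        if tok == node and 0 <= j < len(tokens):
            counts[tokens[j]] += 1
    return counts

# Invented sample text (not BNC data)
text = ("the corporation tax rate rose while the development corporation "
        "said corporation tax receipts fell and the corporation said little")
tokens = text.split()

right = neighbours(tokens, "corporation", +1)
left = neighbours(tokens, "corporation", -1)
print(right.most_common())
print(left.most_common())
```

Grouping the hits by neighbouring word like this is what makes recurring patterns easy to spot on either side of the search word.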

Results for … corporation *

What strikes you about the results?

Results for … * corporation

What strikes you about these results?

Group Task

• How is ‘corporation’ used in newspaper tabloids?

• Is this similar to the way ‘corporation’ is used in W_commerce?

Let’s explore some other words ….

You choose …. !

Check out the “CHART” button …

CHART is useful when you want to see the extent to which specific words are utilised in the different genres …
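When comparing genres, bear in mind that the sections of a corpus differ in size, so raw counts can mislead. A common way round this is to convert each count to a rate per million words. The sketch below shows the arithmetic; the register names and figures are hypothetical (they are not real BNC counts), and whether CHART reports exactly this figure is an assumption.

```python
def per_million(count, section_size):
    """Normalise a raw count to a rate per million words."""
    return count * 1_000_000 / section_size

# Hypothetical figures, for illustration only (not real BNC counts)
sections = {
    "register_a": {"hits": 1200, "words": 8_000_000},
    "register_b": {"hits": 900, "words": 3_000_000},
}

for name, s in sections.items():
    rate = per_million(s["hits"], s["words"])
    print(f"{name}: {s['hits']} hits, {rate:.1f} per million words")
```

Here register_a has more hits in total, but the word is twice as frequent in register_b once section size is taken into account.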

Using VIEW to search for collocates

We use the ‘surrounding’ display … remembering to:

• Make sure we’re in the TABLE display

• Define the size of window (the smaller the window, the closer the collocates will be to our search word)

• Set the ‘min freq’ (e.g. any number between 2 and 7)

• Tick the ‘limit’ box

• Choose the register
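To see what the window size and ‘min freq’ settings are doing, here is a minimal sketch of a collocate count in Python. It assumes the simplest possible definition (raw co-occurrence within a fixed number of words either side) and makes no claim about how VIEW calculates its figures. The function name collocates and the sample text are invented.

```python
from collections import Counter

def collocates(tokens, node, window=2, min_freq=2):
    """Count words within `window` tokens either side of `node`,
    keeping only those seen at least `min_freq` times."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the search word itself
                counts[tokens[j]] += 1
    return {w: n for w, n in counts.items() if n >= min_freq}

# Invented sample text (not BNC data)
text = ("the stock market fell as the labour market tightened and "
        "the stock market share of the firm rose in the free market")
tokens = text.split()

print(collocates(tokens, "market", window=2, min_freq=2))
print(collocates(tokens, "market", window=1, min_freq=2))
```

Narrowing the window from 2 to 1 drops ‘the’ and leaves only ‘stock’, which shows how a small window favours words that sit right next to the search word.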

Using VIEW to search for collocates

• Search word = market

• Click on ‘surrounding’

• Register = W_commerce

• Tick limit box

What strikes you about these results?

Replicate the search on your computer, and then answer the following:

• Are there any collocates that are predictable in your view?

• Do any of the collocates of ‘market’ surprise you?

We can use a similar process to search for antonyms and synonyms …

Comparing synonyms

• ‘Search String’ to worker/employee

• ‘Surrounding’ on (5/5 window)

• ‘Register’ = W_commerce

• ‘Limit’ to on

• ‘Min freq.’ to 5

Comparing synonyms

Your chosen words

No. of times that word X appears near to chosen words

Group task: The collocates of worker/employee

• What collocates with ‘worker’?

• What collocates with ‘employee’?

• Change your search so that you use the whole BNC … register 1 = -- IGNORE --

– Have the collocates for ‘worker’ remained the same?

– Have the collocates for ‘employee’ remained the same?

Lexical priming and semantic prosody

Lexical priming: “Every word is primed for use in discourse as a result of the cumulative effects of an individual's encounters with the word...Every word is primed to occur with particular words; these are its collocates.” (Hoey 2005)

Semantic prosody: …occurs when the habitual collocates of a word (or phrase) colour its meaning so it can no longer be seen in isolation from its semantic prosody.

Some questions to ponder …

• How do we study 'semantic prosody'?

• What can it tell us?

• Where can we find it?

• How can we find it?

Searching for meaningful patterns

residual/core meaning

DENOTATION = literal meaning

COLLOCATION = patterns of words appearing together

COLLIGATION = collocation patterns based on syntactic groups rather than individual words

SEMANTIC ASSOCIATION:

– semantic PREFERENCE = tendency of a word to keep company with a semantic set or class; some members of this set or class will usually be collocates.

– semantic PROSODY = colouring of meaning (? Permanently ?)

textual meaning

Patterns contribute to the creation of a network of textual meanings; computers and human interpretation can be used in conjunction to identify (and make sense of) these patterns ...

Group Task

• Do a search for the following:

– “slump”, “slumped”, “slumps”, “jinxed”, “shortfall”, “demand”

– How are they used in context and are they always negative?

– Are the meanings of any of these terms “coloured” (i.e. can they no longer be seen in isolation from their semantic prosody)?

Now let’s explore parts of speech

• What do you think the most common noun in English is?

– Write down your answers on a piece of paper

– Now do the following search to find out whether your “hunch” was correct:

[nn*]
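If you want to see the idea behind a part-of-speech search, the sketch below ranks the words in a tiny hand-tagged sample whose tags match a wildcard pattern. The tags imitate the style of the BNC's CLAWS5 tagset but the sample is invented, and treating the pattern as a simple wildcard over tags is an assumption about how such searches behave. The function name most_frequent is hypothetical.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Hand-tagged toy sample; tags imitate the CLAWS5 style but are illustrative only
tagged = [
    ("the", "at0"), ("time", "nn1"), ("of", "prf"), ("year", "nn1"),
    ("people", "nn0"), ("spend", "vvb"), ("time", "nn1"), ("in", "prp"),
    ("london", "np0"), ("every", "at0"), ("year", "nn1"), ("time", "nn1"),
]

def most_frequent(tagged_tokens, tag_pattern, n=3):
    """Rank words whose tag matches a wildcard pattern such as 'nn*'."""
    counts = Counter(word for word, tag in tagged_tokens
                     if fnmatchcase(tag, tag_pattern))
    return counts.most_common(n)

print(most_frequent(tagged, "nn*"))        # common nouns only
print(most_frequent(tagged, "n*", n=10))   # any tag beginning with n
```

Notice that the broader pattern also picks up the proper noun in this sample, so the choice of pattern affects which words end up in the list.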

The most frequent nouns in the BNC

We search for nouns by including [n*] here …

What strikes you about the results?

Most frequent nouns in the spoken section of the BNC (= 10 million words)

Notice that TIME is now the second most frequent noun …

but there are a lot of other nouns relating to periods of time …

Indeed, YEAR, DAY, YEARS, WEEK, NIGHT and MORNING are all in the top 25!

Question: How much does this result suggest we are preoccupied with time in Britain?

Other parts of speech worthy of exploration

• [vv*]

• [v*]

• [aj*]

• [av*]

CL: Best Practice

• We need to balance a quantitative approach with a qualitative approach

• We need to know our data – or be prepared to become very familiar with it!

• We need to be prepared to engage with theory

References

Barnbrook, G. (1996). Language and computers. Edinburgh: Edinburgh University Press.

Biber, D., Conrad, S. and Reppen, R. (1998). Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press.

Hoey, M. (2005). Lexical priming: a new theory of words and language. London: Routledge.

Nelson, M. ‘Computers and Semantic Prosody’. Online paper, available at http://www.kielikanava.com/semantic.html

Sinclair, J. (2004). Trust the text. London: Routledge.

Stubbs, M. (1996). Text and corpus analysis. Oxford: Blackwell.