Forecasting the beginnings of newspaper texts

Some corpus & experimental findings

Michael Hoey, Matthew Brook O’Donnell, Michaela Mahlberg and Mike Scott

BAAL 11-13 September 2008, Swansea University

The Lexical Priming claim

Whenever we encounter a word (or syllable or combination of words), we note subconsciously

the words it occurs with (its collocations),

the grammatical patterns it occurs in (its colligations),

the meanings with which it is associated (its semantic associations),

word collocates with against and a

a word against has a semantic association with sending & receiving communication

(e.g. hear a word against)

send/receive a word against has a pragmatic association with denial

(e.g. wouldn’t hear a word against)

The Lexical Priming claim

Whenever we encounter a word (or syllable or combination of words), we note subconsciously

the words it occurs with (its collocations),

the meanings with which it is associated (its semantic associations),

the pragmatics it is associated with (its pragmatic associations),

send/receive a word against has a pragmatic association with denial

(e.g. wouldn’t hear a word against)

denial + send/receive a word against has a pragmatic association with hypotheticality

(e.g. wasn’t prepared to say a word against)

The Lexical Priming claim

Whenever we encounter a word (or syllable or combination of words), we also note subconsciously

the grammatical patterns it is associated with (its colligations),

the genre and/or style and/or social situation it is used in,

whether it is used in a context we are likely to want to emulate or not

denial + send/receive a word against colligates with modal verbs

(e.g. wouldn’t hear a word against)

denial + send/receive a word against also colligates with human subjects and human prepositional objects

The Lexical Priming claim

All the features we notice prime us so that when we come to use the word ourselves, we are likely (in speech, particularly) to use it in the same lexical context, with the same grammar, in the same semantic context, as part of the same genre/style, in the same kind of social and physical context, with a similar pragmatics and in similar textual ways.

The Lexical Priming claim

Our ability to do this is what it means to know a word.

We are ALL learners, since we never stop being primed.

The only difference between the native speaker and the non-native speaker is the way that they are typically primed.

Creativity is the result of overriding some of one’s primings.

A footnote

Whenever we encounter a word (or syllable or combination of words), we note subconsciously …

the words it occurs with (its collocations),

the grammatical patterns it occurs in (its colligations),

the meanings with which it is associated (its semantic associations),

The Lexical Priming claim

Whenever we encounter a word (or syllable or combination of words), we also note subconsciously

the positions in a text that it occurs in, e.g. does it like to begin sentences? Does it like to start paragraphs? (its textual colligations),

the genre and/or style and/or social situation it is used in

Research Question

• Do certain words and groups of words exhibit preferences for particular textual positions, such as the beginnings of texts and paragraphs? (Once upon a time is the canonical example.)

• If they do, how can these items be discovered in a corpus?

AHRC Textual Priming Project

Using a corpus of Home News articles from the Guardian/Observer newspaper 1998-2004

◦ Approx. 54 million words

◦ 113,288 articles

Each sentence in the body of each article is classified according to its position

◦ TISC (Text-Initial Sentence Corpus) – first sentence of first paragraph

◦ PISC (Paragraph-Initial Sentence Corpus) – first sentence of any subsequent paragraph

◦ NISC (Non-Initial Sentence Corpus) – any non-initial sentence

Thanks to AHRC

Method: Sentence classification

[TISC sentence] More wet weather was predicted across Britain today as experts warned many areas were already saturated with rain.

[PISC sentence] On Wednesday and Thursday a brief respite should see most of the country becoming fine, with heavy rain only expected across parts of Northern Ireland. [NISC sentence] But by Friday, much of England and Wales will again be hit by storms and further downpours.

[PISC sentence] So far, Britain's recent storms have already claimed the lives of six people. [NISC sentence] Yesterday, insurers said the cost of the cleanup could run into tens of millions of pounds.
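The classification rule can be sketched in a few lines of Python. This is only an illustration, not the project's actual tooling: the function names are hypothetical, and the sentence splitter is a deliberately naive assumption (break after ".", "!" or "?" followed by whitespace) that would need replacing for real newspaper text.

```python
import re
from typing import List, Tuple


def split_sentences(paragraph: str) -> List[str]:
    """Naive splitter (an assumption): break after ., ! or ? plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def classify_sentences(paragraphs: List[str]) -> List[Tuple[str, str]]:
    """Label each sentence of an article body as TISC, PISC or NISC."""
    labelled = []
    for p_index, paragraph in enumerate(paragraphs):
        for s_index, sentence in enumerate(split_sentences(paragraph)):
            if s_index > 0:
                label = "NISC"   # any non-initial sentence
            elif p_index == 0:
                label = "TISC"   # first sentence of first paragraph
            else:
                label = "PISC"   # first sentence of a subsequent paragraph
            labelled.append((label, sentence))
    return labelled


# The weather article used on the slide, one string per paragraph.
article = [
    "More wet weather was predicted across Britain today as experts warned "
    "many areas were already saturated with rain.",
    "On Wednesday and Thursday a brief respite should see most of the country "
    "becoming fine, with heavy rain only expected across parts of Northern "
    "Ireland. But by Friday, much of England and Wales will again be hit by "
    "storms and further downpours.",
    "So far, Britain's recent storms have already claimed the lives of six "
    "people. Yesterday, insurers said the cost of the cleanup could run into "
    "tens of millions of pounds.",
]

labels = [label for label, _ in classify_sentences(article)]
print(labels)
```

For the example article this yields one TISC sentence, two PISC sentences and two NISC sentences.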

Guardian Home News 1998-2004

Summary of positional subcorpora

                                               TISC         PISC         NISC
tokens                                    3,122,037   12,521,902   19,338,590
types                                        58,432      127,038      141,793
type/token ratio (TTR), as tokens ÷ types     53.43        98.57       136.39
sentences                                   113,288      607,125    1,064,493
mean sentence length (in words)                  28           21           18
std. dev.                                     11.11         9.68         9.88
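As a quick arithmetic check, the ratio row and the mean sentence lengths can be recomputed from the token, type and sentence counts in the table. The sketch below assumes the ratio is tokens divided by types, which is what the published figures correspond to. The dictionary layout and function names are illustrative only.

```python
# Counts copied from the summary table.
subcorpora = {
    "TISC": {"tokens": 3_122_037, "types": 58_432, "sentences": 113_288},
    "PISC": {"tokens": 12_521_902, "types": 127_038, "sentences": 607_125},
    "NISC": {"tokens": 19_338_590, "types": 141_793, "sentences": 1_064_493},
}


def tokens_per_type(stats: dict) -> float:
    """Average number of tokens per distinct word form."""
    return stats["tokens"] / stats["types"]


def mean_sentence_length(stats: dict) -> float:
    """Average number of word tokens per sentence."""
    return stats["tokens"] / stats["sentences"]


for name, stats in subcorpora.items():
    print(
        f"{name}: ratio={tokens_per_type(stats):.2f}, "
        f"mean length={mean_sentence_length(stats):.0f}"
    )
```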

Method: Intra-textual Key Word Analysis

• Compare the frequency of words and clusters in one section of text with their frequency in another

• For example, fresh occurs significantly more frequently in text-initial sentences (TISC) than in non-initial sentences (NISC)

• fresh is a text-initial key word

• It also exhibits distinctive patterns in TISC contexts in terms of collocates:

• fresh {row, controversy, embarrassment}
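The slides do not say which significance statistic was used. A common choice for keyness in corpus work is the log-likelihood (G2) statistic, so the sketch below assumes it. The threshold of 15.13 (roughly p < 0.0001 at one degree of freedom) and the raw counts for fresh are also assumptions made purely for illustration; only the subcorpus sizes come from the summary table.

```python
import math


def log_likelihood(freq_a: int, size_a: int, freq_b: int, size_b: int) -> float:
    """Two-corpus log-likelihood (G2) for one item's frequency.

    freq_* are raw counts of the item; size_* are corpus sizes in tokens.
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def is_key(freq_a: int, size_a: int, freq_b: int, size_b: int,
           threshold: float = 15.13) -> bool:
    """Key in corpus A: relatively more frequent in A and G2 over the threshold."""
    overused_in_a = freq_a / size_a > freq_b / size_b
    return overused_in_a and log_likelihood(freq_a, size_a, freq_b, size_b) >= threshold


TISC_TOKENS = 3_122_037   # from the summary table
NISC_TOKENS = 19_338_590  # from the summary table

# Hypothetical raw counts for "fresh", invented for this example.
fresh_in_tisc, fresh_in_nisc = 500, 600

print(is_key(fresh_in_tisc, TISC_TOKENS, fresh_in_nisc, NISC_TOKENS))
```

Requiring the item to be relatively more frequent in the first corpus keeps the comparison directional, so a TISC_NISC list and a NISC_TISC list contain different items.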

Method: Comparative KW lists

Take the pair-wise comparisons for TISC, PISC and NISC and create Key Word and Key Cluster lists:

TISC_NISC

TISC_PISC

PISC_NISC

PISC_TISC

NISC_TISC

NISC_PISC

Method: Key Word/Cluster Matrix

Each word/cluster scored according to whether (Y) or not (N) it is found on each of the six lists:

                    TISC_NISC  TISC_PISC  PISC_NISC  PISC_TISC  NISC_TISC  NISC_PISC
yesterday               Y          Y          Y          N          N          N
said                    N          N          Y          Y          Y          N
also                    N          N          N          Y          Y          N
recall                  N          N          N          N          Y          N
it was announced        Y          Y          N          N          N          N
revealed that           Y          N          Y          N          N          N
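One way to picture the scoring step: given the six key lists as sets, each item receives a six-letter Y/N string in the column order of the matrix. The sketch below is a minimal illustration with hypothetical names. The toy key lists are reconstructed only from the six example rows on the slide and are not the project's real lists.

```python
from typing import Dict, Set

# Column order as shown in the matrix.
LIST_ORDER = [
    "TISC_NISC", "TISC_PISC", "PISC_NISC",
    "PISC_TISC", "NISC_TISC", "NISC_PISC",
]


def pattern_for(item: str, key_lists: Dict[str, Set[str]]) -> str:
    """Return the Y/N membership pattern of an item across the six lists."""
    return "".join("Y" if item in key_lists[name] else "N" for name in LIST_ORDER)


# Toy key lists, rebuilt from the example rows only.
key_lists = {
    "TISC_NISC": {"yesterday", "it was announced", "revealed that"},
    "TISC_PISC": {"yesterday", "it was announced"},
    "PISC_NISC": {"yesterday", "said", "revealed that"},
    "PISC_TISC": {"said", "also"},
    "NISC_TISC": {"said", "also", "recall"},
    "NISC_PISC": set(),
}

for item in ("yesterday", "said", "also", "recall",
             "it was announced", "revealed that"):
    print(f"{item:18s} {pattern_for(item, key_lists)}")
```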

Categories from patterns

From our corpus there are 18 resulting patterns, covering:

◦ 4467 words

◦ 50861 clusters

Here we focus on four patterns:

1. Text-Initial (YYNNNN & YNNNNN)

2. Paragraph-Initial (NNYYNN & NNYNNN)

3. TI and PI (YNYNNN)

4. Non-initial (NNNNYY)
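Mapping a Y/N pattern to one of the four categories is a simple lookup. The sketch below uses the fuller pattern sets given in the headings of the individual category slides, which each add one pattern to those listed here. The names `CATEGORY_PATTERNS` and `categorise` are hypothetical, and any pattern outside the four categories is simply labelled "other".

```python
# Pattern sets as given in the category slide headings.
CATEGORY_PATTERNS = {
    "Text-initial": {"YYNNNN", "YNNNNN", "YYNNNY"},
    "Paragraph-initial": {"NNYYNN", "NNYNNN", "NNNYNN"},
    "Text- & paragraph-initial": {"YNYNNN", "YNYYNN"},
    "Non-initial": {"NNNNYY", "NNNYYY", "NYNNYY"},
}


def categorise(pattern: str) -> str:
    """Return the category for a six-letter Y/N pattern, or 'other'."""
    for category, patterns in CATEGORY_PATTERNS.items():
        if pattern in patterns:
            return category
    return "other"


print(categorise("YYNNNN"))  # pattern of "it was announced" in the matrix
print(categorise("YNYNNN"))  # pattern of "revealed that" in the matrix
print(categorise("YYYNNN"))  # pattern of "yesterday" in the matrix
```

Of the matrix examples, it was announced falls into the text-initial category and revealed that into the text- and paragraph-initial category, while yesterday has one of the remaining patterns not discussed here.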

Category 1: Text-initial (YYNNNN, YNNNNN & YYNNNY)

                     TISC    PISC    NISC
ONE OF BRITAIN’S    132.0    13.4     9.0
A REPORT BY THE      16.0     6.8     3.1
ARE TO BE           271.9    23.8    24.7
THAT COULD          106.7    41.4    61.7
AFTER BEING         334.4    67.8    54.5

(normalized to occurrences per million words)

• 1,600 (36%) of our key words and 29,303 (58%) of our key clusters belong to this category
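Because the three subcorpora differ greatly in size, raw counts are not comparable until they are normalized. The sketch below shows the per-million calculation and the percentage shares quoted in the bullet. The raw count of 1,000 is a made-up value for illustration, while the subcorpus size and the category totals are taken from the slides.

```python
def per_million(raw_count: int, corpus_tokens: int) -> float:
    """Normalize a raw frequency to occurrences per million words."""
    return raw_count / corpus_tokens * 1_000_000


def share_percent(part: int, whole: int) -> float:
    """Percentage share of a category within the full key word/cluster set."""
    return 100 * part / whole


TISC_TOKENS = 3_122_037  # from the summary of positional subcorpora

# Hypothetical raw count, invented for this example.
rate = per_million(1_000, TISC_TOKENS)
print(f"{rate:.1f} per million words")

# Category 1 shares: 1,600 of 4,467 key words, 29,303 of 50,861 key clusters.
print(f"{share_percent(1_600, 4_467):.0f}% of key words")
print(f"{share_percent(29_303, 50_861):.0f}% of key clusters")
```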

Category 2: Paragraph-initial (NNYYNN, NNYNNN & NNNYNN)

                            TISC    PISC    NISC
THE FINDINGS                13.1    43.5    16.7
CAME AS                      5.8    47.9     9.8
IS THE LATEST               10.6    17.2     3.5
GENERAL SECRETARY OF THE    15.4    58.5    11.6
CONFIRMED THAT              43.2    66.5    32.3

(normalized to occurrences per million words)

• 732 (16%) of our key words and 5,755 (11%) of our key clusters belong to this category

Category 3: Text- & Paragraph-initial (YNYNNN & YNYYNN)

                        TISC    PISC    NISC
THE CONTROVERSY         28.5    17.2     6.6
HEAD OF THE             93.5    89.4    46.8
DECISION TO            151.8   130.8    73.9
SAID YESTERDAY THAT     80.1    67.5    26.6
ISSUED A                51.9    35.9    20.3

(normalized to occurrences per million words)

• 253 (6%) of our key words and 913 (2%) of our key clusters belong to this category

Category 4: Non-initial (NNNNYY, NNNYYY & NYNNYY)

                TISC     PISC     NISC
HAVE TO        174.2    352.2    616.1
WHILE          530.1    589.3    701.0
BUT           1262.0   4164.3   6068.6
BE ABLE TO      81.0    108.9    165.5
GOING TO        78.2    268.8    494.9

(normalized to occurrences per million words)

• 486 (11%) of our key words and 3,105 (6%) of our key clusters belong to this category

Page 61: Forecasting the beginnings of newspaper texts Some corpus & experimental findings Michael Hoey, Matthew Brook O’Donnell, Michaela Mahlberg and Mike Scott.

Method: Sentence classification

More wet weather was predicted across Britain today as experts warned many areas were already saturated with rain.

On Wednesday and Thursday a brief respite should see most of the country becoming fine, with heavy rain only expected across parts of Northern Ireland. But by Friday, much of England and Wales will again be hit by storms and further downpours.

So far, Britain's recent storms have already claimed the lives of six people. Yesterday, insurers said the cost of the cleanup could run into tens of millions of pounds.

Slide labels: TISC sentence (the text-initial sentence), PISC sentence (the first sentence of each later paragraph, labelled twice), NISC sentence (a non-initial sentence).
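
The classification shown on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: paragraphs are taken to be separated by blank lines, sentences are split naively after `.`, `!` or `?`, and the function name `classify_sentences` is hypothetical. The slides do not say how the authors actually segmented their corpus.

```python
import re


def classify_sentences(text):
    """Label each sentence by position: TISC, PISC or NISC.

    TISC: first sentence of the text.
    PISC: first sentence of any later paragraph.
    NISC: every other (non-initial) sentence.
    """
    labelled = []
    # Assumption: paragraphs are separated by one blank line.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    for p_index, paragraph in enumerate(paragraphs):
        # Naive splitter: break after ., ! or ? followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", paragraph)
        for s_index, sentence in enumerate(sentences):
            if s_index > 0:
                label = "NISC"
            elif p_index == 0:
                label = "TISC"
            else:
                label = "PISC"
            labelled.append((label, sentence))
    return labelled
```

Applied to the three-paragraph weather report above, this would label the opening sentence TISC, the sentences beginning "On Wednesday" and "So far" PISC, and the remaining two NISC. A real corpus would need a more robust sentence splitter, since abbreviations and quotations defeat the simple pattern used here.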

Method: Sentence classification

TISC sentence: More wet weather was predicted across Britain today as experts warned many areas were already saturated with rain.

TISC 79 per million sentences
PISC 5 per million sentences
NISC 22 per million sentences

Method: Sentence classification

TISC sentence: More wet weather was predicted across Britain today as experts warned many areas were already saturated with rain.

TISC 117 per 100,000 sentences
PISC 25 per 100,000 sentences
NISC 12 per 100,000 sentences

Method: Sentence classification

TISC sentence: More wet weather was predicted across Britain today as experts warned many areas were already saturated with rain.

TISC 48 per thousand sentences
PISC 7 per thousand sentences
NISC 6 per thousand sentences

Method: Sentence classification

TISC sentence: More wet weather was predicted across Britain today as experts warned many areas were already saturated with rain.

TISC 17 per thousand sentences
PISC 4 per thousand sentences
NISC 3 per thousand sentences

Theoretical Implications

Confirmation of prediction made by lexical priming theory

Knowing a word includes knowing where it will be used in a text

Clusters are more important than single words in textual positioning (cf. Wray)

Applied Linguistic Implications

Translation

Academic writing

Authentic data

Death (or redefinition) of the topic sentence

Applied Linguistic Implications

Learning a word or phrase includes learning its characteristic textual positioning, or else a learner’s text will read awkwardly

Fabricated texts are unlikely to preserve the natural textual colligations of the language if the intention of these texts is to illustrate other features

Textual colligation is where discourse analysis and dictionaries meet.