LIN 3098 Corpus Linguistics
Albert Gatt
In this lecture
We proceed with our discussion of how corpus-based studies influence the study of grammar.
Focus: lexico-grammar
Uses of corpora in grammar studies
The use of corpora to study grammar is relatively recent.
With corpora, the unit of analysis tends to be the word (tokens/types), so studies of lexis are a natural application.
Corpus-based study of grammar has in fact emphasised the role of lexis.
It has also been aided by recent developments in automatic POS tagging and parsing.
Additional grammatical information enables the search and analysis of complex structures.
Part 1
The relationship between grammar and lexis
Degrees of abstraction
We have already looked at the use of corpora in studying collocations.
Given sufficient grammatical annotation, we can look at collocational patterns at different degrees of abstraction.
Degrees of abstraction
Example: all preceding collocates of the noun time in the BNC.
Not all collocates are equally interesting: there is a lot of noise when searching for a single word!
word frequency
the 266
first 104
this 96
of 72
same 67
a 65
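As a rough illustration of how a frequency list like the one above can be produced, the following Python sketch counts the word immediately to the left of a node word. The text is an invented toy example, not BNC data, and the function name preceding_collocates is hypothetical.

```python
from collections import Counter

# Invented toy text, NOT BNC data; real counts would come from a corpus tool.
text = ("the time has come , the first time was a long time ago , "
        "and this time the time is right")
tokens = text.lower().split()

def preceding_collocates(tokens, node):
    """Count the word immediately to the left of each occurrence of `node`."""
    return Counter(
        tokens[i - 1] for i in range(1, len(tokens)) if tokens[i] == node
    )

left = preceding_collocates(tokens, "time")
for word, freq in left.most_common():
    print(word, freq)
```

Even in this tiny sample, the most frequent left collocate is a function word (the), which is the kind of noise that a purely word-based search tends to return.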
Practical task 1
Let’s try to make our search more interesting, by focusing on a combination of lexical and grammatical material.
Conduct a search for: Any adjective followed by the noun time
Degrees of abstraction
Example: only adjectival collocates of the noun time in the BNC.
We can make grammatically informed queries: [ADJ + time].
This allows us to focus on what is truly of interest.
word frequency
long 38
good 11
spare 7
little 6
present 6
whole 5
Practical task 2
We can go further in abstracting away from specific lexical material.
Conduct a search for: Any adjective followed by any noun
Degrees of abstraction
Suppose we were interested in all adjective-noun combinations: [ADJ + N].
Given a sufficiently expressive query language (such as CQL), we can extract grammatically interesting collocations.
ADJ+N Freq.
prime minister 102
other hand 65
local authorities 44
long time 42
soviet union 41
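The three searches so far (any word + time, ADJ + time, ADJ + N) can be seen as the same query at different degrees of abstraction. The Python sketch below makes this concrete under stated assumptions: the tagged tokens are invented, the tagset (ADJ, NOUN, DET, ...) is a simplification rather than the BNC tagset, and the match function is a hypothetical stand-in for a real query language such as CQL.

```python
from collections import Counter

# Invented toy data with a simplified tagset; not BNC output.
tagged = [
    ("the", "DET"), ("prime", "ADJ"), ("minister", "NOUN"), ("spent", "VERB"),
    ("a", "DET"), ("long", "ADJ"), ("time", "NOUN"), ("with", "PREP"),
    ("local", "ADJ"), ("authorities", "NOUN"), ("this", "DET"), ("time", "NOUN"),
    ("the", "DET"), ("prime", "ADJ"), ("minister", "NOUN"), ("had", "VERB"),
    ("a", "DET"), ("good", "ADJ"), ("time", "NOUN"),
]

def match(tagged, pattern):
    """Count token sequences matching `pattern`.

    Each pattern slot is a (word, tag) pair; None means "anything".
    """
    n = len(pattern)
    hits = Counter()
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        if all(
            (pw is None or pw == w) and (pt is None or pt == t)
            for (pw, pt), (w, t) in zip(pattern, window)
        ):
            hits[" ".join(w for w, _ in window)] += 1
    return hits

any_time = match(tagged, [(None, None), ("time", None)])   # any word + time
adj_time = match(tagged, [(None, "ADJ"), ("time", None)])  # ADJ + time
adj_noun = match(tagged, [(None, "ADJ"), (None, "NOUN")])  # ADJ + N

print(adj_noun.most_common())
```

Each step replaces a specific word with a POS constraint, so the same matching code serves all three searches; only the pattern changes.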
Limitations of these approaches
What we’ve done still retains a focus on the word.
The main purpose is to improve lexical research by incorporating a limited amount of grammatical information (usually POS).
Can we go further and really investigate grammar?
Part 2
Collocational Frameworks
Does this sound familiar?
Colourless green ideas sleep furiously
Chomsky’s example illustrates an approach to syntax where:
the primary focus is on syntactic rules;
rules manipulate lexical items of the right categories;
“grammatical” or “legal” is distinct from “sensible” or “meaningful”;
syntactic rules operate (semi-)independently of lexical items: if X is of the right category, then X can be slotted into a syntactic position.
Chicken and egg questions
When we formulate an utterance, which comes first? Syntax? Lexical items? Both in parallel?
Do particular syntactic constructions have a meaning (or communicative function)?
E.g. what is the meaning of:
the appositive that-construction: The reason that he gave was…
the extraposed it-construction: It is possible to hire a car if you want one.
Lexical approaches to grammar
Assumptions:
syntactic structures are highly sensitive to the lexical items that they can select;
structures may also have specific communicative functions or meanings;
speakers/authors convey meaning, and syntax is used as a resource to convey it;
ideally, grammar and lexis should be viewed as part and parcel of the same process;
phraseology and co-selection play an important role: in particular constructions, we find that particular words tend to co-occur with great regularity.
The idiom principle
Sinclair (1991): “a language user has available to
him or her a large number of semi-preconstructed phrases that constitute single choices, even though they might appear to be analyzable into segments”
Implications
The idiom principle suggests that speakers/writers:
don’t just apply abstract rules to build structures;
re-use bits of structure.
It also implies that bits of structure are themselves meaningful.
The idiom principle vs open choice
This principle contrasts with the “open-choice” principle.
Open choice predicts that:
syntactic rules operate independently of lexical items;
structures are constructed by applying rules and “plugging in” lexemes.
Putting the idiom principle to work
Sinclair and Renouf (1991) introduced collocational frameworks.
These were intended as a practical way to investigate the use and meaning of grammatical constructions.
A collocational framework consists of a pattern involving 3 items:
a function word;
a content word (specified via POS);
another function word.
Example: [a + Noun + of]
Collocational frameworks
Is a pattern like [a + Noun + of] a linguistic unit?
If it is, we would expect the grammatical context (a, of) to place restrictions on the semantics of the Noun in the middle (not just any noun can be used).
Practical task 3
Conduct a search for the collocational framework [a + Noun + of].
In looking at the nouns that occur here, can you spot any semantic commonalities?
What does this tell you about the way the structure itself is used, and what it usually means?
[a + Noun + of]
Nouns in this construction are often quantities: a lot of, a number of, ...
This suggests that this construction itself places a restriction on the semantics of the content words used in it.
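A minimal Python sketch of how the fillers of a collocational framework might be collected is given below. It assumes an invented toy sample with a simplified tagset, and the function name framework_fillers is hypothetical; a real study would run the equivalent query over a tagged corpus.

```python
from collections import Counter

# Invented toy data with a simplified tagset; not BNC output.
tagged = [
    ("a", "DET"), ("lot", "NOUN"), ("of", "PREP"), ("people", "NOUN"),
    ("saw", "VERB"), ("a", "DET"), ("number", "NOUN"), ("of", "PREP"),
    ("problems", "NOUN"), ("and", "CONJ"), ("a", "DET"), ("lot", "NOUN"),
    ("of", "PREP"), ("noise", "NOUN"), ("in", "PREP"), ("a", "DET"),
    ("garden", "NOUN"), ("with", "PREP"), ("a", "DET"), ("couple", "NOUN"),
    ("of", "PREP"), ("friends", "NOUN"),
]

def framework_fillers(tagged, left, filler_tag, right):
    """Count the middle words in the framework [left + filler_tag + right]."""
    fillers = Counter()
    for (w1, _), (w2, t2), (w3, _) in zip(tagged, tagged[1:], tagged[2:]):
        if w1 == left and t2 == filler_tag and w3 == right:
            fillers[w2] += 1
    return fillers

fillers = framework_fillers(tagged, "a", "NOUN", "of")
print(fillers.most_common())
```

The interesting step is the one the code cannot do: inspecting the resulting list of nouns for shared semantics (here, quantity nouns such as lot, number and couple).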
Collocational frameworks: final remarks
Sinclair and Renouf did not suggest that any string of words or pattern counts as a collocational framework.
Crucially, there has to be evidence for semantic restrictions on content words.
E.g. [Verb in NP] doesn’t count as a good pattern, because practically any verb can occur in the first position.
Part 3
Colligates
Colligations
Roughly, a colligation is a collocation at the level of part of speech.
The idea is due to Firth.
The main question is: what are the grammatical environments in which a particular word occurs?
One way of answering this question is to search for a word, and then look at the POS tags to its left and right.
Practical task 4
Conduct a search for the word consequence, specifying any word to the right and any word to the left.
Make a frequency count of node tags.
What do you observe?
Some data (Gries 2009)
Left context of consequence: Article, Adjective, ...
Right context: of, Preposition, ...
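The procedure of looking at the POS tags to the left and right of a node word can be sketched in Python as follows. The tagged sample is invented (it is not the Gries 2009 data), the tag names are a simplified assumption, and colligates is a hypothetical helper name.

```python
from collections import Counter

# Invented toy data with a simplified tagset; not the Gries (2009) data.
tagged = [
    ("a", "ART"), ("direct", "ADJ"), ("consequence", "NOUN"), ("of", "PREP"),
    ("the", "ART"), ("war", "NOUN"), (".", "PUNC"),
    ("the", "ART"), ("consequence", "NOUN"), ("of", "PREP"), ("this", "DET"),
    ("was", "VERB"), ("clear", "ADJ"), (".", "PUNC"),
    ("an", "ART"), ("inevitable", "ADJ"), ("consequence", "NOUN"),
    ("for", "PREP"), ("them", "PRON"), (".", "PUNC"),
]

def colligates(tagged, node):
    """Count the POS tags immediately left and right of the word `node`."""
    left, right = Counter(), Counter()
    for i, (word, _) in enumerate(tagged):
        if word != node:
            continue
        if i > 0:
            left[tagged[i - 1][1]] += 1
        if i + 1 < len(tagged):
            right[tagged[i + 1][1]] += 1
    return left, right

left_tags, right_tags = colligates(tagged, "consequence")
print("left:", left_tags.most_common())
print("right:", right_tags.most_common())
```

Counting tags rather than words is what distinguishes this from an ordinary collocate search: direct and inevitable are pooled as ADJ, and of and for as PREP.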
Observations
This operationalisation of the concept of colligation is closely related to the collocational frameworks of Renouf and Sinclair.
It’s primarily intended to give an idea of the grammatical environment in which a word occurs.
Limitations
Both collocational frameworks and colligations have some drawbacks:
they’re still highly word-based;
they focus only on POS (not full syntax);
their view of grammatical structure is purely linear.
Part 4
Some case studies
Example 1: It as object
Components:
non-referential use of it;
object of a verb;
followed by an NP or AdjP.
Examples (from the BNC):
Many people who use drugs regularly find it difficult to exist in a drug-free world .
You can also find it hard to remember things in court
unless they agree to do so , making it difficult for detainees to challenge the validity
Example 1 continued
Typical analysis: this construction involves extraposition:
People who use drugs find existing in a drug-free world difficult.
People who use drugs find it difficult to exist in a drug-free world.
Some empirical observations on lexis (Francis 1993):
98% of cases involve find and make;
some other verbs, such as think, consider and see, occur too.
Possible “meaning”/function of the structure:
a stereotyped way of presenting a situation in terms of how it is evaluated;
the evaluation is placed after the verb.
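To see how an observation like the dominance of find and make could be checked, here is a minimal Python sketch that looks for the linear pattern [verb + it + adjective + to] and counts verb lemmas. It rests on several assumptions: the (word, lemma, tag) triples are invented, the tagset is simplified, only the AdjP variant of the construction is covered, and a linear POS pattern cannot verify that it is really non-referential, so the hits would need manual checking.

```python
from collections import Counter

# Invented (word, lemma, tag) triples with a simplified tagset; not BNC data.
tagged = [
    ("people", "people", "NOUN"), ("find", "find", "VERB"), ("it", "it", "PRON"),
    ("difficult", "difficult", "ADJ"), ("to", "to", "TO"),
    ("exist", "exist", "VERB"),
    ("this", "this", "DET"), ("makes", "make", "VERB"), ("it", "it", "PRON"),
    ("hard", "hard", "ADJ"), ("to", "to", "TO"),
    ("remember", "remember", "VERB"),
    ("she", "she", "PRON"), ("found", "find", "VERB"), ("it", "it", "PRON"),
    ("easy", "easy", "ADJ"), ("to", "to", "TO"), ("leave", "leave", "VERB"),
    ("he", "he", "PRON"), ("saw", "see", "VERB"), ("it", "it", "PRON"),
    ("yesterday", "yesterday", "ADV"),
]

def it_object_verbs(tagged):
    """Count verb lemmas in the linear pattern VERB + it + ADJ + to."""
    verbs = Counter()
    for i in range(len(tagged) - 3):
        (_, lemma, t1), (w2, _, _), (_, _, t3), (_, _, t4) = tagged[i:i + 4]
        if t1 == "VERB" and w2 == "it" and t3 == "ADJ" and t4 == "TO":
            verbs[lemma] += 1
    return verbs

verbs = it_object_verbs(tagged)
print(verbs.most_common())
```

Counting lemmas rather than word forms pools find and found, which is what a statement about the verbs that occur in the construction requires.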
Example 2: appositive clauses
Apposition: a relation between an NP and another phrase which refers to the same thing (Leech and Svartvik, 1975).
Example: your daughter, the lawyer, is here
In English, apposition can also occur with that-clauses and to-clauses:
the news that your daughter was here
the plot to assassinate the president
Example 2: appositive clauses
Distinguished from restrictive relative clauses:
the dog that I saw yesterday
(restricts the reference of the head noun)
Appositive clause:
the fact that I came
(does not restrict the reference of the head noun; “amplifies” or “qualifies” the head noun)
Example 2: Appositives
Appositive that-clauses (BNC):
The fining of airlines plus the fact that the nationals of many refugee-producing countries
as firm as the Emperor Augustus about the principle that a ruler 's actual appearance matters less
Traditional grammars (Leech and Svartvik 1975): “head noun must be an abstract noun”
Questions:
what are the lexical restrictions here?
do they have implications for the function of this syntactic structure?
Levels of stereotypicality in syntax
Phraseological constraints: the co-selection of particular lexical items within a particular syntactic structure.
These seem to range along a continuum.
At one extreme: fixed, unchanging constructions (which behave like multi-word lexical items).
At the other: complete freedom in lexical selection.
Phraseology
Completely fixed idioms:
it never rains but it pours
Less fixed idioms, with some room for lexical manoeuvre:
put on a brave face
putting a brave face on …
put a good face on…
Semi-prepackaged phrases which allow for variation:
I haven’t the faintest/foggiest/remotest idea/notion
Highly nebulous lexico-syntactic dependencies:
be a case of X
a case of déjà vu
a case of take the money and run
…
Syntactic “fixedness”
Given the cline from fixed to flexible, some linguists (e.g. Francis 1993) suggest that the distinction between “lexicon” and “syntax” is arbitrary.
This argument is based on phraseological constraints observable only in very large corpora.
This is not too far from recent positions in Generative Grammar:
Jackendoff’s (2002) parallel architecture;
Construction Grammar (e.g. Goldberg, 1995).
The “item” and the “environment”
Francis proposes that the distinction between “lexical item” and “syntactic environment” be used only for convenience.
Proposed method:
look at a syntactic environment;
discover lexical regularities;
focus on a subset of the lexical items;
discover further generalisations about the grammar of those items.
Case study: Extraposed it-clauses
One of the most frequent adjectives is possible:
it is possible to hire a car
it is possible that it will rain
Proposed interpretations:
the that-clause is used to express possibility;
the to-clause is used to express ability.
This suggests that possible might have (at least) two different meanings.
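The first steps of this kind of analysis (look at a syntactic environment, then see which lexical items and clause types occur in it) can be sketched in Python. Everything here is an assumption made for illustration: the sentences are invented, the small sets BE_FORMS and ADJECTIVES stand in for proper lemmatisation and POS tagging, and clause_types is a hypothetical helper name.

```python
from collections import Counter

# Invented example sentences; not corpus data.
sentences = [
    "it is possible to hire a car",
    "it is possible that it will rain",
    "it was possible to see the coast",
    "it is likely that they will come",
    "it is a pity",
]

# Hypothetical mini-lexicons standing in for lemma and POS annotation.
BE_FORMS = {"is", "was"}
ADJECTIVES = {"possible", "likely"}

def clause_types(sentences):
    """Count (adjective, clause marker) pairs in 'it BE ADJ to/that'."""
    counts = Counter()
    for sentence in sentences:
        toks = sentence.split()
        for i in range(len(toks) - 3):
            if (toks[i] == "it" and toks[i + 1] in BE_FORMS
                    and toks[i + 2] in ADJECTIVES
                    and toks[i + 3] in ("to", "that")):
                counts[(toks[i + 2], toks[i + 3])] += 1
    return counts

counts = clause_types(sentences)
for (adjective, marker), freq in counts.most_common():
    print(adjective, marker, freq)
```

Cross-tabulating the adjective against the clause type is what allows a claim such as "possible occurs with both to-clauses and that-clauses" to be checked; interpreting the two uses as ability versus possibility remains a matter for the analyst.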
The grammar of possible
Further patterns involving possible:
article + superlative adjective + possible + noun: the best possible start
as … as possible …
Main idea:
specifying the grammatical environments in which an item can occur can help to specify its range of meanings;
these examples seem to confirm the ability/possibility use of possible.