Constraining the Theory - Prof. Frederick J. Newmeyer
FREDERICK J. NEWMEYER
UNIVERSITY OF WASHINGTON, UNIVERSITY OF BRITISH COLUMBIA,
AND SIMON FRASER UNIVERSITY
Class 2: Constraining the Theory
The need for constraints
Chomsky’s Syntactic Structures (1957) contrasted three formal models of syntax:
Finite-State grammars
Phrase-structure grammars
Transformational grammars
Finite-State grammars. Every rule is of the form:
Si → a Sj (a state symbol is rewritten as a terminal symbol followed by a state symbol)
Notice that an FSG can generate an infinite set of sentences.
But it assigns no structure to these sentences.
And it cannot generate sentences like:
If Si, then Sj
Either Si or Sj
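A minimal Python sketch of a finite-state grammar, assuming a hypothetical toy rule set (the states S0 to S2 and the words are invented for illustration). Each rule pairs a terminal word with the next state, and None marks a point where the sentence may end. The loop on S1 is what makes the set of sentences infinite, while each output is only a flat string of words: nothing groups "the old man" into a unit.

```python
from collections import deque

# Hypothetical toy finite-state grammar. Every rule has the form
# S_i -> a S_j: a terminal word followed by the next state symbol.
# A next state of None means the sentence may end after this word.
RULES = {
    "S0": [("the", "S1")],
    "S1": [("old", "S1"), ("man", "S2")],  # S1 -> old S1 loops, so the language is infinite
    "S2": [("comes", None)],
}

def generate(max_words):
    """Breadth-first enumeration of every sentence of at most max_words words."""
    results = []
    queue = deque([("S0", [])])
    while queue:
        state, words = queue.popleft()
        for word, next_state in RULES[state]:
            sentence = words + [word]
            if len(sentence) > max_words:
                continue  # prune: this path can only get longer
            if next_state is None:
                results.append(" ".join(sentence))
            else:
                queue.append((next_state, sentence))
    return results

for s in generate(5):
    print(s)
```

Raising max_words yields ever longer sentences with more repetitions of "old", but the output never contains any grouping of words into constituents.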
Phrase-structure grammars. Every rule is of the form:
A → B C
B → a D
These rules give you a tree diagram: [A [B a D] C]
• So they can handle structure and constituency.
• But they cannot handle discontinuous elements or long-distance dependencies:
• Mary is work+ing
• Who did you see?
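The two rules can be applied mechanically to build the tree. This sketch assumes a hypothetical encoding in which each symbol has a single expansion and symbols without a rule (C, D, and the terminal a) are left as they are; a real phrase-structure grammar would normally allow alternative expansions. The bracketed output shows that a and D form a constituent, B.

```python
# The two phrase-structure rules from the slide: A -> B C and B -> a D.
# Symbols with no rule (C, D, and the terminal a) are left unexpanded.
RULES = {"A": ["B", "C"], "B": ["a", "D"]}

def expand(symbol):
    """Apply RULES top-down and return a nested (label, children) tree."""
    if symbol not in RULES:
        return symbol
    return (symbol, [expand(child) for child in RULES[symbol]])

def bracket(tree):
    """Render a tree as labelled bracketing."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    return "[" + label + " " + " ".join(bracket(c) for c in children) + "]"

print(bracket(expand("A")))  # [A [B a D] C]
```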
Transformational grammars. Their rules transform trees into trees.
The problem: Transformational rules are too powerful!
They allow things to happen that never occur in any human language:
To reverse all of the words in a sentence.
To move a preposition in the highest clause and attach it to an adjective in the lowest clause.
To delete every other word.
So the important thing was to constrain transformational rules — to make them less powerful.
This was especially important, given that a child acquires a transformational grammar.
The less powerful the rules, the smaller the ‘search space’ for the child and therefore the easier to explain the rapidity of language acquisition.
Island constraints
Over the years, the most important constraints have been the so-called ‘island constraints’.
They prohibit movement out of a particular syntactic configuration.
A configuration out of which nothing can move is called an ‘island’.
The first island constraint was the A-over-A principle (from Chomsky 1964).
Mary saw the boy walking to the railroad station is ambiguous:
(i) Mary saw [NP the boy [VP walking to the railroad station]]
(ii) Mary saw [NP [NP the boy] [VP walking to the railroad station]]
Who did Mary see walking to the railroad station? is unambiguous.
It can be a question corresponding to (i), but not to (ii):
(i) Mary saw [NP the boy [VP walking to the railroad station]]
(ii) Mary saw [NP [NP the boy] [VP walking to the railroad station]]
The idea of the A-over-A principle: You can’t move an NP (or a VP, S, etc.) if it is dominated by another NP (or VP, S, etc.).
So A1 is an island, preventing the movement of x.
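The A-over-A check can be sketched as a test on a tree: a node cannot be moved if some node above it has the same category. The tuple encoding, the helper names, and the flat VP are assumptions made for this illustration, and the sketch simplifies the principle, which was originally stated over transformations. Applied to structure (ii), it blocks moving the inner NP the boy, which is why Who did Mary see walking to the railroad station? cannot correspond to (ii).

```python
# Trees are (label, [children]); leaves are plain strings.
# Structure (ii): Mary saw [NP [NP the boy] [VP walking to the railroad station]]
TREE_II = ("S", [("NP", ["Mary"]),
                 ("VP", [("V", ["saw"]),
                         ("NP", [("NP", ["the", "boy"]),
                                 ("VP", ["walking", "to", "the", "railroad", "station"])])])])

def node_at(tree, path):
    """Follow a path of child indices from the root and return that node."""
    for i in path:
        tree = tree[1][i]
    return tree

def a_over_a_blocks(tree, path):
    """True if the node at `path` is dominated by another node with the same label.

    Under the A-over-A principle the higher node of that category is an
    island, so the lower node cannot be moved out of it.
    """
    label = node_at(tree, path)[0]
    # Proper ancestors are the prefixes of the path, excluding the full path.
    return any(node_at(tree, path[:k])[0] == label for k in range(len(path)))

print(a_over_a_blocks(TREE_II, [1, 1, 0]))  # True: inner NP 'the boy' is inside a larger NP
print(a_over_a_blocks(TREE_II, [1, 1]))     # False: the larger NP itself may move
```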
JOHN R. ROSS
In his 1967 dissertation, John R. Ross showed that the A-over-A principle did not work.
In its place, he proposed six or seven other constraints.
Complex Noun Phrase Constraint:
Element A cannot be moved out of NP1, a noun phrase with a lexical head noun that dominates the clause containing A:
*Who do you believe the claim that John saw?
The Coordinate Structure Constraint:
Neither conjunct in a coordinate structure can be moved.
*What did John eat beans and?
Ross proposed several other constraints as well.
We still talk today about ‘Ross constraints’.
However, Chomsky in 1973 found a way to unify most of them under one simple constraint.
The crucial notion is ‘bounding node’.
Bounding nodes (for English) are S (= IP) and NP (= DP).
Subjacency (for English): No element may be moved across more than one bounding node.
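Who did you believe Bill saw? is a grammatical sentence. Who moves in two steps: the first movement (to the COMP of the embedded clause) crosses only one bounding node (the embedded S), and the second (to the COMP of the main clause) also crosses only one (the main S).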
*Who did you believe the claim that Bill saw? is an ungrammatical sentence. The first movement crosses only one bounding node (the embedded S), but the second crosses two (the NP the claim that Bill saw and the main S).
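[S you wonder [S' where [S John put what]]]
*What did you wonder where John put? is impossible because it is a Subjacency violation: where already fills the COMP of the embedded clause, so what cannot stop there and must cross two S nodes in a single movement.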
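Subjacency can be sketched as a count over each step of movement. The derivations below are hand-encoded assumptions (each step is listed as the labels of the nodes it crosses, following the analyses of the three examples above), not the output of a parser; S' (the clause including COMP) is not a bounding node for English.

```python
# English bounding nodes, per the slide: S (= IP) and NP (= DP).
BOUNDING_NODES = {"S", "NP"}

def bounding_nodes_crossed(step):
    """Count the bounding nodes among the labels crossed by one movement step."""
    return sum(1 for label in step if label in BOUNDING_NODES)

def obeys_subjacency(derivation):
    """Subjacency: no single movement step may cross more than one bounding node."""
    return all(bounding_nodes_crossed(step) <= 1 for step in derivation)

# Hypothetical hand encoding: one list per movement step, giving the labels
# of the nodes that dominate the launching site but not the landing site.
EXAMPLES = {
    # who: object of 'saw' -> embedded COMP -> main-clause COMP
    "Who did you believe Bill saw?": [["VP", "S"], ["S'", "VP", "S"]],
    # the second step must also leave the NP 'the claim that ...'
    "*Who did you believe the claim that Bill saw?": [["VP", "S"], ["S'", "NP", "VP", "S"]],
    # 'where' fills the embedded COMP, so 'what' moves in a single step
    "*What did you wonder where John put?": [["VP", "S", "S'", "VP", "S"]],
}

for sentence, derivation in EXAMPLES.items():
    counts = [bounding_nodes_crossed(step) for step in derivation]
    print(sentence, counts, "OK" if obeys_subjacency(derivation) else "violation")
```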
SIMPLIFYING THE SYNTACTIC COMPONENT
The trend in Chomskyan thinking has been to reduce the scope and complexity of the ‘narrow syntax’ in two ways:
1. To derive syntactic complexity, to the extent possible, from independently needed principles, and
2. To shift the burden of accounting for specific phenomena from the syntax to other components (lexicon, morphology, interfaces).
For the most part, it is the first (more interesting!) strategy that has dominated syntactic theory.
In early transformational syntax, grammars were lists of complex rules, the rule lists of one language not looking very much like the rule lists of another language.
Throughout most of the history of generative syntax, language-particular rules have been simplified (or eliminated).
More general principles have replaced them.
A good example of deriving syntactic facts from independently needed principles:
Joe Emonds’s Structure-Preserving Constraint: A large class of T-rules can move an element only into a position that could have been created by the Phrase-Structure rules.
PASSIVE RULE
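Russia defeated Germany.
Germany was defeated by Russia.
BUT NOT:
a. *Germany Russia was defeated by
b. *Germany was Russia defeated by
c. *Germany was by Russia defeated
d. *Germany was defeated Russia by
None of (a-d) has its elements in positions that the phrase-structure rules could have created, so the Structure-Preserving Constraint rules them out on its own. Therefore, the passive rule can be greatly simplified.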
THE LEXICALIST HYPOTHESIS
THE LEXICALIST HYPOTHESIS: Transformational rules cannot change the syntactic category of an item, perform derivational morphology, etc.
Before the late 1960s (and in Generative Semantics), John’s refusal of the offer was derived transformationally from something like the fact that John refused the offer.
But that fails to explain why
John’s three unexpected refusals of the offer
has exactly the same structure as
John’s three boring books about surfing
In other words, you lose a generalization by deriving nouns from verbs.
Another argument for the LH is that the relationship between verbs and their corresponding nominalizations can be very idiosyncratic:
motion, but *mote; usher, but *ush; tuition, but *tuit; etc.
profess (‘declare openly’) – professor (‘university teacher’) – profession (‘career’)
ignore (‘pay no attention to’) – ignorance (‘lack of knowledge’) – ignoramus (‘very stupid person’)
person (‘human individual’) – personal (‘private’) – personable (‘friendly’) – personality (‘character’) – personalize (‘tailor to the individual’) – impersonate (‘pass oneself off as’)
social (‘pertaining to society’; ‘interactive with others’) – socialist (‘follower of a particular political doctrine’) – socialite (‘member of high society’)
SURFACE SEMANTIC INTERPRETATION
The lexicalist hypothesis emphasized the importance of ‘shallow’ levels of syntactic structure.
Shallow levels became even more important with the introduction of surface semantic interpretation in the late 1960s.
RAY JACKENDOFF
[S [NP [Q many] [N men]] [VP [V read] [NP [Q few] [N books]]]]
many: INTERPRET WITH WIDE SCOPE; few: INTERPRET WITH NARROW SCOPE
THE EXTENDED STANDARD THEORY
But some aspects of interpretation still seemed to take place at Deep Structure.
For example, interpretation seems to have to take place before Passive, since in a passive sentence like Mary was seen by John, Mary is interpreted in object position.
The model with both Deep and Surface Structure rules of interpretation was called the Extended Standard Theory (EST).
TRACES AND OTHER ABSTRACT ELEMENTS
The ‘price paid’ for the addition of constraints on movement, surface interpretation, and so on was an explosion in the number of ‘invisible elements’ like traces, PRO, pro, and so on.
Look at how traces work(ed).
[S [NP Mary] [AUX was] [VP [V seen] [NP t]]]
But the biggest plus for traces and other empty elements was that they seemed to allow for the unification of constraints on movement and constraints on anaphora.
Mary helped herself and
Mary seemed t to be happy
are grammatical for the same reason.
*Mary asked [John to help herself] and
*Mary seemed [to be true t to be happy]
are ungrammatical for the same reason.
MOVE-α
By the mid 1970s the transformational component had been ‘cleaned up’ to the point where it was suggested (by Chomsky) that there could be one all-purpose movement rule, Move-α.
Passive, Dative, wh-movement, Extraposition, Scrambling, Subject-Aux Inversion → Move-α
ALL INTERPRETATION NOW ON THE SURFACE
Note that traces allow all interpretation to take place on the surface.
Johni was seen ti by Mary
The trace of John marks its original D-Structure position.
The resulting model (also called the Extended Standard Theory) was dominant from the mid 1970s to the mid 1990s.
THE GOVERNMENT-BINDING THEORY (GB)
The next major step forward in syntactic theory was the Government-Binding Theory (GB).
Chomsky’s Lectures on Government and Binding (LGB), published in 1981, synthesized all of the results of the previous decade.
GB posited that a grammar is a set of interacting principles.
Movement applies freely, constrained by these principles.
The principles are: Bounding, Government, Theta-theory, Binding, Case, Control, X-bar.
Bounding: The principles that determine how far an element can move. Subjacency is the most important Bounding principle.
Government: The relation between a head and its dependent element, e.g. V and NP, V and PP, INFL and the subject position.
The centrepiece of government theory was the Empty Category Principle: Every empty element needs to be governed in a particularly strong way (‘proper government’).
*Who did you wonder if solved the problem is an ECP violation (note that it does not violate Subjacency).
Theta-theory: governs the positioning of arguments (elements that have thematic roles).
A consequence of theta-theory: an element can move only into a position that is not assigned a semantic role:
It seems [John is willing to help]
Johni seems [ti willing to help]
This movement is possible because seem does not assign a thematic role to its subject.
Binding theory: Governs the relationship between an element and its antecedent:
Principle-A: An anaphor must be bound in its governing category: Theyi like each otheri, but not *Theyi think that Mary likes each otheri.
Principle-B: A pronominal must be free in its governing category: Johni likes himj, but not *Johni likes himi.
Principle-C: A referring expression must be free everywhere: *Hei thinks that Johni is smart.
Case theory: Every NP must be Case-marked:
e was seen John
Johni was seen ei
John has to move because participles do not assign Case. (It only looks like Passive is obligatory.)
Control theory: The relationship between an antecedent and PRO (Johni wants PROi to leave).
X-bar theory: The principles that govern phrase structure:
Functional and lexical categories
GOVERNMENT THEORY, BINDING THEORY, CASE THEORY, THETA-THEORY, X-BAR THEORY, ETC.
A SENTENCE IS THE PRODUCT OF THE INTERACTION OF THE DIFFERENT PRINCIPLES
PARAMETERS
The principles are parameterized: (Ideally) by allowing a small number of principles each to have a small number of settings, the superficially complex differences among the world’s languages can be accounted for.
LUIGI RIZZI
Rizzi noticed that extraction is more permissive in Italian than in English.
In Italian the literal equivalent of English *What did you wonder where John put? is grammatical.
Rizzi proposed that Subjacency is parameterized:
Bounding nodes in English: IP and DP
Bounding nodes in Italian: CP and DP
Other languages are more restrictive than English.
Russian has Wh-Movement, but the wh-element cannot be extracted from its own clause.
So in Russian you can say things like Who did you see?, but not *Who did you ask Mary to see?
Therefore in Russian, the bounding nodes are both CP and IP.
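Rizzi’s idea that Subjacency is parameterized amounts to keeping the check fixed and varying only the set of bounding nodes. In this sketch the three settings come from the slides, while the derivations are simplified, hand-encoded assumptions (each step lists the labels of the nodes it crosses, and the English glosses stand in for the Italian and Russian sentences).

```python
# Bounding-node settings as given on the slides (IP = S, CP = S', DP = NP).
BOUNDING_NODES = {
    "English": {"IP", "DP"},
    "Italian": {"CP", "DP"},
    "Russian": {"CP", "IP"},
}

def obeys_subjacency(derivation, language):
    """True if no movement step crosses more than one bounding node of `language`."""
    bounding = BOUNDING_NODES[language]
    return all(sum(label in bounding for label in step) <= 1 for step in derivation)

# Hypothetical hand-encoded derivations: one list per movement step.
DERIVATIONS = {
    # single clause: object wh-phrase to the main-clause Spec-CP
    "Who did you see?": [["VP", "IP"]],
    # out of an infinitival complement, stopping in its Spec-CP on the way
    "Who did you ask Mary to see?": [["VP", "IP"], ["CP", "VP", "IP"]],
    # 'where' occupies the embedded Spec-CP, so 'what' moves in one step
    "What did you wonder where John put?": [["VP", "IP", "CP", "VP", "IP"]],
}

for sentence, derivation in DERIVATIONS.items():
    verdicts = {lang: obeys_subjacency(derivation, lang) for lang in BOUNDING_NODES}
    print(sentence, verdicts)
```

The same derivation can pass in one language and fail in another purely because of the bounding-node setting: the wh-island example passes only with the Italian setting, and long extraction fails only with the Russian one.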
But what about languages like Chinese that appear not to have any Wh-Movement (wh-in situ languages)?
Zhangsan xiang-zhidao [Lisi mai-le shenme] (Chinese)
Zhangsan wonder Lisi bought what
‘Zhangsan wonders what Lisi bought’
John-ga dare-o butta ka (Japanese)
John-SU who-OB hit Q
‘Whom did John hit?’ (compare: John-ga Bill-o butta ‘John hit Bill’)
James Huang has worked out the parameters for Chinese and typologically similar languages:
JAMES HUANG
Huang has proposed a ‘Question Movement Parameter’:
In languages like English, Italian, and Russian, Wh-Movement applies in the overt syntax.
In languages like Chinese and Japanese, Wh-Movement applies covertly in LF:
So the first sentence below would have the LF representation under it:
Zhangsan xiang-zhidao [Lisi mai-le shenme]
Zhangsan wonder Lisi bought what
‘Zhangsan wonders what Lisi bought’
Zhangsan xiang-zhidao [CP shenmei [IP Lisi mai-le ti]]
Zhangsan wonder what Lisi bought
Why believe in LF movement?
Because interpretations in wh-in-situ languages (often!) obey at least some island constraints.
*Ni xiangxin Lisi weisheme lai de shuofa
you believe [the claim [that [Lisi came why]]]
‘*Why do you believe the claim that Lisi came ___?’
*John-wa Mary-ga naze sore-o katta kadooka siritagatte iru no
John wants to know [whether [Mary bought it why]]
‘*Why does John want to know whether Mary bought it ___?’
This can be captured if LF Wh-Movement is subject to these constraints.
Another parametric difference among languages:
No null arguments (English, French): It is raining, *is raining; Mary left, *left
Null subjects (Spanish, Italian): llueve (‘It is raining’); comió la manzana (‘He/She ate the apple’)
Null subjects are usually analyzed as the empty pronominal ‘pro’
Both null subjects and null objects (Chinese):
Zhangsani xiwang [ei keyi kanjian Lisi]
Zhangsan hope can see Lisi
‘Zhangsan hopes that he can see Lisi’
Zhangsani shuo Lisi kanjian-le ei
Zhangsan say Lisi see LE
‘Zhangsan said Lisi saw him’
• Huang analysed the empty position as a null topic (old information, salient in discourse)
The Head Parameter has also been historically very important.
Joseph Greenberg’s 1963 paper launched modern typology.
JOSEPH GREENBERG, 1915-2001
The Greenbergian correlations:
VO correlate | OV correlate
adposition - NP | NP - adposition
copula verb - predicate | predicate - copula verb
‘want’ - VP | VP - ‘want’
tense/aspect auxiliary verb - VP | VP - tense/aspect auxiliary verb
negative auxiliary - VP | VP - negative auxiliary
complementizer - S | S - complementizer
question particle - S | S - question particle
adverbial subordinator - S | S - adverbial subordinator
article - N' | N' - article
plural word - N' | N' - plural word
noun - genitive | genitive - noun
noun - relative clause | relative clause - noun
adjective - standard of comparison | standard of comparison - adjective
verb - PP | PP - verb
verb - manner adverb | manner adverb - verb
The correlations are generally captured by the Head parameter: A language is either head-initial (VO) or head-final (OV).
The problem: Many, probably most, languages are not completely consistent.
For example, Chinese is consistently head-final except in the rule expanding X’ to X0 (if the head is verbal it precedes the complement).
So Chinese manifests the ordering V-NP, but NP-N:
you sange ren mai-le shu
HAVE three man buy-ASP book
‘Three men bought books’
Zhangsan de sanben shu
Zhangsan DE three book
‘Zhangsan’s three books’
The usual assumption has been that ‘inconsistent’ languages have more complex grammars than ‘consistent’ languages.
So Huang has suggested that Chinese has a more complicated X-bar schema to ‘pay’ for its inconsistency:
XP → YP X'
X' → X0 YP iff X = [+V]
X' → YP X0 otherwise
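Huang’s schema can be read as a conditional linearization rule. In this sketch the setting names and the two-entry feature table (only V counts as [+V] here) are assumptions for illustration; YP stands for the complement, and the specifier rule XP → YP X' is the same in every setting, so it is left out.

```python
# Hypothetical feature table: only V and N are modelled; V is [+V], N is not.
PLUS_V = {"V": True, "N": False}

def order_in_x_bar(category, setting):
    """Order of the head (X0) and its complement (YP) inside X' for one head category."""
    if setting == "head-initial":    # X' -> X0 YP for every head
        return ("X0", "YP")
    if setting == "head-final":      # X' -> YP X0 for every head
        return ("YP", "X0")
    if setting == "huang-chinese":   # X' -> X0 YP iff X = [+V]; X' -> YP X0 otherwise
        return ("X0", "YP") if PLUS_V[category] else ("YP", "X0")
    raise ValueError(f"unknown setting: {setting!r}")

for setting in ("head-initial", "head-final", "huang-chinese"):
    print(setting, {cat: order_in_x_bar(cat, setting) for cat in PLUS_V})
```

A consistent language needs a single X' rule, whereas the Chinese setting needs two rules plus a condition on the head’s features: this is the extra complexity Huang’s schema ‘pays’ for the inconsistency.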
Lisa Travis has suggested a different way of handling the inconsistent ordering of Chinese.
LISA TRAVIS
Normally, if a language is head-final, it assigns Case and Theta-Role to the left, as in (a). However, Chinese has a special setting (b) that violates this default ordering.
a. Unmarked setting: HEAD-RIGHT & THETA-ASSIGNMENT TO LEFT & CASE-ASSIGNMENT TO LEFT
b. Marked setting (Chinese): HEAD-RIGHT & THETA-ASSIGNMENT TO RIGHT & CASE-ASSIGNMENT TO RIGHT
Some other important parameters:
o Serial verbs (YES, as in Chinese; NO, as in English)
o Polysynthesis (YES, as in Inuit and Athabaskan; NO, as in English and Chinese)
o Accusative or Ergative (Accusative, as in English and Chinese; Ergative, as in Dyirbal and Georgian)
Parameterized principles came to play such an important role that the Government-Binding theory is sometimes called the ‘Principles-and-Parameters’ approach.
At one point, syntacticians were confident that acquiring a language was just a matter of finding the right ON-OFF settings for each language.
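A rough way to see the size of the learner’s search space, assuming every parameter is binary and the parameters are independent of one another: n parameters define 2^n possible grammars, so even a few dozen parameters already yield over a billion.

```latex
\#\text{grammars} = 2^{n}, \qquad 2^{30} = 1{,}073{,}741{,}824, \qquad 2^{100} \approx 1.27 \times 10^{30}
```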
The theory of parameters is in difficulty now:
o One might need hundreds or even thousands of them.
o The clustering effects have not worked out very well.
o They are out of spirit with the Minimalist Program.
o There have been recent attempts to derive them from the process of language acquisition.
THE MINIMALIST PROGRAM
Starting in the 1990s, the Government-Binding theory was gradually replaced by the Minimalist Program.
But the MP is in many ways not an abrupt change of direction from GB.
Many conceptions from GB were incorporated directly into the MP.
Everybody knew that there was huge redundancy in the scope of the GB principles.
Binding, bounding, Case, theta, etc. overlapped considerably in their domains.
Some ungrammatical sentences were ruled out by 3 or 4 different principles!
So it became clear that it was desirable to reduce the number of principles.
Many of the principles seemed to have a ‘least-effort/economy’ essence.
That is, they moved elements as short a distance as possible …
… or they looked only at the closest possible relationship between an anaphor and its antecedent, or between a gap and its filler.
That seemed to suggest that ‘formal economy’ should be at the centre of the theory.
Other things that were important in the early theory no longer seemed so important.
The levels of D(eep)-Structure and S(urface)-Structure seemed to be doing less and less work.
X-bar theory seemed to follow from independent principles.
More and more generalisations seemed to apply at the interfaces with PF and LF, rather than in the course of the derivation.
Chomsky’s big idea: Rebuild grammatical theory from the ‘bottom up’: Start with only what we know is necessary and go from there.
That’s why it is called the ‘Minimalist Program’.
The idea that language might be ‘perfect’ is a leading idea of the MP.
If that is true, language is unlike all other known biological systems.
• The Minimalist Program is committed to probing to what extent the human language faculty is an optimal solution to minimal design specifications.
• The hope is that the only grammatical processes are those that are subject to ‘virtual conceptual necessity’.
• Notice that one consequence is that there is a much smaller innate UG.
The Minimalist Program has no level of D-Structure or S-Structure, and it leaves a more important role for the semantic and phonetic interfaces.
So processes that used to be considered syntax-internal, like binding, bounding, etc., are now handled at LF or at PF.
The Minimalist Program has only one structure-building operation, namely ‘Merge’: Merge (in other words, recursion) is all that there is in the narrow syntax.
Sentences are built from the ‘bottom up’, in the manner of categorial grammar.
Movement is considered to be ‘Internal Merge’, that is, the re-merging of an element already present in the derivation.
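Merge and Internal Merge can be sketched with Python’s immutable sets. This is a simplification under stated assumptions: lexical items are plain strings, no labels or features are modelled, and the clause is a bare ‘you see who’ with no auxiliary. External Merge combines two separate objects; Internal Merge re-merges something the structure already contains, so the moved element ends up occurring twice.

```python
def merge(a, b):
    """Merge: combine two syntactic objects into the unordered set {a, b}."""
    return frozenset([a, b])

def terms(obj):
    """Yield obj and every syntactic object contained in it."""
    yield obj
    if isinstance(obj, frozenset):
        for part in obj:
            yield from terms(part)

def internal_merge(obj, target):
    """Internal Merge: re-merge a term already inside obj with obj itself."""
    if not any(t == target for t in terms(obj)):
        raise ValueError("Internal Merge needs a term that is already in the derivation")
    return merge(target, obj)

vp = merge("see", "who")                   # External Merge of two lexical items
clause = merge("you", vp)                  # External Merge of a new item with the result
question = internal_merge(clause, "who")   # Internal Merge ('movement') of 'who'

occurrences = sum(1 for t in terms(question) if t == "who")
print(occurrences)  # 2
```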
The Minimalist Program appeals to ‘third-factor’ explanations: those that are based on factors outside of universal grammar.
The reason for that is clear — the more that you remove from UG, the more that other systems are going to need to take over the work.
So maybe economy principles arise from pressure for efficient computation and have nothing to do with UG.
The next few classes will go into more detail about the MP, its strengths and its weaknesses.
One thing to keep in mind: If the narrow syntax is not accounting for grammatical complexity, then what is?
The answer: the lexicon and the interface components.
If so, does the MP lead to an overall simplification?
LEXICALIST APPROACHES
Not all formal linguists work in ‘Chomskyan’ syntax.
Recall that the idea was to impose more and more constraints on syntactic transformations.
By the late 1970s, some linguists proposed ‘constraining’ transformations out of existence altogether.
These became (super)-lexicalist approaches.
Head-Driven Phrase Structure Grammar (HPSG)
IVAN SAG, 1949-2013
Lexical Functional Grammar (LFG)
JOAN BRESNAN
A wide range of approaches called ‘Construction Grammar’
ADELE GOLDBERG