Corpus and Experimental Data as Corroborating Evidence: The Case of Preposition Placement in English Relative Clauses

Linguistic Evidence: Empirical, Theoretical, and Computational Perspectives
University of Tübingen, 02.02.-04.02.2006

Thomas Hoffmann (University of Regensburg)

1. Introduction: Corpus vs. Introspection

We do not need to use intuition in justifying our grammars, and as scientists, we must not use intuition in this way. (Sampson 2001: 135)

You don’t take a corpus, you ask questions. […] You can take as many texts as you like, you can take tape recordings, but you’ll never get the answer. (Chomsky in Aarts 2000: 5-6)

Which type of data are we left with then?

1. Introduction: Corpus vs. Introspection

A corpus and an introspection-based approach to linguistics […] can be gainfully viewed as being complementary.

(McEnery and Wilson 1996: 16)

corpus and introspection data = corroborating evidence

case study: P placement in English relative clauses

1. Introduction: What to Expect

1. corpora vs. introspection?

2. categorical corpus data (ICE-GB corpus)

3. Magnitude Estimation experiment

4. variable corpus data (ICE-GB corpus)

5. conclusion

2. Corpora and Introspection

Arguments against corpus data:

• “performance” problem:

• “negative data” problem:

• “homogeneity” problem:

“only use introspection”

2. Corpora and Introspection

Arguments against corpus data: no corpus

• “performance” problem: yet: performance is the result of competence; modern corpora are representative

• “negative data” problem: yet: only additional (different) data needed

• “homogeneity” problem: yet: empirical claim that needs to be investigated

use corpora + additional data type

2. Corpora and Introspection

Arguments against introspection data:

• “unnatural data” problem:

• “irrefutable data” problem:

• “illusion” problem:

• “stability” problem:

“only use corpora”

2. Corpora and Introspection

Arguments against introspection data: no introspection

• “unnatural data” problem: yet: only additional (context) data needed

• “irrefutable data” problem: yet: depends only on collection method

• “illusion” problem: yet: only additional (natural) data needed

• “stability” problem: yet: empirical claim that needs to be investigated

use corpora + additional data type

2. Corpora and Introspection

Corpora and introspection are corroborating evidence:

introspection (strengths = weaknesses of corpus data):

+ ungrammaticality

+ negative data

+ rare phenomena

corpus (strengths = weaknesses of introspection data):

+ unexpected patterns

+ contextual factors

+ natural language

3. Case Study: Preposition Placement

I want a data source ...

(1) a. which I can rely on [stranded preposition]

b. on which I can rely [pied-piped preposition]

driving question: data source for empirical analysis of (1a,b)?

4. Empirical Study I: Corpus Data

• Corpus used:

International Corpus of English, British component: ICE-GB (Nelson et al. 2002) (educated present-day British English, written & spoken)

• Analysis tool:

GOLDVARB computer programme (logistic regression; Robinson et al. 2001): estimates the relative influence of various contextual factors (weights: <0.5 = inhibiting factors; >0.5 = favouring factors)
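The reading of the weights can be made concrete with a small sketch. It assumes the usual logistic interpretation of variable-rule factor weights, in which a weight is the inverse logit of a log-odds contribution, so that 0.5 corresponds to no effect. The function names and the sample values are hypothetical and are not taken from GOLDVARB or from the study.

```python
import math

def weight_to_log_odds(weight: float) -> float:
    """Map a factor weight in (0, 1) to log-odds; 0.5 maps to 0."""
    return math.log(weight / (1.0 - weight))

def log_odds_to_weight(log_odds: float) -> float:
    """Inverse of weight_to_log_odds (the logistic function)."""
    return 1.0 / (1.0 + math.exp(-log_odds))

def describe(weight: float) -> str:
    """Label a weight the way the slide does."""
    if weight > 0.5:
        return "favouring"
    if weight < 0.5:
        return "inhibiting"
    return "neutral"

def predicted_probability(input_prob: float, weights: list[float]) -> float:
    """Combine an overall input probability with factor weights.

    Assumption: contributions add up on the log-odds scale.
    """
    total = weight_to_log_odds(input_prob)
    total += sum(weight_to_log_odds(w) for w in weights)
    return log_odds_to_weight(total)

# Hypothetical illustration: one favouring and one inhibiting factor.
example = predicted_probability(0.6, [0.7, 0.4])
print(describe(0.7), describe(0.4), round(example, 3))
```

Under this reading, a context in which all weights equal 0.5 simply reproduces the input probability.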

Each Pstrand/Ppied-piped token was tested for:

1. finiteness

2. restrictiveness

3. relativizer

4. XP contained in (V / N, e.g. entrance to sth. / Adj, e.g. afraid of sth.)

5. level of formality

6. X-PP relationship (Vprepositional, PPLoc_Adjunct, PPMan_Adjunct …)

except for 2 (restrictiveness): all factors have been discussed in the literature before, but not with respect to their interdependence (e.g. Bergh & Seppänen 2000; Trotta 2000)

4. Empirical Study I: Corpus Data I

raw ICE-GB P-placement data:

1074 finite relative clauses

659 (61.4%) tokens: pied-piped

415 (38.6%) tokens: stranded
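The percentages follow directly from the raw counts. A short check, using only the figures given above (the helper name `share` is hypothetical):

```python
# Raw ICE-GB counts as reported on the slide.
pied_piped = 659
stranded = 415
total = pied_piped + stranded  # 1074 finite relative clauses

def share(count: int, total: int) -> float:
    """Percentage of `count` in `total`, rounded to one decimal place."""
    return round(100.0 * count / total, 1)

print(total, share(pied_piped, total), share(stranded, total))
```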

as expected: many categorical effects

accidental vs. systematic gaps?

4.1 Categorical corpus data

1. relativizer:

all that/Ø-tokens in ICE-GB stranded

176 that+Pstranded tokens

(2) a data source on that I can rely

177 Ø+Pstranded tokens

(3) a data source on Ø I can rely

ICE-GB result: expected

implications: (2) = (3)? / that ≠ WH-

4.2 Categorical corpus data: that/Ø ≠ WH-relatives

2. X-PP relationship:

Literature (e.g. Bergh & Seppänen 2000; Trotta 2000):

Pstranding favoured with complement PPs, disfavoured with adjunct PPs

ICE-GB data:

Pstranding restricted to PPs which add thematic information to predicates/events

4.3 Categorical corpus data: Constraints on Pstrand

2. X-PP relationship:

categorical effect of WH-PPAdjuncts-tokens:

a) only P+WH, no that/Ø+P in ICE-GB: manner, degree, frequency & respect PPs, e.g.:

(4) a. the ways in which the satire is achieved <ICE-GB:S1B-014 #5:1:A>

b. the ways which/that/Ø the satire is achieved in

4.3 Categorical corpus data: Constraints on Pstrand

2. X-PP relationship:

categorical effect of WH-PPAdjuncts-tokens:

b) only P+WH with WH-relativizers, but that/Ø+P attested in ICE-GB: subcategorized PPs (put sth. in/into/under) & locative, affected locative and direction PP adjuncts

(5) a. … the world that I was working in and studying in <ICE-GB:S1A-001 #35:1B>

b. … the world in which I was working and studying

4.3 Categorical corpus data: Constraints on Pstrand

Claim: comparison of WH- vs that/Ø shows:

P can only be stranded if: PP adds thematic information to predicates/events

• manner & degree adjuncts: compare events “to other possible events of V-ing” (Ernst 2002: 59)

• frequency & respect adjuncts: have scope over temporal information (frequency) and truth value of entire clause (respect)

these don’t add a thematic participant, so the lack of Pstrand with these is a systematic gap

4.3 Categorical corpus data: Constraints on Pstrand

Claim: comparison of WH- vs that/Ø shows:

P can only be stranded if: PP adds thematic information to predicates/events

• subcat. PP & loc., affected loc., direction PP adjuncts:

these add a thematic participant, so the lack of WH+P with these is an accidental gap

4.3 Categorical corpus data: Constraints on Pstrand
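The reasoning behind the accidental/systematic distinction can be sketched as a simple decision rule over the attestation patterns summarised on the preceding slides. The table below only restates those categorical ICE-GB findings; the names `ATTESTED` and `classify_gap` are hypothetical, and the rule is an illustration of the argument, not the author's procedure.

```python
# Attestation of each pattern in ICE-GB, as summarised on the slides:
# P+WH = pied-piping, WH+P = stranding with a WH-relativizer,
# that/Ø+P = stranding with that or zero relativizer.
ATTESTED = {
    "manner":            {"P+WH": True, "WH+P": False, "that/Ø+P": False},
    "degree":            {"P+WH": True, "WH+P": False, "that/Ø+P": False},
    "frequency":         {"P+WH": True, "WH+P": False, "that/Ø+P": False},
    "respect":           {"P+WH": True, "WH+P": False, "that/Ø+P": False},
    "subcategorized PP": {"P+WH": True, "WH+P": False, "that/Ø+P": True},
    "locative":          {"P+WH": True, "WH+P": False, "that/Ø+P": True},
    "affected locative": {"P+WH": True, "WH+P": False, "that/Ø+P": True},
    "direction":         {"P+WH": True, "WH+P": False, "that/Ø+P": True},
}

def classify_gap(patterns: dict[str, bool]) -> str:
    """Classify a missing WH+P pattern by comparing it with that/Ø+P."""
    if patterns["WH+P"]:
        return "no gap"
    if patterns["that/Ø+P"]:
        # Stranding is possible in principle, so WH+P is merely unattested.
        return "accidental gap"
    # Stranding is unattested with every relativizer.
    return "systematic gap"

for pp_type, patterns in ATTESTED.items():
    print(f"{pp_type}: {classify_gap(patterns)}")
```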

Claim: comparison of WH- vs that/Ø shows:

P can only be stranded if: PP adds thematic information to predicates/events

Comparison of WH- vs that/Ø is good evidence, but: still “negative data” problem

further corroborating evidence needed: introspection (Magnitude Estimation study)

4.3 Categorical corpus data: Constraints on Pstrand

• relative judgements (reference sentence)

• informal, restrictive RCs tested for:

P-PLACEMENT (Pstrand, Ppied-piped)

RELATIVIZER (WH-, that-, Ø-)

X-PP (VPrep, PPTemp/Loc_Adjunct, PPManner/Degree_Adjunct)

• tokens counterbalanced: 6 material groups of 18 tokens each + 36 fillers = 54 tokens

• tokens randomized (Web-Exp software)

• N = 36 native speakers of British English (sex: 18 m, 18 f / age: 17-64)

5. Empirical Study II: Magnitude Estimation
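The slides mention relative judgements against a reference sentence and report results as z-scores, but do not spell out the preprocessing. A minimal sketch of one common pipeline, assuming that each raw judgement is divided by the subject's rating of the reference sentence, log-transformed, and then standardised per subject. The function name `normalise_judgements`, the use of the population standard deviation, and the sample ratings are all hypothetical.

```python
import math
from statistics import mean, pstdev

def normalise_judgements(raw: list[float], reference: float) -> list[float]:
    """Turn one subject's raw magnitude estimates into z-scores.

    Assumed steps: ratio to the reference rating, log10, z-standardisation.
    """
    logs = [math.log10(value / reference) for value in raw]
    centre = mean(logs)
    spread = pstdev(logs)
    if spread == 0:
        # A subject who gave identical ratings carries no contrast.
        return [0.0 for _ in logs]
    return [(value - centre) / spread for value in logs]

# Hypothetical subject: reference sentence rated 100.
z_scores = normalise_judgements([50.0, 100.0, 200.0], reference=100.0)
print([round(z, 3) for z in z_scores])
```

Standardising per subject removes differences in the numerical scale each participant chooses, which is what makes mean judgements comparable across subjects.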

18 filler sentences: ungrammatical

a. That’s a tape I sent them that done I’ve myself (word order violation; original source: <ICE-GB:S1A-033 074>)

b. There was lots of activity that goes on there (subject contact clause; original source: <ICE-GB:S1A-004 #067>)

c. There are so many people who needs physiotherapy (subject-verb agreement error; original source: <ICE-GB:S1A-003 #027>)

5. Empirical Study II: Magnitude Estimation

ANOVA: significant effects

• P-PLACEMENT: F(1,33) = 4.536, p < 0.05

• RELATIVIZER: F(2,66) = 17.149, p < 0.001

• P-PLACEMENT*X-PP: F(2,66) = 9.740, p < 0.001

• P-PLACEMENT*RELATIVIZER: F(2,66) = 4.217, p < 0.02

5. Empirical Study II: Magnitude Estimation
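The significance thresholds above can be checked against the reported F statistics. The sketch below approximates the upper tail of the F distribution in pure Python, using the fact that x = df1·F / (df1·F + df2) follows a Beta(df1/2, df2/2) distribution and integrating the Beta density numerically. `f_survival` is a hypothetical helper written for this illustration, not a function from a statistics package, and it is only meant for positive F values with df2 ≥ 2.

```python
import math

def f_survival(f_value: float, df1: int, df2: int, steps: int = 2000) -> float:
    """Approximate P(F > f_value) for an F(df1, df2) distribution.

    Integrates the Beta(df1/2, df2/2) density from x0 to 1 with
    Simpson's rule (steps must be even).
    """
    a, b = df1 / 2.0, df2 / 2.0
    x0 = df1 * f_value / (df1 * f_value + df2)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def density(t: float) -> float:
        # Guard against tiny negative values from floating-point rounding.
        one_minus_t = max(0.0, 1.0 - t)
        return math.exp(-log_beta) * t ** (a - 1.0) * one_minus_t ** (b - 1.0)

    h = (1.0 - x0) / steps
    total = density(x0) + density(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(x0 + i * h)
    return total * h / 3.0

# F statistics and degrees of freedom as reported on the slide.
effects = {
    "P-PLACEMENT": (4.536, 1, 33),
    "RELATIVIZER": (17.149, 2, 66),
    "P-PLACEMENT*X-PP": (9.740, 2, 66),
    "P-PLACEMENT*RELATIVIZER": (4.217, 2, 66),
}
for name, (f_value, df1, df2) in effects.items():
    print(f"{name}: p = {f_survival(f_value, df1, df2):.5f}")
```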

ANOVA: not significant

• AGE: F(1,33) = 2.760, p > 0.10

• GENDER: F(1,33) = 1.495, p > 0.20

indicates: homogeneity of subjects

5. Empirical Study II: Magnitude Estimation

Post-hoc Tukey test: P-Place*Relativizer

• Ppied-piped: WH- >> that [p < 0.001]; WH- >> Ø [p < 0.001]; that > Ø [p < 0.010]

• Pstrand: no difference: WH- = that = Ø [p >> 0.100]

5. Empirical Study II: Magnitude Estimation

Post-hoc Tukey test: P-Place*X-PP

• Ppied-piped: PPMan/Deg > VPrep [p < 0.010]; PPMan/Deg = PPTemp/Loc [p = 0.100]; VPrep = PPTemp/Loc [p > 0.100]

• Pstrand: VPrep > PPTemp/Loc > PPMan/Deg [p < 0.001]

5. Empirical Study II: Magnitude Estimation

Fig. 1: Magnitude estimation result for P + relativizer

[Figure: mean judgements (z-scores, axis from -2 to 2) for P+WH, P+that and P+Ø with prepositional verbs, temp/loc adjuncts and manner/deg adjuncts]

P+WH >> P+that > P+Ø

Fig. 2: Magnitude estimation result for P + relativizer compared with fillers

[Figure: mean judgements (z-scores, axis from -2 to 2) for P+WH, P+that, P+Ø, grammatical fillers and ungrammatical fillers (*Agree, *ZeroSubj, *WordOrder) with prepositional verbs, temp/loc adjuncts and manner/deg adjuncts]

P+that & P+Ø = ungrammatical fillers: violation of “hard constraint” (Sorace & Keller 2005)

Fig. 3: Magnitude estimation result for relativizer + P

[Figure: mean judgements (z-scores, axis from -2 to 2) for WH+P, that+P and Ø+P with prepositional verbs, temp/loc adjuncts and manner/deg adjuncts]

WH+P = that+P = Ø+P

VPrep > PPTemp/Loc > PPMan/Deg

Fig. 3: Magnitude estimation result for relativizer + P

[Figure: mean judgements (z-scores, axis from -2 to 2) for X+P, grammatical fillers and ungrammatical fillers (*Agree, *ZeroSubj, *WordOrder) with prepositional verbs, temp/loc adjuncts and manner/deg adjuncts]

VPrep > PPTemp/Loc > PPMan/Deg >> ungrammatical fillers: violation of “soft constraint” (Sorace & Keller 2005)

6. Corroborating Evidence

Corroborating evidence:

corpus: man/deg PPs: no Pstranded (not even with that/Ø): semantic constraint on Pstranded

experiment: man/deg PPs worst environment for Pstranded, yet: better than ungrammatical fillers (soft constraint violation)

Constraints on variable corpus data (354 finite WH-tokens):

Goldvarb identified 3 independent factors (Log likelihood = -88.437, Significance = 0.004; Fit: X-square(27) = 27.977, accepted, p = 0.2040):

1. level of formality (as expected)

2. type of PP contained in (as expected)

3. restrictiveness (unexpected): restrictive RCs favour pied piping (weight: 0.592); nonrestrictive RCs clearly inhibit pied piping (i.e. favour stranding; weight: 0.248)

7. Empirical Study III: Corpus Data II
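To get a feel for the size of the restrictiveness effect, the two reported weights can be compared on the odds scale. This is a minimal sketch, assuming the usual logistic reading of factor weights (weight / (1 - weight) as an odds multiplier); the helper name `weight_odds` is hypothetical.

```python
def weight_odds(weight: float) -> float:
    """Odds multiplier implied by a factor weight in (0, 1)."""
    return weight / (1.0 - weight)

# Weights for pied piping as reported on the slide.
restrictive = 0.592
nonrestrictive = 0.248

odds_ratio = weight_odds(restrictive) / weight_odds(nonrestrictive)
print(f"odds ratio restrictive vs. nonrestrictive: {odds_ratio:.2f}")
```

Under this assumption, the odds of pied piping in restrictive relative clauses come out at roughly 4.4 times those in nonrestrictive ones, all other factors being equal.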

(6) And uhm he left me there with this packet of Durex which I hadn't got a clue what to do **[with]** to be totally honest <ICE-GB:S1B-049 #167:1:B>

reasons for restrictiveness effect:

1. weaker semantic ties of non-restrictive clause with antecedent (pause/comma)

2. Pied-piped P receives connective function

functionalisation of preposition placement in WH-relative clause

7. Empirical Study III: Corpus Data II

corpus and introspection data = corroborating evidence:

corpora:

• frequency/context effects (e.g. level of formality)

• unexpected patterns (e.g. restrictiveness)

• categorical data require further investigation

introspection:

• differentiation of accidental gaps (WH+P with PPTemp/Loc) and systematic gaps (X+P with PPMan/Deg)

• detection of degrees of ungrammaticality

8. Conclusion

9. References

Aarts, B. 2000. "Corpus linguistics, Chomsky and Fuzzy Tree Fragments". In Christian Mair and Marianne Hundt, eds. 2000. Corpus Linguistics and Linguistic Theory. Amsterdam and Atlanta, GA: Rodopi, 5-13.

Bard, E.G. et al. 1996. “Magnitude Estimation of Linguistic acceptability”. Language 72:32-68.

Bergh, G. & A. Seppänen. 2000. “Preposition stranding with wh-relatives: A historical survey”. English Language and Linguistics 4:295-316.

Cowart, W. 1997. Experimental Syntax: Applying Objective Methods to Sentence Judgements. Thousand Oaks: Sage.

Huddleston, R. et al. 2002. “Relative constructions and unbounded dependencies”. In: R. Huddleston & G.K. Pullum, eds. The Cambridge Grammar of the English Language. Cambridge: Cambridge University Press, 1031-1096.

Jackendoff, R. 2002. Foundations of Language: Brain, Meaning, Grammar, Evolution. Oxford: Oxford University Press.

Levine, R. & I.A. Sag. 2003. “WH-Nonmovement”. <http://www-csli.stanford.edu/~sag>, 04.07.2004.

McEnery, T. and A. Wilson. 1996. Corpus Linguistics. Edinburgh: Edinburgh University Press.

Nelson, G. et al. 2002. Exploring Natural Language: Working with the British Component of the International Corpus of English. Amsterdam, Philadelphia: Benjamins.

Penke, M. & A. Rosenbach. 2004. "What counts as evidence in linguistics? An introduction". Studies in Language 28,3: 480-526.

Pesetsky, D. 1998. “Some principles of sentence production”. In: Pilar Barbosa et al., eds. Is the Best Good Enough? Optimality and Competition in Syntax. Cambridge, MA: MIT Press, 337-83.

Pickering, M. & G. Barry. 1991. “Sentence processing without empty categories”. Language and Cognitive Processes 6:229-259.

Quirk, R. et al. 1985. A Comprehensive Grammar of the English Language. London: Longman.

Robinson, J. et al. 2001. “GOLDVARB 2001: A Multivariate Analysis Application for Windows”. <http://www.york.ac.uk/depts/lang/webstuff/goldvarb/manualOct2001>

Sag, I.A. 1997. “English relative constructions”. Journal of Linguistics 33:431-484.

Sampson, G. 2001. Empirical Linguistics. London, New York: Continuum.

Schütze, Carson T. 1996. The Empirical Base of Linguistics: Grammaticality Judgements and Linguistic Methodology. Chicago: University of Chicago Press.

Sorace, Antonella and Frank Keller. 2005. "Gradience in linguistic data". Lingua 115,11: 1497-1525.

Trotta, J. 2000. Wh-clauses in English: Aspects of Theory and Description. Amsterdam and Atlanta, GA: Rodopi.

Van der Auwera, J. 1985. “Relative that — a centennial dispute”. Journal of Linguistics 21:149-179.