Comparing three methods for sentence judgment experiments

Shin Fukuda, Dan Michel, Henry Beecher, Grant Goodall

3 response methods in experimental syntax

• Yes/No forced choice

• n-point numerical scale (Likert scale)

• Magnitude estimation

Yes/No forced choice:

Does the sentence sound good?

1. What does Helen want Gary to drink? yes no

n-point numerical scale:

1. What does Helen want Gary to drink? (very bad) 1 2 3 4 5 (very good)

Magnitude estimation:

Suppose that the following sentence is given a value of 100 in terms of how good it sounds.

What do you wonder whether Mary bought?

Now estimate how good the sentences below sound in relation to this reference sentence.

How good does the sentence sound?

Reference sentence: What do you wonder whether Mary bought? 100

1. What does Helen want Gary to drink? 200
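
A minimal sketch of how a magnitude estimate might be put on a scale relative to the reference sentence. The slides do not say how ME responses were transformed; dividing by the reference value and taking log10 is an assumption here, and the function name `normalize_me` is hypothetical.

```python
import math


def normalize_me(raw_estimate: float, reference_value: float = 100.0) -> float:
    """Return log10 of the ratio between an estimate and the reference value.

    Assumption: a log-ratio transform; the slides do not state the transform used.
    """
    if raw_estimate <= 0 or reference_value <= 0:
        raise ValueError("magnitude estimates must be positive")
    return math.log10(raw_estimate / reference_value)


# The example item above was rated 200 against a reference of 100,
# i.e. judged twice as good as the reference sentence.
score = normalize_me(200)
print(round(score, 3))  # log10(2), roughly 0.301
```

On this scale, a sentence rated the same as the reference scores 0, and sentences rated worse than the reference get negative scores.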

Standard view of response methods

Yes/No
Advantages: Simple task for subjects
Disadvantages: Coarse-grained

Numerical scale
Advantages: Familiar task for subjects; somewhat fine-grained
Disadvantages: May not be true interval data; subjects may make more distinctions than scale allows

Magnitude Estimation
Advantages: Subjects can make as many distinctions as needed; more powerful statistics
Disadvantages: Unfamiliar, strange task for subjects

ME provides insights that other methods do not (Bard et al. 1996, Cowart 1997, Featherston 2005, etc.)


Recent critiques of ME

• Sprouse (2008):

– The role of the reference in ME is at best questionable.

– Participants are not able to make use of the alleged advantages of ME.

• Weskott & Fanselow (2008):

– ME data not more informative.

– ME data may have more spurious variance.


Structure of Weskott & Fanselow (2008)

• One set of subjects judges same stimuli with both Yes/No and 7-point scale.

• Another set judges same stimuli with both Yes/No and ME.

• Judgment tasks 2 weeks apart; order balanced

• Stimuli: ACC- and DAT-scrambling in German.


Structure of Weskott & Fanselow (2008)

• All three methods captured expected contrasts equally well (comparable effect sizes)

• With half of the subjects, the loss of effect size is greater with ME than with the other methods


Comment on Weskott & Fanselow (2008)

• Potential drawbacks:

– Same subjects do 2 tasks; experience with one could influence judgments on other.

– Type of stimuli very limited (only scrambling).


Our study

• Three groups of subjects:

– Yes/No (N=36)

– 5-point scale (N=36)

– ME (N=36)

• All UCSD students, randomly assigned to response method.
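
A minimal sketch of balanced random assignment to the three response methods. The slides only state that subjects were randomly assigned; the shuffle-and-slice procedure, the subject IDs, and the function name `assign_groups` are assumptions for illustration.

```python
import random


def assign_groups(subject_ids, methods=("Yes/No", "5-point", "ME"), seed=None):
    """Shuffle the subjects and split them into equal-sized groups, one per method."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    if len(shuffled) % len(methods) != 0:
        raise ValueError("number of subjects must divide evenly across methods")
    size = len(shuffled) // len(methods)
    return {
        method: shuffled[i * size:(i + 1) * size]
        for i, method in enumerate(methods)
    }


# 108 hypothetical subject IDs give three groups of 36, matching the N per group above.
groups = assign_groups(range(1, 109), seed=0)
for method, members in groups.items():
    print(method, len(members))
```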


Our study

• Three subexperiments:

– Presence/absence of inversion

– that-trace effect

– Extraction out of subject and object DPs

• Range of expected contrasts, from dramatic (inversion) to subtle (extraction out of subj/obj DPs).

• Are the three methods equally effective in capturing acceptability contrasts with different degrees of subtlety?

Presence/absence of inversion

1. What will you watch on Thursday?

2. What will he watch on Thursday?

3. What will the man watch on Thursday?

4. What you will watch on Thursday?

5. What he will watch on Thursday?

6. What the man will watch on Thursday?


Results: Inversion

[Figure: three panels (Y/N, 5-point, ME), each comparing inversion with no inversion; contrast marked p < .05]

that-trace effect

1. Who do you feel that __ insulted Pat at the theater?

2. Who do you feel that Pat insulted __ at the theater?

3. Who do you feel __ insulted Pat at the theater?

4. Who do you feel Pat insulted __ at the theater?


Results: that-trace

[Figure: three panels (Y/N, 5-point, ME); contrast marked p < .05]

Extraction out of subject and object DPs

1. What do you think [pictures of __] will be on the website?

2. What do you think the website will post [pictures of __]?

3. Do you think pictures of the new car will be on the website?

4. Do you think the website will post pictures of the new car?


Results: Extraction from subject/object DPs

[Figure: three panels (Y/N, 5-point, ME); contrast marked p < .05]

Interim conclusion

• Yes/No and 5-point scale able to capture contrasts just as well as ME.


Informativity (effect size)

• Significance is virtually the same across all three methods.

• But are the results of the three methods equally informative?

• We can measure this with η² (Cohen 1973, Cowart 1997).

• Shows how much of variation between two means is accounted for by factor of interest (as opposed to random variation).
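
A minimal sketch of η² as the share of total variation accounted for by the factor, computed as SS_factor / SS_total. It assumes a simple one-way layout with independent scores per condition, which may not match the design analyzed in this study; the ratings below are invented for illustration.

```python
def eta_squared(groups):
    """Return SS_factor / SS_total for a list of score lists, one list per condition."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    # Total variation: squared deviations of every score from the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    if ss_total == 0:
        raise ValueError("no variation in the data")
    # Factor variation: squared deviations of condition means, weighted by group size.
    ss_factor = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    return ss_factor / ss_total


# Hypothetical 5-point ratings for two conditions (not data from the study).
inversion = [5, 4, 5, 4]
no_inversion = [2, 1, 2, 1]
print(eta_squared([inversion, no_inversion]))
```

With these invented ratings, SS_factor is 18 and SS_total is 20, so the factor accounts for 90% of the total variation.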


Inversion

• All three methods found significant difference:

What will you watch on Thursday?

What you will watch on Thursday?

• But how much of this difference is due to presence/absence of inversion?


η² of Inversion

[Figure: η² as % of the total variance. Yes/No: 61; 5-point: 48; ME: 29]

that-trace effect

• All three methods found significant difference:

Who do you feel that __ insulted Pat at the theater?

Who do you feel __ insulted Pat at the theater?

• But how much of this difference is due to presence vs. absence of that?


η² of presence vs. absence of that

[Figure: η² as % of the total variance. Yes/No: 89; 5-point: 31; ME: 20]

Extraction out of subject and object DPs

• All three methods found significant difference:

What do you think [pictures of __] will be on the website?

What do you think the website will post [pictures of __]?

• But again, how much of this difference is due to argument position?


η² of argument with extraction

[Figure: η² as % of the total variance. Yes/No: 22; 5-point: 31; ME: 18]

What to conclude from this?

• ME does allow finer-grained distinctions, but…

– This may lead to spurious variance in some cases (cf. Weskott & Fanselow 2008),

– rather than contributing to our understanding of the contrast.


General implications of this study

• For the working syntactician:

– Any of these response methods works.

– More familiar techniques (both to researcher and to subjects) are just as sensitive as ME.

– ME may introduce some spurious variance.


And perhaps more importantly…

• “Fear of ME” should not be a roadblock to more widespread adoption of experimental techniques.

• Y/N forced-choice & n-point numerical scale can capture even subtle contrasts with well-designed experiments.

(see also Myers 2009, Phillips 2009)


Thank you!

Experimental Syntax Lab, UCSD

Fall 2009 Experimental Syntax class, UCSD

Sara Cantor, Carleton College


W&F vs. this study (η²)

W&F
                              YN1   YN2   7-point   ME
Exp1: ACC-scrambling          .95   .95   .96       .98
Exp2: DAT-scrambling          .91   .83   .92       .96

This study
                              YN    5-point   ME
Inversion                     .61   .48       .29
that-trace                    .89   .31       .20
Extraction out of Subj/Obj    .22   .31       .18

η² and partial η²

• η² = SS_factor / SS_total

• Partial η² = SS_factor / (SS_factor + SS_error)
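
A small worked sketch of how the two measures can diverge once the design contains other sources of variation. The sums of squares below are invented for illustration and are not taken from this study or from Weskott & Fanselow (2008); the function names are hypothetical.

```python
def eta_squared_from_ss(ss_factor: float, ss_total: float) -> float:
    """Eta squared: the factor's share of all variation in the data."""
    return ss_factor / ss_total


def partial_eta_squared_from_ss(ss_factor: float, ss_error: float) -> float:
    """Partial eta squared: the factor's share after setting other effects aside."""
    return ss_factor / (ss_factor + ss_error)


# Hypothetical sums of squares: the factor of interest, one other effect, and error.
ss_factor, ss_other, ss_error = 20.0, 30.0, 50.0
ss_total = ss_factor + ss_other + ss_error

print(eta_squared_from_ss(ss_factor, ss_total))          # 20 / 100
print(partial_eta_squared_from_ss(ss_factor, ss_error))  # 20 / 70
```

Because partial η² leaves the other effect out of the denominator, it cannot be smaller than η² for the same data, so values of the two measures are not directly comparable.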


Inversion: Break-down by subject type

[Figure: three panels (Y/N, 5-point, ME); contrasts marked p < .01 and p < .05]