

ORIGINAL RESEARCH
published: xx March 2016
doi: 10.3389/fpsyg.2016.00150

Edited by: Judit Gervain, Centre National de la Recherche Scientifique – Université Paris Descartes, France

Reviewed by: Bálint Forgács, Université Paris Descartes, France; Aritz Irurtzun, CNRS – IKER (UMR 5478), France; Antje Sauermann, Zentrum für Allgemeine Sprachwissenschaft, Germany

*Correspondence: Kriszta Szendroi, [email protected]; Iris Mulders, [email protected]

Specialty section: This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 08 June 2015
Accepted: 27 January 2016
Published: xx March 2016

Citation: Mulders I and Szendroi K (2016) Early Association of Prosodic Focus with alleen ‘only’: Evidence from Eye Movements in the Visual-World Paradigm. Front. Psychol. 7:150. doi: 10.3389/fpsyg.2016.00150

Early Association of Prosodic Focus with alleen ‘only’: Evidence from Eye Movements in the Visual-World Paradigm

Iris Mulders1* and Kriszta Szendroi2*

1 Utrecht Institute of Linguistics OTS, Utrecht University, Utrecht, Netherlands, 2 UCL Linguistics, Psychology and Language Sciences, University College London, London, UK

In three visual-world eye tracking studies, we investigated the processing of sentences containing the focus-sensitive operator alleen ‘only’ and different pitch accents, such as the Dutch Ik heb alleen SELDERIJ aan de brandweerman gegeven ‘I only gave CELERY to the fireman’ versus Ik heb alleen selderij aan de BRANDWEERMAN gegeven ‘I only gave celery to the FIREMAN’. Dutch, like English, allows accent shift to express different focus possibilities. Participants judged whether these utterances match different pictures: in Experiment 1 the Early Stress utterance matched the picture, in Experiment 2 both the Early and Late Stress utterance did, and in Experiment 3 neither did. We found that eye-gaze patterns start to diverge across the conditions already as the indirect object is being heard. Our data also indicate that participants perform anticipatory eye-movements based on the presence of prosodic focus during auditory sentence processing. Our investigation is the first to report the effect of varied prosodic accent placement on different arguments in sentences with a semantic operator, alleen ‘only’, on the time course of looks in the visual world paradigm. Using an operator in the visual world paradigm allowed us to confirm that prosodic focus information immediately gets integrated into the semantic parse of the proposition. Our study thus provides further evidence for fast, incremental prosodic focus processing in natural language.

Keywords: focus, semantics, marked stress, prosody, incremental language processing, eye tracking, visual world paradigm, anticipatory eye movements and predictions

INTRODUCTION¹

Prosodic Focus and Contrast: Pragmatic Effect

Focus is an important information-structuring device. It occurs in every utterance, and it signals to the hearer the most prominent part of the utterance: what is new, or what is contrasted or highlighted. In many languages including English and Dutch, it is marked by prosodic prominence, specifically, with a pitch accent. Focus processing is crucial for comprehension of utterances in context. To illustrate this, we can consider what makes a question–answer pair pragmatically felicitous. Capitals indicate prosodic stress and corresponding pitch accent throughout. (1b), with prosodic accent on the direct object ‘some tea’, is a felicitous answer to the question in (1a), while (1c), with prosodic accent on the indirect object ‘to the woman’, is not. (Rather, it would be a felicitous answer to a different question, namely ‘Who did you give some tea to?’) This is because the question in (1a) asks for information about the object that was given to the woman, and thus expects the responder to prosodically highlight the direct object in their response.

(1) a. What did you give to the woman?
    b. I gave some TEA to the woman.
    c. #I gave some tea to the WOMAN.

¹ For ease of exposition, we illustrate the characteristics of focal stress and only with English examples. Everything we claim here holds for Dutch in the same way.

Prosodic focus can also play an important role in determining the felicity of utterances in a non-linguistic context. An utterance like (2), with contrastive accent on the modifying adjective, is only felicitous in a context where not just green balls, but balls of some other color are also present. The pragmatic function of the accent placement here is contrastive.

(2) Give me the GREEN ball.

Given the importance of prosodic accent placement for information structuring, many studies have tried to uncover the effect of prosodic prominence on language processing. Eberhard et al. (1995) were the first to report an experiment involving a reference resolution task in a real world setting involving prosodic prominence. The instructions involved either contrastive or neutral stress (Touch the LARGE/large blue square) on modifying adjectives in two different visual contexts. The results showed that in the contrastive stress condition, the latency of eye movements to the target referent was significantly shorter in the setting where contrastive stress was informative (i.e., in contexts with large and small blue squares) than in the uninformative setting (i.e., in contexts with only large blue squares). The eye movement latency was also shorter in the contrastive stress condition compared to the unstressed condition. So, contrastive stress facilitated reference resolution (but cf. Arnold, 2008).

The facilitatory effect of the contrastive L+H* accent in English reference resolution tasks in contrastive contexts has been further supported in a series of experiments by Ito and Speer (2008, 2011). Here participants heard pairs of instructions like Hang the blue ball with a second instruction following bearing either neutral or contrastive stress, e.g., Next, hang the green/GREEN ball. In addition to confirming the processing advantage of contrastive accents on the modifying adjective when used in a contrastive context, Ito and Speer also demonstrated that the use of such accents leads to anticipatory looks to the previously mentioned entity type (i.e., balls) and to ‘garden path’ effects if used in infelicitous contexts (e.g., blue angel followed by GREEN ball). They thus demonstrated early interpretation of contrastive prosody.

Note, however, that in all these experiments, the reference resolution task can also be carried out without the presence of the contrastive accent. In other words, were the instructions read out with a different intonation, the reference resolution task could still be carried out correctly. The presence of the contrastive accent is facilitatory and its absence informative, but ultimately, it only has a pragmatic effect: it does not contribute to the sentence meaning directly as it does not change the truth conditions of the sentence.

Prosodic Focus and Only: Semantic Integration

Prosodic focus placement is not only relevant for pragmatic felicity of certain utterances in linguistic or non-linguistic context. Sometimes the position of the prosodic focus within the utterance directly contributes to the semantic meaning of the utterance. Sentences that involve the operator only are an example of this.

(3) I only gave some tea to the woman.

In sentences involving an operator, like only, the prosodic focus placement is not only relevant for pragmatic felicity of the utterance in context. Rather, the position of the prosodic focus within the utterance directly contributes to the semantic meaning of the utterance. In fact, the correct semantics cannot be determined without accentual information. So, presented in writing, (3) is ambiguous; its meaning depends on its accentuation pattern. The ambiguity can be resolved by prosody, as in (4).

(4) a. I only gave some tea to the WOMAN. = The only person I gave some tea to was the woman.
    b. I only gave some TEA to the woman. = The only thing I gave to the woman was some tea.

In (4a), with pitch accent on the indirect object, only associates with the stress-bearing indirect object, the woman, while in (4b), with stress on tea, only associates with the direct object, some tea. Accordingly, linguistic theories agree that the different interpretations in (4a) and (4b) are an indirect result of the two stress patterns; they arise because the operator only is focus-sensitive, meaning that it associates in its interpretation with the focus of the utterance (Horn, 1969; Krifka, 1992; Rooth, 1992). Focus, in turn, is determined by main stress and corresponding pitch accent in English (Chomsky, 1971) and Dutch.²,³

² Note that (4a) with indirect object stress is in fact ambiguous in itself between the readings indicated in (i) and (ii). This is not important for the present study for two reasons. First, adults strongly prefer the reading in (i), which is the one targeted in this study (Crain and Steedman, 1985). Second, the experiments involve phonetically marked accent on the indirect object, which again makes the reading in (i) the preferred one.

(i) The only person I gave some tea to was the woman = indirect object focus reading
(ii) The only thing I did was give some tea to the woman = verb phrase focus reading

³ Only cannot associate with just any focus-bearing element; the focus-bearing element must be in its scope syntactically. Utterances like (3) are syntactically distinct from utterances like (i) in the sense that only in (3) is a verb-phrase-level adverb, while it directly modifies the subject noun phrase in (i).

(i) Only [the WOMAN] gave a banana to the monkey.

The kind of ambiguity that was present in (3), with only modifying the verb phrase, disappears in (i), because the operator only takes scope over its c-command domain, which is the verb phrase in (3) and the subject noun phrase in (i). Therefore, (i) can only have the reading exemplified in (iia).

(ii) a. The only person that gave a banana to the monkey was the woman.
     b. *The only event that took place was the woman giving a banana to the monkey.








In order to be able to consider the psycholinguistic aspects of processing only-sentences, we need to understand in a little more detail how the semantic meaning of such sentences is determined. Utterances with only can be decomposed into two conjoined propositions (Horn, 1969; Krifka, 1992; Rooth, 1992). The first conjunct, (5b) and (6b), respectively, corresponds to the meaning of the proposition without only. This is called the ‘presupposition’ or the ‘non-focal meaning component’. This part of the meaning is shared by the two utterances with indirect object and direct object stress (5a and 6a). The second conjunct is entailed by the original only-sentence. It expresses the fact that the presence of only has the effect that the proposition does not hold for any other relevant alternatives. This part of the meaning is called the ‘assertion’ or the ‘focal meaning component’ (5c and 6c).

(5) a. I only gave some tea to the WOMAN.
    b. Non-focal meaning component: I gave some tea to the woman AND
    c. Focal meaning component: For all x [x ≠ the woman], I did not give some tea to x.

(6) a. I only gave some TEA to the woman.
    b. Non-focal meaning component: I gave some tea to the woman AND
    c. Focal meaning component: For all y [y ≠ some tea], I did not give y to the woman.

As we can see from (5c) and (6c), it is the focal meaning component that bears the semantic difference between the two utterances with different prosodic accent placement. The non-focal meaning component is the same. So, it is the focal meaning component that we need to target in our psycholinguistic investigations.
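The pattern shared by (5) and (6) can be summarized in a single schema. The notation is introduced here purely for illustration and is an assumption, not taken from the examples themselves: φ stands for the proposition without only, α for the focused constituent, and C for the set of contextually relevant alternatives to α.

```latex
\[
\textit{only}\;\phi(\alpha_F) \;=\;
\underbrace{\phi(\alpha)}_{\text{non-focal meaning component}}
\;\wedge\;
\underbrace{\forall x \in C\,[\,x \neq \alpha \rightarrow \neg\,\phi(x)\,]}_{\text{focal meaning component}}
\]
```

With α = the woman this yields (5c), and with α = some tea it yields (6c); the first conjunct is identical in both cases.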

It helps to understand that the focal meaning component is in fact a set of conjoined propositions. In the case of (5c), we can spell it out as in (7a), while (7b) corresponds to the focal meaning component of the direct object stress utterance, (6c). The exact number of alternatives in each assertion set is determined by the actual context of the utterance. So, for instance, in (7a) we used a context where a man and a boy are present in addition to the woman, while in (7b) we used a context where some coffee and biscuits were available alongside the tea.

(7) a. {I didn’t give any tea to the man AND I didn’t give any tea to the boy}
    b. {I didn’t give any coffee to the woman AND I didn’t give any biscuits to the woman}
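The decomposition in (5)–(7) can be made concrete with a small sketch. The Python fragment below is illustrative only: the representation of giving events as pairs, the function name `only_holds`, and the toy context are assumptions made for exposition and are not part of the cited semantic analyses.

```python
# Illustrative sketch; the event encoding, names, and toy context are assumptions.
from typing import Set, Tuple

Event = Tuple[str, str]  # (thing given, recipient)


def only_holds(events: Set[Event], thing: str, recipient: str, focus: str,
               things: Set[str], recipients: Set[str]) -> bool:
    """Evaluate 'I only gave THING to RECIPIENT' with focus on one argument."""
    # Non-focal meaning component: the proposition without 'only', cf. (5b)/(6b).
    if (thing, recipient) not in events:
        return False
    if focus == "recipient":
        # Focal component as in (5c): no alternative recipient got the thing.
        return all((thing, x) not in events for x in recipients - {recipient})
    if focus == "thing":
        # Focal component as in (6c): the recipient got no alternative thing.
        return all((y, recipient) not in events for y in things - {thing})
    raise ValueError("focus must be 'thing' or 'recipient'")


# Toy context modeled on (7): three people, three things, two giving events.
things = {"tea", "coffee", "biscuits"}
recipients = {"woman", "man", "boy"}
events = {("tea", "woman"), ("tea", "man")}

print(only_holds(events, "tea", "woman", "thing", things, recipients))      # True
print(only_holds(events, "tea", "woman", "recipient", things, recipients))  # False
```

In this toy context the same string of words is true under direct object focus and false under indirect object focus, which is the truth-conditional difference that (4a) and (4b) illustrate.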

Let us now turn to the psycholinguistic characteristics of processing only-sentences. By studying the auditory processing of sentences like (3) we can investigate how fast prosodic focal information gets integrated into the semantic parse of the utterance. In other words, as soon as we can detect evidence that people can distinguish the meaning in (4a) from the meaning in (4b) in online auditory comprehension, we can conclude that they have processed the prosodic focal information and integrated that information into the semantic parse of the utterance. Such evidence can come from signs that participants are considering the focal meaning components in (5c) and (6c) or their equivalent sets of propositions in (7a) and (7b).

There are two possibilities regarding the timing of this computation. First, it is possible that the integration of focal prosody information is very fast and incremental. This would match the Ito and Speer (2008) findings about contrastivity. If so, one should see evidence of the non-focal meaning component being considered at the earliest possible point. Given the semantics of only-sentences described above, the earliest point that the focal meaning component (i.e., 5c and 6c) can be considered is when the proposition is complete. This is true even of utterances with early stress on the direct object, as in (6a). This is because even though in such utterances the prosodic focus is available earlier, during the direct object, in order to integrate that information into the semantic parse and compute the non-focal and focal meaning components, it is necessary to know the whole proposition, i.e., the verb and the indirect object.

The second possibility is that semantic integration of prosodic focus is considerably slower than pragmatic effects of contrastivity. Perhaps due to the complex nature of the calculations involved in the semantics of only-sentences (i.e., non-focal and focal meaning components), it is possible that evidence of the non-focal and focal meaning components being considered would not emerge until well after the utterance offset, during wrap-up processing. Perhaps pragmatic effects of contrastivity would be manifest, as found by Ito and Speer (2008), at the point of the occurrence of the prosodic focus itself. But semantic integration of the prosodic focus information would be delayed.

A number of reading studies have been done involving only-sentences. Paterson et al. (2007) compared reading times for dative sentences (and also double object constructions) where the position of the focus particle varied between a pre-direct-object position (e.g., Jane passed only the salt to her mother) and a pre-indirect-object position (e.g., Jane passed the salt to only her mother). In these constructions only associates with the immediately adjacent noun phrase. In terms of the semantics of only-sentences discussed above, this means that the focal meaning components for the test sentences were Jane didn’t pass anything else to her mother and Jane didn’t pass the salt to anyone else, respectively. Accordingly, they used congruous vs. incongruous replacives as continuations to the sentences (such as but not the pepper / but not her father) to determine whether participants are sensitive to the placement of the focus particle when creating contrasts. This is based on the expectation that if participants compute the relevant focal meaning component by the time they encounter the replacives, they would find them incongruous if they are mismatched. They found that the position of only evoked the expected focus effect on-line (see also Sauermann et al., 2013). This, however, manifested itself in longer reading times for the postreplacive region, rather than the replacive region itself. Paterson et al. suggested that ‘this delay [...] was attributable to the operation of inferential processes to evaluate the congruency of the supplied contrast’ with the focus structure of the sentence (Paterson et al., 2007, p. 1440). Given this delay, it is not possible to determine whether the semantic integration of focus is itself late, or if it happens earlier, but is masked by the delay caused by the inferential process involved in determining the focus in the absence of direct prosodic information.

Another study that involved sentences with only is the self-paced reading experiment by Crain et al. (1994), extended by Sedivy (2002) (cf. Paterson et al., 1999; Clifton et al., 2000; Liversedge et al., 2002; Filik et al., 2005). This study investigated variants of established garden-path sentences involving only:

(8) a. Businessmen loaned money at low interest were told to record their expenses.
    b. Only businessmen loaned money at low interest were told to record their expenses.

The results indicated that the presence of only ameliorates the garden path effect. This effect is consistent with a scenario where participants create a contrast set based on the presence of the operator, prompting them to build the appropriate reduced relative clause structure already before the disambiguating main clause verb were told appears. The amelioration of the garden path effect disappears again when a contrast set is given in the discourse. We can take this as evidence that only prompts readers to generate a contrast set (if there is none in the context) at some point before the disambiguating main clause verb. But we do not know exactly at what point it happens before then.

Overall, while these reading studies indicate that focus information is used during processing, by their nature reading studies cannot be informative about the disambiguating role of stress, as stress is generally not marked in writing. Furthermore, reading studies typically tap into the analysis that participants make by disconfirming this analysis later on in the text, measuring a resulting slowdown effect at that point; this means that there can always be a gap between the point in time where the analysis was made by the participant and when we detect its effects.

The visual world paradigm can give precise information about the interpretation of the sentence at each point in time during the sentence. Gennari et al. (2005, p. 250) measured response times and overall fixation patterns in a visual-world paradigm, using a visual setup involving three people: for instance, a woman, a man and a boy. In the picture, the boy had a glass of milk in front of him, and the man had a glass of milk and a cup of coffee. The woman, standing in the background, was holding a tray with a milk carton and a teapot. Participants heard utterances either with neutral stress on the indirect object (like 9a) or with marked stress on the direct object (9b) in a picture verification task.

(9) a. The mother only gave some milk to the boy. Neutral stress FALSE
    b. The mother only gave some MILK to the boy. Marked stress TRUE

Gennari et al. (2005) proposed that ‘marked stress is used immediately by the parser to decide which noun phrase bears semantic focus and, therefore, which contrast set should be invoked for sentence interpretation’. In other words, they proposed that focus processing is fast and incremental in only-sentences. They reached their conclusion based on their finding that in the Neutral Stress condition, there were fewer correct responses than in the Marked Stress condition (MS: proportion of correct responses 0.84, SD: 0.18; NS: 0.70, SD: 0.19). Note that this reasoning is indirect: there could be many reasons why the number of correct responses was lower in the Neutral Stress condition that have nothing to do with the potential early integration of Marked Stress information. They did not find a response time difference between the two conditions (Gennari et al., 2005, p. 254). Note, however, that the expected responses diverged in the two conditions (MS: TRUE, NS: FALSE). It is possible that this influenced response times because it may take longer or shorter to verify a proposition than to falsify it. There was also a qualitative difference between the phonetic salience of neutral and marked stress, which may have boosted participants’ performance in the Marked Stress condition.

Gennari et al. (2005) only report the overall proportion of looks on the various entities, treating the entire utterance and the time between the utterance offset and the participants’ response as one single time window. They found that the ‘boy’s milk’ draws a significantly higher proportion of looks when it bears contrastive stress (i.e., MS) compared to when it does not (i.e., NS).⁴ However, the different pattern of looks across the conditions can only be interpreted as evidence for an early, incremental effect of focus if the looks are time-locked to the appearance of the prosodic focal information in the auditory input. To establish this, one would need to know not only the overall fixation patterns, as provided by Gennari et al. (2005), but also how the eye movements progress as the sentence unfolds. In short, Gennari et al. (2005) found that the number of correct responses was higher in the Marked Stress condition, but found no difference in response times. They also found that entities bearing contrastive focus are targeted more by eye gaze if overall looks are considered, raising the possibility that a pragmatic effect of contrast occurs early. They did not investigate the time course of semantic integration of prosodic focal information.
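The difference between an overall proportion of looks and a time-locked time course can be illustrated with a short sketch. Everything in it is hypothetical: the sample format, the 200 ms bin width, the entity labels, and the data are assumptions chosen for illustration and do not reproduce any analysis from Gennari et al. (2005) or from the present study.

```python
# Illustrative sketch: sample format, bin width, labels, and data are assumptions.
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[int, str]  # (ms from utterance onset, fixated entity)


def overall_proportion(samples: List[Sample], target: str) -> float:
    """Share of all samples on target, ignoring when the looks occurred."""
    return sum(entity == target for _, entity in samples) / len(samples)


def binned_proportions(samples: List[Sample], target: str,
                       onset_ms: int, bin_ms: int = 200) -> Dict[int, float]:
    """Share of samples on target per bin, time-locked to onset_ms."""
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for t, entity in samples:
        b = (t - onset_ms) // bin_ms  # negative bins precede the critical word
        totals[b] += 1
        hits[b] += int(entity == target)
    return {b * bin_ms: hits[b] / totals[b] for b in sorted(totals)}


# Made-up samples for one trial; the accented word is assumed to start at 200 ms.
samples = [(0, "other"), (100, "other"), (200, "competitor"),
           (300, "target"), (400, "target"), (500, "target")]

print(overall_proportion(samples, "target"))             # 0.5
print(binned_proportions(samples, "target", onset_ms=200))
# {-200: 0.0, 0: 0.5, 200: 1.0}
```

The single overall value of 0.5 is compatible with looks arriving before, during, or long after the accented word. Only the binned series shows that, in this invented trial, looks to the target begin after the assumed onset of the accent.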

⁴ They further report that significantly more looks targeted what they labelled ‘contrast’ entities, namely ‘the man’s coffee (as well as on the set of contrasting elements such as the teapot taken as a whole)’ (Gennari et al., 2005, p. 256), in Marked Stress than in Neutral Stress. Unfortunately, it is unclear whether the combined looks to ‘contrast entities’ include looks to ‘the man’s milk’, which is a crucial object for determining the truth value of the sentence The mother only gave some milk to the boy. Whether or not the ‘man’s milk’ is included in what Gennari et al. (2005) called ‘contrast entities’, it is difficult to interpret the increased looks to these entities in the Marked Stress condition. It is unexpected in the light of the semantics outlined above (see examples 5–7), since neither the man’s coffee nor the woman’s teapot plays any role in either the non-focal or the focal meaning component in the Marked Stress condition. For establishing the truth value of the sentence The mother only gave some MILK to the boy, only the boy’s possessions are relevant. At the same time, it is also hard to interpret this increase as a manifestation of the pragmatic effect of contrastivity demonstrated by Ito and Speer (2008, 2011). Ito and Speer (2008, 2011) found that looks increase to the target entities when contrastive accent is used appropriately, not to the entities contrasting with the target entity.

To sum up, we have seen that prosodic focus, as an information structuring device, often has pragmatic effects, i.e., it makes certain utterances felicitous or infelicitous in context. One such effect is contrastivity, which was investigated by Eberhard et al. (1995) and Ito and Speer (2008, 2011). What they found was that contrastive prosody facilitates reference resolution and that the contrastive information is interpreted with respect to the actual context. But focus can also directly contribute to the semantics of the utterance if an operator such as ‘only’ is present in the utterance. In such sentences, one simply cannot compute the full meaning of the utterance without knowing where the prosodic focus is. So in such sentences, focus has a semantic effect, not just a pragmatic one. Semantic integration of focus was investigated by Paterson et al. (2007), but in a reading study, so the position of the focal prosody is only inferred, which may have contributed to the observed delay in the integration of the prosodic focus information into the semantic parse.

Our Study

We investigated utterances with alleen ‘only’ with different focal accent placements in a visual world paradigm, such as the Dutch Ik heb alleen SELDERIJ aan de brandweerman gegeven ‘I only gave CELERY to the fireman’ versus Ik heb alleen selderij aan de BRANDWEERMAN gegeven ‘I only gave celery to the FIREMAN’. Our objective was to detect the earliest point that participants’ eye gaze patterns give evidence that they consider the propositions that make up the focal meaning component of the utterances with different focal accents. Our study thus reveals how fast different focal accent placements on the arguments of the verb get integrated into the semantic parse of the utterance during auditory comprehension. We carried out three visual-world paradigm experiments to investigate these issues, measuring response times and the time course of eye movements. In Experiment 1 the divergent expected responses (Early Stress: YES; Late Stress: NO) corresponded to the Gennari et al. (2005) study to maximize chances of comparison. In Experiment 2, the visual stimulus was adapted in such a way that the expected response was YES in both conditions. In Experiment 3, the visual stimuli were changed to trigger NO responses in both conditions.

Our hypothesis was that if participants integrate prosodic focal information immediately, their looks will reflect the semantic parsing of the utterance during utterance comprehension. The alternative position is that only the pragmatic effect of contrast is fast, while semantic integration of prosodic focus into the parse only appears later, during wrap-up computation. In order to determine the expected looks during sentence verification, let us apply Rooth’s (1992) semantics to a specific example from our experiments and to the visual scene used in Experiment 1 (see Figure 1).

(10) Early Stress (ES) condition:
     a. Example in English: I only gave CELERY to the fireman
     b. Non-focal meaning: I gave celery to the fireman
     c. Focal meaning: I did not give anything else to the fireman = {I didn’t give x to the fireman AND I didn’t give y to the fireman AND I didn’t give z to the fireman ..., where x, y, z, ... are objects that could have been given to the fireman in the context}
     d. Potentially relevant entities for verification of focal meaning in visual context: fireman and his objects

FIGURE 1 | Example of visual stimulus for Experiment 1.

(11) Late Stress (LS) condition:
     a. Example in English: I only gave celery to the FIREMAN
     b. Non-focal meaning: I gave celery to the fireman
     c. Focal meaning: I did not give celery to anyone else = {I didn’t give celery to x AND I didn’t give celery to y AND I didn’t give celery to z ..., where x, y, z, ... are people that celery could have been given to in the context}
     d. Potentially relevant entities for verification of focal meaning component in visual context: any other person in the picture and their objects
     e. Falsifying proposition in Experiment 1: I gave celery to the diver.
     f. Entities relevant for the falsifying proposition in Experiment 1: diver, diver’s celery
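The verification logic in (10) and (11) can be sketched as follows. This is a hypothetical illustration, not the procedure used in the experiments: the scene is encoded as a mapping from persons to their objects, and the items `item_a` and `item_b` are placeholders for objects that the text does not name. The only scene facts taken from the text are that the fireman has nothing but celery, that the diver has celery, and that the doctor has none.

```python
# Illustrative sketch of (10)-(11); the scene encoding and placeholder items are assumptions.
from typing import Dict, List, Set, Tuple

Scene = Dict[str, Set[str]]  # person -> objects shown with that person


def verify(scene: Scene, thing: str, recipient: str,
           stress: str) -> Tuple[bool, List[str]]:
    """Return (utterance matches scene, persons to inspect for the focal component)."""
    non_focal = thing in scene[recipient]  # cf. (10b)/(11b)
    if stress == "early":
        # Focus on the direct object, cf. (10c-d): inspect only the recipient.
        to_inspect = [recipient]
        focal = scene[recipient] <= {thing}
    elif stress == "late":
        # Focus on the indirect object, cf. (11c-d): inspect everyone else.
        to_inspect = sorted(p for p in scene if p != recipient)
        focal = all(thing not in scene[p] for p in to_inspect)
    else:
        raise ValueError("stress must be 'early' or 'late'")
    return non_focal and focal, to_inspect


# Hypothetical encoding of the Experiment 1 scene.
scene = {"fireman": {"celery"},
         "diver": {"celery", "item_a"},
         "doctor": {"item_b"}}

print(verify(scene, "celery", "fireman", "early"))  # (True, ['fireman'])
print(verify(scene, "celery", "fireman", "late"))   # (False, ['diver', 'doctor'])
```

Under these assumptions the Early Stress utterance matches the scene and the Late Stress utterance is falsified by the diver’s celery, in line with the expected YES and NO responses for Experiment 1. The second element of each result lists the persons whose objects bear on the focal meaning component in that condition.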

Concerning the time course of the verification procedure the following predictions hold. The earliest point at which the focal or non-focal meaning components can be verified is when the proposition is complete. Given that the verb is predictable in our experiment, we may actually find verification of the focal (and non-focal) meaning component to start at the indirect object directly preceding it. In the ES condition, marked stress and thus focus can be identified earlier: at the point when the direct object is heard. But note that the focal meaning components cannot yet be computed at this point. When the participant hears I only gave CELERY to the... the sentence could end in two different ways. Either the indirect object turns out to be the fireman, as in our actual example, in which case the utterance matches the picture, or the indirect object could turn out to be the diver, in which case the utterance would not match the picture. (In principle, the utterance could also end with the indirect object the doctor, but this would constitute an infelicitous utterance. Since the doctor has no celery in the picture, the non-focal meaning component would not be true.) For this reason, we do not expect a difference in response times across the conditions. In terms of expected number of correct responses, we do not expect a difference between the conditions either. Both the ES and LS conditions contain prosodically marked, contrastive stress associated with only, so there is no reason to expect that either condition would be easier to perform than the other.

Given the basic linking hypothesis that auditory input guides visual attention, one would expect looks to target the entities mentioned in the utterances, i.e. the fireman and any celery when these words are heard. Looks to these entities could also reflect verification of the non-focal meaning component, which directly corresponds to the utterance without only.5

5 We remain agnostic as to whether participants would actually verify the truth of the non-focal meaning component. Kim (2008) found that presuppositional meaning is sometimes verified, and sometimes not, depending on the task and the saliency of the entities and the grammatical encoding of the elements. Note that if the focal meaning is verified, the verification can only take place after the indirect object has been heard.

The main goal of this paper is to investigate the effects of prosodic focal accent on the direct object versus the indirect object in only-sentences. For this reason, we will be primarily interested in the looks that can be attributed to the different focal meanings. The looks required to verify the focal meaning components differ across the two conditions. Let us take the ES condition first. In order to verify I didn’t give anything else to the fireman, looks need not shift away from the fireman and his plates. Participants simply need to check that the fireman does not have other objects beside his celery.

In contrast, in the LS condition, the focal meaning component is I didn’t give celery to anyone else. Verification of this proposition would require checking that the propositions making up the focal meaning component, i.e., (11c), are all true. This would mean looking at any people in the picture and their objects because these are potentially relevant entities for determining that the fireman is the only person who received a celery (see 11d). Since the utterance with late stress is actually false in Experiment 1, one of these propositions is false. The actual falsifying proposition in the visual context is ‘I gave celery to the diver’ (see 11e). In other words, it is due to the fact that the diver has a celery in the picture that the utterance does not match the picture. For this reason, we expect that looks will target the diver and the diver’s celery. Without looking at these entities, it is simply impossible to reach the correct response. In addition, looks targeting the diver’s corn during the computation of the response in the LS condition are also consistent with our predictions but are not necessary to reach a correct response. While the diver’s corn is irrelevant for the focal meaning component, participants may look at it to verify that it is not celery. In addition, participants might target the doctor, to see if he has any celery. Since in our specific example the doctor’s plates are empty and the emptiness of a plate is easy to identify parafoveally, it is expected that the doctor’s empty plates are not directly targeted by looks.
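The truth conditions behind these predictions can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the `SCENE` dictionary is a hypothetical encoding of the picture in Figure 1, and all function names are invented here; none of it is part of the experimental software.

```python
from typing import Dict, List, Set

# Hypothetical encoding of Figure 1: each person and the items on their plates.
SCENE: Dict[str, Set[str]] = {
    "fireman": {"celery"},        # one plate with celery, one empty plate
    "diver": {"celery", "corn"},  # one plate with celery, one with corn
    "doctor": set(),              # both plates empty
}


def non_focal(scene: Dict[str, Set[str]], item: str, recipient: str) -> bool:
    """Non-focal meaning: 'I gave <item> to the <recipient>'."""
    return item in scene[recipient]


def focal_early_stress(scene: Dict[str, Set[str]], item: str, recipient: str) -> bool:
    """Focus on the direct object: nothing other than <item> was given to <recipient>."""
    return scene[recipient] <= {item}


def falsifiers_late_stress(scene: Dict[str, Set[str]], item: str, recipient: str) -> List[str]:
    """People other than <recipient> who hold <item>; each one falsifies the LS focal meaning."""
    return sorted(p for p, items in scene.items() if p != recipient and item in items)


def focal_late_stress(scene: Dict[str, Set[str]], item: str, recipient: str) -> bool:
    """Focus on the indirect object: <item> was given to nobody but <recipient>."""
    return not falsifiers_late_stress(scene, item, recipient)


def matches(scene: Dict[str, Set[str]], item: str, recipient: str, stress: str) -> bool:
    """True if the only-sentence matches the scene; stress is 'early' or 'late'."""
    if not non_focal(scene, item, recipient):
        return False
    if stress == "early":
        return focal_early_stress(scene, item, recipient)
    return focal_late_stress(scene, item, recipient)
```

Under this encoding, the ES utterance matches the scene and the LS utterance does not, with the diver as the only falsifying person. Note that evaluating the ES focal meaning inspects only the fireman's own plates, whereas the LS focal meaning requires iterating over the other people in the scene, which mirrors the predicted difference in looks.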

To sum up, our hypothesis is that once participants proceed to verify the focal meaning component, looks will diverge across the two conditions. In particular, we predict that in the ES condition looks will stay on the fireman and the fireman’s celery, while in the LS condition, looks will shift to the diver, the diver’s celery and to a lesser extent to the diver’s corn (and perhaps even to the doctor). This could take place at the earliest during the indirect object or during the sentence-final verb gegeven if semantic integration of prosodic focus is fast, and may occur after the utterance offset if semantic integration of prosodic focus is slower. If looks diverge in the predicted way already at the point of the indirect object, we would take that to be evidence for fast, incremental semantic integration of prosodic focus information.

There is one additional specific conclusion that we can draw, if our predictions are borne out, irrespective of the fast or slow nature of the semantic integration. The proposed findings would constitute evidence that the verification process corresponds to the semantics associated with the utterance (see Horn, 1969; Krifka, 1992; Rooth, 1992 and discussion above). In principle, one may imagine that instead of looks corresponding directly to the focal meaning component, participants could engage in heuristic strategies. For argument’s sake, one may hypothesize for instance that in order to verify an utterance involving only it would be enough to look for the falsifying entity. So, in I only gave celery to the FIREMAN looks could target any celery in the picture that does not belong to the fireman. The participant could legitimately reject the utterance without having verified that this offending celery in fact belongs to the diver. In other words, looks to the falsifying entity are logically necessary for falsification, but looks to the possessor of that falsifying entity are not. If we find looks targeting the diver too, that would support the hypothesis that sentence verification follows the proposition-based semantics associated with only-sentences. In other words, we take looks to the diver (in addition to looks to the diver’s celery) as an indication that participants do not simply look for an offending object, but attempt to verify the relevant proposition of the focal meaning component, I didn’t give celery to anyone else, falsified by the proposition I gave celery to the diver. Naturally, this evidence is only indirect. We cannot be sure that looks to the diver necessarily mean that participants entertain the falsifying proposition. But the implication holds the other way: anyone entertaining the falsifying proposition would have to look at the diver as well as his celery.

EXPERIMENT 1

Materials and Methods

Participants

Twenty adult participants were recruited from the UiL OTS participant pool, which is largely made up of undergraduate students from Utrecht University. All participants were non-dyslexic native speakers of Dutch. Participants were unaware of the purpose of the experiment, and were paid 5€ for their participation. The mean age of the participants was 22;8 years (range: 19–29); 18 participants were female; 17 were right-handed. This study was carried out in accordance with research ethical laws of the Netherlands with informed consent from all subjects.

Materials

Sixteen items were constructed in two conditions. Figure 1 above shows an example scene for both conditions. There are three persons in the picture; a diver, a doctor and a fireman in this example. The persons all hold large plates on their left and right side. Some of the plates are empty and some contain food or drink items, a celery or a corn cob in this example. In particular, the diver has a plate with a celery and one with a corn cob, while the fireman has a plate with a celery and an empty plate.

As shown by the example test items in (12), the expected response in the ES condition was YES, while it was NO in the LS condition. This allowed us to determine whether participants reached the correct response depending on the prosody of the item. This also allowed us to have results that are comparable to Gennari et al.’s (2005) study, although this difference does introduce a potential confound for response time measurements.

The visual scenes were designed using a 3 × 3 grid design; the distance between any two objects was identical and sufficiently large to be well suited for eye-tracking evaluation. The pairs of objects in the pictures – corn and celery in Figure 1 – were chosen to match in size, shape, and gray value, to make sure participants shift their gaze to them and do not identify them parafoveally while looking at the person in the middle.

The audio stimuli corresponding to the visual scene in Figure 1 are in (12). The full items list is in Appendix 3.

(12) a. ES condition: Expected answer: YES
Ik heb alleen SELDERIJ aan de brandweerman gegeven
I have only celery to the fireman given
‘I only gave celery to the fireman.’

b. LS condition: Expected answer: NO
Ik heb alleen selderij aan de BRANDWEERMAN gegeven
I have only celery to the fireman given
‘I only gave celery to the fireman.’

For practical reasons, the experiment was performed in Dutch. Dutch prosody is sufficiently similar to English prosody to allow comparison with previous work in English. Both languages mark contrastive focal accent by enhanced duration and H∗L pitch accent. Stress placement within an utterance is free to match the focus within the syntactic scope of the semantic operator. Relevant phonetic details for the examples in (12) are in Table S1 in Appendix 1.

Verbal stimuli were pre-recorded by a female native speaker of Dutch. They were checked for the placement of pitch accents using PRAAT (Boersma and Weenink, 2006); for examples see Figures 2 and 3.

The names of target objects and people that were used in the sentences were matched in length. They were all at least three syllables long. There were no significant differences between conditions in the overall lengths of the audio stimuli [t(16) = 0.95, p = 0.925].

FIGURE 2 | Pitch track for example of Late Stress (LS) Condition utterance.

FIGURE 3 | Pitch track for example of Early Stress (ES) Condition utterance.

Although the utterances as a whole do not differ in length in the two conditions, there are slight differences in length between conditions in certain auditory segments: the direct object segment was slightly longer when it was stressed (in the ES condition), as was the indirect object segment in the LS condition. These differences canceled each other out overall, since each condition contains exactly one segment with marked stress.

Ninety-eight filler items were constructed including various quantifiers (e.g., niet iedereen ‘not everybody’). The fillers were balanced for YES/NO expected responses. To match our test items, the fillers either involved early marked stress on the direct object or late marked stress on the indirect object. The fillers included a set of 32 control items involving alleen, 16 with early and 16 with late stress, where the expected response was different from the expected response of the corresponding test condition. This would discourage people from developing a strategy relating the position of the accent to the expected response in the test items (i.e., early stress = YES; late stress = NO). Finally, half of these control items referred to the ‘doctor’ (i.e., to the person in the middle in the visual stimulus), so that participants do not disregard the middle person in the picture in general. A list of the type of fillers used is in Table S2 in Appendix 1.

Furthermore, we controlled for potential confounds caused by the spatial location of objects in the picture by varying their positions on the top-down and the left-right axes. The falsifying entity for the LS condition (the diver’s celery in the example in Figure 1) appeared in four different positions: in four items it was located in the top left-hand corner of the picture (as in Figure 1); four times it appeared in the top right; four times in the bottom right; and four times in the bottom left.

Procedure

The participants were tested individually in a sound-treated booth. Prior to the experiment, they read an instruction sheet, which included the setting of the experiment (see Appendix 2 for a translation). This provided a context in which both utterances with early stress and utterances with late stress would be pragmatically felicitous. The participants’ task was to indicate whether the sentence matched the visual scene by pressing a button on a button box, using the dominant hand to give a YES response, and the non-dominant hand to give a NO response. This might have introduced a response time bias in favor of the ES condition.

The experiment was programmed in FEP (Veenker, 2005). Eye movements of the participants’ right eye were recorded with an EyeLink 1000 eye tracker in remote mode using a target sticker to track head movements, at a 500 Hz sampling rate. Participants were seated at a distance of 600–650 mm from the screen where the visual image was presented; the height of the participants’ chair was adjusted to get an optimal image of the eye.

After the experimenter had ensured a clear image of the pupil, corneal reflection, and target sticker, the experimenter left the participant booth and a 13-point calibration and validation procedure was initiated from the control room. These were repeated until the experimenter was satisfied that they were successful. Every stimulus was preceded by a fixation target in the middle of a blank screen. An automatic drift check was applied as the participant fixated this fixation target and a recalibration initiated if the drift check indicated a drift of more than 20 pixels. Participants were allowed 1000 ms to explore the visual scene before the utterance was presented. The whole procedure, including instruction and calibration, took about 20 minutes for each participant.

After successful calibration, the participants were exposed to a practice block of 12 practice items (fillers, 2 of those resembling experimental items), to familiarize them with the task. The practice block was followed by a short pause in which the participants could ask questions about the task (if necessary). After this, the experiment would start. The remaining 118 trials (32 test items, 32 controls, 54 fillers) were presented in two blocks; each block was preceded by a calibration.

All the names of the persons and objects involved in the experimental items were mentioned in the first 16 filler trials (including the practice block), to ensure that the participants had seen them and knew what they were called.

All participants saw all the test items in both conditions. The items and fillers were presented to the participants in a pseudo-randomized order where an experimental item never directly followed another experimental item in any condition; of the (experimental or filler) items involving alleen ‘only’, the trials with late stress never followed a trial with early stress or vice versa; and experimental items never followed a filler involving alleen ‘only’ with any stress pattern. No more than three trials with the same stress pattern occurred successively.
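The ordering constraints just listed can be checked mechanically. The following Python sketch is one possible reading of them, not the authors' randomization procedure: the `Trial` type and its labels are hypothetical, and the stress-switch constraint is interpreted here as applying to directly adjacent trials that both involve alleen.

```python
from typing import List, NamedTuple


class Trial(NamedTuple):
    kind: str    # "test", "control" (a filler involving alleen), or "filler"
    stress: str  # "early" or "late"


ALLEEN_KINDS = {"test", "control"}  # trial kinds that contain alleen 'only'


def violations(order: List[Trial], max_run: int = 3) -> List[str]:
    """Return a description of every constraint violated by a trial order."""
    problems: List[str] = []
    run = 1  # length of the current run of identical stress patterns
    for i in range(1, len(order)):
        prev, cur = order[i - 1], order[i]
        if cur.kind == "test" and prev.kind == "test":
            problems.append(f"trial {i}: test item directly follows a test item")
        if cur.kind == "test" and prev.kind == "control":
            problems.append(f"trial {i}: test item directly follows an alleen filler")
        # Assumed reading: adjacent alleen trials may not switch stress pattern.
        if (prev.kind in ALLEEN_KINDS and cur.kind in ALLEEN_KINDS
                and prev.stress != cur.stress):
            problems.append(f"trial {i}: stress switch between adjacent alleen trials")
        run = run + 1 if cur.stress == prev.stress else 1
        if run > max_run:
            problems.append(f"trial {i}: more than {max_run} trials with the same stress")
    return problems
```

A candidate order would be accepted only if `violations(order)` returns an empty list; a generator could shuffle repeatedly until that holds.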

Results

For the response data, two experimental trials belonging to the same participant were removed because the answer had already been given before onset of the indirect object. In addition, one filler trial was removed because the answer had been given before sentence onset.

Number of Correct Responses

The percentage of correct responses for the LS condition was 98%, for the ES condition 99%. The difference was not significant [F1(1,19) = 2.923, p = 0.104, ηp² = 0.133]. The overall correct response rate for the experiment was 98%.

Response Time

The overall mean response time from utterance onset for the LS condition was 3034 ms, while it was 3048 ms for the ES condition. The difference is not significant [F1(1,19) = 0.027, p = 0.871, ηp² = 0.001].

So, we did not find a significant effect in response time or accuracy across the conditions.

Eye Gaze Patterns

Coding and analysis

We identified six Areas of Interest, the ‘fireman’, the ‘fireman’s celery’, the ‘diver’, the ‘diver’s celery’, the ‘diver’s corn,’ and the ‘doctor’. See Figure 1. Fixations were assigned to the AoI they occurred on. For ease of reference, we refer to the AoIs with the content of the example stimulus in Figure 1; the data and plots that we give are calculated over all items and participants, so when for instance we say ‘fireman’, we mean ‘the person in the picture who is part of the non-focal proposition in all the items’.

For the fine-grained analysis of eye movements over time, we divided the utterances into relevant audio segments, and determined the onset of each segment using PRAAT. The sentence segments are: selderij/SELDERIJ ‘celery/CELERY’; aan de ‘to the’; BRANDWEERMAN/brandweerman ‘FIREMAN/fireman’; gegeven ‘given’. For each segment, we analyzed the fixation samples falling between 200 ms after segment onset, and 200 ms after the offset of that auditory segment, to take into account that it takes 200 ms to launch a saccade driven by linguistic input (cf. Altmann and Kamide, 2004). The final three segments comprise the interval starting at 200 ms after the onset of the verb gegeven ‘given’ and ending 1500 ms later. This was divided into three segments of identical length. For ease of reference we call these the auditory segment gegeven ‘given’, the ‘first 500 ms interval after offset’ and the ‘second 500 ms interval after offset’. For reference, the average durations of the utterance components are given in Table S3 in Appendix 1.
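As an illustration of this windowing scheme, the Python sketch below shifts each segment by the 200 ms saccade lag, splits the 1500 ms interval after verb onset into three equal parts, and computes the share of a window covered by fixations on one AoI. The function names and the example timings are hypothetical; the actual segment onsets were measured in PRAAT and are not reproduced here.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # (start_ms, end_ms), relative to utterance onset

SACCADE_LAG_MS = 200  # time assumed to launch a language-driven saccade


def segment_window(onset_ms: int, offset_ms: int, lag_ms: int = SACCADE_LAG_MS) -> Window:
    """Analysis window for one auditory segment, shifted by the saccade lag."""
    return (onset_ms + lag_ms, offset_ms + lag_ms)


def final_windows(verb_onset_ms: int, lag_ms: int = SACCADE_LAG_MS,
                  total_ms: int = 1500, parts: int = 3) -> List[Window]:
    """Split the interval starting lag_ms after verb onset into equal windows."""
    start = verb_onset_ms + lag_ms
    step = total_ms // parts
    return [(start + i * step, start + (i + 1) * step) for i in range(parts)]


def fixation_proportion(fixations: List[Window], window: Window) -> float:
    """Proportion of the window covered by the given fixations on a single AoI."""
    w_start, w_end = window
    covered = sum(max(0, min(end, w_end) - max(start, w_start))
                  for start, end in fixations)
    return covered / (w_end - w_start)
```

For example, a segment running from 1000 to 1600 ms would be analyzed over 1200 to 1800 ms, and a verb onset at 2400 ms would yield the three windows 2600 to 3100, 3100 to 3600 and 3600 to 4100 ms.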

For each experimental trial, we computed the proportion of time the participant spent fixating each area of interest in each auditory segment. We averaged these proportions over
TABLE 1 | Analysis of variance per auditory segment for Experiment 1.

| Auditory segment | Factor | df1 | df2 | F1 | p | ηp² |
|---|---|---|---|---|---|---|
| Direct object | AoI | 3.689a | 70.087a | 4.814 | 0.001∗ | 0.202 |
| | Condition | 1 | 19 | 4.682 | 0.043∗ | 0.198 |
| | AoI × condition | 5 | 95 | 2.113 | 0.071† | 0.100 |
| aan de ‘to the’ | AoI | 3.415a | 64.885a | 24.715 | 0.000∗ | 0.565 |
| | Condition | 1 | 19 | 0.223 | 0.642 | 0.012 |
| | AoI × condition | 3.729a | 70.849a | 2.747 | 0.023∗ | 0.126 |
| Indirect object | AoI | 4.148a | 78.805a | 104.798 | 0.000∗ | 0.847 |
| | Condition | 1 | 19 | 5.586 | 0.029∗ | 0.227 |
| | AoI × condition | 4.032a | 76.607a | 8.793 | 0.000∗ | 0.316 |
| gegeven ‘given’ | AoI | 3.015a | 57.278a | 65.559 | 0.000∗ | 0.775 |
| | Condition | 1 | 19 | 0.950 | 0.342 | 0.048 |
| | AoI × condition | 5 | 95 | 28.532 | 0.000∗ | 0.600 |
| 0–500 ms after offset | AoI | 3.064a | 58.217a | 30.372 | 0.000∗ | 0.615 |
| | Condition | 1 | 19 | 1.438 | 0.245 | 0.070 |
| | AoI × condition | 2.210a | 41.998a | 19.112 | 0.000∗ | 0.501 |
| 501–1000 ms after offset | Condition | 1 | 19 | 0.395 | 0.537 | 0.020 |
| | AoI × condition | 2.884a | 54.803a | 3.907 | 0.003∗ | 0.171 |

a Huynh-Feldt corrected.
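The effect sizes in Table 1 can be cross-checked against the reported F values and degrees of freedom, since partial eta squared follows from them as F·df1 / (F·df1 + df2). The short Python sketch below applies this standard identity; the function name is ours, and the check is offered only as a reading aid, not as a description of how the values were produced in SPSS.

```python
def partial_eta_squared(f_value: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    effect = f_value * df_effect
    return effect / (effect + df_error)


# Rows taken from Table 1 (direct object segment and the verb segment).
condition_direct_object = partial_eta_squared(4.682, 1, 19)
interaction_direct_object = partial_eta_squared(2.113, 5, 95)
interaction_verb = partial_eta_squared(28.532, 5, 95)
```

Rounded to three decimals, these reproduce the tabulated 0.198, 0.100 and 0.600. The same identity holds for the Huynh-Feldt corrected rows, because the correction scales both degrees of freedom by the same factor.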

TABLE 2 | Proportions of time spent fixating each of the relevant people and objects in Early Stress (ES) and Late Stress (LS) condition for each auditory segment, and pairwise comparisons between conditions with Bonferroni correction applied for Experiment 1.

| Auditory segment | Person/object | Proportion of time looking in ES | Proportion of time looking in LS | F1(1,19) | p | ηp² |
|---|---|---|---|---|---|---|
| Indirect object | Diver | 0.10 | 0.14 | 10.557 | 0.004∗ | 0.357 |
| | Diver’s celery | 0.04 | 0.10 | 15.540 | 0.001∗ | 0.450 |
| | Fireman | 0.42 | 0.33 | 14.756 | 0.001∗ | 0.437 |
| gegeven ‘given’ | Diver | 0.05 | 0.14 | 29.426 | 0.000∗ | 0.608 |
| | Diver’s celery | 0.04 | 0.12 | 13.907 | 0.001∗ | 0.423 |
| | Fireman | 0.42 | 0.24 | 83.212 | 0.000∗ | 0.814 |
| | Fireman’s celery | 0.19 | 0.12 | 9.824 | 0.005∗ | 0.341 |
| 0–500 ms after offset | Diver | 0.04 | 0.10 | 14.356 | 0.001∗ | 0.430 |
| | Diver’s celery | 0.02 | 0.08 | 25.298 | 0.000∗ | 0.571 |
| | Diver’s corn | 0.01 | 0.04 | 8.803 | 0.008† | 0.317 |
| | Fireman | 0.29 | 0.13 | 20.688 | 0.000∗ | 0.521 |
| | Fireman’s celery | 0.13 | 0.08 | 6.946 | 0.016† | 0.268 |

participants for the ES and LS conditions. We carried out six two-way repeated measures ANOVAs (using SPSS 22), one for each auditory segment, using Visual AoI and Condition as the two main factors. The results are reported in Table 1. We also provide effect size measures (ηp²).

FIGURE 4 | Time course of the mean fixation time proportions for each AoI in Experiment 1.

We found a significant main effect for AoI in all auditory segments, meaning that participants look more to some AoIs than to others in all auditory segments. We also found a significant main effect for condition in the direct object and the indirect object auditory segments, which means that one condition has a more unequal distribution of the proportion of looks than the other. Since they are not relevant to our research question, we do not investigate these main effects further. Although these may well contain interesting information about how our test sentences are processed, in this paper, we concentrate on the differences in eye gaze patterns that can be attributed to the prosodic difference across the conditions, which is manifested in the interactions between Visual AoI and Condition. This is consistent with our intention to investigate whether (and if so when) there is any effect of the position of prosodic focus (i.e., early versus late) on the eye gaze patterns associated with auditory sentence processing. Thus, the most relevant findings are that we found significant interactions of Visual AoI and Condition for the aan de ‘to the’, the indirect object, the verb (gegeven ‘given’) time segments, and for the first and second 500 ms interval after utterance offset. These are indicated in bold in Table 1 for ease of reference. For these auditory segments, we carried out pairwise comparisons, applying Bonferroni corrections, to reveal which AoIs were targeted by eye fixations differently, at any segment, across the two conditions. The significant results are reported in Table 2. Statistically significant differences are indicated with an asterisk, and marginally significant effects are marked with a dagger throughout the paper.
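To make the correction step concrete, the following Python sketch classifies a p-value against a Bonferroni-adjusted threshold. It rests on an assumption that is not stated explicitly in the text: that the family consists of six comparisons per auditory segment, one per AoI, giving a threshold of 0.05 / 6 ≈ 0.0083. The function names and category labels are hypothetical.

```python
def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-comparison threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_comparisons


def classify(p_value: float, alpha: float = 0.05, n_comparisons: int = 6) -> str:
    """Label a p-value relative to the corrected and uncorrected thresholds."""
    corrected = bonferroni_alpha(alpha, n_comparisons)
    if p_value < corrected:
        return "significant after correction"
    if p_value < alpha:
        return "significant only without correction"
    return "not significant"
```

Under this assumption, a comparison with p = 0.005 survives the correction, whereas p = 0.016 does not. A value such as p = 0.008 falls just below the assumed threshold under this strict rule, so treating it as marginal, as is done for the ‘diver’s corn’, is the more conservative choice.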

Figure 4 gives the time course of the mean fixation time proportions for each AoI in each auditory segment in Experiment 1. The figure contains 8 plots corresponding to the nine cells in our 3 × 3 visual scene (the ‘doctor’s empty plates’ are plotted together). In each plot we give the mean proportion of time spent looking at that cell (e.g., the ‘fireman’ etc.) during each auditory segment in the two conditions (ES vs. LS). The error bars indicate ±2 SE.

Discussion

Behavioural Measures

We found a high number of correct responses in both conditions and no significant difference in the number of correct judgments between the conditions. We speculate that the reason why we got a higher number of correct responses than Gennari et al. (2005) may have been because their experiment involved a more realistic and more complex visual scene, while ours was a stylised 3 × 3 design, and because we provided an overall context story, while participants in the Gennari et al. (2005) study heard items without context.

Like Gennari et al. (2005), we did not find any significant differences in response times between the conditions. Importantly, this result may have been influenced by the difference in expected responses (ES: YES, LS: NO), as in the original Gennari et al. (2005) study. It is possible that it takes longer (or shorter) to find a falsifying entity in a picture than it takes to scan the picture and verify that there is no falsifying entity present. It could also be the case that a negative judgment takes longer (or shorter) than a positive one. Moreover, it is possible that the use of the non-dominant hand to tap a NO response introduced a bias in favor of the ES condition. In order to control for these factors, we performed Experiment 2, where the expected response in both conditions was YES. In Experiment 3, the expected response was NO in both conditions.

Eye Gaze Patterns

The most important finding from Experiment 1 is that looks started diverging across conditions in the predicted way during the indirect object time segment: More looks targeted the ‘diver’ and the ‘diver’s celery’ in the LS condition than in the ES condition and more looks targeted the ‘fireman’ in the ES condition than in the LS condition. This provides evidence that participants have computed the focal meaning component as early as the indirect object time segment: so focus information was integrated into the semantic parse of the utterance very fast. This is because the observed divergent looks correspond to the participants’ attempt at verifying the focal meaning component of the utterances, which is different in the two conditions [see (10c) and (11c) above].

The looks follow the predictions further at the sentence-final verb gegeven ‘given’ and after the utterance offset: participants’ looks target the ‘fireman’ and the ‘fireman’s celery’ more in the ES condition, while looks target the ‘diver’, the ‘diver’s celery’, and somewhat later also the ‘diver’s corn’ in the LS condition (the effect on the ‘diver’s corn’ is right at the Bonferroni-corrected alpha level; to be on the conservative side, we take it to be marginally significant). As expected, the effect was longer on the ‘diver’s celery’ and the ‘diver’ as these correspond to the actual falsifying proposition in the LS condition, and it was shorter and less pronounced on the ‘diver’s corn’, which is an entity that is only potentially relevant for falsification.

These findings also constitute evidence that the verification process corresponds to the semantics associated with the utterance (see Horn, 1969; Krifka, 1992; Rooth, 1992 and discussion above). Participants do not simply look for an offending object (i.e., ‘diver’s celery’), their looks also target the person holding that object (i.e., ‘diver’). We take this to mean that they are verifying the relevant proposition of the focal meaning component, I didn’t give celery to anyone else, falsified by the proposition I gave celery to the diver.

Our expectation that looks follow the logic of the semantically determined focal meaning component is further supported by the relative absence of looks to irrelevant entities. While participants do look at the (potentially) falsifying entities (the ‘fireman’s celery’ in the ES condition; the ‘diver’s celery’ in the LS condition), they do not target Gennari et al.’s (2005) ‘contrast entities’, i.e., the ‘diver’s corn’ in the ES condition. At no auditory segment are looking proportions to the ‘diver’s corn’ higher in the ES condition than in the LS condition.

Overall, our findings are consistent with early and incremental focus identification and association with only, consolidating earlier results by Gennari et al. (2005), Paterson et al. (2007), and Ito and Speer (2008), and pinpointing the effect to the earliest point in time that participants have the necessary information to compute the meaning components of the utterance, namely the indirect object. No facilitation was found for response times. However, this may have been due to the fact that the two conditions had divergent expected responses in Experiment 1. We investigate this issue further in Experiment 2.

As a final point, recall that our research aim is to determine the earliest point at which participants’ looks give evidence of distinguishing the two conditions. In general, in visual-world eye-tracking, we can observe that fixation proportions on a particular AoI always grow gradually, eventually reaching a peak, and then gradually diminishing. As a result, any robust difference we find in a particular sound segment is likely to be immediately preceded (and followed) by a less robust difference in the preceding (or following) sound segment. As noted above, we found a predicted difference between time spent fixating on the ‘fireman’ in the ES and the LS conditions, which is significant (with Bonferroni correction) from the indirect object segment onwards. But as Figure 4 shows, the difference in looks to the ‘fireman’ across the conditions seems to start to grow already during the aan de ‘to the’ segment. At this segment, the effect is not robust under correction for multiple comparisons [F1(1,19) = 6.006, p = 0.024, ηp² = 0.240], so we cannot draw any strong conclusions from it, but it is an indication that the effect may start to happen earlier than we expected. This is surprising because it suggests that participants start looking at the ‘fireman’ before they have actually heard the noun brandweerman ‘fireman’. We interpret the potential early start of the effect as the participants’ anticipating the continuation of the utterance to be brandweerman ‘fireman’ at the point when they have heard Ik heb alleen SELDERIJ aan de. . . ‘I only (gave) CELERY to the. . .’. We tested this hypothesis in Experiment 3.

EXPERIMENT 2

Material and Methods

Participants
Twenty non-dyslexic native Dutch speakers were recruited from the UiL OTS participant pool. Participants were unaware of the purpose of the experiment, and were paid €5 for their participation. The mean age of the participants was 24;3 years (range: 19–46); 19 were female and 1 male; 17 participants were right-handed.











TABLE 3 | Analysis of variance for Experiment 2.

Auditory segment           Effect          df       Error df   F         p        ηp²
Direct object              AoI             2.351a   44.671a    32.373    0.000∗   0.630
                           Condition       1        19         0.914     0.351    0.046
                           AoI∗condition   2.869a   54.514a    0.952     0.419    0.048
aan de ‘to the’            AoI             1.874a   35.606a    108.419   0.000∗   0.851
                           Condition       1        19         0.215     0.648    0.011
                           AoI∗condition   2.571a   48.857a    0.551     0.623    0.028
Indirect object            Condition       1        19         1.061     0.316    0.053
                           AoI∗condition   3.339a   63.437a    5.486     0.001∗   0.224
gegeven ‘given’            AoI             2.089a   39.684a    18.029    0.000∗   0.487
                           Condition       1        19         0.213     0.649    0.011
                           AoI∗condition   4        76         9.312     0.000∗   0.329
0–500 ms after offset      Condition       1        19         0.014     0.907    0.001
                           AoI∗condition   3.647a   69.295a    3.946     0.008∗   0.172
501–1000 ms after offset   AoI             2.740a   52.055a    2.690     0.061    0.124
                           Condition       1        19         0.572     0.459    0.029
                           AoI∗condition   2.932a   55.700a    1.064     0.371    0.053


TABLE 4 | Proportions of time spent fixating each of the relevant people and objects in ES and LS condition for each auditory segment, and pairwise comparisons between conditions with Bonferroni corrections applied for Experiment 2.

Auditory segment        Person/object      Proportion in ES   Proportion in LS   F1(1,19)   p        ηp²
Indirect object         Diver’s corn       0.05               0.09               6.777      0.017†   0.263
                        Fireman            0.44               0.37               8.070      0.010†   0.298
gegeven ‘given’         Diver’s corn       0.05               0.13               17.702     0.000∗   0.482
                        Fireman’s celery   0.20               0.12               22.323     0.000∗   0.540
0–500 ms after offset   Diver’s corn       0.04               0.08               9.169      0.007∗   0.326

Materials
As in Experiment 1, 16 items were constructed in two conditions. The auditory stimuli were identical to the ones used in Experiment 1. The visual stimuli of Experiment 1 were changed in such a way that the expected responses were YES in both conditions. In particular, the ‘diver’ had a plate with a ‘corn cob’ and an empty plate, while the ‘fireman’ had a plate with a ‘celery’ and an empty plate. See Figure 5 for an example.

The 64 unrelated fillers from Experiment 1 were included alongside the test items. In addition, 32 controls involving alleen ‘only’ were created, 16 with early stress and 16 with late stress, with half of the items referring to the ‘doctor’. The expected response for the controls was NO, to counterbalance the YES bias introduced by the test items.

Procedure
The procedure was identical to that of Experiment 1.

Predictions
We were interested to see if there was a facilitatory effect of early stress resulting in shorter response times in the ES condition compared to the LS condition. Regarding eye movement patterns, our predictions were the same as in Experiment 1 except that, since both conditions are true in the pictures, there is no falsifying entity in the picture.

Results
Two trials from different experimental participants were removed from the response data because the response was given before the indirect object segment.

Number of Correct Responses
The percentage of correct responses for the LS condition was 100%, for the ES condition 99%. The difference was not significant [F1(1,19) = 0.322, p = 0.577, ηp² = 0.017]. The overall correct response rate for the experiment as a whole was 98%.

Response Time
The overall mean response time from utterance onset for the LS condition was 2843 ms, while it was 2868 ms for the ES condition. The difference is not significant [F1(1,19) = 0.147, p = 0.706, ηp² = 0.008].

Eye Gaze Patterns
Coding and analysis were identical to those in Experiment 1.








Figure 6 gives the mean proportion of looking time for each AoI in the auditory segments.
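To make the dependent measure concrete, the sketch below computes, for a single trial, the proportion of each auditory segment spent fixating each AoI. It is an illustration under our own assumptions rather than the coding procedure used in the experiments: the record layout (fixation start, end, AoI label), the segment boundaries, and all names and numbers are hypothetical.

```python
from collections import defaultdict


def looking_proportions(fixations, segments):
    """Proportion of each segment's duration spent fixating each AoI.

    fixations: list of (start_ms, end_ms, aoi) tuples for one trial.
    segments:  dict mapping segment name -> (start_ms, end_ms).
    """
    result = {}
    for name, (seg_start, seg_end) in segments.items():
        duration = seg_end - seg_start
        time_on_aoi = defaultdict(float)
        for fix_start, fix_end, aoi in fixations:
            # Only the part of the fixation that overlaps the segment counts.
            overlap = min(fix_end, seg_end) - max(fix_start, seg_start)
            if overlap > 0:
                time_on_aoi[aoi] += overlap
        result[name] = {aoi: t / duration for aoi, t in time_on_aoi.items()}
    return result


# Hypothetical trial: times in ms from utterance onset.
fixations = [
    (900, 1300, "diver"),
    (1300, 1900, "fireman"),
    (1900, 2400, "fireman's celery"),
]
segments = {"aan de": (1000, 1200), "indirect object": (1200, 2000)}

props = looking_proportions(fixations, segments)
print(props["aan de"])           # {'diver': 1.0}
print(props["indirect object"])  # {'diver': 0.125, 'fireman': 0.75, "fireman's celery": 0.125}
```

Per-condition means of the kind plotted in a fixation-proportion figure would then be obtained by averaging such per-trial proportions over trials and participants.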

Table 3 gives the analysis of variance for Experiment 2. As in Experiment 1, we were interested in the interaction between Visual AoI and Condition, because this would reveal any potential effect of the different positioning of focal accent on sentence processing. The interaction between AoI and Condition was significant during the indirect object, the verb gegeven ‘given’, and during the first 500 ms after the verb. These are indicated with bold in Table 3 for ease of reference. To reveal where the actual differences in eye gaze patterns across the conditions lay, we performed pairwise comparisons for these auditory segments. The significant Bonferroni-corrected results are given in Table 4.

Looks started diverging across conditions during the indirect object auditory segment, where more time was spent looking at the ‘diver’s corn’ in the LS condition, and more time was spent looking at the ‘fireman’ in the ES condition. On the ‘diver’s corn’ the effect continued during the auditory segment corresponding to the verb and the first 500 ms interval after utterance offset. During the utterance-final verb gegeven ‘given’, there was also significantly more time spent targeting the ‘fireman’s celery’ in the ES condition.

Discussion

Behavioral Measures
We found a high rate of correct responses for both test conditions. We did not find that the early occurrence of stress facilitated verification, as response times did not differ across conditions. See Section “General Discussion” below for more on this.

Eye Gaze Patterns
The eye gaze patterns were similar to those in Experiment 1. The looking patterns diverged in the expected way: more looks on the ‘fireman’ and the ‘fireman’s celery’ in the ES condition and on the ‘diver’s corn’ in the LS condition. Perhaps due to the simpler nature of the visual stimulus, the effects are not as sustained over time as they are in Experiment 1.

Let us now turn to our final experiment, where the expected responses were NO in both conditions.

EXPERIMENT 3

Introduction
Recall that in Experiment 1, we found that the difference between conditions in the looks targeting the ‘fireman’ seems to start already before the indirect object was heard. Although we did expect more looks targeting the ‘fireman’ in the ES condition, we did not expect this to happen until after the indirect object de brandweerman ‘the fireman’ was actually heard. We believe that this increase in looks targeting the ‘fireman’ in the ES condition is anticipatory. Our hypothesis is that in a picture verification task, participants employ an unconscious strategy when performing the task: they start out with the assumption that the utterance will match the picture.

Let us explain this in more detail using our actual example. Take the moment when participants hear the first half of the utterance Ik heb alleen SELDERIJ ... ‘I have only CELERY ...’ in a setting where the fireman is the only person that has only celery, as in Figure 1 from Experiment 1. At this point, there is only one continuation of this utterance that would make the sentence true in the picture, namely the actual continuation (i.e., ... aan de brandweerman gegeven. ‘... to the fireman given.’). Any alternative continuation (e.g., referring to the ‘diver’) would make the sentence false. Given that there is only one way a sentence can be true in a picture and there are many ways it could be false, it would make sense for the listener to adopt a cognitive strategy that assumes that the sentence is true until proven wrong. In contrast, in the LS condition, when participants hear Ik heb alleen selderij ... ‘I have only celery ...’ in the context of a picture where both the fireman and the diver have celery, as in Figure 1, there is no continuation of the utterance that can make the sentence true. So, there is no anticipatory advantage in this condition. This has the effect that there is a stronger tendency for participants’ eye gaze to already target the ‘fireman’ in the ES condition than in the LS condition, even before they have heard the word brandweerman ‘fireman’ (i.e., during the aan de ‘to the’ auditory segment).⁶ In short, we speculate that in a bi-modal verification task, participants anticipate the continuation of the sentence to be such that it makes the utterance true in the picture.

This is in line with findings of Altmann and Kamide (1999) and Kamide et al. (2005) (see also Ito and Speer, 2008). These authors found that while listening to utterances presented to them in the visual-world paradigm, participants can sometimes anticipate certain semantic properties of forthcoming lexical items based on the lexical items they have already heard. In particular, they tested utterances like The boy will eat the cake presented in a visual setting with a boy, a cake, a toy car, a toy train and a ball. They found that participants’ eye movements targeted the cake already before the object noun phrase, that is, when they had only heard The boy will eat ... Altmann and Kamide (1999) showed that this was because the verb eat places a semantic restriction on the object noun phrase and the cake was the only edible object in the picture.

The goal of Experiment 3 was to investigate our anticipatory look hypothesis.

Material and Methods

Participants
Twenty-three non-dyslexic native speakers of Dutch were recruited from the UiL OTS participant pool.

Data of three participants were discarded prior to analysis: one participant did not receive adequate instruction prior to the experiment and reread the instruction sheet repeatedly during the experiment; one experimental run suffered from an unresponsive button box; and one participant was intimately familiar with linguistic theories on stress shift. The remaining participants were unaware of the purpose of the experiment, and were paid €5 for their participation. The mean age of the 20 remaining participants was 22;8 years, ranging from 17 to 27 years; 14 were female and 6 male; 18 participants were right-handed.

⁶ Note that in Experiment 2, the continuation of the sentence with brandweerman ‘fireman’ matches the picture in both the ES and LS condition, so our hypothesis predicts the same anticipatory looks targeting the ‘fireman’ in both conditions, with no significant difference between conditions. As can be seen in Figure 4, this expectation was met.

Materials
As in Experiments 1 and 2, 16 items were constructed in two conditions. The auditory stimuli were identical to the ones used in Experiments 1 and 2. The visual stimuli were changed in such a way that the expected responses were NO in both conditions. In particular, the ‘fireman’ had a ‘celery’ and a ‘corn cob’, while the ‘diver’ had a ‘celery’. See Figure 7 for an example.

The 64 unrelated fillers from Experiments 1 and 2 were included alongside the test items. In addition, 32 controls were created, 16 with early stress and 16 with late stress, with half of the items mentioning the ‘doctor’. The expected response for the controls was YES, to counterbalance the test items.

Procedure
The procedure was identical to that of Experiments 1 and 2.

Predictions
The experiment was designed to test our hypothesis that the trend toward an early increase in looks targeting the ‘fireman’ in Experiment 1 was anticipatory in the following sense. Participants hear Ik heb alleen SELDERIJ ... ‘I have only CELERY ...’ and anticipate that the utterance will continue in such a way that it matches the picture. In Experiment 3 the visual stimulus was changed in such a way that the only person that has only celery is the ‘diver’. See Figure 7. The audio stimuli were identical to those of Experiment 1. Thus, our anticipation hypothesis predicts that participants will look more at the ‘diver’ during the aan de ‘to the’ auditory segment in the ES condition. This is because the ‘diver’ has only ‘celery’. But once they hear the indirect object de brandweerman ‘the fireman’, their looks are expected to shift to the ‘fireman’. So, it is expected that in Experiment 3 the anticipatory strategy ‘tricks’ participants.
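The logic of this prediction can be spelled out as a small truth-evaluation exercise. The sketch below is our own illustration, not part of the experimental software. It encodes the Experiment 3 picture as described in the Materials section, restricted to the two people named there, and uses a simplified reading of the two accent patterns (early stress: the named person received celery and nothing else; late stress: the named person is the only one who received celery). All function and variable names are hypothetical.

```python
def true_under_early_stress(picture, person, item="celery"):
    """'I only gave CELERY to the X': X has the item and nothing else."""
    return picture[person] == {item}


def true_under_late_stress(picture, person, item="celery"):
    """'I only gave celery to the X': X has the item and nobody else has it."""
    others = [p for p in picture if p != person]
    return item in picture[person] and all(item not in picture[p] for p in others)


def matching_continuations(picture, is_true):
    """People whose mention would make the utterance match the picture."""
    return [person for person in picture if is_true(picture, person)]


# Experiment 3 picture: the fireman has celery and corn, the diver has celery.
experiment_3 = {"fireman": {"celery", "corn"}, "diver": {"celery"}}

print(matching_continuations(experiment_3, true_under_early_stress))  # ['diver']
print(matching_continuations(experiment_3, true_under_late_stress))   # []
print(true_under_early_stress(experiment_3, "fireman"))               # False
```

Under early stress the only matching continuation is the ‘diver’, which is where anticipatory looks are predicted during aan de ‘to the’; the actual continuation, the ‘fireman’, makes the utterance false. Under late stress no continuation matches, so no anticipatory advantage is expected.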

In addition, we expected that the findings of Experiments 1 and 2 about divergent looks between the two conditions would be replicated, though potentially somewhat delayed, due to the hindering effect of the anticipatory looks.

Results
Eleven trials (six experimental) were removed from analysis of the response data because the response had already been given before the onset of the indirect object (six experimental and four filler items) or utterance onset (one filler).

Number of Correct Responses
The percentage of correct responses for the LS condition was 97%, for the ES condition 98%. The difference was not significant [F1(1,19) = 1.353, p = 0.259, ηp² = 0.066]. The overall correct response rate for Experiment 3 was 97%.

Response Time
The overall mean response time from utterance onset for the LS condition was 2875 ms, while it was 2909 ms for the ES condition. The difference is not significant [F1(1,19) = 0.418, p = 0.526, ηp² = 0.022].

Eye Gaze Patterns
Coding and analysis were identical to those in Experiments 1 and 2.

Table 5 gives the analysis of variance for Experiment 3. As in Experiments 1 and 2, we focus only on the significant interactions between Visual AoI and Condition, as those are the relevant findings for our research question. For ease of reference, these are given in bold.

We find a significant interaction between AoI and Condition in the aan de ‘to the’ auditory segment, during the indirect object, during the verb gegeven ‘given’, and in the two auditory segments after utterance offset. As in Experiments 1 and 2, the significant Bonferroni-corrected pairwise comparisons for these auditory segments are given in Table 6.

Figure 8 gives the mean proportion of looks for each AoI in each auditory segment.

Discussion of Experiment 3 and General Discussion

Behavioral Measures
Overall, in none of the experiments was there any facilitatory effect of the early occurrence of stress in terms of shorter response times or a higher accuracy rate for the ES condition. We think that this is because even though participants may use the earliness of stress to anticipate the continuation of the utterance, they still have to wait until the sentence is actually finished before they can establish the meaning components based on the actual continuation. So, overall, even if accentual information is presented earlier in the ES condition, leading to early identification of focus, this cannot facilitate computation of the meaning components associated with only overall, due to the propositional nature of these meaning components.

TABLE 5 | Analysis of variance for Experiment 3.

Auditory segment           Effect          df       Error df   F        p        ηp²
Direct object              AoI             5        95         7.429    0.000∗   0.281
                           Condition       1        19         0.105    0.750    0.005
                           AoI∗condition   5        95         1.273    0.282    0.063
aan de ‘to the’            AoI             5        95         8.342    0.000∗   0.305
                           Condition       1        19         3.147    0.092    0.142
                           AoI∗condition   5        95         3.434    0.007∗   0.153
Indirect object            Condition       1        19         5.637    0.028∗   0.229
                           AoI∗condition   5        95         3.894    0.003∗   0.170
gegeven ‘given’            AoI             5        95         43.086   0.000∗   0.694
                           Condition       1        19         20.789   0.000∗   0.522
                           AoI∗condition   5        95         16.929   0.000∗   0.471
0–500 ms after offset      Condition       1        19         12.266   0.002∗   0.392
                           AoI∗condition   4.178a   79.389a    11.201   0.000∗   0.371
501–1000 ms after offset   Condition       1        19         0.152    0.701    0.008
                           AoI∗condition   3.310a   62.893a    5.869    0.000∗   0.236

TABLE 6 | Proportions of time spent fixating each of the relevant people and objects in ES and LS condition for each auditory segment, and pairwise comparisons between conditions with Bonferroni corrections applied for Experiment 3.

Auditory segment           Person/object      Proportion in ES   Proportion in LS   F1(1,19)   p        ηp²
aan de ‘to the’            Diver              0.27               0.18               7.518      0.013†   0.284
Indirect object            Fireman’s celery   0.07               0.11               8.288      0.010†   0.304
gegeven ‘given’            Diver’s celery     0.04               0.11               15.733     0.001∗   0.453
                           Fireman            0.33               0.20               28.690     0.000∗   0.602
                           Fireman’s corn     0.18               0.09               21.543     0.000∗   0.531
0–500 ms after offset      Diver’s celery     0.02               0.07               19.961     0.000∗   0.512
                           Fireman            0.15               0.07               13.266     0.002∗   0.411
                           Fireman’s corn     0.13               0.06               16.140     0.001∗   0.459
501–1000 ms after offset   Diver’s celery     0.01               0.04               14.628     0.001∗   0.435

Eye Tracking Patterns
The visual stimulus in Experiment 3 was designed to test our hypothesis that participants anticipate the continuation of the utterance when they hear Ik heb alleen SELDERIJ aan de ... ‘I have only CELERY to the ...’ in the ES condition. If the sentence were to continue with duiker ‘diver’, it would match the picture; and in the ES condition we do indeed find that marginally more time is spent looking at the ‘diver’ than in the LS condition, right before the indirect object is heard. Given that this effect is the start of a fixation curve (i.e., gradual growth, followed by robust peak, followed by gradual diminishing effect), we did not expect a more robust effect at this point. So, we can confirm our anticipation hypothesis.

But the actual continuation of the utterance turns out to be brandweerman ‘fireman’. So, the utterance ends up being false. As predicted (and already found in Experiments 1 and 2), once the indirect object has been heard, looks shift to the ‘fireman’ and his possessions in the ES condition and to the ‘diver’s celery’ in the LS condition. The effects are somewhat delayed compared to Experiment 1, presumably due to the hindering effect of the anticipation strategy. During the sentence-final verb, there are more looks targeting the ‘fireman’ and the ‘fireman’s corn’ in the ES condition and more looks targeting the ‘diver’s celery’ in the LS condition. We also expected that at this point, looks to the ‘diver’ would be higher in the LS condition than in the ES condition, the direct opposite of our expectation for the aan de ‘to the’ auditory segment. Looks to the ‘diver’ are indeed numerically higher in the LS condition during the verb gegeven ‘given’ but the effect does not reach significance [F1(1,19) = 6.576, p = 0.019, ηp² = 0.257]. The expected effects on the ‘fireman’, the ‘fireman’s corn’ and the ‘diver’s celery’ continue throughout the first 500 ms interval after utterance offset. For the ‘diver’s celery’, the effect is still present during the second 500 ms after utterance offset. In addition, a marginally higher proportion of looks targeted the ‘fireman’s celery’ in the LS condition during the indirect object auditory segment, which was unexpected. We interpret this as the participants verifying the non-focal meaning component (i.e., that the ‘fireman’ has ‘celery’), which perhaps does not occur in the ES condition at this point due to the hindering effect of the mis-anticipated continuation.

Overall, we found that sentence verification starts early. In fact, perhaps unexpectedly, it starts already before the whole utterance is heard. Participants anticipate the continuation of the utterance assuming that the utterance will turn out to match the picture. Crucially, prosodic focus on the direct object was found to be relevant for guiding anticipatory looks already at the next sound segment, during aan de ‘to the’ (see Experiment 3). This gives evidence of incremental prosodic focus processing.

In all three experiments, we found that utterance verification proceeds according to the semantics of only-sentences (Horn, 1969; Krifka, 1992; Rooth, 1992). Participants’ looks robustly diverge already during the sentence-final verb gegeven ‘given’ in the two conditions in all three experiments, as they verify the different focal meaning components associated with early and late occurrence of stress. In Experiment 1, robust effects were already found during the indirect object. Thus, we found evidence that participants’ looks do not merely target the falsifying entity in the picture; rather, the falsifying proposition was established. This provides support for the psychological reality of proposition-based semantics for prosodic focus association with only.

CONCLUSION

Our results show incremental focus processing and thus fall in line with earlier results (Gennari et al., 2005; Paterson et al., 2007). Investigating the time course of looks accompanying the computation of only-sentences allowed us to pinpoint the time course of the semantic processing associated with focal differences that are marked prosodically. We found that people process prosodic focus immediately: there is evidence of participants verifying the focal meaning component already during the indirect object. We also found that participants make anticipatory looks in this picture verification task taking into account the prosodic focus of the utterance, providing further evidence of incremental focus computation at the earliest possible point, at the point where the direct object with or without prosodic focus is heard.

AUTHOR CONTRIBUTIONS

KS is responsible for the original design. IM determined the procedure and was responsible for creating the materials, and implementing and carrying out the experiment. IM carried out the data analysis and statistical calculations. KS and IM interpreted the data together and co-wrote the discussions.

FUNDING

KS wishes to acknowledge financial help from the Dutch Science Foundation (NWO) [VENI grant number 275-70-011], the German Research Foundation (DFG) [SFB 632], and to express her gratitude to Wolfson College, Oxford for an academic visitorship while the paper was prepared for submission.

ACKNOWLEDGMENTS

We thank Ignace Hooge for help with designing the visual stimuli. We thank Tom Fritzsche, Judit Gervain, Pim Mak, and Shravan Vasishth for many useful comments about the content of our paper, and Laura Boeschoten, Kirsten Passoa-Schutter, Chris van Run, Maarten Duijndam, and Arnout Koornneef for help on understanding the statistical concepts involved. We thank the students who helped create the stimuli and run the experiments: Stavroula Alexandropoulou, Desirée Capel, Jan Willem Chevalking, Moreno Coco, Anja Goldschmidt, Alexia Guerra Rivera, Loes Koring, Sophia Manika, Yueru Ni, Andry Panayiotou, Myrto Pantazi, Will Schuerman, and Elli Tourtouri. Last but not least, we dedicate this paper to the memory of Tanya Reinhart, who inspired us to carry out this work.

SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2016.00150

REFERENCES

Altmann, G. T., and Kamide, Y. (1999). Incremental interpretation at verbs: restricting the domain of subsequent reference. Cognition 73, 247–264. doi: 10.1016/S0010-0277(99)00059-1

Altmann, G. T. M., and Kamide, Y. (2004). “Now you see it, now you don’t: mediating the mapping between language and the visual world,” in The Interface of Language, Vision, and Action: Eye Movements and the Visual World, eds J. M. Henderson and F. Ferreira (New York, NY: Psychology Press), 347–368.

Arnold, J. E. (2008). THE BACON not the bacon: how children and adults understand accented and unaccented noun phrases. Cognition 108, 69–99. doi: 10.1016/j.cognition.2008.01.001

Boersma, P., and Weenink, D. (2006). Praat: Doing Phonetics by Computer. Available at: http://www.praat.org/

Chomsky, N. (1971). “Deep structure, surface structure, and semantic interpretation,” in Semantics: An Interdisciplinary Reader in Philosophy, Linguistics and Psychology, eds L. A. Jakobovits and D. D. Steinberg (Cambridge: Cambridge University Press), 183–216.

Clifton, C. Jr., Bock, J., and Radó, J. (2000). “Chapter 23 – effects of the focus particle only and intrinsic contrast on comprehension of reduced relative clauses,” in Reading as a Perceptual Process, eds A. Kennedy, R. Radach, D. Heller, and J. Pynte (Oxford: North-Holland), 591–619.

Crain, S., Ni, W., and Conway, L. (1994). “Learning, parsing, and modularity,” in Perspectives on Sentence Processing, eds C. Clifton, L. Frazier, and K. Rayner (Hillsdale, NJ: Lawrence Erlbaum Associates), 443–467.

Crain, S., and Steedman, M. (1985). “On not being led up the garden path: the use of context by the psychological syntax processor,” in Natural Language Parsing: Psychological, Computational, and Theoretical Perspectives, eds D. R. Dowty, L. Karttunen, and A. M. Zwicky (Cambridge: Cambridge University Press), 320–358.

Eberhard, K., Spivey-Knowlton, M., Sedivy, J., and Tanenhaus, M. (1995). Eye movements as a window into real-time spoken language comprehension in natural contexts. J. Psycholinguist. Res. 24, 409–436. doi: 10.1007/BF02143160

Filik, R., Paterson, K. B., and Liversedge, S. P. (2005). Parsing with focus particles in context: eye movements during the processing of relative clause ambiguities. J. Mem. Lang. 53, 473–495. doi: 10.1016/j.jml.2005.07.004

Gennari, S., Meroni, L., and Crain, S. (2005). “Rapid relief of stress in dealing with ambiguity,” in Approaches to Studying World-Situated Language Use: Bridging the Language-as-Product and Language-as-Action Traditions, eds J. C. Trueswell and M. K. Tanenhaus (Cambridge, MA: MIT Press), 245–259.

Horn, L. R. (1969). “A presuppositional analysis of only and even,” in Proceedings of the Annual Meeting of the Chicago Linguistics Society, Vol. 5 (Chicago, IL: University of Chicago), 97–108.




Ito, K., and Speer, S. R. (2008). Anticipatory effects of intonation: eye movements during instructed visual search. J. Mem. Lang. 58, 541–573. doi: 10.1016/j.jml.2007.06.013

Ito, K., and Speer, S. R. (2011). “Semantically-independent but contextually-dependent interpretation of contrastive accent,” in Prosodic Categories: Production, Perception and Comprehension, eds S. Frota, G. Elordieta, and P. Prieto (Dordrecht: Springer Netherlands), 69–92.

Kamide, Y., Altmann, G. T., and Heywood, S. (2005). “The time course of constraint application during sentence processing in visual contexts: anticipatory eye movements in English and Japanese,” in Approaches to Studying World-Situated Language Use: Bridging the Language-as-Product and Language-as-Action Traditions, eds J. C. Trueswell and M. K. Tanenhaus (Cambridge, MA: MIT Press), 229–243.

Kim, C. (2008). “Processing presupposition: verifying sentences with ‘Only’,” in Proceedings of the Penn Linguistics Colloquium, Penn Working Papers in Linguistics, eds J. Tauberer, A. Eliam, and L. MacKenzie, 14, 213–226. Available at: http://repository.upenn.edu/pwpl/vol14/iss1/17

Krifka, M. (1992). “A compositional semantics for multiple focus constructions,” in Informationsstruktur und Grammatik, ed. J. Jacobs (Opladen: Westdeutscher Verlag), 17–53.

Liversedge, S. P., Paterson, K. B., and Clayes, E. L. (2002). The influence of only on syntactic processing of “long” relative clause sentences. Q. J. Exp. Psychol. A 55, 225–240. doi: 10.1080/02724980143000253

Paterson, K. B., Liversedge, S. P., Filik, R., Juhasz, B. J., White, S. J., and Rayner, K. (2007). Focus identification during sentence comprehension: evidence from eye movements. Q. J. Exp. Psychol. 60, 1423–1445. doi: 10.1080/17470210601100563

Paterson, K. B., Liversedge, S. P., and Underwood, G. (1999). The influence of focus operators on syntactic processing of short relative clause sentences. Q. J. Exp. Psychol. A 52, 717–737. doi: 10.1080/713755827

Rooth, M. (1992). A theory of focus interpretation. Nat. Lang. Semantics 1, 75–116. doi: 10.1007/BF02342617

Sauermann, A., Filik, R., and Paterson, K. B. (2013). Processing contextual and lexical cues to focus: evidence from eye movements in reading. Lang. Cogn. Process. 28, 875–903. doi: 10.1080/01690965.2012.668197

Sedivy, J. C. (2002). Invoking discourse-based contrast sets and resolving syntactic ambiguities. J. Mem. Lang. 46, 341–370. doi: 10.1006/jmla.2001.2812

Veenker, T. (2005). Flexible Experiment Program. Utrecht: Utrecht University.

Conflict of Interest Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer BF and handling Editor declared a current collaboration and the handling Editor states that the process nevertheless met the standards of a fair and objective review.

Copyright © 2016 Mulders and Szendroi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

