4.2.2 Analysis of Human Search Strategies


LarKC: The Large Knowledge Collider

a platform for large scale integrated reasoning and Web-search

FP7 – 215535

4.2.2 Analysis of Human Search Strategies

Coordinator: Hansjörg Neth (MPG)

With contributions from: Hansjörg Neth, Lael J. Schooler, Jörg Rieskamp, José Quesada (MPG); Jie Xiang, Rifeng Wang, Lijuan Wang, Haiyan Zhou, Yulin Qin, Ning Zhong, Yi Zeng (WICI)

Quality Assessor: Michael Witbrock (Cycorp)
Quality Controller: Zhisheng Huang (VUA)

Document Identifier: LarKC/2008/D4.2.2/V1.0
Class Deliverable: LarKC EU-IST-2008-215535
Version: 1.0
Date: September 30, 2009
State: final
Distribution: public


EXECUTIVE SUMMARY

This report builds upon the research agendas and plans described in LarKC Deliverable 4.2.1, entitled Experimental designs for mapping human search strategies, submitted in October 2008 (M7 of the LarKC project).

Chapter 2 (co-authored by Hansjörg Neth, Lael J. Schooler and Jörg Rieskamp) is entitled Stopping Rules for Memory Foraging and reports MPG’s ongoing work on human stopping rules. A series of three experiments has been conducted to explore the rules and heuristics that humans use to stop semantic memory retrievals. The results so far indicate that some people are capable of escaping diminishing returns by abandoning tasks for which the marginal gains fall below some threshold, and that decisions to stop and switch tasks are influenced by both bottom-up and top-down features of the tasks and task environments. Although the analyses included in this report are mostly descriptive, they provide guidance towards the development of computational models and LarKC plug-ins that promise to increase system efficiency by stopping processes at the right time.

Chapter 3 (co-authored by Jie Xiang, Rifeng Wang, Lijuan Wang, Haiyan Zhou, Yulin Qin, Ning Zhong, and Yi Zeng) is entitled Application of Heuristics and Heuristic Searching and reports WICI’s work on the selection and application of heuristics in the context of human problem solving. Two experiments that investigated heuristics for selection and searching by combining behavioral research with fMRI and computational cognitive modeling techniques are reported. This work was financially supported by outside funding and is intended to serve as extra support for the LarKC project. As it is unlikely that the fMRI studies will provide applicable results within the timeframe of the LarKC project, they will be removed from future deliverables. But to demonstrate that the work announced in D4.2.1 has been productive, these studies are included here to gain closure on this line of research.


DOCUMENT INFORMATION

IST Project Number: FP7 – 215535    Acronym: LarKC
Full Title: The Large Knowledge Collider: a platform for large scale integrated reasoning and Web-search
Project URL: http://www.larkc.eu/
Document URL:
EU Project Officer: Stefano Bertolo

Deliverable Number: 4.2.2    Title: Analysis of Human Search Strategies
Work Package Number: 4    Title: Reasoning and Deciding

Date of Delivery: Contractual M18, Actual 30-Sep-09
Status: version 1.0, final
Nature: report
Dissemination Level: public

Authors (Partner): Hansjörg Neth, Lael J. Schooler, Jörg Rieskamp, & José Quesada (MPG); Jie Xiang, Rifeng Wang, Lijuan Wang, Haiyan Zhou, Yulin Qin, Ning Zhong, Yi Zeng (WICI)

Resp. Author: Hansjörg Neth (MPG)    E-mail: [email protected]
Partner: MPG    Phone: +49 30 82406-696

Abstract (for dissemination): This report builds upon the research agendas and plans described in LarKC Deliverable 4.2.1, entitled Experimental designs for mapping human search strategies, submitted in October 2008 (M7 of the LarKC project).

Chapter 2 (co-authored by Hansjörg Neth, Lael J. Schooler and Jörg Rieskamp) is entitled Stopping Rules for Memory Foraging and reports MPG’s ongoing work on human stopping rules. A series of three experiments has been conducted to explore the rules and heuristics that humans use to stop semantic memory retrievals. The results so far indicate that some people are capable of escaping diminishing returns by abandoning tasks for which the marginal gains fall below some threshold, and that decisions to stop and switch tasks are influenced by both bottom-up and top-down features of the tasks and task environments. Although the analyses included in this report are mostly descriptive, they provide guidance towards the development of computational models and LarKC plug-ins that promise to increase system efficiency by stopping processes at the right time.

Chapter 3 (co-authored by Jie Xiang, Rifeng Wang, Lijuan Wang, Haiyan Zhou, Yulin Qin, Ning Zhong, and Yi Zeng) is entitled Application of Heuristics and Heuristic Searching and reports WICI’s work on the selection and application of heuristics in the context of human problem solving. Two experiments that investigated heuristics for selection and searching by combining behavioral research with fMRI and computational cognitive modeling techniques are reported. This work was financially supported by outside funding and is intended to serve as extra support for the LarKC project. As it is unlikely that the fMRI studies will provide applicable results within the timeframe of the LarKC project, they will be removed from future deliverables. But to demonstrate that the work announced in D4.2.1 has been productive, these studies are included here to gain closure on this line of research.

Keywords: information search, human experiments, stopping rules, heuristics, Sudoku, fMRI


Version Log

Issue Date          Rev. No.  Author                              Change
August 26, 2009     1         Haiyan Zhou                         WICI provides first draft.
September 10, 2009  2         Hansjörg Neth                       MPG provides first draft (Experiment 2).
September 22, 2009  3         Hansjörg Neth                       Additional MPG parts (Experiments 1+3).
September 26, 2009  4         Hansjörg Neth and Lael J. Schooler  Revisions of experimental parts (MPG).
September 29, 2009  5         Hansjörg Neth                       Submission of version 0.8.
September 30, 2009  6         Hansjörg Neth                       Submission of version 1.0.


PROJECT CONSORTIUM INFORMATION

Semantic Technology Institute Innsbruck, Universitaet Innsbruck
  Contact: Prof. Dr. Dieter Fensel, Semantic Technology Institute (STI), Universitaet Innsbruck, Innsbruck, Austria. Email: [email protected]

AstraZeneca AB
  Contact: Bosse Andersson, AstraZeneca, Lund, Sweden. Email: [email protected]

CEFRIEL - SOCIETA CONSORTILE A RESPONSABILITA LIMITATA
  Contact: Emanuele Della Valle, CEFRIEL, Milano, Italy. Email: [email protected]

CYCORP, RAZISKOVANJE IN EKSPERIMENTALNI RAZVOJ D.O.O.
  Contact: Michael Witbrock, Cycorp, Ljubljana, Slovenia. Email: [email protected]

Hoechstleistungsrechenzentrum, Universitaet Stuttgart
  Contact: Georgina Gallizo, Hoechstleistungsrechenzentrum, Universitaet Stuttgart, Stuttgart, Germany. Email: [email protected]

MAX-PLANCK-GESELLSCHAFT ZUR FOERDERUNG DER WISSENSCHAFTEN E.V.
  Contact: Dr. Lael Schooler, Max-Planck-Institut für Bildungsforschung, Berlin, Germany. Email: [email protected]

Ontotext AD
  Contact: Atanas Kiryakov, Ontotext Lab, Sofia, Bulgaria. Email: [email protected]

SALTLUX INC.
  Contact: Kono Kim, Saltlux Inc., Seoul, Korea. Email: [email protected]

SIEMENS AKTIENGESELLSCHAFT
  Contact: Dr. Volker Tresp, Siemens Aktiengesellschaft, Muenchen, Germany. Email: [email protected]

THE UNIVERSITY OF SHEFFIELD
  Contact: Prof. Dr. Hamish Cunningham, The University of Sheffield, Sheffield, UK. Email: [email protected]

VRIJE UNIVERSITEIT AMSTERDAM
  Contact: Prof. Dr. Frank van Harmelen, Vrije Universiteit Amsterdam, Amsterdam, Netherlands. Email: [email protected]

THE INTERNATIONAL WIC INSTITUTE, BEIJING UNIVERSITY OF TECHNOLOGY
  Contact: Prof. Dr. Ning Zhong, The International WIC Institute, Maebashi, Japan. Email: [email protected]

INTERNATIONAL AGENCY FOR RESEARCH ON CANCER
  Contact: Dr. Paul Brennan, International Agency for Research on Cancer, Lyon, France. Email: [email protected]

INFORMATION RETRIEVAL FACILITY
  Contact: Dr. John Tait, Information Retrieval Facility, Vienna, Austria. Email: [email protected]


TABLE OF CONTENTS

LIST OF FIGURES 8

LIST OF TABLES 9

LIST OF ACRONYMS 10

1 INTRODUCTION 11

2 STOPPING RULES FOR MEMORY FORAGING 12
  2.1 Introduction 12
    2.1.1 The Need for Smart Stopping Rules 12
    2.1.2 Stopping Rules in Animal Foraging Theory 12
    2.1.3 Stopping Rules in Psychology and Cognitive Science 13
    2.1.4 Heuristics for Stopping 16
  2.2 Experiment 1: The Dynamics of Human Free Recall 17
    2.2.1 Introduction 17
    2.2.2 Method 18
    2.2.3 Results 20
    2.2.4 Discussion 21
  2.3 Experiment 2: Discretionary Stopping and Switching 22
    2.3.1 Introduction 22
    2.3.2 Method 22
    2.3.3 Results 23
    2.3.4 Discussion 26
  2.4 Experiment 3: Estimating Set Size and Retrievals 26
    2.4.1 Introduction 26
    2.4.2 Method 27
    2.4.3 Results 29
    2.4.4 Discussion 31
  2.5 Integrative Analysis 31
  2.6 General Discussion 33

3 HEURISTICS APPLICATION AND HEURISTIC SEARCHING 34
  3.1 Motivation 34
  3.2 Experiment 1: Selection and Application of Heuristics 34
    3.2.1 Method 35
    3.2.2 Results 37
    3.2.3 Conclusions 41
  3.3 Experiment 2: Heuristic Searching 41
    3.3.1 Method 41
    3.3.2 Results 43
    3.3.3 Conclusions 43
  3.4 Summary 45

REFERENCES 46


A APPENDICES 49
  A.1 Experimental Tasks 49
  A.2 Experiment 1: Cumulative Curves for each Participant and Question 49
  A.3 Experiment 2: Cumulative Curves for each Participant and Question 53


LIST OF FIGURES

2.1 Optimal patch leaving time according to the marginal value theorem (MVT). 13
2.2 Open-ended retrieval paradigm. 15
2.3 Experimental task screen. 18
2.4 Point/error-time diagram for one participant and question. 20
2.5 Cumulative points of (a) one participant and (b) one question. 21
2.6 Mean number of retrievals over 120 seconds. 22
2.7 Histograms of observed points and question times. 24
2.8 Performance of Participant 10 on three tasks. 24
2.9 Cumulative performance on the same three tasks. 25
2.10 Cumulative performance of three participants. 25
2.11 Experimental task screen. 28
2.12 Immediate feedback on guessing accuracy and speed. 28
2.13 Comparison of actual (circles) and mean estimated (triangles) set sizes for all 60 questions. (Error bars around estimates represent one standard deviation.) 29
2.14 Comparison of mean actual (circles) and mean estimated (triangles) number of retrievals (after 120 seconds) for all 60 questions. (Error bars around estimates represent one standard deviation of the empirical estimates.) 30
3.1 Examples of simplified Sudoku problems. 35
3.2 Protocol of a scan trial. 35
3.3 BOLD responses in ACT-R theory and fMRI data in Study 1. 39
3.4 Examples of materials. 42
3.5 BOLD effects in predefined ROIs in Study 2. 44
A.1 Cumulative points (dark upper line) and errors (light lower line) per subject (Participants 1–30) with fitted power functions. 51
A.2 Cumulative points (dark upper line) and errors (light lower line) per subject (Participants 31–60) with fitted power functions. 52
A.3 Cumulative points (dark upper lines) and errors (light lower lines) per task (Questions 1–30) with fitted power functions. 54
A.4 Cumulative points (dark upper lines) and errors (light lower lines) per task (Questions 31–60) with fitted power functions. 55
A.5 Cumulative points (upper line) and errors (lower line) per subject (Participants 1–25). 56
A.6 Cumulative points (upper line) and errors (lower line) per subject (Participants 26–50). 57
A.7 Cumulative points (upper line) and errors (lower line) per task (Questions 1–30). 58
A.8 Cumulative points (upper line) and errors (lower line) per task (Questions 31–60). 59


LIST OF TABLES

2.1 Experiment 1: Descriptive results per participant and question. 20
2.2 Experiment 3: Set-size estimates by their speed and accuracy. 29
2.3 Variance of Ti (in Experiment 2) explained by individual predictors. 32
3.1 Examples of declarative and procedural knowledge in 4×4 Sudoku. 37
3.2 Parameters for predicting BOLD response of five modules. 37
3.3 Correct rate and response time to solve 4 types of Sudoku problems in Experiment 1 (correct rate in brackets). 38
3.4 Operations in five modules for 1-step simple 4×4 Sudoku. 38
3.5 Operations in five modules for 1-step complex 4×4 Sudoku. 39
3.6 Operations in five modules for 2-step simple 4×4 Sudoku. 40
3.7 Operations in five modules for 2-step complex 4×4 Sudoku. 40
3.8 Correct rate and response time to solve 4 types of Sudoku in Study 2 (correct rate in brackets). 43
A.1 Tasks used in all memory foraging experiments (Part 1). 49
A.2 Tasks used in all memory foraging experiments (Part 2). 50


LIST OF ACRONYMS

Acronym        Description
BOLD response  Blood-Oxygen-Level Dependent response
fMRI           functional Magnetic Resonance Imaging
MVT            Marginal Value Theorem


1. Introduction

This report builds upon the research agendas and plans described in LarKC Deliverable 4.2.1, entitled Experimental designs for mapping human search strategies, submitted in October 2008 (M7 of the LarKC project).

Chapter 2 (co-authored by Hansjörg Neth, Lael J. Schooler and Jörg Rieskamp) is entitled Stopping Rules for Memory Foraging and reports MPG’s ongoing work on human stopping rules. A series of three experiments has been conducted to explore the rules and heuristics that humans use to stop semantic memory retrievals. The results so far indicate that some people are capable of escaping diminishing returns by abandoning tasks for which the marginal gains fall below some threshold, and that decisions to stop and switch tasks are influenced by both bottom-up and top-down features of the tasks and task environments. Although the analyses included in this report are mostly descriptive, they provide guidance towards the development of computational models and LarKC plug-ins that promise to increase system efficiency by stopping processes at the right time.

Chapter 3 (co-authored by Jie Xiang, Rifeng Wang, Lijuan Wang, Haiyan Zhou, Yulin Qin, Ning Zhong, and Yi Zeng) is entitled Application of Heuristics and Heuristic Searching and reports WICI’s work on the selection and application of heuristics in the context of human problem solving. Two experiments that investigated heuristics for selection and searching by combining behavioral research with fMRI and computational cognitive modeling techniques are reported. This work was financially supported by outside funding and is intended to serve as extra support for the LarKC project. As it is unlikely that the fMRI studies will provide applicable results within the timeframe of the LarKC project, they will be removed from future deliverables. But to demonstrate that the work announced in D4.2.1 has been productive, these studies are included here to gain closure on this line of research.

As the two chapters represent two distinct lines of research, and one of them will no longer be continued within the LarKC project, our conclusions can be found in the respective chapters.


2. Stopping Rules for Memory Foraging

“In short, we make search in our memory for a forgotten idea, just as we rummage our house for a lost object.” (James, 1890, Ch. 16)

“Long-term memory operates like a second environment, (...) through which the problem solver can search...” (Simon, 1996, p. 88)

2.1 Introduction

The research reported in this chapter was extensively motivated by Chapter 2, entitled Stopping Rules for Information Search, of LarKC Deliverable 4.2.1, Experimental Designs for Mapping Human Search Strategies. Rather than repeating the content of that chapter, we will briefly recapitulate its main argument.

2.1.1 The Need for Smart Stopping Rules

Knowing when to stop is one of the most fundamental problems when engaging in any type of activity. Most real-world problems do not have a pre-defined completion criterion. For instance, when trying to identify the leading experts on some health-related issue, searching for genetic codes contributing to a disease, or simply attempting to remember what we did last summer, the task itself is potentially infinite and does not provide clear guidance as to when a search for information, choice alternatives or a solution ought to be terminated. Importantly, both premature abandonment of a problem and excessive perseverance in its pursuit can be costly. Failing to recall some symptoms when diagnosing a disease can be as dangerous as continuing to retrieve long lists of increasingly irrelevant symptoms before beginning treatment. The problem of search termination resurfaces in an aggravated form when a system faces more than a single problem at once. When time and effort need to be allocated to multiple tasks, finding the right moments to switch between tasks constitutes a difficult optimization problem.

The Large Knowledge Collider (LarKC) envisages a configurable platform for massive distributed reasoning that aims to transform the web by making large amounts of semantic information accessible to machines and useful to human users (Fensel et al., 2008). Abandoning the traditional panaceas of consistency and completeness, the LarKC vision explicitly embraces inconsistency and incompleteness. But incomplete processes (e.g., of data selection, transformation and reasoning) need to be terminated at some point, and the quality of both the process (in terms of its performance characteristics) and the end result (in terms of quality and usefulness) will crucially depend on the precise stopping rules used.

2.1.2 Stopping Rules in Animal Foraging Theory

Human beings have been called ‘informavores’ (Miller, 1983). The habitual pursuit and consumption of information has been described as information foraging (Pirolli & Card, 1999) and online search behavior has been characterized as sniffing information scent (Fu & Pirolli, 2007; Pirolli, 2007). Consequently, we pointed out the analogy between human information search and animal foraging in our research proposal and turned towards behavioral ecology for guidance about possible stopping rules.

Foraging theory is essentially an economic approach to foraging behavior (see Stephens & Krebs, 1986, for an overview). Animals are assumed to inhabit a patchy environment in which energy gains by feeding on prey items are weighed against the costs of moving within and between patches (incurred through locomotion, prey handling, or the risk of predation).


Figure 2.1: Optimal patch leaving time t* according to the marginal value theorem of Charnov (1976). (Image source: Pirolli & Card, 1999, Figure 5a, p. 653.)

Optimal foraging theory (MacArthur & Pianka, 1966) assumes that organisms forage by allocating their resources in ways that maximize their energy intake per unit of time.

The analogy to animal foraging is attractive because foraging theory yields formal models on the basis of assumptions about environmental structure and an organism’s capabilities. By assuming that animals have adapted optimally to their environments, these models yield empirical predictions in which normative models serve as theoretical benchmarks against which the actual behavior of animals is compared.

An example of a productive formal model is the marginal value theorem (MVT) of Charnov (1976), which directly addresses the issue of when the currently visited patch ought to be deserted in favor of another one. It states that the optimal time to leave a patch (with a known prey distribution and diminishing returns) is when the marginal rate of return matches the average rate of gain so far.

Figure 2.1 illustrates that the optimal patch leaving time t* depends on the specific gain function g(tW) of the current patch and the travel time tB between patches. The MVT yields several qualitative predictions that have been confirmed for various organisms (see, e.g., Cowie, 1977). The predictions of foraging theory are not only attractive to researchers in biology and psychology but also applicable to many real-world problems, particularly in computer science and online information search (see Banks, Vincent, & Phalp, 2008; Bhavnani, Jacob, Nardine, & Peck, 2003; Browne, Pitts, & Wetherbe, 2005; Fu & Pirolli, 2007; Pirolli & Card, 1999; Pirolli, 2007; Spink, Park, Jansen, & Pedersen, 2006; Spink & Cole, 2006; Zipf, 1949, for examples).
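To make the MVT condition concrete, the following minimal numerical sketch (our illustration, not part of the deliverable) assumes a saturating gain function g(tW) = a(1 − exp(−λ·tW)) and solves the optimality condition g′(t*) = g(t*)/(tB + t*) for the leaving time t*; all function names and parameter values here are our assumptions:

```python
import numpy as np
from scipy.optimize import brentq

def optimal_patch_time(gain, gain_rate, t_between, t_max=600.0):
    """Solve the MVT condition g'(t*) = g(t*) / (t_between + t*) for t*."""
    def marginal_minus_average(t):
        # Positive while the marginal rate still exceeds the average rate;
        # the root of this function is the optimal patch-leaving time.
        return gain_rate(t) - gain(t) / (t_between + t)
    return brentq(marginal_minus_average, 1e-6, t_max)

# Assumed saturating gain function with diminishing returns
a, lam = 20.0, 0.05
g = lambda t: a * (1.0 - np.exp(-lam * t))
g_prime = lambda t: a * lam * np.exp(-lam * t)

for t_b in (10.0, 30.0, 90.0):
    t_star = optimal_patch_time(g, g_prime, t_b)
    print(f"travel time {t_b:5.1f} s -> leave patch after {t_star:5.1f} s")
```

In line with the MVT’s qualitative predictions, the sketch shows that longer travel times tB between patches imply longer optimal residence times t*.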

Importantly, Charnov’s MVT provides an optimal solution to the problem of patch leaving times but not the mechanism by which organisms could actually achieve this solution. Whereas the foraging models developed by ecologists address the questions “What is the optimal strategy?” (to exploit a patch, given some constraints) and “How are animals doing?” (relative to the normative benchmark), a major question for psychologists and cognitive scientists is “How do animals do it?” Addressing this latter question requires considering various cue variables and simple heuristics that could potentially be used to approximate the optimal solution without the need for complex computations (e.g., Iwasa, Higashi, & Yamamura, 1981; Ydenberg, 1984; Nishimura, 1999; Green, 1984, 2006).

2.1.3 Stopping Rules in Psychology and Cognitive Science

In the context of information search and decision making, the issue of stopping rules has both theoretical and practical implications. Theoretically, the termination of search directly affects possible outcomes by determining which facts or choice alternatives are considered.


Practically, a principled understanding of stopping rules can be used to design more effective and efficient systems. As the amount of readily accessible information increases, knowing when to stop processing becomes a key human skill and a crucial component of technology. Without functional stopping rules, both humans and machines could easily drown in data or computation.

Given the importance of stopping rules for all aspects of information search, it is surprising how little psychological research has been conducted to specifically address this issue.

Task Switching

An area that specifically addresses stopping and switching decisions is the literature on human multitasking. Whereas the majority of work in this area focuses mainly on switch costs, some recent studies ask about the potential benefits of switching and relate their approaches to the animal foraging literature.

Payne, Duggan, and Neth (2007) report a series of experiments that investigated discretionary interleaving, i.e., the simultaneous completion of two independent tasks. Participants embraced the opportunity to spontaneously interleave two tasks. Importantly, frequently switching back and forth between tasks tended to yield an adaptive allocation of time to tasks, e.g., spending the majority of time on the easier of two tasks. A complex pattern of time allocations, exit latencies and between-item times was replicated across three experiments and could qualitatively be explained by Green’s rule (Green, 1984), which analyzes an organism’s within-patch time T as a linear function of a minimum within-patch time Tmin plus a gain component G that is added upon finding each item i, i.e., T = Tmin + i·G. A quantitative model fit was achieved by extending Green’s rule with an independent sub-goal parameter that specifies the probability of terminating a task directly after finding an item.
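To illustrate how Green’s rule turns the formula T = Tmin + i·G into a stopping decision, here is a minimal sketch (our reconstruction from the formula above; the simulated find times and all names are assumptions, not materials from Payne et al., 2007):

```python
import random

def greens_rule_residence(find_times, t_min, gain_per_item):
    """Green's rule: stay in the patch until T = t_min + i * gain_per_item,
    where i counts the items found so far; each find extends the deadline."""
    deadline, items = t_min, 0
    for t in sorted(find_times):
        if t > deadline:          # next find would come after the deadline
            break
        items += 1
        deadline = t_min + items * gain_per_item
    return deadline, items

# Simulated find times with diminishing returns: inter-find intervals grow
random.seed(1)
t, finds = 0.0, []
for k in range(15):
    t += random.expovariate(1.0 / (3.0 * (k + 1)))  # mean interval grows with k
    finds.append(t)

leave_at, n_found = greens_rule_residence(finds, t_min=20.0, gain_per_item=6.0)
print(f"leave patch at {leave_at:.1f} s with {n_found} items found")
```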

Hutchinson, Wilke, and Todd (2008) also explicitly relate cognition to animal foraging theory in studying human patch leaving decisions in an experimental fishing task (see also Wilke, 2006). In their results, Hutchinson et al. (2008) emphasize some maladaptive tendencies: Compared to the optimal strategy, participants delayed their switch decisions for too long and spent too much time in ponds in which they had found more fish. The main cues predicting a switch decision were the current time interval without a capture (analogous to the exit latency tX or giving-up time above), the interval preceding the last capture (last IRT), and the total time T spent at the current pond. In contrast to Payne et al. (2007), the authors attribute the observation of frequent short exit latencies to a possible instance of the Concorde (or sunk cost) fallacy (Arkes & Ayton, 1999) rather than to a tendency to switch upon sub-goal completion.

Hills, Todd, and Goldstone (2008) hypothesize that goal-directed cognition is an evolutionary descendant of spatial foraging behavior, as both external and internal search processes require the organism to strike a balance between exploration and exploitation. By showing that participants’ patch residence times in a cognitive (word search) task can be primed by a spatial foraging task, the authors argue in favor of generalized cognitive search processes.

Although some studies in this tradition (e.g., Payne et al., 2007; Hills et al., 2008; Wilke, Hutchinson, Todd, & Czienskowski, 2009) describe the process of searching for words as ‘internal’ search, these searches are still based on external cues or stimuli (e.g., a sequence of letters or a word search puzzle). In our research, we take the meaning of internal search more literally and ask how people stop searching their memory contents.



Figure 2.2: Schematic view of the open-ended retrieval paradigm (from Dougherty & Harbison, 2007, p. 1109). Different participants can have different retrieval intervals, affecting the total number of items retrieved and introducing two new dependent variables: The total retrieval time T lasts from the beginning to the stopping of the search process; the exit latency tX lasts from the last retrieval to search termination.

Stopping Memory Search

The idea that memory processes could resemble external navigation patterns is not new. Memory has figuratively been described as a mirror of the environment (Draaisma, 2000; Shepard, 2002), and as early as 1890 William James compared memory retrieval to the search for an external object (see the introductory quotes on page 12; James, 1890; Simon, 1996).

From a cognitive viewpoint, skin and skull do not necessarily constitute the boundary between internal and external environments. For instance, Newell and Simon (1972) refer to the external task environment during problem solving as “external working memory”, and researchers on cognitive representations frequently assume semantic knowledge to be organized in structured (quasi-spatial) representations of networks and maps (Steyvers & Tenenbaum, 2005). The philosophical view of extended cognition (Clark & Chalmers, 1998) claims that a blurred boundary between cognition and environment will “radically reconfigure our image of rationality” (Clark, 2001, p. 121). According to Clark (2003), we already are living cyborgs, routinely wearing and relying on cognitive prostheses.

Although memory is the most well-studied construct of cognitive psychology (see Tulving & Craik, 2000, for an overview), there is relatively little research on the rules that determine the decision to terminate memory search (Kahana & Miller, under revision). Dougherty and Harbison (2007) attribute the lack of research on memory search termination to the almost ubiquitous use of fixed retrieval intervals in the study of free-recall tasks and propose the use of an open-ended retrieval paradigm in which participants control when to terminate their retrieval attempts. Figure 2.2 illustrates that this change has two important consequences: First, if a participant (such as Participant B in Figure 2.2) is willing to search his or her memory for a longer period of time, it is likely that he or she will retrieve more items. This implies that the number of items retrieved on any given task depends not only on the properties of memory encoding but also on the participant’s motivation or willingness to continue search. Secondly, in addition to the inter-retrieval times (IRTs), the total time T devoted to the task by the participant and the exit latency tX, defined as the duration of time elapsed from retrieving the last item to abandoning the task, become meaningful dependent variables that are informative about the dynamics of memory recall. Harbison, Davelaar, and Dougherty (2008) and Harbison, Dougherty, Davelaar, and Fayyad (2009) compare the predictions of candidate stopping rules with experimental results on the basis of a theoretical framework based on the SAM memory model (Raaijmakers & Shiffrin, 1981).
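The dependent variables of the open-ended paradigm follow directly from the timestamps of successful retrievals and of search termination. A minimal sketch (our illustration; the function name and example timestamps are assumptions):

```python
def retrieval_stats(retrieval_times, stop_time):
    """Compute the dependent variables of the open-ended retrieval paradigm:
    inter-retrieval times (IRTs), total time T, and exit latency tX."""
    irts = [round(t1 - t0, 2)
            for t0, t1 in zip(retrieval_times, retrieval_times[1:])]
    # Exit latency: time from the last successful retrieval to termination
    # (if nothing was retrieved, the whole trial counts as exit latency).
    t_x = stop_time - retrieval_times[-1] if retrieval_times else stop_time
    return {"IRTs": irts, "T": stop_time, "tX": round(t_x, 2)}

# A participant retrieves four items, then stops searching at 52.5 s.
print(retrieval_stats([4.2, 9.1, 15.8, 31.0], stop_time=52.5))
```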


2.1.4 Heuristics for Stopping

Heuristics are effective and efficient solution methods that ignore information (Gigerenzer, Todd, & the ABC research group, 1999; Gigerenzer, 2000, 2008). Unlike optimization methods, which attempt to maximize some criterion, heuristics aim to satisfice, i.e., they choose the first option that exceeds an aspiration level. This definition preserves key elements of Herbert Simon’s characterization of methods of heuristic search as examples of rational adaptation and as “methods for arriving at satisfactory solutions with modest amounts of computation” (Simon, 1990, p. 11) and emphasizes the positive potential of heuristics to guide information search and modify problem representations to facilitate solutions.

Heuristics and stopping rules are closely related. First, stopping rules are crucial building blocks when specifying a heuristic as a computational process model. For instance, any process requiring information search needs to specify when to terminate the search for data and begin processing. Secondly—and more centrally in the current context—stopping rules can themselves be heuristics. For instance, Green’s rule (Green, 1984, see page 14) essentially is a heuristic that leads to an adaptive time allocation in patches for which the reward function is not known. (See Green, 2006, for a Bayesian analysis.)

As pointed out above (see page 13), an early criticism of foraging theory was its preoccupation with mathematically optimal solutions without concern for the mechanism or implementation of those solutions. The search for satisfactory solutions to stopping and prey choice problems thus led to an early exploration of heuristics that used little information (e.g., Iwasa et al., 1981; Ydenberg, 1984; Stephens & Krebs, 1986). Hutchinson and Gigerenzer (2005) point out that animals’ rules of thumb closely resemble the simple heuristics studied in human decision making. This rapprochement between biology and psychology is echoed in recent publications by leading behavioral ecologists: Ydenberg (2007) predicts that “after a long absence from the scene, ‘rules of thumb,’ based on a deeper appreciation of mechanisms, are poised for a re-emergence”, and McNamara and Houston (2009) write: “Behavioural ecologists have built complex models of optimal behaviour in simple environments; we argue that they need to focus on simple mechanisms that perform well in complex environments.”

Is search in memory governed by the same principles that make us rummage from room to room in search of our keys? When do we stop searching our memory? We will attempt to answer these questions by conducting a series of experiments designed to illuminate the stopping and switching rules that humans use when dealing with problems of information search. Although our research will initially focus on memory retrieval, we assume that our findings will be applicable to other tasks like abstraction, decision making, reasoning, and problem solving.

We conducted several experiments that investigate human free recall in analogy to animal and information foraging (Stephens & Krebs, 1986; Pirolli & Card, 1999; Pirolli, 2007). Although we view the open-ended retrieval interval (Dougherty & Harbison, 2007) as an important methodological innovation, we find that self-determined stopping is a necessary but insufficient condition for discovering the stopping decisions of memory search. Many decisions to stop a task may actually be decisions to switch to another task. Thus, the study of stopping rules requires the adoption of a more global perspective—a focus on an entire set of tasks and the relative attractiveness of each with respect to its alternatives. We therefore propose to follow the lead of foraging theory by studying not just individual tasks, but systematically analyzing entire sessions, the ways in which the behavior of organisms unfolds over time, and the precise structure of their task environments.


The following experiments constitute the first steps towards a solution, but our results so far are descriptive and have not yet provided a definitive answer.

2.2 Experiment 1: The Dynamics of Human Free Recall

2.2.1 Introduction

Consider two memory retrieval tasks:

• A. Name as many African countries as you possibly can.

• B. Name as many James Bond movies as you possibly can.

Both tasks name a particular category (African countries in the case of Task A, James Bond movies in the case of Task B) and ask you to retrieve exemplars of it from memory. The number of correct exemplars that you will be able to retrieve depends on your familiarity with the categories, which depends jointly on the contents of your long-term memory (or knowledge) and your ability to retrieve items from it (recall). Introspection and experience show that the search for items will initially yield several results in quick succession, but slow down over time, and eventually cease entirely. The actual number of retrieved items and the temporal dynamics of this process are likely to vary between people and may vary within one person at different points in time. If your total time to retrieve exemplars was limited—e.g., you only had a total of one minute available to retrieve exemplars of both categories—you would have to decide whether and when to switch from one category to the other. If, when and how you would switch may also depend on your general knowledge, the dynamics of your memory retrievals, or other personal traits.

This example serves to illustrate several hypotheses that we tacitly assume to be true about human memory retrievals:

1. There are systematic differences between tasks: Some free recall tasks will yield moreexemplars (averaged across people) than others.

2. There are systematic differences between people: Some people will recall more items(averaged across tasks) than other people.

3. Each task will exhibit diminishing returns with respect to the average number of re-called items across people, i.e., during a given time interval (e.g., 20 seconds) peoplewill recall more items on average at an earlier time than at a later time.

4. Each person will exhibit diminishing returns with respect to his or her average perfor-mance across tasks, i.e., individuals will recall more items on average during a giventime interval at an earlier time than at a later time.

5. Systematic differences between tasks will influence people’s stopping and switchingdecisions.

6. Systematic differences between people will influence their stopping and switchingdecisions.


Figure 2.3: The interface showing a practice task of the experiment.

These assumptions vary from relatively trivial to fairly complex. Our general goal is to explain and predict task stopping and switching decisions as a function of tasks (point 5) and people (point 6). Although we harbor strong intuitions that different tasks and different people will exhibit different performance profiles, the precise shape and parameters of these profiles need to be investigated empirically.

The purpose of Experiment 1 is to collect baseline data on points 1 and 2 and to describe their dynamics over time in terms of diminishing returns (points 3 and 4). To obtain such data, we asked people to devote a fixed minimum time of 120 seconds to each of 20 memory retrieval tasks.

2.2.2 Method

Participants

Sixty persons (40 women and 20 men, with an average age of 25.1 years) volunteered to take part in the experiment. Each participant was paid a sign-up fee of €6 and received an additional performance-based compensation (see Procedure).

Materials

A set of sixty questions was constructed. Each question presents a natural category that is identified by a domain label (e.g., ‘Geography’) and a verbal description (e.g., ‘African countries’) and contains multiple alternative exemplars (e.g., ‘Algeria’, ‘Angola’, ‘Benin’, etc.). Categories were drawn from diverse areas of background knowledge (e.g., arts, brands, sciences, sports) and the number of true exemplars varied widely (from 4 to 64 items). The full set of questions and their corresponding numbers of true exemplars is available in Appendix A.1 (Tables A.1 and A.2).


Apparatus

The software interface to the task was programmed in MS Visual C# 2008. The task window illustrated by Figure 2.3 is displayed centrally on a 17-inch LCD display and is subdivided into two panels:

• The upper panel shows a horizontal time bar in which an initial time of 120 seconds (represented in green) slowly depletes (in 1-second intervals) and the overall number of points scored so far.

• The lower panel shows the current task—identified by its domain (e.g., ‘Sports’) and query (‘Formula-1 world champions (since 1980)’)—and contains a white text-entry field in which individual answers are to be entered (by typing an answer and pressing the ‘Return’ key or clicking the ‘Eingabe’ button).

Upon entry, an answer candidate string S (which can consist of a single word or multiple words) is evaluated by comparing it to the stored strings of correct solutions, and the program provides immediate feedback. If S is recognized as ‘correct and unique’, a message ‘S is correct.’ appears in the grey feedback field below the text-entry field, and a point is scored and added to both the tally of points scored on the current question (the red counter on the lower panel) and the tally of points overall (the red counter on the upper panel). To facilitate the recognition of correct answers, the program stores multiple variants of a correct solution (e.g., ‘Michael Schumacher’, ‘Schumacher’, ‘Schumi’) and tolerates spelling errors by allowing S and a stored solution string of length n to differ by up to n/4 characters, measured by the Levenshtein distance between both strings.1 If the current string is not ‘correct and unique’, it is classified as either ‘incorrect and unique’, ‘correct but repeated’, or ‘incorrect and repeated’. If any of these errors occurs, a corresponding verbal message is shown in the grey field below the text-entry field and no point is awarded.
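The spelling-tolerant matching rule can be sketched as follows (a minimal reconstruction: the deliverable specifies the n/4 tolerance and the use of the Levenshtein distance, but not the implementation; all function names here are ours):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, or
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def matches_solution(entry: str, variants: list[str]) -> bool:
    """Spelling-tolerant match: an entry counts as correct if it lies within
    n/4 edits of some stored variant of length n (one edit per four letters)."""
    return any(levenshtein(entry.strip().lower(), v.lower()) <= len(v) // 4
               for v in variants)

# Stored variants for one solution, mirroring the example in the text
variants = ["Michael Schumacher", "Schumacher", "Schumi"]
print(matches_solution("Schumacer", variants))    # True: 1 edit, n = 10
print(matches_solution("Schumheimer", variants))  # False: too many edits
```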

The centered Continue button (German: ‘Weiter’) below both panels only becomes available once the time interval of 120 seconds has elapsed or all correct exemplars of the current category have been entered. Once the button is available, participants are free to either continue working on the current task or switch to the next task by pressing the button.

Design

Each participant answered a unique subset of 20 questions out of the population of 60 questions. Questions were counter-balanced such that any three consecutive participants received the entire population of 60 questions once.2
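A minimal sketch of this counterbalancing scheme, as spelled out in footnote 2 below (our reconstruction; all names are ours):

```python
import random

def assign_questions(n_participants, questions, k=20, seed=0):
    """Partition the question pool across blocks of three consecutive
    participants, so that each block jointly covers all questions once."""
    rng = random.Random(seed)
    pool, assignments = [], []
    for p in range(n_participants):
        if p % 3 == 0:               # new block: reshuffle the full pool
            pool = list(questions)
            rng.shuffle(pool)
        assignments.append(sorted(pool[:k]))
        pool = pool[k:]              # remaining questions stay in the block
    return assignments

sets = assign_questions(6, range(60))
assert all(len(s) == 20 for s in sets)
assert sorted(sets[0] + sets[1] + sets[2]) == list(range(60))  # full coverage
```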

Procedure

On entry to the experimental laboratory, participants were presented with written instructions and with six practice tasks. Participants were instructed that they had to spend at least 120 seconds per task, but could move on to the next question if they had entered all correct answers. They were free to remain in a task for as long as they wanted after the 120-second interval had elapsed. Their goal was to maximize the total number of correctly entered exemplars (or total points) over all tasks. This objective was reinforced by rewarding every point with an additional €0.03, which resulted in a monetary bonus between €1.80 and €7.62.

Participants on average completed the 20 test questions within 48 minutes.

1 This implies that for any four letters, one insertion, replacement or deletion of a letter is tolerated.

2 This was accomplished by randomly selecting 20 out of 60 questions for participant n, randomly selecting 20 out of the 40 remaining questions for participant n+1, and assigning the 20 remaining questions to participant n+2, for any sequence of three participants.


Table 2.1: Experiment 1: Descriptive results per participant and question.

(a) Per participant (20 tasks each):
                                       Min     Max     Mean    SD
    total points                       60      254     130.8   39.2
    mean time per task (in sec.)‡      113.6   195.1   135.4   45.8

(b) Per question (20 participants each):
    total points in 120 sec            13      495     121.6   89.1
    total points overall               13      596     130.8   104.9
    mean time per task (in sec.)‡      54.7    253.0   135.4   26.6

‡ Tasks lasted a minimum of 120 seconds unless all alternatives had correctly been entered.


2.2.3 Results

Table 2.1 presents some basic descriptive results of this experiment. As predicted, the number of correct category exemplars retrieved varied considerably both between participants and between questions. Participants on average retrieved a total of 130.8 items and spent 135.4 seconds on a task, but the minima, maxima and standard deviations in Table 2.1a show that the ranges of both points and times are large. Similarly, the average total number of items per question was 121.6 after 120 seconds and 130.8 overall, but the ranges of both performance measures varied immensely between questions (Table 2.1b).

An impression of the events during a particular trial (i.e., one participant answering one particular question) is provided by a point/error-by-time diagram that increments a counter for each correct or incorrect entry during the entire duration of the trial. For instance, Figure 2.4 displays the time course of points and errors for Participant 6 and Question 17 (‘African countries’). The first entry (after about 5 sec) was scored as an error (red line), but the next 14 entries (occurring within approximately 60 sec) were all correct (upper line). After 70 seconds there were merely two more incorrect entries and one more correct entry (yielding a total of 15 points and 3 errors), and the participant chose to leave the task shortly after the 120-sec minimum time interval (dotted vertical line) had elapsed.

Figure 2.4: Point/error-by-time diagram for Participant 6, Question 17 (‘African countries’). The upper step function indicates a total of 15 correct entries; the lower step function indicates 3 incorrect entries.


(a) Participant 6, all 20 questions. (b) Question 17, all 20 participants.

Figure 2.5: Panel (a) shows the cumulative points of Participant 6 over all 20 questions. The upper step function indicates 173 correct entries; the lower step function indicates 111 incorrect entries. Panel (b) shows the cumulative points on Question 17 (‘African countries’) over all 20 participants. There were 178 correct entries (upper line, right y-axis) and 48 incorrect entries (lower line) up to 120 sec (dashed vertical line). Additional entries were made by the participants who chose to continue working on the task. The number of participants remaining in the task is indicated by the black line (left y-axis). At 200 seconds, 5 participants had not yet left the task.

We hypothesized that the number of correct retrievals would show diminishing returns, and this particular example (i.e., the upper line in Figure 2.4) certainly exhibits the characteristic reduction of new items as time progresses. When adding all points and errors of Participant 6 over all 20 tasks, or all points and errors on Question 17 over the 20 participants that received it, the resulting curves exhibit a smoother diminishing-returns characteristic (see Figures 2.5(a) and 2.5(b), respectively).

As the experiment comprised 60 different tasks and 60 different subjects, we obtained 120 curves, each showing the cumulative points and errors of a particular task (averaged over 20 subjects) or subject (averaged over 20 tasks). As a mathematical model of the performance on each question and each participant, we fitted power functions of the form y = a·x^b to each cumulative curve. Appendix A.2 (pp. 49ff.) contains these functions.

On yet another level of aggregation, we determined the average number of cumulative retrievals (for an average trial and average person) over the 120-second interval. We computed this function by adding all points that had been scored up to a particular time and dividing the resulting sum by the number of participants who were active in their respective task at this time. The resulting curve and a fitted power function are shown in Figure 2.6.

Figure 2.6: Mean number of cumulative retrievals over 120 seconds (dotted line), averaged over all questions and participants. The smoother curve (continuous line) shows a power function that was fitted to the empirical data.
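Power functions of the form y = a·x^b can be fitted by ordinary least squares on log-log axes; the report does not specify its fitting procedure, so the following sketch rests on that assumption and uses synthetic data (all names and values are ours):

```python
import numpy as np

def fit_power(t, y):
    """Fit y = a * t**b by least squares on log-log axes (requires y > 0)."""
    b, log_a = np.polyfit(np.log(t), np.log(y), deg=1)
    return np.exp(log_a), b

# Synthetic cumulative-retrieval curve with diminishing returns
rng = np.random.default_rng(0)
t = np.arange(1.0, 121.0)                      # seconds 1..120
y = 2.0 * t**0.55 + rng.normal(0.0, 0.5, t.size)
y = np.maximum.accumulate(np.maximum(y, 0.1))  # keep curve cumulative, positive

a, b = fit_power(t, y)
print(f"fitted y = {a:.2f} * t^{b:.2f}")       # exponent b < 1: diminishing returns
```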

2.2.4 Discussion

The results of this experiment confirm several of our initial assumptions by revealing systematic differences in performance both between tasks and between participants. The number of correct retrievals for a given task or participant clearly exhibited the predicted diminishing-returns characteristic. Our measurements allow us to quantify this effect and predict the expected yield of each task for up to 120 seconds. Thus, this experiment provides important baseline data for subsequent experiments.



2.3 Experiment 2: Discretionary Stopping and Switching

2.3.1 Introduction

In the previous experiment (Section 2.2), participants had to spend a fixed minimum time of 120 seconds on each individual task, and we measured performance as a function of their time-on-task. In the present experiment, participants are given an overall time budget (of 45 minutes), but are free to switch to another task at any time. This self-paced procedure ensures that time-on-task and giving-up times are now at the participants’ discretion (as in Payne et al., 2007; Wilke et al., 2009) and implies that the number of tasks completed by each participant is no longer constant, but varies between participants as a function of their propensity to switch.

Additionally, we introduced a switch cost factor by imposing a 2- or 12-second lock-out time upon every task switch.

2.3.2 Method

Participants

Fifty persons (24 women and 26 men, with an average age of 25.2 years) volunteered to take part in the experiment. Each participant was paid a sign-up fee of €8 and received an additional performance-based compensation (see Procedure).


Materials

The same set of sixty questions as described in Section 2.2.2 (page 18) was used (see also Appendix A.1, Tables A.1 and A.2, for the full set of questions).

Apparatus

The task interface was programmed in MS Visual C# 2008 and was identical to that described in Section 2.2.2 (see Figure 2.3 on page 18), with three differences:

1. The horizontal time bar on the upper panel indicated the time remaining until the end of the experimental session, rather than the minimum time on an individual task, and was initialized to 45 minutes. The corresponding label read Remaining time (in German: 'Verbleibende Zeit').

2. The label of the lower panel no longer indicated the total number of tasks available, i.e., it read 'Task X' rather than 'Task X of 20' as before.

3. The central Continue button (in German: 'Weiter') was enabled at all times, allowing participants to switch tasks whenever they wanted.

All other aspects of the interface—the display of questions, the spelling-tolerant analysis of entered strings, and the feedback and tally of points—were as described in Section 2.2.2.

Design

Each participant encountered a subset of up to 60 questions in a random order. A between-subjects factor of switch cost was counterbalanced across subjects: for one half of the participants it took 2 seconds to switch to the next task, for the other half it took 12 seconds to switch to the next task.³

Procedure

Participants were presented with written instructions and a five-minute practice phase that contained up to five tasks. Participants were instructed that they had a total time budget of 45 minutes (2700 seconds) to work on as many tasks as they wished. Their goal was to maximize the total number of correctly entered exemplars (or total points) over all tasks. This objective was reinforced by rewarding every point with an additional €.03, which resulted in a monetary bonus between €1.98 and €13.35.

2.3.3 Results

The self-paced procedure of this experiment implied that different participants worked on different numbers of questions within the 45-minute period. On average, participants answered 46 questions. The minimum number of questions was 19, and 10 participants (20% of all participants) exhausted the entire set of 60 questions before the test period had ended. Participants scored 228.92 points on average (earning a bonus of €6.87; minimum 66 points, maximum 445 points).

³ As this factor did not yield any main effects, we will not analyze it further in the context of this document.


Figure 2.7: Histograms of observed points and question times.

Figure 2.7 shows histograms of all trial times and points scored per question. The average trial time was 50.35 seconds, with a range from .78 to 420.48 seconds. 9.8% of all questions were abandoned in less than 10 seconds, 38.7% in less than 30 seconds, 70.6% in less than one minute, and 92.8% in less than two minutes. The average number of points on a question was 4.98, with a range of 0 to 41. 19.0% of all questions yielded no points, 65.7% yielded up to 5 points, 88.4% yielded 10 or fewer points, and 97.8% yielded up to 20 points.

Again of interest are the temporal dynamics of retrievals when averaging them over participants or questions. Figure 2.8 shows the points scored by an individual participant (No. 10) on three tasks as step functions in which the abscissa represents time-on-task and the ordinate represents points per task. Clearly, this participant spent more time and scored more points on Question 16 (European capitals) than on Question 30 (Nobel laureates in literature).

Figure 2.8: Performance of Participant 10 on three tasks and over all tasks.


(a) Q16: European capitals. (b) Q38: Orchestra instruments. (c) Q30: Literature Nobel laureates.

Figure 2.9: Cumulative performance (in terms of points and rate of task abandonment) on the same three tasks.

When averaging a participant's step functions, a diminishing returns curve results (see the green line in Figure 2.8). However, this is partly an artifact of the fact that earlier parts of this curve (e.g., the first 20 seconds) contain many more tasks than later parts (e.g., 60 or more seconds). This illustrates that such cumulative curves need to be normalized (or divided) by the number of data sources that contribute to them.

The three panels of Figure 2.9 show corrected cumulative curves for three different questions. The green curves are generated by dividing the total points obtained up to a particular time by the number of participants (indicated by the descending black lines) that are still active in the task at that time. Clearly, the corrected questions still show the characteristic features of diminishing returns. In addition, comparing the three tasks with the performance of Participant 10 on the same three tasks (as shown in Figure 2.8) shows that this participant was representative of most other participants, both in terms of total points per task and the rates at which people tended to abandon these tasks.
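The correction described above amounts to dividing, at every second, the summed cumulative points by the count of participants still working on the task. A minimal sketch of this normalization (Python/numpy; the per-participant arrays and their names are hypothetical):

    import numpy as np

    def corrected_curve(cum_points, leave_times):
        """cum_points: one per-second cumulative score array per participant,
        padded to a common length; leave_times: the second at which each
        participant abandoned the task. Returns the mean cumulative score
        among still-active participants, plus the count of active participants."""
        t_max = len(cum_points[0])
        seconds = np.arange(t_max)
        total = np.zeros(t_max)
        active = np.zeros(t_max)
        for points, t_leave in zip(cum_points, leave_times):
            alive = seconds < t_leave  # participant still in the task
            total[alive] += np.asarray(points)[alive]
            active[alive] += 1
        corrected = np.divide(total, active,
                              out=np.full(t_max, np.nan), where=active > 0)
        return corrected, active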

What happens if the cumulative curves per participant are corrected by the number of tasks that a participant is still working on? The three panels of Figure 2.10 illustrate this for three participants.

In contrast to the earlier Figure 2.8, the corrected Figure 2.10a shows that Participant 10 maintained a linear rate of points per second for about 100 seconds. By contrast, the same corrected curve for Participant 13 shows diminishing returns (Figure 2.10b). Note that both sub-figures differ not just in the shape of the corrected cumulative curves (in green) but also in the slope of the black curves (which denote the number of active tasks). It appears that Participant 10 maintained the linear rate of points per second mainly by abandoning tasks more readily than Participant 13.⁴

(a) Participant 10. (b) Participant 13. (c) Participant 20.

Figure 2.10: Cumulative performance of three participants (in terms of points and rate of task abandonment) across all tasks.

⁴ A side effect of abandoning tasks more quickly is that Participant 10 encountered more tasks overall (the maximum of 60, as opposed to 24 for Participant 13).

Does a linear gain curve guarantee good performance, i.e., a high yield of points? Not necessarily, as illustrated by Figure 2.10c. Participant 20 shows a fairly linear gain curve (as indicated by an R² = .98 of the best-fitting regression line), but this rate is not very high (about .06 points per second). Thus, the linearity of a gain curve and its slope are independent in principle, i.e., high linearity can coincide with both good and poor performance. However, when correlating the linearity of the gain curve (as measured by the R² of the best-fitting regression line) with the number of points achieved, a positive correlation of r = .45 (p = .001) indicates that participants with more linear gain curves were more successful overall.
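The linearity measure used here is simply the R² of a straight line fitted to a participant's corrected gain curve; correlating it with total points then takes one further call. A sketch (Python, with scipy; the array names are illustrative):

    import numpy as np
    from scipy import stats

    def gain_curve_linearity(curve):
        # R^2 (and slope) of the best-fitting regression line through the curve.
        t = np.arange(len(curve))
        fit = stats.linregress(t, curve)
        return fit.rvalue ** 2, fit.slope

    # With one linearity value and one point total per participant
    # (hypothetical arrays), the reported correlation corresponds to:
    # r, p = stats.pearsonr(linearities, total_points)   # here: r = .45, p = .001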

Appendix A.3 (pp. 53ff.) shows the corrected cumulative curves for all 50 participants and 60 tasks.

2.3.4 Discussion

In summary, individual tasks showed diminishing returns, but also large variability in total points and times-on-task.

This large variability in times-on-task and retrievals per task allows us to rule out the simplest stopping rules, like 'stop after N items' and 'stop after N seconds'. Participants clearly allocated time adaptively by leaving tasks that yielded fewer points more quickly than tasks that yielded more points.⁵

⁵ Note that a positive correlation between points-per-task and time-per-task does not indicate that the possibility of scoring more points caused participants to spend more time on a task. The reverse holds as well: spending more time on a task leads to more points on it.

Some participants showed linear gain curves (i.e., a constant number of points accrued in an average task over time). These participants managed to maintain a constant rate by adaptively leaving tasks before diminishing returns set in. This suggests that at least some participants were sensitive to the rate at which they generated points and abandoned a task when that rate dropped below a threshold. However, as this could be achieved by multiple mechanisms (e.g., a giving-up-time rule that adjusts its threshold for leaving a task based on the sequence of exemplars found in it), more extensive analyses are necessary to identify the mechanisms that determine when people stop a particular task.
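As one concrete illustration of the kind of mechanism under discussion, the sketch below implements a simple rate-based stopping rule: leave the task once the number of points scored within a sliding time window falls below a threshold rate. The window length and threshold are free parameters that would have to be estimated from the data; this is one candidate mechanism, not the rule participants are claimed to use.

    def should_stop(retrieval_times, now, window=20.0, min_rate=0.1):
        """Rate-based stopping rule: stop when the rate of correct retrievals
        over the last `window` seconds drops below `min_rate` points/second.
        retrieval_times: timestamps (in seconds) of correct entries so far."""
        recent = [t for t in retrieval_times if now - window <= t <= now]
        return len(recent) / window < min_rate

    # Three early hits, then a long dry spell: time to switch tasks.
    print(should_stop([2.1, 5.4, 8.0], now=30.0))   # True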

2.4 Experiment 3: Estimating Set Size and Retrievals

2.4.1 Introduction

We mentioned above (in Section 2.3.3) that—when given the opportunity—participants abandoned 9.8% of all trials within 10 seconds. This rapid decision to leave tasks is somewhat at odds with a strict bottom-up perspective which assumes that tasks are abandoned when no reward can be obtained for a prolonged period of time. Instead, it suggests that people make, at least sometimes, a quick evaluative judgment of a task's potential yield and abandon the task when this yield does not seem high enough. To further analyze this meta-cognitive component of our memory retrieval task we conducted an experiment that assessed participants' ability to make quick judgments about a target category's set size and their expected number of retrievals.

2.4.2 Method

Participants

The same sixty participants as described in Section 2.2.2 took part in this study. In fact, the present experiment was run as a separate task prior to Experiment 1.⁶ There was a 5–10 minute break between this experiment and Experiment 1.

Materials

The same set of sixty questions as described in Section 2.2.2 (page 18) was used (see also Appendix A.1, Tables A.1 and A.2, for the full set of questions). However, rather than retrieving exemplars, participants were asked two questions about every category:

1. (Set size:) How many (elements of this category) exist?

2. (Retrieval guess:) How many (elements of this category) would you retrieve in 2 minutes?

The parts in parentheses were not displayed on screen, but were explained in the instructions. Both questions require a numerical answer, entered as digits.

Apparatus

Figure 2.11 shows the screen interface of this experiment. The major change to the previous interfaces (see Sections 2.2.2 and 2.3.2) was the removal of the horizontal timer bar and of the white text-entry field below the question display area. Instead of the text entry field, participants saw an abbreviated form of the two questions with two entry fields that accepted only numerical digits. These were to be entered on the numeric keypad and followed by the Enter key, which advanced the program to the next entry field (after the set size question) or the enabled Continue button (after the retrieval guess question).

The display of questions and request of answers was subject to a rigorous time regime. Each trial started with a 3-second countdown (during which the digits 3-2-1 were displayed in the question field). After those three seconds the question and question domain were displayed and the participant had a maximum of 4 seconds to enter his or her numeric response to the set size question.⁷ As soon as the first guess had been entered (by pressing the Enter key), the cursor advanced to the next entry field and the retrieval guess had to be entered within 3 seconds. The elapse of time was signaled by a brief ticking sound every second and by a different sound whenever the 4- or 3-second interval elapsed without the required entry.

⁶ This fixed order assumes that guessing the set size and expected number of retrievals for a category does not influence the ability to subsequently retrieve exemplars for a minimum of 120 seconds on a subset of the same categories. Whereas it seems likely that any effect in this direction would be minimal, the opposite order (asking for 120 seconds of retrieval prior to estimating set size and number of expected retrievals) would clearly be problematic.

⁷ As these 4 seconds also required participants to read and understand the question, this interval was deemed too brief to retrieve many exemplars.


Figure 2.11: The interface showing a practice task of Experiment 3.

Every entry for the set size question was followed by immediate feedback on the entry's speed (distinguishing between in-time and late answers) and accuracy (allowing for a deviation of up to 50% from a category's actual population). Answers to the retrieval guess question received immediate feedback on speed. This feedback was both visual (see the green- and red-colored squares in Figure 2.12) and auditory (different sounds were played for in-time vs. late answers) and was reflected in the scoring scheme, which allowed for 0–3 points per question (one point per green square).

Design

Each participant encountered the full set of 60 questions in a random order.

Procedure

Participants were presented with written instructions to provide "quick numeric estimates" and to aim for both speed and accuracy. To reinforce this instruction, each estimate within the time limit was rewarded with €.03 and each accurate estimate of set size was rewarded with an additional €.03, allowing for a reward of up to €.09 per question. Although it was emphasized that they should enter their estimates as quickly as possible, participants were also informed that they were free to take discretionary breaks between any two questions by pressing the Continue button (which was enabled after both guesses had been entered) only when they were ready for the next question. Participants completed six practice trials before the main instructions were re-iterated. Participants then provided both types of estimates for each of the 60 questions, taking on average 11.8 seconds per question and about 15 minutes for the entire session.

Figure 2.12: Immediate feedback on accuracy (labeled Präzision) and speed (Zeit) after each guess has been entered.

Table 2.2: Frequencies and mean latencies (in parentheses, in seconds) of set size estimates by their speed and accuracy.

                 On time:       Late:         Sums:
   Accurate:     1925 (2.81)    145 (5.32)    2070 (2.99)
   Inaccurate:   1387 (2.94)    143 (5.89)    1530 (3.22)
   Sums:         3312 (2.87)    288 (5.60)    3600 (3.09)

2.4.3 Results

Participants on average scored 146.40 points, yielding a monetary bonus of €4.39, with a minimum of 123 points (€3.69) and a maximum of 163 points (€4.89).

Estimates of set size could be either accurate or inaccurate (if the entered number differed from the actual population size by more than 50%) and either in time or late (if the numeric entry was registered more than four seconds after the onset of the question display). Table 2.2 shows the frequencies and mean latencies of the resulting four types of estimates. 3312 (92.0% of 3600) estimates were on time, with a mean latency of 2.87 seconds. 2070 (57.5%) estimates were accurate, with a mean latency of 2.99 seconds. 1925 (53.5%) estimates were both on time and accurate, with a mean latency of 2.81 seconds. Thus, participants overwhelmingly managed to respond within the 4-second time limit, but fewer than 60% of their responses were classified as accurate.

To further evaluate the accuracy of participants' estimates of set size we contrast them with the actual number of correct exemplars. Figure 2.13 compares the actual number of exemplars that are correct solutions to each question with the quick set size estimates that participants provided (irrespective of the time at which they were given). If participants' estimates were distributed normally around some 'true' estimate, 68.26% of all estimates would lie within one standard deviation (SD) and 95% of all estimates would lie within 1.96 SDs of this value. Counting the number of actual values that lie within these boundaries provides a simple measure of accuracy. The true population size values for 44 out of 60 tasks lie within one SD of the set size estimates; 57 out of 60 lie within 1.96 SDs. In other words, only 3 out of 60 estimates (5%) differ significantly from the true population value (assuming the standard significance level of α = .05). It is instructive to inspect the three questions for which there are significant deviations:

   No.  Question/query                        True value:   Mean estimate:
   9    Italian pasta types                   44            13.93
   11   Characters of The Muppets TV Show     38             8.72
   46   Nobel peace laureates (since 1975)    48            21.68

Figure 2.13: Comparison of actual (circles) and mean estimated (triangles) set sizes for all 60 questions. (Error bars around estimates represent one standard deviation.)

In all three cases, the true population values are substantially higher than the estimated values. Figure 2.13 shows that the majority of deviations from the true values are in this direction, i.e., questions generally tend to have more correct answers than participants estimated. This may partly be due to the fact that our stored answer lists aimed to be very inclusive, to prevent frustrating instances of misses, i.e., people entering true exemplars that were not recognized and rewarded.

The second numeric estimate for each question concerned the number of expected retrievals within 120 seconds and needed to be entered within three seconds of the first estimate to score a point for being on time. 3402 (94.5% of 3600) estimates were within the time limit, with a mean latency of 1.83 seconds. The mean latency of the remaining 198 (5.5%) estimates was 3.9 seconds. Thus, the overwhelming majority of retrieval estimates were within the 3-second time limit.

How accurately could participants estimate the number of retrievals for a question? Figure 2.14 contrasts the actual number of retrievals after 120 seconds (as observed in Experiment 1, see Section 2.2) with the quick retrieval estimates that participants provided (irrespective of the speed at which the estimates were entered and of the identity of the person making the estimate and retrieving exemplars).⁸ The greater correspondence between retrieval estimates and actual retrievals shows that participants were relatively successful in predicting their memory performance. The mean actual 120-second retrieval numbers for 56 out of 60 questions are within one SD of participants' estimates; 60 out of 60 mean actual retrieval numbers are within 1.96 SDs of the estimated ones, i.e., there are no significant deviations between participants' estimates and the number of actual retrievals.

Figure 2.14: Comparison of mean actual (circles) and mean estimated (triangles) numbers of retrievals (after 120 seconds) for all 60 questions. (Error bars around estimates represent one standard deviation of the empirical estimates.)

⁸ Note that all 60 participants provided guesses, but only 20 participants actually encountered each particular problem.
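The coverage criterion used in this section reduces to counting how many true values fall inside a z-scaled band around the mean estimates. A minimal sketch (Python/numpy; the arrays of true values, mean estimates, and estimate SDs per question are assumed to exist under hypothetical names):

    import numpy as np

    def coverage(true_values, mean_est, sd_est, z=1.0):
        """Count how many true values lie within z standard deviations of the
        corresponding mean estimate (z = 1.96 for the 95% band)."""
        true_values, mean_est, sd_est = map(np.asarray,
                                            (true_values, mean_est, sd_est))
        return int((np.abs(true_values - mean_est) <= z * sd_est).sum())

    # For the set size estimates reported above, coverage(..., z=1.0) yields 44
    # of 60 and coverage(..., z=1.96) yields 57 of 60; for the retrieval
    # estimates, 56 and 60 of 60, respectively.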


2.4.4 Discussion

Participants were able to provide quick estimates of the set sizes and the numbers of expected retrievals for our target categories. Whereas the set size estimates were not always accurate (and tended to under-estimate the number of actual exemplars), the retrieval estimates corresponded closely to the number of items actually retrieved. The meta-cognitive ability to quickly judge how many items one will be able to retrieve may be an important predictor of how much time one is willing to allocate to a task.

2.5 Integrative Analysis

Although we presented the experiments in a different order, the logic behind their design is that Experiment 1 and Experiment 3 provide auxiliary information to help explain participants' stopping and switching decisions in Experiment 2. More specifically, our overall goal is to answer the following question: What predicts the total time-on-task Ti for a particular question i? Although we have not yet discovered the solution, we can draw a sketch of an answer.

When explaining human stopping decisions we can distinguish between two general types of explanations:

1. Bottom-up: The stopping point is determined by what happens while engaging in the task. In the case of a free recall task, the stopping point is determined by the succession of items retrieved from memory.

2. Top-down: The stopping point is determined by what is known about the task. In the case of a free recall task, a participant may have meta-cognitive insights about category set size and the number of expected retrievals without, or prior to, actually retrieving those items.

As these two types of explanation are not mutually exclusive, we could insert a qualifying "partially" before "determined" in both points. The following analyses suggest that some combination of both approaches may be our best candidate explanation.

Given our sequence of experiments, the following variables could serve as possible predictors for the mean observed task time Ti of question i in Experiment 2:

1. True set size Si for each question i.⁹

2. Number of retrievals N after some time interval (e.g., 10s, 30s, 60s, 90s, 120s, total time T) in Experiment 1.

3. Quick estimates of question set size Si and expected number of retrievals Ni (collected in Experiment 3, but in principle available to participants of Experiment 2).

4. Number of total retrievals N in Experiment 2 at Ti (as upper benchmark predictor).

To get an idea of the predictive power of these variables we first conduct separate linear regressions for each variable.

⁹ Note that the true number of exemplars could be a good predictor, but it is unknown to the participants for most questions.


Table 2.3: Proportion of variance (adjusted R²) of Ti (in Experiment 2) explained by individual predictors (via linear regression).

   Source:         Predictor:                                       Adjusted R²:
   None            True set size Si                                 .258
   Experiment 1    Number of mean retrievals N after 10s            .205
                   Number of mean retrievals N after 30s            .522
                   Number of mean retrievals N after 60s            .743
                   Number of mean retrievals N after 90s            .814
                   Number of mean retrievals N after 120s           .837
                   Number of mean total retrievals N after Tmax     .855
   Experiment 2    Number of mean total retrievals N after Tmean    .841
   Experiment 3    Quick estimate of set size Si                    .193
                   Quick estimate of number of retrievals Ni        .580

Table 2.3 shows the proportion of variance explained (as indicated by adjusted R²) when each of these predictors is entered individually into a linear regression predicting Ti. The true set size Si of each question explains 25.8% of the variance, but is not available to participants. When considering the predictive power of the mean number of retrievals from Experiment 1, it is not surprising that longer time intervals are more powerful predictors. Similarly, it is to be expected that the total number of retrievals in Experiment 2 is the best predictor overall, as finding additional retrievals ought to be the reason for staying in a task (i.e., N and T of Experiment 2 are highly correlated). The real surprise among the predictors is the substantial contribution of the quick estimate of the number of retrievals Ni, which took the participants of Experiment 3 an average of 1.83 seconds (on 94.5% of all estimates, see Section 2.4.3) and nevertheless explains 58.0% of the variance of T in Experiment 2. The observation that both the actual number of retrievals (as observed in Experiment 1) and the quick estimates of numbers of retrievals (as measured in Experiment 3) are good predictors by themselves suggests that the best model may involve some combination of the bottom-up and top-down perspectives.

A more integrative view combines several simultaneous predictors in a multiple linear regression. As many of the predictor variables are highly correlated with each other, and some are known only for some participants, we constrain this analysis to indicators that are actually available to most participants. As 70.6% of all tasks in Experiment 2 were abandoned in less than one minute (see Section 2.3.3), we chose the number of average retrievals N after 10 and after 30 seconds (from Experiment 1) as available bottom-up predictors and the quick estimates of set size Si and expected retrievals Ni (from Experiment 3) as possible top-down predictors, and entered these variables into a stepwise regression to see which variables are most predictive and whether other variables carry additional predictive power once the most predictive ones are included. When doing this, the estimate of expected retrievals Ni is the most predictive variable (accounting for 58.0% of the variance, as seen in Table 2.3). The next predictor included is the number of average retrievals N after 30 seconds, which adds another 12.8% of variance, boosting the explained variance to 70.4%. Adding either of the other two variables does not yield a significant improvement. Thus, a combination of top-down and bottom-up measures yields the highest predictive accuracy.
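A sketch of such a stepwise (forward-selection) regression, assuming the criterion and the four candidate predictors are available as columns of a pandas DataFrame (the column names are illustrative and statsmodels is assumed; this mirrors the analysis logic, not necessarily the exact software we used):

    import statsmodels.api as sm

    def forward_select(df, criterion, candidates, alpha=0.05):
        """Greedy forward selection: repeatedly add the candidate predictor that
        most improves adjusted R^2, as long as its coefficient is significant."""
        selected = []
        while True:
            best = None
            for var in (c for c in candidates if c not in selected):
                X = sm.add_constant(df[selected + [var]])
                fit = sm.OLS(df[criterion], X).fit()
                if fit.pvalues[var] < alpha and \
                        (best is None or fit.rsquared_adj > best[1]):
                    best = (var, fit.rsquared_adj)
            if best is None:
                return selected
            selected.append(best[0])
            print(f"added {best[0]}: adjusted R^2 = {best[1]:.3f}")

    # forward_select(df, "T_mean", ["N_10s", "N_30s", "S_est", "N_est"])
    # would first pick N_est and then N_30s, as described above.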


2.6 General Discussion

We will conclude this chapter by briefly sketching our next steps and future directions. Our current objectives fall into two broad categories:

• Data analysis and mathematical modeling: The results reported in the previous sections have mainly been descriptive and need to be further analyzed by testing statistical hypotheses. To further illuminate the stopping rules used by participants in Experiment 2 we want to test additional regression models and conduct analyses that take the specific sequence of questions encountered by individual participants (i.e., their history) into account. Another opportunity for a more detailed analysis is to compare the actual retrievals of participants in Experiment 1 with their own estimates in Experiment 3, rather than merely comparing average retrievals with the averages of retrieval estimates. Such analyses, and more detailed mathematical and computational models of possible strategies, will allow us to select between alternative stopping rule candidates and to design more decisive empirical experiments.

• Computational cognitive modeling: The regression analyses of Section 2.5 suggest that computational models of stopping in the context of memory retrieval will require a semantic component in combination with more traditional foraging models. Fu & Pirolli's (2007) SNIF-ACT is one model that we can look to for guidance in this respect. The essential idea behind SNIF-ACT is that a goal-relevant query is used to evaluate the quality of some patch—a web link in the case of SNIF-ACT. The overall level of activation returned by the patch, or its information scent, is used as a measure of its quality. In our application, the scent of a query, such as European capitals, could be used to estimate the number of relevant exemplars that might be found in memory and to set initial aspiration levels or other parameters of the foraging models (see the sketch following this list). If successful, such activation-augmented foraging models could be used as the basis for LarKC plug-ins that exploit information scent.

Another opportunity for applying our stopping rule research is the interleaving of selection and reasoning components. In collaboration with VUA we are currently exploring how different stopping rules affect performance characteristics when using the ACT-R architecture (Anderson et al., 2004; Anderson, 2007b) as a decider plug-in for the LarKC platform.
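To illustrate how an information-scent estimate could parameterize a foraging-style stopping rule, consider the sketch below. The mapping from a quick top-down estimate to an aspiration level is purely hypothetical; the point is only that a scent-like estimate (as collected in Experiment 3) can initialize the threshold that a bottom-up rate rule then updates during the task.

    def initial_aspiration(expected_retrievals, horizon=120.0):
        # Top-down: convert a quick estimate of the retrievals expected within
        # the horizon into an initial aspired rate (points per second).
        return expected_retrievals / horizon

    def update_aspiration(aspiration, observed_rate, learning_rate=0.2):
        # Bottom-up: nudge the aspiration towards the rate actually observed.
        return aspiration + learning_rate * (observed_rate - aspiration)

    # A task would be abandoned once the currently observed rate falls below
    # the current aspiration level.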

We aim to contribute to a working implementation of stopping heuristics in LarKC Deliverable D4.3.2 (due M24).


3. Heuristics Application and Heuristic Searching

3.1 Motivation

In this section we introduce our research on human strategies of heuristics application and searching. Heuristic methods are used to rapidly come to a solution that approximates the best possible answer or 'optimal solution' (http://en.wikipedia.org/wiki/Heuristic). Relying on such experience-based techniques, human beings display a powerful ability to select suitable information from complex environments and to make reasonable judgments when solving problems.

In LarKC Deliverable 4.2.1 we introduced a simplified Sudoku task to investigate how humans select heuristic rules and apply them to solve problems. Several characteristics of this research are worth noting.

First, to solve a Sudoku problem without existing heuristic knowledge, a human participant would face a very large problem space that is likely to exceed the limits of human cognitive capacity. However, once participants have developed experience-based heuristics, the problem becomes much easier. This change in skill as a result of acquiring heuristic knowledge is the phenomenon in which we are interested. To better understand the acquisition and principles of heuristic strategies, we carried out a series of experiments. The goal of these experiments was to uncover the principles by which humans select heuristic rules that correspond to the current task environment and apply them to solve Sudoku problems. In our first experiment, on the selection and application of heuristics (Section 3.2), the goal of the problem was clearly marked, but participants had to choose heuristic rules based on the direction of the goal.

The goal of the second experiment, on heuristic searching (Section 3.3), was to find out more about how some heuristics outperform others when the problem goal is not clearly marked and problem solvers have to examine the environment carefully to find suitable heuristics.

We investigated human heuristic strategies by integrating behavioral observation, brain imaging, and computational modeling. This combination of methods allows us to approach the issue in depth, especially the modeling techniques based on the ACT-R cognitive architecture in combination with fMRI imaging (Anderson, Qin, Jung, et al., 2007; Anderson, 2007a; Qin, Bothell, & Anderson, 2007). The relatively clear and precise processing steps of a simulation not only help us to understand human heuristic strategies, but may also provide cognitive inspiration for machine processing systems. We hope that this research on the heuristic strategies of humans will help to develop plug-ins with more powerful problem-solving abilities in the context of the LarKC platform.

3.2 Experiment 1: Selection and Application of Heuristics

In this study we are interested in the selection and retrieval of heuristic strategies. In a simplified Sudoku task we provide a "cue" for solving the problem: the symbol of a question mark ('?'). Participants are trained to use this cue to select a suitable heuristic and then apply it. This is a relatively simple and straightforward task, which allows us to make the process of problem solving very uniform and consistent. By combining brain imaging and ACT-R modeling, we investigate goal-driven processes, including the representation of problem spaces and memory retrieval.


Figure 3.1: Examples of simplified Sudoku problems.

Figure 3.2: Protocol of a scan trial.

3.2.1 Method

Participants 19 college students (9 males and 10 females) from Beijing University of Technology participated in this study and were scanned after giving informed consent. The average age of the participants was 22.8 years. All participants were right-handed native Chinese speakers.

Task and Materials Event-related fMRI data were recorded while participants were solving simplified 4×4 Sudoku problems. Sudoku is a combinatorial number-placement puzzle; the goal of the puzzle is to fill a 4×4 grid so that each column, each row, and each of the four 2×2 boxes contains each of the digits from 1 to 4 exactly once. In this study, we simplified the puzzle and asked participants to determine the answer for the cell marked by the '?' symbol (see Figure 3.1).

The experiment used a 2×2 design with two 2-level factors: number of solution steps (1-step vs. 2-step) and problem complexity (simple vs. complex), yielding four types of conditions in total. In the 1-step condition, participants only needed to find the answer for the cell marked with the '?' symbol (e.g., Figures 3.1a and b). In the 2-step condition, participants had to find the answer for the cell marked with the asterisk '*' before they could find the answer for the cell marked with the '?' (e.g., Figures 3.1c and d). Furthermore, in the simple condition participants only needed to check one column, one row, or one box to find the answer for the cell marked with the '?' (see Figure 3.1a for an example of only checking one box), whereas in the complex condition they had to check a combination of column, row, and/or box (see Figure 3.1b for an example). These conditions differed little in terms of their visual stimuli, but differed substantially in the problem solving processes they require, i.e., in their problem representation and in the memory retrievals necessary for the use of heuristics.
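To make the simple/complex distinction concrete, the sketch below solves the '?' cell of a 4×4 grid by elimination, counting how many units (row, column, box) must be checked. It is an illustration of the task logic under our own assumptions (0 marks an empty cell), not the experimental software or the participants' procedure:

    def units(r, c):
        """Row, column, and 2x2 box containing cell (r, c) of a 4x4 grid."""
        row = [(r, j) for j in range(4)]
        col = [(i, c) for i in range(4)]
        box = [(2 * (r // 2) + i, 2 * (c // 2) + j)
               for i in range(2) for j in range(2)]
        return row, col, box

    def solve_cell(grid, r, c):
        """Digit for the '?' cell and the number of units that had to be
        checked: one unit suffices in 'simple' problems, several are needed
        in 'complex' ones."""
        candidates = {1, 2, 3, 4}
        for n_units, unit in enumerate(units(r, c), start=1):
            candidates -= {grid[i][j] for (i, j) in unit}
            if len(candidates) == 1:
                return candidates.pop(), n_units
        return None, 3  # not solvable by direct elimination (a 2-step problem)

    grid = [[0, 2, 3, 4],   # the '?' cell is at (0, 0)
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]
    print(solve_cell(grid, 0, 0))   # (1, 1): the row alone determines the answer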


As shown in Figure 3.2, a trial started with a red star shown for 2 seconds as a warning (all stimuli were shown on a black screen), followed by a period of maximally 20 seconds during which the participants could solve the problem. When participants found the answer for the cell marked with '?' they were asked to press a button immediately and to speak the answer aloud within a 2-second period. Participants were encouraged to finish each problem as accurately and quickly as possible. After that, the correct answer was shown on the screen for 2 seconds as feedback, followed by a 10-second inter-trial interval (ITI; a white cross shown on the screen) during which the participants were asked to take a brief break. There were 5 sessions, each with 48 or more trials; each session involved all 4 types of conditions, which were selected randomly with equal probability.

Scan Protocols and Data Processing The images were acquired on a 3.0T MR scanner (Siemens Trio+Tim, Germany) and an SS-EPI (single shot echo planar imaging) sequence sensitive to BOLD (blood oxygen level dependent) signals was used to acquire the fMRI data. The functional images were acquired with the following parameters: TR = 2000 ms, TE = 30 ms, flip angle = 90°, FOV = 200 mm × 200 mm, matrix size = 64 × 64, slice thickness = 3.2 mm, slice gap = 0 mm, and 32 axial slices with the AC–PC line on the 10th slice from the bottom of the brain.

Brain Imaging Data Processing Data preprocessing (e.g., motion correction) and statistical analysis were performed with the NeuroImaging Software package (NIS, http://kraepelin.wpic.pitt.edu/nis/). Three participants were excluded due to head movement exceeding 5 mm. All images were co-registered to a common reference structural MRI image and smoothed with a 6-mm full-width half-maximum three-dimensional Gaussian filter.

We defined several regions of interest (ROIs) according to the ACT-R (Adaptive Control of Thought-Rational) theory (Anderson, 2007a; Anderson et al., 2004). As a cognitive architecture, ACT-R assumes that cognition emerges through the interaction of a set of modules, and eight brain regions have been mapped to specific modules. In this study, the BOLD effects in 5 left-hemisphere regions related to Sudoku solving were selected as ROIs. The 5 regions were as follows: the lateral inferior prefrontal cortex (PFC), centered at Talairach coordinates x = 40, y = 21, z = 21, reflecting the retrieval of information in the declarative module; the posterior parietal cortex (PPC), centered at x = 23, y = -64, z = 34, reflecting changes to problem representations in the imaginal module; the anterior cingulate cortex (ACC), centered at x = 5, y = 10, z = 38, which controls various stages of processing and prevents the problem solving state from being distracted from the goal; the caudate, centered at x = 15, y = 9, z = 2, which plays an action-selection role; and the fusiform gyrus (FG), centered at x = 42, y = -60, z = -8, engaged in visual processing (Anderson, 2007a; Anderson et al., 2004). In each ROI we calculated the percentage of signal change relative to a baseline (the two scans before the stimulus onset scan of a trial) over a period of 10 scans, from the two baseline scans before stimulus onset to the eight problem-solving scans after stimulus presentation.
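The percent-signal-change computation has a direct expression. A sketch (Python/numpy; the time-course layout, one value per 2-second scan, and the variable names are illustrative assumptions):

    import numpy as np

    def percent_signal_change(roi_ts, onset_scan, n_base=2, n_post=8):
        """roi_ts: 1-D ROI time course, one value per scan (TR = 2 s).
        The baseline is the mean of the `n_base` scans before stimulus onset;
        the returned curve covers those baseline scans plus `n_post` scans
        of problem solving (10 scans in total)."""
        baseline = roi_ts[onset_scan - n_base:onset_scan].mean()
        window = roi_ts[onset_scan - n_base:onset_scan + n_post]
        return 100.0 * (window - baseline) / baseline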

ACT-R Modeling To characterize the information processing during heuristic selection and retrieval in detail, we built an ACT-R model that simulates the interaction of cognitive components in the brain. First, we set up the simulation environment so that the model performs the cognitive task like a real participant; as with other ACT-R models, the model runs in a Lisp environment. Second, we defined the declarative and procedural knowledge, the two kinds of knowledge that give a model experience-based memory and cognitive capability.


Table 3.1: Examples of declarative and procedural knowledge in 4×4 Sudoku.

   Knowledge                     Example
   Declarative knowledge         (chunk-type Sudoku x1 x2 x3 x4)
   (chunk types, chunks)         (chunk-type quesdata a1 a2 a3 a4)
                                 (p1 isa Sudoku x1 1 x2 2 x3 3 x4 4) ...
   Procedural knowledge          visual-?, visual-row, visual-col, visual-met,
   (production rules)            encode-row, encode-col, encode-met,
                                 get-answer, pressing, ...

Table 3.2: Parameters for predicting the BOLD response of five modules.

   Module   Visual   Goal   Retrieval   Imaginal   Production
   m        0.8      1      1           1.3        0
   s        1.5      0.9    2.5         1.5        1.5
   b        6        7      2           6.7        6

In ACT-R, declarative knowledge is represented as chunks defined by chunk types and slots, while procedural knowledge is represented as production rules with if/then patterns. As shown in Table 3.1, in the model of Sudoku solving, chunks were related to the representation of heuristics, and production rules were related to problem spaces and the encoding of heuristics. Third, we set the parameters of the new model. Most parameters were kept at their default values, except for those that did not fit the new task. Table 3.2 lists the parameters used in our model for the five predefined brain regions (bilaterally), where m is the magnitude of the response, s is the time scale, and b is the exponent; all other parameters used the default values. Finally, we could debug and run the model to obtain predictions of the behavioral and functional fMRI BOLD responses in Sudoku solving.
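The m, s, and b values in Table 3.2 are the parameters of ACT-R's standard BOLD prediction, in which a module's moment-to-moment demand is convolved with a gamma-shaped hemodynamic response of the form m(t/s)^b exp(-t/s). A sketch of that prediction (Python/numpy; the demand vector is an illustrative assumption):

    import numpy as np

    def bold_response(t, m, s, b):
        # ACT-R's hemodynamic response: a scaled gamma-shaped function.
        return m * (t / s) ** b * np.exp(-(t / s))

    def predict_bold(demand, m, s, b, dt=0.2):
        """Convolve a module's demand function (1 while the module is busy,
        0 otherwise, sampled every dt seconds) with the hemodynamic response."""
        t = np.arange(0, 30, dt)
        hrf = bold_response(t, m, s, b)
        return np.convolve(demand, hrf)[:len(demand)] * dt

    # Retrieval module parameters from Table 3.2: m = 1, s = 2.5, b = 2.
    demand = np.zeros(150)          # 30 s sampled at dt = 0.2
    demand[5:12] = 1.0              # module busy from 1.0 s to 2.4 s (made up)
    curve = predict_bold(demand, m=1.0, s=2.5, b=2.0)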

3.2.2 Results

Behavioral Results The mean accuracy and response times are shown in Table 3.3. For the correct rate, the main effect of step was significant (F(1,15) = 15.994, p < .001), as was the main effect of complexity (F(1,15) = 7.320, p < .05), indicating that 1-step and simple problems were answered more accurately. For the response time, 1-step problems were solved faster than 2-step problems (mean latencies of 2.2 s vs. 4.95 s; F(1,15) = 111.161, p < .001), and simple problems were solved much faster than complex problems (mean latencies of 2.75 s vs. 4.4 s; F(1,15) = 77.499, p < .001). These results suggest that the longer times on the 2-step and complex problems may be related to the retrieval and application of heuristics. We consider this issue in detail in the ACT-R simulation results.

fMRI Results Based on ROIs Figure 3.3 shows the BOLD effects in the predefined ROIs (confirmatory analysis): the bilateral lateral inferior prefrontal cortex (PFC), posterior parietal cortex (PPC), anterior cingulate cortex (ACC), caudate, and fusiform gyrus (FG).


Table 3.3: Response time and correct rate (in parentheses) for the 4 types of Sudoku problems in Experiment 1.

            Simple         Complex
   1-step   1.5 s (100%)   2.9 s (99%)
   2-step   4 s (97%)      5.9 s (94%)

Table 3.4: Operations in five modules for 1-step simple 4×4 Sudoku.

   Time  Visual      Goal         Retrieval   Imaginal  Production
   0.5   Encode ?                             ?         Encode
         Encode Row  Visualizing                        Focus Row
   1.0   Encode 4    Encoding                 4         Encode
         Encode 2,1                           2,1       Encode
   1.5               Integrating              4,2,1     Solve
                     Retrieving   4,2,1,(3)   3         Retrieve
                     Pressing     (3)                   Press key

Stronger activation and longer BOLD effects were observed in all these areas in the 2-step condition than in the 1-step condition, and activation in all these areas was stronger in the complex condition than in the simple condition. The BOLD effects were consistent with the behavioral results, suggesting that when participants were solving complex problems the brain was engaged more strongly during processing. We also consider this issue in detail in the ACT-R-based fMRI simulation results.

ACT-R Modeling The correlation between the response latencies of the ACT-R model and the real RT data was 0.999, with a mean difference of 0.04. The mean correlation between the predicted BOLD effects of the five modules and the fMRI data was 0.87, with a mean difference of 0.057. These results show that the predictions were acceptable and the model reasonable. Figure 3.3 shows the BOLD responses of the five modules in our ACT-R model. The fusiform region corresponds to the visual module; the predictions from the visual module showed a good match for both task types, and the correlation with the left fusiform was 0.93. The anterior cingulate cortex corresponds to the goal module, with a correlation of 0.90 with the left ACC region. The prefrontal region corresponds to the retrieval module; the correlation with the left prefrontal region was 0.83, a somewhat poorer fit. The parietal region corresponds to the imaginal module, with a correlation of 0.86 with the left parietal region. The caudate corresponds to the procedural module; the correlation with this region was 0.82, the poorest fit in the model (similar to that observed by other researchers; Anderson, Fincham, Qin, & Stocco, 2008). Overall, these predictions of the BOLD response indicate that our modeling hypotheses about the underlying cognitive processes were reasonable to some extent.

Tables 3.4 to 3.7 show the information processing in the five ACT-R modules for the four types of 4×4 Sudoku problems, based on the trace outputs of the ACT-R model. The traces suggest that all five modules were involved in the processing and cooperated to accomplish the problem solving. As the number of solution steps and the problem complexity increased, the modules became more heavily involved in the processing.


Figure 3.3: BOLD responses in ACT-R theory and fMRI data in Study 1.

Table 3.5: Operations in five modules for 1-step complex 4×4 Sudoku.

   Time  Visual      Goal         Retrieval   Imaginal  Production
   0.5   Encode ?    Visualizing              ?         Encode
   1.0                                                  Focus Row
         Encode Row  Encoding                           Evaluate
         Encode 3                             3         Focus Col
   1.5   Encode Col  Integrating                        Encode
         Encode 4    Retrieving   3,4,(1,2)   4
         Encode 1    Pressing                 1         Evaluate
   2.0               Integrating              3,4,1     Solve
   2.5               Retrieving   3,4,(1,2)   2         Retrieve


Table 3.6: Operations in five modules for 2-step simple 4×4 Sudoku.

   Time  Visual     Goal        Retrieval   Imaginal  Production
   0.5   Encode ?   Encoding                ?         Visual ?
   1.0   Encode 1   Retrieving  1,3,(2,4)   1,3       Visual Row
         Encode 3                                     Retrieve
   1.5   Encode *                           1,3,*     Judge
   2.0   Recode 1                                     Revisual
         Recode 3                                     Row
   2.5   Encode 2   Retrieving  1,3,2,(4)   =4        Visual Col
                                                      Retrieve
   3.0   Recode 1   Encoding                1,3,4     Revisual
         Recode 3                                     Row
   3.5              Retrieving  1,3,4,(2)   1,3,4     Retrieve
   4.0              Pressing                =2        Press Key

Table 3.7: Operations in five modules for 2-step complex 4×4 Sudoku.

   Time  Visual      Goal         Retrieval   Imaginal  Production
   0.5   Encode ?    Encoding                 ?         Visual ?
   1.0   Encode 1                             1         Visual Col
   1.5                                                  Visual Met
                                                        Retrieve
   2.0   Recode 3                             1,3       Retrieve
   2.5   Encode *    Retrieving   1,3,(2,4)   1,3,*     Judge
   3.0   Encode 4,3  Visualizing              4,3       Visual Row
   3.5                                                  Visual Col
         Encode 1    Retrieving   4,3,1,(2)   =2        Retrieve
   4.0   Recode ?    Visualizing              2,?       Visual
   4.5                                                  Visual Col
         Recode 1    Retrieving   2,1,(3,4)   2,1,?     Retrieve
   5.0   Recode 3                             2,1,3     Visual Met
   5.5               Retrieving   2,1,3,(4)             Retrieve
   6.0               Pressing                 =4        Press Key


3.2.3 Conclusions

In Experiment 1, we used a "cue"-directed simplified Sudoku task to investigate the selection and application of heuristics. The brain imaging data showed that the prefrontal cortex (PFC), posterior parietal cortex (PPC), anterior cingulate cortex (ACC), caudate, and fusiform gyrus (FG) were involved in the processing of Sudoku solving. The fit of the model predictions showed that several cognitive modules played important roles during heuristics retrieval. The first is the imaginal module, responsible for representation. For a given problem like 4×4 Sudoku, after visual recognition and checking, participants had to keep the visual information, including the digits and their positions, in mind, and had to update and integrate the problem states and a suitable heuristic to find the answer. When problems were harder, such as the 2-step and complex problems, participants had to maintain and integrate more information during problem solving. The second is the goal module. The basic procedure of control in the goal module involves visualizing the incoming information through the eyes, encoding and integrating this information, then retrieving the heuristics to obtain the answer, and finally pressing the key and speaking the answer aloud. When the problem was harder, the goal module had to control more operations. The third is the retrieval module. The simulations showed that, regardless of problem type, only one retrieval was performed by the retrieval module, but the latency of this retrieval was longer for harder problems. The results thus suggest that information retrieval only happened after the operations of, and cooperation between, the other modules during heuristics processing; it was the result of pre-processing and a "ready" state.

3.3 Experiment 2: Heuristic Searching

In Experiment 1 we focused on the retrieval and application of heuristics: participants only needed to follow the "?" cue to decide which heuristic to use to solve the problem. We did not, however, consider how a heuristic emerges in the first place, which is what allows a problem to be solved quickly and easily. In Experiment 2 we therefore did not provide a cue, but asked participants to find the key position that could be solved first. In this task, participants have to check the problem state carefully and then judge which heuristic to retrieve and use. This is a process of heuristic searching.

3.3.1 Method

Participants 20 college students (10 males and 10 females) from Beijing University of Technology participated in this study and were scanned after giving informed consent. The average age of the participants was 24.5 years, and all of them were right-handed native Chinese speakers. None of the participants reported any history of neurological or psychiatric disease.

Task and Materials In the full Sudoku problems mentioned above, the first step towards filling all of the empty cells is very important: once this first number has been placed, the remaining cells often become easy to fill. We therefore call the position of the first number to be filled the "key". The experiment used a 2×2 design with two 2-level factors: anchored (yes vs. no) and difficulty (easy vs. difficult), yielding four types of conditions in total (see Figure 3.4). In the anchored condition, the "key" is provided by marking it with a "?".


Figure 3.4: Examples of materials.

In the no-anchor condition, no anchor was provided and participants were instructed to find the "key" themselves. In no-anchor problems, two zeros were inserted to avoid a visual difference from the anchored problems.

Stimuli were presented pseudo-randomly in an event-related design. The stimulus presentation paradigm was the same as in Experiment 1: a trial started with a red star shown for 2 seconds as a warning (all stimuli were shown on a black screen), followed by a period of maximally 20 seconds for the participants to solve the problem. For anchored problems, participants needed to first press the keys for the position (row and column) of the "?" and then press the key for the digit with which the "?" cell should be filled. For no-anchor problems, participants were asked to press the position (row and column) of the cell whose answer they could determine at once and then to press the answer. After the response, the correct answer was presented on the screen. Then there was a 10-second inter-trial interval (ITI; a white cross shown on the screen) during which the participants were asked to rest. There were 5 sessions in all, including 2 with anchored problems and 3 with no-anchor problems. Participants were encouraged to finish each problem as accurately and quickly as possible. A pen-and-paper survey was administered to each participant after the fMRI experiment.

Scan Protocols The images were acquired on a 3.0T MR scanner (Siemens Trio+Tim) and an SS-EPI (single shot echo planar imaging) sequence sensitive to BOLD (blood oxygen level dependent) signals was used to acquire the fMRI data. The functional images were acquired with the following parameters: TR = 2000 ms, TE = 25 ms, flip angle = 79°, FOV = 200 mm × 200 mm, matrix size = 64 × 64, slice thickness = 3.2 mm, slice gap = 0 mm, and 35 axial slices with the AC–PC line on the 13th slice from the bottom of the brain.

Brain Imaging Data Processing Data preprocessing (e.g., motion correction) and statistical analysis were performed with the NeuroImaging Software package (NIS, http://kraepelin.wpic.pitt.edu/nis/). Three participants were excluded due to head movement exceeding 5 mm. All images were co-registered to a common reference structural MRI image and smoothed with a 6-mm full-width half-maximum three-dimensional Gaussian filter.

As in Experiment 1, we defined the same five regions of interest (ROIs) according to ACT-R (Adaptive Control of Thought-Rational) theory: the lateral inferior prefrontal cortex (PFC), centered at Talairach coordinates x = 40, y = 21, z = 21; the posterior parietal cortex (PPC), centered at x = 23, y = -64, z = 34; the anterior cingulate cortex (ACC), centered at x = 5, y = 10, z = 38; the caudate, centered at x = 15, y = 9, z = 2; and the fusiform gyrus (FG), centered at x = 42, y = -60, z = -8 (Anderson et al., 2004; Qin et al., 2003). In each ROI we calculated the percentage of signal change relative to a baseline (the two scans before the stimulus onset scan of a trial) over a period of 10 scans, from the two baseline scans before stimulus onset to the eight problem-solving scans after stimulus presentation.


Table 3.8: Response time and correct rate (in parentheses) for the 4 types of Sudoku problems in Study 2.

               Easy          Difficult
   Anchored    2.3 s (92%)   3.8 s (91%)
   No-anchor   2.4 s (89%)   5.6 s (80%)

3.3.2 Results

Behavioral Results Data from 3 participants were removed: 2 because of excessive head movement and 1 because the experiment was not completed. Data from 17 participants thus entered the analysis. The mean accuracy and response times are shown in Table 3.8. For the correct rate, the main effect of anchoring was significant (F(1,16) = 16.450, p < .001), the main effect of difficulty was significant (F(1,16) = 13.400, p < .01), and the interaction between the two factors was significant (F(1,16) = 4.994, p < .05). These results suggest that easy and anchored problems were answered more accurately. For the response time, the main effect of anchoring was significant (F(1,16) = 60.776, p < .001): problems with an anchor were solved significantly faster than problems without one. The main effect of difficulty was also significant (F(1,16) = 258.899, p < .001): easy problems were solved faster than difficult ones. The interaction between the two factors was significant (F(1,16) = 57.167, p < .001), indicating that anchoring and difficulty jointly affected this kind of problem solving.

fMRI Results Based on ROIs Figure 3.5 shows the BOLD effects in the predefined ROIs (confirmatory analysis): the lateral inferior prefrontal cortex (PFC), posterior parietal cortex (PPC), anterior cingulate cortex (ACC), caudate, and fusiform gyrus (FG). These areas were sensitive to both factors, anchoring and problem difficulty. Stronger activation and longer BOLD effects were observed in all these areas in the no-anchor condition than in the anchored condition, and activation in all these areas was stronger in the difficult condition than in the easy condition. The BOLD effects were consistent with the behavioral results and with Experiment 1, suggesting that when participants were solving complex problems the brain was engaged more strongly during processing. Of note, the activation in the fusiform gyrus was stronger than in Experiment 1, which might be related to the careful visual check of the problem state that is required when there is no cue to guide the direction of problem solving. We are also investigating this heuristic searching and selection process in ACT-R-based simulations; this work is ongoing and will be reported in a future deliverable.

3.3.3 Conclusions

In Experiment 2 we investigated the process of heuristic searching. The fMRI results showed significant activations in the predefined ROIs, suggesting that the lateral inferior prefrontal cortex (PFC), posterior parietal cortex (PPC), anterior cingulate cortex (ACC), caudate, and fusiform gyrus (FG) also play a central role during heuristic searching.


Figure 3.5: BOLD effects in predefined ROIs in Study 2.


The high activation in the fusiform gyrus might indicate the important role of visual checking prior to heuristic searching, which helps to recognize the problem space and guides the way the problem is solved. To investigate the principles underlying heuristic searching, we will continue this line of research combining fMRI and ACT-R modeling.

3.4 Summary

We explored human heuristics application and searching in two studies that combined brain imaging with ACT-R based modeling. Both studies used the simplified Sudoku task, which is a good paradigm for investigating the neural basis of heuristics because it is a rather simple task that can be finished within 10 seconds. The results showed that the following 5 brain regions are involved in heuristics processing and cooperate in a systematic flow of information processing: the lateral inferior prefrontal cortex (PFC), reflecting the retrieval of information in the declarative module; the posterior parietal cortex (PPC), reflecting changes to problem representations in the imaginal module; the anterior cingulate cortex (ACC), controlling the various stages of processing and preventing the problem solving state from being distracted from the goal; the caudate, playing an action-selection role; and the fusiform gyrus (FG), engaged in visual processing. Beyond their cooperation, the imaginal (representation) module and the goal (control) module acted like the working memory and central processing unit of a system, sustaining the overall processing. It is worth noting that visual processing was not just an input stage: since it is in charge of recognizing problem states, it may also determine the way a problem is solved. As for the retrieval module, it selected heuristic information from the memory system once the difference between goal state and problem state had been eliminated; this module resembled a ready state for carrying out retrieval.


References

Anderson, J. R. (2007). How can the human mind occur in the physical universe? New York, NY: Oxford University Press.

Anderson, J. R., Bothell, D., Byrne, M. D., Douglass, S., Lebiere, C., & Qin, Y. (2004). An integrated theory of the mind. Psychological Review, 111(4), 1036–1060.

Anderson, J. R., Fincham, J. M., Qin, Y., & Stocco, A. (2008). A central circuit of the mind. Trends in Cognitive Sciences, 12(4), 136–143.

Anderson, J. R., Qin, Y. L., Jung, K. J., et al. (2007). Information-processing modules and their relative modality specificity. Cognitive Psychology, 54(3), 185–217.

Arkes, H. R., & Ayton, P. (1999). The sunk cost and Concorde effects: Are humans less rational than lower animals? Psychological Bulletin, 125, 591–600.

Banks, A., Vincent, J., & Phalp, K. (2008). Natural strategies for search. Natural Computing.

Bhavnani, S., Jacob, R., Nardine, J., & Peck, F. (2003). Exploring the distribution of online healthcare information. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 816–817.

Browne, G. J., Pitts, M. G., & Wetherbe, J. C. (2005). Stopping rule use during web-based search.

Charnov, E. (1976). Optimal foraging: The marginal value theorem. Theoretical Population Biology, 9, 129–136.

Clark, A. (2001). Reasons, robots and the extended mind. Mind and Language, 16(2), 121–145.

Clark, A. (2003). Natural-born cyborgs: Minds, technologies, and the future of human intelligence. Oxford, UK: Oxford University Press.

Clark, A., & Chalmers, D. (1998). The extended mind. Analysis, 58(1), 7–19.

Cowie, R. (1977). Optimal foraging in great tits (Parus major). Nature, 268(5616), 137–139.

Dougherty, M. R., & Harbison, J. I. (2007). Motivated to retrieve: How often are you willing to go back to the well when the well is dry? Journal of Experimental Psychology: Learning, Memory, and Cognition, 33(6), 1108–1117.

Draaisma, D. (2000). Metaphors of memory: A history of ideas about the mind. Cambridge, UK: Cambridge University Press.

Fensel, D., Harmelen, F. van, Andersson, B., Brennan, P., Cunningham, H., Della Valle, E., Fischer, F., Huang, Z., Kiryakov, A., Lee, T. K.-i., Schooler, L. J., Tresp, V., Wesner, S., Witbrock, M., & Zhong, N. (2008). Towards LarKC: A platform for web-scale reasoning. In Proceedings of the IEEE International Conference on Semantic Computing (ICSC 2008), August 4–7, 2008, Santa Clara, CA, USA. Los Alamitos, CA: IEEE Computer Society Press.

Fu, W.-T., & Pirolli, P. (2007). SNIF-ACT: A cognitive model of user navigation on the World Wide Web. Human-Computer Interaction, 22(4), 355–412.

Gigerenzer, G. (2000). Adaptive thinking: Rationality in the real world. New York, NY: Oxford University Press.

Gigerenzer, G. (2008). Rationality for mortals: Risk and rules of thumb. New York, NY: Oxford University Press.


Gigerenzer, G., Todd, P. M., & the ABC Research Group. (1999). Simple heuristics that make us smart. New York, NY: Oxford University Press.

Green, R. F. (1984). Stopping rules for optimal foragers. The American Naturalist, 123(1), 30.

Green, R. F. (2006). A simpler, more general method of finding the optimal foraging strategy for Bayesian birds. Oikos, 112(2), 274–284.

Harbison, J. I., Davelaar, E. J., & Dougherty, M. R. (2008). Stopping rules and memory search termination decisions. In Proceedings of the 30th Annual Conference of the Cognitive Science Society (pp. 565–570). Austin, TX: Cognitive Science Society.

Harbison, J. I., Dougherty, M. R., Davelaar, E. J., & Fayyad, B. (2009). On the lawfulness of the decision to terminate memory search. Cognition, 111, 397–402.

Hills, T. T., Todd, P. M., & Goldstone, R. L. (2008). Search in external and internal spaces: Evidence for generalized cognitive search processes. Psychological Science, 19(8), 802–808.

Hutchinson, J. M. C., & Gigerenzer, G. (2005). Simple heuristics and rules of thumb: Where psychologists and behavioural biologists might meet. Behavioural Processes, 69(2), 97–124.

Hutchinson, J. M. C., Wilke, A., & Todd, P. M. (2008). Patch leaving in humans: Can a generalist adapt its rules to dispersal of items across patches? Animal Behaviour, 75, 1331–1349.

Iwasa, Y., Higashi, M., & Yamamura, N. (1981). Prey distribution as a factor determining the choice of optimal foraging strategy. The American Naturalist, 117(5), 710–723.

James, W. (1890). The principles of psychology (Vols. 1 & 2). New York, NY: Holt.

Kahana, M. J., & Miller, J. F. (under revision). What makes recall stop?

MacArthur, R. H., & Pianka, E. R. (1966). On optimal use of a patchy environment. The American Naturalist, 100(916), 603–609.

McNamara, J. M., & Houston, A. I. (2009). Integrating function and mechanism. Trends in Ecology & Evolution, in press.

Miller, G. A. (1983). Informavores. In F. Machlup & U. Mansfield (Eds.), The study of information: Interdisciplinary messages (pp. 111–113). Wiley-Interscience.

Newell, A., & Simon, H. A. (1972). Human problem solving. Englewood Cliffs, NJ: Prentice Hall.

Nishimura, K. (1999). Exploration of optimal giving-up time in uncertain environment: A sit-and-wait forager. Journal of Theoretical Biology, 199(3), 321–327.

Payne, S., Duggan, G., & Neth, H. (2007). Discretionary task interleaving: Heuristics for time allocation in cognitive foraging. Journal of Experimental Psychology: General, 136(3), 370–380.

Pirolli, P. (2007). Information foraging theory: Adaptive interaction with information. New York, NY: Oxford University Press.

Pirolli, P., & Card, S. (1999). Information foraging. Psychological Review, 106, 643–675.

Qin, Y., Sohn, M.-H., Anderson, J. R., Stenger, V. A., Fissell, K., Goode, A., & Carter, C. S. (2003). Predicting the practice effects on the blood oxygenation level-dependent (BOLD) function of fMRI in a symbolic manipulation task. Proceedings of the National Academy of Sciences of the United States of America, 100, 4951–4956.

Qin, Y. L., Bothell, D., & Anderson, J. R. (2007). ACT-R meets fMRI. In N. Zhong (Ed.), Proceedings of WImBI 2006 (LNAI 4845). Berlin, Germany: Springer-Verlag.

Raaijmakers, J. G. W., & Shiffrin, R. M. (1981). Search of associative memory. Psychological Review, 88(2), 93–134.


Shepard, R. N. (2002). Perceptual-cognitive universals as reflections of the world. Behavioral and Brain Sciences, 24(4), 581–601.

Simon, H. A. (1990). Invariants of human behavior. Annual Review of Psychology, 41(1), 1–20.

Simon, H. A. (1996). The sciences of the artificial (3rd ed.). Cambridge, MA: The MIT Press.

Spink, A., & Cole, C. (2006). Human information behavior: Integrating diverse approaches and information use. Journal of the American Society for Information Science and Technology, 57(1), 25–35.

Spink, A., Park, M., Jansen, B. J., & Pedersen, J. (2006). Multitasking during Web search sessions. Information Processing and Management, 42(1), 264–275.

Stephens, D. W., & Krebs, J. R. (1986). Foraging theory. Princeton, NJ: Princeton University Press.

Steyvers, M., & Tenenbaum, J. B. (2005). The large-scale structure of semantic networks: Statistical analyses and a model of semantic growth. Cognitive Science, 29(1), 41–78.

Tulving, E., & Craik, F. I. M. (Eds.). (2000). The Oxford handbook of memory. New York, NY: Oxford University Press.

Wilke, A. (2006). Evolved responses to an uncertain world. Unpublished doctoral dissertation, Freie Universität Berlin.

Wilke, A., Hutchinson, J. M. C., Todd, P. M., & Czienskowski, U. (2009). Fishing for the right words: Decision rules for human foraging behavior in internal search tasks. Cognitive Science, 33, 497–529.

Ydenberg, R. C. (1984). Great tits and giving-up times: Decision rules for leaving patches. Behaviour, 90(1–3), 1–24.

Ydenberg, R. C. (2007). Foraging: An overview. In D. W. Stephens, J. S. Brown, & R. C. Ydenberg (Eds.), Foraging: Behavior and ecology. Chicago, IL: University of Chicago Press.

Zipf, G. K. (1949). Human behavior and the principle of least effort: An introduction to human ecology. Cambridge, MA: Addison-Wesley.


A. Appendices

A.1 Experimental Tasks

The six practice and sixty test tasks used in all stopping rule experiments are listed in Tables A.1 and A.2. The last column (set size) specifies the number of correct exemplars that were stored in the program.

Table A.1: The tasks used in all memory foraging experiments (Part 1 of 2). As all studies were conducted with native German speakers, the tasks are presented in German.

Phase:    ID:  Domain:       Query text (category):                                                Set size:

Practice   1   Alphabet      Vokale oder Umlaute                                                    8
           2   Sprache       Auf '-nf' endende einsilbige Wörter                                    5
           3   Farben        Grund- oder Spektralfarben                                             6
           4   Marken        'Mode-/Bekleidungsmarken' im Wert von über $1 Milliarde (2009)         7
           5   Sport         Formel-1 Fahrerweltmeister (seit 1980)                                14
           6   Zahlen        Primzahlen zwischen 1 und 100 (als Ziffer)                            25

Total set size (practice phase): 65

Test       1   Astrologie    'Tierkreiszeichen' bzw. Sternbilder des Zodiaks/Horoskops             12
           2   Astrologie    Zeichen des 'Chinesischen' Tierkreises bzw. Erdzweige                 12
           3   Astrologie    Zeichen des 'Indianer'-Tierkreises bzw. Horoskops                     12
           4   Astronomie    'Planeten' unseres Sonnensystems                                       9
           5   Biologie      Getreidearten (Gattungen)                                              7
           6   Biologie      Arten von (Gross- und Klein-) 'Katzen' (Felidae)                      38
           7   Biologie      Gattungen mitteleuropäischer 'Laubbäume'                              16
           8   Biologie      Gattungen der 'Kieferngewächse' (Pinaceae)                            11
           9   Ernährung     Italienische 'Nudelsorten'                                            44
          10   Ernährung     Von 'McDonalds' angebotene Fleischprodukte                            23
          11   Fernsehen     Stars der TV-Show 'The Muppets'                                       38
          12   Film          Spielfilme von 'David Lynch'                                          10
          13   Film          Hauptfiguren des 1. 'Krieg der Sterne' bzw. 'Star Wars' Films (1977)  10
          14   Film          'James Bond' Spielfilme (Titel)                                       23
          15   Geographie    Länder bzw. Staaten in 'Europa'                                       48
          16   Geographie    'Europäische Hauptstädte' (aktuell)                                   48
          17   Geographie    Länder bzw. Staaten in 'Afrika'                                       53
          18   Geographie    Länder bzw. Nationalstaaten '(Nord-, Süd- und Mittel-) Amerikas'      38
          19   Geographie    Länder bzw. Nationalstaaten 'Asiens'                                  51
          20   Geographie    Grossstädte in Deutschland (mind. 200.000 Einwohner)                  38

Continued in Table A.2 ...

A.2 Experiment 1: Cumulative Curves for each Participant and Question

Figures A.1 and A.2 show the cumulative points (upper lines) and errors (lower lines) for each participant (over all questions) with fitted power functions (dotted lines) of the form y = ax^b.

Figures A.3 and A.4 show the cumulative points (dark upper lines) and errors (light lower lines) for each question (over all participants) with fitted power functions of the form y = ax^b.


Table A.2: The tasks used in all memory foraging experiments (Part 2 of 2). As all studies were conducted with native German speakers, the tasks are presented in German.

Phase:    ID:  Domain:       Query text (category):                                                Set size:

Test      21   Geographie    Grossstädte in Österreich oder Schweiz (mind. 100.000 Einwohner)      11
          22   Geographie    Bundesstaaten der USA (aktuell)                                       50
          23   Geographie    Hauptstädte US-amerikanischer Bundesstaaten (aktuell)                 50
          24   Geographie    Deutsche 'Bundesländer' (aktuelle BRD)                                16
          25   Geographie    Hauptstädte deutscher Bundesländer (aktuell)                          16
          26   Geographie    'Kontinente' der Erde                                                  7
          27   Geographie    Höchste 'Berge' bzw. 'Gipfel' der Kontinente                           7
          28   Geographie    Länder bzw. Staaten mit mehr als 80 Millionen Einwohnern (2009)       16
          29   Gesellschaft  Gesetzliche 'Feiertage' in Deutschland                                24
          30   Literatur     Literaturnobelpreisträger (1945 bis heute)                            64
          31   Marken        'Biermarken' mit über $1 Milliarde Umsatz (2008)                      16
          32   Marken        'Technologiemarken' im Wert von über $5 Milliarden (2009)             19
          33   Marken        'Ölkonzerne' (Tankstellen) im Wert von über $700 Mio (2009)           10
          34   Marken        'Kaffeemarken' im Wert von über $600 Mio. (2009)                       8
          35   Marken        'Sprudel- bzw. Wassermarken' im Wert von über $300 Mio (2009)          9
          36   Musik         Mitglieder der Pop-Band 'The Beatles'                                  6
          37   Musik         Opern von Wolfgang Amadeus Mozart                                     21
          38   Musik         Instrumente eines Sinfonieorchesters                                  30
          39   Musik         Mitglieder der Pop-Band 'ABBA'                                         4
          40   Mythologie    Olympische Götter der 'griechischen' Mythologie                       12
          41   Mythologie    Olympische Götter der 'römischen' Mythologie                          12
          42   Politik       Deutsche 'Bundeskanzler' (BRD seit 1949)                               8
          43   Politik       Amerikanische 'Präsidenten' (USA seit 1945)                           13
          44   Politik       Deutsche 'Aussenminister' (BRD seit 1951)                             10
          45   Politik       Mitgliedsstaaten des UN-Sicherheitsrates (2009)                       15
          46   Politik       Friedensnobelpreisträger (seit 1975)                                  48
          47   Sport         'Wimbledon' Gewinner im Herren- oder Dameneinzel (seit 1980)          26
          48   Sport         Vereine der 'Fussballbundesliga' (abgelaufene Saison 2008/09)         18
          49   Sport         Austragungsorte der olympischen Winterspiele (bis heute)              17
          50   Sport         Austragungsorte der olympischen Sommerspiele der Neuzeit (bis heute)  22
          51   Sport         Basketballteams der U.S.-Profiliga 'NBA' (Saison 2008/09)             30
          52   Sport         Vereine der '2. Fussballbundesliga' (abgelaufene Saison 2008/09)      18
          53   Sport         Arten von Figuren beim 'Schachspiel'                                   6
          54   Wirtschaft    Programme der 'Microsoft Office'-Suite (bis heute)                    20
          55   Wirtschaft    Produkte (Hard- oder Software) der Firma 'Apple' (aktuell)            44
          56   Wirtschaft    US-amerikanische Autohersteller (aktuell)                             20
          57   Wirtschaft    Deutsche Autohersteller (aktuell)                                     10
          58   Wirtschaft    Japanische Autohersteller (aktuell)                                   13
          59   Wirtschaft    Unternehmen im Deutschen Aktienindex 'DAX' (5/2009)                   30
          60   Wirtschaft    'Fluglinien' mit über 18 Mio. internationalen Passagieren (2007)       9

Total set size (test phase): 1326


Figure A.1: Cumulative points (dark upper line) and errors (light lower line) per subject (Participants 1–30) with fitted power functions.


Figure A.2: Cumulative points (dark upper line) and errors (light lower line) per subject (Participants 31–60) with fitted power functions.


The decreasing step function (in black) denotes the number of participants still working on a task. The quality of each fit is indicated by its R² value (R² ≤ 1). Only the cumulative retrievals during the first 120 seconds on each question (where all participants were still present, unless they had already entered all correct answers) were fitted.
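Such power functions can be fitted with standard nonlinear least squares. The following minimal Python sketch illustrates the procedure on synthetic data standing in for one participant's cumulative retrievals; the data-generating values are invented for the example.

    import numpy as np
    from scipy.optimize import curve_fit

    def power_law(t, a, b):
        # Cumulative retrievals modeled as y = a * t^b; b < 1 captures
        # diminishing returns (later seconds yield fewer new items).
        return a * np.power(t, b)

    # Synthetic stand-in for one participant's cumulative points over 120 s.
    rng = np.random.default_rng(0)
    t = np.arange(1.0, 121.0)
    y = 3.0 * t ** 0.45 + rng.normal(0.0, 0.5, t.size)

    (a, b), _ = curve_fit(power_law, t, y, p0=(1.0, 0.5))
    r2 = 1.0 - np.var(y - power_law(t, a, b)) / np.var(y)
    print(f"fit: y = {a:.2f} * t^{b:.2f}, R^2 = {r2:.3f}")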

A.3 Experiment 2: Cumulative Curves for each Participant and Question

Figures A.5 and A.6 show the cumulative points (upper line) and errors (lower line) for each participant (over all questions). Each cumulative curve is corrected by dividing the raw number of cumulative points by the number of tasks still active (descending line from left to right).

Figures A.7 and A.8 show the cumulative points (upper line) and errors (lower line) for each question (over all participants). Each cumulative curve is corrected by dividing the raw number of cumulative points by the number of participants still active (descending line from left to right).
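A minimal Python sketch of this correction, assuming per-second arrays of raw cumulative points and of the number of still-active participants (all names are hypothetical):

    import numpy as np

    def corrected_cumulative(points, active):
        # points: raw cumulative points per second (1D array).
        # active: number of participants (or tasks) still active per second.
        # Dividing by the active count keeps late samples from being
        # dominated by the few searchers who kept going the longest.
        points = np.asarray(points, dtype=float)
        active = np.asarray(active, dtype=float)
        return np.where(active > 0, points / np.maximum(active, 1.0), np.nan)

    # Hypothetical usage, with 30 participants dropping out over a question:
    # corrected = corrected_cumulative(raw_points, n_active)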


Figure A.3: Cumulative points (dark upper lines) and errors (light lower lines) per task (Questions 1–30) with fitted power functions.


Figure A.4: Cumulative points (dark upper lines) and errors (light lower lines) per task (Questions 31–60) with fitted power functions.


Figure A.5: Cumulative points (upper line) and errors (lower line) per subject (Participants 1–25).


Figure A.6: Cumulative points (upper line) and errors (lower line) per subject (Participants 26–50).


Figure A.7: Cumulative points (upper line) and errors (lower line) per task (Questions 1–30).


Figure A.8: Cumulative points (upper line) and errors (lower line) per task (Questions 31–60).
