Think Aloud Method

THE THINK ALOUD METHOD

A practical guide to modelling cognitiveprocesses

Maarten W. van SomerenYvonne F. Barnard

Jacobijn A.C. Sandberg

Department of Social Science InformaticsUniversity of Amsterdam

Published by Academic Press, London, 1994ISBN 0-12-714270-3

Copyright M.W. van Someren, Y.F. Barnard and J.A.C. Barnard

Contents

1 Thinking aloud 11.1 A first impression . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Theories of cognitive processes in problem-solving . . . . . . . 81.3 Building knowledge-based systems . . . . . . . . . . . . . . . . 91.4 Overview of this book . . . . . . . . . . . . . . . . . . . . . . 12

2 Studying the content of cognitive processes 132.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132.2 Cognitive processes in problem-solving . . . . . . . . . . . . . 132.3 Observation methods . . . . . . . . . . . . . . . . . . . . . . . 152.4 Structured techniques . . . . . . . . . . . . . . . . . . . . . . . 162.5 Verbal reports . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.5.1 The verbal reporting process . . . . . . . . . . . . . . . 192.5.2 Retrospection . . . . . . . . . . . . . . . . . . . . . . . 212.5.3 Introspection . . . . . . . . . . . . . . . . . . . . . . . 232.5.4 Questions and prompting . . . . . . . . . . . . . . . . . 232.5.5 Dialogue observation . . . . . . . . . . . . . . . . . . . 24

2.6 Differences between methods . . . . . . . . . . . . . . . . . . . 242.7 Think aloud protocols . . . . . . . . . . . . . . . . . . . . . . 262.8 Combining methods . . . . . . . . . . . . . . . . . . . . . . . . 26

3 The think aloud method 293.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293.2 History of the think aloud method . . . . . . . . . . . . . . . . 293.3 Selecting subjects . . . . . . . . . . . . . . . . . . . . . . . . . 34

3.3.1 Criteria for selecting subjects . . . . . . . . . . . . . . 343.3.2 Experts as subjects . . . . . . . . . . . . . . . . . . . . 343.3.3 Differences in verbalization skills . . . . . . . . . . . . 35

3.4 Selecting problems . . . . . . . . . . . . . . . . . . . . . . . . 363.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

vi The Think Aloud Method

3.6 Overview of the analysis of think aloud protocols . . . . . . . 37

4 Practical procedures in obtaining think aloud protocols 41

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

4.2 Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

4.3 Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

4.4 Warming up . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

4.5 Behaviour of the experimenter and prompting . . . . . . . . . 44

4.6 Recording . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

4.7 Transcription of the protocol . . . . . . . . . . . . . . . . . . . 44

4.8 Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

5 Building models of problem-solving 49

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

5.2 Modelling cognitive processes . . . . . . . . . . . . . . . . . . 49

5.3 The form of models of cognitive processes . . . . . . . . . . . . 51

5.4 Procedural models and explanation of human behaviour . . . . 53

5.5 Building models . . . . . . . . . . . . . . . . . . . . . . . . . . 54

5.6 Task analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

5.6.1 The construction of a task analysis . . . . . . . . . . . 56

5.6.2 Example: task analysis of solving arithmetic word prob-lems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

5.6.3 Example: task analysis of architectural design . . . . . 63

5.6.4 The role of task analysis . . . . . . . . . . . . . . . . . 64

5.7 Theories of problem-solving . . . . . . . . . . . . . . . . . . . 65

5.7.1 The role of psychological theories . . . . . . . . . . . . 65

5.7.2 Example: psychological theories on solving arithmeticword problems . . . . . . . . . . . . . . . . . . . . . . 68

5.7.3 Example: psychological theories on problem-solving inarchitectural design . . . . . . . . . . . . . . . . . . . . 68

5.8 Psychological model . . . . . . . . . . . . . . . . . . . . . . . 69

5.8.1 The construction of psychological models . . . . . . . . 69

5.8.2 Example: a psychological model of solving arithmeticword problems . . . . . . . . . . . . . . . . . . . . . . 70

5.8.3 Example: a psychological model of architectural design 73

5.9 Dimensions of models . . . . . . . . . . . . . . . . . . . . . . . 75

5.10 On the boundaries of task analysis and model construction . . 77

Contents vii

6 Languages for task analysis and psychological modelling 796.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 796.2 A conceptual modelling language . . . . . . . . . . . . . . . . 81

6.2.1 CPML (Conceptual Protocol Modelling Language) . . 816.2.2 Domain layer . . . . . . . . . . . . . . . . . . . . . . . 826.2.3 Inference layer . . . . . . . . . . . . . . . . . . . . . . . 846.2.4 Task layer . . . . . . . . . . . . . . . . . . . . . . . . . 866.2.5 Example: a CPML model of architectural design . . . . 916.2.6 Concluding remarks . . . . . . . . . . . . . . . . . . . . 95

6.3 Pseudo programming language . . . . . . . . . . . . . . . . . . 966.4 Problem Behaviour Graph and production rule systems . . . . 98

6.4.1 Problem Behaviour Graph (PBG) . . . . . . . . . . . . 986.4.2 Example: part of a PBG model of architectural design 996.4.3 Production rules . . . . . . . . . . . . . . . . . . . . . 1016.4.4 Extensions of production rules . . . . . . . . . . . . . . 1046.4.5 Problem Behaviour Graphs, production rule systems and

human memory . . . . . . . . . . . . . . . . . . . . . . 1066.5 Programming languages . . . . . . . . . . . . . . . . . . . . . 1106.6 Using a language or adapting it . . . . . . . . . . . . . . . . . 1126.7 Differences between languages . . . . . . . . . . . . . . . . . . 113

7 Analysing the protocols 1177.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1177.2 The role of protocols as data in research . . . . . . . . . . . . 1197.3 Transcription and segmentation . . . . . . . . . . . . . . . . . 1197.4 Coding scheme and verbalization theory . . . . . . . . . . . . 120

7.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 1207.4.2 Constructing a coding scheme . . . . . . . . . . . . . . 1217.4.3 Grain size and aggregation . . . . . . . . . . . . . . . . 1227.4.4 Special coding categories . . . . . . . . . . . . . . . . . 1227.4.5 Coding form . . . . . . . . . . . . . . . . . . . . . . . . 1237.4.6 Example: a coding scheme for architectural design . . . 1237.4.7 Verbalization theory . . . . . . . . . . . . . . . . . . . 1247.4.8 Example of a verbalization theory . . . . . . . . . . . . 1257.4.9 Methodological requirements for the coding scheme . . 126

7.5 Coding procedures . . . . . . . . . . . . . . . . . . . . . . . . 1277.5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 1277.5.2 Aggregation . . . . . . . . . . . . . . . . . . . . . . . . 1277.5.3 Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . 1277.5.4 Rating protocols or protocol fragments . . . . . . . . . 128

viii The Think Aloud Method

7.6 Intercoder reliability . . . . . . . . . . . . . . . . . . . . . . . 1287.7 Comparing the coded protocols with the models . . . . . . . . 133

7.7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 1337.7.2 Comparing protocols with procedural models . . . . . . 1337.7.3 Issues in quantifying the fit . . . . . . . . . . . . . . . 1357.7.4 Comparing sets of protocols . . . . . . . . . . . . . . . 136

7.8 Computer support tools for analysis . . . . . . . . . . . . . . . 1367.8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 1367.8.2 Indexing tools . . . . . . . . . . . . . . . . . . . . . . . 1377.8.3 An implemented model as tool . . . . . . . . . . . . . . 137

7.9 Reporting the results of protocol analysis . . . . . . . . . . . . 137

8 Examples 1418.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1418.2 Solving physics problems . . . . . . . . . . . . . . . . . . . . . 141

8.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 1418.2.2 An example problem . . . . . . . . . . . . . . . . . . . 1428.2.3 The model of advanced problem solving . . . . . . . . 1428.2.4 Design of the experiments . . . . . . . . . . . . . . . . 1458.2.5 A protocol . . . . . . . . . . . . . . . . . . . . . . . . . 1468.2.6 Coding the protocol . . . . . . . . . . . . . . . . . . . 1488.2.7 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 1508.2.8 The sequence of tasks . . . . . . . . . . . . . . . . . . . 1518.2.9 The completeness of the model . . . . . . . . . . . . . 1518.2.10 The level of detail of the model . . . . . . . . . . . . . 1528.2.11 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . 152

8.3 Explaining novice errors in computer programming . . . . . . 1528.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 1528.3.2 The model . . . . . . . . . . . . . . . . . . . . . . . . . 1528.3.3 Design of the study . . . . . . . . . . . . . . . . . . . . 1548.3.4 An analysed protocol . . . . . . . . . . . . . . . . . . . 1548.3.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . 156

8.4 Acquisition of medical knowledge . . . . . . . . . . . . . . . . 1578.4.1 Example: a medical diagnosis task . . . . . . . . . . . 1578.4.2 Knowledge structures . . . . . . . . . . . . . . . . . . . 1588.4.3 The problem-solving process . . . . . . . . . . . . . . . 1608.4.4 Alternative models . . . . . . . . . . . . . . . . . . . . 1638.4.5 Predicted and actual protocol . . . . . . . . . . . . . . 1638.4.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 1678.4.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . 167

Contents ix

A Exercises 169A.1 Exercise 1: collecting verbal data . . . . . . . . . . . . . . . . 169A.2 Exercise 2: applicability of the think aloud method . . . . . . 170A.3 Exercise 3: task analysis and model construction . . . . . . . . 170

A.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 170A.3.2 Example problem . . . . . . . . . . . . . . . . . . . . . 171A.3.3 Suggestions for task analysis and psychological model . 172A.3.4 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . 174

A.4 Exercise 4: knowledge acquisition . . . . . . . . . . . . . . . . 174A.5 Exercise 5: physics problem solving . . . . . . . . . . . . . . . 175

B Instructions for two problem-solving tasks 177B.1 Task 1: waterjug problems . . . . . . . . . . . . . . . . . . . . 177B.2 Task 2: improving technical devices . . . . . . . . . . . . . . . 177

C Protocols of ‘learning word meanings’ 179

D Analysing expert problem-solving 187D.1 Case description . . . . . . . . . . . . . . . . . . . . . . . . . . 187D.2 Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188

E Coding scheme architectural design 195

F Protocol of novice problem-solving in physics 201

Bibliography 205

x The Think Aloud Method

Preface

This book gives a detailed description of the think aloud method. The thinkaloud method consists of asking people to think aloud while solving a problemand analysing the resulting verbal protocols. This method has applications inpsychological and educational research on cognitive processes but also for theknowledge acquisition in the context of building knowledge-based computersystems. In many cases the think aloud method is a unique source of infor-mation on cognitive processes. In this book we present the method in detailwith examples.

This book is intended for two types of readers: social scientists who want touse the think aloud method for research on cognitive processes and knowledgeengineers who want to use the method for knowledge acquisition. To make thebook readable for both audiences, it contains short introductions to issues thatare basic knowledge for one readership, but that are not part of the standardknowledge in the other community. We have included introductory sectionson those topics that are relevant for both communities. As a result the bookpresupposes almost no specific knowledge but it is written for readers whoare basically familiar with one of the two major application areas of the thinkaloud method: research on cognitive processes and knowledge acquisition. Therole of the think aloud method (and related techniques) is explained separatelyfor psychological research and for knowledge acquisition. The book discussesthe aspects of computer models that are directly associated with the thinkaloud method but not programming techniques.

This book has grown out of a long tradition at the University of Amster-dam. The tradition started in the 1930s with Otto Selz who used the thinkaloud method to study the creative reasoning processes. In the 1940s A.D.de Groot used the method in his famous study of thought processes in chess.In the 1960s and 1970s Jan Elshout and his colleagues used the method indetailed process studies of cognitive skills that were related to general intelli-gence. In this period Elshout also designed the first university course in whichthe method was taught. Many of the ideas presented in this book originate

xii The Think Aloud Method

from this work. Bob Wielinga introduced techniques for computational mod-elling into the course and the method. This book is based directly on materialused in a course on the think aloud method as it is taught at the University ofAmsterdam. We hope to provide a practical guide to all who like to use themethod in research or teach the method in a course.

Acknowledgments

The authors are grateful to all students who commented on earlier drafts ofthis text. Furthermore we like to thank Ronald Hamel and Wouter Janswei-jer who allowed us to use their work on architectural design and on physicsproblem-solving as illustrative material in this book and who gave many usefulcomments on earlier drafts. Finally we thank the Department of Social ScienceInformatics at the Faculty of Psychology of the University of Amsterdam forthe time and resources that allowed us to write this book.

The protocol texts of architects in Chapter 1, Figure 5.2 and the cod-ing scheme in Appendix E were adapted with permission of R. Hamel: Hetdenken van de architect, AHA Books, 1990, distributed by Staatsdrukkerij enUitgeverij (SDU), ’s Gravenhage.

The protocols of physics problem solving and the description of the modelof physics problem solving in Chapter 8 were adapted with permission ofW.H.N. Jansweijer: PDP: Een benadering vanuit de kunstmatige intelligentievan probleemoplossen en leren door doen in een semantisch rijk domein, 1988,PhD thesis, University of Amsterdam.

Chapter 1

Thinking aloud

1.1 A first impression

Suppose that you want to understand the design process of architects, theknowledge that they use, the cognitive actions that they take and the strategiesthey employ. How would you go about this? One obvious possibility is to asksome architects how they design a building. Interestingly enough, they willnot find this an easy question to answer. They are used to do their job, not toexplain it. If they do try to tell you how they go about their design work, itis quite possible that their account of it will be incomplete or even incorrect,because they construct this account from memory. They may be inclinedto describe the design process neatly in terms of the formal design methodsthat they acquired during their professional training, whereas the real designprocess deviates from these methods. Psychologists have demonstrated thatsuch accounts are not very reliable. Another possibility is to look at thearchitects’ designs and at their intermediate sketches. However, now you arelooking at the products of the thought processes of these architects, and not atthe thought processes themselves. What is needed are more direct data on theongoing thinking processes during working on a design. If you want to knowhow they arrive at their designs, what they think, what is difficult for themand what is easy, how they reconcile conflicting demands, a different researchmethod is needed.

A good method in this situation is to ask architects to work on a designand to instruct them to think aloud. What they say is recorded and usedas data for analysis of the design process. This is a very direct method togain insight in the knowledge and methods of human problem-solving. The

2 The Think Aloud Method

speech and writings are called spoken and written protocols. In this bookwe will describe a method for systematically collecting and analysing suchthink aloud protocols. This method can be used by psychologists and othersocial scientists who want to know more about cognitive processes. It is also animportant method for knowledge engineers whose goal is to build a knowledge-based computer system on the basis of human expertise.

First let us start with an example of a think aloud protocol of an architectwho is engaged in a design task. The example is taken from a study by RonaldHamel (1990) on the process of architectural design. Hamel was interestedin the thought processes of architects. Many architects believe that theirreasoning process is an unstructured flow of ideas that at some point convergesto a design. Hamel’s hypothesis was that the thought process may have astructure that is similar to that of other problem solving processes studied bypsychologists. He started his research with the development of a descriptive,psychological model of architectural design.

Hamel defined a design task and found architects who were prepared toperform the design task while thinking aloud. The task was to design a fa-cility for children up to the age of 16. The facility had to consist of a simplebuilding in which all kinds of indoor activities can take place, two playinggrounds, one for small children and one for children between 6 and 10 yearsold, and a sports field. The architects involved in the study were provided withdescriptions, maps and photographs of the specific location of the proposedfacility. They were asked to produce an initial design for the facility, to drawa plan and to make sketches taking different angles. They could use severaldrawing materials and could ask the researcher for more information. Theywere allowed to work on this task for two hours. The architects were asked tothink aloud, and their utterances were tape-recorded.

One of Hamel’s ideas was that the design process would consist of a cycleof the following steps: analysing the current problem, proposing a solution,implementing a solution, evaluating the solution.

Here follows a fragment of a verbatim protocol. This fragment starts atthe point where the architect has decided to design an inner court. Now he isthinking about the problem how to make this court an interesting playgroundfor the children.

1: then you find there, er, something like a place for parents

to sit

2: so ... a bench

3: water they will probably want

4: would you want that like a tap or would you ...

Thinking aloud 3

5: want that like something they could play with some more a

kind of ...

6: tub, er, yes

7: a water tub they can just fit their bare feet into or so

8: but that is going to make a mess in winter of course ...

9: so the question is whether that’s what you want

10: a bit of a mess it will be

11: er, that is lovely in summer naturally

12: but maybe we can with er do something with that shack

13: water I’ll just put tap [notes ‘tap’]

14: what children of course

15: what what what is much handier

16: [sketches water tub]

17: maybe something, er, where water comes out of, er,

18: and that you turn off in winter

19: but then it does not trouble you

20: and then it may run into here somewhere

21: then they can still mess about ... with water

22: then they can play with this

23: that just comes it slowly drips out of it or so

24: then they’ll get dripping wet in summer

25: and then they can get around this

26: plus I’ll just take that water tub

27: for maybe we’ll find something that, er,

28: we’ll do that later

29: [notes ‘water tub’]

30: that that we’ll find something which gives you

31: fun with it in summer

32: and no trouble in winter

33: for that’s what it is about...

34: the best would be of course if the roof could come off the

building [laughs]

35: no trouble in winter but well

36: then you’ll have these kids up on that roof again ...

37: and that is fine

38: but naturally that is a bit dangerous for certain age groups

From his initial model of the problem-solving process, Hamel developed acoding scheme which described in detail what kinds of actions belong to eachcategory identified in the model. Next he categorized protocol fragments in


terms of these actions. This made it possible to compare the design processof a particular architect with the model of the design process. For example,at the beginning of the protocol the architect is elaborating the area aroundthe building. This can be used as a playground for children, which in turnrequires objects to play with. He suggests a water tap, but evaluates this asnot very good for playing. He replaces this idea by a water tub which leads toa long evaluation. This indicates that there is actually an underlying structurein the thought process. Cycles of analysing the current problem, proposinga solution, implementing a solution, and evaluating the solution were clearlypresent in the protocols.

Note that the protocol gives information about the architect’s reasoningthat could not be obtained by simply looking at the resulting design. Theprotocol gives data about strategies and the knowledge that the architect usesto construct a design, data about lines of reasoning that were abandoned atsome point and so on.

Both the structure of the problem-solving process and the results of problem-solving steps that appear in the protocol can be used as a basis for a knowledge-based computer system that performs or supports an architectural design. Forexample, it provides several arguments that can be used to evaluate whether awater tub is an appropriate solution to a sub-problem in architectural design.

Hamel’s study of architectural design focused on a particular aspect of thereasoning process: the overall structure of the reasoning process. His study in-volved competent problem solvers because the goal was the study of competentdesign. A different goal may be the study of the behaviour of less competentproblem solvers. In psychology one often wants to explain errors or inefficientproblem-solving behaviour. One task that has been studied extensively is thesolving of arithmetic word problems. Consider for example the problem:

A father, a mother and their son are 80 years old together. The father istwice as old as the son. The mother has the same age as the father. How oldis the son?

If we present this problem to two students, and if they both give the correctsolution, then we can not just assume that their problem-solving processes arethe same. Below we show two think aloud protocols taken from two psychol-ogy students who both solved the problem given above.

Thinking aloud 5

Student 1:

1: a father, a mother and their son are together 80 years old

2: the father is twice as old as the son

3: the mother is as old as the father

4: how old is the son?

5: well, that sounds complicated

6: let’s have a look

7: I just call them F, M and S

8: F plus M plus S is 80

9: F is 2 times S

10: and M equals F

11: what do we have now?

12: three equations and three unknowns

13: so S ...

14: 2 times F plus S is 80

15: so 4 times S plus S is 80

16: so 5 times S is 80

17: S is 16

18: yes, that is possible

19: so father and mother are 80 minus 16

20: 64

21: er ... 32.

Student 2:

1: father, mother and son are together 80 years old

2: how is that possible?

3: if such a father is 30 and mother too

4: then the son is 20

5: no, that is not possible

6: if you are 30, you cannot have a son of 20

7: so they should be older

8: about 35, more or less

9: let’s have a look

10: the father is twice as old as the son

11: so if he is 35 and the son 17

12: no, that is not possible

13: 36 and 18

14: then the mother is

15: 36 plus 18 is 54


16: 26 ...

17: well, it might be possible

18: no, then she should have had a child when she was 9

19: oh, no

20: no, the father should, the mother should be older

21: for example 30

22: but then I will not have 80

23: 80 minus 30, 50

24: then the father should be nearly 35 and the son nearly 18

25: something like that

26: let’s have a look, where am I?

27: the father is twice ...

28: the mother is as old as the father

29: oh dear

30: my mother, well not my mother

31: but my mother was 30 and my father nearly 35

32: that is not possible

33: if I make them both 33

34: then I have together 66

35: then there is for the son ... 24

36: no, that is impossible

37: I don’t understand it anymore

38: 66, ..., 80

39: no, wait, the son is 14

40: almost, ... the parents are too old

41: 32, 32, 64, 16, yes

42: the son is 16 and the parents 32, together 80

When we compare these two protocols, we see two very different problem-solving processes. The first student treats the problem as mathematical equa-tion and solves it straightforwardly. The second student uses another strategy.He starts with a guess: the father is 30 years old. He evaluates the result by us-ing knowledge about reasonable age differences between parents and children.Then he forms a new estimation, and evaluates it again. The two studentsgive the same answer, but they use a very different solving strategy. Thethink aloud protocols give a clear insight in how they reach the solution. Theprotocols show clearly how the students solve the problem step by step. Thesecond protocol also shows where this student encounters difficulties and whenhe gets confused.

This illustrates how think aloud protocols can provide data about both

Thinking aloud 7

sophisticated and less sophisticated cognitive processes that are difficult toobtain by other means. Therefore it is an essential method for areas suchas cognitive psychology, educational science and knowledge acquisition. Theprotocols given above may give the impression that protocols in general areeasy to understand. This is not always the case. Below we give a protocolthat is more typical than those above. This fragment comes from a protocolby a child (12 years old) who solves the following problem:

Irene has 6 sweets less than Suzanne. Diana has 5 more than Suzanne. Howmany sweets does Irene have less than Diana?

1: so Suzanne has 6 more than Irene

2: and Diana has 5 more than Suzanne

3: so Suzanne comes first

4: because Suzanne has some 5, er, 6

5: but Diana has 5 more, even more

6: so altogether 11

7: but less is asked here

8: how many does Irene have less than Suzanne

9: then you subtract that

10: Diana, Suzanne, Irene

At this point the experimenter intervenes and says:If Diana has the most sweets and Irene has the fewest sweets, then

how can it be that Irene has only one less than Diana?

11: [reads the problem again]

12: I don’t understand it ...

13: Irene has something of which Suzanne has 6 more

14: Diana has 5 more than Suzanne

15: so something must be added

16: and then something must be subtracted

17: if I knew how many Suzanne had

18: Suzanne has 11, I think


20: [pause] 10, I think

21: how many does Irene have less than Diana

22: I think 10

23: [gives up]

This protocol illustrates that it may be hard to understand think aloud pro-tocols, even when they concern relatively simple tasks, such as this arithmetic


word problem. For now, we shall leave the interpretation of this protocol tothe reader. We return to it in Chapter 5.

The examples above gave a first impression of the think aloud method. Thismethod consists of (a) collecting think aloud protocols like those above and (b)analysing the protocols to obtain a model of the cognitive processes that takeplace during problem solving or to test the validity of a model that is derivedfrom a psychological theory. Protocols are collected by instructing peopleto solve one or more problems while saying ‘what goes through their head’,stating directly what they think. In studies on cognition, verbal protocols areused as raw data about cognitive processes. Such protocols require substantialinterpretation and analysis to see their implications for process theories ofproblem-solving. The other main application of the think aloud method isto collect expert knowledge which may be used as the basis for a computersystem. This book introduces both types of application of protocol analysisfor those readers who are not familiar with either cognitive psychology orknowledge acquisition for computer systems. Before we continue with a moredetailed description of the think aloud method, we will elaborate on the rolethis method plays in psychology and in knowledge acquisition for knowledge-based computer systems.

1.2 Theories of cognitive processes in problem-solving

Problem-solving means answering a question for which one does not directlyhave an answer available. This can be because the answer cannot be directlyretrieved from memory but must be constructed from information that is avail-able in memory or that can be obtained from the environment (for example,the givens of the problem or extra information that can be requested). Anotherpossibility is that finding the answer involves exploring possible answers noneof which is immediately recognized as the solution to a problem. Problem-solving then means that new information must be inferred from givens andknowledge in memory to accept or reject possible answers. Most problem-solving involves a combination of these two types of reasoning: constructingsolutions and constructing justifications of these solutions.

Problem-solving is the cognitive process to which the think aloud method isapplied most frequently. It can also be applied to other processes that produceintermediate thoughts that can be verbalized. For example, learning processesin the context of problem-solving are studied by Anzai & Simon (1979). Inthis study the think aloud method is used to identify changes in the knowledgeduring repeated problem-solving on a single task (in addition to the way in

Thinking aloud 9

which the problem-solving task itself is performed).

The think aloud method can be used to investigate differences in problem-solving abilities between people, differences in difficulty between tasks, effectsof instruction and other factors that have an effect on problem-solving. Sometheories under investigation concern fairly detailed aspects of the processesinvolved in problem-solving. Their aim is to explain almost every step taken bythe problem solver. Other theories employ more general properties of problem-solving processes and use more global properties of people. For example,pupils’ performance on an arithmetic test potentially can be explained fromgeneral intelligence, from some particular type of intelligence, from the generalperformance level at school, from specific performance at arithmetic, or evenfrom the detailed knowledge that the student had at the time when she madethe test. The think aloud method is only relevant if properties of the solutionprocess are relevant to the theory. The think aloud method is a means tovalidate or construct theories of cognitive processes, in particular of problem-solving.

1.3 Building knowledge-based systems

Protocol analysis is not only used for the scientific study of cognitive processes,but also for the construction of knowledge-based systems. For those readerswho are unfamiliar with the area of knowledge-based systems (or ‘expert sys-tems’) we give a brief introduction. Knowledge-based computer systems arecomputer programs that perform tasks that normally are performed by hu-man experts. These systems are based on the knowledge of human expertsin a certain area on how to perform a specific task. Examples of such tasksare medical diagnosis, monitoring production processes, giving advice aboutlegal conflicts and giving advice about the stock market. If human expertsare available, their knowledge will be the basis of the system. This knowledgemust then be acquired from the expert and represented in the computer sys-tem. Just as in the case of psychological or educational research, an obviousway to obtain the knowledge is to ask experts how they perform a task, whichknowledge they use, etc. However, experience has shown that people are oftenunable to tell how they perform a task. Even worse, there is evidence thatthey may give false information. The think aloud method is a good way toavoid ‘false’ information and obtain direct data about the solution process thattakes place when an expert solves a problem.

The most important characteristics of knowledge-based systems are thefollowing:


Expert level performance: performance is comparable to the level reachedby humans who are specialized in this task.Narrow task domain: only a relatively small class of problems can be solvedby the system. Usually these systems lack the knowledge required to solve awide range of problems.Explanation: the system can explain or justify its outcomes by displayingits reasoning and the knowledge on which this is based. To provide acceptableexplanations, the system needs enough explicit knowledge about the task andtherefore it must be more than just a table of rules relating problems to an-swers. The knowledge that is used to generate these explanations is usuallybased on the way in which human experts find the solution to a problem. Thismakes it necessary to acquire this knowledge.

Experts are the main source of the knowledge that is used to build the pro-gram. They are not the only source, of course, since relevant knowledge canalso be found in textbooks, manuals and specialized literature. However, spe-cific expertise that is acquired by one or more human experts in many years ofexperience with the task at hand is an essential ingredient here. Expert sys-tems are often constructed for tasks for which expertise is rare and expensive.A goal of building a knowledge-based system can be to make expertise avail-able to many people in an organization, or to an even wider audience. Notethat the notion of ‘expert’ is relative here. Some ‘experts’ do by no means showperfect performance, but are only experts relative to most other people. Forexample, financial experts are not able to predict economic processes withoutmaking errors, nor are medical specialists able to diagnose diseases perfectly.They are just more knowledgeable than most other people.

The think aloud method is an element in the repertory of methods availableto the knowledge engineer who acquires the knowledge for the system and whodesigns and sometimes implements the system. Collecting knowledge froman expert is called knowledge elicitation. Other elicitation techniques are forexample interviewing, collecting problems that are presented to an expert andthe expert’s solution or systematically asking the expert to define importantconcepts in the domain.

Initially, knowledge-based systems were built in a very direct way: theknowledge engineer simply asked an expert how to find the solution to a prob-lem and then used the expert’s statements to program an initial version of thesystem. This method is called ‘rapid prototyping’, because it relies on quickconstruction of prototypes of the system. The prototype is used for the elici-tation of new knowledge: it is demonstrated to the expert, who then criticizes

Thinking aloud 11

and extends it, which leads to the second version of the system. This proce-dure is repeated until knowledge engineer and expert are more or less satisfiedwith the system.

This style of development is not adequate for the construction of larger sys-tems, because constructing a new prototype from a previous version becomesincreasingly difficult when the system becomes larger. Modern developmentmethodologies (in particular KADS, see Schreiber et al., 1993) emphasize theneed for a structured approach that begins with an analysis of the require-ments for the system, including the required user interaction and an analysisof the expertise from which the system is to be built. This is done withoutconstructing a program. The analysis is done on paper, in a special modellinglanguage.

For example, in the context of medical diagnosis it is important to find atask that can be performed by a computer and that fits in the organizationalstructure. The data for the task must be available or it must be possible tofeed them into the computer. It must be possible to separate the task for thefuture computer system from other tasks that are performed by people. Forexample, if the data that are needed for diagnosis must also be recorded ina medical file then it must be possible to avoid collecting these data twice.If only certain people are authorized to have access to patient data this hasconsequences for the system. If the data must be readily available duringtreatment of the patient and if it must be possible to update these data, thishas consequences for the system. All such issues are dealt with by designing anorganization model that defines the role of the future system in the organizationwith respect to other persons and departments in the organization.

Once a suitable task is identified, the knowledge required for performingthis task must be acquired. This is where think aloud protocols can be used.In particular knowledge about the problem-solving process can be acquired inthis way. Other sources and elicitation techniques are normally used to identifybasic concepts and knowledge in the domain but as in psychological research,the think aloud method is one of the few techniques that give direct data aboutthe reasoning process. This knowledge is represented in a structured formatand is used as the basis for designing and implementing the computer system.


1.4 Overview of this book

The content of the rest of this book is structured as follows:

1. Collecting think aloud protocolsChapter 2 describes techniques for studying problem-solving processes. Thisis used to sketch the main dimensions of these techniques and to characterizethe position of the think aloud method among these techniques. Chapter 3introduces the think aloud method and Chapter 4 the practical procedures forcollecting verbal protocols.2. Modelling cognitive processesThis issue is addressed in Chapter 5. Chapter 6 describes modelling languagesthat are often used for building models of problem-solving processes includingcomputational languages.3. Methodology for the analysis of think aloud protocolsThis is presented in Chapter 7.4. ExamplesChapter 8 gives some examples of the use of the think aloud method and thisbook ends with some exercises.

Literature

Hamel’s research on architectural design is reported in Dutch in Hamel (1990).There are many overviews of psychological research on problem-solving. Agood introduction is given by Anderson (1990). A good technically orientedintroduction to knowledge-based systems is given by Jackson (1990). Schreiberet al. (1993) gives an overview of the KADS methodology, with an emphasis onmodelling techniques. Overviews of elicitation techniques are given by Diaper(1989) and Meyer & Booker (1991). Recent developments are reported in thejournals Knowledge Acquisition and International Journal of Man-MachineStudies.

Chapter 2

Studying the content ofcognitive processes

2.1 Introduction

In this chapter we will describe several techniques for studying cognitive pro-cesses in problem-solving. We start with an introduction on these cognitiveprocesses. Before we turn to the think aloud method, we will describe sev-eral other methods: the observation of problem-solving behaviour, structuredtechniques (especially in knowledge acquisition) and different forms of verbalreports. These methods are compared with each other. Then we will showthe place of the think aloud method within the spectrum of other techniquesregarding its use in psychology and in knowledge elicitation.

2.2 Cognitive processes in problem-solving

In this book we will concentrate on the cognitive processes in problem-solving.People frequently engage in problem-solving activities, professionally as well asprivately. Problem-solving can be characterized as a cognitive process that isgoal directed and requires effort and concentration of attention. The solutionis not found directly in a single step but via intermediate reasoning steps, someof which may later appear useless or false. Some problems that people solveare well defined, for example mathematical equations, questions in a schoolphysics examination, medical diagnosis problems in a standard setting. Thereare also examples of problem-solving activities in which the problem itself andits potential solution are not so well defined and where it is not so easy to


evaluate the solutions in terms of correctness. Examples of these activitiesare: designing clothes, writing an article about the results of an experiment,selecting a new personnel member. Such activities require the solution of manysmaller problems. In the case of clothes design, one has to find out what kindsof material may be used given the amount needed and the price allowed. Inthe case of writing an article, one has to find solutions to problems like whento start a new paragraph, how to phrase an idea, when to insert an illustration.

Problem-solving not only occurs in a professional or educational undertak-ing. In everyday life one has to solve a lot of problems. What shall we eattonight, how much does a kilo of apples cost given the price of a pound, whatis the most efficient route to the office, how to discuss marital problems withone’s husband or wife? Sometimes you are well aware of the fact that youare trying to solve a problem, for example when you are trying to figure outhow much something costs in a foreign currency. One is consciously tryingto remember the exchange rate and performing the mathematical operationsrequired. Sometimes problem-solving goes on without noticing, like when onedecides which flowers to buy. You may not perceive your mental process asproblem-solving. Still you are solving the problem of finding the flowers thatharmonize with the colours of your furniture, that are in line with your part-ner’s taste, that are not too high-priced, that will last for some time.

In all these examples the topic is the problem-solving process rather thanthe outcome of this process. One may want to study these problem-solvingprocesses for many different reasons. Psychologists may be interested in themechanisms underlying human reasoning, the causes of errors, the characterand origin of differences in performance between people. Educational scien-tists may be interested in the effect of education or the difficulties that pupilsexperience when solving exercise problems. A knowledge engineer may wantto understand how a person carries out a task, to be able to build a computersystem that can do the same. As we shall show below, the aim you may haveas researcher partly determines the nature of the procedure you follow, whenusing protocol analysis. In this chapter we discuss methods for collecting dataabout the content of cognitive processes, emphasizing their commonalities anddifferences. There are several research methods for gaining insight in aspectsrelated to thinking and learning. We will illustrate the most important meth-ods with the example of problem-solving processes that take place when aperson solves a physics problem. The text of this problem is:

A container is closed by a piston. holds an ideal gas. The volume of thegas is 2 litres and the pressure is 120 kilopascals. By slowly moving the pistonoutwards, the volume is increased to 3 litres, while the temperature of the gas

Studying the content of cognitive processes 15

is kept constant. What is the pressure of the gas after the piston is moved?

This problem can be solved by making the assumption that ‘slowly movingthe piston outwards’ means that energy is exchanged between the gas in thecontainer and the environment. In that case the following relation holds be-tween volume (V ), pressure (P ) and temperature (T ) before (1) and after (2)moving the piston:

P1×V1

T1= P2×V2

T2

This relation is called Boyle’s Law. In this case, the temperature is constant,which reduces this formula to:

P1 × V1 = P2 × V2 or: P2 = P1×V1

V2

Substituting the data from the problem gives: P2 = 120×23

which gives the answer: 80 kilopascals.

A psychologist or an educational scientist will be interested in, for exam-ple, differences between beginners and experts on this problem, in the causeof errors or in different strategies that people use to solve this problem. Forexample, novices tend to solve this problem ‘backward’. They try to find aformula that has the pressure in it, try to ‘fill’ this with the other givens andsee if this can solve the problem. A possible cause of errors for novice problemsolvers lies in not checking if a formula applies to the situation described inthe problem. Experts on the other hand tend to follow a forward reasoningstrategy in the style of the explanation given above. Testing and exploringsuch hypotheses requires data on the problem solving process.

2.3 Observation methods

One class of methods is based on observation of problem-solving behaviour.The first, product analysis, uses the result of problem-solving. The solution toa problem may reflect aspects of the problem-solving behaviour. In the exam-ple above, the answer ‘800 kilopascals’ would suggest an error in calculation. Ifnotes are made during problem-solving, these provide additional information.However, these data do not tell us if the condition for applying the formula(‘closed system’) was checked or if the problem was solved ‘backward’ in novicestyle or ‘forward’ in expert style. More information about the problem-solving


process can be obtained by observing the problem solving behaviour concur-rently while it takes place. This is called observation of behaviour.

A researcher may observe the student’s problem-solving behaviour. Doesshe give a solution immediately, does she make notes, does she hesitate often,what kind of literature does she use, which extra questions does she ask, etc?In the physics example the notes and the order in which they are taken canbe observed and reconstructed. This will give more information than justobserving the answer or the notes. However, very much information will notappear in the notes. Some tasks involve the manipulation of objects, forinstance, an architect moving objects in a model. In the case of our student,she may first make a drawing of the situation and then look in a physicsbook for a physics law, or she may proceed the other way around. She maysolve the problem in a short time span, or interrupt the process and work atthe problem at some later time. She may seem to work cheerfully or sounddesperate. Besides simple observation, special equipment may be used toobserve properties that are not directly visible or audible. For example eyemovements can be registered during problem-solving or even activity in variousparts of the brain may be measured by special measurement techniques. Theselatter kind of measurements may provide data on what information is beingfocused on and processed at a certain moment.

Behavioural observations are registered as action protocols. If a person ma-nipulates objects during problem-solving (for example when using a computeror another device) a behaviour trace can be stored. This is one of the fewtechniques that give access to data about the problem-solving process. Theanalysis of action protocols is very similar to that of think aloud protocols.

2.4 Structured techniques

Observation is an unstructured technique in the sense that it does not con-strain the subjects’ behaviour. The range of possible observations is thereforevery broad. Another class of techniques that is used both in psychology andin knowledge acquisition uses predefined forms in which the subject shouldexpress his knowledge. This can be done in many ways, depending on thetask for the subject and the purpose of the study. There is an infinite varietyof possible formats. A possible format are questions with predefined answersfrom which one or more is selected. For example, we may ask a subject tosolve the physics problem and ask Did you use Boyle’s Law? Yes/No or Didyou check if the formula that you used to compute the answer was applicable?Yes/No or Did you solve the problem by reasoning backward from the question


or forward from the givens of the problem? Forward/Backward. Alternatively,we can ask questions in a general form such as Do you normally check if a for-mula that you use to compute the answer is applicable before using it? Yes/Noor Do you normally solve problems by reasoning backward from the questionor forward from the givens of the problem? Forward/Backward.

Note that some of these questions are difficult to answer and are likely toproduce false answers. To obtain reliable results, the format has to correspondas closely as possible to what is known about the original cognitive process andthe information that it involves, to avoid interpretation and distortion by thesubject. Another factor that affects the quality of reports is the time intervalbetween the process and the report. The questions in general form always askabout processes that took place over a longer period of time which will reducethe quality of the answers.

We give another example of a method using structured data that is takenfrom knowledge acquisition for expert systems: the Ariadna knowledge ac-quisition system (Morgoev, 1989). Ariadna is a computer-based elicitationsystem that collects data about a classification process. The problem-solvingtask is to classify an object by asking for information about the object. Thiscan be, for example, a patient who must be diagnosed or an applicant who isto be classified as suitable or not for a job. Ariadna asks predefined questionsbut it allows the user some freedom in expressing the answers. Ariadna usesinformation provided by the user to generate new questions and it keeps trackof questions that it should ask later. We give a sample dialogue between Ari-adna and a medical expert. Ariadna plays two roles in the dialogue: that ofan imaginary patient and of a dialogue director.

Ariadna: What would be your first question?

Expert: Where is the pain?Ariadna: What are the possible answers to this question?

Expert: Chest, left arm, right arm, head, left leg, right leg, stomach.Ariadna: Suppose that the answer is ‘chest’.

Which possible classifications are now still possible?

Expert: Heart infarction, angina pectoris, functional complaints,lung embolism, kidney dysfunction, arrhythmia.

Ariadna: What would be your next question?

Now Ariadna is back at its first step. This process continues until there are nomore questions left. Ariadna assumes that the experts problem-solving con-sists of the following reasoning steps:


1. Ask for information (in this case symptoms and properties of a patient).2. Use this information to find a set of possible classifications.

Initially a large set of possible classifications is generated, but gradually thisset will become smaller when new information becomes available that excludescertain classifications.

When the user gives the possible answers to a question, Ariadna selectsone (or the user can select one) but it also remembers that later on it shouldask about the other possible answers. After some more questions the user willindicate that only one more class is still possible (or that there are no moreuseful questions). Now Ariadna will go back to an earlier point in the dialogue.It can return to the question above and ask the expert:

Ariadna: Suppose that the answer to your question, ‘where is thepain?’ was ‘left arm’. Which possible classifications

are now still possible?

This process continues until the user stops it or until all possible answer pat-terns are discussed. In this way Ariadna collects a problem-solving trace thatconsists of the following elements:

• Information requests with their possible answers.• Sets of possible classifications after each new piece of information.• Possible classes after obtaining a series of answers.

Ariadna gives a structured solution trace and is able to actively prompt fornew information in terms of this structure. The advantage over less struc-tured techniques is that the information is well structured and that the activeprompts make it likely that this information is reasonably complete. A disad-vantage is that the information may not be complete in other aspects than thepossible classifications. For example, the subject may reason about the possi-bilities and no trace is found of this (although the Ariadna system gives thesubject the option to make notes during the session). Note that this particularmethod can only be used if the problem-solving task is of the ‘classification di-alogue’ type as above. Ariadna could not be used to obtain data about physicsproblem-solving because the reasoning process in physics problem-solving hasa completely different structure than the elimination of candidate diagnosesthat is assumed by Ariadna. On the other hand, Ariadna does not require astrict format for the knowledge that is entered into the system.


In general the differences between structured and unstructured techniques are:

1. The structure makes it possible to generate directed requests for new infor-mation.2. Structured information is ‘closer’ to a computer program, thus making iteasier to build a program from it.3. Structured techniques require the user to structure the information. Thisrequires a kind of translation by the subject which distorts the informationwith respect to the cognitive process.

2.5 Verbal reports

2.5.1 The verbal reporting process

A final class of methods involves unstructured verbal reports of problem-solving.There are different ways in which verbal reports can be obtained. To appreciatethe differences between these methods, a simple model of the verbal reportingprocess is useful. The model in Figure 2.1 is based on a simple model of thehuman cognitive system.

Long-term memory (LTM) is the part where knowledge is stored more or lesspermanently. It takes some time to store information there and it can beretrieved later on to be used again. At the other end we find the sensory sys-tem that transforms information from the environment into an internal form.Working memory (WM) is the part where the currently ‘active’ informationresides. In this model there are five processes:

1. Perception: Information flows from the sensory buffer into working mem-ory.2. Retrieval: Information is retrieved from long-term memory into workingmemory. It still exists in long-term memory but is activated into workingmemory.3. Construction: New information is constructed from other information inworking memory. For example, when solving the physics problem, someonemay note that ‘slowly moved piston’ may in general refer to ‘adiabatic process’and the resulting new association between these concepts is stored as a newobject in working memory.4. Storage: Stores information from working memory into long-term mem-


Sensorybuffer

Workingmemory

Long-termmemory

protocols

FIGURE 2.1: Memory model


ory.5. Verbalization: Information that is active in working memory is put intowords. The output of this process is the spoken protocol.

Note that this model has several important implications for the meaning ofverbal reports. One important point is that the information that can be ver-balized is the content of working memory. This means that the content oflong-term memory (the general knowledge) cannot be verbalized (unless it issomehow retrieved rather than used), nor can the cognitive architecture, themachinery, that applies the knowledge be verbalized. About these aspects onlyindirect knowledge is available.

2.5.2 Retrospection

In the case of retrospection the subject solves a problem and is questionedafterwards about the thought processes during the solving of a problem. So inthe example of the student she may be asked: ‘How did you solve the physicsproblem?’ The questions may also be more detailed, like ‘Which physical lawdid you use?’, or ‘When did you think that the problem was hard to solve?’ Itis also possible to record the problem-solving session on video and to reviewthe video-tape together with the subject so she can give her interpretation ofwhat happened when she looked into the textbook or made a note.

Retrospection may be difficult. It is not always easy to remember exactlywhat one did, especially if some time has passed after completion of the task.Sometimes even, one is not very aware of what one is doing. The physicsstudent may have no conscious idea of how she calculated the filled-in formula.Another problem is that subjects may tend to present their thought processesas more coherent and intelligent than they originally were. Sometimes thisis intentional, because people do not like to admit that they do not have thefaintest idea of how to solve a problem considered as easy. So they will say:‘Well, I first stared at the problem statement and asked myself a few questions,like have I seen this problem before, do I know a procedure to solve thisproblem?’ This may give the false impression of perfectly rational behaviour.In other cases, this kind of post hoc rationalizing happens unintentionally. Ifyour behaviour was rather irrational to begin with, you may not remember itlike that. Humans are just inclined to reconstruct events as more structuredthan they were originally. Their memory is guided by their knowledge of theresult.

Psychologists have shown in many experiments that the data that are ob-tained by retrospection are not always valid (Nisbett & Wilson, 1979; Ericsson


& Simon, 1993). People may not report thoughts that they have clearly hadbefore and they will also report false memories: thoughts that they cannothave had at the time. Close examination of the conditions under which re-ports are unreliable has shown that all discrepancies were found in situationsin which there was either a delay in time between the cognitive process andthe report, or there was a question by the experimenter that required an in-terpretation rather than a direct report, (‘Why did you do X instead of Y?’),or both. When asked for memories, explanations or motivations, people an-swer a question not from direct memory of the cognitive process but from aninterpretation that can easily be influenced by expectations.

A classic example is an old study by Maier (1931). Subjects were broughtinto a room in which various objects were lying around and two ropes hangfrom the ceiling. Their task was to tie the ropes together. Because of thelength of the ropes and the distance between them it was not possible to justtake one, walk to the other and tie them together. After explaining the task,the experimenter gave two hints: he ‘accidentally’ touched one rope whichmade it swing back and forth a bit and, if a subject still failed to solve itafter some time, the experimenter gave him a pair of scissors and told him:‘With the aid of this and no other object you can solve the problem’. In thiscase subjects found the solution (tie an object to the rope, swing it, walk tothe other, pull it over and grab the swinging rope) much quicker than whenthe experimenter had not given a hint. Afterwards subjects were asked if theysolved the problem ‘as a whole’, without intermediate steps leading to solutionor that they gradually solved it in steps and they were also asked if they hadnoticed that the experimenter had touched the rope and if that had had anyinfluence on their reasoning. Maier found that subjects who discovered thesolution ‘as a whole’ reported that they had not noticed the cue or that theyhad not used it. The explanation for this is that from the subjects’ perspectiveit must be highly unlikely that this would have had any effect, considering thetime that they spent solving the problem (between several minutes and halfan hour) and the sudden appearance of the solution. Yet, from the differencein solution time, one can conclude that the cue did have an effect.

The memory model explains why this is the case. Retrospection means thatinformation must be retrieved from long-term memory and then verbalized.The disadvantage is that the retrieval process may not produce all informationthat actually appeared in working memory during the problem-solving process.What is worse, it is possible that information that was not actually in workingmemory is retrieved as if it was. After solving the physics problem, the solutionwill help to remember the steps that actually led to it. These are then easilyreconstructed. However, odd and fruitless steps that occurred on the way are


less likely to be retrieved.

2.5.3 Introspection

An alternative to retrospection is to instruct the subject to report not aftercompleting the problem-solving task, but at intermediate points chosen bythe subject. This is called introspection. In classic introspection, as used bypsychologists in the 1920s and 1930s, the subject is also encouraged to givean accurate, complete and coherent report on a cognitive process. This mayinvolve interpretation on the part of the subject, and the use of psychologicalterminology. As we shall see this technique is somewhere between retrospec-tion and thinking aloud. The main difference with the think aloud methodis that the latter requires concurrent verbalization and discourages interpre-tation on the part of the subject. As a result, introspective reports are more‘readable’ than concurrent protocols but also more subject to memory errorsand misinterpretations.

2.5.4 Questions and prompting

Another method implies actually interrupting the problem-solving process.The experimenter may ask questions during the problem-solving process or thesubject may be prompted at given intervals to tell what he is thinking or doing.Examples are: ‘Did you check if the formula applies here?’, ‘Why are you usingthis formula?’, ‘Does this problem remind you of another one?’ or just ‘Whatare you thinking of?’, the latter being the most neutral prompt. By thismethod it is possible to explore specific aspects, selected by the experimenter,of the knowledge state of the subject at a given moment. The subject doesnot have the chance to ‘smooth over’ the answer as he may in the case ofretrospection or to skip over it. The disadvantage of this method is thatthe problem-solving process is interrupted. The subject may have difficultyin taking up the thread. Prompts that require interpretation will affect theproblem-solving process (see for example Chi et al., 1989; Ferguson-Hessler &de Jong, 1990; van Someren & Elshout, 1985). These prompts can be verysimilar to those proposed by educational scientists to improve problem solvingperformance and learning. It has been shown that asking questions about atext helps people to understand and remember the meaning.

In terms of the model of verbalization, prompting introduces additionalcues in working memory that may lead to retrieval of spurious informationfrom long-term memory and that may push current information out of workingmemory, disrupting the process. The advantage is that if the goal of the


prompt is in working memory, it is more likely to be recorded than in anyother method.

2.5.5 Dialogue observation

Some problem solving tasks naturally involve dialogues. Dialogues can berecorded on audio or video and the protocols can be used as verbal data aboutthe process. These data are clearly different from individual verbal reports.However, dialogues have the advantage that they can be recorded under morenatural circumstances than a think aloud session. We can distinguish naturaldialogues from induced dialogues. A task can be adapted to induce dialogues.For example one can ask a subject to explain to someone else how to do thetask. Our student might be asked to teach a junior student how to solvea kind of problem. This kind of instruction seems to work quite well with(young) children to whom the think aloud instruction is not very clear. An-other possibility to enhance talkativeness is to get people to collaborate on atask. The task may be adjusted so that there is a real need for cooperation.For example, the task information may be divided between the collaborators(Barnard & Erkens, 1989). Such division really requires explicit exchange ofinformation. Or they are asked to communicate solely by means of a connec-tion between computer terminals (Sandberg et al., 1988). In that case, theparticipants cannot support their utterances by gestures or facial expression,so they are obliged to be much more explicit in their statements. In these situ-ations, the experimenter is enabled to study the interaction and is able to inferwhich information is used at what time. When people collaborate they willsometimes have differing opinions. Thus they are forced to give arguments,to clarify steps of their thinking processes. For example: ‘I used this formulabecause I wanted first to reduce the number of variables’. In this method theproblem-solving process is not interrupted as disturbingly as is the case withthe questions and prompting method. The obvious disadvantage is that notall tasks involve dialogues (and changing the task may change the cognitiveprocess) and that people will not verbalize all their thoughts in a dialoguesituation.

2.6 Differences between methods

The methods described above differ on several dimensions with respect to va-lidity (the report does not reflect the cognitive process) and completeness ofthe reports that they produce:


Invalid data due to disturbance of the cognitive process: The retro-spection method will cause the least disturbance, the subject is not disturbedduring the problem-solving itself. The only difficulty is that the solving pro-cess may change as a consequence of knowing that one is going to be askedquestions afterwards. Questioning and prompting will give most disturbance.In the case of dialogues, the subject can partly do his or her own timing, but heor she may also be disturbed by the other person asking questions or makingremarks. The disadvantage caused by disturbance in these cases lies not somuch in the hampering of concentration, but in the possibility that thoughtprocesses take directions different from those they would have taken had thesubject been left on his or her own.Invalid and incomplete data due to memory errors: The data gath-ered, were they post hoc constructions or were they gathered on-line duringthe thought process? Prompting gives the most direct data. Retrospection ofcourse only gives data after a time lapse. In the case of dialogues, people willsometimes directly give their arguments and questions, but sometimes theywill wait a while. Memory errors can produce both incomplete and false re-ports.Invalid and incomplete data due to interpretation by the subject: Ifthe subject is asked to interpret his cognitive process or if he is required to doso by a structured technique that does not fit the content of the process this isa source of distortion and invalidity of the data with respect to the cognitiveprocess. In the case of retrospection it is the subject who, for a large part,gives the initial interpretation of his or her behaviour.

The table below summarizes the techniques with respect to the method:

Disturbance Memory errors InterpretationRetrospection: no yes yesIntrospection: no little yesPrompting: yes no littleDialogues: not applicable no noStructuredtechniques: yes (?) yes (?) yes (?)The question marks means that this is generally so.

In the case of structured techniques the risk of distorted data depends ondetails of the structuring and timing of the questions.

When one wants to study the thought processes themselves, and would like


to know what is going on in a subject’s mind from moment to moment, thereis another method for obtaining verbal reports: asking the subject to thinkaloud during problem-solving.

2.7 Think aloud protocols

Thinking aloud is the method we will discuss in this book, so later on thismethod will be described extensively. To summarize: the subject is asked totalk aloud, while solving a problem and this request is repeated if necessaryduring the problem-solving process thus encouraging the subject to tell whathe or she is thinking.

Thinking aloud during problem-solving means that the subject keeps ontalking, speaks out loud whatever thoughts come to mind, while perform-ing the task at hand. Unlike the other techniques for gathering verbal data,there are no interruptions or suggestive prompts or questions as the subjectis encouraged to give a concurrent account of his thoughts and to avoid in-terpretation or explanation of what he is doing, he just has to concentrate onthe task. This seems harder than it is. For most people speaking out loudtheir thoughts becomes a routine in a few minutes. Because almost all of thesubject’s conscious effort is aimed at solving the problem, there is no roomleft for reflecting on what he or she is doing. As discussed by Ericsson &Simon (1993), in general, talking out loud does not interfere with the taskperformance.

Thinking aloud is a method which, in principle, does not lead to muchdisturbance of the thought process. The subject solves a problem while thetalking is executed almost automatically. The data so gathered are very direct,there is no delay. The subject does not give an interpretation of his or herthoughts nor is he or she required to bring them into a predefined form asin structured techniques. He or she renders them just as they come to mind.Think aloud protocols are not necessarily complete because a subject mayverbalize only part of his thoughts. Compared with structured elicitationtechniques, the think aloud method makes it easy for the subjects, becausethey are allowed to use their own language. Structuring the information is thetask of the person who will analyse the protocols (see Chapter 7).

2.8 Combining methods

It is often possible to combine methods. The general principle is that onemethod is used to collect data that can be used to focus or facilitate applica-


tion of the next method. We give a few examples:

Action protocol - retrospection: For some tasks retrospection can besupported by records of observations or intermediate products. For example,notes taken during problem-solving or a video of the problem-solving processcan support the retrospection. This gives better results than just retrospectionbecause the action protocol helps the subject to remember thoughts that hewould not reconstruct otherwise.Introspection - prompting: Introspection is used to identify critical eventsin a cognitive process, and in a second stage, subjects are prompted specificallyabout these events. This helps to overcome the effect of memory and interpre-tation by the subject and keeps the possibility to obtain more complete dataabout the process than would be possible by introspection alone.Thinking aloud - retrospection: The think aloud protocols or behaviouralobservations during a session are used to obtain a retrospective protocol onpauses in the think aloud session or on fragments of the think aloud sessionthat sounded incomprehensible, very incomplete or very odd. If possible thisshould be done directly after the think aloud session. If this is not feasible,one may show the written protocol to the subject and discuss the pauses, in-comprehensible parts, etc. in a later stage.Introspection - thinking aloud: Introspection is used to obtain a tentativemodel of the structure of the cognitive process. The think aloud method isthen used to validate this model with more direct data. The result of intro-spection is used to help in the analysis of the protocols.


Literature

Maier’s experiment is reported in Maier, 1931. The Ariadna system is de-scribed by Morgoev (1989). The extension that we sketched is inspired by theASK system (Gruber, 1989). This article also contains an excellent discussionof the issues involved in building elicitation systems. Another discussion ofsuch system with examples can be found in the collection by Marcus (1988).In Nisbett & Wilson, 1979, a number of studies are reviewed in which verbalreports could be shown to give false results. Ericsson & Simon (1993) showedthat all these can be reduced to the factors that we discussed above: time de-lay and the effect of directed prompting. Their book contains a comprehensivereview of the literature on the issues discussed in this book.

Chapter 3

The think aloud method

3.1 Introduction

In this chapter we discuss the history of the think aloud method and the condi-tions under which the method can be effectively used. Some of these conditionsplay an important role in knowledge acquisition and will be discussed sepa-rately. This chapter ends with an overview of the analysis process which willbe discussed in the following chapters.

3.2 History of the think aloud method

The think aloud method has its roots in psychological research. It was de-veloped from the older introspection method. Introspection is based on theidea that one can observe events that take place in consciousness, more orless as one can observe events in the outside world. Some early psychologists,for example Titchener (1929), went as far as to claim that the events in con-sciousness were the actual object of psychology in contrast to the outside worldwhich is the object of the natural sciences. In this view, psychologists studythe type of events that take place in human consciousness and their causalstructure just as other scientists study the events that occur in the outsideworld. The analogy between introspection and observation was taken quitefar. A methodological principle was for example, that only well trained ob-servers (experienced psychologists) were to be used as observers because theywould be able to interpret the events in consciousness in the right way, just asa biologist who observes animal behaviour may notice things that an ordinaryobserver will miss.


Introspection has led to some successful research but there were also funda-mental theoretical and methodological problems attached to it. The theoreti-cal problems concern the model of introspection as perception of the contentsof consciousness. This model makes a separation between the processes inconsciousness and the introspection process itself, thereby suggesting that thelatter is not accessible in consciousness. If that were true, then how is the in-trospection process accessed by the observer? On the other hand, if both areconsidered to be accessible in consciousness, a ‘homunculus’ problem is raised:is the introspection process itself subject to introspection? These questionscould not be answered satisfactorily within the framework of introspection asperception of consciousness. As we discussed in Chapter 2, the solution thatunderlies the think aloud method is to assume a simpler process (verbalizationinstead of observation and interpretation) and to assume that only the contentsof working memory are verbalized instead of the entire cognitive process.

A methodological problem with more severe practical consequences is thatin the introspection view the research data are the events that take place inconsciousness. These are to be analysed and explained. However, these dataare fundamentally accessible only to a single observer, who also performs thethought process. This makes it impossible to replicate empirical studies andthereby to settle scientific discussions about thought processes. These discus-sions and the built-in limitation of the introspection method made psycholo-gists turn away from the introspectionist method and associated theories. Be-cause introspection was a central method in studying cognitive processes, thisalso meant that psychological research turned away from cognitive processes.This contributed to the rise of behaviourism in the 1930s. Behaviourism tookthe other extreme view. It banned all theorizing about processes that cannotbe observed from the outside of the body, as speculation, with the exceptionof physiological processes. (Physiology was considered a related field.) Thehistory of the introspection method in psychology has made psychologists sus-picious of methods that resemble introspection. Note that we know now thatthis suspicion is not justified with respect to the think aloud method for tworeasons:

• The think aloud method avoids interpretation by the subject and only as-sumes a very simple verbalization process.• The think aloud method treats the verbal protocols, that are accessible toanyone, as data thus creating an objective method.

There has been a ‘thin’ line of research even in the 1930s and 1940s thatcontinued to experiment with variations of introspective methods. The main

The think aloud method 31

methodological advancement with respect to the introspective method was totreat the (verbal) reports as data, instead of the processes in consciousness.The advantage is that these data are open to inspection and interpretationfor anyone. The theoretical model of the production of the verbal reportsbecame less important, now that a working method became available. In-teresting results were obtained with the think aloud method by for exampleDuncker (1945) and de Groot (1946 and 1965). Duncker analysed problem-solving processes in terms of memory search. He explained the sequence ofpossible solutions that people explore from an informal model of retrieval ofrelevant partial solutions from memory. These solutions were then accepted,modified or rejected by applying them to the problem and evaluating theirimplications. Without verbal data about the process this is clearly hard toinvestigate. De Groot was able to describe problem-solving by expert chessplayers as progressive refinement of a plan, using a large set of specific conceptsand principles.

By the end of the 1960s the interest in internal cognitive processes grewvery fast and thereby the interest in methods that can provide data aboutthese processes. A major result was the work by Newell & Simon (1972), whoused think aloud protocols in combination with computer models of problem-solving processes to build very detailed models. Using this methodology Newelland Simon were able to explain protocol data from a theory of human memoryand assumptions about the knowledge that subjects could bring to bear on atask. This work had a major influence, because it showed that very detailedexplanations of verbal data can be obtained. Although many psychologistswere skeptical, the method gained more and more acceptance especially inthe period from 1980 on, when computer simulations of cognitive processesbecame increasingly popular.

In the 1980s computer scientists began to develop expert systems. Usingtechniques from Artificial Intelligence, they demonstrated that it was possibleto build programs that performed at an expert level of performance. Initially,however, there was no systematic way to obtain the knowledge from humanexperts. Knowledge engineers used more or less structured interviews withexperts to obtain the initial version of a knowledge-base and in a later stageemployed them as an oracle to repair errors in the program: a false solutionthat the program had found was shown to the expert, along with the trace ofthe solution process and the expert was asked to indicate where this had gonewrong and how the knowledge should be modified.

This method amounts to a combination of introspection and a structuredform of prompting that suffers from the strengths and weaknesses of both:introspection is likely to miss many special tricks, heuristics, shortcuts and


special case solutions and structured prompting is suggestive and imposes aparticular format on the knowledge. In the worst case, this may focus theexpert on an aspect of the problem that is actually not relevant and it maydistort the knowledge (if the format is inadequate). Some of the expertise isbuilt up by experience and cannot easily be articulated and explained to anoutsider. This led some researchers to use the think aloud method to elicitexpert knowledge. If the expert can apply his knowledge in problem-solvingthen this becomes visible in the protocols. In this way it became possibleto obtain knowledge of which the expert was not aware and the free (verbal)format avoids distortions and misrepresentations.

Currently the think aloud method is accepted as a useful method by alarge part of the scientific community in psychology and it also has its placein the repertoire of many knowledge engineers. In section 2.6 we discussedfactors that threaten the validity and completeness of verbal data. These fac-tors were: invalidity due to disturbance of the cognitive process, invalidity andincompleteness due to memory errors and invalidity and incompleteness due tointerpretation by the subject. What is the position of the think aloud methodwith regard to these factors:

Invalidity due to disturbance of the cognitive process: Does the ad-ditional task of thinking aloud change the cognitive process? Will a differentprocess take place than without thinking aloud? Consider the following exam-ple. A psychologist is interested in the way in which people operate a powerplant. He instructs the operators to think aloud while they monitor and oper-ate the control panel of the plant. However, at some point an operator seemsto forget her instruction to think aloud and quickly takes a series of actions.If the think aloud instruction is repeated and emphasized then the operatorbecomes a bit irritated and begins to act differently. It seems that she nowperforms the task differently. Finally emotional and motivational factors canresult in a cognitive process that is different from the process that would takeplace during task performance without thinking aloud. There is not muchevidence that thinking aloud adds much to the effect of being studied andevaluated that is inevitable in knowledge acquisition and experimental set-tings. In the next chapter we discuss how to minimize these effects.Invalidity and incompleteness due to memory errors: Errors due toincomplete or false recall are essentially absent in case of the think aloudmethod. In any case they are not comparable with the errors that are causedby reconstruction processes in memory.Invalidity due to interpretation by the subject: In psychological ex-periments no evidence was found that think aloud protocols are inaccurate in


the sense that people give incorrect information about the cognitive processconcerned (other than occasional errors like those normally found in spokenlanguage). This does not mean that protocols are easy to understand. As wealready saw in Chapter 1, interpreting protocols can actually be quite difficult.As with other verbal reporting techniques, the form of the information and theverbal ability of the subject determine the quality of the reports. Sewing on abutton or selecting spices for a sauce would not be easy to verbalize for mostof us.

Although the think aloud method does not suffer from the threats to com-pleteness and validity that play a role in the other techniques, it introducestwo new threats to the validity of reports:

Incompleteness due to synchronization problems: Thinking aloud takesplace concurrently with the cognitive process. A cognitive process takes longerwhen the think aloud method is used. This means that people are able to slowdown the normal process to synchronize it with verbalization. However, sub-jects frequently report that sometimes verbalization does not keep up with thecognitive process and that their report is incomplete . This is consistent withthe observation that occasionally protocols contain ‘holes’ of which it is almostnecessary to assume that an intermediate thought occurred here.Invalidity due to problems with working memory : If reasoning takesplace in verbal form then verbalizing the contents of working memory is easyand uses no capacity of working memory. However, if the information is non-verbal and complicated then verbalization will not only cost time but alsospace in working memory because it becomes a cognitive process by itself.This will cause the report of the original process to be incomplete and it cansometimes even disrupt this process.

A related effect occurs when verbalizing the information in working memoryis difficult and uses some of the capacity of working memory. For example,if the process operator wants to verbalize one of the items that quickly pass,she must keep it in memory while finding a suitable description of it. Thismemory space cannot be used for other information. This is a problem ifshe is used to quickly check several instruments and then consider the results.If she tries to follow her routine, then the capacity of working memory maybe insufficient. Problems with working memory and synchronization can berecognized by complaints by the subject and interrupted verbalizations.

If you find that the think aloud method does not work well possibilities areto change the method, the task or the subjects. We have already discussed


the range of applicable methods. Below we discuss the selection of subjectsand tasks.

3.3 Selecting subjects

3.3.1 Criteria for selecting subjects

Both subjects and tasks must thus be selected that the effect of possible dis-ruptive effects of thinking aloud is minimized. The cognitive process in whichwe are interested should occur when the task is presented to the subject, dis-ruption of the process by thinking aloud should be minimized and so shouldsynchronization problems and working memory overload. Both in scientificresearch and in knowledge acquisition one does not always have a choice. Re-search may be directed at a particular kind of persons and we need a randomsample of those because the results must be generalized over all persons of thiskind. In knowledge acquisition it is often difficult to get access to an expertand one often cannot choose. Two important properties of subjects with re-gard to the applicability of the think aloud method are the degree of expertiseand verbalization skills.

3.3.2 Experts as subjects

If the think aloud method is used for the elicitation of expert knowledge severalproblems are likely to occur. Expert knowledge is often partially ‘compiled’ inthe sense that experts are able to perform a task very well, but that they cannotexplain how they found the right answer (‘I just saw that it had to be this’).The think aloud method makes some of this knowledge visible. On severaloccasions we observed that experts were able to make their knowledge explicitin a discussion afterwards about their think aloud protocols and our analysisof the protocols. However, as with regular subjects, experts that perform atask as a routine and very fast, are unable to verbalize their thoughts duringthis performance.

One has to take into account social and motivational aspects in psycho-logical research with human experts. These factors may induce a person tobehave more rationally in a psychological laboratory than in a more naturalsetting. Being supervised by a psychologist may influence the reasoning pro-cesses in experts when they think aloud. Experts can be secretive about theirexpertise and they may be reluctant to give someone else insight in their ac-tual problem-solving behaviour. Most experts are well aware that they cannot


easily justify their answers (acting by their routine), but they may not wanttheir audience to know this. Because of this, they may adopt a more rationalreasoning style for the occasion of the knowledge acquisition session.

If you study expert behaviour with the goal of building a knowledge-basedsystem, one could argue that the expert’s rationalized problem-solving be-haviour is not a problem, but a potential positive feature of a technique. In-trospection or retrospection seem appropriate here because they yield morerational knowledge than the ad hoc reasoning that appears in practice. Thetarget knowledge-based computer system should also be an optimal systemrather than an imitation of the expert, including his weaknesses. However,in practice a ‘rationalizing’ expert may produce rationalizations that haveno relation with the actual expertise and that are a much weaker basis fora knowledge-based system than the observed problem-solving behaviour. Inparticular, a rationalizing expert is likely to hide the unclear, poorly under-stood areas in his expertise, either in an effort to help the knowledge engineerby keeping things simple and avoiding the messy details, or by helping theknowledge engineer to find justifications for the answers. She may for exam-ple refuse to solve a problem because it has no justifiable solution even whenshe has a strong feeling about the best solution. It is better to use the thinkaloud sessions to acquire the expertise ‘in action’ and to find interpretations,explanations and generalizations quietly and systematically during analysis ofthe protocols. An expert should therefore be instructed that it is more im-portant that the protocol is natural and direct than that it is comprehensibleto the person taking the protocol (the knowledge engineer). Incomprehensibleparts can always be cleared up afterwards, but missing thoughts and knowl-edge cannot be recovered. This means that one should take care to ensure thecooperativeness of the expert for the think aloud sessions. One way to achievethis, is by pointing out that this method may reveal patterns in his or herreasoning that are novel and by making clear that the expert will be involvedin the analysis of the protocols.

3.3.3 Differences in verbalization skills

There are substantial differences in the ease with which people verbalize theirthoughts. As we shall discuss in Chapter 4, a little training will help peopleto become more fluent, but differences remain, even after training. As a resultsome protocols will be more complete than others. If it is reasonable to as-sume that this skill is not associated with the cognitive process involved thenone would of course prefer to select subjects with good verbalization skills.However, for most skills this is not known. In our experience, the quality of


verbalizations is not strongly associated with other properties that can eas-ily be observed or measured. One possible exception is age. Young childrenusually find it difficult to think aloud. It is not clear if this is due to theirverbalization skills, to the content of their thought processes or to the gen-eral difficulty of concentrating on a problem-solving task. Here too, the onlypractical approach is to try out the think aloud procedure in a pilot session.

3.4 Selecting problems

The considerations that we listed for selecting subjects also apply to the selec-tion of problems for use in think aloud sessions. We already mentioned thatcertain tasks are less suited because they involve non-verbal information orbecause speed is inherent in the task. Other factors may also interfere withthe think aloud task. For example tasks that involve verbal communication(for example, air traffic control, psychotherapy) are in their original form notsuited for the think aloud method. Also within a task area that is suitable(for example, solving physics problems or architectural design) it is not easyto select a task which gives good data. Important considerations are:

(a) Is the task at a level of difficulty that is appropriate for the subjects withrespect to the cognitive process? The subjects should not be able to solve theproblem in an automated manner. The task should be difficult enough.(b) Is the task representative with respect to the cognitive process involved?One is sometimes tempted to select an unusual problem because only that willbe difficult enough. A risk is then that it introduces matters that are onlymarginally relevant to the cognitive process one wants to study. A practicalway to overcome this is asking the assistance of another expert in selectingseveral problems that are both difficult and not too far fetched. After theexperimental session one should also ask the subjects if they felt that theproblems were unusual in any sense.

Because of time restrictions it is usually possible to apply the think aloudmethod to only a rather small set of problems. The best way to handle this isprobably to combine the think aloud method with less time-consuming tech-niques that make it possible to get a picture of the generality of the resultsthat were obtained with the think aloud method.


3.5 Summary

Not all tasks or subjects are equally suited for the think aloud method. Thepurpose of the method is to obtain data about a cognitive process. Thereforethe situation should be such that this target process takes place in ‘optimaforma’ and that it is ‘verbalizable’ in the sense that it involves verbalizablecontents on working memory, does not proceed too fast to allow synchroniza-tion and does not cause working memory overload. These factors are relativeto subject and task. The same task may be automated or verbalizable for oneperson but not for another.

3.6 Overview of the analysis of think aloud protocols

After discussing the verbalization process and methodological aspects of col-lecting think aloud protocols, we shall now turn to the analysis of the protocols.Here we give an overview of the analysis process as it will be discussed in thefollowing chapters. Figure 3.1 presents the objects that play a role in theanalysis of think aloud protocols:

Psychological theory of problem-solving: This is a theory about oneor more aspects of human problem-solving. Think aloud protocols are rel-evant for theories about aspects of problem-solving that appear during theproblem-solving process (and not only factors that influence it or characteris-tics of problem-solving performance) and that are accessible to verbalization(see Chapter 5).Task analysis: Normative and competence models of problem-solving, de-scribing the best way to perform a task and possible alternatives. If a taskis complicated and a detailed model is required (for example, a computerprogram that can perform a task) then a task analysis itself is constructedfollowing intermediate steps. For example, one may first construct an infor-mal sketch of the model, next a more detailed design and finally a computerprogram (see Chapters 5 and 6).Psychological model: A task-specific model of the problem-solving processthat is the result of applying the psychological theory to the task analysis.The result is a model that predicts from the psychological theory and fromthe structure of the task (and the knowledge required to perform it) how peo-ple will behave when performing the task. Just like the task analysis, thepsychological model can also be built in steps, from an informal model to acomputer simulation (see Chapters 5 and 6).


Task analysis

Psychologicalmodel

Codedprotocols Coding scheme Verbalization

theory

Segmentedprotocols

Rawprotocols

Psychologicaltheory

Predictedcoded

protocols

FIGURE 3.1: Overview of protocol analysis


Verbalization theory: This is a theory about the way in which thoughtsthat occur during problem-solving are verbalized. Verbalization itself is a taskthat has been studied in psychology and theories about cognitive processescan be applied here. This theory itself consists of a general psychological part(about the process of verbalization) and a part that is specific to the currenttask and the subject. Psychological theories about verbalization are usuallynot specific enough to construct a coding scheme and therefore it is necessaryto use pilot protocols to obtain the vocabulary and phrasing that appear inprotocols (see Chapter 7).Coding scheme: An operationalization of the psychological model, that re-lates the psychological model to the text of the think aloud protocols. It isin the form of a coding scheme for protocol fragments. The coding schemeis obtained by applying verbalization theory to the psychological model (seeChapter 7).Coded protocols: By applying the coding scheme to the protocols, codedprotocols can be obtained.Predicted coded protocols: The psychological model should imply predic-tions for the coded protocols (see Chapter 7).Segmented protocols: The first step in the analysis is to divide the proto-cols into segments (see Chapter 7).Raw protocols: This refers to protocols that are transcribed from the audiorecording, possibly extended with other data such as notes or observations (seeChapter 4).

Literature

Pioneers of the use of the think aloud method in knowledge acquisition wereBreuker & Wielinga (1987). The best source on the history and validity ofthe think aloud method is Ericsson & Simon, 1993. This gives an extensivediscussion of the psychological literature on the method.

Chapter 4

Practical procedures inobtaining think aloud protocols

4.1 Introduction

It is not difficult to collect think aloud protocols, but small errors in theprocedure can render the data almost useless. In this chapter we describe thepractical procedures to be applied in experiments where subjects are asked tothink aloud. Most of the procedures for obtaining think aloud protocols applyto both applications for psychological research and for knowledge acquisition.We use the terms subject and experimenter for both situations. In terms ofthe overview of protocol analysis we gave at the end of Chapter 3, this chapterdeals with the obtaining of raw protocols.

4.2 Setting

The first thing to do when one wants to get a subject to think aloud is to makesure that the setting is such that the subject feels at ease. The subject shouldbe settled comfortably. The room should be quiet, a glass of water should beat hand, the chair should be comfortable. Although this holds for all kinds ofpsychological research, it is especially important to remember in the case ofthinking aloud, particularly when an experiment is going to take quite sometime and will be tiresome for the voice and throat of the subject.

The situation should be focused on the task and the experimenter shouldinterfere as little as possible with the thought process to avoid influencing itscourse.


In the case of a psychological study, an explanation can be given aboutthe purpose of the research, about what is going to happen and about theprotection of the data. It may be wise to emphasize that you are interested inthe way people solve problems, and not in unconscious emotions and hiddenthoughts. People may have reservations about this kind of research and it isbest that you make your purpose clear and explain that there are no hiddenmotives. Explaining that the data are to be handled strictly confidentiallyis important. Not only is privacy protection a matter of ethics and of legalpractice, it is also important for the research itself. When the subject is verynervous, this may hinder his speaking out loud. This is not a negligible factor.When asking people to talk while they are performing a task, you ask themto bring out into the open the way they tackle a problem. This may be veryembarrassing. For example, the physics student may be used to solve problemsby randomly trying some formulas. Yet she may be quite well aware of thefact that there is a neat, consistent way of solving the problem. She may notlike to give the impression that she just messes around. And when a task issupposed to be quite easy for an expert, the same expert may feel discomfortat having to admit that he does not find it easy at all. Or the expert may beinclined to simplify the problem-solving process for the alleged benefit of theexperimenter.

In the context of knowledge acquisition it is important to make the ex-perts collaborate. Otherwise the validity of the data is threatened: the expertmay behave differently from his normal way of performing the task and avoidreasoning steps of which he himself is not certain. An acknowledged expertin a certain domain, for example, would not like to run the risk of his bossknowing about his failing to solve certain problems. And unnecessary anxietymay hamper the whole experimental procedure. Thus to create an atmosphereof confidence and easiness is of the utmost importance.

4.3 Instructions

Instructions about the task at hand should be given as customary. The in-struction on thinking aloud is quite simple. The essence of the instruction is:Perform the task and say out loud what comes to your mind. Write down theinstructions beforehand and read them to the subject. Here are some examplesof instructions:

‘I will give you a problem. Please keep talking out loud while solving the prob-lem.’

Practical procedures in obtaining think aloud protocols 43

‘Please solve the following problems and while you do so, try to say every-thing that goes through your mind.’

Part of the instruction used by Hamel (1990) in his study on architecturaldesign was as follows:

You will in a moment receive a design task. You are asked to perform thistask in the way you are used to go about a commission in your daily practice.It is important that you say aloud everything that you think or do in designing.

It is better not to use phrases like: ‘Tell me what you think’. People may thinkthat you are asking for their opinion, or for an evaluation of their thoughts.This instruction would also suggest that a problem must be solved by realthinking and they would not feel free to report ideas that just occurred tothem, apparently not as part of a rational goal-directed thought process. Donot make the instruction too long. The more you say, the more subjects willmake up their own interpretations about what it is you want from them.

4.4 Warming up

Although most people do not have much difficulty rendering their thoughts,there are some subjects for whom this method does not work. Most subjectsneed a little training but after some time, usually a few minutes to a quarterof an hour, most subjects will talk quite automatically. When after a quarterof an hour a subject still finds it hard to verbalize his thoughts, it is better tostop because this subject is unlikely to provide useful protocols.

Give the subject an opportunity to practice thinking aloud. Sometimesthe task they have to perform will also ask for a practice phase. Then it isefficient to use that phase to practice thinking aloud as well. If the task doesnot involve such phase, one has to look for a task to practice on. In general itis wise to look for a task which is not too different from the target task. Anexample of a practice task is:

A bottle of wine costs £5. The wine costs £4.50 more than the bottle. Howmuch does the bottle cost?

In Appendix B, some examples of tasks are given. Some of the easy versionsof those tasks may be used as practice tasks. The think aloud instruction for


the practice task is the same as for the main task. Practising does not onlygive the subject an opportunity to familiarize himself with thinking aloud, butit also gives the experimenter an opportunity to train the subject to stick toverbalizing his thoughts and not to interpret the thoughts. If the subject offersinterpretations and starts analysing his or her own problem-solving processes,the experimenter has to correct the subject and explain anew what the subjectis supposed to do and not to do. One should not start with the real sessionbefore one is confident that the subject is feeling comfortable with the task ofthinking aloud. It is better to prolong the practising phase than to have tointerfere in the target task or to end up with a useless protocol.

4.5 Behaviour of the experimenter and prompting

When the subject is working on the task, the role of the experimenter is arestrained one. Interference should only occur when the subject stops talking.Then the experimenter should prompt the subject by just, and only just say-ing: ‘Keep on talking’. This is usually enough to keep the subject engaged inthinking aloud for some time. It is a hard job for the experimenter. Especiallywhen the experimenter is familiar with the task domain, he is inclined to cor-rect the subject when going wrong or help the subject along when stuck. Thisreally must be avoided. So the experimenter should have some experience ingathering think aloud data, in particular in avoiding unnecessary interference.

4.6 Recording

The session is usually recorded on audio- or video-tape. It may be wise toinclude the instruction and the practising phase, in order to be able to checkafterward whether the procedure was performed correctly. It may sound rathertrivial, but always check and double check the recording instruments; youwould not be the first to end up with an empty tape after a long session witha unique subject. Also check your equipment regularly during the session, asinconspicuously as you can.

4.7 Transcription of the protocol

After the session has been recorded, it has to be transcribed. Typing out com-plete protocols is usually inevitable in psychology to be able to apply reliablecoding procedures. In knowledge engineering it is always useful to transcribe


an initial set of protocols, but later on protocols can be transcribed more selec-tively: only parts that show new knowledge or new reasoning processes needto be transcribed.

Transcribing a protocol usually means typing it out as verbatim as possible.This typing out brings along its own difficulties. Typing out protocols is atedious and time-consuming task. Transcribing may take about 10 times asmuch time as the original protocol, depending on the clarity of the protocoland the fluency of the subject. People may have accents which are strange tothe typist, they may mumble, they may have a habit of not completing theirwords, etc. Still, even if the subject talks very clearly, there will be manyinstances where it is difficult to understand what exactly the subject is saying.The subject may interrupt himself in the middle of a word, he may whispersomething, he may make strange noises. It is also possible that during thesession a noise was made in or outside the room that obscures the voice of thesubject. Sometimes it may help when the typist asks another person to try tohear what the subject said. However, it is sometimes impossible to understandwhat exactly was said. Instances like these should be marked in the typed outprotocol, for example by typing ‘unintelligible’.

This leads us to another important problem: what should be noted downand what can be left out? In psychological research, in principle everythingcan be relevant and therefore basically everything should be typed out. Thatis to say everything that was said during the session: the thinking out loudof the subject and the instructions and interruptions by the experimenter,including also utterances by the subject which have no bearing on the problem-solving process at all. For example, asking for a glass of water, or a remarkabout the rain which suddenly started. Sometimes the session was interrupted.Someone came into the room, asking for something, an announcement wasmade on the building’s intercom. All these things should be noted down,identifying the speaker. It is important to know what happened during asession, because interruptions may have an influence on the problem-solvingprocess. It is also a check on whether the experimenter behaved correctly andrefrained from helping the subject a little. Off-side remarks by the subjectmay also be an indication of an impasse in the thought process. Getting stuckplays an important role in most problem-solving theories. When do people getinto impasses and how do they resolve them? Noticing impasses may be ofimportance for further analysis of the protocol. A difficult problem is how totype out a subject’s ‘humming and hawing’. Most people say things like ‘Er, Ier ...’. Generally speaking, the typist should try to type it out as faithfully ashe can, staying as close as possible to what the subject said. This is sometimeshard and leads to rather ugly protocols. Stammering as well should be typed


out just as it occurred.Recognizable pauses and unusual silences between two words are noted

down by special marks, conventionally by dots, for example: ‘I guess ... theanswer is ten’. A long silence may be transcribed as ‘silence’. Some tran-scribers give more dots for a longer silence. However, when should a pause beconsidered unusual? And when is it long? It is a matter of interpretation bythe typist. But on the other hand, the human ear is well trained for detectingunusual pauses between words in a sentence.

Punctuation is another point to consider. When a complete and grammat-ically correct sentence is spoken, it is easy to type a full stop or a comma.However, most sentences in think aloud protocols are not so well-formed. Oneshould therefore be careful with punctuation, in order not to give one’s owninterpretation to a sentence. It is often wise not to use punctuation at all, andto start a new line for each new sentence - or when one thinks a new sentencestarts. Especially question marks should be avoided, as it is often hard to besure whether some utterance was meant as a question or as a positive remark.

Still, how should one go about the intonation of the subject? Sometimesone can hear perfectly well that the subject said something questioningly. Orangrily, depressedly, cheerfully. Some researchers note down the intonationof the subject as such. However, research shows that the reliability of obser-vations of this kind is very low, and that people hear different things whenlistening to the same tape.

When a protocol has been recorded on video or other data such as notesor observations are available, it is convenient to insert the action protocol intothe protocol, usually in a separate column next to the protocol text. Forexample: ‘the subject takes up his pencil’, ‘the subjects looks in the book’.The same goes for data noted down by the experimenter, or recorded by alog file on the computer, for example ‘the subject types in 43 + 21’. Notesconcerning these data should be clearly distinguished from the think aloudprotocol. The question here is, what do we do with observations which arenot clearly objective? For example: ‘the subjects seems to bite his pencil andtries to think hard’, ‘the subject looks around’. If one wants to make this kindof observations, because they might be useful later on, the best thing to do isto note them down separately from the other data.

All of the problems sketched above have to do with one central theme:avoiding unwarranted interpretation. One wants to analyse and to model theprotocols, so one needs transcriptions which differ as little as possible from thereal protocols as recorded on tape. Later on the subject’s behaviour has to beinterpreted in the light of the model one uses for analysing the protocols. So,what is needed are transcriptions of protocols in which interpretation does not


play a (large) role. Although it is impossible to avoid interpretation altogether,one has to try to keep it out of the transcription. Protecting measures whichcan be taken are the following. Let someone else than the experimenter typeout the protocols. Give the typist very careful instructions to transcribe theprotocol as literal as possible and to avoid any restructuring, improvements tothe style and grammar and avoiding interpretations. If something is unclear onthe audio-tape then it better to type ‘unclear’ than to interpret it at that stage.One could even let another person compare the transcription to the tape.Although these are sound measures, they are often not feasible. Typing outprotocols takes a large amount of time and there may be no one else availablebut the experimenter. For example, when subjects use a lot of specializedjargon, it may be nearly impossible for a secretary to type out the protocol.

Sometimes it is not necessary to type out the complete protocol. It maybe more efficient to encode protocols directly from the tape. Especially if avery coarse grained coding system is used or if you are only interested in somespecific actions in the protocols. For example: if you want to know whatinformation sources a problem solver uses, such as books or a teacher, you areonly interested in those sections of the protocol where a reference is made toan information source. In this case it is possible just to listen to the tape andnote down a code every time an information source is mentioned.

Although direct encoding from audio-tape instead of transcribing and cod-ing the transcription seems attractive in terms of efficiency, it is often not amethod to be recommended. There are several reasons:

(a) In many cases the data may be subject to several cycles of interpretation,each time with a revised coding scheme. This can be done much more effi-cient with written protocols. Especially in an exploratory phase of a project,protocols are used for model building and this is of course much easier from awritten down protocol.(b) Objectivity: it is difficult for another researcher to inspect whether yourcoding is correctly performed.(c) The context effect: it is not possible in this procedure to avoid the effectof the context on coding protocol fragments (see also Chapter 7).

The best procedure is to use direct encoding only in a late stage of the re-search, when the model and the coding scheme are fixed. In the near futurecomputer technology may make it easier to analyse verbal recordings withouttranscribing them, see Section 7.8.


4.8 Review

Reviewing the protocol with the subject can provide very useful additionalinformation. Protocols are usually incomplete and difficult to interpret andthe subject can be very helpful here. A good procedure is to review theprotocol with the subject as soon as possible after the actual think aloudsession. In psychological research the additional comments and explanationsthat are given during the review should not be treated in the same manneras the protocol, because they have a different status (retrospective instead ofthink aloud data).

Chapter 5

Building models ofproblem-solving

5.1 Introduction

Now that we have discussed how to obtain and document think aloud proto-cols, we turn to analysis of these protocols. The purpose of the collection andanalysis of protocols is the study of cognitive processes. This means that wewant to construct or test a process model. In this chapter we will first dis-cuss the purpose of constructing models. Next we explain how the process ofbuilding models can be divided into analysing the task and applying psycho-logical theory. We will discuss in detail how such models can be constructed.The psychological model forms the core of the protocol analysis. We will giveseveral detailed examples in order to familiarize the reader with concepts andprocedures in protocol analysis.

5.2 Modelling cognitive processes

In knowledge engineering the purpose of the analysis is to construct a computersystem that embodies part of the knowledge and skills of a human expert. Onemay think that the reasoning process of a human expert is not really relevantfor building intelligent computer systems. The main point in that view is thatthe system should just give correct solutions to problems irrespective of howthe human expert solves those problems. However, as we mentioned in Chapter1, the reasoning process of the expert is relevant to building knowledge-basedsystems for several reasons:


Explanation: Many knowledge-based computer systems are required to ‘ex-plain’ their solutions. To be understandable for human users of these systems,these explanations best show correspondence to the knowledge used in an ex-pert’s problem solving process. The reasoning steps can be incorporated inthe computer system to enable the system to explain its solution to a problem.Breaking down knowledge acquisition: It is usually not possible to presentall possible problems to an expert and ask her to find a solution for each ofthem. Breaking the task down into reasoning steps and modelling each stepmakes it possible to avoid this. Consider a medical diagnosis task that weshall discuss more elaborately in Chapter 8. The givens for a diagnostic prob-lem, data about a patient, consist of answers to some 40 questions. Thereare about 15 possible diagnoses. This gives a very large number of possibleproblems (i.e. possible descriptions of patients). It is of course impossible topresent all possible patient descriptions to an expert and ask him or her fora diagnosis. However, several groups of properties are associated with under-lying properties. For example, several different descriptions of pain actuallyrefer to a particular type of pain: ‘cardiac pain’. This can be exploited by sep-arately acquiring knowledge about recognizing ‘cardiac pain’ and then findingout how ‘cardiac pain’ is related to the possible diagnoses. This structuringof the task reduces the amount of knowledge that has to be acquired fromthe expert. This makes it so important to find the expert’s reasoning steps.Validation: A related point is validation of the knowledge in a computersystem by an expert. It is often necessary to have the knowledge in the finalsystem validated by an expert. As in acquisition, the decomposition that canbe achieved by validating reasoning is that the system gives the correct solu-tions to problems.

All this means that in knowledge acquisition the reasoning process of theexpert must be modelled and not only the relation between the givens andsolutions of problems. In psychology the interest for cognitive processes is amatter of taste and paradigm. Some researchers are not interested in processesbut only in overall relations between properties of people, situations and visi-ble behaviour but others believe that fairly detailed understanding of cognitiveprocesses is the key to understanding human behaviour. Protocol analysis isonly useful if one is interested in these processes. In this chapter we discussthe construction of models of cognitive processes.

Building models of problem-solving 51

5.3 The form of models of cognitive processes

Descriptions of cognitive processes can take different forms. The most impor-tant forms are dimensional models, categorical models and procedural models.A dimensional model means that a protocol is rated on one or more dimen-sions. These dimensions concern the cognitive process. For example, we candefine properties such as ‘duration’, ‘number of reasoning steps’ or ‘extentto which the problem data are used’ and find a way to measure these. Themodel predicts ratings for different types of people or tasks. A categoricalmodel assigns categories of cognitive processes to a protocol. An example is‘checks applicability of formula before using it’. Of course not all dimensionsof cognitive processes require think aloud protocols. Solution times can sim-ply be measured and other dimensions or categories can be measured from theproduct or from visible behaviour.

A model of cognitive processes can be defined in terms of properties ofthe processes (either as dimensions or as categories) but a different way is toconstruct a model in the form of procedures. A procedural model describesa sequence of steps. Consider again the protocol of word problems given inChapter 1. A procedure that describes at least part of this protocol is thefollowing:

Procedure Algebraic(in: text; out: solution):1. Read the problem text(text; sentences)2. Translate each sentence into an algebraic equation (sentences; equations)3. Solve the resulting set of equations for the unknown (equations; solution)

This very simple model describes step by step the cognitive process that takesplace during problem-solving. In this case the procedural model simply con-sists of a sequence of steps that are described in very abstract terms. However,each of these steps could be elaborated in more detail, including the informa-tion that they use and produce. For example, the step ‘translate each sentenceinto an algebraic equation’ can be elaborated by specifying the possible sen-tences appearing in the arithmetic word problems (the information used), pos-sible algebraic equations (possible results of the translation step) and the stepsthat are needed to perform the translation. In that case we would construct ageneral procedural model that can be applied to a wide range of problems andthat describes different solution traces depending on the problem to which itis applied. For example, a procedural model for solving arithmetic word prob-lems could contain a sub-procedure as follows:


Procedure Translate(in: sentence; out: equation):FOR EACH sentence:IF sentence = ‘A has N Z’ THEN equation = ‘A = N1’IF sentence = ‘B has N Z more than C’ THEN equation = ‘N = B - C’etc.

This procedure would translate sentences like Mary has 5 apples into: ‘Mary= 5’. This procedure will of course produce different results depending on thedata. For each problem it can generate a solution trace. This trace can beviewed as a prediction for the cognitive process as it appears in the protocol. Itis obvious that fully specifying such procedures will be a complicated exercise.The amount of detail that is needed depends on the need for a computationalmodel and on the amount of detail that is relevant for the original researchquestion about the cognitive process.

Once a procedure is specified, it can be used in two directions: (1) togenerate a reasoning process (or a description of it) or (2) to recognize a givenprocess. For example, we can compare a given sequence of reasoning steps witha procedure to see if this process can be generated by the procedure. If thisis the case, the sequence of reasoning steps satisfies the procedure. Considerthe example above. Suppose that we observe that a person reads the sentence:Mary has 5 apples and says: ‘OK, Mary 5’ This suggests that he applied theprocedure ‘Translate’, as described above because the result of the proceduresis the same as the verbalization.

The procedural model above is similar to a computer program in the sensethat both generate reasoning steps when they are applied to a problem. Themain difference between the procedure above and a computer program is that acomputer program is written in a programming language, a language that canbe executed by a computer system. A model that is written in an executableprogramming language is called a computational model.

In knowledge acquisition the ultimate goal of protocol analysis is to con-struct a computer program based on the knowledge that becomes visible inthe protocols. In psychology, it can be useful to construct a computer programthat generates behaviour that can be compared to the protocols. However, formany analyses informally specified procedures are adequate. We shall discusslanguages for procedural models, including programming languages in detailin the next chapter. In this chapter we focus on the problem of dealing withthe complexity, the richness and the initial obscurity of think aloud protocols.


5.4 Procedural models and explanation of human be-haviour

In knowledge acquisition it is obviously the ultimate goal to construct a fullcomputational model for a task. In psychological or educational research thestatus of procedural models and the need for them is subject to discussionand debate. In this book we present the analysis of think aloud protocols as amethod for data analysis. The protocols are data and the goal is to constructa psychological model that describes the data. The main differences betweenthis method and other research methods in psychology are the unstructuredverbal data and the use of procedural models. A question that is often raisedconcerns the meaning of the procedural model. A procedural model will con-tain elements (sub-procedures, production rules, etc.) that can be interpretedas descriptions of components of the human mind. Our position in this bookis the following:

(a) It is not always necessary to make claims about the meaning of each aspectof a model. Consider componential models of intelligence that are formulatedas numerical functions of performance measures or studies of cognitive skillsthat have the form of linear models of reaction times. Not all elements in sucha mathematical expression can be interpreted in terms of cognitive processesor cognitive structures.(b) It is attractive to hypothesize that components of a procedural modelcorrespond directly to components of the mind but this requires additionalevidence. There are many different procedural models that can account forprotocols and additional psychological constraints are needed to select the cor-rect one.

This brief discussion also clarifies in which sense procedural models ‘explain’cognitive behaviour. A psychological model explicitly relates properties ofthe task and hypotheses about human problem-solving to (verbal) behaviour.When designing the model one is often forced (or at least tempted) to in-troduce elements that are needed to complete a model. These may be newdiscoveries about the task or new psychological hypotheses and in this sensethey have a special status in the model.


Task analysis

Psychologicalmodel

Psychologicaltheory

FIGURE 5.1: Constructing a psychological model

5.5 Building models

In Chapter 3 the entire process of protocol analysis is summarized in Figure 3.1.In this chapter we are concerned with the upper part of this diagram, presentedin Figure 5.1.The principle that underlies the analysis is that the content of the protocolscan be predicted from from the structure of the task, psychological knowledgeand knowledge about the verbalization process (see Chapter 7 for the latter).Task analysis and psychological theory are used to construct a psychologicalmodel of the problem-solving process. We will discuss these three issues ingreater detail.

The last example protocol in Section 1.1 exemplifies these phenomena.


This protocol comes from a collection of protocols of 30 arithmetic word prob-lems solved by over 40 subjects. This resulted in a large number of pages ofprotocols of which half a page is reproduced in Chapter 1. Although someof these protocols are less obscure than the one shown here, one can imaginethe feelings of the psychologist who collected these protocols with a generalinterest in the way in which children solve these problems and why they findthem so hard. The first impression is discouraging, to say the least. Thisexperience is equally common in the use of think aloud protocols of experts.Expert protocols are often equally obscure. There are several principles tohelp dealing with this complexity:

Abstraction: Especially computational procedural models require very muchdetail. This leads to very complex models. One method to deal with thiscomplexity is to define layers of abstraction. As we shall discuss in the nextchapter languages for procedural models allow the definition of layers of ab-straction. For example, a complex process can be defined as a top-level processand sub-processes. Abstraction should focus on aspects of the protocols thatare relevant for a particular problem. A psychologist may for example be in-terested in different aspects of the problem-solving process than a knowledgeengineer. Psychologists can also be interested in different aspects of problem-solving processes. For example, one may be interested in the cause of errors,the effect of education, individual differences in cognitive styles, cognitive skillsor general intelligence. Such different interests will result in different modelsand analyses. The interest must focus the analysis and the process must bemodelled from the viewpoint of this interest. Starting with no initial viewpointor interest is of course difficult. This will lead to gradual development of aviewpoint and corresponding abstractions but that tends to take much effort.The level of detail to which a model must be elaborated depends on the levelof detail required for coding the protocols later on. A coarse-grained model isusually more difficult to compare with protocols than a detailed model. If acomputational model is required then this will determine the level of detail.Separate task analysis and psychological theory: In case of psycholog-ical research it is usually necessary to construct a model that explicitly statesthe meaning of the psychological theory in the context of the task and that pro-vides more detail on the process than is implied by the psychological theory.Usually there are many other sources available to construct a first approxima-tion of the cognitive behaviour by only looking at possible ways to performthe task. Examples of these sources are textbooks, interviews, etc. Using thisinformation to build a first approximation of a model is called task analysis.


5.6 Task analysis

5.6.1 The construction of a task analysis

The structure of a procedural model is based on the idea that cognitive pro-cesses can be explained from three factors:

The problem: The information given as problem data and the question thattogether form the problem obviously have an effect on the problem-solvingbehaviour. Given the same knowledge and the same mechanism for apply-ing it, different problems will result in different problem-solving behaviours.The knowledge available to the problem solver: The knowledge aboutthe task, consisting of facts, rules, principles, methods, strategies and the likeclearly has an effect on problem-solving. This is immediately obvious, for ex-ample, from the example protocols given in Chapter 1.The cognitive architecture: The cognitive architecture is the mechanismfor applying the knowledge. Properties of this mechanism have an effect onthe behaviour. As we shall discuss later in this chapter, an important explana-tory concept is the capacity limit of human working memory. This is part ofthe machinery that applies knowledge and has an effect on the way in whichproblems are solved.

Task analysis means constructing a first approximation of the model frominformation about the task without taking specific psychological factors intoaccount. These will be ‘added’ in the next step. For many tasks there aresources that are more accessible than think aloud protocols and it is a goodidea to exploit these first before trying the more complex protocol data. Oneuseful source are existing models that have been constructed for similar tasks.The literature on Artificial Intelligence, knowledge-based systems and cogni-tive science contains a wide variety of methods, strategies, models and rea-soning mechanisms that can be used as the basis of procedural models ofproblem-solving. Another important source are textbooks and manuals aboutparticular tasks such as architectural design, solving physics problems andsolving arithmetic word problems. Textbooks and manuals present the basicconcepts and methods for a task. This information is often complementary tothe computational models because these models emphasize the methods andthe process structure instead of the knowledge that is specific to a task. Ourexperience is that textbooks are not nearly sufficient to construct a model ofthe problem-solving process. For example, textbooks tend to leave out knowl-edge that is essential to solve problems. However, they usually do a good job


at providing an outline of the procedure and a set of important concepts. Mosttextbooks do not take into account psychological aspects of problem-solving.For example, they do not tell you how to avoid forgetting relevant informa-tion, how to organize the problem-solving process efficiently, etc. They tendto focus on the knowledge that is to be used. Therefore, even if it is possi-ble to construct a more or less accurate procedure for solving problems, thisis unlikely to be an accurate description of the way in which people actuallysolve problems.

Although in general other sources are used for the task analysis, one canalso use very clear and comprehensible think aloud protocols at this stage.This use of protocols is called ‘bottom-up’. Bottom-up use of protocols meansfinding abstract descriptions for parts of protocols. By recognizing similaritiesbetween different parts, abstractions can be defined that are assembled into aprocedural model (or a property-based model). The top-down strategy needsan initial model to start from. This model is sometimes obtained from aninitial holistic impression of a set of protocols but it can very well be basedon other sources such as introspective reports, prescriptive methods, etc. Thismodel is used to categorize parts of the protocol. If necessary, the modelis extended or refined. (In the context of empirical research it is important,though, to separate the stages of exploratory research aimed at finding a theoryor a model and of testing a hypothesis. As we shall discuss in Chapter 7 oneshould not use the same dataset for both stages.)

The risk of the bottom-up approach is that it is not possible to find goodcategories by inspecting only part of the data. In that case one ends up in thefollowing cycle: an idea comes up when reading a protocol fragment. This isnoted down. The context forces qualification of this idea, which gives a revisedversion. Next the idea is applied to other protocols, where it may reappear ina somewhat different form - or not at all. At some point the analysis will haveto get a focus: a focus on a sub-process, a selection of important propertiesof a cognitive process or a model of the process that abstracts from certaindetails. Our advice is therefore to avoid the pure bottom-up approach.

The risk of the top-down strategy is that the initial model (or one of itsrefinements) is not adequate for the data. Because this model determines therest of the analysis, this leads to complicated model revisions or to modelsthat have a poor fit with the data. This analysis suggests that it is better toavoid both extremes and follow a mixed strategy: start from the safe side. If ahighly plausible model can be obtained from sources that are more accessibleand yet appropriate then start from there. If the interpretations of the dataare very clear, start from there. This gives a mixed strategy.

Finally, a good method for task analysis that is easy to apply is simply


introspection. Solving the task and reflecting on your own thoughts is a verysensible and respectable technique for task analysis.

5.6.2 Example: task analysis of solving arithmetic word problems

Suppose that we want to know how children solve arithmetic word problemsand why they have difficulty with certain types of problems. First considerthe problem data. In Chapter 1 we gave a few examples of arithmetic wordproblems:

Problem 1: A father, a mother and their son are 80 years old together. Thefather is twice as old as the son. The father has the same age as the mother.How old is the son?

Problem 2: Irene has 6 sweets less than Suzanne. Diana has 5 more thanSuzanne. How many sweets does Irene have less than Diana?

These are two examples of a wide range of problems that together form thetask arithmetic word problems. What they have in common is that they giveinformation about quantities in the form of ordinary sentences. A problemmay involve two or more numbers. The information may concern quantitiesor relations between quantities. In general arithmetic word problems do notrequire scientific knowledge (such as physics) but only common sense knowl-edge.

Suppose that we are interested in understanding the way in which childrensolve such problems, for example, to understand differences in level of difficultybetween types of problems, the errors that children make and the knowledgethat is needed to solve them, for example for teaching purposes. One mayexpect that the direct approach is the best: collect think aloud protocols andtry to understand what is happening. Consider again the protocols in Chapter1. The protocols about the first problem are relatively clear and one can con-struct a procedural model from them. The last example protocol shows thatthis is not always feasible. It is better to first analyse the task using othersources than the protocols and then return to the protocol. We use the taskanalysis to find an initial approximation of the knowledge needed to performthis task.

The algebraic methodHow can arithmetic word problems be solved? One method that is taught insecondary school is the algebraic method. We summarized this method before


in section 5.3. The problem is translated into algebraic equations and these aresolved by standard algebraic procedures. If we apply this method to Problem1 we get:

1. Note that the quantities are associated with the ages of father, motherand son. These become the variables in the equations.2. Translating the sentences of Problem 1 gives the following equations:age-father + age-mother + age-son = 80age-father = 2 × age-sonage-mother = age-father3. These equations can be solved for ‘age-son’ in various ways that we shallnot describe here. The result represents the age of the son and is therefore thesolution to the problem.

What is the knowledge required for this task? From Problem 1 it may seemthat translating sentences into equations is a relatively straightforward processthat requires little knowledge. The knowledge seems to be in the style of theprocedure Translate in section 5.3 and so simple that it is not likely to causeerrors. However, consider the following problem:

Problem 3: A bottle of wine costs £5. The wine costs £4.50 more than thebottle. How much does the bottle cost?

The second and third sentence suggest that quantities are associated withthe wine, the bottle and the bottle of wine. This would give the followingequations:

price-wine = 4.50price-bottle-of-wine = 5

Now the ‘price-bottle’ is asked. However, this set of equations cannot besolved for ‘price-bottle’ ! It requires some interpretation to see that ‘price-bottle of wine’ does not refer to a third object with an associated quantity butto the sum of ‘price-wine’ and ‘price-bottle’. This initial analysis shows thatthere are at least two sub-processes: translate the sentences in the probleminto algebraic equations and solve the equations for the asked.

The approximation methodThe second method to solve arithmetic word problems is the approximationmethod. This is not taught in schools but it is known as a method for solving


mathematical equations and also as a form of means-ends analysis, a more gen-eral problem-solving method defined in Artificial Intelligence. The sentencesin the problem are viewed as constraints on unknown values that appear inthe problem. Possible values for these must satisfy these constraints. Thismeans that the sentences must be true or at least possibly true after filling inthe possible answer.

Procedure Approximate:1. Note the objects and their possible values.As in the previous method these are:

price-of-bottleprice-of-wineprice-bottle-and-wineprice-bottle minus price-wine

The latter two are known. The first two can be (almost) anything. How-ever, if we have a value for the price of the bottle or of the wine, the givenvalues can be used to compute the missing one.2. Generate a candidate solution for the asked quantity. In the current prob-lem we would generate the candidate ‘price of bottle = 50p’.3. Substitute the solution into the problem text. In the example this wouldgive: A bottle of wine costs £5. The wine costs £4.50 more than the bottle.The bottle costs 50p.4. If this gives a consistent story then the candidate is the solution, else aninconsistency occurs. In the latter case, use the inconsistency that occurs inthe story to decide if the candidate solution is too big or too small and usethis feedback to generate a new candidate.

This would imply that the wine costs £5 and the bottle with the wine £5.50,which in turn is inconsistent with the first sentence in the story. Now we mustdecide if the candidate value for the bottle is too big or too small. Since theinconsistency was caused by a sum of bottle and wine that was too big, it islikely that the value of 50p. for the bottle was too high. The pure approxima-tion method now reduces the value by an arbitrary amount, continuing thisprocedure until the right answer has been found.

This method is clearly more useful if only specific outcomes are allowed,for example only integer values. If real numbers are allowed the approximationprocess will not automatically halt and a criterion for stopping the approxi-mation process is required. Each step in the approximation method requires


knowledge: verifying a candidate solution, deciding if the candidate is toosmall or too large, estimating the size of the discrepancy. Again this is notrivial knowledge. We shall not elaborate this here but leave it as an exerciseto the reader. Also note that some knowledge appears in all procedures.

The schema application methodA third method is taken from the psychological literature. It is based on thenotion of schema: a structured abstract description. This schema applicationmethod identifies a schema in memory that fits the description in the story.Associated with this schema are procedures or other schemata with which theasked can be found. The literature is less clear about the details of theseschemata and about how they are retrieved. One type of schemata that hasbeen proposed refers to the states and events in the problem text in terms ofthe following classification:

Combine events: The text describes quantities that are somehow put to-gether. For example: John has 5 apples. Ann has 2 apples. How many applesdo John and Ann have altogether?Change events: A quantity changes from one value to another. For example:John has 5 apples. He gets 2 from Ann. How many does John have now?Compare states: Two quantities are mentioned and compared in the text.For example: John has 5 apples. Ann has 2 apples. How many apples doesJohn have more than Ann?

When a problem is read, a schema is retrieved on the basis of keywords, pat-terns in the problem text or the meaning of the problem text. A schema thatwould apply to the bottle of wine problem is the combine-schema. Note thatthis must be recognized from the problem text. This schema involves three el-ements, two quantities and the result of ‘combining’ these. Filling this schemagives:

Role in schema: Name: Value:Result of combining bottle of wine 5Quantity 1 bottle askedQuantity 2 wine ?Difference wine − bottle 4.50

Note that the elements of this schema are essentially the same as those in theapproximation method. The difference is that the schema is associated withprocedures that compute the asked from the rest of the schema. For example,


the combine-schema has the following procedure:

IF Part 1 is asked THEN the answer is: Difference − Quantity 2IF Part 2 is asked THEN the answer is: Difference − Quantity 1IF Difference is asked THEN

IF Quantity 1 > Quantity 2 THEN the answer is: Quantity 1 − Quantity 2IF Quantity 1 < Quantity 2 THEN the answer is: Quantity 2 − Quantity 1

The procedure that corresponds to the schema application method is:

Procedure Schema application(in: text; out: answer):1. Recognize schema (text; schema)2. Fill in schema (text; schema; filled-in schema)3. Find computational procedure (filled-in schema; procedure)4. Apply computation (procedure; filled-in schema; answer)

The direct recognition methodFinally, it is possible, especially in a restricted sub-task defined by a particularclass of arithmetic word problems, to follow a direct matching approach, thedirect recognition method. Consider the sub-task of solving arithmetic wordproblems involving only problems with two numbers in the text that must ei-ther be added or of which the difference must be taken. This effectively meansthat problems must be classified as sum problems or difference problems. Thiscan in principle be done from patterns occurring in the problem text. Forexample, if the final sentence is of the form ‘How many ... do ... and ...have together?’ then the numbers in the problem must be added to find theresult. If the form of the sentences is simple and standard then this patternrecognition approach is quite feasible although the rules associating patternsand solutions will be more complex than the example above.

In addition to these four methods there are general weak methods that guideproblem-solving. Novice problem solvers, who do not know methods or strate-gies for solving such problems, or who do not have the knowledge that isneeded to apply the method to a problem will use more general but weakermethods or they will mix parts from the methods that they do know. Forexample, people who do not know appropriate schemata and who cannot ingeneral solve algebraic equations will solve arithmetic word problems eitherby the approximation method perhaps combined with parts of the schema oralgebraic method that they do know. Analysing this further requires more


psychological knowledge about which knowledge people will have and takes usto construction of a psychological model.

The result of this exercise in task analysis is that we have identified fourtypes of procedures for solving arithmetic word problems:

1. The algebraic method.2. The approximation method.3. The schema application method.4. The direct recognition method.

Note that these analyses are still very far from completely specifying a modelof the solution process. Some of the steps in the procedural models can beelaborated easily (for example ‘solving a set of algebraic equations’ or ‘re-trieving a schema’) but other steps are likely to be very complicated. Forexample ‘transforming the sentences of the problem text into algebraic equa-tions’ may be rather hard to elaborate. It may also be hard to specify theproblem schemata that people retrieve.

We could elaborate these methods in more detail. We could even buildinitial computer models for them. However, it is a good idea to stop here andfirst present a different example of task analysis.

5.6.3 Example: task analysis of architectural design

The example is again taken from the study by Hamel (1990). He made a taskanalysis of architectural design by studying relevant literature and by inter-viewing 15 architects about their activities. He asked them for a descriptionof their activities from the moment the assignment was given till the actualrealization of the building. The main tasks of designing could be characterizedas follows: gathering information, decomposition of design problems, solvingthe sub-problems, synthesis of the sub-problems and styling. For each of thesecategories he made a decomposition into sub-activities, distinguishing betweenthe psychological aspects on the one hand and the involved knowledge and ac-tions on the other hand. For example one of the main categories ‘gatheringinformation’ consists of these activities:

1. The architect forms an idea of the kind of assignment he has been pre-sented with.2. The architect analyses the assignment by specification of the attributes.3. The architect fills in missing information.4. The architect checks the presented information.


To perform this task an architect needs to know:(a) Which information is indispensable for making a design.(b) The underlying wishes and goals from the respective clients for a greatnumber of designs.(c) Sources of information and the methods to gather more information.

This is a more abstract type of task analysis than the one we gave for arith-metic word problems.

5.6.4 The role of task analysis

Task analysis gives a first approximation of a procedural model. In particularit gives a first conceptualization of the range of behaviours that can appear inthe protocols. Suppose that it has been possible to describe the knowledge anda procedure for applying it that can actually be used to solve problems. Per-haps it has even been possible to write a computer program that can solve theproblems we are interested in. Psychological knowledge may now be applied topoint out that certain knowledge is unlikely to be available to certain people,that they may apply other knowledge (that is not actually relevant or useful)or that the mechanism applying the knowledge differs from what we knowabout how the human mind works. From this psychological knowledge thetask analysis can be modified to obtain a better approximation of the actualproblem-solving behaviour. We call this second approximation the psycholog-ical model. Applying psychological knowledge will be described in Section 5.7and constructing a psychological model in Section 5.8. This two-step methodfor model construction (task analysis followed by application of psychologicalknowledge) is the best method we know of in that it reduces the complexity ofthe modelling task. However, the model constructed in this manner is far fromcomplete. Nevertheless, without an initial task analysis exploiting accessiblesources about the task, constructing a model is much more difficult and oneends up mixing task analysis with ‘data driven’ modelling from the protocols.

The distinction between task analysis and psychological model and thetwo-step approximation of the psychological model is useful for both knowl-edge engineering and psychological research. In knowledge engineering thepsychological factors may not be useful as part of the final model, in particu-lar if they result in less competent behaviour. Think aloud protocols are thenused to complete the task analysis with information about specific skills andtricks of the trade that cannot be found in textbooks and other sources. How-ever, also in this case it is useful to take the psychological factors into account


when analysing the protocols, even if they are not included in the final model.Consider, for example, the effect of practice. An experienced expert may notshow the problem-solving behaviour that one might expect on the basis of atextbook or a training manual. From experience with the task she may haveacquired special methods or shortcuts. This can sometimes be explained frompsychological factors such as: experience with the task, capacity limits in thehuman cognitive system, ‘slips’ in performance caused by small errors in per-ception or memory, strong effect of recent events, etc. Another example isthe actual context in which the task is performed. The architects in Hamel’sstudy had access to their problem description, notes and sketches. The way inwhich these are used affects their behaviour. In this respect the conditions inwhich a computer system operates may be different. The architect may forgetpart of his original problem or he may incidentally note a useful idea in an oldsketch. Both of these events cannot easily be accounted for by a task analysisas such.

5.7 Theories of problem-solving

5.7.1 The role of psychological theories

Thus far we discussed the first step towards a complete model: task analysis.The sources used for task analysis are accessible and rational. They supply thegenerally available knowledge about how a task should be performed. Nearlyall sets of protocols are full of surprises if we compare them to the ratio-nal, well-informed way of performing the task. People behave in ways thatare sometimes less rational and based on false and unexpected assumptions.However, they may also have discovered knowledge that has not been gener-ally accessible but that allows them to perform better than the ‘rational’ taskanalysis can explain.

Psychology can supply additional knowledge on ‘limitations of rational-ity’ that make people behave in ways that are not optimal from a rationalviewpoint. Unfortunately, most psychological research and the correspondingtheory does not focus on cognitive processes but on properties of the result ofperforming the task, properties of the task and differences between individu-als. This work can be used only indirectly to explain cognitive processes. Forexample, it is hard to say what properties like ‘verbal intelligence’ imply forsolving arithmetic word problems or what differences in solution times betweenproblems tell us about why one problem is more difficult than another.

Psychological theory contributes knowledge about the cognitive machinery


and knowledge about what people generally know and can do (for example‘Dutch children over 6 years old can count and perform basic arithmetic op-erations’). In the task analysis, these kinds of knowledge are not taken intoaccount. For example, the fact that people can only hold a small number ofitems in working memory may influence their problem-solving process in waysnot represented in the task analysis. Human problem-solving behaviour is de-termined both by characteristics of the human mind, the knowledge that theperson uses and by the task itself. In many cases there are limitations on theways in which a task can be performed.

There are many theories of problem-solving. One example of a generaltheory postulates three types of process (Newell & Simon, 1972):

• Orientate• Solve• Evaluate

The orientation phase consists of activities that clarify the problem state-ment. It is characterized by asking oneself questions like: what do I knownow, what is given, what is the problem, have I seen such a problem before,etc? All these questions are meant to derive extra information pertaining tothe problem and its possible solution. The solving phase is characterized bythe application of solving procedures (however incorrect these might be). Theevaluation phase relates the solution to the original problem statement andis meant to check the solution for correctness or plausibility. Although thesephases are very general indeed, they are specific enough to differentiate dif-ferent behaviour. Beginners, people who are not familiar with the problem,often do not perform many orientation activities, and do not evaluate theirsolutions. Experts solve problems in the order denoted by the three phases.They often spend much time on orientation activities and they nearly alwayscheck their solutions.

Note that this theory on problem-solving abstracts from the task. It isapplicable to a very wide range of tasks, from mathematical problem-solving toarchitectural design. The theory makes a prediction about differences betweennovices and experts at these tasks and these predictions concern the processrather than the result. In fact the theory does not even predict that expertswill find better solutions: it only concerns the process.

Another example of a general theory is based on the notion of means-endsanalysis (Newell & Simon, 1972). Consider the difference between the currentknowledge and the goal and select a method or reasoning step that will reducethe difference as much as possible. This general idea has considerable explana-


tory power, especially if it is combined with a theory of memory capacity. Thistheory too abstracts from the content and structure of the task.

In our experience the most important elements in the psychological part ofa model are capacity limits (in particular the capacity of working memory andof perception) and previously acquired knowledge. In most cases it is possibleto obtain an initial model of the knowledge that people will use for a task bylooking at relevant instruction and experience. This acts as a constraint onthe result of task analysis, which is usually broader.

It is not possible to completely separate the psychological part from thetask analysis. For example, the distinction between orientation, solving andevaluation has both a psychological and a rational background. The result oforientation is often necessary for actually solving the problem. This meansthat task analysis will tell us that orientation is necessary to solve the task.However, task analysis involves mainly the rational version: correct orienta-tion behaviour that takes place at the best possible moment.


5.7.2 Example: psychological theories on solving arithmetic wordproblems

In Section 5.6.2 we gave a task analysis that consisted of several differentstrategies for solving a class of arithmetic word problems. We also sketchedthe knowledge that is needed to actually perform the task. Psychological the-ories that are relevant are:

(a) Special theories about arithmetic word problems which try to explaindifferences in difficulty between types of problems and differences betweenchildren’s ability to solve them. However, there are few theories about thecognitive processes involved in solving arithmetic word problems.(b) General theories about the structure (‘architecture’) of the human mind.

For example, one theory explains differences in difficulty between problemsby the order in which schemata are acquired. This theory assumes a ver-sion of the ‘schema application method’. Some problems are more difficultbecause they involve a schema for situations or events that children acquireat a later developmental stage. For example, according to this theory thecompare-schema is acquired later than the combine-schema, which explainsthe difference in difficulty. Another theory focuses on the role of languageunderstanding. Certain expressions may be misunderstood by children, whichcauses errors. For example, the problem text: Mary has 5 marbles. She gives3 marbles to Jim. How many marbles does Mary have? may be misunder-stood as meaning that Mary now has five marbles (even though she gave threemarbles to Jim), which would give the answer ‘five’. Also more general no-tions like ‘orientation’ can be applied here. This theory implies that childrendo not structure and integrate the information in the problem text but im-mediately look for a quick way to solve the problem. These theories can beintegrated with the task analysis and will result in fairly detailed predictionsabout differences in behaviour between problems and subjects.

5.7.3 Example: psychological theories on problem-solving in archi-tectural design

There are of course many psychological theories that have implications forthe way in which architects design buildings. Here we shall follow Hamel’swork that focuses on general properties of the human cognitive architecture.An important psychological factor in architectural design is the limited ca-pacity of working memory. Architectural design involves a large amount of


information as follows from the task analysis. Because the capacity of humanworking memory is limited and information is necessary for further steps inthe design process, architects need to take measures to overcome this problem.According to Hamel there are two methods architects use to solve this problem:

1. They organize the information into a structure of smaller ‘packages’ (the‘problem conception’) in such a way that relevant information will be accessi-ble at the right moment during problem-solving.2. They make notes and sketches which give a quick overview of the currentdesign without creating extra memory load. Notes and sketches act as an‘external memory’ and are accessed by looking at them instead of retrievingthem from memory.

This psychological knowledge can be used in combination with the task anal-ysis to construct a psychological model of architectural design.

5.8 Psychological model

5.8.1 The construction of psychological models

After discussing task analysis and the role of psychological theories, we nowturn to the question how psychological theory and task analysis are integratedinto a psychological model of the cognitive process at hand. The psychologicalmodel will be the basis of predictions about the protocol data. It summa-rizes what we know about how people will behave when performing a task.This means that knowledge about the cognitive mechanisms and about theknowledge that people bring to bear on task are used to modify the result oftask analysis. When comparing the resulting model with the protocol data,discrepancies point to errors in our knowledge and unpredicted behaviour.

The psychological model describes the cognitive process that will take placein the context of a particular task, as implied by a psychological theory. Thepsychological theory may constrain rational behaviour as described in the taskanalysis, but it may also introduce new possible processes that appear, forexample, when a person does not have adequate knowledge for performing atask.

Ideally, you would have a model that predicts what will appear in a pro-tocol. In that case, it would be possible to generate a verbal protocol on thebasis of the model. This latter case is very rare. Normally, the models usedare somewhere between very loose and very strict. What kind of model one


needs is partly dependent on the research question. If you want to test a the-ory, your model needs to be a proper operationalization of this theory. If youwant to demonstrate global differences between two groups of subjects solvinga certain type of problem, your model needs only to specify these global dif-ferences and how they might be reflected in a protocol, without specifying allproblem-solving steps in detail. In the next paragraphs we will come back tothe use of models and how they are constructed.

5.8.2 Example: a psychological model of solving arithmetic wordproblems

In Section 5.6.2 we sketched a task analysis for arithmetic word problems andin Section 5.7.2 we summarized some psychological research that is relevantfor this task. Let us now turn back to the task that we gave as an illustrationin Chapter 1. The problem that we gave as an instance of this task is:

Irene has 6 sweets less than Suzanne. Diana has 5 more than Suzanne. Howmany sweets does Irene have less than Diana?

This task was taken from a class of arithmetic word problems that all havethe general form:

A has N1 ‘things’ less/more than B.A/B/C has N2 less/more than A/B/C.How many ‘things’ does A/B/C have less/more than A/B/C?

So far the task analysis has given us four methods:

1. The algebraic method.2. The approximation method.3. The schema application method.4. The direct recognition method.

Young children have not been taught the knowledge required for the alge-braic method and it is therefore extremely unlikely that they will use it. Thisis also true of the approximation method although this is similar to the rea-soning involved in evaluating the answer. This is something that children maylearn at school. Evaluation is the first step in the approximation method andit may therefore appear in some form. Schemata are likely to be known bychildren at this age but probably not for this particular type of problem. It


is likely that schemata for similar types of arithmetic word problems interferewith their reasoning about these problems. The recognition method is typi-cally a method for experts in a domain. Whether children will use this methodwill therefore depend on the amount of practice that they have with this task.The subject of the example protocol in Chapter 1 had very little experience.

This suggests that a form of schema application is the best candidate. Amore detailed model requires knowledge about the schemata and how theseare applied. This is a good point to change the direction of our search fromtop-down (from refining a general model of the task) to bottom-up, inspectingthe protocols to identify typical reasoning steps from the schema applicationmethod: schema recognition, schema application, selection of an arithmeticoperation and computing the answer. Here is a first attempt:

Phrase Interpretation/comment

1: so Suzanne has 6 more than Irene transforms sentence;fits it into a schema

2: and Diana has 5 more than Suzanne

3: so Suzanne comes first suggests transformation to a‘X more-than Y more-than Z’schema

4: because Suzanne has some 5, er, 6

5: but Diana has 5 more, even more, keyword ‘more’ suggests ‘add’6: so altogether 11 operation suggested by keyword7: but less is asked here conflict with keyword cue8: how many does Irene have less

than Suzanne

9: then you subtract that ‘less’ suggests ‘subtract’10: Diana, Suzanne, Irene returns to schema

Experimenter intervenes11: [reads the problem again]

12: I don’t understand it ...

13: Irene has something of which

Suzanne has 6 more


15: so something must be added

16: and then something must operation suggested bybe subtracted schema; requires value

of Suzanne17: if I knew how many Suzanne had


18: Suzanne has 11, I think operation fromapproximation method

19: Diana has 5 more than Suzanne cannot be checked20: [pause] 10, I think

21: how many does Irene have less

than Diana

22: I think 10

Individual steps in the protocol can be interpreted as steps from the schemaapplication method. For example, several fragments show that a version of the‘direct recognition method’ is used: directly recognize the computation from apattern in the problem (e.g. lines 7-9). Other fragments can be interpreted asattempts to fill in a schema (e.g. lines 3-5 and 14-15). However, the knowledgethat is needed to correctly follow any method is not available. As a result thesubject encounters impasses:

(a) The question of the problem is of the form ‘how many does X have lessthan Y’ where the schema is stated in terms of ‘more’. This confuses her.(b) Although she can fill the ‘X more-than Y more-than Z’ schema this doesnot suggest an arithmetic operation. This may be due to interference fromapplication of this schema to other problems. Most arithmetic word problemsgive the value of variables. The problem presented here gives only relations be-tween variables. If the value is given, a schema easily suggests an appropriatearithmetic operation (even if this is not associated directly with the schema).

There is no easy way to resolve these impasses. The attempts in lines 4 to19 are based on operations that are useful for solving other arithmetic wordproblems. Lines 13-15 show an attempt to apply knowledge from a directrecognition method (keyword ‘more’ suggests ‘add’; keyword ‘less’ suggests‘subtract’) and lines 17-19 show an attempt to apply knowledge from the ap-proximation method. The subject guesses the value of one variable and thentries to evaluate the result. However, both attempts are abandoned becauseshe evaluates them as insufficient.

This bottom-up analysis gives us some building blocks for a model. It alsosuggests several new sub-processes that were not part of the model so far:

(a) It is useful to include methods for similar problems in the model becausethese are likely to play a role even if they are not adequate for the currenttask.


(b) This subject does not follow a single method but mixes methods or ratherparts of methods. The models that we sketched so far did not include this pos-sibility. This is more likely to occur in novice problem-solving because novicesby definition do not know a single adequate method.

5.8.3 Example: a psychological model of architectural design

In the next example of a psychological model, a psychological theory onproblem-solving is combined with specific information from the task analysis.In Figure 5.2 the psychological model of architectural design, as developed byHamel, is given.The model is concerned with the design process, with what architects do andthink while they design. Therefore the model is focused on what is called thetask schema, which describes the procedures involved in design tasks. Theproblem conception schema contains all the information about the state ofthe design process. When the architect is given a design problem, the firstthing to do is to orientate, which means to extract information from the textof the assignment. The actual work on the design problem is the executionphase. This phase consists of two sub-phases: the analysis of the problem andthe styling of the solution to the problem into a design. The third activity inthe task schema is the final evaluation. In the sub-schemata for the analysisand the styling phase the same divisions are found: orientation, execution andevaluation. In the analysis schema the orientation activities are: gatheringinformation, decomposing the problem, and finding partial solutions. Theexecution phase of the analysis schema is further decomposed into the synthesisschema. This is an important phase because solutions to partial problemshave to be combined to form one solution to the design as a whole. This sub-schema also is decomposed into orientation, execution and evaluation. Thestyling schema relates to the activities in the design phase where the solutionfound in the analysis phase is so styled that it meets architectural criteria.These criteria involve the aesthetic value of the design and the elegance ofthe solution: reaching maximal results with minimal means. So if the endproduct of the analysis activities is an intermediate design which meets thetechnical requirements of the client, the end product of the styling activitiesis a complete design which meets aesthetic and professional criteria.

For each of the general activities in the model, specific activities as found inthe task analysis can be distinguished. For example on the synthesis schemalevel, orientation consists of re-reading the givens and making estimates onthe combinations of design aspects. On the styling schema level, orientationconsists of studying one’s own sketches and making estimates on the outlook


Problem conception

schema

Working memory

Orientation

Execution A

Execution B

OrientationExecution

Synthesis schema

OrientationExecutionEvaluation


Evaluation

Styling schema

Analysis schema

Task schema

Long term

memory

Evaluation

FIGURE 5.2: The psychological model of architectural design


of the building.

The structure of the model as a whole is nested. The sequence of theactivities during the design process follows from this structure. The modelpostulates the order of the four levels and within these the order of the threecategories orientation, execution and evaluation. It does not postulate, how-ever, the order of the activities within each of these categories. So one canpredict that evaluation in a certain schema comes after the execution, but onecannot predict whether an architect first re-reads the givens and then makesan estimate or vice versa. It is possible to postulate the predicted order ofcategories of activities in terms of transitions between these categories. Thisleads to a matrix of admitted transitions.

For further details see Appendix E where all the specific activities belongingto the different main categories are listed.

5.9 Dimensions of models

Models may differ on a lot of dimensions:

Task-oriented models versus psychological models: The research ques-tion determines whether the model will be task oriented or psychologicallyoriented. If the question is the acquisition of knowledge about how a certaintask can be performed then the model will focus on the problem-solving pro-cesses of experts, ignoring mistakes and inefficient methods. An example is amodel of an expert performing a troubleshooting task. The goal is to acquireknowledge about how the troubleshooting can be done in order to build anexpert system.

Task-specific versus task-independent: A model may be formulated invery specific terms which are only used in a particular task. An example is amodel of solving a physics problem which uses terms like applying Newton’sSecond Law. A task-independent model will use only general terms, appli-cable to a wider class of tasks. An example of a task-independent model isa model using general terms like ‘applying law’, without specification of theexact nature of this law, so it can be used in several physics tasks, or stillmore general ‘applying rule’, so it can be used in all science tasks. Even adomain-independent model will have a limited range, over a number of do-mains. A model cannot cover all human problem-solving. ‘Domain specificity’and ‘fineness of grain’ do not exactly map. Although very fine-grained mod-els will often not cover many different domains, it is still possible to have a


domain-independent model which is fine-grained, and to have coarse-grainedmodels which are very domain-specific.

Process structure versus process properties: Although protocol anal-ysis is always directed at cognitive processes, the main focus may either beproperty or dimension of a process (such as the proportion orientation ac-tivities) or the process structure: the sequence and conditions under whichcognitive actions take place.

Scope: Some models cover, for example, the entire problem-solving process,where others only concern certain aspects or sub-processes. For example, astudy may focus on how children transform sentences to equations in arith-metic word problems. Although complete protocols are taken, only a sub-process is relevant. This may not appear in certain protocols at all!

Individual differences versus general human behaviour: Models maydeal with individual differences. An example is a model of how different typesof individuals react to the same treatment. Models may also deal with howpeople behave in general. An example is a model of how people retrieve in-formation when under strain, when they have to perform for instance severaltasks at one and the same time. In the latter case the researcher is not in-terested in individual differences but in general possibilities and limitations ofhuman behaviour.

Deterministic versus non-deterministic models: Task analysis and psy-chological model together may not specify a single possible cognitive processbut a (possibly very large) set of possible processes. This means that it willnot give a single prediction about what we can expect in a think aloud protocolof a person performing the task. Note that this is different from a model thatpredicts different behaviour depending on the problem data (or on the resultsof later reasoning). Such a model also covers a wide range of processes but itmay still give a single unique prediction about each problem (or intermediatesituation). Non-deterministic models specify a set of possibilities. This simplyshows our ignorance. In psychological research this means that our currenttheory about the process is weak. It will be tested by checking if the protocolsmatch at least one possible process. In knowledge acquisition, the knowledgeengineer will use the expert’s behaviour and the correctness of the result asthe main criteria to choose a particular model. For reasons of explanation itmay even be useful to elaborate and implement all variants.


Granularity: A model may describe the problem-solving process in verydetailed steps. For example, a model of doing subtractions in columns mayinclude all the different steps needed to perform the operation: selecting acolumn, reading the first number, reading the second number, subtracting thelower number from the upper number, writing down the result etc. The granu-larity of a model must be at least as fine as the psychological theory involved (orthe level of detail required in the task analysis, if that is constructed bottom-up) and it must also match the categories one distinguishes in the expectedutterances reflected in the protocol. A model with reasonably fine granularityis needed when all separate utterances made in a think aloud protocol are tobe represented in a model. A model may also deal with broader categories ofdescriptions. For example, a model of the process of writing an essay will notdescribe every step an author may take, but will have categories like searchingfor new information, making notes, formulating a question. All these stepsmay take a long time and involve many different sub-steps. Another exampleis the solving of complex physics problems, where doing subtraction is only asub-step of the category ‘calculating’. Another possibility is to define (dimen-sional or categorical) properties of reasoning steps that involve abstraction.However, if the theory or the task analysis requires a level of detail that doesnot appear in think aloud protocols, the method is simply inadequate.

Distance to verbal data: The model may use terms which are more orless close to the verbal data in the protocol. Steps in the model like readingthe problem out loud are very close to the data to be found in the protocols.Steps like orientation on the problem are further away, it is not readily seenthat reading out loud belongs to this step. We will discuss these issues in thesection on making a coding scheme in Chapter 7.

As we indicated in the descriptions, the choice one makes with respect tothese dimensions depends on the goal of the study, the level of detail found inthe protocols and the need for a computational model.

5.10 On the boundaries of task analysis and model con-struction

Difficult decisions concern (a) the scope and depth of the task analysis and(b) the degree of formalization. The knowledge that is potentially relevantis usually enormous for real-life problem-solving tasks. Architectural design,for example, may at some point require theories of human perception, heat


flow and even more remote knowledge. Where to stop depends on the purposeof the task analysis. In the case of top-down psychological research this means:

1. Interpreting the theory of problem-solving in terms of the problem-solvingtask.2. Finding a psychological model that will allow defining a coding scheme thatis adequate for the problem-solving task that will be used in the experiment.

To decide if the second goal has been achieved, it may be necessary to tryto define a coding scheme and test it on pilot protocols. In the case of ex-ploratory psychological research it is more difficult to determine the scope ofthe task analysis.

It may seem that in the context of building knowledge-based systems, taskanalysis is putting the cart before the horse. The possible solutions, givens,possible solution methods, etc. are just what we need to know. However, inthis case too it is more effective to start by a task analysis from other sourcesthan think aloud protocols, such as textbooks. In this case the scope of thetask analysis will be determined by the requirements for the knowledge-basedsystems instead of the experimental tasks. It may happen that an expert hasunique expertise.

The example of a psychological model of architectural design that we dis-cussed was rather informal. In psychological research it often useful to usemore formal languages to formulate a model. In some cases it may even beuseful to use an executable computer language and build an executable model.In the next chapter we discuss possible languages with examples of models ofsolving arithmetic word problems.

Literature

Theories and models of the way in which children solve arithmetic word prob-lems are given by Kintsch (1985), Riley et al. (1983) and Sandberg & deRuiter (1985). There are many possible problem-solving methods that canbe used as the basis for models. Some of these are described in the ArtificialIntelligence and cognitive science literature. See Patil, 1988, for an overviewof methods for medical diagnosis and for example Mitri, 1991, for a discussionof methods for ‘candidate evaluation’. See Clancey, 1988; Kuipers & Kassire,1984; Kuipers et al., 1988, for models of medical problem-solving based onthink aloud protocols.

Chapter 6

Languages for task analysis andpsychological modelling

6.1 Introduction

A task analysis and a psychological model must be represented in some form,in a language. In addition to normal human language more formal languagesare used for this purpose. In knowledge acquisition a structured, more or lessformal language is always necessary, even if it were only the language in whichthe resulting computer program is written. Even though knowledge-basedsystems usually address only a relatively small and specialized task, they mayrequire many different types of knowledge such as concept definitions, scientifictheories, experience and rules of thumb. To make all this work together, todocument the knowledge for later modification or for collaboration betweenpeople working on the task of building the system, and to make it executableby a computer, requires that it is expressed in a structured form.

In psychological research it is sometimes possible, and even advantageous,to avoid complex models by using tasks that are basically simple and involvelittle knowledge. This makes it easier to understand what people do whensolving a problem. However, even rather simple tasks, such as puzzles, oftenlead to complex problem-solving processes, in particular when the person solv-ing the problem has no routine, clear-cut way of finding the solution. In thiscase too, a formal language will help to formulate the task analysis and thepsychological model clearly.

There are many different languages that can be used for this purpose. Thechoice of a language is not of decisive importance when building a model but


an appropriate language will make the task analysis or model more conciseand easier to understand. In the previous chapter we used language based onprocedures and sub-procedures to formulate models. For some models suchlanguage is less suitable, for example because it is not executable. To formulatesuch models different languages are more appropriate. The most importantproperties of a language are:

(a) It must contain constructs that allow compact and clear expression ofthe important elements in a task analysis or psychological model. For exam-ple, if the model focuses on problem-solving strategy then the language shouldcontain constructs for representing strategies, such as ‘if X then do Y elsedo Z’ or ‘repeat X until Y’. If, on the other hand, the model emphasizesthe representation of the problem, it should have constructs to represent, forexample, structured objects, possibly graphical objects. In well-formalized do-mains it is usually possible to borrow the language that is used in the domain.For example, if we model arithmetic problem-solving, we use a formal languagethat is used to describe arithmetic procedures.(b) Depending on the way in which it will be used, the language may haveto be executable by a computer. Some formal languages are executable on acomputer, which means that if a complete model is specified in the language, itcan in principle be applied to a problem and executed as a computer program.Then it will ‘animate’ the model and simulate the behaviour that it describes.

The format of the task analysis and the psychological model for solving arith-metic word problems and architectural design was rather informal. This isadequate if no computational model is needed and if we want to focus on dif-ferences in procedures. However, we may want a more formal description oreven an implemented simulation program, for example to facilitate analysis ofthe implications and assumptions of a model. In that case we need a moreformal representation language. What language shall we use? A languagethat is popular in psychology is (linear) algebraic equations or functions. Sev-eral studies of arithmetic word problems formulate their model as for example:

difficulty(Problem) = g × schematype(Problem)

where g is a number representing the statistical association between schematypeand difficulty and where schematype(Problem) is the type of the schema thatapplies to the problem (see Section 5.6.2). More complex models take studentproperties into account, which would give for example:

Languages for task analysis and psychological modelling 81

difficulty(Problem, Person) =g1× schematype(Problem) + g2× intelligence(Person, Problem)

This type of model is also used in building knowledge-based systems. Forexample, in a knowledge-based system that finds a diagnosis from a set ofdata on a patient we could look for a function of the form:

WeightedSum = g1×symptom−1+g2×symptom−2+...+g3×symptom−n

and decide on a threshold for WeightedSum to make a final decision. How-ever, it is usually rather difficult, if not impossible, to obtain such weights.They cannot be found in the protocols and they are hard to acquire otherwise.This would also not give us a model of the cognitive process but only of theinput and output of the process. Algebraic equations about properties suchas schematype, intelligence and symptom are clearly an inappropriate lan-guage to express procedural models. More appropriate are special procedurallanguages.

Frequently used languages are: conceptual modelling languages, produc-tion rules, problem behaviour graphs and pseudo programming languages. Inthis section we will only present a few, frequently used, formal modelling lan-guages. These languages are presented in a simplified form, for more complexand detailed descriptions we refer to the literature. However, even with simpleversions, it is possible to represent sophisticated models.

In the context of this book it is not possible to present formal modellinglanguages in full depth. The purpose of the remainder of this chapter is toillustrate several frequently used modelling languages and to explain them withexamples to a level at which you can make models yourself.

6.2 A conceptual modelling language

6.2.1 CPML (Conceptual Protocol Modelling Language)

We start our discussion with languages that are more formal than just verbaldescriptions, but that involve fewer details than for example programming lan-guages. The first language that we discuss here, CPML (Conceptual ProtocolModelling Language) is a simplified version of a language that was designed fordocumenting knowledge in the course of knowledge engineering. The knowl-edge that is acquired by interviewing experts, consulting textbooks and othersources, must be documented before it will be used as the basis of a com-


puter system. The KADS methodology (Schreiber et al., 1993) for developingknowledge-based systems assigns an important role to the analysis of thinkaloud protocols. To be able to better handle the complexity of human exper-tise, the KADS methodology employs not one but several modelling languages.This allows a stepwise transformation between verbal data on expertise to acomputer system. Here we shall describe a version of the Conceptual ProtocolModelling Language, the first level of formalization. The second and thirdlevels consist of a formal specification language based on predicate logic andon more technical characteristics of the computer system. We shall not discussthese here but in the literature references at the end of this chapter we point tosources for this. Although CPML was designed for building knowledge-basedsystems, it is well suited to model any kind of cognitive process.

CPML distinguishes different types of knowledge. In many problem-solvingtasks there is a useful distinction between descriptive domain knowledge andmethods for problem-solving. In architectural design one can distinguish be-tween on the one hand, the overall design method and on the other hand,knowledge about requirements, regulations, possible materials and structuresof buildings. These are clearly separated in CPML but not in certain other lan-guages as we shall see below. CPML relies on concepts that have an intuitiveinterpretation. It involves a structure of three different layers of knowledge.The first layer, the domain layer, contains the descriptive knowledge that canbe used to reason from givens to solution (including useless reasoning steps).The second layer, the inference layer, describes the types of reasoning stepsand the third layer, the task layer, specifies the conditions under which suchtypes of steps are taken.

6.2.2 Domain layer

At the domain layer the descriptive knowledge about the domain is repre-sented. For example, the domain layer of the architectural design task wouldinvolve definitions of concepts that appear in the initial assignment as it isgiven to the architect, the concepts and language for the design that the ar-chitect produces and all concepts, facts, definitions, rules etc. that play arole in the task. Examples of concepts are roof, door, entrance and variousmaterials, costs associated with parts of the design, functions of parts of thedesign, ways of using it, risks associated with them, and stylistic properties.In arithmetic word problems the domain layer would contain descriptions ofpossible ‘problem texts’ and ‘equations’ and of the knowledge involved in thesolution process.

In CPML there is no fixed syntactic form for representing descriptive do-


main knowledge. Forms that are frequently used here are lists of concepts,concepts structured in hierarchies or trees or if-then rules. Representationstructures that can be used to represent domain knowledge are frames, tablesor networks. These notions are taken from computer science and artificial in-telligence. We shall not discuss these notions in detail and we use them in anintuitive way. At the end of this chapter we give pointers to the literature.

To illustrate this we give a sample of the domain of a model of solvingarithmetic word problems such as:

John has 5 apples. Ann has 2 apples. How many apples do John and Annhave altogether?

In this case the domain layer is represented in the form of concept defini-tions. In the example below these are: problem text, sentence, person,object and equation.

Problem text:

a series of sentences

Sentence:

Object-1 Action-1 Object-2

where: Object-1 is a person

Object-2 is an object

Action is an action (has, earns, holds)

Object-1 has Number Object-2 more/less than Object-3

where: Object-1 is a person

Number is a number

Object-2 is an object

Object-3 is a person

...

Person:

John

Mary

Ann

...

Object:

marble

apple


balloon

...

Equation:

Var = Nr

where: Var is a variable

Nr is a number

Var1 + Var2 = Nr

where: Var1 is a variable

Var2 is a variable

Nr is a number

Var1 - Var2 = Nr

...

The problem text can be defined further in terms of expressions that mayoccur in it. The definition of equation could be extended by other types ofequations and of variables.

When formulating the domain layer one must keep in mind that knowledgeat the domain layer must be associated with a knowledge source at the nextlayer, the inference layer. This means that structures at the domain layermust correspond to structures in the use of knowledge. For example, a modelmay say that there is a process translate that infers an equation from aproblem text. Here the translate reasoning step uses problem text andequation. It must therefore be possible to define these concepts in terms ofthe domain knowledge. In addition to these concept definitions, the domainlayer will contain rules that relate problem text to equations. These rules canbe formulated analogous to those that we shall give in Section 6.4.3.

6.2.3 Inference layer

At this layer procedural knowledge is represented that is used to make infer-ences based on the problem data and the knowledge at the domain layer. Forexample, in the analysis of the architectural design task at this layer we willfind knowledge for evaluating a partial design with respect to requirements,for proposing extensions or revisions, etc. The inference layer is representedas knowledge sources and metaclasses. Knowledge sources are procedures andmetaclasses are the input and output of procedures. The inference layer de-scribes procedures and their relations but it does not specify the order in whichprocesses take place or the conditions under which they take place. The do-main layer is connected to the inference layer in two ways:


MC-1 MC-3

MC-2

KS-1

FIGURE 6.1: Knowledge source with metaclasses

1. Metaclasses are associated with elements in the domain layer. For example,the metaclass equation can be used in the inference layer and this means thatan equation will appear at that point in the reasoning process. The connectionis made by naming the metaclasses after concepts in the domain layer. Every-thing that belongs to this concept belongs to the metaclass and can thereforeplay the role in the reasoning process that is specified by the metaclass.2. Knowledge sources are also associated with domain knowledge. The do-main knowledge associated with a knowledge source is called the theory ofthat knowledge source. The knowledge in this theory is used to find elementsof the output metaclass(es) from elements of the input metaclass(es). In ourdiscussion we shall not distinguish between the knowledge source and its as-sociated theory.

For the inference layer a standard diagram format is used. Knowledge sourcesare represented as ellipses with the name of the knowledge source inside andmetaclasses are represented as rectangles with the name inside, see Figure 6.1.


This figure represents a procedure that uses input data from two sources,expressed by the arrows pointing into the ellipse and that produces one outputexpressed by the outgoing arrow. Each input arrow must either come fromanother knowledge source or it must be explicitly labelled as input to the wholeprocess. Similarly, each output arrow must point to another knowledge sourceor be declared as output of the whole process. It is not allowed to connecttwo knowledge sources (i.e. without intermediate metaclass(es)) or to connecttwo metaclasses (without intermediate knowledge source(s)). This forces oneto be explicit about the role of the domain knowledge.

An important property of a language is the ability to create abstractions,where details of a model can be hidden. In CPML this is done by allowinga knowledge source to be elaborated in a sub-inference model. To keep themodel coherent, the connections between a box at the higher level and itsneighbours must appear in the inference model. Consider the example inFigure 6.2. Model-2 elaborates the knowledge source KS-1 in Model-1. Notethat the metaclass MC-1 in Model-1 corresponds directly to MC-1.1 in Model-2.MC-2 in Model-1 is split into MC-2.1 and MC-2.2 in Model-2.To illustrate inference layers we give in Figures 6.3 to 6.5 three inference modelsfor solving arithmetic word problems described in Sections 5.6.2 and 5.8.2.Note that all metaclasses must correspond to terms that are defined on thedomain layer and all terms on the domain layer must belong to at least onemetaclass. If a metaclass does not correspond to domain concepts it is not clearwhich domain knowledge is to be applied at the reasoning steps described bythe knowledge source. If domain knowledge is not associated with either meta-classes or a knowledge source then it is not clear for which type of reasoningsteps it is used.

Also note that the inference layer does not fully specify under which con-ditions a knowledge source applies. For example, in the model of the schemaapplication method (Figure 6.5) the features are used as input to both recog-nize schema and fill in schema. The diagram does not indicate which of thesemust be done first. This is specified at the next layer up, the task layer.

6.2.4 Task layer

The inference layer only models which processes need input from which otherprocesses but it does not show in which order, how often or under whichconditions (in terms of the data) a process takes place. This is done at thetask layer.


MC-1.1

MC-2.1

MC-2.2

KS-1.1

KS-1.2

MC-4

MC-5

KS-1.3 M-3.1

MC-1

MC-2

KS-1 MC-3

Model-2:

Model-1:

FIGURE 6.2: Layered inference model


problemtext

interpret

equations

solveequations

solution

FIGURE 6.3: The algebraic method

The task layer is expressed in a procedural language, using the following con-structs:

Elementary procedures:

knowledge-source(input: metaclasses; output: metaclasses)

Conditional procedures:

IF condition

THEN procedure-1

ELSE procedure-2

Iterative procedures:

REPEAT procedure

UNTIL condition

In the task layer procedure names must correspond to names of knowledgesources, parameters must correspond to metaclasses of domain concepts andconditions apply to possible instances of these concepts. For example, the tasklayer for the schema application method could be:

Solve problem (input: problem text; output: answer):

REPEAT

Read (problem text, sentence)

Interpret (sentence, equation)

UNTIL no more sentences

Solve equations (equations, answer)

Write (answer)

Here Read(problem text, sentence) represents the knowledge source Read

with metaclasses problem text and sentence. The condition no more sentences


read problem

interpretedproblem

guessanswer

possibleanswer

evaluate

evaluation

interpretedproblem

possibleanswer

fill inpossible answer

new interpretedanswer

inferimplications

implications

check ifconsistent

evaluation

evaluate

FIGURE 6.4: The approximation method


problemtext

recognizefeatures

features

recognizeschema

schema

fill in schema find solutionmethod

filled in schema solution method

applymethod

solution

FIGURE 6.5: The schema application method


applies to the current value of the metaclass problem text. The knowledgesources Read and Write need not be elaborated here. For the knowledgesource Solve equations another inference model can be formulated that hasthe same input and output metaclasses.

6.2.5 Example: a CPML model of architectural design

In this example we illustrate the use of CPML with a model of architecturaldesign. Figure 5.2 shows the structure of the reasoning process. Here we elab-orate the sub-process that corresponds to the analysis schema in Figure 5.2.This process consists of three sub-processes: orientation, synthesis and eval-uation. In Figure 5.2 the synthesis sub-process is elaborated separately inthe synthesis schema. These processes use information from the problem con-ception, the information about the current problem that has been collectedand derived at a particular moment during problem-solving and they also useknowledge from long-term memory.

In CPML the reasoning processes correspond to knowledge sources and theelements of the problem conception that are used by a knowledge source corre-spond to the metaclasses. The knowledge in long-term memory corresponds tothe knowledge in the domain layer. The model in Figure 5.2 does not specifythe knowledge at the CPML task layer. Below we translate parts of the modelpresented in Figure 5.2 into CPML, we will present part of the inference layer(Figure 6.6) and also we add a possible task layer and examples of domainknowledge.

Domain layer:

• Metaclass requirements: Input to the analysis process are the require-ments on the design. The form of these can vary over tasks. Here we only givea few examples:

functional requirements

maps and drawings

geographical requirements

photographs

aesthetic requirements

budget requirements

• Metaclass structured requirements: The initial requirements are decom-posed into requirements for components of the design and the ‘sub-designs’


orientation

synthesis

evaluation

requirements

structuredrequirements

functionallyadequate

design

evaluated design

FIGURE 6.6: Part of an inference layer of architectural design

are partially elaborated. The ‘sub-designs’ concern rooms and other parts ofthe building. Examples are:

room

livingroom

meetingroom

playroom

garage

bathroom

cloakroom

corridor

outdoor spaces

garden

playground

access road

entrance

For each basic component design choices are made about properties of thecomponent. Examples are:

Room:


the approximate size

the presence of windows

requirements for pipes

orientation with respect to sun

• Metaclass functionally adequate design: This refers to designs that arecomplete in the sense that they are composed from the partial designs gener-ated above in such a way that requirements are not violated. The format ofthis consists of:

sketches and diagrams

approximate measures

sketches of floor plans

sketches of cross-sections

positioning in geographical layout

• Metaclass evaluated design: This is the result of evaluating the function-ally adequate design in terms of the initial requirements to test if indeed nooverall requirements have been violated. It consists of the design plus violatedrequirements (if any). This metaclass therefore refers to the same domain layerelements as functionally adequate design plus requirements.

• Knowledge source orientation: Below some examples of rules are givenfor orientation. The first rule represents that, if one of the requirements is aparking facility then one of the ‘rooms’ must be a ‘parking space’. A parkingspace has several attributes (size, floor size, access, ventilation, fire escape)that will later on have to be established. The next rule states that if the ‘floorsize’ of a parking space is unknown, the number of cars that are to fit intothe parking space must be requested (from the client). The third rule repre-sents knowledge for calculating the (approximate) floor size from the numberof cars. The last rule shows a decomposition of a functional requirement intorequirements for sub-functions (corresponding to required rooms).


orientation:

IF required: parking facility

THEN ADD: parking space:

floor size =

access =

ventilation =

fire escape =

IF required: floor size of parking space

AND number of cars = ‘unknown’

THEN REQUEST: number of cars

IF required: floor size of parking space

AND number of cars = N

THEN floor size = 2.5 * N

IF required: shop

THEN ADD: required: shoproom

required: storageroom

required: coffee corner

required: toilet

• Knowledge source synthesis: This contains knowledge for composing thebuilding blocks into a complete design. In the model in Figure 5.2 this iselaborated in the sub-model Synthesis schema. To simplify the illustration weonly give an example of synthesis domain knowledge.

synthesis:

IF required: parking space

AND possible: basement

THEN insert parking space in basement

• Knowledge source evaluation: Evaluate each requirement from the problemstatement in terms of the functionally adequate design.

Task layer:

The task layer of this model concerns the choice to be made when an evalu-ation is reached. This is the only point where the model does not specify the


course of the reasoning process and therefore the only point where a task layermakes sense. Depending on the content of the evaluation, the whole processstops or returns to synthesis or to orientation. Hamel’s analysis does notspecify how this is decided so we formulate a hypothesis:

IF there are no violated requirements

THEN stop process

IF there is a violated requirement that involves only 1 component

THEN return to synthesis

IF there is a violated requirement that involves more components

THEN return to orientation

It will be clear that it will be very hard to elaborate this model in generalto this level of detail. However, if the study is restricted to particular designtasks then it is possible to elaborate the domain layer for that special case. Ofcourse this level of detail is not always necessary. The model can be kept moreabstract and only refers to categories of processes rather than their detailedcontent.

6.2.6 Concluding remarks

The modelling language CPML is well suited for the analysis of protocols, be-cause there is a natural correspondence between protocol elements and modelelements as follows:

CPML Protocol elementsdomain layer concepts/wordsinference layer types of transitions between conceptstask layer conditions under which transitions occur

Conceptual models can be more or less detailed in their description of bothprocesses and knowledge that plays a role in problem-solving. Less detailedmodels are expressed by giving only general domain knowledge and only partialdomain layer theories. A knowledge source may correspond to a single protocolsegment, a larger protocol fragment or part of a segment. In the latter casethe analysis of the protocols is facilitated if sub-knowledge sources are definedthat do correspond to a single protocol fragment.

Think aloud protocols normally consist of intermediate results of a cogni-tive process. They usually do not contain verbalizations of mechanisms thatproduce these results or of the (general) conditions under which transitionstake place. That means that only the metaclasses (or rather their instances:


the actual domain concepts) of the most detailed knowledge sources appear inthe protocols. The knowledge used for reasoning steps (the knowledge sources)must be able to reproduce the protocol data.

Conceptual modelling has proven very useful in practice. In some situationsit is enough to construct only part of a full conceptual model. For example, ifwe are only concerned with the structure of the reasoning process and not withthe domain knowledge, then it may be enough to only specify the inferencemodel and the task model and not the domain model. This corresponds to theconstruction of pseudo programs. Alternatively, only the types of reasoningsteps may be relevant and not the order or conditions under which they appear.In that case only the inference layer is relevant.

6.3 Pseudo programming language

The language that we used for the task layer of the models in the previoussections is a member of a family of languages that are referred to as pseudoprogramming languages. These languages are derived from programming lan-guages like Basic, Fortran, Algol and Pascal. The difference with the actualprogramming languages is that pseudo programs cannot be executed. Thesyntax of the pseudo programming languages is usually less strict than theactual languages, constructs are used that do not exist in the programminglanguage and datastructures are less well defined. Constructs, references todata and sub-procedures that have a clear intuitive meaning are not elabo-rated. The advantage over actual programming languages is that less care isrequired to formulate a model in a pseudo programming language than to con-struct a comparable running computer program in a programming language.That the model does not ‘run’ is sometimes not important.

Compared to CPML, pseudo programming languages do not have a stan-dard representation that allows abstraction from the data and knowledge thatare involved in reasoning processes. Examples of constructs in pseudo pro-gramming languages are:

IF Condition THEN Action ELSE

REPEAT Action UNTIL Condition

ASSIGN Value TO Variable

In an ad hoc defined pseudo programming language, the models of the cognitiveprocesses in solving arithmetic word problems that we gave above can beformulated as variants of a general procedure. This highlights what the modelshave in common and where they differ:


Solve problem:

REPEAT

Read from problem text to: sentence;

Translate (sentence, equation)

UNTIL no more sentences

Solve equations (equations, answer)

Write (answer)

Variant 1: Algebraic model

Translate (sentence, equation):

Parse sentence (sentence, parsetree)

Translate to equation (parsetree, equation)

Here ‘parsing’ means finding the grammatical structure of the sentence. Ele-ments such as the subject, the object and the action of a sentence are identifiedas a preparatory step toward translation.

Variant 2: Schema application model

Interpret (sentence, equation):

Recognize keywords (problem text, keywords)

Select schema (key words, schema)

Fill slots of the schema (schema, filled-schema)

Translate to equation (filled-schema, equation)

As the example shows, procedures in pseudo programming language can con-tain sub-procedures. It is possible that a model does not specify a singlepossible procedure but only several possible procedures. These alternativemodels can be represented by alternative sub-procedures.

Although this pseudo programming language resembles real programminglanguages more than CPML, there is still a significant difference betweenpseudo programming languages and real programming languages. Unfortu-nately, it is difficult for non-programmers to appreciate the difference betweenwriting pseudo programs and writing real, running programs. We shall not tryto convey this here but simply warn the unexperienced reader who believesthat his or her paper and pencil model is almost a program that, in general,he or she is not even half way at this point.

The relation between pseudo programs and think aloud protocols is similarto that between CPML and protocols. Protocols show the intermediate results


rather than the operations themselves. Variations in models appear as alter-native programs. Here too, commonalities and differences can be highlightedby sharing sub-procedures and datastructures.

In general, we recommend using CPML unless the model remains very sim-ple. The extra abstraction mechanisms in CPML are very useful for buildingmore complex models.

6.4 Problem Behaviour Graph and production rule sys-tems

6.4.1 Problem Behaviour Graph (PBG)

Newell and Simon (1972) formulated a psychological theory of problem-solvingwith an associated representation scheme. One of the key ideas of this theoryis that problem-solving proceeds through a sequence of steps, each of whichchanges the knowledge that a person has about the problem at hand. Whenproblem-solving starts only the problem givens are known. This is the initialproblem state. Knowledge is applied that changes this initial state. In general,it adds something to the initial state. Problem-solving ends when the solutionis known. The knowledge that is used to change one state to another is calledoperator. A solution process can then be viewed as a sequence of operatorapplications and states that starts with an initial state and that ends with agoal state.

The knowledge that people use to solve a problem is partially given inthe instruction to a problem or to a general task and for the rest peopleuse additional knowledge that they have acquired before. In most cases theknowledge that is provided in the instruction to a task can be translated intooperators that can be applied to the problem givens, the initial state. However,in general more than one of these operators can be applied to the initial state.This means that the person solving the problem must decide which operatorto apply to a particular problem.

In the framework of Newell and Simon this is modelled by making a dis-tinction between the actual operators and ‘control knowledge’ for selecting anoperator. The latter becomes the focus of interest, because the operators arederived directly from the instructions or from general knowledge. Newell andSimon introduced the concept problem space to describe properties of this con-trol knowledge in relation to the operators. A problem space is characterizedby:


1. States: States consist of the knowledge at a given time about the problem,such as the results of some part of the problem-solving process. There are twospecial kinds of states:• Initial state: the state of knowledge in the beginning of the problem-solvingprocess, such as the givens of a problem.• Goal state: the state which has to be reached, such as the goal of theproblem or the requested solution.2. Operators: an operator is the connection between two states. When anoperator is applied to a state, a new state is produced. For example, when anadding operator is applied to a state with two numbers, the new state consistsof the previous state plus the sum of the two numbers.

Because usually more than one operator can be applied to a state, the ini-tial states and the operators together define a graph. An actual solution pathis a path through this graph.

6.4.2 Example: part of a PBG model of architectural design

To make this abstract notion more concrete, consider the following example ofa partial PBG for architectural design. A problem state is represented as ob-jects corresponding to the notions in Hamel’s model (see Figure 5.2): problemconception, task schema, analysis schema, etc. There will be special operatorsthat manipulate schemata (for example to create a new schema, to transferresults from one schema to another, to stop working on a schema and to startworking on another) and operators that operate in the context of a singleschema. The information that is used in architectural design is substantialand complex. This will result in very complex operators if an effort is madeto express the full process in detail. An extra complication are sketches anddiagrams that are quickly scanned by an architect. These processes are hardto model. In his study Hamel did not go into that level of detail. The followingoperators are therefore constructed to illustrate the use of PBGs.


Some example state descriptions:

problem conception:

requirements:

function 1: price = environment:

name = available space = style =

persons = limit on height = height =

activities = required materials = colour =

noise level = ... ...

floor scratching =

...

current design:

surface size = playground: building:

expected price = size = size =

style = surface size = shape =

drawing = surface material = material =

main colours = colour =

... map =

...

room 1: room 2:

function = function =

floor size = floor size =

height = height =

shape = shape =

heated by = heated by =

windows: windows:

size = size =

shape = shape =

colour = colour =

material = material =

floor material = floor material =

colour = colour =

number of levels = number of levels =

electricity connections = electricity connections =

gas connection = gas connection =

isolation walls = isolation walls =

isolation floor = isolation floor =

isolation top/roof = isolation top/roof =


In the operators below objects such as room 1 and playground appear asarguments. For example, the term size(playground) refers to the propertysize of the playground in the design. Some operators are defined in general,abstracting from the object involved. In that case the unspecified argument isrepresented as a capital letter.

Some examples of operators:

Op1: IF required function(X) = folk dancing

THEN required floor size(X) = 80 sq. metres

Op2: IF floor size(X) = Y AND floor material(X) = wood AND

Z = X * 120

THEN price floor(X) = Z


THEN design floor material = wood


THEN design floor material = ceramic

Op5: IF floor size(X) = Y AND floor material(X) = ceramic AND

Z = X * 200

THEN price floor(X) = Z

Op6: IF persons = children

THEN colours = {yellow, blue, red, white, green}

Op7: IF total-set(rooms) = RoomSet AND sum(prices(RoomSet)) = SUM

THEN roomsPrice = SUM

Like Conceptual Modelling, this is a more abstract language than a program-ming language. It does not specify many details that would be required fora programming language. The PBG formalism has no fixed language for thestates or for the operations that change the states. The language for this isto be defined. This can be a computer language but it can also be an infor-mal, non-executable language. A knowledge state is intended to represent theknowledge that exists in the memory of a person who is solving a problem.

6.4.3 Production rules

Newell and Simon adopted an existing formal language that fits very well intothis model as a more formal modelling language: production rule systems.Production rule systems are actually a family of formal and executable lan-guages. These languages all share the following basic structure but they differin a wide range of other properties. All production rule systems (PS) consist


of productionindexproduction rules rules and a working memory (WM). Theproduction rules have the form:

IF conditions THEN actions

The conditions test if elements are present in, or absent from, working mem-ory. A positive condition is true if the element is in working memory and falseif it is not. For a negative condition it is just the other way round. The ac-tions are either add or delete actions, adding or deleting elements in workingmemory. Production rule models thus assume a mechanism that can performmatching, selection and ‘adding’ and ‘deleting’. We illustrate this notion witha miniature example of a production rule system for solving arithmetic wordproblems by the schema application method. The model is designed for simpleproblems such as:

Mary has 5 apples more than John. John has 3 apples. How many applesdoes Mary have?

We need a few datastructures to represent sentences and schemata. We assumehere that the sentences are represented as follows:

[<sentence number> <word number> <word>]

For example, the problem above would appear as:

[1 1 Mary]

[1 2 has]

[1 3 5]

[1 4 apples]

[1 5 more]

[1 6 than]

[1 7 John]

[2 1 John]

etc.

Once the format for the elements in working memory is fixed, it is possible todefine rules that can apply to the content of working memory. Here are a fewexamples of production rules. The format for the rules is:

<patterns in working memory> --> ADD(element) or DELETE(element)


PRODUCTION RULES:

[? ? before] --> [ADD(change)]

[? ? now] --> [ADD(change)]

[? ? loses] --> [ADD(change)]

[? X1 more] [? X2 than] [one-more(X2, X1)] --> [ADD(compare)]

These four rules look for keywords in the problem text. If they find a keywordthen they add the name of the schema that corresponds to the keyword tothe working memory. The fourth rule does not look for a single keyword butfor a combination of two keywords more and than that occur together. Thisis necessary because, for example, more by itself is not enough to recognize aschema. Note that here we need an extension to the language: one-more. Ina programming language each expression must either have a special meaningin the programming language or it must be defined by the programmer. If wewrite these rules informally then we can be a little less careful here.

If the text of the example were expressed in this notation scheme the firstrule above would add change to working memory. This means that it is aproblem that is to be solved by the change-schema. Other production ruleswill have the presence of change as a condition which will have the effect thatonly rules that are appropriate for this schema will be applied.

The rules below fill a schema. For example, the change-schema consistsof an amount-before, an amount-after and an amount-change. These arerepresented as three separate elements in working memory. These representthe amounts before and after the change and the difference. Two of these aregiven in the problem text. Once the change-schema has been recognized theamounts that are given can be extracted from the text.

[? X1 lost] [? X2 N] [one-more(X2, X1)] [change] -->

[ADD([change-difference N])]

[? X1 now] [? X2 has] [? X3 N] [one-more(X2, X1)]

[one-more(X3, X2)] [change] -->

[ADD([change-after N])]

The other rules fill the change-schema from the problem text. The firstlooks for a pattern ... lost N ... in a sentence and assigns N to thechange-difference in the change-schema and the second rule looks for apattern ... now has N ... and assigns N to the amount-after the changein the change-schema. For example, these the first rule would apply to thefollowing sentence:


Mary lost 5 marbles.

and add the fact change-difference(5) to working memory.

Finally we give a rule for computing the answer:

[change] [change-before(N1)] [change-after(N2)] [N1 > N2] -->

[A = N1 - N2] [ADD(answer(A))]

This rule computes A by subtracting N2 from N1 and adds the result toworking memory as answer(A). Note that this rule assumes both a specialcondition N1 > N2 and a special action A = N1 - N2 for computing A.

If the model now receives the sentences from the problem above as elementsin its working memory, the rules given here can be applied and they will addnew information to working memory. In the example above, the rules wouldadd the name of the schema, fill the parts of the schema and compute theanswer. Note that in a given situation, more than one rule is applicable. Inthis example, all rules add information to working memory. This means thatat least we should prevent that the same rule is applied over and over. Thisis a simple example of the control problem that we mentioned above.

Programming languages based on production rules have built-in mecha-nisms for finding applicable rules and selecting one. Such a mechanism couldfor example exclude rules that double information in working memory. Suchlanguages also have predefined structures for expressing elements in workingmemory and they have predefined special procedures. The way in which suchpredefined elements are organized defines the different members of the familyof production rule systems.

6.4.4 Extensions of production rules

The basic mechanism of production rules (add, delete and matching theconditions of rules with the contents of working memory) have been extendedin many ways to make it easier to express various types of reasoning. Here wesummarize some of these extensions to give the reader an impression of theproblems with simple production rules and also of the possibilities of imple-mented production rule systems.

Structured objects and variables: The example above contained onlyunstructured objects in working memory. Many applications require struc-tured objects with production rules that access parts of the objects. One very


common example are goal structures. Problem-solving may be modelled as re-ducing the goal of problem-solving (design a building that meets requirementsR1 ... Rn) to one or more sub-goals (design a building that meets requirementsR1.1 ... R1.n, R2.1 ...). The relation between these goals must be kept inworking memory. A common way to represent this is as a stack. Productionrules can have actions that test the top of the goal stack, ‘push’ an elementonto the goal stack or ‘pop’ an element from the goal stack. These becomespecial conditions and actions for the mechanism applying the rules. Thiswould make it easy to model a reasoning process like:

so I have to find how many Mary has

well to do that I need to collect what is given

let’s see, John has 7

he has 4 more than Mary ...

OK, now how many does Mary have ...

3

This reasoning clearly involves the manipulation of a goal stack. We leave theconstruction of the model as an exercise to the reader.

Another example of a special structure is the structure of the answer. Thearchitect’s design must also be accessed in terms of its structure. For example,reasoning steps will need the rooms adjacent to room X, or the doors givingaccess to room Y. Special productions that access structured data in workingmemory can be written for these operations. This makes it much easier to con-struct the model. It is possible to consider a special language for architecturaldesign that contains predefined structures and procedures which simplifies theconstruction of a model. Here we list some important extensions to the basicproduction system architecture:

Calculations: The formalism above does not include calculation or evalua-tion of arithmetical functions. This can be added to the language by allowingspecial conditions or rules that involve calculating the result. There are severalways to realize this. For example, we could allow conditions like:

[annual-salary = X] [annual-interest = Y] -->

[Z = X * Y] [ADD(annual-income = Z)]

Certainty factors: In some models the contents of working memory areuncertain. This can be represented by associating numbers with elements inworking memory that express the degree of certainty. Think aloud protocolsdo not make it possible to infer certainty factors for conclusions, although they


will contain qualifications such as ‘likely’, ‘possible’, ‘unlikely’.External memory: During certain tasks people take notes and refer to theseduring problem-solving or they use other media that carry information aboutthe current problem. This can be incorporated in a model as an ‘external mem-ory’. This is similar to the standard working memory except that it is accessedby special production rules that ‘read’ information from external memory intoworking memory or that ‘note’ information from working memory on the ex-ternal medium.Explicit control knowledge: Hamel’s model of architectural design hada layered structure. One level describes the various sub-processes of designand nested within this structure we find the content of the knowledge. Thisstructure is not easily reflected in production rules. One way to express thisstructure is to define a production system with a two-step cycle. First a sub-process is selected and then rules (representing domain knowledge) are selectedwithin the knowledge associated with this sub-process. Special rules can stopthe sub-process and start a different sub-process, using information broughtinto working memory by the sub-process and other processes.Rule strength: Some production rule models associate numbers with produc-tion rules that represent the ‘strength’ of a rule. ‘Stronger’ rules are preferredwhen more than one rule is applicable. This is especially useful in systemsthat learn from experience. The strength numbers then reflect how often arule has been applied successfully and these numbers are modified on the ba-sis of experience.Parallel execution: A more recent variation consists of production rulessystems in which all rules fire in parallel and the effects of rules are combined.

6.4.5 Problem Behaviour Graphs, production rule systems and hu-man memory

There is a subtle but important distinction between production rule systemsand PBGs. In a PBG a solution path may involve ‘backtracking’ to earlier(knowledge) states. For example, consider the structure of the solution processthat is given in Figure 6.7. The numbers indicate the sequence of states.Now consider what happens to the knowledge of a person during this process.What if after S4 a subject returned to S2 and from there to S5? This seemscontradictory. If S1 to S6 are the knowledge states, then what does it mean toreturn to S2? Does he forget S3 and S4? How can he return to S2? If we wantto model this in a production rule system, how do we model this backtracking?The answer is to hypothesize special knowledge structures that keep track ofthe sequence of reasoning steps. Two structures are frequently used for this


S1 S2 S3 S4

S5 S6

INITIAL

FIGURE 6.7: Example of a solution process represented in a PBG


purpose: ‘stacks’ and ‘trees’. One possibility is to maintain a ‘stack’ of statesthat have been visited. If an operator is applied the new state is put on topof the stack. If no operator can be found to continue from a state, the processreturns to an earlier state in the stack. An obvious possibility is the element ontop of the stack, which was the element that was put there last: the previousstate.

Note that it is also necessary to be able to prevent running in circles. If S6has no possible continuations we may not want the model to consider returningto a previous state that would lead to S6 again (in the case we have reasonto believe that people would not do so either). This means that the systemhas to keep track not only of its current path (with the stack) but also of thepaths that it has explored before. This can be done by maintaining a treeinstead of a stack. This tree represents the part of the problem space thathas been explored so far. The stack or the tree is in working memory but ismanipulated by standard procedures in the production rule machinery. Notethat one need not identify the mechanism of the production rule language withthe architecture of the human mind. What is ‘knowledge’ in the psychologicalmodel can be represented as part of the mechanism of the production rulelanguage. For example, if one assumes that people have the knowledge neededto perform simple computations correctly, computations can be included in aproduction rule system as extensions to the basic architecture as we discussedabove.

Another subtle point is the construction of new objects. Classic produc-tion rule systems did not allow the creation of new objects from old elements.They allowed only adding and deleting from working memory. A problem-solving operation usually cannot be modelled directly by simply retrievingand applying a production rule. For example, problem-solving often involvesconstructing a new element from existing elements. This construction requiresa special mechanism for matching and applying a rule. A simple example arecalculations. Constructing a sum and adding it to working memory cannotsimply be done by retrieving the given numbers and the definition of sum.To actually construct the answer, arithmetic operations are needed in addi-tion to retrieval and matching. Another example, from architectural design,is extending a partial design with a component that is constructed from twoparts, for example, combining a half-designed kitchen and a half-designed liv-ingroom into a single half-designed livingroom with open kitchen. Both theseissues may seem details but our experience has shown that they become realstumbling blocks when one wants to elaborate and implement a model.

We have discussed production rule systems in some detail, because theyare very popular in both cognitive psychology and knowledge engineering. The


examples also gave a small impression of the difficulties encountered in formal-izing a model. Production rule models are quite comparable to PBG models.The correspondence is as follows:

PBG Production systemsoperator production ruleknowledge state content of working memoryinitial state initial content of working memorygoal state content recognized as goalreasoning step application of production rulecontrol knowledge knowledge for selecting production rule

There is of course a strong structural similarity between the architecture thatunderlies production systems and an approximate model of human memory.Newell and Simon elaborate the analogy along these lines:

Human memory Production systemsworking memory working memorylong-term memory production rule memorymemory element production ruleproblem-solving selecting and applying a production ruleperception take input from environmentaction change environment

This model is the same as the one we used in Chapter 2 to explain the verbal-ization process: thinking aloud means partial verbalization of (new) elementsin working memory. It is a matter of debate how far this similarity goes.One advantage of this modelling language is that it is easy to include knowl-edge about the cognitive architecture in the model. For example, if we havepsychological knowledge about the capacity of working memory or about thetime needed to retrieve knowledge from long-term memory, then we can easilyinclude these in the psychological model.

To see how architectural assumptions can be included, consider again theproduction rule model of solving arithmetic word problems. We can assumethat the amount of information that can be attended to (that is, the capacityof working memory) is limited. When the limit is reached, elements will disap-pear. We need knowledge or assumptions about how this works. Suppose thatwe assume that information that has not been used recently disappears. Tomodel this, we can assign a number to each element in working memory thatindicates the time at which it was used. When the capacity limit is reached,


the oldest element disappears and can therefore no longer be used. Candi-dates for this are sentences of the problem text that are not used for schemarecognition and schema filling. It may be necessary to read such a sentenceagain to enter it into working memory at a later stage.

6.5 Programming languages

More and more researchers not only formulate a model on paper, but alsoimplement the model in a computer program: a simulation program. All thedistinguished steps in the model will be translated to steps or procedures inthe computer program. Often the program is written in languages used inArtificial Intelligence such as Lisp and Prolog, but is also possible to use otherlanguages or authoring systems. When the model is implemented, it can be runon a computer in order to see how it behaves. The behaviour of the programcan be compared to the actual behaviour of the subjects. It is then possibleto modify elements of the model to see if and how the behaviour changes.The main disadvantage of building fully computational models is that thistakes much more effort than non-computational models. Important advan-tages of building simulation programs are:

Clear interpretations: Constructing simulation programs forces the modelbuilder to be precise. It is impossible to write a running program when themodel is fuzzy and inconsistent. The simulation program provides a modelwhich is defined in formal terms. This makes it clear what all the differentelements in the model mean, how they influence the behaviour of the model.Visibility of gaps and redundant branches: Building a simulation pro-gram makes it clear where there are gaps in the model. The program cannotrun when there are situations which are not covered. It is also possible to de-tect, when running the program several times, which situations never appearor which actions are never taken.Unexpected behaviour and experiments: Running a complex simulationprogram, using different conditions or problems, sometimes shows unexpectedbehaviour. This may be behaviour which is never to be found in subjects,showing the weakness of the model. However, it may give new insights in theproblem-solving processes. It is easy to feed the program all kinds of variationsof a class of problems, many more than would be possible to give to subjects.Maybe some problems will lead to other behaviour than predicted. This wouldbe an interesting outcome asking for new experiments with subjects. Anotherpossibility is to vary the conditions under which the model has to work. These


may be extreme conditions, for example overloading the model with informa-tion or setting severe constraints, experiments which might be hard to do withreal subjects. Doing experiments of all kinds with computer simulations iseasy and cheap, while experiments with real subjects is hard, expensive andtime consuming.

Often the main advantage in building simulation programs is not found inhaving a runnable program, but in the process of building it. Formalizing themodel, debugging it, making extensions and so forth, often gives a lot of in-sight in the processes to be modelled. Sometimes, researchers will never finishthe program, but the model they started with will be much improved duringthe process of designing the simulation program.

Of all the languages that we discussed there exist versions that are exe-cutable. For example, an executable production rule language means that theform of the information in working memory and the form of the rules is spec-ified and a mechanism is implemented for matching and applying productionrules. This makes it possible to write a program consisting of production rules,input data into working memory and start execution of the program.

The disadvantage of computational models is that they require very muchdetail. Usually a computational model requires more detail than psychologycan offer. For example, in a production rule system, all production rules thatmatch the content of working memory are retrieved. This is a simplificationwith respect to the architecture of human memory. Human memory may failto retrieve knowledge that was retrieved correctly in a very similar situation.It is usually not possible to construct a computational architecture and modelthe knowledge such that all aspects of the human process are modelled in de-tail. The resulting unexplained effects will appear in the analysis (see the nextchapter) and indicate the limits of our knowledge.


6.6 Using a language or adapting it

In the examples that we gave so far, formal languages were used directly torepresent a model. Often it is necessary to extend the language or to define anew language usually based on an existing one. This happens when the chosenformal language lacks constructs that are needed for the model.

For example, consider the task of architectural design. This involves themanipulation of partial designs. Objects in the design are added, deleted,moved, made larger or smaller, made to fit other objects, etc. Suppose thatwe want to model this in a production rule language. A (partial) designwill be a complex object, represented as a set of facts in working memory.Deleting an object from the design will correspond to the application of a setof productions.

A good way to construct such a model is by defining procedures for complexoperations. For example, it would be convenient if we can define a complexproduction rule to delete an object from the design. This complex pro-duction rule would then contain knowledge about the details of the deleteoperation. This requires careful definition of this new delete production ruleto make it work correctly on a variety of objects and (partial) designs. If itis possible to define this, then we have in a sense extended the representationlanguage with a new construct. Defining a new complex procedure in termsof other procedures is called procedural abstraction.

Psychological models are usually defined in terms of complex constructswhich makes this type of abstraction necessary. Note that this makes it pos-sible to design a language to meet the requirements that were listed in theintroduction of this chapter: it is possible to define appropriate language con-structs that have a psychological interpretation. The resulting language isstill executable because it is defined entirely in terms of the initial executablelanguage.

Consider the following example. Architectural design can be viewed asgoal directed problem-solving. A problem is defined as a set of requirementson the design. Problem-solving can now be viewed as constructing the design,where design choices are made on the basis of both the current partial designand the requirements. Now suppose that we want to use a kind of productionrules to model the design process. In general the conditions of these rules willrefer to the partial design and to the requirements. We can try to construct amodel along this line. However, at some points architects do not simply addelements to the design but they revise previous choices because completing thedesign becomes difficult as a result of previous design choices. It is difficult toavoid violation of requirements that were satisfied by the current design.


Modelling this process can be done by adding special production rules thatmodel each revision operation. However, a different possibility is to construct ageneral ‘revise design’ mechanism that finds previous requirements and checksif they are not violated by the modified design. This mechanism can be ap-plied by invoking it and giving it the current design, the revision and theold requirements. This can be viewed (and realized) as an extension to theproduction rule language.

A more principled solution is to introduce explicit datastructures that rep-resent requirements that have been validated by certain design choices and tocreate special procedures that operate on these requirement structures. Theseoperations perform tasks, such as selecting a requirement, finding a designchoice that meets the requirement, evaluating a requirement on a partial design.This results in the construction of a special problem-solving mechanism, a re-quirement solver that is constructed on top of production rules. The model cannow be expressed in terms of requirements and the knowledge associated withthe requirements, such as select requirement, etc. A new language has beendefined in terms of production rules. Recent research in knowledge acquisitionhas resulted in a range of systems that are based on special problem-solvingmechanisms and that are therefore applicable to specific reasoning processes.These systems are not only relevant for knowledge acquisition but also forpsychology and education because they contain models of particular types ofreasoning processes.

6.7 Differences between languages

The reader will have noticed that there are similarities between the languagesthat we discussed: the language for the task layer in the CPML is a pseudoprogramming language and the rules that can be used to represent the domainknowledge associated with a knowledge source can be production rules. For aparticular model we therefore have at least the following three possibilities:

1. Use the CPML.2. Use only production rules. This amounts to constructing only the domainlayer of the CPML model.3. Use only the pseudo programming language. Compared to the CPMLmodel this amounts to expressing the domain knowledge in pseudo program-ming language. This model will consist of the task layer of the CPML modelbut the knowledge sources and their associated domain knowledge will nowalso be represented as code in a pseudo programming language.


Note that the last possibility makes it more difficult to use structures suchas concept hierarchies and rules to represent knowledge involved in problem-solving. Using only production rules makes it more difficult to represent thestructure in the knowledge because that will not be reflected directly in themodel. We must add that these are not fundamental properties of the lan-guages. It is possible, for example, to organize the rules in rulesets and togroup these in the description of the model. However, the predefined struc-tures in CPML support and enforce this which is helpful, especially in largermodels. Important properties are:

Executable: Are there computer systems that can actually run models spec-ified in this type of language?Control structures: Another important aspect of cognitive processes is theorder in which processes take place and the conditions under which one actiontakes place instead of another that was also possible. For example, in thearchitectural design task it was noted by Hamel that architects first constructa design that meets the functional requirements and then refine this to achievea particular style or aesthetic accent. This could also have been done in anearly stage or after finishing part of the design. This phenomenon is difficultto model by production rules, because it is not easy to specify the conditionsunder which aesthetic rules are to be applied.Deterministic or not: Production rule models make it easy to specify mod-els that are non-deterministic with respect to control knowledge. This can bean advantage if the conditions under which a sub-process takes place are onlypartially known. This partial knowledge can easily be reflected in an undeter-ministic production rule model.Assumptions about the cognitive mechanism: Production rule systemshave an underlying mechanism, the architecture, which resembles the humancognitive system. This makes it easier than with other languages to includeassumptions about the architecture in the model.

The characterizations below apply only to the languages that we sketchedhere. There are many variations of for example production rules and concep-tual modelling languages and these differ with respect to the properties in thiscomparison.


CPML Pseudo PBG/programming productionlanguages rules

Executable no nearly possibleControl structures yes yes possibleDeterministic about control optional yes optionalIncluding assumptions about hard very hard easymechanism

Literature

Our language CPML is based on the KADS modelling language (KCML).CPML is somewhat simplified (for more details, consult the references at theend of this section). The most important difference between KCML and CPMLis that CPML has no restricted set of knowledge sources. The original KCMLhas a set of predefined knowledge sources that describe types of problem-solving steps. Research in the context of this modelling language has produceda library of models defined in terms of these knowledge sources that can beused as the basis for constructing new models. Another difference is that wehave omitted a fourth layer of the KCML model, the strategic layer. The bestreferences on the KADS methodology can be found in the book by Schreiberet al. (1993) and in several articles. More specifically Wielinga et al. (1992)give an overview of the KADS methodology, van Harmelen & Balder (1992)describe the logic-based formal specification language (ML)2 and Breuker &Wielinga (1987) discuss the use of think aloud protocols in the KADS method-ology. Several systems have been built by designing and implementing a newlanguage. For example, the PDP system (Jansweijer et al., 1987; Jansweijer,1988) which combines concept hierarchies and goal oriented production rulesand was written on top of Prolog. Brownstone et al. (1985) discuss productionrule systems. Another implemented system based on production rules is SOAR(Laird et al. 1987). Newell & Simon (1972) provide an excellent and extensivediscussion with examples of problem behaviour graphs and production rulemodels. Lucas & van der Gaag (1991) give a detailed technical discussion ofproduction rule systems. Another psychologically motivated language basedon production rules is the GRAPES system by Anderson and his colleagues(1989). A collection of special purpose problem-solving mechanisms developedfor knowledge acquisition is Marcus (1988). Musen (1989a, 1989b) describes amore general method for designing special purpose problem solvers along withsupport tools.

Chapter 7

Analysing the protocols

7.1 Introduction

Thus far we have discussed how think aloud protocols are collected and howpsychological models are constructed. We now turn our attention to the re-lation between the protocols and the psychological model. There are threeissues that deserve attention in the analysis of think aloud protocols:

1. Constructing a mapping between protocols and model.2. Avoiding bias and interpretation errors in comparing protocols and model.3. Quantifying the correspondence between protocols and model.

This chapter will give standard procedures and techniques for the analysisprocess. Figure 7.1 lists the steps of this phase of protocol analysis. The goalis to construct a mapping between the psychological model and how the cog-nitive process will appear in the protocols. This mapping will take the form ofa coding scheme that is based on the psychological model and a verbalizationtheory: a theory about the verbalization process. Using this, the protocol canbe compared with the model. We shall describe how to construct a codingscheme and how to apply this in the context of psychological research. Finallywe go into the comparison of the coded protocol with the constructed model.This chapter will be concluded by making suggestions on how to report theresults of studies using protocol analysis.

In the context of knowledge acquisition objective measurement of the corre-spondence between protocol and model is less important than in empiricalresearch. However, also the knowledge engineer has to specify the relation


Psychologicalmodel

Codedprotocols Coding scheme Verbalization

theory

Segmentedprotocols

Rawprotocols

Predictedcoded

protocols

FIGURE 7.1: The analysis process

Analysing the protocols 119

between protocols and model but it is not necessary to take measures to avoidbias in the interpretation or to quantify the correspondence between proto-col and model. Therefore Sections 7.5 to 7.7 (Coding procedures, Intercoderreliability and Comparing the coded protocols with the models) and 7.9 (Re-porting the results of protocol analysis) are less relevant for the reader who isonly interested in knowledge acquisition. Both in knowledge acquisition andin scientific research a mapping between protocols and model is constructed.Therefore the rest of this chapter is relevant for both applications.

7.2 The role of protocols as data in research

In scientific research a distinction can be made between ‘raw data’, ‘data’ and‘theory’. Raw data are obtained by measuring procedures that are objectivein the sense that they can be applied under any condition and will produce thesame results. These results are generally considered as valid by the scientificcommunity. Examples of such standard procedures are the use of certainthermometers to measure the temperature, the use of electronic clocks andhand-switches to measure reaction times and the use of certain intelligencetests and procedures for administering these. In some cases these objectivedata are interpreted, abstracted or aggregated in a way that is not as objectiveas the previous procedures. For example, the scores on an intelligence test maybe combined to measure a new type of intelligence or time measurements areclassified as indicators of a cognitive style by human judgment. In that case theresult is less objective because it may depend on human judgment or becauseits validity is not generally accepted.

In the context of think aloud protocols we give standardized procedures forsegmenting the written protocols and we argue that segmented protocols canbe treated as raw data. The next step from raw data to the model is to codethe segments in terms of the model. This usually requires a coding scheme,an extension to the model that describes how categories of the model willappear in the protocol. The coded protocols are data in the sense that codinginvolves procedures and coding schemes that are less objective than those forsegmenting. The coded protocol that can be compared with the model whichis directly derived from the theory.

7.3 Transcription and segmentation

It is very hard to analyse a think aloud protocol directly from audio recording.This is especially so in exploratory use of protocols. It is simply more difficult


to get an overview over audio recordings and it is also more difficult to retrievefragments from an audio recording. Protocols are generally transcribed intotext. Although we may hope that in the future this process can be automatedit is now simply much work. Transcribing is easier when using a special cassetteplayer with a foot-switch and much depends on the quality of the recording.Notes and other observations are inserted in the transcription as much aspossible.

Next the protocol is segmented, that is divided into segments. Researchon language production and language understanding shows that in speech theboundaries of phrases are usually marked by pauses (Ericsson & Simon, 1993).The combination of these pauses and the linguistic structure provide a natu-ral and general method to segment a think aloud protocol. Experience withsegmentation has shown that there generally exists a high level of agreementbetween people asked to segment a written protocol while listening to it. Seg-mentation becomes more difficult and less reliable when it is done on the basisof the written text only. It is a good idea to design a form for the protocolsegments and the analysis and to number the segments on this form. There aremany computer systems that can be used to support storing and retrievingprotocol fragments. Segments can for example become items in a databaseand be indexed by information about the protocol (subject, problem, date,etc.) and by model categories. In the analysis, segments are often combinedinto episodes, see Section 7.5.2. An episode is a sequence of segments thatcorresponds to a single element in the model.

7.4 Coding scheme and verbalization theory

7.4.1 Introduction

A procedural psychological model describes which cognitive processes will oc-cur and also in which order they will occur. There is often still a substantialgap between the model and the protocol data. This makes it necessary toextend the model with an operational definition of categories in the model inthe form of a coding scheme. The coding scheme specifies how elements of themodel can be identified in the data. In the context of knowledge acquisition itis also relevant to consider this question because the verbalization theory willtell the knowledge engineer which information can be obtained directly fromthe protocols, which cannot be obtained at all and which can be obtained par-tially or indirectly. In the context of knowledge engineering a coding scheme isnot necessary to achieve objective data analysis but is used for retrieval of rele-


vant protocol fragments. In practice one usually associates protocol fragmentswith model components without a mediating coding scheme.

7.4.2 Constructing a coding scheme

Because it is difficult and therefore unreliable to compare the protocol and thepsychological model directly, it is usually necessary to make a coding schemeto help in the analysis. A coding scheme is based on the psychological modeland and on the verbalization theory. The step from psychological model tocoding scheme is usually quite straightforward. Take every (sub-)process dis-tinguished in the model and state how you expect these processes to appearin the protocols. Take for example the word problem solving processes. Themodel for using the approximation method has a sub-process called ‘guessing’.One could assign the coding category ‘guessing’ to this sub-process, and onewould expect to find statements in the protocol indicating a numerical solu-tion with a certain uncertainty. The subject will for example say: ‘Maybe itis X (X = number)’, ‘Could it be X?’ or ‘Let’s try X’. This would appear in acoding scheme as:

Cognitive process DescriptionGuessing ‘Maybe it is X (where X is a number)’, or

‘Could it be X?’ or ‘Let’s try X’.

For every process described in the model one defines the type of statementreferring to that process. Categories in the coding scheme can be describedin general terms but it is usually very helpful to give some examples of proto-typical statements for each category. If two or more categories are similar ithelps to emphasize the difference. It is of course possible that it appears to beunfeasible to define a reliable coding scheme. In that case one can has to dropthe distinction, to revise the conditions under which the data were collectedor even to find a different method for collecting data.

This type of analysis of verbal reports is similar to content analysis. Con-tent analysis is usually applied to written documents that are stated in gram-matically correct language, which simplifies the analysis. However, contentanalysis has also been applied to, for example, interview texts, which bringsit quite close to the analysis of think aloud protocols. The main difference be-tween content analysis in general and the analysis of think aloud protocols isthat the latter usually involve problem-solving processes and therefore involveprocess models. Content analysis usually concerns static properties of texts.


7.4.3 Grain size and aggregation

If the task analysis and the psychological model are very detailed then elementsof the psychological model may correspond directly to segments in the pro-tocol. However, often the model is more coarse-grained and a single element(for example a single knowledge source or production rule) corresponds to asequence of several segments. This has no consequences for the coding scheme,because one simply defines categories that cover more than one segment.

7.4.4 Special coding categories

There are some categories in the coding scheme which are not directly derivedfrom the model. These are the verbalizations which are not covered by themodel, but may still be anticipated in the protocols. For example:

(a) Talking about not-task related issues (‘Oh, I must not forget to call myfriend’).(b) Evaluation of the task or task-situation at a meta-level (‘It is tiring to talkso much’, ‘I hate these kinds of problems’).(c) Comments on oneself (‘I am thirsty’, ‘I am not comfortable’).(d) Silent periods. At times people will briefly stop verbalizing. After sometime they may continue or they may be prompted to continue. It may berelevant to assign a code to relatively long pauses.(e) Actions. The subject performs an action (for example, writes a note ormanipulates a device). It is usually best to include this in the coding scheme.

In some cases one would ignore all these events as irrelevant - because they donot bear upon task performance - so putting them in one category: ‘irrelevantcomments’. At other times, however, these remarks might be an indicationof the level of difficulty of a sub-task or of the cognitive load of the subject.For example, if a subject makes a lot of task-irrelevant remarks each time hemust do a calculation, one might suspect calculating is difficult for this per-son. Sometimes the content of these interruptions in task performance is notrelevant, but the moment at which they occur is. For example, it may indicatethat the person who solves the problem does not make progress (reached an‘impasse’). In that case, special codes should be used for interruptions.


7.4.5 Coding form

To facilitate the coding process, design a coding form that consists of thesegmented protocol (with numbers for the segments) and has space for markingthe code assigned to each segment and for indicating discrepancies betweenmodel and coded protocol. If the protocols are stored in a computer databasethis information should be added for each segment or episode.

7.4.6 Example: a coding scheme for architectural design

Let us take a look at a part of the coding scheme for architectural designconstructed by Hamel (1990). This coding scheme corresponds directly to themodel in Figure 5.2. For every category in this model several processes aredistinguished. Take, for instance, an architect whose cognitive process at acertain moment is hypothesized to be taking place on the ‘the styling schemalevel, execution phase’. Hamel states that this process should be reflected inthe protocol as comments on the strategy and the way of working during thesynthesis of the design. Such comments are, for example, verbalized in theprotocol as: ‘Well, I am now going to look at the consequences of this locationfor the playing possibilities of the kids and their safety.’ Below we give exam-ples of a few coding categories. The complete coding scheme is reproduced inAppendix E.

Code DescriptionSO2 Synthesis, orientation, estimation of combining aspects of

the designAO2-10 Analysis, orientation, using one’s own knowledge with regard

to functions of the assignmentSU9 Synthesis, execution, isometric perspectiveSE1 Synthesis, evaluation, comparison with expectations, inspection,

or checking of data or requirements

These coding categories clearly require knowledge about the task. Let us nowtake a look at a part of the protocol fragment given in Chapter 1 to see howthis was coded.

Code Line Protocol text

SO2 12: but maybe we can with er do something with

that shack

AO2-10 13: water I’ll just put tap [notes ‘tap’]


AO2-10 14: what children of course

AO2-10 15: what what what is much handier

SU9 16: [sketches water tub]

SU9 17: maybe something er where water comes out of er

SU9 18: and that you turn off in winter

SU9 19: but then it does not trouble you

SU9 20: and then it may run into here somewhere

SE1 21: then they can still mess about ... with water

SE1 22: then they can play with this

SE1 23: that just comes it slowly drips out of it or so

SE1 24: then they’ll get dripping wet in summer

SE1 25: and then they can get around this

Note that here each segment is coded. An alternative way to code this part ofthe protocol is to divide the protocol into episodes and assign a code to eachepisode. In that case, line 13-15 would be coded as AO2-10, line 16-20 wouldbe coded as SU9 and line 21-25 as SE1.

7.4.7 Verbalization theory

A coding scheme is based on the psychological model and on our knowledge ofthe way in which cognitive processes will be verbalized. In Chapter 2 we dis-cussed the factors that influence verbalization. In particular, information thatresides in working memory for a very short time, that is difficult to verbalizebecause of complexity or because of its non-verbal character, may not appearin a protocol. For example, a chess master analysing a chess position may wellperceive a large part of the chess board as a whole. Verbalizing this requiresher to construct a short linear representation that can be uttered as spokenlanguage. This task is very hard and likely to give synchronization problemsbetween the speed of the thought process and verbalization.

If verbalization is difficult, then verbalizations will be idiosyncratic: therewill be individual differences in verbalization, even if the content of the cog-nitive process would be the same. Take for example tasting wine. Unexpe-rienced wine-tasters when asked to compare different wines will talk of theflavour of the wines in their own terms, calling a taste sour or bitter. Expertwine tasters share a common vocabulary to express how wine tastes, callingthe wine fruity. Idiosyncratic expressions cannot be covered by a verbalizationtheory and a coding scheme. In some cases it is possible to use part of theprotocol to construct a ‘personal coding scheme’ for each subject. This is a


good way to handle systematic differences in vocabulary between subjects. Ifa task involves information that is hard to verbalize then the analysis shouldallow incomplete verbalization and use a more abstract (or more branched)coding scheme.

7.4.8 Example of a verbalization theory

Consider the models of solving arithmetic word problems discussed in Chap-ter 5 and 6. Suppose that the persons solving these problems are reasonablyexperienced in this kind of task. What can we expect to find in the protocols?According to the model of Chapter 2, people verbalize the new contents ofworking memory or rather part of that. From the model we can find whichinformation will appear in working memory. In the case of production rulemodels this is simply the information added by the rules and for the otherlanguages the new information that is constructed during problem-solving islikely to appear there. For example, for the production rule model sketched inSection 6.4.3 we would find the following elements that can appear in workingmemory:

• The problem text (when it is read by the subject).• Names of schemata (for example change and compare).• Parts of schemata (for example in the change-schema we have: amount-before, amount-after and amount-change. These are either filled from theproblem text or by computation).

For some expressions it is not clear if they will appear in the protocol. Forexample, according to the production rule model the name of a schema will beadded to working memory. However, it is quite possible that schema recogni-tion takes place implicitly and that the result does not appear in the protocols.This is plausible if there is no standard word for the change-schema and onthe other hand this refers to the rather common concept of losing or obtain-ing something. This means that it is likely that people will not report thisthought: it will pass very quickly and is relatively difficult to verbalize. Fromsuch considerations we can drop part of the list of possible verbalizations. Thiscan be indicated in the coding scheme (‘will/may not appear in protocol’). Forthe same reason the names of parts of schemata are unlikely to appear in aprotocol. We will expect only the amounts.

Note that the production rules themselves do not appear in working mem-ory. They are retrieved and applied by the cognitive machinery of the subjectand do not appear as contents in working memory.


How sophisticated and elaborate should the verbalization theory be? Inmost cases there is simply not enough psychological knowledge and knowledgeabout the people involved in the task to predict what will appear in the pro-tocol. This means that one has to resort to pilot protocols. These are used toperform the exercise described above.

7.4.9 Methodological requirements for the coding scheme

The main requirement for a coding scheme is that it allows objective codingof protocol fragments in terms of the psychological model. This breaks downinto the following specific requirements:

Completeness (with respect to the model): The coding scheme mustcontain descriptions of all reasoning steps (or their results) that appear in themodel and that can be expected to appear in the protocols on the basis of theverbalization theory. We should of course not require the coding scheme tocover all of the protocols. This coverage is the hypothesis that is to be vali-dated. Segments (or episodes) that cannot be coded correspond to cognitivesub-processes that are not explained by the model!Justified: The coding scheme must be justified by the model and the ver-balization theory. One should not introduce new elements or concepts in thecoding scheme that do not follow directly from the psychological model andthe verbalization theory.Grain size: Either the grain size must correspond to that of segments or itmust be possible to objectively aggregate episodes that correspond to the cat-egories in the coding scheme (else aggregation becomes part of coding whichcomplicates the analysis). In the latter case aggregation should be done in aseparate pass through the data. See also Section 7.5.2.Unambiguous: The coding scheme must be clear enough to be used byoutsiders. This is necessary to maintain objectivity of the coding procedure.Different coders must assign corresponding codes. We give a measure for thisbelow.Context independent: If a coding category describes a single cognitive pro-cess then it must be possible to recognize this without the context in which itappears. This is particularly important if we want to test the sequence pre-dicted by the model. If the context in which a segment or episode appears isnecessary to assign a code to it, it is no longer possible to really test the orderbecause the order was used to assign the codes!

As with all methodological requirements, it is usually not possible to meetthem for 100 per cent nor is it possible to construct an adequate coding scheme


in one pass. The coding scheme (and the coding procedure) must be testedand evaluated on pilot protocols before it is applied to the actual data. Forsome criteria it is possible to quantify the extent to which they are met.

7.5 Coding procedures

7.5.1 Introduction

Coding means assigning labels to protocol episodes following the coding scheme.The result of applying the coding scheme to a protocol (raw data) will be acoded protocol. Segments or episodes that cannot be coded and sequences thatare not predicted by the model but that do appear in the protocol reflectdeviations of the model.

7.5.2 Aggregation

Transcription and segmentation are standard procedures that can be per-formed without any knowledge about the task or the model. This is nottrue of aggregation. Aggregation means collecting segments into groups thatcorrespond in ‘grain size’ to the model. This is necessary if the model cannotdescribe protocols at the grain size of segments. For example, a model maycontain a component ‘compute the average’ that is not elaborated further. Inthe protocol several segments together may correspond to this process. Inthis case only an episode, a sequence of segments can be recognized. Strictlyspeaking, aggregation of segments into episodes is part of the coding processbecause it requires knowledge about the model and cannot be performed in amodel-independent way.

7.5.3 Coding

If possible, it is best to leave the coding of the protocols to independent coders.The researcher who has constructed the model and the coding scheme usuallyis too much attached to a certain research hypothesis to do the coding with anobjective mind. Try to find coders who are not involved in the research project,and who have no specific interest in the outcome of the protocol analysis. Thisgives the best guarantee for objective (reproducible) coding. Give the codersonly the minimal information about the purpose of the study and instructthem to be as precise as possible. Coders need to be trained in the use of the


coding scheme. The protocols used for revising the coding scheme might beused for this.

The interpretation of a single phrase may be influenced by the context inwhich it appears. Take for example a subject who has been making a lot oferrors and was very confused about the right way of solving the assigned wordproblem. If this subject would then say: ‘the answer is 15’, one would beinclined to interpret this step in the problem-solving as another wild guess. Ifthe subject was someone who methodically followed a straightforward proce-dure, one would think it was just the outcome of the problem-solving processand thus the phrase ‘the answer is 15’ would be coded accordingly.

Measures one could take are: give coders only the minimal context infor-mation for coding. To refrain coders from using the context in which a protocolepisode appears which might bias their coding - minimize the available con-text. The context involves:

(a) The subject who produced the protocol: shuffle protocols of different sub-jects and distribute them at random over coders.(b) The content of the protocol: if possible cut the protocol in pieces thatare just big enough to be coded reliably and shuffle these pieces. This willminimize the bias due to the context in which the fragment appears in theprotocol. This is not always possible, sometimes protocol statements get un-intelligible when lifted out of their contexts. In that case one must acceptcontext-dependent coding.

7.5.4 Rating protocols or protocol fragments

If the model and the theory about the process are not procedural but struc-tural, describing properties of protocols or protocol fragments, then protocolsare rated. In this case scales are defined describing properties of the protocoland these are applied by coders following a procedure that is analogous to thatfor procedural models. In this case, the rating procedure produces standardnumerical data and standard procedures for analysis can be applied.

7.6 Intercoder reliability

Before it is applied, a coding scheme and the coding procedure must be evalu-ated. In particular, correspondence between codes assigned by different codersto the same data must be found. If this correspondence, intercoder reliability,is low, then this means that the coding scheme is ambiguous. How can this


correspondence be quantified? The techniques for quantifying correspondenceall use one set of data that is coded by two (or sometimes more) coders. Inthe case of think aloud protocols, the data consist of an entire protocol, sev-eral protocols or one or more fragments. Intercoder reliability may vary overprotocols and fragments so it is important to use a representative sample ofthe total set of protocols.

Reliability is usually quantified over segmented protocols. The segmenta-tion itself is usually rather reliable and coding reliability can simply not bequantified over protocols that are segmented in different ways. It is also bet-ter to exclude irrelevant parts of the protocol (for example comments on thesituation, reading fragments of text) because these inflate the reliability withrespect to the actual cognitive process.

The selected data are then coded following the coding scheme that wasdesigned in advance. Usually this means that they are assigned to a codingcategory. From these codings a cross-table can be constructed. Each cellcontains the number of elements (for example fragments) as coded by bothcoders.

Take, for example, a small protocol which consists of eight segments. Twocoders have coded this protocol. The coding scheme has only two categories:A and B. The two coders have coded the protocol as follows:

Segment Code by Code by Correspondencecoder 1 coder 2

line 1 A A yesline 2 A B noline 3 B B yesline 4 A A yesline 5 A A yesline 6 A B noline 7 A A yesline 8 A A yes

In this example both coders coded a line 5 times as A (line 1, 4, 5, 7 and 8),1 time as B (line 3), and 2 times coder 2 assigned a B where coder 1 coded anA (line 2 and 6):


C1 C2 FrequencyA A 5A B 2B A 0B B 1

From this coding the following cross-table is constructed:

Code A B TotalA 5 2 7B 0 1 1

Total 5 3 8

Correspondence between coders is now quantified as the association betweentheir codings. The first obvious measure is the proportion of correspondingcodes with respect to all codes. For the above table that would be the sum ofthe number of codes on the diagonal, namely 6 (5 + 1) divided by the totalnumber, namely 8. This gives 75 per cent correspondence.

A conceptual problem is, though, that this measure is sensitive to differ-ences in marginal frequencies, the proportion of the different coding categoriesused by the coders. In this example, code A is much more often used thancode B, namely seven times by coder 1 and five times by coder 2, where Bis used one time by coder 1 and three times by coder 2. This means thatthe expectation for an arbitrary segment to be coded as A, is higher than theexpectation for it to be coded as B. In this small example the need for correc-tion for marginal frequencies might not be so obvious. But take for example aprotocol of 100 segments. If 99 segments are coded as A by both coders andonly 1 segment is coded as B by one of the coders, the correspondence wouldbe 99 per cent. However, it is not fair to say that in this case an extremelyhigh intercoder reliability has been reached. For a segment to be coded as A,the chance was 99 per cent. So in some sense the coders did not do better thanif the coding had been done by a random generator. The one segment aboutwhich there was no agreement between coders is far more significant than allthe other segments which were coded the same.

In another example, where from the 100 segments 50 were coded as A byboth coders, and 49 as B, a reliability of 99 per cent indicates much morecorrespondence between the coders. In this case every segment had a chanceof some 50 per cent to be coded as A. In this case the fact that the coderscoded nearly all segments the same, cannot be ascribed to chance.

Several measures have been invented that to some extent correct for dif-


ferences in marginal frequencies. These measures show subtle differences inbehaviour and in the underlying notion of association. One frequently usedmeasure, that we recommend, is Kappa. This measure is based on a correctionfor marginal frequencies and defines association as the relative proportion ofcorresponding codes with the following correction:

(proportion corresponding−expected proportion corresponding)Kappa =

(1−expected proportion corresponding)

Here the expected proportion corresponding is calculated by multiplying andadding marginal frequencies. In the example the correspondence for A thatcan be expected from the marginal frequencies of B is 7/8 × 5/8 = 0.55. ForB we will get 1/8 × 3/8 = 0.05 and then the total is 0.55 + 0.05 = 0.60. Forthe table above Kappa is:

(0.75− 0.60) 0.15Kappa = = = 0.38

(1− 0.60) 0.40

So, in this example we have gone from a proportion of corresponding codingsof 0.75, to a Kappa of 0.38. The fact that Kappa is lower is due to the factthat code A is more frequently used than B. That the difference is so great isdue to the fact that this example is so very small.

What will happen to the two extreme examples of the 100 segments pro-tocol if we calculate the Kappa? In the case of 99 segments coded as A theKappa is:

(0.99− 0.99)Kappa = = 0.00

(1− 0.99)

In the example of 50 segments coded as A and 49 segments coded as B byboth coders the Kappa is:


(0.99− 0.499)Kappa = = 0.98

(1− 0.499)

The Kappa makes a correction for the correspondence that can be expectedfrom the marginal frequencies. This is a conservative estimate of intercoderreliability because similar marginal frequencies will make Kappa low where onecould argue that similar marginal frequencies themselves indicate intercoderreliability. Since the proportion correspondence is an optimistic estimate it isbest to report both proportion correspondence and Kappa.

Let us now look at a larger example. A set of 98 segments is coded by twocoders: Coder 1 and Coder 2. The coding scheme has 5 categories and thetable below shows how many fragments were scored as A by both, as A byCoder 1 and B by Coder 2, etc.

Code A B C D E TotalA 6 0 2 0 0 8B 1 48 11 0 0 60C 1 2 17 0 0 20D 0 0 0 5 3 8E 0 0 0 0 2 2

Total 8 50 30 5 5 98

The proportion corresponding codes with respect to all codes is calculated bydividing the sum of the number of codes on the diagonal (81) by the totalnumber (98). This gives 83 per cent correspondence.

Now we calculate Kappa. The expected proportion corresponding is cal-culated by multiplying and adding marginal frequencies. For example thecorrespondence for A that can be expected from the marginal frequencies ofB is: 8/98 × 8/98 = 0.007

The expected proportion corresponding values are:


Code Marginal Expected proportionfrequencies corresponding

A 8/98 × 8/98 = 0.007B 50/98 × 60/98 = 0.312C 30/98 × 20/98 = 0.062D 5/98 × 8/98 = 0.004E 5/98 × 2/98 = 0.001Total 0.386

For the table above Kappa is:

(0.83− 0.386) 0.444Kappa = = = 0.72

(1− 0.386) 0.614

As we see in this example, there is a difference in the outcome of the twomeasures (0.83 versus 0.72), but the difference is not as large as in the previousexample (0.75 versus 0.38). Of course it is hard to say when the Kappa issufficiently high. We would, generally speaking, say a Kappa should be above0.70 in order to have an intercoder reliability that is acceptable. If the Kappais less, we would strongly advise to improve the coding scheme.

7.7 Comparing the coded protocols with the models

7.7.1 Introduction

We now will discuss the stage of protocol analysis that is crucial in the hy-pothesis testing style, i.e. comparing the coded protocols with the model. Ifthe model is stated in terms of properties of protocols rather than procedures,these properties are measured by counting items or by rating protocols. Thisrequires procedures that are standard in social science research. Comparingprocedural models with protocols involves a different notion of fit that we shalldiscuss here.

7.7.2 Comparing protocols with procedural models

Each segment in each of the protocols should fit within the model. In general amodel predicts more than one possible process (we called this non-deterministic


models). Each of these corresponds to a possible coded protocol and fittingmeans verifying if the coded protocol is one of the predicted protocols. Youlook at the coded segments one by one and compare them with the predictedsteps of the model. If a segment (or episode) does not fit into the model be-cause it cannot be generated from the model and the state of the reasoningprocess then this is marked as a deviation. These deviations are counted.Differences between the coded protocols and the predictions from a processmodel, may be of three kinds:

1. The protocols show processes which are not predicted by the model.2. The model predicts processes not shown by the protocols.3. The protocols show processes in a different sequence than predicted by themodel.

1. Unpredicted processes: Unpredicted processes occur when there aresegments or fragments in the protocol which cannot be coded, not even underthe category ‘not task-related or meta-statements’. This means that there isno category in the model for the verbalized actions. For example, in the ex-ploratory phase of Hamel’s study, he encountered remarks on the assignmentrelated to how the building should look. These kinds of processes were not atfirst part of the model and thus were not represented in the coding scheme.What should you decide about your model if you have one or more uncodablesegments? The simplest judgement is that your model is false. If the modelis detailed and makes strong predictions instead of allowing many possibilitiesthen it is likely to be false. However, usually an interesting question is to whatextent the model is false. How much behaviour is covered correctly by themodel? Generally speaking, one could say that the more uncoded segmentsare found, the more evidence there is against the model. This analysis must bedone for each element of the model to show which elements are responsible forthe discrepancies. Such discrepancies are especially of interest when there areunpredicted processes in protocols from different subjects, and even more sowhen there exists correspondence between those subjects on the unpredictedprocesses.2. Absence of predicted processes: A similar argument applies to theabsence of processes that were predicted by the model. If the absence cannotbe explained by the verbalization model (as a verbalization failure) then itcontradicts the model. The best measure for the absence of predicted pro-cesses is simply their number or proportion of the total number of predictedprocesses.3. Unpredicted sequences: Sometimes a protocol shows the predicted cat-


egories but in a sequence that is not predicted by the model. This applies toprocess models that contain predictions about sequences of events. Here againunpredicted sequences are evidence against the model and it is necessary toquantify the difference. One method is to take transitions as a unit of analysisinstead of reasoning steps. A process model may predict possible transitionsand unpredicted transitions can simply be counted. This method was used inHamel’s study. In 12 protocols a total of 2829 transitions occurred, 2818 ofwhich were conform the model. Hamel could argue that the remaining devia-tions were not of such nature that his model should be rejected. Transitionsdo not take the wider context into account: whether a transition is possibledepends on the previous transition and not on any wider context.

A process model usually specifies component processes and a relation betweenthese components. For example, a model that predicts the processes orien-tation, execution and evaluation can also specify sub-processes of these three.It can now specify that these processes will occur in this order. This meansthat sequences are predicted at two levels of detail. This gives the same typesof errors as we discussed (missing and unpredicted elements and unpredictedsequences) but at different levels. This complicates the analysis. We have noreal solution for this problem. A simple method is to report results for each(important) level separately.

7.7.3 Issues in quantifying the fit

There are several issues that complicate quantifying the fit between protocoland model. These are:

(a) Degrees of freedom in the model: A model may not specify a uniquedescription of a protocol, but it may allow several possibilities. These may inturn be dependent on events that occur during problem-solving. For instance,if the subject makes an unpredicted mistake, problem-solving may follow anunpredicted course, but still fit the model. A model allowing many differentbehaviours is of course weaker, in the sense that it has less predictive powerthan one that predicts fewer possibilities.(b) Differences in size of the protocols: Two protocols may differ in lengthnot because of differences in thought processes, but because of different stylesof verbalization. This may affect the coding. Usually this effect is reduced byaggregation, because a protocol that is more verbose than another because ofthe verbalization process, is likely to be the same if phrases are aggregated toproblem-solving episodes. If phrases are used then a standardization on the


size of the protocol (by using the percentages) is a good solution.

7.7.4 Comparing sets of protocols

Sometimes groups of protocols are compared, for example sets of protocols oftwo types of subjects in a psychological experiment or of subjects before andafter an experimental treatment. If dimensions are used and protocols can beassigned a score on a dimension, possibly aggregating process properties, thenthe situation is the same as in standard experiments and the same analysisprocedures apply. For example, one may count the proportion (or number) oforientation fragments in protocols of novices and experts.

A special problem occurs if the comparison involves process structures.Suppose that we compare beginners and experts in architectural design andpredict that experts will follow the order orientation - execute - evaluate wherebeginners will mix the order. How can we quantify the difference between aset of (coded) beginners protocols and a set of (coded) expert protocols? Apossibility is to calculate the rank correlation between each protocol and theexpert sequence and test the difference between the two groups. However, thisprocedure suffers from the effect of different numbers of protocol fragments(beginners are likely to need more reasoning to solve the problems than ex-perts). Ideally this effect should disappear when a good coding scheme is usedbecause different ways to verbalize a cognitive process will result in the samecode being assigned to a protocol fragment (even if the fragment is longer inone protocol than in another). However, in practice this is not always ade-quate. The solution followed in practice is to define properties that abstractfrom these levels on an ad hoc basis.

7.8 Computer support tools for analysis

7.8.1 Introduction

Transcribing and coding or annotating protocols is a very time-consumingactivity. Currently, several types of computer systems are being developedto support this task. These systems are still in a stage of development andexploration and they are not yet standardized or easily accessible. Thereforewe only summarize the principles of these systems to give an impression oftools that are likely to be available in the near future.


7.8.2 Indexing tools

One useful type of tools are systems that can index and retrieve pieces oftext. If protocols are transcribed and stored in the computer, fragments of thetext can be marked and given an index. These indexes can again be stored.For example, we can give each fragment an individual reference but collect allreferences of fragments where the protocol concerns, for example, ‘selecting aschema’. This makes it possible to quickly retrieve all fragments where thisoccurs. This type of tool is relatively easy to build. It can be constructedusing most hypertext and database systems. The structure of the database isto use segments or aggregated fragments of the written text as entries in thedatabase and to index these by coding category, coder, date (of collecting theprotocol), problem (that is the problem solved here) and subject. Future tech-nical developments will make it possible to store the spoken protocol and indexit in this way, thus avoiding transcription and keeping additional informationin the spoken protocol.

7.8.3 An implemented model as tool

The most powerful tools are those that correspond to implemented psycho-logical models that can be used to recognize uncoded or coded protocols.The model of physics problem-solving described in Chapter 8 has been im-plemented. The resulting system can actually solve most of the problems thatwere used in the study described there. A recognition mode was added tothis system as follows. In recognition mode, the system is given the problemand then it asks the user of the system (who has the protocol), what the nextlines in the protocol are. The answer must be given in a predefined form. Thesystem then checks if this is consistent with what it expects. If this is the case,it continues. If the protocol deviates from the expected solution path, this isnoted as an unpredicted step (which violates the model). In many cases thesystem can understand the step taken in the protocol and resume problem-solving from the new situation. In this way protocols can be coded quickly andan overview of unpredicted and missing events can be produced automatically.

7.9 Reporting the results of protocol analysis

Analysis of think aloud protocols usually leads to a large amount of docu-mentation: the typed protocols, the detailed model, the coded protocols, thecomparison between the model and the coded protocols. Technical reports on


studies using protocol analysis are, generally speaking, large ones. In orderto justify the conclusions drawn from such a study, every step taken in theprocess of analysis must be explained and reasons for decisions made, must begiven. For example, if you decide to leave out certain fragments of a protocol,you must provide evidence that these fragments do not falsify the model. Ifyou want to draw the firm conclusion from your study that experts put muchmore effort into orientation processes than novices, you have to show whichprotocol fragments are considered as verbalizations of orientation processes forevery subject under study. In order to enable other researchers to verify yourfindings and conclusions, a lot of information must be provided. This is al-ready a problem for technical reports and even more so for an article. There isno full agreement on the way in which the results of protocol analysis shouldbe reported in scientific articles. Here we give a proposal that tries to bal-ance the degree to which an independent researcher can reproduce the resultof the analysis and the space restrictions imposed by most journals. Next tothe usual description of the research question, the subjects, the task and theexperimental procedure, the following items should be described:

Short fragment of a verbatim protocol: This fragment gives the readeran impression of the kinds of processes and verbalizations used by the sub-jects. If possible, present a fragment where the subject is working on an easilyunderstandable task, so that the fragment is clear to the reader who is nota task expert. If the fragment is not self-evident, explain shortly what thesubject is doing.Schematic description of the model: If the description of the model doesnot take up too much space, one can include the description of the entiremodel. However, if the model is elaborate, is very detailed or has severallayers, a summary should be given. A schematic figure, accompanied by ashort description, is a good way of presenting a model, providing an insightfuloverview. You should indicate shortly how the model is extended in its fullform. Sometimes it is useful to present some parts of the detailed model, forexample the full decomposition of one main category.The main categories of the coding scheme: If the coding scheme is con-cise, give the complete coding scheme, otherwise list the main categories. Forone category, the complete coding should be given. State shortly how thecoding scheme is derived from the model. Give some examples of how thecategories are reflected in the protocols. For example: ‘category orientation:remarks about related problems’.Outcomes of the coding in an aggregated manner: Present the mainfindings of the coding in a concise manner, for example in a table with the


coding frequencies on the main categories.Intercoder reliability of the coding: Explain how intercoder reliabilitywas determined and present the outcome, usually both the reliability figureand the Kappa. Give the number of coders, describe who they were, and thepercentage of the verbalizations on which the reliability-score was based. Ifthe reliability is not very high, give an explanation, if available.Fit between the model and the coded protocols: Discuss how the codedprotocols are related to the model, give arguments and examples. If possible,provide a quantitative measure of the fit. These should be reported per ele-ment of the model (that is, per coding category).An example of a fully worked out analysis of a protocol fragment:Present a short fragment of a protocol, say how it was coded and how itcompares to the model.

Literature

A discussion of association measures is given in, for example, Everitt (1977).

Chapter 8

Examples

8.1 Introduction

In this chapter we present examples of the use of the think aloud method toillustrate different styles of modelling and coding. The first and second exam-ples concern psychological studies of problem-solving in physics and computerprogramming. The third illustrates the use of the think aloud method inknowledge acquisition in medical diagnosis.

8.2 Solving physics problems

8.2.1 Introduction

To illustrate the use of protocol analysis we present an example borrowed fromJansweijer (1988). The main purpose of this study is to develop a theory aboutproblem-solving in so-called ‘semantically rich domains’. By semantically richdomains Jansweijer means domains in which a diversity of knowledge is neededto solve problems and in which the problems are often ill defined. As domainfor his study he chose thermodynamics, a topic in physics. The focus of thestudy was on differences between novice and advanced problem solvers. Themain hypothesis, derived from previous research in education and psychol-ogy, was that an important difference between novices and advanced problemsolvers is in the method that they follow. Advanced problem solvers pay moreattention to analysis of the problem. This process is called orientation inpsychology. This enables them to select appropriate knowledge to apply to aproblem. Novices tend to skip the orientation and look for a formula that can


be used to calculate what is asked in the problem. If that formula needs moreinformation, they simply look for additional formulae until a set of equationsis found that can be solved for the asked.

Jansweijer developed a computational model of advanced problem-solvingthat could solve a large part of the problems (if they were presented to thesystem in a structured form). First we will give the text of an example problem,the advanced level model and then the coding of a protocol. Finally we presentJansweijer’s conclusions based on the comparison of a series of protocols of onesubject with the advanced model.

8.2.2 An example problem

Here follows the text of one of the 35 thermodynamic problems Jansweijer used:

An isolated container with a content of 800 dm3 oxygen, is rapidly broughtfrom a starting pressure of 120 kilopascals to 170 kilopascals, so no warmth isexchanged with the environment. How much is the new volume?

The problem is about a quick expansion of gas, without exchange of heatwith the environment. This makes the assumption that the process is adia-batic plausible. For adiabatic processes Poisson’s law (P × V γ = constant)is applicable. In a format which relates the begin state to the end state thisformula reads:

P1 × V γ1 = P2 × V γ

2

P1, P2 and V1 are given, while V2 is the quantity asked for.

So: V γ2 = P1

P2× V γ

1 and V2 = γ

√P1

P2× V γ

1

which can be filled in and computed.

8.2.3 The model of advanced problem solving

The solution above is the result of task analysis. It was obtained by consult-ing experts (in this case physics teachers and on previous research on solu-tion methods for physics problems). Jansweijer starts from the assumptionthat all problem-solving behaviour of a subject is determined by the knowl-edge the subject has concerning the task at hand. The difference between

Examples 143

novices and experts is thus not attributed to differences in the cognitive mech-anisms but to differences in knowledge. Therefore the model concerns theknowledge needed to solve these problems. Two main types of knowledgeare distinguished: knowledge about physics concepts, principles and formu-lae and knowledge about problem-solving. Here we focus on the knowledgeabout problem-solving. Jansweijer based this part of the model on the generalproblem-solving theory which distinguishes three main processes: orientate,solve and evaluate. Within those processes different sub-processes are distin-guished. For the orientation processes, these are: read problem, sketchand schematize. This leads to the following decomposition:

ORIENTATE

READ_PROBLEM

READ_GLOBAL

EXTRACT_FEATURES

SKETCH

READ_FRAGMENT

EXTRACT_FEATURES

CANONIZE

CONSTRUCT_PROBLEM_SKETCH

SCHEMATIZE

DETERMINE_SYSTEM

DETERMINE_STATES

DETERMINE_PROCESS

ANALYSE_ASKED

ANALYSE_QUALITATIVE

READ PROBLEM means the reading of the problem text as a task con-sisting of: read globally (read global) and notice keywords like ‘adiabatic’,‘quick’ and ‘dynamic’ (extract features). In the example problem, theexpression ‘is rapidly brought from a starting pressure of 120 kilopascals to170 kilopascals, so no warmth is exchanged’ indicates an ‘adiabatic’ thermo-dynamic process.SKETCH is described in the model as a task at which the text is read care-fully sentence by sentence (read fragment). The other sub-tasks are againnoticing keywords (extract features) and the reduction of non-standarddevices to their standard form (canonize). The extract features tasklisted here is the same as listed under read problem above. The modelstates that extract features either takes place while reading the problem,or under sketch. The last sub-task under sketch concerns the structuringof all the givens into a knowledge structure (construct problem sketch).


In the example problem this means that a representation of the problem ismade that includes the two states and the properties of these states.SCHEMATIZE is subdivided into five sub-tasks. determine system choosesthe thermodynamical system. determine states decides on the basis of thegivens how many states there are and tries to detect the state variables thattogether describe the states. determine process analyses the changes thatare described in the problem text. It aims at connecting the detected statesby state changes. analyse asked determines the type of unknown: is it astate variable or a process variable? analyse qualitative has as its task tomake a qualitative judgment on what kind of process is taking place and whatkind of changes are occurring (their direction and estimated size).

After this orientation process the model gets into the solving process. Itsdecomposition is as follows:

SOLVE

SOLVE_FOR_VARIABLE

CLASSIFY_VARIABLES

RESOLVE_VARIABLE

GENERALIZE_VARIABLES

SELECT_PRINCIPLE

CHECK_APPLICABILITY

SPECIFY_EQUATION

SIMPLIFY_EQUATION

COMPUTE

SUBSTITUTE_EQUATIONS

FILL_EQUATIONS

CALCULATE

SOLVE FOR VARIABLE generates a system of equations that is solvablefor the asked. This is realized by the sub-tasks classify variables and re-solve variable.CLASSIFY VARIABLES looks if there are still unknowns left (variablesthat are neither known, nor calculable, nor retrievable). If there are no un-knowns left, then there is apparently a solvable system of equations. If thereare still unknowns left, then classify variable repeatedly chooses one un-known, for which the task resolve variable is performed.RESOLVE VARIABLE has five sub-tasks. generalize variables de-termines what kind of variables are under concern (what physics concepts).

Examples 145

That means finding out what the dimensions of the givens are and what thedimension of the asked is. By dimensions concepts like mass, temperature,pressure and volume are meant. Subsequently select principle preselectspotentially applicable physics principles. This concerns principles that relatethe dimensions of the givens to the dimension of the asked. In case there aremore than one applicable principles, the one that introduces the least unknownvariables is chosen. check applicability then checks if all conditions forthe application of the chosen principle are fulfilled. specify equation de-rives from the general physics principle the specific equation that relates thequantities stated in the problem at hand. simplify equation finally simpli-fies the generated equation. It eliminates those variables from the equationthat cannot be quantified because they are expressed in terms of one another.COMPUTE has three sub-tasks. substitute equation substitutes thesubsequent series of equations until the asked is found on the left-hand sideand the right-hand side shows only knowns. fill equation substitutes thesymbols at the right hand side with their numerical values. calculate cal-culates the asked from this final equation.

Finally the problem-solving process ends with the evaluation phase.

EVALUATE

CHECK_SOLUTION

The model proposes that the solution is checked.This description gives only the level of the problem solving method and

not the physics knowledge. The actual model gives detailed predictions of theintermediate results and reasoning steps.

8.2.4 Design of the experiments

Before think aloud protocols can be collected, subjects and problems must beselected. To make the results as relevant as possible for the normal situation,subjects were university students. The novice subjects were psychology stu-dents who had done some elementary physics in secondary school. They wereselected on the basis of a pilot study in which they solved mechanics problemswhile thinking aloud. Many subjects in this pilot study could solve hardlyany of the mechanics problems. As novices those subjects were selected thatshowed some initial competence in solving mechanics problems. The advancedsubjects were physics students who were rated as very good by their teachers.Half of these subjects had also been trained in a problem-solving method for


physics problems. The problems were selected from exercises in a textbook onthermodynamics. The problems varied considerably in difficulty. The simpleproblems can be solved by most novices after reading the introductory textand the most difficult ones are hard even for the advanced subjects. The ex-ample problem above is one of the easier problems. Due to the amount ofwork, the protocols of only 6 subjects were used. They solved 35 problems.Because the computational model could solve 17 of the problems, the analysiswas limited to those. All subjects solved the same problems. Next the modelof the advanced problem solvers was compared with the protocols of novicesand advanced problem solvers. The prediction was that at the level of theproblem-solving method, the model would fit the protocols of the advancedsubjects much better than the protocols of the novices.

8.2.5 A protocol

We illustrate the analysis of the protocols with a protocol of an advancedproblem-solver for the above problem.

1: an isolated container with a content

2: of eight hundred cubic decimetres

3: containing oxygen, is rapidly brought

4: from a starting pressure

5: of hundred-and-twenty kilopascal

6: to hundred-and-seventy kilopascal,

7: so no warmth

8: is exchanged with the environment

9: how much is the new volume?

10: v1,

11: that’s ... eight hundred ten to the power

12: minus three

13: [writes down: V1 = 800× 10−3]

14: [writes down: P1 = 1.2× 105]

15: t1 is t1 [writes down: T1 = T1]

16: that goes to v2 p2 t2,

17: [writes down: V2 = . . . P2 = . . . T2 = . . .]18: in which t2 is also unknown

19: [writes down: T2 = T2]

20: one comma seven ten to the power five

21: [writes down: P2 = 1.7× 105]

22: the volume is asked for [writes down: V2 =?]

Examples 147

23: adiabatic,

24: so now just ... p times v to the power

25: gamma is constant ... p1 v1 gamma

26: is p2 v2 gamma

27: [writes down: P1 × V γ1 = P2 × V γ

2 ]

28: v2 to the power gamma is p1 v1 to the power gamma

29: divided by p2

30: [writes down: V γ2 = P1

P2× V γ

1 ] v2 to the power gamma which

31: is then ... oxygen

32: just calculate gamma,

33: two point nine

34: this times four, I reckoned yes,

35: is one comma four

36: [writes down: γ = 1.4] so then v2 to the power gamma

37: is ... p1 one comma two ten to the power

38: five,

39: divided by ... one comma seven ten

40: to the power five,

41: cross out,

42: times v1,

43: zero comma eight to the ... one comma

44: four

45: [writes down: V γ2 = 1.2×105

1.7×105 × 0.8γ

46: cancels 105 out]

47: thus v2 that is then ... er

48: one comma four becomes already

49: from this total thing

50: from one comma two divided by one

51: comma seven times comma eight to the

52: ... one comma four.

53: [writes down: V2 = 1.4

√1.21.7× 0.81.4]

54: eeer ... point eight ...

55: [uses calculator] to the power

56: that is ... times two ... divided

57: by one point seven that is,

58: one comma four reversed to the ...

59: just store store zero

60: eeer yes,

61: I do not know whether store one,


62: recall one to the ... ehm ... recall

63: zero to the ... yes,

64: nothing,

65: recall one is ... ehm

66: v2 is zero comma sixty two,

67: cubic metres

68: [writes down: = 0.62m3]

69: pressure is increased,

70: volume decreases, yes ...

8.2.6 Coding the protocol

In this study no separate coding schema was used. The model was precise, de-tailed and close enough to potential verbalizations to be used as coding schemaas well. Here we code the protocol in terms of the model noting possible dis-crepancies. Note that the model consists of layers. Only the most specificlayers correspond directly to protocol segments. For example, the processorientate consists of several sub-processes. Only the most specific sub-processes (read global, determine states, determine process, de-termine states, analyse asked, analyse qualitative) appear in theprotocols. Also note that the description above is for the general case. Forsome problems sub-processes may not be relevant. For example, the processsimplify equation will only take place if an equation can actually be simpli-fied. The more detailed version of the model (in particular the computationalmodel) is able to make these detailed predictions. Below we shall not analysethe protocol at that level of detail but focus on the problem solving method.

Lines: Model elements: Comments:

ORIENTATE

READ PROBLEM

1-9 READ GLOBAL OKAt this point the modelpredicts EXTRACT FEATURES.This does not appear inthe protocol.

Next the model predicts aseries of steps for SKETCH.

Examples 149

However, this subjectdirectly proceeds withSCHEMATIZE.

SCHEMATIZE

Here the processDETERMINE SYSTEM is missing.

10-15 DETERMINE STATES OK16 DETERMINE PROCESS OK17-21 DETERMINE STATES OK22 ANALYSE ASKED OK23 ANALYSE QUALITATIVE OK

SCHEMATIZE is as predicted.

SOLVE

SOLVE FOR VARIABLE

The process CLASSIFY VARIABLES

is not predicted for thecurrent problem.

RESOLVE VARIABLE

Here the processGENERALIZE VARIABLES is missing.

24-25 SELECT PRINCIPLE OKHere the processCHECK APPLICABILITY is missing.

26-27 SPECIFY EQUATION OKThe process SIMPLIFY EQUATION

is not predicted here.

COMPUTE

28-30 SUBSTITUTE EQUATIONS OK31-54 FILL EQUATIONS OK55-68 CALCULATE OK

EVALUATE

69-70 CHECK SOLUTION OK

As the reader can see, all statements in the protocol are predicted by the model.Some processes do not appear here because they are not predicted by the moredetailed version of the model. These are: canonize, classify variables


and simplify equation. However, several predicted processes are missingand these are potential discrepancies with the model:

extract features: It is possible that the subject selected the formula with-out explicitly extracting the feature ‘adiabatic’. There is no good reason (suchas automated process) to assume that this was done very quickly so we haveto assume that it was skipped.sketch with its sub-processes: the subject constructs a schema (in physicsterms) without first constructing a sketch. For this problem sketch and schemaare very similar so this is easy to do. It is not consistent with the model,though.determine system: This is not mentioned but it is very plausible that thiswas automated. In this case the selection of the thermodynamic system istrivial (the gas in the container). This will not be verbalized here.generalize variables: This is also trivial in this case and the fact that itdoes not appear in the protocol can be attributed to automation.check applicability: This is skipped. Because the subject also did notperform extract features (which is a cue for applicability) it is quite pos-sible that did process did not take place. A missing process.

This means that all of the 70 protocol segments are predicted by the modelbut that 3 processes remain missing from the protocol.

8.2.7 Results

The model was compared with the protocols of two novice and four advancedproblem solvers. The model is tested on three levels:

1. Validation of the sequence of tasks within the model.2. Validation of the completeness of the model (are there protocol fragmentsthat cannot be coded in terms of the model?).3. Validation of the level of detail and correctness of the model (are theremodel statements that are never found in the protocols?).

Examples 151

8.2.8 The sequence of tasks

The task structure specified in the model largely agreed with the task struc-tures found in the protocols of the subject. In all protocols the orientation taskis carried out completely before the actual solving of the problem. Thereforethe model provides an accurate picture of the analysis of the problem textthat takes place before the solving of the problem. Diversions are not found atthe global level of the task structure but within sub-tasks. For example, thesequence of steps within schematize does not always agree with the sequencethe model predicts. Jansweijer concludes that his model is more in line withthe problem-solving behaviour of advanced subjects than models proposed bysome other researchers that do not take an orientation phase into account.

There was a clear difference between the protocols of the advanced prob-lem solvers and the novices. The clearest difference was in the ORIENTATION

process. Novice protocols show less ORIENTATION and also this process doesnot take place where it is predicted but is spread out between other processes.

8.2.9 The completeness of the model

The model is not complete. This follows from the fact that 17 per cent of allprotocol statements of this subject was coded as **no match**, which meansthat no parallel action could be found in the model. In many protocols, forexample, those **no match** statements concern ‘recognition of the problemas a particular problem type’. Statements were made, like:

‘is just like the previous problem, you can just say ...’

‘this is a ... well, rather a standard problem’

Jansweijer concludes that the model would increase in completeness if a com-ponent for the recognition of problem types were added. Another kind ofincompleteness detected concerned the derivation of a physics principle. Theadvanced subject consequently starts from the most general, basic principle,whereas the model in most cases initially chooses an more specific principle.And finally, the model lacks the ability to consider its own problem-solvingprocess, where the subject provides evidence that he does have this ability:

‘ ... well yes, I do not need to further what you call it, with

the components, that makes no difference, anyhow’

‘and that is ... well, it should in any case be larger’


8.2.10 The level of detail of the model

At some points the level of detail of the model is found to be too high. Forexample, within sketch the model has separate tasks for reading the problemtext sentence by sentence, extracting the important features, integrating thegivens in a sketch, finding the asked and determining the device in its standardform. The subject under investigation tends to integrate most of these taskswith the model’s next task schematize. Jansweijer concludes that althoughthe model is at some points too detailed this does hardly endanger its validity.He argues that it is very plausible that the separate tasks specified by themodel are still carried out by the subject, but automated to such a degreethat they no longer appear as separate statements in the protocols.

8.2.11 Conclusion

The example provided above shows how protocol analysis can be used to in-vestigate a scientific question. In this case the evaluation of the model showedthat the model was quite accurate. The analysis of the protocols formed anadequate basis for further refinement of the model.

8.3 Explaining novice errors in computer programming

8.3.1 Introduction

Several studies of computer programming have used the think aloud method.Most of these studies were aimed at understanding the errors and difficul-ties of novice computer programmers. One example is the study reported byvan Someren (1990). Subjects solved programming exercises while thinkingaloud. The analysis of the protocols was based on a detailed analysis of theprogramming problems and the hypothesis that errors were due to misleadinganalogies between constructions in the programming language that was used inthe exercises and structures in our common language or in other programminglanguages.

8.3.2 The model

On the basis of this idea a psychological model of programming behaviour wasconstructed. This model is in the form of production rules that transform analgorithm into a Prolog program. The model does not address other cognitive

Examples 153

processes that also play an important role in programming, such as: under-standing the problem, verifying the program, finding bugs and repairing these.A particular student can be modelled as a subset of these implementation rules.Some rules (‘malrules’) are incorrect, producing incorrect programs. The mal-rules are based on analogy between structures in the original algorithm and inthe Prolog programming language. To construct this model we started witha task analysis. How can a correct program be constructed from the initialproblem specification? Consider the following example of an algorithm forcomputing the maximum of a list of numbers:

Maximum(List, Max):

Initialize Max at 0

REPEAT

take next element N from list

IF N > Max THEN ASSIGN N TO Max

UNTIL list is EMPTY

There are several ways to implement this in Prolog. What is needed in anycase is a condition to stop the running of the program when the end of the listis reached, in order to avoid an endless loop. The condition to stop the REPEATcycle is that the list is empty. What is also needed in Prolog structures, is avariable to which the largest number found can be assigned, so that when theprogram stops running, it can give back this maximum.

Lists in Prolog consist of a ‘head’ and a ‘tail’. The head is the first elementof the list, the tail is the rest of the list. Take for example the list [2,7,8,4],the head is 2, the tail is [7,8,4]. A list can be empty, noted down as: [].

A Prolog program consists of a sequence of ‘clauses’. There are two typesof clauses. One type is a kind of IF-THEN rules and the other is a kind offacts. The program is started by calling it with a structure that has the sameform as the THEN-part of one or more rules. So for example, the programfor finding the maximum of the list [7, 8, 4] would be started by the callmax([7, 8, 4], M) where the result would appear as the value for M. Thestop-condition part can be implemented in Prolog, for example as one of thefollowing ‘fact’ structures:

max([], M, M). or max([M], M).

depending on the rest of the program. However, there is a Prolog clause

max([], M):- fail.


that is wrong, although, it is actually more similar to the expression ‘UNTIL

list is EMPTY’ than the correct implementations. The special Prolog con-struct fail has the effect of simply stopping the execution at this point, andthe maximum number is not exported. The correct implementations haveadditional effects (passing on substitutions for the variable Max).

A Prolog structure that is also false, but perhaps superficially even moreappropriate is:

max([], M).

Like the previous structure this is intended to implement an action that mustbe applied when the ‘empty list’ [] is encountered by the system. At thatpoint the procedure must stop.

This example illustrates how implementation errors can be explained bynaive analogies between the algorithm and the implementation. Reasoningsteps in the protocol were matched with applications of rules (and malrules).No additional coding scheme was constructed because application of rules andmalrules can be recognized both from the results of a reasoning step andfrom the protocol text. Coding was done at a fairly low level of granularity:episodes in the protocols were coded as showing clear evidence of applying areasoning step from the novice model or not. The frequency with which rulesand malrules match protocol fragments were reported. The results show thatmost rules and malrules appeared in the protocols, which gives some initialsupport to the model. Some rules did not appear in the protocols. The authorhas no good explanation for this, other than the relatively small set of data.In this study other support for the model is presented that is based on (false)programs that were constructed by students taking a course in Prolog. Manyfalse programs can be explained from the application of malrules. This is anexample of the use of several types of data to evaluate a model.

8.3.3 Design of the study

Subjects in this experiment were students who took an introductory course inProlog programming. Two types of data were used: a collection of programswith errors that were produced by students during practical work and a set ofthink aloud protocols.

8.3.4 An analysed protocol

To illustrate the coding procedure we give a protocol fragment and the rulesand malrules that were identified in this fragment.

Examples 155

Protocol + subject’s notes Analysis and comments

1: err, stop-condition must be This shows that the subjectthe list is finished has implicitly already

found an initial algorithm.2: he has to look at the whole Repeats algorithm.

list, anyway,

3: because otherwise he can never

know which is the biggest

4: call the predicate ‘biggest’ ‘Biggest’ corresponds to thefor the moment procedure that was called ‘Max’

5: and it can only say it when in the description above.the first list is empty

6: so the given list is an empty

list

7: so the final stop-condition,

8: that has to come in another

argument

9: this is the variable for the This variable corresponds to thebiggest variable M in the description above.

10: anyway the stop-condition Uses false implementation rule.[writes: biggest([], X)] The implementation is based on

the algorithm described above:Initialize Max at 0

REPEAT

take next element N from list

IF N > Max

THEN ASSIGN N TO Max

UNTIL list is EMPTY

11: ... perhaps it will be 1 of 3 (he means: three clauses)12: again a ‘biggest’, that is I

have a list,

13: a list with a head and a tail,

14: you have to look at it number

by number

15: another biggest,

16: that means that I have a list

17: a list with head and tail


18: you should look at it number

by number

19: er, here I have another

argument ...

20: and that is the biggest, er,

21: now I do it so that he takes

the first,

22: fixes it

23: then looks at the head of Here the body of the REPEAT

the tail loop is partially implemented[writes: biggest([H|T],Y):- following an incorrect

compare(H,X,Y).] implementation rule.25: you have to fix that instead

of the other

26: that is how it must be

27: and finally, er, if you have

done them all

28: you can go to the stop-condition

29: that is the principle

30: which should work, I think Explicitly evaluates the resultingprogram.

The resulting (erroneous) Prolog program is:

biggest([], X).

biggest([H|T], Y):-

compare(H, X, Y).

The analysis shows that only part of the protocol can be explained by themodel. In fact only actual program construction steps are modelled and otherprocesses (e.g. analysis of the algorithm, evaluation of the partial program)are not part of the model.

8.3.5 Conclusion

Since this study used both products of problem-solving (in the form of an-swer to exam questions and solutions produced by students during practicalprogramming work) and think aloud protocols, a comparison can be made be-tween the analysis of products and of protocols. The main difference is thatthe protocols never consist of a direct path from the problem to the program.

Examples 157

There were always several attempted solutions, some of which were writtendown but others were just mentioned, evaluated and rejected. In many casesthe final solution tells very little about the problem-solving process.

As illustrated by the analysis above, parts of the protocols can be explainedby false implementation rules. The result of this study was that support wasfound for the model but that a large part of the protocols involved otherprocesses than those predicted by the model.

8.4 Acquisition of medical knowledge

8.4.1 Example: a medical diagnosis task

In the context of the development of an advice system for an ambulance servicewe studied decision making by doctors. The task is to decide if an ambulancemust be dispatched on the basis of an interview by telephone. This situationdoes not allow application of the think aloud method ‘in vivo’, so we presentedthe interview data on a sheet of paper to experts from the medical faculty andasked them to take a decision while thinking aloud about the case.

This problem is non-standard for these experts, because they work ina hospital setting where much extra information is available (in particulardata about electrocardiogram, blood pressure data and physical examination).They found this task very difficult. The number of attributes is about 40 (notall attributes apply to each case). There were only three possible decisions:decide to send an ambulance, decide that medical attention is needed but thatit is not necessary that the patient is taken to hospital by ambulance, or de-cide that no immediate action is needed. In this version of the task all dataare available to the decision maker, a cardiologist, and he is free to use themin any order. The data for one patient are listed in Appendix D. The taskanalysis in this case used the following information:

(a) A superficial inspection of several protocols, one of which is reproduced inAppendix D. This was the basis of the process model.(b) A study of the medical literature. Many concepts and diagnostic rulescould be obtained directly from the literature. This was a problematic andtime-consuming exercise because of differences in terminology and the orga-nization of medical knowledge by disease states and processes rather thandiagnostic use. Many books, for example, discuss causes and varieties of heartdiseases and the knowledge that is directly useful for diagnosis is spread outand sometimes completely missing.


(c) Existing models of diagnostic reasoning. This also inspired the processmodel.

8.4.2 Knowledge structures

In an earlier stage of the project, the information about the patient that ispossibly relevant in the context of sudden chest pain had been collected frominterviews and a study of the literature. From this information those itemswere selected that can reasonably be requested in a telephone interview. Also,many possible diagnoses were collected. The most important are: heart in-farction, angina pectoris, tachycardia, pulmonary embolism, non-organic chestpain (i.e. no medical cause for the complaints). It is possible to find morespecific diagnoses, but this level is specific enough to make a decision. Thefirst requires urgent treatment. Two other, pulmonary embolism and anginapectoris, also require treatment, but the situation is (generally) less urgent.Neither the list of possible diagnoses nor the list of possibly relevant patientdata is exhaustive. We found that from time to time someone would raise apossible diagnosis that had not been considered and that required additionalpatient information.

How can we structure the knowledge involved in this diagnostic task? Wemake a distinction between patient data, decisions and diagnoses. The rela-tion between patient data and diagnoses is rather complex. Almost all dataare associated with more than one diagnosis. In fact, many complaints andsymptoms can have completely different causes. Medical textbooks and inter-views with experts show that many concepts are used in the domain. Here wedescribe the most important types with some examples. This is an informalversion of task analysis.

Patient data: The attributes to be used here were the result of an ear-lier stage in the analysis. See Appendix D for a list of these data. Examplesare: the patient has pain in the chest, the patient has difficulties in breathing(dyspnoea), and the patient is female.Decisions: The decisions are sending an ambulance or not. In the latter casethe caller is either advised to consult one’s doctor immediately or told his orher complaints require no urgent treatment, but that it is probably good tosee the family doctor.Diagnoses: In many cases, the decision is based on a preliminary diagnosisand an estimate of the severity. Diagnoses in this task are: heart infarction,angina pectoris, pulmonary embolism, non-organic chest pain and several otherless frequent categories. Diagnoses can be organized in classes. For example,

Examples 159

important classes are ischaemic diseases, cardiac diseases and afflicted thorax.Diagnostic classes: The diagnoses are often organized in hierarchies. Herewe give part of a possible structure:

urgent

urgent-cardiac

ischaemic

heart infarction*

medical attendance required

pulmonary embolism*

anginous

angina pectoris*

not urgent

not-urgent-cardiac

tachycardia*

thorax

muscles

bones

non-organic chest pain*

Here the concepts marked with * are diagnoses and the other are diagnosticclasses. This structure is currently incomplete. For example, the thorax diag-noses are not elaborated.Interpretation of attributes: From attributes, interpretations can be de-rived. For example, the attributes ‘does the dyspnoea get worse with moving’and ‘does the dyspnoea get worse with a deep sigh’ are very similar in meaning.If both are positive they indicate a pulmonary cause and if both are negativethey exclude it. If one is positive and one is negative, it is uncertain. Thesetwo attributes can therefore be combined into a single new internal attribute,which reduces the number of attributes needed for further reasoning. In thiscase, domain knowledge is needed to find the interpretations. This domainknowledge was obtained in the task analysis stage. It is unlikely to be com-plete, because the expert may well make interpretations that are not part oftextbook knowledge. We cannot give an exhaustive list of the interpretationsthat were found in task analysis, but we can give a few examples (in informallanguage):


IF dyspnoeic pain AND

NOT pleural irritation AND

NOT hyperventilation AND

NOT high CO2 level

THEN ischaemic_pain

IF stiff fingers OR

pain: pins and needles OR

pain: prickling OR

pain: burning

THEN high CO2 level

In general we shall refer to terms such as ‘serious’, ‘cardiac’, ‘pleural irritation’as concepts. Concepts are used to state conclusions about a patient: if it isestablished that a patient has a ‘cardiac’ diagnosis, then this is a conclusionabout the patient.

We have described the relations between concepts in symbolic, non-numericalterms. It is possible that people know weights or statistical data about theserelations. Some of these data can be found in the literature, but certainly notall. We do not know (during task analysis) which of these data the expertknows.

8.4.3 The problem-solving process

Now that we have described the knowledge that can be used for this task, weturn to the problem-solving process. On the basis of the types of knowledge,we distinguish the following types of reasoning steps:

1. Inference steps: The inferences can be further classified by antecedentand conclusion (attributes, interpretations, diagnostic classes, diagnoses, de-cisions), but for our purpose this is not essential. The task analysis will havegiven a table saying for each of these concepts, if an attribute is a positiveindication, a counterindication or neutral with respect to each concept. Forexample:

Taxonomic reasoning: Diagnostic classes are organized in a taxonomy. In-ferences about concepts that are in a hierarchy have implications for otheritems in the hierarchy. For example, the diagnostic class ‘not serious’ coversseveral more specific diagnostic classes and specific diagnoses. Taxonomic rea-soning means making inferences using properties of the taxonomy.

Examples 161

2. Recognize inconsistency: An inconsistency occurs when there is bothpositive and negative evidence for a conclusion.3. Resolve inconsistency: This can be done in several ways:

Use extra knowledge to discard premise: Although an attribute or inter-mediate conclusion may in general have predictive value for another conclusion,this may not be so in a particular situation. In a particular context a conclu-sion may be explained by some other interpretation. Recognizing either thenew conclusion or the old one that is inconsistent with it involves finding apossible interpretation and testing that. This involves extra knowledge andmay cause requests for extra information or re-examination of old information.Request extra information (to resolve inconsistency): This request canbe observed as looking back over the sheet with data, retrieving informationfrom memory or invoking extra concepts.Take decision: This means that the inconsistent evidence is somehow usedto come to a decision without really resolving the inconsistency.Postpone: Wait until new information makes it possible to resolve the prob-lem.

To resolve inconsistencies makes it necessary to retrieve the arguments thatare involved in the contradiction. This may be difficult to do without appro-priate intermediate structures. It is also a potential cause of errors, becausearguments may fail to be retrieved. The actions to resolve inconsistencies mayuse knowledge that has not been used before.

The overall process structure is determined by the way in which the attributesare handled. Because there are many data about the patient and becausethese must be read by the expert there will be a stepwise process in which anew piece of information is used to update the current knowledge. A possibleprocess structure is given in Figure 8.1.

A procedural model can be used to generate a predicted protocol. This isa protocol that is obtained by applying the model to a problem. If the modelis a full computational model it can be run on an example and generate asolution trace that corresponds to a protocol. Models that are less formal orthat do not contain full details of the knowledge involved can also be used togenerate predicted protocols. An example of a predicted protocol for this taskis given in Section 8.4.5.


Currentknowledge

Infer

Decide

Recognizeinconsistency

Inconsistency Resolve inconsistency

Readattribute

FIGURE 8.1: Model of diagnostic problem-solving

Examples 163

8.4.4 Alternative models

We have not discussed all possibilities for the problem solving process. Forexample, another process structure would be to work ‘backward’ from thediagnostic classes. One could exclude or verify diagnostic classes or specificdiagnoses by selectively examining the data. Since our expert is not used tothese forms and does not know precisely the attributes that are used, thisstrategy is less likely here. A full task analysis would include these. Thescope of alternatives is partially determined by the knowledge that we mayexpect from our expert and by the intended scope of the system: shouldit be able to give explanations? Which range of cases should it be able todiagnose? Although the context of this example is knowledge acquisition andnot a psychological study, psychological knowledge is relevant here becausepsychological factors determine the way in which a human expert will performthe task. This knowledge will allow a knowledge engineer to abstract fromthese factors and will help to decide which part of the model is worth includingin the computer system. Factors that were important in architectural designalso are important in medical diagnosis, in particular the capacity of workingmemory and the knowledge that the expert has acquired with respect to thisparticular task.

8.4.5 Predicted and actual protocol

In this section we address two issues: (1) How the model can be used to explainproblem-solving behaviour in the form of a predicted protocol and (2) how itacts as a source of information about the domain. To illustrate the explana-tory power of the model, we first show a predicted protocol and then the realprotocol, discussing the discrepancies and the way in which the protocol canprovide new information. The first part of the case description of the patient:

(1) Sex: female(2) Age: 30-40 years(3) Complaints: palpitations

dyspnoeatired

(4) How long has the patient had dyspnoea? over 4 hours(5) Has the patient had this dyspnoea ever before? yes

For the complete case description we refer to Appendix D. Next we show the


first part of the protocol predicted by the model. This predicted protocoltakes the first three lines of the case description as input. In the left column ashort indication is given of the patient data. The middle column contains thecontent of the working memory. In the right column we list the explanationsand knowledge that are used in the reasoning step.

Predicted protocol:

Patient data Working memory Knowledge used

1. Sex: female sex: female

2. Age: 30-40 sex: female sex: female and age: 30-40 ⇒age: 30-40 not urgentnot urgent

3. Complaints:palpitations sex: female sex: female and age: 30-40 and

age: 30-40 palpitations ⇒not urgent possibly non-organic chest painpalpitations non-organic chest pain ⇒ not urgentnon-organic

sex: femaleage: 30-40 sex: female and age: 30-40 andnot urgent palpitations ⇒palpitations possibly tachycardianon-organic tachycardia ⇒ not urgenttachycardia

dyspnoea sex: femaleage: 30-40not urgentpalpitationsnon-organictachycardiadyspnoea This fills up working memory.

Subject will drop one itemor replace several items by anew concept. Items that will be

Examples 165

dropped can be those that wereused least frequently and longago (sex: female or age 30-40)or those that are now redundant.Taking the first we would get:

age: 30-40not urgentpalpitationsnon-organictachycardiadyspnoea

tired not urgent palpitations and dyspnoeapalpitations and tired ⇒non-organic possibly non-organic chest paintachycardiadyspnoea palpitations and dyspnoeatired and tired ⇒

possibly tachycardia

The subject in this case is a cardiologist. He is an expert on heart diseasesbut is not used to this type of patient data. In practice he always has at leastsome results of physical examination and he can see the patient and talk tohim or her directly. Next we show a fragment of the actual protocol:

1: This is a woman between 30 and 40 years old

2: so I think that it’s probably not too serious.

3: ... uhh Patient has palpitations,

4: is dyspnoeic and tired,

5: and that are symptoms from what I think,

6: they don’t really belong to uhh cardiac complaints.

7: ... This could be uhh a more non-organic chest pain kind of

patient.

8: Palpitations do occur with women of that age as tachycardia

9: but then you should see the cardiogram.

As you can see, the predicted protocol and the actual protocol up to line 8correspond rather well. In line 9 the expert says that he could use some moreinformation, namely a cardiogram, but this is not available. This informationwould be needed in order to make a conclusive diagnosis. However, this is notcalled for by the task.


The next bit of information read by the cardiologist is that the patienthas dyspnoea for over four hours, and that she had it before. The protocolcontinues as follows:

10: This dyspnoea is has been going on for more than four hours

11: so it is probably not very urgent ... but ...

12: everything is possible.

13: She has had this dyspnoea before,

14: but still, this doesn’t tell me very much because

15: this might have happened one day ago, but also five

years ago,

16: so I don’t know what to do with this information,

17: I would like to ask when she has had this dyspnoea before.

Here we see a situation where the cardiologist needs additional informationto be able to use the patient data on dyspnoea for diagnosing the cause ofthe patient’s complaints. So far, the cardiologist inferred that the complaintswere unlikely to be caused by a serious heart problem (lines 6-8). The reasonthat more data about the dyspnoea are given is that dyspnoea might be anindication for a lung problem (pulmonary embolism), and in this case medicaltreatment is required. So there might be an inconsistency at this stage betweenthe content of the working memory not urgent and the new hypothesis thatmedical treatment is needed because of pulmonary embolism. The cardiologisttries to resolve this conflict by asking for additional information.

In task analysis a clear distinction is made between dyspnoea and pain inthe chest. However, somewhat further on in the protocol the expert says:

24: She’s having pain and this pain has been going on

25: for more than four hours as well,

26: and this makes me wonder if the pain and the dyspnoea are

somehow related,

27: because, if they both last for longer than four hours

28: they might be the same thing,

29: but then the patient is unable to distinguish

30: between the concept of pain and the concept of dyspnoea.

In this fragment, the expert assumes a communication problem: the subjectdescribes a single dyspnoea and pain instead of two different symptoms. Thisis not what we expected in the task analysis. This kind of information canonly be derived by studying actual protocols of problem-solving experts.

Examples 167

8.4.6 Results

This example illustrates how both domain specific knowledge can be extractedfrom protocols and how additional problem solving processes can be discoveredfrom a protocol. Knowledge in the form of production rules can be extractedfrom the protocol basically by taking the content of working memory and theaction at a particular step and turning this into a rule. In a later stage the rulesmust be analysed. In some cases this gives rules that were already known fromthe task analysis, but in other cases new events take place. In the protocolabove, a new process was discovered that was not found in the task analysis.The expert hypothesizes that the patient refers to a single feeling as both painand dyspnoea. This is a matter of language rather than a medical issue, butit is actually quite important and the protocol suggests how such a referenceproblem may be detected and resolved.

Resolving terminological problems has its own type of inference, but issimilar to resolving inconsistencies. Because segments from an episode aboutterminology are coded as a step in resolving inconsistencies, some sequencesin the protocol do not fit the model.

8.4.7 Conclusion

This analysis illustrates a possible use of think aloud protocols in knowledgeacquisition. This analysis was one of the elements on which an advice systemfor the ambulance service was based. The final system has a structure sim-ilar to that used in the analysis of the protocol, although finally a differentconceptual structure was used. The system is described by Post et al. (1993).

Appendix A

Exercises

A.1 Exercise 1: collecting verbal data

The purpose of this exercise is to practice procedures for collecting verbaldata, in particular think aloud protocols and retrospective protocols, to obtainexperience with transcribing a protocol and to find out what it means to be asubject in a think aloud experiment.

For this exercise you can use any problem-solving task that is compatiblewith thinking aloud. In Appendix B you find instructions for two tasks: wa-terjugs and improving technical devices. In addition to the materials for thetask, you need a cassette recorder. You also need the help of an assistant, whocan think aloud and who can assist you when you think aloud yourself.

Do the following:

1. Extend the instruction for the problem-solving task with an instructionfor the think aloud procedure. Consider the possibility of a warming up forthe think aloud procedure. Explain the instruction to your assistant.2. Take two think aloud protocols of two problems within the same task do-main (either waterjugs or improving devices). You are the subject and yourassistant should lead the session.3. Take two retrospective protocols of two problems in the same task domain.Again, you are the subject.4. Now your assistant solves two waterjug problems while thinking aloud.5. Finally your assistant solves two improving devices problems while thinkingaloud.


6. Transcribe about four pages of the recorded think aloud protocols.

From this experience answer the following questions:

(a) Did the verbalizations change, for example as a result of practice ?(b) Are there differences between the think aloud protocols and the retrospec-tive protocols? If so, are these consistent with those discussed in Chapter 2?(c) Would it be possible to obtain these data with interviews?(d) Does thinking aloud have an effect on the problem-solving process?(e) Which information is lost by transcription from the tape?(f) As far as you can tell, do the protocols reflect the cognitive process?

A.2 Exercise 2: applicability of the think aloud method

In order to design an experiment to obtain data about a cognitive processyou need to define, beside the cognitive process, the following elements: a setof problems, a verbalization procedure and subjects. The cognitive processshould occur when the task is performed by these subjects and the methodshould provide valid and complete data about the process. Examples of cogni-tive processes are overcoming impasses (in problem-solving), making assump-tions about unknown information or integrating large amounts of information.These processes can occur in the context of tasks such as computer program-ming, bird watching, legal decision making, predicting money exchange rates,selecting personnel and planning a meal. For each process, in the context ofparticular task, find a suitable set of problems, a suitable type of subjects andan appropriate technique for collecting data about the process.

A.3 Exercise 3: task analysis and model construction

A.3.1 Introduction

The purpose of this exercise is to practise task analysis and construction ofpsychological models. This example is taken from a study by Van Daalen-Kapteijns & Elshout-Mohr (1981). They studied a task in which verbal skillsmay be involved, namely learning the meaning of new words from texts inwhich these new words are used in context. Their main research questionswere: (1) are verbally gifted persons more able to infer word meanings thanothers and (2) if so, in what respect does their approach to the task differ fromthose less verbally able? The extent to which people are ‘verbally gifted’ was

Exercises 171

measured by a specific intelligence test.Their research showed that the answer to the first question is yes. Verbally

able persons are indeed more proficient in inferring word meanings from textsthan others. To analyse in what way they differ from others, Van Daalen-Kapteijns and Elshout-Mohr did a follow-up investigation in which protocolanalysis was used.

On the basis of a first set of protocols the researchers hypothesized that thedifference in performance is mainly determined by a different problem-solvingmethod when performing the task. The main difference is that persons of lowverbal skills tend to substitute the unknown word by a familiar word or expres-sion, whereas the verbally able persons try to infer the main characteristicsof the meaning of the new word and adapt these to all new information theyget. As a result of this less able persons forget information that was given inearlier sentences or that was inferred before. Therefore they need to go backto earlier sentences more often. Another effect is that they may give a solutionto the problem that is not consistent with all sentences. This results in poorsolutions or long solution times.

A.3.2 Example problem

Here follows the instruction for the task and part of a think aloud protocol.Subjects have a sheet of paper on which they can take notes. The subjects’notes are at the bottom of the episode during which they were made. Forexample the note ‘Kolper can be part of a room’ was made after readingsentence 1 and before reading sentence 2.

Instruction for the subjects

This study investigates the way people handle texts in which words unknown tothem are used. Before you, you have five little cards. At the back of each carda sentence is written, using the same unknown word. Your task is to figureout what each sentence means and then to infer valid information about themeaning of the new word. We ask you to think aloud while doing the task.Note any information you find out about the unknown word down on the padbefore you.

Two further remarks:

1. The new words used in this experiments have complex meanings that canonly be captured in one or more sentences. None of the words can be substi-


tuted by a single existing word.2. It may happen that a new word reminds you of another word which soundsthe same. Try not to go by such likeness, because sound kinship does not nec-essarily imply kinship of meaning.

Problem

Sentence 1: A room with one or two kolpers on a court is not very attrac-tive.

Sentence 2: During a heatwave many people long for kolpers so the sale ofmarquees reaches a record.

Sentence 3: The ad stresses the fact that the room is facing south, but just onthe south side the room has kolpers.

Sentence 4: Although this room certainly does not have kolpers, it is so deepthat you need to turn the light on, during the day as well, on clouded days.

Sentence 5: Those large trees along the canals are indisputably very beau-tiful but as a consequence many canal-houses have to put up with kolpers allsummer.

A.3.3 Suggestions for task analysis and psychological model

The basic structure of the task can be characterized as follows:

Goal: a definition that has approximately the following structure:A <target word> is a <concept/noun> that is a <description> of particularproperties distinguishing <target word> from <concept>.For example, a possible solution for the problem above is‘ A kolper is a windowthat cannot be reached by the sunlight’.Givens:(a) Directly given properties. These are properties of the target concept thatare given directly in the text.(b) Inferred properties. These are not given directly in the text but can beinferred. For example, the phrase during a heat wave people long for kolperssuggests the property that kolpers reduce heat. This property is not stateddirectly but is inferred from the given properties.

Exercises 173

The process of finding word meanings may use the following pieces of in-formation:• Current set of properties consisting of both inferred and directly given prop-erties.• Current concept.

Also, the process may consist of:• Inferring new properties.• Integrating new properties into the current set.• Constructing a tentative definition.

Integrating new properties involves handling possible inconsistencies. For ex-ample, suppose that the current set of properties of kolper is:part of room, furniture, size at least size of chair, reduces heat

(during heatwave)

and a new property is a kind of window. This is inconsistent with theproperty furniture. This inconsistency must be resolved by revising eitherfurniture or a kind of window.

The psychological hypothesis is that poor performance on this task iscaused by a weak method for updating the current property set. There aretwo important properties of possible strategies:

Revision versus replacing: Start with an initial property set. If this isinconsistent with a new property then drop the initial property set and con-struct a new one from the current text. The drawback is of course that thisdoes not maintain consistency with the previous texts. The advantage is that itavoids the revision process which can be difficult, because it requires retrievingthe justification for the inconsistent properties and revising this, which mayaffect other properties.Size of the property set: A text may contain several properties and it maybe possible to infer quite a few more from the text and the other current prop-erties. Revision or replacing can involve few properties (e.g. two: a conceptwith one discriminative property) or more. More properties require more rea-soning during the updating process.

The most elaborate strategy consists of full revision with a maximal prop-erty set. The strategy involving the least resources is replacing with a minimalproperty set. If resources allow application of the first strategy the result islikely to be better, because full use is made of all available information. The


hypothesis is that better performers will follow the better strategy and poorperformers the simpler and poorer strategy.

A.3.4 Exercise

In appendix C two segmented think aloud protocols are reproduced. Do thefollowing:

1. Make a task analysis. Limit the scope of the analysis to what is relevantfor the research issues and to what can be expected to appear in protocols.First try to make the task analysis without looking at the protocols. If thatfails, use one of the protocols as a pilot protocol.2. Apply the psychological hypothesis to the task analysis to obtain a psy-chological model. The model should be detailed enough to allow the con-struction of a coding scheme later on. Make a model for each type of problem-solving behaviour. The models should clearly reflect the hypothesized differ-ences between novices and experts and people with high and low verbal skills.3. Construct a coding scheme.4. Code the given protocols and have someone else do the same with the samecoding scheme.5. Determine the inter-coder reliability of the two codings.6. Use the coded protocols to answer the question that motivated the analysis.7. Try to collect new protocols from subjects that are similar to those used inthese experiments, to collect your own protocols and analyse these with thecoding scheme.8. Discuss the implications of the results of the analysis for the research ques-tion.9. Discuss the difficulties in applying the think aloud method and consideralternative methods.

A.4 Exercise 4: knowledge acquisition

Appendix D gives the full protocol of a medical diagnosis problem, part ofwhich was discussed in Chapter 8. Continue the analysis given in Chapter 8with the rest of the protocol. Construct a psychological model for this task,as far as possible. Since you are probably not an expert yourself this modelwill be very incomplete with respect to the actual knowledge that is used. Usethe protocol to complete your model and to check if it is complete. Revise themodel on the basis of the protocol.

Exercises 175

A.5 Exercise 5: physics problem solving

In Chapter 8 we discussed a model of advanced problem solving in physics. InAppendix F you find a think aloud protocol of a novice subject (translated from Dutch into English). Code the protocol in terms of the model and comparethe protocol with the model. Use the analysis to check Jansweijer’s hypothesisabout the differences between advanced and novice problem solvers.

Appendix B

Instructions for twoproblem-solving tasks

B.1 Task 1: waterjug problems

For this task you only need paper and pencil. There are several variations ofthe basic task. Here we give the instruction:You have two waterjugs. One jug can hold 5 litres and the other 3 litres. Thejugs have no marks and one cannot see how much water they contain. Theycan be filled from a water tap and emptied in a sink. One can also pour waterfrom one jug in another. Is that clear ?We ask you to note the contents of the waterjug on this sheet. Please maketwo columns, one for each jug. Initially, both jugs are empty, so we have 0 -0. Is that clear? Your task is to make 4 litres of water.

It is easy to make variations by changing the target amount, the volume ofthe jugs and also the number of jugs.

B.2 Task 2: improving technical devices

The instruction for the task ‘improving technical devices’ reads as follows:This task consists of inventing improvements for technical devices. I shall giveyou the name of a technical device and your task is to invent five improvementsof this device.

Some possible devices are: washing machine, telephone and elevator.

Appendix C

Protocols of ‘learning wordmeanings’

Protocol 1

1: [reads sentence 1]

2: oh ... when you find yourself

3: ... in a room with 1 or 2 kolpers on a court

4: ... well, it evidently is something depressing

5: something that takes away the view

6: ... er ...

7: with 1 or 2

8: although, not necessarily so

9: ... because a room on a court means having no view at all

10: ... so, strictly speaking that does not say anything

11: ... it has something to do with a room ...

12: I am looking whether kolpers is linked to room or court

13: ... a room with 1 or 2 kolpers on a court

14: ... I think it means something like windows or so

15: like a room with 1 or 2 windows on a court

16: but I don’t understand why they put 1 or 2 windows because

17: that would imply that there could well be other windows

18: that do provide a large view

19: ... first write down what I know now

20: ... it has something to do with a room

21: I think one could say that its a part


22: ... of the room [starts writing]

23: a kolper can

24: ... I have to say ‘can be’ because it can as easily be part

of something else

25: ... can be part of a room.

Note: Kolper can be part of a room

26 [reads sentence 2]

27: many people long for kolpers

28: that could mean that it is

29: ... er is kind of shut off window

30: or a window being in the shade

31: that would be congruent with the last sentence with that

depressing bit

32: maybe it means something like a window blocked by a wall

33: ... right in front

34: not permitting a view ... er and is cool ...

35: during something like a heatwave

36: ... er ...

37: I can’t infer much new information

38: I just note down that it has something to do with a room

39: ... er ...

40: now I am looking at this sentence’s construction again

41: ... who are the ones longing for kolpers

42: ... if a kolper were a window in the shade or something

like that

43: they would not have to buy marquees

44: maybe it are the shops or something that are longing for

kolpers ...

45: actually this sentence makes it less clear

46: except that I now know that it has something to do with a

window

47: I don’t know much else ...

Note: a kolper has something to do with a room


49: this reminds me immediately of a window without a view or

something

Protocols of ‘learning word meanings’ 181

50: or a wall just in front of it

51: er ... maybe more precise ...

52: something that is in the shadow ...

53: because on the south side you would expect sunlight in the

room

54: and if that’s just the place where the kolpers are

55: then it could mean something like a window

56: which permits no light to come through

57: but then I do not understand the previous sentence very well

58: ... oh, now I suddenly do understand the previous sentence

59: ... it is that people long for shadowed windows

60: and therefore they buy marquees ...

61: so I will just write that down ...

62: a kolper seems to be a shadowed window or something ...

63: [writes] ...

64: I believe that that’s it

65: that one can say nothing more about it ...

66: it may, but is not necessarily shadowed by a house in front

67: it may but needs not be

68: but it may as well be by marquees and so er ...

69: see what have I got now: shadowed room ...

70: yes that is it ...

Note: A kolper seems to be a shadowed window

Protocol 2


a room with two kolpers on a court is not very attractive

2: let me go back to this

3: what am I supposed to do

4: it is in there

5: I see, OK

6: two kolpers on a court is not very attractive ...

7: it doesn’t suggest anything to me offhand except that

8: let’s see

9: two things on a court

10: could possibly produce some kind of a crowding situation

11: or let’s see what else could be unattractive about it


12: could be some conflict

13: maybe I don’t know

14: if crowding is the thing that makes them very unattractive

15: perhaps they are very large

16: each kolper is rather large

17: and putting two together crowds the space somehow

18: or else they are belligerent

[writes]

19: they may fight when they are together

20: so ... it doesn’t specify it for me

21: so let’s see

22: what am I supposed to do

23: I still don’t quite understand

24: what it is that I am supposed to do

25: figure out what the sentence means

26: well the meaning of the sentence is not so difficult in

this case

27: a room with two objects of some

28: two kolpers on a court

29: how about the room

30: a room with something on a court

31: it doesn’t make very much sense to me

32: a room on a court ...

Note: belligerent, crowded

E: perhaps you should have a look at the next sentence


during a heatwave many people long for kolpers, so

the sale of marquees reaches a record

34: well, let’s see

35: this is, this gives some some relief

36: kolpers, a kolper gives a relief from heat

37: marquee, a marquee has something to do ...

38: the sale of marquees

39: a marquee is a kind of ...

40: if there is a heatwave signs appear advertising these things

41: because they somehow give relief from the heat

42: alright


43: that still doesn’t specify it very precisely

44: it could be any number of things I suppose

45: relief from the heat

46: it could be a cold place to go

47: an airconditioned ...

48: I don’t know

49: sale of marquees

50: do I get to look at another one

E: yes, you just turn over the sheet


the ad stresses the fact that the room is facing south

but just on the south side the room has kolpers

52: usually south facing rooms would be warmer than north

facing rooms

53: not necessarily, but it might be

54: but the thing is that at the south side the room has these

kolpers

55: and the kolpers are things that appear to have

56: something to do with unpleasant heat


although this room certainly does not have kolpers it is

so deep that you have to turn on the light, during the day

as well, on clouded days

58: this is also rather puzzling because

59: the very first one refers to the kolpers being on a court

60: two kolpers on a court

61: now, wouldn’t normally ...

62: saying a room has two such and such on a court

63: that sentence suggests a room facing on a court

64: so, a space outside the room

65: so the kolpers are not in the room

66: but the sentence number 4 seems to ...

67: now OK, the room does not have kolpers

68: this room does not have kolpers

69: that means that some room might have them

70: isn’t that impossible

71: but in any case the room is very deep


72: and that has something to do with this whole story

73: maybe because if you turn the light on it gets warmer

74: let’s see


those large trees on the canals are indisputably very

beautiful but as a consequence many canalhouses have to

put up with kolpers all summer

76: OK, let’s look what I can make out of this

77: to put up with these things all summer so these are

78: so these things appear only during summer

79: I am not getting a picture of what these things could be

80: except that they are not err ...

81: let’s see

82: I have the impression from what is stated there

83: these things wherever they are

84: are not found in rooms

85: are maybe seen from rooms

86: but then again I am not sure because sentence number 4

suggests

87: because it says although this room certainly does not

have kolpers

88: now that conditional phrase suggests some rooms might

possibly have them

89: this one certainly does not have them

90: and some rooms might have them

91: OK, but if they are something that you look out on

92: then if that were true

93: you wouldn’t normally expect to find them in a room at all

94: but let me see

95: what information do I actually have

96: OK, looking out on these things is not attractive

97: what else can I put that together with

98: people long for them during heatwaves

99: so it has something to do with avoiding heat

100: so my initial thought of why they are unattractive

101: may well or may not be the case

102: OK perhaps a kolper is some kind of a space

103: in which you can cool off

104: OK that might be a consistent interpretation


105: it could perhaps be an enclosure of some kind

106: that is found on the water or on a canal

107: maybe, I don’t know

108: because that would explain the advertisements during

heatwaves

109: as is often the case with temporary structures it might

be unattractive

110: maybe a small one is tolerable

111: so that might explain the first sentence

112: so perhaps it is some ...

113: my conclusion is that it is some kind of structure

114: that is not especially attractive

115: and that if you have more than one in the same area

116: and if you look out on such things

117: you talk about things that are not pleasant to look at

Appendix D

Analysing expertproblem-solving

In Chapter 8 we gave an example from a medical domain. Below we reproducethe case description and a full protocol.

D.1 Case description

(1) Sex: female(2) Age: 30-40 years(3) Complaints: palpitations

dyspnoeatired

(4) How long has the patient had dyspnoea? over 4 hours(5) Has the patient had this dyspnoea ever before? yes(6) Does the dyspnoea get worse with a deep sigh? no(7) Does the dyspnoea get worse when moving? no(8) Does the patient have pain? yes(9) How long has the patient had pain? over 4 hours(10) Where is the pain? left in the chest(11) Does the patient feel the pain anywhere else? no(12) Has the patient had this feeling ever before? yes


(13) What was the patient doing at that time? quiet(14) Did the patient use tongue tablets for this? no(15) Did the patient use those tablets this time? no(16) Is the pain at the same place as before? yes(17) Is the feeling the same as before? yes(18) Does the pain get worse with a deep sigh? no(19) Does the pain change with movement? no(20) How strong is the pain? somewhat strong(21) Does the patient feel sick? no(22) Does the patient sweat? no(23) Is the patient under treatment by a cardiologist

or internist? yes(24) Has the patient ever before had a heart infarction? no(25) Are there members of the patients family who

have had a heart infarction at an early age? yes(26) Does the patient smoke more than five

cigarettes a day? no(27) Has the patient done this a short time ago? no(28) Does the patient have high blood pressure

(hypertension)? yes(29) Does the patient have diabetes? no(30) What does the patient think that is wrong? the heart

D.2 Protocol

1: This is a woman between 30 and 40 years old

2: so I think that it’s probably not too serious.

3: ... uhh Patient has palpitations,

4: is dyspnoea and tired,

5: and that are symptoms from what I think,

6: they don’t really belong to uhh cardiac complaints.

7: ... This could be uhh a more non-organic chest pain kind of

patient.

8: Palpitations do occur with women of that age as tachycardia

9: but then you should see the cardiogram.

10: This dyspnoea is has been going on for more than four hours

11: so it is probably not very urgent ... but ...

12: everything is possible.

13: She has had this dyspnoea before,

14: but still, this doesn’t tell me very much because

Analysing expert problem-solving 189

15: this might have happened one day ago, but also five years

ago,

16: so I don’t know what to do with this information,

17: I would like to ask when she has had this dyspnoea before.

18: ... uhh The pain doesn’t get worse when she is sighing

deeply.

19: ... That can be interpreted as evidence against pulmonary

embolism,

20: uhh dyspnoea doesn’t get worse when she’s moving.

21: When the dyspnoea gets worse if the person is moving

22: that would suggest problems with the muscles or the bones

23: of the thorax, but this doesn’t look like that.

24: She’s having pain and this pain has been going on

25: for more than four hours as well,

26: and this makes me wonder if the pain and the dyspnoea are

somehow related,

27: because, if they both last for longer than four hours

28: they might be the same thing,

29: buh then the patient is unable to distinguish

30: between the concept of pain and the concept of dyspnoea.

31: The pain is on the left side of the chest.

32: From this we also know that this is

33: no evidence for angina pectoris or ischaemia in any case.

34: There is no pain in other places,

35: so there is no pain in the arms or in the chest.

36: She has had this feeling before, then in rest.

37: ... uhh ... This is strange because pain and dyspnoea

in rest

38: are usually not anginuous and ... then what matters is,

39: if we’re talking about the pain

40: and the pain occurs during rest,

41: then it might be an infarction.

42: ... Now I see,

43: suddenly we’re talking about this feeling.

44: This feeling refers to,

45: We are talking about both dyspnoea and pain

46: and I don’t know what this feeling actually means.

47: This patient has had this feeling before,

48: ... you have to be more specific about this.

49: What do you mean, dyspnoea or pain.


50: Anyway, she has had it before during rest

51: and, I think that this is a bit strange

52: because, if it is anginuous, and it is during rest,

53: then it might be an infarction

54: but it is strange if the pain, or the dyspnoea

55: always occurs during rest, because,

56: if it is related to the heart, then it is usually not

in rest.

57: ... And, if it was during rest now, and it was during

rest then,

58: than it is an infarction maybe,

59: but then should it have been an infarction then,

60: and that is strange.

61: So complaints about a pain that only occurs during rest,

62: or dyspnoea, are not very strong evidence for a infarction.

63: Well, she wasn’t using tablets for the tongue,

64: so she had, they probably were not prescribed.

65: She hasn’t used the tablets now either.

66: Actually, the question is if she was having those tablets

at all

67: You see, there is a difference between having them and

using them

68: If you don’t have the tablets you can’t use them,

69: but if you do have those tablets you have a choice of

using them

70: Do you see what I mean?

71: ... uhh Is the pain in the same place as before? Yes.

72: Well that is good, or anyway it gives a uhh

73: it is something that is consistent in the story,

74: and is it the same feeling as before?

75: That is also, the answer is yes,

76: that is also something that makes the story consistent.

77: It doesn’t get worse when she is sighing deeply.

78: Why are they asking this, because

79: this has been asked as dyspnoea before.

80: Here, first it is asked if the dyspnoea gets worse

81: when she is sighing deeply and then it is asked

82: if the pain gets worse when she is sighing deeply.

83: Actually, this question isn’t very suitable for dyspnoea

84: because having pain when you are sighing deeply


85: belongs to pulmonary pain and

86: pulmonary pain is interpreted, felt, as real pain.

87: We have said that both pain and dyspnoea

88: have to be seen as two words that indicate the same thing

89: because the pain of angina pectoris

90: is sometimes interpreted as dyspnoea,

91: but the pain of pulmonary pain is not something

92: that you would refer to as dyspnoea.

93: So, now actually the question if the dyspnoea gets worse

94: when she is sighing deeply is not very logical

95: if that question will return as pain when she is sighing

deeply.

96: Anyway, this is not the case.

97: The pain doesn’t get worse when she is moving.

98: This is the same question as about dyspnoea.

99: ... The the question is, what do you mean by moving?

100: We always meant moving arms and legs and chest,

101: and things like that, but in a way,

102: exertion is a way of moving too,

103: and that is something that we do not mean.

104: ... Well, the intensity of the pain is average.

105: She is not sick

106: she is not sweating,

107: that is not very useful information at the moment.

108: She is under treatment of a cardiologist or an internist.

109: That is strange because, why are women between 30 and 40

110: under treatment of a cardiologist or an internist.

111: So she has never had an infarction.

112: There are members of the family who have had a infarction

at a young age.

113: ... uhh She doesn’t smoke more than five cigarettes a day.

114: She hasn’t done this before either.

115: ... uhh ...

116: Well, these are not risk factors

117: that really apply to young women.

118: That is something that starts to matter when they are,

say 40, 50, 60 years old,

119: because these risk factors have to do their work for a

longer time.

120: Still, we know that sometimes, but this is highly


exceptional,

121: women of this age do get an infarction.

122: Well she does have a high blood pressure.

123: You would like to know what kind of high blood pressure,

124: how high I mean, and if she is getting treatment for it,

125: but that is not so relevant at this moment.

126: uhh This rises the question if this is the reason

127: that she is under treatment by a cardiologist or internist.

128: She doesn’t have diabetes and

129: her own opinion is that it is her heart.

130: Well, if that is what she is thinking herself,

131: this means that she is very worried about it.

132: Maybe this is because she is under treatment of a

cardiologist,

133: or maybe because she is a hearts neurotic.

134: uhh Do I know everything that I need to know?

135: uhh Concerning the medical history only,

136: I think that there is not very much left

137: that I would like to know, apart from uhh ...

138: maybe the pain that she,

139: I would like to know a bit more about the pain.

140: She says that she has had this before and

141: that it was at rest then,

142: and I would like to know if it happened

143: every day, every week, every month,

144: or that it happened only once before.

145: If I look at the story as a whole,

146: I think that it is not very likely that she,

147: that this is ischaemia

148: and I would like to ask again

149: if the pain really doesn’t occur during exertion.

150: Apparently she says spontaneously

151: that the pain occurs during rest,

152: but I would like to ask

153: if the pain really doesn’t occur during exertion,

154: and if she is able to exert.

155: Because, if someone is able to exert

156: and the pain only occurs during rest,

157: then that is evidence against ischaemia.

158: So, for the moment, I would like to conclude that


159: this is probably not cardiac.

160: uhh It is not impossible that she is having

161: something like a pulmonary embolism

162: but I wouldn’t put my money on it.

163: I wouldn’t send an ambulance.

Appendix E

Coding scheme architecturaldesign

In this section the complete coding scheme constructed by Hamel (1990) isgiven. To guide the reading of the coding scheme we present again the psy-chological model of architectural design in Figure E.1.

Task schema levelOrientation: general comment from which appears:O1: realization that a design task is concerned, for example:

‘what am I to design?’O2: realization of the kind of task: ‘it is a new development’;

‘a sort of youth centre I see’O3: that the outside area is to be designed as well: ‘so, also playgrounds’O4: estimation of the usefulness of the data suppliedO5: estimation of the extra information needed

Execution A: general comment concerning:X1: task-irrelevant incidents during design: ‘let me answer the phone’;

‘I’ll order coffee’X2: degree of difficulty, measure of effort, duration of activities:

‘I’ll see how far I can get in this amount of time’X3: strategy, procedure and approach during the analysis of the task:

‘let me begin by’; ‘I happen just to have built such a clubhouse,what you need to have there is’

X4: materials (maps, photographs) and equipment (paper, felt-tips)


Problem conception

schema

Working memory

Orientation

Execution A

Execution B

OrientationExecution

Synthesis schema



Evaluation

Styling schema

Analysis schema

Task schema

Long term

memory

Evaluation

FIGURE E.1: The psychological model of architectural design

Coding scheme architectural design 197

Analysis schema levelOrientation: comment showing the use of:AO1a: task textAO1b: task maps and drawingsAO1c: task photographsAO1d: information supplied orally by the investigatorAO2: own knowledgeAO3: information gathered, literatureAO4: communication with the clientAO5: communication with an expertAO6: making an assumption, estimation

Subjects the orientation is concerned with, to be permuted with AO1through AO6:

not one specific subject:0: just reads

situational data:1: characteristics of the situation: position, urban development

characteristics, zoning plan, construction characteristics,demographic data, services

2: measurements of the situation3: traffic4: walking routes and openings in the situation5: position of the situation in relation to the sun6: concerning users: age, behaviour

task data:7: available budget8: number of users9: management and exploitation

task requirements:10: functions11: criteria for functions, characteristics of functions12: measurements for functions13: concerning use14: concerning appearance


15: concerning technique

data, general requirements:16: norms, suggestions, regulations17: concerning use18: concerning technique

Execution: general comment concerning:AX1: strategy, procedure and approach during the synthesize of the design:

‘I’ll now look at the consequences of this position for the playingfacilities for the little ones and for safety’

AX2: the task as submitted: ‘this task leaves rather a lot of freedom’

Synthesis schema level

Orientation:SO1: reading through, checking, or recapitulating data, conclusions

or results of actions (for example: surface areas, characteristicsof functions)

SO2: estimation concerning combination of aspects of the design:‘this area seems very large for a building like this’; ‘I thinkwe’ll be able to combine a few functions in one room’

Execution: comment showing one is working on:SX1: organization diagram building, followed by SE1 or SE2SX2: organization diagram outside area, followed by SE1 or SE2SX3: area diagram building, followed by SE1 or SE2SX4: area diagram outside area, followed by SE1 or SE2SX5: construction diagram, followed by SE1 or SE2SX6: situation, urban-developmental layout, followed by SE1 or SE2SX7: cross-section building, followed by SE1 or SE2SX8: cross-section outside area, followed by SE1 or SE2SX9: isometric perspective

Evaluation:SE1: comparison of expectations, inspection, or checking of organization

diagram, area diagram, etc.: ‘I’ll have land to spare’;‘it’s too small for the number of users, but too big for thebudget available’

SE2: redesigning of organization diagram, area diagram, situation or

Coding scheme architectural design 199

cross-section, without check of earlier productSE3: organization diagram, area diagram, situation, or cross-section,

not followed by SE1 or SE2

Analysis schema level

Evaluation:AE1: comparison of expectations, inspection, or checking of data or

requirementsAE2: renewed deduction of data or requirements, search, etc.,

without check of earlier productAE3: data or requirements, not followed by AE1 or AE2AE4: making changes in the task

Task schema level

Execution B: general comment concerning:X1: task-irrelevant incidents during design: ‘let me answer the phone’;

‘I’ll order coffee’X2: degree of difficulty, measure of effort, duration of activities:

‘I’ll see how far I can get in this space of time’X4: materials (maps, photographs) and equipment (paper, felt-tips)X5: strategy, procedure and approach in the styling of the design:

‘I’ll now see how to fit this nicely into this corner’

Styling schema level

Orientation:YO1: study of one’s own sketches; read through, check, or recapitulate

data, conclusions, or results of actionsYO2: estimation concerning the appearance of the building and the

situation

Execution: comment showing one is working on:YX1: situation, followed by YE1 or YE2YX2: floor plan, followed by YE1 or YE2YX3: cross-section, followed by YE1 or YE2YX4: appearance, followed by YE1 or YE2YX5: perspective, isometric perspective, followed by YE1 or YE2


YX6: construction, followed by YE1 or YE2YX7: materials, followed by YE1 or YE2YX8: texture, followed by YE1 or YE2YX9: colours, followed by YE1 or YE2

Evaluation:YE1: comparison of expectations, inspection, or checking of the appearance

of the situation, floor plan, etc.: ‘I don’t like this’YE2: renewed sketching, without check of earlier sketchYE3: situation, floor plan, cross-section, appearance, etc.,

not followed by YE1 or YE2

Task schema level

Evaluation: general comment concerning:E1: the performance of the task: quick, easy, difficultE2: the experimental session as a wholeE3: explicit reflection on method, comment on the process,

interpretations, generalizations and analysis of one’s own thinkingE4: remarks on expectations concerning the usefulness of data supplied

and extra information obtained

Appendix F

Protocol of noviceproblem-solving in physics

This is the text of a complete segmented protocol of a novice problem solver.The first part of the protocol contains the literal text of the problem.

1: Ideal gas is in a container

2: that is closed by a piston

3: the volume of the gas is 2 litres

4: and the pressure 120 ... kilopascal

5: by slowly moving the piston outward

6: the volume is increased to 3 litres

7: the temperature of the gas

8: is kept constant

9: what is the pressure of the gas

10: after moving the piston

11: well this will be less of course

12: I mm I mm have to calculate that

Experimenter: yes and keep thinking aloud

13: well ehhh

14: let’s see

15: the temperature will be ehhh cons

16: no

17: let’s first go from 2 litres

18: to 3 litres


19: there is a formula for that

20: is probably in pressure

21: calculated in pressure

22: 120 kilopascal

23: Let’s see ehh

24: there is a formula

25: that says P is F divided by O

26: but that is not asked

27: that is simply given

28: perhaps another

29: yes ... ehhhh ...

30: here ...

31: this is it ... probably

32: I have to find a relation

33: between pressure and volume

34: where is it ...

35: the problem is that this pressure

36: is related only to surface

37: I think that I must be here ...

38: probably a ehh first

39: look at the ideal gasses

40: P is 120

41: that is the first ... law

42: or the first rule that ...

43: V is RT

44: that P is given

45: is 120 kilopascal

46: yes the V that is the first 2 litres

47: times 2

48: V is always given in litres

49: ehh or in cubic meters? ...

50: ehhhh ... well ...

51: I leave that to the 2

52: is R and that R

53: or is it not too heav

54: an ideal gas ... the ideal gas

55: was 3 2 R

56: is 3 divided by 2 ... oh

57: then it is Cp or Cv

58: constant pressure or constant ehh volume

Protocol of novice problem-solving in physics 203

59: it is no constant volume

60: so I use constant pressure

61: that R is ehh

62: take the universal gas constant

63: so times 8.13 ...

64: oh ... 31

65: 4

66: Joule times kelvin ... times T

67: that is the temperature

68: and that remained constant

69: so that will drop out

70: if I make another equation

71: so here we get again ehhh ...

72: this pressure must change

73: ... 2 ... is now ...

74: this must be the same

75: etcetera

76: yes 240 is

77: and then you can compute this T ...

78: shall I do that? ehh

79: 240 divided by 8.314 ... eh

80: 40 divided by ... 8 ... 14

81: divided by 5 ... times 2 ... 11.55 ...

82: so here should be 11.55 ...

83: this becomes 3

84: because that is the change

85: now, then I compute this ... ehh 5

86: divided by 2 ... divided by 3 ...

87: times 8.314 times 11.55 is ...

88: that will indeed be less

89: 80 0 2

90: now I am not certain if I ehh

91: should write litres or cubic metres

92: a litre is ... how many litres go into a cubic metre?

93: 1000 because a litre is a cubic decimetre

94: or not, no ... well it does not matter

95: it will all drop out against each other

96: this is the answer

Experimenter: 80, so 80 what?


97: let’s see, P

98: and P is kilopascal

99: should be kilopascal

Bibliography

Anderson, J. R. (1989). A theory of the origins of human knowledge. Ar-tifical Intelligence, 40:313–351.

Anderson, J. R. (1990). Cognitive psychology and its implications. W.H.Freeman, New York.

Anzai, Y. & Simon, H. A. (1979). A theory of learning by doing. Psycho-logical Review, 86:124–140.

Barnard, Y. F. & Erkens, G. (1989). Simulation and analysis of problemsolving and dialogue processing within co-operative learning. In Mandl,H., de Corte, E., Bennett, N., & Friedrich, H., editors, Learning andInstruction: European research in an international context, volume 2.1,pages 181–196. Pergamon Press, Oxford.

Breuker, J. A. & Wielinga, B. J. (1987). Use of models in the interpre-tation of verbal data. In Kidd, A. L., editor, Knowledge Acquisition forExpert Systems, a practical handbook, pages 17–44. Plenum Press, NewYork.

Brownstone, L., Farrell, R., Kant, E., & Martin, N. (1985). Pro-gramming expert systems in OPS5: an introduction to rule-based program-ming. Addison-Wesley, Reading, Massachusetts.

Chi, M. T. H., Bassok, M., Lewis, M., Reimann, P., & Glaser,R. (1989). Self-explanations: how students study and use examples inlearning to solve problems. Cognitive Science, 13:145–182.

Clancey, W. J. (1988). Acquiring, representing and evaluating a compe-tence model of diagnosis. In Chi, M. T. H., Glaser, R., & Farr, M. J.,editors, The Nature of Expertise, pages 343–418. Lawrence Erlbaum As-sociates, Hillsdale, New Jersey.

de Groot, A. D. (1946). Het denken van den schaker (Thought in chess).PhD thesis, University of Amsterdam, Amsterdam (in Dutch).

de Groot, A. D. (1965). Thought and choice in chess. Mouton, The Hague.

Diaper, D., editor (1989). Knowledge Elicitation: principles, techniques andapplications. Ellis Horwood, Chichester.


Duncker, K. (1945). On problem solving. The American PsychologicalAssociation, Washington. (orig. 1935).

Ericsson, K. A. & Simon, H. A. (1993). Protocol Analysis: Verbal Reportsas Data (revised edition). MIT Press, Cambridge, Mass.

Everitt, B. S. (1977). The analysis of contingency tables. Chapman andHall, London.

Ferguson-Hessler, M. G. M. & de Jong, T. (1990). Studying physicstexts: differences in study processes between good and poor performers.Cognition and Instruction, 7:41–54.

Gruber, T. G. (1989). Automated knowledge acquisition for strategic knowl-edge. Machine Learning, 4(3/4):293–336.

Hamel, R. (1990). Over het denken van de architect (On the thought processesof architects). AHA books, Amsterdam (in Dutch).

Jackson, P. (1990). Introduction to Expert Systems. Addison-Wesley, Read-ing, Massachusetts.

Jansweijer, W. N. H. (1988). PDP: Een benadering vanuit de kunstmatigeintelligentie van probleemoplossen en leren in een semantisch rijk domein(PDP: An artificial intelligence approach to problem solving and learn-ing by doing in a semantically rich domain). PhD thesis, University ofAmsterdam, Amsterdam (in Dutch).

Jansweijer, W. N. H., Elshout, J. J., & Wielinga, B. J. (1987). Theexpertise of novice problem solvers. In du Boulay, B., Hogg, D., & Steels,L., editors, Advances in Artifical Intelligence II, pages 121–130. ElsevierScience Publishers, Amsterdam.

Kintsch, W. & Greeno, J. G. (1985). Understanding and solving arith-metic word problems. Psychological Review, 92:109–129.

Kuipers, B. & Kassirer, J. P. (1984). Causal reasoning in medicine:Analysis of a protocol. Cognitive Science, 8:363–385.

Kuipers, B., Moskowitz, A. J., & Kassirer, J. P. (1988). Critical deci-sions under uncertainty: Representation and structure. Cognitive Science,12:177–210.

Laird, J. E., Newell, A., & Rosenbloom, P. S. (1987). SOAR: anarchitecture for general intelligence. Artificial Intelligence, 33:1–64.

Lucas, P. & van der Gaag, L. (1991). Principles of expert systems.Addison-Wesley, Reading, Massachusetts.

Maier, N. R. F. (1931). Reasoning in humans II. The solution to a problemand its appearance in consciousness. Journal of Comparative Psychology,12:181–194.

Marcus, S., editor (1988). Automatic knowledge acquisition for expert sys-tems. Kluwer, Boston.

Bibliography 207

Meyer, M. A. & Booker, J. M. (1991). Eliciting and Analyzing ExpertJudgement: A Practical Guide. Academic Press, London.

Mitri, M. (1991). A task-specific problem solving architecture for candidateevaluation. AI Magazine, 12(3):95–109.

Morgoev, V. K. (1989). ARIADNA: a knowledge elicitation support sys-tem. AI Comunications, 2(3/4):131–141.

Musen, M. A. (1989a). Automated Generation of Model-Based Knowledge-Acquisition Tools. Pitman, London.

Musen, M. A. (1989b). Automated support for building and extending ex-pert models. Machine Learning, 4:347–376.

Newell, A. (1990). Unified theories of cognition. Harvard University Press,Cambridge, Massachusetts.

Newell, A. & Simon, H. A. (1972). Human problem solving. Prentice HallInc., Englewood Cliffs, New Jersey.

Nisbett, R. E. & Wilson, T. D. (1979). Telling more than we can know:verbal reports on mental processes. Psychological Review, 84:231–259.

Patil, R. S. (1988). Artificial intelligence techniques for diagnostic reason-ing in medicine. In Shrobe, H. E., editor, Exploring Artificial Intelli-gence: Survey Talks from the National Conferences on Artificial Intelli-gence, pages 347–379. Morgan Kaufmann, San Mateo, California.

Post, W. M., Koster, R. W., Zocca, V., & Sramek, M. (1993). Co-operative medical problem solving. In Andreassen, S., Engelbrecht, R.,& Wyatt, J., editors, Artificial Intelligence in Medicine, Proceedings ofthe 4th Conference on Artificial Intelligence in Medicine Europe, pages259–270. IOS Press, Amsterdam.

Riley, M. S., Greeno, J. G., & Heller, J. I. (1983). Development ofchildren’s problem-solving ability in arithmetic. In Ginsburg, H. P., edi-tor, The development of mathematical thinking, pages 153–196. AcademicPress, New York.

Sandberg, J. A. C., Breuker, J. A., & Winkels, R. G. F. (1988).Research on help-systems: Empirical study and model construction. InKodratoff, Y., editor, Proceedings of the European Conference on ArtificialIntelligence, pages 106–111. Pitman Publishing, London.

Sandberg, J. A. C. & de Ruiter, H. (1985). The solving of simplearithmetic story problems. Instructional Science, 14:75–86.

Schreiber, A. T., Wielinga, B. J., & Breuker, J. A., editors (1993).KADS: A principled approach to knowledge-based system development.Academic Press, London.

Titchener, E. B. (1929). Systematic psychology: prolegomena. Macmillan,New York.


van Daalen-Kapteijns, M. M. & Elshout-Mohr, M. (1981). The ac-quisition of word meanings as a cognitive learning process. Journal ofVerbal Learning and Verbal Behavior, 20:386–399.

van Harmelen, F. & Balder, J. R. (1992). (ML)2: a formal language forKADS models of expertise. Knowledge Acquisition, 4(1):127–161.

van Someren, M. W. (1990). What’s wrong? Understanding beginners’problems with Prolog. Instructional Science, 19(4-5):257–282.

van Someren, M. W. & Elshout, J. J. (1985). Het effekt van zelfreflek-tie op leren probleemoplossen (The effect of self-reflection on learning tosolve problems). In Lodewijks, J. G. L. C. & Simons, P. R. J., editors,Zelfstandig leren, pages 110–117. Swets en Zeitlinger, Lisse (in Dutch).

Wielinga, B. J., Schreiber, A. T., & Breuker, J. A. (1992). KADS:A modelling approach to knowledge engineering. Knowledge Acquisition,4(1):5–53.

Bibliography 209

(Anzai & Simon, 1979) (Schreiber et al., 1993) (Hamel, 1990) (Ander-son, 1990) (Jackson, 1990) (Diaper, 1989) (Meyer & Booker, 1991) (Morgoev,1989) (Maier, 1931) (Chi et al., 1989; Ferguson-Hessler & de Jong, 1990; vanSomeren & Elshout, 1985) (Barnard & Erkens, 1989) (Sandberg et al., 1988)(Ericsson & Simon, 1993) (Gruber, 1989) (Marcus, 1988) (Nisbett & Wilson,1979) (Titchener, 1929) (Duncker, 1945) (de Groot, 1965) (de Groot, 1946)(Newell & Simon, 1972) (Breuker & Wielinga, 1987) (van Daalen-Kapteijns &Elshout-Mohr, 1981) (Everitt, 1977) (Jansweijer, 1988) (van Someren, 1990)(Post et al., 1993) (Schreiber et al., 1993) (Brownstone et al., 1985) (Lucas &van der Gaag, 1991) (Laird et al., 1987) (Schreiber et al., 1993) (Wielinga et al.,1992) (van Harmelen & Balder, 1992) (Breuker & Wielinga, 1987) (Jansweijeret al., 1987; Jansweijer, 1988) (Newell, 1990; Laird et al., 1987) (Anderson,1989) (Musen, 1989a) (Musen, 1989b) (Kintsch & Greeno, 1985) (Sandberg& de Ruiter, 1985) (Riley et al., 1983) (Patil, 1988) (Mitri, 1991) (Clancey,1988) (Kuipers & Kassirer, 1984) (Kuipers et al., 1988)

Think Aloud Method

Documents

Transcript of Think Aloud Method