A microgenetic investigation of stability and continuity in theory of mind development

Copyright © The British Psychological Society. Reproduction in any form (including the internet) is prohibited without prior permission from the Society.

British Journal of Developmental Psychology (2006), 24, 631–654. © 2006 The British Psychological Society. www.bpsjournals.co.uk. DOI:10.1348/026151005X57422

Emma Flynn*
School of Psychology, University of St Andrews, UK

* Correspondence should be addressed to Emma Flynn, School of Psychology, University of St Andrews, St Andrews, Fife, Scotland, KY16 9JP, UK (e-mail: egf1@st-andrews.ac.uk).

The processes behind the transition from consistently failing tests of false belief understanding to consistently passing the tests were investigated by tracking changes in children’s mental state understanding. Participants were 42 children (aged 3;1 to 4;3). There were two conditions: an experimental condition in which children were tested on a battery of eight theory of mind tests every four weeks for six phases of testing, and a control condition in which children only completed the battery of tests at the first and last testing phases. The profiles of performance showed that an understanding of false beliefs develops gradually and the development is relatively stable. An examination of the types of explanation children give on tests of false belief understanding showed that initially they rely on reality, then they progress through a period of confusion, where they do not provide an explanation, to a final stage in which they are able to explain behaviour by referring to an individual’s false belief. Further analyses examined practice effects, construct validity, and the role of verbal ability in the development of mental state understanding.

Since Premack and Woodruff’s (1978) study, which first introduced the phrase ‘theory of mind’, there has been a great deal of interest in the development of an understanding of mental states. The development of theory of mind is a protracted process, with advances in the understanding of mental states being reported from infancy (Repacholi & Gopnik, 1997) through to adulthood (Happé, 1994). Yet, the majority of theory of mind research has concentrated on the understanding of false beliefs, which typically occurs between the ages of 3 and 5 years. It is now accepted that by about 4 years of age, typically developing children give correct answers on tests of false belief understanding (Wellman, Cross, & Watson, 2001). In tests such as the unexpected transfer task, older children appreciate that a story character who holds a false belief about the location of an object will search for that object not where it is truly located, but where the character believes the object is located. However, younger children refer to reality and state that the character will search for the object in its true location.

Many aspects of false belief understanding have been investigated, including the

associated cognitive skills (Carlson, Moses, & Breton, 2002; Happé, 1995), the influence

of a child’s environment (see Carpendale & Lewis, 2004 for an overview), and the

parameters of false belief understanding tests (Wellman et al., 2001). Yet, in the plethora

of theory of mind research, there are only a handful of longitudinal studies that examine

how the same child’s theory of mind skills change over time. We assume from cross-sectional studies that, for typically developing children, there is a predictable sequence

of transition in children’s insights about the mind. However, little is known about how

this transition unfolds or the processes that drive this developmental shift. In order to be

able to discriminate between competing theoretical accounts of theory of mind

development, it is essential to begin with an empirically informed description of

individual children’s natural transition from consistent failure on tests of false belief

understanding to a level of consistent success. Only once such a description is in place

can we make experimental manipulations to differentiate between different theories.

The present study aims to address this gap in the literature by tracking changes in a group of children’s theory of mind skills during the period of development of false belief

understanding, and identifying changes in the type of information children use to

answer questions regarding false beliefs. Before describing the present study in detail, a

review of research that has provided some insight into the period of transition from no

understanding of false belief to consistent success is presented.

Wellman et al. (2001) carried out a meta-analysis of 178 separate studies that

investigated the development of an understanding of false beliefs. This meta-analysis was

truly insightful, producing a number of important findings regarding the consistency of

false belief understanding with regard to different ages, countries, and tasks. In terms of the process of transition, the results showed that children who were 3 years 5 months

(3;5) and younger performed below chance, suggesting that young children made the

classic false belief error by referring to the true state of reality to predict a character’s

actions. However, children who were 4 years or older performed above chance,

suggesting that they were able to acknowledge that people held false beliefs and

correctly predict an individual’s actions from this knowledge. Between the ages of 3;5

and 4 years, there was a period of ‘confused, random performance’ where children were

performing at chance (Wellman et al., 2001, p. 678). Of course, these age parameters are

fluid rather than rigid, as there is variance in the age of onset and competence in false belief understanding. However, these parameters provide a useful guideline as to when

changes in mental state understanding occur.

A critical question arising from Wellman et al.’s meta-analysis is exactly what is

occurring during this period of at-chance level performance? The results of the meta-analysis

were based on the means of the samples of children studied, so it remains

unclear exactly how individual children progress through this period of transition.

There are a number of potential profiles of performance during this period. Firstly, the

at-chance mean could represent a group of children in which half understand false

beliefs and are systematically correct, but the other half do not understand and make systematic errors. Therefore, false belief understanding is stable, with a sudden,

qualitative shift from no understanding to full understanding, with the at-chance level

mean being produced by two extreme levels of performance. Secondly, all children may

progress through a period of development where they begin by passing one test, then

two and so on, until they develop a full understanding of mental states. Such a

progression is essentially stable and continuous, with the chance-level performance,

suggested by Wellman et al.’s meta-analysis, being produced by individual differences in

the initiation of this development. Thirdly, children’s performance on tests of false belief

understanding could be unstable, being ‘random’ and ‘confused’ with children showing

little within or between test consistency. However, if this profile is correct, then we

must bear in mind that during this period of random, confused performance, significant

changes are occurring in children’s cognition that allow them to consistently

understand false beliefs after the age of four years. Finally, there may not be a unique, systematic sequence of performance across children, and so there could be any

combination of the profiles of performance described above.
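The first two profiles above can be made concrete with a small numerical illustration. The following Python sketch uses invented pass/fail scores for ten hypothetical children on four tests (none of these values come from any study) to show that a sudden-shift group and a gradual-progression group can produce the same at-chance group mean, while differing sharply at the level of the individual child.

```python
from statistics import mean

# Hypothetical pass (1) / fail (0) scores on four tests; invented for illustration only.
# Sudden-shift profile: half the children pass every test, half fail every test.
sudden = [[1, 1, 1, 1]] * 5 + [[0, 0, 0, 0]] * 5
# Gradual profile: children are spread across the progression (0, 1, 2, 3 or 4 tests passed).
gradual = [[1] * k + [0] * (4 - k) for k in range(5)] * 2


def group_mean(children):
    """Proportion of tests passed, pooled over all children in the group."""
    return mean(score for child in children for score in child)


def per_child_totals(children):
    """Number of tests passed by each individual child."""
    return [sum(child) for child in children]


for name, group in (("sudden", sudden), ("gradual", gradual)):
    print(name, group_mean(group), sorted(per_child_totals(group)))
```

Both groups pass half of all tests overall, yet the individual totals are either all-or-nothing or spread across the full range, which is why a group mean alone cannot distinguish between the profiles.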

Wellman and Liu (2004) investigated the transition of children’s theory of mind

during the preschool period, by examining children’s changing competence across a

range of mental states. They used tasks that considered different mental states, but had

been scaled to require similar methodological demands. The sequence showed that

children become aware that two people can have different desires for the same object

before they become aware that people can have different beliefs about the same object.

Such a finding supports the proposal that the understanding of false beliefs is underpinned by the understanding of desires (Wellman, 1990). Furthermore, Wellman

and Liu (2004) showed that understanding diverse beliefs, judging someone else’s

differing beliefs about the same situation when the child does not know which belief is

true and which is false, occurred before false beliefs, where the child does know which

belief is true and which is false. Finally, differentiating between real and apparent

emotion occurs late in the preschool years. Thus, Wellman and Liu suggest that

the transition in mental state understanding involves modification and mediation; this is

the broadening or generalizing of earlier insights to encompass later insights, and the

process of earlier insights enabling the attainment of later insights.

There is a significant gap in our understanding of how individual children progress

from consistently failing false belief understanding tests to consistently passing them.

There is a lack of knowledge relating to the path, rate and variation both within and

across children during this period. Two important steps can be taken to fill this gap.

Firstly, a microgenetic approach can be adopted in which the same children are

repeatedly tested on a battery of theory of mind tests, so that within-participant change

can be examined, over the period of transition. Secondly, explanation tasks can be used

to establish changes in the type of information a child uses to address theory of mind

problems. The majority of theory of mind tasks, especially those that consider false belief understanding, do not require children to produce anything more than a simple

pointing gesture or a single-word answer. Such methodological requirements are

employed to overcome any differences in children’s linguistic competence. However, by

only having simple gestures or single-word answers, we are unable to elaborate on the

type of information that a child is using to reach an answer. For example, a child who

answers an unexpected transfer task incorrectly by pointing to the current location of

an object, rather than the original location, may be using a number of different types of

information. She could be using location-based information, for example, ‘she will go

there because that is where the object is’. She could use desire-based information, for example, ‘she will go there because she wants that object’. Alternatively, she could give

an incorrect answer because she is confused and is simply guessing.

One method of distinguishing between the different types of information that

children use to answer problems of false belief is to ask children to explain why a

character behaves in a certain way in a false belief understanding task (Bartsch &

Wellman, 1989). For example, a child is told a story about a character who places an

object in a particular location before leaving a scene. While the character is absent, the

object gets relocated to a second location. In the prediction version of this task, the

character then returns, and the child is asked where the character will look for the

object. However, in the explanation version of the task, a child is shown the unfolding of

the full sequence of events, including the character going to the incorrect location to

retrieve the desired object. The child is initially asked where the object really is, and

then why the character has gone to the other location. A child who provides an answer that refers to the story character’s false belief can be said to have an understanding of

false beliefs. In turn, a child who is unable to provide this lucid, and appropriately

complex, justification cannot be said to have an understanding of false beliefs. If a child

does not have a conceptual understanding of false beliefs, and so is consistently

incorrect, then the types of explanations the child gives, for example, location-based,

desire-based or no response/don’t know, provide important insight into the

information they are using during the period of transition before their consistent

success on tests of false belief understanding. Analysing changes in the type of

explanations that children provide on an unexpected transfer explanation task, and how the children’s performance changes in relation to other mental state understanding

tasks, provides an important indication of children’s cognition during the period of

transition.

There has been much debate regarding the trajectories of the development of

explanations and predictions of behaviour in relation to false beliefs. Bartsch and

Wellman (1989) found that the ability to explain beliefs develops before the ability to

predict behaviour in tests of false belief understanding. In contrast, Wimmer and

Mayringer (1998) found the opposite sequence of development. Bartsch (1998) argues

that this disparity is due to different testing procedures. Procedures that require children to make spontaneous explanations in false belief understanding tasks usually

show poorer results compared with prediction tests. Yet, when a prompt, for example,

‘what does Bill think?’, is included in the procedure, some studies have shown better

performance in explanation tests compared with prediction tests, although not all

(Foote & Holmes-Lonergan, 2003). An unexpected transfer explanation task was used in

the present study, and the specific procedure used provided a compromise between

these two extremes. After the initial test question was asked, if the participant did not

provide an answer focusing on the story character’s false belief, then the test question

was repeated.

There have been a handful of longitudinal studies investigating theory of mind

development in which participants were tested at two or more time points several months

apart (Astington & Jenkins, 1999; Carlson, Mandell, & Williams, 2004; De Villiers &

Pyers, 2002; Hughes, 1998b). Astington and Jenkins (1999) tested 59 3-year-old children’s

theory of mind and verbal ability at three time points over a 7-month period. They found

that language at an early time point predicted later theory of mind at two of three sets of

time points, whereas the reverse relation never held. Similarly, De Villiers and Pyers (2002)

found that language predicted later theory of mind on two occasions over three time points,

and that the reverse relation held once. Verbal ability plays an important role in understanding the narratives of false belief tests, and in providing children with the

resources to produce an appropriate response. However, examining the relations between

verbal skills and theory of mind over time shows that verbal skills play a fundamental role in

the development of theory of mind skills, as early verbal skills predict later theory of mind

ability. However, these longitudinal studies do not provide an insight into changes in theory

of mind as they are occurring. Instead, a more specific focus needs to be adopted. An

approach that allows change to be examined as it is happening is the microgenetic method.

The microgenetic approach requires that: (a) observations track changes in the skills that

are of interest; (b) the density of observation is high relative to the change in that

competence; and (c) the collected information is subjected to trial-by-trial analysis to infer

the process that gave rise to the change (Siegler & Crowley, 1991). There are two studies

that have shown this intensive data collection in relation to the development of mental state

understanding.

Bartsch and Wellman (1995) present an extensive analysis of the natural language of

10 children studied longitudinally from 1½ to 6 years of age. The database contained

more than 200,000 utterances, with 12,000 of these using terms such as think, know,

and want. Such a database provided an exciting opportunity to undertake a fine-grained

analysis of transition in children’s understanding of the mind. The data showed that

children refer to another person’s beliefs before they grant these beliefs a central role in

explaining mind and action. Before explaining actions in terms of beliefs, children tend

to use desires to explain actions. The evidence to support this claim showed that 3-year-olds

progressed through a period when they often talk about beliefs, and even show

some evidence of understanding false beliefs, but they continue to explain actions with

reference to desires. In keeping with the findings of Wellman et al. (2001), Bartsch and

Wellman (1995) found that only at about 4 years do children begin to explain actions by

referring systematically to beliefs. These findings also concur with the findings of

Wellman and Liu (2004), as children’s theory of mind development shows an

understanding of desires before an understanding of beliefs or false beliefs.

Flynn, O’Malley, and Wood (2004) also adopted a microgenetic approach, but rather

than collecting observational data, 21 children aged between 3;1 and 3;10 were

repeatedly tested on three theory of mind tasks over six testing phases. The profiles

were examined to establish the path and rate of theory of mind development. Few

children showed a stable profile of development, as there were a number of regressions

in their aggregate theory of mind scores over the six phases. However, as only three

theory of mind tests were used, it was not possible to establish the true magnitude of

these regressions. In addition, the relations between the children’s performance on the

three tasks were not stable: within some phases of testing the tasks correlated

significantly with one another, but at other phases of testing they did not. These findings

were in keeping with Wellman et al.’s (2001) description of ‘confused, random

performance’ where children were performing at chance during the period of transition

(Wellman et al., 2001, p. 678).

The unstable performance shown on tests of false belief understanding in Flynn et al.

(2004) questions the reliability of these tests. In order to consider tests of false belief

understanding as reliable, we need to be clear that any instability in theory of mind

performance is caused by instability in the construct or associated skill, rather than

unreliability of the test being administered to evaluate the construct. Two studies have

directly examined the test–retest reliability of false belief understanding tasks. Mayes,

Klin, Tercyak, Cicchetti, and Cohen (1996) and Hughes et al. (2000) found that only

between 5% and 12% of test–retest trials show regressions in performance.

Furthermore, Wellman et al. (2001) argue that false belief understanding tests are

reliable. Their meta-analysis showed that overall, first-order false belief understanding

tests behaved reliably at 3;5 and below, as children tended to make the classic reality

error, and at 4 years and above, as children consistently showed an understanding of

false beliefs. Therefore, the random, confused performance that occurs between the

ages of 3;5 and 4 years is not due to a lack of test reliability.

The present study followed the design of Flynn et al. (2004) with a new sample of

children, and incorporated some significant changes. Firstly, seven theory of mind tests

were included rather than three, thus producing a better indication of the magnitude of

any potential between-phase fluctuations. Because seven theory of mind tests were

used, a criterion was set regarding the size of the change. Transitions of two or fewer

points were considered to be small, transitions of 3 or 4 points were considered moderate, and transitions of 5 or more points were considered large. Secondly, Flynn

et al. (2004) looked at the profiles of performance during the emergence of an

understanding of false beliefs, whereas the present study recruited participants from

across the period of transition to discover if and, if so, at which point theory of mind

skills began to stabilize. The study design allowed both within- and across-participant

comparisons to be made to provide an indication of the process of transition from

consistent failure to consistent success on tests of false belief understanding. Thirdly, an

unexpected transfer explanation task was included to discover at which point in development children could explain behaviour with reference to false beliefs, and to

allow an analysis of the types of explanation children gave for a character’s behaviour.

The final change involved the inclusion of a control group. The rehearsal provided by

completing the battery of tasks at every testing phase could potentially have affected a

child’s performance on the tests. The inclusion of this group meant that the

experimental group’s performance at the last phase of testing could be examined for

practice effects.
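The size criterion for between-phase changes described above can be written as a small classification rule. The following Python sketch is illustrative only: the function names and the example scores are hypothetical, and labelling a change of zero as "none" is an assumption, since the text defines only small (2 points or fewer), moderate (3 or 4 points), and large (5 or more points) transitions on the seven-test aggregate.

```python
def classify_transition(previous_score, current_score):
    """Label the size of the change between two consecutive testing phases."""
    change = abs(current_score - previous_score)
    if change == 0:
        return "none"  # assumption: the text does not label an unchanged score
    if change <= 2:
        return "small"
    if change <= 4:
        return "moderate"
    return "large"


def label_profile(scores):
    """Pair each between-phase change with its size label.

    A negative change is a regression; a positive change is an improvement.
    """
    return [(b - a, classify_transition(a, b)) for a, b in zip(scores, scores[1:])]


# Hypothetical aggregate scores (0-7) for one child across six phases.
example_profile = [1, 2, 2, 5, 4, 7]
labelled = label_profile(example_profile)
```

Keeping the signed change alongside the label lets the same output address both the question of gradual versus sudden improvement and the question of how large any regressions are.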

The aims of this study

The current study presents a unique piece of research that addresses a gap in our

understanding of theory of mind development by taking a detailed look at children in

this domain over a narrow time-frame. A microgenetic approach was employed in which

children were tested on a battery of mental state understanding tasks on several occasions

over a period of 5 months. If a standard microgenetic study had been carried out, then each participant would have been intensively tested for at least a year, from a period of

no understanding to full understanding. This would have led to problems of extensive

practice, boredom, and retaining participants. Therefore, in the present study,

participants were recruited at all stages of development, and the analysis examined

development both within and across individual children. The study concentrated on

establishing what happens during the transition from consistently failing tests of false

belief understanding to consistently passing the tests. The tests selected to be included

in the battery were not all false belief understanding tasks, but they were all theory of mind tasks that showed a change in competence from 3 to 5 years of age. Furthermore, it

must be acknowledged that consistent success in the present study did not signify a full

understanding of false beliefs, as research has shown later developments in the

understanding of false beliefs, including second-order theory of mind (Perner &

Wimmer, 1985) and the implications of beliefs for emotions (Ruffman & Keenan, 1996).

Before the analysis could address the three main research questions, a number of

checks needed to be made relating to (i) the reliability of the coding and the ease of the

different versions of each test, (ii) whether the sample’s theory of mind was actually changing during the period of study, (iii) the inter-relations of the theory of mind tests,

(iv) practice effects, (v) the sequence of development, and (vi) the relations between

theory of mind and verbal ability. Following these initial analyses, the study’s three main

research questions could be investigated. Firstly, is the development of a theory of mind

gradual, indicated by small improvements over time or are there sudden, larger changes

in performance? Secondly, are there regressions in theory of mind scores and, if so, what

is their magnitude? Finally, how does the information children use to provide

explanations for people’s behaviour change over time?

Method

Design

A microgenetic design was used to follow the performance of a group of children on a

battery of tests, which was administered every four weeks for six phases of testing.

These results were part of a larger study; however, the current analysis concentrates on

the development of theory of mind and verbal ability.

Participants

Participants were 42 children (23 girls and 19 boys). At the first phase of testing, the

children were aged between 3;1 and 4;3 with a mean age of 3;10. After the first phase of

testing, each child was placed into one of two groups, a control group (which only

completed the battery at Phases 1 and 6), and an experimental group (which completed

all six testing phases). The control group was included to take into account practice

effects. The children’s allocation to each group was carried out so that the control and

experimental groups did not differ from one another in terms of age, gender, and performance across the different tasks. However, as the experimental group was the

group of interest and would be more informative in terms of the study’s aims, more

children were allocated to the experimental group (N = 28) than to the control group

(N = 14). Both the control and experimental groups had a mean age at Phase 1 of

3;10, both with a standard deviation of 4 months. The ratios of girls to boys in the control

and experimental groups were 8:6 and 15:13, respectively. Table 4 provides further

information about the two groups’ performances across the tasks, showing that there

was no significant difference between the groups’ performances on any of the tests at Phase 1.

The battery of tasks

There were 12 tasks in the battery: eight theory of mind tasks, three inhibition tasks, and

a verbal ability test. For the purposes of this analysis, only the theory of mind and verbal

ability tests will be described in detail because only data from these tests are presented

in the results.

Theory of mind

The eight theory of mind tasks were: a prediction and an explanation version of the

unexpected transfer task; a deceptive box test, which assessed a child’s understanding

of his/her own previous false belief and a naïve other’s false belief; an appearance-reality task; a penny-hiding task; and two tests of false belief understanding in which the

location of the desired object was explicitly stated. For all of these tasks, except the

penny-hiding task, there were six different versions that were counterbalanced across

the six phases. This meant a child could not be successful simply because s/he had seen

the object/contents/story during a previous testing phase.
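One simple way to counterbalance six versions across six phases is a cyclic rotation, in which each child starts on a different version and then steps through the remaining versions in order. The Python sketch below illustrates this idea under stated assumptions: the text does not report which counterbalancing scheme was actually used, and the function name and the use of a child index are hypothetical.

```python
N_VERSIONS = 6
N_PHASES = 6


def version_order(child_index):
    """Return the task version (1-6) a child receives at each of the six phases.

    Assumed scheme: a cyclic rotation, so no child sees the same version twice
    and, across any six consecutive children, every version appears once per phase.
    """
    start = child_index % N_VERSIONS
    return [(start + phase) % N_VERSIONS + 1 for phase in range(N_PHASES)]


# Orders for the first six hypothetical children.
orders = [version_order(i) for i in range(6)]
for order in orders:
    print(order)
```

Under this scheme a correct answer at a later phase cannot rest on memory for a specific object, box, or story, because each version is encountered only once by any given child.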

Unexpected transfer tasks. Each child received two unexpected transfer tasks, a

prediction version and an explanation version, at each phase. There were 12 versions of

the tasks, six for the prediction task and six for the explanation tasks. Each version used

the same script but had minor alterations including different characters and different

transferred objects. For example, in one story, a ball was moved by Tom and looked for

by Rosie and in another, a cake was moved by Emma and looked for by Michael. The scripts were adapted from the original Wimmer and Perner (1983) study and were

narrated by the experimenter with the aid of six pictures. Initially, a child was

introduced to two story characters (i.e. Character A and Character B) who were playing

in a room. The story progressed with Character A leaving the room after having placed

an object in a particular location. While Character A was out of the room, the object was

moved to a second location by Character B. Character A then returned and the child was

told that Character A wanted to find the object.

At this point, the prediction task and the explanation task differed. In the prediction task, the test question was asked: ‘Where will (insert the name of Character A) look for

the (insert the name of the object) first?’ This was taken from the work of Siegal and

Beattie (1991). Each child was then asked a reality check question: ‘Where is the (insert

the name of the object) really?’ This question was included to make sure a child knew

the true location of the object. Children were scored as correct and given a point if they

answered both the test question and the reality question correctly.

In the explanation task, adapted from Hughes (1998a) and Bartsch and Wellman (1989), the child was told that Character A had returned to look for the object and was asked the reality check question: ‘Where is the (insert name of object) now?’ With the aid of another picture, it was then explained to the child that in fact Character A had looked for the object in its original location. The child was then asked the test question: ‘Why did (insert name of Character A) look for the (insert name of object) there?’ Children were coded as correct if they could provide an explanation for Character A’s behaviour that referred to the character’s false belief. If a child was not correct, or did not respond after the test question was asked, then the question was repeated. No other prompts were provided.

In both tasks, memory questions were asked throughout the story, for example, ‘Where is the ball now?’ and ‘Where has Rosie gone?’ If a child was incorrect when answering one of these questions, then that section of the story was repeated and the question asked again. These questions were included to make sure that the child had paid attention to the whole story.

Deceptive box test: Own previous and naïve other’s false beliefs. There were six versions of the deceptive box test and a child received one of the versions at each phase of testing. The procedure was taken from the original Hogrefe, Wimmer, and Perner (1986) study. All the versions were presented using the same script but contained different boxes and contents. The six different versions were: a Smarties tube containing pencils; an egg box containing a spoon; a cornflakes box containing a bag; a biscuit box containing a lemon; a Maltesers box containing marbles; and a crisp packet containing a toy elephant. No child had any problems stating the prototypical contents of any of these versions.

Each version had two test questions; one evaluated a child’s understanding of another person’s false belief and the other evaluated the child’s understanding of his/her own, previous false belief. Initially, a child was shown a box that normally held prototypical contents (e.g. a Smarties tube) and was asked what s/he thought was inside. All the children responded with the prototypical contents (e.g. Smarties). The box was opened and the child was then shown that the box actually contained something novel (e.g. pencils). The child was then asked what was really inside the box. Then the test question concerning the child’s previous false belief was asked: ‘When you first saw this tube and it was all shut up like this, what did you think was inside?’ After a child had responded to this test question, s/he was introduced to a puppet called ‘Sooty’ who had been ‘asleep’ in the toy box. The child was asked: ‘When Sooty wakes up and sees this tube all shut up what will he think is inside it?’ For each test question, a child was given 1 point if s/he was able to acknowledge her/his own, previous false belief or the false belief of the naïve individual and state that they believed that the box held the prototypical contents (e.g. Smarties). S/he was coded as incorrect for each test question if s/he stated that s/he had always thought, or that the naïve individual would believe, that the box contained the true contents (e.g. pencils).

Appearance-reality task. The procedure for this task was taken from the original Flavell, Flavell, and Green (1983) study and was presented in exactly the same way for all six objects. The six objects were: a pen that looked like a twig; a stone that looked like an egg; a pencil sharpener that looked like a toy car; a sponge that looked like a rock; a hair band that looked like a toy duck; and a candle that looked like an apple. Initially, a child was shown the object and asked ‘What does this look like?’ After the child had answered (all children gave the correct answer to this question), s/he was allowed to touch the object and the experimenter explained that although the object looked like a specific object, it was in fact something else. The child was then asked ‘So what does it look like?’ (appearance question) and then ‘What is this really?’ (reality question). Children were scored as correct and given a point if they answered both questions correctly.

Penny-hiding task. The aim of this task was to assess whether a child appreciated that, to hide a penny, one has to keep the penny concealed and not provide any clues to its whereabouts (Hughes, 1998a). The experimenter introduced this task by showing a penny to a child and saying, ‘I’m going to put my hands behind my back and hide this penny. Now I’m going to bring my hands out and you have to guess which hand it is in’. After hiding the penny in one of her hands, the experimenter brought her two fists from behind her back and placed them in front of the child and said ‘So which hand do you think it is in?’ After the child had guessed, the experimenter showed the child if s/he was correct. After three successive guesses, the experimenter said ‘Now, it’s your turn, you hide the penny behind your back and I’ll guess which hand it’s in’. The child was given the penny and s/he hid the penny on three separate trials with the experimenter guessing where the penny was at each trial. A child was coded as successful on each trial if s/he: (i) concealed the coin appropriately, (ii) produced both hands for guessing, and (iii) gave no verbal clues as to the location of the coin. In keeping with Hughes (1998a), children were given a point on this task if they were successful on two out of three penny-hiding trials.
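The penny-hiding scoring rule can be sketched in a few lines. This is only an illustration of the rule as described above; the class and function names are hypothetical, and "two out of three" is read here as "at least two".

```python
from dataclasses import dataclass


@dataclass
class HidingTrial:
    """One penny-hiding trial (field names are illustrative assumptions)."""
    concealed: bool        # (i) coin concealed appropriately
    both_hands: bool       # (ii) both hands produced for guessing
    no_verbal_clue: bool   # (iii) no verbal clue to the coin's location

    def successful(self) -> bool:
        # A trial counts only if all three criteria are met.
        return self.concealed and self.both_hands and self.no_verbal_clue


def penny_hiding_point(trials):
    """Return 1 if at least two of the trials were successful, else 0."""
    successes = sum(trial.successful() for trial in trials)
    return 1 if successes >= 2 else 0


example = [
    HidingTrial(True, True, True),
    HidingTrial(True, False, True),   # only one hand offered
    HidingTrial(True, True, True),
]
print(penny_hiding_point(example))
```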

False belief explicit location task. There were 12 versions of this task and a child received two of these versions at each testing phase. The procedure was taken from the original Wellman and Bartsch (1988) study. A child was shown a picture of a character in a particular location (e.g. a garden) and told that the character wanted to find an object (e.g. a bike). It was explained that the object was in a specific location but the character believed that the object was in a different location. For example, ‘Billy is looking for his bike. The bike is in the garage (experimenter points to the garage on the picture) but Billy thinks his bike is in the shed (experimenter points to the shed on the picture)’. The child was then asked ‘Where will Billy look for his bike first?’ (false belief question) and ‘Where is the bike really?’ (reality question). This is similar to the unexpected transfer prediction task, but in this task the real location of the object and the character’s beliefs are explicitly stated to the child. Each child was coded as successful if s/he answered both questions correctly.

Verbal ability

The children’s verbal ability was tested using a measure of receptive vocabulary, the British Picture Vocabulary Scale (BPVS; Dunn, Dunn, Whetton, & Pintilie, 1997). This was administered during the first and last phases of testing.

Procedure

Each child was taken into a quiet area of the nursery and, after an initial introductory period, the testing phase began. The order of presentation of the tasks was always kept constant (Luria hand-game, penny-hiding task, unexpected transfer prediction task, deceptive box test, appearance-reality task, false belief explicit location tasks, bear-dog task, unexpected transfer explanation task, and the Luria lights task¹). On both the first and the last phases of testing, the BPVS was administered after all the other tasks. Although presenting the tasks in a fixed order may have produced order effects, it was more important that a fixed order allowed direct comparisons of the children’s abilities to be made over time and also between individual children. It also allowed certain tasks to always be administered before others. For example, the prediction version of the unexpected transfer task was administered before the explanation version. This reduced the likelihood of children answering the prediction task correctly because they had seen the events unfold in the explanation version. The whole battery took approximately 30 minutes.

¹ The Luria hand-game, bear-dog task, and Luria lights task were three measures of inhibitory control. These tasks will not be included in the analyses.

Because the testing took place over a long period of time, it was not possible to have all the children complete all of the test phases. Children were sometimes on holiday or absent through illness. If a child missed a test phase and it was not possible to collect his or her data over the next 2 days, then s/he was discounted from that test phase but was included in the next. At the end of testing, 19 children had completed all six test phases, three children missed the fourth, two missed the fifth, and two missed the sixth. Two children missed more than one phase: one missed Phases 5 and 6, and one missed Phases 3 and 6. Two children in the control group missed Phase 6.

Results

As explained in the Aims, before addressing the main experimental questions regarding rate, stability and changes in the type of information used in tests of theory of mind, some initial analyses needed to be carried out. Firstly, in Section (i), inter-rater reliability is assessed, and the different versions of each test are examined to establish whether all versions were of the same level of difficulty. In Section (ii), the experimental group’s scores are examined for improvements on the theory of mind tests, as the children needed to show some transition for the analysis to continue. Section (iii) presents an exploratory principal component analysis, which examines the children’s scores to see if there was consistency between the different tests. Practice effect investigations are presented in Section (iv), comparing the performance of the control and experimental groups. This is followed by an examination of the sequence of development of the different tests in Section (v). Section (vi) presents the associations between children’s theory of mind and verbal ability. In Section (vii), an aggregate theory of mind score was produced for each child at each phase by adding his/her scores on seven of the theory of mind tasks. The penny-hiding task was not included in this aggregate score, as the principal component analysis showed that it loaded on to a different factor to all the other measures. An illustration of the distribution of the aggregate scores according to age is presented in Table 1. These aggregate scores allowed the main research questions to be addressed in Section (vii), looking at the magnitude and rate of improvement, and at the existence and magnitude of regressions. Finally, Section (viii) investigates how the information children use to explain behaviour in terms of false beliefs changes over time.
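As a minimal sketch of how the aggregate score is built, the snippet below sums the seven one-point measures and leaves out the penny-hiding task. The dictionary keys are hypothetical labels chosen for illustration, not names used in the study.

```python
# Hypothetical labels for the seven measures that enter the aggregate score.
AGGREGATE_TASKS = (
    "ut_prediction",         # unexpected transfer prediction task
    "ut_explanation",        # unexpected transfer explanation task
    "db_own",                # deceptive box: own, previous false belief
    "db_other",              # deceptive box: another person's false belief
    "appearance_reality",
    "explicit_location_q1",
    "explicit_location_q2",
)


def aggregate_score(points):
    """Sum the 0/1 points on the seven tasks; penny-hiding is ignored."""
    return sum(points[task] for task in AGGREGATE_TASKS)


child_phase = {
    "ut_prediction": 1, "ut_explanation": 0, "db_own": 0, "db_other": 1,
    "appearance_reality": 1, "explicit_location_q1": 1,
    "explicit_location_q2": 0, "penny_hiding": 1,
}
print(aggregate_score(child_phase))  # possible range is 0 to 7
```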

Section (i): Inter-rater reliability and comparison of the different versions

We selected 100 testing sessions at random (55% of the total), which were coded by a second experimenter to establish the level of reliability of the original coding. The level of agreement for each of the theory of mind tests was never below 97%. Cohen’s kappa was 0.89 for the coding of the explanations in the unexpected transfer explanation task. Cases of disagreement were rectified by discussion based on the original videotaped footage.
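For readers unfamiliar with the two reliability indices, the sketch below computes percentage agreement and Cohen's kappa for two coders. The codes are made-up toy data, not the study's; only the standard formula kappa = (p_o - p_e) / (1 - p_e) is assumed.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Proportion of sessions on which the two coders gave the same code."""
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(coder_a)
    p_observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    categories = set(coder_a) | set(coder_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Toy data: ten sessions, one disagreement.
first_coder = ["pass"] * 5 + ["fail"] * 5
second_coder = ["pass"] * 4 + ["fail"] * 6
print(percent_agreement(first_coder, second_coder))
print(cohens_kappa(first_coder, second_coder))
```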

A set of one-way ANOVAs was carried out to assess whether any version of the same test was easier or harder than any other version. When the children’s performances on the different versions were collapsed across the phases, no differences were found (unexpected transfer prediction task, F(5, 151) = 0.23, ns; deceptive box, own previous false belief, F(5, 151) = 0.86, ns; deceptive box, naïve other’s false belief, F(5, 151) = 1.44, ns; appearance-reality task, F(5, 151) = 1.74, ns; unexpected transfer explanation task, F(5, 151) = 1.02, ns; false belief explicit location task, F(11, 302) = 0.67, ns).

Section (ii): Developmental change

Sign tests were performed on the experimental group’s scores for all the theory of mind tasks. Each child’s performance on his/her first phase of testing was compared with his/her performance on the last phase of testing that s/he undertook. Using the children’s first and last scores, rather than relying on the scores from Phases 1 and 6, allowed all the children’s scores to be included in this analysis. It was predicted that there would be an improvement on all the tasks. Table 2 shows the results of these sign tests, as well as the number of children who improved, regressed or showed no change in performance. All the tasks showed a significant improvement in performance across the period of study, except the appearance-reality task and the second false belief explicit location question.
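The logic of a sign test can be illustrated with the improved/regressed counts reported in Table 2. The sketch assumes an exact, two-sided binomial test with unchanged children dropped; the text does not state which variant was used, so this is an assumption rather than the author's procedure. Under that assumption, the resulting p values fall in the same significance bands as the Table 2 entries.

```python
from math import comb


def sign_test_two_sided(improved, regressed):
    """Exact two-sided sign test; children with no change are ignored."""
    n = improved + regressed
    if n == 0:
        return 1.0
    k = max(improved, regressed)
    # Probability of k or more changes in one direction under p = 0.5.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


# (improved, regressed) counts copied from Table 2.
table2_counts = {
    "Aggregate theory of mind score": (23, 2),
    "Unexpected transfer prediction task": (12, 2),
    "Unexpected transfer explanation task": (10, 0),
    "Appearance-reality task": (8, 3),
}
for task, (up, down) in table2_counts.items():
    print(f"{task}: p = {sign_test_two_sided(up, down):.4f}")
```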


Table 1. The children’s aggregate theory of mind scores²

Scores for each child at successive testing phases

Child   Phase 1   Phase 2   Phase 3   Phase 4   Phase 5   Phase 6
A       0         0         0         1         –         –
B       0         0         –         2         1         –
C       2         2         0         –         3         3
D       1         2         2         3         4*        4*
E       2         4         6         5         4         –
F       1         2         2         3         6*        5
G       4         4         3         4         –         3
H       2         2         2         4         3         3
I       2         0         1         0         0         0
J       4         5         5         6         6         –
K       1         6         5         –         6         6*
L       3         6         5         5         6         6
M       3*        5         7*        6*        6         7*
N       3         2         2         5         5         7*
O       3         6*        5         4*        3*        6*
P       1         0         3         4         4*        2
Q       7*        6         7*        7*        7*        7*
R       6*        6         7*        7*        7*        7*
S       0         4*        7*        7*        7*        7*
T       0         3*        3*        3*        4*        3*
U       5         5         5         5*        5*        7*
V       3         5*        4*        –         4*        5*
W       6         6         6         –         7*        6*
X       0         1         1         0         1         1
Y       4*        7*        5         4*        6*        4*
Z       5*        6*        7*        6*        6*        6*
A1      1         3         3         3*        4*        3*
B1      6*        7         6         7*        7*        7*

Summary by age

Age (months)   (i) Mean   (ii) Median   (iii) Prediction
37             0          0             0
38             0          0             0
39             1.3        1.5           0
40             2.3        2.0           0
41             2.5        2.0
42             3.3        3.0
43             3.6        4.0
44             3.4        3.0
45             2.5        2.5
46             2.3        2.0
47             2.6        3.0
48             3.9        3.0
49             3.5        5.0           7
50             4.5        5.0           7
51             4.9        5.5           7
52             5.1        6.0           7
53             5.0        6.0           7
54             5.1        6.0           7
55             4.7        5.0           7
56             5.3        6.0           7

Note. The possible range of scores was 0–7. – indicates that a child did not participate at that phase. The scores with asterisks represent phases in which a child was successful on the unexpected transfer explanation task. The summary by age shows: (i) the mean score at that age; (ii) the median score at that age; and (iii) the predictions based on Wellman et al.’s (2001) meta-analysis.

² Each child is placed in the table according to his/her age at the first phase of testing. The locations of the scores following the first score are only approximate insofar as their relation to the age axis, as the phases of testing were set at 4 weeks apart as opposed to calendar months apart.


Section (iii): Inter-relations of the theory of mind tasks

An exploratory principal component analysis was undertaken to discover whether all the tests in the theory of mind battery were loading on to the same construct. The task scores were collapsed across all the phases of testing for the experimental group. As Table 3 shows, two components with eigenvalues greater than 1.0 accounted for 55% of the total variance. The KMO test of sampling adequacy was met (KMO = 0.79), and Bartlett’s test of sphericity yielded a χ² of 313 (df = 28, p < .001), indicating the factor model was appropriate. Table 3 presents the loadings of the test items on to the different components. Component 1 accounted for 39% of the variance and was loaded on by seven of the eight tests (not the penny-hiding task). Component 2 accounted for a further 16% of the variance and was loaded on by the penny-hiding task only.

This analysis was not inflated by the inclusion of both false belief explicit location questions. When this analysis was repeated twice, but with only one of the false belief explicit location questions entered each time, the results were the same as the initial analysis, including the same amount of variance being accounted for by both components. Therefore, all further analysis involving the aggregate theory of mind score involved all the tests, except the penny-hiding test, as this did not load on to the same component.

Table 2. The results of sign tests and the development of children in the experimental group’s performances from their first to their last phase of testing

Theory of mind task (N = 28)                     Improved   Regressed   Unchanged   Sign test result
Aggregate theory of mind score                   23         2           3           ***
Unexpected transfer prediction task              12         2           14          *
Unexpected transfer explanation task             10         0           18          **
Deceptive box: Another person’s false belief     10         2           16          *
Deceptive box: Own, previous false belief        9          1           18          *
Penny-hiding task                                9          1           18          *
Appearance-reality task                          8          3           17          ns
False belief explicit location tasks
  Question 1                                     8          1           19          *
  Question 2                                     8          3           17          ns

*p < .05, **p < .01, ***p < .001, ns indicates a non-significant result.

Table 3. Loading scores for the different theory of mind tests

Tests                                             Component 1   Component 2
Unexpected transfer prediction task               0.796         −0.205
False belief explicit location task: Question 1   0.757         −0.464
False belief explicit location task: Question 2   0.738         −0.463
Deceptive box: Another person’s false belief      0.670         0.125
Deceptive box: Own, previous false belief         0.586         0.397
Unexpected transfer explanation task              0.582         0.337
Appearance-reality task                           0.506         0.367
Penny-hiding task                                 0.338         0.622
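The link between the loadings in Table 3 and the variance figures quoted in the text can be sketched as follows. The sketch assumes the entries are unrotated principal component loadings from a correlation matrix, in which case the sum of squared loadings on a component is its eigenvalue and dividing by the number of tests gives the share of variance. Recomputed from the rounded loadings, the shares come out at roughly 41% and 16%, close to, though not identical with, the 39% and 16% reported.

```python
# Loadings copied from Table 3 as (Component 1, Component 2).
LOADINGS = {
    "Unexpected transfer prediction task": (0.796, -0.205),
    "False belief explicit location task: Question 1": (0.757, -0.464),
    "False belief explicit location task: Question 2": (0.738, -0.463),
    "Deceptive box: Another person's false belief": (0.670, 0.125),
    "Deceptive box: Own, previous false belief": (0.586, 0.397),
    "Unexpected transfer explanation task": (0.582, 0.337),
    "Appearance-reality task": (0.506, 0.367),
    "Penny-hiding task": (0.338, 0.622),
}


def variance_explained(loadings, component):
    """Return (eigenvalue, share of total variance) for one component.

    Assumes unrotated loadings on standardized variables, so the total
    variance equals the number of tests.
    """
    eigenvalue = sum(pair[component] ** 2 for pair in loadings.values())
    return eigenvalue, eigenvalue / len(loadings)


for index in (0, 1):
    eigenvalue, share = variance_explained(LOADINGS, index)
    print(f"Component {index + 1}: eigenvalue ~ {eigenvalue:.2f}, share ~ {share:.1%}")
```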


Section (iv): Practice effects

Repeated measures multivariate analyses of variance were undertaken to compare changes in the experimental and control groups’ aggregate theory of mind scores and the individual task scores from Phase 1 to 6. Multivariate analyses were undertaken because the assumption of sphericity was not met. Changes in the experimental group’s aggregate theory of mind scores (mean change in score = 1.92, SD = 1.90) were significantly greater than changes in the control group’s aggregate scores (mean change in score = 0.71, SD = 1.88; F(1, 34) = 5.24, p < .05). There did appear to be sizable differences on the two deceptive box test questions but these were not statistically significant: deceptive box, own, previous false belief, F(1, 34) = 0.84, p = .37; naïve other’s false belief, F(1, 35) = 2.69, p = .11.

Section (v): Sequence of development

The analysis presented in Table 4 also allowed an examination of the level of difficulty of the different tests. Each test’s difficulty is reflected in the number of children who passed the test during a test phase; that is, easier tests are passed by more children than harder tests. From the experimental group’s performance at Phase 1, the order of ease, from easiest to most difficult, was: appearance-reality task, penny-hiding task, false belief explicit location tasks, unexpected transfer prediction task, the deceptive box test questions, and the unexpected transfer explanation task. The pass rates for the control group at Phase 1 produced an almost identical sequence of performance, with only the deceptive box test question regarding another person’s false belief being out of sequence. Furthermore, this sequence of performance was almost identical when all of the tests were collapsed across all of the phases for the experimental group. The order of success, including pass rates, across all the phases was: penny-hiding task (75%), appearance-reality task (62%), unexpected transfer prediction task (62%), false belief explicit location tasks (both questions 61%), deceptive box: another person’s false belief (56%), deceptive box: own, previous false belief (51%), unexpected transfer explanation task (40%). These results suggest that the relative difficulty of the tasks is fairly consistent across different groups and repeated testing. However, when the individual profiles of performance for children who scored between 1 and 6 on the aggregate theory of mind scores were examined, only 33 out of the 122 profiles (27%) fitted this sequence of development.
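One way to make "fitting the sequence" concrete is a scalogram-style check: a profile fits if the child passes every test that is easier than the hardest test s/he passes. The criterion actually applied in the study is not spelled out, so the rule, the task labels, and the merging of the two explicit location questions into a single entry are all assumptions made for this sketch.

```python
# Easiest to hardest, following the across-phase pass rates quoted above.
DIFFICULTY_ORDER = [
    "penny_hiding",
    "appearance_reality",
    "ut_prediction",
    "explicit_location",
    "db_other",
    "db_own",
    "ut_explanation",
]


def fits_sequence(passed, order=DIFFICULTY_ORDER):
    """True if no failed test is easier than a passed test."""
    seen_failure = False
    for task in order:
        if task not in passed:
            seen_failure = True
        elif seen_failure:
            # A pass on a harder test after failing an easier one.
            return False
    return True


print(fits_sequence({"penny_hiding", "appearance_reality", "ut_prediction"}))
print(fits_sequence({"penny_hiding", "ut_explanation"}))
```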

Section (vi): Verbal ability

The experimental group’s standardized BPVS scores at Phase 1 ranged from 82 to 119, with a mean of 96.70 and a standard deviation of 10.51. The experimental group’s aggregate theory of mind scores correlated significantly with their BPVS scores at Phase 1, r(27) = .31, p < .05; Phase 2, r(27) = .36, p < .05; Phase 3, r(27) = .33, p < .05; Phase 4, r(23) = .41, p < .05; Phase 5, r(25) = .45, p < .05; and Phase 6, r(24) = .46, p < .05. A Pearson correlation was carried out to compare developmental changes in the experimental group’s aggregate theory of mind scores (score at Phase 6 minus score at Phase 1) and BPVS scores (raw score at Phase 6 minus raw score at Phase 1). In contrast with the correlations at specific phases, this correlation was not significant, r(24) = .25, p > .05. In an attempt to establish directionality between the children’s BPVS scores and aggregate theory of mind scores, cross-lagged correlations were performed across Phases 1 and 6 for the experimental group. The aggregate theory of mind scores and the raw BPVS scores were each stable, as these scores correlated significantly across Phases 1 and 6 (r(24) = .59, p < .01 and r(24) = .57, p < .01, respectively). The aggregate theory of mind score at Phase 1 did not correlate significantly with the BPVS scores at Phase 6 (r(24) = .14, ns); however, the correlation between the BPVS scores at Phase 1 and the aggregate theory of mind scores at Phase 6 approached significance (r(24) = .39, p = .06). When the control group was included in this analysis, although the correlation between the BPVS score at Phase 6 and the aggregate theory of mind score at Phase 1 remained non-significant (r(37) = .20, ns), the correlation between the BPVS score at Phase 1 and the aggregate theory of mind score at Phase 6 reached significance, r(36) = .55, p < .001.

Table 4. The results for the control and experimental groups at Phases 1 and 6

                                                 Phase 1          Phase 1          Phase 1   Phase 6          Phase 6          Phase 1 to Phase 6
                                                 Control group    Exp-al group     t test    Control group    Exp-al group     repeated measures
Measure                                          (N = 14)         (N = 28)         results   (N = 12)         (N = 24)         MANOVA results
Mean age (months)                                46.43 (3.95)     46.07 (4.41)     ns
Mean BPVS score                                  92.78 (12.26)    96.70 (10.51)    ns        94.33 (13.18)    98.91 (11.65)    ns
Gender: females                                  8                15               ns        7                13               ns
Gender: males                                    6                13                         5                11
Theory of mind tasks
Mean aggregate theory of mind score              3.21 (2.08)      2.64 (2.13)      ns        3.92 (2.87)      4.71 (2.14)      *
% pass rate: unexpected transfer prediction      36               36               ns        58               75               ns
% pass rate: unexpected transfer explanation     29               22               ns        58               63               ns
% pass rate: deceptive box, own previous belief  29               32               ns        42               67               ns
% pass rate: deceptive box, another’s belief     43               32               ns        33               67               ns
% pass rate: appearance-reality task             64               50               ns        67               67               ns
% pass rate: penny-hiding task                   64               46               ns        83               79               ns
% pass rate: false belief explicit location
  Question 1                                     57               43               ns        67               71               ns
  Question 2                                     64               46               ns        67               63               ns

Note. ns indicates a non-significant result, *p < .05; standard deviations are in parentheses.
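The cross-lagged design amounts to four Pearson correlations: two stability correlations (each measure with itself across time) and two lagged ones (each measure at Phase 1 with the other at Phase 6). The sketch below shows the bookkeeping on made-up scores; the data and the function names are hypothetical, and children missing either phase would need to be dropped before the lists are built.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cross_lagged(tom_1, tom_6, bpvs_1, bpvs_6):
    """Stability and cross-lagged correlations between Phases 1 and 6."""
    return {
        "tom_stability": pearson_r(tom_1, tom_6),
        "bpvs_stability": pearson_r(bpvs_1, bpvs_6),
        "tom1_with_bpvs6": pearson_r(tom_1, bpvs_6),
        "bpvs1_with_tom6": pearson_r(bpvs_1, tom_6),
    }


# Made-up scores for six children, for illustration only.
tom_1 = [0, 2, 3, 1, 5, 6]
tom_6 = [3, 4, 6, 2, 7, 7]
bpvs_1 = [30, 35, 41, 28, 44, 50]
bpvs_6 = [36, 38, 47, 35, 49, 55]
for name, value in cross_lagged(tom_1, tom_6, bpvs_1, bpvs_6).items():
    print(f"{name}: r = {value:.2f}")
```

A larger correlation from Phase 1 BPVS to Phase 6 theory of mind than in the reverse direction is the pattern the text interprets as tentative evidence of directionality.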

Section (vii): Aggregate theory of mind scores

Binomial tests were carried out to test whether the experimental group’s aggregate theory of mind scores deviated from chance in relation to the three age blocks indicated in Table 1, derived from Wellman et al.’s (2001) meta-analysis predictions. A cut-off score of three correct answers (representing success on half of the tests on which a child had two potential answers) was used. The scores produced by children who were younger than 3;5 were below chance (binomial test, p < .01), scores produced by children over 4 years were above chance (binomial test, p < .001) and scores produced by children between 3;5 and 4 years were not significantly different from chance.

When the children’s phase-to-phase aggregate theory of mind scores were examined, 37% of the transitions were improvements (overall, 50% of the improvements were by 1 point, 26% by 2 points, 20% by 3 points, 2% by 4 points and 2% by 5 points). In contrast, 24% of the phase-to-phase transitions were regressions (84% by 1 point, 16% by 2 points); there were no regressions of more than 2 points. Thirty-nine percent of the phase-to-phase transitions showed the same score at both phases; 22% of these were at ceiling.
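The transition analysis can be sketched as follows. The score sequences are hypothetical, and one assumption is made that the text does not settle: a transition is counted only between two adjacent phases that were both completed, so a missed phase breaks the chain. The 2-point threshold in `is_gradual` follows the definition of gradual development used in the Discussion.

```python
def phase_steps(scores):
    """Score changes between adjacent phases; None marks a missed phase."""
    return [
        current - previous
        for previous, current in zip(scores, scores[1:])
        if previous is not None and current is not None
    ]


def summarise(steps):
    """Proportion of transitions that are improvements, regressions, or ties."""
    n = len(steps)
    if n == 0:
        return {"improved": 0.0, "regressed": 0.0, "unchanged": 0.0}
    return {
        "improved": sum(s > 0 for s in steps) / n,
        "regressed": sum(s < 0 for s in steps) / n,
        "unchanged": sum(s == 0 for s in steps) / n,
    }


def is_gradual(steps, max_step=2):
    """True if no phase-to-phase change exceeds max_step points."""
    return all(abs(s) <= max_step for s in steps)


# Hypothetical aggregate scores (0-7) over six phases.
child_one = [0, 1, 3, 3, 2, 4]
child_two = [2, 2, None, 5, 6, 6]
print(phase_steps(child_one), summarise(phase_steps(child_one)))
print(phase_steps(child_two), is_gradual(phase_steps(child_two)))
```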

Section (viii): Unexpected transfer explanation task

Particular attention was focused on the point during the development of theory of mind at which children passed the unexpected transfer explanation task. Table 5 shows the percentage of each explanation type for each aggregate theory of mind score (excluding the unexpected transfer explanation task). As the control and experimental groups’ performance did not differ on this task, the table incorporates both groups’ answers. Children’s responses fell into five categories: (i) no response or ‘don’t know’; (ii) a situational explanation, describing some aspect of the current situation or prior event but with no reference to the false belief or no appreciation of the conflicting events (e.g. ‘it’s in the cupboard’ or ‘cos it’s her teddy and he is hungry’); (iii) a wanting explanation, describing the character’s desire for the object (e.g. ‘he wants it’ or ‘he wants to read his book’); (iv) a correct answer that did not refer to mental states but did reflect on the conflicting events and showed an understanding that the character’s actions would be linked to their false beliefs (e.g. ‘she put it there but now it’s in there’ or ‘cos she put it in there and the other one put it in there’); and (v) a thinking explanation, referring to the thoughts or knowledge of the character (e.g. ‘he thinks it’s in there’ or ‘cos she doesn’t know where it is, she thinks it’s in there’). Children who gave an answer in either Category (iv) or (v) were coded as correct on the task as they showed an explicit understanding of mental states, or an appreciation of the conflicting events and the character’s responses to them; all other explanations were incorrect.

Sometimes, children gave more than one explanation. On these occasions, the highest level of explanation children provided was noted. A thinking explanation was thought to be the most sophisticated, followed by a correct explanation (which did not mention mental states), then the wanting explanation, as these are mentalistic in some sense but fail to consider beliefs, then the situational explanations and finally the don’t know responses. If children did not provide a correct explanation after the test question was asked, then the question was repeated. In 151 of the 183 incidents of repeating the question, children gave the same level of explanation.
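The coding rule for multiple explanations can be written down compactly: rank the five categories, keep the highest one a child produced, and count the response as correct if that category is (iv) or (v). The enum and function names below are hypothetical; only the ranking and the pass criterion come from the text.

```python
from enum import IntEnum


class Explanation(IntEnum):
    """Explanation categories, ordered from least to most sophisticated."""
    DONT_KNOW = 1                  # (i) no response or 'don't know'
    SITUATIONAL = 2                # (ii)
    WANTING = 3                    # (iii)
    CORRECT_NO_MENTAL_STATE = 4    # (iv)
    THINKING = 5                   # (v)


def code_response(explanations):
    """Return (highest category given, whether it counts as correct).

    An empty list is treated as no response.
    """
    best = max(explanations, default=Explanation.DONT_KNOW)
    return best, best >= Explanation.CORRECT_NO_MENTAL_STATE


print(code_response([Explanation.WANTING, Explanation.THINKING]))
print(code_response([Explanation.SITUATIONAL, Explanation.WANTING]))
```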

Discussion

The present study provides a unique opportunity to examine exactly what occurs during the period of transition from a state of equilibrium, in which children consistently rely on reality to address questions of false belief, to a more sophisticated state of equilibrium, where children are able to refer to individuals’ false beliefs. Examining changes in performance within the same children over time provides stronger evidence for sequences than inferences from group means. Although the study cannot definitively distinguish between the different theories designed to explain the development of theory of mind, it does provide some distinctive information regarding changes in individual children’s profiles of performance. This includes information about rate and stability, and how the types of answers provided by children to explain behaviour change over time.

Although few children were consistently failing all the theory of mind tests before 3;5, their performance was poor and below chance during this period. Similarly, although not all children were producing perfect scores after 4 years, the performances were stable and above chance, with the majority of children being successful on most of the theory of mind tests. Between 3;5 and 4 years, the children’s aggregate theory of mind scores covered the full range of possible scores, producing an overall group performance that was at chance levels. These profiles of performance support the parameters of development suggested by Wellman et al.’s (2001) meta-analysis. It must be noted, as shown in Table 1, that more children in the critical period of 3;5 to 4 years would have been optimal. However, when the individual children’s profiles are examined, it appears that children are showing change before and after these ages. Therefore, one should view these parameters as guidelines, rather than rules.

Table 5. Percentage of each type of explanation according to children’s aggregate theory of mind score (excluding the unexpected transfer explanation task)

Aggregate theory   (i) Don’t know    (ii) Situational   (iii) Wanting   (iv) Correct         (v) Thinking
of mind score      or no response    response           response        (no mental state)    response
0                  9                 82                 9               0                    0
1                  32                61                 7               0                    0
2                  39                39                 9               0                    13
3                  42                19                 3               13                   23
4                  33                7                  13              7                    40
5                  23                3                  14              9                    51
6                  18                8                  0               26                   48

If gradual development is defined as changes of no more than 2 out of the 7 possiblepoints from one phase to the next, 19 of the 28 children showed a gradual improvement

in their theory of mind score. The overwhelming majority of improvements (76%) were of

no more than 2 points, and no regression exceeded 2 points. As

the children did not appear to show sudden, large advances in performance, it can be

concluded that the development of an individual child’s expressed theory of mind

competence is gradual. However, Werner (1957) and Flavell (1971) have highlighted the

difficulty of establishing the specific processes involved in change. For example, a

discontinuous change involves a qualitative shift from one particular type of behaviour

to a different type of behaviour, but this change may become distinguishable only gradually. Further examination of the profiles of performance in the light of the

children’s explanations on the unexpected transfer explanation task has shown that,

although the aggregate scores showed a gradual improvement, there may have been

some qualitative changes in the strategies or theories children used in tests of false belief

understanding.
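
The operational definition of gradual development given above can be made concrete with a minimal sketch. It assumes that each child's profile is a list of aggregate scores (0 to 7), one per testing phase, and that a change is the absolute difference between consecutive phases. The profiles and the names `phase_changes` and `is_gradual` are hypothetical illustrations, not the study's data or analysis code.

```python
from typing import Dict, List, Sequence

# Threshold taken from the text: a change of no more than 2 of the 7
# possible points between consecutive phases counts as gradual.
MAX_STEP = 2


def phase_changes(scores: Sequence[int]) -> List[int]:
    """Return the signed change in aggregate score between consecutive phases."""
    return [later - earlier for earlier, later in zip(scores, scores[1:])]


def is_gradual(scores: Sequence[int], max_step: int = MAX_STEP) -> bool:
    """True if no phase-to-phase change (up or down) exceeds max_step points.

    Treating improvements and regressions symmetrically is an assumption
    made for this sketch.
    """
    return all(abs(change) <= max_step for change in phase_changes(scores))


# Hypothetical profiles across six testing phases (not data from the study).
profiles: Dict[str, List[int]] = {
    "child_a": [1, 2, 2, 4, 5, 6],  # steady improvement
    "child_b": [0, 1, 5, 5, 6, 7],  # one sudden 4-point jump
    "child_c": [3, 4, 3, 5, 6, 6],  # small regression, then recovery
}

gradual_children = [name for name, scores in profiles.items() if is_gradual(scores)]
```

Under this definition a profile with a small regression, such as `child_c`, still counts as gradual, whereas a single large jump, as in `child_b`, does not.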

Evidence from this study and that of Flynn et al. (2004) shows that children appear to

progress from a period in which they rely on reality to answer questions in false belief

understanding tasks to a period where they begin to pass tests but show very poor

overall theory of mind performance. No child who achieved an aggregate score (excluding the unexpected transfer explanation task) of between 0 and 1 was able to

explicitly explain behaviour in terms of false beliefs. For children who scored 0 or 1, 61%

to 82% of all answers provided on the unexpected transfer explanation task were

situational, relying on reality to explain the behaviour. As the aggregate theory of mind

scores increase, the predisposition to rely on reality begins to fade, and the number

of situational explanations decreases. Some children still provide situational

explanations, but as the aggregate theory of mind scores increase there is an increase

in the number of don’t know/no responses. Such a finding is paradoxical; although the

children’s aggregate theory of mind scores are increasing, suggesting a better understanding of mental states, the children are unable to articulate this understanding

and instead choose not to give an explanation. Therefore, there does appear to be a

period of confused performance, as suggested by Wellman et al. (2001), in which

children are unable to explicitly provide an explanation of a story character’s behaviour.

As children’s theory of mind skills become more established (i.e. a score of 3 or more),

the number of situational and don’t know/no responses decreases, as children are more

likely to explain behaviour in terms of an individual’s false beliefs. Once children begin

to provide ‘thinking’ explanations, they do not tend to return to a less sophisticated type

of explanation. There does not appear to be a systematic production, or stage, of explanations related to desire-based reasoning, except that these tend not to occur at

the extremes, such as when children are relying on reality or when they are using false

beliefs to explain behaviour.
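
The progression described above can be read directly from Table 5. The following sketch, which is illustrative rather than part of the original analysis, encodes the Table 5 percentages and reports the most frequent explanation type at each aggregate score. The category keys and the function name `modal_explanations` are assumptions introduced here.

```python
from typing import Dict, List

# Percentages from Table 5, keyed by aggregate theory of mind score
# (excluding the unexpected transfer explanation task).
# The short category keys are labels chosen for this sketch.
TABLE_5: Dict[int, Dict[str, int]] = {
    0: {"dont_know": 9, "situational": 82, "wanting": 9, "correct_no_mental_state": 0, "thinking": 0},
    1: {"dont_know": 32, "situational": 61, "wanting": 7, "correct_no_mental_state": 0, "thinking": 0},
    2: {"dont_know": 39, "situational": 39, "wanting": 9, "correct_no_mental_state": 0, "thinking": 13},
    3: {"dont_know": 42, "situational": 19, "wanting": 3, "correct_no_mental_state": 13, "thinking": 23},
    4: {"dont_know": 33, "situational": 7, "wanting": 13, "correct_no_mental_state": 7, "thinking": 40},
    5: {"dont_know": 23, "situational": 3, "wanting": 14, "correct_no_mental_state": 9, "thinking": 51},
    6: {"dont_know": 18, "situational": 8, "wanting": 0, "correct_no_mental_state": 26, "thinking": 48},
}


def modal_explanations(row: Dict[str, int]) -> List[str]:
    """Return every explanation type that ties for the highest percentage."""
    top = max(row.values())
    return [category for category, pct in row.items() if pct == top]


# Most frequent explanation type(s) at each aggregate score.
modal_by_score = {score: modal_explanations(row) for score, row in TABLE_5.items()}
```

Read this way, situational explanations are the most frequent at scores of 0 and 1, don't know/no response ties with them at 2 and leads at 3, and 'thinking' explanations are the most frequent from a score of 4 upwards.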

Interestingly, the developments illustrated by the profiles of performance for the

children’s aggregate theory of mind scores appear stable. Flynn et al. (2004) concluded

that the development of theory of mind skills was unstable. However, increasing the

number of theory of mind tests included in the aggregate score from 3 to 7 has shown that


the development is steady. Although 22 of the 28 children in this study showed some

regression in theory of mind ability, these regressions were small, never amounting to

more than 2 points out of a possible regression score of 7. Therefore, even when the

children are confused – being unable to provide lucid explanations on the unexpected

transfer explanation task – they are not showing great fluctuations in performance.

A reassuring finding from the present study was that all but one of the different measures designed to evaluate theory of mind loaded on to the same construct.

Encouragingly, the tests that were showing this agreement were those tests that are

most commonly used in the literature, for example, the deceptive box test and the

unexpected transfer prediction task. The exception was the penny-hiding task, which

did not load on to the same factors as all the other tests. The penny-hiding test is a test of

perspective taking in a deception situation. It requires a child to hide a penny in front of

someone, therefore considering that person’s visual perspective, so that no clues are

given to the penny’s location. The results of the principal component analysis were

supported by Hughes (1998a), who found that the penny-hiding task correlated significantly with another deception task, but not with either a deceptive box task or

false belief tasks that required an explanation using mental states. Therefore, the penny-

hiding test may be evaluating one aspect of theory of mind (i.e. deceptive ability) but the

other seven tests are measuring another ability relating to changes in 3- and 4-year-olds’

mental state understanding. This other component is closely associated with the

understanding of false beliefs as all but one of the tasks, the appearance-reality task

being the exception, are measures of false belief understanding. False belief

understanding represents a robust measure of an important early development of

theory of mind, but we must bear in mind that it is only one narrow aspect of a multifaceted domain. The use of the microgenetic approach has provided interesting

results, which could not have been produced using the usual, cross-sectional approach.

Future work could adopt this approach to examine change in other theory of mind

developments across the life-span including second-order theory of mind and later

developments at the first-order theory of mind period, such as the implication of

understanding beliefs for emotions.

As well as showing that the tests are valid, this study also shows that there is a

relatively consistent sequence of development across the tasks, which was apparent

across different groups and testing phases. This consistent sequence of development could be caused by the non-specific task demands of the different theory of mind tests. A

child’s theory of mind skills may be in place, but the non-specific demands of the tests

cause children to perform differently on each test. Two pieces of evidence argue against

this proposal. Firstly, even though the tests are superficially very different, all the tests,

except those in the extremes (penny-hiding, appearance-reality, and the unexpected

transfer explanation tasks), were showing similar levels of difficulty (i.e. comparable

pass rates). Secondly, when the individual profiles of performance for children who

scored between 1 and 6 on the aggregate theory of mind scores were examined, only

1/4 fitted this sequence of development. This finding suggests that the sequence is true of groups, but not of individuals. However, unlike Wellman and Liu (2004), the sequence

produced in the present study cannot be used as a definitive sequence by which to

compare individual children’s development, as the tasks were not counterbalanced

across children and testing phases, and so children’s performances may have been

affected by order of presentation.

The substantial associations found between the different theory of mind measures

contrast with Flynn et al. (2004), who found that children’s performances on only two


of the three false belief understanding tests were associated with one another. Flynn

et al. (2004) looked at the period of emergence of false belief understanding, whereas

the present study examined the development of theory of mind skills at a point when

the skill was becoming more established. Therefore, it appears that, at this point of

development, most tests of theory of mind show high construct validity. However, this

highlights the significant issue of reliability, validity and stability of false belief understanding during this period of change. It is exactly this period of instability that

researchers interested in individual differences in false belief understanding must target.

Many studies have reported problems in finding robust associations between different

theory of mind tests, and their associated variables. For example, Mitchell (1996)

reported that, ‘The most strange thing about the age trends was the lack of them … Quite simply, it has become fashionable to claim that there is a sharp age trend, but in

fact there is not’ (Mitchell, 1996, pp. 137–138). The present study shows that rather

than the tasks lacking reliability and validity, children appear to progress through a

period of confusion during the transition, where they fail tasks that they previously passed, and are unable to explain false beliefs, even though they could do so before.

Therefore, researchers should not be surprised if they are not able to produce previously

reported effects when looking at performance during this critical period.

An association was found between children’s verbal ability and their theory of mind,

which replicated the findings of Astington and Jenkins (1999) and De Villiers and Pyers

(2002). Not only was there within-phase correspondence between children’s verbal

ability and their theory of mind, cross-lagged correlations showed that early linguistic

skills predicted later theory of mind but the reverse – early theory of mind predicting

later linguistic skills – did not hold. Verbal abilities play an important role in children understanding task instructions and being able to produce sophisticated explanations.

Yet, this robust evidence of early linguistic skills predicting later mental state

understanding shows that language plays a more fundamental role in social cognition

than simply being able to meet certain methodological requirements. Further work

needs to establish the elements of language that play a part in this development, and

how a child’s linguistic environment affects this language/theory of mind relationship.
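
The logic of the cross-lagged comparison can be illustrated with a minimal sketch. It assumes plain Pearson correlations between scores at an early and a late phase; the study's exact statistical procedure is not reproduced here, and all scores and variable names below are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical scores for six children (not data from the study).
verbal_early = [30, 35, 40, 45, 50, 55]  # verbal ability at an early phase
tom_early = [1, 0, 2, 1, 0, 2]           # aggregate theory of mind, early phase
verbal_late = [38, 42, 47, 53, 58, 62]   # verbal ability at a later phase
tom_late = [2, 3, 4, 4, 6, 7]            # aggregate theory of mind, later phase

# The two cross-lagged paths: early language -> later theory of mind,
# and early theory of mind -> later language.
language_to_tom = pearson(verbal_early, tom_late)
tom_to_language = pearson(tom_early, verbal_late)
```

An asymmetry of the kind reported above would appear as a markedly stronger `language_to_tom` than `tom_to_language`; the hypothetical values were chosen to show that pattern and carry no evidential weight.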

An examination of practice effects showed that repeated testing on this battery of

theory of mind tasks significantly improved the experimental group’s aggregate theory

of mind scores compared with the control group’s aggregate scores. This practice effect was surprising. Children were provided with no feedback from the experimenter

regarding their success or failure on the tasks. At the end of each test, the experimenter

provided some general encouragement and then moved on to the next task. Previous

studies have explored the role of different types of feedback on children’s theory of

mind performance. Slaughter and Gopnik (1996) avoided presenting children with

explanations in their training procedure, instead providing brief verbal feedback. Their

procedure was based on the assumption that it is primarily the child’s own perception

of conflicting evidence, rather than an adult’s tutoring of the correct answer, which

causes theory revision. The current findings support this hypothesis. Although the experimenter provided no explicit feedback, the children may have experienced

implicit contradictions caused by being repeatedly presented with questions that

required them to consider their theories of human behaviour and mental states. Such a

finding is also supported by the fact that the effect was found for the aggregate theory of

mind score rather than an individual test, suggesting that this study’s procedure

facilitated some conceptual change rather than a rule or strategy change that was

implemented consistently successfully on one or two tests.


The current study extends our understanding of the time scale over which change

occurs. Training studies have previously shown changes in theory of mind performance

over relatively short periods. Slaughter and Gopnik (1996) facilitated change over 2 to 3

weeks and Appleton and Reddy (1996), over 4 to 6 weeks. In the present study, children

received eight tests every 4 weeks for six phases. Although there was a statistically

significant improvement in the experimental group’s aggregate theory of mind score compared with the control group, this improvement was not large (a mean difference of

1.92). Slaughter and Gopnik (1996) suggest that the revision of theories is not

immediate, but rather consists of a gradual appreciation of cognitive inconsistencies, as

children need time to ruminate on their theory change. These results show that the

procedure used in the present study provides children with the rumination time to

revise their theories. Further work needs to be undertaken to establish exactly which

element of the presentation of this battery, for example, frequency of presentation, size

of battery, number of phases, and so on, facilitates this improvement.

The existence of practice effects also highlights another possibility for future

research. The administration of the unexpected transfer explanation task during the

period of transition in this study has provided important information about changes in

children’s information use during the transition from consistently failing tests of false

belief understanding to consistently passing these tests. However, this was just one task

in a battery of seven theory of mind tasks. Future work could include more explanation

tasks, or alternatively, an experimenter could ask a child to explain their answer on the

other theory of mind tests. As the present study was the first to implement the microgenetic approach across the full transition period, it aimed to look at spontaneous,

rather than facilitated change, so as to produce an initial illustration of change from

which future experimental manipulations could be made. Repeatedly asking children to

explain their logic would have highlighted this logic and so may have facilitated the

development further. Future work could examine how and when explanations facilitate

theory of mind development.

Although the present study cannot definitively distinguish between the different

theories designed to explain the development of theory of mind, it does provide some distinctive information regarding changes in individual children’s profiles of

performance, which a successful theory must explain. For example, a successful

theory must assume that the development of, or at least the demonstration of, theory of

mind skills (i) occurs gradually, (ii) is stable with occasional small regressions in

performance, (iii) is predicted by early verbal skills, (iv) can be facilitated by the

repeated presentation of a set of tests over a number of weeks without explicit feedback,

and (v) causes children to progress through a period in which they no longer rely on

reality to explain an individual’s behaviour, but instead choose to give no response.

References

Appleton, M., & Reddy, V. (1996). Teaching three-year-olds to pass belief tests: A conversational

approach. Social Development, 5, 275–291.

Astington, J. W., & Jenkins, J. M. (1999). A longitudinal study of the relation between language and

theory-of-mind development. Developmental Psychology, 35, 1311–1320.

Bartsch, K. (1998). False belief prediction and explanation: Which develops first and why it

matters. International Journal of Behavioural Development, 22, 423–428.

Bartsch, K., & Wellman, H. (1989). Young children’s attribution of action to beliefs and desires.

Child Development, 60, 946–964.


Bartsch, K., & Wellman, H. (1995). Children talk about the mind. New York: Oxford University

Press.

Carlson, S., Mandell, D., & Williams, L. (2004). Executive function and theory of mind: Stability and

prediction from ages 2 to 3. Developmental Psychology, 40, 1105–1122.

Carlson, S., Moses, L., & Breton, C. (2002). How specific is the relation between executive

function and theory of mind? Contributions of inhibitory control and working memory. Infant

and Child Development, 11, 73–92.

Carpendale, J., & Lewis, C. (2004). Constructing an understanding of mind: The development of

children’s social understanding within social interaction. Behavioural and Brain Sciences,

27, 79–96.

De Villiers, J., & Pyers, J. (2002). Complements to cognition: A longitudinal study of the

relationship between complex syntax and false-belief-understanding. Cognitive Development,

17, 1037–1060.

Dunn, L., Dunn, L., Whetton, C., & Pintilie, D. (1997). British Picture Vocabulary Scale. Windsor:

NFER-Nelson.

Flavell, J. (1971). Stage-related properties of cognitive development. Cognitive Psychology, 2,

421–453.

Flavell, J., Flavell, E., & Green, F. (1983). Development of the appearance-reality distinction.

Cognitive Psychology, 15, 95–120.

Flynn, E., O’Malley, C., & Wood, D. (2004). A longitudinal, microgenetic study of the emergence of

false belief understanding and inhibition skills. Developmental Science, 7, 103–115.

Foote, R., & Holmes-Lonergan, H. (2003). Sibling conflict and theory of mind. British Journal of

Developmental Psychology, 21, 45–58.

Happé, F. (1994). An advanced test of theory of mind: Understanding of story characters’ thoughts

and feelings by able autistic, mentally handicapped and normal children and adults. Journal of

Autism and Developmental Disorders, 24, 129–154.

Happé, F. (1995). The role of age and verbal ability in the theory of mind task performance of

subjects with autism. Child Development, 66, 843–855.

Hogrefe, J., Wimmer, H., & Perner, J. (1986). Ignorance versus false belief: A developmental lag in

attribution of epistemic states. Child Development, 57, 567–582.

Hughes, C. (1998a). Executive function in preschoolers: Links with theory of mind and verbal

ability. British Journal of Developmental Psychology, 16, 233–253.

Hughes, C. (1998b). Finding your marbles: Does pre-schoolers’ strategic behaviour predict later

understanding of mind? Developmental Psychology, 34, 1326–1339.

Hughes, C., Adlam, A., Happé, F., Jackson, J., Taylor, A., & Caspi, A. (2000). Good test-retest

reliability for standard and advanced false-belief tasks across a wide range of abilities. Journal

of Child Psychology and Psychiatry and Allied Disciplines, 41, 483–490.

Mayes, L., Klin, A., Tercyak, K., Cicchetti, D., & Cohen, D. (1996). Test-retest reliability for false-

belief tasks. Journal of Child Psychology and Psychiatry and Allied Disciplines, 37, 313–319.

Mitchell, P. (1996). Acquiring a conception of mind: A review of psychological research and

theory. Hove, UK: Psychology Press.

Perner, J., & Wimmer, H. (1985). ‘John thinks that Mary thinks that…’: Attribution of second

order beliefs by 5–10 year old children. Journal of Experimental Child Psychology, 39,

437–471.

Premack, D., & Woodruff, G. (1978). Does the chimpanzee have a theory of mind? Behavioural

and Brain Sciences, 1, 515–526.

Repacholi, B., & Gopnik, A. (1997). Early understanding of desires: Evidence from 14- and

18-month-olds. Developmental Psychology, 33, 12–21.

Ruffman, T., & Keenan, T. (1996). Children’s understanding of surprise: The case for a lag in

understanding relative to false belief. Developmental Psychology, 32, 40–49.

Siegal, M., & Beattie, K. (1991). Where to look first for children’s knowledge of false beliefs.

Cognition, 38, 1–12.


Siegler, R., & Crowley, K. (1991). The microgenetic method: A direct means for studying cognitive

development. American Psychologist, 46, 606–620.

Slaughter, V., & Gopnik, A. (1996). Conceptual coherence in the child’s theory of mind: Training

children to understand belief. Child Development, 67, 2967–2989.

Wellman, H. (1990). The child’s theory of mind. Cambridge, MA: MIT Press.

Wellman, H., & Bartsch, K. (1988). Young children’s reasoning about beliefs. Cognition, 30,

239–277.

Wellman, H., Cross, D., & Watson, J. (2001). Meta-analysis of theory of mind development: The

truth about false belief. Child Development, 72, 655–684.

Wellman, H., & Liu, D. (2004). Scaling of theory-of-mind tasks. Child Development, 75, 523–541.

Werner, H. (1957). The concept of development from a comparative and organismic point of view.

In D. Harris (Ed.), The concept of development: An issue in the study of human behaviour.

Minneapolis, MN: University of Minnesota Press.

Wimmer, H., & Mayringer, H. (1998). False belief understanding in young children: Explanations

do not develop before predictions. International Journal of Behavioural Development, 22,

403–422.

Wimmer, H., & Perner, J. (1983). Beliefs about beliefs: Representation and constraining function of

wrong beliefs in young children’s understanding of deception. Cognition, 13, 103–128.

Received 31 March 2004; revised version received 10 June 2005
