Copyright © The British Psychological Society. Reproduction in any form (including the internet) is prohibited without prior permission from the Society.
A microgenetic investigation of stability and continuity in theory of mind development
Emma Flynn*
School of Psychology, University of St Andrews, UK
The processes behind the transition from consistently failing tests of false belief understanding to consistently passing them were investigated by tracking changes in children’s mental state understanding. Participants were 42 children (aged 3;1 to 4;3). There were two conditions: an experimental condition, in which children were tested on a battery of eight theory of mind tests every four weeks for six phases of testing, and a control condition, in which children completed the battery only at the first and last testing phases. The profiles of performance showed that an understanding of false beliefs develops gradually and that the development is relatively stable. An examination of the types of explanation children give on tests of false belief understanding showed that initially they rely on reality, then they progress through a period of confusion, where they do not provide an explanation, to a final stage in which they are able to explain behaviour by referring to an individual’s false belief. Further analyses examined practice effects, construct validity, and the role of verbal ability in the development of mental state understanding.
Since Premack and Woodruff’s (1978) study, which first introduced the phrase ‘theory of mind’, there has been a great deal of interest in the development of an understanding of
mental states. The development of theory of mind is a protracted process, with advances
in the understanding of mental states being reported from infancy (Repacholi & Gopnik,
1997) through to adulthood (Happé, 1994). Yet, the majority of theory of mind research
has concentrated on the understanding of false beliefs, which typically occurs between
the ages of 3 and 5 years. It is now accepted that by about 4 years of age, typically
developing children give correct answers on tests of false belief understanding (Wellman,
Cross, & Watson, 2001). In tests such as the unexpected transfer task, older children appreciate that a story character who holds a false belief about the location of an object
will search for that object not where it is truly located, but where the character believes
the object is located. However, younger children refer to reality and state that the
character will search for the object in its true location.
* Correspondence should be addressed to Emma Flynn, School of Psychology, University of St Andrews, St Andrews, Fife, Scotland, KY16 9JP, UK (e-mail: [email protected]).
British Journal of Developmental Psychology (2006), 24, 631–654
© 2006 The British Psychological Society
www.bpsjournals.co.uk
DOI:10.1348/026151005X57422
Many aspects of false belief understanding have been investigated, including the
associated cognitive skills (Carlson, Moses, & Breton, 2002; Happé, 1995), the influence
of a child’s environment (see Carpendale & Lewis, 2004 for an overview), and the
parameters of false belief understanding tests (Wellman et al., 2001). Yet, in the plethora
of theory of mind research, there are only a handful of longitudinal studies that examine
how the same child’s theory of mind skills change over time. We assume from cross-sectional studies that, for typically developing children, there is a predictable sequence
of transition in children’s insights about the mind. However, little is known about how
this transition unfolds or the processes that drive this developmental shift. In order to be
able to discriminate between competing theoretical accounts of theory of mind
development, it is essential to begin with an empirically informed description of
individual children’s natural transition from consistent failure on tests of false belief
understanding to a level of consistent success. Only once such a description is in place
can we make experimental manipulations to differentiate between different theories.
The present study aims to address this gap in the literature by tracking changes in a group of children’s theory of mind skills during the period of development of false belief
understanding, and identifying changes in the type of information children use to
answer questions regarding false beliefs. Before describing the present study in detail, a
review of research that has provided some insight into the period of transition from no
understanding of false belief to consistent success is presented.
Wellman et al. (2001) carried out a meta-analysis of 178 separate studies that
investigated the development of an understanding of false beliefs. This meta-analysis was
truly insightful, producing a number of important findings regarding the consistency of
false belief understanding with regard to different ages, countries, and tasks. In terms of the process of transition, the results showed that children who were 3 years 5 months
(3;5) and younger performed below chance, suggesting that young children made the
classic false belief error by referring to the true state of reality to predict a character’s
actions. However, children who were 4 years or older performed above chance,
suggesting that they were able to acknowledge that people held false beliefs and
correctly predict an individual’s actions from this knowledge. Between the ages of 3;5
and 4 years, there was a period of ‘confused, random performance’ where children were
performing at chance (Wellman et al., 2001, p. 678). Of course, these age parameters are
fluid rather than rigid, as there is variance in the age of onset and competence in false belief understanding. However, these parameters provide a useful guideline as to when
changes in mental state understanding occur.
A critical question arising from Wellman et al.’s meta-analysis is exactly what is
occurring during this period of at-chance level performance? The results of the meta-analysis
were based on the means of the samples of children studied, so it remains
unclear exactly how individual children progress through this period of transition.
There are a number of potential profiles of performance during this period. Firstly, the
at-chance mean could represent a group of children in which half understand false
beliefs and are systematically correct, but the other half do not understand and make systematic errors. On this account, false belief understanding is stable, with a sudden,
qualitative shift from no understanding to full understanding, with the at-chance level
mean being produced by two extreme levels of performance. Secondly, all children may
progress through a period of development where they begin by passing one test, then
two and so on, until they develop a full understanding of mental states. Such a
progression is essentially stable and continuous, with the chance-level performance,
suggested by Wellman et al.’s meta-analysis, being produced by individual differences in
the initiation of this development. Thirdly, children’s performance on tests of false belief
understanding could be unstable, being ‘random’ and ‘confused’ with children showing
little within or between test consistency. However, if this profile is correct, then we
must bear in mind that during this period of random, confused performance, significant
changes are occurring in children’s cognition that allow them to consistently
understand false beliefs after the age of four years. Finally, there may not be a unique, systematic sequence of performance across children, and so there could be any
combination of the profiles of performance described above.
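The point that an at-chance group mean is compatible with quite different individual profiles can be made concrete with a small numerical sketch. The scores below are invented for illustration only (they are not data from any study cited here), and the function names and the battery size of eight are assumptions made for the example.

```python
# Hypothetical scores (number of tests passed out of 8) for ten children.
# These numbers are invented for illustration; they are not study data.
N_TESTS = 8

# Profile 1: sudden shift. Half the children pass everything, half fail everything.
sudden_shift = [8, 8, 8, 8, 8, 0, 0, 0, 0, 0]

# Profile 2: gradual change. Children are spread across intermediate scores.
gradual_change = [2, 3, 3, 4, 4, 4, 4, 5, 5, 6]


def group_proportion_correct(scores, n_tests=N_TESTS):
    """Proportion of all test trials passed across the whole group."""
    return sum(scores) / (len(scores) * n_tests)


def share_at_extremes(scores, n_tests=N_TESTS):
    """Proportion of children who pass either no tests or all tests."""
    extreme = [s for s in scores if s in (0, n_tests)]
    return len(extreme) / len(scores)


for name, scores in [("sudden shift", sudden_shift),
                     ("gradual change", gradual_change)]:
    print(name, group_proportion_correct(scores), share_at_extremes(scores))
```

Both hypothetical groups pass half of all trials, yet every child in the first group sits at an extreme and no child in the second does. Group means alone therefore cannot distinguish the profiles; individual-level, repeated measurement is needed.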
Wellman and Liu (2004) investigated the transition of children’s theory of mind
during the preschool period, by examining children’s changing competence across a
range of mental states. They used tasks that considered different mental states, but had
been scaled to require similar methodological demands. The sequence showed that
children become aware that two people can have different desires for the same object
before they become aware that people can have different beliefs about the same object.
Such a finding supports the proposal that the understanding of false beliefs is underpinned by the understanding of desires (Wellman, 1990). Furthermore, Wellman
and Liu (2004) showed that understanding diverse beliefs, judging someone else’s
differing beliefs about the same situation when the child does not know which belief is
true and which is false, occurred before false beliefs, where the child does know which
belief is true and which is false. Finally, differentiating between real and apparent
emotion occurs late in the preschool years. Thus, Wellman and Liu suggest that
the transition in mental state understanding involves modification and mediation; this is
the broadening or generalizing of earlier insights to encompass later insights, and the
process of earlier insights enabling the attainment of later insights.
There is a significant gap in our understanding of how individual children progress
from consistently failing false belief understanding tests to consistently passing them.
There is a lack of knowledge relating to the path, rate and variation both within and
across children during this period. Two important steps can be taken to fill this gap.
Firstly, a microgenetic approach can be adopted in which the same children are
repeatedly tested on a battery of theory of mind tests, so that within-participant change
can be examined, over the period of transition. Secondly, explanation tasks can be used
to establish changes in the type of information a child uses to address theory of mind
problems. The majority of theory of mind tasks, especially those that consider false belief understanding, do not require children to produce anything more than a simple
pointing gesture or a single-word answer. Such methodological requirements are
employed to overcome any differences in children’s linguistic competence. However, by
only having simple gestures or single-word answers, we are unable to elaborate on the
type of information that a child is using to reach an answer. For example, a child who
answers an unexpected transfer task incorrectly by pointing to the current location of
an object, rather than the original location, may be using a number of different types of
information. She could be using location-based information, for example, ‘she will go
there because that is where the object is’. She could use desire-based information, for example, ‘she will go there because she wants that object’. Alternatively, she could give
an incorrect answer because she is confused and is simply guessing.
One method of distinguishing between the different types of information that
children use to answer problems of false belief is to ask children to explain why a
character behaves in a certain way in a false belief understanding task (Bartsch &
Wellman, 1989). For example, a child is told a story about a character who places an
object in a particular location before leaving a scene. While the character is absent, the
object gets relocated to a second location. In the prediction version of this task, the
character then returns, and the child is asked where the character will look for the
object. However, in the explanation version of the task, a child is shown the unfolding of
the full sequence of events, including the character going to the incorrect location to
retrieve the desired object. The child is initially asked where the object really is, and
then why the character has gone to the other location. A child who provides an answer that refers to the story character’s false belief can be said to have an understanding of
false beliefs. In turn, a child who is unable to provide this lucid, and appropriately
complex, justification cannot be said to have an understanding of false beliefs. If a child
does not have a conceptual understanding of false beliefs, and so is consistently
incorrect, then the types of explanations the child gives, for example, location-based,
desire-based or no response/don’t know, provide important insight into the
information they are using during the period of transition before their consistent
success on tests of false belief understanding. Analysing changes in the type of
explanations that children provide on an unexpected transfer explanation task, and how the children’s performance changes in relation to other mental state understanding
tasks, provides an important indication of children’s cognition during the period of
transition.
There has been much debate regarding the trajectories of the development of
explanations and predictions of behaviour in relation to false beliefs. Bartsch and
Wellman (1989) found that the ability to explain beliefs develops before the ability to
predict behaviour in tests of false belief understanding. In contrast, Wimmer and
Mayringer (1998) found the opposite sequence of development. Bartsch (1998) argues
that this disparity is due to different testing procedures. Procedures that require children to make spontaneous explanations in false belief understanding tasks usually
show poorer results compared with prediction tests. Yet, when a prompt, for example,
‘what does Bill think?’, is included in the procedure, some studies have shown better
performance in explanation tests compared with prediction tests, although not all
(Foote & Holmes-Lonergan, 2003). An unexpected transfer explanation task was used in
the present study, and the specific procedure used provided a compromise between
these two extremes. After the initial test question was asked, if the participant did not
provide an answer focusing on the story character’s false belief, then the test question
was repeated.
There have been a handful of longitudinal studies investigating theory of mind
development in which participants were tested at two or more time points several months
apart (Astington & Jenkins, 1999; Carlson, Mandell, & Williams, 2004; De Villiers &
Pyers, 2002; Hughes, 1998b). Astington and Jenkins (1999) tested 59 3-year-old children’s
theory of mind and verbal ability at three time points over a 7-month period. They found
that language at an early time point predicted later theory of mind at two of three sets of
time points, whereas the reverse relation never held. Similarly, De Villiers and Pyers (2002)
found that language predicted later theory of mind on two occasions over three time points,
and that the reverse relation held once. Verbal ability plays an important role in understanding the narratives of false belief tests, and in providing children with the
resources to produce an appropriate response. However, examining the relations between
verbal skills and theory of mind over time shows that verbal skills play a fundamental role in
the development of theory of mind skills, as early verbal skills predict later theory of mind
ability. However, these longitudinal studies do not provide an insight into changes in theory
of mind as they are occurring. Instead, a more specific focus needs to be adopted. An
approach that allows change to be examined as it is happening is the microgenetic method.
The microgenetic approach requires that: (a) observations track changes in the skills that
are of interest; (b) the density of observation is high relative to the change in that
competence; and (c) the collected information is subjected to trial-by-trial analysis to infer
the process that gave rise to the change (Siegler & Crowley, 1991). There are two studies
that have shown this intensive data collection in relation to the development of mental state
understanding.
Bartsch and Wellman (1995) present an extensive analysis of the natural language of
10 children studied longitudinally from 1½ to 6 years of age. The database contained
more than 200,000 utterances, with 12,000 of these using terms such as think, know,
and want. Such a database provided an exciting opportunity to undertake a fine-grained
analysis of transition in children’s understanding of the mind. The data showed that
children refer to another person’s beliefs before they grant these beliefs a central role in
explaining mind and action. Before explaining actions in terms of beliefs, children tend
to use desires to explain actions. The evidence to support this claim showed that 3-year-olds
progressed through a period when they often talk about beliefs, and even show
some evidence of understanding false beliefs, but they continue to explain actions with
reference to desires. In keeping with the findings of Wellman et al. (2001), Bartsch and
Wellman (1995) found that only at about 4 years do children begin to explain actions by
referring systematically to beliefs. These findings also concur with the findings of
Wellman and Liu (2004), as children’s theory of mind development shows an
understanding of desires before an understanding of beliefs or false beliefs.
Flynn, O’Malley, and Wood (2004) also adopted a microgenetic approach, but rather
than collecting observational data, 21 children aged between 3;1 and 3;10 were
repeatedly tested on three theory of mind tasks over six testing phases. The profiles
were examined to establish the path and rate of theory of mind development. Few
children showed a stable profile of development, as there were a number of regressions
in their aggregate theory of mind scores over the six phases. However, as only three
theory of mind tests were used, it was not possible to establish the true magnitude of
these regressions. In addition, the relations between the children’s performance on the
three tasks were not stable: within some phases of testing the tasks correlated
significantly with one another, but at other phases of testing they did not. These findings
were in keeping with Wellman et al.’s (2001) description of ‘confused, random
performance’ where children were performing at chance during the period of transition
(Wellman et al., 2001, p. 678).
The unstable performance shown on tests of false belief understanding in Flynn et al.
(2004) questions the reliability of these tests. In order to consider tests of false belief
understanding as reliable, we need to be clear that any instability in theory of mind
performance is caused by instability in the construct or associated skill, rather than
unreliability of the test being administered to evaluate the construct. Two studies have
directly examined the test–retest reliability of false belief understanding tasks. Mayes,
Klin, Tercyak, Cicchetti, and Cohen (1996) and Hughes et al. (2000) found that only
between 5% and 12% of test–retest trials show regressions in performance.
Furthermore, Wellman et al. (2001) argue that false belief understanding tests are
reliable. Their meta-analysis showed that overall, first-order false belief understanding
tests behaved reliably at 3;5 and below, as children tended to make the classic reality
error, and at 4 years and above, as children consistently showed an understanding of
false beliefs. Therefore, the random, confused performance that occurs between the
ages of 3;5 and 4 years is not due to a lack of test reliability.
The present study followed the design of Flynn et al. (2004) with a new sample of
children, and incorporated some significant changes. Firstly, seven theory of mind tests
were included rather than three, thus producing a better indication of the magnitude of
any potential between-phase fluctuations. Because seven theory of mind tests were
used, a criterion was set regarding the size of the change. Transitions of 2 or fewer
points were considered small, transitions of 3 or 4 points were considered moderate, and transitions of 5 or more points were considered large. Secondly, Flynn
et al. (2004) looked at the profiles of performance during the emergence of an
understanding of false beliefs, whereas the present study recruited participants from
across the period of transition to discover if and, if so, at which point theory of mind
skills began to stabilize. The study design allowed both within- and across-participant
comparisons to be made to provide an indication of the process of transition from
consistent failure to consistent success on tests of false belief understanding. Thirdly, an
unexpected transfer explanation task was included to discover at which point in development children could explain behaviour with reference to false beliefs, and to
allow an analysis of the types of explanation children gave for a character’s behaviour.
The final change involved the inclusion of a control group. The rehearsal provided by
completing the battery of tasks at every testing phase could potentially have affected a
child’s performance on the tests. The inclusion of this group meant that the
experimental group’s performance at the last phase of testing could be examined for
practice effects.
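The size criterion for between-phase transitions described above can be sketched as a simple classification rule. This is an illustrative sketch rather than the analysis code used in the study: the function names, the labels, the treatment of a zero change as ‘none’, and the example scores are all assumptions made here.

```python
def classify_transition(previous_score, current_score):
    """Return (direction, size) for the change between two consecutive phases.

    Size bands follow the criterion in the text: 2 or fewer points is small,
    3 or 4 points is moderate, 5 or more points is large.
    """
    change = current_score - previous_score
    magnitude = abs(change)

    if change == 0:
        # Assumption: the text does not name the zero-change case.
        return ("none", "none")

    direction = "improvement" if change > 0 else "regression"

    if magnitude <= 2:
        size = "small"
    elif magnitude <= 4:
        size = "moderate"
    else:
        size = "large"
    return (direction, size)


def classify_profile(phase_scores):
    """Classify every transition between consecutive testing phases."""
    return [classify_transition(a, b)
            for a, b in zip(phase_scores, phase_scores[1:])]


# Hypothetical aggregate scores for one child over six phases (invented).
example_scores = [1, 2, 2, 5, 4, 7]
for label in classify_profile(example_scores):
    print(label)
```

Applied to a child's six aggregate scores, the rule yields five labelled transitions, which separates gradual profiles (mostly small improvements) from sudden shifts (a large improvement) and flags the magnitude of any regression.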
The aims of this study
The current study presents a unique piece of research that addresses a gap in our
understanding of theory of mind development by taking a detailed look at children in
this domain over a narrow time-frame. A microgenetic approach was employed in which
children were tested on a battery of mental state understanding tests on several occasions
over a period of 5 months. If a standard microgenetic study had been carried out, then each participant would have been intensively tested for at least a year, from a period of
no understanding to full understanding. This would have led to problems of extensive
practice, boredom, and retaining participants. Therefore, in the present study,
participants were recruited at all stages of development, and the analysis examined
development both within and across individual children. The study concentrated on
establishing what happens during the transition from consistently failing tests of false
belief understanding to consistently passing the tests. The tests selected to be included
in the battery were not all false belief understanding tasks, but they were all theory of mind tasks that showed a change in competence from 3 to 5 years of age. Furthermore, it
must be acknowledged that consistent success in the present study did not signify a full
understanding of false beliefs, as research has shown later developments in the
understanding of false beliefs, including second-order theory of mind (Perner &
Wimmer, 1985) and the implications of beliefs for emotions (Ruffman & Keenan, 1996).
Before the analysis could address the three main research questions, a number of
checks needed to be made relating to (i) the reliability of the coding and the ease of the
different versions of each test, (ii) whether the sample’s theory of mind was actually changing during the period of study, (iii) the inter-relations of the theory of mind tests,
(iv) practice effects, (v) the sequence of development, and (vi) the relations between
theory of mind and verbal ability. Following these initial analyses, the study’s three main
research questions could be investigated. Firstly, is the development of a theory of mind
gradual, indicated by small improvements over time, or are there sudden, larger changes
in performance? Secondly, are there regressions in theory of mind scores and, if so, what
is their magnitude? Finally, how does the information children use to provide
explanations for people’s behaviour change over time?
Method
Design
A microgenetic design was used to follow the performance of a group of children on a
battery of tests, which was administered every four weeks for six phases of testing.
These results were part of a larger study; however, the current analysis concentrates on
the development of theory of mind and verbal ability.
Participants
Participants were 42 children (23 girls and 19 boys). At the first phase of testing, the
children were aged between 3;1 and 4;3 with a mean age of 3;10. After the first phase of
testing, each child was placed into one of two groups, a control group (which only
completed the battery at Phases 1 and 6), and an experimental group (which completed
all six testing phases). The control group was included to take into account practice
effects. The children’s allocation to each group was carried out so that the control and
experimental groups did not differ from one another in terms of age, gender, and performance across the different tasks. However, as the experimental group was the
group of interest and would be more informative in terms of the study’s aims, more
children were allocated to the experimental group (N = 28) than to the control group
(N = 14). Both the control and experimental groups had a mean age at Phase 1 of
3;10, both with a standard deviation of 4 months. The ratios of girls to boys in the control
and experimental groups were 8:6 and 15:13, respectively. Table 4 provides further
information about the two groups’ performances across the tasks, showing that there
was no significant difference between the groups’ performances on any of the tests at Phase 1.
The battery of tasks
There were 12 tasks in the battery: eight theory of mind tasks, three inhibition tasks, and
a verbal ability test. For the purposes of this analysis, only the theory of mind and verbal
ability tests will be described in detail because only data from these tests are presented
in the results.
Theory of mind
The eight theory of mind tasks were: a prediction and an explanation version of the
unexpected transfer task; a deceptive box test, which assessed a child’s understanding
of his/her own previous false belief and a naïve other’s false belief; an appearance-reality task; a penny-hiding task; and two tests of false belief understanding in which the
location of the desired object was explicitly stated. For all of these tasks, except the
penny-hiding task, there were six different versions that were counterbalanced across
the six phases. This meant a child could not be successful simply because s/he had seen
the object/contents/story during a previous testing phase.
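One simple way to counterbalance six versions across six phases is a cyclic rotation, in which each child starts on a different version and moves to the next at each phase. The sketch below is an assumption about how such a schedule could be built; the text does not specify the scheme actually used, and the function names and zero-based indices are hypothetical.

```python
N_VERSIONS = 6
N_PHASES = 6


def version_for(child_index, phase_index, n_versions=N_VERSIONS):
    """Version (0-5) a child receives at a given phase under cyclic rotation.

    Assumption: a rotation scheme; the study's actual scheme is not stated.
    """
    return (child_index + phase_index) % n_versions


def schedule(child_index):
    """Ordered list of versions one child receives across the six phases."""
    return [version_for(child_index, phase) for phase in range(N_PHASES)]


# Print the schedule for the first three (hypothetical) children.
for child in range(3):
    print(child, schedule(child))
```

Under this rotation every child meets each version exactly once, so no item can be passed merely because it was seen at an earlier phase, and within any one phase the versions are spread evenly across children.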
Unexpected transfer tasks. Each child received two unexpected transfer tasks, a
prediction version and an explanation version, at each phase. There were 12 versions of
the tasks, six for the prediction task and six for the explanation tasks. Each version used
the same script but had minor alterations including different characters and different
transferred objects. For example, in one story, a ball was moved by Tom and looked for
by Rosie and in another, a cake was moved by Emma and looked for by Michael. The scripts were adapted from the original Wimmer and Perner (1983) study and were
narrated by the experimenter with the aid of six pictures. Initially, a child was
introduced to two story characters (i.e. Character A and Character B) who were playing
in a room. The story progressed with Character A leaving the room after having placed
an object in a particular location. While Character A was out of the room, the object was
moved to a second location by Character B. Character A then returned and the child was
told that Character A wanted to find the object.
At this point, the prediction task and the explanation task differed. In the prediction task, the test question was asked ‘Where will (insert the name of Character A) look for
the (insert the name of the object) first?’ This was taken from the work of Siegal and
Beattie (1991). Each child was then asked a reality check question: ‘Where is the (insert
the name of the object) really?’ This question was included to make sure a child knew
the true location of the object. Children were scored as correct and given a point if they
answered both the test question and the reality question correctly.
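The scoring rule for the prediction task can be written out as a short sketch. The function and parameter names, and the example locations, are hypothetical; the logic follows the rule above, in which a point is given only when both the test question and the reality check question are answered correctly.

```python
def score_prediction_task(test_answer, reality_answer,
                          original_location, current_location):
    """Score one unexpected transfer prediction trial as 1 or 0.

    The test question is correct when the child names the location where
    Character A left the object (where the character falsely believes it is).
    The reality question is correct when the child names the object's true,
    current location. Both must be correct to earn the point.
    """
    test_correct = test_answer == original_location
    reality_correct = reality_answer == current_location
    return 1 if (test_correct and reality_correct) else 0


# Hypothetical trial: the object was left in the basket, then moved to the box.
print(score_prediction_task("basket", "box", "basket", "box"))
print(score_prediction_task("box", "box", "basket", "box"))
```

The second call illustrates the classic reality error: the child knows where the object really is but predicts that the character will look there too, and so earns no point.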
In the explanation task, adapted from Hughes (1998a) and Bartsch and Wellman
(1989), the child was told that Character A had returned to look for the object and asked the reality check question ‘Where is the (insert name of object) now?’ With the aid of
another picture, it was then explained to the child that in fact Character A had looked for
the object in its original location. The child was then asked the test question ‘Why did
(insert name of Character A) look for the (insert name of object) there?’ Children were
coded as correct if they could provide an explanation for Character A’s behaviour that
referred to the character’s false belief. If a child was not correct, or did not respond after
the test question was asked, then the question was repeated. No other prompts were
provided.
In both tasks memory questions were asked throughout the story, for example,
‘Where is the ball now?’ and ‘Where has Rosie gone?’ If a child was incorrect when
answering one of these questions, then that section of the story was repeated and the
question asked again. These were included to make sure that the child had paid
attention to the whole story.
Deceptive box test: Own previous and naïve other’s false beliefs. There were six versions of the deceptive box test and a child received one of the versions at each phase
of testing. The procedure was taken from the original Hogrefe, Wimmer, and Perner
(1986) study. All the versions were presented using the same script but contained
different boxes and contents. The six different versions were: a Smarties tube containing
pencils; an egg box containing a spoon; a cornflakes box containing a bag; a biscuit box
containing a lemon; a Maltesers box containing marbles, and a crisp packet containing a
toy elephant. No child had any problems stating the prototypical contents of any of
these versions.
Each version had two test questions; one evaluated a child’s understanding of another
person’s false belief and the other evaluated the child’s understanding of his/her own,
previous false belief. Initially, a child was shown a box that normally held prototypical
contents (e.g. a Smarties tube) and was asked what s/he thought was inside. All the
children responded with the prototypical contents (e.g. Smarties). The box was opened
and the child was then shown that the box actually contained something novel (e.g.
pencils). The child was then asked what was really inside the box. Then the test
question concerning the child’s previous false belief was asked ‘When you first saw this
tube and it was all shut up like this, what did you think was inside?’ After a child had
responded to this test question, s/he was introduced to a puppet called ‘Sooty’ who had been ‘asleep’ in the toy box. The child was asked ‘When Sooty wakes up and sees this
tube all shut up what will he think is inside it?’ For each test question, a child was given 1
point if s/he was able to acknowledge her/his own, previous false belief or the false
belief of the naïve individual and state that they believed that the box held the
prototypical contents (e.g. Smarties). S/he was coded as incorrect for each test question
if s/he stated that s/he had always thought, or that the naïve individual would believe,
that the box contained the true contents (e.g. pencils).
Appearance-reality task. The procedure for this task was taken from the original
Flavell, Flavell, and Green (1983) study and was presented in exactly the same way for all
six objects. The six objects were: a pen that looked like a twig; a stone that looked like
an egg; a pencil sharpener that looked like a toy car; a sponge that looked like a rock; a hair band that looked like a toy duck; and a candle that looked like an apple. Initially, a
child was shown the object and asked ‘What does this look like?’ After the child had
answered (all children gave the correct answer to this question), s/he was allowed to
touch the object and the experimenter explained that although the object looked like a
specific object, it was in fact something else. The child was then asked ‘So what does it
look like?’ (appearance question) and then ‘What is this really?’ (reality question).
Children were scored as correct and given a point if they answered both questions
correctly.
Penny hiding task. This task assessed whether a child appreciated that, to hide a penny, one has
to keep the penny concealed and not provide any clues to its whereabouts (Hughes,
1998a). The experimenter introduced this task by showing a penny to a child and
saying, ‘I’m going to put my hands behind my back and hide this penny. Now I’m going
to bring my hands out and you have to guess which hand it is in’. After hiding the penny in one of her hands, the experimenter brought her two fists from behind her back and
placed them in front of the child and said ‘So which hand do you think it is in?’ After the
child had guessed, the experimenter showed the child if s/he was correct. After three
successive guesses, the experimenter said ‘Now, it’s your turn, you hide the penny
behind your back and I’ll guess which hand it’s in’. The child was given the penny and
s/he hid the penny on three separate trials with the experimenter guessing where the
penny was at each trial. A child was coded as successful on each trial if s/he: (i)
concealed the coin appropriately, (ii) produced both hands for guessing, and (iii) gave no verbal clues as to the location of the coin. In keeping with Hughes (1998a), children
were given a point on this task if they were successful on two out of three penny-hiding
trials.
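The scoring rule just described can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text: the class and function names are hypothetical, and "two out of three" is read here as "at least two".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HidingTrial:
    """One hiding trial, coded on the three criteria given in the text."""
    concealed: bool       # (i) coin concealed appropriately
    both_hands: bool      # (ii) both hands produced for guessing
    no_verbal_clue: bool  # (iii) no verbal clue to the coin's location

    def successful(self) -> bool:
        # A trial counts as successful only if all three criteria are met.
        return self.concealed and self.both_hands and self.no_verbal_clue


def penny_hiding_point(trials: List[HidingTrial]) -> int:
    """Return 1 if at least two of the three trials were successful, else 0."""
    if len(trials) != 3:
        raise ValueError("expected exactly three hiding trials")
    return 1 if sum(t.successful() for t in trials) >= 2 else 0
```

For example, a child who gives a verbal clue on one trial but hides the coin correctly on the other two would still receive the point.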
False belief explicit location task. There were 12 versions of this task and a child
received two of these versions at each testing phase. The procedure was taken from the
original Wellman and Bartsch (1988) study. A child was shown a picture of a character in
a particular location (e.g. a garden) and told that the character wanted to find an object
(e.g. a bike). It was explained that the object was in a specific location but the character
believed that the object was in a different location. For example, ‘Billy is looking for his
bike. The bike is in the garage (experimenter points to the garage on the picture) but
Billy thinks his bike is in the shed (experimenter points to the shed on the picture)’. The
child was then asked ‘Where will Billy look for his bike first?’ (false belief question) and ‘Where is the bike really?’ (reality question). This is similar to the unexpected transfer
prediction task but in this task the real location of the object and the character’s beliefs
are explicitly stated to the child. Each child was coded as successful if s/he answered
both questions correctly.
Verbal ability
The children’s verbal ability was tested using a measure of receptive vocabulary, the
British Picture Vocabulary Scale (BPVS; Dunn, Dunn, Whetton, & Pintilie, 1997). This
was administered during the first and last phases of testing.
Procedure
Each child was taken into a quiet area of the nursery and after an initial introductory
period, the testing phase began. The order of presentation of the tasks was always kept
constant (Luria hand-game, penny-hiding task, unexpected transfer prediction task,
deceptive box test, appearance-reality task, false belief explicit location tasks, bear-dog
task, unexpected transfer explanation task, and the Luria lights task1). On both the first
and the last phases of testing, the BPVS was administered after all the other tasks.
Although presenting the tasks in a fixed order may have produced order effects, it was
more important that a fixed order allowed direct comparisons of the children’s abilities to be made over time and also between individual children. It also allowed certain tasks
to always be administered before others. For example, the prediction version of the
unexpected transfer task was administered before the explanation version. This reduced
the likelihood of children answering the prediction task correctly because they had seen
the events unfold in the explanation version. The whole battery took approximately 30
minutes.
Because the testing took place over a long period of time, it was not possible to have
all the children complete all of the test phases. Children were sometimes on holiday or absent through illness. If a child missed a test phase and it was not possible to collect his
or her data over the next 2 days, then s/he was discounted from that test phase but was
included in the next. At the end of testing, 19 children had completed all six test phases,
three children missed the fourth, two missed the fifth, and two missed the sixth. Two
children missed more than one phase: one missed Phases 5 and 6, and one missed
Phases 3 and 6. Two children in the control group missed Phase 6.
Results
As explained in the Aims, before addressing the main experimental questions regarding rate, stability and changes in the type of information used in tests of theory of mind,
some initial analyses needed to be carried out. Firstly, in Section (i), inter-rater reliability
1 The Luria hand-game, bear-dog task, and Luria lights task were three measures of inhibitory control. These tasks will not be included in the analyses.
is assessed, and the different versions of each test are examined to establish whether all
versions were of the same level of difficulty. In Section (ii), the experimental group’s
scores are examined for improvements on the theory of mind tests, as the children
needed to show some transition for the analysis to continue. Section (iii) presents an
exploratory principal component analysis, which examines the children’s scores to see
if there was consistency between the different tests. Practice effect investigations are presented in Section (iv), comparing the performance of the control and experimental
groups. This is followed by an examination of the sequence of development of the
different tests in Section (v). Section (vi) presents the associations between children’s
theory of mind and verbal ability. In Section (vii), an aggregate theory of mind
score is produced for each child at each phase by adding his/her scores on seven of
the theory of mind tasks. The penny-hiding task was not included in this aggregate
score, as the principal component analysis showed that it loaded on to a different factor
to all the other measures. An illustration of the distribution of the aggregate scores according to age is presented in Table 1. These aggregate scores allow the main
research questions to be addressed, looking at the magnitude and rate of
improvement, and at the existence and magnitude of regressions. Finally, Section
(viii) investigates how the information children use to explain behaviour in terms of
false beliefs changes over time.
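As a minimal sketch of how such an aggregate score could be computed, the following Python function sums a child's pass/fail scores on the seven tasks and ignores the penny-hiding task. The dictionary keys are hypothetical labels chosen for illustration; only the choice of tasks comes from the text.

```python
# Hypothetical labels for the seven tasks that enter the aggregate score.
AGGREGATE_TASKS = (
    "unexpected_transfer_prediction",
    "unexpected_transfer_explanation",
    "deceptive_box_own",
    "deceptive_box_other",
    "appearance_reality",
    "explicit_location_q1",
    "explicit_location_q2",
)


def aggregate_score(task_scores: dict) -> int:
    """Sum the 0/1 scores on the seven aggregated tasks (range 0 to 7).

    Any other entry, such as a penny-hiding score, is ignored.
    """
    missing = [task for task in AGGREGATE_TASKS if task not in task_scores]
    if missing:
        raise KeyError(f"missing task scores: {missing}")
    return sum(task_scores[task] for task in AGGREGATE_TASKS)
```

A child who passes every task, including penny hiding, therefore receives the maximum aggregate score of 7 rather than 8.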
Section (i): Inter-rater reliability and comparison of the different versions
We selected 100 testing sessions at random (55% of the total), which were coded by a
second experimenter to establish the level of reliability of the original coding. The level of agreement for each of the theory of mind tests was never below 97%. Cohen’s kappa
was 0.89 for the coding of the explanations in the unexpected transfer explanation task.
Cases of disagreement were rectified by discussion based on the original videotaped
footage.
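For readers unfamiliar with the statistic, Cohen’s kappa corrects the observed agreement between two coders for the agreement expected by chance. The sketch below computes it from two lists of category labels. The function name and the example labels are illustrative assumptions, not the coding data from this study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each labelled the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

With hypothetical codes `["thinking", "thinking", "situational", "situational"]` from one coder and `["thinking", "situational", "situational", "situational"]` from the other, observed agreement is .75, chance agreement is .50, and kappa is .50.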
A set of one-way ANOVAs was carried out to assess whether any version of the same
test was easier or harder than any other version. When the children’s performances on
the different versions were collapsed across the phases, no differences were found
(unexpected transfer prediction task, F(5, 151) = 0.23, ns; deceptive box, own
previous false belief, F(5, 151) = 0.86, ns; deceptive box, naïve other’s false belief,
F(5, 151) = 1.44, ns; appearance-reality task, F(5, 151) = 1.74, ns; unexpected
transfer explanation task, F(5, 151) = 1.02, ns; false belief explicit location task,
F(11, 302) = 0.67, ns).
Section (ii): Developmental change
Sign tests were performed on the experimental group’s scores for all the theory of mind
tasks. Each child’s performance on his/her first phase of testing was compared with
his/her performance on the last phase of testing that s/he undertook. Using the
children’s first and last scores, rather than relying on the scores from Phases 1 and 6,
allowed all the children’s scores to be included in this analysis. It was predicted that there would be an improvement on all the tasks. Table 2 shows the results of these sign
tests, as well as the number of children who improved, regressed or showed no change
in performance. All the tasks showed a significant improvement in performance across
the period of study, except the appearance-reality task and the second false belief
explicit location question.
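A sign test of this kind can be sketched as an exact binomial test on the numbers of children who improved and regressed, with unchanged children dropped. The text does not state whether an exact or a two-sided test was used, so the two-sided exact version below is an assumption; with the counts reported in Table 2 it gives p-values that are in line with the significance levels reported there.

```python
from math import comb


def sign_test_two_sided(improved: int, regressed: int) -> float:
    """Exact two-sided sign test p-value; unchanged cases are excluded."""
    n = improved + regressed
    if n == 0:
        return 1.0
    k = max(improved, regressed)
    # Probability of k or more changes in one direction under p = 0.5.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double the tail for a two-sided test, capped at 1.
    return min(1.0, 2 * tail)


# Counts taken from Table 2 (improved, regressed).
p_explanation = sign_test_two_sided(10, 0)        # unexpected transfer explanation
p_appearance_reality = sign_test_two_sided(8, 3)  # appearance-reality
```

Here `p_explanation` equals 2/1024 (about .002), and `p_appearance_reality` is about .23, which does not reach significance.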
Table 1. The children’s aggregate theory of mind scores²

[In the original layout each child’s scores are positioned under age columns running from 37 to 56 months. Here each child’s scores are listed in testing order, Phase 1 to Phase 6; their alignment with the age columns is not reproduced.]

Child | Phase 1 | Phase 2 | Phase 3 | Phase 4 | Phase 5 | Phase 6
A | 0 | 0 | 0 | 1 | – | –
B | 0 | 0 | – | 2 | 1 | –
C | 2 | 2 | 0 | – | 3 | 3
D | 1 | 2 | 2 | 3 | 4* | 4*
E | 2 | 4 | 6 | 5 | 4 | –
F | 1 | 2 | 2 | 3 | 6* | 5
G | 4 | 4 | 3 | 4 | – | 3
H | 2 | 2 | 2 | 4 | 3 | 3
I | 2 | 0 | 1 | 0 | 0 | 0
J | 4 | 5 | 5 | 6 | 6 | –
K | 1 | 6 | 5 | – | 6 | 6*
L | 3 | 6 | 5 | 5 | 6 | 6
M | 3* | 5 | 7* | 6* | 6 | 7*
N | 3 | 2 | 2 | 5 | 5 | 7*
O | 3 | 6* | 5 | 4* | 3* | 6*
P | 1 | 0 | 3 | 4 | 4* | 2
Q | 7* | 6 | 7* | 7* | 7* | 7*
R | 6* | 6 | 7* | 7* | 7* | 7*
S | 0 | 4* | 7* | 7* | 7* | 7*
T | 0 | 3* | 3* | 3* | 4* | 3*
U | 5 | 5 | 5 | 5* | 5* | 7*
V | 3 | 5* | 4* | – | 4* | 5*
W | 6 | 6 | 6 | – | 7* | 6*
X | 0 | 1 | 1 | 0 | 1 | 1
Y | 4* | 7* | 5 | 4* | 6* | 4*
Z | 5* | 6* | 7* | 6* | 6* | 6*
A1 | 1 | 3 | 3 | 3* | 4* | 3*
B1 | 6* | 7 | 6 | 7* | 7* | 7*

Column summaries by age:

Age (months) | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56
(i) Mean | 0 | 0 | 1.3 | 2.3 | 2.5 | 3.3 | 3.6 | 3.4 | 2.5 | 2.3 | 2.6 | 3.9 | 3.5 | 4.5 | 4.9 | 5.1 | 5.0 | 5.1 | 4.7 | 5.3
(ii) Median | 0 | 0 | 1.5 | 2.0 | 2.0 | 3.0 | 4.0 | 3.0 | 2.5 | 2.0 | 3.0 | 3.0 | 5.0 | 5.0 | 5.5 | 6.0 | 6.0 | 6.0 | 5.0 | 6.0
(iii) Prediction | 0 | 0 | 0 | 0 | | | | | | | | | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7

Note. The possible range of scores was 0–7. – indicates that a child did not participate at that phase. The scores with asterisks represent phases in which a child was successful on the unexpected transfer explanation task. The bottom lines in the table show: (i) the mean score for that column; (ii) the median score for that column; and (iii) the predictions based on Wellman et al. (2001)’s meta-analysis.

² Each child is placed in the table according to his/her age at the first phase of testing. The locations of the scores following the first score are only approximate in so far as their relation to the age axis, as the phases of testing were set at 4 weeks apart as opposed to calendar months apart.
Section (iii): Inter-relations of the theory of mind tasks
An exploratory principal component analysis was undertaken to discover whether all
the tests in the theory of mind battery were loading on to the same construct. The task scores were collapsed across all the phases of testing for the experimental group. As
Table 3 shows, two components with eigenvalues greater than 1.0 accounted for 55% of
the total variance. The KMO test of sampling adequacy was met (KMO = 0.79), and
Bartlett’s test of sphericity yielded a χ² of 313 (df = 28, p < .001), indicating the factor model
was appropriate. Table 3 presents the loadings of the test items on to the different
components. Component 1 accounted for 39% of the variance and was loaded on by
seven of the eight tests (not the penny-hiding task). Component 2 accounted for a
further 16% of the variance and was loaded on by the penny-hiding task only.
This analysis was not inflated by the inclusion of both false belief explicit location
questions. When this analysis was repeated twice, but with only one of the false belief
explicit location questions entered each time, the results were the same as the initial
analysis, including the same amount of variance being accounted for by both
components. Therefore, all further analysis involving the aggregate theory of mind score
involved all the tests, except the penny-hiding test, as this did not load on to the same
component.
Table 2. The results of sign tests and the development of children in the experimental group’s
performances from their first to their last phase of testing

Theory of mind task (N = 28) | Improved | Regressed | Unchanged | Sign test result
Aggregate theory of mind score | 23 | 2 | 3 | ***
Unexpected transfer prediction task | 12 | 2 | 14 | *
Unexpected transfer explanation task | 10 | 0 | 18 | **
Deceptive box: Another person’s false belief | 10 | 2 | 16 | *
Deceptive box: Own, previous false belief | 9 | 1 | 18 | *
Penny-hiding task | 9 | 1 | 18 | *
Appearance-reality task | 8 | 3 | 17 | ns
False belief explicit location tasks: Question 1 | 8 | 1 | 19 | *
False belief explicit location tasks: Question 2 | 8 | 3 | 17 | ns

*p < .05, **p < .01, ***p < .001, ns indicates a non-significant result.
Table 3. Loading scores for the different theory of mind tests

Tests | Component 1 | Component 2
Unexpected transfer prediction task | 0.796 | −0.205
False belief explicit location task: Question 1 | 0.757 | −0.464
False belief explicit location task: Question 2 | 0.738 | −0.463
Deceptive box: Another person’s false belief | 0.670 | 0.125
Deceptive box: Own, previous false belief | 0.586 | 0.397
Unexpected transfer explanation task | 0.582 | 0.337
Appearance-reality task | 0.506 | 0.367
Penny-hiding task | 0.338 | 0.622
Section (iv): Practice effects
Repeated measures multivariate analyses of variance were undertaken to compare
changes in the experimental and control groups’ aggregate theory of mind scores and
the individual task scores from Phase 1 to 6. Multivariate analyses were undertaken
because the assumption of sphericity was not met. Changes in the experimental group’s
aggregate theory of mind scores (mean change in score = 1.92, SD = 1.90) were significantly greater than changes in the control group’s aggregate scores (mean change
in score = 0.71, SD = 1.88; F(1, 34) = 5.24, p < .05). There did appear to be sizable
differences on the two deceptive box test questions but these were not statistically
significant (deceptive box: own, previous false belief, F(1, 34) = 0.84, p = .37; naïve
other’s false belief, F(1, 35) = 2.69, p = .11).
Section (v): Sequence of development
The analysis presented in Table 4 also allowed an examination of the level of difficulty of
the different tests. Each test’s difficulty is reflected in the number of children who
passed the test during a test phase; that is, easier tests are passed by more children than
harder tests. From the experimental group’s performance at Phase 1, the order of ease,
from easiest to most difficult, was: appearance-reality task, penny-hiding task, false belief
explicit location tasks, unexpected transfer prediction tasks, the deceptive box test
questions, and the unexpected transfer explanation task. The pass rates for the control
group at Phase 1 produced an almost identical sequence of performance, with only the deceptive box test question regarding another person’s false belief being out of
sequence. Furthermore, this sequence of performance was almost identical when all of
the tests were collapsed across all of the phases for the experimental group. The order
of success, including pass rates, across all the phases was: penny-hiding task (75%),
appearance-reality task (62%), unexpected transfer prediction task (62%), false belief
explicit location tasks (both questions 61%), deceptive box: another person’s false belief
(56%), deceptive box: own, previous false belief (51%), unexpected transfer explanation
task (40%). These results suggest that the tasks’ difficulty is relatively consistent across different groups and repeated testing. However, when the individual
profiles of performance for children who scored between 1 and 6 on the aggregate
theory of mind scores were examined, only 33 out of the 122 profiles (27%) fitted this
sequence of development.
Section (vi): Verbal ability
The experimental group’s standardized BPVS scores at Phase 1 ranged from 82 to 119,
with a mean of 96.70 and a standard deviation of 10.51. The experimental group’s
aggregate theory of mind scores correlated significantly with their BPVS score at Phase
1, r(27) = .31, p < .05; Phase 2, r(27) = .36, p < .05; Phase 3, r(27) = .33, p < .05;
Phase 4, r(23) = .41, p < .05; Phase 5, r(25) = .45, p < .05; and Phase 6, r(24) = .46,
p < .05. A Pearson correlation was carried out to compare developmental changes in
the experimental group’s aggregate theory of mind scores (score at Phase 6 minus score
at Phase 1) and BPVS scores (raw score at Phase 6 minus raw score at Phase 1). In contrast with correlations at specific phases, this correlation was not significant,
r(24) = .25, p > .05. In an attempt to establish directionality between the children’s
BPVS scores and aggregate theory of mind scores, cross-lagged correlations were
performed across Phases 1 and 6 for the experimental group. The aggregate theory of
mind scores and the raw BPVS scores were both stable over time, as these
Table 4. The results for the control and experimental groups at Phases 1 and 6

Measure | Phase 1: Control group (N = 14) | Phase 1: Experimental group (N = 28) | Phase 1 t test results | Phase 6: Control group (N = 12) | Phase 6: Experimental group (N = 24) | Phase 1 to Phase 6 repeated measures MANOVA results
Mean age (months) | 46.43 (3.95) | 46.07 (4.41) | ns | | |
Mean BPVS score | 92.78 (12.26) | 96.70 (10.51) | ns | 94.33 (13.18) | 98.91 (11.65) | ns
Gender | 8 females, 6 males | 15 females, 13 males | ns | 7 females, 5 males | 13 females, 11 males | ns
Theory of mind tasks | | | | | |
Mean aggregate theory of mind score | 3.21 (2.08) | 2.64 (2.13) | ns | 3.92 (2.87) | 4.71 (2.14) | *
% Pass rate for the unexpected transfer prediction task | 36 | 36 | ns | 58 | 75 | ns
% Pass rate for the unexpected transfer explanation task | 29 | 22 | ns | 58 | 63 | ns
% Pass rate for the deceptive box: Own, previous false belief | 29 | 32 | ns | 42 | 67 | ns
% Pass rate for the deceptive box: Another person’s false belief | 43 | 32 | ns | 33 | 67 | ns
% Pass rate for the appearance-reality task | 64 | 50 | ns | 67 | 67 | ns
% Pass rate for the penny-hiding task | 64 | 46 | ns | 83 | 79 | ns
% Pass rate for the false belief explicit location task: Question 1 | 57 | 43 | ns | 67 | 71 | ns
% Pass rate for the false belief explicit location task: Question 2 | 64 | 46 | ns | 67 | 63 | ns

Note. ns indicates a non-significant result, *p < .05, standard deviations are in parentheses.
scores correlated significantly across Phases 1 and 6 (r(24) = .59, p < .01 and r(24) = .57,
p < .01, respectively). The aggregate theory of mind score at Phase 1 did not correlate
significantly with the BPVS scores at Phase 6 (r(24) = .14, ns); however, the correlation
between the BPVS scores at Phase 1 and the aggregate theory of mind scores at Phase 6
approached significance (r(24) = .39, p = .06). When the control group was included
in this analysis, although the correlation between the BPVS score at Phase 6 and the aggregate theory of mind score at Phase 1 remained non-significant (r(37) = .20,
ns), the correlation between the BPVS score at Phase 1 and the aggregate theory of mind
score at Phase 6 reached significance, r(36) = .55, p < .001.
Section (vii): Aggregate theory of mind scores
Binomial tests were carried out to test whether the experimental group’s aggregate
theory of mind scores deviated from chance in relation to the three age blocks indicated in Table 1, derived from Wellman et al.’s (2001) meta-analysis predictions. A cut-off score
of three correct answers (representing success on half of the tests on which a child had
two potential answers) was used. The scores produced by children who were younger
than 3;5 were below chance (binomial test, p < .01), scores produced by children over
4 years were above chance (binomial test, p < .001) and scores produced by children
between 3;5 and 4 years were not significantly different from chance.
When the children’s phase-to-phase aggregate theory of mind scores were examined,
it was discovered that 37% of the transitions were improvements (overall, 50% of the improvements were by 1 point, 26% by 2 points, 20% by 3 points, 2% by 4 points and 2%
by 5 points). In contrast, 24% of the phase-to-phase transitions were regressions (84% by
1 point, 16% by 2 points). There were no regressions of more than two points. Thirty-nine
percent of the phase-to-phase transitions showed the same score at both phases;
22% of these were at ceiling.
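The phase-to-phase classification can be illustrated with a short Python sketch. The function names are hypothetical, and the treatment of missed phases (any transition into or out of a missed phase is skipped) is an assumption, since the text does not say how such transitions were handled.

```python
from typing import Dict, Optional, Sequence


def classify_transitions(scores: Sequence[Optional[int]]) -> Dict[str, int]:
    """Count improvements, regressions and unchanged phase-to-phase transitions.

    A missed phase is given as None; transitions involving it are skipped
    (an assumption made for this sketch).
    """
    counts = {"improved": 0, "regressed": 0, "unchanged": 0}
    for before, after in zip(scores, scores[1:]):
        if before is None or after is None:
            continue
        if after > before:
            counts["improved"] += 1
        elif after < before:
            counts["regressed"] += 1
        else:
            counts["unchanged"] += 1
    return counts


def largest_step(scores: Sequence[Optional[int]]) -> int:
    """Size of the largest change between two consecutive tested phases."""
    steps = [abs(b - a) for a, b in zip(scores, scores[1:])
             if a is not None and b is not None]
    return max(steps, default=0)
```

For an illustrative profile of 1, 2, 2, 3, 4, 4 across the six phases, this gives three improvements, no regressions and two unchanged transitions, with a largest step of 1 point.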
Section (viii): Unexpected transfer explanation task
Particular attention was focused on the point during the development of theory of mind
at which children passed the unexpected transfer explanation task. Table 5 shows the
percentage of each explanation type for each aggregate theory of mind score (excluding
the unexpected transfer explanation task). As the control and experimental groups’
performance did not differ on this task, the table incorporates both groups’ answers.
Children’s responses fell into five categories: (i) no response or ‘don’t know’; (ii) a
situational explanation, describing some aspect of the current situation or prior event but with no reference to the false belief or no appreciation of the conflicting events
(e.g. ‘it’s in the cupboard’ or ‘cos it’s her teddy and he is hungry’); (iii) a wanting
explanation, describing the character’s desire for the object (e.g. ‘he wants it’ or ‘he
wants to read his book’); (iv) a correct answer that did not refer to mental states but did
reflect on the conflicting events and showed an understanding that the character’s
actions would be linked to their false beliefs (e.g. ‘she put it there but now it’s in there’
or ‘cos she put it in there and the other one put it in there’); and (v) a thinking
explanation, referring to the thoughts or knowledge of the character (e.g. ‘he thinks it’s in there’ or ‘cos she doesn’t know where it is, she thinks it’s in there’). Children who
gave an answer in either Category (iv) or (v) were coded as correct on the task as they
showed an explicit understanding of mental states, or an appreciation of the conflicting
events and the character’s responses to them; all other explanations were incorrect.
Sometimes, children gave more than one explanation. On these occasions, the highest
level of explanation children provided was noted. A thinking explanation was thought
to be the most sophisticated, followed by a correct explanation (which did not mention
mental states), then the wanting explanation, as these are mentalistic in some sense but
fail to consider beliefs, then the situational explanations and finally the don’t know
responses. If children did not provide a correct explanation after the test question was
asked, then the question was repeated. In 151 of the 183 incidents of repeating the
question, children gave the same level of explanation.
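The coding hierarchy described above can be written down compactly. In the sketch below the category labels and function names are hypothetical; the ranking and the pass criterion (categories iv and v count as correct) follow the text, and an empty list is treated as "no response".

```python
# Ranking of explanation types from least to most sophisticated,
# matching categories (i) to (v) in the text. Labels are illustrative.
EXPLANATION_LEVELS = {
    "dont_know": 1,                # (i) no response or 'don't know'
    "situational": 2,              # (ii) situational explanation
    "wanting": 3,                  # (iii) wanting explanation
    "correct_no_mental_state": 4,  # (iv) correct, no mental state term
    "thinking": 5,                 # (v) thinking explanation
}


def highest_explanation(explanations):
    """Return the most sophisticated explanation a child gave."""
    if not explanations:
        return "dont_know"
    return max(explanations, key=lambda label: EXPLANATION_LEVELS[label])


def is_correct(explanations):
    """A child passes if the highest explanation is category (iv) or (v)."""
    return EXPLANATION_LEVELS[highest_explanation(explanations)] >= 4
```

A child who first says ‘he wants it’ and then ‘he thinks it’s in there’ would thus be coded at the thinking level and scored as correct.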
Discussion
The present study provides a unique opportunity to examine exactly what occurs
during the period of transition from a state of equilibrium, in which children
consistently rely on reality to address questions of false belief, to a more sophisticated
state of equilibrium, where children are able to refer to an individual’s false beliefs.
Examining changes in performance within the same children over time provides
stronger evidence for sequences than inferences from group means. Although the study cannot definitively distinguish between the different theories designed to explain the
development of theory of mind, it does provide some distinctive information regarding
changes in individual children’s profiles of performance. This includes information
about rate and stability, and how the types of answers provided by children to explain
behaviour change over time.
Although few children were consistently failing all the theory of mind tests before
3;5, their performance was poor and below chance during this period. Similarly,
although not all children were producing perfect scores after 4 years, the performances were stable and above chance, with the majority of children being successful on most of
the theory of mind tests. Between 3;5 and 4 years, the children’s aggregate theory of
mind scores covered the full range of possible scores, producing an overall group
performance that was at chance levels. These profiles of performance support the
Table 5. Percentage of each type of explanation according to children’s aggregate theory of mind score
(excluding the unexpected transfer explanation task)

Aggregate theory of mind score | (i) Don’t know or no response | (ii) Situational response | (iii) Wanting response | (iv) Correct (no mental state) | (v) Thinking response
0 | 9 | 82 | 9 | 0 | 0
1 | 32 | 61 | 7 | 0 | 0
2 | 39 | 39 | 9 | 0 | 13
3 | 42 | 19 | 3 | 13 | 23
4 | 33 | 7 | 13 | 7 | 40
5 | 23 | 3 | 14 | 9 | 51
6 | 18 | 8 | 0 | 26 | 48
parameters of development suggested by Wellman et al.’s (2001) meta-analysis. It must
be noted, as shown in Table 1, that more children in the critical period of 3;5 to 4 years
would have been optimal. However, when the individual children’s profiles are
examined, it appears that children are showing change before and after these ages.
Therefore, one should view these parameters as guidelines, rather than rules.
If gradual development is defined as changes of no more than 2 out of the 7 possible points from one phase to the next, 19 of the 28 children showed a gradual improvement
in their theory of mind score. The majority of improvements (76%) were
of no more than 2 points, and no regression was by more than 2 points. As
the children did not appear to show sudden, large advances in performance, it can be
concluded that the development of an individual child’s expressed theory of mind
competence is gradual. However, Werner (1957) and Flavell (1971) have highlighted the
difficulty of establishing the specific processes involved in change. For example, a
discontinuous change involves a qualitative shift from one particular type of behaviour
to a different type of behaviour, but this change may become distinguishable only gradually. Further examination of the profiles of performance in the light of the
children’s explanations on the unexpected transfer explanation task has shown that,
although the aggregate scores showed a gradual improvement, there may have been
some qualitative changes in the strategies or theories children used in tests of false belief
understanding.
Evidence from this study and that of Flynn et al. (2004) shows that children appear to
progress from a period in which they rely on reality to answer questions in false belief
understanding tasks to a period where they begin to pass tests but show very poor
overall theory of mind performance. No child who achieved an aggregate score (excluding the unexpected transfer explanation task) of between 0 and 1 was able to
explicitly explain behaviour in terms of false beliefs. For children who scored 0 or 1, 61%
to 82% of all answers provided on the unexpected transfer explanation task were
situational, relying on reality to explain the behaviour. As the aggregate theory of mind
scores increase, the predisposition to rely on reality begins to fade, and the number
of situational explanations decreases. Some children still provide situational
explanations, but as the aggregate theory of mind scores increase there is an increase
in the number of don’t know/no responses. Such a finding is paradoxical; although the
children’s aggregate theory of mind scores are increasing, suggesting a better understanding of mental states, the children are unable to articulate this understanding
and instead choose not to give an explanation. Therefore, there does appear to be a
period of confused performance, as suggested by Wellman et al. (2001), in which
children are unable to explicitly provide an explanation of a story character’s behaviour.
As children’s theory of mind skills become more established (i.e. a score of 3 or more),
the number of situational and don’t know/no responses decreases, as children are more
likely to explain behaviour in terms of an individual’s false beliefs. Once children begin
to provide ‘thinking’ explanations, they do not tend to return to a less sophisticated type
of explanation. There does not appear to be a systematic production, or stage of explanations related to desire-based reasoning, except that these tend not to occur at
the extremes, such as when children are relying on reality or when they are using false
beliefs to explain behaviour.
Interestingly, the developments illustrated by the profiles of performance for the
children’s aggregate theory of mind scores appear stable. Flynn et al. (2004) concluded
that the development of theory of mind skills was unstable. However, increasing the
number of theory of mind tests included in the aggregate score from 3 to 7 has shown that
the development is steady. Although 22 of the 28 children in this study showed some
regression in theory of mind ability, these regressions were small, never amounting to
more than 2 points out of a possible regression score of 7. Therefore, even when the
children are confused – being unable to provide lucid explanations on the unexpected
transfer explanation task – they are not showing great fluctuations in performance.
A reassuring finding from the present study was that all but one of the different measures designed to evaluate theory of mind loaded on to the same construct.
Encouragingly, the tests that were showing this agreement were those tests that are
most commonly used in the literature, for example, the deceptive box test and the
unexpected transfer prediction task. The exception was the penny-hiding task, which
did not load on to the same factors as all the other tests. The penny-hiding test is a test of
perspective taking in a deception situation. It requires a child to hide a penny in front of
someone, therefore considering that person’s visual perspective, so that no clues are
given to the penny’s location. The results of the principal component analysis were
supported by Hughes (1998a), who found that the penny-hiding task correlated significantly with another deception task, but not with either a deceptive box task or
false belief tasks that required an explanation using mental states. Therefore, the penny-
hiding test may be evaluating one aspect of theory of mind (i.e. deceptive ability) but the
other seven tests are measuring another ability relating to changes in 3- and 4-year-olds’
mental state understanding. This other component is closely associated with the
understanding of false beliefs as all but one of the tasks, the appearance-reality task
being the exception, are measures of false belief understanding. False belief
understanding represents a robust measure of an important early development of
theory of mind, but we must bear in mind that it is only one narrow aspect of a multifaceted domain. The use of the microgenetic approach has provided interesting
results, which could not have been produced using the usual, cross-sectional approach.
Future work could adopt this approach to examine change in other theory of mind
developments across the life-span including second-order theory of mind and later
developments at the first-order theory of mind period, such as the implication of
understanding beliefs for emotions.
As well as showing that the tests are valid, this study also shows that there is a
relatively consistent sequence of development across the tasks, which was apparent
across different groups and testing phases. This consistent sequence of development could be caused by the non-specific task demands of the different theory of mind tests. A
child’s theory of mind skills may be in place, but the non-specific demands of the tests
cause children to perform differently on each test. Two pieces of evidence argue against
this proposal. Firstly, even though the tests are superficially very different, all the tests,
except those in the extremes (penny-hiding, appearance-reality, and the unexpected
transfer explanation tasks), were showing similar levels of difficulty (i.e. comparable
pass rates). Secondly, when the individual profiles of performance for children who
scored between 1 and 6 on the aggregate theory of mind scores were examined, only
1/4 fitted this sequence of development. This finding suggests that the sequence is true of groups, but not of individuals. However, unlike Wellman and Liu (2004), the sequence
produced in the present study cannot be used as a definitive sequence by which to
compare individual children’s development, as the tasks were not counterbalanced
across children and testing phases, and so children’s performances may have been
affected by order of presentation.
The substantial associations found between the different theory of mind measures
contrast with Flynn et al. (2004), who found that children’s performances on only two
of the three false belief understanding tests were associated with one another. Flynn
et al. (2004) looked at the period of emergence of false belief understanding, whereas
the present study examined the development of theory of mind skills at a point when
the skill was becoming more established. Therefore, it appears that, at this point of
development, most tests of theory of mind show high construct validity. However, this
highlights the significant issue of reliability, validity and stability of false belief understanding during this period of change. It is exactly this period of instability that
researchers interested in individual differences in false belief understanding must target.
Many studies have reported problems in finding robust associations between different
theory of mind tests, and their associated variables. For example, Mitchell (1996)
reported that, ‘The most strange thing about the age trends was the lack of them … Quite simply, it has become fashionable to claim that there is a sharp age trend, but in
fact there is not’ (Mitchell, 1996, pp. 137–138). The present study shows that rather
than the tasks lacking reliability and validity, children appear to progress through a
period of confusion during the transition, where they fail tasks that they previously passed, and are unable to explain false beliefs, even though they could do so before.
Therefore, researchers should not be surprised if they are not able to produce previously
reported effects when looking at performance during this critical period.
An association was found between children’s verbal ability and their theory of mind,
which replicated the findings of Astington and Jenkins (1999) and De Villiers and Pyers
(2002). Not only was there within-phase correspondence between children’s verbal
ability and their theory of mind, cross-lagged correlations showed that early linguistic
skills predicted later theory of mind but the reverse – early theory of mind predicting
later linguistic skills – did not hold. Verbal abilities play an important role in children understanding task instructions and being able to produce sophisticated explanations.
Yet, this robust evidence of early linguistic skills predicting later mental state
understanding shows that language plays a more fundamental role in social cognition
than simply being able to meet certain methodological requirements. Further work
needs to establish the elements of language that play a part in this development, and
how a child’s linguistic environment affects this language/theory of mind relationship.
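The logic of a cross-lagged comparison can be sketched as follows. This is a minimal illustration rather than the analysis reported here: it assumes simple Pearson correlations between one early and one late testing phase, and the function names and all scores are invented for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def cross_lagged(verbal_early, tom_early, verbal_late, tom_late):
    """Return the two cross-lagged correlations.

    One path runs from early verbal ability to later theory of mind,
    the other from early theory of mind to later verbal ability.
    """
    return {
        "verbal_early_to_tom_late": pearson(verbal_early, tom_late),
        "tom_early_to_verbal_late": pearson(tom_early, verbal_late),
    }


# Invented scores for six hypothetical children (not study data).
verbal_early = [90, 95, 100, 105, 110, 115]
tom_early = [1, 0, 2, 1, 0, 2]
verbal_late = [92, 96, 101, 104, 112, 116]
tom_late = [2, 3, 4, 5, 6, 7]

paths = cross_lagged(verbal_early, tom_early, verbal_late, tom_late)
```

A pattern like the one described above would show up as a clearly larger value on the first path than on the second. A full analysis would also test each correlation for significance and take the within-phase associations into account.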
An examination of practice effects showed that repeated testing on this battery of
theory of mind tasks significantly improved the experimental group’s aggregate theory
of mind scores compared with the control group’s aggregate scores. This practice effect was surprising. Children were provided with no feedback from the experimenter
regarding their success or failure on the tasks. At the end of each test, the experimenter
provided some general encouragement and then moved on to the next task. Previous
studies have explored the role of different types of feedback on children’s theory of
mind performance. Slaughter and Gopnik (1996) avoided presenting children with
explanations in their training procedure, instead providing brief verbal feedback. Their
procedure was based on the assumption that it is primarily the child’s own perception
of conflicting evidence, rather than an adult’s tutoring of the correct answer, which
causes theory revision. The current findings support this hypothesis. Although the experimenter provided no explicit feedback, the children may have experienced
implicit contradictions caused by being repeatedly presented with questions that
required them to consider their theories of human behaviour and mental states. Such a
finding is also supported by the fact that the effect was found for the aggregate theory of
mind score rather than an individual test, suggesting that this study’s procedure
facilitated some conceptual change rather than a rule or strategy change that was
implemented consistently successfully on one or two tests.
The current study extends our understanding of the time scale over which change
occurs. Training studies have previously shown changes in theory of mind performance
over relatively short periods. Slaughter and Gopnik (1996) facilitated change over 2 to 3
weeks and Appleton and Reddy (1996), over 4 to 6 weeks. In the present study, children
received eight tests every 4 weeks for six phases. Although there was a statistically
significant improvement in the experimental group’s aggregate theory of mind score compared with the control group, this improvement was not large (a mean difference of
1.92). Slaughter and Gopnik (1996) suggest that the revision of theories is not
immediate, but rather consists of a gradual appreciation of cognitive inconsistencies, as
children need time to ruminate on their theory change. These results show that the
procedure used in the present study provides children with the rumination time to
revise their theories. Further work needs to be undertaken to establish exactly which
element of the presentation of this battery, for example, frequency of presentation, size
of battery, number of phases, and so on, facilitates this improvement.
The existence of practice effects also highlights another possibility for future
research. The administration of the unexpected transfer explanation task during the
period of transition in this study has provided important information about changes in
children’s information use during the transition from consistently failing tests of false
belief understanding to consistently passing these tests. However, this was just one task
in a battery of seven theory of mind tasks. Future work could include more explanation
tasks, or alternatively, an experimenter could ask a child to explain their answer on the
other theory of mind tests. As the present study was the first to implement the microgenetic approach across the full transition period, it aimed to look at spontaneous,
rather than facilitated change, so as to produce an initial illustration of change from
which future experimental manipulations could be made. Repeatedly asking children to
explain their logic would have highlighted this logic and so may have facilitated the
development further. Future work could examine how and when explanations facilitate
theory of mind development.
Although the present study cannot definitively distinguish between the different
theories designed to explain the development of theory of mind, it does provide some distinctive information regarding changes in individual children’s profiles of
performance, which a successful theory must explain. For example, a successful
theory must assume that the development of, or at least the demonstration of, theory of
mind skills (i) occurs gradually, (ii) is stable with occasional small regressions in
performance, (iii) is predicted by early verbal skills, (iv) can be facilitated by the
repeated presentation of a set of tests over a number of weeks without explicit feedback,
and (v) causes children to progress through a period in which they no longer rely on
reality to explain an individual’s behaviour, but instead choose to give no response.
References
Appleton, M., & Reddy, V. (1996). Teaching three-year-olds to pass belief tests: A conversational
approach. Social Development, 5, 275–291.
Astington, J. W., & Jenkins, J. M. (1999). A longitudinal study of the relation between language and
theory-of-mind development. Developmental Psychology, 35, 1311–1320.
Bartsch, K. (1998). False belief prediction and explanation: Which develops first and why it
matters. International Journal of Behavioural Development, 22, 423–428.
Bartsch, K., & Wellman, H. (1989). Young children’s attribution of action to beliefs and desires.
Child Development, 60, 946–964.
Bartsch, K., & Wellman, H. (1995). Children talk about the mind. New York: Oxford University
Press.
Carlson, S., Mandell, D., & Williams, L. (2004). Executive function and theory of mind: Stability and
prediction from ages 2 to 3. Developmental Psychology, 40, 1105–1122.
Carlson, S., Moses, L., & Breton, C. (2002). How specific is the relation between executive
function and theory of mind? Contributions of inhibitory control and working memory. Infant
and Child Development, 11, 73–92.
Carpendale, J., & Lewis, C. (2004). Constructing an understanding of mind: The development of
children’s social understanding within social interaction. Behavioural and Brain Sciences,
27, 79–96.
De Villiers, J., & Pyers, J. (2002). Complements to cognition: A longitudinal study of the
relationship between complex syntax and false-belief-understanding. Cognitive Development,
17, 1037–1060.
Dunn, L., Dunn, L., Whetton, C., & Pintilie, D. (1997). British Picture Vocabulary Scale. Windsor:
NFER-Nelson.
Flavell, J. (1971). Stage-related properties of cognitive development. Cognitive Psychology, 2,
421–453.
Flavell, J., Flavell, E., & Green, F. (1983). Development of the appearance-reality distinction.
Cognitive Psychology, 15, 95–120.
Flynn, E., O’Malley, C., & Wood, D. (2004). A longitudinal, microgenetic study of the emergence of
false belief understanding and inhibition skills. Developmental Science, 7, 103–115.
Foote, R., & Holmes-Lonergan, H. (2003). Sibling conflict and theory of mind. British Journal of
Developmental Psychology, 21, 45–58.
Happe, F. (1994). An advanced test of theory of mind: Understanding of story characters’ thoughts
and feelings by able autistic, mentally handicapped and normal children and adults. Journal of
Autism and Developmental Disorders, 24, 129–154.
Happe, F. (1995). The role of age and verbal ability in the theory of mind task performance of
subjects with autism. Child Development, 66, 843–855.
Hogrefe, J., Wimmer, H., & Perner, J. (1986). Ignorance versus false belief: A developmental lag in
attribution of epistemic states. Child Development, 57, 567–582.
Hughes, C. (1998a). Executive function in preschoolers: Links with theory of mind and verbal
ability. British Journal of Developmental Psychology, 16, 233–253.
Hughes, C. (1998b). Finding your marbles: Does pre-schoolers’ strategic behaviour predict later
understanding of mind? Developmental Psychology, 34, 1326–1339.
Hughes, C., Adlam, A., Happe, F., Jackson, J., Taylor, A., & Caspi, A. (2000). Good test-retest
reliability for standard and advanced false-belief tasks across a wide range of abilities. Journal
of Child Psychology and Psychiatry and Allied Disciplines, 41, 483–490.
Mayes, L., Klin, A., Tercyak, K., Cicchetti, D., & Cohen, D. (1996). Test-retest reliability for false-
belief tasks. Journal of Child Psychology and Psychiatry and Allied Disciplines, 37, 313–319.
Mitchell, P. (1996). Acquiring a conception of mind: A review of psychological research and
theory. Hove, UK: Psychology Press.
Perner, J., & Wimmer, H. (1985). ‘John thinks that Mary thinks that …’: Attribution of second
order beliefs by 5–10 year old children. Journal of Experimental Child Psychology, 39,
437–471.
Premack, D., & Woodruff, G. (1978). Does the chimpanzee have a theory of mind? Behavioural
and Brain Sciences, 1, 515–526.
Repacholi, B., & Gopnik, A. (1997). Early understanding of desires: Evidence from 14- and
18-month-olds. Developmental Psychology, 33, 12–21.
Ruffman, T., & Keenan, T. (1996). Children’s understanding of surprise: The case for a lag in
understanding relative to false belief. Developmental Psychology, 32, 40–49.
Siegal, M., & Beattie, K. (1991). Where to look first for children’s knowledge of false beliefs.
Cognition, 38, 1–12.
Siegler, R., & Crowley, K. (1991). The microgenetic method: A direct means for studying cognitive
development. American Psychologist, 46, 606–620.
Slaughter, V., & Gopnik, A. (1996). Conceptual coherence in the child’s theory of mind: Training
children to understand belief. Child Development, 67, 2967–2989.
Wellman, H. (1990). The child’s theory of mind. Cambridge, MA: MIT Press.
Wellman, H., & Bartsch, K. (1988). Young children’s reasoning about beliefs. Cognition, 30,
239–277.
Wellman, H., Cross, D., & Watson, J. (2001). Meta-analysis of theory of mind development: The
truth about false belief. Child Development, 72, 655–684.
Wellman, H., & Liu, D. (2004). Scaling of theory-of-mind tasks. Child Development, 75, 523–541.
Werner, H. (1957). The concept of development from a comparative and organismic point of view.
In D. Harris (Ed.), The concept of development: An issue in the study of human behaviour.
Minneapolis, MN: University of Minnesota Press.
Wimmer, H., & Mayringer, H. (1998). False belief understanding in young children: Explanations
do not develop before predictions. International Journal of Behavioural Development, 22,
403–422.
Wimmer, H., & Perner, J. (1983). Beliefs about beliefs: Representation and constraining function of
wrong beliefs in young children’s understanding of deception. Cognition, 13, 103–128.
Received 31 March 2004; revised version received 10 June 2005