Master’s Degree in Phonetics and Phonology
Postgraduate Programme in Phonic Studies (Estudios Fónicos)
CSIC-UIMP
2012
EARLY DEVELOPMENT OF MULTIMODAL
COMMUNICATIVE STRATEGIES
By Alfonso Igualada Pérez
Supervised by Dr. Pilar Prieto Vives
Co-supervised by Dr. Laura Bosch Galcerán
INDEX
1. Acknowledgements
2. Resumen / Abstract
3. Introduction
4. Methods
5. Results
6. Discussion and conclusions
7. References
ACKNOWLEDGEMENTS
I am grateful to my colleagues at ADAPEI-ASPRONA, an early intervention centre in
Albacete, and also to my friends and family for their help with family recruitment,
the provision of a working space, experimental assistance, and the preparation of
materials. Special thanks go to the three colleagues who assisted me in the
experimental sessions, and to the joyful company of Fonfonero’s group. I would
especially like to thank the infants and families who participated in the experimental
sessions. Many thanks also to all Estudios Fónicos and GrEP members for their warm
welcome, particularly to Núria Esteve-Gibert for her help with the coding and analysis
procedures, to Joan Borràs-Comes for technical support, and to Santiago Gonzalez for
support. Special regards to the master’s director, Juana Gil, and to my tutors, Laura
Bosch and Pilar Prieto, for their indispensable guidance. I am truly thankful to all of them.
Resumen
In the present pilot study we investigate the interactions of gesture and speech during
early communicative development. An experimental task based on Liszkowski et al.
(2008) was used to obtain a wide range of infant communicative productions. Infant
communicative patterns such as vocalizations, prosody, gesture, and gaze were
analyzed to study early multimodal integration and its relationship with the emergence
of language. Although robust generalizations cannot be drawn from such a small
sample, interesting results were obtained from its qualitative analysis. Joint
gesture-speech productions appear to be fully integrated at the age of 1;3, and are used
as a resource to attract the adult’s social attention in demanding communicative
contexts. In addition, the appearance during the earliest interactions of a set of
productive parameters (early pointing production, the temporal alignment of gesture and
speech, hand configurations during pointing, and the child’s gaze sequences) was found
to favor the later appearance of gesture-speech combinations. One of the main results of
this pilot study was that the use of gesture-speech combinations can be considered an
important step towards the early integrated use of the components of language. Thus,
pointing in combination with early vocalizations could be considered a productive
parameter of intentional communication, in which semantic, pragmatic, and
phonological information are integrated for the first time in child development.
Abstract
In the current pilot study we investigated the patterns of gesture-speech integration
during early communicative development. An experimental task based on Liszkowski et
al. (2008) was used to obtain a sample of communicative productions from four children
at two early longitudinal time points, namely at 1;0 and 1;3. Infants’ communicative
patterns of vocalization, prosody, gesture, and gaze were analyzed to study early
multimodal integration and its relationship with language emergence. While strong
generalizations cannot be drawn on the basis of such a small sample, some interesting
results emerge from the qualitative analysis. Gesture-speech productions seem to be well
integrated at 1;3, and seem to work as a strong communicative resource to attract the
adult’s social attention in demanding communicative contexts. Also, the presence of a
set of productive parameters during early infant intentional interactions (early pointing
production, gesture-speech temporal alignment, hand configurations of pointing, and
infant look sequences) seems to correlate with the subsequent appearance of
gesture-speech combinations. One of the main results of the study was that the use of
gesture-speech combinations may be considered a significant step towards the early
integrated use of language components. That is, pointing in combination with early
vocalizations may be a significant early signal of intentional communication, in which
semantic, pragmatic, and phonological information are integrated for the first time in
development.
1. Introduction
Research on early gesture acquisition and its relationship to language emergence has
shown strong evidence that the early appearance of iconic and deictic gestures predicts
early language development. For example, Colonnesi et al. (2010) found that children’s
comprehension of pointing at 1;0 contributed to comprehension of other children’s
actions at 3;3. In the same study, the authors found a strong correlation between
pointing gestures and language development. Specifically, the appearance of deictic
gestures with a declarative function predicted infant verbal language to a greater degree
than the appearance of imperative deictic gestures. Similarly, some studies reveal the
predictive value of pointing gestures in early vocabulary development. Bavin et al.
(2008) found that children’s gesture and object use at 0;8 and 1;0 predicted vocabulary
development at 1;0 and 2;0. In Caselli et al. (2012), infants’ early actions and gestures
correlated with receptive vocabulary in the 0;8 to 1;6 age range, indicating a transition
to productive vocabulary. Similarly, Özçaliskan & Goldin-Meadow (2005) and Rowe &
Goldin-Meadow (2009) recorded infants at home during daily communicative activities.
They found that communicative gesture, and specifically the use and function of deictic
gestures, was able to predict both lexical and grammatical development. While isolated
pointing (i.e., the variety of object types pointed to) seemed to predict the size of the
child’s vocabulary later in development, the frequency of pointing in gesture-speech
combinations (e.g., pointing to a cookie and then saying “give”) had predictive value for
the later appearance of two-word combinations. It thus seems that, in the early stages of
linguistic and cognitive development, motor abilities develop early to signal intentional
communication, an ability that seems to predict and enhance the emergence of verbal
language.
It is also well known that social interaction plays a significant role in the development
of communicative skills. For example, interaction routines between infants and adults,
like early shared attention, early vocal games, or informative pointing, allow infants to
participate in patterns of social interaction similar to those used in adult communication
(e.g., Locke, 1997; Soltis, 2004; Liszkowski et al., 2008; Tomasello, Carpenter &
Liszkowski, 2007). Vocabulary development in relation to sharing attention to an object
has been studied by Brooks & Meltzoff (2008). In this study a gaze-following task was
used, meaning that the time spent looking at an object when attention was directed to
that object by an adult was measured. The authors found that longer looking times at
ages 0;10 and 0;11 correlated with better language development measures at 2;0. More
interestingly, the communicative integration of looking time and pointing was an even
better predictor of later vocabulary performance: infants who combined looking
(whether for long or short durations) with pointing had better results than children who
only looked or only pointed. Though sharing interaction with reference to an object is a
basic milestone in the acquisition of intentional communication, Liszkowski et al. (2008)
note that infants’ communicative behaviors are also affected by adults’ attention
patterns to the object of reference. They found that the adult’s attention influenced the
child’s productions, as the child used more complex production abilities when the
experimenter did not look at the object pointed to (experimental condition A) as opposed
to when he looked at neither the object nor the child (experimental condition B). These
repair strategies for attracting the adult’s attention occurred significantly only during the
experimental conditions, not during the communicative (base) condition, in which the
experimenter shared attention with the child by looking at the child’s eyes and then at
the object, thus actively encouraging the child to look. By the same token, it has been
found that adults’ behavior changes depending on the child’s. Andrén (2011) found that
parents gave elaborate responses during children’s sustained gesture strokes, and that
children in turn adapted their responses depending on their parents’ actions. Therefore,
the emergence of language through multimodal communication seems to be strongly
related to social interaction, and findings on the acquisition of deictic gestures, eye
contact, and gaze patterns reveal their importance for the development of
social-communicative skills.
In addition, some studies have highlighted the relationship between the acquisition of
prosody and language development. First, early neurological activation in response to
prosodic information was found at 0;3 by Homae et al. (2006). Shukla et al. (2010)
found evidence of the effect of prosody on language learning at 0;6. In that study,
infants were better at associating the sounds of continuous speech with their
corresponding object representation when statistical information and prosodic
information were aligned, and had worse word learning results when prosodic
boundaries and the segmental level were misaligned. Other studies reveal the important
role of prosody in the expression of intentional communication in early stages of
acquisition, before vocabulary and early grammatical developments are achieved
(Levitt, 1993; Mampe et al., 2009; Prieto et al., 2011; Esteve-Gibert & Prieto, 2012;
Sakkalou & Gattis, 2012). On the other hand, advances in children’s prosody seem to
co-occur with the development of language skills and other general cognitive abilities
(Snow, 2002). This author describes two different stages in which intonation changes
from a period of stable development to a discontinuous decrease in intonational ability
and finally to an increase, a U-shaped pattern called a regression-reorganization model.
The first stage can be seen at 0;10, a period which is associated with the development of
intentionality and pragmatics, while the second occurs at 1;6, an age associated with the
development of expressive syntax. Of particular relevance to the present study, Parladé
& Iverson (2011) found a similar communicative transition in speech-gesture
coordination in relation to the pace of vocabulary development: children seemed to go
through a decline in gesture-speech coordination ability during a vocabulary burst
stage, revealing the interplay between pointing gestures, speech, and affective gestures.
Specifically, children exhibited worse gesture-speech coordination during the
vocabulary burst period, characterized by a sharp increase in active vocabulary, than
during periods characterized by a gradual increase in vocabulary. Yet little is known
about the interaction between gesture, prosody, and social interaction and their
predictive value in language acquisition.
Gesture-speech integration happens at the beginning of one-word acquisition, as
children seem to develop an ability to temporally synchronize gesture and speech (for
example, when a child points at a book while saying the word “book”, so that both
modalities convey the same information) (Butcher & Goldin-Meadow, 2000). Prior to
this achievement, children may have experimented with the coordination of body and
oral movements. For example, around 0;6-0;8 canonical babbling occurs with rhythmic
hand movements (e.g., waving), and even functional development follows a path that
parallels those of gesture and speech, so that at 0;8-0;10 word comprehension develops,
deictic gestures unfold, gestural routines appear, and the first tool use emerges, all at the
same time (Bates & Dick, 2002). As McNeill (1992) noted, gestures and speech are
synchronized in adults not only at the semantic level, since they express the same
meanings, but also at the pragmatic level, since they perform the same pragmatic
functions, and at the phonological level, because the most prominent part of the
gesture seems to be integrated with phonology. Esteve-Gibert & Prieto (in press) studied
adults’ speech-gesture coordination and found that prosody aligns with the apex (the
point of maximum extension during the stroke phase) of the pointing gesture. More
precisely, their results reveal that the intonation peak (the most prominent part of the
intonation contour) as well as the beginning and end of the verbal production are
temporally anchored to the apex of the pointing gesture, regardless of variation in the
metrical pattern of words.
Focusing on the development of speech-gesture temporal synchronization, to our
knowledge only Butcher & Goldin-Meadow (2000) and Esteve-Gibert & Prieto (under
review, 2012) have investigated speech-gesture alignment in an infant population. Both
investigations found that the two modalities develop together during the one-word
period and that the most prominent part of the gesture (the stroke) temporally aligns
with the most prominent part of the speech. This means that children around 1;0 have
developed the ability to fine-tune two communication modalities. Esteve-Gibert & Prieto
(under review, 2012) use prosodically prominent acoustic cues such as the onset and
offset of vocalization and the fundamental frequency peak in order to assess
gesture-speech alignment in children aged from 0;11 to 1;7. Their results reveal that
children at 0;11 do temporally align speech and gesture, but it is not until productive
words emerge that the two modalities align as they do in adults. This early behavior
shows the integration of the three linguistic levels (semantic, pragmatic, and
phonological), since the two channels (gesture and vocalization) express redundant
information with the same meaning and pragmatic intention, and the prominent part of
the utterance is anchored to the prominent part of the gesture. This convergence could
be a potential predictor of vocabulary emergence and thus of language development.
However, little is known about the integration of pointing and prosody in particular, and
further research is needed on how this coordination develops and whether it is a
predictor of language development.
The aim of this study is to analyze the early stages of language and communication
development in four Spanish children, who were recorded at two points in their
language development, at ages 1;0 and 1;3. Each infant’s communicative interactions
will be assessed, as well as the extent to which these patterns predict the infant’s later
linguistic development. The analysis of data will focus on the correlation between the
communicative patterns of vocalization, prosody, gesture, and gaze (all present before
the infant’s first birthday), and the early signs of language acquisition (i.e., the
development of gesture-speech combinations and lexical development). The main goal
will be to assess the use of speech patterns, pointing gestures, and gaze patterns and to
explore their predictive value for infants’ later language development.
The methodology used is based on the elicitation task of Liszkowski et al. (2008), which
elicits speech-gesture combinations in a social communicative context with infants in a
controlled setting. This paradigm allowed us to obtain a wide behavioral sample of
communicative exchanges between the child and the adult in different social conditions.
The procedure motivates the child to initiate communication by means of a pointing
gesture and to deploy his or her repertoire of communicative strategies in order to direct
the adult’s attention to a stimulus which has appeared from behind the experimenter.
2. Methods
2.1. Participants
Six typically developing children participated in the study. Two of them had to be
excluded from analysis because of oral habits which interrupted pointing activity
(dysfunctional finger sucking and teething). The four children analyzed (1 girl,
3 boys) were recorded at two longitudinal time points, the first recording taking place at
around 12 months (mean = 12;12; range = 11;23-12;27, in months;days) and the second
recording three months later. All of the infants were recruited from public nurseries in
Albacete (Castilla-La Mancha, Spain) from monolingual Spanish families that had
expressed interest in participating in the study. A small stipend was given to the parents
upon conclusion of the experiment. When initially contacted, all families reported that
their infant had already begun to point at objects, and all were therefore included.
2.2. Experimental setting and materials
The recordings always took place in a 2.5 m x 5 m distractor-free testing room at the
ADAPEI-ASPRONA Early Intervention Center for children with developmental
disabilities in Albacete. The experimental setting was based on Liszkowski et al.
(2008), as follows. An opaque white cloth screen hid the middle of the back of the room
from view, and the infant sat on his or her caregiver’s lap in the middle of the room,
facing the screen at a distance of 2 m. Through a large opening in the upper
center of the screen a camera recorded the child’s reactions, and a second camera was
placed at the back of the room in such a way that it could record the use of stimuli and
the experimenter’s facial reactions. The screen had four openings on each side through
which the puppets were made visible to the child, one at a time. These openings (six of
them 60 cm and two of them 100 cm from the floor) were symmetrically positioned at
about 45°, 30° (2×) and 25° left and right of the infant’s midline. The puppets were
manipulated by an assistant behind the screen. A total of ten stimuli were used. Eight
were hand puppets (cat, frog, cow, chicken, sun, snail, grandmother, and mouth),
and the remaining two, an electronically activated dancing pig and a light, were visible
on the floor. The two electronic stimuli were positioned on the floor in front of the
screen at approximately 30° to the infant’s left and right (see Fig. 1). A large moveable
bead toy, attached to the small table, and a pair of infant books were used between
conditions to return the infant’s attention to the experimenter. The spoken words which
served as lexical stimuli were all chosen from López-Ornat et al. (2005), the Spanish
version of the MacArthur communicative development inventory, among the vocabulary
items for children aged 8 to 15 months. The white screen was situated behind the
experimenter, who was seated on a small chair. The child was seated on his or her
caregiver’s lap facing the experimenter and the panel. A small table was placed between
the experimenter and the caregiver with his or her child, who were seated on a higher
chair to facilitate video recording. The caregiver wore earphones playing music as a
distraction from the activity. A total of three assistants helped with the task; all of them
were specialists in education or rehabilitation and were trained in the procedure used in
this study.
Figure 1. Experimental setting.
2.3. Procedure
The procedure of Liszkowski et al. (2008) was used to elicit infant communicative
behavior (communicative gestures, vocalizations, and gaze sharing) through an enjoyable
activity, in this case watching puppets. The experimenter facilitated the child’s pointing
gestures by reacting to his or her communicative behavior in three different ways, each
reflecting a different social condition. In the most communicative condition (which will
be called the base condition or BaseCond) the experimenter established interaction with
the child in a communicative way when the child initiated pointing. In the other two
experimental conditions the experimenter’s attention was directed at the child but not
the stimulus (available condition) or his attention was directed at neither the child nor
the stimulus (non-available condition). Results from previous research suggested that
the latter two conditions would trigger greater communicative involvement on the part
of the child.
For each experimental session, the procedure was as follows. First, caregivers were
informed about the experiment’s procedure, permission to record was obtained, and
instructions for the task were given. Warm-up time before the experiment consisted of
extensive play between the experimenter and the infant in a different room in order for
the child to feel at ease with the experimenter. In the meantime, caregivers were brought
to the experiment room and instructed that they must not initiate any communicative
behavior toward the infant during testing or look at the screen at any time. Rather,
they were encouraged to sit calmly, looking at their child and listening to music through
headphones. The caregiver was asked to gently hold the child in place on his or her lap
in order to keep the child’s position constant and to minimize the child’s stress during
the experiment.
The experimental session began in the testing room with a short play period with the
bead toy on the table to keep the infant interested in the experimenter as a social partner,
though this toy was only used at the beginning of the segments of the experiment
involving communicative conditions, i.e., the base condition. When the experimenter
judged that the infant was relaxed and attentive, he gradually withdrew from the
interaction and signaled to the assistant behind the panel, by snapping his fingers out of
the child’s sight, that the puppet stimuli could be activated. The assistant always waved
the puppets silently, one at a time, from side to side and front to back within different
openings in the panel, while looking through one of the holes to monitor when the
child pointed. For each stimulus, the experimenter snapped his fingers as soon as the
child initiated pointing. The child had 10 seconds within which to initiate the gesture. If
the child pointed within this time, the stimulus continued (i.e., the puppet continued to
be visible) for 10 more seconds or until the infant lost interest. If the child did not
initiate a pointing gesture, the stimulus was withdrawn after the first 10 seconds. In all
cases, the experimenter indicated by clicking his tongue when it was time for the
puppet to be withdrawn.
The first trial was always in the communicative condition (BaseCond): when the
stimulus was activated, the experimenter looked at the infant and ignored the stimulus
until the infant pointed to it. The experimenter then reacted immediately and shared
attention for the ensuing 10 seconds; that is, he repeatedly looked back and forth
between the stimulus and the infant’s face, talking excitedly about the stimulus and
commenting on the fact that they were seeing it together. For example, the
experimenter would say something like: “Oh, it’s a cat! Look how he’s saying hi to
you!” Then, 10 seconds after the infant’s first point, the stimulus was withdrawn and the
trial was over. Following the first trial, the experimenter shared a book activity until the
child was relaxed and attentive, then gradually withdrew from the activity and signaled
to the assistant with a finger snap to activate the next stimulus, which could correspond
either to the available or to the non-available condition. In both experimental conditions,
when the child pointed, the experimenter responded to the child by saying things like
“Hmm? What? What’s there? Hmm?” What changed between conditions was the
experimenter’s focus of attention: in the available condition trials (AExp) the
experimenter ignored the stimulus but looked at the infant, whereas in the non-available
condition trials (BExp) the experimenter ignored both the stimulus and the child and
looked at the book.
Every session consisted of a sequence of a base condition and two experimental
conditions repeated five times (3 conditions x 5 times = total of 15 trials). Trial
sequences followed two orders counterbalanced in terms of experimental condition
(BaseCond-AExp-BExp or BaseCond-BExp-AExp) and two orders counterbalanced in
terms of the side of appearance of the first stimulus (starting from the right or from the
left side). Since ten stimuli were available, five of them had to appear twice to complete
a total of 15 trials in every session; the order of stimulus appearance was randomly
chosen by the assistant. The experimental sessions lasted an average of 12 minutes.
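The session structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used by the assistant: the function name `build_session`, the order labels, and the way the starting side is stored are hypothetical, and the second counterbalanced order is assumed to swap the two experimental conditions.

```python
import random
from collections import Counter

# Stimulus names come from the text; the identifiers are illustrative.
STIMULI = [
    "cat", "frog", "cow", "chicken", "sun",
    "snail", "grandmother", "mouth", "dancing pig", "light",
]

# Assumption: the two counterbalanced orders differ only in which
# experimental condition follows the base condition.
CONDITION_ORDERS = {
    "A_first": ("BaseCond", "AExp", "BExp"),
    "B_first": ("BaseCond", "BExp", "AExp"),
}


def build_session(order_key, first_side, rng):
    """Return a list of 15 trial dicts for one session."""
    conditions = list(CONDITION_ORDERS[order_key]) * 5  # 3 conditions x 5 = 15
    repeated = rng.sample(STIMULI, 5)                   # five stimuli appear twice
    stimuli = STIMULI + repeated
    rng.shuffle(stimuli)                                # random order of appearance
    return [
        {"trial": i + 1, "condition": cond, "stimulus": stim,
         "first_side": first_side}  # starting side kept as session metadata only
        for i, (cond, stim) in enumerate(zip(conditions, stimuli))
    ]


session = build_session("A_first", "left", random.Random(0))
counts = Counter(trial["stimulus"] for trial in session)
print(len(session), sorted(counts.values()))
```

The sketch does not prevent the same stimulus from appearing in two consecutive trials; whether the assistant avoided this is not stated in the text.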
2.4. Coding
Coding was performed with ELAN software (Lausberg & Sloetjes, 2009) for complex
audio and video annotations. Acoustic analysis was done with Praat (Boersma &
Weenink, 2009) and then imported back into ELAN. The coding of the infants’
audiovisual behavior followed previous work (McNeill, 1992;
Liszkowski et al., 2008; Brooks & Meltzoff, 2008; Cartmill et al., 2012; Esteve-Gibert
& Prieto, under review, 2012) and included the communicative modality of each
production as well as different aspects of looking, pointing, and speech, as detailed
in the explanation below. Measures were assessed separately for the baseline and
experimental conditions.
Communicative modality and gaze patterns
Communicative modality was coded with three options, namely gesture-only,
speech-only, and gesture-speech combinations. The child’s sequence of looks at the
stimulus object and the experimenter was coded with four options: (a) the child looked
only at the object; (b) the child looked at the object and then at the experimenter; (c) the
child shared attention to the object with the experimenter in a triadic sequence
(object-experimenter-object); and (d) the child repeated the triadic sequence more than
once. Following Brooks & Meltzoff (2008), the duration of the first look at the object
was measured.
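As a rough illustration of the four look-sequence categories, the following Python sketch classifies a list of gaze targets. It is not the coding procedure used in ELAN; the function name, the target labels, and the decision to count completed object-experimenter-object cycles are assumptions made for the example.

```python
def classify_look_sequence(targets):
    """Map a list of gaze targets ('object' / 'experimenter') to 'a'-'d'."""
    # Collapse consecutive repeats so that each entry is one distinct look.
    looks = [t for i, t in enumerate(targets) if i == 0 or t != targets[i - 1]]
    if not looks or looks[0] != "object":
        raise ValueError("sequence must start with a look at the object")
    # Assumption: a triadic cycle is object -> experimenter -> object,
    # and consecutive cycles may share the middle look at the object.
    cycles = sum(
        1 for i in range(len(looks) - 2)
        if looks[i:i + 3] == ["object", "experimenter", "object"]
    )
    if cycles >= 2:
        return "d"  # triadic sequence repeated more than once
    if cycles == 1:
        return "c"  # single triadic sequence
    if "experimenter" in looks:
        return "b"  # object, then experimenter
    return "a"      # object only


print(classify_look_sequence(["object", "experimenter", "object"]))
```

Under this assumption a sequence that ends on the experimenter after one full cycle is still classified as (c); a different coding convention would change that boundary case.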
Pointing gesture analysis
Only instances of pointing at the correct stimulus were coded, while other
communicative gestures (e.g., showing palms to express “give me”) were not taken into
account. Pointing gestures were coded for duration and for number, the latter in terms
of (a) the number of trials in which infants pointed once (trials with one point), and
(b) the number of trials in which infants pointed more than once (point repetitions).
Pointing latency was also measured; this consisted of the time interval between the
infants’ first look at the stimulus and their first pointing gesture.
Following Brooks & Meltzoff (2008), Liszkowski et al. (2008), and Cartmill et al.
(2012), the hand configuration of the pointing gesture was coded as either (a) pointing
with an extended finger or (b) pointing with the whole hand, palm downwards. Pointing
performance was also coded according to how far the arm was extended (either
fully extended or bent), and which arm was used to point (right or left).
Speech analysis
The infant’s communicative vocalizations were coded when they were directed at the
experimenter, while the experimenter’s verbal stimuli and the infant’s shouting,
laughing, fussing, or vegetative sounds were excluded. Frequency and duration of
vocalizations were measured separately depending on their temporal location with
respect to the pointing gesture, as follows: vocalizations before pointing, vocalizations
during pointing, and vocalizations after pointing.
The phonological characterization of early infant vocalizations was assessed following
Karousou et al. (2006), who identified four speech patterns as the most relevant for
assessing the word-like properties of a vocalization. In that study, adult listeners
assessed the resemblance of infants’ vocalizations to adult-like words, and the results
showed that four main properties of the stimulus had a significant effect on the
listeners’ decisions. In our study, we coded the presence or absence of these properties,
as follows: (a) vocalic or consonantal quality (perceptual resemblance to target vowels
and consonants); (b) number of syllables (from 1 to 3 syllables); (c) prosodic patterns
(flattened patterns during the whole vocalization and repetitive patterns within
vocalization segments were considered non-positive); and (d) rhythmic patterns within a
single syllable (trochaic or iambic patterns were considered positive). Each parameter
was coded as present or absent (1 or 0), so that a composite phonological measure,
ranging from 0 to 4, was extracted for each experimental session.
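The composite measure can be illustrated with a short Python sketch. The class and field names are hypothetical, and the text does not say how per-vocalization codes were aggregated into a session-level value, so the mean used below is an assumption.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Vocalization:
    segmental_quality: bool  # (a) resembles target vowels/consonants
    syllable_count_ok: bool  # (b) between 1 and 3 syllables
    prosodic_pattern: bool   # (c) neither flat nor repetitive
    rhythmic_pattern: bool   # (d) trochaic or iambic

    def score(self) -> int:
        # Each property contributes 1 if present, 0 if absent (range 0-4).
        return sum([self.segmental_quality, self.syllable_count_ok,
                    self.prosodic_pattern, self.rhythmic_pattern])


def session_score(vocalizations):
    # Assumption: the session measure is the mean per-vocalization score.
    return mean(v.score() for v in vocalizations)


example = [
    Vocalization(True, True, True, True),
    Vocalization(True, False, True, False),
]
print(session_score(example))
```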
Gesture-speech alignment
First, the most prominent part of the pointing gesture (i.e., the apex) was coded as the
specific point in time where the arm (hand or finger) reached its maximum extension. In
deictic gestures, the apex tends to be realized somewhere in the middle of the stroke,
which is the interval of time of gesture prominence, just after the initiation phase and
before the retraction phase of the gesture (Esteve-Gibert & Prieto, under review, 2012).
Second, vocalizations were measured at three temporal points: the fundamental
frequency peak of the intonation contour, the onset of the vocalization, and its offset.
Gesture-speech temporal alignment was measured in relation to the apex, that is,
as the distance in milliseconds between each of these speech landmarks (the F0 peak,
which is the most prominent part of the speech, the onset, or the offset) and the most
prominent part of the gesture, namely the apex.
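The alignment measure amounts to three subtractions per pointing event, as in the Python sketch below. The class name, field names, sign convention, and time values are all hypothetical and are not taken from the study’s data.

```python
from dataclasses import dataclass


@dataclass
class PointingEvent:
    apex_ms: int        # time of maximum arm extension
    voc_onset_ms: int   # start of the vocalization
    f0_peak_ms: int     # fundamental frequency peak
    voc_offset_ms: int  # end of the vocalization


def alignment_distances(event):
    """Signed distance (ms) of each speech landmark from the gesture apex.

    Assumption: a negative value means the landmark precedes the apex.
    """
    return {
        "onset": event.voc_onset_ms - event.apex_ms,
        "f0_peak": event.f0_peak_ms - event.apex_ms,
        "offset": event.voc_offset_ms - event.apex_ms,
    }


# Hypothetical timings for one gesture-speech combination.
event = PointingEvent(apex_ms=21450, voc_onset_ms=21300,
                      f0_peak_ms=21500, voc_offset_ms=21800)
print(alignment_distances(event))
```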
Figure 2. Snapshot of the ELAN coding scheme. The ELAN template included the following
tiers: (1) trial condition, (2) communicative condition, (3) infant’s look sequence, (4) duration of
the first look at the object, (5) latency before pointing, (6) pointing performance, (7) number of
pointing occurrences, (8) temporal location of the apex of the gesture, (9) temporal location of
the intonation peak, (10) vocal placement in relation to gesture, (11) number of vocalizations,
and (12) word-like pattern of the vocalization.
Figure 3. Waveform, spectrogram, F0 contour and coding scheme of the target vocalization in
Praat. The Praat coding scheme included the following tiers: (1) segmental analysis, (2) ToBI
transcription, (3) vocalization’s word-like pattern, (4) location of the fundamental frequency
peak, and (5) duration of the vocalization.
[Figure 3 image: F0 contour (Hz, axis from 300 to 500) over a 0-0.5 s window; tier values shown: L+H* L%; 1111; 00’21.5; 00’21.3-00’21.8.]
3. Results
The main goal of this study was to analyze the early development of gesture-speech
combinations in relation to social factors, as well as the predictive value of different
linguistic cues. The results section is divided into four specific subsections, which
correspond to four different issues: (1) the development of gesture-speech
combinations; (2) the effects of social attention; (3) the description of early gesture-
speech temporal alignment; and (4) an analysis of the predictive value of early gesture-
speech combinations.
3.1. Development of gesture-speech combinations
Figure 4 shows the distribution (expressed in number of occurrences) of gesture-only,
speech-only, and gesture-speech combinations occurring at both longitudinal time points
(at 1;0 and 1;3), for the 4 children under analysis. The results of a chi-square test show a
significant difference in the distribution of utterance modalities between the first and
the second stages (χ²(2, N = 112) = 26.293, p < 0.05). As can be observed in the graph,
children show a great increase in the number of gesture-speech combinations at 1;3,
which suggests that gesture-speech coordination is more fully integrated in this later
period of speech development. Similarly, speech-only productions also increase from
1;0 to 1;3, indicating a better speech capability at 15 months of age. Finally, a drop in
frequency is documented for gesture-only utterances.
Esteve-Gibert & Prieto (under review, 2012) report a similar development of gesture-only
and gesture-speech utterances for Catalan-speaking children. Their results show a change
from a higher percentage of gesture-only
at 0;11 to a lower frequency at 1;3. Conversely, gesture-speech combinations increase at
the later age. Altogether, this evidence supports the idea of a transition from gesture-
only productions to gesture-speech combinations and a stabilization in the integration of
the two modalities, which seems to take place at 1;3 with gesture-speech combinations.
Figure 4. Distribution (expressed in number of occurrences) of gesture-only, speech-only, and
gesture-speech combinations occurring at 1;0 and 1;3 for the 4 children.
3.2. Effects of social attention
Previous results have shown that the communicative rapport of the adult will
significantly influence a child’s productions (e.g., Locke, 1997; Soltis, 2004;
Liszkowski et al., 2008; Tomasello, Carpenter & Liszkowski, 2007; Andrén, 2011). The
three graphs in Figure 5 show the distribution (expressed in number of occurrences) of
gesture-only, speech-only, and gesture-speech combinations produced by the 4 children
separated by communicative condition (baseline condition = top left graph, available
condition = top right graph, and non-available condition = bottom panel), at two ages.
The results of a chi-square test revealed significant effects of age on the modalities
used, for all communicative conditions (baseline conditions and the two experimental
conditions). Chi-square results are significant at p < 0.05 for every condition, namely
for the baseline condition (chi-square (2, N = 31) = 8.43), for the available condition
(chi-square (2, N = 36) = 8.72), and for the non-available condition (chi-square (2, N =
45) = 10.86). Another chi-square test was run to test the effect of communicative
condition on the distribution of the modalities, at both ages. No clear significant effect
of experimental condition was found. For the earlier age chi-square (4, N = 27) = 2.94, p
> 0.05, and for the later recording chi-square (4, N = 85) = 5.59, p > 0.05. Despite this,
the results in the graph show a clear increase in the number of children’s responses
when comparing the two experimental conditions (and especially the available
condition, AExp) to the baseline condition. The results reveal that children at 15 months
show an increase in the use of gesture-speech combinations and speech-only responses
in the two experimental conditions, that is, when the child has to make an effort to
attract the adult’s attention. So when the focus of the adult’s attention is different from
the child’s object of interest, as in trials within the available condition (adult looks at
infant but not at object) and the non-available condition (adult looks at neither infant nor
object), we observe an increase in the more complex production abilities (meaning
gesture-speech combinations), which are used to attract the adult’s attention, while
gesture-only productions are reduced. Therefore, the new ability of gesture-speech
combinations seems to be activated in order to attract the adult’s attention in more
adverse conditions to achieve the communicative goal.
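The significance levels reported in this section can be checked directly from the test statistics. The following is a minimal sketch, not the procedure used in the study: it relies on the standard closed-form upper-tail probability of the chi-square distribution for 2 and 4 degrees of freedom, and the function name chi2_sf and the labels are illustrative assumptions.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable.

    Only df = 2 and df = 4 are covered, because for these even
    degrees of freedom the tail has a simple closed form.
    """
    if x < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    half = x / 2.0
    if df == 2:
        return math.exp(-half)
    if df == 4:
        return math.exp(-half) * (1.0 + half)
    raise ValueError("this sketch only covers df = 2 and df = 4")


# (label, degrees of freedom, statistic) as reported in the text above
reported = [
    ("age effect, baseline condition", 2, 8.43),
    ("age effect, available condition", 2, 8.72),
    ("age effect, non-available condition", 2, 10.86),
    ("condition effect at 1;0", 4, 2.94),
    ("condition effect at 1;3", 4, 5.59),
]

p_values = {label: chi2_sf(stat, df) for label, df, stat in reported}

for label, p in p_values.items():
    verdict = "p < 0.05" if p < 0.05 else "p > 0.05"
    print(f"{label}: p = {p:.4f} ({verdict})")
```

Under these assumptions, the three age effects fall below the 0.05 threshold and the two condition effects do not, in line with the values stated above.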
Similarly, Liszkowski et al.’s (2008) results showed effects of age and condition on the
use of pointing. First, older children (1;6) showed better pointing abilities than children
at 1;0. Second, children showed an increase in the use of repairing strategies like point
repetitions and longer durations during the experimental conditions. Though our
database would need a larger sample in order to obtain significant results, a similar
tendency is seen in our data in this respect.
In conclusion, the results in this section clearly show that vocalizations in coordination
with gestures are used by children when both abilities are well integrated at 1;3, and that
this strong communicative resource is used by children at 1;3 to actively attract adults’
attention.
Figure 5. Distribution (expressed in number of occurrences) of gesture-only, speech-only, and
gesture-speech combinations separated by communicative condition (baseline condition = top
left graph, available condition = top right graph, and non-available condition = bottom panel), at
two ages.
3.3. Early speech-gesture temporal alignment
Are vocalizations temporally aligned with pointing gestures in children’s utterances?
Figure 6 shows a distribution analysis of the number of vocalizations uttered before,
during, and after pointing, at the two ages. First, the results show that children at 1;0
uttered fewer vocalizations than at 1;3. Importantly, vocalizations during pointing show
a greater frequency at 1;3 than vocalizations before and after pointing. Chi-square tests
revealed significant effects of age on vocalization production in relation to pointing
position (chi-square (2, N = 105) = 40.19, p < 0.01).
Figure 6. Distribution (expressed in number of occurrences) of the number of vocalizations
uttered before, during, and after pointing, at the two ages.
Figure 7 shows the distribution (expressed in number of occurrences) of the number of
vocalizations uttered before, during, and after pointing, separated by experimental
condition. The results of a chi-square test reveal significant effects of experimental
condition on the temporal alignment of vocalizations in relation to pointing position
(chi-square (4, N = 105) = 14.07, p < 0.01). As expected, both experimental conditions
(A and B) triggered a higher use of temporally aligned vocalizations, which are used by
children in order to achieve a more insistent intentional communication strategy when
the adult is not attentive.
Figure 7. Distribution (expressed in number of occurrences) of the number of vocalizations
uttered before, during, and after pointing, separated by experimental condition.
Now we will undertake a fine-grained analysis of temporal alignment among the
speech-gesture communicative modalities by taking into account the most prominent
part of the gesture (i.e., the gesture apex, the greatest extension of the gesture) and three
landmark points during the production of the vocalization, namely the onset of the
vocalization, the offset of the vocalization, and the intonation peak of the fundamental
frequency (F0). We compared the distance of each of the three vocalization acoustic
parameters to the apex, which was considered the point of reference for the alignment
analysis. The gesture-speech alignment analysis builds on the results of the term paper
written for the Phonology course within the same MA degree.
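The distance measure described above can be sketched as follows. This is an illustrative sketch only: the class and function names are assumptions, the timestamps are hypothetical values chosen for the example, and the sign convention (landmark time minus apex time, so that negative values mean the landmark precedes the apex) follows the description of Figure 8.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GestureSpeechCombination:
    """Hypothetical container for the four coded time points, in seconds."""
    apex_s: float      # gesture apex (point of reference)
    onset_s: float     # onset of the vocalization
    offset_s: float    # offset of the vocalization
    f0_peak_s: float   # intonation (F0) peak


def distances_to_apex_ms(c: GestureSpeechCombination) -> dict:
    """Signed distance of each vocalization landmark to the apex, in ms.

    Negative: the landmark occurs before the apex.
    Positive: the landmark occurs after the apex.
    """
    return {
        "onset": round((c.onset_s - c.apex_s) * 1000),
        "f0_peak": round((c.f0_peak_s - c.apex_s) * 1000),
        "offset": round((c.offset_s - c.apex_s) * 1000),
    }


# Hypothetical example values, not taken from the corpus.
example = GestureSpeechCombination(
    apex_s=21.7, onset_s=21.3, offset_s=21.8, f0_peak_s=21.5
)
distances = distances_to_apex_ms(example)
print(distances)
```

In this hypothetical example the onset and the F0 peak precede the apex (negative values), while the offset follows it (positive value).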
The three box plots in Figure 8 show the mean temporal distance (in ms) between the
apex of the pointing gesture and the onset of the vocalization (top left panel), the apex
of the pointing gesture and the offset of the vocalization (top right panel), and the apex
of the pointing gesture and the intonational peak of the vocalization (bottom panel).
Three outliers were excluded from data analysis. The black line crossing the “0” value
represents the temporal moment of gesture apex production. Negative values (<0)
represent cases in which vocalization landmarks occur before the apex of the gesture,
while positive values (>0) represent cases in which vocalization landmarks occur after
the apex of the gesture. By comparing the results in the three box plots we see that the
F0 peak usually occurs before the apex of the gesture (in 30 of the total 37 instances
seen in the sample).
Figure 8. Box plots of the mean temporal distance (in ms) between the apex of the pointing
gesture and the onset of the vocalization (top left panel), the apex of the pointing gesture and the
offset of the vocalization (top right panel), and the apex of the pointing gesture and the
intonational peak of the vocalization (bottom panel). Negative values (<0) represent cases in
which vocalization landmarks occur before the apex of the gesture, while positive values (>0)
represent cases in which vocalization landmarks occur after the apex of the gesture.
The average peak-apex distance is around 300 ms (mean peak-apex distance = -306 ms;
SD = 523 ms). As expected, the start and end points of the vocalization were
produced before and after the intonation F0 peak, respectively (mean onset-apex
distance = -1578 ms, SD = 4839 ms; mean offset-apex distance = -212 ms,
SD = 5129 ms). The distances between both the onset and offset of the vocalization
and the apex of the pointing gesture show larger standard deviations than the distance
between the F0 peak and the apex, meaning that these synchronization patterns are more
variable than the peak-apex alignment. Thus, children seem to accurately align speech
and gesture, and the F0 peak seems to be an anchor point for the alignment of the apex
(both occur within around 0.5 seconds of each other).
In sum, our small data sample has first revealed effects of both age and experimental
condition on how vocalizations are temporally aligned with the pointing gesture,
consistent with Liszkowski et al. (2008). The results show that children use
communicative strategies more efficiently during experimental conditions than in the
baseline conditions in which the adult actively shares attention to the object with the
child.
3.4. Language acquisition perspective on gesture-speech combinations
In this section, we perform a subject-by-subject qualitative analysis of the potential
predictors (or language emergence parameters) of the individual capability of each child
to communicate with gesture-speech combinations at 1;3. As later vocabulary measures
at 1;6 have not yet been taken, a qualitative overview of children’s communicative
abilities is performed in this section. The following production parameters related to
communication have been taken into account: communicative modality (gesture-only,
speech-only, gesture-speech combinations), phonological development measures, hand-
motor abilities, and social gaze. We then assess the predictive value of each.
Figure 9. Distribution (expressed in number of occurrences) of gesture-only, speech-only, and
gesture-speech combinations separated by participants (4 participants, x-axis) and age (12
months = top panel, 15 months = bottom panel).
The two plots in Figure 9 show the distribution (expressed in number of occurrences) of
gesture-only, speech-only, and gesture-speech combinations separated by participants (4
participants, x-axis) and age (12 months = top panel, 15 months = bottom panel).
Though all children show a higher frequency of gesture-speech combinations at 1;3 than
at 1;0, each child shows a different performance. Participant 1 shows the highest rate of
gesture-speech responses at 1;3, which seems to correlate with a high rate of gesture-
only productions at the earlier age. Participant 3 shows the same pattern but at a lower
frequency. Thus, for participants 1 and 3 the presence of gesture-only productions at 12
months seems to influence later higher rates of gesture-speech combinations. A
different production pattern is shown by participant 4. Though this child used a high
number of vocalizations at 1;0, this ability is not transformed at a later age into an
increase in gesture-speech combinations. Finally, participant 2 develops communicative
abilities at a slower rate, as no responses were obtained at 1;0 and this child only catches up
with participant 2 at 1;3. Thus, qualitative measures for these 4 children suggest
that the appearance of early gesture-only responses can be a predictor of
gesture-speech combinations, which is taken as a measure of general communicative
success.
Vocalizations were assessed by analyzing the four phonological parameters proposed by
Karousou et al. (2006). These are perceptual measures that help in the assessment of
the phonological development of vocalizations. Each vocalization was coded for the
presence vs. absence of the following four parameters: adult-like articulation, adult-like
syllabic properties, target-like intonation, and target-like rhythm (see the Methods
section). For example, if an utterance was adult-like in all four parameters it obtained a
score of 4.
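The scoring scheme just described (one point per parameter present, for a maximum of 4) can be illustrated with a short sketch. The class, field, and function names are assumptions introduced for illustration, and the coded vocalizations in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class VocalizationCoding:
    """Presence (True) or absence (False) of each of the four parameters."""
    adult_like_articulation: bool
    adult_like_syllabic_properties: bool
    target_like_intonation: bool
    target_like_rhythm: bool

    def score(self) -> int:
        """Word-like score: number of parameters present (0 to 4)."""
        return sum((
            self.adult_like_articulation,
            self.adult_like_syllabic_properties,
            self.target_like_intonation,
            self.target_like_rhythm,
        ))


def mean_score(codings: Sequence[VocalizationCoding]) -> float:
    """Mean word-like score over a participant's vocalizations."""
    if not codings:
        return 0.0
    return sum(c.score() for c in codings) / len(codings)


# Hypothetical codings, for illustration only.
fully_adult_like = VocalizationCoding(True, True, True, True)
prosody_only = VocalizationCoding(False, False, True, True)
print(fully_adult_like.score(), prosody_only.score())
print(mean_score([fully_adult_like, prosody_only]))
```

Returning 0.0 for a participant with no vocalizations is a design choice of this sketch, not something stated in the text.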
Table 1 shows the mean score of word-like vocalization parameters for each participant,
at both ages. The results show again clear subject-by-subject differences. Participant 1
obtained the highest scores at both ages, which coincides with his better communicative
abilities (see Figure 9). By contrast, participant 2 obtained the lowest vocalization
means, which seems to be consistent with his speech and gesture frequency rate (see
Figure 9). Participant 3 seems to have a stable phonological rate of development at 1;3,
while the last child, participant 4, developed at a faster rate from 1;0 to 1;3. Curiously,
participant 4’s speech abilities and phonological measures seem to follow different
paths, as this child showed a very talkative pattern during the earliest age (at 1;0), so
one might expect a higher phonological measure of vocalizations. What this means is
that participant 4 used a lot of vocalizations with a lower measure of phonological
complexity parameters than those children who did not show as many speech-only
productions. More interestingly, the pace of phonological development of participant 4
(1:2.3) increased at a faster rate than that seen in the children with higher combination
abilities (around 1:1.09).
Therefore, while the phonological parameter scores of participants 1 and 2 clearly
correlate with their development of communicative gesture-speech combinations, this is
not the case with participants 3 and 4. On the other hand, participants 1 and 3 seem to
develop their phonological abilities at the same gradual rate, while the second child’s
rate is slower and the fourth child’s is the highest of the four.
Mean word-like vocalization measures
Participant      12 months   15 months
Participant 1    3.5         3.8
Participant 2    0.0         0.14
Participant 3    2.0         2.2
Participant 4    1.5         3.54
Table 1. Mean score of word-like vocalization measures (following Karousou et al., 2006) for
each participant, at both ages.
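The pace of phonological development discussed above can be approximated from Table 1 as the ratio between the mean score at 15 months and the mean score at 12 months. This is a sketch under that assumption (the text does not state how the ratios were computed), and the names used are illustrative.

```python
from typing import Optional

# Mean word-like scores from Table 1: participant -> (12 months, 15 months)
table_1 = {
    1: (3.5, 3.8),
    2: (0.0, 0.14),
    3: (2.0, 2.2),
    4: (1.5, 3.54),
}


def growth_ratio(early: float, late: float) -> Optional[float]:
    """Ratio of the later mean score to the earlier one.

    Returns None when the earlier score is 0, since the ratio is undefined.
    """
    if early == 0:
        return None
    return late / early


ratios = {p: growth_ratio(early, late) for p, (early, late) in table_1.items()}

for participant, ratio in ratios.items():
    if ratio is None:
        print(f"Participant {participant}: ratio undefined (score of 0 at 12 months)")
    else:
        print(f"Participant {participant}: 1:{ratio:.2f}")
```

Computed this way, participants 1 and 3 come out close to 1:1.1 and participant 4 above 1:2.3, which roughly matches the ratios quoted in the text.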
To sum up our results thus far, our small corpus yields interesting findings about the
non-predictive value of the number of vocalizations and phonological development. First,
more frequent use of vocalization-only does not seem to correlate with higher
phonological parameter scores. Thus, the fact that they produce many vocalizations
does not mean that children will have attained a more complex phonological ability.
Instead, the use of integrated gesture-speech combinations seems to work in favor of
phonological development.
Finally, two more correlates of intentional communication were taken into account for
descriptive analysis: the pointing effector system and the infant’s gaze patterns. Firstly,
the hand configuration of the pointing gesture was coded as (a) pointing with extended
index finger or (b) pointing with the hand with the palm downwards. An infant’s motor
development follows a well-known pattern in which motor abilities are acquired from proximal to
distal parts of the body, that is, from the trunk to the extremities, so we predicted that
the use of the index finger would be correlated with a better communicative
performance. Figure 10 shows the distribution (expressed in number of occurrences) of
the hand configuration types in all pointing gestures produced by children at 15 months,
separated by participants (4 participants, x-axis).
Figure 10. Distribution (expressed in number of occurrences) of pointing types in all pointing
gestures produced by children at 15 months, separated by participants (4 participants, x-axis).
Figure 11. Distribution (expressed in number of occurrences) of the look sequences used by
children at 12 months (upper panel) and at 15 months (bottom panel), separated by participants
(4 participants, x-axis).
The ability to share attention with the adult through an object is another important
precursor of the emergence of intentional communication, and therefore the infant’s
ability to establish eye contact with the adult to direct attention to the object was also
measured. The hypothesis is that the infant who solely looked at the target object (as if
to say “Wow! There is something there!”) could have lower communication abilities
(less communicative intention) than an infant with dyadic look sequences, that is, a look
at the object and a look at the adult (“Have you seen what I have just seen?”) or even
than a child with a triadic look sequence, that is, they look at the object of reference,
look at the adult, then look once more at the object to attract the adult’s attention (“Are
you going to see what I am seeing?”), which could be interpreted as a greater intentional
ability. The two graphs in Figure 11 show the distribution (expressed in number of
occurrences) of the look sequences at 12 months (upper panel) and 15 months (lower
panel) for each participant (4 participants, x-axis).
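The three look-sequence types described above can be made explicit with a small classifier. This is a hypothetical sketch: the labels "object" and "adult", the category names, and the decision to merge consecutive looks at the same target are assumptions for illustration, not the coding procedure used in ELAN.

```python
from typing import Sequence


def classify_look_sequence(looks: Sequence[str]) -> str:
    """Classify an ordered sequence of gaze targets ("object" or "adult").

    object-only: the infant looks only at the object
    dyadic:      object, then adult
    triadic:     object, then adult, then object again
    other:       any sequence not covered by the three types above
    """
    # Merge consecutive looks at the same target into a single look.
    collapsed = [t for i, t in enumerate(looks) if i == 0 or t != looks[i - 1]]

    if collapsed == ["object"]:
        return "object-only"
    if collapsed == ["object", "adult"]:
        return "dyadic"
    if collapsed[:3] == ["object", "adult", "object"]:
        return "triadic"
    return "other"


print(classify_look_sequence(["object"]))
print(classify_look_sequence(["object", "adult"]))
print(classify_look_sequence(["object", "adult", "object"]))
```

Sequences that start with a look at the adult, or that are empty, fall into the "other" category in this sketch.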
The look sequence results at 1;0 show that while all the infants are able to establish
complex sharing look sequences with the adult, participants 3 and 4 have a greater
ability than participant 1, who performed better in all the other parameters mentioned
above. Probably this is due to the fact that participant 1 uses more complex abilities
(gesture-only and gesture-speech combinations) to attract the adult’s attention.
Participant 2 frequently uses more look-at-object sequences with less interest or
intention to share the object of attention. Interestingly, by 1;3 all the children use
triadic looks more frequently, which seems to be correlated with the more frequent
appearance of gesture-speech combinations. Participant 1 shows a higher rate of triadic
looks, which correlates with a greater use of gesture-speech combinations. On the
other hand, participants 2 and 3 also show a frequent use of triadic looking patterns.
In sum, look sequences may function as an important early strategy for intentional
communication, but once more complex abilities are applied with this objective each
infant follows his or her own pattern of development.
4. Discussion and conclusions
A sample of four infants’ intentional utterances was obtained in an experimental
context which was designed, following Liszkowski et al.’s (2008) procedure, to elicit
declarative pointing gestures and which simulated several patterns of social interaction
with the adult. The children spontaneously used a broad sample of their gesture and
speech communicative modalities in order to attract an adult’s attention to the objects of
interest, which were puppets, a dancing pig and a light. Similarly to Liszkowski et al.
(2008), we found effects of age and experimental condition on the pointing and
communicative behavior of children at 1;0 and 1;3. Children tended to use more
complex communicative abilities in the experimental conditions, that is, when the adult
did not attend to either the stimulus object or the child. Though more data is needed to
further test these conclusions, these results already point in interesting directions.
First, our results showed the more frequent use of gesture-only productions at 1;0 and a
more frequent use of gesture in combination with speech at 1;3 (as in Esteve-Gibert &
Prieto, under review, 2012). This supports the conclusion reported by Butcher &
Goldin-Meadow (2000) that children seem to develop the ability to fully integrate
gesture and speech around their one-word period. In addition, this process of integrating
gesture with speech may have a functional effect on language emergence.
Also, by 1;3, children seem to acquire the synchronous use of gesture and speech to
express meaning. Thus, as they grow up, children progressively deploy more complex
abilities, such as gesture-speech combinations, and learn to do so in more demanding
situations such as those simulated by the experimental conditions described here, when
they seek to call the attention of an inattentive adult.
A total of 37 gesture-speech combinations were closely analyzed to describe temporal
alignment by measuring the temporal distance between the prominent part of the
gesture and both the endpoints of the vocalization (the onset and the offset) and the highest
intonation peak (F0 peak). The results indicate that children at 1;3 show a
clear ability to temporally align the prominent part of the pointing gesture (i.e., the
apex) with the prominent parts of speech (separated by a mean distance of 0.5 seconds in
our study, though Esteve-Gibert & Prieto (under review, 2012) report shorter times).
Further research on these issues is needed to further explore the relationship between
these parameters and communicative development.
Finally, a qualitative description of the children’s ability to combine both modalities at
1;3 is related to other aspects of phonological development. The appearance
of gesture-only pointing responses, motor abilities like finger articulation and early gaze
sharing with the adult seem to be related to the child’s potential ability to integrate
gesture and speech. Children with better alignment abilities also performed more
complex look sequences and had finer motor abilities. Early phonological measures
seem to work in parallel. However, accurate speech production can be relatively
independent of intentional communication.
In conclusion, while strong generalizations cannot be drawn on the basis of such a small
sample, the general patterns we have seen in terms of communicative development
parameters show an interesting degree of correlation. A set of productive parameters
during early infant intentional interactions (early pointing patterns, alignment of
gesture-speech patterns, pointing configurations, and look sequences) seems to favor the
subsequent appearance of gesture-speech combinations. Even more interestingly, the
expression of both communicative modalities through early temporal alignment may be
a significant step towards the integrated use of different language components, which in
turn could be a key factor in later vocabulary development.
5. References
1. ANDRÉN, M. (2011) “The organization of children’s pointing stroke
endpoints” in STAM, G. & ISHINO, M. (eds.) Integrating Gestures: The
interdisciplinary nature of gesture. John Benjamins, 153-162.
2. BATES, E. & DICK, F. (2002). “Language, gesture, and the developing
brain” in Developmental Psychobiology 40: 293-310.
3. BAVIN, E.L. – PRIOR, M. – REILLY, S. – BRETHERTON, L. – WILLIAMS,
J. – EADIE, P. – BARRETT, Y. – UKOUMUNNE, O.C. (2008). “The Early
Language in Victoria Study: predicting vocabulary at age one and two years from
gesture and object use” in Journal of Child Language 35: 687-701.
4. BOERSMA, P. & WEENINK, D. (2012). Praat: doing phonetics by computer
[Computer program]. Version 5.3.04, retrieved 12 January 2012 from
http://www.praat.org/.
5. BROOKS, R. & MELTZOFF, A.N. (2008). “Infant gaze following and pointing
predict accelerated vocabulary growth through two years of age: a longitudinal,
growth curve modeling study” in Journal of Child Language 35: 207-220.
6. BUTCHER, C. & GOLDIN-MEADOW, S. (2000). “Gesture and the transition
from one-to-two word speech: when hand and mouth come together” in
MCNEILL, D. (ed.). Language and gesture. New York: Cambridge University
Press, 235-258.
7. CASELLI, M.C. – RINALDI, P. – STEFANINI, S. – VOLTERRA, V. (2012).
“Early action and gesture vocabulary and its relation with word comprehension
and production” in Child Development 83: 526-542.
8. COLONNESI, C. – STAMS, G.J. – KOSTER, I. – NOOM, M.J. (2010). “The
relation between pointing and language development: a meta-analysis” in
Developmental Review 30: 352-366.
9. ESTEVE-GIBERT, N. & PRIETO, P. (2012). “Prosody signals the emergence
of intentional communication in the first year of life: evidence from Catalan-
babbling infants” in Journal of Child Language 00: 1-26.
10. ESTEVE-GIBERT, N. & PRIETO, P. (under review, 2012). “The early
development of gesture-speech combinations and their temporal coordination”,
Journal of Speech, Language and Hearing Research.
11. HOMAE, F. – WATANABE, H. – NAKANO, T. – ASAKAWA, K. – TAGA,
G. (2006) “The right hemisphere of sleeping infant perceives sentential
prosody” in Neuroscience Research 4: 276-280.
12. LAUSBERG, H. & SLOETJES, H. (2009). “Coding gestural behavior with the
NEUROGES-ELAN system” in Behavior Research Methods, Instruments, &
Computers 41(3): 841-849.
13. LEVITT, A.G. (1993). “The acquisition of prosody: Evidence from French- and
English-learning infants” in Haskins Laboratories Status Report on Speech
Research 113: 41-50.
14. LISZKOWSKI, U. – ALBRECHT, K. – CARPENTER, M. – TOMASELLO,
M. (2008). “Infants’ visual and auditory communication when a partner is or is
not visually attending” in Infant Behavior & Development 31: 157-167.
15. LOCKE, J.L. (1997). “A theory of neurolinguistic development” in Brain and
Language 58: 265-326.
16. LÓPEZ-ORNAT, S. & GALLEGO, C. (2005). Inventario del desarrollo
MacArthur. Madrid. TEA.
17. MAMPE, B. – FRIEDERICI, A.D. – CHRISTOPHE, A. – WERMKE, K.
(2009). “Newborns’ cry melody is shaped by their native language” in Current
Biology 19: 1994-1997.
18. MCNEILL, D. (1992). Hand and Mind. Chicago: The University of Chicago Press.
19. ÖZÇALIŞKAN, Ş. & GOLDIN-MEADOW, S. (2005). “Gesture is at the
cutting edge of early language development” in Cognition 96: 101-113.
20. PARLADÉ, M.V. & IVERSON, J.M. (2011) “The interplay between language,
gesture, and affect during communicative transition: A dynamic systems
approach” in Developmental Psychology 47: 820-836.
21. PRIETO, P. – ESTRELLA, A. – THORSON, J. – VANRELL, M.M. (2011). “Is
prosodic development correlated with grammatical and lexical development?
Evidence from emerging intonation in Catalan and Spanish,” in Journal of Child
Language 10: 1-37.
22. ROWE, M.L. & GOLDIN-MEADOW, S. (2009). “Early gesture selectively
predicts later language development” in Developmental Science 12: 182-187.
23. SOLTIS, J. (2004) “The signal functions of early infant crying” in Behavioral
and Brain Sciences 27: 443-490.
24. SAKKALOU, E. & GATTIS, M. (2012). “Infants infer intentions from
prosody” in Cognitive Development 27: 1-16.
25. SHUKLA, M. – WHITE, K.S. – ASLIN, R.N. (2010) “Prosody guides the rapid
mapping of auditory word forms onto visual objects in 6-month-old infants” in
Psychological and Cognitive Sciences 1-6.
26. SNOW, D. (2002). “Regression and reorganization of intonation between 6 and
23 months” in Child Development 77: 281-296.
27. TOMASELLO, M. – CARPENTER, M. – LISZKOWSKI, U. (2007) “A new
look at infant pointing” in Child Development 3:705-722.