Máster en Fonética y Fonología

Postgrado en Estudios Fónicos

CSIC-UIMP

2012

EARLY DEVELOPMENT OF MULTIMODAL

COMMUNICATIVE STRATEGIES

Submitted by Alfonso Igualada Pérez

Supervised by Dr. Pilar Prieto Vives

Co-supervised by Dr. Laura Bosch Galcerán

INDEX

1. Acknowledgements

2. Resumen / Abstract

3. Introduction

4. Methods

5. Results

6. Discussion and conclusions

7. References

ACKNOWLEDGEMENTS

I am grateful to my colleagues at ADAPEI-ASPRONA, an early intervention centre in

Albacete, and also to my friends and family for their help with recruiting families,

securing a working space, assisting with the experiments, and preparing materials.

Special thanks go to the three colleagues who assisted me in the

experimental sessions, and to the joyful company of Fonfonero’s group. I would really

like to thank the infants and families who participated in the experimental sessions.

Many thanks also to all Estudios Fónicos and GrEP members for their warm

welcome, particularly to Núria Esteve-Gibert for her help in the coding and analysis

procedures, to Joan Borràs-Comes for technical support, and to Santiago Gonzalez for

support. Special regards to the master’s director, Juana Gil, and to my tutors, Laura Bosch and

Pilar Prieto for their indispensable guidance. I am really thankful to all of them.

Resumen

In the present pilot study we investigate the interactions of gesture and speech during

early communicative development. An experimental task based on

Liszkowski et al. (2008) was used to obtain a wide range of infant communicative

productions. Infant communicative patterns such as vocalizations, prosody,

gesture, and gaze were analyzed to study early multimodal integration and

its relationship with language emergence. Although robust generalizations cannot

be drawn from such a small sample, the qualitative analysis did yield

interesting results. Combined gesture-speech productions

seem to be fully integrated at the age of 1;3, and are used as a resource to attract the

adult’s social attention in demanding communicative contexts. In addition, it was

found that the appearance of a set of productive parameters during the earliest

interactions (early pointing production, the temporal alignment

of gesture and speech, hand configurations during pointing, and

the infant’s gaze sequences) favors the later appearance of

gesture-speech combinations. One of the main results of this pilot study

was that the use of gesture-speech combinations can be considered an

important step towards the early integrated use of language

components. Thus, pointing in combination with early vocalizations could be

considered a productive parameter of intentional communication, in which

semantic, pragmatic, and phonological information are integrated for the first time

in infant development.

Abstract

In the current pilot study we investigated the patterns of gesture-speech integration

during early communicative development. An experimental task based on Liszkowski et

al. (2008) was used to obtain a sample of communicative productions from four children at

two early longitudinal time points, namely at 1;0 and 1;3. Infants’ communicative patterns

of vocalization, prosody, gesture, and gaze were analyzed to study early multimodal

integration, and its relationship with language emergence. While strong generalizations

cannot be drawn on the basis of such a small sample, some interesting results

emerge from the qualitative analysis. Gesture-speech productions seem to be well

integrated at 1;3, and seem to work as a strong communicative resource to attract the

adult’s social attention in demanding communicative contexts. Also, the presence of a

set of productive parameters during early infant intentional interactions (early pointing

production, gesture-speech temporal alignment, hand configurations of pointing, and

infant look sequences) seems to correlate with the subsequent appearance of

gesture-speech combinations. One of the main results of the study was that the use of

gesture-speech combinations may be considered a significant step towards the early

integrated use of language components. That is, pointing in combination with early

vocalizations may be a significant early signal of intentional communication, in which

semantic, pragmatic and phonological information are integrated for the first time in

development.

1. Introduction

Research on early gesture acquisition and its relationship to language emergence has

shown strong evidence that the early appearance of iconic and deictic gestures predicts

early language development. For example, Colonnesi et al. (2010) found that children’s

comprehension of pointing at 1;0 contributed to comprehension of other children’s actions

at 3;3. In the same study, the authors found a strong correlation between pointing gestures

and language development. Specifically, the appearance of deictic gestures with a

declarative function predicted infant verbal language to a greater degree than the

appearance of imperative deictic gestures. Similarly, some studies reveal the predictive

value of pointing gestures in early vocabulary development. Bavin et al. (2008) found

that children’s gesture and object use at 0;8 and 1;0 predicted vocabulary development

at 1;0 and 2;0. In Caselli et al. (2012), infants’ early actions and gestures correlated

with receptive vocabulary in the 0;8 to 1;6 age range, indicating a transition to

productive vocabulary. Similarly, Özçaliskan & Goldin-Meadow (2005) and Rowe &

Goldin-Meadow (2009) recorded infants at home during daily communicative activities.

They found that communicative gesture, and specifically the use and function of deictic

gestures, was able to predict both lexical and grammatical development. While isolated

pointing (i.e., the variety of object types pointed to) seemed to predict the size of the

child’s vocabulary later in development, the frequency of pointing in gesture-speech

combinations (e.g., pointing to a cookie and then saying “give”) had predictive value for

the later appearance of two-word combinations. It thus seems that in the early stages of

linguistic and cognitive development, motor abilities develop early to signal intentional

communication, an ability that seems to predict and enhance verbal language

emergence.

It is also well known that social interaction plays a significant role in the development of

communicative skills. For example, interaction routines between infants and adults, like

early shared attention, early vocal games, or informative pointing, allow infants to

participate in patterns of social interaction similar to those used in adult communication

(e.g., Locke, 1997; Soltis, 2004; Liszkowski et al., 2008; Tomasello, Carpenter &

Liszkowski, 2007). Vocabulary development in relation to sharing attention to an object

has been studied by Brooks & Meltzoff (2008). In this study a gaze-following task was

used, meaning that the time spent looking at an object when attention was directed to

that object by an adult was measured. The authors found that longer looking times at ages

0;10 and 0;11 correlated with better language development measures at 2;0. More

interestingly, the communicative integration of looking time and pointing was an even better

predictor of later vocabulary performance, so infants with long and short looking times

in combination with pointing had better results than children who solely looked or

pointed. Though sharing interaction with reference to an object is a basic milestone in

the acquisition of intentional communication, Liszkowski et al. (2008) note that infants’

communicative behaviors are also affected by adults’ attention patterns to the object of

reference. They found that the adult’s attention influenced a child’s productions, as the

child used more complex production abilities when the experimenter did not look at the

object pointed to (experimental condition A) as opposed to when he looked at neither

the object nor the child (experimental condition B). These repair strategies to attract

the adult’s attention occurred significantly only during experimental conditions, not

during communicative conditions (base condition), in which the experimenter shared

attention with the child by looking at the child’s eyes and then at the object, thus actively

encouraging the child to look. By the same token, it has been found that adults’

behavior changes depending on the child’s. Andrén (2011) found that parents responded

with elaborate responses during children’s sustained gesture strokes, and that children also

adapted their responses depending on their parents’ actions. Therefore, the emergence

of language through multimodal communication seems to be strongly related to social

interaction, and findings on the acquisition of deictic gestures and eye contact and gaze

patterns reveal their importance for the development of social-communicative skills.

In addition, some studies have highlighted the relationship between the acquisition of

prosody and language development. First, early neurological activation in response to

prosodic information was found at 0;3 by Homae et al. (2006). Shukla et al. (2010)

found evidence of the effect of prosody on language learning at 0;6. In this study, it was

found that infants were better at associating the sounds of continuous speech with their

corresponding object representation when statistical information and prosodic

information were aligned, and had worse word-learning results when prosodic

boundaries and the segmental level were misaligned. Other studies reveal the important role

of prosody in the expression of intentional communication in early stages of acquisition,

before vocabulary and early grammatical developments are achieved (Levitt, 1993;

Mampe et al., 2009; Prieto et al., 2011; Esteve-Gibert & Prieto, 2012; Sakkalou &

Gattis, 2012). On the other hand, children’s prosodic advances seem to co-occur with the

development of language skills and other general cognitive abilities (Snow, 2002). This

author describes two different stages in which intonation changes from a period of

stable development to a discontinuous decrease in intonational ability and finally to an

increase, a U-shaped pattern described by a regression-reorganization model. The first stage

can be seen at 0;10, a period which is associated with the development of intentionality

and pragmatics, while the second occurs at 1;6, an age associated with the development

of expressive syntax. Of particular relevance to the present study, Parladé & Iverson

(2011) found a similar communicative transition in speech-gesture coordination in

relation to the pace of vocabulary development, so that children seemed to go through a

decline in gesture-speech coordination ability when going through a vocabulary

burst stage, revealing the interplay between pointing gestures, speech, and affective

gestures. This study found that children exhibited worse gesture-speech coordination

performance during the vocabulary burst period, characterized by a sharp increase in

active vocabulary, than during periods characterized by a gradual increase in

vocabulary. Yet little is known about the interaction between gesture, prosody, and

social interaction and their predictive value in language acquisition.

Gesture-speech integration happens at the beginning of one-word acquisition, as

children seem to develop an interesting ability to temporally synchronize gesture and

speech (for example, when a child points at a book while saying the word “book” to

convey the same information in both modalities) (Butcher & Goldin-Meadow, 2000). Prior to this

achievement, children may have experimented with coordination of body and oral

movements. For example, around 0;6-0;8 canonical babbling occurs with rhythmic hand

movements (e.g., waving), and even functional development follows a path that parallels

those of gesture and speech, so that at 0;8-0;10 word comprehension develops,

deictic gestures emerge, gestural routines appear, and the first tool use emerges, all at the

same time (Bates & Dick, 2002). As McNeill (1992) noted, gestures and speech are

synchronized in adults not only at the semantic level, since they express the same

meanings, but also at the pragmatic level, since they perform the same pragmatic

functions, and also at the phonological level, because the most prominent part of the

gesture seems to be integrated with phonology. Esteve-Gibert & Prieto (in press) studied

adults’ speech-gesture coordination and found that prosody aligns with the apex (the

point of maximum extension during the stroke phase) of the pointing gesture. More

precisely, their results reveal that the intonation peak (the most prominent part of the

intonation contour) as well as the beginning and end of verbal production are anchored

temporally to the apex of the pointing gesture, regardless of variation in the metrical pattern

of words.

Focusing on the development of speech-gesture temporal synchronization, to our

knowledge only Butcher & Goldin-Meadow (2000) and Esteve-Gibert & Prieto (under

review, 2012) have investigated speech-gesture alignment in an infant population. Both

investigations found that the two modalities develop together during the one-word period

and that the most prominent part of the gesture (the stroke) temporally aligns with the most

prominent part of the speech. This means that children around 1;0 have developed the

ability to fine-tune two communication modalities. Esteve-Gibert & Prieto (under

review, 2012) use prosodically prominent acoustic cues such as the onset and offset of

vocalization and the fundamental frequency peak in order to assess gesture-speech

alignment in children aged from 0;11 to 1;7. Their results reveal that children at 0;11 do

temporally align speech and gesture, but it is not until productive words emerge that the

two modalities align as in adults. This early behavior shows the integration of the

three linguistic levels, since the two channels (gesture and vocalization) express

redundant information with the same meaning and pragmatic intention, and their

most prominent parts are temporally aligned. This convergence could be a

potential predictor of vocabulary emergence and thus language development. However,

little is known about the integration of pointing and prosody in particular, and further

research is needed on how this coordination develops and whether it is a predictor of

language development.

The aim of this study is to analyze the early stages of language and communication

development in four Spanish children, who were recorded at two points of their

language development, at ages 1;0 and 1;3. Each infant’s communicative interactions

will be assessed, as will the extent to which these patterns predict the infant’s later

linguistic development. The analysis of data will focus on the correlation between the

communicative patterns of vocalization, prosody, gesture, and gaze (all present before

the infant’s first year), and the early signs of language acquisition (i.e., the development

of gesture-speech combinations and lexical development). The main goal will be to

assess the use of speech patterns, pointing gestures, and gaze patterns and explore their

predictive value for infants’ later language development.

The methodology used is based on the elicitation task of Liszkowski et al. (2008), which

elicits speech-gesture combinations in a social communicative context with infants in a

controlled setting. This paradigm allowed us to obtain a wide behavioral sample of

communicative exchanges between the child and the adult in different social conditions.

The procedure motivates the child to initiate communication by means of a pointing

gesture and deploy his or her repertoire of communicative strategies in order to direct

the adult’s attention to a stimulus which has appeared from behind the experimenter.

2. Methods

2.1. Participants

Six typically developing children participated in the study. Two of them had to be

excluded from analysis because of oral habits which interrupted pointing activity

(dysfunctional finger sucking and teething). The four children analyzed (1 girl,

3 boys) were recorded at two longitudinal time points, the first recording taking place at

around 12 months (mean = 12;12; range = 11;23-12;27) and the second recording three

months later. All of the infants were recruited from public nurseries in Albacete

(Castilla-La Mancha, Spain) from monolingual Spanish families that had expressed

interest in participating in the study. A small stipend was given to the parents upon

conclusion of the experiment. All families were included because, when initially contacted,

they reported that their infant had already begun to point at objects.

2.2. Experimental setting and materials

The recordings always took place in a 2.5 m x 5 m distractor-free testing room at the

ADAPEI-ASPRONA Early Intervention Center for children with developmental

disabilities in Albacete. The experimental setting was based on Liszkowski et al.

(2008), as follows. An opaque white cloth screen hid the middle of the back

of the room from view, and the infant sat on his or her caregiver’s lap in the middle of the room

facing the screen at a distance of 2 m. Through a large opening in the upper

center of the screen a camera recorded the child’s reactions, and a second camera was

placed at the back of the room in such a way that it could record the use of stimuli and

the experimenter’s facial reactions. The screen had four openings at each side through

which the puppets were made visible to the child, one at a time. These openings (six of

them 60 cm and two of them 100 cm from the floor) were symmetrically positioned at

about 45°, 30° (2×) and 25° left and right from the infant’s midline. The puppets were

manipulated by an assistant behind the screen. A total of ten stimuli were used. Eight

were hand puppets (cat, frog, cow, chicken, sun, snail, grandmother, and mouth),

and the remaining two, which were visible on the floor, were an electronically

activated dancing pig and a light. The two electronic stimuli were positioned on the

floor in front of the screen at approximately 30° to the infant’s left and right (see Fig.

1). A movable bead toy and a pair of infant books were used between conditions to

return the infant’s attention to the experimenter; the large bead toy was

attached to the small table. The spoken words which served as lexical stimuli were all

chosen from López-Ornat et al. (2005), the Spanish version of the MacArthur inventory

of communicative development, from the vocabulary items for children aged 8 to 15

months. The white screen was situated behind the experimenter, who was seated on a

small chair. The child was seated on his or her caregiver’s lap facing the experimenter

and panel. A small table was placed between the experimenter and the caregiver with

his or her child, who were seated in a higher chair to facilitate video recording. The

caregiver wore a pair of earphones with music to distract him or her from the activity. A total of

three assistants helped with the task, all of whom were specialists in education or

rehabilitation and had been trained in the procedure used in this study.

Figure 1. Experimental setting.

2.3. Procedure

The procedure of Liszkowski et al. (2008) was used to elicit infant communicative behavior

(communicative gestures, vocalizations, and gaze sharing) through an enjoyable

activity, in this case watching puppets. The experimenter facilitated the child’s pointing

gestures by reacting to his or her communicative behavior in three different ways, each

reflecting a different social condition. In the most communicative condition (which will

be called the base condition or BaseCond) the experimenter established interaction with

the child in a communicative way when the child initiated pointing. In the other two

experimental conditions the experimenter’s attention was directed at the child but not

the stimulus (available condition) or his attention was directed at neither the child nor

the stimulus (non-available condition). Results from previous research suggested that

the latter two conditions would trigger greater communicative involvement on the part

of the child.

For each experimental session, the procedure was as follows. First, caregivers were

informed about the experiment’s procedure, permission to record was obtained, and

instructions for the task were given. Warm-up time before the experiment consisted of

extensive play between the experimenter and the infant in a different room in order for

the child to feel at ease with the experimenter. In the meantime, caregivers were brought

to the experiment room and instructed that they must not initiate any communicative

behavior toward the infant during testing or look at the screen at any time. Rather,

they were encouraged to sit calmly, looking at their child and listening to music through

headphones. The caregiver was asked to gently hold the child in place on his or her lap to

keep the child’s position constant and to minimize the child’s

stress during the experiment.

The experimental session began in the testing room with a short play period with the

bead toy on the table to keep the infant interested in the experimenter as a social partner,

though this toy was only used at the beginning of the segments of the experiment

involving communicative conditions, i.e., the base condition. When the experimenter

judged that the infant was relaxed and attentive, he gradually withdrew from the

interaction and signaled to the assistant behind the panel by means of snapping his

fingers out of the child’s sight that puppet stimuli could be activated. The assistant

always waved puppets one at a time from side to side and front to back within different

openings in the panel, silently, while looking through one of the holes to indicate when

the child pointed. For each stimulus, the experimenter snapped his fingers as soon as the child

initiated pointing. The child had 10 seconds within which to initiate the gesture. If the

child pointed within this time the stimulus continued (i.e., the puppet continued to be

visible) for 10 more seconds or until the infant was uninterested. If the child did not

initiate a pointing gesture, however, the stimulus was withdrawn after the first 10 seconds. In all

cases, the experimenter indicated by clucking his tongue when it was time for the

puppet to be withdrawn.
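
As an illustration only, the timing rule described above can be sketched in Python. The function name and its arguments are hypothetical, and the sketch assumes that the 10 additional seconds are counted from the onset of the child’s first point; in the study itself, withdrawal was signaled by the experimenter rather than computed.

```python
from typing import Optional

RESPONSE_WINDOW_S = 10.0  # time allowed to initiate a point (from the text)
EXTENSION_S = 10.0        # extra exposure after a point (from the text)


def stimulus_withdrawal_time(point_onset_s: Optional[float],
                             lost_interest_s: Optional[float] = None) -> float:
    """Return the time, in seconds from stimulus onset, at which the puppet
    would be withdrawn under the rule sketched here (hypothetical helper)."""
    # No point within the response window: withdraw after the first 10 s.
    if point_onset_s is None or point_onset_s > RESPONSE_WINDOW_S:
        return RESPONSE_WINDOW_S
    # Assumption: the extension is counted from the onset of the point.
    end = point_onset_s + EXTENSION_S
    # The stimulus may end earlier if the infant loses interest after pointing.
    if lost_interest_s is not None and point_onset_s <= lost_interest_s < end:
        return lost_interest_s
    return end
```

For instance, under these assumptions a point initiated 4 seconds after stimulus onset would keep the puppet visible until second 14, unless the infant lost interest earlier.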

The first trial was always in the communicative condition (BaseCond), i.e., when the

stimulus was activated the experimenter looked at the infant and ignored the stimulus

until the infant pointed to it, and then the experimenter reacted immediately and shared

attention for the ensuing 10 seconds, that is, the experimenter repeatedly looked back

and forth between the stimulus and the infant’s face, talking excitedly about the

stimulus and commenting on the fact that they were seeing it together. For example, the

experimenter would say something like: “Oh, it’s a cat! Look how he’s saying hi to

you!” Then, 10 seconds after the infant’s first point the stimulus was withdrawn and the

trial was over. Following the first trial the experimenter shared a book activity until the

child was relaxed and attentive, then gradually withdrew the activity, and indicated to

the assistant with a finger click to activate the next stimulus, which could correspond

either to the available or the non-available condition. In both experimental conditions,

when the child pointed, the experimenter responded to the child by saying things like

“Hmm? What? What’s there? Hmm?” Thus the experimenter’s focus of attention

changed depending on the condition. While in the available condition trials (AExp) the

experimenter ignored the stimulus but looked at the infant, in the non-available

condition trials (BExp) the experimenter ignored both the stimulus and the child and

looked at the book.

Every session consisted of a sequence of a base condition and two experimental

conditions repeated five times (3 conditions x 5 times = total of 15 trials). Trial

sequences followed two orders counterbalanced in terms of experimental condition

(BaseCond-AExp-BExp or BaseCond-BExp-AExp) and two orders counterbalanced in

terms of the side of appearance of the first stimulus (starting by the right or by the left

side). Five stimuli had to appear twice to complete a total of 15 trials in every session;

the order of stimulus appearance was randomly chosen by the assistant. The

experimental sessions lasted an average of 12 minutes.
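
The structure of a session (3 conditions x 5 repetitions = 15 trials, with 10 stimuli of which 5 appear twice) can be sketched as follows. This is only a sketch under stated assumptions: the second condition order (`ORDER_2`) is assumed to be BaseCond-BExp-AExp, the seeded shuffle stands in for the assistant’s own random choice, and the side of appearance of the first stimulus is not modeled. The function and variable names are hypothetical.

```python
import random

# Condition labels follow the text; ORDER_2 is an assumed alternative order.
ORDER_1 = ("BaseCond", "AExp", "BExp")
ORDER_2 = ("BaseCond", "BExp", "AExp")

# The ten stimuli listed in the description of the materials.
STIMULI = ["cat", "frog", "cow", "chicken", "sun", "snail",
           "grandmother", "mouth", "dancing pig", "light"]


def build_session(order, rng, n_blocks=5):
    """Return a list of (condition, stimulus) pairs for one session."""
    # Repeat the three-condition sequence five times: 15 trials in total.
    conditions = list(order) * n_blocks
    # Five stimuli must appear twice to fill the 15 trials.
    repeated = rng.sample(STIMULI, len(conditions) - len(STIMULI))
    stimuli = STIMULI + repeated
    # Stand-in for the assistant's random choice of stimulus order.
    rng.shuffle(stimuli)
    return list(zip(conditions, stimuli))


session = build_session(ORDER_1, random.Random(2012))
for number, (condition, stimulus) in enumerate(session, start=1):
    print(number, condition, stimulus)
```

Note that this sketch does not prevent the same stimulus from appearing in two consecutive trials; whether the assistant avoided such repetitions is not stated.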

2.4. Coding

Coding was performed with ELAN software (Lausberg & Sloetjes, 2009) for complex

audio and video annotations. Acoustic analysis was done with Praat (Boersma &

Weenink, 2009) and then imported back into ELAN. Coding of multiple infant

audiovisual behavior measures was based on various authors (McNeill, 1992;

Liszkowski, et al., 2008; Brooks & Meltzoff, 2008; Cartmill, et al., 2012; Esteve-Gibert

& Prieto, under review, 2012) and included multimodal cues related to the

communicative modality of each production, and different looking, pointing, and speech aspects, as detailed

below. Measures were assessed separately for baseline and

experimental conditions.

Communicative modality and gaze patterns

Communicative modality was coded and included three options, namely, gesture-only,

speech-only, and gesture-speech combinations. The child’s sequence of looks at the

stimulus object and the experimenter was coded with four options, namely: (a) when the child just

looked at the object; (b) when the child looked at the object and then at the

experimenter; (c) when the child shared object attention with the experimenter in a

triadic sequence (object-experimenter-object); and (d) when the child repeated the triadic

sequence more than once. Following Brooks & Meltzoff (2008), the

duration of the first look at the object was measured.
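
The four look-sequence options can be illustrated with a small classifier. This is a minimal sketch, not the coding procedure itself (which was carried out manually in ELAN): the function name and the string labels are hypothetical, and the handling of an incomplete second triad (object-experimenter-object-experimenter, counted here as option c) is an assumption.

```python
from itertools import groupby


def classify_look_sequence(targets):
    """Map an ordered list of gaze targets ("object" / "experimenter")
    to one of the four look-sequence options (a-d), or None if uncodable."""
    # Collapse consecutive looks at the same target into one.
    seq = [target for target, _ in groupby(targets)]
    # Sequences must start at the object and contain only the two targets.
    if not seq or seq[0] != "object":
        return None
    if any(target not in ("object", "experimenter") for target in seq):
        return None
    # Each return to the object completes one triadic sequence.
    returns_to_object = seq[1:].count("object")
    if returns_to_object >= 2:
        return "d"  # triadic sequence repeated more than once
    if returns_to_object == 1:
        return "c"  # object-experimenter-object
    if len(seq) == 2:
        return "b"  # object, then experimenter
    return "a"      # object only


print(classify_look_sequence(["object", "experimenter", "object"]))  # c
```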

Pointing gesture analysis

Only instances of pointing at the correct stimulus were coded, while other

communicative gestures (e.g., showing palms to express “give me”) were not taken into

account. The number and duration of pointing gestures were coded in terms of (a) the

number of trials in which infants pointed once (trials with one point), and (b) the

number of trials in which infants pointed more than once (point repetitions). Pointing

latency was also measured; this consisted of the time interval between the infants’ first

look at the stimulus and their first pointing gesture.
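
A minimal sketch of how these pointing measures could be derived from annotated trial times is given below. The dictionary keys and the function name are hypothetical, and the example values are invented for illustration; they are not data from the study.

```python
def summarize_pointing(trials):
    """Count one-point trials and point-repetition trials, and compute the
    pointing latency (first point onset minus first look onset) per trial."""
    one_point = 0
    repetitions = 0
    latencies = []
    for trial in trials:
        onsets = sorted(trial["point_onsets_s"])
        if not onsets:
            continue  # no pointing in this trial
        if len(onsets) == 1:
            one_point += 1
        else:
            repetitions += 1
        latencies.append(onsets[0] - trial["first_look_s"])
    return {"one_point": one_point,
            "repetitions": repetitions,
            "latencies_s": latencies}


# Invented example values, in seconds from stimulus onset.
example_trials = [
    {"first_look_s": 1.0, "point_onsets_s": [2.5]},
    {"first_look_s": 0.5, "point_onsets_s": [1.0, 4.0]},
    {"first_look_s": 0.25, "point_onsets_s": []},
]
print(summarize_pointing(example_trials))
```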

Following Brooks & Meltzoff (2008), Liszkowski et al. (2008), and Cartmill et al.

(2012), the hand configuration of the pointing gesture was coded as either (a) pointing

with extended finger or (b) pointing with the whole hand, palm downwards. Pointing

performance was also coded according to how far the arm was extended (either

fully or bent), and which arm was used to point (right or left).

Speech analysis

The infant’s communicative vocalizations were coded when they were directed at the

experimenter, while the experimenter’s verbal stimulus, and the infant’s shouting,

laughing, fussing, or vegetative sounds were excluded. Frequency and duration of

vocalizations were measured separately depending on their temporal location with

respect to the pointing gesture, as follows: vocalizations before pointing, vocalizations

during pointing, and vocalizations after pointing.
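
The three temporal locations can be made explicit with a short sketch. The text does not state how partially overlapping vocalizations were treated, so the rule used here (any temporal overlap with the gesture counts as "during") is an assumption, and the function name is hypothetical.

```python
def vocalization_location(voc_onset, voc_offset, point_onset, point_offset):
    """Classify a vocalization as occurring before, during, or after a
    pointing gesture, given onset and offset times on a common timeline."""
    if voc_onset > voc_offset or point_onset > point_offset:
        raise ValueError("onset must not be later than offset")
    # Assumption: a vocalization that ends before the gesture starts is "before".
    if voc_offset <= point_onset:
        return "before"
    # Assumption: a vocalization that starts after the gesture ends is "after".
    if voc_onset >= point_offset:
        return "after"
    # Any temporal overlap with the gesture is treated as "during".
    return "during"


print(vocalization_location(2.0, 2.6, 1.8, 3.1))  # during
```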

The phonological characterization of early infant vocalization was assessed following

Karousou et al. (2006), in which four combinatory speech patterns were considered the

most relevant to assess word-like vocalization properties. In that study, adult listeners

assessed the resemblance of infants’ vocalizations to adult-like words. The results of the

study showed that four main properties of the stimulus have a significant effect on the

listeners’ decisions. In our study, we coded the presence or absence of each of

these properties in a binary way, as follows: (a) vocalic or consonantal quality (perceptual

resemblance to target vowels and consonants); (b) number of syllables (from 1 to 3

syllables); (c) prosodic patterns (flattened patterns during the whole vocalization and

repetitive patterns within vocalization segments were considered non-positive); and (d)

rhythmic patterns within a single syllable (trochaic or iambic were considered a positive

pattern). Each parameter was coded as present or absent (1 or 0), so that a composite

phonological measure was extracted for each experimental session, which could range

from 0 to 4.
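
The composite measure can be sketched as the sum of the four binary codes. The four-character code string (e.g., "1111") mirrors the word-like pattern annotation, but the function names are hypothetical, and averaging the scores of all vocalizations in a session is an assumption: the text does not specify how the session-level value was aggregated.

```python
def score_from_code(code):
    """Sum a four-character binary code (one digit per property:
    segmental quality, syllable number, prosodic pattern, rhythmic pattern)."""
    if len(code) != 4 or any(digit not in "01" for digit in code):
        raise ValueError("expected four binary digits, e.g. '1011'")
    return sum(int(digit) for digit in code)


def session_mean(codes):
    """Assumed aggregation: mean composite score over a session's vocalizations."""
    scores = [score_from_code(code) for code in codes]
    return sum(scores) / len(scores)


print(score_from_code("1111"))         # 4
print(session_mean(["1111", "1010"]))  # 3.0
```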

Gesture-speech alignment

First, the most prominent part of the pointing gesture (i.e., the apex) was coded as the

specific point in time where the arm (hand or finger) reached its maximum extension. In

deictic gestures, the apex tends to be realized somewhere in the middle of the stroke,

which is the interval of time of gesture prominence, just after the initiation phase and

before the retraction phase of the gesture (Esteve-Gibert & Prieto, under review, 2012).

Second, vocalizations were measured at three temporal moments: the fundamental

frequency peak of the intonation contour, the onset of the vocalization, and its offset.

Gesture-speech temporal alignment was measured in relation to the apex point, that is,

as the distance in milliseconds between the speech’s most prominent part or other

temporal points (the peak of F0, the onset, or the offset) and the most prominent part of

the gesture, namely the apex.
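
The alignment measure can be illustrated as a set of signed distances from the apex. In this sketch the sign convention (negative values mean that the speech landmark precedes the apex) is an assumption, the function name is hypothetical, and the example times are invented for illustration only.

```python
def alignment_ms(apex_ms, onset_ms, f0_peak_ms, offset_ms):
    """Return the distance in milliseconds between each speech landmark
    (vocalization onset, F0 peak, vocalization offset) and the gesture apex."""
    if not onset_ms <= f0_peak_ms <= offset_ms:
        raise ValueError("expected onset <= F0 peak <= offset")
    # Assumed convention: landmark time minus apex time.
    return {
        "onset": onset_ms - apex_ms,
        "f0_peak": f0_peak_ms - apex_ms,
        "offset": offset_ms - apex_ms,
    }


# Invented example: apex at 21400 ms on the recording timeline.
print(alignment_ms(apex_ms=21400, onset_ms=21300,
                   f0_peak_ms=21500, offset_ms=21800))
```

With this convention, a vocalization that starts before the apex and ends after it yields a negative onset distance and a positive offset distance.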

Figure 2. Snapshot of the ELAN coding scheme. The ELAN template included the following

tiers: (1) trial condition, (2) communicative condition, (3) infant’s look sequence, (4) duration of

the first look at the object, (5) latency before pointing, (6) pointing performance, (7) number of

pointing occurrences, (8) temporal location of the apex of the gesture, (9) temporal location of

the intonation peak, (10) vocal placement in relation to gesture, (11) number of vocalizations,

and (12) word-like pattern of the vocalization.

Figure 3. Waveform, spectrogram, F0 contour and coding scheme of the target vocalization in

Praat. The Praat coding scheme included the following tiers: (1) segmental analysis, (2) ToBI

transcription, (3) vocalization’s word-like pattern, (4) location of the fundamental frequency

peak, and (5) duration of the vocalization.

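[Figure 3: F0 (Hz) axis, 300-500; time axis, 0-0.5; tier values: L+H* L% (ToBI transcription), 1111 (word-like pattern), 00’21.5 (fundamental frequency peak), 00’21.3-00’21.8 (duration of the vocalization)]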

3. Results

The main goal of this study was to analyze the early development of gesture-speech

combinations in relation to social factors, as well as the predictive value of different

linguistic cues. The results section is divided into four specific subsections, which

correspond to four different issues: (1) the development of gesture-speech

combinations; (2) the effects of social attention; (3) the description of early gesture-

speech temporal alignment; and (4) an analysis of the predictive value of early gesture-

speech combinations.

3.1. Development of gesture-speech combinations

Figure 4 shows the distribution (expressed in number of occurrences) of gesture-only,

speech-only, and gesture-speech combinations occurring at both longitudinal moments

(at 1;0 and 1;3), for the 4 children under analysis. The results of a chi-square test show a

significant difference in the distribution of utterance modalities between the first and

the second stages (chi-square (2, N = 112) = 26.293, p < 0.05). As can be observed in the graph,

children show a great increase in the number of gesture-speech combinations at 1;3, so

integrated gesture-speech coordination is more fully accomplished in this later period of

speech development. Similarly, speech-only productions also increase from 1;0 to 1;3,

indicating a better speech capability at 15 months of age. Finally, a frequency drop is

documented in gesture-only utterances.
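
The chi-square test of independence reported above can be sketched in a few lines. This is a minimal illustration that uses only the standard library; the counts are hypothetical (they are not the study's data) and the function name is an assumption. For a table with 2 ages and 3 modalities, the degrees of freedom are (2 - 1) x (3 - 1) = 2, as in the reported test.

```python
def chi_square(table):
    """Pearson chi-square statistic, degrees of freedom and N for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of age and modality.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, n


# Hypothetical counts, NOT the study's data.
# Rows: age (1;0, 1;3); columns: gesture-only, speech-only, gesture-speech.
counts = [[10, 5, 5],
          [5, 10, 25]]
stat, df, n = chi_square(counts)
```

With 2 degrees of freedom, a statistic above 5.991 corresponds to p < 0.05.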

Esteve-Gibert & Prieto (under review, 2012) report a similar development of gesture-only and

gesture-speech utterances for Catalan-speaking children. Their results show a change from a

higher percentage of gesture-only productions at 0;11 to a lower frequency at 1;3. Conversely,

gesture-speech combinations increase at the later age. Altogether, this evidence supports the

idea of a transition from gesture-only productions to gesture-speech combinations, with the

integration of the two modalities seeming to stabilize at around 1;3.


Figure 4. Distribution (expressed in number of occurrences) of gesture-only, speech-only, and

gesture-speech combinations occurring at 1;0 and 1;3 for the 4 children.

3.2. Effects of social attention

Previous studies have shown that the communicative rapport of the adult

significantly influences a child’s productions (e.g., Locke, 1997; Soltis, 2004;

Tomasello, Carpenter & Liszkowski, 2007; Liszkowski et al., 2008; Andrén, 2011). The

three graphs in Figure 5 show the distribution (expressed in number of occurrences) of

gesture-only, speech-only, and gesture-speech combinations produced by the 4 children

separated by communicative condition (baseline condition = top left graph, available

condition = top right graph, and non-available condition = bottom panel), at two ages.

The results of a chi-square test revealed significant effects of age on the modalities

used, for all communicative conditions (the baseline condition and the two experimental

conditions). Chi-square results are significant at p < 0.05 for every condition, namely

for the baseline condition (chi-square (2, N = 31) = 8.43), for the available condition

(chi-square (2, N = 36) = 8.72), and for the non-available condition (chi-square (2, N =

45) = 10.86). Another chi-square test was run to test the effect of communicative

condition on the distribution of the modalities, at both ages. No significant effect

of experimental condition was found: for the earlier age, chi-square (4, N = 27) = 2.94, p

> 0.05, and for the later recording, chi-square (4, N = 85) = 5.59, p > 0.05. Despite this,

the graphs show an increase in the number of children’s responses

in the two experimental conditions (especially the available

condition, AExp) compared to the baseline condition. The results reveal that children at 15 months

show an increase in the use of gesture-speech combinations and speech-only responses

in the two experimental conditions, that is, when the child has to make an effort to

attract the adult’s attention. So when the focus of the adult’s attention is different from

the child’s object of interest, as in trials within the available condition (adult looks at

infant but not at object) and the non-available condition (adult looks at neither infant nor

object), we observe an increase in the more complex production abilities (meaning

gesture-speech combinations), which are used to attract the adult’s attention, while

gesture-only productions are reduced. Therefore, the new ability to produce gesture-speech

combinations seems to be recruited in order to attract the adult’s attention in more

adverse conditions and thus achieve the communicative goal.

Similarly, Liszkowski et al.’s (2008) results showed effects of age and condition on the

use of pointing. First, older children (1;6) showed better pointing abilities than children

at 1;0. Second, children showed an increase in the use of repairing strategies like point

repetitions and longer durations during the experimental conditions. Though our

database would need a larger sample in order to obtain significant results, a similar

tendency is seen in our data in this respect.

In conclusion, the results in this section clearly show that vocalizations in coordination

with gestures are used by children when both abilities are well integrated at 1;3, and that

this strong communicative resource is used by children at 1;3 to actively attract adults’

attention.



Figure 5. Distribution (expressed in number of occurrences) of gesture-only, speech-only, and

gesture-speech combinations separated by communicative condition (baseline condition = top

left graph, available condition = top right graph, and non-available condition = bottom panel), at

two ages.

3.3. Early speech-gesture temporal alignment

Are vocalizations temporally aligned with pointing gestures in children’s utterances?

Figure 6 shows a distribution analysis of the number of vocalizations uttered before,

during, and after pointing, at the two ages. First, the results show that children at 1;0

uttered fewer vocalizations than at 1;3. Importantly, vocalizations during pointing show

a greater frequency at 1;3 than vocalizations before and after pointing. Chi-square tests


revealed significant effects of age on vocalization production in relation to pointing

position (chi-square (2, N = 105) = 40.19, p < 0.01).

Figure 6. Distribution (expressed in number of occurrences) of the number of vocalizations

uttered before, during, and after pointing, at the two ages.

Figure 7 shows the distribution (expressed in number of occurrences) of the number of

vocalizations uttered before, during, and after pointing, separated by experimental

condition. The results of a chi-square test reveal significant effects of experimental

condition on the temporal alignment of vocalizations in relation to pointing position

(chi-square (4, N = 105) = 14.07, p < 0.01). As expected, both experimental conditions

(A and B) triggered a higher use of temporally aligned vocalizations, which are used by

children in order to achieve a more insistent intentional communication strategy when

the adult is not attentive.


Figure 7. Distribution (expressed in number of occurrences) of the number of vocalizations

uttered before, during, and after pointing, separated by experimental condition.

Now we will undertake a fine-grained analysis of temporal alignment among the

speech-gesture communicative modalities by taking into account the most prominent

part of the gesture (i.e., the gesture apex, the greatest extension of the gesture) and three

landmark points during the production of the vocalization, namely the onset of the

vocalization, the offset of the vocalization, and the intonation peak of the fundamental

frequency (F0). We compared the distance of each of the three vocalization acoustic

parameters to the apex, which was considered the point of reference for the alignment

analysis. The gesture-speech alignment analysis builds on the results of a term paper in

Phonology written within the same MA degree.

The three box plots in Figure 8 show the mean temporal distance (in ms) between the

apex of the pointing gesture and the onset of the vocalization (top left panel), the apex

of the pointing gesture and the offset of the vocalization (top right panel), and the apex

of the pointing gesture and the intonational peak of the vocalization (bottom panel).

Three outliers were excluded from data analysis. The black line crossing the “0” value

represents the temporal moment of gesture apex production. Negative values (<0)

represent cases in which vocalization landmarks occur before the apex of the gesture,


while positive values (>0) represent cases in which vocalization landmarks occur after

the apex of the gesture. By comparing the results in the three box plots we see that the

F0 peak is usually produced before the apex of the gesture (in 30 of the total 37 instances

seen in the sample).

Figure 8. Box plots of the mean temporal distance (in ms) between the apex of the pointing

gesture and the onset of the vocalization (top left panel), the apex of the pointing gesture and the

offset of the vocalization (top right panel), and the apex of the pointing gesture and the

intonational peak of the vocalization (bottom panel). Negative values (<0) represent cases in

which vocalization landmarks occur before the apex of the gesture, while positive values (>0)

represent cases in which vocalization landmarks occur after the apex of the gesture.


The average peak-apex distance is around 300 ms (mean peak-apex distance = -306 ms;

SD = 523 ms). As expected, the start and end points of the vocalization were

produced before and after the intonation F0 peak, respectively (mean onset-apex

distance = -1578 ms, SD = 4839 ms; mean offset-apex distance = -212 ms,

SD = 5129 ms). The distances between the onset and offset of the vocalization

and the apex of the pointing gesture show larger standard deviations than the distance

between the F0 peak and the apex, meaning that these synchronization patterns are more

variable than the peak-apex alignment. Thus, children seem to align speech

and gesture fairly accurately, and the F0 peak seems to be an anchor point for the alignment of the apex

(both occur within around 0.5 seconds of each other).
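
The descriptive statistics reported above (mean, standard deviation, and the number of cases in which a landmark precedes the apex) can be sketched as follows. The values are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics


def summarize(distances_ms):
    """Mean, sample SD and count of negative (pre-apex) distances, in ms."""
    return {
        "mean": statistics.mean(distances_ms),
        "sd": statistics.stdev(distances_ms),  # assumption: sample SD (n - 1)
        "n_before_apex": sum(1 for d in distances_ms if d < 0),
        "n": len(distances_ms),
    }


# Hypothetical peak-apex distances, NOT the study's data.
peak_apex = [-500, -300, -100, 100]
summary = summarize(peak_apex)
```

Running the same summary on the onset-apex and offset-apex distances allows the standard deviations of the three landmarks to be compared directly.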

In sum, our small data sample has first revealed effects of both age and experimental

condition on how vocalizations are temporally aligned with the pointing gesture,

consistent with Liszkowski et al. (2008). The results show that children use

communicative strategies more efficiently during experimental conditions than in the

baseline conditions in which the adult actively shares attention to the object with the

child.

3.4. Language acquisition perspective on gesture-speech combinations

In this section, we perform a subject-by-subject qualitative analysis of the potential

predictors (or language emergence parameters) of the individual capability of each child

to communicate with gesture-speech combinations at 1;3. As later vocabulary measures

at 1;6 have not yet been taken, a qualitative overview of children’s communicative

abilities is performed in this section. The following production parameters related to

communication have been taken into account: communicative modality (gesture-only,

speech-only, gesture-speech combinations), phonological development measures, hand-

motor abilities, and social gaze. We then assess the predictive value of each.



Figure 9. Distribution (expressed in number of occurrences) of gesture-only, speech-only, and

gesture-speech combinations separated by participants (4 participants, x-axis) and age (12

months = top panel, 15 months = bottom panel).


The two plots in Figure 9 show the distribution (expressed in number of occurrences) of

gesture-only, speech-only, and gesture-speech combinations separated by participants (4

participants, x-axis) and age (12 months = top panel, 15 months = bottom panel).

Though all children show a higher frequency of gesture-speech combinations at 1;3 than

at 1;0, each child shows a different performance. Participant 1 shows the highest rate of

gesture-speech responses at 1;3, which seems to correlate with a high rate of gesture-

only productions at the earlier age. Participant 3 shows the same pattern but at a lower

frequency. Thus, for participants 1 and 3 the presence of gesture-only productions at 12

months seems to influence later higher rates of gesture-speech combinations. A

different production pattern is shown by participant 4. Though this child used a high

number of vocalizations at 1;0, this ability is not transformed at a later age into an

increase in gesture-speech combinations. Finally, participant 2 develops communicative

abilities at a slower rate, as no responses were obtained at 1;0 and this child only catches up

at 1;3. Thus, qualitative measures for these 4 children seem to suggest

that the appearance of early gesture-only responses can be a predictor of

gesture-speech combinations, which are taken as a measure of general communicative

success.

Vocalizations were assessed by analyzing the four phonological parameters proposed by

Karousou et al. (2006). These are perceptual measures that help in the assessment of

the phonological development of vocalizations. Each vocalization was coded for the

presence vs. absence of the following four parameters: adult-like articulation, adult-like

syllabic properties, target-like intonation, and target-like rhythm (see the Methods

section). For example, if an utterance was adult-like in all four parameters it obtained a

score of 4.
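
The scoring just described can be sketched as a sum of four presence/absence codes. This is a minimal illustration; the parameter keys and function names are hypothetical labels for the four parameters listed above, and the example codings are invented.

```python
# Hypothetical keys standing for the four coded parameters.
PARAMETERS = ("articulation", "syllabic", "intonation", "rhythm")


def word_like_score(coding):
    """Number of parameters coded as present (0 to 4) for one vocalization."""
    return sum(1 for p in PARAMETERS if coding[p])


def mean_score(codings):
    """Mean word-like score over a participant's vocalizations."""
    return sum(word_like_score(c) for c in codings) / len(codings)


# Invented codings, for illustration only.
fully_adult_like = {"articulation": True, "syllabic": True,
                    "intonation": True, "rhythm": True}
prosody_only = {"articulation": False, "syllabic": False,
                "intonation": True, "rhythm": True}
participant_mean = mean_score([fully_adult_like, prosody_only])
```

Averaging these scores per participant and age gives values on the same 0 to 4 scale as the means reported in this section.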

Table 1 shows the mean score of word-like vocalization parameters for each participant,

at both ages. The results show again clear subject-by-subject differences. Participant 1

obtained the highest scores at both ages, which coincides with his better communicative

abilities (see Figure 9). By contrast, participant 2 obtained the lowest vocalization

means, which seems to be consistent with his speech and gesture frequency rate (see

Figure 9). Participant 3 seems to have a stable phonological rate of development at 1;3,

while the last child, participant 4, developed at a faster rate from 1;0 to 1;3. Curiously,

participant 4’s speech abilities and phonological measures seem to follow different


paths, as this child showed a very talkative pattern during the earliest age (at 1;0), so

one might expect a higher phonological measure of vocalizations. What this means is

that participant 4 used a lot of vocalizations with a lower measure of phonological

complexity parameters than those children who did not show as many speech-only

productions. More interestingly, the pace of phonological development of participant 4

(1:2.3) increased at a faster rate than that seen in the children with higher combination

abilities (around 1:1.09).

Therefore, while the phonological parameter scores of participants 1 and 2 clearly

correlate with their development of communicative gesture-speech combinations, this is

not the case with participants 3 and 4. On the other hand, participants 1 and 3 seem to

develop their phonological abilities at the same gradual rate, while the second child’s

rate is slower and the fourth child’s is the fastest of the four.

Mean word-like vocalization measures

                 12 months    15 months

Participant 1    3.5          3.8




Table 1. Mean score of word-like vocalization measures (following Karousou et al., 2006) for

each participant, at both ages.
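
The developmental ratios quoted in the text (for example, around 1:1.09) appear to be obtained by dividing the mean score at 15 months by the mean score at 12 months. This reading is an assumption; it is checked here only against the Participant 1 values in Table 1, and the function name is hypothetical.

```python
def growth_ratio(score_12m, score_15m):
    """Ratio of the 15-month mean score to the 12-month mean score."""
    return score_15m / score_12m


# Participant 1 values from Table 1 (3.5 at 12 months, 3.8 at 15 months).
ratio_p1 = round(growth_ratio(3.5, 3.8), 2)
```

Under this assumption, Participant 1's ratio rounds to 1.09, which matches the figure given for the children with higher combination abilities.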

To sum up our results thus far, our small corpus reveals interesting results about the

limited predictive value of the number of vocalizations for phonological development. First,

more frequent use of speech-only productions does not seem to correlate with higher

phonological parameter scores. Thus, the fact that children produce many vocalizations

does not mean that they will have attained a more complex phonological ability.

Instead, the use of integrated gesture-speech combinations seems to work in favor of

phonological development.


Finally, two more correlates of intentional communication were taken into account for

descriptive analysis: the hand configuration used in pointing and the infant’s gaze patterns. First,

the hand configuration of the pointing gesture was coded as (a) pointing with extended

index finger or (b) pointing with the whole hand with the palm downwards. An infant’s motor

development follows a well-known pattern in which motor abilities are acquired from proximal to

distal parts of the body, that is, from the trunk to the extremities, so we predicted that

the use of the index finger would be correlated with a better communicative

performance. Figure 10 shows the distribution (expressed in number of occurrences) of

the hand configuration types in all pointing gestures produced by children at 15 months,

separated by participants (4 participants, x-axis).

Figure 10. Distribution (expressed in number of occurrences) of pointing types in all pointing

gestures produced by children at 15 months, separated by participants (4 participants, x-axis).


Figure 11. Distribution (expressed in number of occurrences) of the look sequences used by

children at 12 months (upper panel) and at 15 months (bottom panel), separated by participants

(4 participants, x-axis).


The ability to share attention with the adult through an object is another important

precursor of the emergence of intentional communication, and therefore the infant’s

ability to establish eye contact with the adult to direct attention to the object was also

measured. The hypothesis is that an infant who solely looked at the target object (as if

to say “Wow! There is something there!”) could have lower communication abilities

(less communicative intention) than an infant with dyadic look sequences, that is, a look

at the object and a look at the adult (“Have you seen what I have just seen?”), or

than a child with a triadic look sequence, that is, a look at the object of reference,

a look at the adult, then another look at the object to attract the adult’s attention (“Are

you going to see what I am seeing?”), which could be interpreted as a greater intentional

ability. The two graphs in Figure 11 show the distribution (expressed in number of

occurrences) of the look sequences at 12 months (upper panel) and 15 months (lower

panel) for each participant (4 participants, x-axis).
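
The three look-sequence types described above can be sketched as a simple classifier over an ordered list of gaze targets. This is an illustration only: the labels "object" and "adult", the function name, and the decision to inspect only the first looks of a trial are assumptions, not the original coding procedure.

```python
def classify_look_sequence(looks):
    """Classify an ordered list of gaze targets ('object' or 'adult').

    Assumption: only the initial looks of the trial determine the type.
    """
    if looks[:3] == ["object", "adult", "object"]:
        return "triadic"      # object, adult, then object again
    if looks[:2] == ["object", "adult"]:
        return "dyadic"       # object, then adult
    if looks and all(target == "object" for target in looks):
        return "object-only"  # the infant never looks at the adult
    return "other"


# Invented gaze sequences, for illustration only.
examples = [
    classify_look_sequence(["object"]),
    classify_look_sequence(["object", "adult"]),
    classify_look_sequence(["object", "adult", "object"]),
]
```

Tallying these labels per participant and age produces the kind of distribution plotted in Figure 11.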

The look sequence results at 1;0 show that while all the infants are able to establish

complex sharing look sequences with the adult, participants 3 and 4 have a greater

ability than participant 1, who performed better in all the other parameters mentioned

above. Probably this is due to the fact that participant 1 uses more complex abilities

(gesture-only and gesture-speech combinations) to attract the adult’s attention.

Participant 2 more frequently uses look-at-object sequences, with less apparent interest in or

intention of sharing the object of attention. Interestingly, by 1;3 all the children use

triadic looks more frequently, which seems to be correlated with the more frequent

appearance of gesture-speech combinations. Participant 1 shows a higher rate of triadic

looks, which correlates with a greater use of gesture-speech combinations. On the

other hand, participants 2 and 3 also show a frequent use of triadic looking patterns.

In sum, look sequences may function as an important early strategy for intentional

communication, but once more complex abilities are applied with this objective each

infant follows his or her own pattern of development.


4. Discussion and conclusions

A sample of four infants’ intentional utterances was obtained in an experimental

context which was designed, following Liszkowski et al.’s (2008) procedure, to elicit

declarative pointing gestures and which simulated several patterns of social interaction

with the adult. The children spontaneously used a broad sample of their gesture and

speech communicative modalities in order to attract an adult’s attention to the objects of

interest, which were puppets, a dancing pig, and a light. Similarly to Liszkowski et al.

(2008), we found effects of age and experimental condition on the pointing and

communicative behavior of children at 1;0 and 1;3. Children tended to use more

complex communicative abilities in the experimental conditions, that is, when the adult

did not attend to either the stimulus object or the child. Though more data is needed to

further test these conclusions, these results already point in interesting directions.

First, our results showed the more frequent use of gesture-only productions at 1;0 and a

more frequent use of gesture in combination with speech at 1;3 (as in Esteve-Gibert &

Prieto, under review, 2012). This supports the conclusion reported by Butcher &

Goldin-Meadow (2000) that children seem to develop the ability to fully integrate

gesture and speech around their one-word period. In addition, this process of integrating

gesture with speech may have a functional effect on language emergence.

Also, by 1;3, children seem to acquire the synchronous use of gesture and speech to

express meaning. Thus, as they grow up, children progressively deploy more complex

abilities, such as gesture-speech combinations, and learn to do so in more demanding

situations such as those simulated by the experimental conditions described here, when

they seek to call the attention of an inattentive adult.

A total of 37 gesture-speech combinations were closely analyzed to describe temporal

alignment by measuring the temporal distance between the prominent part of the

gesture and, respectively, the endpoints of the vocalization (the onset and the offset) and the highest

intonation peak (F0 peak). The results indicate that children at 1;3 show a

clear ability to temporally align the prominent part of the pointing gesture (i.e., the

apex) with the prominent parts of speech (separated by a mean distance of under 0.5 seconds in

our study, though Esteve-Gibert & Prieto (under review, 2012) report shorter times).


Further research on these issues is needed to further explore the relationship between

these parameters and communicative development.

Finally, a qualitative description related the children’s ability to combine both modalities at

1;3 to other aspects of development, including phonological development. The appearance

of gesture-only pointing responses, motor abilities like finger articulation and early gaze

sharing with the adult seem to be related to the child’s potential ability to integrate

gesture and speech. Children with better alignment abilities also performed more

complex look sequences and had finer motor abilities. Early phonological measures

seem to work in parallel. However, accurate speech production can be relatively

independent of intentional communication.

In conclusion, while strong generalizations cannot be drawn on the basis of such a small

sample, the general patterns we have seen in terms of communicative development

parameters show an interesting degree of correlation. A set of productive parameters

during early infant intentional interactions (early pointing patterns, alignment of

gesture-speech patterns, pointing configurations, and look sequences) seem to favor the

subsequent appearance of gesture-speech combinations. Even more interestingly, the

expression of both communicative modalities through early temporal alignment may be

a significant step towards the integrated use of different language components, which in

turn could be a key factor in later vocabulary development.


5. References

1. ANDRÉN, M. (2011) “The organization of children’s pointing stroke

endpoints” in STAM, G. & ISHINO, M. (eds.) Integrating Gestures: The

interdisciplinary nature of gesture. John Benjamins, 153-162.

2. BATES, E. & DICK, F. (2002). “Language, gesture, and the developing

brain” in Developmental Psychobiology 40: 293-310.

3. BAVIN, E.L. – PRIOR, M. – REILLY, S. – BRETHERTON, L. – WILLIAMS,

J. – EADIE, P. – BARRET, Y. – UKOUMUNNE, O.C. (2008). “The early

language in Victoria Study: predicting vocabulary at age one and two years from

gesture and object use” in Journal of Child Language 35: 687-701.

4. BOERSMA, P. & WEENINK, D. (2012). Praat: doing phonetics by computer

[Computer program]. Version 5.3.04, retrieved 12 January 2012 from

http://www.praat.org/.

5. BROOKS, R. & MELTZOFF, A.N. (2008). “Infant gaze following and pointing

predict accelerated vocabulary growth through two years of age: a longitudinal,

growth curve modeling study” in Journal of Child Language 35: 207-220.

6. BUTCHER, C. & GOLDIN-MEADOW, S. (2000). “Gesture and the transition

from one-to-two word speech: when hand and mouth come together” in

MCNEILL, D. (ed.). Language and gesture. New York: Cambridge University

Press, 235-258.

7. CASELLI, M.C. – RINALDI, P. – STEFANINI, S. – VOLTERRA, V. (2012).

“Early action and gesture vocabulary and its relation with word comprehension

and production” in Child Development 83: 526-542.

8. COLONNESI, C. – STAMS, G.J. – KOSTER, I. – NOOM, M.J. (2010). “The

relation between pointing and language development: a meta-analysis” in

Developmental Review 30: 352-366.

9. ESTEVE-GIBERT, N. & PRIETO, P. (2012). “Prosody signals the emergence

of intentional communication in the first year of life: evidence from Catalan-

babbling infants” in Journal of Child Language 00: 1-26.


10. ESTEVE-GIBERT, N. & PRIETO, P. (under review, 2012). “The early

development of gesture-speech combinations and their temporal coordination”,

Journal of Speech, Language and Hearing Research.

11. HOMAE, F. – WATANABE, H. – NAKANO, T. – ASAKAWA, K. – TAGA,

G. (2006) “The right hemisphere of sleeping infant perceives sentential

prosody” in Neuroscience Research 4: 276-280.

12. LAUSBERG, H. & SLOETJES, H. (2009). Coding gestural behavior with the

NEUROGES- ELAN system. Behavior Research Methods, Instruments, &

Computers, 41(3), 841-849.

13. LEVITT, A.G. (1993). “The acquisition of prosody: Evidence from French- and

English-learning infants” in Haskins Laboratories Status Report on Speech

Research 113: 41-50.

14. LISZKOWSKI, U. – ALBRECHT, K. – CARPENTER, M. – TOMASELLO,

M. (2008). “Infants’ visual and auditory communication when a partner is or is

not visually attending” in Infant Behavior & Development 31: 157-167.

15. LOCKE, J.L. (1997). “A theory of neurolinguistic development” in Brain and

Language 58: 265-326.

16. LÓPEZ-ORNAT, S. & GALLEGO, C. (2005). Inventario del desarrollo

MacArthur. Madrid. TEA.

17. MAMPE, B. – FRIEDERICI, A.D. – CHRISTOPHE, A. – WERMKE, K.

(2009). “Newborn’s cry melody is shaped by their native language” in Current

Biology 19:1994-1997.

18. MCNEILL, D. (1992). Hand and Mind. Chicago. The Chicago University Press.

19. ÖZÇALISKAN, S. & GOLDIN-MEADOW, S. (2005). “Gesture is at the

cutting edge of early language development” in Cognition 96: 101-113.

20. PARLADÉ, M.V. & IVERSON, J.M. (2011) “The interplay between language,

gesture, and affect during communicative transition: A dynamic systems

approach” in Developmental Psychology 47: 820-836.

21. PRIETO, P. – ESTRELLA, A. – THORSON, J. – VANRELL, M.M. (2011). “Is

prosodic development correlated with grammatical and lexical development?

Evidence from emerging intonation in Catalan and Spanish,” in Journal of Child

Language 10: 1-37.

22. ROWE, M.L. & GOLDIN-MEADOW, S. (2009). “Early gesture selectively

predicts later language development” in Developmental Science 12: 182-187.


23. SOLTIS, J. (2004) “The signal functions of early infant crying” in Behavioral

and Brain Sciences 27: 443-490.

24. SAKKALOU, E. & GATTIS, M. (2012). “Infants infer intentions from

prosody” in Cognitive Development 27: 1-16.

25. SHUKLA, M. – WHITE, K.S. – ASLIN, R.N. (2010) “Prosody guides the rapid

mapping of auditory word forms onto visual objects in 6-month-old infants” in

Psychological and Cognitive Sciences 1-6.

26. SNOW, D. (2002). “Regression and reorganization of intonation between 6 and

23 months” in Child Development 77: 281-296.

27. TOMASELLO, M. – CARPENTER, M. – LISZKOWSKI, U. (2007) “A new

look at infant pointing” in Child Development 3:705-722.
