Multiunit sequences in first language acquisition
The University of Manchester Research
DOI:10.1111/tops.12268
Document Version: Accepted author manuscript
Citation for published version (APA): Theakston, A., & Lieven, E. (2017). Multiunit sequences in first language acquisition. Topics in Cognitive Science. https://doi.org/10.1111/tops.12268
Published in: Topics in Cognitive Science
Multiunit Sequences in First Language Acquisition1
Anna Theakston
and
Elena Lieven
ESRC International Centre for Language and Communicative Development
University of Manchester, UK
Acknowledgements: Anna Theakston and Elena Lieven are Professors in the International
Centre for Language and Communicative Development (LuCiD) at The University of
Manchester. The support of the Economic and Social Research Council [ES/L008955/1] is
gratefully acknowledged.
Abstract
Theoretical and empirical reasons suggest that children build their language not only out of
individual words but also out of multiunit strings. These are the basis for the development of
schemas containing slots. The slots are putative categories which build in abstraction while
the schemas eventually connect to other schemas in terms of both meaning and form.
Evidence comes from the nature of the input, the ways in which children construct novel
utterances, the systematic errors that children make, and the computational modelling of
children’s grammars. However, much of this research is on English, which is unusual in its rigid
word order and impoverished inflectional morphology. We summarise these results and
explore their implications for languages with more flexible word order and/or much richer
inflectional morphology.
1 Address for correspondence: Elena Lieven, Child Study Centre, Coupland 1 Building,
University of Manchester. M13 9PL, UK. Email: [email protected]
Multiunit Strings in First Language Acquisition
Introduction
The use of frequent multiunit strings in written and spoken language is well established and is
associated with a range of processing advantages (Wray, 2012). These strings can be fully
specified (e.g. How are you?, Don’t know; Bybee & Scheibman, 1999) or more flexible with
slots of varying abstractness (e.g. X found X’s way PP, X let alone Y; Fillmore, Kay & O’Connor,
1988). The question we address in this paper is the role that multiunit strings of differing
degrees of abstraction might play in the learning of grammatical structure in first language
acquisition. By ‘abstractness’ and ‘abstraction’, we mean any slot in a string that goes beyond
a specific phonological form or word. So, for example, a child’s lexicon might initially
contain a frozen phrase such as Where’s Daddy? before adding Where’s PERSON? and then
Where’s PERSON/OBJECT?, ending up with Where’s NP?. During this process of
developing slots of increasingly wider scope within schemas, the X slot in the Where’s X?
schema has increased in abstractness.2
The role of multiunit strings in first language acquisition
From a usage-based theoretical perspective, our ability to understand and create novel
utterances derives from the fact that they are like utterances we have heard and used before.
Thus we can think of the task of language learning as learning to appropriately reuse the
language that one hears. Researchers working within a usage-based perspective argue that
multiunit strings form one important starting point in the language acquisition process. If
multiunit strings occur frequently enough in the input, then it is possible for children to learn
and use them. There are also good theoretical arguments for the idea that storing strings as
well as words is computationally efficient provided there is enough redundancy in the system
(Bannard & Lieven 2009). The process of abstraction is then thought to build on this early
learning. Multiunit strings provide an opportunity for children to dissect and recombine
linguistic elements, and to create more abstract frames by generalizing across lexical items;
these strings in turn eventually link up to create an adult-like grammar (e.g. Lieven, Salomo
& Tomasello 2009).
2 We use CAPS for the content of a slot and italics to denote a schema or utterance
For the usage-based approach to work, three important criteria must be met. First, if children
are to learn multiunit strings, these must exist in substantial numbers in the language that they
hear. Second, we need evidence that at least some of children’s early language is based on
these multiunit strings. Third, we need to demonstrate that multiunit strings facilitate the
development of more abstract linguistic representations, which explains how, based on
strings that they hear, children can produce novel utterances. Here, we take English as a test
case for these processes, before considering their application cross-linguistically.
Multiunit strings in English
English, as a language which is governed by strict word order and does not allow argument
ellipsis (except in certain pragmatically motivated contexts), lends itself to the acquisition of
multiunit sequences. Previous studies show that the language English-speaking children are
exposed to contains large numbers of multiunit collocations.
Input
Bannard and Matthews (2008) computed the frequency of one- to five-word strings in child-directed
speech (CDS) and found many strings that were at least as frequent as individual words in the
core lexicon. This gives prima facie support to the suggestion that some multiunit strings are
highly frequent and available for children to learn. In a more detailed analysis of the speech
addressed to 12 English-learning children between the ages of 2-3 years, Cameron-Faulkner,
Lieven, and Tomasello (2003) found that half of the utterances addressed to the children
could be characterised by 52 item-based frames, with 45% beginning with one of just 17
different words. They argued that the distributional properties of the input were conducive to
children’s acquisition of lexical frames, and that repeated exposure to these frames could
provide a route into the more abstract linguistic system: “Whatever else it is, the acquisition
of a language is the acquisition of a skill, and skills are crucially dependent on repetition and
repetition with variation” (2003, p. 868).
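The two kinds of input analysis described above can be illustrated with a minimal sketch. It is not the procedure used in either study: the tokenisation (lower-casing and splitting on whitespace), the function names and the toy utterances are all assumptions made for illustration. The first function counts every string of one to five words; the second reports the share of utterances that begin with one of the k most frequent initial words.

```python
from collections import Counter


def ngram_counts(utterances, max_n=5):
    """Count every contiguous string of 1..max_n words within each utterance."""
    counts = Counter()
    for utterance in utterances:
        words = utterance.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return counts


def initial_word_coverage(utterances, k):
    """Share of utterances beginning with one of the k most frequent initial words."""
    firsts = Counter(u.lower().split()[0] for u in utterances if u.split())
    covered = sum(count for _, count in firsts.most_common(k))
    return covered / sum(firsts.values())


# Invented toy utterances, not taken from any corpus.
toy_cds = [
    "where's the ball",
    "where's the dog",
    "where's the cup",
    "what's that",
    "look at the dog",
]

counts = ngram_counts(toy_cds)
# The two-word string "where's the" (3 occurrences) is more frequent
# than the single word "dog" (2 occurrences).
string_beats_word = counts[("where's", "the")] > counts[("dog",)]
coverage_top1 = initial_word_coverage(toy_cds, 1)  # 3 of 5 utterances
```

Even in this tiny sample, a multiword string outranks an individual word in frequency, which is the pattern the corpus studies report at scale.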
Evidence for frames in early English child language
It is now well established, starting with the work of Braine (1976) and Peters (1983), that
children’s early utterances are less complex and more lexically-specific than those of adult
speakers. Our own work and that of our colleagues demonstrates that a large proportion of
children’s early utterances can be accounted for by a small number of high frequency frames
based around specific lexical items or multiunit chunks (e.g. I’m V-ing it, It’s a X, What can
X?, Where’s X?; Lieven, Pine, & Baldwin, 1997). Supporting evidence exists in a range of
linguistic domains including children’s early use of auxiliary verbs (Theakston, Lieven, Pine,
& Rowland, 2005), determiners (Pine, Freudenthal, Krajewski & Gobet, 2013), wh-questions
(Rowland & Pine, 2000), and grammatical constructions such as the transitive (Theakston,
Maslen, Lieven, & Tomasello, 2012). Many of the frames identified in early child speech
occur with high frequency in the input addressed to children (Cameron-Faulkner et al., 2003).
Further studies have found that the schemas used in early child language appear to exist
relatively independently of one another, at least in the earliest stages. They show relatively
little overlap in their content, for example with respect to the main verbs produced with
different auxiliaries (can’t X vs. don’t Y; Lieven, 2008) and the nouns used with particular
determiners (a X vs. the Y; Pine et al., 2013). These schemas may be individually learned
from the input and, initially at least, contain lexically and/or semantically restricted slots.
Another, more direct, source of evidence for children’s learning and storage of multiunit
sequences is the finding that children are better able to repeat four-word sequences found
frequently in CDS than less frequent four-word sequences, even when the frequency of the
individual items and bigrams is carefully controlled (e.g. compare a cup of tea with a cup of
milk, Bannard & Matthews, 2008).
Finally, there is evidence that children (and sometimes adults) appear to rely on lexically-
based frames in the production of complex utterances. Diessel and Tomasello (2001) reported
that children’s earliest complement clauses are organised around only a small number of
main verbs often occurring with specific subject arguments too, for example I
think+COMPLEMENT. These kinds of schemas also extend into children’s and adults’ use
of complex questions. Questions beginning with high frequency chunks such as What do you
think…? were repeated more accurately than those with the same syntactic structure and
vocabulary, but beginning with an infrequent sequence such as What does the old man
think…? (Dąbrowska, Rowland & Theakston, 2009).
Role of multiunit strings in abstraction, productivity and creativity
We envisage children’s linguistic systems building up as a network of constructions which
become increasingly interconnected along a range of dimensions: phonological, prosodic,
semantic, pragmatic and structural. For example, from early on English-speaking children
develop considerable flexibility with nouns such that they can use them across a number of
different schemas: I want a X, That’s a X. Initially this putative noun category may be quite
low-scope e.g. of objects and/or people, and this may vary from child to child. But as
children learn more and more constructions, the mapping of the semantics of ‘ENTITY’ to
‘noun phrase’ will become increasingly abstract, i.e. will lose much of its phonological and
semantic specificity, ultimately allowing for the insertion of relative clauses, clefts etc.
One way to examine the process of abstraction is to look at how early-produced utterances
are subsequently modified to produce longer, more complex sequences. If children are storing
prefabricated chunks of language, then earlier learned words might be expected to appear in
more complex utterances than later learned words, because the child has had the opportunity
to gradually increase the length of the sequences learned from the input. Furthermore, if
there is lexical overlap between earlier and later utterances, this might suggest that rather than
constructing utterances from scratch by combining abstract semantic or grammatical
categories, children are instead accessing prefabricated strings which they then combine with
other words/strings to build complexity. These strings may function to free up the child’s
processing resources as a result of automatization in memory or greater ease of retrieval
(Logan, 1988), enabling them to extend their earlier productions.
In diary data from one child, Tomasello (1992) observed that there was added value in her
verb use over development as a function of her earlier uses of those same verbs. Verbs
tended to be combined with the same kinds of arguments (e.g. actor, object) as they had been
previously, with new arguments added over time. Complexity reflected their cumulative
frequency of use but varied from one verb to the next, suggesting that their representations
were not fully interconnected. Testing this claim more widely, McClure, Pine, and Lieven
(2006) found that for 12 English-learning children between 2-3 years, verb utterances
produced at MLU 2.0-2.49 were more complex for verbs that had been used previously and
consequently had a higher cumulative frequency of use, than for newly acquired verbs, while
Theakston et al. (2012), in a case study using densely sampled data, found that from age 2;0-
2;6, 79% of the verbs the child used in a full subject-verb-object construction had previously
been used in a simpler subject-verb or verb-object construction, many with the same subject
and/or object arguments.
Further evidence for the process of using multiunit strings comes from a series of studies that
attempted to derive later utterances from those heard or produced earlier in development. In a
study of two children’s early utterances, Lieven et al. (2009) found that at 2;0 many of the
children’s apparently novel utterances could be formed by making only minor changes to
their previous utterances. In a more robust test of this process, frequency-driven grammars
derived from child speech using a Bayesian computational model (Bannard, Lieven, &
Tomasello, 2009) were demonstrated to provide a better fit to children’s early utterances than
a more abstract grammar incorporating broad categories like Noun and Verb. However, by
3;0, introducing these categories made a significant difference to the goodness of fit.
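The idea of deriving an apparently novel utterance from an earlier one by a minor change can be sketched as follows. This is an illustration of the general idea only, not the procedure of Lieven et al. (2009) or the Bayesian model of Bannard et al. (2009): it handles a single operation (substituting one contiguous stretch of words), and the function names and toy utterances are hypothetical.

```python
def one_slot_schema(novel, earlier):
    """Return a schema such as "where's X gone" if `novel` can be obtained
    from `earlier` by replacing one contiguous stretch of words; else None."""
    a, b = novel.lower().split(), earlier.lower().split()
    shortest = min(len(a), len(b))
    # Length of the shared material at the start of both utterances.
    p = 0
    while p < shortest and a[p] == b[p]:
        p += 1
    # Length of the shared material at the end, not overlapping the start.
    s = 0
    while s < shortest - p and a[-1 - s] == b[-1 - s]:
        s += 1
    # Reject pairs with nothing in common, identical pairs, and pure
    # additions or deletions (this sketch models substitution only).
    if p + s == 0 or p + s == shortest:
        return None
    return " ".join(a[:p] + ["X"] + a[len(a) - s:])


def traceback(novel, earlier_utterances):
    """List (earlier utterance, schema) pairs from which `novel` is derivable."""
    matches = []
    for earlier in earlier_utterances:
        schema = one_slot_schema(novel, earlier)
        if schema is not None:
            matches.append((earlier, schema))
    return matches


# Invented toy utterances standing in for a child's earlier speech.
earlier_speech = ["where's daddy gone", "i want juice", "more cake"]

derived_1 = traceback("where's teddy gone", earlier_speech)
derived_2 = traceback("i want a biscuit", earlier_speech)
```

In this toy case each novel utterance traces back to one earlier utterance with a single slot, which is the sense in which only minor changes are needed.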
A further source of evidence for the role of low-scope, multiunit strings in the development
of abstraction is the gradual convergence between child and adult language, as children
acquire greater flexibility in language use. In a recent study, Theakston, Ibbotson,
Freudenthal, Lieven & Tomasello (2015) examined the degree of overlap in the nouns used
as Subject and Object between children’s and their caregivers’ Subject-Verb-Object
combinations at 2 and 3 years of age, with the application of controls for vocabulary and
sample size. Over development, both the degree of lexical overlap between mother and child
in the nouns used with a given verb and the extent to which items were uniquely attested in the
children’s speech increased, demonstrating both abstraction in the slots and convergence on
the adult grammar. In summary, for English at least, children’s acquisition of lexical
collocations and low-scope variable frames appears to provide one plausible route into more
abstract and general grammatical representations.
The role of pronouns in abstraction
Multiunit strings can be sequential but there is also ample evidence that they can be created
around variable slots, for example in the frame Where’s X gone?. We know that infants are
better able to detect non-adjacent patterns in strings of syllables if there is sufficient variation
in an intervening element between two relatively stable anchors (Gómez, 2002), and in older
children there is some evidence that constructions modelled with a number of different verbs
facilitate generalisation to new verbs (Savage, Lieven, Theakston & Tomasello, 2006). In the
creation of these slot-and-frame patterns, pronouns have been observed to play a privileged
role as anchors around a variable verb slot. For example, pronouns account for over 80% of
transitive subjects in the input to young English-speaking children (Cameron-Faulkner et al.,
2003) and thus may support the acquisition of schemas such as He’s V-ing it, into which new
verbs can be inserted. Evidence for this suggestion comes from multiple sources:
• when children corrected sentences with novel verbs presented in non-English word
  orders (SOV - The fox the bear tammed), roughly half of the time they did so using
  pronoun subjects and objects (Akhtar, 1999);
• children show sensitivity to the form and placing of pronouns in their interpretation
  of transitive sentences (Ibbotson, Theakston, Lieven & Tomasello, 2011);
• in our dense data analysis of one child’s early transitive SVO use between age 2;4-
  2;6, over 75% of objects were represented by the pronoun it (Theakston et al., 2012);
• although children can generalise the SVO construction to a novel verb when exposed
  to hundreds of examples with known verbs, generalisation improves if these models
  include pronominal as well as lexical subjects and objects (Childers & Tomasello,
  2001);
• a computational model of early language exposed to child-directed speech
  spontaneously generated pronoun islands of organisation based on frequency of
  exposure (Jones, Gobet, & Pine, 2000).
Thus, because pronouns are both frequent and systematically ordered in strings, they afford
not only the earlier learning of these strings but also the creation of increasingly abstract slots
preceding or following them. Because they are a closed set with extensive interconnections
across the child’s developing system, they allow for earlier, more sophisticated production
and comprehension than is shown with lexical nouns.
The role of multiunit strings in children’s errors
Children’s errors are an important test for the role of multiunit strings in acquisition, since
adults will not produce these errors. High frequency strings can both protect from error (and
conversely explain why children make more errors with strings of lower frequency), and lead
to error when children utilise them in the wrong contexts (Ambridge, Kidd, Rowland &
Theakston, 2015).
Evidence that high frequency strings can protect from errors comes from a number of studies.
For inflectional morphology, Arnon & Clark (2011) found that children produced irregular
plurals (e.g. mice) more accurately, avoiding overregularisation errors (e.g. mouses), when
they were part of strings such as Three blind … than after unrelated high frequency strings
(e.g. So many …). In unpublished work, we found similar results for irregular past tense forms. In
corpus data from 19 children between 2;0-3;6, overregularisation errors (e.g. run-ed)
occurred significantly less often as part of high frequency three-word strings (defined from
collocations with the target verbs in CDS) than did correct utterances with those same verbs.
In wh- and yes-no questions where the required wh+auxiliary or auxiliary+pronoun
combination is frequent in the input, children produce fewer uninversion errors (e.g. *What he can
do?) and double marking errors (e.g. *Can he can’t go to the park?) than in questions
without a high frequency frame (Ambridge & Rowland, 2009; Rowland & Pine, 2000; see
Dąbrowska et al., 2009, on complex questions).
Sometimes, children appear to learn multiple strings from the input which then compete for
activation, leading to errors when they are used in inappropriate contexts. The likelihood of
any given string being produced depends on a number of factors including the relative
frequency of the competing strings in the input, and which string has most recently been
primed for retrieval. For example, in both typical and atypical language development
(Theakston & Lieven, 2008; Leonard & Deevy, 2011), apparently optional use of tense and
agreement markers is influenced by the input. Children who hear subject-verb strings with
no intervening auxiliary as part of a more complex utterance (e.g. Are you keefing?) produce
lower levels of auxiliary verb provision in their subsequent declaratives (You keefing).
Children’s erroneous use of me instead of I as the subject of sentences (e.g. *Me do it) can be
explained in a similar way as competition between a me-verb and I-verb frame. Kirjavainen,
Theakston, and Lieven (2009) showed that 2-3-year-old children’s production of me-for-I
errors was related to the relative rate at which their caregivers produced complex sentences in
which the accusative pronoun me appeared pre-verbally (e.g. Let me do it). Furthermore, the
verbs that children used erroneously with me were significantly more likely to occur in their
caregivers’ speech with pre-verbal me than those used correctly (see Kirjavainen, Lieven &
Theakston, in press for a similar account of infinitival-to omissions). In all of these
examples, the assumption is that errors are gradually eradicated as children build up
representations of the appropriate complex sentence structures, thus reducing the activation of
erroneous competing forms for simple sentences.
However, children also produce many utterances which cannot be a direct ‘reading off’ from
the input (e.g. *Me can’t do it, *Why can it can’t fit?), suggesting that children move beyond
simple repetition of strings to create more abstract slot-and-frame patterns from a legitimate
use in the input which they then use productively, resulting in error. Accounting for the
productivity which leads to errors is critical, and although high frequency strings can get us
so far in explaining children’s correct uses and errors, there is a clear need for multifaceted
models of acquisition to move us beyond a simple reliance on early learned schemas based
directly on the input children hear (Ambridge et al., 2015).
In some areas of acquisition, initial attempts have been made to consider how frames might
be derived from the input in a less direct manner and then underpin children’s early linguistic
representations and their development towards the adult end-state. Here, instances of
children’s linguistic creativity require a focus on the interaction between the distributional
properties of the input and communicative function. One example is the development of
linguistic structures to express the negation of activities or events denoted by verbs. An
analysis of dense data for one 2-3-year-old child revealed his early use of a largely
ungrammatical no-Verb construction, before shifting to a not-Verb construction, then finally
settling on the more typical adult use of negated auxiliaries (can’t, don’t-Verb) to express
different functions (Cameron-Faulkner, Lieven, & Theakston, 2007). Although his early uses
were certainly not typical of patterns in the input, their origins were apparent: no was the
most frequent single word negator in the input whereas not was the most frequent form used
in multiword utterances. The late-acquired negated auxiliaries showed a function-specific
pattern of emergence (see also Drozd 1995) which related to the frequency and specificity of
the mapping between auxiliaries and functions in the input. Similar arguments have been put
forward to explain children’s use of double marked questions (e.g. Why do you don’t like
cake?), such that, in the absence of a suitable negative question frame (Don’t you…?)
children may resort to combining a well-known question frame (Do you…?) with an already
learned declarative form (You don’t like X) (Ambridge & Rowland, 2009). Thus, children
appear to make productive use of the resources that are available to them at any particular
point in development, reflecting a creative process by the child to discover and/or create
regularity in form and function.
Interim summary
English-learning children hear highly frequent multiunit strings in their input. Many of their
early utterances are based on these strings and this has been shown to have important effects
on the order and nature of developmental change in their linguistic systems. These strings
can explain why there are differences for structures that, despite having the same formal
description, differ in whether children can produce them with or without error. They can also
explain the ways in which abstraction and schematicity build up and, in turn, account for
some instances of linguistic creativity. However, as children’s linguistic systems develop,
these strings will become interconnected in an increasing variety of ways, allowing children
to use them more or less appropriately or creatively in different situations. Developing
precise predictions for the relative balance of factors in the production and comprehension of
particular types of utterances over time will require the use of converging methods involving
corpus analysis, experiments and modelling.
Other languages
There are (at least) two important differences between English and many other languages
which could affect whether children can extract and learn from multiunit strings in the same
ways that English-learning children can. First, English word order is much more syntactically
constrained than in many other languages. Although some languages are described as having
‘free word order’, it is almost certainly the case that this is pragmatically constrained, but this
will still give more flexibility than in English. Thus, the word order regularities that allow
children to learn multi-word strings may be less available in other languages. The second
major difference is that, relative to many languages, English has very little morphology.
Case-marking is almost totally absent except on pronouns, and person, tense and aspectual
marking on verbs is also very limited. Other languages have extensive case systems (e.g.
Estonian and Polish are usually credited with seven singular and seven plural cases) and
verbal inflections (e.g. in addition to person, number and tense, Turkish marks evidentiality
and Russian marks perfective/imperfective aspect). Thus, rich inflectional morphology may
pose different challenges for children learning these types of languages.
Word order
Looking first at the input, Stoll, Abbot-Smith, and Lieven (2009) used a method similar to
that of Cameron-Faulkner et al. (2003) to determine the evidence for low-scope frames in the
CDS of German, English, and Russian. Despite differences due to the typology of the
languages, they found a very high degree of repetition in the first 1-3 words of the mothers’
utterances, which could facilitate the early acquisition of low-scope frames (see also Arnon
2016 for Hebrew). In computational modelling work, researchers have demonstrated that
words from a variety of languages can be classified into grammatical categories (e.g.
Croatian, English, Estonian, Dutch, French, Hebrew, Hungarian, Japanese, Sesotho) using a
combination of distributional and phonological information (Monaghan, Christiansen, &
Chater, 2007), or localised slot-and-frame patterns (Mintz, 2003, but see Stumper et al.,
2011). McCauley and Christiansen (2014) developed a chunk-based learning model and
applied it to corpora of children’s language and their input in 29 languages that differ widely
in their degree of morphological complexity. The model learns from CDS using backward
transitional probabilities and then is set the task of generating the children’s utterances by
correctly sequencing words presented in random order (see Chang, Lieven & Tomasello,
2008). The model achieves a mean accuracy rate of 78% for English child corpora. Across the
29 languages, it was less accurate for morphologically complex languages than for those
with simpler morphology. However, given that the model is trained on words rather than
morphemes, its performance nonetheless suggests that there is considerable information in the
transitional probabilities between words, even in highly inflected languages.
One study that suggests that children learning languages other than English extract strings
from the input is a cross-linguistic comparison of tense omission errors (Freudenthal, Pine &
Gobet, 2010). Using a computational model which builds up chunks from the ends of
utterances in CDS, they showed that differences in the rates of tense omission in children
learning different languages could be explained by typological differences in the organisation
of finiteness marking, with some languages (English, German and Dutch) having a larger
proportion of non-finite verb forms at the ends of utterances than others (French and Spanish).
Further evidence comes from children’s use of complement-taking verbs in German. Brandt,
Lieven and Tomasello (2010) demonstrated that children start out with separate schemas for
verb-final and verb-second complements, each appearing with different groups of verbs and
different types of main clauses. Only gradually, over the third year, are these separate
constructions used more flexibly and in overlapping ways, providing evidence for children
creating structural links between them.
We have suggested that for English, pronouns act as an anchor for the learning of lexical
frames and as a basis for generalisation into more abstract constructions. The extent to which
pronouns might play a similar role in other languages will depend on whether their relative
frequency and distributional properties lend themselves to early acquisition. To illustrate,
consider a ‘weird word order’ study with French-speaking children (Matthews, Lieven,
Theakston, & Tomasello, 2007). Although English-speaking children frequently corrected
weird word orders by producing pronominal objects, in contrast, French-speaking children
rarely did so. The likely explanation is that French word order is more variable than in
English and differs according to whether lexical or pronominal objects are produced. Object
clitic pronouns appear pre-verbally (SoV - Il la pousse), whereas lexical noun objects occur
after the verb (SVO - Il pousse Marie). Consequently, it may be more difficult for French-
speaking children to derive the form-function mappings for the transitive and, initially at
least, they may learn two more fragmented constructions, and learn these more slowly due to
the lower individual frequency of the two word orders.
Morphological patterns in other languages
Thus far, we have mainly focussed on multi-word strings as the potential building blocks for
the acquisition of more abstract syntactic constructions. For English, this works reasonably
well, and we have also demonstrated the potential utility of such an approach in other
languages. However, in many languages it is necessary to consider the initial building blocks
at an even lower level of granularity – that of morphological marking.
Languages vary greatly not only in the extent of inflectional morphology present but also in
its complexity. For instance, languages like English, Chinese and Japanese have relatively
little inflectional morphology and what there is, is relatively transparent (they are more
‘analytic’ or ‘isolating’). At the other end of the scale are ‘polysynthetic’ languages like
Chintang (a Tibeto-Burman language spoken in Eastern Nepal), in which a transitive verb has
more than 1849 synthetic forms (Stoll & Bickel, 2013). Between these two extremes
lie ‘synthetic’ languages but these can vary greatly in their degree of transparency in
morpheme-to-function mapping. Some languages are agglutinative with a fixed sequence of
morphemes which map on a one-to-one basis with a function (e.g. Turkish). Others are
fusional (e.g. Polish in which gender, person and number are all combined in one morpheme).
Finally, some languages show a lot of homophony (e.g. German, where one form of the
determiner, die, can be nominative or accusative feminine singular, as well as nominative or
accusative plural in the masculine, feminine or neuter). These different types of morphological
systems will set different challenges to learning.
In determining the nature of the challenge facing the learner, it is essential to look at the
morphological complexity of the language that children hear: as Xanthos et al. (2011) point
out, this could differ considerably from the full array of forms and combinations possible in
the grammar. Xanthos et al. developed a measure of the morphological richness of nouns and
verbs in CDS and child corpora for a variety of languages (Mean Size of Paradigm, MSP),
and demonstrated that this can be relatively low even for languages with complex
morphology (e.g. although Croatian has complex morphology, with for instance 18 different verb
forms (Stephany et al. 2007), the MSP in Croatian CDS was a relatively low 1.4, compared with
1.91 in Turkish CDS). Their MSP measure of CDS across 9 languages was strongly
positively related to the speed with which the children developed the noun and verb
paradigms of the language that each was learning.
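To make the measure concrete, the following is a minimal sketch of a paradigm-size calculation, under the simplifying assumption that MSP can be approximated as the number of distinct inflected forms divided by the number of distinct lemmas in a sample. The function name, the (lemma, form) input format and the toy English data are illustrative assumptions. The sketch omits any sample-size normalisation that a cross-corpus comparison would need, and it is not presented as the procedure used by Xanthos et al.

```python
from collections import defaultdict


def mean_size_of_paradigm(tokens):
    """Approximate MSP: distinct inflected forms per distinct lemma.

    `tokens` is an iterable of (lemma, inflected_form) pairs
    (a hypothetical input format chosen for this illustration).
    """
    forms_by_lemma = defaultdict(set)
    for lemma, form in tokens:
        forms_by_lemma[lemma].add(form)
    if not forms_by_lemma:
        return 0.0
    total_forms = sum(len(forms) for forms in forms_by_lemma.values())
    return total_forms / len(forms_by_lemma)


# Toy sample: three lemmas with 3, 2 and 1 distinct forms respectively.
sample = [
    ("want", "want"), ("want", "wants"), ("want", "wanted"),
    ("go", "go"), ("go", "went"), ("go", "go"),
    ("see", "see"),
]
msp = mean_size_of_paradigm(sample)
print(msp)
```

On this toy sample the six distinct forms are spread over three lemmas. A value close to 1 would indicate that most lemmas occur in only one form in the sample, whatever the grammar allows in principle.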
It is clear that children learning highly inflected languages do not start out with a fully-
fledged system. Studies on Spanish and Polish show that when a comparison is made
between the degree of flexibility in the use of morphemes in CDS and the child’s speech,
with careful controls applied for vocabulary, available inflections, and sample size, children
are significantly less productive than their caregivers until around age 3;0 (Aguado-Orea &
Pine, 2015; Krajewski, Lieven & Theakston, 2012). Furthermore, entropy measures suggest
that initially morphemes are closely tied to particular constructions and that they become less
closely associated as the child’s system develops (Krajewski et al., 2012; Stoll & Bickel
2013), raising the possibility that children learning complex morphological systems may rely
on strings or low-scope frames. Moreover, although overall error rates in the use of inflected
forms might be low, this typically hides pockets of pronounced error in (lower frequency)
parts of the system. Aguado-Orea & Pine (2015) showed that for Spanish present tense
inflections, very high rates of error are found in some parts of the person-number paradigm.
Frequency was very strongly associated with correct use, suggesting that inflected forms may
initially be learned as ‘chunks’ from the input (only two verbs, quiero ‘I want’ and puedo
‘I can’, accounted for nearly 50% of correct 1st person usage; see also Rojas-Nieto 2011).
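The logic of an entropy-based comparison can be illustrated with a minimal sketch. It assumes, purely for illustration, that the flexibility of an inflection is summarised as the Shannon entropy of the distribution of stems it occurs with. The stem lists are invented toy data, and this is not presented as the specific measure used in the studies cited above.

```python
import math
from collections import Counter


def shannon_entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical stems occurring with a 1st person singular inflection.
child_stems = ["quier", "quier", "quier", "pued"]
caregiver_stems = ["quier", "pued", "teng", "veng"]

child_entropy = shannon_entropy(child_stems)
caregiver_entropy = shannon_entropy(caregiver_stems)
print(child_entropy, caregiver_entropy)
```

Lower entropy for the child sample reflects an inflection concentrated on a small number of stems, which is the pattern expected if inflected forms are initially tied to particular items.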
Morphological richness is demonstrated not only through the number of distinct noun/verb
forms possible in a language, but also by the number of affixes that can be present in a word.
Here, however, it is important to consider the degree of systematicity in the ways in which
these forms are combined. Kelly, Wigglesworth, Nordlinger and Blythe suggest that
“Polysynthetic languages contain words with many morphemes and expressing complex
grammatical concepts, but may be relatively regular in the templatic sequence in which they
are used.” (2014: 61). Although they express doubt about the usefulness of chunk learning in
these types of languages, if children are learning these templates as partially productive
strings, with some slots becoming schematic before others, this could provide a way into the
structure. In a study of Turkish newspaper articles (presumably with more complex language
than that addressed to children), Durrant (2013) identified ‘morpheme bundles’, sometimes
fully specified, other times containing partially productive slots; combinations that appeared
frequently, and which speakers could represent as highly accessible strings, suggesting that a
language like Turkish might lend itself to templatic slot-and-frame-based learning. Indeed,
Aksu-Koç and Slobin (1985: 206) suggested that Turkish children sometimes use schwas as
placeholders: having learned the kind of templatic sequence that it seems Kelly et al. are
referring to, they were aware that a morpheme was required but did not know what it was.
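The idea of a morphological template with a partially productive slot can be sketched as follows. The sketch assumes hyphen-segmented word forms as input and treats a frame as the word with one morpheme position left open, keeping only frames attested with a minimum number of different fillers. The function name, the threshold and the simplified Turkish segmentations are illustrative assumptions, and the procedure is not that of Durrant (2013).

```python
from collections import defaultdict

SLOT = "_"


def find_slot_frames(segmented_words, min_fillers=3):
    """Return frames (tuples with one open slot) and the set of fillers seen in that slot."""
    fillers_by_frame = defaultdict(set)
    for word in segmented_words:
        morphemes = word.split("-")
        for i, morpheme in enumerate(morphemes):
            # Open one position at a time and record what filled it.
            frame = tuple(morphemes[:i] + [SLOT] + morphemes[i + 1:])
            fillers_by_frame[frame].add(morpheme)
    return {
        frame: fillers
        for frame, fillers in fillers_by_frame.items()
        if len(fillers) >= min_fillers
    }


# Simplified segmentations: stem - progressive - person marker.
words = ["gel-iyor-um", "gel-iyor-sun", "gel-iyor-uz", "bil-iyor-um", "ver-iyor-um"]
frames = find_slot_frames(words)
for frame, fillers in sorted(frames.items()):
    print("-".join(frame), sorted(fillers))
```

With these toy data two frames pass the threshold: one with an open person slot after gel-iyor and one with an open stem slot before -iyor-um. A learner tracking such frames would know that a position must be filled before knowing which morpheme fills it, which is consistent with the placeholder behaviour described above.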
We have conceptualised morphological slot-and-frame patterns as developing out of fixed
elements such as particular stems or morphemes which combine with a variety of items (e.g.
words denoting actions, morphemes denoting different person/number combinations or
tenses). If one thinks of inflectional morphology as showing regularities both ‘vertically’
(declensions, conjugations) and ‘horizontally’ (cases, person marking), there is evidence that
children can show partial productivity on both fronts. Thus, Dąbrowska and Tomasello
(2008) documented Polish children’s ability to abstract a particular case and generalise it to
other novel nouns (‘horizontal’ generalisation), while Krajewski, Theakston, Lieven and
Tomasello (2011) showed that children’s ability to provide the correct inflection for a
particular noun varies as a function of the previous case in which they heard it (‘vertical’
generalisation). However, this process may be complex in languages in which inflections
encode more than one function (e.g. a combination of both case and person as in Polish).
Furthermore, in many languages, there is no base form or stem, and thus the challenge facing
the child is not simply one of removing and replacing morphemes, but of learning relations
between distinct inflected forms.
All this suggests that the learning of phonologically-specific, inflected words, followed by the
development of low-scope, slot-and-frame patterns might be important processes in the early
learning of inflectional morphology. But the truth is that research from this perspective is
only just getting off the ground, and there are many challenges to be addressed in
understanding the extent to which these processes operate as we consider languages with a
wide range of typological characteristics and complexities.
Remaining issues
Although there has been considerable work on the role of multiunit strings in children’s
acquisition of English, and to a lesser extent of other languages, both as the basis for
longer utterances and as a source of grammatical errors and correct use, the question of
how these strings facilitate the process of abstraction has been less closely explored. And yet
understanding the process of abstraction is critical to developing a full theory of language
acquisition. In constructivist or usage-based approaches to adult language, it is widely
acknowledged that linguistic representations exist at multiple levels, from the fully
phonologically specified through to partially productive and fully abstract forms (Goldberg
2006, Verhagen 2005). Thus, in acquisition, children starting out with less abstract forms will
simultaneously begin to develop more abstract representations while retaining at least some
of the earlier learned forms.
As this process gets underway, we might expect to find some interesting patterns of results,
depending on whether children are utilising more abstract or more specific representations.
Indeed, some work with adults suggests that multiunit strings may differentially influence
speakers’ production and comprehension of high vs. low frequency exemplars, with low
frequency items more influenced by the global statistics of the input, whereas higher
frequency items retain item-specific patterns of use (Wonnacott, Newport, & Tanenhaus,
2008; see Dittmar, Abbot-Smith, Lieven, & Tomasello, 2013, for similar work on children’s
interpretation of passive sentences with familiar and novel verbs). Thus, learners apparently
keep track of specific linguistic information whilst simultaneously deriving the broad
statistics of the language. The extent to which either the broad or local statistics influence
subsequent behaviour is dependent on the strength of association between specific items and
the material with which they co-occur, and this will change over the course of development.
Another influence on the process of abstraction is construction meaning, and its interaction
with the meanings of particular lexical items. For example, Ambridge and colleagues
(Ambridge, Pine & Rowland, 2012) have demonstrated that the likelihood of adults and
children judging argument structure errors as grammatical (e.g. *He disappeared the rabbit)
depends on the extent to which the verb meaning is deemed compatible with the meaning of
the construction. This means that to understand how multiunit strings are broken down and
recombined in the language learner, we will need models of acquisition that are sensitive to
the meanings of the strings, and the semantic scope of the resulting slots. There is already an
assumption that the slots in constructions are semantically restricted, as reflected in the
choice of terminology (e.g. I PROCESS THING; Lieven et al., 2009), and there is some
evidence that slot-and-frame patterns in CDS may have particular functions (Cameron-
Faulkner & Hickey, 2011). However, to date there is relatively little work which explores this
issue directly (although see Matthews & Bannard, 2010).
Conclusions
In summary, multiunit strings provide one useful basis from which to learn language, but
while we have begun to understand their utility in the earliest stages of acquisition, there is a
long way to go before we can hope to fully understand how these mechanisms change over
time, how both multiunit strings and grammatical abstractions can co-exist into adult
language, and how these processes differ across languages as a function of their typological
make up. Critically, though, different cues are going to be used to a greater or lesser extent in
different languages, and with variable success depending on the kind of distributional
information that is available. Thus, only a learning mechanism that can pick up a variety of
cues and weigh them appropriately as a function of the language being learned will be
successful in explaining how and when multiunit strings will be learned from the input, what
the units consist of and how they might develop in terms of abstraction across different
languages.
References
Aguado-Orea, J. & Pine, J. (2015). Comparing different models of the development of verb
inflection in early child Spanish. PLoS ONE, 10(3).
Akhtar, N. (1999). Acquiring basic word order: Evidence for data-driven learning of syntactic
structure. Journal of Child Language, 26, 339–356.
Aksu-Koç, A. & Slobin, D. (1985). The acquisition of Turkish. In D. Slobin (Ed.) The
Crosslinguistic Study of Language Acquisition: Vol.1. The data. Hillsdale, New Jersey:
Lawrence Erlbaum Associates (pp. 839-880).
Ambridge, B., Kidd, E., Rowland, C. & Theakston, A. (2015). The ubiquity of
frequency effects in first language acquisition. Journal of Child Language, 42(2), 239-273.
Ambridge, B., Pine, J. & Rowland, C. (2012). The roles of verb semantics,
entrenchment and morphophonology in the retreat from dative argument structure
overgeneralization errors. Language, 88(1), 45-81.
Ambridge, B. & Rowland, C. (2009). Predicting children's errors with negative questions:
Testing a schema-combination account. Cognitive Linguistics, 20(2), 225-266.
Arnon, I. (2016). The nature of child-directed speech in Hebrew: frequent frames in a
morphologically rich language. In R. Berman (Ed.) The Acquisition of Hebrew: Trends in
Language Acquisition Research Series. pp.221-224. John Benjamins: Amsterdam.
Arnon, I. & Clark, E. V. (2011). Why brush your teeth is better than teeth:
Children's word production is facilitated in familiar sentence-frames. Language
Learning and Development, 7(2), 107-129.
Bannard, C. & Lieven, E. (2009). Repetition and Reuse in Child Language Learning. In
R. Corrigan, E. Moravcsik, H. Ouali, K. Wheatley (Eds.). Formulaic Language: Volume II:
Acquisition, Loss, Psychological reality, Functional Explanations. Amsterdam: John
Benjamins (pp. 297-321).
Bannard, C., Lieven, E., & Tomasello, M. (2009). Modelling children's early grammatical
knowledge. Proceedings of the National Academy of Sciences, 106(41), 17284-17289.
Bannard, C., & Matthews, D. (2008). Stored word sequences in language learning: The effect
of familiarity on children’s repetition of four-word sequences. Psychological Science,
19(3), 241–248.
Braine, M. (1976). Children’s first word combinations. Monographs of the Society for
Research in Child Development, 41, 1-104.
Brandt, S., Lieven, E. & Tomasello, M. (2010). Development of word order in German
complement-clause constructions: Effects of input frequencies, lexical items, and
discourse function. Language, 86(3), 583-610.
Bybee, J. & Scheibman, J. (1999) The effect of usage on degrees of constituency: the
reduction of don't in English. Linguistics 37, 575-596.
Cameron-Faulkner, T., & Hickey, T. (2011). Form and function in Irish child-directed-
speech. Cognitive Linguistics 22(3), 569-594.
Cameron-Faulkner, T., Lieven, E., & Theakston, A. (2007). What part of no do children not
understand? A usage-based account of multiword negation. Journal of Child Language,
34, 251-82.
Cameron-Faulkner, T., Lieven, E., & Tomasello, M. (2003). A construction based analysis of
child directed speech. Cognitive Science, 27(6), 843–873.
Chang, F., Lieven, E., & Tomasello, M. (2008). Automatic evaluation of syntactic learners in
typologically different languages. Cognitive Systems Research, 9(3), 198–213.
Childers, J. & Tomasello, M. (2001). The role of pronouns in young children’s acquisition of
the English transitive construction. Developmental Psychology, 37(6), 739-748.
Dąbrowska, E., Rowland, C., & Theakston, A. (2009). Children’s acquisition of questions
with long distance dependencies. Cognitive Linguistics, 20(3), 571-597.
Dąbrowska, E., & Tomasello, M. (2008). Rapid learning of an abstract language-specific
category: Polish children’s acquisition of the instrumental construction. Journal of Child
Language , 35 (3), 533-558.
Diessel, H. & Tomasello, M. (2001). The acquisition of finite complement clauses in English:
A corpus-based analysis. Cognitive Linguistics, 12, 1-45.
Dittmar, M., Abbot-Smith, K., Lieven, E., & Tomasello, M. (2013). Familiar verbs are not
always easier than novel verbs: how German pre-school children comprehend active and
passive sentences, Cognitive Science, 38(1), 128-151.
Drozd, K. (1995). Child English pre-sentential negation as metalinguistic exclamatory
sentence negation. Journal of Child Language, 22, 583-610.
Durrant, P. (2013). Formulaicity in an agglutinating language: the case of Turkish.
Corpus Linguistics and Linguistic Theory, 9(1), 1–38. doi: 10.1515/cllt-2013-0009
Fillmore, C., Kay, P., & O'Connor, M. (1988). Regularity and idiomaticity in
grammatical constructions: The case of let alone. Language, 501-538.
Freudenthal, D., Pine, J. & Gobet, F. (2010). Explaining quantitative variation in the
rate of optional infinitive errors across languages: A comparison of MOSAIC and the
Variational Learning Model. Journal of Child Language, 36, 643-669.
Goldberg, A. (2006). Constructions at Work: the nature of generalization in language,
Oxford University Press
Gómez, R. L. (2002). Variability and detection of invariant structure. Psychological Science,
13, 431–436.
Ibbotson, P., Theakston, A., Lieven, E. & Tomasello, M. (2011). The role of pronoun frames
in early comprehension of transitive constructions in English. Language Learning and
Development, 7, 1-16.
Jones, G., Gobet, F. & Pine, J. (2000). A process model of children's early verb use. In: L.
Gleitman & A. Joshi (Eds.), 22nd Annual Meeting of the Cognitive Science Society.
Mahwah, NJ: LEA, pp. 723-728.
Kelly, B., Wigglesworth, G., Nordlinger, R. & Blythe, J. (2014). The Acquisition of
Polysynthetic Languages. Language and Linguistics Compass, 8(2), 51–64.
Kirjavainen, M., Lieven, E., & Theakston, A. (2016). Can infinitival-to omissions and
provisions be primed? An experimental investigation into the role of constructional
competition in infinitival-to omission errors. Cognitive Science.
Kirjavainen, M., Theakston, A., & Lieven, E. (2009). Can input explain children’s me-for-I
errors? Journal of Child Language, 36, 1091-1114.
Krajewski, G., Lieven, E. & Theakston, A. (2012). Productivity of a Polish child’s
inflectional noun morphology: a naturalistic study. Morphology, 22(1), 9-34. doi:
10.1007/s11525-011-9199-0
Krajewski, G., Theakston, A., Lieven, E., & Tomasello, M. (2011). How Polish children
switch from one case to another when using novel nouns: Challenges for models of
inflectional morphology. Language and Cognitive Processes, 26(4-6), 830-861.
Leonard, L., & Deevy, P. (2011). Input distribution influences degree of auxiliary use by
children with SLI. Cognitive Linguistics, 22, 247-273.
Lieven, E. (2008). Learning the English auxiliary: A Usage-based Approach. In H. Behrens
(Ed.) Corpora in Language Acquisition Research: Finding Structure in Data (Trends in
Language Acquisition Research, Vol. 6). Amsterdam: Benjamins. pp. 60-98.
Lieven, E., Pine, J., & Baldwin, G. (1997). Lexically-based learning and early grammatical
development. Journal of Child Language, 24(1), 187–219.
Lieven E., Salomo D., & Tomasello M. (2009). Two-year-old children's production of
multiword utterances: A usage-based analysis. Cognitive Linguistics, 20(3), 481-508.
Logan, G. (1988). Towards an instance theory of automatization. Psychological Review,
95(4), 492-527.
Matthews, D. & Bannard, C. (2010). Children’s production of unfamiliar word sequences is
predicted by positional variability and latent classes in a large sample of Child-Directed
Speech. Cognitive Science, 34, 465-488.
Matthews, D., Lieven, E., Theakston, A., & Tomasello, M. (2007). French children’s use and
correction of weird word orders: a constructivist account. Journal of Child Language, 34,
381-409.
McCauley, S. & Christiansen, M. (2014). Acquiring formulaic language: A
computational model. The Mental Lexicon, 9(3), 419-436.
McClure, K., Pine, J., & Lieven, E. (2006). Investigating the abstractness of children's early
knowledge of argument structure. Journal of Child Language, 33, 693-720.
Mintz, T. (2003). Frequent frames as a cue for grammatical categories in child directed
speech. Cognition, 90(1), 91–117.
Monaghan, P., Christiansen, M., & Chater, N. (2007). The phonological-distributional
coherence hypothesis: Cross-linguistic evidence in language acquisition. Cognitive
Psychology, 55, 259–305.
Peters, A. (1983). The Units of Language Acquisition, Monographs in Applied
Psycholinguistics, Cambridge University Press.
Pine, J., Freudenthal, D., Krajewski, G., & Gobet, F. (2013). Do young children have
adult-like syntactic categories? Zipf’s law and the case of the determiner. Cognition,
127(3), 345-360. doi:10.1016/j.cognition.2013.02.006
Rojas-Nieto, C. (2011) Developing first contrasts in Spanish verbal inflection. In I.
Arnon & E. Clark (Eds.) Experience, Variation and Generalisation. Trends in Language
Acquisition Research. Amsterdam: John Benjamins
Rowland, C., & Pine, J. (2000). Subject-auxiliary inversion errors and wh-question
acquisition: What children do know? Journal of Child Language, 27, 157–181.
Savage, C., Lieven, E., Theakston, A., & Tomasello, M. (2006). Structural Priming as
Learning in Language Acquisition: The Persistence of Lexical and Structural Priming in 4-
year-olds. Language Learning and Development, 2, 27-49.
Stephany, U., Voeikova, M., Christofidou, A., Gagarina, N., Kovacevic, M., Palmovic, M.,
et al. (2007). Strongly inflecting languages: Russian, Croatian, and Greek. In S. Laaha & S.
Gillis (Eds.), Typological perspectives on the acquisition of noun and verb morphology.
Antwerp Papers in Linguistics 112 (pp. 35–46). Antwerp: University of Antwerp.
Stoll, S., Abbot-Smith, K., & Lieven, E. (2009). Lexically restricted utterances in Russian,
German and English child directed speech. Cognitive Science, 33, 75-103.
Stoll, S. & Bickel, B. (2013). The acquisition of ergative case in Chintang. In: Bavin, E.,
Stoll, S. (Eds.) The acquisition of ergativity. Amsterdam: Benjamins, 183–208.
Stumper, B., Bannard, C., Lieven, E. & Tomasello, M. (2011), “Frequent Frames” in
German Child-Directed Speech: A limited cue to grammatical categories. Cognitive
Science, 35: 1190–1205. doi: 10.1111/j.1551-6709.2011.01187.
Theakston, A., Ibbotson, P., Freudenthal, D., Lieven, E. & Tomasello, M. (2015).
Productivity of Noun Slots in Verb Frames. Cognitive Science, 39, 1369-95. doi:
10.1111/cogs.12216
Theakston, A. & Lieven, E. (2008). The influence of discourse context on children’s
provision of auxiliary BE. Journal of Child Language, 35, 129-58.
Theakston, A., Lieven, E., Pine, J, & Rowland, C. (2005). The acquisition of auxiliary
syntax: BE and HAVE. Cognitive Linguistics, 16, 247-277.
Theakston, A., Maslen, R., Lieven, E. & Tomasello, M. (2012). The acquisition of the
transitive construction. Cognitive Linguistics, 23(1), 91 – 128.
Tomasello, M. (1992). First verbs. A case study of early lexical development. Cambridge:
Cambridge University Press.
Verhagen, A. (2005). Constructions of Intersubjectivity. New York: Oxford University
Press.
Wray, A. (2012). What do we (think we) know about formulaic language? An evaluation of the
current state of play. Annual Review of Applied Linguistics, 32(1), 231-254. doi:
10.1017/S026719051200013X
Wonnacott, E., Newport, E., & Tanenhaus, M. (2008). Acquiring and processing verb
argument structure: Distributional learning in a miniature language. Cognitive Psychology,
56, 165-209.
Xanthos, A. et al. (2011). On the role of morphological richness in the early development
of noun and verb inflection. First Language, 31(4), 461–479.
doi: 10.1177/0142723711409976