AUTHOR’S COPY | AUTORENEXEMPLARtuvalu.santafe.edu/~desmith/PDF_pubs/LITY.2011.020.pdf · nucleus...

13
Geographical distribution of phonological complexity IAN MADDIESON, TANMOY BHATTACHARYA, D. ERIC SMITH, and WILLIAM CROFT Linguistic Typology 15 (2011), 267–279 1430–0532/2011/015-0267 DOI 10.1515/LITY.2011.020 ©Walter de Gruyter 1. Introduction Atkinson’s intriguing article in Science suggests that there is a signal to be found in the global distribution of the richness of phonological contrasts across a large sample of languages which reflects the process of the spread of anatom- ically modern humans across the habitable areas of our planet from an origin in Africa. In our commentary we will focus on three principal issues. The first has to do with what the signal is based on and whether it seems to be reliable. The second has to do with whether the concept of a serial founder effect is a persuasive explanation for the global patterns found. The third has to do with considering whether there are alternative explanations which might account for the patterns found. 1 2. On the signal and the data Atkinson’s primary linguistic data concerns a variable he refers to as phoneme diversity. This is the mean by language of the normalized values of three ordi- nal variables from chapters in the World atlas of linguistic structures (WALS; Dryer & Haspelmath (eds.) 2011). The first variable reflects the size of the con- sonant inventory divided into five bins, the second is the size of the inventory of basic vowel qualities divided into three bins, and the third reflects the presence or absence and, if present, the complexity of the tone system, again in three bins. The first two of these variables have approximately normal distributions 1. A few details in this commentary were changed after the author of the target paper had seen it. Specifically, a note was added about which languages create the major peaks and valleys seen in Figure 2, and an improved method of calculating east-west distances in the Americas was used. These changes do not affect any of the major points made in this commentary and are not of a nature to require a separate response from the target author. AUTHOR’S COPY | AUTORENEXEMPLAR AUTHOR’S COPY | AUTORENEXEMPLAR

Transcript of AUTHOR’S COPY | AUTORENEXEMPLARtuvalu.santafe.edu/~desmith/PDF_pubs/LITY.2011.020.pdf · nucleus...

Page 1: AUTHOR’S COPY | AUTORENEXEMPLARtuvalu.santafe.edu/~desmith/PDF_pubs/LITY.2011.020.pdf · nucleus (range 1–2), and coda (range 0–3). A language which allows nothing more complex

Geographical distribution of phonological complexity

IAN MADDIESON, TANMOY BHATTACHARYA, D. ERIC SMITH, andWILLIAM CROFT

Linguistic Typology 15 (2011), 267–279 1430–0532/2011/015-0267DOI 10.1515/LITY.2011.020 ©Walter de Gruyter

1. Introduction

Atkinson’s intriguing article in Science suggests that there is a signal to befound in the global distribution of the richness of phonological contrasts acrossa large sample of languages which reflects the process of the spread of anatom-ically modern humans across the habitable areas of our planet from an originin Africa. In our commentary we will focus on three principal issues. The firsthas to do with what the signal is based on and whether it seems to be reliable.The second has to do with whether the concept of a serial founder effect is apersuasive explanation for the global patterns found. The third has to do withconsidering whether there are alternative explanations which might account forthe patterns found.1

2. On the signal and the data

Atkinson’s primary linguistic data concerns a variable he refers to as phonemediversity. This is the mean by language of the normalized values of three ordi-nal variables from chapters in the World atlas of linguistic structures (WALS;Dryer & Haspelmath (eds.) 2011). The first variable reflects the size of the con-sonant inventory divided into five bins, the second is the size of the inventory ofbasic vowel qualities divided into three bins, and the third reflects the presenceor absence and, if present, the complexity of the tone system, again in threebins. The first two of these variables have approximately normal distributions

1. A few details in this commentary were changed after the author of the target paper had seenit. Specifically, a note was added about which languages create the major peaks and valleysseen in Figure 2, and an improved method of calculating east-west distances in the Americaswas used. These changes do not affect any of the major points made in this commentary andare not of a nature to require a separate response from the target author.

AUTHOR’S COPY | AUTORENEXEMPLAR

AUTHOR’S COPY | AUTORENEXEMPLAR

Page 2: AUTHOR’S COPY | AUTORENEXEMPLARtuvalu.santafe.edu/~desmith/PDF_pubs/LITY.2011.020.pdf · nucleus (range 1–2), and coda (range 0–3). A language which allows nothing more complex

268 Ian Maddieson et al.

(the central bin contains the largest number of languages and the numbers de-crease away from the center). The third is highly skewed in that about 60 % ofthe languages fall in the lowest bin, having no tone contrasts.

As we will show below, there seems to be a real statistical relationship be-tween Atkinson’s phoneme diversity variable, its components, and the mea-sures that underlie these components and distance from Africa calculatedthrough certain travel nodes. However, it is worth noting that the label phonemediversity is potentially misleading in two ways. The full set of phoneme andprosodic contrasts is not taken into account, and the internal diversity of thephoneme set is not considered. Notably, the WALS variable of “basic vowel in-ventory” strips away contrasts among matching vowels which differ in length,nasalization, voice quality, and other properties. For example, Navajo has 4 ba-sic vowel qualities /i, e, a, o/, but each of these can be long or short, and oral ornasalized. So this North American language has 16 vowel phonemes, but fallsinto the “small” bin in terms of its basic vowel inventory. On the other hand,the West African language Igbo has 8 basic vowel qualities which equals itstotal number of vowel phonemes, so it falls into the large bin, even though ithas only half the number of vowel phonemes as Navajo.

Adding tone to the phoneme diversity calculation incorporates one way inwhich the syllable inventory of a language can be enriched, but much greaterphonological variation between languages is created by differences in permit-ted phonotactic combinations (Maddieson 1984, Shosted 2006). A languagethat has 10 consonants and 5 vowels but allows only CV syllables can con-struct only 50 syllables. But if the same inventory can be freely deployed in a(C)V(C) template the number of possible syllables is 605.

The phoneme diversity variable Atkinson calculates is thus neither a truemeasure of the size of the phoneme inventory, nor of the phonotactic possi-bilities in the languages concerned, but a hybrid touching on some aspects ofthese properties. Further, it concerns only the number of distinctions, whereasthe term diversity might more readily be understood to refer to the content ofan inventory, rather than its size. The set of consonants in (1a) has less diversitythan the set in (1b), as the same few properties are re-combined in (1a) whereasmost of the consonants in (1b) have unique properties. In terms of the numberof phonetic traits exploited, the set in (1c) is equally diverse as that in (1a),even though it has fewer members.

(1) a. p t kb d gm n N

b. p ts qá nd k’m R ì

c. t kb

n

However, although we consider the measure Atkinson uses to be rather anodd construct and would have preferred that it not be labeled “diversity”, thesignal he has discovered seems to be valid. The most important part of this

AUTHOR’S COPY | AUTORENEXEMPLAR

AUTHOR’S COPY | AUTORENEXEMPLAR

Page 3: AUTHOR’S COPY | AUTORENEXEMPLARtuvalu.santafe.edu/~desmith/PDF_pubs/LITY.2011.020.pdf · nucleus (range 1–2), and coda (range 0–3). A language which allows nothing more complex

Geographical distribution of phonological complexity 269

Table 1. Fit between phonological variables and best-origin distance (simple regressionon means by language family, n = 49)

Variable R2 Significance level

Consonant inventory .182 .0023Basic vowel inventory .048 .1300Tone system .097 .0293Combined variable .257 .0002

signal comes from the size of the consonant inventory.2 We have conductedseveral examinations of the data that Atkinson provides and also of the un-derlying data that forms the basis for the material in the WALS chapters thathe used. Table 1 shows the individual correlations of the three constituents ofhis “diversity” measure, and their combination, with the best-origin distancefrom Africa using data taken directly from Atkinson’s Table S2. This tablelists the means by language family for all families recognized which have twoor more members in the sample (using the classification given in WALS). Onelanguage family listed in the table, Yanomam, has been omitted as there is onlyone Yanomami language in the dataset – the Arawakan language Shiriana (ISO639-3 code xir) was misidentified and wrongly counted as Yanomami. We didnot correct for another error, the inclusion of the Tai-Kadai language Po-Ai inthe Austronesian family. Since there are 41 Austronesian languages this errorwill have little impact on the means.

Taken in isolation, the mean size of the basic vowel inventory does not corre-late significantly with mean distance by family, and the role of tone is marginal.When the three variables are simultaneously considered in a multiple regres-sion the role of tone is negligible, and that of basic vowel inventory is marginal.Results are shown in Table 2. In both analyses, the influence of the consonantinventory size is paramount.

To further examine this issue we calculated the correlation between the best-fit distance for individual languages and normalized values of the raw datawhich underlies the binned values for consonant and vowel inventory that arereported in WALS and were used by Atkinson. In addition, normalized val-

2. Largely because of the attention drawn by Atkinson’s article it came to light that a numberof errors had crept into the WALS data on consonant inventory size due to misalignment ofmultiple data columns in a single spreadsheet. These errors affect 63 of the 504 languages(12.5 %) used in Atkinson’s analysis, and have marginal impact on the results. We have re-peated the analysis of the total diversity measure using corrected values and find the relation-ship between this and the best-fit distance remains robust both across the set of languagesindividually (F(1,502) = 255.9, p < .0001; R2 = .338), and across language families (usingthose with three or more members – F(1,30) = 22.1, p < .0001; R2 = .424).

AUTHOR’S COPY | AUTORENEXEMPLAR

AUTHOR’S COPY | AUTORENEXEMPLAR

Page 4: AUTHOR’S COPY | AUTORENEXEMPLARtuvalu.santafe.edu/~desmith/PDF_pubs/LITY.2011.020.pdf · nucleus (range 1–2), and coda (range 0–3). A language which allows nothing more complex

270 Ian Maddieson et al.

Table 2. Fit between phonological variables and best-origin distance (multiple regres-sion on means by language family)

Variable Significance level

Consonant inventory .0005Basic vowel inventory .0209Tone system .2709

ues of the total vowel inventory and of a four-step tone index and an 8-stepindex of syllable complexity were analyzed. The tone index distinguishes lan-guages with simple (2 tones), moderately complex (3 tones), and complex (4or more tones) tone systems from those which are not tonal (58 %). The syl-lable complexity index is the sum of values assigned to the onset (range 0–3),nucleus (range 1–2), and coda (range 0–3). A language which allows nothingmore complex than CV syllables and has no complex nuclei earns a score of 1.A language which allows elaborate onsets and codas and has complex nuclei(long vowels and/or diphthongs), such as English, earns a score of 8.

We were able to identify data for 495 of the 504 languages that Atkinsonlists in his Table S2. Not all data points are available for all the languages,and values for some are undoubtedly disputable. As Table 3 shows, consonantand vowel inventories and tone system complexity all individually correlateto a highly significant degree with Atkinson’s best-fit distance from Africa,although the R2 values are not especially high. The highest R2 value is withthe tone index, a reflection of the fact that 109 of the 123 languages in familiesof Africa in this data are tonal (89 %), whereas in all other continent-sizedareas tonal languages are a minority. In a stepwise regression analysis, the tonevariable is the first to enter, but consonant inventory, basic vowel inventory,and total vowel inventory all make significant independent contributions (indiminishing order of significance) to the model, which yields a cumulative R2

value of .331 (n = 426). Strikingly, the syllable index shows no relationshipto distance from the best-fit origin in Africa. In other words, what is arguablythe greatest contributor to phonological diversity among languages patternsgeographically quite unlike the variables that relate to paradigmatic distinctionsamong segments and tones.

The relative importance of consonant inventory and tone system complex-ity differs notably between the analysis of the binned data at language familylevel, and the analysis of the individual language data (using the raw conso-nant inventory size and a 4-level tone index). The low significance of the tonesystem variable in the family-level analysis may be due to the fact that Africareduces to five families (the familiar four – Niger-Congo, Nilo-Saharan, Afro-Asiatic, and the disputed Khoisan, plus Kadugli which WALS treats as separate

AUTHOR’S COPY | AUTORENEXEMPLAR

AUTHOR’S COPY | AUTORENEXEMPLAR

Page 5: AUTHOR’S COPY | AUTORENEXEMPLARtuvalu.santafe.edu/~desmith/PDF_pubs/LITY.2011.020.pdf · nucleus (range 1–2), and coda (range 0–3). A language which allows nothing more complex

Geographical distribution of phonological complexity 271

Table 3. Fit between phonological variables and best-origin distance (simple regressionon normalized individual language values)

Variable R2 Significance level n

Consonant inventory .100 < .0001 493Basic vowel inventory .125 < .0001 494Tone system .136 < .0001 468Total vowel inventory .034 < .0001 494Syllable index .000 .6545 448

from Nilo-Saharan), whereas the Americas contain 26 families – more thanhalf the total, 10 of which consist of just two languages. These American lan-guage families range from those with a very high mean normalized value forthe tone system variable (Oto-Manguean 1.570) to those with the minimumvalue (e.g., Eskimo-Aleut, Aymaran, Chon −0.769). There is thus little geo-graphical pattern to the tone variable at the family level. On the other hand, 20of the American language families fall at or below the median of the mean con-sonant inventory variable, as does the Australian family and two of the three“Papuan” groups in WALS, all of these distant from Africa.

Our examination of the geographical pattern suggested by Atkinson, espe-cially the results shown in Table 3, indicates that there is merit in the idea thatcertain elements of phonological complexity diminish with distance from anorigin in Africa as calculated through the five choke-points used by Atkinson(the Sinai, the Bosphorus, the Mekong delta, the Bering Strait, and the Isthmusof Panama). In the next section we will discuss how far the idea of a foundereffect is an appealing explanation for this pattern. In the final section we willconsider whether alternatives might be plausibly considered.

3. On the appropriateness of the idea of a founder effect on languagediversity

A founder effect in genetics occurs because when a subset of a population sep-arates itself from the larger group that subset will likely include a smaller rangeof genetic variation than the larger population from which it splits. Descendantgroups will therefore start with this smaller pool of genetic variability. If thesedescendant groups then themselves split, a further reduction of genetic diver-sity would then follow. The concept of a serial founder effect with an origin inAfrica is generally accepted as underlying a very significant part of the globalpattern of human genetic diversity. For example, Ramachandran et al.’s (2005)study of 783 autosomal microsatellite sites in 53 populations showed a veryhigh correlation (R2 = .78) between lower heterozygosity and distance fromAfrica – in this case measured from an arbitrary origin in Addis Ababa via

AUTHOR’S COPY | AUTORENEXEMPLAR

AUTHOR’S COPY | AUTORENEXEMPLAR

Page 6: AUTHOR’S COPY | AUTORENEXEMPLARtuvalu.santafe.edu/~desmith/PDF_pubs/LITY.2011.020.pdf · nucleus (range 1–2), and coda (range 0–3). A language which allows nothing more complex

272 Ian Maddieson et al.

similar choke-points. Atkinson argues that a somewhat parallel process mayexplain the pattern he has found, with serial reductions in phonological com-plexity as humans spread from Africa. He seems to suggest that the effect isnot directly analogous to the effect seen in the genetic record, but is ratheran indirect effect of population size. Founder populations are presumed to besmall, and a link is assumed between population size and linguistic properties.As he puts it, “[i]f phoneme distinctions are more likely to be lost in smallfounder populations, then a succession of founder events during range expan-sion should progressively reduce phonemic diversity with increasing distancefrom the point of origin” (Atkinson 2011: 346).

Expressing the relationship as one indirectly due to population size seemsprudent. After all, a subgroup of speakers of a given language does not usea subset of the phonemes of the language, but all of them. Hence the basisfor expecting an effect at the phonological level that is directly analogous to afounder effect in genetic terms is questionable. Moreover, a language’s inven-tory of tones and phonemes is a property that emerges from the language’s lex-icon, rather than being a property that exists independently. To lose a phonemesimply by the act of founding a new population would entail highly selectiveloss of the particular lexical items containing that phoneme. However, there isan alternative perspective in which the linguistic effect could be more closelyanalogous to the genetic effect. If we posit that early human populations beforeleaving Africa already spoke a wide diversity of languages, then any departinggroup would speak only a subset of these languages. In general, a larger set oflanguages will make use of a larger total set of contrastive sounds (and tones)than a smaller set simply because some sounds will appear in one or more ofthe languages that are absent in others. Note that this effect would not concernthe size of inventories in individual languages, but the summed diversity acrossa set of languages, and hence is not modeled by the proposal put forward byAtkinson.

If it is the case that Africa held the greatest diversity of languages at theorigin, and subsequent spread of modern humans to the rest of the world led toprogressive reduction in their diversity, it is surprising that standard methods oflinguistic comparison converge on the idea that there are few language familiesin Africa today. Although this consensus has been more recently challenged(see, e.g., Mous 2003, Sands 2009), even the suggested higher estimates forthe number of language families in Africa are far lower than the number offamilies that exist outside Africa, and are lower than the number of languagefamilies in the Americas by itself in particular.

If there is a signal of an African origin in the phonological complexity oflanguages, we might expect that a similar signal might also be detectable inlexical, morphological, or syntactic patterns. So far, this has certainly not beenthe case. For example, all of the three most rare orders of Subject, Object, and

AUTHOR’S COPY | AUTORENEXEMPLAR

AUTHOR’S COPY | AUTORENEXEMPLAR

Page 7: AUTHOR’S COPY | AUTORENEXEMPLARtuvalu.santafe.edu/~desmith/PDF_pubs/LITY.2011.020.pdf · nucleus (range 1–2), and coda (range 0–3). A language which allows nothing more complex

Geographical distribution of phonological complexity 273

Verb (VOS, OVS, OSV) are predominently found far from Africa (Feature 81Ain WALS; Dryer 2011). In the WALS chapter on this issue all examples of theVOS order are outside Africa (including Madagascar, counted as part of South-East Asia by Atkinson). None of these less common orders occur anywhere inmainland Eurasia. Most cases of the OVS order are found in South America,where all six possible orders occur. If linguistic complexity in general declineswith distance from Africa, this distribution would be unexpected.

That there is no obvious signal of an African origin in the traits that con-ventional historical linguists have been studying for more than a century anda half could be due to large-scale language replacement having occurred inAfrica, resulting in loss of much of its earlier linguistic diversity (see Blench2006 for discussion, also Nettle 1999). For example, the vast geographicalrange of the Niger-Congo family in West and Central Africa suggests such re-placement was in progress well before the more recent “Bantu expansion”, andthe distribution of Semitic languages in Africa is due to comparatively recentback-migrations from Arabia. Given an assumption of replacement, the smallnumber of language families currently found in Africa would fail to reflect itsearlier linguistic richness. Under these circumstances, phonological richness insurviving individual language could certainly be a trace of an earlier situation,maintained through both continuity and cross-language contact. However, forphonological richness – specifically phoneme inventory size – to decline withdistance from Africa, two things must be true. Processes that reduce the size ofa phoneme inventory must be more frequent overall than those that increase it.And a mechanism must exist to increase loss that has some relation to distance.

Both internal processes and contact can lead to increases or decreases in thenumber of distinct segments (or tones) a language uses. For example, Stan-dard French in the last generation or so has lost a vowel through the mergerof /œ/ with /E/ and has gained a consonant /N/ through nativization of the En-glish suffix -ing. There seems nothing in the linguistic processes themselvesthat would lead loss to be more frequent than gain. Atkinson, relying inter aliaon Hay & Bauer 2007, argues that the imbalance between loss and gain is me-diated through an indirect effect of community size. On the leading edge ofmodern humans’ spread through new territories each new founder communityis assumed, reasonably enough, to be of small size. Small community size istaken to be correlated with reduced phonological complexity, though the pre-cise mechanism by which this transpires is not specified. We are inclined tothe opinion that data on the sizes of speaker populations are far too compro-mised by the distortions due to decimation of indigenous populations throughdisease, displacement, assimilation, and other tribulations following European,Chinese, and other colonial expansions for any reliable “natural” signal of alink between phoneme inventory size and population size to be recoverable.The most severely affected peoples include those of the Americas, Australia,

AUTHOR’S COPY | AUTORENEXEMPLAR

AUTHOR’S COPY | AUTORENEXEMPLAR

Page 8: AUTHOR’S COPY | AUTORENEXEMPLARtuvalu.santafe.edu/~desmith/PDF_pubs/LITY.2011.020.pdf · nucleus (range 1–2), and coda (range 0–3). A language which allows nothing more complex

274 Ian Maddieson et al.

Taiwan, and Siberia and the Russian Far East (cf. Butlin 1983, Stannard 1992,Shepherd 1993), all areas quite distant from Africa. In view of this scepti-cism, we have not conducted any reanalysis related to the population figuresthat Atkinson used. Several authors (e.g., Haudricourt 1961, Trudgill 2004,McWhorter 2008) have suggested that population size in itself does not predictlinguistic complexity, but rather the nature of the broad social setting of lan-guage use may have varied impacts. In fact, small populations may favor thepreservation of complex and idiosyncratic linguistic features.

4. Are there alternatives to the African origin hypothesis?

From our perspective the distribution of phonological complexity seems tomuch more “lumpy” rather than smoothly clinal. This can be seen in Atkin-son’s Figure 3. In Figure 1 we show a comparable plot using our own data onthe size of consonant inventory, basic vowel inventory, and tone index, eachnormalized and then summed. This index is plotted against the best-fit dis-tances provided by Atkinson for 491 languages. A linear fit is plotted (greyline, R2 = .292), as well as a smoothed cubic spline (black line, λ = 107),which reflects the occurrence of clusters of high values in the distribution, no-tably at distances of around 11,000 and 21,000 km from the best origin, as wellas a salient low at around 4,500 km, not far from the origin. The first peakis primarily due to Sino-Tibetan and Tai-Kadai languages, the second to Oto-Manguean, Chibchan, and other Meso-American languages, and the salient lowto non-Chadic Afro-Asiatic languages. Variations in tonal complexity seem tobe playing a major role in defining these clusters.

Figure 2 plots a more inclusive index of phonological complexity againstthe same distance values for 424 languages. This index includes the size of theconsonant inventory, the size of the total vowel inventory, the tone index, andthe syllable complexity index, all normalized and then summed. The linear fitaccounts for much less of the variance in this data (grey line, R2 = .164), butpeaks are again apparent around 11,000 and 21,000 km, and the low around4,500 km is still marked. Although an overall “out of Africa” effect can still bedetected, more local effects are strongly visible and the possibility exists thatthe overall effect is an artifact of the summation of independent local effects.

Phonological systems are broadly speaking more elaborated in certain areassuch as Northwest Europe, the Caucasus, and in Southern Africa (in contrastto the rest of the continent) and are simpler in others such as Polynesia andAustralia. In the Americas there is an interesting impression of a primarilyeast-west divide. Western North America has high phonological complexity,but eastern North America has relatively low phonological complexity withthe western mountain chains, e.g., the Rockies, providing the approximate di-viding line. Meso-America has mostly relatively high phonological complex-

AUTHOR’S COPY | AUTORENEXEMPLAR

AUTHOR’S COPY | AUTORENEXEMPLAR

Page 9: AUTHOR’S COPY | AUTORENEXEMPLARtuvalu.santafe.edu/~desmith/PDF_pubs/LITY.2011.020.pdf · nucleus (range 1–2), and coda (range 0–3). A language which allows nothing more complex

Geographical distribution of phonological complexity 275

Figure 1. Sum of normalized consonant, basic vowel inventory, and tone values againstbest-origin distance; linear and cubic spline fits

Figure 2. Sum of normalized consonant, total vowel inventory, tone, and syllable com-plexity values against best-origin distance; linear and cubic spline fits

ity. In South America, the Caribbean coast and the Amazon basin generallyhave low phonological complexity but along the Andean spine and the Pa-cific coast the phonological complexity is higher. In the Southern cone (Chileand Argentina) where there are very few indigenous languages phonologicalcomplexity is high. If these local variations are overlooked to concentrate ona broad overall pattern, a general decline in phonological complexity corre-lated with distance from an assumed point of entry in the Bering Strait wouldbe found, given that eastern locations are further from the Bering Strait thanlocations at the same latitude in the west, and Atkinson (2011: 348) indeedspecifically notes that “distance from the Bering Strait is inversely correlated

AUTHOR’S COPY | AUTORENEXEMPLAR

AUTHOR’S COPY | AUTORENEXEMPLAR

Page 10: AUTHOR’S COPY | AUTORENEXEMPLARtuvalu.santafe.edu/~desmith/PDF_pubs/LITY.2011.020.pdf · nucleus (range 1–2), and coda (range 0–3). A language which allows nothing more complex

276 Ian Maddieson et al.

with phoneme inventory size within the Americas after controlling for popula-tion size (rdistance = −0.173, P = 0.043)”.

An alternative model might focus on noting that the eastern side of the Amer-icas is less complex than the west; where the land-mass is narrow in Meso-America the east-west division is neutralized. This might be readily congru-ent with scenarios for the peopling of the Americas that posit multiple earlyroutes of entry (for discussion see Dillehay 2009), with therefore likely dif-ferent linguistic characteristics. As a simple-minded exploratory examinationof this idea, we analyzed the two phonological indices that show a systematicrelationship with Atkinson’s best-fit distance in the Americas, the size of theconsonant inventory and the syllable index (the vowel inventory measures ac-tually show an increasing trend with distance from the Bering Strait, toneshows no correlation) using the latitude and longitude coordinates of the lan-guages. The sum of the standardized values of these variables correlates pos-itively (R2 = .190, p < .0001, n = 116) with higher latitude, which is a goodapproximation to strictly north-south distance. Since the western coast of theAmericas follows a broadly north-west/south-east slant, we used a slanted linefitted approximately along the coast to “correct” the longitude values for eachlanguage based on its latitude, so that the adjusted values are roughly propor-tionate to distance from the west coast in degrees. We then used a linear approx-imation to distance per degree of longitude at a given latitude (95 % accurate)to estimate east-west distance. There is a highly significant correlation betweenthese calculated distances and the phonological index (R2 = .209, p < .0001)reflecting lower complexity at more easterly locations. In a stepwise multipleregression with distance from the Bering Strait, latitude, and “corrected” lon-gitudinal distances as predictors of the summed consonant/syllable index, theapproximated east-west distance is the first to enter, a significant improvementto the model is made by adding latitude (cumulative R2 = .336, p < .00001),whereas distance from the Bering Strait makes no contribution (p = .5239).There are hints here that the single cline found by Atkinson might better be“parsed” into two separate ones.

It is obvious that there are other factors underlying the pattern of distributionof phonological complexity beyond a straightforward reduction in complexitywith distance from Africa. It is even possible that the “African origin” effect is aproxy for some combination of these other factors. One possibility is related tothe acoustic adaptation hypothesis (Morton 1975, Wiley & Richards 1978, Pe-ters & Peters 2010) which suggests that animal vocal communication systems(avian and mammalian) are adapted to the climatic/ecological environment inwhich they operate, optimizing for the transmission characteristics of the envi-ronment. Several studies have argued for a similar effect in human languages(e.g., Munroe et al. 2009). Temperate environments with open vegetation facil-itate transmission of higher frequency signals more than warmer more densely

AUTHOR’S COPY | AUTORENEXEMPLAR

AUTHOR’S COPY | AUTORENEXEMPLAR

Page 11: AUTHOR’S COPY | AUTORENEXEMPLARtuvalu.santafe.edu/~desmith/PDF_pubs/LITY.2011.020.pdf · nucleus (range 1–2), and coda (range 0–3). A language which allows nothing more complex

Geographical distribution of phonological complexity 277

Figure 3. Sum of normalized consonant inventory and syllable complexity values againstdistance from temperate zone (centered at ±45° decimal latitude); linear fit

vegetated environments. Hence, languages in the first setting will tend to bemore consonant-heavy (more consonant contrasts, more consonant-heavy syl-lables), whereas those in the second are likely to be more vocalic and to havesimpler syllable structures. A crude test of this idea is shown in Figure 3. Thisplots the sum of normalized consonant inventory size and syllable index againsta rough measure of a language’s closeness to the temperate zone. This measureis the absolute value of the language’s point location in Latitude minus 45. Alanguage whose point location is given as 45° N or 45° S has a value of 0 onthis measure, one at the equator (or at the poles!) has a value of 45. There is ahighly significant correlation between this phonological index and the distancefrom the central latitude of the world’s temperate zones (p < .0001, R2 = .210),with decreasing values correlating with increasing distance from the temperatezone. The relationship is plotted in Figure 3. Note that most of the areas notedabove as having more complex phonological patterns are in or close to thetemperate zone as defined here.

If such a crude model can capture as much of the cross-language variabil-ity in phonological complexity as it does – approaching as much as the modelillustrated in Figure 1 based on Atkinson’s distance measures – then we won-der whether a more refined analysis of relationships between environmentalfactors and phonological complexity which also controlled for the inertial ef-fects of language family membership might not subsume any apparent effectof distance from Africa.

Received: 13 June 2011 University of New MexicoRevised: 25 July 2011 Los Alamos National Laboratories

Santa Fe Institute

AUTHOR’S COPY | AUTORENEXEMPLAR

AUTHOR’S COPY | AUTORENEXEMPLAR

Page 12: AUTHOR’S COPY | AUTORENEXEMPLARtuvalu.santafe.edu/~desmith/PDF_pubs/LITY.2011.020.pdf · nucleus (range 1–2), and coda (range 0–3). A language which allows nothing more complex

278 Ian Maddieson et al.

Correspondence addresses: (Maddieson, corresponding author) MSC03 2130, Linguistics, Uni-versity of New Mexico, Albuquerque, NM 87131-0001, U.S.A.; e-mail: [email protected];(Bhattacharya) T-8 (MS B285), P.O Box 1663, Los Alamos National Laboratory, Los Alamos,NM 87545-0285, U.S.A.; e-mail: [email protected]; (Smith) Santa Fe Institute, 1399 HydePark Road, Santa Fe, NM 87501, U.S.A.; e-mail: [email protected]; (Croft) MSC03 2130, Lin-guistics, University of New Mexico, Albuquerque NM 87131-0001, U.S.A.; e-mail: [email protected]

References

Atkinson, Quentin D. 2011. Phonemic diversity supports a serial founder effect model of languageexpansion from Africa. Science 332. 346–349.

Blench, Roger. 2006. Archaeology, language and the African past. Lanham, MD: AltaMira.Butlin, Noel. 1983. Our original aggression: Aboriginal populations of southeastern Australia

1788–1850. Sydney: Allen & Unwin.Dillehay, Tom D. 2009. Probing deeper into first American studies. PNAS 106(4). 971–978.Dryer, Matthew S. & Martin Haspelmath (eds.). 2011. The world atlas of language structures

online. München: Max Planck Digital Library. http://wals.infoDryer, Matthew S. 2011. Order of subject, object and verb. In Dryer & Haspelmath (eds.) 2011,

Chapter 81. http://wals.info/chapter/81 (accessed 4 June 2011)Haudricourt, André-Georges. 1961. Richesse en phonèmes et richesse en locuteurs. L’Homme 1.

5–10.Hay, Jennifer & Laurie Bauer. 2007. Phoneme inventory size and population size. Language 83.

388–400.Maddieson, Ian. 1984. Patterns of sounds. Cambridge: Cambridge University Press.McWhorter, John. 2008. Why does a language undress? Strange cases in Indonesia. In Matti Mi-

estamo, Kaius Sinnemäki & Fred Karlsson (eds.), Language complexity: Typology, contact,change, 167–190. Amsterdam: Benjamins.

Morton, Eugene S. 1975. Ecological sources of selection on avian sounds. American Naturalist109(965). 17–34.

Mous, Maarten. 2003. Loss of linguistic diversity in Africa. In Mark Janse & Sijmen Tol (eds.),Language death and language maintenance: Theoretical, practical and descriptive ap-proaches, 157–170. Amsterdam: Benjamins.

Munroe Robert L., John G. Fought & Ronald K. S. Macaulay. 2009. Warm climates and sonorityclasses: Not simply more vowels and fewer consonants. Cross-Cultural Research 43. 123–133.

Nettle, Daniel. 1999. Linguistic diversity. Oxford: Oxford University Press.Peters, Gustav & Marcell K. Peters. 2010. Long-distance call evolution in the Felidae: Effects

of body weight, habitat, and phylogeny. Biological Journal of the Linnean Society 101(2)487–500.

Ramachandran, Sohini, Omkar Deshpande, Charles C. Roseman, Noah A. Rosenberg, MarcusW. Feldman & L. Luca Cavalli-Sforza. 2005. Support from the relationship of genetic andgeographic distance in human populations for a serial founder effect originating in Africa.PNAS 102(44). 15942–15947.

Sands, Bonny, 2009. Africa’s linguistic diversity. Language and Linguistics Compass 3(2). 559–580. http://onlinelibrary.wiley.com/doi/10.1111/j.1749-818X.2008.00124.x/full

Shepherd, John R. 1993. Statecraft and political economy on the Taiwan frontier, 1600–1800.Stanford, CA: Stanford University Press.

Shosted, Ryan. 2006. Correlating complexity: A typological approach. Linguistic Typology 10.1–40.

Stannard, David. 1992. American Holocaust: Columbus and the conquest of the New World. NewYork: Oxford University Press.

AUTHOR’S COPY | AUTORENEXEMPLAR

AUTHOR’S COPY | AUTORENEXEMPLAR

Page 13: AUTHOR’S COPY | AUTORENEXEMPLARtuvalu.santafe.edu/~desmith/PDF_pubs/LITY.2011.020.pdf · nucleus (range 1–2), and coda (range 0–3). A language which allows nothing more complex

Geographical distribution of phonological complexity 279

Trudgill, Peter. 2004. Linguistic and social typology: The Austronesian migrations and phonemeinventories. Linguistic Typology 8. 305–320.

Wiley, R. Haven & Douglas G. Richards. 1978. Physical constraints on acoustic communication inthe atmosphere: Implications for the evolution of animal vocalizations. Behavioral Ecologyand Sociobiology 3. 69–94.

AUTHOR’S COPY | AUTORENEXEMPLAR

AUTHOR’S COPY | AUTORENEXEMPLAR