From Predictive Statistical Models to Descriptive Color...

Triangulating Perspectives on Lexical Replacement

From Predictive Statistical Models to Descriptive Color Linguistics

Susanne Vejdemo

Academic dissertation for the Degree of Doctor of Philosophy in Linguistics at Stockholm University to be publicly defended on Friday 3 March 2017 at 13.00 in hörsal 4, hus B, Universitetsvägen 10 B.

Abstract

The aim of this thesis is to investigate lexical replacement processes from several complementary perspectives. It does so through three studies, each with a different scope and time depth.

The first study (chapter 3) takes a high time depth perspective and investigates factors that affect the rate (likelihood) of lexical replacement in the core vocabulary of 98 Indo-European language varieties through a multiple linear regression model. The chapter shows that the following factors predict part of the rate of lexical replacement for non-grammatical concepts: frequency, the number of synonyms and senses, and how imageable the concept is in the mind.

What looks like a straightforward lexical replacement at a high time depth perspective is better understood as several intertwined gradual processes of lexical change at lower time depths. The second study (chapter 5) narrows the focus to seven closely-related Germanic language varieties (English, German, Bernese, Danish, Swedish, Norwegian, and Icelandic) and a single semantic domain, namely color. The chapter charts several lexical replacement and change processes in the pink and purple area of color space through experiments with 146 speakers.

The third study (chapter 6) narrows the focus even more, to two generations of speakers of a single language, Swedish. It combines experimental data on how the two age groups partition and label the color space in general, and pink and purple in particular, with more detailed data on lexical replacement and change from interviews, color descriptions in historical and contemporary dictionaries, as well as botanical lexicons, and historical fiction corpora.

This thesis makes a descriptive, methodological and theoretical contribution to the study of lexical replacement. Taken together, the different perspectives highlight the usefulness of method triangulation in approaching the complex phenomenon of lexical replacement.

Keywords: semantics, lexical typology, semantic typology, historical linguistics, historical semantics, lexical replacement, lexical change, rate of lexical replacement, color, regression models, Swedish, English, German, Danish, Norwegian, Icelandic, method triangulation.

Stockholm 2017
http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-137874

ISBN 978-91-7649-644-2
ISBN 978-91-7649-645-9

Department of Linguistics
Stockholm University, 106 91 Stockholm



Triangulating Perspectives on Lexical Replacement

From Predictive Statistical Models to Descriptive Color Linguistics

Susanne Vejdemo


Table of Contents

Acknowledgements

1 Introduction
1.1 Research questions and perspectives
1.2 Studies and thesis structure
1.3 Style conventions

2 Background
2.1 Words and concepts
2.2 Lexical replacement and lexical change
2.3 Lexical replacement and primary words
2.4 Where do new primary words come from?
2.5 Is there regularity in lexical change?
2.6 Lexical change is not uniform
2.7 Some semantic and pragmatic factors in lexical change
2.7.1 Emotional charge
2.7.2 Imageability
2.7.3 Frequency and entrenchment
2.7.4 Subjectification, inferences, polysemy, synonymy
2.7.5 Speaker age effects

3 Macro-perspective: A model of some semantic and pragmatic causes of lexical replacement
3.1 Background and method
3.1.1 Measuring lexical replacement: basic assumptions
3.1.2 Lexical replacement and Swadesh lists
3.1.3 Earlier models of lexical replacement
3.1.4 Evaluation of earlier models
3.2 A new model
3.2.1 Variables in the model
3.2.2 Word class (Semantic Categories)
3.2.3 Imageability
3.2.4 Entrenchment (as frequency and co-occurrence)
3.2.5 Age of acquisition
3.2.6 Emotional charge / Arousal
3.2.7 Senses
3.2.8 Synonyms
3.3 Results and discussion
3.4 Summary
3.5 Rate for particular semantic domains
3.6 Chapter end notes and bridge to the next chapter

4 Introduction to color studies
4.1 Terms and definitions
4.2 Background on color linguistics
4.2.1 The Berlin and Kay paradigm
4.2.2 New color concepts appear in border regions
4.2.3 Color concepts and perspective shifts
4.2.4 Labels for color concepts gradually become simpler
4.2.5 Intra-language variation in color labeling
4.2.6 Intermittent summary
4.3 The EoSS Experiment Protocol

5 Meso-perspective: Cross-linguistic lexical change in pink and purple
5.1 Languages and speakers
5.2 General results
5.3 The PINK1 and PINK2 concepts
5.4 System A: a single PINK1 concept
5.5 System B: the PINK1 concept
5.6 System B: the PINK2 concept
5.7 The PURPLE1 concept
5.8 Rare responses
5.9 Discussion

6 Micro-perspective: diachronic lexical change in pink and purple
6.1 Words for pink and purple in historical texts
6.1.1 Textual Material
6.1.2 Results
6.2 Meta-awareness of change: interviews with older speakers
6.2.1 Method and material
6.2.2 Results
6.3 Summary of textual and interview results
6.4 Capturing intergenerational differences through color elicitation
6.4.1 Supplementary notes on methodology
6.4.2 Results
6.4.3 Differences over the entire spectrum
6.4.4 Differences within the PINK1, PINK2, and PURPLE1 areas
6.5 Discussion
6.5.1 Are derived colors in general, and purple and pink in particular, special?
6.5.2 Lexical change processes in pink and purple
6.5.3 Reconnecting with some previous theories

7 General conclusions

Appendix A: Factors influencing the rate of replacement
Appendix B: Naming task results
Appendix C: Best example task results
Appendix D: EoSS codes, Munsell codes, Hex codes
Appendix E: The effect of color blindness
Appendix F: Swedish summary / Svensk sammanfattning

References


ACKNOWLEDGEMENTS

Just as it takes a village to raise a child, it takes a community to create a dissertation. This thesis would not have been possible without the generous support of my supervisors and academic role models, Maria Koptjevskaja Tamm and Bernhard Wälchli. You have taught me so much, and I will be forever grateful for all the help, insights, arguments, and laughter. You have been the anchoring string to my plunging / flying / soaring kite, and I have always felt secure in your support in matters both professional and personal.

The heart of my community is the Department of Linguistics at Stockholm University. It’s where I grew up academically, and where I have always returned after academic adventures around the world. I am very grateful to my academic family there and the great learning and research environment that they have created. It is a privilege to know that you can knock on any door and always be greeted with an enthusiastic “Sure, I’d love to read three pages on color theory by tomorrow, that sounds just fascinating!” I wish to thank Professor Östen Dahl, my “mock-defense opponent,” who has been my employer in several research projects over the years, and who is also the person who introduced me to linguistics in the first place, back when I was still a high school student. There are so many colleagues at the Department who have helped me in learning the trade of linguistic research – in seminars, in lunch room discussions, in detailed and insightful comments on drafts – but I would especially like to mention Ljuba Veselinova, Emil Perder, Eva Lindström, Kristina Nilsson Björkenstam, Sofia Gustafson Capková, and my statistics guru Thomas Hörberg.

In the wider community of Stockholm University, I am very grateful for the support of the steering committee of FoSpråk, the Special Doctoral Programme in Language and Linguistics. I’ve been fortunate enough to work and learn alongside other PhD students both in and outside the programme. In particular I would like to mention Sigi Vandewinkel (a treasured co-author), Guillermo Montero Melis, and, of course, my wonderful roommates Ghazaleh Vafaeian and Pernilla Hallonsten Halling: thanks for all your comments, in tutorials, over tea, and in late night discussions in the saunas of Bommersvik.

Widening the circle yet again, I wish to thank the PIs and other members of the Evolution of Semantic Systems consortium (Max Planck Institute for Psycholinguistics, Nijmegen). In particular, I am thankful to Michael Dunn for lots of interesting insights, for playing host for my visit to the institute, and for giving me comments on early drafts of the paper (co-written with Thomas Hörberg) that forms the basis of chapter three in this thesis. Through the consortium I met my co-authors for the paper that forms the basis of the fifth chapter of this thesis: Þórhalla Guðmundsdóttir Beck, Cornelia von Scherpenberg, Åshilde Næss, Martina Zimmerman, Linnea Stockall and Matthew Whelpton. I am especially happy that the project brought me into contact with Carsten Levisen.

I would also like to thank the organizers and participants at the Colour Language and Colour Categorisation Conference in Tallinn, 2013. There, and at various other conferences, I have met people who have since been kind enough to correspond with me about lexical semantics, such as Mari Uusküla, Magalie Desgrippes, Elena Parina, and Misuzu Shimotori. I have also received grants to support my travels from Gålöstiftelsen, FoSpråk, and Stockholm University.

I am also fortunate to be surrounded by a wealth of non-linguists who have aided the growth of this thesis through their various skills and talents – social media has turned out to be an excellent research tool, since posting a question about statistics, programming, or evaluation of semantic content has often led to dozens of valuable replies. This great set of people includes, among many others, the ever-helpful Gustaf Rydevik, Jon Karlfeldt, Stefan Björk, and Annika Waern.

I also wish to thank all the participants in my experiments, the reviewers and editors who helped with the journal papers, and Lamont Antieau, my proof-reader.

I would never have made it this far without my family, who has always supported me wholeheartedly – in particular, of course, my wonderful parents, Kerstin and Stefan, the bedrock of my existence.

Finally, there is my best friend, statistician, programmer, cook, driver, proof-reader, chocolate procurer, cheerleader, designer of the front-page image of this thesis, co-worker, and sometime co-author: my husband Mikael Vejdemo-Johansson. Wherever I may wander, you will always be my home.

Thank you.


1 INTRODUCTION

Since ‘tis natures law to Change, Constancy alone is strange.

-John Wilmot

Languages change. It is an essential part of their very nature. The systematic scientific inquiry into how and why these changes occur is a common research topic in linguistics, and great advances have been made in uncovering the rules and tendencies of phonological and grammatical changes. It has proven more difficult to find systematic rules for changes in the lexicon, and this thesis contributes to that ongoing discussion.

The aim of this thesis is to investigate lexical replacement processes from several complementary perspectives. A typical example of lexical replacement is the change of the most common word for ‘girl’ in English from maiden (Old and Middle English) to girl (Modern English). In modern speech, maiden can no longer be used as a neutral expression to refer to any young female – the word has been replaced for this particular meaning. Assuming that the meaning ‘girl’ does not vary, prototypical lexical replacement is the fact and the process of this meaning being denoted first by one word (i.e. maiden) and then by another (i.e. girl).

The process maiden → girl seems, on the surface, to be a straightforward kind of lexical replacement. However, to a large extent this seeming simplicity is an illusion born from the lengthy (broad) time perspective: from a time scale of several centuries, we can conclude that the replacement has taken place, without knowing how or why, or what effects the replacement has had on other parts of the vocabulary.

The color domain yields good examples of complex kinds of lexical replacement. Field bindweed (Convolvulus arvensis L.) is a flower that is sometimes white and sometimes pink, and this fact has consistently been reported in floras (botanical encyclopedias) during the last few centuries – but over time, the color terms have changed. In 19th century Swedish floras, the term ljust röd ‘light red’ was typically used. At the beginning of the 20th century, however, floras referred to the same flower as skär ‘pink’. And by the beginning of the 21st century, the flower was being described as rosa ‘pink’.
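As a rough illustration of the onomasiological bookkeeping behind this example, the following Python sketch records the three flora attestations reported above and derives the chain of terms from them. The function names and the coarse period labels are illustrative assumptions, not part of the thesis, and the sketch deliberately treats replacement as a discrete switch between terms, which is exactly the simplification that a broad time perspective invites.

```python
from itertools import groupby

# Flora descriptions of field bindweed as reported in the text.
# The period labels are coarse glosses, not exact dates, and the
# records are assumed to be listed in chronological order.
attestations = [
    ("19th century", "ljust röd"),
    ("early 20th century", "skär"),
    ("early 21st century", "rosa"),
]


def replacement_chain(records):
    """Return the sequence of distinct terms, collapsing consecutive repeats."""
    terms = [term for _, term in records]
    return [term for term, _ in groupby(terms)]


def count_replacements(records):
    """Count how many times the term for the same referent was replaced."""
    return max(len(replacement_chain(records)) - 1, 0)


print(" → ".join(replacement_chain(attestations)))
print(count_replacements(attestations))
```

A representation like this captures only the order of the dominant terms. It says nothing about periods in which two terms coexist, or about changes in what each term denotes.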

The ljust röd → skär → rosa example is a slightly more complicated example of lexical replacement than maiden → girl, at least on the surface. From a broader time perspective, it is clear that the same referent – the color of a particular flower – is connected to a lexical replacement ljust röd → skär → rosa. Several concrete questions immediately arise from this example, however:

Rosa and skär co-existed historically in Swedish, and still do. At what point in time could a lexical replacement be said to have occurred?

How does a lexical replacement process interact with other kinds of change? For instance, the term röd ‘red’ and its modified form ljust röd ‘light red’ still exist in modern Swedish. That means that the lexical replacement described above is connected with a semantic change: the area of the perceptual color space that röd denotes has changed, since it is now not the most natural way for speakers to describe the color of the field bindweed. Similarly, skär is still used (though rarely) in modern Swedish, but usually only for the lightest shades of pink.

How quickly do lexical replacement processes occur? During the same time period in which the word describing the color of field bindweed has been replaced two times in Swedish, the word for the typical color of grass has stayed the same.

How uniform is lexical replacement? Is the process of change from ljust röd to skär the same as that from skär to rosa?

All these questions belong to the lexical typological research tradition in linguistics – defined as the “characteristic ways in which language […] packages semantic material into words” by Lehrer (1992, p. 249) and “the cross-linguistic and typological dimension of lexicology” by Koptjevskaja-Tamm (2008, p. 5), or more concretely the “systematic study of cross-linguistic variation in words and vocabularies” (Koptjevskaja-Tamm, 2016, p. 4). Lexical typological research can have both more local approaches (restricted to a particular lexical field, to a particular process, or to a particular polysemy pattern) and more general approaches (with the aim of uncovering patterns that are relevant for the structuring of the entire lexicon) (Koptjevskaja-Tamm, 2008, p. 6).

1.1 RESEARCH QUESTIONS AND PERSPECTIVES

In this thesis, I will argue that it is both possible and worthwhile to seek generalizations for lexical replacement, just as linguists seek generalizations for phonetic and grammatical change. The example concerning Swedish words for ‘pink’ makes it clear that it is difficult, at least at narrower time scales, to separate lexical replacement from other processes, such as semantic change. This thesis will work to answer three research questions, stated here, together with related sub-questions.

A. What affects the likelihood of lexical replacement?

A1. Is it possible to formulate global, domain-independent generalizations?
A2. How can local, domain-dependent generalizations be found?

B. How does lexical replacement proceed in the semantic domain of color?

B1. How does lexical replacement interact with other kinds of lexical change?
B2. How does knowledge about the semantics and pragmatics of color (psychophysiology, sociohistory) help elucidate lexical replacement in this domain?

C. How do different perspectives on lexical replacement relate to and complement each other?

I contend that generalizations for lexical replacement are most successful when they are framed within several complementary perspectives. Some of the most important perspectives are:

• Time scale: what might be a useful generalization from the perspective of many centuries might not be as useful in understanding processes of change in the time frame of a few generations.

• Semantic domain scope: Some generalizations will be domain-independent, and some will be different depending on the semantic domain.

• An onomasiological or a semasiological viewpoint.1

I mainly focus on denotative meaning (which part of reality is indicated by a word), for context-less words and the concepts they are connected to.

1 Briefly: Onomasiological, starting the analysis from a concept; semasiological, starting the analysis from lexical items. The terms will be discussed more in section 2.2.


1.2 STUDIES AND THESIS STRUCTURE

This thesis employs three different time perspectives from which to study lexical replacement – the three studies will cover, respectively, macro-time (the age of Indo-European: several millennia), meso-time (several centuries), and micro-time (a few generations). The first study will not be restricted to any particular semantic domain, while the latter two will focus on color. Throughout, the thesis will alternate between a semasiological and an onomasiological perspective.

I argue that triangulation of these perspectives can be a fruitful way of approaching this complex issue, and that each perspective gives complementary insights into why, how, and how fast lexical replacement proceeds.

Overall, the thesis has seven chapters. Chapter 2 is a general background chapter and is followed by the chapters for the three studies.

The three studies of the thesis are summarized in Table 1. The broader the time scale, the greater the amount of data from more languages that can be used, but the analysis will be shallower, lacking the rich detail of individual word histories. The narrower the time scale, the more detailed the information on the process of lexical replacement, but generalizations are also hampered by the small amount of material and the way changes in one word affect other words.

Chapter 3 houses the first study, which concerns domain-overriding generalizations of lexical replacement (research question A1). The chapter attempts to explain part of the likelihood (rate) of lexical replacement by building a statistical model using data from 87 Indo-European languages. Part of the material in this chapter has been previously published in Vejdemo and Hörberg (2016). The issue of domain-dependent generalizations (research question A2) is discussed in the last section, as a bridge to the rest of the thesis.
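The abstract identifies the statistical model of chapter 3 as a multiple linear regression. For readers unfamiliar with the technique, the following Python sketch shows what fitting such a model involves in the simplest case: ordinary least squares solved through the normal equations. Everything in it is an assumption made for illustration. The function name, the two unnamed predictors, and the synthetic numbers are invented and do not reproduce the variables, data, or results of the thesis.

```python
def fit_ols(predictors, response):
    """Return [intercept, b1, b2, ...] minimizing the squared error.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    Assumes the predictors are not perfectly collinear.
    """
    rows = [[1.0, *map(float, row)] for row in predictors]  # intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, response))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry to the diagonal
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


# Synthetic toy data (NOT from the thesis): two hypothetical predictors per
# concept and a response built from an arbitrary, exactly linear rule.
toy_predictors = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
toy_rates = [1 + 2 * x1 - 3 * x2 for x1, x2 in toy_predictors]

coefficients = fit_ols(toy_predictors, toy_rates)
print([round(c, 6) for c in coefficients])
```

Because the toy response is generated without noise, the fit recovers the arbitrary generating coefficients. With real data each coefficient estimates how the response changes with one predictor while the others are held constant, and a statistics package would normally be used to obtain standard errors and significance tests as well.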

Chapters 4, 5, and 6 all concern domain-specific generalizations for lexical replacement (research questions A2 and B) in the semantic domain of color. Chapter 4 is an introduction to the linguistics of color, and also presents some methodology common to chapters 5 and 6.

Chapter 5 is a cross-linguistic synchronic study of the lexicalization strategies of the pink and purple parts of the perceptual color space in seven Germanic languages. From this, diachronic information on lexical replacement and other kinds of change can be inferred (research question B1). Knowledge of the sociohistory of color in the region will be combined with elicitation experiment results (thereby addressing part of research question B2). Part of the material in this chapter has previously been published in Vejdemo, Levisen, Guðmundsdóttir Beck, von Scherpenberg, Næss, Zimmerman, Stockall, and Whelpton (2015).

Chapter 6 is an intra-language study of lexical replacement processes concerning pink and purple in Swedish. The chapter combines dictionaries and encyclopedias from the last few centuries, corpus research into historical fiction novels, interviews with speakers, and comparisons of elicitation experiment results from two generations of Swedish speakers to gain a fuller picture of lexical replacement (thereby addressing part of research questions B1 and B2).

Chapter 7 contains a general discussion of the research questions and will combine the perspectives of the three studies, thereby specifically reconnecting with research question C.

1.3 STYLE CONVENTIONS

When words are discussed, they will be written in italics (röd). Concepts will be written in small capitals (RED), while more loosely defined meanings (often tentative translations) will be written within single quotation marks (ljusröd ‘light red’). Cognate classes are written inside regular quotation marks (“rot”). The reader is reminded that all translations, especially when it comes to color terms, are tentative.


Table 1. Organization of the studies in the thesis: the macro study in chapter 3, the meso study in chapter 5, and the micro study in chapter 6

| | Chapter 3 | Chapter 5 | Chapter 6 |
|---|---|---|---|
| Time scale & scope | Macro: several millennia, 87 language varieties | Meso: several centuries, seven Germanic languages | Micro: two generations, one language |
| Method & material | A statistical model tests domain-independent hypotheses about lexical replacement (based on a database of cognate class judgments of a Swadesh list) | Comparison of variation, using elicitation experiment results, supplemented with dictionary data | Comparison of variation and change, using elicitation experiment results, supplemented with interviews, dictionaries, floras, corpora |
| Domain | Core vocabulary | Color, with focus on pink and purple | Color, with focus on pink and purple |


2 BACKGROUND

The difficulty with formulating generalizations for changes in the lexicon stems at least partly from the sheer amount of material to be explained. Ever since the neo-grammarians’ startling revelation that sound change is regular and not random (see e.g. Paul, 1886), there has been a concentrated effort to find generalizations and rules in phonetic change – however, in contrast to the number of words in the world’s languages, the number of phonemes is relatively small. Similarly, in the last century advances have also been made in finding generalizations and rules for grammaticalization (a term coined by Meillet, 1921) processes, which, simply put, relate to how semantically rich content words change and become semantically poorer function words, or the way in which the grammatical content of function words changes. But the number of function words and grammatical functions is also smaller than the number of content words and phenomena that can be named.

The great majority of replacement and change in language is not connected to phonemes or function words, but to semantically rich content words that change into, or are replaced by, other content words. This makes it both important, and difficult, to find generalizations for this kind of change.

This background chapter will first introduce some fundamental terminology and assumptions (section 2.1), and then go on to discuss the relationship between lexical replacement and other kinds of lexical change, as well as the semasiological and onomasiological perspectives that might be taken for such changes (2.2). In section 2.3, the term “primary (and secondary) word” is introduced, and section 2.4 turns to where new primary words come from. Section 2.5 argues that there is regularity in lexical change (and its subprocess, lexical replacement), and section 2.7 turns to different hypotheses on why and how such regularity might arise.


2.1 WORDS AND CONCEPTS

Any research into meaning change must take a stance on what meaning is, and there is a rich and terminologically confusing literature on the subject. The aim of this section is to introduce some fundamental distinctions and address the issue of categorical particularism: is it even possible to ever compare the meaning of two words in different languages?

This thesis will mainly discuss words and the concepts that words designate. In order to gain a better understanding of the nature of concepts, some further ideas should be established.

It could be suggested that Swedish hatt, English hat, and Spanish sombrero all denote the same cross-linguistic semantic category HAT, which somehow exists independently of all of these languages. This makes sense at face value, but is a gross simplification: there very seldom exists an exact, perfect semantic overlap between words like hatt and hat. Differences can often be found, either denotationally (the ranges of objects that the different words can be used for: hat can be used for hard top hats and soft hats in English; Swedish hatt can only be used for the former) or connotationally (the more general associations the words bring forth in the minds of the speaker: for me, even out of context Swedish hatt readily evokes the idea of formal or fancy wear, while English hat does not). The Swedish word hatt could be said to match a language-specific concept HATT, which carries information on possible subsenses, denotation, connotation, morphosyntactic use, etc. In contrast, the English word hat could be said to match a language-specific concept HAT.

Haspelmath (2010) makes a distinction between language-particular descriptive categories (categories that are used to describe particular languages); cross-linguistic categories (that, if they exist, have some claim to universality or at least cross-linguistic applicability); and comparative concepts (that are linguist-specific, and are claimed to be useful in cross-linguistic comparison).

Categorical particularism is the scholarly position claiming that it is not useful to talk about cross-linguistic categories, but only about language-particular descriptive categories. This, however, would seem to mean that languages become “incommensurable systems” and that comparison is not possible (Boas, 1911; Haspelmath, 2010, p. 681).

In order to allow for comparison between languages, Haspelmath (2010, p. 663) suggests that crosslinguistic comparison should be based on comparative concepts created by the researcher, rather than on crosslinguistic categories that are assumed to be instantiated in different languages. “Comparative concepts are concepts created by comparative linguists for the specific purpose of crosslinguistic comparison. Unlike descriptive categories, they are not part of particular language systems and are not needed by descriptive linguists or by speakers. They are not psychologically real, and they cannot be right or wrong. They can only be more or less well suited to the task of permitting crosslinguistic comparison” (Haspelmath, 2010, p. 665; see also 2016).

Haspelmath (2010) uses these theoretical tools mainly to discuss morphosyntactic categories (like word classes), but they can also be applied to lexical semantics. This thesis will use the notion of comparative concepts in semantic research, and will deviate from the strict definitions that Haspelmath suggests are necessary for comparative concepts.

Several scholars have noted that comparative concepts may be based on the observation of specific cross-linguistic data patterns, which makes them discoveries rather than inventions (Beck, 2016, p. 398; Dahl, 2016, p. 431; Moravcsik, 2016, p. 422). There is good reason to accept vaguely defined comparative concepts: “vague categories with a prototype core and fuzzy boundaries” as expressed by Lander and Arkadiev (2016, p. 411) (see also Dahl, 2016, p. 435; LaPolla, 2016, p. 367). For some comparative concepts, it might be possible to establish rather exact definitions (‘the adjustable tool used to clamp down on and tighten nuts and bolts with edges’), which makes it easy to identify language-specific words (English: adjustable wrench; Swedish: skiftnyckel) so that comparisons can be made. For many semantic comparisons, however, that level of definition is impractical, if not impossible, and the definition of the comparative concept remains vague.

This does not mean that comparisons cannot be made. I would claim that one semantic example of comparative concepts already in use in linguistic literature is the shared concept alluded to in discussions about translation equivalents. Swedish hund and English dog are translation equivalents because they both match a very similar idea and because they are the most neutral translation that a bilingual speaker would give, if asked to translate one of the terms to the other language. The idea (DOG) that they both match is a comparative concept, even though it is often not defined outright. The discussion should then concern whether the level of care taken to identify the translation equivalents is good enough for whatever task the researcher wants to accomplish, not whether it is right or wrong that hund and dog both match the same comparative concept (cf. the discussion of the pragmatic stance in cross-linguistic research in Koptjevskaja-Tamm, 2008, p. 10).

When this thesis compares concepts between different speakers, different languages, or different generations of speakers of the same language, it is, unless otherwise stated, this kind of comparative concept that is intended.

Having briefly discussed how concepts and words will be used in this thesis, a few words should be said about referents. This thesis will mainly deal with denotations of words, by which is meant the extension in the real world that a word refers to: the denotation of dog is typically the set of all dogs; the denotation of red is typically the part of the perceptual color spectrum that red refers to. Occasionally, the connotation of a word will also be discussed, by which is meant the associated notions: dog might be associated with loyalty, or pets; red might be associated with danger, lipstick, or roses.

I use the vaguer terms “to mean” and “a meaning” when it is not necessary for the discussion at hand to determine or define a concept – instead, “meaning” should be read as “semantic content” and “to mean” as “to have (a certain) semantic content”.

While acknowledging that the exact meaning of a sign emerges on each occasion of its use, this thesis will take the approach that the meaning of a word can and should also be studied at a more abstract level.


2.2 LEXICAL REPLACEMENT AND LEXICAL CHANGE

Koch (2016, p. 21ff.) writes that the connected linguistic processes of lexical replacement (which he calls change of designation) and semantic change are both instances of a more general process, lexical change. What makes a historical lexical process a case of lexical replacement or a case of semantic change has mainly to do with whether a semasiological or an onomasiological perspective is used. This section will define and exemplify all these terms.

A prototypical case of lexical replacement was given in the introduction to this thesis: girl ‘young female person’ replacing maiden ‘young female person’. Parallel to girl replacing maiden, another lexical change (a semantic change) occurred when girl ‘young person’ came to mean girl ‘young female person’. The observation that girl ‘young person’ replaced maiden ‘young female person’ is made from an onomasiological perspective of a lexical change. This means that the analysis starts from a concept or meaning that is assumed not to change (e.g. ‘young female person’) and investigates which words are used to refer to that meaning. The observation that girl used to denote ‘young person’, and then came to denote ‘young female person’, takes a semasiological perspective of this lexical change. This means that the analysis starts from a word (e.g. girl) and investigates which meaning(s) the word can refer to.

Semasiology and onomasiology are thus used for two complementary but inverse perspectives from which it is possible to view the connections between words and meaning. The term “semasiology” was first used by Reisig (1881; for an overview of the development of the term, see Nerlich, 2001), and “onomasiology” was defined as its counterpart by Zauner (1902). Geeraerts (2004, p. 653) notes that: “Semasiology considers lexemes and the way their meanings are manifested. Onomasiology considers concepts and investigates the way they relate to one another through language and the way that they are denoted through language.” Similar definitions can be found in Kleparski and Borkowska (2007, p. 127) and Grondelaers, Speelman, and Geeraerts (2007, p. 989).

The assumption that the concept does not change when a word is replaced is only valid for the most prototypical cases of lexical replacement. This thesis will, however, not limit itself to prototypical cases. Different lexical change processes overlap to a great degree, so that a particular historical change might involve both semantic change and lexical replacement.


There are, of course, several other kinds of lexical change: for instance, sometimes a word is not replaced, but merely altered, as when television was replaced by tv. This kind of lexical form change is often connected to lexical replacement processes, but will not be a focus of this thesis.

2.3 LEXICAL REPLACEMENT AND PRIMARY WORDS

So far, prototypical lexical replacement has been loosely defined as a process that occurs when a concept in a language is first primarily designated by one word, and then later by another word. Clearly, there can be many words that denote the same or at least a very similar concept. Kleparski (1997, p. 71) talks about a concept’s primary designating expression and a concept’s secondary designating expressions (called primary and secondary words in the rest of this thesis). The former has the greatest chance of being chosen by speakers to name any of the entities the concept may legitimately refer to: the primary word is more neutral and has greater applicability than secondary words. As an example, in English, the concept ADULT FEMALE HUMAN BEING has as its primary word woman but also has several secondary words, like the nouns lady or female.

After a word has been replaced (lexical replacement) as the primary word for a concept, it may remain in the language with a slightly different meaning (semantic change) – maiden is still used in English, but is a rare word that denotes a virgin (usually female) or is used in constructions like maiden voyage or maiden speech (see Kleparski, 1997 for a detailed semantic history of words for girl in English).

A former primary word that is replaced may also remain in the language and continue to denote the original concept as a secondary word. An example is Norwegian Bokmål pike, which was the most common term for ‘girl’ in the beginning of the 20th century, but which has now been superseded by jente. Pike is still used with the meaning ‘girl’, however, though it is seen as old-fashioned (Vejdemo, 2009).

The assumption that there is often a single most neutral, most frequent word for a particular concept is useful mostly at a broader time scale – such as observing a change from maiden to girl as the most neutral way of referring to a ‘young female person’ in English. At smaller time scales, the situation becomes more complex – polysemy may occur, and different words might be used by different age groups or social groups in the language community.


2.4 WHERE DO NEW PRIMARY WORDS COME FROM?

When lexical replacement occurs, the new word can come from three principal sources (see also Grzega, 2003, p. 35):

a) Speakers make the inference that an existing word in the language could also be used for the meaning in question. This other word is appropriated and given a new meaning. One example is seen in the Modern French word jument replacing the earlier word cavale for ‘mare’; jument originally meant ‘pack horse’.

b) An inference is made from a new combination of linguistic elements already existing in the language. An example is seen in how Swedish städare ‘cleaner’ is in the process of being replaced by lokalvårdare ‘building+caretaker’.

c) A word, interpreted as having a meaning similar to another word, is borrowed from another language (or dialect). An example is seen in how Swedish personalavdelning ‘employee+department’ is in the process of being replaced by the English loanword human resource (department), or its acronym HR.

Several works have sought to build general, domain-overriding taxonomies to document the many different ways that a, b, and c (and other kinds of lexical change) may play out in languages (see e.g. Ullman, 1957). These taxonomies are mainly descriptive and make little attempt to predict what kind of lexical change might happen for a given concept. A discussion of these processes is outside the scope of this thesis, and it suffices to note that two of the most frequently recurring lexical change processes in the taxonomies are when a word comes to be used to denote A) more semantic material than it did before or B) less semantic material. A (or an idea very similar to A) is referred to as “expansion” in (an English translation of) Bréal (1897/1900) and “widening” in Ullman (1957). B (or an idea very similar to B) is referred to as “restriction” in Bréal (1897/1900) and “narrowing” in Ullman (1957).

Another productive research area for generalizations on why and how lexemes change can be found in works on borrowing: for instance, it has been shown that words belonging to different word classes have different propensities for being borrowed (process c above). Haspelmath and Tadmor (2009b) use data from 41 languages in the World Loanword Database to show that there is a difference in how likely it is for borrowing to occur (borrowability) in terms of word classes: most importantly, nouns are far more likely to be borrowed than verbs. This echoes earlier findings from, among others, Haugen (1950, p. 224), who looks at the speech of Norwegian immigrants to the United States and finds that nouns are more likely to be borrowed than verbs, which in turn are more likely to be borrowed than adjectives (this can be expressed as nouns > verbs > adjectives). Moravcsik (1975) claims that verbs are seldom borrowed into the recipient language as verbs, but are rather borrowed as nouns that then undergo verbalization. Singh (1980, p. 113) looks at English loanwords in Hindi and finds nouns > verbs > adjectives. Building on Haugen, Muysken (2000, p. 74) suggests instead the hierarchy nouns > adjectives > verbs. Wohlgemuth (2009, p. 292) contains a good overview of the research on this topic and cautions that while nouns do seem to be easier to borrow than verbs, this is mainly due to the fact that words with nouny semantic content are more common in languages than words with verby semantic content. The semantic content, not morphosyntactic features, is what drives the phenomenon, and the pragmatics of language contact situations means that words referring to concrete objects are more important to share than words referring to actions or qualities.

These suggested borrowability rules are notably different from what is suggested about lexical replacement in general in Pagel, Atkinson and Meade (2007) (using Indo-European languages) and Vejdemo (2010) (using Indo-European and Austronesian data). These studies find that lexical replacement is more likely to affect verbs than nouns. The results for adjectives are inconclusive and vary for different methodologies and different data sets. The discrepancy between the semantic categories for a) the general likelihood for any kind of replacement (where verbs are more likely to be replaced) and b) the likelihood that the replacement happens due to borrowing (where nouns are more likely to be borrowed) may indicate that nouns and verbs undergo different typical replacement strategies. Nouns might be more likely to get replaced by borrowed words (as when Swedish krockkudde ‘lit. crash pillow, airbag’ was replaced by the loanword airbag under influence of the English word), according to strategy c above. Verbs might be more likely than nouns to be replaced by a synonym already present in the language (as when Swedish dinera ‘eat (dinner)’ became rarer and rarer so that it is now in the process of being replaced by the existing äta mat ‘eat food’), according to strategy a or b above. The inconclusive results for adjectives indicate the need for more studies of their replacement processes.


2.5 IS THERE REGULARITY IN LEXICAL CHANGE?

The sheer amount of diversity in human lexicons, as well as the interdependence and complexity of different kinds of lexical change, has led some researchers to see lexical change as an unpromising object of systematic scientific inquiry. If lexical change is not regular, any attempts to investigate it will be purely descriptive. Lehrer formulates it thus: “Every word has its own history. About the best we have come to hope for is a taxonomy, or classification schema” (1985, p. 283).

Several such classification schemas of different kinds of lexical change have been made – some of the most notable by (1886), Darmesteter (1887), Bréal (1897/1900), Stern (1931), and Ullman (1957). These often deal with the reasons behind lexical change (for instance: people seek to innovate), the linguistic tools used in the change (for instance: metaphor), and the results of lexical change (for instance: the possible referents of a word increase in number).

Classification schemas of different kinds of changes can be very useful, but are, on their own, theoretical constructions that state facts and do not seek to explain them (see Anttila, 1989, pp. 146–148 for a critique of the value of schemas). Hock (1986, p. 308) seems to be of the opinion that explanations and generalizations are difficult to find: “there seems to be no natural constraints on the directions and results of semantic change”.

While it might be difficult to formulate strictly causal explanations for lexical change, of the kind that are found in the natural sciences, general tendencies can be sought. Both Lehrer (1985) and Geeraerts (1997) write that some headway might be made: Lehrer (building on the Word Field Theory of Trier, 1931) shows that generalizations can be made for specific semantic domains (such as animal terms or gambling terms), and Geeraerts (1997) notes that some statistical generalizations and probabilistic predictions can be sought for semantic and lexical change. This thesis will employ both of these approaches.

Traugott and Dasher (2002, p. 1) are also hopeful, and claim that there are “predictable paths for semantic change across different conceptual structures and domains of language function” and that, while each instance of semantic change has its own story, semantic change is highly regular at a macro-level – both within a single language and across languages (Traugott & Dasher, 2002, p. 4).

Traugott and Dasher express the opinion that these regularities are not absolute, however – they are “possible, indeed probable, tendencies, not changes that are replicated across every possible meaningful item at a specific point in time in a specific language, such as the Neogrammarians postulated for sound change” (Traugott & Dasher, 2002, p. 1).

The same authors (2002, pp. 3–4) further note that extralinguistic factors have a constant effect on the use and interpretation of words, and therefore on lexical change, and this can make it difficult to find patterns. They write that nouns are “particularly susceptible to extralinguistic factors such as change in the nature of the social construction of the referent. For example, the referents of towns, armor, rockets, vehicles, pens, communication devices, etc., have changed considerably over time, as have concepts of disease, hence the meanings attached to the words referring to them have changed in ways not subject to linguistic generalization”.

Nonetheless, Traugott and Dasher (2002, p. 94ff) list three general tendencies of lexical change (→ indicates change):

• Meanings based in the external described situation → meanings based in the internal (evaluative/perceptual/cognitive) described situation. E.g. Old English felan ‘touch’ → ‘experience mentally’, Modern English grasp ‘take’ → ‘understand’.

• Meanings based in the external or internal described situation → meanings based in the textual and metalinguistic situation. E.g. anyway → ‘pragmatic particle for return to previous topic’.

• Meanings tend to become increasingly based in the speaker’s subjective belief state/attitude toward the proposition. (This is the dominant tendency according to Traugott and Dasher.) E.g. the development of honorifics is ongoing, since yesteryear’s honorifics gradually lose their exalted meaning.

In addition to such domain-independent generalizations, directionality (of change) generalizations can also be found for much more specific semantic domains. Wilkins (1996) shows that names for smaller visible arm-related body parts often become names for larger contiguous body parts (nail → finger → hand), but not the other way around. Brown and Witkowski (1983) also work with body parts, and show that in small-scale societies, ‘eye’ is more unmarked than ‘face’ – and the latter may be derived from the former by compounding or derivation. In the domain of color, Berlin and Kay (1969) suggest a particular universal evolutionary sequence according to which colors will appear in a language, and have claimed that this is unidirectional. This sequence will be discussed in more detail in chapter 4.


2.6 LEXICAL CHANGE IS NOT UNIFORM

The rate of language change is different for different parts of the vocabulary – and different from time period to time period. There is evidence, for instance, that time periods following social and technological change see more language change. Johnson (1996) suggests that when society sees changes in technology, level of education, economy and quantity of information, this can lead to an increase in the rate of language change. She compares questionnaire data on word use that she collected in the 1990s with similar data collected in the 1930s, and shows that, for example, as familiarity with farming declined in the US, so did the knowledge of words pertaining to agriculture and husbandry. Speakers interviewed in the 1990s were often unable to supply synonyms for words like stallion, calve, ram.

Juola (2003) makes a similar point. He uses KL-distance (Kullback-Leibler distance), an information-theoretic measure of relative entropy, to compare documents based on the similarity of the words they contain. Juola (2003) calculates the linguistic distance between different articles in the National Geographic magazine from 1939 to 2000. He finds that the rate of language change has not been uniform. Most notably, English changed less during the Second World War, and then had a period of particularly rapid change in the decades following the war. Juola (2003, p. 90) shows that over time periods as short as a decade, linguistic change is algorithmically perceptible. He theorizes that the social upheaval of the war caused both soldiers and civilians to have new experiences and encounter new technologies, and that this led to changes in the spoken language. Changes in the spoken language then took some years to percolate into the written language of the National Geographic. In Juola (2005, pp. 171–172) the author correlates the amount of linguistic change with the per capita rate of patents in the US over the time period 1930–1980. A significant correlation (r=0.32, p<0.05) was detected between the rate of change in the National Geographic articles and the number of patents eight years earlier. The author draws the conclusion that it takes nearly ten years for the effect of technological innovation (as measured by the patents) to show up in the language of general readership magazines, and that technological change can affect linguistic change, though there is no reason to think that it is the only factor.
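To make the notion of a KL-based distance between documents more concrete, the following minimal sketch computes the Kullback-Leibler divergence between the word distributions of two short texts. It is an illustration only and should not be read as Juola's actual estimator: the whitespace tokenization, the add-one smoothing, and the function names (`word_distribution`, `kl_divergence`, `kl_distance`) are all assumptions made for this example.

```python
import math
from collections import Counter


def word_distribution(text, vocabulary, alpha=1.0):
    """Relative word frequencies over a shared vocabulary.

    Add-alpha smoothing (an assumption of this sketch) keeps every
    probability above zero, so the divergence below is always defined.
    """
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocabulary) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}


def kl_divergence(p, q):
    """D(P || Q) in bits; p and q must share the same keys."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p)


def kl_distance(text_a, text_b):
    """KL divergence between the smoothed word distributions of two texts."""
    vocabulary = set(text_a.lower().split()) | set(text_b.lower().split())
    p = word_distribution(text_a, vocabulary)
    q = word_distribution(text_b, vocabulary)
    return kl_divergence(p, q)


# Hypothetical mini-documents: identical texts give 0, differing texts give > 0.
same = kl_distance("the horse ran", "the horse ran")
different = kl_distance("the stallion ran", "the horse ran")
```

Note that the divergence is asymmetric in general, so comparing an older text to a newer one need not give the same value as the reverse comparison.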

Other evidence for the link between technological innovation and, specifically, lexical change is found in the semantic domain of color, where the advance of chemical dyeing processes has been connected to an increase in color words (see Casson, 1997, and also chapter 4).


Bochkarev, Solovyev, and Wichmann (2014) use a far larger corpus than Juola to investigate how the rate of change in language varies over time. They look at the Google Ngram corpus, which has information on word frequencies from 1520 onwards. The corpus has 862 billion words in eight languages, of which the authors used six (English, Russian, German, French, Spanish, and Italian). Based on the observation (see e.g. Pagel et al., 2007; this claim will be investigated in depth in chapter 3) that more frequent words are more insulated against replacement than rarer words, Bochkarev, Solovyev, and Wichmann (2014) use frequency change as a proxy for lexical change in general. The authors assume that if a word underwent a large frequency change between two time stamps it would also have undergone more lexical change than one that had not seen such a frequency change. The authors find that the six languages had very similar rates of change over broader time periods (at least five decades), but that at shorter time periods there is much individual variation between the languages. The authors are also able to trace the impact of catastrophic events, such as the Second World War, on the vocabulary: following the war, words had much more change in their frequency than before. Stable, uneventful periods in history had a dampening effect on frequency changes, however.
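The idea of using frequency change between two time stamps as a proxy for lexical change can be sketched as follows. The text above does not specify the exact metric that Bochkarev, Solovyev, and Wichmann use, so the absolute log-ratio of relative frequencies below is an assumed stand-in, and the word counts are invented for illustration.

```python
import math


def relative_frequencies(counts):
    """Convert raw word counts into relative frequencies."""
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def log_frequency_change(counts_t1, counts_t2, floor=1e-9):
    """Absolute log-ratio of relative frequencies between two time stamps.

    A value of 0 means the word's share of the corpus is unchanged; larger
    values mean larger change. The floor (an assumption of this sketch)
    avoids division by zero for words missing at one time stamp.
    """
    f1 = relative_frequencies(counts_t1)
    f2 = relative_frequencies(counts_t2)
    change = {}
    for word in set(f1) | set(f2):
        before = max(f1.get(word, 0.0), floor)
        after = max(f2.get(word, 0.0), floor)
        change[word] = abs(math.log(after / before))
    return change


# Hypothetical counts for two time stamps (not real corpus data).
counts_early = {"the": 900, "stallion": 90, "radio": 10}
counts_late = {"the": 900, "stallion": 10, "radio": 90}
change = log_frequency_change(counts_early, counts_late)
```

In this toy example the frequent word keeps its share of the corpus and scores 0, while the two rarer words swap shares and receive the same, larger score.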

Bochkarev et al. (2014, p. 7) note that the fact that historical events have such impact on lexical change could, at first glance, indicate that it would be unproductive to search for regularities underlying lexical change that would cut across languages and time periods. They do not subscribe to this view, however: “The contradiction, however, is only apparent, because the points of view are distinct. It is only in the bird’s eye view, facilitated by either large amounts of data or large time spans, that regularities emerge.”

The following sections will discuss several theories about semantic and pragmatic features that influence lexical change processes like lexical replacement.


2.7 SOME SEMANTIC AND PRAGMATIC FACTORS IN LEXICAL CHANGE

Over the years many different hypotheses about why and how words change have been proposed by researchers. The hypotheses are often compatible, but highlight different aspects of lexical change. This thesis focuses on semantic and pragmatic factors influencing lexical change. This means that equally important factors, like meta-communicative needs, desires, and agendas of speakers, or the way that speaker networks are organized, lie outside the scope of this thesis.

2.7.1 EMOTIONAL CHARGE

Many concepts evoke emotional responses in speakers. One hypothesis contends that the higher the emotional charge of a meaning, the more there will be semantic change or lexical replacement for words expressing the meaning (see Burridge, 2006, p. 453). As early as 1923, Sperber suggested that the need to relieve emotional tension was one of the main driving mechanisms behind change. Some meanings contaminate the words that denote them, necessitating new euphemisms. Writing several decades later, Pinker (1994) calls this the “euphemism treadmill”: “The euphemism treadmill shows that concepts, not words, are in charge: give a concept a new name, and the name becomes colored by the concept, the concept does not become freshened by the name.” In some cases, however, contaminations can happen on formal rather than semantic grounds, when a word can be or become taboo due to homonymy or polysemy (Burridge, 2006, p. 453), as in how the American English term rooster ‘male hen’ replaced cock ‘male hen’, due to the latter word being homonymous with cock ‘penis’.

It seems uncontroversial that some parts of the vocabulary (like words for genitalia) often undergo taboo-related changes – but is emotional charge a factor that drives lexical change in the general vocabulary? Certainly there are concepts that are not taboo, but which have what Grzega refers to as “high anthropological salience”: the sociophysical anthropological reality of the speakers leads to certain concepts being inherently more likely to evoke an emotional response. To illustrate this, Grzega (2004, pp. 32–33) gives several examples of how words for ‘girl’ have changed multiple times from Old English to Modern English: the words gradually come to denote taboo concepts, and as a consequence, new terms have to be found for the neutral meaning ‘girl’ to avoid unintended associations. ‘Girl’ has a high emotional charge even though it is not a taboo meaning.


Having made the case that a high emotional charge may lead to lexical change and replacement, the question now becomes whether this (i.e. the likelihood for euphemisms to appear) can be quantified and measured.

Osgood, Suci, and Tannenbaum (1957) introduce the concept of arousal, which might be applied here. Arousal has not, to my knowledge, been used before to quantify emotional charge in order to investigate its effect on lexical replacement, though Janschewitz (2008, p. 1065) mentions that a defining aspect of taboo words is that they incur high arousal.

Arousal is a feature of a word that can be measured using psycholinguistic Semantic Differential techniques. The techniques measure the (subconscious) opinions speakers have of particular words, by asking them to rate the word on a number of scales. The experiments produce three principal values that together reflect a person’s subconscious and conscious emotions towards a word: the word’s valence, arousal, and power. Valence is measured by a scale from good to evil; arousal from agitation and taboo to indifference and calm; and power (sometimes called “activity” or “dominance”) from dominant and active to passive and inactive.² From this point on, I will only use the arousal scale, not those of valence or power, since this is the scale that best corresponds to emotional charge.

Are arousal values cross-linguistically comparable? Warriner, Kuperman, and Brysbaert (2013) compare the arousal values of their own (English-based) study with those of four other studies and find significant correlations (p<0.05) in every case: English (Bradley & Lang, 1999, r=0.759), Dutch (Moors et al., 2013, r=0.575), Spanish (Redondo, Fraga, Padrón, & Comesaña, 2007, r=0.692), and Portuguese (Soares, Comesaña, Pinheiro, Simões, & Frade, 2012, r=0.635).3 The Dutch, Spanish, and Portuguese data all had English glosses and were matched with the English data using these. The existence of cross-linguistic agreement on the arousal values of concepts suggests that arousal is a valid measure for estimating the emotional charge of concepts, not just of individual words, at least when the data come from languages spoken in similar cultures.

2 While the three principal values of a Semantic Differential test are considered to be largely independent, certain connections are expected. Lexemes that are judged to be very positive or very negative (high versus low valence) are more arousing than those that are judged to be neutral, for instance (Warriner, Kuperman, & Brysbaert, 2013, p. 11), though the most negative lexemes have, on average, a higher value of arousal than the most positive lexemes do (Võ et al., 2009).

3 The correlations between the arousal values in the different studies were lower than the correlations between the valence and power values, and the variance was higher (Warriner et al., 2013, p. 14). The split-half reliability of arousal was also lower than for valence and power. The split-half reliability is measured by comparing half of the dataset with the other half (Warriner et al., 2013, p. 9). This indicates that subjects were less in agreement within and across languages about the arousal values than they were for power or valence values.
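The cross-linguistic comparison of arousal ratings discussed in this section can be illustrated with a minimal sketch: ratings from two languages are matched on their English glosses and then correlated. The ratings below are invented for illustration and are not taken from any of the cited studies; the function name `pearson_r` is likewise only an assumption of this sketch.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented arousal ratings, keyed by English gloss (hypothetical data).
english = {"snake": 6.8, "stone": 3.1, "old": 3.4, "war": 7.2}
dutch = {"snake": 6.1, "stone": 2.9, "war": 6.6, "tree": 3.0}

# Only glosses present in both data sets can be compared.
shared = sorted(set(english) & set(dutch))
r = pearson_r([english[g] for g in shared], [dutch[g] for g in shared])
```

The same function could in principle be applied to two random halves of a single data set, which is the idea behind the split-half reliability mentioned in footnote 3.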

2.7.2 IMAGEABILITY

Imageability is a psycholinguistic measure of the ease with which speakers are able to picture the meaning of a word in their minds. A link between imageability and lexical replacement, more specifically between high imageability and slow lexical replacement, was suggested in a pilot study (Vejdemo, 2010) that examined lexical replacement in word inventories for concepts in both Indo-European and Austronesian languages. Concepts that are more easily imagined and pictured in the minds of the speakers (STONE, in contrast to OLD) would be expected to undergo less word replacement.

Much psycholinguistic research has investigated the difference in brain processing of highly imageable and less imageable items. Paivio (1971/2013) bases the dual coding theory of cognitive organization partly on the difference between processing of easily visualized and hard to visualize words. Schwanenflugel (1991), and Crutch and Warrington (2005) also discuss the fundamental differences between these categories. Mårtensson, Roll, Apt and Horne (2011, p. 456) show that nouns connected to sensory semantic (visual and otherwise) features are dealt with in different parts of the brain than those that are not.

When measuring imageability in psycholinguistic studies, participants are typically asked how easy it is to form a mental image, when presented with a particular word. A closely related semantic feature to imageability is concreteness: participants are in such cases asked how concrete a word is. Imageability and concreteness are highly correlated (for a discussion of their differences, see e.g. Richardson, 1976).

2.7.3 FREQUENCY AND ENTRENCHMENT

Many researchers have connected the frequency of use of words and constructions to the likelihood that they will be replaced, and have suggested that the underlying reason for this is entrenchment.

A high frequency for a lexical item can lead to several different results: highly frequent lexical items in languages are more likely to undergo semantic change (Stoffel, 1901) and suffer phonetic attrition (Zipf, 1935), but while the item might be phonetically reduced and its semantic content changed, the word form itself might be protected from lexical replacement by the higher frequency (Haspelmath, 2004; Winter, 1971). Bybee (2007) argues that the first two phenomena are due to the reducing effect, and the latter to the conserving effect of high frequency.

Examples of the reducing effect include how the frequent phrase God be with you was reduced to good-bye, or the gradual semantic weakening of intensifiers, which leads to lexical replacement when speakers regularly introduce new ones (for more on intensifiers, see Hopper & Traugott, 2003, p. 122). Pagel et al.’s (2007) finding that the more frequent a word is, the less likely it is to be replaced is an example of the conserving effect.

Bybee argues that frequency has these effects on language because of lexical strength, a feature of cognitive organization that will be referred to in this thesis as entrenchment, following Langacker (1987, p. 59). Each time a word is heard or produced, this leaves a slight trace on the mental lexicon. The repetition strengthens the memory of the word and makes it more accessible (Bybee, 1985, pp. 10–12, 117; Langacker, 1987, p. 59).

The preceding discussion concerned entrenchment mainly from a semasiological perspective, focusing on the frequency of linguistic signs (words, utterances). It can also be useful to consider the process from an onomasiological viewpoint by asking the question: Can concepts have different levels of entrenchment?

Bybee (2007, p. 18) notes that words do not become frequent at random. Frequent topics are those that speakers like to talk about – typically themselves and things that concern them greatly. Scheibman (2002, p. 61) also mentions that subjective speaker-relevant words and topics have a higher frequency of occurrence than less speaker-relevant words and topics.

Put another way, frequency is a feature of words (linguistic signs), but it could possibly also be seen as a feature of the underlying concept. Calude and Pagel (2011) examine primary word frequency rates in 17 languages from six language families: they use a list of meanings (the Swadesh list) and its translation equivalents across the languages. They find that the primary word frequencies are stable cross-linguistically: the correlation between the languages is very strong (r=0.73, p<0.0001). This means that the most prominent words (e.g. woman in English, kvinna in Swedish) for a comparative concept (WOMAN) have more or less the same frequency ranking regardless of language, at least for the core vocabulary present in a Swadesh list. It also means that we can talk not only about a frequency ranking for a word, but also about frequency rankings for concepts (inferred from words from several languages), and perhaps measure this by averaging the relative frequencies of the most prominent word for the concept in several languages.
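The averaging idea can be sketched as follows. This is only an illustration under stated assumptions: the token counts and corpus sizes are invented, and `concept_frequency` is a hypothetical helper name, not a measure defined in the literature cited here.

```python
def concept_frequency(primary_word_counts):
    """Mean relative frequency of a concept's primary word across languages.

    primary_word_counts maps a language name to a pair
    (occurrences of the primary word, total corpus size in tokens).
    """
    relative = [count / size for count, size in primary_word_counts.values()]
    return sum(relative) / len(relative)

# Invented counts for the primary words of WOMAN (hypothetical data).
woman = {
    "English": (500, 1_000_000),  # e.g. "woman"
    "Swedish": (300, 1_000_000),  # e.g. "kvinna"
}
woman_frequency = concept_frequency(woman)
```

Using relative rather than raw frequencies keeps corpora of different sizes comparable before they are averaged.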

While word frequency is often used as a way to quantify entrenchment, it is far from clear which is the best way to measure frequency. Gries (2010, p. 269) suggests two main ways: frequency of occurrence and frequency of co-occurrence. Frequency of occurrence measures count the number of times each word (or each lemma, in which case dog and dogs are counted as two instances of dog) appears in a text (possibly also considering the distribution of the word over different kinds of texts, e.g. genres). Frequency of co-occurrence measures consider how often items co-occur. There are several alternate ways to measure co-occurrence – Gries (2010, p. 275) notes that over 20 different measures exist in the literature, the most common of which is Mutual Information (see chapter 3 of this thesis for more on Mutual Information).
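The two kinds of measure can be contrasted in a small sketch. It counts word forms and lemmas in an invented mini-corpus (frequency of occurrence) and then computes pointwise mutual information for an adjacent word pair (one common co-occurrence measure). The corpus, the lemma table, and the function name are assumptions made for illustration; the Mutual Information-derived measure used in chapter 3 may be defined differently.

```python
import math
from collections import Counter

# Hypothetical mini-corpus and lemma table, invented for illustration.
tokens = "the dog saw the cat and the dogs ran".split()
lemma_of = {"dogs": "dog"}
lemmas = [lemma_of.get(token, token) for token in tokens]

# Frequency of occurrence: per word form versus per lemma.
word_counts = Counter(tokens)    # "dog" and "dogs" counted separately
lemma_counts = Counter(lemmas)   # both counted as instances of "dog"

def pointwise_mutual_information(items, first, second):
    """log2 of observed over expected probability of `first` directly followed by `second`."""
    unigrams = Counter(items)
    bigrams = Counter(zip(items, items[1:]))
    joint = bigrams[(first, second)]
    if joint == 0:
        return float("-inf")  # the pair never co-occurs
    p_joint = joint / (len(items) - 1)
    p_first = unigrams[first] / len(items)
    p_second = unigrams[second] / len(items)
    return math.log2(p_joint / (p_first * p_second))

pmi_the_dog = pointwise_mutual_information(lemmas, "the", "dog")
```

A positive value indicates that the pair co-occurs more often than would be expected if the two items were independent.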

This section aimed to provide some background on the phenomenon of entrenchment, but has avoided contrasting and comparing the specific definitions of individual authors. It is still very much an active research field, and the relationship between frequency and entrenchment is complex. Geeraerts, Grondelaers, and Bakema (1994) argue that frequency of use does not determine the overall entrenchment of a word at all: frequency and entrenchment are instead most useful when considered for a specific function of a specific word, in comparison with alternative words that can be used to denote that specific function. Schmid (2010, p. 125) notes that “we have understood neither the nature of frequency itself nor its relation to entrenchment.” A more optimistic stance would be that we understand a little bit more for each publication dealing with the subject. This thesis returns to both frequency of occurrence and frequency of co-occurrence (in the guise of a Mutual Information-derived measurement) in chapter 3.

2.7.4 SUBJECTIFICATION, INFERENCES, POLYSEMY, SYNONYMY

Traugott and Dasher (2002, p. 30) argue that subjectification, which enables inferences, is the major cause of lexical change. This section will define these terms, and explain their connection to polysemy and synonymy.

Traugott and Dasher (2002, p. 30) define subjectification as follows: “Subjectification is the semasiological process whereby SP/Ws [speakers or writers] come over time to develop meanings for Ls [lexemes] that encode or externalize their perspectives and attitudes as constrained by the communicative world of the speech event, rather than by the so-called “real-world” characteristics of the event or situation referred to.”


Inferencing is the act of assuming that a word can designate not just meaning A, but also the semantically close meaning B.

The importance of subjectification and inferences in lexical change was already noted by Bréal (1897), who suggests that language change can occur when a subjective meaning of a particular author gets reinterpreted as a new general meaning. Paul (1886) deals with a similar topic and introduces an important distinction between the usual meaning (the usual, established, conventional meaning) and the occasional meaning (the context dependent meaning in a particular utterance).

The shift from occasional meanings to usual meanings can only be understood in context. Usual meanings in a specific context can trigger new occasional meanings (inferences are made), which can later become usual meanings themselves if the context they are used in is repeated often enough (see also Geeraerts, 2004, pp. 15–16). An example of a word that shifted from an old usual meaning and adopted one (of many) occasional meanings as its new usual meaning is German Schirm. Schirm used to mean any kind of ‘screen’ or ‘protective surface’, but then came to be mostly used for Regenschirm ‘umbrella’.

Combining subjectification and usual and occasional meanings into a single theory, Traugott and Dasher (2002) introduce the Invited Inferencing Theory of Semantic Change (IITSC). For similar proposals, see “context-induced reinterpretation”, as described in Heine, Claudi, and Hünnemeyer (1991), and also Levinson (1995).

Following Levinson (1995), Traugott and Dasher (2002) distinguish three levels of meaning relevant to a lexeme. Note the similarities to the usual and occasional meanings of Paul (1886).

• Coded meanings (semantics) are the convention of a language at a given time.

• Utterance-type meanings are generalized invited inferences (GIINs). They are preferred meanings and conventions of use. They can be canceled and may be pragmatically ambiguous. GIINs can be exploited to imply/insinuate certain meanings.

• Utterance-token meanings are invited inferences (IINs) that have not been crystallized into commonly used implicatures. They arise in context “on the fly”. There is little solid evidence that IINs differ in different communities (i.e. that the way implications are made up on the fly differ), although it is often surmised that literacy may affect ways of interpreting utterances.

New meanings are “considered GIINs so long as the original coded meaning is dominant or at least equally accessible, but when that original meaning becomes merely a trace in certain contexts, or disappears, then the GIIN can be considered to have become semanticized as a new polysemy or coded meaning: a macro-dynamic change has occurred” (Traugott & Dasher, 2002, p. 35).

So, a coded meaning has many invited inferences (IINs). One of these might become a generalized invited inference (GIIN), which, if the original coded meaning is lost, might then become a new coded meaning. An example is as long as in Middle English, which had the coded meanings both of spatial distance (this ship is as long as the other ship) and temporal distance (do it as long a time as he does it). An invited inference of the latter would be a conditional interpretation. The temporal reading was the most common during early Middle English; by Early Modern English, the conditional interpretation had become more common than the temporal one. In Modern English, as long as, when not used with two nominals, mainly has the conditional meaning (Traugott & Dasher, 2002, p. 37).

The reasons why subjectification and inferencing are possible are most likely connected to the human cognitive capacity for considering the same situation from different perspectives (see Verhagen, 2010 for an overview of this active research field). Speakers may interpret a meaning along a continuum, with conventionalized form-meaning pairings (cf. a coded meaning) on one end and a non-conventionalized new inference (cf. an IIN) on the other. Academic ideas that come close to this include change of focus, in Paradis’ work on metonymization and language change (2011); MacLaury’s Vantage Theory (among others: MacLaury, 1991, 1995, 1997; see also chapter 4 of this thesis), where the process is called a “change of vantage” (1991); and Langacker’s work on subjectification and perspectivization (Langacker, 1987, among other works). Lexical change in a whole language community would occur when enough speakers regularly make the same IIN, which becomes a GIIN and eventually a coded meaning.

Having discussed the cognitive processes of subjectification and inferencing, we will now turn to their relationship to polysemy and synonymy. I argue that a greater propensity for subjectification is connected to a greater chance of having many senses, and that this is connected to a greater chance of semantic change – but is also possibly connected to a lower chance of lexical replacement. I also contend that inferencing is not only connected to semantic change via subjectification, but is also connected to lexical replacement through synonymy.

If subjectification leads to more senses of a word being used, or to a semantic change of what the primary sense of a word is, then a propensity for subjectification, and a word having many senses, could be said to be linked to a propensity for semantic change. Another way to put this would be to say that a high rate of polysemy for a word might be linked to a greater propensity for semantic change. There is plenty of support for this, usually (though not exclusively) based on purely theoretical works, with anecdotal examples from semantic change in particular languages:

Goossens (1969, p. 100), who works with differences in the phonology and lexicon of German dialects, says that all language variants seek to avoid polysemy: languages have polysemiphobia, which is a driving force in language change. This is echoed by, among others, Anttila in a textbook on historical and comparative linguistics. Anttila notes that language seeks to move towards linguistic isomorphism: “When the one-to-many relations between form and meaning are felt to be problematic one can expect a move toward one-to-one-symbolization” (Anttila, 1989, p. 407).

Kleparski (1997, p. 63) works with a large diachronic corpus of English (1100-1700) and investigates words for the concept GIRL. Using examples from the corpus he reasons that polysemy is “a very important, if not the most important vehicle for semantic change or, at least, the most important ingredient”. Both Kleparski (1997) and Blank (2001, p. 4603) are of the opinion that polysemy is part of the synchronic side of lexical semantic change (while Blank (2001) is a theoretical work, it rests partly on Blank (1997): a corpus of 600 historical semantic changes, mainly from Romance languages, as well as insights from other published works).

Traugott and Dasher also see a polysemous stage as necessary in any semantic change: “Semantic change cannot be studied without drawing on a theory of polysemy because of the nature of change. Every change, at any level in a grammar, involves not “A > B” (…) but rather “A > A ~ B” and then sometimes “>B” alone” (2002, p. 11). Traugott and Dasher's theory does not assume that meaning A will disappear, and there are other voices that caution against seeing elimination of a sense as the necessary end product of a polysemous state (see discussion in Koch, 2016; Nerlich & Clarke, 2001).

A final example of a work that supports the idea of a link between polysemy and semantic change is Boussidan (2013), who investigates the amount of polysemy in English corpora through computer models and finds a link between high polysemy and a greater likelihood of semantic change.


The interplay between subjectification, polysemy, and semantic change is complex. As stated above, Kleparski argues that polysemy is a vehicle for semantic change – but it can surely also be the other way around: subjectification leads to semantic change, which leads to an increase in the number of senses. Perhaps the most useful statement that can be made is that a change in polysemy is connected to a change in semantics – and that both of these can be caused by subjectification.

Even if a high rate of polysemy is connected to a greater propensity for semantic change, this does not mean that it would lead to a greater propensity for lexical replacement (cf. the conserving effect versus the reducing effect of frequency discussed in 2.7.3). Fewer linguists have been interested in the link between lexical replacement and polysemy than in the link between semantic change and polysemy. Kapitan (1994) studies the survival rates of Latin words in Romance languages, and finds that the more polysemous a linguistic sign (a word) is, the less likely it is to be replaced. Lüdtke (1985, p. 363) notes that there is a possibility, which he considers to be remote, that “an item may get a somewhat longer respite from its inexorable fate of loss of identity […] by acquiring either a different or an additional meaning, if this process goes along with higher frequency of occurrence.” The reasoning is that a polysemous form would be used in many different genres and contexts, which might lead to a greater chance of form retention; it would be more difficult to replace the word, because it is used in more contexts. This would not only be a question of frequency, but also one of diversity: by being anchored in several different semantic networks, a polysemous form might be more entrenched.

Turning to (near) synonymy, a lexeme that has many different connections to other lexemes in the mind of a speaker might be more likely to undergo both semantic change and lexical replacement, because speakers have more material from which to make inferences: if a speaker has both the primary word wireless for RADIO (common in the first two decades of the 20th century) and a rarer synonym radio, the presence of the synonym might make lexical replacement easier.

It is thus a reasonable hypothesis that the more semantically close a group of concepts are, the easier it is for speakers to make inferences that word W1 might be used instead of word W2 for a certain concept, and for lexical replacement to occur. Words that have many lexical relations (synonyms, hyponyms, etc.) might be more likely to expand or retract their denotational range of references than words with fewer lexical relations. Concepts that can be expressed by many different (overlapping) words could be more likely to have changes in their lexical inventory than concepts that cannot. Therefore, if there is a way to measure how many semantically close neighboring concepts and/or semantically related words a target concept/word has, this measurement should correlate with the likelihood of lexical replacement for the target concept.

Naturally, it is with synonymy as it is with polysemy: more synonyms might lead to a greater propensity for change or replacement, or a propensity for change or replacement might lead to more synonyms. It is also logical to assume that if a word W1 has a greater number of senses than another word W2, then W1 will also have more potential synonyms: synonymy and polysemy are connected, and both are highly relevant for lexical replacement as well as for semantic change.

2.7.5 SPEAKER AGE EFFECTS

An important division within research into language change has historically been between those who believe that change primarily takes place between generations in the imperfect acquisition of language by children and those who believe that large scale change also happens later in life. The basic assumption of the former school of thought is that children make repeated mistakes when learning their first language, and that some of these mistakes are then incorporated into their language (see e.g. Halle, 1962). “The child is exposed to the utterances produced around her, and may intuit a grammar that is different in some way from the grammar of her parents” (Croft, 2000, p. 44). The changes the child makes then become part of a fixed new language for the next generation. Jespersen (1922, pp. 161–162) notes that earlier linguists had many and conflicting views on the matter. He reaches the conclusion that it is not the age of the individual learning a new language element that matters, but that the act of learning is of paramount importance in precipitating language change.

Attributing change only, or primarily, to new learners also means that language change should be very abrupt, which is not the case (Croft, 2000, p. 45). Croft further notes (2000, p. 46ff) that the idea that imperfect learning is the only, or major, motor of language change is now mostly abandoned, but that it still can be found in some formalist syntactic theories, such as minimalism.

While the exact strength of generational transmission effects on language change is debated, the argument that there are some effects is uncontroversial.


Something that will be important later on in this thesis is the finding that different age groups innovate to different degrees: several studies find that younger people innovate more than older people (see Southworth, 1990). This also raises the question of whether, and if so when, younger speakers stop innovating. Croft (2000, p. 49) notes that after late childhood (ages 6-12), errors in the language acquisition process of children are very rare, and that, after this stage, children’s non-normative language use is similar to the everyday innovations of adults. This means that at some point during childhood, vocabulary increases should stabilize. Interestingly, Hodgson and Ellis (1998) find that earlier acquired words are less likely to disappear from elderly (71–86 years of age) speakers’ inventories. Section 4.2.5 will discuss what is known about age of acquisition in relation to speakers’ color term vocabulary.

The typical age of acquisition for a lexical item might be an indication of its level of entrenchment, since words that are acquired earlier might be more mentally entrenched. As discussed in section 2.7.3, entrenchment of a word form can insulate the word form from lexical replacement. Age of acquisition is also highly correlated with frequency (Blumenthal-Dramé, 2013, pp. 39–40).

Age of acquisition will also be discussed in more detail in the next chapter, in the context of how well it predicts the rate of lexical replacement.


3 MACRO-PERSPECTIVE: A MODEL OF SOME SEMANTIC AND PRAGMATIC CAUSES OF LEXICAL REPLACEMENT4

This chapter focuses on research question A1, though A2 will be discussed briefly at the end, in section 3.5.

A1. Is it possible to formulate global, domain-independent generalizations for lexical replacement?

A2. How can local, domain-dependent generalizations be found?

The chapter looks at lexical replacement from a macro time scale perspective (the age of Indo-European, approximately 5,000-8,000 years) and reduces the complex and gradual process of lexical replacement to a binary state: either a replacement has happened or it has not happened. By counting the number of replacements having taken place in 87 Indo-European daughter languages for a list of 176 concepts, a Rate of Replacement measurement may be derived for each concept.

Following earlier research (see Ladd, Roberts, & Dediu, 2015; Monaghan, 2014; Pagel et al., 2007; Vejdemo, 2010), this chapter will take such a measurement – specifically one calculated by Pagel et al. (2007) – and use it to construct a statistical model that evaluates several of the semantic and pragmatic factors that have been hypothesized to contribute to the rate of lexical replacement (discussed in section 2.7).

Specifically, the model will be used to evaluate the claims that the following factors influence the rate of lexical replacement: emotional charge / arousal (discussed earlier in section 2.7.1); word class and imageability (see 2.7.2); entrenchment (operationalized both as frequency and as mutual information, see 2.7.3); synonyms and polysemy (see 2.7.4); and the age of acquisition (see 2.7.5).
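As a rough illustration of what evaluating several factors at once involves, the sketch below fits a multiple linear regression by ordinary least squares, solving the normal equations with plain Gauss-Jordan elimination. It is not the model used in this chapter: the predictor columns, the data values, and the function name `fit_ols` are all invented for illustration, and a real analysis would rely on a statistics package that also reports standard errors and significance.

```python
def fit_ols(rows, targets):
    """Least-squares coefficients [intercept, b1, b2, ...] for y ~ 1 + x1 + x2 + ..."""
    design = [[1.0] + list(row) for row in rows]  # prepend intercept column
    k = len(design[0])
    # Augmented normal equations [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in design) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(design, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [value / pivot_value for value in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]

# Invented per-concept predictors: (log frequency, synonym count, imageability).
predictors = [(1, 2, 3), (2, 1, 5), (3, 4, 1), (4, 3, 6), (5, 6, 2)]
# Invented rates of lexical replacement for the same five concepts.
rates = [0.9, 0.8, 0.5, 0.6, 0.2]

coefficients = fit_ols(predictors, rates)  # [intercept, b_freq, b_syn, b_imag]
```

Each coefficient estimates how much the rate changes when one predictor increases by one unit while the others are held constant, which is what allows the factors to be assessed jointly rather than one at a time.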

4 A condensed version of this chapter has been published in Vejdemo and Hörberg (2016). Background research, data collection, analysis, and initial model design for that article were done by the author of this thesis, and Hörberg focused on setting up and running the final statistical model. This chapter is substantially longer than the article, and sections 3.5 and 3.6 contain entirely new material.


In this chapter, comparative concepts (e.g. WOMAN) are not defined outside the semantic content alluded to by their English meta-language labels. Their translations into particular languages (e.g. Swedish kvinna, English woman) are assumed to be a close enough match, semantically, to the meaning of the English meta-label.

Lexical replacement is treated, in the perspective taken in this chapter, as a fact, not as a process. Its relationship to other lexical change processes, such as semantic change, is not investigated.

Section 3.1 will provide some background on lexical replacement – how it can be measured, the role of Swadesh lists in studying it, and some previous statistical models that use Swadesh lists. Section 3.1.4 will discuss some problems with existing statistical models, and motivate why the model in this chapter is structured the way it is. Section 3.2 will present the data and methodology of the new model, while section 3.3 will present and evaluate the model. Section 3.4 summarizes what generalizations about lexical replacement might be assumed based on the model, and section 3.5 addresses the conspicuous absence of semantic domains as a factor in the model, and provides a bridging context to the rest of the thesis.

Table 2 shows the place of this chapter within the thesis.

Table 2. The place of the present study within the thesis.

| | Chapter 3 | Chapter 5 | Chapter 6 |
|---|---|---|---|
| Time scale & scope | Macro: several millennia, 87 language varieties | Meso: several centuries, seven Germanic languages | Micro: two generations, one language |
| Method & material | A statistical model tests domain-independent hypotheses about lexical replacement (based on a database of cognate class judgments of a Swadesh list) | Comparison of variation, using elicitation experiment results, supplemented with dictionary data | Comparison of variation and change, using elicitation experiment results, supplemented with interviews, dictionaries, floras, corpora |
| Domain | Core vocabulary | Color, with focus on pink and purple | Color, with focus on pink and purple |


3.1 BACKGROUND AND METHOD

In section 3.1.1, the case is made that the rate of lexical replacement can be measured, and a way to calculate the rate is discussed. In section 3.1.2, the Swadesh lists that are often used as data sources for these kinds of measurements are discussed. Following that, some earlier models that use the rate are presented (3.1.3) and subsequently evaluated (3.1.4).

3.1.1 MEASURING LEXICAL REPLACEMENT: BASIC ASSUMPTIONS

Over the years, many researchers have noted that the primary words (the most neutral term for a concept; see section 2.3 for the full definition) for lexical concepts are replaced at different rates, and that this rate can be calculated (see “Stability Ranking” in Dolgopolsky (1986), “Retentiveness” in Lohr (1999), and “Rate of Lexical Replacement” in Pagel et al. (2007)). These authors use slightly different methods to calculate the rate, though their end results are very similar. An illustrative example of a very simple method comes from Dahl (2004, p. 262).

Dahl contrasts how the Latin word form for the concept THREE (tres) was retained in all the daughter languages he surveyed with how the Latin word for GIRL (puella) has been replaced as the primary word in all the languages, as shown in Table 3.

When two semantically similar words in two different languages are related, they belong to the same cognate class. Thus, since both the Venetian word fia ‘girl’ and French (jeune) fille ‘girl’ are historically developed from Latin filia ‘daughter’, they are related and are therefore considered to belong to the same cognate class. By counting cognate classes, it is possible to quantify the relative likelihood of replacement for THREE and GIRL in the Romance family: the number of distinct cognate classes is divided by the number of languages in the sample. To illustrate this: if all languages (15 in the Dahl (2004) sample) use words for THREE from the same cognate class, the rate of lexical replacement can be expressed as 1/15 = 0.07. As can be seen in Table 3, there are 15 different cognate classes for GIRL in the 15 languages: the rate of replacement can be expressed as 15/15 = 1.00. Note that this “rate of replacement” measurement is not comparable between different language sample sizes: if no replacement occurs for THREE in a sample of 15 languages, the rate would be 1/15. If no replacements occur for TWO in a sample of 100 languages, the rate is 1/100. In other words, rates are comparable within a sample, not across samples.
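The calculation described above can be written out as a minimal sketch: the number of distinct cognate classes attested for a concept is divided by the number of languages in the sample. The function name and the cognate class labels are assumptions made for illustration; only the French and Venetian assignment to a shared class (from Latin filia) follows the text, and the remaining entries are hypothetical placeholders rather than data from Dahl (2004).

```python
def replacement_rate(cognate_class_by_language):
    """Distinct cognate classes divided by the number of sampled languages."""
    n_languages = len(cognate_class_by_language)
    if n_languages == 0:
        raise ValueError("sample must contain at least one language")
    n_classes = len(set(cognate_class_by_language.values()))
    return n_classes / n_languages

# Toy sample of four languages; class labels are invented identifiers.
three = {"French": "TRES", "Venetian": "TRES", "lang_c": "TRES", "lang_d": "TRES"}
girl = {"French": "FILIA", "Venetian": "FILIA", "lang_c": "CLASS_X", "lang_d": "CLASS_Y"}

rate_three = replacement_rate(three)  # one class in four languages
rate_girl = replacement_rate(girl)    # three classes in four languages
```

Because the denominator is the sample size, the lowest possible value is 1/n rather than 0, which is why rates from samples of different sizes cannot be compared directly.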


Table 3. Words for GIRL and THREE in selected Romance languages

Language     THREE    Class    GIRL                  Class
Latin        tres     -        puella                -
Asturian     tres     1        moza                  1
Catalan      tres     1        minyona               2
Corsican     trè      1        giuvanotta            3
French       trois    1        (jeune) fille         4
Galician     tres     1        rapaza                5
Gascon       tres     1        gojata                6
Italian      tre      1        ragazza; fanciulla    7; 8
Portuguese   três     1        menina                9
Provencal    tres     1        chato                 10
Romanian     trei     1        fata                  11
Romansh      trais    1        giuvna                3
Sardinian    tres     1        pitzinna              12
Sicilian     tri      1        picciotta             12
Spanish      tres     1        muchacha              13
Venetian     tri      1        fia; tóxa             4; 14
Walloon      treus    1        båcele                15

THREE: Sum: 1 cognate class; 1/15 = 0.07
GIRL: Sum: 15 cognate classes; 15/15 = 1.00

Sometimes there are two or more synonyms that are equally applicable as primary word for a concept – say ragazza and fanciulla in Italian for GIRL. The researcher would then need to decide whether to choose just one of these terms (at random) or to include both of them in the data set.

This kind of measure can also be used for data from language families where, unlike Romance, the proto-language is not well-known. An assumption is made that the proto-language had a single primary word for a certain concept. If the concept is still denoted by the same cognate class in all daughter languages, it seems likely that no replacement has occurred. However, if the concept is denoted by many different cognate classes in the daughter languages, one could say that lexical replacement has indeed occurred for the concept.

There are several potential problems with this calculation of a rate of lexical replacement. The first is that there might have been several primary expressions in the proto-language at some point. Fortunately, very close synonyms rarely stay synonymous, but often diverge, at least slightly, in meaning. Any such state of synonymy thus only holds for brief moments in time. Another potential problem could be borrowing. Several daughter languages might in fact undergo lexical replacement for a concept, but this diachronic diversity could then be hidden by a common loanword being incorporated into all the languages. This can be countered by using sets of concepts that are very resistant to borrowing.

The example of Romance words for GIRL was used to illustrate the general principles behind using cognate classes to estimate the rate of lexical replacement. In the following sections, specific instantiations of these principles will be discussed.

3.1.2 LEXICAL REPLACEMENT AND SWADESH LISTS

Databases with Swadesh lists are routinely used for investigations into lexical replacement, and this section will discuss their strengths and weaknesses.

Swadesh (1950) proposes lists of diachronically very stable concepts, with the aim of gathering lexical denotations for a set of concepts from as many languages as possible: these lists are now referred to as Swadesh lists. The most commonly used versions are the 100-word Swadesh list and the 200-word Swadesh list. The original purpose of the lists was to investigate the historical relationship between different languages, based on the assumption that the concepts in the lists would undergo replacement at the same constant rate over time. This assumption was later proved to be false by, among others, Rea (1958); Bergsland and Vogt (1962); and Dyen, James, and Cole (1967). Nonetheless, Swadesh lists are still routinely used by field workers, as they are one of the few quantifiable methods used to determine the relative closeness of two or more languages.

The present study does not involve investigating genealogical relationships of languages. The genetic relationships of the languages are unimportant for comparing the relative rate of change (assessed from all the languages) of one concept to the relative rate of change of another concept (assessed from the same languages). Whatever the genetic relationships between the languages, this state of affairs will affect all the concepts in a language in the same way.

A key assumption when using cognate classes to measure relative rates of replacement is that the shared human physiology and world experience make humans categorize the world in similar ways. Some comparative concepts, like STONE or TO SLEEP, should often have matches in languages across the world. The rate of lexical replacement is a feature of such a comparative concept and is assumed to, at least to a certain extent, account for the lexical replacement rate of the matching words in each language. The fact that these words are pre-selected for their supposed stability means that the Swadesh lists may not be a good representative sample of the entire vocabulary of a language. Even so, they are often chosen for investigations into lexical replacement. There are two main reasons for this.

The first reason is the availability of data. There are several published databases with Swadesh wordlists from hundreds of languages, where the data is tagged with cognate class judgments. Examples include Dyen, Kruskal and Black (1992) on Indo-European, the IELex (http://ielex.mpi.nl/) and the Austronesian Basic Vocabulary Database (Greenhill, Blust, & Gray, 2008).

The second reason concerns the possible problem of borrowing, mentioned in section 3.1.1. This should be less of an issue with the Swadesh data, since the items included in a Swadesh list are pre-selected for low likelihood of borrowing – nonetheless, this assumption will be tested on the new model in section 3.2.

How different are the concepts on the Swadesh list from the language as a whole? Their language-specific translations skew towards higher frequency compared to the vocabulary as a whole (Kapitan, 1994, p. 239; Piantadosi, 2014, p. 1116). The Swadesh list concepts are also, as was originally intended by Swadesh, less likely to be replaced over time. Sankoff (1970) shows that the Swadesh lists do have a lower rate of replacement. He calculates and compares a rate of replacement (called λm) for two sources: a) 1077 comparative concepts from Buck (1949), which has translations and cognate judgments from 31 Indo-European languages, and b) 159 Swadesh list comparative concepts from Dyen et al. (1992), which has translations and cognate judgments from 95 Indo-European language varieties.

Sankoff (1970) is able to show that, as a set, the Swadesh comparative concepts generally have a lower rate of replacement than the set of comparative concepts from Buck (1949). This is illustrated in the histogram in Figure 1, where the X-axis is the rate of replacement and the Y-axis is the proportion of meanings: approximately 20% of the Swadesh meanings had a rate of replacement between 0.32 and 0.40 λm, while approximately 15% of the Buck meanings fall into the same bracket. The Buck list words approach a normal distribution when it comes to different rates of replacement, while the distribution for the Swadesh list is clearly shifted toward the left of the histogram: the Swadesh list has more stable concepts (i.e. with a lower rate of replacement)⁵ than the Buck list.

The Buck list of concepts is not as obviously pre-selected for stability as is the Swadesh list, and is a better sample of a natural language lexicon. Yet the Buck list is not a random list of words either, since the author’s intention was to trace the concepts’ lexicalizations back to Sanskrit and Old High German: concepts like SUN and STONE are included; concepts like CAR and TELEPHONE are not.

5 Unfortunately the digitized data from Buck (1949) used in Sankoff (1970) is no longer available (David Sankoff, p.c.), and a redigitization of the data in the Buck dictionary would require substantial amounts of time and effort.

Figure 1. Proportion of meanings with different rates of change. Figure redone from Sankoff (1970, p. 567, figure 2).

What is the effect of the skewed nature of the Swadesh data on models that try to use it to predict the rate of replacement for the entire mental lexicon? The models will certainly be better at predicting the behavior of other frequent, stable concepts, than they are at accounting for the variation in replacement rates for sets of infrequent, unstable concepts.

3.1.3 EARLIER MODELS OF LEXICAL REPLACEMENT

This section will present several recent endeavors to statistically model lexical replacement. These models will be evaluated in section 3.1.4, and a new model will be presented from section 3.2 and forward.

Pagel et al. (2007) present a model that partly accounts for the rate of lexical replacement, based on frequency and word class. The authors calculate the rate of lexical replacement for 200 Swadesh items by using cognate class categorizations originally gathered by Dyen et al. (1967) and weighting this by also taking the historic relationships of the languages into consideration. This weighting is unnecessary if the goal is to compare the difference between concept rates, and in practice the difference between the results of the Pagel method and the simple method used as an example in section 3.1.1 is negligible: a correlation test shows a near perfect match (R=0.93 p<0.01).

Assuming an age for the Indo-European language family of 8,700 years (based on Gray & Atkinson, 2003), Pagel et al. (2007) compute that in the last 10,000 years in Indo-European, the concept NAME has had, on average, 0.47 word replacements in a given language, while MAN has had 3.38 replacements and WOMAN 2.75. Calude and Pagel (2011, p. 1103) compare the Pagel et al. (2007) rate with a subjective stability ranking by Starostin (2007) for 110 comparative concepts derived from Starostin’s work with 14 language families. They find that the Pagel et al. (2007) and the Starostin (2007) rates have a high correlation (r=0.65, no p-value given).

Pagel et al. (2007) extracted frequency information from balanced corpora for the Swadesh concepts’ primary words in English, Spanish, Russian, and Greek. They tagged each concept (not each word) with a single word class categorization. While no information is given on the procedure for this word class categorization, it may be inferred from the data that it was most likely done on semantic criteria, and possibly on the basis of the English meta-language word form for each concept: the concept RED is classified as an adjective, since the English word red is an adjective.

Knowing only the frequency, the Pagel et al. (2007) linear regression model can account for around 13% of the variance in the rate of replacement (English, R=0.37; Spanish, R=0.35; Russian, R=0.41; and Greek, R=0.32). As demonstrated in Figure 2, frequently used words and concepts in the Pagel et al. (2007) data set have fewer cognate classes and thus a slower rate of lexical replacement.

If, in addition to frequency, the word class is also known, the linear regression model can account for 50% of the variance in the rate of lexical replacement (English, R=0.69; Spanish, R=0.69; Russian, R=0.71; and Greek, R=0.69: all p<0.0001). Pagel et al. (2007) argue that prepositions and conjunctions are most likely to be replaced, followed by progressively slower replacement rates for adjectives, verbs, nouns, special adverbs, pronouns, and numbers (Pagel et al., 2007, p. 719).
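The link between the reported correlation coefficients and the percentages can be checked with a short calculation. The sketch below assumes that the percentages correspond to the squared coefficients (R²) averaged over the four languages; the source does not state how the summary figures were derived, so this is one plausible reading rather than the authors' procedure.

```python
# Correlation coefficients (R) as reported in the text.
reported_r = {
    "frequency only": {
        "English": 0.37, "Spanish": 0.35, "Russian": 0.41, "Greek": 0.32,
    },
    "frequency + word class": {
        "English": 0.69, "Spanish": 0.69, "Russian": 0.71, "Greek": 0.69,
    },
}


def mean_variance_explained(r_by_language):
    """Average R squared over languages (assumed reading of the percentages)."""
    squared = [r ** 2 for r in r_by_language.values()]
    return sum(squared) / len(squared)


share_freq = mean_variance_explained(reported_r["frequency only"])
share_freq_class = mean_variance_explained(reported_r["frequency + word class"])

print(f"frequency only:         {share_freq:.1%}")
print(f"frequency + word class: {share_freq_class:.1%}")
```

Under this reading, the first figure rounds to the 13% mentioned above, and the second comes out slightly below the 50% quoted for the combined model.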

Pagel et al. (2007) venture two different interpretations for the relationship between frequency and rate of replacement. One possibility is that “frequency of word-use could directly modify the rate at which new word forms arise”. Alternatively, the rate at which new forms appear could be the same for all meanings, with frequency of use affecting the probability that a population of speakers will come to adopt a given innovation (2007, p. 19).

Figure 2. Lexical replacement rate as a function of normalized frequency in English, Greek, Russian, and Spanish, for concepts of open (red: adjectives, green: nouns, blue: verbs) and closed (yellow: adverbs, grey: conjunctions, purple: numbers, turquoise: prepositions, orange: pronouns) word classes, respectively. Made from data in Pagel et al. (2007). Figure previously published in Vejdemo and Hörberg (2016, p. 4).

Monaghan (2014) adds several factors to the statistical model proposed in Pagel et al. (2007), namely age of acquisition, phonological length, phonological similarity, and concreteness. Monaghan’s main motivation for this expanded model is to investigate the suggestion that language change is linked to children’s language acquisition (see discussion in section 2.3.3), and the more exact hypothesis that early acquired words should be less vulnerable to replacement than later acquired words. Monaghan (2014) uses age of acquisition values from Kuperman, Stadthagen-Gonzalez and Brysbaert (2012), whose data collection method consisted of asking adults to estimate at what age they acquired specific words. The adults had a high in-group consistency in their estimates, and the data set also correlates strongly with other data sets based on adult rankings (such as Bird, Franklin, & Howard, 2001; Cortese & Khanna, 2007; Stadthagen-Gonzalez & Davis, 2006). Adult rankings of age of acquisition correlate strongly with age of acquisition estimates from experiments with children, as shown by Morrison, Chappel and Ellis (1997) for English and by Bonin, Peereman, Malardier, Méot and Chalard (2003) for French. For an overview of the history of Age of Acquisition research, see Johnston and Barry (2006).

Since Kuperman et al. (2012) also note that age of acquisition correlates with the phonological length of a word (early acquired words tend to be shorter) and phonological similarity (early acquired words are more similar to other words), Monaghan (2014) also includes these factors in the model. Phonological length is approximated by the number of letters in the written word, and phonological similarity by the number of other words that differ from it by only a single letter – these variables were also from Kuperman et al. (2012). Finally, Monaghan (2014) notes that since early acquired words also tend to be higher in concreteness (see discussion in section 3.1.4), a concreteness ranking (based on English speakers’ judgments on a 5-point scale) was included (data from Brysbaert, Warriner, & Kuperman, 2014).
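The two orthographic approximations described above can be illustrated with a minimal sketch. The word list is a hypothetical toy lexicon and the function names are illustrative; the values actually used by Monaghan (2014) came from Kuperman et al. (2012), whose exact procedure may differ (for example in the lexicon used or in whether insertions and deletions count).

```python
def differs_by_one_letter(a, b):
    """True if a and b have equal length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def neighbor_count(word, lexicon):
    """Number of lexicon entries that differ from `word` by a single letter."""
    return sum(differs_by_one_letter(word, other) for other in lexicon)


# Hypothetical toy lexicon, for illustration only.
lexicon = ["cat", "cot", "cut", "bat", "hat", "dog", "dot", "stone"]

length_cat = len("cat")                        # length proxy: number of letters
neighbors_cat = neighbor_count("cat", lexicon)
neighbors_dog = neighbor_count("dog", lexicon)
neighbors_stone = neighbor_count("stone", lexicon)

print(length_cat, neighbors_cat, neighbors_dog, neighbors_stone)
```

In this toy lexicon the short word "cat" has many single-letter neighbors while the longer "stone" has none, which mirrors the general tendency the text describes for short, early acquired words.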

Monaghan (2014) replicates Pagel et al.’s (2007) general results, and can also show a significant impact of age of acquisition, phonological length, and phonological similarity, though not of concreteness. Monaghan (2014) writes that the latter three variables were included to isolate the effect of age of acquisition from other factors.

Finally, I will mention the works of Kapitan (1994) and Rundblad (2000). Kapitan studies the extent to which the 1000 most frequent Latin words have survived in their Romance daughter languages. He does not use a multiple regression model and instead considers each possible factor’s impact on the retention likelihood separately. His conclusion was that retention correlates significantly with the following factors: age, level of polysemy and frequency. Older words are in general more stable than younger words, but among older words, numerals, pronouns and prepositions are the most stable lexical signs. Among young words, the most stable lexical signs are adjectives, verbs and nouns. The more polysemous a lexical concept is, the more stable it is. The more frequent a linguistic sign was, the more stable it is over time. Unlike Kapitan, Rundblad finds no correlation between how old a word is and how likely it is to be replaced in the present. Rundblad (2000) shows, in an investigation of the history of 72 words related to (what I would refer to as the comparative concept) NATURAL WATERCOURSE, that the origin of a word contributes to its lexical stability. Words created from verbs or from metonyms have a significantly higher lexical stability than those created from nouns, adjectives or metaphors. However, Rundblad argues, the origin of a word is only important until it becomes opaque in the minds of users – after this point, it is the frequency of use which determines the lexical stability of a sign.

3.1.4 EVALUATION OF EARLIER MODELS

This section will evaluate the earlier models presented by Pagel et al. (2007) and Monaghan (2014), and motivate why a new, and different, model is needed to understand the rate of replacement in open-class items. A new model will then be presented in section 3.2.

As previously stated, Pagel et al. (2007) (and therefore Monaghan, 2014) use the 200-concept Swadesh list as the basis for their data sets, and categorize the concepts into word classes, presumably on semantic criteria and most likely influenced by the English labels for each comparative concept. The practice of assigning word class on semantic grounds is also used in other lexico-semantic databases, like the Austronesian Basic Vocabulary Database (Greenhill et al., 2008), where it is called “semantic category”, and the World Loanword Database (Haspelmath & Tadmor, 2009a), where it is called “category”.

Table 4. Pagel et al. (2007) data by word class. Frequency per million words, averaged for each word class.

Word class     Number of   Average rate   Average English   Average Spanish   Average Russian   Average Greek
               concepts    of change      frequency         frequency         frequency         frequency
Conjunctions   3           4.9            10013             10864             12989             12851
Prepositions   3           3.6            9860              18048             14684             9885
Numbers        5           0.2            1040              5158              1098              2496
Adverbs        7           1.4            2001              4268              6601              1827
Pronouns       9           1.4            5059              1421              6248              310
Adjectives     41          3.4            597               669               886               466
Verbs          57          3.7            313               389               264               194
Nouns          75          2.9            146               139               308               72
TOTAL          200

Table 4 shows the number of items per word class in the 200-concept list in Pagel et al. (2007). In sum, 173 of the 200 concepts are adjectives, nouns, or verbs, all commonly seen as open word classes. The remaining 27 are adverbs, conjunctions, numbers, prepositions, and pronouns, all closed word classes. Closed word classes contain words that have little or uncertain extension in the real world, and a more or less grammatical function in language.

It is clear from Table 4 that the closed word classes, unsurprisingly, have very small samples: 3 conjunctions, 3 prepositions, 5 numbers (to use Pagel et al.’s term), 7 adverbs, and 9 pronouns. The English meta-language labels for the closed word class concepts are found in Table 5. There are far more adjectives (41), verbs (57) and nouns (75) than closed class items. The open word classes are also distinguished by radically smaller frequency values – the average noun appears 146 times per million words in the English corpus, while the average conjunction appears 10013 times.
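The counts and the frequency contrast discussed above can be recomputed directly from Table 4. The sketch below uses only the number of concepts and the average English frequency per word class as given in the table; the grouping into open and closed classes follows the text, and the weighted group means are an added illustration rather than figures reported in the source.

```python
# (number of concepts, average English frequency per million), from Table 4.
table4 = {
    "Conjunctions": (3, 10013),
    "Prepositions": (3, 9860),
    "Numbers": (5, 1040),
    "Adverbs": (7, 2001),
    "Pronouns": (9, 5059),
    "Adjectives": (41, 597),
    "Verbs": (57, 313),
    "Nouns": (75, 146),
}
open_classes = ["Adjectives", "Verbs", "Nouns"]
closed_classes = [name for name in table4 if name not in open_classes]


def group_size(names):
    """Total number of concepts in the named word classes."""
    return sum(table4[name][0] for name in names)


def weighted_mean_frequency(names):
    """Mean English frequency per concept, weighting each class by its size."""
    total = sum(table4[name][0] * table4[name][1] for name in names)
    return total / group_size(names)


n_open = group_size(open_classes)
n_closed = group_size(closed_classes)
conjunction_to_noun_ratio = table4["Conjunctions"][1] / table4["Nouns"][1]
mean_open = weighted_mean_frequency(open_classes)
mean_closed = weighted_mean_frequency(closed_classes)

print(n_open, n_closed, round(conjunction_to_noun_ratio, 1))
print(round(mean_open), round(mean_closed))
```

The average conjunction is thus roughly 69 times as frequent as the average noun in the English corpus, and the closed-class group as a whole is more frequent than the open-class group by about an order of magnitude.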


Table 5. The items in the closed word classes in Pagel et al. (2007). See section 3.2.1 for a note about their decision to list FATHER, MOTHER as pronouns.

Word class     Concepts
Conjunctions   AND, BECAUSE, IF
Prepositions   IN, WITH (ACCOMPANYING), AT
Numbers        ONE, TWO, THREE, FOUR, FIVE
Adverbs        HERE, THERE, HOW, WHERE, WHEN, WHAT, NOT
Pronouns       I, THOU, HE, WE, YE, THEY, WHO, FATHER, MOTHER

If the Pagel et al. (2007) data is split into one dataset with all the closed word class items (function words), and one dataset with all open word class items (content words), it becomes apparent that most of the perceived correlation between frequency and the rate of lexical replacement can be found in the closed word classes. For the closed word class items, the multiple linear regression model fits better than it does for the full data set: together, frequency and word class account for about 60% of the variation in closed word class items (R=0.7691, R²=0.59, p=0.001, n=27).

For open word class items, the model is a noticeably poorer predictor of the rate of change and accounts for only about 12% of the variance (R=0.36, R²=0.12, p<0.001, n=173).

The reason the model presented in Pagel et al. (2007) was so convincing at first glance is that the correlation coefficient is bolstered by the high correlation between frequency and type of word class in the five closed classes. The five closed classes account for only a small part of the data (27 of the 200 concepts), and, naturally, for an even smaller part of all the words in an actual language. This means that for open class words, judging by this model, frequency and word class are not as good predictors of the rate of lexical replacement as previously thought.

There are good reasons to investigate the lexical replacement behavior of closed word class concepts separately from the behavior of open word class concepts. For instance, there is a cognitive divide in the brain’s handling of content and function words. Whereas clinical patients suffering from expressive aphasias generally have problems in producing function words and morphosyntactic structure, patients with receptive aphasias are often unable to comprehend and select correct content words during speech production (Ingram, 2007). There are also clear differences in neurophysiological activity during the processing of function words in comparison to the processing of content words (Diaz & McCarthy, 2009; King & Kutas, 1995; Münte et al., 2001).


It can be shown that Monaghan’s (2014) decision to group together open and closed word classes also affects the model in that article. Vejdemo and Hörberg (2016) successfully replicated Monaghan’s study for that study’s entire 200-item data set, and then added a binary variable to the data set: all concepts were tagged either as open class (all nouns, verbs, adjectives; 173 items) or closed class (the 27 remaining items). The effect of age of acquisition on lexical replacement rate is in fact driven by the difference between open and closed class words. Separate analyses of Monaghan's data were conducted across open and closed class items, respectively, and no significant effect of age of acquisition was found in either of them. More importantly, however, when the word class predictor was replaced by a predictor distinguishing open class and closed class words only, thereby controlling for their difference, analysis of the full data set found no significant effect of age of acquisition on the rate of lexical replacement.
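How a pooled correlation can be driven entirely by the difference between two groups can be shown with a small constructed example. All numbers below are hypothetical and chosen for clarity; they are not taken from Monaghan (2014) or from Vejdemo and Hörberg (2016), and the sketch only illustrates the general statistical mechanism, not the analyses reported in those studies.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: (predictor value, rate of replacement) per concept.
open_class = [(2.0, 3.0), (2.2, 3.4), (2.4, 3.4), (2.6, 3.0)]
closed_class = [(3.6, 1.0), (3.8, 1.4), (4.0, 1.4), (4.2, 1.0)]


def correlation(items):
    """Correlation between predictor and rate for a list of (x, y) pairs."""
    xs, ys = zip(*items)
    return pearson_r(xs, ys)


r_open = correlation(open_class)
r_closed = correlation(closed_class)
r_pooled = correlation(open_class + closed_class)

print(f"open only:   r = {r_open:.2f}")
print(f"closed only: r = {r_closed:.2f}")
print(f"pooled:      r = {r_pooled:.2f}")
```

In this constructed case the predictor is unrelated to the rate within each group, yet the pooled data show a strong negative correlation, because the two groups differ in both their typical predictor value and their typical rate. Adding a binary open/closed variable to a regression, as described above, is one way of controlling for such a group difference.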

Neither Kapitan (1994) nor Rundblad (2000) uses linear regression models to examine the data on lexical stability, and their choice to examine separately the impact of individual factors on the likelihood of lexical replacement unfortunately limits the usefulness of their findings for this particular study. It can be noted that their general findings are in accordance with those shown in Pagel et al. (2007) vis-à-vis frequency, and that Kapitan’s claim that polysemy is important merits further study.

As a final note on evaluation of earlier models trying to account for the variation in lexical replacement, it is fitting to mention Ladd, Roberts and Dediu (2015). They present no models of lexical replacement of their own, but instead evaluate many previous works that have identified correlations between different linguistic features, such as Pagel et al. (2007). Ladd et al. (2015) discuss the pros and cons of correlational research on language data, noting the importance of ensuring validity and robustness. Validity, they write, can be ensured when the assumptions behind the hypotheses are clearly presented, and the way that variables are quantified and measured clearly explained. Robustness can be achieved by many means, typically by varying the data or the methods.


3.2 A NEW MODEL

Previous sections have presented and evaluated some existing models that have been used to explain lexical replacement. This section will propose a statistical model that predicts the general rate of lexical replacement for open class concepts. The different hypothesized factors will be quantified, and their correlations both with the dependent variable (the rate of lexical replacement, i.e. the outcome being predicted) and with all the other factors will be investigated. Since many of the factors interrelate and may measure similar things, a multiple linear regression model, which corrects for this, will then be proposed. The multiple linear regression model considers the impact of all the different variables together and ensures that the intercorrelations between the various variables are accounted for. A bootstrapped version of the model will also be presented. This section presents the variables in the model, and the final model is discussed in detail in section 3.3.
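The general technique named above, a multiple linear regression combined with a bootstrap, can be sketched as follows. This is a minimal illustration under stated assumptions, not the model of this chapter: the ten data points are invented, only two predictors are used, the helper functions are hypothetical, and a simple percentile bootstrap stands in for whatever bootstrap procedure the final model uses.

```python
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Least squares with intercept via the normal equations: [b0, b1, ...]."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(rows, y, coefs):
    """Share of variance in y accounted for by the fitted model."""
    preds = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical values for ten concepts (NOT the thesis data).
log_freq = [1.2, 1.8, 2.1, 2.5, 2.9, 3.0, 3.4, 3.7, 4.1, 4.5]
imageability = [6.1, 3.2, 5.5, 2.8, 6.4, 4.0, 5.9, 3.1, 4.7, 6.6]
rate = [4.1, 4.6, 3.6, 4.0, 2.7, 3.3, 2.5, 3.0, 2.3, 1.6]
rows = list(zip(log_freq, imageability))

coefs = fit_ols(rows, rate)          # [intercept, b_freq, b_imageability]
r2 = r_squared(rows, rate, coefs)

# Percentile bootstrap for the frequency coefficient: resample concepts
# with replacement, refit, and read off the 2.5th and 97.5th percentiles.
rng = random.Random(0)
boot_slopes = []
for _ in range(500):
    idx = [rng.randrange(len(rate)) for _ in range(len(rate))]
    if len(set(idx)) < 4:            # skip degenerate resamples
        continue
    b = fit_ols([rows[i] for i in idx], [rate[i] for i in idx])
    boot_slopes.append(b[1])
boot_slopes.sort()
lo = boot_slopes[int(0.025 * len(boot_slopes))]
hi = boot_slopes[int(0.975 * len(boot_slopes)) - 1]

print("coefficients:", [round(c, 3) for c in coefs])
print("R squared:", round(r2, 3))
print("bootstrap interval for frequency coefficient:", round(lo, 3), round(hi, 3))
```

Fitting all predictors in one model means that each coefficient is estimated while the other predictors are held constant, which is the sense in which the regression handles correlated factors. The bootstrap interval gives a rough indication of how stable a coefficient is when the set of concepts is varied.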

3.2.1 VARIABLES IN THE MODEL

All data are available in Appendix A.

The dependent variable, i.e. the outcome that one wishes to predict, is the rate of lexical replacement, as calculated by Pagel et al. (2007).⁶ Pagel et al.’s calculations of the variable were used for increased comparability with previous studies.

6 An alternative would have been to use the simple method of dividing the number of cognate classes by the number of languages (as discussed in section 3.1.1), without the additional weighting of family tree relationships present in Pagel et al. (2007), but these two measurement methods are more or less identical (r=0.94, p<0.0001).

The potential problem of loanwords skewing the dependent variable, mentioned in 3.1.2, should not be an issue for the Swadesh material, since it is pre-selected for stability. To verify this, a Pearson correlation test between the rate of replacement (Pagel et al., 2007) and the rate of borrowability from the World Loanword Database (WOLD; Haspelmath & Tadmor, 2009a) was done. The WOLD database has words from 41 genetically diverse languages representing between 1,000 and 2,000 concepts. Each word for each concept was manually categorized (by the language specialist who submitted it) as being more or less likely to have been borrowed and was given an individual word borrowability score. By averaging this individual score for all words that correspond to the same concept, Haspelmath and Tadmor derive a general concept borrowability score. A correlation test between the rate of replacement and the borrowability score did not yield a significant result, which suggests that although borrowing is an important factor for certain parts of the vocabulary, it is not a relevant factor affecting the general rate of replacement in the current data.
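A correlation test of the kind described above can be sketched as follows. The eight value pairs are hypothetical and do not come from Pagel et al. (2007) or WOLD. To keep the example self-contained, the p-value is estimated with a permutation test rather than the parametric test that normally accompanies a Pearson correlation, so this is a stand-in technique and not the procedure used in the study.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Two-sided p-value: how often a shuffled pairing is at least as extreme."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical values for eight concepts (NOT the study's data).
replacement_rate = [0.5, 1.2, 2.0, 2.8, 3.5, 4.1, 1.8, 3.0]
borrowability = [0.10, 0.30, 0.05, 0.25, 0.15, 0.20, 0.35, 0.12]

r = pearson_r(replacement_rate, borrowability)
p = permutation_p_value(replacement_rate, borrowability)

print(f"r = {r:.3f}, permutation p = {p:.3f}")
```

With these invented values the correlation is close to zero and the p-value is far above conventional significance thresholds, which is the pattern of result the paragraph above reports for the real data.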

The predictor (independent) variables include frequency, word class (also used in Pagel et al. 2007), age of acquisition, and imageability. The model also tests several of the other hypothesized motivating forces behind lexical replacement that were discussed in the background chapter. The conserving effect of entrenchment (discussed in 2.7.3) is tested by both raw frequency and by a measure of average mutual information. The factor of emotional charge (discussed in 2.7.1) is approximated by the psycholinguistic measure of arousal. The hypothesized impact of polysemy (discussed in 2.7.4) is approximated by the number of senses, while the hypothesized impact of ease of inferencing (also discussed in 2.7.4) is approximated by the number of synonyms. All these predictor variables are discussed in turn below.

It must also be acknowledged that the present model is not created to test all the possible factors mentioned in the background section, and that the interplay between them all is no doubt complex, as shown by e.g. Ladd et al. (2015). It is very likely that societal and cultural considerations are important for which words get replaced, and that these considerations vary between speaker communities, and over time. In addition, specific semantic domains (such as body parts, kinship terms, colors, etc.) probably have domain specific tendencies when it comes to the rate and process of lexical replacement. This chapter’s investigation of lexical replacement takes none of this into account and instead tries to investigate whether it is possible to find evidence for domain-overriding generalizations of which factors can affect the rate of lexical replacement. With the exception of the data on frequency and synonyms, all the data is taken from English. A further, general caveat is that it is difficult to judge the quality of any variables based on subjective guesses by speakers (such as imageability and arousal in this study) where there are no objective measures available with which to compare the resulting values – there are no objective measures that tell us how emotionally charged concepts are in the mind. To a certain extent, this problem can be parried if there is a high correlation between the results from several different studies. Even when there is a high agreement between speakers doing subjective rating experiments, the question of validity remains – to what extent does a measure (e.g. arousal derived from subjective ratings) correspond to the true target, namely how likely a word is to have many euphemisms and be replaced by one of them?


3.2.2 WORD CLASS (SEMANTIC CATEGORIES)

It should be noted that the term “word class” is used atypically here and in previous work on lexical replacement. Ordinarily a “word class” would be assigned to a “word”. In the sense that “word class” is used in this thesis and in works like Monaghan (2014) and Pagel et al. (2007), it is instead a feature assigned not to a word but to a contextless concept – like DOG or TO RUN. The concept DOG is assigned to the “word class” noun, because its reference is an animate, real-world being. The concept TO RUN is assigned to the “word class” verb, because it refers to an action. This means that what is called “word class” is in fact a semantic category, assigned according to semantic criteria. Semantic criteria are a valid part of word class assignment, but they are normally combined with other language-specific morphosyntactic criteria. Haspelmath and Tadmor (2009b) also divide concepts into verbs, nouns, adjectives etc., but they use the term “semantic category”. It is arguably the clearer choice, but in order to retain term compatibility with earlier similar work, this thesis will use “word class” instead of “semantic category”.

Since the purpose of the present model is to investigate lexical replacement of content-bearing lexemes, and not the replacement of grammatical elements, only the open word class concepts in the 200-item Swadesh list were used.

The concepts are tagged as either “nouns”, “verbs”, or “adjectives” in Pagel et al. (2007). This word class tagging was largely reused for the current model, but differed in two ways. First, there are six concepts that Pagel et al. label “adjective” and that I instead label as closed word class items, and therefore exclude. These are ALL, FEW, MANY, NEAR, OTHER, SOME, THAT, THIS: concepts that I consider to be very grammatical in nature, and thus more suited to the closed word class group. Second, Pagel et al. had labeled FATHER and MOTHER as pronouns and WATER as an adjective; these were all relabeled as nouns. Adding or removing these concepts had no relevant impact on the results. All in all, this left 167 of the 200 original Swadesh concepts.

3.2.3 IMAGEABILITY

As discussed in section 2.7.2, there is a marked difference between how highly imageable (i.e. easy to picture in the mind) and less imageable words are processed in the mind, and it is reasonable to test a hypothesis that this could influence the rate of lexical replacement. This study uses data from Cortese and Fugett (2004), who have published imageability ratings from English speakers for 3,000 words, where speakers were asked to rate the imageability of each word on a scale. The Cortese and Fugett data correlate very strongly with the Brysbaert, Warriner, & Kuperman (2014) concreteness-rated data that was used by Monaghan (2014) (r = 0.88, p < 0.00001).

3.2.4 ENTRENCHMENT (AS FREQUENCY AND CO-OCCURRENCE)

The relationship between entrenchment and the rate of lexical replacement was discussed in section 2.7.3. This model uses two complementary measurements: raw frequency and co-occurrence frequency (average mutual information).

Pagel et al.’s (2007) findings clearly show that raw word frequency predicts the lexical replacement rate. As shown in section 3.1.4, this holds, but to a lesser degree, even after the open word classes are investigated separately. The raw frequency data in the present model is taken from the average of the frequencies reported in Pagel et al. (2007) for English, Spanish, Russian, and Greek (see section 2.7.3 for a discussion of Calude and Pagel’s (2011) findings regarding the cross-linguistic comparability of frequency measurements).

As noted in background section 2.7.3, raw frequency is only one of several possible perspectives on entrenchment. It might also be more difficult to replace a word that often co-occurs with other words in constructions than to replace a word that does not have such common co-occurrences (e.g., BROTHER might occur often with SISTER, while TO GO might have no such steady lexical partner to anchor it). Any measure of strength of co-occurrence could thus be expected to be negatively related with the rate of replacement.

Mutual Information (MI) is a co-occurrence measurement that gives high values to two items that often co-occur (salt would have a high MI value with pepper) and low values to items that rarely co-occur (salt would have a low MI value with dinosaur). When MI for two words is calculated, the frequency of both words is independently taken into consideration and contrasted with how likely the words are to occur together. An estimate of the mutual information of items X and Y is defined as

\[
\mathrm{MI}(x, y) = \frac{\log\left( \dfrac{n(xy) \cdot N}{n(x) \cdot n(y) \cdot \mathit{ngramsize}} \right)}{\log 2}
\]

where n(xy) is the frequency of co-occurrences of x and y, n(x) the frequency of x, n(y) the frequency of y, N the size of the corpus, and ngramsize the size of the n-gram window under investigation.
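The Mutual Information estimate can be sketched in a few lines of Python. This is only an illustration of the formula as read here, that is, the base-2 logarithm of n(xy) times the corpus size, divided by n(x) times n(y) times the n-gram window size. The function name, the parameter names and all counts below are hypothetical and are not taken from the BNC.

```python
import math


def mutual_information(n_xy, n_x, n_y, corpus_size, ngram_size):
    """Estimate MI(x, y) = log(n(xy) * N / (n(x) * n(y) * ngramsize)) / log 2."""
    if min(n_xy, n_x, n_y, corpus_size, ngram_size) <= 0:
        raise ValueError("all counts must be positive")
    ratio = (n_xy * corpus_size) / (n_x * n_y * ngram_size)
    return math.log(ratio) / math.log(2)


# Hypothetical counts, invented for illustration only.
frequent_pair = mutual_information(
    n_xy=600, n_x=3_000, n_y=2_000, corpus_size=100_000_000, ngram_size=6
)
rare_pair = mutual_information(
    n_xy=2, n_x=3_000, n_y=1_500, corpus_size=100_000_000, ngram_size=6
)

# A pair that co-occurs often relative to its parts scores higher.
print(frequent_pair > rare_pair)
```

With these invented counts the frequently co-occurring pair receives the higher value, which is the behavior described for salt and pepper versus salt and dinosaur.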


To gain a measurement per word, instead of per word pair, I took the 20 neighbors with the highest Mutual Information values for each Swadesh concept in the English BNC corpus (accessed through the http://corpus.byu.edu/bnc/ interface; ngramsize = 6) and averaged their values.
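The step from word pairs to a single per-word value can be sketched as follows. The function name and the toy neighbor scores are hypothetical; the only things taken from the text are the idea of averaging the highest-scoring neighbors and the default of 20.

```python
def mean_top_mi(neighbor_scores, k=20):
    """Average the k highest MI values among a word's neighbors.

    neighbor_scores maps each neighboring word to its MI value with the
    target word. If fewer than k neighbors exist, all of them are used.
    """
    if not neighbor_scores:
        raise ValueError("no neighbors supplied")
    top = sorted(neighbor_scores.values(), reverse=True)[:k]
    return sum(top) / len(top)


# Invented MI values for the neighbors of a single target word.
toy_scores = {"sister": 8.0, "elder": 6.0, "the": 1.0}
print(mean_top_mi(toy_scores, k=2))  # averages 8.0 and 6.0
```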

3.2.5 AGE OF ACQUISITION

Section 2.7.5 discussed how the age of the speakers might influence lexical replacement, and age of acquisition and its hypothesized relationship to the rate of lexical replacement was further discussed above in the context of Monaghan’s research. Like Monaghan (2014), this chapter uses age of acquisition data from Kuperman et al. (2012).

3.2.6 EMOTIONAL CHARGE / AROUSAL

Another potential factor behind lexical replacement (discussed in section 2.7.1) is the emotional charge of a concept. Concepts with higher emotional charge are likely to engender many euphemisms that replace each other. Emotional charge is operationalized in this study by arousal. A high arousal value means that the word evokes more emotion in the participant. This study uses arousal data from Warriner et al. (2013), who calculated arousal values for 14,000 English lemmas from over 300,000 ratings by 745 speakers on a scale from 1 to 9. Warriner et al. note that only 20% of the words have an arousal rating above 5 (neutral),7 which indicates that the majority of the words in their data are not particularly emotionally charged.

7 The instructions given to speakers rating for the arousal dimension in Warriner et al. (2013, p. 4), as a paraphrase, for formatting reasons: You will use a scale to rate how you felt while reading each word. There will be approximately 350 words. The scale ranges from 1 (excited) to 9 (calm). At one extreme of this scale, you are stimulated, excited, frenzied, jittery, wide awake, or aroused. When you feel completely aroused, you should indicate this by choosing rating 1. The other end of the scale is when you feel completely relaxed, calm, sluggish, dull, sleepy, or unaroused. You can indicate feeling completely calm by selecting 9. The numbers also allow you to describe intermediate feelings of pleasure calmness/arousal, by selecting any of the other feelings. If you feel completely neutral, not excited nor at all calm, select the middle of the scale (rating 5).


3.2.7 SENSES

The degree to which the primary word of a concept is polysemous and therefore has many different senses may also affect its lexical replacement rate, but, as discussed in section 2.7.4, the direction of effect is not certain. If polysemy is an unstable situation in language, then more senses would presumably be linked with a higher propensity for semantic change, but semantic change does not necessarily lead to more lexical replacement. A word with more senses might be more firmly anchored in different contexts (it would be more deeply entrenched), which might help protect it from lexical replacement.

In the present model, the number of senses was determined on the basis of the WordNet English lexical database (Fellbaum, 1999), where all lexical items are tagged with how many senses they have.

3.2.8 SYNONYMS

There are, finally, good reasons to suspect that the number of synonyms (and other semantically closely related words) that a word has is related to its lexical replacement rate, as discussed in section 2.7.4. If inferencing between semantically related words is one of the basic forces behind lexical replacement, then words with more synonyms would be more likely to be replaced. The number of synonyms of a word should therefore be positively related to lexical replacement rate.

In this study, the number of synonyms that a word has is measured by counting the number of suggested synonyms in synonym dictionaries. However, synonym dictionaries contain not only synonyms, but often also common hyponyms and hypernyms. At any rate, these are all evidence for semantic connections, and, for brevity, “synonyms” will be used to refer to them. In order to get a more balanced average synonym count for the underlying concept, data was gathered from synonym dictionaries in five Germanic languages. For the English data, the synonyms came from the Pocket Oxford American Thesaurus (“Pocket Oxford American Thesaurus Online,” 2008), since automatically extracting data from this source proved to be easy. The Swedish data come from Bonniers Synonymordbok (Walter, 2000), the Danish from Gyldendals Synonymordbog (Ingemann, 2011), the German from Wörterbuch.info (http://www.woerterbuch.info/), and the Dutch from Synoniemen.net (http://synoniemen.net). The synonym measurement for a comparative concept was derived by first normalizing within each dictionary and then averaging over all the languages. To illustrate: there were 16 synonyms for the primary word for WOMAN in the Dutch dictionary, but 37 synonyms for the primary word for WOMAN in the English dictionary. The average number of synonyms for the studied words was 18.4 in the Dutch dictionary and 37.2 in the English dictionary. The normalized synonym count for WOMAN was thus 0.87 in Dutch (16/18.4) and 0.99 in English (37/37.2). The average number of normalized synonyms over all the five languages for WOMAN was 1.28.
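The normalization step can be checked with a short Python sketch. The function names are hypothetical. The Dutch and English figures for WOMAN are the ones reported above; the counts for the other three languages are not given in the text, so the five-language average of 1.28 is not recomputed here.

```python
def normalized_synonym_count(count, dictionary_mean):
    """Divide a word's synonym count by the mean count in its dictionary."""
    return count / dictionary_mean


def concept_synonym_score(counts, dictionary_means):
    """Average the per-dictionary normalized counts for one concept.

    Both arguments are dicts keyed by language name.
    """
    normalized = [
        normalized_synonym_count(counts[lang], dictionary_means[lang])
        for lang in counts
    ]
    return sum(normalized) / len(normalized)


# Figures for WOMAN as reported in the text (Dutch and English only).
dutch = normalized_synonym_count(16, 18.4)
english = normalized_synonym_count(37, 37.2)
print(round(dutch, 2), round(english, 2))
```

Normalizing within each dictionary first keeps a generous dictionary, such as the English one here, from dominating the cross-language average.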

3.3 RESULTS AND DISCUSSION

As an initial analysis, the correlations among all of the continuous variables were investigated. All statistical analyses were conducted with the statistical software R, primarily with the integrated stats package (R Development Core Team, 2013). The frequency, number of senses and age of acquisition predictors (variables) were log transformed in order to have an approximately normal distribution. The correlation matrix with the Pearson correlations among all of the continuous variables is shown in Table 6. As evident from the table, the lexical replacement rate is significantly correlated with frequency, synonyms, mutual information, imageability, and age of acquisition. The number of senses and the level of arousal do not correlate with the rate.

Table 6. Correlation matrix for all continuous variables in the study. P values for significance tests have been corrected for multiple comparisons using Holm correction. ***: p < .0001; **: p < .01; *: p < .05.

              Rate     LogFreq.  Synonyms  MutualInfo.  Imageability  Arousal  LogSenses  LogAgeOfAcq
Rate          -        -.273**   .242*     -.281*       -.254*        0.029    -0.046     .255*
LogFrequency  -.273**  -         .381***   0.221        -0.208        -0.046   .357***    -.482***
Synonyms      .242*    .381***   -         -0.003       -.486***      0.195    .592***    -0.043
MutualInfo.   -.281*   0.221     -0.003    -            .506***       -0.111   0.031      -.367***
Imageability  -.254*   -0.208    -.486***  .506***      -             -0.071   -.369***   -0.238
Arousal       0.029    -0.046    0.195     -0.111       -0.071        -        0.046      0.097
LogSenses     -0.046   .357***   .592***   0.031        -.369***      0.046    -          -0.178
LogAgeOfAcq   .255*    -.482***  -0.043    -.367***     -0.238        0.097    -0.178     -

As expected, there are also high correlations among many of the predictor variables themselves. As was discussed in 2.7.4, having a high number of synonyms and having a high number of senses (polysemy) are likely connected, and this is reflected in the correlation matrix in Table 6, where synonyms and senses correlate at r = 0.592. Both synonyms and senses (in log form: logSenses) also have rather strong negative correlations with imageability and a rather strong correlation with frequency (logFrequency): more abstract items have more synonyms and senses, and are less frequent, than more concrete items. Previous findings that age of acquisition and frequency correlate negatively are borne out in the matrix as well, and age of acquisition (logAgeOfAcquisition) also had a negative correlation with mutual information: earlier acquired words are more frequent and co-occur more often in set constructions than later acquired words, according to this data.
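The Holm correction applied to the p values of the correlation matrix can be sketched in Python as follows. This is a generic illustration of the Holm step-down procedure with invented p values, not a reproduction of the R call used for Table 6; the function name is hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p values in the original order.

    The smallest p value is multiplied by m, the next by m - 1, and so on.
    A running maximum keeps the adjusted values monotone, and each value
    is capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Invented raw p values for three comparisons.
print(holm_adjust([0.01, 0.04, 0.03]))
```

In this toy case only the smallest raw p value stays below .05 after adjustment, which shows why a correlation that looks significant in isolation may lose its star once all comparisons in a matrix are taken into account.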

In Table 6, it appears that the rate of replacement is positively correlated with synonyms, and age of acquisition, and negatively correlated with frequency, mutual information, and imageability. But the high correlation coefficients between these variables mean that it is unclear whether their individual correlations with rate of replacement are affected by another variable.

In order to overcome this problem, data was analyzed using multiple regression modeling, and the validity of the model was tested by comparing it to a bootstrapped model.

A multiple regression model models a normally distributed continuous variable, the outcome (dependent) variable, as a linear combination of a set of independent or predictor variables. Importantly, the model estimates the individual relationship between each predictor and the outcome (dependent) variable while controlling for the influence of all other predictor (independent) variables in the model by partialling them out. The model contains the continuous predictors shown in Table 6, together with the word class predictor: word class, frequency, age of acquisition, imageability, mutual information, synonyms, senses, and arousal.
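Written out, the model described in this section has roughly the following form. The notation is an assumption made for illustration: the log transforms follow the description of the correlation analysis, and the two word class indicators (noun, verb) follow the coding used in the predictor table, with adjective as the reference level.

```latex
\[
\begin{aligned}
\mathrm{Rate}_i ={}& \beta_0
  + \beta_1 \log(\mathrm{Frequency}_i)
  + \beta_2\, \mathrm{Synonyms}_i
  + \beta_3\, \mathrm{MutualInfo}_i
  + \beta_4\, \mathrm{Imageability}_i \\
 & + \beta_5\, \mathrm{Arousal}_i
  + \beta_6 \log(\mathrm{Senses}_i)
  + \beta_7 \log(\mathrm{AgeOfAcq}_i)
  + \beta_8\, \mathrm{Noun}_i
  + \beta_9\, \mathrm{Verb}_i
  + \varepsilon_i
\end{aligned}
\]
```

Each coefficient describes the expected change in the replacement rate of concept i for a one-unit change in that predictor when the other eight terms are held constant.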

As mentioned earlier, the high correlation coefficients between several of the predictor variables (multicollinearity) are a concern in linear regression. Collinear predictors may not individually account for the variance in the outcome variable. This in turn increases the standard errors of the coefficient estimates, and therefore reduces the confidence in those estimates. The high correlations between individual predictors shown in Table 6, together with measures of the Variance Inflation Factor (max VIF: 5.9), which estimates correlations between an individual predictor and all other model predictors (see e.g. Harrell, 2001, p. 65), indicate that multicollinearity might be a concern. Bootstrapping was therefore also used to test the significance of the individual predictors, independently of their standard errors. Predictor estimates were calculated on the basis of 10,000 bootstrap samples, shown in Table 7 below. The predictor statistics of the bootstrapped model confirm the predictor effects in the original model in terms of both effect direction and significance, and therefore attest to the stability of the predictors.
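The logic of a bootstrapped coefficient estimate can be sketched in Python. This is a simplified illustration and not the procedure used in this chapter: it resamples cases for a regression with a single predictor, uses 2,000 rather than 10,000 samples, and runs on invented data. All function names are hypothetical.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on a single predictor xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_slope_ci(xs, ys, n_boot=2000, seed=1):
    """Percentile interval from the 0.025 and 0.975 quantiles of the slope."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_boot:
        # Resample cases (x, y pairs) with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # slope is undefined when all resampled x are equal
        by = [ys[i] for i in idx]
        slopes.append(ols_slope(bx, by))
    slopes.sort()
    lower = slopes[int(0.025 * (n_boot - 1))]
    upper = slopes[int(0.975 * (n_boot - 1))]
    return lower, upper


# Invented data with a true slope of -0.7 plus random noise.
noise = random.Random(0)
xs = [i / 3 for i in range(30)]
ys = [10 - 0.7 * x + noise.gauss(0, 0.5) for x in xs]
lower, upper = bootstrap_slope_ci(xs, ys)
print(lower < 0 and upper < 0)
```

An interval that excludes zero, as here, is the bootstrap counterpart of a significant coefficient, and it does not rely on the analytically derived standard error that multicollinearity may inflate.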

Another concern in linear regression is overfitting of the regression model. If the model contains too many predictors in relation to the sample size, the model coefficients might be overoptimistic, and the model predictions will not generalize beyond the sample data (cf., e.g., Babyak, 2004). Overfitting was evaluated using bootstrap validation. Overall overfitting was estimated by calculating the shrinkage coefficients γ0 and γ1 on the basis of 10,000 bootstrap samples,8 using the boot package (Canty & Ripley, 2013). These coefficients did not differ significantly from the intercept and slope of the observed values regressed against the predicted values of the original model (i.e., 0 and 1, respectively), as evidenced by Z tests (γ0 = 0.41, Z = -0.92, p = .36; γ1 = 0.87, Z = 0.95, p = .34) (cf. Baayen, 2008, pp. 194–195; Gude, Mitchell, Ausband, Sime, & Bangs, 2009; Harrell, 2001, pp. 249–250).

The model shows a decent fit (N=117, r2 = 0.34, F(9, 99) = 5.62, p < .0001), accounting for approximately 34% of the variance of the lexical replacement rate. Crucially, the fit is significantly better than that of a model only including log frequency and word class as predictors (χ2(6) = 247.42, p < .0001). The predictor statistics are shown in Table 7, which includes statistics of both the original and the bootstrapped model. The final column reports the ΔR2 of each predictor, which is a measure of the proportion of the variance of the lexical replacement rate explained by each individual predictor, over and above that of all other predictors in the model. ΔR2 was calculated on the basis of the R lmSupport package (Curtin, 2014).
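One common way to read the ΔR2 of a predictor is as the drop in explained variance when that predictor alone is removed from the full model. The formulation below is a sketch under that assumption; the notation is not taken from the lmSupport documentation.

```latex
\[
\Delta R^2_j = R^2_{\mathrm{full}} - R^2_{(-j)}
\]
```

Here R2 with subscript full is the fit of the model with all predictors (0.34 in this chapter) and R2 with subscript (-j) is the fit of the same model refitted without predictor j. Under this reading, a ΔR2 of 16.3% for log frequency would mean that a model without frequency explains roughly 0.34 - 0.163, or about 18%, of the variance.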

8 The model is refit on each bootstrap sample, and the observed values of the original data set are regressed against the predicted values of each bootstrapped model. γ0 and γ1 are then estimated as the means of the intercepts and the slopes of the bootstrapped models.

Table 7 also includes 95% pointwise confidence intervals for the coefficients, based on the 0.025 and 0.975 quantiles of the coefficient estimates of the 10,000 bootstrap samples. For technical reasons, the word class variable, which has three values (verb, noun, adjective), is represented as two different binary variables: word class: noun and word class: verb. Word class: adjective is not entered into the model since its information is already there; if something is not a verb or a noun, it is an adjective.
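The coding of the three-valued word class variable as two binary variables can be sketched as follows. The function name is hypothetical; the choice of adjective as the level without its own indicator follows the description above.

```python
def dummy_code_word_class(word_class):
    """Return (noun, verb) indicators; adjective is the reference level."""
    if word_class not in {"noun", "verb", "adjective"}:
        raise ValueError(f"unknown word class: {word_class!r}")
    return (int(word_class == "noun"), int(word_class == "verb"))


for label in ("noun", "verb", "adjective"):
    print(label, dummy_code_word_class(label))
```

An adjective is coded as (0, 0), so the noun and verb coefficients in the model express differences from adjectives rather than absolute levels.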

The results of the regression modeling replicate the results of Pagel et al. (2007) in showing that word frequency is a strong predictor of the rate of lexical replacement. Frequency explains as much as 16.3% of the variance after controlling for the influence of all other predictors: the more frequent a concept is, the less likely its primary lexical form is to be replaced, as shown by the negative sign of the beta coefficient.

Table 7. β coefficients and inferential statistics of the original and the bootstrapped model. Previously published in Vejdemo and Hörberg (2016, p. 10).

              Original model                  Bootstrapped model
Predictor     β      Std. error  t      p     β      Std. error  Z      p     CI lower  CI upper  ΔR2
(Intercept)   10.07  2.32        4.33   0     10.03  2.69        3.73   0     4.56      15.12     -
LogFreq.      -0.7   0.15        -4.62  0     -0.7   0.17        -4.24  0     -1.01     -0.37     16.3%
Synonyms      1.77   0.4         4.38   0     1.78   0.41        4.39   0     0.97      2.56      12.5%
Mutual info.  0.05   0.12        0.38   0.7   0.04   0.13        0.31   0.76  -0.23     0.29      0.0%
Imageability  -0.64  0.21        -2.98  0     -0.63  0.24        -2.67  0.01  -1.08     -0.15     6.1%
Arousal       -0.18  0.14        -1.25  0.21  -0.19  0.14        -1.33  0.18  -0.47     0.09      1.0%
LogSenses     -0.54  0.24        -2.28  0.02  -0.53  0.24        -2.22  0.03  -1        -0.06     3.4%
Noun          0.81   0.71        1.14   0.26  0.84   0.71        1.18   0.24  -0.51     2.26      0.2%
Verb          0.44   0.51        0.86   0.39  0.45   0.46        0.97   0.33  -0.45     1.37      1.6%
LogAgeOfAcq   -0.18  0.72        -0.26  0.8   -0.18  0.7         -0.25  0.8   -1.48     1.3       0.3%

Table 7 further shows that the imageability of a concept is also associated with a decrease in its lexical replacement rate. The predictor imageability explains about 6.1% of the variance of the replacement rate. Recall from Table 6 that there was no significant correlation between the lexical replacement rate and senses. Yet a significant effect of senses emerges in the model when all other factors are controlled for, accounting for close to 3.4% of the variance in the lexical replacement rate. Concepts whose primary forms have a greater number of senses show a weak, albeit significant, decrease in lexical replacement rate. A further test shows why the connection between the rate and the number of senses did not turn up in the correlation matrix but does turn up in the linear regression model. The negative effect specifically emerges when the variable synonyms is controlled for. This is illustrated in Figure 3 below: when concepts are grouped with respect to their average number of synonyms, a strong negative relationship between the lexical replacement rate and senses is found.

Figure 3. Scatterplots of the relationship between the rate of lexical replacement and senses. The left hand panel shows the relationship between the lexical replacement rate and senses, for three different levels of synonyms (Low: 0-0.65 mean synonyms; Medium: 0.65-1.1 mean synonyms; and High: 1.1-2.65 mean synonyms). The right hand panel shows the relationship between lexical replacement rate and senses when the average number of synonyms is not controlled for. Shaded areas represent 95% confidence intervals of the slopes of the regression lines. This plot was previously published in Vejdemo & Hörberg (2016, p. 12).

Importantly, the model also shows that the average number of synonyms that are listed in synonym dictionaries for a concept is almost as strongly associated with the lexical replacement rate as frequency. The predictor synonyms accounts for about 12.5% of the variance of the replacement rate. The greater the average amount of synonyms used for a concept is, the more likely its primary form is to be replaced.

The regression model finally shows that the individual relationship between lexical replacement rate and mutual information (see Table 6) is in fact mediated by other variables in the model: Once their influence is accounted for, the relationship disappears. Arousal and word class are likewise not statistically significant. The effect of age of acquisition on the model is not significant - the correlation that was seen in Monaghan (2014) disappeared once the few closed class function words were removed from the dataset.

The significant relationships between frequency, synonyms, imageability and senses, on the one hand, and lexical replacement rate, on the other, are illustrated in Figure 4. The figure illustrates the relationships between lexical replacement rate and the aforementioned predictor variables while holding the influence of all other predictors constant. This is done by plotting the lexical replacement rate against the residuals of each predictor variable regressed against all other predictors.
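The residualization behind such plots can be sketched in Python. This is a deliberately reduced illustration with one predictor of interest, one other predictor and invented data; the analysis in this chapter residualizes each predictor against all other predictors at once. The function names are hypothetical.

```python
def simple_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line ys ~ xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(xs, ys):
    """Part of ys that is not linearly explained by xs."""
    intercept, slope = simple_fit(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Invented data in which y depends exactly on both predictors.
other = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
target = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3 + 2 * t - 1 * o for t, o in zip(target, other)]

# Residualize the predictor of interest against the other predictor,
# then regress the outcome on those residuals.
target_resid = residuals(other, target)
_, partial_slope = simple_fit(target_resid, y)
print(round(partial_slope, 6))
```

The slope against the residualized predictor recovers the coefficient that the predictor has in the joint model (2 in this constructed case), which is why the regression lines in such plots can be read as effects with the other predictors held constant.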

Figure 4. Scatterplots of the relationships between the rate of lexical replacement and (A) residualized log frequency, (B), residualized synonyms, (C) residualized imageability, and (D) residualized senses. Shaded areas represent 95% confidence intervals of the slopes of the regression lines. Previously published in Vejdemo and Hörberg (2016, p. 11).


3.4 SUMMARY

The chapter has shown that, in addition to frequency, the number of synonyms, imageability, and the number of senses predict the rate of lexical replacement for the corresponding open class concepts. The more synonyms that are used for a concept, the higher the lexical replacement rate of that concept. This can be attributed to the fact that the availability of other semantically close words makes inferencing and replacement easier. A negative relationship between the imageability (i.e. the ease with which the concept is depicted in the mind) of concepts and their lexical replacement rate was also found. Concepts that are more easily imagined and pictured in the minds of the speakers therefore seem to undergo less word replacement. It is particularly interesting to note that imageability was a better predictor than word class (noun, adjective, or verb) in determining the rate of lexical replacement.

Finally, a small negative relationship between the number of senses a concept has and its lexical replacement rate was found. It was noted in section 2.7.4 that polysemy is often assumed to be a natural stage in lexical replacement, and that from this it can be inferred that it should have a positive relationship with the likelihood of semantic change (though not necessarily with lexical replacement). This study agrees with Kapitan (1994), who suggests that the more polysemous a word is, the greater the chance for its survival. Polysemous words are also more versatile in that they can be used in a wider range of contexts, which might have a conserving effect and reduce their replacement likelihood. This study finds support for this explanation.

Unlike Monaghan (2014), this study found no relationship between the rate of lexical replacement and the age of acquisition when frequency is controlled for. Likewise, no significant contribution of the mutual information factor in the regression model could be shown, even though it is significantly correlated with the rate of lexical replacement on its own.

Also, while some effect of arousal on the replacement rate of some taboo concepts seems indisputable, this study has not been able to show that the effect of emotional charge was generally applicable also to non-taboo concepts. This might be a validation problem: the Warriner arousal values could be a suboptimal way to measure the emotional charge of words. It might also be due to the fact that only a small part of the vocabulary is typically replaced by euphemisms, and that it is a factor that is not important in this domain-independent perspective.


A drawback of this study is that although the rate of lexical replacement is calculated on the basis of data from many different Indo-European languages, all of the independent variables are based upon data from either a few Germanic languages or English only. The reason for using mainly English data was practical since there are, at this time, no other languages for which all the variables were consistently available.

More data and further studies will make the results more reliably generalizable to other Indo-European languages. However, the shortcomings of the independent variables should work against the hypotheses rather than in favor of them: already in this limited study, strong correlations between the variables are seen, even though idiosyncrasies with, e.g., language particular homonyms should lead to more noise. Had the independent variables been based on less noisy data from a representative sample of Indo-European languages, stronger relationships might have been hoped for between them and the rate of lexical replacement, not weaker.

The fact that the investigated comparative concepts on the Swadesh lists are pre-selected for stability merits a further note of caution (as already discussed in section 3.1.2). Since frequency is the strongest predictor of stability, and since the comparative concepts on the Swadesh lists are highly frequent in language, it can be estimated that the average rate of replacement for the lexicon as a whole will be higher than it is for the comparative concepts in the current study – as also suggested by Sankoff (1970).

To conclude, this chapter has argued that there is reason to assess function words and content words separately with respect to their rate of lexical replacement. It has also been confirmed that frequency has an effect on rate of replacement for content words, but that this effect is smaller than the effect it has on function words. And, finally, that, in addition to frequency, the semantic factors of synonyms, senses, and imageability predict the rate of lexical replacement of content words.


3.5 RATE FOR PARTICULAR SEMANTIC DOMAINS

The model presented in this chapter has a rather conspicuous absence: categorization of the data into semantic domains. The general research question A2 from the introduction raises the question of whether different semantic domains (body parts, kinship terms, colors etc.) have different rates of replacement.

A. What affects the likelihood of lexical replacement?
A2. How can local, domain-dependent generalizations be found?

The problems of adding a categorization of the data into semantic domains as an additional factor in the model are two-fold. First, there exist many ways to categorize semantic domains, so which one would be relevant? And second, any list of semantic domains used to categorize the data is bound to contain many different semantic domains, and the current data set is simply too small to accommodate that many categorical variables in addition to the factors already examined. Trying to add more and more variables to the model until some combination of variables suddenly yields a high significance and effect size is not prudent: it would be an improper statistical method with a high risk of false results.

This section presents a limited investigation into the relationship between the rate of lexical replacement and semantic domains, outside the confines of the model.

Which semantic domains should the data be divided into? Kapitan (1994, p. 238) notes (without further reference) that the 19th century scholar Friedrich Christian Diez posited that the following groups are among the most stable in Romance languages: names of gods; names of natural phenomena; names of seasons and days of the week; names of human body parts; names of human psychological states; kinship terms; names of animals and plants; agricultural and nautical terms. Unfortunately this list has no internal differentiation in levels of stability.

A source with higher potential to generate a good list of semantic domains is the WOLD database, which was constructed for research into a particular kind of lexical replacement, namely borrowing. The WOLD database contains 1,460 concepts and investigates their language specific primary expressions in 41 languages. It also provides a borrowability score (how likely is it for a language to have a borrowed primary word for this concept?) and a (single) semantic domain (referred to as semantic field) that the concept’s semantic content best matches. The WOLD list has 24 semantic domains (listed in the leftmost column in Table 8). Tadmor (2009, p. 64) finds what he considers to be a remarkable degree of consistency cross-linguistically as to which semantic domains have higher rates of borrowing likelihood. “Religion and belief”; “clothing and grooming”; and “the house” are the semantic domains most likely to have borrowings, and Tadmor (2009, p. 65) notes that these semantic domains are those that have typically been most affected by intercultural influence. Tadmor also writes that the “semantic fields [domains] at the other extreme, comprising those least amenable to borrowing [...] consist of concepts that are universal and shared by most human societies. [...]” These domains are “sense perception”, “spatial relations”, “the body” and “kinship” (2009, p. 65).

The list of semantic domains created for WOLD has thus proved a useful tool for finding differences in the borrowability score in Tadmor’s research (Tadmor, 2009). Since borrowing is a kind of lexical replacement, it is prudent to see to what extent the same semantic domain list can explain lexical replacement in general.

That the WOLD semantic domain list might be the best available for lexical replacement research does not mean, unfortunately, that there are not problems with this particular division of the concepts. Of the 24 semantic domains in the WOLD taxonomy, 19 were represented in the Swadesh open class data (see Table 8). The most common semantic domain was “the body”. A potentially problematic decision by Tadmor is that word classes (called “semantic categories”) are not separated in the domains – thus “the body” semantic domain contains body part concepts like TONGUE and EAR tagged in the WOLD database as “noun”, but also TO BITE and TO SPIT, tagged in the database as “verb”.

The fact that the semantic domain “the body” was the most common in the Swadesh data set underlines the challenge of the pre-selected nature of the Swadesh items: body part terms are a very small part of the total vocabulary in natural language, but form a large part of the Swadesh list. Other common semantic domains in the Swadesh list are items from “the physical world” (such as FIRE, STONE), “sense perceptions” (such as DIRTY, TO SEE), and “spatial relations” (such as THICK, TO STAND).

Table 8. The Swadesh items divided into WOLD semantic domains. Three Swadesh items were not found in the WOLD database and have been excluded.

Semantic domains                    # of occurrences
The body                            31
The physical world                  24
Sense perception                    16
Spatial relations                   15
Basic actions and technology        12
Motion                              12
Agriculture and vegetation           8
Food and drink                       8
Kinship                              8
Animals                              7
Emotions and values                  6
Time                                 5
Speech and language                  3
Cognition                            2
Possession                           2
Warfare and hunting                  2
Clothing and grooming                1
Quantity                             1
The house                            1
Semantic domains not present:
Law                                  0
Miscellaneous function words         0
Modern world                         0
Religion and belief                  0
Social and political relations       0
Total number of items              164

An ANOVA with the rate of replacement as the dependent variable, and the semantic domains from WOLD that occurred more than 5 times in the data (see Table 8) as the factor, shows a significant result (F = 3.4619, p < 0.01; using the Welch correction to allow for possibly unequal variances due to small sample sizes). This shows that these semantic domains do have some explanatory potential when it comes to the rate of lexical replacement. However, a post-hoc test (Games-Howell, to compensate for the unequal sample sizes) used to see which specific domains were more prone to replacements revealed few statistically significant differences between domains. This is unsurprising since the post-hoc tests are even more dependent on sample size than the ANOVA.
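As an illustration of the kind of test reported above, the following minimal sketch computes the Welch-corrected F statistic and its degrees of freedom for a one-way design. It is not the analysis script used for this study: the function name `welch_anova` and the replacement rates in `rates_by_domain` are hypothetical and serve only to show how the correction weights each group by its size and variance.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Return (F, df1, df2) for Welch's one-way ANOVA.

    Each group is a list of at least two numbers with non-zero variance.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Weight each group by n / s^2, so small or highly variable groups count less.
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    total_weight = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / total_weight

    between = sum(
        w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)
    ) / (k - 1)
    lam = sum(
        (1 - w / total_weight) ** 2 / (n - 1) for w, n in zip(weights, sizes)
    )

    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, df1, df2


# Hypothetical replacement rates for three domains (not the thesis data).
rates_by_domain = {
    "the body": [0.8, 1.1, 0.9, 1.3, 1.0],
    "the physical world": [1.2, 0.7, 1.5, 1.1],
    "basic actions and technology": [2.1, 1.8, 2.6, 2.4, 1.9, 2.2],
}
f_stat, df1, df2 = welch_anova(list(rates_by_domain.values()))
print(f"Welch F = {f_stat:.2f} on ({df1}, {df2:.1f}) degrees of freedom")
```

The sketch stops at the test statistic. Turning it into a p-value requires the F distribution, which a statistics package would normally supply, and the Games-Howell post-hoc comparisons are likewise left out.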

The semantic domains that showed statistically significant (adjusted p < 0.05) differences in their rates of replacement were:

“basic actions and technology” vs. “body”
“basic actions and technology” vs. “physical world”
“basic actions and technology” vs. “animals”

The Swadesh concepts categorized in the WOLD database as belonging to the semantic domain “Basic actions and technology” were all from the verb word class. The Swadesh concepts categorized as belonging to the semantic domains “body”, “physical world”, and “animals” were all from the noun word class. The results show that some concepts involving actions were different from some domains indicating time-stable concrete objects. This comes as no surprise given the model discussed earlier in this chapter: more imageable and concrete concepts are different from less imageable action-focused concepts when it comes to their rate of replacement. And indeed, a multiple regression test with rate of replacement as the dependent variable, and the WOLD semantic domains and the imageability value (discussed in section 3.2.3) as independent variables, shows that the effect of the semantic domains on rate of replacement disappears once imageability is accounted for.

While negative results are important in their own right, the main point of the present section is to discuss the limits of the quantitative perspective taken, and the necessity of combining it with other perspectives. The results of the ANOVA indicate the general usefulness of considering the data from the perspective of semantic domains, but the WOLD list of semantic domains provides little explanatory power for why and how semantic domains differ in rate of replacement. It is possible that a different list would lead to more fruitful results (the decision to mix action concepts like TO BITE and object concepts like TONGUE would seem to lessen the usefulness of the list, for example), yet, as discussed above, this list must not be established through trial and error.

Another alternative is to take a more qualitative and descriptive perspective in order to find out more about lexical replacement processes in specific semantic domains.

3.6 CHAPTER END NOTES AND BRIDGE TO THE NEXT CHAPTER

Insights about lexical replacement processes from the macro time scale perspective (many thousands of years) assumed in this chapter are very general and ignore specific features of particular semantic domains – it is akin to knowing the average temperature for a geographical region, but having no knowledge of how temperature varies throughout a day and night.

By contrast, the next two chapters operate on a narrower time scale, and seek generalizations about lexical replacement in a particular semantic domain: color. Specifically, the chapters are focused on the pink and purple areas in the perceptual color space, where a great deal of lexical replacement has taken place in West Germanic languages in the last few centuries.

When color words are considered together as a group in the Swadesh list, they are among the most stable of all the adjective concepts (see Figure 5, which contrasts color adjectives with four other ad hoc adjectival semantic categories: sensation adjectives (e.g. COLD), spatial adjectives (NARROW), and state adjectives (OLD)). This observation is intuitively reasonable and yet, from another perspective, surprising. The color concepts in the Swadesh list (BLACK, GREEN, RED, WHITE, YELLOW) have been established by much previous research to be frequently labeled in the world’s languages (this will be discussed in the next chapter). While it is irrefutable that, for the present data set, these concepts are slow to be replaced, most modern speakers of Indo-European languages know a host of other color terms that seem to be changing very fast indeed – turquoise, aquamarine, cerise, apricot in English, for instance. The extant research into the linguistics of color, and the hypothesized differences within this semantic domain, make this a very interesting area in which to conduct detailed investigations of lexical change and replacement in a specific domain. The next chapters will investigate local rules for lexical replacement in the color domain.

Figure 5. The rate for different kinds of adjective items - color terms are clearly the most stable group

4 INTRODUCTION TO COLOR STUDIES

The following chapters will focus on research questions A2, B1, and B2.

A. What affects the likelihood of lexical replacement?
A2. How can local, domain-dependent generalizations be found?
B. How does lexical replacement proceed in the semantic domain of color?
B1. Specifically: how does lexical replacement interact with other kinds of lexical change?
B2. How does knowledge about the semantics and pragmatics of color (psychophysiology, sociohistory) help elucidate lexical replacement in this domain?

Lexical change can be studied on many different levels and from many different perspectives. In chapter 3, I presented a holistic, top-down approach in which the rate of lexical replacement was investigated for 167 open word class concepts, using data from the Indo-European language family. This led to a model of factors that influence lexical replacement on a very general level: primarily frequency, number of synonyms, and imageability.

But what does it mean, in detail, that these factors influence lexical replacement? The previous chapter speculated that a connection between many synonyms and a higher chance of lexical replacement in a given time period is due to inference – but what does this mean in practice in a specific case of lexical replacement? And what does an “increase in imageability” mean for a particular semantic domain?

It is reasonable to assume that there will be specific rules of variation and change that govern specific semantic domains. The underlying factors affecting change in, for instance, the kinship domain will be slightly or very different from those for the color domain. Each semantic domain will have local “rules” when it comes to the whys and hows of lexical replacement.

MacLaury (1991, p. 34) notes that the semantic domain of color is especially suited to exploring motivations of language change and categorization. The reason for this is that much is known about the way humans perceive colors, and with modern technology it is possible to manufacture exact colors for elicitation experiments relatively cheaply, thus enabling good comparability between different studies. This allows for a fruitful combination of semasiological perspectives (investigating the domain starting from color terms like rosa and pink) and onomasiological perspectives (starting from a particular set of color stimuli, such as all reddish hues). The standardized testing techniques help solve the recurring problem in typology of comparing data gathered from different sources (see Koptjevskaja-Tamm, 2012, pp. 383–384 for more on this problem).

The next two chapters will mostly focus on a particular part of the color domain: pink and purple. Independent terms denoting pink and purple are relatively recent additions to many Western European languages, and I will argue that they therefore are still in flux and more likely to be subject to lexical replacement.

Both upcoming chapters use the same kind of color elicitation experiment to gather data on domain specific generalizations about lexical replacements in the color domain, but while chapter 5 compares adult (typically in their 20s) contemporary speakers of seven Germanic languages, chapter 6 compares two generations (in their 20s; in their 60s) of speakers of Swedish. The common parts of the methodology will be discussed in section 4.3.

The goal of chapter 5 is to use cross-linguistic data from seven Germanic languages to infer processes of lexical replacement in the last few centuries for pink and purple. When lexical replacement is viewed from the perspective of only a few hundred years, it must be treated as a process (replacement is ongoing) rather than an established fact (as it was viewed in chapter 3). When lexical replacement is viewed as a process, it becomes difficult, and not desirable, to separate it from other kinds of lexical change, like semantic change.

Chapter 5 uses elicitation experiment results from 146 speakers, mainly young adults in their 20s. The languages are closely related and are spoken in neighboring sociocultural environments. The chapter will show that there are strong cross-linguistic similarities in the way that lexical replacement and other kinds of lexical change have unfolded in the color domain. By looking at several related language varieties synchronically, it is possible to infer diachronic patterns for the last few centuries. For example, a new pink concept displaced part of an older red concept in all the languages; for a subset of the languages, a secondary darker pink color concept in turn displaced part of that new pink concept; some rare color terms seem to denote concepts that are growing in strength (like pink in Danish) and herald a coming lexical replacement; some rare terms are lexical remnants, on their way to being replaced (like violett in Swedish).

The sheer amount of material in the cross-linguistic study makes it difficult to do detailed analyses of all color terms that can be used to label the pink and purple areas of the perceptual color space, however. The results from chapter 5 in turn form a background for a more detailed case study in chapter 6, which contrasts two generations of Swedish speakers. Comparing the color labeling strategies of the two generations gives a diachronic perspective, which can be complemented with corpus studies, dictionary data, and interviews to give an in-depth look at lexical replacement from a very narrow time scale.

Taken together, the cross-linguistic perspective on lexical replacement that will be provided in chapter 5 and the intra-language diachronic perspective provided in chapter 6 will allow me to posit generalizations about lexical replacement in the color domain, and also evaluate existing theories about color term replacement in the literature. In order to do that, a deeper background on lexical change in color linguistics will now follow.

The next section, 4.1, defines some terms, and 4.2 gives a general background on color linguistics. Section 4.3 then summarizes a common methodology that was used in chapters 5 and 6.

4.1 TERMS AND DEFINITIONS

In the following chapter, color is treated as a real world property of objects,9 and several assumptions are made about this property: color stimulates a physiological color sensation in humans, which is, under the same environmental conditions, the same for most people who have the same visual sensory receptors. In the mind of a single speaker, there is an individual perception and partitioning of the continuous perceptual color space into language-specific color concepts (like RED, as different from PINK). This perception and partitioning is dependent on, among other things, the language(s) of the speaker, and on the way that the speaker construes the input.

Language specific color concepts are roughly the same for speakers of the same language (though see Lindsey and Brown (2014) for geographical differences and Paramei and Oakley (2014) for age differences), and if a language community has a color concept, it will be denoted by one or more generally recognized, basic color terms (like English red or pink). Color concepts can be compared between languages by postulating a theoretical cross-linguistic comparative color concept: for instance, how does the lexical labeling of PINK differ between languages?

9 For an introduction into the ongoing academic discussion on whether and to what degree color is best described as existing in objects or in the minds of speakers, see Byrne and Hilbert (2003, primary article pp 3-21, discussion and authors’ response pp 22-64).

Unless otherwise stated, the semantics of color terms discussed in this chapter is reduced to the denotative meaning, ignoring equally important aspects such as connotations and collocations.

The use of the term cognates is broader than the norm in these chapters: it is used to refer both to words in different languages that are very similar because they have a common history (e.g. German rot and Danish rød, both inherited from Proto-Germanic *raudaz), and to words that are similar because they have been borrowed (e.g. German pink and Danish pink, both borrowed from English).

4.2 BACKGROUND ON COLOR LINGUISTICS

This section gives an overview of important insights in color linguistics and shows the motivation for choosing to focus on the pink and purple areas of the color space in some West Germanic languages, in particular.

Section 4.2.1 discusses the Berlin and Kay paradigm in color linguistics. Section 4.2.2 presents several studies that, collectively, indicate that new color concepts often appear in border regions between old ones. Section 4.2.3 discusses how perspective shifts (cognitive construals) of color may affect lexical change in this domain. Section 4.2.4 discusses the word forms of color terms. Section 4.2.5 discusses variation in color labeling within language communities.

4.2.1 THE BERLIN AND KAY PARADIGM

PINK and PURPLE have only recently become independent language specific color concepts in many Western European languages – which is to say that the part of the perceptual color space that is now denoted by terms like English pink was previously considered part of broader color concepts (i.e. RED). Their late independence is predicted by the Berlin and Kay theory of universality in color linguistics (1969). The best-known part of the theory is its suggestion that languages have basic and non-basic color terms, and that there is a universal order according to which basic color terms are acquired. The theory is also credited with establishing (though not pioneering, see Lenneberg & Roberts, 1956) a standardized elicitation method that makes use of a large set of colored slips of paper. The theory has been amended several times (Berlin & Kay, 1969; Kay, 1975; Kay, Berlin, Maffi, & Merrifield, 1997; Kay, Berlin, Maffi, Merrifield, & Cook, 2009) as more data has been acquired by researchers, and its latest incarnation is based on extensive field linguistics studies.

Briefly, the theory posits that there is an important difference between two kinds of color terms. One kind are the basic color terms, which are frequently used by almost all speakers, can be used to describe anything, are morphologically simple, and are not hyponyms of another color term. The other kind are non-basic color terms, which are typically rarer, may be restricted in what they can describe (blonde can only be used for hair), may be morphologically complex (light red has a modifier; blue-green is a compound), and might be seen as hyponyms of other colors (scarlet is a kind of red). A language may have very many non-basic color terms (see Schirillo (2001), also discussed in section 4.2.4), but its set of basic color terms is small and predictable in two ways. First, the way that basic color terms partition the color space is not random, but often reveals the same kinds of partitions: there is often a reddish color, a greenish color, etc. Second, the order in which the perceptual color space gets partitioned and labeled is predictable – not universally, but in a majority of studied languages (Kay et al., 1997).

The suggested order is described in Figure 6. For languages with only two basic color terms, the entire perceptual color space is split into two parts: one word for all darker colors, one for all lighter colors (stage I). If a language has three basic color terms, the suggested order predicts that the warm reddish-yellowish colors will be partitioned off from the rest of the lighter colors: a separate basic term for the reddish-yellowish part of the perceptual color space is introduced (stage II). The next partitioning of the perceptual color space either divides the darker colors into two parts (separating out a black and a green-blue; stage IIIa) or partitions the reddish-yellowish colors into two parts (giving the language a concept for, and a label for, red and yellow; stage IIIb). Then, in stage IV, two achromatic colors (black and white) and three chromatic colors (red, yellow, green-blue) are treated and labeled as independent parts of the perceptual color space. Finally green-blue separates into green and blue (stage V). After stage V, more partitioning will take place, in no clear universal order, until there are words for grey, pink, purple, orange, etc.
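The stage sequence just described can be written down as a small lookup table, which makes the number of basic terms at each stage easy to inspect. The sketch below is only a restatement of the prose above: the category labels are informal glosses chosen for this illustration, and the names `STAGES` and `stages_with` are hypothetical.

```python
# Stage inventories as described in the text; the labels are informal glosses.
STAGES = {
    "I": ["dark", "light"],
    "II": ["dark", "light", "red-yellow"],
    "IIIa": ["black", "green-blue", "light", "red-yellow"],
    "IIIb": ["dark", "light", "red", "yellow"],
    "IV": ["black", "white", "red", "yellow", "green-blue"],
    "V": ["black", "white", "red", "yellow", "green", "blue"],
}


def stages_with(n_terms):
    """Return the stage names whose inventory has exactly n_terms categories."""
    return [name for name, terms in STAGES.items() if len(terms) == n_terms]


for name, terms in STAGES.items():
    print(f"Stage {name}: {len(terms)} terms -> {', '.join(terms)}")
```

Asking for four terms returns both IIIa and IIIb, which reflects the point made above that the order branches at this step rather than following a single path.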

While the exact boundaries of independently labeled areas in color space might differ between languages, their prototypical centers are more consistent, possibly reflecting the optimum partitioning for maximum perceptual difference (Regier, Kay, & Khetarpal, 2007).

Kay and Maffi (1999) explain that the cross-linguistic tendencies in color naming revealed by field studies can to a great extent be explained by what is known as the opponent process theory of color vision.10

The opponent theory states that humans with normal color vision perceive four primary chromatic colors (red, green, blue, yellow) and two achromatic colors (white, black). All other colors are different kinds of combinations of these primary colors. Kay and Maffi (1999) note that languages with small inventories of basic color terms only label composite combinations of two or more primary color categories: a common example is that many languages with few basic color terms have a grue term – they use the same color term to describe the nuances that English speakers would differentiate into blue and green.

10 The psychophysiological details of this theory are not relevant for the present study, but a brief account will be given here for the interested reader. Depending on the wavelength of light that hits the human eye, different cones (photoreceptors) will be stimulated to different degrees. Blue cones are stimulated by short wavelengths, green cones by medium wavelengths, red cones by long wavelengths. Color vision is made possible by comparing the proportion of nerve impulses that come from each cone type. Color information from the photoreceptors is handled in two channels: a blue-yellow channel and a green-red channel. The blue-yellow channel signals blue from the blue cones, and yellow from activation of both the green and red cones simultaneously. Using this information, this channel judges the input light to be either blue or yellow, depending on the proportion of the nerve impulses that come from the different cone types. The green-red channel judges light to be either red or green, depending on whether the input stimulates red or green cone types. Note that a color can be different shades of red, and another can be different shades of green, but humans do not recognize a reddish-greenish mix color or a yellowish-bluish color, in the way that we can recognize a reddish-bluish or a bluish-greenish color: the blue-yellow channel will determine whether the light is blue or yellow, and the red-green channel will determine whether the light is red or green. Every color that humans see is decided by the balance between the information of these two channels. The exact process of this in the cortex in the brain is unknown, as the final human sensation of color does not exactly match the output of the cones. A more detailed, non-technical introduction of the theory can be found in Wooten and Miller (1997).

Then these composites may be broken up into the primary color categories, and, finally, derived color categories may get individually labeled by the speaker community (Kay & Maffi, 1999). Examples of derived colors are gray (the intersection of white+black), pink (the intersection of white+red), brown (the intersection of black+red), purple (the intersection of red+blue), orange (the intersection of red+yellow), and turquoise (the intersection of blue+green).

The process is illustrated in Figure 6.
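The derived colors listed above can be sketched as a mapping from unordered pairs of primary categories to the derived category at their intersection. This is only a restatement of the examples given in the text; the names `DERIVED` and `derived_color` are hypothetical. Note that the opponent pairs red+green and blue+yellow have no entry, in line with the observation that humans do not recognize reddish-greenish or yellowish-bluish colors.

```python
# Derived categories named in the text, keyed by the pair of primaries they combine.
DERIVED = {
    frozenset({"white", "black"}): "gray",
    frozenset({"white", "red"}): "pink",
    frozenset({"black", "red"}): "brown",
    frozenset({"red", "blue"}): "purple",
    frozenset({"red", "yellow"}): "orange",
    frozenset({"blue", "green"}): "turquoise",
}


def derived_color(first, second):
    """Return the derived category for two primaries, or None if none is listed."""
    return DERIVED.get(frozenset({first, second}))


print(derived_color("blue", "red"))
print(derived_color("red", "green"))
```

Using `frozenset` keys makes the lookup independent of the order in which the two primaries are given.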

Figure 6. The historical evolution from few to many independently recognized color concepts.

Berlin and Kay suggest that one prominent reason for the universal temporal order of color term acquisition is technological advancement and cultural complexity: more technologically advanced and culturally complex language communities would have access to more color terms – although they acknowledge that it is very hard to assess cultural complexity and technological advancement (Berlin & Kay, 1969, p. 104).

The evolutionary steps proposed by Berlin and Kay are still discussed today. Amended versions of the developmental stages have been published – see in particular Kay & McDaniel (1978) and Kay et al. (1997). The revisions are heavily influenced by the massive data collection effort known as the World Color Survey (WCS), which surveyed color terminology in 110 unwritten languages.11 The first published results of the WCS were in Kay, Berlin, and Merrifield (1991). The exact nature of the proposed evolutionary ladder is not pertinent for the purposes of this study. Of interest is a far less controversial subclaim that some colors, in particular in the pink, purple, orange, gray, and brown areas of the color spectrum, tend to have salient independent color terms in fewer languages and can therefore be considered, cross-linguistically, to get partitioned off later than many other color areas.

11 The data can be found here: http://www1.icsi.berkeley.edu/wcs/

Over the years, several competing (or complementary) research paradigms have been active in the study of the cross-linguistic similarities and differences in color terminology. In this context, the Berlin and Kay paradigm in color linguistics is often identified as a quantitative, universalist school of thought. An alternative perspective is represented in so-called relativist or diversity-oriented studies. Briefly, it can be mentioned that Levinson (2000) has shown that not all stimulus colors have names in all languages, something that is acknowledged by Kay and Maffi (1999) with the addendum that most languages still do name all the stimuli. Diversity-oriented studies often highlight drawbacks of using the Munsell slips as stimulus materials, such as the risk of them being too artificial (Lucy, 1997), not separating glossy and matte surfaces (Uusküla & Eesalu, 2014), or not accounting for languages that do not separate color and material features in their language-specific concepts (Saunders, 1992). Instead, small-scale detailed studies in the field are advocated by this program (see e.g. Wierzbicka, 1990, 2005, 2008). A detailed examination of the differences between these two research perspectives is outside the scope of this text (but see Regier et al., 2007).12

The clear advantage of quantitative stimuli-driven studies is that they lead to the collection of large amounts of comparable data, albeit with the risk of missing important color terminology features of specific languages. The drawbacks of quantitative versus qualitative data are not unique to color studies, but recur in all of lexical typology (for a discussion, see e.g. Koptjevskaja-Tamm, 2008, 2016).

Color linguistics of the universalist persuasion is a research tradition that traditionally has focused heavily on the denotation of color terms and rarely addresses connotations in general, or the specialized meanings of color terms in particular fields of discourse – these research questions are better approached in small-scale diversity-oriented studies. This study employs elements from both the universalist and diversity-oriented perspectives.

12 The article has the aptly named section A brief history of the language and thought war, as fought on the battlefield of color.

4.2.2 NEW COLOR CONCEPTS APPEAR IN BORDER REGIONS

Berlin & Kay (1969, and subsequent research) are of the opinion that color lexicons are in a state of continued expansion, though the speed of the change can vary greatly.

The partitioning of the color space is not random, but follows a set pattern. It is possible for new parts of the perceptual color space to be given basic color terms of their own, by promoting non-basic color terms to basic color terms, by borrowing words, or by creating new ones (often through similes with colored objects).

One of the more persistent theories in the literature about the mechanics of new or changed partitions of the perceptual color space is that changes will be likely to occur in overlapping border regions between colors. This section will first explain the idea of border regions and then discuss some theories that, using different terminology, support this view in different ways.

To understand the idea of a border region, picture the perceptual color space as a matrix that can be placed on a cylinder (Figure 7). The horizontal dimension is hue (greenish to bluish to reddish to yellowish to greenish again) and the vertical dimension is lightness (darker colors at the bottom, lighter at the top). A color concept will cover a part of this perceptual color space. It will thus always have an upper border towards lighter colors, a lower border towards darker colors, and a left and right border towards other hues.

When two color concepts are neighbors in the perceptual color space, their border regions (peripheries) overlap – in the simplified illustration of Figure 7, they may, for example, be horizontal neighbors, meaning that they mainly differ in their hue, or vertical neighbors, in which case they mainly differ in their degree of lightness. These border regions are common places for new color concepts to arise. In the overlapping horizontal border region of, for example, a RED and BLUE color concept in a language, a new color concept – PURPLE – may start to be recognized by a speaker community. The light border region of a color may also split off (e.g. a partition forming in RED, splitting off the lighter part to form a new concept for PINK). The border regions will also be more likely to have low consensus in their naming by speakers – speakers disagree more about how a color stimulus belonging in a border region between established color concepts should be labeled than they do about the labeling of a color stimulus belonging to the more central parts of a color concept’s extension.
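The difference between horizontal and vertical neighbors on the cylinder can be made concrete with a minimal sketch. It assumes a simplified grid in which rows are lightness levels and columns are hues, with at least three hue columns; the function name `neighbors` and the grid sizes are hypothetical and are not taken from any particular stimulus set.

```python
def neighbors(row, col, n_rows, n_cols):
    """Return the cells adjacent to (row, col) on a hue-by-lightness cylinder.

    Rows are lightness levels (no wrap-around); columns are hues (wrap around).
    """
    cells = set()
    # Horizontal neighbors differ in hue; the modulo closes the hue circle.
    cells.add((row, (col - 1) % n_cols))
    cells.add((row, (col + 1) % n_cols))
    # Vertical neighbors differ in lightness; the top and bottom rows have only one.
    if row > 0:
        cells.add((row - 1, col))
    if row < n_rows - 1:
        cells.add((row + 1, col))
    return cells


# A cell in the first row and first hue column of a 3 x 4 grid.
print(sorted(neighbors(0, 0, n_rows=3, n_cols=4)))
```

The wrap-around in the hue dimension is what distinguishes the cylinder from a flat matrix: the last hue column borders the first, whereas the lightest and darkest rows have no neighbor beyond them.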

Figure 7. A simplified visualization of the perceptual color space, as a matrix (top) and cylinder (bottom).

Sun (1983) notes the importance of this border region partitioning. He statistically plots the centers of the primary colors in the perceptual color space. The positioning of these centers is taken from Heider (1972b), who based them on experiments with 30 monolingual and bilingual speakers of English. Sun investigates which regions appear at the mid-points between these primary color centers – and these mid-points turn out to be close to the derived colors discussed above (midway between red and blue in the perceptual color space, you find purple). This means that there is a biological and physiological basis for this kind of linguistic color change: the most likely perceptual place for a new color to emerge is in the border regions between colors. Zollinger (1984, p. 408) agrees with Sun and points out that the growing cross-linguistic frequency of terms like English turquoise, which might be becoming a basic color term in several European languages, is therefore not surprising: the area in perceptual color space that this term denotes is located between the centers for green and blue. Once other mid-points like red-white (pink), red-blue (purple), and red-yellow (orange) are named, this blue-green mid-point has the most “space” for a new color term. Zollinger also notes that although this is a likely place for a new color, the question of when a potential new color concept is realized in a language is complex and governed by cultural, technological, social, and psychological factors.

Archibald (1989, p. 52) reports on an experiment with English speakers who had to group color terms into categories. He uses the results from this experiment to present a model that, among other things, addresses how new color concepts emerge. He finds that a basic color can split in the lightness dimension, so that a dark color (in his example, RED) can find its lighter part split off (creating an independent color concept in the PINK region), or a light basic color (in his example, YELLOW) can find its darker part split off (creating an independent color concept, BROWN in his example).

Lindsey and Brown (2014) find that the border regions between basic color terms in the perceptual color space are labeled with low-frequency non-basic terms by English speakers (like aqua or jade), while there is more consensus, and more basic terms, used to label the central parts of basic term extensions in the color space (where simple terms like green or blue were used). In their study, 51 English speakers performed two color naming tasks: in both they were shown 330 colored slips of paper (from the Munsell Book of Color) and asked to name them. In the first task, the naming was unconstrained: they could give any answer provided it was a monolexemic color term that the speaker often used. In the second task, the naming was constrained: they had to choose between the 11 English color terms black, white, red, yellow, green, blue, brown, orange, pink, purple, and gray. As a result, Lindsey and Brown found that some parts of the color space had high consensus labeling in the constrained naming task by all speakers – these regions were the extensions of the basic color terms. There were also stimuli for which the consensus was lower – some participants would label a stimulus blue, others would label the same stimulus green, for example. This allowed Lindsey and Brown to map out the central parts (high consensus) and border regions (low consensus, several different terms used) of the partitioning of the English color space. They then compared this partitioning with the results of the unconstrained naming task and found that the central parts of the color concepts were still labeled with a high group consensus with simple basic terms, but that there was low consensus on how to name the border regions. Speakers were likely to use non-basic, rare terms for the border regions, like ocean or aquamarine. Similar results were found by Alvarado and Jameson (2002), who looked at how speakers of Vietnamese and English partition and label the perceptual color space.
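The distinction between central parts and border regions can be made concrete with a small sketch. The Python below is only an illustration of the general idea, not Lindsey and Brown's actual procedure: the chip names, the responses, and the 0.8 consensus cut-off are all invented for the example.

```python
from collections import Counter


def naming_profile(responses):
    """Summarize the terms that a group of speakers gave for one stimulus."""
    counts = Counter(responses)
    modal_term, modal_count = counts.most_common(1)[0]
    return {
        "modal_term": modal_term,
        # Share of speakers who used the most common term.
        "consensus": modal_count / len(responses),
        # How many different terms were used for this stimulus.
        "distinct_terms": len(counts),
    }


def region(profile, threshold=0.8):
    """Label a stimulus as central or border; the threshold is hypothetical."""
    return "central" if profile["consensus"] >= threshold else "border"


# Invented unconstrained-naming responses from ten speakers per chip.
unconstrained = {
    "chip_x": ["green"] * 9 + ["lime"],
    "chip_y": ["blue"] * 4 + ["aqua"] * 3 + ["turquoise"] * 2 + ["green"],
}

profiles = {chip: naming_profile(r) for chip, r in unconstrained.items()}
for chip, profile in profiles.items():
    print(chip, profile, region(profile))
```

In this toy data, the second chip combines low consensus with several rare terms, which is the pattern the study reports for border regions.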

4.2.3 COLOR CONCEPTS AND PERSPECTIVE SHIFTS

Given that new color concepts can appear between established color concepts in a language, and given that new color concepts can also appear in the lighter or darker part of a color concept, eventually splitting into two parts, the question becomes how, in detail, such changes occur, and why.

MacLaury (1991, p. 35 & 42ff) investigates how speakers of the Mexican languages Tenejapa Tzeltal and Navenchauc Tzotzil (both Mayan) partition and label the perceptual color space. He proposes that semantic change for color concepts is due to a change in perspective among speakers of a community. In his opinion the variation and change in color terminology can be explained by a dynamic cognitive model based on making a distinction between cognition and perception. His theory of color terminology change (“Vantage Theory”) rests on assumptions of both perception and cognition. Perception has to do with the fact that, physiologically, most people perceive the same colors, and that these colors are perceived to be more or less similar to other colors. Blue and green are closer perceptual neighbors than blue and yellow. In cognition, each speaker can attend simultaneously both to similarity and to distinctiveness between neighbors, and can shift their focus back and forth. Humans can, for example, recognize the gradual difference between green and blue, and can either categorize a green hue and a blue hue as similar or different depending on their perspective. Perceptually we all share the same view of color, but cognitively we can choose which perspective to assume.

Another example is that if two people look at a field of tulips, one of them, focusing on similarity, might say that all the tulips are red, but another person, focusing on distinctiveness, might protest and say that there are both pink and red tulips. Both might be ready to acknowledge that the other is also right, but they have their own preferred perspective: one focusing on similarity, one on distinctiveness.

MacLaury (1991, p. 44) suggests that as a community of speakers encounters more novelty, a focus on distinctiveness becomes natural. He writes, “as individuals shift the strength of cognitive attendances from similarity to distinctiveness, the basic level of color categorization moves toward greater differentiation and specificity” (MacLaury, 1991, p. 55). This means that when enough individuals in a speaker community have shifted their perspective from similarity to distinctiveness, the language as a whole can be said to have undergone a change.

MacLaury suggests a general cognitive model for how change proceeds in the color domain. He also proposes two more detailed cases for how color concept change can happen. In the first case (MacLaury, 1997, Chapter 5), two terms, T1 and T2, are first absolute synonyms (at least in their denotative meaning). They then continue to be synonyms, but with different prototypical centers (both terms can be used for the same part of the perceptual color space, but the best example of T1 is quite different from the best example of T2). In the next step, T2 might be seen as referring to only a part of T1 (a relationship of inclusion), and finally T2 and T1 might be seen as referring to two different colors. This is illustrated in Figure 8.

Figure 8. Four steps in the creation of a new independent color concept. After MacLaury (1997).

MacLaury (1991, p. 57) also suggests another case, where a smaller secondary term gains salience at the expense of an older, basic color term, leading to a relation of complementation between the old and new term: they divide the color space between them. This second case might also be seen as only a fragment of the first case (comprising only steps 3 and 4 in Figure 8), and it crucially does not require the terms to first be synonyms and then become co-extensive (i.e. have different focal points but cover the same area, as in steps 1 and 2 in Figure 8).
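The four relations described above can be sketched as set comparisons. This is a deliberately crude illustration and not part of MacLaury's model: it assumes that each term's extension is a crisp set of stimuli and that each term has a single best example, whereas real naming data is gradient. The function name and the chip identifiers are hypothetical.

```python
def relation(ext1, focus1, ext2, focus2):
    """Classify the relation between two color terms, T1 and T2.

    ext1, ext2: sets of stimulus ids that each term is applied to.
    focus1, focus2: the best-example stimulus of each term.
    """
    ext1, ext2 = set(ext1), set(ext2)
    if ext1 == ext2:
        # Same area: synonyms if the centers coincide, otherwise co-extensive.
        return "synonymy" if focus1 == focus2 else "coextension"
    if ext1 < ext2 or ext2 < ext1:
        # One term covers only a part of the other term's area.
        return "inclusion"
    if ext1.isdisjoint(ext2):
        # The terms divide the area between them.
        return "complementation"
    return "partial overlap"


# Invented stimulus ids, walking through the four steps in order.
step1 = relation({"a", "b", "c"}, "b", {"a", "b", "c"}, "b")
step2 = relation({"a", "b", "c"}, "b", {"a", "b", "c"}, "a")
step3 = relation({"a", "b", "c"}, "b", {"a"}, "a")
step4 = relation({"b", "c"}, "b", {"a"}, "a")
print(step1, step2, step3, step4)
```

Under these assumptions, the second case corresponds to a history that begins at the inclusion step, without the earlier synonymy and coextension steps.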

4.2.4 LABELS FOR COLOR CONCEPTS GRADUALLY BECOME SIMPLER

Schirillo (2001, p. 185) notes in an overview article on color linguistics that speakers of languages belonging to the early stages of Berlin and Kay’s developmental color progression, and therefore having very few basic color terms, still have a plethora of rare, non-basic color terms that originated in colored objects. Schirillo theorizes that moving to a later color evolution stage might entail a transition from the use of mainly contextualized names to the use of more abstract names.

Rakhilina and Paramei (2011, pp. 129–130) study the historical progression of new color terms in Russian corpora and note that these terms progress from context-dependent similes that can only be used to describe a very limited set of semantic domains, to free adjectives with broader applicability. They theorize that when a new color term that complements an older existing color term does emerge, the new term is at first a simile, following the pattern “color of X” (mašina cveta baklažana ‘a car the color of an aubergine’), and then it can develop an adjectival form (černil’nyj ‘ink-y’). Such terms may also appear in color compound terms, like “X-blue” (černil’no-sinij ‘ink-colored blue’). Eventually the emerging term can be used with only the color sense, without any connotations connected to the original object (echoing Schirillo, 2001).

Rakhilina and Paramei’s findings can be used to get a usage-based, semasiological perspective on MacLaury’s second case of color concept emergence (i.e. steps 3 and 4 in Figure 8).

Rakhilina and Paramei (2011, pp. 129–130) also stress that new terms typically enter a language in a very restricted semantic domain, in recurring collocations with a restricted set of referents that are artifacts (manmade objects). As the term becomes more and more salient, it starts being used also to refer to natural objects. The new term may then supplant older ones denoting the same or similar parts of the perceptual color space, which leads to the older terms becoming more and more constrained in the set of referents they can co-occur with in collocations. An example of this gradual development is discussed in Rakhilina (2007, pp. 5–6). The author brings up two Russian terms used for the brown part of the perceptual color space: koričnevyj and buryj. The latter is the older term. Rakhilina shows that the two terms have different taxonomic constraints, in that koričnevyj is used primarily with artifacts, while the much rarer buryj is used with natural objects (and in conventional constructions) and very rarely with manmade objects. Koričnevyj, although very frequent, still has collocational constraints. Modified versions of koričnevyj are also frequent (like tëmno-koričnevyj ‘deep brown’), while modified versions of buryj are very rare (Rakhilina, 2007, p. 7). Rakhilina and Paramei suggest, presumably as a sub-criterion to the Berlin and Kay basic color term criterion of general applicability, that a feature distinguishing basic color terms is that they can combine with lexemes denoting both artifacts and natural objects (2011, pp. 129–130).

4.2.5 INTRA-LANGUAGE VARIATION IN COLOR LABELING

So far, we have talked about languages “being” at a certain stage in the color terminology expansion process and “having” a certain number of color terms. This is a simplification: the color terminology of a language community can be in synchronic flux. Different speakers can use different numbers of color terms. Kay (1975, p. 263ff) notes that “in a language in which the color term system is undergoing change there will be inter-speaker variation.” He concludes that the assumption of a sequence of color stages in the Berlin and Kay schema should not be interpreted as absolute for entire speaker communities: individual speakers can very well be at a different stage than the general consensus of the speaker group – but such speakers will always be spread out over adjacent stages in the sequence. As an example, in contemporary English there might be people (say interior decorators or fashion designers) who regularly use turquoise as a basic color term: in their professional group, no one would ever say that turquoise is a kind of blue-green; instead it is recognized as its own color, clearly separate from other colors. At the same time, there might be other parts of the English-speaking community for whom turquoise is a low frequency, rather non-salient term that is a hyponym of green, or blue, or both. Basic color terms do not spring from nothing – before becoming basic they are typically present in languages as secondary, lesser-known color terms. Kay’s evidence for these conclusions comes from consideration of field work data from several sources, primarily Heider (1972b, 1972a) on Dugum Dani; Berlin and Berlin (1975) on Aguaruna; Hage and Hawkes (1975) on Binumarien; and Dougherty (1977) on West Futuna.

Heider (1972b, 1972a) reports on color naming experiments with Dugum Dani (a non-Austronesian language of Highland New Guinea) and finds that there are only two terms used by all speakers: mili 'black' and mola 'light'. This would classify Dugum Dani as a language with only two basic color terms, i.e. a stage I language. But half of the speakers had a term for red, 45% a term for yellow, and 28% also a term for blue. Likewise, Berlin and Berlin (1975) report that their work with speakers of Aguaruna revealed that although the majority of the interviewed speakers, and therefore the language, can be classified as stage III, 10% of the speakers were at stage IV, and a third of the speakers at stage V. Kay (1975, p. 263ff) also cites Hage and Hawkes (1975), who worked with Binumarien (a non-Austronesian language of Highland New Guinea) and found that while most speakers were at stage IIIb, a quarter were at stage IV, and a single young speaker was at stage V. Dougherty (1977) has data from speakers of West Futuna (a Polynesian language spoken in Vanuatu) and also finds that while the majority of speakers are at stage IIIa, about half are at stage IV or V.

Kay suggests that if intra-community differences are found in color terminology, this may be correlated with various social factors, but there should be a “pervasive correlation with age” (Kay 1975, pp. 263–264). Kay reexamines color naming data from Aguaruna, Futunese, and Binumarien with regard to the ages of the speakers and finds ample (and statistically significant) evidence that younger speakers were at later stages in the Berlin and Kay evolutionary sequence of color terminology than older speakers.

MacLaury’s research into Tenejapa Tzeltal and Navenchauc Tzotzil (mentioned earlier) also investigated variation within the communities. He shows that even people who interact daily might be at different color terminology stages: “some individuals attend more strongly to similarity and others attend more strongly to distinctiveness. The latter group of individuals will name more color categories[…]” (MacLaury, 1991, p. 35).

This variation in color terminology is not surprising: the fact that synchronic variation can be seen as another side of diachronic change has been observed in many parts of language: change is gradual and is preceded by a state of variation and competing terms (Labov, 1972; Traugott & Dasher, 2002; Weinreich, Labov, & Herzog, 1968, p. 188).

Having concluded that individuals speaking the same language might display different color vocabularies, we must also consider whether a single individual’s color vocabulary might change in her lifetime. Davies and colleagues have done extensive color elicitation work with Setswana speakers in Botswana (1994), Damara speakers in Namibia (1997), Russian speakers (1998), and Tsakhur speakers in Daghestan and Azerbaijan (1999). According to their research, color term vocabulary seems to have stabilized at different ages for different language communities, and it is possible in some communities that it could be undergoing expansion even in the language of teenagers.

Lexical changes in color terms might also be expected in older subjects, where color vision might start to change for physiological reasons, like the onset of diabetes or vision problems that increase with age. As humans grow older, the ocular system slowly changes. Wuerger (2013) argues that there is a compensatory mechanism in the mind that keeps color appearance largely intact even as humans age; yet from 50 years onward, the ability to discriminate between small color differences declines with increasing age. Paramei and Oakley (2014) contrast their own findings with older insights in the literature and conclude that from 60 years onward there is an increase in the likelihood of degeneration in color vision.

At the same time, speakers over 50 make up an important part of the speaking community, and their slight changes in color vision might affect the way that they perceive, and therefore the way that they talk about, their surroundings. This could be a factor in lexical change processes in the domain of color.

Very little research has been published that focuses on differences in color labeling and categorization in adults of different generations, but Desgrippes (2011) studies 26 French speakers aged between 11 and 90. She finds that speakers aged over 45 and speakers under 45 differ in which areas of color space they can label with the French color term orange. When the older speakers were asked to pick a single, most prototypical, best example of orange, the color stimulus they chose was perceptually closer to the orange area of the younger group than to their own orange area. In this way, Desgrippes writes, the older group resembles bilinguals, who, when learning the color vocabulary of a second language, may experience shifts in the color labeling of their first language (for more on second language learners and color, see Caskey-Sirmons & Hickerson, 1977).

4.2.6 INTERMITTENT SUMMARY

To sum up, there are several candidates for domain-specific generalizations that can be taken, or inferred, from the published literature on when, what, and how lexical change might occur in the color domain.

A composite color may split up into its primary colors (Kay & Maffi, 1999), and new derived color categories can appear between primary colors (Kay & Maffi, 1999; Sun, 1983; Zollinger, 1984) or at the lighter and darker part of a domain (splitting it off) (Archibald, 1989). In other words: change will often occur in disputed border regions. It can be inferred that change in one part of this semantic domain (one part of color space) will affect the categorization and labeling of other nearby parts.

When a change happens, MacLaury (1997) suggests that it is a gradual process, with different individuals in a community at different stages, depending on how focused on similarity or difference they are. From this comes the question of how aware speakers are of the variation within their speech community, and how aware they are of the process of change.

MacLaury (1997) has further posited two alternatives for the birth of a new color concept. It may begin with two synonymous color terms referring to the same color concept. The two terms may then become co-extended: they cover the same area of color space, but have different centers. Then one color term’s denotation is included in the denotation of the other, and finally the two terms come to divide the area between them, which MacLaury defines as complementation (see Figure 8). A second alternative case is where one term is a hyponym of another, and thus denotes a small part of a region in the perceptual color space that is also covered by another, larger concept. This smaller hyponym can then split off to form its own independent concept. There is no previous research on a corresponding process for the disappearance of a color term.

From Schirillo (2001) and Rakhilina and Paramei (2011), it can be inferred that a new color term is limited in its application – it is used at first for only a few semantic domains. From Lindsey and Brown (2014) and Alvarado and Jameson (2002), we can arrive at the hypothesis that rare, complex, and modified color terms are more likely to occur in parts of the color space where speakers have low consensus – prime areas for the eventual development of new, high-consensus color concepts, at which point a (new) simple, high-frequency color term would replace the variety of more complex color terms.

These hypotheses mesh with some of the more general predictions from the previous chapter: frequency insulates items from replacement, and a wealth of synonyms for a concept may be connected to its eventual replacement.

As for where and when change occurs, the Berlin and Kay paradigm predicts that change will not be random, but follow a particular and universal sequence. All the Germanic languages that are investigated in the following chapters are at the last stage of this sequence, which means that there are independent color categories for the derived colors, and that these derived colors are relatively recent. What little research exists on age differences in color naming suggests that, for color areas where change is likely, the younger generation is more likely to have more basic color terms than the older, as they will divide up the perceptual color space into more color categories.

4.3 THE EOSS EXPERIMENT PROTOCOL

This section will discuss some common methodology used in the next two chapters. The elicitation experiment data that is used in chapter 5 (and part of the data that is used in chapter 6) was obtained in the Evolution of Semantic Systems (EoSS) project. The project investigated how meanings vary over space and change over time.13 As part of that, the project developed experiment protocols to standardize testing procedures. I used the same protocols to gather additional data for chapter 6. This section will explain the EoSS protocol.

In the color elicitation sessions, the EoSS protocol calls for the use of a standardized visual toolkit with colored cardboard chips. The chips are displayed on a neutral gray 14 background, under natural daylight, supplemented, when necessary, by a light bulb with a minimum color temperature of 5000 K (this produces light comparable to daylight).

Of the 84 chips, 4 were achromatic (i.e. gray scale), and the remaining 80 varied in hue, lightness, and saturation: there were 20 equally spaced hues at 4 degrees of lightness. All chips were identified using the Munsell color chart. Saturation varied, but colors were generally at the maximal possible saturation for that point in the color space. The color set was developed by Majid and Levinson (2007) and Majid (2008), and a conversion table between EoSS grid coordinates, Munsell codes, and Hex values is presented in Appendix D.
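The composition of the stimulus set can be checked with a short sketch. The Python below is only an illustration: the Chip class, the integer coding of hue and lightness, and the assumption that the four achromatic chips occupy one lightness level each are mine, and none of it reproduces the actual Munsell codes listed in Appendix D.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Chip:
    """One stimulus; this coding scheme is hypothetical."""
    hue: Optional[int]  # 1-20 for chromatic chips, None for achromatic chips
    lightness: int      # 1-4; the direction of the numbering is arbitrary


def build_chip_set():
    """Build 20 hues at 4 lightness levels plus 4 achromatic chips."""
    chromatic = [Chip(hue, level) for level in range(1, 5) for hue in range(1, 21)]
    achromatic = [Chip(None, level) for level in range(1, 5)]
    return chromatic + achromatic


chips = build_chip_set()
print(len(chips))  # 20 * 4 + 4 chips in total
```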

The experiments took place in available rooms at universities, which naturally might have caused some complications. Lighting conditions cannot be verified to have been exactly the same. Color priming, wherein subjects react differently to colors depending on which other colors they have seen before, is also always an issue outside a completely controlled lab. The total control of the testing environment that is necessary for no color to be present around the speaker is impractical, and also creates a very artificial and unnatural setting.

The elicitation sessions were audio recorded, and the sessions were transcribed in full.

13 While the current study makes use only of the color data, the project also tested three other categories, namely containers, body parts, and spatial relations (Majid, Jordan, & Dunn, 2011, p. 6). Please see http://www.mpi.nl/departments/other-research/research-consortia/eoss for more information.

14 50% gray; in RGB values: R128, G128, B128.

The color elicitation tasks had three main components: a free listing task; a best example task (referred to as a focal color task in the EoSS manual); and a color naming task.

The free listing task was a preliminary test performed on five native speakers per language, who were asked to simply name as many color terms as possible. The most commonly occurring color terms from this test were then used as input in the main testing session, which was carried out a few months later.

In these main testing sessions, each language was represented by 20-25 native speakers. Striving for roughly comparable groups across languages, participants were primarily recruited from undergraduate classes at universities.

The main testing sessions consisted of a best example task, a color naming task, and a color blindness test (Waggoner, 2002).

In the best example task, participants were shown the 84 color chips, sorted in a rainbow matrix of hue and lightness. They were asked to point to the color chip that was the best possible example for a given color term. The list of color terms was derived from the most common responses in the free listing task.

The following chapters will mostly focus on the data from the color naming task, performed on the same occasion as the best example task detailed above, which involved showing the speakers colored chips, one by one, and asking for a color term. Participants were given the following instructions (in translation): “In this task, I will show you some colors. I will show them to you one at a time, and I would like you to tell me what color it is. Just tell me the first color that comes to your mind. You can use the same name more than once as we go through the colors. Do not give long descriptions” (Majid, Jordan, & Dunn, 2011, p. 27).

For each chip shown in the color naming task, one or more main responses were extracted from the full response of each speaker. The main response is the overall color concept (or concepts) referred to in the full response. For example, the full response ah, I know what that color is, it's like a light purple, from the English language data, leads to purple being recorded as the main response. If more than one response was given, as in the full response it’s blue-green, both blue and green were noted as main responses. The full responses light green and murky green would both result in green as the main response. The simplification of full responses into main responses makes it easier to use statistical testing on the data.
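The reduction from full responses to main responses can be approximated with a simple pattern match. This is a minimal sketch and not the procedure used in the study, which is not described as automated: the term list (here the eleven English basic color terms) and the function name are assumptions made for illustration.

```python
import re

# Illustrative term list; a real coding scheme would need a language-specific
# list that also covers non-basic terms.
TERMS = ["black", "white", "red", "yellow", "green", "blue",
         "brown", "orange", "pink", "purple", "gray"]
_PATTERN = re.compile(r"\b(" + "|".join(TERMS) + r")\b", re.IGNORECASE)


def main_responses(full_response):
    """Return the color terms mentioned in a full response, in order of mention."""
    found = []
    for match in _PATTERN.finditer(full_response):
        term = match.group(1).lower()
        if term not in found:  # record each term only once
            found.append(term)
    return found


print(main_responses("ah, I know what that color is, it's like a light purple"))
print(main_responses("it's blue-green"))
print(main_responses("murky green"))
```

Modifiers such as light or murky are dropped because they are not in the term list. A response that contains no listed term yields an empty list and would need manual coding.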

For each language, the most common main response per chip was noted as the modal response for that chip: as an example, for stimulus C1, the most common Swedish main response was röd, and that is therefore the modal response for C1.
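A modal response of this kind can be computed by pooling the main responses of all speakers for a chip. The sketch below is illustrative only: the responses are invented, and returning every tied term is my own choice, since the text does not say how ties were handled.

```python
from collections import Counter


def modal_response(responses_per_speaker):
    """Return the most common main response(s) for one chip.

    responses_per_speaker: one list of main responses per speaker.
    Returns a sorted list so that ties are visible rather than hidden.
    """
    counts = Counter(term for responses in responses_per_speaker for term in responses)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(term for term, n in counts.items() if n == top)


# Invented Swedish main responses for a single chip.
chip_responses = [["röd"], ["röd"], ["röd", "orange"], ["rosa"]]
print(modal_response(chip_responses))
```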

Throughout this text, the most typical way of presenting responses in the color space will be a color matrix, or grid, showing all 84 colors used in the experiment. The coordinates of the matrices (4 rows labeled A-D, 20 chromatic columns labeled 1-20, and 1 achromatic column labeled 0) correspond to those used in other published material based on the EoSS elicitation kit, to ensure compatibility. Each row is a lightness level: the topmost row is the lightest, the bottom row the darkest. See examples in Appendix B: Naming task results.

5 MESO-PERSPECTIVE: CROSS-LINGUISTIC LEXICAL CHANGE IN PINK AND PURPLE 15

This chapter will seek generalizations on lexical replacement at a lower level of temporal magnification (a few centuries) than chapter 3 (a few thousand years), in a few related languages, for a particular semantic domain (color). Comparing the lexicalization strategies of related languages for well-defined concepts (areas in color space) allows us to infer domain-specific generalizations concerning lexical replacement. An advantage of comparing genetically close languages, compared to languages from diverse families, is that differences and similarities can be assessed in light of shared cultural and historical circumstances (see Majid, Jordan, & Dunn, 2015).

By choosing the color domain, it is possible to take advantage of the existing research in lexical typology, and to evaluate existing theories about lexical change and replacement in the literature, such as the progression scenarios suggested by MacLaury (1997), discussed in section 4.2.3, and the suggestions that new color terms will first denote the border region of an existing term (Archibald, 1989; Sun, 1983), discussed in 4.2.2.

15 This chapter contains material that has been previously published in Vejdemo et al. (2015). The present chapter differs from the paper not only in that it has been heavily restructured and rewritten, but also because it has been enlarged by an analysis of words for the purple region of color space in the languages identified by the study (in particular sections 5.7, 5.8, 5.9). The lexical categorization of the pink and purple region can therefore be contrasted and compared (section 5.9). Vejdemo et al. (2015) was a product of collaboration within the EoSS project (MPI Nijmegen). All authors contributed initial analyses of the pink area and the etymology of pink terms from their own language: Susanne Vejdemo (Swedish), Carsten Levisen (Danish), Cornelia van Scherpenberg (German), Thorhalla Gudmundsdóttir Beck (Icelandic), Åshild Naess (Norwegian), Martina Zimmerman (Swiss German), Linnaea Stockall (English), and Matthew Whelpton (Icelandic). The bulk of the cross-linguistic comparison analysis was done by this author and Carsten Levisen, during long and fruitful discussions. Dr. Levisen was particularly involved with:
* The “pink” and “cerise” cognate sets (4.4.1 and 4.4.2 in original paper; here parts of that work are included in sections 5.4, 5.5 and 5.6.)
* The section on minority responses (4.6 in original paper; part of the information included here in section 5.8)
* Focusing on an interdisciplinary approach in which the pink data can and should be analyzed not only from a universalist approach, but also from a diversity-oriented color studies approach.

In chapter 3, the simplifying assumption was made that the comparative concepts (e.g. RED, STONE) were unchanging. How often a comparative concept had undergone lexical replacement across languages was inferred from the number of cognate classes: if two languages used the same cognate class for the same comparative concept in the Swadesh list, they were assumed to denote the same semantic content. In this chapter, there is no a priori assumption that words in different language varieties represent the same (denotation of a) concept. Instead, experimental data is used to establish the denotative similarities and differences between the language-specific color concepts, and between the color terms in the different languages. The place of this study within the thesis is illustrated in Table 9.

Table 9. The place of the present study within the thesis.

| | Chapter 3 | Chapter 5 | Chapter 6 |
|---|---|---|---|
| Time scale & scope | Macro: several millennia, 87 language varieties | Meso: several centuries, seven Germanic languages | Micro: two generations, one language |
| Method & material | A statistical model tests domain-independent hypotheses about lexical replacement (based on a database of cognate class judgments of a Swadesh list) | Comparison of variation, using elicitation experiment results, supplemented with dictionary data | Comparison of variation and change, using elicitation experiment results, supplemented with interviews, dictionaries, floras, corpora |
| Domain | Core vocabulary | Color, with focus on pink and purple | Color, with focus on pink and purple |

Whatever stage of color evolution a particular language is at, the most recently labeled color concepts, and their perceptual neighbors in color space, are prime subjects for lexical replacement studies. Many Western European languages like English or Swedish are currently at a stage where speakers label most of the likely derived color categories.

Contemporary Swedish can serve as an illustration for the situation in many Northwest Germanic languages: Swedish has no composite color terms, but at least these six salient primary basic color terms (vit ‘white’, svart ‘black’, röd ‘red’, gul ‘yellow’, grön ‘green’, blå ‘blue’) and five derived basic color terms (grå ‘gray’, brun ‘brown’, rosa ‘pink’, lila ‘purple’, orange ‘orange’). There is also some meta-linguistic awareness among speakers for changes in the last few generations regarding two recent, derived color concepts: PINK and PURPLE. For PINK there is an older term skär in addition to the most frequent term rosa – and a recent term, cerise, that is gaining ground. For PURPLE, there are two older terms, violett and gredelin, that are still used by some speakers, in addition to the most common term lila.

This chapter will investigate lexical replacement, and other kinds of connected lexical change, for PINK and PURPLE in seven Germanic languages: English, German, Bernese, Danish, Swedish, Norwegian, and Icelandic.

These languages make use of a restricted set of cognate classes as modern-day primary labels for these parts of the color space. The situation is rather straightforward for PURPLE (the cognate classes used are mainly “lila” and “violett”), but far more complex for PINK (where we find the cognate classes “pink”, “rosa”, “bleikur”, “rød”, “cerise”).

Not all languages use the same cognate classes, and languages may not divide the perceptual color space in the same way. The relationship between the perceptual color space partitioning and color labeling is complex, and explanations for the various strategies employed will be sought in existing theories of color linguistics, in language contact, and in sociocultural information on colors in Western Europe in the last few centuries.

5.1 LANGUAGES AND SPEAKERS

This section will give some basic information on the participants, and a short overview of the relationships between the seven languages, and the typical uni- or multi-directional influences they have had on each other.

The seven languages belong to two Germanic subgroups. Swedish, Danish, Norwegian, and Icelandic are in the North Germanic subgroup. German and the closely related Bernese, on the one hand, and English, on the other, are in different branches of the West Germanic subgroup.

Swedish, Danish, and Norwegian are closely related (forming a continuum) in the North Germanic group, and the language communities share historical and cultural contacts. The languages have a complex history of contact. Briefly, it can be said that Swedish, Danish, and Norwegian vocabularies were most heavily influenced by Low German during the 1200s-1500s, by High German during the 1500s-1930s, and by French during the 1700s-1930s. From the 1930s onward, English has been the primary outside influence on the languages (Haugen, 1987, p. 176).

Icelandic is also part of the North Germanic group, but has been less impacted by language contact than the other languages studied here. There has been a concentrated effort through the centuries by adherents of language purism to purge Icelandic of loanwords. Even so, the language has lexical influence from Low German, High German, Danish, Romance, and lately English (Tarsi, 2014).

German and the particular Swiss German language variety spoken in Bern, here called Bernese, are also very close in the West Germanic family. The major outside influence on the Germanic language varieties was from Latin: the Roman Empire and the spread of Christianity led Latin to have an unbroken influence on Germanic varieties up until the 18th century, clearly evident in word borrowing (Chambers & Wilkie, 1970, p. 72). Other important sources of borrowing were French (from the 12th century onwards) and English. English loanwords start to appear in the middle of the 18th century, and then form a strong and continuous presence (Chambers & Wilkie, 1970, pp. 74, 79).

In this study, the seven languages are represented by a total of 146 speakers, who are typically in their twenties. Information on gender distribution and median age for the participants is summarized in Table 10.

Table 10. Median age, number of speakers, and gender for EoSS participants.

| Language | Median age | Speakers | Males |
|---|---|---|---|
| English | 21 | 21 | 12 |
| German | 21 | 20 | 10 |
| Icelandic | 25 | 25 | 15 |
| Danish | 26.5 | 20 | 9 |
| Swedish | 24 | 20 | 10 |
| Bernese | 24 | 20 | 10 |
| Norwegian | 28 | 20 | 10 |

Some speakers had more than one error in their color blindness test results, which led their responses to be examined more closely for deviant patterns. No such patterns were found in a manual check of the data. In addition, multidimensional scaling was used to detect outlier participants with deviant response patterns in the groups, and the participants flagged by the color blindness test did not stand out as outliers. Details on this can be found in appendix E.


5.2 GENERAL RESULTS

The languages differed in the number of chromatic (colorful) modal color terms (i.e. color terms that were the majority answers for at least one stimulus). The Icelandic speakers provided 9 modal color terms in the EoSS experiment. Swedish, Danish, Norwegian, and English speakers used 10 modal terms, and the German and Bernese Swiss German speakers gave 11 terms. The modal color terms are displayed in Table 11.
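The definition of a modal term given above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it reads "majority answer" as the most frequent answer for a stimulus, it treats tied terms as all being modal, and the response data and the function name `modal_terms` are invented for the example rather than taken from the EoSS material.

```python
from collections import Counter


def modal_terms(responses):
    """Return the terms that are the most frequent answer for at least one stimulus.

    responses maps a stimulus id to the list of terms given by the speakers.
    """
    modal = set()
    for terms in responses.values():
        if not terms:
            continue
        counts = Counter(terms)
        top = max(counts.values())
        # Assumption: if several terms tie for the top count, all of them are modal.
        modal.update(term for term, n in counts.items() if n == top)
    return modal


# Invented toy responses (three speakers, three stimuli), not thesis data.
toy_responses = {
    "B19": ["rosa", "rosa", "cerise"],
    "C20": ["cerise", "rosa", "rosa"],
    "D1": ["röd", "röd", "röd"],
}

print(sorted(modal_terms(toy_responses)))
```

In the toy data, cerise is used for two stimuli but is never the most frequent answer for any of them, so it is frequent without being modal.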

Table 11 shows that the primary basic color terms share the same cognate classes for all the languages: for example, RED is expressed by red in English, rot in Bernese, rauður in Icelandic, röd in Swedish, and rød in both Danish and Norwegian.

The rest of the modal terms belong to cognate sets that are not shared between all languages. Some concern a cognate set that forms a simile from skin+color (German hautfarben, Icelandic húðlitaður, Swedish hudfärgad, Danish hudfarvet, and Norwegian hudfarge; see Zimmermann, Levisen, Guðmundsdóttir Beck, & van Scherpenberg, 2015). The rest of the cognate sets also denote derived colors – of these, the “orange” cognate set is the one that is present in the largest subset of languages, missing only in Icelandic (though the most appropriate Icelandic translation, appelsínugulur, comes from a simile referring to the same fruit). The “turquoise” cognate set acts as a modal term in all languages, except English and Icelandic (though it exists as a non-modal color term in these languages as well).

All the rest of the modal terms concern the pink and purple part of the color space. The pink cognate sets are “rosa”, “pink”, and “bleikur”, and the purple cognate sets are “purple”, “lila”, and “violett” (notice that Icelandic fjólublár derives from this last cognate set plus blár ‘blue’) – they will be addressed in the following sections. The English modal terms peach and maroon have very small extensions in the color space.


Table 11. Modal chromatic color terms in English, German, Bernese Swiss German, Danish, Swedish, Norwegian, and Icelandic. Terms are grouped into cognate sets (rows), and (mostly) ordered so that the cognate sets with the highest number of cross-linguistic uses are listed at the top.

| English | German | Bernese | Icel. | Swed. | Dan. | Norw. |
|---|---|---|---|---|---|---|
| red | rot | rot | rauður | röd | rød | rød |
| blue | blau | blau | blár | blå | blå | blå |
| brown | braun | bruun | brúnn | brun | brun | brun |
| green | grün | grüen | grænn | grön | grøn | grønn |
| yellow | gelb | gäub | gulur | gul | gul | gul |
| orange | orange | orangsch | appelsínugulur | orange | orange | oransj |
| | türkis | türkis | | turkos | turkis | turkis |
| | hautfarben | | húðlitur | hudfärg | hudfarvet | hudfarge |
| | rosa | rosa | | rosa | | rosa |
| pink | pink | pink | | | pink | |
| | lila | lila | | lilla | lilla | lilla |
| | | | bleikur | | | |
| purple | | | | | | |
| | | violett | fjólublár | | | |
| peach | | | | | | |
| maroon | | | | | | |

Row groups in the original layout: primary color terms; derived color terms; derived color terms for pink and purple hues.


5.3 THE PINK1 AND PINK2 CONCEPTS

Limiting this analysis to only modal terms would severely limit its usefulness for the pink and purple domains. If one looks at English pink and considers its potential translation equivalents, one finds that the languages analyzed here follow one of two patterns. Some languages have, like English, only a single term: rosa for Norwegian, bleikur for Icelandic. German and Bernese have two terms: pink and rosa. Swedish has rosa, but also a frequent (though non-modal) term cerise. Danish has pink, but the most common translation for English pink in Danish is not the Danish loanword pink, but lyserød ‘light red’. These words represent five different cognate sets: “pink”, “rosa”, “cerise”, “(lyse)rød”, and “bleikur”.

In the following, an onomasiological view of the data will be taken, and it will be claimed that members of the cognate sets are used in different ways to express two different underlying color concepts, termed PINK1 and PINK2. PINK1 has a larger denotation and is matched in all the languages. PINK2 has a smaller denotation and is a recent addition in some languages, where it has typically affected the PINK1 denotation. The languages can be said to follow one of two systems, A or B, which are illustrated in Table 12.

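Table 12. Two PINK systems.

System A

| Language | Main PINK1 |
|---|---|
| English | Pink |
| Norwegian | Rosa |
| Icelandic | Bleikur |

System B

| Language | Main PINK1 | Secondary darker PINK2 |
|---|---|---|
| German | Rosa | Pink |
| Bernese | Rosa | Pink |
| Danish | Lyserød | Pink |
| Swedish | Rosa | Ceris |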
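The two-system classification in Table 12 can be written down as a small lookup, which makes the criterion explicit: a language counts as System B exactly when it has a secondary PINK2 term. The sketch below is only illustrative; the names `PINK_TERMS` and `pink_system` are invented here, and the Swedish PINK2 term is spelled cerise as in the running text.

```python
# (PINK1 term, PINK2 term) per language, transcribed from Table 12.
# None marks the absence of a secondary PINK2 term.
PINK_TERMS = {
    "English": ("pink", None),
    "Norwegian": ("rosa", None),
    "Icelandic": ("bleikur", None),
    "German": ("rosa", "pink"),
    "Bernese": ("rosa", "pink"),
    "Danish": ("lyserød", "pink"),
    "Swedish": ("rosa", "cerise"),
}


def pink_system(language):
    """Return 'A' for languages with only PINK1, 'B' for languages with PINK1 and PINK2."""
    _pink1, pink2 = PINK_TERMS[language]
    return "A" if pink2 is None else "B"


for language in PINK_TERMS:
    print(language, pink_system(language))
```

Note that the same form, pink, sits in the PINK1 slot for English and in the PINK2 slot for German, Bernese, and Danish, which is why the classification is keyed on concepts rather than on cognate sets.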


5.4 SYSTEM A: A SINGLE PINK1 CONCEPT

The System A languages have a single color term – pink in English, rosa in Norwegian, and bleikur in Icelandic – each denoting a very similar part of the color space. The distributions of the three words are shown in Figure 9, and the comparative concept that is motivated by their similarities is shown in Figure 10. Please note that in Figure 10, and in similar figures in the rest of this thesis, numbers in the C and D rows are colored white. The white color is only for visibility and carries no other information. The reader is also reminded that the printed color does not necessarily match the exact color of the stimuli.

Figure 9. The System A languages' PINK1 distributions. [Panels: Pink (English), Rosa (Norwegian), Bleikur (Icelandic); grid rows A–D, columns 11–20 and 1–10.]

Figure 10. The cells that, cross-linguistically, have the most recurrent responses for the local lexical matches of the PINK1 concept. [Panel: PINK1; grid rows A–D, columns 11–20 and 1–10.]


English pink is well understood and has been studied in a variety of frameworks (Biggam, 2012; Koller, 2008; Wierzbicka, 1996). Before its status as label for a color, pink denoted a particular flower, a pale reddish garden plant with the name pink (probably Dianthus, see Casson, 1997, p. 232). The first attestation of the color sense was in 1674 (“Oxford English Dictionary,” 2016, s.v. pink).

Norwegian rosa is a borrowing from French (rose), possibly via German and Swedish. The earliest French example of rose is from as early as 1165, but it is unclear what color this rose denotes when the original source is examined; the second earliest recorded French sentence with rose as an adjective dates from 1853 (TLFI, 2016, s.v. rose). There is little earlier lexicographic research into the specific etymology of Norwegian rosa, but the word is explained as meaning ‘rose-colored’ in Hansen (1842), a dictionary specifically aimed at listing foreign words that were being employed in Norwegian in the middle of the 19th century.

Unlike English pink and Norwegian rosa, Icelandic bleikur has no cognates that emerged as modal terms in the other languages during the elicitation experiments. Contemporary cognate terms of bleikur do exist in Swedish, Norwegian, and Danish, where the word blek/bleg means ‘pale’, and in English, where the noun and verb bleach refer to removing the color from something, or making it lighter. In Middle English, bleak meant ‘pale’ (“Oxford English Dictionary,” 2016, s.v. bleak).

As seen in Figure 9, bleikur has the same center and main extension as the other cognate sets that have been discussed in this section, but it is more widespread, having a noticeable presence in C18 (5 mentions) and A3 (4 mentions).

Unlike rosa, bleikur does not stem from a loanword. Bleikur exemplifies the multi-stage, polysemous semantic change discussed in section 2.7.4: first, there is one meaning of a word, M1, used and shared by the speech community. Then a new meaning evolves, M2, and for some time, M1 and M2 co-exist in speakers. They can co-exist for a lengthy time, but usually one of the word meanings conquers the other at a point in time, leaving only, say M2, as the meaning of that word.

In a first phase, bleikr was an Old Norse visual descriptor, meaning ‘pale, light, intense’ (Klein, 1999, p. 156). The visual semantics of the concept denoted by bleikr described things like gold, ripe barley fields, and locks of hair (Cleasby & Vigfússon, 1874, p. 68). Presumably, in a second phase, a polysemous pattern was established in which bleikur1 remained a visual descriptor for ‘pale’, and at the same time bleikur2 emerged as a genuine color term, meaning ‘pink’. In the final phase, bleikur2 came to be the only acceptable meaning. Today, the original ‘pale’ meaning of bleikur can be seen in a few fossilized expressions like bleikur sem nár ‘pale like a corpse / deathly pale’.

5.5 SYSTEM B: THE PINK1 CONCEPT

The PINK1 comparative concept, suggested in the previous section, is also expressed in German, Bernese, Danish, and Swedish, but with one important difference. The PINK1 area is affected (typically reduced) in these languages in cells C19 and C20, which is the center for a secondary pink concept. System B languages thus have a slightly modified PINK1, and a smaller, darker PINK2 concept.

Figure 11 shows the PINK1 denoting color terms for German (rosa), Bernese (rosa), Swedish (rosa), and Danish (lyserød). These color terms will be discussed one at a time.

German, Bernese, and Swedish all have rosa as their term for PINK1 – as in Norwegian, this is a borrowing from French rose.16

Rosa-, in the color sense, appeared first as a compound simile, rosarot or rosenrot ‘red like rose’, in German and then lost its dependence on rot and stood on its own as a separate color term, rosa. The term rosarot did not cease to exist in German but came to be reanalyzed as a combination of two colors, rather than simply conceptualizing a specific rose-anchored kind of rot ‘red’. In the data, a considerable number of Bernese Swiss German speakers still use the term rosarot, whereas rosarot does not appear in the modern German data (for discussion, see Kaufmann 2006, p. 35).

16 A “rosa” cognate form also makes a rare appearance in English: one speaker used rose for A19 and A20, and another used rose red for B18. Steinvall (2002, p. 65ff, 2006, p. 113) investigates the competition between rose/rosy (of which the latter word was not found in the results of this study) and pink, and finds that the two terms were in competition in English in the 19th century. In modern times, many fossilized expressions can still be found in corpora with rose, but pink is clearly the more salient word. Rosa does not occur in the elicited Danish data and is considered archaic in modern Danish.


Figure 11. System B languages' PINK1 distribution. [Panels: Rosa (German), Rosa (Bernese), Rosa (Swedish), Lyserød (Danish); grid rows A–D, columns 11–20 and 1–10.]

The DWDS dictionary gives us the following etymology of the adjective rosa (author’s translation from German): “as the evolved New High German expressions such as rosenfarb, rosenfarbig, rosenrot, rosig no longer denoted the zartrot (‘subtly red’) color, the flower name (from Latin) rosa was introduced into German in the second half of the 18th century. It first appeared, probably noun-like, in compounds such as Rosaband – ‘rosa ribbon’, later predicative uses evolved, and in vernaculars attributive uses also” (DWDS, 2012, s.v. rosa).

Cognate terms of rosarot existed earlier in German, Swedish, Danish, and Norwegian, but the term is now archaic or highly specialized. On the face of it, Bernese is the most complicated of all contemporary systems, in that it operates with four terms: rot, rosarot, rosa, and pink. Most Bernese speakers in this study either use only rosa (9 speakers) or use a mix of rosa and rosarot (8 speakers). A few use rosarot exclusively (3 speakers). It is possible that there is a dialectal or sociolectal difference, rather than a semantic and conceptual one. For the Bernese speakers who distinguish between rosa and rosarot, the rosa term is often used for the lighter colors, and rosarot for darker ones. More data is needed to fully ascertain the usage patterns of Bernese Swiss German color terms and their meanings.

Rosa is first attested as a compounded color term in Swedish in 1773, according to SAOB: rosa-färgad ‘rosa-colored’, and as an independent color term in 1868, describing (human) legs. At this early stage, rosa is often found as a modifier to rött ‘red’, as in rosa-rött ‘rosa red’ from 1819, and later with other colors as well: rosa-grå ‘rosa gray’, rosa-brun ‘rosa brown’, rosa-gul ‘rosa yellow’, rosa-violett ‘rosa purple’, and rosa-vit ‘rosa white’ (SAOB, 2014, s.v. rosa, entry publ. 1959). The Swedish rosa denotation does not, at first glance, seem to match the template posited for System B PINK1: crucially, there is still a strong rosa presence in C20 in Figure 11. However, almost a third of the rosa uses for C20 are in fact the compound ceriserosa. This seeming discrepancy will be further addressed when the PINK2 color terms are discussed in 5.6.

Before we turn to the PINK2 terms, the last of the PINK1 terms, Danish lyserød, must be considered. From a cross-Germanic perspective, Danish lyserød is lexically speaking quite odd, and the term poses several significant questions for color theory. Lyserød is a composite term meaning ‘light rød’, and in that sense it is a formal equivalent of Swedish ljusröd ‘light röd’ and Norwegian lyserød ‘light-rød’. At least superficially and formally, its denotation seems clearly to be included in rød ‘red’. Yet this section will argue that lyserød is, or at the very least is on the way to becoming, an independent color term.

The first argument for this is its distribution – see Figure 12. The usage distribution of lyserød in color space resembles English pink, Swedish and Norwegian rosa, and Icelandic bleikur – in other words, it matches the PINK1 comparative concept. It is unlikely that this is a coincidence – rather, all these terms represent different lexicalization strategies for the same color.


Figure 12. Danish speakers’ use of the color terms lyserød (top); and rød, excluding lyserød (bottom).

Figure 13. Danish speakers’ use of the color terms lyselilla (top); and lilla, excluding lyselilla (bottom).

If one takes a closer look at tile B20, it is clear that 19 of 20 consultants said that the tile was lyserød, and only 1 of 20 called it en slags rød ‘a kind of rød’. Generally, the usage patterns of lyserød and rød are relatively complementary to each other, though not in cells C19, C20, and B1. This pattern of distribution can be compared to the patterns of Danish lilla and lyselilla in Figure 13, where there is far more overlap between the modified and unmodified term – there is only one chip, A15, that is only ever called lyselilla (by two people) and never lilla.

Further, lyserød takes modifiers in a way that lysegrøn or lyselilla do not. A Google search on Danish language web pages returned 4,325 hits for “mørk lyserød” ‘dark light-red’, but only 2 for “mørk lysegrøn” and 79 for “mørk lyseblå”. Lyserød is also a far more frequent term in the Danish elicitation results than the other ‘light’ + ‘red’ terms are in other languages, as can be seen in Table 13: there were 129 occurrences of lyserød, compared to just a few uses for the direct translations in the other languages.

Table 13. Raw number of “light + [color for RED]” and “[color for RED]” uses in the elicited material. The small number of red in English is noteworthy, especially when contrasted with the Bernese material - this is partly explained by the use of compounds (most frequently rosa-rot) in the latter language.

Light-modified and non-modified RED colours

| Language | Color term | Responses, minus the 'light' responses | Modified color term | Responses |
|---|---|---|---|---|
| Danish | rød | 76 | lyserød, lys rød | 129 |
| Norwegian | rød | 63 | lyserød, lys rød | 7 |
| Bernese | rot | 115 | hèurot | 5 |
| Swedish | röd | 89 | ljusröd, ljus röd/rött, ljust röd/rött | 2 |
| Icelandic | rauður/rautt | 86 | ljósrauður/rautt, ljós rauður/rautt | 2 |
| English | red | 33 | light red | 1 |
| German | rot | 93 | hellrot, helles rot | 0 |

Initial fieldwork with Danish speakers also indicates that they refrain from calling lyserød “a kind of rød”. They prefer to explain the meaning of lyserød as ‘like something rød, with some hvid in it’ (Levisen, p.c.).

The fact that a compound like Danish lyserød has developed into a separate color concept is not unique in the Scandinavian context. For instance, in Icelandic there are two modal compound terms (appelsínugulur ‘orange’, ‘yellow like an orange’ and fjólublár ‘blue like a violet’) that seem to have established their own coherent denotations independent of the roots gulur ‘yellow’ and blár ‘blue’. An even clearer, parallel example comes from the Finnish term vaaleanpunainen ‘pink’ (vaalea ‘light’ and punainen ‘red’), which Uusküla (2007, p. 389) establishes as a basic, and young, color term. Uusküla notes that vaaleanpunainen can take further light and dark modifiers, which is true for lyserød as well.

To summarize, English pink, Icelandic bleikur, Norwegian rosa, Swedish rosa, German rosa, Bernese rosa, and Danish lyserød all match the comparative color concept PINK1, but the denotative range of the comparative concept is modified in German, Bernese, and Danish. In the next section, it will be argued that this is due to the existence of another word matching a secondary pink concept, PINK2.
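The contrast in Table 13 is easier to see as a proportion than as raw counts. The following sketch computes, for each language, the share of light-modified responses among all RED-rooted responses. The counts are those reported in Table 13; the share itself, and the names `TABLE_13` and `light_share`, are introduced here purely for illustration and are not a measure used in the thesis.

```python
# (non-modified RED responses, light-modified RED responses), from Table 13.
TABLE_13 = {
    "Danish": (76, 129),
    "Norwegian": (63, 7),
    "Bernese": (115, 5),
    "Swedish": (89, 2),
    "Icelandic": (86, 2),
    "English": (33, 1),
    "German": (93, 0),
}


def light_share(language):
    """Share of 'light + RED' responses among all RED-rooted responses."""
    plain, light = TABLE_13[language]
    return light / (plain + light)


# Print the languages from the highest to the lowest share.
for language in sorted(TABLE_13, key=light_share, reverse=True):
    print(f"{language:10s} {light_share(language):.3f}")
```

On these counts, Danish is the only language in which the light-modified form outnumbers the bare RED term, in line with the argument that lyserød behaves as a color term in its own right.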

5.6 SYSTEM B: THE PINK2 CONCEPT

In the following sections, I will argue that Danish pink, Bernese Swiss German pink, German pink, and Swedish cerise all denote similar color concepts. I will suggest a comparative color concept, PINK2. PINK2 has fuzzy edges but is centered on B19, C19, and C20. The PINK2 color space is shown in Figure 14.

Figure 14. The cells that, cross-linguistically, have the most recurrent responses for the local lexical matches of the PINK2 concept. [Panel: PINK2; grid rows A–D, columns 11–20 and 1–10; cells marked * or ?.]

All of the 20 German consultants in the EoSS elicitation used the term pink. Both Danish and Bernese pink were used by 14 out of 20 EoSS participants. Figure 15 shows the answer distribution in the EoSS experiments for the term pink in German, Bernese, and Danish, and for the Swedish term cerise.


Figure 15. System B languages’ PINK2 distributions. [Panels: Pink (German), Pink (Bernese), Pink (Danish), Cerise (Swedish); grid rows A–D, columns 11–20 and 1–10.]

The story of the cognate set “pink” in contemporary Germanic languages is a complex one, but only so in recent times. From a historical perspective, there is one old pink (English), and three young ones (Danish, German, and Bernese Swiss German). For a detailed discussion of the German loan, see Frenzel-Biamonti (2011).

German pink covers fewer cells than English pink, but has also spread towards the darker (lower) edge, and also covers C18, which English pink does not. German pink also has a stronger representation in the entire C-row, specifically in C19 and C20, than English pink. Bernese pink is similar to, but less frequent than, German pink and has no uses for the lightest color chips. Danish pink is also similar, but has some uses in the lightest row. For all pinks, except English, the frequency of use is highest for the B19, C19, and C20 chips.

The difference between English pink (which corresponds to PINK1; see Figure 9) and the other pinks (which correspond to PINK2; see Figure 15 above) can also be seen in the results from the best example task in the elicitation session. The best example responses for pink show that the non-English pinks are all centered primarily on C20 and secondarily on B19; English has a slightly different pattern: speakers primarily voted for B19 and B20 as the best examples. English also has the lowest overall consensus on the best example of pink (6 people voted for B19; 5 for B20; 9 for other cells). In comparison the other languages have higher agreement: German (14 people voted for C20; 3 for B19; 3 for other cells), Bernese (10 for C20; 6 for B19; 4 for other cells), and Danish (9 for C20; 6 for B19; 5 for other cells).
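The consensus comparison above can be made explicit by dividing the votes for the most popular named cell by the total number of votes. The vote counts below are the ones reported in the paragraph; the agreement ratio, and the names `BEST_EXAMPLE_VOTES` and `top_cell_agreement`, are assumptions made for this sketch rather than a measure defined in the thesis. The "other" bucket pools several cells, so it is never treated as a candidate top cell.

```python
# Best-example votes for pink, as reported in the text.
BEST_EXAMPLE_VOTES = {
    "English": {"B19": 6, "B20": 5, "other": 9},
    "German": {"C20": 14, "B19": 3, "other": 3},
    "Bernese": {"C20": 10, "B19": 6, "other": 4},
    "Danish": {"C20": 9, "B19": 6, "other": 5},
}


def top_cell_agreement(votes):
    """Return (most voted named cell, its share of all votes)."""
    # 'other' aggregates several different cells, so it is excluded here.
    named = {cell: n for cell, n in votes.items() if cell != "other"}
    top_cell = max(named, key=named.get)
    return top_cell, named[top_cell] / sum(votes.values())


for language, votes in BEST_EXAMPLE_VOTES.items():
    cell, share = top_cell_agreement(votes)
    print(f"{language:8s} {cell} {share:.2f}")
```

Under this measure the three borrowed pinks all peak on C20, while English peaks on B19 with the lowest share of the four.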

Van Scherpenberg notes that pink has been attested in German since 1970 (2013, p. 81), and the historical ODS Danish dictionary has an attestation from 1952 in Danish (ODS, 2016, s.v. pink). The relative dates of these attestations should be taken with a grain of salt, though, since a specialized word (for example, in fashion) can appear in print before it is in general use, and a word in general use in spoken language can take a long time before it appears in print.

Both German and Danish already had words matching PINK1, and in the process of semantic integration, the new pink loanword came to denote a dark subpart (PINK2) of the previous rosa color space in German and lyserød color space in Danish.

The difference between English pink, on the one hand, and German and Danish pink, on the other, is not only denotational, however. In both borrowing languages, the borrowed pink is seen as a visually conspicuous, gaudy, bold color. This has been shown for German by Kaufmann (2006, p. 38), who examined the relationship between German pink and German rosa in a large corpus study of German newspapers. She notes that German pink is knallig ‘loud’ and a hyponym of rosa (see also Frenzel-Biamonti, 2011).

Similarly, based on interviews with native speakers, Levisen (2012) defines Danish pink as skrigende ‘screaming’, a color calling for attention. This aspect is very similar to Kaufmann’s description of German pink as a knallig ‘loud’ color.

Figure 15 shows that Swedish has a term that is denotationally similar to the borrowed pinks: cerise. Unlike German, Bernese, and Danish, however, the PINK1 matching term in Swedish (rosa) is not noticeably weaker in C19 and C20 than it is in the languages lacking a PINK2 term (English, Icelandic, Norwegian). This is partly because many speakers use a compound term, ceriserosa, which is counted as rosa when only main responses are considered, as they are in Figure 15. The PINK2 area in Swedish has three typical responses from speakers: rosa, cerise, or the compound term ceriserosa.

The Swedish term cerise has its earliest attestation already in the 19th century, but not as a pinkish term. Cerise-röd ‘cerise-red’ is known from 1855 – a borrowing from French cerise ‘cherry’.¹⁷ In 1889 the term itself is discussed in a catalogue of various technical substances and is identified as a kind of brown. A few decades later, in 1904, it is called a kind of red or a kind of brown (SAOB, 2014, s.v. ceris; entry published 1904). Thus, while the Swedish lexeme is older than the Danish and German borrowings, its denotation has probably shifted over time, and it is likely that it appeared in its dark pink meaning at some point in the middle of the 20th century – using only a dictionary as a source, that is as far as we get. It should be noted that the typical contemporary Swedish speaker does not link the term cerise to cherries: the Swedish word for ‘cherry’ is körsbär and most speakers are neither experts in etymology nor fluent in French.

It is also worth noting that the term cerise exists in Danish, though it did not emerge in the EoSS data. For instance, it appears in the DDO dictionary, where it is described as having a French origin, first introduced in kirsebær ‘cherry’ and with the meaning en klar rød farve ‘a clear red color’ (DDO, 2016, s.v. ceris). This definition would be odd for contemporary Danish pink.

The term cerise also appears once in the English data and twice in the Norwegian data, for C20 and for C19 and C20, respectively. In the Norwegian data, it occurs only in the compound form ceriserød and is used by only a single speaker. The term is used in spoken Norwegian and is attested from the early 20th century in newspaper clothing advertisements and fashion reports (http://www.bokhylla.no). It is therefore interesting that its use is so much less frequent in the Norwegian elicited data than in the Swedish data, despite the closeness of the two languages and speaking communities.

17 One English speaker used the term cerise, for C20.


5.7 THE PURPLE1 CONCEPT

This section will propose a comparative concept (PURPLE1), the denotational footprint of which is remarkably similar in all the languages – Figure 16 shows the area. It forms a right-angled triangle, with a rather sharp hue-wise delimitation to the left (between the 15th and 16th column) towards the blue area, and a slope towards the redder area to the right. (Half the Norwegian speakers also indicated D1 as a lilla color, but this is not mirrored in the elicited data from the other languages.) This section will also note how a previous lexical replacement has taken place or is taking place in all the languages, leaving them with a vestigial, secondary purple concept. This secondary purple term exists in all the languages, but it is not helpful to label it a comparative concept, and the reasons for this will be addressed.

Figure 16. The PURPLE1 area. The majority of the PURPLE1 terms fall in these cells.

The purple color space is less complex than the pink in the seven Germanic languages, but equally interesting.

The same two cognate sets recur in the languages: “lila” and “violet”. The exception is English purple (a 9th century borrowing from Latin purpura). The word purple has undergone semantic change: it originally denoted a crimson red color, associated with emperors in the Roman Empire and with royalty. The exact denotation of the historical word purple is the subject of much scholarly debate (see Edmonds, 2000; Finlay, 2004; Garfield, 2000), but the meaning changed from red to designating a blue-red derived color (“Oxford English Dictionary,” 2016, s.v. purple). The infrequent German color term purpur also denoted a reddish color, and it was historically supplemented with, and then replaced by, violett, as described in Jones (2013, pp. 362, 370).

The “violet” cognate set is the older term in these Germanic languages. It was borrowed from French and is etymologically a simile of the purple violet flower. Its color sense is attested in the late 14th century in English (“Oxford English Dictionary,” 2016, s.v. violet) and in German dialects from the 14th and 15th century (DWDS), reaching New High German in the second half of the 17th century (Jones, 2013, p. 369). A few centuries after its appearance in German dialects it appeared in Swedish (first attestation in 1563 according to Hellquist, 1922) and Danish (first attestation in the late 1700s, according to ODS, 2016).

The “lila” cognate set is a borrowing from French and is, like many of the other cognate sets discussed for these young colors, originally a flower simile, referencing the lilac flower. The word lilac is attested in English as a color adjective in 1791 (“The colour was more or less inclined to red, from lilac to violet”; “Oxford English Dictionary,” 2016, s.v. lilac). In the early 1800s, the word is attested in German (DWDS, 2012, s.v. lila; Jones, 2013, p. 370). Its exact early denotation in German is unclear (it was some kind of purple), as shown by an extensive discussion in Jones (2013, p. 354), who quotes Seufert (1955, p. 134ff) calling it the unhappy child of colour lexicology. The cognate set is attested in both Danish (ODS, 2016, s.v. lila) and Swedish (SAOB, 2014, s.v. lila) from the late 18th/early 19th century.

The cognate sets are similar across the languages, and, as will soon be demonstrated, the concepts are also similar – but the way that cognates map onto concepts varies. All the contemporary languages express a single very similar, stable PURPLE1 comparative concept.

The languages also all have a secondary term inside the PURPLE1 denotative range. The exact placement of this secondary term varies, and therefore it is not productive to speak of a second denotationally cohesive “Purple2 comparative concept” in competition with the main PURPLE1. From an onomasiological point of view, there is nothing that unites the secondary words for purple in the different languages: they have different denotations. Semasiologically, the words belong to two different cognate sets (“lila” and “violet”). Yet it is still interesting to talk about the secondary purple words as a single set. They are the leftovers of an areal lexical replacement operation and will be referred to as a set: the Purple2 words.

There are three different systems employed, as seen in Table 14.


Table 14. Three PURPLE systems

System A: secondary Purple2 rare and lighter
Language     Main PURPLE1    Secondary Purple2
English      purple          lilac/violet
Bernese      violett         lila

System B: secondary Purple2 rare and synonymical
Language     Main PURPLE1    Secondary Purple2
German       lila            violett
Icelandic    fjólublár       lilla
Norwegian    lilla           fiolett

System C: secondary Purple2 rare and darker
Language     Main PURPLE1    Secondary Purple2
Danish       lilla           violet
Swedish(?)   lila            (violett?)


System A is followed by English and Bernese, and is characterized by the pan-Germanic PURPLE1 comparative concept and a secondary rarer Purple2 word, which forms a subset of PURPLE1 located in the lighter part of the category. Bernese has violett for PURPLE1 and lila for Purple2. In English, there are two alternative Purple2 words: lilac (more common) and violet (less common).

Figure 17. System A languages have a term matching PURPLE1 (left), and a rarer Purple2 word matching a lighter color (right).

[Figure 17 panels, each a grid of speaker counts per color chip (rows A–D, hue columns 11–20 and 1). PURPLE1 terms: Purple – English; Violett – Bernese. Purple2 terms: Lilac – English; Violet – English; Lila – Bernese.]


System B is followed by German, Norwegian, and Icelandic. This system is characterized by a term matching the comparative concept PURPLE1, and also a secondary color term that is rarer, but has the same extension as PURPLE1 (in German and Norwegian) or a slightly larger extension (Icelandic). German has lila for PURPLE1 and violett for Purple2, which confirms earlier findings that while lila is more common, violett is still a salient alternative to many speakers (Fan, 1996, p. 131p). Norwegian has lilla for PURPLE1 and fiolett for Purple2. Icelandic has fjólublár (composed of fjólu, from the same cognate class as violet, and blár, meaning ‘blue’) for PURPLE1 and lilla for Purple2. Icelandic lilla has a similar extension to fjólublár, except that it also stretches over into bluish-greenish territory in columns 11-15, where a few speakers use it in compounds (typically, lillablár ‘purple blue’). The Purple2 words are used by very few speakers, but the fact that this weak phenomenon recurs in the different languages gives credence to the assumption that these uses should not be dismissed as nonce uses.

Figure 18. System B languages have a term matching PURPLE1 (left), and a rarer Purple2 word that is synonymous (right).

[Figure 18 panels, each a grid of speaker counts per color chip (rows A–D, hue columns 11–20 and 1). PURPLE1 terms: Lila – German; Lilla – Norwegian; Fjólublár – Icelandic. Purple2 terms: Violett – German; Fiolett – Norwegian; Lilla – Icelandic.]


System C is represented by Danish, and possibly Swedish, and it is the opposite of System A: here the rarer Purple2 terms cover a darker subset of PURPLE1. Danish has lilla for PURPLE1 and violett for Purple2. Swedish has lila for PURPLE1 and violett as a Purple2 term. Swedish is here very tentatively placed in System C together with Danish, since two of the three uses of Swedish violett are in the darkest row; there will be cause to re-examine and discuss this in the more detailed, multi-generational analysis of Swedish color terms that follows in chapter 6. System C is illustrated in Figure 19 (note that the single Danish speaker using violett for the bluish-greenish C11 used the creative compound grønviolet).

Figure 19. System C languages have a term matching PURPLE1 (left), and a rarer Purple2 word matching a darker color (right). Swedish violett is only tentatively placed in this category.

[Figure 19 panels, each a grid of speaker counts per color chip (rows A–D, hue columns 11–20 and 1). PURPLE1 terms: Lilla – Danish; Lila – Swedish. Purple2 terms: Violett – Danish; Violett – Swedish.]

Just as PINK was historically subsumed under RED in earlier forms of the studied languages, the contemporary PURPLE color was historically subsumed under BROWN in the cultures of Western Europe. Traces of this can be seen in flower names, such as the very purple flower Prunella Vulgaris, which is known as Braunelle in German and brunört in Swedish (Paul, Henne, Kämper, & Objartel, 2002 s.v. violett). Paul et al. are of the opinion that the differentiation between BROWN and PURPLE occurred in the 1700s, while Courtade claims that the process was ongoing in the 1600s (1996, p. 79).

The most common contemporary lexicalization pattern is to have a match for the younger cognate set (“lila”) for the main PURPLE1 and a match for the older cognate set (“violett”) for a much rarer Purple2. Bernese is the opposite of its close sibling, German, in this regard – the Bernese speakers in this study use violett for the larger, more common PURPLE1, and lila for the secondary category. Similar to Bernese, Icelandic has an older “violett” cognate term, fjólublár, for PURPLE1, and a more recent loanword, lilla, for Purple2.

The contemporary lexicalization of purple in these related languages is most likely indicative of a stabilization of the purple category. The contemporary snapshot works as a clue to historical processes. In all cases, one term has supremacy and a secondary term (or terms: English has two competing secondary terms – lilac and violet) is being marginalized. In Swedish, the secondary term violett seems almost driven to extinction, the lexical replacement all but complete. In all the languages, the secondary term is either “in exile” in the lighter, or darker, part of the PURPLE1 denotation, or it is a denotative synonym of the PURPLE1 term, but much rarer. Fan (1996, p. 131) is of the opinion that the two terms manage, so far, to co-exist in German, though with lila as the far more common alternative; however, Altman (1999, p. 124) argues that here, as well, violett is steadily replaced by lila.
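The three-way distinction between Systems A, B, and C can be illustrated with a small sketch. It is not the procedure used in this study: the function names, the lightness ranking of rows A to D (A taken as lightest, D as darkest), and the cut-off value are all assumptions made for illustration. The sketch compares the mean lightness row of a secondary term's responses with that of the main term.

```python
# Lightness rows of the stimulus grid; assumed to run from lightest (A) to darkest (D).
ROW_RANK = {"A": 0, "B": 1, "C": 2, "D": 3}


def mean_row(cells):
    """Mean lightness rank of a term's responses, weighted by speaker count.

    `cells` maps chip labels such as "C17" to the number of speakers
    who used the term for that chip.
    """
    total = sum(cells.values())
    return sum(ROW_RANK[chip[0]] * n for chip, n in cells.items()) / total


def classify_secondary(main, secondary, threshold=0.5):
    """Label a secondary term as 'lighter', 'darker' or 'synonymous'.

    The threshold is an arbitrary illustrative cut-off, not a value
    taken from the study.
    """
    shift = mean_row(secondary) - mean_row(main)
    if shift <= -threshold:
        return "lighter"      # the System A pattern
    if shift >= threshold:
        return "darker"       # the System C pattern
    return "synonymous"       # the System B pattern
```

With only a handful of responses per secondary term, such a measure is very sensitive to single speakers, which is one reason why the placement of Swedish in System C is described as tentative.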

5.8 RARE RESPONSES

From a German, Swedish, and Danish perspective (System B), the English, Norwegian, and Icelandic languages (System A) seem to be lacking a PINK2 category. The System A languages do not have a color term that specifically covers the PINK2 area.

English has several color terms that a few speakers use for parts of PINK1. One of these, peach, is a modal term for C3. The other terms are rare: magenta, fuchsia, rose, cerise, puce, mauve, and coral. These terms are all similes based on objects – what Alvarado and Jameson (2002) and Jameson and Alvarado (2003) call “object glosses”.

Another interesting rare response is skär in Swedish. Its few uses indicate a very light kind of PINK1. It is generally seen as an archaic term, at least by younger speakers. From the perspective taken in this chapter it is difficult to say more about skär, but the next chapter will discuss the term at length.

There are several recurring rare terms in the languages for the PURPLE1 region as well – to give some examples, “lavendel” has one or two uses in several languages (Norwegian, Danish, German, English) and always indicates a very light purple. Lexical matches for the “purpur” cognate set occur once or twice in all the elicited data, except English and Icelandic, and always denote a dark color.


Most of the languages have a range of minor terms in the PINKs and PURPLEs – chapter 6 will look closer at how major and minor terms interact in a particular language: Swedish.

5.9 DISCUSSION

This chapter has discussed complex cases of lexical replacement and their interactions with other kinds of lexical change. Through the combination of contemporary experiment results with “young” speakers (most in their 20s) and historical knowledge from dictionaries, the historical progression of these processes has been made clearer.

In the data discussed in this chapter, there have been three lexical change processes involving lexical replacements:

• The first kind is the advent of the PINK1 and PURPLE1 concepts in these languages.
• The second is the partial replacement of part of PINK1 with PINK2.
• The third is the fading of the words in the Purple2 collection.

The sociohistorical precursor to the elaboration and differentiation of Germanic color vocabularies is a series of technological and social developments, such as the emergence of Venice and Florence as major centers for dye manufacturing during the Renaissance. The Renaissance color explosion resulted in several waves of color terms spreading across Europe (Casson, 1997). In the second wave, in the 17th through 19th centuries, advances in the chemistry of dye experimentation (Casson, 1994, pp. 16–17; Jones, 2013, p. 107) and the availability of more easily dyed Indian cotton fabric (Hanna Hodacs, p.c.) led to the presence of stable colors (reproduced in the same way more or less every time) in the lives of Europeans.

One of these colors was a lighter kind of red, another, a color in-between blue and red. Eventually the regular use of these colors in everyday life and conversation was enough for speakers to start awarding the colors their own independent color term.

There were few independently lexicalized words for the pink region in color space before the 17th and 18th century in the Germanic languages. However, all contemporary Germanic languages have at least one pink color concept (PINK1), and some have two (PINK1, PINK2). The color concept PINK1 has been lexicalized through different lexicalization strategies: Swedish rosa, Norwegian rosa, German rosa, Bernese Swiss German rosa, English pink, Icelandic bleikur, Danish lyserød. Throughout, the PINK1 area remains remarkably stable. The color concept PINK2 is lexicalized as German pink, Bernese Swiss German pink, Danish pink, and Swedish cerise. This color concept is not lexicalized with a salient color term in English, Norwegian, and Icelandic.

Further, in the languages where there is a PINK2 word, the extension of PINK1 is often somewhat restricted. This is most likely an ongoing case of lexical replacement, though not a prototypical kind. The referents referred to by PINK2 are the same kind of referents that were most likely earlier referred to by words that now denote PINK1 (or some sort of RED concept): a lexeme used has been replaced. But there has also been semantic change: the concept of PINK2 has stabilized as an alternative to PINK1 for particular parts of the perceptual color space, and the denotation of PINK1 has changed. The relative youth of PINK2 is evident not only from earlier work and dictionaries, but also in its cross-linguistic presence: it is possible that it will eventually come into the other Germanic languages in the same way that PINK1 once did, but for now the presence is not uniform in this group of languages. When the concept does appear, it has a clear constant center, however: C19 and C20. The current situation in the PINK2 languages resembles MacLaury’s (1997) second case of new color terms’ establishment (a new term claims a part of an old one, first as a hyponym, then as an independent term), and the fact that it was the darker periphery that was separated, chimes well with the observations of Archibald (1989) that the lighter or darker parts of concepts often separate (these hypotheses were discussed in 4.2.2). To simplify: just as PINK1 once split off from the lighter part of RED, PINK2 is now splitting off the darker part of PINK1.

Modern-day PURPLE1 made its appearance in the Germanic cultural sphere in Western Europe in the last few centuries: it appeared in the larger languages (English and German) in the 14th-15th century, and then spread northwards. There is a single cross-linguistic PURPLE1 category in the languages, labeled as versions of lila in German, Norwegian, Danish, and Swedish, and as fjólublár in Icelandic, violett in Bernese, and purple in English.

The data in this chapter suggests that the languages also have a secondary purple, with Swedish as a possible exception. There are so few uses of Swedish violett that, on its own, it is quite a stretch to talk about it as representing a trace of an old concept that was at some earlier time generally recognized in the speaker community. Here the strength of the cross-linguistic comparison method is apparent: when the Swedish violett uses are considered together with the data from the other languages, we can infer that the Purple2 words are the remnants of an areal lexical replacement process, instigated by real world factors like advances in dyeing. The cross-linguistic commonality of Purple2 is the fact that it is a fading color concept covering part of PURPLE1’s denotation.

The consistent existence of secondary Purple2 color terms (though not with the cross-linguistic denotative similarities that PINK2 has) in the languages suggests an earlier lexical change – this is also supported by dictionary entries that claim that the terms were common before, but are becoming rarer and rarer. The most likely interpretation is that lexical change has pushed the secondary terms close to extinction and marginalized them to either the darkest or lightest part of the purple color space, or just marginalized them overall, making them very infrequent synonyms of the main term. However, with only a synchronic snapshot of the contemporary denotations of speaker groups with a rather low median age, it is difficult to tell more precisely what has occurred. Was this a prototypical lexical replacement? Did the Purple2 words once denote exactly what is now defined as the PURPLE1 area, only to be pushed out by a new lexeme at approximately the same time in all the languages?

The finding that Danish lyserød is a basic color term, despite its apparent modified form (lyse ‘light’ + rød ‘red’), underscores the necessity for careful consideration of basic term status judgments. Alvarado and Jameson (2002) noted that modified terms might be a sign (together with complex terms and rare terms) of low consensus areas where change is more likely. The case of Danish lyserød is interesting in light of this. The modified form, increasing in frequency, was most likely a sign of the increased salience of PINK as an independent concept. But this modified rød form was not, as it was in the other languages, given up for another, simpler morpheme in Danish. The fact that this same development has been found in Finnish (Uusküla, 2007) shows that this is not a one-time example.

In all cases, the lexical replacement processes must be analyzed in tandem with lexical change. It is equally true to say, for instance, that PINK1 was born as a conceptually independent color in these connected cultures in the last few centuries, as it is to say that PINK1 replaced RED in part of RED’s denotation. Objects that are now called rosa in German or bleikur in Icelandic existed and were categorized as another color before PINK1 was born. The details of such a shift require a more diachronic perspective, however, which will be provided in chapter 6. That chapter will delve deeper into the question of lexical change in the pink and purple regions of color space, but within a single language, Swedish, and two generations of speakers. This will be combined with a more exhaustive study of dictionaries and encyclopedias, as well as corpora and interviews with older speakers.


6 MICRO-PERSPECTIVE: DIACHRONIC LEXICAL CHANGE IN PINK AND PURPLE

This chapter will use a narrower time scale than the previous chapter, in order to gain insight into the details of lexical replacements connected with the PINK1, PINK2, and PURPLE1 color concepts, and the disappearing Purple2 collection of words. There are some changes of perspective between this chapter and chapter 5: Chapter 5 took a largely synchronic view (though analyzed in the light of historical data) and did an interlanguage comparison of color terms in seven related languages, inferring diachronic development from the analysis. This chapter takes a diachronic view, combining information from dictionaries, encyclopedias, corpora, and interviews, with results from an elicitation experiment of color terms in two generations of a single language. No source is sufficient on its own, but by triangulating (combining) data from these sources, a patchwork image of lexical replacement and change in these domains is revealed.

The previous chapter 5 defined the denotation of several color words in seven related Germanic languages in the pink and purple part of color space. By comparing the lexicalization patterns and strategies of the languages (both through modern-day elicitation results and through information in historical dictionaries), it became apparent that a new color concept, PINK1, (lexicalized in several different ways) had affected the denotation of an older color concept, RED, in all the studied languages during the last few centuries. In some of the languages, a secondary color concept, PINK2, had also emerged, lexicalized as pink in German and Danish, and as cerise in Swedish, and in turn affected the denotation of PINK1. A PURPLE1 color concept had replaced a part of an older color concept, probably BROWN, though the contemporary presence in all the languages of a rarer secondary Purple2 word, now relegated to a marginal existence, indicated that some earlier lexical competition had occurred, the details of which were difficult to derive from that material.

The data in chapter 5 indicated that the Swedish speakers in the experiment had a PINK1 category lexicalized as rosa; a smaller and darker PINK2 category lexicalized as cerise, which might be gaining in prominence; a PURPLE1 category lexicalized as lila; and a Purple2 word violett, which might be fading, its denotation taken over by PURPLE1. In this chapter, two additional color terms will be brought into the discussion: skär, a pink term, and gredelin, a purple term. The terms are rare answers in the data from the younger generation: skär was used four times; gredelin only once. However, they have both had periods of heavy prominence in the Swedish color vocabulary during the last few centuries.

The data in chapter 5 was from speakers typically in their twenties. By adding an older generation of Swedish speakers, it will be easier to study these processes.

This chapter thus focuses on lexical replacement (and other intertwined lexical change processes) in a particular semantic domain (color) in a particular language (Swedish) during a particular time. The chapter aims at descriptive, theoretical, and methodological contributions to the field of color linguistics. Descriptively, I wish to examine whether there is more intergenerational difference in the labeling of pink and purple areas than in other parts of the perceptual color space and, if so, to describe the difference between the generations in detail. As for theoretical contributions, the study can enrich the existing discussion regarding theories of color lexical replacement processes and suggest additions or amendments to the claims put forward by, among others, MacLaury (1997), Rakhilina and Paramei (2011) and Lindsey and Brown (2014). As a methodological contribution, I wish to argue for the usefulness of comparing color categorization behavior in elicitation experiments featuring different generations, rather than just using elicitation experiments to compare different languages. I will suggest several quantitative methods for how such comparisons might be done. I also wish to underline the need for triangulation of different data sources and methods so that the inherent challenges with the color elicitation method can be compensated for by a grounding in sociohistorical context. The place of the present study within the thesis is illustrated in Table 15.


Table 15. The place of the present study within the thesis.

Time scale & scope
  Chapter 3: Macro: several millennia, 87 language varieties
  Chapter 5: Meso: several centuries, seven Germanic languages
  Chapter 6: Micro: two generations, one language

Method & material
  Chapter 3: A statistical model tests domain-independent hypotheses about lexical replacement (based on a database of cognate class judgments of a Swadesh list)
  Chapter 5: Comparison of variation, using elicitation experiment results, supplemented with dictionary data
  Chapter 6: Comparison of variation and change, using elicitation experiment results, supplemented with interviews, dictionaries, floras, corpora

Domain
  Chapter 3: Core vocabulary
  Chapter 5: Color, with focus on pink and purple
  Chapter 6: Color, with focus on pink and purple

The chapter will be organized as follows: section 6.1 will present a historical overview of the lexical development of the color terms, using a variety of source materials including dictionaries, encyclopedias, and corpora. This will be supplemented with interviews with older speakers in section 6.2. Both textual sources and interview results will be analyzed together in section 6.3, and this joint analysis will form the historical context for section 6.4, which presents results from an elicitation experiment with two generations of Swedish speakers, focusing on differences and similarities in their use of color terms.


6.1 WORDS FOR PINK AND PURPLE IN HISTORICAL TEXTS

This section will focus on Swedish textual data from the last few centuries. The data sources are presented in 6.1.1, and 6.1.2 summarizes what these sources can tell us about the historical lexical change in pink and purple.

6.1.1 TEXTUAL MATERIAL

This historical review is based on dictionaries, encyclopedias, corpora, and botanical encyclopedias compiled since the beginning of the 19th century up until the present. These sources consist of the Dalin dictionary (1850); the continuously published Swedish Academy Dictionary SAOB (SAOB, 2014, entries from 1893 to 2014); the SAOL (Swedish Academy Word List) with updated versions from 1874 and onwards; an etymological dictionary (Hellquist, 1922); two editions of a multi-volume encyclopedia (Nordisk Familjebok); two corpora drawn from Swedish novels; and several botanical encyclopedias. In the next few paragraphs, these will be described in greater detail.

The Dalin dictionary (1850) was the first Swedish defining dictionary (see Anna Helga Hannesdóttir, 1998, p. 514 for a discussion of Swedish dictionary history). It was followed in 1874 by the first edition of the Swedish Academy wordlist (SAOL), which covered 35,000 words and was heavily influenced by Dalin's work (see discussion in Gellerstam, 2009, p. 54). SAOL's first five editions were more or less identical, but the sixth edition (1889) was expanded to 40,000 words, the seventh (1900) to 71,000 words, the eighth (1923) to 85,000 words, and the ninth (1950) to 155,000 words. After that, the editions saw a reduction in the number of words, as archaic uses and compounds were removed: the 10th edition (1973) added 7,000 new words, but removed 20,000 old ones. This new policy of adding new material and removing archaic terms was followed in the 11th (1986), 12th (1998), 13th (2006), and 14th (2015) editions. The long tradition of successive SAOL editions means that changes in words might get documented – however, it is impossible to know if material has been copied from earlier editions without much scrutiny. It follows from this that the lack of changes in entries does not necessarily reflect that there has been no change in meaning or usage.

SAOB (Svenska Akademiens Ordbok, The Swedish Academy Dictionary) began publication in 1893, and its editors have been working their way through the alphabet – as of the beginning of 2017, the editors are working on V (but have, alas, not yet reached violett). Each entry is a very detailed scholarly description of the word and its history.

Another source for investigating changes in Swedish meaning and usage is the consecutive editions of the Nordisk Familjebok encyclopedia, published in 20 volumes from 1876 to 1899, and in 38 volumes from 1904 to 1926.

The frequency and collocations of the color terms were also investigated in several corpora of Swedish. The corpora were divided into two time periods: one older period consisting of Swedish novels published between 1830 and 1942¹⁸ and one newer period consisting of Swedish novels published between 1976 and 1999.¹⁹ The headwords that the color terms modified (either in attributive or predicate fashion) were categorized into intuitive semantic categories, and the most frequent of these categories will be discussed.

A final source of data was the descriptions of flower colors in floras (botanical encyclopedias). Some of the earliest Swedish language floras that contain color descriptions are Hartman (1866), Larsson (1868), Kindberg (1861), and Thedenius (1871). The descriptions of the flowers in Kindberg (1861) and Kindberg (1877) are identical, but the latter has more flowers. Another national flora came along at the turn of the century: Neuman and Ahlfvengren (1901), and two decades later Krok and Almquist also published a national flora (1920). For most of the 20th century the available floras were mostly reprints of earlier floras, or had too few of the relevant flowers to be included (this is why Hylander, 1953, is not used as a source). Later works are Mossberg, Stenberg, and Eriksson (1992), Sandberg and Göthberg (1998), and the comprehensive digital database DigiFlora (Nordenstam, Larsson, Hansson, & Tönnby, 2012).

A few floras contained only one or two pink or purple flowers, and each such flora has therefore been excluded from consideration for that color (e.g. Ursing (1944) is not a source for the pink flower material discussed in 6.1.2.1).

18 “Äldre Svenska Romaner Korpus,” 2015, Språkbanken (the Swedish Language Bank), Gothenburg University, http://spraakbanken.gu.se/, accessed 20150603. Total: 4.3 million tokens.

19 “Bonniersromaner I (1976/77),” 2014, “Bonniersromaner II (1980/81),” 2014, “Norstedtsromaner (1999),” 2014, Språkbanken, Gothenburg University, http://spraakbanken.gu.se/, accessed 20150603. Total: 13.4 million tokens.


Table 16. The etymological history of the terms (continued on next page).

Time columns: First attestation, 1840, 1860, 1880, 1900.

First attestations (SAOB):
Ceris: 1858 (about paper/fabric)
Violett: 1563 (fiolett, fiolidt, about silk, velvet)
Gredelin: 1694 (gredlin, about fabric)
Skär: 1808 (chair, about a hat)
Rosa: 1868 (about legs)
Lila: 1809 (lilas, about gown)

Entries, 1840-1900:
In NF Lexicon in entry on colors, mixed color from blue + red; violett + yellow; blue + orange
In NF Lexicon as a series of mixed colors
SAOL: adj
Rosa appears in fiction corpus data v. rarely.
Dalin: 1) light, clear, pure, re: e.g. skin, voice; 2) light red
SAOL: 1) adj, clean, pure; 2) adj, light red
SAOB: a certain nuance of röd ‘red’ or rödbrun ‘red brown’, similar to the col of ripe cherry
NF Lexicon: Color of lilac flower
NF Lexicon: Lilacs colored, light gredeline, pale violett
SAOL: noun.
Dalin: One of the major colors, 7th in rainbow, mix of blue and red.
NF Lexicon: gredeline is often used instead of violett
NF Lexicon: red + violett makes purpur
SAOL: adjective, noun
Dalin: color of flax flowers
SAOL: gridelin changes spelling to gredelin
NF Lexicon: violett is likened to lila
Violett in fiction corpus, about clothing (accessories), flowers. Describes similar domains as gredelin.
Lila not present in fiction corpus (except once, in 1928, describing a can)
SAOL: adjective
Gredelin in fiction corpus, about clothing (accessories), flowers. Describes similar domains as violett.
Skär frequent in fiction corpus data, re pure/clear voices; clothing (accessories), flesh etc.
Ceris not present in fiction corpus data
SAOL: adjective
SAOL/Dalin: Not listed, only rosaktig (rose-ish) or rosenfärgad (colored like a rose).
SAOL: 1) adj, clean, pure
Rosa not present in fiction corpus data


Time columns: 1920, 1940, 1960, 1970, 1980, 1990, 2000.

Entries, 1920-2000:
SAOB: A mixture of white and (pale) red (similar to the light red color of wild roses).
SAOL: adj
No fiction corpus data available
SAOB: pale red, light red, rose colored
SAOL: 1) adj, clean, pure; 2) adj, light red
SAOL: 1) adj, clean, pure; 2) adj, rosa
Hellquist: A pale violett
SAOB: A color (nuance) which is formed as a mixture of red and blue (similar to the violett color of the lilac flowers), a (pale) violett color.
SAOL: adj, noun, violett colored
SAOL: adj. But gredelint = noun
SAOL: adjective, noun
No fiction corpus data available
Gredelin mostly absent from fiction corpus
No fiction corpus data available
No fiction corpus data available
Skär frequent in fiction corpus data, mostly re flesh, flowers, clothing (& accessories). No clear/pure uses
SAOL: adjective, noun
Violett in fiction corpus, mainly about sky, but also about clothing (accessories), flowers etc.
No fiction corpus data available
SAOL: adj, clear red
SAOL: violett, color of lilacs
SAOL: 1) noun, violett color or color of lilacs; 2) adjective
SAOL: 1) adjective; 2) noun, violett color or col of lilac flowers
SAOL: adj, noun, blue-red, violett-blue
No fiction corpus data available
Lila in fiction corpus, often about clothing (& accessories), flowers etc.
SAOL: synonym to light red or skär
SAOL: adj, noun
SAOL: 1) adj, rosa; 2) adj, clean, pure, clear
Ceris not present in fiction corpus data
Rosa frequent in fiction corpus data, mostly re fabric, clothing (accessories) etc.


6.1.2.1 THE PINK TERMS: TEXTUAL DATA

Skär has the oldest attestation of the three pink terms (skär, rosa, cerise): in 1808 it was used to describe a hat. The word’s French origin was far more apparent in its first spelling: chair ‘flesh’ (SAOB, 2014, s.v. skär; entry orig. published 1975).

Skär is also the only pink color term that has its own entry in the Dalin dictionary, where it is treated as polysemous, having two main senses: 1) ljus, klar, ren ‘light, clear, pure’ (regarding e.g. skin, voice), and 2) light red (Dalin, 1850, s.v. skär). Historically, the two senses are homonyms: the second sense comes from the French loanword for flesh as mentioned above, but the first comes from Old Norse skaer ‘to shine’ – cf. English sheer ‘light, translucent’, as in sheer silk stockings. The ‘pale red’ and the ‘light/clear/pure’ senses are semantically close enough for a folk etymological reanalysis of skär as polysemous, and, with the exception of the etymological dictionary of Hellquist (1922), no other dictionary treats the two readings as homonyms. In the first two editions of SAOL (1874 and 1889), skär is only listed with its ‘pure, light’ sense. The light red sense mentioned 20 years earlier in Dalin is gone, despite the fact that much of the SAOL material came directly from Dalin. This omission probably indicates the lesser importance of the color sense of skär at the time.

Some background on the rosa term in Swedish was given in section 5.5 to facilitate comparison with the other Germanic languages. That brief description will now be expanded upon.

The first attestation of rosa as an independent lexeme is from 1868 (describing legs), but as a compound with -färgad ‘-colored’, or as a morphologically bound modifier to other words, it has scattered uses from the beginning of the 1800s. Rosa has no entry in Dalin (1850-1853). Instead, the dictionary mentions, under the entry for ros ‘rose’, both rosaktig ‘rose-ish’ and rosenfärgad ‘rose colored’.

Rosa is not mentioned at all in the early SAOL editions (1874 and 1889). The 1882 edition of the Nordisk Familjebok encyclopedia mentions rosa under the general “Color” entry: the encyclopedia tells us that rosa is a mixed color that can be attained by various mixes, such as red and blue, violett and yellow, or blue and orange (“Nordisk familjebok,” 1876–1899, s.v. färg).

In 1900, skär is listed with both its ‘pure’ meaning and its secondary ‘light red’ meaning in SAOL. Rosa is now listed, as is its compound rosafärgad ‘rosa colored’, but no explanation is given.

At the very beginning of the 20th century, then, skär is listed in dictionaries as light red (as a secondary meaning) and rosa only rarely has its own dictionary entry, but is defined in one encyclopedia as a mixed bluish-reddish color, the nuance of which is uncertain. The latter is confirmed in the next edition of Nordisk Familjebok (“Nordisk familjebok,” 1904–1926, s.v. Rosa), where rosa is described as a series of mixed colors. It is not until SAOB in 1940 that rosa gets its meaning more strictly defined: it is a mixture of red and white, a light red. Rosa gets no meaning explanation in SAOL until the 11th edition (SAOL 1986), where it is described as ljusröd ‘light red’, with skär as a synonym. From the seventh (1900) to the tenth (1973) edition of SAOL, skär is described as primarily ren ‘pure/clean’, and secondarily as ‘light red’. Then, in SAOL (1986), skär’s secondary sense is described as rosa instead of light red, and in the next edition, SAOL (1998), the color sense is promoted to the first listed sense.²⁰

Taken together, these sources give several reasons to believe that a new Swedish PINK concept was being partitioned off from RED at the beginning of the 19th century. Skär, the most likely candidate for a lexical label for the new concept, was described in dictionaries as pale red. Rosa was not yet an independent morpheme, and the fact that ros- could appear in compounds may be, according to Rakhilina and Paramei (2011), a typical sign of an early stage of a color term’s introduction in a language. Finally, as seen in chapter 5 (and also in Vejdemo et al., 2015), several neighboring Germanic languages lexicalized pink in the 18th and 19th centuries.

The history of the youngest of the three pink terms, cerise, was discussed in the previous study, in section 5.6. To briefly recapitulate, its first attestation is from as early as 1858 (when it described the color of fabric or paper flowers used for decoration), and for a long time it was part of a rather specialized fashion and fabric discourse. In 1904, it was described in SAOB as a certain nuance or certain nuances of röd ‘red’ or rödbrun ‘red brown’, which are similar to the color of the ripe cherry (SAOB, 2014, s.v. ceris; entry orig. published 1904). It did not get an entry in SAOL until 1986, and from that edition onward it was described as klarröd ‘clear(ly) red’. Foreshadowing the elicitation experiment results, it is worth noting that the categorization of cerise as a kind of röd, seen in SAOL, does not match the contemporary meaning of the word in spoken Swedish for younger people, though the association between röd and cerise is stronger for the older generation.

20 An anecdote: growing up in the early 80s, I was only ever aware of the color sense of skär and never of the ‘pure/clean’ sense – until recently I assumed that the old idiom ren och skär tur meant ‘pure and pink luck’ and that Skärtorsdagen, the Holy Thursday before Easter, commemorating the last supper in Christian mythology, should be understood as ‘the pink Thursday’, not ‘the pure Thursday’. Older speakers often find this amusing while my contemporaries tend to agree with my interpretations.

The dictionaries and encyclopedias give little information on the type of objects the words describe. Two corpora of Swedish fiction texts were used to investigate the collocations of skär and rosa: cerise does not appear in any of the fiction corpora. The older corpus contained Swedish novels from 1830-1942, and the newer contained Swedish novels from 1976-1999.

Table 17. Results, categorized by semantic domain of headword, from an older (1830-1942) and a newer (1976-1999) fiction corpus. The years given in the table indicate the first and last appearance of the word in the material.

Columns: older fiction corpus (Rosa 1909-1924; Skär 1839-1937); newer fiction corpus (Rosa 1976-1999; Skär 1976-1999).
Rows: accessories; clothing; color (noun?); fabric; flesh; flower; fossilized idiom; light, clear, pure; other; Total.
Counts: 1 18; 1 12 4 73 15 34 34; 25 13 889 6 41; 5 4 32 18 15; 2 25 17 1006 118 96 309; 25 3

Skär can be found in the earliest fiction works in the corpus (1839). Dictionaries and encyclopedias of this time did not always acknowledge the color reading of skär and instead highlighted its more abstract reading ‘light, clear, pure’. In contrast to this, even the earliest fiction books used the color sense of skär (about three-quarters of all uses) more frequently than the more abstract sense (about one-quarter of all uses). The color sense of skär in the oldest books in the corpus usually describes flesh or clothing (accessories), but also flowers.

There are only six instances of rosa in the older fiction corpus; the earliest is from 1909, the last from 1924.

In the newer texts (1976-1999), skär is still more common than rosa, and skär has mostly lost its ‘clear, pure’ abstract meaning: there are only three occurrences, in religious contexts. The ‘clear, pure’ sense has led to a fossilized idiom, ren och skär ‘pure and skär’, which is used in a similar fashion to the English cognate sheer (it was sheer luck/folly). In the color sense, however, skär still has the same common types of headwords in both older and newer texts: mainly flesh, clothing (and accessories), and flowers. In the realm of fashion, rosa and skär have similar numbers of hits in the newer corpus; both colors were originally introduced in Swedish in fashion contexts. Both rosa and skär are also used to a similar extent with fabrics (indoor decorating, etc.). However, since rosa is still the less common term, the fashion-related uses make up a far larger share of its uses than they do for skär. From the newer texts, the impression is that skär is the common, everyday word, and that while rosa might also make such a claim, it has a decided semantic focus on clothing (accessories) and fabrics. These data tell us little about the denotations of rosa and skär.

The gradual increase of rosa can also be seen in the botanical encyclopedias. Botanical encyclopedias are a specific genre – color descriptions serve the purpose of helping the botanist locate flowers in nature, but since the color of flowers varies within a species, while the number of petals or leaves remains the same, the color terms used are often short and plain. In the modern DigiFlora (2012), it is possible to search for flowers by a set number of color terms – rosa is one of them. There are no skär or cerise uses in the DigiFlora.

Eighteen flowers, described as rosa in DigiFlora (2012), were chosen – they are all wild, and their first recorded botanical find is from before the first flora under consideration was published. The results are presented in Figure 20.

If DigiFlora (2012) is compared with historical floras, it becomes clear that skär is a rare term within the botanical encyclopedia genre (though it does appear, as shown in the Introduction to this thesis, for the field bindweed flower). A lexical replacement has taken place in the botanical textual tradition, but it was from variations of the color term röd to variations of rosa. This can be contrasted with the fact that a large portion of the uses of skär in the newer fiction corpus (see Table 17) was for flowers. This seems contradictory until the kinds of flowers described in the corpus are considered: over half the flowers described as skär in the fiction corpus are roses, which indicates that the idea of skär ros ‘pink rose’ might be used in the literature because it has an important cultural significance. And it does: it is a symbol of affection.

In the botanical encyclopedias, a typical historical development can be illustrated by the flower Rosa Dumalis. This was called röd eller vit ‘red or white’ in Larsson (1868). Some thirty years later it was labeled blekröd ‘pale red’ by Neuman and Ahlfvengren (1901). In Krok and Almquist (1920), it is described as djuprosa ‘deep rosa’, and in DigiFlora (2012) it is called mörkt eller ljust rosa ‘dark or light pink’. The flower is the same, yet it has different color descriptors. This evolution, from a modified röd ‘red’ descriptor to a modified rosa ‘pink’ descriptor, is typical for the material.

Figure 20 illustrates the gradual waning of the röd descriptors and the waxing of the rosa descriptors. In the figure, -rosa- and -röd- indicate modified responses, like ljusrosa ‘light rosa’ or rödaktig ‘reddish’. Because of the singular frequency of rödlätt ‘reddish’, its occurrences are counted in a category of their own. Other color terms (like color compounds, vit ‘white’, blå ‘blue’) are not included in the graph. The very small number of flower descriptions for many years should lead to caution in interpreting the results. Not all of the 18 flowers investigated appear in every flora, and some appear without any color descriptor at all. In the 19th century floras, the PINK flowers are typically described as plain röd or as modified, non-prototypical cases of röd, such as rosenröd ‘rosy red’, ljusröd ‘light red’, or rödlätta ‘reddish, red colored’. Another (rarer) early alternative was to use compound color terms like rödblå ‘red blue’; purpurröd ‘purple red’; and rödviolett ‘red purple’ (not included in Figure 20).

Some differences can be noted between the 19th and 20th century floras. The use of rödlätt ‘reddish’ declines, though it makes a single reappearance in the 1920 flora (Hottonia Palustris). Rosenröd ‘rosy-red’ (Ononis Spinosa and Rosa Canina) becomes the most common -röd- modified term. This rosen- ‘rosy-; like a rose’ morpheme is clearly a precursor to rosa, but it cannot stand alone as an independent morpheme.

The first instance of rosa ‘pink’ as an independent headword in a flora is in Krok and Almquist (1920), in a modified form (djuprosa ‘deep rosa’). As an illustration of the emergence of the color term rosa, Rosa Canina was described as blekt rosenröd ‘pale rosy red’ in 1901, but as ljust rosa-vit in 1920.

By the 1992-1998 sources, the plain röd ‘red’ descriptions are gone, and modifications of -röd- and plain rosa ‘pink’ are the rule (Sandberg & Göthberg, 1998). In the digital flora (Nordenstam et al., 2012), the -röd- modifications are even fewer, and plain rosa and modified -rosa- are the typical color description tools.

The move from röd to rosa as the main descriptor in the floras does not happen equally fast for all parts of the pink perceptual color space. The lightest PINK flowers are the first to change from a modified röd descriptor to a rosa descriptor. Malva Moschata and Stachys Palustris are identified as ljusröda ‘light red’ in even the earliest of the floras, but by modified -rosa- ‘pink’


6.1.2.2 THE PURPLE TERMS: TEXTUAL DATA

There are many signs that violett was the most salient general word for purple in the beginning and middle of the 19th century. It has the earliest first attestation of the three purple words (violett, gredelin, lila): it was already used in 1563 for describing silk and velvet (Hellquist, 1922). SAOB quotes many examples from the early 19th century, mostly having to do with clothing (SAOB, 2014, s.v. lila; entry orig. published 1940), and the Dalin dictionary, discussing the perceptual color space, calls violett a mix of red and blue and names it as one of the major color terms, the 7th in the rainbow (Dalin 1850-1853).²²

A second purple term, gredelin, has a first attestation from 1665, according to SAOB, but the first example of it as a modifier to a noun is from 1694, when it was used to describe fabric (SAOB, 2014, s.v. gredelin; entry orig. published 1929). The term has its own entry in Dalin (1850 s.v. gredelin), but is described as the color of the flax flowers (which are, to this author’s eye, typically blå ‘blue’, with sometimes a faint hint of lila ‘purple’), from which it etymologically derives (French gris de lin ‘grey of flax’).

A third term, lila (from French, denoting the lilac flower), leaves far less of a mark in the textual material from the early and middle 19th century. Its entry in SAOB has its earliest attestation from 1809, describing a white and lilas empire gown (note spelling) (SAOB, 2014, s.v. lila; entry orig. published 1940). Lila is not listed in the Dalin dictionary as an independent entry, nor is it mentioned in the Dalin entry on colors (1850, s.v. färg).

In 1874, the first SAOL Word List is published, and its entry for violett has no meaning explanation given, which is the dictionary norm for most common words. The gredelin entry has a note saying that the spelling and pronunciation of gredelin is changing from gridelin to gredelin – this would have made the word’s origin from French gris de lin ‘grey of flax’ more opaque to those speakers who knew French and might hitherto have made the connection. Opaqueness of etymological origin can be (but need not be) a sign of an increase in the salience of a term. Lila has no entry in SAOL (1874).

22 This dictionary entry is most likely influenced by the English cognate terms used in Newton’s (1660) treatise on color. Newton famously performed an experiment in which he shone a white light on a prism and observed that the light divided into a rainbow. He recognized seven “primary colours” in the continuum. The last three, at the end of the rainbow, he named blue, indigo, and violet. Waldman (2002, p. 193) writes that a contemporary English observer repeating the experiment would recognize that the color denotations of these terms have shifted: Newton's blue denotes what is now more appropriately called blue-green or cyan, and Newton's indigo now denotes what contemporary speakers would call blue.

The next piece of data comes from the Nordisk Familjebok encyclopedia published in 1893, where it is written that gredelin is often used instead of violett, suggesting that violett used to be the standard term, and that this status is now being challenged (“Nordisk familjebok,” 1876–1899, s.v. gredelin). Another sign that gredelin might be growing in salience at the turn of the century is that in the next Nordisk Familjebok encyclopedia, from 1912, lila in the lila entry is described primarily as a light gredelin, and secondarily as a pale violett (“Nordisk familjebok,” 1904–1926, s.v. lila). Gredelin is then, at this point in time, salient enough to be used to explicate other colors. It is difficult to ascertain whether the 1912 Nordisk Familjebok entry on lila can give us any information about the degree of synonymy between gredelin and lila (are pale violett and light gredelin the same? If so, why is pale used in one case but not the other? Is it a comment on a saturation difference between the terms?). This is a good example of the limitations of using dictionaries and encyclopedias as source material.

In 1918, a famous (in Sweden) children’s book called Tant Grön, Tant Brun och Tant Gredelin ‘Auntie Green, Auntie Brown and Auntie Gredelin’ by Elsa Beskow was published, and alongside the very salient color terms brun ‘brown’ and grön ‘green’, gredelin is given – not violett or lila (see Beskow, 1918). This heavily illustrated children’s book is about three Aunties who always wear the same color-coded dresses and accessories and live together in a small town, and the two children who move in with them. The book will become important later in this chapter. All this hints at an ongoing lexical replacement, where gredelin became more popular than violett.

The popularity of gredelin did not last long, however: in the Hellquist (1922) dictionary, lila is described as a pale violett (any mention of lila as a light gredelin is gone), and in the SAOB entry for lila (printed in 1940), lila is described as a blue-red mixed color similar to violett.

To summarize what we know so far, violett seems to have been the most salient term for purple in the early 19th century. It may have been challenged by gredelin by the late 19th century. In the beginning of the 20th century, lila is becoming a salient alternative to violett.

Some final textual data for this attempt to chart the history of lila, violett, and gredelin comes from the two corpora of older Swedish novels (published between 1830 and 1942) and newer Swedish novels (published between 1976 and 1999). The older novels have only a single example of lila (in 1928, describing a can), but 37 cases of gredelin, and 36 of violett. The oldest examples of violett in the novels occur as early as 1841, while gredelin is used in the novels for the first time in 1884. In contrast, the newer novels (1976-1999) have almost no examples of gredelin (8), but 49 of lila, and 89 of violett. The two competing words in the older corpus – violett and gredelin – are used with more or less the same headwords (clothing and clothing accessories, like hats and gloves, and flowers), showing no clear semantic demarcation of what the words can modify. The two competing words in the younger corpus – violett and lila – show some slight differences. The most common type of headwords for violett are related to the sky (twilight, sunset, etc.), with clothing and clothing accessories, descriptions of faces (cold lips, blushing or angry skin, etc.), and flowers as other domains. The most common headwords for lila, in contrast, are related to clothing and clothing accessories, and then flowers – sky phenomena are almost never described as lila.

Except for the lack of gredelin before 1884, when violett was the only one of the three purple words that was used in the works of fiction, there is too little data to see any internal differences (such as changing types of head words) within the time period of either corpus, i.e. a difference within the texts in the older fiction corpus and within the texts in the newer fiction corpus.
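The two corpora differ considerably in size (4.3 and 13.4 million tokens, according to footnotes 18 and 19), so the raw counts reported above may be easier to compare across periods when expressed per million tokens. The following Python sketch illustrates that arithmetic. It is not part of the original analysis: the names COUNTS, TOKENS, per_million, and normalized are hypothetical, and the sketch assumes that the footnote token totals are appropriate denominators.

```python
# Illustrative sketch only: per-million-token rates for the purple terms.
# Raw hit counts are those reported in the text for each fiction corpus.
COUNTS = {
    "older (1830-1942)": {"violett": 36, "gredelin": 37, "lila": 1},
    "newer (1976-1999)": {"violett": 89, "gredelin": 8, "lila": 49},
}

# Corpus sizes in tokens, taken from footnotes 18 and 19 (assumed denominators).
TOKENS = {
    "older (1830-1942)": 4_300_000,
    "newer (1976-1999)": 13_400_000,
}


def per_million(hits, tokens):
    """Return the number of hits per one million corpus tokens."""
    return hits / tokens * 1_000_000


def normalized(counts, tokens):
    """Map each corpus and term to its per-million rate, rounded to 2 decimals."""
    return {
        corpus: {
            term: round(per_million(hits, tokens[corpus]), 2)
            for term, hits in terms.items()
        }
        for corpus, terms in counts.items()
    }


if __name__ == "__main__":
    for corpus, rates in normalized(COUNTS, TOKENS).items():
        for term, rate in rates.items():
            print(f"{corpus:18} {term:9} {rate:6.2f} per million tokens")
```

Under these assumptions, violett rises in raw counts (36 to 89) but falls slightly in relative terms, which may be worth keeping in mind when reading the figures in Table 18.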

Table 18. Fiction corpus results, sorted by type of headword, from an older (1830-1942) and a newer (1976-1999) corpus. Lila appeared only once in the older corpus and has been excluded. The years given in the table mirror the first and last appearance of the word in the material.

Columns: older fiction corpus (Violett 1841-1935; Gredelin 1884-1931); newer fiction corpus (Violett 1976-1999; Lila 1976-1999; Gredelin 1976-1999).
Rows: clothing; accessories; flowers; sky; face; color (noun?); other; Total.
Gredelin 1976-1999: flowers 1; face 2; other 5; Total 8.
Counts for the remaining columns: 1 12 2; 111 4 8 610 7 9; 7 6 6 54; 13 16 38 1436 37 89 49; 1 3 9 37 8

To summarize, the fiction corpus data indicate that the purple words have all mainly been used about clothing, clothing accessories, and flowers, except for violett, which has more recently come to be used frequently with sky phenomena (such as sunset and sunrise). Violett is the oldest word, and once gredelin came along, they were both used to describe more or less the same domains. Gredelin had mostly disappeared by 1976. Lila came to be used in the first half of the 20th century. In the newer corpus, lila and violett are used to describe similar things, though violett is commonly used with regard to sky phenomena.

The botanical floras tell a story that is both similar and different to what is revealed in the fiction corpus (see Figure 21). In the contemporary DigiFlora, 70% of the purple flowers are described as violett, and 30% as lila (gredelin is never used). From these, I chose 14 flowers that were described with the term violett (including in compounds), 2 that were described as both lila and violett, and 4 flowers described with lila (including compounds). The flowers were all wild (except Crocus Vernus, which has now become wild) and had their first botanical find before the first Swedish flora used in this botanical overview had been published.

Figure 21 graphs all occurrences of the terms blå, röd, violett, gredelin, and lila in the floras, as used independently or in compounds (lilablå) or modified states (ljusblå). The floras reveal that the purple flower descriptions are more stable over time than those of the pink flowers. From the mid-19th century up until the end of the 20th century, blå ‘blue’, röd ‘red’, and violett ‘purple’ are the typical words used in the descriptions. There are two changes of interest, however. First, around the end of the 19th century, gredelin appears in a few floras, but then this color term disappears. This is consistent with how the term is treated in dictionaries, and with the fact that there is a dearth of gredelin in the fiction corpus from the 20th century. Second, lila has started appearing in the floras at the end of the 20th century: the botanical color term genre is clearly conservative and slow to change, but lila is eventually making inroads even there.


6.2 META-AWARENESS OF CHANGE: INTERVIEWS WITH OLDER SPEAKERS

The previous section (6.1) showed how the pink and purple words were described and used in texts. This section will present interview results from speakers born in the middle of the 20th century, on their perception of the meaning, and change in meaning, of the pink and purple words.

The interview results will be analyzed together with the textual results, in section 6.3. This historical information will then form a context for the elicitation experiment data from older and younger Swedish speakers.

6.2.1 METHOD AND MATERIAL

The interview material was gathered in the summer of 2014 with 18 older speakers of Swedish (median age 61) in semi-structured interviews. Paramei and Oakley (2014) noted that human color vision changes physiologically as people get older. Chromatic sensitivity undergoes a gradual decrease from the age of 30, and from the age of 60 onward the likelihood of degeneration in color vision increases even more. At the same time, older speakers are very much an important part of the language community, and the small changes that they experience in color vision might be a factor in the changing community discourse about color.

Speakers were asked about the associations they had to the three pink and three purple color terms. A manual categorization of the association answers into semantic classes was carried out, based on my native speaker semantic intuition.

Speakers were also asked how they would explain the meanings of the color terms to someone else. If the speakers needed more prompting, they were asked to describe the terms as if describing them to someone who did not know them, such as a learner of Swedish.

The speakers were also asked which (if any) of the terms they believed they had used as children, and whether they were aware of any shift in usage or meaning.

In addition to this data, supporting material will be quoted from Ambjörnsson’s (2011) work within gender studies. Ambjörnsson carried out interviews with Swedish parents and children on their view of rosa from a gender and power perspective.


6.2.2 RESULTS

Section 6.2.2.1 discusses the pink terms, and section 6.2.2.2 discusses the purple terms.

6.2.2.1 THE PINK TERMS: INTERVIEW DATA

Access to native speakers born around the 1940s and 1950s allows us to get a far higher quality of data than what the dictionaries and encyclopedias can provide from this time. Two of the 18 speakers did not recall what color terms they had used for pink as children, but the rest had quick recollections. Two speakers remembered using both skär and rosa as children, but a majority (14) were certain that they had only used skär for pink as children (several also mentioned that their parents used only skär and never used rosa). This is in sharp contrast with their current use: all speakers use rosa as their primary color word for pink. About half were uncertain about when the lexical replacement occurred, but the rest claimed that it was either in primary school or in high school. By comparing their birth years with their claims, the change seems to have happened for most of them during the 1960s, or possibly the early 1970s. The most common story was that skär had been the most salient term for pink during childhood, but that this had changed at some point during their school years to rosa, and that skär had either undergone a meaning change restricting it to a very light pink denotation or had become synonymous with rosa as a word for the entire pink region but was now archaic. None of the speakers recalled using cerise when they were small, but several recognized it as a word appearing in the 1980s, and as a fashion color typically used for fashionable clothes or accessories.

As for the current meanings, all except one speaker agreed that rosa is the largest color (has the largest denotation in color space). The single dissenting speaker was of the opinion that skär and rosa were synonyms. The remaining speakers were split between the opinion that rosa was a clear hypernym of skär and the opinion that skär was its own light pink color. Approximately half the speakers also thought that cerise was independent from rosa, while half saw it as a hyponym.

To summarize:

Rosa is a hypernym of skär: 9
Rosa and skär are independent colors: 8
Rosa and skär are synonymous: 1
Rosa is a hypernym of cerise: 10
Rosa and cerise are independent from each other: 8


Most of the interviewed speakers agreed that skär was a light color and cerise was a darker color (with a blue tint).

The speakers were also asked about their associations for the pink terms. The overlap with the typical headwords in the 1976-1999 fiction corpus is noticeable - see Table 17. The associations are listed in Table 19.

When asked about what things were typically rosa, speakers primarily gave answers belonging to the categories clothes (accessories), make-up and toys – with the addendum that these items were typically linked with girls and women. This leads to a meta-category labeled “Women and children’s things” in Table 19. Examples include flickor ‘girls’, kvinnor ‘women’, damkläder ‘ladies’ clothes’, tjej/flick-kläder ‘girls’ clothes’, barbiedockor ‘barbie dolls’, my little ponies, and flickrum ‘girls’ rooms’. Other frequent categories of responses could be labeled flowers (växter ‘plants’, 3 nyponblommor ‘rose hip flowers’, 3 ros ‘rose’, pion ‘peony’, äppelblom ‘apple flowers’, päronblom ‘pear flowers’, bougainvillea, fuchsia) and sweet foodstuffs (2 prinsesstårta ‘marzipan cream cake’, marsipanrosor ‘marzipan roses’, jordgubbsglass ‘strawberry ice cream’, yoghurt med smultron eller jordgubbar ‘yoghurt with (wild) strawberries’).

The association between rosa and girly things was also noted in Ambjörnsson’s (2011) book “Rosa – The Dangerous Color” (my translation of the title). Ambjörnsson interviewed Swedish parents and children about pink, and how they viewed it from a gender and power perspective. While Ambjörnsson’s research does not discuss the denotation of the Swedish words for pink, Ambjörnsson does confirm that the pink color (for which rosa is the most salient contemporary term) is clearly linked to the feminine: in Sweden, pink is often linked to low status when worn by men, especially men from lower socioeconomic classes. Men from higher socioeconomic classes can therefore use it (pink shirts, pink paper in the financial newspapers) to mark social distance (Ambjörnsson, 2011, p. 96). Ambjörnsson notes that pink is a difficult color for adult women, who wish to disassociate themselves from a connection between pink and the low status female gender, at the same time that they wish to raise the status of the female gender, and therefore raise the status of pink (Ambjörnsson, 2011, p. 55). In the interviews for this study, several male speakers made a point of distancing themselves from words for pink. One speaker first noted that he never talked about pink things, only to later indicate the reverse: he said that he frequently teased his teenage daughter, threatening to give her pink things, something she did not appreciate.

The associations for skär by the speakers in this study were similar to those for rosa – and some speakers said outright that they had the same associations for the two color terms. There are things typically seen as culturally “girly/childish” (like barbiedockor ‘barbie dolls’, dockor ‘dolls’, bäbiskläder ‘baby clothes’, tjejkläder ‘girl clothes’, kvinnokläder ‘women’s clothes’, dockkläder ‘doll clothes’, tyllkjol ‘tulle skirt’, spädbarn ‘babies’, småbarn ‘little children’), but also flowers, like växter ‘plants’, blommor ‘flowers’, nejlikor ‘carnations’, persika ‘peach’, fruktkött ‘fruit pulp’, onaturliga blommor ‘unnatural flowers’, rosor ‘roses’, and nyponros ‘dog rose’. The single most popular association for skär had nothing to do with girls, children, or flowers, however: it was gris ‘pig’, most likely due to the popular saying skär som en gris ‘pink like a pig’, and/or the skin color of pigs. Skin was also a recurring association in other ways – two people responded with hud ‘skin’, but there was also barnhud ‘kid’s skin’ and hudfärgad krita ‘skin colored crayon’. Two speakers mentioned women’s clothes in general, and three mentioned older ladies’ underwear.

The associations for skär by the speakers are similar to the common headword types in the newer fiction corpus, but in the corpus the flesh uses of skär are more prominent than in the association lists. Rosa is rarely used to describe flowers in the corpus, but different flowers are one of the most common associations for rosa by the speakers.

Cerise caused some speakers problems when it came to associations – several male speakers found the term difficult. Some responded that it was a fashion color, and they didn’t know anything about those things. Several speakers, mostly male, also brought up the same name, Bengt Grive, when talking about cerise. This is a reference to a sports commentator on black and white TV from 1960-1990 who was famous for making up creative color terms when describing the color of the clothes of competitors: he even got a special prize from the Swedish Academy in 1996 in praise of his linguistic inventiveness.

Fashion and flowers were the prevailing categories for cerise associations – for example, four speakers gave the association mode(färg) ‘fashion (color)’. Others mentioned indiska kläder ‘Indian clothes’, kläder ‘clothes’, textilier ‘fabrics’, konståkningskläder ‘figure skating clothes’. Two gave the example kvinnokläder ‘women’s clothes’, while another said klänning ‘dress’. There were also flowers: blommor ‘flowers’, körsbärsblommor ‘cherry blossoms’, syrener ‘lilacs’, bougainvillea, nejlikor ‘carnations’, and cyklamen ‘cyclamen’.


Table 19. Associations for the pink terms. Numbers indicate recurring answers.

Columns: Rosa, Skär, Cerise

Meta-categories: Nature, Organics, Artefacts, Women/Children's Things

Flowers
rose-hip flowers (3), roses (4), peony, apple flowers, pear flowers, bougainvillea, fuchsia, plants
flowers, carnations, unnatural flowers, roses, rose hip flowers, plants
cherry flowers, flowers (6), lilacs, bougainvillea (2), carnations, cyclamen

Fruit
peach, fruit meat
cherry

Skin
pig (4), skin
pig (6), skin colored crayon, skin (2), baby skin

Clothes (accessories)
girl clothes (4), women’s clothes (2), children’s clothes, tulle skirt, old fashioned bra, fabric bow
baby clothes, girl clothes, women’s clothes (2), tulle skirt, old ladies’ underwear (3)
fashion color (4), clothes, fabric, figure skating costumes, women’s clothes (2), dress

Make-up
make up, lipstick (2), rouge, nail polish
lipstick, nail polish

Toys
girly things, children’s things, barbie dolls, My Little Pony, daughter's room
barbie dolls, dolls, doll clothes, little children

Food
marzipan cream cake (3), marzipan roses, strawberry ice cream, yoghurt with (wild) strawberries
marzipan pig, marzipan rose
rosé wine

Other
hearts, flip flops, 18th century clothes, Gant clothes
flip flops, swim wear, kerchiefs, jewelry, made up color (3), wallpaper

No associations
1 speaker
1 speaker
1 speaker


6.2.2.2 THE PURPLE TERMS: INTERVIEW DATA

Most of the interviewed speakers were aware of a historic change in the meaning of purple color terms, although they were a bit uncertain about what the change entailed. The speakers’ present use of the Swedish purple terms was similar to their childhood use in this regard: lila was the most salient term for purple for all but one speaker (who said that violett was the standard term). No speaker used gredelin as the standard term. No speakers used all three terms in their everyday discourse at the time of the study.

Two speakers could not recall what terms they had used during childhood, but the remaining sixteen said that they used lila as the most common word during childhood. Some of them (5) said they also used gredelin (but not violett) and some (3) also used violett (but not gredelin). One remarked that violett was a turn-of-the-century color term; another that his parents might have used violett. Some mentioned the Beskow children’s book (Tant Brun, Tant Grön, Tant Gredelin) and said that the book made the term gredelin popular to them as children, while others indicated that terms like gredelin and violett were terms acquired in school, as alternatives to the “standard” lila.

The speakers had similar ways of describing lila, and somewhat less consensus for describing violett, and finally an interesting lack of consensus in their descriptions of gredelin. Fourteen of the eighteen speakers were able to describe a system of how, in their idiolect, the color terms were related or overlapped. Of these, ten speakers indicated that lila was the hypernym of both violett and gredelin. One indicated that lila was the hypernym of violett, but not gredelin, and one indicated that lila was the hypernym of gredelin, but not violett. Two indicated that the three terms represent three distinct colors.

To summarize:

Lila is a hypernym of violett and gredelin: 10
Lila is a hypernym of violett, but not of gredelin: 1
Lila is a hypernym of gredelin, but not of violett: 1
Lila, violett, gredelin are all independent of each other: 2
Unable to describe the relationship between the terms: 4
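The tallies above can be checked and turned into shares with a few lines of Python. This is only an illustrative sketch: the counts are copied from the summary, while the dictionary keys and the helper name `shares` are shorthand introduced here, not labels used in the study.

```python
# Counts copied from the summary of the purple-term systems (18 speakers).
# The short keys are hypothetical shorthand for the summary rows.
purple_systems = {
    "lila > violett and gredelin": 10,
    "lila > violett only": 1,
    "lila > gredelin only": 1,
    "three independent colors": 2,
    "no system described": 4,
}


def shares(counts):
    """Return each category's share of all speakers, in percent (1 decimal)."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


total_speakers = sum(purple_systems.values())
# Speakers who could describe how the three terms relate in their idiolect.
described = total_speakers - purple_systems["no system described"]
purple_shares = shares(purple_systems)

for label, pct in purple_shares.items():
    print(f"{label}: {pct}%")
```

The totals agree with the running text: eighteen speakers in all, fourteen of whom described a system.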

Violett was often described as a bluish lila by speakers – none treated it as a synonym of lila. The most consistent thing about the descriptions of gredelin was that there was no consistency – some speakers said gredelin was bluish, others reddish; some said it was a light color, others said that it was a dark color. Four speakers described their purple systems as lightness-based, but no two of them arranged their color terms in the same light-dark order (going from light to dark, the systems were: lila-gredelin-violett, violett-lila-gredelin, lila-violett-gredelin, gredelin-lila-violett). If gredelin is taken out of the picture, one can notice that three of the four lightness-based systems had violett as a darker kind of lila – this will be echoed in later findings as well (see the experiment results in section 6.4), where violett is typically used for the darker part of the red-blue perceptual color space.

The speakers were also asked for their associations for the color terms - according to them, what kind of things are typically [color term]? Their answers are listed in Table 20. The overlap with the typical headwords in the newer fiction corpus, in particular, is noticeable – compare Table 18. Lila is primarily associated with flowers. The most common associations are viol ‘violet’ (6 instances) and syren ‘lilacs’ (5), followed by midsommarblomster ‘wood crane’s bill’ (3). Six other flowers were mentioned (gökärt ‘bitter vetch’, bergsklematis ‘anemone clematis’, tulpaner ‘tulips’, bougainvillea, vårblommor ‘spring flowers’, näva ‘crane’s bill’). Less frequent domains of association were vegetables and fruits (2 aubergine, 2 plommon ‘plums’, 1 vindruvor ‘grapes’), clothes (1 kläder ‘clothes’, 1 sjal ‘kerchief’, 1 halsduk ‘scarf’, 1 galaklänning ‘gala dress’, 1 hatt ‘hats’), and items associated with indoor decorating (2 gardiner ‘curtains’, 1 kudde ‘pillows’, 1 sänglinne ‘bed linen’). One speaker (B14) was unable or unwilling to come up with any associations for lila. The three speakers who only mentioned a single item all mentioned a flower.

When asked about associations for violett, five speakers were unable or unwilling to come up with associations for typically violett items (though two of them noted that it would be similar to the things that were lila). The prototypical association is therefore less clear for violett than for lila. Flowers are an important category among the association responses, with skogsviol ‘dog violet’ (3) and viol ‘violet’ (2) as well as a light syren ‘lilacs’ and a simple answer of blommor ‘flowers’. Sky phenomena are another category for violett: himmel i solnedgång/skymning ‘sunset sky’ (2), ljusa åskmoln ‘light thunder clouds’ (1), del av regnbågen ‘part of the rainbow’ (1). Fruits and vegetables were also mentioned, and these were a subset of the fruits and vegetables mentioned for lila: 2 plommon ‘plums’, 1 aubergine.


Table 20. Associations for the purple terms. Numbers indicate recurring answers.

Columns: Lila, Violett, Gredelin

Meta-categories: Nature, Organics, Artefacts

Flowers
violets (6), lilacs (5), wood cranebill (3), other flower names (6)
violets (5), lilacs, flowers
bougainvillea, carnations, roses

Fruit
aubergine (2), plums (2), grapes
aubergine, plums (2)
aubergine (2)

Sky
sunset sky (2), light thunder clouds, part of rainbow

Decorations
curtains (2), pillows, bed linen
fabric (2)

Clothing (accessories)
clothes, kerchief, scarf, gala dress, hats, accessories
blouse, clothes
fashion clothes

Make-up
nail polish
lipstick, nail polish

Food
cream cake (homonym: grede-lin and grädde 'cream')

Other
companies with purple logos (3), politics, made up color, can of plums, teenagers' hair, sky, water, My Little Pony, kitchen utensils
letter, eyes, candy, jewelry
Auntie Gredelin (16)

No associations
1 speaker
5 speakers
2 speakers


Only two speakers had no associations for gredelin. The rest associated the word with the children’s book heroine Tant Gredelin ‘Auntie Gredelin’ mentioned earlier in this text. Two people also mentioned the vegetable aubergine, and there were also responses like gräddtårta ‘cream cake’ (there is an accidental homonym between grädde ‘cream’ and the first part of gredelin), bougainvillea, nejlikor ‘carnations’, rosor ‘roses’, läppstift ‘lipstick’, nagellack ‘nail polish’, and modekläder ‘fashion clothes’.

The speakers’ associations in Table 20 can be compared to the typical headwords in the newer fiction corpus. The first books in the newer fiction corpus were published when the interviewed speakers were around 20 to 30 years old. The associated words (Table 20) and the corpus headwords (Table 18) are similar in many respects: clothing (accessories), flowers, and fruits are described as both lila and violett by the speakers, and typical headwords also fall into these categories. Speakers often associate violett with the sky, and similarly many headwords in the fiction corpus are also sky phenomena. Gredelin, on the other hand, is more or less missing from the newer fiction corpus, but does get some flowers, fruits, and fashion and make-up related associations in the interviews. The majority of speakers have only one and the same association for gredelin, however: Tant Gredelin.


6.3 SUMMARY OF TEXTUAL AND INTERVIEW RESULTS

To summarize: when a Swedish pink color concept became established in the 19th and 20th centuries, two lexemes were used to describe it. The extant word skär merged with the loanword chair and was first analyzed as having a secondary color sense, and then, later, this became the primary sense. Rosa developed from a bound compounded morpheme (rosenröd, etc.) to an independent word. These two words were first defined as variants of röd in dictionaries and then, later, were defined as versions of one another. In botanical floras, röd has steadily decreased as a descriptor of pink flowers – and the replacement of röd happened to the palest pink flowers before those of a darker hue.

It is difficult to judge the relationship between the words from the dictionaries, but skär was established in its pale red sense before rosa, and skär was the more salient of the two terms in the beginning of the 20th century. In the fiction literature, this dominance by skär was true as late as 1976-1999, as evidenced by the books from this time period. This can be contrasted with the interviews with the speakers born in the middle of the 20th century. Most speakers agree that skär was the most salient color term for pink when they were children but that the most salient color term for pink at their present age is rosa. The shift seems to have happened in the 1960s or early ’70s in spoken language. It is not strange that the fiction corpus from 1976-1999 does not yet reflect this, since fiction literary standards might well be conservative (maybe even more so than the journal article genre discussed in section 2.6, where it took around a decade for real-world changes to be reflected in language; see also Juola, 2003).

Based on the opinions of the speakers in this study, in contemporary Swedish, rosa is the general word for pink, and skär is a light pink. Cerise is, according to the interviewed older speakers, a darker pink – something that matches the PINK2 concept defined and discussed in chapter 5.6. This dark pink color meaning is rather recent: cerise was described as red or red-brown in SAOB in the beginning of the 20th century. It was not present in the fiction corpus and has been described as red ever since it first appeared in a SAOL edition in the 1970s. Judging from the interviews, cerise entered speakers’ awareness in the 1980s.

As for words for purple, 19th century Swedish speakers most often used the label violett for this red-blue derived color. The color was recognized in an encyclopedia as one of the major colors of the perceptual color space.


Violett was, partly successfully, challenged by gredelin for a few decades around the beginning of the 20th century (as seen in dictionaries, floras, and the fiction corpus), and also, later and more successfully, by lila. Most of the details of this “fight for supremacy” between the words are unavailable to us, but for the interviewed speakers, who grew up in the middle of the 20th century, lila had become the standard term, with violett and gredelin as secondary terms.

The corpus searches returned no uses of lila in the Swedish novels corpus before 1928. Before that, violett and gredelin were used, for mostly the same kinds of headwords, typically clothing, clothing accessories, and flowers. Gredelin is all but gone in the 1976-1999 novels, while violett and lila are used with approximately the same headwords.

For people born around the middle of the 20th century, lila is today mostly associated with flowers and, to a lesser extent, with fruits, clothes, and indoor decoration. Roughly a third of the participants were unable to think of something that was prototypically violett, but flowers, the sky, and some fruits were mentioned. While there was little consensus among these speakers about what part of the color space gredelin actually denotes, most of them very quickly agreed on something that had this color, the fictional Tant Gredelin ‘Auntie Gredelin’. Comparing the speakers’ intuitions with textual co-occurrence in a historical corpus (1830s-1940s) reveals mostly similarities, though the amount of data is quite sparse.

The kind of competition evidenced by violett and gredelin in the 19th century and violett and lila in the 20th century can be seen as a natural developmental step for new color categories arising in border regions (like red-blue). This historical overview has shown that this competition can be lively for many generations of speakers. Rakhilina and Paramei (2011) predicted that new color terms would be mostly used for describing artificial (manmade) objects and only eventually start to modify natural objects. This is partly shown in the corpus results, where clothes and clothing accessories (taken together) are the most frequent headwords for both violett and gredelin in the older corpus, followed by description of flower colors. By the later corpora, violett and lila are also used for clothing and clothing accessories (flower color descriptions come next, frequency-wise), but now the number of other uses, not so easily categorized into semantic domains, has risen dramatically. All the color terms referring to the purple part of the color space seem to go, over time, from specialized usage to general usage. The fact that the terms often refer to the colors of flowers gainsays Rakhilina and Paramei’s (2011) hypothesis that the more established a color term gets, the more its use shifts from describing artifacts to describing also natural occurrences – but it is worth noting that both color terms were initially derived from flower names. Flowers are also very prominent in the association responses of the speakers (born in the 1940s and ’50s).

While there are some signs of semantic restrictions on the kinds of headwords the Swedish purple and pink terms could name, these do not come out as dramatic in the current material. Instead, the textual and interview data show that for these two color concepts, the blue-red derived color of purple and the red-white derived color of pink, different words have been the most salient at different times. The lexical change is not sudden, but gradual. The material can tell us little about what actuated the changes when they did occur.

The dynamic color terminology for pink and purple is in stark contrast with the conservative labeling of colors like red and blue in Swedish. The primary word for these colors has not been in doubt at all in the last few centuries. However, this does not mean that, onomasiologically, the meanings of röd 'red' and blå ‘blue’ have not changed – as language specific PURPLE1 and PINK1 concepts have carved out a recognition in the Swedish color space, the applicability of the older color concepts has been altered – the floras show that pink and purple flowers used to have more röd 'red' and blå ‘blue’ terms in their descriptions than they do now.

Naturally, works of fiction may reflect the idiolect or artistic choices of the author. Likewise, encyclopedias and dictionaries are problematic sources for the meanings and use of words – the methods of the compilers are often not stated, and there might be prescriptive leanings and censorship against unwanted linguistic forms. Moreover, Petersen, Tenenbaum, Havlin, and Stanley (2012) have shown that it takes on average about 40 years for a lexical item to be entered into dictionaries after it first starts gaining prominence in (written) language use. In addition, written language is often more conservative than spoken language, which further complicates the issue. The first attestation of a term in written form is usually later than its first use in spoken language. Lexicons and dictionaries are therefore of limited help when it comes to pinpointing exact dates of a lexical item’s birth or decline – but successive dictionaries and encyclopedias can tell us something of the different stages of the meaning of a word, if not when these developments happened. Interviews with speakers about their awareness of lexical change are also problematic since speakers might misremember or (un)consciously misrepresent their own language use, but when speakers have a high consensus in their descriptions of development, this will more reliably reflect actual meaning and usage situations. When several sources align and point in the same direction (triangulate), it will be possible to say something about the historical development of the lexicalization of the pink and purple color region in Swedish. A historical review can, unfortunately, only speculate on the denotational details of these lexical change and replacement processes - something that motivates the more in-depth experimental research in section 6.4.

6.4 CAPTURING INTERGENERATIONAL DIFFERENCES THROUGH COLOR ELICITATION

Sections 6.1 and 6.2 provided a historical context to the present elicitation study, giving us a first look into the changing meanings of the Swedish purple (lila, violett, gredelin) and pink (rosa, skär, cerise) color terms. This section focuses on the intergenerational differences in color naming between two generations of Swedish speakers and what this tells us about lexical change and replacement.

There are two overarching questions that will be discussed in the sections to come. The first is whether any intergenerational differences found in color term categorization are randomly distributed or focused on predictable parts of the perceptual color space. The Berlin and Kay paradigm would suggest that derived color categories like pink and purple are relatively recent, which makes it reasonable to expect that language change (operationalized as difference between the two generations) is more clearly seen in these areas and their perceptual neighbors.

The second issue has a descriptive part (how do lexical replacement and change proceed for words describing pink and purple?) and a theoretical part (does this conform to, or suggest amendments to, published theories of lexical replacement and change?).

6.4.1 SUPPLEMENTARY NOTES ON METHODOLOGY

The color elicitation experiment made use of the EoSS experiment protocol discussed in section 4.3, but with some modifications and new analysis approaches that will be discussed in this section.

It has so far been unusual to use the well-known color term elicitation method for comparing two generations of speakers of the same language (though see Desgrippes, 2011, whose work on French was already mentioned in chapter 4). The standard way of using the method is to compare two or more languages with one another, as was done in chapter 5.


In order to capture the diachronic aspects of lexical change, I will in this chapter make use of the apparent time construct (see e.g. Robinson (2010a, 2010b) and D’Arcy (2006) for other semantic studies using this method). The apparent time construct contrasts the behavior of an older group of speakers with the behavior of a comparable, younger group of speakers. The difference between the age groups is assumed to reflect, to a large extent, diachronic change in the language community. The apparent time construct rests on the assumption that language patterns mostly stabilize after childhood, and then change relatively little during the lifespan. Research into color term acquisition, discussed in section 4.2.5, indicates that stabilization of color term inventory occurs in speakers before adulthood, though at different ages in different language communities. This stabilization is not an absolute stop for new color terms – indeed new colors will wax and wane in popularity in society, and adult speakers can certainly innovate. In the case of pink and purple terms in Swedish, it is noteworthy that many of the older speakers are very much aware both of a case of lexical replacement that happened when they were young, as seen in the interview data in section 6.2 with regard to them replacing skär with rosa, and of the introduction of a new color term, cerise, around the 1980s.

Since members of an age group (with a similar sociocultural background) are likely to experience similar real-world changes, their language changes should also be similar. The similarities in the language pattern of an age group could then both be a result of similar childhood experiences, leading to a fossilization of a large part of language by the time they are adults, and also due to similar experiences later in life.

The diachronic analysis that the apparent time construct provides should be complemented with other methods, as is being done in this thesis.

Subjects: The younger group’s data is the same as that used for the Swedish part of the cross-linguistic comparison in chapter 5. The group was first introduced in section 4.3. To briefly reiterate, the group consists of 19 individuals born between 1974 and 1992 (an 18-year span) who were all interviewed during the summer of 2011. Ten were female, nine male, and the median age was 25.

The older group consists of 19 individuals born between 1947 and 1959 (a 12-year span), with a median age of 61. Eighteen of these (nine male and nine female) were interviewed during the summer of 2014. The participants were recruited among the staff at Stockholm University, and subsequently by word-of-mouth referrals. One of the older participants (a male) was interviewed alongside the younger group in 2011. In this study, his results have been grouped together with his age peers.24 One speaker had several errors in the color blindness test, but no deviant response pattern was detected in either a visual inspection of their responses or in an MDS clustering visualization – for details, see Appendix E.

Experiment tasks: The elicitation experiment included the color naming task and best example task, the methodologies of which were discussed in section 4.3. It is important to note that the older and younger groups of Swedish speakers did not do the exact same experiment task. Both groups did a best example task, but the older speakers were also asked to point out the best example of four terms that the younger speakers were not asked about: violett ‘purple’, gredelin ‘purple’, skär ‘pink’, and cerise ‘pink’.

6.4.2 RESULTS

Naming task and best example task results are available in Appendices B and C, respectively.

I will briefly reiterate the point of the different experiment tasks, and what bearing they might have on the research questions. The best example task can tell us whether, for a particular color term, there are differences between the generations in the location of the prototypical center (often referred to as the color’s focal point). Given the tendency for color concepts even in different languages to center on similar parts of color space, it would be surprising if the best examples differed much between the Swedish generations. However, a lack of consensus on the best examples of color terms might be interesting: for instance, if almost all speakers in both generations agree on the best example for röd ‘red’, and there is a lot of disagreement on the best example for rosa ‘pink’.

In contrast to the best example of a color term, the naming task results show the denotational footprint of a term: its boundaries in color space. This denotational footprint is more likely to change over time than the best example placement.

24 The data from the younger group and the lone older participant recorded in 2011 was collected within the international EoSS project and was subsequently entered into the EoSS database where it represents the overall Swedish color vocabulary. It was also analyzed in Vejdemo et al. (2015). For a general introduction to the EoSS project, see Majid, Jordan, and Dunn (2011). The color set was developed by Majid and Levinson (2007) and Majid (2008).

Before turning to the generational differences for the pink and purple words in 6.4.4, the next section (6.4.3) will discuss the variation between and within the two generation groups for the entire color spectrum.

6.4.3 DIFFERENCES OVER THE ENTIRE SPECTRUM

This section will discuss generational differences in the distribution of the color term use over the stimuli, differences in the way the generations use color term modifiers, differences in the complexity of results, and differences in the amount of internal consensus within each group when it comes to both the naming task and the best example task results.

The younger speakers used on average (median) 16 color terms as main responses to label the stimuli in the naming task, but one speaker used as few as 13 different terms, and another as many as 22. The older speakers had a similar verbosity: 17 color terms on average (median), with a minimum of 13 and a maximum of 25.

All in all, there were 11 terms that were used more than 100 times as responses by both older and younger speakers. These were grön, blå, lila, rosa, gul, röd, brun, turkos, orange, svart, and grå, and their relative frequencies are shown in Figure 22. There were an additional eight terms that were used between 10 and 100 times: vit, cerise, skär, lime, violett, hudfärg(ad), beige, oliv. Many terms differ in frequency of use between the generations, but as can be seen in Figure 22, the largest difference in frequency of use can be found in svart, blå, rosa, and röd: the younger generation uses more rosa and svart and less röd and blå.

example in Figure 23, rosa was found to have a different distribution between the groups, while gul was not. Other terms that were used differently were skär ‘pink’, violett ‘purple’, blå ‘blue’, röd ‘red’, and svart ‘black’.26 In addition, two other terms (lila ‘purple’, p=0.052; and gredelin ‘purple’, p=0.072) were borderline significantly different. By the tentative translations given above, the terms used differently by the generations (and the two borderline cases) can be sorted into a pink group, a purple group, red and blue (crucial perceptual neighbors to purple and pink) and finally black. There is lexical change in Swedish speakers’ labeling of the color spectrum, and this lexical change is more pronounced in the pink and purple parts of the color space than in many other areas.

The robustness of these results can be shown by a leave-one-out test. One by one, each participant was removed from the dataset and the Wilcoxon test was repeated. The result gives a distribution of p-values. In Table 21, the p-values of the entire dataset are given in the second column; the third column holds the mean p-value of each test where one young speaker was left out; the fourth column holds the mean p-value of each test where one old speaker was left out. While the mean p-values for the “leave one out” tests are very similar to the p-values for the entire data set, demonstrating the robustness of the results, two color terms get different results: blå ‘blue’ and lila ‘purple’. Their two cells are shaded in the table.

When the entire data set is considered, there is a difference in the way the groups use the blå and lila color terms. For blå, this difference holds if younger speakers are removed, but if older speakers are removed, the differences between the groups disappear. For lila, the difference in use between the groups disappears if younger speakers are removed from the testing.
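The leave-one-out procedure described above can be sketched in a few lines of Python. This is a minimal illustration and not the analysis code behind Table 21: the per-speaker values are invented, and `rank_sum_p` is a hypothetical stand-in that implements an unpaired rank-sum test with a normal approximation, which is only an assumption about the variant of the Wilcoxon test that was used.

```python
import math
from statistics import mean


def rank_sum_p(x, y):
    """Two-sided rank-sum p-value (normal approximation, tie-corrected,
    no continuity correction). Hypothetical stand-in for the thesis's test."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2))


def leave_one_out(young, old, test=rank_sum_p):
    """Return (p for all data, mean p with one young speaker removed,
    mean p with one old speaker removed)."""
    full = test(young, old)
    young_out = [test(young[:i] + young[i + 1:], old) for i in range(len(young))]
    old_out = [test(young, old[:i] + old[i + 1:]) for i in range(len(old))]
    return full, mean(young_out), mean(old_out)


# Invented per-speaker counts of how often one color term was used.
young_counts = [12, 15, 11, 14, 13, 16]
old_counts = [7, 9, 8, 10, 6, 9]
print(leave_one_out(young_counts, old_counts))
```

If the two mean p-values stay close to the p-value for the full data, no single speaker drives the result; a large jump in one column, as for blå and lila in Table 21, points to a result that depends on individual speakers in that group.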

26 An additional term was used significantly differently (p=0.04): mauvaise, used five times by a single speaker for purplish hues. It was never used by an older speaker and will not be discussed further.

Table 21. Results from the Wilcoxon test with all data (left column), mean p-value with one younger speaker left out (middle); mean p-value with one older speaker left out (right)

Term       All data   YOUNG - one left out   OLD - one left out
           p-value    Mean p-value           Mean p-value
skär       0.0029     0.003                  0.0062
blå        0.0036     0.0003                 0.2336
violett    0.0042     0.0041                 0.0113
röd        0.0081     0.0042                 0.0364
rosa       0.0092     0.0175                 0.006
svart      0.0119     0.0113                 0.0175
lila       0.0516     0.5176                 0.0057
gredelin   0.0719     0.0719                 0.1248

So far we have only looked at results for the main responses, but differences between the groups also emerge when their use of color term modifiers is considered. Since the lightness modifiers ljus- and mörk- are the single most popular way of modifying color terms across the perceptual color space, one angle from which to consider how the pink and purple areas might differ from the rest of the color space is the sheer frequency of use of these modifiers. When the frequency of lightness modifier usage is considered for all color terms mentioned in the naming task, color terms connected to the pink and purple area stand out as among the most different between the generations (see Figure 24). The clearest differences in the use of the lightness modifiers are in how the ljus- modifier is applied to lila, rosa, and röd: of all answers containing lila, 14% contained ljuslila in the older group, compared to 20% in the younger group (an increase of 6 percentage points, pp). The share of ljusröd responses was 7 pp lower in the younger generation, and the share of ljusrosa responses was 3.5 pp higher. This raises the question of what alternate terms ljuslila and ljusrosa “replaced”, and what terms the younger group started using instead of the ljusröd label of the older group.

Table 22. Breakdown of non-plain elements in answers into subcategories.

Category                   Example                                          Young                  Old
ljus- ‘light’ modifiers    ljusblå ‘light blue’                             22%                    17%
mörk- ‘dark’ modifiers     mörkröd ‘dark red’                               21%                    18%
Other modifiers            jättegul ‘very gul’,                             27%                    28%
                           knallrosa ‘shocking pink’
Color compounds            rödgul ‘red yellow’                              18%                    15%
Hesitations                kanske gul ‘maybe gul’,                          12%                    23%
                           vad är det där oj röd ‘what is that oh röd’
TOTAL                                                                       100%                   100%
                                                                            (1017 elements)        (1257 elements)

Table 22 shows a rough categorization of the non-plain elements for the entire perceptual color space. For both generations, the lightness modifiers ljus- and mörk- are the most common way to modify a color term, but the speakers also use other modifiers. Speakers are more than three times as likely to use a modifier (such as jättegul ‘very yellow’) than a color compound of two color terms (such as rödgul ‘red yellow’). The older generation uses a more diverse set of modifiers than the younger generation and also has a lot more hesitations. Table 23 and Table 24 contrast the modifier use for the entire color space with the modifier use for the PINK1&PINK2 area (treated together, since PINK2 is a subsection of PINK1) and the PURPLE1 area, respectively.

By comparing the projected number of different modifiers for the PINK1&PINK2 or PURPLE1 area with the actual number of modifiers that the speakers used, it is possible to say whether speakers use modifiers in an unexpected way for the areas in question. One-sample chi-square tests comparing the actual and expected numbers of modifiers confirm that the generations’ use of modified forms was statistically different from what would have been expected given the speakers’ overall response patterns, for the PINK1&PINK2 area (young group: DF=5, χ²=24.95, p<0.001; older group: DF=5, χ²=20.0, p<0.01) and the PURPLE1 area (young group: DF=5, χ²=24.8, p<0.001; older group: DF=5, χ²=15.69, p<0.01). A two-sample chi-square test shows that the two groups’ modifier use was different from each other for the PINK1&PINK2 area (DF=5, χ²=28.12, p<0.001) and the PURPLE1 area (DF=5, χ²=18.09, p<0.01).

As an example, one such difference between the age groups is the greater hesitation shown by the older speakers for the PINK1&PINK2 area: the older speakers supplied hesitant answers 51 times inside PINK1&PINK2. Since the PINK1&PINK2 area covers 14% of all stimuli, and the total number of hesitations was 283, one would have expected only about 40 (=283x0.143) hesitations from older speakers.

Another example is that the younger speakers used only half the number of color compounds in their answers for the PURPLE1 area as would have been expected. The color compounds used for PURPLE1 are never blåröd ‘blue red’ or rödblå ‘red blue’, despite the fact that purple is a border color between blue and red. There are also no rödvit ‘red+white’ or vitröd ‘white+red’ compounds in the answers given in response to the PINK1&PINK2 stimuli, which might have been expected given that pink is often described as a mix of red and white. In the purple area, the color compounds are typically combinations with röd ‘red’ and lila ‘purple’ (and violett, but only in the older group), and in the pink area the combinations are typically with röd ‘red’ and rosa ‘pink’.
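The comparison of projected and actual counts can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the original analysis script: the counts are the younger group's values from Table 23, the area share 0.143 is taken from the worked hesitation example (283x0.143), and the function names and the hard-coded critical values for five degrees of freedom are introduced here for the example.

```python
def projected_counts(all_counts, area_share):
    """Counts expected in an area if modifier use were spread evenly
    over the stimuli (area_share = fraction of stimuli in the area)."""
    return [count * area_share for count in all_counts]


def chi_square_stat(observed, expected):
    """One-sample (goodness-of-fit) chi-square statistic."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Standard chi-square critical values for 5 degrees of freedom
# (six categories minus one).
CRITICAL_DF5 = {0.01: 15.086, 0.001: 20.515}

# Order: ljus-, mörk-, other modifiers, color compound, hesitation, plain.
young_all = [224, 212, 275, 181, 125, 755]      # all answers (Table 23)
young_pink_actual = [49, 11, 36, 22, 11, 112]   # PINK1&PINK2 area (Table 23)

expected = projected_counts(young_all, 0.143)
stat = chi_square_stat(young_pink_actual, expected)
print(round(stat, 2), stat > CRITICAL_DF5[0.001])

# The hesitation example from the text: 283 hesitations overall
# in the older group, projected onto the PINK1&PINK2 area.
print(round(283 * 0.143))
```

With these inputs the statistic comes out close to the value reported for the younger group in the PINK1&PINK2 area. Using the rounded projections printed in Table 23 instead of the unrounded products gives a slightly different number.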

Table 23. Actual and projected numbers of non-plain elements for the PINK1&PINK2 area (14% of the color stimuli).

                  All non-plain elements   Projected non-plain elements   Actual non-plain elements
                  in all answers           in the PINK1&PINK2 area (14%)  in the PINK1&PINK2 area (14%)
                  Young     Old            Young     Old                  Young     Old
ljus-             224       212            32        30                   49        37
mörk-             212       220            30        31                   11        7
other modifiers   275       349            39        50                   36        48
color compound    181       193            26        28                   22        26
hesitation        125       283            18        40                   11        51
plain answers     755       640            108       91                   112       108
Total             1772      1897           253       271                  241       277

Table 24. Actual and projected numbers of non-plain elements for the PURPLE1 area (18% of the color stimuli).

                  All non-plain elements   Projected non-plain elements   Actual non-plain elements
                  in all answers           in the PURPLE1 area (18%)      in the PURPLE1 area (18%)
                  Young     Old            Young     Old                  Young     Old
ljus-             224       212            40        38                   55        38
mörk-             212       220            38        39                   39        42
other modifiers   275       349            49        62                   29        37
color compound    181       193            32        34                   16        31
hesitation        125       283            22        51                   19        41
plain answers     755       640            135       114                  152       133
Total             1772      1897           316       339                  310       322

Another angle from which to compare the variation both between and within the groups is to look at the amount of consensus – or, inversely, the amount of disagreement – that the groups have when it comes to their use of color terms.

One way to operationalize and measure consensus is by the (Shannon) entropy of the responses. When all speakers agree on how to label a particular stimulus, the amount of entropy in the distribution of their answers is low. Conversely, when speakers disagree, the amount of entropy is high. Entropy is typically measured in bits – here a measure “rbits” will be used, which is the ratio bit value, or the normalized bit value. This is obtained by dividing the bit value for a particular stimulus by the maximum bit value (the value that would result if all speakers had given completely different answers).27
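The entropy measure can be made concrete with a small Python sketch. It follows the description in footnote 27 (probabilities are the answer counts divided by the number of answers, and the maximum is the entropy obtained when every responder gives a different answer); the function names are introduced here for illustration and are not taken from the thesis.

```python
import math


def entropy_bits(counts):
    """Shannon entropy in bits of a distribution of answer counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def normalized_entropy(counts):
    """Entropy divided by the maximum for this many responders (rbits).

    The maximum, log2(n), is reached when all n responders give
    different answers.
    """
    n = sum(counts)
    if n <= 1:
        return 0.0
    return entropy_bits(counts) / math.log2(n)


# The two answer distributions used as examples in footnote 27.
for counts in ([7, 9, 1], [7, 6, 4]):
    print(counts, round(entropy_bits(counts), 2), round(normalized_entropy(counts), 2))
```

Full agreement (all answers identical) gives 0 rbits and complete disagreement gives 1 rbit, which is what makes values comparable across stimuli and across groups of different sizes.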

In Figure 25, the entropy for each color stimulus is given in rbits, and the background of each cell is color coded. The darker the background (the higher the number of rbits), the more the speakers within the older group disagreed. Lighter cells (low number of rbits) indicate agreement inside the older group.

27 The normal Shannon entropy is calculated as $H = -\sum_i p_i \log_2 p_i$ (Shannon & Weaver, 1964), where “p” is the probability of occurrence. There are different ways of calculating p. The original Shannon and Weaver (1964) method works well on symbols: p is then calculated as the likelihood of a symbol (e.g. “a”) appearing in the whole text (“a” has 2/4 likelihood of appearing in “abac”). By this method, the strings “abac” and “defd” have the same entropy, and so do both the number series “7 9 1 7” and “7 6 4 6”.

In this thesis, I use a slightly different method to estimate the probability p, which is adapted to a distribution of numbers, not a distribution of symbols. As an example: for stimulus S, 7 people answered blue, 9 answered green, and 1 answered turquoise. We wish to calculate the entropy for the distribution [7, 9, 1] in a way that recognizes that the distribution [7, 9, 1] shows higher group consensus than [7, 6, 4]. Therefore p is calculated by dividing each count by the entire count. For [7, 9, 1] this produces [7/17, 9/17, 1/17], and for [7, 6, 4] this produces [7/17, 6/17, 4/17].

In these examples, the entropy contributions are, respectively, for [7, 9, 1] the values [0.53, 0.49, 0.24] and for [7, 6, 4] the values [0.53, 0.53, 0.49]. This produces the total entropy measures 1.25 bits for [7, 9, 1] and 1.55 bits for [7, 6, 4]. The lower the entropy (bit value), the lower the disagreement.

Finally, I will use normalized entropy (Hassibi & Shadbakht, 2007), with the unit rbits. This is computed as a ratio: the entropy in bits (calculated by the method explained above) is divided by the maximum entropy that would have resulted if every responder had used a different color term for the stimulus. For 17 responders, this maximum entropy comes out to -17*(1/17)*log2(1/17) = log2(17) ≈ 4.09 bits. Thus the answer distribution [7, 9, 1] has normalized entropy 0.31 rbits (1.25 bits / 4.09 maximum bits), and [7, 6, 4] has normalized entropy 0.38 rbits.

Figure 25. Group internal consensus in the older generation's naming task. Font color varies for visibility.

Figure 26 shows the disagreement/consensus per color stimulus, but this time for the younger group. Lighter cells (low number of rbits) indicate agreement.

Figure 26. Group internal consensus in the younger generation's naming task. Font color varies for visibility.

Both the older and the younger group have the most internal disagreement for cell A3 (on the border between pink and yellow), for example, and both groups have absolute consensus when it comes to labeling cell A0 (pure white).

Figure 27 shows the amount of disagreement between the generations, measured as the absolute difference in rbits – for example, cell A11 had 0.4 rbits of information in the younger group and 0.3 rbits of information in the older group, so the difference is 0.1 rbits. The background color of each cell is color coded on a gray scale: the darker the background color, the more the two generations differ in their amount of consensus.
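The per-cell comparison can be written out as a short Python sketch. The four cell values for each group are read from the per-cell grids for the older and younger generation (cells A11 to A14); the dictionary layout and function names are assumptions made for this illustration only.

```python
from statistics import mean

# rbits per cell, as labeled in the per-cell grids (row A, columns 11-14).
older = {"A11": 0.30, "A12": 0.28, "A13": 0.29, "A14": 0.06}
younger = {"A11": 0.40, "A12": 0.39, "A13": 0.19, "A14": 0.06}


def consensus_gap(group_a, group_b):
    """Absolute difference in rbits per cell, rounded to two decimals."""
    return {cell: round(abs(group_a[cell] - group_b[cell]), 2) for cell in group_a}


def mean_rbits(grid, cells):
    """Average disagreement over a set of cells, e.g. one color area."""
    return mean(grid[cell] for cell in cells)


gap = consensus_gap(older, younger)
print(gap)
print(mean_rbits(older, ["A11", "A12", "A13", "A14"]))
```

Because the difference is taken as an absolute value, the gap says how much the two groups differ in consensus for a cell, not which group agrees more; the direction has to be read from the two per-group grids or from area averages.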

Amount of disagreement (rbits of information) per cell for the older generation

   11   12   13   14   15   16   17   18   19   20   1    2    3    4    5    6    7    8    9    10   0
A  0.3  0.28 0.29 0.06 0.17 0.42 0.35 0.2  0.21 0.1  0.16 0.58 0.58 0.06 0.09 0.28 0.24 0.11 0.31 0.34 0
B  0.31 0.36 0.17 0    0.09 0.18 0.12 0.4  0.23 0.23 0.45 0.26 0.11 0.37 0.34 0.22 0.06 0    0.15 0.25 0.19
C  0.35 0.35 0.06 0.11 0.06 0.06 0.15 0.33 0.49 0.39 0.11 0.29 0.31 0.23 0.43 0.3  0.27 0.06 0.15 0.27 0
D  0.42 0.11 0.11 0.06 0.06 0.19 0.1  0.17 0.33 0.41 0.38 0.35 0.33 0.35 0.28 0.26 0.34 0.26 0.17 0.23 0.31

Amount of disagreement (rbits of information) per cell for the younger generation

   11   12   13   14   15   16   17   18   19   20   1    2    3    4    5    6    7    8    9    10   0
A  0.4  0.39 0.19 0.06 0.11 0.35 0.17 0.11 0.06 0.12 0.06 0.34 0.58 0.09 0.09 0.23 0.25 0.22 0.17 0.27 0
B  0.25 0.23 0.09 0    0.11 0.12 0.06 0.23 0.06 0.11 0.29 0.17 0    0.28 0.25 0.24 0.17 0.06 0.11 0.32 0.11
C  0.3  0.33 0.11 0    0    0    0    0.11 0.41 0.19 0    0.32 0.22 0.21 0.41 0.19 0.11 0    0    0.18 0
D  0.34 0.27 0.14 0.11 0.06 0.12 0.06 0    0.17 0.36 0.27 0.27 0.11 0.23 0.39 0.41 0.3  0.11 0.17 0.15 0.24

Figure 27. Difference between the groups’ disagreement on the naming task. Font color varies for visibility.

A visual inspection of Figure 27 indicates that several of the cells with the highest amount of disagreement between the generations appear in the PINK1&PINK2 and PURPLE2 areas. This can be confirmed with two Mann-Whitney U-tests (unpaired): one test comparing the behavior of the two age groups outside the PINK1&PINK2, PURPLE2 area (61 cells, no significant difference in level of disagreement) and another test comparing the behavior of the groups for the 23 cells that comprise the PINK1&PINK2, PURPLE2 area (PINK1: p=0.028, N=12; PURPLE1: p=0.034, N=15; PINK2, which you might recall is a small subset of PINK1, consists of only 3 cells, and the sample is too small to be suitable for the test). In other words, the two generations have approximately the same amount of consensus (same level of disagreement) in general, but not when they have to label the pink and purple colors.

The level of disagreement within each group is given in table form in Table 25. For all color areas, the older group had more internal disagreement than the younger group.

Table 25. Average amount of disagreement in rbits, per group and color concept(s)/areas in the color spectrum.

                              Average amount of rbits (amount of disagreement)
Area                          Old group   Young group   Difference
PINK2                         0.37        0.22          0.15
PINK1                         0.32        0.18          0.14
PINK1-PINK2-PURPLE1           0.27        0.15          0.12
PURPLE1                       0.26        0.15          0.11
Outside PINK1-PINK2-PURPLE1   0.22        0.19          0.03

Difference between the generations (in rbits), as shown in Figure 27

   11   12   13   14   15   16   17   18   19   20   1    2    3    4    5    6    7    8    9    10   0
A  0.1  0.11 0.1  0    0.06 0.07 0.18 0.09 0.15 0.02 0.1  0.24 0    0.03 0    0.05 0.01 0.11 0.14 0.07 0
B  0.06 0.13 0.08 0    0.02 0.06 0.06 0.17 0.17 0.12 0.16 0.09 0.11 0.09 0.09 0.02 0.11 0.06 0.04 0.07 0.08
C  0.05 0.02 0.05 0.11 0.06 0.06 0.15 0.22 0.08 0.2  0.11 0.03 0.09 0.02 0.02 0.11 0.16 0.06 0.15 0.09 0
D  0.08 0.16 0.03 0.05 0    0.07 0.04 0.17 0.16 0.05 0.11 0.08 0.22 0.12 0.11 0.15 0.04 0.15 0    0.08 0.07

The general differences between the groups for the entire color spectrum can also be assessed for the way they completed the best example task. Recall: in the best example task, speakers had to select a single color cell in a color matrix as the best example for each color. It would be unexpected if the final majority choice for best example of a particular term differed greatly between generations, but there might be more or less consensus on the placement.

The color cells that were recognized as the best example of a color term by one generation were generally recognized as such by the other generation as well. The levels of intragenerational consensus varied between generations, however, and are shown in Table 26.

The term with the most intergenerational similarity when it came to the level of consensus was röd, which both groups placed in C1. The entropy for röd was low for both groups (0.139 rbits for the older speakers, 0.145 rbits for the younger speakers).

The term with the most intergenerational difference was rosa, with a difference of 0.268 rbits between the groups: the older speakers disagreed to a far greater degree about the placement of the best example for rosa.

Table 26. The number of rbits for the best example task results, ordered by the highest amount of rbits (the highest amount of disagreement) in the older group.

                     rbits (amount of disagreement)
Best Example Color   Older
gredelin             0.737
skär                 0.614
violett              0.460
cerise               0.391

                     rbits (amount of disagreement)
Best Example Color   Older   Younger   Difference
turkos               0.689   0.580     0.109
lila                 0.599   0.526     0.073
rosa                 0.598   0.330     0.268
grön                 0.515   0.464     0.052
blå                  0.465   0.444     0.021
brun                 0.385   0.256     0.129
orange               0.291   0.231     0.060
gul                  0.212   0.148     0.064
röd                  0.139   0.145     0.006
grå                  0.114   0.070     0.044

To summarize, a comparison of the two groups for their color term usage over the entire color space shows that there are differences and that these differences often, though not solely, concern the PINK1, PINK2, and PURPLE1 areas. There are also differences in the way the generations use color term modifiers, hesitations, and compounds. There is less internal consensus (more disagreement) for the older group than the younger group about the labeling inside the PINK1, PINK2, and PURPLE2 areas, and the older group has a high amount of internal disagreement over the best examples of rarer pink and purple words like gredelin and skär, as well as rosa and lila.

6.4.4 DIFFERENCES WITHIN THE PINK1, PINK2, AND PURPLE1 AREAS

The previous section shows, using several different methods, that when the two generations differ, they often do so in the denotations matching the PINK1, PINK2, and PURPLE1 concepts. The next step is to ask what these differences consist of in detail.

6.4.4.1 RÖD, SKÄR, ROSA, AND CERISE IN PINK1 AND PINK2

The differences between the groups when it comes to the use of röd ‘red’ show an ongoing lexical change in the borders of the denotation of both röd and other surrounding terms, but no change when it comes to the prototypical center, the best example, of the term. The best example results for röd are shown below in Figure 28 and Figure 29. Both groups have a high internal consensus on the placement of the best example (almost all place it in C1) – and, as noted in Table 26, the difference in rbits of information is 0.006, the lowest difference between the groups for any term.

Figure 28. Best example answers for röd in the older group (0.139 rbits). The participant who chose C17 did not fail the color blindness test. In this and future color matrix figures, the color of font (white or black) varies for visibility reasons, to provide contrast with the lightness level of the background.

[Color matrix, rows A-D by columns 11-20 and 1-10; answers per row: C: 1, 17; D: 1.]

Figure 29. Best example answers for röd in the younger group (0.145 rbits).

[Color matrix, rows A-D by columns 11-20 and 1-10; answers per row: C: 16, 1; D: 1.]

Yet the Wilcoxon results showed that röd was one of the terms used differently by the groups in the naming task results. Figure 30 and Figure 31 show the naming task distribution of röd in the answers.

Figure 30. The cells for which the older group used röd in the naming task.

Figure 31. The cells for which the younger group used röd in the naming task.

The figures show that the older group used röd as a main response more often than the younger group. Both groups place most röd answers in cell C1 (and also C2, D1, and D2), and many of the röd responses outside these cells are compounds or other complex answers (the older speaker for cell C15 used the term in the following way: blå men dock med aning dragning mot det röda ‘blue but with a slight touch of red’). It is the places where the younger speakers do not make use of röd that are the most interesting: the younger speakers have less röd usage in the pink area (to the left in Figure 31). The center of röd has not changed, but the younger group’s use of röd has shrunk inwards, away from the pink colors to the left. For the younger group, röd has a smaller denotation.

The röd uses that have lessened between the generations belong mainly to a particular group: compared to the other color terms, the uses of ljusröd ‘light red’ have decreased a lot between the generations (see Figure 24 on page 150). This raises the question of what term has replaced ljusröd for the younger speakers. Skär ‘pink’ was one of the terms found by the Wilcoxon test to be used significantly differently by the groups, and a visualization of the results of the best example task (Figure 32) and the naming task (Figure 33, Figure 34) shows that this is a term with an unclear prototypical center and scattered uses in the old group, and almost no uses in the younger group.

[Figure 30 color matrix, Naming Task Röd (Older group), rows A-D by columns 11-20 and 1-10; uses per row: A: 1; B: 3, 4, 9, 7; C: 1, 1, 2, 7, 8, 19, 16, 2; D: 8, 15, 11, 5.]

[Figure 31 color matrix, Naming Task Röd (Younger group), rows A-D by columns 11-20 and 1-10; uses per row: A: 1; B: 5, 6; C: 1, 2, 19, 13, 1; D: 1, 6, 16, 15.]

The younger speakers were not asked about the best example for skär, but the older speakers were, and their responses are shown in Figure 32. The older group considers the best example of skär to be a very light color somewhere on the A-row, which mirrors their interview results, but there is low consensus on where in the lighter part the best example of skär is located, also reflecting the uncertainty indicated in the interviews. The older group’s best example responses show a high level of group-internal disagreement at 0.614 rbits – the only other color terms that had a higher level of group-internal disagreement were the other derived color terms gredelin ‘purple’ and turkos ‘turquoise’ (see Table 26 on page 157).

Figure 32. Best example answers for skär in the older group.

The five older speakers that use skär at all in the naming task use it sparingly over a large area of the color space, as can be seen in Figure 33. The naming task responses are centered on the lightest row (A) of the PINK1 area, which is also where the best example answers are placed. In the younger group, skär is an exceedingly rare term – two speakers use it twice each, for the lightest part of PINK1, as shown in Figure 34.

Figure 33. The cells for which the older group used skär in the naming task.

Figure 34. The cells for which the younger group used skär in the naming task.

[Figure 32 color matrix, Best Example Skär (Older Group, 0.614 rbits), rows A-D by columns 11-20 and 1-10; answers per row: A: 2, 4, 3, 5, 1; B: 2, 2.]

[Figure 33 color matrix, Naming Task Skär (Older group), rows A-D by columns 11-20 and 1-10; uses per row: A: 1, 2, 5, 2, 5, 4, 1; B: 1, 1, 2, 1.]

[Figure 34 color matrix, Naming Task Skär (Younger group), rows A-D by columns 11-20 and 1-10; uses per row: A: 1, 1, 1; B: 1.]

Skär and röd both show a reduction in naming task use from the older to the younger generation, but it is the combination of the high level of uncertainty in the best example task and the few uses in the naming task that is evidence for the ongoing disappearance of skär from Swedish, in contrast with the adjustment of denotation that röd is undergoing.

With respect to rosa ‘pink’, the Wilcoxon test in section 6.4.3 confirmed that this term is treated differently by the two groups, but that test yielded no information about what this difference consists of. Two different, but related, observations on the changing use of rosa stand out from the material. The first is that the older group has less consensus than the younger group when it comes to the placement of the best example, as can be seen in Figure 35 and Figure 36: for the older group, there are several candidate cells for the best example of rosa, in both the lightest row (row A; 9 answers in 4 cells) and the next-to-lightest row (row B; 9 answers in 2 cells). More than half of the younger group agreed on the best example of rosa (B19; 11 speakers) and the rest chose either A20 or B20 (4 speakers each). As noted before, if the level of consensus for the best example results is measured in rbits, rosa has the largest discrepancy between the generations of any color term. The older group’s lack of consensus is most likely due to the fact that some of the speakers have a competing term, skär.

Figure 35. Best example answers for rosa in the older group.

Figure 36. Best example answers for rosa in the younger group.

[Figure 35 grid: Best Example Rosa (Older Group, 0.598 rbits). Rows A-D; columns 11-20, then 1-10. Answers per row (column positions not preserved): A: 3, 2, 2, 2; B: 7, 2; C: 1; D: none.]

[Figure 36 grid: Best Example Rosa (Younger Group, 0.330 rbits). Rows A-D; columns 11-20, then 1-10. Answers per row (column positions not preserved): A: 4; B: 11, 4; C: none; D: none.]

The second observation is that the naming task results indicate that the disappearance of skär and the decline of röd are due to the expansion of the term rosa. Figure 37 and Figure 38 show a strengthening of the rosa (main response) denotation in the A-row, the B-row, and the C-row from the older to the younger generation. The frequencies of ljus- (‘light’) modified terms discussed for the entire color term inventory in Figure 24 (page 150) hint at the probable cause, and the distribution of the ljusrosa ‘light pink’ and ljusröd ‘light red’ responses confirms it: in general, where the older speakers used ljusröd, the younger speakers use rosa. And where the older speakers use skär, the younger speakers use rosa or, more specifically, often ljusrosa. Both the best example results and the naming task results testify to the ongoing lexical change.
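The strengthening of rosa across rows A, B, and C can be checked with a rough tally. The following is a minimal Python sketch, assuming that the per-row main-response counts have been read correctly from the grids of Figure 37 and Figure 38 as they survive in this copy (column positions are not needed for row totals). The names rosa_naming_older, rosa_naming_younger, and row_totals are illustrative only.

```python
# Naming-task main responses for rosa, per grid row (A = lightest, D = darkest).
# Assumption: counts are read from the grids of Figure 37 and Figure 38.
rosa_naming_older = {
    "A": [10, 17, 16, 17, 16, 8, 4],
    "B": [6, 15, 14, 9],
    "C": [1, 2, 6, 1],
    "D": [],
}
rosa_naming_younger = {
    "A": [1, 5, 19, 19, 17, 18, 12, 3],
    "B": [8, 18, 18, 16],
    "C": [11, 15],
    "D": [2],
}


def row_totals(grid):
    """Sum the number of rosa responses in each row of a grid."""
    return {row: sum(counts) for row, counts in grid.items()}


older_totals = row_totals(rosa_naming_older)
younger_totals = row_totals(rosa_naming_younger)

# Print one line per row: row label, older total, younger total.
for row in "ABCD":
    print(row, older_totals[row], younger_totals[row])
```

Under these assumptions, the rosa totals rise from 88 to 94 in row A, from 44 to 60 in row B, and from 10 to 26 in row C, in line with the strengthening described in the text.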

Figure 37. The cells for which the older group used rosa in the naming task.

Figure 38. The cells for which the younger group used rosa in the naming task.

Turning to the last of the pink terms, the Swedish color term cerise has already been discussed from a cross-linguistic perspective in section 5.6. The younger generation’s data was compared to speakers of approximately the same age in several other languages, and their use of cerise matched the comparative concept PINK2, which centered on C19 and C20 (and to a slightly smaller degree on B19). As has been previously stated, there are no best example task results for cerise for the younger generation, but the older generation’s responses are shown in Figure 39. There are only 17 responses, since one speaker declined to name a best example cell.

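[Figure 37 grid: Naming Task Rosa (Older Group). Rows A-D; columns 11-20, then 1-10. Uses per row (column positions not preserved): A: 10, 17, 16, 17, 16, 8, 4; B: 6, 15, 14, 9; C: 1, 2, 6, 1; D: none.]

[Figure 38 grid: Naming Task Rosa (Younger Group). Rows A-D; columns 11-20, then 1-10. Uses per row (column positions not preserved): A: 1, 5, 19, 19, 17, 18, 12, 3; B: 8, 18, 18, 16; C: 11, 15; D: 2.]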


Figure 39. Best example answers for cerise in the older group.

Figure 40 and Figure 41 show the uses of cerise as a main response by the two groups, and there is not much difference. The interviews showed that cerise had entered the vocabulary of the older speakers when they were adults, but even so the color term has a consistent and frequent use for a large part of the PINK2 area.

Figure 40. The cells for which the older group used cerise in the naming task.

Figure 41. The cells for which the younger group used cerise in the naming task.

Looking deeper at the more complex responses, however, reveals that a semantic change is ongoing for cerise as well, and that this is interconnected with the expansion of rosa and the decline of röd. Cerise typically alternates in C19 and C20 with röd in the older generation, and with rosa in the younger generation.

[Figure 39 grid: Best Example Ceris (Older Group, 0.391 rbits). Rows A-D; columns 11-20, then 1-10. Answers per row (column positions not preserved): A: 1, 1; B: none; C: 1, 10, 4; D: none.]

[Figure 40 grid: Naming Task Ceris (Older group). Rows A-D; columns 11-20, then 1-10. Uses per row (column positions not preserved): A: none; B: 1; C: 1, 9, 8, 1; D: 1.]

[Figure 41 grid: Naming Task Ceris (Younger group). Rows A-D; columns 11-20, then 1-10. Uses per row (column positions not preserved): A: 1; B: 1, 1; C: 1, 5, 9; D: 1.]

6.4.4.2 VIOLETT, LILA, GREDELIN, AND BLÅ IN PURPLE1

The lexical changes in the violett ‘purple’, lila ‘purple’, gredelin ‘purple’, and blå ‘blue’ terms are best illustrated by starting with violett and lila. Here there are several intertwined questions: what is the difference between lila and violett for the older generation? What is the difference between lila and violett for the younger generation? What lexical changes have happened or are ongoing between the generations?

Starting with the best example results, the older speakers had similar best example answers for both violett and lila, and there is also similarity between the older group’s best example answers for lila and the younger group’s best example answers for the same color term. For “old violett”, the best example fell on either C16 (six speakers) or C17 (seven speakers) for most of the participants. For “old lila” and “young lila”, the best examples were either C16, C17, or C18; see Figure 42, Figure 43, and Figure 44.

With respect to the level of consensus within each group, the older group has a relatively high consensus for violett (0.460 rbits), less so for lila (0.599 rbits). The younger group’s level of consensus for lila falls in between, at 0.526 rbits. As with skär and rosa, the differences between the groups are less about the prototypical center than about the boundaries of the denotations.

Figure 42. Best example answers for violett in the older group.

Figure 43. Best example answers for lila in the older group.

Figure 44. Best example answers for lila in the younger group.

[Figure 42 grid: Best Example Violett (Older Group, 0.460 rbits). Rows A-D; columns 11-20, then 1-10. Answers per row (column positions not preserved): A: none; B: none; C: 6, 7, 1; D: 1, 3.]

[Figure 43 grid: Best Example Lila (Older Group, 0.599 rbits). Rows A-D; columns 11-20, then 1-10. Answers per row (column positions not preserved): A: none; B: 1, 1; C: 4, 3, 5; D: 4, 1.]

[Figure 44 grid: Best Example Lila (Younger Group, 0.526 rbits). Rows A-D; columns 11-20, then 1-10. Answers per row (column positions not preserved): A: none; B: none; C: 3, 6, 4; D: 4, 2.]

Judging from only the best example results, lila and violett might as well be synonyms. Some important differences are revealed in the naming task results, however. The Wilcoxon test on the naming task data showed a borderline significant p-value when testing whether the use of lila was similar between the two groups, but this difference turned out to be not very robust in the leave-one-out test. This is reflected in the visual representation of the answers in Figure 45 and Figure 46: the older and younger groups’ uses of lila are spread out over a very similar area. The visible differences are a dark “tail” in the older group’s answers in the lower right part of the PURPLE1 area (D2, D3), which is lacking in the younger group’s answers, and, conversely, more lila uses for the A row in the younger group than in the older group. The frequency of use for the ljuslila ‘light lila’ terms was also noticeably higher for the younger group than the older group (Figure 24 on page 150), and it may be that lila is becoming a lighter color, but more data is needed to ascertain this.

The future fate of violett is easier to predict than that of lila: the Wilcoxon test revealed that the groups used the term differently, and this can be seen in a visual representation of the distribution of responses as well. The violett answers of the older speakers (see Figure 47) are spread out in the darker part of the PURPLE1 area. Violett is rarely used by the younger group (Figure 48), who prefer lila. Recall that among the pink colors, skär was used for a light pink by the older group, whereas the younger speakers instead used ljusrosa ‘light rosa’. If the younger generation had also replaced violett with mörklila, this would form a perfect parallel, but this does not happen: instead violett, which is not a frequent term even in the older generation, is simply disappearing in the younger generation.

In the cross-linguistic analysis of the group of color terms collectively called Purple2 in section 5.7, Swedish violett was hesitantly categorized as belonging to System C, which meant one dominant term denoting the PURPLE1 area (i.e. lila), and a secondary term denoting a darker part of the PURPLE1 area (i.e. violett). The cross-linguistic dictionary analysis indicated, directly or indirectly, that the secondary Purple2 terms had historically been the primary terms for the purple color in the seven Germanic languages, and the more detailed analysis of the Swedish history of the color term that was presented earlier in this chapter has confirmed that this was indeed the case for violett as well. The older group has violett answers spread over three degrees of lightness (rows B, C, and D), but the majority of the answers are in the D row: the older speakers who use violett do so in reference to a darker nuance in the PURPLE1 area. It has not been possible to ascertain that the contemporary PURPLE1 now denoted by lila is the same area of color space that was, in the 19th century, denoted by violett: it is possible that violett has always been a dark purple color. But given the various descriptions of the color in older dictionaries (placing it between red and blue, with no indication of it being a particularly dark color), the assumption will be, for now, that violett historically denoted a larger area of color space, similar to PURPLE1, and that the term’s denotation has since been pushed out to the darker parts as it is being lexically replaced by lila.

Figure 45. The cells for which the older group used lila in the naming task.

Figure 46. The cells for which the younger group used lila in the naming task.

Figure 47. The cells for which the older group used violett in the naming task.

Figure 48. The cells for which the younger group used violett in the naming task.

[Figure 45 grid: Naming Task Lila (Older group). Rows A-D; columns 11-20, then 1-10. Uses per row (column positions not preserved): A: 8, 10, 1, 1; B: 2, 16, 17, 12; C: 1, 18, 17, 16, 2, 1; D: 16, 17, 18, 16, 12, 2, 2, 2, 1.]

[Figure 46 grid: Naming Task Lila (Younger group). Rows A-D; columns 11-20, then 1-10. Uses per row (column positions not preserved): A: 11, 14, 1; B: 1, 17, 18, 14; C: 19, 19, 19, 2; D: 1, 17, 18, 19, 17, 12, 5, 1.]

[Figure 47 grid: Naming Task Violett (Older group). Rows A-D; columns 11-20, then 1-10. Uses per row (column positions not preserved): A: none; B: 1, 1, 1; C: 1, 2; D: 1, 1, 2, 3, 2, 2, 1.]

[Figure 48 grid: Naming Task Violett (Younger group). Rows A-D; columns 11-20, then 1-10. Uses per row (column positions not preserved): A: none; B: 1; C: none; D: 1, 1.]

The gredelin lexeme can be said to have gone from a very weak presence in the older group (see Figure 50) to an almost non-existent presence in the younger group (see Figure 51). In the best example answers from the older generation, gredelin also had the most diffusion of any color term. The 17 speakers who answered gave 10 different cells (see Figure 49); measured in rbits, this is 0.737, the highest degree of disagreement within the older group for the placement of any best example for any color.
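The definition of rbits is not restated in this section. As a hedged illustration, the sketch below assumes that the measure is the Shannon entropy (in bits) of a group’s best example answers divided by log2(19), where 19 is taken to be the number of speakers per group (the highest cell count in the naming task grids). This normalization is an inference, not a formula stated here; it is adopted because it reproduces the values reported for gredelin (0.737) and for rosa in the younger group (0.330). The function name rbits and the constant GROUP_SIZE are illustrative.

```python
from math import log2

# Assumption: 19 speakers per age group, inferred from the naming-task maxima.
GROUP_SIZE = 19


def rbits(cell_counts, group_size=GROUP_SIZE):
    """Shannon entropy of the answers (in bits), divided by log2(group_size).

    cell_counts holds the number of speakers who chose each cell
    as the best example; cells nobody chose are simply left out.
    """
    total = sum(cell_counts)
    entropy = -sum((c / total) * log2(c / total) for c in cell_counts if c > 0)
    return entropy / log2(group_size)


# Counts per chosen cell, read from the best example grids.
gredelin_older = [1, 1, 1, 1, 3, 3, 3, 1, 2, 1]  # 17 answers over 10 cells
rosa_younger = [4, 11, 4]                         # A20, B19, B20

print(round(rbits(gredelin_older), 3))  # prints 0.737
print(round(rbits(rosa_younger), 3))    # prints 0.33 (reported as 0.330)
```

Under this reading, a value of 0 would mean that all speakers chose the same cell, and the value rises as the answers spread more evenly over more cells.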

Figure 49. Best example answers for gredelin in the older group.

Figure 50. The cells for which the older group used gredelin in the naming task.

Figure 51. The cells for which the younger group used gredelin in the naming task.

[Figure 49 grid: Best Example Gredelin (Older Group, 0.737 rbits). Rows A-D; columns 11-20, then 1-10. Answers per row (column positions not preserved): A: 1; B: 1, 1, 1; C: 3, 3, 3; D: 1, 2, 1.]

[Figure 50 grid: Naming Task Gredelin (Older group). Rows A-D; columns 11-20, then 1-10. Uses per row (column positions not preserved): A: none; B: 1, 1, 1; C: 1; D: none.]

[Figure 51 grid: Naming Task Gredelin (Younger group). Rows A-D; columns 11-20, then 1-10. Uses per row (column positions not preserved): A: none; B: 1; C: none; D: none.]

The interaction between röd ‘red’ and the Swedish pink terms (rosa, skär, cerise) is quite different from the interaction between blå ‘blue’ and the purple terms (lila, violett, gredelin). Whereas the denotation of röd shrank between the generations, the border between the PURPLE1 area and the bluer area to the left in the visual representation in Figure 52 and Figure 53 is very sharp. Column 15 is a blå column; column 16 is a lila column. The fuzziness that existed between rosa and röd, between rosa and cerise, and between cerise and röd is not present for the blå-lila border. There is a scattering of blå uses in the PINK1 and PURPLE1 area for the older group, where the older speakers used different kinds of compound color terms, but these compound terms are very rare in column 15 (the blå column) and column 16 (the lila column). The Wilcoxon test showed that the color term blå was used in a significantly different way by the generations, but this result became less clear in the leave-one-out robustness test. In any case, a visual inspection of the naming task results in Figure 52 and Figure 53 shows that except for the scattered use of blå compounds in the PINK1 and PURPLE1 area mentioned above, the distribution differences between the groups are mainly focused on the green-blue border, where the term turkos ‘turquoise’ is appearing. As interesting as the increasing salience and use of turkos is in Swedish, it is outside the scope of the present chapter.

Figure 52. The cells for which the older group used blå in the naming task. The blå uses in the center of the color matrix are compounds.

Figure 53. The cells for which the younger group used blå in the naming task.

To summarize some important points: in general, for the PINK1, PINK2, and PURPLE1 areas, the younger speakers seem more certain in their answers and more categorical in their decisions, with a higher consensus, and they also use fewer color terms. For the younger group, lila and rosa are the preferred answers, and the rare terms skär, violett, and gredelin are dispreferred to the point of exclusion. The younger speakers are less imaginative in their modifiers (using mostly ljus- and mörk-). The denotation of the younger group’s rosa covers a bigger part of the perceptual color space (light, dark, and reddish) than the older group’s and has crowded out röd ‘red’. The younger group’s lila may be shifting towards the lighter part of the spectrum (though this is uncertain).

[Figure 52 grid: Naming Task Blå (Older group). Rows A-D; columns 11-20, then 1-10. Uses per row (column positions not preserved): A: 10, 14, 9, 18, 18, 3, 1, 1, 4; B: 6, 5, 18, 19, 18, 1, 1, 5, 2, 4; C: 7, 9, 19, 19, 19, 1, 1, 2, 7; D: 7, 18, 18, 19, 19, 5, 1, 1, 2, 1, 3, 2.]

[Figure 53 grid: Naming Task Blå (Younger group). Rows A-D; columns 11-20, then 1-10. Uses per row (column positions not preserved): A: 7, 9, 18, 18, 19, 1, 1, 2; B: 2, 3, 18, 19, 19, 1, 4, 1; C: 4, 10, 18, 19, 19, 1; D: 7, 15, 19, 18, 19, 1, 1, 2.]


6.5 DISCUSSION

This section will address the following questions:

1. Do derived colors show more signs of change, or different forms of change, than other color categories?
2. Do the purple and pink derived color concepts show more and different forms of change?
3. What are the processes evident in this change?
4. Do these processes conform to, or suggest amendments to, published theories of lexical replacement and change in the color domain?

Questions 1 and 2 will be dealt with in section 6.5.1, and 3 and 4 in section 6.5.2.

6.5.1 ARE DERIVED COLORS IN GENERAL, AND PURPLE AND PINK IN PARTICULAR, SPECIAL?

Overall, the results show that the older and younger groups diverge more in their lexical treatment of the purple and pink areas than they do for other parts of the perceptual color space.

In the best example task, the older generation had less consensus about where to place the best example for lila, rosa, and turkos than the younger speakers did. These terms all denote derived color categories, but the older speakers showed no such lack of consensus in placing the best examples of grå and orange, two other derived colors. Instead, for these terms, the speakers were in agreement. This could be taken as support for the idea that derived color concepts, due to their late acquisition in Swedish, will still be in a stabilization process and show signs of a lack of consensus (as shown by lila, rosa, and turkos), but it also shows that this need not be the case (as shown by grå and orange). The older generation also had less consensus about the best example of skär, violett, and gredelin than they did for most other color terms.

The Wilcoxon test showed that only a few terms were used differently by the generations. These were skär ‘pink’, rosa ‘pink’, violett ‘purple’, blå ‘blue’, röd ‘red’, and svart ‘black’. For lila ‘purple’ and gredelin ‘purple’, the differences were borderline significant. These terms can be sorted into a pink group, a purple group, red and blue, and finally black. There is lexical change in the Swedish PINK1 and PURPLE1 concepts. Some of the results of the Wilcoxon test can also be inferred from a plain frequency count: many terms differ in frequency of use between the generations, but (as can be seen in Figure 22) some of the largest differences can be found in rosa and röd: the younger generation uses rosa a lot more and röd a lot less.

Looking at how modifiers and hesitations were used by the generations, chi-square tests show that the groups behaved differently in their answers for the pink and purple areas than what could have been expected given their general response pattern for the whole color space. Furthermore, the most common modifiers for both groups were the lightness modifiers ljus and mörk. The clearest differences in the frequency of use of the lightness modifiers lie in how the ljus- modifier is applied to lila, rosa, and röd, which indicates yet again that the variation between the groups is focused on the purple and pink areas.

This lends strength to the hypothesis that lexical replacement does not happen randomly to all parts of the color space with equal likelihood. Swedish is a stage VI language in the Berlin and Kay paradigm, and the present study lends credence to the thought that the later acquired color terms may be more susceptible to lexical competition. For a stage VI language, the latest acquired color categories will be the derived ones, like orange, gray, purple, pink, brown, and turquoise. Not all derived colors in Swedish show clear signs of lexical change (orange and gray do not), but several do (purple, pink, turquoise). Of these, purple and pink are the colors that have terms that are used most clearly in a different way by the two generations in the elicitation experiment.

This does not mean that only color terms connected to derived colors are affected. As derived color concepts are established as independent concepts in the language community, their presence and spread affect neighboring color concepts, both primary and derived. The shrinking of the denotation of the primary color term röd ‘red’ (specifically the shrinking of the term ljusröd) is connected with the spread of rosa ‘pink’. There are likewise changes in Swedish blå ‘blue’. The reason for the change in the term blå ‘blue’ is not pursued in detail in this text, other than to conclude that it is not mainly connected with changes in lila ‘purple’; instead, the difference in blue usage between the generations is focused on the blue-green border (where another derived color term, turkos ‘turquoise’, is making an appearance). The fact that svart ‘black’ is handled differently by the generations (younger speakers mostly use it for the achromatic color chips, while several older speakers also use it for the darkest purple, red, and brown) can also be noted, and this cannot be explained as being connected with expanded derived colors; it is possible that it is connected to age-related physiological changes in the visual system.

6.5.2 LEXICAL CHANGE PROCESSES IN PINK AND PURPLE

With the historical review and interview as a background, the elicitation data from the two generations indicate three lexical replacement processes ongoing in the data. Skär is being replaced by rosa: skär is used for a light pink by some in the older generation, but almost never by those in the younger generation. Violett is being replaced by lila: it is used for a darker purple by some in the older generation, but very rarely by those in the younger. Judging by only the experiment results, gredelin is all but gone, replaced by lila: it has a weak presence in the older generation and has only a single mention by a single young speaker.

While all methodologies show the same general tendency, they also tell slightly different stories about how far the lexical replacement processes have come. From this detailed perspective, it is also clear that there are several intertwined processes of lexical change: here called restriction and expansion (both well-known processes in language change in general, mentioned in section 2.4, that are here seen in a very concrete way), movement (related to restriction and expansion), focusing, and denotational loss.

Restriction. Going by the historical and interview data in Sections 6.1 and 6.2, skär was the general term for pink until the 1970s. The majority of speakers in the older generation used skär for pink when they were children, and many spontaneously volunteered that it was the only term for pink that their parents used. In contrast with their parents’ reported use, most of the older speakers said that they used rosa (in the interview). In the experiments, they showed that they used primarily rosa, and when some speakers used skär it was always for a very light pink. The younger speakers almost never used skär at all: the replacement process is almost complete in three generations. Similarly, the historical review showed that violett was probably the most salient purple term in the 19th century, challenged by gredelin at the turn of the century, and then challenged by lila in the 20th century. All speakers in the older generation said in the interview that they used lila as the most common term for purple as children, not violett. Most of them indicated in the interview that they were confident in labeling violett as a kind of lila, usually a bluish, darkish lila. In the experiment, likewise, violett was used (by the few speakers that used it) for a darkish lila. Both skär and violett have undergone denotational restriction and are in the middle (or end) of a lexical replacement process.

Expansion. The first signs in the surveyed material that rosa started to be used in everyday language come from the beginning of the 20th century, when the term started to appear in corpora. Skär was the more dominant term before that, and rosa went from being a less salient term to being the most salient term around the middle of the 20th century, judging by the interview answers. From both the interview and the elicitation data, we know that rosa is the most salient term for the pink area for the older generation in this study, and for the younger generation. But the younger generation’s experiment results also show that the rosa denotation has increased, crowding out röd, and more or less replacing the skär term that the older generation used for the lightest pink. Rosa also increased its presence in the darker parts of the pink area, when it replaced röd in combinations with cerise.

Movement. Movement is expansion, but with the added realization that a color concept can abandon some areas of color space as it gains new ones. A weak case of this can be seen in the Swedish purple category: some older speakers use purple words, in particular lila, for certain cells in the D row, while younger speakers more rarely do this. At the same time, more younger speakers use purple words for the lightest of the rows (A row). This indicates that a color term’s denotation can shift (move) in color space.

Focusing. While some older speakers, especially male ones, voiced their worries that they did not really know fashion color terms like cerise, overall neither generation showed much difficulty in eventually naming the same part of color space cerise. Cerise is a very focused term in that there is a specific, and rather small, part of color space that is called cerise (C19 and C20 in the grids). While some speakers do not use cerise for C19 and C20, many acknowledge that this is not a regular rosa part of color space by using a modified term for at least one of these cells. Cerise is only rarely used outside this area. For the older generation, cerise is often found compounded with röd, and for the younger it is found compounded with rosa.

Denotational loss. Is gredelin still a Swedish color term for the younger generation? It appears only once in the younger generation’s data. It is not used in the 1976-1999 fiction corpora (see 6.1.2.2). Speakers in the older generation in this study used the term very rarely in the naming task, and struggled to define its denotation relative to lila and violett in the interview. Several speakers in the older group said outright that they do not know the color, that it is a strange color, a made-up color, a fairy tale color. The speakers have little consensus about the denotation of the color, and yet insist that it is in fact a color. The reason for this might be that even though gredelin has undergone denotational loss, so that it has no clear denotation in the speech community, it still has a strong connotation and thus has a meaning. Denotational loss does not mean connotational loss. Gredelin is the color of the gown of the popular children’s fairy tale figure Tant Gredelin (Beskow, 1918), as most Swedish speakers know. That they cannot pinpoint the color nuance does not take away this meaning of the term. Gredelin thus remains in the Swedish color term vocabulary, a color term without clear denotation, and is likely to do so until the children’s book falls out of favor with the populace.

6.5.3 RECONNECTING WITH SOME PREVIOUS THEORIES

While it seems likely that the skär, violett and gredelin terms will be replaced in one or a few generations, they are at different stages of the replacement. Can these stages be related to MacLaury’s proposed stages in the creation of new color categories? Recall MacLaury’s hypotheses (see section 4.2.2) about the process of creation of a new independent color concept in a language. The process (which I call “Process A”) breaks down into the following steps: a) a color concept first has two terms that appear as synonyms, then b) the two terms get different best example prototypes, then c) one of the terms is established as a hyponym of the other, referring only to a particular nuance of the larger color, and then, finally, d) the hyponymic color region splits off from the larger color region, forming a new independent color concept. MacLaury also suggests an alternative process (which I call “Process B”) that starts at c), with a new term taking on a particular nuance of an established color term, and then d) becoming independent. Process B fits with the development of cerise: it is a very small color, denotationally, focusing very strongly on two color cells, and rarely being used outside these cells at all.

The steps in MacLaury’s Process A are similar to what is happening for both skär and violett in their lexical competition with rosa and lila. The end result for skär and violett is most likely not that the terms will denote independent new color categories. Instead it seems likely that they are facing lexical disappearance and accompanying lexical replacement. For step a), instead of a stable synchronic situation of synonymy at a particular time, we have a diachronic perspective of lexical competition between skär-rosa, and violett-gredelin-lila. At some point during this diachronic process, there probably was synonymy.

The data from the elicitation experiment also show us an equivalent of stage c (c’) for the older generation, where skär and violett are no longer contenders for being the most salient, primary term for PINK1 and PURPLE1, but instead have undergone denotational shrinking to a light and a dark part, respectively, of the larger color area. Another stage (d’), most obvious for skär, is a very weak presence in the older part of the speaking community that suggests a lexical loss in the younger generation.

The lexical replacement processes of both skär and violett act as a reversed form of Archibald’s (1989) suggestion that it is often the lightest or darkest part of a color that splits off to form a new independent color: instead, skär and (possibly) violett underwent denotational shrinking to smaller light or dark parts of a color, and are most likely going to disappear at the next stage. The importance of lightness in the Swedish color vocabulary is also supported by the fact that the most important modifiers used for color terms are ljus ‘light’ and mörk ‘dark’. The data in this chapter lend support to Lindsey & Brown’s (2014) observation that the areas between established color concepts show the least consensus in naming. This lack of consensus can continue even after new concepts are entrenched in the language. Both the Swedish PINK and PURPLE concepts are considered established by contemporary speakers, and speakers express surprise when told that the color concepts have not “always” been a part of the language. Yet their relative youth is evidenced in the continued upheaval in their naming. The fuzzy edges between Swedish PINK and PURPLE, and between PINK and RED, are not echoed in the sharp BLUE-PURPLE border, however.

7 GENERAL CONCLUSIONS

This thesis has presented three studies that approach the process of lexical replacement from different perspectives. The first study (the statistical macro study, chapter 3) has a (relatively speaking) broader time scale perspective and investigates factors that affect the rate (likelihood) of lexical replacement in the core vocabulary of 98 Indo-European language varieties. The second study (the cross-linguistic meso study, chapter 5) has a smaller time scale perspective since it narrows the focus to only seven closely related languages from the Germanic language family (English, German, Bernese, Danish, Swedish, Norwegian, and Icelandic). The data is also restricted to the semantic domain of color, and in particular the way that speakers of these languages partitioned and labeled the pink and purple parts of the color spectrum. The third study (the inter-generational micro study, chapter 6) narrows the focus even further. It considers two generations of speakers of a single language, Swedish, and combines experimental data on how the two groups partition and label the color space in general (and the pink and purple parts in particular) with more detailed data on lexical replacement and change from interviews, color descriptions from historical and contemporary dictionaries, as well as floras (botanical encyclopedias), and historical fiction corpora.

The thesis began by listing three general research questions, each with more concrete subquestions:

A. What affects the likelihood of lexical replacement?

A1. Is it possible to formulate global, domain-independent generalizations?
A2. How can local, domain-dependent generalizations be found?

B. How does lexical replacement proceed in the semantic domain of color?

B1. How does lexical replacement interact with other kinds of lexical change?
B2. How does knowledge about the semantics and pragmatics of color (psychophysiology, sociohistory) help elucidate lexical replacement in this domain?

C. How do different perspectives on lexical replacement relate to and complement each other?

The “A” questions are covered by the macro study. The “B” questions are addressed in the meso and micro studies. Reflections on the “C” question are interwoven in chapter 6, where more general results (e.g. about processes of lexical change in several languages in chapter 5) make it easier to understand more detailed results (e.g. individual word histories in Swedish). The “C” question is also addressed in these general conclusions.

This thesis argues that it is both possible and worthwhile to investigate generalizations about lexical replacement on both semantic domain-specific (local) and domain-independent (global) levels, and that it is useful to triangulate different perspectives in order to gain a deeper understanding of the phenomenon. Method triangulation is a process in which validation of results is facilitated through cross-verification using several different sources, methods, and/or perspectives. All these perspectives are valid and can inform one another – they merely lend themselves to different kinds of generalizations about lexical replacement, in this case, and different kinds of data and methodology.

The contributions of the macro study are both methodological and theoretical. Methodologically, the macro study adds to the current body of work in computational semantics, which advocates regression modeling as a worthwhile method for studying domain-independent generalizations about lexical processes.

The macro study also shows that open-class and closed-class concepts should be analyzed separately when it comes to diachronic changes: lexical change and grammatical change are quite different, and while the latter can to a great extent be predicted based on frequency, it is necessary to have a more nuanced view when approaching how and why processes of lexical change occur. This is both a methodological and theoretical contribution.

The macro study presents a statistical model that explains a larger share of the variation in rates of lexical replacement for open-class vocabulary (34%) than previously published models. Significant factors in the model are (in descending order of importance) frequency of use, the number of semantically related words (synonyms), the ease with which one can form a mental image of the concept corresponding to a word (imageability), and the number of senses that a word has.
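To make concrete what it means for a regression model to explain a share of the variation, the following minimal sketch fits an ordinary least-squares model with four predictors and reports the coefficient of determination (R squared). It is only an illustration: the data are randomly generated, the effect sizes are invented (only their signs follow the direction of the findings reported in this thesis), and the helper names `solve` and `fit_ols` are hypothetical. It is not the model or the data of chapter 3.

```python
import random

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, R squared)."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(XtX, Xty)  # normal equations
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot

# Synthetic, standardized predictors for 200 hypothetical concepts.
random.seed(0)
names = ["frequency", "synonyms", "imageability", "senses"]
effects = [-0.5, 0.3, -0.2, -0.1]  # invented magnitudes, illustration only
X = [[random.gauss(0, 1) for _ in names] for _ in range(200)]
y = [sum(e * v for e, v in zip(effects, row)) + random.gauss(0, 0.8) for row in X]

beta, r_squared = fit_ols(X, y)
for name, b in zip(["intercept"] + names, beta):
    print(f"{name:>12}: {b:+.3f}")
print(f"R squared: {r_squared:.2f}")
```

An R squared of, say, 0.34 would mean that about a third of the variance in the outcome is accounted for by the predictors, with the remainder left unexplained by the model.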

While it is thus possible to formulate domain-independent generalizations about lexical replacement, it is important to have an awareness of both the strengths and the weaknesses of the data, as well as the methods used. In the case of the macro study, statistical models can only partly explain the variation in rates of lexical replacement. The nature of the data (core vocabulary, not a representative sample of the entire vocabulary) also needs to be kept in mind when generalizations are made concerning the lexicon as a whole.

The macro study also discusses to what extent this data can be used to investigate statistical differences in rate of replacement in particular semantic domains. Unfortunately, any results are greatly dependent on the taxonomy (the list of semantic domains) that is used and how concepts are categorized into these domains. An ANOVA test using a well-known taxonomy from the World Loanword Database (WOLD; Haspelmath & Tadmor, 2009a) initially showed that there were some differences in the rate of lexical replacement between the WOLD semantic domains, but these differences disappeared once imageability was taken into account.
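The kind of comparison described above can be illustrated with a small sketch of a one-way ANOVA, which asks whether the mean replacement rate differs between groups of concepts. The function name `one_way_anova_f` and all numbers are assumptions made for the example; the three domain labels are borrowed from WOLD, but the values are invented. The sketch does not control for imageability, which the analysis in chapter 3 found to remove the domain differences.

```python
def one_way_anova_f(groups):
    """Return (F statistic, df between, df within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical replacement rates for concepts in three semantic domains.
rates_by_domain = {
    "The body": [1.2, 1.5, 1.1, 1.4],
    "Kinship": [1.0, 0.9, 1.3, 1.1],
    "Motion": [2.1, 1.8, 2.4, 2.0],
}
f_stat, df_b, df_w = one_way_anova_f(list(rates_by_domain.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

Because the grouping itself determines the group means, regrouping the same concepts under a different taxonomy can change the F statistic considerably, which is the dependence on taxonomy noted above.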

Lexical replacement for specific semantic domains can also be approached one domain at a time. This is done in the meso and micro studies, where the focus was narrowed to the single semantic domain of color. The studies take a successively smaller time scale perspective, which leads to the use of more precise lexical data.

The color domain was chosen for several reasons. First, its semantic content is more readily measurable and reproducible in experimental stimuli, and is therefore more suited for cross-study comparison than many other semantic domains. Second, the semantic content has an internal conceptual contiguity – regardless of human categorization, it can be established that RED is closer to PINK than it is to GREEN. In this it resembles other semantic domains that have successfully been investigated through lexical typology, such as body parts (see e.g. Majid, 2010) and temperature (Koptjevskaja-Tamm, 2014). Third, there is a strong research tradition in the lexical typology of the color domain, the methods and previous findings of which can be repurposed to infer diachronic processes. In discussing the ongoing state of lexical variation and change in color, MacLaury notes that “an individual system of color categories retains a part of its past, projects its future, and maintains a constellation of basic categories in the present” (1997, p. 107). Fourth, the color domain has an advantage when it comes to visualization: semantic differences are usually difficult to express and illustrate, but with color it becomes possible to represent denotational differences between different groups of speakers as areas in a color matrix.

The narrower the time scale and more detailed the lexical data, the clearer it becomes that the prototypical view of lexical replacement (the concept stays the same, the words replace each other), which was necessary for more general conclusions about lexical replacement in the macro study, is less useful in observations at narrower time scales (in the meso and micro studies). Instead, lexical replacement at narrow time scales is best explained in conjunction with other lexical change processes, like semantic change.

The cross-linguistic meso study of the seven Germanic languages argues for the usefulness of investigating closely related languages. While the macro study treated lexical replacement as a binary fact (it had either occurred, or it had not), the meso study answers questions about how replacement processes have taken place. For example, even if the same lexical replacement tendency occurs in related languages, the lexical strategies can clearly be different. When a new language-specific color concept (matching the comparative color concept PINK1) replaced part of the area that used to belong to another color category (matching RED), the lexicalization strategies differed for different languages, even as contact between speakers led to very similar semantic changes. The strategies include borrowing (Swedish rosa, from French); reanalysis of an existing word (Icelandic bleikur from ‘pale’); and use of a modified term (lyserød ‘light red’), which then assumes a denotation independent of that of its head constituent (in this case, rød). For the last few decades, it is useful to posit a secondary comparative color concept (PINK2) with local matches in a subset of these languages (German, Bernese, Danish, and Swedish). Here, too, different lexicalization strategies are used: borrowing, such as pink in, e.g., German, and reanalysis of a rare word, viz. cerise, in Swedish. The advent of PINK2 typically affects the denotation of PINK1.

The meso study marries cross-linguistic experimental results with additional sources. Dictionary information and sociohistorical knowledge make the lexical change process clearer – for example, they help answer the question of why new words (most likely matching very similar comparative concepts) appeared at roughly the same time in the languages. Part of the answer lies in the technical advances in chemistry and dyeing, and the import of more easily dyed fabric from India. This is one of many examples that indicate that technological advancement speeds up lexical replacement and change (as discussed in section 2.6 of the background).

The inter-generational micro study, finally, is one of the first scholarly works using color experiment methodology to compare the behavior of different generations of speakers of the same language, rather than comparing speakers of different languages. Diachronic change can then be inferred from the synchronic behavior of the two age groups. The micro study tests the hypothesis that there are differences between the groups, and that these differences are focused in the color regions matching two particular color concepts – PINK1 and PURPLE1. This is borne out through a battery of tests – among them a Wilcoxon test of the color denotation ranges of the two groups and normalized entropy calculations on the level of consensus within and between the groups. The study argues that it is interesting to look for statistically significant deviations in color denotation ranges between the generations – typically, researchers devote most of their time to best example (focal) points, not to the denotational footprint of color terms.
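As an illustration of how consensus can be quantified, the sketch below computes a normalized Shannon entropy over the labels that a group of speakers gives to a single color stimulus: 0 means that everyone used the same term, 1 means that the responses are spread evenly over the available terms. The exact normalization used in chapter 6 is not restated here, so dividing by the logarithm of the number of distinct labels (or of a fixed `n_categories`, a hypothetical parameter) is an assumption, and the responses are invented.

```python
import math
from collections import Counter

def normalized_entropy(labels, n_categories=None):
    """Shannon entropy of the label distribution, scaled to the range 0..1.

    By default the entropy is divided by log2 of the number of distinct
    labels observed; pass n_categories to scale by a fixed vocabulary size.
    """
    counts = Counter(labels)
    k = n_categories if n_categories is not None else len(counts)
    if k < 2:
        return 0.0  # zero or one category: no disagreement is possible
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(k)

# Invented naming responses for one pinkish stimulus.
older = ["rosa", "skär", "ljusrosa", "rosa", "skär", "rosa"]
younger = ["rosa", "rosa", "rosa", "rosa", "rosa", "ljusrosa"]

# A shared n_categories keeps the two groups on the same scale.
print(f"older:   {normalized_entropy(older, n_categories=3):.2f}")
print(f"younger: {normalized_entropy(younger, n_categories=3):.2f}")
```

A higher value for one group on the same stimulus indicates lower naming consensus in that group.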

Yet the experimental results are most useful when combined with more qualitative methods – the micro study employs classical techniques like speaker interviews, corpus studies, and findings from almost two centuries of dictionaries. It also introduces the use of a historical progression of botanical encyclopedias to study semantic change in color term denotation: the color of the flowers has stayed the same throughout the centuries, but their descriptions have varied.

Different kinds of sources (e.g. corpora, dictionaries, botanical encyclopedias) have different levels of inertia in how quickly they come to reflect the spoken language, and genre conventions also play a role in lexical choices. The sources tell slightly different stories about when different lexical changes occurred – a change that is revealed in interviews (such as the replacement of skär by rosa for Swedish PINK in mid-20th century Swedish) might not (yet) be reflected in fiction corpora (where skär was the more common of the two words even in works published in the late 20th century). But, importantly, the different sources converge when it comes to the different phases of lexical replacement. For Swedish it goes something like this, to simplify the complex descriptive results discussed at length in chapter 6:

Cerise, a word matching the cross-linguistically relevant comparative concept PINK2, is, at least for some speakers, currently replacing part of the denotation of rosa ‘pink’. Rosa ‘pink’ (currently matching the PINK1 comparative concept) is also replacing skär ‘pink’. Skär ‘pink’ once replaced part of the denotation of röd ‘red’. Similarly, for PURPLE1, lila is replacing violett. Violett was at one point in lexical competition with gredelin. Violett is also a new term, presumably introduced for colors that were previously labeled brun ‘brown’ or with color compounds (e.g. rödblå ‘red blue’).

These replacement processes occur in tandem with semantic changes for the words: Cerise has been a reddish term (evidenced by its lexicographic history and by the way older speakers sometimes combine it with röd) but is becoming more of a pinkish term. Skär is present in the elicitation results of some older speakers, but only for a very light PINK1: the term skär has undergone semantic change. Röd ‘red’ has a smaller denotative range for the younger generation than the older: where it disappears, the younger generation uses rosa ‘pink’ instead. Violett is very rarely used by the younger generation, but when it is used by some speakers of the older generation it is typically used for the darker part of PURPLE1.

The inter-generational comparison in the micro study enables identification of several different kinds of lexical change processes. There is restriction and expansion (both well-known processes in language change in general), movement (related to restriction and expansion), focusing, and denotational loss – these processes are discussed in 6.5.2.

As shown with these examples, the micro study has several descriptive results with respect to lexical replacement and semantic change in Swedish – some of these “word histories” will be further discussed here.

When the micro study is combined with the knowledge gleaned from the cross-linguistic distribution in the meso study, several aspects of the lexical processes are elucidated. Without the cross-linguistic data in the meso study, it would not be clear that Swedish cerise ‘pink’ is probably not just another short-lived, rare color term: the comparative concept that it matches, PINK2, has a stable presence in several neighboring languages, as discussed above in the context of the meso study. The detailed interview data and intergenerational comparison of the micro study further establish that while cerise ‘pink’ might have a limited denotation, it is in fact very focused with a strong presence in a particular small region in color space. The intergenerational data shows the process of change: for example, cerise ‘pink’ typically alternates with the röd ‘red’ term in the older generation, and with the rosa ‘pink’ term in the younger generation.

In contrast, comparing the results of the meso and micro studies reveals that Swedish skär ‘pink’ is a case of local Swedish lexical replacement that is not matched by a pattern in neighboring languages. Naturally, more data on older speakers’ color categorization and labeling in the other Germanic languages would shed further light on this.

The word history of Swedish skär is very similar to that of violett ‘purple’, another word studied in detail in the micro study: they are both in ongoing processes of lexical replacement, by rosa ‘pink’ and lila ‘purple’, respectively. But the micro-study analysis must be informed by the meso study finding of a cross-linguistic pattern: in all the studied languages in the meso study, there is a term matching a comparative concept PURPLE1, but there is also an ongoing disappearance of an older purple color term (Purple2) – and Swedish violett is one such word.

There is not enough data to derive a quantitative figure for the rate of the lexical replacement processes at the narrower time scales of the meso and micro studies. But the competition of the terms shows that lexical replacement can be both surprisingly quick and surprisingly slow. It is quick in that in three generations, skär went from being the most salient pink term (in the generation before the “older speakers” in this study) to having a very weak presence among the younger speakers in this study. Also, the last few generations of Swedish speakers have gone through three different “most salient” terms for the purple region of the color spectrum. The turnover among color terms can also be surprisingly slow, as evidenced by gredelin. Gredelin is used very few times by the current older generation of Swedish speakers in the micro study, and the speakers disagree strongly about what the best example for gredelin looks like. Yet even if the Swedish speakers are uncertain about the denotation of the color word, it still lives on in the language thanks to a single very strong connotation: it is the color of the dress of a fairy tale figure, Tant Gredelin. The fact that the speakers do not have a visual memory of the color of that dress does not mean that the word does not have semantic content.

All these individual word histories are descriptive results – they can be compared to similar lexical change processes for other color concepts, but are in essence akin to the traditional taxonomies of different kinds of lexical changes (as discussed in 2.4 of the background). Descriptive results are interesting in their own right, but even more so when combined with general knowledge about the semantics of color, which brings us to another research question (“B2”).

The most well-known hypothesis about color semantics is the claim that there is a predictable partitioning strategy (a predictable order) according to which the new color terms will appear and start to denote parts of the color spectrum (most clearly articulated in Kay & Maffi, 1999). When a language has partitioned off all primary colors, derived color concepts (e.g. PINK, PURPLE, ORANGE) in the border regions between these primary colors will be partitioned off – and this is the stage at which the Germanic languages discussed in this thesis are. This makes the propensity for different kinds of lexical change in the lexicalization and conceptualization of PINK and PURPLE, and their neighboring colors like RED and BLUE, unsurprising – yet not all the derived colors labeled in the languages (e.g. ORANGE or GRAY) experience the same amount of lexical change.

Many of the other hypotheses in the literature on the semantics of color concern the advent of new color terms and their interaction with already existing terms. MacLaury (1997) has suggested different typical templates that this lexical competition may follow – but the templates cover mainly the way that two synonymous color terms may, first, become hypernym and hyponym, and then separate and denote two different color concepts. This thesis suggests that an additional template is of interest: when a term is in the process of disappearing, at the very end of a lexical replacement process, it may be pushed out to the border regions of the color concept that it once denoted. It will naturally become less frequently used and often denote a specific subset of the extension of a larger term – this can be exemplified by the Purple2 words in many of the Germanic languages (e.g. violett ‘purple’ in Swedish) or by the gradual marginalization of skär ‘pink’ in Swedish.

The hypothesis that lexical change in the color domain is tied to the border regions between colors has been approached by researchers such as Sun (1983), who notes that new concepts should and do evolve in the border regions between established colors. Lindsey and Brown (2014) and Alvarado and Jameson (2002) also write that there is high group consensus on how to label central parts of the color concepts (simple, basic terms are used), but that there is low consensus on how to name the border regions (complex, rare terms). The micro study shows that the PINK1 (which denotationally includes the hyponym PINK2) and PURPLE1 areas of Swedish color space are more prone to low consensus, hesitations, compounds, and modifiers than other parts of color space: this reinforces the idea that these are young concepts in the speaking community.

Another part of the psychophysiology of color is the connection between the aging visual system of the individual and her color labeling and categorization. Desgrippes’ (2011) results from French suggest that aging speakers are similar to second language learners acquiring a new color system. I have shown that older Swedish speakers have less consensus and more hesitations than younger speakers. It is possible that lexical change in the color domain is tied to subjectification (discussed in 2.7.4): at a certain age, the color perception of older humans changes, slightly but inevitably. The older people are, of course, still an active part of the language community, however, and as their color perception changes, so might their language, and this becomes a factor that affects lexical change in the language community.

Turning back to the general once more, the final research question (“C”) concerned how different perspectives on lexical replacement may complement each other. The interplay between the meso and micro studies has been demonstrated above.

The factors that were found to be connected to lexical replacement in the macro study can also be (tentatively) connected to more concrete lexical replacement in the color domain. The statistical model in the macro study indicates that low frequency, a high number of synonyms, a low level of imageability, and a low number of senses are connected to a higher rate of lexical replacement. But what could the concrete, local reflection of these generalizations be in lexical replacement processes in the semantic domain of color?

The connection between low frequency and the likelihood of lexical replacement is clear on the surface – as a term is used by fewer speakers, this can be a sign of its denotative loss. However, a low frequency of use can naturally also be connected with a new term, which may instead be gaining in salience. In the case of cerise ‘pink’, infrequent use is coupled with a high consensus of use: it is very prominent for a small part (mainly two stimuli) of color space. Violett ‘purple’ is also used infrequently, but its use is spread out over a large area. Violett is more likely to disappear than cerise.

If a high number of synonyms is connected to an increase in lexical replacement in the color domain, this is most likely due to the fact that it is easier to switch vantages if more concepts cover similar or neighboring parts of color space: as an example, many Swedish speakers can choose to use rosa or cerise for a darker pink, depending on whether they are focusing on difference or similarity, to use MacLaury’s (e.g. 1995) terms. The high number of synonyms is naturally connected to a greater chance of inference.

The imageability measurement in the statistical model was derived from averaging speakers’ evaluations of how easy it was to form a mental image of the concept denoted by a word. In the domain of color, this has a very tangible counterpart: how easy is it to point out the best example of a term? Is there a high group consensus regarding this, or about the extension of the term? Overall, the older group showed less group consensus than the younger group when it came to the denotation of the pink and purple color terms – the disappearance in the younger generation of the terms skär ‘pink’, violett ‘purple’, and gredelin ‘purple’ is heralded, obviously, by their lack of use among members of the younger age group, but also by the older group’s low consensus in the best example task and in the naming task; the high number of hesitations in answers; and the uncertainty displayed in the interviews.

The connection between a low number of senses and a higher rate of lexical replacement in the statistical model is possibly reflected, for lexical replacement in the color domain, by the difference between denotational (extension in color space) and connotational (associations) meaning. Gredelin ‘purple’ is a good example here: the Swedish term is, denotationally, all but gone in the younger generation and has a very weak presence in the older generation. Yet the term is widely recognized, and most of the older speakers had a consistent connotation in their interviews about what it represented: the children’s storybook character Tant Gredelin ‘Auntie Gredelin’ and, more specifically, the color of her dress. (The younger group was not asked about connotations of gredelin, but to give some anecdotal evidence of the current state of the word in younger generations, the storybook is still widely available in children’s sections of major bookshops.)

The fact that the older group had a very low consensus about the denotation does not make the connotation any less strong. Gredelin ‘purple’ may soon be denotationally dead, but will live on through its other sense, showing that having many different kinds of meaning can strengthen a word form, protecting it from lexical disappearance.

In a nutshell, the general conclusions of this monograph are the following:

• It is possible and worthwhile to seek to uncover generalizations about lexical replacement.

• The scopes of these generalizations differ: some generalizations can be made for the lexicon as a whole (they are domain-independent); others will have scope only over particular semantic domains (they are domain-dependent).

• The scope of generalizations also depends on the time scale perspective that is taken. Lexical replacement is a broad time scale perspective of what, at narrower time scales, is better understood as several intertwined gradual processes of lexical change.

• Given highly different time scales and the need for domain-dependent and -independent considerations, lexical change is best studied using triangulation of different methods.

• Quantitative methods, such as linear regression models, can be helpful in uncovering generalizations about lexical replacement at broad time scales. Such a model is presented in chapter 3, and it demonstrates a) that lexical replacement is dependent on the type of concept (open class, closed class), and b) that frequency, number of synonyms, imageability, and number of senses can, to a certain extent, predict variation in the rate of replacement for the (open class) comparative concepts in a Swadesh list.

• Experimental methods are useful when investigating previous and ongoing lexical change processes at narrower time scales.

• Color is an ideal domain for studying lexical change because its semantic space is well understood and measurable extra-linguistically, and because there is a strong research tradition in the (synchronic) lexical typology of color that can be repurposed to understand diachronic change.

• The pink and purple zones of the color space in modern Germanic languages are particularly favorable for studying detailed lexical change in the color domain because they are highly dynamic.

• Lexical change processes in a particular language can be better understood if neighboring languages are also investigated – in this thesis, the cross-linguistic results of color elicitation experiments from seven Germanic languages are combined with results from two different generations of speakers of one of these languages, viz. Swedish.

• This is one of the first scholarly works to use the standardized color experimentation methods to compare speakers from two generations of the same language, rather than speakers of two different languages.

• A battery of different statistical tests can be used with experimental data to elucidate differences between groups, such as Wilcoxon tests, to check for denotational differences, and entropy tests, to compare the level of consensus between different groups.

• Lexical replacement results from different time scales can complement each other. The results from the statistical model in the macro study suggest that low frequency, a high number of synonyms, a low level of imageability, and a low number of senses are connected with a higher rate of lexical replacement. The domain-dependent version of these domain-independent claims could be that low frequency is a natural condition of disappearing color terms (but also of new terms); that high lexical competition among color words leads to many near synonyms; that a high level of imageability for a color term is similar to a high level of consensus in a group of speakers for the denotation of that term; and that having several different senses can insulate a term from lexical replacement.
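The Wilcoxon comparison named in the list above can be sketched as follows. The sketch assumes the rank-sum variant for two independent groups (the conclusions do not restate which variant was used in chapter 6), uses a normal approximation without tie correction, and runs on invented counts of how many color chips each speaker labeled with a given term. The function names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def rank_sum_test(a, b):
    """Wilcoxon rank-sum test; returns (U for sample a, z by normal approximation)."""
    ranks = average_ranks(list(a) + list(b))
    n1, n2 = len(a), len(b)
    w = sum(ranks[:n1])                # rank sum of the first sample
    u = w - n1 * (n1 + 1) / 2          # equivalent Mann-Whitney U
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction applied
    return u, (u - mean_u) / sd_u

# Invented data: chips labeled with one term, per speaker, in each age group.
older_counts = [9, 11, 8, 12, 10, 13]
younger_counts = [5, 6, 4, 7, 6, 5]
u, z = rank_sum_test(older_counts, younger_counts)
print(f"U = {u:.1f}, z = {z:.2f}")
```

For small samples such as these, an exact test or a statistics package with tie handling would be more appropriate than the normal approximation.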

The choice of perspective governs the questions that can be pursued and the kinds of answers that can be expected.

Future research on lexical replacement should combine the perspectives of macro, meso and micro studies. Diverse methods like statistical modelling, experiments, corpus research and classical dictionary studies all provide different pieces of the puzzle.

Domain-independent, general tendencies for lexical replacement can be fruitfully studied through statistical explorations of large data sets. At the moment, many researchers reuse the same datasets, which limits how far results can be generalized and increases the risk of statistical error. Many researchers (e.g. Calude & Pagel, 2011; Monaghan, 2014; Pagel et al., 2007) have used Dyen, Kruskal and Black’s (1967) cognate tagged database of Indo-European languages to produce interesting claims and hypotheses. The next step is to further test these hypotheses with new and larger datasets. But such datasets must first be compiled.

Two of the largest extant sources, apart from Dyen, Kruskal and Black’s work, of cognate tagged data are the ABVD (the Austronesian Basic Vocabulary Database, Greenhill et al., 2008) and the IELex (Indo-European Lexical Cognacy Database, http://ielex.mpi.nl/). Another possible data source is Buck (1949) which has approximately a thousand meanings and their realizations and cognate class from 31 Indo-European languages. That data was digitized by Sankoff and assistants in a meticulous and time-consuming project using punch cards, and formed the basis for his doctoral thesis (1970). Unfortunately, the digitized form of that data is no longer available, but the task could be redone – a part of the Buck data is already in IELex.

Another large repository for comparative dictionary data is the Intercontinental Dictionary Series (Key & Comrie, 2015), which stores lexical data for 1310 concepts for 849 languages (though not all languages have data for all the concepts). Translation equivalents can also be extracted from parallel text databases such as the parallel Bible text corpus (Mayer & Cysouw, 2014) or the “Parallel Corpus of Slavic and other languages” (ParaSol, see von Waldenfels, 2006). Neither the IDS data nor the parallel text data comes tagged with cognate class information, though.

An alternative to the interesting, but incredibly time-consuming, work of manually determining cognate classes for these data sources would be to automate the process. Manual cognate judgments can to a certain extent be replaced by automatic distance measures, typically (versions of) Levenshtein distances (see e.g. Wichmann, Holman, Bakker, & Brown, 2010). This works well in some cases, such as Indo-European (Serva & Petroni, 2008), but less well for other language families, such as Austronesian (Greenhill, 2011); and the more distant the language varieties are, the more problematic the method is (Dunn, 2015, p. 195). Recently, work has also been done on neural network approaches to automated cognate identification (see e.g. Rama, 2016). As the number of concepts under investigation increases, the chance of borrowing in the data also increases, and Dunn & Terrill (2012) have shown that distance measures are very vulnerable to undetected borrowings. Nonetheless, this area of research is growing and will hopefully provide new and better datasets that can be used for research into lexical replacement.
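The distance measures discussed above can be illustrated with a length-normalized Levenshtein distance between two word forms. This is a minimal sketch under stated assumptions and does not reproduce any of the cited implementations: the function names are hypothetical, the example forms are plain orthographic strings, and normalizing by the longer form is only one of several possible choices. No threshold for treating two forms as cognate candidates is proposed here.

```python
def levenshtein(a, b):
    """Minimum number of single-symbol insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def normalized_distance(a, b):
    """Levenshtein distance divided by the longer form's length (0.0 to 1.0)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


# Orthographic forms for the concept HAND, used purely as an illustration.
forms = {"English": "hand", "Swedish": "hand", "Icelandic": "hönd"}
for language, form in forms.items():
    print(language, normalized_distance(forms["English"], form))
```

A small distance only shows that two forms look alike. As noted above, an undetected borrowing produces the same signal as shared inheritance, so a measure of this kind cannot separate the two on its own.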

As for domain-specific knowledge about how lexical replacement and change processes proceed, it would be interesting to apply the apparent time construct to more semantic domains. Data from several generations of speakers of the same language can then be compared in order to study lexical change, and the experimental data can be combined with corpus data from the same time periods as the speakers. The EoSS project has amassed data from 50 Indo-European languages not only on color, but also on body parts and (kitchen) containers. It thus provides an excellent basis for lexical typological research into these semantic domains, but the participants mainly belonged to a single generation, typically young adults. The methods suggested in this thesis could be applied on a larger scale if more data were gathered on how older speakers lexicalize and categorize these domains.

In order to better establish the apparent time construct as a methodology for studying lexical replacement in the color domain, more research is needed on how color vocabulary stabilizes during language acquisition (in children and young adults). Future research might also comprise longitudinal real-time studies of color term acquisition.

The most promising semantic domains for further research into lexical replacement are those where lexical change happens relatively quickly. This is true for parts of the color domain, as has been shown in this thesis. Much work has already been done which catalogues different kinds of slang and taboo words, but it would be interesting to attempt to quantify the quick language change apparent in these domains.

At the heart of lexical replacement research lies the problem of comparing like with like. In the case of projects such as the EoSS protocol, this is assured through the use of the same stimuli. Future research could expand to other domains and language data sources where the concept, or referent, being named is (at least somewhat) stable, but the lexical labeling varies over time. This thesis presented a small case study that showed how botanical encyclopedias can be utilized in this way. As more and more historical material is digitized, this method could be expanded.


Finally, this thesis has approached lexical replacement as an almost context-less phenomenon in several respects. It has not dealt with the sociolinguistic aspects that might motivate lexical replacement, or with the possibility that different groups (such as genders) might replace different parts of the vocabulary to different degrees. It has also only touched upon the interesting question of how words can live on in one genre but disappear in another.

To end this thesis, and as a final note on the importance of triangulation, I’d like to quote an old, well-known Indian parable:

Some blind men encounter a big animal for the first time, but each touches a different part.

“The beast is like a rope!” says one. “No, it’s like a pillar!” says another. “You are all wrong, it’s like a big leaf!” says a third.

In the end, it is only when they put their different insights together that they can figure out what the elephant truly looks like.


APPENDIX A: FACTORS INFLUENCING THE RATE OF REPLACEMENT.

Column  Variable            Source
A       Concept             From Swadesh (1957)
B       RateOfReplacement   From Pagel et al. (2007)
C       Synonyms_average    Average of the synonym values
D       SynonymsEnglish     Extracted from data in Lindberg and Oxford University Press (2012)
E       SynonymsDanish      Extracted from data in Ingemann (2011)
F       SynonymsDutch       Extracted from data in http://synoniemen.net
G       SynonymsGerman      Extracted from data in http://www.woerterbuch.info/
H       SynonymsSwedish     Extracted from data in Walter (2000)
I       Frequency           From Pagel et al. (2007)
J       Imageability        From Cortese & Fugett (2004)
K       Concreteness        From Brysbaert et al. (2014)
L       MutualInformation   Extracted from data at http://corpus.byu.edu/bnc/
M       Arousal             From Warriner et al. (2013)
N       Senses              Extracted from data in Fellbaum (1999)
O       AgeOfAcquisition    From Monaghan (2015)
P       WordClass           From Pagel et al. (2007)

A                   B     C     D     E     F     G     H     I     J     K     L     M     N     O     P
ANIMAL              3.4   0.47  0.24  1.01  0.33  0.45  0.31  97    NA    4.61  6.44  4.3   1     2.89  Noun
ASHES               2.65  0.28  0.21  0.13  0.05  0.15  0.83  9     NA    4.92  2.46  NA    NA    7.06  Noun
BACK                4.29  0.54  0.43  1.14  0.33  0.40  0.42  148   5     4.33  5.20  2.59  9     5.31  Noun
BAD                 6.87  2.46  0.40  1.71  6.02  1.46  2.70  218   2.4   1.68  5.85  4.86  14    2.79  Adjective
BARK_OF_A_TREE      3.79  0.28  0.16  0.06  0.43  0.20  0.52  8     5     4.52  2.37  3.4   1     NA    Noun
BELLY               4.39  0.62  0.40  0.25  0.54  0.91  0.99  32    NA    4.8   1.93  3.75  5     4.05  Noun
BIG                 3.41  2.08  2.07  4.50  1.36  0.76  1.71  1119  4.5   3.66  5.39  4.33  13    2.89  Adjective
BIRD                2.15  0.24  0.16  0.44  0.33  0.10  0.16  63    6.4   5     7.55  3.83  5     3.52  Noun
BLACK               1.91  NA    NA    NA    NA    NA    NA    203   6.4   3.76  7.26  3.58  5     3.56  Adjective
BLOOD               2.19  0.50  0.81  0.06  NA    0.45  0.68  132   6.4   4.86  8.65  5.76  5     4.89  Noun
BONE                1.34  0.58  NA    1.08  0.33  0.25  0.68  46    6.4   4.9   7.55  4.75  3     5.53  Noun
CHILD_YOUNG         6.31  1.44  0.73  2.41  1.95  0.76  1.35  626   6.4   4.78  5.84  5.33  4     5.15  Noun
CLOUD               3.25  0.47  0.75  0.19  0.27  0.20  0.93  31    6.7   4.54  7.09  2.81  6     3.63  Noun
COLD_WEATHER        3.58  1.77  1.45  3.11  0.98  1.51  1.82  96    4.2   3.85  6.77  NA    13    3.95  Adjective
DAY_NOT_NIGHT       1.04  0.52  0.67  0.25  0.27  0.76  0.62  1127  4.8   3.92  5.75  3.62  9     3.50  Noun
DIRTY               9.27  2.03  2.55  2.03  1.90  1.51  2.18  41    NA    4.23  5.96  5.05  12    4.55  Adjective
DOG                 1.79  0.65  0.67  0.82  0.65  0.50  0.62  120   6.9   4.85  7.44  5.43  7     2.80  Noun
DRY_SUBSTANCE       1.83  1.73  1.72  1.71  0.98  2.87  1.35  51    3.8   3.77  6.62  3.8   NA    4.11  Adjective
DULL_KNIFE          6.79  1.52  2.36  2.16  0.71  1.11  1.25  13    3     2.37  3.74  1.67  NA    8.05  Adjective
DUST                3.63  0.60  0.35  0.06  1.68  0.30  0.62  40    5.2   4.4   6.83  3.45  3     5.06  Noun
EAR                 0.88  0.50  0.38  0.38  0.81  0.35  0.57  107   6.8   5     6.63  3.5   5     3.63  Noun
EARTH_SOIL          3.52  1.06  0.46  1.39  1.14  1.36  0.93  242   6.5   4.8   6.59  5.04  7     NA    Noun
EGG                 1.57  0.37  NA    0.38  0.22  0.55  0.31  34    NA    4.97  9.33  3.28  3     3.89  Noun
EYE                 0.93  0.79  1.07  0.70  0.92  0.25  0.99  520   6.7   4.9   7.95  3.95  5     3.75  Noun
FAR                 3.14  0.63  0.30  1.08  0.38  1.26  0.16  215   3.5   2.71  5.22  NA    4     4.88  Adjective
FAT_SUBSTANCE       4.89  0.86  1.13  0.70  0.49  1.41  0.57  20    5.8   4.52  7.44  3.89  NA    5.15  Noun
FATHER              2.3   0.87  1.05  0.38  0.87  0.81  1.25  356   NA    4.52  5.97  3.68  8     4.11  Noun
FEATHER_LARGE       2.35  0.29  0.08  0.13  0.22  0.50  0.52  23    NA    4.9   4.17  3.29  NA    4.67  Noun
FIRE                1.75  1.46  1.26  1.20  2.06  1.51  1.25  123   6.5   4.68  7.65  6.05  9     3.25  Noun
FISH                1.45  0.13  NA    0.19  0.05  0.00  0.26  64    6.8   5     7.44  3.33  4     4.05  Noun
FLOWER              2.29  0.54  0.56  0.38  0.43  0.71  0.62  83    NA    5     7.29  3.67  3     3.11  Noun
FOG                 4.97  0.51  0.21  0.32  0.71  0.76  0.57  25    6     4.66  2.98  3.67  3     6.21  Noun
FOOT                1.39  0.73  0.38  1.27  0.60  0.55  0.83  294   6.7   4.9   6.69  2.77  11    3.44  Noun
FRUIT               2.95  0.68  NA    0.76  0.38  0.86  0.73  31    6.3   4.81  8.12  4.09  3     3.63  Noun
GOOD                3.04  2.55  4.27  4.56  2.17  0.76  0.99  1476  2.6   1.64  5.65  3.66  21    3.55  Adjective
GRASS               2.73  0.69  NA    1.20  0.49  0.66  0.42  49    6.9   4.93  7.76  3.39  4     3.94  Noun
GREEN               2.46  NA    NA    NA    NA    NA    NA    116   6.2   4.07  7.82  4.07  6     3.79  Adjective
GUTS                6.89  0.34  0.62  0.25  0.05  0.55  0.21  10    NA    3.85  1.70  NA    1     6.96  Noun
HAIR                3.61  0.69  0.54  0.89  0.71  0.81  0.52  104   6.3   4.97  9.05  3.71  6     3.17  Noun
HAND                0.82  0.71  1.13  0.38  0.81  0.50  0.73  751   6.4   4.72  6.60  3.98  14    2.74  Noun
HEAD                2.32  2.11  2.58  1.46  3.20  1.97  1.35  446   6.8   4.75  6.77  4.45  33    3.42  Noun
HEART               1.69  1.39  1.59  1.08  1.74  1.41  1.14  153   6.7   4.52  7.29  5.07  10    5.17  Noun
HEAVY               2.89  2.67  3.33  2.79  2.23  1.46  3.53  119   NA    3.37  6.79  3.65  27    4.05  Adjective
HUSBAND             4.74  0.96  0.24  1.65  0.33  1.36  1.25  167   NA    4.11  4.57  4.38  1     5.53  Noun
ICE                 2.58  0.18  0.54  0.19  0.00  0.00  0.16  40    6.3   4.89  8.57  3.3   7     3.86  Noun
LAKE                3.5   0.62  0.16  0.82  0.71  0.66  0.78  41    6.6   4.88  8.67  2.64  3     4.61  Noun
LEAF                2.43  0.95  0.30  1.20  1.14  1.16  0.93  70    6.8   5     7.47  3.05  3     4.60  Noun
LEFT                4.41  NA    NA    NA    NA    NA    NA    148   NA    3.7   6.77  NA    5     5.57  Adjective
LEG                 3.73  0.86  0.32  1.52  1.57  0.20  0.68  252   6.8   4.83  7.33  2.75  9     3.00  Noun
LIVER               2.66  NA    NA    NA    NA    NA    NA    12    NA    4.68  8.37  3.27  2     8.56  Noun
LONG                1.22  0.81  0.48  0.63  0.54  1.51  0.88  330   3.6   3.18  5.73  NA    9     4.24  Adjective
LOUSE               1.74  0.36  NA    0.32  0.05  0.66  0.42  2     3.4   NA    NA    3.61  2     10.26 Noun
MAN_MALE            3.38  1.56  1.42  1.65  1.74  1.06  1.92  638   6.2   4.79  6.29  NA    11    3.11  Noun
MEAT_FLESH          1.62  NA    NA    NA    NA    NA    NA    57    6.3   4.9   7.47  NA    3     NA    Noun
MOTHER              2.36  0.57  0.64  0.38  0.81  0.40  0.62  316   NA    4.6   6.27  4.73  5     2.63  Noun
MOUNTAIN            2.75  0.92  0.75  0.70  1.36  1.06  0.73  82    NA    4.96  7.77  4.12  2     6.15  Noun
MOUTH               2.78  1.07  0.94  0.89  1.52  1.01  0.99  88    NA    4.74  6.67  4.14  8     3.58  Noun
NAME                0.47  1.07  1.07  1.96  1.19  0.45  0.68  292   4.2   3.5   5.71  3.04  6     3.68  Noun
NARROW              2.89  1.25  1.13  0.38  1.36  1.51  1.87  73    NA    3.04  7.04  4.53  5     7.57  Adjective
NECK                3.57  0.42  NA    0.38  0.38  0.45  0.47  63    6.5   5     7.29  3.65  5     3.00  Noun
NEW                 0.6   1.34  1.45  0.82  1.52  1.51  1.40  686   3     2.81  6.05  5.14  10    4.72  Adjective
NIGHT               0.76  0.23  0.16  0.25  NA    0.20  0.31  359   5.8   4.52  6.27  3.57  8     3.61  Noun
NOSE                1.49  1.06  0.73  1.84  0.43  1.01  1.30  78    6.4   4.89  7.24  3.1   8     2.95  Noun
OLD                 2.53  1.71  2.23  2.85  1.09  1.51  0.88  422   4.3   2.72  6.96  4.48  8     3.72  Adjective
PERSON              4.81  0.96  0.48  1.39  NA    0.96  0.99  1086  NA    4.72  5.78  3.71  3     4.67  Noun
RED                 2.26  NA    NA    NA    NA    NA    NA    152   6     4.24  7.74  5.02  3     3.68  Adjective
RIGHT               2.61  NA    NA    NA    NA    NA    NA    429   NA    3.47  5.45  5     3     4.35  Adjective
RIGHT_CORRECT       3.91  1.44  1.59  0.38  1.90  2.92  0.42  94    3.6   3.47  5.45  5     12    NA    Adjective
RIVER_STREAM_BROOK  3.69  0.61  NA    0.38  0.49  0.71  0.88  115   NA    4.89  8.64  4.22  1     NA    Noun
ROAD                4.86  1.51  0.35  3.17  1.57  0.96  1.51  318   6.6   4.75  8.37  3.81  2     4.55  Noun
ROOT                1.75  0.63  0.56  1.08  0.38  0.50  0.62  44    6     4.34  5.68  3.62  8     5.94  Noun
ROPE                6.09  0.49  0.19  0.32  0.71  0.55  0.68  29    6.6   4.93  4.88  4.58  2     5.44  Noun
ROTTEN              4.76  1.42  2.18  0.76  1.47  1.51  1.19  8     NA    3.87  4.00  4.9   3     6.95  Adjective
SALT                1.01  0.26  0.38  0.19  0.27  0.15  0.31  25    6.2   4.89  7.93  4.53  3     5.05  Noun
SAND                4.26  NA    0.00  NA    0.00  0.00  0.00  41    6.7   5     8.03  3.43  2     4.63  Noun
SEA_OCEAN           2.53  0.81  0.54  0.82  0.92  0.71  1.04  127   6.4   4.79  8.27  2.8   3     NA    Noun
SEED                2.56  0.52  0.78  0.32  0.71  0.30  0.52  19    6.4   4.71  8.31  3.68  5     4.72  Noun
SHARP_KNIFE         3.76  2.33  2.53  1.71  3.04  1.51  2.86  44    4.6   3.86  6.11  6     NA    6.11  Adjective
SHORT               3.39  1.20  2.04  0.76  0.65  1.51  1.04  176   4.5   3.61  5.98  2.62  11    4.32  Adjective
SKIN_OF_PERSON      2.73  0.44  0.54  0.32  0.27  0.50  0.57  68    6.3   4.79  6.73  3.25  8     4.48  Noun
SKY                 2.52  0.69  0.21  0.38  0.76  1.21  0.88  106   6.5   4.45  7.17  2.74  1     4.17  Noun
SMALL               3.7   1.69  2.04  1.71  1.79  1.51  1.40  469   4.2   3.22  5.82  3.43  10    3.22  Adjective
SMOKE               1.7   0.39  0.13  0.06  0.49  0.66  0.62  50    6.4   4.96  8.98  5     8     4.00  Noun
SMOOTH              6.23  1.42  1.48  1.58  1.30  1.51  1.25  25    4     3.81  6.32  2.76  8     5.61  Adjective
SNAKE               3.11  0.39  NA    0.19  0.54  0.66  0.16  15    6.5   5     3.02  7.24  6     5.10  Noun
SNOW                1.62  0.45  NA    0.13  NA    0.76  0.47  69    NA    4.85  6.12  4.57  2     4.11  Noun
STAR                0.72  0.85  0.73  0.57  1.30  0.71  0.93  83    6.8   4.69  7.66  5.5   8     3.89  Noun
STICK_OF_WOOD       8.02  0.60  0.59  0.51  0.60  0.76  0.57  29    5.9   4.59  6.28  3.81  9     3.89  Noun
STONE_ROCK          2.73  0.52  0.67  0.19  0.60  0.60  0.52  101   6.6   4.72  7.35  3.25  7           Noun
STRAIGHT            4.08  1.24  1.85  0.95  0.54  1.51  1.35  43    4.1   2.76  4.81  NA    15    6.80  Adjective
SUN                 0.98  0.33  NA    0.06  0.54  0.25  0.47  133   6.9   4.83  7.66  4.64  4     3.40  Noun
TAIL                4.16  0.74  0.56  1.08  0.27  1.16  0.62  46    6.5   4.96  7.33  3.27  8     3.70  Noun
THICK               5.41  1.82  1.72  2.79  1.52  1.11  1.97  64    3.7   4     6.57  3.13  10    5.61  Adjective
THIN                2.99  1.32  1.59  1.33  1.09  1.46  1.14  110   4.9   3.83  6.07  4.5   8     6.47  Adjective
TO_BITE             4.67  0.72  0.83  0.89  0.33  1.11  0.47  14    5.2   4.44  5.27  5.1   4     3.58  Verb
TO_BLOW             4.31  1.00  1.21  0.44  0.33  1.81  1.19  23    4.2   3.74  6.77  4.48  23    4.00  Verb
TO_BREATHE          2.48  0.77  0.67  0.51  0.22  0.76  1.71  41    3.8   4.36  5.43  2.6   9     4.11  Verb
TO_BURN             2.94  1.26  1.61  1.14  1.25  1.51  0.78  42    4.7   4.11  6.07  5.4   15    4.72  Verb
TO_COME             1.9   1.09  0.78  1.27  1.03  1.06  1.30  845   2.2   2.72  4.59  3.57  21    3.32  Verb
TO_COUNT            4.07  0.72  0.89  0.63  0.71  0.30  1.09  431   4.4   3     4.16  2.3   9     2.61  Verb
TO_CUT              3.36  1.89  2.74  2.35  1.36  1.56  1.45  92    5.1   4.55  5.35  5.07  41    4.43  Verb
TO_DIE              0.82  1.44  1.53  1.33  1.03  1.51  1.82  179   4.4   3.07  6.25  6.9   11    4.53  Verb
TO_DIG              3.86  0.73  1.18  0.19  0.87  0.76  0.68  12    5.4   4.33  6.15  3.67  8     4.19  Verb
TO_DRINK            1.49  1.13  0.51  1.01  1.41  1.36  1.35  92    5.7   4.76  6.26  5.19  5     3.47  Verb
TO_EAT              1.77  0.86  0.86  0.57  1.30  0.35  1.19  470   4.8   4.44  5.75  4.38  6     2.78  Verb
TO_FALL_DROP        2.97  2.03  2.53  1.65  3.20  0.76  2.03  202   4.8   4.04  6.10  4.24  NA    NA    Verb
TO_FEAR             4.06  NA    NA    NA    NA    NA    NA    136   2.9   2.57  4.14  6.14  5     4.79  Verb
TO_FIGHT            4.21  1.32  1.56  1.08  1.09  1.31  1.56  60    4.8   4.2   5.70  6.33  4     5.47  Verb
TO_FLOAT            4.04  0.82  0.64  0.32  1.63  NA    0.68  21    5.1   4.07  3.44  3.1   9     5.39  Verb
TO_FLOW             3.4   1.15  0.99  0.89  0.33  1.46  2.08  20    3.8   3.72  3.12  3.71  7     7.40  Verb
TO_FLY              2.83  1.16  0.99  1.52  1.14  1.51  0.62  72    5.9   4.64  6.35  4.9   14    3.05  Verb
TO_FREEZE           2.69  0.67  0.73  0.57  NA    0.86  0.52  20    5.2   3.96  2.80  4     10    4.89  Verb
TO_GIVE             0.76  1.90  2.04  2.09  2.28  1.56  1.51  1558  2.8   2.83  5.29  4.57  44    4.28  Verb
TO_HEAR             2.35  1.11  0.78  0.95  1.57  1.56  0.68  398   NA    3.66  5.59  3.62  5     3.80  Verb
TO_HIT              5.51  1.98  1.59  0.44  2.17  2.57  3.12  90    4.4   4.11  5.78  5.48  16    4.75  Verb
TO_HOLD_IN_HAND     4.48  2.18  3.14  1.58  2.61  1.56  2.03  270   4     3.68  5.01  4.86  36    4.67  Verb
TO_HUNT_GAME        3.33  1.37  0.62  2.47  0.81  1.46  1.51  20    4.6   3.81  3.71  5.1   NA    6.06  Verb
TO_KILL             3.5   1.68  2.55  0.82  0.87  2.62  1.56  118   4.8   3.9   6.27  6.81  15    6.35  Verb
TO_KNOW_(FACTS)     2.11  1.25  0.99  0.89  2.01  0.96  1.40  1826  2.4   1.68  3.70  3.24  11    4.50  Verb
TO_LAUGH            1.61  NA    NA    NA    NA    NA    NA    100   4.9   4.21  5.61  6.62  1     3.79  Verb
TO_LIE_ON_SIDE      3.56  0.98  0.89  0.76  0.92  1.51  0.83  166   3.2   3.11  5.69  4.81  6     3.75  Verb
TO_LIVE             1.1   0.89  1.18  0.82  0.54  0.96  0.93  511   3.5   3.57  4.48  4.71  7     6.10  Verb
TO_PLAY             3.18  1.62  1.34  0.51  2.28  1.51  2.44  327   4.5   3.24  5.95  3.81  35    4.10  Verb
TO_PULL             4.53  1.51  0.75  0.38  1.63  1.51  3.27  92    4.1   3.97  5.78  4.1   17    4.79  Verb
TO_PUSH             8.06  1.97  1.18  2.54  1.41  2.77  1.97  45    4.1   4.21  5.99  4.4   10    4.26  Verb
TO_RAIN             2.39  0.65  0.32  1.01  NA    0.81  0.47  45    6.3   4.97  3.37  3.29  1     3.60  Verb
TO_RUB              3.57  0.68  0.27  0.44  0.92  1.16  0.62  13    4.4   4.33  3.44  4.35  3     6.00  Verb
TO_SAY              3.64  1.72  1.85  1.71  2.17  1.51  1.35  3689  2.8   2.58  5.02  4.43  11    3.42  Verb
TO_SCRATCH_ITCH     4.44  0.77  0.73  0.44  0.54  0.86  1.30  7     4.7   4.16  4.14  4.35  NA    5.61  Verb
TO_SEE              2.63  1.92  2.69  1.90  1.90  1.51  1.61  1702  3     3.21  5.85  3.9   24    3.06  Verb
TO_SEW              1.71  0.46  0.13  0.38  0.87  0.45  0.47  11    5.2   3.93  2.11  4.35  2     6.05  Verb
TO_SING             2.54  0.81  0.48  0.89  0.49  1.21  0.99  100   4.9   4.34  7.11  4.1   5     3.47  Verb
TO_SIT              1.25  1.05  1.13  0.44  1.47  1.46  0.73  313   4.7   4.8   5.92  3.19  10    3.47  Verb
TO_SLEEP            2.01  0.95  0.32  0.51  1.36  1.51  1.04  121   5.5   4.44  5.89  3.6   2     2.79  Verb
TO_SMELL            6.48  0.64  0.27  0.82  0.49  0.81  0.83  20    3.4   3.7   5.30  5.24  5     4.22  Verb
TO_SPIT             2.04  1.01  1.45  1.08  0.38  1.21  0.93  13    5.2   4.71  2.31  3.57  4     5.06  Verb
TO_SPLIT            5.58  0.76  1.29  0.38  0.38  1.21  0.52  52    4.3   3.96  3.54  5.55  5     6.25  Verb
TO_SQUEEZE          7.83  0.79  0.78  1.14  0.60  0.35  1.09  18    4.3   3.75  5.31  3.91  9     5.42  Verb
TO_STAB_OR_STICK    7.84  0.90  0.54  1.77  1.41  0.55  0.21  5     5.2   4.07  4.18  5.17  3     NA    Verb
TO_STAND            2.02  0.93  1.53  NA    0.49  0.96  0.73  361   5     4.16  5.39  3.1   12    4.39  Verb
TO_SUCK             2.41  0.55  0.00  0.63  0.65  0.55  0.93  12    4.4   3.46  4.06  5.6   7     5.58  Verb
TO_SWELL            4.54  0.72  0.91  0.32  0.33  1.16  0.88  6     3.9   3.31  4.20  4.56  6     7.42  Verb
TO_SWIM             2.73  0.36  0.38  NA    0.16  0.50  0.42  22    5.6   4.43  4.08  6.05  5     4.17  Verb
TO_THINK            5.34  1.08  0.81  1.65  0.81  1.41  0.73  825   2.6   2.41  3.46  3.75  13    4.75  Verb
TO_THROW            7.52  1.14  1.64  0.63  1.09  1.21  1.14  99    4.3   4.04  6.37  4.52  15    4.14  Verb
TO_TIE              2.68  1.09  0.94  1.20  0.92  1.51  0.88  28    6.4   4.81  6.18  3.1   9     4.74  Verb
TO_TURN_VEER        8.18  1.46  2.55  0.63  2.55  0.96  0.62  154   3.9   2.79  5.33  3.48  NA    NA    Verb
TO_WALK             5.14  1.97  1.05  3.61  0.54  1.51  3.12  207   5.2   4.07  5.53  3.24  10    3.45  Verb
TO_WASH             3     0.88  1.42  0.89  0.76  0.76  0.57  29    5.1   4.35  6.37  3.7   13    4.00  Verb
TO_WIPE             6.92  0.61  0.51  0.51  0.49  0.91  0.62  22    5     4     5.92  3.54  1     4.83  Verb
TO_VOMIT            5.02  0.55  0.51  0.13  0.71  0.91  0.52  13    NA    4.75  2.47  4.82  1     5.68  Verb
TONGUE              0.49  0.45  0.35  0.63  0.16  0.35  0.73  169   6.5   4.93  7.76  4.25  8     4.47  Noun
TOOTH               1.18  0.66  0.03  0.89  NA    1.26  0.47  74    6.8   4.89  8.69  3.52  5     3.61  Noun
TREE                3.58  0.30  NA    0.19  0.16  0.60  0.26  112   6.8   5     8.10  2.67  2     3.57  Noun
WARM_HOT            2.35  0.99  1.21  1.46  0.76  0.76  0.78  66    3.5   3.56  6.28  3.35  9     NA    Adjective
WATER               0.86  1.02  0.19  2.03  1.09  1.01  0.78  342   NA    5     7.27  3.71  6     2.37  Noun
WET                 4.45  0.69  0.67  0.63  0.71  0.81  0.62  49    5.3   4.46  6.59  5     2     2.47  Adjective
WHITE               2.22  NA    NA    NA    NA    NA    NA    224   6.5   3.89  6.93  3.35  12    4.06  Adjective
WIDE                2.89  0.85  0.83  0.89  0.33  1.51  0.68  93    4.1   3.06  5.71  3.05  7     5.84  Adjective
WIFE                4.12  0.91  0.43  1.14  0.87  1.21  0.88  189   5.7   4.13  5.84  4.21  1     5.67  Noun
WIND_BREEZE         1.78  0.92  0.30  1.01  1.03  0.76  1.51  92    4.7   3.93  8.32  3.7   8     NA    Noun
WING                2.66  0.47  0.46  NA    0.22  1.01  0.21  38    6.1   4.86  6.02  4.32  11    4.79  Noun
WOMAN               2.75  1.28  0.99  1.77  0.87  0.71  2.08  523   NA    4.46  5.90  3.8   4     4.95  Noun
WOODS               4.15  0.43  0.21  0.51  0.76  0.30  0.36  124   NA    4.87  3.72  NA    1     4.93  Noun
WORM                2.16  0.32  NA    0.19  NA    0.10  0.68  9     6.6   4.9   2.31  3.5   4     3.89  Noun
YEAR                2.78  0.41  0.13  0.32  1.03  0.05  0.52  1444  3.9   3.25  5.18  3.33  4     5.24  Noun
YELLOW              2.12  NA    NA    NA    NA    NA    NA    69    NA    4.3   6.54  3.83  1     3.20  Adjective


APPENDIX B: NAMING TASK RESULTS

A 11 4 8 2 10 15 18 14 13
B 11 2 1 10 17 19 19 19 16
C 9 9 1 13 18 16 19 18 16
D 10 1 1 1 1 1 3 7 15 18 15
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

A 5 4 1 2 10 18 17 17 8
B 6 5 18 19 19 19 14 1
C 9 3 1 11 19 19 19 19 16
D 14 13 2 5 9 15 19 18 18
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

A 10 14 9 18 18 3 1 1 4
B 6 5 18 19 18 1 1 5 2 4
C 7 9 19 19 19 1 1 2 7
D 7 18 18 19 19 5 1 1 2 1 3 2
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

A 7 9 18 18 19 1 1 2
B 2 3 18 19 19 1 4 1
C 4 10 18 19 19 1
D 7 15 19 18 19 1 1 2
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

A 8 10 1 1
B 2 16 17 12
C 1 18 17 16 2 1
D 16 17 18 16 12 2 2 2 1
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10 0

A 11 14 1
B 1 17 18 14
C 19 19 19 2
D 1 17 18 19 17 12 5 1
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

Naming Task Grön (Older group)

Naming Task Grön (Younger group)

Naming Task Blå (Older group)

Naming Task Blå (Younger group)

Naming Task Lila (Older group)

Naming Task Lila (Younger group)


A 10 17 16 17 16 8 4
B 6 15 14 9
C 1 2 6 1
D
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

A 1 5 19 19 17 18 12 3
B 8 18 18 16
C 11 15
D 2
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

A
B 3
C 1 2 17 18 3 1
D 1 3 6 13 9 4 1 2 2
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

A
B 1
C 1 17 18 7 2
D 13 19 16 9 6 1 1 1
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

A 1 2 19 18 13 5
B 2 1 2 12 12 3 1
C 1 1 1 4 5 3 3
D 1
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

A 1 18 19 15 6
B 9 16 2 2
C 3 2 6 1
D 1 1
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

Naming Task Rosa (Older Group)

Naming Task Rosa (Younger Group)

Naming Task Brun (Older Group)

Naming Task Brun (Younger Group)

Naming Task Gul (Older Group)

Naming Task Gul (Younger Group)


A 1
B 3 4 9 7
C 1 1 2 7 8 19 16 2
D 8 15 11 5
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

A 1
B 5 6
C 1 2 19 13 1
D 1 6 16 15
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

A 4 2 3 1 2 6
B 10 15 3 4
C 10 8 1 1 1 3
D 3 1 1
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

A 6 6 1 11
B 14 16 2 2 6
C 11 8 3
D 3 1
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

A 2 8 1
B 3 17 19 9
C 5 2
D
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

A 4 2
B 3 17 19 13
C 9 3 1
D 1
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

Naming Task Röd (Older group)

Naming Task Röd (Younger group)

Naming Task Turkos (Older group)

Naming Task Turkos (Younger group)

Naming Task Orange (Older group)

Naming Task Orange (Younger group)


A
B
C 1
D 2 1 1 1 12 14 15 12 1 1 3 16
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

A
B
C 1
D 1 1 1 1 5 8 7 6 1 1 12
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

A 1 1 2
B 1 16
C 19
D 2 2 2 3 3
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

A 1 1 2
B 1 1 19
C 1 1 19
D 2 1 2 1 8
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

A
B 1
C 1 9 8 1
D 1
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

A 1
B 1 1
C 1 5 9
D 1
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

Naming Task Svart (Older group)

Naming Task Svart (Younger group)

Naming Task Grå (Older group)

Naming Task Grå (Younger group)

Naming Task Ceris (Older group)

Naming Task Ceris (Younger group)


A 1 2 5 2 5 4 1
B 1 1 2 1
C
D
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

A 1 1 1
B 1
C
D
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

A
B 1 1 1
C 1 2
D 1 1 2 3 2 2 1
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

A
B 1
C
D 1 1
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

A
B 1 1 1
C 1
D
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

A
B 1
C
D
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

Naming Task Skär (Older group)

Naming Task Skär (Younger group)

Naming Task Violett (Older group)

Naming Task Violett (Younger group)

Naming Task Gredelin (Older group)

Naming Task Gredelin (Younger group)


APPENDIX C: BEST EXAMPLE TASK RESULTS

A
B 1
C 1 9 5
D 3
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

A
B 1
C 2 4 10
D 1 1
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

A
B
C 3 2
D 14
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

A
B 1 1
C 5 1
D 11
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

A
B 1
C 18
D
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

A
B 2
C 17
D
11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10

Best Example Blå (Younger Group)

Best Example Blå (Older Group)

Best Example Brun (Younger Group)

Best Example Brun (Older Group)

Best Example Grå (Younger Group)

Best Example Grå (Older Group)


[Figure panels: Best Example Rosa (Older Group); Best Example Grön (Younger Group); Best Example Grön (Younger Group); Best Example Lila (Younger Group); Best Example Lila (Older Group); Best Example Rosa (Younger Group). Each panel is a grid of color chips with rows A–D and columns 11–20, 1–10, with response counts per cell.]


[Figure panels: Best Example Gul (Younger Group); Best Example Gul (Older Group); Best Example Röd (Younger Group); Best Example Röd (Older Group); Best Example Turkos (Younger Group); Best Example Turkos (Older Group). Each panel is a grid of color chips with rows A–D and columns 11–20, 1–10, with response counts per cell.]


[Figure panels: Best Example Skär (Older Group); Best Example Ceris (Older Group); Best Example Orange (Younger Group); Best Example Orange (Older Group); Best Example Violett (Older Group); Best Example Gredelin (Older Group). Each panel is a grid of color chips with rows A–D and columns 11–20, 1–10, with response counts per cell.]


APPENDIX D: EOSS CODES, MUNSELL CODES, HEX CODES

EoSS code   Munsell code   Hex code
A1          5R 8/6         #FEB4AD
A2          10R 8/6        #EEBBAA
A3          5YR 8/8        #F9B98A
A4          10YR 8/14      #F7BC60
A5          5Y 8/14        #E7C530
A6          10Y 8/12       #D2CC2A
A7          5GY 8/10       #B2D43D
A8          10GY 8/8       #8CD981
A9          5G 8/6         #80D8AC
A10         10G 8/6        #77D9BB
B1          5R 6/12        #ED6362
B2          10R 6/14       #F66028
B3          5YR 6/14       #DA7511
B4          10YR 6/12      #C1820D
B5          5Y 6/10        #A98C1D
B6          10Y 6/10       #979218
B7          5GY 6/10       #7D992B
B8          10GY 6/12      #28A62E
B9          5G 6/10        #2BA273
B10         10G 6/10       #0AA284
C1          5R 4/14        #B9142C
C2          10R 4/12       #A9300D
C3          5YR 4/8        #8B4815
C4          10YR 4/8       #7E4F00
C5          5Y 4/6         #6C5710
C6          10Y 4/6        #5F5B0D
C7          5GY 4/8        #4C601A
C8          10GY 4/8       #1C6823
C9          5G 4/10        #126647
C10         10G 4/10       #2D6255
D1          5R 2/8         #5A091F
D2          10R 2/6        #4F1814
D3          5YR 2/4        #422111
D4          10YR 2/2       #34271B
D5          5Y 2/2         #302919
D6          10Y 2/2        #2B2A1B
D7          5GY 2/2        #272C1F
D8          10GY 2/4       #15301A
D9          5G 2/6         #0C3022
D10         10G 2/6        #083028
A11         5BG 8/4        #91D3C8
A12         10BG 8/4       #90D2D3
A13         5B 8/4         #97CFDC
A14         10B 8/6        #8ECFF2
A15         5PB 8/6        #A7C8F6
A16         10PB 8/4       #C4C3E1
A17         5P 8/4         #CFC0DC
A18         10P 8/6        #E8B7DA
A19         5RP 8/6        #F5B5C9
A20         10RP 8/6       #FBB4BB
B11         5BG 6/10       #389E95
B12         10BG 6/8       #359CA4
B13         5B 6/10        #4399B0
B14         10B 6/10       #0699D3
B15         5PB 6/10       #5890D7
B16         10PB 6/10      #8D84D2
B17         5P 6/8         #A480BA
B18         10P 6/10       #C374B1
B19         5RP 6/12       #E06698
B20         10RP 6/12      #E9637D
C11         5BG 4/8        #28625E
C12         10BG 4/6       #276168
C13         5B 4/10        #2D5F6F
C14         10B 4/10       #235E80
C15         5PB 4/12       #1A5A9F
C16         10PB 4/12      #5D4AA5
C17         5P 4/12        #7C3F96
C18         10P 4/12       #913583
C19         5RP 4/12       #A22A67
C20         10RP 4/14      #B6114D
D11         5BG 2/6        #1B2D2C
D12         10BG 2/6       #1A2D30
D13         5B 2/6         #1B2C33
D14         10B 2/6        #0A2D42
D15         5PB 2/8        #0C2A51
D16         10PB 2/10      #321B62
D17         5P 2/8         #40194E
D18         10P 2/6        #421C3E
D19         5RP 2/8        #520E39
D20         10RP 2/8       #570B2E


APPENDIX E: THE EFFECT OF COLOR BLINDNESS

The following participants had more than one error on their color blindness tests, which indicates that a closer inspection of their (a)typicality is in order:

Group                      Participant
Swedish (older speakers)   Participant 8
English                    Participant 6
Bernese                    Participants 12 & 16
Icelandic                  Participants 13 & 16

Each participant's main response to each cell in the Naming Task was converted into a binary matrix, with the stimuli as rows and all the color terms used by the participant's group as columns. The binary matrix was then converted into a single one-dimensional vector for each participant, and the participants were clustered using multidimensional scaling (the MDS class in the sklearn.manifold module of the Python package scikit-learn). The speakers of interest are marked in the plots below. I am indebted to Johannes Bjerva for advice and implementation help.
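The encoding step described above can be sketched as follows. This is a minimal illustration, not the implementation used for the thesis: the participant labels, the toy responses, the helper names `encode` and `euclidean`, and the choice of Euclidean distance between vectors are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: each participant's main response per stimulus cell.
# Cell labels follow the chip grid (row letter + column number); the
# responses are invented for illustration only.
responses = {
    "P1": {"A1": "rosa", "A2": "rosa", "B1": "lila"},
    "P2": {"A1": "rosa", "A2": "ceris", "B1": "lila"},
    "P3": {"A1": "lila", "A2": "lila", "B1": "lila"},
}

def encode(responses):
    """Return (stimuli, terms, vectors): one flattened 0/1 vector per participant."""
    stimuli = sorted({cell for r in responses.values() for cell in r})
    terms = sorted({term for r in responses.values() for term in r.values()})
    vectors = {}
    for participant, named in responses.items():
        # Binary matrix: stimuli as rows, the group's color terms as columns.
        matrix = [[1 if named.get(cell) == term else 0 for term in terms]
                  for cell in stimuli]
        # Flatten row by row into a single one-dimensional vector.
        vectors[participant] = [v for row in matrix for v in row]
    return stimuli, terms, vectors

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

stimuli, terms, vectors = encode(responses)
distances = {(p, q): euclidean(vectors[p], vectors[q])
             for p, q in combinations(sorted(vectors), 2)}
```

With scikit-learn installed, the list of vectors could then be handed to `sklearn.manifold.MDS(n_components=2).fit_transform(...)` to obtain two-dimensional coordinates for plotting; participants whose naming behavior is atypical would be expected to lie far from the rest of their group.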


APPENDIX F: SWEDISH SUMMARY / SVENSK SAMMANFATTNING

Introduction

Languages change constantly, and systematic scientific investigations of how and why these changes happen are common in linguistics, above all for phonology and grammar. It has proven harder to find rules and tendencies for changes in the lexicon (the vocabulary). My thesis contributes to the ongoing research discussion on this by investigating the process of "lexical replacement" from several different perspectives.

A typical example of lexical replacement is when a word is exchanged for another while what the word refers to does not change. One example is how Swedish speakers have named the color of the flower åkervinda (field bindweed). In the mid-19th century it was called ljust röd ('light red') in botanical reference works. Around the turn of the century it was called skär. Today it is most often called rosa. The color of the flower has not changed, but the word has. From the perspective of one or two centuries, one can say that a replacement from ljust röd to skär and finally to rosa has taken place, but the time depth also means that detailed knowledge of the replacement process is poor. A shallower time depth allows more detailed knowledge of individual replacement trajectories, but on the other hand makes it harder to find general tendencies and rules. Moreover, the more detailed knowledge there is about a specific lexical replacement, the clearer it becomes that this process can hardly be investigated without also considering other, interconnected processes of lexical change, such as semantic change.

A typical example of semantic change is when a word remains more or less unchanged in the language, but what it refers to changes. A clear example is how jungfru used to mean 'young woman' but can now almost only be used to denote a star sign. Often, however, semantic changes are more subtle. An example of a partial semantic change has already been mentioned above in the case of röd: since, for many speakers of modern Swedish, röd can no longer successfully be used to describe the color of field bindweed, its denotational range (the portion of the color spectrum that the word can refer to) has shrunk: the word has undergone some semantic change.

Different perspectives, with respect to both time depth and methods, are suited to different kinds of (related) questions about lexical replacement and yield different kinds of answers. The thesis investigates lexical replacement processes both in one specific semantic domain (color) and independently of semantic domain. A semantic domain is a collection of mental concepts with clear connections in meaning.

This thesis investigates three general research questions and approaches them from different perspectives.

A. What affects the likelihood of lexical replacement?
A1. Can domain-independent generalizations be made?
A2. How can domain-dependent generalizations be made?

B. How does lexical replacement proceed in the semantic domain of color?
B1. How does lexical replacement interact with other processes of lexical change?
B2. How can knowledge of the semantics and pragmatics of color facilitate the understanding of lexical replacement in the color domain?

C. How can different perspectives on lexical replacement complement each other?

Structure of the thesis

Chapter 1 introduces the topic and research questions of the thesis, and chapter 2 gives background on lexical replacement and other related processes of lexical change. The first study (the "macro study") is found in chapter 3. Since both the second (the "meso study") and the third (the "micro study") concern one specific semantic domain (color), chapter 4 contains an introduction to color linguistics. The meso and micro studies then follow in chapters 5 and 6, respectively. Chapter 7 summarizes the main results of the thesis.

The first study (the statistically oriented macro study in chapter 3) takes a (relatively) deep time perspective and investigates which factors affect the rate (the likelihood within a given time interval) of lexical replacement in the basic vocabulary of 98 Indo-European language varieties. The second study (the cross-linguistic meso study in chapter 5) has a shallower time depth and narrows the focus to seven closely related Germanic languages (English, German, Bernese (the German dialect of Bern), Danish, Swedish, Norwegian, and Icelandic). The study is also restricted to a single semantic domain, color, and more precisely to the pink and purple parts of the color spectrum. The third study (the micro study) narrows the focus further to a single language (Swedish) and two generations of speakers. Here, results from color naming experiments with the two generations are complemented with data from historical and modern dictionaries, lexicons, and interviews.


Research question A is covered primarily by the macro study, while the meso and micro studies address question B. Question C is woven into the discussion of the micro study's results, where the more general results of the meso study give breadth and depth to the histories of how the individual Swedish color terms developed. Question C is also discussed in the seventh and final chapter of the thesis.

The studies

In this thesis I show that it is both possible and desirable to look for generalizations about how lexical replacement happens. The applicability of these generalizations differs, however: some general observations can be made only for individual semantic domains (they are domain-dependent), while others may hold for many domains. Depending on the time depth used, the possible research questions and the types of answers also change. From a great time depth (such as a few thousand years), lexical replacement is clearly visible, whereas the same process viewed from a shallower time depth is better described as several different gradual processes of lexical change that are intertwined.

It is useful to triangulate different perspectives in order to gain a deeper understanding of phenomena of lexical change. Method triangulation is the process of validating results by using several different sources, methods, and perspectives. The different perspectives are relevant for different kinds of generalizations and research questions, and make it possible to use different types of methods.

The research contributions of the macro study (chapter 3) are partly methodological and partly theoretical. Methodologically, the study contributes to ongoing research in computational semantics that uses regression modeling to study domain-independent generalizations about lexical processes. Quantitative methods, such as linear regression models, can successfully be used to find generalizations in data from great time depths. Such a model is presented in chapter 3, and it demonstrates that (a) lexical replacement should be studied separately for open-class content concepts on the one hand and closed-class grammatical concepts on the other; and (b) frequency, number of synonyms, imageability, and degree of polysemy can to some extent (34%) explain why there is variation in how likely it is that a concept's most neutral word is replaced by another word.
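A multiple linear regression of the kind summarized in this paragraph can be written schematically as below. This is only a sketch under stated assumptions: the symbols ($r_i$ for the replacement rate of concept $i$, and the predictor abbreviations) are illustrative, any transformations applied to the predictors in chapter 3 are not shown, and the 34% is read here as the share of variance explained ($R^2$).

```latex
\begin{align}
r_i &= \beta_0 + \beta_1\,\mathrm{freq}_i + \beta_2\,\mathrm{syn}_i
      + \beta_3\,\mathrm{img}_i + \beta_4\,\mathrm{pol}_i + \varepsilon_i \\
R^2 &= 1 - \frac{\sum_i (r_i - \hat{r}_i)^2}{\sum_i (r_i - \bar{r})^2} \approx 0.34
\end{align}
```

Read against the directions reported in this summary, one would expect $\beta_1 < 0$ (lower frequency, more replacement), $\beta_2 > 0$ (more synonyms, more replacement), $\beta_3 < 0$ (higher imageability, less replacement), and $\beta_4 < 0$ (more senses, less replacement).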

The macro study examines many languages and proposes tendencies that hold regardless of semantic domain. The meso and micro studies are restricted to a single domain (color) and adopt successively shorter time perspectives and increasingly detailed information about lexical replacement.

A less extensive version of chapter three has previously been published in Vejdemo & Hörberg (2016).

The meso study (in chapter 5) is a cross-linguistic study of how speakers of seven closely related Germanic languages (English, German, Bernese, Danish, Swedish, Norwegian, and Icelandic) categorize and lexicalize the pink and purple parts of the perceptual color spectrum. The chapter mainly addresses research question B1, and parts of the chapter have previously been published in Vejdemo et al. (2015).

The study combines results from color naming experiments with 146 speakers of the seven languages; knowledge of historical changes (such as the development of the dyeing industry in the region); and data from lexicons and dictionaries.

The study focuses mainly on a few cases of lexical replacement and concurrent semantic change:

Between the 14th and 18th centuries a new color concept, here called PURPLE1, was established in the languages examined in the study. The modern extension (denotation) of PURPLE1 in color space is shown in Figure 1. PURPLE1 is represented by different color words in the different languages, but with the exception of English purple they are all connected to the word stems "viol" and "lila", both originally words for flowers.

The modern languages examined in the study also all have a less common secondary color term that is used for various parts of PURPLE1, a so-called Purple2 word. These words have lost, or are slowly but surely losing, their historical denotation and are now only small, soon-to-die fragments. They are remnants of an earlier lexical replacement that probably took place at roughly the same time in the languages. This is illustrated in Table 1.

Figure 1. Denotation of the PURPLE1 concept.

Figure 2. Denotation of the PINK1 concept.

Figure 3. Denotation of the PINK2 concept.

[Figures 1–3: grids of color chips with rows A–D and columns 11–20, 1–10, with the cells belonging to each concept marked.]

Table 1. Naming of the purple parts of color space.

System A: Purple2 is secondary, uncommon, and lighter
Language     PURPLE1     Purple2
English      purple      lilac/violet
Bernese      violett     lila

System B: Purple2 is secondary, uncommon, and a synonym
Language     PURPLE1     Purple2
German       lila        violett
Icelandic    fjólublár   lilla
Norwegian    lilla       violet

System C: Purple2 is secondary, uncommon, and darker
Language     PURPLE1     Purple2
Danish       lilla       violet
Swedish(?)   lila        (violett?)


Between the 17th and 19th centuries yet another color concept, PINK1, emerged in the languages, from an area previously covered by words such as röd in Swedish and red in English. The modern denotation of PINK1 in the languages is illustrated in Figure 2.

Within a subset of the denotation of PINK1, a new concept is currently emerging in some of the languages: PINK2. The denotation of PINK2 is shown in Figure 3. The languages' words for PINK1 and PINK2 are shown in Table 2.

Table 2. Naming of the pink parts of color space.

System A: PINK1 only
Language     PINK1
English      pink
Norwegian    rosa
Icelandic    bleikur

System B: PINK1 and a secondary, darker PINK2
Language     PINK1     PINK2
German       rosa      pink
Bernese      rosa      pink
Danish       lyserød   pink
Swedish      rosa      ceris

The meso study thus has, in part, purely descriptive results. The contemporary (synchronic) results from color naming experiments can be combined with data from lexicons and dictionaries, which makes it possible to infer historical (diachronic) results about language change. First and foremost, it is clear that the view of lexical replacement as a binary state (it has either happened or not happened), which was adopted in the first study, is not as successful at this shallower time depth, where more details of the histories of individual words become visible. Here, questions about how replacement takes place can be answered instead. It is clear, for example, that the PINK1, PINK2, and PURPLE1 concepts spread through contact and that they are transferred fairly intact between speakers, but that the lexicalization strategies differ. Some languages use loanwords (e.g. Swedish rosa and lila, both from French, possibly via German), others reanalyze existing lexical material in the language (Icelandic bleikur for the PINK1 concept originally meant 'pale'), and still others coin new compounds (Danish lyserød, which despite its form (lyse 'light') is now not seen by speakers as a kind of red shade but is treated as a color concept in its own right).

The synchronic cross-linguistic perspective has its strengths (for example, it shows that all the secondary, uncommon Purple2 words are part of a similar historical process, even though they have different denotations in the individual languages) but also its weaknesses (we know, for example, that the Purple2 word violett in Swedish has at some point and to some degree been replaced by the PURPLE1 word lila, but we know no more than that).

The micro study investigates lexical replacement processes within the denotations of the PINK1 and PURPLE1 concepts in Swedish. The study combines data from dictionaries, lexicons, botanical lexicons, fiction corpora, interviews with speakers, and results from color naming experiments with two generations of Swedish speakers. The study is one of the first of its kind to use color naming experiments to compare the behavior of two generations of speakers of the same language. Diachronic changes are then inferred from the generational differences.

The study investigates the hypothesis that there are differences in color naming between the older and the younger speakers for some specific parts of the color spectrum: the PINK1 and PURPLE1 concepts. This is examined with a range of different and complementary methods, such as Wilcoxon tests of the denotation of the color words and the use of an entropy measure to compare how much consensus the two generations have in their use of color words. The study shows the value of comparing differences in denotation between groups: color linguists otherwise often look mostly at differences in the focal points of color concepts, not at how they differ in extension across the color spectrum.
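One way to make the idea of an entropy-based consensus measure concrete is sketched below. The exact measure used in chapter 6 is not given in this summary, so Shannon entropy over the terms a group uses for a single color chip is an assumption here, as are the function name `naming_entropy` and the invented responses. Lower entropy means higher within-group consensus.

```python
from collections import Counter
from math import log2

def naming_entropy(terms):
    """Shannon entropy (in bits) of the terms a group used for one color chip."""
    counts = Counter(terms)
    total = sum(counts.values())
    # Sum p * log2(1/p) over the distinct terms; 0.0 when everyone agrees.
    return sum((n / total) * log2(total / n) for n in counts.values())

# Invented responses for a single chip, for illustration only.
older_group = ["rosa", "rosa", "ceris", "ceris"]    # split between two terms
younger_group = ["rosa", "rosa", "rosa", "rosa"]    # full agreement

older_entropy = naming_entropy(older_group)
younger_entropy = naming_entropy(younger_group)
```

Computing this value per chip for each generation gives two sets of paired scores, which is the kind of data a paired non-parametric test such as the Wilcoxon signed-rank test can compare; that step is not implemented in this sketch.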

Different data sources (experiments, fiction corpora, etc.) have different degrees of inertia in how quickly they reflect changes in the spoken language, and different genres may have specific conventions. This means that the data sources give different answers to the question of when changes in color terms took place in Swedish. The sources can nevertheless be weighed together, and they then point to the same phases in the development of the color terms. An extremely condensed summary is that violett replaced parts of the denotation of words such as brun and of compound color terms such as rödblå before the 19th century. Violett was challenged by gredelin at the end of the 19th century, but was finally replaced by lila in the mid and late 20th century. During the 19th century skär replaced parts of the denotation previously covered by röd, only to be replaced in turn by rosa in the mid and late 20th century. Right now cerise is establishing itself and is gradually replacing parts of the darker region of rosa. These replacement processes are gradual and complicated, and take place in concert with semantic changes.

The generational comparison in the micro study makes it possible to identify several different processes of lexical change in the data, among them restriction and expansion (well-known processes in language change in general), movement (linked to restriction and expansion), focusing, and loss of denotation. The processes are discussed in detail in chapter 6.

Insights from the micro study and the meso study can profitably be combined (research question C). For example, the meso study shows that PINK2 concepts are currently being lexicalized in several closely related languages, and this reinforces the picture from the micro study that the Swedish color term cerise is probably not a short-lived, temporary fashion term. Cerise may have a very limited denotation, but unlike other color terms that cover small extents of color space, it is focused rather than merely uncommon. By contrast, comparing the results of the meso and micro studies makes it clear that the Swedish color term skär has no clear counterpart in the other closely related languages. Another example: the history of the word skär resembles the history of violett in some respects, since in present-day Swedish both words are undergoing semantic change and taking part in a lexical replacement process. The cross-linguistic meso study, however, reveals an important difference between the words: there is a cross-linguistic pattern that includes violett, which shows that violett is probably a residue of a lexical replacement process in which a new PURPLE1 word (in the case of Swedish, lila) takes over.

The detailed perspective of the micro study can also be combined with the cross-domain insights of the macro study:

The statistical model presented in the macro study indicates that there is a link between low frequency and a higher likelihood of lexical replacement. In the color domain this corresponds to the fact that color terms that are disappearing from the language are naturally used less. The model also shows a link between a large number of synonyms and a higher rate of lexical replacement. For the color domain this corresponds concretely to the competition that arises between words in a speech community, as in the case of PURPLE1, where three words have competed to be the most used over the past two centuries. The model further includes imageability as a factor: the easier it is to picture a word's meaning in the mind (high imageability), the less likely it is that the word is replaced. In the color domain, a high level of imageability can be linked to a high level of consensus within a group of speakers about a color term's denotation, as in the case where the older generation of speakers shows very low group-internal consensus on which color shades the terms skär and violett refer to. Finally, the model in the macro study shows an effect of polysemy: the more senses a word has, the more likely it is that it is not replaced. That a higher degree of polysemy would give some protection against lexical replacement may correspond to the fact that color terms can be retained in the language even though their denotation is uncertain, because they have many connotations (associations). The clearest example is gredelin, which is still an active and recognized color term at least among the older generation of speakers, even though its denotation is highly uncertain: the word survives because of a strong association with a specific fairy-tale character, Elsa Beskow's Tant Gredelin.

Concluding remarks

In summary, the thesis shows that lexical replacement is a complex process that is best viewed from complementary perspectives, with respect to both time depth and domain scope. From a broad time-depth perspective, in which many languages and different semantic domains are examined, lexical replacement can be treated as a binary fact. This makes investigations by means of statistical regression models possible, which yields domain-independent generalizations about why lexical replacement happens. Another way to view lexical replacement is to examine individual semantic domains. Lexical replacement is then best viewed in concert with other related lexical processes, such as semantic change. The color domain has a rich research tradition in synchronic lexical typology, which makes it an excellent candidate for studies of lexical replacement, since insights about synchronic variation can be used to understand diachronic change. The thesis thus makes:

• Descriptive research contributions, such as descriptions of the development of individual words and concepts,

• Theoretical research contributions, such as proposals for domain-independent factors that affect the lexical replacement of non-grammatical concepts, and the proposal that these should be analyzed separately from grammatical concepts with respect to replacement. The thesis also contains detailed discussions of how replacement can proceed in the color domain,

• Methodological research contributions, such as the use of color naming experiments to compare two generations of speakers of the same language. The thesis also shows how a range of concrete quantitative methods, for example Wilcoxon tests and entropy measures, can be used to compare the generations' color naming and color categorization.


REFERENCES

(General works listed first, followed by separate sections for corpora; floras; dictionaries and encyclopedias.)

Altman, H. (1999). Farbadjektiva in Deutschen. In W. Falkner & H.-J. Schmid (Eds.), Words, Lexemes, Concepts, Approaches to the Lexicon: Studies in Honour of Leonhard Lipka (pp. 121–132). Tübingen: Gunter Narr Verlag.

Alvarado, N., & Jameson, K. (2002). The Use of Modifying Terms in the Naming and Categorization of Color Appearances in Vietnamese and English. Journal of Cognition and Culture, 2(1), 53–80.

Ambjörnsson, F. (2011). Rosa: den farliga färgen. Stockholm: Ordfront.

Anttila, R. (1989). Historical and Comparative Linguistics. Amsterdam; Philadelphia: John Benjamins Publishing Company.

Archibald, J. (1989). A lexical model of color space. In R. Corrigan, F. R. Eckman, & M. Noonan (Eds.), Linguistic Categorization (pp. 31–53). Amsterdam: John Benjamins Publishing Company. Retrieved October 2, 2014, from

Baayen, R. H. (2008). Analyzing linguistic data: a practical introduction to statistics using R. Cambridge, UK; New York: Cambridge University Press.

Babyak, M. A. (2004). What you see may not be what you get: a brief, nontechnical introduction to overfitting in regression-type models. Psychosomatic Medicine, 66(3), 411–421.

Beck, D. (2016). Some language-particular terms are comparative concepts. Linguistic Typology, 20(2), 395–402.

Bergsland, K., & Vogt, H. (1962). On the Validity of Glottochronology. Current Anthropology, 3(2), 115–153.

Berlin, B., & Berlin, E. A. (1975). Aguaruna Color Categories. American Ethnologist, 2(1), 61–87.

Berlin, B., & Kay, P. (1969). Basic color terms: Their universality and evolution. Berkeley: University of California Press.

Beskow, E. (1918). Tant Grön, tant Brun och tant Gredelin. Stockholm: Bonnier.

Biggam, C. P. (2012). The Semantics of Colour: A Historical Approach. New York: Cambridge University Press.

Bird, H., Franklin, S., & Howard, D. (2001). Age of acquisition and imageability ratings for a large set of words, including verbs and function words. Behavior Research Methods, Instruments, & Computers, 33(1), 73–79.

Blank, A. (1997). Prinzipien des lexikalischen Bedeutungswandels am Beispiel der romanischen Sprachen. Tübingen: Niemeyer.

Blank, A. (2001). Pathways of lexicalization. In M. Haspelmath, E. König, W. Oesterreicher, & W. Raible (Eds.), Language Typology and Language Universals (Vol. 2, pp. 1596–1608). Berlin • New York: Walter de Gruyter. Retrieved September 7, 2011, from http://www.reference-global.com/doi/abs/10.1515/9783110171549.2.15.1596


Blumenthal-Dramé, A. (2013). Entrenchment in Usage-Based Theories: What Corpus Data Do and

Do Not Reveal About The Mind. Berlin: Walter de Gruyter.

Boas, F. (1911). Introduction. In F. Boas (Ed.), Handbook of American Indian languages (Vols. 1–

1, pp. 1–84). Washington: Government Printing Office.

Bochkarev, V., Solovyev, V., & Wichmann, S. (2014). Universals versus historical

contingencies in lexical evolution. Journal of The Royal Society Interface, 11(101), digital, no page

range.

Bonin, P., Peereman, R., Malardier, N., Méot, A., & Chalard, M. (2003). A new set of 299

pictures for psycholinguistic studies: French norms for name agreement, image agreement,

conceptual familiarity, visual complexity, image variability, age of acquisition, and naming

latencies. Behavior Research Methods, Instruments, & Computers, 35(1), 158–167.

Boussidan, A. (2013). Dynamics of Semantic Change: Detecting, analyzing and modeling semantic change

in corpus in short diachrony. Bron: CNRS, UMR 5304 Institut des Sciences Cognitives.

Bradley, M. M., & Lang, P. J. (1999). Affective norms for English words (ANEW): Instruction

manual and affective ratings (Technical Report No. C-1) (pp. 1–47). The Center for Research in

Psychophysiology, University of Florida.

Bréal, M. (1900). Semantics: Studies in the Science of Meaning. London: W. Heinemann. Retrieved

October 13, 2015, from http://archive.org/details/semanticsstudie00postgoog (Original work

published 1897)

Brown, C. H., & Witkowski, S. R. (1983). Polysemy, lexical change and cultural importance.

Man, 18(1), 72–89.

Brysbaert, M., Warriner, A. B., & Kuperman, V. (2014). Concreteness ratings for 40 thousand

generally known English word lemmas. Behavior Research Methods, 46(3), 904–911.

Buck, C. D. (1949). A Dictionary of Selected Synonyms in the Principal Indo-European Languages.

Chicago: University of Chicago Press.

Burridge, K. (2006). Taboo Words. In K. Brown (Ed.), Encyclopedia of Language & Linguistics

(second edition, pp. 452–455). Oxford: Elsevier. Retrieved December 12, 2013, from

http://www.sciencedirect.com/science/article/pii/B0080448542007781

Bybee, J. L. (1985). Morphology: A Study of the Relation Between Meaning and Form. Amsterdam ;

Philadelphia: John Benjamins Publishing Company.

Bybee, J. L. (2007). Frequency of use and the organization of language. Oxford; New York: Oxford

University Press. Retrieved March 13, 2013, from http://site.ebrary.com/id/10194230

Byrne, A., & Hilbert, D. R. (2003). Color realism and color science. The Behavioral and Brain

Sciences, 26(1), 3–63.

Calude, A. S., & Pagel, M. (2011). How do we use language? Shared patterns in the frequency

of word use across 17 world languages. Philosophical Transactions of the Royal Society B: Biological

Sciences, 366(1567), 1101–1107.

Canty, A., & Ripley, B. (2013). boot: Bootstrap R (S-Plus) Functions (R package version 1.3-15).

Caskey-Sirmons, L. A., & Hickerson, N. P. (1977). Semantic Shift and Bilingualism: Variation

in the Color Terms of Five Languages. Anthropological Linguistics, 19(8), 358–367.

Casson, R. W. (1994). Russett, Rose, and Raspberry: The Development of English Secondary

Color Terms. Journal of Linguistic Anthropology, 4(1), 5–22.

Casson, R. W. (1997). Color shift: evolution of English color terms from brightness to hue.

In C. L. Hardin & L. Maffi (Eds.), Color Categories in Thought and Language (pp. 224–239).

Cambridge: Cambridge University Press. Retrieved from

http://dx.doi.org/10.1017/CBO9780511519819.010

Chambers, W. W., & Wilkie, J. R. (1970). Short History of the German Language. London:

Methuen.

Cortese, M. J., & Fugett, A. (2004). Imageability ratings for 3,000 monosyllabic words.

Behavior Research Methods, Instruments, & Computers, 36(3), 384–387.

Cortese, M. J., & Khanna, M. M. (2007). Age of acquisition ratings for 3,000 monosyllabic

words. Behavior Research Methods, 40(3), 791–794.

Courtade, I. (1996). Die purpurrote Rose aus Kairo war lila: Fehler bei der Übersetzung von

Farbadjektiven (Deutsch-Englisch-Spanisch), deren Ursprung und Beispiele aus Fachtexten,

Filmtiteln und Kunst. Die Unterrichtspraxis/Teaching German, 29(1), 77–80.

Croft, W. (2000). Explaining language change : an evolutionary approach. Harlow England ; New

York: Longman.

Crutch, S. J., & Warrington, E. K. (2005). Abstract and concrete concepts have structurally

different representational frameworks. Brain, 128(3), 615–627.

Curtin, J. (2014). lmSupport: Support for Linear Models (R package version 2.9.1).

Retrieved from http://CRAN.R-project.org/package=lmSupport

Dahl, Ö. (2004). The growth and maintenance of linguistic complexity. Amsterdam ; Philadelphia:

John Benjamins.

Dahl, Ö. (2016). Thoughts on language-specific and crosslinguistic entities. Linguistic Typology,

20(2), 427–437.

D’Arcy, A. (2006). Lexical Replacement and the Like(s). American Speech, 81(4), 339–357.

Darmesteter, A. (1887). La vie des mots étudiée dans leurs significations. Paris: C. Delagrave.

Desgrippes, M. (2011). Sur les traces de l’évolution de la catégorie orange (Master Thesis). Université de

Fribourg. Retrieved October 29, 2016, from http://doc.rero.ch/record/27075

Diaz, M. T., & McCarthy, G. (2009). A comparison of brain activity evoked by single content

and function words: an fMRI investigation of implicit word processing. Brain Research, 1282,

38–49.

Dolgopolsky, A. B. (1986). A probabilistic hypothesis concerning the oldest relationships

among the language families in northern Eurasia. In V. V. Ševoroskin & T. L. Markey (Eds.),

Typology, relationship and time: a collection of papers on language change and relationship. Ann Arbor:

Karoma.

Dougherty, J. W. D. (1977). Color Categorization in West Futunese: Variability and Change.

In B. G. Blount & M. Sanches (Eds.), Sociocultural Dimensions of Language Change (pp. 103–118).

New York: Academic Press. Retrieved January 27, 2015, from

http://www.sciencedirect.com/science/article/pii/B9780121074500500127

Dunn, M. (2015). Language Phylogenies. In C. Bowern & B. Evans (Eds.), The Routledge

Handbook of Historical Linguistics (p. 190-). Routledge.

Dunn, M., & Terrill, A. (2012). Assessing the lexical evidence for a Central Solomons Papuan

family using the Oswalt Monte Carlo Test. Diachronica, 29(1), 1–27.

Dyen, I., James, A., & Cole, J. W. (1967). Language Divergence and Estimated Word

Retention Rate. Language, 43(1), 150.

Dyen, I., Kruskal, J. B., & Black, P. (1992). An Indoeuropean classification : a lexicostatistical

experiment. Philadelphia: American Philosophical Society.

Edmonds, J. (2000). Tyrian or Imperial Purple Dye: The Mystery of Imperial Purple Dye. Little

Chalfont: John Edmonds.

Fan, Y. (1996). Farbnomenklatur Im Deutschen Und Im Chinesischen: Eine Kontrastive Analyse Unter

Psycholinguistischen, Semantischen Und Kulturellen Aspekten. Frankfurt am Main ; New York: P. Lang.

Fellbaum, C. (1999). WordNet: An electronic lexical database (second edition). Cambridge, Mass.:

MIT Press.

Finlay, V. (2004). Color: A Natural History of the Palette. New York: Random House Trade

Paperbacks.

Frenzel-Biamonti, C. (2011). Rosa Schätze – Pink zum kaufen: Stylistic confusion, subjective

perception and semantic uncertainty of a loaned colour term. In C. P. Biggam, C. Hough, C.

Kay, & D. R. Simmons (Eds.), New Directions in Colour Studies (pp. 91–104). Amsterdam: John

Benjamins Publishing Company. Retrieved March 8, 2016, from

https://benjamins.com/catalog/z.167.13fre

Garfield, S. (2000). Mauve: how one man invented a colour that changed the world. London: Faber and

Faber.

Geeraerts, D. (1997). Diachronic prototype semantics : a contribution to historical lexicology. Oxford:

Clarendon.

Geeraerts, D. (2004). Theories of lexical semantics a cognitive perspective. Oxford: Oxford University

Press.

Geeraerts, D., Grondelaers, S., & Bakema, P. (1994). The structure of lexical variation : meaning,

naming, and context. Berlin ; New York: M. de Gruyter.

Gellerstam, M. (2009). SAOL och tidens flykt : några nedslag i ordlistans historia. Stockholm:

Norstedt. Retrieved May 29, 2015, from

Goossens, J. (1969). Strukturelle Sprachgeographie. C. Winter.

Gray, R. D., & Atkinson, Q. D. (2003). Language-tree divergence times support the

Anatolian theory of Indo-European origin. Nature, 426(6965), 435–439.

Greenhill, S. J. (2011). Levenshtein distances fail to identify language relationships accurately.

Computational Linguistics, 37(4), 689–698.

Greenhill, S. J., Blust, R., & Gray, R. D. (2008). The Austronesian Basic Vocabulary

Database: From Bioinformatics to Lexomics. Evolutionary Bioinformatics, 4, 271–283.

Gries, S. T. (2010). Useful statistics for corpus linguistics. In A. Sánchez (Ed.), A Mosaic of

Corpus Linguistics: Selected Approaches (pp. 269–292). Frankfurt am Main: Peter Lang.

Grondelaers, S., Speelman, D., & Geeraerts, D. (2007). Lexical Variation and Change. In D.

Geeraerts & H. Cuyckens (Eds.), The Oxford handbook of cognitive linguistics (pp. 987–1011).

Oxford ; New York: Oxford University Press.

Grzega, J. (2003). Borrowing as a Word-Finding Process in Cognitive Historical

Onomasiology. Onomasiology Online, 4, 22–42.

Grzega, J. (2004). A Qualitative and Quantitative Presentation of the Forces for Lexemic

Change in the History of English. Onomasiology Online, 5, 15–55.

Gude, J. A., Mitchell, M. S., Ausband, D. E., Sime, C. A., & Bangs, E. E. (2009). Internal

Validation of Predictive Logistic Regression Models for Decision-Making in Wildlife

Management. Wildlife Biology, 15(4), 352–369.

Hage, P., & Hawkes, K. (1975). Binumarien Color Categories. Ethnology, 14(3), 287–300.

Halle, M. (1962). Phonology in Generative Grammar. WORD, 18(1–3), 54–72.

Hannesdóttir, A. H. (1998). Lexikografihistorisk Spegel: Den Enspråkiga Svenska

Lexikografins Utveckling Ur Den Tvåspråkiga. Göteborg: Göteborgs universitet.

Harrell, F. E. (2001). Regression Modeling Strategies with Applications to Linear Models, Logistic

Regression, and Survival Analysis. New York, NY: Springer New York. Retrieved February 16,

2015, from http://dx.doi.org/10.1007/978-1-4757-3462-1

Haspelmath, M. (2004). On directionality in language change with particular reference to

grammaticalization. In O. Fischer, M. Norde, & H. Perridon (Eds.), Up and Down the Cline--the

Nature of Grammaticalization (pp. 17–44). Amsterdam/Philadelphia: John Benjamins Publishing.

Haspelmath, M. (2010). Comparative concepts and descriptive categories in crosslinguistic

studies. Language, 86(3), 663–687.

Haspelmath, M. (2016). The challenge of making language description and comparison

mutually beneficial. Linguistic Typology, 20(2), 299–303.

Haspelmath, M., & Tadmor, U. (2009a). Loanwords in the world’s languages: a comparative

handbook. Berlin: De Gruyter Mouton.

Haspelmath, M., & Tadmor, U. (Eds.). (2009b). WOLD. Leipzig: Max Planck Institute for

Evolutionary Anthropology. Retrieved December 17, 2015, from http://wold.clld.org/

Hassibi, B., & Shadbakht, S. (2007). Normalized Entropy Vectors, Network Information

Theory and Convex Optimization (pp. 1–5). Presented at the 2007 IEEE Information Theory

Workshop on Information Theory for Wireless Networks.

Haugen, E. (1950). The Analysis of Linguistic Borrowing. Language, 26(2), 210.

Haugen, E. (1987). Danish, Norwegian and Swedish. In B. Comrie, The World’s major languages

(pp. 157–179). New York: Oxford University Press.

Heider, E. R. (1972a). Probabilities, Sampling, and Ethnographic Method: The Case of Dani

Colour Names. Man, 7(3), 448–466.

Heider, E. R. (1972b). Universals in color naming and memory. Journal of Experimental

Psychology, 93(1), 10–20.

Heine, B., Claudi, U., & Hünnemeyer, F. (1991). Grammaticalization: A conceptual framework.

Chicago: University of Chicago Press.

Hock, H. H. (1986). Principles of Historical Linguistics. Mouton de Gruyter.

Hodgson, C., & Ellis, A. W. (1998). Last in, First to Go: Age of Acquisition and Naming in

the Elderly. Brain and Language, 64(1), 146–163.

Hopper, P. J., & Traugott, E. (2003). Grammaticalization. Cambridge: Cambridge University

Press.

Hylander, N. (1953). Nordisk kärlväxtflora omfattande Sveriges, Norges, Danmarks, Östfennoskandias,

Islands och Färöarnas kärlkryptogamer och fanerogamer. I. Stockholm: Almqvist & Wiksell.

Ingram, J. C. L. (2007). Neurolinguistics: an introduction to spoken language processing and its disorders.

Cambridge: Cambridge University Press.

Janschewitz, K. (2008). Taboo, emotionally valenced, and emotionally neutral word norms.

Behavior Research Methods, 40(4), 1065–1074.

Jespersen, O. (1922). Language : its nature, development and origin. London: Allen & Unwin.

Johnson, E. (1996). Lexical Change and Variation in the Southeastern United States, 1930-1990. The

University of Alabama Press. Retrieved October 31, 2016, from

https://muse.jhu.edu/book/6767

Johnston, R. A., & Barry, C. (2006). Age of acquisition and lexical processing. Visual

Cognition, 13(7–8), 789–845.

Jones, W. J. (2013). German colour terms: a study in their historical evolution from earliest times to

present. Amsterdam: John Benjamins Publishing Company.

Juola, P. (2003). The Time Course of Language Change. Computers and the Humanities, 37(1),

77–96.

Juola, P. (2005). Language change and historical inquiry. In Humanities, Computers and Cultural

Heritage (pp. 169–175). Amsterdam: Royal Netherlands Academy of Arts and Sciences.

Retrieved August 31, 2016, from

http://repository.ubn.ru.nl/bitstream/handle/2066/56711/56711.pdf

Kapitan, M. E. (1994). Influence of various system features of Romance words on their

survival. Journal of Quantitative Linguistics, 1(3), 237–250.

Kaufmann, C. (2006). Zur Semantik Der Farbadjektive Rosa, Pink Und Rot: Eine Korpusbasierte

Vergleichsuntersuchung Anhand Des Farbträgerkonzepts. München: Utz, Herbert.

Kay, P. (1975). Synchronic Variability and Diachronic Change in Basic Color Terms. Language

in Society, 4(3), 257–270.

Kay, P., Berlin, B., Maffi, L., & Merrifield, W. (1997). Color Naming across Languages. In C.

L. Hardin & L. Maffi (Eds.), Color Categories in Thought and Language (pp. 21–56). Cambridge:

Cambridge University Press.

Kay, P., Berlin, B., Maffi, L., Merrifield, W., & Cook, R. S. (2009). World Color Survey.

Stanford: CSLI Publications. Retrieved September 22, 2014, from

Kay, P., Berlin, B., & Merrifield, W. (1991). Biocultural Implications of Systems of Color

Naming. Journal of Linguistic Anthropology, 1(1), 12–25.

Kay, P., & Maffi, L. (1999). Color Appearance and the Emergence and Evolution of Basic

Color Lexicons. American Anthropologist, 101, 743–760.

Kay, P., & McDaniel, C. K. (1978). The Linguistic Significance of the Meanings of Basic

Color Terms. Language, 54, 610–646.

Key, M. R., & Comrie, B. (Eds.). (2015). The Intercontinental Dictionary Series. Leipzig: Max

Planck Institute for Evolutionary Anthropology. Retrieved January 4, 2017, from http://ids.clld.org

King, J. W., & Kutas, M. (1995). Who Did What and When? Using Word- and Clause-Level

ERPs to Monitor Working Memory Usage in Reading. Journal of Cognitive Neuroscience, 7(3), 376–

395.

Klein, T. P. (1999). Six Colour Words in the Pearl Poet: Blake, Blayke, Blaȝt, Blwe, Blo &

Ble. Studia Neophilologica, 71(2), 156–158.

Kleparski, G. (1997). Theory and practice of historical semantics: the case of Middle English and early

modern English synonyms of girl/young woman. Lublin: University Press of the Catholic University of

Lublin.

Kleparski, G., & Borkowska, P. (2007). A Note on Synonymy: Synchronic and Diachronic.

Studia Anglica Resoviensia, 4, 126–139.

Koch, P. (2016). Meaning change and semantic shifts. In P. Juvonen & M. Koptjevskaja-

Tamm, The Lexical Typology of Semantic Shifts (pp. 21–66). Berlin/Boston: De Gruyter Mouton.

Koller, V. (2008). “Not just a colour”: pink as a gender and sexuality marker in visual

communication. Visual Communication, 7(4), 395–423.

Koptjevskaja-Tamm, M. (2008). Approaching lexical typology. In M. Vanhove (Ed.), From

polysemy to semantic change: towards a typology of lexical semantic associations (pp. 3–52). Amsterdam:

John Benjamins.

Koptjevskaja-Tamm, M. (2012). New directions in lexical typology. Linguistics, 50(3), 373–

394.

Koptjevskaja-Tamm, M. (2014). The Linguistics of Temperature. Amsterdam: John Benjamins

Publishing Company. Retrieved October 10, 2014, from

http://www.bokus.com/bok/9789027206886/the-linguistics-of-temperature/

Koptjevskaja-Tamm, M. (2016). “The lexical typology of semantic shifts”: An introduction.

In P. Juvonen & M. Koptjevskaja-Tamm (Eds.), The Lexical Typology of Semantic Shifts (pp. 1–20).

Berlin, Boston: De Gruyter. Retrieved November 28, 2016, from

http://www.degruyter.com/view/books/9783110377675/9783110377675-001/9783110377675-001.xml

Kuperman, V., Stadthagen-Gonzalez, H., & Brysbaert, M. (2012). Age-of-acquisition ratings

for 30,000 English words. Behavior Research Methods, 44(4), 978–990.

Labov, W. (1972). Sociolinguistic Patterns. Philadelphia: University of Pennsylvania Press.

Ladd, D. R., Roberts, S. G., & Dediu, D. (2015). Correlational Studies in Typological and

Historical Linguistics. Annual Review of Linguistics, 1(1), 221–241.

Lander, Y., & Arkadiev, P. (2016). On the right of being a comparative concept. Linguistic

Typology, 20(2), 403–416.

Langacker, R. W. (1987). Foundations of cognitive grammar. Vol. 1, Theoretical prerequisites. Stanford:

Stanford University Press.

LaPolla, R. J. (2016). On categorization: Stick to the facts of the languages. Linguistic Typology,

20(2), 365–375.

Lehrer, A. (1985). The influence of semantic fields on semantic change. In J. Fisiak, Historical

semantics, historical word formation (pp. 283–296). Berlin ; New York: Mouton Publishers.

Lehrer, A. (1992). A theory of vocabulary structure: Retrospectives and prospectives. In M.

Pütz (Ed.), Thirty Years of Linguistic Evolution (p. 243). Amsterdam: John Benjamins Publishing

Company. Retrieved November 16, 2016, from https://benjamins.com/catalog/z.61.21leh

Lenneberg, E. H., & Roberts, J. M. (1956). The language of experience; a study in methodology.

Baltimore: Waverly Press.

Levinson, S. (1995). Three Levels of Meaning. In F. R. Palmer (Ed.), Grammar and Meaning.

Essays in Honour of Sir John Lyons (pp. 90–115). Cambridge: Cambridge University Press.

Levinson, S. C. (2000). Yélî Dnye and the Theory of Basic Color Terms. Journal of Linguistic

Anthropology, 10(1), 3–55.

Levisen, C. (2012). Cultural Semantics and Social Cognition. A Case Study on the Danish Universe of

Meaning. Berlin, Boston: De Gruyter Mouton. Retrieved March 8, 2016, from

http://www.degruyter.com/view/product/184940

Lindsey, D. T., & Brown, A. M. (2014). The color lexicon of American English. Journal of

Vision, 14(2).

Lohr, M. (1999). Methods for the genetic classification of languages. University of Cambridge,

Cambridge.

Lucy, J. (1997). The linguistics of colour. In C. L. Hardin & L. Maffi (Eds.), Color Categories in

Thought and Language. Cambridge ; New York: Cambridge University Press.

Ludtke, H. (1985). Diachronic irreversibility in word-formation and semantics. In J. Fisiak

(Ed.), Historical Semantics, Historical Word-Formation (pp. 355–366). Berlin ; New York ;

Amsterdam: Mouton Publishers.

MacLaury, R. E. (1991). Social and Cognitive Motivations of Change: Measuring Variability

in Color Semantics. Language, 67(1), 34–62.

MacLaury, R. E. (1995). Vantage Theory. In J. R. Taylor & R. E. MacLaury (Eds.), Language

and the cognitive construal of the world (pp. 231–276). Berlin ; New York: Mouton de Gruyter.

MacLaury, R. E. (1997). Color and cognition in Mesoamerica: constructing categories as vantages. Austin:

University of Texas Press.

Majid, A. (2008). Focal colours. In A. Majid (Ed.), Field Manual Volume 11 (pp. 8–10).

Nijmegen: Max Planck Institute for Psycholinguistics.

Majid, A. (2010). Words for Parts of the body. In B. Malt & P. Wolff (Eds.), Words and the

mind: How words capture human experience. Oxford ; New York: Oxford University Press.

Majid, A., Jordan, F., & Dunn, M. (2011). Evolution of Semantic Systems Procedures Manual.

Nijmegen: Max Planck Institute for Psycholinguistics.

Majid, A., Jordan, F., & Dunn, M. (2015). Semantic systems in closely related languages.

Language Sciences, 49, 1–18.

Majid, A., & Levinson, S. C. (2007). The language of vision I: colour. In A. Majid (Ed.), Field

Manual Volume 10 (pp. 22–25). Nijmegen: Max Planck Institute for

Psycholinguistics. Retrieved September 23, 2014, from

http://fieldmanuals.mpi.nl/volumes/2007/language-of-vision-colour/

Mårtensson, F., Roll, M., Apt, P., & Horne, M. (2011). Modeling the meaning of words:

neural correlates of abstract and concrete noun processing. Acta Neurobiologiae Experimentalis,

71(4), 455–478.

Mayer, T., & Cysouw, M. (2014). Creating a Massively Parallel Bible Corpus. In Proceedings of

the International Conference on Language Resources and Evaluation (LREC) (pp. 3158–3163).

Reykjavik.

Meillet, A. (1921). Linguistique historique et linguistique générale. In Comment les mots changent

de sens (pp. 230–271). Paris: Champion.

Monaghan, P. (2014). Age of acquisition predicts rate of lexical evolution. Cognition, 133(3),

530–534.

Moors, A., De Houwer, J., Hermans, D., Wanmaker, S., van Schie, K., Van Harmelen, A.-L.,

… Brysbaert, M. (2013). Norms of valence, arousal, dominance, and age of acquisition for 4300

Dutch words. Behavior Research Methods, 45(1), 169–177.

Moravcsik, E. A. (1975). Borrowed verbs. Wiener Linguistische Gazette, 8, 3–30.

Moravcsik, E. A. (2016). On linguistic categories. Linguistic Typology, 20(2), 417–425.

Morrison, C. M., Chappell, T. D., & Ellis, A. W. (1997). Age of Acquisition Norms for a

Large Set of Object Names and Their Relation to Adult Estimates and Other Variables.

Quarterly Journal of Experimental Psychology: Section A, 50(3), 528–559.

Münte, T. F., Wieringa, B. M., Weyerts, H., Szentkuti, A., Matzke, M., & Johannes, S. (2001).

Differences in brain potentials to open and closed class words: class and frequency effects.

Neuropsychologia, 39(1), 91–102.

Muysken, P. (2000). Bilingual Speech: A Typology of Code-Mixing. Cambridge: Cambridge

University Press.

Nerlich, B. (2001). Approaches to Semantics in the 19th and the First Third of the 20th

century. In S. Auroux & H.-J. Niederehe (Eds.), Handbooks of Linguistics and Communication Science

(HSK) : An International Handbook on the Evolution of the Study of Language from the Beginnings to the

Present (pp. 1596–1600). Berlin/Boston: De Gruyter Mouton.

Nerlich, B., & Clarke, D. D. (2001). Ambiguities we live by: towards a pragmatics of

polysemy. Journal of Pragmatics, 33(1), 1–20.

Pagel, M., Atkinson, Q. D., & Meade, A. (2007). Frequency of word-use predicts rates of

lexical evolution throughout Indo-European history. Nature, 449(7163), 717–720.

Paivio, A. (2013). Imagery and verbal processes. New York ; London: Psychology Press. (Original

work published 1971)

Paradis, C. (2011). Metonymization: A key mechanism in semantic change. In R. Benczes, A.

Barcelona, & F. J. Ruiz de Mendoza Ibáñez (Eds.), Defining Metonymy in Cognitive Linguistics (pp.

61–88). Amsterdam ; Philadelphia: John Benjamins Publishing Company.

Paramei, G. V., & Oakley, B. (2014). Variation of color discrimination across the life span.

Journal of the Optical Society of America A, 31(4), A375–A384.

Paul, H. (1886). Prinzipien der Sprachgeschichte. Halle: Max Niemeyer. Retrieved April 19, 2016,

from http://archive.org/details/prinzipiendersp01paulgoog

Paul, H., Henne, H., Kämper, H., & Objartel, G. (2002). Deutsches Wörterbuch:

Bedeutungsgeschichte und Aufbau unseres Wortschatzes (tenth edition). Tübingen: Niemeyer. Retrieved

November 8, 2016, from http://site.ebrary.com/id/10595774

Petersen, A. M., Tenenbaum, J., Havlin, S., & Stanley, H. E. (2012). Statistical Laws

Governing Fluctuations in Word Use from Word Birth to Word Death. Scientific Reports, 2.

Retrieved April 3, 2013, from

http://www.nature.com/srep/2012/120315/srep00313/full/srep00313.html

Piantadosi, S. T. (2014). Zipf’s word frequency law in natural language: A critical review and

future directions. Psychonomic Bulletin & Review, 21(5), 1112–1130.

Pinker, S. (1994, April 3). The Game of the Name. New York Times.

R Development Core Team. (2013). R: A language and environment for statistical computing.

Vienna, Austria: R Foundation for Statistical Computing. Retrieved from

http://www.R-project.org

Rakhilina, E. (2007). Linguistic construal of colors: The case of Russian. In R. E. MacLaury,

G. V. Paramei, & D. Dedrick (Eds.), Anthropology of color: Interdisciplinary multilevel modeling

(pp. 363–377). Amsterdam ; Philadelphia: John Benjamins Publishing Company.

Rakhilina, E., & Paramei, G. V. (2011). Colour terms - Evolution via expansion of taxonomic

constraints. In C. P. Biggam, C. A. Hough, C. J. Kay, & D. R. Simmons (Eds.), New Directions in

Colour Studies. Amsterdam ; Philadelphia: John Benjamins Publishing Company.

Rama, T. (2016). Siamese convolutional networks based on phonetic features for cognate

identification. arXiv:1605.05172 [Cs]. Retrieved January 4, 2017, from

http://arxiv.org/abs/1605.05172

Rea, J. A. (1958). Concerning the Validity of Lexicostatistics. International Journal of American

Linguistics, 24(2), 145–150.

Redondo, J., Fraga, I., Padrón, I., & Comesaña, M. (2007). The Spanish adaptation of ANEW

(affective norms for English words). Behavior Research Methods, 39(3), 600–605.

Regier, T., Kay, P., Gilbert, A. L., & Ivry, R. B. (2007a). Language and thought: Which side

are you on, anyway? In B. C. Malt & P. M. Wolff (Eds.), Words and the Mind: How words capture

human experience (pp. 165–182). Oxford: Oxford University Press.

Regier, T., Kay, P., & Khetarpal, N. (2007b). Color Naming Reflects Optimal Partitions of

Color Space. Proceedings of the National Academy of Sciences of the United States of America, 104(4),

1436–1441.

Reisig, K. (1881). Vorlesungen über lateinische Sprachwissenschaft. Berlin: Calvary.

Richardson, J. T. E. (1976). Imageability and concreteness. Bulletin of the Psychonomic Society,

7(5), 429–431.

Robinson, J. A. (2010a). “Awesome” insights into semantic variation. In D. Geeraerts, G.

Kristiansen, & Y. Peirsman (Eds.), Advances in cognitive sociolinguistics (pp. 85–109). Berlin ; New

York: de Gruyter Mouton. Retrieved January 13, 2014, from http://sro.sussex.ac.uk/21191/

Robinson, J. A. (2010b, March 23). Semantic Variation and Change in Present-day English

(Doctoral Thesis). University of Sheffield. Retrieved January 13, 2014, from

http://etheses.whiterose.ac.uk/2232/

Rundblad, G. (2000). On the Correlation between Lexical Stability and Word Creation

Device. Journal of Quantitative Linguistics, 7(1), 31–41.

Sankoff, D. (1970). On the Rate of Replacement of Word-Meaning Relationships. Language,

46(3), 564.

Saunders, B. A. C. (1992). The invention of basic colour terms. Utrecht: ISOR.

Scheibman, J. (2002). Point of view and grammar: structural patterns of subjectivity in American English

conversation. Amsterdam; Philadelphia: John Benjamins Pub.

Schirillo, J. A. (2001). Tutorial on the importance of color in language and culture. Color

Research & Application, 26(3), 179–192.

Schmid, H.-J. (2010). Does frequency in text instantiate entrenchment in the cognitive

system? In D. Glynn & K. Fischer (Eds.), Quantitative Methods in Cognitive Semantics: Corpus-Driven

Approaches. Berlin, New York: De Gruyter Mouton. Retrieved November 17, 2016, from

http://www.degruyter.com/view/books/9783110226423/9783110226423.101/9783110226423.101.xml

Schwanenflugel, P. J. (1991). Why are abstract concepts hard to understand? In The psychology

of word meanings (pp. 223–250). Hillsdale, NJ, US: Lawrence Erlbaum Associates, Inc.

Serva, M., & Petroni, F. (2008). Indo-European languages tree by Levenshtein distance. EPL

(Europhysics Letters), 81(6), 68005.

Seufert, G. (1955). Farbnamenlexikon von A - Z. Göttingen: Musterschmidt.

Singh, R. (1980). Aspects of language borrowing: English loans in Hindi. In P. H. Nelde

(Ed.), Sprachkontakt Und Sprachkonflikt =: Languages in Contact and Conflict = Langues En Contact Et

En Conflit (pp. 113–116). Wiesbaden: Niemeyer.

Soares, A. P., Comesaña, M., Pinheiro, A. P., Simões, A., & Frade, C. S. (2012). The

adaptation of the Affective Norms for English Words (ANEW) for European Portuguese.

Behavior Research Methods, 44(1), 256–269.

Southworth, F. C. (1990). Synchronic manifestations of linguistic change. In E. C. Polomé

(Ed.), Research Guide on Language Change (pp. 25–36). Berlin ; New York: Mouton de Gruyter.

Retrieved November 16, 2011, from

Sperber, H. (1923). Einführung in die Bedeutungslehre. Bonn: Dümmler.

Stadthagen-Gonzalez, H., & Davis, C. J. (2006). The Bristol norms for age of acquisition,

imageability, and familiarity. Behavior Research Methods, 38(4), 598–605.

Starostin, S. (2007). Opredelenije ustojčivosti bazisnoj leksiki [Defining the Stability of Basic

Lexicon]. In Trudy po jazykoznaniju [Works in Linguistics] (pp. 827–839). Moscow: Jazyki

slav’anskix kul’tur.

Steinvall, A. (2002). English Color Terms in Context. Umeå: Umeå University.

Steinvall, A. (2006). Basic Colour Terms and Type Modification. Progress in Colour Studies:

Language and Culture, 1, 57.

Stern, G. (1931). Meaning and Change of Meaning: with Special Reference to the English Language.

Bloomington: Indiana University Press.

Stoffel, C. (1901). Intensives and Down-toners: A Study in English Adverbs. Heidelberg: C. Winter’s

Universitätsbuchhandlung. Retrieved from

https://archive.org/details/intensivesanddo00stofgoog

Sun, R. K. (1983). Perceptual Distances and the Basic Color Term Encoding Sequence.

American Anthropologist, 85(2), 387–391.

Swadesh, M. (1950). Salish Internal Relationships. International Journal of American Linguistics,

16(4), 157–167.

Tadmor, U. (2009). Loanwords in the World’s Languages: Findings and Results. In U.

Tadmor & M. Haspelmath, Loanwords in the World’s Languages A Comparative Handbook (pp. 55–

75). Berlin: Walter de Gruyter.

Tarsi, M. (2014). On Loanwords of Latin Origin in Contemporary Icelandic. Nordicum-

Mediterraneum, 9(1), digital, no page range.

Traugott, E. C., & Dasher, R. B. (2001). Regularity in Semantic Change. Cambridge: Cambridge

University Press. Retrieved from http://dx.doi.org/10.1017/CBO9780511486500

Traugott, E., & Dasher, R. B. (2002). Regularity in semantic change. Cambridge ; New York:

Cambridge University Press.

Trier, J. (1931). Der Deutsche Wortschatz im Sinnbezirk des Verstandes: die Geschichte eines sprachlichen

Feldes. Heidelberg: Winter.

Ullman, S. (1957). The principles of semantics. Oxford: Blackwell Publishers.

Uusküla, M. (2007). The Basic Colour Terms of Finnish. SKY Journal of Linguistics, 20, 367–

397.

Uusküla, M., & Eesalu, M. (2014). Glossy vs. matte: An overlooked feature in

psycholinguistic colour naming studies (pp. 445–450). Presented at the 10th Gruppo del Colore

- Associazione Italiana Colore conference, Genoa, Italy. Retrieved October 2, 2014, from

http://www.researchgate.net/publication/265683830_Glossy_vs._matte_An_overlooked_feature_in_psycholinguistic_colour_naming_studies

van Scherpenberg, C. (2013). Pink im Deutschen und Dänischen: Eine Studie zur Semantik

von Farbwortentlehnungen. Tagungsband Der 53. Studentischen Tagung Sprachwissenschaft (StuTS), 9,

81.

Vejdemo, S. (2009, October). Presentation: The changing nature of Swedish GIRLs – report on a

corpus study on lexical change. Presented at the Annual Meeting of the Michigan Linguistic Society,

the University of Michigan, Ann Arbor.

Vejdemo, S. (2010). Cross-linguistic Lexical Change: Why, How and How Fast? In Proceedings

of WIGL 2010 (Vol. 8). Wisconsin. Retrieved from

http://vanhise.lss.wisc.edu/ling/?q=node/164

Vejdemo, S., & Hörberg, T. (2016). Semantic Factors Predict the Rate of Lexical

Replacement of Content Words. PLOS ONE, 11(1), 1–15.

Vejdemo, S., Levisen, C., Beck, T. G., van Scherpenberg, C., Næss, Å., Zimmermann, M., …

Whelpton, M. (2015). Two Kinds of Pink: Development and Difference in Germanic Colour

Semantics. Language Sciences, 49, 19–34.

Verhagen, A. (2010). Construal and Perspectivization. In D. Geeraerts & H. Cuyckens (Eds.),

The Oxford Handbook of Cognitive Linguistics (pp. 48–81). New York: Oxford University Press.

Retrieved May 27, 2016, from

http://oxfordindex.oup.com/view/10.1093/oxfordhb/9780199738632.013.0003

Võ, M. L. H., Conrad, M., Kuchinke, L., Urton, K., Hofmann, M. J., & Jacobs, A. M. (2009).

The Berlin Affective Word List Reloaded (BAWL-R). Behavior Research Methods, 41(2), 534–538.

von Waldenfels, R. (2006). Compiling a parallel corpus of Slavic languages. Text strategies,

tools and the question of lemmatization in alignment. Beiträge Der Europäischen Slavistischen

Linguistik (POLYSLAV), 9, 123–138.

Waggoner, T. L. (2002). Quick six colour vision test pseudoisochromatic plates. In Colour

vision testing made easy. Good-Lite company.

Waldman, G. (2002). Introduction to Light: The Physics of Light, Vision, and Color. Courier

Corporation.

Warriner, A. B., Kuperman, V., & Brysbaert, M. (2013). Norms of valence, arousal, and

dominance for 13,915 English lemmas. Behavior Research Methods, 45(4), 1191–1207.

Weinreich, U., Labov, W., & Herzog, M. (1968). Empirical foundations for a theory of

language change. In W. P. Lehmann & Y. Malkiel (Eds.), Directions for historical linguistics (pp. 95–

195). Austin ; London: University of Texas Press.

Wichmann, S., Holman, E. W., Bakker, D., & Brown, C. H. (2010). Evaluating linguistic

distance measures. Physica A: Statistical Mechanics and Its Applications, 389(17), 3632–3639.

Wierzbicka, A. (1990). The meaning of color terms: semantics, culture, and cognition.

Cognitive Linguistics, 1(1), 99–150.

Wierzbicka, A. (1996). Semantics: Primes and Universals. Oxford: Oxford

University Press.

Wierzbicka, A. (2005). There Are No “Color Universals” but There Are Universals of Visual

Semantics. Anthropological Linguistics, 47(2), 217–244.

Wierzbicka, A. (2008). Why there are no “colour universals” in language and thought. Journal

of the Royal Anthropological Institute, 14(2), 407–425.

Wilkins, D. P. (1996). Natural tendencies of semantic change and the search for cognates. In

The Comparative Method Reviewed: Regularity and irregularity in language change (pp. 579–655). New

York: Oxford University Press.

Winter, W. (1971). Formal frequency and linguistic change: some preliminary comments.

Folia Linguistica, 5(1–2). Retrieved October 7, 2015, from

http://www.degruyter.com/view/j/flin.1969.5.issue-1-2/flin.1969.5.1-2.55/flin.1969.5.1-2.55.xml

Wohlgemuth, J. (2009). A Typology of Verbal Borrowings. Berlin ; New York: Mouton de

Gruyter.

Wooten, B., & Miller, D. L. (1997). The psychophysics of color. In C. L. Hardin & L. Maffi

(Eds.), Color Categories in Thought and Language (pp. 59–88). Cambridge: Cambridge University

Press. Retrieved May 17, 2016, from

http://ebooks.cambridge.org/ref/id/CBO9780511519819A011

Wuerger, S. (2013). Colour Constancy Across the Life Span: Evidence for Compensatory

Mechanisms. PLoS ONE, 8(5), e63921.

Zauner, A. (1902). Die romanischen Namen der Körperteile: Eine onomasiologische Studie. Erlangen:

K.B. Hof- und Universitäts-Buchdruckerei von F. Junge. Retrieved November 8, 2016, from

http://archive.org/details/dieromanischenn00zaungoog

Zimmermann, M., Levisen, C., Guðmundsdóttir Beck, Þ., & van Scherpenberg, C.

(2015). Please pass me the skin coloured crayon! Semantics, socialisation, and folk models of

race in contemporary Europe. Language Sciences, 49, 35–50.

Zipf, G. (1935). The Psychobiology of Language. Boston: Houghton Mifflin Co.

Zollinger, H. (1984). Why just turquoise? Remarks on the evolution of color terms.

Psychological Research, 46(4), 403–409.

Corpora

“Bonniersromaner I (1976/77)” (2015). Språkbanken, Gothenburg University, http://spraakbanken.gu.se/, accessed 20150603.

“Bonniersromaner II (1980/81)” (2015). Språkbanken, Gothenburg University, http://spraakbanken.gu.se/, accessed 20150603.

“Norstedtsromaner (1999)” (2015). Språkbanken, Gothenburg University, http://spraakbanken.gu.se/, accessed 20150603. Total: 13.4 million tokens.

“Äldre Svenska Romaner Korpus” (2015). Språkbanken, Gothenburg University, http://spraakbanken.gu.se/, accessed 20150603.

Floras

Hartman, C. (1866). Landskapet Nerikes flora : För nybörjare utgifven. Örebro: N. M. Lindh.

Kindberg, N. C. (1861). Östgöta flora : (fanerogamerna). Linköping: Sahlström.

Kindberg, N. C. (1877). Svensk flora : beskrifning öfver Sveriges fanerogamer och ormbunkar.

Linköping: Sahlströms bokh.

Krok, T. O. B. N., & Almquist, S. (1920). Svensk flora. 1, Fanerogamer. Stockholm.

Larsson, L. M. (1868). Flora öfver Wermland och Dal. Carlstad: Kjellin.

Mossberg, B., Stenberg, L., & Ericsson, S. (1992). Den nordiska floran. Stockholm: Wahlström

& Widstrand.

Neuman, L. M., & Ahlfvengren, F. E. (1901). Sveriges flora : fanerogamerna. Lund: Gleerup.

Nordenstam, J.-O., Larsson, U., Hansson, A., & Tönnby, I. (2012). Digiflora.se [Elektronisk

resurs]. Retrieved February 23, 2016, from http://digiflora.se/seek/webui.php

Sandberg, F., Göthberg, G., & Norberg, B.-M. (1998). Örtmedicin och växtmagi. Stockholm: Det

bästa [Reader’s Digest].

Thedenius, K. F. (1871). Flora öfver Uplands och Södermanlands fanerogamer och bräkenartade växter.

Stockholm: Författarnas Förlag.

Ursing, B. (1944). Svenska växter i text och bild. Stockholm: Nordisk rotogravyr.

Dictionaries and Encyclopedias

Cleasby, R., & Vigfússon, G. (1874). An Icelandic-English dictionary based on the ms. collections of the

late Richard Cleasby. Enlarged and completed by Gudbrand Vigfússon. Oxford: Clarendon Press.

Retrieved March 9, 2016, from http://archive.org/details/icelandicenglish00cleauoft

Dalin, A. F. (1850). Ordbok öfver svenska språket. Stockholm: A.F. Dalin.

DDO. (2016). Den Danske Ordbog: Moderne Dansk Sprog. Retrieved from http://ordnet.dk/ddo/

DWDS. (2012, May 10). Digitales Wörterbuch der Deutschen Sprache. Retrieved from

http://www.dwds.de/

Hansen, M. C. (1842). Fremmed-Ordbog: eller Forklaring over de i det norske Skrift og Omgangs-Sprog

almindeligst forekommende fremmede Ord og Talemaader. Guldberg & Dzwonkowski.

Hellquist, E. (1922). Svensk etymologisk ordbok. Lund: Gleerups.

Ingemann, T. (2011). Synonymordbog. Copenhagen: Gyldendal.

Nordisk familjebok. (1876-1899). Stockholm.

Nordisk familjebok Uggleutgåvan. (1904-1926). Stockholm.

ODS. (2016). Ordbog over det danske Sprog: Historisk ordbog 1700-1950. Retrieved from

http://ordnet.dk/ods/

Oxford English Dictionary. (2016). Oxford: Oxford University Press. Retrieved March 16, 2016,

from http://www.oed.com/

Pocket Oxford American Thesaurus Online. (2008, January 1) (second edition). Oxford University

Press. Retrieved from

http://www.oxfordreference.com/view/10.1093/acref/9780195301694.001.0001/

SAOB. (2014). SAOB Svenska Akademiens Ordbok Online. Retrieved from

http://g3.spraakdata.gu.se/saob/

SAOL. (1889). Ordlista öfver svenska språket (6., omarb. uppl). Stockholm: Norstedt.

SAOL. (1900). Ordlista öfver svenska språket (7. uppl., omarb. och utvidgad). Stockholm:

Norstedt.

SAOL. (1923). Ordlista Över Svenska Språket (8. uppl., omarb. och utvidgad). Stockholm:

Norstedt.

SAOL. (1950). Svenska Akademiens Ordlista Över Svenska Språket (9. uppl., omarb. och

utvidgad). Stockholm: Norstedt.

SAOL. (1973). Svenska Akademiens Ordlista Över Svenska Språket (10. uppl., omarb. och

utvidgad). Stockholm: Norstedt.

SAOL. (1986). Svenska Akademiens Ordlista Över Svenska Språket (11th ed.). Stockholm:

Norstedt.

SAOL. (1998). Svenska Akademiens Ordlista Över Svenska Språket (12th ed.). Stockholm:

Norstedt.

SAOL. (2006). Svenska Akademiens Ordlista Över Svenska Språket (13th ed.). Stockholm:

Norstedt.

SAOL. (2015). Svenska akademiens ordlista över svenska språket (Fjortonde upplagan). Stockholm:

Svenska akademien : Norstedts akademiska förlag [distributör].

TLFI. (2016, May 5). Le Trésor de la langue française informatisé. Retrieved from

http://www.atilf.fr/

Walter, G. (2000). Bonniers synonymordbok. Stockholm: Bonnier.
