Analyses of first names in The Netherlands: full population studies

Gerrit Bloothooft

Institute of Linguistics OTS

Utrecht University

CTL colloquium, June 2006

Dutch studies on first names

Limited scientific work so far
– Dictionary (20,000 entries)
– Few sociolinguistic studies
  • Limited scope, small samples

Topic is extremely popular in the media

Research dimensions in onomastics

– Name
– Form and spelling
– Origin
– Motives
– Time
– Place

require a lot of data

Full population

Gemeentelijke Basis Administratie (GBA), Civil Administration

Available electronically since 1994

Legal right to use data for scientific research

16+ million people

Connected!

UiL-OTS and the Meertens Institute have been connected to the GBA since June 1, 2006

The right to make a rich data extraction for the full population (all persons with Dutch nationality): planned July 1, 2006

Research proposal NWO

The first name revolution in the 20th century in The Netherlands – the first name as a measure of social and linguistic change

Milestones

Traditional naming (after relatives) decreased enormously during the 20th century, especially in the second half

Full freedom for parents through name law of 1970

-> Naming of children became a very personal linguistic and social expression during the last 50 years

Major topics

Changes in naming after relatives

Relations between names and social classes (sets and spelling)

Regional spread of names, dialectal influences

Life cycles of names

What do we get (per person)

All first names

Date, place, postal code and country of birth; gender; date of death (after 1994)

Parents: first names, date & place of birth

Children: first names, date & place of birth

Administrative number of all persons with their own record

this is unprecedented (also internationally)!

Looking for mechanisms

All research topics can be described as the search for large scale mechanisms and relations

Away from the individual name, towards much higher aggregation levels

Towards name sets

From 16+ million names, with over 200,000 different first names, to a much smaller number of name sets that have homogeneous properties

A previous study (2000-2004)

First names from the National Social Security Bank (SVB)

All children born since 1983
– first name (official, no nickname, but..)
– year of birth
– family code (separate table)
– postal code (four digits)

A very rich source

4.2 million children (1983-2002)
– 200,000 per year

1.9 million families

176,800 different first names
– 108,500 unique names
– 3,120 names with frequency > 100 represent 85% of the children
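A minimal sketch of how figures of this kind (number of names above a frequency threshold, their share of all children, number of unique names) could be computed from a list of first names. The function name `coverage` and the toy data are illustrative assumptions, not part of the SVB study.

```python
from collections import Counter


def coverage(name_counts, min_frequency):
    """Return (number of names, share of children) for names above a threshold."""
    total = sum(name_counts.values())
    kept = {name: n for name, n in name_counts.items() if n > min_frequency}
    return len(kept), sum(kept.values()) / total


# Toy list of first names, invented for illustration only.
toy_names = ["Mark"] * 5 + ["Linda"] * 3 + ["Iris", "Teun"]
toy_counts = Counter(toy_names)

n_names, share = coverage(toy_counts, min_frequency=1)
unique_names = sum(1 for n in toy_counts.values() if n == 1)
print(n_names, f"{share:.0%}", unique_names)
```

In the toy data two names occur more than once and cover 8 of the 10 children, while two names are unique.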

Data reduction needed

Far too many names to describe one by one

Names with common properties
– Not from an etymological point of view
– Not from a linguistic point of view
– Based on the choices of parents: name use!

Naming and social classes

Hypothesis:

There are social classes with their own naming preferences

These classes/subcultures may relate to
– culture/language (Frisian, Arabic, Turkish, Surinamese, Antillean,..)
– religion (Catholic, Protestant, Islam,..)
– sociological status (education, income,..)
– geography (urban, rural, regional,..)

Research aims:

Identification of social classes (and their naming preferences) on the basis of the first names of children per family

Study of the relation between these subcultures (first names) and socio-cultural and geographic factors

Method (a chain of names)

Parents choose first names, with higher probability, from a set that is popular in their subculture (relatives, friends, neighbours,..) [social group size is about 150]

This is informative only if there is more than one child (more than one name) in a family

Pairs of first names (from a family) as unit for analysis

Method (a chain of names)

Children in one family: Mark, Peter, Linda

If Mark is popular in a subculture, then Peter and Linda may be popular as well

Name pairs: Mark - Peter, Peter - Mark, Mark - Linda, Linda - Mark, Peter - Linda, Linda - Peter
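The pair derivation for a single family can be sketched as follows. This is only an illustration: the function name `name_pairs` is hypothetical, and collapsing duplicate names within a family is an assumption.

```python
from itertools import permutations


def name_pairs(children):
    """Return every ordered pair of distinct first names in one family."""
    # Assumption: a name that occurs twice in a family is counted once.
    unique_names = list(dict.fromkeys(children))
    return list(permutations(unique_names, 2))


pairs = name_pairs(["Mark", "Peter", "Linda"])
for first, second in pairs:
    print(f"{first} - {second}")
```

Three children give six ordered pairs; a family with one child gives none.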

Method (a chain of names)

Select all families with two or more children (1.17 million families, 2.81 million children)

Derive all pairs of first names (from a single family) (in all, 2.12 million different pairs)

Compute the frequency of each pair

The higher the frequency of a pair, the more likely the first names in the pair belong to the same set
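A rough sketch of the counting step, under stated assumptions: each pair is counted once per family and treated as unordered (so Mark - Peter and Peter - Mark are the same pair), and the families below are invented toy data. The name `pair_frequencies` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_frequencies(families):
    """Count in how many families each unordered pair of names occurs."""
    counts = Counter()
    for children in families:
        names = sorted(set(children))
        if len(names) < 2:
            continue  # a single name carries no pair information
        counts.update(combinations(names, 2))
    return counts


# Toy data, invented for illustration only.
families = [
    ["Mark", "Peter", "Linda"],
    ["Mark", "Peter"],
    ["Esther", "Judith"],
    ["Iris"],
]
frequencies = pair_frequencies(families)
for pair, count in frequencies.most_common(3):
    print(count, *pair)
```

Sorting the names inside each family makes the pair key independent of birth order, which keeps the table of frequencies symmetric.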

Most frequent name pairs

Frequency  Pair of first names
1091       Johannes - Maria
790        Johannes - Johanna
754        Jeroen - Martijn
727        Johanna - Maria
….
572        Mohamed - Fatima
459        Lars - Niels

Clustering of first names

Define a measure that reflects the relationship between two names

Combine names that have a strong mutual relationship into a set
– Johannes, Maria, Johanna, …

Name relationship measure

Esther
– 7,967 girls
– 12,973 brothers and sisters
– 276 times sister Judith (= 2.1 %)

Judith
– 4,828 girls
– 8,033 brothers and sisters
– 276 times sister Esther (= 3.4 %)

Geometric average (2.7 %)
– A symmetric measure of relationship between the two names
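The Esther and Judith figures can be checked with a short calculation. This sketch assumes the measure is the geometric mean of the two sibling shares, as the slide describes; the function and parameter names are illustrative.

```python
from math import sqrt


def relation_measure(co_occurrences, siblings_a, siblings_b):
    """Geometric mean of the two sibling shares, as a percentage."""
    share_a = co_occurrences / siblings_a  # share of A's siblings named B
    share_b = co_occurrences / siblings_b  # share of B's siblings named A
    return 100 * sqrt(share_a * share_b)


# Esther: 12,973 siblings; Judith: 8,033 siblings; 276 Esther-Judith sister pairs.
esther_judith = relation_measure(276, 12_973, 8_033)
print(f"{esther_judith:.1f} %")  # prints 2.7 %
```

Because the two shares are multiplied, the measure gives the same value whichever name is taken first.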

Clustering of first names

Name pairs from a (subculture-related) set have the highest relation measure

Esther:

Judith 2.7

Mirjam 2.4

Ruben 1.2

David 1.1

Judith:

Esther 2.7

Mirjam 1.6

Ruben 1.0

Miriam 0.8

Clustering

Start with strongly related name-pairs

Add new name-pair to existing cluster or start a new cluster

Iterative procedure
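The slides do not give the exact clustering rules, so the following is only one possible greedy reading of the steps, in a single pass: pairs are visited from strongest to weakest, a pair joins an existing cluster if one of its names is already there, and otherwise starts a new cluster. The threshold, the rule of leaving already-clustered names in place, and the name `cluster_names` are all assumptions.

```python
def cluster_names(relations, threshold):
    """Greedy clustering of names from (name_a, name_b, strength) triples."""
    cluster_of = {}  # name -> cluster id
    clusters = {}    # cluster id -> set of names
    for name_a, name_b, strength in sorted(relations, key=lambda r: -r[2]):
        if strength < threshold:
            break  # remaining pairs are all weaker
        id_a, id_b = cluster_of.get(name_a), cluster_of.get(name_b)
        if id_a is None and id_b is None:
            new_id = len(clusters)
            clusters[new_id] = {name_a, name_b}
            cluster_of[name_a] = cluster_of[name_b] = new_id
        elif id_b is None:
            clusters[id_a].add(name_b)
            cluster_of[name_b] = id_a
        elif id_a is None:
            clusters[id_b].add(name_a)
            cluster_of[name_a] = id_b
        # Assumption: names that are both already clustered stay where they are.
    return list(clusters.values())


# Relation values taken from the Esther and Judith lists in the slides.
relations = [
    ("Esther", "Judith", 2.7),
    ("Esther", "Mirjam", 2.4),
    ("Judith", "Mirjam", 1.6),
    ("Esther", "Ruben", 1.2),
    ("Esther", "David", 1.1),
    ("Judith", "Ruben", 1.0),
]
name_sets = cluster_names(relations, threshold=1.5)
print(name_sets)
```

With a threshold of 1.5 the six relations yield a single set containing Esther, Judith and Mirjam; lowering the threshold would pull in Ruben and David.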

Clustering results

4,013 first names
– Frequency of a pair > 4

Result: 340 name sets
– Limited number of large sets
– High number of small sets

Top 25 of sets is most illustrative
– 2,887 first names
– 2.64 million children (75%)

Features of name sets

Period of maximum popularity (refine!)
– Traditional, Pre-modern (1950-1980), Modern

Language
– Dutch, Frisian, English, American, French, Spanish, Italian, [Arabic, Turkish]
– Common Western

Topic area
– Nature, History & Culture, Old Testament

Length
– Short (one syllable), long

A map of name sets

Presentation of a map of name sets
– Based on mutual relations between name sets

The closer two name sets are on the map, the more related they are

[Figure: map of name sets. Labels on the map: Spanish & Italian; Long American & English; Short American & English; Pre-modern English & French; Long names from the Old Testament; Names from nature; Long names from history and culture; Short modern Common Western; Pre-modern Common Western; Long French; Scandinavian; Pre-modern Dutch; Short modern Dutch; Traditional Dutch (Latin | Dutch); Short traditional Dutch; Frisian]

Dimensions

[Figure: dimensions of the map of name sets. Length: long to short. Period: modern, pre-modern, traditional. Language: foreign, Common Western, Dutch and Frisian]

[Figure: the map of name sets with one example name per set]

Name set                           Example
Spanish & Italian                  RICARDO
Long American & English            MICHAEL
Short American & English           KIM
Pre-modern English & French        DENNIS
Names from the Old Testament       DANIËL
Names from nature                  IRIS
Names from history and culture     LAURENS
Short modern Common Western        TIM
Pre-modern Common Western          MARK
French                             CHARLOTTE
Scandinavian                       NIELS
Pre-modern Dutch                   JEROEN
Short modern Dutch                 BART
Traditional Dutch (Latin | Dutch)  JOHANNES | JAN
Short traditional Dutch            TEUN
Frisian                            JELLE

Geographical distribution

Four-digit postal code (pc) area level [3584]
– Big differences between pc areas
  • city quarters
  • villages (religion)
– Enough children for characterisation
  • On average 1200 births per pc in 20 years
  • Some further name grouping needed

Further grouping

Name group                                          %
Traditional names (Latin form)                      8
Traditional names (Dutch)                           5
Frisian names                                       3
Pre-modern names (Dutch, Western)                  12
Foreign names (English)                            24
Short modern names (Dutch, Western, Scandinavian)  13
Names from OT, history, culture, nature             7
Arabic & Turkish names [unrelated group]            5
Other [low frequency]                              23

[Figure: the map of name sets with the further grouping marked: Short, Pre-modern, Foreign, Traditional (Latin, Dutch), Frisian, History & Culture]

Traditional (Dutch)
Aaltje, Barend, Dirkje, Evert, Geertje, Harm, Jantje, Klaas, Margje, Teunis

Traditional (Latin form)
Adriana, Bernardus, Christina, Eduard, Elisabeth, Franciscus, Geertruida, Hubertus, Johanna, Krijn, Maria

Frisian names
Aafke, Bauke, Douwe, Froukje, Joppe, Jitske, Jelle, Menno, Sietske, Onno, Wietske, Wiebe

Pre-modern names (Dutch, Western)
Anniek, Anita, Carla, Frank, Jochem, Jeroen, Linda, Mark, Marloes, Paul, Suzanne

Foreign names (English)
Amanda, Dennis, Danny, Chantal, Henry, Isabella, Kim, Kevin, Melissa, Ricardo, Samantha, Stephen

Short names (modern, Dutch, Western, Scandinavian)
Anne, Bart, Eva, Gijs, Lisa, Kaj, Niels, Sanne, Sofie, Tim

Religion

[Figure: maps of the spread of short names and of religion (none, Protestant, Catholic)]

Old Testament, history, culture, nature
Daniël, Esther, Judith, Naomi, Willemijn, Diederik, Frederieke, Maurits, Iris, Fleur, Jasmijn

Religion

[Figure: map of income, from lowest to highest]

Arabic and Turkish names

Fatima, Mohamed, Noura, Hamza, Sara, Yassin, Fatma, Mustafa, Hatice, Mehmet

Further geographical analysis

Per pc area: percentage of children per name group (8 values)

These percentages reflect social composition of the pc area

Factor analysis on data from 3584 pc areas: 10 typical profiles
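A small sketch of the first step only, the percentage of children per name group in each postal code area. The factor analysis itself is not shown. The function name `area_profiles`, the postal codes and the name-to-group mapping are illustrative assumptions (the names are taken from the group examples in the slides).

```python
from collections import Counter, defaultdict


def area_profiles(children, group_of):
    """Percentage of children per name group for every postal code area."""
    counts = defaultdict(Counter)
    for postal_code, first_name in children:
        # Assumption: names without a known group fall into "Other".
        counts[postal_code][group_of.get(first_name, "Other")] += 1
    profiles = {}
    for postal_code, group_counts in counts.items():
        total = sum(group_counts.values())
        profiles[postal_code] = {
            group: 100 * n / total for group, n in group_counts.items()
        }
    return profiles


# Hypothetical mapping and postal codes, for illustration only.
group_of = {
    "Johanna": "Traditional (Latin form)",
    "Maria": "Traditional (Latin form)",
    "Jelle": "Frisian",
    "Kim": "Foreign",
    "Tim": "Short",
}
children = [
    ("1234", "Johanna"), ("1234", "Maria"), ("1234", "Kim"), ("1234", "Tim"),
    ("5678", "Jelle"), ("5678", "Tim"),
]
profiles = area_profiles(children, group_of)
print(profiles["1234"])
```

Each area ends up with one vector of percentages, which is the kind of input a factor analysis over all areas would take.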

10 profiles

Traditional – Latin form

Traditional – Dutch

Transitional, Traditional Dutch to pre-modern

Transitional, Traditional Latin form to foreign

Pre-modern

Foreign

Short

Elite

Arabic-Turkish

Frisian

Example profile: Traditional – Latin form

Traditional – Latin form
Traditional – Dutch
Frisian names
Pre-modern names
Foreign names
Short names
Names from OT, history, culture, nature
Arabic and Turkish names
Other

%3718

18

12660

12

Naming map of the Netherlands

[Figure: map of the Netherlands by naming profile. Legend: short, foreign, Frisian, traditional Latin, elite, Arabic-Turkish, pre-modern, traditional Dutch, >foreign]

Education level and EU constitution votes

[Figure: two maps, one of education level and one of the EU constitution votes]

Education level

[Figure: map of educational level, from highest to lowest]

Conclusions

Successful data reduction

Name groups & subcultures
– language, income, education, religion

Geographic representation
– four-digit postal code area just right

The factor time should be included

The Wegener connection

Direct marketing company

Organises a national consumer questionnaire twice a year

200,000 families per year
– Wide range of information
  • Income, education level
– Includes first names and year of birth of all family members

Correlation at family level (instead of postal code level)

Name set &
– Income of parents
– Educational level (of both parents)
– Preferences of parents (newspapers, underwear, cars, insurance, holidays,…..)

Mathematical studies

Life cycle of a name

Zipf’s behavior
– A few names with high frequency, a lot of names that are unique

Information function of a name in communication
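One rough way to look at Zipf-like behaviour is to sort name frequencies by rank and fit a straight line to log(frequency) against log(rank); a slope near -1 is the classic Zipf pattern. This is a generic diagnostic sketched on synthetic data, not the method of the planned study, and the function names are hypothetical.

```python
from collections import Counter
from math import log


def rank_frequency(names):
    """Return name frequencies sorted from most to least frequent."""
    return sorted(Counter(names).values(), reverse=True)


def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [log(f) for f in frequencies]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Synthetic names whose frequencies follow f = 1200 / rank exactly.
toy = [name for rank in range(1, 7) for name in [f"name{rank}"] * (1200 // rank)]
frequencies = rank_frequency(toy)
print(frequencies, round(zipf_slope(frequencies), 2))
```

On real name data the fit would only be approximate, and the long tail of unique names usually needs separate attention.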

Research dimensions in onomastics

– Name
– Form and spelling
– Origin
– Motives
– Time
– Place

YES, we can do great research on this with the full population data!

Contact

Book: Over voornamen, Het Spectrum (2004)

E-mail: [email protected]

Homepage: www.let.uu.nl/~Gerrit.Bloothooft/personal

Mail: Trans 10, 3512 JK Utrecht, The Netherlands