Last Names of Albanians Ahg12015


description

Scientific Article in Annals of Human Genetics about the Last Names of Albanian Population living within the territory of today's Republic of Albania


doi: 10.1111/ahg.12015

Surnames in Albania: A Study of the Population of Albania through Isonymy

Ilia Mikerezi2, Endrit Xhina2, Chiara Scapoli1∗, Guido Barbujani1, Elisabetta Mamolini1, Massimo Sandri1, Alberto Carrieri1, Alvaro Rodriguez-Larralde3 and Italo Barrai1

1 Department of Life Sciences and Biotechnology, University of Ferrara, 44121, Ferrara, Italy
2 Department of Biology, Faculty of Natural Sciences, Tirana, Albania
3 Centro de Medicina Experimental, Laboratorio de Genetica Humana, IVIC, Apdo. 20632, Caracas 1020A, Venezuela

Summary

In order to describe the isonymic structure of Albania, the distribution of 3,068,447 surnames was studied in the 12 prefectures and their administrative subdivisions: the 36 districts and 321 communes. The number of different surnames found was 37,184. Effective surname number for the entire country was 1327, the average for prefectures was 653.3 ± 84.3, for districts 365.9 ± 42.0 and for communes 122.6 ± 8.7. These values display a variation of inbreeding between administrative levels in the Albanian population, which can be attributed to the previously published “Prefecture effect”.

Matrices of isonymic distances between units within administrative levels were tested for correlation with geographic distances. The correlations were highest for prefectures (r = 0.71 ± 0.06 for Euclidean distance) and lowest for communes (r = 0.37 ± 0.011 for Nei’s distance).

The multivariate analyses (Principal Component Analysis and Multidimensional Scaling) of prefectures identify three main clusters, one toward the North, the second in Central Albania, and the third in the South. This pattern is consistent with important subclusters from districts and communes, which suggest that the country may have been colonised by diffusion of groups in the North-South direction, and from Macedonia in the East, over a pre-existing Illyrian population.

Keywords: Albania, population structure, isonymy, inbreeding, isolation by distance

Introduction

Albania has a long and complex history. It was populated by an Aryan people, the Illyrians, around 3000 BC. In historical times, it was conquered by the Macedonians under Philip in 350–300 BC, coming under Greek power. Then, it became a Roman province, first under the Republic and then under the Empire, for about five centuries. After the split of the Empire, it stayed under the rule of the Byzantines until the 15th century, when it became part of the Ottoman Empire. When the Ottoman Empire dissolved in 1912, nationalism arose in Albania, and the country gained independence in 1920; excluding the World War II period, it has been independent ever since.

∗Corresponding author: Chiara Scapoli, Department of Life Sciences and Biotechnology, University of Ferrara, Via L. Borsari 46, I-44121 Ferrara, Italy. Tel: +39-0532-455744; Fax: +39-0532-249761; E-mail: [email protected]

The language spoken in Albania is a separate Indo-European branch spoken by more than 7 million persons, and has influences from Latin, Greek, and in modern times from Southern Slavic. The land is mountainous, and the Albanians call themselves Shqipetari, “children of the eagles”. The present language is derived from the Toskë dialect, which is spoken in the South of the country, as opposed to the Gegë dialect in the North. Due to the relative isolation of the country and to minor settlements of invading armies over the course of centuries, its population seems of considerable interest for the study of population genetics.

However, studies of the genetic structure of the Albanian population are recent and few, and refer mainly to the frequencies of traditional blood group markers (Mikerezi et al., 1995; Susanne et al., 1996) and to the distribution of surnames (Mikerezi et al., 2003). In this work, we continue to investigate the Albanian population with the aim of detecting its structure through the isonymic methods defined by Crow and Mange (1965) at the three administrative levels of the nation, namely 12 prefectures, 36 districts and 321 communes. The data made available to us are the surnames of the electors in the database of the 2009 general elections.

Annals of Human Genetics (2013) 77, 232–243 © 2013 Blackwell Publishing Ltd/University College London

We report here how, in Albania, isonymic distance varies with geography, as we observed in other European countries. We obtained indications of the direction of migration by studying the geographic heterogeneity of surnames. For each level, we studied the surname effective number, α, and the value of random inbreeding, FST.

We recall that surnames are a weak marker for inbreeding and a strong marker for migration. Two “Bianchi” in Italy may be more or less distantly related, as two “White” in Britain, but one “Bianchi” in Britain or one “White” in Italy are indicative of migration, as clearly as an immunofluorescent cell in a negative field. With this proviso, our aim in this work was the study of the present isonymic structure of Albania resulting from surname drift and population movements in an area about 320 km long and on average about 90 km wide, bordering the Adriatic Sea, South of Montenegro and Kosovo, West of Macedonia and North of Greece.

Materials and Methods

Administrative Subdivisions of Albania

In 2011, one of the authors (IM) obtained from the Central Election Commission (CEC) of Albania the data suitable for describing the isonymy structure of the country with the methodologies developed by us. In the data that were made available, a total of 3,068,447 individuals were distributed in the 12 prefectures, the 36 districts, and in 373 communes. The Albanian Administration classifies as “communes” 308 such units which are prevalently agricultural, plus 65 “bashkias” which are predominantly urban. However, several communes are pooled for electoral purposes, so that we had available 321 lower units, some of them groups of smaller units. In this analysis, we decided to use these hierarchical subdivisions as statistical units, since the geography of all three levels is well-defined, and all the individuals in the sample available are classified accordingly, communes inside districts inside prefectures inside Albania. Hence, for the analysis, we had available 37,184 surnames of more than 3 million individuals, all classified according to the administrative subdivisions.

The area studied covers the entire nation, about 28,000 square km, an area slightly larger than Sicily. The 12 prefectures differ in position, area, and population. The prefectures, districts, and communes are indicated in Figure 1. There are six prefectures in the North, the northernmost being Shkoder, Kukes, and Lezhe, then southward the two prefectures of Diber and Durres, followed by the prefecture of the capital Tirane. Traditionally, the River Shkumbin in the central zone across the prefecture of Elbasan separates the North from the South and the two dialects of Albania, the Gegë from the Toskë. The South has six prefectures, namely Elbasan itself, Fier and Berat, and Korce, Vlore, and Gjirokaster. The last three are the southernmost and border with Greece.

Figure 1 Distribution of the 321 communes (dots) in the 12 prefectures and 36 districts as acquired from 2009 census data in Albania.

Differences in surnames due to the complexities of the 36-letter Albanian alphabet were maintained through the proper ASCII codes.


In the following subsections, we briefly recall the definitions of some of the statistics derived from the surname distributions and their meaning in the study of microevolution in human groups (for an exhaustive review, see Relethford, 1988).

Isonymy within and between groups

The main statistics derived from surname distributions are: (1) isonymy within a group J, namely $I_{jj} = \sum_k p_{kj}^2$, where $p_{kj}$ is the relative frequency of surname k in group J, and the sum comprises all surnames; and (2) random isonymy between groups I and J, estimated as $I_{ij} = \sum_k p_{ki} p_{kj}$, where $p_{ki}$ and $p_{kj}$ are the relative frequencies of surname k in groups I and J, respectively, and the sum comprises all surnames.

The distribution of surnames between groups, in this case prefectures, districts, and communes, is useful for assessing their population similarities, under the limit hypothesis of common origin.
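The two definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the two toy groups are hypothetical and are not taken from the study's data or software.

```python
from collections import Counter

def relative_frequencies(surnames):
    """Map each surname k to its relative frequency p_k within one group."""
    counts = Counter(surnames)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

def isonymy_within(p):
    """I_jj = sum over surnames of p_kj squared."""
    return sum(f * f for f in p.values())

def isonymy_between(p_i, p_j):
    """I_ij = sum over surnames of p_ki * p_kj; only shared surnames contribute."""
    shared = p_i.keys() & p_j.keys()
    return sum(p_i[k] * p_j[k] for k in shared)

# Hypothetical toy groups (not data from the paper).
group_i = ["Hoxha", "Hoxha", "Shehu", "Kola"]
group_j = ["Hoxha", "Marku", "Marku", "Kola"]

p_i = relative_frequencies(group_i)
p_j = relative_frequencies(group_j)

I_ii = isonymy_within(p_i)         # 0.5^2 + 0.25^2 + 0.25^2
I_ij = isonymy_between(p_i, p_j)   # 0.5*0.25 (Hoxha) + 0.25*0.25 (Kola)
```

Surnames present in only one of the two groups add nothing to the between-group sum, which is why two units that share no surnames have zero random isonymy.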

Fisher’s alpha (α)

Fisher’s α was estimated according to Barrai et al. (1996). It estimates the number of surnames having equal frequency, which would result in the same isonymy as that observed. It is exactly homologous to the allele effective number in a genetic system (Barrai et al., 2000). A small value of α would indicate large inbreeding and drift, whereas a large value would indicate migration and low inbreeding. It has been verified (Wright, 1951) that in the presence of a rate of migration (m), $F_{ST} = 1/(4Nm + 1)$; then $\alpha = Nm + 1/4$, since $F_{ST} = I/4$ (Crow & Mange, 1965) and $\alpha = 1/I$ for large samples (Rodriguez-Larralde et al., 1993). Then, for large N, α tends to Nm. This makes α a useful predictor of the evolutionary dynamics of a system, and a sufficient indicator of structure.
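The step from Wright's relation to the expression for α uses only the three relations quoted in the paragraph above; written out, the algebra is:

```latex
\begin{align*}
F_{ST} &= \frac{1}{4Nm + 1}, \qquad F_{ST} = \frac{I}{4}, \qquad \alpha = \frac{1}{I} \\
\Rightarrow\quad I &= 4F_{ST} = \frac{4}{4Nm + 1} \\
\Rightarrow\quad \alpha &= \frac{1}{I} = \frac{4Nm + 1}{4} = Nm + \frac{1}{4}
\end{align*}
```

When Nm is large compared with 1/4, the constant term is negligible and α is approximately Nm.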

Isolation by distance

To detect isolation by distance, we calculate the linear correlation of surname distances (Lasker’s, Euclidean and Nei’s) between localities I and J, with their geographic distances.

Lasker’s distance (Rodriguez-Larralde et al., 1998) is defined as

$L = -\log(I_{ij})$.

Euclidean distance (Cavalli-Sforza & Edwards, 1967) is defined as

$E = \sqrt{1 - \sum_k \sqrt{p_{ki}\, p_{kj}}}$

where the summation is over all surnames. Nei’s distance (Nei, 1973) is

$N_d = -\log\left( \dfrac{I_{ij}}{\sqrt{I_{ii}\, I_{jj}}} \right)$.

Euclidean and Nei’s distances have been developed for purely genetic data; however, they can be applied to the frequencies of surnames, since these simulate alleles at a locus in the recombining region of the Y chromosome (the daughters inherit the surname with the paternal X chromosome).
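As a hedged sketch, the three distances defined above can be computed from two surname frequency tables as follows. The function names and the toy frequencies are hypothetical, the natural logarithm is assumed (the text only writes "log"), and returning infinity for pairs that share no surnames is a convention chosen here, not one stated in the paper.

```python
import math

def isonymy(p_i, p_j):
    """Random isonymy I_ij = sum_k p_ki * p_kj (gives I_ii when p_i is p_j)."""
    return sum(p_i[k] * p_j[k] for k in p_i.keys() & p_j.keys())

def lasker_distance(p_i, p_j):
    """L = -log(I_ij); assumed infinite when no surname is shared."""
    i_ij = isonymy(p_i, p_j)
    return math.inf if i_ij == 0 else -math.log(i_ij)

def euclidean_distance(p_i, p_j):
    """E = sqrt(1 - sum_k sqrt(p_ki * p_kj))."""
    overlap = sum(math.sqrt(p_i[k] * p_j[k]) for k in p_i.keys() & p_j.keys())
    # Clamp at zero to absorb floating-point rounding for identical groups.
    return math.sqrt(max(0.0, 1.0 - overlap))

def nei_distance(p_i, p_j):
    """N_d = -log(I_ij / sqrt(I_ii * I_jj)); assumed infinite when I_ij = 0."""
    i_ij = isonymy(p_i, p_j)
    if i_ij == 0:
        return math.inf
    return -math.log(i_ij / math.sqrt(isonymy(p_i, p_i) * isonymy(p_j, p_j)))

# Hypothetical relative frequencies (not data from the paper).
p_i = {"Hoxha": 0.5, "Shehu": 0.25, "Kola": 0.25}
p_j = {"Hoxha": 0.25, "Marku": 0.5, "Kola": 0.25}

d_lasker = lasker_distance(p_i, p_j)
d_euclid = euclidean_distance(p_i, p_j)
d_nei = nei_distance(p_i, p_j)
```

Nei's distance normalises the shared isonymy by the within-group isonymies, so a group compared with itself has distance zero, whereas Lasker's distance of a group with itself is positive and depends on how concentrated its surnames are.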

As geographical coordinates, we used the centroids of prefecture, district and commune areas obtained from the ArcGIS® (ESRI) map downloaded from the Global Administrative Areas site (http://gadm.org/).

The correlations of isonymic distances with the geographic ones give very similar results independently of the isonymic index used, which is a further indication that any of the isonymy measures can be used without loss of generality.

The significance of correlations was assessed with the Mantel test using 1000 permutations (Mantel, 1967; Smouse et al., 1986). For a graphic representation of the surname relationship between different prefectures, these were mapped on the first and second dimension of the Multidimensional Scaling (MDS) of Lasker’s distance matrix. In order to detect the direction of surname diffusion, following Menozzi et al. (1978), the first three components from the Principal Component Analysis (PCA) of the same matrix were also projected individually on the Albania map, with the ArcGIS® (ESRI) software package. To complement and clarify the clustering, we built dendrograms (Ward, 1963; Cavalli-Sforza & Edwards, 1967) of prefectures and of districts. These were obtained from the matrix of Lasker distances between administrative sections, using the agglomeration method of Ward (1963). They were considered only as an aid to the clustering; we do not imply that the present situation was generated by subsequent splits of preexisting clusters.
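The permutation logic of a simple Mantel test can be sketched as below. This is a minimal illustration, not the authors' software: it implements the basic two-matrix test (not the partial extension of Smouse et al.), uses a one-sided p-value, and the function names, seed and toy matrices are all assumptions made for the example.

```python
import math
import random

def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mantel(dist_a, dist_b, permutations=1000, seed=0):
    """Correlate two distance matrices; assess significance by jointly
    permuting the rows and columns of the first matrix."""
    rng = random.Random(seed)
    n = len(dist_a)
    b_flat = upper_triangle(dist_b)
    r_obs = pearson(upper_triangle(dist_a), b_flat)
    hits = 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        permuted = [[dist_a[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        if pearson(upper_triangle(permuted), b_flat) >= r_obs:
            hits += 1
    # One-sided p-value, counting the observed arrangement itself.
    return r_obs, (hits + 1) / (permutations + 1)

# Hypothetical localities along a line (km) and a surname distance
# that grows linearly with geographic distance.
positions = [0, 10, 20, 40, 80]
geo = [[abs(a - b) for b in positions] for a in positions]
iso = [[0.01 * d for d in row] for row in geo]

r, p_value = mantel(iso, geo)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations; shuffling individual cells would destroy the matrix structure the test relies on.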

Random kinship

Random kinship $\Phi_{IJ}(x)$ between any two localities I and J at distance x is given by

$\Phi_{IJ}(x) = K \exp(-Bx)$ (Malecot, 1955; Kimura, 1960)

where K is the average kinship at geographic distance x = 0, say average FST, and B is a function of average mutation rate and of the variance of x. Then, $\Phi_{IJ}(x)$ is always positive and is expected to decrease exponentially to 0 with increasing distance. Random kinship was defined as

$\Phi_{IJ}(x) = I_{IJ}(x)/4$

(Barrai et al., 2012) with average FST as the average kinship at distance x = 0.
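One simple way to estimate K and B from observed kinship values is a least-squares line through the logarithm of kinship against distance. The sketch below assumes this log-linear approach, which is not necessarily the fitting procedure used in the paper; the function name and the synthetic data are hypothetical. Pairs with zero kinship are dropped because their logarithm is undefined.

```python
import math

def fit_exponential_decay(distances_km, kinship):
    """Fit kinship(x) = K * exp(-B * x) by least squares on log(kinship)."""
    pairs = [(x, math.log(phi))
             for x, phi in zip(distances_km, kinship) if phi > 0]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (K, B)

# Synthetic kinship values generated from known parameters
# (illustrative numbers only, not estimates from the paper).
true_K, true_B = 0.002, 0.01
distances = [0, 25, 50, 100, 200]
kinship = [true_K * math.exp(-true_B * x) for x in distances]

K_hat, B_hat = fit_exponential_decay(distances, kinship)
```

On noise-free synthetic data the fit recovers the generating parameters; on real pairwise data the dropped zero-kinship pairs would bias a log-linear fit, so a nonlinear fit on the untransformed values may be preferable.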


Results and Discussion

The Most Frequent Surnames

The distribution, by prefecture and district, of the surname numbers used in the analysis, with the main parameters derived from the isonymy theory, is given in Table 1. The data for communes and bashkias are presented in Table S1, available, like all further supplementary materials mentioned in this paper, at our website: http://web.unife.it/utenti/alberto.carrieri/ricerca.htm.

In Figure S1, we give the distribution of the logarithm of the number of surnames over the logarithm of the number of times they occur (Fox & Lasker, 1983; Zipf, 1935; see the latter reference for the meaning and uses of the log-log distribution). In this case, it is fairly linear (Fig. S1). Glottologists call it a typical rank-size distribution or Zipfian curve (Adamic & Huberman, 2002); here it indicates the number of instances (people) bearing each surname.
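The log-log distribution described above can be tabulated from a list of surnames as sketched here. The sample is a hypothetical toy constructed so that the relation is exactly linear; the function names are assumptions, and the slope is obtained by ordinary least squares on base-10 logarithms.

```python
import math
from collections import Counter

def occurrence_spectrum(surnames):
    """Return {n: number of distinct surnames occurring exactly n times}."""
    per_surname = Counter(surnames)
    return dict(Counter(per_surname.values()))

def loglog_slope(spectrum):
    """Least-squares slope of log10(number of surnames) on log10(occurrences)."""
    xs = [math.log10(n) for n in spectrum]
    ys = [math.log10(count) for count in spectrum.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Toy sample: 8 surnames seen once, 4 seen twice, 2 seen four times,
# 1 seen eight times (hypothetical, chosen to give a straight line).
sample = []
for n_occurrences, n_surnames in [(1, 8), (2, 4), (4, 2), (8, 1)]:
    for s in range(n_surnames):
        sample += [f"surname_{n_occurrences}_{s}"] * n_occurrences

spectrum = occurrence_spectrum(sample)
slope = loglog_slope(spectrum)
```

A roughly constant negative slope over several orders of magnitude is what makes the plotted curve look "fairly linear" in log-log coordinates.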

In Albania, surnames originated and have been established generally in the same way as in other European countries. The Albanian language belongs to the Indo-European group, and, despite several exchanges with other languages, it has preserved its own structure in its formative elements. According to Bidollari (2010), the language does not possess general rules, as other Indo-European languages do, for patronymic formation, like the suffixes -ades, -eides, -poulos in Greek, -ez in Spanish and Portuguese, -escu in Romanian, -ich in Slavic languages and so on. It does not possess suffix elements indicating lineage like -son, frequently preferred in the English and Swedish languages, or -sohn in German, and -sen in Danish. However, many Albanian surnames have been formed by the patronymisation of anthroponyms (first names), ethnonyms and toponyms in all the cases when it was necessary to indicate social or geographic origin.

Albania was for nearly five centuries under Turkish occupation. Therefore, several surnames, like Hoxha, Hoxhaj, Shehu, Shehaj, Dervishi and others, have been introduced through the Muslim religion, indicating in such cases levels of the religious hierarchy. Some other surnames have been strongly influenced by the Turkish language, for example, surnames that have been formed by the introduction of suffixes like -llari, -xhi, -lli and -li.

Here, we deal with 3,068,447 persons and 37,184 surnames, so that the average number of instances (persons) per distinct surname, the so-called “type-token” ratio of glottologists, is 82 (see further down our ratio Sample Size/Surnames in Table 2 and King and Jobling (2009) for other type-token ratios in Europe).

We studied in some detail the 100 most frequent surnames (Table S2). Overall, these surnames comprise 583,708 occurrences, equal to 19.0% of the total number of surnames used here. The most frequent surnames are Hoxha with 39,088 occurrences, Cela with 14,632, Marku with 13,852, Shehu with 12,348, and Muca with 12,236. After these, one finds Kola (11,443), Dervishi (10,953), Gjoka (10,191), Kurti (10,152) and in 10th place Koci (9533). Overall, the first 10 surnames comprise 144,428 individuals, or 4.7% of the total number of electors.
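The totals and percentages quoted above, together with the type-token ratio, follow directly from the counts given in the text, as this short check shows. Only the variable names are introduced here; all numbers are those reported in the paper.

```python
# Counts as reported in the text.
total_electors = 3_068_447
distinct_surnames = 37_184
top100_occurrences = 583_708

top10 = {
    "Hoxha": 39_088, "Cela": 14_632, "Marku": 13_852, "Shehu": 12_348,
    "Muca": 12_236, "Kola": 11_443, "Dervishi": 10_953, "Gjoka": 10_191,
    "Kurti": 10_152, "Koci": 9_533,
}

top10_total = sum(top10.values())
top10_share = 100 * top10_total / total_electors          # percent of electors
top100_share = 100 * top100_occurrences / total_electors  # percent of electors
type_token = total_electors / distinct_surnames           # persons per surname
```

The unrounded type-token ratio is slightly above the value of 82 given in the text.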

Surnames of clear Arabic origin are frequent in the North and the East of Albania. Dervishi (10,953), which is seventh in the general list, is the first name of clear Arabic origin, followed by Elezi (8155), Sinani (6237), Hasani (4541), and Osmani (4103). The Turkish language was the main vehicle for other frequent surnames that were formed from first names of Arabic or Persian origin, like Brahimaj (1684), Brahimi (2225), Elezaj (1970), Islami (1751), among several others.

Greek surnames, a result of the influence of the Christian Orthodox religion, are frequent in the South of the country, which borders with Greece. Short lists of the 30 most frequent Albanian surnames of Arabic and Greek origin are given in Tables S3 and S4. However, these lists are far from complete, since they are based on our knowledge of Arabic and Greek, which is very limited. In particular, for the Greek names, we list only those which start with “Papa” (which means “priest”, “father”) to avoid uncertainties. There are 9961 surnames beginning with “Papa”, which are joined with another name of Christian (or sometimes non-Christian) origin, like Papajani, Papajorgji, and Papanikolla. Note the curious Papazisi, which might be a translocation of the Arabic Aziz (which means “strong”) on the Greek “Papa”. So Papazisi might be “the father of the strong”.

Isonymy Parameters in Albanian Prefectures, Districts, and Communes

Fisher’s alpha and inbreeding by isonymy

Values of α and FST are given in Table 1 for prefectures and districts and in Table S1 for communes. We recall that α, the effective surname number, is the inverse of isonymy I ($I = \sum_k p_k^2$ and $\alpha = 1/I$, Barrai et al., 1996), so that $F_{ST} = 1/(4\alpha)$; the meaning of α is then exactly homologous to the effective allele number of genetic systems.

The effective surname number α, in Albania, was estimated at 1327 for the country, considered as a unit. The average for the 12 prefectures was 653.3 ± 84.3. For the 36 districts, it was 365.9 ± 42.0 and for the 321 communes it was 122.6 ± 8.7. The difference between the estimates of α, and hence of FST, in prefectures, districts, communes and for the country as a unit, is observed when different subdivisions of the same


Table 1 Prefecture, district, number of surnames N, number of different surnames S, Fisher’s α, Karlin-McGregor ν, isonymy I, and FST in Albania. Districts grouped by prefecture (district rows are indented under their prefecture).

Prefecture      N         S       α     ν        I        FST
  District
Berat           169,377   5276    496   0.00293  0.00201  0.000505
  Berat         112,084   4042    420   0.00374  0.00238  0.000597
  Kucove        35,894    2123    277   0.00767  0.0036   0.000907
  Skrapar       21,399    1314    273   0.01258  0.00366  0.000926
Diber           120,994   2482    377   0.00312  0.00265  0.000664
  Diber         50,866    1216    298   0.00582  0.00335  0.000844
  Mat           42,669    1296    247   0.00575  0.00404  0.001017
  Bulqize       27,459    915     191   0.00691  0.00521  0.001312
Durres          289,512   9698    775   0.00268  0.00129  0.000323
  Durres        236,662   9149    757   0.0032   0.00132  0.000331
  Kruje         52,850    1861    337   0.00633  0.00297  0.000746
Elbasan         299,600   6555    457   0.00153  0.00219  0.000548
  Elbasan       197,185   5568    442   0.00224  0.00226  0.000566
  Gramsh        25,062    839     168   0.00663  0.00595  0.001497
  Peqin         26,185    1103    103   0.00396  0.00953  0.002392
  Librazhd      51,168    1309    186   0.00362  0.00536  0.001346
Fier            352,352   7479    623   0.00177  0.00161  0.000402
  Fier          193,704   5379    510   0.00264  0.00196  0.000491
  Lushnje       128,406   3691    345   0.00268  0.0029   0.000726
  Mallakaster   30,242    998     147   0.00482  0.0068   0.001709
Gjirokaster     121,628   4544    910   0.00744  0.0011   0.000277
  Gjirokaster   66,969    3150    767   0.01133  0.0013   0.00033
  Tepelene      28,946    1621    273   0.00934  0.00365  0.000922
  Permet        25,713    1539    460   0.01756  0.00217  0.000553
Korce           264,449   7860    1110  0.00419  0.0009   0.000226
  Korce         152,114   6250    1108  0.00724  0.0009   0.000227
  Kolonje       15,813    1232    453   0.02783  0.00221  0.000567
  Pogradec      64,452    2497    378   0.00583  0.00264  0.000664
  Devoll        32,070    1462    211   0.00653  0.00473  0.001191
Kukes           72,875    1844    351   0.0048   0.00284  0.000714
  Kukes         39,510    1113    190   0.00479  0.00524  0.001317
  Has           13,247    270     84    0.00629  0.0118   0.00297
  Tropoje       20,118    886     198   0.00973  0.00504  0.001272
Lezhe           148,395   4080    173   0.00117  0.00576  0.001442
  Lezhe         72,257    2617    133   0.00184  0.0075   0.001879
  Mirdite       26,750    778     67    0.00249  0.0148   0.003708
  Kurbin        49,389    2192    298   0.006    0.00335  0.000842
Shkoder         239,312   7350    658   0.00275  0.00152  0.000381
  Shkoder       179,065   6642    637   0.00355  0.00157  0.000394
  Puke          21,712    892     123   0.00562  0.0081   0.002036
  Malesi madhe  38,535    1235    260   0.00671  0.00384  0.000965
Tirane          712,068   19,057  997   0.00141  0.001    0.000251
  Tirane        631,027   18,415  1048  0.00167  0.00095  0.000239
  Kavaje        81,041    2743    282   0.00347  0.00354  0.000889
Vlore           277,885   7335    913   0.00328  0.00109  0.000275
  Sarande       74,963    3534    470   0.00623  0.00213  0.000535
  Delvine       23,788    1504    339   0.01404  0.00295  0.000747
  Vlore         179,134   5327    694   0.00386  0.00144  0.000362


Table 2 Comparison of isonymy parameters in nine European countries, in five South-American countries, in the United States and Texas, and in Yakutia. Overall, 122 million surnames were analysed.

Country           Sample size     Surnames   α          Isolation     Type-token
                  (SS, millions)  (S)        (average)  by distance   (SS/S)

Europe
Austria           1               140,766    854        0.59          7.1
Albania           3.0             37,184     123        0.71          82
Belgium           1.1             137,442    997        0.74          8
France            6               495,104    1615       0.69          12.1
Germany           5.2             462,526    1596       0.51          11.2
Holland           2.4             126,485    787        0.46          19
Italy             5.1             215,623    1236       0.61          23.7
Switzerland[1]    1.7             166,116    891        0.72          10.2
Spain             3.6
  Paternal                        94,886     134        0.21          38
  Maternal                        110,034    144        0.26          33

Asia
Yakutia           0.5             44,625     107        0.69          11.1

North America
United States     18              899,585    1366       0.24          20
  Texas           3.6             235,740    734        0.42          15.3

South America
Argentina[3]      22.6            414,441    422        0.47          54.5
Venezuela[2]      3.9             68,665     122        0.78          56.8
Bolivia[4]        23.2            174,922    122        0.5           144.6
Paraguay[3]       4.8             39,047     108        0.42          122.9

[1] Cantons. [2] States. [3] Districts. [4] Provinces.

area and population are considered. Very properly in the case of Albania, the difference constitutes the “Prefecture Effect”, identified for FST by Nei and Imaizumi (1966) in Japan, and so named by Scapoli et al. (2007). Nei and Imaizumi observed that, for the same area and population, small subdivisions have larger FST, and larger subdivisions have smaller FST. In their study, the effect was seen in towns and in the Japanese prefectures where the towns were located; hence the name. It could also be named a “geographic scale effect” that intervenes in many phenomena, since it is just a question of heterogeneity increasing with population size. Of course, the prefecture effect is visible both on FST and α. It appears that Albania is no exception, and, since α is inversely related with FST, the sequence

$F_{ST}(\text{Prefecture}) < F_{ST}(\text{District}) < F_{ST}(\text{Commune})$

is respected.

In Albania, the lowest levels of random inbreeding, indicated by FST, are expected and observed in the highly populated areas of the central part of the country, the area around the capital Tirana.

In the analysis, α is significantly and negatively correlated (r = –0.16) with latitude, possibly due to the average higher population density of southern communes. So, the largest values of α (the inverse of isonymy) were seen in the large towns, which are also capitals of prefectures. Highest α’s for communes were 1245 in the commune of Korce, 1222 in Tirana, 990 in Durres, 748 in Vlore, and 720 in Shkoder. These large communes give the name to the prefectures where they are located. The lowest values observed in communes were α = 7 in Sheze, in the prefecture of Elbasan, α = 10 in Hysgjokaj and α = 11 in Ballagat, both communes in the prefecture of Fier, and α = 12 in Shtiqen and α = 13 in Surroj, both in Kukes. These communes are located in mountainous areas and have a small population.
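The ordering of FST across administrative levels can be illustrated from the average α values reported in the text, using the relation FST = 1/(4α). Applying the relation to an average α is an approximation made for this sketch only (the mean of FST over units is not the same as FST computed from the mean α), and the variable names are hypothetical.

```python
# Average effective surname numbers (alpha) as reported in the text.
average_alpha = {
    "country": 1327.0,
    "prefectures": 653.3,
    "districts": 365.9,
    "communes": 122.6,
}

# Rough F_ST per level from F_ST = 1 / (4 * alpha); an approximation,
# since it is applied to averages rather than to each unit.
approx_fst = {level: 1.0 / (4.0 * alpha) for level, alpha in average_alpha.items()}

for level, value in approx_fst.items():
    print(f"{level:12s} alpha = {average_alpha[level]:7.1f}  F_ST ~ {value:.6f}")
```

The smaller the administrative unit, the smaller the average α and the larger the approximate FST, which is the pattern the prefecture effect describes.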

Isolation by distance

We studied isolation by distance through the correlation of geographic with surname distances at the prefecture, district and commune levels. We found that Euclidean, Nei’s and Lasker’s distances between the 12 prefectures were considerably correlated with linear geographic distance, with r = 0.709 ± 0.062, r = 0.560 ± 0.079 and r = 0.621 ± 0.082, respectively. The same tendency was observed between the 36 districts, although the correlations in this case were smaller, r = 0.581 ± 0.029, r = 0.543 ± 0.033 and r = 0.584 ± 0.030, respectively. Similarly, between communes, we observed r = 0.47 ± 0.008, r = 0.37 ± 0.011 and r = 0.44 ± 0.011 for Euclidean, Nei’s and Lasker’s distances, respectively. As an example, the variation of Lasker’s distance between prefectures is given in Figure 2 (see Fig. S2 for the distribution of Lasker’s distances between districts and Fig. S3 for that of Lasker’s distance between communes). Given the high correlation between the three measures of distance (for prefectures, r[Nei–Euclidean] = 0.85 ± 0.03; r[Nei–Lasker] = 0.74 ± 0.06 and r[Euclidean–Lasker] = 0.65 ± 0.08), for this analysis, we used mainly Lasker’s distance.

Figure 2 Variation of Lasker’s distance between prefectures with geographic linear distance.

The signal extracted from the scatter diagram of Lasker’s distance over kilometres for communes is given in Figure 3. Linearity seems dominant; in Albania, a clear tendency toward an asymptote is not observed, as it was in Spain, Bolivia and Chile (Rodriguez-Larralde et al., 2003, 2011; Barrai et al., 2012), where the relation between isonymic and geographic distance flattens at large distances. In Albania, there is a sharp increase of Lasker’s distance up to 120 km, which gives indication of isolation and drift below that distance. After that, the increase in isonymic distance becomes minor, possibly indicating the effect of internal migration. The signal for Euclidean and Nei’s distance is given in Figures S4 and S5, respectively. Note the rapid rise of Euclidean distance toward the asymptote, due to the sensitivity of this distance to the change of surnames and of their frequency with increasing geographic distance.

Figure 3 Variation of Lasker’s distance (±s.d.) over kilometres between 321 communes in Albania.

Figure 4 Exponential decay of random kinship (±1/2 s.d. to avoid intersection of the lower one with the abscissa) in Albania over geographic distance. Pairwise distances between communes.

Kinship

We plotted kinship between communes, as previously defined, as a function of geographic distance (Fig. 4). Note that at the commune level several pairs of communes (33 per thousand) did not share surnames.

The decrease of kinship with distance is significantly exponential, as predicted by Malecot (1955) (see also Kimura, 1960). Specifically, the exponential decay should be characteristic of structures more linear than Albania, for example, as observed by us in Chile. However, there is considerable and significant agreement between Malecot theory and kinship decay in Albania. Then, the Malecot model is very strong and, possibly due to the large number of pairwise distances we had available, it is also applicable to a geographic structure which, like Albania, is elongated from North to South but is poorly linear. We were not surprised when we observed the considerable agreement between the Malecot model and kinship decay in Chile, since this latter country is practically linear. Still, the agreement between the model and the observed decay of kinship over kilometric distance in Albania, which is elongated but far from linear, is indicative of a general validity of the model, although originally it was derived only for a linear structure.

Relations between the Administrative Sections of Albania

In order to obtain a general idea of the movements of population groups in Albania, we conducted MDSs and PCAs on the matrix of Lasker’s distances between prefectures, between districts and between communes. We report here and as supplementary material some of the results of these analyses.

PrefecturesThe MDS projection on the first two dimensions of the ma-trix between prefectures (Fig. S6) differentiates a few clusters,which correspond to groups of neighbouring prefectures. Inthe resulting dendrogram (Fig. S7), a first large cluster com-posed mainly of the central prefectures is observed: Tirane,Durres, Elbasan, Diber, Fier and Berat. These last two forma subcluster within this cluster. Then, three prefectures inthe South-East and the extreme South, namely Korce, Vloreand Gjirokaster, form the next cluster. Finally, two prefecturesof the North cluster together, Shkoder and Lezhe, whereasKukes represents an exception because, despite being a moun-tainous prefecture of the North, clusters together with theCentral prefectures, possibly due to the emigration from thepoorer areas toward the highly populated and richer areasaround the capital Tirana.

From the MDS projection in Figure S6, some other minor but relevant points emerge, which complement the clustering of prefectures. In particular, Tirane, Durres and Elbasan stand alone at the centre of the bidimensional projection, removed from the other prefectures. Vlore is marginal, as is Korce.

Districts

The projection on the first two dimensions of the MDS tends to differentiate several clusters, which correspond fairly well to neighbouring districts (Fig. S8).

In the dendrogram (Fig. S9), the districts of Malesi e Madhe, Tropoje and Has, at the Northern border with Montenegro, cluster with Fier, Mallakaster and Vlore, which are in the South of the country. One district in the South, Tepelene, clusters with a Central-Northern belt of the seven districts of Durres, Kruje, Tirane, Mat, Bulqize, Diber and Kukes.

A second central main cluster, south of the former, includes in an East-West belt the districts of Kavaje, Lushnje, Peqin, Elbasan, Gramsh and Librazhd.

Then comes a Southern group of districts: Kucove, Berat, Skrapar, Korce, Pogradec and Devoll. All these are also geographically adjacent. However, we underline that the clustering of Malesi e Madhe, Tropoje and Has in the North with the Vlore cluster in the South might indicate the insertion, between the North and South of Albania, of eastern groups moving from Macedonia toward the Adriatic (Fig. S9).

From the projection, some other minor but relevant points emerge, which complement the clustering. In particular, the Tirane district stands at the centre of the bidimensional projection, with Durres. This might indicate that these districts, which together comprise almost one quarter of the Albanian population, possess most of the surnames of the nation.

Malesi-e-Madhe in Shkoder and Mallakaster in Fier are marginal both on the projection and in Albanian geography, bordering, respectively, Montenegro and Kosovo in the North and the limit of the Toske dialect in the South.

A visual indication of the isonymic proximity of districts is given by the maps of Figure 5, where the similarity of districts is indicated by similar intensity of the same colour. It is appropriate at this point to note that new methods of identifying spatial concentrations of surnames have recently been developed (e.g. Longley et al., 2011; Chesire & Longley, 2012), which give specific examples of various ways of clustering and representing the geographical dimensions of surname frequency data. Most interesting seem to be the developments which include forenames to detect the ethnicity of groups (Mateos et al., 2011). This adds a further dimension to isonymy studies, which needs to be explored.

Communes

We found that, only at the commune level, there were 157 pairs of communes out of 51,360 which did not share surnames. Out of these 157 pairs, 49 included the commune of Liqenas in Korce, which has a mainly Macedonian population. Also, 34 pairs included the commune of Lure in Diber, but we did not find a good reason for this last preference. Of course, there are various reasons why this absence of shared surnames between small communes may occur in Albania. We believe that, among others, one reason is to be found in the complexity of the Albanian alphabet, which often results in the same name being written differently in different communes. However, there is also some effect of distance on the phenomenon. The average geographic distance between the 157 pairs having infinite Lasker’s and Nei’s distance is


Figure 5 Projection of Lasker’s matrix of surname distances on districts in Albania by mapping (A) the first three PCA factors (I: Factor 1 = 42.8%; II: Factor 2 = 26.9%; III: Factor 3 = 11.5%) and (B) the first three MDS dimensions (I: Dimension 1; II: Dimension 2; III: Dimension 3; stress 11.2%).

128.9 ± 14.7 km. The average distance for the other 51,203 pairs is 95.9 ± 0.06 km, and the difference is significant (t[∞] = 8.568, P ≪ 0.0001). We bypassed the problem that elements of infinite value pose in the multivariate analysis of the distance matrices by replacing the 157 infinite isonymic distances with the nearest maximum observed. In this way, we met no complexities in the subsequent analysis of the distance matrices of Lasker and Nei. It is important to note that if the 157 infinite distances are excluded, the correlations for communes rise from 0.44 to 0.47 for Lasker, and from 0.37 to 0.39 for Nei (refer again to Fig. S3 for Lasker’s distance between communes).
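The handling of infinite distances can be sketched as follows. The sketch assumes the usual definition of Lasker’s distance as the negative logarithm of Lasker’s coefficient of relationship, R_ij = Σ_k p_ik p_jk / 2, so that two units sharing no surname have R_ij = 0 and an infinite distance. It also assumes that "the nearest maximum observed" means the largest finite distance in the matrix. The function names and the toy surname counts are hypothetical.

```python
import math
from collections import Counter

def lasker_distance(counts_i, counts_j):
    """Assumed definition: -ln(R_ij), with R_ij = sum_k p_ik * p_jk / 2.
    Returns infinity when the two units share no surname."""
    n_i = sum(counts_i.values())
    n_j = sum(counts_j.values())
    shared = counts_i.keys() & counts_j.keys()
    r_ij = sum((counts_i[s] / n_i) * (counts_j[s] / n_j) for s in shared) / 2.0
    return math.inf if r_ij == 0.0 else -math.log(r_ij)

def cap_infinite(distances):
    """Replace infinite entries with the largest finite distance observed."""
    finite_max = max(d for d in distances if math.isfinite(d))
    return [d if math.isfinite(d) else finite_max for d in distances]

# Invented toy counts for three units; "s1".."s4" are placeholder surnames.
unit_a = Counter({"s1": 3, "s2": 1})
unit_b = Counter({"s1": 1, "s3": 1})
unit_c = Counter({"s4": 2})  # shares no surname with the other two units

raw = [lasker_distance(unit_a, unit_b),
       lasker_distance(unit_a, unit_c),
       lasker_distance(unit_b, unit_c)]
capped = cap_infinite(raw)
```

After capping, every entry is finite, so the matrix can be passed to PCA, MDS or clustering routines that cannot handle infinite values.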

As noted briefly above, we estimated in the commune of Korce, the capital of the homonymous prefecture, the highest value of α in the country (1245). Relatively high α and low estimates of inbreeding are also observed in several communes of the central area in the Tirana region. This might explain the position of this prefecture relative to the other groups and might indicate recent immigration toward the main urban area of Albania. Low α (and high FST) are observed in the


communes of Elbasan (one commune with α = 7), Fier (one commune with α = 10 and a second one with α = 11), and Kukes (one commune with α = 12 and one with α = 13).
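To make the link between α and inbreeding concrete, the sketch below uses the conventions common in the isonymy literature, which are assumed here because the defining formulas are not restated in this section: random isonymy within a unit I = Σ_k p_k² − 1/N, effective surname number α = 1/I, and random inbreeding FST = I/4. The function name and the two toy surname distributions are hypothetical.

```python
def isonymy_parameters(counts):
    """Return (random isonymy, alpha, F_ST) for one unit, given the number
    of persons carrying each surname. Assumed conventions:
    I = sum(p_k^2) - 1/N, alpha = 1/I, F_ST = I/4."""
    n = sum(counts)
    random_isonymy = sum((c / n) ** 2 for c in counts) - 1.0 / n
    alpha = 1.0 / random_isonymy
    f_st = random_isonymy / 4.0
    return random_isonymy, alpha, f_st

# Two invented units of 100 persons and 10 surnames each.
even = [10] * 10          # all surnames equally frequent
skewed = [91] + [1] * 9   # one surname dominates

_, alpha_even, fst_even = isonymy_parameters(even)
_, alpha_skewed, fst_skewed = isonymy_parameters(skewed)
```

Under these assumptions, the unit dominated by one surname has the lower α and the higher FST, which is the pattern described for the small communes listed above.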

We do not present here either the projection of the first two dimensions of MDS for the 321 × 321 matrix of the communes or the dendrogram derived from it. Both are, however, given as Figures S10 and S11. Since the projections of the individual names of 321 elements are illegible, we decided to label with different symbols the communes North and South of the River Shkumbin in central Elbasan (152 and 169, respectively) to detect whether the two main groups of points depicted in the projection (Fig. S10) contain a majority of communes where Gege or Toske is spoken. In fact, the subdivision is sharp: a vast majority of the Gege-speaking communes cluster together, as do those speaking Toske. Thus, the two main groups identified through surname distances are highly correlated with the two linguistic areas of Albania, the Gege area in the North and the Toske area in the South.

We used the same technique to visualise the clusters in the dendrogram, putting the labels G and T at the endpoints of the graph (Fig. S11). The resulting clusters correlate with latitude, but the North-South distribution of communes is not as clear as in the projection from the MDS.

Mapping of the first three components of Lasker’s matrix

The structures revealed by the MDSs and the dendrograms are only partially indicative of the possible movements of the population. Therefore, to have a general idea of the direction, if any, of settlements in Albania, we mapped on the nation (following Menozzi et al., 1978) the first three components of the matrix of Lasker’s distance, obtained from a PCA and from the MDS. We provide the PCA components because the relative importance of each component is given by the corresponding eigenvalue, while the MDS provides the value of the stress for a judgement of the overall fit on the three dimensions. The resulting maps are given in Figure 5 (A for PCA and B for MDS, respectively).

The variation of the first component, which accounts for almost half of the variability (42.8%) in the North-South direction, indicates movement from the centre of the country toward North and South. This might mean that, from a chronological point of view, immigration was in the East-West direction from Macedonia, establishing a centre of high density of migrants, which subsequently moved North and South. The third component (11.5%) gives the same indication, although with minor intensity. Then, the sense of movement may be hypothesised from the East toward the Adriatic Coast, since the entry of surnames from the sea in significant numbers is unlikely.

It appears that only the second component (26.9%) is somewhat directional; the deviations from the second axis appear to be ordered in the North-South direction. However, although this component indicates movement in the North-South direction, the sense of movement cannot be detected from it, unless we accept that the highest deviations are the most recent ones.

Overall, the three components account for 81.2% of the surname variation as obtained from Lasker’s distance matrix.
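The relation between eigenvalues and the percentages quoted for each component can be illustrated with a short sketch. The three reported shares (42.8%, 26.9% and 11.5%) are taken from the text; the helper `explained_share` and the toy eigenvalues are hypothetical and serve only to show how such shares are obtained.

```python
import numpy as np

def explained_share(eigenvalues):
    """Percentage of total variation carried by each component,
    taken as its eigenvalue divided by the sum of all eigenvalues."""
    vals = np.asarray(eigenvalues, dtype=float)
    return 100.0 * vals / vals.sum()

# Shares reported in the text for the first three PCA factors.
reported = np.array([42.8, 26.9, 11.5])
total_first_three = reported.sum()  # about 81.2, as stated above

# Hypothetical eigenvalues, for illustration only.
toy_share = explained_share([4.0, 3.0, 2.0, 1.0])
```

The remaining variation, a little under one fifth of the total, is spread over the components that are not mapped.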

The mappings of the first three dimensions of the MDS seem to us compatible with those obtained from the PCA. The indication of possible East-West movement seems clear enough for the first and second dimensions, and less so for the third. So, the isonymic structure of Albania seems to be mainly due to ancient migration from the East toward the coast, with radiation toward the North and South, followed by isolation, with drift and short-range migration playing a major role in the generation of the present geographical variation of surnames.

Conclusions

The methodology described in this paper was used to analyse the isonymic structure of several South American countries (Rodriguez-Larralde et al., 2000, 2011; Dipierri et al., 2005, 2011; Barrai et al., 2012). In these countries, 4 (Venezuela), 24 (Argentina), 23 (Bolivia), 4.5 (Paraguay) and 16.5 (Chile) million surnames from the registers of electors were used. In European countries and in the United States, we analysed surnames of telephone users (Barrai et al., 2001; Scapoli et al., 2005, 2007; Rodriguez-Larralde et al., 2007). In thinly populated Siberia, we used half a million surnames (Tarskaya et al., 2009). The average value of α for all the cities (or states, in the case of Venezuela and the United States, or districts, in the case of Argentina and Paraguay), and the isolation by distance measured by the correlation between isonymic and geographic distances, are given in Table 2 for the countries studied up to now. Several features emerge from the comparisons reported in Table 2. First, the general similarity among European nations in profusion of surnames, as measured by α, and in isolation by distance, as measured by the linear correlation. Secondly, the relatively small value of α in Venezuela, Bolivia, Paraguay, Spain, Chile and now Albania; and thirdly, the practical absence of isolation by distance in the United States, excluding bilingual Texas (Rodriguez-Larralde et al., 2007). In Albania, the average number of persons having the same surname (measured by the ratio Sample Size/Surnames, given as the index SS/S in Table 2) is more similar (82) to that of Argentina, Bolivia and Venezuela than to that of other European countries. It may be of some interest to compare our Table 2 with King and Jobling’s (2009) table 1. There, they give the mean number of carriers per surname in 5538 households in 27 countries. Where applicable, their results are consistent with ours.


Albania is the only European country in which we had near-census data (persons below 18 years of age were not included, our data being those of electors), as we had in South America. The ratio in countries where we had only the surnames of telephone users is about 25% of the ratio observed in countries where we had census data. We would like to label this as a “census effect”, but at present it is more prudent to attribute the phenomenon to a “bias of the telephone directory”. However, according to Lasker (1985), this should not be a major problem in countries with high telephone penetration rates, since telephone lines are a good sample measure of households in the country. In this context, 25% of the total population simply reflects four people per telephone line, which may well approach the average household size. In any case, we will wait to explore the effect further until more data from national censuses become available, because for the time being, barring Yakutia and Albania, the effect is confounded with the small number of different single surnames in the Spanish language.

In Albania, Gege is spoken in the northern prefectures and Toske in the southern ones. In Vlore and in Gjirokaster both Greek and Toske are spoken. It is interesting to note that in the map projection of the MDS analysis of communes, a vast majority of the Gege-speaking communes cluster together, as do those speaking Toske. Thus, the two main swarms identified through surname distances are highly correlated with the two linguistic areas of Albania: the Gege area in the North and the Toske area in the South.

In this analysis, all inbreeding estimates were lower (and α higher) in the highly populated central area, in the Tirana region. At present, most internal migration seems to take place toward the capital and the other main towns. Consequently, for the time being, we may conclude that the population structure of this country is the result of the joint action of directional and short-range migration and drift, with directional migration dominating over drift at short distances, as suggested by the rapid rise of Lasker’s distance with geographic distance below 120 km and by its flattening above that distance.

Acknowledgements

The authors are grateful to the CEC of Albania, which provided the data. The authors are also particularly grateful to both Referees, who gave valuable advice. The work was supported by grants of the University of Ferrara to Chiara Scapoli.

References

Adamic, L. A. & Huberman, B. A. (2002) Zipf law and the Internet. Glottometrics 3, 143–150.

Barrai, I., Scapoli, C., Beretta, M., Nesti, C., Mamolini, E. & Rodriguez-Larralde, A. (1996) Isonymy and the genetic structure of Switzerland. I: The distributions of surnames. Ann Hum Biol 23, 431–455.

Barrai, I., Rodriguez-Larralde, A., Mamolini, E. & Scapoli, C. (2000) Elements of the surname structure of Austria. Ann Hum Biol 26, 1–15.

Barrai, I., Rodriguez-Larralde, A., Mamolini, E., Manni, F. & Scapoli, C. (2001) Elements of the surname structure of the USA. Am J Phys Anthropol 114, 109–123.

Barrai, I., Rodriguez-Larralde, A., Dipierri, J., Alfaro, E., Acevedo, N., Mamolini, E., Sandri, M., Carrieri, A. & Scapoli, C. (2012) Surnames in Chile. A study of the population of Chile through isonymy. Am J Phys Anthropol 147, 380–388. doi: 10.1002/ajpa.22000.

Bidollari, C. (2010) Onomastic investigations. In Albanian. Tirane: Botimet Kumi Editor.

Cavalli-Sforza, L. L. & Edwards, A. W. F. (1967) Phylogenetic analysis: models and estimation procedures. Am J Hum Genet 19, 233–257.

Chesire, J. A. & Longley, P. A. (2012) Identifying spatial concentrations of surnames. Int J Geogr Inform Sci 26, 309–325.

Crow, J. F. & Mange, A. (1965) Measurements of inbreeding from the frequency of marriages between persons of the same surname. Eugen Q 12, 199–203.

Dipierri, J. E., Alfaro, E., Scapoli, C., Mamolini, E., Rodriguez-Larralde, A. & Barrai, I. (2005) Surnames in Argentina. A population study through isonymy. Am J Phys Anthropol 128, 199–209.

Dipierri, J. E., Rodriguez-Larralde, A., Alfaro, E. L., Scapoli, C., Mamolini, E., Salvatorelli, G., De Lorenzi, S., Sandri, M., Carrieri, A. & Barrai, I. (2011) Surnames in Paraguay: A study of the population of Paraguay through isonymy. Ann Hum Genet 75, 678–687. doi: 10.1111/j.1469-1809.2011.00676.x.

Fox, W. R. & Lasker, G. W. (1983) The distribution of surname frequencies. Int Stat Rev 51, 81–87.

Kimura, M. (1960) Outline of population genetics (in Japanese). Tokyo: Baifukan.

King, T. E. & Jobling, M. A. (2009) What’s in a name? Y chromosomes, surnames and the genetic genealogy revolution. Trends Genet 25(8), 351–360.

Lasker, G. W. (1985) Surnames and genetic structure. Cambridge: Cambridge University Press.

Longley, P. A., Chesire, J. A. & Mateos, P. (2011) Creating a regional geography of Britain through the spatial analysis of surnames. Geoforum 42, 506–516.

Malecot, G. (1955) Decrease of relationship with distance. Cold Spring Harbour Symp 20, 52–53.

Mantel, N. (1967) The detection of disease clustering and a generalized regression approach. Cancer Res 27, 209–220.

Mateos, P., Longley, P. A. & O’Sullivan, D. (2011) Ethnicity and population structure in personal naming networks. PloS ONE 6, e22943. doi: 10.1371/journal.pone.0022943.

Menozzi, P., Piazza, A. & Cavalli-Sforza, L. L. (1978) Synthetic maps of human gene frequencies in Europeans. Science 201, 786–792.

Mikerezi, I., Susanne, C., Bajrami, Z. & Kume, K. (1995) Differentiation of Albanian human populations and their relationships with Balkanic ethnic groups according to gene frequencies at ABO, MN and Rhesus loci. IUAES International Congress, April 20–21, 1995, Torino, Italia, p. 32.

Mikerezi, I., Pizzetti, P., Lucchetti, E. & Ekonomi, M. (2003) Isonymy and the genetic structure of Albanian population. Coll Antropol 27, 507–514.


Nei, M. (1973) The theory and estimation of genetic distance. In: Genetic structure of populations (ed. N. E. Morton). Hawaii: Hawaii University Press.

Nei, M. & Imaizumi, J. (1966) Genetic structure of human populations. I. Local differentiation of blood groups gene frequencies in Japan. Heredity 21, 9–36.

Relethford, J. H. (1988) Estimation of kinship and genetic distance from surnames. Hum Biol 60, 475–492.

Rodriguez-Larralde, A., Barrai, I. & Alfonzo, J. C. (1993) Isonymy structure of four Venezuelan states. Ann Hum Biol 20, 131–145.

Rodriguez-Larralde, A., Scapoli, C., Beretta, M., Nesti, C., Mamolini, E. & Barrai, I. (1998) Isonymy and the genetic structure of Switzerland. II. Isolation by distance. Ann Hum Biol 25, 533–540.

Rodriguez-Larralde, A., Morales, J. & Barrai, I. (2000) Surname frequency and the isonymy structure of Venezuela. Am J Hum Biol 12, 352–362.

Rodriguez-Larralde, A., Gonzalez-Martin, J., Scapoli, C. & Barrai, I. (2003) The names of Spain: A study of the isonymy structure of Spain. Am J Phys Anthropol 121, 280–292.

Rodriguez-Larralde, A., Scapoli, C., Mamolini, E. & Barrai, I. (2007) Surnames in Texas: A population study through isonymy. Hum Biol 79, 215–239.

Rodriguez-Larralde, A., Dipierri, J., Alfaro, E., Scapoli, C., Mamolini, E., Salvatorelli, G., De Lorenzi, S., Carrieri, A. & Barrai, I. (2011) Surnames in Bolivia: A population study through isonymy. Am J Phys Anthropol 144, 177–184. doi: 10.1002/ajpa.21379.

Scapoli, C., Goebl, H., Sobota, S., Mamolini, E., Rodriguez-Larralde, A. & Barrai, I. (2005) Surnames and dialects in France: Population structure and cultural transmission. J Theor Biology 237, 75–86.

Scapoli, C., Mamolini, E., Carrieri, A., Rodriguez-Larralde, A. & Barrai, I. (2007) Surnames in Western Europe: A comparison of the subcontinental populations through isonymy. Theor Popul Biol 71, 37–48.

Smouse, P. E., Long, J. C. & Sokal, R. R. (1986) Multiple regression and correlation extensions of the Mantel test of matrix correspondence. Syst Zool 35, 627–632.

Susanne, C., Bajrami, Z., Kume, K. & Mikerezi, I. (1996) Gene differentiation at the ABO, MN and Rhesus loci among Albanians and their relation with other Balkan populations. Gene Geogr 10, 31–36.

Tarskaya, L., El’chinova, G. I., Scapoli, C., Mamolini, E., Carrieri, A., Rodriguez-Larralde, A. & Barrai, I. (2009) Surnames in Siberia. A study of the population of Yakutia through isonymy. Am J Phys Anthropol 138, 190–198.

Ward, J. H. (1963) Hierarchical grouping to optimize an objective function. J Am Statist Assoc 58, 236–244.

Wright, S. (1951) The genetic structure of populations. Ann Eugen 15, 324–354.

Zipf, G. K. (1935) The psychobiology of language. Boston, MA: Houghton-Mifflin.

Supporting Information

Additional supporting information may be found in the online version of this article:

Table S1 Distribution of isonymy parameters.

Table S2 The 100 most frequent surnames in Albania.

Table S3 The most frequent names of Arabic origin in Albania.

Table S4 Surnames with the prefix “Papa” of clear Greek origin.

Figure S1 Variation of the number of occurrences in 3 million surnames in Albania.

Figure S2 Variation of Lasker’s distance between 36 districts in Albania.

Figure S3 Variation of Lasker’s distance between 321 communes in Albania.

Figure S4 Variation of Euclidean with geographic distance.

Figure S5 Variation of Nei’s with geographic distance.

Figure S6 MDS on the matrix of Lasker’s distances between Prefectures.

Figure S7 Dendrogram of Albania prefectures.

Figure S8 MDS of Lasker’s distance matrix between districts.

Figure S9 Dendrogram of districts from the matrix of Lasker’s distance.

Figure S10 Projection of the 321 communes of Albania on the first two dimensions of the matrix of Lasker’s distances.

Figure S11 Dendrogram of communes.

As a service to our authors and readers, this journal provides supporting information supplied by the authors. Such materials are peer-reviewed and may be re-organised for online delivery, but are not copy-edited or typeset. Technical support issues arising from supporting information (other than missing files) should be addressed to the authors.

Received: 9 August 2012
Accepted: 18 November 2012
