1 Inferring patterns of folktale diffusion using genomic ...€¦ · 3 distinct “international...

22
1 Inferring patterns of folktale diffusion using genomic data 1 SI Appendix 2 Eugenio Bortolini, Luca Pagani, Enrico R. Crema, Stefania Sarno, Chiara Barbieri, Alessio Boattini, Marco 3 Sazzini, Sara Graça da Silva, Gessica Martini, Mait Metspalu, Davide Pettener , Donata Luiselli, Jamshid J. 4 Tehrani 5

Transcript of 1 Inferring patterns of folktale diffusion using genomic ...€¦ · 3 distinct “international...

Page 1: 1 Inferring patterns of folktale diffusion using genomic ...€¦ · 3 distinct “international tale types” distributed among more than 200 cultures [22]. Each international 4

1

Inferringpatternsoffolktalediffusionusinggenomicdata1

SIAppendix2

Eugenio Bortolini, Luca Pagani, Enrico R. Crema, Stefania Sarno, Chiara Barbieri, Alessio Boattini,Marco3Sazzini, SaraGraça da Silva,GessicaMartini,MaitMetspalu,Davide Pettener, Donata Luiselli, Jamshid J.4Tehrani 5

Page 2: 1 Inferring patterns of folktale diffusion using genomic ...€¦ · 3 distinct “international tale types” distributed among more than 200 cultures [22]. Each international 4

2

Contents1

1EXTENDEDDATASETDESCRIPTION....................................................................................................32

2DISTANCES........................................................................................................................................33

2.1SNPFILTERING...................................................................................................................................34

2.2GENETIC,FOLKTALE,ETHNOLINGUISTICANDGEOGRAPHICDISTANCES............................................................45

2.2.1GENETICDISTANCE...................................................................................................................................46

2.2.2FOLKTALEDISTANCE.................................................................................................................................47

2.2.3GEOGRAPHICDISTANCE.............................................................................................................................58

2.2.4EUCLIDEANDISTANCES..............................................................................................................................59

3SPACEMIX..........................................................................................................................................510

4.NEIGHBORNET................................................................................................................................1211

5.AMOVA..........................................................................................................................................1312

6.BIAS-CORRECTEDDISTANCECORRELATIONANDPARTIALDISTANCECORRELATION.......................1413

7.EXPLORINGASSOCIATIONBETWEENVARIABLESOVERCUMULATIVEGEOGRAPHICDISTANCE.......1514

8.DIFFUSIONOFMOSTPOPULARTALESANDESTIMATIONOFPOSSIBLEFOCALPOINTSFROMSPATIAL15

DISTRIBUTION....................................................................................................................................1616

17

18

Page 3: 1 Inferring patterns of folktale diffusion using genomic ...€¦ · 3 distinct “international tale types” distributed among more than 200 cultures [22]. Each international 4

3

1Extendeddatasetdescription1

Folktale datawere sourced from theAarne ThompsonUther (ATU) Index - a catalogue of over 2,0002

distinct “international tale types” distributed amongmore than 200 cultures [22]. Each international3

type represents an independent, self-contained storyline comprising a combination of motifs (e.g.4

specific events, characters, or artefacts) that is recognizably stable across cultures.We constructed a5

datasetrecordingthecross-culturaldistributionsoftwogroupsoffolktales:'AnimalTales'(ATU1–299),6

which featurenon-humanprotagonists, as typifiedbyAesop's fables, and 'TalesofMagic' (ATU300–7

749),whichconcernbeingsorobjectswithsupernaturalpowers,suchasfairies,witchesormagicrings8

[22].Wefocusedonthesetwogenresbecausetheyarethemostrichlydocumentedandmostculturally9

widespreadgroupsoftalesintheATUIndex.10

73ofthe198societies inwhichthetaleswererecordedcouldbematchedwithpopulationsforwhich11

wholegenomesequenceswereavailable(TableS2-I).Ofthese,33(DatasetMAIN)wereselectedbasedon12

athresholdofminimumrichness(i.e.thoseexhibitingatleast5folktales;TableS2-II)andthepresence13

ofviablegeneticproxies.Eachpopulationwasunivocallydescribedbyastringlistingthepresence(1)or14

absence(0)ofanyoftheincluded596folktales(TableS2-II).15

In addition to DatasetMAIN we generate an additional subset which is functional to testing explicit16

hypotheses, i.e. DatasetEURASIA (N=30) which does not include the 3 African population present in17

DatasetMAIN(TableS1-II,i.e.Congolese,Tanzanian,andWestAfrican); 18

2Distances19

2.1SNPfiltering20

Thewholegenomesequencesusedinthisstudyweregenerated,QCedandphasedaspartofabroader21

study[21].Thebulkof~39MSNPswereusedtocalculatethestatisticsdescribedbelow.22

23

Page 4: 1 Inferring patterns of folktale diffusion using genomic ...€¦ · 3 distinct “international tale types” distributed among more than 200 cultures [22]. Each international 4

4

2.2Genetic,Folktale,EthnolinguisticandGeographicdistances1

2.2.1Geneticdistance2

Geneticdistanceswereestimatedbytheaveragepairwisedistancesbetweentwogenomes,one from3

eachpopulation.Geneticdistance for (i,j)pairsofpopulations representedbymore thanonegenome4

eachwascalculatedastheaverageofallpossible(i,j)pairsofgenomes.Asaconsequencethediagonal5

ofthegeneticdistancematrixwasnotconstrainedtobezero(TableS2-3.1-3).6

2.2.2Folktaledistance7

Sincetheoriginaldataset(TableS1-I)andDatasetMAIN(TableS1-II)comprisebinaryevidenceofpresence8

(1)orabsence (0)of a given folktale ina setofworldwidepopulations,wecalculate folktaledistance9

betweenpopulationsasanasymmetricpairwiseJaccarddistance2.SymmetricJaccarddistancebetween10

populationAandpopulationBiscalculatedas11

12

13

in other words as the ratio between the number of differences and the sum of similarities and14

differences which can be identified by comparing A and B. In the present work, we assume Xij = 015

(absence of the jth tale in the ith population of datasetX) to be the ancestral state. Accordingly,we16

adopt an asymmetrical coefficient that does not consider absence of the jth tale in two sampled17

populations iandk(Xij+Xkj=0)asaninstanceofhomology(forthesubstantial limitsposedbydouble18

zerostoinferenceinecologyandrelateddisciplinespleaserefertoLegendreandLegendre3).19

Therefore, in a datasetX formedbyn rows each representing a populationunivocally describedby a20

stringofpresence(1)/absence(0)valuesofJfolktales,weeliminatedoublezerosandcalculatepairwise21

folktaledistance(F𝜹)betweenpopulationXiandpopulationXkas22

23

wheresquareIversonbracketsequal1iftheirinternalconditionissatisfiedand0ifitisnotsatisfied.The24

resultingvalue is the ratiobetweenthenumberof inter-populationdifferences,and thesumof inter-25

populationdifferencesandsimilaritiesbasedonthepresenceofthejthfolktaleinbothpopulations.26

Page 5: 1 Inferring patterns of folktale diffusion using genomic ...€¦ · 3 distinct “international tale types” distributed among more than 200 cultures [22]. Each international 4

5

2.2.3Geographicdistance1

GeographicdistanceswerecalculatedaspairwiseGreatCircleDistanceusingthepackagegdistanceinR2

[42] and by constraining the hypothesisedmovement of people through onewaypoint located in the3

SinaiPeninsula.Coordinates(longitudeandlatitudeindecimaldegrees)expressingthelocationofeach4

populationcomprisedinTableS2-5identifytheassumedcentreoftheareaoccupiedbyagivenfolkloric5

traditionasdefinedbyATUindex.6

2.2.4Euclideandistances7

In order to perform bias corrected and partial distance Correlation, folktale, genetic, and geographic8

distances were transformed into their exact Euclidean representations (as indicated in Szekely et al9

2007, 2013). The original folktale and genetic distance matrices were scaled through Classic10

Multidimensional Scaling using the function cmdscale in R and following the procedure for exact11

representation presented by Szekely et al. (2013a). Euclidean distances were computed from the12

obtained n-2 number of descriptors using the function dist in R with method set to “euclidean”.13

Euclideanrepresentationofgeographicdistancewasinsteadobtainedbyreprojectingtheoriginalsetof14

coordinatesonaplaneusingtwo-pointequidistantprojectionthroughthefunctionspTransforminthe15

packagespinR(PebesmaandBivand2005;Bivandetal.2013).ActualEuclideandistancebetweenthe16

newsetofcoordinateswascomputedusingthefunctionradishinthepackagefieldsinR(Nychkaetal.17

2016).18

19

3SpaceMix20

WeperformedtwoindependentSpaceMix(Bradburdetal.2013)analysesaimedatretrievingthe21

“genetic”andthe“folktale”spatialpositionsofourEurasiansamples.Forthegeneticrun,weusedthe22

126,554SNPsofchromosome22thatwerevariableinatleastoneof30sampleseachrepresentingone23

ofourstudiedpopulations.Giventhehighnumberofavailablemarkerswedeemedasingle24

chromosometobesufficienttoyieldreliableresults.25

ThegeographicinformationwasobtainedfromTableS2-5.1,thegeneticinformationwasinputtedusing26

thecount/totaloptionandSpacemixwasusedwiththedefaultparameters:27

run.spacemix.analysis(n.fast.reps=10,fast.MCMC.ngen=1e5,fast.model.option="target",28

long.model.option="source_and_target",data.type="counts",sample.frequencies=NULL,29

Page 6: 1 Inferring patterns of folktale diffusion using genomic ...€¦ · 3 distinct “international tale types” distributed among more than 200 cultures [22]. Each international 4

6

mean.sample.sizes=NULL,counts=count,sample.sizes=total,sample.covariance=NULL,1

target.spatial.prior.scale=NULL,source.spatial.prior.scale=NULL,spatial.prior.X.coordinates=coord[,1],2

spatial.prior.Y.coordinates=coord[,2],round.earth=FALSE,long.run.initial.parameters=NULL,3

k=nrow(count),loci=ncol(count),ngen=1e6,printfreq=1e2,samplefreq=1e3,mixing.diagn.freq=50,4

savefreq=1e5,directory=NULL,prefix="OurPrefix")5

Thisyieldeda“geno-geographic”positioningforeachofthe30samples.6

Thesameprocedurewasreplicatedusingthistime“folktale”informationasinputdata.Particularlywe7

generatedapseudo-geneticfilewhereeachpopulationwastypedasasingleindividualandthe8

presenceofagivenfolktalewasregisteredasanhomozygoustrait.9

The“geno-geographic”and“folk-geographic”coordinateshencegeneratedwerecomparedwiththe10

actualgeographiccoordinates,denotingatendencyforthefolk-geographiccoordinatestoapproximate11

betterthanthegeno-geographiconestheactualgeographiclocationsofthesampledpopulations12

(FigureS1-3.1).13

14

FigureS1-3.1SpaceMixanalysesforeachoftheEurasianpopulationsshowingthegeographic(dot),15geno-geographic(G)andfolk-geographic(F)coordinatesjoinedbywhitesegments.16

17

18

19

20

Page 7: 1 Inferring patterns of folktale diffusion using genomic ...€¦ · 3 distinct “international tale types” distributed among more than 200 cultures [22]. Each international 4

7

1

2

3

Page 8: 1 Inferring patterns of folktale diffusion using genomic ...€¦ · 3 distinct “international tale types” distributed among more than 200 cultures [22]. Each international 4

8

1

Page 9: 1 Inferring patterns of folktale diffusion using genomic ...€¦ · 3 distinct “international tale types” distributed among more than 200 cultures [22]. Each international 4

9

1

Page 10: 1 Inferring patterns of folktale diffusion using genomic ...€¦ · 3 distinct “international tale types” distributed among more than 200 cultures [22]. Each international 4

10

1

Page 11: 1 Inferring patterns of folktale diffusion using genomic ...€¦ · 3 distinct “international tale types” distributed among more than 200 cultures [22]. Each international 4

11

1

2

Page 12: 1 Inferring patterns of folktale diffusion using genomic ...€¦ · 3 distinct “international tale types” distributed among more than 200 cultures [22]. Each international 4

12

4.NeighborNet1

ANeighbornet analysiswas also carriedout to further explore the impactof geographyand linguistic2

ancestryonthedistributionoffolktalesinthepresentdataset.Theanalysisyieldedthefollowinggraph3

(Fig. S1-4.1), exhibiting a certain degree of spatial clustering in addition to proximity and reticulation4

among linguistic close relatives (e.g. within the Indo-European family, the Semitic family, and the5

Mongolianfamily).Overall,thespatialstructureinthedatasetseemstobestronger(e.g.thepositionof6

Hungarian, Japanese/Chinese). Some language families, notably Turkic, Uralic and Caucasian, are7

scatteredacrossthenetwork.Thedegreeofreticulationisquitehigh,astestifiedbytherelevantquartet8

statistics(deltascore=0.317;Q-residualscore=0.002335),suggestingthatculturaladmixtureprocesses9

betweendemesmayhaveanimportantrole.10

Fig. S1-4.1.Neighbornet graph based on folktale distance.Linguistic color code: red = Turkic; blue =11

Indo-European;pink=Sino-Tibetan;purple=Caucasian;turquoise=Eskimo-Aleut;orange=Semitic;12

lightgreen=Uralic;darkgreen=Japonic,brown=Mongolian;black=Austro-asiatic13

14

15

Abkhaz

Tuva

Eskimo

Turkmen

Dagestan

Vietnamese

Azerbaijan

BurmeseBuryat

MongolianLebanese

SaudiArabian

Jordanian

Armenian

Mordvinian

Yakut

Chuvash

Ossetian

Byelorussian

Russian

UkrainianLithuanian

HungarianFrench

Italian

ChineseJapanese

Iranian

Tadzhik

Uzbek

0.1

Page 13: 1 Inferring patterns of folktale diffusion using genomic ...€¦ · 3 distinct “international tale types” distributed among more than 200 cultures [22]. Each international 4

13

5.AMOVA1

WeuseAnalysisofMolecularVariation(AMOVA;Excoffieretal.1992)to formallyassesthe impactof2

ethnolinguisticboundariesonbothgeneticandcultural(folktale)variability.Todothis,weassignedeach3

populationtoanethnolinguisticgroup(derivedfromEthnologue;SItableS2-9.1),andusedthefunction4

amova in the package pegas in R to run the analysis (Paradis 2010). AMOVA is commonly used in5

populationgeneticstoassessthedegreeofpopulationstructureinametapopulation.Inotherwords,it6

measuresthedegreeofvariabilityexistingbetweenpredeterminedgroupsasopposedtotheamountof7

variability observed within each group. AMOVA returns a summary statistic (PhiST) obtained by8

computing the ratio between intergroup diversity estimate and total diversity in themetapopulation.9

AlthoughitisderivedfromthemoregeneralclassofFSTmeasures,PhiSTevaluatessymmetricdistance10

matriceswhileFSTisbasedoncorrelationsbetweenindividualvariantfrequencies.Suchmeasureshave11

already been successfully adopted to investigate co-evolutionary patterns in genetic and cultural12

datasets(Belletal.2009;Rseszuteketal.2012),inculturaldatasetsalone(Shennanetal2015),andin13

one case on the distribution of folktale variants in Europe (Ross et al. 2013). Results of these works14

consistently show that average intergroup cultural dissimilarity is stronger than genetic dissimilarity15

measuredonthesamesetofdemes,whilePhiSTvaluesobtainedforculturalmarkersareusually ina16

rangecomprisedbetween0.02(musicaldiversity;Rseszuteketal.2012)andhigherlevelsforvariantsof17

individual folktales (PhiST =0.09; Ross et al. 2013), or personal ornaments (PhiST =0.109) and pottery18

(PhiST=0.134)inNeolithicEurope(Shennanetal2015).Ourresultsconfirmthisdifferentialimpacton19

geneticvariabilityontheonehand,andculturalvariabilityontheother.20

21

22 23 24 25 26 27 28 29 30 31 32 33 34

Page 14: 1 Inferring patterns of folktale diffusion using genomic ...€¦ · 3 distinct “international tale types” distributed among more than 200 cultures [22]. Each international 4

14

6.Bias-correcteddistancecorrelationandpartialdistancecorrelation1

2Distancecorrelation isameasureofstatisticaldependencebetweentwovariableswhich isspecifically3

suitedfortestingsuchhypothesisonpairsofsymmetricdistancematrices. Itsvalueequalszero ifand4

only if thetwovariablesarestatistically independent(Székelyetal.2007). Inaddition:a) theresulting5

statisticsarenotboundto linearmodels.Onthecontrary,theyaresensitivetoalltypesofdependent6

relationships,includingnonlinearandnonmonotoneones(Székelyetal2007);b)itisnotpronetothe7

sameproblems raised for standard andpartialMantel tests (Guillot and Rousset 2013); and c) it is8

usually preferred to othermeasures of nonlinear association when small or practical sample size are9

concerned(Gorfinetal2011).10

Giventwodistanceordissimilaritymatrices,standarddistancecorrelationperformsdoublecenteringof11

eachmatrix by subtracting row and column average to rows and columns of the originalmatrix, and12

addingthegrandmeanofthedistancematrixtotheresults,sothatallcolumnsandrowsoftheresulting13

matricessumtozero.Distancecorrelationbetweentwosuchscaledmatrices-asinPearson’sProduct-14

momentcorrelationcoefficient is computedbydividing thedistanceCovarianceby theproductof the15

respective distance standard deviations. Distance Covariance is obtained by computing the summed16

cross-product between the two double-centeredmatrices and averaging it over squared sample size.17

Distance correlation is thereforenot the correlationbetweenoriginal distances. It is insteadbasedon18

cross-productsbetweenscaledmomentobtainedbydouble-canteringtheoriginalmatrices.19

Inthepresentpaper,weperformBias-CorrectedDistanceCorrelation(SzekelyandRizzo2013)suitedfor20

biggersamplesize(inthepresentstudywehave435observationwhenallpairsareconsideredover3021

populations,whichisexactlytheexamplesizeprovidedbytheauthorsanddevelopersofthemethod),22

Thismethod corrects forpotential limitationsoforiginaldistanceCovarianceanddistanceCorrelation23

measures(Székelyetal.2007)whendimensiontendstoinfinity,andisbasedonanunbiasedestimator24

or thesquareddistancepopulationcovariance.Asuitedt-testof independence isoffered. Inaddition,25

weperformPartialdistancecorrelation(Szekelyetal2013a)toassessthe impactofonevariableover26

another,whilecontrollingfortheeffectofathirdvariable.Forcalculatingpartialdistancecorrelationthe27

standard double-centering used in standard and bias-corrected distance correlation is replaced by a28

different centering technique named U-centering and based on the demonstration that such29

transformed distance and dissimilarity matrices have a corresponding U-centered Euclidean30

representation inHilbert space (ageneralizationofEuclideanplanewitha finiteor infinitenumberof31

Page 15: 1 Inferring patterns of folktale diffusion using genomic ...€¦ · 3 distinct “international tale types” distributed among more than 200 cultures [22]. Each international 4

15

dimensions;Szekelyetal2013a).Simulationstudiesconfirmthatthejointsignificancetestcontrolstype1

Ierrorrateat itsnominal level,andoutperformspartialcorrelationandpartialManteltest intermsof2

power(Szekelyetal2013a).3

4

7. Exploringassociationbetweenvariablesover cumulative geographic5

distance6

Table S1-7.1 Model comparison over cumulative geographic distance. Results report Pearson’s7

product-momentcorrelationcoefficientsplottedinFig.2andobtainedwithoriginaldistancematrices.8

9

10

11

12

13

14

15

16

17

Table 1: Model selection at cumulative geographic distances

Km N Best AIC �

Geno

Geo

L

Geno

L

Geo

w

Geno

w

Geo

Main 528 All -2389.8 4.2 17.5 0.12 0.00016 0.99 0.01

Eurasia 435 All -1895.48 0.09 2.29 0.96 0.32 0.75 0.25

2000 115 Geno -438.94 0.00 22.61 1.00 0.00 1.00 0.00

4000 249 All -1052.72 5.90 21.82 0.05 0.00 1.00 0.00

6000 343 All -1484.20 2.40 4.70 0.30 0.09 0.76 0.24

8000 412 All -1798.36 1.66 1.86 0.44 0.39 0.53 0.47

10000 435 Geno -1890.30 0.00 2.40 1.00 0.30 0.77 0.23

12000 435 All -1895.48 0.09 2.29 0.96 0.32 0.75 0.25

pdcor p

folktale˜genetic+geographic -0.11 1.00

folktale˜geographic+genetic 0.26 0.00

folktaleL˜geneticL+geographic 0.17 0.00

folkL˜geo+genoL 0.25 0.00

N bindist r.folk p.folk r.folk(Lw) p.folk(Lw) r.folk p.folk r.folk(Lw) p.folk(Lw)

geo geo geo geo geno geno geno(Lw) geno(Lw)

1 114 2000 0.15 0.06 0.15 0.06 0.27 <0.001 0.26 <0.001

2 245 4000 0.36 <0.001 0.64 <0.001 0.16 0.01 0.28 <0.001

3 339 6000 0.31 <0.001 0.70 <0.001 0.13 <0.001 0.40 <0.001

4 411 8000 0.28 <0.001 0.69 <0.001 0.23 <0.001 0.48 <0.001

5 433 10000 0.24 <0.001 0.62 <0.001 0.17 <0.001 0.52 <0.001

6 435 12000 0.31 <0.001 0.57 <0.001 0.20 <0.001 0.55 <0.001

N bindist r.folk p.folk r.folk(Lw) p.folk(Lw) r.folk p.folk r.folk(Lw) p.folk(Lw)

geo geo geo geo geno geno geno(Lw) geno(Lw)

115 2000 0.16 0.09 0.21 0.03 0.45 <0.001 0.40 <0.001

249 4000 0.24 <0.001 0.58 <0.001 0.34 <0.001 0.40 <0.001

343 6000 0.22 <0.001 0.68 <0.001 0.23 <0.001 0.51 <0.001

412 8000 0.22 <0.001 0.67 <0.001 0.22 <0.001 0.55 <0.001

434 10000 0.19 <0.001 0.64 <0.001 0.20 <0.001 0.55 <0.001

435 12000 0.19 <0.001 0.64 <0.001 0.20 <0.001 0.55 <0.001

3

Page 16: 1 Inferring patterns of folktale diffusion using genomic ...€¦ · 3 distinct “international tale types” distributed among more than 200 cultures [22]. Each international 4

16

8.Diffusionofmostpopulartalesandestimationofpossiblefocalpoints1

fromspatialdistribution2

3

Weidentify19“mostpopular”tales, i.e. folktalesthatarepresent inat least30populationsoutof604

OldWorldpopulationsavailableintheoriginalpresence/absencematrix(TableS2-10,S2-11),whichare5

brieflysummarizedbelow:6

7

o ATU155‘TheUngratefulSnakeReturnedtoCaptivity’:Asnake(oranotherdangerousanimal)8

attacksamanwhorescuesit,andispunishedbyotheranimals.9

o ATU300‘TheDragonSlayer’:Amanrescuesabeautifulmaidenfromadragon/monster,often10

withthehelpofhisdogs.Later,heexposesanimposterwhoclaimscreditforthedeed.11

o ATU301‘TheThreeStolenPrincesses’:Amanrescuesthreewomenfromapit.Hiscompanions12

betrayhimbyleavinghiminthepitandstealingthegirls.Withtheaidofaspirittheheroflies13

upandexposeshiscompanions,marryingtheyoungestgirl.14

o ATU303‘TheTwins,OrBloodBrothers’:Aherorescuesawomanandmarriesher,butislater15

bewitched.Histwinbrothersetsouttofindhim,andismistakenbythewomanforherhusband.16

Thetwinreleaseshisbrother,whokillshiminajealousrage,mistakenlybelievinghimtohave17

seducedhiswife.Thetwinislaterresuscitated.18

o ATU 313 ‘TheMagic Flight’: A man elopes with the daughter of a demon or king. She uses19

magicalobjectstoobstructtheirpursuersandtheyescape.20

o ATU 314 ‘Goldener’: A golden-haired man marries the king’s daughter. He is mocked by his21

brothers-in-law,butsucceedsincompletingheroicdeedswheretheyfailandismadetheheir.22

o ATU325‘TheMagicianandhisApprentice’:Aboyisgiventoamagiciantobehisapprentice.23

Theboylearnstheartofsorceryandfreeshimselffromhismasterafterabattleinwhichthey24

transformintoasuccessionofdifferentkindsofanimals.25

o ATU400‘TheManonaQuestforhisLostWife’:Amiscellaneousgroupofstoriesconcerninga26

manwho is separated fromhiswife during an adventure.When he finds her she is about to27

marryanotherman,butheproveshisidentitytoherandtheyarereconciled.28

Page 17: 1 Inferring patterns of folktale diffusion using genomic ...€¦ · 3 distinct “international tale types” distributed among more than 200 cultures [22]. Each international 4

17

o ATU403‘TheBlackandtheWhiteBride’:Agirlistomarrytheking,butherstepmothertriesto1

kill her and replace her with her own daughter. The girl proves her identity to the king and2

exposesherstepsisterasanimposter.3

o ATU 480 ‘The Kind and Unkind Girls’: A girl goes on a journey and is kind to those she4

encounters.Sheisrewarded.Herstepmothersendsherowndaughteronthesamejourneybut5

sheisunkindandgetspunished.6

o ATU531‘TheCleverHorse’:Amiscellaneousgroupofstoriesinwhichayoungmanishelpedto7

completesomenear-impossibletasksbyatalkinghorseandmarriesaprincess.8

o ATU550‘Bird,HorseandPrincess’:Threebrothersaresentonaquestbytheirfathertocatcha9

magicbird.Theyoungestbrothersucceedsbutisbetrayedbytheothertwowhotrytoclaimthe10

prize.Withthehelpofamagicalanimaltheheroexposeshisbrothers.11

o ATU554 ‘TheGratefulAnimals’:Amanhelpsa seriesof animals,who reciprocatebyhelping12

himtocompleteaseriesofnear-impossibletasks.13

o ATU560 ‘TheMagic Ring’: A boy acquires amagic ring that grants himwishes. Hemarries a14

princess,whostealstheringtoelopewithher lover.Theboyrecoversaringandpunisheshis15

faithlesswifeandherlover.16

o ATU 563 ‘The Table, the Donkey and the Stick’: A man acquires magical objects from a17

supernatural being. He is cheated out of them and given plain objects in their place, but18

managestorecoverhispossessionsandpunishthecheat.19

o ATU613‘TheTwoTravellers’:After losinganargumentwithhiscompanion,amanisblinded.20

He learns the secrets of birds and recovers his sight as well as gaining new powers. His21

companionimitateshimandispunishedbythebirds.22

o ATU670‘TheManWhoUnderstandsAnimalLanguages’:Asnaketeachesamanthelanguages23

ofanimalsonconditionhekeepsitasecret.Theman’swifenagshimtoteachherbutherefuses24

afterbeingwarnedoftheconsequencesbyamaleanimal(usuallyarooster).25

o ATU700 ‘Thumbling’:Acouplewish forachildandaregivena tinyboythroughsupernatural26

means.Theboyislostandgoesonaseriesofadventuresuntilheisreunitedwithhisparents.27

o ATU707‘TheThreeGoldenChildren’:Awomanmarriesakingandgivesbirthtothreechildren,28

whoarestolenbyherjealoussisters.Whentheygrowupthechildrengoonaquesttofindtheir29

parentsandeventuallyexposethesisters.30

Page 18: 1 Inferring patterns of folktale diffusion using genomic ...€¦ · 3 distinct “international tale types” distributed among more than 200 cultures [22]. Each international 4

18

One possible explanation for the wide dispersion of these tales is that they spread through the1

disseminationofwritten texts,whichwouldallow themto travelmuch furtherand inamuchshorter2

period than would be possible solely through the vectors of human dispersal and traditional oral3

transmission.Suchaprocesswouldmostlikelyhavecoincidedwiththeemergenceofthefairytaleasa4

popular literarygenre inthesixteenthandseventeenthcenturies (Bottigheimer2014).Althoughmany5

folktaleswereincorporatedintoliteraryworkspriortothisperiod(forexample,inmedievalromances),6

thedevelopmentofnew, cheapprinting technologies togetherwith thegrowthof international trade7

networksandEuropeancolonialismwouldhaveallowedtalestocirculatetomuchwideraudiencesthan8

waspreviouslypossible(ibid.).9

Infact,sevenofthetaleslistedabovewerepublishedinmajorfairytalecollectionsduringthisperiod,10

includingGiovanni Francesco Straparola's LePiacevoliNotti in 1550-55 (ATU314,ATU325,ATU670),11

Giambattista Basile's Lo cunto de li cunti in 1634 (ATU 301, ATU 480, ATU 560), Charles Perrault's12

Histoiresoucontesdutempspasséin1697(ATU480,ATU700).However,oneobviousquestionraised13

by the hypothesis that these tales spread via textual transmission iswhy the other stories contained14

within these collections did not achieve similarly wide cross-cultural distributions. Secondly, while15

literaryversionshaveundoubtedlymadeamajorcontributiontothemodernformsofthesetales,there16

is compelling evidence that they were derived from already well-established and widespread oral17

traditions, rather thantheotherwayround (Ben-Amosetal.2010).Forexample,ATU301 ‘TheThree18

StolenPrincesses’occursinGreekandIndianmythsthatlongpredateBasile’sItalianfairytaleof1634.19

Similarly,versionsofATU325‘TheMagicianandHisPupil’,ATU560‘TheMagicRing’andATU670‘The20

ManWhoUnderstandsAnimal Languages’appear in IndianandMiddleEasternsources (including the21

RamayanaandOneThousandandOneNights) thatare clearly independentof laterEuropean literary22

versionsofthesetales(Thompson1977).23

Analternativeexplanationforthedistributionofthesetales isthattheyreflectsignaturesofdemicor24

cultural diffusion that aremore ancient than the other patterns detected in the dataset. In order to25

characterizethesesignatures,wesoughttoidentifypossiblecentersoforiginanddispersalforthetales.26

To do so, we assessed the amount of linear correlation between geographic distance from each27

populationexhibitingagiven taleand thedistributionof theproportionof the remainingpopulations28

displayingthattaleoverthesamegeographicgradient.Morespecifically,webinnedpairsofpopulations29

intofixedintervalsofgeographicdistance(2000Km),andforeachbinwecalculatedtheproportionof30

populationsexhibitingagivenfolktale.Wethencalculatelinearcorrelationbetweenthedistributionof31

Page 19: 1 Inferring patterns of folktale diffusion using genomic ...€¦ · 3 distinct “international tale types” distributed among more than 200 cultures [22]. Each international 4

19

percentagesobtained foreachgeographicbinandgeographicdistance fromall thepopulations in the1

dataset that exhibit that folktale. All the above mentioned analyses have been performed in R. The2

assumptionisthat-ifweenvisagealong-rangeandancientdiffusionprocesswhosevectorsaresolely3

humandispersalandtraditionalculturaltransmission-weexpecttoobtainadistance-decaypatterning4

with higher percentages indicating potential “origin populations” from which increasingly lower5

percentagesdepartformingaclinaltrendovergeographicdistance.Toavoidlosinginformationwedid6

notfocusonsinglecoordinatepairsexhibitingthehighestvaluesforeachtale,andplottedinsteadthe7

distributionof correlation coefficientsonamap inorder to visuallydefine themostprobable areaof8

origin (centresoforiginexhibiting the lowest correlationcoefficients; FigureS8-I).Probability surfaces9

wereobtainedbyinterpolatingcorrelationcoefficientscomputedforeachpopulationusingthefunction10

producingplatesplineinterpolationofthefieldspackageinR[47].11

The resulting potential patterns of diffusion are represented in Figure S6-I. Although each of these12

patternspresentssomespecificities,somegeneraltrendscanbefound-assummarizedinthemaintext.13

Inparticular,fourmainmulti-directionalwavesofdiffusioncanbehypothezised:14

1. PotentialAfricanorigin(e.g.ATU670)15

2. SouthwardspreadfromnorthernEurasia(e.g.ATU700)16

3. EasternEuropeanorigin(e.g.ATU301,303,313)17

4. Middle-Eastern/Caucasianorigin(e.g.ATU314,400,480,560)18

19

While further research is needed to verify these patterns (for example, by reconstructing the20

evolutionaryhistoriesofvariantsofeachtaletypetotestwhethertheymatchthedispersalscenarios),21

the results have significant implications for current understandings about the origins of international22

folktale traditions. In particular, they suggest a less Euro-centric view of tale origins than traditional23

"historic-geographic"reconstructionsbasedonthefrequencyofvariants(i.e.thenumberofversionsof24

agiventaletyperecordedineachpopulation)andchronologyofliteraryversions.Amajorproblemwith25

thisapproachisthatconclusionsaboutatale'soriginsmayoftenbeskewedbythestrongEuropeanbias26

in both the richness of the folktale and literary records. For example, ATU 300 'TheDragon Slayer' –27

whichhas beenproposed tobe theoriginal archetype storyline fromwhich all fairy tales arederived28

(Propp 1968)– was previously believed to have originated in medieval western Europe, most likely29

France,wheretheearliestknownversionswererecorded.Ouranalysisinsteadsuggeststhatthistale–30

togetherwith the related taleATU313 'The Twins'mayhave arrived inwestern Europe from further31

Page 20: 1 Inferring patterns of folktale diffusion using genomic ...€¦ · 3 distinct “international tale types” distributed among more than 200 cultures [22]. Each international 4

20

east,either fromtheregionofmoderndayUkraineandBelarus,orofTurkeyandKurdistan.Similarly,1

whereasfolkloristshaveclaimedthatATU670'TheManWhoUnderstandsAnimalLanguages'originated2

in Europe and was transported to Africa through colonialism, our findings reverse the direction of3

transmissionandsuggestthatthetaleprobablyaroseinNorthAfrica. 4

Page 21: 1 Inferring patterns of folktale diffusion using genomic ...€¦ · 3 distinct “international tale types” distributed among more than 200 cultures [22]. Each international 4

21

1

Page 22: 1 Inferring patterns of folktale diffusion using genomic ...€¦ · 3 distinct “international tale types” distributed among more than 200 cultures [22]. Each international 4

22

FigureS1-8-I-Plotoftheprobableareasoforiginofthe19“mostpopular”tales:Probabilitysurfaces1

have been obtained interpolating correlation coefficients computed for each population. Grey dots2

indicatepopulationsthatdonotexhibitthespecifictaleofinterest.1)Tale155;2)Tale300;3)Tale301;3

4)Tale303;5)Tale313;6)Tale314;7)Tale325;8)Tale400;9)Tale403;10)Tale480;11)Tale531;12)4

Tale550;13)Tale554;14)Tale560;15)Tale563;16)Tale613;17)Tale670;18)Tale700;19)Tale707.5