Classifying humans
Transcript of Classifying humans
news & views
nature genetics • volume 33 • april 2003 435
Recent papers in Science1 and the Ameri-can Journal of Human Genetics2 haveshown that genetic polymorphisms can beused to predict the population of origin ofan individual. In both reports a largenumber of polymorphisms were geno-typed in population samples from aroundthe world, and a model-based clusteringmethod3 was used by the authors to ascer-tain how many distinct populations canbe found in the global sample and esti-mate the probability that an individualbelongs to one of these populations.
The algorithm used by the authors stripseach sample of its original, self-reportedpopulation label and finds the groups ofindividuals that seem to cluster. Althoughthe designers of the clustering methodstated that these groups may not alwayshave a clear biological interpretation3, in
both studies individuals in each grouptended to fall into categories that corre-sponded to principal geographical (conti-nental) divisions. At first glance, statingthat the continental ancestry of eachhuman can be identified seems to rehabili-tate the discredited notion that humanscan be classified typologically. Under-standing what these studies really mean,however, requires a closer examination.
Classification the hard wayAssigning individuals to discrete groupsrequires a large number of polymorphicmarkers to be typed. Rosenberg et al.1 esti-mated that the global classification wouldhave required a minimum of 150microsatellite polymorphisms per individ-ual. Similarly, in the study by Bamshad etal.2, 60 Alu insertion polymorphisms or 60
microsatellites were required to correctlygroup sub-Saharan Africans, Europeansand East Asians. It has been known for along time that a mere 5–15% of the globalgenetic variation is accounted for by inter-continental differences (or FST for short;refs. 4–6). By genotyping such a large num-ber of polymorphisms, the authors wereable to delve into this component of geneticvariation and accurately classify individu-als into populations. The method usedassigns each individual to a group with aprobability, and often individuals have rel-atively high probabilities of belonging tomore than one group. These probabilitiescould be interpreted as the fraction of eachindividual’s genome coming from a group.Designation of broad continental ancestry,however, should be distinguished fromsocially defined ethnic groups residing in
Classifying humansFrancesc Calafell
Unitat de Biologia Evolutiva, Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, 08003 Barcelona, Catalonia, Spain.e-mail: [email protected]
Recent papers have shown the feasibility of classifying humans into categorical populations from their genotypes. How can this bereconciled with the claim that human races are biologically meaningless, and what are the implications for medical genetics projects?
BOB CRIMI
Global distribution of genetic diversity. Most genetic variation (shown here as color) is found within individuals of the same population, with a small fraction attribut-able to differences among populations.
©20
03 N
atu
re P
ub
lish
ing
Gro
up
h
ttp
://w
ww
.nat
ure
.co
m/n
atu
reg
enet
ics
news & views
436 nature genetics • volume 33 • april 2003
the same territory, who are often classifiedas a race. Ascription to such groups is oftenthinly, or not at all, based on ancestry: forexample, Hispanics in the United States aredefined on the basis of language and mayhave ancestors from almost anywhere inthe globe in various proportions.
Gene historyInterpopulation variation for each poly-morphism may deviate from the averagedepending on a number of factors: type ofmarker and heterozygosity (for microsatel-lites, for instance, FST tends to decrease withheterozygosity), specific demographic fac-tors, random variation and selection. Thismeans that some polymorphisms may havelarge FST values and may be used more effi-ciently to ascribe individuals to continents7.
Uniparentally transmitted markers (forexample, mitochondrial DNA (mtDNA)and the non-recombining portion of the Ychromosome) present particular featuresrelating to population structure and havebeen used extensively to infer populationof origin, given that reasonable phylogeo-graphies (the population distribution ofhaplotypes) are available for both genomesegments. The Y chromosome is moregeographically compartmentalized than ismtDNA, to the point that it can be used totrack gene flow between closely relatedpopulations8 or zero in on a particularpopulation in a continent as a source ofmigrants9. Uniparental transmission,however, means that these loci capture asmall fraction of overall ancestry: ten gen-erations ago, in the absence of inbreeding,each individual had 1,024 ancestors butinherited mtDNA from only one of those.Thus, it is not surprising to find that 28%of Brazilians of light skin carry mtDNAsequences that originated in Africa10.Findings such as this reinforce the ideathat features that may be used by some todefine a ‘racial’ category have no relationto the overall genomic genotype.
Why it mattersVariation among continents in some locican be explained by geographicallyrestricted selection pressures, such asthose related to climate or to some infec-tious diseases, such as malaria11. In fact,the right-hand tail of the FST distribu-tion has been used to search for lociunder selection12. Whether owing todrift or to selection, some genes withhigh FST values may be related to healthissues, either as etiologic factors in com-mon diseases or in drug metabolism13.Therefore, even if population geneticdifferences are small, they should not beignored when trying to establish thegenetic architecture of complex, com-mon disease. Obviously, environmentalfactors may carry a greater weight indetermining differences in prevalence ofcommon diseases among sociallydefined ethnic groups.
One of the current main lines of attackfor such a complex problem as the geneticbasis of common diseases is a two-stepapproach: (i) define a minimum set of sin-gle-nucleotide polymorphisms (SNPs)that would capture the linkage disequilib-rium landscape of the human genome and(ii) apply those SNPs to association stud-ies. Human population genetic structureshould be considered in both steps. Thefirst step is being addressed by theHapMap project, in which millions ofSNPs will be typed in samples of Euro-pean, African and East Asian origin (seehttp://www.genome.gov/page.cfm?pageID=10005336). A careful consideration ofsampling and population definition willbe essential for this project. This divisionof humankind into three typologieswould not capture a substantial fraction ofgenetic variation, both by ignoring subdi-vision within continents and by missingpopulations such as Native Americans andOceanians that contribute as much as halfof the global FST (ref. 2). This is particu-
larly relevant as SNPs are being chosenwith a frequency threshold set at 10% forthe least frequent allele; a considerablefraction of common SNPs in NativeAmericans and Oceanians will probablybe missed by the simplistic approaches topopulation designation14,15.
General genetic differences betweencases and controls may result in false posi-tives in association studies. This can beavoided by carefully checking ancestry incases and controls beyond the usual self-reported ethnic identities and by typing aroutine battery of polymorphisms in bothsamples. Large differences in such poly-morphisms can flag inadequate samplesbefore the daunting task of whole-genomeSNP typing is unleashed on them.Increasingly, medical genetics will rely onpopulation-based approaches to studyingdisease. It is apparent that interpopulationgenetic differences can be identified.Though they are small, these differencesmay be used to partly understand differ-ences in disease risk among populations.The next natural level—differencesamong individuals—will pave the way forpersonalized medicine. �1. Rosenberg, N.A. et al. Science 298, 2381–2385
(2002).2. Bamshad, M.J. et al. Am. J. Hum. Genet. 72,
578–589 (2003).3. Pritchard, J.K., Stephens, M. & Donnelly, P. Genetics
155, 945–959 (2000).4. Lewontin, R.C. Evol. Biol. 6, 381–398 (1972).5. Barbujani, G., Magagni, A., Minch, E. & Cavalli-
Sforza, L.L. Proc. Natl. Acad. Sci. USA 94, 4516–4519(1997).
6. Romualdi, C. et al. Genome Res. 12, 602–612 (2002).7. Shriver, M.D. et al. Am. J. Hum. Genet. 60, 957–964
(1997).8. Bosch, E. et al.. Am. J. Hum. Genet. 68, 1019–1029
(2001).9. Bosch, E. et al. Hum. Genet. (in the press).10. Alves-Silva, J. et al. Am. J. Hum. Genet. 67, 444–461
(2000).11. Hamblin, M.T., Thompson, E.E. & Di Rienzo, A. Am.
J. Hum. Genet. 70, 369–383 (2002).12. Akey, J.M., Zhang, G., Zhang, K., Jin, L. & Shriver,
M.D. Genome Res. 12, 1805–1814 (2002).13. Wilson, J.F. et al. Nat. Genet. 29, 265–269 (2001).14. Carlson, C.S. et al. Nat. Genet. 33, 518–521 (2003).15. Reich, D.E., Gabriel, S.E. & Altshuler, D. Nat. Genet.
33, 457–458 (2003).
©20
03 N
atu
re P
ub
lish
ing
Gro
up
h
ttp
://w
ww
.nat
ure
.co
m/n
atu
reg
enet
ics