Human genome diversity: Frequently asked questions
Guido Barbujani
Dipartimento di Biologia ed Evoluzione, Università di Ferrara
Total size (haploid): 3 272 480 987
N of protein-coding genes: 22 320
N of RNA-coding genes: 9 922
N of gene exons: 530 906
N of transcripts: 142 707
N of segregating sites: 15 040 632
Nucleotide differences with chimp: 1.23%
Chimp orthologue genes: 13 454
Human genes missing in chimp: 36 totally, 17 largely
Classes of genes with max. differences: immune response, reproduction, olfaction
A few human genome statistics
From www.ensembl.org version 57.37b (Jan. 2010)
Human-mouse alignment
Human-chimp alignment
Chimp chromosomes 2 and 2a
The human genome is very similar to the chimpanzee genome
Phylogenetic tree of human (n=70), chimpanzee (n=30), bonobo (n=5), gorilla (n=11) and orang-utan (n=14), based on 10,000 bp sequences of a noncoding Xq13.3 region. Kaessmann et al. (2001).
Individual genetic diversity among humans is the lowest of all primates
N of markers | Samples | FST | Reference
599,356 SNPs | 209 individuals from 4 populations: Caucasian, Chinese, Japanese, Yoruba | 0.13 | Weir et al. 2005
1,034,741 SNPs | 71 individuals from 4 populations: Caucasian, Chinese, Japanese, Yoruba | 0.10 | Weir et al. 2005
1,007,329 SNPs | 269 individuals from 4 populations: Caucasian, Chinese, Japanese, Yoruba | 0.12 | International HapMap Consortium 2005
443,434 SNPs | 3845 worldwide distributed individuals | 0.052 | Auton et al. 2009
2,841,354 SNPs | 210 individuals from 4 populations: Caucasian, Chinese, Japanese, Yoruba | 0.11 | Barreiro et al. 2008
243,855 SNPs | 554 individuals from 27 worldwide populations | 0.123 | Xing et al. 2009
100 Alu insertions | 710 individuals from 23 worldwide populations | 0.095 | Watkins et al. 2008
67 CNVs | 270 individuals from 4 populations with ancestry in Europe, Africa or Asia | 0.11 | Redon et al. 2006
Genomic estimates of FST for the global human population are ∼ 0.12
Human populations display ∼ 12% of the maximum possible diversity, given their allele frequencies
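As a minimal sketch of what an FST of about 0.12 measures, the Python snippet below computes FST = (HT - HS) / HT for a single biallelic locus from the allele frequencies observed in each population. The function name and the example frequencies are hypothetical, chosen for illustration only, and equal population sizes are assumed; the genome-wide estimates cited above rely on more elaborate estimators.

```python
def fst_biallelic(freqs):
    """FST = (HT - HS) / HT for one biallelic locus.

    freqs: frequency of the same allele in each population.
    Assumption (not from the slides): all populations have equal weight.
    """
    if not freqs:
        raise ValueError("at least one population is required")
    p_bar = sum(freqs) / len(freqs)
    # Expected heterozygosity of the pooled (total) population
    h_t = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within populations
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    if h_t == 0:
        return 0.0  # locus is monomorphic everywhere: no diversity to apportion
    return (h_t - h_s) / h_t


# Hypothetical frequencies in two populations
print(fst_biallelic([0.3, 0.7]))  # moderate differentiation
print(fst_biallelic([0.5, 0.5]))  # identical frequencies
print(fst_biallelic([0.0, 1.0]))  # populations fixed for different alleles
```

With identical frequencies FST is 0, and with populations fixed for different alleles it reaches its maximum of 1, which is the sense in which 0.12 is read as about 12% of the maximum possible differentiation.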
[Figure: FST values of 0.38, 0.32 and 0.12]
Genetic diversity among human populations is the lowest of all primates
Geographically-variable selection; small population sizes; little gene flow; isolation
Stabilizing selection; large population sizes; extensive gene flow; admixture
FST
Li et al. (2009)
Clinal variation in the geographical space is the rule for human populations
Cavalli-Sforza et al. (1994)
Methods 1: Estimating variances from sequence comparisons
-TACGAACATCAGGC-
-TATGAACATCAGGC-
-TATGAACATCGGGC-
Polymorphic DNA sites
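A minimal Python sketch of the comparison, assuming the slide shows three aligned sequences of equal length (the flanking dashes are dropped here). The function names are hypothetical. It lists the polymorphic (segregating) positions and the mean number of pairwise differences, the raw quantities from which variances within and between populations are estimated.

```python
from itertools import combinations

# The three aligned sequences from the slide, without the flanking dashes
seqs = ["TACGAACATCAGGC", "TATGAACATCAGGC", "TATGAACATCGGGC"]


def segregating_sites(seqs):
    """Return the 0-based positions at which the sequences are not all identical."""
    return [i for i, column in enumerate(zip(*seqs)) if len(set(column)) > 1]


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


print(segregating_sites(seqs))          # [2, 10]
print(mean_pairwise_differences(seqs))  # 4/3, i.e. about 1.33
```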
Genetic variances within and between populations
[Figure: Population 1 and Population 2 in three scenarios, with variance between populations of 100%, 19% and 0%]
Independent studies of genetic variances yield very similar results: 85, 5, 10
Study | N of loci | Within populations | Between populations | Between races or continents
Lewontin (1972) | 17 | 85% | 8% | 6%
Latter (1973) | 18 | 86% | 5% | 9%
Barbujani et al. (1997) | 109 | 85% | 5% | 10%
Jorde et al. (2000) | 100 | 85% | 2% | 13%
Romualdi et al. (2002) | 32 | 83% | 8% | 9%
Rosenberg et al. (2002) | 377 | 93% | 3% | 4%
Excoffier & Hamilton (2003) | 377 | 88% | 3% | 9%
Ramachandran et al. (2005) | 17 | 90% | 5% | 5%
Bastos-Rodrigues et al. (2006) | 40 | 86% | 2% | 12%
Li et al. (2008) | 650 000 | 89% | 2% | 9%
Median | | 85% | 5% | 10%
What does it mean, in practice?
[Figure labels: 100%, 100%, 100%]
Members of our community are only slightly less different from us than members of distant populations
[Figure labels: 85%, 85%, 85%]
Mind the numbers
Humans and chimps share >98% of their genomes
Among the 1.8% differences, 1.7% are fixed within each species
The remaining fraction, 0.1%, contains all human genomic variation
The differences among the main continental groups represent 10% of 0.1% of the total, that is, 0.01%
But 0.1% of >3 billion DNA sites means >3 million polymorphic DNA sites (3,213,401 according to Levy et al. 2007)
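The arithmetic on this slide can be checked in a few lines of Python. This is only a sketch: the genome size is rounded to 3 billion sites, the variable names are hypothetical, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

genome_sites = 3_000_000_000                   # ">3 billion DNA sites", rounded down
variable_fraction = Fraction(1, 1000)          # 0.1% of sites vary among humans
between_continents_share = Fraction(10, 100)   # 10% of that variation

# Number of polymorphic sites implied by a 0.1% variable fraction
polymorphic_sites = genome_sites * variable_fraction

# Share of the whole genome accounted for by differences among continental groups
between_continent_fraction = variable_fraction * between_continents_share

print(polymorphic_sites)                        # 3000000
print(float(between_continent_fraction * 100))  # 0.01 (percent)
```

A tiny percentage of a very large genome is still a large absolute number of sites, which is why both statements on the slide hold at once.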
Methods 2: Clustering genotypes or haplotypes
K=3
K=4
Rosenberg et al., 2002
SNPs
Haplotypes
CNV
Jakobsson et al. (2008)
Structure inferred from SNPs and haplotypes differs from that inferred from Copy Number Variation
Genes, as well as morphology, suggest inconsistent clusterings of genotypes
Y chromosome: Romualdi et al. 2002
X chromosome: Wilson et al. 2001
Europe,Ethiopia
S. Africa N. Guinea
Asia
Africa
Asia, Europe, Australia, Americas
Americas
377 STR loci: Barbujani and Belle 2006
Melanesia Eurasia N Africa N America
Maya
S. Africa
E Africa
C Africa
Piapoco
Suruì
Karitiana
Kalash
W. Eurasia
E. Asia
Africa
Americas
Oceania
377 STR loci: Rosenberg et al. 2002
Sampling assumptions have a large effect on the apparent structuring
Serre and Pääbo (2004)
ASIP A8818G
MATP C374G
Genetic variation is discordant across loci
Similar skin colors are due to different combinations of alleles
MATP C374G
ASIP A8818G
16 completely sequenced genomes (as of May 1st, 2010)
And 5 more published on May 6th:
Two persons from the same continent may share fewer SNPs than persons of different continents
81% of SNPs cosmopolitan.
Alleles present in one continent only: 0.91% in Africa, 0.75% in Asia, practically 0 elsewhere.
Jakobsson et al. 2008 (525,910 SNPs, 396 CNVs)
In the 117 megabases (Mb) of sequenced exome-containing intervals, the average rate of nucleotide difference between a pair of the Bushmen was 1.2 per kb, compared to an average of 1.0 per kb between a European and Asian individual. Schuster et al. (2010)
Greater differences between Africans than between Europeans and Asians
Genetic diversity out of Africa is often a subset of the African genetic diversity
Tishkoff et al. (1998)
Linkage disequilibrium (LD) decreasing with physical distance between loci and increasing with geographic distance from East Africa
Jakobsson et al. 2008
Gene diversity declines as a function of distance from Africa
Best fit of the model for an African exit 56,000 years ago
Liu et al. (2006)
Patterns of morphological and genetic variation are compatible with the effects of dispersal from Africa
Manica et al. (2007)
Models with an African population replacing previous human continental groups explain the data better than any alternative models
Fagundes et al. (2007)
• The human genome is very similar to the chimpanzee genome
• Individual genetic diversity among humans the lowest of all primates
• Population differences among humans (FSTs) the lowest among primates
• Clinal patterns in the geographical space the rule, not the exception
• Genes, as well as morphology, suggest inconsistent clusterings of genotypes
• Genetic variation discordant across loci
• Similar skin colors due to different combinations of alleles
• Two persons from the same continent may be less similar than persons of different continents
• Greater differences between Africans than between Europeans and Asians
• Alleles in non-African populations often a subset of African alleles
• Models with an African population replacing previous human continental groups explain the data better than any alternative models
• Gradients of gene diversity correlated with distance from Africa
To make a long story short:
So, what happened?
100,000 years ago
70,000 years ago
60,000 years ago
30,000 years ago
10,000 years ago
1. In all samples, tests for presence of Neandertal mtDNA
2. mtDNA from Vi33-26 and Vi33-16 identical, but diff(within) < diff(between)
3. Between 95% and 99% of sequences from non-primates
4. Enrichment of libraries with enzymes preferentially cutting bacterial genomes
Preliminary analyses
Divergence from modern humans
Neandertals fall inside the variation of present-day humans. Overall divergence is greater for the three Neandertal genomes (modes ~11%), whereas the mode is ~9% for the San and ~8% for the other present-day humans. For the Neandertals, 13% of windows have a divergence above 20%, whereas this is the case for 2.5% to 3.7% of windows in present-day humans.
Segments of Neandertal genome in non-African genomes
1. Statistic: D = difference in polymorphic alleles shared with Neandertals between two humans.
2. D(Asia-Europe) = -0.53, P=0.25
3. D(Europe-Africa) = 4.57, D(Asia-Africa) = 4.81, both P < 10^-12
4. The greater genetic proximity of Neandertals to Europeans and Asians than to Africans is seen no matter how we subdivide the data: (i) by individual pairs of humans, (ii) by chromosome, (iii) by transitions or transversions, (iv) by hypermutable CpG versus all other sites, (v) by Neandertal sequences shorter or longer than 50 bp.
5. Contamination of the Neandertal sequences with non-African DNA? However, the magnitudes of contamination necessary to explain the Europe-Africa and Asia-Africa comparisons are both over 10%, and thus inconsistent with the estimates of contamination in the Neandertal data, which are all below 1%.
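The slides do not spell out the formula for D. A common form of such a statistic, assumed here, is the normalized difference between two counts: the number of informative sites at which the Neandertal matches the first human, and the number at which it matches the second. The Python sketch below uses that form; the function name and the counts are hypothetical and are not taken from the Neandertal data.

```python
def d_statistic(n_match_h1, n_match_h2):
    """Normalized difference in allele sharing with the Neandertal.

    n_match_h1: informative sites where the Neandertal matches human 1 but not human 2
    n_match_h2: informative sites where the Neandertal matches human 2 but not human 1
    Assumed form (not stated on the slides): D = (n1 - n2) / (n1 + n2).
    """
    total = n_match_h1 + n_match_h2
    if total == 0:
        raise ValueError("no informative sites")
    return (n_match_h1 - n_match_h2) / total


# Hypothetical counts: human 1 shares slightly more alleles with the Neandertal
d = d_statistic(1048, 952)
print(f"D = {100 * d:.2f}%")  # D = 4.80%
```

Under this form, D is close to zero when the Neandertal is equally close to both humans, as in the Asia-Europe comparison, and positive when it is closer to the first, as in the Europe-Africa and Asia-Africa comparisons.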
?
1. Comparison with the HapMap sequences and 5 newly sequenced individuals
2. No comparison within Eurasia (Papuan-French-Han) or within Africa (Yoruba-San) shows significant skews in D
3. All comparisons of non-Africans and Africans show that the Neandertal is closer to the non-Africans
4. All or almost all the gene flow detected was from Neandertals into modern humans
5. Some old haplotypes most likely owe their presence in non-Africans to gene flow from Neandertals
Four processes potentially accounting for the data
Between 1 and 4% of the genomes of people in Eurasia are derived from Neandertals
1. From erectus to Neandertal
2. From late Neandertals into the first Europeans
3. From early Neandertals into the first Eurasians
4. Ancient population structure, preserved from before the Neandertal – modern sapiens separation
Enza Colonna
And if you want to read more about all this
Trends in Genetics, July 2010