Lisbon genome diversity


Page 1: Lisbon genome diversity

Human genome diversity: Frequently asked questions

Guido Barbujani

Dipartimento di Biologia ed Evoluzione, Università di Ferrara


Page 2: Lisbon genome diversity

A few human genome statistics

From www.ensembl.org version 57.37b (Jan. 2010)

Total size: 3 272 480 987 (haploid)
N of protein-coding genes: 22 320
N of RNA-coding genes: 9 922
N of gene exons: 530 906
N of transcripts: 142 707
N of segregating sites: 15 040 632
Nucleotide differences with chimp: 1.23%
Chimp orthologue genes: 13 454
Human genes missing in chimp: 36 totally, 17 largely
Classes of genes with max. differences: immune response, reproduction, olfaction

Page 3: Lisbon genome diversity

Human-mouse alignment

Page 4: Lisbon genome diversity

Human-chimp alignment

Chimp chromosomes 2 and 2a

The human genome is very similar to the chimpanzee genome

Page 5: Lisbon genome diversity

Phylogenetic tree of human (n=70), chimpanzee (n=30), bonobo (n=5), gorilla (n=11) and orang-utan (n=14), based on 10,000 bp sequences of a noncoding Xq13.3 region. Kaessmann et al. (2001).

Individual genetic diversity among humans is the lowest of all primates

Page 6: Lisbon genome diversity

N of markers | Samples | FST | Reference
599,356 SNPs | 209 individuals from 4 populations: Caucasian, Chinese, Japanese, Yoruba | 0.13 | Weir et al. 2005
1,034,741 SNPs | 71 individuals from 4 populations: Caucasian, Chinese, Japanese, Yoruba | 0.10 | Weir et al. 2005
1,007,329 SNPs | 269 individuals from 4 populations: Caucasian, Chinese, Japanese, Yoruba | 0.12 | International HapMap Consortium 2005
443,434 SNPs | 3,845 individuals distributed worldwide | 0.052 | Auton et al. 2009
2,841,354 SNPs | 210 individuals from 4 populations: Caucasian, Chinese, Japanese, Yoruba | 0.11 | Barreiro et al. 2008
243,855 SNPs | 554 individuals from 27 worldwide populations | 0.123 | Xing et al. 2009
100 Alu insertions | 710 individuals from 23 worldwide populations | 0.095 | Watkins et al. 2008
67 CNVs | 270 individuals from 4 populations with ancestry in Europe, Africa or Asia | 0.11 | Redon et al. 2006

Genomic estimates of FST for the global human population are ∼ 0.12

Human populations display ∼ 12% of the maximum possible diversity, given their allele frequencies
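The FST values above measure how much of the allele-frequency variance lies among populations. A minimal sketch, assuming biallelic loci, known population allele frequencies and equal population weights, computes Nei's gene-diversity form FST = (H_T - H_S) / H_T. Published estimates such as those in the table usually correct for sample size (for example with Weir and Cockerham's estimator), so this is an illustration, not a reimplementation; the function names are hypothetical.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) of a biallelic locus under Hardy-Weinberg."""
    return 2.0 * p * (1.0 - p)


def fst_components(freqs):
    """Return (H_S, H_T) for one locus; freqs holds one allele frequency per population."""
    h_s = sum(expected_heterozygosity(p) for p in freqs) / len(freqs)
    h_t = expected_heterozygosity(sum(freqs) / len(freqs))
    return h_s, h_t


def fst(freqs):
    """FST = (H_T - H_S) / H_T for one locus, populations weighted equally."""
    h_s, h_t = fst_components(freqs)
    if h_t == 0.0:
        return 0.0  # monomorphic locus: nothing to partition
    return (h_t - h_s) / h_t


def multilocus_fst(loci):
    """Genome-wide estimate as a ratio of sums across loci, not a mean of per-locus ratios."""
    num = den = 0.0
    for freqs in loci:
        h_s, h_t = fst_components(freqs)
        num += h_t - h_s
        den += h_t
    return num / den


print(fst([0.2, 0.8]))  # moderately differentiated pair of populations
print(fst([0.0, 1.0]))  # populations fixed for different alleles: the maximum, 1.0
```

Identical frequencies give FST = 0 and fixation for different alleles gives 1, which is the sense in which a value near 0.12 means about 12% of the maximum possible differentiation. Summing numerators and denominators across loci keeps nearly monomorphic loci from dominating the genome-wide value.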

Page 7: Lisbon genome diversity

Genetic diversity among human populations is the lowest of all primates

[Figure: FST compared across primates; values shown: 0.38, 0.32 and 0.12]

Factors raising FST: geographically-variable selection, small population sizes, little gene flow, isolation

Factors lowering FST: stabilizing selection, large population sizes, extensive gene flow, admixture

Page 8: Lisbon genome diversity

Li et al. (2009)

Clinal variation in the geographical space is the rule for human populations

Cavalli-Sforza et al. (1994)

Page 9: Lisbon genome diversity

Methods 1: Estimating variances from sequence comparisons

TACGAACATCAGGC
TATGAACATCAGGC
TATGAACATCGGGC

Polymorphic DNA sites
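Using the three aligned sequences on this slide, a short sketch (helper names are hypothetical) finds the polymorphic sites and computes nucleotide diversity, π, the mean number of pairwise differences per site. This is one common way of turning sequence comparisons into a diversity estimate.

```python
from itertools import combinations

# The three aligned sequences shown on the slide
SEQUENCES = ["TACGAACATCAGGC", "TATGAACATCAGGC", "TATGAACATCGGGC"]


def polymorphic_sites(seqs):
    """0-based positions where the aligned sequences do not all carry the same base."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i in range(length) if len({s[i] for s in seqs}) > 1]


def pairwise_differences(a, b):
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def nucleotide_diversity(seqs):
    """Nucleotide diversity (pi): mean pairwise differences divided by sequence length."""
    pairs = list(combinations(seqs, 2))
    mean_diff = sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)
    return mean_diff / len(seqs[0])


print(polymorphic_sites(SEQUENCES))  # [2, 10]
print(nucleotide_diversity(SEQUENCES))
```

The two polymorphic sites are the 3rd (C/T) and 11th (A/G) positions. The three pairs differ at 1, 2 and 1 sites, so π is 4/3 differences over 14 sites.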

Page 10: Lisbon genome diversity

Genetic variances within and between populations

[Figure: variance within Population 1 and Population 2 compared with the variance between populations, on a scale from 0% to 100%; 19% marked]

Page 11: Lisbon genome diversity

Independent studies of genetic variances yield very similar results: 85, 5, 10

Study | N of loci | Within populations | Between populations | Between races or continents
Lewontin (1972) | 17 | 85% | 8% | 6%
Latter (1973) | 18 | 86% | 5% | 9%
Barbujani et al. (1997) | 109 | 85% | 5% | 10%
Jorde et al. (2000) | 100 | 85% | 2% | 13%
Romualdi et al. (2002) | 32 | 83% | 8% | 9%
Rosenberg et al. (2002) | 377 | 93% | 3% | 4%
Excoffier & Hamilton (2003) | 377 | 88% | 3% | 9%
Ramachandran et al. (2005) | 17 | 90% | 5% | 5%
Bastos-Rodrigues et al. (2006) | 40 | 86% | 2% | 12%
Li et al. (2008) | 650,000 | 89% | 2% | 9%
Median | | 85% | 5% | 10%
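The three columns can be illustrated with a hierarchical split of expected heterozygosity at one biallelic locus: within populations (H_S), between populations of the same continent (H_C - H_S) and between continents (H_T - H_C), each divided by the total H_T. This is a simplified sketch with equal weights at each level; the studies in the table used other estimators (for example AMOVA), and the allele frequencies in the example are hypothetical.

```python
def het(p):
    """Expected heterozygosity 2p(1 - p) of a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def partition_diversity(continents):
    """Split one locus's diversity into within-population, between-population
    (within continents) and between-continent shares.

    continents: dict mapping a continent name to the allele frequencies of
    its populations. Populations and continents are weighted equally.
    """
    h_s = mean(mean(het(p) for p in pops) for pops in continents.values())
    continent_freqs = [mean(pops) for pops in continents.values()]
    h_c = mean(het(p) for p in continent_freqs)
    h_t = het(mean(continent_freqs))
    if h_t == 0.0:
        raise ValueError("monomorphic locus: no diversity to partition")
    return {
        "within populations": h_s / h_t,
        "between populations": (h_c - h_s) / h_t,
        "between continents": (h_t - h_c) / h_t,
    }


# Hypothetical allele frequencies: two continents with two populations each
shares = partition_diversity({"A": [0.2, 0.3], "B": [0.7, 0.8]})
print(shares)
```

Because heterozygosity is a concave function of allele frequency, each level with equal weights can only add diversity, so the three shares are non-negative and sum to 1.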

Page 12: Lisbon genome diversity

What does it mean, in practice?

[Figure: total human diversity (100%) compared with the diversity found within a single community (85%)]

Members of our community are only slightly less different from us than members of distant populations

Page 13: Lisbon genome diversity

Mind the numbers

Humans and chimps share >98% of their genomes

Among the 1.8% differences, 1.7% are fixed differences between the two species

The remaining fraction, 0.1%, contains all human genomic variation

The differences among the main continental groups represent 10% of 0.1% of the total, that is, 0.01%

But 0.1% of >3 billion DNA sites means >3 million polymorphic DNA sites (3,213,401 according to Levy et al. 2007)
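The arithmetic on this slide, written out as a small calculation. As an assumption, it takes the haploid size from the Ensembl statistics (3 272 480 987 sites) as the genome length and uses the slide's rounded figures of 0.1% and 10%.

```python
genome_sites = 3_272_480_987  # haploid genome size (Ensembl statistics slide)
polymorphic_share = 0.001     # ~0.1% of sites vary among humans
continental_share = 0.10      # ~10% of that variation lies between continental groups

polymorphic_sites = genome_sites * polymorphic_share
continental_fraction = polymorphic_share * continental_share

print(f"polymorphic sites: about {polymorphic_sites:,.0f}")
print(f"between-continent share of the genome: {continental_fraction:.2%}")
```

This gives about 3.27 million polymorphic sites, of the same order as the 3,213,401 reported by Levy et al. (2007), and 0.01% of the genome for the differences among continental groups.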

Page 14: Lisbon genome diversity

Methods 2: Clustering genotypes or haplotypes

K=3

K=4

Rosenberg et al., 2002
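Rosenberg et al. (2002) used the program STRUCTURE, which estimates K clusters, their allele frequencies and each individual's membership within a Bayesian model. Its core ingredient is the probability of a genotype given a cluster's allele frequencies, assuming Hardy-Weinberg proportions and unlinked loci. A minimal sketch of that ingredient used as a simple assignment test, assuming the cluster frequencies are already known (STRUCTURE estimates them); the cluster names and frequencies are hypothetical.

```python
import math


def genotype_log_likelihood(genotype, freqs):
    """Log-likelihood of a multilocus genotype given one cluster's allele frequencies.

    genotype: copies (0, 1 or 2) of the reference allele at each locus.
    freqs: reference-allele frequency at each locus in the cluster.
    Assumes Hardy-Weinberg proportions and independent loci.
    """
    ll = 0.0
    for g, p in zip(genotype, freqs):
        if not 0.0 < p < 1.0:
            raise ValueError("frequencies must lie strictly between 0 and 1")
        hwe = {0: (1.0 - p) ** 2, 1: 2.0 * p * (1.0 - p), 2: p ** 2}
        ll += math.log(hwe[g])
    return ll


def assign(genotype, clusters):
    """Name of the cluster under which the genotype is most likely."""
    return max(clusters, key=lambda name: genotype_log_likelihood(genotype, clusters[name]))


# Hypothetical clusters (K = 2) described by allele frequencies at three loci
CLUSTERS = {
    "cluster 1": [0.9, 0.8, 0.1],
    "cluster 2": [0.1, 0.3, 0.7],
}
print(assign([2, 2, 0], CLUSTERS))
print(assign([0, 1, 2], CLUSTERS))
```

Unlike this hard assignment, STRUCTURE's admixture model lets an individual draw its alleles from several clusters, which is what the bar plots for K=3 and K=4 display.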

Page 15: Lisbon genome diversity

SNPs

Haplotypes

CNV

Jakobsson et al. (2008)

Structure inferred from SNPs and haplotypes differs from that inferred from Copy Number Variation

Page 16: Lisbon genome diversity

Genes, as well as morphology, suggest inconsistent clusterings of genotypes

[Figure: clusterings of human populations inferred from different markers. Panels: Y chromosome (Romualdi et al. 2002); X chromosome (Wilson et al. 2001); 377 STR loci (Barbujani and Belle 2006); 377 STR loci (Rosenberg et al. 2002). Groups shown include: Europe, Ethiopia; S. Africa; N. Guinea; Asia; Africa; Asia, Europe, Australia, Americas; Americas; Melanesia; Eurasia; N Africa; N America; Maya; E Africa; C Africa; Piapoco; Suruí; Karitiana; Kalash; W. Eurasia; E. Asia; Oceania]

Page 17: Lisbon genome diversity

Sampling assumptions have a large effect on the apparent structuring

Page 18: Lisbon genome diversity

Sampling assumptions have a large effect on the apparent structuring

Serre and Pääbo (2004)

Page 19: Lisbon genome diversity

ASIP A8818G

MATP C374G

Genetic variation is discordant across loci

Page 20: Lisbon genome diversity

Similar skin colors are due to different combinations of alleles

MATP C374G

ASIP A8818G

Page 21: Lisbon genome diversity

16 completely sequenced genomes (as of May 1st, 2010)

And 5 more published on May 6th:

Page 22: Lisbon genome diversity

Two persons from the same continent may share fewer SNPs than persons of different continents

Page 23: Lisbon genome diversity

81% of SNPs are cosmopolitan.

Alleles present in one continent only: 0.91% in Africa, 0.75% in Asia, practically 0 elsewhere.

Jakobsson et al. 2008 (525,910 SNPs, 396 CNVs)
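Figures such as "81% cosmopolitan" come from labelling each allele by the continents where it occurs. A minimal sketch, assuming "cosmopolitan" means present (frequency above zero) in every continental sample and "private" means present in exactly one; the exact definitions used by Jakobsson et al. may differ, and the toy frequencies are hypothetical.

```python
from collections import Counter


def classify_allele(freq_by_continent):
    """Label one allele by where it occurs.

    freq_by_continent maps a continent name to the allele's frequency there.
    """
    present = [c for c, f in freq_by_continent.items() if f > 0]
    if not present:
        return "absent"
    if len(present) == len(freq_by_continent):
        return "cosmopolitan"
    if len(present) == 1:
        return "private to " + present[0]
    return "shared by some continents"


def summarize(alleles):
    """Fraction of alleles falling into each category."""
    counts = Counter(classify_allele(a) for a in alleles)
    return {label: n / len(alleles) for label, n in counts.items()}


# Hypothetical toy data: four alleles, three continents
toy = [
    {"Africa": 0.40, "Asia": 0.10, "Europe": 0.30},
    {"Africa": 0.05, "Asia": 0.00, "Europe": 0.00},
    {"Africa": 0.20, "Asia": 0.60, "Europe": 0.01},
    {"Africa": 0.00, "Asia": 0.30, "Europe": 0.20},
]
print(summarize(toy))
```

With real data the result depends strongly on sample size, since rare alleles may be missed in small continental samples and then look private or absent.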

Page 24: Lisbon genome diversity

In the 117 megabases (Mb) of sequenced exome-containing intervals, the average rate of nucleotide difference between a pair of the Bushmen was 1.2 per kb, compared to an average of 1.0 per kb between a European and Asian individual. Schuster et al. (2010)

Greater differences between Africans than between Europeans and Asians

Page 25: Lisbon genome diversity

Genetic diversity out of Africa is often a subset of the African genetic diversity

Tishkoff et al. (1998)

Page 26: Lisbon genome diversity

LD decreasing with physical distance between loci and with geographic distance from East Africa

Jakobsson et al. 2008
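Linkage disequilibrium between two loci is often summarized by r², the squared correlation between the alleles carried on the same haplotype. A minimal sketch, assuming phased haplotypes with alleles coded 0/1; which LD measure Jakobsson et al. used is not stated on the slide, so r² here is an illustrative choice.

```python
def r_squared(haplotypes):
    """r^2 between two biallelic loci.

    haplotypes: list of (allele_at_A, allele_at_B) pairs, alleles coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


# Hypothetical haplotype samples
complete_ld = [(1, 1), (0, 0), (1, 1), (0, 0)]
no_ld = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(r_squared(complete_ld), r_squared(no_ld))
```

Recombination breaks up allele associations over time, so r² tends to fall as physical distance grows; a longer history with larger population size, as in Africa, leaves less LD at a given distance.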

Page 27: Lisbon genome diversity

Gene diversity declines as a function of distance from Africa

Best fit of the model for an African exit 56,000 years ago

Liu et al. (2006)
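The decline can be summarized by regressing heterozygosity on distance from Africa. A minimal least-squares sketch with hypothetical toy values; Liu et al. (2006) fitted a more detailed, geographically explicit colonization model to estimate the exit time, which this sketch does not attempt.

```python
def least_squares_line(x, y):
    """Slope and intercept of the ordinary least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical toy values, not data from Liu et al. (2006)
distance_km = [0, 5000, 10000, 15000, 20000]
heterozygosity = [0.80, 0.76, 0.72, 0.68, 0.64]

slope, intercept = least_squares_line(distance_km, heterozygosity)
print(f"change per 10,000 km: {slope * 10_000:+.3f}; intercept: {intercept:.3f}")
```

A negative slope is the pattern expected if each new colonization step carried only part of the diversity of the source population.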

Page 28: Lisbon genome diversity

Patterns of morphological and genetic variation are compatible with the effects of dispersal from Africa

Manica et al. (2007)

Page 29: Lisbon genome diversity

Models with an African population replacing previous human continental groups explain the data better than any alternative models

Fagundes et al. (2007)

Page 30: Lisbon genome diversity

To make a long story short:

• The human genome is very similar to the chimpanzee genome
• Individual genetic diversity among humans is the lowest of all primates
• Population differences among humans (FSTs) are the lowest among primates
• Clinal patterns in the geographical space are the rule, not the exception
• Genes, as well as morphology, suggest inconsistent clusterings of genotypes
• Genetic variation is discordant across loci
• Similar skin colors are due to different combinations of alleles
• Two persons from the same continent may be less similar than persons of different continents
• Greater differences between Africans than between Europeans and Asians
• Alleles in non-African populations are often a subset of African alleles
• Models with an African population replacing previous human continental groups explain the data better than any alternative models
• Gradients of gene diversity correlated with distance from Africa

So, what happened?

Page 31: Lisbon genome diversity

100,000 years ago

Page 32: Lisbon genome diversity

70,000 years ago

Page 33: Lisbon genome diversity

60,000 years ago

Page 34: Lisbon genome diversity

30,000 years ago

Page 35: Lisbon genome diversity

10,000 years ago

Page 36: Lisbon genome diversity
Page 37: Lisbon genome diversity

Preliminary analyses

1. In all samples, tests for the presence of Neandertal mtDNA
2. mtDNA from Vi33-26 and Vi33-16 identical, but diff(within) < diff(between)
3. Between 95% and 99% of sequences from non-primates
4. Enrichment of libraries with enzymes preferentially cutting bacterial genomes

Page 38: Lisbon genome diversity

Divergence from modern humans

Neandertals fall inside the variation of present-day humans. Overall divergence is greater for the three Neandertal genomes (modes ~11%) than for the San (mode ~9%) and the other present-day humans (~8%). For the Neandertals, 13% of windows have a divergence above 20%, whereas this is the case for 2.5% to 3.7% of windows in present-day humans.

Page 39: Lisbon genome diversity

Segments of Neandertal genome in non-African genomes

1. Statistic: D = difference between two humans in the number of polymorphic alleles they share with Neandertals.

2. D(Asia-Europe) = -0.53, P=0.25

3. D(Europe-Africa) = 4.57, D(Asia-Africa) = 4.81, both P < 10^-12

4. The greater genetic proximity of Neandertals to Europeans and Asians than to Africans is seen no matter how we subdivide the data: (i) by individual pairs of humans, (ii) by chromosome, (iii) by transitions or transversions, (iv) by hypermutable CpG versus all other sites, (v) by Neandertal sequences shorter or longer than 50 bp.

5. Contamination of the Neandertal sequences with non-African DNA? The contamination needed to explain the Europe-Africa and Asia-Africa comparisons would exceed 10% in both cases, which is inconsistent with the estimates of contamination in the Neandertal data, all below 1%.
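D compares, at sites where the Neandertal carries a derived allele (one differing from the chimpanzee), how often that allele is shared with one human rather than the other; the approach is often called the ABBA-BABA test. A minimal sketch, assuming one allele per individual per site, chimpanzee as outgroup and a sign convention chosen for illustration (positive means closer to the first human). D lies between -1 and 1, so the values on this slide are presumably percentages; significance testing is not shown, and the toy sites are hypothetical.

```python
def d_statistic(sites):
    """D for two humans (h1, h2) relative to a Neandertal, with chimp as outgroup.

    sites: iterable of (h1, h2, neandertal, chimp) alleles at biallelic sites.
    Informative sites: the Neandertal carries the derived allele and exactly
    one of the two humans shares it.
    Returns (n_h1 - n_h2) / (n_h1 + n_h2).
    """
    n_h1 = n_h2 = 0
    for h1, h2, nea, chimp in sites:
        if nea == chimp:
            continue  # Neandertal carries the ancestral allele: uninformative
        if h1 == nea and h2 == chimp:
            n_h1 += 1
        elif h2 == nea and h1 == chimp:
            n_h2 += 1
    total = n_h1 + n_h2
    if total == 0:
        raise ValueError("no informative sites")
    return (n_h1 - n_h2) / total


# Hypothetical toy sites: derived allele "A", ancestral (chimp) allele "G"
toy_sites = [("A", "G", "A", "G")] * 3 + [("G", "A", "A", "G")]
print(d_statistic(toy_sites))  # 0.5
```

Under a model with no gene flow and no ancient structure, both humans are expected to share derived Neandertal alleles equally often, so D should be close to 0; multiply by 100 to express it as a percentage.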


Page 40: Lisbon genome diversity

1. Comparison with the HapMap sequences and 5 newly sequenced individuals

2. No comparison within Eurasia (Papuan-French-Han) or within Africa (Yoruba-San) shows significant skews in D

3. All comparisons of non-Africans and Africans show that the Neandertal is closer to the non-Africans

4. All or almost all the gene flow detected was from Neandertals into modern humans

5. Some old haplotypes most likely owe their presence in non-Africans to gene flow from Neandertals

Page 41: Lisbon genome diversity

Four processes potentially accounting for the data

Between 1 and 4% of the genomes of people in Eurasia are derived from Neandertals

1. From erectus to Neandertal

2. From late Neandertals into the first Europeans

3. From early Neandertals into the first Eurasians

4. Ancient population structure, preserved from before the Neandertal – modern sapiens separation

Page 42: Lisbon genome diversity

Enza Colonna

And if you want to read more about all this

Trends in Genetics, July 2010