1 Population Genomics Gil McVean, Department of Statistics, Oxford.

Posted on 20-Jan-2016


1

Population Genomics

Gil McVean, Department of Statistics, Oxford

2

Questions about genetic variation

• How different are our genomes?

• How is the variation distributed within and between genomes?

• What does variation tell us about human evolution?

3

How different are our genomes?

4

Serological techniques for detecting variation

(Figure: serological detection of variation; labels Human and Rabbit; ABO blood groups A, B, AB and O)

5

Blood group systems in humans

• 28 known systems
– 39 genes, 643 alleles

System               Genes               Alleles
ABO                  ABO                 102
Chido-Rodgers        C4A, C4B            7+
Colton               AQP1                7
Cromer               DAF                 10
Diego                SLC4A1              78
Dombrock             DO                  9
Duffy                FY                  9
Gerbich              GYPC                9
GIL                  AQP3                2
H/h                  FUT1, FUT2          27/22
I                    GCNT2               7
Indian               CD44                2
Kell                 KEL, XK             33/30
Kidd                 SLC14A1             8
Knops                CR1                 24+
Landsteiner-Wiener   ICAM4               3
Lewis                FUT3, FUT6          14/20
Lutheran             LU                  16
MNS                  GYPA, GYPB, GYPE    43
OK                   BSG                 2
P-related            A4GALT, B3GALT3     14/5
RAPH-MER2            CD151               3
Rh                   RHCE, RHD, RHAG     129
Scianna              ERMAP               4
Xg                   XG, CD99            -
YT                   ACHE                4

http://www.bioc.aecom.yu.edu/bgmut/summary.htm

6

Protein electrophoresis

• Changes in mass/charge ratio resulting from amino acid substitutions in proteins can be detected

• In humans, about 30% of all loci are polymorphic, and there is a 6% chance that a pair of alleles drawn at random at a locus are different

(Figure: proteins with different net charges (+/-) separating as they travel through a starch or agar gel; direction of travel shown)

Lewontin and Hubby (1966)

Harris (1966)
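The 30% and 6% figures are two different summaries. The first is the share of loci that are polymorphic. The second is the average expected heterozygosity H = 1 - Σp², which is the chance that two randomly drawn alleles differ. Below is a minimal Python sketch of both. The allele frequencies for ten loci are hypothetical, chosen only to produce numbers of that size. The 0.99 cut-off for calling a locus polymorphic is a common convention assumed here; it is not taken from the slide.

```python
def expected_heterozygosity(freqs):
    """H = 1 - sum(p_i^2): chance two alleles drawn at random (with replacement) differ."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


def mean_heterozygosity(loci):
    """Average H over loci; monomorphic loci contribute H = 0."""
    return sum(expected_heterozygosity(f) for f in loci) / len(loci)


def fraction_polymorphic(loci, threshold=0.99):
    """Share of loci whose commonest allele is below `threshold` (assumed cut-off)."""
    return sum(max(f) < threshold for f in loci) / len(loci)


# Hypothetical allele frequencies for ten loci (illustrative, not real data):
# seven monomorphic, three polymorphic.
loci = [[1.0]] * 7 + [[0.9, 0.1], [0.95, 0.05], [0.8, 0.2]]

print(f"polymorphic loci: {fraction_polymorphic(loci):.0%}")    # 30%
print(f"mean heterozygosity: {mean_heterozygosity(loci):.4f}")  # 0.0595
```

Most loci contribute nothing to the average. A modest share of polymorphic loci therefore translates into a low overall chance that two alleles differ.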

7

The rise of DNA sequence analysis

• Mitochondrial DNA RFLPs
– Cann et al (1987)

• Sequencing of small mtDNA regions
– Vigilant et al (1991)

• Whole mitochondrial genome sequencing
– Ingman et al (2000)

8

The human genomes…

• The draft human genome sequence was published in 2001
– It is a mosaic of DNA from several individuals

• Since then, several more genomes have been sequenced, at least partially
– Shotgun sequencing has been used for variation discovery

• Other methods have been developed to look for gross chromosomal differences

(Figure: NimbleGen array CGH)

9

The International HapMap Project

• Launched in 2002 with the goal of characterising single nucleotide variation between 540 human genomes from individuals of European, Nigerian, Chinese and Japanese ancestry

• It is not a sequencing project; rather, it genotypes known polymorphisms

• It has so far assembled information on over 6 million SNPs (single nucleotide polymorphisms)

10

The 1000 Genomes Project

11

How do we differ? – Let me count the ways

• Single nucleotide polymorphisms
– 1 every few hundred bp

• Short indels (insertions/deletions)
– 1 every few kb

• Microsatellite (STR) repeat number
– 1 every few kb

• Minisatellites
– 1 every few kb

• Repeated genes
– rRNA, histones

• Large inversions and deletions
– Y chromosome, Copy Number Variants (CNVs)

TGCATTGCGTAGGCTGCATTCCGTAGGC

TGCATT---TAGGCTGCATTCCGTAGGC

TGCTCATCATCATCAGCTGCTCATCA------GC

≤100bp

1-5kb
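The first two example sequences differ only by a 3 bp gap. Below is a minimal Python sketch that splits the differences in a pairwise alignment into substitutions (SNP-like differences) and indels. It assumes the sequences are already aligned, with "-" marking gaps. It does not try to recognise microsatellite or minisatellite repeat units.

```python
def classify_differences(seq_a, seq_b, gap="-"):
    """Split differences between two aligned sequences into substitutions and indels.

    Returns (substitutions, indels): substitution positions are 0-based;
    each indel is (start, length) for a run of columns with a gap in exactly
    one of the two sequences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    substitutions, indels = [], []
    run_start = None  # start of the current indel run, if any
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        if (a == gap) != (b == gap):  # gap in one sequence only
            if run_start is None:
                run_start = i
            continue
        if run_start is not None:  # an indel run has just ended
            indels.append((run_start, i - run_start))
            run_start = None
        if a != gap and a != b:  # both have a base, and the bases differ
            substitutions.append(i)
    if run_start is not None:  # alignment ends inside an indel
        indels.append((run_start, len(seq_a) - run_start))
    return substitutions, indels


ref = "TGCATTGCGTAGGCTGCATTCCGTAGGC"
alt = "TGCATT---TAGGCTGCATTCCGTAGGC"
print(classify_differences(ref, alt))  # ([], [(6, 3)])
```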

12

Y chromosome variation

• Non-pathological rearrangements of the AZFc region on the Y chromosome

Tyler-Smith and McVean (2003)

13

Mutation is the ultimate source of variation

• New mutations occur in the germ-line

• Point mutations occur at about 2×10⁻⁸ per nucleotide per generation
– You pass on about 60 new mutations to each of your children, of which perhaps 1 changes the protein sequence encoded by a gene

• Microsatellite mutations can occur much faster
– Up to 10⁻⁴ per generation
– Some, e.g. in Huntington’s disease, have important consequences

• Minisatellites can mutate at rates of up to 10⁻¹ per generation
– The uniqueness of these patterns underlies DNA fingerprinting

• Most of the differences between genomes are the result of inheriting mutations from our ancestors
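The figure of about 60 new mutations comes from multiplying the per-nucleotide rate by the size of the haploid genome a parent transmits. Below is a short Python worked example. The genome size (3.0×10⁹ bp) and the protein-coding fraction (about 1.5%) are assumptions added here. Treating the count as Poisson assumes that mutations arise independently.

```python
import math

MU = 2e-8                # point mutations per nucleotide per generation (from the slide)
HAPLOID_GENOME = 3.0e9   # assumed haploid genome size, bp
CODING_FRACTION = 0.015  # assumed fraction of the genome that codes for protein

expected_new = MU * HAPLOID_GENOME                # new mutations per genome copy passed on
expected_coding = expected_new * CODING_FRACTION  # of which land in coding sequence

# Treating the count as Poisson, the chance that none falls in coding sequence:
p_none_coding = math.exp(-expected_coding)

print(f"new mutations per transmitted genome: {expected_new:.0f}")  # 60
print(f"expected in coding sequence: {expected_coding:.1f}")        # 0.9
print(f"P(no coding mutation): {p_none_coding:.2f}")                # 0.41
```

Not every coding mutation changes the amino acid, because synonymous changes do not. The number that alter a protein is therefore somewhat lower than the coding count, in line with the slide's "perhaps 1".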

14

(Figure: our genealogical tree; mutations in our ancestors are inherited as differences between our genomes)

15

Different, but not that different

• Humans are one of the least genetically diverse species (cheetahs are even less diverse)

Species                 Diversity (%)
Humans                  0.08-0.1
Chimpanzees             0.12-0.17
Drosophila simulans     2
E. coli                 5
HIV-1                   30

Photos from UN photo gallery www.un.org/av/photo
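This sketch assumes "diversity" in the table means the average proportion of sites that differ between two randomly chosen sequences (nucleotide diversity). That is the same average-pairwise-differences measure used on the following slides. On that reading, the value can be computed directly from an alignment. Below is a minimal Python sketch, assuming gap-free aligned sequences. The four short sequences are made up for illustration.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return differences / (len(pairs) * length)


# Hypothetical aligned sequences (illustrative only)
seqs = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACGTACGTAC"]
print(f"diversity: {100 * nucleotide_diversity(seqs):.1f}%")  # 10.0%
```

On this scale, human diversity of about 0.1% means that two copies of the genome differ at roughly one site per kb.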

16

An aside on the genetics of race

• It is sometimes claimed that there is a ‘genetic basis to race’

• What is true is that groups of individuals from different parts of the world tend to have similar genomes because they share recent ancestry

• But there are very few ‘fixed’ genetic differences between populations (I can think of one example – the FY gene)

• The differences between populations lie mainly in the combinations of variants

Rosenberg et al (2002)

17

How is genetic variation distributed within and between genomes?

18

Diversity is not evenly distributed across the genome I

Genome          Average pairwise differences per kb    Relative copy number
Autosomes       0.5-0.85                               1
X chromosome    0.47                                   3/4
Y chromosome    0.15                                   1/4
mtDNA           2.8                                    1/4

TISMWG (2001), Jobling et al (2004)

• Autosomes, sex chromosomes and mtDNA have systematically different levels of diversity

• This reflects differences in the number of copies of each chromosome carried in the population and in the mutation rate
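Under a simple neutral model, expected diversity is proportional to the number of copies of a chromosome in the population multiplied by its mutation rate. Below is a rough Python check against the table. It assumes equal mutation rates on the autosomes, X and Y, and ignores selection and sex differences in mutation rate. It scales the autosomal range by relative copy number, then reads the ratio of observed to expected diversity as an implied relative mutation rate.

```python
AUTOSOMAL_RANGE = (0.5, 0.85)  # average pairwise differences per kb (table above)

# genome: (observed differences per kb, relative copy number), from the table
TABLE = {
    "X chromosome": (0.47, 0.75),
    "Y chromosome": (0.15, 0.25),
    "mtDNA": (2.8, 0.25),
}


def implied_rate_range(observed, copies, autosomal=AUTOSOMAL_RANGE):
    """Observed / expected diversity, where expected = copies * autosomal value.

    Returns (low, high) as multiples of the autosomal mutation rate."""
    expected_low, expected_high = (copies * a for a in autosomal)
    return observed / expected_high, observed / expected_low


for genome, (observed, copies) in TABLE.items():
    low, high = implied_rate_range(observed, copies)
    print(f"{genome}: implied relative mutation rate {low:.1f}-{high:.1f}x")
```

The X and Y ranges straddle 1, so copy number alone roughly accounts for their lower diversity. mtDNA comes out at roughly 13 to 22 times the autosomal rate, which points to a much faster mitochondrial mutation rate.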

19

Diversity is not evenly distributed across the genome II

TISMWG (2001)

(Figure: diversity along chromosome 6, with the HLA region marked)

• There are fluctuations in the level of variation across the genome
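Fluctuations like these are usually displayed by counting variants in windows along a chromosome. Below is a minimal Python sketch, assuming the SNP positions (in bp) are already known. The window and step sizes are arbitrary choices. Real analyses would also correct for how much of each window was actually sequenced.

```python
from bisect import bisect_left


def snp_density(positions, chrom_length, window=100_000, step=50_000):
    """Return (window_start, SNPs per kb) for sliding windows along a chromosome.

    Windows are [start, start + window), truncated at chrom_length."""
    positions = sorted(positions)
    densities = []
    for start in range(0, chrom_length, step):
        end = min(start + window, chrom_length)
        count = bisect_left(positions, end) - bisect_left(positions, start)
        densities.append((start, 1000 * count / (end - start)))
    return densities


# Hypothetical SNP positions on a 200 bp stretch, windows of 100 bp
print(snp_density([10, 20, 30, 150], chrom_length=200, window=100, step=100))
# [(0, 30.0), (100, 10.0)]
```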

20

Diversity is not evenly distributed across genes I

• Purifying selection eliminates deleterious mutations and reduces diversity in regions of strong functional constraint

(Figure: SNPs per 10 kb in intergenic, intronic, exonic and UTR sequence)

Zhao et al (2003)

21

Diversity is not evenly distributed across genes II

• Adaptive evolution ‘wipes out’ diversity nearby due to the hitch-hiking effects of a selective sweep

– e.g. the Duffy-null allele in sub-Saharan Africa, which protects against P. vivax malaria

(Figure: haplotypes around the Duffy (FY) locus in European and African samples (Pop1, Pop2), with the FY*O mutation marked; legend: ancestral allele, derived allele, missing data)

Hamblin and Di Rienzo (2000)

22

Diversity is not evenly distributed across genes III

• Some genes are under balancing or diversifying selection, where diversity is actively selected for

– MHC: heterozygote advantage and frequency-dependent selection driven by the recognition of pathogens

Horton et al (1998)
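Heterozygote advantage alone is enough to keep two alleles in the population indefinitely. Below is a minimal Python sketch of the standard one-locus, two-allele model; the model is an assumption here, since the slide does not specify one. With fitnesses 1 - s, 1 and 1 - t for AA, Aa and aa, the frequency of A converges to t / (s + t) from either side. The sketch ignores drift and the frequency-dependent selection also mentioned above.

```python
def next_frequency(p, s, t):
    """One generation of selection on allele A with fitnesses AA: 1-s, Aa: 1, aa: 1-t."""
    q = 1.0 - p
    mean_fitness = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)
    # An A allele sits in an AA homozygote (weight p) or an Aa heterozygote (weight q).
    return p * (p * (1 - s) + q) / mean_fitness


def iterate(p, s, t, generations):
    """Apply deterministic selection for the given number of generations."""
    for _ in range(generations):
        p = next_frequency(p, s, t)
    return p


s, t = 0.1, 0.3  # hypothetical selection coefficients
for start in (0.01, 0.99):
    print(f"start {start}: after 2000 generations p = {iterate(start, s, t, 2000):.4f}")
print(f"predicted equilibrium t / (s + t) = {t / (s + t):.4f}")  # 0.7500
```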

23

Diversity is not evenly distributed across populations I

• African populations are more diverse than non-African populations
– More polymorphisms
– Polymorphisms at less skewed frequencies

• These differences reflect bottlenecks associated with the colonisation out of Africa c. 65 KYA

Population        Segregating sites per kb (n = 30)   Diversity (%)   Tajima's D
Hausa (African)   4.8                                 0.11            -0.33
Italian           3.2                                 0.10            1.18
Chinese           3.0                                 0.07            1.19

Frisse et al (2001)
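Tajima's D compares two estimates of diversity: one from the mean number of pairwise differences (π) and one from the number of segregating sites (S). An excess of rare variants makes D negative, while a shortage of rare variants (as expected after a bottleneck) makes it positive. Below is a minimal Python sketch of Tajima's standard formula. S and π here are totals for the region, not per-kb values, so the table's numbers cannot be plugged in without knowing how much sequence was analysed. The function returns NaN when there is no variation or fewer than four sequences, because the variance term can vanish. The toy alignments are hypothetical.

```python
import math
from itertools import combinations


def tajimas_d(n, S, pi):
    """Tajima's D for n sequences with S segregating sites and mean pairwise differences pi."""
    if n < 4 or S == 0:
        return float("nan")  # undefined: no variation, or variance term can be zero
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


def summary_statistics(seqs):
    """Return (n, S, pi) for gap-free aligned sequences."""
    n = len(seqs)
    S = sum(len(set(column)) > 1 for column in zip(*seqs))
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    return n, S, pi


# Hypothetical samples: one variant carried by 1 of 4 sequences (rare)
# or by 2 of 4 (intermediate frequency).
rare = ["AAAA", "AAAA", "AAAA", "TAAA"]
intermediate = ["AAAA", "AAAA", "TAAA", "TAAA"]
print(f"rare variant:         D = {tajimas_d(*summary_statistics(rare)):.2f}")
print(f"intermediate variant: D = {tajimas_d(*summary_statistics(intermediate)):.2f}")
```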

24

mtDNA phylogeography

Ingman et al (2000)

(Figure: mtDNA phylogeny with African and non-African lineages marked)

25

The colonisation process as inferred from mtDNA variation

26

What does genetic variation tell us about human evolution?

• Modern humans appear in the fossil record about 200K years ago

• Mitochondrial Eve dates back to about 150K years ago

• Y-chromosome Adam dates back to about 70K years ago

• For most of our genome, however, the common ancestor is about 500K – 1M years ago

– This predates the origin of Homo sapiens considerably
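The gap between the mtDNA/Y dates and the autosomal dates is roughly what the standard neutral coalescent predicts. Under that model, the expected time to the most recent common ancestor of n sampled copies is 4N(1 - 1/n) generations for an autosome. For mtDNA and the Y chromosome it is scaled down by their relative copy number (1/4, as in the copy-number table above). Below is a minimal Python sketch. The effective population size of 10,000 and the 25-year generation time are assumptions added here, not values from the slides. The result is an average; the TMRCA of any single locus, such as mtDNA or the Y, varies widely around it.

```python
def expected_tmrca_years(n, ne=10_000, copy_number=1.0, generation_years=25):
    """Expected TMRCA of n sampled copies under a constant-size neutral coalescent.

    ne: assumed diploid effective population size; copy_number: relative number
    of copies of the locus (1 for autosomes, 1/4 for mtDNA and Y)."""
    generations = 4 * ne * copy_number * (1 - 1 / n)
    return generations * generation_years


n = 1000  # a large sample, so 1 - 1/n is close to 1
print(f"autosomes:  {expected_tmrca_years(n):,.0f} years")
print(f"mtDNA or Y: {expected_tmrca_years(n, copy_number=0.25):,.0f} years")
```

With these assumptions, the autosomal average comes out near 1 million years, at the upper end of the 500K to 1M range on the slide. The mtDNA/Y average is four times more recent.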

27

(Figure: timeline marking the human-chimp split, the autosomal MRCA and the origin of H. sapiens)

28

Did early humans interbreed with Neanderthals?

Ovchinnikov et al (2000)

Neanderthals

mtDNA sequences say no…

29

But…

• There is some evidence for interbreeding: unusual haplotypes found in Europe are composed of SNPs not found in non-European populations

Plagnol and Wall (2006)

30

Deeper trees in the human genome

• There is growing evidence that some regions of our genome have truly ancient common ancestors

• Dystrophin has an ancient haplotype found primarily outside Africa, suggesting a colonisation more than 160 KYA

• There is an inversion, found primarily in Europeans, that is roughly 3 million years old

Stefansson et al (2005)

(Figure: Haplotype 1 and Haplotype 2)

31

What are the genetic differences that make us human?

32

Chromosomal changes

• Human chromosome 2 arose from the fusion of two chromosomes that remain separate in the other great apes

• There are several inversion differences between human and chimpanzee chromosomes

Feuk et al (2005)

33

Gene loss

• Loss of the enzyme (CMAH) that makes the sialic acid Neu5Gc
– Sialic acids are sugars on the cell surface that mediate a variety of recognition events involving pathogenic microbes and toxins

• Myosin heavy chain (MYH16)
– Associated with gracilization

Wang et al (2006)

34

Gene evolution

• FOXP2 is a gene that is highly conserved across mammals and expressed in the brain. Mutations in the gene in humans are associated with specific language impairment

• Across the entire mammalian phylogeny, there have been very few amino acid-changing substitutions

• However, two amino acid changes have become fixed in the lineage leading to modern humans since the split with the chimpanzee lineage

Enard et al. (2002)

35

What are the genetic differences that make people and peoples different?

36

Detecting recent adaptive evolution

• Let’s look closely at the dynamics of the fixation process for adaptive mutations

• The fixation of a beneficial mutation is associated with a change in the patterns of linked neutral genetic variation

• This is known as the hitch-hiking effect (Maynard Smith and Haigh 1974)

• Looking for the signature of hitch-hiking can be a good way of detecting very recent fixation events

37

Long haplotypes

• A selective sweep at the Lactase gene in Europeans
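One widely used way to quantify a "long haplotype" is extended haplotype homozygosity (EHH). Take the chromosomes carrying a given allele at a core site; EHH is the probability that two of them chosen at random are identical across all markers from the core out to a given distance. An allele that has swept rapidly to high frequency sits on long, little-broken haplotypes, so its EHH stays high further from the core. The slide does not say which statistic was used. Below is a minimal Python sketch on hypothetical 0/1 haplotypes.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, allele, stop):
    """Extended haplotype homozygosity for carriers of `allele` at index `core`,
    over the markers from `core` to `stop` inclusive (stop may lie on either side)."""
    carriers = [h for h in haplotypes if h[core] == allele]
    if len(carriers) < 2:
        raise ValueError("need at least two carriers of the core allele")
    left, right = sorted((core, stop))
    counts = Counter(h[left:right + 1] for h in carriers)
    identical_pairs = sum(comb(c, 2) for c in counts.values())
    return identical_pairs / comb(len(carriers), 2)


# Hypothetical phased haplotypes at 7 markers; the core SNP is index 3.
haplotypes = [
    "0001000", "0001000", "0001000", "1001001",  # carry allele 1 at the core
    "0100110", "1010011", "0110101", "1000110",  # carry allele 0 at the core
]
for stop in (4, 5, 6):
    print(stop, ehh(haplotypes, 3, "1", stop), round(ehh(haplotypes, 3, "0", stop), 3))
```

In this toy data, the haplotypes carrying allele 1 stay identical for longer, so their EHH decays more slowly than for allele 0.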

38

Strong population differentiation

Lamason et al (Science 2005)

• SLC24A5
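A common way to measure population differentiation at a single site is F_ST: the proportion of total heterozygosity that is due to allele-frequency differences between populations. Below is a minimal Python sketch of a simple Nei-style estimate for a biallelic SNP. It weights populations equally and ignores sample-size corrections. The frequencies in the example are hypothetical, not measured values for SLC24A5.

```python
def fst(freqs):
    """F_ST = (H_T - H_S) / H_T for one biallelic site.

    freqs: frequency of one allele in each population (populations weighted equally)."""
    p_bar = sum(freqs) / len(freqs)
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # monomorphic across all populations: no differentiation
    h_within = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total


print(fst([0.5, 0.5]))            # 0.0: identical frequencies
print(round(fst([0.9, 0.1]), 2))  # 0.64: strong differentiation
print(fst([1.0, 0.0]))            # 1.0: fixed difference
```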

39

40

Classes of selected genes

Voight et al. (2005)

41

Reading

• Human genetic variation
– Rosenberg et al. Genetic structure of human populations. Science 2002, 298:2381-2385.
– Conrad et al. A worldwide survey of haplotype variation and linkage disequilibrium in the human genome. Nat Genet 2006, 1251-1260.
– McVean et al. Perspectives on human genetic variation from the International HapMap Project. PLoS Genet 2005, 1:e54.

• The origin of modern humans
– Reed & Tishkoff. African human diversity, origins and migrations. Curr Opin Genet Dev 2006, 16:597-605.
– Jobling et al. Human evolutionary genetics: origins, peoples, and disease. Garland Science, 2004.
– Harding & McVean. A structured ancestral population for the evolution of modern humans. Curr Opin Genet Dev 2004, 14:667-674.

• Natural selection
– Lamason et al. SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans. Science 2005, 310:1782-1786.
– Sabeti et al. Positive natural selection in the human lineage. Science 2006, 312:1614-1620.
– Tishkoff et al. Convergent adaptation of human lactase persistence in Africa and Europe. Nat Genet 2007, 39:31-40.