MSc in Genetics - UABbioinformatica.uab.es/base/documents/masterGP/Population...MSc in Genetics...

Antonio Barbadilla Group Genomics, Bioinformatics & Evolution Institut Biotecnologia I Biomedicina Departament de Genètica i Microbiologia UAB Course 2012-13 1 Population Genomics: Theory MSc in Genetics

Transcript of MSc in Genetics - UABbioinformatica.uab.es/base/documents/masterGP/Population...MSc in Genetics...

Page 1: MSc in Genetics - UABbioinformatica.uab.es/base/documents/masterGP/Population...MSc in Genetics Module Genomics & Proteomics Population Genomics: Theory Antonio Barbadilla 2 Outline

Antonio Barbadilla

Group Genomics, Bioinformatics & Evolution

Institut Biotecnologia I Biomedicina

Departament de Genètica i Microbiologia

UAB

Course 2012-13

Population Genomics: Theory

MSc in Genetics

Outline

Population thinking: population genetics

The explanation of genetic variation: neutral theory and selection

Detecting natural selection at the nucleotide level: The MKT

Measures of nucleotide variation

Linkage disequilibrium

The golden age of population genetics

Cataloguing nucleotide variation at the genome scale

Exercises

Genetic variation is the cornerstone of biological evolution

R. C. Lewontin

A G A G T T C T G C T C G
A G G G T T C T G C G C G
A G T G T T C T G C G C G

Evolution: origin and substitution of genetic variants within populations

Population is the substrate where evolution occurs

Nothing in Biology Makes Sense Except in the Light of Evolution

Theodosius Dobzhansky

Nothing in Evolution Makes Sense Except in the Light of Population Genetics

Michael Lynch

Evolution and population genetics

Mendelian population: a group of interbreeding individuals sharing a common gene pool

•Genetic variation or genetic polymorphism: existence in a population of two or more allelic forms at appreciable frequencies

•Gene or allelic frequency (population property, basic unit of evolution): f(A), the proportion of an allele in the population

One gene, two alleles (A and a): how does allelic frequency change over time?

The Hardy-Weinberg equilibrium law plays the role of the principle of inertia in dynamics: if no force acts on the population, allelic and genotypic frequencies remain unchanged over time.

Allelic frequencies: f(A) = p, f(a) = q

             Sperm A (p)    Sperm a (q)
Egg A (p)    AA: $p^2$      Aa: $pq$
Egg a (q)    Aa: $pq$       aa: $q^2$

The Hardy-Weinberg law assumes that the alleles of an infinite population unite at random to form the genotypes of the next generation.
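
As a minimal sketch of the Hardy-Weinberg expectation, assuming one gene with two alleles and random union of gametes (the function names `hardy_weinberg` and `next_generation_p` are illustrative, not part of the course material), the genotype frequencies and the unchanged allele frequency can be checked numerically:

```python
def hardy_weinberg(p):
    """Expected genotype frequencies for one gene with two alleles, A and a."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    # Random union of gametes: AA = p^2, Aa = 2pq (both orders), aa = q^2
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


def next_generation_p(genotypes):
    """Frequency of allele A among the gametes produced by these genotypes."""
    # Homozygotes AA contribute only A; heterozygotes contribute A half the time
    return genotypes["AA"] + 0.5 * genotypes["Aa"]


genotypes = hardy_weinberg(0.7)      # hypothetical value p = 0.7
p_next = next_generation_p(genotypes)
```

With no force acting, `p_next` equals the starting frequency, which is the "inertia" property described above.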

The central problem of population genetics is the description and explanation of genetic variation within and between populations

Theodosius Dobzhansky

Population genetics

The theory of population genetics

Ronald Fisher J. B. S. Haldane Sewall Wright

Founders of Population Genetics

(1918-1932)

Genetic Drift

Natural Selection

Mutation

Migration

Factors changing gene frequencies in populations

Structure of population genetics

A theory of forces

The struggle for the measurement of genetic variation

The Great Obsession of population genetics (Gillespie 2004): what evolutionary forces led to the observed pattern of genetic variation?

Theories of variation in the 1960s

Classical theory

•Absence of variation

•Purifying selection

•The wild-type genotype is optimal

•H. J. Muller (laboratory geneticist)

•Eugenics

Balance theory

•Ubiquitous variation

•Balancing selection

•There is no wild-type genotype

•Dobzhansky (naturalist)

•Long live diversity! No interference

60-70

•Electrophoretic variation

The struggle for the measurement of genetic variation

Neutral theory of molecular evolution (1968)

Motoo Kimura

Mutations are mainly neutral or strongly deleterious

Tomoko Ohta

Genome variation data

[Figure: DGRP Freeze 1. Patterns of polymorphism and divergence along the chromosome arms (2L, 2R, 3L, 3R and X, from telomere to centromere) in D. melanogaster.]

PopDrowser -> http://popdrowser.uab.cat

Neutral theory of molecular evolution

Schrödinger equation (general)

F = ma (Newton’s dynamics 2nd law)

Nature of genetic variation

Mutation = Individual

Substitution = Population

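Dynamics of neutral substitutions

Neutral theory of molecular evolution. Assumption: new mutations are mainly neutral or strongly deleterious

[Figure: allelic frequency (0 to 1) against time. A neutral mutation that reaches fixation takes on average $\bar{t}_{fix} = 4N$ generations; one that is lost takes on average $\bar{t}_{lost} \approx 2\ln(2N)$ generations. Neutral mutation rate: $\mu$.]

•Polymorphism

•Heterozygosity at equilibrium: $H = \pi = 2 \times 2N\mu = 4N\mu$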
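
The neutral substitution rate used later in this lesson ($K = \mu$) follows from a short standard argument. The sketch below assumes a diploid population of constant size $N$ and a neutral mutation rate $\mu$ per gene copy per generation; the intermediate labels are added for illustration and do not appear on the slide.

```latex
\begin{align*}
\text{new neutral mutations per generation} &= 2N\mu \\
\text{probability that each one is eventually fixed} &= \frac{1}{2N} \\
K &= 2N\mu \times \frac{1}{2N} = \mu
\end{align*}
```

The population size cancels, so under neutrality the rate of substitution depends only on the mutation rate, while the equilibrium heterozygosity $4N\mu$ depends on both $N$ and $\mu$.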

Neutral theory of molecular evolution: polymorphism and divergence are coupled

[Figure: allelic frequency trajectories in species A and species B against time from separation. Each substitution adds to the divergence between the two species.]

Neutral theory of molecular evolution: polymorphism and divergence are coupled

• Divergence between species A and species B increases over time: $D = 2T\mu$, where $T$ is the time from separation

• Polymorphism reaches a dynamic equilibrium: $H = \pi = 4N\mu$

The Myth of Sisyphus and molecular evolution

[Figure: allelic frequency (0 to 1) against time.]

The intellectual elegance of the neutral theory: it plays the role of the null hypothesis

II: Selectively advantageous mutations

Dynamics of selectively advantageous mutations

[Figure: allelic frequency (0 to 1) against time. Labels: $\bar{t} = \dfrac{2}{s}\ln(2N)$ generations; $4Ns$.]
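
To see how different the two dynamics are, the sketch below compares the mean fixation time of a neutral mutation ($4N$ generations) with the approximate fixation time of an advantageous one ($(2/s)\ln(2N)$ generations). The values of `N` and `s` are hypothetical, chosen only for illustration, and the function names are not from the course material.

```python
import math


def neutral_fixation_time(n):
    """Mean time to fixation, in generations, of a neutral mutation: 4N."""
    return 4 * n


def advantageous_fixation_time(n, s):
    """Approximate fixation time, in generations, with selective advantage s: (2/s) ln(2N)."""
    if s <= 0:
        raise ValueError("s must be positive")
    return (2.0 / s) * math.log(2 * n)


N = 10_000   # hypothetical population size
s = 0.01     # hypothetical selective advantage

t_neutral = neutral_fixation_time(N)            # 40000 generations
t_selected = advantageous_fixation_time(N, s)   # roughly 1981 generations
```

Under these assumed values the advantageous mutation sweeps about twenty times faster than a neutral one drifts to fixation.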

How to detect positive (adaptive) selection in a background of neutral mutations?

•Divergence: $K = \mu$

•Polymorphism: neutral heterozygosity $\pi = 4N\mu$

Synonymous and non-synonymous sites

CTT (Leu) -> CTA (Leu): change at a synonymous site

CTT (Leu) -> CCT (Pro): change at a non-synonymous site

McDonald and Kreitman Test (MKT)

How to detect adaptive fixation in a class of sites?

•Neutral hypothesis: polymorphism and divergence are correlated

•Adaptive hypothesis: adaptive fixation uncouples divergence and polymorphism

McDonald, J. H. and M. Kreitman. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351: 652-654.

Fixation by positive (adaptive) selection

Fixation by neutral process

Neutral sites (synonymous sites)

Putatively positively selected sites (non-synonymous sites)

McDonald-Kreitman test (MKT)

Neutral expectation (only neutral fixation), with $\pi = 4N\mu$ and $K = \mu$:

Ratio of divergence: $\omega = \dfrac{K_i}{K_{neut}} = \dfrac{\mu_{neut\_i}}{\mu_{neut}}$

Ratio of polymorphism: $\dfrac{\pi_i}{\pi_{neut}} = \dfrac{4N\mu_{neut\_i}}{4N\mu_{neut}} = \dfrac{\mu_{neut\_i}}{\mu_{neut}}$

Therefore $\dfrac{\pi_i}{\pi_{neut}} = \dfrac{K_i}{K_{neut}}$

i = non-synonymous or putatively selected sites

Extended McDonald-Kreitman test (ext MKT)

Only neutral fixation: $\dfrac{\pi_i}{\pi_{neut}} = \dfrac{K_i}{K_{neut}}$

+ Adaptive fixation: $\dfrac{\pi_i}{\pi_{neut}} < \dfrac{K_i}{K_{neut}}$

MKT chi-square table. Observed (expected) segregating sites

                  Fixed   Polymorphic
Synonymous        Ds      Ps
Non-synonymous    Dn      Pn

•Null hypothesis: the proportion of fixed versus polymorphic mutations is the same for synonymous and non-synonymous mutations.

•G or $\chi^2$ test: variants are classified as (1) fixed or polymorphic and (2) synonymous or non-synonymous

Relative excess of Dn suggests directional selection

Relative excess of Pn suggests directional selection or purifying selection

Adh gene of Drosophila melanogaster

Kreitman, M. 1983. Nucleotide polymorphism at the alcohol dehydrogenase locus of Drosophila melanogaster. Nature 304: 412-417.

Structure of the Adh gene (256 codons)

[Figure: Exon 1, Exon 2, Exon 3 and Exon 4, drawn from 5' to 3'.]

•13 synonymous polymorphisms

•1 non-synonymous polymorphism

Polymorphism and divergence at the Adh locus in the closely related species Drosophila melanogaster, D. simulans and D. yakuba

MKT chi-square table. Observed (expected) segregating sites

                  Fixed        Polymorphic
Non-synonymous    7 (3.2)      2 (5.8)
Synonymous        17 (20.8)    42 (38.2)

G = 7.43 or $\chi^2$ = 8.2 **
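
The test statistics for the Adh table can be reproduced with a short script. This is a sketch under stated assumptions rather than the procedure used on the slide: the function name `mkt` is hypothetical, the G value of 7.43 is matched here by applying Williams' correction to the G statistic, and `alpha` uses the commonly quoted estimator $1 - (D_s P_n)/(D_n P_s)$, which is not defined in the slides.

```python
import math


def mkt(dn, pn, ds, ps):
    """McDonald-Kreitman test on a 2x2 table of fixed (D) and polymorphic (P) counts.

    Assumes all four counts are positive (no zero margins).
    """
    observed = [[dn, pn], [ds, ps]]
    total = dn + pn + ds + ps
    rows = [dn + pn, ds + ps]          # non-synonymous, synonymous
    cols = [dn + ds, pn + ps]          # fixed, polymorphic
    chi2 = 0.0
    g = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            obs = observed[i][j]
            chi2 += (obs - expected) ** 2 / expected
            g += 2.0 * obs * math.log(obs / expected)
    # Williams' correction factor for a 2x2 table
    q = 1.0 + ((total / rows[0] + total / rows[1] - 1.0)
               * (total / cols[0] + total / cols[1] - 1.0)) / (6.0 * total)
    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    # Assumed estimator of the proportion of adaptive substitutions
    alpha = 1.0 - (ds * pn) / (dn * ps)
    return {"chi2": chi2, "g": g, "g_williams": g / q,
            "p_value": p_value, "alpha": alpha}


adh = mkt(dn=7, pn=2, ds=17, ps=42)
```

For the Adh counts this gives a chi-square of about 8.20, an uncorrected G of about 7.91 and a Williams-corrected G of about 7.43, all significant at the 1% level with one degree of freedom.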

Exercise 1: Go to the web page Standard & Generalized McDonald-Kreitman test (http://mkt.uab.cat). Run the examples and try to understand the difference between the standard and the generalized MKT. What is the parameter alpha whose value is reported in the test results? How would you estimate it?

Single nucleotide polymorphism (SNP)

[Figure: a G/C single nucleotide polymorphism.]

Nature and classification of single nucleotide polymorphisms (SNPs)

•A single nucleotide change: transition or transversion | indel

•Changes of two or a few nucleotides, or indels -> simple nucleotide polymorphism

Classification

•Coding SNP: synonymous, or non-synonymous (replacement)

•Non-coding SNP: CNS, 5' and 3' UTR, intron, 5' and 3' intergenic

Description levels of genetic variation

SNPs BRCA2

Individual 1 acgtagcatcgtatgcgttagacgggggggtagcaccagtacag

Individual 2 acgtagcatcgtatgcgttagacgggggggtagcaccagtacag

Individual 3 acgtagcatcgtatgcgttagacgggggggtagcaccagtacag

Individual 4 acgtagcatcgtttgcgttagacgggggggtagcaccagtacag

Individual 5 acgtagcatcgtttgcgttagacgggggggtagcaccagtacag

Individual 6 acgtagcatcgtttgcgttagacggcatggcaccggcagtacag

Individual 7 acgtagcatcgtttgcgttagacggcatggcaccggcagtacag

Individual 8 acgtagcatcgtttgcgttagacggcatggcaccggcagtacag

Individual 9 acgtagcatcgtttgcgttagacggcatggcaccggcagtacag

one-dimensional: SNP to SNP

multi-dimensional: Haplotype

[Figure: SNPs per 10,000 bases along human chromosome 6, with the HLA region labelled (2001 Nature 409: 928-941).]

Functional genome regions

• Number of segregating sites per nucleotide (Watterson 1975): $S/m$

• Watterson $\theta$ estimator (1975):

$\theta_W = \dfrac{S/m}{\sum_{i=1}^{n-1} 1/i}$

• $\pi$, nucleotide diversity or expected nucleotide heterozygosity (Tajima 1983): average number of differences per site between pairs of randomly sampled sequences

$\pi = \dfrac{1}{\binom{n}{2}\, m} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} k_{ij}$

S = number of segregating sites
m = number of nucleotides analyzed
n = sample size (number of sequences)
$k_{ij}$ = number of differences between sequences i and j

Site frequency spectrum

Nucleotide variation in the Rhodopsin 3 gene of Drosophila simulans

Data summary: sample of n = 5 sequences; length m = 500 nucleotides. The 16 segregating sites are shown, with the number of pairwise differences at each site in the last row.

Site     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Seq 1    T  C  T  A  C  C  T  C  C  T  C  G  G  T  T  A
Seq 2    T  C  C  T  A  C  C  T  C  C  T  G  G  T  T  T
Seq 3    C  T  C  C  C  C  C  T  C  T  T  T  G  C  T  A
Seq 4    C  T  C  C  C  C  C  T  T  C  T  G  A  C  T  T
Seq 5    C  T  C  C  C  T  C  T  T  T  T  G  G  C  C  A
Diffs    6  6  4  7  4  4  4  4  6  6  4  4  4  6  4  6

•Number of segregating sites per nucleotide: $S/m$ = 16/500 = 0.0320

•Watterson estimator: $\theta_W$ = (16/500)/(1 + 1/2 + 1/3 + 1/4) = 0.0154

•Nucleotide diversity: $\pi$ = 79/(500 x 10) = 0.0158
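
The three summary statistics for the Rhodopsin 3 sample can be recomputed directly from the 16 variable sites. This is a minimal sketch: the function name `nucleotide_variation` is hypothetical, only the variable sites are passed in (invariant sites add no differences), and the total length `m` is supplied separately.

```python
from itertools import combinations


def nucleotide_variation(sequences, m):
    """Return (S, theta_W, pi, total pairwise differences) for aligned sequences.

    sequences: equal-length strings holding at least the variable sites.
    m: total number of nucleotides analyzed, including invariant sites.
    """
    n = len(sequences)
    # A site is segregating when more than one nucleotide is observed
    s = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    harmonic = sum(1.0 / i for i in range(1, n))       # 1 + 1/2 + ... + 1/(n-1)
    theta_w = (s / m) / harmonic
    # Sum of k_ij over all pairs of sequences
    total_diffs = sum(
        sum(1 for x, y in zip(a, b) if x != y)
        for a, b in combinations(sequences, 2)
    )
    n_pairs = n * (n - 1) // 2
    pi = total_diffs / (n_pairs * m)
    return s, theta_w, pi, total_diffs


rh3_sites = [
    "TCTACCTCCTCGGTTA",
    "TCCTACCTCCTGGTTT",
    "CTCCCCCTCTTTGCTA",
    "CTCCCCCTTCTGACTT",
    "CTCCCTCTTTTGGCCA",
]
s, theta_w, pi, total_diffs = nucleotide_variation(rh3_sites, m=500)
```

The script recovers S = 16, 79 pairwise differences, a Watterson estimate of about 0.0154 and a nucleotide diversity of 0.0158, matching the values in the data summary.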

Exercise 2: Given the DNA sequence data sets of Figures 1 and 2 below, estimate the most common measures or summary statistics of nucleotide variation ($S/m$, $\theta_W$, $\pi$ and the site frequency spectrum).

Sequence 1 … A … G … C … G …

Sequence 2 … A … G … T … G …

Sequence 3 … A … A … T … T …

Sequence 4 … T … G … T … T …

Sequence 5 … T … G … T … T …

Figure 1

Figure 2

Exercise 2: Given the DNA sequence data set of the figure below, estimate the most common summary statistics of nucleotide variation ($S/m$, $\theta_W$, $\pi$ and the site frequency spectrum).

Sequence 1 … A … G … C … G …

Sequence 2 … A … G … T … G …

Sequence 3 … A … A … T … T …

Sequence 4 … T … G … T … T …

Sequence 5 … T … G … T … T …

Responses

$S/m = 4/m$

$\theta_W = (4/m) / (1 + 1/2 + 1/3 + 1/4) = (4/m) / 2.083 = 1.92/m$

$\pi = (6 + 4 + 4 + 6) / (m \times 10) = 2/m$

Site frequency spectrum

[Figure: number of SNPs against minor allele frequency: 2 SNPs at frequency 0.2 and 2 SNPs at frequency 0.4.]

Software for estimation of nucleotide variation

•DnaSP (DNA Sequence Polymorphism) is a software package for the analysis of nucleotide polymorphism from aligned DNA sequence data.

•Variscan is a software package for the analysis of DNA sequence polymorphisms at the whole genome scale.

•MEGA (Molecular Evolutionary Genetics Analysis) is a software package used for estimating rates of molecular evolution, as well as generating phylogenetic trees and aligning DNA sequences. Available for Windows, Linux and Mac OS X (since ver. 5.x).

•Arlequin3 can be used for calculations of nucleotide diversity and a variety of other statistical tests for intra-population and inter-population analyses. Available for Windows.

First estimates of nucleotide diversity in the Drosophila genus

[Table columns: species; number of genes; non-coding region; coding region (total, synonymous).]

First estimates of nucleotide diversity in the human genome

[Table columns: chromosome (autosomes 1-22); SNP number; density; nucleotide diversity ($\pi$).]

Multidimensional structure of genetic variability

A G A G T T C T G C T C G
A G G G T T C T G C G C G
A G G G T T A T G C G C G
A G A G T T C T G C T C G
A G A G T T C T G C T C G
A G A G T T C T G C T C G
A G A G T T C T G C T C G
A G A G T T C T G C T C G
A G G G T T A T G C G C G
A G G G T T A T G C G C G
A G G G T T A T G C G C G
A G G G T T A T G C G C G
A G G G T T A T G C G C G

Linkage disequilibrium

         B1                        B2                        Total
A1       $p_{11} = p_1 q_1 + D$    $p_{12} = p_1 q_2 - D$    $p_1$
A2       $p_{21} = p_2 q_1 - D$    $p_{22} = p_2 q_2 + D$    $p_2$
Total    $q_1$                     $q_2$                     1

1. $D_{AB} = p_{AB} - p_A p_B$ (Lewontin & Kojima 1960)

2. $D' = D / D_{max}$ (Lewontin 1964)

3. $r^2_{AB} = D^2 / [p_A (1 - p_A)\, q_B (1 - q_B)]$ (Hill & Robertson 1968)

Linkage disequilibrium

         B1    B2    Total
A1       9     1     10
A2       2     4     6
Total    11    5     16

$\hat{p}_1 = n_{1\cdot} / 2n = 10/16 = 0.625$

$\hat{q}_1 = n_{\cdot 1} / 2n = 11/16 = 0.6875$

$D = 0.5625 - 0.625 \times 0.6875 = 0.1328$

$\chi^2 = 5.605 > 3.84$ *

Linkage disequilibrium

$p_1 q_2 = 10/16 \times 5/16 = 0.1953$

$p_2 q_1 = 6/16 \times 11/16 = 0.2578$

$D_{max} = \min(p_1 q_2,\; p_2 q_1) = 0.1953$

$D' = 0.1328 / 0.1953 = 0.68$

$r^2_{12} = D^2 / (p_1 p_2 q_1 q_2) = 0.3503$
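
The worked example can be checked with a few lines of code. The sketch below takes the four haplotype counts of the 2 x 2 table; the function name `linkage_disequilibrium` is hypothetical, both loci are assumed to be polymorphic, and the chi-square is computed as $r^2$ times the number of chromosomes, which for a 2 x 2 table equals the Pearson statistic.

```python
def linkage_disequilibrium(n11, n12, n21, n22):
    """D, D' and r^2 from haplotype counts A1B1, A1B2, A2B1, A2B2.

    Assumes both loci are polymorphic, so no allele frequency is 0 or 1.
    """
    total = n11 + n12 + n21 + n22          # number of chromosomes sampled
    p1 = (n11 + n12) / total               # frequency of allele A1
    q1 = (n11 + n21) / total               # frequency of allele B1
    p2, q2 = 1.0 - p1, 1.0 - q1
    d = n11 / total - p1 * q1
    # Largest |D| compatible with the allele frequencies depends on the sign of D
    d_max = min(p1 * q2, p2 * q1) if d >= 0 else min(p1 * q1, p2 * q2)
    d_prime = d / d_max
    r2 = d * d / (p1 * p2 * q1 * q2)
    chi2 = r2 * total
    return {"D": d, "D_prime": d_prime, "r2": r2, "chi2": chi2}


ld = linkage_disequilibrium(9, 1, 2, 4)
```

For the counts 9, 1, 2 and 4 this gives D = 0.1328, D' = 0.68, $r^2$ of about 0.350 and a chi-square of about 5.60, above the 3.84 threshold for one degree of freedom.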

Recombination and linkage disequilibrium

$D_{AB}(t + 1) = (1 - c)\, D_{AB}(t)$

c = recombination rate
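
Iterating the recurrence gives $D_{AB}(t) = (1 - c)^t D_{AB}(0)$, so linkage disequilibrium decays geometrically. The sketch below, with hypothetical function names and illustrative values of `c`, computes the remaining disequilibrium and the number of generations needed to halve it.

```python
import math


def ld_after(d0, c, generations):
    """Linkage disequilibrium after a number of generations of random mating."""
    return d0 * (1.0 - c) ** generations


def ld_half_life(c):
    """Generations needed for D to fall to half its value (requires 0 < c < 1)."""
    return math.log(0.5) / math.log(1.0 - c)


d_unlinked = ld_after(0.25, 0.5, 2)     # unlinked loci, c = 0.5: 0.25 -> 0.0625
half_life_linked = ld_half_life(0.01)   # tightly linked loci: about 69 generations
```

Even unlinked loci (c = 0.5) lose only half of their disequilibrium per generation, while closely linked sites can stay associated for many generations.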

Linkage disequilibrium: linkage blocks & tag SNPs

[Figure: distribution of linkage disequilibrium along the human lipoprotein lipase gene (LPL): 66 SNPs across about 10 kb in 142 chromosomes.]

The golden age of the study of genetic variation

Description and explanation of patterns of nucleotide variation at a large scale

• Patterns of polymorphism and divergence

• Neutral vs selected variation (adaptive evolution)

Explanation of genome complexity from first principles of population genetics (sensu M. Lynch)

Comparative and Functional Genomics

• Role of conserved and fast-evolving non-coding regions

Catalogue of human genetic variation and association studies (genotype -> phenotype)

• HapMap, Biobanks, GWAS

• Personalized Genomics

Exercises

Exercise 1: Go to the web page Standard & Generalized McDonald-Kreitman test (http://mkt.uab.cat). Run the examples and try to understand the difference between the standard and the generalized MKT. What is the parameter alpha whose value is reported in the test results? How would you estimate it?

Exercise 2: Manually estimate the most common measures or summary statistics of nucleotide variation ($S/m$, $\theta_W$, $\pi$ and the site frequency spectrum) and the three measures of linkage disequilibrium ($D$, $D'$ and $r^2$) for the 8 aligned sequences given below. The analyzed sequence length, m, is 100. Only variable sites are shown.

Sequence 1 … A … G … C … G …

Sequence 2 … A … G … T … G …

Sequence 3 … A … A … T … G …

Sequence 4 … T … G … T … T …

Sequence 5 … T … G … T … T …

Sequence 6 … T … G … C … T …

Sequence 7 … T … G … T … T …

Sequence 8 … T … G … T … T …

Exercises

Exercise 3: Suppose that both adaptive and weakly deleterious selection are acting on a DNA sequence. If you want to perform an MKT, look for a statistical approach that takes weakly negative selection into account when detecting adaptive selection.

+ Adaptive fixation + weakly negative selection: $\dfrac{\pi_i}{\pi_{neut}}$ can be smaller than, equal to or larger than $\dfrac{K_i}{K_{neut}}$

Readings

• Population genomics in Drosophila: T.F.C. Mackay*, S. Richards*, E.A. Stone*, A. Barbadilla*, M. Barrón, D. Castellano, P. Librado, M. Ràmia, J. Rozas et al. 2012. The Drosophila melanogaster Genetic Reference Panel: A Community Resource for Analysis of Population Genomics and Quantitative Traits. Nature 482: 173-178.

• 1000 Genomes Project: The 1000 Genomes Project Consortium. 2012. An integrated map of genetic variation from 1,092 human genomes. Nature 491: 56-65.