What should we study - UFSCarevolucao/TGE/Lect02.pdf · Microsoft PowerPoint - Lect 02 Genetic...

What should we study ?

• Levels of genetic variability - intrapopulational

• Population structure - interpopulational

• Geographic distribution of genetic diversity

• Taxonomic uncertainties – taxonomic and systematic studies

• Number of species – taxonomic and ecological approaches

Intrapopulational measures

Why Genetic Diversity

• Genetic diversity is important because it is the raw material on which selection can act, and thus species can respond to selective pressure.

• The majority of low-frequency alleles exist in the heterozygous state, and therefore, if they are deleterious, their effect may be fully or partially masked.

Why Genetic Diversity

• Genetic diversity also plays a role in determining IUCN categories.

• The lower the genetic diversity, the higher the perceived risk of threat.

Measuring Genetic Diversity

• Measures of genetic diversity depend on the data analyzed.

• One set of measures focuses on heterozygosity and is based on diploid, co-dominant markers.

• Another set of measures focuses on allelic information and/or unphased diploid data.

• Some of these indices are implemented in Arlequin.

Measures of Genetic diversity

Molecular Markers

• Sequence data

• Single Nucleotide Polymorphism (SNP) data

• Microsatellite data

• Allozyme data

• Amplified Fragment Length Polymorphism (AFLP) data

• Randomly Amplified Polymorphic DNA (RAPD) data

• Hybridization data

• Chromosomal pattern data

Sequence data

• Differences in haplotypes are due to point mutations (transition or transversion types), insertions, or deletions.

• In diploid organisms, differences are also due to recombination.

• Molecular models of evolution dealing with point mutations are very well studied.

Microsatellite data

[Figure: slippage and misalignment between the template strand and the growing strand, producing a +1 repeat or a -1 repeat.]

• Differences in haplotypes are due to unequal crossing over or to slippage in strand replication.

• This class of markers is co-dominant, i.e. heterozygous and both homozygous classes of individuals can be distinguished.

• Fast rate of molecular evolution.

• Models of molecular evolution are not well known.

Allozyme data

• Properties of allozyme data are very similar to those of microsatellite data.

RFLP data

• Differences in haplotypes are due to point mutations (transition or transversion types), insertions, or deletions.

• In diploid organisms, differences are also due to recombination.

• This class of markers is dominant, i.e. heterozygous and homozygous dominant individuals cannot be distinguished.

Chromosomal data

Best Markers

• Theoretically, the best markers are sequence markers, provided that:

• there is sufficient variation, i.e. sufficient sequence length;

• the differences can be phased.

• They are also the markers for which we have the best models of molecular evolution.

Haplotypes

Sample 1   AAAAA
Sample 2   AAAAA
Sample 3   AGAAA
Sample 4   AGAAA
Sample 5   AGAAG
Sample 6   AGAAG
Sample 7   GGAAA
Sample 8   GGAAA
Sample 9   GGGAA
Sample 10  GGGAA
Sample 11  GGGGA
Sample 12  GGGGA

Measuring Genetic Diversity

Sample 1  AGAACTTCTG
Sample 2  AGAACTTCTG
Sample 3  AGAACTTCTG
Sample 4  AAAA TTTTTG
Sample 5  AAAA TTTTTG
Sample 6  AAAATCTTTG

Number of segregating sites (S) – The number of polymorphic sites in the dataset. Under the infinite-sites model, this equals the total number of mutations observed in the dataset.
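As a minimal sketch of this count, the snippet below scans an alignment column by column and counts the columns that show more than one state. The data are the 12 five-site sequences from the "Haplotypes" slide; the function name `segregating_sites` is hypothetical, and the sketch assumes aligned sequences of equal length with no gaps.

```python
# Illustrative data: the 12 five-site sequences from the "Haplotypes" slide.
sequences = [
    "AAAAA", "AAAAA", "AGAAA", "AGAAA", "AGAAG", "AGAAG",
    "GGAAA", "GGAAA", "GGGAA", "GGGAA", "GGGGA", "GGGGA",
]

def segregating_sites(seqs):
    """Count alignment columns with more than one observed state.

    Hypothetical helper; assumes equal-length, gap-free sequences.
    """
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

S = segregating_sites(sequences)
print(S)  # every one of the five columns is polymorphic in this example
```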


Gene Diversity – Equivalent to expected heterozygosity for diploid data. It is defined as the probability that any two randomly selected sequences will be different.
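The slide's own formula is not preserved in this transcript. The sketch below assumes the standard sample-size-corrected estimator of Nei (1987), H = n/(n-1) × (1 - Σ p_i²), where p_i is the sample frequency of haplotype i and n is the sample size. The function name `gene_diversity` is hypothetical, and the data are the 12 sequences from the "Haplotypes" slide.

```python
from collections import Counter

# Illustrative data: the 12 five-site sequences from the "Haplotypes" slide.
sequences = [
    "AAAAA", "AAAAA", "AGAAA", "AGAAA", "AGAAG", "AGAAG",
    "GGAAA", "GGAAA", "GGGAA", "GGGAA", "GGGGA", "GGGGA",
]

def gene_diversity(seqs):
    """Assumed estimator (Nei 1987): n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))

H = gene_diversity(sequences)
print(round(H, 4))  # six haplotypes at frequency 1/6 each give 10/11
```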


Mean number of pairwise differences – Mean number of differences between all pairs of haplotypes in the sample.

d = mutational difference, p = allele frequency, k = allele number, n = sample size
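A minimal sketch of this quantity averages the number of differing sites over all n(n-1)/2 pairs of sampled sequences. This is algebraically the same as the frequency form n/(n-1) × Σ_i Σ_j p_i p_j d_ij, on the assumption that d is a simple count of differing sites (no correction for multiple hits). The function names are hypothetical, and the data are the 12 sequences from the "Haplotypes" slide.

```python
from itertools import combinations

# Illustrative data: the 12 five-site sequences from the "Haplotypes" slide.
sequences = [
    "AAAAA", "AAAAA", "AGAAA", "AGAAA", "AGAAG", "AGAAG",
    "GGAAA", "GGAAA", "GGGAA", "GGGAA", "GGGGA", "GGGGA",
]

def site_differences(a, b):
    """Number of positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def mean_pairwise_differences(seqs):
    """Average site differences over all unordered pairs of sampled sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(site_differences(a, b) for a, b in pairs) / len(pairs)

pi_hat = mean_pairwise_differences(sequences)
print(round(pi_hat, 4))  # 128 differences over 66 pairs = 64/33
```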


Nucleotide Diversity – Computed as the probability that two randomly chosen homologous sites are different.

d = mutational difference, p = allele frequency, k = allele number, L = number of loci
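The sketch below computes a per-site diversity from haplotype frequencies, using the symbols listed above (d, p, k, L). It assumes the form n/(n-1) × Σ_i Σ_j p_i p_j d_ij / L, with d counted as raw site differences. The slide's exact formula is not preserved, and other software may scale the sum differently, so treat the scaling as an assumption. The function name is hypothetical.

```python
from collections import Counter

# Illustrative data: the 12 five-site sequences from the "Haplotypes" slide.
sequences = [
    "AAAAA", "AAAAA", "AGAAA", "AGAAA", "AGAAG", "AGAAG",
    "GGAAA", "GGAAA", "GGGAA", "GGGAA", "GGGGA", "GGGGA",
]

def nucleotide_diversity(seqs):
    """Assumed form: n/(n-1) * sum_ij p_i p_j d_ij / L over the k haplotypes."""
    n = len(seqs)
    L = len(seqs[0])          # number of sites (loci)
    counts = Counter(seqs)    # haplotype -> number of copies in the sample
    total = 0.0
    for hap_i, c_i in counts.items():
        for hap_j, c_j in counts.items():
            d_ij = sum(x != y for x, y in zip(hap_i, hap_j))
            total += (c_i / n) * (c_j / n) * d_ij
    return n / (n - 1) * total / L

pi_n = nucleotide_diversity(sequences)
print(round(pi_n, 4))  # mean pairwise differences (64/33) divided by L = 5
```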

Measuring Genetic Diversity

• Theta = θ = 4Nµ, 4Nm or 4N(µ+m), depending on whether mutation (µ), migration (m) or both are considered

• For haploid markers, θ = 2Nµ, 2Nm or 2N(µ+m)

• The all-important population genetic parameter.

• It is based on the number of alleles or the number of different nucleotides in a given sample.

• It quantifies the genetic diversity of a given population.

Theta (θ) Hom

• Estimated from the expected homozygosity (Zouros, 1979; Chakraborty and Weiss, 1991) in a population at equilibrium between drift and mutation.

• Sensitive to small sample and allele sizes

• For microsat data
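As a rough illustration of how homozygosity maps to θ, the sketch below inverts two textbook equilibrium relationships: E[Hom] = 1/(1+θ) under the infinite-alleles model, and E[Hom] = 1/sqrt(1+2θ) under the stepwise mutation model often applied to microsatellites. Which relationship the slide intends is not stated, and the bias corrections used by the cited authors are not reproduced here. The repeat counts and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical microsatellite sample: repeat counts at one locus.
alleles = [10, 10, 11, 11, 11, 12, 12, 13]

def sample_homozygosity(sample):
    """Sum of squared allele frequencies (no small-sample correction)."""
    n = len(sample)
    return sum((count / n) ** 2 for count in Counter(sample).values())

def theta_hom_infinite_alleles(hom):
    """Invert E[Hom] = 1 / (1 + theta)."""
    return 1.0 / hom - 1.0

def theta_hom_stepwise(hom):
    """Invert E[Hom] = 1 / sqrt(1 + 2 * theta)."""
    return (1.0 / hom ** 2 - 1.0) / 2.0

hom = sample_homozygosity(alleles)
theta_iam = theta_hom_infinite_alleles(hom)
theta_smm = theta_hom_stepwise(hom)
print(hom, theta_iam, theta_smm)
```

Because both estimates are nonlinear functions of a single sample statistic, small samples can shift them substantially, which is consistent with the sensitivity noted above.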

Theta (θ) S

• Estimated from the infinite-site equilibrium relationship (Watterson, 1975) between the number of segregating sites (S), the sample size (n) and θ for a sample of non-recombining DNA.
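Watterson's relationship can be sketched as θ_S = S / a_n, with a_n = Σ_{i=1}^{n-1} 1/i. The example values S = 5 and n = 12 correspond to the 12 sequences on the "Haplotypes" slide, in which all five sites are polymorphic. The function name is hypothetical.

```python
def watterson_theta(S, n):
    """Watterson (1975): theta_S = S / a_n, a_n = sum of 1/i for i = 1..n-1."""
    a_n = sum(1.0 / i for i in range(1, n))
    return S / a_n

theta_S = watterson_theta(S=5, n=12)
print(round(theta_S, 3))
```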

Theta (θ) k

• Estimated from the infinite-allele equilibrium relationship (Ewens, 1972) between the expected number of alleles (k), the sample size (n) and θ.

• 95% confidence limits are calculated as: [equation not preserved in this transcript; its labels refer to a Stirling number (expansion factor of a factorial) and a falling factorial]
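Under the infinite-alleles model, the expected number of alleles in a sample of size n is E[k] = Σ_{i=0}^{n-1} θ/(θ+i) (Ewens, 1972). There is no closed-form inverse, so the sketch below solves for θ numerically by bisection. It covers only the point estimate, not the confidence limits. The function names, search bounds and tolerance are assumptions chosen for illustration.

```python
def expected_alleles(theta, n):
    """Ewens (1972): E[k] = sum over i = 0..n-1 of theta / (theta + i)."""
    return sum(theta / (theta + i) for i in range(n))

def theta_k(k, n, lo=1e-9, hi=1e6, tol=1e-10):
    """Solve expected_alleles(theta, n) = k by bisection (needs 1 < k < n)."""
    if not 1 < k < n:
        raise ValueError("k must lie strictly between 1 and n")
    # expected_alleles increases monotonically with theta, so bisection works.
    while hi - lo > tol * max(1.0, lo):
        mid = (lo + hi) / 2.0
        if expected_alleles(mid, n) < k:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Example: 6 distinct haplotypes among 12 sampled sequences.
theta_hat = theta_k(k=6, n=12)
print(theta_hat, expected_alleles(theta_hat, 12))
```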

Theta (θ) π̂

• Estimated from the infinite-site equilibrium relationship (Tajima, 1983) between the mean number of pairwise differences (π̂) and theta (θ).

Why so many θ measures

• Not all methods are suitable for all types of data.

• Ultimately all methods should result in the same estimates of theta.

• Differences in estimates can be interpreted as violations of assumptions, and each method is sensitive to different assumptions.

Tajima’s D

• Tajima’s (1989) D test quantifies the discordance between the estimate of theta from number of segregating sites and from average pair-wise sequence divergence.
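A sketch of the statistic, following the constants in Tajima (1989): D = (π̂ - S/a1) / sqrt(e1·S + e2·S·(S-1)). The inputs π̂ = 64/33, S = 5 and n = 12 are the values obtained from the 12 sequences on the "Haplotypes" slide. The function name is hypothetical, and no significance test is included.

```python
from math import sqrt

def tajimas_d(pi_hat, S, n):
    """Tajima's D from mean pairwise differences, segregating sites, sample size."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Numerator: difference between the two theta estimates (theta_pi - theta_S).
    return (pi_hat - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))

D = tajimas_d(pi_hat=64 / 33, S=5, n=12)
print(round(D, 3))
```

A positive D means the pairwise-difference estimate exceeds the segregating-sites estimate, and a negative D means the reverse.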

Fu’s Fs

• Fu’s (1997) Fs measures the probability of observing a certain number of haplotypes given a particular value of θ.
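A sketch of the calculation, assuming the usual definition Fs = ln(S′ / (1 - S′)), where S′ is the probability of observing at least k_obs haplotypes in a sample of size n given θ = π̂. The distribution of k comes from the Ewens sampling formula, P(K = k) = |S_n^k| θ^k / [θ(θ+1)…(θ+n-1)], with |S_n^k| an unsigned Stirling number of the first kind. Function names are hypothetical, and the inputs (k_obs = 6, n = 12, θ = 64/33) come from the 12 sequences on the "Haplotypes" slide.

```python
from math import log

def unsigned_stirling_first(n):
    """Return [c(n, 0), ..., c(n, n)], unsigned Stirling numbers of the first kind."""
    row = [1]  # c(0, 0) = 1
    for m in range(1, n + 1):
        new = [0] * (m + 1)
        for k in range(1, m + 1):
            same_k = row[k] if k < m else 0          # c(m-1, k)
            new[k] = (m - 1) * same_k + row[k - 1]   # recurrence for c(m, k)
        row = new
    return row

def ewens_k_distribution(theta, n):
    """P(K = k) for k = 0..n under the Ewens sampling formula."""
    c = unsigned_stirling_first(n)
    rising = 1.0
    for i in range(n):
        rising *= theta + i
    return [c[k] * theta ** k / rising for k in range(n + 1)]

def fu_fs(k_obs, theta, n):
    """Fs = ln(S' / (1 - S')), S' = P(K >= k_obs | theta). Undefined if S' is 0 or 1."""
    probs = ewens_k_distribution(theta, n)
    s_prime = sum(probs[k_obs:])
    return log(s_prime / (1.0 - s_prime))

probs = ewens_k_distribution(64 / 33, 12)
Fs = fu_fs(k_obs=6, theta=64 / 33, n=12)
print(Fs)
```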

Differences in θ measures

• Have selective interpretations.

• Have demographic interpretations.
