Genetic diversity: clustering and AMOVA
Aladdin Hamwieh
Sources of variation
• Variation from mutation
Mutation rates are so slow that mutation alone cannot account for rapid genetic changes of populations and species.
Overview of the phenomena that cause genetic change in populations
Nonrandom mating
Gene flow
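Genetic distance

         Marker1  Marker2  Marker3  Marker4  Marker5  Marker6  Marker7
Plant1      1        0        1        1        0        1        1
Plant2      1        1        1        0        0        1        0

Counts of marker states for the pair of plants:

              Plant1 = 1   Plant1 = 0
Plant2 = 1      Fa = 3       Fb = 1
Plant2 = 0      Fc = 2       Fd = 1

N = Fa + Fb + Fc + Fd = 7
Simple matching distance = (Fb + Fc)/N = 1 - (Fa + Fd)/N = 3/7 = 0.43
Jaccard similarity = Fa/(Fa + Fb + Fc) = 3/6 = 0.5, so Jaccard distance = 1 - 0.5 = 0.5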
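The counts for Plant1 and Plant2 can be checked with a short Python sketch. The function names are illustrative only, and it assumes the usual definitions: simple matching distance as the share of mismatching markers, and Jaccard distance as one minus the Jaccard similarity.

```python
def binary_counts(x, y):
    """Count the four match classes for two 0/1 marker profiles."""
    fa = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)  # present in both
    fb = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)  # present only in y
    fc = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)  # present only in x
    fd = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)  # absent in both
    return fa, fb, fc, fd


def simple_matching_distance(x, y):
    fa, fb, fc, fd = binary_counts(x, y)
    return (fb + fc) / (fa + fb + fc + fd)


def jaccard_distance(x, y):
    # Shared absences (fd) are ignored; undefined if neither plant has a band.
    fa, fb, fc, _ = binary_counts(x, y)
    return 1 - fa / (fa + fb + fc)


plant1 = [1, 0, 1, 1, 0, 1, 1]
plant2 = [1, 1, 1, 0, 0, 1, 0]
counts = binary_counts(plant1, plant2)
sm = simple_matching_distance(plant1, plant2)
jac = jaccard_distance(plant1, plant2)
print(counts, round(sm, 2), round(jac, 2))  # (3, 1, 2, 1) 0.43 0.5
```

Jaccard is often preferred for dominant markers because a shared absence of a band is weak evidence of similarity.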
Similarity coefficients for binary variables
Dissimilarity indices – Continuous
Euclidean distance
Euclidean distance is the most commonly used distance measure; when people speak of 'distance' without qualification, they usually mean Euclidean distance. It is the square root of the sum of squared differences between the coordinates of a pair of objects.
Example: Point A has coordinates (0, 3, 4, 5) and point B has coordinates (7, 6, 3, -1). The Euclidean distance between A and B is
d(A, B) = sqrt((0-7)^2 + (3-6)^2 + (4-3)^2 + (5-(-1))^2) = sqrt(49 + 9 + 1 + 36) = sqrt(95) ≈ 9.75

Features   cost   time   weight   incentive
Plant A     0      3       4         5
Plant B     7      6       3        -1
Manhattan (city-block) distance
It is also known as boxcar distance or absolute value distance. It is the sum of the absolute differences between the coordinates of a pair of objects.

Features   cost   time   weight   incentive
Plant A     0      3       4         5
Plant B     7      6       3        -1

For these two plants: d(A, B) = |0-7| + |3-6| + |4-3| + |5-(-1)| = 7 + 3 + 1 + 6 = 17
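Both continuous dissimilarity indices can be computed for Plant A and Plant B with a few lines of Python. This is a minimal sketch using only the standard library; the function names are illustrative.

```python
import math


def euclidean(a, b):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))


plant_a = [0, 3, 4, 5]    # cost, time, weight, incentive
plant_b = [7, 6, 3, -1]
d_euc = euclidean(plant_a, plant_b)
d_man = manhattan(plant_a, plant_b)
print(round(d_euc, 3), d_man)  # 9.747 17
```

In practice the features are usually standardized first, since both measures are dominated by whichever variable has the largest scale.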
Measures of Marker Informativeness
• Allelic Diversity
Var   M1        M2
P1    150/150   130/134
P2    150/152   132/134
P3    152/152   130/132

Marker   Major allele frequency   Allele no.   Gene diversity   PIC
M1       0.5000                   2.0000       0.5000           0.3750
M2       0.3333                   3.0000       0.6667           0.5926
Mean     0.4167                   2.5000       0.5833           0.4838
• Polymorphism Information Content (PIC)
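Major allele frequency (M1) = 3/6 = 0.5
Gene diversity (M1) = 1 - sum of p_i^2 = 1 - (0.5^2 + 0.5^2) = 0.5
PIC (M1) = gene diversity - sum over i<j of 2 p_i^2 p_j^2 = 0.5 - 2 x 0.25 x 0.25 = 0.375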
• Example
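The marker summary for M1 and M2 can be reproduced from the genotype table with the sketch below. It assumes gene diversity is 1 minus the sum of squared allele frequencies and that PIC follows the Botstein et al. form (gene diversity minus the sum over allele pairs of 2 p_i^2 p_j^2). The helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(genotypes):
    """Allele frequencies from diploid genotypes written as 'a/b'."""
    alleles = [a for g in genotypes for a in g.split("/")]
    counts = Counter(alleles)
    return {a: n / len(alleles) for a, n in counts.items()}


def gene_diversity(freqs):
    return 1 - sum(p ** 2 for p in freqs.values())


def pic(freqs):
    cross = sum(2 * p ** 2 * q ** 2 for p, q in combinations(freqs.values(), 2))
    return gene_diversity(freqs) - cross


markers = {
    "M1": ["150/150", "150/152", "152/152"],
    "M2": ["130/134", "132/134", "130/132"],
}
summary = {}
for name, genotypes in markers.items():
    f = allele_frequencies(genotypes)
    # (major allele frequency, allele number, gene diversity, PIC)
    summary[name] = (max(f.values()), len(f), gene_diversity(f), pic(f))
    print(name, [round(v, 4) for v in summary[name]])
```

Averaging the per-marker values over M1 and M2 gives the "Mean" row of the table.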
Analysis of molecular variance (AMOVA)
• AMOVA is a method for estimating molecular variation within a species
• It is not ANOVA: it does not require the assumption of a normal distribution (it relies on permutational methods)
• The sampling hierarchy may contain regions within a continent
• Individuals within a population, in an area, in a region, in a continent
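[Figure: three populations (Pop1, Pop2, Pop3), each with its own within-population diversity HS, separated from one another by the among-population diversity DST]
HT = HS + DST
GST = DST/HT
GST measures the proportion of gene diversity that is distributed among populations.
Example: HT = 0.263, HS = 0.202
DST = HT - HS = 0.061, GST = 0.061/0.263 = 0.2319, which means that about 23% of the variation is among populations.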
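The arithmetic of this example can be checked in Python. The sketch assumes Nei's partition HT = HS + DST with GST = DST/HT, which is the ratio that reproduces the value 0.2319 quoted here. The function name is illustrative.

```python
def gst_from_diversities(ht, hs):
    """Return (DST, GST) from total (HT) and within-population (HS) gene diversity."""
    dst = ht - hs        # diversity among populations
    return dst, dst / ht


dst, gst = gst_from_diversities(ht=0.263, hs=0.202)
print(round(dst, 3), round(gst, 4))  # 0.061 0.2319
```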
Example 1
This model, also called AMOVA, measures gene diversity among populations with specific reference to areas of a region in a continent.
We have: i = individuals, j = alleles, k = populations
 Pop1      Pop1      Pop1
A1  A2    A1  A2    A1  A2
 0   0     0   1     1   1
 1   1     0   1     1   1
 0   0     1   1     0   1
 1   0     1   0     1   1
 0   0     0   1     0   1
 0   0     0   1     0   0
 1   1     1   1     1   1
 0   0     1   1     0   0
 1   0     1   1     1   0
 1   1     1   0     0   1
 1   0     0   1     1   1
 0   0     1   1     1   0
 1   1     1   1     0   1
 1   1     1   0     1   0
 1   1     0   1     1   0
2011
Clustering
Phylogenetic Trees and Dissimilarity estimation
Historical Note
• Until the mid-1950s, phylogenies were constructed by experts based on their opinion (subjective criteria)
• Since then, the focus has been on objective criteria for constructing phylogenetic trees
  – Thousands of articles in the last decades
• Important for many aspects of biology
  – Classification
  – Understanding biological mechanisms
Morphological vs. Molecular
• Classical phylogenetic analysis: morphological features: number of legs, lengths of legs, etc.
• Modern biological methods allow the use of molecular features
  – Gene sequences
  – Protein sequences
  – DNA markers
Rat QEPGGLVVPPTDA
Rabbit QEPGGMVVPPTDA
Gorilla QEPGGLVVPPTDA
Cat REPGGLVVPPTEG
From sequences to a phylogenetic tree
There are many possible types of sequences to use (e.g. Mitochondrial vs Nuclear proteins).
Basic Assumptions
More closely related organisms have more similar genomes.
Highly similar genes are homologous (have the same ancestor).
Phylogenetic relations can be expressed by a dendrogram (a “tree”).
[Figure: dendrogram with leaves Aardvark, Bison, Chimp, Dog, Elephant]
Species Phylogeny
Gene Phylogenies
Phylogenies can be constructed to describe the evolution of genes.
Three species, termed 1, 2 and 3. Two paralogous genes, A and B.
[Figure: gene tree with leaves 1A, 2A, 3A, 3B, 2B, 1B; internal nodes are labelled as speciation events or as a gene duplication]
Types of Trees
A natural model to consider is that of rooted trees.
[Figure: rooted tree; the root is the common ancestor]
Types of trees
An unrooted tree represents the same phylogeny without the root node.
Depending on the model, data from current day species does not distinguish between different placements of the root.
[Figure: tree of Human, Chimp, Gorilla, Orangutan, and Gibbon]
UPGMA
Taxa    1  2  3  4  5  6  7
OTU-1   T  G  C  G  T  A  T
OTU-2   T  G  G  G  T  A  T
OTU-3   T  G  C  G  C  T  T
OTU-4   T  G  C  T  G  T  G
OTU-5   T  A  G  T  A  G  C
Step 1: Generate data (Sequence/ Genotype/ Morphological) for each OTU.
Distance can be calculated using different substitution models:
1. Number of nucleotide differences
2. p-distance
3. JC distance
4. K2P distance
5. F81
6. HKY85
7. GTR, etc.
Step 2: Calculate the p-distance for all pairs of taxa.

Taxa    1  2  3  4  5  6  7
OTU-1   T  G  C  G  T  A  T
OTU-2   T  G  G  G  T  A  T

p-distance (OTU-1, OTU-2) = number of differing sites / number of sites compared = 1/7 = 0.142857143
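Step 3: Calculate the distance matrix for all pairs of taxa and select the pair of taxa with the minimum distance as a new OTU.
(Upper triangle: number of nucleotide differences; lower triangle: p-distance.)

Taxa   1        2        3        4        5
1      0        1        2        4        6
2      0.1428   0        3        5        5
3      0.2857   0.4285   0        3        6
4      0.5714   0.7142   0.4285   0        5
5      0.8571   0.7142   0.8571   0.7142   0

OTU-1 and OTU-2 have the minimum distance (0.1428) and are joined. Each branch length is 0.1428/2 = 0.0714:
OTU-1: 0.0714
OTU-2: 0.0714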
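Steps 2 and 3 can be sketched in Python as follows. The sequences are the five OTUs from Step 1; the function and variable names are illustrative, and the code assumes aligned sequences of equal length with no gaps.

```python
from itertools import combinations

sequences = {
    "OTU-1": "TGCGTAT",
    "OTU-2": "TGGGTAT",
    "OTU-3": "TGCGCTT",
    "OTU-4": "TGCTGTG",
    "OTU-5": "TAGTAGC",
}


def p_distance(s1, s2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for a, b in zip(s1, s2) if a != b)
    return diffs / len(s1)


# Pairwise p-distances, keyed by the pair of OTU names.
pairwise = {
    (a, b): p_distance(sequences[a], sequences[b])
    for a, b in combinations(sequences, 2)
}
closest = min(pairwise, key=pairwise.get)
print(closest, round(pairwise[closest], 4))  # ('OTU-1', 'OTU-2') 0.1429
```

The matrix in Step 3 shows the same values truncated to four decimals (0.1428 rather than 0.1429).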
Step 4: Recalculate the distance matrix, treating OTU-1 and OTU-2 as one OTU.
d(1+2, 3) = (d(1,3) + d(2,3))/2 = (0.2857 + 0.4285)/2 = 0.3571

taxa   1+2       3        4        5
1+2    0
3      0.35714   0
4      0.64285   0.4285   0
5      0.78571   0.8571   0.7142   0
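Step 5: Select the pair of taxa with the minimum distance as a new OTU.
(1+2) and OTU-3 have the minimum distance (0.3571), so their common node lies at a height of 0.3571/2 = 0.179.
Branch lengths:
OTU-1: 0.071
OTU-2: 0.071
OTU-3: 0.179
Internal branch from node (1+2) to node (1+2)3: 0.179 - 0.071 = 0.107
Check (path from OTU-1 to OTU-3): 0.107 + 0.071 + 0.179 = 0.357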
Step 6: Again select the pair of OTUs with the minimum distance as a new OTU and recalculate the distance matrix.
d((1+2)3, 4) = (d(1,4) + d(2,4) + d(3,4))/3 = (0.5714 + 0.7142 + 0.4285)/3 = 0.5714

taxa     (1+2)3   4        5
(1+2)3   0
4        0.5714   0
5        0.8095   0.7142   0
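Step 7: Again select the pair of taxa with the minimum distance as a new OTU.
(1+2)3 and OTU-4 have the minimum distance (0.5714), so their common node lies at a height of 0.5714/2 = 0.286.
Branch lengths:
OTU-1: 0.071
OTU-2: 0.071
OTU-3: 0.179
OTU-4: 0.286
Internal branches: 0.107 (node (1+2) to node (1+2)3) and 0.107 (node (1+2)3 to node ((1+2)3)4)
Check (path from OTU-1 to OTU-4): 0.107 + 0.107 + 0.071 + 0.286 = 0.571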
Step 8: Again select the pair of OTUs with the minimum distance as a new OTU and recalculate the distance matrix.
d(((1+2)3)4, 5) = (d(1,5) + d(2,5) + d(3,5) + d(4,5))/4 = (0.8571 + 0.7142 + 0.8571 + 0.7142)/4 = 0.7857

taxa        ((1+2)3)4   5
((1+2)3)4   0
5           0.7857      0
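Step 9: Again select the pair of OTUs with the minimum distance as a new OTU and make the final rooted tree.
((1+2)3)4 and OTU-5 are joined at the root, at a height of 0.7857/2 = 0.393.
Branch lengths:
OTU-1: 0.071
OTU-2: 0.071
OTU-3: 0.179
OTU-4: 0.286
OTU-5: 0.393
Internal branches: 0.107 (node (1+2) to node (1+2)3), 0.107 (node (1+2)3 to node ((1+2)3)4) and 0.107 (node ((1+2)3)4 to the root)
Check (path from OTU-1 to OTU-5): 0.393 + 0.107 + 0.107 + 0.107 + 0.071 = 0.785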
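The whole UPGMA walk-through (Steps 3 to 9) can be condensed into a short Python sketch. It is a minimal illustration rather than a reference implementation: the function name `upgma` and its data layout are assumptions made for this example, and the distance between two clusters is taken as the mean over all pairs of their member OTUs, which is the averaging used in Steps 4, 6 and 8.

```python
def upgma(names, dist):
    """dist maps frozenset({a, b}) -> distance between OTUs a and b.

    Returns the merges in order as (cluster_1, cluster_2, node_height).
    """
    clusters = [(name,) for name in names]

    def cluster_distance(c1, c2):
        # UPGMA: unweighted mean over all pairs of member OTUs.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        candidates = [
            (cluster_distance(c1, c2), i, j)
            for i, c1 in enumerate(clusters)
            for j, c2 in enumerate(clusters)
            if i < j
        ]
        d, i, j = min(candidates)                # closest pair of clusters
        merges.append((clusters[i], clusters[j], d / 2))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Numbers of nucleotide differences among the five OTUs (7 sites each).
diff_counts = {
    (1, 2): 1, (1, 3): 2, (1, 4): 4, (1, 5): 6,
    (2, 3): 3, (2, 4): 5, (2, 5): 5,
    (3, 4): 3, (3, 5): 6, (4, 5): 5,
}
names = [f"OTU-{i}" for i in range(1, 6)]
dist = {
    frozenset((f"OTU-{i}", f"OTU-{j}")): n / 7
    for (i, j), n in diff_counts.items()
}
merges = upgma(names, dist)
heights = [round(h, 3) for _, _, h in merges]
print(heights)  # [0.071, 0.179, 0.286, 0.393]
```

Each node height is half the distance between the two clusters it joins. An internal branch length is the difference between consecutive node heights, 0.107 at every level in this example.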
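Jukes-Cantor distance
The rate of nucleotide substitution is assumed to be the same for all pairs of the four nucleotides A, T, C, and G.
The 16 possible nucleotide pairs at a site:

A A   A C   A G   A T
C A   C C   C G   C T
G A   G C   G G   G T
T A   T C   T G   T T

With random assignment of nucleotides to a pair of taxa, only 4 of these 16 pairs match, so two sequences are expected to be 25% similar (a distance of 0.75).
The UPGMA method assumes a constant rate along the branches of the phylogenetic tree.
Jukes-Cantor distance for an observed p-distance p: d = -(3/4) ln(1 - (4/3) p). As a spreadsheet formula for p = 0.1594:
=-(3/4)*LN(1-(((4/3)*0.1594)))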
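The spreadsheet formula translates directly into Python. This sketch assumes the standard Jukes-Cantor correction d = -(3/4) ln(1 - (4/3) p); the function name is illustrative.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor distance from an observed proportion of differing sites p."""
    if not 0 <= p < 0.75:
        # At p >= 0.75 the sequences are saturated and the correction is undefined.
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1 - (4 / 3) * p)


d_jc = jukes_cantor(0.1594)
print(round(d_jc, 4))  # 0.1792
```

The corrected distance is always at least as large as p, because it accounts for multiple substitutions at the same site.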
Method: Neighbor-joining
                 A      B       C       D       E       r
A (human)        —      0.015   0.045   0.143   0.198   0.4010
B (chimp)               —       0.030   0.126   0.179   0.3500
C (gorilla)                     —       0.092   0.179   0.3460
D (orangutan)                           —       0.179   0.5400
E (gibbon)                                      —       0.7350

r = sum of the distances from a taxon to all other taxa
Neighbor-joining method

                 A         B         C         D         E
A (human)        —         0.0150    0.0450    0.1430    0.1980
B (chimp)        -0.3605   —         0.0300    0.1260    0.1790
C (gorilla)      -0.3285   -0.3180   —         0.0920    0.1790
D (orangutan)    -0.3275   -0.3190   -0.3510   —         0.1790
E (gibbon)       -0.3700   -0.3635   -0.3615   -0.4585   —

A:B = 0.015 - (0.4010 + 0.35)/2 = -0.3605
Example:

                 A      B       C       D       E       r        r/3
A (human)        —      0.015   0.045   0.143   0.198   0.4010   0.1337
B (chimp)               —       0.030   0.126   0.179   0.3500   0.1167
C (gorilla)                     —       0.092   0.179   0.3460   0.1153
D (orangutan)                           —       0.179   0.5400   0.1800
E (gibbon)                                      —       0.7350   0.2450

Branch length from D to the new node: d = 0.179/2 + (0.18 - 0.245)/2 = 0.057
Branch length from E to the new node: e = 0.179 - 0.0572 = 0.122
In the next round, human and chimpanzee have the smallest value of Mij, so they are replaced by node 2.
In the matrix, the upper triangle holds the distances dij and the lower triangle holds the values Mij.
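The first neighbor-joining round can be sketched in Python from the hominoid distance matrix. One assumption is flagged here and in the code: the sketch uses the standard criterion Mij = dij - (ri + rj)/(N - 2), so it divides by 3 for five taxa, whereas the Mij values in the matrix were computed with a divisor of 2. The Mij values therefore differ, but both versions select the same pair in this round. Variable names are illustrative.

```python
from itertools import combinations

taxa = ["A", "B", "C", "D", "E"]  # human, chimp, gorilla, orangutan, gibbon
d = {
    ("A", "B"): 0.015, ("A", "C"): 0.045, ("A", "D"): 0.143, ("A", "E"): 0.198,
    ("B", "C"): 0.030, ("B", "D"): 0.126, ("B", "E"): 0.179,
    ("C", "D"): 0.092, ("C", "E"): 0.179, ("D", "E"): 0.179,
}


def dist(i, j):
    if i == j:
        return 0.0
    return d[(i, j)] if (i, j) in d else d[(j, i)]


n = len(taxa)
# r: sum of distances from each taxon to all others.
r = {i: sum(dist(i, j) for j in taxa) for i in taxa}
# Assumption: standard NJ criterion with divisor N - 2 (not 2 as in the matrix).
m = {
    (i, j): dist(i, j) - (r[i] + r[j]) / (n - 2)
    for i, j in combinations(taxa, 2)
}
i, j = min(m, key=m.get)  # pair with the smallest M value
branch_i = dist(i, j) / 2 + (r[i] - r[j]) / (2 * (n - 2))
branch_j = dist(i, j) - branch_i
print((i, j), round(branch_i, 3), round(branch_j, 3))  # ('D', 'E') 0.057 0.122
```

The two branch lengths match d = 0.057 and e = 0.122. The following rounds repeat the same calculation on a matrix in which the joined pair is replaced by its new node.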
• PHYLIP (Phylogeny Inference Package)
[Figure: neighbor-joining tree with internal nodes 1, 2 and 3 and leaves A, B, C, D, E]
Branch lengths: a = 0.016, b = -0.001, c = 0.006, d = 0.057, e = 0.122, 1' = 0.0403, 2' = 0.024
• UPGMA
• Neighbor-joining (NJ)
Hamwieh, A., Udupa, S., Sarker, A., Jung, C. and Baum, M. (2009). Development of new microsatellite markers and their application in the analysis of genetic diversity in lentils. Breeding Science 59: 77-86.
Project 2: Genetic diversity in lentils
300 accessions
2915 accessions
Chickpea Reference Set (GCP)
Upadhyaya HD, Dwivedi SL, Baum M, Varshney RK, Udupa SM, Gowda CLL, Hoisington D and Singh S (2008) Genetic structure, diversity, and allelic richness in composite collection and reference set in chickpea (Cicer arietinum L.). BMC Plant Biology 8: 106.
Thank you