Molecular Evolution and Phylogeny
Examples
Weakly deleterious mutations
Weakly deleterious mutations can reach high frequencies in local populations and thus may contribute significantly to genetic variance in disease susceptibility.
Sequencing of human polymorphisms
A team at Celera Genomics used exon-specific polymerase chain reaction (PCR) amplification to sequence 20,362 loci in 20 European Americans, 19 African Americans and one male chimpanzee. The initial intention was to find novel nonsynonymous single nucleotide polymorphisms (SNPs) based on their 2001 build of the human genome.
Divergence between human and chimpanzee
A total of 34,099 fixed synonymous differences between the 39 humans and the chimpanzee yield a genomic average synonymous divergence of dS = 1.02%. The 20,467 fixed nonsynonymous differences yield dN = 0.242%, across 11.81 megabases (Mb) of aligned coding DNA.
Polymorphisms
There are 15,750 synonymous and 14,311 nonsynonymous SNPs among the human subjects, yielding average synonymous and nonsynonymous SNP densities of pS = 0.470% and pN = 0.169%.
Amino acid polymorphism exceeds divergence
There is a highly significant excess of amino acid variation relative to divergence: pN/pS = 0.169/0.470 is about 0.36, whereas dN/dS = 0.242/1.02 is about 0.24.
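The comparison can be checked directly from the counts on the preceding slides. The sketch below arranges them in a 2x2 table (fixed vs. polymorphic, synonymous vs. nonsynonymous), which is the layout used by McDonald-Kreitman-type tests. That label, the "neutrality index" name and the chi-square check are my additions, not something the slides state, and the variable names are illustrative.

```matlab
% Counts taken from the slides above.
% Rows: fixed differences, polymorphisms. Columns: synonymous, nonsynonymous.
counts = [34099 20467;    % fixed between the humans and the chimpanzee
          15750 14311];   % polymorphic among the 39 humans

% Nonsynonymous-to-synonymous ratio within each class.
ratioFixed = counts(1,2) / counts(1,1);
ratioPoly  = counts(2,2) / counts(2,1);

% Ratio of the two ratios (often called the neutrality index). Values above 1
% indicate an excess of amino acid polymorphism relative to divergence.
NI = ratioPoly / ratioFixed;

% Pearson chi-square statistic for independence in the 2x2 table (1 d.f.).
expected = sum(counts, 2) * sum(counts, 1) / sum(counts(:));
chi2 = sum((counts(:) - expected(:)).^2 ./ expected(:));

fprintf('fixed N/S = %.3f, polymorphic N/S = %.3f, NI = %.3f\n', ...
        ratioFixed, ratioPoly, NI);
fprintf('chi-square = %.1f (5%% critical value with 1 d.f. is 3.84)\n', chi2);
```

A chi-square value far above 3.84 is what "highly significant" refers to here. The published analysis may well have used a different test, so treat this only as a plausibility check.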
Can you comment on the following?
- Evolution of human populations since sharing a last common ancestor with chimps
- Type of nonsynonymous mutations (very deleterious or mildly deleterious) in human populations
- Positive selection
- Negative selection
- Disease associations?
Non-neutral evolution
- dN/dS = 1: neutral evolution
- dN/dS > 1: positive selection
- dN/dS < 1: negative (purifying) selection
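As a small worked example, the genome-wide divergence values quoted earlier (dN = 0.242%, dS = 1.02%) can be put through this rule: a ratio near 1 suggests neutral evolution, above 1 positive selection, and below 1 negative (purifying) selection. The tolerance `tol` is an arbitrary illustrative choice of mine. In practice the decision rests on a statistical test, not a fixed cutoff.

```matlab
% Genome-wide divergence values quoted earlier in the slides (in percent).
dN = 0.242;   % nonsynonymous divergence
dS = 1.02;    % synonymous divergence
omega = dN / dS;

% Tolerance for treating the ratio as "equal to 1" (illustrative assumption).
tol = 0.05;
if abs(omega - 1) <= tol
    regime = 'neutral evolution';
elseif omega > 1
    regime = 'positive selection';
else
    regime = 'negative (purifying) selection';
end
fprintf('dN/dS = %.3f -> %s\n', omega, regime);
```

A genome-wide average well below 1 is expected, because most amino acid changes in most genes are deleterious. Individual genes can still depart from the average.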
Accelerated evolution of genes
What makes us a vertebrate?
- Neural crest?
- Highly sophisticated nervous system?
- Bones/cartilage?
- Vertebrate-specific genes?
Origin of Bilateria
Some vertebrate genes predate the origin of Bilateria.
Bilateria
Bilateria: a monophyletic group of metazoan animals characterized by bilateral symmetry.
Radial symmetry
Bilateria excludes the Cnidaria, Ctenophora (sea gooseberries), Porifera (sponges) and Placozoa.
A little taxonomy
Cnidaria
Cnidaria: a basal phylum with two body layers and radial symmetry, at the tissue grade of morphological organization. There are two basic morphologies: the sessile polyp and the swimming medusa, or jellyfish. The phylum contains four classes; examples include jellyfish, sea anemones and hydra.
Body Axis
Oral-aboral axis: the single obvious body axis of the two radiate phyla (Cnidaria and Ctenophora), marked at one end by the mouth or oral pore.
Wnt Signaling
In the Wnt signalling pathway, ligand binding triggers the formation of a receptor complex, and protein kinases modify the receptor tails, leading to recruitment of cytoplasmic factors. In other signalling pathways, receptor-induced protein phosphorylation amplifies the signal, and the receptor-associated kinase acts as a catalyst for the modification of many substrate molecules.
Wnt genes
- Mammals have 19 Wnt genes
- The sea anemone Nematostella vectensis, a diploblast, has 12
Kusserow A, Pang K, Sturm C, Hrouda M, Lentfer J, Schmidt HA, Technau U, von Haeseler A, Hobmayer B, Martindale MQ, Holstein TW (2005) Unexpected complexity of the Wnt gene family in a sea anemone. Nature 433:156-160.
Phylogenetic tree of wnts
Expression of Wnts
The original bilaterian was equipped with a fairly elaborate set of molecular tools.
Endoderm, ectoderm, mesoderm
For example, the Nematostella ectodermal genes NvWnt1, NvWnt2, NvWnt4 and NvWnt7 correspond to the neuroectodermal Wnt genes in the higher Bilateria. NvWnt5, NvWnt6 and NvWnt8 are expressed in the endoderm, whereas the corresponding genes in deuterostomes are all expressed in the mesoderm.
Collagen
Bone is closely linked to cartilage, both in development and in evolution, with earlier forms having a cartilaginous skeleton that is replaced by bone. In vertebrates, cartilage also contains threads of collagen running through it.
Bone is a living tissue that continually remodels its mineral matrix, which is threaded with fibers of a protein, collagen, that give it strength.
Collagen is an ancient protein (800 million years old?). There are about 27 different types of collagen in at least a dozen different classes. http://web.indstate.edu/thcme/mwking/extracellularmatrix.html
One particular type, type II collagen, is an essential part of the cartilage matrix; bone matrix is built mainly on type I collagen.
A primitive jawless fish from the late Devonian, around 370 million years ago. Do lampreys have collagen?
Initially it was thought lampreys don't have collagen, but they do!
Zhang et al. screened a library of lamprey sequences and isolated two forms of collagen II, Col2α1a and Col2α1b. The presence of a lamprey homolog of human collagen II indicates that the gene arose before the split between lampreys (jawless) and gnathostomes (true jaws). Col2α1 is used in the developing branchial cartilaginous skeleton.
Zhang G, Miyamoto MM, Cohn MJ. Lamprey type II collagen and Sox9 reveal an ancient origin of the vertebrate collagenous skeleton. Proc Natl Acad Sci U S A. 2006 Feb 21.
Bootstrapping
The bootstrap is a procedure that involves choosing random samples with replacement from a data set and analyzing each sample the same way.
Sampling with replacement means that every sampled data point is returned to the data set after it is drawn, so a particular data point from the original data set can appear multiple times in a given bootstrap sample.
The number of elements in each bootstrap sample equals the number of elements in the original data set. The range of sample estimates we obtain allows us to establish the uncertainty of the quantity we are estimating.
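The procedure can be sketched in a few lines of base MATLAB, without the `bootstrp` function. The data vector `x` is invented for illustration, and the statistic (the mean) is just one possible choice.

```matlab
% Hypothetical data set, invented for illustration only.
x = [4.1 5.3 2.8 6.0 5.5 3.9 4.7 5.1 6.4 3.2];
n = numel(x);
nBoot = 1000;
rng(0);                            % fix the random seed so the run is repeatable

bootMeans = zeros(nBoot, 1);
for b = 1:nBoot
    idx = randi(n, 1, n);          % n indices drawn with replacement
    bootMeans(b) = mean(x(idx));   % analyze each sample the same way
end

% The spread of the bootstrap estimates measures the uncertainty of the mean.
fprintf('sample mean = %.3f, bootstrap SE = %.3f\n', mean(x), std(bootMeans));
```

Each bootstrap sample has the same size `n` as the original data, and repeated indices in `idx` are what "with replacement" means in practice.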
Reliability of a tree
One way to assess the reliability of an estimated tree is to examine the reliability of each interior branch.
The reliability of an inferred tree can be examined using Efron's bootstrap resampling technique. A set of nucleotide sites is randomly sampled with replacement from the original set, and this random set is used for constructing a new phylogenetic tree. This process is repeated many times, and the proportion of replications in which a given sequence cluster appears is computed. If this proportion (PB) is high (say, PB > 0.95) for a sequence cluster, this cluster is considered to be statistically significant.
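A toy version of this resampling can be written in base MATLAB. The alignment below is invented, and full tree construction is replaced by a much simpler stand-in: the cluster (seq1, seq2) counts as "appearing" in a replicate when that pair has the smallest pairwise p-distance, which is the first pair a UPGMA-style method would join. This is a sketch of the site-resampling idea, not a real phylogenetic analysis.

```matlab
% Hypothetical toy alignment: rows are sequences, columns are aligned sites.
aln = ['ACGTACGTACGTACGTACGT';
       'ACGTACGTACGAACGTACGT';
       'ACGTTCGTACCTACGAACTT';
       'TCGTTCGAACCTACGAACTA'];
[nSeq, nSites] = size(aln);
nBoot = 1000;
rng(1);                                   % repeatable random numbers

hits = 0;
for b = 1:nBoot
    cols = randi(nSites, 1, nSites);      % sample sites with replacement
    boot = aln(:, cols);                  % bootstrapped alignment

    % Pairwise p-distances (proportion of differing sites), upper triangle.
    d = inf(nSeq);
    for i = 1:nSeq-1
        for j = i+1:nSeq
            d(i,j) = mean(boot(i,:) ~= boot(j,:));
        end
    end

    % Closest pair; on ties min returns the first entry, which favors (1,2).
    [~, k] = min(d(:));
    [i1, j1] = ind2sub(size(d), k);
    if i1 == 1 && j1 == 2
        hits = hits + 1;
    end
end

PB = hits / nBoot;                        % bootstrap proportion for (seq1, seq2)
fprintf('PB for cluster (seq1, seq2) = %.3f\n', PB);
```

In a real analysis each bootstrapped alignment would be passed to the same tree-building method used for the original data, and PB would be tallied for every interior branch.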
Bootstrapping
- Open MATLAB
- Open Help
- Type "bootstrap" and read
Example
>> load lawdata
>> plot(lsat,gpa,'+')
>> lsline
Plot of lsat vs. gpa
Calculate correlation between lsat and gpa
>> rhohat = corrcoef(lsat,gpa)
rhohat =
    1.0000    0.7764
    0.7764    1.0000
Is 0.78 significant?
Now we have a number, 0.7764, describing the positive connection between LSAT and GPA. Although 0.7764 may seem large, we still do not know whether it is statistically significant.
The bootstrp function
Using the bootstrp function we can resample the lsat and gpa vectors as many times as we like and consider the variation in the resulting correlation coefficients.
Generate 1000 lsat and gpa vectors by resampling from the original vectors
>> rhos1000 = bootstrp(1000,'corrcoef',lsat,gpa);
>> hist(rhos1000(:,2),30)
What is the uncertainty associated with the observed correlation?
>> mean(rhos1000(:,2))
ans = 0.7711
>> std(rhos1000(:,2))
ans = 0.1350
>> 0.1350*1.96
ans = 0.2646
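The value 1.96 × 0.1350 = 0.2646 is the half-width of a normal-approximation 95% interval. With the numbers above, 0.7711 + 0.2646 is greater than 1, which is impossible for a correlation. A percentile interval taken directly from the bootstrap distribution avoids this. The sketch below assumes the Statistics and Machine Learning Toolbox (`lawdata`, `bootstrp`, `corr`, `prctile`). It uses `@corr`, which returns a single coefficient, in place of the `'corrcoef'` string used on the slide, so no column indexing is needed.

```matlab
% Requires the Statistics and Machine Learning Toolbox.
load lawdata                                   % provides vectors lsat and gpa
rng(0);                                        % repeatable resampling

% One correlation coefficient per bootstrap resample (1000-by-1 vector).
rhos1000 = bootstrp(1000, @corr, lsat, gpa);

% Normal-approximation interval, as on the slide: mean +/- 1.96 * std.
ciNormal = mean(rhos1000) + 1.96 * std(rhos1000) * [-1 1];

% Percentile interval: the middle 95% of the bootstrap distribution.
ciPercentile = prctile(rhos1000, [2.5 97.5]);

fprintf('normal approx: [%.3f, %.3f]\n', ciNormal(1), ciNormal(2));
fprintf('percentile:    [%.3f, %.3f]\n', ciPercentile(1), ciPercentile(2));
```

If the lower end of the interval stays well above zero, the positive association between LSAT and GPA is unlikely to be a sampling accident.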
You have data on the expression pattern of two genes
HOXA1 and CDK6 expression values in different tissues are collected.
Open the Excel file named data.xls.
Copy and paste the two numerical data columns into the workspace, naming the data a:
>> a = [ paste.here and close bracket];
Calculate the uncertainty associated with the correlation between the HOXA1 and CDK6 genes
- Plot the expression values (x, HOXA1; y, CDK6).
- Place an lsline on the data.
- Calculate the correlation coefficient between the genes.
- Generate 1000 bootstrapped samples to estimate the sample correlation coefficient.
- Determine the 95% confidence interval around the bootstrapped correlation coefficient.
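One possible outline of these steps is sketched below. The contents of data.xls are not available here, so the matrix `a` is filled with synthetic numbers as a stand-in and should be replaced by the pasted columns. The sketch assumes the Statistics and Machine Learning Toolbox (`lsline`, `corr`, `bootci`) and asks `bootci` for a percentile interval, which is a choice of mine, since the exercise does not say which interval type to use.

```matlab
% Requires the Statistics and Machine Learning Toolbox.
% Synthetic stand-in for the two pasted columns (column 1: HOXA1, column 2: CDK6).
rng(0);
hoxa1 = randn(30, 1);
cdk6  = 0.6 * hoxa1 + 0.8 * randn(30, 1);
a = [hoxa1 cdk6];

x = a(:,1);                        % HOXA1 expression
y = a(:,2);                        % CDK6 expression

plot(x, y, '+');                   % scatter plot of the expression values
xlabel('HOXA1'); ylabel('CDK6');
lsline                             % add the least-squares line

r = corr(x, y);                    % observed correlation coefficient

% 95% percentile bootstrap confidence interval from 1000 resamples.
ci = bootci(1000, {@corr, x, y}, 'type', 'per');

fprintf('r = %.3f, 95%% CI = [%.3f, %.3f]\n', r, ci(1), ci(2));
```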
Bootstrap of align2.m
Generate 1000 samples of the bootstrapped alignment score and its 95% confidence interval using the bootstrp function.
Bayesian Inference
Three basic methods have been used to estimate phylogeny: distance, maximum parsimony (MP) and maximum likelihood (ML). Bayesian statistics differs in that, in addition to the current data, prior knowledge is included in the testing of the hypothesis.
Medical tests and Bayesian Stats
Assume that previous studies have evaluated the accuracy of a medical test for an illness and have shown that, if you are in fact ill, there is a 99% likelihood that the test will give a true positive result (and thus a 1% likelihood that the test will give a false negative).
It was also found that if you are healthy, there is a 0.1% likelihood of a false positive result from the test. If we were simply using the data (i.e., the test result), we would then conclude that a positive test result had approximately a 99% chance of being correct.
If we were to examine this question in a Bayesian framework, we could incorporate prior knowledge, in this case that other studies have shown that the base rate of this illness is 0.1% in the population. Thus, of a population of 100,000 individuals, 100 would be ill and 99,900 would be healthy.
Using the likelihood values mentioned above, we could conclude that a positive test result would be seen in 99% of the ill individuals (99 true positives) and 0.1% of the healthy individuals (approximately 100 false positives).
This leaves us with the conclusion that if a person has a positive test result, there is a 99/199 or approximately 50% chance that the test is correct and this person is actually ill. Therefore, by including prior knowledge of the base rate of the illness in the population, the perceived chance that a positive result indicates that an individual actually has the illness drops from 99% to 50%.
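The same count-based argument can be written as Bayes' theorem. The notation below is mine, not the slides': "+" denotes a positive test result, and the three input probabilities are the ones given above (99% true positive rate, 0.1% false positive rate, 0.1% base rate).

```latex
\[
P(\text{ill} \mid +)
  = \frac{P(+ \mid \text{ill})\, P(\text{ill})}
         {P(+ \mid \text{ill})\, P(\text{ill}) + P(+ \mid \text{healthy})\, P(\text{healthy})}
  = \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.001 \times 0.999}
  = \frac{0.00099}{0.001989}
  \approx 0.498
\]
```

The exact value differs slightly from 99/199 because the count-based version rounds 99.9 false positives up to 100. Either way the answer is close to 50%.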