Page 1: Molecular Evolution and Phylogeny

Molecular Evolution and Phylogeny

Examples

Page 2: Molecular Evolution and Phylogeny

Weakly deleterious mutations

Weakly deleterious mutations can reach high frequencies in local populations and, thus, may contribute significantly to genetic variance in disease susceptibility.

Page 3: Molecular Evolution and Phylogeny

Sequencing of human polymorphisms

A team at Celera Genomics used exon-specific polymerase chain reaction (PCR) amplification to sequence 20,362 loci in 20 European Americans, 19 African Americans and one male chimpanzee. The initial aim was to find novel nonsynonymous single nucleotide polymorphisms (SNPs), based on their 2001 build of the human genome.

Page 4: Molecular Evolution and Phylogeny

Divergence between human and chimpanzee

A total of 34,099 fixed synonymous differences between 39 humans and the chimpanzee yield a genomic average synonymous divergence of dS = 1.02%.

A total of 20,467 fixed non-synonymous differences yield dN = 0.242% across 11.81 megabases (Mb) of aligned coding DNA.

Page 5: Molecular Evolution and Phylogeny

Polymorphisms

There are 15,750 synonymous and 14,311 non-synonymous SNPs among the human subjects, yielding average synonymous and non-synonymous SNP densities of pS = 0.470% and pN = 0.169%.

Page 6: Molecular Evolution and Phylogeny

Polymorphisms are more than divergence

The data show a highly significant excess of amino acid (non-synonymous) polymorphism relative to divergence: pN/pS = 0.169/0.470 ≈ 0.36, whereas dN/dS = 0.242/1.02 ≈ 0.24.
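The slides do not say which test produced the "highly significant" result. As a rough sketch, assuming a McDonald-Kreitman-style 2x2 table and a plain Pearson chi-square (both are assumptions made here for illustration), the counts quoted on the earlier slides can be compared in Python as follows:

```python
# Counts quoted in the slides: fixed human-chimpanzee differences and human SNPs.
fixed_syn, fixed_nonsyn = 34099, 20467   # divergence
poly_syn, poly_nonsyn = 15750, 14311     # polymorphism

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 d.f., no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Ratios of raw counts (not per-site rates, so they differ from dN/dS and pN/pS).
divergence_ratio = fixed_nonsyn / fixed_syn
polymorphism_ratio = poly_nonsyn / poly_syn
chi2 = chi_square_2x2(fixed_syn, fixed_nonsyn, poly_syn, poly_nonsyn)

print(f"Dn/Ds = {divergence_ratio:.3f}, Pn/Ps = {polymorphism_ratio:.3f}, chi-square = {chi2:.1f}")
```

A statistic far above 3.84 (the 5% critical value for one degree of freedom) would agree with the excess described on this slide.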

Page 7: Molecular Evolution and Phylogeny

Can you comment on the following?

Evolution of human populations since sharing a last common ancestor with chimps

Type of nonsynonymous mutations (very deleterious or mildly deleterious) in human populations

Positive selection

Negative selection

Disease associations?

Page 8: Molecular Evolution and Phylogeny

Non-neutral evolution

dN/dS = 1: neutral evolution

dN/dS > 1: positive selection

dN/dS < 1: negative selection
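As a small illustration, the rule can be applied to the genome-wide averages quoted on the divergence slide (dN = 0.242%, dS = 1.02%). This is a Python sketch. The function name and the `tolerance` band around 1 are hypothetical choices made here, not part of the slides, and a real analysis would test the ratio statistically rather than threshold it.

```python
def classify_selection(dn, ds, tolerance=0.05):
    """Label a dN/dS ratio; `tolerance` is an illustrative band around 1."""
    omega = dn / ds
    if abs(omega - 1.0) <= tolerance:
        label = "neutral evolution"
    elif omega > 1.0:
        label = "positive selection"
    else:
        label = "negative selection"
    return omega, label

# Genome-wide averages from the slides, in percent (the units cancel in the ratio).
omega, label = classify_selection(dn=0.242, ds=1.02)
print(f"dN/dS = {omega:.3f}: {label}")
```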

Page 9: Molecular Evolution and Phylogeny

Accelerated evolution of genes

Page 10: Molecular Evolution and Phylogeny

What makes us a vertebrate?

Neural crest?

Highly sophisticated nervous system?

Bones/cartilage?

Vertebrate-specific genes?

Page 11: Molecular Evolution and Phylogeny

Origin of bilateria

Some vertebrate genes date prior to the origin of bilateria

Page 12: Molecular Evolution and Phylogeny

Bilateria

Bilateria: a monophyletic group of metazoan animals characterized by bilateral symmetry.

Page 13: Molecular Evolution and Phylogeny

Radial symmetry

Bilateria excludes the Cnidaria, Ctenophora (sea gooseberries), Porifera (sponges) and Placozoa.

Page 14: Molecular Evolution and Phylogeny

A little taxonomy

Page 15: Molecular Evolution and Phylogeny

Cnidaria

Cnidaria: a basal phylum with two body layers and radial symmetry, at the tissue grade of morphological organization.

There are two basic morphologies: the sessile polyp and the swimming medusa or jellyfish.

The phylum contains four classes; examples include jellyfish, sea anemones and hydra.

Page 16: Molecular Evolution and Phylogeny

Body Axis

Oral–aboral axis: the single obvious body axis of the two ‘radiate’ phyla (Cnidaria and Ctenophora), marked at one end by the mouth or oral pore.

Page 17: Molecular Evolution and Phylogeny

Wnt signaling

http://www.stanford.edu/~rnusse/reviews/NaVReviewFinal438747a.pdf

Page 18: Molecular Evolution and Phylogeny

Wnt Signaling

In the Wnt signalling pathway, ligand binding triggers the formation of a receptor complex, and protein kinases modify the receptor tails, leading to recruitment of cytoplasmic factors.

In other signalling pathways, receptor-induced protein phosphorylation amplifies the signal, and the receptor-associated kinase acts as a catalyst for the modification of many substrate molecules.

Page 19: Molecular Evolution and Phylogeny

Wnt genes

Mammals have 19 Wnt genes.

The sea anemone Nematostella vectensis, a diploblast, has 12.

Kusserow A, Pang K, Sturm C, Hrouda M, Lentfer J, Schmidt HA, Technau U, von Haeseler A, Hobmayer B, Martindale MQ, Holstein TW (2005) Unexpected complexity of the Wnt gene family in a sea anemone. Nature 433:156-160.

Page 20: Molecular Evolution and Phylogeny

Nematostella vectensis

http://www.nematostella.org/

Page 21: Molecular Evolution and Phylogeny

Phylogenetic tree of wnts

Page 22: Molecular Evolution and Phylogeny

Expression of wnts

The original bilaterian was equipped with a fairly elaborate set of molecular tools.

Page 23: Molecular Evolution and Phylogeny

Endoderm, ectoderm, mesoderm

For example, the Nematostella ectodermal genes, NvWnt1, NvWnt2, NvWnt4 and NvWnt7 correspond to the neuroectodermal Wnt genes in the higher Bilateria.

NvWnt5, NvWnt6 and NvWnt8 are expressed in the endoderm, whereas the corresponding genes in deuterostomes are all expressed in the mesoderm.

Page 24: Molecular Evolution and Phylogeny

Collagen

Bone is significantly linked to cartilage, both in development and evolution, with earlier forms having a cartilaginous skeleton that is replaced by bone. In vertebrates, cartilage also contains threads of collagen running through it.

Page 25: Molecular Evolution and Phylogeny

Collagen

Bone is a living tissue that continually remodels its mineral matrix, which is threaded with fibers of a protein, type II collagen, that give it strength.

Page 26: Molecular Evolution and Phylogeny

Collagen

Collagen is an ancient protein (800 million years ago?).

There are about 27 different types of collagen in at least a dozen different classes. http://web.indstate.edu/thcme/mwking/extracellularmatrix.html

One particular type, type II collagen, is an essential part of the matrix of bone and cartilage.

Page 27: Molecular Evolution and Phylogeny

A primitive jawless fish from the late Devonian, around 370 million years ago.

Do lampreys have collagen?

Page 28: Molecular Evolution and Phylogeny

Initially it was thought lampreys don’t have collagen

Zhang et al. screened a library of lamprey sequences and isolated two forms of collagen II, Col2α1a and Col2α1b.

The presence of a lamprey collagen homolog related to human collagen II indicates that the gene arose before the split between lampreys (jawless) and gnathostomes (true jaws).

Col2α1 is expressed in the developing branchial cartilaginous skeleton.

Zhang G, Miyamoto MM, Cohn MJ (2006) Lamprey type II collagen and Sox9 reveal an ancient origin of the vertebrate collagenous skeleton. Proc Natl Acad Sci U S A, 2006 Feb 21.

Page 29: Molecular Evolution and Phylogeny

but they do!

Page 30: Molecular Evolution and Phylogeny

Collagen phylogeny

Page 31: Molecular Evolution and Phylogeny

Bootstrapping

The bootstrap is a procedure that involves choosing random samples with replacement from a data set and analyzing each sample the same way.

Page 32: Molecular Evolution and Phylogeny

Bootstrapping

Sampling with replacement means that every sample is returned to the data set after sampling. So a particular data point from the original data set could appear multiple times in a given bootstrap sample.

Page 33: Molecular Evolution and Phylogeny

Bootstrapping

The number of elements in each bootstrap sample equals the number of elements in the original data set. The range of sample estimates we obtain allows us to establish the uncertainty of the quantity we are estimating.
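The three properties described on the bootstrapping slides (random sampling, with replacement, and samples of the original size) can be sketched in a few lines. This is a Python illustration rather than the MATLAB used in the exercises later on, and the data values are made up for the example.

```python
import random
import statistics

def bootstrap_means(data, n_boot=1000, seed=0):
    """Return the mean of each bootstrap sample drawn with replacement from `data`."""
    rng = random.Random(seed)
    n = len(data)
    means = []
    for _ in range(n_boot):
        # Same size as the original data set; a point may be drawn more than once.
        sample = rng.choices(data, k=n)
        means.append(statistics.mean(sample))
    return means

data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]   # made-up values for illustration
means = bootstrap_means(data)
spread = statistics.stdev(means)   # bootstrap estimate of the standard error of the mean
print(f"mean = {statistics.mean(data):.3f}, bootstrap SE = {spread:.3f}")
```

The spread of the 1000 bootstrap means is what quantifies the uncertainty of the estimate.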

Page 34: Molecular Evolution and Phylogeny

Reliability of a tree

One way to assess the reliability of an estimated tree is to examine the reliability of each interior branch.

Page 35: Molecular Evolution and Phylogeny

Bootstrap

The reliability of an inferred tree is examined by using Efron’s bootstrap resampling technique.

A set of nucleotide sites is randomly sampled with replacement from the original set, and this random set is used for constructing a new phylogenetic tree.

This process is repeated many times, and the proportion of replications in which a given sequence cluster appears is computed.

If this proportion (PB) is high (say, PB > 0.95) for a sequence cluster, this cluster is considered to be statistically significant.
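A toy version of this procedure is sketched below in Python. The alignment is made up, and real tree building is replaced by a much simpler stand-in: a pair of sequences counts as a cluster in a replicate when the two are strictly closer to each other (Hamming distance) than either is to any other sequence. All names here are hypothetical, and an actual analysis would rebuild a full tree for every replicate.

```python
import random

# Made-up toy alignment (not from the slides): rows are sequences, columns are sites.
alignment = {
    "A": "ACGTACGTACGTACGTACGT",
    "B": "ACGTACGTACGAACGTACGT",
    "C": "TCGAACTTACGTTCGTAGGT",
    "D": "TCGAACTTACGTTCGAAGGA",
}

def hamming(s, t):
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(s, t))

def is_cluster(aln, a, b):
    """Stand-in for tree building: a and b are strictly closer to each other than to the rest."""
    d_ab = hamming(aln[a], aln[b])
    others = [name for name in aln if name not in (a, b)]
    return all(d_ab < hamming(aln[a], aln[o]) and d_ab < hamming(aln[b], aln[o])
               for o in others)

def bootstrap_support(aln, a, b, n_boot=1000, seed=0):
    """Proportion of replicates (PB) in which the cluster (a, b) appears."""
    rng = random.Random(seed)
    n_sites = len(next(iter(aln.values())))
    hits = 0
    for _ in range(n_boot):
        cols = rng.choices(range(n_sites), k=n_sites)   # resample sites with replacement
        resampled = {name: "".join(seq[i] for i in cols) for name, seq in aln.items()}
        hits += is_cluster(resampled, a, b)
    return hits / n_boot

support = bootstrap_support(alignment, "A", "B")
print(f"PB for cluster (A, B) = {support:.3f}")
```

The same columns are drawn for every sequence in a replicate, so each resampled alignment keeps the sites intact and only changes how often each site is counted.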

Page 36: Molecular Evolution and Phylogeny

Bootstrap values

Page 37: Molecular Evolution and Phylogeny

Bootstrapping

Open Matlab.

Open Help.

Type bootstrap and read.

Page 38: Molecular Evolution and Phylogeny

Example

> load lawdata

> plot(lsat,gpa,'+')

> lsline

Page 39: Molecular Evolution and Phylogeny

Plot of lsat vs. gpa

Page 40: Molecular Evolution and Phylogeny

Calculate correlation between lsat and gpa

> rhohat = corrcoef(lsat,gpa)

rhohat =

1.0000 0.7764

0.7764 1.0000

Page 41: Molecular Evolution and Phylogeny

Is 0.78 significant?

Now we have a number, 0.7764, describing the positive connection between LSAT and GPA, but though 0.7764 may seem large, we still do not know if it is statistically significant.

Page 42: Molecular Evolution and Phylogeny

Bootstrp function

Using the bootstrp function we can resample the lsat and gpa vectors as many times as we like and consider the variation in the resulting correlation coefficients.

Page 43: Molecular Evolution and Phylogeny

Generate 1000 lsat and gpa vectors by resampling from the original vectors

rhos1000 = bootstrp(1000,'corrcoef',lsat,gpa);

hist(rhos1000(:,2),30)

Page 44: Molecular Evolution and Phylogeny

What is the uncertainty associated with the observed correlation?

>> mean(rhos1000(:,2))

ans = 0.7711

>> std(rhos1000(:,2))

ans = 0.1350

>> 0.1350*1.96

ans = 0.2646

Mean +/- 1.96*std
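For readers without MATLAB, the same idea can be sketched in plain Python. This is an illustration under stated assumptions: the paired data are synthetic (generated with a fixed seed, not the lawdata set), and the helper names are hypothetical. It reports both the normal-approximation interval used above (mean +/- 1.96*std) and a percentile interval taken directly from the sorted bootstrap values.

```python
import random
import statistics

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def bootstrap_correlation(x, y, n_boot=1000, seed=0):
    """Resample (x, y) pairs with replacement and return n_boot correlation coefficients."""
    rng = random.Random(seed)
    n = len(x)
    rhos = []
    while len(rhos) < n_boot:
        idx = rng.choices(range(n), k=n)          # keep each x paired with its y
        xs, ys = [x[i] for i in idx], [y[i] for i in idx]
        if len(set(xs)) > 1 and len(set(ys)) > 1:  # skip degenerate (constant) samples
            rhos.append(pearson(xs, ys))
    return rhos

# Synthetic, positively correlated data for illustration only.
gen = random.Random(1)
x = [gen.gauss(0, 1) for _ in range(30)]
y = [xi + gen.gauss(0, 0.8) for xi in x]

rho_hat = pearson(x, y)
rhos = sorted(bootstrap_correlation(x, y))
m, s = statistics.mean(rhos), statistics.stdev(rhos)
normal_ci = (m - 1.96 * s, m + 1.96 * s)
percentile_ci = (rhos[24], rhos[974])   # approximate 2.5th and 97.5th percentiles of 1000 values

print(f"rho = {rho_hat:.3f}")
print(f"normal-approximation 95% CI: {normal_ci[0]:.3f} to {normal_ci[1]:.3f}")
print(f"percentile 95% CI: {percentile_ci[0]:.3f} to {percentile_ci[1]:.3f}")
```

If the interval excludes zero, the observed correlation is unlikely to be a sampling accident. The percentile interval is often preferred for correlations because their bootstrap distribution is skewed near 1.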

Page 45: Molecular Evolution and Phylogeny

You have data on the expression pattern of two genes

HOXA1 and CDK6 expression values in different tissues are collected.

Open the Excel file named data.xls.

Copy and paste the numerical data columns (two of them) into the workspace as follows, naming the data ‘a’:

>> a =[ paste….here and close bracket];

Page 46: Molecular Evolution and Phylogeny

Calculate the uncertainty associated with the correlation between HOXA1 and CDK6 genes

Plot the expression values (x, HOXA1; and y, CDK6).

Place a least-squares line (lsline) on the data.

Calculate the correlation coefficient between the genes.

Generate 1000 bootstrapped samples to estimate the sample correlation coefficient.

Determine the 95% confidence interval around the bootstrapped correlation coefficient.

Page 47: Molecular Evolution and Phylogeny

Bootstrap of align2.m

Generate 1000 bootstrapped samples of the alignment score and determine its 95% confidence interval using the ‘bootstrp’ function.

Page 48: Molecular Evolution and Phylogeny

Bayesian Inference

Three basic methods have been used to estimate phylogeny: distance, maximum parsimony (MP), and maximum likelihood (ML).

Bayesian inference differs in that prior knowledge, in addition to the current data, is included when testing a hypothesis.

Page 49: Molecular Evolution and Phylogeny

Medical tests and Bayesian Stats

Consider a medical test for an illness. Assume that previous studies have evaluated the accuracy of this test and have shown that, if you are in fact ill, there is a 99% likelihood that the test will give a true positive result (and thus, a 1% likelihood that the test will give a false negative).

Page 50: Molecular Evolution and Phylogeny

Medical tests and Bayesian Stats

It was also found that if you are healthy, there is a 0.1% likelihood of a false positive result from the test. If we were simply using the “data” (i.e., the test result), we would then conclude that a positive test result had approximately a 99% chance of being correct.

Page 51: Molecular Evolution and Phylogeny

Medical tests and Bayesian Stats

If we were to examine this question in a Bayesian framework, we could incorporate prior knowledge, in this case that other studies have shown that the base rate of this illness is 0.1% in the population.

Thus, of a population of 100,000 individuals, 100 would be ill and 99,900 would be healthy.

Page 52: Molecular Evolution and Phylogeny

Medical tests and Bayesian Stats

Using the likelihood values mentioned above, we could conclude that a positive test result would be seen in 99% of the ill individuals (99 true positives) and 0.1% of the healthy individuals (approximately 100 false positives).

Page 53: Molecular Evolution and Phylogeny

Medical tests and Bayesian Stats

This leaves us with a conclusion that if a person has a positive test result, there is a 99/199 or approximately 50% chance that the test is correct and this person is actually ill. Therefore, by including prior knowledge of the base rate of the illness in the population, the perceived chance that a positive result indicates that an individual actually has the illness drops from 99% to 50%.
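The same arithmetic can be written as a direct application of Bayes' rule. The short Python sketch below uses only the three numbers given on the slides (base rate 0.1%, true positive rate 99%, false positive rate 0.1%); the function and parameter names are chosen here for illustration.

```python
def posterior_ill_given_positive(prior, sensitivity, false_positive_rate):
    """P(ill | positive test) by Bayes' rule."""
    true_pos = sensitivity * prior                     # ill and test positive
    false_pos = false_positive_rate * (1.0 - prior)    # healthy and test positive
    return true_pos / (true_pos + false_pos)

posterior = posterior_ill_given_positive(prior=0.001,
                                         sensitivity=0.99,
                                         false_positive_rate=0.001)
print(f"P(ill | positive) = {posterior:.3f}")

# The same calculation with head counts in a population of 100,000.
population = 100_000
ill = population * 0.001                 # 100 ill individuals
healthy = population - ill               # 99,900 healthy individuals
true_positives = 0.99 * ill              # 99
false_positives = 0.001 * healthy        # 99.9, which the slides round to 100
print(f"{true_positives:.0f} true positives, {false_positives:.1f} false positives")
```

Changing `prior` shows how strongly the conclusion depends on the base rate: with a prior of 10% instead of 0.1%, the same positive result would be correct far more often.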