Molecular Evolution and Phylogeny

Post on 24-Jan-2016

Examples

Weakly deleterious mutations

Weakly deleterious mutations can reach high frequencies in local populations and, thus, may contribute significantly to genetic variance in disease susceptibility.

Sequencing of human polymorphisms

A team at Celera Genomics used exon-specific polymerase chain reaction (PCR) amplification to sequence 20,362 loci in 20 European Americans, 19 African Americans and one male chimpanzee. The initial intention was to find novel nonsynonymous single nucleotide polymorphisms (SNPs) based on their 2001 build of the human genome.

Divergence between human and chimpanzee

A total of 34,099 fixed synonymous differences between 39 humans and the chimpanzee yield a genomic average synonymous divergence of dS = 1.02%.

20,467 fixed non-synonymous differences yield dN = 0.242%, across 11.81 megabases (Mb) of aligned coding DNA.

Polymorphisms

15,750 synonymous and 14,311 non-synonymous SNPs among the human subjects, yielding average synonymous and non-synonymous SNP densities of pS = 0.470% and pN = 0.169%.

Polymorphism exceeds divergence

There is a highly significant excess of amino acid variation within humans relative to divergence from the chimpanzee: pN/pS is larger than dN/dS.
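The excess can be checked directly from the numbers quoted in these slides. The sketch below compares the two ratios and runs a chi-square test on the 2x2 table of counts; the choice of a chi-square test is an assumption made for illustration and is not necessarily the test the original study used.

```matlab
% Counts and densities quoted in the slides
fixedSyn = 34099;  fixedNon = 20467;   % fixed human-chimpanzee differences
polySyn  = 15750;  polyNon  = 14311;   % SNPs among the human subjects
dS = 1.02;   dN = 0.242;               % divergence, in percent
pS = 0.470;  pN = 0.169;               % SNP density, in percent

divRatio  = dN / dS;   % non-synonymous relative to synonymous, between species
polyRatio = pN / pS;   % the same ratio within humans

% 2x2 table: rows = fixed, polymorphic; columns = synonymous, non-synonymous
obs = [fixedSyn fixedNon; polySyn polyNon];
expected = sum(obs, 2) * sum(obs, 1) / sum(obs(:));   % counts expected under independence
chi2 = sum(sum((obs - expected).^2 ./ expected));     % 1 degree of freedom

fprintf('dN/dS = %.3f, pN/pS = %.3f, chi-square = %.1f\n', divRatio, polyRatio, chi2);
```

A chi-square value above 3.84 (1 degree of freedom, 5% level) would indicate that the proportion of non-synonymous changes differs between polymorphism and divergence.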

Can you comment on the following?

Evolution of human populations since sharing a last common ancestor with chimps

Type of nonsynonymous mutations (very deleterious or mildly deleterious) in human populations

Positive selection

Negative selection

Disease associations?

Non-neutral evolution

dN/dS = 1: neutral evolution

dN/dS > 1: positive selection

dN/dS < 1: negative selection
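As a small illustration, the rule can be applied to the genome-wide averages quoted earlier (dN = 0.242%, dS = 1.02%). The tolerance `tol` is a hypothetical value introduced here, since a real estimate is never exactly 1; it is not part of the slides.

```matlab
dN = 0.242;   % non-synonymous divergence, percent (from the slides)
dS = 1.02;    % synonymous divergence, percent (from the slides)
omega = dN / dS;

tol = 0.05;   % hypothetical tolerance for calling the ratio "about 1"
if abs(omega - 1) <= tol
    label = 'neutral evolution';
elseif omega > 1
    label = 'positive selection';
else
    label = 'negative selection';
end
fprintf('dN/dS = %.3f: %s\n', omega, label);
```

In practice the ratio is usually judged with a statistical test per gene rather than a fixed cut-off.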

Accelerated evolution of genes

What makes us a vertebrate?

Neural crest? Highly sophisticated nervous system? Bones/cartilage? Vertebrate specific genes?

Origin of bilateria

Some vertebrate genes date prior to the origin of bilateria

Bilateria

Bilateria: a monophyletic group of metazoan animals characterized by bilateral symmetry.

Radial symmetry

Bilateria excludes the Cnidaria, Ctenophora (sea gooseberries), Porifera (sponges) and Placozoa.

A little taxonomy

Cnidaria

Cnidaria: a basal phylum with two body layers and radial symmetry, at the tissue grade of morphological organization.

There are two basic morphologies; the sessile polyp and the swimming medusa or jellyfish.

The phylum contains four classes; examples include jellyfish, sea anemones and hydra.

Body Axis

Oral–aboral axis: the single obvious body axis of the two ‘radiate’ phyla (Cnidaria and Ctenophora), marked at one end by the mouth or oral pore.

Wnt signaling

http://www.stanford.edu/~rnusse/reviews/NaVReviewFinal438747a.pdf

Wnt Signaling

In the Wnt signalling pathway, ligand binding triggers the formation of a receptor complex, and protein kinases modify the receptor tails, leading to recruitment of cytoplasmic factors.

In other signalling pathways, receptor-induced protein phosphorylation amplifies the signal, and the receptor-associated kinase acts as a catalyst for the modification of many substrate molecules.

Wnt genes

Mammals have 19 Wnts.

The sea anemone Nematostella vectensis, a diploblast, has 12.

Kusserow A, Pang K, Sturm C, Hrouda M, Lentfer J, Schmidt HA, Technau U, von Haeseler A, Hobmayer B, Martindale MQ, Holstein TW (2005) Unexpected complexity of the Wnt gene family in a sea anemone. Nature 433:156-160.

Nematostella vectensis

http://www.nematostella.org/

Phylogenetic tree of wnts

Expression of wnts

The original bilaterian was equipped with a fairly elaborate set of molecular tools.

Endoderm, ectoderm, mesoderm

For example, the Nematostella ectodermal genes, NvWnt1, NvWnt2, NvWnt4 and NvWnt7 correspond to the neuroectodermal Wnt genes in the higher Bilateria.

NvWnt5, NvWnt6 and NvWnt8 are expressed in the endoderm, whereas the corresponding genes in deuterostomes are all expressed in the mesoderm.

Collagen

Bone is significantly linked to cartilage, both in development and evolution, with earlier forms having a cartilaginous skeleton that is replaced by bone. In vertebrates, cartilage also contains threads of collagen running through it.

Collagen

Bone is a living tissue that continually remodels its mineral matrix. The matrix is threaded with fibers of a protein, type II collagen, which gives it strength.

Collagen

Collagen is an ancient protein (800 million years ago?).

There are about 27 different types of collagen in at least a dozen different classes. http://web.indstate.edu/thcme/mwking/extracellularmatrix.html

One particular type, type II collagen, is an essential part of the matrix of bones and cartilages.

A primitive jawless fish from the late Devonian, around 370 million years ago.

Do lampreys have collagen?

Initially it was thought that lampreys don’t have collagen

Zhang et al. screened a library of lamprey sequences and isolated two forms of collagen II, Col2α1a and Col2α1b.

The presence of a lamprey collagen homolog related to human collagen II indicates that the gene arose before the split between lampreys (jawless) and gnathostomes (true jaws).

Col2α1 is used in the developing branchial cartilaginous skeleton.

Zhang G, Miyamoto MM, Cohn MJ (2006) Lamprey type II collagen and Sox9 reveal an ancient origin of the vertebrate collagenous skeleton. Proc Natl Acad Sci U S A. 2006 Feb 21.

but they do!

Collagen phylogeny

Bootstrapping

The bootstrap is a procedure that involves choosing random samples with replacement from a data set and analyzing each sample the same way.

Bootstrapping

Sampling with replacement means that every sample is returned to the data set after sampling. So a particular data point from the original data set could appear multiple times in a given bootstrap sample.

Bootstrapping

The number of elements in each bootstrap sample equals the number of elements in the original data set. The range of sample estimates we obtain allows us to establish the uncertainty of the quantity we are estimating.
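A minimal sketch of the procedure, using only base Matlab functions. The data vector `x` is hypothetical and the statistic (the mean) is chosen only for illustration; results vary slightly from run to run because the resampling is random.

```matlab
x = [4.1 5.3 2.8 6.0 5.5 3.9 4.7 5.1];   % hypothetical measurements
n = numel(x);
nBoot = 1000;                            % number of bootstrap samples
bootMeans = zeros(nBoot, 1);

for b = 1:nBoot
    idx = randi(n, 1, n);          % n indices drawn with replacement
    bootMeans(b) = mean(x(idx));   % analyze every sample the same way
end

fprintf('sample mean = %.3f, bootstrap std = %.3f\n', mean(x), std(bootMeans));
```

Because `randi` can return the same index more than once, a data point may appear several times in one bootstrap sample, and each sample has the same size as the original data set.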

Reliability of a tree

One way to assess the reliability of an estimated tree is to examine the reliability of each interior branch.

Bootstrap

The reliability of an inferred tree is examined by using Efron’s bootstrap resampling technique. A set of nucleotide sites is randomly sampled with replacement from the original set, and this random set is used for constructing a new phylogenetic tree.

This process is repeated many times, and the proportion of replications in which a given sequence cluster appears is computed.

If this proportion (PB) is high (say, PB > 0.95) for a sequence cluster, this cluster is considered to be statistically significant.
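The site-resampling step can be sketched as follows. The four-sequence alignment is invented for illustration. To keep the example short, the tree-building step is replaced by a simpler check, namely whether seqA and seqB are each other's closest relatives by p-distance. This is a simplifying assumption; a real analysis would rebuild the full tree for every replicate.

```matlab
% Hypothetical toy alignment: rows are sequences, columns are sites
aln = ['ACGTACGTACGTACGTACGT';    % seqA
       'ACGTACGTACGAACGTACGT';    % seqB
       'ACTTACGAACGAACTTACGT';    % seqC
       'TCTTACGAATGAACTTGCGT'];   % seqD

nSites = size(aln, 2);
nBoot = 1000;
hits = 0;

for b = 1:nBoot
    cols = randi(nSites, 1, nSites);   % sites sampled with replacement
    rep = aln(:, cols);                % pseudo-alignment of the same length
    % p-distance = proportion of sites at which two sequences differ
    dAB = mean(rep(1,:) ~= rep(2,:));
    others = [mean(rep(1,:) ~= rep(3,:)), mean(rep(1,:) ~= rep(4,:)), ...
              mean(rep(2,:) ~= rep(3,:)), mean(rep(2,:) ~= rep(4,:))];
    if dAB < min(others)
        hits = hits + 1;               % the (seqA, seqB) cluster appears
    end
end

PB = hits / nBoot;                     % bootstrap proportion for the cluster
fprintf('PB for (seqA, seqB) = %.3f\n', PB);
```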

Bootstrap values

Bootstrapping

Open Matlab.

Open Help.

Type bootstrap and read.

Example

> load lawdata

> plot(lsat,gpa,'+')

> lsline

Plot of lsat vs. gpa

Calculate correlation between lsat and gpa

> rhohat = corrcoef(lsat,gpa)

rhohat =

1.0000 0.7764

0.7764 1.0000

Is 0.78 significant?

Now we have a number, 0.7764, describing the positive connection between LSAT and GPA, but though 0.7764 may seem large, we still do not know if it is statistically significant.

Bootstrp function

Using the bootstrp function we can resample the lsat and gpa vectors as many times as we like and consider the variation in the resulting correlation coefficients.

Generate 1000 lsat and gpa vectors by resampling from the original vectors

rhos1000 = bootstrp(1000,@corrcoef,lsat,gpa);

hist(rhos1000(:,2),30)

What is the uncertainty associated with the observed correlation?

>> mean(rhos1000(:,2))

ans = 0.7711

>> std(rhos1000(:,2))

ans = 0.1350

>> 0.1350*1.96

ans = 0.2646

Mean +/-1.96*std
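Plugging in the values printed above gives the interval explicitly. This is only a sketch of the arithmetic, and it assumes the bootstrapped correlations are roughly normally distributed.

```matlab
m = 0.7711;   % mean of the bootstrapped correlations (output above)
s = 0.1350;   % standard deviation of the bootstrapped correlations (output above)

halfWidth = 1.96 * s;
ci = [m - halfWidth, m + halfWidth];
fprintf('95%% interval: [%.4f, %.4f]\n', ci(1), ci(2));

if ci(1) > 0
    disp('The interval excludes 0.');
end
if ci(2) > 1
    disp('The upper limit exceeds 1, the largest possible correlation.');
end
```

The lower limit stays well above zero, which supports a real positive correlation. The upper limit falls above 1, a sign that the normal approximation is rough here; taking the 2.5th and 97.5th percentiles of the bootstrapped values is a common alternative.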

You have data on the expression pattern of two genes

HOXA1 and CDK6 expression values in different tissues are collected.

Open the Excel file named data.xls.

Copy and paste the numerical data columns (two of them) into the workspace as follows, naming the data ‘a’:

>> a =[ paste….here and close bracket];

Calculate the uncertainty associated with the correlation between HOXA1 and CDK6 genes

Plot the expression values (x, HOXA1; and y, CDK6).

Place a lsline on the data.

Calculate the correlation coefficient between the genes.

Generate 1000 bootstrapped samples to estimate the sample correlation coefficient.

Determine the 95% confidence interval around the bootstrapped correlation coefficient.
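One possible outline of the numerical steps of this exercise is sketched below. The matrix `a` holds made-up values standing in for the two columns of data.xls and should be replaced by the pasted data. The resampling is written out with `randi` instead of `bootstrp` so that the logic is visible, and the percentile interval is one of several ways to form the confidence interval.

```matlab
% Hypothetical stand-in for the pasted columns: column 1 = HOXA1, column 2 = CDK6
a = [2.1 3.0; 3.4 3.9; 1.2 2.2; 4.8 5.1; 3.9 3.5; ...
     2.7 3.3; 5.5 6.0; 1.9 1.7; 4.2 4.9; 3.1 2.8];
x = a(:,1);
y = a(:,2);

R = corrcoef(x, y);
rhohat = R(1,2);                     % observed correlation coefficient

n = size(a, 1);
nBoot = 1000;
rhos = zeros(nBoot, 1);
for b = 1:nBoot
    idx = randi(n, n, 1);            % resample tissues (rows) with replacement
    Rb = corrcoef(x(idx), y(idx));   % each HOXA1/CDK6 pair stays together
    rhos(b) = Rb(1,2);
end

sorted = sort(rhos);
ci = [sorted(round(0.025 * nBoot)), sorted(round(0.975 * nBoot))];   % percentile interval
fprintf('rho = %.3f, 95%% CI = [%.3f, %.3f]\n', rhohat, ci(1), ci(2));
```

Resampling whole rows, rather than each column separately, preserves the pairing between the two genes within a tissue.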

Bootstrap of align2.m

Generate 1000 bootstrapped samples of the alignment score, and determine its 95% confidence interval, using the ‘bootstrp’ function.

Bayesian Inference

Three basic methods have been used to estimate phylogeny: distance, maximum parsimony (MP), and maximum likelihood (ML).

Bayesian statistics differs in that in addition to the current data, prior knowledge is included in the testing of the hypothesis.

Medical tests and Bayesian Stats

Assume that previous studies have evaluated the accuracy of this test and have shown that, if you are in fact ill, there is a 99% likelihood that the test will give a true positive result (and thus, a 1% likelihood that the test will give a false negative).

Medical tests and Bayesian Stats

It was also found that if you are healthy, there is a 0.1% likelihood of a false positive result from the test. If we were simply using the “data” (i.e., the test result), we would then conclude that a positive test result had approximately a 99% chance of being correct.

Medical tests and Bayesian Stats

If we were to examine this question in a Bayesian framework, we could incorporate prior knowledge—in this case that other studies have shown that the base rate of this illness is 0.1% in the population.

Thus, of a population of 100,000 individuals, 100 would be ill and 99,900 would be healthy.

Medical tests and Bayesian Stats

Using the likelihood values mentioned above, we could conclude that a positive test result would be seen in 99% of the ill individuals (99 true positives) and 0.1% of the healthy individuals (approximately 100 false positives).

Medical tests and Bayesian Stats

This leaves us with a conclusion that if a person has a positive test result, there is a 99/199 or approximately 50% chance that the test is correct and this person is actually ill. Therefore, by including prior knowledge of the base rate of the illness in the population, the perceived chance that a positive result indicates that an individual actually has the illness drops from 99% to 50%.
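The same calculation can be written out as a short script. All numbers come from the example above; only the variable names are introduced here.

```matlab
sensitivity  = 0.99;     % P(positive test | ill)
falsePosRate = 0.001;    % P(positive test | healthy)
baseRate     = 0.001;    % P(ill), the prior knowledge
N = 100000;              % population size used in the example

ill      = N * baseRate;             % about 100 ill individuals
healthy  = N - ill;                  % about 99,900 healthy individuals
truePos  = sensitivity * ill;        % about 99 true positives
falsePos = falsePosRate * healthy;   % 99.9, approximately 100 false positives

posterior = truePos / (truePos + falsePos);   % P(ill | positive test)
fprintf('P(ill | positive) = %.3f\n', posterior);
```

Raising `baseRate` in the script shows how strongly the conclusion depends on the prior: the more common the illness, the more a positive result can be trusted.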