MAT 4830 Mathematical Modeling

4.5 Phylogenetic Distances I

http://myhome.spu.edu/lauw


Preview

Phylogenetic: of or relating to the evolutionary development of organisms

Estimate the total number of mutations (observed and hidden).


Example from 4.1

S0 : Ancestral sequence

S1 : Descendant of S0

S2 : Descendant of S1

S0 : ATGTCGCCTGATAATGCC

S1 : ATGCCGCTTGACAATGCC

S2 : ATGCCGCGTGATAATGCC

Observed mutations (S0 vs. S2): 2

Actual mutations: 5 (3 from S0 to S1, 2 from S1 to S2), some of them hidden: site 8 changed twice (C → T → G) but shows only one difference, and site 12 changed and changed back (T → C → T), showing none.
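A short sketch reproducing the two counts for this example, written in Python rather than the course's Maple (the helper name `differing_sites` is hypothetical). The observed count compares only the endpoints S0 and S2; the actual count adds the differences along each known step S0 → S1 → S2, as the slide does.

```python
# Sequences from the example in 4.1
S0 = "ATGTCGCCTGATAATGCC"
S1 = "ATGCCGCTTGACAATGCC"
S2 = "ATGCCGCGTGATAATGCC"

def differing_sites(a, b):
    """Return the 1-based positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]

# Observed: compare the ancestor with the final descendant only.
observed = len(differing_sites(S0, S2))
# Actual: count the changes on each step of the known lineage.
actual = len(differing_sites(S0, S1)) + len(differing_sites(S1, S2))

print("observed:", observed)               # 2
print("actual:", actual)                   # 5
print("S0->S1:", differing_sites(S0, S1))  # [4, 8, 12]
print("S1->S2:", differing_sites(S1, S2))  # [8, 12]
```

Sites 8 and 12 appear in both steps, which is exactly where the hidden mutations are.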


Distance of Two Sequences

We want to define the “distance” between two sequences.

It should measure the average number of mutations per site that occurred, including the hidden ones.

S0 : ATGTCGCCTGATAATGCC

S : ATGCCGCGTGATAATGCC


Let d(S0, S) be the distance between sequences S0 and S. What properties “should” it have?

1.

2.

3.


Jukes-Cantor Model

Assume α is small. Mutations per time step are “rare”.

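$$M = \begin{pmatrix}
1-\alpha & \alpha/3 & \alpha/3 & \alpha/3 \\
\alpha/3 & 1-\alpha & \alpha/3 & \alpha/3 \\
\alpha/3 & \alpha/3 & 1-\alpha & \alpha/3 \\
\alpha/3 & \alpha/3 & \alpha/3 & 1-\alpha
\end{pmatrix},
\qquad
p_0 = \left(\frac14, \frac14, \frac14, \frac14\right)^T$$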
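A minimal Python sketch, assuming a hypothetical rate α = 0.01 and hypothetical helper names, that builds M and checks two properties implied by its entries: each column sums to 1 (a base either stays, with probability 1 − α, or becomes one of the other three bases, with probability α/3 each), and the uniform distribution p0 is unchanged by M.

```python
# Hypothetical helper names; alpha is an illustrative value.
def jc_matrix(alpha):
    """One-step Jukes-Cantor matrix: 1 - alpha on the diagonal, alpha/3 elsewhere."""
    return [[1 - alpha if i == j else alpha / 3 for j in range(4)] for i in range(4)]

def mat_vec(M, v):
    """Matrix times column vector."""
    return [sum(M[i][j] * v[j] for j in range(4)) for i in range(4)]

alpha = 0.01
M = jc_matrix(alpha)
p0 = [0.25, 0.25, 0.25, 0.25]

col_sums = [sum(M[i][j] for i in range(4)) for j in range(4)]
print(col_sums)        # each entry is 1 up to rounding
print(mat_vec(M, p0))  # each entry is 0.25 up to rounding
```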


Jukes-Cantor Model

q(t)=conditional prob. that the base at time t is the same as the base at time 0

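$$M^t = \begin{pmatrix}
\frac14 + \frac34\left(1-\frac43\alpha\right)^t & \frac14 - \frac14\left(1-\frac43\alpha\right)^t & \frac14 - \frac14\left(1-\frac43\alpha\right)^t & \frac14 - \frac14\left(1-\frac43\alpha\right)^t \\
\frac14 - \frac14\left(1-\frac43\alpha\right)^t & \frac14 + \frac34\left(1-\frac43\alpha\right)^t & \frac14 - \frac14\left(1-\frac43\alpha\right)^t & \frac14 - \frac14\left(1-\frac43\alpha\right)^t \\
\frac14 - \frac14\left(1-\frac43\alpha\right)^t & \frac14 - \frac14\left(1-\frac43\alpha\right)^t & \frac14 + \frac34\left(1-\frac43\alpha\right)^t & \frac14 - \frac14\left(1-\frac43\alpha\right)^t \\
\frac14 - \frac14\left(1-\frac43\alpha\right)^t & \frac14 - \frac14\left(1-\frac43\alpha\right)^t & \frac14 - \frac14\left(1-\frac43\alpha\right)^t & \frac14 + \frac34\left(1-\frac43\alpha\right)^t
\end{pmatrix}$$

Every diagonal entry of M^t is the same, so

$$q(t) = \frac14 + \frac34\left(1-\frac43\alpha\right)^t$$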
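To check the closed form for M^t numerically, a sketch that raises M to the power t by repeated multiplication and compares every entry with the formula. The values α = 0.01 and t = 50 are hypothetical test values, and the helper names are illustrative.

```python
def jc_matrix(alpha):
    """One-step Jukes-Cantor matrix: 1 - alpha on the diagonal, alpha/3 elsewhere."""
    return [[1 - alpha if i == j else alpha / 3 for j in range(4)] for i in range(4)]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def mat_pow(M, t):
    # Repeated multiplication starting from the identity; fine for small t.
    R = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for _ in range(t):
        R = mat_mul(R, M)
    return R

def q(alpha, t):
    # Closed form of each diagonal entry of M^t.
    return 0.25 + 0.75 * (1 - 4 * alpha / 3) ** t

alpha, t = 0.01, 50
Mt = mat_pow(jc_matrix(alpha), t)
off = 0.25 - 0.25 * (1 - 4 * alpha / 3) ** t  # closed form of each off-diagonal entry
print(Mt[0][0], q(alpha, t))  # agree up to rounding
print(Mt[0][1], off)          # agree up to rounding
```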

q(t) can also be read as the (expected) fraction of sites with no observed mutations.

p(t) = 1 - q(t) = fraction of sites with observed mutations:

$$p(t) = 1 - q(t) = \frac34 - \frac34\left(1-\frac43\alpha\right)^t$$

p can be estimated from the two sequences.
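A sketch evaluating p(t) for a hypothetical α = 0.01. After one step p equals α exactly, but as t grows p(t) levels off just below 3/4 while αt keeps growing, so the observed fraction increasingly undercounts the total number of substitutions per site.

```python
def p(alpha, t):
    """Expected fraction of sites with an observed difference after t steps."""
    return 0.75 - 0.75 * (1 - 4 * alpha / 3) ** t

alpha = 0.01  # hypothetical rate
for t in (1, 10, 100, 1000):
    # columns: t, observed fraction p(t), total substitutions per site alpha*t
    print(t, round(p(alpha, t), 4), round(alpha * t, 4))
```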


Example from 4.1

S0 : ATGTCGCCTGATAATGCC

S1 : ATGCCGCTTGACAATGCC

S2 : ATGCCGCGTGATAATGCC

Observed mutations: 2

Fraction of sites with observed mutations:

$$p = \frac{2}{18} \approx 0.11$$


Jukes-Cantor Distance

Given p (and t), the J-C distance between two sequences S0 and S1 is defined as

$$d_{JC}(S_0, S_1) = -\frac34\ln\left(1 - \frac43 p\right)$$

S0 : ATGTCGCCTGATAATGCC

S1 : ATGCCGCGTGATAATGCC
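A sketch of the definition as a Python function (the names `p_distance` and `jc_distance` are hypothetical), applied to the two sequences on this slide. The guard reflects that the logarithm is undefined once p ≥ 3/4.

```python
import math

def p_distance(a, b):
    """Fraction of sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def jc_distance(p):
    """Jukes-Cantor distance -3/4 ln(1 - 4p/3); undefined for p >= 3/4."""
    if p >= 0.75:
        raise ValueError("JC distance is undefined for p >= 3/4")
    return -0.75 * math.log(1 - 4 * p / 3)

S0 = "ATGTCGCCTGATAATGCC"
S1 = "ATGCCGCGTGATAATGCC"
p = p_distance(S0, S1)  # 2/18
d = jc_distance(p)
print(round(p, 4), round(d, 4))  # 0.1111 0.1203
```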



Jukes-Cantor Distance

α = rate of base substitution (substitutions per site per time step)

t = number of time steps

αt = total number of substitutions per site in t time steps

Solving the formula for p for the exponent:

$$p = \frac34 - \frac34\left(1-\frac43\alpha\right)^t
\quad\Longrightarrow\quad
1 - \frac43 p = \left(1-\frac43\alpha\right)^t
\quad\Longrightarrow\quad
\ln\left(1 - \frac43 p\right) = t\ln\left(1-\frac43\alpha\right) \approx -\frac43\alpha t \quad \text{when } \alpha \text{ is small,}$$

since ln(1 + x) ≈ x for small x. Therefore

$$\alpha t \approx -\frac34\ln\left(1 - \frac43 p\right),$$

so the J-C distance estimates the total number of substitutions per site, hidden ones included.
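A numeric check of the approximation under hypothetical values (t = 30 and several α; names illustrative). Plugging the exact p(t) into the J-C formula returns exactly −3/4 · t · ln(1 − 4α/3), which is slightly larger than αt and drifts further from it as α grows.

```python
import math

def p_of_t(alpha, t):
    """Exact JC fraction of sites with an observed difference after t steps."""
    return 0.75 - 0.75 * (1 - 4 * alpha / 3) ** t

def jc_distance(p):
    return -0.75 * math.log(1 - 4 * p / 3)

t = 30
for alpha in (0.001, 0.01, 0.1):
    d = jc_distance(p_of_t(alpha, t))
    # columns: alpha, true substitutions per site alpha*t, JC distance
    print(alpha, round(alpha * t, 4), round(d, 4))
```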


Example from 4.3

Suppose the 40-base ancestral and descendant DNA sequences are

S0 : ACTTGTCGGATGATCAGCGGTCCATGCACCTGACAACGGT

S1 : ACATGTTGCTTGACGACAGGTCCATGCGCCTGAGAACGGC

Number of sites with each base pair (rows: base in S1; columns: base in S0):

S1 \ S0   A   G   C   T
A         7   0   1   1
G         1   9   2   0
C         0   2   7   2
T         1   0   1   6
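A sketch that rebuilds the table from the two sequences (the helper name `substitution_table` is hypothetical). Rows are the base in S1 and columns the base in S0, both in the order A, G, C, T.

```python
BASES = "AGCT"  # row/column order used in the table

def substitution_table(anc, desc):
    """counts[i][j] = number of sites where desc has BASES[i] and anc has BASES[j]."""
    counts = [[0] * 4 for _ in range(4)]
    for a, d in zip(anc, desc):
        counts[BASES.index(d)][BASES.index(a)] += 1
    return counts

S0 = "ACTTGTCGGATGATCAGCGGTCCATGCACCTGACAACGGT"
S1 = "ACATGTTGCTTGACGACAGGTCCATGCGCCTGAGAACGGC"
table = substitution_table(S0, S1)
for base, row in zip(BASES, table):
    print(base, row)
# A [7, 0, 1, 1]
# G [1, 9, 2, 0]
# C [0, 2, 7, 2]
# T [1, 0, 1, 6]
```

The diagonal (7 + 9 + 7 + 6 = 29) counts unchanged sites; the remaining 11 are the observed substitutions.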

$$p = \frac{11}{40} = 0.275, \qquad d_{JC}(S_0, S_1) = -\frac34\ln\left(1 - \frac43\cdot\frac{11}{40}\right) \approx 0.3426$$

(11 is the total of the off-diagonal entries of the table.)

0.275 observed substitutions per site; 0.3426 estimated substitutions per site.

11 observed substitutions; about 13.7 (0.3426 × 40) estimated substitutions.
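The same numbers computed directly from the sequences, as a sketch (variable names are illustrative): p is the number of differing sites over 40, and multiplying the per-site distance by the 40 sites gives the estimated total.

```python
import math

S0 = "ACTTGTCGGATGATCAGCGGTCCATGCACCTGACAACGGT"
S1 = "ACATGTTGCTTGACGACAGGTCCATGCGCCTGAGAACGGC"

n = len(S0)                                       # 40 sites
observed = sum(a != b for a, b in zip(S0, S1))    # 11 differing sites
p = observed / n                                  # 0.275
d = -0.75 * math.log(1 - 4 * p / 3)               # JC distance per site
print(observed, p, round(d, 4), round(d * n, 1))  # 11 0.275 0.3426 13.7
```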


Performance of JC distance (Homework Problem 4)

Write a program to simulate the mutations of a sequence over t time steps using the Jukes-Cantor model with parameter α.

Count the number of base substitutions that occurred.

Compute the Jukes-Cantor distance between the initial and final sequences.

Compare the actual number of base substitutions with the estimate from the Jukes-Cantor distance.
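One possible sketch of this simulation, written in Python rather than the Maple used in the course, under these assumptions: in each time step every site independently mutates with probability α to one of the other three bases chosen uniformly (matching the entries of M), the starting sequence is drawn uniformly at random, and the function names and parameter values (1000 sites, α = 0.01, t = 50) are hypothetical.

```python
import math
import random

BASES = "AGCT"

def jc_step(seq, alpha, rng):
    """One Jukes-Cantor time step. Each site mutates with probability alpha to
    one of the other three bases, chosen uniformly. Returns (new_seq, n_subs)."""
    out, n_subs = [], 0
    for b in seq:
        if rng.random() < alpha:
            out.append(rng.choice([c for c in BASES if c != b]))
            n_subs += 1
        else:
            out.append(b)
    return "".join(out), n_subs

def simulate(n_sites, alpha, t, seed=0):
    """Return (actual substitutions, observed differences, JC distance per site)."""
    rng = random.Random(seed)
    start = "".join(rng.choice(BASES) for _ in range(n_sites))  # uniform starting bases
    seq, total = start, 0
    for _ in range(t):
        seq, n_subs = jc_step(seq, alpha, rng)
        total += n_subs
    observed = sum(a != b for a, b in zip(start, seq))
    p = observed / n_sites
    d = -0.75 * math.log(1 - 4 * p / 3) if p < 0.75 else math.inf
    return total, observed, d

n, alpha, t = 1000, 0.01, 50  # hypothetical parameters
total, observed, d = simulate(n, alpha, t)
print("actual substitutions:", total)
print("observed differences:", observed)
print("JC estimate of substitutions:", round(d * n, 1))
```

Because a site can be hit more than once or mutate back, the actual count is always at least the number of observed differences; the J-C estimate d × n is the attempt to recover the actual count. Rerunning with other seeds, or with larger α and t, shows how the quality of the estimate changes.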



Maple: Strings Handling II

Concatenating two strings


However, no “re-assignment”.


Classwork

Work on HW #1, 2