Approaches to Sequence Analysis

[Figure: tree with leaves s1, s2, s3, s4 (statistics) and the alignment of the data:]

GT-CAT
GTTGGT
GT-CA-
CT-CA-

Parsimony, similarity, optimisation.

Data: {GTCAT, GTTGGT, GTCA, CTCA}

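Actual practice: two-phase analysis (first align the sequences, then analyse the fixed alignment).

Ideal practice: one-phase analysis (alignment and evolutionary analysis done jointly).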

1. TKF91: the combined substitution/indel process.

2. Acceleration of the basic algorithm.

3. Many-sequence algorithms.

4. MCMC approaches.

Thorne-Kishino-Felsenstein (1991) Process

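Each position (#) of a sequence is a link that inserts a new link to its right at rate λ (birth rate) and is deleted at rate μ (death rate); the immortal link * at the left end of every sequence never dies.

1. Equilibrium: $P(s) = \left(1-\frac{\lambda}{\mu}\right)\left(\frac{\lambda}{\mu}\right)^{l}\,\pi_A^{\#A}\cdots\pi_T^{\#T}$, where $l = \mathrm{length}(s)$ and $\#A$ is the number of A's in $s$.

2. Time reversible: $P(s_1)\,P_t(s_1 \to s_2) = P(s_2)\,P_t(s_2 \to s_1)$.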
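A minimal sketch of the equilibrium distribution in item 1, assuming a DNA alphabet and hypothetical base frequencies `pi` (the function name `tkf91_equilibrium` is ours, not from the slides). Because the frequencies sum to one, summing $P(s)$ over all sequences of length at most $L$ gives $1 - (\lambda/\mu)^{L+1}$, a quick check that sequence lengths are geometrically distributed.

```python
import math

def tkf91_equilibrium(seq, lam, mu, pi):
    """Equilibrium probability of a sequence under TKF91 (requires 0 < lam < mu).

    P(s) = (1 - lam/mu) * (lam/mu)**len(s) * product of pi over the residues of s
    """
    if not 0 < lam < mu:
        raise ValueError("an equilibrium length distribution exists only for 0 < lam < mu")
    r = lam / mu
    return (1 - r) * r ** len(seq) * math.prod(pi[c] for c in seq)

# hypothetical nucleotide frequencies, for illustration only
pi = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}
```

For example, `tkf91_equilibrium("GT", 1.0, 2.0, pi)` is $0.5 \cdot 0.5^2 \cdot \pi_G \pi_T$.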

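Decomposing an alignment into alignment blocks

A. Amino acids ignored. Each block is an ancestral position (#) or the immortal link (*) over its k descendants (#); gaps are written as -.

# over # # # … # (the ancestral link survives and has k descendants):

$$p_k(t) = e^{-\mu t}\,[1-\lambda\beta(t)]\,[\lambda\beta(t)]^{k-1}, \qquad k \ge 1$$

# over - # # … # (the ancestral link dies but leaves k descendants):

$$p'_k(t) = [1 - e^{-\mu t} - \mu\beta(t)]\,[1-\lambda\beta(t)]\,[\lambda\beta(t)]^{k-1}, \quad k \ge 1, \qquad p'_0(t) = \mu\beta(t)$$

* over * # # … # (the immortal link followed by k inserted positions):

$$p''_k(t) = [1-\lambda\beta(t)]\,[\lambda\beta(t)]^{k}, \qquad k \ge 0$$

where

$$\beta(t) = \frac{1 - e^{(\lambda-\mu)t}}{\mu - \lambda e^{(\lambda-\mu)t}}$$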

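B. Amino acids considered (ancestral T, descendants R Q S W, k = 4):

T - - - over R Q S W (T survives as R; Q, S, W are inserted): $P_t(T \to R)\,\pi_Q\,\pi_S\,\pi_W\,p_4(t)$

T - - - - over - R Q S W (T dies; R, Q, S, W are inserted): $\pi_R\,\pi_Q\,\pi_S\,\pi_W\,p'_4(t)$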
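The block probabilities of parts A and B can be sketched in Python as follows. The function names, the uniform amino-acid frequencies `PI` and the F81-style substitution probability `sub_prob` are illustrative assumptions, not part of the slides; any substitution model could stand in for $P_t(T \to R)$.

```python
import math

def tkf91_blocks(lam, mu, t, kmax):
    """TKF91 block probabilities p_k, p'_k, p''_k for k = 0..kmax (p_0 is unused and set to 0)."""
    e = math.exp((lam - mu) * t)
    beta = (1 - e) / (mu - lam * e)
    surv = math.exp(-mu * t)              # probability that the ancestral link survives
    x = lam * beta
    p = [0.0] + [surv * (1 - x) * x ** (k - 1) for k in range(1, kmax + 1)]
    p1 = [mu * beta] + [(1 - surv - mu * beta) * (1 - x) * x ** (k - 1)
                        for k in range(1, kmax + 1)]
    p2 = [(1 - x) * x ** k for k in range(kmax + 1)]
    return p, p1, p2

AA = "ACDEFGHIKLMNPQRSTVWY"
PI = dict.fromkeys(AA, 1 / 20)            # hypothetical: uniform amino-acid frequencies

def sub_prob(a, b, t, s=1.0):
    """Hypothetical F81-style substitution probability P_t(a -> b) with rate s."""
    keep = math.exp(-s * t)
    return keep * (a == b) + (1 - keep) * PI[b]

def block_prob(anc, desc, survives, lam, mu, t):
    """Probability of one block: ancestral residue anc over its descendants desc."""
    if survives and not desc:
        raise ValueError("a surviving link has at least one descendant")
    k = len(desc)
    p, p1, _ = tkf91_blocks(lam, mu, t, k)
    if survives:   # first descendant is homologous to anc, the rest are insertions
        return sub_prob(anc, desc[0], t) * math.prod(PI[c] for c in desc[1:]) * p[k]
    return math.prod(PI[c] for c in desc) * p1[k]   # all k descendants are insertions
```

`block_prob("T", "RQSW", True, lam, mu, t)` corresponds to the first line of part B and `survives=False` to the second. A useful sanity check is that the p-functions are normalised: $\sum_{k \ge 1} p_k + \sum_{k \ge 0} p'_k = 1$ and $\sum_{k \ge 0} p''_k = 1$.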

Differential Equations for p-functions

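Initial conditions: $p_1(0) = p''_0(0) = 1$; $p_k(0) = 0$ for $k \ge 2$; $p'_k(0) = 0$ for $k \ge 0$; $p''_k(0) = 0$ for $k \ge 1$.

$$\frac{dp_k}{dt} = \lambda(k-1)\,p_{k-1} + \mu k\,p_{k+1} - (\lambda+\mu)k\,p_k$$

$$\frac{dp'_k}{dt} = \lambda(k-1)\,p'_{k-1} + \mu(k+1)\,p'_{k+1} - (\lambda+\mu)k\,p'_k + \mu\,p_{k+1}$$

$$\frac{dp''_k}{dt} = \lambda k\,p''_{k-1} + \mu(k+1)\,p''_{k+1} - [\lambda(k+1)+\mu k]\,p''_k$$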
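The differential equations can be checked numerically. The sketch below integrates them with the classical fourth-order Runge-Kutta method from the initial conditions above, truncating at $k = K$ (so $p_{K+1}$ is treated as 0), and compares the result with the closed forms for $p_k$, $p'_k$ and $p''_k$. The names `closed_form`, `rhs` and `integrate`, the truncation level and the step count are our own choices.

```python
import math

def closed_form(lam, mu, t, K):
    """Closed-form p_k, p'_k, p''_k for k = 0..K (p_0 is identically 0)."""
    e = math.exp((lam - mu) * t)
    beta = (1 - e) / (mu - lam * e)
    surv = math.exp(-mu * t)
    x = lam * beta
    p = [0.0] + [surv * (1 - x) * x ** (k - 1) for k in range(1, K + 1)]
    p1 = [mu * beta] + [(1 - surv - mu * beta) * (1 - x) * x ** (k - 1) for k in range(1, K + 1)]
    p2 = [(1 - x) * x ** k for k in range(K + 1)]
    return [p, p1, p2]

def rhs(lam, mu, y):
    """Right-hand sides of the three ODE families, truncated at k = K."""
    p, p1, p2 = y
    K = len(p) - 1

    def g(v, k):                              # v[k], with v[-1] = v[K+1] = 0
        return v[k] if 0 <= k <= K else 0.0

    dp = [0.0] + [lam * (k - 1) * g(p, k - 1) + mu * k * g(p, k + 1) - (lam + mu) * k * p[k]
                  for k in range(1, K + 1)]
    dp1 = [lam * (k - 1) * g(p1, k - 1) + mu * (k + 1) * g(p1, k + 1)
           - (lam + mu) * k * p1[k] + mu * g(p, k + 1) for k in range(K + 1)]
    dp2 = [lam * k * g(p2, k - 1) + mu * (k + 1) * g(p2, k + 1)
           - (lam * (k + 1) + mu * k) * p2[k] for k in range(K + 1)]
    return [dp, dp1, dp2]

def integrate(lam, mu, t, K, steps=2000):
    """Classical RK4 starting from p_1(0) = p''_0(0) = 1, all other values 0."""
    y = [[0.0] * (K + 1) for _ in range(3)]
    y[0][1] = 1.0
    y[2][0] = 1.0
    h = t / steps

    def shift(y, d, c):                       # y + c * d, componentwise
        return [[a + c * b for a, b in zip(yv, dv)] for yv, dv in zip(y, d)]

    for _ in range(steps):
        k1 = rhs(lam, mu, y)
        k2 = rhs(lam, mu, shift(y, k1, h / 2))
        k3 = rhs(lam, mu, shift(y, k2, h / 2))
        k4 = rhs(lam, mu, shift(y, k3, h))
        y = [[a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4) for a, b1, b2, b3, b4 in zip(*vs)]
             for vs in zip(y, k1, k2, k3, k4)]
    return y
```

With, say, λ = 0.5, μ = 1, t = 1 and K = 20, the truncation error is negligible because $[\lambda\beta(t)]^K$ is tiny, so the integrated and closed-form values should agree to many decimal places.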

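Basic Pairwise Recursion ($O(\text{length}^3)$)

Write $P(s_1^i, s_2^j)$ for the probability of the prefixes $s_1[1:i]$ and $s_2[1:j]$. The last ancestral link $s_1[i]$ either survives, and its block is the last $k \ge 1$ positions of $s_2[1:j]$ ($j$ cases), or dies and leaves $k \ge 0$ descendants ($j+1$ cases):

$$P(s_1^i, s_2^j) = \sum_{k=1}^{j} P(s_1^{i-1}, s_2^{j-k})\, f(s_1[i], s_2[j-k+1])\, \pi(s_2[j-k+2:j])\, p_k(t) + \sum_{k=0}^{j} P(s_1^{i-1}, s_2^{j-k})\, \pi(s_2[j-k+1:j])\, p'_k(t)$$

For example, the survival term with $k = 2$ is $P(s_1^{i-1}, s_2^{j-2})\, f(s_1[i], s_2[j-1])\, \pi(s_2[j])\, p_2(t)$ and the death term with $k = 1$ is $P(s_1^{i-1}, s_2^{j-1})\, p'_1(t)\, \pi(s_2[j])$. Here $\pi(\cdot)$ is the product of the equilibrium frequencies of the inserted residues (1 for an empty range), $f(a, b)$ is the factor for ancestral residue $a$ being homologous to descendant residue $b$, and

$$p_k(t) = e^{-\mu t}\,[1-\lambda\beta(t)]\,[\lambda\beta(t)]^{k-1}, \qquad \beta(t) = \frac{1 - e^{(\lambda-\mu)t}}{\mu - \lambda e^{(\lambda-\mu)t}}$$

There are $O(\text{length}^2)$ cells and each sums $O(\text{length})$ terms, which gives the $O(\text{length}^3)$ cost.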

[Figure: cell $(i, j)$ is computed from cells $(i-1, j)$, $(i-1, j-1)$, …, $(i-1, j-k)$ of the previous row, depending on whether link $s_1[i]$ survives or dies.]

Initial condition: $P(s_1^0, s_2^j) = p''_j(t)\,\pi(s_2[1:j])$, i.e. the immortal link alone generates $s_2[1:j]$.
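A minimal Python sketch of the recursion and its initial condition, under stated assumptions: it computes the conditional probability $P(s_2 \mid s_1)$ summed over all alignments, takes $f(a, b)$ to be an F81-style substitution probability with rate `s` and uniform DNA frequencies, and all function names are hypothetical. The inner loop builds the insertion product incrementally, so each cell costs $O(j)$ and the whole table $O(\text{length}^3)$.

```python
import math

def tkf91_blocks(lam, mu, t, kmax):
    """TKF91 block probabilities p_k, p'_k, p''_k for k = 0..kmax (p_0 unused, set to 0)."""
    e = math.exp((lam - mu) * t)
    beta = (1 - e) / (mu - lam * e)
    surv = math.exp(-mu * t)
    x = lam * beta
    p = [0.0] + [surv * (1 - x) * x ** (k - 1) for k in range(1, kmax + 1)]
    p1 = [mu * beta] + [(1 - surv - mu * beta) * (1 - x) * x ** (k - 1)
                        for k in range(1, kmax + 1)]
    p2 = [(1 - x) * x ** k for k in range(kmax + 1)]
    return p, p1, p2

def tkf91_conditional(s1, s2, lam, mu, t, s=1.0, pi=None):
    """P(s2 | s1) summed over all alignments by the O(len^3) last-link recursion.

    Assumption: f(a, b) is F81-style with rate s and frequencies pi (uniform ACGT by default)."""
    pi = pi or dict.fromkeys("ACGT", 0.25)
    n, m = len(s1), len(s2)
    p, p1, p2 = tkf91_blocks(lam, mu, t, m)
    keep = math.exp(-s * t)

    def f(a, b):                        # probability that residue a is observed as b after time t
        return keep * (a == b) + (1 - keep) * pi[b]

    # P[i][j] = P(s2[:j] | s1[:i]); row 0: the immortal link emits all of s2[:j]
    P = [[0.0] * (m + 1) for _ in range(n + 1)]
    ins = 1.0
    for j in range(m + 1):
        P[0][j] = p2[j] * ins
        if j < m:
            ins *= pi[s2[j]]
    for i in range(1, n + 1):
        a = s1[i - 1]                   # last ancestral link of the prefix
        for j in range(m + 1):
            total = P[i - 1][j] * p1[0] # link dies with no descendants
            tail = 1.0                  # product of pi over s2[j-k+1:j]
            for k in range(1, j + 1):
                c = s2[j - k]           # first residue of this link's block
                # survives (c homologous to a) or dies (c inserted)
                total += P[i - 1][j - k] * tail * (f(a, c) * p[k] + pi[c] * p1[k])
                tail *= pi[c]
            P[i][j] = total
    return P[n][m]

def tkf91_equilibrium(seq, lam, mu, pi=None):
    """Equilibrium P(s) = (1 - lam/mu) (lam/mu)^len(s) * product of pi."""
    pi = pi or dict.fromkeys("ACGT", 0.25)
    r = lam / mu
    return (1 - r) * r ** len(seq) * math.prod(pi[c] for c in seq)
```

Multiplying by `tkf91_equilibrium(s1, ...)` gives the joint probability. By time reversibility that product should agree with the same expression with `s1` and `s2` swapped, which makes a convenient test of an implementation.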

Acceleration of the Pairwise Algorithm (From Hein, Wiuf, Knudsen, Moeller & Wibling 2000)

Corner cutting: ~100-1000

Better numerical search: ~10-100 (e.g. a good starting guess: 28 evaluations, 3 iterations)

Simpler recursion: ~3-10

Faster computers: ~250

1991 → 2000: ~10^6
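Multiplying the individual factors gives a rough check on the last line (our own arithmetic; the slide does not say how the factors were combined):

$$100 \times 10 \times 3 \times 250 = 7.5 \times 10^{5}, \qquad 1000 \times 100 \times 10 \times 250 = 2.5 \times 10^{8}$$

so the quoted ~10^6 is of the same order as the product of the lower estimates.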

α-globin (141) and β-globin (146) (From Hein, Wiuf, Knudsen, Moeller & Wibling 2000)

430.108 : -log(α-globin)
327.320 : -log(α-globin → β-globin)
747.428 : -log(α-globin, β-globin) = -log(l(sumalign))

λt: 0.0371805 ± 0.0135899
μt: 0.0374396 ± 0.0136846
st: 0.91701 ± 0.119556

E(Length) = 143.499, E(Insertions, Deletions) = 5.37255, E(Substitutions) = 131.59

Maximum contributing alignment:

α-globin  V-LSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DLS--H---GSAQVKGHGKKVADALT
β-globin  VHLTPEEKSAVTALWGKV--NVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFS

α-globin  NAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
β-globin  DGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH

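Ratio l(maxalign)/l(sumalign) = 0.00565064 (likelihood of the maximum contributing alignment relative to the likelihood summed over all alignments)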

The invasion of the immortal link

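[Figure: the 141 AA long sequence VLSPADNAL.....DLHAHKR traced back $2 \cdot 10^7$, $2 \cdot 10^8$ and $2 \cdot 10^9$ years to an ancestor ???…? that is k AA long; at $10^9$ years the 141 AA long sequence is drawn as *########### …. ###, following the immortal link *.]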

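Algorithm for alignment on a star tree ($O(\text{length}^6)$) (Steel & Hein, 2001)

[Equation/figure: recursion for the probability $P(S)$ of the set $S$ of sequences, written in terms of $P^*(S)$ and of $P^{\#}(\text{Tail})$ combined with $P$ of the remaining sequences; example sequences *ACGC, *TT, GT and *ACG, GT.]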

Binary Tree Problem

The problem would be simpler if:

i. the ancestral sequences and their alignment were known;

ii. the alignment of the ancestral alignment columns to the leaf sequences was known.

How can we sum over all possible ancestral sequences and their alignments? A Markov chain generating ancestral alignments can solve the problem.

[Figure: Markov chain generating an alignment of the ancestral sequences a1 and a2, with column states **, ##, #-, -# and end state E.]

Generating Ancestral Alignments

[Figure: generating an alignment of the ancestral sequences a1 and a2, each starting with the immortal link *.]

The Basic Recursion

"Remove first step" recursion.

"Remove last step" recursion.

The last-step and first-step removal recursions are not equivalent, but they have the same complexity.

The first-step algorithm is the simpler of the two.
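The two kinds of step removal can be illustrated, outside the alignment setting, with a hidden Markov model: the forward recursion peels off the last step, while the backward recursion peels off the first step and works with probabilities of the remaining suffix given the starting state. This is our own analogue with a hypothetical two-state model, not the slides' algorithm; both recursions have different forms, cost $O(L \cdot N^2)$ for $L$ observations and $N$ states, and return the same probability.

```python
def forward(obs, init, trans, emit):
    """'Remove last step': alpha[s] = P(obs[:i+1], state at step i = s)."""
    n = len(init)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][o]
                 for s in range(n)]
    return sum(alpha)

def backward(obs, init, trans, emit):
    """'Remove first step': beta[s] = P(obs[i+1:] | state at step i = s)."""
    n = len(init)
    beta = [1.0] * n
    for o in reversed(obs[1:]):
        beta = [sum(trans[s][r] * emit[r][o] * beta[r] for r in range(n))
                for s in range(n)]
    return sum(init[s] * emit[s][obs[0]] * beta[s] for s in range(n))

# hypothetical two-state model, for illustration only
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 1, 0]
```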

Sequence Recursion: First Step Removal

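$P(S^k)$: probability of the epifixes (suffixes) $S[k+1:l]$, given that the Markov chain starts in a given state.

[Equation: recursion for $P(S^k)$ in terms of the block probabilities $p_{k_j}(t_j)$ and $p'_{k_j}(t_j)$ applied to the subsequences $s_j[i(j):k(j)]$ and $s_j[i(j)+1:k(j)]$ of each sequence $j$, and a function $P'(k, S_i, H)$.]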

Data: Human alpha hemoglobin; Human beta hemoglobin; Human myoglobin; Bean leghemoglobin

Probability of data: $e^{-1560.138}$

Probability of data and alignment: $e^{-1593.223}$

Probability of alignment given data: $4.279 \times 10^{-15} = e^{-33.085}$

Ratio of insertion-deletions to substitutions: 0.0334
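The three probabilities are consistent: P(alignment | data) = P(data, alignment) / P(data), so the exponents subtract, $-1593.223 - (-1560.138) = -33.085$. Doing this in floating point requires logarithms, because $e^{-1560.138}$ is far below the smallest positive double and `math.exp` returns 0.0 for it. A short Python illustration using only the values from the slide:

```python
import math

log_p_data = -1560.138                # log P(data), from the slide
log_p_data_and_alignment = -1593.223  # log P(data, alignment), from the slide

# Direct division fails: both probabilities underflow to 0.0 in double precision.
underflows = math.exp(log_p_data) == 0.0

# Divide in log space instead: log P(alignment | data) = log P(data, alignment) - log P(data).
log_p_alignment_given_data = log_p_data_and_alignment - log_p_data
p_alignment_given_data = math.exp(log_p_alignment_given_data)

print(f"{log_p_alignment_given_data:.3f}  {p_alignment_given_data:.3e}")  # prints "-33.085  4.279e-15"
```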

Maximum likelihood phylogeny and alignment

Gerton Lunter

Istvan Miklos

Alexei Drummond

Yun Song

Metropolis-Hastings Statistical Alignment (Lunter, Drummond, Miklos, Jensen & Hein, 2005)
