Matrix Alignment (Harr Plot, Matrix Plot, Dot Plot, Dot Matrix)

Posted on 29-Dec-2015


Dot-matrix Alignment

Mount, Bioinformatics. Cold Spring Harbor Laboratory Press, Cold Spring Harbor (2001)

Similarity ≠ Homology

1) 25% similarity over a stretch of ≥ 100 amino acids (AAs) is likely to indicate homology

2) Homology is an evolutionary statement that means “descent from a common ancestor”:

– common 3D structure

– usually common function

– all or nothing: one cannot say "50% homologous"

Similarity is Based on Dot Plots

1) two sequences on vertical and horizontal axes of graph

2) put dots wherever there is a match

3) diagonal line is region of identity (local alignment)

4) apply a window filter: look at a group of bases at a time; the group must meet a % identity threshold to get a dot

Similarity of two sequences

ATGCTAGCACGCGTGCGCAA
AGGCAAGCGCTCGTGCGTAA

% identity = 15/20 = 75%

Are they similar? It depends on the cutoff chosen for the % identity.

Similarity of two sequences

ATGCTAGCACGCGTGCGCAA
AGGCAAGCGCTCGTGCGTAA
identity = 75%

Window size = 20

Stringency = ?

If stringency <= 15 (15/20 = 75%) => they are similar

If stringency > 15 (<= 20) => they are not similar
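The cutoff rule above can be checked with a few lines of Python. This is a minimal sketch rather than any package's implementation; the function names `count_matches` and `is_similar` are hypothetical, and the 40-letter string on the slide is assumed to be the two 20-base sequences run together.

```python
def count_matches(seq_a: str, seq_b: str) -> int:
    """Count the positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(a == b for a, b in zip(seq_a, seq_b))


def is_similar(seq_a: str, seq_b: str, stringency: int) -> bool:
    """Call the pair similar when the match count reaches the stringency."""
    return count_matches(seq_a, seq_b) >= stringency


# The two 20-base sequences from the slide (window size = 20).
SEQ_1 = "ATGCTAGCACGCGTGCGCAA"
SEQ_2 = "AGGCAAGCGCTCGTGCGTAA"

matches = count_matches(SEQ_1, SEQ_2)
percent_identity = 100 * matches / len(SEQ_1)
print(f"{matches}/{len(SEQ_1)} = {percent_identity:.0f}%")  # prints "15/20 = 75%"
print(is_similar(SEQ_1, SEQ_2, stringency=15))  # True
print(is_similar(SEQ_1, SEQ_2, stringency=16))  # False
```

The same pair of sequences flips from "similar" to "not similar" purely because the cutoff moved from 15 to 16, which is the point the slide makes.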

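Dot-matrix Alignment

Horizontal sequence: GCTTAGGCTGA

Vertical sequence: AGGCTGAACTA

  G C T T A G G C T G A
A . . . . M . . . . . M
G M . . . . M M . . M .
G M . . . . M M . . M .
C . M . . . . . M . . .
T . . M M . . . . M . .
G M . . . . M M . . M .
A . . . . M . . . . . M
A . . . . M . . . . . M
C . M . . . . . M . . .
T . . M M . . . . M . .
A . . . . M . . . . . M

(M = match, . = no match)

Window = 1

Stringency = 1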
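With a window of 1 and a stringency of 1, a dot is placed at every cell where the two letters are identical. The sketch below rebuilds the matrix for the two sequences on this slide; `dot_matrix` and `render` are hypothetical helper names, and the "M"/"." rendering is only one possible text layout.

```python
from typing import List


def dot_matrix(horizontal: str, vertical: str) -> List[List[bool]]:
    """One row per vertical letter; True wherever the two letters are identical."""
    return [[v == h for h in horizontal] for v in vertical]


def render(horizontal: str, vertical: str) -> str:
    """Draw the matrix as text: 'M' for a match, '.' for no match."""
    lines = ["  " + " ".join(horizontal)]
    for letter, row in zip(vertical, dot_matrix(horizontal, vertical)):
        lines.append(letter + " " + " ".join("M" if hit else "." for hit in row))
    return "\n".join(lines)


HORIZONTAL = "GCTTAGGCTGA"
VERTICAL = "AGGCTGAACTA"

matrix = dot_matrix(HORIZONTAL, VERTICAL)
print(render(HORIZONTAL, VERTICAL))

# Number of dots in each row, top to bottom.
dots_per_row = [sum(row) for row in matrix]
print(dots_per_row)

# The shared run "AGGCTGA" appears as a diagonal shifted four columns right.
shared_run = all(matrix[k][k + 4] for k in range(7))
print(shared_run)  # True
```

Most of the dots are scattered single matches; only the shifted diagonal corresponds to a real run of identity.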


[Dot plot: the sequence SEQUENCEANALYSISPRIMER plotted against itself on both axes]

Since this is a comparison of a sequence with itself (an intrasequence comparison), the most obvious feature is the main identity diagonal. Two short perfect palindromes, “ANA” and “SIS,” can also be seen as crosses directly off the main diagonal.

Window = 1

Stringency = 1
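The crosses described above can be located programmatically. In a self-comparison, a three-letter palindrome centered at position c leaves two extra dots at (c-1, c+1) and (c+1, c-1), the arms of the cross. The following sketch assumes 0-based indexing and uses hypothetical function names.

```python
from typing import List


def self_dot_matrix(seq: str) -> List[List[bool]]:
    """Compare a sequence against itself with window = 1, stringency = 1."""
    return [[a == b for b in seq] for a in seq]


def three_letter_palindromes(seq: str) -> List[str]:
    """Report each 3-letter palindrome that forms a cross on the main diagonal."""
    matrix = self_dot_matrix(seq)
    found = []
    for center in range(1, len(seq) - 1):
        # The two arms of the cross sit one step off the diagonal on either side.
        if matrix[center - 1][center + 1] and matrix[center + 1][center - 1]:
            found.append(seq[center - 1:center + 2])
    return found


SEQ = "SEQUENCEANALYSISPRIMER"
matrix = self_dot_matrix(SEQ)

# Every cell on the main diagonal is a dot in a self-comparison.
main_diagonal_full = all(matrix[i][i] for i in range(len(SEQ)))
print(main_diagonal_full)             # True
print(three_letter_palindromes(SEQ))  # ['ANA', 'SIS']
```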


Dot-matrix Alignment

[Dot plot: GCTTAGGCTGA (horizontal axis) against AGGCTGAACTA (vertical axis)]

Window = 3

Stringency = 3

The only remaining dots indicate the two runs of identity between the two sequences; however, any indication of the palindrome “ANA” has been lost. This is because our filtering approach was too stringent to catch such a short element. In general, you need to make your window about the same size as the element you are attempting to locate. In the case of our palindrome, “AN” and “NA” are the inverted repeat sequences, and since our window was set to three, we cannot see an element only two letters long. Had we set our stringency filter to one in a window of two, these would be visible. The Wisconsin Package’s implementation of dot matrix analysis, the paired programs Compare and DotPlot, uses the window/stringency method by default.
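The effect of window size on the “ANA” palindrome can be reproduced with a small sketch. It is not the Compare/DotPlot implementation: the function name `windowed_dots` is hypothetical, indices are 0-based, and each dot is recorded at the start of its window (an assumption made here because a window of two has no middle position).

```python
from typing import Set, Tuple


def windowed_dots(seq_a: str, seq_b: str, window: int,
                  stringency: int) -> Set[Tuple[int, int]]:
    """Return the (row, column) window starts holding >= stringency matches."""
    dots = set()
    for row in range(len(seq_a) - window + 1):
        for col in range(len(seq_b) - window + 1):
            matches = sum(seq_a[row + k] == seq_b[col + k] for k in range(window))
            if matches >= stringency:
                dots.add((row, col))
    return dots


SEQ = "SEQUENCEANALYSISPRIMER"

# Window 3, stringency 3: only the main diagonal survives in the self-comparison.
strict = windowed_dots(SEQ, SEQ, window=3, stringency=3)
off_diagonal_strict = {(r, c) for r, c in strict if r != c}
print(off_diagonal_strict)  # set()

# Window 2, stringency 1: the dots flanking "ANA" (0-based positions 8 to 10) return.
loose = windowed_dots(SEQ, SEQ, window=2, stringency=1)
print((8, 10) in loose, (10, 8) in loose)  # True True
```

The looser setting brings the palindrome back, but it also readmits many chance matches, which is the usual trade-off when shrinking the window.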

Dot plot of real data

[Dot plot labeled CVJB; both axes run from 20 to 220 in steps of 20]

Window Size = 8; Scoring Matrix: pam250 matrix; Min. % Score = 30; Hash Value = 2

[Dot plot: SEQUENCEANALYSISPRIMER (horizontal axis) against SEQUENCESEQUENCESEQUENCE (vertical axis)]

Another phenomenon that is very easy to visualize with dot matrix analysis is duplication, that is, direct repeats.

The ‘duplication’ here is seen as a distinct column of diagonals; whenever you see either a row or a column of diagonals in a dot plot, you are looking at direct repeats.

Window = 1

Stringency = 1
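A column of diagonals can be listed explicitly by collecting every maximal diagonal run of matches. The sketch below is one simple way to do so; `diagonal_runs` is a hypothetical name, positions are 0-based, and the minimum run length of 4 is an arbitrary choice used to ignore chance matches.

```python
from typing import List, Tuple


def diagonal_runs(horizontal: str, vertical: str,
                  min_length: int) -> List[Tuple[int, int, int]]:
    """List maximal diagonal runs as (vertical_start, horizontal_start, length)."""
    runs = []
    for v in range(len(vertical)):
        for h in range(len(horizontal)):
            if vertical[v] != horizontal[h]:
                continue
            # Skip cells that merely continue a run begun further up and to the left.
            if v > 0 and h > 0 and vertical[v - 1] == horizontal[h - 1]:
                continue
            length = 0
            while (v + length < len(vertical) and h + length < len(horizontal)
                   and vertical[v + length] == horizontal[h + length]):
                length += 1
            if length >= min_length:
                runs.append((v, h, length))
    return runs


HORIZONTAL = "SEQUENCEANALYSISPRIMER"
VERTICAL = "SEQUENCESEQUENCESEQUENCE"

runs = diagonal_runs(HORIZONTAL, VERTICAL, min_length=4)
print(runs)  # [(0, 0, 8), (8, 0, 8), (16, 0, 8)]
```

All three runs start at horizontal position 0 and are stacked at vertical positions 0, 8, and 16, which is exactly the column of diagonals that signals a direct repeat.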

Now consider the more complicated ‘mutation’ in the following comparison:

[Dot plot: SEQUENCEANALYSISPRIMER (horizontal axis) against ANALYZESEQUENCES (vertical axis)]

Again, notice the diagonals. They have now been displaced off the center diagonal of the plot and, in this example, show the occurrence of a ‘transposition.’ Dot matrix analysis is one of the few sensible ways to locate such transpositions in sequences. Inverted repeats still show up as lines perpendicular to the diagonals; they are just no longer at the center of the plot. The ‘deletion’ of ‘PRIMER’ is shown by the lack of a corresponding diagonal.

Window = 1

Stringency = 1
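One way to make the displacement concrete is to record, for each diagonal, its longest unbroken run of matches. The sketch below does this under a few assumptions: the diagonal offset is defined as horizontal index minus vertical index, the name `longest_run_per_diagonal` is hypothetical, and a run length of 4 is used as an arbitrary threshold for "real" diagonals.

```python
from typing import Dict


def longest_run_per_diagonal(horizontal: str, vertical: str) -> Dict[int, int]:
    """Map each diagonal offset (h_index - v_index) to its longest run of matches."""
    best = {}
    for offset in range(-(len(vertical) - 1), len(horizontal)):
        run = longest = 0
        for v_index in range(len(vertical)):
            h_index = v_index + offset
            in_plot = 0 <= h_index < len(horizontal)
            if in_plot and vertical[v_index] == horizontal[h_index]:
                run += 1
                longest = max(longest, run)
            else:
                run = 0
        if longest:
            best[offset] = longest
    return best


HORIZONTAL = "SEQUENCEANALYSISPRIMER"
VERTICAL = "ANALYZESEQUENCES"

per_diagonal = longest_run_per_diagonal(HORIZONTAL, VERTICAL)
strong = {offset: run for offset, run in per_diagonal.items() if run >= 4}
print(strong)  # {-7: 8, 8: 5}
```

The two strong diagonals lie on opposite sides of the center (offsets -7 and +8): “SEQUENCE” has moved later and “ANALY” earlier in the vertical sequence, which is the signature of a transposition.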

Reconsider the same plot. Notice the extraneous dots that indicate neither runs of identity between the two sequences nor inverted repeats. These merely contribute ‘noise’ to the plot and are due to the ‘random’ occurrence of the letters in the sequences, that is, to the composition of the sequences themselves.

How can we ‘clean up’ the plots so that this noise does not detract from our interpretations? Consider the implementation of a filtered windowing approach: a dot will only be placed if some ‘stringency’ is met.

What is meant by this is that a dot is placed at the middle of a window of some defined size if, and only if, some defined criterion is met within that window. Then the window is shifted one position and the entire process is repeated. This very successfully rids the plot of unwanted noise.

In the next plot a window of size three and a stringency of two was used to considerably improve the signal-to-noise ratio (remember, I am using a 1:0 identity scoring function).
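The filtered windowing procedure can be sketched as follows, using the 1:0 identity scoring mentioned above and placing each dot at the middle of its window. The function names are hypothetical, indices are 0-based, and the sketch assumes an odd window and evaluates only windows that fit entirely inside the plot (edge handling is a choice made here, not something the text specifies).

```python
from typing import Set, Tuple

Dots = Set[Tuple[int, int]]


def raw_dots(horizontal: str, vertical: str) -> Dots:
    """Unfiltered plot: one dot per identical pair (window = 1, stringency = 1)."""
    return {(v, h)
            for v in range(len(vertical))
            for h in range(len(horizontal))
            if vertical[v] == horizontal[h]}


def filtered_dots(horizontal: str, vertical: str,
                  window: int = 3, stringency: int = 2) -> Dots:
    """Place a dot at the window's middle when its 1:0 score reaches the stringency."""
    if window % 2 == 0:
        raise ValueError("this sketch needs an odd window so it has a middle cell")
    half = window // 2
    dots = set()
    for v in range(half, len(vertical) - half):
        for h in range(half, len(horizontal) - half):
            # 1:0 identity scoring: a match scores 1, a mismatch scores 0.
            score = sum(1 if vertical[v + k] == horizontal[h + k] else 0
                        for k in range(-half, half + 1))
            if score >= stringency:
                dots.add((v, h))
    return dots


HORIZONTAL = "SEQUENCEANALYSISPRIMER"
VERTICAL = "ANALYZESEQUENCES"

raw = raw_dots(HORIZONTAL, VERTICAL)
filtered = filtered_dots(HORIZONTAL, VERTICAL, window=3, stringency=2)
print(len(raw), len(filtered))              # 35 11
print((6, 1) in raw, (6, 1) in filtered)    # True False
```

The isolated E/E match at (6, 1) is dropped because neither of its diagonal neighbors matches, while every dot that survives lies on the “ANALY” or “SEQUENCE” diagonal.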

Filtered windowing:

Default RNA self-comparison (Phe tRNA), with a window of 21 and a stringency of 14

The same comparison with the window size set to 7 and the stringency value set to 5

Several direct repeats are now obvious that remained obscured in the previous analysis.

22 GAGCGCCAGACT G      12, 22
   || | ||||| |  A
48 CTGGAGGTCTAG A      3

Base positions 22 through 33 base-pair with positions 37 through 48 of the same sequence; put another way, the first stretch is quite similar to the reverse complement of the second. MFold, Zuker’s RNA folding algorithm, uses base pairing energies to find the family of optimal and suboptimal structures; the most stable structure found is shown to possess a stem pairing positions 27 to 31 with positions 39 to 43. However, the region around position 38 is represented as a loop. The actual modeled structure as seen in PDB’s 1TRA shows that ‘reality’ lies somewhere in between.

RNA comparisons of a sequence with its own reverse complement can often be very informative. Here the yeast tRNA sequence is compared to its reverse complement using the same 5-out-of-7 stringency setting as previously. The stem-loop inverted repeats of the tRNA cloverleaf molecular shape become obvious.
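The link between base pairing and reverse-complement similarity can be checked on the stem drawn above (positions 22 to 33 against positions 37 to 48). The sketch below uses only the two arm sequences shown in that diagram, written with T in place of U as in the source; the variable and function names are hypothetical. It counts only Watson-Crick identities, so a G-T wobble pair is not scored.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]


ARM_22_33 = "GAGCGCCAGACT"
# The diagram writes the second arm from position 48 down to 37; flip it to 5'->3'.
ARM_37_48 = "CTGGAGGTCTAG"[::-1]

partner = reverse_complement(ARM_37_48)
identities = [a == b for a, b in zip(ARM_22_33, partner)]
print(partner)          # GACCTCCAGATC
print(sum(identities))  # 8 Watson-Crick pairs out of 12

# Slide a window of 7 along this diagonal and count identities in each window.
WINDOW, STRINGENCY = 7, 5
window_scores = [sum(identities[i:i + WINDOW])
                 for i in range(len(identities) - WINDOW + 1)]
print(window_scores)    # [5, 5, 5, 6, 5, 5]
print(all(score >= STRINGENCY for score in window_scores))  # True
```

Every window along the stem meets the 5-out-of-7 setting, which is consistent with the stem showing up as a continuous diagonal in the reverse-complement plot.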

That same region ‘zoomed in on’ has some small direct repeats seen by comparing the sequence against itself without reversal:

But looking at the same region of the sequence against its reverse-complement shows a wealth of potential stem-loop structure in the transfer RNA:

Conclusion: Dot-matrix Alignment

• Strengths:

– Simple

– All possible matches generated

– Can identify repeated sequence elements

– Often provides a starting point for other alignment algorithms

• Weaknesses:

– Noise level can be high

– Cannot discriminate optimal from suboptimal alignments

– Doesn’t handle gaps well