
Page 1:

Some principles and examples related to evaluation of sequence similarities with help of length equivalent measures (ELEMS)

Jaroslav Kubrycht and Karel Sigler

Prague, 30 November, 2006

Page 2:

Examples and kinds of column identities derived by ELEMS

Page 3:

L I A T R

I S A R V

L W I R C C

L W S I T V

I S A I R C

L S A T R

L I W I C

L I S R C

I W A T V

L W S I C R

Minimum aa numbers limiting ELEMS(RDA) derived levels: CCBE aa, high occurrence aa, template motif aa, questionable aa

Cysteine exhibits the same numbers for both template motif and questionable aa?

See our pdf file.

Page 4:

Examples of amino acid similarities and their contradictory dissimilarities in sequence block columns

Page 5:

Questionable amino acids A and V, which are convertible via a single triplet mutation, form cooperating pairs when present in the same column and achieve a mixed high occurrence level.

A G

A

V A

A A

G

V G

G

A A

V

On the other hand, collocating template amino acids A and G without a mutation relationship form contradictory pairs, which in fact diminish the overall extent of aa similarities in their block.

Page 6:

Length equivalents as products of probabilistic compression

Page 7:

The probability of the amino acids present in the left column can be represented by a complete column similarity of non-integer height, i.e. by the vertical length equivalent of the column (LEA).

A

A

A

A

A

A

A

A

In the given case, ELEMS(RDA) determines a high occurrence level of aa similarity, for which LEA = 3.095.

Page 8:

In addition to LEA, we also define the mean compressed height of whole sequence blocks, i.e. LETM. Both of these height-related (vertical) length equivalents are restricted by the same numerical limits, which ELEMS uses to distinguish different kinds of similarities.

restricted aa | description of lower limit | interval
questionable (gray zone) | random aa/chain | 1.0-1.5
template motif | fuzzy-related point among random aa/chain and double sequence similarity | 1.5-3.0
cohesive | three compressed aa/chains represent minimum sticking stage of cohesion | 3.0-(SL+2)
CBCE | for details see our pdf file | > SL+2
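The interval limits listed above can be read as a simple lookup. The following is only a minimal sketch under stated assumptions: the function name `classify_length_equivalent` is hypothetical, the handling of values lying exactly on a boundary (lower limits inclusive, SL+2 still counted as cohesive) is our assumption, and the SL value used in the example is invented for illustration.

```python
def classify_length_equivalent(le: float, sl: float) -> str:
    """Map a vertical length equivalent (LEA or LETM) to a similarity level.

    Assumptions (not stated on the slide): lower limits are inclusive,
    and a value equal to SL + 2 still counts as "cohesive".
    """
    if le < 1.0:
        # The listed intervals start at 1.0; smaller values are not covered.
        raise ValueError("length equivalent below 1.0 is outside the listed intervals")
    if le < 1.5:
        return "questionable"
    if le < 3.0:
        return "template motif"
    if le <= sl + 2.0:
        return "cohesive"
    return "CBCE"


# Hypothetical specific limit, chosen only for illustration.
example_sl = 5.0
for value in (1.2, 2.0, 4.0, 9.0):
    print(value, classify_length_equivalent(value, example_sl))
```

With the hypothetical SL of 5.0, the upper boundary of the cohesive interval is 7.0, so 9.0 falls into the highest level.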

Page 9:

A similar compression principle is also used to process gapped sequence blocks. The result is a compressed block whose columns contain only identical/similar aa and exhibit a non-integer height given by LETM.

However, the first floor of the given oblong block belongs to a random chain (in light orange) of the template motif. Only the upper area determines the HLE value. This means that: HLE = (LETM - 1) × n.

HLE

random chain
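The HLE relation stated on this slide can be checked with a few lines of arithmetic. This is a minimal sketch, not the authors' implementation: the function name `hle` is hypothetical, n is assumed to be the number of columns of the compressed block (the slide does not define it explicitly), and the input values are invented.

```python
def hle(letm: float, n: int) -> float:
    """Compute HLE = (LETM - 1) * n as given on the slide.

    Assumption: n is the number of columns in the compressed block.
    The guard on LETM is also an assumption, based on the first floor
    of the block (height 1) being the random chain.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if letm < 1.0:
        raise ValueError("LETM below 1 leaves no area above the random chain")
    return (letm - 1.0) * n


# Invented example: a block of mean compressed height 3.5 over 10 columns.
print(hle(3.5, 10))  # 25.0
```

A block whose LETM equals 1 consists of the random chain only, so its HLE is zero.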

Page 10:

Mild modification in the case of double sequence similarity

Double sequence similarity uses only a single value of LEA (LEA = 2), which follows from the presence of only two chains in the corresponding sequence block. Since this similarity has no alternative chain, the corresponding alignment loses column similarities more frequently than multiple sequence alignments do. This, together with LEA values higher than necessary, led us to avoid restricting the mean length equivalent (LETM) value in double sequence similarity while still keeping the HLE evaluation. In spite of this, some agreement between BLAST and ELEMS is demonstrated in WP3.2.2.

Page 11:

Alternatively, we can represent HLE as a single chain of non-integer length HLE. This raises the question of the minimal length of a chain exhibiting a mean aa probability (or score) identical to that of the template motif related to HLE. The corresponding minimum value of non-integer length (SL, i.e. the specific limit) can be determined using several statistical procedures.

specific limit (SL)

HLE chain of sufficient length, i.e. HLE > SL

Page 12:

RBS as unifying value

Page 13:

The ratio of HLE to SL is independent of any probability differences. Moreover, this ratio provides a simple and illustrative insight into the difference from the minimum significant value. Consequently, we suppose that such a value may represent an interesting density-related parameter, which may complement the bit score evaluation.

The given ratio was named relative block similarity (RBS). RBS is thus determined by the formula: RBS = HLE/SL
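The ratio of HLE to SL described on this slide can be illustrated as follows. This is a hedged sketch only: the function names `rbs` and `is_sufficient` are hypothetical, the input values are invented, and the statistical determination of SL itself is not shown because the slides do not specify it.

```python
def rbs(hle: float, sl: float) -> float:
    """Relative block similarity: the ratio of HLE to the specific limit SL."""
    if sl <= 0:
        raise ValueError("SL must be positive")
    return hle / sl


def is_sufficient(hle: float, sl: float) -> bool:
    """Sufficient-length condition HLE > SL, i.e. RBS > 1 for positive SL."""
    return rbs(hle, sl) > 1.0


# Invented values for illustration only.
print(rbs(25.0, 10.0))           # 2.5
print(is_sufficient(25.0, 10.0))  # True
print(is_sufficient(8.0, 10.0))   # False
```

Read this way, an RBS of 2.5 says that the block's HLE is two and a half times the minimum significant length.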

Page 14:

Thank you for visiting our web page.

If you have any questions, our e-mails are:

[email protected]

[email protected]

You are invited.