Tutorial 4 Substitution matrices and PSI-BLAST 1.

Transcript of Tutorial 4 Substitution matrices and PSI-BLAST 1.

Page 1: Tutorial 4 Substitution matrices and PSI-BLAST 1.

Tutorial 4: Substitution matrices and PSI-BLAST

Page 2: Tutorial 4 Substitution matrices and PSI-BLAST 1.

Agenda

• Why study distant homologies?
• Substitution Matrices
  – PAM - Point Accepted Mutations
  – BLOSUM - Blocks Substitution Matrix
• PSI-BLAST

Cool story of the day: Why should we care about cellular fusion in worms?

Page 3: Tutorial 4 Substitution matrices and PSI-BLAST 1.

How proteins evolve

• Throughout evolution proteins change
• Some change more than others, and at different rates in different regions of the protein.

Page 4: Tutorial 4 Substitution matrices and PSI-BLAST 1.

Why study distant homologies?

• When we study a new organism we may find many unknown sequences that we would like to characterize.

We might not be able to find any close homologs.

• Substitution matrices model different evolutionary distances.

• PSI-BLAST makes it possible to find more distant relationships between proteins.

Page 5: Tutorial 4 Substitution matrices and PSI-BLAST 1.

Amino acids were not born equal

Both substitution matrices and PSI-BLAST are designed to model the process by which amino acids (AAs) mutate.

Page 6: Tutorial 4 Substitution matrices and PSI-BLAST 1.

Substitution Matrix

• Scoring matrix S of size 20x20

• Si,j is the score (gain or penalty) for substituting AAj with AAi (i is the row, j is the column)

– Based on likelihood this substitution is found in nature

– Computed differently in PAM and BLOSUM

• Each matrix is tailored to a particular evolutionary distance
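
A minimal sketch of how one entry of such a matrix can be written as a log-odds score: the frequency with which a pair is seen aligned in related proteins, divided by the frequency expected by chance. The function name and all frequencies are hypothetical and are not taken from PAM or BLOSUM; the scale of 2 (half-bit units) is an assumption made for this sketch.

```python
import math

def log_odds_score(q_ij, p_i, p_j, scale=2.0):
    """Return the rounded log-odds score for aligning residues i and j.

    q_ij     : probability of seeing i aligned with j in related proteins
    p_i, p_j : background probabilities of residues i and j
    scale    : 2.0 gives half-bit units (an assumption for this sketch)
    """
    odds = q_ij / (p_i * p_j)
    return round(scale * math.log2(odds))

# Hypothetical frequencies, for illustration only.
# A pair seen more often than chance gets a positive score (gain).
favourable = log_odds_score(q_ij=0.010, p_i=0.05, p_j=0.05)
# A pair seen less often than chance gets a negative score (penalty).
unfavourable = log_odds_score(q_ij=0.0006, p_i=0.05, p_j=0.05)
print(favourable, unfavourable)
```

A positive score means the substitution is observed more often than chance would predict, and a negative score means it is observed less often.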

Page 7: Tutorial 4 Substitution matrices and PSI-BLAST 1.

Computing the probability of mutation (Mi,j)

• PAM - Point Accepted Mutations
  – Based on a small set of proteins that are closely related
  – Other than PAM1 the matrices are theoretical.

• BLOSUM - Blocks Substitution Matrix
  – Based on a wider database of proteins that includes families of proteins with conserved regions.
  – The matrices are empirical.

Page 8: Tutorial 4 Substitution matrices and PSI-BLAST 1.

PAM

• Based on a small set of proteins that are closely related

• PAM1 captures mutation rates between closely related proteins (proteins with 1% divergence)

• Problematic when comparing distant proteins: the 1% divergence does not capture more sporadic mutations

Page 9: Tutorial 4 Substitution matrices and PSI-BLAST 1.

PAM-X

• To apply the model to more distant proteins, PAM1 is multiplied by itself (PAM-X is PAM1 raised to the power X). This models the accumulation of mutations during evolution.

• The higher the number of the matrix – the more suitable it is to find distant homologies.

• Other than PAM1 the matrices are theoretical.
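
The self-multiplication can be sketched with a toy example. This assumes a hypothetical 3-letter alphabet and made-up one-step mutation probabilities, written here with each row summing to 1; a real PAM1 matrix is 20x20 and is estimated from data. The helper names are illustrative only.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_power(m, x):
    """Raise matrix m to the integer power x (x >= 1) by repeated multiplication."""
    result = m
    for _ in range(x - 1):
        result = mat_mul(result, m)
    return result

# Toy 3-letter alphabet. Each row gives the probability that a residue
# stays the same (diagonal, 0.99) or changes in one PAM unit.
# All values are hypothetical.
toy_pam1 = [
    [0.990, 0.007, 0.003],
    [0.006, 0.990, 0.004],
    [0.002, 0.008, 0.990],
]

# Extrapolate to a larger evolutionary distance, as PAM250 does from PAM1.
toy_pam250 = mat_power(toy_pam1, 250)
for row in toy_pam250:
    print([round(p, 3) for p in row])
```

After many multiplications the diagonal shrinks and the off-diagonal entries grow, which is why the higher-numbered matrices tolerate more substitutions.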

Page 10: Tutorial 4 Substitution matrices and PSI-BLAST 1.

BLOSUM

• Scores are derived from the observed frequencies of substitutions in blocks of local alignments of related proteins.

• BLOSUM62 is built from blocks in which sequences sharing 62% identity or more are clustered and counted as one, so the counted substitutions come from sequences that share at most 62% identity.
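
The counting step behind the BLOSUM scores can be sketched as follows. This is a simplified illustration on a hypothetical block of four short segments: it counts residue pairs column by column and turns them into observed frequencies. It leaves out the clustering by percent identity and the conversion to log-odds scores, and the function name is made up for this example.

```python
from collections import Counter
from itertools import combinations

def count_column_pairs(block):
    """Count unordered residue pairs in every column of an ungapped block."""
    pairs = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):
            # Sort so that (A, S) and (S, A) are counted as the same pair.
            pairs[tuple(sorted((a, b)))] += 1
    return pairs

# Hypothetical block: four aligned segments of equal length.
toy_block = ["AKL", "AKI", "ARL", "SKL"]
pair_counts = count_column_pairs(toy_block)
total = sum(pair_counts.values())
observed_freq = {pair: n / total for pair, n in pair_counts.items()}
print(pair_counts[("A", "A")], pair_counts[("A", "S")], total)
```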

Page 11: Tutorial 4 Substitution matrices and PSI-BLAST 1.

BLOSUM-X

[Figure: BLOCKS DB; blocks labelled 50% similarity, 50% similarity and 32% similarity; Substitution Matrix A and Substitution Matrix B.]

Page 12: Tutorial 4 Substitution matrices and PSI-BLAST 1.

PAM vs. BLOSUM

PAM: Based on global alignments of closely related proteins.
BLOSUM: Based on local alignments.

PAM: PAM1 is calculated from comparisons of sequences with no more than 1% divergence.
BLOSUM: BLOSUM62 is calculated from comparisons of sequences with no more than 62% identity in the blocks.

PAM: Other PAM matrices are extrapolated from PAM1.
BLOSUM: All BLOSUM matrices are based on observed alignments. They are not extrapolated from comparisons of closely related proteins.

BLOSUM matrices are the substitution matrices in use

Page 13: Tutorial 4 Substitution matrices and PSI-BLAST 1.

Use Recommendations

PAM100 ~ BLOSUM90 (closely related)
PAM120 ~ BLOSUM80
PAM160 ~ BLOSUM60
PAM200 ~ BLOSUM52
PAM250 ~ BLOSUM45 (highly divergent)

Query length   Matrix     Gap costs
<35            PAM30      9,1
35-50          PAM70      10,1
50-85          BLOSUM80   10,1
>85            BLOSUM62   11,1

http://www.ncbi.nlm.nih.gov/blast/html/sub_matrix.html
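
The query-length recommendations can be written as a small lookup. This is only a sketch: the function name is hypothetical, the two gap-cost numbers are read here as (gap opening, gap extension), and because the ranges in the table share the endpoints 50 and 85, the code assumes that a length of exactly 50 falls in the PAM70 row and exactly 85 in the BLOSUM80 row.

```python
def recommend_matrix(query_length):
    """Return (matrix, gap_open, gap_extend) for a protein query length.

    The boundary handling at 50 and 85 is an assumption of this sketch.
    """
    if query_length < 35:
        return ("PAM30", 9, 1)
    if query_length <= 50:
        return ("PAM70", 10, 1)
    if query_length <= 85:
        return ("BLOSUM80", 10, 1)
    return ("BLOSUM62", 11, 1)

for length in (20, 40, 60, 300):
    print(length, recommend_matrix(length))
```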

Page 14: Tutorial 4 Substitution matrices and PSI-BLAST 1.

Example

• Query: an uncharacterized (hypothetical) protein
• Database: nr
• BLAST program: BLASTP
• Matrices: PAM30 / PAM250, BLOSUM45 / BLOSUM90

Page 17: Tutorial 4 Substitution matrices and PSI-BLAST 1.

PSI-BLAST

Position-Specific Iterated BLAST

Aims to find more distantly related proteins than BLAST allows

Page 18: Tutorial 4 Substitution matrices and PSI-BLAST 1.

PSI-BLAST Steps

1. Search a query against a protein database.
2. Construct a specialized multiple sequence alignment based on the top results.
3. Create a position-specific scoring matrix (PSSM) from the alignment.
4. Use the PSSM as a query against the database.
5. PSI-BLAST estimates statistical significance (E values).

Repeat steps 2-5 iteratively.

[Diagram: the query, and later the PSSM, is searched against the protein DB; the results feed the next iteration.]
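
The PSSM step can be sketched on a toy alignment. This is a simplified illustration, not the procedure PSI-BLAST itself uses: it assumes an ungapped alignment, a hypothetical 4-letter alphabet with a uniform background, and a fixed pseudocount, and it leaves out sequence weighting. The function names are made up for this example.

```python
import math
from collections import Counter

def build_pssm(alignment, background, pseudocount=1.0):
    """Build a position-specific scoring matrix from an ungapped alignment.

    alignment  : list of equal-length sequences
    background : dict mapping residue -> background probability
    Returns a list with one {residue: log-odds score} dict per column.
    """
    alphabet = sorted(background)
    pssm = []
    for column in zip(*alignment):
        counts = Counter(column)
        denom = len(column) + pseudocount * len(alphabet)
        scores = {}
        for aa in alphabet:
            # Pseudocounts keep unseen residues from getting a score of -infinity.
            freq = (counts[aa] + pseudocount) / denom
            scores[aa] = math.log2(freq / background[aa])
        pssm.append(scores)
    return pssm

def score_segment(pssm, segment):
    """Score a segment of the same length as the PSSM, without gaps."""
    return sum(col[aa] for col, aa in zip(pssm, segment))

# Toy 4-letter alphabet with a uniform background; all values hypothetical.
background = {"A": 0.25, "C": 0.25, "D": 0.25, "E": 0.25}
hits = ["ACDE", "ACDD", "ACEE", "CCDE"]
pssm = build_pssm(hits, background)
print(score_segment(pssm, "ACDE") > score_segment(pssm, "EDCA"))
```

Unlike a 20x20 substitution matrix, the score for a residue here depends on the column it appears in, which is what lets later iterations pick up more distant relatives.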

Page 19: Tutorial 4 Substitution matrices and PSI-BLAST 1.

Example

We will use a sequence of an uncharacterized (hypothetical) protein:

Page 20: Tutorial 4 Substitution matrices and PSI-BLAST 1.

E-value threshold for the initial BLAST search (default: 10)

E-value threshold for inclusion in PSI-BLAST iterations (default: 0.005)
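
How the two thresholds act on a hit list can be sketched as below. The hits and their E values are hypothetical, the function name is made up, and treating both thresholds as inclusive (less than or equal) is an assumption of this sketch.

```python
def split_hits(hits, report_threshold=10.0, inclusion_threshold=0.005):
    """Separate hits into those reported and those used to build the next PSSM.

    hits : list of (identifier, e_value) tuples
    """
    reported = [h for h in hits if h[1] <= report_threshold]
    included = [h for h in reported if h[1] <= inclusion_threshold]
    return reported, included

# Hypothetical hit list, for illustration only.
toy_hits = [("hit_a", 1e-30), ("hit_b", 0.004), ("hit_c", 0.8), ("hit_d", 25.0)]
reported, included = split_hits(toy_hits)
print([name for name, _ in reported])
print([name for name, _ in included])
```

Only the hits that pass the stricter inclusion threshold contribute to the next PSSM; the looser threshold only controls what is shown.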

Page 21: Tutorial 4 Substitution matrices and PSI-BLAST 1.

The results are all hypothetical proteins.

Page 23: Tutorial 4 Substitution matrices and PSI-BLAST 1.

Cool Story of the day

Why should we care about cellular fusion in worms?

Page 24: Tutorial 4 Substitution matrices and PSI-BLAST 1.

Cellular fusion

In cellular fusion two cells unite and form one cell.

• Fertilization
• Muscle cells are composed of rows of fused cells
• The placenta is made up of powerful multinucleated cells that are actually numerous individual cells that have fused
• The eyes' lenses are formed of rows of fused cells
• Cellular fusion also occurs in bones.
• The fusion processes are also involved in cancer, viral infections and stem cells.

http://www1.technion.ac.il/_local/includes/blocks/sci-news-items/100513-elegans/news-item-en.htm

Page 25: Tutorial 4 Substitution matrices and PSI-BLAST 1.

Cellular fusion in C. elegans

Beni Podbilewicz

• The exact way fusion takes place is still not completely clear and is the focus of work in Prof. Podbilewicz's lab.

• The worm is well suited to cell fusion research because intensive cell-cell fusion takes place in its skin and can be easily followed.

• They identified the protein responsible for the worm's fusion activity: the EFF-1 protein.

• The researchers showed that in mutant worms, skin cells do not fuse and the cells begin to migrate through the body.

Page 27: Tutorial 4 Substitution matrices and PSI-BLAST 1.

“...we identified fusion family (FF) proteins within and beyond nematodes, and divergent members from the human parasitic nematode Trichinella spiralis and the chordate Branchiostoma floridae could also fuse mammalian cells…”