Computational Biology, Part 11 Hydrophilicity & Multiple Sequence Alignment Robert F. Murphy...

Computational Biology, Part 11
Hydrophilicity & Multiple Sequence Alignment

Robert F. Murphy

Copyright 1996, 1999, 2001.
All rights reserved.

Hydropathy/Hydrophilicity/Hydrophobicity

Hydropathy & Hydrophobicity
  degree to which something is “water hating” or “water fearing”
Hydrophilicity
  degree to which something is “water loving”

Hydropathy/Hydrophilicity/Hydrophobicity Analysis

Goal: Obtain quantitative descriptions of the degree to which regions of a protein are likely to be exposed to aqueous solvents
Starting point: Tables of propensities of each amino acid

Hydrophobicity/Hydrophilicity Tables

Describe the likelihood that each amino acid will be found in an aqueous environment, with one value for each amino acid
Commonly used tables
  Kyte-Doolittle hydropathy
  Hopp-Woods hydrophilicity
  Eisenberg et al. normalized consensus hydrophobicity

Kyte-Doolittle hydropathy

Amino Acid   Index      Amino Acid   Index
R            -4.5       S            -0.8
K            -3.9       T            -0.7
D            -3.5       G            -0.4
Q            -3.5       A             1.8
N            -3.5       M             1.9
E            -3.5       C             2.5
H            -3.2       F             2.8
P            -1.6       L             3.8
Y            -1.3       V             4.2
W            -0.9       I             4.5

Basic Hydropathy/Hydrophilicity Plot

Calculate average hydropathy over a window (e.g., 7 amino acids) and slide window until entire sequence has been analyzed
Plot average for each window versus position of window in sequence
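The windowed average described above can be sketched in a few lines of Python. This is a minimal illustration, not the routine used by any particular program: the function name `hydropathy_profile` and the choice to report each window at its 1-based center position are assumptions made for the example. The index values are the Kyte-Doolittle values from the table above.

```python
# Kyte-Doolittle hydropathy index, one value per amino acid.
KYTE_DOOLITTLE = {
    "R": -4.5, "K": -3.9, "D": -3.5, "Q": -3.5, "N": -3.5,
    "E": -3.5, "H": -3.2, "P": -1.6, "Y": -1.3, "W": -0.9,
    "S": -0.8, "T": -0.7, "G": -0.4, "A": 1.8, "M": 1.9,
    "C": 2.5, "F": 2.8, "L": 3.8, "V": 4.2, "I": 4.5,
}


def hydropathy_profile(sequence, window=7):
    """Return (center_position, mean_hydropathy) for every full window.

    Positions are 1-based; reporting the window center is an assumption
    of this sketch (some programs report the window start instead).
    """
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    half = window // 2
    profile = []
    for start in range(len(values) - window + 1):
        mean = sum(values[start:start + window]) / window
        profile.append((start + half + 1, mean))
    return profile


# First 20 residues of A23035, taken from the alignment listings in these slides.
for position, mean in hydropathy_profile("MRECISIHVGQAGVQIGNAC"):
    print(position, round(mean, 2))
```

Plotting the second column against the first gives the hydropathy plot; negating the values (or using a hydrophilicity table instead) gives the hydrophilicity view.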

Example Hydrophilicity Plot

This plot is for a tubulin, a soluble cytoplasmic protein. Regions with high hydrophilicity are likely to be exposed to the solvent (cytoplasm), while those with low hydrophilicity are likely to be internal or interacting with other proteins.

Amphiphilicity/Amphipathicity

A structural domain of a protein (e.g., an α-helix) can be present at an interface between polar and non-polar environments
  Example: Domain of a membrane-associated protein that anchors it to membrane
Such a domain will ideally be hydrophilic on one side and hydrophobic on the other
This is termed an amphiphilic or amphipathic sequence or domain

Amphiphilicity/Amphipathicity

To find such sequences, we look for regions where short stretches of charged residues alternate with short stretches of hydrophobic residues with a repeat distance corresponding to the period of the structure
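One common way to quantify this kind of periodicity is a hydrophobic moment, in the style of Eisenberg et al.: each residue's hydrophobicity is treated as a vector pointing out from the helix axis, and the vectors are summed with a fixed rotation per residue. The sketch below is an illustration under stated assumptions, not a method given in these slides: it uses the Kyte-Doolittle values rather than the Eisenberg consensus scale, assumes 100 degrees per residue for an α-helix, and the names `hydrophobic_moment` and `two_faced_helix` are hypothetical.

```python
import math

# Kyte-Doolittle values are used here for convenience (an assumption);
# the Eisenberg consensus scale would be the usual choice for this measure.
KYTE_DOOLITTLE = {
    "R": -4.5, "K": -3.9, "D": -3.5, "Q": -3.5, "N": -3.5,
    "E": -3.5, "H": -3.2, "P": -1.6, "Y": -1.3, "W": -0.9,
    "S": -0.8, "T": -0.7, "G": -0.4, "A": 1.8, "M": 1.9,
    "C": 2.5, "F": 2.8, "L": 3.8, "V": 4.2, "I": 4.5,
}


def hydrophobic_moment(segment, degrees_per_residue=100.0):
    """Mean hydrophobic moment of a segment for a given structural period."""
    delta = math.radians(degrees_per_residue)
    sin_sum = 0.0
    cos_sum = 0.0
    for i, aa in enumerate(segment.upper()):
        h = KYTE_DOOLITTLE[aa]
        sin_sum += h * math.sin(i * delta)
        cos_sum += h * math.cos(i * delta)
    return math.hypot(sin_sum, cos_sum) / len(segment)


def two_faced_helix(length=18, degrees_per_residue=100.0):
    """Hypothetical test peptide: Leu on one face of the helix, Lys on the other."""
    delta = math.radians(degrees_per_residue)
    return "".join("L" if math.cos(i * delta) > 0 else "K" for i in range(length))


peptide = two_faced_helix()
print(peptide, round(hydrophobic_moment(peptide), 2))   # amphipathic: large moment
print("L" * 18, round(hydrophobic_moment("L" * 18), 2))  # uniform: moment near zero
```

Sliding this calculation along a sequence, as with the hydropathy window, highlights segments whose hydrophobic and charged residues repeat with the period of the structure.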

Multiple Sequence Alignment

Goal: Create best possible “overall” alignment of a family of sequences (more than two)
Ideal approach: Compare all sequences “simultaneously”
Short-cut approach: Align all of the members pairwise with one of the members
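The short-cut approach can be sketched as follows: score every family member against a single query with an ordinary pairwise alignment, then list the members in score order. This is a simplified illustration. It uses a plain global (Needleman-Wunsch style) score with match, mismatch and gap values chosen arbitrarily for the example, not a PAM or BLOSUM matrix, and the function names are hypothetical.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score of the best global alignment of a and b (linear gap penalty)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def align_to_family(query, family):
    """Score each member against the query; return (name, score), best first."""
    scores = [(name, global_alignment_score(query, seq)) for name, seq in family.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# First 10 residues of three tubulins, taken from the alignment listings in these slides.
family = {
    "A23035": "MRECISIHVG",
    "A25601": "MREVISIHIG",
    "UBUTB": "MREIVCVQAG",
}
for name, score in align_to_family(family["A23035"], family):
    print(name, score)
```

Because every member is compared only with the query, two members are related to each other only indirectly, which is the limitation the later examples illustrate.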

Pairwise Multiple Sequence Alignment Example - MacVector “Align to Folder”

Inputs
  An open sequence file
  A folder containing a set of sequences
  Settings for sequence comparison
Outputs
  Aligned sequence map (graphical)
  Aligned sequence listing (text)

Pairwise Multiple Sequence Alignment Example 1

Input: Folder containing a set of protein sequences for various α and β tubulin chains
Task: Compare first sequence to all others and display map

Open first tubulin sequence (A23035)
Under Database, pull down to Align to Folder
Click on Folder to Search
Select folder containing tubulins
Use defaults for search settings and click OK
Click all boxes for Display Options and OK

Description List

Search Analysis for Sequence: A23035       Matrix: pam250 matrix
Search from 1 to 451 where origin = 1      Score Region from 1 to 451
Date: February 20, 1997                    Maximum possible score: 2265
Time: 00:48:27

Database: User
Folder: tubulins

     Sequence   Opt.   Init.   Description
 1.  A23035     2265   2265    Tubulin alpha chain - Human
 2.  A25873     2193   2176    Tubulin alpha chain - Human
 3.  UBUTA      1995   1995    Tubulin alpha chain - Trypanosoma brucei rhodesiense
 4.  A25601     1952   1948    Tubulin alpha chain - Slime mold (Physarum polycephalum)
 5.  UBUTB      1058    754    Tubulin beta chain - Trypanosoma brucei rhodesiense
 6.  A29141     1051    755    Tubulin beta chain - Chlamydomonas reinhardtii
 7.  A26561     1050    773    Tubulin beta chain - Human
 8.  UBPGB      1044    765    Tubulin beta chain - Pig
 9.  UBCHB      1043    765    Tubulin beta chain, embryonic - Chicken
10.  A25377     1022    762    Tubulin beta chain - Neurospora crassa
11.  UBBYB      1002    699    Tubulin beta chain - Yeast (Saccharomyces cerevisiae)
12.  UBURAL      826    826    Tubulin alpha chain - Sea urchin (Lytechinus pictus) (fragment)
13.  A25342      532    361    Tubulin beta chain - Slime mold (Physarum polycephalum)
14.  UBURB       354    182    Tubulin beta chain - Sea urchin (Lytechinus pictus) (fragment)

Aligned Sequence Description List for Tubulins

Aligned Sequence Map for Tubulins

[Figure: aligned sequence map. Database: User, Folder: tubulins. Rows are the 14 sequences in score order, from 1. A23035 (2265) to 14. UBURB (354); the horizontal axis marks residue positions 50 to 450 in steps of 50.]

Pairwise Multiple Sequence Alignment Example 2

Input: Folder containing sequence files for just three tubulins
Task: Compare using different query sequences and examine aligned sequence listing for differences

Open first of three sequences
Align to Folder with itself and two other tubulins

Alignment List

Database: User
Folder: subset of tubulins

                 10         20         30         40         50         60
             *    *     *    *     *    *     *    *     *    *     *    *
A23035   MRECISIHVG QAGVQIGNAC WELYCLEHGI QPDGQMPSDK TIGGGDDSFN TFFSETGAGK
                  |          |          |          |          |          |
1. A23035        10         20         30         40         50         60
[ 2265 ] MRECISIHVG QAGVQIGNAC WELYCLEHGI QPDGQMPSDK TIGGGDDSFN TFFSETGAGK
>        ^^^^^^^^^^ ^^^^^^^^^^ ^^^^^^^^^^ ^^^^^^^^^^ ^^^^^^^^^^ ^^^^^^^^^^
A23035   MRECISIHVG QAGVQIGNAC WELYCLEHGI QPDGQMPSDK TIGGGDDSFN TFFSETGAGK
                  |          |          |          |          |          |
2. A25601        10         20         30         40         50         60
[ 1952 ] MREvISIHiG QAGtQvGNAC WELYCLEHGI QPDGQMPSDK svGyGDDaFN TFFSETGAGK
>        ^^^v^^^^^^ ^^^-^^^^^^ ^^^^^^^^^^ ^^^^^^^^^^ ^^^v^^^^^^ ^^^^^^^^^^
A23035   MRECISIHVG QAGVQIGNAC WELYCLEHGI QPDGQMPSDK TIGGGDDSFN TFFSETGAGK
                  |          |          |          |          |          |
3. UBUTB         10         20         30         40          |  50       |
[ 1058 ] MREivcvqaG QcGnQIGskf WEvisdEHGv dPtGtyqgDs dl--qleriN vyFdEatgGr
>        ^^^v^-^^-^ ^v^v^^^^vv ^^^v-v^^^^ ^^-^vv-^^- -^  vv^-^^ -^^-^^-^^^
A23035   MRECISIHVG QAGVQIGNAC WELYCLEHGI QPDGQMPSDK TIGGGDDSFN TFFSETGAGK


Close first, open second sequence and repeat Align to Folder

Alignment List

                 10         20         30         40         50         60
             *    *     *    *     *    *     *    *     *    *     *    *
A25601   MREVISIHIG QAGTQVGNAC WELYCLEHGI QPDGQMPSDK SVGYGDDAFN TFFSETGAGK
                  |          |          |          |          |          |
1. A25601        10         20         30         40         50         60
[ 2161 ] MREVISIHIG QAGTQVGNAC WELYCLEHGI QPDGQMPSDK SVGYGDDAFN TFFSETGAGK
>        ^^^^^^^^^^ ^^^^^^^^^^ ^^^^^^^^^^ ^^^^^^^^^^ ^^^^^^^^^^ ^^^^^^^^^^
A25601   MREVISIHIG QAGTQVGNAC WELYCLEHGI QPDGQMPSDK SVGYGDDAFN TFFSETGAGK
                  |          |          |          |          |          |
2. A23035        10         20         30         40         50         60
[ 1948 ] MREcISIHvG QAGvQiGNAC WELYCLEHGI QPDGQMPSDK tiGgGDDsFN TFFSETGAGK
>        ^^^v^^^^^^ ^^^-^^^^^^ ^^^^^^^^^^ ^^^^^^^^^^ ^^^v^^^^^^ ^^^^^^^^^^
A25601   MREVISIHIG QAGTQVGNAC WELYCLEHGI QPDGQMPSDK SVGYGDDAFN TFFSETGAGK
                  |          |          |          |          |          |
3. UBUTB         10         20         30         40          |  50       |
[ 1014 ] MREivcvqaG QcGnQiGskf WEvisdEHGv dPtGtyqgDs dlql-er-iN vyFdEatgGr
>        ^^^^^-^^v^ ^v^-^^^^vv ^^^v-v^^^^ ^^-^vv-^^- -^vv ^v ^^ -^^-^^-^^^
A25601   MREVISIHIG QAGTQVGNAC WELYCLEHGI QPDGQMPSDK SVGYGDDAFN TFFSETGAGK



1st alignment
A23035   TIGGGDDSFN
A25601   svGyGDDaFN
UBUTB    dl--qleriN

2nd alignment
A23035   tiGgGDDsFN
A25601   SVGYGDDAFN
UBUTB    dlql-er-iN

Note that a different (better) alignment of A25601 and UBUTB is obtained from direct comparison to each other (2nd alignment) than when each is compared indirectly via comparison to A23035 (1st alignment)

“True” Multiple Sequence Alignment

First do all pairwise alignments (not just one sequence with all others)
Then combine pairwise alignments to generate overall alignment

Multiple Sequence Alignment programs

ClustalW
  available via web server
  included within MacVector
MSA
  available via web server

ClustalW within MacVector

Use same set of tubulins from the MacVector Sample Files folder
Open each sequence
Under Analyze, select ClustalW Alignment

ClustalW Guide Tree

[Figure: guide tree with branch lengths for the 16 sequences. Leaves, top to bottom: UBUTB, UBUTA, UBURAL, A23035, A25873, A25601, FVVFBA, CVJB, UBBYB, A25377, A25342, UBURB, UBCHB, UBPGB, A26561, A29141.]

From pairwise alignments, build tree that links similar sequences
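The idea of turning all pairwise comparisons into a tree can be sketched with simple average-linkage (UPGMA-style) clustering. This is a stand-in chosen for brevity, not a description of how ClustalW itself constructs its guide tree. The distance used here, the fraction of mismatched positions between equal-length segments, is also an assumption of the example, and the function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of positions that differ between two equal-length segments."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_linkage_tree(names, dist):
    """Repeatedly join the two closest clusters; return the tree as nested tuples.

    dist maps frozenset({name_a, name_b}) to a distance for every pair of names.
    """
    clusters = [(name, [name]) for name in names]  # (subtree, leaf names)

    def cluster_distance(pair):
        (_, left), (_, right) = pair
        total = sum(dist[frozenset((a, b))] for a in left for b in right)
        return total / (len(left) * len(right))

    while len(clusters) > 1:
        left, right = min(combinations(clusters, 2), key=cluster_distance)
        clusters = [c for c in clusters if c is not left and c is not right]
        clusters.append(((left[0], right[0]), left[1] + right[1]))
    return clusters[0][0]


# First 10 residues of three tubulins, taken from the alignment listings in these slides.
segments = {
    "A23035": "MRECISIHVG",
    "A25601": "MREVISIHIG",
    "UBUTB": "MREIVCVQAG",
}
# All pairwise comparisons, not just one sequence against the rest.
dist = {
    frozenset((a, b)): p_distance(segments[a], segments[b])
    for a, b in combinations(segments, 2)
}
tree = average_linkage_tree(list(segments), dist)
print(tree)
```

The two alpha chains are joined first and the beta chain is attached last. A progressive aligner would then align sequences in the order the tree joins them.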

ClustalW within MacVector

Three ways to save results
  Click Save button on initial “ClustalW Alignment Results” screen (or do Save As... on “Aligned Sequences”)
    get sequential text file suitable for input to Phylip
  Use Save As... on “Aligned Sequences ClustalW Alignments”
    get interleaved text file useful for printing but requires significant editing for Phylip
  Use Save As... on “Aligned Sequences” display
    get color PICT file

ClustalW sequential file

> UBUTB
MREIVCVQAGQCGNQIGSKFWEVISDEHGVDPTGTYQGDSDLQL--ERINVYFDEATGGRYVPRSVLIDLEPGTMDSVRAGPYGQIFRPDNFIFGQSGAGNNWAKGHYTEGAELIDSVLDVCCKEAESCDCLQGFQICHSLGGGTGSGMGTLLISKLREQYPDRIMMTFSIIPSPKVSDTVVEPYNTTLSVHQLVENSDESMCIDNEALYDICFRTLKLTTPTFGDLNHLVSAVVSGVTCCLRFPGQLNSDLRKLAVNLVPFPRLHFFMMGFAPLTSRGSQQYRGLSVPELTQQMFDAKNMMQAADPRHGRYLTASALFRGRMSTKEVDEQMLNVQNKNSSYFIEWIPNNIKSSVCDIP----PKG----LKMAVTFIGNNTCIQEMFRRV-GEQFTLMFRRKAFLHWYTGEGMDEMEFTEAESNMNDLVSEYQQYQDATIEEEGE------FDEEEQY---------
> UBUTA
MREAICIHIGQAGCQVGNACWELFCLEHGIQPDGAMPSDKTIGVEDDAFNTFFSETGAGKHVPRAVFLDLEPTVVDEVRTGTYRQLFHPEQLISGKEDAANNYARGHYTIGKEIVDLCLDRIRKLADNCTGLQGFLVYHAVGGGTGSGLGALLLERLSVDYGKKSKLGYTVYPSPQVSTAVVEPYNSVLSTHSLLEHTDVAAMLDNEAIYDLTRRNLDIERPTYTNLNRLIGQVVSSLTASLRFDGALNVDLTEFQTNLVPYPRIHFVLTSYAPVISAEKAYHEQLSVSEISNAVFEPASMMTKCDPRHGKYMACCLMYRGDVVPKDVNAAVATIKTKRTIQFVDWSPTGFKCGINYQPPTVVPGGDLAKVQRAVCMIANSTAIAEVFARI-DHKFDLMYSKRAFVHWYVGEGMEEGEFSEAREDLAALEKDYEEVGAESADMDGE--------EDVEEY--------
> UBURB
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------FAPLTSRGSQQYRALTVSELTQQMFDAKNMMAACDPRHGRYLTVAAIFRGRMSMKEVDEQMLNVQNKNSSYFVEWIPNNVKTAVCDIP----PRG----LKMSATFIGNSTAIQELFKRI-SEQFTAMFRRKAFLHWYTGEGMDEMEFTEAESNMNDLVSEYQQYQDATAEEEGE------FDEEEGDEEAA-----

ClustalW interleaved file

Clustal W(1.4) multiple sequence alignment

16 Sequences Aligned.     Alignment Score = 145639
Gaps Inserted = 114       Conserved Identities = 1

Pairwise Alignment Mode: Slow
Pairwise Alignment Parameters:
    Open Gap Penalty = 10.0   Extend Gap Penalty = 0.1
    Similarity Matrix: blosum

Multiple Alignment Parameters:
    Open Gap Penalty = 10.0   Extend Gap Penalty = 0.1
    Delay Divergent = 40%     Gap Distance = 8
    Similarity Matrix: blosum

Processing time: 21.2 seconds

UBUTB   MREIVCVQAGQCGNQIGSKFWEVISDEHGVDPTGTYQGDSDLQL--ERINVYFDEATGGRYVPRSVLIDLEPGTMDSVRAGPYGQIFRPDNFIFGQSGAG
UBUTA   MREAICIHIGQAGCQVGNACWELFCLEHGIQPDGAMPSDKTIGVEDDAFNTFFSETGAGKHVPRAVFLDLEPTVVDEVRTGTYRQLFHPEQLISGKEDAA
UBURB
UBURAL
UBPGB   MREIVHIQAGQCGNQIGAKFWEVISDEHGIDPTGSYHGDSDLQL--ERINVYYNEAAGNKYVPRAILVDLEPGTMDSVRSGPFGQIFRPDNFVFGQSGAG
UBCHB   MREIVHIQAGQCGNQIGAKFWEVISDEHGIDPTGSYHGDSDLQL--ERINVYYNEATGNKYVPRAILVDLEPGTMDSVRSGPFGQIFRPDNFVFGQSGAG
UBBYB   MREIIHISTGQCGNQIGAAFWETICGEHGLDFNGTYHGHDDIQK--ERLNVYFNEASSGKWVPRSINVDLEPGTIDAVRNSAIGNLFRPDNYIFGQSSAG
FVVFBA  TDE--I--------------------TSFS-------IP--KFR---P---D--------Q---P-NLIF-Q---G
CVJB    ADT--I------------------VAVELDT------YPNTDIGD--PS--------------YP-----------
A29141  MREIVHIQGGQCGNQIGAKFWEVVSDEHGIDPTGTYHGDSDLQL--ERINVYFNEATGGRYVPRAILMDLEPGTMDSVRSGPYGQIFRPDNFVFGQTGAG
A26561  MREIVHIQAGQCGNQIGAKFWEVISDEHGIDPTGTYHGDSDLQL--DRISVYYNEATGGKYVPRAILVDLEPGTMDSVRSGPFGQIFRPDNFVFGQSGAG
A25873  MRECISVHVGQAGVQMGNACWELYCLEHGIQPDGQMPSDKTIGGGDDSFTTFFCETGAGKHVPRAVFVDLEPTVIDEIRNGPYRQLFHPEQLITGKEDAA
A25601  MREVISIHIGQAGTQVGNACWELYCLEHGIQPDGQMPSDKSVGYGDDAFNTFFSETGAGKXXXXAVFLDLEPTVIDEVRTGTYRQLFHPEQLITGKEDAA
A25377  MREIVHLQTGQCGNQIGAAFWQTISGEHGLDASGVYNGTSELQL--ERMNVYFNEASGNKYVPRAVLVDLEPGTMDAVRAGPFGQLFRPDNFVFGQSGAG
A25342  MREIVHIQAGQCGNQIGAKFWEVISDEHGIDPTGSYHGDSDLQL--ERINVYYNEATGGKYVPRAVLVDLEPGTMDSVRAGPFGQIFRPDNFVFGQTGAG
A23035  MRECISIHVGQAGVQIGNACWELYCLEHGIQPDGQMPSDKTIGGGDDSFNTFFSETGAGKHVPRAVFVDLEPTVIDEVRTGTYRQLFHPEQLITGKEDAA

UBUTB   NNWAKGHYTEGAELIDSVLDVCCKEAESCDCLQGFQICHSLGGGTGSGMGTLLISKLREQYPDRIMMTFSIIPSPKVSDTVVEPYNTTLSVHQLVENSDE
UBUTA   NNYARGHYTIGKEIVDLCLDRIRKLADNCTGLQGFLVYHAVGGGTGSGLGALLLERLSVDYGKKSKLGYTVYPSPQVSTAVVEPYNSVLSTHSLLEHTDV
UBURB
UBURAL
UBPGB   NNWAKGHYTEGAELVDSVLDVVRKESESCDCLQGFQLTHSLGGGTGSGMGTLLISKIREEYPDRIMNTFSVVPSPKVSDTVVEPYNATLSVHQLVENTDE
UBCHB   NNWAKGHYTEGAELVDSVLDVVRKESESCDCLQGFQLTHSLGGGTGSGMGTLLISKIREEYPDRIMNTFSVMPSPKVSDTVVEPYNATLSVHQLVENTDE
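A figure such as "Conserved Identities" can be understood as a count of alignment columns in which every sequence carries the same residue. The sketch below shows one way to compute such a count. It is an assumption about what the statistic means, not ClustalW's or MacVector's documented definition, and the function name `conserved_columns` is hypothetical. Columns containing a gap are never counted.

```python
def conserved_columns(aligned, gap="-"):
    """Return 0-based indexes of columns where all rows share one residue."""
    if len({len(row) for row in aligned}) > 1:
        raise ValueError("aligned rows must all have the same length")
    conserved = []
    for index, column in enumerate(zip(*aligned)):
        residues = {residue.upper() for residue in column}
        if len(residues) == 1 and gap not in residues:
            conserved.append(index)
    return conserved


# Residues 41-50 of the query from the "1st alignment" comparison in these slides.
rows = [
    "TIGGGDDSFN",  # A23035
    "svGyGDDaFN",  # A25601
    "dl--qleriN",  # UBUTB
]
print(conserved_columns(rows))
```

With fragments and distantly related sequences in the set, very few columns stay identical across all rows, which is consistent with the low count reported in the header above.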

ClustalW graphical display (partial)

Color (or shade of gray) shows type of amino acid
