Persistently Conserved Positions in Structurally Similar, Sequence Dissimilar Proteins: Roles in Preserving Protein Fold and Function


  • 8/13/2019 Persistently Conserved Positions in Structurally Similar_sequence Dissimilar Proteins__Roles in Preserving Protein Fold and Function


    AMIT


    Objective

    Identification of conserved residues among the datasets of structurally similar, sequence-dissimilar (SSSD) proteins.

    Generation of a substitution matrix, using the statistical distribution of the mutually persistently conserved (MPC) positions, for protein design experiments.


    Introduction

    The identification of structurally similar, sequence-dissimilar (SSSD) proteins, that is, structurally similar protein pairs that have only 10% sequence identity, suggests that many positions do not play a critical role in structure determination and that the folding determinants are restricted to a limited number of sequence residues.

    A number of experimental and computational studies have demonstrated SSSD [...]. Experimental studies mainly use the effect of mutations on protein stability and [...], while computational studies use positional conservation analysis and [...] analysis to do so.

    In this study, a stringent data set of SSSD protein pairs is used to characterize residues that may play a role in the determination of protein structure and/or function.


    The database used in the current analysis consists of pairs of proteins that share a common fold but are dissimilar in sequence (12% identical residues on average).

    For each protein in the database, amino-acid positions are identified that show conservation within both close and distant family members. These positions are termed persistently conserved.

    Then, structurally aligned positions in a protein pair that are persistently conserved in both pair mates are identified and termed mutually persistently conserved (MPC).

    Because of their intra- and interfamily conservation, these positions are good candidates for determining protein fold and function.


    The results show that 45% of the persistently conserved positions are mutually conserved.

    A significant fraction of them are located in positions critical for secondary structure determination, they are mostly buried, and many of them form spatial clusters within their protein structures.

    A substitution matrix based on the subset of MPC positions shows two characteristics:

    (i) it is different from other available matrices, even those that are derived from structural alignments;

    (ii) its relative entropy is high, emphasizing the special residue restrictions imposed on these positions.

    Such a substitution matrix should be valuable for protein design experiments.


    Results


    Datasets

    The FSSP (fold classification based on structure-structure alignment of proteins) and distant aligned protein sequences (DAPS) databases were used as a starting point for the data set of SSSD protein pairs.

    The DAPS database is based on FSSP and contains alignments of all protein pairs sharing [...] identical residues. Various filtering criteria were applied to obtain a database of 118 protein pairs.


    Identifying mutually persistently conserved positions

    Fig. 1. A schematic flowchart [...] mutually persistently conserved [...]

    Each sequence in the database was run through PSI-BLAST for five iterations, or until convergence, to identify remote homologs.

    For each sequence, the conservation of a given amino-acid type at a given position is evaluated relative to its background frequency in the entire database; this measure is termed the information content (IC).

    Including only residue positions that are identified as conserved at both the first and last iterations of PSI-BLAST is expected to decrease the fraction of positions erroneously identified as conserved.
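The persistence filter described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the per-residue information content takes the form f * log2(f / p), where f is the residue's frequency in an alignment column and p its background frequency, and it assumes query-anchored alignments in which column i corresponds to query position i. The function names and the `threshold` parameter are hypothetical.

```python
import math

def information_content(column, background):
    """Per-residue IC for one alignment column.

    Assumed form (not taken from the slides): IC(a) = f(a) * log2(f(a) / p(a)).
    Gap characters ('-') are ignored.
    """
    residues = [r for r in column if r != "-"]
    counts = {}
    for r in residues:
        counts[r] = counts.get(r, 0) + 1
    n = len(residues)
    return {r: (c / n) * math.log2((c / n) / background[r])
            for r, c in counts.items()}

def conserved_positions(alignment, background, threshold):
    """Columns in which at least one residue type reaches the IC threshold."""
    conserved = set()
    for i in range(len(alignment[0])):
        ic = information_content([seq[i] for seq in alignment], background)
        if ic and max(ic.values()) >= threshold:
            conserved.add(i)
    return conserved

def persistently_conserved(first_iter, last_iter, background, threshold):
    """Positions conserved at both the first and the last iteration."""
    return (conserved_positions(first_iter, background, threshold)
            & conserved_positions(last_iter, background, threshold))
```

A position that looks conserved only among close homologs (first iteration) drops out once remote homologs are added, which is the intended effect of the intersection.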


    Quick Stats:

    Out of 118 protein pairs in our database, 93 pairs had more than a single PSI-BLAST iteration for both pair mates.

    Seventy-four percent of the positions identified as conserved at the first iteration were persistently conserved.

    Among all persistently conserved positions, 45% show mutual conservation, while 55% show persistent conservation in only one pair mate.

    It is evident from the above statistics that applying the two requirements of persistency and mutuality of conservation directs us to a strictly defined subset of residues.
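The mutuality requirement can be illustrated with plain set membership. The sketch below assumes a one-to-one structural alignment given as (position in mate A, position in mate B) tuples and persistently conserved positions given as sets; the function names and the way the mutual fraction is counted are illustrative assumptions, not the authors' code.

```python
def mutually_persistently_conserved(aligned_positions, pc_a, pc_b):
    """Aligned position pairs that are persistently conserved in both mates."""
    return [(i, j) for i, j in aligned_positions if i in pc_a and j in pc_b]

def mutual_fraction(aligned_positions, pc_a, pc_b):
    """Share of all persistently conserved positions that are mutual.

    Assumption: each MPC pair contributes one position from each mate.
    """
    mpc = mutually_persistently_conserved(aligned_positions, pc_a, pc_b)
    total = len(pc_a) + len(pc_b)
    return 2 * len(mpc) / total if total else 0.0
```

For example, with four aligned pairs of which two are conserved in both mates, and seven persistently conserved positions overall, `mutual_fraction` returns 4/7.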


    Assumption:

    MPC positions were maintained as persistently conserved in a corresponding manner in the two remote protein families because they play important roles in structure and/or function determination, and we now turn to finding out what these roles might be.


    Over-represented amino acid residues in mutually persistently conserved positions

    Fig. 2. Distribution of residue types in mutually persistently conserved (MPC) positions, expressed [...] between the frequency of a residue [...] and its frequency in the entire database (exp). All frequency differences were significant by a χ² test, except for [...] valine (marked with ^).

    The amino-acid frequency distribution in MPC positions is compared with the frequency distribution in all positions in the data by applying a χ² (chi-square) test to the individual amino acids.

    Aspartic acid, isoleucine, glycine, proline, histidine, cysteine, tryptophan, phenylalanine, and tyrosine were found to be significantly over-represented in MPC positions in comparison to their background frequencies.

    In many cases, those residues were maintained unchanged in the structurally aligned positions of the two pair mates.

    These residues may have distinct roles, mainly in or near the active sites of the proteins.
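A per-residue χ² comparison of this kind can be sketched as a one-degree-of-freedom goodness-of-fit test (residue versus all other residues). The exact test setup used in the study is not given in the slides, so the formulation below is an assumption; the function name and arguments are hypothetical. For one degree of freedom the p-value equals erfc(sqrt(χ²/2)), which avoids any external dependency.

```python
import math

def chi_square_1df(mpc_count, mpc_total, background_freq):
    """Chi-square test of one residue's count in MPC positions.

    mpc_count:       occurrences of the residue in MPC positions
    mpc_total:       total number of MPC positions
    background_freq: residue frequency over all positions in the data
    Returns (chi2, p_value) for 1 degree of freedom.
    """
    expected_in = mpc_total * background_freq
    expected_out = mpc_total * (1.0 - background_freq)
    observed_out = mpc_total - mpc_count
    chi2 = ((mpc_count - expected_in) ** 2 / expected_in
            + (observed_out - expected_out) ** 2 / expected_out)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

A residue seen 30 times in 100 MPC positions against a background frequency of 0.1 gives a χ² of about 44.4, a clear over-representation; a residue seen exactly as often as expected gives χ² = 0.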


    Substitution matrix derived from mutually persistently conserved positions

    A log-odds substitution matrix based on the data from MPC positions describes how favorable substitutions are at these positions, and provides a tool for analyzing the allowable substitutions at positions that are suspected to be important for structural or functional determination.

    In the MPC-derived matrix, the high frequency of synonymy arises for a completely different reason than in traditional substitution matrices: MPC positions tend to be synonymous by virtue of their irreplaceability, not because they have not yet diverged in evolution.
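As a rough illustration of how a log-odds matrix and its relative entropy are obtained from aligned residue pairs, the sketch below uses the standard formulation s(a, b) = log2(q(a, b) / (p(a) p(b))) and H = Σ q(a, b) s(a, b). Whether the study used exactly this normalization is an assumption, and the function name is hypothetical.

```python
import math
from collections import Counter

def log_odds_matrix(aligned_pairs):
    """Log-odds scores and relative entropy from aligned residue pairs.

    aligned_pairs: iterable of (residue_in_mate_A, residue_in_mate_B).
    Pairs are counted in both orders so the matrix is symmetric.
    Unobserved pairs are simply absent from the result (no pseudocounts).
    """
    pair_counts = Counter()
    for a, b in aligned_pairs:
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}

    # Marginal residue frequencies implied by the pair frequencies.
    p = Counter()
    for (a, _), freq in q.items():
        p[a] += freq

    scores = {(a, b): math.log2(freq / (p[a] * p[b]))
              for (a, b), freq in q.items()}
    relative_entropy = sum(freq * scores[pair] for pair, freq in q.items())
    return scores, relative_entropy
```

When aligned positions almost always carry the same residue, the diagonal scores and the relative entropy are high, which matches the high-entropy behavior the slides attribute to the MPC-derived matrix.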

    Fig. 3. Amino-acid residue substitution [...] (a) mutually persistently conserved [...] structurally aligned positions. Values [...]


    Mutually persistently conserved positions in secondary structure elements

    We investigate the presence of MPC residues at specific positions along secondary-structure elements, both in α-helices and in β-strands.

    In Fig. 5a, MPC positions are preferentially found at the flanking regions of α-helices, at both their N and C termini, especially in positions N, N, and C. Thus, these residues probably play a role in helix initiation and termination.

    In Fig. 5b, specific positioning of MPC residues in β-strands is also observed, although less prominently, with a preference for the terminal position of the β-strand and its vicinity.

    Thus, one of the roles MPC residues may have is in the determination and stabilization of secondary structure elements along the protein sequence.

    Fig. 5. Frequency of mutually persistently conserved [...] elements. The X-axis shows the positions in an [...] (nomenclature after Aurora and Rose 1998). The fla[...] the in-element residues with digits, and the initial and [...]. The Y-axis is the logarithm of the ratio between the actual [...] and that expected at random, based on the overall fr[...] positions in which MPCs were found to be signific[...] with an *. (a) helices; (b) strands.
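The quantity on the Y-axis of Fig. 5, the logarithm of observed over expected MPC occurrence at each position of a secondary-structure element, can be sketched as below. The expected count is taken here as the overall MPC fraction times the number of residues at that position, and the natural logarithm is used; both choices are assumptions, since the caption is truncated. The position labels in the usage note are placeholders.

```python
import math

def position_log_ratios(mpc_counts, total_counts):
    """log(observed / expected) MPC occurrence per element position.

    mpc_counts:   position label -> number of MPC residues at that position
    total_counts: position label -> number of all residues at that position
    Positions with no observed MPC residues are skipped (log undefined).
    """
    overall_fraction = sum(mpc_counts.values()) / sum(total_counts.values())
    ratios = {}
    for position, total in total_counts.items():
        observed = mpc_counts.get(position, 0)
        expected = overall_fraction * total
        if observed > 0 and expected > 0:
            ratios[position] = math.log(observed / expected)
    return ratios
```

Calling `position_log_ratios({"Ncap": 20, "N1": 10, "N2": 10}, {"Ncap": 100, "N1": 100, "N2": 200})` gives a positive value for `Ncap` (enriched), zero for `N1`, and a negative value for `N2` (depleted).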


    Solvent accessibility

    The solvent accessibility values of MPC positions were compared with those of all paired positions in the database.

    Figure 6 shows the fractions of exposed and buried MPC positions (white bars) in comparison to all aligned database residues (black bars).

    MPC positions tend to be located in the protein interior, lending further support to their possible role as maintainers of structure and function.

    Fig. 6. Distribution of mutually persistently conserved positions (white bars) by solvent accessibility [...] residues (black bars). Residues were [...] solvent accessibility was [...]


    Spatial proximity of mutually persistently conserved positions

    The spatial proximity of MPC positions is described using a graph representation, as [...] nodes connected by edges.

    Out of the 118 protein pairs, 69 pairs were found to have MPC positions whose spatial proximity was better than that expected at random.

    This suggests that a fraction of the MPC positions form spatial clusters of interacting residues that may have a functional or structural role.

    Thus, an additional role of these residues may be in establishing folding nuclei and [...] substructures associated with the functional sites.
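A graph view of spatial proximity can be sketched as follows: MPC positions are nodes, an edge joins two positions whose representative atoms lie within a distance cutoff, and connected components are read as spatial clusters. The 7.0 Å default cutoff, the use of a single coordinate per residue, and the function name are illustrative assumptions; the slides do not state how edges were defined or how the random expectation was computed.

```python
import math
from itertools import combinations

def spatial_clusters(coords, cutoff=7.0):
    """Connected components of the MPC proximity graph.

    coords: position -> (x, y, z) of one representative atom (assumed).
    cutoff: maximum distance for an edge; 7.0 is a placeholder value.
    """
    adjacency = {pos: set() for pos in coords}
    for a, b in combinations(coords, 2):
        if math.dist(coords[a], coords[b]) <= cutoff:
            adjacency[a].add(b)
            adjacency[b].add(a)

    clusters, seen = [], set()
    for start in coords:
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        clusters.append(component)
    return clusters
```

Comparing the resulting cluster sizes with those obtained for randomly chosen positions of the same structure would be one way to judge whether the proximity exceeds chance.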


    Conclusion

    MPC positions can be used to represent the conserved residues among the SSSD protein datasets.

    MPC positions comprise some specific residue types, which vary according to the [...] structure type. This can be understood in the context of sequence propensity.

    MPC positions are mostly present in the buried core and are important for protein structure stability.

    The substitution matrix derived from mutually persistently conserved positions can be used in protein design experiments.