Proteomics Bioinformatics & Protein Structural Analysis...

Transcript of Proteomics Bioinformatics & Protein Structural Analysis...

Page 1: Proteomics Bioinformatics & Protein Structural Analysis ...oscar.iitb.ac.in/onsiteDocumentsDirectory/650... · Bioinformatics & Protein Structural Analysis Proteomics. The protein

The molecular structures of proteins are complex and can be defined at various levels. These structures can also be predicted from their amino-acid sequences. Protein structure prediction is one of the most widespread fields of research in bioinformatics.

Bioinformatics & Protein Structural Analysis

In this Learning Object, the learner will be able to:

1. Describe protein structural databases, and
2. Recall uses of structural databases.

Learning Objective

Page 2

The protein structural databases provide a basic search box that takes an identifier for the protein. This identifier can be the protein name, a keyword, an ID, an author name, etc. In this example, we take the case of Viral Capsid Proteins. The databases also offer advanced search features, which are optional but help make the query more specific. The general options fall into four broad classes: Structural Features, Biology, Sequence Data and Experimental Details.
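The combination of a basic keyword and optional advanced filters can be sketched as a small in-memory filter. This is only an illustration of the idea, not the search interface of any real database: the `StructureEntry` and `Query` classes, their field names and the three example entries are all made up for this sketch, with one field chosen to stand for each of the four option classes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StructureEntry:
    """Hypothetical, simplified entry; not the schema of a real database."""
    entry_id: str             # made-up identifier
    title: str
    secondary_structure: str  # stands for "Structural Features"
    source_organism: str      # stands for "Biology"
    chain_length: int         # stands for "Sequence Data"
    method: str               # stands for "Experimental Details"


@dataclass
class Query:
    """A required keyword plus optional advanced filters."""
    keyword: str
    secondary_structure: Optional[str] = None
    source_organism: Optional[str] = None
    min_chain_length: Optional[int] = None
    method: Optional[str] = None

    def matches(self, entry: StructureEntry) -> bool:
        # The basic search: keyword must appear in the title.
        if self.keyword.lower() not in entry.title.lower():
            return False
        # Each advanced option only narrows the result when it is set.
        if self.secondary_structure is not None and entry.secondary_structure != self.secondary_structure:
            return False
        if self.source_organism is not None and entry.source_organism != self.source_organism:
            return False
        if self.min_chain_length is not None and entry.chain_length < self.min_chain_length:
            return False
        if self.method is not None and entry.method != self.method:
            return False
        return True


def search(entries: List[StructureEntry], query: Query) -> List[str]:
    """Return the identifiers of all entries that satisfy the query."""
    return [e.entry_id for e in entries if query.matches(e)]


# Made-up entries, for illustration only.
ENTRIES = [
    StructureEntry("EX01", "Example viral capsid protein, C-terminal domain", "helical", "Virus A", 70, "X-ray"),
    StructureEntry("EX02", "Example viral capsid protein, full length", "helical", "Virus B", 230, "NMR"),
    StructureEntry("EX03", "Example kinase domain", "mixed", "Organism C", 300, "X-ray"),
]

print(search(ENTRIES, Query("capsid")))
print(search(ENTRIES, Query("capsid", method="X-ray")))
```

Leaving every advanced field unset reproduces the basic search, which mirrors how the advanced options are described as optional refinements.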

Protein Structural Databases

Page 3

The search for the query protein returned 67 structures in the database that match the criteria given in the search options. The first page of the results shows the titles of all the hits. The user then selects the protein structure of interest to study in detail. Here we select the structure titled “HIV CAPSID C-TERMINAL DOMAIN (CAC146)” for further study.

Protein Structural Databases

Page 4

The summary page shows general information about the basic features of the protein. This includes:

1. Protein identifier
2. Molecule name, structure weight, polymer type, number of chains, length of the molecule and its classification
3. Source organism and expression organism
4. Journal, paper and author name

Protein Structural Databases

Page 5

The sequence data tab contains the information related to the amino acid sequence of the protein under consideration:

1. FASTA sequence for all chains in the polypeptide
2. Type of chain, such as polypeptide, glyco-peptide, lipo-peptide, etc.
3. Diagrammatic representation of the classification and secondary structure of the chain, assigning residues to helix, sheet or turn
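A FASTA file stores each chain as a header line beginning with `>` followed by one or more lines of sequence. The sketch below shows one simple way to read such text into a dictionary. The function name `parse_fasta` and the two example chains are made up for illustration and are not taken from any database entry.

```python
def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its full sequence."""
    chunks = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            chunks[header] = []
        elif header is not None:
            # Sequence lines may be wrapped; collect them for joining later.
            chunks[header].append(line)
    return {h: "".join(parts) for h, parts in chunks.items()}


# Hypothetical two-chain example (sequences are invented).
EXAMPLE_FASTA = """>chain_A hypothetical example
MKTAYIAKQR
QISFVKSHFS
>chain_B hypothetical example
GSHMLEDP
"""

for name, seq in parse_fasta(EXAMPLE_FASTA).items():
    print(name, len(seq), seq)
```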

Protein Structural Databases

Page 6

The sequence similarity tab shows information for comparing the protein's sequence with other sequences:

1. Option to perform a BLAST search.
2. A list of clusters of proteins. These clusters are formed and ranked based on the resolution of the structures within them: the better the quality (resolution) of a cluster, the higher it is ranked.

When the user clicks on a particular cluster, the component proteins within the cluster are displayed along with supporting information.
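Ranking by resolution can be sketched as a simple sort. Resolution is reported in ångströms, and a smaller value means a better-resolved structure. The sketch assumes that a cluster is scored by the best (lowest) resolution among its members; the text does not state the exact rule, so this is an assumption, and the cluster names and values are invented.

```python
def rank_clusters(clusters):
    """Order cluster names so the cluster with the best (lowest) resolution comes first.

    `clusters` maps a cluster name to the resolutions (in angstroms) of its
    member structures. Scoring a cluster by its best member is an assumption
    made for this sketch.
    """
    return sorted(clusters, key=lambda name: min(clusters[name]))


# Hypothetical clusters with invented resolutions.
CLUSTERS = {
    "cluster_1": [2.6, 3.0],
    "cluster_2": [1.7, 2.2],
    "cluster_3": [2.0],
}

print(rank_clusters(CLUSTERS))
```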

Protein Structural Databases

Page 7

The structural similarity tab shows information for comparative studies of two structures. It establishes equivalences based on the 3D conformations of both proteins. The default visualization tool for PDB is Jmol. Structural alignment is covered in more detail in the second part of this animation.

Protein Structural Databases

Page 8

This tab provides details of the methodology used in the experiments that determined the structure. This includes:

1. Crystallization method, pH, temperature and other details of the experiment
2. Crystal data (space group, unit cell dimensions)
3. Diffraction source, diffraction protocol and diffraction detectors
4. Resolution and refinement details
5. Software, programs and computing resources utilized

Protein Structural Databases

Page 9

The geometry tab contains all the spatial information about the molecule, so that it can be simulated in a virtual environment. This includes:

1. Bond lengths: number of occurrences and their positions in the chains
2. Bond angles: number of occurrences and their positions in the chains
3. Dihedral angles: number of occurrences and their positions in the chains
4. Ramachandran plot, Fold Deviation Scores and other structural details
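Bond lengths, bond angles and dihedral angles can all be computed directly from atomic coordinates. The sketch below uses the standard vector formulas and only the Python standard library. The function names and the test coordinates are chosen for illustration; real entries would supply coordinates in ångströms.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def bond_length(p1, p2):
    """Distance between two atoms."""
    return _norm(_sub(p2, p1))


def bond_angle(p1, p2, p3):
    """Angle in degrees at atom p2, formed by the bonds p2-p1 and p2-p3."""
    v1, v2 = _sub(p1, p2), _sub(p3, p2)
    cos_angle = _dot(v1, v2) / (_norm(v1) * _norm(v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def dihedral_angle(p1, p2, p3, p4):
    """Signed dihedral angle in degrees about the p2-p3 bond."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    x = _dot(n1, n2)
    y = _dot(_cross(n1, n2), b2) / _norm(b2)
    return math.degrees(math.atan2(y, x))


print(bond_length((0, 0, 0), (3, 4, 0)))
print(bond_angle((1, 0, 0), (0, 0, 0), (0, 0, 1)))
print(abs(dihedral_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))))
```

The backbone dihedral angles phi and psi plotted in a Ramachandran plot are instances of the same four-atom calculation applied to consecutive backbone atoms.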

Protein Structural Databases

http://www.pdb.org/pdb/explore/geometryDisplay.do?structureId=1AUM

Page 10

The biology tab contains information about the significance of the molecule at the biological and cellular level. This includes:

1. Molecule type
2. Formula weight
3. Monomers and linkages
4. Source method
5. Ligands and prosthetic groups
6. Gene details and genome information
7. Keywords

Protein Structural Databases

Page 11

The derived data tab provides data for the same protein from other resources, such as SCOP, CATH and PFAM classification details. For more detailed analysis visit http://www.pdb.org/pdb/explore/derivedData.do?structureId=1AUM

Protein Structural Databases

Page 12

Two given proteins can be structurally aligned to evaluate the similarity between them. The server takes as input two protein sequences or their IDs; the structures are then simulated and aligned based on their 3D coordinates, bond angles and dihedral angles. Some of the servers available for this are DALI, MAMMOTH, CE/CE-MC, SSAP and ProFit.

Uses of structural databases

Page 13

The results are:

1. P-value: the probability of obtaining a similarity at least this strong by chance. A P-value < 0.05 indicates significant similarity.
2. Raw score: used to compare this match with other similarity matches involving the same proteins.
3. RMSD: root-mean-square deviation, a measure of the average distance between the atoms of the superimposed proteins.
4. Percentage sequence identity in the alignment.
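Two of these quantities are easy to compute by hand once an alignment is available. The sketch below assumes that the two structures are already superimposed and that their atoms are paired in order; it does not perform the superposition itself. For percentage identity it counts only columns where neither sequence has a gap, which is one of several conventions in use and is an assumption here. Function names and inputs are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired, already superimposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for a, b in zip(coords_a, coords_b):
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(total / len(coords_a))


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over columns without a gap (assumed convention)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gap columns are not counted
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Invented inputs: every atom is displaced by 1 unit along z.
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))
print(percent_identity("ACDGF", "ACE-F"))
```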

Uses of structural databases

Page 14

Once the amino acid sequence of the protein is known, its secondary and tertiary structures can be predicted using many prediction algorithms, which utilize information from previously structurally characterized sequences. In the secondary structure prediction:

1. “h” represents alpha helix,
2. “e” represents beta sheets (extended strands),
3. “c” represents coils.

Since not all known proteins have been structurally characterized, this provides a useful bioinformatics analysis tool for researchers. The various servers for structure prediction include GOR, HNN, PredictProtein, NNPredict and SSpro.
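A prediction of this kind is typically returned as one letter per residue. The sketch below does not predict anything; it only summarizes an existing h/e/c string into contiguous segments and overall composition. The function names and the example string are made up for illustration and are not output from any of the servers named above.

```python
from itertools import groupby

STATE_NAMES = {"h": "alpha helix", "e": "beta sheet", "c": "coil"}


def segments(prediction):
    """Collapse a per-residue string into (state, start, end) runs, 1-based inclusive."""
    runs = []
    position = 1
    for state, group in groupby(prediction):
        if state not in STATE_NAMES:
            raise ValueError(f"unexpected state {state!r}")
        length = len(list(group))
        runs.append((state, position, position + length - 1))
        position += length
    return runs


def composition(prediction):
    """Percentage of residues assigned to each state."""
    return {s: 100.0 * prediction.count(s) / len(prediction) for s in STATE_NAMES}


# Invented prediction string for an 8-residue peptide.
EXAMPLE = "cchhhhee"
for state, start, end in segments(EXAMPLE):
    print(f"{STATE_NAMES[state]}: residues {start}-{end}")
print(composition(EXAMPLE))
```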

Uses of structural databases

Page 15

Given a particular amino acid sequence, the cellular, molecular and biological processes associated with the sequence can be predicted using functional annotation servers. These processes are represented by a unique set of identifiers called “Gene Ontology Terms” or “GO Terms”. Each GO term has a name (a word or phrase) and a unique alphanumeric identifier, along with a definition with cited sources and a namespace indicating the domain to which it belongs. The servers for this include DbAli Annolite, PFP, ProteomeAnalyst, GOPET, SpearMint and ProKnow.
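The parts of a GO term described above map naturally onto a small record type. The sketch below is an illustrative data structure, not the format used by any of the annotation servers listed. The class name `GOTerm` and its field names are assumptions; the three example terms are the root terms of the three Gene Ontology namespaces, and their definitions are left blank rather than quoted.

```python
import re
from dataclasses import dataclass

# GO identifiers have the form "GO:" followed by seven digits.
GO_ID_PATTERN = re.compile(r"^GO:\d{7}$")
NAMESPACES = {"biological_process", "molecular_function", "cellular_component"}


@dataclass(frozen=True)
class GOTerm:
    go_id: str
    name: str
    namespace: str
    definition: str = ""  # definition text with cited sources, omitted here

    def __post_init__(self):
        if not GO_ID_PATTERN.match(self.go_id):
            raise ValueError(f"malformed GO identifier: {self.go_id!r}")
        if self.namespace not in NAMESPACES:
            raise ValueError(f"unknown namespace: {self.namespace!r}")


def group_by_namespace(terms):
    """Group GO term identifiers by the namespace they belong to."""
    grouped = {}
    for term in terms:
        grouped.setdefault(term.namespace, []).append(term.go_id)
    return grouped


ROOT_TERMS = [
    GOTerm("GO:0008150", "biological_process", "biological_process"),
    GOTerm("GO:0003674", "molecular_function", "molecular_function"),
    GOTerm("GO:0005575", "cellular_component", "cellular_component"),
]

print(group_by_namespace(ROOT_TERMS))
```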

Uses of structural databases

Page 16

1. Geometry of Protein Structure: Geometry of a protein structure refers to the three dimensional coordinates of its atoms and the angles between their bonds. These are essential to simulate the protein structure on computers.

2. Biology of Protein Structure: Information regarding the biological source of the protein and its metabolic roles within the cell and organism is referred to as the biology of protein structure.

3. SCOP classification: SCOP stands for “Structural Classification of Proteins” and aims to provide a detailed description of the structural and evolutionary relationships between all proteins that have been structurally characterized. SCOP classification can be done at four levels: Class, Fold, Superfamily and Family.

4. CATH classification: CATH stands for “Class, Architecture, Topology and Homologous Superfamily” and provides a semi-automatic, hierarchical classification of protein domains. The levels of CATH classification are Class, Architecture, Topology and Homologous Superfamily.
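The four CATH levels are commonly written as a dotted code of four numbers, one per level, so each level's code is a prefix of the next. The sketch below splits such a code into its levels. The function name is made up, the class names in `CATH_CLASSES` are given as commonly described and should be checked against the CATH database, and the example code is used only to show the format.

```python
CATH_LEVELS = ("Class", "Architecture", "Topology", "Homologous Superfamily")

# Top-level class names as commonly described (assumption; verify against CATH).
CATH_CLASSES = {
    "1": "Mainly Alpha",
    "2": "Mainly Beta",
    "3": "Alpha Beta",
    "4": "Few Secondary Structures",
}


def parse_cath_code(code):
    """Split a dotted code such as '3.40.50.300' into one code per hierarchy level."""
    parts = code.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated numbers, got {code!r}")
    # Each level's code is the prefix of the full code up to that level.
    return {level: ".".join(parts[: i + 1]) for i, level in enumerate(CATH_LEVELS)}


levels = parse_cath_code("3.40.50.300")
for level, value in levels.items():
    print(f"{level}: {value}")
print("Class name:", CATH_CLASSES.get(levels["Class"], "unknown"))
```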

Protein structural databases

Page 17

1. Protein Structural Alignment: The geometry of two given protein structures can be compared by means of available software tools that analyse their three dimensional similarity to each other.

2. Protein Structure Prediction: The prospective secondary structures of peptides or proteins can be predicted from a given stretch of amino acid residues by using machine learning algorithms.

3. Machine Learning Algorithms: These are computer algorithms that can be trained on a given classified dataset. During training, these programs adjust their parameters in such a way that they can classify new data. The most widely used machine learning algorithms in bioinformatics are Artificial Neural Networks, Hidden Markov Models, Support Vector Machines, etc.

4. Functional Annotation: For novel proteins that are yet to be characterized, the potential functions can be predicted by techniques such as Homology Modelling which provide an initial insight into the protein’s properties.

Uses of structural databases
