Basic bioinformatics tools for studying proteins
Dong Xu
Computer Science Department
C. S. Bond Life Sciences Center
University of Missouri, Columbia
http://digbio.missouri.edu
Introduction
Broaden knowledge for undergraduate education
Many opportunities for biomedical and agricultural related jobs
Practice basic protein tools:
  Useful for biological studies
  Intellectually stimulating
Dong’s picks for beginners:
  Not necessarily the most accurate tool
  Easy to use and understand
  Very popular
Proteins – Some Basics
What Is a Protein?
Linear Sequence of Amino Acids...
What is an Amino Acid?
20 Amino Acids
Glycine (G)
Glutamic acid (E)
Aspartic acid (D)
Methionine (M)
Threonine (T)
Serine (S)
Glutamine (Q)
Asparagine (N)
Tryptophan (W)
Phenylalanine (F)
Cysteine (C)
Proline (P)
Leucine (L)
Isoleucine (I)
Valine (V)
Alanine (A)
Histidine (H)
Lysine (K)
Tyrosine (Y)
Arginine (R)
White: Hydrophobic, Green: Hydrophilic, Red: Acidic, Blue: Basic
Amino Acids connect via PEPTIDE BOND
Peptide Bond
[Figure: amino acid residues (A, F, N, G, S, T, D, K) joined in a chain by peptide bonds]
An Overview
o A protein folds into a unique 3D structure under physiological conditions
Lysozyme sequence (129 amino acids):
KVFGRCELAA AMKRHGLDNY RGYSLGNWVC AAKFESNFNT QATNRNTDGS
TDYGILQINS RWWCNDGRTP GSRNLCNIPC SALLSSDITA SVNCAKKIVS
DGNGMNAWVA WRNRCKGTDV QAWIRGCRL
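As a small illustration of treating a protein as a linear string of one-letter codes, the sketch below checks the length of the lysozyme sequence above and tallies a few residue classes. The function name `summarize` and the choice of which classes to count are illustrative assumptions, not part of the slides; the acidic (D, E) and basic (K, R, H) groupings follow the standard classification.

```python
from collections import Counter

# Lysozyme sequence copied from the slide, with the spacing removed.
LYSOZYME = (
    "KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGS"
    "TDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVS"
    "DGNGMNAWVAWRNRCKGTDVQAWIRGCRL"
)

ACIDIC = set("DE")  # aspartic acid, glutamic acid
BASIC = set("KRH")  # lysine, arginine, histidine


def summarize(sequence):
    """Return the length and simple residue tallies for a protein sequence."""
    counts = Counter(sequence)
    return {
        "length": len(sequence),
        "acidic": sum(counts[aa] for aa in ACIDIC),
        "basic": sum(counts[aa] for aa in BASIC),
        "counts": counts,
    }


summary = summarize(LYSOZYME)
print("length:", summary["length"])
print("acidic residues:", summary["acidic"])
print("basic residues:", summary["basic"])
print("cysteines:", summary["counts"]["C"])
```

The same `Counter` can be queried for any single residue, for example `summary["counts"]["W"]` for tryptophan.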
Protein backbones: Side chain
Primary, Secondary and Tertiary Structures of Proteins
Protein Structure Representations
Lysozyme structure, shown in three representations: ball & stick, strand, surface
Structure Visualization
Rasmol (http://www.umass.edu/microbio/rasmol/getras.htm)
MDL Chime (plug-in) (http://www.mdl.com/products/framework/chime/)
Protein Explorer (http://molvis.sdsc.edu/protexpl/frntdoor.htm)
Jmol: http://jmol.sourceforge.net/
PyMOL: http://pymol.sourceforge.net/
VMD: http://www.ks.uiuc.edu/Research/vmd/
Sequence Homology Software
NCBI-BLAST: http://www.ncbi.nlm.nih.gov/BLAST/
Comparing 2 (pairwise) or more (multiple) sequences.
Searching for a series of identical or similar characters in the sequences.
VLSPADKTNVKAAWAKVGAHAAGHG
||| |    |   |||| |  ||||
VLSEAEWQLVLHVWAKVEADVAGHG
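The "identical characters" idea can be sketched in a few lines: given two sequences that are already aligned (same length, gaps written as `-`), count the positions that match. This is only an illustration of percent identity, not how BLAST searches or scores; the function names are hypothetical.

```python
def match_line(aligned_a, aligned_b):
    """Build the '|' marker line shown between two aligned sequences."""
    return "".join(
        "|" if a == b and a != "-" else " "
        for a, b in zip(aligned_a, aligned_b)
    )


def percent_identity(aligned_a, aligned_b):
    """Percentage of aligned positions holding the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = match_line(aligned_a, aligned_b).count("|")
    return 100.0 * matches / len(aligned_a)


# The two sequences from the slide.
SEQ_A = "VLSPADKTNVKAAWAKVGAHAAGHG"
SEQ_B = "VLSEAEWQLVLHVWAKVEADVAGHG"

print(SEQ_A)
print(match_line(SEQ_A, SEQ_B))
print(SEQ_B)
print(f"identity: {percent_identity(SEQ_A, SEQ_B):.0f}%")
```

For this pair, 14 of the 25 aligned positions are identical.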
Typical BLAST Output
InterPro Scan: http://www.ebi.ac.uk/InterProScan/
InterPro Scan (PCNA): http://www.ebi.ac.uk/InterProScan/
MyHits Local Motifs Search: http://myhits.isb-sib.ch/
MyHits Local Motifs Summary: http://myhits.isb-sib.ch/
MyHits Local Motif Hits: http://myhits.isb-sib.ch/
Multiple Alignment
VTISCTGSESNIGAG-NHVKWYQQLPG
VTISCTGTESNIGS--ITVNWYQQLPG
LRLSCSSSDFIFSS--YAMYWVRQAPG
LSLTCTVSETSFDD--YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG--
ATLVCLISDFYPGA--VTVAWKADS--
AALGCLVKDYFPEP--VTVSWNSG---
VSLTCLVKEFYPSD--IAVEWWSNG--
Phylogeny Tree
Multiple protein sequence alignment
conserved sites and hence possibly functional sites
phylogenetic tree
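One way to see how a multiple alignment points to conserved sites is to scan it column by column. The sketch below uses the eight aligned sequences from the Multiple Alignment slide and reports the columns where every sequence carries the same residue. The helper name `conserved_columns` is an assumption for illustration; alignment programs typically use graded conservation scores rather than this strict all-identical test.

```python
# Aligned sequences copied from the Multiple Alignment slide ('-' marks a gap).
MSA = [
    "VTISCTGSESNIGAG-NHVKWYQQLPG",
    "VTISCTGTESNIGS--ITVNWYQQLPG",
    "LRLSCSSSDFIFSS--YAMYWVRQAPG",
    "LSLTCTVSETSFDD--YYSTWVRQPPG",
    "PEVTCVVVDVSHEDPQVKFNWYVDG--",
    "ATLVCLISDFYPGA--VTVAWKADS--",
    "AALGCLVKDYFPEP--VTVSWNSG---",
    "VSLTCLVKEFYPSD--IAVEWWSNG--",
]


def conserved_columns(alignment):
    """Return 0-based indices of columns with one residue and no gaps."""
    conserved = []
    for index, column in enumerate(zip(*alignment)):
        residues = set(column)
        if len(residues) == 1 and "-" not in residues:
            conserved.append(index)
    return conserved


for index in conserved_columns(MSA):
    print(f"column {index + 1}: {MSA[0][index]}")
```

In this alignment only two columns pass the strict test: a cysteine and a tryptophan.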
MSA with ClustalW
1exr_A  -EQLTEEQIAEFKEAFALFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGN 59
1N0Y_A  AEQLTEEQIAEFKEAFALFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGN 60
3cln_   ----TEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGN 56
           :************:*******************************************

1exr_A  GTIDFPEFLSLMARKMKEQDSEEELIEAFKVFDRDGNGLISAAELRHVMTNLGEKLTDDE 119
1N0Y_A  GTIDFPEFLSLMARKMKEQDSEEELIEAFKVFDRDGNGLISAAELRHVMTNLGEKLTDDE 120
3cln_   GTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEE 116
        *********::******: *****: ***:***:**** *******************:*

1exr_A  VDEMIREADIDGDGHINYEEFVRMMVS- 146
1N0Y_A  VDEMIREADIDGDGHINYEEFVRMMVSK 148
3cln_   VDEMIREANIDGDGQVNYEEFVQMMTA- 143
        ********:*****::******:**.:
Input: 2 or more sequences for analysis
Parameters: default or custom (different scoring matrices, gap penalties, etc.)
Program: ClustalW
Output: phylogram, cladogram
ClustalW: http://www.ebi.ac.uk/Tools/clustalw2/index.html
Cell localization
Typical Sorting Signals
Signal function             Example
Import into nucleus         -P-P-K-K-K-R-K-V-
Export from nucleus         -L-A-L-K-L-A-G-L-D-I-
Import into mitochondria    <-MLSLRQSIRFFKPATRTLCSSRYLL-
Import into plastid         <-MVAMAMASLQSSMSSLSLSSNSFLGQPLSPITLSPFLQG-
Import into peroxisomes     -S-K-L->
Import into ER              <-MMSFVSLLLVGILFWATEAEQLTKCEVFN-
Return to ER                -K-D-E-L->
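A minimal sketch of how the short signals in the table could be looked for in a sequence. It assumes that `->` in the table marks the C-terminus, so KDEL and SKL are tested only at the end of the sequence, while the two nuclear signals are searched anywhere. Exact string matching is a deliberate simplification: real sorting signals are degenerate, and tools such as PSORT or TargetP do not work this way. The function name and the test sequences are hypothetical.

```python
# Signals taken from the table; the grouping by position is an assumption.
C_TERMINAL_SIGNALS = {
    "KDEL": "return to ER",
    "SKL": "import into peroxisomes",
}
INTERNAL_SIGNALS = {
    "PPKKKRKV": "import into nucleus",
    "LALKLAGLDI": "export from nucleus",
}


def find_sorting_signals(sequence):
    """Return (motif, function) pairs for exact signal matches."""
    hits = []
    for motif, function in C_TERMINAL_SIGNALS.items():
        if sequence.endswith(motif):
            hits.append((motif, function))
    for motif, function in INTERNAL_SIGNALS.items():
        if motif in sequence:
            hits.append((motif, function))
    return hits


# Made-up sequences, used only to exercise the function.
print(find_sorting_signals("MAAAPPKKKRKVGGG"))
print(find_sorting_signals("MKTAYIAGGSKDEL"))
print(find_sorting_signals("MKTAYIAGGS"))
```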
Localizations
Cell localization
PSORT: http://psort.nibb.ac.jp/
TargetP: http://www.cbs.dtu.dk/services/TargetP/
Signal peptide
SignalP: http://www.cbs.dtu.dk/services/SignalP/
SignalP result
Membrane Bilayer with Proteins
Helix Bundle TM Proteins
PDB = 1QHJ PDB = 1RRC
Single helix or helical bundles (> 90% of TM proteins)
Examples: Human growth hormone receptor, Insulin receptor
ATP binding cassette family - CFTR
Multidrug resistance proteins
7TM receptors - G protein-linked receptors
Beta Barrel TM Proteins
Transmembrane Prediction
http://bp.nuap.nagoya-u.ac.jp/sosui/ (alpha)
http://psfs.cbrc.jp/tmbeta-net/ (beta)
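For intuition about alpha-helical transmembrane prediction, a classic simple approach is a sliding-window hydropathy average: membrane-spanning helices tend to be long runs of hydrophobic residues. The sketch below uses the Kyte-Doolittle hydropathy scale with a 19-residue window and a cutoff of 1.6, values commonly quoted for this method. This is not a description of how SOSUI or TMBETA-NET work internally; the function names and the test sequence are assumptions for illustration.

```python
# Kyte-Doolittle hydropathy scale (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Average hydropathy of every window of the given length."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(sequence, window=19, threshold=1.6):
    """Start positions (0-based) of windows at or above the cutoff."""
    profile = hydropathy_profile(sequence, window)
    return [start for start, value in enumerate(profile) if value >= threshold]


# Made-up test sequence: a hydrophobic stretch followed by a charged one.
toy = "L" * 19 + "D" * 19
print(candidate_tm_windows(toy))
```

Overlapping windows above the cutoff would normally be merged into one predicted segment; that step is left out to keep the sketch short.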
Secondary Structure Prediction
SSpro 4.1: http://sysbio.rnet.missouri.edu/multicom_toolbox/
PSI-PRED: http://bioinf.cs.ucl.ac.uk/psipred/psiform.html
SAM: http://compbio.soe.ucsc.edu/SAM_T08/T08-query.html
PHD: http://www.predictprotein.org/
Coiled coil prediction
http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_lupas.html
Special motif prediction
Helix-turn-helix motif prediction: http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_hth.html
Kinase related motifs: http://scansite.mit.edu/motifscan_seq.phtml
Leucine Zippers: http://2zip.molgen.mpg.de/index.html
Protein disorder prediction
PreDisorder: http://sysbio.rnet.missouri.edu/multicom_toolbox/
A collection of disorder predictors: http://www.disprot.org/predictors.php
2D: Contact Map Prediction
[Figure: a 3D structure and its 2D contact map, an n × n matrix with residues 1..n on both axes and one entry per residue pair (i, j)]
Distance Threshold = 8 Å
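The step from a 3D structure to a 2D contact map can be sketched as follows: take one coordinate per residue and mark residue pairs (i, j) whose distance is below the 8 Å threshold. Which atom represents a residue (for example C-alpha) is an assumption here, and the coordinates are made up for illustration; this shows the definition of a contact map, not the prediction step.

```python
import math


def contact_map(coords, threshold=8.0):
    """n x n matrix of booleans: True where two residues are in contact."""
    n = len(coords)
    return [
        [math.dist(coords[i], coords[j]) < threshold for j in range(n)]
        for i in range(n)
    ]


# Hypothetical coordinates in angstroms: four residues on a straight line,
# 3.8 apart.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]

cmap = contact_map(toy_coords)
for row in cmap:
    print("".join("#" if in_contact else "." for in_contact in row))
```

The matrix is symmetric, and the diagonal is always in contact because each residue is at distance 0 from itself.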
Contact Prediction
SVMcon: http://casp.rnet.missouri.edu/svmcon.html
NNcon: http://casp.rnet.missouri.edu/nncon.html
SCRATCH: http://scratch.proteomics.ics.uci.edu/
SAM: http://compbio.soe.ucsc.edu/HMM-apps/HMM-applications.html
Structure Comparison
Visualize structure alignment using VAST:
http://www.ncbi.nlm.nih.gov/Structure/
Two ferredoxins, 1DOI and 1AWD, are aligned structurally, showing an insertion in 1DOI that contains potassium-ion binding sites. This may be the result of adaptations to the high salt environment of the Dead Sea.
Structure Alignment Tools
CE: http://cl.sdsc.edu/
DALI: http://www.ebi.ac.uk/dali/
TM-Align: http://zhang.bioinformatics.ku.edu/TM-align/
Structure-Based Search
Comparing a query protein structure against all the structures in the PDB
The DALI server: http://www2.ebi.ac.uk/dali/
When new structures are solved, researchers often submit them to the DALI server to find structural neighbors and their alignments.
Swiss Model: Comparative Modeling Server
http://swissmodel.expasy.org/
Protein Structure Homology Modeling: Modeller
Analysis software
PROCHECK WHATCHECK Suite Biotech PROSA
Entrez Databases: http://www.ncbi.nlm.nih.gov/Entrez/
Design Program
DEZYMER (Hellinga)
Given a ligand and a protein with known structure, suggest residues to be mutated so that the resulting protein binds the ligand.
ORBIT (Mayo)
Given a backbone structure, design a sequence such that it folds to that backbone.
Rosetta (Baker)
One program to treat diverse problems: prediction and design
DEZYMER
1. Define the expected binding geometry
2. Find backbone positions where adding appropriate side chains satisfies the predefined geometry
3. Place the side chains and ligand, and optimize their positions
4. Repack residues at positions other than the binding residues, changing residue type if necessary
Hellinga and Richards, JMB, 1991. Construction of new ligand binding sites in proteins of known structure
ORBIT
Comparison between the designed backbone (averaged NMR structure, blue) and the target backbone (red)
Solution structure of the designed protein. Stereoview showing the best-fit superposition of the 41
1. Divide the target structure into three parts: core, surface and boundary
2. Residues allowed in each part:
   Core: Ala, Val, Leu, Ile, Phe, Tyr, Trp
   Surface: Ala, Ser, Thr, His, Asp, Asn, Glu, Gln, Lys, and Arg
   Boundary: union of the above two
3. 1.9 × 10^27 possible sequences
4. Select the best sequence efficiently, using dead-end elimination (DEE)
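The size of the search space follows from multiplying the number of allowed residues at each position. The sketch below builds the three residue sets from the list above and counts sequences for a given split of positions. The split used here (1 core, 7 boundary, 18 surface positions) is an assumption not stated in the slides; it is chosen because it reproduces the quoted figure of about 1.9 × 10^27. The function name is hypothetical.

```python
CORE = {"Ala", "Val", "Leu", "Ile", "Phe", "Tyr", "Trp"}
SURFACE = {"Ala", "Ser", "Thr", "His", "Asp", "Asn", "Glu", "Gln", "Lys", "Arg"}
BOUNDARY = CORE | SURFACE  # union of the two sets; Ala is shared


def sequence_space(n_core, n_boundary, n_surface):
    """Number of sequences when each position picks freely from its set."""
    return (
        len(CORE) ** n_core
        * len(BOUNDARY) ** n_boundary
        * len(SURFACE) ** n_surface
    )


# Assumed position counts (not given in the slides).
total = sequence_space(n_core=1, n_boundary=7, n_surface=18)

print("choices per core position:", len(CORE))
print("choices per boundary position:", len(BOUNDARY))
print("choices per surface position:", len(SURFACE))
print(f"possible sequences: {total:.1e}")
```

A space this large cannot be enumerated one sequence at a time, which is why a pruning method such as dead-end elimination is needed in step 4.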
Calciomics
Calciomics is a specialized area of biochemistry focusing on the study of calcium-binding biological macromolecules and proteins to understand the factors that contribute to calcium-binding affinity and the selectivity of proteins and calcium-dependent conformational change.
http://lithium.gsu.edu/faculty/Yang/Calciomics.htm
[Flowchart: toolkit with three stages: sequence analysis and processing; structure prediction and evaluation; function inference. Boxes recovered from the diagram:]
Original sequence
SOSUI: Remove transmembrane regions
SignalP: Remove signal region
Coiled coils: Remove disorder regions
ProDom: Set of domain sequences
Modified sequences
SSP: Secondary structure prediction
PSI-BLAST: Iterations: analysis of E-value, set of profile sequences; STOP if homolog found in PDB
PROSPECT
Evaluate & adjust alignments
MODELLER / Jackle
WHATIF / PROCHECK
3D model
Function annotation
SWISS-PROT: annotation
PFAM: Family classification
Motif: Active sites
PSORT: Subcellular location
Enzyme structure DB
Medline: Literature search
Summary
Practice 10 selected tools
Help answer the question: what does this protein do?
Collaborate with experimentalists
Find more tools at
http://us.expasy.org/tools/
http://infosuite.welch.jhmi.edu/BS/pt
Acknowledgments
This file is for educational purposes only. Some materials (including pictures and text) were taken from public-domain sources on the Internet.