How to use computational tools to maximize the coverage of protein sequence/structure/function space
Murray Lab: Nebojsa Mirkovic, Tonya Silkov, Hunjoong Lee, Frank Indiviglio, Janey Li
Honig Lab: Markus Fischer and Donald Petrey
PSI Bottlenecks
1) Not enough connection between modeling and biology/experiment
2) “Modelability” not used in defining families or a dynamic target selection strategy
3) Incomplete use of functional information in model building
Phosphoinositide signaling processes
(Figure legend: the symbol shown denotes a phosphoinositide headgroup.)
Biophysical properties of cellular protein/membrane interactions
Intracellular membranes contain distinct lipid compositions and carry different charge densities.
Binding behavior of a +8e peptide to membranes carrying different negative charge densities
Proteins that function in phosphoinositide pathways contain multiple membrane-binding motifs
Motif 1 | Motif 2 | Protein
C1/DAG | C2/Ca2+ | Protein kinase C isoforms
PH/PIP2 | C2/Ca2+ | Phospholipase C isoform
PH/PIP2 | PX/PI3P | Phospholipase D
FYVE/PI3P | PH/PI | FGD1 (a Rho/Rac GEF)
Basic/PS | PH/PIP2 | GPCR kinase
C2/Ca2+ | Nonpolar | Cytosolic phospholipase A2
ENTH/PIP2 | Prot/prot | Epsin1, AP180
Myristate | Basic/PS | Src, MARCKS, (HIV-1 Gag)
Multiple inputs: Temporal and spatial control of subcellular targeting through coincidence counting
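Coincidence counting can be pictured as an AND gate over membrane signals: a protein with two motifs is recruited only where the signals for both motifs are present. The sketch below is a hypothetical, all-or-none encoding of several rows of the motif table (the dictionary name and the signal labels are assumptions); it ignores affinities and graded binding, so it illustrates the logic only, not the physics.

```python
# Hypothetical, simplified encoding of rows from the motif table: each protein
# is reduced to the membrane signals its two motifs recognize. Rows with a
# non-lipid partner ("Nonpolar", "Prot/prot") or a lipid anchor are omitted.
REQUIRED_SIGNALS: dict[str, frozenset[str]] = {
    "Protein kinase C": frozenset({"DAG", "Ca2+"}),
    "Phospholipase C": frozenset({"PIP2", "Ca2+"}),
    "Phospholipase D": frozenset({"PIP2", "PI3P"}),
    "FGD1": frozenset({"PI3P", "PI"}),
    "GPCR kinase": frozenset({"PS", "PIP2"}),
}


def recruited(membrane_signals: set[str]) -> list[str]:
    """Return proteins whose required signals are ALL present (an AND gate).

    All-or-none assumption: real binding is graded and depends on affinities,
    which this sketch ignores.
    """
    return sorted(
        protein
        for protein, needed in REQUIRED_SIGNALS.items()
        if needed <= membrane_signals  # subset test: every signal present
    )


# Usage with a hypothetical membrane state after a Ca2+ signal:
print(recruited({"PS", "PIP2", "Ca2+"}))  # ['GPCR kinase', 'Phospholipase C']
```

Adding DAG to the same set would also recruit protein kinase C, which is the "multiple inputs" idea in its simplest form.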
(Figure: electrostatic potential contours at +25 mV and -25 mV)
Many peripheral proteins, especially those involved in subcellular targeting, are either highly basic or charge-polarized.
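Whether a sequence is "highly basic" can be screened crudely before any structure is built. A minimal sketch, assuming a simple residue count at neutral pH (Lys/Arg +1, Asp/Glu -1, His and termini ignored); the function name is hypothetical. It cannot detect charge polarization, which requires the 3D model and the electrostatic theory described here.

```python
BASIC = frozenset("KR")   # Lys, Arg: counted as +1
ACIDIC = frozenset("DE")  # Asp, Glu: counted as -1


def approximate_net_charge(sequence: str) -> int:
    """Crude net charge (units of e) near neutral pH from one-letter codes.

    Assumptions: His is treated as neutral, terminal charges and pKa shifts
    are ignored, and non-standard letters contribute nothing.
    """
    residues = sequence.upper()
    positives = sum(1 for aa in residues if aa in BASIC)
    negatives = sum(1 for aa in residues if aa in ACIDIC)
    return positives - negatives


print(approximate_net_charge("KKKKKKKK"))  # 8 (a hypothetical +8e peptide)
print(approximate_net_charge("KRKDEK"))    # 2
```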
Quantitative physical theory for the interaction of proteins with membrane surfaces
Connection among biophysical properties, membrane binding behavior, and subcellular localization
(Figure panels, no calcium and calcium: phospholipase C C2 domains, homology models of all isoforms; 5-lipoxygenase C2 domain, homology model)
Structural genomics and proteomics-level studies of lipid-interacting domains: Northeast Structural Genomics and Arabidopsis 2010
Apply what we have learned to whole families
BAR domains, C1 domains, C2 domains, ENTH domains, FERM domains, FYVE domains, GRAM domains, PDZ domains, PH domains, PHD domains, PX domains, Sec14 domains, START domains, VHS domains
High-throughput comparative modeling: Leverage structure information
All lipid-binding domains in all model genomes
Use what we have learned computationally and experimentally to:
1. Compile more complete lists of peripheral proteins of known structure from the PDB;
2. Detect and model all instances of peripheral proteins in sequence databases;
3. Discover new instances, novel functionalities, and new families;
4. Create databases to house this information;
5. Use this information to annotate protein sequences of unknown function.
SkyLine: High-throughput comparative modeling
“Modelability”: Create “reliable” models using known structures as templates
Pipeline (flowchart):
PDB structure → sequence → homologues (PSI-BLAST) → non-redundant & unsolved sequences → models (Modeller or Nest) → model quality (PROSA, pG score; reliable if pG > 0.7)
Supporting data: secondary structure (DSSP); multiple alignments (ClustalW); modeling alignments; homologous structures; data on homologues (species, IDs, coverage, length, e-value, seq. id.)
Outputs: Leverage (unique models); MarkUs function annotation; family analysis; specialized databases; web-accessible models database; target reprioritization
Nebojsa Mirkovic, Proteins 66:766
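The "non-redundant & unsolved" and "pG > 0.7" steps of the flowchart amount to two filters. The sketch below is hypothetical (the record layout and function names are assumptions, not SkyLine's code) and simplifies redundancy to exact string identity, whereas a real pipeline would cluster by sequence identity.

```python
from dataclasses import dataclass

PG_THRESHOLD = 0.7  # reliability cutoff from the flowchart: pG > 0.7


@dataclass(frozen=True)
class Hit:
    """A homologue returned by the sequence search (hypothetical record layout)."""
    sequence: str
    has_structure: bool  # True if this sequence is already solved in the PDB


def modeling_targets(hits: list[Hit]) -> list[str]:
    """'Non-redundant & unsolved': drop solved sequences and exact duplicates.

    Simplification: only identical strings count as redundant here.
    """
    seen: set[str] = set()
    targets: list[str] = []
    for hit in hits:
        if hit.has_structure or hit.sequence in seen:
            continue
        seen.add(hit.sequence)
        targets.append(hit.sequence)
    return targets


def modelable(pg_by_sequence: dict[str, float]) -> list[str]:
    """Keep targets whose model passes the pG reliability cutoff."""
    return [seq for seq, pg in pg_by_sequence.items() if pg > PG_THRESHOLD]


# Placeholder sequences and pG scores, for illustration only:
hits = [Hit("AAAA", False), Hit("AAAA", False), Hit("CCCC", True), Hit("GGGG", False)]
print(modeling_targets(hits))                   # ['AAAA', 'GGGG']
print(modelable({"AAAA": 0.82, "GGGG": 0.41}))  # ['AAAA']
```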
NESG Models Database (Frank Indiviglio)
Models Database: http://156.145.102.40/nesg3/nesg.php
“Leverage”: Number and quality of 3D models produced from a set of structures as templates
PSI1 and PSI2: NESG leverage of ~220 sequence-unique models
Hunjoong Lee
Alternative models based on different PDB templates, reliability measures and sequence coverage
Additional search mechanisms: Expand methodology to the entire PDB, create specialized family and genome databases
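Leverage, as defined above, counts sequence-unique reliable models obtainable from a set of templates, and alternative models for the same sequence can then be ranked. A minimal sketch under stated assumptions: "quality" is reduced to the pG > 0.7 cutoff, ranking uses pG alone (not coverage), and the template IDs are hypothetical placeholders rather than PDB entries.

```python
PG_THRESHOLD = 0.7

# Each model: (target_sequence, template_id, pG score).
Model = tuple[str, str, float]


def leverage(models: list[Model], templates: set[str]) -> int:
    """Number of sequence-unique reliable models built from any template in the set."""
    return len({
        seq for seq, template, pg in models
        if template in templates and pg > PG_THRESHOLD
    })


def best_alternative(models: list[Model]) -> dict[str, tuple[str, float]]:
    """For each sequence, keep the alternative model with the highest pG score."""
    best: dict[str, tuple[str, float]] = {}
    for seq, template, pg in models:
        if seq not in best or pg > best[seq][1]:
            best[seq] = (template, pg)
    return best


models = [
    ("AAAA", "tmplA", 0.90),
    ("AAAA", "tmplB", 0.75),
    ("CCCC", "tmplA", 0.50),
    ("GGGG", "tmplB", 0.80),
]
print(leverage(models, {"tmplA"}))           # 1
print(leverage(models, {"tmplA", "tmplB"}))  # 2
print(best_alternative(models)["AAAA"])      # ('tmplA', 0.9)
```

Counting unique sequences rather than models keeps a sequence modeled on two templates from being counted twice.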
C2 domains from phospholipase C isoforms: Comparative functionality
(Figure: Kd values of 2.3 × 10⁻⁹ M, 2.6 × 10⁻⁹ M, 8.9 × 10⁻⁸ M → 6.2 × 10⁻⁹ M, and 4.0 × 10⁻⁸ M)
Differences between d1 and d4: Detection of specificity determinants leads to hypotheses for differential regulation
(Figure: Kd values of 2.3 × 10⁻⁹ M and 8.9 × 10⁻⁸ M → 6.2 × 10⁻⁹ M)
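The size of the Kd change marked by the arrow can be expressed as a fold change in affinity. A small helper (name hypothetical) makes the arithmetic explicit; because a lower Kd means tighter binding, the ratio of the old to the new Kd is above 1 for an improvement.

```python
def affinity_fold_change(kd_before: float, kd_after: float) -> float:
    """Fold change in affinity when Kd goes from kd_before to kd_after (same units).

    A lower Kd means tighter binding, so a value > 1 is an improvement.
    """
    if kd_before <= 0 or kd_after <= 0:
        raise ValueError("Kd values must be positive")
    return kd_before / kd_after


# Values read from the slide: 8.9e-8 M -> 6.2e-9 M
print(round(affinity_fold_change(8.9e-8, 6.2e-9), 1))  # 14.4
```

So the change shown corresponds to roughly a 14-fold tighter binding.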
Whole family modeling: FYVE domains
FYVE domain family: Electrostatic properties of models correlate with in vitro binding measurements and subcellular localization (comparison of different members)
FYVE domain family: Electrostatic properties of models correlate with in vitro binding measurements and subcellular localization (residue substitution of a single family member)
(Figure panels: Model/Computation, Experiment, Structure)
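One way to quantify how well a computed electrostatic property tracks in vitro binding across family members is Pearson's correlation coefficient. This is a hypothetical sketch: the choice of Pearson's r and the function name are assumptions, and the inputs would be one computed value and one measurement per modeled member, in the same order.

```python
import math


def pearson_r(computed: list[float], measured: list[float]) -> float:
    """Pearson correlation between computed values and measurements, member by member."""
    n = len(computed)
    if n != len(measured) or n < 2:
        raise ValueError("need at least two paired values")
    mean_c = sum(computed) / n
    mean_m = sum(measured) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel in r)
    s_cm = sum((c - mean_c) * (m - mean_m) for c, m in zip(computed, measured))
    s_cc = sum((c - mean_c) ** 2 for c in computed)
    s_mm = sum((m - mean_m) ** 2 for m in measured)
    if s_cc == 0 or s_mm == 0:
        raise ValueError("correlation is undefined for constant input")
    return s_cm / math.sqrt(s_cc * s_mm)
```

Since binding constants can span orders of magnitude, it may be more informative to correlate against log-transformed Kd values than raw ones.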
There is no straightforward prescription: Each family has to be dealt with individually
“Modelability”: Create “reliable” models using known structures as templates
Dynamic target re-prioritization is an important strategy
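Dynamic re-prioritization can be illustrated with a greedy set-cover heuristic: repeatedly choose the candidate target whose structure would make the most not-yet-modelable sequences modelable, then update coverage. This is a swapped-in, hypothetical technique for illustration, not the NESG procedure; the input mapping (target to the sequences its structure would make modelable) is an assumption.

```python
def prioritize_targets(
    candidates: dict[str, set[str]], already_modelable: set[str]
) -> list[tuple[str, int]]:
    """Greedy ordering of candidate structure targets.

    candidates: target id -> sequences a structure of that target would make
    modelable (hypothetical input). already_modelable: sequences covered by
    existing templates. Returns (target, newly covered count) in pick order.
    """
    covered = set(already_modelable)
    remaining = dict(candidates)
    order: list[tuple[str, int]] = []
    while remaining:
        # Gain = sequences this target would add beyond current coverage
        target, gain = max(
            ((t, len(seqs - covered)) for t, seqs in remaining.items()),
            key=lambda item: item[1],
        )
        if gain == 0:
            break  # no remaining target would make any new sequence modelable
        order.append((target, gain))
        covered |= remaining.pop(target)
    return order


print(prioritize_targets({"T1": {"a", "b", "c"}, "T2": {"c", "d"}, "T3": {"a"}}, {"a"}))
# [('T1', 2), ('T2', 1)]
```

Re-running the function whenever new structures are deposited, with an updated `already_modelable`, is what makes the ordering dynamic.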
START domain leverage
Modelability (7378) versus 30% sequence identity (2767)
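The two membership criteria compared here can be written as two filters over the same candidates; on the slide, modelability admits 7378 versus 2767 under the 30% identity criterion, roughly 2.7 times as many. The record layout, names, and the use of ">=" for the identity cutoff are assumptions in this hypothetical sketch.

```python
from dataclasses import dataclass

IDENTITY_CUTOFF = 30.0  # percent sequence identity criterion from the slide
PG_THRESHOLD = 0.7      # model reliability cutoff (pG > 0.7)


@dataclass(frozen=True)
class Candidate:
    """A candidate family member (hypothetical record layout)."""
    name: str
    percent_identity: float  # identity to the closest template of known structure
    pg_score: float          # reliability of the best model built for it


def members_by_identity(candidates: list[Candidate]) -> set[str]:
    return {c.name for c in candidates if c.percent_identity >= IDENTITY_CUTOFF}


def members_by_modelability(candidates: list[Candidate]) -> set[str]:
    return {c.name for c in candidates if c.pg_score > PG_THRESHOLD}


# Placeholder candidates: "y" is a distant homologue that still models well.
cands = [Candidate("x", 45.0, 0.90), Candidate("y", 18.0, 0.80), Candidate("z", 22.0, 0.30)]
print(sorted(members_by_identity(cands)))      # ['x']
print(sorted(members_by_modelability(cands)))  # ['x', 'y']
```

The distant homologue "y" is admitted only by the modelability criterion, which is the kind of member a sequence-identity cutoff misses.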
Characterize different START domains based on structural information
Determine whether START domains bind cholesterol, PC (PI), or other ligands
Provide leads for chemical library studies for function-interfering compounds
Detailed computational analysis and function annotation
Fine-grained structure analysis in the absence and presence of potential ligands
Experimental characterization: Protein production, SPR analysis, cellular studies
Collaborations with Experimental Groups
Cho Lab: High-throughput analysis of human and Arabidopsis START domains
Clark Lab: Docking studies of ubiquinone into a nematode START domain (electron transport)
START domains in the Arabidopsis thaliana genome
SkyLine produces quality models for 58 non-redundant sequences, versus 35 Arabidopsis START domains detected by sequence searches (Genome Biology 5:R41)
Key Findings (Tonya Silkov)
1. 45 sequences are of the Birch antigen class
2. Two sequences correspond to AHA1 domains (Activator of Hsp90 ATPase); SCOP classifies AHA domains as belonging to the Birch antigen superfamily
3. Two sequences are annotated in databases as predicted integral membrane proteins of unknown function
4. Five sequences with related models apparently represent a group of uncharacterized plant START domains
Cross-genomic studies: Structure similarity among lipid-binding domains (Tonya Silkov)
Fig. 1: ENTH domain, ANTH domain, VHS domain
ENTH and ANTH: similar topology, different membrane binding mechanism
Tonya Silkov, with Cho Lab (J Biol Chem. 278:28993)
(Figure panels: ENTH and ANTH domains with PIP2 and helix 0 labeled; view from above)
Arabidopsis domain with novel dual ENTH and ANTH functionality
(Figure panels: ENTH, ANTH)
Cho Lab: The first 25 amino acids are required for both PIP2 binding and membrane penetration. Produce enough protein to obtain crystals.
Fig. 1: ENTH domain, ANTH domain, VHS domain
A novel functional subclass of VHS domains
Tonya Silkov
KIAA1530 (Homo sapiens)
XP_747424 (Strongylocentrotus purpuratus)
CAB71110 (Arabidopsis thaliana)
XP_420852 (Gallus gallus)
Tonya Silkov
A new VHS-related family, “VR domains”, found in other genomes
Among this subset of VHS domains, the basic surface patch is conserved.
Hypothesis: It constitutes a phosphoinositide-specific binding site.
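At the sequence level, conservation of a basic patch can be checked column by column in a multiple alignment of the VR domains. A hypothetical sketch (function names and the 0.9 cutoff are assumptions): it reports the fraction of sequences carrying Lys or Arg at each aligned position. Whether those columns actually form one surface patch still has to be checked on the 3D models.

```python
def basic_conservation(alignment: list[str]) -> list[float]:
    """Fraction of sequences with Lys or Arg at each column of a gapped alignment.

    alignment: equal-length, upper-case aligned sequences, '-' for gaps.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    return [
        sum(seq[i] in "KR" for seq in alignment) / len(alignment)
        for i in range(length)
    ]


def conserved_basic_columns(alignment: list[str], min_fraction: float = 0.9) -> list[int]:
    """Column indices (0-based) where basic residues reach min_fraction."""
    return [i for i, f in enumerate(basic_conservation(alignment)) if f >= min_fraction]


# Placeholder alignment, for illustration only:
print(conserved_basic_columns(["AKR-", "GKRE", "AKKD"]))  # [1, 2]
```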
VR domain family of membrane-binding VHS domains
Tonya Silkov
Human and Arabidopsis constructs are being examined in the Cho lab
The ability to construct a quality model of a sequence is a more strategic definition of a protein family member
Allows for the discovery of distantly related members
With function annotation, allows for the discovery of new sub-groups
Structures + Sequences -> Models + Function annotation (Markus) -> More comprehensive coverage of protein sequence/structure/function space
By constantly updating resources as new information becomes available, we produce a more relevant (dynamic) target selection strategy