Page 1: How to use computational tools to maximize the coverage of protein sequence/structure/function space Murray Lab: Nebojsa Mirkovic, Tonya Silkov, Hunjoong.

How to use computational tools to maximize the coverage of protein sequence/structure/function space

Murray Lab: Nebojsa Mirkovic, Tonya Silkov, Hunjoong Lee, Frank Indiviglio, Janey Li

Honig Lab: Markus Fischer and Donald Petrey

PSI Bottlenecks

1) Not enough connection between modeling and biology/experiment

2) “Modelability” not used in defining families or a dynamic target selection strategy

3) Incomplete use of functional information in model building

Page 2

[Figure legend: symbol denotes a phosphoinositide headgroup]

Phosphoinositide signaling processes

Page 3

Intracellular membranes contain distinct lipid compositions and carry different charge densities

Binding behavior of a +8e peptide to membranes carrying different negative charge densities

Biophysical properties of cellular protein/membrane interactions
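To make the link between lipid composition and charge density concrete, the sketch below estimates the surface potential of a uniformly charged membrane with the Gouy-Chapman (Grahame) equation. This is a textbook simplification chosen for illustration, not the method used in the talk. The area per lipid (0.7 nm^2), the 100 mM monovalent salt concentration, the dielectric constant, and the function names are all assumptions.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
R_GAS = 8.314462618          # gas constant, J/(mol*K)
FARADAY = 96485.33212        # Faraday constant, C/mol


def surface_charge_density(acidic_fraction, area_per_lipid_nm2=0.7):
    """Charge density (C/m^2) of a leaflet with the given mole fraction
    of monovalent acidic lipid. The area per lipid is an assumed value."""
    return -E_CHARGE * acidic_fraction / (area_per_lipid_nm2 * 1e-18)


def gouy_chapman_potential(sigma, salt_molar=0.1, temperature_k=298.15, eps_r=78.5):
    """Surface potential (V) of a uniformly charged plane in a 1:1 electrolyte,
    obtained by inverting the Grahame equation."""
    conc = salt_molar * 1000.0  # mol/L -> mol/m^3
    scale = math.sqrt(8.0 * eps_r * EPS0 * R_GAS * temperature_k * conc)
    return (2.0 * R_GAS * temperature_k / FARADAY) * math.asinh(sigma / scale)


for fraction in (0.0, 0.1, 0.2, 0.33):
    psi0 = gouy_chapman_potential(surface_charge_density(fraction))
    print(f"{fraction:.2f} acidic lipid -> surface potential {psi0 * 1000:.0f} mV")
```

Under these assumptions the potential becomes steadily more negative as the acidic lipid fraction rises, which is the qualitative trend a basic peptide would respond to. Atomic-detail treatments of a real protein surface need a full Poisson-Boltzmann calculation rather than this planar estimate.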

Page 4

Motif 1      Motif 2      Protein
C1/DAG       C2/Ca2+      Protein kinase C isoforms
PH/PIP2      C2/Ca2+      Phospholipase C
PH/PIP2      PX/PI3P      Phospholipase D
FYVE/PI3P    PH/PI        FGD1 (a Rho/Rac GEF)
Basic/PS     PH/PIP2      GPCR kinase
C2/Ca2+      Nonpolar     Cytosolic phospholipase A2
ENTH/PIP2    Prot/prot    Epsin1, AP180
Myristate    Basic/PS     Src, MARCKS, (HIV-1 Gag)

Proteins that function in phosphoinositide pathways contain multiple membrane binding motifs

Multiple inputs: Temporal and spatial control of subcellular targeting through coincidence counting

Page 5

[Figure legend: +25 mV, -25 mV]

Many peripheral proteins, especially those involved in subcellular targeting, are either highly basic or charge polarized.

Page 6

Quantitative physical theory for the interaction of proteins with membrane surfaces

Page 7

Connection among biophysical properties, membrane binding behavior, and subcellular localization

[Figure panels: No calcium; Calcium. Phospholipase C C2 domains (homology models of all isoforms); 5-lipoxygenase C2 domain (homology model)]

Page 8

Structural genomics and proteomics-level studies of lipid-interacting domains: Northeast Structural Genomics and Arabidopsis 2010

Apply what we have learned to whole families

BAR domains, C1 domains, C2 domains, ENTH domains, FERM domains, FYVE domains, GRAM domains, PDZ domains, PH domains, PHD domains, PX domains, Sec14 domains, START domains, VHS domains

High-throughput comparative modeling: Leverage structure information

Page 9

All lipid-binding domains in all model genomes

Use what we have learned computationally and experimentally to:

1. Develop more complete lists of peripheral proteins of known structure from the PDB;

2. Detect and model all instances of peripheral proteins in sequence databases;

3. Discover new instances, novel functionalities, new families;

4. Create databases to house this information;

5. Use this information to annotate protein sequences of unknown function.

Page 10

SkyLine: High-throughput comparative modeling

Nebojsa Mirkovic, Proteins 66:766

Pipeline stages (from the diagram): PDB structure; sequence; homologues; non-redundant and unsolved sequences; models; model quality

Derived data: secondary structure; multiple alignments; modeling alignments; homologous structures; data on homologues (species, IDs, coverage, length, e-value, seq. id.)

Tools: DSSP; PSI-BLAST; Modeller or Nest; PROSA, pG score; ClustalW

Model quality cutoff: pG > 0.7

Outputs: Leverage: unique models; MarkUs: function annotation; family analysis; specialized databases; web-accessible models database; target reprioritization

“Modelability”: Create “reliable” models using known structures as templates
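The "pG > 0.7" cutoff on this slide suggests a simple filter for deciding which sequences count as modelable. The sketch below is one minimal way to express that rule, not the SkyLine implementation. The `CandidateModel` record, its field names, and the example identifiers are hypothetical.

```python
from dataclasses import dataclass

PG_CUTOFF = 0.7  # cutoff quoted on the slide ("pG > 0.7")


@dataclass(frozen=True)
class CandidateModel:
    target_id: str    # hypothetical identifier of the modeled sequence
    template_id: str  # hypothetical identifier of the template structure
    pg_score: float   # model reliability score between 0 and 1


def reliable_models(models, cutoff=PG_CUTOFF):
    """Keep only models whose pG score strictly exceeds the cutoff."""
    return [m for m in models if m.pg_score > cutoff]


def modelable_targets(models, cutoff=PG_CUTOFF):
    """A target counts as modelable if at least one of its models passes."""
    return {m.target_id for m in reliable_models(models, cutoff)}


candidates = [
    CandidateModel("seqA", "tmpl1", 0.95),
    CandidateModel("seqA", "tmpl2", 0.40),
    CandidateModel("seqB", "tmpl1", 0.70),  # exactly 0.7 fails a strict ">"
    CandidateModel("seqC", "tmpl3", 0.82),
]
print(sorted(modelable_targets(candidates)))
```

Treating the cutoff as a strict inequality follows the slide's wording; whether the original pipeline includes the boundary value is not stated.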

Page 11

NESG Models Database

Frank Indiviglio

Page 12

Models Database: http://156.145.102.40/nesg3/nesg.php

“Leverage”: Number and quality of 3D models produced from a set of structures as templates

PSI1 and PSI2: NESG leverage ~220 sequence unique models
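As a rough illustration of the leverage definition above, the following sketch counts sequence-unique targets that have at least one acceptable model, both per template and across a whole template set. It is an assumed bookkeeping scheme, not the NESG code. The tuple layout, the reuse of a 0.7 pG cutoff as the quality criterion, and the example sequences and template names are hypothetical.

```python
from collections import defaultdict


def leverage_by_template(models, pg_cutoff=0.7):
    """Map each template to the number of distinct target sequences
    it models with a score above the cutoff."""
    targets = defaultdict(set)
    for target_seq, template_id, pg in models:
        if pg > pg_cutoff:
            targets[template_id].add(target_seq)
    return {template: len(seqs) for template, seqs in targets.items()}


def total_leverage(models, pg_cutoff=0.7):
    """Count each distinct target sequence once across the template set."""
    return len({seq for seq, _, pg in models if pg > pg_cutoff})


models = [
    ("MKTAYIAK", "templ_A", 0.91),
    ("MKTAYIAK", "templ_B", 0.85),  # same sequence modeled on a second template
    ("MSLLTEVE", "templ_A", 0.76),
    ("MGSSHHHH", "templ_B", 0.55),  # below the cutoff, ignored
]
print(leverage_by_template(models))
print(total_leverage(models))
```

Counting by sequence rather than by model keeps alternative models of the same target from inflating the total.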

Hunjoong Lee

Page 13

Alternative models based on different PDB templates, reliability measures and sequence coverage

Page 14

Additional search mechanisms: Expand methodology to the entire PDB, create specialized family and genome databases

Page 15

C2 domains from phospholipase C isoforms: Comparative functionality

Kd = 2.3 x 10^-9 M; Kd = 2.6 x 10^-9 M

Page 16

C2 domains from phospholipase C isoforms: Comparative functionality

Kd: 8.9 x 10^-8 M → 6.2 x 10^-9 M; 4.0 x 10^-8 M

Page 17

Differences between d1 and d4: Detection of specificity determinants leads to hypotheses for differential regulation

Kd = 2.3 x 10^-9 M; Kd = 8.9 x 10^-8 M → 6.2 x 10^-9 M
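To put the dissociation constants on these slides on a common energy scale, the sketch below converts a Kd to a standard binding free energy with the usual relation dG = RT ln(Kd). The temperature (298.15 K) and the 1 M standard state are assumptions, since the slides do not state the measurement conditions, and the function names are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant,
    relative to a 1 M standard state."""
    return R_KCAL * temperature_k * math.log(kd_molar)


def fold_change(kd_weaker, kd_tighter):
    """Ratio of two dissociation constants."""
    return kd_weaker / kd_tighter


for kd in (2.3e-9, 6.2e-9, 8.9e-8):
    print(f"Kd = {kd:.1e} M -> dG = {binding_free_energy(kd):.2f} kcal/mol")

ratio = fold_change(8.9e-8, 6.2e-9)
ddg = binding_free_energy(8.9e-8) - binding_free_energy(6.2e-9)
print(f"fold change = {ratio:.1f}, ddG = {ddg:.2f} kcal/mol")
```

Under these assumptions a Kd of 2.3 x 10^-9 M corresponds to roughly -11.8 kcal/mol, and the roughly 14-fold difference between 8.9 x 10^-8 M and 6.2 x 10^-9 M corresponds to about 1.6 kcal/mol.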

Page 18

FYVE domain family: Electrostatic properties of models correlate with in vitro binding measurements and subcellular localization:

Comparison of different members

Whole family modeling: FYVE domains

Page 19

FYVE domain family: Electrostatic properties of models correlate with in vitro binding measurements and subcellular localization:

Residue substitution of a single family member

Page 20

[Diagram labels: Structure, Model/Computation, Experiment]

There is no straightforward prescription: Each family has to be dealt with individually

“Modelability”: Create “reliable” models using known structures as templates

Dynamic target re-prioritization is an important strategy

Page 21

[Figure: START domain leverage]

Modelability (7378) versus 30% sequence identity (2767)
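The two counts on this slide can be compared directly. The short calculation below assumes both numbers count sequences assigned to the START family under each criterion, which the slide implies but does not spell out; the variable names are illustrative.

```python
modelability_count = 7378  # sequences found with the modelability criterion (from the slide)
identity_30_count = 2767   # sequences found with a 30% sequence identity cutoff (from the slide)

ratio = modelability_count / identity_30_count
extra = modelability_count - identity_30_count

print(f"Modelability reaches {ratio:.2f}x as many sequences ({extra} more)")
```

Under that reading, the modelability criterion covers about 2.67 times as many sequences as the 30% identity cutoff, or 4611 additional sequences.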

Page 22

Characterize different START domains based on structural information

Discriminate whether START domains bind cholesterol or PC (PI) or other ligands

Provide leads for chemical library studies for function-interfering compounds

Detailed computational analysis and function annotation

Fine-grain structure analysis in the absence and presence of potential ligand

Experimental characterization: Protein production, SPR analysis, cellular studies

Collaborations with Experimental Groups

Cho Lab: High-throughput analysis of Human and Arabidopsis START domains

Clark Lab: Docking studies of ubiquinone into nematode START domain, electron transport

Page 23

START domains in the Arabidopsis thaliana genome

SkyLine produces quality models for 58 non-redundant sequences versus 35 Arabidopsis START domains detected by sequence searches (Genome Biology 5:R41)

Key Findings (Tonya Silkov)

1. 45 sequences are of the Birch antigen class

2. Two sequences correspond to AHA1 domains (Activator of Hsp90 ATPase); SCOP classifies AHA domains as belonging to the Birch antigen superfamily

3. Two sequences predicted in databases as integral membrane proteins of unknown function

4. Five sequences for related models apparently represent a group of uncharacterized plant START domains

Page 24

Cross-genomic studies: Structure similarity among lipid-binding domains

Tonya Silkov

[Figure panels: ENTH domain, ANTH domain, VHS domain; PIP2 sites labeled]

Page 25

ENTH and ANTH: similar topology, different membrane binding mechanism

J Biol Chem. 278:28993 (with Cho Lab)

[Figure labels: ENTH, ANTH, Helix 0]

Page 26

Arabidopsis domain with novel dual ENTH and ANTH functionality

Tonya Silkov

[Figure labels: ENTH, ANTH, Helix 0; view from above]

Cho Lab: First 25 amino acids are required for both PIP2 binding and membrane penetration. Produce enough protein to obtain crystals.

Page 27

A novel functional subclass of VHS domains

Tonya Silkov

[Figure panels: ENTH domain, ANTH domain, VHS domain]

Page 28

KIAA1530 (Homo sapiens)

XP_747424 (Strongylocentrotus purpuratus)

CAB71110 (Arabidopsis thaliana)

XP_420852 (Gallus gallus)

Tonya Silkov

A new VHS-related family, “VR domains”, found in other genomes

Page 29

Among this subset of VHS domains, the basic surface patch is conserved

Hypothesis: It constitutes a phosphoinositide-specific binding site

VR domain family of membrane-binding VHS domains

Tonya Silkov

Human and Arabidopsis constructs are being examined in the Cho lab

Page 30

The ability to construct a quality model of a sequence is a more strategic definition of a protein family member

Allows for the discovery of distantly related members

With function annotation, allows for the discovery of new sub-groups

Structures + Sequences -> Models + Function annotation (Markus)

More comprehensive coverage of protein sequence/structure/function space

By constantly updating resources as new information becomes available, we produce a more relevant (dynamic) target selection strategy
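One way to picture a dynamic target selection strategy is a greedy loop that is rerun whenever the set of modelable sequences changes. The sketch below is a hypothetical illustration, not the procedure used by the authors. The `coverage` mapping (which sequences each candidate target would make modelable if solved), the target names, and the greedy rule are all assumptions.

```python
def prioritize_targets(coverage, already_modelable=frozenset()):
    """Greedy ordering of candidate targets.

    coverage maps a candidate target to the set of sequences that would
    become modelable if its structure were solved. At each step, pick the
    target that adds the most sequences not yet modelable.
    """
    covered = set(already_modelable)
    remaining = dict(coverage)
    order = []
    while remaining:
        # sorted() makes ties resolve alphabetically, so results are repeatable
        best = max(sorted(remaining), key=lambda t: len(remaining[t] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # no remaining target adds anything new
        order.append((best, len(gain)))
        covered |= gain
    return order


coverage = {
    "T1": {"a", "b", "c"},
    "T2": {"b", "c"},
    "T3": {"d"},
    "T4": {"a"},
}
print(prioritize_targets(coverage, already_modelable={"a"}))

# After a new structure makes "b" and "c" modelable, rerun to re-prioritize.
print(prioritize_targets(coverage, already_modelable={"a", "b", "c"}))
```

Rerunning the function with an updated `already_modelable` set is what makes the ranking dynamic: targets whose sequences are now covered by existing models drop out, and the remaining effort shifts to regions that still lack a reliable template.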