Slides - Slide 1

28
Ameet Soni* and Jude Shavlik Dept. of Computer Sciences Dept. of Biostatistics and Medical Informatics Craig Bingman Dept. of Biochemistry Center for Eukaryotic Structural Genomics Presented at the ACM International Conference on Bioinformatics and Computational Biology 2010 Guiding Belief Propagation using Domain Knowledge for Protein-Structure Determination

Transcript of Slides - Slide 1

Page 1: Slides - Slide 1

Ameet Soni* and Jude ShavlikDept. of Computer SciencesDept. of Biostatistics and Medical Informatics

Craig BingmanDept. of BiochemistryCenter for Eukaryotic Structural Genomics

Presented at the ACM International Conference on Bioinformatics and Computational Biology 2010

Guiding Belief Propagationusing Domain Knowledge for

Protein-Structure Determination

Page 2: Slides - Slide 1

2

Protein Structure Determination

Proteins essential to most cellular function Structural support Catalysis/enzymatic activity Cell signaling

Protein structures determine function

X-ray crystallography is main technique for determining structures

2

Page 3: Slides - Slide 1

3

X-ray Crystallography: Background

Electron-DensityMap (3D Image)

Interpret

Protein Crystal

X-ray Beam

Protein Structure

3

Diffraction pattern

FFT

Collect

Page 4: Slides - Slide 1

4

Task Overview

Given: A protein sequence Electron-density map

(EDM) of protein

Do: Automatically produce a

protein structure (or trace) that is All atom Physically feasible

4

SAVRVGLAIM...

Page 5: Slides - Slide 1

5

Challenges & Related Work

1 Å 2 Å 3 Å 4 Å

Our Method: ACMI

5

ARP/wARPTEXTAL & RESOLVE

Page 6: Slides - Slide 1

6 ACMI Overview6

- Background

- Inference in ACMI-BP- Guiding Belief Propagation- Experiments & Results

Page 7: Slides - Slide 1

7

Our Technique: ACMI

Perform Local MatchApply Global Constraints

Sample Structure

ACMI-SH ACMI-BP ACMI-PF

7

pk+1(b )k+1*1

pk+1(b )k+1*2

pk+1(b )k+1*M

bk

bk-1

bk+1*1…M

a priori probability of

each AA’s location

marginal probabilityof each AA’s location

all-atom protein structures

Page 8: Slides - Slide 1

8

Previous Work [DiMaio et al, 2007]

8

Page 9: Slides - Slide 1

9

ACMI Framework

Perform Local MatchApply Global Constraints

Sample Structure

ACMI-SH ACMI-BP ACMI-PF

9

pk+1(b )k+1*1

pk+1(b )k+1*2

pk+1(b )k+1*M

bk

bk-1

bk+1*1…M

a priori probability of

each AA’s location

marginal probabilityof each AA’s location

all-atom protein structures

Page 10: Slides - Slide 1

10

Inference in ACMI-BP10

- Background- ACMI Overview

- Guiding Belief Propagation- Experiments & Results

Page 11: Slides - Slide 1

11

ACMI-BP11

ACMI models the probability of all possible traces using a pairwise Markov Random Field (MRF)

LEU SERGLY LYSALA

Page 12: Slides - Slide 1

12

ACMI-BP: Pairwise Markov Field

LEU SERGLY LYSALA

Model ties adjacency constraints, occupancy constraints, and Phase 1 priors

12

Page 13: Slides - Slide 1

13

Approximate Inference

P(U|M) intractable to calculate, maximize exactly

ACMI-BP uses Loopy Belief Propagation (BP) Local, message-passing scheme Distributes evidence between nodes Approximates marginal probabilities if graph

has cycles

13

Page 14: Slides - Slide 1

14

ACMI-BP: Loopy Belief Propagation

LYS31 LEU32

mLYS31→LEU32

pLEU32pLYS31

14

Page 15: Slides - Slide 1

15

ACMI-BP: Loopy Belief Propagation

LYS31 LEU32

mLEU32→LEU31

pLEU32pLYS31

15

Page 16: Slides - Slide 1

16

Guiding Belief Propagation16

- Background- ACMI Overview- Inference in ACMI-BP

- Experiments & Results

Page 17: Slides - Slide 1

Best case: wasted resources Worst case: poor information given more influence

Message Scheduling17

SERLYSALA

Key design choice: message-passing schedule When BP is approximate, ordering affects

solution[Elidan et al, 2006]

ACMI-BP uses a naïve, round-robin schedule

Page 18: Slides - Slide 1

18

Using Domain Knowledge18

Idea: use expert to assign importance of messages

Biochemist insight: well-structured regions of protein correlate with strong features in density map eg, helices/strands have stable conformations

Protein disorder - regions of a structure that are unstable/hard to define ACMI-BP can use disorder to decide

importance Accurate predictors exist based on sequence

alone

Page 19: Slides - Slide 1

19

Guided ACMI-BP19

Page 20: Slides - Slide 1

20

Related Work

Assumption: messages with largest change in value are more useful

Residual Belief Propagation [Elidan et al, UAI 2006] Calculates residual factor for each node

Each iteration, highest residual node passes messages

General BP technique

20

Page 21: Slides - Slide 1

21

Experiments & Results21

- Background- ACMI Overview- Inference in ACMI-BP- Guiding Belief Propagation

Page 22: Slides - Slide 1

22

Message Schedulers Tested22

Our previous technique: naive, round robin (BP)

Our proposed technique: Guidance using disorder prediction (DOBP) Disorder prediction using DisEMBL [Linding et al,

2003]

Prioritize residues with high stability (ie, low disorder)

Residual factor (RBP) [Elidan et al, 2006]

Page 23: Slides - Slide 1

23

Experimental Methodology

Run whole ACMI pipeline Phase 1: Local amino-acid finder (prior

probabilities) Phase 2: Either BP, DOBP, or RBP Phase 3: Sample all-atom structures from

Phase 2 results

Test set of 10 poor-resolution electron-density maps From UW Center for Eukaryotic Structural

Genomics Deemed the most difficult of a large set of

proteins

23

Page 24: Slides - Slide 1

24

ACMI-BP Marginal Accuracy24

Page 25: Slides - Slide 1

25

ACMI-BP Marginal Accuracy25

Page 26: Slides - Slide 1

27

Protein Structure Results27

Do these better marginals produce more accurate protein structures?

RBP fails to produce structures in ACMI-PF Marginals are high in entropy (28.48 vs 5.31) Insufficient sampling of correct locations

Page 27: Slides - Slide 1

28

Conclusions

Our contribution: framework for utilizing domain knowledge in BP message scheduling General technique for belief propagation Alternative to information-based techniques

Our technique improves inference in ACMI Disorder prediction used in our framework Residual-based technique fails

Future directions

28

Page 28: Slides - Slide 1

29

Phillips Laboratory at UW - Madison UW Center for Eukaryotic Structural

Genomics (CESG)

NLM R01-LM008796 NLM Training Grant T15-LM007359 NIH Protein Structure Initiative Grant

GM074901

Thank you!

Acknowledgements29