Probabilistic Ensembles for Improved Inference in Protein-Structure Determination

Ameet Soni* and Jude Shavlik
Dept. of Computer Sciences, Dept. of Biostatistics and Medical Informatics
Presented at the ACM International Conference on Bioinformatics and Computational Biology 2011


Page 1: Probabilistic Ensembles for Improved Inference in Protein-Structure Determination

Ameet Soni* and Jude Shavlik
Dept. of Computer Sciences
Dept. of Biostatistics and Medical Informatics

Presented at the ACM International Conference on Bioinformatics and Computational Biology 2011

Page 2: Protein Structure Determination

Proteins are essential to most cellular functions:
- Structural support
- Catalysis/enzymatic activity
- Cell signaling

Protein structures determine function.

X-ray crystallography is the main technique for determining structures.

Page 3: Task Overview

Given:
- A protein sequence (e.g., SAVRVGLAIM...)
- An electron-density map (EDM) of the protein

Do:
- Automatically produce a protein structure that contains all atoms and is physically feasible

Page 4: Challenges & Related Work

[Figure: resolution scale from 1 Å to 4 Å, with labels "ARP/wARP", "TEXTAL & RESOLVE", and "Our Method: ACMI"]

Resolution is a property of the protein.

Higher resolution: better quality.

Page 5: Outline

- Protein Structures
- Prior Work on ACMI
- Probabilistic Ensembles in ACMI (PEA)
- Experiments and Results

Page 6: Outline

- Protein Structures
- Prior Work on ACMI
- Probabilistic Ensembles in ACMI (PEA)
- Experiments and Results

Page 7: Our Technique: ACMI

Phase 1 (Perform Local Match): prior probability of each AA’s location
Phase 2 (Apply Global Constraints): posterior probability of each AA’s location
Phase 3 (Sample Structure): all-atom protein structures

[Figure labels: b_{k-1}, b_k, b_{k+1}; *1…M]

Page 8: Results

[DiMaio, Kondrashov, Bitto, Soni, Bingman, Phillips, and Shavlik, Bioinformatics 2007]

Page 9: ACMI Outline

Phase 1 (Perform Local Match): prior probability of each AA’s location
Phase 2 (Apply Global Constraints): posterior probability of each AA’s location
Phase 3 (Sample Structure): all-atom protein structures

[Figure labels: b_{k-1}, b_k, b_{k+1}; *1…M]

Page 10: Phase 2 – Probabilistic Model

ACMI models the probability of all possible traces using a pairwise Markov Random Field (MRF).

[Figure: MRF with one node per amino acid: ALA1, GLY2, LYS3, LEU4, SER5]

Page 11: Probabilistic Model

# nodes: ~1,000
# edges: ~1,000,000
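
For reference, a pairwise MRF factorizes into per-node and per-edge terms in the standard way shown below. The notation is an assumption made here for illustration: b_i is the location of amino acid i, M is the electron-density map, E is the edge set, and ψ_i and ψ_ij are generic node and edge potentials. The slides do not spell out ACMI's actual potential functions.

```latex
P(b_1, \dots, b_N \mid M) \;\propto\;
  \prod_{i=1}^{N} \psi_i(b_i \mid M)
  \prod_{(i,j) \in E} \psi_{ij}(b_i, b_j)
```

When E links nearly every pair of nodes, the number of edge terms grows on the order of N², which is in line with the node and edge counts on this slide.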

Page 12: Approximate Inference

Best structure is intractable to calculate, i.e., we cannot infer the underlying structure analytically.

Phase 2 uses Loopy Belief Propagation (BP) to approximate the solution:
- Local, message-passing scheme
- Distributes evidence between nodes

Page 13: Loopy Belief Propagation

[Figure: nodes LYS31 and LEU32 with marginals p_LYS31 and p_LEU32; message m_{LYS31→LEU32}]

Page 14: Loopy Belief Propagation

[Figure: nodes LYS31 and LEU32 with marginals p_LYS31 and p_LEU32; message m_{LEU32→LYS31}]
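
The message exchange pictured on these two slides can be sketched as sum-product belief propagation on a small discrete pairwise MRF. This is a generic textbook version, not ACMI's implementation: the function name `loopy_bp`, the dictionary layout, the synchronous update schedule, and the two-state toy potentials are all assumptions made for illustration. ACMI's real state space (locations in the density map) is far larger.

```python
import numpy as np

def loopy_bp(unary, pairwise, n_iters=50):
    """Sum-product loopy BP on a discrete pairwise MRF (illustrative sketch).

    unary:    dict node -> sequence of non-negative node potentials
    pairwise: dict (i, j) -> 2-D array psi[b_i, b_j], one entry per undirected edge
    Returns a dict node -> approximate marginal (normalised belief).
    """
    unary = {n: np.asarray(p, dtype=float) for n, p in unary.items()}
    neighbors = {n: [] for n in unary}
    messages = {}
    for (i, j) in pairwise:
        neighbors[i].append(j)
        neighbors[j].append(i)
        # One message per direction, initialised to uniform.
        messages[(i, j)] = np.full(len(unary[j]), 1.0 / len(unary[j]))
        messages[(j, i)] = np.full(len(unary[i]), 1.0 / len(unary[i]))

    def psi(i, j):
        # Edge potential indexed as [b_i, b_j], whichever way it was stored.
        if (i, j) in pairwise:
            return np.asarray(pairwise[(i, j)], dtype=float)
        return np.asarray(pairwise[(j, i)], dtype=float).T

    for _ in range(n_iters):
        updated = {}
        for (i, j) in messages:
            # Evidence at i times every incoming message except the one from j.
            prod = unary[i].copy()
            for k in neighbors[i]:
                if k != j:
                    prod = prod * messages[(k, i)]
            m = psi(i, j).T @ prod  # sums over the states of b_i
            updated[(i, j)] = m / m.sum()
        messages = updated  # synchronous schedule: swap all messages at once

    beliefs = {}
    for n in unary:
        b = unary[n].copy()
        for k in neighbors[n]:
            b = b * messages[(k, n)]
        beliefs[n] = b / b.sum()
    return beliefs

# Toy example: two neighbouring residues, each with two candidate locations.
toy_unary = {"LYS31": [0.9, 0.1], "LEU32": [0.5, 0.5]}
toy_pairwise = {("LYS31", "LEU32"): [[0.8, 0.2], [0.2, 0.8]]}
toy_beliefs = loopy_bp(toy_unary, toy_pairwise)
```

On a graph without cycles, such as this two-node toy, the beliefs equal the exact marginals. On a densely connected graph the same updates give only an approximation.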

Page 15: Shortcomings of Phase 2

Inference is very difficult:
- ~1,000,000 possible outputs for one amino acid
- ~250-1250 amino acids in one protein
- Evidence is noisy
- O(N²) constraints

Approximate solutions, room for improvement

Page 16: Outline

- Protein Structures
- Prior Work on ACMI
- Probabilistic Ensembles in ACMI (PEA)
- Experiments and Results

Page 17: Ensemble Methods

Ensembles: the use of multiple models to improve predictive performance

Tend to outperform the best single model [Dietterich ’00], e.g., the Netflix Prize

Page 18: Phase 2: Standard ACMI

[Figure: a single protocol is run on the MRF, producing P(b_k)]

Page 19: Phase 2: Ensemble ACMI

[Figure: Protocol 1, Protocol 2, ..., Protocol C are each run on the MRF, producing P_1(b_k), P_2(b_k), ..., P_C(b_k)]

Page 20: Probabilistic Ensembles in ACMI (PEA)

New ensemble framework (PEA):
- Run inference multiple times, under different conditions
- Output: multiple, diverse estimates of each amino acid’s location

Phase 2 now has several probability distributions for each amino acid, so what?

Page 21: ACMI Outline

Phase 1 (Perform Local Match): prior probability of each AA’s location
Phase 2 (Apply Global Constraints): posterior probability of each AA’s location
Phase 3 (Sample Structure): all-atom protein structures

[Figure labels: b_{k-1}, b_k, b_{k+1}; *1…M]

Page 22: Backbone Step (Prior work)

Place next backbone atom

(1) Sample b_k from empirical Cα-Cα-Cα pseudoangle distribution

[Figure: placed atoms b_{k-2} and b_{k-1}; candidate positions b'_k marked "?"]

Page 23: Backbone Step (Prior work)

Place next backbone atom

(2) Weight each sample by its Phase 2 computed marginal

[Figure: placed atoms b_{k-2} and b_{k-1}; candidate positions b'_k with weights 0.25, 0.20, 0.15, …]

Page 24: Backbone Step (Prior work)

Place next backbone atom

(3) Select b_k with probability proportional to sample weight

[Figure: placed atoms b_{k-2} and b_{k-1}; candidate positions b'_k with weights 0.25, 0.20, 0.15, …]
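
Steps (2) and (3) of the backbone step can be sketched as below. This is a minimal illustration, not ACMI's code: `select_next_position`, `candidates`, and `marginal` are hypothetical names, and step (1) is assumed to have already produced the candidate positions b'_k from the pseudoangle distribution.

```python
import random

def select_next_position(candidates, marginal, rng=random):
    """Pick the next backbone position b_k from candidate positions b'_k.

    candidates: list of candidate positions (assumed output of step 1)
    marginal:   callable mapping a candidate to its Phase 2 marginal probability
    rng:        the random module or a random.Random instance
    """
    # Step (2): weight each sample by its Phase 2 marginal.
    weights = [marginal(c) for c in candidates]
    if sum(weights) <= 0:
        raise ValueError("all candidate weights are zero")
    # Step (3): select one candidate with probability proportional to weight.
    return rng.choices(candidates, weights=weights, k=1)[0]

# Toy usage with the weights shown on the slide (positions are made up).
toy_candidates = [(1.0, 0.0, 3.7), (0.0, 1.0, 3.7), (0.0, 0.0, 3.8)]
toy_weights = dict(zip(toy_candidates, [0.25, 0.20, 0.15]))
chosen = select_next_position(toy_candidates, toy_weights.get, random.Random(0))
```

Because the draw is proportional rather than greedy, lower-weight candidates are still selected some of the time, so repeated runs can yield different structures.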

Page 25: Backbone Step for PEA

[Figure: placed atoms b_{k-2} and b_{k-1}; for a candidate b'_k, the ensemble marginals P_1(b'_k), P_2(b'_k), ..., P_C(b'_k) (values 0.23, 0.15, 0.04) are passed to an aggregator ("?") that outputs the weight w(b'_k)]

Page 26: Backbone Step for PEA: Average

[Figure: the ensemble marginals P_1(b'_k), P_2(b'_k), ..., P_C(b'_k) (values 0.23, 0.15, 0.04) are passed to the AVG aggregator, which outputs 0.14]

Page 27: Backbone Step for PEA: Maximum

[Figure: the ensemble marginals P_1(b'_k), P_2(b'_k), ..., P_C(b'_k) (values 0.23, 0.15, 0.04) are passed to the MAX aggregator, which outputs 0.23]

Page 28: Backbone Step for PEA: Sample

[Figure: the ensemble marginals P_1(b'_k), P_2(b'_k), ..., P_C(b'_k) (values 0.23, 0.15, 0.04) are passed to the SAMP aggregator, which outputs 0.15]
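
The three aggregators can be sketched as small functions over the per-component marginals of one candidate position. The function names are hypothetical. For SAMP, drawing one component uniformly at random is an assumption: the slide shows only that a single component's value is used.

```python
import random

def aggregate_avg(probs):
    """AVG: mean of the ensemble components' marginals."""
    return sum(probs) / len(probs)

def aggregate_max(probs):
    """MAX: largest marginal among the ensemble components."""
    return max(probs)

def aggregate_samp(probs, rng=random):
    """SAMP: marginal of one component chosen at random (uniform choice assumed)."""
    return rng.choice(probs)

# Values from the slides for a single candidate b'_k.
component_marginals = [0.23, 0.15, 0.04]
w_avg = aggregate_avg(component_marginals)
w_max = aggregate_max(component_marginals)
w_samp = aggregate_samp(component_marginals, random.Random(0))
```

With the slide's values, AVG gives 0.14 and MAX gives 0.23, matching the figures. SAMP returns one of the three values, so the 0.15 on the slide is one possible draw.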

Page 29: Review: Previous work on ACMI

[Figure: in Phase 2, a single protocol produces P(b_k); in Phase 3, candidate positions following b_{k-2} and b_{k-1} carry weights 0.25, 0.20, 0.15]

Page 30: Review: PEA

[Figure: in Phase 2, several protocols feed an aggregator (AGG); in Phase 3, candidate positions following b_{k-2} and b_{k-1} carry weights 0.14, 0.26, 0.05]

Page 31: Outline

- Protein Structures
- Prior Work on ACMI
- Probabilistic Ensembles in ACMI (PEA)
- Experiments and Results

Page 32: Experimental Methodology

PEA (Probabilistic Ensembles in ACMI):
- 4 ensemble components
- Aggregators: AVG, MAX, SAMP

ACMI:
- ORIG: standard ACMI (prior work)
- EXT: run inference 4 times as long
- BEST: test best of 4 PEA components

Page 33: Phase 2 Results

*p-value < 0.01

Page 34: Protein Structure Results

[Figure panels: Correctness, Completeness]

*p-value < 0.05

Page 35: Protein Structure Results

Page 36: Impact of Ensemble Size

Page 37: Conclusions

ACMI is the state-of-the-art method for determining protein structures in poor-resolution images.

Probabilistic Ensembles in ACMI (PEA) improves approximate inference and produces better protein structures.

Future Work:
- General solution for inference
- Larger ensemble size

Page 38: Acknowledgements

Phillips Laboratory at UW-Madison
UW Center for Eukaryotic Structural Genomics (CESG)

NLM R01-LM008796
NLM Training Grant T15-LM007359
NIH Protein Structure Initiative Grant GM074901

Thank you!