Probabilistic Ensembles for Improved Inference in Protein-Structure Determination

Post on 23-Feb-2016


Probabilistic Ensembles for Improved Inference in Protein-Structure Determination. Ameet Soni* and Jude Shavlik, Dept. of Computer Sciences and Dept. of Biostatistics and Medical Informatics. Presented at the ACM International Conference on Bioinformatics and Computational Biology 2011.


Probabilistic Ensembles for Improved Inference in Protein-Structure Determination

Ameet Soni* and Jude Shavlik

Dept. of Computer Sciences

Dept. of Biostatistics and Medical Informatics

Presented at the ACM International Conference on Bioinformatics and Computational Biology 2011

Protein Structure Determination

Proteins are essential to most cellular functions: structural support, catalysis/enzymatic activity, and cell signaling

Protein structures determine function

X-ray crystallography is the main technique for determining structures

Task Overview

Given: a protein sequence (e.g., SAVRVGLAIM...) and an electron-density map (EDM) of the protein

Do: automatically produce a protein structure that contains all atoms and is physically feasible

Challenges & Related Work

[Figure: resolution axis from 1 Å to 4 Å, locating ARP/wARP, TEXTAL & RESOLVE, and our method, ACMI]

Resolution is a property of the protein

Higher resolution: better quality

Outline

Protein Structures
Prior Work on ACMI
Probabilistic Ensembles in ACMI (PEA)
Experiments and Results


Our Technique: ACMI

Phase 1: Perform Local Match. Output: prior probability of each AA's location

Phase 2: Apply Global Constraints. Output: posterior probability of each AA's location

Phase 3: Sample Structure. Output: all-atom protein structures

[Figure: backbone positions b_{k-1}, b_k, b_{k+1}; structures 1…M]

Results

[DiMaio, Kondrashov, Bitto, Soni, Bingman, Phillips, and Shavlik, Bioinformatics 2007]

ACMI Outline

Phase 1 (Perform Local Match), Phase 2 (Apply Global Constraints), Phase 3 (Sample Structure)

Phase 2 – Probabilistic Model

ACMI models the probability of all possible traces using a pairwise Markov Random Field (MRF)

[Figure: MRF with one node per amino acid: ALA1, GLY2, LYS3, LEU4, SER5]

Probabilistic Model

# nodes: ~1,000

# edges: ~1,000,000
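To make the pairwise MRF concrete, here is a minimal sketch of how such a model scores one candidate trace: a product of per-node potentials (local evidence) and per-edge potentials (constraints between pairs of amino acids). The three-node model, the two candidate locations per node, and every potential value below are hypothetical toy numbers, not ACMI's actual potentials.

```python
import itertools

# Hypothetical toy model: 3 amino acids, each with 2 candidate locations.
# node_potential[i][x] = how well location x matches amino acid i.
node_potential = [
    [0.9, 0.1],
    [0.4, 0.6],
    [0.3, 0.7],
]


def edge_potential(i, j, xi, xj):
    """Toy pairwise constraint (an assumption for illustration).

    Sequence neighbours prefer the same location; non-neighbours are
    penalised for occupying the same location.
    """
    if abs(i - j) == 1:
        return 1.0 if xi == xj else 0.5
    return 0.1 if xi == xj else 1.0


def unnormalized_score(assignment):
    """Product of all node potentials and all pairwise edge potentials."""
    score = 1.0
    for i, xi in enumerate(assignment):
        score *= node_potential[i][xi]
    # One edge per pair of nodes, so the edge count grows as O(N^2).
    for i, j in itertools.combinations(range(len(assignment)), 2):
        score *= edge_potential(i, j, assignment[i], assignment[j])
    return score


def joint_distribution():
    """Exact joint by brute-force enumeration; feasible only for toy sizes."""
    states = list(itertools.product(range(2), repeat=len(node_potential)))
    scores = {s: unnormalized_score(s) for s in states}
    z = sum(scores.values())
    return {s: v / z for s, v in scores.items()}
```

Brute-force enumeration works here only because the toy model has 2^3 states; with the node and state counts quoted on the slides, the same enumeration is out of reach.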

Approximate Inference

The best structure is intractable to calculate, i.e., we cannot infer the underlying structure analytically

Phase 2 uses Loopy Belief Propagation (BP) to approximate the solution. BP is a local, message-passing scheme that distributes evidence between nodes

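Loopy Belief Propagation

[Figure: node LYS31 sends the message m(LYS31→LEU32) to node LEU32; the nodes hold the current estimates p(LYS31) and p(LEU32)]

Loopy Belief Propagation

[Figure: node LEU32 sends the message m(LEU32→LYS31) back to node LYS31; the nodes hold the current estimates p(LYS31) and p(LEU32)]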
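The message exchange pictured above can be sketched as generic sum-product loopy BP on a small discrete pairwise MRF. This is a textbook version under simplifying assumptions (a handful of discrete states, strictly positive potentials, a fixed number of synchronous iterations); the function name `loopy_bp` and its arguments are hypothetical and do not describe ACMI's implementation.

```python
def loopy_bp(node_pot, edge_pot, edges, iterations=20):
    """Sum-product loopy BP; returns one approximate marginal per node.

    node_pot: node_pot[i][x], local evidence for node i in state x.
    edge_pot: {(i, j): table} with table[xi][xj], one entry per edge.
    edges:    list of undirected edges (i, j), matching edge_pot's keys.
    Assumes all potentials are strictly positive.
    """
    n = len(node_pot)
    neighbors = {i: [] for i in range(n)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def psi(i, j, xi, xj):
        # Look the table up in whichever orientation it was stored.
        if (i, j) in edge_pot:
            return edge_pot[(i, j)][xi][xj]
        return edge_pot[(j, i)][xj][xi]

    # messages[(i, j)][xj] is the message from i to j, initially uniform.
    messages = {}
    for i, j in edges:
        messages[(i, j)] = [1.0 / len(node_pot[j])] * len(node_pot[j])
        messages[(j, i)] = [1.0 / len(node_pot[i])] * len(node_pot[i])

    for _ in range(iterations):
        new_messages = {}
        for (i, j) in messages:
            out = []
            for xj in range(len(node_pot[j])):
                total = 0.0
                for xi in range(len(node_pot[i])):
                    # Evidence i has received from everyone except j.
                    incoming = 1.0
                    for k in neighbors[i]:
                        if k != j:
                            incoming *= messages[(k, i)][xi]
                    total += node_pot[i][xi] * psi(i, j, xi, xj) * incoming
                out.append(total)
            norm = sum(out)
            new_messages[(i, j)] = [v / norm for v in out]
        messages = new_messages

    # Belief = local evidence times all incoming messages, normalised.
    beliefs = []
    for i in range(n):
        b = list(node_pot[i])
        for k in neighbors[i]:
            for x in range(len(b)):
                b[x] *= messages[(k, i)][x]
        norm = sum(b)
        beliefs.append([v / norm for v in b])
    return beliefs
```

On a graph without cycles this procedure converges to the exact marginals; on a loopy graph such as a densely connected MRF, the beliefs are only approximations, which is the gap the ensemble approach targets.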

Shortcomings of Phase 2

Inference is very difficult:
~1,000,000 possible outputs for one amino acid
~250-1250 amino acids in one protein
Evidence is noisy
O(N^2) constraints

Approximate solutions, room for improvement


Ensemble Methods

Ensembles: the use of multiple models to improve predictive performance

Ensembles tend to outperform the best single model [Dietterich ’00], e.g., the Netflix Prize

Phase 2: Standard ACMI

[Figure: the MRF is run under a single inference protocol, yielding one marginal P(b_k)]

Phase 2: Ensemble ACMI

[Figure: the same MRF is run under Protocol 1, Protocol 2, ..., Protocol C, yielding the marginals P_1(b_k), P_2(b_k), ..., P_C(b_k)]

Probabilistic Ensembles in ACMI (PEA)

New ensemble framework (PEA): run inference multiple times, under different conditions

Output: multiple, diverse estimates of each amino acid's location

Phase 2 now has several probability distributions for each amino acid, so what?

ACMI Outline

Phase 1 (Perform Local Match), Phase 2 (Apply Global Constraints), Phase 3 (Sample Structure)

Backbone Step (Prior work)

Place next backbone atom, given the previously placed b_{k-2} and b_{k-1}:

(1) Sample candidates b'_k for b_k from the empirical Cα-Cα-Cα pseudoangle distribution

(2) Weight each sample by its Phase 2 computed marginal (e.g., 0.25, 0.20, 0.15, …)

(3) Select b_k with probability proportional to sample weight
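Steps (2) and (3) amount to weighted sampling over the candidate positions. The sketch below assumes the candidates from step (1) are already available and that the Phase 2 marginal can be queried through a callable; the names `selection_probabilities`, `select_next_position`, and `marginal` are hypothetical.

```python
import random


def selection_probabilities(weights):
    """Normalise sample weights into selection probabilities."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one candidate needs a positive weight")
    return [w / total for w in weights]


def select_next_position(candidates, marginal, rng=random):
    """Pick one candidate with probability proportional to its marginal.

    candidates: candidate positions for b_k (step 1, assumed given).
    marginal:   callable returning the Phase 2 marginal of a candidate.
    rng:        the random module or a random.Random instance.
    """
    weights = [marginal(c) for c in candidates]   # step 2: weight samples
    probs = selection_probabilities(weights)      # validates the weights
    return rng.choices(candidates, weights=probs, k=1)[0]  # step 3: select
```

With the example weights 0.25, 0.20, and 0.15 from the slides, the three candidates would be selected with probabilities of roughly 0.42, 0.33, and 0.25.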

Backbone Step for PEA

Each candidate b'_k, placed after b_{k-2} and b_{k-1}, now has one marginal per ensemble component: P_1(b'_k), P_2(b'_k), ..., P_C(b'_k), for example 0.23, 0.15, and 0.04. An aggregator combines them into a single weight w(b'_k).

Backbone Step for PEA: Average

AVG of 0.23, 0.15, 0.04 gives w(b'_k) = 0.14

Backbone Step for PEA: Maximum

MAX of 0.23, 0.15, 0.04 gives w(b'_k) = 0.23

Backbone Step for PEA: Sample

SAMP of 0.23, 0.15, 0.04 gives one component's value, here w(b'_k) = 0.15
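The three aggregators can be sketched in a few lines. AVG and MAX follow directly from the slide values; for SAMP, the slides only show that one component's value is used, so drawing the component uniformly at random is an assumption here. All function names are hypothetical.

```python
import random


def aggregate_avg(probs):
    """AVG: mean of the component marginals."""
    return sum(probs) / len(probs)


def aggregate_max(probs):
    """MAX: largest component marginal."""
    return max(probs)


def aggregate_samp(probs, rng=random):
    """SAMP: the marginal of one component, drawn uniformly (assumption)."""
    return rng.choice(probs)


def aggregate(probs, method, rng=random):
    """Combine the component marginals of one candidate into a weight."""
    if method == "AVG":
        return aggregate_avg(probs)
    if method == "MAX":
        return aggregate_max(probs)
    if method == "SAMP":
        return aggregate_samp(probs, rng)
    raise ValueError(f"unknown aggregator: {method}")
```

For the slide's example marginals 0.23, 0.15, and 0.04, AVG yields 0.14 and MAX yields 0.23, matching the values shown; SAMP returns one of the three inputs.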

Review: Previous work on ACMI

[Figure: in Phase 2, a single protocol produces P(b_k); in Phase 3, candidates following b_{k-2} and b_{k-1} are weighted by that marginal (e.g., 0.25, 0.20, 0.15)]

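Review: PEA

[Figure: in Phase 2, multiple protocols each produce a marginal; an aggregator (AGG) combines them into the Phase 3 candidate weights (e.g., 0.14, 0.26, 0.05) for candidates following b_{k-2} and b_{k-1}]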


Experimental Methodology

PEA (Probabilistic Ensembles in ACMI)
4 ensemble components
Aggregators: AVG, MAX, SAMP

ACMI
ORIG: standard ACMI (prior work)
EXT: run inference 4 times as long
BEST: test best of 4 PEA components

Phase 2 Results

[Figure; * marks p-value < 0.01]

Protein Structure Results

[Figure: correctness and completeness; * marks p-value < 0.05]

Impact of Ensemble Size

[Figure]

Conclusions

ACMI is the state-of-the-art method for determining protein structures in poor-resolution images

Probabilistic Ensembles in ACMI (PEA) improves approximate inference and produces better protein structures

Future Work: a general solution for inference; larger ensemble sizes

Acknowledgements

Phillips Laboratory at UW-Madison

UW Center for Eukaryotic Structural Genomics (CESG)

NLM R01-LM008796

NLM Training Grant T15-LM007359

NIH Protein Structure Initiative Grant GM074901

Thank you!