Combinatorial optimisation in protein structure prediction and recognition: Background, review, and research direction

Speaker: Vicky Mak

What’s in this talk?

What is protein structure prediction and recognition?

Who has done what before? What’s interesting and hasn’t been done?

Being critical about others’ work is easy. Doing something brilliant is difficult. This talk addresses the easy problem.

Combining two amino acids

[Figure: two amino acids before and after being joined. Labels: amino group, residue, α-carbon, carboxy group, N-terminal, C-terminal.]

Protein: polypeptide chain

A polypeptide chain: chain of amino acids linked together by peptide bonds.

Each amino acid is the same except for the residues. There are 20 such amino acids. Different combinations of these 20 amino acids make different proteins.

A protein sequence can contain from tens to thousands of amino acids.

[Figure: a polypeptide chain drawn from the N-terminal to the C-terminal.]

An example

[Figure: a protein made of a green chain and a blue chain. Labels: α-helix, β-sheet.]

Primary structure: individual amino acids.

Secondary structure: α-helix and β-sheet.

The green chain defines a tertiary structure. So does the blue chain.

Quaternary structure: green + blue chains.

Motivation

Notice: It is the 3-D structures of the proteins that are important (2 different sequences can have exactly the same structure!)

Need to know the “shape” of a protein, so as to develop antibodies that “bind” that shape - Fold prediction.

Antibodies produced against one protein may also work for another protein that “looks similar” - Structure recognition.

Structure prediction

HP models (ab initio prediction)

Given a sequence of amino acids, determine the structure from scratch.

Hydrophobic-hydrophilic (HP) model proposed by Dill (1985)

Two groups of amino acids: hydrophobic acids (H) and hydrophilic (polar) acids (P)

Self-avoiding walks on lattices

Objective: minimise global free energy

Meaning: it is good to place as many hydrophobic acids as close together as possible, i.e. to maximise the number of hydrophobic (H-H) contacts.
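To make the objective concrete, here is a minimal Python sketch of how a fold could be scored on a 2-D square lattice. It assumes the usual simplification that the energy is minus the number of H-H contacts, where a contact is a pair of H acids on adjacent lattice points that are not consecutive in the chain. The names `hh_contacts` and `energy` are illustrative only and do not come from the talk.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def hh_contacts(sequence: str, fold: List[Coord]) -> int:
    """Count H-H contacts of a fold on the 2-D square lattice.

    sequence: string over {'H', 'P'}; fold: one lattice point per residue.
    """
    if len(sequence) != len(fold):
        raise ValueError("need exactly one lattice point per residue")
    if len(set(fold)) != len(fold):
        raise ValueError("fold is not self-avoiding")
    for a, b in zip(fold, fold[1:]):
        if abs(a[0] - b[0]) + abs(a[1] - b[1]) != 1:
            raise ValueError("consecutive residues must be lattice neighbours")

    index_at = {point: i for i, point in enumerate(fold)}
    contacts = 0
    for i, (x, y) in enumerate(fold):
        if sequence[i] != "H":
            continue
        # Look only right and up so that every pair is counted once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return contacts


def energy(sequence: str, fold: List[Coord]) -> int:
    """Assumed energy model: one unit of energy released per H-H contact."""
    return -hh_contacts(sequence, fold)
```

For example, folding `HPPH` into a unit square brings the two H acids next to each other, while the straight-line fold gives no contact at all.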

HP model on lattices: a 2-dimensional example

[Figure: an HP sequence on a 2-D lattice. Legend: hydrophobic acids, hydrophilic acids.]

HP model on lattices: a 2-dimensional example

[Figure: the same sequence folded on the 2-D lattice. Legend: hydrophobic acids, hydrophilic acids.]

Fold with 5 hydrophobic contacts

Previous work on HP models

Most previous work involves complete enumeration of self-avoiding random walks on various lattices (e.g. Lau and Dill (1989), Irback and Troein (2002)).

Irback and Troein (2002) managed sequences with up to 25 amino acids.

Unger and Moult (1993): hybrid Genetic Algorithm and Simulated Annealing (2-D), sizes 20-64. Optimal for sizes 36, 48, 60. (Optimal?! How do they know?)

Shakhnovich et al. (1991) tried SA on 30 27-acid problems. (Only 1 found the global minimum. Inappropriate local search is to blame.)

Backofen (2001): constraint programming approach, tested on problems of size 27-36; time: 20min - 1hr38min (opt).

IP models proposed recently in Greenberg, Hart and Lancia (2002). No numerical results reported as yet. (See pages 1-4 of pdf file)
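As an illustration of why complete enumeration only reaches a few dozen amino acids, here is a small Python sketch that enumerates every self-avoiding walk on the 2-D square lattice and keeps the fold with the most H-H contacts. It is not the method of any of the papers cited above; `best_fold` is a hypothetical name, and the only pruning is fixing the first step to remove rotated copies of the same fold.

```python
from typing import List, Optional, Tuple

Coord = Tuple[int, int]
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def best_fold(sequence: str) -> Tuple[int, Optional[List[Coord]]]:
    """Exhaustively search 2-D self-avoiding walks for the most H-H contacts."""
    n = len(sequence)
    if n == 0:
        raise ValueError("sequence must not be empty")

    best = {"contacts": -1, "fold": None}
    walk: List[Coord] = [(0, 0)]
    occupied = {(0, 0): 0}  # lattice point -> residue index

    def gained(i: int, pos: Coord) -> int:
        """New H-H contacts created by placing residue i at pos."""
        if sequence[i] != "H":
            return 0
        count = 0
        for dx, dy in MOVES:
            j = occupied.get((pos[0] + dx, pos[1] + dy))
            # j == i - 1 is the chain predecessor, which is not a contact.
            if j is not None and j < i - 1 and sequence[j] == "H":
                count += 1
        return count

    def extend(i: int, contacts: int) -> None:
        if i == n:
            if contacts > best["contacts"]:
                best["contacts"], best["fold"] = contacts, list(walk)
            return
        x, y = walk[-1]
        for dx, dy in MOVES:
            if i == 1 and (dx, dy) != (1, 0):
                continue  # fix the first step: removes rotational symmetry
            pos = (x + dx, y + dy)
            if pos in occupied:
                continue
            extra = gained(i, pos)
            occupied[pos] = i
            walk.append(pos)
            extend(i + 1, contacts + extra)
            walk.pop()
            del occupied[pos]

    extend(1, 0)
    return best["contacts"], best["fold"]
```

The number of walks grows exponentially with the chain length, so a search of this kind is only practical for very short sequences.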

Problems with IP models

Dealing with symmetry

Methods are suggested in Greenberg, Hart and Lancia (2002) and in Backofen’s PhD thesis.

What about other lattices?

Number of lattice points unnecessarily large.

Lau and Dill (1989) proposed maximal compact chain conformations: lattice walks in which every point is occupied by exactly one amino acid.

E.g. a 3x3x3 cubic lattice for a 27-amino-acid chain.

Maybe not that tight, but definitely not n^2. Maybe a union of some of those maximal compact chain conformations.

Let’s be critical

Cubic lattices are probably not good enough. But it’s a good start anyway.

Faulon, Rintoul and Young (2002) tried 2-D honeycomb, 2-D square, 3-D diamond and 3-D cubic lattices. Agarwala et al.: triangular lattice (constrained SAW, no optimisation involved).

Use an energy matrix rather than a simple unit credit for each HH interaction? (Different hydrophobicity)

The energy released by putting different pairs of H-acids together differs, and depends on how far apart they are in the sequence! Dill’s HP model is too simplified.

Besides, interactions between H-acids should be defined differently from the lattice domain and neighbourhood.

Under old definitions, suppose [symbol shown in figure] are hydrophobic acids,

[Figure: several arrangements of hydrophobic acids on the lattice]

are all the same.

But surely

[Figure: first arrangement]

looks better than

[Figure: second arrangement]

Research opportunities

Exact algorithms

• Alternative ILP formulations (with tight LP relaxation bounds)

• Difference between lattice neighbourhood and hydrophobic interaction neighbourhood (use Euclidean distance for the latter).

• Development of solution methodologies

Modify Dill’s model to deal with reality

• Alternative lattices (apply optimisation techniques as opposed to complete or simple constrained enumeration).

• More complicated hydrophobicity (Atkins and Hart (1999) discussed a fixed energy matrix and proved NP-hardness).

Previous methods are either constraint programming or integer linear programming. Why not a hybrid CP and ILP approach?
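One of the points above, separating the lattice neighbourhood from the hydrophobic interaction neighbourhood, can be sketched in a few lines of Python. The chain still walks along lattice edges, but two H acids are counted as interacting whenever they lie within a Euclidean cutoff. The function name `hh_interactions` and the cutoff values used in the note below are assumptions made for illustration, not values from the talk.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple


def hh_interactions(sequence: str,
                    fold: Sequence[Tuple[float, ...]],
                    cutoff: float) -> int:
    """Count non-consecutive H-H pairs within a Euclidean cutoff.

    Works for 2-D or 3-D coordinates; `cutoff` is an assumed parameter.
    """
    if len(sequence) != len(fold):
        raise ValueError("need exactly one point per residue")
    count = 0
    for i, j in combinations(range(len(sequence)), 2):
        if j - i < 2:
            continue  # chain neighbours are always adjacent; skip them
        if sequence[i] == "H" and sequence[j] == "H":
            # Small tolerance so that pairs exactly at the cutoff are included.
            if math.dist(fold[i], fold[j]) <= cutoff + 1e-9:
                count += 1
    return count
```

With a cutoff of 1.0 this reduces to the usual lattice-neighbour definition on a unit square or cubic lattice. A slightly larger cutoff, say 1.5, also counts diagonal pairs, so folds that score the same under the old definition can be told apart.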

Research opportunities

No method so far can manage a sequence with >100 amino acids

Heuristics:

Meta-heuristics: still room for research, try different neighbourhood schemes

• Tailor-made search techniques that consider folding patterns

Development of problem-specific heuristic or greedy heuristic

• At least that will provide quick initial bounds for exact methods.

Structure recognition

Sequence alignment

Comparing a sequence of amino acids with known sequences in the Protein Data Bank (PDB) at the primary-structure level.

Does this sequence look like that sequence?

Methods well developed: e.g. BLAST.

Fold recognition

Comparing the structure of an unknown protein with known protein structures in PDB.

• Contact Map Optimisation (primary-structure comparisons)

• Arthur Lesk’s model (secondary-structure comparisons)

• Ip et al.’s model (secondary-structure comparisons)

Contact Map Optimisation

Comparing 3-D structures of two sequences of amino acids, e.g. s=(s1..sm) and t=(t1..tn). (Assuming you already know what each of them looks like, and you now want to know how much they look like each other.)

Construct an undirected graph for each of s and t, with amino acids as vertices.

For each sequence, two amino acids that are within a certain Euclidean distance of each other are connected by an edge.
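The graph construction described on this slide can be sketched as follows in Python: given one 3-D coordinate per amino acid, connect every pair that lies within a chosen Euclidean distance. The function name `contact_map` and the threshold value in the usage note are assumptions; the talk does not fix a particular distance here.

```python
import math
from itertools import combinations
from typing import Sequence, Set, Tuple


def contact_map(coords: Sequence[Tuple[float, float, float]],
                threshold: float) -> Set[Tuple[int, int]]:
    """Return the edges (i, j), i < j, of the contact map.

    coords[i] is the position of the i-th amino acid; `threshold` is the
    assumed Euclidean cutoff below which two amino acids are in contact.
    """
    edges = set()
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= threshold:
            edges.add((i, j))
    return edges
```

For four amino acids placed on the corners of a unit square and a threshold of 1.0, the map contains the four sides of the square but not the diagonals. Many formulations also drop pairs that are consecutive in the sequence, since those are in contact in every structure.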

Contact Map Optimisation

[Figure: the contact maps of s (vertices s1, s2, ..., sm) and t (vertices t1, t2, ..., tn).]

Contact Map Optimisation

One way of mapping. 4 pairs of edges mapped.

Contact Map Optimisation

Another way of mapping. 5 edges mapped.
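The two mappings above differ only in how many contacts they carry over from one map to the other. A brute-force Python sketch of that objective is given below. It assumes the usual reading of the problem: amino acids of s are matched to amino acids of t in an order-preserving (non-crossing) way, and a contact (i, j) of s is "mapped" if i and j are matched to two amino acids that are also in contact in t. The name `max_shared_contacts` is illustrative, and the search is exponential, so it is only meant for toy instances.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]


def max_shared_contacts(m: int, n: int,
                        contacts_s: Set[Edge],
                        contacts_t: Set[Edge]) -> Tuple[int, Dict[int, int]]:
    """Try every order-preserving partial matching of 0..m-1 into 0..n-1.

    Contacts are pairs (i, j) with i < j. Returns the largest number of
    contacts of s mapped onto contacts of t, with one best matching.
    """
    best_value = -1
    best_mapping: Dict[int, int] = {}
    mapping: Dict[int, int] = {}

    def shared() -> int:
        return sum(1 for (i, j) in contacts_s
                   if i in mapping and j in mapping
                   and (mapping[i], mapping[j]) in contacts_t)

    def extend(i: int, next_free: int) -> None:
        nonlocal best_value, best_mapping
        if i == m:
            value = shared()
            if value > best_value:
                best_value, best_mapping = value, dict(mapping)
            return
        extend(i + 1, next_free)          # leave amino acid i unmatched
        for k in range(next_free, n):     # or match it to a later one in t
            mapping[i] = k
            extend(i + 1, k + 1)
            del mapping[i]

    extend(0, 0)
    return best_value, best_mapping
```

Because matches may only move forward in both sequences, a nested pair of contacts in s cannot be mapped onto two side-by-side contacts in t, which is what makes the problem combinatorial rather than a simple edge count.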

Wait a minute...

Remember from the HP models: amino acids are divided into two groups. What is the point of mapping a hydrophobic amino acid in one graph to a hydrophilic amino acid in another, or vice versa???

Adding constraints so that only amino acids of the same group can be matched might be helpful!!!
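A rough way to see the effect of such a constraint is to count how many candidate matches survive it. The Python sketch below lists the pairs (i, k) that a model would still need a variable for when only same-group matches are allowed. The H/P labels of each amino acid are assumed to be given, and `allowed_pairs` is a hypothetical helper, not part of any published model.

```python
from typing import List, Tuple


def allowed_pairs(hp_s: str, hp_t: str,
                  same_group_only: bool = True) -> List[Tuple[int, int]]:
    """List candidate matches (i, k) between amino acids of s and t.

    hp_s and hp_t are strings over {'H', 'P'} giving each amino acid's group.
    With same_group_only=True, H may only be matched to H and P to P.
    """
    return [(i, k)
            for i, a in enumerate(hp_s)
            for k, b in enumerate(hp_t)
            if not same_group_only or a == b]
```

For `hp_s = "HPH"` and `hp_t = "PH"`, the unconstrained model has 6 candidate matches and the constrained one has 3, so the number of matching variables is halved in this toy case.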

Who has done what?

No one noticed the HP issue, so the models aren’t 100% cool.

Lancia et al. (2001) ILP model (see pages 5-6 of pdf file)

The LP relaxation of the no-crossing constraints is typically weak, hence clique constraints (exponentially many) are introduced.

The problem can be converted to a maximum independent set problem, for which clique inequalities are facet-defining.

O(n^2) time separation for cliques.

Root-node LP relaxation takes from 1 min to 2 hours for 62-74 acids and 80-140 contacts. (The more alike the two proteins are, the faster the LP relaxation can be solved!)
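The link between no-crossing constraints and independent sets can be illustrated with a short Python sketch. It assumes the standard reading of "no-crossing": two matches (i, k) and (j, l) are compatible only if they appear in the same order in both sequences. Every candidate match becomes a vertex of a conflict graph, and a valid alignment is then an independent set in that graph. The function names are illustrative and this is not claimed to be the exact formulation of Lancia et al.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple

Line = Tuple[int, int]  # (index in s, index in t)


def conflicts(a: Line, b: Line) -> bool:
    """True if two matches cross or share an endpoint."""
    (i, k), (j, l) = a, b
    return not ((i < j and k < l) or (i > j and k > l))


def conflict_graph(m: int, n: int) -> Tuple[List[Line], Set[Tuple[Line, Line]]]:
    """Vertices: all candidate matches. Edges: pairs that cannot coexist."""
    lines = [(i, k) for i in range(m) for k in range(n)]
    edges = {(a, b) for a, b in combinations(lines, 2) if conflicts(a, b)}
    return lines, edges


def is_alignment(lines: Iterable[Line]) -> bool:
    """A set of matches is valid iff it is independent in the conflict graph."""
    return all(not conflicts(a, b) for a, b in combinations(list(lines), 2))
```

A set of matches that pairwise conflict forms a clique in this graph, and at most one of them can be chosen, which is the idea behind a clique constraint.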

Who has done what?

Heuristic approaches: Lancia et al. (2001)

• Genetic algorithm (GA)

• Steepest ascent local search

Results of Lancia et al.

Exact algorithm

• Gaps: 0 to >5% (mostly >5%; exactly how much??)

Heuristics

• Same story as above. GA much better than LS.

Work on similar topics can also be found in Havel et al. (1979), Martin et al. (1992) and so on.

Let’s be critical...

Even just the LP relaxation of the IP formulation without no-crossing constraints takes a long time to solve when comparing pairs of real protein sequences with 100-200 amino acids.

Tried comparing two sequences with 120+ amino acids; it took more than 10 hours!!!

Really should consider the HP issue, and maybe even aggregate certain amino acids!

Let’s be critical...

A big problem with the model: a 3-D example

Consider the following sequence: 1 2 3 4 5 6 7

[Figure: two different 3-D structures of the sequence 1-7.]

Two different structures give the same objective value under the ILP formulation of Lancia et al., assuming acids within a Euclidean distance of 3^(1/3) are connected by an edge.

Research opportunities

Exact methods

• New ILP formulation.

• Alternative solution methodologies for solving the ILPs, now that we know the ILP models are huge and solving them is hard.

Heuristics

• Problem-specific heuristic.

• Different neighbourhood search for meta-heuristics.

Arthur Lesk’s model

Compare the structures of two protein sequences by inspecting relations between secondary structures

Does the blue protein look like the green protein?

Angle between pairs (degrees)    Symbol
0-45                             A
45-90                            B
90-135                           C
135-180                          D
180-225                          E
225-270                          F
270-315                          G
315-360                          H
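The table amounts to cutting the circle into eight 45-degree sectors. A minimal Python sketch of the encoding follows. The table does not say which symbol a boundary angle such as 45 belongs to, so the sketch assumes half-open intervals (45 maps to B) and wraps 360 back to A; `angle_symbol` is an illustrative name.

```python
def angle_symbol(angle_degrees: float) -> str:
    """Map an angle between two secondary structures to a symbol A-H.

    Assumption: each sector is half-open, [0, 45) -> A, [45, 90) -> B, ...
    """
    sector = int((angle_degrees % 360.0) // 45.0)
    # Guard against floating-point round-off pushing the sector to 8.
    return "ABCDEFGH"[min(sector, 7)]
```

Encoding every pair of secondary structures this way turns a protein into a small matrix of symbols, which can then be compared entry by entry with the matrix of another protein.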

Columns: α1 β1 β2 α2 β3 β4
α1: - B C D
β1: - A F
β2: - E F G
α2: - A
β3: -
β4: -

Protein sequence 1

Columns: α1 β1 β2 α2 β3 β4
α1: - B C D
β1: - A F
β2: - E F G
α2: - A
β3: -
β4: -

Protein sequence 2

Columns: α'1 β'1 β'2 α'2 β'3 β'4
α'1: - D C B
β'1: - A F
β'2: - E F H
α'2: - A
β'3: -
β'4: -

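Similar to CMO...

[Figure: the secondary structures α1, β1, β2, α2, β3, β4 of one protein drawn as vertices, with edges labelled by angle symbols (B, C, D), above the vertices α'1, β'1, ... of the other protein.]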

Useful papers and websites

Greenberg, H.J., Hart, W.E., Lancia, G. “Opportunities for Combinatorial Optimization in Computational Biology”

http://www.dkfz-heidelberg.de/tbi/bioinfo/ProteinStructure/

Christian Lemmen and Thomas Lengauer. “Computational methods for the structural alignment of molecules”, Journal of Computer-Aided Molecular Design, 14, 215-232, 2000.