Protein structure comparison and contact maps
![Page 1: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/1.jpg)
Protein structure comparison and contact maps
![Page 2: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/2.jpg)
A protein is a complex molecule with a primary, linear structure (a sequence of amino acids) and a 3-dimensional structure (the protein fold).
Protein STRUCTURE determines its FUNCTION
For instance, the Drug Design problem calls for constructing peptides with a 3D shape complementary to a protein, so as to dock onto it.
![Page 3: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/3.jpg)
Motivation: Structure Alignment is Important for:
- Discovery of Protein Function (shape determines function)
- Search in 3D data bases
- Protein Classification and Evolutionary Studies
- Assessment of Fold Prediction quality (e.g. CASP)
- …
Problem: Align two 3D protein structures
![Page 4: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/4.jpg)
Contact Maps
![Page 5: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/5.jpg)
CONTACT MAPS
Unfolded protein
![Page 6: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/6.jpg)
Folded protein = contacts
![Page 7: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/7.jpg)
Contact map = graph
![Page 8: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/8.jpg)
OBJECTIVE: align the 3D folds of two proteins = align their contact maps
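A contact map is built from the folded 3D structure by marking residue pairs that lie close in space. The sketch below assumes one 3D point per residue and a distance cutoff; the 5.0 Å default, the `min_separation` filter for chain neighbours, and the function name `contact_map` are illustrative assumptions, not values taken from these slides.

```python
import math
from itertools import combinations

def contact_map(coords, threshold=5.0, min_separation=2):
    """Return the contact map of a protein as a set of edges (a, b), a < b.

    coords: one (x, y, z) point per residue, in chain order (assumed input).
    threshold: distance cutoff in angstroms (assumed default).
    min_separation: skip residue pairs closer than this along the chain,
        since chain neighbours are always close in space (assumed filter).
    """
    contacts = set()
    for a, b in combinations(range(len(coords)), 2):
        if b - a >= min_separation and math.dist(coords[a], coords[b]) <= threshold:
            contacts.add((a, b))
    return contacts

# An extended chain has no contacts; bending it back creates one.
unfolded = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
folded = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
print(contact_map(unfolded))  # set()
print(contact_map(folded))    # {(0, 3)}
```

The resulting edge set is the graph view of the fold: residues are nodes, contacts are edges.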
![Page 9: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/9.jpg)
Contact maps are related to the fold: similar folds have similar contact maps
We studied the problem of determining contact map similarity
![Page 10: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/10.jpg)
In the period 2001-2004:
- I.P. formulation via Branch & Cut (RECOMB)
- Use of Compact Optimization instead of separation (AIRO)
- Lagrangian Relaxation (RECOMB)
(Publications: RECOMB proceedings, AIRO proceedings, OR Letters, Journal of Comp. Bio., 4OR)
![Page 11: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/11.jpg)
The Contact Map Alignment Problem
![Page 12: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/12.jpg)
Non-crossing Alignments
An alignment is a non-crossing map between the residues of protein 1 and the residues of protein 2.
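Whether an alignment is non-crossing can be checked pairwise. A minimal sketch, assuming 0-based lines (i, j) meaning residue i of protein 1 is aligned to residue j of protein 2, and assuming that two lines sharing an endpoint are also incompatible (each residue is aligned to at most one partner); the function names are hypothetical.

```python
from itertools import combinations

def compatible(line_a, line_b):
    """True if two alignment lines neither cross nor share an endpoint."""
    (i, j), (k, l) = line_a, line_b
    return (i < k and j < l) or (i > k and j > l)

def is_noncrossing(alignment):
    """True if every pair of lines in the alignment is compatible."""
    return all(compatible(a, b) for a, b in combinations(alignment, 2))

print(is_noncrossing([(0, 0), (1, 2), (3, 3)]))  # True
print(is_noncrossing([(0, 2), (1, 1)]))          # False: the lines cross
print(is_noncrossing([(0, 1), (0, 2)]))          # False: residue 0 used twice
```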
![Page 13: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/13.jpg)
The value of an alignment
![Page 14: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/14.jpg)
![Page 15: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/15.jpg)
![Page 16: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/16.jpg)
Value = 3
![Page 17: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/17.jpg)
We want to maximize the value
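The value of an alignment counts the contacts it maps onto contacts: a contact (i, p) of protein 1 and a contact (j, q) of protein 2 are shared when i is aligned to j and p is aligned to q (the reading used by the y-variables of the integer programming formulation). A minimal sketch, assuming 0-based lines (i, j) and contacts stored as pairs (a, b) with a < b; `alignment_value` is a hypothetical name.

```python
def alignment_value(alignment, contacts1, contacts2):
    """Number of contacts shared by two proteins under a non-crossing alignment.

    alignment: lines (i, j), residue i of protein 1 -> residue j of protein 2.
    contacts1, contacts2: sets of contacts (a, b) with a < b.
    """
    partner = dict(alignment)
    shared = 0
    for i, p in contacts1:
        if i in partner and p in partner:
            # the alignment preserves order, so partner[i] < partner[p]
            if (partner[i], partner[p]) in contacts2:
                shared += 1
    return shared

contacts1 = {(0, 2), (1, 3), (0, 3)}
contacts2 = {(0, 2), (1, 3), (2, 3)}
identity = [(0, 0), (1, 1), (2, 2), (3, 3)]
print(alignment_value(identity, contacts1, contacts2))  # 2
```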
![Page 18: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/18.jpg)
NP-Hard (Goldman, Istrail, Papadimitriou, 1999)
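Since the problem is NP-hard, exhaustive search is only an option for toy instances, but it is a handy reference when testing other methods. The sketch below is an illustration, not the method of these slides: it enumerates all non-crossing alignments (there are $\binom{n_1+n_2}{n_1}$ of them, which explodes for proteins of a few dozen residues) and keeps the best one. Function names and the 0-based representation are assumptions.

```python
def noncrossing_alignments(n1, n2):
    """Yield every non-crossing alignment of residues 0..n1-1 with 0..n2-1."""
    def extend(i, next_j, current):
        if i == n1:
            yield list(current)
            return
        yield from extend(i + 1, next_j, current)  # residue i stays unaligned
        for j in range(next_j, n2):                 # or residue i -> residue j
            current.append((i, j))
            yield from extend(i + 1, j + 1, current)
            current.pop()
    yield from extend(0, 0, [])

def max_overlap_bruteforce(n1, n2, contacts1, contacts2):
    """Exact maximum contact overlap by exhaustive search (tiny inputs only)."""
    best_value, best_alignment = 0, []
    for alignment in noncrossing_alignments(n1, n2):
        partner = dict(alignment)
        value = sum(
            1 for i, p in contacts1
            if i in partner and p in partner and (partner[i], partner[p]) in contacts2
        )
        if value > best_value:
            best_value, best_alignment = value, alignment
    return best_value, best_alignment

print(sum(1 for _ in noncrossing_alignments(3, 3)))         # 20
print(max_overlap_bruteforce(3, 4, {(0, 2)}, {(1, 3)})[0])  # 1
```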
![Page 19: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/19.jpg)
Integer Programming Formulation
(5th RECOMB conference)
![Page 20: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/20.jpg)
The use of Integer Linear Programming
• Model a difficult problem by 0-1 variables, a linear objective function and linear constraints
• Can find the optimal solution by branch and bound
• The bound comes from the LP relaxation (solvable in polynomial time)
• The bound can be used to assess the quality of any feasible solution
![Page 21: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/21.jpg)
(i) 0-1 VARIABLES
RESIDUE-RESIDUE VARS: $x_{ij}$ for residues $i$ (protein 1) and $j$ (protein 2)
CONTACT-CONTACT VARS: $y_{ef}$ for contacts $e$ (protein 1) and $f$ (protein 2)
![Page 22: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/22.jpg)
(ii) OBJECTIVE
$$\max \sum_{e \in E_1,\ f \in E_2} y_{ef}$$
over all feasible x and y
![Page 23: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/23.jpg)
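(iii) CONSTRAINTS (FEASIBILITY)
activation: $y_{(ip)(jq)} \le x_{ij}$ and $y_{(ip)(jq)} \le x_{pq}$ (contacts $(i,p)$ and $(j,q)$ can be matched only if $i$ is aligned to $j$ and $p$ to $q$)
non-crossing: $x_{ij} + x_{i'j'} \le 1$ for every pair of crossing lines $ij$, $i'j'$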
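The three ingredients (i)-(iii) can be written down mechanically. The sketch below only enumerates them as plain Python data and does not call a solver; the function name and data layout are hypothetical, and it assumes that lines sharing an endpoint count as crossing. Even for three residues per protein there are 27 pairwise non-crossing constraints, and their number grows like $(n_1 n_2)^2$, which is one motivation for stronger constraints.

```python
from itertools import combinations, product

def build_model(n1, n2, contacts1, contacts2):
    """List the variables and constraints of the basic 0-1 model (no solver).

    Returns a dict with:
      'x': residue-residue variables, one per line (i, j);
      'y': contact-contact variables, one per pair (e, f), e in contacts1, f in contacts2;
      'activation': pairs (ef, line) meaning y_ef <= x_line;
      'noncrossing': pairs (line_a, line_b) meaning x_a + x_b <= 1.
    The objective is to maximize the sum of all y variables.
    """
    x = list(product(range(n1), range(n2)))
    y = list(product(sorted(contacts1), sorted(contacts2)))
    activation = []
    for (i, p), (j, q) in y:
        ef = ((i, p), (j, q))
        activation.append((ef, (i, j)))  # y_(ip)(jq) <= x_ij
        activation.append((ef, (p, q)))  # y_(ip)(jq) <= x_pq
    noncrossing = [
        (a, b) for a, b in combinations(x, 2)
        # lines conflict unless they are strictly ordered the same way
        if not ((a[0] < b[0] and a[1] < b[1]) or (a[0] > b[0] and a[1] > b[1]))
    ]
    return {"x": x, "y": y, "activation": activation, "noncrossing": noncrossing}

model = build_model(3, 3, {(0, 2)}, {(0, 2)})
print({name: len(items) for name, items in model.items()})
# {'x': 9, 'y': 1, 'activation': 2, 'noncrossing': 27}
```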
![Page 24: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/24.jpg)
Non-crossing clique Constraints
Variables x define a graph Gx:
• A node for each line
• An edge between each pair of crossing lines
![Page 25: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/25.jpg)
• An independent set corresponds to a non-crossing alignment
• Gx has nice properties (it is a perfect graph)
• It is easy (polynomial) to find maximum-weight independent sets in Gx
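For this particular graph, a maximum-weight independent set (that is, a maximum-weight non-crossing alignment) can be found with a dynamic program in the style of sequence alignment, in O(n1·n2) time. A sketch, assuming the line weights are given as a dense n1 × n2 matrix `w` and that lines sharing an endpoint are adjacent in Gx; the function name is hypothetical.

```python
def max_weight_independent_set(w):
    """Maximum-weight independent set of Gx = maximum-weight non-crossing alignment.

    w[i][j] is the weight of line (i, j), 0-based. Runs in O(n1 * n2) time.
    Returns (best total weight, chosen lines in increasing order).
    """
    n1, n2 = len(w), len(w[0])
    # best[i][j]: optimum using residues < i of protein 1 and < j of protein 2
    best = [[0.0] * (n2 + 1) for _ in range(n1 + 1)]
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            best[i][j] = max(best[i - 1][j],                       # residue i-1 of protein 1 unaligned
                             best[i][j - 1],                       # residue j-1 of protein 2 unaligned
                             best[i - 1][j - 1] + w[i - 1][j - 1])  # use line (i-1, j-1)
    lines, i, j = [], n1, n2
    while i > 0 and j > 0:  # trace the choices back
        if best[i][j] == best[i - 1][j]:
            i -= 1
        elif best[i][j] == best[i][j - 1]:
            j -= 1
        else:
            lines.append((i - 1, j - 1))
            i, j = i - 1, j - 1
    return best[n1][n2], lines[::-1]

value, lines = max_weight_independent_set([[1, 0, 0], [0, 0, 5], [0, 3, 0]])
print(value, lines)  # 6.0 [(0, 0), (1, 2)]
```

Negative weights are handled naturally: such lines are simply never chosen.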
Clique Constraints
![Page 26: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/26.jpg)
Non-crossing constraints can be extended to
CLIQUE CONSTRAINTS
$$\sum_{[i,j] \in M} x_{ij} \le 1$$
for all sets M of mutually incompatible (i.e. crossing) lines
With all clique constraints satisfied, the LP relaxation gives a strong bound!
![Page 27: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/27.jpg)
Maximal cliques in Gx
![Page 28: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/28.jpg)
Structure of Maximal cliques in Gx
1. Pick two subsets of same size
![Page 29: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/29.jpg)
2. Connect them in a zig-zag fashion
![Page 30: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/30.jpg)
![Page 31: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/31.jpg)
![Page 32: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/32.jpg)
![Page 33: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/33.jpg)
![Page 34: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/34.jpg)
![Page 35: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/35.jpg)
![Page 36: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/36.jpg)
![Page 37: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/37.jpg)
![Page 38: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/38.jpg)
![Page 39: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/39.jpg)
3. Throw in all lines included in a zig or a zag
![Page 40: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/40.jpg)
![Page 41: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/41.jpg)
The result is a maximal clique in Gx
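The zig-zag construction can be checked on small cases. Under the assumption that lines sharing an endpoint also conflict, a maximal clique can equivalently be traced as a "staircase" through the n1 × n2 grid of lines, from line (first residue, last residue) to line (last residue, first residue), advancing in protein 1 or stepping back in protein 2 at each move. This equivalent view is our assumption, not a statement from the slides; it gives $\binom{n_1+n_2-2}{n_1-1}$ maximal cliques, in line with the exponential count. A sketch that enumerates them (0-based, hypothetical names):

```python
from itertools import combinations

def conflict(a, b):
    """Lines conflict (are adjacent in Gx) if they cross or share an endpoint."""
    return a != b and not ((a[0] < b[0] and a[1] < b[1]) or (a[0] > b[0] and a[1] > b[1]))

def staircase_cliques(n1, n2):
    """Yield maximal cliques of Gx as staircase paths from (0, n2-1) to (n1-1, 0)."""
    def walk(i, j, path):
        path.append((i, j))
        if (i, j) == (n1 - 1, 0):
            yield list(path)
        else:
            if i + 1 < n1:
                yield from walk(i + 1, j, path)  # advance in protein 1
            if j - 1 >= 0:
                yield from walk(i, j - 1, path)  # step back in protein 2
        path.pop()
    yield from walk(0, n2 - 1, [])

def is_clique(lines):
    return all(conflict(a, b) for a, b in combinations(lines, 2))

cliques = list(staircase_cliques(3, 4))
print(len(cliques))                        # 10
print(all(is_clique(c) for c in cliques))  # True
```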
![Page 42: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/42.jpg)
Separation of Clique Inequalities
![Page 43: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/43.jpg)
PROBLEM
There exist exponentially many such cliques ($O(2^{2n})$ inequalities). How do we add them?
![Page 44: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/44.jpg)
SOLUTION
We don’t add them in the original LP, but only when needed at runtime. Not all of them will be needed, so we are fine as long as…
![Page 45: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/45.jpg)
SEPARATION
…we can generate, in polynomial time, a clique inequality when needed, i.e., one violated by the current LP solution x*:
$$\sum_{[i,j] \in M} x^*_{ij} > 1$$
![Page 46: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/46.jpg)
THEOREM
We can find the most violated clique inequality in time $O(n^2)$
![Page 47: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/47.jpg)
Create an n1 × n2 grid
![Page 48: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/48.jpg)
Orient all edges and give them weights $x^*_{iu}$
![Page 49: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/49.jpg)
There is a violated clique inequality iff the longest path from A = (1, n2) to B = (n1, 1) has length > 1
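The separation step can be sketched as a longest-path dynamic program over the grid, visiting each node once, so it runs in O(n1·n2) time. This sketch uses 0-based indices, so A = (0, n2−1) and B = (n1−1, 0), and assumes each grid node (i, j) carries the weight x*_ij, so that the first line A is also counted; the arc-weighted picture in the slides may differ in this detail. A path moves to (i+1, j) or (i, j−1), and the lines it visits form a clique. The function name is hypothetical.

```python
def separate_clique(x):
    """Most violated clique inequality for a fractional LP solution x.

    x[i][j] is the LP value x*_ij of line (i, j), 0-based.
    Returns (weight, clique); the inequality sum_{ij in clique} x_ij <= 1
    is violated iff weight > 1.
    """
    n1, n2 = len(x), len(x[0])
    z = [[0.0] * n2 for _ in range(n1)]      # z[i][j]: heaviest path A -> (i, j)
    pred = [[None] * n2 for _ in range(n1)]  # predecessor on that path
    for i in range(n1):
        for j in range(n2 - 1, -1, -1):       # j decreases along a path
            options = []
            if i > 0:
                options.append((z[i - 1][j], (i - 1, j)))
            if j < n2 - 1:
                options.append((z[i][j + 1], (i, j + 1)))
            if options:
                prev_value, pred[i][j] = max(options)
            else:
                prev_value = 0.0              # (i, j) is A itself
            z[i][j] = prev_value + x[i][j]
    # walk back from B = (n1-1, 0) to A to recover the clique
    clique, node = [], (n1 - 1, 0)
    while node is not None:
        clique.append(node)
        node = pred[node[0]][node[1]]
    return z[n1 - 1][0], clique[::-1]

weight, clique = separate_clique([[0.0, 0.6], [0.6, 0.0]])
print(round(weight, 6), clique)  # 1.2 [(0, 1), (1, 1), (1, 0)]
```

Here the two crossing lines (0, 1) and (1, 0) carry 0.6 each, so the returned clique inequality is violated and would be added as a cut.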
![Page 50: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/50.jpg)
• The method which adds violated inequalities by separation is called BRANCH-and-CUT
• The method can get stuck in long runs of cut additions each of which “cuts very little”
• There is an alternative to this, called COMPACT OPTIMIZATION
![Page 51: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/51.jpg)
Detour:
COMPACT OPTIMIZATION vs BRANCH and CUT
![Page 52: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/52.jpg)
LP Separation Paradigm
Polynomial number of “simple” constraints (e.g. TSP: O(n) degree constraints)
Exponential number of “combinatorial” constraints (e.g. TSP: O(2^n) subtour elimination constraints)
![Page 53: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/53.jpg)
1. Solve the LP with the polynomial number of “simple” constraints + some of the combinatorial ones
2. SEPARATION ALGORITHM: look for a (the most) violated constraint, cast as an OPTIMIZATION problem
3. Does a violated combinatorial constraint exist? YES: add the inequalities and go back to 1. NO: END
![Page 54: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/54.jpg)
The OPTIMIZATION problem is often a classical one, e.g. SHORTEST PATH, MAX FLOW, BIPARTITE MATCHING...
![Page 55: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/55.jpg)
...i.e. the separation problem is AN LP ITSELF!!
![Page 56: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/56.jpg)
Polynomial number of constraints for the separation problem as an LP
![Page 57: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/57.jpg)
Polynomial number of constraints to force that no violated combinatorial constraint exists
![Page 58: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/58.jpg)
Polynomial number of “simple” constraints (original variables x)
+ polynomial number of constraints forcing that no violated combinatorial constraint exists (original variables x + new variables y)
= a single LP in the variables x, y
![Page 59: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/59.jpg)
Th: Optimization iff Separation (Grötschel, Lovász, Schrijver, 1981)
Compact Optimization: solve an LP with an exponential number of inequalities by lifting it to a space with a polynomial number of inequalities, solving, and projecting back
Th: Compact Optimization iff Compact Separation (Carr, Lancia, 2002)
Somewhat known (Maculan, Pulleyblank), rediscovered (Carr, Lancia, 00)
![Page 60: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/60.jpg)
The application to contact map comparison
![Page 61: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/61.jpg)
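$$\max \sum_{e \in E_1} \sum_{f \in E_2} y_{ef}$$
$$\sum_{j:\,(i,j) \in E_1} y_{(i,j)(u,v)} \le x_{iu}, \qquad \sum_{j:\,(j,i) \in E_1} y_{(j,i)(u,v)} \le x_{iv} \qquad \forall\, i \in V_1,\ (u,v) \in E_2$$
(and symmetrically for $i \in V_2$, $(u,v) \in E_1$)
$$\sum_{[i,j] \in Q} x_{ij} \le 1 \qquad \text{for all cliques } Q \text{ of mutually intersecting alignment lines}$$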
![Page 62: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/62.jpg)
Separation of cliques: with A = (1, n2) and B = (n1, 1), there is a Q such that x*(Q) > 1 if and only if the longest A–B path in this grid is > 1
Note: the longest path on a GRID can be cast as an LP...
Vars: $z_p$ = length of the longest path up to p.
Constraints: $z_p \ge z_q + \mathrm{len}(q,p)$ for every arc $(q,p)$
![Page 63: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/63.jpg)
![Page 64: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/64.jpg)
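Replacing the clique constraints by the longest-path LP gives a polynomial number of constraints:
$$z_{i,j} \ge z_{i-1,j} + x_{ij}, \qquad z_{i,j} \ge z_{i,j+1} + x_{ij} \qquad \forall\, i \in V_1,\ j \in V_2$$
$$z_{1,n_2} \ge 0, \qquad z_{n_1,1} \le 1$$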
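To see why lifting pays off here, one can compare model sizes. This back-of-the-envelope sketch assumes that maximal cliques are exactly the staircase paths through the grid from (1, n2) to (n1, 1), so that there are $\binom{n_1+n_2-2}{n_1-1}$ clique inequalities, and counts the compact model generously as two path constraints per grid node plus the two bounds on z. The function name is hypothetical.

```python
import math

def clique_vs_compact(n1, n2):
    """Count maximal-clique inequalities vs. the size of the compact z-model.

    Returns (number of clique inequalities, number of z variables,
    upper bound on the number of compact constraints).
    """
    cliques = math.comb(n1 + n2 - 2, n1 - 1)
    z_variables = n1 * n2
    compact_rows = 2 * n1 * n2 + 2
    return cliques, z_variables, compact_rows

print(clique_vs_compact(3, 4))  # (10, 12, 26)
cliques, _, rows = clique_vs_compact(100, 100)
print(f"{cliques:.2e} clique inequalities vs {rows} compact rows")
```

For two proteins of 100 residues the clique family has on the order of 10^58 members, while the compact model stays at about twenty thousand rows.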
![Page 65: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/65.jpg)
What about Heuristics? Genetic algorithms
![Page 66: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/66.jpg)
Genetic Algorithm Overview
• A population of candidate solutions that evolve (improve) over time
• Recombination creates new candidate solutions via crossover and mutation
(Diagram: population at time t, recombination operators, evaluation function, population at time t+1)
![Page 67: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/67.jpg)
Crossover
• Crossover selects pieces from both parents and creates two offspring solutions
  – Select a set of edges in one parent to copy to the child
  – Copy as many edges as possible from the other parent
  – Add random edges to fill any remaining space
(Diagram: blue parent, red parent, offspring)
![Page 68: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/68.jpg)
![Page 69: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/69.jpg)
![Page 70: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/70.jpg)
![Page 71: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/71.jpg)
These edges conflict with existing edges and are not copied
![Page 72: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/72.jpg)
![Page 73: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/73.jpg)
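The crossover steps can be sketched on alignments stored as lists of 0-based lines (i, j). This is an illustration under stated assumptions, not the authors' implementation: the "set of edges" taken from the first parent is assumed to be a random contiguous block, "as many edges as possible" is approximated greedily, the number of random fill attempts is arbitrary, and lines sharing an endpoint are treated as conflicting. All names are hypothetical.

```python
import random

def compatible(a, b):
    """Two lines can coexist if they neither cross nor share an endpoint."""
    return (a[0] < b[0] and a[1] < b[1]) or (a[0] > b[0] and a[1] > b[1])

def fits(line, child):
    return all(compatible(line, other) for other in child)

def crossover(first, second, n1, n2, rng, fill_attempts=20):
    """Build one offspring alignment from two non-crossing parent alignments."""
    child = []
    # 1. copy a random contiguous block of lines from the first parent
    if first:
        lines = sorted(first)
        start = rng.randrange(len(lines))
        end = rng.randrange(start, len(lines)) + 1
        child.extend(lines[start:end])
    # 2. greedily copy every line of the second parent that still fits
    for line in sorted(second):
        if fits(line, child):
            child.append(line)
    # 3. try a few random lines to fill any remaining space
    for _ in range(fill_attempts):
        line = (rng.randrange(n1), rng.randrange(n2))
        if fits(line, child):
            child.append(line)
    return sorted(child)

def two_offspring(parent_a, parent_b, n1, n2, rng):
    """Two children: one seeded from each parent."""
    return (crossover(parent_a, parent_b, n1, n2, rng),
            crossover(parent_b, parent_a, n1, n2, rng))

rng = random.Random(0)
child_ab, child_ba = two_offspring([(0, 0), (1, 1), (2, 2)], [(0, 1), (1, 2), (3, 3)], 4, 4, rng)
print(child_ab, child_ba)
```

Because every copied or random line is checked against the lines already in the child, each offspring is again a valid non-crossing alignment.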
![Page 74: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/74.jpg)
Mutation
• Mutation introduces small changes to existing solutions by shifting edge endpoints
![Page 75: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/75.jpg)
  – Select a set of endpoints to shift
![Page 76: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/76.jpg)
![Page 77: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/77.jpg)
This edge “fell off” the end of the contact map and is removed
![Page 78: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/78.jpg)
  – Randomly add new edges
![Page 79: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/79.jpg)
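The mutation operator can be sketched in the same representation (0-based lines (i, j)). Which endpoints are shifted, by how much, and how many random lines are tried are assumptions made for illustration: here the protein-2 endpoints of a random suffix of lines move by ±1, lines that fall off the end of the contact map or now conflict are dropped, and a few random lines are then added. Names are hypothetical.

```python
import random

def compatible(a, b):
    """Two lines can coexist if they neither cross nor share an endpoint."""
    return (a[0] < b[0] and a[1] < b[1]) or (a[0] > b[0] and a[1] > b[1])

def mutate(alignment, n1, n2, rng, add_attempts=5):
    """Shift protein-2 endpoints of a random suffix of lines, then add random lines."""
    lines = sorted(alignment)
    if lines:
        k = rng.randrange(len(lines))   # lines k, k+1, ... get shifted endpoints
        delta = rng.choice((-1, 1))
        lines = lines[:k] + [(i, j + delta) for i, j in lines[k:]]
    mutant = []
    for line in lines:
        # drop lines that fell off the end of the contact map or now conflict
        if 0 <= line[1] < n2 and all(compatible(line, other) for other in mutant):
            mutant.append(line)
    for _ in range(add_attempts):       # randomly add new lines
        line = (rng.randrange(n1), rng.randrange(n2))
        if all(compatible(line, other) for other in mutant):
            mutant.append(line)
    return sorted(mutant)

rng = random.Random(3)
print(mutate([(0, 0), (1, 1), (2, 3)], 4, 4, rng))
```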
![Page 80: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/80.jpg)
Computational Results
![Page 81: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/81.jpg)
Compact Optimization vs Separation

| PROT1 | PROT2 | n | m | Sep. cols | Sep. rows | Sep. nLPs | Sep. time | Compact cols | Compact rows | Compact nLPs | Compact time | Speedup |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1b3c | 1svf | 92 | 101 | 4359 | 6085 | 1944 | 28072 | 6474 | 10225 | 1 | 207 | 136x |
| 1nmg | 1svf | 92 | 112 | 4722 | 6932 | 2371 | 38877 | 6837 | 11072 | 1 | 314 | 124x |
| 1svf | 2b3c | 92 | 101 | 4359 | 6085 | 2118 | 26996 | 6474 | 10225 | 1 | 232 | 116x |
| 1bw5 | 1svf | 96 | 91 | 4209 | 5393 | 1776 | 14010 | 6504 | 9889 | 1 | 136 | 103x |
| 1bct | 1hlh | 105 | 104 | 5354 | 7196 | 1477 | 18159 | 8104 | 12593 | 1 | 186 | 98x |
| 1bw5 | 1joy | 100 | 115 | 5805 | 8055 | 2434 | 62426 | 8304 | 12955 | 1 | 663 | 94x |
| 1svf | 1szt | 97 | 101 | 4584 | 6349 | 1118 | 9744 | 6924 | 10934 | 1 | 142 | 69x |
| 1svf | 2new | 93 | 91 | 4074 | 5624 | 1349 | 10147 | 6234 | 9853 | 1 | 149 | 68x |
| 1joy | 1svf | 94 | 90 | 4086 | 5667 | 1174 | 6719 | 6291 | 9985 | 1 | 115 | 58x |
| 1f22 | 1svf | 93 | 88 | 3975 | 5423 | 827 | 4937 | 6135 | 9652 | 1 | 110 | 45x |
| 1hlh | 2new | 98 | 120 | 5996 | 10658 | 1126 | 28941 | 8396 | 13112 | 1 | 829 | 35x |
| 1qr9 | 1svf | 100 | 105 | 4851 | 6738 | 454 | 3328 | 7326 | 11590 | 1 | 116 | 29x |
| 1mdy | 1svf | 100 | 89 | 4323 | 5545 | 558 | 2382 | 6798 | 10397 | 1 | 83 | 29x |
| 1bhb | 1svf | 97 | 104 | 4683 | 6352 | 442 | 2860 | 7023 | 10937 | 1 | 165 | 17x |
| 1bct | 1svf | 100 | 75 | 3861 | 4662 | 316 | 739 | 6336 | 9514 | 1 | 54 | 14x |
| 1tn9 | 1bmr | 109 | 191 | 12068 | 15432 | 328 | 73449 | 15036 | 21164 | 1 | 5185 | 14x |
| 1sfc | 1svf | 101 | 86 | 4269 | 5773 | 173 | 513 | 6789 | 10714 | 1 | 49 | 10x |
| 1svf | 1wdc | 99 | 82 | 4047 | 5318 | 170 | 427 | 6477 | 10081 | 1 | 61 | 7x |
| 1fza | 1fzb | 145 | 194 | 14664 | 21142 | 144 | 7811 | 19920 | 31511 | 1 | 1471 | 5x |
| 1ehj | 1f22 | 100 | 116 | 5851 | 8346 | 65 | 284 | 8347 | 13240 | 1 | 96 | 3x |
![Page 82: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/82.jpg)
• 269 proteins
  – 64 to 72 residues
  – 80 to 140 contacts
• Selected 597 pairs of proteins out of 36046 possible
  – roughly as many similar pairs as dissimilar pairs
![Page 83: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/83.jpg)
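| Optimality gap | 0 | 1 | 2 | 3 | 4 | 5 | > 5 |
|---|---|---|---|---|---|---|---|
| Number of instances | 42 | 48 | 72 | 71 | 76 | 95 | 193 |
| Average/max num. residues | 66.4/69 | 66.8/72 | 66.7/71 | 67.0/72 | 67.0/71 | 66.8/72 | 66.8/72 |
| Average/max num. contacts | 61.1/92 | 56.3/89 | 57.3/93 | 59.7/95 | 61.5/88 | 64.7/89 | 71.4/133 |
| Num. GA best | 38 | 44 | 63 | 61 | 64 | 74 | 155 |
| Num. LS1 best | 25 | 20 | 35 | 31 | 33 | 35 | 82 |
| Num. LS2 best | 5 | 0 | 0 | 1 | 5 | 12 | 53 |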
![Page 84: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/84.jpg)
Skolnick Clustering Test
![Page 85: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/85.jpg)
Skolnick Results
• Four Families
1 Flavodoxin-like fold, Che-Y related
2 Plastocyanin
3 TIM Barrel
4 Ferritin
Family 1:
• alpha-beta
• 8 structures
• up to 124 residues
• 15-30% sequence similarity
• < 3Å RMSD
![Page 86: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/86.jpg)
Family 2:
• beta
• 8 structures
• up to 99 residues
• 35-90% sequence similarity
• < 2Å RMSD
![Page 87: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/87.jpg)
Family 3:
• alpha-beta
• 11 structures
• up to 250 residues
• 30-90% sequence similarity
• < 2Å RMSD
![Page 88: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/88.jpg)
Family 4:
• alpha
• 6 structures
• up to 170 residues
• 7-70% sequence similarity
• < 4Å RMSD
![Page 89: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/89.jpg)
| Family | Style | Residues (max) | Seq. sim. | RMSD | Proteins |
|---|---|---|---|---|---|
| 1 | alpha-beta | 124 | 15-30% | < 3 Å | 1b00, 1dbw, 1nat, 1ntr, 1qmp, 1rnl, 3cah, 4tmy |
| 2 | beta | 99 | 35-90% | < 2 Å | 1baw, 1byo, 1kdi, 1nin, 1pla, 3b3i, 2pcy, 2plt |
| 3 | alpha-beta | 250 | 30-90% | < 2 Å | 1amk, 1aw2, 1b9b, 1btm, 1hti, 1tmh, 1tre, 1tri, 1ydv, 3ypi, 8tim |
| 4 | alpha | 170 | 7-70% | < 4 Å | 1b71, 1bcf, 1dps, 1fha, 1ier, 1rcd |
![Page 90: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/90.jpg)
Clustering
Define
$$0 \le \text{score}(P_1, P_2) = \frac{\#\text{ shared contacts}}{\min(\#\text{ contacts of } P_1,\ \#\text{ contacts of } P_2)} \le 1$$
Put P1, P2 in the same family if score(P1, P2) ≥ threshold
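The clustering rule is a one-line formula. A minimal sketch, assuming the number of shared contacts comes from an alignment computed beforehand; the function names and the example numbers are hypothetical, and the threshold is a free parameter (0.35 is used below only for illustration).

```python
def contact_score(shared, contacts_p1, contacts_p2):
    """score(P1, P2) = shared contacts / min(#contacts of P1, #contacts of P2)."""
    smallest = min(contacts_p1, contacts_p2)
    if smallest == 0:
        raise ValueError("both proteins need at least one contact")
    return shared / smallest

def same_family(shared, contacts_p1, contacts_p2, threshold):
    """Put P1 and P2 in the same family if their score reaches the threshold."""
    return contact_score(shared, contacts_p1, contacts_p2) >= threshold

print(contact_score(30, 60, 90))      # 0.5
print(same_family(30, 60, 90, 0.35))  # True
print(same_family(10, 60, 90, 0.35))  # False
```

Normalizing by the smaller contact count keeps the score in [0, 1], since an alignment can never share more contacts than the smaller protein has.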
![Page 91: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/91.jpg)
Clustering validation
We got some known families from biologists, PDB.
Experiment: take a family F of proteins and align its members against each other and against the remaining proteins.
![Page 92: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/92.jpg)
TYPICAL BEHAVIOUR

| score | proteins were… |
|---|---|
| 0.05 | MISMATCH |
| 0.1 | MISMATCH |
| 0.15 | MISMATCH |
| 0.2 | MISMATCH |
| 0.25 | MISMATCH |
| 0.3 | MISMATCH |
| 0.35 | MATCH |
| … | … |
| 1.0 | MATCH |
![Page 93: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/93.jpg)
Skolnick Results
• Performance
– 528 alignments (all pairs of the 33 proteins in the four families: 33·32/2 = 528)
– 1.3% false negatives
– 0.0% false positives
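Given pairwise scores, a threshold, and the known family of each protein, the false negative and false positive rates can be counted as below. This is a minimal sketch: the function name `error_rates` and the toy data are hypothetical, and it divides by the total number of aligned pairs, which is only one reading of the slide (the slide does not state the denominator).

```python
from itertools import combinations


def error_rates(labels: list, scores: dict, threshold: float) -> tuple:
    """Return (false negative rate, false positive rate) over all aligned pairs.

    labels[i] is the known family of protein i; scores[(i, j)] with i < j is the
    score of the pair. A pair is predicted "same family" iff score >= threshold.
    """
    pairs = list(combinations(range(len(labels)), 2))
    fn = fp = 0
    for i, j in pairs:
        same = labels[i] == labels[j]
        predicted = scores[(i, j)] >= threshold
        if same and not predicted:
            fn += 1  # true relatives scored below the threshold
        elif predicted and not same:
            fp += 1  # unrelated proteins scored at or above the threshold
    return fn / len(pairs), fp / len(pairs)


labels = ["A", "A", "B", "B"]
scores = {(0, 1): 0.80, (0, 2): 0.10, (0, 3): 0.20,
          (1, 2): 0.15, (1, 3): 0.05, (2, 3): 0.30}
print(error_rates(labels, scores, threshold=0.35))
```

Here the pair (2, 3) belongs to the same family but scores 0.30, so it is the single false negative out of six pairs; no unrelated pair reaches 0.35.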
![Page 94: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/94.jpg)
The Lagrangian Relaxation Approach (RECOMB)
![Page 95: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/95.jpg)
- RECOMB-1 main merits:
showing that optimality is achievable for a rigorous, well-defined and accepted similarity measure
providing a way to obtain bounds on the optimal value
- RECOMB-1 main drawbacks:
works only for small proteins (60 residues, 90 contacts)
can be slow and complex to implement: relies on LP
![Page 96: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/96.jpg)
The RECOMB-2 approach:
Can solve larger instances
Easier to implement (no LP)
Faster (no LP)
Provides good heuristic solutions
Provides bounds on the optimum
![Page 97: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/97.jpg)
Side by side

| | OLD | NEW |
|---|---|---|
| Bound type | LP | Lagrangian |
| Max proved optimal | 60 resid., 80 contacts | 150 resid., 250 contacts |
| < 10% in < 1 hr | 100 res., 200 cont. | 1000 res., 2000 cont. |
| Max B&B root time | 2 hrs | 5 min |
![Page 98: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/98.jpg)
Integer Quadratic Programming Formulation
![Page 99: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/99.jpg)
Node to Node Variables
0-1 VARIABLES: for each possible line l = (i, j), where i is a residue of the first protein and j a residue of the second, a variable x_l.
CONSTRAINTS: the lines must not cross. For each clique C (a set of mutually crossing lines), at most one line of C can be chosen.
OBJECTIVE: pick pairs of lines aligning residues in contact. Define constant coefficients b: b(l, m) = 1 if lines l and m align two residues in contact in the first protein with two residues in contact in the second, and b(l, m) = 0 otherwise. Then b(l, m) x_l x_m is 1 iff alignment lines l and m are both chosen and give a sharing (a shared contact).

$$\text{(iii)}\quad \max \sum_{l}\sum_{m} b(l,m)\, x_l\, x_m$$

subject to

$$\text{(ii)}\quad \sum_{l \in C} x_l \le 1 \quad \text{for each clique } C, \qquad \text{(i)}\quad x_l \ge 0,\ \text{integer}.$$

Together, (i) and (ii) make every x_l a 0-1 variable.
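To make the objective concrete, the sketch below evaluates it for a given set of lines: it checks that the lines form a noncrossing matching and counts the pairs of lines l = (i1, j1), m = (i2, j2) with b(l, m) = 1, i.e. i1, i2 in contact in the first protein and j1, j2 in contact in the second. Contact maps are assumed to be sets of residue-index pairs; the function names and the data are hypothetical, and each unordered pair of lines is counted once.

```python
from itertools import combinations


def is_noncrossing(lines):
    """True iff the lines (i, j) use each residue at most once and no two lines cross."""
    ordered = sorted(lines)
    # After sorting by i, a noncrossing matching is strictly increasing in both i and j.
    return all(i1 < i2 and j1 < j2 for (i1, j1), (i2, j2) in zip(ordered, ordered[1:]))


def in_contact(contacts, a, b):
    """Contacts are stored as unordered residue pairs."""
    return (a, b) in contacts or (b, a) in contacts


def shared_contacts(lines, contacts1, contacts2):
    """Objective value: number of unordered line pairs (l, m) with b(l, m) = 1."""
    if not is_noncrossing(lines):
        raise ValueError("lines must form a noncrossing matching")
    return sum(
        1
        for (i1, j1), (i2, j2) in combinations(lines, 2)
        if in_contact(contacts1, i1, i2) and in_contact(contacts2, j1, j2)
    )


contacts1 = {(0, 3), (1, 4)}          # contact map of protein 1 (hypothetical)
contacts2 = {(0, 2), (1, 3), (0, 3)}  # contact map of protein 2 (hypothetical)
lines = [(0, 0), (1, 1), (3, 2), (4, 3)]
print(shared_contacts(lines, contacts1, contacts2))  # prints 2
```

The two shared contacts come from the line pairs ((0, 0), (3, 2)) and ((1, 1), (4, 3)).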
![Page 112: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/112.jpg)
The main idea
![Page 113: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/113.jpg)
We are looking for a noncrossing matching that maximizes a quadratic function:

$$A = \max \sum_i \sum_j b_{ij}\, x_i\, x_j \quad \text{s.t. } x \text{ is a noncrossing matching}$$

Hard! What if the function were linear?

$$B = \max \sum_i b'_i\, x_i \quad \text{s.t. } x \text{ is a noncrossing matching}$$

Easy! It is the same as the sequence alignment problem (e.g. aligning a t c t c g with c g t c): dynamic programming, O(n1 × n2).
The idea is then to find b' such that A ≈ B. In fact, it is always A ≤ B, so B is an upper bound on the optimum.
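The "easy" linear problem B is a maximum-weight noncrossing matching, and it is solved by the same recurrence as sequence alignment. A minimal sketch, assuming the weights b'_ij are given as an n1 × n2 list of lists; the function name `max_noncrossing_matching` is hypothetical. It runs in O(n1 × n2) time and returns the optimal value together with one optimal set of lines.

```python
def max_noncrossing_matching(w):
    """Maximize the sum of w[i][j] over a noncrossing matching of rows to columns.

    V[i][j] = best value using residues 0..i-1 of protein 1 and 0..j-1 of protein 2:
        V[i][j] = max(V[i-1][j], V[i][j-1], V[i-1][j-1] + w[i-1][j-1])
    i.e. global alignment with zero gap cost.
    """
    n1, n2 = len(w), (len(w[0]) if w else 0)
    V = [[0.0] * (n2 + 1) for _ in range(n1 + 1)]
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            V[i][j] = max(V[i - 1][j], V[i][j - 1], V[i - 1][j - 1] + w[i - 1][j - 1])

    # Trace back one optimal matching, preferring to skip a residue over using a line.
    lines, i, j = [], n1, n2
    while i > 0 and j > 0:
        if V[i][j] == V[i - 1][j]:
            i -= 1
        elif V[i][j] == V[i][j - 1]:
            j -= 1
        else:  # the only remaining option is the diagonal move: line (i-1, j-1) is used
            lines.append((i - 1, j - 1))
            i, j = i - 1, j - 1
    return V[n1][n2], lines[::-1]


w = [[3, 0, 0], [0, 1, 4], [0, 0, 1]]
print(max_noncrossing_matching(w))  # prints (7.0, [(0, 0), (1, 2)])
```

With w[i][j] = 1 for equal letters and 0 otherwise, the same function returns the length of a longest common subsequence, e.g. 3 for the strings atctcg and cgtc.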
![Page 124: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/124.jpg)
Lagrangian Relaxation
This b' is found by Lagrangian relaxation of a formulation of the model.
L.R. consists of removing constraints and putting them into the objective function, weighted by some penalties (Lagrange multipliers).
It is a successful technique for large optimization problems.
We skip the technical details.
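The slides skip the details of the relaxation used for contact maps. As a generic illustration of the technique only (a swapped-in textbook example, not the authors' relaxation), the sketch below removes the capacity constraint of a tiny 0-1 knapsack problem, puts it into the objective weighted by a penalty λ ≥ 0, and lowers the resulting upper bound with subgradient steps. The names `lagrangian_bound` and `brute_force` and the data are hypothetical.

```python
from itertools import product


def lagrangian_bound(c, a, cap, iters=200, step0=1.0):
    """Upper bound for max c.x s.t. a.x <= cap, x in {0,1}^n (toy 0-1 knapsack).

    The constraint is moved into the objective with penalty lam >= 0:
        L(lam) = max_x c.x + lam * (cap - a.x) = lam * cap + sum_i max(0, c_i - lam * a_i)
    For every lam >= 0, L(lam) >= optimum; subgradient steps search for a small L(lam).
    """
    lam, best = 0.0, float("inf")
    for k in range(1, iters + 1):
        # The relaxed subproblem has no coupling constraint: take each item with positive reduced profit.
        x = [1 if ci - lam * ai > 0 else 0 for ci, ai in zip(c, a)]
        value = lam * cap + sum((ci - lam * ai) * xi for ci, ai, xi in zip(c, a, x))
        best = min(best, value)
        # cap - a.x is a subgradient of L at lam; step against it with a diminishing step, keep lam >= 0.
        g = cap - sum(ai * xi for ai, xi in zip(a, x))
        lam = max(0.0, lam - (step0 / k) * g)
    return best


def brute_force(c, a, cap):
    """Exact optimum by enumeration, only for checking the bound on tiny instances."""
    return max(
        sum(ci * xi for ci, xi in zip(c, x))
        for x in product((0, 1), repeat=len(c))
        if sum(ai * xi for ai, xi in zip(a, x)) <= cap
    )


c, a, cap = [10, 7, 5, 3], [5, 4, 3, 2], 8
print(brute_force(c, a, cap), lagrangian_bound(c, a, cap))
```

As with A ≤ B on the slides, every value of the relaxed problem is at least the true optimum, so the returned number is an upper bound.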
![Page 125: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/125.jpg)
Lagrangian Relaxation
The approach yields GOOD FEASIBLE HEURISTIC SOLUTIONS (better than previous methods), namely the noncrossing matchings of the Lagrangian subproblems.
Hence we get LOWER BOUNDS (from these feasible solutions) as well as UPPER BOUNDS (from the relaxation) on the optimum.
![Page 126: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/126.jpg)
Computational Results
![Page 127: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/127.jpg)
Computational Results
• Branch-and-Bound Results
• 269 proteins
– 70-100 residues
– 80 to 140 contacts
• Picked 597 (RECOMB'01) and 10,000 new pairs of proteins out of the 36,046 possible pairs (269·268/2). This would have taken up to a few months with the old method; it took a weekend on a PC.
![Page 128: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/128.jpg)
RECOMB 01

| Optimality gap | 0 | 1 | 2 | 3 | 4 | 5 | > 5 |
|---|---|---|---|---|---|---|---|
| Number of instances | 42 | 48 | 72 | 71 | 76 | 95 | 193 |
| Average/max num. residues | 66.4/69 | 66.8/72 | 66.7/71 | 67.0/72 | 67.0/71 | 66.8/72 | 66.8/72 |
| Average/max num. contacts | 61.1/92 | 56.3/89 | 57.3/93 | 59.7/95 | 61.5/88 | 64.7/89 | 71.4/133 |
| Num. GA best | 38 | 44 | 63 | 61 | 64 | 74 | 155 |
| Num. LS1 best | 25 | 20 | 35 | 31 | 33 | 35 | 82 |
| Num. LS2 best | 5 | 0 | 0 | 1 | 5 | 12 | 53 |
![Page 129: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/129.jpg)
Lagrangian
![Page 130: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/130.jpg)
Further use of Lagrangian
We pushed our algorithm to its limit by optimally aligning very large proteins that are known to be very similar.
For instance, we optimally aligned a protein with 891 residues and 1944 contacts to a protein with 887 residues and 1937 contacts.
![Page 131: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/131.jpg)
Skolnick Clustering Test
![Page 132: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/132.jpg)
Clustering validation
• The same four families as in the Skolnick Results table above: 1 Flavodoxin-like fold (CheY-related), 2 Plastocyanin, 3 TIM Barrel, 4 Ferritin
- Fix a similarity level and define Pi and Pj to be in the same family iff score(Pi, Pj) is at least that level.
![Page 133: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/133.jpg)
INVESTIGATING THE CONTACT THRESHOLD
At a low threshold there are no contacts.
At a high threshold, all pairs are in contact.
Hence, interesting contact maps (those that highlight similarity) lie somewhere in between.
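A contact map at a given distance threshold can be computed from one representative point per residue. The sketch assumes C-alpha coordinates and Euclidean distance, and skips residues fewer than `min_sep` positions apart along the chain; these choices, the default value and the function name are assumptions, not necessarily those used in the experiments. Sweeping the threshold shows the effect described above.

```python
import math


def contact_map(coords, threshold, min_sep=2):
    """Set of residue pairs (i, j), i < j, whose points lie within `threshold` angstroms.

    Pairs closer than `min_sep` along the chain are skipped (hypothetical default).
    """
    contacts = set()
    n = len(coords)
    for i in range(n):
        for j in range(i + min_sep, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                contacts.add((i, j))
    return contacts


# Toy straight chain with 3.8 Å between consecutive residues (hypothetical data).
coords = [(3.8 * k, 0.0, 0.0) for k in range(6)]
for t in (1.0, 8.0, 100.0):
    print(t, len(contact_map(coords, t)))
```

At 1 Å there are no contacts, at 8 Å only residues two apart are in contact, and at 100 Å every allowed pair is.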
We performed an experiment: vary the contact threshold and the similarity level and align, until we can retrieve the clusters.
Clustering a 0-1 similarity matrix amounts to finding a block-diagonal structure (done by TSP).
We found the best results for a threshold of 7-8 Å and a similarity level of about 65%.
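Reordering a 0-1 similarity matrix so that families appear as diagonal blocks can be phrased as a TSP over the rows: similar rows should end up adjacent. The sketch below uses a greedy nearest-neighbor tour with Hamming distance as a cheap stand-in; it is not an exact TSP solver and not necessarily the procedure used in the experiments. The function names and the labels are hypothetical.

```python
def hamming(u, v):
    """Number of positions where two 0-1 rows differ."""
    return sum(a != b for a, b in zip(u, v))


def nn_order(M):
    """Greedy nearest-neighbor tour over the rows of a 0-1 matrix (a TSP heuristic).

    Ties are broken by the smaller row index.
    """
    n = len(M)
    if n == 0:
        return []
    order, left = [0], set(range(1, n))
    while left:
        last = order[-1]
        nxt = min(left, key=lambda r: (hamming(M[last], M[r]), r))
        order.append(nxt)
        left.remove(nxt)
    return order


def permute(M, order):
    """Apply the same permutation to rows and columns."""
    return [[M[i][j] for j in order] for i in order]


labels = [0, 1, 0, 2, 1, 0, 2]  # hidden families (hypothetical)
M = [[int(a == b) for b in labels] for a in labels]
order = nn_order(M)
print(order)
for row in permute(M, order):
    print(row)
```

For this perfectly clustered toy matrix, rows of the same family are identical, so the tour visits each family contiguously and the permuted matrix is block diagonal; real score matrices are noisier.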
![Page 134: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/134.jpg)
We find something like this
![Page 135: Protein structure comparison and contact maps](https://reader038.fdocuments.us/reader038/viewer/2022110213/56814626550346895db333ba/html5/thumbnails/135.jpg)
Possible practical project: a server for contact map alignment
- The user inputs PDB proteins and a threshold and retrieves the alignment
- It can also compute contact maps from PDB files
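Computing a contact map from a PDB file starts with extracting one point per residue. A minimal sketch, assuming C-alpha atoms and the fixed-column layout of PDB `ATOM` records (atom name in columns 13-16, chain identifier in column 22, x/y/z in columns 31-38, 39-46 and 47-54). It ignores alternate locations and `HETATM` records, keeps only the first model, and the function name `ca_coords` is hypothetical.

```python
from typing import List, Optional, Tuple


def ca_coords(pdb_text: str, chain: Optional[str] = None) -> List[Tuple[float, float, float]]:
    """(x, y, z) of each C-alpha atom in ATOM records of the first model, in file order."""
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # stop after the first model
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            if chain is not None and line[21] != chain:
                continue  # column 22 holds the chain identifier
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords
```

The resulting list can be fed directly to a distance-threshold contact map computation.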
Joint work with
- Sorin Istrail (Caltech)
- Bob Carr (Sandia Natl Labs)
- Brian Walenz (Celera Genomics)
- Alberto Caprara (University of Bologna)
Multiple contact map alignment
Research project: