Euclidean Distance Geometry - École Polytechniqueliberti/teaching/dgp/... · 2015. 12. 3. ·...
Transcript of Euclidean Distance Geometry - École Polytechniqueliberti/teaching/dgp/... · 2015. 12. 3. ·...
Euclidean Distance Geometry
Leo Liberti
CNRS LIX Ecole Polytechnique, France
Institut Pasteur, 28-30 April 2015
[L., Lavor: Introduction to Euclidean Distance Geometry, in preparation]
Table of contents
1. Applications
2. Definition
3. Complexity primer
4. Complexity of the DGP
5. Number of solutions
6. Mathematical optimization formulations
7. Realizing complete graphs
8. The Branch-and-Prune algorithm
9. Symmetry in the KDMDGP
10. Tractability of protein instances
11. Finding vertex orders
12. Approximate realizations
2
Applications
1. Applications
2. Definition
3. Complexity primer
4. Complexity of the DGP
5. Number of solutions
6. Mathematical optimization formulations
7. Realizing complete graphs
8. The Branch-and-Prune algorithm
9. Symmetry in the KDMDGP
10. Tractability of protein instances
11. Finding vertex orders
12. Approximate realizations
3
Clock Synchronization
5
7
3
4
Atomic clock (S)16:27
ChuckAlice
Bob
↓
16:20 16:22 16:24 16:26 16:28 16:30
A C S B
16:21 16:23 16:25 16:27 16:29 16:31
[Singer, 2011]
4
Sensor network localization
0
29
0 / 2.50781
36
1 / 1.97906
51
2 / 1.13649
52
3 / 2.92955
804 / 2.5548
915 / 2.4284
175 / 2.49225177 / 1.4276
87
176 / 2.73147
211 / 0.989917214 / 2.31786
49
210 / 2.83291
84
213 / 3.02018
66 212 / 1.56417
268 / 1.76662
267 / 2.31042
271 / 1.75417
69269 / 2.0622
73
270 / 3.10479
85
272 / 1.15897
346 / 1.50056
1
18
6 / 1.90126
37
7 / 2.61352
41
8 / 1.6604
45
9 / 1.92913
57
10 / 2.66407
5811 / 0.658861
8312 / 1.32064
110 / 1.29792
111 / 0.840347
113 / 2.14071
114 / 2.03576
47
112 / 2.5425
215 / 2.12848
217 / 3.0822
218 / 1.98508
216 / 2.70003
229 / 2.97748
231 / 1.63275
232 / 2.32966
230 / 2.92252
249 / 1.49322
250 / 1.4467
251 / 3.11231
289 / 2.5056
81
290 / 2.54013
291 / 1.97931
2
5
13 / 1.62227
1414 / 2.01015
17
15 / 2.68422
20
16 / 3.05872
23
17 / 2.48332
4218 / 1.05756
64
19 / 2.21356
67
20 / 2.42912
8821 / 2.34146
93
22 / 1.0662
28 / 2.06104
29 / 1.80061
31 / 1.46243
33 / 1.3817835 / 1.01186
33
30 / 3.15392 6532 / 2.19726
70
34 / 3.17108
82 / 2.05285
83 / 1.9929
84 / 2.90088
85 / 0.965501
87 / 1.40579
88 / 2.51699
86 / 3.19003
98
89 / 2.50718
106 / 1.36517
107 / 2.2605
109 / 2.88336
55
108 / 2.81041
119 / 2.25758
123 / 0.666733
126 / 2.79445
122 / 1.26894
124 / 2.45029
71
125 / 2.61515
127 / 2.5523
56
120 / 2.46168
63
121 / 3.15779
143 / 2.62768
145 / 3.01294146 / 1.62871
89
147 / 3.16128
144 / 2.60777
233 / 1.59139
234 / 3.07459
235 / 1.62072
316 / 1.57213
318 / 2.54084
92
317 / 3.16387
326 / 2.30061
325 / 2.97621
327 / 2.62621
350 / 3.18927
323 / 0.844888
26 / 2.43216
24 / 1.43953
25 / 2.52302
27 / 1.2847
329 / 2.97101
331 / 1.47226
330 / 2.81533
79
339 / 3.17026
195 / 3.02256
196 / 2.72116
194 / 3.16431
53
193 / 0.88823
319 / 1.78484
322 / 3.15941
320 / 1.19371321 / 2.59083
332 / 2.88171
77
333 / 2.5427
6
36 / 2.6581637 / 2.84506
39 / 3.00913
39
38 / 0.977096
40 / 0.147583
90
41 / 2.54617 223 / 1.11811
225 / 1.74291
72
224 / 2.873
257 / 2.65041
7
10
42 / 1.67711
16
43 / 1.28092
3044 / 2.6039944
45 / 1.93798
46 / 0.90946
78
47 / 1.95204
86
48 / 3.1702
9749 / 2.91638
50 / 2.79803
59 / 2.64882
61 / 3.11897
63 / 2.57151
64 / 0.6334465 / 1.47059
15
58 / 2.53596 2160 / 2.72874
59
62 / 2.44092
98 / 2.45974102 / 2.89582
101 / 3.00823
100 / 2.72757
103 / 1.23805
104 / 2.65248
105 / 1.56149
24
99 / 2.76855
182 / 2.80251
183 / 2.17394
178 / 0.684621
184 / 2.04844
186 / 2.36739
179 / 1.75353
181 / 1.27559
62
180 / 2.82612
185 / 2.02673
244 / 2.90515
245 / 2.54461246 / 1.4908
248 / 2.19914
242 / 2.10075
243 / 1.81286
247 / 2.70375
334 / 2.85895
335 / 2.74921
345 / 0.96983896344 / 3.01517
852 / 1.59789
53 / 3.1109
13
51 / 2.96353
54 / 2.6062
94 55 / 2.44249
81 / 2.44372
31
80 / 3.09041
352 / 3.17363
95
353 / 1.69606
9
56 / 2.23711
57 / 2.87413
95 / 2.4309797 / 1.79475
90 / 0.575857
91 / 1.84882
60
92 / 3.1144161
93 / 0.745853
7494 / 1.08674
96 / 1.91462
132 / 2.48921134 / 1.67584
128 / 2.41661
129 / 2.54924
130 / 0.857403131 / 1.03428
133 / 1.33929
295 / 2.78515
296 / 3.03355
297 / 2.79379
292 / 2.23004
293 / 2.59059
76
294 / 2.25052
1122
66 / 2.54335
2567 / 1.59039
26
68 / 2.55644
3569 / 1.22183
4370 / 1.26323
46
71 / 0.65784172 / 1.4554873 / 1.45696
74 / 2.43597
99
75 / 0.710271
135 / 1.03939
137 / 2.64809
138 / 2.36917139 / 2.85603
141 / 1.99344
142 / 0.632129
28136 / 3.03397
140 / 1.63894
150 / 2.05658
151 / 2.68427
152 / 1.33314153 / 1.85365155 / 1.0403
156 / 1.22973
157 / 2.29037
154 / 2.67744
158 / 3.05556
159 / 2.34031
160 / 1.82595
162 / 2.70381
163 / 2.43938
161 / 1.90326
203 / 0.991465
204 / 1.81584
206 / 2.66247
207 / 2.45539
208 / 2.2551
209 / 1.2466
48205 / 3.06568
236 / 1.90515238 / 2.5944239 / 2.72018
240 / 3.16484
241 / 0.720351
237 / 2.87679
252 / 0.86256254 / 0.843728
255 / 2.43018
256 / 1.26966
253 / 2.83326
262 / 0.880856
263 / 3.06774
264 / 1.88647
261 / 1.9719
348 / 2.26962
349 / 2.1111
351 / 3.02766
12
78 / 2.35652
76 / 0.500095
77 / 3.13454
79 / 2.02458
173 / 2.85379
172 / 2.85207
174 / 2.12157
281 / 1.77315
280 / 2.13908
187 / 0.741716
188 / 1.28288
304 / 2.67349
298 / 3.19922
301 / 3.10155
303 / 1.2464
68
299 / 2.68553300 / 2.93307
82302 / 2.70564
306 / 3.166308 / 2.46248
305 / 0.378469
307 / 1.95331
341 / 2.70349340 / 1.87485
354 / 2.04535
149 / 1.71292
32148 / 2.63438
19
117 / 2.4561
118 / 1.55824
115 / 1.40904
50116 / 0.648045
259 / 2.59804
260 / 2.60371
258 / 2.04883
265 / 2.53475
266 / 1.20584
285 / 3.04711
284 / 1.28966
286 / 0.444549287 / 2.49879
283 / 0.739388
282 / 2.4463
288 / 2.40494
312 / 2.0258
313 / 1.03139314 / 2.65429
315 / 1.70638
34
189 / 2.92806
38
190 / 2.07339
40191 / 2.80145
192 / 2.35256
324 / 2.63064
323 / 2.58729
27
171 / 2.23482
167 / 2.38609
164 / 1.92141
165 / 2.6582
166 / 2.28031
168 / 2.41683 169 / 0.955716
170 / 0.967808199 / 2.18351
197 / 0.88776
198 / 1.51159
200 / 0.583974201 / 2.87535
202 / 1.31584
220 / 2.98526
219 / 1.26904
221 / 0.308754
222 / 1.8407
226 / 1.33417227 / 3.09905
228 / 1.31378
328 / 1.65874
338 / 1.28689
337 / 1.83675
75
336 / 3.11488
347 / 3.11676
310 / 2.37803
309 / 1.9722
311 / 1.13749
276 / 3.05065
277 / 2.40579
273 / 2.81885
275 / 3.02541
274 / 2.49444
54
278 / 3.02172
279 / 1.17834
343 / 1.31891
342 / 2.50522
−→
[Yemini, 1978]
5
Protein conformation from NMR data
1
2
3
4
5
9
6
8
10
7
11
−→
[Crippen & Havel 1988]
6
Clock synchronization: solutions
|xA − xB| = 7
|xA − xC| = 3
|xA − xS| = 5
|xB − xC| = 4
xS = 16:27.5
7
3
4
PSfrag
xA
xB
xC
xS
xS = 16:27
xA = 16:22
xB=16:15
xC=16:19
xB=16:15
xC=16:25
xB=16:29
xC=16:19
xB=16:29
xC=16:25
xA = 16:32
xB=16:25
xC=16:29
xB=16:25
xC=16:35
xB=16:39
xC=16:29
xB=16:39
xC=16:35
7
Definition
1. Applications
2. Definition
3. Complexity primer
4. Complexity of the DGP
5. Number of solutions
6. Mathematical optimization formulations
7. Realizing complete graphs
8. The Branch-and-Prune algorithm
9. Symmetry in the KDMDGP
10. Tractability of protein instances
11. Finding vertex orders
12. Approximate realizations
8
Distance Geometry Problem (DGP)
Given: Determine whether ∃:– a simple graph G = (V,E)
– an edge function d : E → R≥0– an integer K ∈ N
a realization x : V → RK s.t.
∀{u, v} ∈ E ‖xu − xv‖2 = duv
1
2
3
4
1
√3
1
1
1
1 1
2
3
4
Let n = |V |9
More applications
• Autonomous underwater vehicles [Bahr et al. 2009]
• Statics of rigid structures [Maxwell 1864]
• Matrix completion [Laurent 2009]
• Statistics [Boer 2013]
• Psychology [Kruskal 1964]
[Liberti et al., SIREV 2014]
10
Complexity primer
1. Applications
2. Definition
3. Complexity primer
4. Complexity of the DGP
5. Number of solutions
6. Mathematical optimization formulations
7. Realizing complete graphs
8. The Branch-and-Prune algorithm
9. Symmetry in the KDMDGP
10. Tractability of protein instances
11. Finding vertex orders
12. Approximate realizations
11
Definitions
• Decision problem: mathematical YES/NO-type question depending
on a parameter vector π
• Instance: same as above with π replaced by given values v
• Certificate: proof that a given answer is true
• P: all decision problems solvable in at most p(|π|) steps
where p is a polynomial
• NP: all decision problems with |YES certificate| ≤ p(|π|)where p is a polynomial
12
Reductions
• P,Q: decision problems
• If ∃ algorithm A which:
1. reformulates instances P of P into instances Q of Q
2. has answer(P ) = YES iff answer(A(P )) = YES
3. is polytime in the instance size |P |
then A is a reduction of P to Q
13
NP-hardness
• Q is NP-hard if every problem in NP reduces to Q
• Q is NP-complete if it is NP-hard and is in NP
Why does it work?
any P in NPpolytime reduction−−−−−−−−−−−−−−−−−−→ Q: how hard?
• Suppose Q easier than P
• Solve P by reducing to Q in polytime and then solve Q
• Then P as easy as Q, against assumption
• ⇒ Q at least as hard as P
So if Q is NP-hard it is as hard as any problem in NP
⇒ Q is as hard as the hardest problem in NP
14
NP-hardness proofs
Given a new problem Q, take any known
NP-hard problem P and reduce it to Q
Why does it work?
P : NP-hardpolytime reduction−−−−−−−−−−−−−−−−−−→ Q: how hard?
• As before: Suppose . . . (etc.) ⇒ Q at least as hard as P
• Since P is NP-hard, it is hardest in NP, and so is Q
⇒ Q is NP-hard
15
Complexity of the DGP
1. Applications
2. Definition
3. Complexity primer
4. Complexity of the DGP
5. Number of solutions
6. Mathematical optimization formulations
7. Realizing complete graphs
8. The Branch-and-Prune algorithm
9. Symmetry in the KDMDGP
10. Tractability of protein instances
11. Finding vertex orders
12. Approximate realizations
16
DGP ∈ NP?
• NP: YES/NO problems with polytime-checkable proofs for YES
• DGP is a YES/NO problem
• DGP1 ∈ NP, since duv = |xu − xv| ⇒ (d ∈ Q→ x ∈ Q)
• Solutions might involve irrational numbers when K > 1
• Some empirical evidence that DGP 6∈ NP [Beeker et al. 2013]
17
The DGP is NP-hard
PartitionGiven a = (a1, . . . , an) ∈ Nn, ∃ I ⊆ {1, . . . , n} s.t.
∑
i∈Iai =
∑
i 6∈Iai ?
• Reduce (NP-hard) Partition to DGP1
• a −→ cycle C with V (C) = {1, . . . , n}, E(C) = {{1,2}, . . . , {n,1}}• For i < n let di,i+1 = ai, and dn,n+1 = dn1 = an
• E.g. for a = (1,4,1,3,3), get cycle graph:
1
2 3
4
5
1
4
1
33
[Saxe, 1979]
18
Partition is YES ⇒ DGP1 is YES
• Given: I ⊂ {1, . . . , n} s.t.∑
i∈Iai =
∑
i 6∈Iai
• Construct: realization x of C in R
1. x1 = 0 // start
2. induction step: suppose xi known
if i ∈ I
let xi+1 = xi + di,i+1 // go right
else
xi+1 = xi − di,i+1 // go left
• Correctness proof: by the same induction
but careful when i = n: have to show xn+1 = x1
19
I = {1,2,3}
0 1 2 3 4 5 6
x1
0 1 2 3 4 5 6
x1 x2
1 ∈ I
0 1 2 3 4 5 6
x1 x2 x3
1 ∈ I2 ∈ I
0 1 2 3 4 5 6
x1 x2 x3 x4
1 ∈ I2 ∈ I
3 ∈ I
0 1 2 3 4 5 6
x1 x2 x3 x4x5
1 ∈ I2 ∈ I
3 ∈ I
4 6∈ I
0 1 2 3 4 5 6
x6 = x1? x2 x3 x4x5
1 ∈ I2 ∈ I
3 ∈ I
4 6∈ I5 6∈ I
Partition is YES ⇒ DGP1 is YES
(1) =∑
i∈I(xi+1 − xi) =
∑
i∈Idi,i+1 =
=∑
i∈Iai =
∑
i 6∈Iai =
=∑
i 6∈Idi,i+1 =
∑
i 6∈I(xi − xi+1) = (2)
(1) = (2)⇒∑
i∈I(xi+1 − xi) =
∑
i 6∈I(xi − xi+1)⇒
∑
i≤n(xi+1 − xi) = 0
⇒ (xn+1 − xn) + (xn − xn−1) + · · ·+ (x3 − x2) + (x2 − x1) = 0
⇒ xn+1 = x1
26
Partition is NO ⇒ DGP1 is NO
• By contradiction: suppose DGP1 is YES, x realization of C
• F = {{u, v} ∈ E(C) | xu ≤ xv}, E(C)rF = {{u, v} ∈ E(C) | xu > xv}• Trace x1, . . . , xn: follow edges in F (→) and in E(C) r F (←)
0 1 2 3-1-2-3
x1 x2x3x4 x5
∑
{u,v}∈F(xv − xu) =
∑
{u,v}6∈F(xu − xv)
∑
{u,v}∈F|xu − xv| =
∑
{u,v}6∈F|xu − xv|
∑
{u,v}∈Fduv =
∑
{u,v}6∈Fduv
• Let J = {i < n | {i, i+1} ∈ F} ∪ {n | {n,1} ∈ F}
⇒∑
i∈Jai =
∑
i 6∈Jai
• So J solves Partition instance, contradiction
• ⇒ DGP is NP-hard, DGP1 is NP-complete
27
Number of solutions
1. Applications
2. Definition
3. Complexity primer
4. Complexity of the DGP
5. Number of solutions
6. Mathematical optimization formulations
7. Realizing complete graphs
8. The Branch-and-Prune algorithm
9. Symmetry in the KDMDGP
10. Tractability of protein instances
11. Finding vertex orders
12. Approximate realizations
28
With congruences
• (G,K): DGP instance
• X ⊆ RKn: set of solutions
• Congruence: composition of translations, rotations, reflections
• C = set of congruences in RK
• x ∼ y means ∃ρ ∈ C (y = ρx):
distances in x are preserved in y through ρ
• ⇒ if |X| > 0, |X| = 2ℵ0
29
Modulo congruences
• Congruence is an equivalence relation ∼ on X
(reflexive, symmetric, transitive)
∼
• Partitions X into equivalence classes
• X = X/∼: sets of representatives of equivalence classes
• Focus on |X| rather than |X|
30
Cardinality of X
• infeasible ⇔ |X| = 0
• rigid graph ⇔ |X| < ℵ0
• globally rigid graph ⇔ |X| = 1
• flexible graph ⇔ |X| = 2ℵ0
• |X| = ℵ0: impossible by Milnor’s theorem
31
Milnor’s theorem implies |X| 6= ℵ0
• System S of polynomial equations of degree 2
∀i ≤ m pi(x1, . . . , xnK) = 0
• Let X be the set of x ∈ RnK satisfying S
• Number of connected components of X is O(3nK)
[Milnor 1964]
• If |X| is countable then G cannot be flexible
⇒ incongruent elements of X are separate connected components
⇒ by Milnor’s theorem, there’s finitely many of them
32
Examples
V 1 = {1,2,3}E1 = {{u, v} | u < v}d1 = 1
x1 x2
x3
ρ congruence in R2
⇒ ρx valid realization
|X| = 1
V 2 = V 1 ∪ {4}E2 = E1 ∪ {{1,4}, {2,4}}d2 = 1 ∧ d14 =
√2
x1 x2
x3
x4
ρ reflects x4 wrt x1, x2
⇒ ρx valid realization
|X| = 2 ( , )
V 3 = V 2
E3 = {{u, u+1}|u ≤ 3}∪ {1,4}d1 = 1
x1 x2
x3x4ρ rotates x2x3, x1x4 by θ
⇒ ρx valid realization
|X| is uncountable
( , , , , . . .)
33
Mathematical optimization formulations
1. Applications
2. Definition
3. Complexity primer
4. Complexity of the DGP
5. Number of solutions
6. Mathematical optimization formulations
7. Realizing complete graphs
8. The Branch-and-Prune algorithm
9. Symmetry in the KDMDGP
10. Tractability of protein instances
11. Finding vertex orders
12. Approximate realizations
34
System of quadratic constraints
∀{u, v} ∈ E ‖xu − xv‖2 = d2uv
• Around 10 vertices
• Computationally useless
35
Quadratic objective
minx∈RnK
∑
{u,v}∈E(‖xu − xv‖2 − d2uv)
2
• Globally optimal value zero iff x is a realization of G
• sBB: 10-100 vertices, exact solutions
• heuristics: 100-1000 vertices, poor quality
[Lavor et al., 2006]
36
Convexity and concavity
maxx∈RnK
∑
{u,v}∈E‖xu − xv‖2
∀{u, v} ∈ E ‖xu − xv‖2 ≤ d2uv
• Convex constraints, concave objective
• Computationally no better than “quadratic objective”
37
Pointwise reformulation
maxx∈RnK
∑
{u,v}∈E,k≤Kθuvk(xuk − xvk)
∀{u, v} ∈ E ‖xu − xv‖2 ≤ d2uv
• Convex subproblem in stochastic iterative heuristics
“guess θ and solve”
• 100-1000 vertices, good quality
[L. IOS14/MAGO14(slides)]
38
SDP formulation
minX�0
∑
{u,v}∈E(Xuu +Xvv − 2Xuv)
∀{u, v} ∈ E Xuu +Xvv − 2Xuv ≥ d2uv
• Similar to those of Ye, Wolkowicz — works better for proteins
• 100 vertices, good quality
[D’Ambrosio et al., in progress]
39
Realizing complete graphs
1. Applications
2. Definition
3. Complexity primer
4. Complexity of the DGP
5. Number of solutions
6. Mathematical optimization formulations
7. Realizing complete graphs
8. The Branch-and-Prune algorithm
9. Symmetry in the KDMDGP
10. Tractability of protein instances
11. Finding vertex orders
12. Approximate realizations
40
Cliques
2-clique
12
3-clique
1
2
3
4-clique
1
2
3
4
(K +1)-clique = K-clique ⊕ a vertex
Given a realization of the K-clique, find the position of the vertex
41
Triangulation
1
2
3
11
2
−→1 2 3
0 1 2
Example: realize triangle on a line
• From ‖x3 − x1‖ = 2 and ‖x3 − x2‖ = 1 get
x23 − 2x1x3 + x21 = 4 (1)
x23 − 2x2x3 + x22 = 1. (2)
• (??)− (??) yields
2x3(x1 − x2) = x21 − x22 − 3
⇒ 2x3 = 4,
• Hence x3 = 2
42
Realizing a (K +1)-clique in RK−1
• Apply triangulation inductively on Kassume x1, . . . , xK ∈ RK−1 known, compute y = xK+1
• K quadratic eqns (∀j ≤ K ‖y − xj‖2 = d2j,K+1) in K − 1 vars
‖y‖2 − 2x1 · y + ‖x1‖2 = d21,K+1 [1]...
...‖y‖2 − 2xK · y + ‖xK‖2 = d2K,K+1 [K]
• Form system ∀j ≤ K − 1 ([j]− [K])
2(x1 − xK) · y = ‖x1‖2 − ‖xK‖2 − d21,K+1 + d2K,K+1 [1]− [K]...
...2(xK−1 − xK) · y = ‖xK−1‖2 − ‖xK‖2 − d2K−1,K+1 + d2K,K+1 [K − 1]− [K]
• This is a (K − 1)× (K − 1) linear system Ay = b
Solve to find y
[Dong, Wu 2002]
43
“Solve”?
1. What if A is singular?
2. Or: A nonsingular but instance is NO
44
Singularity: rkA = K − 2
One row xj − xK of A depends on the others
K = 2 triangle in R1 x1 − x2 = 0x1=x2 x3?x3?
K = 3 4-clique in R2 x1, x2, x3 on a linex1 x2
x3
x4?
x4?
K = 4 5-clique in R3 x1, . . . , x4 in a plane x1x2 x3 x4
x5?
x5?
Trend continues: rk A = K − 2⇒ |X| = 2 (see later)
45
Singularity: rkA = K − 3
Two rows xj − xk depend on the others
K = 3 4-clique in R2 x1 = x2 = x3 x1=x2=x3
x4?
K = 4 5-clique in R3 x1, . . . , x4 on a line
x1
x2
x3
x4
x5?
Trend continues: [Hendrickson, 1992]
Thm. 5.8. If a graph G is connected, flexible and has more than K vertices, X
contains almost always a submanifold diffeomorphic to a circle
46
Hendrickson’s theorem also applies to non-cliques
47
Nonsingular matrix A with NO instance
• Infeasible quadratic system ∀j ≤ K ‖xK+1 − xj‖2 = d2j,K+1
• Take differences, get nonsingular A and value for xK+1
• . . . but it’s wrong!
Shit happens!
Every time you solve the linear system Ay = b
check feasibility with quadratic system
48
Algorithm for realizing complete graphs in RK
• Assume:
(i) G = (V,E) complete
(ii) |V | = n ≥ K +2
(iii) we know x1, . . . , xK+1
• Increase K: we know how to realize xK+2 in RK
• Use this inductively for each i ∈ {K +2, . . . , n}
49
Algorithm for realizing complete graphs in RK
// realize next vertex iteratively
for i ∈ {K +2, . . . , n} do// use (K +1) immediate adjacent predecessors to compute xi
if rkA = K thenxi = A−1b // A, b defined as above
elsexi =∞ // A singular, mark ∞ and exit
breakend if// check that xi is feasible w.r.t. other distances
for {j ∈ N(i) | j < i} doif ‖xi − xj‖ 6= dij then
// if not, mark infeasible and exit loop
xi = ∅break
end ifend forif xi = ∅ then
breakend if
end forreturn x
50
Complexity of Alg. 1
• Outer loop: O(n)
• Rank and inverse of A: O(K3)
• Inner loop: O(n)
• Get O(n2K3)
• But in most applications K is fixed
• Get O(n2)
But how do we find the realization of the first K +1 vertices?
51
Realizing (K +1)-cliques in RK
• Realizing (K +1)-cliques in RK−1 yields “flat simplices”
(e.g. triangles on lines)
• Use “natural” embedding dimension RK
• Same reasoning as above:
get system Ay = b where y = xK+1 and Aj = 2(xj − xK)
• But now A is (K − 1)×K
• Same as previous case with A singular
52
Almost square
How can you solve the following system Ay = b:
a11 a12 . . . a1K... ... . . . ...
aK−1,1 aK−1,2 . . . aK−1,K
y1...
yK
=
b1...
bK−1
where A has one more column than rows and rank K − 1?
53
Basics and nonbasics
• Since rkA = K − 1, ∃ K − 1 linearly independent columns
• B: set of their indices
• N : index of remaining column
• B: (K − 1)× (K − 1) square matrix of columns in B
• ⇒ B is nonsingular
• Can partition columns as A = (B|N)
Column j corresponds to variable yj
• Variables yB are called basic variables
• Variable yN is called nonbasic variable
54
The dictionary
(B|N)y = b
⇒ ByB+NyN = b
⇒ yB = B−1b−B−1NyN
Basics expressed in function of nonbasic
55
One quadratic equation
• From value of yN , can use dictionary to get yB
• Use one quadratic equation
1. Pick any h ∈ {1, . . . ,K − 1}, equation is ‖xh − y‖22 = d2hK
2. y = (yB|yN )⊤
3. Replace yB with B−1b−B−1NyN in equation
4. Solve resulting quadratic equation in one variable yN
5. Get 0,1 or 2 values for yN
6. ⇒ Get 0,1 or 2 positions for xK+1
56
What if B−1N is zero?
• yB = B−1b−B−1NyN reduces to yB = B−1b
• Use one quadratic equation
1. Pick any h ∈ {1, . . . ,K − 1}, equation is ‖xh − y‖22 = d2hK
2. y = (yB|yN )⊤
3. Replace yB with B−1b in equation
4. Solve resulting quadratic equation in one variable yN
5. Get 0,1 or 2 values for yN
6. ⇒ Get 0,1 or 2 positions for xK+1
57
The difference
• B−1N 6= 0: yNdictionary−−−−−−−→ yB
• Different values y+N 6= y−N −→ y+, y− with different components
• B−1N = 0: yBquadratic eqn.−−−−−−−−−−→ yN
• Even if y+N 6= y−N , K − 1 components of y+, y− are equal
aff(x1, . . . , xK−1) = {y ∈ RK | yN = 0}
58
The case of no solutions
• No realizations exist for this (K +1)-clique in RK
• DGP instance is NO
59
The case of one solution
• Assume for simplicity: N = K, h = 1, B−1N 6= 0Then ‖xh − y‖2 = d2h,K+1 becomes:
λy2K − 2µyK + ν = 0, where
λ = 1+∑
ℓ,j<K
β2ℓja
2jK
µ = x1K +∑
ℓ,j<K
βℓjajK(βℓjbℓ − x1ℓ)
ν =∑
ℓ,j<K
βℓjbℓ(βℓjbℓ − 2x1ℓ) + ‖x1‖2 − d21,K+1
• (Exactly one solution for yK) ⇔ µ2 = λν, not a tautology
• The set of all (K + 1)-clique DGP instances in RK s.t. µ2 = λν
has Lebesgue measure 0
• Ignore them, they happen with probability∗ 0!
∗ Assuming continuous distributions over the reals. For floating point number, who knows? . . .
but we’ll ignore these instances anyhow
60
Discriminant > 0, = 0
4
4
4
1
3
1
2
3
1
2
32
61
The case of two solutions
• K spheres SK−11 , . . . , SK−1K in RK
centered at x1, . . . , xK
with radii d1,K+1, . . . , dK,K+1
• xK+1 must be at the intersection of SK−11 , . . . , SK−1K
• If⋂
j SK−1j 6= ∅, then |⋂j S
K−1j | = 2 in general
• will not mention “probability 0” or “in general” anymore
[Coope 2000]
62
Mirror images
• Let x+ = {x1, . . . , xK, x+K+1}, x− = {x1, . . . , xK, x−K+1}assume dim aff(x1, . . . , xK) = K − 1 (†)
• Theorem
x+, x− are reflections w.r.t. hyperplane defined by x1, . . . , xK
• Proof
1. x+, x− congruent by construction
2. ∀i ≤ K xi ∈ x+ ∩ x− → x+, x− not translations
3. |x+ ∩ x−| = K < |x+| = |x−| → x+, x− not rotations by (†)4. ⇒ must be reflections
63
Algorithm for realizing (K +1)-cliques in RK
// realize 1 at the origin
x1 = (0, . . . ,0)
// realize next vertex iteratively
for ℓ ∈ {2, . . . ,K +1} do
// at most two positions in Rℓ−1 for vertex ℓ
S =⋂
i<ℓSℓ−2i
if S = ∅ then
// warn: infeasible
return ∅end if
// arbitrarily choose one of the two points
choose any xℓ ∈ S
end for
// return feasible realization
return x
64
Complexity of Alg. 2
• Outer loop: O(K)
• Gaussian elimination on A: O(K3)
• Some messing about to obtain x+K+1, x−K+1: +O(K2)
• Get O(K4)
• But in most applications K is fixed
• Get O(1)
65
Back to complete graphs
• Alg. 2: realize 1, . . . ,K +1 in RK: O(1)
• Alg. 1: Realize K +2, . . . , n: O(n2)
• ⇒ O(n2)
• What about |X|?– Alg. 1 is deterministic: one solution from x1, . . . , xK+1– Alg. 2 is stochastic: pick one of two values K times
⇒ |X| = 2K
66
Let’s look at sparser graphs
67
K-trilaterative graphs
• In Alg. 1 we only need each v > K + 1 to have K + 1 adjacent
predecessors in order to find a unique solution for xv
• Determination of xv from K+1 adjacent predecessors: K-trilateration
• K-trilaterative graph:
(i) has a vertex order ensuring this property
(ii) the initial K +1 vertices induce a (K +1)-clique
the order is called K-trilateration order
• Alg. 1 realizes all K-trilaterative graphs
The DGP restricted to K-trilaterative graphs in RK is easy
[Eren et al. 2004]
68
The story so far
• Lots of nice applications
• DGP is NP-hard
• May have 0,1, finitely many or 2ℵ0 solutions modulo congruences
• Continuous optimization techniques don’t scale well
• Using K + 1 adjacent predecessors, realize K-trilaterative graphs
in RK in polytime
• Do we need K +1 adjacent predecessors, or can we do with
less?
69
The Branch-and-Prune algorithm
1. Applications
2. Definition
3. Complexity primer
4. Complexity of the DGP
5. Number of solutions
6. Mathematical optimization formulations
7. Realizing complete graphs
8. The Branch-and-Prune algorithm
9. Symmetry in the KDMDGP
10. Tractability of protein instances
11. Finding vertex orders
12. Approximate realizations
70
Fewer adjacent predecessors
• Alg. 2 only needs K adjacent predecessor
• Extend to n vertices: (K − 1)-trilaterative graphs
• Can we realize (K − 1)-trilaterative graphs in RK?
• A small case: graph consisting of two K +1 cliques11
1
5
5
5
222
33
3 44
4
71
Take a closer look. . .
5
5
22
33
44
• Realization of a K +1 clique in RK knowing x1, . . . , xK
• We know how to do that!
• Consistent with 2 solutions for x5, reflected across plane through
x2, x3, x4
72
Discretization and pruning edges
• (K − 1)-trilaterative graph G = (V,E):
∀v > K ∃Uv ⊂ V (|Uv| = K ∧ ∀u ∈ Uv(u < v) ∧ {u, v} ∈ E)
• Discretization edges:
ED = {{u, v} ∈ E | u, v ≤ K}︸ ︷︷ ︸
initial clique
∪ {{u, v} ∈ E | v > K ∧ u ∈ Uv}︸ ︷︷ ︸
vertex order
• Pruning edges EP = E r ED
10
1
2
3
4
5
6
7
8
9
11
12
13
14
15
16
17
1819
73
Role of discretization edges
Missing discretization edge
⇒ non-rigid structure
⇒ X uncountable
Else: X finite
74
Role of pruning edges
No pruning edges: 8 incongruent realizations in R2
1
2
3
4
5 1
2
3
4
5 1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
75
Role of pruning edges
Pruning edge {1,4}: only 4 realizations remain valid
1
2
3
4
5 1
2
3
4
5 1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
76
Motivation
Protein backbones
• Total order < on V
• Covalent bond distances: {u− 1, u} ∈ E
• Covalent bond angles: {u− 2, u} ∈ E
• NMR experiments: {u− 3, u} ∈ E
(and other edges {u, v} with v − u > 3)
Generalize “3” to K
[Lavor et al., COAP 2012]
77
KDMDGP graphs
K = 2 K = 3
1
2
3
4
5
Generalization of protein backbone order:
v > K is adjacent to K immediate predecessors v − 1, . . . , v −K
KDMDGP: Discretizable Molecular Distance Geometry Problem
78
The Branch-and-Prune (BP) algorithm
BP(v, x, X):
1. Given v > K, realization x = (x1, . . . , xv−1)
2. Compute S =⋂
u∈Uv
SK−1u
3. For each xv ∈ S s.t. ∀{u, v} ∈ EP (u < v → ‖xu − xv‖ = duv)
(a) let x = (x, xv)
(b) if v = n add x to X, else call BP(v +1, x, X)
• Recursive: starts with BP(K + 1, (x1, . . . , xK), ∅)
• All realizations in X are incongruent∗
• Can be easily modified to find only p solutions for given p
• Applies to all (K − 1)-trilaterative graphs in RK
• Specialize to KDMDGP graph by setting Uv = {v − 1, . . . , v −K}
∗ with probability 1, and aside from one reflection at v = K +1
[L. et al. ITOR 2008]
79
Complexity of BP
• Most operations are O(Kh) for some fixed h ⇒ O(1)
• Distance check at Step 3: O(n)
• Recursion on at most 2 branches at each call: binary tree
• Only recurse when v > K, v < n: 2n−K nodes
• Overall O(n2n−K) = O(2n)
Worst-case exponential behaviour
80
Hardness of KDMDGP
• The KDMDGP is NP-hard for each K
– every DGP instance is also DMDGP if K = 1
– reduction from Partition can be extended to any K
• (K − 1)-trilateration graphs are NP-hard by inclusion
• No polytime algorithm unless P=NP
Trilaterative graphs in RK are complexitywise borderline at K
81
Correctness
Thm.
When BP terminates, X contains every incongruent realization of G
Proof.
• Let y be any realization of G
• Since G has an initial K-clique, can rotate/translate/reflect y to
y[K] = x[K] for all x ∈ X
• BP exhaustively constructs every extension of x[K] which is feasible
with all distances, so y ∈ X
for a realization y, y[h] = (y1, . . . , yh) is the initial segment of y
82
Two examples
83
Empirical observations
• Fast: up to 10k vertices in a few seconds on 2010 hardware
• Precise: errors in range O(10−9)-O(10−12)
• Number of solutions always a power of 2:
obvious if EP = ∅, but otherwise mysterious
• Linear-time behaviour on proteins:
this really shouldn’t happen
84
Symmetry in the KDMDGP
1. Applications
2. Definition
3. Complexity primer
4. Complexity of the DGP
5. Number of solutions
6. Mathematical optimization formulations
7. Realizing complete graphs
8. The Branch-and-Prune algorithm
9. Symmetry in the KDMDGP
10. Tractability of protein instances
11. Finding vertex orders
12. Approximate realizations
[L. et al. DAM 2014]
85
Partial reflections
• For each v > K, let
gv(x) = (x1, . . . , xv−1, Rvx(xv), . . . , R
vx(xn))
be the partial reflection of x w.r.t. v
xv−3
xv−2
xv−1
• Note: the gv’s are idempotent operators
• GD = (V,ED): subgraph of G given by discretization edges
• ∀v > K reflection Rvx gives a binary choice in general∗
• XD ⊂ RnK contains 2n−K incongruent realizations of GD
∗ subsequent results hold “with probability 1”
86
Discretization group
• GD = 〈gv | v > K〉: the discretization group of G w.r.t. K
subgroup of a Cartesian product of reflection groups
• An element g ∈ GD has the form⊗
v>Kgavv , where av ∈ {0,1}
• Action of GD on XD: g(x) =(
gaK+1K+1 ◦ · · · ◦ gann
)
(x)
87
Commutativity of partial reflections
Lemma A GD is Abelian
Proof Assume K < u < v. Then
gugv(x) = gu(x1, . . . , xv−1, Rvx(xv), . . . , R
vx(xn))
= (x1 . . . , xu−1, Rugv(x)
(xu), . . . , Rugv(x)
Rvx(xv), . . . , R
ugv(x)
Rvx(xn))
= (x1 . . . , xu−1, Rux(xu), . . . , R
vgu(x)
Rux(xv), . . . , R
vgu(x)
Rux(xn))
= gv(x1, . . . , xu−1, Rux(xu), . . . , R
ux(xn))
= gvgu(x)
where equality of these terms holds by a Technical Lemma
(next slide)
[L. et al. 2013]
88
Commutativity of partial reflections
Technical Lemma
(Proof sketch for K = 2) Let y⊥Aff(xv−1, . . . , xv−K) and ρy = Rvx
ρz
ρy
t
O
z
y
ρzy
ρyt
ρzt
ρρzy
ρzρyt = ρρzyρzt
89
One realization generates all others
Lemma B The action of GD on XD is transitive
K = 2 x
y
g
gg
54
3
∃g ∈ GD (y = g(x)): namely, y = g5(g4(g3(x)))
Proof By induction on v: assume result holds to v− 1 with g′, then either
it holds for v and g = g′, else flip and let g = gvg′
[L. et al. 2013]
90
Structure and invariance
• GD is Abelian and generated by n−K idempotent elements
⇒ GD∼= Cn−K
2
• GD ≤ Aut(XD) by construction
91
Solution sets
• X: set of incongruent realizations of G
• GD defined on same vertices but fewer edges
⇒ fewer distance constraints on realizations
⇒ more realizations
• All realizations of G are also realizations of GD
⇒ X ⊆ XD
92
Losing invariance on pruning edges
Lemma C Let Wuv = {u+K +1, . . . , v} be the range of {u, v}∀x ∈ X, u,w, v ∈ V (w ∈Wuv ↔ ‖xu − xv‖ 6= ‖gw(x)u − gw(x)v‖)
Proof sketch for K = 2
u
w v
Corollary If {u, v} ∈ EP and w ∈Wuv, gw(x) 6∈ X
[L. et al. 2013]
93
Pruning group
Define:
Γ = {gw ∈ GD | w > K ∧ ∀{u, v} ∈ EP (w 6∈Wuv)}GP = 〈Γ〉
Lemma D X is invariant w.r.t. GP
Proof
Follows by corollary, invariance of XD w.r.t. GD and X ⊆ XD
94
Transitivity of the pruning group
Lemma E The action of GP on X is transitive
• Given x, y ∈ X, aim to show ∃g ∈ GP (y = g(x))
• Lemma B ⇒ ∃g ∈ GD with y = g(x) ∈ XD
• Suppose g 6∈ GP and aim for a contradiction
• ⇒ ∃{u, v} ∈ EP and w ∈W uv s.t. gw is a component of g
• Lemma C ⇒ ‖gw(x)u − gw(x)v‖ 6= duv
• If w is the only such vertex, y = g(x) 6= x against hypothesis, done
• Suppose ∃ another z ∈W uv s.t. gz is a component of g
• Set of cases s.t. ‖xu − xv‖ = ‖gzgw(x)u − gzgw(x)v‖ given ‖gw(x)u − gw(x)v‖ 6=‖xu − xv‖ 6= ‖gz(x)u − gz(x)v‖ has Lebesgue measure 0 in all DGP inputs
• By induction, holds for any number of components gz of g with z ∈W uv
• ⇒ y = g(x) 6= x against hypothesis, done
[L. et al. 2013]
95
The main result
Theorem |X| = 2|Γ|
• Lemma A ⇒ GD∼= Cn−K
2 ⇒ |GD| = 2n−K
• GP ≤ GD ⇒ ∃ℓ ∈ N (GP∼= Cℓ
2) , with ℓ = |Γ|
• Lemma E ⇒ ∀x ∈ X GPx = X
• Idempotency ⇒ ∀g ∈ GP g−1 = g
⇒ ∀g, h ∈ GP , x ∈ X (gx = hx→ h−1gx = x→ hgx = x→ hg = I → h = g−1= g)
⇒ the mapping GPx→ GP given by gx→ g is injective
• ∀g, h ∈ GP , x ∈ X (g 6= h→ gx 6= hx)
⇒ the mapping gx→ g is surjective
• ⇒ the mapping gx→ g is a bijection
• ⇒ |GPx| = |GP |• ⇒ ∀x ∈ X |X| = |GPx| = |GP | = 2|Γ|
[L. et al. 2013]
96
Symmetry-aware BP
• Don’t need to explore all branches of BP tree
• Build Γ as a pre-processing step
• Run BP, terminating as soon as |X| = 1
• For each g ∈ GP , compute gx
[Mucherino et al. JBCB 2012]
97
Complexity
• Computing Γ: O(mn)
1. initialize indicator vector ι = (ιK+1, . . . , ιn) for gv ∈ Γ
2. initialize ι = 1
3. for each {u, v} ∈ EP and w ∈Wuv let ιw = 0
• BP: O(2n)
• Compute gx for each g ∈ GP : O(2|Γ|)
• Overall: O(2n)
• Gains depend on the instance
98
Tractability of protein instances
1. Applications
2. Definition
3. Complexity primer
4. Complexity of the DGP
5. Number of solutions
6. Mathematical optimization formulations
7. Realizing complete graphs
8. The Branch-and-Prune algorithm
9. Symmetry in the KDMDGP
10. Tractability of protein instances
11. Finding vertex orders
12. Approximate realizations
[L. et al. 2013]
99
Let’s handle the BP tree
1
2
3
4 29
5 17
6 13
7 12
8
9
10
11
14 15
16
18 22
19 21
20
23 24
25
26
27
28
30 42
31 38
32 37
33
34
35
36
39 40
41
43 47
44 46
45
48 49
50
51
52
53
Max depth: n, looks good! Aim to prove width is bounded
100
Number of solutions at each BP tree levelDepends on range of longer pruning edge incident to level v
2 4 8 16 32 64 128 256 512
256
128
64
32
16
8
2
1 2 3 4 5 6 7 8 9
2 4 8 16 32 64 128
2 4 8 16 32 64
2 4 8 16 32
2 4 8 16
2 4 8
2 4
2 4
K+
Green path
8 nodes at level K +4
{3,K+5} ∈ EP ⇒ gK+4, gK+5 6∈ Γ
⇒ no symmetry at levels K + 4
and K +5
⇒ only 4 nodes at level K +5
no pruning edges
pruning edges makegraph K-trilaterative
101
Periodic pruning edges
1
16
2
3
4
5
6
7
8
9
10
11
12
13
14
1517
115 6 7 8 9
2 4 8 16 32 64 128 256 512
256
128
64
32
16
8
2
2 4 8 16 32 64 128
2 4 8 16 32 64
2 4 8 16 32
2 4 8 16
2 4 8
2 4
2 4
1210 ...4
• 2ℓ growth up to level ℓ, then constant: O(2ℓn) nodes in BP tree
• BP is Fixed-Parameter Tractable (FPT) in a bunch of cases
• For all tested protein backbones, ℓ ≤ 5⇒ BP linear on proteins!
102
The story so far
• Nice applications, problem is hard, could have many solutions
• Continuous methods don’t scale
• If certain vertex orders are present, use mixed-combinatorial methods
• Realize K-trilaterative in polytime but (K − 1)-trilaterative are
hard
• If adjacent predecessors are immediate, theory of symmetries
• Number of solutions is a power of two
• For proteins, BP is linear time
• How do we find these vertex orders?
103
Finding vertex orders
1. Applications
2. Definition
3. Complexity primer
4. Complexity of the DGP
5. Number of solutions
6. Mathematical optimization formulations
7. Realizing complete graphs
8. The Branch-and-Prune algorithm
9. Symmetry in the KDMDGP
10. Tractability of protein instances
11. Finding vertex orders
12. Approximate realizations
[Cassioli et al., DAM]
104
. . . wasn’t the backbone providing them?
• NMR data not as clean as I pretended
• Have to mess around with side chains
• What about other applications, anyhow?
Methods for finding trilaterative orders automatically
105
Mostly bad news
• Finding K-trilaterative orders is NP-complete :-(
• But also FPT :-)
• Finding KDMDGP orders is NP-complete for all K :-(
• It’s also really hard in practice, and methods don’t scale well
106
Definitions
• Trilateration Ordering Problem (TOP)
Given a connected graph G = (V,E) and a positive integer
K, does G have a K-trilateration order?
• Contiguous Trilateration Ordering Problem (CTOP)
Given a connected graph G = (V,E) and a positive integer
K, does G have a (K−1)-trilateration order such that Uv =
{v − 1, . . . , v −K} for each v > K?
Both problems are in NP
107
Hardness of TOP
• Essentially due to finding the initial clique
– brute force: test all(nK
)
subsets of V
–(nK
)
is O(nK), polytime if K fixed
• Reduction from K-Clique problem:
Given a graph, does it have a K-clique?
[Mucherino et al., OPTL 2012]
108
Reduction from K-Clique
reduction
GG′ = G⊕K
K = 3 K − 1 dummy nodes
• If K-Clique instance is YES– start with α = (initial clique of G,K)
– induction: if αv−1 defined, pick αv at shortest path distance 1 from⋃
α
• If K-Clique instance is NO– By contradiction: suppose ∃ trilateration order α in G′
– Initial clique α[K] = (α1, . . . , αK) must have K − 1 vertices in G, 1 in K
– αK+1 must be in G, hence ∃ K-clique in G
[Cassioli et al., DAM]
109
Once the initial clique is known
Greedily grow a trilateration order α
• Initialize α with initial K-clique K
• Let W = V rK
• ∀v > K av = |vertices in K adjacent to v|// at termination, av will be the number of adjacent predecessors of v
• While W 6= ∅:
1. choose v ∈W with largest av2. if av < K instance is NO
3. α← (α, v)
4. for all u ∈W adjacent to v, increase au5. W ←W r {v}
• Instance is YES
[Mucherino et al., OPTL 2012]
110
Greedy algorithm is correct
• Assume TOP instance is YES, proceed by induction
– start: by maximality, aK+1 > K
– assume α is a valid TOP up to v − 1, suppose av < K
– but instance is YES so there is another z ∈W with az ≥ K
– contradicts maximality of av
• Assume TOP instance is NO
– “YES” termination when W = ∅ contradicts the NO
– hence it must terminate with W 6= ∅ and “NO” answer
111
Complexity
• Outer while loop: O(n)
• Choice of largest av: O(n)
• Inner loop on W : O(n)
• Overall: O(n2)
• If we add brute force initial clique: O(2Kn2)
• Polytime if K fixed, FPT otherwise
112
CTOP is hard
• Reduction from Hamiltonian Path (HP)
Given a graph G, does it have a path passing through each
vertex exactly once?
• α a H. path in G ⇒ ∀v 6= 1, n αv is adjacent to αv−1, αv+1
• Apart from initial 1-clique α1
every αv is adjacent to its immediate predecessor
• ⇒ α is a KDMDGP order in G with K = 1
• HP is the same as CTOP with K = 1
• ⇒ By inclusion, CTOP is NP-hard
[Cassioli et al., DAM]
113
CTOP is hard for all K
• Reduction from HP
reduction
1
2
3C1
C2
C3
C12
C23
• Technical proof
[Cassioli et al., DAM]
114
How do we find KDMDGP orders?
Mathematical optimization & CPLEX
• xvi = 1 iff vertex v has rank i in the order
• Each vertex has a unique order rank:
∀v ∈ V∑
i∈nxvi = 1;
• Each rank value is assigned a unique vertex:
∀i ∈ n∑
v∈Vxvi = 1;
• There must be an initial K-clique:
∀v ∈ V, i ∈ {2, . . . ,K}∑
u∈N(v)
∑
j<i
xuj ≥ (i− 1)xvi;
• Each vertex with rank > K must have at least K contiguous adjacent predecessors
∀v ∈ V, i > K∑
u∈N(v)
∑
i−K≤j<i
xuj ≥ Kxvi.
• Do not expect too much; scales up to 100 vertices
115
How about those 10k-atom backbones?
We have Carlile for those
• Note the repetitions — they serve a purpose!
• Repetition orders are also hard to find for any K
• . . . but Carlile knows how to handcraft them!
[Lavor et al. JOGO 2013]
116
And what about the side-chains?The Carlile+Antonio tool!
[Costa et al. JOGO, accepted]
117
Approximate realizations
1. Applications
2. Definition
3. Complexity primer
4. Complexity of the DGP
5. Number of solutions
6. Mathematical optimization formulations
7. Realizing complete graphs
8. The Branch-and-Prune algorithm
9. Symmetry in the KDMDGP
10. Tractability of protein instances
11. Finding vertex orders
12. Approximate realizations
118
Data errors
The “distance = real number” paradigm is a lie!
• Covalent bonds are fairly precise
• NMR data is a mess [Berger, J. ACM 1999]
– experimental errors yield intervals [dLuv, dUuv]
– NMR outputs frequencies of (atom type pair,distance value)
weighted graph reconstruction yields systematic error
– some atom type pairs yield more error (“only trust H—H”)
• Properties of specific molecules give rise to other constraints
• The protein graph may not be (K−1)-trilaterative based on
the backbone
119
The Lavorder comes to the rescue!
• Carlile’s handcrafted repetition orders properties:
– repetitions allow a “virtual backbone” of H atoms only
– discretization edges: {v, v − i} covalent bonds for i ∈ {1,2},{v, v − 3} sometimes covalent sometimes from NMR
– most NMR data restricted to pruning edges
• When dv,v−3 is an interval: intersect two spheres with sph. shelldL
dU
• Discretize circular segments and run BP with modified S
Algorithm no longer exhaustive
120
Die Symmetriktheoriedammerung
• Intervals and discretization break the theory of symmetries����
�� ��������
����
����
��������
��������
��������
����
����
��
��
����
��
������
��
������
��������
����
��������
��
��������
����
��������
�����������������������������������������������������������
�����������������������������������������������������������
�����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������
�����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������
�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������
�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������
duv
• Only some bounds for the number b of BP solutions:
∃ℓ, k 2ℓqk ≤ b ≤ 2n−3qM
q = |discretization points|, M = |NMR discretization edges|
121
But at least it’s producing results
Joint work with Institut Pasteur
[Cassioli et al., BMC Bioinf., submitted]
122
General approximate methods
• All these methods are specialized to
protein distance data from NMR
• What about general approximate methods?
• Assume large-sized input data with errors
• No assumptions on graph structure
123
Ingredients
• PDM = Partial Distance Matrix (a representation of G)
• EDM = Euclidean Distance Matrix
1. Complete the given PDM d to a symmetric matrix D
2. Find a realization x (in some dimension K)
s.t. the EDM (‖xu − xv‖) is “close” to D
3. Project x from dimension K to dimension K,
keeping pairwise distances approximately equal
124
Completing the distance matrix
• ∀{u, v} 6∈ E let Duv = length of the shortest path u→ v
• Use Floyd-Warshall’s algorithm O(n3)
1: // n× n array Dij to store distances
2: D = 0
3: for {i, j} ∈ E do
4: Dij = dij5: end for
6: for k ∈ V do
7: for j ∈ V do
8: for i ∈ V do
9: if Dik +Dkj < Dij then
10: // Dij fails to satisfy triangle inequality, update
11: Dij = Dik +Dkj
12: end if
13: end for
14: end for
15: end for
125
Finding a realization
• Let’s give ourselves many dimensions, say K = n
• Attempt to find x : V → Rn with (‖xu − xv‖2) ≈ (Duv)
• If we had the Gram matrix B of x, then:
1. find eigen(value/vector) matrices Λ, Y of B
2. since B is PSD, Λ ≥ 0⇒√Λ exists
3. ⇒ B = Y ΛY ⊤ = (Y√Λ)(Y
√Λ)⊤
4. x = Y√Λ is such that xx⊤ = B
• Can we compute B from D?
126
Schoenberg’s theorem
• Standard method for computing B from D2
• Also known as classic MultiDimensional Scaling (MDS)
• Apply many algebraic manipulations to
d2uv = ‖xu − xv‖2 = xu⊤xu + xv
⊤xv − 2xu⊤xv
where the centroid∑
k≤nxuk = 0 for all u ≤ n
• Get B = −12(In −
1n1n)D
2(In − 1n1n), i.e.
xu · xv =1
2n
∑
k≤n(d2uk + d2kv)− d2uv −
1
2n2
∑
h≤nk≤n
d2hk
• D “approximately” EDM ⇒ B “approximately” Gram
[Schoenberg, Annals of Mathematics, 1935]
127
Project to RK for a given K
• Only use the K largest eigenvalues of Λ
• Y [K] = K columns of Y corresp. to K largest eigenvalues
• Λ[K] = K largest eigenvalues of Λ on diagonal
• x = Y [K]√
Λ[K] is a K × n matrix
• Y [K] span the subspace where x “fills more space”, i.e. neglecting other
dimensions causes smaller errors w.r.t. the realization in Rn
This method is called Principal Component Analysis (PCA)
128
Isomap
Given K and PDM d:
1. D = FloydWarshall(d)
2. B = MDS(D)
3. x = PCA(B,K)
−→
[Tenenbaum et al. Science 2000]
129
Some references
• L. Liberti, C. Lavor, N. Maculan, A. Mucherino, Euclidean distance geometry
and applications, SIAM Review, 56(1):3-69, 2014
• L. Liberti, B. Masson, J. Lee, C. Lavor, A. Mucherino, On the number
of realizations of certain Henneberg graphs arising in protein conformation,Discrete Applied Mathematics, 165:213-232, 2014
• L. Liberti, C. Lavor, A. Mucherino, The discretizable molecular distance geometry
problem seems easier on proteins, in [see below], 47-60
• A. Mucherino, C. Lavor, L. Liberti, N. Maculan (eds.), Distance Geometry:
Theory, Methods and Applications, Springer, New York, 2013
THE END