
Problem Warping and Computational Dynamics in the Solution of NP-hard Problems

John A. Clark
Dept. of Computer Science, University of York, UK
[email protected]

26.07.2001

Overview

• Overview of hill-climbing and simulated annealing
• Breaking the Permuted Perceptron Problem
  - previous work
  - problem warping
  - timing analysis
  - solution family based attacks
  - quantum computing
• Speculation

Heuristic Optimisation and Simulated Annealing

Local Optimisation - Hill Climbing

[Figure: cost landscape z(x) showing points x0, x1, x2, x3 and the global optimum xopt.]

The neighbourhood of a point x might be N(x) = {x+1, x-1}. A hill-climb goes x0 -> x1 -> x2 since f(x0) < f(x1) < f(x2) > f(x3), and gets stuck at x2 (a local optimum). Really we want to obtain xopt.

Simulated Annealing

[Figure: cost landscape z(x) with a search trajectory x0, x1, ..., x13 that descends before rising again.]

Simulated annealing allows non-improving moves, so that it is possible to go down in order to rise again and reach the global optimum. In practice the neighbourhood may be very large and a trial neighbour is chosen randomly, so it is possible to accept a worsening move even when improving ones exist.

Simulated Annealing

• Improving moves are always accepted.
• Non-improving moves may be accepted probabilistically, in a manner depending on the temperature parameter T. Loosely:
  - the worse the move, the less likely it is to be accepted;
  - a worsening move is less likely to be accepted the cooler the temperature.
• The temperature T starts high and is gradually cooled as the search progresses.
• Initially virtually anything is accepted; at the end only improving moves are allowed (and the search effectively reduces to hill-climbing).

Simulated Annealing

Current candidate x. Minimisation formulation:

    Temp = T0;  current x = x0
    Until frozen do:
        Do 400 times:
            y = generateNeighbour(x)
            delta = f(y) - f(x)
            if delta < 0:                          current x = y   (accept)
            else if U(0,1) < exp(-delta / Temp):   current x = y   (accept)
            else:                                  reject
        Temp = 0.95 * Temp
    Solution is the best x seen so far

• At each temperature consider 400 moves.
• Always accept improving moves.
• Accept worsening moves probabilistically; this gets harder the worse the move, and harder as Temp decreases.

Temperature cycle

Simulated Annealing

T starts at 100. At each temperature do 400 trial moves, then cool: T = 0.95 T. Continue until T reaches 0.00001.
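Below is a minimal Python sketch of the annealing loop described above, using the slide's parameters (T0 = 100, 400 trial moves per temperature, cooling factor 0.95, freezing near 0.00001); the cost and neighbour functions here are illustrative placeholders, not the Perceptron Problem cost.

    import math
    import random

    def anneal(x0, cost, neighbour, t0=100.0, alpha=0.95, trials=400, t_min=1e-5):
        """Generic simulated annealing for minimisation (sketch)."""
        x = best = x0
        t = t0
        while t > t_min:                      # temperature cycle
            for _ in range(trials):           # 400 trial moves per temperature
                y = neighbour(x)
                delta = cost(y) - cost(x)
                # improving moves always accepted; worsening moves accepted
                # with probability exp(-delta / t)
                if delta < 0 or random.random() < math.exp(-delta / t):
                    x = y
                    if cost(x) < cost(best):
                        best = x
            t *= alpha                        # cool
        return best

    # toy usage: minimise a one-dimensional function over the integers
    print(anneal(0, cost=lambda x: (x - 7) ** 2,
                 neighbour=lambda x: x + random.choice((-1, 1))))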

Breaking Protocols with Heuristic Optimisation

Identification Problems

• Notion of zero-knowledge introduced by Goldwasser and Micali (1985): indicate that you have a secret without revealing it.
• Early scheme by Shamir.
• Several schemes of late based on NP-complete problems:
  - Permuted Kernel Problem (Shamir)
  - Syndrome Decoding (Stern)
  - Constrained Linear Equations (Stern)
  - Permuted Perceptron Problem (Pointcheval)

Pointcheval’s Perceptron Schemes

Perceptron Problem (PP). Interactive identification protocols based on an NP-complete problem.

Given: an m x n matrix A with entries a_ij in {-1, +1}.

Find: a vector S = (s_1, ..., s_n) with every s_j in {-1, +1}.

So that: every element of the image W = AS is non-negative, i.e. w_i = (AS)_i >= 0 for i = 1, ..., m.

Pointcheval’s Perceptron Schemes

Permuted Perceptron Problem (PPP). Make the problem harder by imposing an extra constraint.

Given: an m x n matrix A with entries a_ij in {-1, +1}.

Find: a vector S = (s_1, ..., s_n) with every s_j in {-1, +1}.

So that: every element of the image W = AS is non-negative, and W has a particular histogram H of (odd, positive) values 1, 3, 5, ...

Example: Pointcheval’s Scheme

PP and PPP example. Every PPP solution is a PP solution.

[Figure: a small worked example with a 4 x 5 matrix A of +/-1 entries and a +/-1 secret S, giving image AS = (5, 1, 1, 3); the histogram of positive values is H = (h(1), h(3), h(5)) = (2, 1, 1).]

Generating Instances

Suggested method of generation:

• Generate a random +/-1 matrix A.
• Generate a random +/-1 secret S.
• Calculate AS.
• If any (AS)_i < 0 then negate the ith row of A.

[Figure: the generation steps worked through on a small +/-1 matrix and secret.]
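A minimal numpy sketch of this generation procedure; the sizes shown are purely illustrative, and the uniform random +/-1 choices are an assumption about what "random" means on the slide.

    import numpy as np

    def generate_instance(m=101, n=117, seed=None):
        """Generate a PP/PPP instance as suggested on the slide (sketch)."""
        rng = np.random.default_rng(seed)
        A = rng.choice((-1, 1), size=(m, n))   # random +/-1 matrix
        S = rng.choice((-1, 1), size=n)        # random +/-1 secret
        image = A @ S
        A[image < 0] *= -1                     # negate rows with a negative image element
        image = A @ S                          # now every element is positive (and odd, as n is odd)
        # histogram of the positive odd image values: counts of 1, 3, 5, ...
        hist = {v: int(np.sum(image == v)) for v in range(1, int(image.max()) + 1, 2)}
        return A, S, image, hist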

There is significant structure in this problem: a high correlation between the majority values of the matrix columns and the corresponding secret bits.

Instance Properties

• Each matrix row/secret dot product is the sum of n Bernoulli (+1/-1) variables.
• The initial image histogram has a Binomial shape and is symmetric about 0; after negation it simply folds over to be positive.
• Image elements tend to be small.

[Figure: symmetric histogram over ..., -7, -5, -3, -1, 1, 3, 5, 7, ... folding over onto 1, 3, 5, 7, ...]

PP Using Search: Pointcheval

Pointcheval couched the Perceptron Problem as a search problem.

[Figure: a current +/-1 solution Y together with its single-bit-flip neighbours Y1, ..., Y5.]

• The current solution is a +/-1 vector Y.
• The neighbourhood is defined by single bit flips on the current solution.
• The cost function punishes any negative image components. For example, if AY has negative components -1 and -3 (and the rest positive), then costNeg(Y) = |-1| + |-3| = 4.
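A minimal numpy sketch of this cost function; the name costNeg follows the slide, everything else is illustrative.

    import numpy as np

    def cost_neg(A, y):
        """Sum of the absolute values of the negative image components (costNeg)."""
        image = A @ y
        return int(np.abs(image[image < 0]).sum())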

Using Annealing: Pointcheval

• A PPP solution is also a PP solution. Pointcheval based estimates of cracking PPP on the ratio of PP solutions to PPP solutions.
• He calculated the matrix sizes for which this should be most difficult. This gave rise to (m,n) = (m, m+16), with recommended sizes (m,n) = (101,117), (131,147), (151,167).
• He gave estimates for the number of years needed to solve PPP using annealing as the PP-solving means: PP instances with matrices of size 200 ‘could usually be solved within a day’.
• But no PPP problem instance greater than 71 was ever solved this way ‘despite months of computation’.

Perceptron Problem (PP)

Knudsen and Meier approach (loosely):

• Carry out sets of runs.
• Note where the results obtained all agree.
• Fix those elements where there is complete agreement, then carry out a new set of runs, and so on.
• If repeated runs give the same values for particular bits, the assumption is that those bits are actually set correctly.

They used this sort of approach to solve instances of the PP problem up to 180 times faster than Pointcheval for the (151,167) problem, but gave no upper bound on the sizes achievable.

Profiling Annealing

The approach is not without its problems: not all bits that have complete agreement are correct.

[Table: actual secret against runs 1-6, marking the bits on which all runs agree and the bits on which all runs agree wrongly (agreeing on 1 where the actual value is -1).]

Knudsen and Meier

• Have used this method to attack PPP problem size (101,117).
• Needs a hefty enumeration stage (to search for wrong bits); allowed up to 2^64 search complexity.
• Used a new cost function, with weights w1 = 30, w2 = 1 and histogram punishment:

  cost(y) = w1 costNeg(y) + w2 costHist(y)

  where costHist(y) measures the deviation of the candidate image histogram from the required one, e.g. hist(Ay) = (3, 0, 0) against the required hist(s) = (2, 1, 1).
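A sketch of the histogram-punishment term and the combined cost; the weights w1 = 30 and w2 = 1 are the ones quoted on the slide, while the target_hist dictionary and the treatment of unlisted values are illustrative assumptions.

    import numpy as np

    def cost_neg(A, y):                      # as in the earlier sketch
        image = A @ y
        return int(np.abs(image[image < 0]).sum())

    def cost_hist(A, y, target_hist):
        """Sum of absolute deviations between the candidate image histogram and
        the required histogram of positive values, e.g. {1: 2, 3: 1, 5: 1}."""
        image = A @ y
        return sum(abs(int(np.sum(image == v)) - c) for v, c in target_hist.items())

    def cost(A, y, target_hist, w1=30, w2=1):
        return w1 * cost_neg(A, y) + w2 * cost_hist(A, y, target_hist)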

Why Don’t They Work Better? What limits the ability of annealing to find a PP solution?

PP Move Effects

• A move changes a single element of the current solution.
• We want current negative image values to go positive, but changing a bit to make negative values go positive will often cause small positive values to go negative.
• A single bit flip changes each image element by exactly 2: W'_i = (AY')_i = W_i +/- 2.

[Figure: image value histograms over 0..7 before and after a move.]

Problem Warping

• Results can be significantly improved by punishing at a positive value K; for example, punish any image value less than K = 4 during the search.
• This drags the elements away from the boundary during the search.
• Also use the square of the difference, |W_i - K|^2, rather than the simple deviation.

Example: with W = AY and an image element of -1, the contribution to the cost is |4 - (-1)|^2 = 25.
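A sketch of the warped cost described here; the threshold K = 4 and the squared deviation follow the slide, and punishing only image elements below K is my reading of "punish any value less than K".

    import numpy as np

    def cost_warped(A, y, K=4):
        """Squared shortfall below the threshold K, summed over image elements below K."""
        image = A @ y
        short = image[image < K]
        return int(((K - short) ** 2).sum())

    # e.g. an image element of -1 contributes (4 - (-1)) ** 2 = 25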

Problem Warping

[Table: for problem sizes (201,217), (401,417), (501,517) and (601,617), the number of successes in 30 runs of annealing followed by 0-, 1-, 2- and 3-bit hill-climbing, for each of 10 problems (Pr 0 to Pr 9).]

Problem Warping

Comparative results:

• Generally allows solution within a few runs of annealing for size (201,217).
• The number of bits correct is generally worst when K = 0. The best value for K varies between sizes (but profiling can be used to test what it is).
• It has proved possible to solve for size (601,617) and higher.
• An enormous increase in power for what is essentially a change to one line of the program: using squared deviations rather than just the modulus, and the use of the K factor.

Morals:

• Small changes may make a big difference. The real issue is how the cost function and the search technique interact.
• The cost function need not be the most ‘natural’ direct expression of the problem to be solved. Cost functions are a means to an end.
• This is a form of fault injection, or problem warping, applied to the problem.

PPP (101, 117)

[Chart: for problems 1-29, the maximum number of bits correct over all runs (scale 0-120); series show the bits correct in the final solution and the initial N bits stuck correct.]

PPP (131, 147)

[Chart: for problems 1-28, the maximum number of bits correct over all runs (scale 0-160); series show the bits correct in the final solution and the initial N bits stuck correct.]

PPP (151, 167)

[Chart: for problems 1-28, the maximum number of bits correct over all runs (scale 0-200); series show the maximum bits correct in the final solution and the initial N bits stuck correct.]

Some Tricks

We won't go into detail, but there are some further problem-specific tricks that can be used to reduce the remaining search. For example, you can generally tell easily whether you have an odd or even number of bits wrong (a sketch follows this list):

• Sum the image elements taking values ..., -7, -3, 1, 5, 9, 13, ... (call this S1).
• Sum the image elements taking values ..., -5, -1, 3, 7, 11, ... (call this S2).
• Find the corresponding sums T1, T2 in the provided histogram.
• If T1 = S1 and T2 = S2 then there is an even number of bits wrong.
• If T1 = S2 and T2 = S1 then there is an odd number wrong.
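One possible reading of this test in numpy: treat the two "sums" as counts of image elements in each residue class mod 4, with T1 and T2 the corresponding counts taken from the provided histogram (a dictionary of counts here). This interpretation, and the target_hist structure, are assumptions; the residue classes themselves are the ones listed above.

    import numpy as np

    def parity_of_wrong_bits(A, y, target_hist):
        """Return 'even' or 'odd' for the number of wrong bits in candidate y (sketch)."""
        image = A @ y
        s1 = int(np.sum(image % 4 == 1))   # values ..., -7, -3, 1, 5, 9, 13, ...
        s2 = int(np.sum(image % 4 == 3))   # values ..., -5, -1, 3, 7, 11, ...
        t1 = sum(c for v, c in target_hist.items() if v % 4 == 1)
        t2 = sum(c for v, c in target_hist.items() if v % 4 == 3)
        if (s1, s2) == (t1, t2):
            return 'even'
        if (s1, s2) == (t2, t1):
            return 'odd'
        return 'inconsistent'              # should not occur for a genuine instance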

A Few Tricks More

• Look at the image elements w_i produced. If I knew what they should be, I could use linear algebra to solve the system.
• I do not know whether they are right or not, but often they are, or nearly so.
• If w_i = 1 is obtained by some run, it is very likely that the actual value should be 1, 5 or 9 (assuming an even number of bits wrong).
• Assume it is correct. Then changing any bits of the current solution to obtain the original solution must not change the value of w_i.
• This means half the bits x_j I change in the solution x must agree in sign with the corresponding entry a_ij of the ith row (and half must disagree). This reduces the complexity of the remaining search.

Overall

We have missed out the details, but basically this scheme is broken. There is just too much structure... and there is more.

Radical Viewpoint Analysis

[Diagram: the original problem P is mutated into problems P1, P2, ..., Pn-1, Pn.]

• Essentially, create mutant problems and attempt to solve them.
• If the solutions agree on particular elements then they generally do so for a reason, generally because they are correct.
• Mutation can be thought of as an attempt to blow the search away from the actual original solution.
• Look for agreement between solutions. Often nearly half the key can be obtained without any wrong bits.

Radical Viewpoint Analysis

Bits where three runs agree: go for unanimity. This is a more stressful variation of Knudsen and Meier's idea.

Democratic Viewpoint Analysis

[Diagram: the original problem P is mutated into problems P1, P2, ..., Pn-1, Pn; cartoon voters say ‘It’s a 1’, ‘It’s a 1’, ‘It’s a 1’, ‘No, it’s a -1’.]

• Essentially the same as before, but this time go for substantial rather than unanimous agreement.
• By choosing the amount of disagreement tolerated carefully, you can sometimes get over half the key this way; on occasion only 1 bit of the 115 most-agreed bits (out of 167) has been incorrect. (A sketch of the voting step follows.)
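A minimal sketch of the agreement step shared by these viewpoint analyses: given +/-1 solution vectors from several runs (or mutant problems), fix the bits on which at least a chosen fraction of the runs agree. The threshold parameter is illustrative, not a value from the talk.

    import numpy as np

    def agreed_bits(solutions, threshold=1.0):
        """solutions: list of +/-1 vectors from independent runs or mutant problems.
        Returns (indices, majority values) of bits whose agreement reaches the threshold;
        threshold=1.0 demands unanimity (radical), lower values are democratic."""
        sols = np.array(solutions)
        votes = sols.sum(axis=0)                       # signed vote for each bit
        agreement = np.abs(votes) / len(solutions)     # fraction agreeing with the majority
        idx = np.where(agreement >= threshold)[0]
        return idx, np.sign(votes[idx]).astype(int)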

Multiple Clock Watchers Analysis

[Diagram: the original problem P is mutated into problems P1, P2, ..., Pn-1, Pn.]

• Essentially the same as for timing analysis, but this time add up, over all runs, the times at which each bit got stuck.
• As you might expect, the bits that often get stuck early (i.e. have low aggregate times to getting stuck) generally do so at their correct values (so take the majority value).
• This also seems to have significant potential, but needs more work.

Quantum Computation

Everything I have reported so far assumes the classical computational paradigm. But this is the very assumption that gave rise to the biggest shock in cryptography. Let's not fall into the same trap: can heuristic search and quantum computing work together?

Grover’s Algorithm

• Consider a function f(x) with x in 0..(2^N - 1), where there is a single value v such that some predicate P(v) holds. Then Grover's algorithm can find v in approximately O(2^(N/2)) steps.
• Thus if we have a state space of size 2^100, it will require O(2^50) steps.
• Now let us return to the (101,117) PPP case. Finding a solution by quantum search would require O(2^59) steps.
• But if we can obtain a solution with 108 bits correct, we can ask a different question: what are the indices of the 9 wrong bits? Assuming each index can be couched in 7 bits, we have 7 x 9 = 63 bits, which means Grover's algorithm can find the answer in O(2^32) steps.
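As a worked check of these figures (minimal arithmetic; rounding up to whole exponents follows the slide):

    \sqrt{2^{117}} = 2^{58.5} \approx 2^{59}
    \qquad \text{versus} \qquad
    \sqrt{2^{7 \times 9}} = 2^{31.5} \approx 2^{32}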

More Short Term

• Can we view metaheuristic search as a means of problem reduction rather than problem solving?
• The AI community has developed methods that work very well on very highly constrained problems.
• I am currently experimenting with profiling, and with using properties of how near the search gets to the goal to place bounds on the remaining problem and solve it using linear programming.

Grover’s Algorithm 2

• And it's not all one way. If there are more states satisfying a predicate, one might expect the task of finding one of them to be easier than before.
• Indeed, if there are M states v satisfying the predicate P(v), then the search becomes of order (2^N / M)^(1/2).
• So characterise positions from which heuristic search can be used effectively, use quantum computing to find one of them, and then use heuristic search to reach the optimum.

[Figure: cost landscape annotated ‘Use QC to get in this range’ and ‘Now hill-climb to get here’.]

Speculation and Further work

• Can we try failing millions of times and then start doing cryptanalysis on the results?
• Will the techniques work more widely? Why can I not break, say, DES or RSA using a technique like this? Is there a theorem to suggest not? No.
• Cryptanalysis of block ciphers largely works by approximations, e.g. functions of the form

  P[3].xor.P[35].xor.K[1].xor.K[22].xor.C[15].xor.C[52]

  that are true with some bias (e.g. 50.00001% of the time), where P[j] is bit j of a plaintext block, and similarly C is ciphertext and K is key. Can we derive these from sample data using annealing?
• How can we exploit the notion of shifting the computational paradigm?
• How well can we profile the distribution of results in order to isolate those at the extremes of correctness?

Speculation and Further work

• There have been very few applications of these techniques to modern-day cryptography and its applications.
• We have successfully created Boolean functions with desirable cryptographic properties.
• We have also evolved protocols in belief logics whose abstract execution is a proof of their own correctness.
• Much more to come.