Global Constraints
Toby Walsh
National ICT Australia and University of New South Wales
www.cse.unsw.edu.au/~tw
Course outline
● Introduction
● All Different
● Lex ordering
● Value precedence
● Complexity
● GAC-Schema
● Soft Global Constraints
● Global Grammar Constraints
● Roots Constraint
● Range Constraint
● Slide Constraint
● Global Constraints on Sets
Global grammar constraints
● Often easy to specify a global constraint
– ALLDIFFERENT([X1,..Xn]) iff Xi=/=Xj for i<j
● Difficult to build an efficient and effective propagator
– Especially if we want global reasoning
Global grammar constraints
● A promising direction is to specify constraints via automata or grammars
– Sequence of variables = string in some formal language
– Satisfying assignment = string accepted by the grammar/automaton
Global constraints meet formal language theory
REGULAR constraint
● REGULAR(A,[X1,..Xn]) holds iff
– X1 .. Xn is a string accepted by the deterministic finite automaton A
– Proposed by Pesant at CP 2004
– GAC algorithm using dynamic programming
– However, DP is not needed since a simple ternary encoding is just as efficient and effective
REGULAR constraint
● Deterministic finite automaton (DFA)
– <Q,Sigma,T,q0,F>
– Q is finite set of states
– Sigma is alphabet (from which strings formed)
– T is transition function: Q x Sigma -> Q
– q0 is starting state
– F subseteq Q are accepting states
● DFAs accept precisely regular languages
– Regular language can be specified by rules of the form:
NonTerminal -> Terminal | Terminal NonTerminal
REGULAR constraint
● DFAs accept precisely regular languages
– Regular language can be specified by rules of the form:
NonTerminal -> Terminal
NonTerminal -> Terminal NonTerminal
– Alternatively given by regular expressions
– More limited than BNF, which can express context-free grammars
REGULAR constraint
● Deterministic finite automaton (DFA)
5-tuple <Q,Sigma,T,q0,F> where
– Q is finite set of states
– Sigma is alphabet (from which strings formed)
– T is transition function: Q x Sigma -> Q
– q0 is starting state
– F subseteq Q are accepting states
REGULAR constraint
● Regular language
– S -> 0 | 0A | AB | 1B | 1
– A -> 0 | 0A
– B -> 1 | 1B
● DFA
– Q={q0,q1,q2,q3}
– Sigma={0,1}
– T(q0,0)=q0, T(q0,1)=q1
– T(q1,0)=q2, T(q1,1)=q1
– T(q2,0)=q2, T(q2,1)=q3
– T(q3,0)=T(q3,1)=q3
– F={q0,q1,q2}
REGULAR constraint
● Regular language
– S -> A | AB | ABA | BA | B
– A -> 0 | 0A
– B -> 1 | 1B
● DFA
– Q={q0,q1,q2,q3}
– Sigma={0,1}
– T(q0,0)=q0, T(q0,1)=q1
– T(q1,0)=q2, T(q1,1)=q1
– T(q2,0)=q2, T(q2,1)=q3
– T(q3,0)=T(q3,1)=q3
– F={q0,q1,q2}
This is the CONTIGUITY global constraint
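The CONTIGUITY automaton above can be run directly on a candidate string. The following is a minimal sketch, not code from the slides: the dictionary layout and the helper name `accepts` are illustrative choices, while the states, transitions and accepting set are copied from the DFA listed above.

```python
# Transition table of the CONTIGUITY automaton from the slide.
T = {
    ("q0", 0): "q0", ("q0", 1): "q1",
    ("q1", 0): "q2", ("q1", 1): "q1",
    ("q2", 0): "q2", ("q2", 1): "q3",
    ("q3", 0): "q3", ("q3", 1): "q3",
}
START = "q0"
ACCEPTING = {"q0", "q1", "q2"}


def accepts(string):
    """Run the DFA over the string; accept if it ends in an accepting state."""
    state = START
    for symbol in string:
        state = T[(state, symbol)]
    return state in ACCEPTING


print(accepts([0, 1, 1, 0]))  # the 1s form a single block
print(accepts([1, 0, 1]))     # the 1s are split by a 0
```

State q3 acts as a sink: once a second block of 1s starts, no suffix can bring the automaton back to an accepting state.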
REGULAR constraint
● Many global constraints are instances of REGULAR
– AMONG
– CONTIGUITY
– LEX
– PRECEDENCE
– STRETCH
– ..
● Domain consistency can be enforced in O(ndQ) time using dynamic programming (n variables, domain size d, Q states)
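One way to picture the dynamic programming propagator is as a forward and a backward reachability pass over the states of the automaton, one layer per variable. The sketch below is an illustration under that assumption, not Pesant's published algorithm verbatim; the function name `regular_gac` and the dictionary encoding of T are hypothetical. A missing table entry is treated as "no transition".

```python
def regular_gac(domains, T, start, accepting):
    """Return the pruned domains, or None if no accepted string exists.

    domains: list of sets of symbols, one per variable.
    T: dict mapping (state, symbol) -> state.
    """
    n = len(domains)
    # Forward pass: states reachable after reading i symbols.
    forward = [{start}]
    for i in range(n):
        forward.append({T[(q, a)] for q in forward[i]
                        for a in domains[i] if (q, a) in T})
    # Backward pass: reachable states that can still reach an accepting state.
    backward = [set() for _ in range(n + 1)]
    backward[n] = forward[n] & set(accepting)
    for i in range(n - 1, -1, -1):
        backward[i] = {q for q in forward[i]
                       if any(T.get((q, a)) in backward[i + 1]
                              for a in domains[i])}
    if not backward[0]:
        return None
    # A value is supported iff some transition on it links two live states.
    return [{a for a in domains[i]
             if any(T.get((q, a)) in backward[i + 1] for q in backward[i])}
            for i in range(n)]


# CONTIGUITY automaton from the earlier slide.
T = {
    ("q0", 0): "q0", ("q0", 1): "q1",
    ("q1", 0): "q2", ("q1", 1): "q1",
    ("q2", 0): "q2", ("q2", 1): "q3",
    ("q3", 0): "q3", ("q3", 1): "q3",
}
pruned = regular_gac([{1}, {0}, {0, 1}, {0, 1}], T, "q0", {"q0", "q1", "q2"})
print(pruned)
```

With X1=1 and X2=0 the block of 1s is already closed, so the value 1 is removed from X3 and X4. Each layer touches at most d transitions per state, which matches the O(ndQ) bound quoted above.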
REGULAR constraint
● REGULAR constraint can be encoded into ternary constraints
● Introduce Qi+1
– State of the DFA after the ith transition
● Then post sequence of constraints
– C(Xi,Qi,Qi+1) iff DFA goes from state Qi to Qi+1 on symbol Xi
REGULAR constraint
● REGULAR constraint can be encoded into ternary constraints
● Constraint graph is Berge-acyclic
– Constraints only overlap on one variable
– Enforcing GAC on ternary constraints achieves GAC on REGULAR in O(ndQ) time
REGULAR constraint
● PRECEDENCE([X1,..Xn]) iff
– min({i | Xi=j or i=n+1}) < min({i | Xi=k or i=n+2}) for all j<k
● DFA has one state for each value plus a single non-accepting state, fail
– State represents largest value so far used
● T(Si,vj)=Si if j<=i
● T(Si,vj)=Sj if j=i+1
● T(Si,vj)=fail if j>i+1
● T(fail,v)=fail
REGULAR constraint
● PRECEDENCE([X1,..Xn]) iff
– min({i | Xi=j or i=n+1}) < min({i | Xi=k or i=n+2}) for all j<k
● DFA has one state for each value plus a single non-accepting state, fail
– State represents largest value so far used
● T(Si,vj)=Si if j<=i
● T(Si,vj)=Sj if j=i+1
● T(Si,vj)=fail if j>i+1
● T(fail,v)=fail
● REGULAR encoding of this is just these transition constraints (can ignore fail)
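The transition rules for PRECEDENCE can be generated mechanically. The sketch below assumes values are numbered 1..d and adds a start state 0 meaning "no value used yet"; that start state and the names `precedence_dfa` and `satisfies_precedence` are assumptions for illustration, not taken from the slides. Moves into fail are simply left out of the table, as the last bullet suggests.

```python
def precedence_dfa(d):
    """Transition table for values 1..d.

    State i means 'the largest value used so far is i' (0 = nothing used yet).
    Transitions that would lead to the fail state are omitted.
    """
    T = {}
    for i in range(d + 1):
        for j in range(1, d + 1):
            if j <= i:
                T[(i, j)] = i        # value already introduced: stay put
            elif j == i + 1:
                T[(i, j)] = j        # next value in order: move up
    return T


def satisfies_precedence(xs, d):
    """Check a fully assigned sequence against the automaton."""
    T = precedence_dfa(d)
    state = 0
    for v in xs:
        if (state, v) not in T:      # would enter fail
            return False
        state = T[(state, v)]
    return True


print(satisfies_precedence([1, 2, 1, 3], 3))
print(satisfies_precedence([1, 3, 2], 3))
```

The table has O(d^2) entries, and every non-fail state is accepting, so acceptance reduces to never leaving the table.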
REGULAR constraint
● STRETCH([X1,..Xn]) holds iff
– Any stretch of consecutive equal values v has length between shortest(v) and longest(v)
– Any change (v1,v2) is in some permitted set, P
– For example, you can have at most 3 consecutive night shifts and a night shift must be followed by a day off
REGULAR constraint
● STRETCH([X1,..Xn]) holds iff
– Any stretch of consecutive equal values v has length between shortest(v) and longest(v)
– Any change (v1,v2) is in some permitted set, P
● DFA
– Qi is <last value, length of current stretch>
– Q0= <dummy,0>
– T(<a,q>,a)=<a,q+1> if q+1<=longest(a)
– T(<a,q>,b)=<b,1> if (a,b) in P and q>=shortest(a)
– All states are accepting
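The STRETCH automaton does not need to be built explicitly; its transition function can be evaluated on the fly. The sketch below follows the two transition rules above. The shift names, the bounds, the permitted set and the rule that any value may follow the dummy start state are assumptions made for the example.

```python
def stretch_step(state, b, shortest, longest, permitted):
    """One transition of the STRETCH automaton; None means no transition."""
    a, q = state
    if a is None:                      # dummy start state <dummy,0> (assumed free)
        return (b, 1)
    if a == b:                         # extend the current stretch
        return (a, q + 1) if q + 1 <= longest[a] else None
    if (a, b) in permitted and q >= shortest[a]:
        return (b, 1)                  # close the stretch and start a new one
    return None


def satisfies_stretch(xs, shortest, longest, permitted):
    state = (None, 0)
    for x in xs:
        state = stretch_step(state, x, shortest, longest, permitted)
        if state is None:
            return False
    return True                        # all states accepting, as on the slide


# Hypothetical rostering data: d = day, n = night, o = off.
shortest = {"d": 1, "n": 1, "o": 1}
longest = {"d": 5, "n": 3, "o": 2}
permitted = {("d", "n"), ("d", "o"), ("n", "o"), ("o", "d"), ("o", "n")}

print(satisfies_stretch("nnno", shortest, longest, permitted))
print(satisfies_stretch("nnnn", shortest, longest, permitted))
print(satisfies_stretch("nnd", shortest, longest, permitted))
```

Because every state is accepting, this sketch (like the slide) does not check shortest(v) for the final stretch; a stricter variant would restrict the accepting states.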
NFA constraint
● Automaton does not need to be deterministic
● Non-deterministic finite automata (NFA) still only accept regular languages
– But may require exponentially fewer states
– Important as O(ndQ) running time for propagator
– E.g. 0* (1|2)* 2 (1|2)^k 0*
– Where 0=closed, 1=production, 2=maintenance
● Can use the same ternary encoding
Soft REGULAR constraint
● May wish to be “near” to a regular string
● Near could be
– Hamming distance
– Edit distance
● SoftREGULAR(A,[X1,..Xn],N) holds iff
– X1..Xn is at distance N from a string accepted by the finite automaton A
– Can encode this into a sequence of 5-ary constraints
Soft REGULAR constraint
● SoftREGULAR(A,[X1,..Xn],N)
– Consider Hamming distance (edit distance similar though a little more complex)
– Qi+1 is state of automaton after the ith transition
– Di+1 is Hamming distance up to the ith variable
– Post sequence of constraints
● C(Xi,Qi,Qi+1,Di,Di+1) where
● Di+1=Di if T(Qi,Xi)=Qi+1 else Di+1=1+Di
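The pair (Qi, Di) carried along the sequence suggests a simple dynamic program for the Hamming case: track, for each state, the fewest symbol changes needed to reach it. The sketch below computes the minimum Hamming distance from a fixed string to any accepted string of the same length. It is an illustration of the idea rather than the 5-ary encoding itself; the function name and the reuse of the CONTIGUITY automaton are assumptions.

```python
import math


def min_hamming_distance(xs, T, start, accepting, alphabet):
    """Fewest positions of xs to change so that the DFA accepts the result."""
    cost = {start: 0}                  # state -> cheapest distance so far
    for x in xs:
        nxt = {}
        for q, c in cost.items():
            for a in alphabet:
                if (q, a) in T:
                    r = T[(q, a)]
                    # Reading a instead of x costs 1 when they differ.
                    nxt[r] = min(nxt.get(r, math.inf), c + (a != x))
        cost = nxt
    return min((c for q, c in cost.items() if q in accepting),
               default=math.inf)


# CONTIGUITY automaton from the earlier slide.
T = {
    ("q0", 0): "q0", ("q0", 1): "q1",
    ("q1", 0): "q2", ("q1", 1): "q1",
    ("q2", 0): "q2", ("q2", 1): "q3",
    ("q3", 0): "q3", ("q3", 1): "q3",
}
F = {"q0", "q1", "q2"}

print(min_hamming_distance([0, 1, 1, 0], T, "q0", F, [0, 1]))
print(min_hamming_distance([1, 0, 1], T, "q0", F, [0, 1]))
```

The table `cost` plays the role of the (Qi, Di) pairs: one entry per state, holding the smallest distance with which that state can be reached.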
Soft REGULAR constraint
● SoftREGULAR(A,[X1,..Xn],N)
– To propagate
– Dynamic programming
● Pass support along sequence
– Just post the 5-ary constraints
● Accept less than GAC
– Tuple up the variables
Cyclic forms of REGULAR
● REGULAR+(A,[X1,..,Xn])
– X1 .. Xn X1 is accepted by A
– Can convert into REGULAR by increasing states by factor of d where d is number of initial symbols
– qi => (qi,d)
– T(qi,a)=qj => T((qi,d),a)=(qj,d)
– T(q0,d)=qk => T(q0,d)=(qk,d)
– Thereby pass along value taken by X1 so it can be checked on last transition
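The state-tagging construction can be sketched as follows. This is a variant under stated assumptions rather than the slide's exact construction: the remembered first symbol is checked through the accepting set instead of an explicit last transition, and the names `cyclic_to_regular`, `run` and the start state `"init"` are hypothetical. The alternating automaton used for the check is also an assumption.

```python
from itertools import product


def run(T, start, accepting, xs):
    """Plain DFA run; a missing table entry means rejection."""
    state = start
    for x in xs:
        if (state, x) not in T:
            return False
        state = T[(state, x)]
    return state in accepting


def cyclic_to_regular(T, start, accepting, alphabet):
    """Build a DFA accepting x1..xn exactly when the input DFA accepts x1..xn x1."""
    states = {start} | {q for (q, _) in T} | set(T.values())
    T2 = {}
    for a in alphabet:                 # first move: remember x1 = a
        if (start, a) in T:
            T2[("init", a)] = (T[(start, a)], a)
    for (q, b), r in T.items():        # later moves: carry the remembered symbol
        for a in alphabet:
            T2[((q, a), b)] = (r, a)
    # Accept when replaying the remembered symbol leads to an accepting state.
    accepting2 = {(q, a) for q in states for a in alphabet
                  if T.get((q, a)) in accepting}
    return T2, "init", accepting2


# Hypothetical automaton accepting alternating sequences of 0 and 1.
A = {("s", 0): "z", ("s", 1): "o", ("z", 1): "o", ("o", 0): "z"}
A_ACC = {"s", "z", "o"}
B, B_START, B_ACC = cyclic_to_regular(A, "s", A_ACC, [0, 1])

print(run(B, B_START, B_ACC, [0, 1]))      # 0 1 0 alternates
print(run(B, B_START, B_ACC, [0, 1, 0]))   # 0 1 0 0 does not
```

The new automaton has at most d times as many states as the original plus one start state, in line with the factor-of-d growth noted above.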
Cyclic forms of REGULAR
● REGULARo(A,[X1,..,Xn])
– Xi .. Xn X1 .. Xi-1 (the rotation of the sequence starting at Xi) is accepted by A for each 1<=i<=n
– Can decompose into n instances of the REGULAR constraint
– However, this hinders propagation
● Suppose A accepts just alternating sequences of 0 and 1
● Xi in {0,1} and REGULARo(A,[X1,X2,X3])
● Each REGULAR in the decomposition is GAC on its own, yet no assignment satisfies all three
– Unfortunately enforcing GAC on REGULARo is NP-hard
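The three-variable example can be checked by brute force. The sketch below enumerates all assignments of X1..X3 over {0,1}, computes the values each single rotation's REGULAR constraint supports, and then looks for assignments that satisfy every rotation at once. Helper names are illustrative only.

```python
from itertools import product


def alternating(s):
    """True when no two neighbouring symbols are equal."""
    return all(a != b for a, b in zip(s, s[1:]))


def rotation(x, i):
    """The sequence x read cyclically, starting at 0-based position i."""
    return x[i:] + x[:i]


n = 3
assignments = list(product([0, 1], repeat=n))

# Values each variable keeps when one rotation's REGULAR is propagated alone.
per_rotation_domains = []
for i in range(n):
    sols = [x for x in assignments if alternating(rotation(x, i))]
    per_rotation_domains.append([{x[j] for x in sols} for j in range(n)])

# Assignments that satisfy every rotation at once, i.e. REGULARo itself.
joint = [x for x in assignments
         if all(alternating(rotation(x, i)) for i in range(n))]

print(per_rotation_domains)
print(joint)
```

Every rotation on its own supports both values for every variable, so the decomposition prunes nothing, while the joint solution list is empty because a cycle of odd length cannot alternate.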
Cyclic forms of REGULAR
● REGULARo(A,[X1,..,Xn])
– Reduction from Hamiltonian cycle
– Consider polynomial sized automaton A1 that accepts any sequence in which the 1st character is never repeated
– Consider polynomial sized automaton A2 that accepts any walk in a graph
● T(a,b)=b iff (a,b) in edges of graph
– Consider polynomial sized automaton A1 intersect A2
– This accepts only those strings corresponding to Hamiltonian cycles
Other generalizations of REGULAR
● REGULAR FIX(A,[X1,..Xn],[B1,..Bm]) iff
– REGULAR(A,[X1,..Xn]) and Bi=1 iff exists j. Xj=i
– Certain values must occur within the sequence
– For example, there must be a maintenance shift
– Unfortunately NP-hard to enforce GAC on this
Other generalizations of REGULAR
● REGULAR FIX(A,[X1,..Xn],[B1,..Bm])
– Simple reduction from Hamiltonian path
– Automaton A accepts any walk on a graph
– n=m and Bi=1 for all i
Chomsky hierarchy
● Regular languages
● Context-free languages
● Context-sensitive languages
● ..
Chomsky hierarchy
● Regular languages
– GAC propagator in O(ndQ) time
● Context-free languages
– GAC propagator in O(n^3) time and O(n^2) space
– Asymptotically optimal as same as parsing!
● Context-sensitive languages
– Checking if a string is in the language is PSPACE-complete
– Undecidable to know if empty string in grammar and thus to detect domain wipeout and enforce GAC!
Context-free grammars
● Possible applications
– Hierarchy configuration
– Bioinformatics
– Natural language parsing
– …
● CFG(G,[X1,…Xn]) holds iff
– X1 .. Xn is a string accepted by the context-free grammar G
Context-free grammars
● CFG(G,[X1,…Xn])
– Consider a block stacking example
– S -> NP | P | PN | NPN
– N -> n | nN
– P -> aa | bb | aPa | bPb
– These rules give n* w rev(w) n* where w is in (a|b)+
– Not expressible using a regular language
● Chomsky normal form
– Non-terminal -> Terminal
– Non-terminal -> Non-terminal Non-terminal
Context-free grammars
● CFG(G,[X1,…Xn])
– Example with X1 in {n,a}, X2 in {b}, X3 in {a,b} and X4 in {n,a}
– Enforcing GAC on CFG prunes X3=a
– Only supports are nbbn and abba
CFG propagator
● Adapt CYK parser
● Works on Chomsky normal form
– Non-terminal -> Terminal
– Non-terminal -> Non-terminal Non-terminal
● Using dynamic programming
– Computes V[i,j], set of possible parsings for the ith to the jth symbols
CFG propagator
● Adapt CYK parser which uses dynamic programming
– For i=1 to n do
–   V[i,1]:={A | A->a in G, a in dom(Xi)}
– For j=2 to n do
–   For i=1 to n-j+1 do
–     V[i,j]:={}
–     For k=1 to j-1 do
–       V[i,j]:=V[i,j] u {A | A->BC in G, B in V[i,k], C in V[i+k,j-k]}
– If not(S in V[1,n]) then “unsat”
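The bottom-up table above detects unsatisfiability; to prune domains, one possible completion is a top-down pass that keeps only the non-terminals taking part in some parse rooted at S. The sketch below does both on the block stacking example. The top-down pass, the function name `cfg_gac`, and the hand-converted Chomsky normal form (helper non-terminals A, B, PA, PB, PN) are assumptions for illustration and do not come from the slides.

```python
from collections import defaultdict


def cfg_gac(domains, unit_rules, binary_rules, start="S"):
    """unit_rules: (A, a) for A -> a; binary_rules: (A, B, C) for A -> B C.

    Returns the pruned domains, or None if no string is derivable.
    """
    n = len(domains)
    V = defaultdict(set)   # V[i, j]: non-terminals deriving j symbols from position i
    for i in range(1, n + 1):
        V[i, 1] = {A for (A, a) in unit_rules if a in domains[i - 1]}
    for j in range(2, n + 1):
        for i in range(1, n - j + 2):
            for k in range(1, j):
                V[i, j] |= {A for (A, B, C) in binary_rules
                            if B in V[i, k] and C in V[i + k, j - k]}
    if start not in V[1, n]:
        return None
    # Top-down pass: mark non-terminals used in at least one parse from S.
    used = defaultdict(set)
    used[1, n].add(start)
    for j in range(n, 1, -1):
        for i in range(1, n - j + 2):
            for (A, B, C) in binary_rules:
                if A in used[i, j]:
                    for k in range(1, j):
                        if B in V[i, k] and C in V[i + k, j - k]:
                            used[i, k].add(B)
                            used[i + k, j - k].add(C)
    return [{a for (A, a) in unit_rules
             if A in used[i, 1] and a in domains[i - 1]}
            for i in range(1, n + 1)]


# Block stacking grammar, converted by hand to Chomsky normal form.
unit_rules = [("N", "n"), ("A", "a"), ("B", "b")]
binary_rules = [
    ("N", "N", "N"),
    ("P", "A", "A"), ("P", "B", "B"), ("P", "A", "PA"), ("P", "B", "PB"),
    ("PA", "P", "A"), ("PB", "P", "B"), ("PN", "P", "N"),
    ("S", "N", "P"), ("S", "P", "N"), ("S", "N", "PN"),
    ("S", "A", "A"), ("S", "B", "B"), ("S", "A", "PA"), ("S", "B", "PB"),
]

domains = [{"n", "a"}, {"b"}, {"a", "b"}, {"n", "a"}]
print(cfg_gac(domains, unit_rules, binary_rules))
```

On the four-variable example from the earlier slide this removes a from the domain of X3 and leaves the other domains unchanged, which agrees with the supports nbbn and abba.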
Conclusions
● Global grammar constraints
– Specify wide range of global constraints
– Provide efficient and effective propagators automatically
– Nice marriage of formal language theory and constraint programming!