On recognizing words that are squares for the shuffle product

On recognizing words that are squares for the shuffle product. Romeo Rizzi (Department of Computer Science, University of Verona, Italy) and Stéphane Vialette (CNRS & LIGM, Université Paris-Est Marne-la-Vallée, France). 8th International Computer Science Symposium in Russia (CSR 2013), Ural Federal University, Ekaterinburg, Russia, June 25-29, 2013.


Page 1: On recognizing words that are squares for the shuffle product

On recognizing words that are squares for the shuffle product

Romeo Rizzi (1), Stéphane Vialette (2)

(1) Department of Computer Science, University of Verona, Italy

(2) CNRS & LIGM, Université Paris-Est Marne-la-Vallée, France

8th International Computer Science Symposium in Russia, Ural Federal University, Ekaterinburg, Russia

June 25-29, 2013

Rizzi & Vialette () Being a square for the shuffle product CSR 2013 1 / 34

Page 2: On recognizing words that are squares for the shuffle product

Shuffle

The shuffle u ⧢ v of words u and v over A is the finite set of all words obtainable from merging the words u and v from left to right, but choosing the next symbol arbitrarily from u or v.

ab ⧢ cd = {abcd, acbd, acdb, cabd, cadb, cdab}
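The definition can be tried out directly. The following is a minimal sketch (the function name `shuffle` is ours, not from the talk) that enumerates every interleaving recursively; it is exponential in general and only meant for small words.

```python
from functools import lru_cache


def shuffle(u: str, v: str) -> set:
    """Return the set of all interleavings of u and v (exponential size)."""

    @lru_cache(maxsize=None)
    def go(i: int, j: int) -> frozenset:
        # All words obtainable by merging the suffixes u[i:] and v[j:].
        if i == len(u) and j == len(v):
            return frozenset({""})
        out = set()
        if i < len(u):  # next symbol taken from u
            out |= {u[i] + w for w in go(i + 1, j)}
        if j < len(v):  # next symbol taken from v
            out |= {v[j] + w for w in go(i, j + 1)}
        return frozenset(out)

    return set(go(0, 0))


print(sorted(shuffle("ab", "cd")))
# ['abcd', 'acbd', 'acdb', 'cabd', 'cadb', 'cdab']
```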

The iterated shuffle of u is the language

{ε} ∪ {u} ∪ (u ⧢ u) ∪ (u ⧢ u ⧢ u) ∪ ...


Page 3: On recognizing words that are squares for the shuffle product

Iterated shuffle

There are basically two kinds of questions that can be addressed, depending on whether or not the shuffled element is given as part of the input:

“Given u, v ∈ A∗, is u in the iterated shuffle of v?”, and
“Given u ∈ A∗, is u in the iterated shuffle of some v ∈ A∗?”

For two applications of the shuffle product, we obtain:

“Given u, v ∈ A∗, is u ∈ v ⧢ v?”, and
“Given u ∈ A∗, does there exist v ∈ A∗ such that u ∈ v ⧢ v?”

As we shall see, these two problems differ dramatically in complexity.


Page 4: On recognizing words that are squares for the shuffle product

Some good news

Given words u, v1 and v2, it can be tested in $O(|u|^2 / \log |u|)$ time whether or not u ∈ v1 ⧢ v2 [van Leeuwen and Nivat, 1982].

The shuffle u ⧢ v of words u and v can be computed in $O\left((|u|+|v|)\binom{|u|+|v|}{|u|}\right)$ time [Spehner, 1986].

Given words u1, u2, ..., uk, the shuffle u1 ⧢ u2 ⧢ ... ⧢ uk can be computed in $O\left(\binom{|u_1|+|u_2|+\dots+|u_k|}{|u_1|,\,|u_2|,\,\dots,\,|u_k|}\right)$ time [Allauzen, 2000].
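The membership test of the first item can be illustrated with the textbook dynamic program over pairs of prefixes. This is only a sketch under that assumption: it runs in O(|v1| · |v2|) time and is not the sub-quadratic algorithm cited above; the name `is_shuffle` is ours.

```python
def is_shuffle(u: str, v1: str, v2: str) -> bool:
    """Decide whether u is an interleaving of v1 and v2 (quadratic DP)."""
    n1, n2 = len(v1), len(v2)
    if len(u) != n1 + n2:
        return False
    # After processing row i, reach[j] is True exactly when
    # u[:i + j] is an interleaving of v1[:i] and v2[:j].
    reach = [False] * (n2 + 1)
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            if i == 0 and j == 0:
                reach[j] = True
                continue
            # reach[j] still holds the value of row i - 1 at this point.
            from_v1 = i > 0 and reach[j] and v1[i - 1] == u[i + j - 1]
            # reach[j - 1] already holds the value of row i.
            from_v2 = j > 0 and reach[j - 1] and v2[j - 1] == u[i + j - 1]
            reach[j] = from_v1 or from_v2
    return reach[n2]


print(is_shuffle("acbd", "ab", "cd"))  # True
print(is_shuffle("adbc", "ab", "cd"))  # False
```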


Page 5: On recognizing words that are squares for the shuffle product

Some bad news

Given words u, v1, v2, ..., vk ∈ A∗, it is NP-complete to decide whether or not u ∈ v1 ⧢ v2 ⧢ ... ⧢ vk [Mansfield, 1983; Warmuth and Haussler, 1984].

This remains true even if the alphabet has size 3 [Warmuth and Haussler, 1984].

For two words u and v, it is NP-complete to decide whether or not u is in the iterated shuffle of v [Warmuth and Haussler, 1984].

This remains true even if the alphabet has size 3 [Warmuth and Haussler, 1984].
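The hardness comes from the number of words being part of the input. As a rough illustration (our own sketch, not an algorithm from the talk), a memoized search over tuples of positions decides membership in a k-fold shuffle while visiting at most (|v1| + 1) · ... · (|vk| + 1) states, which is polynomial for fixed k and exponential only in k. The name `is_shuffle_of` is hypothetical.

```python
from functools import lru_cache


def is_shuffle_of(u: str, words: list) -> bool:
    """Decide whether u is an interleaving of all the given words."""
    if len(u) != sum(len(w) for w in words):
        return False

    @lru_cache(maxsize=None)
    def go(idx: tuple) -> bool:
        # idx[t] is the number of letters of words[t] consumed so far.
        pos = sum(idx)
        if pos == len(u):
            return True
        for t, w in enumerate(words):
            if idx[t] < len(w) and w[idx[t]] == u[pos]:
                if go(idx[:t] + (idx[t] + 1,) + idx[t + 1:]):
                    return True
        return False

    return go((0,) * len(words))


print(is_shuffle_of("a1xb2y", ["ab", "12", "xy"]))  # True
```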


Page 6: On recognizing words that are squares for the shuffle product

Our main result

Theorem
Given u ∈ A∗, it is NP-complete to decide whether or not u is the shuffle of some word v ∈ A∗ with itself (i.e., does there exist some v ∈ A∗ such that u ∈ v ⧢ v?).

This result was first claimed by K. Iwama [Iwama, 1983], but it turns out that the proof has a serious flaw.

This result was recently proved independently by Buss and Soltys [Buss and Soltys, 2012].

The problem is polynomial-time solvable when each character occurs at most four times.
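To experiment with small instances, a plain exhaustive search is enough. The sketch below (the name `is_shuffle_square` is ours) scans u from left to right and keeps a queue of letters already assigned to the first copy of v and still waiting for their partner in the second copy. It relies on the observation that the two copies can always be relabelled so that the first copy is never behind the second. The running time is exponential in the worst case, consistent with the NP-completeness result.

```python
from functools import lru_cache


def is_shuffle_square(u: str) -> bool:
    """Decide by exhaustive search whether u is in v ⧢ v for some word v."""
    n = len(u)
    if n % 2:
        return False

    @lru_cache(maxsize=None)
    def go(i: int, pending: str) -> bool:
        # pending: letters of the first copy whose partner in the
        # second copy has not been read yet, in order.
        if len(pending) > n - i:
            return False  # not enough letters left to empty the queue
        if i == n:
            return pending == ""
        # Option 1: u[i] is the next letter of the second copy.
        if pending and pending[0] == u[i] and go(i + 1, pending[1:]):
            return True
        # Option 2: u[i] extends the first copy.
        return go(i + 1, pending + u[i])

    return go(0, "")


print(is_shuffle_square("ababbbaa"))  # True, with v = abba
print(is_shuffle_square("abba"))      # False
```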


Page 7: On recognizing words that are squares for the shuffle product

Stack Exchange discussion board


Page 8: On recognizing words that are squares for the shuffle product

Words and linear graphs

Let u = u1 u2 ... un ∈ A^n be a word on some alphabet A.

The graph associated to u, denoted Gu, is defined by

V(Gu) = {u1, u2, ..., un}, and
E(Gu) = {{ui, uj} : i ≠ j ∧ ui = uj}.

The structure of this underlying graph is linear, i.e., the set of vertices is equipped with a natural total order < defined by ui < uj if and only if i < j.

In other words, the vertices of Gu correspond to the letters of u in left-to-right order, and there is an edge between any two distinct occurrences of the same letter of u.

Clearly, Gu is the disjoint union of cliques, one clique for each distinct letter of u.
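As a small illustration of this definition, the edges of Gu can be listed by comparing positions. The sketch uses 0-based positions as vertices, and the name `linear_graph_edges` is ours.

```python
from itertools import combinations


def linear_graph_edges(u: str) -> list:
    """Edges of Gu: pairs of positions i < j carrying the same letter."""
    return [(i, j) for i, j in combinations(range(len(u)), 2) if u[i] == u[j]]


edges = linear_graph_edges("ababbbaa")
# Two cliques on four vertices each (one per letter): 6 + 6 edges.
print(len(edges))  # 12
```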


Page 9: On recognizing words that are squares for the shuffle product

Words and linear graphs

For u = ababbbaa we obtain:

Gu a b a b b b a a


Page 10: On recognizing words that are squares for the shuffle product

Words, linear graphs, and inclusion-free matchings

Let u = u1 u2 ... un ∈ A^n be a word on some alphabet A, and let Gu be the linear graph associated to u.

A matching is perfect if it covers all the vertices of the graph.

In case the set of vertices is equipped with a total order, a matching M is said to be inclusion-free if there do not exist (independent) edges (ui, uj) and (uk, ul) in M such that ui < uk < ul < uj or uk < ui < uj < ul.

Lemma
Let u ∈ A∗ for some alphabet A, and Gu be the corresponding linear graph. Then, there exists v ∈ A∗ such that u ∈ v ⧢ v if and only if there exists an inclusion-free perfect matching in Gu.
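The lemma suggests a direct search procedure. The following sketch (our own backtracking, with hypothetical names, exponential in the worst case) always matches the leftmost free vertex with a later vertex carrying the same letter and rejects any edge that would contain, or be contained in, an edge already chosen. Reading the left endpoints of the resulting matching in order spells a word v with u ∈ v ⧢ v.

```python
def inclusion_free_perfect_matching(u: str):
    """Return an inclusion-free perfect matching of Gu as a sorted list of
    (left, right) position pairs (0-based), or None if there is none."""
    n = len(u)
    partner = [None] * n

    def nests(i: int, j: int) -> bool:
        # True if edge (i, j) contains or is contained in a chosen edge.
        for k in range(n):
            l = partner[k]
            if l is None or l < k:
                continue  # unmatched vertex, or edge already seen from k
            if (i < k and l < j) or (k < i and j < l):
                return True
        return False

    def go(i: int) -> bool:
        while i < n and partner[i] is not None:
            i += 1
        if i == n:
            return True
        for j in range(i + 1, n):
            if partner[j] is None and u[j] == u[i] and not nests(i, j):
                partner[i], partner[j] = j, i
                if go(i + 1):
                    return True
                partner[i] = partner[j] = None
        return False

    if n % 2 == 0 and go(0):
        return sorted((i, partner[i]) for i in range(n) if i < partner[i])
    return None


u = "ababbbaa"
matching = inclusion_free_perfect_matching(u)
v = "".join(u[i] for i, _ in matching)  # word spelled by the left endpoints
print(matching, v)
```

Because no edge contains another, the right endpoints appear in the same order as the left endpoints, so both endpoint sequences spell the same word.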



Page 12: On recognizing words that are squares for the shuffle product

Linear graph and inclusion-free perfect matching

u = a b b a a b b a

Gu: a b a b b b a a

M: a b a b b b a a

[Figure: the linear graph Gu and an inclusion-free perfect matching M]


Page 13: On recognizing words that are squares for the shuffle product

Being a square for the shuffle product

Theorem
Given u ∈ A∗, it is NP-complete to decide whether or not u is the shuffle of some word v ∈ A∗ with itself.

We propose a polynomial-time reduction from the NP-complete Longest Common Subsequence problem for binary words [Maier, 1978]:

“Given a collection of words U = {u1, u2, ..., um}, ui ∈ {0, 1}∗ for 1 ≤ i ≤ m, and a positive integer k, decide whether there exists a subsequence of size k common to all words of U.”

Without loss of generality, we may assume that |ui| = |uj| for 1 ≤ i < j ≤ m, and that we are looking for a common subsequence with p letters 0 and q letters 1, k = p + q.
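For concreteness, the refined decision problem (a common subsequence with exactly p zeros and q ones) can be stated as a brute-force check. This is a sketch with hypothetical names; it tries all binary candidates of length p + q and is therefore exponential in k.

```python
from itertools import product


def is_subsequence(v, u) -> bool:
    """True if v can be obtained from u by deleting letters."""
    it = iter(u)
    return all(c in it for c in v)  # each lookup consumes the iterator


def has_common_subsequence(words: list, p: int, q: int) -> bool:
    """Is there a word with p zeros and q ones that is a subsequence
    of every word in the collection?"""
    for cand in product("01", repeat=p + q):
        if cand.count("0") == p and all(is_subsequence(cand, w) for w in words):
            return True
    return False


words = ["0110", "1010", "0101"]
print(has_common_subsequence(words, 1, 1))  # True, e.g. 01
print(has_common_subsequence(words, 2, 2))  # False
```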


Page 14: On recognizing words that are squares for the shuffle product

Lossless transmission

It will be convenient to see the reduction as a flow-like procedure, where some piece of information (the common subsequence) emitted from gadget Ws (the source) propagates losslessly to gadget Wt (the sink), going through all gadgets Wi, 1 ≤ i ≤ m (every such gadget being associated to an input word of our input instance of 01-LCS).


Page 15: On recognizing words that are squares for the shuffle product

Being a square for the shuffle product - the big picture

[Figure: SOURCE, LOSSLESS TRANSMISSION, TARGET]


Page 16: On recognizing words that are squares for the shuffle product

Being a square for the shuffle product - the big picture

[Figure: LOSSLESS TRANSMISSION LINE. SOURCE (emit word), First word (transmit word), Second word (transmit word), ..., Last word (transmit word), TARGET]


Page 17: On recognizing words that are squares for the shuffle product

The constructed string

The string of interest is defined by:

w = Ws W1 W2 . . . Wm Wt ,

where Ws, W1, W2, . . ., Wm and Wt are words in A∗.

Ws is the “source”.

Each Wi is associated to an input binary string.

Wt is the “target”.


Page 18: On recognizing words that are squares for the shuffle product

Designing the source and the target

Source
Ws = s′ 0^{pq} s′ s (0^p 1)^q 0^p s

Target
Wt = t (0^p 1)^q 0^q t t′ 0^{pq} t′

Recall that we are looking for a common subsequence with p letters 0 and q letters 1, k = p + q.

Letters s, s′, t and t′ do not occur in any other gadget word.


Page 19: On recognizing words that are squares for the shuffle product

Designing the source: the big picture


Page 20: On recognizing words that are squares for the shuffle product

Designing a string gadget

For any input binary string ui = ui,1 ui,2 ... ui,n, define

Wi = xi W′i xi yi W′i yi,

where

W′i = ui,1 zi ui,2 zi ... ui,n−1 zi ui,n.

Letters xi, yi and zi only occur in the gadget word Wi.
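The gadget is easy to build mechanically. In the sketch below, words are represented as lists of symbols because the alphabet grows with m; the symbol names `x1`, `y1`, `z1` and the function name `string_gadget` are our own encoding, not notation from the talk.

```python
def string_gadget(u_i: str, i: int) -> list:
    """Build Wi = xi W'i xi yi W'i yi as a list of symbols, where W'i is
    u_i with the separator zi inserted between consecutive letters."""
    x, y, z = f"x{i}", f"y{i}", f"z{i}"
    inner = []
    for pos, bit in enumerate(u_i):
        if pos:  # separator between consecutive letters only
            inner.append(z)
        inner.append(bit)
    return [x, *inner, x, y, *inner, y]


print(string_gadget("01", 1))
# ['x1', '0', 'z1', '1', 'x1', 'y1', '0', 'z1', '1', 'y1']
```

For an input word of length n the gadget has 4n + 2 symbols.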


Page 21: On recognizing words that are squares for the shuffle product

Designing a string gadget: the big picture


Page 22: On recognizing words that are squares for the shuffle product

Forward direction

Suppose that there exists a common subsequence v of the words u1, u2, ..., um with p occurrences of the letter 0 and q occurrences of the letter 1.

The solution v propagates losslessly from source gadget Ws to target gadget Wt, going through all string gadgets Wi, 1 ≤ i ≤ m (every such gadget being associated to an input word of our input instance of 01-LCS).


Page 23: On recognizing words that are squares for the shuffle product

Reverse direction

Suppose that w is a square for the shuffle product.

According to our previous lemma, this amounts to saying that Gw has an inclusion-free perfect matching M.


Page 24: On recognizing words that are squares for the shuffle product

Reverse direction: Easy observations for the source


Page 25: On recognizing words that are squares for the shuffle product

Reverse direction: Transmitting from the source


Page 26: On recognizing words that are squares for the shuffle product

Reverse direction: Internal transmission


Page 27: On recognizing words that are squares for the shuffle product

Reverse direction: Internal transmission


Page 28: On recognizing words that are squares for the shuffle product

Iterated shuffle

Theorem
It is NP-complete to decide whether or not a word u ∈ A∗ is in the iterated shuffle of some word v ∈ A∗ with u ≠ v.

Proof.
In the previous reduction, some letters occur exactly two times in the constructed word w.

In this case, w cannot be the shuffle of k ≥ 3 identical copies of some word v ∈ A∗.

In other words, if w is in the iterated shuffle of some word v ∈ A∗ with v ≠ w, then w is a square for the shuffle product.
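The counting argument behind this proof can be made explicit: if w is a shuffle of k copies of v, then k divides the number of occurrences of every letter of w. The sketch below (hypothetical name, necessary condition only, for non-empty w) lists the values of k that survive this test.

```python
from collections import Counter
from functools import reduce
from math import gcd


def possible_copy_counts(w: str) -> list:
    """Values of k for which w could be a shuffle of k copies of a word:
    k must divide the number of occurrences of every letter of w.
    This is only a necessary condition; w is assumed non-empty."""
    g = reduce(gcd, Counter(w).values())
    return [k for k in range(1, g + 1) if g % k == 0]


# A letter occurring exactly twice rules out every k >= 3.
print(possible_copy_counts("aabbbb"))  # [1, 2]
```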


Page 29: On recognizing words that are squares for the shuffle product

Turning the problem into an optimisation task

“Let u ∈ A∗. Find the largest v ⪯ u that is a square for the shuffle product.”

Define

f(u) := max{|v| : v ⪯ u and ∃ w ⪯ v such that v ∈ w ⧢ w}
g(n, k) := min{f(u) : u ∈ Σ^n and |Σ| = k}

Theorem ([Axenovich, Person, and Puzynina, 2013])
g(n, 2) = n − o(n).
I.e., any binary word of length n can be split into two identical subwords and, perhaps, a remaining subword of length o(n).

Theorem
There is a polynomial-time approximation scheme (PTAS) for computing f(u).
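For very short words, f(u) can be computed by exhaustive search, which is handy for sanity-checking small cases. The sketch below is not the PTAS mentioned above; it enumerates subsequences by decreasing even length and tests each one with a simple exponential shuffle-square check. Both function names are ours.

```python
from itertools import combinations


def is_shuffle_square(u: str) -> bool:
    """Exhaustive test for u in w ⧢ w, scanning u with a queue of
    first-copy letters that still wait for their second-copy partner."""
    def go(i: int, pending: str) -> bool:
        if len(pending) > len(u) - i:
            return False
        if i == len(u):
            return pending == ""
        if pending and pending[0] == u[i] and go(i + 1, pending[1:]):
            return True
        return go(i + 1, pending + u[i])

    return len(u) % 2 == 0 and go(0, "")


def f(u: str) -> int:
    """Length of a longest subsequence of u that is a shuffle square."""
    n = len(u)
    for size in range(n - n % 2, 0, -2):  # even lengths, largest first
        for pos in combinations(range(n), size):
            if is_shuffle_square("".join(u[i] for i in pos)):
                return size
    return 0


print(f("abab"), f("abba"), f("abc"))  # 4 2 0
```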


Page 30: On recognizing words that are squares for the shuffle product

Being the shuffle of a word with its reverse

“Given u ∈ A∗, does there exist v ∈ A∗ such that u ∈ v ⧢ v^R?”

Theorem ([Henshall, Rampersad, and Shallit, 2011])
Let u be a word over a binary alphabet A. It can be decided in polynomial time whether or not there exists v ∈ A∗ such that u ∈ v ⧢ v^R.

Proof.
If there exists v ∈ A∗ such that u ∈ v ⧢ v^R, then u is an abelian square (i.e., u = x x′, where x′ is a permutation of x).

If u is a binary abelian square, then there exists v ∈ A∗ such that u ∈ v ⧢ v^R.

Notice that the equivalence is no longer true for larger alphabets!
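Both sides of this equivalence are easy to test on small words. The sketch below (hypothetical names) pairs the linear-time abelian-square test with a brute-force test for u ∈ v ⧢ v^R, compares them on all binary words up to length 6, and checks one ternary word, abcabc, on which they differ.

```python
from itertools import combinations, product


def is_abelian_square(u: str) -> bool:
    """u = x x' where x' is a permutation of x."""
    h, r = divmod(len(u), 2)
    return r == 0 and sorted(u[:h]) == sorted(u[h:])


def is_reverse_shuffle_square(u: str) -> bool:
    """Brute force: is u in v ⧢ v^R for some v? (exponential)"""
    n = len(u)
    if n % 2:
        return False
    for pos in combinations(range(n), n // 2):
        chosen = set(pos)
        v = "".join(u[i] for i in pos)
        rest = "".join(u[i] for i in range(n) if i not in chosen)
        if rest == v[::-1]:
            return True
    return False


# The two tests agree on every binary word of length at most 6 ...
agree = all(
    is_abelian_square(w) == is_reverse_shuffle_square(w)
    for n in range(7)
    for w in map("".join, product("ab", repeat=n))
)
print(agree)
# ... but not on the ternary abelian square abcabc.
print(is_abelian_square("abcabc"), is_reverse_shuffle_square("abcabc"))
```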


Page 31: On recognizing words that are squares for the shuffle product

Words and linear graphs

Let u = u1 u2 ... un ∈ A^n be a word on some alphabet A, and let Gu be the linear graph associated to u.

A matching is perfect if it covers all the vertices of the graph.

In case the set of vertices is equipped with a total order, a matching M is said to be 2-nested if the following two conditions hold:

- For every edge (ui, uj) in M, one endpoint is in the first half of the nodes of Gu and the other is in the second half;

- M can be partitioned into two disjoint matchings M1 and M2 such that both M1 and M2 are nested matchings.

Lemma
Let u ∈ A∗ for some alphabet A, and Gu be the corresponding linear graph. Then, there exists v ∈ A∗ such that u ∈ v ⧢ v^R if and only if there exists a 2-nested perfect matching in Gu.



Page 33: On recognizing words that are squares for the shuffle product

Being the shuffle of a word with its reverse

ABACBACA ∈ ABCA ⧢ (ABCA)^R
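This membership can be checked mechanically, taking v = ABCA so that v^R = ACBA (the letter D does not occur in ABACBACA). The sketch below uses a memoized version of the standard two-word shuffle test; the name `is_shuffle` is ours.

```python
from functools import lru_cache


def is_shuffle(u: str, v1: str, v2: str) -> bool:
    """Decide whether u is an interleaving of v1 and v2."""
    if len(u) != len(v1) + len(v2):
        return False

    @lru_cache(maxsize=None)
    def go(i: int, j: int) -> bool:
        # i letters of v1 and j letters of v2 have been consumed.
        if i + j == len(u):
            return True
        c = u[i + j]
        take_v1 = i < len(v1) and v1[i] == c and go(i + 1, j)
        return take_v1 or (j < len(v2) and v2[j] == c and go(i, j + 1))

    return go(0, 0)


v = "ABCA"
print(is_shuffle("ABACBACA", v, v[::-1]))  # True
```

One witness takes v from positions 1, 2, 7, 8 and v^R from positions 3, 4, 5, 6.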


Page 34: On recognizing words that are squares for the shuffle product

Problems left open

“How hard is the problem of detecting squares for the shuffle product for bounded alphabet words?”

It is proved in [Buss and Soltys, 2012] that the problem is NP-complete for an alphabet with 9 symbols (it is claimed that this can be improved to 7 letters).

It is claimed without proof in [Aoki, Uehara, and Yamazaki, 2001] (Fact 2, Subsection 2.2) that detecting squares for the shuffle product is NP-complete for binary words.

This result, which would be an important improvement over [Buss and Soltys, 2012] and our result, is yet to be confirmed.


Page 35: On recognizing words that are squares for the shuffle product

Problems left open

“Let u, v1, v2, ..., vk ∈ A∗. Where does the problem of deciding whether or not u ∈ v1 ⧢ v2 ⧢ ... ⧢ vk fall in the W-hierarchy for parameter k?”

Parameterized complexity [Downey and Fellows, 1999].

In particular, what about the (parameterized) complexity of deciding whether or not u ∈ v1 ⧢ v2 ⧢ ... ⧢ vk for parameter k?

Given words u, v1, v2, ..., vk ∈ A∗, it is NP-complete to decide whether or not u ∈ v1 ⧢ v2 ⧢ ... ⧢ vk [Mansfield, 1983] and [Warmuth and Haussler, 1984].

The question which lies at the heart of this is: how hard is this problem compared to LCS?


Page 36: On recognizing words that are squares for the shuffle product

Problems left open

“How hard is the problem of detecting whether a word is in the shuffle of another word with its reverse?”

Polynomial-time solvable for binary words [Henshall, Rampersad, and Shallit, 2011].

How useful are perfect 2-nested matchings?

RNA pseudoknotted structures: 2-intervals, RNA bi-secondary structures, LAPCS(S,S), ...
