On recognizing words that are squares for the shuffle product

On recognizing words that aresquares for the shuffle product

Romeo Rizzi 1 Stephane Vialette 2

1Department of Computer Science, University of Verona, Italy

2CNRS & LIGM, Universite Paris-Est Marne-la-Vallee, France

8th International Computer Science Symposium in RussiaUral Federal University, Ekaterinburg, Russia

June 25-29, 2013

Rizzi & Vialette () Being a square for the shuffle product CSR 2013 1 / 34

ShuffleThe shuffle u � v of words u and v over A is the finite set of allwords obtainable from merging the words u and v from left toright, but choosing the next symbol arbitrarily from u or v.

ab� cd = {abcd, acbd, acdb, cabd, cadb, cdab}

The iterated shuffle of u is the language

ε ∪ u ∪ (u � u) ∪ (u � u � u) ∪ . . .


Iterated shuffle

There are basically two kinds of questions that can be addresseddepending on whether or not the shuffled element is given as a part ofthe input:

“Given u, v ∈ A∗, is u in the iterated shuffle of v?”, and“Given u ∈ A∗, is u in the iterated shuffle of some v ∈ A∗?”

For two applications of the shuffle product, we obtain:“Given u, v ∈ A∗, is u ∈ v � v?”, and“Given u ∈ A∗, does there exist v ∈ A∗ such that u ∈ v � v?”

As we shall see soon, these two problems dramatically differ incomplexity.


Some good news

Given words u, v1 and v2, it can be tested in O(|u|2/ log(|u|)

)time whether or not u ∈ v1 � v2 [Leeuwen, and Nivat, 1982].

The shuffle u � v of words u and v can be computed inO

((|u|+ |v|)

(|u|+|v||u|

))time [Spehner. 1986].

Given words u1, u2, . . . , uk , the shuffle �ki=1ui can be computed in

O((|u1|+|u2|+...+|uk ||u1|,|u2|,...,|uk |

))time [Allauzen. 2000].


Some bad news

Given words u, v1, v2, . . . , vn ∈ A∗, it is NP-complete to decidewhether or not u ∈ �k

i=1vi . [Mansfield. 1983;Warmuth, andHaussler. 1984].

This remains true even if the alphabet has size 3 [Warmuth, andHaussler. 1984].

For two words u and v, it is NP-complete to decide whether ornot u is in the iterated shuffle of v [Warmuth, and Haussler.1984].

This remains true even if the alphabet has size 3 [Warmuth,and Haussler. 1984].


Our main result

TheoremGiven u ∈ A∗, it is NP-complete to decide whether or not u is theshuffle of some word v ∈ A∗ with itself

(i.e., does there exist some v ∈ A∗ such that u ∈ v � v?).

This result was first claimed by K. Iwama [Iwama. 1983], but itturns out that the proof has a serious flaw.

This result was recently proved independently by Buss and Soltys[Buss, and Soltys. 2012].

The problem is polynomial-time solvable in case each characteroccurs at most four times.


Stack Exchange discussion board


Words and linear graphsLet u = u1 u2 . . . un ∈ An be a word on some alphabet A.

The graph associated to u, denoted V(Gu), is defined by

V(Gu) = {u1, u2, . . . , un}, andE(Gu) = {{ui , uj} : i 6= j ∧ ui = uj}.

The structure of this underlying graph is linear, i.e., the set ofvertices is equipped with a natural total order < defined byui < uj if and only if i < j.

In other words, the vertices of Gu correspond to the letters of u inthe left to right order and there is an edge between any twoidentical distinct letter of u.

Clearly, Gu is the disjoint union of cliques, one clique for eachdistinct letter of u.


Words and linear graphs

For u = ababbbaa we obtain:

Gu a b a b b b a a


Words, linear graphs, and inclusion-free matchings

Let u = u1 u2 . . . un ∈ An be a word on some alphabet A, and let Gube the linear graph associated to u.

A matching is perfect if it covers all the vertices of the graph.

In case the set of vertices is equipped with a total order, amatching M is said to be inclusion-free if there do not exist(independent) edges (ui , uj) and (uk , u`) in M such thatui < uk < u` < uj or uk < ui < ij < u`.

LemmaLet u ∈ A∗ for some alphabet A, and Gu be the corresponding lineargraph. Then, there exists v ∈ A∗ such that u ∈ v � v if and only ifthere exists an inclusion-free perfect matching in Gu.


Linear graph and inclusion-free perfect matching

u = a b b aa b b a

Gu a b a b b b a a

M a b a b b b a a


Being a square for the shuffle product

TheoremGiven u ∈ A∗, it is NP-complete to decide whether or not u is theshuffle of some word v ∈ A∗ with itself.

We propose a polynomial-time reduction from the NP-completeLongest Common Subsequence for binary words[Maier.1978]:

“Given a collection of words U = {u1, u2, . . . , um}, ui ∈ {0, 1}∗ for1 ≤ i ≤ m, and a positive integer k, decide whether there exists asubsequence of size k common to all sequences of U ?”

Without loss of generality, we may assume that |ui | = |uj | for1 ≤ i < j ≤ m, and that that we are looking for a commonsubsequence with p letters 0 and q letters 1, k = p + q.


Lossless transmission

It will convenient to see the reduction as a flow-like procedure, wheresome piece of information (the common subsequence) emitted fromgadget Ws (the source) propagates lossless to gadget Wt (the sink)going through all gadgets Wi , 1 ≤ i ≤ m (every such gadget beingassociated to an input word of our input instance of 01-LCS).


Being a square for the shuffle product - the big picture

SOURCE TARGET

LOSSLESS TRANSMISSION


Being a square for the shuffle product - the big picture

TARGETSecond

word

Last

word

First

wordSOURCE

Emit word Transmit word Transmit word

Transmit wordTransmit word

LOSSLESS TRANSMISSION LINE


The constructed string

The string of interest is defined by:

w = Ws W1 W2 . . . Wm Wt ,

where Ws, W1, W2, . . ., Wm and Wt are words in A∗.

Ws is the “source”.

Each Wi is associated to an input binary string.

Wt is the “target”.


Designing the source and the target

SourceWs = s′ 0pq s′ s (0p 1)q 0p s

TargetWt = t (0p 1)q 0q t t ′ 0pq t ′

Recall that we are looking for a common subsequence with pletters 0 and q letters 1, k = p + q.

Letters s, s′, t and t ′ do not occur in any other gadget word.


Designing the source : the big picture

Reverse direction: Easy observations for the source

Rizzi & Vialette Being a square for the shu�e product CSR 2013 23 / 35


Designing a string gadget

For any input binary string ui = ui,1ui,2 . . . ui,n , define

Wi = xi W ′i xi yi W ′

i yi ,

where

W ′i = ui,1 zi ui,2 zi . . . ui,n−1 zi ui,n .

Letters xi , yi and zi only occur in the gadget word Wi .


Designing a string gadget: the big picture


Forward direction

Suppose that there exists a common subsequence v of the wordsu1, u2, . . . , um with p occurrences of the letter 0 and q occurrenceof the letter 1.

The solution v propagates lossless from source gadget Ws to targetgadget Wt going through all string gadgets Wi , 1 ≤ i ≤ m (everysuch gadget being associated to an input word of our inputinstance of 01-LCS).


Reverse direction

Suppose that w is a square for the shuffle product.

According to our previous lemma, this amount to saying that Gwhas an inclusion-free perfect matching M.


Reverse direction: Easy observations for the source


Reverse direction: Transmitting from the source


Reverse direction: Internal transmission


Iterated shuffle

TheoremIt is NP-complete to decide whether or not a word u ∈ A∗ is in theiterated shuffle of some word v ∈ A∗ with u 6= v.

Proof.In the previous reduction, some letters occur exactly two times inthe constructed word w.

In this case, w cannot be the shuffle of k ≥ 3 identical copies ofsome word v ∈ A∗.

In other words, if w is in the iterated shuffle of some distinct wordv ∈ A∗, then w is a square for the shuffle product.


Turning the problem into an optimisation task“Let u ∈ A∗. Find the largest v � u that is a square for the shuffleproduct.”Define

f (u) := max{|v| : v � u and ∃w � v such that v ∈ w � w}g(n, k) := min{f (u) : u ∈ Σn and |Σ| = k}

Theorem ([Axenovich, Person, and Puzynina. 2013])g(n, 2) = n − o(n).I.e., any binary word of length n can be split into two identicalsubwords and, perhaps, a remaining subword of length o(n).

TheoremThere is polynomial-time approximation scheme (PTAS) for computingf (u).


Being the shuffle of a word with its reverse“Given u ∈ A∗, does there exist v ∈ A∗ such that u ∈ v � vR?”

Theorem ([Henshall,Rampersad, and Shallit. 2011])Let u be some word over some binary alphabet A. It is polynomial-timesolvable to determine whether or not there exists v ∈ A∗ such thatu ∈ v � vR.

Proof.If there exists v ∈ A∗ such that u ∈ v � vR, then u is an abeliansquare (i.e., u = v v′, where v′ is a permutation of v).

If u is a binary abelian square, then there exists v ∈ A∗ such thatu ∈ v � vR.

Notice that the equivalence is no longer true for larger alphabets!


Words and linear graphsLet u = u1 u2 . . . un ∈ An be a word on some alphabet A, and let Gube the linear graph associated to u.

A matching is perfect if it covers all the vertices of the graph.

In case the set of vertices is equipped with a total order, amatching M is said to be 2-nested if the following two conditionshold:

I For every edge (u, v) in M, one endpoint is in the first half of thenodes of Gu and the other is in the second half;

I M can be partitioned into two disjoint matchings M1 and M2 suchthat both M1 and M2 are nested matchings

LemmaLet u ∈ A∗ for some alphabet A, and Gu be the corresponding lineargraph. Then, there exists v ∈ A∗ such that u ∈ v � vR if and only ifthere exists a 2-nested perfect matching in Gu.


Being the shuffle of a word with its reverse

ABACBACA ∈ ABCA� (ABCD)R


Problems left open

“How hard is the problem of detecting squares for the shuffle productfor bounded alphabet words?”

It is proved in [Buss, and Soltys. 2012] that the problem isNP-complete for an alphabet with 9 symbols (it is claimed thatthis can be improved to 7 letters).

It is claimed without proof in [Aoki, Uehara, and Yamazaki.2001] (Fact 2 Subsection 2.2) that detecting squares for the shuffleproduct is NP-complete for binary words.

This result – that would be an important improvement over [Buss,and Soltys. 2012] and our result – is yet to be confirmed.


Problems left open

“Let u, v1, v2, . . . , vk ∈ A∗. Decide whether or not u ∈ �ki=1vi . Where

does fall the problem of deciding whether or not u ∈ �ki=1vi in the

W-hierarchy for parameter k?”

Parameterized complexity [Downey, and Fellows. 1999].

In particular, what about the (parameterized) complexity ofdeciding whether or not u ∈ �k

i=1vi for parameter k?

Given words u, v1, v2, . . . , vk ∈ A∗, it is NP-complete to decidewhether or not u ∈ �k

i=1vi [Mansfield. 1983] and [Warmuth,and Haussler. 1984].

The question which lies at the heart of this is: How hard is thisproblem compared to LCS?


Problems left open

“How hard is the problem of detecting whether of word is in the shuffleof another word with its reverse?”

Polynomial time solvable for binary words[Henshall,Rampersad, and Shallit. 2011].

How useful are perfect 2-nested matchings?

RNA pseudoknotted structures: 2-intervals, RNA Bi-secondarystructures, LAPCS(S,S), . . .


On recognizing words that are squares for the shuffle product

Documents

Transcript of On recognizing words that are squares for the shuffle product