Ravello, 19-20-21/09 C.E. On some researches... Chiara Epifanio.


Ravello, 19-20-21/09 C.E.

On some researches...

Chiara Epifanio


Outline

Compact representation of local automata

The multidimensional Critical Factorization Theorem


The multidimensional Critical Factorization Theorem

Chiara Epifanio, Filippo Mignosi


• A word is a sequence of characters over an alphabet A,

w ∈ A^{1,2,…,n}, A^ℕ, A^ℤ

• w = a_1…a_n is periodic if ∃ p ∈ ℕ s.t.

w(x+p) = w(x) ∀x, 1 ≤ x ≤ n-p

• p is a period of w


• a word may have more than one period (e.g. abaababaabaababaaba, which has periods 8 and 13)

• the smallest period of w is called “the” period of w.
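The definition can be checked by brute force. The following Python sketch is only an illustration: the function name `periods` is ours, and indices are 0-based in the code whereas the slides count positions from 1.

```python
def periods(w):
    """Return every p with 1 <= p <= len(w) such that w[x + p] == w[x]
    whenever both indices fall inside w."""
    n = len(w)
    return [p for p in range(1, n + 1)
            if all(w[x] == w[x + p] for x in range(n - p))]


w = "abaababaabaababaaba"
all_periods = periods(w)
smallest = all_periods[0]  # "the" period of w
```

On the example word, the list contains 8 and 13, and the smallest entry is 8.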


A factor v = w_j…w_{j+n-1} of length n of w is a repetition of order ρ if there exists a natural number p, 0 < p ≤ n, such that w_i = w_{i+p} for i = j,…,j+n-1-p and such that n/p ≥ ρ. The number p is called a period of the repetition. The smallest period of the repetition is called the period of the repetition.

Ex: abaaba is a repetition of

• period 6 and order 1

• period 5 and order 6/5

• period 3 and order 2
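The three values in this example can be recomputed mechanically. The sketch below is an illustration only: the helper name `repetition_orders` is ours. For each period p of a factor of length n it records the largest order that p supports, namely n/p.

```python
from fractions import Fraction


def repetition_orders(v):
    """Map each period p of v (0 < p <= len(v)) to the ratio len(v)/p."""
    n = len(v)
    return {p: Fraction(n, p)
            for p in range(1, n + 1)
            if all(v[i] == v[i + p] for i in range(n - p))}


orders = repetition_orders("abaaba")
```

For abaaba the dictionary has exactly the keys 3, 5 and 6, with values 2, 6/5 and 1.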


Word w has a central repetition of order ρ in position i if there exists a factor v centered in i that is a repetition of order ρ. In this case we denote by c_ρ(w,i) the smallest period among all the central repetitions of order ρ in position i and we call it the central local period of order ρ in i.

We denote by P_ρ(w) the maximum of the central local periods of order ρ in w.

A position i is critical if c_ρ(w,i) = P_ρ(w).


The Critical Factorization Theorem

Let w be a word having length |w| ≥ 2. In every sequence of l ≥ max{1, p(w)-1} consecutive positions there is a critical one and P_ρ(w) = p(w), for ρ = 2.


The Critical Factorization Theorem in particular states that for ρ = 2 there exists at least one point such that the central local period detected at this point coincides with the (global) period of the word, i.e., there exists an integer j, 1 ≤ j ≤ |w|, such that c_ρ(w,j) = p(w), ρ = 2.

We have given a new proof for ρ = 4.
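For order 2 the statement can be tested experimentally. The sketch below uses the classical convention found in the literature, which is an assumption with respect to the slides: a position is a cut between two letters, and the square centered at a cut may overflow either end of the word. The function names are ours.

```python
def period(w):
    """Smallest p >= 1 such that w[x] == w[x + p] wherever both are defined."""
    n = len(w)
    return next(p for p in range(1, n + 1)
                if all(w[x] == w[x + p] for x in range(n - p)))


def local_period(w, i):
    """Smallest p such that a square of period p is centered at the cut
    w[:i] | w[i:]. The square may overflow w on either side, so only index
    pairs that fall inside w are compared."""
    n = len(w)
    p = 1
    while True:
        lo = max(0, i - p)   # first index of the left half that lies in w
        hi = min(i, n - p)   # one past the last j for which j + p lies in w
        if all(w[j] == w[j + p] for j in range(lo, hi)):
            return p
        p += 1


def critical_positions(w):
    """Cuts 1..len(w)-1 whose local period equals the global period."""
    p = period(w)
    return [i for i in range(1, len(w)) if local_period(w, i) == p]
```

For abaaba the local periods at cuts 1 to 5 are 2, 3, 1, 3, 2, so cuts 2 and 4 are critical and the maximum equals the period 3.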


Lemma 1

Let u, v, w be words such that uv and vw have period p and |v| ≥ p. Then the word uvw has period p.

[Figure: the words uv and vw overlap on v and together form uvw.]

(cf. Lemma 8.1.2, Lothaire 2, chapter 8)


Lemma 2

Suppose that w has period q and that there exists a factor v of w with |v| ≥ q that has period r, where r divides q. Then w has period r.

[Figure: the factor v inside w.]

(cf. Lemma 8.1.3, Lothaire 2, chapter 8)


Fine and Wilf Theorem

Let w be a word having periods p and q, with q ≤ p. If

|w| ≥ p + q - gcd(p,q),

then w also has period gcd(p,q).
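A small experiment shows both the theorem and the role of the length bound. This is a sketch with helper names of our own choosing. The second word is the example used earlier in these slides: it has periods 8 and 13 but is one letter too short for the theorem to apply.

```python
from math import gcd


def has_period(w, p):
    """True if w[x] == w[x + p] wherever both indices are defined."""
    return all(w[x] == w[x + p] for x in range(len(w) - p))


def fine_wilf_applies(w, p, q):
    """True if w has periods p and q and satisfies |w| >= p + q - gcd(p, q)."""
    return (has_period(w, p) and has_period(w, q)
            and len(w) >= p + q - gcd(p, q))


long_enough = "abcabcabcabc"        # periods 6 and 9, length 12 = 6 + 9 - 3
too_short = "abaababaabaababaaba"   # periods 8 and 13, length 19 < 8 + 13 - 1
```

The first word has period gcd(6, 9) = 3, as the theorem predicts. The second one does not have period gcd(8, 13) = 1.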


Multidimensional case

(Multidimensional periodicity was introduced by Amir and Benson for the design of pattern matching algorithms (1991). Since then, many people have worked on it, giving slightly different definitions.)


If u is a factor of w then v is a periodicity vector for u if

w((x,y)+v) = w(x,y)

∀(x,y) ∈ Dom(u) such that ((x,y)+v) ∈ Dom(u).

v is a periodicity vector for w if w((x,y)+v) = w(x,y) ∀(x,y).


A factor u of w is lattice-periodic with respect to v1 and v2 if every v ∈ <v1,v2> is a periodicity vector for u.

a b c d a
f g h e f
c d a b c
h e f g h
a b c d a

L = <(2,2), (-2,2)> = <(2,2), (4,0)>
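The example can be verified directly. The sketch below assumes that the picture is a 5 by 5 array read row by row and that a vector (x, y) moves x columns and y rows; both conventions, like the function names, are ours.

```python
GRID = [
    "abcda",
    "fghef",
    "cdabc",
    "hefgh",
    "abcda",
]


def is_periodicity_vector(grid, v):
    """True if grid[y + dy][x + dx] == grid[y][x] whenever both cells
    lie inside the grid."""
    dx, dy = v
    height, width = len(grid), len(grid[0])
    for y in range(height):
        for x in range(width):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height and grid[ny][nx] != grid[y][x]:
                return False
    return True


def lattice_vectors(v1, v2, bound):
    """Integer combinations a*v1 + b*v2 with |a|, |b| <= bound."""
    return {(a * v1[0] + b * v2[0], a * v1[1] + b * v2[1])
            for a in range(-bound, bound + 1)
            for b in range(-bound, bound + 1)}


vectors = lattice_vectors((2, 2), (-2, 2), bound=2)
lattice_periodic = all(is_periodicity_vector(GRID, v) for v in vectors)
```

Every generated lattice vector, including (4,0) = (2,2) - (-2,2), is a periodicity vector of the array, while a vector outside the lattice such as (2,0) is not.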


Given a subgroup H of Z^d, a transversal T_H of H is a subset of Z^d such that for any element i ∈ Z^d, there exists a unique element j ∈ T_H such that i-j ∈ H.

An n-cubic factor v is a repetition of order ρ if

• v is L-periodic, L a lattice;

• n is such that n/h_L ≥ ρ, where h_L is the smallest integer such that every hypercube of side h_L contains a transversal of L.

The lattice L is called a period of the ρ-repetition v.
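For d = 2 the quantity h_L can be found by exhaustive search. The sketch below is a brute-force illustration and not an algorithm from the talk. It assumes that L is given by two linearly independent integer vectors, and the names `coset_key` and `smallest_side` are ours. A square contains a transversal exactly when it meets every coset of L.

```python
def coset_key(point, v1, v2):
    """Key that is equal for two points exactly when they differ by a
    vector of the lattice generated by v1 and v2."""
    det = v1[0] * v2[1] - v1[1] * v2[0]
    m = abs(det)
    x, y = point
    # By Cramer's rule, point = a*v1 + b*v2 with a = n1/det and b = n2/det,
    # so the point is in the lattice iff n1 and n2 are multiples of det.
    n1 = x * v2[1] - y * v2[0]
    n2 = v1[0] * y - v1[1] * x
    return (n1 % m, n2 % m)


def smallest_side(v1, v2):
    """Smallest h such that every h-by-h square of Z^2 meets every coset
    of the lattice generated by v1 and v2."""
    m = abs(v1[0] * v2[1] - v1[1] * v2[0])   # number of cosets
    if m == 0:
        raise ValueError("v1 and v2 must be linearly independent")
    for h in range(1, m + 1):
        # The coset pattern repeats every m cells along each axis, so
        # origins in [0, m) x [0, m) cover all possible squares.
        if all(len({coset_key((ox + dx, oy + dy), v1, v2)
                    for dx in range(h) for dy in range(h)}) == m
               for ox in range(m) for oy in range(m)):
            return h
```

For the lattice <(2,2), (4,0)> the search returns 4: the lattice has 8 cosets, and a 3 by 3 square always misses at least one of them.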


Word w has a central repetition of order ρ in position j ∈ Z^d if there exists a factor v of w centered in j that is a repetition of order ρ.

If w has at least one central repetition of order ρ and period L in j, we consider the set

H = {h_L s.t. every hypercube of side h_L contains a transversal of L}

We denote c_ρ(w,j) = min(H).

Let P_ρ(w) = limsup{c_ρ(w,j), j position in w}


Lemma 3

Let v1 and v2 be two factors of the same word w over Z^d that both have a subgroup H as period. If sh(v1) ∩ sh(v2) contains a transversal of H then the factor v having shape sh(v) = sh(v1) ∪ sh(v2) also has period H.

[Figure: sh(v1) ∩ sh(v2) inside sh(v).]


Lemma 4

Let v1 and v2 be two factors of the same word w over Z^d such that sh(v2) ⊆ sh(v1). Suppose that v1 has period H1 and that v2 has period H2, with H1 a subgroup of H2, and that sh(v2) contains a transversal of H1. Under these hypotheses v1 has period H2.

[Figure: sh(v2) inside sh(v1).]


A generalization of the Fine & Wilf Theorem

If w has two periodicity vectors v1 and v2 and w is “big enough” with respect to v1 and v2, then w is lattice-periodic with respect to v1 and v2.


The multidimensional Critical Factorization Theorem

• Informally, the C.F.T. states that the maximal local repetition of order 2 is also a period of the whole word.

• But... there is no total order among lattices!

• Our solution is to order lattices by using the length h_L of the side of the smallest hypercube that contains a transversal of L.

• We must further prove that all the lattices with the same maximal h_L coincide over the word.

• To do this, for the moment, we lose the tightness of the local repetition order (4 instead of 2).


Theorem

Let w be a cubic bidimensional word and X be a cube included in the shape of w.

• Every cube T ⊆ X of side max(1, P_4(X)-1) contains a position l such that c_4(w,l) = P_4(w).

• Let v be the factor of w having as shape the intersection between sh(w) and the union X’ of the shapes of the 4-repetitions centered in positions l ∈ X such that c_4(w,l) = P_4(X). Then v has period L, where L is a subgroup such that every cube of side P_4(X) contains a transversal of L.


Proof of the theorem

Lemma 4 + Fine & Wilf generalization + Lemma 3 ⇒ Thesis


Conclusions and open problems

• Importance of the extension to the d-dimensional case (d ≥ 2).

• Difficulties of such an extension (new definitions, extension of already known results).

• It is known that for d = 1 the tight value is ρ = 2. It remains an open problem to find the tight value of ρ for any dimension.

• Applications.


Compact representation of local automata

M. Crochemore, C. Epifanio, R. Grossi, F. Mignosi


Compacting is a standard technique for reducing the size of data structures such as factor automata, DAWGs and suffix trees; it consists in replacing paths in automata with single edges.

In 2000 Crochemore, Mignosi, Restivo and Salemi gave an algorithm for “self-compressing” the trie of an antifactorial binary set of words. The aim of that algorithm was to represent antidictionaries compactly, so that they can be sent to the decoder of a static compression scheme. What we have worked on is an improvement of that algorithm that works for sets of words over any alphabet.


The suffix trie Tr(w) of a word w is a trie where the set of leaves is the set of suffixes of w that do not appear previously as a factor in w.


The suffix tree T(w) of a word w is a compressed suffix trie, where only leaves and forks are kept. Each edge is labelled with a substring of w. In this way the number of nodes and leaves of T(w) is smaller than 2|w|.

But if the labels of arcs are stored explicitly, the implementation can have quadratic size. The simple solution is to represent labels by pairs of integers (position, position) or (position, length) and to keep the text aside.
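The (position, length) encoding can be illustrated with a deliberately naive construction. The sketch below is not one of the linear-time algorithms: it enumerates all factors, which takes quadratic space, and identifies each node with its path label. The function name and the end marker "$" are assumptions made for the illustration.

```python
def suffix_tree_edges(w):
    """Compacted suffix trie of w: keep the root, forks and leaves only.

    Returns (nodes, edges); each edge is (parent, child, (position, length)),
    where w[position:position + length] is the edge label.
    """
    n = len(w)
    factors = {w[i:j] for i in range(n + 1) for j in range(i, n + 1)}
    alphabet = sorted(set(w))

    def children(u):
        return [u + a for a in alphabet if u + a in factors]

    nodes = {u for u in factors if u == "" or len(children(u)) != 1}
    edges = []
    for u in nodes:
        for x in children(u):
            while x not in nodes:          # follow the unary path
                x = children(x)[0]
            position = w.find(x) + len(u)  # any occurrence of x works
            edges.append((u, x, (position, len(x) - len(u))))
    return nodes, edges


text = "abaababa$"   # "$" is an assumed end marker, not part of the slides
nodes, edges = suffix_tree_edges(text)
```

Each edge stores two integers instead of a string, and the label is recovered from the text as `text[position:position + length]`.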


There are classical on-line linear-time implementations. All of them use the suffix link function s, which is defined over all the nodes of the suffix trie and suffix tree by

• s(root) = root

• s(v) = v’, where v = av’, v being the label of the path from the root to v and a being the first letter of v.
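When each node is identified with the label of its path from the root, the suffix link simply drops the first letter. The following sketch makes that explicit; the function names and the representation of nodes as strings are assumptions made for the illustration.

```python
def suffix_trie_nodes(w):
    """Nodes of the suffix trie of w, each identified by its path label.
    The root is the empty string."""
    n = len(w)
    return {w[i:j] for i in range(n + 1) for j in range(i, n + 1)}


def suffix_link(v):
    """s(root) = root and s(a v') = v'."""
    return v[1:] if v else v


trie = suffix_trie_nodes("abaab")
```

Since every suffix of a factor is again a factor, the link of a trie node is always a trie node.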


Our new approach is basically the same as that of the suffix tree, but we compact a bit less, i.e. we keep all nodes of the suffix tree and some more nodes of the trie, namely all the nodes v of the trie such that s(v) is a node of the suffix tree.

In this case for any arc of the form (v,v’) with label a in the trie we have an arc (v,x) with the same label in our compacted trie T2(w), where x is

• v’, if v’ ∈ T2(w);

• the first node in T2(w) that is a descendant of v’ in the original trie, if v’ ∉ T2(w).

In this second case, we consider that (v,x) represents the whole path from v to x in the suffix trie and we add a sign + to node x in order to maintain this information.
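The definition of T2(w) can be prototyped naively. The sketch below is not the compacting algorithm of the talk: it builds the full trie first, identifies nodes with their path labels, and assumes that the text ends with a unique end marker "$" so that every suffix is a leaf. The name `build_t2` and the boolean flag standing for the + sign are ours.

```python
def build_t2(w):
    """Return (tree_nodes, t2_nodes, arcs) for the word w.

    arcs maps (v, a) to (x, plus), where plus is True when the arc stands
    for a whole trie path and x carries the "+" sign.
    """
    n = len(w)
    trie = {w[i:j] for i in range(n + 1) for j in range(i, n + 1)}
    alphabet = sorted(set(w))

    def children(u):
        return [u + a for a in alphabet if u + a in trie]

    # Suffix tree nodes: root, forks and leaves of the trie.
    tree = {u for u in trie if u == "" or len(children(u)) != 1}
    # T2 keeps the tree nodes plus every trie node whose suffix link
    # (the label without its first letter) is a tree node.
    t2 = {v for v in trie if v in tree or v[1:] in tree}

    arcs = {}
    for v in t2:
        for child in children(v):
            x = child
            while x not in t2:      # skipped nodes are unary in the trie
                x = children(x)[0]
            arcs[(v, child[-1])] = (x, x != child)
    return tree, t2, arcs


tree, t2, arcs = build_t2("abaababa$")   # "$" is an assumed end marker
```

With the end marker, the suffix link of every node of T2 is a node of the suffix tree, and every arc ends in a node of T2.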


To complete the definition of T2(w) we keep the suffix link function over these nodes.

Notice that, by definition, for any node v of T2(w), s(v) is always a node of the suffix tree T(w) and hence it also belongs to T2(w).

This new approach lets us avoid keeping the text aside.


State of the art

• We have given compacting and decompacting algorithms;

• we have proved that the number of nodes in our compacted suffix tree is still linear;

• we have given an algorithm that can be used to check whether a pattern is present in a text, without “decompacting” the automaton;

• we are currently running experiments on the Calgary and Canterbury corpora.