Constructing Signature Graphs for Signature Files


Page 1: Constructing Signature Graphs for Signature Files

Constructing Signature Graphs for Signature Files

Dr. Yangjun Chen

Dept. Applied Computer Science University of Winnipeg

Canada

Page 2: Constructing Signature Graphs for Signature Files

• Motivation

• Signature Files as Indexes

• Signature Graph and its Construction

– Signature Graph and its Construction

– Searching a Signature Graph

• Maintenance of Signature Graph

• Summary and Future Work

Page 3: Constructing Signature Graphs for Signature Files

Motivation

• Establish indexes to speed up query evaluation

• B+-trees, inverted files, signature files

• Signature files: simple and easy to maintain

• Signature graphs: less time for searching

Page 4: Constructing Signature Graphs for Signature Files

Signature Files as Indexes

Definition: A signature for a keyword or an attribute value is a hash-coded bit string.

Signature construction

- Important parameters:

m: number of 1s in the bit string

F: length of the bit string

D: size of a block (or the average number of keywords in an element)

- Optimal choice of the parameters:

F · ln 2 = m · D
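As a rough numeric check of the rule F · ln 2 = m · D, the sketch below solves it for m or for F. The function names are illustrative and not from the slides, and D = 2 is an assumed value, chosen only because it puts the weight for F = 12 close to 4.

```python
import math

def optimal_weight(F: int, D: float) -> float:
    """Solve F * ln 2 = m * D for m (number of 1s per signature)."""
    return F * math.log(2) / D

def required_length(m: int, D: float) -> float:
    """Solve F * ln 2 = m * D for F (signature length in bits)."""
    return m * D / math.log(2)

# Assumed block size D = 2 keywords.
print(round(optimal_weight(12, 2), 2))   # 4.16
print(round(required_length(4, 2), 2))   # 11.54
```

In practice m and F are integers, so the computed values would be rounded.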

Page 5: Constructing Signature Graphs for Signature Files

• Example: constructing a signature for the word “database” with m = 4 and F = 12

letter triplets: dat, ata, tab, aba, bas, ase

H(dat) = 5, H(ata) = 1, H(tab) = 8, H(aba) = 1, H(bas) = 10, H(ase) = 8

Setting bits 1, 5, 8 and 10 gives the signature 100 010 010 100
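The example can be replayed with a few lines of Python. The slides do not define the hash function H, so this sketch uses a lookup table holding the six hash values listed above; the helper names `triplets` and `word_signature` are made up for illustration.

```python
# Hash values copied from the slide. The real hash function H is not given,
# so a lookup table stands in for it (assumption).
H = {"dat": 5, "ata": 1, "tab": 8, "aba": 1, "bas": 10, "ase": 8}

def triplets(word: str) -> list:
    """All overlapping letter triplets of a word."""
    return [word[i:i + 3] for i in range(len(word) - 2)]

def word_signature(word: str, F: int = 12) -> str:
    """Set bit H(t) for every triplet t; positions are 1-based as on the slide."""
    bits = ["0"] * F
    for t in triplets(word):
        bits[H[t] - 1] = "1"
    return "".join(bits)

sig = word_signature("database")
print(triplets("database"))  # ['dat', 'ata', 'tab', 'aba', 'bas', 'ase']
print(sig)                   # 100010010100
```

Two pairs of triplets collide (on bits 1 and 8), which is why six triplets yield only m = 4 set bits here.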

Page 6: Constructing Signature Graphs for Signature Files

Signature Files as Indexes

text: … SGML … database … information …

word          word signature
SGML          010000100110
database      100010010100
information   010100011000

object signature (OS): 110110111110

query         query signature   matching result
SGML          010000100110      match with OS
XML           011000100100      no match with OS
informatik    110100100000      false drop
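The object signature in this example is the bitwise OR of the three word signatures, and a query matches when every 1 in its signature is also set in OS. A minimal sketch with the bit strings from the slide follows; the function names are illustrative.

```python
WORD_SIGS = {
    "SGML":        "010000100110",
    "database":    "100010010100",
    "information": "010100011000",
}
QUERY_SIGS = {
    "SGML":       "010000100110",
    "XML":        "011000100100",
    "informatik": "110100100000",
}

def superimpose(sigs) -> int:
    """Bitwise OR of all word signatures (superimposed coding)."""
    value = 0
    for s in sigs:
        value |= int(s, 2)
    return value

def may_contain(object_sig: int, query_sig: str) -> bool:
    """True if every 1 bit of the query signature is also set in the object signature."""
    q = int(query_sig, 2)
    return (object_sig & q) == q

OS = superimpose(WORD_SIGS.values())
print(format(OS, "012b"))  # 110110111110
for word, sig in QUERY_SIGS.items():
    print(word, may_contain(OS, sig))
```

"informatik" passes the bit test although the word does not occur in the text. That is the false drop, and it is why a match on signatures still has to be verified against the actual data.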

Page 7: Constructing Signature Graphs for Signature Files

Example:

relation:

name   sex    ...
John   male   ...
...    ...    ...

signature file:

s1: 1011 0110
s2: 1011 1001
s3: 1010 0111
s4: 0111 0110
s5: 0111 0101
s6: 0101 1100
s7: 1110 0100
s8: 1010 1011

query: John male    query signature: 1010 0101
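Without any further index, the query is answered by scanning the signature file sequentially. The sketch below does that for the eight signatures and the query signature of this example; `scan` is an illustrative name.

```python
# Signature file s1 .. s8 from the slide (spaces removed).
SIGNATURE_FILE = [
    "10110110", "10111001", "10100111", "01110110",
    "01110101", "01011100", "11100100", "10101011",
]

def scan(sig_file, query_sig: str) -> list:
    """Return the 1-based numbers of all signatures that cover the query signature."""
    q = int(query_sig, 2)
    return [i for i, s in enumerate(sig_file, start=1) if (int(s, 2) & q) == q]

print(scan(SIGNATURE_FILE, "10100101"))  # [3]
```

Every signature is inspected, so the cost grows linearly with the file; the signature graph is meant to avoid this full scan.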

Page 8: Constructing Signature Graphs for Signature Files

Signature Graph

Consider a signature si of length F. We denote it as si = si[1]si[2] ... si[F], where each si[j] ∈ {0, 1} (j = 1, ..., F). We also use si(j1, ..., jh) to denote a sequence of pairs w.r.t. si: (j1, si[j1])(j2, si[j2]) ... (jh, si[jh]), where 1 ≤ jk ≤ F for k ∈ {1, ..., h}.

Definition (signature identifier) Let S = s1.s2 ... .sn denote a signature file. Consider si (1 ≤ i ≤ n). If there exists a sequence j1, ..., jh such that for any k ≠ i (1 ≤ k ≤ n) we have si(j1, ..., jh) ≠ sk(j1, ..., jh), then we say that si(j1, ..., jh) identifies the signature si, or that si(j1, ..., jh) is an identifier of si.

Page 9: Constructing Signature Graphs for Signature Files

Example:

s8(5, 1, 4) = (5, 1)(1, 1)(4, 0)

(* For any i ≠ 8 we have si(5, 1, 4) ≠ s8(5, 1, 4).

For instance, s5(5, 1, 4) = (5, 0)(1, 0)(4, 1) ≠ s8(5, 1, 4), s2(5, 1, 4) = (5, 1)(1, 1)(4, 1) ≠ s8(5, 1, 4), and so on. *)

s1(5, 4, 1) = (5, 0)(4, 1)(1, 1)

(* For any i ≠ 1 we have si(5, 4, 1) ≠ s1(5, 4, 1). *)
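The identifier property can be checked mechanically against the eight-signature file of the running example. This is only a brute-force sketch; `project` and `is_identifier` are illustrative names.

```python
# Signature file s1 .. s8 of the running example, keyed by signature number.
S = {
    1: "10110110", 2: "10111001", 3: "10100111", 4: "01110110",
    5: "01110101", 6: "01011100", 7: "11100100", 8: "10101011",
}

def project(sig: str, positions) -> tuple:
    """si(j1, ..., jh): the pairs (j, si[j]) for the given 1-based positions."""
    return tuple((j, int(sig[j - 1])) for j in positions)

def is_identifier(i: int, positions) -> bool:
    """True if no other signature agrees with s_i on all the given positions."""
    target = project(S[i], positions)
    return all(project(S[k], positions) != target for k in S if k != i)

print(project(S[8], (5, 1, 4)))    # ((5, 1), (1, 1), (4, 0))
print(is_identifier(8, (5, 1, 4))) # True
print(is_identifier(1, (5, 4, 1))) # True
print(is_identifier(1, (5,)))      # False: s3 also has a 0 at position 5
```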

Page 10: Constructing Signature Graphs for Signature Files

Signature Graph

• Definition (signature graph) A signature graph G for a signature file S = s1.s2 ... .sn, where si ≠ sj for i ≠ j and |sk| = F for k = 1, ..., n, is a graph G = (V, E) such that

1. Each node v ∈ V is of the form (p, skip), where p is a pointer to a signature s in S, and skip is a non-negative integer i. If i > 0, the i-th bit of the query signature sq will be checked when searching. If i = 0, s will be compared with sq.

2. Let e = (u, v) ∈ E. Then e is labeled with 0 or 1 and skip(u) > 0. Let skip(u) = i. If e is labeled with 0, the i-th bit of the signature pointed to by p(v) is 0. If e is labeled with 1, the i-th bit of the signature pointed to by p(v) is 1. A node v with skip(v) = 0 does not have any children.
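One possible in-memory representation of such a graph, together with a checker for condition 2, is sketched below. The `Node` class, the `satisfies_definition` function and the two-signature sample graph are assumptions made for illustration; they are not taken from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Node:
    p: int                      # pointer: 1-based number of a signature in S
    skip: int                   # bit to check (1-based); 0 means "compare whole signature"
    children: Dict[int, "Node"] = field(default_factory=dict)  # edge label -> child

def satisfies_definition(root: Node, S: List[str]) -> bool:
    """Check condition 2 on every edge reachable from root."""
    stack, seen = [root], set()
    while stack:
        u = stack.pop()
        if id(u) in seen:       # graphs may share nodes; visit each only once
            continue
        seen.add(id(u))
        if u.skip == 0 and u.children:
            return False        # a node with skip = 0 must not have children
        for label, v in u.children.items():
            if label not in (0, 1):
                return False
            # the skip(u)-th bit of the signature pointed to by p(v) must equal the label
            if S[v.p - 1][u.skip - 1] != str(label):
                return False
            stack.append(v)
    return True

# Hypothetical two-signature graph: s1 and s2 first differ at bit 5.
S = ["10110110", "10111001"]
good = Node(p=2, skip=5, children={0: Node(p=1, skip=0), 1: Node(p=2, skip=0)})
bad = Node(p=2, skip=5, children={1: Node(p=1, skip=0), 0: Node(p=2, skip=0)})
print(satisfies_definition(good, S), satisfies_definition(bad, S))  # True False
```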

Page 11: Constructing Signature Graphs for Signature Files

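[Figure: signature graph for the signature file below. The nodes, written as (p, skip), are (p1, 0), (p2, 5), (p3, 4), (p4, 1), (p5, 3), (p6, 1), (p7, 2) and (p8, 4); each edge is labeled 0 or 1.]

S1: 1011 0110
S2: 1011 1001
S3: 1010 0111
S4: 0111 0110
S5: 0111 0101
S6: 0101 1100
S7: 1110 0100
S8: 1010 1011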

Page 12: Constructing Signature Graphs for Signature Files

Construction of signature graph:

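[Figure: the signature graph after each of the first four insertions. Insert s1: a single node (p1, 0). Insert s2: node (p2, 5) is added. Insert s3: node (p3, 4) is added. Insert s4: node (p4, 1) is added. Edges are labeled 0 or 1.]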

Page 13: Constructing Signature Graphs for Signature Files

[Figure: the signature graph after each of the remaining insertions. Insert s5: node (p5, 3) is added. Insert s6: node (p6, 1) is added. Insert s7: node (p7, 2) is added. Insert s8: node (p8, 4) is added. Edges are labeled 0 or 1.]

Page 14: Constructing Signature Graphs for Signature Files

Signature Graph

Searching a signature graph

Denote by sq(i) the i-th position of the query signature sq. During the traversal of a signature graph, the inexact matching can be done as follows:

(i) Let v be the node encountered and sq(i) be the position to be checked.

(ii) If sq(i) = 1, we move to the right child of v.

(iii) If sq(i) = 0, both the right and the left child of v will be visited.

(iv) A search along a path stops when a node without any child node is encountered, or when a node is encountered for the second time.
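The traversal rules can be tried out on a simplified stand-in for the signature graph. In the sketch below the structure is a plain tree: internal nodes only test one bit and only leaves (skip = 0) point to signatures, so the "encountered for the second time" stop of rule (iv) never triggers. The helper `build_tree` is hypothetical and is not the insertion procedure of the slides; the "right child" is assumed to be the child reached via the edge labeled 1.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Signature file s1 .. s8 of the running example.
SIGS = ["10110110", "10111001", "10100111", "01110110",
        "01110101", "01011100", "11100100", "10101011"]

@dataclass
class Node:
    p: int                                   # 1-based number of a signature
    skip: int                                # bit to test; 0 marks a leaf
    child: Dict[int, "Node"] = field(default_factory=dict)

def build_tree(ids: List[int]) -> Node:
    """Hypothetical helper: split on the first bit where the group differs."""
    if len(ids) == 1:
        return Node(p=ids[0], skip=0)
    width = len(SIGS[0])
    skip = next(j for j in range(1, width + 1)
                if len({SIGS[i - 1][j - 1] for i in ids}) == 2)
    node = Node(p=ids[0], skip=skip)
    for bit in (0, 1):
        group = [i for i in ids if SIGS[i - 1][skip - 1] == str(bit)]
        node.child[bit] = build_tree(group)
    return node

def covers(sig: str, sq: str) -> bool:
    """True if sig has a 1 wherever the query signature sq has a 1."""
    return all(s == "1" for s, q in zip(sig, sq) if q == "1")

def search(v: Node, sq: str, hits: List[int]) -> List[int]:
    if v.skip == 0:                          # rule (iv): no children, compare
        if covers(SIGS[v.p - 1], sq):
            hits.append(v.p)
    elif sq[v.skip - 1] == "1":              # rule (ii): right child only
        search(v.child[1], sq, hits)
    else:                                    # rule (iii): both children
        search(v.child[0], sq, hits)
        search(v.child[1], sq, hits)
    return hits

root = build_tree(list(range(1, 9)))
print(search(root, "10100101", []))  # [3]
print(search(root, "11000000", []))  # [7]
```

The more 1 bits the query signature has at tested positions, the more subtrees are pruned; the second query reaches a single leaf after two bit tests.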

Page 15: Constructing Signature Graphs for Signature Files

Signature Graph

[Figure: the signature graph of the running example, with seven nodes labeled "marked".]

Page 16: Constructing Signature Graphs for Signature Files

Maintenance of Signature Graph

- Insertion of a signature s into G

Same as the construction of a signature graph.

- Deletion of a signature s from G

(i) Search G from the root until a node v is encountered that is marked or has skip(v) = 0.

(ii) If skip(v) = 0, compare p(v) and s. If s matches p(v) exactly, do the following; otherwise, nothing is done.

Let v1 → ... → vk-1 → vk → v be the path explored.

Let u1 be the other child of vk (not on the path). Remove vk-1 → vk, vk → u1 and v, and generate a new edge vk-1 → u1. skip(vk) := 0.

Page 17: Constructing Signature Graphs for Signature Files

Maintenance of Signature Graph

- Deletion of a signature s from G (continued)

(iii) If skip(v) ≠ 0, compare p(v's parent) and s. If s matches p(v's parent) exactly, do the following; otherwise, nothing is done.

Let v1 → ... → vk-1 → vk → v be the path explored.

If vk ≠ v, replace p(v) with p(vk). Let u1 be the other child of vk (not on the path). Let u2 be another parent of vk (not on the path). Replace vk-1 → vk with vk → u1, and replace vk → v with u2 → v. Remove vk. Note that u2 can be found by searching G from vk with the target signature being p(vk).

If vk = v, replace vk → vk with vk-1 → u1. Remove vk.

Page 18: Constructing Signature Graphs for Signature Files

Maintenance of Signature Graph

Illustration for (ii)

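[Figure: the path …, v1, vk-1, vk, v together with the nodes u1 and u2, shown before and after the deletion; the parts that disappear are labeled "To be removed".]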

Page 19: Constructing Signature Graphs for Signature Files

Example: remove p1

[Figure: the signature graph before and after the removal of p1.]

Page 20: Constructing Signature Graphs for Signature Files

Maintenance of Signature Graph

Illustration for (iii)

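[Figure: the path …, v1, vk-1, vk, v together with the nodes u1 and u2, shown before and after the deletion; the parts that disappear are labeled "To be removed".]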

Page 21: Constructing Signature Graphs for Signature Files

Example: remove p8

[Figure: the signature graph before and after the removal of p8.]

Page 22: Constructing Signature Graphs for Signature Files

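Illustration for (iii)

[Figure: the path …, v1, vk-1, v together with the node u1, shown before and after the deletion; the parts that disappear are labeled "To be removed".]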

Page 23: Constructing Signature Graphs for Signature Files

Example: remove p7

[Figure: the signature graph before and after the removal of p7.]

Page 24: Constructing Signature Graphs for Signature Files

Summary and Future Work

                                              

- Signature and signature file

- Signature graph

Construction of a signature graph

Search of a signature graph

Maintenance of a signature graph

Future work:

Apply signature techniques to the evaluation of path-oriented queries in document databases.