Efficient Processing Regular Queries In Shared-Nothing Parallel Database Systems Using Tree- And...

Efficient Processing Regular Queries In Shared-Nothing Parallel Database Systems Using Tree- And Structural Indexes (ADBIS 2007, Bulgaria) Vu Le Anh, Attilla Kiss Department of Information Systems, Eötvös Loránd University, Hungary Under the support of the Hungarian National Office for Research and Technology under grant no. RET14/2005.


Page 1

Efficient Processing Regular Queries In Shared-Nothing Parallel Database Systems Using Tree- And Structural Indexes (ADBIS 2007, Bulgaria)

Vu Le Anh, Attilla Kiss

Department of Information Systems, Eötvös Loránd University, Hungary

Under the support of the Hungarian National Office for Research and Technology under grant no. RET14/2005.

Page 2

Outline

• Problem

• State of the art: Streaming approach vs. Partial Parallel approach

• Our efficient algorithm: Tree index & Structural index

• Experiments

• Summary

Page 3

Problem

• Shared-Nothing Parallel Database System

• Fragmented XML Tree

• Regular queries

Page 4

Shared-Nothing Parallel Database System

• From 2 to thousands of sites connected by an interconnection network

• Each site has its own processor, memory, and disk; nothing is shared

• The cost per processor can be extremely low because each node is an inexpensive processor

[Figure: processors P1, P2, …, Pn, each with its own memory and disk, connected by an interconnection network]

Page 5

Shared-Nothing Parallel Database System

• Parallel processing

• Supports incremental and unlimited growth

• Failure is local: if one node fails, the others stay up

• The system can be very cheap

Page 6

Fragmented XML Tree

[Figure: an XML tree with nodes 0 to 15, labeled A to F, partitioned into fragments F0 to F5]

Nodes: 0, 1, …, 15. Label values: A, B, …, F

Fragments: F0, F1, …, F5

F0 = {0,1,2,6}, F1 = {3,4,5}, …

Sites: S0, S1

S0 = {F0,F4,F5}, S1 = {F1,F2,F3}

Site = machine

1 Master site + Slave sites

Master server:
- Communicates with the clients
- Controls the Slaves that process the queries

Page 7

Regular queries

• A variety of query languages have been proposed for XML data: UnQL, Lorel, XQL, XML-QL, etc. All of them are built around regular path expressions.

• Three basic operations: Union, Concatenation and Iteration.

• Every regular path expression can be recognized by a deterministic finite automaton.

Page 8

Example regular query

• Query: //B/D

• Query graph: q0 -B-> q1 -D-> q2, with a self-loop on q0 labeled * (any label)

[Figure: the XML tree with each node annotated by the automaton states reached there; the nodes reached in q2 are the answers]

Answers = {3,11,13}
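
The evaluation of this example can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the tree below contains just the nodes whose labels and parents can be read unambiguously from the slides (the remaining nodes are left out, which is an assumption), and the names `step`, `LABELS`, `CHILDREN`, and `evaluate` are made up for the sketch.

```python
# Query automaton for //B/D:
#   q0 loops on any label (*), B moves q0 -> q1, D moves q1 -> q2 (accepting).
ACCEPTING = "q2"


def step(states, label):
    """Return the set of states reached from `states` after reading `label`."""
    nxt = set()
    for q in states:
        if q == "q0":
            nxt.add("q0")          # the * self-loop
            if label == "B":
                nxt.add("q1")
        elif q == "q1" and label == "D":
            nxt.add("q2")
    return nxt


# Partial tree (assumption: nodes 4, 5, 12, 14, 15 are omitted).
LABELS = {0: "A", 1: "A", 2: "B", 3: "D", 6: "C", 7: "A",
          8: "B", 9: "A", 10: "B", 11: "D", 13: "D"}
CHILDREN = {0: [1, 8], 1: [2, 6], 2: [3], 6: [7],
            8: [9, 13], 9: [10], 10: [11]}


def evaluate(root, start="q0"):
    """Run the automaton top-down and collect nodes reached in the accepting state."""
    answers = set()
    stack = [(root, {start})]
    while stack:
        node, states = stack.pop()
        after = step(states, LABELS[node])
        if ACCEPTING in after:
            answers.add(node)
        for child in CHILDREN.get(node, []):
            stack.append((child, after))
    return answers


print(sorted(evaluate(0)))
```

On this partial tree the sketch returns the same answer set as the slide, {3, 11, 13}: each of these nodes is a D whose parent is a B.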

Page 9

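Problem

Problem: nodes are in different fragments

2 approaches: Streaming approach vs. Partial Parallel approach

[Figure: nodes 0 (A), 1 (A), and 2 (B) of fragment F0 annotated with states q0, q0, and q0 q1; the edge leaving node 2 is a link edge into fragment F1]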

Page 10

Basic operation: Fragment Process

Fragment-Process(F,q):

- Traverse the fragment F and the query graph together, beginning at the root of F in state q

- If a link edge is traversed during processing, the behavior depends on the approach
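
A minimal sketch of Fragment-Process, assuming the //B/D automaton from the example query. The dictionary layout of a fragment, the `on_link` callback, and the link edges of F0 (to F1 below node 2, to F2 below node 6, to F3 below node 0) are assumptions made for illustration; the slides only fix F0 = {0,1,2,6}. The convention used here is that an operation (F, q) starts in state q before the root of F is read.

```python
def step(states, label):
    """Transitions of the //B/D automaton (q0 loops on *, B: q0->q1, D: q1->q2)."""
    nxt = set()
    for q in states:
        if q == "q0":
            nxt.add("q0")
            if label == "B":
                nxt.add("q1")
        elif q == "q1" and label == "D":
            nxt.add("q2")
    return nxt


def fragment_process(fragment, q, on_link):
    """Traverse `fragment` from its root in state `q`.

    Returns the matching nodes found inside the fragment and calls
    on_link(target_fragment, state) for every link edge that is traversed.
    What on_link does is exactly where the two approaches differ.
    """
    answers = set()
    stack = [(fragment["root"], {q})]
    while stack:
        node, states = stack.pop()
        after = step(states, fragment["labels"][node])
        if "q2" in after:
            answers.add(node)
        for child in fragment["children"].get(node, []):
            stack.append((child, after))
        for target in fragment["links"].get(node, []):
            for state in sorted(after):
                on_link(target, state)
    return answers


# Hypothetical encoding of fragment F0 = {0, 1, 2, 6}.
F0 = {
    "root": 0,
    "labels": {0: "A", 1: "A", 2: "B", 6: "C"},
    "children": {0: [1], 1: [2, 6]},
    "links": {0: ["F3"], 2: ["F1"], 6: ["F2"]},
}

traversed = []
result = fragment_process(F0, "q0", lambda frag, state: traversed.append((frag, state)))
print(result, sorted(traversed))
```

In a streaming setting `on_link` would block and start the remote operation; in a partial parallel setting it would only record a fact, as in this usage example.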

Page 11

Streaming Approach

• If a link edge F → F’ is traversed:

1. The current fragment process operation over F is stopped.

2. The corresponding fragment process operation over F’ is started.

3. When the operation over F’ finishes, it sends its result to the operation over F, which is then resumed.

Page 12

Streaming Approach

[Figure: the fragmented XML tree with fragments F0 to F5]

Sequence of events:

1. (F0,q0) is started

2. Link edge (2,3) is traversed

3. (F0,q0) is stopped

4. (F1,q0) is started

5. (F0,q0) is resumed

6. Link edge (2,3) is traversed again

7. (F0,q0) is stopped

8. (F1,q2) is started, {3} is sent to F0

9. (F0,q0) is resumed

No parallelism, the waiting time is high

Page 13

Partial Parallel Approach

• While a fragment process operation runs, there is no communication with other sites

• If a link edge (F, q) → (F’, q’) is traversed, write down the fact: if (F, q) is processed, then (F’, q’) will be processed

• These facts are sent to the Master, which finds all the reachable operations

• Only the results of the reachable operations are sent to the Master
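
The Master's step can be read as a plain graph reachability problem over the collected facts. The sketch below assumes that reading; `reachable_operations` is a hypothetical name, the facts leaving (F3,q0) are the ones listed in the example that follows, and the facts leaving (F0,q0) are an assumption derived from the example tree.

```python
from collections import deque


def reachable_operations(facts, start):
    """Breadth-first search over facts of the form ((F, q), (F', q'))."""
    graph = {}
    for src, dst in facts:
        graph.setdefault(src, []).append(dst)
    seen = {start}
    queue = deque([start])
    while queue:
        op = queue.popleft()
        for nxt in graph.get(op, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


FACTS = [
    # Assumed facts produced by (F0, q0).
    (("F0", "q0"), ("F1", "q0")),
    (("F0", "q0"), ("F1", "q1")),
    (("F0", "q0"), ("F2", "q0")),
    (("F0", "q0"), ("F3", "q0")),
    # Facts produced by (F3, q0), as listed on the slide.
    (("F3", "q0"), ("F4", "q0")),
    (("F3", "q0"), ("F5", "q0")),
    (("F3", "q0"), ("F5", "q1")),
]

reachable = reachable_operations(FACTS, ("F0", "q0"))
print(sorted(reachable))
```

Operations that a site computed but that do not appear in this set, such as (F1,q2), are the ones whose results are never sent to the Master.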

Page 14

Partial Parallel Approach

[Figure: the fragmented XML tree with fragments F0 to F5]

Sequence of events:

1. All fragment process operations of S0 and S1 are executed in parallel

2. S1 = {F1, F2, F3}. Operations: (F1,q0), (F1,q1), (F1,q2), (F2,q0), (F2,q1), (F2,q2), (F3,q0), (F3,q1), (F3,q2)

3. The list of facts: (F3,q0) → (F4,q0), (F3,q0) → (F5,q0), (F3,q0) → (F5,q1)

4. List of reachable operations: (F1,q0), (F1,q1), (F2,q0), (F3,q0)

5. The results of the reachable operations are sent to the Master

S0 = {F0, F4, F5}, S1 = {F1, F2, F3}

Page 15

Our algorithm

• Partial Parallel Approach
- Advantages: parallelism, the number of communications is constant, and each fragment is scanned at most once
- Disadvantages: many unnecessary operations

• Our algorithm
- Based on partial evaluation
- Restrains the unnecessary operations

Page 16

Unnecessary operations

• Unnecessary operations type I
Def: unreachable operations
Solution:
- Determined by the Tree Index
- The Tree Index is stored in the Master and stores all paths connecting the roots of the fragments

• Unnecessary operations type II
Def: operations that return no result
Solution:
- Restrained by structural indexes
- Structural indexes = simulators of fragments

Page 17

Tree Index

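[Figure: the fragmented XML tree with fragments F0 to F5, and next to it the Tree Index]

Tree Index, one node per fragment, labeled with the label of the fragment's root: F0 (A), F1 (D), F2 (A), F3 (B), F4 (B), F5 (D)

Edges, labeled with the path between the two roots: F0 -AB-> F1, F0 -AC-> F2, F0 -ε-> F3, F3 -A-> F4, F3 -ε-> F5

(F2,q1), (F2,q2): unreachable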

Reachable operations: (F0,q0), (F1,q0), …

The size of the tree index = the number of fragments

The processing cost can be ignored
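
One way to picture the Master's use of the tree index is to run the query automaton along the index instead of along the data. The sketch below does that for //B/D. The edge list (F0 to F1 via "AB", F0 to F2 via "AC", F0 to F3 via the empty path, F3 to F4 via "A", F3 to F5 via the empty path) is a reading of the slide's figure and should be treated as an assumption, as should the names `ROOT_LABEL`, `EDGES`, and `reachable_by_tree_index`.

```python
def step(states, label):
    """Transitions of the //B/D automaton (q0 loops on *, B: q0->q1, D: q1->q2)."""
    nxt = set()
    for q in states:
        if q == "q0":
            nxt.add("q0")
            if label == "B":
                nxt.add("q1")
        elif q == "q1" and label == "D":
            nxt.add("q2")
    return nxt


# Assumed tree index: root label per fragment, and for each fragment the
# (path labels, child fragment) pairs; "" stands for the empty path.
ROOT_LABEL = {"F0": "A", "F1": "D", "F2": "A", "F3": "B", "F4": "B", "F5": "D"}
EDGES = {
    "F0": [("AB", "F1"), ("AC", "F2"), ("", "F3")],
    "F3": [("A", "F4"), ("", "F5")],
}


def reachable_by_tree_index(root="F0", start="q0"):
    """Return every operation (fragment, state) that the query can actually start."""
    reachable = set()
    stack = [(root, {start})]
    while stack:
        frag, states = stack.pop()
        reachable.update((frag, q) for q in states)
        after_root = step(states, ROOT_LABEL[frag])
        for path, child in EDGES.get(frag, []):
            current = after_root
            for label in path:          # consume the labels between the two roots
                current = step(current, label)
            if current:                 # an empty state set means the child is never entered
                stack.append((child, current))
    return reachable


print(sorted(reachable_by_tree_index()))
```

Under these assumptions the result contains (F2,q0) but neither (F2,q1) nor (F2,q2), which agrees with the slide. The loop touches each index node once, so the work grows with the number of fragments rather than with the size of the data.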

Page 18

Structural Indexes

• Simulate the fragment by an index graph

• Processing over the index graph is safe, so it is used as a necessary condition (if an operation returns no result over the index graph, it also returns no result over the fragment)

• The size of the index should be constant so that the cost of pre-processing is minimized

Page 19

DL-Indexes

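[Figure: a fragment with nodes A5, B6, B7, D8, D9, A10, C11, C12, C13, A14, E15, F16, E17, E18, A19, A20, A21, A22, simulated by its DL Index]

DL Index, one label set per level: {A}, {B,D}, {A,C}, {E,F}, {A}

Query graph: q0 -B-> q1 -D-> q2, with a self-loop on q0 labeled *

States reached on the DL Index, level by level: q0, q0 q1, q0, q0, q0

(F,q0), (F,q1) and (F,q2): unnecessary operations type II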
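
The type II check on this slide can be sketched as follows, assuming the DL Index is simply the list of label sets per level shown in the figure and that the query is //B/D. The function name `may_return_result` and the second, three-level index are hypothetical and only serve the illustration.

```python
def step(states, label):
    """Transitions of the //B/D automaton (q0 loops on *, B: q0->q1, D: q1->q2)."""
    nxt = set()
    for q in states:
        if q == "q0":
            nxt.add("q0")
            if label == "B":
                nxt.add("q1")
        elif q == "q1" and label == "D":
            nxt.add("q2")
    return nxt


# Label sets per level, as read from the slide's DL Index.
DL_INDEX = [{"A"}, {"B", "D"}, {"A", "C"}, {"E", "F"}, {"A"}]


def may_return_result(dl_index, start):
    """Necessary condition: False means the operation cannot match inside the fragment."""
    states = {start}
    for labels in dl_index:
        nxt = set()
        for label in labels:            # any label of this level may be on the path
            nxt |= step(states, label)
        if "q2" in nxt:
            return True
        if not nxt:
            return False
        states = nxt
    return False


for q in ("q0", "q1", "q2"):
    print(q, may_return_result(DL_INDEX, q))

# Hypothetical index of a fragment where a B at level 1 can have a D child.
print(may_return_result([{"A"}, {"B"}, {"D"}], "q0"))
```

The check over-approximates: a True answer only means the fragment still has to be scanned, while a False answer lets the operation be dropped without touching the fragment. For the slide's index all three start states yield False.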

Page 20

Our Algorithm

1. The Master determines the reachable operations using the tree index

2. For each reachable operation, the corresponding structural index is used to check whether it is an unnecessary operation of type 2

3. The "good" operations are sent to the sites

4. Each site processes its operations and sends the results back to the Master

Page 21

Experiments

• Comparing the performance of three algorithms: our algorithm (EPP), the Partial Processing algorithm (PP), and the Streaming Processing algorithm (TP)

• System: 19 Linux machines connected by a local network

• Data set: 500 Mb in 76 fragments, stored randomly on the servers

• Queries: 10 queries representing different conditions of the environment

Page 22

Experiments

• Waiting time:

EPP : PP : TP = 1 : 1.94 : 37.52

The waiting time of TP is extremely high since there is no parallelism

• Processing and Communication Cost:

EPP : PP : TP = 1 : 1.77 : 2.75

In some cases the total cost of PP is higher than that of TP because of redundant operations of type 2

• EPP is the best

Page 23

Summary

• Introduced an efficient algorithm for processing regular queries in shared-nothing systems, based on partial evaluation

• Two types of unnecessary operations:
- Type 1: unreachable operations, restrained by processing over the tree index
- Type 2: operations returning no matching nodes, restrained by processing over structural indexes

• Experiments: our algorithm outperforms the classical algorithms in waiting time and in processing and communication cost

Page 24

Thank you.

Questions?