Parallel K-set mutual range-join in hypercubes


Microprocessing and Microprogramming

Microprocessing and Microprogramming 41 (1995) 443-448

Parallel k-set mutual range-join in hypercubes *

Hong Shen *

School of Computing and Information Technology, Griffith University, Nathan, QLD 4111, Australia

Received 1 September 1994; revised 3 March 1995; accepted 6 June 1995

Abstract

The mutual range-join of k sets, $S_1, S_2, \ldots, S_k$, is the set containing all tuples $(s_1, s_2, \ldots, s_k)$ that satisfy $e_1 \le |s_i - s_j| \le e_2$ for all $1 \le i \ne j \le k$, where $s_i \in S_i$ and $e_1 \le e_2$ are fixed constants. This paper presents an efficient parallel algorithm for computing the k-set mutual range-join in hypercube computers. The proposed algorithm uses a fast method to determine whether the differences of all pairs of numbers among k given numbers are within a given range and applies the technique of permutation-based range-join [11]. To compute the mutual range-join of k sets $S_1, S_2, \ldots, S_k$ in a hypercube of p processors with $O(\sum_{i=1}^{k} n_i/p)$ local memory, $p \le |S_i| = n_i$ and $1 \le i \le k$, our algorithm requires at most $O((k \log k/p)\prod_{i=1}^{k} n_i)$ data comparisons in the worst case. The algorithm is implemented in PVM and its performance is extensively evaluated on various input data.

Keywords: Data comparison; Hypercube; Parallel algorithm; Permutation; Range-join

1. Introduction

The problem of k-set mutual range-join is to compute a subset of the Cartesian product of k given data sets in which each element (tuple) satisfies the condition that the absolute difference between any pair of components $s_i$ and $s_j$, $1 \le i \ne j \le k$, falls into the range $[e_1, e_2]$, where $0 \le e_1 \le e_2$ are fixed constants (bounds). Like k-set chain range-join [14], k-set mutual range-join is a generalisation of the standard range-join operation raised in [11] that covers both equijoin and non-equijoin and has wide applications in database management, information processing and statistics. When k = 2 this problem is the standard (2-set) range-join problem. When $e_2 = 0$, the problem becomes k-set equijoin, which is the ordinary join operation in most cases [16].

* This work was partially supported by the Australian Research Council under its Small Grants Scheme.

Email: [email protected]

0165-6074/95/$09.50 © 1995 Elsevier Science B.V. All rights reserved. SSDI 0165-6074(95)00018-6

Various algorithms for sequential and parallel equijoin have been proposed [1-2,4-5,7-9,15], and they are mainly based on the most effective and promising technique for equijoin, the hash-based join. As to non-equijoin, for which the hash-based join technique is ineffective, few results were known until recently [3,6,11-14]. In [11-13] we proposed permutation-based join and selection-based join as two alternative techniques for solving the problem of 2-set range-join in hypercubes. In [3] we presented a permutation-based join algorithm for 2-set range-join in N-dimensional torus. In [14], we described an efficient algorithm for computing the k-set chain range-join.

In this paper, we present a simple and efficient parallel algorithm to compute the k-set mutual range-join in hypercubes.
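To make the definition concrete, the following is a minimal sequential sketch that computes the mutual range-join directly from its definition by testing every pair of components of every candidate tuple. It is a reference illustration only, not the algorithm of this paper; the function name `mutual_range_join_naive` and the sample data are assumptions introduced here.

```python
from itertools import combinations, product


def mutual_range_join_naive(sets, e1, e2):
    """Return every tuple (s_1, ..., s_k), one component per set, whose
    pairwise absolute differences all lie in [e1, e2]."""
    result = []
    for candidate in product(*sets):
        # Test all k(k - 1)/2 pairs of components of the candidate tuple.
        if all(e1 <= abs(a - b) <= e2 for a, b in combinations(candidate, 2)):
            result.append(candidate)
    return result


if __name__ == "__main__":
    # Three small sets; keep tuples whose components differ pairwise by 1..3.
    print(mutual_range_join_naive([[1, 10], [2, 3], [4, 20]], 1, 3))
    # With e1 = e2 = 0 the same function computes an equijoin.
    print(mutual_range_join_naive([[1, 2], [2, 3]], 0, 0))
```

This brute-force version examines all $\prod_{i=1}^{k} n_i$ candidate tuples and all pairs within each, which is the baseline that the rest of the paper improves on.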

2. Fast method for mass determination

Given k numbers $s_1, s_2, \ldots, s_k$ and two constants $e_1 \le e_2$, we are required to determine whether or not the tuple $(s_1, s_2, \ldots, s_k)$ is a mass in range $[e_1, e_2]$, i.e. whether or not $e_1 \le |s_i - s_j| \le e_2$ for all $1 \le i \ne j \le k$. A straightforward method for solving this problem is to check all different pairs of numbers, which would need $k(k-1)/2$ comparisons. However, if the components in the tuple are sorted, we find that k comparisons are sufficient to complete the above job, as described in the following lemma:

Lemma 1. Let $s_1 \le s_2 \le \cdots \le s_k$ be k given numbers. If $e_1 \le |s_i - s_{i+1}| \le e_2$ for $1 \le i \le k-1$ and $e_1 \le |s_k - s_1| \le e_2$ hold, then $e_1 \le |s_i - s_j| \le e_2$ is true for all $1 \le i \ne j \le k$.

Proof. Assume that the lemma is false, that is, its condition holds whereas $\exists x, y \in [1, k]$ with $x < y$ such that $(|s_x - s_y| < e_1) \lor (|s_x - s_y| > e_2)$.

If $|s_x - s_y| < e_1$, then $|s_x - s_{x+1}| \le |s_x - s_y| < e_1$ since $s_x \le s_{x+1} \le \cdots \le s_y$, which contradicts the condition of the lemma.

If $|s_x - s_y| > e_2$, then $|s_k - s_1| \ge |s_x - s_y| > e_2$ since $s_1 \le \cdots \le s_x \le \cdots \le s_y \le \cdots \le s_k$, which contradicts the condition, too.

Thus the assumption cannot hold. □
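Lemma 1 can be illustrated by a short sketch that performs the k range tests on an already sorted sequence (the k - 1 adjacent pairs plus the pair formed by the last and first components) and compares the outcome with the all-pairs test. The function names are assumptions introduced for this illustration.

```python
from itertools import combinations


def is_mass_sorted(sorted_values, e1, e2):
    """Lemma 1 check: k range tests on an already sorted sequence."""
    k = len(sorted_values)
    if k < 2:
        return True  # no pair of components to test
    # The k - 1 adjacent pairs ...
    for i in range(k - 1):
        if not e1 <= sorted_values[i + 1] - sorted_values[i] <= e2:
            return False
    # ... plus the pair formed by the last and the first component.
    return e1 <= sorted_values[-1] - sorted_values[0] <= e2


def is_mass_all_pairs(values, e1, e2):
    """Reference check that tests all k(k - 1)/2 pairs."""
    return all(e1 <= abs(a - b) <= e2 for a, b in combinations(values, 2))


if __name__ == "__main__":
    for values in ([1, 3, 4], [1, 3, 6], [2, 2, 5]):
        print(values, is_mass_sorted(values, 1, 3), is_mass_all_pairs(values, 1, 3))
```

The adjacent tests guard the lower bound $e_1$ for every pair, while the single last-first test guards the upper bound $e_2$, which mirrors the two cases of the proof.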

We further claim that the above k comparisons are necessary to check whether the tuple $(s_1, s_2, \ldots, s_k)$ is a mass in range $[e_1, e_2]$ by the following lemma:

Lemma 2. Let $s_1 \le s_2 \le \cdots \le s_k$ be k given numbers. Checking whether $e_1 \le |s_i - s_j| \le e_2$ holds for all $1 \le i \ne j \le k$ requires at least k comparisons.

Proof. View $s_1 \le s_2 \le \cdots \le s_k$ as k vertices and add an edge between $s_i$ and $s_j$ ($i \ne j$) if a comparison is carried out between them. To obtain the relation between $s_i$ and $s_j$ with respect to the inequality $e_1 \le |s_i - s_j| \le e_2$ for all $1 \le i \ne j \le k$, we require that all k vertices be connected, since otherwise there is no way to know the relation between any pair of unconnected vertices; this requires at least k - 1 edges.

Clearly k - 1 edges can connect k vertices only into a tree. In order to obtain the relation (with respect to the specified inequality) between every pair of non-neighbouring vertices in the tree, the tree must be an ordered tree so that the sequence $\{s_1, s_2, \ldots, s_k\}$ can be produced by a traversal scheme, and the relations of all neighbouring pairs of vertices must be cyclically transitive so as to generate the relations of non-neighbouring vertices. Achieving this by adding the fewest extra edges to the tree requires that the tree be a linear array whose elements follow the same partial order as the input (i.e. the i-th element is $s_i$, $1 \le i \le k$) and that one extra edge be added to connect the head ($s_1$) and tail ($s_k$) of the array, so that the array becomes a ring. □

For arbitrary $(s_1, s_2, \ldots, s_k)$, making the relations of all neighbouring pairs of numbers transitive is equivalent to sorting them. Hence by Lemmas 1 and 2 we have the following theorem:

Theorem 1. Let $s_1, s_2, \ldots, s_k$ be k given numbers. An upper bound on the number of comparisons for checking whether $e_1 \le |s_i - s_j| \le e_2$ holds for all $1 \le i \ne j \le k$ is $O(k \log k)$ and it is tight.
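For an unsorted tuple, the bound of Theorem 1 corresponds to sorting the components first and then applying the k range tests of Lemma 1. The sketch below does this and also counts the comparisons made by the sort, so that the two cost components can be inspected separately. The function name `is_mass` and the counting wrapper are assumptions introduced for illustration; Python's built-in sort stands in for any $O(k \log k)$ comparison sort.

```python
from functools import cmp_to_key


def is_mass(values, e1, e2):
    """Sort the components, apply the ring tests of Lemma 1, and report
    (accepted, comparisons made by the sort, number of ring pairs)."""
    sort_comparisons = 0

    def compare(a, b):
        # Three-way comparison that counts how often the sort calls it.
        nonlocal sort_comparisons
        sort_comparisons += 1
        return (a > b) - (a < b)

    # sorted() returns a new list, so the original order of `values` is kept.
    ordered = sorted(values, key=cmp_to_key(compare))
    k = len(ordered)
    # Adjacent pairs plus the wrap-around pair (last, first): k pairs in total.
    ring = [(ordered[i], ordered[(i + 1) % k]) for i in range(k)] if k > 1 else []
    accepted = all(e1 <= abs(b - a) <= e2 for a, b in ring)
    return accepted, sort_comparisons, len(ring)


if __name__ == "__main__":
    accepted, sort_cmps, ring_pairs = is_mass([7, 3, 5, 4], 1, 4)
    print(accepted, sort_cmps, ring_pairs)
```

Because the input list is left untouched, an accepted tuple is available in its original component order without an explicit restoring step.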

3. Parallel algorithm for mutual range-join

We follow the conventional definition of the hypercube: it is an SIMD machine without shared memory in which each processor has local memory of size M and a local disk, and can perform basic arithmetic and logic operations, data comparison and data transfer to a neighbouring processor in constant time. We use $i^{(w)}$ to denote the resulting integer after inverting the w-th bit of integer i. Let data sets $S_1, S_2, \ldots, S_k$ be each distributed evenly over p processors in a hypercube, where p is a power of 2 with $p \le |S_i| = n_i$, and processor $P_j$ holds $S_i^j$, where $1 \le i \le k$ and $0 \le j \le p-1$. We are required to compute the mutual range-join of $S_1, S_2, \ldots, S_k$: $T = MJ(S_1, S_2, \ldots, S_k)$.

Following Lemma 1, to check whether a tuple $(s_1, s_2, \ldots, s_k)$ is an element of T we should first sort its components, then compare each pair of adjacent components as well as the pair of the first and last components, and finally restore the sorted tuple back to the original order if it is accepted. The sketch of our algorithm is the following:

(1) $T := S_k$;

(2) For $i = k-1, k-2, \ldots, 1$, permute $S_i$ in the hypercube such that all its subsets meet $T^v$ at processor $P_v$ to form all possible combinations of $S_i^u$ and $T^v$ for $0 \le u, v \le p-1$. Following each step of permutation, use a proper sequential algorithm to compute the mutual range-join of $S_i^u$ permuted to processor $P_v$ and $T^v$ at $P_v$: for each pair of elements s and t, $s \in S_i^u$ and $t \in T^v$, first insert s into t and then compute the differences of s and its cyclical left and right neighbours in t; accept t if both differences are within range $[e_1, e_2]$ or reject it otherwise. Thus $T^v = MJ(T^v, S_i)$ is obtained at processor $P_v$, $0 \le v \le p-1$, after $S_i$ is fully permuted. Hence $T = \bigcup_{v=0}^{p-1} T^v = MJ(S_k, S_{k-1}, \ldots, S_i)$ is produced.

Obviously, using the above method to expand t (increase the arity from 1 to k), we can ensure that finally, if t is accepted, all its components are sorted and each pair of cyclically neighbouring components is a matching pair (whose difference is within the range $[e_1, e_2]$), and therefore by Theorem 1 t is a mass in range $[e_1, e_2]$ and thus an element of T.
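The sequential part of Step 2 can be sketched as follows for a single processor (p = 1, so no permutation is needed). Each partial tuple is kept sorted by value, every component carries the index of the set it came from so that the original order can be restored, and a new element is inserted by binary search and tested only against its cyclic left and right neighbours. All function names and the (value, set index) representation are assumptions made for this illustration.

```python
from bisect import bisect_left


def extend_tuple(t, s, set_index, e1, e2):
    """Try to insert value s (taken from set number set_index) into the
    accepted tuple t, a list of (value, set_index) pairs sorted by value.
    Return the extended sorted tuple, or None if s is rejected."""
    pos = bisect_left(t, (s, set_index))   # binary search: about log r comparisons
    left = t[pos - 1][0]                   # pos == 0 wraps around to the largest value
    right = t[pos % len(t)][0]             # pos == len(t) wraps around to the smallest
    if e1 <= abs(s - left) <= e2 and e1 <= abs(s - right) <= e2:
        return t[:pos] + [(s, set_index)] + t[pos:]
    return None


def local_join(T_v, S_block, set_index, e1, e2):
    """Join the partial result T_v with one block of S_i."""
    out = []
    for t in T_v:
        for s in S_block:
            extended = extend_tuple(t, s, set_index, e1, e2)
            if extended is not None:
                out.append(extended)
    return out


def restore_order(t):
    """Return the values of t ordered by the set they came from."""
    return tuple(value for value, _ in sorted(t, key=lambda pair: pair[1]))


def mutual_range_join(sets, e1, e2):
    """Single-processor version of Steps 1 and 2 (sets[0] plays the role of S_1)."""
    k = len(sets)
    T = [[(s, k)] for s in sets[-1]]            # Step 1: T := S_k
    for i in range(k - 1, 0, -1):               # Step 2: i = k-1, ..., 1
        T = local_join(T, sets[i - 1], i, e1, e2)
    return [restore_order(t) for t in T]


if __name__ == "__main__":
    print(sorted(mutual_range_join([[1, 10], [2, 3], [4, 20]], 1, 3)))
```

Testing only the two cyclic neighbours suffices because t was already accepted: an interior insertion can only violate the lower bound with its nearest neighbours, and an insertion at either end is tested against both the smallest and the largest existing component.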

We denote the time required for a single data comparison, for a single arithmetic operation and for a single I/O operation by $t_c$, $t_a$ and $t_{IO}$, respectively. Let every element in $T^v$ be an r-tuple. Elaborating the second part of Step 2 clearly yields the time required for computing $MJ(T^v, S_i^u)$ at processor $P_v$ in the worst case to be

$$\frac{|T^v|\, n_i}{p}\bigl(\log r \; t_c + (2r+6)\, t_a\bigr) + \frac{(r+1)\,|T^v|\, n_i}{pM}\, t_{IO},$$

where $|S_i^u| = n_i/p$ and M is the local memory size.

The first part of Step 2 can be realized by our data permutation algorithm described in [11], which shows that fully permuting $S_i$ to all processors in the hypercube can be done in p - 1 (parallel) steps. Let $M = O(\sum_{i=1}^{k} n_i/p)$. When $|T^v| > O(\sum_{i=1}^{k} n_i/p)$, $T^v$ is updated at $P_v$ in blocks of size $O(\sum_{i=1}^{k} n_i/p)$ one by one. Clearly $|T^v| \le (k/p)\prod_{i=1}^{k} n_i$ (in the worst case $|T| = k\prod_{i=1}^{k} n_i$, when all data match each other). Elaborating the whole of Step 2, we can easily derive the total running time of our algorithm:

$$O\left(\frac{k \log k \prod_{i=1}^{k} n_i}{p}\, t_c + \frac{k^2 \prod_{i=1}^{k} n_i}{p}\, t_a + \frac{k^2 \prod_{i=1}^{k} n_i}{\sum_{i=1}^{k} n_i}\, t_{IO}\right).$$

Hence, we have the following theorem:

Theorem 2. Let sets $S_1, S_2, \ldots, S_k$ be each distributed evenly over p processors in a hypercube, where $p \le |S_i| = n_i$ and processor $P_j$ holds subsets $S_i^j$, where $1 \le i \le k$ and $0 \le j \le p-1$. The mutual range-join of $S_1, S_2, \ldots, S_k$ can be computed in the hypercube with $O(\sum_{i=1}^{k} n_i/p)$ local memory in at most $O((k \log k/p)\prod_{i=1}^{k} n_i)$ comparisons in the worst case.
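The bound above relies on every block of $S_i$ meeting every $T^v$ within p - 1 parallel steps. The permutation algorithm itself is given in [11] and is not reproduced in this paper, so the sketch below uses a different, assumed schedule purely for illustration: at step m every processor receives the block of its neighbour across the dimension that changes between the (m-1)-th and m-th words of the binary reflected Gray code. The function names are hypothetical; `invert_bit(i, w)` corresponds to the notation $i^{(w)}$.

```python
def invert_bit(i, w):
    """i^(w): the integer obtained by inverting the w-th bit of i."""
    return i ^ (1 << w)


def permute_blocks(blocks):
    """Simulate p - 1 neighbour exchanges on a p-processor hypercube.

    blocks[v] is the block initially stored at processor v.  Yields, for the
    initial state and after every exchange, the list of blocks currently
    held by processors 0..p-1."""
    p = len(blocks)
    assert p > 0 and p & (p - 1) == 0, "p must be a power of two"
    current = list(blocks)
    yield current
    for step in range(1, p):
        # Dimension flipped between consecutive Gray-code words:
        # the index of the lowest set bit of the step number.
        w = (step & -step).bit_length() - 1
        # Every processor v receives the block held by its neighbour v^(w).
        current = [current[invert_bit(v, w)] for v in range(p)]
        yield current


if __name__ == "__main__":
    for held in permute_blocks(["S0", "S1", "S2", "S3"]):
        print(held)
```

Under this schedule every transfer is between direct hypercube neighbours, and after the p - 1 exchanges each processor has held each of the p blocks exactly once, so a local join can be run after every exchange.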

Our algorithm compares favourably with the approach of simply sorting all data of every set and then joining the sets one by one, which would require $O((k^2/p)\prod_{i=1}^{k} n_i)$ comparisons in the worst case.
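The gap between the two worst-case bounds can be made tangible by evaluating their leading terms for sample parameters. The sketch below ignores the constant factors hidden by the O-notation and assumes base-2 logarithms; the function name and the sample sizes are assumptions introduced here, not measurements from the paper.

```python
from math import log2, prod


def comparison_bounds(sizes, p):
    """Leading terms of the two worst-case comparison bounds for sets of
    the given sizes on p processors (constant factors are ignored)."""
    k = len(sizes)
    product_of_sizes = prod(sizes)
    ours = k * log2(k) * product_of_sizes / p        # (k log k / p) * prod n_i
    sort_then_join = k * k * product_of_sizes / p    # (k^2 / p) * prod n_i
    return ours, sort_then_join


if __name__ == "__main__":
    ours, baseline = comparison_bounds([100, 100, 100, 100], p=8)
    print(ours, baseline, baseline / ours)
```

The ratio of the two leading terms is $k/\log k$, so the advantage grows with the number of sets and does not depend on p or on the set sizes.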

4. Performance evaluation

Our algorithm has been implemented in the PVM environment on an 8-processor hypercube. We have tested the algorithm extensively with various input data generated by a random number generator. The experimental results have validated the algorithm and its theoretically analysed running time, and have revealed the performance of the algorithm on different input data.

Fig. 1 gives some figures based on our experimental results that depict the performance of the algorithm from different angles and show the relations between different time components. Each timing result is obtained by taking the average over those measured on 10 random inputs. For each result, we also measure its variance (on the curve 'Total variance') with the result of the worst case among the 10 tested cases.

[Fig. 1. Experimental results on execution time of the algorithm. (a) Varying number of input sets (k). (b) Varying size of input sets (n).]

5. Concluding remarks

We have proposed a novel method to determine whether the differences of all pairs of numbers among k given numbers are within a given range. Our method reduces the number of comparisons from $O(k^2)$ required by the usual method to $O(k \log k)$, which is proved to be optimal. Using this method and the data permutation technique described in [11], we have presented a simple and efficient parallel algorithm for solving the k-set mutual range-join problem in hypercube computers. The proposed algorithm has been implemented.

In the case when $|S_i| < p$ and $p \le |S_j|$ for $i = \pi_1, \pi_2, \ldots, \pi_t$ and $j = \pi_{t+1}, \pi_{t+2}, \ldots, \pi_k$, where $(\pi_1, \pi_2, \ldots, \pi_k)$ is a permutation of $\{1, 2, \ldots, k\}$, our algorithm will also work correctly using the same treatment as described in [11].

The algorithm assumes that $S_i$ is evenly distributed in the hypercube. For other typical cases of data distribution, the same solutions as described in [14] will apply.

Like k-set chain range-join, our algorithm joins sets one by one in arbitrary order. We have observed that the join order affects the algorithm's performance in many cases, but have not yet been able to find the optimal join order which minimises the algorithm's running time. This remains an open problem for further study.

References

[1] D. Bitton, H. Boral, D.J. Dewitt and W.K. Wilkinson, Parallel algorithms for the execution of relational database operations, ACM Trans. Database Systems 8(3) (1983) 324-353.

[2] S.D. Chen, H. Shen and R. Topor, An improved hash-based join algorithm in the presence of double skew on a hypercube computer, Proc. 17th Annual Computer Science Conf. (1994) 179-188.

[3] S.D. Chen, H. Shen and R. Topor, An efficient permutation-based parallel range-join algorithm on N-dimensional torus computers, Information Processing Letters 52(1) (1994) 35-38.

[4] V. Deshpande, P. Larson and T.P. Martin, Parallel join algorithms for nested relations on shared-memory multiprocessors, Proc. 2nd IEEE Symp. on Parallel and Distributed Processing (IEEE Comput. Soc. Press, 1990) 344-347.

[5] D.J. Dewitt and R.H. Gerber, Multiprocessor hash-based join algorithms, Proc. 11th Int. Conf. Very Large Data Bases, Stockholm (1985) 151-164.

[6] D.J. Dewitt, J.F. Naughton and D.A. Schneider, An evaluation of non-equijoin algorithms, Technical report, 1990.

[7] M. Kitsuregawa, H. Tanaka and T. Moto-oka, Application of hash to data base machine and its architecture, New Generation Computing 1(1) (1983) 63-74.

[8] H.J. Lu, K.L. Tan and M.C. Shan, Hash-based join algorithms for multiprocessor computers with shared-memory, Proc. 16th VLDB Conf., Brisbane, Australia (1990) 198-209.

[9] E. Omiecinski and E. Tien, A hash-based join algorithm for a cube-connected parallel computer, Information Processing Letters 30(5) (1989) 269-275.

[10] E. Omiecinski and E. Tien, The adaptive-hash join algorithm for a hypercube multicomputer, IEEE Trans. Parallel Distributed Syst. 3(3) (1992) 334-349.

[11] H. Shen, An efficient permutation-based parallel algorithm for range-join in hypercubes, Parallel Computing 21 (1995) 303-313.

[12] H. Shen, Selection-based parallel range-join in hypercubes, Proc. High Performance Computing Conf. '94, Singapore (1994) 160-169.

[13] H. Shen, An improved selection-based parallel range-join algorithm in hypercubes, Proc. EUROMICRO 94, Liverpool, UK (1994) 65-72.

[14] H. Shen, Efficient parallel k-set chain range-join in hypercubes, to appear in Computer J. (in press).

[15] P. Valduriez and G. Gardarin, Join and semi-join algorithms for a multiprocessor database machine, ACM Trans. Database Systems 9(1) (1984) 133-161.

[16] M.T. Özsu and P. Valduriez, Principles of Distributed Database Systems (Prentice-Hall, Englewood Cliffs, 1991).


Hong Shen is currently a Senior Lecturer at the School of Computing and Information Technology, Griffith University, Australia. He received the B.S. degree from Beijing University of Iron & Steel Technology, China, in 1982, the M.S. degree from the University of Science and Technology of China in 1987, and the Ph.Lic and Ph.D. degrees from Abo Akademi University, Finland, in 1990 and 1991 respectively, all in Computer Science. He became an Assistant Professor at the Department of Computer Science, Abo Akademi University, in 1991, and joined Griffith University in 1992. His main research interests include parallel algorithms, parallel and distributed computing, and parallel computer architectures. He has published over 50 technical papers in the above areas. Dr. Shen is a member of the Association for Computing Machinery and the IEEE Computer Society.
