DATA REDUCTION ALGORITHMS FOR DISTRIBUTED
QUERY PROCESSING
by
JIA-SHINN WANG, E.E.
A THESIS IN
COMPUTER SCIENCE
Submitted to the Graduate Faculty of Texas Tech University in
Partial Fulfillment of the Requirements for
the Degree of
MASTER OF SCIENCE
Approved
Accepted
May, 1984
ACKNOWLEDGEMENTS
I would like to express my deepest appreciation to the
committee chairman, Dr. Gopal Lakhani, for his guidance and
suggestions in preparing this thesis. I wish to thank Dr.
Leonard H. Weiner for his valuable assistance. I am also
grateful to the members of my committee for their support
and assistance. Finally, to my family and Yu-Hua, my
everlasting thanks for their understanding and encouragement
during my graduate study.
CONTENTS
ACKNOWLEDGEMENTS
CHAPTER
I. INTRODUCTION
II. QUERY PROCESSING IN DISTRIBUTED DATABASE
    Query Processing
    Semijoin
    Distributed Query Processing Strategies
III. IMPROVEMENT IN HEVNER AND YAO'S METHOD
    System Model
    Simple Query
    General Query
IV. MINI-MAX ALGORITHM
    Introduction
    Mini-Max Algorithm
    Correctness of the Algorithm
    Complexity of the Algorithm
V. CONCLUSION
    Merits of Algorithm Mini-Max
    Simulation for Study of Total and Response Time
    Future Research and Suggestions
BIBLIOGRAPHY
LIST OF TABLES
1. A Distributed Data Base System
2. Example of Simple Query
3. Example of General Queries
4. Candidate Schedule for R1
5. Candidate Schedule for R2
6. Data Base State for Algorithm Mini-Max
7. Comparison of Total Time and Response Time
LIST OF FIGURES
1. Graph Cost for IFS
2. Cost Graph for Reducing A Relation
3. Cost Graph for An Example of A Simple Query
4. Candidate Schedules for General Query Example
5. Cost Graph for R1
6. Cost Graph for R2
7. Final Cost Graph for General Query
8. Cost Graph Including Processing Cost
9. Alternate Cost Graph Schedule
10. Cost Graph for IFS in Algorithm Mini-Max
11. Cost Graph After First Phase
12. Final Cost Graph for Algorithm Mini-Max
CHAPTER I
INTRODUCTION
Recent advances in computer and communication
technologies, coupled with an explosion in the size and
complexity of application areas, have led to the design of
large computer communication complexes. For instance, the
ARPANET currently supports communication among more than one
hundred computer systems. The network is under continual
development and is used by thousands of users daily. Computer
networks can bring computing power to the people who need it
and provide access to a wider variety of resources dispersed
among several computers, which are linked by a communication
facility to provide the basis for "distributed" computing
services.
In general, the notion of "distributed systems" varies
in character and scope with different people [KAR80]. So
far, there is no accepted definition or basis for
classifying these systems. Basically, at least four
physical components of a system might be distributed:
hardware or processing logic, data, the processing itself,
and control. Some speak of a system that has any one of
these components distributed as being a "distributed
system." However, Enslow [ENS78] pointed out that a
distributed system should include: a multiplicity of general
purpose resource components, a physical distribution of
these physical and logical components, a high level
operating system that unifies and integrates the control of
the distributed components, system transparency, and
cooperative autonomy.
The evolution of modern computer network technology and the
rise of common carrier packet switched networks have
provided motivation to develop distributed information
networks. In addition, the increasing geographic dispersion
of end users within an organization puts pressure on data
processing and corporate management to distribute data
processing and storage capabilities to the location of data
origin and/or end use of the data. The Distributed Data
Base Management System (DDBMS) provides faster, easier
access to data and more reliability than is feasible in a
traditional, centralized Data Base Management System (DBMS).
According to the CODASYL systems committee [NCC78], the major
benefits derived from distribution are increased data
availability and reduced exposure of end users to total
system failure caused by hardware or software failure.
There are currently three favored approaches to the data
base model: the hierarchical model, the network model, and
the relational model [DAT81]. The hierarchical model and
the network model present to the user a navigational
interface with which the user must determine the data access
path. The relational model was later proposed to achieve a
high level of data independence. Users perceive the data
base as a collection of relations (or tables), regardless of
the actual physical data structures used for storage. Each
relation is composed of a set of homogeneous records (called
tuples), which in turn are subdivided into an ordered set of
fields called attributes. This simple data representation
allows access to data by specifying the properties of the
data to be retrieved, rather than specifying how the data
are to be accessed. A component of the data base called the
query optimizer determines an efficient access plan.
Distributed data base systems are considerably more
complex than centralized ones [SWA81, SMA79, SMI81, RAM79].
These additional complexities are due to some problems
inherent to distribution, such as synchronization,
heterogeneity, and geographical dispersion. Major areas of
current research are query optimization, distributed
concurrency control, failure recovery, distributed data base
design, and distributed architectures for data base machines
[SAC82].
The present thesis focuses on the problem of query
optimization in distributed data base systems. The query
optimizer is a difficult component of a data base system
[ROT80], because the cost of almost all interactions with
the data base depends on the quality of the plans which are
determined by the query optimizer. This component is invoked
not only for retrieval operations but also for replace and
remove operations. In [SAC82], the authors have presented
an excellent review of this area.
The principal bottleneck in distributed data base
systems is data communication. All economically feasible
long distance communication media incur lengthy delays.
Moreover, the cost of moving data through a network is high
enough that it can be compared with the cost of storing the
data locally for many days. In addition, parallel processing
is an inherent aspect of distributed systems and mitigates
to some extent the communication factor. Distributed query
evaluation methods not only focus on efficient utilization
of network links but also explore the multiplicity of
assignable resources to provide services within the system.
Most of the work in this area has concentrated on
static decomposition strategies, which develop entire
schedules before data processing for query evaluation
starts. Various simulation studies [SAC82] have shown that
the errors in this kind of scheduling can have a significant
influence on the quality of the performance. A more adaptive
control for the query optimizer should be enforced. It is
generally true that exploiting more assignable resources to
provide services lowers the response time of the query. At
the same time, however, the total system load increases. If
the work load is taken into consideration, then the gain in
the response time of the query may shrink considerably
because of system overhead.
Our research was motivated by the research work of
S. Bing Yao et al. [SAC82] over the past few years. We
have noticed the necessity of various assumptions of their
data base system model. Although these assumptions are
almost necessary for deriving any mathematically optimal
solutions, their degree of correctness varies from one
environment to another. Since these assumptions fall into
three mutually independent aspects of computation, namely,
data model, computer architecture, and network topology and
transmission characteristics, it is very difficult to
quantify and relate them and, further, to consider them in
any optimization process. Consequently, their algorithms
ignore every type of overhead in distributed processing
except the transmission cost. We propose to take a different
view of the problem of overhead cost. Another purpose of
this thesis is to investigate the feasibility of developing
an algorithm which emphasizes parallelism, but only to a
limited extent. We believe that extreme parallelism in
transmission and operation execution eventually creates
substantial overhead which would affect the response of the
system adversely.
In Chapter II of this thesis, an overview of query
processing in the distributed data base system is provided.
This chapter presents a theoretical model of a distributed
data base, the implications of constraints on the data base
model, query representation, and some general methods of
query realization. It also includes a discussion of some
important research in the area of query optimization. In
Chapter III, the approach of Hevner and Yao [HEV79b, APE83]
to query optimization is discussed in detail. Further, by
means of examples, we show that their algorithms do not
necessarily produce schedules which are optimal within the
limitations of the data base model. Estimation of file size
during query processing is a complex problem [SAC82, YU82].
Estimates based on static assumptions do not provide results
close to the actual ones. The heuristic algorithm presented
in Chapter IV allows the actual size of intermediate files
to be used in the development of a query processing
strategy. Proof of correctness of the algorithm is presented
later. We conclude this thesis in Chapter V and provide some
results of simulation of our algorithm.
CHAPTER II
QUERY PROCESSING IN DISTRIBUTED DATABASE
Query Processing
A query is an access request made by a user or a
program in which one or more files need to be accessed
[RAM79]. When multiple files are accessed by the same query
in a distributed data base, these files usually have to
reside at a common location before the query can be
processed, at least partially. Substantial communication
overhead is involved if these files are geographically
distributed and a copy of each file needs to be transferred
to a common location. It is therefore necessary to
decompose the query into subqueries so that each subquery
accesses a single remote file. These subqueries may then be
processed in parallel at different locations. Finding an
efficient schedule of subqueries is important. If the query
is processed inefficiently, it not only takes a long time
before the end user gets an answer, but it might also
decrease the performance of the whole system because of
network congestion [SCH78].
Several nonprocedural languages are known for query
representation, but we choose relational algebra for our
study. Relational algebra is based on three principal
operations: selection, projection, and join. A query can
be fully described in terms of these operations and their
attribute domains [HEV79b].
Distributed queries are processed by moving data to
desired locations and then performing the required
operations there. The distributed query problem is one of
transforming a distributed query into a set of local
queries, each of which either transfers data from a single
location to other locations or executes a relational
operation on locally available data. A major objective in
any distributed query processing strategy is to decrease
transmission cost. It is important that the size of data
base files is reduced at their source locations. Therefore,
every strategy tries to execute selection and projection as
early as possible, because each such operation reduces the
size of its operand. The size reduction decreases the
subsequent movement cost of intermediate relations. The
final data movement of all partial solutions must move data
to the result site for final processing. All algorithms
assume that an initial phase is performed. In this phase:
1. The query is transformed into disjunctive normal
form. The conjuncts can be treated as subqueries and
can be executed independently. The result of the
original query is then the union of the results of
its subqueries.
2. All local processing at different locations is
performed, in parallel wherever possible.
A general processing cost function for a static system,
covering transmission (from site i to site j) and processing
(at site j) of s units of data, can be stated as:
Tij(s) = Iij + Cij(s) + Proj(s * sj)
where Iij is the transmission overhead cost, Cij is the cost
of using the link per unit of data, Proj is the cost of
processing a unit of data by the computer at site j, and sj
is the size of the relation already at site j which has to
be joined with the incoming data. The cost is expressed in
units of time. If no direct link exists between i and j,
the cost is assumed to be taken over the cheapest path
connecting the two sites.
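The cost function above can be sketched in a few lines of Python. This is only an illustration: it assumes that Cij and Proj are both linear in their arguments (a per-unit link cost and a per-unit processing cost), and the function and parameter names are hypothetical, not taken from the thesis.

```python
def transfer_cost(s, overhead, link_cost_per_unit, proc_cost_per_unit, s_local):
    """Sketch of T_ij(s) = I_ij + C_ij(s) + Pro_j(s * s_j).

    Assumption: C_ij and Pro_j are linear, so each is a per-unit cost
    multiplied by the amount of data it is applied to.
    """
    transmission = overhead + link_cost_per_unit * s    # I_ij + C_ij(s)
    processing = proc_cost_per_unit * (s * s_local)     # Pro_j(s * s_j)
    return transmission + processing


# Illustrative values only: 100 units are sent to a site already holding 50 units.
no_processing = transfer_cost(s=100, overhead=5, link_cost_per_unit=1,
                              proc_cost_per_unit=0, s_local=50)
with_processing = transfer_cost(s=100, overhead=5, link_cost_per_unit=1,
                                proc_cost_per_unit=0.001, s_local=50)
print(no_processing, with_processing)
```

Setting the processing cost to zero gives the pure transmission model, in which local processing is treated as negligible.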
Distributed query evaluation strategies try to reach
one of two objectives: minimal total time or minimal
response time [HEV78]. The minimal total time objective
attempts to minimize resource consumption and therefore to
maximize system throughput. The objective behind minimal
response time is to minimize system response time, which
probably increases resource consumption. These two
objectives are, in general, conflicting. A decrease in
response time, for instance, may be obtained by having a
large number of parallel transmissions to different sites.
This requires higher resource consumption, and consequently
the system throughput is reduced.
The general problem of query optimization in
distributed data bases has been proven to be NP-hard
[HEV79a]. General strategies are, therefore, based on
heuristics. The term "optimal strategy" often means a
strategy which produces plans of lower cost than the plans
produced by naive strategies and which also seems reasonably
suitable for normal queries. Even if a strategy can be
proven to produce optimal query processing, its results
could be seriously biased by the method used to estimate
the size of intermediate results.
Semijoin
Semijoin is a relational operator used in a number of
query processing algorithms [ROT80, BER81a, APE83] and data
base machines [PAE79, OZK77]. In a distributed data base,
we may need to compute joins on relations which are located
at different sites. In order to process these operations
directly, we need to transfer a whole relation from one site
to another. Because the transmission speed is much slower
than the processing speed, these join operations appear to
be the most costly operations in the distributed data base.
Therefore, instead of computing joins directly, we should
first reduce the size of the relations, wherever possible,
by using selections and projections on appropriate
attributes. The semijoin operation has the effect of
reducing the size of a relation before the join operation is
performed.
A semijoin is "half of a join": the semijoin of
relation Ri by relation Rj on a common attribute D is the
join of Ri with the relation projected from Rj on that
common attribute D. We denote the semijoin operation by the
symbol ⋉. The semijoin of relation Ri on domain A with
relation Rj on domain B is defined as
{ti ∈ Ri | ∃ tj ∈ Rj such that ti.A = tj.B}
and is denoted by Ri[A ⋉ B]Rj. The relational
algebra notation t.X means "the value of domain X of tuple
t."
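A minimal Python sketch of this definition follows. It assumes that a relation is represented as a list of dictionaries (one per tuple); the relation and attribute names are hypothetical and serve only as an illustration.

```python
def semijoin(r_i, a, r_j, b):
    """Return the tuples t_i of r_i for which some t_j in r_j has t_i[a] == t_j[b]."""
    b_values = {t[b] for t in r_j}              # projection of Rj on domain B
    return [t for t in r_i if t[a] in b_values]


# Hypothetical relations; each tuple is a dict from domain name to value.
employee = [{"emp": 1, "dept": 10}, {"emp": 2, "dept": 20}, {"emp": 3, "dept": 30}]
department = [{"dno": 10}, {"dno": 30}, {"dno": 40}]

reduced = semijoin(employee, "dept", department, "dno")
print(reduced)
```

Only the set of `dno` values is needed to reduce `employee`, which is why a semijoin can be computed by shipping a projection rather than a whole relation.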
The semijoin operation in this thesis will be limited
to a single domain. The justification for this assumption is
its practicality. To perform a semijoin of relation Ri on
domains A and B with Rj on domains C and D, we need to
compute the projection of Rj on the pair of domains C and D.
The size of such a projection will generally be quite large,
and therefore the cost of the semijoin is likely to exceed
its benefit. Still, we do allow multidomain semijoins in a
limited context. If a pair of domains in a relation is
always treated as a single composite domain, then the
composite domain can be considered as atomic.
The characteristics of the semijoin operation can be found
in [BER81a], [GOO82]. The principal issue discussed there
is to characterize those queries that can be fully reduced
by semijoins. Applications of semijoins reduce the size of
data and therefore, in general, decrease transmission cost.
However, it is not difficult to see that they do not
necessarily provide optimal solutions.
There are three basic advantages of the semijoin operation
in distributed query processing. First, (Ri ⋉ Rj) ⊆ Ri, and
so semijoins monotonically reduce the size of the data base.
By contrast, joins can increase the size of the data base;
in the worst case |Ri join Rj| = |Ri| * |Rj|. Second,
semijoins can be computed with less intersite data transfer
than needed for joins. To compute Ri ⋉ Rj, we need only
transmit a projection of a relation, whereas to compute the
join we must transmit the entire relation. Of course, a
semijoin may also have less effect than a join, since
Ri ⋉ Rj only reduces Ri whereas a join simultaneously
reduces both Ri and Rj. The third advantage of the semijoin
is that the "reductive effect" of a single join can be
obtained by two semijoins, usually at lower cost.
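The first and third advantages can be checked on a small example. The sketch below is illustrative only: relations are lists of dictionaries, the data are made up, and the `semijoin` and `join` helpers are hypothetical names introduced here.

```python
def semijoin(r_i, a, r_j, b):
    """Tuples of r_i whose a-value appears among the b-values of r_j."""
    values = {t[b] for t in r_j}
    return [t for t in r_i if t[a] in values]


def join(r_i, a, r_j, b):
    """Equijoin of r_i and r_j on r_i.a = r_j.b (nested-loop version)."""
    return [{**ti, **tj} for ti in r_i for tj in r_j if ti[a] == tj[b]]


# Made-up relations with distinct attribute names.
r1 = [{"a": 1, "x": "p"}, {"a": 2, "x": "q"}, {"a": 3, "x": "r"}]
r2 = [{"b": 2, "y": "u"}, {"b": 3, "y": "v"}, {"b": 4, "y": "w"}]

# Two semijoins: each needs only a single-column projection of the other side.
r1_reduced = semijoin(r1, "a", r2, "b")
r2_reduced = semijoin(r2, "b", r1_reduced, "a")

# Both relations are now reduced exactly as the join would reduce them.
same_result = join(r1_reduced, "a", r2_reduced, "b") == join(r1, "a", r2, "b")
print(len(r1_reduced), len(r2_reduced), same_result)
```

In this example neither reduced relation contains a tuple that fails to participate in the join, while only join-column values had to be exchanged to get there.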
However, there are cases in which joins outperform
semijoins [BER81a], and therefore applications of semijoins
are based mostly on heuristic arguments. An optimal query
processing algorithm would almost certainly include both
joins and semijoins. The graceful integration of these
tactics is an open problem.
Distributed Query Processing Strategies
Research on distributed query processing was initially
done by Wong [WON77]. He proposed an optimization method
based on a greedy heuristic that produced general query
processing strategies but not necessarily optimal
strategies. An enhanced version of this method is
implemented in the SDD-1 system [ROT80] [BER81b], which is
operational at Computer Corporation of America. Epstein et
al. [EPS78] developed an algorithm which is partly based on
centralized query strategies. This algorithm is implemented
in the distributed INGRES data base system. The algorithm
optimizes each query operation locally. Chu and Hurley
[CHU80] incorporated the idea of permuting relational
operations in the query trees [SMI75] and defined the
solution space of feasible processing strategies for
distributed queries. They presented an optimal algorithm
using 0-1 integer programming, which is known to be an
exponential time problem. Hevner and Yao [HEV78] proposed to
use the volume of data transferred as the optimization
criterion. They proposed solutions for a special class of
query, the simple query. In [HEV79b] [APE83], they extended
these algorithms to process general distributed queries.
Their algorithm GENERAL has a serious problem in that it may
allow simultaneous transmission of different data on the
same link. We discuss this problem in detail later [LAK83].
Bernstein et al. [BER81a] investigated the reducing
properties of semijoins and used the semijoin operation to
develop full reduction methods for distributed query
processing [GOO82]. These methods are applicable to a special
class of queries known as tree queries. Their concern was to
determine full reducers, not to consider their cost
effectiveness. Therefore, this strategy does not minimize
transmission costs, but rather reduces the amount of
irrelevant data. Kerschberg et al. [KER82] derived an
efficient algorithm for one variable queries given in
disjunctive normal form for a star shaped network.
The objective of most distributed query processing
strategies is to minimize the use of network resources.
Transformation of the query into disjunctive normal form and
initial parallel local processing are considered to be
beneficial, and are always performed. The basic
optimization problem, then, is to find distributed joins
which require the fewest transmissions between sites. Three
different actions can be applied:
1. Redistribution of a given relation to a certain site
i, in order to perform an on site join at i.
2. Redistribution of several relations to different
sites, in order to perform on site fragments of joins
at those sites.
3. Redistribution of the attributes of certain
relations, in order to perform semijoins.
Most methods avoid an expensive search of the solution
space by (1) assuming a nonredundant static materialization
of all the relations referenced in the query, so that the
number of network sites to be considered is generally less
than or equal to the number of relations referenced in the
query, and by (2) exploring only a heuristically defined
subset of the solution space.
Studies [SAC82] show that strategies produced by
heuristic algorithms perform significantly better than those
produced by naive methods, but are often poorer than
optimal strategies.
Several problems remain to be solved. The principal
ones are:
1. Network topologies.
2. Initial materializations.
3. Query evaluation cost functions.
4. System load.
5. Objective functions.
CHAPTER III
IMPROVEMENT IN HEVNER AND YAO'S METHOD
Hevner and Yao have proposed several algorithms [HEV78,
HEV79b, SAC82, APE83] for distributed queries. These
algorithms are based on an algorithm which they developed
for a class of queries called "simple queries." Their
algorithms can be classified into two different sets. One
is for total time minimization, and the other is for
response time minimization. In this chapter, we show that
their algorithm for the response time version does not
necessarily produce an optimal schedule, and later we
present modifications to improve the schedules produced by
their PARALLEL algorithm.
The first subdivision of this chapter describes their
distributed data base model. In the second subdivision, an
alternate algorithm is proposed for simple queries. The
error in their GENERAL algorithm for general queries, and
modifications to it, are described in the third subdivision.
System Model
In a distributed system environment, we can view the
network as a connected, undirected graph. Each node of the
graph represents an independent computer which has
processing and data storage capabilities, and each edge
represents a point to point [MAR81] communication line which
can carry data to the adjoining node. Each computer
contains a version of a distributed data base management
system.
The data base is viewed logically in the relational
data model. The data base is distributed over the system
nodes in units of relations, and it can be accessed locally
or globally from any system node.
We assume that the data transmission cost between any
two nodes is a linear function of the size of the data. This
function will be denoted by T(x) = I + C(x), where x is
the amount of data transmitted and I is the transmission
start up cost, which is assumed to be constant. The local
processing time is assumed negligible compared to the
transmission time. Domains of various attributes are assumed
to be independent, and values are assumed to be evenly
distributed in every attribute of a relation. The response
time of a query schedule is the time elapsed between the
start of the first transmission and the time when the
relation arrives at the required computer. The minimum
response time of a relation is the minimum response time
among all possible schedules for the relation.
For a distributed data base system described above, we
assume that the following information is given:
For the distributed system:
    N    Number of nodes
    M    Number of unique relations in the data base
For each relation Ri, i = 1, ..., M:
    ni   Number of records
    ai   Number of domains
    si   Size (in bytes)
For each domain dij, j = 1, ..., ai, of relation Ri:
    Vij  Number of possible domain values
    Uij  Number of values domain dij currently holds
    Pij  Selectivity; Pij = Uij/Vij (0 < Pij <= 1)
    Wij  Size of data item in the domain
    Bij  Projected size of domain; Bij = Uij * Wij
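The per-domain parameters and the two derived quantities can be written down as a small data structure. This is a sketch only; the class and field names are hypothetical, and the numbers in the usage lines are illustrative, not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    """Statistics kept for one domain d_ij of a relation R_i."""
    possible_values: int   # V_ij: number of possible domain values
    current_values: int    # U_ij: number of values the domain currently holds
    item_size: int         # W_ij: size of one data item, in bytes

    @property
    def selectivity(self):
        """P_ij = U_ij / V_ij."""
        return self.current_values / self.possible_values

    @property
    def projected_size(self):
        """B_ij = U_ij * W_ij."""
        return self.current_values * self.item_size


# Illustrative numbers: 100 of 1000 possible values are present, 4 bytes each.
d = Domain(possible_values=1000, current_values=100, item_size=4)
print(d.selectivity, d.projected_size)
```

Deriving Pij and Bij from Uij, Vij, and Wij, rather than storing them separately, keeps the three stored values as the single source of truth.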
Query processing involving data stored at a single node is
termed local processing. An advantage of local processing is
that it reduces the amount of data that needs further
processing. To estimate the effect of processing, we define
the following parameters:
    n    Number of relations in the remainder of the query
    Fi   Number of domains in relation Ri
    Gi   Number of internal join domains in relation Ri
It is assumed that attributes of a relation are independent,
and so, if a semijoin over an attribute dij of a relation Ri
is taken with another attribute dkl of another relation Rk,
the effect of the reduction on the relation parameters is:
    Si  <- Si * Pkl
    Pij <- Pij * Pkl
    Bij <- Bij * Pkl
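The three update rules can be sketched as a single function. The function name and the sample values are hypothetical; `Fraction` is used only so that the illustrative arithmetic stays exact.

```python
from fractions import Fraction


def apply_semijoin(size_i, p_ij, b_ij, p_kl):
    """Estimated state of R_i after a semijoin with domain d_kl of R_k.

    Under the independence and uniformity assumptions, the size of R_i,
    the selectivity of d_ij, and the projected size of d_ij are all
    scaled by the selectivity P_kl of the reducing domain.
    """
    return size_i * p_kl, p_ij * p_kl, b_ij * p_kl


# Illustrative values: a relation of size 200 reduced by a domain with P_kl = 1/10.
new_size, new_p, new_b = apply_semijoin(200, Fraction(1, 5), 80, Fraction(1, 10))
print(new_size, new_p, new_b)
```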
In a distributed data base, it is, in general, better to do
initial local processing first because it reduces the amount
of data to be transmitted. After the initial processing,
each node that has query data will be considered to contain
only one "integrated" relation. The relations at each node
remain distinct. However, by reformatting the query so that
each node becomes a variable, the distribution aspects of
the query can be emphasized [APE83].
The relations to be referenced by the queries can be
stored in arbitrary computers of the network. To process a
query that requires non-local relations, we have to transmit
data from one computer to another.
One way of processing a query is to use the Initial
Feasible Solution (IFS) strategy. The idea of IFS is to send
all involved data directly to the computer where the result
is required (the target computer). Current research [APE83,
BER81b, KER82] in the area of distributed query processing
shows that computation of intermediate results by computers
other than the target computer may be more efficient (in
terms of total time and response time).
To show the relevance of the constraints of the above model
for transmission strategies, we take an example. Consider a
distributed relational data base where each of the following
relations is located at a separate node and there is a link
between every two nodes. For simplicity, we assume, without
loss of generality, that Tij = Cij(s).
Suppose that the query is: Find the department number
(D#) of all departments that are located in Texas (LOC),
have a budget over $20,000 (BUD), have at least two managers
(CLASS), and have at least one computer in their equipment
inventory (TYPE). The result should be sent to node 3. The
data base state is given in Table 1.
TABLE 1
A Distributed Data Base System
NODE  RELATION                     VARIABLE  SIZE  SELECTIVITY
1     Department (D#, LOC)         D         1000  1
2     Employee (E#, D#, CLASS)     E         800   4/5
3     Equipment (EQ#, D#, TYPE)    EQ        100   1/10
4     Budget (BUD, D#)             B         200   1/5
The IFS strategy will produce the following schedule:
1. send department relation to node 3.
2. send employee relation to node 3.
3. send budget relation to node 3.
4. perform query operations at node 3.
It can be easily seen that the response time of the IFS
strategy for this example is 1000. Now, consider the
following schedule:
step 1:
send equipment relation to node 4,
perform join and projection.
step 2:
send the result from node 4 to node 2,
perform join and projection.
step 3:
send the result from node 2 to node 1,
perform join and projection.
step 4:
send the result from node 1 to node 3.
The response times for these steps are:
step 1 : 100
step 2 : (1/10) * 200 = 20
step 3 : (1/10) * (1/5) * 800 = 16
step 4 : (1/10) * (1/5) * (4/5) * 1000 = 16
The response time of the query is 100 + 20 + 16 + 16 = 152.
This schedule, obviously, is far superior to the schedule
obtained by IFS.
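The arithmetic of this example can be reproduced with a short script. It is a sketch under the example's own simplification that sending s units of data costs s time units; the dictionary layout and function names are hypothetical.

```python
from fractions import Fraction

# Size and selectivity of each relation, as listed in Table 1.
relations = {
    "D": (1000, Fraction(1)),
    "E": (800, Fraction(4, 5)),
    "EQ": (100, Fraction(1, 10)),
    "B": (200, Fraction(1, 5)),
}


def ifs_response_time(names):
    """All listed relations are sent to the result node in parallel."""
    return max(relations[name][0] for name in names)


def chain_response_time(order):
    """Each relation is reduced by all earlier ones, then sent to the next node."""
    total, reduction = Fraction(0), Fraction(1)
    for name in order:
        size, selectivity = relations[name]
        total += size * reduction      # cost of shipping the reduced data
        reduction *= selectivity       # later relations shrink by this factor
    return total


ifs_time = ifs_response_time(["D", "E", "B"])           # EQ is already at node 3
chain_time = chain_response_time(["EQ", "B", "E", "D"])
print(ifs_time, chain_time)
```

The chained schedule trades parallelism for reduction: its four transmissions are strictly sequential, yet each one ships far less data than the direct transfer of the Department relation.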
From this example, we can see that before transferring
the larger relation directly to the result node, it may be
better to reduce its size by doing some intermediate
operations with some smaller relations first. It is
generally true that bringing in more relations avoids
transmission of redundant data, but at the same time it
increases the overhead. The arrangement of these schedules
determines the efficiency of a query processor.
Simple Query
Hevner and Yao [HEV78, HEV79] presented an optimal
algorithm (PARALLEL) for simple queries on a completely
connected network. This algorithm minimizes response time. A
simple query is defined such that after initial local
processing, each relation referenced in the query contains
only one domain, the common joining domain. The objective
of the algorithm PARALLEL is to minimize the response time
by reducing the transmission cost.
Before we describe this algorithm, let us describe a
graphing method [HEV78] that will be used to analyze the
costs of different distribution strategies. This graphing
method represents the timing of data movements in a
distribution strategy. Data movement is represented by a
horizontal line connecting processing symbols. The length of
the line corresponds to movement time. A query cost graph
is composed of parallel lines of movement, one for each
required relation. These parallel lines are viewed on a
horizontal time axis. This allows us to recognize
synchronization between different lines of movement. For
example, the cost graph for the Initial Feasible Solution
(IFS) is shown in Figure 1. The relations Ri, ..., Rj are
assumed to be at the result node originally. Therefore their
line of movement costs are zero.
The lines of movement to relation Ri represent the
sequential and parallel data movements that are made in
order to reduce the size of the relation Ri before it is
moved. Sometimes it is beneficial to move some relations to
a relation so that the relation size is reduced and the
subsequent data movements are less costly. As in Figure 2,
relations Rj and Rk are moved in parallel to Ri. Relation Ri
is reduced in size and then is moved to the result node.
We define the response time for relation Ri as the time
from the start of data movement until relation Ri is
received at the result node. For example, in Figure 2 the
response time is t2 + t3. The time t1 is not included
because it occurs in parallel with time t2. The delay time
for relation Ri is defined as the time from the start of
data movement until it reaches the relation Ri. In
Figure 2, the delay time is t2.
[Figure 1: Graph Cost for IFS. Parallel lines of movement for
relations R1, R2, Ri, Rj, Rk, with costs C(S1), C(S2), C(Sk).]
Now, let us describe the algorithm PARALLEL. There are
two assumptions for the algorithm. First, local processing
costs are insignificant in comparison with data movement
cost. Second, after initial local processing each relation
contains only one common joining domain. Since all
relations are interconnected on a common joining domain,
each node is assumed to contain only one required relation
for the query. Also, once a relation is moved to another
relation, it need not be moved to the destination node.
The algorithm PARALLEL starts with the IFS and searches
for cost beneficial data moves in the current system state.
The state of the system is given by the size Si, selectivity
Pi, and line of movement cost Li, for each relation Ri. A
cost beneficial move to reduce response time is defined as a
data move to a relation Ri so that the movement cost for Ri
is reduced. Relations are ordered so that after the initial
local processing step, S1 <= S2 <= ... <= Sm. Algorithm
PARALLEL then uses this ordering to implement the moving of
relations of smaller size to relations of larger size if
such a move is cost beneficial.
Algorithm PARALLEL
1. After initial local processing, order relations so
that S1 < S2 < ... < Sm. Initialize the system state
parameters Si, Pi and Ci = C(Si) for all Ri.
2. For i = 1 to m repeat steps 3 through 4.
3. Find the response time Cij for all j < i:
Cij = Cj + C(Si * P1 * P2 * ... * Pj).
4. Choose the most cost beneficial movement:
Ci = min(Ci, Cij).
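
The four steps above can be sketched in Python as follows. This is only an illustrative sketch, not the original implementation: the function name, the argument layout, and the cost function used in the usage note are assumptions made for the example.

```python
from typing import Callable, List


def parallel_response_times(
    sizes: List[float],
    selectivities: List[float],
    cost: Callable[[float], float],
) -> List[float]:
    """Return the response time Ci chosen for each relation, smallest first."""
    # Step 1: order relations by size and start from the IFS, Ci = C(Si).
    order = sorted(range(len(sizes)), key=lambda r: sizes[r])
    s = [sizes[r] for r in order]
    p = [selectivities[r] for r in order]
    c = [cost(x) for x in s]

    # Steps 2 to 4: for each Ri, try receiving R1..Rj (j < i) before moving Ri.
    for i in range(len(s)):
        reduction = 1.0  # running product P1 * P2 * ... * Pj
        for j in range(i):
            reduction *= p[j]
            c_ij = c[j] + cost(s[i] * reduction)
            c[i] = min(c[i], c_ij)
    return c


if __name__ == "__main__":
    # Hypothetical cost function C(x) = 20 + x, applied to sizes 100 and 200.
    print(parallel_response_times([100, 200], [0.1, 0.2], lambda x: 20 + x))
```

With the assumed cost function C(x) = 20 + x, the smaller relation keeps its IFS time and the larger one is reduced by the smaller one before it is moved.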
We point out below that the algorithm PARALLEL may not
produce optimal schedules [LAK83]. Our observation is stated
in the following example. Let R1 and R2 be two relations
having only one attribute over a common join domain, and let
the join of R1 and R2 be required for a query at some location
other than that of R1 and R2. After initial processing,
the size and selectivity values are as listed in Table 2.

TABLE 2
Example of Simple Query

Relation   Size   Selectivity
R1         100    0.1
R2         200    0.2

According to the algorithm PARALLEL, the optimal
schedule for transmission of the attribute to the destination
should consist of the transmission of R1 to the location of R2
and the transmission of the output of the semijoin of R1 and R2
(to be computed at the location of R2) to the destination. The
cost graph is shown in Figure 3. The expected volume of the
output of the semijoin operation is 200 * 0.1 = 20. The total
cost for this schedule is I + C(100) + I + C(20). Now
consider the following alternate schedule.

R1 --C(100)--> R2 --C(20)--> Result
Response time = C(100) + C(20)
(a) Cost graph for algorithm PARALLEL

R1 --C(100)--> R2 --C(10)--> Result
R2 --C(100)--> Result
Response time = C(100) + C(10)
(b) Cost graph for improved schedule

Figure 3: Cost Graph for An Example of A Simple Query

Let R1 be transmitted simultaneously to the location
of R2 and to the destination, and let R2 be transmitted to
the destination at the same time until the transmission of
R1 ends. For this example, the first half of R2 would be
transmitted before the transmission of R1 ends. The join of R1
and the remaining half of R2 is computed at the location of
R2, and it is then transmitted to the destination. A join of
R1 with the first half of R2 is computed at the destination
computer.
The response time of this schedule is I + C(100) + I +
C(100*0.1) = 2I + C(110). This schedule, therefore, outperforms
the schedule produced by PARALLEL. The complete response
time (the transmission and processing time together) may now
be computed. The cost function of computing the join of two
attributes of size x and y may be denoted by K + Pro(x*y).
The function is justified because the two attributes may not
be sorted. The response time of the schedule obtained by
their algorithm is I + C(100) + K + Pro(200*100) + I +
C(200*0.1) = 2I + C(120) + K + Pro(20000). The response time
of the alternate schedule is I + C(100) + K + Pro(100*100) +
I + C(100*0.1) + K + Pro(100*100) = 2I + C(110) + 2K +
Pro(20000). If C(10) > K, the response time of the alternate
schedule is better. The reason is that the algorithm
PARALLEL does not use the hardware fully for parallel
transmission.
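
The comparison of the two schedules can be checked numerically with a small Python sketch. The linear cost functions and the values chosen for I and K in the usage note are assumptions made only for illustration; the text leaves C, Pro, I and K symbolic.

```python
def parallel_time(I, K, C, Pro, s1=100, s2=200, p1=0.1):
    """Schedule of PARALLEL: R1 goes to the site of R2, the join runs there,
    and the reduced R2 is sent to the destination."""
    return I + C(s1) + K + Pro(s2 * s1) + I + C(s2 * p1)


def split_time(I, K, C, Pro, s1=100, s2=200, p1=0.1):
    """Alternate schedule: while R1 is in transit, the first s1 units of R2
    are sent to the destination; each site joins R1 with its part of R2."""
    sent_early = s1              # part of R2 moved while R1 is being moved
    remaining = s2 - sent_early  # part of R2 reduced at its own site
    return (I + C(s1) + K + Pro(sent_early * s1)
            + I + C(remaining * p1) + K + Pro(remaining * s1))


if __name__ == "__main__":
    C = lambda x: x            # assumed linear transmission cost
    Pro = lambda x: x / 10000  # assumed linear processing cost
    for K in (2, 12):          # C(10) = 10, so K = 2 favours the alternate schedule
        print(K, parallel_time(5, K, C, Pro), split_time(5, K, C, Pro))
```

Under these assumed functions the alternate schedule wins when K is below C(10) and loses when K is above it, which agrees with the condition stated in the text.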
General Query
In [APE83], Apers et al. extended simple query
algorithms to general queries. The algorithms (GENERAL,
RESPONSE) assume the same network properties and
transmission cost function as given before for the algorithm
PARALLEL. A query is called GENERAL if there are some
relations which may have any number of common joining and
output domains. Furthermore, a node in the network may
contain any number of required relations. The joining
domains within each relation are assumed to be independent.
Thus a selectivity reduction on one domain does not affect
the selectivity of the other joining domains.
The central idea in these algorithms is to try to
reduce the relation size as much as possible by using
semijoin operations and then transfer the reduced relations
to the destination node. Using the independence of
domains, they consider that a general query involving
several domains can be partitioned into several simple
queries, each one with an undefined result node. The
algorithm PARALLEL can be applied to each subquery
separately. The schedules for each simple query are then
integrated into a complete query processing strategy. They
showed that under the hypothesis of attribute independence
within each referenced relation, the algorithms GENERAL and
RESPONSE will produce schedules for optimal response time.
First, let us demonstrate these algorithms by an
example. The steps of the algorithms GENERAL (step 1, step
2, step 5) and RESPONSE (step 3, step 4) are shown below.
Step 1: Do all initial local processing: All the operations
that can be done without transferring data
should be first executed locally.
Let the data base state of the example, after initial local
processing, be given in Table 3. There are two relations R1
and R2, each of which has two common joining attributes, bi1
and bi2, with selectivity pi1 and pi2, respectively.

TABLE 3
Example of General Queries

Relation   Size    Domain 1       Domain 2
Ri         Si      bi1    pi1     bi2    pi2
R1         1000    400    0.4     100    0.2
R2         2000    400    0.4     450    0.9

Step 2: Generate candidate schedules: For each of the
joining attributes, consider the simple query
described by it. Apply the algorithm PARALLEL
to each simple query. Save all candidates for
integration in step (3).
By applying the algorithm PARALLEL to both b11 and b21
separately, we produce the candidate schedules shown in
Figure 4(a). The candidate schedules for b12 and b22 are
shown in Figure 4(b).

b11: b11 --420-->
b21: b21 --420-->
(a) Candidate schedules for b11 and b21

b12: b12 --120-->
b22: b12 --120--> b22 --110-->
(b) Candidate schedules for b12 and b22

Figure 4: Candidate Schedules for General Query Example

Step 3: Candidate schedule ordering: For each relation
Ri, order the candidate schedules on joining
attribute bij, j = 1,2,...,g in ascending order
of arrival time. Let ARTk denote the arrival
time of schedule CSCHk.
For relation R1, the schedules of attributes that can be
applied are ordered on their arrival time in the node where
R1 is located, and this result is given in Table 4. The
schedules for relation R2 are shown in Table 5.

TABLE 4
Candidate Schedule for R1

Attribute bk    Arrival Time ARTk
b22             230
b21             420

TABLE 5
Candidate Schedule for R2

Attribute bk    Arrival Time ARTk
b12             120
b11             420

Step 4: Schedule integration: For each candidate schedule
CSCHk, in ascending order, construct an
integrated schedule for Ri that consists of
parallel transmission of CSCHk and all CSCHl,
l < k. Select the integrated schedule of minimum
response time.
In step (4), for each of these attributes bk, an integrated
schedule for relation Ri is constructed which consists of
parallel transmission of all attributes having arrival time
less than or equal to ARTk. This construction for relation
R1 is shown in Figure 5. Figure 6 shows the schedule
construction for relation R2. In Figure 5, there are two
schedules for R1. Along with the Initial Feasible Solution for
R1, the schedule with the minimum response time is chosen. This
is the second schedule, and its response time is 800. The
first schedule in Figure 6 is chosen for relation R2, and
its response time is 540.
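
Steps 3 and 4 can be sketched in Python as follows. This is a simplified sketch under stated assumptions: the function name and argument layout are hypothetical, and the cost function C(x) = 20 + x is inferred from the worked figures (C(100) = 120, C(400) = 420) rather than stated in this passage.

```python
def integrate(size, candidates, cost):
    """Return the minimum response time for one relation.

    size       -- size Si of the relation after local processing
    candidates -- list of (arrival_time, selectivity) pairs, one per attribute
    cost       -- transmission cost function C
    """
    best = cost(size)  # Initial Feasible Solution: send the relation unreduced
    reduction = 1.0
    # Step 3: consider candidate schedules in ascending order of arrival time.
    for arrival, selectivity in sorted(candidates):
        # Step 4: the k-th integrated schedule uses every attribute that has
        # arrived by ARTk, so the selectivities accumulate.
        reduction *= selectivity
        best = min(best, arrival + cost(size * reduction))
    return best


if __name__ == "__main__":
    C = lambda x: 20 + x  # assumed cost function
    print(integrate(1000, [(230, 0.9), (420, 0.4)], C))  # relation R1
    print(integrate(2000, [(120, 0.2), (420, 0.4)], C))  # relation R2
```

The sketch covers only the arrival-time ordering and the choice of the minimum; it does not model conflicts between transmissions that share a communication link.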

b22: b12 --120--> b22 --110--> R1 --920-->
Response time = C(100) + C(0.2*450) + C(0.9*1000)
              = 120 + 110 + 920 = 1150

b22: b12 --120--> b22 --110-->
b21: b21 --420--> R1 --380-->
Response time = C(400) + C(0.9*0.4*1000) = 420 + 380 = 800

Figure 5: Cost Graph for R1

b12: b12 --120--> R2 --420-->
Response time = C(100) + C(0.2*2000) = 120 + 420 = 540

b12: b12 --120-->
b11: b11 --420--> R2 --180-->
Response time = C(400) + C(0.2*0.4*2000) = 420 + 180 = 600

Figure 6: Cost Graph for R2

Step 5: Remove schedule redundancies: Eliminate schedules
of relations which have already been transmitted
in schedules of other relations.
The final query processing strategy for this example is
shown in Figure 7.

R1: b12 --120--> b22 --110-->
    b21 --420--> R1 --380-->
R2: b12 --120--> R2 --420-->

Figure 7: Final Cost Graph for General Query

The algorithm RESPONSE has two steps (step 3 and step 4
in our example). Step 1 computes the optimal schedules for
each joining attribute individually and integrates these
schedules. Step 2 determines the attributes that may be
transmitted using these schedules to locations of the data base
to reduce the volume of data. A problem arises when some
relations share two or more joining attributes. In this case,
these common attributes can not be transmitted concurrently
on any single communication link which is shared by the two
locations. Therefore, the schedules computed individually
for joining attributes can not be executed in parallel.
Step 2 does not consider this problem. As a result, the
schedules which RESPONSE produces may try to transfer more
than one common attribute over a single communication link
concurrently. Figure 7 shows this problem graphically. The
schedules of two attributes shared between R1 and R2 have
concurrent transmission of b22 and b21 (both from the location
of R2) to the location of relation R1. We conclude that the
algorithm RESPONSE will not produce optimal response time
schedules for general queries.
Now we will assess the quality of the schedules which are
produced by the algorithm RESPONSE. If an attribute can not
be fragmented, then the schedules produced by the algorithm
PARALLEL for the joining attributes have minimum response
time [HEV79a]. If each relation schedule is of the minimum
response time, then the total query processing strategy is
also of minimum response time. These two statements follow
from the attribute independence of relations. Because of
the optimality of algorithm PARALLEL, the candidate schedules
used in RESPONSE are also minimum response time schedules
for each joining attribute. Algorithm RESPONSE puts
these candidate schedules in ascending order of arrival
time and it only considers integrated schedules for
relation Ri that consist of the parallel transmission of
joining attributes with arrival time less than or equal to
the arrival time of a certain CSCHk. If we consider all
possible attributes which can reduce the relation Ri, then
the schedule which gives the minimal response time is optimal.
Algorithm RESPONSE considers all possible attributes
by bringing them in parallel if their arrival time is less
than or equal to that of the current attribute being considered.
Bringing attributes in parallel to a relation implies
that these attributes can be transmitted concurrently. In
order to make sure that these attributes can be transferred
concurrently, the schedules which PARALLEL produced can not
have any conflicting transmission. If a relation has
more than one common joining attribute (as shown in Figure 4
and Table 4), it may have a conflicting transmission.
We conclude that the algorithm RESPONSE is applicable only
for relations such that no two relations have more than one
joining attribute which join together.
CHAPTER IV
MINI-MAX ALGORITHM
Introduction
Every query optimization strategy is defined over a
simplified model of a real distributed data base. Even with
very simple cost functions, as described in previous
chapters, many subproblems in distributed query processing
are known to be NP-hard [HEV79b, YLC82, YU83]. As a result,
most optimization algorithms try to find an efficient
solution, and therefore, are either heuristics, or are
generalized algorithms tailored for special types of queries
suitable for specific network configurations.
To simplify the problem further, in [APE83, HEV79b,
YAO79] the authors assume that the processing time of query
operations can be ignored. They concentrate on transmission
cost only. Correctness of this assumption essentially
depends upon the environment of the network which connects
high speed processors. But this is not the case in some
types of distributed architecture for which there are high
bandwidth channels which connect relatively slow processing
units, such as a local area network.
If the processing cost of query operations is also
included in the cost model (this assumption seems to be more
realistic), the problem becomes much more complicated and
difficult. To emphasize this point, consider the cost graph
of the IFS in Figure 8 (the transmission time is represented by
a dashed line and the processing time is represented by *). It
is clear that the transmission time of the query is limited
by the transmission time of the largest relation (R4 in
this case), and the processing time of the query is the time
of computation of some binary operation on the relations (R1,
R2, R3, R4 in this example) which are sent to the destination.
Now, if we can reduce the processing time at the
destination, we can reduce the response time of the query.
In order to reduce the processing time at the destination,
we can try to combine some relations (as shown in Figure 3)
provided that the cost of these combined relations does not
exceed the maximum transmission time. The principal idea is
to combine as many relations as possible and to process
subqueries at locations other than the destination to achieve
parallelism.

Figure 8: Cost Graph Including Processing Cost
(relations R1, R2, R3, R4)

Now the problem becomes how to determine the relations
which can be combined profitably. This problem is more
complex than the traditional Bin-Packing problem (an NP-Hard
problem) which states that given a list of real numbers in
the range (0,1), place these numbers in a minimum number of
"bins" so that no bin holds numbers summing to more than 1.
In our study, if the bin size is defined by the maximum
transmission time, then the number of bins is the number
of relations which can be combined.
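
For readers unfamiliar with Bin-Packing, the following Python sketch shows the well-known first-fit decreasing heuristic. It is given only to illustrate the problem statement above; it is not part of algorithm Mini-Max, and the function name and the tolerance parameter are assumptions of the example.

```python
def first_fit_decreasing(items, capacity=1.0, eps=1e-9):
    """Pack items into bins of the given capacity, largest item first.

    This is a heuristic: it does not always reach the minimum number of bins.
    """
    bins = []
    for item in sorted(items, reverse=True):
        for b in bins:
            # eps absorbs floating-point rounding when a bin is exactly full.
            if sum(b) + item <= capacity + eps:
                b.append(item)
                break
        else:
            bins.append([item])  # no existing bin has room: open a new one
    return bins


if __name__ == "__main__":
    print(first_fit_decreasing([0.6, 0.5, 0.4, 0.3, 0.2]))
```

In the analogy drawn in the text, the capacity plays the role of the maximum transmission time.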
Another severe problem is to estimate the size of relations
[YU82]. Any static query processing strategy needs to
estimate the size of intermediate relations which are results of
the execution of the subqueries. Existing distributed query
processing algorithms estimate the size of an intermediate
relation by assuming that the values of an attribute in different
relations are among the same range. This assumption may not
be true in practice. As a result, without knowing the
actual size of the intermediate relations, the errors in the
estimation result are accumulated.
Experiments on the effects of estimating the size of
relations have been conducted by Epstein et al. [SAC82].
They used Distributed INGRES [EPS78] as a tool to run over
two sets of relations and a large number of queries. They
showed that the typical difference in performance is around ten
to forty percent. However, a later experiment on different
relations showed a difference in performance of almost seven
hundred percent.
It is well recognized that deriving an algorithm to
find efficient distribution strategies for combining relations
is a very difficult problem. In this chapter, we first
describe a somewhat simplified query environment. An algorithm
(Mini-Max) is then presented to find a distribution
strategy for this environment.
Mini-Max Algorithm
Basically, the system assumptions are the same as
described in Chapter III. For simplicity, we restrict the
query to a simple query, though extension of our ideas to
general queries is not very difficult. We define the
transmission time Ti of relation Ri as the time from the start
of the schedule until the relation Ri is received at the
destination. It is clear that several distribution
strategies may produce the same transmission time, with
different response times. For example, both of the cost
graphs in Figure 9 show feasible data movements for a query
requiring data from four remote relations. Both cost graphs
have equal transmission time although the distribution
strategies and response time differ. The transmission time is
t4+t'1 for both graphs.

First cost graph:
R1: R4 --t4--> R1 --t'1-->
R2: R2 --t2-->
R3: R3 --t3-->

Second cost graph:
R1: R4 --t4--> R1 --t'1-->
R3: R2 --t2--> R3 --t'3-->

Figure 9: Alternate Cost Graph Schedule

We now propose algorithm Mini-Max to find a
distribution strategy for response time reduction. Since it
is difficult to find an optimal strategy which can combine
several relations, we consider schedules which combine a
relation with just one other relation. Our aim is to find
as many combined schedules as possible such that the
processing time at the destination is reduced and at the same
time these schedules do not increase the minimum transmission
time schedule. It should be noted that the minimum
transmission time schedule defined here is actually the maximum
transmission time schedule among all the schedules we
produced. The reason we call it "the minimum transmission
time schedule" is that this schedule provides the minimum
transmission time among all the possible schedules.
We start with the IFS and search for cost beneficial data
movement in the current system state. The state of the system
is given by the size Si, selectivity Pi, and transmission
time Ti for each relation Ri. A cost beneficial move
to reduce the response time is defined as one that moves
some relation to another relation Ri so that the transmission
time Ti of Ri is either reduced or the effect of this
move does not increase Ti beyond the transmission time of the
current system state. Considering the example in Figure 9,
though both cost graphs have t4+t'1 transmission time, the
second one suggests that relation R2 can be combined with
relation R3 without increasing its transmission time
(t2+t'3) beyond the transmission time (t4+t'1) of the
complete schedule. The second schedule has better response
time because there are two combined relations (R1 and
R3) instead of three (R1, R2, and R3) for processing at the
destination.
In order to combine as many relations as we can, the
algorithm is executed in several phases (recall that we allow
a relation to be combined with just one other relation
at any time). Relations are ordered by their sizes for each
phase. Algorithm Mini-Max then uses this relation ordering
to implement a tactic of pairing the relation of the smallest
size with a relation of larger size if such a move is cost
beneficial. The algorithm searches for cost beneficial data
transmission by trying to join a small relation to a large
relation. Relations are checked in the order of decreasing
size. Each time, a schedule (for the largest relation) is
produced which either is a combined schedule of two relations
or is a schedule for which joining of another relation
is not profitable. After the schedule is produced, the
algorithm checks the rest of the relations which have not
yet been tested for inclusion in the schedules using the
same method. It is shown (in the next section) that algorithm
Mini-Max produces the maximum number of pairs of relations
in every phase and at the same time these schedules have the
minimum transmission time.
Algorithm Mini-Max is presented in the following:
1. (Initialization.)
   Index relations so that S1 < S2 < ... < Sn
   Ti <-- C(Si)      (* initialize transmission cost *)
   Max_Trans <-- 0   (* minimum transmission cost *)
   Sch <-- empty     (* schedule pool *)
   Next_Phase <-- true
   Buffer, Temp_Buffer <-- set of ordered relations
2. (Repeat until no further reduction is profitable.)
   While Next_Phase do step 2 to step 6
   Buffer <-- Temp_Buffer
   Temp_Buffer <-- empty
   Next_Phase <-- false
3. (Are all the relations checked?)
   While (Buffer # empty) do step 3 to step 5
   Pick up a pair of relations Ri and Rj from Buffer such that
   Ri is the smallest relation and Rj is the
   largest relation in Buffer.
4. (Construct a schedule and check the transmission
   cost.)
   T'j = Ti + Pro(Si * Sj) + C(Pi * Sj)
   Case
   4.1 (i) T'j < Tj
       Append schedule (Ri, Rj) to Sch
       Buffer <-- Buffer - {Ri, Rj}
       Sj <-- Pi * Sj
       Tj <-- C(Sj)
       Temp_Buffer <-- Temp_Buffer + {Rj}
       Next_Phase <-- true
       (ii) T'j > Tj
       Case
   4.2 (a) T'j < Max_Trans
       Append schedule (Ri, Rj) to Sch
       Buffer <-- Buffer - {Ri, Rj}
       Sj <-- Pi * Sj
       Tj <-- C(Sj)
       Temp_Buffer <-- Temp_Buffer + {Rj}
   4.3 (b) T'j > Max_Trans
       Append schedule (Rj) to Sch
       Buffer <-- Buffer - {Rj}
       Temp_Buffer <-- Temp_Buffer + {Rj}
5. (Determine the maximum transmission time.)
   If Max_Trans < T'j then Max_Trans = T'j
6. (Reinitialize for the next phase.)
   Arrange the relations in Buffer by their size
   Buffer = Temp_Buffer
A simple query example is presented to illustrate the use
of the algorithm Mini-Max strategy. Let us assume a query such
that four required relations are located at four different
nodes. After initial processing, the size and selectivity
values are as shown in Table 6.

TABLE 6
Data Base State for Algorithm Mini-Max

Relation      R1     R2     R3     R4
Size          100    300    800    1000
Selectivity   0.1    0.3    0.8    1.0

Assume that the result node is separate from the nodes
which store the given relations. Let the transmission cost
function be C(x) = 10 + x, and let the processing cost
compared to the transmission cost be of ratio one to four.
The schedule for the initial feasible solution (IFS) is shown in
Figure 10.

R1: --110-->
R2: --310-->
R3: --810-->
R4: --1010-->
response time = C(1000) + Pro(100*300) + Pro(0.1*300*800)
                + Pro(0.1*0.3*800*1000)
              = 1010 + 3 + 2.4 + 1.6 = 1017

Figure 10: Cost Graph for IFS in Algorithm Mini-Max

Algorithm Mini-Max attempts to reduce the transmission time
by finding the pairs in the order of R4, R3, R2, R1.
R4 transmission time reduction:
Transmit R1 to R4:
T'4 = C(100) + Pro(100*1000) + C(0.1*1000)
    = 110 + 10 + 110
    = 230
Since 230 < 1010 = T4, the transmission of R1 to R4 is
integrated into the strategy.
Transmit R2 to R3:
T'3 = C(300) + Pro(300*800) + C(0.3*800)
    = 310 + 24 + 250
    = 584
Since 584 < 810 = T3, the transmission of R2 to R3 is
integrated into the strategy. After the first phase, the
cost graph is as shown in Figure 11.

R1 --110--> R4 ***10*** --110-->
R2 --310--> R3 ***24*** --250-->

Figure 11: Cost Graph After First Phase

R1 and R2 are not needed in the second phase because these
relations have been combined. The sizes of R4 and R3 after
the first phase are 100 and 240, respectively. Since the size
of R4 is smaller than that of R3, in the second phase the
algorithm will try to combine R4 with R3.
Transmit R4 to R3:
T'3 = C(100) + Pro(100*240) + C(0.1*240)
    = 110 + 2.4 + 34
    = 146.4
Since 146.4 < 250, the transmission of R4 to R3 is integrated
into the strategy. The final cost graph is shown in
Figure 12.
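
The three moves of this example can be replayed with a short Python sketch of step 4 of the algorithm. Several points are assumptions of the sketch rather than statements of the text: the transmission cost is taken as C(x) = 10 + x because C(100) = 110 above, the processing cost Pro(x) = x / 10000 is inferred from the worked numbers (Pro(100*1000) = 10), the acceptance test treats equality as acceptable, and all function names are hypothetical.

```python
def C(x):
    """Transmission cost, assumed to be C(x) = 10 + x."""
    return 10 + x


def pro(x):
    """Processing cost; the scale 1/10000 is inferred from the worked numbers."""
    return x / 10000


def candidate_time(t_i, s_i, p_i, s_j):
    """T'j = Ti + Pro(Si * Sj) + C(Pi * Sj) for moving Ri to Rj."""
    return t_i + pro(s_i * s_j) + C(p_i * s_j)


def accept(t_new, t_j, max_trans):
    """Cases (i) and (ii)(a): keep the pair if it does not exceed Tj or Max_Trans."""
    return t_new <= t_j or t_new <= max_trans


# First phase: R1 -> R4 and R2 -> R3.
t4 = candidate_time(C(100), 100, 0.1, 1000)
t3 = candidate_time(C(300), 300, 0.3, 800)

# Second phase: the reduced R4 is moved to the reduced R3.
s4, s3 = 0.1 * 1000, 0.3 * 800
t3_second = candidate_time(C(s4), s4, 0.1, s3)

if __name__ == "__main__":
    print(t4, accept(t4, C(1000), 0))
    print(t3, accept(t3, C(800), t4))
    print(t3_second, accept(t3_second, C(s3), max(t4, t3)))
```

The sketch reuses the selectivity 0.1 for the combined R4 in the second phase, as the worked computation does.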

response time = C(300) + Pro(300*800) + C(0.1*1000)
                + Pro(100*0.3*800) + C(0.1*0.3*800)
              = 310 + 24 + 110 + 2.4 + 34 = 480.4

Figure 12: Final Cost Graph for Algorithm Mini-Max

Correctness of the Algorithm
In this section, we prove that the algorithm Mini-Max,
for each phase, finds a distribution strategy which obtains
the maximum number of pairs of relations without increasing
the minimum transmission time of the query. Recall that the
query, being simple, performs only one join on any relation.
Hence, after a relation is combined with another, the
relation is not referenced again. The basis of the proof is
given in the following lemma. We denote by Tij the cost of
combining (as a pair) Ri into relation Rj (i.e., Tij = C(Si) +
Pro(Si * Sj) + C(Pi * Sj)). We denote by Tk the schedule
for relation Rk if it is not combined with any other
relation.
LEMMA 1. For a simple query, if Si < Sm < Sk < Sj, then
max(Tij, Tmk) < max(Tik, Tmj) < max(Tim, Tkj).
Because Si < Sm, therefore Tij < Tmj (which follows from
the definition of transmission time). Similarly, Sk < Sj
implies that Tmk < Tmj. Hence, max(Tij, Tmk) < Tmj <
max(Tik, Tmj). Similarly, Tik < Tkj and Tmj < Tkj, which
proves that max(Tik, Tmj) < Tkj < max(Tim, Tkj).
The lemma given above states that the minimum transmission
time schedule for any two pairs of combined relations
is either the schedule of pairing the smallest
relation with the largest relation or the schedule of
pairing the second smallest relation with the second
largest relation. Further, it says that if Tij is
the minimum transmission time schedule, then all relations
with size greater than Sj should have been paired with
relations with size smaller than Si. All relations of size
between Si and Sj can not pair with the relations which are of
size smaller than Si.
THEOREM 1. For a simple query, the algorithm Mini-Max
derives the minimum transmission time schedules and the maximum
number of pairs of combined schedules for each phase.
Proof: We index the relations so that S1 < S2 < ... <
Sn. First, we prove that the algorithm Mini-Max will derive
the minimum transmission time schedule. Two cases are
studied here: the minimum transmission time schedule
produced by algorithm Mini-Max is given by either Tij or Tj.
Let Q be the set of schedules produced by algorithm Mini-Max.
Suppose that Tij is the minimum transmission time
schedule in Q, and suppose it were not a minimum transmission
time schedule. Let Q' be the set of schedules which contains
the minimum transmission time schedule. It is easy to
see that all the schedules in Q' are less than Tij. Since
Tij < Tj, there must exist a schedule Tmj in Q' where Rm ≠
Ri. The size of Rm can not exceed or equal the size of
Ri. If it is not the case, then by definition, Tij < Tmj.
It contradicts the assumption that Q' contains the minimum
transmission time schedule. So the size of Rm should be
smaller than Si. Since the algorithm Mini-Max always chooses
the smallest relation to pair it with the largest relation
first, the number of relations Rk where k > j is equal
to the number of relations Rn where n < i. If Tmj exists,
then there exists at least one relation Ry in Q', y > j,
which can not be paired with Rn, for n < i. Since Tij < Tj
< Ty, Ry should be paired with one of the relations Rx, for
x > n. From Lemma 1, max(Tij, Tmy) < max(Txj, Tmy) <
max(Tmj, Txy). Again, it leads to a contradiction of the
assumption that all the schedules in Q' are less than Tij.
Therefore Tij must be the minimum transmission time
schedule. Similarly, we prove for the other case that Tj
gives minimum response time. The proof is identical and
therefore is omitted.
We now prove that the algorithm Mini-Max will derive
the maximum number of pairs of combined schedules. Again,
let Q be the set of schedules generated by algorithm Mini-Max.
Suppose that Q were not the set of the maximum pairs
of combined schedules. Let Q' be another set of schedules.
Further, let the number of paired relations in Q' be greater
than the number of paired relations in Q. Let Rk be the
smallest relation which does not have a combined
schedule produced by the algorithm Mini-Max, but which Q' pairs
with some other relation. If it is not the case, then the
number of paired relations in Q' can not be more than in Q.
Since the algorithm Mini-Max selects the smallest relation
first, the schedules T1,m, T2,m-1, ..., Tk-1,m-k+2 exist,
and Tkj is greater than the minimum transmission time, where
k < j < m-k+2. In order to find a relation to pair with Rk,
the only choice is to pair Rk with Ri, where 1 <= i < k.
Otherwise Q' does not contain the minimum transmission time
schedule. Suppose Ri is the largest possible
relation which can be paired with Rk. Then, there exists a
relation Rw in Q', where m-k+2 <= w <= m, which originally
pairs with Rx in Q and is left alone, where 1 <= x < k. Since Sk
< Sw, Rw can not be paired with Rj, where k < j < m-k+2;
hence the number of combined schedules is not more in Q'.
It contradicts the assumption that the number of combined
schedules is more in Q'. Therefore algorithm Mini-Max must
produce the maximum number of paired relations.
Complexity of the Algorithm
We assume that the algorithm is computed at a location
which is a 'master' of all locations in the system. The
master has current information on the data base system
(e.g., relations, their sizes, locations, network
connections and load on each line, etc.). The master, after
computing an efficient schedule, dispatches instructions
(e.g., subquery operations, transmission, etc.) to
appropriate locations.
Assume that the data structure Buffer in the algorithm
Mini-Max is an array of records. Each record stores various
information about relation Ri, namely, the size Si, its
current schedule and transmission time Ti.
In step 4.1 or step 4.3 of the algorithm Mini-Max, the
relation Ri is paired with Rj. Now Ri, after having been
chosen for transmission to Rj, no longer exists for
consideration in the rest of the query processing strategy.
However, for the sake of efficiency, the record of updated
Rj is stored in the record of Ri, and the record of Rj is
marked 'dead'. Thus, at the beginning of a phase, the first few
records in Buffer are those which were obtained by pairing
in the previous phase. We call these records of class
M. The relation Ri in step 3 belongs to M. Some other records
in Buffer are those which were marked dead in previous
phases. Step 4 determines the best pairs for relations of class
M with relations which are not dead. A relation of class M
may be paired with another in M. Now, we estimate the complexity
of the various steps of the algorithm.
The array Buffer is sorted according to the size of relations.
Determination of the largest active relation Sj
for Si which satisfies step 4.1 or step 4.3 is very much
like binary search in a sorted array which has multiple entries.
Hence, for each phase, step 4 requires O(|M| Lg(n))
time. For step 6, it is sufficient to sort and arrange only
the relations of class M. Hence, each phase requires
O(|M| Lg(|M|)) time for step 6.
Let Mi denote the class M for the ith phase. We now
estimate mi = |Mi|. In the ith phase, mi relations are
combined with at most another mi relations. Hence, the number of
relations that remain in the (i+1)th phase is n - sum_{k=1}^{i} mk.
Further, a relation of class M(i+1) is combined with an active
relation which belongs either to Mi - M(i+1) or to the
rest of the active relations, whose number is n - m1 - ... -
m(i-1) - 2mi. Hence, m(i+1) <= (1/2) * (n - sum_{k=1}^{i} mk) <=
(1/2) * (n - i*mi). Further, m(i+1) <= mi, which implies m(i+1) <=
min(mi, (1/2) * (n - i*mi)). It can be shown, by induction,
that mi <= n/(i+1); otherwise, the above mentioned inequality
is not true. Let Lg denote the binary logarithm.
The complete complexity of the algorithm is
  O( sum_{i=1}^{n} mi Lg(n) + sum_{i=1}^{n} mi Lg(mi) )
= O( Lg(n) sum_{i=1}^{n} (n/i) + sum_{i=1}^{n} (n/i) Lg(n/i) )
= O( n Lg(n) sum_{i=1}^{n} (1/i) )
= O( n Lg^2 n ),
since the harmonic sum sum_{i=1}^{n} (1/i) is O(Lg n).
THEOREM 2. The algorithm Mini-Max derives minimum
transmission schedules in time O(n Lg^2 n).
It is appropriate here to recall that the time complexity
of the algorithm PARALLEL is O(n^2).
CHAPTER V
CONCLUSION
In previous chapters of this thesis, we have described
the general basis for processing distributed gueries and we
have also presented an algorithm for processing queries in
distributed data base system. in this chapter, we discuss
the relationships between our algorithm and other works in
this area. We also suggest some topics for future research.
Merits of Algorithm Mini-Max
The first comprehensive algorithm for distributed query
processing was developed by Wong [WON77], and was later
implemented in SDD-1. Wong's algorithm translates a query
into a sequence of two kinds of operations: (1) move a
subrelation from one site to another, and (2) process a
query operation on locally available data. Each "move"
command is improved recursively by replacing it with a lower
cost sequence of "move" and "process" commands. The algorithm
terminates when no "move" command can be replaced by a lower
cost sequence.
The final SDD-1 strategy (algorithm OPT) [BER81b] is
similar to the previous one. Its authors introduced the
concept of semijoin to abstract the main optimization problem.
Basically, algorithm OPT is an example of serial greedy
optimization technique; it always seeks to maximize
immediate gain, ignoring the fact that the execution of a
semijoin often decreases the cost and increases the benefit
of other semijoins. As a result, the optimization procedures
generated by OPT are sequences of efficient sequential
programs (i.e., only one move and a single join operation will
be executed at any time in the network).
Our algorithm not only considers all semijoins that
could maximize immediate gains but also considers maximizing
the number of operations which could be executed in parallel
at several sites. This concept improves the effectiveness of
the algorithm (1) by reducing the response time of the
query, and (2) by reducing the processing overhead at the
destination.
Most algorithms for distributed query processing to
date are "open loop" and cannot respond to errors in cost
estimation. To close the loop, we must be able to suspend
execution of the subqueries and construct a new strategy
that utilizes (1) partial results computed so far and (2)
the cost information obtained by the partial computation.
This suggests that an adaptive query processing strategy
should be enforced.
Our algorithm is a step in this direction. We
divide the query optimization procedure into stages. In
each stage, the query optimizer chooses some subplans,
each involving the join of two nonlocal relations Ri and Rj
and the transmission of Ri from site i to site j, according
to the computation result of the present stage. Thus, the
errors in the estimation will not accumulate, and we can
keep the estimation errors relatively low.
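The claim that per-stage planning keeps estimation errors from accumulating can be illustrated with a toy Python calculation. This is not the Mini-Max algorithm; the function names, the single fixed selectivity per stage, and the numbers are all hypothetical assumptions chosen only to show the effect. In the open-loop case every estimate is built on the previous estimate, while in the closed-loop case each stage starts from the size actually observed after the preceding stage.

```python
def open_loop_errors(size, est_sel, true_sel, stages):
    """Relative error per stage when all stages are planned up front."""
    errors = []
    est, true = float(size), float(size)
    for _ in range(stages):
        est *= est_sel       # estimate built on the previous estimate
        true *= true_sel     # size actually produced by the stage
        errors.append(abs(est - true) / true)
    return errors

def closed_loop_errors(size, est_sel, true_sel, stages):
    """Relative error per stage when each stage re-plans from observed sizes."""
    errors = []
    true = float(size)
    for _ in range(stages):
        est = true * est_sel  # estimate built on the observed size
        true *= true_sel
        errors.append(abs(est - true) / true)
    return errors

# Hypothetical values: the optimizer believes each reduction keeps 50%
# of the tuples, while it actually keeps 40%.
print(open_loop_errors(1000, 0.5, 0.4, 3))
print(closed_loop_errors(1000, 0.5, 0.4, 3))
```

Under these assumptions the open-loop error grows with every stage, whereas the closed-loop error stays at the single-stage level.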
In our algorithm, a relation is transmitted to only one
destination, but several relations may be transmitted to
different locations concurrently. This involves much less
overhead than transmitting several relations simultaneously,
each to several other locations.
Yao's model assumes that a relation sent to several
locations reaches all of them at the same time. Most networks
do not connect every pair of locations by a direct link. As a
result, whichever transmission method is chosen, virtual
circuit or datagram, the overhead on different paths would
vary, and thus this assumption remains of theoretical interest only.
Another major advantage of our algorithm is that we do
not ignore the total cost of the query processing while
deriving a strategy that emphasizes parallelism and thus
reduces the response time cost of the query. As there are so many
parameters involved in the calculation of total time cost, it is
difficult to establish any kind of relationship
between the total time cost and the response time cost. The
RESPONSE algorithm of [APE83] minimizes the response time
only and thus increases total time significantly. We have
carried out a simulation [LAK84] to compare these two
costs for the algorithm Mini-Max and the algorithm RESPONSE.
Simulation for Study of Total and Response Time
Table 7 contains costs for the two algorithms
(algorithm RESPONSE and algorithm Mini-Max) for various sets
of data bases. We assume that the data base contains ten
different relations of arbitrary sizes, but no relation may
contain more than one thousand distinct attribute values.
These files are generated using the VAX-11/780 uniform
distribution random number generator.
The processing time of query operations is ignored in
our simulation so that the optimality of response time
derived by the algorithm RESPONSE can be compared. As
indicated in the table, the ratio RT is always greater
than one. The ratios for the total time comparison are
varying. It should be noted that the total time cost is
higher than the response time cost. So, the ratio TT
differs widely from RT. If we look at the distribution of
file sizes in Table 7, there is a tendency for the
algorithm RESPONSE to perform much more poorly than the
algorithm Mini-Max if the sizes of the different relations
vary greatly.
                                  TABLE 7
                Comparison of Total Time and Response Time

Size 1      233   147   117   220    92   173   138   234    99   150   132
Size 2      367   200   143   310   150   245   145   295   193   199   140
Size 3      385   309   175   358   163   322   230   293   278   205   168
Size 4      384   463   191   394   205   326   260   309   326   242   231
Size 5      400   466   253   448   318   335   304   490   394   311   251
Size 6      477   525   275   449   331   376   393   513   430   363   393
Size 7      500   529   315   486   332   403   481   595   474   464   440
Size 8      509   550   405   563   560   469   489   601   489   611   444
Size 9      564   591   427   601   569   478   489   621   507   625   477
Size 10     578   621   596   629   603   560   490   633   634   652   608

Ratio TT  12.31  2.23  1.99  6.93  2.27  7.91  5.44  3.01  2.50  4.11  3.55
Ratio RT   1.59  2.72  1.95  1.88  2.81  2.00  2.43  1.93  3.45  2.12  2.37

TT: total time of RESPONSE divided by total time of Mini-Max
RT: response time of Mini-Max divided by response time of RESPONSE
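The two ratio rows of Table 7 can be summarized with a few lines of Python. The values below are transcribed from the table as printed (reading the entry "4. 11" as 4.11, which is an assumption about the scan); the variable and function names are illustrative only.

```python
from statistics import mean

# Ratio rows as printed in Table 7, one value per simulated data base.
tt = [12.31, 2.23, 1.99, 6.93, 2.27, 7.91, 5.44, 3.01, 2.50, 4.11, 3.55]
rt = [1.59, 2.72, 1.95, 1.88, 2.81, 2.00, 2.43, 1.93, 3.45, 2.12, 2.37]

def summarize(ratios):
    """Return (smallest, largest, average) of a list of ratios."""
    return min(ratios), max(ratios), mean(ratios)

print("TT (RESPONSE total / Mini-Max total):", summarize(tt))
print("RT (Mini-Max response / RESPONSE response):", summarize(rt))
print("every RT above one:", all(r > 1 for r in rt))
```

Every RT entry is above one, as expected since RESPONSE minimizes response time, and the TT entries spread over a much wider range than the RT entries.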
Future Research and Suggestions
Viewed realistically, the query environment assumptions
in this thesis appear quite restrictive. The simplification
is required in order that the algorithm Mini-Max may be
stated and understood in a conceptually simple manner. The
algorithm Mini-Max can be easily applied in a general query
environment with minor modifications. For instance, we can
apply algorithm Mini-Max on each common join domain as
described in [APE83] by assuming attribute independence, or
replace the greedy strategy in [BER81b] by algorithm
Mini-Max.
We have suggested that our algorithm can be applied to
an adaptive control process. In order to measure the quality
of the distribution strategies derived by the algorithm
Mini-Max more precisely, it should be tested with dynamic
cost estimation. A model of the dynamic environment should
be developed that extends and adapts the algorithm Mini-Max.
The modeling approach should also include developing methods
to quantify the overhead in an adaptive environment as
opposed to a static environment.
BIBLIOGRAPHY
APE83. -- Peter M. G. Apers, Alan R. Hevner and S. Bing Yao, Optimization Algorithms for Distributed Queries, IEEE Trans. on Software Engineering, Vol. SE-9, No. 1, January 1983, pp. 57-68.
BAB79. -- E. Babb, Implementing a Relational Database by Means of Specialized Hardware, ACM Trans. on Database Systems, Vol. 4, No. 1, March 1979, pp. 1-29.
BER81a. -- Philip A. Bernstein and Dah-Ming W. Chiu, Using Semi-Joins to Solve Relational Queries, Journal of ACM, Vol. 28, No. 1, January 1981, pp. 25-40.
BER81b. -- P. Bernstein, N. Goodman, E. Wong, C. Reeve and J. Rothnie, Query Processing in a System for Distributed Databases (SDD-1), ACM Trans. on Database Systems, Vol. 6, No. 4, December 1981, pp. 602-625.
CHU75. -- Wesley W. Chu and E. E. Nahouraii, File Directory Design Considerations for Distributed Data Bases, Proceedings, International Conference on Very Large Data Base, October 1975.
CHU80. -- Wesley W. Chu and Paul Hurley, Optimal Query Processing for Distributed Database Systems, IEEE Trans. on Computers, Vol. C-31, No. 9, September 1982, pp. 835-850.
DAT81. -- C. J. Date, An Introduction to Database Systems, Addison-Wesley Publishing Company, Inc., 1981, pp. 181-234.
ENS78. -- Philip H. Enslow, What is a "Distributed" Data Processing System?, Computer, January 1978, pp. 13-21.
EPS78. -- R. Epstein, M. Stonebraker and E. Wong, Distributed Query Processing in a Relational Data Base System, in Proc. SIGMOD, May 1978, pp. 169-180.
GOO82. -- Nathan Goodman and Oded Shmueli, Tree Queries: A Simple Class of Relational Queries, ACM Trans. on Database Systems, Vol. 7, No. 4, December 1982, pp. 653-667.
HEV78. -- Alan R. Hevner and S. Bing Yao, Query Processing on a Distributed Database, Proceedings Berkeley Workshop, 1978.
HEV79a. -- Alan R. Hevner, The Optimization of Query Processing in Distributed Database Systems, Ph.D. Thesis, Purdue University, Lafayette, Indiana, 1979.
HEV79b. -- Alan R. Hevner and S. Bing Yao, Query Processing in Distributed Database Systems, IEEE Trans. on Software Engineering, Vol. SE-5, No. 3, May 1979, pp. 177-187.
KER82. -- Larry Kerschberg, Peter D. Ting and S. Bing Yao, Query Optimization in Star Computer Networks, ACM Trans. on Database Systems, Vol. 7, No. 4, December 1982, pp. 678-711.
LAK83. -- G. C. Lakhani and J. S. Wang, A Comment on Optimization Algorithms for Distributed Queries, To appear in IEEE Trans. on Software Engineering, July 1984.
LAK84. -- G. C. Lakhani and J. S. Wang, Parallelizing Query Operations in Distributed Database, in preparation for submission to ACM SIGMOD, 1984.
MAR81. -- James Martin, Computer Networks and Distributed Processing: Software, Technique, Architecture, Prentice-Hall, Inc., Englewood Cliff, N.J., 1981, pp. 1-64.
NCC78. -- CODASYL Systems Committee, Distributed Data Base Technology, Proceedings, National Computer Conference, 1978.
OZK77. -- E.A. Ozkarahan, S.A. Schuster and K.C. Sevcik, Performance Evaluation of a Relational Associative Processor, ACM Trans. on Database Systems, Vol. 2, No. 2, June 1977, pp. 175-195.
RAM79. -- C. V. Ramamoorthy and Benjamin R. Wah, Data Management in Distributed Database, AFACP, 1979, pp. 667-680.
ROT80. -- J. Rothnie, P. Bernstein, S. Fox, N. Goodman, M. Hammer, T. Landers, C. Reeve, D. Shipman and E. Wong, Introduction to a System for Distributed Databases (SDD-1), ACM Trans. on Database Systems, Vol. 5, No. 1, March 1980, pp. 1-17.
SAC82. -- Giovanni Maria Sacco and S. Bing Yao, Query Optimization in Distributed Data Base Systems, Advances in Computers, Vol. 21, 1982, pp. 225-273.
SCH78. -- H. D. Schwetman, Hybrid Simulation Models of Computer Systems, Commun. ACM, Vol. 21, No. 9, September 1978, pp. 718-723.
SMA79. -- Dana L. Small and Wesley W. Chu, A Distributed Data Base Architecture for Data Processing in a Dynamic Environment, in Proc. COMPCON, March 1979, pp. 123-127.
SMI75. -- J. M. Smith and P.Y. Chang, Optimizing the Performance of a Relational Algebra Database Interface, Commun. ACM, Vol. 18, 1975, pp. 568-579.
SMI81. -- J. Smith, P. Bernstein, U. Dayal, N. Goodman, T. Landers, K. Lin and E. Wong, Multibase - Integrating Heterogeneous Distributed Database Systems, AFACP, 1981, pp. 487-499.
SWA81. -- James R. Swager, Architecture of a Distributed Database Information Resource, National Computer Conference, 1981, pp. 501-505.
WON77. -- Eugene Wong and Computer Corporation of America, Retrieving Dispersed Data from SDD-1: A System for Distributed Databases, Berkeley Workshop Distributed Data Management Comput. Networks, 1977, pp. 217-235.
YAO79. -- S. Bing Yao, Optimization of Query Evaluation Algorithms, ACM Trans. on Database Systems, Vol. 4, No. 2, 1979, pp. 133-155.
YLC82. -- C.T. Yu, K. Lam, C.C. Chang and S.K. Chang, A Promising Approach to Distributed Query Processing, Berkeley Conference on Distributed Data Base, February 1982, pp. 363-390.
YU82. -- C.T. Yu and Y.C. Lin, Some Estimation Problems in Distributed Query Processing, Proceedings, The 3rd International Conference on Distributed Computing Systems, October 1982, pp. 13-19.
YU83. -- C.T. Yu and C.C. Chang, On the Design of A Query Processing Strategy in A Distributed Database Environment, ACM SIGMOD 83, Vol. 13, No. 4, May 1983, pp. 30-39.