April 21, 2005 Lal: M.S. thesis defense 1
Constraint Systems Laboratory
Neighborhood Interchangeability (NI) for Non-Binary CSPs
& Application to Databases
Anagh Lal
Constraint Systems Laboratory, Computer Science & Engineering
University of Nebraska-Lincoln
Research supported by NSF CAREER award #0133568 and by Maude Hammond Fling Faculty Research Fellowship.
Main contributions
CSPs
1. Interchangeability: An algorithm for neighborhood interchangeability (NI) in non-binary CSPs
2. Dynamic bundling: Integrating NI + backtrack search for solving non-binary CSPs
3. Exploratory: Towards detecting substitutability
Databases
1. A new model of the join query as a CSP
2. A new sorting-based bundling algorithm
3. A new sort-merge join algorithm that produces bundled tuples
4. Exploratory: Application to materialized views
Outline
• Background
• Neighborhood Interchangeability (NI) for non-binary CSPs
• Empirical evaluations
• Database algorithms based on dynamic bundling
• Conclusions & future work
Constraint Satisfaction Problem
• Given P = (V, D, C)
  – V: set of variables
  – D: set of their domains
  – C: set of constraints restricting the acceptable combinations of values for variables
  – Solution: a consistent assignment of values to variables
• Query: find one solution, all solutions, etc.
• Examples: SAT, scheduling, product configuration
• NP-complete in general
[Figure: example CSP with variables V1 {d}, V2 {c, d, e, f}, V3 {a, b, d}, V4 {a, b, c}]
Systematic search
• Basic mechanism
  – DFS & backtracking (BT)
  – Variable being instantiated: current variable
  – Uninstantiated variables: future variables
  – Instantiated variables: past variables
• Constraint propagation
  – Remove values inconsistent with constraints
  – Forward checking filters the domains of the future variables given the instantiation of the current variable
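The search loop described above (DFS over the variables, forward checking after each instantiation) can be sketched in Python. This is a minimal illustration, not the thesis implementation: the constraint representation (a dictionary mapping a pair of variables to its allowed value pairs) and the three-variable CSP over A, B, C are hypothetical, chosen only to keep the example small.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

# Hypothetical representation: (x, y) -> set of allowed (value_x, value_y) pairs
Constraints = Dict[Tuple[str, str], Set[Tuple[int, int]]]


def allowed(cons: Constraints, x: str, vx: int, y: str, vy: int) -> bool:
    """True if x = vx and y = vy violate no constraint between x and y."""
    if (x, y) in cons:
        return (vx, vy) in cons[(x, y)]
    if (y, x) in cons:
        return (vy, vx) in cons[(y, x)]
    return True  # x and y are not constrained together


def forward_check(cons, x, v, domains, future) -> Optional[Dict[str, Set[int]]]:
    """Filter the domains of the future variables given x = v; None on a wipe-out."""
    filtered = dict(domains)
    filtered[x] = {v}
    for y in future:
        filtered[y] = {w for w in domains[y] if allowed(cons, x, v, y, w)}
        if not filtered[y]:
            return None
    return filtered


def bt_fc(cons, order: List[str], domains, i: int = 0,
          assignment=None) -> Iterator[Dict[str, int]]:
    """Depth-first backtracking with forward checking; yields every solution."""
    if assignment is None:
        assignment = {}  # the past variables and their values
    if i == len(order):
        yield dict(assignment)
        return
    current, future = order[i], order[i + 1:]
    for v in sorted(domains[current]):
        filtered = forward_check(cons, current, v, domains, future)
        if filtered is None:
            continue  # some future domain became empty: try the next value
        assignment[current] = v
        yield from bt_fc(cons, order, filtered, i + 1, assignment)
        del assignment[current]  # backtrack


cons = {("A", "B"): {(1, 1), (2, 1), (3, 2)}, ("B", "C"): {(1, 1), (1, 2), (2, 2)}}
domains = {"A": {1, 2, 3}, "B": {1, 2, 3}, "C": {1, 2}}
solutions = list(bt_fc(cons, ["A", "B", "C"], domains))
print(len(solutions))  # 5
```

Because forward checking has already removed every future value that conflicts with a past variable, any value still in the current variable's domain is consistent with the partial assignment, so no separate backward check is needed.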
Value interchangeability [Freuder, ‘91]
Equivalent values in the domain of a variable
[Figure: the example CSP, with V1 {d}, V2 {c, d, e, f}, V3 {a, b, d}, V4 {a, b, c}]
• Full Interchangeability (FI):
  – d, e, f interchangeable for V2 in any solution
• Neighborhood Interchangeability (NI):
  – Efficiently approximates FI
  – Finds e, f but misses d
  – Discrimination tree DT(Vx)
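A discrimination tree for a binary CSP can be sketched as follows. Each value of the variable walks down a path labeled by the (neighbor, compatible value) pairs, visited in a fixed order; values that end at the same leaf are neighborhood interchangeable. The constraint representation and the small CSP are hypothetical (the slide's example does not list its constraints), and `discrimination_tree` is an illustrative name.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: (x, y) -> set of allowed (value_x, value_y) pairs
Constraints = Dict[Tuple[str, str], Set[Tuple[int, int]]]


def allowed(cons: Constraints, x: str, vx: int, y: str, vy: int) -> bool:
    if (x, y) in cons:
        return (vx, vy) in cons[(x, y)]
    if (y, x) in cons:
        return (vy, vx) in cons[(y, x)]
    return True


def neighbors(cons: Constraints, x: str) -> List[str]:
    """Variables sharing a constraint with x, in a fixed (sorted) order."""
    return sorted({b if a == x else a for (a, b) in cons if x in (a, b)})


def discrimination_tree(cons: Constraints, x: str, domains) -> List[List[int]]:
    """Build DT(x) as nested dicts and return the leaf annotations (the NI sets)."""
    root: dict = {}
    leaves: List[List[int]] = []
    for v in sorted(domains[x]):
        node = root
        for y in neighbors(cons, x):
            for w in sorted(domains[y]):
                if allowed(cons, x, v, y, w):
                    node = node.setdefault((y, w), {})  # follow or create the branch
        annotation = node.setdefault("values", [])
        if not annotation:
            leaves.append(annotation)  # first value to reach this leaf
        annotation.append(v)
    return leaves


cons = {("A", "B"): {(1, 1), (2, 1), (3, 2)}, ("B", "C"): {(1, 1), (1, 2), (2, 2)}}
domains = {"A": {1, 2, 3}, "B": {1, 2, 3}, "C": {1, 2}}
print(discrimination_tree(cons, "A", domains))  # [[1, 2], [3]]
print(discrimination_tree(cons, "C", domains))  # [[1], [2]]
```

Here 1 and 2 are NI for A because both are compatible with exactly B = 1, whereas C's values are split: B = 2 is compatible with C = 2 but not with C = 1.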
Bundling: using NI in search
• Static bundling [Haselböck, ‘93]
• Dynamic bundling [Our group, ‘01]
  – Dynamically identifies NI
  – Finds fatter solutions than BT & static bundling
  – Never less efficient than BT & static bundling
[Figure: search trees below V1 = d for the example CSP (V1 {d}, V2 {c, d, e, f}, V3 {a, b, d}, V4 {a, b, c}). BT branches on the values of V2 one at a time (c, e, f, d); static bundling branches on c, {e, f}, d; dynamic bundling branches on c, {d, e, f}]
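The gain of dynamic over static bundling can be illustrated with a small sketch for a binary CSP. At each level, the current variable's remaining values are partitioned into NI sets with respect to the *current* (already filtered) domains of the future variables; one bundle is assigned and forward checking filters the future domains with a representative value. The CSP, the function names, and the choice to stop at the first solution bundle are assumptions made for illustration, not the thesis code.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical representation: (x, y) -> set of allowed (value_x, value_y) pairs
Constraints = Dict[Tuple[str, str], Set[Tuple[int, int]]]


def allowed(cons: Constraints, x: str, vx: int, y: str, vy: int) -> bool:
    if (x, y) in cons:
        return (vx, vy) in cons[(x, y)]
    if (y, x) in cons:
        return (vy, vx) in cons[(y, x)]
    return True


def ni_bundles(cons, x, domains, future) -> List[List[int]]:
    """Partition the current domain of x into NI sets w.r.t. the future domains."""
    groups: Dict[tuple, List[int]] = {}
    for v in sorted(domains[x]):
        support = tuple(frozenset(w for w in domains[y] if allowed(cons, x, v, y, w))
                        for y in future)
        groups.setdefault(support, []).append(v)
    return list(groups.values())


def dynamic_bundling(cons, order, domains, i=0) -> Optional[Dict[str, Set[int]]]:
    """Return the first solution bundle (variable -> set of values), or None."""
    if i == len(order):
        return {}
    x, future = order[i], order[i + 1:]
    for bundle in ni_bundles(cons, x, domains, future):
        rep = bundle[0]  # NI values have identical support: one representative suffices
        filtered = dict(domains)
        filtered[x] = set(bundle)
        wiped_out = False
        for y in future:
            filtered[y] = {w for w in domains[y] if allowed(cons, x, rep, y, w)}
            if not filtered[y]:
                wiped_out = True  # the whole bundle is a no-good bundle
                break
        if wiped_out:
            continue
        rest = dynamic_bundling(cons, order, filtered, i + 1)
        if rest is not None:
            rest[x] = set(bundle)
            return rest
    return None


cons = {("A", "B"): {(1, 1), (2, 1), (3, 2)}, ("B", "C"): {(1, 1), (1, 2), (2, 2)}}
domains = {"A": {1, 2, 3}, "B": {1, 2, 3}, "C": {1, 2}}
order = ["A", "B", "C"]
result = dynamic_bundling(cons, order, domains)
print({x: sorted(result[x]) for x in order})  # {'A': [1, 2], 'B': [1], 'C': [1, 2]}
```

With NI computed once on the original domains (static bundling), C's values would be split, because B = 2 supports only C = 2. After {1, 2} is assigned to A, forward checking leaves B with {1}, so 1 and 2 become NI for C and the first solution bundle has 2 × 1 × 2 = 4 solutions.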
Robust solutions
Single solution
• V1 = d
• V2 = e
• V3 = a
• V4 = c
Robust solution
• V1 {d}
• V2 {d, e, f}
• V3 {a}
• V4 {b, c}
[Figure: the example CSP, with V1 {d}, V2 {c, d, e, f}, V3 {a, b, d}, V4 {a, b, c}]
• Solution bundle: Cartesian product of the bundles of the variables
• Solution-bundle size = 1 × 3 × 1 × 2 = 6
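The size computation above can be checked by enumerating the solution bundle with `itertools.product`; the dictionary simply transcribes the robust solution from this slide.

```python
from itertools import product

# Robust solution from the slide: the bundle of values for each variable
robust = {"V1": ["d"], "V2": ["d", "e", "f"], "V3": ["a"], "V4": ["b", "c"]}

# The solution bundle is the Cartesian product of the variables' bundles
solutions = list(product(*robust.values()))
print(len(solutions))  # 6 = 1 * 3 * 1 * 2
```

The single solution (V1 = d, V2 = e, V3 = a, V4 = c) is one of the six tuples in the bundle.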
Phase transition [Cheeseman et al. ‘91]
• Significant increase of cost around a critical value
• In CSPs, the order parameter is the constraint tightness & ratio
• Algorithms are compared around the phase transition
[Figure: cost of solving versus the order parameter, peaking at the critical value; mostly solvable problems below the critical value, mostly unsolvable problems above it]
Non-binary CSPs
[Figure: example non-binary CSP with variables V ∈ {1, 2, 3, 4, 5, 6} and V1, V2, V3, V4 ∈ {1, 2, 3}, and constraints C1 over (V, V1, V2), C2 over (V, V3), C3 over (V2, V3, V4), and C4 over (V1, V4), each given as a table of allowed tuples]
• Scope(Cx): the set of variables involved in Cx
• Arity(Cx): size of scope
Computing NI for non-binary CSPs is not a trivial extension from binary CSPs
CSP parameters
• n: number of variables
• a: domain size
• t: constraint tightness, the ratio of the number of disallowed tuples to the number of all possible tuples
• deg: degree of a variable
• c_k: number of constraints of arity k
• $p_k = c_k / \binom{n}{k}$: constraint ratio
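The two ratios above can be computed directly. The sketch evaluates the tightness of C1 from the non-binary example (its ten allowed tuples appear on the slide that builds nb-DT(V, C1); the domain sizes 6, 3, 3 come from the same example); the constraint-ratio inputs are hypothetical numbers and the function names are illustrative.

```python
from math import comb, prod


def tightness(allowed_tuples, domain_sizes):
    """t = (# disallowed tuples) / (# all possible tuples) for one constraint."""
    total = prod(domain_sizes)
    return (total - len(allowed_tuples)) / total


def constraint_ratio(c_k, n, k):
    """p_k = c_k / C(n, k): fraction of all possible scopes of arity k that are constrained."""
    return c_k / comb(n, k)


# C1 over (V, V1, V2) with |D_V| = 6 and |D_V1| = |D_V2| = 3
C1 = {(1, 1, 3), (1, 3, 3), (2, 1, 3), (2, 3, 3), (3, 1, 1),
      (3, 2, 2), (4, 1, 1), (4, 2, 2), (5, 3, 2), (6, 3, 2)}
print(tightness(C1, [6, 3, 3]))     # 44 / 54, about 0.815
print(constraint_ratio(57, 20, 3))  # hypothetical: 57 / C(20, 3) = 57 / 1140 = 0.05
```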
Outline
• Background
• Neighborhood Interchangeability (NI) for non-binary CSPs
  – Non-binary discrimination tree (nb-DT)
• Empirical evaluations
• Database algorithms based on dynamic bundling
• Conclusions & future work
NI for non-binary CSPs
1. Building an nb-DT for each constraint
  – Determines the NI sets of a variable given a constraint
2. Intersecting the partitions from the nb-DTs
  – Yields the NI sets of V (a partition of D_V)
3. Processing the paths in the nb-DTs
  – Gives, for free, the updates necessary for forward checking
[Figure: the example CSP with nb-DT(V, C1), whose leaves partition D_V = {1, 2, 3, 4, 5, 6} into {1, 2}, {3, 4}, {5, 6}, and nb-DT(V, C2), whose leaves are {1, 2}, {3, 4}, {5}, {6}]
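Step 2 (intersecting the partitions) can be sketched as follows, using the leaves of nb-DT(V, C1) and nb-DT(V, C2) from the figure as input. Two values stay in the same NI set only if they share a leaf in every nb-DT of the constraints on V; the function name is hypothetical.

```python
from typing import List, Set


def intersect_partitions(partitions: List[List[Set[int]]]) -> List[Set[int]]:
    """Refine several partitions of the same domain: two values stay together
    only if they share a block in every partition."""
    block_of = {}  # value -> list of block indices, one per partition
    for p_index, partition in enumerate(partitions):
        for b_index, block in enumerate(partition):
            for v in block:
                block_of.setdefault(v, [None] * len(partitions))[p_index] = b_index
    groups = {}
    for v in sorted(block_of):
        groups.setdefault(tuple(block_of[v]), set()).add(v)
    return list(groups.values())


nb_dt_c1 = [{1, 2}, {5, 6}, {3, 4}]    # leaves of nb-DT(V, C1)
nb_dt_c2 = [{1, 2}, {3, 4}, {6}, {5}]  # leaves of nb-DT(V, C2)
print(intersect_partitions([nb_dt_c1, nb_dt_c2]))  # [{1, 2}, {3, 4}, {5}, {6}]
```

Since V appears only in C1 and C2, the result is the partition of D_V into NI sets: 5 and 6 are NI with respect to C1 but not C2, so they are separated.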
Building an nb-DT: nb-DT(V, C1)
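C1
V V1 V2
1 1 3
1 3 3
2 1 3
2 3 3
3 1 1
3 2 2
4 1 1
4 2 2
5 3 2
6 3 2
[Figure: nb-DT(V, C1), built by processing the values of D_V in turn. Each path lists the tuples of C1 projected on (V1, V2); the annotation at the end of a path is the set of values of V that support exactly those tuples:
  (<V1 1>, <V2 3>), (<V1 3>, <V2 3>) → {1, 2}
  (<V1 1>, <V2 1>), (<V1 2>, <V2 2>) → {3, 4}
  (<V1 3>, <V2 2>) → {5, 6}]
Complexity: $O(deg \cdot a^{k+1} \cdot (1 - t))$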
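The grouping performed by nb-DT(V, C1) can be approximated in a few lines: each value of V is mapped to its path (the sorted list of the tuples it supports, projected on the other variables of the scope), and values with the same path annotate the same leaf. This is a simplified sketch: a dictionary keyed by the whole path stands in for the tree, so shared prefixes are not stored only once as in a real nb-DT, and `nb_dt_leaves` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# C1 from the slide, scope (V, V1, V2)
C1 = [(1, 1, 3), (1, 3, 3), (2, 1, 3), (2, 3, 3), (3, 1, 1),
      (3, 2, 2), (4, 1, 1), (4, 2, 2), (5, 3, 2), (6, 3, 2)]


def nb_dt_leaves(table, position) -> Dict[Tuple, List[int]]:
    """Map each path (projected tuples) to the values of the variable at
    `position` that follow it; those values are NI w.r.t. this constraint."""
    paths: Dict[int, List[Tuple]] = {}
    for row in table:
        projected = row[:position] + row[position + 1:]  # drop the variable itself
        paths.setdefault(row[position], []).append(projected)
    leaves: Dict[Tuple, List[int]] = {}
    for value in sorted(paths):
        leaves.setdefault(tuple(sorted(paths[value])), []).append(value)
    return leaves


for path, values in nb_dt_leaves(C1, 0).items():
    print(path, "->", values)
# ((1, 3), (3, 3)) -> [1, 2]
# ((1, 1), (2, 2)) -> [3, 4]
# ((3, 2),) -> [5, 6]
```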
Bundling = Search + NI
• Benefits of bundling
  1. Bundles solutions
  2. Bundles no-goods
• Dynamic bundling (DynBndl)
  – Re-computes NI during search
  – Yields larger bundles, boosts the effects of bundling
• Skeptics’ objection to DynBndl
  – Costly & not worthwhile
• We show that the opposite holds
[Figure: search tree over V, V1, V2, V3, V4 showing a solution bundle and a no-good bundle]
Advantages of DynBndl
• We exploit nb-DTs for forward checking
• DynBndl versus FC (BT + forward checking)
  – Finding all solutions: theoretically best
  – Finding the first solution: empirical evidence
DynBndl yields multiple, robust solutions at lower cost
Outline
• Background
• Neighborhood Interchangeability (NI) for non-binary CSPs
• Empirical evaluations
• Database algorithms based on dynamic bundling
• Conclusions & future work
Empirical evaluations
• DynBndl versus FC (BT + forward checking)
• Experiments
  – Effect of varying tightness
  – In the phase-transition region
    • Effect of varying domain size
    • Effect of varying constraint ratio (CR)
• Randomly generated problems, Model B
• ANOVA to statistically compare the performance of DynBndl and FC with varying t
• t-distribution for confidence intervals
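For readers who want to reproduce this kind of setup, a random generator in the spirit of Model B might look as follows. It is an assumption-laden sketch, not the generator used in the thesis: it places exactly c_k constraints of a single arity k on distinct random scopes and removes exactly round(t · a^k) randomly chosen tuples from each; how the CR1 to CR4 settings mix arities is not specified here.

```python
import itertools
import random
from typing import Dict, Set, Tuple


def model_b(n: int, a: int, k: int, c_k: int, t: float,
            seed: int = 0) -> Dict[Tuple[int, ...], Set[Tuple[int, ...]]]:
    """Random CSP: c_k constraints of arity k on distinct scopes, each
    forbidding exactly round(t * a**k) tuples (Model B style)."""
    rng = random.Random(seed)
    scopes = rng.sample(list(itertools.combinations(range(n), k)), c_k)
    all_tuples = list(itertools.product(range(a), repeat=k))
    n_forbidden = round(t * len(all_tuples))
    constraints = {}
    for scope in scopes:
        forbidden = set(rng.sample(all_tuples, n_forbidden))
        constraints[scope] = {tup for tup in all_tuples if tup not in forbidden}
    return constraints


csp = model_b(n=20, a=10, k=3, c_k=57, t=0.35)
print(len(csp))                          # 57 constraints
print({len(v) for v in csp.values()})    # {650}: 1000 tuples minus 350 forbidden
```

Fixing the exact number of constraints and of forbidden tuples (rather than drawing each independently with some probability) is what distinguishes Model B from Model A.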
Experimental set-up
• Generated 16 data sets
  – n ∈ {20, 30}, a ∈ {10, 15}, CR ∈ {CR1, CR2, CR3, CR4}
  – 9 to 12 values of t in [25%, 75%]
  – 1,000 instances per tightness value
• Performance measurements
  – FBS: size of the first solution bundle
  – NV: number of nodes visited in the search tree
  – CC: number of constraints checked
  – CPU time
Analysis: Varying tightness
• Low tightness
  – Large FBS
    • 33 at t = 0.35
    • 2254 (Dataset #13, t = 0.35)
  – Small additional cost
• Phase transition
  – Multiple solutions present
  – Maximum no-good bundling causes maximum savings in CPU time, NV, & CC
• High tightness
  – Problems mostly unsolvable
  – Overhead of bundling minimal
[Figure: n = 20, a = 15, CR = CR3. CPU time [sec] and #NV (hundreds) of DynBndl and FC versus tightness t, from 0.325 to 0.6]
t      FBS
0.350  33.44
0.400  10.91
0.425  7.13
0.437  6.38
0.450  5.62
0.462  2.37
0.475  0.66
0.500  0.03
0.550  0.00
Analysis: Varying domain size
• Increasing a in the phase-transition region
  – FBS increases: more chances for symmetry
  – CPU time decreases: more bundling of no-goods
Increasing a (n = 30)
CR    CPU-time improvement (%)    FBS
      a=10    a=15                a=10    a=15
CR1   33.3    34.3                5.5     11.9
CR2   28.6    33.0                5.0     5.5
CR3   29.8    31.7                3.6     5.0
CR4   28.4    31.6                1.2     1.4
Because the benefits of DynBndl increase with increasing domain size, DynBndl is particularly interesting for database applications where large domains are typical
Outline
• Background
• Neighborhood Interchangeability (NI) for non-binary CSPs
• Empirical evaluations
• Database algorithms based on dynamic bundling
  – Sorting-based bundling algorithm
  – Dynamic-bundling-based join algorithm
• Conclusions & future work
Databases & CSPs
DB terminology → CSP terminology
Table, relation → Constraint (relational constraint)
Join condition → Constraint (join-condition constraint)
Attribute → CSP variable
Tuple in a table → Tuple in a constraint or allowed by one
A sequence of natural joins → All solutions to a CSP
• Same computational problems, different cost models
  – Databases: minimize # I/O operations
  – CSP community: minimize # CPU operations
• Challenges for using CSP techniques in DBs
  – Use lighter data structures to minimize memory usage
  – Fit in the iterator model of database engines
Join operator
• R1 ⋈_{x θ y} R2
  – Most expensive operator in terms of I/O
  – θ is “=”: Equi-Join
    • x is the same as y: Natural Join
• Join algorithms
  – Nested Loop
  – Sorting-based
    • Sort-Merge, Progressive Merge-Join (PMJ)
    • Partition relations by sorting, minimizing the # of scans of the relations
  – Hashing-based
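The sort-merge family listed above can be sketched as an in-memory equi-join: sort both inputs on the join key, then scan them in lockstep and pair up the runs of equal keys. Unlike PMJ, this sketch ignores I/O entirely and assumes both inputs fit in memory; the function and parameter names are illustrative. The data are R1 and R2 from the join-query example.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, ...]

R1 = [(1, 12, 23), (1, 13, 23), (1, 14, 23), (2, 10, 25), (3, 16, 30), (4, 10, 25),
      (5, 12, 23), (5, 13, 23), (5, 14, 23), (6, 13, 27), (6, 14, 27), (7, 14, 28)]
R2 = [(1, 12, 23), (1, 13, 23), (1, 14, 23), (1, 15, 23), (2, 10, 25), (3, 17, 20),
      (4, 10, 25), (5, 12, 23), (5, 13, 23), (5, 14, 23), (5, 15, 23), (6, 13, 27),
      (6, 14, 27)]


def sort_merge_join(r: List[Row], s: List[Row], key_r: Callable,
                    key_s: Callable) -> List[Tuple[Row, Row]]:
    """In-memory sort-merge equi-join returning the matching (r_row, s_row) pairs."""
    r, s = sorted(r, key=key_r), sorted(s, key=key_s)
    i = j = 0
    out = []
    while i < len(r) and j < len(s):
        kr, ks = key_r(r[i]), key_s(s[j])
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            # Find the run of equal keys in each input, then output their cross product
            i_end = i
            while i_end < len(r) and key_r(r[i_end]) == kr:
                i_end += 1
            j_end = j
            while j_end < len(s) and key_s(s[j_end]) == ks:
                j_end += 1
            out.extend((tr, ts) for tr in r[i:i_end] for ts in s[j:j_end])
            i, j = i_end, j_end
    return out


natural = sort_merge_join(R1, R2, key_r=lambda t: t, key_s=lambda t: t)
print(len(natural))  # 10: join on (A, B, C)
on_a = sort_merge_join(R1, R2, key_r=lambda t: t[0], key_s=lambda t: t[0])
print(len(on_a))     # 31: join on A only
```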
The join query
SELECT R2.A,R2.B,R2.C
FROM R1,R2
WHERE R1.A=R2.A
AND R1.B=R2.B
AND R1.C=R2.C
R1
A B C
1 12 23
1 13 23
1 14 23
2 10 25
3 16 30
4 10 25
5 12 23
5 13 23
5 14 23
6 13 27
6 14 27
7 14 28
R2
A B C
1 12 23
1 13 23
1 14 23
1 15 23
2 10 25
3 17 20
4 10 25
5 12 23
5 13 23
5 14 23
5 15 23
6 13 27
6 14 27
Result: 10 tuples in 3 nested tuples
R1 ⋈ R2 (Compacted)
A B C
{1, 5} {12, 13, 14} {23}
{2, 4} {10} {25}
{6} {13, 14} {27}
Modeling join query as a CSP
• Attributes of relations → CSP variables
• Attribute values → variable domains
• Relations → relational constraints
• Join conditions → join-condition constraints
SELECT R1.A,R1.B,R1.C
FROM R1,R2
WHERE R1.A=R2.A
AND R1.B=R2.B
AND R1.C=R2.C
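[Figure: CSP with variables R1.A, R1.B, R1.C under the relational constraint R1, variables R2.A, R2.B, R2.C under the relational constraint R2, and join-condition constraints between R1.A and R2.A, R1.B and R2.B, R1.C and R2.C]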
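The mapping above can be made concrete with a small backtracking search over the six attribute variables, ordered R1.A, R2.A, R1.B, R2.B, R1.C, R2.C (the ordering heuristic used later in this talk). The data are R1 and R2 from the join-query example. This is a plain backtracking sketch without bundling; taking each domain to be the values the attribute takes in its relation is an assumption, and all names are illustrative.

```python
from typing import Dict, Iterator

R1 = [(1, 12, 23), (1, 13, 23), (1, 14, 23), (2, 10, 25), (3, 16, 30), (4, 10, 25),
      (5, 12, 23), (5, 13, 23), (5, 14, 23), (6, 13, 27), (6, 14, 27), (7, 14, 28)]
R2 = [(1, 12, 23), (1, 13, 23), (1, 14, 23), (1, 15, 23), (2, 10, 25), (3, 17, 20),
      (4, 10, 25), (5, 12, 23), (5, 13, 23), (5, 14, 23), (5, 15, 23), (6, 13, 27),
      (6, 14, 27)]

# CSP variables, ordered so that attributes linked by a join condition are adjacent
variables = ["R1.A", "R2.A", "R1.B", "R2.B", "R1.C", "R2.C"]
# Variable domains: the values each attribute takes in its relation
domains = {f"{name}.{att}": sorted({row[i] for row in rel})
           for name, rel in (("R1", R1), ("R2", R2))
           for i, att in enumerate("ABC")}
# Relational constraints: (scope, allowed tuples)
relational = [(("R1.A", "R1.B", "R1.C"), set(R1)), (("R2.A", "R2.B", "R2.C"), set(R2))]
# Join-condition constraints: equality between the joined attributes
equalities = [("R1.A", "R2.A"), ("R1.B", "R2.B"), ("R1.C", "R2.C")]


def consistent(assignment: Dict[str, int]) -> bool:
    """Check every constraint whose scope is fully assigned."""
    for x, y in equalities:
        if x in assignment and y in assignment and assignment[x] != assignment[y]:
            return False
    for scope, allowed in relational:
        if all(v in assignment for v in scope) and \
                tuple(assignment[v] for v in scope) not in allowed:
            return False
    return True


def solutions(i: int = 0, assignment=None) -> Iterator[Dict[str, int]]:
    """Chronological backtracking; each solution corresponds to one joined tuple."""
    if assignment is None:
        assignment = {}
    if i == len(variables):
        yield dict(assignment)
        return
    var = variables[i]
    for value in domains[var]:
        assignment[var] = value
        if consistent(assignment):
            yield from solutions(i + 1, assignment)
        del assignment[var]


result = list(solutions())
print(len(result))  # 10 solutions, one per tuple of R1 ⋈ R2
```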
Progressive Merge Join
• PMJ: a sort-merge algorithm by [Dittrich et al. ‘03]
• Two phases
  1. Sorting phase: sorts sub-sets of relations & produces early results
  2. Merging phase: merges sorted sub-sets
• We use the framework of the PMJ for our external join
New join algorithm
• Sorting & merging phases
  – Load sub-sets of relations in memory
  – Compute the in-memory join using dynamic bundling
• In-memory join
  – Uses sorting-based bundling (shown next)
  – Computes the join of the in-memory relations using dynamically computed bundles
Computing a bundle of R1.A
[Figure annotations: partition, unequal partitions, symmetric partitions; resulting bundle {1, 5}]
R1
A B C
1 12 23
1 13 23
1 14 23
2 10 25
5 12 23
5 13 23
5 14 23
• Partition of a constraint: the tuples of the relation having the same value of R1.A
• Compare projected tuples of first partition with those of another partition
• Compare with every other partition to get complete bundle
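The bundle computation described above can be sketched as follows, with two stated simplifications: the projection of each partition is hashed (as a `frozenset`) instead of being compared pairwise with the other partitions, and the same grouping is applied recursively to the remaining attributes to turn the join result into nested tuples. Function names are hypothetical; R1 and R2 are the relations of the join-query example, and because the query joins on all three attributes, the natural join reduces to the tuples the two relations share.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

Row = Tuple[int, ...]

R1 = [(1, 12, 23), (1, 13, 23), (1, 14, 23), (2, 10, 25), (3, 16, 30), (4, 10, 25),
      (5, 12, 23), (5, 13, 23), (5, 14, 23), (6, 13, 27), (6, 14, 27), (7, 14, 28)]
R2 = [(1, 12, 23), (1, 13, 23), (1, 14, 23), (1, 15, 23), (2, 10, 25), (3, 17, 20),
      (4, 10, 25), (5, 12, 23), (5, 13, 23), (5, 14, 23), (5, 15, 23), (6, 13, 27),
      (6, 14, 27)]


def bundles_of_first_attribute(rows) -> List[Tuple[List[int], Set[Row]]]:
    """Partition the sorted rows by their first value; values whose partitions
    have identical projections on the remaining attributes form one bundle."""
    partitions: Dict[int, Set[Row]] = defaultdict(set)
    for row in sorted(rows):
        partitions[row[0]].add(row[1:])
    by_projection: Dict[FrozenSet[Row], List[int]] = {}
    for value, projected in partitions.items():
        by_projection.setdefault(frozenset(projected), []).append(value)
    return [(values, set(proj)) for proj, values in by_projection.items()]


def compact(rows) -> List[Tuple[List[int], ...]]:
    """Recursively rewrite a set of equal-length rows as nested tuples of value sets."""
    rows = set(rows)
    if not rows:
        return []
    if len(next(iter(rows))) == 1:
        return [(sorted(r[0] for r in rows),)]
    nested = []
    for values, rest in bundles_of_first_attribute(rows):
        for tail in compact(rest):
            nested.append((values,) + tail)
    return nested


print([values for values, _ in bundles_of_first_attribute(R1)])
# [[1, 5], [2, 4], [3], [6], [7]]
joined = set(R1) & set(R2)  # natural join on (A, B, C)
for nested_tuple in compact(joined):
    print(nested_tuple)
# ([1, 5], [12, 13, 14], [23])
# ([2, 4], [10], [25])
# ([6], [13, 14], [27])
```

The three nested tuples expand to 6 + 2 + 2 = 10 tuples, matching the compacted result of the join-query example.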
Experiments
• XXL library for implementation & evaluation
• Data sets
  – Random: 2 relations R1, R2 with the same schema as the example
    • Each relation: 10,000 tuples
    • Memory size: 4,000 tuples
    • Page size: 200 tuples
  – Real-world problem: 3 relations, 4 attributes
• Compaction rate achieved
  – Random problem: 1.48
  – Savings even with a (very) preliminary implementation
  – Real-world problem: 2.26 (69 tuples in 32 nested tuples)
Outline
• Background
• Neighborhood Interchangeability (NI) for non-binary CSPs
• Empirical evaluations
• Database algorithms based on dynamic bundling
• Conclusions & future work
Conclusions
• Algorithm for computing NI sets in non-binary CSPs
• DynBndl
  – produces multiple robust solutions
  – significantly reduces the cost of search at the phase transition
• New dynamic-bundling-based join algorithm
Constraint Processing inspires innovative solutions to fundamental difficult problems in Databases
Future work
• Sort constraint definitions to improve CSP techniques
• Design bundling mechanisms for gap & linear constraints in Constraint Databases
• Explore benefits of bundling in Databases
  – Sampling operator
  – Main-memory databases
  – Automatic categorization of query results
Thanks!!
Related work
• Join algorithms
  – Well-established algorithms
  – Do not focus on exploiting symmetry
• Database compression
  – Output results are not compressed
  – Compression at the value level, not the tuple level
Related work (contd)
• [Mamoulis & Papadias 1998]
  – Join using FC for spatial DBs
  – Restricted to binary constraints
  – No compaction of the solution space
• [Bayardo et al. 1996]
  – Reduce the number of intermediate tuples of a sequence of joins
• [Rich et al. 1993]
  – Do not compact join-attribute values
  – Do not detect the redundancy present in the grouped sub-relations
Analysis of overheads
• For bundling
  – Additional data structures: 2 arrays, 1 pointer
  – Only 1 array (Processed values) may become cumbersome
• Array size is largest
  – when all the values of a variable are in one bundle
  – But this case also leads to the best savings!
Sorting-based bundling
• Heuristic for variable ordering: place variables linked by join conditions as close to each other as possible
[Figure: variable ordering R1.A, R2.A, R1.B, R2.B, R1.C, R2.C, with the relational constraints R1 and R2]
• Sort the relations using the above ordering
• Next: compute the bundles of the first variable in the ordering (R1.A)
Join using bundling
[Figure: the compacted result R1 ⋈ R2 (columns A, B, C) being built, the processed-values arrays for R1 and R2, and the variable ordering R1.A, R2.A, R1.B, R2.B, R1.C, R2.C]
1. Computing the bundle for R1.A: select a partition to compare; symmetric partitions are added to the bundle of R1.A. Current bundle of R1.A = {1, 5}
2. Computing the bundle for R2.A: select a partition to compare; symmetric partitions are added to the bundle of R2.A. Current bundle of R2.A = {1, 5}
3. Update the processed values for R1.A and for R2.A (5 is recorded in both arrays)
R1
A B C
1 12 23
1 13 23
1 14 23
2 10 25
3 16 30
4 10 25
5 12 23
5 13 23
5 14 23
6 13 27
6 14 27
7 14 28
R2
A B C
1 12 23
1 13 23
1 14 23
1 15 23
2 10 25
3 17 20
4 10 25
5 12 23
5 13 23
5 14 23
5 15 23
6 13 27
6 14 27
Join using bundling
[Figure: the compacted result R1 ⋈ R2 (columns A, B, C) being built]
• Current bundle of R1.A = {1, 5}
• Current bundle of R2.A = {1, 5}
• Common(R1.A, R2.A) = {1, 5}
• Assign {1, 5} to R1.A
• Compute the current constraint of R1
R1
A B C
1 12 23
1 13 23
1 14 23
2 10 25
3 16 30
4 10 25
5 12 23
5 13 23
5 14 23
6 13 27
6 14 27
7 14 28
[Figure: processed-values arrays for R1 and R2, with {1, 5} marked, and the variable ordering R1.A, R2.A, R1.B, R2.B, R1.C, R2.C]
• Assign {1, 5} to R2.A
• Compute the current constraint of R2
• Next variable: R1.B
R2
A B C
1 12 23
1 13 23
1 14 23
1 15 23
2 10 25
3 17 20
4 10 25
5 12 23
5 13 23
5 14 23
5 15 23
6 13 27
6 14 27
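The last steps above can be mimicked on the example data. The meaning of "current constraint" is an assumption here: the relation restricted to the tuples whose A-value lies in the assigned bundle, projected on (B, C). The helper `current_constraint` is hypothetical, and the two bundles are copied from the slide rather than computed.

```python
R1 = [(1, 12, 23), (1, 13, 23), (1, 14, 23), (2, 10, 25), (3, 16, 30), (4, 10, 25),
      (5, 12, 23), (5, 13, 23), (5, 14, 23), (6, 13, 27), (6, 14, 27), (7, 14, 28)]
R2 = [(1, 12, 23), (1, 13, 23), (1, 14, 23), (1, 15, 23), (2, 10, 25), (3, 17, 20),
      (4, 10, 25), (5, 12, 23), (5, 13, 23), (5, 14, 23), (5, 15, 23), (6, 13, 27),
      (6, 14, 27)]


def current_constraint(relation, bundle):
    """Hypothetical helper: tuples whose A-value is in `bundle`, projected on (B, C)."""
    return {row[1:] for row in relation if row[0] in bundle}


bundle_r1_a = {1, 5}  # current bundle of R1.A (from the slide)
bundle_r2_a = {1, 5}  # current bundle of R2.A (from the slide)
common = bundle_r1_a & bundle_r2_a  # Common(R1.A, R2.A)

current_r1 = current_constraint(R1, common)  # after assigning {1, 5} to R1.A
current_r2 = current_constraint(R2, common)  # after assigning {1, 5} to R2.A
print(sorted(common))      # [1, 5]
print(sorted(current_r1))  # [(12, 23), (13, 23), (14, 23)]
print(sorted(current_r2))  # [(12, 23), (13, 23), (14, 23), (15, 23)]
# Candidate values for the next variable, R1.B, under this assumption
print(sorted({b for b, _ in current_r1}))  # [12, 13, 14]
```

Because 1 and 5 are symmetric in both relations, their partitions project onto the same (B, C) tuples, so the bundle can be carried through the rest of the search as a single value.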