TAR: Temporal Association Rules on Evolving Numerical Attributes
Wei Wang, Jiong Yang, and Richard Muntz
Speaker: Sarah Chan
CSIS DB Seminar
May 7, 2003
Presentation Outline
• Introduction
• Problem Definition
• Mining Algorithms
• Performance Evaluation
• Conclusions
Introduction
Association rule mining
• X ⇒ Y (itemsets)
• Existence of X implies existence of Y
Earlier work focused on binary attributes and intra-transaction relationships
• E.g. “ham ⇒ bread” means “A customer who buys ham is likely to buy bread as well”
Introduction
Cannot describe relationships such as:
• If the price of item A falls below $1, then monthly sales of item B rise by a margin between 10K and 20K.
• People between 35 and 45 with a salary between 80K and 120K are likely to buy a house priced between 300K and 400K within 2 years of marriage.
Goal: to mine ARs involving numerical attributes and temporal evolution
Problem Definition
Each object has a set of numerical attributes
Database: a sequence of snapshots S1, S2, …, St of objects
Evolution: temporal changes of the values of some attribute of some object
• E.g. evolution of the “salary” attr. over 3 snapshots:
  (salary ∈ [40000, 45000]) → (salary ∈ [47500, 55000]) → (salary ∈ [60000, 70000])
Problem Definition
TARs (on evolving numerical attributes): ARs that capture correlations among attr. evolutions
Scope of paper: only consider correlations of simultaneous evolutions (i.e. attr. evolutions over same set of snapshots)
Mining Quantitative ARs
Srikant and Agrawal (SIGMOD’96)
• Divide the domain of each quantitative attr. into intervals
• Combine adjacent intervals as long as their support is less than the max-sup threshold
• A set of items: original and combined intervals
• Apply a traditional AR mining algorithm
BitOp (Lent et al., ICDE’97)
• Rule form: A ∧ B ⇒ C, where A and B are quantitative and C is categorical
• Partition the attribute domain into 2-D grids
• For each value of attr. C:
  - Examine the data in each grid cell to see if an AR applies
  - Represent the result by a bit in a 2-D bitmap
  - Combine ARs with adjacent LHS attr. values to form a clustered AR
• Smoothing: to cover “small holes” in a big cluster
Mining Quantitative ARs
[Figure: 2-D grid with columns a1 to a6 and rows b1 to b4; some cells are marked with x]
Mining TARs
SR algorithm (based on Srikant and Agrawal, SIGMOD’96)
• Map numerical attribute evolutions to binary attrs.
• Apply any traditional AR mining algorithm
• Transform binary attr. values in rules to numerical ranges
• Complexity
  - For a numerical attr. quantized to b intervals, need O(b^2) items to represent all possible sub-ranges
  - For t snapshots, need O(b^(2t)) items to encode all possible evolutions
  - Huge number of items, very inefficient
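A quick worked count makes the blow-up concrete. The sketch below assumes that a sub-range is any contiguous run of base intervals, which gives b(b+1)/2 sub-ranges (consistent with the O(b^2) bound), and that an evolution picks one sub-range per snapshot. The function names are illustrative only.

```python
def num_subranges(b: int) -> int:
    """Contiguous sub-ranges over b base intervals: b(b+1)/2, i.e. O(b^2)."""
    return b * (b + 1) // 2


def num_evolutions(b: int, t: int) -> int:
    """One sub-range per snapshot over t snapshots: O(b^(2t)) evolutions."""
    return num_subranges(b) ** t


print(num_subranges(10))      # 55
print(num_evolutions(10, 3))  # 166375
```

Even with only 10 base intervals and 3 snapshots, a single attribute already needs more than 160,000 items under this encoding.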
Mining TARs
LE algorithm (based on BitOp)
• Quantize domains
• Map each possible evolution of the RHS attr. into an item
• For each rule form, generate clustered rules for each possible value of each possible RHS attr.
• Complexity
  - For a RHS attr. quantized to b intervals, consider its evolution over t snapshots
  - There could be O(b^(2t)) distinct evolutions
  - Total no. of possible evolutions increases exponentially with the no. of attrs. and the no. of snapshots
Mining TARs
TAR algorithm
The Model: Evolution and Its Space
Given attr. Ai and m snapshots
• Evolution E(Ai) = (Ai ∈ [l1, u1]) → (Ai ∈ [l2, u2]) → … → (Ai ∈ [lm, um])
  - Length of evolution = m
• Evolution space of Ai: m-dimensional space (the jth dimension is associated with the value of Ai at the jth snapshot)
The Model: Evolution and Its Space
E.g. E1 = (salary ∈ [40000, 45000]) → (salary ∈ [47500, 55000]) → (salary ∈ [60000, 70000])
The Model: Evolution Conj. and Its Space
Given n attrs. A1, A2, …, An (evolutions of length m)
• Evolution conjunction: E(A1) ∧ E(A2) ∧ … ∧ E(An)
• Evolution space: (n × m)-dimensional space (each dimension is associated with the value of one attr. at one snapshot)
The Model: TAR
TAR: X ⇔ Y (X and Y are evolution conjunctions)
• Symmetric relationship
• Assumption: Y only contains the evolution of one attr.
• E(A1) ∧ E(A2) ∧ … ∧ E(Ak-1) ∧ E(Ak+1) ∧ … ∧ E(An) ⇔ E(Ak)
The Model: Window
Window
• Subsequence of m consecutive snapshots
• For t available snapshots S1, S2, …, St, there are t-m+1 windows of width m
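The window count can be checked with a few lines of Python. This is only an illustrative sketch; the function name `windows` and the list-of-labels representation of snapshots are assumptions made for the example.

```python
def windows(snapshots, m):
    """Return every run of m consecutive snapshots (a sliding window of width m)."""
    return [snapshots[i:i + m] for i in range(len(snapshots) - m + 1)]


snaps = ["S1", "S2", "S3", "S4", "S5"]  # t = 5
w = windows(snaps, 3)                   # m = 3
print(len(w))  # 3, i.e. t - m + 1
print(w[0])    # ['S1', 'S2', 'S3']
```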
The Model: Object History
Object history of an object o over a window W
• The sequence of changes of o over W
• Follows an evolution E(Ai) iff, for each snapshot in the window, the value of Ai in the object history falls into the corresponding interval specified in E(Ai)
• Follows an evolution conjunction E(A1) ∧ E(A2) ∧ … ∧ E(An) iff it follows every evolution in it
• o satisfies the TAR X ⇔ Y iff it has an object history that follows both X and Y
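The "follows" test can be sketched directly from these definitions. The representation below is an assumption made for illustration: an evolution is a list of closed intervals (one per snapshot in the window) and an object history is a dict from attribute name to its values over the window.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]
Evolution = List[Interval]        # one closed interval per snapshot in the window
History = Dict[str, List[float]]  # attribute name -> values over the window


def follows(history: History, attr: str, evolution: Evolution) -> bool:
    """True iff every value of attr falls into the matching interval."""
    values = history[attr]
    return len(values) == len(evolution) and all(
        lo <= v <= hi for v, (lo, hi) in zip(values, evolution)
    )


def follows_conjunction(history: History, conj: Dict[str, Evolution]) -> bool:
    """True iff the history follows every evolution in the conjunction."""
    return all(follows(history, attr, evo) for attr, evo in conj.items())


salary_evo = [(40000, 45000), (47500, 55000), (60000, 70000)]
h = {"salary": [42000, 50000, 65000]}
print(follows(h, "salary", salary_evo))  # True
```

Under this representation, an object satisfies a rule when one of its histories passes `follows_conjunction` for the evolutions on both sides of the rule.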
The Model: TAR as Hypercube
Each object history can be mapped to a point in evolution space of involved attributes
TAR: a hypercube in this space, which contains the set of object histories satisfying the rule
Support, density & strength thresholds: constraints on number & distribution of object histories in hypercube
The Model: Rule Set
Rule set <rmin, rmax>: set of all rules r s.t. r is a specialization of rmax and a generalization of rmin
Each rule set can summarize a large no. of valid rules
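Membership in a rule set can be sketched as two containment checks. This assumes a rule is represented as a hypercube (a list of closed intervals, one per dimension of the evolution space) and that "r is a specialization of s" means the hypercube of r lies inside that of s; the helper names are hypothetical.

```python
def is_specialization(r, s):
    """True if hypercube r lies inside hypercube s in every dimension."""
    return all(
        s_lo <= r_lo and r_hi <= s_hi
        for (r_lo, r_hi), (s_lo, s_hi) in zip(r, s)
    )


def in_rule_set(r, r_min, r_max):
    """r is in <r_min, r_max> iff r_min is inside r and r is inside r_max."""
    return is_specialization(r_min, r) and is_specialization(r, r_max)


r_min = [(2, 3), (5, 6)]
r_max = [(1, 4), (4, 8)]
print(in_rule_set([(1, 3), (5, 7)], r_min, r_max))  # True
print(in_rule_set([(0, 3), (5, 7)], r_min, r_max))  # False: sticks out of r_max
```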
Mining TARs: TAR algorithm
• Find density-based (subspace) clusters
• Find all valid rule sets
Mining TARs: TAR algorithm
• Find density-based (subspace) clusters
  - Create base intervals for each attribute
  - Form base cubes from base intervals: n=1, m=1
  - Bottom-up clustering algorithm
  - Density of an evolution cube: object history concentration of the sparsest base cube in it
  - The Apriori property holds on density
• Find all valid rule sets
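The density definition (concentration of the sparsest base cube) can be sketched as a minimum over the base cubes that an evolution cube covers. The data layout is an assumption made for illustration: base cubes are keyed by tuples of base-interval indices, and an evolution cube is given as an inclusive index range per dimension.

```python
from itertools import product


def cube_density(base_counts, ranges):
    """Density of an evolution cube = count in its sparsest base cube.

    base_counts: dict mapping a tuple of base-interval indices to the number
                 of object histories in that base cube (missing means 0).
    ranges:      one inclusive (lo, hi) index range per dimension.
    """
    cells = product(*(range(lo, hi + 1) for lo, hi in ranges))
    return min(base_counts.get(cell, 0) for cell in cells)


counts = {(0, 0): 5, (0, 1): 7, (1, 0): 6, (1, 1): 2}
small = cube_density(counts, [(0, 1), (0, 0)])  # covers (0,0) and (1,0)
large = cube_density(counts, [(0, 1), (0, 1)])  # covers all four base cubes
print(small, large)  # 5 2
```

Because the density is a minimum, enlarging a cube can never raise it, which is why Apriori-style pruning applies: a cube that contains a sparse sub-cube cannot be dense.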
Mining TARs: TAR algorithm
• Find density-based (subspace) clusters
• Find all valid rule sets
  - Make use of the strength and support metrics
  - For rule X ⇔ Y, strength = Sup(X ∧ Y) / (Sup(X) × Sup(Y))
  - Strength is used to prune the search space
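A small worked example of the strength formula follows. It assumes that each support is the fraction of object histories following the corresponding evolution conjunction; the counts are made up for illustration.

```python
def strength(n_xy: int, n_x: int, n_y: int, n_total: int) -> float:
    """Sup(X and Y) / (Sup(X) * Sup(Y)), with Sup(.) = count / n_total.

    Algebraically this equals n_xy * n_total / (n_x * n_y), which avoids
    dividing three times by n_total.
    """
    return n_xy * n_total / (n_x * n_y)


# 1000 object histories: 200 follow X, 100 follow Y, 40 follow both.
print(strength(40, 200, 100, 1000))  # 2.0
```

A value of 2.0 means X and Y occur together twice as often as they would if they were independent; a value near 1 indicates no correlation.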
Pruning with the Strength Threshold
Property 1:
• For any rule r, ∃ a base rule bri which is a specialization of r and has strength ≥ that of r.
Implication
• Only have to examine rules which are generalizations of base rules in BR (the set of base rules) whose strength ≥ thres.
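In practice, this implication means the search can start from a filtered set of base rules. The sketch below shows only that filtering step; the dict of precomputed strengths and the rule labels are hypothetical, and the threshold 1.3 is the value used in the experiments.

```python
def seed_base_rules(base_rule_strengths, threshold):
    """Keep only base rules strong enough to seed the search (Property 1)."""
    return [rule for rule, s in base_rule_strengths.items() if s >= threshold]


strengths = {"br1": 1.8, "br2": 0.9, "br3": 1.3}
seeds = seed_base_rules(strengths, 1.3)
print(seeds)  # ['br1', 'br3']
```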
Pruning with the Strength Threshold
Property 2
• For any two rules r and r’ where r’ is a specialization of r and strength of r’ < strength of r, ∃ another base rule bri which is a specialization of r but not of r’, and strength of bri > strength of r.
Implication
• Can skip rules which are generalizations of r’ but which do not contain any other base rule in BR.
Finding Rule Sets from Each Cluster
Find BR
For each subset BR’ of BR, explore the corresponding search region from rule r (the min. bounding box of the rules in BR’)
• If strength of r < thres., ignore the region
• min-rule
  - If sup of r ≥ thres., min-rule = r
  - If sup of r < thres., search for its valid generalizations within the region
  - Stop when strength < thres.
• max-rule
  - Search similarly until a rule is found s.t. all of its generalizations either violate the strength requirement or include another base rule
  - There can be multiple max-rules for a min-rule
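The starting rule r of each search region is a minimum bounding box, which is straightforward to compute. The sketch assumes each rule is a hypercube given as a list of (lo, hi) intervals over the same dimensions; it covers only this first step, not the min-rule or max-rule search itself.

```python
def bounding_box(rules):
    """Smallest hypercube that encloses every hypercube in rules."""
    dims = len(rules[0])
    return [
        (min(r[d][0] for r in rules), max(r[d][1] for r in rules))
        for d in range(dims)
    ]


subset = [
    [(1, 2), (3, 4)],  # base rule 1
    [(2, 5), (1, 3)],  # base rule 2
]
print(bounding_box(subset))  # [(1, 5), (1, 4)]
```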
Performance Evaluation
300MHz CPU with 128MB memory
Three synthetic datasets
• 100,000 objects with 5 attributes
• 100 snapshots
• Embedded 500 rules of length 5 or less
• User-specified thresholds
  - Density: 2 (2 times the average density)
  - Support: 5%
  - Strength: 1.3
Performance Evaluation
Precision: 100% for all algorithms
Recall
Performance Evaluation
Observations
• TAR is faster than SR and LE
  - Strength is used to prune the search space in TAR
  - Search a smaller set of candidate rules
• Response time of TAR increases at a slower pace w.r.t. the number of base intervals
Performance Evaluation
Real dataset
• 20,000 objects (persons)
• 5 attributes: age, title, salary, family status (single, married, head of household), distance between the person’s house and a major city
• 10 snapshots (one per year)
• No. of base intervals: 100; support 3%, density 2, strength 1.3
Performance Evaluation
Performance of TAR alg. on real dataset
• Time taken: 260s to mine 347 rule sets
• Examples of TARs
  - People receiving a salary raise tend to move further away from the city center.
  - If people with a salary between 70K and 100K get a raise, the raise will likely be between 7K and 15K.
Conclusions
A TAR model is proposed to represent correlations among numerical attribute evolutions.
A novel approach to mine TARs by first discovering clusters and then efficiently constructing rule sets is introduced.
Empirical evaluation shows TAR algorithm outperforms alternative algs. by a large margin.
References
W. Wang, J. Yang, and R. Muntz. TAR: Temporal association rules on evolving numerical attributes, ICDE’01.
R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables, SIGMOD’96.
B. Lent, A. Swami, and J. Widom. Clustering association rules, ICDE’97.
R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications, SIGMOD’98.