Detecting Group Differences: Mining Contrast Sets
Author: Stephen D. Bay
Advisor: Dr. Hsu
Graduate: Yan-Cheng Lin
Outline
- Motivation
- Objective
- Research Review
- Search for Contrast Sets
- Filtering for Summarizing Contrast Sets
- Evaluation
- Conclusion
Motivation
- Learning group differences is a central problem in many domains
- Contrasting groups is especially important in social science research
Objective
Automatically detect differences between contrasting groups from observational multivariate data
Research Review
- Time series research
  - multiple observations
  - traditional statistical methods
- Rule learners and decision trees
  - can miss group differences
- Association rule mining
  - multiple groups and different search criteria
Problem Definition
- The itemset concept extends to the contrast set
Definition 1:
- Let A1, A2, ..., Ak be a set of k variables called attributes.
- Each Ai can take on values from the set {Vi1, Vi2, ..., Vim}.
- A contrast set is a conjunction of attribute-value pairs defined on groups G1, G2, ..., Gn, with no Ai occurring more than once.
- Define the support of a contrast set
Definition 2:
- The support of a contrast set with respect to a group G is the percentage of examples in G where the contrast set is true.
- Minimum support difference δ: a user-defined threshold
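Definition 2 can be illustrated with a minimal Python sketch. The function names, the toy records, and the value of δ are hypothetical, and the comparison `diff >= delta` is an assumption about how the threshold is applied. Support is returned as a fraction rather than a percentage.

```python
def support(rows, cset):
    """Fraction of examples in `rows` for which every pair in `cset` holds."""
    if not rows:
        return 0.0
    hits = sum(all(row.get(attr) == value for attr, value in cset.items())
               for row in rows)
    return hits / len(rows)


def max_support_difference(groups, cset):
    """Largest gap in support between any two groups."""
    supports = [support(rows, cset) for rows in groups.values()]
    return max(supports) - min(supports)


# Hypothetical toy data: two groups of four examples each.
groups = {
    "G1": [{"degree": "PhD", "sex": "male"}, {"degree": "BS", "sex": "female"},
           {"degree": "PhD", "sex": "female"}, {"degree": "PhD", "sex": "male"}],
    "G2": [{"degree": "BS", "sex": "male"}, {"degree": "BS", "sex": "female"},
           {"degree": "PhD", "sex": "male"}, {"degree": "BS", "sex": "male"}],
}
cset = {"degree": "PhD"}   # a contrast set with one attribute-value pair
delta = 0.25               # hypothetical user-defined threshold

diff = max_support_difference(groups, cset)
print(diff, diff >= delta)  # prints: 0.5 True
```

Here the contrast set degree=PhD holds for 3 of 4 examples in G1 and 1 of 4 in G2, so the support difference is 0.5.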
Search for Contrast Sets
- Find contrast sets that meet our criteria through search
- Explore all possible contrast sets
- Return only the sets that meet our criteria
- STUCCO (Search and Testing for Understandable Consistent Contrasts): a breadth-first search that incorporates several techniques for efficient mining
Framework
- Use set-enumeration trees
- Use breadth-first search
- In the counting phase, organize nodes into candidate groups
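The level-by-level traversal of a set-enumeration tree can be sketched as follows. This is only an illustration of breadth-first candidate generation under a fixed attribute ordering. The function name and the toy attributes are hypothetical, and the sketch does no counting, pruning, or candidate grouping.

```python
def level_candidates(attribute_values, depth):
    """Enumerate candidate contrast sets one tree level at a time.

    Attributes are only extended in a fixed order, so each conjunction is
    generated exactly once and no attribute appears twice in a candidate.
    """
    attrs = list(attribute_values)
    frontier = [(-1, ())]  # (index of last attribute used, pairs so far)
    levels = []
    for _ in range(depth):
        next_frontier = []
        for last, pairs in frontier:
            for i in range(last + 1, len(attrs)):
                for value in attribute_values[attrs[i]]:
                    next_frontier.append((i, pairs + ((attrs[i], value),)))
        levels.append([pairs for _, pairs in next_frontier])
        frontier = next_frontier
    return levels


# Hypothetical attributes and values.
levels = level_candidates({"sex": ["male", "female"],
                           "degree": ["BS", "PhD"]}, depth=2)
for depth, candidates in enumerate(levels, start=1):
    print(depth, len(candidates))
```

With two attributes of two values each, level 1 holds four single-pair candidates and level 2 holds the four sex/degree conjunctions.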
Finding Significant Contrast Sets
- Test the null hypothesis that contrast-set support is equal across all groups
- Support counts are taken from contingency tables
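A minimal sketch of such a test, assuming a 2 x n contingency table (contrast set true or false, one column per group) and the Pearson chi-square statistic. The function name and the counts are hypothetical. The sketch does not handle a zero expected count, which occurs when the contrast set is true for no examples or for all of them.

```python
def chi_square(true_counts, group_sizes):
    """Pearson chi-square statistic and degrees of freedom for a 2 x n table.

    Row 1 counts examples where the contrast set is true, row 2 where it is
    false; each column is one group.
    """
    total = sum(group_sizes)
    total_true = sum(true_counts)
    rows = [true_counts,
            [n - t for t, n in zip(true_counts, group_sizes)]]
    row_totals = [total_true, total - total_true]
    stat = 0.0
    for row, row_total in zip(rows, row_totals):
        for observed, col_total in zip(row, group_sizes):
            expected = row_total * col_total / total
            stat += (observed - expected) ** 2 / expected
    dof = len(group_sizes) - 1  # (2 - 1) * (n - 1)
    return stat, dof


# Hypothetical counts: the contrast set is true for 30 of 50 examples in
# one group and 10 of 50 in the other.
stat, dof = chi_square([30, 10], [50, 50])
print(round(stat, 3), dof)  # prints: 16.667 1
```

The statistic is then compared against the chi-square critical value for the chosen significance level and degrees of freedom (3.841 for one degree of freedom at the 0.05 level).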
Controlling Search Error
- Data mining tests many hypotheses
- Control the Type I error for the whole family of tests
- Bonferroni inequality: given any set of events e1, e2, ..., en, the probability of their union is less than or equal to the sum of the individual probabilities
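The inequality suggests the standard Bonferroni correction: if each of n tests is run at level α/n, the probability of at least one false positive across the family is at most α. The sketch below shows only this plain correction. How STUCCO distributes α across search levels is not given on this slide, and the function names and p-values are hypothetical.

```python
def bonferroni_cutoff(alpha, num_tests):
    """Per-test significance level that keeps the family-wise error <= alpha."""
    return alpha / num_tests


def significant(p_values, alpha=0.05):
    """Flag each p-value that survives the Bonferroni-corrected cutoff."""
    cutoff = bonferroni_cutoff(alpha, len(p_values))
    return [p <= cutoff for p in p_values]


# Hypothetical p-values from four tests; the cutoff is 0.05 / 4 = 0.0125.
flags = significant([0.001, 0.02, 0.04, 0.0004], alpha=0.05)
print(flags)  # prints: [True, False, False, True]
```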
Pruning
- Prune nodes whose contrast sets fail to meet the effect size or statistical significance criteria
- Prune nodes that lead only to uninteresting contrast sets
- Effect Size Pruning
  - prune a node when the bound on the maximum support difference between groups is below δ
- Statistical Significance Pruning
  - prune a node when there is too little data or the maximum possible χ² value is too small
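One simple bound that fits the effect size rule: adding attribute-value pairs can only lower support, so every specialization of a node has support between 0 and the node's own support in each group. The support difference of any specialization is therefore at most the node's largest group support. The sketch below uses that bound. It is an assumption made for illustration and may be looser than the bound used in STUCCO, and the function name is hypothetical.

```python
def can_prune_by_effect_size(group_supports, delta):
    """True if no specialization of this node can reach a difference of delta.

    `group_supports` holds the node's support in each group (as fractions).
    A specialization's support in any group lies in [0, node support], so
    the largest possible difference is max(group_supports).
    """
    upper_bound = max(group_supports)
    return upper_bound < delta


print(can_prune_by_effect_size([0.02, 0.04], delta=0.05))  # prints: True
print(can_prune_by_effect_size([0.02, 0.30], delta=0.05))  # prints: False
```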
Interest Based Pruning
- Contrast sets are not interesting when their specializations have identical support or the relation between groups is fixed
- Specializations with Identical Support
  - marital-status=husband
  - marital-status=husband ^ sex=male
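The husband example can be checked mechanically: a specialization adds nothing if it matches exactly the same number of examples as its parent in every group. The sketch below is one hypothetical way to express that check; the function names and the toy records are assumptions.

```python
def count_matches(rows, cset):
    """Number of examples for which every attribute-value pair holds."""
    return sum(all(row.get(attr) == value for attr, value in cset.items())
               for row in rows)


def is_redundant_specialization(groups, parent, child):
    """True if `child` has the same support count as `parent` in every group."""
    return all(count_matches(rows, parent) == count_matches(rows, child)
               for rows in groups.values())


# Hypothetical toy data in which every husband is male.
groups = {
    "G1": [{"marital-status": "husband", "sex": "male", "degree": "PhD"},
           {"marital-status": "husband", "sex": "male", "degree": "BS"},
           {"marital-status": "wife", "sex": "female", "degree": "PhD"}],
    "G2": [{"marital-status": "husband", "sex": "male", "degree": "BS"},
           {"marital-status": "unmarried", "sex": "male", "degree": "BS"}],
}
parent = {"marital-status": "husband"}
child = {"marital-status": "husband", "sex": "male"}
other = {"marital-status": "husband", "degree": "PhD"}

print(is_redundant_specialization(groups, parent, child))  # prints: True
print(is_redundant_specialization(groups, parent, other))  # prints: False
```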
Fixed Relations
- Prune the node, because specializations of the contrast set do not add new information
Relation to Itemset Mining
- The minimum support difference criterion implies constraints on support levels in individual groups
- Eliminate large portions of the search space based on:
  - subset infrequency pruning (compare: effect size pruning)
  - superset frequency pruning (compare: interest based pruning)
- Example itemsets: ab and its superset abc
Filtering for Summarizing Contrast Sets
- Past approaches
  - limit the rules shown by constraining the variables or items
  - compare discovered rules and show only unexpected results
- New methods
  - expectation-based statistical approach
  - identify and select linear trend contrast sets
Statistical Surprise
- Show the most general contrast sets first, and more complicated conjunctions only if they are surprising given the sets already shown
- IPF (Iterative Proportional Fitting): find maximum likelihood estimates
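Iterative Proportional Fitting can be sketched on a two-way table: starting from a seed table, rows and columns are rescaled in turn until the table matches the target margins. The sketch below is a generic two-dimensional version with hypothetical names and numbers. It is not the specific model fitted in STUCCO, where the known margins would come from contrast sets already shown.

```python
def ipf(seed, row_targets, col_targets, iterations=50):
    """Fit a 2-D table to the given row and column totals by alternately
    rescaling rows and columns (Iterative Proportional Fitting)."""
    table = [row[:] for row in seed]
    for _ in range(iterations):
        # Scale each row so that it sums to its target.
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            if row_sum > 0:
                table[i] = [x * target / row_sum for x in table[i]]
        # Scale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            if col_sum > 0:
                for row in table:
                    row[j] *= target / col_sum
    return table


# Hypothetical margins for 100 examples; a uniform seed carries no
# interaction, so the fitted cells are the independence expectations.
fitted = ipf([[1.0, 1.0], [1.0, 1.0]],
             row_targets=[40, 60], col_targets=[30, 70])
print([[round(x, 6) for x in row] for row in fitted])
```

With these margins the fitted table is [[12, 28], [18, 42]], that is, row total times column total divided by 100. An observed count far from such an expected value is what would make a conjunction surprising.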
Detecting Linear Trends
- Identical to finding change over time
- Detect significant contrast sets using the chi-square test
- Use regression techniques to find the portion of the χ² statistic attributable to a linear trend
Evaluation
- Three research points:
  - low support difference
    - with few high-support attribute-value pairs, lower bounds cannot be exploited
  - pruning rules
    - as δ -> 0, statistical significance pruning becomes more important
  - filtering rules
Conclusion
- The STUCCO algorithm combines statistical hypothesis testing with search to mine contrast sets
- STUCCO has:
  - pruning rules for efficient mining at low support differences
  - guaranteed control over false positives
  - linear trend detection
  - compact summarization of results