Treatment Learning: Implementation and Application
Ying Hu
Electrical & Computer Engineering
University of British Columbia
Ying Hu http://www.ece.ubc.ca/~yingh 2
Outline
1. An example
2. Background review
3. TAR2 treatment learner
• TARZAN: Tim Menzies
• TAR2: Ying Hu & Tim Menzies
4. TAR3: improved TAR2
• TAR3: Ying Hu
5. Evaluation of treatment learning
6. Application of treatment learning
7. Conclusion
First Impression
Boston Housing Dataset (506 examples, 4 classes)
• C4.5's decision tree vs. treatment learner:
– treatment for high: 6.7 <= rooms < 9.8 and 12.6 <= parent teacher ratio < 15.9
– treatment for low: 0.6 <= nitric oxide < 1.9 and 17.16 <= living standard < 39
Review: Background
What is KDD?
– KDD = Knowledge Discovery in Databases [fayyad96]
– Data mining: one step in the KDD process
– Machine learning: learning algorithms
Common data mining tasks
– Classification
• Decision tree induction (C4.5) [quinlan86]
• Nearest neighbors [cover67]
• Neural networks [rosenblatt62]
• Naive Bayes classifier [duda73]
– Association rule mining
• APRIORI algorithm [agrawal93]
• Variants of APRIORI
Treatment Learning: Definition
– Input: classified dataset
• Assume: classes are ordered
– Output: Rx = a conjunction of attribute-value pairs
• Size of Rx = # of pairs in the Rx
– confidence(Rx w.r.t. Class) = P(Class | Rx)
– Goal: find an Rx whose confidence differs across classes
– Evaluate Rx: lift
– Visualization form of output
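The slides do not spell out the lift formula, so the sketch below is only an illustration: it scores a treatment Rx by the weighted mean class score of the examples Rx selects, relative to the baseline over all examples. The `confidence` and `lift` helpers, the toy dataset, and the class weights are all hypothetical.

```python
def confidence(data, rx, cls):
    """P(class | Rx): fraction of examples matching treatment rx
    (a dict of attribute -> value) that belong to class cls."""
    matched = [row for row in data if all(row[a] == v for a, v in rx.items())]
    if not matched:
        return 0.0
    return sum(1 for row in matched if row["class"] == cls) / len(matched)

def lift(data, rx, weights):
    """Illustrative lift: weighted mean class score of the examples
    selected by rx, divided by the same score over the whole dataset."""
    matched = [row for row in data if all(row[a] == v for a, v in rx.items())]
    if not matched:
        return 0.0
    score = lambda rows: sum(weights[row["class"]] for row in rows) / len(rows)
    return score(matched) / score(data)

# toy dataset: "high" is the preferred (highest-weighted) class
data = [
    {"rooms": "many", "nox": "low",  "class": "high"},
    {"rooms": "many", "nox": "low",  "class": "high"},
    {"rooms": "few",  "nox": "high", "class": "low"},
    {"rooms": "few",  "nox": "low",  "class": "low"},
]
weights = {"low": 1, "high": 2}
rx = {"rooms": "many"}
print(confidence(data, rx, "high"))  # 1.0
print(lift(data, rx, weights))       # 2.0 / 1.5, about 1.333
```

A lift above 1 means the treatment shifts the class distribution toward the preferred classes; a treatment learner keeps only treatments whose lift clears a threshold.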
Motivation: Narrow Funnel Effect
When is enough learning enough?
– Using < 50% of the attributes decreases accuracy by only 3-5% [shavlik91]
– A 1-level decision tree is comparable to C4 [holte93]
– Data engineering: ignoring 81% of the features results in a 2% increase in accuracy [kohavi97]
– Scheduling: random sampling outperforms complete (depth-first) search [crawford94]
Narrow funnel effect
– Control variables vs. derived variables
– Treatment learning: finding the funnel variables
TAR2: The Algorithm
Search + attribute utility estimation
– Estimation heuristic: confidence1
– Search: depth-first search
• Search space: confidence1 > threshold
Discretization: equal-width interval binning
Reporting Rx
– lift(Rx) > threshold
Software package and online distribution
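Equal-width interval binning is simple enough to sketch. The helper below is an illustrative implementation (not TAR2's actual code): it splits a numeric attribute's observed range into n equal intervals and maps each value to a bin index.

```python
def equal_width_bins(values, n_bins):
    """Discretize a numeric attribute into n_bins equal-width intervals,
    returning the bin index (0..n_bins-1) for each value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant attribute
    # clamp so the maximum value falls into the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# hypothetical nitric-oxide readings discretized into 3 bins
nox = [0.38, 0.46, 0.51, 0.62, 0.87]
print(equal_width_bins(nox, 3))  # [0, 0, 0, 1, 2]
```

Each bin index then stands in for a discrete attribute value, so numeric attributes can take part in attribute-value conjunctions like any categorical attribute.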
The Pilot Case Study
Requirement optimization
– Goal: select an optimal set of mitigations in a cost-effective manner
[Diagram: mitigations reduce risks and incur cost; risks relate to requirements; requirements achieve benefit]
Iterative learning cycle
The Pilot Study (continued)
Cost-benefit distribution (30/99 mitigations)
Compared to simulated annealing
Problem of TAR2
Runtime vs. Rx size
– To generate Rx of size r from N candidate attribute-value pairs, the depth-first search examines on the order of C(N, r) conjunctions
– To generate Rx of every size in [1..N], the total is sum over r of C(N, r) = 2^N - 1 candidates
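The combinatorial growth in the number of candidate treatments can be checked numerically. Assuming, purely for illustration, N = 99 candidate attribute-value pairs:

```python
from math import comb

N = 99  # hypothetical number of attribute-value pairs

# candidate conjunctions of size 2
print(comb(N, 2))  # 4851

# candidate conjunctions over all sizes 1..N: 2**N - 1
print(sum(comb(N, r) for r in range(1, N + 1)) == 2**N - 1)  # True
```

This exponential total is why exhaustively enumerating treatments of every size quickly becomes infeasible as the attribute count grows.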
TAR3: the Improvement
Random sampling
– Key idea:
• Normalized confidence1 values form a probability distribution
• Sample Rx from the confidence1 distribution
– Steps:
• Place the items (ai) in increasing order of confidence1 value
• Compute the CDF of each ai
• Sample a uniform value u in [0..1]
• The sample is the least ai whose CDF > u
• Repeat until an Rx of the given size is built
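The sampling steps above can be sketched roughly as follows. This is an illustrative implementation, not TAR3's actual code; the function name and the assumption that scores are positive are mine.

```python
import random

def sample_treatment(items, scores, size, rng=random):
    """Sample a treatment (set of attribute-value pairs) of the given
    size, drawing items with probability proportional to their
    confidence1 score via an explicit CDF."""
    assert 0 < size <= len(items)
    order = sorted(items, key=lambda a: scores[a])  # increasing confidence1
    total = float(sum(scores[a] for a in order))
    cdf, acc = [], 0.0
    for a in order:
        acc += scores[a] / total
        cdf.append((a, acc))
    rx = set()
    while len(rx) < size:
        u = rng.random()  # uniform value in [0, 1)
        # the sample is the least item whose cumulative probability > u
        rx.add(next((a for a, c in cdf if c > u), order[-1]))
    return rx

rng = random.Random(1)  # seeded for repeatability
print(sample_treatment(["a", "b", "c", "d"],
                       {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}, 2, rng))
```

Because draws are biased toward high-confidence1 items, promising conjunctions are reached without enumerating the exponential space that TAR2's depth-first search faces.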
Comparison of Efficiency
Runtime vs. data size
[Plot: runtime (sec) vs. attribute # (10..99), R² = 0.9436]
[Plot: runtime (sec) vs. treatment size (1..8), R² = 0.8836]
Runtime vs. Rx size
Runtime vs. TAR2
Comparison of Results
pilot2 dataset (58 × 30k)
Mean and STD in each round
Final Rx: TAR2 = 19, TAR3 = 20
10 UCI domains: identical best Rx
External Evaluation
[Diagram: FSS framework — all attributes (10 UCI datasets) → learning, vs. TAR2 as feature subset selector → fewer attributes → learning; compare accuracy using C4.5 and Naive Bayes]
The Results
Accuracy using Naïve Bayes (avg increase = 0.8%)
Accuracy using C4.5 (avg decrease = 0.9%)
Number of attributes
Compare to Other FSS Methods
# of attributes selected (C4.5)
# of attributes selected (Naive Bayes)
17/20: fewest attributes selected
Further evidence for funnels
Applications of Treatment Learning
Download site: http://www.ece.ubc.ca/~yingh/
Collaborators: JPL, WV, Portland, Miami
Application examples
– pair programming vs. conventional programming
– identify software metrics that are superior error indicators
– identify attributes that make FSMs easy to test
– find the best software inspection policy for a particular software development organization
Other applications:
– 1 journal, 4 conference, 6 workshop papers