Patch Based Prediction Techniques
University of Houston
By: Paul AMALAMAN
From: UH-DMML Lab
Director: Dr. Eick
Introduction
1. Research Goals
2. Problem Setting
3. Solutions: TPRTI-A & TPRTI-B
4. Results
5. Conclusion
Future Work
UH-DMML Lab 1
Research Goals
To improve Machine Learning techniques for inducing predictive models based on efficient subdivisions of the input space (patches)
Areas of Focus:
o Linear Regression Tree Induction
o Classification Tree Induction
Linear regression is a global model, where a single predictive formula holds over the entire data space: Y = β0 + βᵀX + ε

Linear Regression Tree
When the data has many input attributes that interact in complicated, nonlinear ways, assembling a single global model can be very difficult. An alternative approach to nonlinear regression is to split, or partition, the space into smaller regions, where the interactions are more manageable. We then partition the subdivisions again (this is called recursive partitioning) until we finally reach chunks of the space to which simple models can be fit.

Splitting Method: selecting the pair {split variable, split value} that minimizes some error/objective function
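As a concrete illustration of the global model above, a least-squares fit of Y = β0 + βᵀX + ε can be sketched in Python (the toy data and variable names here are illustrative, not from the slides):

```python
import numpy as np

# Toy data: y is an exact linear function of two inputs (no noise),
# so ordinary least squares should recover the coefficients exactly.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
beta_true = np.array([2.0, -1.0])
y = 0.5 + X @ beta_true

# Fit Y = beta0 + beta^T X by least squares: prepend an intercept column.
A = np.hstack([np.ones((len(X), 1)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
beta0, beta = coef[0], coef[1:]
```

When the true relationship is nonlinear, a single fit like this underperforms, which is what motivates partitioning the space and fitting one such model per region.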
Background (Research Goals Continued)
Popular approaches:
1. Variance-based {split variable, split value} selection:
   o try each mean value of each input attribute
   o objective function: variance minimization
   o scalable, complex trees, often less accurate
2. RSS-based {split variable, split value} selection:
   o try each value of each input attribute (exhaustive search)
   o objective function: RSS minimization (Residual Sum of Squared Errors)
   o less scalable, smaller trees, better accuracy
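The RSS-based exhaustive search can be sketched as follows; this is a minimal 1-attribute-at-a-time illustration of what a RETIS-style search does, and the function names are mine, not from any of the cited systems:

```python
import numpy as np

def rss_of_fit(X, y):
    """RSS of a least-squares linear fit (with intercept) on one region."""
    A = np.hstack([np.ones((len(X), 1)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r)

def best_split_exhaustive(X, y):
    """Try every value of every attribute; return the split minimizing
    the total RSS of the two resulting linear fits."""
    best = (None, None, np.inf)                    # (variable, value, RSS)
    for j in range(X.shape[1]):
        for v in np.unique(X[:, j])[1:-1]:         # interior candidate values
            left, right = X[:, j] < v, X[:, j] >= v
            if left.sum() < 3 or right.sum() < 3:  # need enough points to fit
                continue
            total = rss_of_fit(X[left], y[left]) + rss_of_fit(X[right], y[right])
            if total < best[2]:
                best = (j, float(v), total)
    return best
```

The nested loop over all values of all attributes, with a regression fit per candidate, is exactly why this approach is accurate but does not scale.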
Our Research Goals:
To induce smaller trees with better accuracy while improving on scalability, by designing better splitting methods (patches) and objective functions
Background (Research Goals Continued)
Problem Setting
[Figure: example dataset plotted in the x-y plane; exhaustive search partitions it into regions (A)-(G), while the variance-based approach produces regions (A)-(C)]
Our Research Goals: To induce smaller trees with better accuracy while improving on scalability, by designing better splitting methods (patches) and objective functions
1. Variance-based approaches like M5 will miss the optimal split point.
2. Exhaustive-search approaches like RETIS will find the optimal split point, but at the cost of an expensive search (not scalable).
Example (Problem Setting Continued)
[Figure: example dataset over input attributes x1 and x2 with target y]
Current Proposed Solution
o Detect areas in the dataset where the general trend makes sharp turns (turning points)
o Use turning points as potential split points in a Linear Regression Tree induction algorithm

Challenges:
o Determining the turning points
o Balancing accuracy, model complexity, and runtime complexity
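The slides do not spell out the detection procedure, but one minimal way to flag sharp turns in a 1-D trend is to slide fixed-size windows along the sorted data and compare the directions of consecutive window-centroid segments. This is only a sketch; the window size, threshold, and function name are illustrative assumptions, not TPRTI's actual parameters:

```python
import numpy as np

def turning_points(x, y, window=10, cos_threshold=0.95):
    """Flag x-locations where the local trend changes direction sharply.

    Slides fixed-size windows along x-sorted data, represents each window
    by its centroid, and marks a turning point wherever the angle between
    consecutive centroid-to-centroid segments is sharp (low cosine).
    """
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    n = len(xs) // window
    cx = np.array([xs[i * window:(i + 1) * window].mean() for i in range(n)])
    cy = np.array([ys[i * window:(i + 1) * window].mean() for i in range(n)])
    points = []
    for i in range(1, n - 1):
        v1 = np.array([cx[i] - cx[i - 1], cy[i] - cy[i - 1]])
        v2 = np.array([cx[i + 1] - cx[i], cy[i + 1] - cy[i]])
        cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
        if cos < cos_threshold:
            points.append(float(cx[i]))          # candidate split value
    return points
```

On a V-shaped dataset such as y = |x|, this flags only the centroids adjacent to the kink at x = 0, which is exactly the kind of location a variance-based splitter tends to miss.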
Solutions
Determining Turning Points (Solutions continued)
Two algorithms: TPRTI-A and TPRTI-B
Both rely on:
1. detecting potential split points in the dataset (turning points)
2. then feeding the split points to a tree induction algorithm

TPRTI-A and TPRTI-B differ by their objective functions:
o TPRTI-A uses an RSS-based node evaluation approach
o TPRTI-B uses a two-step node evaluation function:
   - select the split point based on distance
   - use RSS computation to select the pair {split variable, split value}
Two New Algorithms (Solutions continued)
TPRTI-A: RSS-based node evaluation approach.
Performs a look-ahead split for each turning point and selects the split that best minimizes RSS.
TPRTI-B uses a two-step node evaluation function:
o Select the split point based on distance
o Use RSS computation to select the pair {split variable, split value}
Two New Algorithms (Solutions continued)
          M5        TPRTI-A   TPRTI-B   RETIS     GUIDE     SECRET
TPRTI-A   (6/5/1)   -         (4/6/2)   (4/6/0)   (5/1/2)   (4/2/2)
TPRTI-B   (4/6/2)   (2/6/4)   -         (3/7/0)   (5/1/2)   (1/4/3)
Table 1. Comparison between TPRTI-A, TPRTI-B and state-of-the-art approaches with respect to accuracy (wins/ties/losses)
Results On Accuracy
Results On Complexity
Table 2. Number of times an approach obtained the combination (best accuracy, fewest leaf nodes)
M5   TPRTI-A   TPRTI-B   RETIS   GUIDE   SECRET
0    5         3         5       N.A.    N.A.
Results On Scalability
We propose a new approach for Linear Regression Tree construction called Turning Point Regression Tree Induction (TPRTI) that infuses turning points into a regression tree induction algorithm to achieve improved scalability while maintaining high accuracy and low model complexity.
Two novel linear regression tree induction algorithms, TPRTI-A and TPRTI-B, which incorporate turning points into the node evaluation, were introduced. Experimental results indicate that TPRTI is a scalable algorithm capable of obtaining high predictive accuracy using smaller decision trees than other approaches.
Conclusion
Future Work
We are investigating how turning point detection can also be used to induce better classification trees.
Thank You