Chapter 5: Introduction to Predictive Modeling: Neural Networks and Other Modeling Tools
Transcript of Chapter 5: Introduction to Predictive Modeling: Neural Networks and Other Modeling Tools
5.1 Introduction
5.2 Input Selection
5.3 Stopped Training
5.4 Other Modeling Tools (Self-Study)

5.1 Introduction
Model Essentials – Neural Networks

Predict new cases: prediction formula.
Select useful inputs: none (no internal, automated selection).
Optimize complexity: stopped training.
Neural Network Prediction Formula

ŷ = ŵ00 + ŵ01·H1 + ŵ02·H2 + ŵ03·H3

Each hidden unit applies the hyperbolic tangent activation function to a weighted combination of the inputs:

Hi = tanh(ŵi0 + ŵi1·x1 + ŵi2·x2)

The ŵ terms are the bias estimates and weight estimates; tanh squashes its argument into the range (-1, 1). The result ŷ is the prediction estimate.
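The prediction formula above can be sketched directly in code. The hidden-unit weights below are the ones shown later in this chapter (H1 = tanh(-1.5 - .03x1 - .07x2), and so on); the output-layer weights are illustrative placeholders, not estimates from any real training run.

```python
import math

def hidden_unit(bias, w1, w2, x1, x2):
    """One hidden unit: tanh of a weighted sum of the inputs."""
    return math.tanh(bias + w1 * x1 + w2 * x2)

def predict(x1, x2):
    # Hidden layer: each unit has its own bias and input weights
    # (values taken from the example later in this chapter).
    h1 = hidden_unit(-1.5, -0.03, -0.07, x1, x2)
    h2 = hidden_unit(0.79, -0.17, -0.16, x1, x2)
    h3 = hidden_unit(0.57, 0.05, 0.35, x1, x2)
    # Output: bias plus a weighted combination of the hidden units.
    # These four output weights are made up for illustration.
    return 0.1 + 0.5 * h1 - 0.3 * h2 + 0.2 * h3

print(predict(0.4, 0.6))
```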
Neural Network Binary Prediction Formula

For a binary target, the output is passed through a logit link function, exactly as in logistic regression:

logit(p̂) = ŵ00 + ŵ01·H1 + ŵ02·H2 + ŵ03·H3

Solving for p̂ maps the logit value onto the interval (0, 1), so the prediction is a probability.
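Inverting the logit link is a one-line calculation. A minimal sketch:

```python
import math

def logistic(logit):
    """Inverse of the logit link: p = 1 / (1 + exp(-logit))."""
    return 1.0 / (1.0 + math.exp(-logit))

print(logistic(0.0))   # a logit of 0 corresponds to p = 0.5
print(logistic(5.0))   # large positive logits approach 1
print(logistic(-5.0))  # large negative logits approach 0
```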
Neural Network Diagram

[Diagram: input layer (x1, x2) → hidden layer (H1, H2, H3) → target layer (y)]
Prediction Illustration – Neural Networks

[Plot: training data over the unit square of inputs x1 and x2]

The logit equation defines the model; to apply it, we need weight estimates.
Prediction Illustration – Neural Networks

Weight estimates are found by maximizing the log-likelihood function:

log L = Σi [ yi·log(p̂i) + (1 − yi)·log(1 − p̂i) ]
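The binary log-likelihood that training maximizes can be sketched in a few lines. The example targets and probability estimates below are made up for illustration:

```python
import math

def log_likelihood(targets, probabilities):
    """Binary log-likelihood: sum of y*log(p) + (1-y)*log(1-p)."""
    return sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(targets, probabilities)
    )

# Estimates that agree with the targets give values near 0;
# poor estimates drive the log-likelihood strongly negative.
print(log_likelihood([1, 0, 1], [0.9, 0.1, 0.8]))
```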
Prediction Illustration – Neural Networks

Probability estimates are obtained by solving the logit equation for p̂ at each point (x1, x2).

[Plot: probability contours from 0.30 to 0.70 over the (x1, x2) input space]
Neural Nets: Beyond the Prediction Formula

Interpret the model.
Handle extreme or unusual values.
Use non-numeric inputs.
Account for nonlinearities.
Manage missing values.
Training a Neural Network
This demonstration illustrates using the Neural Network tool.

5.2 Input Selection
Model Essentials – Neural Networks

Predict new cases: prediction formula.
Select useful inputs: sequential selection.
Optimize complexity: best model from sequence.
5.01 Multiple Answer Poll
Which of the following are true about neural networks in SAS Enterprise Miner?
a. Neural networks are universal approximators.
b. Neural networks have no internal, automated process for selecting useful inputs.
c. Neural networks are easy to interpret and thus are very useful in highly regulated industries.
d. Neural networks cannot model nonlinear relationships.

5.01 Multiple Answer Poll – Correct Answers: a and b. Neural networks are universal approximators and have no internal, automated input selection; they are difficult to interpret, and they can model nonlinear relationships.
Selecting Neural Network Inputs
This demonstration illustrates how to use a logistic regression to select inputs for a neural network.

5.3 Stopped Training
Model Essentials – Neural Networks

Predict new cases: prediction formula.
Select useful inputs: sequential selection.
Optimize complexity: stopped training.
Fit Statistic versus Optimization Iteration

Training begins from random initial input weights and biases, with the hidden-unit output weights set to zero:

H1 = tanh(-1.5 - .03x1 - .07x2)
H2 = tanh( .79 - .17x1 - .16x2)
H3 = tanh( .57 + .05x1 + .35x2)

logit(p̂) = 0 + 0·H1 + 0·H2 + 0·H3

Because every hidden-unit weight in the logit equation starts at zero, the initial model predicts p̂ = 0.5 (logit(0.5) = 0) for every case.
[Plot, shown iteration by iteration from 1 through 23: average squared error (ASE) for the training and validation data versus optimization iteration. Training ASE decreases throughout; validation ASE reaches its minimum near iteration 12 and then worsens. Stopped training selects the weight estimates from the iteration with the smallest validation ASE (here, iteration 12). The final slide overlays the selected model's probability estimates (0.30 to 0.70) on the (x1, x2) input space.]
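The stopped-training idea can be sketched as a toy loop: after each optimization iteration, compute ASE on training and validation data and remember the weights from the iteration with the smallest validation ASE. The "model" and the random nudge standing in for an update step are illustrative only, not Enterprise Miner's actual optimizer.

```python
import math
import random

def ase(weights, data):
    """Average squared error of a tiny 1-input, 1-hidden-unit model."""
    total = 0.0
    for x, y in data:
        h = math.tanh(weights[0] + weights[1] * x)
        p = 1.0 / (1.0 + math.exp(-(weights[2] + weights[3] * h)))
        total += (y - p) ** 2
    return total / len(data)

random.seed(1)
train = [(random.random(), random.randint(0, 1)) for _ in range(50)]
valid = [(random.random(), random.randint(0, 1)) for _ in range(50)]

weights = [0.1, 0.2, 0.0, 0.0]   # stand-in initial weights
best_ase, best_weights = float("inf"), list(weights)
history = []
for iteration in range(1, 24):
    # Stand-in update step: nudge the weights (a real optimizer
    # would follow the log-likelihood gradient here).
    weights = [w + random.uniform(-0.05, 0.05) for w in weights]
    v = ase(weights, valid)
    history.append(v)
    if v < best_ase:             # track the best validation fit so far
        best_ase, best_weights = v, list(weights)

# The selected model is the one with minimum validation ASE,
# not the final iteration's weights.
print(best_ase)
```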
Increasing Network Flexibility
This demonstration illustrates how to further improve neural network performance.
Using the AutoNeural Tool (Self-Study)
This demonstration illustrates how to use the AutoNeural tool.

5.4 Other Modeling Tools (Self-Study)
Model Essentials – Rule Induction
Predict new cases.
Select useful inputs.
Optimize complexity.
Rule Induction Predictions

[Plot: prediction estimates (for example, 0.74 and 0.39) over the (x1, x2) input space]

Rips create prediction rules: a binary model sequentially classifies and removes correctly classified cases. A neural network then predicts the remaining cases.
Model Essentials – Dmine Regression
Predict new cases.
Select useful inputs.
Optimize complexity.
Dmine Regression Predictions

Interval inputs are binned and categorical inputs are grouped. Forward selection then picks from both the binned and the original inputs.
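The binning step described above can be sketched as follows. The bin count and equal-width rule here are illustrative placeholders, not Dmine Regression's actual defaults:

```python
def bin_interval_input(values, n_bins=4):
    """Split an interval input into equal-width bins labeled BIN1..BINn."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0   # guard against zero range
    # Map each value to its bin label; the maximum value lands in BINn.
    return [
        "BIN%d" % min(n_bins, int((v - lo) / width) + 1)
        for v in values
    ]

print(bin_interval_input([0.05, 0.30, 0.55, 0.80, 1.00]))
```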
Model Essentials – DMNeural
Predict new cases.
Select useful inputs.
Optimize complexity.
DMNeural Predictions

[Plot: prediction estimates over the (x1, x2) input space]

Up to three principal components with the highest target R square are selected. One of eight continuous transformations is selected and applied to the selected components. The process is repeated three times, using the residuals from each stage.
Model Essentials – Least Angle Regression
Predict new cases.
Select useful inputs.
Optimize complexity.
Least Angle Regression Predictions

[Plot: prediction estimates over the (x1, x2) input space]

Inputs are selected using a generalization of forward selection. By default, the input combination in the sequence with the optimal penalized validation assessment is selected.
Model Essentials – MBR
Predict new cases.
Select useful inputs.
Optimize complexity.
MBR Prediction Estimates

[Plot: prediction estimates over the (x1, x2) input space]

The sixteen nearest training data cases predict the target for each point in the input space. Scoring requires the training data and the PMBR procedure.
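The nearest-neighbor prediction described above can be sketched in a few lines. The synthetic training grid and the simple target rule (1 when x1 + x2 > 1) are stand-ins for real data:

```python
import math

def mbr_predict(point, training_data, k=16):
    """Estimate the target at a point from its k nearest training cases."""
    # Sort training cases by Euclidean distance to the query point.
    nearest = sorted(
        training_data,
        key=lambda case: math.dist(point, case[0]),
    )[:k]
    # Probability estimate: fraction of 1s among the k neighbors.
    return sum(target for _, target in nearest) / k

# Synthetic training grid over the unit square; target is 1 when x1 + x2 > 1.
train = [
    ((i / 10.0, j / 10.0), 1 if i + j > 10 else 0)
    for i in range(11)
    for j in range(11)
]
print(mbr_predict((0.9, 0.9), train))
```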
Model Essentials – Partial Least Squares
Predict new cases.
Select useful inputs.
Optimize complexity.
Partial Least Squares Predictions

[Plot: prediction estimates over the (x1, x2) input space]

Input combinations (factors) that optimally account for both predictor and response variation are successively selected. The factor count with the minimum validation PRESS statistic is selected. Inputs with small VIP are rejected for subsequent diagram nodes.
Exercises
This exercise reinforces the concepts discussed previously.
Neural Network Tool Review

Create a multi-layer perceptron on selected inputs. Control complexity with stopped training and hidden unit count.