Feature Selection in Machine Learning
Data Analytics & Machine Learning - MCS4102
Assignment 2
Feature Selection with the Trajan Simulator
U.V. Vandebona
Content
• Feature Selection
• Dataset 1 - Iris Dataset
  • Forward Selection
  • Backward Selection
  • Genetic Algorithm
• Dataset 2 - Abalone Dataset
• Dataset 3 - Custom Dataset
Data Set (1) - Iris
The data set contains 3 classes of 50 instances each, where each class refers to a type of iris flower.
Data Set (1) - Iris
Attribute Information (all measurements in centimeters):
› Sepal length
› Sepal width
› Petal length
› Petal width
› Flower class
Ex:
5.3,3.7,1.5,0.2,Iris-setosa
5.0,3.3,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
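As an aside, this comma-delimited format can be read with nothing more than Python's standard csv module. A minimal sketch, using the six sample records above; the variable names are our own choice, since the file itself has none:

```python
import csv
from io import StringIO

# Six sample records from the slide, in the raw comma-delimited format
# (no header row, no case names).
raw = """5.3,3.7,1.5,0.2,Iris-setosa
5.0,3.3,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica"""

# Supply variable names ourselves, as the dataset includes none.
names = ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth", "Class"]
rows = [dict(zip(names, r)) for r in csv.reader(StringIO(raw))]

print(rows[0]["Class"])               # Iris-setosa
print(float(rows[2]["PetalLength"]))  # 4.7
```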
Iris Class Features
1. One of the classes (Iris Setosa) is linearly separable from the other two. However, the other two classes are not linearly separable.
2. There is some overlap between the Versicolor and Virginica classes, so it is impossible to achieve a perfect classification rate.
3. There is some redundancy in the four input variables, so it is possible to achieve a good solution with only three of them, or even (with difficulty) with two.
Import and Setup Data
Import and Setup Data
1. The Iris dataset is a simple file whose values are delimited by commas.
2. The dataset doesn't include any variable names or case names.
3. We can edit the dataset to give the variables proper names.
› The Class field is automatically turned into a nominal field, as it contains only three nominal values.
Feature Selection Analysis
From the available variables, set the dependent and independent (output and input) variables.
Dependent variable: Class
Independent variables: Sepal Length, Sepal Width, Petal Length, Petal Width
Why Feature Selection?
The dependent (output) variable states which flower class the record belongs to: either Virginica, Versicolor or Setosa.
The independent (input) variables are used to predict that decision.
Typically we do not have a strong idea of the relationship between the available variables and the desired prediction.
Why Feature Selection?
To an extent, some neural network architectures (e.g., multilayer perceptrons) can actually learn to ignore useless variables.
However, other architectures (e.g., radial basis functions) are adversely affected, and in all cases a larger number of inputs implies that a larger number of training cases is required.
As a rule of thumb, the number of training cases should be a good few times larger than the number of weights in the network, to prevent over-learning.
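To make the rule of thumb concrete, here is a quick back-of-the-envelope count for a hypothetical 4-6-3 multilayer perceptron on the Iris data (the layer sizes and the 5x factor are illustrative assumptions, not figures from this assignment):

```python
# Hypothetical MLP for the Iris problem: 4 inputs, 6 hidden units, 3 outputs.
inputs, hidden, outputs = 4, 6, 3

# Count the weights, including one bias per hidden and per output unit.
weights = (inputs + 1) * hidden + (hidden + 1) * outputs
print(weights)      # 51

# "A good few times" more training cases than weights; taking 5x as an
# illustrative factor:
print(5 * weights)  # 255 -- already more than the 150 cases Iris provides
```

This shows why fewer inputs help: each input removed shrinks the network and, with it, the number of training cases needed.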
Why Feature Selection?
As a consequence, the performance of a network can be improved by reducing the number of inputs, sometimes even at the cost of losing some input information.
› In many problem domains, a range of input variables is available which may be used to train a neural network, but it is not clear which of them are most useful, or indeed are needed at all.
Why Feature Selection?
In non-linear problems, there may be interdependencies and redundancies between variables;
› for example, a pair of variables may be of no value individually, but extremely useful in conjunction, or any one of a set of parameters may be useful.
› It is not possible, in general, to simply rank parameters in order of importance.
Why Feature Selection?
The "curse of dimensionality" means that it is sometimes actually better to discard some variables that do have genuine information content, simply to reduce the total number of input variables, and therefore the complexity of the problem and the size of the network.
Counter-intuitively, this can actually improve the network's generalization capabilities.
Why Feature Selection?
The only method that is guaranteed to select the best input set is to train networks with all possible input sets and all possible architectures, and to select the best.
› In practice, this is impossible for any significant number of candidate inputs.
If you wish to examine the selection of variables more closely yourself, Feature Selection is a good technique.
Feature Selection
The Feature Selection algorithms conduct a large number of experiments with different combinations of inputs, building probabilistic or generalized regression networks for each combination, evaluating the performance, and using this to further guide the search.
This is essentially a "brute force" technique, although the guided search may find good combinations much faster than exhaustive testing.
Feature Selection
These algorithms explicitly identify input variables that do not contribute significantly to the performance of the networks, and then suggest removing them.
The algorithms are either stepwise algorithms that progressively add or remove variables, or genetic algorithms.
Sampling - Random
Randomized subset assignment to the train, select and test subsets.
Sampling - Fixed
Fixed subset assignment to the train, select and test subsets.
› Add a column containing the nominal values "Train", "Select", "Test" and "Ignore". To generate the values, the support of a spreadsheet package may be needed. Name the column NNSET.
Sampling - Fixed
Run the Feature Selection once, and these subsets will be assigned thereafter.
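The randomized assignment can also be sketched in code: the function below labels each case Train, Select or Test, much like the NNSET column described above (the 50/25/25 proportions are an assumption for illustration, not Trajan's defaults):

```python
import random

random.seed(0)  # fixed seed so the assignment is reproducible

def assign_nnset(n_cases, p_train=0.5, p_select=0.25):
    """Randomly assign each case to the Train, Select or Test subset."""
    labels = []
    for _ in range(n_cases):
        r = random.random()
        if r < p_train:
            labels.append("Train")
        elif r < p_train + p_select:
            labels.append("Select")
        else:
            labels.append("Test")
    return labels

nnset = assign_nnset(150)  # e.g. the 150 Iris cases
print(nnset[:5])
```

Writing such labels out as an extra column reproduces the fixed-subset workflow with a spreadsheet, while regenerating them each run mimics the random option.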
Sampling
A major problem with neural networks is the generalization issue (the tendency to overfit the training data), accompanied by the difficulty in quantifying likely performance on new data.
It is important to have ways to estimate the performance of the models on new data, and to be able to select among them.
Most work on assessing performance in neural modeling concentrates on approaches to resampling.
Sampling
Typically the neural network is trained using a training subset.
The test subset is used to perform an unbiased estimation of the network's likely performance.
Sampling
Often, a separate subset (the selection subset) is used to halt training to mitigate over-learning, or to select from a number of models trained with different parameters. It keeps an independent check on the performance of the networks during training, with deterioration in the selection error indicating over-learning.
If over-learning occurs, training is stopped and the network is restored to the state with the minimum selection error.
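This stop-and-restore behaviour amounts to early stopping on the selection error. A minimal sketch, where `train_epoch` and `selection_error` are hypothetical stand-ins for one training pass and the selection-subset evaluation (the patience threshold is an illustrative assumption):

```python
def train_with_early_stopping(train_epoch, selection_error, max_epochs, patience=5):
    """Stop when the selection error has not improved for `patience` epochs,
    and return the state that achieved the minimum selection error."""
    best_err, best_state, since_best = float("inf"), None, 0
    for epoch in range(max_epochs):
        state = train_epoch(epoch)      # one training pass; returns network state
        err = selection_error(state)    # error on the selection subset
        if err < best_err:
            best_err, best_state, since_best = err, state, 0
        else:
            since_best += 1             # selection error deteriorating
            if since_best >= patience:  # over-learning: stop training
                break
    return best_state, best_err         # i.e., restore the best state found

# Demo with a fake error curve that falls, then rises (over-learning):
errs = [5, 4, 3, 2, 3, 4, 5, 6, 7, 8, 9, 10]
state, err = train_with_early_stopping(lambda e: e, lambda s: errs[s], 12)
print(state, err)  # 3 2 -- the epoch at the minimum, not the last one
```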
Feature Selection – Results Configuration
Feature Selection – Results Configuration
6. In the results shown after the analysis, each row represents a particular test of a combination of inputs; together, the rows cover every combination of inputs tested.
7. It is sometimes a good idea to reduce the number of input variables to a network even at the cost of a little performance, as this improves the generalization capability and decreases the network's size and execution time.
Feature Selection – Results Configuration
You can apply some extra pressure to eliminate unwanted variables by assigning a Unit Penalty.
› This is multiplied by the number of units in the network and added to the error level in assessing how good a network is, and thus penalizes larger networks.
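A sketch of how such a penalty combines with the error; the unit counts and penalty value here are made-up illustrations, not Trajan's internals:

```python
def penalized_score(selection_error, n_units, unit_penalty=0.001):
    """Raw selection error plus penalty * number of units, so larger
    networks score worse (lower scores are better)."""
    return selection_error + unit_penalty * n_units

# Two hypothetical candidate networks: the smaller one wins once the
# penalty is applied, despite a slightly higher raw error.
small = penalized_score(0.050, n_units=10)  # 0.050 + 0.010 = 0.060
large = penalized_score(0.045, n_units=40)  # 0.045 + 0.040 = 0.085
print(small < large)  # True
```

Raising `unit_penalty` applies more pressure toward small input sets; setting it to zero ranks purely by error.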
Feature Selection – Results Configuration
If there are a large number of cases, the evaluations performed by the feature selection algorithms can be very time-consuming (the time taken is proportional to the number of cases).
› For this reason, you can specify a sub-sampling rate. (However, as we have very few cases here, the default sampling rate of 1.0 is fine.)
Forward Selection
Begins by locating the single input variable that, on its own, best predicts the output variable.
It then checks for the second variable that, added to the first, most improves the model.
The process repeats until either all variables have been selected or no further improvement is made.
Good for a larger number of variables.
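The stepwise procedure can be sketched as a greedy loop over candidate subsets. Here `score` stands in for training and evaluating a network on a subset (lower is better); the toy scoring function, in which only the petal measurements are informative, is an assumption for demonstration:

```python
def forward_selection(features, score):
    """Greedily add the feature that most reduces the score; stop when
    no addition improves on the current best."""
    selected, best_err = [], float("inf")
    while len(selected) < len(features):
        remaining = [f for f in features if f not in selected]
        # Try adding each remaining feature to the current set.
        trial_errs = {f: score(selected + [f]) for f in remaining}
        best_f = min(trial_errs, key=trial_errs.get)
        if trial_errs[best_f] >= best_err:
            break                      # no further improvement: stop
        selected.append(best_f)
        best_err = trial_errs[best_f]
    return selected, best_err

# Toy score: pretend only the petal measurements carry information,
# so the error drops by 0.4 for each one included.
def toy_score(subset):
    informative = {"PetalWidth", "PetalLength"}
    return 1.0 - 0.4 * len(informative & set(subset))

feats = ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]
selected, err = forward_selection(feats, toy_score)
print(selected, err)
```

On this toy score the search picks the two petal features and then halts, since adding either sepal feature leaves the error unchanged.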
Forward Selection
Generally faster. Much faster if there are few relevant variables, as it will locate them at the beginning of its search.
Behaves sensibly when the data set has a large number of variables, as it selects variables early on.
It may miss key variables if they are interdependent (that is, where two or more variables must be added at the same time in order to improve the model).
Results
The row label indicates the stage (e.g., 2.3 indicates the third test in stage 2). The final row replicates the best result found, for convenience. The first column is the selection error of the Probabilistic Neural Network (PNN) or Generalized Regression Neural Network (GRNN). Subsequent columns indicate which inputs were selected for that particular combination.
Results
Penalty values tested: 0, 0.001, 0.002, 0.003, 0.005 and 0.012.
Conclusion: Considering the span of the error values across these penalty settings, Petal Width and Petal Length are good features to keep if the number of input features needs to be reduced.
Backward Selection
The reverse process: starts with a model including all the variables, and then removes them one at a time.
At each stage, it finds the variable that, when removed, least degrades the model.
Good for a smaller number of variables (20 or fewer).
Backward Selection
Doesn't suffer from missing key variables.
As it starts with the whole set of variables, the initial evaluations are the most time-consuming.
Suffers with a large number of variables, especially if there are only a few weakly predictive ones in the set.
It does not cut out the irrelevant variables until the very end of its search.
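Backward elimination can be sketched the same way as forward selection: start from the full set and repeatedly drop the feature whose removal least degrades the score. As before, `score` and the toy scoring function are stand-ins assumed for demonstration:

```python
def backward_selection(features, score):
    """Greedily remove the feature whose removal least degrades the
    score; stop once every removal would make the score worse."""
    selected = list(features)
    best_err = score(selected)
    while len(selected) > 1:
        # Score the set with each feature removed in turn.
        trial_errs = {f: score([g for g in selected if g != f])
                      for f in selected}
        drop_f = min(trial_errs, key=trial_errs.get)
        if trial_errs[drop_f] > best_err:
            break                      # every removal hurts: stop
        selected.remove(drop_f)
        best_err = trial_errs[drop_f]
    return selected, best_err

# Toy score: only the petal measurements carry information.
def toy_score(subset):
    informative = {"PetalWidth", "PetalLength"}
    return 1.0 - 0.4 * len(informative & set(subset))

feats = ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]
selected, err = backward_selection(feats, toy_score)
print(selected, err)
```

Both sepal features are dropped without cost, and the search stops when removing either petal feature would raise the error.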
Results
Penalty values tested: 0, 0.001, 0.002, 0.003, 0.004 and 0.012.
Conclusion: Considering the span of the error values across these penalty settings, Petal Width, Petal Length and Sepal Length are good features to keep if the number of input features needs to be reduced.
Genetic Algorithm
An optimization algorithm.
Genetic algorithms are a particularly effective search technique for combinatorial problems (where a set of interrelated yes/no decisions needs to be made).
The method is time-consuming (it typically requires building and testing many thousands of networks).
Genetic Algorithm
For reasonably-sized problem domains (perhaps 50-100 possible input variables, and cases numbering in the low thousands), the algorithm can be run effectively overnight or over a weekend on a fast PC.
With sub-sampling, it can be applied in minutes or hours, although at the cost of reduced reliability for very large numbers of variables.
Genetic Algorithm
Run with the default settings, it would perform 10,000 evaluations (a population of 100 times 100 generations).
Since our problem has only 4 candidate inputs, the total number of possible combinations is only 16 (2 raised to the 4th power).
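The yes/no decision per input maps naturally onto a bit mask, which is what a genetic algorithm evolves. A minimal sketch; the toy fitness function and the small population/generation counts are assumptions for illustration, not Trajan's defaults (which, per the above, are a population of 100 over 100 generations):

```python
import random

random.seed(1)  # fixed seed for reproducibility
FEATS = ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]

def fitness(mask):
    # Toy score (lower is better): only the petal features (indices 2, 3)
    # are treated as informative, with a small penalty per selected input.
    chosen = {i for i, bit in enumerate(mask) if bit}
    return 1.0 - 0.4 * len({2, 3} & chosen) + 0.01 * len(chosen)

def evolve(pop_size=20, generations=30, p_mut=0.1):
    """Evolve bit masks over the candidate inputs by selection,
    one-point crossover and bit-flip mutation."""
    pop = [[random.randint(0, 1) for _ in FEATS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]             # keep the better half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(FEATS))  # one-point crossover
            child = [bit ^ (random.random() < p_mut)  # bit-flip mutation
                     for bit in a[:cut] + b[cut:]]
            children.append(child)
        pop = parents + children                   # elitist replacement
    return min(pop, key=fitness)

best = evolve()
print([f for f, bit in zip(FEATS, best) if bit])
```

With elitist replacement the best mask never worsens between generations. On this 4-input problem the search space is only 16 masks, so exhaustive evaluation would of course be simpler; the approach pays off with 50-100 candidate inputs.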
Results
Penalty values tested: 0, 0.003, 0.004 and 0.012.
Conclusion: Considering the span of the error values across these penalty settings, Petal Width, Petal Length and Sepal Length are good features to keep if the number of input features needs to be reduced.
Data Set (2) - Abalone
The age of an abalone can be determined by counting the number of rings on its shell.
The number of rings is the value to predict from physical measurements.
Data Set (2) - Abalone

| Attribute Name | Data Type | Measurement Unit | Description |
| --- | --- | --- | --- |
| Sex | nominal | | M, F, and I (infant) |
| Length | continuous | mm | longest shell measurement |
| Diameter | continuous | mm | perpendicular to length |
| Height | continuous | mm | with meat in shell |
| Whole weight | continuous | grams | whole abalone |
| Shucked weight | continuous | grams | weight of meat |
| Viscera weight | continuous | grams | gut weight (after bleeding) |
| Shell weight | continuous | grams | after being dried |
| Rings | integer | | +1.5 gives the age in years |
Results - Forward Selection
Conclusion: Sex, Whole Weight and Shell Weight are good features to keep if the number of input features needs to be reduced.
Height doesn't give any useful effect.
High penalty: 0.001
Low penalty: 0.0001
Results - Backward Selection
Conclusion: Sex, Whole weight and Shell weight are good features to keep if the number of input features needs to be reduced.
Height doesn't give any useful effect.
High penalty: 0.001
Low penalty: 0.0001
Results - Genetic Algorithm
Conclusion: Sex, Whole weight and Shell weight are good features to keep if the number of input features needs to be reduced.
Height doesn't give any useful effect.
Sampling rate: 0.1
High penalty: 0.001
Low penalty: 0.0001
Data Set (3) - Custom
4 Classes: C1, C2, C3, C4
17 Attribute Features: F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12, F13, F14, F15, F17, F18
Forward Selection Results
High penalty: 0.001
Low penalty: 0.0001
Conclusion: Feature F14 is a good feature to keep if the number of input features needs to be reduced.
Features F2, F5, F6, F7, F9 and F11 don't give any useful effect.
Backward Selection Results
High penalty: 0.001
Low penalty: 0.0001
Conclusion: Feature F14 is a good feature to keep if the number of input features needs to be reduced.
Features F2, F5, F6, F7, F9 and F11 don't give any useful effect.
Genetic Algorithm Results
High penalty: 0.001
Low penalty: 0.0001
Conclusion: Feature F14 is a good feature to keep if the number of input features needs to be reduced.
Features F2, F5, F6, F7, F9 and F11 don't give any useful effect.
Reference
• http://archive.ics.uci.edu/ml/datasets/Iris [Online, 2015-10-25]
• http://archive.ics.uci.edu/ml/datasets/Abalone [Online, 2015-10-25]
• Trajan Neural Network Simulator Help