Data Visualization and Feature Selection: New Algorithms for
Nongaussian Data
Howard Hua Yang and John Moody
NIPS’99
Contents
Data visualization: good 2-D projections for high-dimensional data interpretation
Feature selection: eliminate redundancy
Joint mutual information
ICA
Introduction
Visualization of input data and feature selection are intimately related.
Input variable selection is the most important step in the model selection process.
Model-independent approaches select input variables before model specification.
Data visualization is very important for humans to understand the structural relations among variables in a system.
Joint mutual information for input/feature selection
Mutual information
$$I(X_i;Y) = K\big(p(x_i,y)\,\|\,p(x_i)\,p(y)\big)$$
Kullback-Leibler divergence
$$K\big(p(x)\,\|\,q(x)\big) = \sum_x p(x)\log\frac{p(x)}{q(x)}$$
Joint mutual information
$$I(X_{i_1},\ldots,X_{i_k};Y) = K\big(p(x_{i_1},\ldots,x_{i_k},y)\,\|\,p(x_{i_1},\ldots,x_{i_k})\,p(y)\big)$$
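As a rough illustration of these definitions, the Python sketch below estimates the Kullback-Leibler divergence K(p || q) and the joint mutual information I(X_1,...,X_k; Y) from discrete samples using plug-in (frequency count) probabilities. The function names `kl_divergence` and `joint_mutual_information` and the toy data are assumptions made for this example and do not come from the slides; continuous inputs would first need to be discretized or handled with a density estimator.

```python
import math
from collections import Counter


def kl_divergence(p, q):
    """K(p || q) = sum_x p(x) log(p(x) / q(x)), in nats.

    p and q are dicts mapping outcomes to probabilities; q must be
    positive wherever p is positive.
    """
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)


def joint_mutual_information(xs, y):
    """Plug-in estimate of I(X_1, ..., X_k; Y) from samples.

    xs is a list of k equally long sequences of discrete values and
    y is the sequence of class labels. With k = 1 this is ordinary MI.
    """
    n = len(y)
    rows = list(zip(*xs))  # one tuple (x_1, ..., x_k) per sample
    p_xy = {key: c / n for key, c in Counter(zip(rows, y)).items()}
    p_x = {key: c / n for key, c in Counter(rows).items()}
    p_y = {key: c / n for key, c in Counter(y).items()}
    # Product of marginals, evaluated only on the support of the joint.
    p_x_p_y = {(x, label): p_x[x] * p_y[label] for (x, label) in p_xy}
    return kl_divergence(p_xy, p_x_p_y)


if __name__ == "__main__":
    x1 = [0, 0, 1, 1]
    x2 = [0, 1, 0, 1]
    y = [a ^ b for a, b in zip(x1, x2)]  # label is the XOR of the two inputs
    print("I(X1;Y)    =", joint_mutual_information([x1], y))
    print("I(X1,X2;Y) =", joint_mutual_information([x1, x2], y))
```

In this toy case each input alone carries no information about the label, while the pair determines it completely, which is the kind of situation where a joint criterion is more informative than a single-input one.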
Conditional MI
$$I(X_i;Y|X_j,X_k) = \sum_{x_j,x_k} p(x_j,x_k)\,K\big(p(x_i,y|x_j,x_k)\,\|\,p(x_i|x_j,x_k)\,p(y|x_j,x_k)\big)$$
$$I(X_1,\ldots,X_{n-1},X_n;Y) - I(X_1,\ldots,X_{n-1};Y) = I(X_n;Y|X_1,\ldots,X_{n-1}) \ge 0$$
When $I(X_1;Y) = I(X_2;Y) = I(X_3;Y)$:
$$I(X_1,X_2;Y) - I(X_1,X_3;Y) = I(X_2;Y|X_1) - I(X_3;Y|X_1)$$
Use the joint mutual information $I(X_i,X_j;Y)$ instead of the mutual information $I(X_i;Y)$ to select inputs for a neural network classifier and for data visualization.
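The relation between joint and conditional mutual information can be checked numerically. The Python sketch below is a minimal example under assumed toy data (three binary inputs, label equal to X1 XOR X2): all three inputs have the same single-input MI, yet the difference I(X1,X2;Y) - I(X1,X3;Y) equals I(X2;Y|X1) - I(X3;Y|X1) and is clearly nonzero. The helper names `mi` and `conditional_mi` are hypothetical and chosen for this illustration only.

```python
import math
from collections import Counter, defaultdict
from itertools import product


def mi(x, y):
    """Plug-in mutual information I(X; Y) in nats for discrete sequences."""
    n = len(y)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def conditional_mi(x, y, z):
    """I(X; Y | Z) = sum_z p(z) * I(X; Y | Z = z), estimated by grouping on z."""
    groups = defaultdict(list)
    for xi, yi, zi in zip(x, y, z):
        groups[zi].append((xi, yi))
    n = len(y)
    total = 0.0
    for pairs in groups.values():
        gx, gy = zip(*pairs)
        total += len(pairs) / n * mi(gx, gy)
    return total


def joint(*cols):
    """Combine several input columns into one column of tuples."""
    return list(zip(*cols))


# Toy data (an assumption): every combination of three bits, label = X1 XOR X2.
rows = list(product([0, 1], repeat=3))
x1, x2, x3 = (list(col) for col in zip(*rows))
y = [a ^ b for a, b in zip(x1, x2)]

single = [mi(x, y) for x in (x1, x2, x3)]  # all equal, so MI cannot rank them
lhs = mi(joint(x1, x2), y) - mi(joint(x1, x3), y)
rhs = conditional_mi(x2, y, x1) - conditional_mi(x3, y, x1)

print("single-input MI:", single)
print("JMI difference:", lhs, "conditional MI difference:", rhs)
```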
Data visualization methods
Supervised methods based on JMI (cf. CCA)
Unsupervised methods based on ICA (cf. PCA)
Efficient method for JMI
$$(i^*, j^*) = \arg\max_{(i,j)} I(X_i,X_j;Y)$$
$$I(X_i,X_j;Y) = I(X_i;Y) + I(X_j;Y|X_i)$$
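The pair selection can be sketched in Python as an exhaustive search over input pairs, scoring each pair by a plug-in estimate of I(X_i, X_j; Y). This is only an illustrative sketch: the function names `mi` and `best_pair` and the toy data are assumptions, and the brute-force loop shown here is not claimed to be the efficient procedure the slide refers to.

```python
import math
from collections import Counter
from itertools import combinations, product


def mi(x, y):
    """Plug-in mutual information I(X; Y) in nats for discrete sequences."""
    n = len(y)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def best_pair(features, y):
    """Score every pair (i, j) by I(X_i, X_j; Y) and return the best one.

    features is a list of equally long discrete sequences, one per input.
    Returns the winning index pair and the dict of all pair scores.
    """
    scores = {
        (i, j): mi(list(zip(features[i], features[j])), y)
        for i, j in combinations(range(len(features)), 2)
    }
    pair = max(scores, key=scores.get)
    return pair, scores


# Toy data (an assumption): three binary inputs over all 8 combinations;
# the 4-class label depends on inputs 0 and 1 only.
rows = list(product([0, 1], repeat=3))
features = [list(col) for col in zip(*rows)]
y = [2 * a + b for a, b, _ in rows]

pair, scores = best_pair(features, y)
print("selected pair:", pair)
print("pair scores:", scores)
```

The selected pair gives the two axes of a 2-D projection for visualization. The cost of the exhaustive search grows quadratically with the number of inputs, which is modest for a 15-dimensional input vector.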
Application to Signal Visualization and Classification
JMI and visualization of radar pulse patterns
Radar pattern: 15-dimensional vector, 3 classes
Compute JMIs, select inputs
Radar pulse classification
7 hidden units
Experiments
All inputs vs. 4 selected inputs
4 inputs with the largest JMI vs. 4 randomly selected inputs
Conclusions
Advantage of single JMI
Can distinguish inputs when all of them have the same mutual information
Can eliminate the redundancy in the inputs when one input is a function of other inputs
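Both advantages can be seen on a small constructed example. In the Python sketch below (toy data and the helper name `mi` are assumptions for illustration), X3 is a copy of X1 and the label is Y = X1 + X2, giving 3 classes. All three inputs have identical single-input MI, so MI alone cannot rank them, whereas the JMI of the pair (X1, X2) is larger than that of the redundant pair (X1, X3).

```python
import math
from collections import Counter


def mi(x, y):
    """Plug-in mutual information I(X; Y) in nats for discrete sequences."""
    n = len(y)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


# Toy data (an assumption): two independent bits and a redundant third input.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
x3 = list(x1)                        # X3 is a function (a copy) of X1
y = [a + b for a, b in zip(x1, x2)]  # 3 classes: 0, 1, 2

single = {name: mi(x, y) for name, x in (("X1", x1), ("X2", x2), ("X3", x3))}
jmi_12 = mi(list(zip(x1, x2)), y)    # informative pair
jmi_13 = mi(list(zip(x1, x3)), y)    # redundant pair

print("single-input MI:", single)
print("I(X1,X2;Y) =", jmi_12)
print("I(X1,X3;Y) =", jmi_13)
```

Because X3 adds nothing once X1 is known, I(X1,X3;Y) equals I(X1;Y), so a JMI-based ranking places the redundant pair below the informative one.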