Post on 27-Mar-2015
Negative Selection Algorithms at GECCO 2005
7/22/2005
AIS track of GECCO 2005
• 11 regular papers
– 5 “negative selection algorithm” related
– 3 “immune network model” related
– multi-agent simulation, gene library, antigenic search
• 2 posters
– Immune network model, clonal selection
Papers on “Negative selection algorithms”
• Ji & Dasgupta, “Estimating the detector coverage in a negative selection algorithm”
• Gonzalez et al, “Discriminating and visualizing anomalies using negative selection algorithm and self-organizing maps”
• Stibor et al, “Is negative selection appropriate for anomaly detection?”
• Shapiro et al, “An evolutionary algorithm to generate hyper-ellipsoid detectors for negative selection”
• Hang et al, “Applying both positive and negative selection to supervised learning for anomaly detection”
“Discriminating and visualizing anomalies using negative selection algorithm and self-organizing maps”
Main Idea:
• Combination of NS and SOM (self-organizing map)
• Visualize the anomalies
Key feature
• Using negative selection to produce artificial anomalies instead of detectors
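The key feature above can be illustrated with a minimal sketch. It assumes self samples scaled to the unit hypercube, Euclidean distance, and a hand-picked self radius. It uses plain random rejection, not the RRNS algorithm the paper relies on, and the names `generate_artificial_anomalies` and `self_radius` are hypothetical.

```python
import math
import random


def generate_artificial_anomalies(self_samples, n_anomalies, self_radius,
                                  seed=0, max_tries=100_000):
    """Sample random points in [0, 1]^n and keep those outside every self hypersphere."""
    rng = random.Random(seed)
    dim = len(self_samples[0])
    anomalies = []
    for _ in range(max_tries):
        if len(anomalies) == n_anomalies:
            break
        candidate = [rng.random() for _ in range(dim)]
        # Negative selection step: discard any candidate that lies within
        # self_radius of a self sample; survivors become artificial anomalies.
        if all(math.dist(candidate, s) > self_radius for s in self_samples):
            anomalies.append(candidate)
    return anomalies


# Illustrative data only (not from the paper).
self_samples = [[0.5, 0.5], [0.55, 0.45], [0.45, 0.55]]
anomalies = generate_artificial_anomalies(self_samples, n_anomalies=20, self_radius=0.2)
```

The surviving points are kept as labeled non-self training data rather than being used as detectors, which is the difference from a conventional negative selection scheme.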
SOM
• A type of neural network
• To capture the features in the input and to provide a structural representation
• Output neurons are organized in a one- or two-dimensional lattice
• The weight vectors of these neurons represent prototypes (cluster centroids)
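The SOM bullets above can be sketched as a small training loop. This is an illustrative pure-Python version under common textbook assumptions (Gaussian neighborhood, linearly decaying learning rate and radius). It is not SOM-PAK, and all function and parameter names are hypothetical.

```python
import math
import random


def best_matching_unit(weights, x):
    """Lattice coordinates of the node whose weight vector is closest to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))


def train_som(samples, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM; returns a dict mapping (row, col) -> weight vector."""
    rng = random.Random(seed)
    dim = len(samples[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # Output neurons sit on a 2-D lattice; each holds one prototype vector.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(samples, len(samples)):
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                    # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)    # neighborhood radius shrinks
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * sigma ** 2))
                # Pull the node's prototype toward the input, scaled by lattice proximity.
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


# Two synthetic clusters (illustrative data only).
_rng = random.Random(1)
samples = ([[0.1 + _rng.uniform(-0.05, 0.05), 0.1 + _rng.uniform(-0.05, 0.05)] for _ in range(20)]
           + [[0.9 + _rng.uniform(-0.05, 0.05), 0.9 + _rng.uniform(-0.05, 0.05)] for _ in range(20)])
weights = train_som(samples, rows=4, cols=4)
```

After training, `best_matching_unit(weights, x)` maps an input to a lattice node, which is what makes a grid-based visualization possible.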
Three phases of NS-SOM
NS-SOM model
• “training SOM with only normal samples will produce a map that only reflect the structure of the self space, ignoring the non-self space”
• N-dimensional real-valued
• During the second phase: if the input samples are labeled, … (moving to the third phase).
• The first phase is executed just once, but the second and third phases could be executed as many times as sets of new samples are available
• Visual representation by a 2-D grid corresponding to the network
SOM output
• “A visual representation of the feature (self/non-self) space could be generated by drawing the 2-dimensional grid corresponding to the network, and assigning each node a different color depending on the category it represents (normal, unknown anomaly, or known anomaly).”
• “Two different SOM topologies were used with a rectangular output layer of 8×8 and 16×16 nodes.”
Output visualization
• Implementation
– NS: RRNS algorithm by Gonzalez et al
– SOM: using the SOM-PAK package by Helsinki University of Technology http://www.cis.hut.fi/
• Experiments
– Iris data set
– Wisconsin Breast Cancer data set
“Is negative selection appropriate for anomaly detection?”
• Problems in negative selection (specific schemes and applications)
• Compare with SVM (Support Vector Machine): requiring examples of one class or two classes?
• General problem : candidates are generated by a simple random search
• Shape space <-> affinity
• “holes are necessary, to generalizing beyond training set”
– No holes: overfitting
– Too many holes: underfitting
Criticism for binary representation
• “the hamming shape-space and the r-chunk matching rule only appropriate and applicable for anomaly detection problems for a small value of l (e.g. 0<l<32)”
– Based entirely on Esponda et al’s analysis of the number of holes
* Although I want to focus on introducing rather than criticizing this work, the authors seem to confuse Hamming and r-chunk matching.
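To make “holes” concrete, here is a small sketch of r-chunk matching on binary strings. It assumes the usual definition (a detector is a position plus a chunk of r bits, and it matches a string that carries exactly that chunk at that position) and enumerates everything exhaustively, which is only feasible for tiny l. The function names and the example self set are hypothetical.

```python
from itertools import product


def rchunk_detectors(self_set, l, r):
    """All r-chunk detectors (position, chunk) that match no self string."""
    detectors = set()
    for pos in range(l - r + 1):
        self_chunks = {s[pos:pos + r] for s in self_set}
        for bits in product("01", repeat=r):
            chunk = "".join(bits)
            if chunk not in self_chunks:      # negative selection: drop self-matching chunks
                detectors.add((pos, chunk))
    return detectors


def is_detected(s, detectors, r):
    """True if any window of s coincides with a detector at the same position."""
    return any((pos, s[pos:pos + r]) in detectors for pos in range(len(s) - r + 1))


def find_holes(self_set, l, r):
    """Non-self strings that no valid detector can match."""
    detectors = rchunk_detectors(self_set, l, r)
    universe = ("".join(b) for b in product("01", repeat=l))
    return [s for s in universe if s not in self_set and not is_detected(s, detectors, r)]


holes = find_holes({"0000", "0111", "1100"}, l=4, r=2)
```

For this self set, `holes` is `['0100', '1111']`: every length-2 window of those two strings also occurs at the same position in some self string, so no detector that survives negative selection can match them.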
Criticism for real-valued representation
• Positive selection (Self Detection Classification) is more straightforward.
• It is not clear how to choose the self radius.
– “From our point of view, it is an approach which requires two classes in the learning phase in order to determine the self-radius.” (no reason given)
• Finding an optimal distribution of the detectors is a problem (Gonzalez et al’s method takes “a vast amount of time”).
Occam’s razor principle
• When two competing theories make exactly the same predictions, the simpler one is better.
Comparison with SVM
• SVM is a machine learning algorithm for a two-class classification problem.
• The input data is mapped into a higher-dimensional feature space, where a linear decision region is constructed.
• A one-class SVM was proposed by Schölkopf et al.
– Provides good results in high-dimensional space (no detail or results provided)
Summary
• Unfortunately, the paper cites several related works and then makes a scary claim.
• Little was done to analyze or propose alternatives, except proposing “Self Detector Classification”: detection by directly checking all training samples.
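Read literally, “detection by directly checking all training samples” amounts to a nearest-sample test. A minimal sketch follows, assuming Euclidean distance and a self radius supplied by the user (how to choose it is exactly the open question raised earlier). The function name is hypothetical and this is not the authors' code.

```python
import math


def self_detector_classify(sample, training_self, self_radius):
    """Return 'self' if the sample lies within self_radius of any training sample."""
    for s in training_self:
        if math.dist(sample, s) <= self_radius:
            return "self"
    return "non-self"


# Illustrative data only.
training_self = [[0.2, 0.2], [0.25, 0.3], [0.3, 0.2]]
label_near = self_detector_classify([0.22, 0.21], training_self, self_radius=0.1)
label_far = self_detector_classify([0.9, 0.9], training_self, self_radius=0.1)
```

Each query costs one pass over the training set, with no detector generation phase at all.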
“Applying both positive and negative selection to supervised learning for anomaly detection”
• Use synthetic anomalies to deal with anomaly detection (supervised learning from class-imbalanced data sets)
– GA: positive selection
– Synthetic data: negative selection
• Categorical/discrete data
Two categories of methods
• At data level: mainly focusing on re-sampling
– Under-sampling the normal class
– Over-sampling the anomaly class
– Combination
• At algorithm level
Other works using this strategy
• Gonzalez et al
• SMOTE (Synthetic Minority Over-sampling TEchnique)
– “taking each minority class sample and introducing synthetic examples along the line segment joining any/all of the k minority class nearest neighbors.”
How SMOTE generates synthetic samples
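The quoted description of SMOTE can be sketched as follows. This is a simplified illustration for continuous features, assuming Euclidean distance and uniform random interpolation. It is not the reference implementation, and the function name and parameters are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating toward nearest minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # k nearest minority-class neighbors of x (x itself excluded).
        neighbors = sorted((m for m in minority if m is not x),
                           key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # position along the line segment from x to nb
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


# Illustrative minority class only.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_synthetic=10, k=2)
```

Every synthetic point lies on a segment between two existing minority samples, so the minority region is filled in rather than extended.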
Phase 1: co-evolving patterns of the normal data (positive selection)
• A number of non-interbreeding subpopulations: no cooperation, no competition
• Randomly initialized
• All converged schemata together form the decision boundary.
• Individuals consist of four sections:
• Fitness-proportionate selection
• Uniform crossover
• Bit-flipping mutation
• Subpopulation size = 100
• Crossover rate = 0.65
• Mutation rate = 0.15
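The operators and rates listed above can be sketched for one subpopulation as follows. The four-section encoding and the paper's fitness function are not reproduced here: individuals are plain bit lists, the fitness is a placeholder, and treating the mutation rate as per-bit is an assumption. All names are hypothetical.

```python
import random

SUBPOP_SIZE = 100
CROSSOVER_RATE = 0.65
MUTATION_RATE = 0.15   # assumed to be a per-bit probability


def roulette_select(population, fitnesses, rng):
    """Fitness-proportionate selection of one individual."""
    return rng.choices(population, weights=fitnesses, k=1)[0]


def uniform_crossover(a, b, rng):
    """Swap each bit position between the two parents with probability 0.5."""
    c1, c2 = list(a), list(b)
    for i in range(len(a)):
        if rng.random() < 0.5:
            c1[i], c2[i] = c2[i], c1[i]
    return c1, c2


def bit_flip_mutation(ind, rng, rate=MUTATION_RATE):
    """Flip each bit independently with the given probability."""
    return [1 - bit if rng.random() < rate else bit for bit in ind]


def next_generation(population, fitness_fn, rng):
    """One generation of a single, non-interbreeding subpopulation."""
    fitnesses = [fitness_fn(ind) for ind in population]
    new_pop = []
    while len(new_pop) < len(population):
        p1 = roulette_select(population, fitnesses, rng)
        p2 = roulette_select(population, fitnesses, rng)
        if rng.random() < CROSSOVER_RATE:
            c1, c2 = uniform_crossover(p1, p2, rng)
        else:
            c1, c2 = list(p1), list(p2)
        new_pop.extend([bit_flip_mutation(c1, rng), bit_flip_mutation(c2, rng)])
    return new_pop[:len(population)]


rng = random.Random(0)
population = [[rng.randint(0, 1) for _ in range(16)] for _ in range(SUBPOP_SIZE)]
# Placeholder fitness (number of ones, plus 1 to keep all weights positive).
population = next_generation(population, lambda ind: 1 + sum(ind), rng)
```

Running several such subpopulations independently, each from its own random start, matches the “no cooperation, no competition” setup.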
Phase 2: synthetic generation of anomalous samples
• Strategy 1: with seed
– Starting with vacant neighbors of the examples of the anomaly class
  • 2n neighbors for n-dimensional
  • “Vacant” means neither normal nor anomaly
– Check if candidates are covered by schema of normal class. Those covered are removed.
• Strategy 2: without seed (in the case of no anomaly examples)
– Starting with random position
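Strategy 1 can be sketched as below. The sketch assumes integer-coded discrete attributes, takes the 2n neighbors to be the ±1 steps along each dimension, and represents a schema as a tuple in which `None` is a wildcard. These representations and all names are assumptions for illustration, and value ranges are not bounds-checked.

```python
def covers(schema, point):
    """A schema covers a point if every non-wildcard position matches."""
    return all(s is None or s == p for s, p in zip(schema, point))


def axis_neighbors(point):
    """Yield the 2n neighbors of an n-dimensional point (one step along each axis)."""
    for i in range(len(point)):
        for delta in (-1, 1):
            yield point[:i] + (point[i] + delta,) + point[i + 1:]


def synthetic_anomalies(anomaly_seeds, normal, schemata):
    """Vacant neighbors of anomaly seeds that no normal-class schema covers."""
    normal, known = set(normal), set(anomaly_seeds)
    candidates = set()
    for seed in anomaly_seeds:
        for nb in axis_neighbors(seed):
            if nb not in normal and nb not in known:   # "vacant"
                candidates.add(nb)
    # Negative selection: remove candidates covered by a schema of the normal class.
    return sorted(c for c in candidates if not any(covers(s, c) for s in schemata))


result = synthetic_anomalies(
    anomaly_seeds=[(2, 2)],
    normal=[(2, 3)],
    schemata=[(1, None)],
)
```

Here `(2, 3)` is dropped because it is a normal example and `(1, 2)` because the schema `(1, None)` covers it, leaving `[(2, 1), (3, 2)]`.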
Experiments
• UCI data sets: 14 used
• Multi-class data are mapped into a 2-class dataset
– Version 1: natural distribution
– Version 2: balanced natural distribution
– Version 3: balanced extreme distribution
(“balanced” means “processed by the approach described in this paper”)
• Classifiers used: C4.5 and Naive Bayes
• Result: v2>v3>>v1