
Stable Feature Selection for Biomarker Discovery

Name: Goutham Reddy BakaramStudent Id: 20299604Instructor Name: Dr. Dongchul Kim

Review Article by Zengyou He, Weichuan Yu @ School of software, Dalian University of Technology, China


Contents

1. Abstract

2. Introduction

3. Causes of Instability

4. Existing Methods

5. Stability Measure

6. Analysis

7. Future Work

8. Conclusions


Abstract

1. The article reviews existing stable feature selection methods for biomarker discovery, using a generic hierarchical framework.

2. Two objectives:

• Providing an overview of this young but fast-growing topic for convenient reference.

• Categorizing existing methods into a framework for future research and development.

Keywords: Feature selection; Biomarker Discovery; Stability; Machine learning


Introduction

What is Feature Selection?

• In machine learning, feature selection is the process of selecting a subset of relevant features for use in model construction. It is used for three reasons:

1. Simplification of models

2. Shorter training times

3. Enhanced generalization by reducing overfitting

• The discovery of biomarkers from high-throughput “omics” data is typically modeled as selecting the most discriminating features (or variables) for classification (e.g., discriminating healthy versus diseased samples, or different tumor stages).

• The stability issue in feature selection has received much attention recently. In this article, we shall review existing methods for stable feature selection in biomarker discovery applications, summarize them with a unified framework, and provide a convenient reference for future research and development.
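To make the idea concrete, here is a minimal sketch of filter-style feature selection. The scoring rule (absolute difference of class means) and the toy data are illustrative assumptions, not the specific methods surveyed in the article.

```python
# Minimal filter-style feature selection sketch (illustrative only):
# score each feature by the absolute difference between class means,
# then keep the indices of the top-k features.

def class_mean_diff_scores(X, y):
    """Score feature j by |mean over class 1 - mean over class 0|."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return scores

def select_top_k(X, y, k):
    """Return indices of the k highest-scoring features."""
    scores = class_mean_diff_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]

# Toy data: only feature 1 separates healthy (0) from diseased (1).
X = [[1.0, 0.1, 5.0],
     [1.1, 0.2, 5.1],
     [0.9, 0.9, 4.9],
     [1.0, 1.0, 5.0]]
y = [0, 0, 1, 1]
print(select_top_k(X, y, 1))  # feature 1 has the largest class-mean gap
```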


Causes of Instability

There are mainly three sources of instability in biomarker discovery:

1. Algorithm design without considering stability: Classic feature selection methods aim at selecting a minimum subset of features that yields a classifier with the best predictive accuracy; stability of the selected subset is simply not part of the objective.

2. The existence of multiple sets of true markers: It is possible that there exist multiple sets of potential true markers in real data. On the one hand, when there are many highly correlated features, different ones may be selected under different settings. On the other hand, even if there are no redundant features, multiple non-correlated sets of real markers may still exist.

3. Small number of samples in high-dimensional data: In the analysis of gene expression data and proteomics data, there are typically only hundreds of samples but thousands of features.


Existing Methods

1. The ensemble feature selection method and the method using prior feature relevance incorporate stability considerations into the algorithm design stage. To handle data with highly correlated features, the group feature selection approach treats a feature cluster as the basic unit in the selection process to increase robustness. The sample injection method tries to increase the sample size to address the small-sample-size versus large-feature-size issue.

 Fig. 1 A hierarchical framework for stable feature selection methods.


Ensemble Feature Selection

1. Ensemble learning methods combine multiple learned models under the assumption that “two (or more) heads are better than one”. 

2. Ensemble feature selection techniques use a two-step procedure that is similar to ensemble classification and regression:

• Creating a set of different feature selectors.

• Aggregating the results of the component selectors to generate the ensemble output.

3. The second step is typically modeled as a rank aggregation problem. Rank aggregation combines multiple rankings into a consensus ranking, and it has been widely studied in the context of web search engines.


Data perturbation

• In ensemble feature selection with data perturbation, different samplings of the original data are generated to construct different feature selectors, as described in Fig. 2. 

• These methods can be further distinguished according to the sampling method, the component feature selection algorithm and the rank aggregation method. (Table 1)
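A sketch of the data-perturbation scheme, assuming bootstrap sampling, a simple class-mean-difference filter as the component selector, and rank-sum aggregation; all three choices are illustrative placeholders for the concrete options catalogued in Table 1.

```python
import random

def score_features(X, y):
    """Component selector (illustrative): |class-mean difference| per feature."""
    m = len(X[0])
    scores = []
    for j in range(m):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return scores

def ensemble_rank(X, y, n_rounds=20, seed=0):
    """Draw bootstrap resamples, rank features on each resample, and
    aggregate by summing per-round ranks (smaller total = more relevant)."""
    rng = random.Random(seed)
    m, n = len(X[0]), len(X)
    rank_sums = [0] * m
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]
        if len({y[i] for i in idx}) < 2:  # resample must contain both classes
            idx = list(range(n))
        scores = score_features([X[i] for i in idx], [y[i] for i in idx])
        order = sorted(range(m), key=lambda j: -scores[j])
        for rank, j in enumerate(order):
            rank_sums[j] += rank
    return sorted(range(m), key=lambda j: rank_sums[j])  # consensus ranking

X = [[1.0, 0.1, 5.0], [1.1, 0.2, 5.1], [0.9, 0.9, 4.9], [1.0, 1.0, 5.0]]
y = [0, 0, 1, 1]
print(ensemble_rank(X, y))  # consensus ranking, most relevant feature first
```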


Function Perturbation

1. Here we use function perturbation to refer to those ensemble feature selection methods in which the component learners are different from each other. The basic idea is to capitalize on the strengths of different algorithms to obtain robust feature subsets.

2. Function perturbation is different from data perturbation in two perspectives:

• It uses different feature selection algorithms rather than the same feature selection method.

• It typically conducts local feature selection on the original data (without sampling).
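A sketch of function perturbation: two deliberately different component selectors (a class-mean-difference filter and a label-correlation filter, both hypothetical stand-ins) are run on the same, unsampled data, and their rankings are aggregated by a Borda-style rank sum.

```python
def mean_diff_scores(X, y):
    """Selector A: absolute difference between the two class means."""
    m = len(X[0])
    out = []
    for j in range(m):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        out.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return out

def label_corr_scores(X, y):
    """Selector B: absolute Pearson correlation between feature and label."""
    n, m = len(X), len(X[0])
    my = sum(y) / n
    vy = sum((l - my) ** 2 for l in y) ** 0.5
    out = []
    for j in range(m):
        col = [row[j] for row in X]
        mc = sum(col) / n
        cov = sum((c - mc) * (l - my) for c, l in zip(col, y))
        vc = sum((c - mc) ** 2 for c in col) ** 0.5
        out.append(abs(cov / (vc * vy)) if vc > 0 else 0.0)
    return out

def function_perturbation(X, y, selectors):
    """Run each distinct selector on the original data (no sampling) and
    aggregate the rankings by summing rank positions (Borda-style)."""
    m = len(X[0])
    rank_sums = [0] * m
    for selector in selectors:
        scores = selector(X, y)
        order = sorted(range(m), key=lambda j: -scores[j])
        for rank, j in enumerate(order):
            rank_sums[j] += rank
    return sorted(range(m), key=lambda j: rank_sums[j])

X = [[1.0, 0.1, 5.0], [1.1, 0.2, 5.1], [0.9, 0.9, 4.9], [1.0, 1.0, 5.0]]
y = [0, 0, 1, 1]
print(function_perturbation(X, y, [mean_diff_scores, label_corr_scores]))
```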


Feature Selection with Prior Feature Relevance

• In most biomarker discovery applications, we typically assume that all features are equally relevant before the selection procedure. In practice, some prior knowledge may be available to bias the selection towards features assumed to be more relevant.

• Fig. 3 shows that there are several methods for obtaining such prior knowledge. One feasible method is to seek advice from domain experts or relevant publications.


Group Feature Selection

1. One motivation for group feature selection is that groups of correlated features commonly exist in high-dimensional data, and such groups are resistant to variations in the training samples.

2. Group formation is the process of identifying groups of associated features. There are typically two classes of methods for this purpose: knowledge-driven methods and data-driven methods.

3. Group transformation generates a single coherent representation for each feature group. The transformation method can range from simple approaches, such as taking the mean of the feature values, to complicated methods such as principal component analysis.
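The two steps above can be sketched as follows, using a hypothetical greedy correlation threshold for data-driven group formation and the feature-value mean as the group transformation:

```python
def pearson(a, b):
    """Pearson correlation of two equal-length value lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb) if va > 0 and vb > 0 else 0.0

def group_features(X, threshold=0.9):
    """Data-driven group formation (greedy sketch): a feature joins the
    first group whose representative it correlates with above `threshold`."""
    m = len(X[0])
    cols = [[row[j] for row in X] for j in range(m)]
    groups = []
    for j in range(m):
        for g in groups:
            if abs(pearson(cols[j], cols[g[0]])) >= threshold:
                g.append(j)
                break
        else:
            groups.append([j])
    return groups

def group_means(X, groups):
    """Group transformation: represent each group by its feature-value mean."""
    return [[sum(row[j] for j in g) / len(g) for g in groups] for row in X]

# Features 0 and 1 are perfectly correlated; feature 2 is not.
X = [[1, 2, 5], [2, 4, 1], [3, 6, 4], [4, 8, 2]]
print(group_features(X))                      # [[0, 1], [2]]
print(group_means(X, group_features(X))[0])
```

The transformed data has one column per group, so any downstream selector operates on groups rather than individual features.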


Sample Injection

1. In biomarker discovery applications, the number of features is typically larger than the sample size. This is one of the main sources of instability in feature selection. To increase the reproducibility of feature selection, one natural idea is to generate more samples. 

2. However, the generation of real sample data from patients and healthy people is usually expensive and time-consuming. With this practical limitation in mind, researchers have begun to seek alternative methods for the same purpose.


Stability Measure

1. How do we measure the “stability” of feature selection algorithms? A stability measure serves two purposes. On the one hand, it is indispensable for evaluating different algorithms in performance comparisons. On the other hand, it can be used for internal validation in feature selection algorithms that take stability into account.

2. Measuring stability requires a similarity measure for feature selection results. This depends on the representations used by different algorithms. Formally, let training examples be described by a vector of features F = (f1, f2, …, fm); then there are three types of representation methods:

• A subset of features: S = {s1, s2, …, sk}, si ∈ {f1, f2, …, fm}.

• A ranking vector: R = (r1, r2, …, rm), 1 ≤ ri ≤ m.

• A weighting-score vector: W = (w1, w2, …, wm), wi ∈ R+.

3. For the sake of notational simplicity, only stability measures that take two selection results as input are considered.


Feature Subset

Most measures for the feature-subset representation are defined using set-theoretic properties of just two subsets.

MS 1: The relative Hamming distance between the masks corresponding to two subsets.

MS 2: The Tanimoto distance metric measures the amount of overlap between two sets of arbitrary cardinality.

MS 3: The Dice–Sorensen index is the harmonic mean of |S ∩ S′|/|S| and |S ∩ S′|/|S′|.
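These three subset measures can be sketched directly from their set definitions (a hedged reading of the survey's formulas; m denotes the total number of features):

```python
def ms1_hamming(s1, s2, m):
    """MS1: relative Hamming distance between the membership masks
    (0 = identical subsets, 1 = maximally different)."""
    return len(s1 ^ s2) / m          # features in exactly one subset

def ms2_tanimoto(s1, s2):
    """MS2: Tanimoto similarity (1 minus the Tanimoto distance):
    |S ∩ S'| / |S ∪ S'|."""
    return len(s1 & s2) / len(s1 | s2)

def ms3_dice(s1, s2):
    """MS3: Dice-Sorensen index, the harmonic mean of
    |S ∩ S'|/|S| and |S ∩ S'|/|S'|."""
    return 2 * len(s1 & s2) / (len(s1) + len(s2))

S, S2 = {0, 1, 2}, {1, 2, 3}
print(ms1_hamming(S, S2, m=5))   # 0.4
print(ms2_tanimoto(S, S2))       # 0.5
print(ms3_dice(S, S2))
```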


Ranking List

1. The problem of comparing ranking lists is widely studied in different contexts such as voting theory and web document retrieval.

2. One typical example is MR2, in which Spearman's footrule is weighted to place more emphasis on top-ranked features, since these features matter more than irrelevant features in the stability evaluation.

• MR1: Spearman's rank correlation coefficient takes values in [−1, 1], with 1 meaning that the two ranking lists are identical and −1 meaning that they have exactly inverse orders.

• MR2: The Canberra distance is a weighted version of Spearman's footrule distance.

• MR3: The overlap score, adopted from earlier work and reformulated here under the assumption that only top-ranked features are important.
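MR1 can be sketched via the closed form for tie-free full rankings (an assumption; tied ranks require the general correlation formula):

```python
def mr1_spearman(r1, r2):
    """MR1: Spearman's rank correlation for two tie-free full rankings,
    rho = 1 - 6 * sum(d_i^2) / (m * (m^2 - 1))."""
    m = len(r1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d2 / (m * (m * m - 1))

print(mr1_spearman([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0  (identical lists)
print(mr1_spearman([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0 (exactly inverse)
```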


Weighting-score Vector

• The computational issue of combinatorial search for feature subset can to some extent be alleviated by using a feature weighting strategy. Allowing feature weights to take real-valued numbers instead of binary ones enables us to use well-established optimization techniques in algorithmic development. 

• The weighting-score vector is seldom used in defining stability measures. Table 6 lists one such stability measure, MW1.
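One natural choice for such a measure is Pearson's correlation between the two weight vectors; treating Table 6's MW1 as Pearson's coefficient is an assumption made here for illustration.

```python
def mw1_pearson(w1, w2):
    """Pearson's correlation between two weighting-score vectors:
    1 means perfectly consistent weightings, -1 perfectly opposed.
    (Reading MW1 as Pearson's coefficient is an assumption.)"""
    n = len(w1)
    m1, m2 = sum(w1) / n, sum(w2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(w1, w2))
    v1 = sum((a - m1) ** 2 for a in w1) ** 0.5
    v2 = sum((b - m2) ** 2 for b in w2) ** 0.5
    return cov / (v1 * v2)

# Proportional weightings from two runs agree perfectly (approximately 1.0):
print(mw1_pearson([0.1, 0.5, 0.9], [0.2, 1.0, 1.8]))
```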


Analysis

1. Of the three sources of instability in feature selection, the small number of samples in a high-dimensional feature space is the most difficult to address in biomarker discovery. WHY? Because collecting additional real samples from patients and healthy people is expensive and time-consuming.

2. Group feature selection is the most widely used strategy among existing stable feature selection approaches, because many correlated features exist in high-dimensional space. However, it has a PROBLEM: it only helps when the instability arises from redundant (correlated) features.

3. One solution is a hybrid strategy: combine group feature selection with ensemble feature selection.


Future Work

The group feature selection strategy is only helpful when multiple sets of true markers are generated due to the existence of redundant features. However, it is also possible that multiple sets of true markers share no correlated features. The feature selection problem in this case is much harder than finding a minimal optimal feature set for classification.

To our knowledge, there is still no available method or measure aimed at handling stability issues in this context. The general problem is open and needs more research effort.

• How can we directly measure the stability of features without sampling the training data?

• Can we propose new methods that are capable of explicitly controlling the stability of the reported feature subset?

• Are there other special requirements for biomarker discovery rather than stability?


Conclusion

• This paper summarizes existing stable feature selection methods and stability measures. Stable feature selection is an important problem from both a theoretical and a practical perspective.

• More effort has to be devoted to this topic.
