AI Lecture Notes - CAU (ai.cau.ac.kr/teaching/ai-2011/12a-feature-selection.pdf)

Feature Selection

School of Computer Science & Engineering Chung-Ang University

Artificial Intelligence

Dae-Won Kim

What is feature selection?

Must all of the features be used?

Name       Age  Sex  Alg.  C++  Military  Height (cm)  Weight (kg)  C  Smoke  Class label (AI grade)
Student-1  21   M    A     B    Y         178          80           A  Yes    A
Student-2  23   F    A     A    N         165          50           A  No     B
Student-3  22   F    B     C    N         160          45           C  Yes    A
Student-4  21   M    B     A    N         180          70           B  No     C

We want to predict the grade in ‘AI’ (the class label) from the training data above.

Definition: Given a set of features, select the subset that performs best under a chosen evaluation criterion (e.g., classification accuracy).
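Read literally, the definition is an exhaustive search over every non-empty subset of features. The sketch below assumes a caller-supplied scoring function `score(subset)`; `best_subset`, `toy_score`, and its weights are hypothetical and invented purely for illustration (a real score might be a classifier's accuracy). With n features there are 2^n - 1 candidate subsets, already 511 for the nine features in the table.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Sequence, Tuple


def best_subset(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float, int]:
    """Exhaustively score every non-empty subset; return (best subset, best score, #tried)."""
    best: FrozenSet[str] = frozenset()
    best_score = float("-inf")
    tried = 0
    for size in range(1, len(features) + 1):
        for combo in combinations(features, size):
            subset = frozenset(combo)
            tried += 1
            s = score(subset)
            if s > best_score:  # keep the first subset that reaches the best score
                best, best_score = subset, s
    return best, best_score, tried


# Hypothetical scoring function: invented per-feature weights minus a size penalty.
WEIGHTS = {"Alg.": 1.0, "C++": 0.8, "C": 0.5}


def toy_score(subset: FrozenSet[str]) -> float:
    return sum(WEIGHTS.get(f, 0.0) for f in subset) - 0.3 * len(subset)


features = ["Age", "Sex", "Alg.", "C++", "Military", "Height", "Weight", "C", "Smoke"]
subset, s, tried = best_subset(features, toy_score)
print(sorted(subset), round(s, 2), tried)
```

Exhaustive search is only feasible for small n, which is why the search strategies discussed later (greedy, branch and bound, hill climbing, genetic algorithms) matter.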

Q: What are the advantages?

Benefits-1: simpler and faster

It reduces the number of features.

Benefits-2: cost-effectiveness

It lowers computational cost as well as other costs, such as the clinical cost of measuring each feature.

Benefits-3: better accuracy

It may improve accuracy by removing useless (irrelevant or noisy) features.

Benefits-4: deeper insight

Knowing which features are good gives insight into the underlying structure of the data.

Caution: Feature Selection vs. Feature Transformation

Feature transformation is related to scaling and normalization.

Certain transformations of features may reveal structures that were not obvious on the original scale.

For scaling, we usually take square roots, reciprocals, and logarithms.
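A minimal NumPy sketch of the three transformations, applied to a hypothetical right-skewed feature (the values are made up). Base-10 logarithms are an arbitrary choice here; the natural log compresses the scale in the same way. Logarithms and reciprocals need strictly positive values, and square roots need non-negative ones.

```python
import numpy as np

# Hypothetical right-skewed feature; values must be > 0 for log and reciprocal.
x = np.array([1.0, 4.0, 16.0, 100.0, 10000.0])

sqrt_x = np.sqrt(x)    # mild compression of large values
recip_x = 1.0 / x      # strong compression; also reverses the order of the values
log_x = np.log10(x)    # strong compression: each factor of 10 becomes a step of 1

for name, v in [("raw", x), ("sqrt", sqrt_x), ("reciprocal", recip_x), ("log10", log_x)]:
    print(f"{name:10s}", np.round(v, 3))
```

On the raw scale, the value 10000 dwarfs the others; after the log transform the five values are evenly spread between 0 and 4, which can expose structure among the small values.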

We must also decide how to normalize the features.

Features are often normalized to lie in a fixed range, such as zero to one.

1. By dividing all values by the maximum value encountered.

2. By subtracting the minimum value and dividing by the range (the maximum minus the minimum).

3. By computing the mean and standard deviation of the feature, subtracting the mean from each value, and dividing the result by the standard deviation.

This is also called standardization (zero mean, unit standard deviation). Unlike min-max scaling (method 2), it does not confine values to a fixed range.
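Methods 1-3 can be sketched in NumPy as follows, using the Height column of the training-data table. The function names are hypothetical. `standardize` uses the population standard deviation (NumPy's default, ddof=0); the sample version (ddof=1) is an equally common choice. None of the functions guards against a constant feature, where max - min or the standard deviation is zero.

```python
import numpy as np


def max_scale(x: np.ndarray) -> np.ndarray:
    """Method 1: divide by the maximum (lands in [0, 1] only for non-negative x)."""
    return x / x.max()


def min_max_scale(x: np.ndarray) -> np.ndarray:
    """Method 2: (x - min) / (max - min), which maps the values onto [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())


def standardize(x: np.ndarray) -> np.ndarray:
    """Method 3: (x - mean) / std, giving mean 0 and standard deviation 1."""
    return (x - x.mean()) / x.std()


# Heights of Student-1..Student-4 from the training-data table (cm).
height = np.array([178.0, 165.0, 160.0, 180.0])

print(max_scale(height))
print(min_max_scale(height))
print(standardize(height))
```

In practice the minimum, maximum, mean, and standard deviation are computed on the training data and then reused unchanged for new samples.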

However, a transformation may distort how the feature represents the underlying data.

Caution: Feature Selection vs. Feature Extraction

Definition: Feature extraction creates new, better features from combinations or transformations of the original features.

1. By using linear combinations, which are simple to compute and tractable.

2. By projecting high dimensional data onto a lower dimensional space.

PCA (principal component analysis)
SVD (singular value decomposition)
LDA (linear discriminant analysis)
MDA (multiple discriminant analysis)
ICA (independent component analysis)
MDS (multi-dimensional scaling)
NMF (nonnegative matrix factorization)
…

PCA extracts principal components, the eigenvectors of the data's covariance matrix.

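New features are obtained by projecting the data onto the principal components (PCs), i.e., as linear combinations of the original features weighted by the PCs.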

Have you heard of eigenvectors and eigenvalues in linear algebra?

PCA (Principal Component Analysis)

Given four data points X1-X4, each with three attributes A1-A3, reduce the number of attributes to two.

      A1  A2  A3
X1     1   2   1
X2     4   2  13
X3     7   8   1
X4     8   4   5

Step 1. Find the eigenvectors $e_i$ and the corresponding eigenvalues $\lambda_i$ of the covariance matrix.

Step 2. Sort the eigenvectors by their eigenvalues in descending order.

$\lambda_1 = 17.79$, $e_1 = [0.59,\ -0.43,\ 0.68]^T$ (the first principal component)

$\lambda_2 = 9.58$, $e_2 = [0.42,\ -0.56,\ -0.72]^T$ (the second principal component)

$\lambda_3 = 2.42$, $e_3 = [0.69,\ 0.71,\ -0.15]^T$ (the third principal component)

Step 3. Create new attributes from the top k eigenvectors, where k = 2: each new attribute is the original attribute vector multiplied by an eigenvector. For X1:

$A1_{new} = [1,\ 2,\ 1]\, e_1 = [1,\ 2,\ 1]\,[0.59,\ -0.43,\ 0.68]^T = 0.41$

$A2_{new} = [1,\ 2,\ 1]\, e_2 = [1,\ 2,\ 1]\,[0.42,\ -0.56,\ -0.72]^T = -1.42$

Repeating the same products for X2-X4 gives the reduced data:

      A1new   A2new
X1     0.41   -1.42
X2    10.34   -8.80
X3     1.37   -2.26
X4     6.40   -2.48
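Steps 1-3 can be sketched in NumPy as below; this is an illustration, not the lecture's own code, and `pca_components` and `project` are hypothetical names. `pca_components` follows Steps 1-2 with the sample covariance (`np.cov`, n - 1 denominator) and `np.linalg.eigh`, which returns eigenvalues in ascending order, hence the re-sort. `project` mirrors Step 3, which multiplies the raw, uncentered attribute vectors; many PCA implementations subtract the column means first. One caveat: running Steps 1-2 on these four points does not appear to reproduce the printed eigenpairs, because the eigenvalues of a covariance matrix sum to its trace, which is 50 here (37.5 with an n denominator), not 17.79 + 9.58 + 2.42 = 29.79. The sketch therefore applies Step 3 with the eigenvectors exactly as printed, which reproduces the reduced table.

```python
import numpy as np


def pca_components(X: np.ndarray):
    """Steps 1-2: eigen-decompose the covariance matrix, sort by eigenvalue (descending)."""
    C = np.cov(X, rowvar=False)            # attributes are columns; n - 1 denominator
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # re-sort into descending order
    return eigvals[order], eigvecs[:, order]


def project(X: np.ndarray, components: np.ndarray, k: int) -> np.ndarray:
    """Step 3: multiply each row of X by the top-k eigenvectors (columns of `components`)."""
    return X @ components[:, :k]


# Data from the example: rows X1..X4, columns A1..A3.
X = np.array([[1, 2, 1],
              [4, 2, 13],
              [7, 8, 1],
              [8, 4, 5]], dtype=float)

# Eigenvectors exactly as printed in the example (columns e1, e2, e3).
E_printed = np.array([[0.59, 0.42, 0.69],
                      [-0.43, -0.56, 0.71],
                      [0.68, -0.72, -0.15]])

X_new = project(X, E_printed, k=2)
print(np.round(X_new, 2))

# Steps 1-2 run on the same four points (compare with the printed eigenvalues).
eigvals, eigvecs = pca_components(X)
print(np.round(eigvals, 2))
```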

SVD (Singular Value Decomposition)

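Given a term-document relation matrix A (rows: terms, columns: documents d1-d6), recompute it using only the top two singular values.

Step 1. Decompose the original matrix A into three matrices (e.g., using MATLAB):

$A = U \Sigma V^T$, where the columns $v_i$ of $V$ are the eigenvectors of $A^T A$, $\Sigma$ holds the singular values $\sigma_i = \sqrt{\lambda_i}$ ($\lambda_i$ being the corresponding eigenvalues), and each column of $U$ with $\sigma_i > 0$ is $u_i = \frac{1}{\sigma_i} A v_i$.

Step 2. Create a new matrix $A_{new}$ from the top k entries of the decomposed matrices, where k = 2.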

A:
             d1   d2   d3   d4   d5   d6
cosmonaut     1    0    1    0    0    0
astronaut     0    1    0    0    0    0
moon          1    1    0    0    0    0
car           1    0    0    1    1    0
truck         0    0    0    1    0    1

U:
cosmonaut   -0.44  -0.30   0.57   0.58   0.25
astronaut   -0.13  -0.33  -0.59   0.00   0.73
moon        -0.48  -0.51  -0.37   0.00  -0.61
car         -0.70   0.35   0.15  -0.58   0.16
truck       -0.26   0.65  -0.41   0.58  -0.09

Σ (5 × 6):
2.16   0      0      0      0      0
0      1.59   0      0      0      0
0      0      1.28   0      0      0
0      0      0      1.00   0      0
0      0      0      0      0.39   0

V^T:
  d1     d2     d3     d4     d5     d6
-0.75  -0.28  -0.20  -0.45  -0.33  -0.12
-0.29  -0.53  -0.19   0.63   0.22   0.41
 0.28  -0.75   0.45  -0.20   0.12  -0.33
 0.00   0.00   0.58   0.00  -0.58   0.58
-0.53   0.29   0.63   0.19   0.41  -0.22
 0.00   0.00   0.00   0.57  -0.58  -0.58

Keeping only the 1st and 2nd singular values, the first two columns of U, and the first two rows of V^T (the 1st and 2nd eigenvectors):

$A_{new} = U_2 \Sigma_2 V_2^T$, with $\Sigma_2 = \mathrm{diag}(2.16,\ 1.59)$

A_new:
             d1     d2     d3     d4     d5     d6
cosmonaut   0.85   0.52   0.28   0.13   0.21  -0.08
astronaut   0.36   0.36   0.16  -0.21  -0.03  -0.18
moon        1.00   0.72   0.36  -0.05   0.16  -0.21
car         0.98   0.13   0.21   1.03   0.62   0.41
truck       0.13  -0.39  -0.08   0.90   0.41   0.49
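The same two steps in NumPy, as a sketch: the example uses MATLAB, and `np.linalg.svd` stands in for it here. NumPy returns the singular values in descending order, so the top k are simply the first k. Singular vectors are only defined up to sign, so NumPy's U and V^T may differ in sign from the tables above, but A_new does not depend on that choice.

```python
import numpy as np

terms = ["cosmonaut", "astronaut", "moon", "car", "truck"]
# Term-document matrix A from the example (rows: terms, columns: d1..d6).
A = np.array([[1, 0, 1, 0, 0, 0],
              [0, 1, 0, 0, 0, 0],
              [1, 1, 0, 0, 0, 0],
              [1, 0, 0, 1, 1, 0],
              [0, 0, 0, 1, 0, 1]], dtype=float)

# Step 1: A = U @ diag(s) @ Vt, with s sorted largest first.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Step 2: keep the top k singular values and the matching singular vectors.
k = 2
A_new = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.round(s, 2))
for term, row in zip(terms, np.round(A_new, 2)):
    print(f"{term:10s}", row)
```

The Frobenius norm of A - A_new equals the square root of the sum of the squared discarded singular values, which is why keeping the largest ones gives the best rank-k approximation (the Eckart-Young theorem).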

PCA is useful for representing data.

LDA is useful for discriminating between classes.
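To make the PCA/LDA contrast concrete, here is a sketch on hypothetical synthetic data using Fisher's two-class discriminant, w proportional to S_W^{-1}(m1 - m2), which is one standard form of LDA. The function names, the covariance, and the class means are assumptions for illustration. The two classes are stretched along the diagonal but separated across it: PCA, which ignores the labels, picks the long diagonal axis, whereas LDA picks the direction that separates the classes.

```python
import numpy as np


def fisher_direction(X1: np.ndarray, X2: np.ndarray) -> np.ndarray:
    """Two-class Fisher LDA direction: w proportional to S_W^{-1} (m1 - m2)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S_w = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)  # within-class scatter
    w = np.linalg.solve(S_w, m1 - m2)
    return w / np.linalg.norm(w)


def fisher_criterion(w: np.ndarray, X1: np.ndarray, X2: np.ndarray) -> float:
    """Squared gap between projected class means over projected within-class scatter."""
    z1, z2 = X1 @ w, X2 @ w
    within = ((z1 - z1.mean()) ** 2).sum() + ((z2 - z2.mean()) ** 2).sum()
    return float((z1.mean() - z2.mean()) ** 2 / within)


# Hypothetical data: two elongated classes whose means differ across the long axis.
rng = np.random.default_rng(0)
cov = np.array([[4.0, 3.8], [3.8, 4.0]])
X1 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
X2 = rng.multivariate_normal([1.0, -1.0], cov, size=200)

# PCA direction: top eigenvector of the pooled covariance (labels are ignored).
_, vecs = np.linalg.eigh(np.cov(np.vstack([X1, X2]), rowvar=False))
pc1 = vecs[:, -1]  # eigh sorts eigenvalues ascending, so the last column is the largest

# LDA direction: uses the class labels.
w = fisher_direction(X1, X2)

print("PCA:", np.round(pc1, 2), "criterion:", round(fisher_criterion(pc1, X1, X2), 4))
print("LDA:", np.round(w, 2), "criterion:", round(fisher_criterion(w, X1, X2), 4))
```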

PCA seeks orthogonal directions of maximum variance.

ICA seeks statistically independent directions.

ICA is useful for blind source separation.

Let us go back to feature selection.

How can we select good features?

Quiz-11: Propose your own idea for feature selection, using examples.

We can find a set of good features.

We can find a set of bad features.

We need definitions of good and bad features.

redundant, relevant, inconsistent,…

Feature selection is a special case of a search (optimization) problem.

We can apply any search technique.

Among the many approaches, we study two families: filter and wrapper methods.

The difference: whether a learning algorithm is used.

A wrapper uses a learning algorithm to evaluate candidate feature subsets.

Filter: univariate ranking
1. Score each feature.
2. Sort the features by score.
3. Select the top-ranked features.
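A minimal sketch of the three filter steps, assuming the absolute Pearson correlation with the label as the per-feature score; this is only one of many possible scores (mutual information or a t-statistic are common alternatives). `univariate_rank` is a hypothetical name and the data are synthetic.

```python
import numpy as np


def univariate_rank(X: np.ndarray, y: np.ndarray, k: int):
    """Filter: 1) score each feature, 2) sort the scores, 3) keep the top-k features."""
    # 1. Score: absolute Pearson correlation between each feature column and the label.
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    # 2. Sort feature indices by score, highest first.
    order = np.argsort(scores)[::-1]
    # 3. Select the top-ranked features.
    return order[:k], scores


# Hypothetical data: f0 drives the label, f1 is weakly related, f2 is pure noise.
rng = np.random.default_rng(42)
n = 2000
f0 = rng.normal(size=n)
f1 = rng.normal(size=n)
f2 = rng.normal(size=n)
y = (f0 + 0.3 * f1 > 0).astype(float)
X = np.column_stack([f2, f1, f0])  # columns deliberately out of order

selected, scores = univariate_rank(X, y, k=2)
print(selected, np.round(scores, 2))
```

Each feature is scored in isolation, so the method is fast, but it cannot see redundancy between features or features that are useful only in combination.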

Q: What are the pros and cons of a univariate filter?

Filter: multivariate correlation
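The multivariate method is not spelled out here, so the following is only one simple interpretation, sketched under that assumption: rank features by relevance (absolute correlation with the label) as in the univariate filter, but also look at the correlations among the features and skip any feature whose absolute correlation with an already-kept feature reaches a threshold (`max_corr = 0.9`, an arbitrary choice). `correlation_filter` is a hypothetical name and the data are synthetic. A univariate top-2 ranking keeps a feature together with its noisy copy, while the redundancy check keeps one of them plus a complementary feature.

```python
import numpy as np


def correlation_filter(X: np.ndarray, y: np.ndarray, k: int, max_corr: float = 0.9):
    """Keep the most label-correlated features, skipping any feature that is highly
    correlated with a feature already kept (a simple redundancy check)."""
    n_features = X.shape[1]
    relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)])
    redundancy = np.abs(np.corrcoef(X, rowvar=False))  # |feature-feature correlation|
    kept = []
    for j in np.argsort(relevance)[::-1]:              # most relevant first
        if all(redundancy[j, i] < max_corr for i in kept):
            kept.append(int(j))
        if len(kept) == k:
            break
    return kept, relevance


# Hypothetical data: column 1 is a noisy copy of column 0 (redundant), column 2 carries
# information of its own, column 3 is pure noise.
rng = np.random.default_rng(7)
n = 2000
f0, f2, noise = rng.normal(size=(3, n))
f1 = f0 + 0.3 * rng.normal(size=n)
y = (f0 + 0.6 * f2 > 0).astype(float)
X = np.column_stack([f0, f1, f2, noise])

kept, relevance = correlation_filter(X, y, k=2)
univariate_top2 = sorted(int(j) for j in np.argsort(relevance)[::-1][:2])
print("univariate top-2:", univariate_top2, "with redundancy check:", sorted(kept))
```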

Wrapper: Search + Learning + Evaluation
1. Form feature subsets.
2. Evaluate them using classifiers.
3. Expand or modify the subsets.

Wrapper: Search
1. Greedy algorithm
2. Branch and bound
3. Hill-climbing algorithm
4. Genetic algorithm
5. …

Wrapper: Learning (Bayesian classifier, k-NN) + Evaluation (accuracy)
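Putting the three wrapper pieces together, a minimal sketch: the search is sequential forward selection (a greedy algorithm), the learner is 1-NN (k-NN with k = 1), and the evaluation is leave-one-out accuracy. These specific choices, the function names, and the synthetic data are assumptions for illustration. Each search step re-evaluates the classifier once per remaining feature, which shows why wrappers tend to cost far more than filters.

```python
import numpy as np


def loo_1nn_accuracy(X: np.ndarray, y: np.ndarray) -> float:
    """Evaluation: leave-one-out accuracy of a 1-nearest-neighbor classifier."""
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # pairwise squared distances
    np.fill_diagonal(d, np.inf)                               # a point may not be its own neighbor
    return float((y[d.argmin(axis=1)] == y).mean())


def forward_selection(X: np.ndarray, y: np.ndarray):
    """Search: greedily add the feature that most improves accuracy; stop when none does."""
    selected, best_acc = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        acc, j = max((loo_1nn_accuracy(X[:, selected + [j]], y), j) for j in remaining)
        if acc <= best_acc:
            break
        selected.append(j)
        remaining.remove(j)
        best_acc = acc
    return selected, best_acc


# Hypothetical data: the label depends on features 0 and 1 jointly; features 2-4 are noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

selected, acc = forward_selection(X, y)
print("selected features:", selected, "LOO 1-NN accuracy:", round(acc, 3))
```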

Q: What are the pros and cons of a wrapper?

Q: Which feature-handling technique is appropriate for your project?