Feature Selection

COMP61011 : Machine Learning, Gavin Brown, www.cs.man.ac.uk/~gbrown

Transcript of lecture slides (17 pages).

Page 1:

COMP61011 : Machine Learning

Feature Selection

Gavin Brown www.cs.man.ac.uk/~gbrown

Page 2:

The Usual Supervised Learning Approach

data + labels

Learning Algorithm

Model

Testing data Predicted label

Page 3:

Pattern Recognition: Then and Now

Predicting recurrence of cancer from gene profiles:

Only a subset of features actually influence the phenotype.

[Figure: Predicting Recurrence of Lung Cancer]

Only a few genes actually matter!

We need a small, interpretable subset to help doctors!

Page 4:

Pattern Recognition: Then and Now

Text classification... is this news story “interesting”?

“Bag-of-Words” representation:

x = { 0, 3, 0, 0, 1, ..., 2, 3, 0, 0, 0, 1} one entry per word!

Easily 50,000 words! Very sparse: easy to overfit!

Need accuracy, otherwise we lose visitors to our news website!
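Not on the slide: a minimal sketch of how a bag-of-words vector can be built, assuming a small made-up vocabulary (the word list and the bag_of_words helper are illustrative only).

```python
from collections import Counter

vocabulary = ["election", "goal", "market", "rain", "vote"]   # toy vocabulary

def bag_of_words(text, vocab=vocabulary):
    """One entry per vocabulary word: how often it occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]

print(bag_of_words("vote vote vote the election result moved the market"))
# -> [1, 0, 1, 0, 3]
```

With a real 50,000-word vocabulary the same vector is almost entirely zeros, which is exactly the sparsity and over-fitting risk the slide warns about.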

Page 5:

The Usual Supervised Learning Approach ?????

data + labels

Learning Algorithm – OVERWHELMED!

Model

Testing data Predicted label

Page 6:

With big data….

Several problems motivate feature selection:

- Time complexity / computational cost
- Cost in data collection
- Over-fitting
- Lack of interpretability

Page 7:

Some things matter, some do not.

Relevant features - those that we need to perform well

Irrelevant features - those that are simply unnecessary

Redundant features - those that become irrelevant in the presence of others

Page 8:

3 main categories of Feature Selection techniques:

Wrappers, Filters, Embedded methods

Page 9:

Wrappers: Evaluation method

[Diagram: feature set → trains a model → outputs accuracy]

Pros:

- Model-oriented: usually gets good performance for the model you choose.

Cons:

- Hugely computationally expensive.
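Not part of the slide: a minimal sketch of the wrapper evaluation step in the diagram, assuming scikit-learn is available; the evaluate_subset name and the choice of a decision tree are illustrative, not the lecture's prescription.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def evaluate_subset(X, y, subset):
    """Wrapper evaluation: train a model on just the chosen feature
    columns and return its cross-validated accuracy."""
    model = DecisionTreeClassifier(random_state=0)
    cols = np.asarray(sorted(subset))
    return cross_val_score(model, X[:, cols], y, cv=5, scoring="accuracy").mean()
```

Every candidate feature set triggers a full round of model training, which is why wrappers become so expensive as the number of candidates grows.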

Page 10:

Wrappers: Search strategy

Each candidate feature set can be encoded as a bit string, one bit per feature:

101110000001000100001000000000100101010

With an exhaustive search over all 2^d subsets of d features:

- 20 features … about 1 million feature sets to check
- 25 features … 33.5 million sets
- 30 features … 1.1 billion sets

Hence the need for a search strategy:

- Sequential forward selection
- Recursive backward elimination
- Genetic algorithms
- Simulated annealing

Page 11:

Wrappers: Sequential Forward Selection
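The slide's diagram is not in the transcript; here is a minimal sketch of the greedy forward search it describes. The score argument stands for any wrapper-style evaluation (for example the hypothetical evaluate_subset above); this is an illustration, not the exact algorithm shown in the lecture.

```python
def forward_selection(X, y, score, max_features=None):
    """Sequential forward selection: start from the empty set and
    repeatedly add the single feature that most improves the wrapper
    score; stop when no remaining feature helps (or max_features is hit)."""
    n_features = X.shape[1]
    max_features = max_features or n_features
    selected, best = [], float("-inf")
    while len(selected) < max_features:
        remaining = [f for f in range(n_features) if f not in selected]
        trial_scores = [(score(X, y, selected + [f]), f) for f in remaining]
        new_best, new_feature = max(trial_scores)
        if new_best <= best:
            break  # no single addition improves the score
        selected.append(new_feature)
        best = new_best
    return selected, best
```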

Page 12:

Search Complexity for Sequential Forward Selection

To choose k features out of d, sequential forward selection evaluates d + (d-1) + … + (d-k+1) candidate subsets, i.e. O(dk) model fits instead of the 2^d required by exhaustive search.

Page 13:

Feature Selection (2): Filters

Page 14:

Search Complexity for Filter Methods

Pros:

A lot less expensive!

Cons:

Not model-oriented
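Not in the transcript: a minimal sketch of a filter method, assuming scikit-learn's mutual_info_classif; mutual information with the label is one common filter criterion, but the lecture does not fix a particular score.

```python
from sklearn.feature_selection import mutual_info_classif

def filter_top_k(X, y, k):
    """Filter method: score every feature once, independently of any
    classifier, then return the indices of the k highest-scoring features."""
    scores = mutual_info_classif(X, y, random_state=0)
    return scores.argsort()[::-1][:k], scores
```

A single pass of feature scoring replaces the many model fits a wrapper needs, which is where the cost saving comes from.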

Page 15:

Feature Selection (3): Embedded methods

Pros:

Performs feature selection as part of the learning procedure

Cons:

Computationally demanding

Principle: the classifier performs feature selection as part of the learning procedure

Example: the logistic LASSO (Tibshirani, 1996)

With error function:

$$E(\mathbf{w}) \;=\; \underbrace{-\sum_{n=1}^{N}\Big[\, t_n \ln y_n + (1 - t_n)\ln(1 - y_n) \,\Big]}_{\text{cross-entropy error}} \;+\; \underbrace{\lambda \sum_{j=1}^{d} |w_j|}_{\text{regularizing term}}$$

where $y_n = \sigma(\mathbf{w}^{\top}\mathbf{x}_n)$ is the model's predicted probability for example $n$.
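A minimal sketch of the logistic LASSO in practice, assuming scikit-learn's L1-penalised logistic regression; the synthetic data and parameter values are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))               # 50 features ...
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # ... but only 2 drive the label

# The L1 penalty drives many weights exactly to zero, so feature
# selection happens as a by-product of learning.  C = 1 / lambda.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
print(np.flatnonzero(clf.coef_[0]))          # indices of surviving features
```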

Page 16:

Conclusions on Feature Selection

Potential benefits

Wrappers are generally infeasible on modern “big data” problems.

Filters are mostly heuristics, but can be formalized in some cases.

- The Manchester MLO group works on this challenge.

Page 17:

This is the End of the Course Unit….

That’s it. We’re done.

Exam in January – past papers on website.

MSc students: Projects due Friday, 4pm

CDT/MRes students: 1 week later.

You need to submit a hardcopy to SSO:

- your 6 page (maximum) report

You need to send by email to Gavin:

- the report as PDF, and a ZIP file of your code.