Computational decision making


Transcript of Computational decision making

Page 1: Computational decision making

Computational decision making

Dr. Boris Adryan @BorisAdryan

Page 2: Computational decision making

What I aim to provide

✓ basic vocabulary for, and fundamental concepts of, computational decision making
✓ a phenomenological introduction to machine learning methods
✓ a rough idea of when and how to use these methods

Page 3: Computational decision making

What this presentation isn’t

x hands-on tutorial

x thorough summary

x comprehensive guide

x technical deep dive

x statistics course

Page 4: Computational decision making

Is this artificial intelligence?

word = input('Enter a word: ')
for key in British_dictionary:  # iterate over the dictionary's keys
    if key.startswith(word):
        print('This is a British word.')

Page 5: Computational decision making

Is this machine learning?

temperature = float(input('What is the temperature? '))
if temperature >= 1.0:
    print('Wear shorts.')
else:
    print('Wear long underwear.')

Page 6: Computational decision making

Definition

Rule-based decision making on the basis of numeric thresholds, string patterns, etc. is not machine learning.

And most definitely it is not artificial intelligence.

Page 7: Computational decision making

But what if… the threshold is inferred at run time?

“Write software that says how close to Euston you can move if you can afford to spend £650k.”

[Scatter plot: average property price (£k, 450 to 1050) against the number of Northern Line stops from Euston (0 to 12).]

table = input_table('cost at station')
print(where_x_yields_a_low_enough_y)

Page 8: Computational decision making

Linear regression is probably the simplest “machine learning” method.

[The same scatter plot of average property price against the number of Northern Line stops from Euston, now with a fitted regression line.]

It is an example of supervised learning, because we teach the computer the relation between an input variable (“feature”) and an output variable (“label”).

y = m·x + b
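A minimal sketch of that fit, which can then be inverted to answer the £650k question. The station data below is invented for illustration; only numpy is assumed.

import numpy as np

# Hypothetical data: stops from Euston (x), average price in £k (y).
stops = np.array([0, 2, 4, 6, 8, 10, 12])
price = np.array([1050, 950, 850, 750, 650, 550, 450])

# Least-squares fit of y = m*x + b.
m, b = np.polyfit(stops, price, deg=1)

# Invert the line: how many stops out does a £650k budget reach?
budget = 650
print((budget - b) / m)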

Page 9: Computational decision making

Linear regression can become arbitrarily complicated.

The difference between curve fitting in statistics and machine learning is mostly semantics.

f(number of stops to Euston, square footage, bedrooms, bathrooms, …) → price

many features

[Illustration: price as a surface over two of many features, location (Euston to High Barnet) and square footage (small to large).]

Page 10: Computational decision making

Classification tasks

[Images of I. setosa, I. versicolor and I. virginica; all images from https://en.wikipedia.org/wiki/Iris_flower_data_set]

Rather than projecting the feature vector onto a continuous variable, many supervised learning methods identify “class labels”.

Page 11: Computational decision making

Classification tasks

Supervised learning requires complete input matrices. Missing or nonsensical values have to be replaced or removed.

Non-numerical features (e.g. “name of colour”, “smell”) have to be encoded, as in the sketch after the table below.

class label   feature 1   feature 2   feature 3   feature 4
1             5.1         3.5         1.4         0.2
1             4.9         3.0         1.4         0.2
2             7.0         3.2         4.7         1.4
3             6.3         3.3         6.0         2.5
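A minimal sketch of both steps, using pandas; the column names and values are invented for illustration.

import pandas as pd

# Hypothetical table with a missing value and a non-numerical feature.
df = pd.DataFrame({
    'petal_length': [1.4, None, 4.7],
    'colour': ['blue', 'violet', 'blue'],
})

# Replace the missing numeric value, here with the column mean.
df['petal_length'] = df['petal_length'].fillna(df['petal_length'].mean())

# Encode the non-numerical feature as indicator ("one-hot") columns.
df = pd.get_dummies(df, columns=['colour'])
print(df)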

Page 12: Computational decision making

Classification tasks

[Scatter plot: sepal width against sepal length for I. setosa and I. virginica.]

In a first approximation, classification (by regression) aims to find a function that best separates the different class labels.

f(sepal width, sepal length) → “1” or “2”

Page 13: Computational decision making

Decision trees

[Scatter plots: sepal width against sepal length, and petal width against sepal width, for I. setosa, I. versicolor and I. virginica.]

A decision tree can be understood as a series of linear separations of the data.

[Tree: first split on the ratio of sepal width : sepal length → I. virginica; then split on the ratio of petal width : sepal width → I. setosa or I. versicolor.]
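A minimal sketch of such a series of threshold rules, assuming scikit-learn and its bundled iris data:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a shallow tree on the iris measurements.
iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Print the learned series of linear separations (threshold rules).
print(export_text(tree, feature_names=iris.feature_names))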

Page 14: Computational decision making

Random forests

A collection of decision trees, each trained on a random subset of the data, can minimise the risk of over-fitting.

A single big decision tree trained on all data can end up effectively memorising single data points.

[Scatter plot: petal width against sepal width.]
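A minimal sketch, again assuming scikit-learn; each of the 100 trees sees a bootstrap sample of the rows and a random subset of features at each split:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(iris.data, iris.target)

# The forest's vote, plus the per-feature importances it derives.
print(forest.predict(iris.data[:3]))
print(forest.feature_importances_)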

Page 15: Computational decision making

Over-/Underfitting

Sloppy separation is called underfitting, and greedy separation overfitting.

Counteracting an overfit is called regularisation, which works by penalising the use of too many features (L1) or too-strong feature weights (L2).

[Three panels: underfit (high bias), okay, overfit (high variance).]
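A minimal sketch of that knob in scikit-learn's logistic regression; C is the inverse regularisation strength, so a smaller C penalises weights more heavily:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()

# penalty='l2' shrinks large weights; penalty='l1' (with solver='liblinear'
# or 'saga') instead drives unhelpful feature weights to exactly zero.
clf = LogisticRegression(penalty='l2', C=1.0, max_iter=1000)
clf.fit(iris.data, iris.target)
print(clf.coef_)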

Page 16: Computational decision making

Dimensionality reduction

Dimensionality reduction aims to reduce the complexity of a dataset (with respect to the number of features).

The first principal components are dimensions that explain most of a dataset’s variance.
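A minimal PCA sketch, assuming scikit-learn and the iris data:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Project the four iris features onto the two directions that
# explain most of the dataset's variance.
iris = load_iris()
pca = PCA(n_components=2)
reduced = pca.fit_transform(iris.data)
print(pca.explained_variance_ratio_)  # variance explained per component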

Page 17: Computational decision making

Support vector machine

The SVM aims to provide an ideal separating plane, supported by training data vectors.

A classification margin protects the SVM against overfitting.

[Diagram: decision boundary wᵀx = 0, flanked by the negative hyperplane wᵀx = -1 and the positive hyperplane wᵀx = +1; the margin between them is held up by the support vectors, and a “?” marks an unclassified point.]
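A minimal sketch with scikit-learn's SVC; the parameter C trades margin width against training errors:

from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
svm = SVC(kernel='linear', C=1.0)  # smaller C = wider, more forgiving margin
svm.fit(iris.data, iris.target)

# The training vectors that support (define) the margin.
print(svm.support_vectors_.shape)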

Page 18: Computational decision making

Kernel trick

Input data can be projected into a higher-dimensional space that allows linear separation of otherwise inseparable data.

There are different kernels, such as the radial basis function (Gaussian) or polynomial kernel.

[Illustration: a mapping Φ from the “2D” input space into a “3D” feature space where the classes become linearly separable.]

http://scikit-learn.org/0.18/auto_examples/svm/plot_iris.html
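Switching kernels in the SVC sketch above is a one-line change; gamma sets how local the RBF (Gaussian) similarity is:

from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()

# The RBF kernel implicitly maps inputs into a higher-dimensional space;
# the kernel trick means Φ never has to be computed explicitly.
svm = SVC(kernel='rbf', gamma='scale')
svm.fit(iris.data, iris.target)
print(svm.score(iris.data, iris.target))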

Page 19: Computational decision making

General approach

An intuitive example from real life.

[Diagram: features (weather forecast, airport location, number of gates, number of runways, number of snowploughs, airline, aircraft) enter a BLACK BOX. Training it on flights cancelled in the past yields a classifier: a ranked list of relevant features, feature weights, feature thresholds and a performance metric. Feeding it new data yields a prediction.]

Page 20: Computational decision making

Classifier performance

[Flow diagram: data → training → classifier → performance assessment → good enough? If yes: success! If no: gather more data for training.]

[ROC curve: sensitivity (“true positives”) against 1 - specificity (“false positives”), both running from 0 to 1.0; points below the diagonal are worse than a random guess.]

Not all machine learning behaves ideally, and performance metrics are important for quality checks and parameter tuning.
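A minimal sketch of computing the ROC curve and its area with scikit-learn, on made-up labels and scores:

from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical true labels and classifier scores for a binary problem.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

fpr, tpr, thresholds = roc_curve(y_true, scores)  # 1-specificity, sensitivity
print(roc_auc_score(y_true, scores))              # area under the ROC curve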

Page 21: Computational decision making

https://en.wikipedia.org/wiki/Precision_and_recall

There is a wide range of performance metrics, comprising combinations of true & false positives as well as true & false negatives.

Metrics zoo

                     positive class (P)    negative class (N)
predicted positive   true positive (TP)    false positive (FP)
predicted negative   false negative (FN)   true negative (TN)
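For instance, precision and recall combine these counts as follows (a sketch with invented counts):

# Precision: what fraction of predicted positives are true positives?
# Recall (sensitivity): what fraction of actual positives are found?
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

print(precision(tp=90, fp=10))  # 0.9
print(recall(tp=90, fn=30))     # 0.75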

Page 22: Computational decision making

ML pipeline

[Diagram: in development, data acquisition and model building (raw data clean-up, feature engineering, model learning, model selection) lead to a test; the stages are variously labour-, compute- and brain-intense. In production, the model is used, the production system records data, and evaluation feeds back into development.]

Page 23: Computational decision making

Choosing a method

from: Olson et al., 2017, https://arxiv.org/abs/1708.05070

There is no ‘one-size-fits-all’ machine learning method. Most methods need to be carefully tuned to perform ideally.

Often, there are ‘non-functional’ constraints on choosing a method: runtime, interpretability, etc.

Page 24: Computational decision making

What about neural networks?

[Diagram of a simple perceptron: features 1-3, multiplied by weights 1-3, are summed by the input function and passed through the activation function to produce the class output; the error is fed back for weight updates.]

Neural networks attempt to mimic the integrative properties of neurons. The perceptron is a single-layer network.

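A minimal perceptron sketch on a toy, linearly separable problem (an AND gate; the data and learning rate are invented for illustration):

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # feature vectors
y = np.array([0, 0, 0, 1])                      # AND-gate class labels

w = np.zeros(2)   # weights
b = 0.0           # bias
for _ in range(10):                          # a few passes over the data
    for xi, target in zip(X, y):
        output = 1 if w @ xi + b > 0 else 0  # step activation function
        error = target - output              # error drives weight updates
        w += 0.1 * error * xi
        b += 0.1 * error

print(w, b)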

Page 25: Computational decision making

Deep neural networks

http://www.asimovinstitute.org/neural-network-zoo

While many artificial neural networks show great performance, exactly which features the classification is based on remains largely unknown.

Page 26: Computational decision making

Reinforcement learning

In RL, methods iteratively learn to optimise an output from an abstract representation of a system.

[Diagram, Atari example: the unknown system emits a screen map (210 × 160 pixels, 8-bit RGB) and the actual score; the agent chooses an action (move left/right or shoot) on the basis of the map, so as to optimise the score.]

Mnih et al., Nature (2015)

Page 27: Computational decision making

Unsupervised learning

Machine learning can help to structure and explore an unknown dataset.

These methods aid the identification of classes whose existence isn’t yet known.

• Hierarchical clustering
• K-means clustering
• Expectation maximisation
• Density-based clustering

plus clever visualisation

Page 28: Computational decision making

Hierarchical clustering

Derives hierarchical dependencies of individual rows and columns in the dataset on the basis of similarity (correlation) between their properties.

Combined with a heatmap, it gives a good first impression of a dataset.
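A minimal sketch with scipy's agglomerative clustering, run on the iris measurements:

from scipy.cluster.hierarchy import linkage
from sklearn.datasets import load_iris

# Agglomerative clustering: repeatedly merge the most similar rows.
iris = load_iris()
Z = linkage(iris.data, method='average', metric='euclidean')

# Each row of Z records one merge: the two clusters joined, their
# distance, and the size of the new cluster.
print(Z[:5])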

Page 29: Computational decision making

k-means clustering

Defines k different centroids to which data points are assigned by proximity. If the centroid distance doesn’t get much smaller as k grows, k is the number of clusters in the set.

Density-based clustering and expectation maximisation are conceptually related, the latter giving a probability for membership in any group.

[Plots: a clustering with k = 2; an “elbow” plot of k against centroid distance.]
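A minimal elbow-search sketch with scikit-learn's KMeans; inertia_ is the summed squared distance of points to their centroids:

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()

# Look for the k where the centroid distance stops dropping sharply.
for k in range(1, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(iris.data)
    print(k, km.inertia_)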

Page 30: Computational decision making

Conclusions

https://badryan.github.io/2015/10/20/is-it-all-machine-learning.html

• People on the Internet steal infographics.

• ML methods have been around in the stats world for ages, but big data sets and compute power make them more widely known.

• Understanding key principles behind ML should be part of the school curriculum.