Mathematical techniques in data science

Lecture 7: Classification – Logistic regression and LDA

March 1st, 2019

Chapter 4: Classification

Logistic regression

Linear Discriminant Analysis

Nearest neighbours

Support Vector Machines

Next Friday (03/08): Homework 1 due

Groups (for class projects) need to be formed by the end ofnext week

Class project: example

Dataset: 25,000 images of dogs and cats

Similar: Malaria cell images dataset, Sign language dataset

Class project: example

Dataset: 285,000 credit card transactions

Similar: heart disease dataset

Linear discriminant analysis

Suppose

Y ∈ {1, 2, . . . ,K}P(Y = i) = πi , i = 1, 2, . . . ,K .

P(X = x |Y = i) ∼ fi (x)

P(Y = i |X = x) =P(X = x |Y = i)P(Y = i)∑Kj=1 P(X = x |Y = j)P(Y = j)

=fi (x)πi∑Kj=1 fj(x)πj

Model P(X = x |Y = i) using Gaussian distributions

The natural model for fi (x) is the multivariate Gaussiandistribution

fi (x) =1√

(2π)p det(Σi )e−

(x−µi )T Σ−1

i (x−µi ), x ∈ Rp

µ: mean vector

Σ: covariance matrix

Σ = E [(X − µ)T (X − µ)]

LDA and QDA

The natural model for fi (x) is the multivariate Gaussiandistribution

fi (x) =1√

(2π)p det(Σi )e−

(x−µi )T Σ−1

i (x−µi ), x ∈ Rp

Linear discriminant analysis (LDA): We assume

Σ1 = Σ2 = . . . = ΣK

Quadratic discriminant analysis (QDA): general cases

Parameter estimation: LDA

Suppose we have dataset (x1, y1), (x2, y2), . . . , (xn, yn) where niobservations have label i .

An estimate of the class probabilities πi

πi =nin

Estimate the mean vectors

µi =1

∑yj=i

Estimate the covariance matrix Σ

N − K

K∑i=1

∑yj=i

(xj − µi )(xj − µi )T

Decision rule

Suppose we have dataset (x1, y1), (x2, y2), . . . , (xn, yn) where niobservations have label i .A new instance x arrives, how to predict label of x?

Compute πi , µi and Σ

Compute

P(Y = i |X = x) ≈ pk(x) =fi (x , µi , Σ)πi∑Kj=1 fj(x , µj , Σ)πj

Bayes classifier: assign an observation to the class for whichthe posterior probability pk(x) is greatest.

LDA: linearity of the decision boundary

Note that

pi (x) =fi (x , µi , Σ)πi∑Kj=1 fj(x , µj , Σ)πj

fi (x) =1√

(2π)p det(Σ)e−

(x−µi )T Σ−1(x−µi ), x ∈ Rp

logpi (x)

pk(x)= log

πiπk− 1

2(µi + µk)T Σ−1(µi − µk) + xT Σ−1(µi − µk)

= β0 + xTβ

How is that different from logistic regression?

Recall that for logistic regression

logP[Y = 1|X = x ]

P[Y = 0|X = x ]= β0 + xTβ

Both methods use linear decision boundary

Both are simple, and often perform very well.

However

The probability models are different

The estimations are different

Practical problem when n� p

Estimating covariance matrices when n� p is challenging

The sample covariance Σ is singular when n� p

Need regularization

Nearest neigbours

Suppose Y = {0, 1}Parameter: k

Use closest observations in the training set to make predictions

Y (x) =1

∑Nk (x)

Here Nk(x) denotes the k-nearest neighbors of x (w.r.t. somemetric, e.g. Euclidean distance)

Decision:

label(x) =

{0 if Y (x) < 0.5

1 otherwise

Nearest neighbors

Note: Y = {0, 1},Y (x) =

∑Nk (x)

Nearest neighbors

Decision boundary, k = 3

Nearest neighbors

Mathematical techniques in data science - vucdinh.github.io fileLogistic regression Linear...

Transcript of Mathematical techniques in data science - vucdinh.github.io fileLogistic regression Linear...

Mathematical techniques in data science - vucdinh.github.io fileLogistic regression Linear...

Documents

Transcript of Mathematical techniques in data science - vucdinh.github.io fileLogistic regression Linear...

The Quadratic Formula and the Discriminant. The Discriminant.

Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Neighbours 2

Neighbours Helping Neighbours Since 1947

Linear discriminant analysis, two classes Linear discriminant

1. Fisher Linear Discriminant 2. Multiple Discriminant ...

Neighbours Newsletter - neighbours-rouyn-noranda.ca · News from Noranda School 7 A Day of Remembrance 8 ... NEIGHBOURS REGIONAL ASSOCIATION OF ROUYN-NORANDA Neighbours Newsletter

Discriminant Analysis with Graph Learning for ...crabwq.github.io/pdf/2018 Discriminant Analysis... · Linear Discriminant Analysis Revisited In this section, the Linear Discriminant

Estonian neighbours

1. Fisher Linear Discriminant 2. Multiple Discriminant Analysis

Benny's neighbours

My neighbours

Incremental Hierarchical Discriminant Regressionweng/research/TNN-IHDR.pdf · Incremental Hierarchical Discriminant Regression ... incremental learning, cortical development, discriminant

Chapter 5: Linear Discriminant Functions Introduction Linear Discriminant Functions and Decisions Surfaces Generalized Linear Discriminant Functions.

LOGISTIC PROCESS REENGINEERING: A CASE STUDY · PDF fileLOGISTIC PROCESS REENGINEERING: A CASE STUDY Abstract: Theaimofthispaperistopresentthecasestudyofbusinessprocessesreengineering

1 Lecture 4 Linear machine Linear discriminant functions Generalized linear discriminant function Fisher’s Linear discriminant Perceptron Optimal separating.

The Discriminant

Discriminant Analysis To describe multiple regression analysis and multiple discriminant analysis. Discriminant Analysis.

SPSS Discriminant Function Analysis - PiratePanelcore.ecu.edu/ofe/StatisticsResearch/SPSS Discriminant Function... · SPSS Discriminant Function Analysis By Hui Bian Office for Faculty

Discriminant analysis