Bayesian Discriminant Analysis

• This supervised learning technique uses Bayes' rule but is different in philosophy from the well-known work of Aitken, Taroni, et al.
• Bayes' rule:

$\Pr(G_i \mid \mathbf{x}) = \dfrac{\Pr(\mathbf{x} \mid G_i)\,\Pr(G_i)}{\Pr(\mathbf{x})}$

• Pr is probability
• The equation means: "How does the probability of an item being a member of group $G_i$ change, given evidence $\mathbf{x}$?"
• $\Pr(G_i)$ is the prior probability. This can be a problem!
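As a minimal numeric sketch of Bayes' rule for group membership (the group names, priors, and likelihood values below are made up purely for illustration; the slides themselves work in R, but the arithmetic is the same in any language):

```python
# Bayes' rule for group membership: Pr(G_i | x) ∝ Pr(x | G_i) * Pr(G_i).
# Toy numbers, illustrative only: two groups with equal priors.
priors = {"G1": 0.5, "G2": 0.5}
likelihoods = {"G1": 0.8, "G2": 0.2}   # Pr(x | G_i) for some observed evidence x

evidence = sum(likelihoods[g] * priors[g] for g in priors)      # Pr(x)
posteriors = {g: likelihoods[g] * priors[g] / evidence for g in priors}
print(posteriors)   # {'G1': 0.8, 'G2': 0.2} -- posterior shifts toward G1
```

With equal priors the posterior is just the normalized likelihood; an informative (or wrong) prior shifts the answer, which is exactly why the prior "can be a problem."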
Bayesian Discriminant Analysis

• Bayes' rule can be turned into a classification rule:

$\Pr(\mathbf{x} \mid G_1)\Pr(G_1) > \Pr(\mathbf{x} \mid G_2)\Pr(G_2) \;\Rightarrow\;$ Choose group 1

*If priors are both 0.5, decision boundaries are where the likelihood curves cross
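A sketch of that classification rule for two 1D Gaussian groups (the means, common standard deviation, and priors are invented for illustration; with equal priors the boundary falls where the two density curves cross, at the midpoint here):

```python
import math

def gauss_pdf(x, mu, sigma):
    """Normal density Pr(x | G_i) for a 1D Gaussian group."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def classify(x, mu1=0.0, mu2=4.0, sigma=1.0, prior1=0.5, prior2=0.5):
    """Choose group 1 when Pr(x|G1)Pr(G1) > Pr(x|G2)Pr(G2), else group 2."""
    if gauss_pdf(x, mu1, sigma) * prior1 > gauss_pdf(x, mu2, sigma) * prior2:
        return 1
    return 2

# Equal priors: the boundary sits where the curves cross (x = 2 for these means).
print(classify(1.9), classify(2.1))   # 1 2
```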
Bayes-Gaussian Discriminant Analysis

• If the data is multivariate normal and drawn from the same population (shared covariance structure), the decision rule becomes: choose the group $i$ with the smallest "distance" $D_i^2$,

with the "distance" defined as:

$D_i^2 = (\mathbf{x} - \bar{\mathbf{x}}_i)^{\mathsf T}\,\mathbf{S}_{\text{pooled}}^{-1}\,(\mathbf{x} - \bar{\mathbf{x}}_i) - 2\ln\Pr(G_i)$

and $\mathbf{S}_{\text{pooled}}$ is the pooled covariance matrix (like an average of the group covariance matrices).

• Note that if the data is just 1D this is just an equation for a line: the quadratic term in $\mathbf{x}$ is the same for every group and drops out of comparisons, leaving a slope and an intercept.
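A sketch of this linear (shared-covariance) rule, in Python/NumPy rather than the course's R, on synthetic data invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic groups sharing one covariance structure (illustrative data).
X1 = rng.multivariate_normal([0, 0], [[1, 0.3], [0.3, 1]], size=200)
X2 = rng.multivariate_normal([3, 3], [[1, 0.3], [0.3, 1]], size=200)

# Pooled ("like an average") covariance matrix built from the group covariances.
S_pooled = ((len(X1) - 1) * np.cov(X1.T) + (len(X2) - 1) * np.cov(X2.T)) \
           / (len(X1) + len(X2) - 2)
S_inv = np.linalg.inv(S_pooled)

def lda_score(x, mu, prior):
    """Linear discriminant: slope mu^T S^-1, intercept -0.5 mu^T S^-1 mu + ln(prior)."""
    return mu @ S_inv @ x - 0.5 * mu @ S_inv @ mu + np.log(prior)

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
x_new = np.array([0.5, 0.2])   # clearly nearer the first group's mean
group = 1 if lda_score(x_new, mu1, 0.5) > lda_score(x_new, mu2, 0.5) else 2
print(group)   # 1
```

Note the score is linear in `x`, which is the "equation for a line" point: only the slope and intercept differ between groups.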
Bayes-Gaussian Discriminant Analysis

• If the data is multivariate normal but drawn from different populations (different covariance structures), the decision rule is the same but the "decision distance" becomes:

$D_i^2 = (\mathbf{x} - \bar{\mathbf{x}}_i)^{\mathsf T}\,\mathbf{S}_i^{-1}\,(\mathbf{x} - \bar{\mathbf{x}}_i) + \ln\lvert\mathbf{S}_i\rvert - 2\ln\Pr(G_i)$

• Note that if the data is just 1D this is an equation for a parabola, $ax^2 + bx + c$: because each group has its own $\mathbf{S}_i^{-1}$, the quadratic term no longer cancels between groups — it is the new quadratic term.
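The different-covariance "decision distance" can be sketched directly (again Python/NumPy rather than R; the means, covariances, and test point are invented for illustration):

```python
import numpy as np

def qda_distance(x, mu, cov, prior):
    """QDA "decision distance": Mahalanobis term + ln|S_i| - 2 ln prior."""
    diff = x - mu
    return diff @ np.linalg.inv(cov) @ diff + np.log(np.linalg.det(cov)) \
           - 2 * np.log(prior)

# Illustrative groups with *different* covariance structures.
mu1, cov1 = np.array([0.0, 0.0]), np.array([[1.0, 0.0], [0.0, 1.0]])
mu2, cov2 = np.array([3.0, 3.0]), np.array([[2.0, 0.5], [0.5, 2.0]])

x_new = np.array([2.5, 2.8])
# Smallest decision distance wins.
group = 1 if qda_distance(x_new, mu1, cov1, 0.5) < qda_distance(x_new, mu2, cov2, 0.5) else 2
print(group)   # 2
```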
Bayes-Gaussian Discriminant Analysis

• The "quadratic" version is always called quadratic discriminant analysis, QDA
• The "linear" version is called by a number of names!
• linear discriminant analysis, LDA
• Some combination of the above with the words "Gaussian" or "classification"
• A number of techniques use the name LDA! It is important to specify the equations used to tell the difference!
Bayes-Gaussian Discriminant Analysis

• Groups have similar covariance structure: the linear discriminant rule should work well
• Groups have different covariance structure: the quadratic discriminant rule may work better
Canonical Variate Analysis

• This supervised technique is called Linear Discriminant Analysis (LDA) in R
• Also called Fisher linear discriminant analysis
• CVA is closely related to linear Bayes-Gaussian discriminant analysis
• Works on a principle similar to PCA: look for "interesting directions in data space"
• CVA: find the directions in space which best separate the groups
• Technically: find the directions which maximize the ratio of between-group to within-group variation
Canonical Variate Analysis

• Project on PC1: not necessarily good group separation!
• Project on CV1: good group separation!
• Note: there are #groups - 1 or p CVs, whichever is smaller
Canonical Variate Analysis

• Use the between-group to within-group covariance matrix, $\mathbf{W}^{-1}\mathbf{B}$, to find the directions of best group separation: the CVA loadings, $\mathbf{A}_{cv}$, are the eigenvectors of $\mathbf{W}^{-1}\mathbf{B}$
• CVA can be used for dimension reduction
• Caution! These "dimensions" are not at right angles (i.e. not orthogonal)
• CVA plots can thus be distorted from reality
• Always check loading angles!
• Caution! CVA will not work well with very correlated data
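The eigen-decomposition of $\mathbf{W}^{-1}\mathbf{B}$ can be sketched as follows (Python/NumPy rather than R; the three groups are synthetic, invented for the example; note $\mathbf{W}^{-1}\mathbf{B}$ is not symmetric, which is why the resulting loading directions need not be orthogonal):

```python
import numpy as np

rng = np.random.default_rng(1)
# Three illustrative groups in 2D.
groups = [rng.normal(loc=m, scale=0.5, size=(50, 2))
          for m in ([0, 0], [2, 0.5], [4, 1])]

X = np.vstack(groups)
grand_mean = X.mean(axis=0)

# Within-group (W) and between-group (B) scatter matrices.
W = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)
B = sum(len(g) * np.outer(g.mean(axis=0) - grand_mean,
                          g.mean(axis=0) - grand_mean) for g in groups)

# CVA loadings A_cv: eigenvectors of W^-1 B, sorted by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(W) @ B)
order = np.argsort(eigvals)[::-1]
A_cv = np.real(eigvecs[:, order])

# At most min(#groups - 1, p) useful CVs: here min(2, 2) = 2.
scores = (X - grand_mean) @ A_cv
print(scores.shape)   # (150, 2)
```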
Canonical Variate Analysis

[Figure: 2D CVA of the gasoline data set vs. 2D PCA of the gasoline data set]
Canonical Variate Analysis

• Distance metric used in CVA to assign the group i.d. of an unknown data point: assign the point to the group whose mean is nearest in CV-score space,

$D_i^2 = \lVert \mathbf{A}_{cv}^{\mathsf T}(\mathbf{x} - \bar{\mathbf{x}}_i) \rVert^2$

• If the data is Gaussian and the group covariance structures are the same, then CVA classification is the same as Bayes-Gaussian classification.
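The nearest-group-mean assignment in CV space amounts to very little code (the CV-space coordinates below are hypothetical numbers, not from the gasoline data):

```python
import numpy as np

def assign_group(x_scores, group_mean_scores):
    """Assign an unknown point to the group whose mean is nearest in CV space."""
    dists = [np.linalg.norm(x_scores - m) for m in group_mean_scores]
    return int(np.argmin(dists))

# Hypothetical CV-space group means (illustrative numbers only).
means = [np.array([0.0, 0.0]), np.array([3.0, 1.0])]
print(assign_group(np.array([2.6, 0.8]), means))   # 1
```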
Partial Least Squares Discriminant Analysis

• PLS-DA is a supervised discrimination technique and is very popular in chemometrics
• Works well with highly correlated variables (as in spectroscopy)
• Lots of correlation causes CVA to fail!
• Group labels are coded into a "response matrix" Y
• PLS searches for directions of maximum covariance in X and Y
• The loadings for X can be used like PCA loadings:
• Dimension reduction
• Loading plots
Partial Least Squares Discriminant Analysis

• PLS-DA theory: find an (approximate) linear relationship between experimental (explanatory) variables and group labels (response variables):
• $\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E}$  (X: experimental variables; Y: labels; E: "error" or "residuals" matrix)
• $\mathbf{X} = \mathbf{T}\mathbf{P}^{\mathsf T} + \mathbf{E}_X$  (T: PLS scores; P: PLS loadings)
• $\mathbf{Y} = \mathbf{U}\mathbf{Q}^{\mathsf T} + \mathbf{E}_Y$
• So, substituting: $\mathbf{U}\mathbf{Q}^{\mathsf T} = \mathbf{T}\mathbf{P}^{\mathsf T}\mathbf{B} + \mathbf{E}$

*Use these "Y-scores" (U) with a "soft-max" or "Bayes" rule to pick the "most-likely" group label
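One PLS component of these decompositions can be sketched with the NIPALS iteration (Python/NumPy rather than the R packages the slides name; the data and dummy-coded Y are synthetic, and this is a bare-bones single-component sketch, not a full PLS implementation):

```python
import numpy as np

def nipals_pls_component(X, Y, tol=1e-10, max_iter=500):
    """One PLS component by NIPALS: X/Y scores t, u and loadings w, p, q."""
    u = Y[:, [0]]                          # start Y-scores from a Y column
    for _ in range(max_iter):
        w = X.T @ u / (u.T @ u)            # X weights
        w /= np.linalg.norm(w)
        t = X @ w                          # X-scores: max covariance with Y
        q = Y.T @ t / (t.T @ t)            # Y loadings
        u_new = Y @ q / (q.T @ q)          # Y-scores
        if np.linalg.norm(u_new - u) < tol:
            u = u_new
            break
        u = u_new
    p = X.T @ t / (t.T @ t)                # X loadings
    return t, u, w, p, q

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 5))
# Dummy-coded group labels (the "response matrix" Y) for two made-up groups.
Y = np.zeros((30, 2)); Y[:15, 0] = 1; Y[15:, 1] = 1
X, Y = X - X.mean(axis=0), Y - Y.mean(axis=0)   # center both blocks
t, u, w, p, q = nipals_pls_component(X, Y)
print(t.shape, w.shape)   # (30, 1) (5, 1)
```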
Partial Least Squares Discriminant Analysis

• How do we solve this for T, P and U?
• Objective: maximize the covariance between the X and Y scores, T and U
• Various procedures to do this:
• Kernel-PLS
• SIMPLS
• NIPALS
• These give close, but slightly different, numerical results
• In R, the functions are:
• plsr (pls package)
• spls (spls package)
• Easiest: plsda (caret package)
Partial Least Squares Discriminant Analysis

[Figure: 2D PLS of the gasoline data set vs. 2D PCA of the gasoline data set]
Partial Least Squares Discriminant Analysis

• Group assignments of observation vectors are made by interpreting the Y-scores
• Typically a "soft-max" function is used

[Figure: Y-scores plotted against observation vectors]
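The soft-max step itself is simple: it turns one observation's predicted Y-scores into pseudo-probabilities, and the group with the largest probability wins (the score values below are hypothetical):

```python
import math

def softmax(scores):
    """Soft-max: turn predicted Y-scores into pseudo-probabilities per group."""
    exps = [math.exp(s - max(scores)) for s in scores]   # shift for stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical predicted Y-scores for one observation over three groups.
probs = softmax([2.0, 0.5, -1.0])
print(max(range(3), key=lambda i: probs[i]))   # 0 -> assign the first group
```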