Bayesian Discriminant Analysis

• This supervised learning technique uses Bayes' rule but is different in philosophy from the well-known work of Aitken, Taroni, et al.
• Bayes' rule:

$\Pr(G_i \mid \mathbf{x}) = \dfrac{\Pr(\mathbf{x} \mid G_i)\,\Pr(G_i)}{\Pr(\mathbf{x})}$

• Pr is probability
• The equation means: "How does the probability of an item being a member of group $G_i$ change, given evidence $\mathbf{x}$?"
• $\Pr(G_i)$ is the prior probability. This can be a problem!
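As a minimal numeric sketch of Bayes' rule for group membership (the group names, priors, and likelihood values below are made up purely for illustration; the slides themselves work in R, but the arithmetic is the same in any language):

```python
# Bayes' rule for group membership: Pr(G_i | x) ∝ Pr(x | G_i) * Pr(G_i).
# Toy numbers, illustrative only: two groups with equal priors.
priors = {"G1": 0.5, "G2": 0.5}
likelihoods = {"G1": 0.8, "G2": 0.2}   # Pr(x | G_i) for some observed evidence x

evidence = sum(likelihoods[g] * priors[g] for g in priors)      # Pr(x)
posteriors = {g: likelihoods[g] * priors[g] / evidence for g in priors}
print(posteriors)   # {'G1': 0.8, 'G2': 0.2} -- posterior shifts toward G1
```

With equal priors the posterior is just the normalized likelihood; an informative (or wrong) prior shifts the answer, which is exactly why the prior "can be a problem."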
Bayesian Discriminant Analysis

• Bayes' rule can be turned into a classification rule:

$\Pr(\mathbf{x} \mid G_1)\Pr(G_1) > \Pr(\mathbf{x} \mid G_2)\Pr(G_2) \;\Rightarrow\;$ Choose group 1

*If priors are both 0.5, decision boundaries are where the likelihood curves cross
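A sketch of that classification rule for two 1D Gaussian groups (the means, common standard deviation, and priors are invented for illustration; with equal priors the boundary falls where the two density curves cross, at the midpoint here):

```python
import math

def gauss_pdf(x, mu, sigma):
    """Normal density Pr(x | G_i) for a 1D Gaussian group."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def classify(x, mu1=0.0, mu2=4.0, sigma=1.0, prior1=0.5, prior2=0.5):
    """Choose group 1 when Pr(x|G1)Pr(G1) > Pr(x|G2)Pr(G2), else group 2."""
    if gauss_pdf(x, mu1, sigma) * prior1 > gauss_pdf(x, mu2, sigma) * prior2:
        return 1
    return 2

# Equal priors: the boundary sits where the curves cross (x = 2 for these means).
print(classify(1.9), classify(2.1))   # 1 2
```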
Bayes-Gaussian Discriminant Analysis

• If the data is multivariate normal and drawn from the same population (shared covariance structure), the decision rule becomes: choose the group $i$ with the smallest "distance" $D_i^2$,

with the "distance" defined as:

$D_i^2 = (\mathbf{x} - \bar{\mathbf{x}}_i)^{\mathsf T}\,\mathbf{S}_{\text{pooled}}^{-1}\,(\mathbf{x} - \bar{\mathbf{x}}_i) - 2\ln\Pr(G_i)$

and $\mathbf{S}_{\text{pooled}}$ is the pooled covariance matrix (like an average of the group covariance matrices).

• Note that if the data is just 1D this is just an equation for a line: the quadratic term in $\mathbf{x}$ is the same for every group and drops out of comparisons, leaving a slope and an intercept.
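A sketch of this linear (shared-covariance) rule, in Python/NumPy rather than the course's R, on synthetic data invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic groups sharing one covariance structure (illustrative data).
X1 = rng.multivariate_normal([0, 0], [[1, 0.3], [0.3, 1]], size=200)
X2 = rng.multivariate_normal([3, 3], [[1, 0.3], [0.3, 1]], size=200)

# Pooled ("like an average") covariance matrix built from the group covariances.
S_pooled = ((len(X1) - 1) * np.cov(X1.T) + (len(X2) - 1) * np.cov(X2.T)) \
           / (len(X1) + len(X2) - 2)
S_inv = np.linalg.inv(S_pooled)

def lda_score(x, mu, prior):
    """Linear discriminant: slope mu^T S^-1, intercept -0.5 mu^T S^-1 mu + ln(prior)."""
    return mu @ S_inv @ x - 0.5 * mu @ S_inv @ mu + np.log(prior)

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
x_new = np.array([0.5, 0.2])   # clearly nearer the first group's mean
group = 1 if lda_score(x_new, mu1, 0.5) > lda_score(x_new, mu2, 0.5) else 2
print(group)   # 1
```

Note the score is linear in `x`, which is the "equation for a line" point: only the slope and intercept differ between groups.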
Bayes-Gaussian Discriminant Analysis

• If the data is multivariate normal but drawn from different populations (different covariance structures), the decision rule is the same but the "decision distance" becomes:

$D_i^2 = (\mathbf{x} - \bar{\mathbf{x}}_i)^{\mathsf T}\,\mathbf{S}_i^{-1}\,(\mathbf{x} - \bar{\mathbf{x}}_i) + \ln\lvert\mathbf{S}_i\rvert - 2\ln\Pr(G_i)$

• Note that if the data is just 1D this is an equation for a parabola, $ax^2 + bx + c$: because each group has its own $\mathbf{S}_i^{-1}$, the quadratic term no longer cancels between groups — it is the new quadratic term.
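The different-covariance "decision distance" can be sketched directly (again Python/NumPy rather than R; the means, covariances, and test point are invented for illustration):

```python
import numpy as np

def qda_distance(x, mu, cov, prior):
    """QDA "decision distance": Mahalanobis term + ln|S_i| - 2 ln prior."""
    diff = x - mu
    return diff @ np.linalg.inv(cov) @ diff + np.log(np.linalg.det(cov)) \
           - 2 * np.log(prior)

# Illustrative groups with *different* covariance structures.
mu1, cov1 = np.array([0.0, 0.0]), np.array([[1.0, 0.0], [0.0, 1.0]])
mu2, cov2 = np.array([3.0, 3.0]), np.array([[2.0, 0.5], [0.5, 2.0]])

x_new = np.array([2.5, 2.8])
# Smallest decision distance wins.
group = 1 if qda_distance(x_new, mu1, cov1, 0.5) < qda_distance(x_new, mu2, cov2, 0.5) else 2
print(group)   # 2
```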
Bayes-Gaussian Discriminant Analysis

• The "quadratic" version is always called quadratic discriminant analysis, QDA
• The "linear" version is called by a number of names!
• linear discriminant analysis, LDA
• Some combination of the above with the words "Gaussian" or "classification"
• A number of techniques use the name LDA! It is important to specify the equations used to tell the difference!
Bayes-Gaussian Discriminant Analysis

• Groups have similar covariance structure: the linear discriminant rule should work well
• Groups have different covariance structure: the quadratic discriminant rule may work better
Canonical Variate Analysis

• This supervised technique is called Linear Discriminant Analysis (LDA) in R
• Also called Fisher linear discriminant analysis
• CVA is closely related to linear Bayes-Gaussian discriminant analysis
• Works on a principle similar to PCA: look for "interesting directions in data space"
• CVA: find the directions in space which best separate the groups
• Technically: find the directions which maximize the ratio of between-group to within-group variation
Canonical Variate Analysis

• Project on PC1: not necessarily good group separation!
• Project on CV1: good group separation!
• Note: there are #groups - 1 or p CVs, whichever is smaller
Canonical Variate Analysis

• Use the between-group to within-group covariance matrix, $\mathbf{W}^{-1}\mathbf{B}$, to find the directions of best group separation: the CVA loadings, $\mathbf{A}_{cv}$, are the eigenvectors of $\mathbf{W}^{-1}\mathbf{B}$
• CVA can be used for dimension reduction
• Caution! These "dimensions" are not at right angles (i.e. not orthogonal)
• CVA plots can thus be distorted from reality
• Always check loading angles!
• Caution! CVA will not work well with very correlated data
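The eigen-decomposition of $\mathbf{W}^{-1}\mathbf{B}$ can be sketched as follows (Python/NumPy rather than R; the three groups are synthetic, invented for the example; note $\mathbf{W}^{-1}\mathbf{B}$ is not symmetric, which is why the resulting loading directions need not be orthogonal):

```python
import numpy as np

rng = np.random.default_rng(1)
# Three illustrative groups in 2D.
groups = [rng.normal(loc=m, scale=0.5, size=(50, 2))
          for m in ([0, 0], [2, 0.5], [4, 1])]

X = np.vstack(groups)
grand_mean = X.mean(axis=0)

# Within-group (W) and between-group (B) scatter matrices.
W = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)
B = sum(len(g) * np.outer(g.mean(axis=0) - grand_mean,
                          g.mean(axis=0) - grand_mean) for g in groups)

# CVA loadings A_cv: eigenvectors of W^-1 B, sorted by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(W) @ B)
order = np.argsort(eigvals)[::-1]
A_cv = np.real(eigvecs[:, order])

# At most min(#groups - 1, p) useful CVs: here min(2, 2) = 2.
scores = (X - grand_mean) @ A_cv
print(scores.shape)   # (150, 2)
```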
Canonical Variate Analysis

[Figure: 2D CVA of the gasoline data set vs. 2D PCA of the gasoline data set]
Canonical Variate Analysis

• Distance metric used in CVA to assign the group i.d. of an unknown data point: assign the point to the group whose mean is nearest in CV-score space,

$D_i^2 = \lVert \mathbf{A}_{cv}^{\mathsf T}(\mathbf{x} - \bar{\mathbf{x}}_i) \rVert^2$

• If the data is Gaussian and the group covariance structures are the same, then CVA classification is the same as Bayes-Gaussian classification.
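The nearest-group-mean assignment in CV space amounts to very little code (the CV-space coordinates below are hypothetical numbers, not from the gasoline data):

```python
import numpy as np

def assign_group(x_scores, group_mean_scores):
    """Assign an unknown point to the group whose mean is nearest in CV space."""
    dists = [np.linalg.norm(x_scores - m) for m in group_mean_scores]
    return int(np.argmin(dists))

# Hypothetical CV-space group means (illustrative numbers only).
means = [np.array([0.0, 0.0]), np.array([3.0, 1.0])]
print(assign_group(np.array([2.6, 0.8]), means))   # 1
```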
Partial Least Squares Discriminant Analysis

• PLS-DA is a supervised discrimination technique and is very popular in chemometrics
• Works well with highly correlated variables (as in spectroscopy)
• Lots of correlation causes CVA to fail!
• Group labels are coded into a "response matrix" Y
• PLS searches for directions of maximum covariance in X and Y
• The loadings for X can be used like PCA loadings:
• Dimension reduction
• Loading plots
Partial Least Squares Discriminant Analysis

• PLS-DA theory: find an (approximate) linear relationship between experimental (explanatory) variables and group labels (response variables):
• $\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E}$  (X: experimental variables; Y: labels; E: "error" or "residuals" matrix)
• $\mathbf{X} = \mathbf{T}\mathbf{P}^{\mathsf T} + \mathbf{E}_X$  (T: PLS scores; P: PLS loadings)
• $\mathbf{Y} = \mathbf{U}\mathbf{Q}^{\mathsf T} + \mathbf{E}_Y$
• So, substituting: $\mathbf{U}\mathbf{Q}^{\mathsf T} = \mathbf{T}\mathbf{P}^{\mathsf T}\mathbf{B} + \mathbf{E}$

*Use these "Y-scores" (U) with a "soft-max" or "Bayes" rule to pick the "most-likely" group label
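One PLS component of these decompositions can be sketched with the NIPALS iteration (Python/NumPy rather than the R packages the slides name; the data and dummy-coded Y are synthetic, and this is a bare-bones single-component sketch, not a full PLS implementation):

```python
import numpy as np

def nipals_pls_component(X, Y, tol=1e-10, max_iter=500):
    """One PLS component by NIPALS: X/Y scores t, u and loadings w, p, q."""
    u = Y[:, [0]]                          # start Y-scores from a Y column
    for _ in range(max_iter):
        w = X.T @ u / (u.T @ u)            # X weights
        w /= np.linalg.norm(w)
        t = X @ w                          # X-scores: max covariance with Y
        q = Y.T @ t / (t.T @ t)            # Y loadings
        u_new = Y @ q / (q.T @ q)          # Y-scores
        if np.linalg.norm(u_new - u) < tol:
            u = u_new
            break
        u = u_new
    p = X.T @ t / (t.T @ t)                # X loadings
    return t, u, w, p, q

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 5))
# Dummy-coded group labels (the "response matrix" Y) for two made-up groups.
Y = np.zeros((30, 2)); Y[:15, 0] = 1; Y[15:, 1] = 1
X, Y = X - X.mean(axis=0), Y - Y.mean(axis=0)   # center both blocks
t, u, w, p, q = nipals_pls_component(X, Y)
print(t.shape, w.shape)   # (30, 1) (5, 1)
```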
Partial Least Squares Discriminant Analysis

• How do we solve this for T, P and U?
• Objective: maximize the covariance between the X and Y scores, T and U
• Various procedures to do this:
• Kernel-PLS
• SIMPLS
• NIPALS
• These give close, but slightly different, numerical results
• In R, the functions are:
• plsr (pls package)
• spls (spls package)
• Easiest: plsda (caret package)
Partial Least Squares Discriminant Analysis

[Figure: 2D PLS of the gasoline data set vs. 2D PCA of the gasoline data set]
Partial Least Squares Discriminant Analysis

• Group assignments of observation vectors are made by interpreting the Y-scores
• Typically a "soft-max" function is used

[Figure: Y-scores plotted against observation vectors]
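The soft-max step itself is simple: it turns one observation's predicted Y-scores into pseudo-probabilities, and the group with the largest probability wins (the score values below are hypothetical):

```python
import math

def softmax(scores):
    """Soft-max: turn predicted Y-scores into pseudo-probabilities per group."""
    exps = [math.exp(s - max(scores)) for s in scores]   # shift for stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical predicted Y-scores for one observation over three groups.
probs = softmax([2.0, 0.5, -1.0])
print(max(range(3), key=lambda i: probs[i]))   # 0 -> assign the first group
```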