Environmental Data Analysis with MatLab
![Page 1: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/1.jpg)
Environmental Data Analysis with MatLab
Lecture 15:
Factor Analysis
![Page 2: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/2.jpg)
SYLLABUS
Lecture 01 Using MatLab
Lecture 02 Looking At Data
Lecture 03 Probability and Measurement Error
Lecture 04 Multivariate Distributions
Lecture 05 Linear Models
Lecture 06 The Principle of Least Squares
Lecture 07 Prior Information
Lecture 08 Solving Generalized Least Squares Problems
Lecture 09 Fourier Series
Lecture 10 Complex Fourier Series
Lecture 11 Lessons Learned from the Fourier Transform
Lecture 12 Power Spectral Density
Lecture 13 Filter Theory
Lecture 14 Applications of Filters
Lecture 15 Factor Analysis
Lecture 16 Orthogonal functions
Lecture 17 Covariance and Autocorrelation
Lecture 18 Cross-correlation
Lecture 19 Smoothing, Correlation and Spectra
Lecture 20 Coherence; Tapering and Spectral Analysis
Lecture 21 Interpolation
Lecture 22 Hypothesis testing
Lecture 23 Hypothesis Testing continued; F-Tests
Lecture 24 Confidence Limits of Spectra, Bootstraps
![Page 3: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/3.jpg)
purpose of the lecture
introduce
Factor Analysis
a method of detecting patterns in data
![Page 4: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/4.jpg)
example:
sediment samples are a mix of several sources
[Figure: source A and source B both contribute to the ocean sediment, which is sampled at s1 through s5]
![Page 5: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/5.jpg)
[Figure: ocean sediment samples s1 and s2, each with a composition given by elements e1 through e5]
what does the composition of the samples
tell you about the composition of the sources?
![Page 6: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/6.jpg)
another example
Atlantic Rock Dataset
chemical composition for several thousand rocks
![Page 7: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/7.jpg)
Rocks are a mix of minerals, and …
…minerals have a well-defined composition
[Figure: rocks 1 through 7 drawn as mixtures of minerals 1, 2 and 3]
![Page 8: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/8.jpg)
Which is simpler?
rocks have a chemical composition
or
rocks contain minerals
and
minerals have chemical compositions
![Page 9: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/9.jpg)
answer will depend on how many minerals are involved
and how many elements are in each mineral
![Page 10: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/10.jpg)
representing mixing with matrices
![Page 11: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/11.jpg)
the sample matrix, S
N samples by M elements
e.g. sediment samples, rock samples
the word "element" is used in the abstract sense and may not refer to actual chemical elements
![Page 12: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/12.jpg)
the factor matrix, F
P factors by M elements
e.g. sediment sources, minerals
note that there are P factors
a simplification if P<M
![Page 13: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/13.jpg)
the loading matrix, C
N samples by P factors
specifies the mix of factors for each sample
![Page 14: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/14.jpg)
summary
samples contain factors
factors contain elements
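The summary can be written as the matrix product S = CF. A minimal MatLab sketch, assuming made-up numbers for a hypothetical case of N=4 samples, P=2 factors and M=3 elements (none of these values come from the lecture):

```matlab
% Hypothetical example: N=4 samples, P=2 factors, M=3 elements.
% Factor matrix F (P by M): each row is the composition of one factor.
F = [0.7, 0.2, 0.1;    % factor 1 (e.g. source A)
     0.1, 0.3, 0.6];   % factor 2 (e.g. source B)

% Loading matrix C (N by P): each row is the mix of factors in one sample.
C = [1.0, 0.0;
     0.8, 0.2;
     0.5, 0.5;
     0.0, 1.0];

% Sample matrix S (N by M): samples contain factors, factors contain elements.
S = C * F;

disp(size(S));   % N by M
disp(S);
```

Each row of S is a weighted sum of the rows of F, with the weights taken from the matching row of C.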
![Page 15: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/15.jpg)
an important issue
how many factors are needed to represent the samples?
need at most P=M
but is P < M ?
![Page 16: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/16.jpg)
simple example using ternary diagrams
![Page 17: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/17.jpg)
[Figure: samples plotted on a ternary diagram whose vertices are three elements]
![Page 18: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/18.jpg)
[Figure: the same ternary diagram, with the samples falling along a line]
line of samples implies only 2 factors, so P=2
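One way to check this numerically is to count the linearly independent rows of the sample matrix. The sketch below assumes two hypothetical end-member compositions, f1 and f2, and builds samples that lie on the line between them; the numbers are illustrative only.

```matlab
% Two hypothetical end-member compositions (fractions of three elements).
f1 = [0.8, 0.1, 0.1];
f2 = [0.1, 0.3, 0.6];

% Ten samples that are mixtures of the two end-members; on a ternary
% diagram they plot along the line joining f1 and f2.
a = linspace(0, 1, 10)';         % mixing proportions, N by 1
S = a * f1 + (1 - a) * f2;       % N by M sample matrix, M = 3

% The number of linearly independent rows is the number of factors needed.
P = rank(S);
fprintf('M = %d elements, but P = %d factors\n', size(S, 2), P);
```

With real, noisy data the samples only scatter about a line, so `rank` would report M; the singular values introduced later in the lecture give a more tolerant test.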
![Page 19: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/19.jpg)
[Figure: the ternary diagram showing the samples together with the factors]
![Page 20: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/20.jpg)
data do not uniquely determine factors
A) two bracketing factors
B) most typical factor and deviation from it
[Figure: two panels, A) and B), showing factors f1 and f2 and alternative factors f'1 and f'2]
![Page 21: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/21.jpg)
mathematically
S = CF = C' F'
with F' = M F and C' = C M^{-1}
where M is any P×P matrix with an inverse
must rely on prior information to choose M
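The non-uniqueness is easy to confirm numerically. In this sketch the matrices C, F and the mixing matrix (named `Mmix` here to avoid confusion with M, the number of elements) are hypothetical values chosen for illustration:

```matlab
F = [0.7, 0.2, 0.1;
     0.1, 0.3, 0.6];          % P by M factors (hypothetical values)
C = [1.0, 0.0;
     0.5, 0.5;
     0.2, 0.8];               % N by P loadings (hypothetical values)
S = C * F;

Mmix = [2, 1;
        1, 1];                % any invertible P by P matrix

Fp = Mmix * F;                % F' = M F
Cp = C / Mmix;                % C' = C M^{-1}, computed without forming inv()

err = max(max(abs(S - Cp * Fp)));
fprintf('largest difference between CF and C''F'' = %g\n', err);
```

Both pairs (C, F) and (C', F') reproduce S to rounding error, so the data alone cannot distinguish them.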
![Page 22: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/22.jpg)
a method to determine
the minimum number of factors, P
and
one possible set of factors
![Page 23: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/23.jpg)
a digression, but an important one
suppose that we have an N×N square matrix, M
and we experiment with it by multiplying "input" vectors, v, by it to create "output" vectors, w
w = Mv
![Page 24: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/24.jpg)
surprisingly, the answer to the question
when is the output parallel to the input ?
tells us everything about the matrix
![Page 25: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/25.jpg)
if w is parallel to v
then
w = λ v
where λ is a proportionality factor
the equation
w = Mv
is then
λ v = Mv or (M - λ I)v=0
![Page 26: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/26.jpg)
but if (M - λ I)v=0
then it would seem that
v = (M - λ I)^{-1} 0 = 0
which is not a very interesting solution
w is parallel to v when v is zero
![Page 27: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/27.jpg)
to make an interesting solution you must choose λ so that
(M - λ I)^{-1} doesn't exist
which is equivalent to choosing λ so that
det(M - λ I)=0
![Page 28: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/28.jpg)
since a matrix with zero
determinant has no inverse
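As a numerical illustration, the values of λ that make the determinant vanish can be found as the roots of the characteristic polynomial. The sketch below uses an arbitrary 2×2 matrix (an assumption, not taken from the lecture) and the MatLab functions `poly()` and `roots()`:

```matlab
M = [2, 1;
     1, 3];                       % hypothetical 2 by 2 matrix

p = poly(M);                      % coefficients of det(lambda*I - M)
lambda = roots(p);                % values of lambda where the determinant vanishes

for k = 1:numel(lambda)
    d = det(M - lambda(k) * eye(2));
    fprintf('lambda = %.4f, det(M - lambda I) = %.2e\n', lambda(k), d);
end
```

For large matrices this route is numerically fragile; it is shown only to make the determinant condition concrete.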
![Page 29: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/29.jpg)
in the 2×2 case …
this is a quadratic equation in λ
and so has two solutions, λ1 and λ2
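The equation on this slide is not preserved in the transcript. A worked version for a general 2×2 matrix, writing its elements as M_{ij} (notation assumed here), is:

```latex
\begin{align*}
\det(\mathbf{M}-\lambda\mathbf{I})
  &= \det\begin{bmatrix} M_{11}-\lambda & M_{12} \\ M_{21} & M_{22}-\lambda \end{bmatrix}
   = (M_{11}-\lambda)(M_{22}-\lambda) - M_{12}M_{21} = 0 \\
\lambda^{2} &- (M_{11}+M_{22})\,\lambda + (M_{11}M_{22} - M_{12}M_{21}) = 0 \\
\lambda_{1,2} &= \tfrac{1}{2}\left[ (M_{11}+M_{22}) \pm
   \sqrt{(M_{11}-M_{22})^{2} + 4\,M_{12}M_{21}} \right]
\end{align*}
```

When M is symmetric, M_{12}M_{21} = M_{12}^2 is non-negative, so the quantity under the square root cannot be negative and both solutions are real.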
![Page 30: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/30.jpg)
in the N×N case
det(M - λ I)=0
is an Nth-order polynomial equation
and so has N solutions
λ1, λ2, … λN
each corresponds to a different v
v(1), v(2), … v(N)
![Page 31: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/31.jpg)
the N solutions, λ1, λ2, … λN, are called "eigenvalues"
the corresponding vectors, v(1), v(2), … v(N), are called "eigenvectors"
![Page 32: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/32.jpg)
N×N matrix, M
w = Mv
when is the output parallel to the input ?
N different cases
Mv(1) = λ1 v(1)
Mv(2) = λ2 v(2)
…
Mv(N) = λN v(N)
![Page 33: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/33.jpg)
simplify notation
MV = V Λ
where the columns of V are the eigenvectors, v(i), and Λ is the diagonal matrix of eigenvalues, λi
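In MatLab, `eig()` returns both matrices at once. A minimal sketch, assuming an arbitrary 3×3 matrix chosen only for illustration:

```matlab
M = [4, 1, 0;
     1, 3, 1;
     0, 1, 2];                 % hypothetical N by N matrix, N = 3

[V, Lambda] = eig(M);          % columns of V are eigenvectors,
                               % Lambda is diagonal with the eigenvalues

% Each column satisfies M*v(i) = lambda_i*v(i); check all N cases at once.
err = max(max(abs(M * V - V * Lambda)));
fprintf('largest element of MV - V*Lambda = %g\n', err);

% Check a single case: the output M*v is parallel to the input v.
v1 = V(:, 1);
w1 = M * v1;
disp([w1, Lambda(1, 1) * v1]);   % the two columns agree
```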
![Page 34: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/34.jpg)
In the text it's shown that
if M is symmetric
then
all λ's are real
v's are orthonormal
v(i)^T v(j) = 1 if i=j, 0 if i ≠ j
![Page 35: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/35.jpg)
the orthonormality of the v's
implies V^T V = V V^T = I
![Page 36: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/36.jpg)
MV = V Λ
post-multiply by V^T
M = V Λ V^T
M can be constructed from V and Λ
so
when is the output parallel to the input ?
tells you everything about M
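These properties can be checked numerically. The sketch below assumes an arbitrary symmetric matrix, built by adding a matrix to its transpose, and tests that the eigenvalues are real, that V^T V = I, and that M = V Λ V^T:

```matlab
A = [1, 2, 3;
     4, 5, 6;
     7, 8, 10];
M = A + A';                        % hypothetical symmetric matrix

[V, Lambda] = eig(M);

isRealLambda = isreal(diag(Lambda));              % all eigenvalues real
orthoErr = max(max(abs(V' * V - eye(3))));        % V^T V = I
rebuildErr = max(max(abs(M - V * Lambda * V')));  % M = V Lambda V^T

fprintf('eigenvalues real: %d\n', isRealLambda);
fprintf('largest element of V''V - I: %g\n', orthoErr);
fprintf('largest element of M - V*Lambda*V'': %g\n', rebuildErr);
```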
![Page 37: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/37.jpg)
now here’s what this has to do with factors
![Page 38: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/38.jpg)
suppose S is square and symmetric
then
S = CF = V Λ V^T
![Page 39: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/39.jpg)
in S = CF = V Λ V^T, identify
C = V Λ and F = V^T
![Page 40: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/40.jpg)
S can be represented by M mutually-perpendicular factors, F
![Page 41: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/41.jpg)
furthermore, suppose that only P eigenvalues are nonzero
the eigenvectors with zero eigenvalues can be thrown out of the equation
![Page 42: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/42.jpg)
we can reduce the number of factors from M to P
S = CF = V_P Λ_P V_P^T
with C = V_P Λ_P and F = V_P^T
S can be represented by P mutually-perpendicular factors, F_P
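A small numerical sketch of this reduction, assuming a hypothetical 4×4 symmetric matrix constructed to have exactly two nonzero eigenvalues. The tolerance used to decide what counts as zero is also an assumption:

```matlab
% Build a hypothetical 4 by 4 symmetric matrix with only P = 2 nonzero
% eigenvalues, from two perpendicular unit vectors.
v1 = [1; 1; 1; 1] / 2;
v2 = [1; -1; 1; -1] / 2;
S = 5 * (v1 * v1') + 2 * (v2 * v2');

[V, Lambda] = eig(S);
lambda = diag(Lambda);

% Keep eigenvectors whose eigenvalues are not (numerically) zero.
tol = 1e-10 * max(abs(lambda));
keep = abs(lambda) > tol;
VP = V(:, keep);
LambdaP = diag(lambda(keep));
P = sum(keep);

C = VP * LambdaP;      % N by P loadings
F = VP';               % P by M factors, rows mutually perpendicular

err = max(max(abs(S - C * F)));
fprintf('P = %d, largest reconstruction error = %g\n', P, err);
```

The discarded eigenvectors multiply zero eigenvalues, so dropping them changes S only at the level of rounding error.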
![Page 43: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/43.jpg)
unfortunately …
S is usually neither square nor symmetric
so a patch in the methodology is needed
![Page 44: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/44.jpg)
the trick …
S^T S is an M×M square matrix
![Page 45: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/45.jpg)
suppose
S^T S has eigenvalues Λ_P and eigenvectors V_P
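A quick check that S^T S has the properties needed by the eigenvalue method, using a hypothetical 5×3 sample matrix. The comparison with `svd()` anticipates the result derived on the following slides:

```matlab
S = [1, 2, 0;
     0, 1, 1;
     2, 0, 1;
     1, 1, 1;
     3, 1, 2];                      % hypothetical N=5 by M=3 sample matrix

A = S' * S;                         % M by M

symErr = max(max(abs(A - A')));     % zero: S^T S is symmetric
lambda = sort(eig(A), 'descend');   % real and non-negative
sigma = svd(S);                     % singular values, largest first

disp(size(A));
disp([lambda, sigma .^ 2]);         % the two columns agree
```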
![Page 46: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/46.jpg)
S^T S written in terms of its eigenvalues and eigenvectors
![Page 47: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/47.jpg)
write ΛP as product of its square roots
![Page 48: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/48.jpg)
insert identity matrix, I
![Page 49: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/49.jpg)
write I = U_P^T U_P, with U_P as yet unknown
![Page 50: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/50.jpg)
group and write first group as transpose of transpose
![Page 51: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/51.jpg)
compare
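The equations for these steps were on the slide images and are not in the transcript. The following is a reconstruction that follows the listed steps in order; it is a sketch consistent with the standard derivation, not a copy of the slides:

```latex
\begin{align*}
\mathbf{S}^{T}\mathbf{S}
 &= \mathbf{V}_P \boldsymbol{\Lambda}_P \mathbf{V}_P^{T}
   && \text{eigenvalues and eigenvectors} \\
 &= \mathbf{V}_P \boldsymbol{\Lambda}_P^{1/2} \boldsymbol{\Lambda}_P^{1/2} \mathbf{V}_P^{T}
   && \text{product of square roots} \\
 &= \mathbf{V}_P \boldsymbol{\Lambda}_P^{1/2}\, \mathbf{I}\, \boldsymbol{\Lambda}_P^{1/2} \mathbf{V}_P^{T}
   && \text{insert } \mathbf{I} \\
 &= \mathbf{V}_P \boldsymbol{\Lambda}_P^{1/2}\, \mathbf{U}_P^{T}\mathbf{U}_P\, \boldsymbol{\Lambda}_P^{1/2} \mathbf{V}_P^{T}
   && \mathbf{I} = \mathbf{U}_P^{T}\mathbf{U}_P \\
 &= \left(\mathbf{U}_P \boldsymbol{\Lambda}_P^{1/2} \mathbf{V}_P^{T}\right)^{T}
    \left(\mathbf{U}_P \boldsymbol{\Lambda}_P^{1/2} \mathbf{V}_P^{T}\right)
   && \text{group}
\end{align*}
```

Comparing the last line with the left-hand side suggests S = U_P Λ_P^{1/2} V_P^T. Post-multiplying by V_P Λ_P^{-1/2} then gives the as yet unknown matrix, U_P = S V_P Λ_P^{-1/2}, and substituting back confirms that U_P^T U_P = I.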
![Page 52: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/52.jpg)
so
![Page 53: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/53.jpg)
and
so
![Page 54: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/54.jpg)
so
S = U_P Σ_P V_P^T
called the "singular value decomposition" of S
the diagonal elements of Σ_P are called the "singular values"
now the non-square, non-symmetric matrix, S, is represented as a mix of P mutually perpendicular factors
![Page 55: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/55.jpg)
the matrix of loadings, C
the matrix of factors, F
since C depends on Σ, the samples contain more of the factors with large singular values than of the factors with small singular values
![Page 56: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/56.jpg)
in MatLab
svd() computes all M factors
(you must decide how many to use)
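A minimal sketch of the calculation, assuming a small synthetic sample matrix and the common convention C = UΣ and F = V^T (consistent with C depending on Σ, but the slide equations themselves are not in the transcript). The data and the choice P=2 are hypothetical:

```matlab
% Hypothetical data: N=6 samples of M=3 elements built from P=2 factors.
Ftrue = [0.7, 0.2, 0.1;
         0.1, 0.3, 0.6];
Ctrue = [1.0, 0.0; 0.8, 0.2; 0.6, 0.4; 0.4, 0.6; 0.2, 0.8; 0.0, 1.0];
S = Ctrue * Ftrue;

[U, Sigma, V] = svd(S, 0);     % economy-size SVD: S = U*Sigma*V'
sigma = diag(Sigma);           % singular values, largest first
disp(sigma');

P = 2;                             % number of factors we decide to keep
C = U(:, 1:P) * Sigma(1:P, 1:P);   % N by P loadings (assumed C = U*Sigma)
F = V(:, 1:P)';                    % P by M factors  (assumed F = V')

err = max(max(abs(S - C * F)));
fprintf('largest reconstruction error with P = %d: %g\n', P, err);
```

The factors returned this way are mutually perpendicular and will generally differ from `Ftrue`, which illustrates the earlier point that the data do not uniquely determine the factors.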
![Page 57: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/57.jpg)
[Figure: singular values, s(i), plotted against index, i = 1 to 8, with the vertical axis running from 0 to 5000]
singular values of the Atlantic Rock dataset (sorted into order of size)
![Page 58: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/58.jpg)
[Figure: the same plot of singular values, s(i), against index, i, with the smallest singular values marked]
discard, since close to zero
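With noisy data the small singular values are close to zero rather than exactly zero, so a cutoff has to be chosen. The sketch below uses synthetic data, not the Atlantic Rock dataset, and a hypothetical rule that keeps singular values larger than 1% of the largest one; the threshold is an assumption and would normally be set by inspecting a plot like the one above.

```matlab
N = 200; M = 8; Ptrue = 3;                % hypothetical sizes
S = rand(N, Ptrue) * rand(Ptrue, M) ...   % exact mix of Ptrue factors
    + 1e-3 * randn(N, M);                 % plus small measurement noise

sigma = svd(S);                           % sorted, largest first

% Hypothetical rule: keep singular values above 1% of the largest.
P = sum(sigma > 0.01 * sigma(1));

[U, Sigma, V] = svd(S, 0);
SP = U(:, 1:P) * Sigma(1:P, 1:P) * V(:, 1:P)';   % rank-P approximation
relErr = norm(S - SP, 'fro') / norm(S, 'fro');

fprintf('kept P = %d of M = %d factors, relative misfit = %g\n', P, M, relErr);
```

The data are random, so P and the misfit vary from run to run; the discarded singular values are all below the cutoff, which keeps the relative misfit small.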
![Page 59: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/59.jpg)
factors of the Atlantic Rock dataset
![Page 60: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/60.jpg)
factors of the Atlantic Rock dataset
factor 1 is the “typical factor”
![Page 61: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/61.jpg)
factors of the Atlantic Rock dataset
factor 2: as MgO increases, Al2O3 and CaO decrease
![Page 62: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/62.jpg)
factors of the Atlantic Rock dataset
factor 3: as Al2O3 increases, FeO and CaO increase
![Page 63: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/63.jpg)
graphical representation of factors 2 through 5
[Figure: factors f2, f3, f4 and f5, each plotted by element: SiO2, TiO2, Al2O3, FeO (total), MgO, CaO, Na2O, K2O]
![Page 64: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/64.jpg)
[Figure: 3D plot with axes C2, C3 and C4]
factor loadings C2 through C4 plotted in 3D
factors 2 through 4 capture most of the variability of the rocks
![Page 65: Environmental Data Analysis with MatLab](https://reader036.fdocuments.us/reader036/viewer/2022062323/56815df1550346895dcc224a/html5/thumbnails/65.jpg)
[Figure: four panels, A) through D), with axes labeled by oxides: Al2O3, TiO2, SiO2, K2O, FeO and MgO]