Post on 05-Jan-2016

Math 5364/66 Notes: Principal Components and Factor Analysis in SAS

Jesse Crawford

Department of Mathematics, Tarleton State University

Setting for Principal Components

Random vector $X = (X_1, \ldots, X_p)'$ taking values in $\mathbb{R}^p$

Typical Coordinate System

Principal Components

Relation to Eigenvectors

• Let $\Sigma = \operatorname{cov}(X)$

• Suppose $\lambda_1 \ge \cdots \ge \lambda_p \ge 0$ are the eigenvalues of $\Sigma$

• Let $a_1, \ldots, a_p$ be corresponding orthonormal eigenvectors

• Then $a_1, \ldots, a_p$ are the principal components
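For a 2×2 covariance matrix the eigenvalues and orthonormal eigenvectors have a closed form, so the relation between principal components and eigenvectors can be checked by hand. The following is a minimal Python sketch, not the SAS or R code from the slides. The function name `eigen_sym_2x2` is hypothetical, and the example matrix is the 2×2 covariance matrix used in the simulation slides.

```python
import math

def eigen_sym_2x2(s11, s12, s22):
    """Eigenvalues (largest first) and orthonormal eigenvectors of
    the symmetric matrix [[s11, s12], [s12, s22]]."""
    mean = (s11 + s22) / 2.0
    radius = math.hypot((s11 - s22) / 2.0, s12)
    lam1, lam2 = mean + radius, mean - radius
    if s12 == 0.0:
        # Already diagonal: the coordinate axes are the eigenvectors.
        if s11 >= s22:
            a1, a2 = (1.0, 0.0), (0.0, 1.0)
        else:
            a1, a2 = (0.0, 1.0), (1.0, 0.0)
    else:
        # (lam1 - s22, s12) solves (Sigma - lam1 * I) a = 0; normalize it.
        v = (lam1 - s22, s12)
        norm = math.hypot(v[0], v[1])
        a1 = (v[0] / norm, v[1] / norm)
        # In two dimensions the second eigenvector is a1 rotated by 90 degrees.
        a2 = (-a1[1], a1[0])
    return (lam1, lam2), (a1, a2)

# Covariance matrix from the simulation slides.
(lam1, lam2), (a1, a2) = eigen_sym_2x2(25.0, 10.0, 4.16)
print("eigenvalues:", lam1, lam2)
print("first principal component a1:", a1)
print("second principal component a2:", a2)
print("share of total variance along a1:", lam1 / (lam1 + lam2))
```

The closed form only covers the 2×2 case; for general $p$ a numerical eigensolver would be used instead.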

Implementation in R

Simulating the Data in SAS

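$$\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = \begin{pmatrix} 5 & 0 \\ 2 & 0.4 \end{pmatrix} \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix}$$

$$\begin{aligned}
\operatorname{cov}\begin{pmatrix} X_1 \\ X_2 \end{pmatrix}
&= \begin{pmatrix} 5 & 0 \\ 2 & 0.4 \end{pmatrix} \operatorname{cov}\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} \begin{pmatrix} 5 & 2 \\ 0 & 0.4 \end{pmatrix} \\
&= \begin{pmatrix} 5 & 0 \\ 2 & 0.4 \end{pmatrix} I \begin{pmatrix} 5 & 2 \\ 0 & 0.4 \end{pmatrix} \\
&= \begin{pmatrix} 25 & 10 \\ 10 & 4.16 \end{pmatrix}
\end{aligned}$$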
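The covariance calculation on this slide can be checked numerically. The sketch below is Python rather than the SAS data step the slides use, and it assumes that $Z_1$ and $Z_2$ are independent standard normal variables (the slide itself only uses $\operatorname{cov}(Z) = I$). All function names are hypothetical.

```python
import random

# Mixing matrix from the slide: X = A Z.
A = [[5.0, 0.0],
     [2.0, 0.4]]

def a_times_a_transpose(a):
    """Return a a' for a square matrix a given as a list of rows."""
    n = len(a)
    return [[sum(a[i][m] * a[j][m] for m in range(n)) for j in range(n)]
            for i in range(n)]

def simulate_x(n, seed=0):
    """Draw n observations of X = A Z, with Z1, Z2 independent N(0, 1) (assumed)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        draws.append((A[0][0] * z1 + A[0][1] * z2,
                      A[1][0] * z1 + A[1][1] * z2))
    return draws

def sample_cov(draws):
    """Unbiased 2x2 sample covariance matrix of a list of (x1, x2) pairs."""
    n = len(draws)
    means = [sum(d[j] for d in draws) / n for j in range(2)]
    return [[sum((d[i] - means[i]) * (d[j] - means[j]) for d in draws) / (n - 1)
             for j in range(2)] for i in range(2)]

print("theoretical cov(X) = A A':", a_times_a_transpose(A))
print("sample cov(X), n = 20000:", sample_cov(simulate_x(20000)))
```

The sample covariance matrix should be close to, but not exactly equal to, the theoretical one; the gap shrinks as the sample size grows.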

Covariance Matrix in SAS

$$\operatorname{cov}(X) = \begin{pmatrix} 25 & 10 \\ 10 & 4.16 \end{pmatrix}$$

Principal Components in SAS

Inputting a Covariance Matrix Manually

PCA Using Original Data

Example: Math and Reading Exams

Example: Adelges (Winged Aphids)

• 19 variables

• 4 principal components needed to explain 90% of the total variation

• PCA can be used to reduce dimensionality
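The "explain 90% of the total variation" criterion amounts to finding the smallest number of leading eigenvalues whose sum reaches 90% of the sum of all eigenvalues. A minimal Python sketch of that rule follows. The function name `components_needed` is hypothetical, and the eigenvalues in the example are made-up illustrative numbers, not the Adelges data.

```python
def components_needed(eigenvalues, threshold=0.90):
    """Smallest m such that the m largest eigenvalues account for at least
    `threshold` of the total variance (the sum of all eigenvalues)."""
    lams = sorted(eigenvalues, reverse=True)
    total = sum(lams)
    if total <= 0:
        raise ValueError("total variance must be positive")
    running = 0.0
    for m, lam in enumerate(lams, start=1):
        running += lam
        if running / total >= threshold:
            return m
    return len(lams)

# Hypothetical eigenvalues, for illustration only (NOT the Adelges data).
example_eigenvalues = [5.0, 2.0, 1.5, 1.0, 0.5]
print(components_needed(example_eigenvalues, threshold=0.90))
```

Components beyond the returned count are the ones dropped when PCA is used for dimensionality reduction.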

PCA Summary

• $p$-dimensional random vector $X$

• Covariance matrix $\Sigma$

• Principal components are simply an orthonormal eigenbasis of $\Sigma$

• Dimensionality reduction is achieved by dropping components with small eigenvalues

Setting for Factor Analysis

• Random vector $X = (X_1, \ldots, X_p)'$

• Example from Spearman (1904). Exam scores for 33 students.

$X = (\text{Classics}, \text{French}, \text{English}, \text{Math}, \text{Pitch}, \text{Music})'$

• Idea: Explain the variation in $X$ with a random vector $f = (f_1, \ldots, f_k)'$ via a regression equation

$$\begin{aligned}
X_1 &= \mu_1 + l_{11} f_1 + \cdots + l_{1k} f_k + \epsilon_1 \\
X_2 &= \mu_2 + l_{21} f_1 + \cdots + l_{2k} f_k + \epsilon_2 \\
&\;\;\vdots \\
X_p &= \mu_p + l_{p1} f_1 + \cdots + l_{pk} f_k + \epsilon_p
\end{aligned}$$

Terms in the regression equation $X_i = \mu_i + l_{i1} f_1 + \cdots + l_{ik} f_k + \epsilon_i$:

• $X_i$: observed data (random, observable)

• $\mu_i$: intercept term (constant)

• $l_{ij}$: factor loadings (constant)

• $f_j$: common factors (random, unobservable)

• $\epsilon_i$: specific factors (random, unobservable)

$$X = \mu + Lf + \epsilon$$

• $X$ is a $p$-dimensional random vector

• $\mu$ is a $p$-dimensional constant vector

• $L$ is a $p \times k$ constant matrix

• $f$ is a $k$-dimensional random vector

• $\epsilon$ is a $p$-dimensional random vector

• $E(f) = 0$

• $E(\epsilon) = 0$

• $\operatorname{cov}(f, \epsilon) = 0$

• $\operatorname{cov}(f) = I_k$

• $\operatorname{cov}(\epsilon) = \Psi = \operatorname{diag}(\psi_1, \ldots, \psi_p)$, with each $\psi_i > 0$

• $\operatorname{cov}(X) = \Sigma$

• Consequently, $\Sigma = LL' + \Psi$
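The decomposition $\Sigma = LL' + \Psi$ follows from the model $X = \mu + Lf + \epsilon$ and the assumptions $\operatorname{cov}(f) = I_k$, $\operatorname{cov}(f, \epsilon) = 0$, $\operatorname{cov}(\epsilon) = \Psi$. A short derivation, using only the fact that $\mu$ is constant and that covariance is bilinear:

$$\begin{aligned}
\Sigma = \operatorname{cov}(X) &= \operatorname{cov}(Lf + \epsilon) \\
&= L \operatorname{cov}(f) L' + L \operatorname{cov}(f, \epsilon) + \operatorname{cov}(\epsilon, f) L' + \operatorname{cov}(\epsilon) \\
&= L I_k L' + 0 + 0 + \Psi \\
&= LL' + \Psi
\end{aligned}$$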

• $\operatorname{Var}(X_i) = \sigma_{ii} = l_{i1}^2 + \cdots + l_{ik}^2 + \psi_i = h_i^2 + \psi_i$, where $h_i^2 = l_{i1}^2 + \cdots + l_{ik}^2$

• $h_i^2$: communality, or common variance

• $\psi_i$: uniqueness, or specific variance

• If $\Sigma = \operatorname{corr}(X)$, then $\operatorname{Var}(X_i) = h_i^2 + \psi_i = 1$

• The estimated communality is denoted $\hat{h}_i^2$
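Given a loading matrix, the communality of each variable is the sum of squared loadings in its row, and on the correlation scale the uniqueness is one minus the communality. A minimal Python sketch follows. The function names and the loading matrix are hypothetical and are not taken from the Spearman example.

```python
def communalities(loadings):
    """h_i^2 = l_i1^2 + ... + l_ik^2 for each row i of the loading matrix."""
    return [sum(l * l for l in row) for row in loadings]

def uniquenesses(loadings, variances=None):
    """psi_i = Var(X_i) - h_i^2. If no variances are given, the correlation
    scale is assumed, so every Var(X_i) equals 1."""
    h2 = communalities(loadings)
    if variances is None:
        variances = [1.0] * len(h2)
    return [v - h for v, h in zip(variances, h2)]

# Hypothetical 3 x 2 loading matrix (p = 3 variables, k = 2 factors).
L = [[0.8, 0.3],
     [0.7, -0.2],
     [0.6, 0.5]]

for i, (h2, psi) in enumerate(zip(communalities(L), uniquenesses(L)), start=1):
    print(f"X{i}: communality = {h2:.2f}, uniqueness = {psi:.2f}")
```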

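• $X_i = \mu_i + l_{i1} f_1 + \cdots + l_{ik} f_k + \epsilon_i$

• $\operatorname{cov}(X_i, f_j) = l_{ij}$

• If $\Sigma = \operatorname{corr}(X)$, then $\operatorname{corr}(X_i, f_j) = l_{ij}$

• In that case the loadings are the correlations between the $X_i$'s and the $f_j$'s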

Principal Component Method for Factor Analysis

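• Model: $X = \mu + Lf + \epsilon$, so $\Sigma = LL' + \Psi$

• Write $\Sigma = A \Lambda A'$, where $A$ is orthogonal and $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_p)$

• $a_i$ = $i$th column of $A$ = $i$th principal component of $\Sigma$, and $\lambda_i$ = $i$th eigenvalue

• Define $\hat{L} = (\sqrt{\lambda_1}\, a_1, \ldots, \sqrt{\lambda_k}\, a_k)$

• Define $\hat{\Psi} = \operatorname{diag}(\Sigma - \hat{L}\hat{L}')$, that is, $\hat{\psi}_{ii} = \sigma_{ii} - (\hat{L}\hat{L}')_{ii}$

• Then $\Sigma \approx \hat{L}\hat{L}' + \hat{\Psi}$

• $\text{Res} = \Sigma - (\hat{L}\hat{L}' + \hat{\Psi})$

$$\text{Res} = \begin{pmatrix}
0 & \text{res}_{12} & \cdots & \text{res}_{1p} \\
\text{res}_{21} & 0 & \cdots & \text{res}_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
\text{res}_{p1} & \text{res}_{p2} & \cdots & 0
\end{pmatrix}$$

• Estimated communalities: $\hat{h}_i^2$

• Diagonal entries: $\hat{\psi}_{ii}$'s

• Off-diagonal entries: $\text{res}_{ij}$'s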

Rule of thumb:

If RMS < 0.05, where RMS is the root mean square of the off-diagonal residuals, then the model is acceptable
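The principal component method and the RMS rule of thumb can be traced on a small example. The sketch below is Python, not SAS, and it handles only the one-factor case ($k = 1$). To stay dependency-free it uses power iteration to find the leading eigenpair, which is a swapped-in technique rather than the full eigendecomposition described on the slide. The function names and the 3×3 correlation matrix are hypothetical.

```python
import math

def top_eigenpair(S, iters=500):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix S,
    found by power iteration. The uniform starting vector suits matrices
    with positive entries, as in this example."""
    p = len(S)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * S[i][j] * v[j] for i in range(p) for j in range(p))
    return lam, v

def one_factor_pc(S):
    """Principal component method with k = 1: loadings, uniquenesses,
    residual matrix, and RMS of the off-diagonal residuals."""
    p = len(S)
    lam, a = top_eigenpair(S)
    L = [math.sqrt(lam) * ai for ai in a]            # L-hat = sqrt(lambda_1) a_1
    psi = [S[i][i] - L[i] ** 2 for i in range(p)]    # diag(S - L-hat L-hat')
    res = [[0.0 if i == j else S[i][j] - L[i] * L[j] for j in range(p)]
           for i in range(p)]
    off_diag = [res[i][j] ** 2 for i in range(p) for j in range(p) if i != j]
    rms = math.sqrt(sum(off_diag) / len(off_diag))
    return L, psi, res, rms

# Hypothetical correlation matrix, for illustration only.
R = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.0]]

L_hat, psi_hat, res, rms = one_factor_pc(R)
print("loadings:", L_hat)
print("uniquenesses:", psi_hat)
print("RMS off-diagonal residual:", rms)
print("acceptable by the 0.05 rule of thumb:", rms < 0.05)
```

Because $\hat{\Psi}$ is defined from the diagonal of $\Sigma - \hat{L}\hat{L}'$, the diagonal of the residual matrix is zero by construction, so only the off-diagonal entries measure lack of fit.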

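Estimating Factor Scores

• Model: $X = \mu + Lf + \epsilon$, with $\operatorname{cov}(\epsilon) = \Psi$

• $\hat{f} = (\hat{L}'\hat{\Psi}^{-1}\hat{L})^{-1}\hat{L}'\hat{\Psi}^{-1}(X - \hat{\mu})$ (Generalized/weighted least squares method)

• $X \approx \hat{\mu} + \hat{L}\hat{f}$

• Values of $\hat{f}$ are called factor scores.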
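With a single factor ($k = 1$), the weighted least squares formula $\hat{f} = (\hat{L}'\hat{\Psi}^{-1}\hat{L})^{-1}\hat{L}'\hat{\Psi}^{-1}(X - \hat{\mu})$ reduces to a ratio of two weighted sums, because $\hat{L}'\hat{\Psi}^{-1}\hat{L}$ is then a scalar. A minimal Python sketch of that special case follows. The function name and all numbers are hypothetical, and the general $k > 1$ case would need a matrix inverse that is not shown here.

```python
def wls_factor_score(x, mu, loadings, psi):
    """Weighted least squares factor score for a one-factor model:
    f-hat = sum(l_i (x_i - mu_i) / psi_i) / sum(l_i^2 / psi_i).
    Requires every psi_i to be strictly positive."""
    num = sum(l * (xi - m) / s for xi, m, l, s in zip(x, mu, loadings, psi))
    den = sum(l * l / s for l, s in zip(loadings, psi))
    return num / den

# Hypothetical estimates for p = 3 variables and k = 1 factor.
mu_hat = [0.0, 0.0, 0.0]
L_hat = [0.8, 0.7, 0.6]
psi_hat = [0.36, 0.51, 0.64]

# One hypothetical observation.
x_obs = [1.0, 0.4, 0.9]
print("factor score:", wls_factor_score(x_obs, mu_hat, L_hat, psi_hat))
```

Variables with small specific variance $\hat{\psi}_i$ receive more weight, which is what distinguishes this estimate from ordinary least squares.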

Rotation of Factors

• Model: $X = \mu + Lf + \epsilon$, with $E(f) = 0$, $E(\epsilon) = 0$, $\operatorname{cov}(f, \epsilon) = 0$, $\operatorname{cov}(f) = I_k$, $\operatorname{cov}(\epsilon) = \Psi$

• Let $\Gamma$ be a $k \times k$ orthogonal matrix.

• Then $L^* = L\Gamma$ and $f^* = \Gamma' f$ satisfy the above conditions.
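Rotation leaves the fitted covariance structure unchanged: since $\Gamma\Gamma' = I$, we have $(L\Gamma)(L\Gamma)' = LL'$, and $\operatorname{cov}(\Gamma' f) = \Gamma' I_k \Gamma = I_k$. The Python sketch below checks the first identity numerically for $k = 2$, using a plane rotation as the orthogonal matrix. The function names, the loading matrix, and the angle are all hypothetical.

```python
import math

def rotate_loadings(L, theta):
    """Return L * Gamma, where Gamma is the 2 x 2 rotation by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    gamma = [[c, -s],
             [s, c]]
    return [[row[0] * gamma[0][j] + row[1] * gamma[1][j] for j in range(2)]
            for row in L]

def common_part(L):
    """Return L L', the part of the covariance explained by the common factors."""
    p, k = len(L), len(L[0])
    return [[sum(L[i][m] * L[j][m] for m in range(k)) for j in range(p)]
            for i in range(p)]

# Hypothetical 3 x 2 loading matrix and rotation angle.
L = [[0.8, 0.3],
     [0.7, -0.2],
     [0.6, 0.5]]
L_star = rotate_loadings(L, theta=0.7)

print("rotated loadings:", L_star)
print("L L' before rotation:", common_part(L))
print("L L' after rotation: ", common_part(L_star))
```

The individual loadings change under rotation, but $LL'$, and therefore each communality, does not; this is why a rotation can be chosen purely for interpretability.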