Math 5364/66 Notes Principal Components and Factor Analysis in SAS Jesse Crawford Department of Mathematics Tarleton State University


Setting for Principal Components

• Random vector $X = (X_1, \dots, X_p)'$, taking values in $\mathbb{R}^p$


Typical Coordinate System


Principal Components


Relation to Eigenvectors

• Let $\Sigma = \operatorname{cov}(X)$
• Suppose $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$ are the eigenvalues of $\Sigma$
• Let $a_1, \dots, a_p$ be corresponding orthonormal eigenvectors
• Then $a_1, \dots, a_p$ are the principal components
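The eigen-decomposition behind these bullets can be sketched in plain Python. This is a minimal illustration that assumes a cyclic Jacobi rotation method as the eigen-solver (an illustrative choice, not what SAS uses internally); the name `jacobi_eigen` is hypothetical. It returns the eigenvalues in decreasing order with orthonormal eigenvectors, and checks that the variance of each projection, $\operatorname{Var}(a_i'X) = a_i'\Sigma a_i$, equals $\lambda_i$. The input is the $2 \times 2$ covariance matrix from the simulation example later in these notes.

```python
import math

def jacobi_eigen(S, max_sweeps=50):
    """Eigenvalues/eigenvectors of a symmetric matrix S via cyclic Jacobi rotations.

    Illustrative solver (an assumption, not SAS internals). Returns (values, vectors):
    values sorted in decreasing order, vectors[i] the unit eigenvector for values[i].
    """
    n = len(S)
    A = [list(map(float, row)) for row in S]                   # driven toward diagonal form
    V = [[float(i == j) for j in range(n)] for i in range(n)]  # product of all rotations
    scale = sum(x * x for row in A for x in row) or 1.0
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= 1e-24 * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # rotation angle chosen so that the rotated A[p][q] becomes zero
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.hypot(t, 1.0)
                s = t * c
                for k in range(n):            # A <- A R   (columns p, q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):            # A <- R' A  (rows p, q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):            # V <- V R   (accumulate eigenvectors)
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: A[i][i], reverse=True)
    values = [A[i][i] for i in order]
    vectors = [[V[k][i] for k in range(n)] for i in order]
    return values, vectors

Sigma = [[25.0, 10.0],
         [10.0, 4.16]]
values, vectors = jacobi_eigen(Sigma)

for lam, a in zip(values, vectors):
    # variance of the projection a'X is the quadratic form a' Sigma a
    var = sum(a[i] * Sigma[i][j] * a[j] for i in range(2) for j in range(2))
    print(f"lambda = {lam:.4f}, a = ({a[0]:.4f}, {a[1]:.4f}), a'Sigma a = {var:.4f}")
```

Jacobi rotations were chosen only because they are short and reliable for small symmetric matrices; any symmetric eigen-solver gives the same components up to sign.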


Implementation in R


Simulating the Data in SAS


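$$\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = \begin{pmatrix} 5 & 0 \\ 2 & 0.4 \end{pmatrix} \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix}, \quad \text{where } \operatorname{cov}(Z) = I$$

$$\operatorname{cov}(X) = \operatorname{cov}\left(\begin{pmatrix} 5 & 0 \\ 2 & 0.4 \end{pmatrix} Z\right) = \begin{pmatrix} 5 & 0 \\ 2 & 0.4 \end{pmatrix} \operatorname{cov}(Z) \begin{pmatrix} 5 & 2 \\ 0 & 0.4 \end{pmatrix} = \begin{pmatrix} 5 & 0 \\ 2 & 0.4 \end{pmatrix} I \begin{pmatrix} 5 & 2 \\ 0 & 0.4 \end{pmatrix} = \begin{pmatrix} 25 & 10 \\ 10 & 4.16 \end{pmatrix}$$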
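As a numerical check of this derivation, the sketch below (plain Python, standard library only, not the SAS code from the notes) forms $AA'$, writing $A$ for the coefficient matrix, and compares it with the sample covariance of simulated draws. It assumes $Z_1, Z_2$ are independent standard normal, which gives $\operatorname{cov}(Z) = I$; the seed and sample size are arbitrary choices.

```python
import random

# A is shorthand (an assumption of this sketch) for the coefficient matrix in X = A Z
A = [[5.0, 0.0],
     [2.0, 0.4]]

def matmul(P, Q):
    """Product of matrices stored as lists of rows."""
    return [[sum(P[i][m] * Q[m][j] for m in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def sample_cov(data):
    """Unbiased sample covariance matrix of a list of observation vectors."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
             for j in range(p)] for i in range(p)]

# Exact covariance: cov(X) = A cov(Z) A' = A I A' = A A'
Sigma = matmul(A, transpose(A))

# Simulation: Z1, Z2 iid N(0, 1) (assumed distribution), X = A Z
rng = random.Random(5364)
X = []
for _ in range(200_000):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    X.append([A[0][0] * z1 + A[0][1] * z2, A[1][0] * z1 + A[1][1] * z2])

S = sample_cov(X)
print("exact  :", Sigma)
print("sample :", [[round(v, 3) for v in row] for row in S])
```

Only $\operatorname{cov}(Z) = I$ matters for the exact result; normality is used just to have something concrete to simulate.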


Covariance Matrix in SAS

$$\operatorname{cov}(X) = \begin{pmatrix} 25 & 10 \\ 10 & 4.16 \end{pmatrix}$$
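For a $2 \times 2$ covariance matrix the eigenvalues have a closed form, so the principal components of the matrix above can be checked by hand. The plain-Python sketch below uses a hypothetical helper `eig_sym_2x2` (not SAS output) based on $\lambda = \tfrac{1}{2}\left(\sigma_{11} + \sigma_{22} \pm \sqrt{(\sigma_{11}-\sigma_{22})^2 + 4\sigma_{12}^2}\right)$ and the eigenvector $(\sigma_{12}, \lambda - \sigma_{11})'$, normalized to unit length.

```python
import math

def eig_sym_2x2(a, b, d):
    """Eigenpairs of the symmetric matrix [[a, b], [b, d]], largest eigenvalue first."""
    if b == 0:
        pairs = sorted([(a, (1.0, 0.0)), (d, (0.0, 1.0))], reverse=True)
        return [lam for lam, _ in pairs], [v for _, v in pairs]
    disc = math.sqrt((a - d) ** 2 + 4 * b * b)   # equals sqrt(tr^2 - 4 det) >= 0
    values = [(a + d + disc) / 2, (a + d - disc) / 2]
    vectors = []
    for lam in values:
        v = (b, lam - a)                         # solves (S - lam I) v = 0 when b != 0
        norm = math.hypot(*v)
        vectors.append((v[0] / norm, v[1] / norm))
    return values, vectors

values, vectors = eig_sym_2x2(25.0, 10.0, 4.16)
total = sum(values)                              # equals the trace, 25 + 4.16
for i, (lam, a) in enumerate(zip(values, vectors), start=1):
    print(f"PC{i}: eigenvalue {lam:.4f}, direction ({a[0]:.4f}, {a[1]:.4f}), "
          f"proportion {lam / total:.4f}")
```

For this matrix $\lambda_1 \approx 29.02$ out of a total variance of $25 + 4.16 = 29.16$, so the first component accounts for about 99.5% of the total variation.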


Principal Components in SAS


Inputting a Covariance Matrix Manually


PCA Using Original Data


Example: Math and Reading Exams


Example: Adelges (Winged Aphids)

• 19 variables

• 4 principal components are needed to explain 90% of the total variation

• PCA can be used to reduce dimensionality
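The rule behind the Adelges statement (keep the smallest number of components whose eigenvalues account for a target share of the total variation) can be written as a short function. The helper name `components_needed` and the eigenvalues in the usage line are hypothetical; they are not the Adelges results.

```python
def components_needed(eigenvalues, threshold=0.90):
    """Smallest m such that the m largest eigenvalues account for at least
    `threshold` of the total variation (the sum of all eigenvalues)."""
    vals = sorted(eigenvalues, reverse=True)
    total = sum(vals)
    running = 0.0
    for m, lam in enumerate(vals, start=1):
        running += lam
        if running >= threshold * total:
            return m
    return len(vals)

# Hypothetical eigenvalues for five variables (total variation 10)
print(components_needed([7.0, 1.5, 0.9, 0.4, 0.2]))   # 3, since 7.0 + 1.5 + 0.9 = 9.4 >= 9.0
```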


PCA Summary

• $X$ is a $p$-dimensional random vector
• Covariance matrix $\Sigma = \operatorname{cov}(X)$
• Principal components are simply an orthonormal eigenbasis of $\Sigma$
• Dimensionality reduction is achieved by dropping components with small eigenvalues


Setting for Factor Analysis


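• Random vector $X = (X_1, \dots, X_p)'$
• Example from Spearman (1904). Exam scores for 33 students: $X = (\text{Classics}, \text{French}, \text{English}, \text{Math}, \text{Pitch}, \text{Music})'$
• Idea: Explain the variation in $X$ with a random vector $f = (f_1, \dots, f_k)'$ via a regression equation

$$
\begin{aligned}
X_1 &= \mu_1 + l_{11} f_1 + \cdots + l_{1k} f_k + \epsilon_1 \\
X_2 &= \mu_2 + l_{21} f_1 + \cdots + l_{2k} f_k + \epsilon_2 \\
&\;\;\vdots \\
X_p &= \mu_p + l_{p1} f_1 + \cdots + l_{pk} f_k + \epsilon_p
\end{aligned}
$$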


In the regression equations for $X_1, \dots, X_p$:

• $X_i$: observed data (random, observable)
• $\mu_i$: intercept term (constant)
• $l_{ij}$: factor loadings (constant)
• $f_j$: common factors (random, unobservable)
• $\epsilon_i$: specific factors (random, unobservable)


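In matrix form, $X = \mu + Lf + \epsilon$, where $\mu \in \mathbb{R}^p$ is constant and

• $X$ is a $p$-dimensional random vector
• $L$ is a $p \times k$ matrix of factor loadings
• $f$ is a $k$-dimensional random vector
• $\epsilon$ is a $p$-dimensional random vector
• $E(f) = 0_{k \times 1}$
• $E(\epsilon) = 0_{p \times 1}$
• $\operatorname{cov}(f, \epsilon) = 0_{k \times p}$
• $\operatorname{cov}(f) = I_{k \times k}$
• $\operatorname{cov}(\epsilon) = \Psi = \operatorname{diag}(\psi_1, \dots, \psi_p)$, with each $\psi_i > 0$
• $\Sigma = \operatorname{cov}(X) = L \operatorname{cov}(f) L' + \operatorname{cov}(\epsilon) = LL' + \Psi$, since $f$ and $\epsilon$ are uncorrelated

$$\operatorname{Var}(X_i) = l_{i1}^2 + \cdots + l_{ik}^2 + \psi_i = h_i^2 + \psi_i$$

where $h_i^2 = l_{i1}^2 + \cdots + l_{ik}^2$ is the communality (or common variance) and $\psi_i$ is the uniqueness (or specific variance).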
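The covariance structure $\Sigma = LL' + \Psi$ and the split of each variance into communality plus uniqueness can be computed directly. The sketch below uses hypothetical loadings for $p = 3$ variables and $k = 2$ factors, chosen so that every variance equals 1; the helper names `factor_covariance` and `communalities` are illustrative only.

```python
def factor_covariance(L, psi):
    """Sigma = L L' + Psi, with Psi = diag(psi)."""
    p, k = len(L), len(L[0])
    Sigma = [[sum(L[i][m] * L[j][m] for m in range(k)) for j in range(p)] for i in range(p)]
    for i in range(p):
        Sigma[i][i] += psi[i]
    return Sigma

def communalities(L):
    """h_i^2 = l_i1^2 + ... + l_ik^2 for each row i of L."""
    return [sum(l * l for l in row) for row in L]

# Hypothetical loadings (p = 3 variables, k = 2 factors) and specific variances
L = [[0.9, 0.1],
     [0.8, 0.3],
     [0.4, 0.7]]
psi = [0.18, 0.27, 0.35]

Sigma = factor_covariance(L, psi)
h2 = communalities(L)
for i in range(3):
    # Var(X_i) = h_i^2 + psi_i
    print(f"X{i + 1}: Var = {Sigma[i][i]:.2f}, communality = {h2[i]:.2f}, uniqueness = {psi[i]:.2f}")
```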


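$$\operatorname{Var}(X_i) = h_i^2 + \psi_i$$

If $\Sigma = \operatorname{corr}(X)$ (i.e., the variables are standardized), then $1 = h_i^2 + \psi_i$.

Estimated communalities: $\hat{h}_i^2$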
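To see why working with $\operatorname{corr}(X)$ forces $h_i^2 + \psi_i = 1$: standardizing $X_i$ divides row $i$ of $L$ by $\sqrt{\sigma_{ii}}$ and $\psi_i$ by $\sigma_{ii}$. The sketch below applies that rescaling to hypothetical covariance-scale loadings (not data from the notes); the helper name `standardize_factor_model` is illustrative.

```python
import math

def standardize_factor_model(L, psi):
    """Rescale a covariance-scale factor model to the correlation scale.

    With D = diag(sigma_11, ..., sigma_pp), the standardized vector
    D^{-1/2}(X - mu) has loadings D^{-1/2} L and specific variances psi_i / sigma_ii.
    """
    sigma_ii = [sum(l * l for l in row) + s for row, s in zip(L, psi)]   # Var(X_i)
    L_z = [[l / math.sqrt(v) for l in row] for row, v in zip(L, sigma_ii)]
    psi_z = [s / v for s, v in zip(psi, sigma_ii)]
    return L_z, psi_z

# Hypothetical covariance-scale model: p = 3 variables, k = 1 factor
L = [[2.0], [1.0], [3.0]]
psi = [1.0, 3.0, 0.5]

L_z, psi_z = standardize_factor_model(L, psi)
for row, s in zip(L_z, psi_z):
    h2 = sum(l * l for l in row)
    print(f"h^2 = {h2:.4f}, psi = {s:.4f}, sum = {h2 + s:.4f}")
```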


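$$\operatorname{cov}(X_i, f_j) = \operatorname{cov}(\mu_i + l_{i1} f_1 + \cdots + l_{ik} f_k + \epsilon_i,\ f_j) = l_{ij}$$

since $\operatorname{cov}(f) = I$ and $\operatorname{cov}(f, \epsilon) = 0$.

If $\Sigma = \operatorname{corr}(X)$, then $\operatorname{corr}(X_i, f_j) = l_{ij}$, because $\operatorname{Var}(X_i) = \operatorname{Var}(f_j) = 1$. In that case the loadings are the correlations between the $X_i$'s and the $f_j$'s.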
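A Monte Carlo check of $\operatorname{cov}(X_i, f_j) = l_{ij}$, as a plain-Python sketch. It simulates the factor model with hypothetical correlation-scale loadings (so $\operatorname{Var}(X_i) = 1$), assuming normally distributed $f$ and $\epsilon$ and arbitrary intercepts $\mu_i$; none of these numbers come from the notes, and the helper `sample_cross_cov` is illustrative.

```python
import math
import random

# Hypothetical correlation-scale model: p = 3, k = 2, so h_i^2 + psi_i = 1
mu = [50.0, 60.0, 70.0]
L = [[0.9, 0.1],
     [0.8, 0.3],
     [0.4, 0.7]]
psi = [0.18, 0.27, 0.35]
p, k = len(L), len(L[0])

rng = random.Random(1904)
n = 100_000
F, X = [], []
for _ in range(n):
    f = [rng.gauss(0.0, 1.0) for _ in range(k)]                    # cov(f) = I (normal assumed)
    eps = [rng.gauss(0.0, math.sqrt(psi[i])) for i in range(p)]   # cov(eps) = diag(psi)
    X.append([mu[i] + sum(L[i][j] * f[j] for j in range(k)) + eps[i] for i in range(p)])
    F.append(f)

def sample_cross_cov(A, B):
    """Sample covariance between each column of A and each column of B."""
    n = len(A)
    ma = [sum(r[i] for r in A) / n for i in range(len(A[0]))]
    mb = [sum(r[j] for r in B) / n for j in range(len(B[0]))]
    return [[sum((a[i] - ma[i]) * (b[j] - mb[j]) for a, b in zip(A, B)) / (n - 1)
             for j in range(len(B[0]))] for i in range(len(A[0]))]

C = sample_cross_cov(X, F)
for i in range(p):
    print(f"X{i + 1}: estimated", [round(c, 3) for c in C[i]], "true", L[i])
```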


Principal Component Method for Factor Analysis

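Model: $X = \mu + Lf + \epsilon$, so $\Sigma = LL' + \Psi$.

Spectral decomposition: $\Sigma = P \Lambda P'$, where $P = (a_1, \dots, a_p)$ is orthogonal and $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_p)$. Here $a_i$, the $i$th column of $P$, is the $i$th principal component of $\Sigma$, and $\lambda_i$ is the $i$th eigenvalue.

Define $\hat{L} = (\sqrt{\lambda_1}\, a_1, \dots, \sqrt{\lambda_k}\, a_k)$.

Define $\hat{\Psi} = \operatorname{diag}(\Sigma - \hat{L}\hat{L}')$, i.e., $\hat{\psi}_i = \sigma_{ii} - (\hat{L}\hat{L}')_{ii}$.

Then $\Sigma \approx \hat{L}\hat{L}' + \hat{\Psi}$, with residual matrix

$$\text{Res} = \Sigma - (\hat{L}\hat{L}' + \hat{\Psi}) = \begin{pmatrix} 0 & \text{res}_{12} & \cdots & \text{res}_{1p} \\ \text{res}_{21} & 0 & \cdots & \text{res}_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \text{res}_{p1} & \text{res}_{p2} & \cdots & 0 \end{pmatrix}$$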
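A plain-Python sketch of the principal component method as defined above: take the $k$ leading eigenpairs of $\Sigma$, scale each eigenvector by $\sqrt{\lambda_i}$ to get $\hat L$, set $\hat\Psi$ from the diagonal of $\Sigma - \hat L\hat L'$, and form the residual matrix, whose diagonal is zero by construction. The Jacobi eigen-solver, the function names, and the $3 \times 3$ correlation matrix in the usage lines are assumptions for illustration, not PROC FACTOR internals or data from the notes.

```python
import math

def jacobi_eigen(S, max_sweeps=50):
    """Eigenpairs of symmetric S by cyclic Jacobi rotations (illustrative solver),
    largest eigenvalue first."""
    n = len(S)
    A = [list(map(float, row)) for row in S]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    scale = sum(x * x for row in A for x in row) or 1.0
    for _ in range(max_sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) <= 1e-24 * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.hypot(t, 1.0)
                s = t * c
                for k in range(n):                 # A <- A R
                    A[k][p], A[k][q] = c * A[k][p] - s * A[k][q], s * A[k][p] + c * A[k][q]
                for k in range(n):                 # A <- R' A
                    A[p][k], A[q][k] = c * A[p][k] - s * A[q][k], s * A[p][k] + c * A[q][k]
                for k in range(n):                 # V <- V R
                    V[k][p], V[k][q] = c * V[k][p] - s * V[k][q], s * V[k][p] + c * V[k][q]
    order = sorted(range(n), key=lambda i: A[i][i], reverse=True)
    return [A[i][i] for i in order], [[V[r][i] for r in range(n)] for i in order]

def principal_component_factors(Sigma, k):
    """Principal component method: returns (L_hat, psi_hat, Res)."""
    p = len(Sigma)
    values, vectors = jacobi_eigen(Sigma)
    # column j of L_hat is sqrt(lambda_j) * a_j, for the k largest eigenvalues
    L_hat = [[math.sqrt(max(values[j], 0.0)) * vectors[j][i] for j in range(k)]
             for i in range(p)]
    LLt = [[sum(L_hat[i][m] * L_hat[r][m] for m in range(k)) for r in range(p)]
           for i in range(p)]
    psi_hat = [Sigma[i][i] - LLt[i][i] for i in range(p)]
    Res = [[Sigma[i][r] - LLt[i][r] - (psi_hat[i] if i == r else 0.0) for r in range(p)]
           for i in range(p)]
    return L_hat, psi_hat, Res

# Hypothetical correlation matrix for three standardized variables
R = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.0]]
L_hat, psi_hat, Res = principal_component_factors(R, k=1)
print("loadings  :", [round(row[0], 3) for row in L_hat])
print("psi_hat   :", [round(v, 3) for v in psi_hat])
print("residuals :", [[round(v, 3) for v in row] for row in Res])
```

The sign of each column of $\hat L$ is arbitrary, since $-a_i$ is as valid an eigenvector as $a_i$.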


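Estimated communalities: $\hat{h}_i^2$'s

Diagonal entries: $\hat{\psi}_i$'s

Off-diagonal entries: $\text{res}_{ij}$'s

Rule of thumb:

If the RMS (root mean square) of the off-diagonal residuals is below 0.05, then the model is acceptable.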
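The RMS in this rule of thumb is read here as the root mean square of the off-diagonal residuals, $\sqrt{\sum_{i \ne j} \text{res}_{ij}^2 / (p(p-1))}$. That reading, the helper name `rms_off_diagonal`, and the residual matrix below are assumptions for illustration.

```python
import math

def rms_off_diagonal(Res):
    """Root mean square of the off-diagonal entries of a square matrix."""
    p = len(Res)
    squares = [Res[i][j] ** 2 for i in range(p) for j in range(p) if i != j]
    return math.sqrt(sum(squares) / len(squares))

# Hypothetical residual matrix from a fitted factor model (diagonal is zero)
Res = [[0.00, 0.03, -0.04],
       [0.03, 0.00, 0.00],
       [-0.04, 0.00, 0.00]]
rms = rms_off_diagonal(Res)
print(f"RMS = {rms:.4f}:", "acceptable" if rms < 0.05 else "not acceptable by this rule")
```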


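Estimating Factor Scores

$X = \mu + Lf + \epsilon$, with $\operatorname{cov}(\epsilon) = \Psi$, so $\epsilon = X - \mu - Lf$.

Generalized/weighted least squares method: choose $\hat{f}$ to minimize $(X - \hat{\mu} - \hat{L}f)'\hat{\Psi}^{-1}(X - \hat{\mu} - \hat{L}f)$, which gives

$$\hat{f} = (\hat{L}'\hat{\Psi}^{-1}\hat{L})^{-1}\hat{L}'\hat{\Psi}^{-1}(X - \hat{\mu})$$

Values of $\hat{f}$ are called factor scores.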
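A plain-Python sketch of the generalized/weighted least squares factor scores: it forms $\hat L'\hat\Psi^{-1}\hat L$ and $\hat L'\hat\Psi^{-1}(x - \hat\mu)$ and solves the resulting $k \times k$ system by Gaussian elimination rather than inverting a matrix. The loadings, specific variances, means, and observation are hypothetical, as are the helper names; when $x$ contains no specific-factor noise, the formula recovers $f$ exactly, which the usage lines check.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def wls_factor_scores(L, psi, mu, x):
    """f_hat = (L' Psi^-1 L)^-1 L' Psi^-1 (x - mu), with Psi = diag(psi)."""
    p, k = len(L), len(L[0])
    A = [[sum(L[i][r] * L[i][c] / psi[i] for i in range(p)) for c in range(k)]
         for r in range(k)]
    b = [sum(L[i][r] * (x[i] - mu[i]) / psi[i] for i in range(p)) for r in range(k)]
    return solve(A, b)

# Hypothetical estimates (p = 3, k = 2) and one observation built with no noise
L = [[0.9, 0.1],
     [0.8, 0.3],
     [0.4, 0.7]]
psi = [0.18, 0.27, 0.35]
mu = [50.0, 60.0, 70.0]
f_true = [1.5, -0.5]
x = [mu[i] + sum(L[i][j] * f_true[j] for j in range(2)) for i in range(3)]

print(wls_factor_scores(L, psi, mu, x))   # close to [1.5, -0.5]
```

Solving the normal equations directly avoids forming $(\hat L'\hat\Psi^{-1}\hat L)^{-1}$ explicitly, which is the usual numerical preference.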


Rotation of Factors

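• Suppose $X = \mu + Lf + \epsilon$, where $X$, $f$, and $\epsilon$ are random vectors of dimensions $p$, $k$, and $p$, $L$ is $p \times k$, $E(f) = 0$, $E(\epsilon) = 0$, $\operatorname{cov}(f, \epsilon) = 0_{k \times p}$, $\operatorname{cov}(f) = I_{k \times k}$, and $\operatorname{cov}(\epsilon) = \Psi = \operatorname{diag}(\psi_1, \dots, \psi_p)$ with each $\psi_i > 0$.
• Let $T$ be a $k \times k$ orthogonal matrix.
• Then $L^* = LT$ and $f^* = T'f$ satisfy the above conditions: $L^* f^* = LTT'f = Lf$, $E(f^*) = 0$, $\operatorname{cov}(f^*) = T'IT = I$, $\operatorname{cov}(f^*, \epsilon) = 0$, and $L^*L^{*\prime} = LTT'L' = LL'$.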
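The rotation argument can be checked numerically. The sketch below uses a hypothetical $3 \times 2$ loading matrix and a rotation by 30 degrees as the orthogonal matrix $T$ (the angle and numbers are arbitrary), then confirms that $L^*L^{*\prime} = LL'$ and $L^*f^* = Lf$.

```python
import math

def matmul(P, Q):
    """Product of matrices stored as lists of rows."""
    return [[sum(P[i][m] * Q[m][j] for m in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

# Hypothetical loadings (p = 3, k = 2)
L = [[0.9, 0.1],
     [0.8, 0.3],
     [0.4, 0.7]]
phi = math.radians(30)                     # arbitrary rotation angle
T = [[math.cos(phi), -math.sin(phi)],      # orthogonal: T T' = I
     [math.sin(phi), math.cos(phi)]]

L_star = matmul(L, T)                      # L* = L T
f = [[1.5], [-0.5]]                        # one hypothetical value of f, as a column
f_star = matmul(transpose(T), f)           # f* = T' f

LLt = matmul(L, transpose(L))
LLt_star = matmul(L_star, transpose(L_star))
print("max |L*L*' - LL'| =",
      max(abs(a - b) for r1, r2 in zip(LLt, LLt_star) for a, b in zip(r1, r2)))
print("L f   :", [round(v[0], 6) for v in matmul(L, f)])
print("L* f* :", [round(v[0], 6) for v in matmul(L_star, f_star)])
```

Because $\Sigma = LL' + \Psi$ is unchanged, the data cannot distinguish $L$ from $LT$; rotation methods exploit this freedom to pick an easier-to-interpret $T$.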
