Confirmatory Factor Analysis Part Two


STA 2101: Fall/Winter 2019

THE TRUTH (Well, closer to the truth, anyway)

Regression-like models

•  Reality is massively non-linear.

•  We can live with a linear approximation, as in multiple regression.

•  All the model equations have unknown slopes and unknown intercepts.

•  Like D1 = λ0,1 + λ1F1 + e1

•  Latent variables have unknown expected values and variances.

•  Call this the “original model.”

Identifiability

•  If there are latent variables, the parameters of the original model are not identifiable.

•  In a way, when we drop intercepts and ignore expected values, it’s like we are assuming all expected values = 0, “centering” the model.

•  Original Model → Centered Model

Centering is a re-parameterization

•  Not one-to-one.

•  It reduces the dimension of the parameter space, helping with identifiability.

•  Does not affect slopes, variances or covariances.

•  Meaning is unaffected.

•  What about Var(Fj) = 1?

Why should the variance of the factors equal one?

•  Inherited from exploratory factor analysis, which was mostly a disaster.

•  The standard answer is something like this: “Because it’s arbitrary. The variance depends upon the scale on which the variable is measured, but we can’t see it to measure it directly. So set it to one for convenience.”

•  But saying it does not make it so. If F is a random variable with an unknown variance, then Var(F) = ϕ is an unknown parameter.

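Centered Model

\[
\begin{aligned}
D_1 &= \lambda_1 F + e_1 \\
D_2 &= \lambda_2 F + e_2 \\
D_3 &= \lambda_3 F + e_3 \\
D_4 &= \lambda_4 F + e_4
\end{aligned}
\]

$e_1, \ldots, e_4, F$ all independent, $Var(e_j) = \omega_j$, $Var(F) = \phi$, $\lambda_1, \lambda_2, \lambda_3 \neq 0$.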

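Covariance Matrix

\[
\Sigma =
\begin{pmatrix}
\lambda_1^2\phi + \omega_1 & \lambda_1\lambda_2\phi & \lambda_1\lambda_3\phi & \lambda_1\lambda_4\phi \\
\lambda_1\lambda_2\phi & \lambda_2^2\phi + \omega_2 & \lambda_2\lambda_3\phi & \lambda_2\lambda_4\phi \\
\lambda_1\lambda_3\phi & \lambda_2\lambda_3\phi & \lambda_3^2\phi + \omega_3 & \lambda_3\lambda_4\phi \\
\lambda_1\lambda_4\phi & \lambda_2\lambda_4\phi & \lambda_3\lambda_4\phi & \lambda_4^2\phi + \omega_4
\end{pmatrix}
\]

Passes the Counting Rule test with 10 equations in 9 unknowns.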

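But for any c ≠ 0,

\[
\begin{array}{c|ccccccccc}
\theta_1 & \phi & \lambda_1 & \lambda_2 & \lambda_3 & \lambda_4 & \omega_1 & \omega_2 & \omega_3 & \omega_4 \\
\theta_2 & \phi/c^2 & c\lambda_1 & c\lambda_2 & c\lambda_3 & c\lambda_4 & \omega_1 & \omega_2 & \omega_3 & \omega_4
\end{array}
\]

Both yield

\[
\Sigma =
\begin{pmatrix}
\lambda_1^2\phi + \omega_1 & \lambda_1\lambda_2\phi & \lambda_1\lambda_3\phi & \lambda_1\lambda_4\phi \\
\lambda_1\lambda_2\phi & \lambda_2^2\phi + \omega_2 & \lambda_2\lambda_3\phi & \lambda_2\lambda_4\phi \\
\lambda_1\lambda_3\phi & \lambda_2\lambda_3\phi & \lambda_3^2\phi + \omega_3 & \lambda_3\lambda_4\phi \\
\lambda_1\lambda_4\phi & \lambda_2\lambda_4\phi & \lambda_3\lambda_4\phi & \lambda_4^2\phi + \omega_4
\end{pmatrix}
\]

The choice $\phi = 1$ just sets $c = \sqrt{\phi}$: convenient but seemingly arbitrary.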
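The invariance can be checked numerically. The following is a minimal sketch in plain Python; the function name `sigma_one_factor` and all parameter values are made up for illustration and do not come from the slides.

```python
def sigma_one_factor(phi, lam, omega):
    """Covariance matrix of D_j = lam[j]*F + e_j, with Var(F) = phi and Var(e_j) = omega[j]."""
    k = len(lam)
    return [[lam[i] * lam[j] * phi + (omega[i] if i == j else 0.0)
             for j in range(k)]
            for i in range(k)]

# Hypothetical "true" parameter values (theta_1)
phi = 4.0
lam = [1.0, 2.0, -0.5, 3.0]
omega = [1.0, 2.0, 3.0, 4.0]

# Rescaled parameter values (theta_2); c = sqrt(phi) makes the new factor variance 1
c = 2.0
phi_star = phi / c**2
lam_star = [c * l for l in lam]

sigma1 = sigma_one_factor(phi, lam, omega)
sigma2 = sigma_one_factor(phi_star, lam_star, omega)

# Largest absolute difference between corresponding elements of the two matrices
max_diff = max(abs(a - b)
               for row1, row2 in zip(sigma1, sigma2)
               for a, b in zip(row1, row2))
print(phi_star, max_diff)
```

Any other non-zero `c` should give the same agreement, which is exactly why the data cannot distinguish the two parameter sets.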

You should be concerned!

•  For any set of true parameter values, there are infinitely many untrue sets of parameter values that yield exactly the same Σ and hence exactly the same probability distribution of the observable data.

•  There is no way to know the full truth based on the data, no matter how large the sample size.

•  But there is a way to know the partial truth.

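Certain functions of the parameter vector are identifiable

At points in the parameter space where $\lambda_1, \lambda_2, \lambda_3 \neq 0$,

•  $\dfrac{\sigma_{12}\sigma_{13}}{\sigma_{23}} = \dfrac{\lambda_1\lambda_2\phi \, \lambda_1\lambda_3\phi}{\lambda_2\lambda_3\phi} = \lambda_1^2\phi$

•  And so if $\lambda_1 > 0$, the function $\lambda_j \phi^{1/2}$ is identifiable for $j = 1, \ldots, 4$.

•  $\sigma_{11} - \dfrac{\sigma_{12}\sigma_{13}}{\sigma_{23}} = \omega_1$, and so $\omega_j$ is identifiable for $j = 1, \ldots, 4$.

•  $\dfrac{\sigma_{13}}{\sigma_{23}} = \dfrac{\lambda_1\lambda_3\phi}{\lambda_2\lambda_3\phi} = \dfrac{\lambda_1}{\lambda_2}$, so ratios of factor loadings are identifiable.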
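These identities can be verified by building Σ from known parameter values and then recovering the identifiable functions from Σ alone. This is only a sketch: the helper names and the numbers are hypothetical.

```python
import math

def sigma_one_factor(phi, lam, omega):
    """Covariance matrix of D_j = lam[j]*F + e_j, with Var(F) = phi and Var(e_j) = omega[j]."""
    k = len(lam)
    return [[lam[i] * lam[j] * phi + (omega[i] if i == j else 0.0)
             for j in range(k)]
            for i in range(k)]

# Hypothetical parameter values with lambda_1 > 0
phi, lam, omega = 4.0, [1.0, 2.0, -0.5, 3.0], [1.0, 2.0, 3.0, 4.0]
S = sigma_one_factor(phi, lam, omega)

def s(i, j):
    """Element sigma_ij of S, using the 1-based subscripts of the slides."""
    return S[i - 1][j - 1]

lam1sq_phi = s(1, 2) * s(1, 3) / s(2, 3)   # lambda_1^2 * phi
scaled_loading1 = math.sqrt(lam1sq_phi)    # lambda_1 * sqrt(phi), valid because lambda_1 > 0
omega1 = s(1, 1) - lam1sq_phi              # omega_1
ratio_1_2 = s(1, 3) / s(2, 3)              # lambda_1 / lambda_2

print(lam1sq_phi, scaled_loading1, omega1, ratio_1_2)
```

Only `S` is used on the right-hand sides, so each recovered quantity is a function of Σ, which is what identifiability requires.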

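Reliability

•  Reliability is the squared correlation between the observed score and the true score.

•  The proportion of variance in the observed score that is not error.

•  For D1 = λ1F + e1 it’s

\[
\rho^2 = \left( \frac{Cov(D_1, F)}{SD(D_1)\,SD(F)} \right)^2
= \left( \frac{\lambda_1\phi}{\sqrt{\lambda_1^2\phi + \omega_1}\,\sqrt{\phi}} \right)^2
= \frac{\lambda_1^2\phi}{\lambda_1^2\phi + \omega_1}
\]

\[
\Sigma =
\begin{pmatrix}
\lambda_1^2\phi + \omega_1 & \lambda_1\lambda_2\phi & \lambda_1\lambda_3\phi & \lambda_1\lambda_4\phi \\
\lambda_1\lambda_2\phi & \lambda_2^2\phi + \omega_2 & \lambda_2\lambda_3\phi & \lambda_2\lambda_4\phi \\
\lambda_1\lambda_3\phi & \lambda_2\lambda_3\phi & \lambda_3^2\phi + \omega_3 & \lambda_3\lambda_4\phi \\
\lambda_1\lambda_4\phi & \lambda_2\lambda_4\phi & \lambda_3\lambda_4\phi & \lambda_4^2\phi + \omega_4
\end{pmatrix}
\]

\[
\frac{\sigma_{12}\sigma_{13}}{\sigma_{23}\sigma_{11}}
= \frac{\lambda_1\lambda_2\phi \, \lambda_1\lambda_3\phi}{\lambda_2\lambda_3\phi \, (\lambda_1^2\phi + \omega_1)}
= \frac{\lambda_1^2\phi}{\lambda_1^2\phi + \omega_1}
= \rho^2
\]

So reliabilities are identifiable too.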
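As a numerical check, the reliability computed from the parameters can be compared with the value obtained from Σ alone. The sketch below uses made-up parameter values, and the helper `reliability_from_sigma` is a hypothetical name; applying the same formula to D2 relies on the symmetry of the model rather than on anything stated on the slide.

```python
def sigma_one_factor(phi, lam, omega):
    """Covariance matrix of D_j = lam[j]*F + e_j, with Var(F) = phi and Var(e_j) = omega[j]."""
    k = len(lam)
    return [[lam[i] * lam[j] * phi + (omega[i] if i == j else 0.0)
             for j in range(k)]
            for i in range(k)]

def reliability_from_sigma(S, j, a, b):
    """Reliability of D_j from Sigma alone, using two other indicators a and b (0-based indices)."""
    return S[j][a] * S[j][b] / (S[a][b] * S[j][j])

# Hypothetical parameter values
phi, lam, omega = 4.0, [1.0, 2.0, -0.5, 3.0], [1.0, 2.0, 3.0, 4.0]
S = sigma_one_factor(phi, lam, omega)

# Reliability of D1: from the parameters, and from sigma_12 sigma_13 / (sigma_23 sigma_11)
rho2_true = lam[0]**2 * phi / (lam[0]**2 * phi + omega[0])
rho2_from_sigma = reliability_from_sigma(S, 0, 1, 2)

# Same idea for D2, by symmetry
rho2_d2_true = lam[1]**2 * phi / (lam[1]**2 * phi + omega[1])
rho2_d2_from_sigma = reliability_from_sigma(S, 1, 0, 2)

print(rho2_true, rho2_from_sigma)
```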

What can we successfully estimate?

•  Error variances are knowable.

•  Factor loadings and variance of the factor are not knowable separately.

•  But both are knowable up to multiplication by a non-zero constant, so signs of factor loadings are knowable (if one sign is known).

•  Relative magnitudes (ratios) of factor loadings are knowable.

•  Reliabilities are knowable.

Testing Model Fit

•  Note that all the equality constraints must involve only the covariances: σij for i ≠ j.

•  In the original model, the covariances are all multiplied by the same non-zero constant.

•  So, the equality constraints of the original model and the pretend model with ϕ = 1 are the same.

•  The chi-square test for goodness of fit applies to the original model. This is a great relief!

•  Likelihood ratio tests comparing full and reduced models are mostly valid without deep thought.
   –  Equality of factor loadings is testable.
   –  Could test H0: λ4 = 0, etc.

Re-parameterization

•  The choice ϕ = 1 is a very smart re-parameterization.

•  It re-expresses the factor loadings as multiples of the square root of ϕ.

•  It preserves what information is accessible about the parameters of the original model.

•  Much better than exploratory factor analysis, which lost even the signs of the factor loadings.

•  This is the second major re-parameterization. The first was losing the means and intercepts.

Re-parameterizations

Original model → Surrogate model 1 → Surrogate model 2

Add a factor to the centered model

[Path diagram: factors F1 and F2, with observable variables D1, D2, D3, D4, D5, D6]

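\[
\begin{aligned}
D_1 &= \lambda_1 F_1 + e_1 \\
D_2 &= \lambda_2 F_1 + e_2 \\
D_3 &= \lambda_3 F_1 + e_3 \\
D_4 &= \lambda_4 F_2 + e_4 \\
D_5 &= \lambda_5 F_2 + e_5 \\
D_6 &= \lambda_6 F_2 + e_6
\end{aligned}
\qquad
cov\begin{pmatrix} F_1 \\ F_2 \end{pmatrix} =
\begin{pmatrix} \phi_{11} & \phi_{12} \\ \phi_{12} & \phi_{22} \end{pmatrix}
\]

$e_1, \ldots, e_6$ independent of each other and of $F_1, F_2$; $\lambda_1, \ldots, \lambda_6 \neq 0$; $Var(e_j) = \omega_j$.

\[
\Sigma =
\begin{pmatrix}
\lambda_1^2\phi_{11} + \omega_1 & \lambda_1\lambda_2\phi_{11} & \lambda_1\lambda_3\phi_{11} & \lambda_1\lambda_4\phi_{12} & \lambda_1\lambda_5\phi_{12} & \lambda_1\lambda_6\phi_{12} \\
\lambda_1\lambda_2\phi_{11} & \lambda_2^2\phi_{11} + \omega_2 & \lambda_2\lambda_3\phi_{11} & \lambda_2\lambda_4\phi_{12} & \lambda_2\lambda_5\phi_{12} & \lambda_2\lambda_6\phi_{12} \\
\lambda_1\lambda_3\phi_{11} & \lambda_2\lambda_3\phi_{11} & \lambda_3^2\phi_{11} + \omega_3 & \lambda_3\lambda_4\phi_{12} & \lambda_3\lambda_5\phi_{12} & \lambda_3\lambda_6\phi_{12} \\
\lambda_1\lambda_4\phi_{12} & \lambda_2\lambda_4\phi_{12} & \lambda_3\lambda_4\phi_{12} & \lambda_4^2\phi_{22} + \omega_4 & \lambda_4\lambda_5\phi_{22} & \lambda_4\lambda_6\phi_{22} \\
\lambda_1\lambda_5\phi_{12} & \lambda_2\lambda_5\phi_{12} & \lambda_3\lambda_5\phi_{12} & \lambda_4\lambda_5\phi_{22} & \lambda_5^2\phi_{22} + \omega_5 & \lambda_5\lambda_6\phi_{22} \\
\lambda_1\lambda_6\phi_{12} & \lambda_2\lambda_6\phi_{12} & \lambda_3\lambda_6\phi_{12} & \lambda_4\lambda_6\phi_{22} & \lambda_5\lambda_6\phi_{22} & \lambda_6^2\phi_{22} + \omega_6
\end{pmatrix}
\]

Parameters are not identifiable

\[
\begin{aligned}
\theta_1 &= (\lambda_1, \ldots, \lambda_6, \phi_{11}, \phi_{12}, \phi_{22}, \omega_1, \ldots, \omega_6) \\
\theta_2 &= (\lambda_1', \ldots, \lambda_6', \phi_{11}', \phi_{12}', \phi_{22}', \omega_1', \ldots, \omega_6')
\end{aligned}
\]

where $c_1 \neq 0$ and $c_2 \neq 0$, and

\[
\begin{aligned}
\lambda_1' &= c_1\lambda_1 & \lambda_2' &= c_1\lambda_2 & \lambda_3' &= c_1\lambda_3 & \phi_{11}' &= \phi_{11}/c_1^2 \\
\lambda_4' &= c_2\lambda_4 & \lambda_5' &= c_2\lambda_5 & \lambda_6' &= c_2\lambda_6 & \phi_{22}' &= \phi_{22}/c_2^2
\end{aligned}
\]

\[
\phi_{12}' = \frac{\phi_{12}}{c_1 c_2}, \qquad \omega_j' = \omega_j \text{ for } j = 1, \ldots, 6
\]

Every element of Σ is unchanged by this substitution, so $\theta_1$ and $\theta_2$ yield the same Σ.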

Variances and covariances of factors

•  Are knowable only up to multiplication by positive constants.

•  Since the parameters of the latent variable model will be recovered from Φ = cov(F), they also will be knowable only up to multiplication by positive constants, at best.

•  Luckily, in most applications the interest is in testing (pos-neg-zero) more than estimation.

Cov(F1,F2) is un-knowable, but

•  Easy to tell if it’s zero.

•  Sign is known if one factor loading from each set is known, say λ1 > 0, λ4 > 0.

•  And, the correlation between factors is identifiable!

\[
\frac{\sigma_{14}}{\sqrt{\dfrac{\sigma_{12}\sigma_{13}}{\sigma_{23}}}\,\sqrt{\dfrac{\sigma_{45}\sigma_{46}}{\sigma_{56}}}}
= \frac{\lambda_1\lambda_4\phi_{12}}{\lambda_1\sqrt{\phi_{11}}\,\lambda_4\sqrt{\phi_{22}}}
= \frac{\phi_{12}}{\sqrt{\phi_{11}}\sqrt{\phi_{22}}}
= Corr(F_1, F_2)
\]
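The formula above can be tried on a two-factor Σ built from known values. The following sketch assumes λ1 > 0 and λ4 > 0; the function names and all numbers are hypothetical. It also rescales the parameters with c1 = √ϕ11 and c2 = √ϕ22, which gives the surrogate model with unit factor variances.

```python
import math

def sigma_two_factor(lam, phi11, phi12, phi22, omega):
    """Sigma for six indicators: D1-D3 load on F1, D4-D6 load on F2."""
    Phi = [[phi11, phi12], [phi12, phi22]]
    f = [0, 0, 0, 1, 1, 1]  # index of the factor behind each indicator
    return [[lam[i] * lam[j] * Phi[f[i]][f[j]] + (omega[i] if i == j else 0.0)
             for j in range(6)]
            for i in range(6)]

def factor_corr(S):
    """Corr(F1, F2) as a function of Sigma, assuming lambda_1 > 0 and lambda_4 > 0."""
    def s(i, j):
        return S[i - 1][j - 1]  # 1-based subscripts, as on the slides
    return s(1, 4) / (math.sqrt(s(1, 2) * s(1, 3) / s(2, 3))
                      * math.sqrt(s(4, 5) * s(4, 6) / s(5, 6)))

# Hypothetical original-model parameters
lam = [1.0, 2.0, 0.5, 2.0, 1.0, 3.0]
phi11, phi12, phi22 = 4.0, -3.0, 9.0
omega = [1.0] * 6
S_original = sigma_two_factor(lam, phi11, phi12, phi22, omega)

# Rescaled parameters: c1 = sqrt(phi11), c2 = sqrt(phi22) standardizes both factors
c1, c2 = 2.0, 3.0
lam_star = [c1 * l for l in lam[:3]] + [c2 * l for l in lam[3:]]
phi12_star = phi12 / (c1 * c2)
S_surrogate = sigma_two_factor(lam_star, phi11 / c1**2, phi12_star, phi22 / c2**2, omega)

corr_true = phi12 / math.sqrt(phi11 * phi22)
corr_from_sigma = factor_corr(S_original)
print(corr_true, corr_from_sigma, phi12_star)
```

In this example the surrogate-model covariance `phi12_star` coincides with the original-model correlation, and both Σ matrices agree element by element.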

The correlation between factors is identifiable

•  Furthermore, it is the same function of Σ that yields ϕ12 under the surrogate model with Var(F1) = Var(F2) = 1.

•  Therefore, Corr(F1,F2) = ϕ12 under the surrogate model is equivalent to Corr(F1,F2) under the original model.

•  Estimates and tests of ϕ12 under the surrogate model apply to $\dfrac{\phi_{12}}{\sqrt{\phi_{11}}\sqrt{\phi_{22}}}$ under the original model.

Setting variances of factors to one

•  Is a very smart re-parameterization.

•  Is excellent when the interest is in correlations between factors.

•  Allows estimation of classical path coefficients for the latent variable model.

•  (That last remark was just for the record.)

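Re-parameterization as a change of variables

\[
\begin{aligned}
D_j &= \lambda_j F_j + e_j \\
&= \left(\lambda_j\sqrt{\phi_{jj}}\right)\left(\frac{1}{\sqrt{\phi_{jj}}}F_j\right) + e_j \\
&= \lambda_j' F_j' + e_j
\end{aligned}
\]

•  Var(Fj′) = 1

•  The new factor loading is in units of the standard deviation of Fj.

•  This applies to all observable variables connected to Fj.

•  Puts factor loadings for different factors on a common scale.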

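Covariances

•  Covariances between factors in the surrogate model equal correlations in the original model.

•  Latent variable parameters are strongly affected.

•  Parameters in the latent surrogate model are the original parameters times positive constants.

\[
Cov(F_j', F_k')
= E\left(\frac{1}{\sqrt{\phi_{jj}}}F_j \, \frac{1}{\sqrt{\phi_{kk}}}F_k\right)
= \frac{E(F_jF_k)}{\sqrt{\phi_{jj}}\sqrt{\phi_{kk}}}
= \frac{\phi_{jk}}{\sqrt{\phi_{jj}}\sqrt{\phi_{kk}}}
= Corr(F_j, F_k)
\]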

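What happens if there is a latent variable model?

\[
Y_i = \beta_1 X_i + \epsilon_i, \qquad Var(Y_i) = \beta_1^2\phi + \psi
\]

where $\phi = Var(X_i)$ and $\psi = Var(\epsilon_i)$.

Standardize both X and Y.

\[
\begin{aligned}
\sqrt{\beta_1^2\phi + \psi}\left(\frac{1}{\sqrt{\beta_1^2\phi + \psi}}Y_i\right)
&= \beta_1\sqrt{\phi}\left(\frac{1}{\sqrt{\phi}}X_i\right) + \epsilon_i \\
\Rightarrow \left(\frac{1}{\sqrt{\beta_1^2\phi + \psi}}Y_i\right)
&= \left(\beta_1\sqrt{\frac{\phi}{\beta_1^2\phi + \psi}}\right)\left(\frac{1}{\sqrt{\phi}}X_i\right) + \frac{\epsilon_i}{\sqrt{\beta_1^2\phi + \psi}} \\
Y_i' &= \beta_1' X_i' + \epsilon_i'
\end{aligned}
\]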

What does it mean?

\[
Y_i' = \beta_1' X_i' + \epsilon_i', \qquad
\beta_1' = \beta_1\sqrt{\frac{\phi}{\beta_1^2\phi + \psi}}
\]

\[
Cov(X_i', Y_i') = \beta_1' = Corr(X_i, Y_i)
\]

Because covariances under the surrogate model equal correlations under the original model.
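A short arithmetic check of this claim, using population moments rather than simulated data. The symbol names `beta1`, `phi` (variance of X) and `psi` (variance of the error term) and the numerical values are assumptions made for this sketch.

```python
import math

# Hypothetical original-model parameters
beta1 = 1.5   # slope of Y on X
phi = 4.0     # Var(X)
psi = 7.0     # Var(epsilon)

# Moments implied by Y = beta1*X + epsilon, with X independent of epsilon
var_y = beta1**2 * phi + psi
cov_xy = beta1 * phi
corr_xy = cov_xy / (math.sqrt(phi) * math.sqrt(var_y))

# Parameters of the standardized (surrogate) model
beta1_prime = beta1 * math.sqrt(phi / var_y)
psi_prime = psi / var_y                      # Var(epsilon') after dividing by SD(Y)
var_y_prime = beta1_prime**2 * 1.0 + psi_prime

print(corr_xy, beta1_prime, var_y_prime)
```

With these numbers the standardized slope equals the correlation, and the standardized Y has variance one, as the change of variables requires.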

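Factor Loadings are affected too

\[
\begin{aligned}
D_i &= \lambda Y_i + \cdots + e_i \\
&= \left(\lambda\sqrt{\beta_1^2\phi + \psi}\right)\left(\frac{1}{\sqrt{\beta_1^2\phi + \psi}}Y_i\right) + \cdots + e_i \\
&= \lambda' Y_i' + \cdots + e_i
\end{aligned}
\]

where $\beta_1^2\phi + \psi = Var(Y_i)$.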

Cascading effects

•  Understand the re-parameterization as a change of variables.

•  Not just an arbitrary restriction of the parameter space.

•  It shows there are widespread effects throughout the model.

•  Also shows how the meanings of other model parameters are affected.

The other standard trick

•  Setting variances of all the factors to one is an excellent re-parameterization in disguise.

•  The other standard trick is to set one factor loading equal to one for each factor.

•  D = F + e is hard to believe if you take it literally.

•  It’s actually a re-parameterization.

•  Every model you’ve seen with a factor loading of one is a surrogate model.

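Back to a single-factor model with λ1 > 0

\[
\begin{aligned}
D_1 &= \lambda_1 F + e_1 \\
D_2 &= \lambda_2 F + e_2 \\
D_3 &= \lambda_3 F + e_3 \\
&\;\;\vdots
\end{aligned}
\]

\[
D_j = \left(\frac{\lambda_j}{\lambda_1}\right)(\lambda_1 F) + e_j = \lambda_j' F' + e_j
\]

\[
\begin{aligned}
D_1 &= F' + e_1 \\
D_2 &= \lambda_2' F' + e_2 \\
D_3 &= \lambda_3' F' + e_3 \\
&\;\;\vdots
\end{aligned}
\]

Σ under the surrogate model

\[
\Sigma =
\begin{pmatrix}
\phi + \omega_1 & \lambda_2\phi & \lambda_3\phi \\
\lambda_2\phi & \lambda_2^2\phi + \omega_2 & \lambda_2\lambda_3\phi \\
\lambda_3\phi & \lambda_2\lambda_3\phi & \lambda_3^2\phi + \omega_3
\end{pmatrix}
\]

                     Value under model
Function of Σ        Surrogate     Original
σ23/σ13              λ2            λ2/λ1
σ23/σ12              λ3            λ3/λ1
σ12σ13/σ23           ϕ             λ1²ϕ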

Under the surrogate model

•  It looks like λj is identifiable, but actually it’s λj/λ1.

•  Estimates of λj for j ≠ 1 are actually estimates of λj/λ1.

•  It looks like ϕ is identifiable, but actually it’s λ1²ϕ.

•  ϕ is being expressed as a multiple of λ1².

•  Estimates of ϕ are actually estimates of λ1²ϕ.

Everything is being expressed in terms of λ1.

Make D1 the clearest representative of the factor.
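To see what the surrogate-model parameters stand for, one can generate Σ from original-model values and then solve for the surrogate parameters. This is a sketch with hypothetical names and numbers, assuming λ1 > 0.

```python
def sigma_one_factor(phi, lam, omega):
    """Covariance matrix of D_j = lam[j]*F + e_j, with Var(F) = phi and Var(e_j) = omega[j]."""
    k = len(lam)
    return [[lam[i] * lam[j] * phi + (omega[i] if i == j else 0.0)
             for j in range(k)]
            for i in range(k)]

# Hypothetical original-model parameters (lambda_1 = 2 > 0)
phi, lam, omega = 0.5, [2.0, 3.0, -1.0], [1.0, 1.5, 2.0]
S = sigma_one_factor(phi, lam, omega)

# Solutions for the surrogate model, in which the loading of D1 is fixed at one
lam2_surrogate = S[1][2] / S[0][2]             # sigma_23 / sigma_13
lam3_surrogate = S[1][2] / S[0][1]             # sigma_23 / sigma_12
phi_surrogate = S[0][1] * S[0][2] / S[1][2]    # sigma_12 sigma_13 / sigma_23

# The surrogate parameters reproduce the same Sigma
S_surrogate = sigma_one_factor(phi_surrogate, [1.0, lam2_surrogate, lam3_surrogate], omega)

print(lam2_surrogate, lam3_surrogate, phi_surrogate)
```

Here the recovered values are λ2/λ1, λ3/λ1 and λ1²ϕ of the original model, matching the table.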

Add an observable variable

•  Parameters are all identifiable, even if the factor loading of the new variable equals zero.

•  Equality restrictions on Σ are created, because we are adding more equations than unknowns.

•  These equality restrictions apply to the original model.

•  It is straightforward to see what the restrictions are, though the calculations can be time consuming.

Finding the equality restrictions

•  Calculate Σ(θ).

•  Solve the covariance structure equations explicitly, obtaining θ as a function of Σ.

•  Substitute the solutions back into Σ(θ).

•  Simplify.

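Example: Add a 4th variable

\[
\begin{aligned}
D_1 &= F + e_1 \\
D_2 &= \lambda_2 F + e_2 \\
D_3 &= \lambda_3 F + e_3 \\
D_4 &= \lambda_4 F + e_4
\end{aligned}
\]

$e_1, \ldots, e_4, F$ all independent, $Var(e_j) = \omega_j$, $Var(F) = \phi$, $\lambda_1, \lambda_2, \lambda_3 \neq 0$.

\[
\Sigma(\theta) =
\begin{pmatrix}
\phi + \omega_1 & \lambda_2\phi & \lambda_3\phi & \lambda_4\phi \\
\lambda_2\phi & \lambda_2^2\phi + \omega_2 & \lambda_2\lambda_3\phi & \lambda_2\lambda_4\phi \\
\lambda_3\phi & \lambda_2\lambda_3\phi & \lambda_3^2\phi + \omega_3 & \lambda_3\lambda_4\phi \\
\lambda_4\phi & \lambda_2\lambda_4\phi & \lambda_3\lambda_4\phi & \lambda_4^2\phi + \omega_4
\end{pmatrix}
\]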

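Solutions

\[
\lambda_2 = \frac{\sigma_{23}}{\sigma_{13}}, \qquad
\lambda_3 = \frac{\sigma_{23}}{\sigma_{12}}, \qquad
\lambda_4 = \frac{\sigma_{24}}{\sigma_{12}}, \qquad
\phi = \frac{\sigma_{12}\sigma_{13}}{\sigma_{23}}
\]

Substitute

\[
\sigma_{12} = \lambda_2\phi = \frac{\sigma_{23}}{\sigma_{13}} \cdot \frac{\sigma_{12}\sigma_{13}}{\sigma_{23}} = \sigma_{12}
\]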

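Substitute solutions into expressions for the covariances

\[
\begin{aligned}
\sigma_{12} &= \sigma_{12} & \sigma_{23} &= \sigma_{23} \\
\sigma_{13} &= \sigma_{13} & \sigma_{24} &= \sigma_{24} \\
\sigma_{14} &= \frac{\sigma_{24}\sigma_{13}}{\sigma_{23}} & \sigma_{34} &= \frac{\sigma_{24}\sigma_{13}}{\sigma_{12}}
\end{aligned}
\]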

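Equality Constraints

\[
\sigma_{14}\sigma_{23} = \sigma_{24}\sigma_{13}, \qquad
\sigma_{12}\sigma_{34} = \sigma_{24}\sigma_{13}
\]

These hold regardless of whether factor loadings are zero (1234).

\[
\sigma_{12}\sigma_{34} = \sigma_{13}\sigma_{24} = \sigma_{14}\sigma_{23}
\]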
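The constraints are easy to check on any 4×4 covariance matrix. The sketch below uses hypothetical helper names and made-up matrices: one generated by a one-factor model, one with a zero loading, and one arbitrary positive definite matrix that does not come from a one-factor model.

```python
def sigma_one_factor(phi, lam, omega):
    """Covariance matrix of D_j = lam[j]*F + e_j, with Var(F) = phi and Var(e_j) = omega[j]."""
    k = len(lam)
    return [[lam[i] * lam[j] * phi + (omega[i] if i == j else 0.0)
             for j in range(k)]
            for i in range(k)]

def constraint_products(S):
    """The three products sigma_12*sigma_34, sigma_13*sigma_24, sigma_14*sigma_23."""
    def s(i, j):
        return S[i - 1][j - 1]  # 1-based subscripts, as on the slides
    return (s(1, 2) * s(3, 4), s(1, 3) * s(2, 4), s(1, 4) * s(2, 3))

def satisfies_constraints(S, tol=1e-9):
    a, b, c = constraint_products(S)
    return abs(a - b) <= tol and abs(a - c) <= tol

omega = [1.0, 2.0, 3.0, 4.0]
one_factor = sigma_one_factor(4.0, [1.0, 2.0, -0.5, 3.0], omega)
zero_loading = sigma_one_factor(4.0, [1.0, 2.0, -0.5, 0.0], omega)  # lambda_4 = 0

# An arbitrary covariance matrix, not generated by a one-factor model
other = [[3.0, 1.0, 0.5, 0.5],
         [1.0, 3.0, 0.5, 0.5],
         [0.5, 0.5, 3.0, 1.0],
         [0.5, 0.5, 1.0, 3.0]]

print(satisfies_constraints(one_factor),
      satisfies_constraints(zero_loading),
      satisfies_constraints(other))
```

In practice the constraints would be assessed on a sample covariance matrix with a formal goodness-of-fit test, not with a fixed numerical tolerance as in this illustration.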

Add another 3-variable factor

•  Identifiability is maintained.

•  The covariance ϕ12 = σ14

•  Actually σ14 = λ1λ4ϕ12 under the original model.

•  The covariances of the surrogate model are just those of the original model, multiplied by un-knowable positive constants.

•  As more variables and more factors are added, all this remains true.

Comparing the surrogate models

•  Either set variances of factors to one, or set one loading per factor to one.

•  Both arise from a similar change of variables.

•  Fj′ = cjFj, where cj > 0.

•  cj is either a factor loading or one over a standard deviation.

•  Interpretation of surrogate model parameters is different except for the sign.

•  Mathematically the models are equivalent: exchange $\lambda_j$ and $\dfrac{1}{\sqrt{\phi_{jj}}}$.

•  The true model and both surrogate models share the same equality constraints, and hence the same goodness of fit results for any given data set.

Which re-parameterization is better?

•  Technically, they are equivalent.

•  They both involve setting a single un-knowable parameter to one, for each factor.

•  This seems arbitrary, but actually it results in a very good re-parameterization that preserves what is knowable about the true model.

•  Standardizing the factors (Surrogate model 2A) is more convenient for estimating correlations between factors.

•  Setting one loading per factor equal to one (Surrogate model 2B) is more convenient for estimating the relative sizes of factor loadings.

•  Hand calculations with Surrogate model 2B can be easier.

•  If there is a serious latent variable model, Surrogate model 2B is much easier to specify with SAS.

•  Mixing Surrogate model 2B with double measurement is natural.

•  Don’t do both restrictions for the same factor!

Why are we doing this?

•  The parameters of the original model cannot be estimated directly. For example, maximum likelihood will fail because the maximum is not unique.

•  The parameters of the surrogate models are identifiable (estimable) functions of the parameters of the true model.

•  They have the same signs (positive, negative or zero) as the corresponding parameters of the true model.

•  Hypothesis tests mean what you think they do.

•  Parameter estimates can be useful if you know what the new parameters mean.

The Crossover Rule

•  It is unfortunate that variables can only be caused by one factor. In fact, it’s unbelievable most of the time.

•  A pattern like this would be nicer.

When you add a set of observable variables to a measurement model whose parameters are already identifiable

•  Straight arrows with factor loadings on them may point from each existing factor to each new variable.

•  You don’t need to include all such arrows.

•  Error terms for the new set of variables may have non-zero covariances with each other, but not with the error terms or factors of the original model.

•  Some of the new error terms may have zero covariance with each other. It’s up to you.

•  All parameters of the new model are identifiable.

Proof

•  Have a measurement (factor analysis) model with p factors and k1 observable variables. The parameters are all identifiable.

•  Assume that for each factor, there is at least one observable variable with a factor loading of one.

•  If this is not the case, re-parameterize.

•  Re-order the variables, putting the p variables with unit factor loadings first, in the order of the corresponding factors.

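The first two equations belong to the initial model

\[
\begin{aligned}
\mathbf{D}_1 &= \mathbf{F} + \mathbf{e}_1 \\
\mathbf{D}_2 &= \Lambda_2\mathbf{F} + \mathbf{e}_2 \\
\mathbf{D}_3 &= \Lambda_3\mathbf{F} + \mathbf{e}_3
\end{aligned}
\]

\[
cov\begin{pmatrix} \mathbf{e}_1 \\ \mathbf{e}_2 \\ \mathbf{e}_3 \end{pmatrix} =
\begin{pmatrix}
\Omega_{11} & \Omega_{12} & 0 \\
 & \Omega_{22} & 0 \\
 & & \Omega_{33}
\end{pmatrix},
\qquad
cov(\mathbf{F}) = \Phi
\]

\[
\Sigma =
\begin{pmatrix}
\Sigma_{11} & \Sigma_{12} & \Sigma_{13} \\
 & \Sigma_{22} & \Sigma_{23} \\
 & & \Sigma_{33}
\end{pmatrix}
=
\begin{pmatrix}
\Phi + \Omega_{11} & \Phi\Lambda_2^\top + \Omega_{12} & \Phi\Lambda_3^\top \\
 & \Lambda_2\Phi\Lambda_2^\top + \Omega_{22} & \Lambda_2\Phi\Lambda_3^\top \\
 & & \Lambda_3\Phi\Lambda_3^\top + \Omega_{33}
\end{pmatrix}
\]

Only $\Lambda_3$ and $\Omega_{33}$ are new; on the slides they are highlighted until they have been solved for.

\[
\Lambda_3 = \Sigma_{13}^\top\Phi^{-1}, \qquad
\Omega_{33} = \Sigma_{33} - \Lambda_3\Phi\Lambda_3^\top
\]

Solve for it and it becomes black.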

Comments

•  There is no restriction on the factor loadings of the variables that are being added to the model.

•  There is no restriction on the covariances of error terms for the new set of variables, except that they must not be correlated with error terms already in the model.

•  This suggests a model building strategy. Start small, perhaps with 3 variables per factor. Then add the remaining variables, with maximum flexibility.

•  Could even fit the one-factor sub-models one at a time to make sure they are okay, then combine factors, then add variables.

Add an observed variable to the factors

•  Often it’s an observed exogenous variable (like sex or experimental condition) you want to be in a latent variable model.

•  Suppose parameters of the existing (surrogate) factor analysis model (p factors) are all identifiable.

•  X is independent of the error terms.

•  Add a row (and column) to Σ.

•  Add p+1 parameters to the model.

•  Say Var(X) = Φ0, Cov(X,Fj) = Φ0,j

•  Dk = λkFj + ek, λk is already identified.

•  E(XDk) = λk E(XFj) + 0 = λk Φ0,j

•  Solve for the covariance.

•  Do this for each factor in the model. Done.
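The last steps amount to one division per factor. A minimal sketch with made-up numbers follows; it assumes the chosen indicator of each factor has a non-zero, already identified loading, and the variable names are hypothetical.

```python
# Hypothetical values for a model with p = 2 factors
lam_k = [1.0, 0.8]           # identified loading of one indicator D_k per factor
phi0_true = [0.6, -0.3]      # Cov(X, F_j), unknown in practice

# Model-implied covariances between X and the chosen indicators: E(X D_k) = lam_k * Phi_{0,j}
cov_x_d = [l * p for l, p in zip(lam_k, phi0_true)]

# Solve for the covariance between X and each factor
phi0_solved = [c / l for c, l in zip(cov_x_d, lam_k)]

print(phi0_solved)
```

Var(X) is read directly from the new diagonal element of Σ, which accounts for the remaining one of the p+1 new parameters.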

We have some identification rules

•  Double Measurement rule.

•  Three-variable rule for standardized factors.

•  Three-variable rule for unstandardized factors.

•  Cross-over rule.

•  Error-free rule.

Copyright Information

This slide show was prepared by Jerry Brunner, Department of Statistics, University of Toronto. It is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. Use any part of it as you like and share the result freely. These Powerpoint slides are available from the course website:

http://www.utstat.toronto.edu/~brunner/oldclass/2101f19
