Factorial analysis of qualitative and quantitative ... · both mixed and structured according to a...
Transcript of Factorial analysis of qualitative and quantitative ... · both mixed and structured according to a...
European University of Britany
Agrocampus
Factorial analysis of qualitative and quantitative dataFactorial analysis of qualitative and quantitative databoth mixed and structured according to a hierarchy
Jérôme Pagès
Applied Mathematics deptApplied Mathematics dept.
1
Mont Saint-Michel
Rennes
Britany
2
3
Context of factorial analysis : One set of individuals described by several variables
Taking into account both quantitative and qualitative variablesFactor Analysis for Mixed Data (FAMD)1
Taking into account a partition of the variables2 Multiple Factor Analysis (MFA)2
Taking into account a hierarchy defined on the variablesHierarchical Multiple Factor Analysis (HMFA)3 Hierarchical Multiple Factor Analysis (HMFA)3
4
Taking into account both quantitative and qualitative variables1Data
g q qFactor Analysis for Mixed Data (FAMD)1
K1 quantitativevariables
( t d di d)
Q qualitativevariables(standardized) variables
1 k K11 q Q1
individuals i xik xiq
I
5
DataData
Q qualitatives variablesK quantitative Q q=K2 indicatrices(complete disjunctive coding)
K1 quantitativevariables
(standardized)
1 q Q
1 k K1 kq K2
1 k K1 1 kq Kq
1
individuals i xik xikq
I
6
Principal components analysisRepresentation of the variables and criterionRepresentation of the variables and criterion
RI
O i bl i
Criterion of standardized PCA1
k
v
One variable = one axis
v
θk
2 ( , )k
r k v∑O
1
θkv
2cos kvk
θ∑k
k
7
Multiple correspondence analysisRepresentation of the variables and criterionRepresentation of the variables and criterion
Variable q = sub space ERI
Variable q = sub-space Eq
v E
Criterion of MCA
2 ( )q vη∑ Eq
2cos qvθ∑
( , )q
q vη∑
O θqvqv
q∑
8
Factor analysis for mixed data (FAMD)
Criterion
RI2 2cos coskθ θ+∑ ∑B. Escofier (1979)
1k
RI cos coskv qvk q
θ θ+∑ ∑
1v Eq
O
θkvθqv
12 2( , ) ( , )
k qr k v q vη+∑ ∑
k q
M. Tenenhaus (1985)G. Saporta (1990)
9
p ( )
Representations provided by FAMD
F2F2
A
F2
i1
k1
kF1
A
B
Cindividual
categoryF1
i1
i2
k3
k2
10
Taking into account a partition of the variablesMultiple Factor Analysis (MFA)2
J tit ti J lit ti
MFA applied to groups of variables : quantitative, qualitative or mixed
p y ( )
(complete disjunctive coding)J1 quantitative groups J2 qualitative groups
1 q Qj1 k Kj
1 j J1 1 j J2Groups
Var.
1 k Kq
1
Ind. i xik xik
I
11
Weighting the groups of variablesWeighting
S l f ti i bl i i l iSeveral groups of active variables in a unique analysis
Question :
How to balance their influence ?
12
Weighting the variables in MFAWeighting
Reference example : two groups of quantitative variables
RI
set 1 : 2 var.
set 2 : 3 varset 2 : 3 var.
13
Reference exampleWeighting
PCA of the 5 variables, without considering the sets
1st principalcomponent
RI component
set 1 : 2 var.
set 2 : 3 var.
14
Reference exampleWeighting
Weighting the sets of variables in MFAbalancing the maximum axial inertia
Each variable of the set j is weighted by 1/λ1j
λ1j : 1st eigenvalue of PCA applied to set j.
15
RI
.5
.5
set 1 : 2 var.
set 2 : 3 var.
1
1
15
MFA is based on a factor analysis applied to all active sets of variables
Weighted fact. an.
The groups of variables can bequantitative (standardized or not)
MFA is based on a factor analysis applied to all active sets of variables
quantitative (standardized or not)qualitativemixed
Criterion (case of 2 groups : K1 quantitative variables Q2 qualitative variables)
1 2
2 21 21 1 2
1 1( , ) ( , )k K q Q
r k v q vQ
ηλ λ∈ ∈
+∑ ∑
Equivalences of MFAwhen each group t i i l i bl
Quantitative variables
Qualitatives variables
Standardized PCA
MCA
16
contains a single variableQ
Mixed data FAMD
MFA is based on a factor analysis applied to all active sets of variablesWeighted fact. an.
MFA provides :Firstly : classical results of factor analysis
F2F2
A
F2
i1
k1
kF1
A
B
Cindividual
categoryF1
i1
i2
k3
k2
Specific representations (see HMFA)
17
Taking into account a hierarchy defined on the variablesHierarchical Multiple Factor Analysis (HMFA)3
Sorted napping : an holistic approach in sensory evaluation
Hierarchical Multiple Factor Analysis (HMFA)
A set of products is given (a, b, c, d, e, f)
Task 1 : napping
Position the products on the tablecloth in such a way that :two products are very near one another if they seem identical (for you) ,two products are distant one another if they seem different (for you). p y ( y )
This must be done according to your own criteria.
18
Sorted napping : an holistic approach in sensory evaluation
A set of product is given (a, b, c, d, e, f)
Task : sorted napping
As the panellist forms his “tablecloth” (or ”nappe”)As the panellist forms his tablecloth (or nappe ),he is asked to make groups of products,i.e. to put in the same group the products that he perceives as similar.
19
Data structure One sorted napping
Coordinates on the tableclothe2 quantitative var. (not standardized)
Sorting : 1 qualitative var.g q
X Y C
X Y C
Nappingpp g
Sorting
20Sorted napping
Case of several sorted nappings Data structure
X1 Y1 X2 Y2C1 C2
Napping 1 Napping 2pp g
Sorting 1
pp g
Sorting 2
21Sorted napping 1 Sorted napping 2
Exemple : two sorted nappings
22
λ1=1 955HMFA λ1 1.955Balancing in HMFA
λ1=1.866 λ1=1.956
λ 1
MFAMFA
λ1=200 λ1=249
λ1=1λ1=1
PCA
PCA
MCA MCA
PCA
X1 Y1 X2 Y2C1 C2
Napping 1 Napping 2pp g
Sorting 1
pp g
Sorting 2
23Sorted napping 1 Sorted napping 2
Sorted nappings seen through their MFAFirst step of HMFA
The raw sorted nappings
24
Individuals factor map
F2 (30.35 %)
c d
λ1=1.955
b
F1 (64.54 %)1
a b e f
25
λ1=1 955Decomposition of the inertia for the first axisEfficiency of the balancing
λ1 1.955
.977 .978
464
.514 .505
.464 .472
X1 Y1 X2 Y2C1 C2
Napping 1 Napping 2pp g
Sorting 1
pp g
Sorting 2
26Sorted napping 1 Sorted napping 2
Representation of the nodes (=sets of variables) in HMFA
Data1 Kj
1
scalar products1 l I
1The set Kj ofvariables gatheredin the node j
1
i xik
1
i Wj(i,l)in the node j
I I
RI RI²
NKj
Wj
NJ
27
Representing NJ with HMFA
RI RI²
jNKj
N
Wj ws
NJvs
ws : W associated to vs
i i f j j d di finertia of NKj projected upon vs co-ordinate of Wj upon ws
Projected inertia of the whole set of the variables Kj onto vsj
= a measure of relationship betweenone variable vs
a set of variables Kj
( , )s jLg v K
0 ( , ) 1s jLg v K≤ ≤
28
a set of variables Kj( , )s jg
Can be applied to each node of the hierarchy
F2 (30.35 %)Lg square = relationship square
c d
F1 (64.54 %)
1 Sorting 1 F2 – 30.35%
a b e f
( )
sorted napping 1
5.5napping 1
napping 2
2
sorted napping 2
napping 2
2
01.50
sorting 2
F1 - 64.54%
29
Superimposed representation of the partial clouds of individuals(e g the J partial clouds associated to the highest partition in HMFA)
1 j J1 K1 1 Kj 1 KJ
(e.g. the J partial clouds associated to the highest partition in HMFA)
1 K1 1 Kj 1 KJ
1i1 ij iJ
ii
I
RK1 RKj RKJ NIJNI
jRNI
j
i ji 1
NI
i J
NI
30NIj : partial cloud (individuals seen through group j)
Superimposed representation of the J partial clouds of individuals (MFA)Geometrical framework
RK1 RKj RKJ N JN jR 1 RNI
j
i ji1
R NI
iJ
NI
RKj
RK
NIjRK1 i jNI
j
i 1jKKR R= ⊕
NIjpartial cloud
N mean cloudi
NI mean cloud
31NI
Superimposed representation of the J partial clouds of individuals (MFA)
Principle
RKj
RK
NIj
RK1
i jNIj
i1
i
i1
usNI
Partial clouds are projected onto principal axes of the mean cloud
32
p j p p
Representation of the partial clouds associated to the two highest nodes
c dc d
F1 - 64.54 %
ab
e f
F2 - 30.35 %
33
Representation of the partial clouds associated to the two highest nodes
c-1 d-1
c dc d
F1 - 64.54 %b-2
c-2 d-2
e-2 f-2
ab
e f
a-2e-2
a-1F2 - 30.35 %
b-1 e-1f-1
34
Concl sion
Conclusion
Conclusion
HMFA is a factor analysis devoted to multiple tables in whicha set of individuals is described byseveral sets of variables organized according to a hierarchy
The variables can be quantitative or categoricalThe variables can be quantitative or categorical
The core of the method is a weighted factor analysis ; it worksThe core of the method is a weighted factor analysis ; it worksas a PCA for quantitative variablesas MCA for categorical variables
It provides resultsusual in any factor analysisusual in any factor analysis
representation of individuals, of variables, etc.specific to the hierarchy defined on the variables
representation of partials points, of nodes, etc.
35
representation of partials points, of nodes, etc.
The analyses were performed with
An R package dedicated to p gExploratory Analysis
See LMA² site36