Regression & factor analyses
Where regression can go wrong
An example:
A financial company wishes to ascertain the drivers of satisfaction with its service. The candidate drivers are:
EXPERT="experts"
Q30A2 ="Take the time to understand who you are"
Q30A3 ="Communicate clearly, in plain language"
Q30A6 ="Go out of their way to tailor the best deal"
Q30A7 ="Have the knowledge and authority to make"
Q30A8 ="Have a positive, can-do approach"
Q30A11 ="Understand your business and the market"
Q30A12 ="Are proactive with ideas on how to get t"
Q30A13 ="Are prompt and reliable in handling any"
Q30A14 ="Treat you with respect and listen"
Q30A15 ="Keep in regular contact to keep you updated"
Q32A1 ="The competitiveness of their fees and rates"
Q32A2 ="Offering a flexible range of lending/rep"
Q32A3 ="How easy it is to take out a commercial"
Q32A4 ="The features and benefits of their comments"
Q32A5 ="Providing a full range of commercial product"
Q32A6 ="Being fair and reasonable in their lending"
Q24 ="Q3a. AMP BANKING OVERALL RATING" NB: this is the response
These were all rated on a 10-point scale.
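Example
Let's clean this data. SAS code:
/* supply the path to the data library between the quotes */
libname hold '';
data temp;
set hold.model;
/* the response (Q24) and the 17 candidate drivers */
array new {*}
Q24
EXPERT Q30A2 Q30A3 Q30A6 Q30A7 Q30A8
Q30A11 Q30A12 Q30A13 Q30A14 Q30A15
Q32A1 Q32A2 Q32A3 Q32A4 Q32A5 Q32A6;
/* dim(new) is the array length (18), so the loop cannot run past the end */
do i=1 to dim(new);
/* recode 11 to missing */
if new[i]=11 then new[i]=.;
end;
drop i;
run;
/* replace: missing values are replaced with the variable mean */
/* Q33 and Q34 are not in the array above, so any 11s in them are not recoded */
proc standard data=temp replace out=temp;
var Q24 Q33 Q34 EXPERT Q30A2 Q30A3 Q30A6 Q30A7 Q30A8
Q30A11 Q30A12 Q30A13 Q30A14 Q30A15
Q32A1 Q32A2 Q32A3 Q32A4 Q32A5 Q32A6;
run;
data hold.model;
set temp;
run;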
The above code recodes 11s to . (missing in SAS) and then replaces each missing value with the mean of its variable.
Let’s look at the data:
STAFF - Experts in Commercial Finance Ma
                                  Cumulative  Cumulative
      EXPERT  Frequency  Percent   Frequency     Percent
--------------------------------------------------------
           1          5     1.67           5        1.67
           2          4     1.33           9        3.00
           3          5     1.67          14        4.67
           4          3     1.00          17        5.67
           5         14     4.67          31       10.33
           6         16     5.33          47       15.67
           7         22     7.33          69       23.00
 7.462890625        121    40.33         190       63.33
           8         50    16.67         240       80.00
           9         24     8.00         264       88.00
          10         36    12.00         300      100.00
Take the time to understand who you are
                                  Cumulative  Cumulative
       Q30A2  Frequency  Percent   Frequency     Percent
--------------------------------------------------------
           1         10     3.33          10        3.33
           2          4     1.33          14        4.67
           3         11     3.67          25        8.33
           4         10     3.33          35       11.67
           5         19     6.33          54       18.00
           6         19     6.33          73       24.33
           7         25     8.33          98       32.67
7.4111328125         52    17.33         150       50.00
           8         48    16.00         198       66.00
           9         41    13.67         239       79.67
          10         61    20.33         300      100.00
Communicate clearly, in plain language
                                  Cumulative  Cumulative
       Q30A3  Frequency  Percent   Frequency     Percent
--------------------------------------------------------
           1          3     1.00           3        1.00
           2          5     1.67           8        2.67
           3          3     1.00          11        3.67
           4          6     2.00          17        5.67
           5         11     3.67          28        9.33
           6         12     4.00          40       13.33
           7         34    11.33          74       24.67
  7.98046875         33    11.00         107       35.67
           8         81    27.00         188       62.67
           9         48    16.00         236       78.67
          10         64    21.33         300      100.00
Some more code
proc reg data = hold.model;
model Q24= expert Q30A2 Q30A3 Q30A6 Q30A7 Q30A8 Q30A11 Q30A12 Q30A13
Q30A14 Q30A15
Q32A1 Q32A2 Q32A3 Q32A4 Q32A5 Q32A6;
run;
proc corr data = hold.model;
var Q24 expert Q30A2 Q30A3 Q30A6 Q30A7 Q30A8 Q30A11 Q30A12 Q30A13
Q30A14 Q30A15
Q32A1 Q32A2 Q32A3 Q32A4 Q32A5 Q32A6;
run;
Regression output
Parameter Estimates
                                                            Parameter  Standard
Variable   Label                                         DF  Estimate     Error t Value Pr > |t|
Intercept  Intercept                                      1   1.99970   0.53770    3.72   0.0002
EXPERT     STAFF - Experts in Commercial Finance Matters  1   0.05590   0.06486    0.86   0.3895
Q30A2      Take the time to understand who you are        1   0.01870   0.07645    0.24   0.8069
Q30A3      Communicate clearly, in plain language         1   0.02263   0.07383    0.31   0.7595
Q30A6      Go out of their way to tailor the best         1   0.01097   0.06114    0.18   0.8578
Q30A7      Have the knowledge and authority to make       1   0.11831   0.06004    1.97   0.0498
Q30A8      Have a positive, can-do approach to doing      1   0.13498   0.08037    1.68   0.0942
Q30A11     Understand your business and the market        1  -0.06802   0.07025   -0.97   0.3338
Q30A12     Are proactive with ideas on how to get         1   0.02511   0.05764    0.44   0.6634
Q30A13     Are prompt and reliable in handling any        1   0.37204   0.06702    5.55   <.0001
Q30A14     Treat you with respect and listen              1  -0.17003   0.08039   -2.12   0.0353
Q30A15     Keep in regular contact to keep you updated    1   0.07978   0.04594    1.74   0.0835
Q32A1      The competitiveness of their fees and rates    1   0.00392   0.06439    0.06   0.9514
Q32A2      Offering a flexible range of lending/rep       1  -0.05496   0.07295   -0.75   0.4519
Q32A3      How easy it is to take out a commercial        1   0.07025   0.06019    1.17   0.2442
Q32A4      The features and benefits of their comments    1  -0.08790   0.08377   -1.05   0.2949
Q32A5      Providing a full range of commercial prod      1   0.07440   0.05614    1.33   0.1861
Q32A6      Being fair and reasonable in their lending     1   0.15004   0.06826    2.20   0.0288
Issues
Note that many of these coefficients are not significant.
Even worse, some are negative when we would expect them, in the worst case, to be at least >= 0.
E.g.: Q30A14 Treat you with respect and listen 1 -0.17003 0.08039 -2.12 0.0353
i.e. this seems to imply that not listening and treating people disrespectfully would increase overall satisfaction!
So what is going on?
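Some Correlation output
Columns (inferred from the VAR order and the positions of the 1.00000 entries): Q24 EXPERT Q30A2 Q30A3 Q30A6 Q30A7 Q30A8 Q30A11 Q30A12
Q30A7 0.61756 0.58737 0.61441 0.59967 0.64270 1.00000 0.71403 0.59881 0.60714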
Have the knowledge and authority to make <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001
Q30A8 0.60261 0.58008 0.76265 0.68892 0.70250 0.71403 1.00000 0.76378 0.70638
Have a positive, can-do approach to doin <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001
Q30A11 0.52959 0.62118 0.81022 0.66246 0.64729 0.59881 0.76378 1.00000 0.71796
Understand your business and the market <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001
Q30A12 0.53925 0.55677 0.73597 0.59714 0.66199 0.60714 0.70638 0.71796 1.00000
Are proactive with ideas on how to get t <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001
Q30A13 0.64158 0.47558 0.63395 0.64574 0.54768 0.68501 0.68526 0.64092 0.59023
Are prompt and reliable in handling any <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001
Q30A14 0.47386 0.51258 0.65066 0.69404 0.55816 0.57507 0.66858 0.60788 0.51475
Treat you with respect and listen to wha <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001
Q30A15 0.50963 0.51407 0.67322 0.59555 0.55953 0.51578 0.54464 0.60346 0.64993
Keep in regular contact to keep you upda <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001
Q32A1 0.31972 0.37541 0.40878 0.40499 0.46758 0.40688 0.32594 0.38509 0.37980
The competitiveness of their fees and ra <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001
It appears that the explanatory variables are very highly correlated with each other.
Where do we go from here?
Clearly we have data that is multi-collinear (i.e. the variables are linearly related, and hence one variable may explain others).
In this case, some relationships may be hidden because another variable has ‘hogged’ the relationship in terms of explanation.
So how do we go about seeing if we can reduce the number of variables we look at without losing the finer detail?
The answer is ….
(PS: let’s leave this example for a while and return to it later)
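One way to put a number on this collinearity is the variance inflation factor, $VIF_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing predictor j on all the other predictors. The sketch below simply re-runs the model above with PROC REG's VIF, TOL and COLLIN options. The original does not use these diagnostics or state any threshold; the common rule of thumb that a VIF above roughly 5 to 10 deserves a look is only a guide.

```sas
/* Sketch: same model as above, with collinearity diagnostics requested */
proc reg data=hold.model;
   model Q24 = expert Q30A2 Q30A3 Q30A6 Q30A7 Q30A8 Q30A11 Q30A12 Q30A13
               Q30A14 Q30A15
               Q32A1 Q32A2 Q32A3 Q32A4 Q32A5 Q32A6
               / vif tol collin;
run;
quit;
```

Even without running it, the correlation output gives a lower bound. If the 0.81022 in the Q30A11 row is its correlation with Q30A2 (the third column, assuming the columns follow the VAR order), then $R^2 \ge 0.81022^2 \approx 0.656$ for both variables, so each has a VIF of at least about 2.9 before the other fifteen predictors are taken into account.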
Factor analyses
Factor Analysis
Background
A Factor Analysis takes answers from many (maybe different) types of questions and summarises them with a smaller number of factors. It works by pulling out “common dimensions” from the input variables and grouping them together (e.g. if Income and Education were input into a Factor Analysis they would probably come out on one factor resembling Socio-Economic Status).
The reasons for doing this are:
- to gain greater control over final solutions
- to equate the scale of variables that have been measured on different scales
- so that the output factors are independent of (orthogonal to) each other
Principal components vs Factor analysis
With Principal components we compute $y = \Gamma'(x - \mu)$, where $\Gamma$ is orthogonal, $\Gamma'\Sigma\Gamma = \Lambda$, and $\Lambda$ is the diagonal matrix of eigenvalues of $\Sigma$, the covariance matrix of $x$.
With factor analysis we model $x - \mu = Lf + u$, where $L$ is a matrix of factor loadings, $f$ holds the factors and $u$ the variable-specific errors. Here $\Sigma = LL' + \Psi$, with $\Psi$ the diagonal covariance matrix of $u$.
PC reduces dimensionality by taking linear combinations of the x’s.
FA attempts to understand correlations between observable variables in terms of underlying factors, which are themselves not directly observable (latent).
Essentially the code you obtain is PC with ‘fudge factors’ so that we can investigate underlying (i.e. latent) patterns, the factors.
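The covariance identity for the factor model follows in a couple of lines once the usual assumptions are written down. This is a sketch in assumed notation: $L$ is the loading matrix, $f$ the factors, $u$ the specific errors with diagonal covariance $\Psi$, and the factors are taken to be uncorrelated with unit variance and uncorrelated with the errors.

```latex
\begin{aligned}
x - \mu &= Lf + u, \qquad \operatorname{Cov}(f) = I,\quad \operatorname{Cov}(u) = \Psi,\quad \operatorname{Cov}(f, u) = 0 \\
\Sigma &= \operatorname{Cov}(Lf + u) = L\,\operatorname{Cov}(f)\,L' + \operatorname{Cov}(u) = LL' + \Psi \\
\operatorname{Var}(x_i) &= \sum_{j} l_{ij}^2 + \psi_i
\end{aligned}
```

In the last line, the sum of squared loadings for variable i is that variable's communality, and $\psi_i$ is the part of its variance that the factors leave unexplained.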
Problems
Missing Values
To perform a Factor Analysis the variables must contain no missing values: if an observation has a missing value on any variable, the entire observation is omitted from the analysis. To overcome this, any missing values need to be filled in with the mean, median or mode, depending on the type of data.
Variable Correlation and Factor Interpretation
Since Factor Analysis works by grouping variables which are correlated, the correlations between the variables should be checked before performing the analysis. From the qualitative research certain variables are expected to be correlated. This needs to be true if we are to reproduce the qualitative model. If this is not the case, it can result in problems interpreting the factors from the analysis. We need factors that make sense to continue with Regression Analysis or Segmentation (much later).
Number Of Factors
The number of factors used depends upon the individual and the job. The key point to note is that the factors need to be interpretable to be useful in analysis. Interpretability can make the final decision on how many factors you have.
Example
The following pages are an example of a Factor Analysis from a project done for the Auckland Regional Council regarding recycling in businesses. The questions used for the following Factor Analysis example are on the next page.
What the ARC wanted was a segmentation so they could target recycling programs at businesses which would be receptive to them. They also wanted to find out which media channels would be most effective for reaching the target market.
Q6 I am now going to read a series of statements which describe how an organisation might feel about buying recycled products. Please indicate how strongly you agree or disagree that each of the following statements applies to your company on a scale of 1 to 10, where 1 means you strongly disagree, and 10 means you strongly agree. ROTATE AND READ
My company wouldn’t use recycled products because they look cheap and nasty.
Recycled products seem to be of much lower quality than non-recycled products.
Using recycled products results in our equipment breaking down and needing more maintenance.
They would need to be a lot cheaper before we would consider buying them.
If there were no other problems with recycled products we would even pay a small premium to use them.
All recycled products cost more than non-recycled products.
It’s not worth the time and effort finding and changing suppliers just to get recycled products.
It would be too hard to make the system changes necessary to use recycled products.
The range of recycled products available is not wide enough to warrant using them.
It’s just too difficult to get enough people to change their routines and to use more recycled products.
We would use recycled products if someone in our company took the responsibility to push the initiative ahead.
Using recycled products doesn’t really fit with our image.
If quality, price and availability were the same, we would choose to buy recycled products over not recycled products whenever we could
Manufacturing recycled products is actually less energy efficient and more harmful to the environment.
There are benefits to us if our customers see us as “Green”.
Preliminaries
Prior to performing a Factor Analysis a couple of preliminaries need to be worked through. First of all, the data used for the Factor Analysis needs to be cleaned (i.e. missing values or don’t knows replaced, influential points/outliers checked, and null microtab values that result in zeros checked). Next the correlations between the variables should be checked to see whether they are as the qualitative researcher (for segmentations) or client (for threshold analyses) expects.
Checking Data
First the variables in the Factor Analysis need to be checked for missing or invalid points. This can be done using a frequency table with code:
proc freq data=hold.cards;
table q33a1-q33a15;
run;
This table will show all values for the listed questions and how many missing values there are. The output for one table is shown below.
The SAS System 10:40 Tuesday, February 25, 1997 11
Cumulative Cumulative
Q33A13 Frequency Percent Frequency Percent
----------------------------------------------------
1 12 4.9 12 4.9
2 8 3.2 20 8.1
3 16 6.5 36 14.6
4 8 3.2 44 17.8
5 26 10.5 70 28.3
6 23 9.3 93 37.7
7 31 12.6 124 50.2
8 47 19.0 171 69.2
9 13 5.3 184 74.5
10 60 24.3 244 98.8
11 3 1.2 247 100.0
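When there are many variables, one frequency table per question gets long. As a compact cross-check, PROC MEANS can report the number of missing values and the range of every variable in a single table; a maximum above 10 flags codes outside the 1 to 10 scale, such as the 11s in the table above. This is a sketch using the same data set and variable list as the PROC FREQ step.

```sas
/* Sketch: one row per variable with counts of non-missing and missing
   values and the observed minimum and maximum */
proc means data=hold.cards n nmiss min max;
   var q33a1-q33a15;
run;
```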
Data cleaning issues
Replacing Don’t Knows or Refusals with Missings
Say the questions have a 1-10 scale for answers, with 11s as don’t knows. To convert the don’t knows to missings the following code can be used:
data hold.cards;
set hold.cards;
/* setting up an array for the variables to be replaced */
array new {*} q33a1-q33a15;
/* running through that array */
do i=1 to dim(new);
/* replacing 11’s with missings for all variables in the array */
if new[i]=11 then new[i]=.;
end;
/* dropping unneeded variable i */
drop i;
run;
Replacing Missings With Means
Now the variables do not have any don’t know answers - but a heap of missing values. To replace all the missings with means the following code can be used:
proc standard data=hold.cards replace out=hold.cards;
var q33a1-q33a15;
run;
Data cleaning issues…
Replacing Missings With Other Values
However, if you want to replace missings with other values, either of the following two sets of code can be used:
To replace missings in all variables with the same value:
data hold.cards;
set hold.cards;
array new {*} q33a1-q33a15;
do i=1 to dim(new);
if new[i]=. then new[i]=8;
end;
drop i;
run;
To replace missings with a different value for each variable:
data hold.cards;
set hold.cards;
if q33a1=. then q33a1=8;
if q33a2=. then q33a2=8.25;
if q33a3=. then q33a3=8.5;
...
run;
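If the median is preferred to the mean (for example with skewed ratings), PROC STDIZE can do the replacement in one step. This is a sketch rather than part of the original workflow: METHOD=MEDIAN selects the median as the location measure, and REPONLY tells the procedure to replace missing values only, leaving the non-missing data unstandardised. The output data set name work.cards_median is a placeholder.

```sas
/* Sketch: replace missings with each variable's median, change nothing else */
proc stdize data=hold.cards out=work.cards_median method=median reponly;
   var q33a1-q33a15;
run;
```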
Inspecting the data
Checking Variable Correlations
To check correlations between variables the following code can be used:
proc corr data=hold.cards best=7;
var q33a1-q33a15;
run;
The output from this procedure is shown over the next 2 pages.
The best= option shows the 7 most highly correlated variables with each variable in the procedure.
If the correlations between variables are not as they should be you can either:
1. leave the offending variable out of the Factor Analysis or
2. run separate Factor Analyses for different sets of variables (renaming the different sets of factors in between)
The SAS System 10:40 Tuesday, February 25, 1997 12
Correlation Analysis
15 'VAR' Variables: Q33A1 Q33A2 Q33A3 Q33A4 Q33A5 Q33A6 Q33A7 Q33A8 Q33A9 Q33A10 Q33A11 Q33A12 Q33A13 Q33A14 Q33A15
Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum
Q33A1 247 2.534694 2.031215 626.069388 1.000000 10.000000
Q33A2 247 3.906780 2.452764 964.974576 1.000000 10.000000
Q33A3 247 2.845000 2.042431 702.715000 1.000000 10.000000
Q33A4 247 4.289362 2.528153 1059.472340 1.000000 10.000000
Q33A5 247 4.608333 2.376360 1138.258333 1.000000 10.000000
Q33A6 247 3.889952 2.238311 960.818182 1.000000 10.000000
Q33A7 247 4.504098 2.584346 1112.512295 1.000000 10.000000
Q33A8 247 3.144068 2.055235 776.584746 1.000000 10.000000
Q33A9 247 4.276316 2.383541 1056.250000 1.000000 10.000000
Q33A10 247 3.698347 2.384724 913.491736 1.000000 10.000000
Q33A11 247 5.782427 2.800034 1428.259414 1.000000 10.000000
...
Inspecting the data
The SAS System 10:40 Tuesday, February 25, 1997 20
Correlation Analysis
Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0 / N = 247
Q33A1     Q33A1     Q33A2     Q33A12    Q33A3     Q33A9     Q33A7     Q33A4
          1.00000   0.41681   0.41245   0.38841   0.31230   0.30666   0.29660
          0.0       0.0001    0.0001    0.0001    0.0001    0.0001    0.0001
Q33A2     Q33A2     Q33A3     Q33A1     Q33A15    Q33A7     Q33A9     Q33A4
          1.00000   0.48408   0.41681   0.36507   0.35198   0.33727   0.32125
          0.0       0.0001    0.0001    0.0001    0.0001    0.0001    0.0001
Q33A3     Q33A3     Q33A2     Q33A1     Q33A4     Q33A10    Q33A15    Q33A8
          1.00000   0.48408   0.38841   0.35555   0.34687   0.30598   0.28709
          0.0       0.0001    0.0001    0.0001    0.0001    0.0001    0.0001
Q33A4     Q33A4     Q33A6     Q33A3     Q33A7     Q33A2     Q33A1     Q33A8
          1.00000   0.42521   0.35555   0.32410   0.32125   0.29660   0.26938
          0.0       0.0001    0.0001    0.0001    0.0001    0.0001    0.0001
Q33A5     Q33A5     Q33A13    Q33A14    Q33A11    Q33A12    Q33A1     Q33A7
          1.00000   0.20620   0.17659   0.11134   -0.09332  0.08699   -0.08606
          0.0       0.0011    0.0054    0.0807    0.1436    0.1729    0.1776
...
SAS Code
The code for performing a factor analysis is as follows:
proc factor data=hold.cards nfact=6 rotate=varimax out=hold.cards fuzz=.3;
var q33a1-q33a15;
run;
data= input data set
nfact= number of factors asked for
rotate= rotation method (varimax here)
out= output data set with factor values for each individual
var variables in the Factor Analysis
fuzz=.3 blanks out any value less than .3 in absolute value in the FA output (see below)
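nfact=6 fixes the number of factors in advance. A quick way to see what the data suggest first is to let the eigenvalues decide and to look at a scree plot. The sketch below is an exploratory run under that assumption, not a step from the original: MINEIGEN=1 retains only factors whose eigenvalue is at least 1, and SCREE prints the scree plot. No OUT= data set is requested, so hold.cards is left unchanged.

```sas
/* Sketch: exploratory run to help choose the number of factors */
proc factor data=hold.cards mineigen=1 scree rotate=varimax fuzz=.3;
   var q33a1-q33a15;
run;
```

In the output for this example four eigenvalues exceed 1, so this rule would keep four factors rather than the six used here; interpretability still makes the final call.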
SAS output
SAS System 12:05 Monday, February 24, 1997 1
Initial Factor Method: Principal Components
Prior Communality Estimates: ONE
1. Eigenvalues of the Correlation Matrix: Total = 15 Average = 1
                   1        2        3        4        5        6        7        8
Eigenvalue    3.9958   1.5398   1.1739   1.0474   0.9795   0.8416   0.8112   0.7429
Difference    2.4560   0.3659   0.1265   0.0679   0.1380   0.0303   0.0683   0.0584
Proportion    0.2664   0.1027   0.0783   0.0698   0.0653   0.0561   0.0541   0.0495
Cumulative    0.2664   0.3690   0.4473   0.5171   0.5824   0.6385   0.6926   0.7421
                   9       10       11       12       13       14       15
Eigenvalue    0.6845   0.6624   0.5972   0.5746   0.5072   0.4646   0.3774
Difference    0.0221   0.0652   0.0226   0.0673   0.0426   0.0872
Proportion    0.0456   0.0442   0.0398   0.0383   0.0338   0.0310   0.0252
Cumulative    0.7878   0.8319   0.8717   0.9100   0.9439   0.9748   1.0000
6 factors will be retained by the NFACTOR criterion.
2. Factor Pattern
           FACTOR1   FACTOR2   FACTOR3   FACTOR4   FACTOR5   FACTOR6
Q33A1      0.64261   0.10466   0.22909   0.00702   0.37260   0.19244
Q33A2      0.70770   0.06589   0.02988  -0.15700   0.28081  -0.19919
Q33A3      0.64134   0.14019  -0.13186  -0.04640   0.25951  -0.24317
Q33A4      0.57238   0.35839  -0.29582   0.10771  -0.02757   0.27656
Q33A5     -0.08357   0.53286   0.53208  -0.09429   0.15764  -0.23052
Q33A6      0.43021   0.47717  -0.41209  -0.13827  -0.28342   0.05263
Q33A7      0.60480   0.04908  -0.10888   0.33017  -0.19909   0.04189
Q33A8      0.60941  -0.16610   0.22173   0.31312  -0.23898  -0.00199
Q33A9      0.55460   0.10588   0.28130  -0.21724  -0.42052  -0.08247
Q33A10     0.58595   0.04968   0.18529   0.21746  -0.27972  -0.17652
Q33A11    -0.23996   0.50645  -0.23555   0.54390   0.32545   0.01683
Q33A12     0.51615  -0.32144   0.32763   0.05628   0.22886   0.55261
Q33A13    -0.24610   0.49055   0.39569   0.02863   0.00242  -0.02809
Q33A14    -0.37558   0.46974   0.09508  -0.27936  -0.20472   0.47604
Q33A15     0.48719  -0.00283  -0.24449  -0.54913   0.19115  -0.02302
Variance explained by each factor
           FACTOR1   FACTOR2   FACTOR3   FACTOR4   FACTOR5   FACTOR6
          3.995833  1.539818  1.173885  1.047402  0.979544  0.841555
SAS output…
3. Final Communality Estimates: Total = 9.578037
Q33A1 Q33A2 Q33A3 Q33A4 Q33A5 Q33A6 Q33A7 Q33A8
0.652294 0.649247 0.576990 0.632425 0.660909 0.684801 0.530450 0.603293
Q33A9 Q33A10 Q33A11 Q33A12 Q33A13 Q33A14 Q33A15
0.628753 0.536827 0.771585 0.838002 0.459394 0.717317 0.635753
The SAS System 12:05 Monday, February 24, 1997 2
Rotation Method: Varimax
4. Orthogonal Transformation Matrix
1 2 3 4 5 6
1 0.60090 0.60049 0.31986 -0.14237 0.35172 -0.17902
2 -0.04409 0.07833 0.60942 0.70061 -0.16650 0.31930
3 0.25547 -0.14998 -0.47831 0.67364 0.37609 -0.29702
4 0.54786 -0.28654 -0.18312 -0.10344 0.06205 0.75476
5 -0.46699 0.57038 -0.32109 0.06755 0.37548 0.45600
6 -0.23126 -0.45094 0.40111 -0.14078 0.74986 0.01305
SAS output…
5. Rotated Factor Pattern
           FACTOR1   FACTOR2   FACTOR3   FACTOR4   FACTOR5   FACTOR6
Q33A1            .   0.48345         .         .   0.57939         .
Q33A2            .   0.72062         .         .         .         .
Q33A3            .   0.68685         .         .         .         .
Q33A4            .         .   0.64305         .         .         .
Q33A5            .         .         .   0.79651         .         .
Q33A6            .         .   0.76294         .         .         .
Q33A7      0.59762         .         .         .         .         .
Q33A8      0.71377         .         .         .         .         .
Q33A9      0.49688         .         .         .         .  -0.50583
Q33A10     0.68783         .         .         .         .         .
Q33A11           .         .         .         .         .   0.83377
Q33A12           .         .         .         .   0.86208         .
Q33A13           .         .         .   0.64644         .         .
Q33A14    -0.38964  -0.45439   0.42849   0.39468         .         .
Q33A15           .   0.60575     0.301         .         .   -0.3431
Variance explained by each factor
           FACTOR1   FACTOR2   FACTOR3   FACTOR4   FACTOR5   FACTOR6
          2.095412  2.052493  1.520768  1.401878  1.318371  1.189115
Final Communality Estimates: Total = 9.578037
Q33A1 Q33A2 Q33A3 Q33A4 Q33A5 Q33A6 Q33A7 Q33A8
0.652294 0.649247 0.576990 0.632425 0.660909 0.684801 0.530450 0.603293
Q33A9 Q33A10 Q33A11 Q33A12 Q33A13 Q33A14 Q33A15
0.628753 0.536827 0.771585 0.838002 0.459394 0.717317 0.635753
Scoring Coefficients Estimated by Regression
Squared Multiple Correlations of the Variables with each Factor
           FACTOR1   FACTOR2   FACTOR3   FACTOR4   FACTOR5   FACTOR6
          1.000000  1.000000  1.000000  1.000000  1.000000  1.000000
SAS output…
6. Standardized Scoring Coefficients
FACTOR1 FACTOR2 FACTOR3 FACTOR4 FACTOR5 FACTOR6
Q33A1 -0.08335 0.18455 -0.03213 0.14900 0.43336 0.11645
Q33A2 -0.05022 0.41908 -0.08899 0.09010 -0.01441 -0.01110
Q33A3 -0.01743 0.41447 -0.03231 0.02842 -0.12090 0.11730
Q33A4 0.00491 -0.05167 0.43022 -0.08589 0.15910 0.19259
Q33A5 0.02684 0.18768 -0.15766 0.60951 -0.04507 -0.01852
Q33A6 0.00968 -0.01383 0.53336 -0.04939 -0.21569 -0.04682
Q33A7 0.32195 -0.12140 0.13970 -0.11504 -0.00638 0.15652
Q33A8 0.42291 -0.16895 -0.08466 -0.01712 0.06781 -0.00350
Q33A9 0.25110 -0.08846 0.10820 0.19609 -0.12006 -0.42766
Q33A10 0.42262 -0.06087 -0.03939 0.09682 -0.14606 -0.03909
Q33A11 0.02286 0.05149 0.08346 0.06972 0.02062 0.71907
Q33A12 -0.07340 -0.15889 -0.04086 -0.05886 0.76862 -0.01701
Q33A13 0.05660 -0.05396 -0.00596 0.46108 0.02966 0.03395
Q33A14 -0.22857 -0.34256 0.45994 0.21551 0.27576 -0.19905
Q33A15 -0.35190 0.37817 0.15988 -0.08769 -0.01491 -0.26763
SAS Output - interpretation
The above lines of code result in the output on the last few pages. The output shows:
1. the eigenvalues for each factor (check for reasonable size). The cumulative row shows what proportion of the variance is explained in the Factor Analysis using different numbers of factors. Aim for approximately 60% or more, ultimately depending on the interpretability of the Factor Analysis.
2. the unrotated factor pattern (ignore this).
3. final communality estimates (check for any low ones). These show how much of each variable's variance is explained by the factors. It is desirable for these to be approximately 60% or better for those variables which are important in the final analysis. Any variable with a low communality is essentially NOT used in the factor solution. If an important variable has a low communality, it can be used in a segmentation as a separate variable (more later).
4. the orthogonal transformation matrix (ignore this).
5. the rotated factor pattern (the key output - examine this closely). This shows each variable's weighting (loading) on each factor. The important variables for each factor are those with weightings of around 0.5 and over in absolute value.
6. the standardised scoring coefficients (use this in FA regression).
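To make item 6 concrete: a factor score is the sum of a respondent's standardised answers weighted by the scoring coefficients. OUT= on PROC FACTOR already stores these scores, but the coefficients can also be saved and applied separately, for example to a later wave of the same survey. The sketch below uses hypothetical data set names (work.fa_stats, work.cards_scored); the SCORE option asks PROC FACTOR for the scoring coefficients and OUTSTAT= saves them along with the means and standard deviations that PROC SCORE needs.

```sas
/* Sketch: save the scoring coefficients ... */
proc factor data=hold.cards nfact=6 rotate=varimax score
            outstat=work.fa_stats;
   var q33a1-q33a15;
run;

/* ... then apply them; PROC SCORE standardises the variables using the
   MEAN and STD rows in the SCORE= data set before weighting them */
proc score data=hold.cards score=work.fa_stats out=work.cards_scored;
   var q33a1-q33a15;
run;
```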
Output - meaning
Q33A1 . 0.48 . . 0.58 . My company wouldn’t use recycled products because they look
Q33A2 . 0.72 . . . . Recycled products seem to be of much lower quality than non-
Q33A3 . 0.69 . . . . Using recycled products results in our equipment breaking down a
Q33A4 . . 0.64 . . . They would need to be a lot cheaper before we would consider
Q33A5 . . . 0.80 . . If there were no other problems with recycled products we would
Q33A6 . . 0.76 . . . All recycled products cost more than non-recycled products.
Q33A7 0.60 . . . . . It’s not worth the time and effort finding and changing suppliers just
Q33A8 0.71 . . . . . It would be too hard to make the system changes necessary to use
Q33A9 0.50 . . . . -0.51 The range of recycled products available is not wide enough to
Q33A10 0.69 . . . . . It’s just too difficult to get enough people to change their routines
Q33A11 . . . . . 0.83 We would use recycled products if someone in our company took the
Q33A12 . . . . 0.86 . Using recycled products doesn’t really fit with our image.
Q33A13 . . . 0.65 . . If quality, price and availability were the same, we would choose to
Q33A14 -0.39 -0.45 0.43 0.39 . . Manufacturing recycled products is actually less energy efficient and
Q33A15 . 0.61 0.30 . . -0.34 There are benefits to us if our customers see us as “Green”.
Interpretation
So this Factor Analysis explains 64% of the overall variance (from 1. above). The majority of the variables have over 60% of their variance explained (from 3. above). The final factors (from 5. above) are as follows:
Factor 1: Hassle factor - a combination of the performance ratings “It’s not worth the time and effort finding and changing suppliers just to get recycled products”, “It would be too hard to make the system changes necessary to use recycled products”, “The range of recycled products available is not wide enough to warrant using them” and “It’s just too difficult to get people to change their routines and to use more recycled products.”
Factor 2: Quality Factor - a combination of the performance ratings “Recycled products seem to be of much lower quality than non-recycled products”, “Using recycled products results in our equipment breaking down and needing more maintenance”, “There are benefits to us if our customers see us as ‘Green’” and negative weighting on “Manufacturing recycled products is actually less energy efficient and more harmful to the environment.”
Factor 3: Price Factor - a combination of the performance ratings “They would need to be a lot cheaper before we would consider buying them” and “All recycled products cost more than non-recycled products.”
Factor 4: Would Use Factor - a combination of the performance ratings “If there were no other problems with recycled products we would even pay a small premium to use them” and “If quality, price and availability were the same, we would choose to buy recycled products over non-recycled products whenever we could.”
Factor 5: Image Factor - a combination of the performance ratings “My company wouldn’t use recycled products because they look cheap and nasty” and “Using recycled products doesn’t really fit with our image.”
Factor 6: Help Factor - the performance rating “We would use recycled products if someone in our company took the responsibility to push the initiative ahead.”
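Both headline figures can be checked by hand from the output. The variance explained is the sum of the six retained eigenvalues divided by the number of variables, and a communality is the sum of a variable's squared loadings across the six factors (shown here for Q33A1, using the unrotated factor pattern).

```latex
\begin{aligned}
\frac{3.995833 + 1.539818 + 1.173885 + 1.047402 + 0.979544 + 0.841555}{15}
  &= \frac{9.578037}{15} \approx 0.6385 \\
h^2_{\mathrm{Q33A1}} = 0.64261^2 + 0.10466^2 + 0.22909^2 + 0.00702^2 + 0.37260^2 + 0.19244^2
  &\approx 0.6523
\end{aligned}
```

The rotation redistributes the variance between the factors (the rotated values 2.095412 through 1.189115 again sum to 9.578037) but leaves the total and the communalities unchanged. Eleven of the fifteen communalities are 0.60 or above; Q33A3, Q33A7, Q33A10 and Q33A13 fall below.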