Regression & factor analyses. Where regression can go wrong. An example: A financial company wishes...

29
Regression & factor analyses


Where regression can go wrong. An example:

A financial company wishes to find out what drives customers' satisfaction with its service. The candidate drivers are:

EXPERT="experts"

Q30A2 ="Take the time to understand who you are"

Q30A3 ="Communicate clearly, in plain language"

Q30A6 ="Go out of their way to tailor the best deal"

Q30A7 ="Have the knowledge and authority to make"

Q30A8 ="Have a positive, can-do approach"

Q30A11 ="Understand your business and the market"

Q30A12 ="Are proactive with ideas on how to get t"

Q30A13 ="Are prompt and reliable in handling any"

Q30A14 ="Treat you with respect and listen"

Q30A15 ="Keep in regular contact to keep you updated"

Q32A1 ="The competitiveness of their fees and rates"

Q32A2 ="Offering a flexible range of lending/rep"

Q32A3 ="How easy it is to take out a commercial"

Q32A4 ="The features and benefits of their comments"

Q32A5 ="Providing a full range of commercial product"

Q32A6 ="Being fair and reasonable in their lending"

Q24 ="Q3a. AMP BANKING OVERALL RATING" (NB: this is the response variable)

All of these were rated on a 10-point scale.


Example: let’s clean this data. SAS code:

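libname hold '';   /* point this at the folder containing the data */

data temp;
  set hold.model;
  /* the response and the 17 drivers */
  array new {*} Q24
                EXPERT Q30A2 Q30A3 Q30A6 Q30A7 Q30A8
                Q30A11 Q30A12 Q30A13 Q30A14 Q30A15
                Q32A1 Q32A2 Q32A3 Q32A4 Q32A5 Q32A6;
  /* recode 11 (don't know) to missing; dim() avoids hard-coding the array size */
  do i=1 to dim(new);
    if new[i]=11 then new[i]=.;
  end;
  drop i;
run;

/* REPLACE without MEAN= or STD= fills each missing value with the variable's mean */
proc standard data=temp replace out=temp;
  var Q24 EXPERT Q30A2 Q30A3 Q30A6 Q30A7 Q30A8
      Q30A11 Q30A12 Q30A13 Q30A14 Q30A15
      Q32A1 Q32A2 Q32A3 Q32A4 Q32A5 Q32A6;
run;

data hold.model;
  set temp;
run;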

The code above recodes the 11s (don’t know) to . (SAS’s missing value) and then replaces each missing value with that variable’s mean.
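Before moving on, it is worth confirming that the cleaning worked. Below is a minimal check. It assumes the hold library and the hold.model data set from the code above. After cleaning, PROC MEANS should report NMISS = 0 for every variable and no maximum above 10.

```sas
/* Sketch: after cleaning, NMISS should be 0 and MAX at most 10 for every variable */
proc means data=hold.model n nmiss min max mean maxdec=2;
  var Q24 EXPERT Q30A2 Q30A3 Q30A6 Q30A7 Q30A8
      Q30A11 Q30A12 Q30A13 Q30A14 Q30A15
      Q32A1 Q32A2 Q32A3 Q32A4 Q32A5 Q32A6;
run;
```

A remaining 11 would show up as a maximum of 11. A non-zero NMISS would mean PROC STANDARD was not run on that variable.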


Let’s look at the data.

STAFF - Experts in Commercial Finance Matters

                                   Cumulative  Cumulative
EXPERT         Frequency  Percent   Frequency     Percent
---------------------------------------------------------
1                      5     1.67           5        1.67
2                      4     1.33           9        3.00
3                      5     1.67          14        4.67
4                      3     1.00          17        5.67
5                     14     4.67          31       10.33
6                     16     5.33          47       15.67
7                     22     7.33          69       23.00
7.462890625          121    40.33         190       63.33
8                     50    16.67         240       80.00
9                     24     8.00         264       88.00
10                    36    12.00         300      100.00

Take the time to understand who you are

                                   Cumulative  Cumulative
Q30A2          Frequency  Percent   Frequency     Percent
---------------------------------------------------------
1                     10     3.33          10        3.33
2                      4     1.33          14        4.67
3                     11     3.67          25        8.33
4                     10     3.33          35       11.67
5                     19     6.33          54       18.00
6                     19     6.33          73       24.33
7                     25     8.33          98       32.67
7.4111328125          52    17.33         150       50.00
8                     48    16.00         198       66.00
9                     41    13.67         239       79.67
10                    61    20.33         300      100.00

Communicate clearly, in plain language

                                   Cumulative  Cumulative
Q30A3          Frequency  Percent   Frequency     Percent
---------------------------------------------------------
1                      3     1.00           3        1.00
2                      5     1.67           8        2.67
3                      3     1.00          11        3.67
4                      6     2.00          17        5.67
5                     11     3.67          28        9.33
6                     12     4.00          40       13.33
7                     34    11.33          74       24.67
7.98046875            33    11.00         107       35.67
8                     81    27.00         188       62.67
9                     48    16.00         236       78.67
10                    64    21.33         300      100.00

The non-integer row in each table (e.g. 7.462890625 for EXPERT, with 121 respondents) is the mean that the cleaning code imputed for the missing and don’t-know answers.


Some more code

proc reg data=hold.model;
  model Q24 = expert Q30A2 Q30A3 Q30A6 Q30A7 Q30A8 Q30A11 Q30A12 Q30A13
              Q30A14 Q30A15
              Q32A1 Q32A2 Q32A3 Q32A4 Q32A5 Q32A6;
run;

proc corr data=hold.model;
  var Q24 expert Q30A2 Q30A3 Q30A6 Q30A7 Q30A8 Q30A11 Q30A12 Q30A13
      Q30A14 Q30A15
      Q32A1 Q32A2 Q32A3 Q32A4 Q32A5 Q32A6;
run;


Regression output: Parameter Estimates

                 Parameter   Standard
Variable   DF     Estimate      Error   t Value   Pr > |t|   Label
Intercept   1      1.99970    0.53770      3.72     0.0002   Intercept
EXPERT      1      0.05590    0.06486      0.86     0.3895   STAFF - Experts in Commercial Finance Matters
Q30A2       1      0.01870    0.07645      0.24     0.8069   Take the time to understand who you are
Q30A3       1      0.02263    0.07383      0.31     0.7595   Communicate clearly, in plain language
Q30A6       1      0.01097    0.06114      0.18     0.8578   Go out of their way to tailor the best
Q30A7       1      0.11831    0.06004      1.97     0.0498   Have the knowledge and authority to make
Q30A8       1      0.13498    0.08037      1.68     0.0942   Have a positive, can-do approach to doing
Q30A11      1     -0.06802    0.07025     -0.97     0.3338   Understand your business and the market
Q30A12      1      0.02511    0.05764      0.44     0.6634   Are proactive with ideas on how to get
Q30A13      1      0.37204    0.06702      5.55     <.0001   Are prompt and reliable in handling any
Q30A14      1     -0.17003    0.08039     -2.12     0.0353   Treat you with respect and listen
Q30A15      1      0.07978    0.04594      1.74     0.0835   Keep in regular contact to keep you updated
Q32A1       1      0.00392    0.06439      0.06     0.9514   The competitiveness of their fees and rates
Q32A2       1     -0.05496    0.07295     -0.75     0.4519   Offering a flexible range of lending/rep
Q32A3       1      0.07025    0.06019      1.17     0.2442   How easy it is to take out a commercial
Q32A4       1     -0.08790    0.08377     -1.05     0.2949   The features and benefits of their comments
Q32A5       1      0.07440    0.05614      1.33     0.1861   Providing a full range of commercial prod
Q32A6       1      0.15004    0.06826      2.20     0.0288   Being fair and reasonable in their lending


Issues

Note that many of these coefficients are not significant.

Even worse, some are negative, when we would expect each of these drivers to have at worst a zero effect on satisfaction (coefficient >= 0).

E.g.: Q30A14 Treat you with respect and listen: estimate -0.17003, standard error 0.08039, t = -2.12, p = 0.0353

i.e. this seems to imply that not listening and treating people disrespectfully would increase overall satisfaction !#&%$#%*&

So what is going on?


Some Correlation output (excerpt)

            Q24   EXPERT    Q30A2    Q30A3    Q30A6    Q30A7    Q30A8   Q30A11   Q30A12
Q30A7   0.61756  0.58737  0.61441  0.59967  0.64270  1.00000  0.71403  0.59881  0.60714
Q30A8   0.60261  0.58008  0.76265  0.68892  0.70250  0.71403  1.00000  0.76378  0.70638
Q30A11  0.52959  0.62118  0.81022  0.66246  0.64729  0.59881  0.76378  1.00000  0.71796
Q30A12  0.53925  0.55677  0.73597  0.59714  0.66199  0.60714  0.70638  0.71796  1.00000
Q30A13  0.64158  0.47558  0.63395  0.64574  0.54768  0.68501  0.68526  0.64092  0.59023
Q30A14  0.47386  0.51258  0.65066  0.69404  0.55816  0.57507  0.66858  0.60788  0.51475
Q30A15  0.50963  0.51407  0.67322  0.59555  0.55953  0.51578  0.54464  0.60346  0.64993
Q32A1   0.31972  0.37541  0.40878  0.40499  0.46758  0.40688  0.32594  0.38509  0.37980

Every p-value in this excerpt (Prob > |r| under H0: Rho=0) is <.0001.

The explanatory variables are clearly very highly correlated with each other. Most of the Q30 items correlate with one another at between about 0.5 and 0.8.
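PROC REG can measure this directly. The sketch below reruns the earlier model with three extra options:

- VIF reports the variance inflation factor VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing driver j on all the other drivers.
- TOL prints the tolerance 1 - R_j^2.
- COLLIN adds condition indices.

The usual cut-off (VIFs above roughly 5 to 10 signal trouble) is a rule of thumb, not something this example establishes.

```sas
/* Sketch: the same model as before, with collinearity diagnostics added */
proc reg data=hold.model;
  model Q24 = EXPERT Q30A2 Q30A3 Q30A6 Q30A7 Q30A8 Q30A11 Q30A12 Q30A13
              Q30A14 Q30A15
              Q32A1 Q32A2 Q32A3 Q32A4 Q32A5 Q32A6
              / vif tol collin;   /* VIF = 1/(1 - R_j**2), TOL = 1 - R_j**2 */
run;
quit;
```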


Where do we go from here?

Clearly the data are multicollinear (i.e. the explanatory variables are close to linearly related, so one variable may largely be explained by the others).

In this case some relationships may be hidden, because a correlated variable has ‘hogged’ the explanatory power.
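A two-driver case shows how this hogging can even flip a sign. Work with standardised variables. Let r_{y1} and r_{y2} be the correlations of each driver with satisfaction, and r_{12} the correlation between the two drivers. The least-squares coefficient of the first driver is then given by the first line below. The numbers in the second line are hypothetical, chosen only to illustrate the effect:

$$
\hat\beta_1 = \frac{r_{y1} - r_{y2}\, r_{12}}{1 - r_{12}^2}
$$

$$
r_{y1} = 0.5,\; r_{y2} = 0.7,\; r_{12} = 0.8
\;\Rightarrow\;
\hat\beta_1 = \frac{0.5 - 0.7 \times 0.8}{1 - 0.8^2} = \frac{-0.06}{0.36} \approx -0.167
$$

So a driver that is positively correlated with satisfaction can still get a negative coefficient when a more strongly related, highly correlated driver is in the same model.

The first column of the correlation output appears to be Q24, since it is the first variable in the VAR statement. If so, Q30A14 correlates positively with the overall rating (about 0.47), even though its regression coefficient is negative.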

So how do we go about seeing if we can reduce the number of variables we look at without losing the finer detail?

The answer is ….

(PS: let’s leave this example for a while and return to it later)


Factor analyses


Factor Analysis

Background

A Factor Analysis takes answers from many (possibly different) types of questions and summarises them with a smaller number of factors. It works by pulling out “common dimensions” from the input variables and grouping them together (e.g. if Income and Education were input into a Factor Analysis, they would probably come out on one factor resembling Socio-Economic Status).

The reasons for doing this are:

1. to gain greater control over final solutions
2. to equate the scale of variables that have been measured on different scales
3. so that the output factors are independent of, or orthogonal to, each other



Principal components vs Factor analysis

With principal components we compute

$$ Y = \Gamma^{\top}(x - \mu) $$

where $\Gamma$ is orthogonal and $\Sigma = \Gamma \Lambda \Gamma^{\top}$. Here $\Lambda$ is the diagonal matrix of eigenvalues of the covariance matrix $\Sigma$ of $x$.

With factor analysis we compute

$$ x - \mu = L f + u $$

where $L$ is a matrix of factor loadings, $f$ holds the common (latent) factors and $u$ the specific factors. Here

$$ \Sigma = L L^{\top} + \Psi $$

with $\Psi$ the diagonal covariance matrix of $u$.

PC reduces dimensionality by taking linear combinations of the x’s.

FA attempts to explain the correlations between observable variables in terms of underlying factors. These factors are themselves not directly observable (latent).

Essentially, the code you obtain is PC with ‘fudge factors’, so that we can investigate underlying or latent (i.e. factor) patterns.
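In SAS the two ideas map onto different procedures and options. The sketch below assumes the recycling data set hold.cards and the items q33a1-q33a15 from the example later on. The output data set name pc_scores and the choice of six factors are illustrative.

- PROC PRINCOMP gives pure principal components.
- PROC FACTOR with METHOD=PRINIT and PRIORS=SMC fits a common factor model. Its communalities start from squared multiple correlations rather than 1, so part of each item's variance is treated as unique.

```sas
/* Principal components: exact linear combinations of the standardised items */
proc princomp data=hold.cards out=pc_scores;   /* adds Prin1-Prin15 */
  var q33a1-q33a15;
run;

/* Common factor analysis: iterated principal factors, with each item's
   starting communality set to its squared multiple correlation (SMC)
   instead of 1. If the iterations push a communality above 1, adding
   the HEYWOOD option caps it at 1. */
proc factor data=hold.cards method=prinit priors=smc
            nfactors=6 rotate=varimax;
  var q33a1-q33a15;
run;
```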



Problems

Missing Values

A Factor Analysis needs complete data. If an observation has a missing value on any of the analysis variables, the entire observation is omitted from the analysis. To avoid losing respondents, fill in the missing values with the mean, median or mode, depending on the type of data.
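Before filling anything in, it helps to see how many respondents would otherwise be dropped. The sketch below assumes the recycling data set hold.cards and items q33a1-q33a15 from the example that follows. The data set name miss_check is illustrative. Don't-know codes such as 11 are only counted once they have been recoded to missing.

```sas
/* Count how many items each respondent left missing */
data miss_check;
  set hold.cards;
  n_missing = nmiss(of q33a1-q33a15);   /* number of missing items */
  complete  = (n_missing = 0);          /* 1 = kept by the factor analysis, 0 = dropped */
run;

/* How many respondents would survive listwise deletion? */
proc freq data=miss_check;
  tables complete n_missing;
run;
```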

Variable Correlation and Factor Interpretation

Since Factor Analysis works by grouping variables that are correlated, the correlations between the variables should be checked before performing the analysis. The qualitative research leads us to expect certain variables to be correlated. This must hold in the data if we are to reproduce the qualitative model. If it does not, the factors from the analysis can be hard to interpret. We need factors that make sense to continue with Regression Analysis or Segmentation (much later).

Number of Factors

The number of factors used depends on the analyst and the job. The key point is that the factors need to be interpretable to be useful in analysis. Interpretability can make the final decision on how many factors to keep.
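PROC FACTOR offers two common aids to this decision:

- The SCREE option plots the eigenvalues, so you can look for an "elbow".
- MINEIGEN=1 keeps only factors whose eigenvalue is at least 1 (the Kaiser criterion).

Neither replaces the interpretability check. The sketch below again assumes the recycling data set and items used in the example that follows.

```sas
/* Scree plot plus the eigenvalue-of-at-least-one (Kaiser) rule */
proc factor data=hold.cards method=principal priors=one
            mineigen=1 scree rotate=varimax;
  var q33a1-q33a15;
run;
```

With the eigenvalues reported for that example (3.9958, 1.5398, 1.1739, 1.0474, then 0.9795), this rule would keep four factors. The analysis there keeps six, because the extra factors were judged interpretable.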


Example

 

The following pages are an example of a Factor Analysis from a project done for the Auckland Regional Council regarding recycling in businesses. The questions used for the following Factor Analysis example are on the next page.

What the ARC wanted was a segmentation so they could target recycling programs at businesses which would be receptive to them. They also wanted to find out which media channels would be most effective for reaching the target market.


Q6 I am now going to read a series of statements which describe how an organisation might feel about buying recycled products. Please indicate how strongly you agree or disagree that each of the following statements applies to your company on a scale of 1 to 10, where 1 means you strongly disagree, and 10 means you strongly agree. ROTATE AND READ

1. My company wouldn’t use recycled products because they look cheap and nasty.
2. Recycled products seem to be of much lower quality than non-recycled products.
3. Using recycled products results in our equipment breaking down and needing more maintenance.
4. They would need to be a lot cheaper before we would consider buying them.
5. If there were no other problems with recycled products we would even pay a small premium to use them.
6. All recycled products cost more than non-recycled products.
7. It’s not worth the time and effort finding and changing suppliers just to get recycled products.
8. It would be too hard to make the system changes necessary to use recycled products.
9. The range of recycled products available is not wide enough to warrant using them.
10. It’s just too difficult to get enough people to change their routines and to use more recycled products.
11. We would use recycled products if someone in our company took the responsibility to push the initiative ahead.
12. Using recycled products doesn’t really fit with our image.
13. If quality, price and availability were the same, we would choose to buy recycled products over non-recycled products whenever we could.
14. Manufacturing recycled products is actually less energy efficient and more harmful to the environment.
15. There are benefits to us if our customers see us as “Green”.


Preliminaries

Prior to performing a Factor Analysis, a couple of preliminaries need to be worked through.

First, the data used for the Factor Analysis need to be cleaned. That is: missing values or don’t knows replaced, influential points/outliers checked, and null microtab values that result in zeros dealt with.

Next, the correlations between the variables should be checked. They should match what the qualitative researcher (for segmentations) or the client (for threshold analyses) expects.

Checking Data

First, the variables in the Factor Analysis need to be checked for missing or invalid values. This can be done with a frequency table, using the code:

 

proc freq data=hold.cards;
  tables q33a1-q33a15;
run;

 

This table will show all values for the listed questions and how many missing values there are. The output for one table is shown below.

 

The SAS System 10:40 Tuesday, February 25, 1997 11

                               Cumulative  Cumulative
Q33A13    Frequency   Percent   Frequency     Percent
-----------------------------------------------------
1                12       4.9          12         4.9
2                 8       3.2          20         8.1
3                16       6.5          36        14.6
4                 8       3.2          44        17.8
5                26      10.5          70        28.3
6                23       9.3          93        37.7
7                31      12.6         124        50.2
8                47      19.0         171        69.2
9                13       5.3         184        74.5
10               60      24.3         244        98.8
11                3       1.2         247       100.0

 


Data cleaning issues

Replacing Don’t Knows or Refusals with Missings

Say the questions use a 1-10 answer scale, with 11 coded as don’t know. To convert the don’t knows to missings, the following code can be used:

 

data hold.cards;
  set hold.cards;
  /* setting up an array for the variables to be replaced */
  array new {*} q33a1-q33a15;
  /* running through that array */
  do i=1 to dim(new);
    /* replacing 11’s with missings for all variables in the array */
    if new[i]=11 then new[i]=.;
  end;
  /* dropping unneeded variable i */
  drop i;
run;

 

Replacing Missings With Means

Now the variables have no don’t know answers, but a heap of missing values. To replace every missing value with the mean of its variable, the following code can be used:

proc standard data=hold.cards replace out=hold.cards;
  /* REPLACE without MEAN= or STD= only fills missing values with the variable mean */
  var q33a1-q33a15;
run;
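PROC STDIZE offers an alternative, sketched below. With the REPONLY option it replaces only the missing values and leaves the answers that were given untouched. METHOD= chooses the replacement statistic. MEDIAN is used here; MEAN reproduces the PROC STANDARD result. The output data set name cards_imputed is an assumption.

```sas
/* Fill missing values only (REPONLY) with each variable's median;
   non-missing answers are left exactly as they were */
proc stdize data=hold.cards out=cards_imputed reponly method=median;
  var q33a1-q33a15;
run;
```

The median can suit skewed rating scales better than the mean, and it keeps imputed values on or near the original 1-10 points.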


Data cleaning issues (continued)

Replacing Missings With Other Values

If you want to replace missings with other values, either of the following two sets of code can be used.

To replace the missings in all variables with the same value:

data hold.cards;
  set hold.cards;
  array new {*} q33a1-q33a15;
  do i=1 to dim(new);
    if new[i]=. then new[i]=8;
  end;
  drop i;
run;

To replace the missings in each variable with a different value:

data hold.cards;
  set hold.cards;
  if q33a1=. then q33a1=8;
  if q33a2=. then q33a2=8.25;
  if q33a3=. then q33a3=8.5;
  /* ... */
run;
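When the fill value differs by variable, a pair of arrays keeps the code short. This sketch covers only the three values given above (8, 8.25 and 8.5 for q33a1-q33a3). The values for the remaining variables are not given here, so extending it means adding them yourself.

```sas
data hold.cards;
  set hold.cards;
  /* Variables to fill and, in the same order, the value to use for each */
  array vars {3} q33a1-q33a3;
  array fill {3} _temporary_ (8 8.25 8.5);
  do i = 1 to dim(vars);
    if missing(vars[i]) then vars[i] = fill[i];
  end;
  drop i;
run;
```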


Inspecting the data

Checking Variable Correlations

To check correlations between variables the following code can be used:

 

proc corr data=hold.cards best=7;
  var q33a1-q33a15;
run;

The output from this procedure is shown over the next 2 pages.

 

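The best=7 option lists, for each variable, the 7 variables with the largest correlations (in absolute value) with it. The variable itself is included, which is why each list starts with r = 1.00000.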

 

If the correlations between variables are not as expected, you can either:

1.   leave the offending variable out of the Factor Analysis or

2.   run separate Factor Analyses for different sets of variables (renaming the different sets of factors in between)

The SAS System 10:40 Tuesday, February 25, 1997 12

Correlation Analysis

15 'VAR' Variables: Q33A1 Q33A2 Q33A3 Q33A4 Q33A5 Q33A6 Q33A7 Q33A8 Q33A9 Q33A10 Q33A11 Q33A12 Q33A13 Q33A14 Q33A15

Simple Statistics

Variable     N        Mean     Std Dev          Sum     Minimum     Maximum
Q33A1      247    2.534694    2.031215   626.069388    1.000000   10.000000
Q33A2      247    3.906780    2.452764   964.974576    1.000000   10.000000
Q33A3      247    2.845000    2.042431   702.715000    1.000000   10.000000
Q33A4      247    4.289362    2.528153  1059.472340    1.000000   10.000000
Q33A5      247    4.608333    2.376360  1138.258333    1.000000   10.000000
Q33A6      247    3.889952    2.238311   960.818182    1.000000   10.000000
Q33A7      247    4.504098    2.584346  1112.512295    1.000000   10.000000
Q33A8      247    3.144068    2.055235   776.584746    1.000000   10.000000
Q33A9      247    4.276316    2.383541  1056.250000    1.000000   10.000000
Q33A10     247    3.698347    2.384724   913.491736    1.000000   10.000000
Q33A11     247    5.782427    2.800034  1428.259414    1.000000   10.000000
...


Inspecting the data

The SAS System 10:40 Tuesday, February 25, 1997 20

Correlation Analysis

Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0 / N = 247

Q33A1    Q33A1     Q33A2     Q33A12    Q33A3     Q33A9     Q33A7     Q33A4
         1.00000   0.41681   0.41245   0.38841   0.31230   0.30666   0.29660
         0.0       0.0001    0.0001    0.0001    0.0001    0.0001    0.0001

Q33A2    Q33A2     Q33A3     Q33A1     Q33A15    Q33A7     Q33A9     Q33A4
         1.00000   0.48408   0.41681   0.36507   0.35198   0.33727   0.32125
         0.0       0.0001    0.0001    0.0001    0.0001    0.0001    0.0001

Q33A3    Q33A3     Q33A2     Q33A1     Q33A4     Q33A10    Q33A15    Q33A8
         1.00000   0.48408   0.38841   0.35555   0.34687   0.30598   0.28709
         0.0       0.0001    0.0001    0.0001    0.0001    0.0001    0.0001

Q33A4    Q33A4     Q33A6     Q33A3     Q33A7     Q33A2     Q33A1     Q33A8
         1.00000   0.42521   0.35555   0.32410   0.32125   0.29660   0.26938
         0.0       0.0001    0.0001    0.0001    0.0001    0.0001    0.0001

Q33A5    Q33A5     Q33A13    Q33A14    Q33A11    Q33A12    Q33A1     Q33A7
         1.00000   0.20620   0.17659   0.11134   -0.09332  0.08699   -0.08606
         0.0       0.0011    0.0054    0.0807    0.1436    0.1729    0.1776

...


SAS Code

The code for performing a factor analysis is as follows:

proc factor data=hold.cards nfact=6 rotate=varimax out=hold.cards fuzz=.3;
  var q33a1-q33a15;
run;

data= input data set
nfact= number of factors asked for
rotate=varimax varimax (orthogonal) rotation of the factors
out= output data set with the factor values for each individual
var the variables in the Factor Analysis
fuzz=.3 prints any loading smaller than .3 in absolute value as a dot in the FA output, which makes the factor pattern easier to read (see below)
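Below is a sketch of the same analysis with PROC FACTOR's defaults written out, as an alternative to the call above. The data set names cards_fa and fact_stat are assumptions. The options do the following:

- METHOD=PRINCIPAL and PRIORS=ONE produce the "Initial Factor Method: Principal Components" and "Prior Communality Estimates: ONE" lines in the output.
- REORDER sorts the rotated pattern so that the items loading on each factor sit together.
- SCORE, together with OUTSTAT=, saves the loadings and scoring coefficients to a data set.

```sas
proc factor data=hold.cards
            method=principal priors=one  /* the defaults: principal components extraction */
            nfactors=6 rotate=varimax
            reorder                      /* group items by the factor they load on most */
            fuzz=.3
            score                        /* write _TYPE_='SCORE' rows to OUTSTAT= */
            out=cards_fa                 /* data plus Factor1-Factor6 scores */
            outstat=fact_stat;           /* means, std devs, loadings, scoring coefficients */
  var q33a1-q33a15;
run;
```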


SAS output

SAS System 12:05 Monday, February 24, 1997 1

Initial Factor Method: Principal Components

Prior Communality Estimates: ONE

1. Eigenvalues of the Correlation Matrix: Total = 15  Average = 1

                 1       2       3       4       5       6       7       8
Eigenvalue  3.9958  1.5398  1.1739  1.0474  0.9795  0.8416  0.8112  0.7429
Difference  2.4560  0.3659  0.1265  0.0679  0.1380  0.0303  0.0683  0.0584
Proportion  0.2664  0.1027  0.0783  0.0698  0.0653  0.0561  0.0541  0.0495
Cumulative  0.2664  0.3690  0.4473  0.5171  0.5824  0.6385  0.6926  0.7421

                 9      10      11      12      13      14      15
Eigenvalue  0.6845  0.6624  0.5972  0.5746  0.5072  0.4646  0.3774
Difference  0.0221  0.0652  0.0226  0.0673  0.0426  0.0872
Proportion  0.0456  0.0442  0.0398  0.0383  0.0338  0.0310  0.0252
Cumulative  0.7878  0.8319  0.8717  0.9100  0.9439  0.9748  1.0000

6 factors will be retained by the NFACTOR criterion.

2. Factor Pattern

          FACTOR1   FACTOR2   FACTOR3   FACTOR4   FACTOR5   FACTOR6
Q33A1     0.64261   0.10466   0.22909   0.00702   0.37260   0.19244
Q33A2     0.70770   0.06589   0.02988  -0.15700   0.28081  -0.19919
Q33A3     0.64134   0.14019  -0.13186  -0.04640   0.25951  -0.24317
Q33A4     0.57238   0.35839  -0.29582   0.10771  -0.02757   0.27656
Q33A5    -0.08357   0.53286   0.53208  -0.09429   0.15764  -0.23052
Q33A6     0.43021   0.47717  -0.41209  -0.13827  -0.28342   0.05263
Q33A7     0.60480   0.04908  -0.10888   0.33017  -0.19909   0.04189
Q33A8     0.60941  -0.16610   0.22173   0.31312  -0.23898  -0.00199
Q33A9     0.55460   0.10588   0.28130  -0.21724  -0.42052  -0.08247
Q33A10    0.58595   0.04968   0.18529   0.21746  -0.27972  -0.17652
Q33A11   -0.23996   0.50645  -0.23555   0.54390   0.32545   0.01683
Q33A12    0.51615  -0.32144   0.32763   0.05628   0.22886   0.55261
Q33A13   -0.24610   0.49055   0.39569   0.02863   0.00242  -0.02809
Q33A14   -0.37558   0.46974   0.09508  -0.27936  -0.20472   0.47604
Q33A15    0.48719  -0.00283  -0.24449  -0.54913   0.19115  -0.02302

Variance explained by each factor

   FACTOR1   FACTOR2   FACTOR3   FACTOR4   FACTOR5   FACTOR6
  3.995833  1.539818  1.173885  1.047402  0.979544  0.841555
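PROC FACTOR is working on the correlation matrix of 15 standardised items, so the eigenvalues sum to 15 (the "Total = 15" above). Each Proportion is therefore the eigenvalue divided by 15, and the Cumulative row adds these up. For the first factor, and for the six retained factors:

$$
\text{Proportion}_j = \frac{\lambda_j}{15}, \qquad \frac{3.9958}{15} \approx 0.2664
$$

$$
\frac{3.9958 + 1.5398 + 1.1739 + 1.0474 + 0.9795 + 0.8416}{15} = \frac{9.5780}{15} \approx 0.6385
$$

This is the roughly 64% of the overall variance that six factors explain.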


SAS output (continued)

3. Final Communality Estimates: Total = 9.578037

Q33A1     Q33A2     Q33A3     Q33A4     Q33A5     Q33A6     Q33A7     Q33A8
0.652294  0.649247  0.576990  0.632425  0.660909  0.684801  0.530450  0.603293

Q33A9     Q33A10    Q33A11    Q33A12    Q33A13    Q33A14    Q33A15
0.628753  0.536827  0.771585  0.838002  0.459394  0.717317  0.635753
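Each final communality is the sum of that variable's squared loadings over the six retained factors. It can therefore be checked by hand from the unrotated factor pattern. For Q33A1:

$$
h_1^2 = \sum_{k=1}^{6} \ell_{1k}^2
= 0.64261^2 + 0.10466^2 + 0.22909^2 + 0.00702^2 + 0.37260^2 + 0.19244^2
\approx 0.6523
$$

Summing over all 15 variables gives the Total of 9.578037. This equals the sum of the six retained eigenvalues (3.995833 + 1.539818 + 1.173885 + 1.047402 + 0.979544 + 0.841555).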

The SAS System 12:05 Monday, February 24, 1997 2

Rotation Method: Varimax

4. Orthogonal Transformation Matrix

            1         2         3         4         5         6
1     0.60090   0.60049   0.31986  -0.14237   0.35172  -0.17902
2    -0.04409   0.07833   0.60942   0.70061  -0.16650   0.31930
3     0.25547  -0.14998  -0.47831   0.67364   0.37609  -0.29702
4     0.54786  -0.28654  -0.18312  -0.10344   0.06205   0.75476
5    -0.46699   0.57038  -0.32109   0.06755   0.37548   0.45600
6    -0.23126  -0.45094   0.40111  -0.14078   0.74986   0.01305
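The rotated factor pattern is obtained by post-multiplying the unrotated loadings L by this matrix T. As a check, take the Q33A12 row of the unrotated pattern and multiply it by column 5 of T. This gives Q33A12's rotated loading on FACTOR5:

$$
L^{*} = L\,T
$$

$$
\ell^{*}_{12,5} = 0.51615(0.35172) + (-0.32144)(-0.16650) + 0.32763(0.37609) + 0.05628(0.06205) + 0.22886(0.37548) + 0.55261(0.74986) \approx 0.8621
$$

This matches the 0.86208 in the rotated factor pattern.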


SAS output (continued)

5. Rotated Factor Pattern

          FACTOR1   FACTOR2   FACTOR3   FACTOR4   FACTOR5   FACTOR6
Q33A1           .   0.48345         .         .   0.57939         .
Q33A2           .   0.72062         .         .         .         .
Q33A3           .   0.68685         .         .         .         .
Q33A4           .         .   0.64305         .         .         .
Q33A5           .         .         .   0.79651         .         .
Q33A6           .         .   0.76294         .         .         .
Q33A7     0.59762         .         .         .         .         .
Q33A8     0.71377         .         .         .         .         .
Q33A9     0.49688         .         .         .         .  -0.50583
Q33A10    0.68783         .         .         .         .         .
Q33A11          .         .         .         .         .   0.83377
Q33A12          .         .         .         .   0.86208         .
Q33A13          .         .         .   0.64644         .         .
Q33A14   -0.38964  -0.45439   0.42849   0.39468         .         .
Q33A15          .   0.60575     0.301         .         .   -0.3431

Variance explained by each factor

   FACTOR1   FACTOR2   FACTOR3   FACTOR4   FACTOR5   FACTOR6
  2.095412  2.052493  1.520768  1.401878  1.318371  1.189115

Final Communality Estimates: Total = 9.578037

Q33A1     Q33A2     Q33A3     Q33A4     Q33A5     Q33A6     Q33A7     Q33A8
0.652294  0.649247  0.576990  0.632425  0.660909  0.684801  0.530450  0.603293

Q33A9     Q33A10    Q33A11    Q33A12    Q33A13    Q33A14    Q33A15
0.628753  0.536827  0.771585  0.838002  0.459394  0.717317  0.635753

Scoring Coefficients Estimated by Regression

Squared Multiple Correlations of the Variables with each Factor

   FACTOR1   FACTOR2   FACTOR3   FACTOR4   FACTOR5   FACTOR6
  1.000000  1.000000  1.000000  1.000000  1.000000  1.000000
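Because T is orthogonal, rotation only moves variance between the factors. The communalities are unchanged, and the rotated variances still sum to the same total:

$$
2.095412 + 2.052493 + 1.520768 + 1.401878 + 1.318371 + 1.189115 = 9.578037
$$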


SAS output (continued)

6. Standardized Scoring Coefficients

          FACTOR1   FACTOR2   FACTOR3   FACTOR4   FACTOR5   FACTOR6
Q33A1    -0.08335   0.18455  -0.03213   0.14900   0.43336   0.11645
Q33A2    -0.05022   0.41908  -0.08899   0.09010  -0.01441  -0.01110
Q33A3    -0.01743   0.41447  -0.03231   0.02842  -0.12090   0.11730
Q33A4     0.00491  -0.05167   0.43022  -0.08589   0.15910   0.19259
Q33A5     0.02684   0.18768  -0.15766   0.60951  -0.04507  -0.01852
Q33A6     0.00968  -0.01383   0.53336  -0.04939  -0.21569  -0.04682
Q33A7     0.32195  -0.12140   0.13970  -0.11504  -0.00638   0.15652
Q33A8     0.42291  -0.16895  -0.08466  -0.01712   0.06781  -0.00350
Q33A9     0.25110  -0.08846   0.10820   0.19609  -0.12006  -0.42766
Q33A10    0.42262  -0.06087  -0.03939   0.09682  -0.14606  -0.03909
Q33A11    0.02286   0.05149   0.08346   0.06972   0.02062   0.71907
Q33A12   -0.07340  -0.15889  -0.04086  -0.05886   0.76862  -0.01701
Q33A13    0.05660  -0.05396  -0.00596   0.46108   0.02966   0.03395
Q33A14   -0.22857  -0.34256   0.45994   0.21551   0.27576  -0.19905
Q33A15   -0.35190   0.37817   0.15988  -0.08769  -0.01491  -0.26763
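Each factor score is a weighted sum of the standardised answers: F_k = b_1k z_1 + ... + b_15k z_15. Here z_i is item i standardised and b_ik is the coefficient in row i, column k of the table above. PROC SCORE can apply these weights, for example to score a fresh sample of businesses with the same coefficients.

The sketch below assumes the cleaned recycling data in hold.cards. The data set names fact_stat and cards_scored are illustrative.

```sas
/* Step 1: save means, standard deviations and scoring coefficients.
   The SCORE option is what writes the _TYPE_='SCORE' rows to OUTSTAT= */
proc factor data=hold.cards nfactors=6 rotate=varimax
            score outstat=fact_stat;
  var q33a1-q33a15;
run;

/* Step 2: PROC SCORE standardises each item with the stored mean and
   standard deviation, then forms Factor1-Factor6 as weighted sums */
proc score data=hold.cards score=fact_stat out=cards_scored;
  var q33a1-q33a15;
run;
```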


SAS Output - interpretation

The code above produces the output on the last few pages. The output shows:

1. The eigenvalues for each factor (check that they are of a reasonable size). The Cumulative row shows the proportion of the total variance explained by the Factor Analysis for different numbers of factors. Aim for approximately 60% or more, though the final choice depends on the interpretability of the Factor Analysis.

2.   The unrotated factor pattern (ignore this).

3.   The final communality estimates (check for any low ones). These show how much of each variable’s variance is explained by the factors. Ideally these are approximately 60% or better for the variables that are important in the final analysis. Any variable with a low communality is essentially NOT used in the factor solution. If an important variable has a low communality, it can be used in a segmentation as a separate variable (more later).


SAS Output - interpretation (continued)

4. The orthogonal transformation matrix (ignore this).

5. The rotated factor pattern (the key output; examine this closely). This shows each variable’s loading on each factor. The important variables for each factor are those with loadings of around 0.5 or more in absolute value.

6. The standardised scoring coefficients (use these in FA regression).
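To close the loop on the satisfaction example: "FA regression" means factoring the correlated drivers first, then regressing the overall rating Q24 on the factor scores. Below is a sketch. The choice of four factors and the data set name model_fa are assumptions. In practice, pick NFACTORS= using the eigenvalue, communality and interpretability checks above, and name the factors before reading the coefficients.

Varimax-rotated principal-component scores are uncorrelated. The VIFs in the second step should therefore all be essentially 1, so no driver can "hog" another's effect.

```sas
/* Step 1: reduce the 17 correlated drivers to a few orthogonal factors */
proc factor data=hold.model method=principal priors=one
            nfactors=4 rotate=varimax fuzz=.3
            out=model_fa;                 /* adds Factor1-Factor4 to each row */
  var EXPERT Q30A2 Q30A3 Q30A6 Q30A7 Q30A8 Q30A11 Q30A12 Q30A13
      Q30A14 Q30A15 Q32A1 Q32A2 Q32A3 Q32A4 Q32A5 Q32A6;
run;

/* Step 2: regress overall satisfaction on the factor scores */
proc reg data=model_fa;
  model Q24 = Factor1-Factor4 / vif;     /* VIFs should all be close to 1 */
run;
quit;
```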


Output - meaning

Factor:  1      2      3      4      5      6
Q33A1    .      0.48   .      .      0.58   .      My company wouldn’t use recycled products because they look
Q33A2    .      0.72   .      .      .      .      Recycled products seem to be of much lower quality than non-
Q33A3    .      0.69   .      .      .      .      Using recycled products results in our equipment breaking down a
Q33A4    .      .      0.64   .      .      .      They would need to be a lot cheaper before we would consider
Q33A5    .      .      .      0.80   .      .      If there were no other problems with recycled products we would
Q33A6    .      .      0.76   .      .      .      All recycled products cost more than non-recycled products.
Q33A7    0.60   .      .      .      .      .      It’s not worth the time and effort finding and changing suppliers just
Q33A8    0.71   .      .      .      .      .      It would be too hard to make the system changes necessary to use
Q33A9    0.50   .      .      .      .      -0.51  The range of recycled products available is not wide enough to
Q33A10   0.69   .      .      .      .      .      It’s just too difficult to get enough people to change their routines
Q33A11   .      .      .      .      .      0.83   We would use recycled products if someone in our company took the
Q33A12   .      .      .      .      0.86   .      Using recycled products doesn’t really fit with our image.
Q33A13   .      .      .      0.65   .      .      If quality, price and availability were the same, we would choose to
Q33A14   -0.39  -0.45  0.43   0.39   .      .      Manufacturing recycled products is actually less energy efficient and
Q33A15   .      0.61   0.30   .      .      -0.34  There are benefits to us if our customers see us as “Green”.


Interpretation

So this Factor Analysis explains 64% of the overall variance (from 1. above). The majority of the variables have over 60% of their variance explained (from 3. above). The final factors (from 5. above) are as follows:

Factor 1: Hassle factor - a combination of the performance ratings “It’s not worth the time and effort finding and changing suppliers just to get recycled products”, “It would be too hard to make the system changes necessary to use recycled products”, “The range of recycled products available is not wide enough to warrant using them” and “It’s just too difficult to get enough people to change their routines and to use more recycled products.”

Factor 2: Quality Factor - a combination of the performance ratings “Recycled products seem to be of much lower quality than non-recycled products”, “Using recycled products results in our equipment breaking down and needing more maintenance”, “There are benefits to us if our customers see us as ‘Green’” and negative weighting on “Manufacturing recycled products is actually less energy efficient and more harmful to the environment.”

Factor 3: Price Factor - a combination of the performance ratings “They would need to be a lot cheaper before we would consider buying them” and “All recycled products cost more than non-recycled products.”

Factor 4: Would Use Factor - a combination of the performance ratings “If there were no other problems with recycled products we would even pay a small premium to use them” and “If quality, price and availability were the same, we would choose to buy recycled products over non-recycled products whenever we could.”

Factor 5: Image Factor - a combination of the performance ratings “My company wouldn’t use recycled products because they look cheap and nasty” and “Using recycled products doesn’t really fit with our image.”

Factor 6: Help Factor - the performance rating “We would use recycled products if someone in our company took the responsibility to push the initiative ahead.”
