SCM300 Survey Design
Lecture 4: Statistical Analysis
For use in fall semester 2015. Lecture notes were originally designed by Nigel Halpern. This lecture set may be modified during the semester.
Last modified: 4-8-2015
Lecture Aim & Objectives
Aim
• To investigate methods of statistical analysis
Objectives
• Research questions & hypotheses
• Statistical tests
Introduction
• Survey research is all about answering questions
• Analytical techniques covered in lecture 3 are used to answer descriptive questions
  – They were univariate in nature, i.e. they only use information from a single variable
• Questions are often more complicated & require analysis of 2+ variables
  – e.g. Do the personal characteristics of shoppers affect customer satisfaction?
Variables
• The research question is
  – Do the personal characteristics of shoppers affect customer satisfaction?
• The variables required are
  – Personal characteristics and satisfaction
• Personal characteristics is an independent variable (IV)
  – Causes change in the DV
• Customer satisfaction is the dependent variable (DV)
  – Influenced by changes in the IV
Independent & Dependent Variables
IV 1 (age)
IV 2 (experience)
IV 3 (gender)
DV (satisfaction)
Hypotheses
• You may then have a hypothesis
  – A statement used to test a particular proposition
    • e.g. older shoppers have significantly higher levels of customer satisfaction than younger shoppers
    • e.g. more experienced shoppers have significantly higher levels of customer satisfaction than less experienced shoppers
    • e.g. female shoppers have significantly higher levels of customer satisfaction than male shoppers
Null & Alternative Hypotheses
• Null hypothesis (H0)
  – “There is no significant difference or relationship”
• Alternative hypothesis (H1)
  – “There is a significant difference or relationship”
  – One-tailed is directional
    • i.e. X is significantly higher (or lower) than Y
  – Two-tailed is non-directional
    • i.e. There is a significant difference between X and Y
Null & Alternative Hypotheses
• Hypotheses are normally labelled
  – H0¹, H0², H0³
  – H1¹, H1², H1³
  – etc
Null & Alternative Hypotheses
• Null hypothesis
  – There is no significant difference in customer satisfaction of older versus younger shoppers
• Alternative hypothesis
  – Customer satisfaction is significantly higher for older than younger shoppers (one-tailed)
  – There is a significant difference in customer satisfaction for older versus younger shoppers (two-tailed)
• I’d recommend 2-tailed over 1-tailed
Your turn…..
• Which of the following are IVs and which are DVs?
1. Sales volume affects profits
2. Advertising affects volume of customers
3. Customer service levels affect customer retention
4. Study time influences exam results
5. Academic performance is affected by gender
6. Increases in aviation fuel burn reduce air quality
7. Older staff work harder
Your turn…..
• Which of the following is an H0, which is an H1 (1-tailed / directional), and which is an H1 (2-tailed / non-directional)?
1. Sales volume has a significant positive effect on profits
2. Advertising has a significant effect on the volume of customers
3. Customer service levels have a significant effect on customer retention
4. Study time has a significant positive effect on exam results
5. There is no significant difference in the academic performance of men versus women
6. Increases in aviation fuel burn have a significant negative effect on air quality
7. There is no significant difference in the effort of younger versus older workers
Your Survey
You are expected to develop a research question(s) based on theoretical context
Example
• Customer-supplier relationships affect the performance of supply chain networks (Ellinger et al., 1999). This has never been investigated in Norway so this study asks:
Do customer-supplier relationships affect the performance of supply chain networks in Norway?
Your Survey
• The study by Ellinger et al. (1999) and others such as Jammernegg & Kischka (2005) suggest that frequent customer satisfaction surveys affect performance.
• Discussions with industry experts (e.g…..) suggest performance may also be affected by the frequency of meetings with customers and personal visits by senior managers.
• This study will investigate the overall effect of the customer-supplier relationship as well as the effect of individual aspects of the customer-supplier relationship
• What variables are needed…..?
Your Survey
• Variables:
  – Performance
  – Customer-supplier relationship
    • Frequent customer satisfaction surveys
    • Frequency of meetings with customers
    • Personal visit by senior managers
• How might you create the variables using a survey?
• What hypotheses might you use…..?
Statistical Analysis
• The significance of each hypothesis is then tested using statistical analysis
• The objective is to ‘prove’ or ‘disprove’ each hypothesis, i.e. to decide whether the data give enough evidence to reject the null hypothesis
What Tests?
Task | Data | Types of variable | Test
Relationship between 2 variables | Cross-tabs of frequencies | Nominal | Chi-square
Difference between 2 means – paired | Means – whole sample | Ratio or ordinal | t-test – paired
Difference between 2 means – independent samples | Means – 2 sub-groups | Ratio or ordinal (means); nominal (2 groups only) | t-test – independent samples
Relationship between 2 variables | Means – 3+ sub-groups | Ratio or ordinal (means); nominal (3+ groups) | One-way analysis of variance (ANOVA)
What Tests?
Task | Data | Types of variable | Test
Relationship between 3+ variables | Means – cross-tabulated | Ratio or ordinal (means); nominal (2+) | Factorial analysis of variance
Relationship between 2 variables | Individual measures | Ratio or ordinal (2) | Correlation
Linear relationship between 2 variables | Individual measures | Ratio or ordinal (2) | Linear regression
Linear relationship between 3+ variables | Individual measures | Ratio or ordinal (3+) | Multiple regression
Relationship between large nos. of variables | Individual measures | Ratio or ordinal | Factor or cluster analysis
Nature of the Question
The way a question is posed suggests different statistical tests
Inferential statistics
• Are there differences in levels of satisfaction between younger and older students?
• Are younger students more satisfied with their course than older students?
Measures of association
• Is there a relationship between age of student and levels of satisfaction?
Inferential Statistics
• Chi-square
• One Sample t-test
• Paired Samples t-test
• Independent Samples t-test
• One-Way Analysis of Variance (ANOVA)
Chi-square
• Crosstabs can provide initial analysis but…..
  – It is difficult to interpret the data
  – It does not comment on the significance of any differences
• Chi-square can be used to investigate the significance of the difference between observed and expected values
  – Used for 2 nominal variables
    • e.g. course enrolments & gender
Chi-square
Cross-tabulations may suggest a trend, e.g. course enrolments according to gender

Course * Gender Crosstabulation (Count)
Course | Female | Male | Total
BSc | 7 | 16 | 23
MSc | 9 | 5 | 14
PhD | 9 | 4 | 13
Total | 25 | 25 | 50
Chi-square Procedure SPSS
1. Analyse
2. Descriptive statistics
3. Crosstabs
4. Select a variable for rows
5. Select a variable for columns
6. Statistics
7. Tick Chi-square
8. Cells
9. Tick Expected
10. Continue
11. OK
Chi-square Output SPSS
• Use Pearson Chi-square
• The value is 6.588
  – The greater the value, the greater the difference between observed and expected values
• Significance of the difference is 0.037 (i.e. 3.7%)
  – This means a difference at least this large would be expected only 3.7% of the time if chance alone were at work
• We therefore reject the null hypothesis and accept the alternative hypothesis

Chi-Square Tests
Test | Value | df | Asymp. Sig. (2-sided)
Pearson Chi-Square | 6.588 (a) | 2 | .037
Likelihood Ratio | 6.750 | 2 | .034
Linear-by-Linear Association | 5.649 | 1 | .017
N of Valid Cases | 50 | |
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 6.50.
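The Pearson Chi-square value can be checked by hand from the crosstab. The following is a minimal Python sketch, not the SPSS routine itself: it uses the counts from the Course * Gender crosstab, and the function name chi_square is only an illustrative choice. The p-value line relies on the fact that, for 2 degrees of freedom, the chi-square upper-tail probability equals exp(-x/2); other degrees of freedom would need a statistics library.

```python
import math

# Observed counts from the Course * Gender crosstab
# rows: BSc, MSc, PhD; columns: Female, Male
observed = [[7, 16], [9, 5], [9, 4]]


def chi_square(table):
    """Return the Pearson chi-square statistic and its degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if course and gender were unrelated
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


chi2, df = chi_square(observed)

# Closed form valid only for df = 2: P(X > x) = exp(-x / 2)
p_value = math.exp(-chi2 / 2)

print(round(chi2, 3), df, round(p_value, 3))  # 6.588 2 0.037
```

The printed values agree with the Pearson Chi-Square row of the SPSS output.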
Probability - remember this…..?
Probability as % | Probability on a 0-1 scale (p) | Confidence level
5 | 0.05 | 95%
1 | 0.01 | 99%
0.1 | 0.001 | 99.9%
i.e. significance at the 95% confidence level means that a result like ours would arise by chance less than 5% of the time if the null hypothesis were true: written as p<0.05
One Sample t-test
• Compares the mean of a single sample with the population mean
  – e.g. a University claims that its BSc graduates have an average starting salary that is ‘significantly’ higher than the national average
One Sample t-test Procedure SPSS
• Analyse
• Compare means
• One Sample t-test
• Enter test variable
  – i.e. graduate salary of the sample
• Enter test value
  – i.e. national average of 2.5mn NOK
• OK
One Sample t-test Output SPSS
• Sample average of 2.6mn NOK is higher than the national average of 2.5mn NOK (t=1.551)
• But the difference is not significant (p=0.135)
• Accept null hypothesis, reject alternative

One-Sample Statistics
Variable | N | Mean | Std. Deviation | Std. Error Mean
GradSalary | 23 | 2615217 | 356254.11724 | 74284.12

One-Sample Test (Test Value = 2500000)
Variable | t | df | Sig. (2-tailed) | Mean Difference | 95% Confidence Interval of the Difference: Lower | Upper
GradSalary | 1.551 | 22 | .135 | 115217.39 | -38838.4 | 269273.2
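The t value in the output follows directly from the One-Sample Statistics table. The sketch below is a hand check in Python under two stated assumptions: the unrounded sample mean is taken as the test value plus the reported mean difference, and the significance decision uses the standard two-tailed 5% critical value for 22 degrees of freedom (about 2.074) from a t table instead of computing an exact p-value.

```python
import math

# Summary figures from the SPSS One-Sample Statistics / One-Sample Test tables
n = 23
test_value = 2500000
sample_mean = test_value + 115217.39  # assumption: test value + reported mean difference
sample_sd = 356254.11724

# Standard error of the mean and the t statistic
std_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - test_value) / std_error
df = n - 1

# Two-tailed 5% critical value for df = 22, taken from a standard t table
critical_t = 2.074
significant = abs(t_stat) > critical_t

print(round(std_error, 2), round(t_stat, 3), df, significant)
```

Because the t statistic is smaller than the critical value, the decision matches the slide: the difference is not significant at the 5% level.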
Paired & Independent Samples t-tests
• Compares 2 means
  – i.e. are they statistically different?
• ORDINAL or RATIO variables
• Two main situations:
  – Compare means of 2 variables for whole sample
    • Paired samples test e.g. average spend on shoes v food
  – Compare means of 1 variable for 2 sub-groups
    • Independent samples test e.g. average spend on shoes by men v women
Paired Samples t-test Procedure SPSS
Average spend on shoes v food
• Analyse
• Compare means
• Paired Samples t-test
• Select the two variables to be compared
• OK
Paired Samples t-test Output SPSS
• Averages can be compared
• t = -0.937 (food spend is lower)
• But the difference is not significant (p=0.353, i.e. 35.3%)
• Accept null hypothesis, reject alternative

Paired Samples Statistics
Pair 1 | Mean | N | Std. Deviation | Std. Error Mean
FoodSpend | 44.4200 | 50 | 40.25282 | 5.69261
ShoeSpend | 50.6800 | 50 | 33.77986 | 4.77719

Paired Samples Test (Paired Differences)
Pair 1 | Mean | Std. Deviation | Std. Error Mean | 95% Confidence Interval of the Difference: Lower | Upper | t | df | Sig. (2-tailed)
FoodSpend - ShoeSpend | -6.26000 | 47.22582 | 6.67874 | -19.68143 | 7.16143 | -.937 | 49 | .353
Independent Samples t-test Procedure SPSS
Average spend on shoes by men v women
1. Analyse
2. Compare means
3. Independent Samples t-test
4. Select the test variable
   • i.e. shoe spend
5. Select the grouping variable
   • i.e. gender
6. Define groups for the grouping variable
   • i.e. 1 for group 1 and 2 for group 2 – this corresponds to 1 for female and 2 for male
7. OK
Independent Samples t-test Output SPSS
• Averages can be compared
• t = 5.862 (female more than male)
• Difference is significant (SPSS reports p=0.000, i.e. p<0.001, so significant at the 99.9% level)
• Reject null hypothesis, accept alternative

Group Statistics (ShoeSpend)
Gender | N | Mean | Std. Deviation | Std. Error Mean
Female | 25 | 72.2800 | 29.91310 | 5.98262
Male | 25 | 29.0800 | 21.51534 | 4.30307

Independent Samples Test (ShoeSpend)
Levene's Test for Equality of Variances: F = 2.652, Sig. = .110
t-test for Equality of Means:
Assumption | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% Confidence Interval of the Difference: Lower | Upper
Equal variances assumed | 5.862 | 48 | .000 | 43.20000 | 7.36941 | 28.38282 | 58.01718
Equal variances not assumed | 5.862 | 43.589 | .000 | 43.20000 | 7.36941 | 28.34399 | 58.05601
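Both rows of the Independent Samples Test can be reproduced from the Group Statistics table. The sketch below is a hand check in Python, assuming only the summary figures shown above; variable names are illustrative. It computes the pooled ("equal variances assumed") t statistic and the unpooled ("equal variances not assumed") version with its adjusted degrees of freedom.

```python
import math

# Summary figures from the Group Statistics table (shoe spend by gender)
n_f, mean_f, sd_f = 25, 72.28, 29.91310
n_m, mean_m, sd_m = 25, 29.08, 21.51534

mean_diff = mean_f - mean_m

# Equal variances assumed: pool the two sample variances
pooled_var = ((n_f - 1) * sd_f ** 2 + (n_m - 1) * sd_m ** 2) / (n_f + n_m - 2)
se_pooled = math.sqrt(pooled_var * (1 / n_f + 1 / n_m))
t_pooled = mean_diff / se_pooled
df_pooled = n_f + n_m - 2

# Equal variances not assumed: separate variances and adjusted df
var_f = sd_f ** 2 / n_f
var_m = sd_m ** 2 / n_m
se_unpooled = math.sqrt(var_f + var_m)
t_unpooled = mean_diff / se_unpooled
df_unpooled = (var_f + var_m) ** 2 / (
    var_f ** 2 / (n_f - 1) + var_m ** 2 / (n_m - 1)
)

print(round(t_pooled, 3), df_pooled)
print(round(t_unpooled, 3), round(df_unpooled, 3))
```

With equal group sizes the two standard errors coincide, which is why both rows of the SPSS output show the same t value and differ only in their degrees of freedom.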
One-Way ANOVA
• t-test examines differences between 2 means
• Analysis of Variance (ANOVA) examines 3+ means
  – e.g. average spend on shoes by course (BSc, MSc, PhD)
• Examines whether means for each group vary
  – Labelled as ‘between groups’ in the output
One-Way ANOVA Procedure SPSS
Average spend on shoes by course
1. Analyse
2. Compare means
3. One-Way Anova
4. Select the DV
   • i.e. shoe spend
5. Select the factor
   • i.e. course
6. OK
One-Way ANOVA Output SPSS
• F = 3.966 (variance test statistic)
• Differences are significant (p=0.026, i.e. 2.6%)
• Reject null hypothesis, accept alternative

ANOVA (ShoeSpend)
Source | Sum of Squares | df | Mean Square | F | Sig.
Between Groups | 8073.620 | 2 | 4036.810 | 3.966 | .026
Within Groups | 47839.260 | 47 | 1017.857 | |
Total | 55912.880 | 49 | | |
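The F statistic is the ratio of the between-groups mean square to the within-groups mean square. The sketch below first re-derives F from the sums of squares in the ANOVA table, then shows how those sums of squares would be obtained from raw data. The function one_way_anova and the three small groups passed to it are hypothetical illustrations, since the raw shoe spend data are not given in the slides.

```python
# Part 1: re-derive F from the sums of squares in the SPSS ANOVA table
ss_between, df_between = 8073.620, 2
ss_within, df_within = 47839.260, 47

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_slide = ms_between / ms_within
print(round(ms_between, 3), round(ms_within, 3), round(f_slide, 3))


# Part 2: the same calculation starting from raw group data
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between groups: how far each group mean sits from the grand mean
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within groups: spread of values around their own group mean
    ssw = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    dfb = len(groups) - 1
    dfw = len(all_values) - len(groups)
    return (ssb / dfb) / (ssw / dfw), dfb, dfw


# Hypothetical data, for illustration only
f_demo, dfb_demo, dfw_demo = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f_demo, dfb_demo, dfw_demo)
```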
Measures of Association
• Correlation Analysis
• Linear Regression Analysis
• Multiple Regression Analysis
Correlation Analysis
• Examines relationship between 2 or more ORDINAL or INTERVAL/RATIO variables
• They are ‘CORRELATED’ if they are systematically related
  – POSITIVELY: as one increases, so does the other
  – NEGATIVELY: as one increases, the other decreases
  – UN-CORRELATED: no relationship
Correlation Analysis
• Correlation is measured by the correlation co-efficient, ‘r’. The co-efficient is:
  – -1: perfect –ve
  – -1 to -0.7: strong –ve
  – -0.7 to -0.5: moderate –ve
  – -0.5 to -0.1: weak –ve
  – -0.1 to 0.1: no relationship
  – 0.1 to 0.5: weak +ve
  – 0.5 to 0.7: moderate +ve
  – 0.7 to 1: strong +ve
  – 1: perfect +ve
• Helps to think of correlation in visual terms
  – e.g. see next slide
Association for Income & Profit
Business | Income (NOKmn) | Profit (NOKmn) | Business | Income (NOKmn) | Profit (NOKmn)
1 | 370 | 308 | 14 | 283 | 208
2 | 283 | 211 | 15 | 275 | 204
3 | 360 | 264 | 16 | 284 | 201
4 | 361 | 242 | 17 | 264 | 191
5 | 386 | 260 | 18 | 329 | 223
6 | 361 | 252 | 19 | 334 | 252
7 | 307 | 257 | 20 | 334 | 249
8 | 326 | 225 | 21 | 285 | 294
9 | 309 | 300 | 22 | 276 | 224
10 | 256 | 194 | 23 | 297 | 237
11 | 312 | 221 | 24 | 290 | 223
12 | 361 | 263 | 25 | 350 | 232
13 | 274 | 177 | 26 | 315 | 238
Scatter-plot Procedure SPSS
1. Graphs
2. Interactive
3. Scatterplot
4. IV for the x-axis
5. DV for the y-axis
6. OK
Correlation Procedure SPSS
1. Analyse
2. Correlate
3. Bivariate
4. Add variables to variables list
5. Tick Pearson’s for interval/ratio data (Spearman’s for ordinal)
6. OK
Correlation Output SPSS
• r = .617 (moderate-strong positive relationship)
• Relationship is significant (p=0.001, i.e. 0.1%)
• Reject null hypothesis, accept alternative

Correlations
Variable | Statistic | INCOME | PROFIT
INCOME | Pearson Correlation | 1 | .617**
INCOME | Sig. (2-tailed) | . | .001
INCOME | N | 26 | 26
PROFIT | Pearson Correlation | .617** | 1
PROFIT | Sig. (2-tailed) | .001 | .
PROFIT | N | 26 | 26
**. Correlation is significant at the 0.01 level (2-tailed).
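Pearson's r can be computed directly from the income and profit figures for the 26 businesses. The following is a minimal Python sketch using only the standard formula (covariation divided by the square root of the product of the two variations); the function name pearson_r is an illustrative choice, and the data are typed in from the Association for Income & Profit table.

```python
import math

# Income and profit (NOKmn) for businesses 1-26, from the table above
income = [370, 283, 360, 361, 386, 361, 307, 326, 309, 256, 312, 361, 274,
          283, 275, 284, 264, 329, 334, 334, 285, 276, 297, 290, 350, 315]
profit = [308, 211, 264, 242, 260, 252, 257, 225, 300, 194, 221, 263, 177,
          208, 204, 201, 191, 223, 252, 249, 294, 224, 237, 223, 232, 238]


def pearson_r(x, y):
    """Pearson correlation co-efficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(income, profit)
print(round(r, 3))  # should come out close to the r = .617 in the SPSS output
```

The significance value (p=0.001) is not reproduced here, since it needs the t distribution from a statistics library.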
Linear Regression Analysis
• Correlation shows strength of relationship but not the causality
  – Causality indicates the likely impact of IV on DV
  – e.g. How many pax (passengers) would visit AMS if more flights were provided (forecasting)
• Regression calculates equation for ‘best fit line’: y = a + bx
  – a = a constant representing the point the line crosses the y-axis
  – b = a co-efficient representing the gradient of the slope
  – y & x = DV & IV
Region Sales (NOKmn) Profits (NOKmn)
North East 1181 38.4
North West 1140 49.3
Yorkshire Humber 740 31.9
East Midlands 1050 39.9
West Midlands 1165 52.1
East of England 1129 32.1
London 1134 65.4
South East 1497 58.9
South West 687 30
Wales 912 27.2
Scotland 808 26.6
Northern Ireland 551 10.3
Annual ticket sales & profits data p/region for HiMolde Airlines
Best Fit Line
• Perhaps we wish to predict profit for given values of sales
  – Profit is the dependent variable (y)
  – Sales the independent variable (x)
• Data seems scattered around a straight line
• Then need to find the equation of a ‘best fit’ line:
  y = a + bx
  Profit = ‘a number’ + (‘some other number’ x sales)
Linear Regression Procedure SPSS
1. Analyse
2. Regression
3. Linear
4. Place DV and IV in relevant box
5. OK
Linear Regression Output SPSS

Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .815 (a) | .664 | .630 | 9.46114
a. Predictors: (Constant), SalesNOKmn
• R Square: extent to which DV can be predicted by the IV(s), i.e. 66%

Coefficients (a)
Model 1 | B (Unstandardized Coefficients) | Std. Error | Beta (Standardized Coefficients) | t | Sig.
(Constant) | -9.237 | 11.080 | | -.834 | .424
SalesNOKmn | .048 | .011 | .815 | 4.446 | .001
a. Dependent Variable: ProfitsNOKmn
• Profit = a + b x sales, i.e. Profit = -9.237 + (0.048 x sales)
• Effect of IV on DV is significant (p=0.001, i.e. 0.1%)
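The constant and slope in the Coefficients table can be reproduced with the ordinary least-squares formulas for a best fit line. The following is a minimal Python sketch using the HiMolde Airlines sales and profits figures; the function name fit_line and the forecast for sales of 1000 NOKmn are illustrative assumptions, not part of the original slides.

```python
# Sales (x) and profits (y) in NOKmn, from the HiMolde Airlines table
sales = [1181, 1140, 740, 1050, 1165, 1129, 1134, 1497, 687, 912, 808, 551]
profits = [38.4, 49.3, 31.9, 39.9, 52.1, 32.1, 65.4, 58.9, 30, 27.2, 26.6, 10.3]


def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                      # gradient of the slope
    a = mean_y - b * mean_x            # point where the line crosses the y-axis
    r_squared = sxy ** 2 / (sxx * syy)  # share of variation in y explained by x
    return a, b, r_squared


a, b, r_squared = fit_line(sales, profits)
print(round(a, 3), round(b, 3), round(r_squared, 3))

# Hypothetical forecast: predicted profit for a region with sales of 1000 NOKmn
predicted_profit = a + b * 1000
print(round(predicted_profit, 1))
```

The printed constant, slope and R Square should agree, to rounding, with the B column and the Model Summary in the SPSS output.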
Best Fit Line Procedure SPSS
1. Analyse
2. Regression
3. Curve estimation
4. Place DV and IV in relevant box
5. Tick linear
6. OK
Multiple Regression Analysis
• Other factors/IVs may affect DV
  – e.g. A study may find that:
    • ‘Airline choice is dependent on income’
  – But airline choice may actually be dependent on age or occupation and income may just be a consequence of those factors
• Multiple regression tries to control for such effects
Region Sales (NOKmn) Customers (mn) Profit (NOKmn)
North East 1181 36 38.4
North West 1140 47 49.3
Yorkshire Humber 740 29 31.9
East Midlands 1050 39 39.9
West Midlands 1165 52 52.1
East of England 1129 31 32.1
London 1134 68 65.4
South East 1497 59 58.9
South West 687 29 30
Wales 912 28 27.2
Scotland 808 24 26.6
Northern Ireland 551 15 10.3
Annual ticket sales, customers & profit data p/region for HiMolde Airlines
Multiple Regression Procedure SPSS
1. Analyse
2. Regression
3. Linear
4. Place DV and IVs in relevant box
5. OK
Multiple Regression Output SPSS

Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .991 (a) | .982 | .978 | 2.31904
a. Predictors: (Constant), Customersmn, SalesNOKmn

Coefficients (a)
Model 1 | B (Unstandardized Coefficients) | Std. Error | Beta (Standardized Coefficients) | t | Sig.
(Constant) | -1.745 | 2.781 | | -.628 | .546
SalesNOKmn | .005 | .004 | .089 | 1.215 | .255
Customersmn | .920 | .073 | .919 | 12.548 | .000
a. Dependent Variable: ProfitsNOKmn
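The B column of a multiple regression comes from solving the least-squares "normal equations" for all IVs at once. The sketch below is one simple way to do that in plain Python, not the routine SPSS uses: the function name fit_ols is hypothetical, and it solves the equations by Gauss-Jordan elimination. It is applied to the sales, customers and profit figures from the HiMolde Airlines table, so the printed values can be compared with the B column above.

```python
def fit_ols(rows, y):
    """Least-squares coefficients [constant, b1, b2, ...] for y on the IVs in rows."""
    X = [[1.0] + [float(v) for v in row] for row in rows]  # add constant column
    k = len(X[0])
    # Augmented normal equations: [X'X | X'y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Sales (NOKmn), customers (mn) and profit (NOKmn) per region, from the table
sales = [1181, 1140, 740, 1050, 1165, 1129, 1134, 1497, 687, 912, 808, 551]
customers = [36, 47, 29, 39, 52, 31, 68, 59, 29, 28, 24, 15]
profit = [38.4, 49.3, 31.9, 39.9, 52.1, 32.1, 65.4, 58.9, 30, 27.2, 26.6, 10.3]

constant, b_sales, b_customers = fit_ols(list(zip(sales, customers)), profit)
print(round(constant, 3), round(b_sales, 3), round(b_customers, 3))
```

Once customers is included, the sales coefficient is no longer significant in the SPSS output (Sig. = .255), which illustrates how multiple regression controls for the effect of other IVs.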
Summary
• Variables are defined as a DV or as IVs
  – A dependent variable is influenced by changes in the independent variable(s)
• Hypotheses test specific propositions
  – They can be null or alternative (one-tailed or two-tailed)
• Statistical tests measure the significance of each hypothesis (and prove or disprove them)
• Inferential statistics measure differences
• Measures of association measure relationships
Recommended Reading
• Chapters 4-9 in Gaur, A.S. and Gaur, S.S. (2006). Statistical Methods for Practice and Research: A Guide to Data Analysis Using SPSS. New Delhi: Response Books.