Post on 23-Oct-2014
1. Exploratory Data Analysis
A normality test checks whether the data meet the assumption that the population
is normally distributed. SPSS provides two statistics:
(i) Kolmogorov-Smirnov
(ii) Shapiro-Wilk
Case Processing Summary
                                      Cases
                     Valid            Missing          Total
            strata   N    Percent     N    Percent     N    Percent
totfinwell  urban    480  100.0%      0    .0%         480  100.0%
            rural    320  100.0%      0    .0%         320  100.0%
There are no missing data among the 800 cases, which are assumed to have been
randomly selected.
1.1 From the descriptives table below, several observations can be made:
(i) Mean
The mean is 75.53 for the urban group and 72.86 for the rural group.
(ii) Trimmed Mean
To obtain this value, SPSS removes the top and bottom 5 per cent of the
cases and calculates a new mean value. Comparing the original mean with
the trimmed mean shows whether extreme scores are having a strong
influence on the mean. For the urban group the trimmed mean (75.50) is
very similar to the mean (75.53); the same holds for the rural group
(73.11 against 72.86).
(iii) Skewness
Skewness is the tilt (asymmetry) of the distribution. It should be within
the range -2 to +2 when the data are normally distributed. In this case,
skewness is -.079 for the urban group and -.314 for the rural group (.111
and .136 are the standard errors), so both values are within the range
accepted as normally distributed.
(iv) Kurtosis
Kurtosis, on the other hand, provides information about the ‘peakedness’
of the distribution. If the distribution is perfectly normal, we would obtain a
skewness and kurtosis value of 0 (rather an uncommon occurrence in the
social sciences). Positive skewness values indicate positive skew (scores
clustered to the left at the low values). Negative skewness values indicate
a clustering of scores at the high end (right-hand side of a graph). Positive
kurtosis values indicate that the distribution is rather peaked (clustered in
the centre), with long thin tails. Kurtosis values below 0 indicate a
distribution that is relatively flat (too many cases in the extremes). With
reasonably large samples, skewness will not ‘make a substantive
difference in the analysis’ (Tabachnick & Fidell 2007, p. 80). Kurtosis can
result in an underestimate of the variance, but this risk is also reduced with
a large sample (200+ cases: see Tabachnick & Fidell 2007, p. 80). In this
case, kurtosis is .109 for the urban group and -.070 for the rural group
(.222 and .272 are the standard errors), and the sample is large: 800 cases.
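The three statistics described above can be sketched in a few lines of Python. This is only an illustration on made-up scores, not the SPSS procedure itself: the function names are hypothetical, the trimmed mean simply drops whole cases from each tail, and the skewness and kurtosis are the plain moment-based versions (SPSS applies small-sample adjustments, so its values will differ slightly).

```python
from statistics import mean

def trimmed_mean(scores, proportion=0.05):
    """Mean after dropping `proportion` of the cases from each tail."""
    xs = sorted(scores)
    k = int(len(xs) * proportion)            # whole cases to drop per tail
    return mean(xs[k:len(xs) - k])

def skewness(scores):
    """Moment-based skewness: 0 for a perfectly symmetric distribution."""
    m = mean(scores)
    m2 = mean([(x - m) ** 2 for x in scores])
    m3 = mean([(x - m) ** 3 for x in scores])
    return m3 / m2 ** 1.5

def excess_kurtosis(scores):
    """Moment-based kurtosis minus 3, so a normal distribution gives 0."""
    m = mean(scores)
    m2 = mean([(x - m) ** 2 for x in scores])
    m4 = mean([(x - m) ** 4 for x in scores])
    return m4 / m2 ** 2 - 3

# Hypothetical scores: 1..19 plus one extreme value of 100.
scores = list(range(1, 20)) + [100]
print(mean(scores), trimmed_mean(scores))    # the extreme score pulls the mean up
print(skewness(scores) > 0)                  # long right tail: positive skew
```

A large gap between the mean and the trimmed mean, as in this toy sample, is the signal that extreme scores are influencing the mean; in the exercise the two values are almost identical.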
Descriptives
            strata                                        Statistic  Std. Error
totfinwell  urban   Mean                                  75.53      .778
                    95% Confidence Interval for Mean
                      Lower Bound                         74.00
                      Upper Bound                         77.06
                    5% Trimmed Mean                       75.50
                    Median                                77.50
                    Variance                              290.425
                    Std. Deviation                        17.042
                    Minimum                               19
                    Maximum                               120
                    Range                                 101
                    Interquartile Range                   24
                    Skewness                              -.079      .111
                    Kurtosis                              .109       .222
            rural   Mean                                  72.86      .997
                    95% Confidence Interval for Mean
                      Lower Bound                         70.90
                      Upper Bound                         74.82
                    5% Trimmed Mean                       73.11
                    Median                                75.00
                    Variance                              318.000
                    Std. Deviation                        17.833
                    Minimum                               16
                    Maximum                               120
                    Range                                 104
                    Interquartile Range                   26
                    Skewness                              -.314      .136
                    Kurtosis                              -.070      .272
1.2 Kolmogorov-Smirnov and Shapiro-Wilk statistic
Tests of Normality
                     Kolmogorov-Smirnov(a)          Shapiro-Wilk
            strata   Statistic  df    Sig.     Statistic  df    Sig.
totfinwell  urban    .058       480   .001     .994       480   .048
            rural    .064       320   .003     .988       320   .011
a. Lilliefors Significance Correction
The Kolmogorov-Smirnov (KS) and Shapiro-Wilk (SW) statistics assess the normality
of the distribution of scores. A non-significant result (Sig. value of more than .05)
indicates normality.
In this exercise, the Sig. values are .001 and .003 (KS) and .048 and .011 (SW) for
the urban and rural groups respectively. All four are below .05, so the null
hypothesis of normality is rejected: the tests suggest that the data are not normally
distributed. This is quite common in larger samples.
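To make the Kolmogorov-Smirnov statistic less of a black box, the sketch below computes it as the largest gap between the empirical distribution of the scores and a normal curve fitted with the sample mean and standard deviation. The function name and both data sets are made up for illustration. The Sig. value is deliberately not computed, because the Lilliefors correction needs its own tables or simulation.

```python
from math import log
from statistics import NormalDist, mean, stdev

def ks_statistic_normal(scores):
    """Largest vertical gap between the empirical CDF and a fitted normal CDF."""
    xs = sorted(scores)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF steps from i/n up to (i+1)/n at x; check both sides.
        d = max(d, cdf - i / n, (i + 1) / n - cdf)
    return d

# Hypothetical data: evenly spaced normal quantiles versus a right-skewed sample.
near_normal = [NormalDist(75, 17).inv_cdf((i + 0.5) / 200) for i in range(200)]
skewed = [-log(1 - (i + 0.5) / 200) for i in range(200)]

d_normal = ks_statistic_normal(near_normal)
d_skewed = ks_statistic_normal(skewed)
print(d_normal < d_skewed)   # the skewed sample sits further from the normal curve
```

The statistics in the table (.058 and .064) are small gaps in absolute terms; with 480 and 320 cases even gaps of this size reach significance, which is why the plots are also inspected.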
1.3 Histograms
Histograms are used to display the distribution of a single continuous variable.
Inspection of the shape of the histogram provides information about the distribution
of scores on the continuous variable. Many of the statistics discussed in this manual
assume that the scores on each of the variables are normally distributed (i.e. follow
the shape of the normal curve). In this exercise, the scores are reasonably normally
distributed, with most scores occurring in the centre, tapering out towards the
extremes. It is quite common in the social sciences, however, to find that variables
are not normally distributed. Scores may be skewed to the left or right or,
alternatively, arranged in a rectangular shape. The actual shape of the distribution
for each group can be seen in the Histograms.
1.4 Normal Q-Q Plot
In this exercise, scores appear to be reasonably normally distributed. This is also
supported by an inspection of the normal probability plots (labelled Normal Q-Q
Plot). In this plot, the observed value for each score is plotted against the expected
value from the normal distribution. A reasonably straight line suggests a normal
distribution.
1.5 Detrended Normal Q-Q Plots
The Detrended Normal Q-Q Plots are obtained by plotting the actual deviation
of the scores from the straight line. There should be no real clustering of points, with
most collecting around the zero line.
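The construction of both plots can be sketched without any plotting library. The code below pairs each observed score with the value expected under a fitted normal distribution (the Normal Q-Q Plot) and then takes the difference (the Detrended Normal Q-Q Plot). The function name and sample scores are hypothetical, and the plotting position (rank - 3/8) / (n + 1/4) is one common convention; whether SPSS uses exactly this one here is an assumption.

```python
from statistics import NormalDist, mean, stdev

def qq_points(scores):
    """Return (observed, expected, deviation) for each score, smallest first."""
    xs = sorted(scores)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    points = []
    for rank, observed in enumerate(xs, start=1):
        p = (rank - 0.375) / (n + 0.25)      # assumed plotting position
        expected = fitted.inv_cdf(p)         # value a normal curve would give
        points.append((observed, expected, observed - expected))
    return points

# Hypothetical, perfectly symmetric scores.
points = qq_points([60, 65, 70, 75, 80, 85, 90])
for observed, expected, deviation in points:
    print(f"{observed:5.1f} {expected:7.2f} {deviation:6.2f}")
```

Plotting observed against expected gives the Q-Q plot; plotting the deviation column against observed gives the detrended version, where points scattered close to zero indicate agreement with the normal curve.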
1.6 Boxplots
The final plot that is provided in the output is a boxplot of the distribution of
scores for the two groups. The rectangle represents 50 per cent of the cases, with
the whiskers (the lines protruding from the box) going out to the smallest and
largest values. Sometimes we will see additional circles outside this range—these
are classified by SPSS as outliers. The line inside the rectangle is the median value.
Any scores that SPSS considers are outliers appear as little circles with a number
attached (this is the ID number of the case). SPSS defines points as outliers if they
extend more than 1.5 box-lengths from the edge of the box. Extreme points
(indicated with an asterisk, *) are those that extend more than three box-lengths from
the edge of the box. In the exercise below there are three outliers (two for urban and
one for rural): ID numbers 550 and 686 for the urban sample and 711 for the rural sample.
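The 1.5 and 3 box-length rules can be written out as a short sketch. The function name, case IDs and scores below are hypothetical, and the quartiles come from Python's `statistics.quantiles` with the inclusive method, which may differ slightly from the hinges SPSS uses for its boxplots.

```python
from statistics import quantiles

def classify_outliers(cases):
    """cases: list of (case_id, score). Returns (outlier_ids, extreme_ids)."""
    scores = [score for _, score in cases]
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    box = q3 - q1                                 # box length (interquartile range)
    outliers, extremes = [], []
    for case_id, score in cases:
        distance = max(q1 - score, score - q3)    # how far outside the box
        if distance > 3 * box:
            extremes.append(case_id)              # shown as an asterisk
        elif distance > 1.5 * box:
            outliers.append(case_id)              # shown as a circle
    return outliers, extremes

# Hypothetical cases numbered 1..13.
scores = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 50, 90]
cases = list(enumerate(scores, start=1))
outliers, extremes = classify_outliers(cases)
print(outliers, extremes)   # [12] [13]
```

Here the box runs from 16 to 28 (length 12), so the score of 50 lies more than 18 above the box and is an outlier, while 90 lies more than 36 above it and is an extreme point.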
Boxplots are useful when we wish to compare the distribution of scores on variables.
We can use them to explore the distribution of one continuous variable (e.g. positive
affect) or, alternatively, we can ask for scores to be broken down for different groups
(e.g. age groups). We can also add an extra categorical variable to compare (e.g.
males and females).
In the exercise presented below, the distribution of scores of total financial wellbeing
between urban and rural population is very similar.
1.7 Interpretation and Conclusion
The aim of this exercise is to test the normality of the data on total financial
wellbeing for the urban and rural populations. Because the sample is large (800
cases), we cannot rely on the Descriptives table and the KS and SW tests alone; we
also have to look at the Histograms, Normal Q-Q Plots, Detrended Normal Q-Q Plots
and Boxplots.
In the histograms, look at the tails of the distribution. There are almost no data
points sitting on their own out at the extremes, which suggests that potential
outliers are few. Furthermore, the scores drop away in a reasonably even slope.
In the Normal Q-Q Plots, the observed value for each score is plotted against the
expected value from the normal distribution. We see a reasonably straight line, which
suggests a normal distribution of the data.
The same is observed in the Detrended Normal Q-Q Plots, which show the actual
deviation of the scores from the straight line. There is no real clustering of points,
with most collecting around the zero line. This also indicates normality of the data.
In the Boxplots, the median line is in the middle of the box for the rural
population and almost in the middle of the box for the urban population, which also
indicates that the assumption of normality is not violated.
To conclude, considering the large sample size and the observations from the
several analyses above, the data fulfil the assumption of normality.
2. Assumptions for t-test
T-tests are used when we have two groups (e.g. males and females) or two sets of
data (before and after), and we wish to compare the mean score on some
continuous variable. There are two main types of t-tests:
(i) Paired sample t-tests
(also called repeated measures) are used when we are interested in changes
in scores for participants tested at Time 1, and then again at Time 2 (often
after some intervention or event). The samples are ‘related’ because they are
the same people tested each time.
(ii) Independent sample t-tests
are used when we have two different (independent) groups of people (males
and females), and we are interested in comparing their scores. In this case,
we collect information on only one occasion but from two different sets of
people.
2.1 Independent-samples t-test
An independent-samples t-test compares the means of two groups on a single
interval or ratio variable.
Example of research question:
Is the urban population more financially satisfied than the rural population?
This exercise uses one categorical independent variable with only two groups (e.g.
strata: urban/rural) and one continuous dependent variable (e.g. totfinwell).
Respondents can belong to only one group.
2.1.1 Assumptions
(i) Level of measurement
- the dependent variable is continuous (interval or ratio data)
(ii) Random sampling
- the data are assumed to be obtained using a random sample from the population
(iii) Independence of observations
- the observations must be independent of one another, i.e. not influenced by any other observation
(iv) Normal distribution
- the populations from which the samples are taken are assumed to be normally distributed
(v) Homogeneity of variance
- the samples are assumed to be taken from populations of equal variance
2.2 Hypothesis
Null hypothesis (H0): there is no difference between the mean financial wellbeing
of urban residents and that of rural residents.
Alternate hypothesis (HA): there is a difference between the mean financial
wellbeing of urban residents and that of rural residents.
2.3 IV and DV
The IV is a categorical variable (strata: urban and rural area), and the DV is a
continuous variable (financial wellbeing, totfinwell).
2.4 The result
Group Statistics
            strata   N     Mean    Std. Deviation   Std. Error Mean
totfinwell  urban    480   75.53   17.042           .778
            rural    320   72.86   17.833           .997
Independent Samples Test (totfinwell)
                                            Equal variances   Equal variances
                                            assumed           not assumed
Levene's Test for Equality of Variances
  F                                         1.474
  Sig.                                      .225
t-test for Equality of Means
  t                                         2.132             2.112
  df                                        798               662.218
  Sig. (2-tailed)                           .033              .035
  Mean Difference                           2.671             2.671
  Std. Error Difference                     1.253             1.264
  95% Confidence Interval of the Difference
    Lower                                   .211              .188
    Upper                                   5.130             5.154
2.4 Interpretation of the result
2.4.1 Check assumptions using table of Independent Samples Test
- Result of Levene’s test for equality of variance
In Levene’s test, the result is not significant: Sig. = 0.225, which is > 0.05.
Equal variances can therefore be assumed, and the analysis uses the t-test result
in the first row (equal variances assumed): t = 2.132 and p = 0.033.
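As a cross-check, both rows of the Independent Samples Test table can be approximately reproduced from the Group Statistics alone. The sketch below is not SPSS's own routine: the function name is hypothetical, the inputs are the rounded means and standard deviations from the table, and the p-value uses a normal approximation to the t distribution (an assumption that is reasonable only because the degrees of freedom are so large).

```python
from math import sqrt
from statistics import NormalDist

def t_test_from_summary(m1, s1, n1, m2, s2, n2, equal_variances=True):
    """Return (t, df, standard error, approximate two-tailed p)."""
    if equal_variances:
        df = n1 + n2 - 2
        pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
        se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    else:
        v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
        se = sqrt(v1 + v2)
        # Welch-Satterthwaite degrees of freedom
        df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    t = (m1 - m2) / se
    # Assumption: with df this large, t is close to a standard normal.
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, se, p_approx

levene_sig = 0.225                       # from the table; > .05 means use row 1
urban = (75.53, 17.042, 480)
rural = (72.86, 17.833, 320)

t, df, se, p = t_test_from_summary(*urban, *rural,
                                   equal_variances=levene_sig > 0.05)
print(f"t = {t:.2f}, df = {df}, SE = {se:.3f}, p = {p:.3f}")

t_w, df_w, se_w, p_w = t_test_from_summary(*urban, *rural, equal_variances=False)
print(f"Welch: t = {t_w:.2f}, df = {df_w:.1f}, SE = {se_w:.3f}")
```

Small differences in the last decimal place from the SPSS output are expected, because the inputs here are already rounded.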
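2.4.2 The t-test result in row 1 shows that p = 0.033, which is < 0.05, thus the
null hypothesis is rejected. (The value 0.225 is the significance of Levene’s test,
not of the t-test.)
Therefore, it can be concluded that there is a significant difference in the mean
scores of financial wellbeing between urban residents (M = 75.53, SD = 17.042) and
rural residents (M = 72.86, SD = 17.833).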
2.4.3 The effect size (eta squared) can be calculated:
Eta squared = t^2 / (t^2 + (N1 + N2 - 2))
            = (2.132)^2 / ((2.132)^2 + (480 + 320 - 2))
            = 4.545 / 802.545
            = 0.0057
The effect size, or the magnitude of the difference, is very small. Only 0.57 per cent
of the variance (effect size x 100) in financial wellbeing is explained by residential
area. The strength of the difference is very low.
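The eta squared arithmetic is easy to check by evaluating t^2 / (t^2 + (N1 + N2 - 2)) with the values from the Independent Samples Test table. The function name below is hypothetical; the formula is the standard one for an independent-samples t-test.

```python
def eta_squared(t, n1, n2):
    """Proportion of variance in the DV explained by group membership."""
    return t ** 2 / (t ** 2 + (n1 + n2 - 2))

eta = eta_squared(2.132, 480, 320)
print(f"eta squared = {eta:.4f} ({eta * 100:.2f} per cent of variance)")
# eta squared = 0.0057 (0.57 per cent of variance)
```

By the commonly cited guidelines (.01 small, .06 moderate, .14 large), a value of this size is below even the threshold for a small effect.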