SPSS Supplement Guide - Pindling.org · Available for Microsoft® Windows® and Macintosh®, the...
SPSS Supplement Guide
Version Date: July 2007
Courtney A. Pindling, PhD
CONTENT
Introduction
Descriptive Statistics
  Scales of Measurements
  Frequency Distributions
  Variable Measurement Level
  Central Tendency
  Measure of Dispersion
  Box Plots
  The Standard Score
  The Normal Distribution
  Correlation
Inference Statistics
  Introduction
  Hypothesis Testing - One-sample case for the mean
  Hypothesis Testing - Two-sample case for the mean
  Hypothesis Testing - Correlated samples case for the mean
  Simple Linear Regression
  Chi-square Tests of Association
Appendix - Statistical Tables
  Z-score Probability Distribution Table (cumulative)
  Values of t at the 0.05 and 0.01 level: Two-tailed
  Values of t at various significance levels: One-tailed
  Chi-square Table
  Critical Values for Correlation Coefficient, r
Introduction
The purpose of this guide is to help students get started using SPSS. Topics
presented in this guide are: introduction to SPSS, installation of software, the
Data/Variable Views, handling data, getting help, SPSS basic menus, and examples of using
SPSS for analysis.
SPSS solutions have been assisting college and university administrators for more
than 37 years. The software is used at thousands of colleges and universities worldwide
in a wide variety of disciplines. These solutions assist faculty and administrators in
several key areas:
SPSS was originally designed for use by social scientists to analyze data from
surveys. Over the years it has grown to include a wide range of techniques, which are
outlined briefly below. SPSS for Windows is a version of this statistical package that is
especially configured to work in the Windows operating environment. This version has a
wide range of statistical procedures and also a selection of high-resolution graphics
facilities. It also has links to many other packages. The MS-Windows version surrounds
this core with an extensive help and menu system.
The limit on the size of problem that can be tackled is essentially dependent on
the amount of RAM or virtual memory available on your machine. There are effectively
no limits on the number of variables or cases that the program can handle.
Students in advanced quantitative courses or those conducting graduate-level
research need a powerful statistics package to get results. That's why the SPSS Graduate
Pack includes the full version of SPSS Base, two add-on modules, and, for Windows®
users, software for structural equation modeling (SEM), giving students advanced
statistics and techniques they can't find in most student software packages.
Recommended Graduate Pack (http://spss.com/gradpack/):
The SPSS Graduate Pack provides the most complete tool set for use in your
advanced courses. Use the SPSS Graduate Pack for topics such as:
1. Quantitative methods
2. Research methods
3. Educational administration
4. Nursing research
5. And many more
Students can get the best information from any dataset using the in-depth statistics
of the SPSS Graduate Pack, from basic statistics such as counts and crosstabs to advanced
procedures, including general linear models, linear mixed models, binomial and
multinomial logistic regression, and structural equation models.
Available for Microsoft® Windows® and Macintosh®, the SPSS Graduate Pack
includes:
1. Full version of SPSS Base
2. Two add-on modules, SPSS Regression™ and SPSS Advanced Models™
3. Software for structural equation modeling, Amos™ (available only on Windows)
As students' analytical needs increase, the SPSS Graduate Pack grows right along
with them. They can purchase additional add-on modules and software for specialized
techniques, such as complex sampling and correspondence analysis, publication-ready
tabular reporting, and much more. Best of all, the SPSS Graduate Pack is affordable for
students: up to 85 percent off* the commercial list price of SPSS Base. Students have
the option to purchase the SPSS Graduate Pack on-line at www.journeyed.com,
www.academicsuperstore.com, www.studentdiscounts.com, or lease a copy for 6 or 12
months at www.e-academy.com.
Please note:
1. The SPSS Graduate Pack is an educational tool, not intended for commercial use.
2. This software will operate for approximately four years.
3. Technical support is limited to installation questions only.
4. The SPSS Graduate Pack is available for use in the United States and Canada only.
5. Purchase by anyone other than degree-seeking students is strictly prohibited by the license agreement.
This guide provides an overview of a first semester course in statistics using
SPSS. It is divided into several sections, one section for each topic.
The section on central tendency deals with statistics used to describe a typical
data value for a data set: the mean, median, and mode, along with the frequency
polygon (which shows the mode and sometimes the mean and median).
The section on variability deals with statistics that describe the variability of a
sample distribution with such measures as: range, standard deviation, and variance.
The standard score and normal curve allow one to make statements about how far
a data point is from its mean and estimate the probability of other points relative to the
mean.
While descriptive statistics describe data, inference statistics make predictions
or inferences about data. The t-test is a useful tool for comparing the mean of a data set
to some constant value or for comparing the means of two samples or distributions. The
t-test can compare both independent samples and correlated samples. It assumes that
the data are approximately normally distributed, but it can be used for non-normal data
sets within certain limits.
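As a rough illustration of the one-sample case, the Python sketch below computes t = (M − test value) / (S / √N) by hand and compares it with a two-tailed critical value from a t table. The ten scores and the test value of 60 are hypothetical, and 2.262 is the two-tailed critical value for α = 0.05 with 9 degrees of freedom from a standard t table; SPSS reports an exact p-value instead of relying on a table.

```python
import math
import statistics

def one_sample_t(scores, test_value):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)           # sample SD: divides by N - 1
    standard_error = sd / math.sqrt(n)
    return (mean - test_value) / standard_error, n - 1

# Hypothetical scores and test value (not from the guide's data sets)
scores = [62, 70, 65, 68, 71, 59, 66, 73, 64, 67]
t, df = one_sample_t(scores, test_value=60)

T_CRITICAL = 2.262                          # two-tailed, alpha = 0.05, df = 9
reject_null = abs(t) > T_CRITICAL           # rule b: |t| exceeds the table value
print(f"t = {t:.3f}, df = {df}, reject H0: {reject_null}")
```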
The Pearson chi-square test is used to test for association between categorical
data sets (based on observed versus expected frequencies); the data are often
nominal. This is a distribution-free analysis; however, the samples must be random
samples.
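To make "observed versus expected frequencies" concrete, the sketch below computes the Pearson chi-square statistic (without a continuity correction) for a 2 × 2 crosstab, taking each expected count as (row total × column total) / grand total. The counts are hypothetical, and 3.841 is the 0.05 critical value from a chi-square table for (rows − 1)(columns − 1) = 1 degree of freedom.

```python
def pearson_chi_square(observed):
    """Pearson chi-square statistic and df for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_square += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi_square, df

# Hypothetical crosstab: rows = group A / group B, columns = yes / no
observed = [[30, 20],
            [15, 35]]
chi_square, df = pearson_chi_square(observed)

CHI_CRITICAL = 3.841  # alpha = 0.05, df = 1
print(f"chi-square = {chi_square:.3f}, df = {df}, "
      f"association: {chi_square > CHI_CRITICAL}")
```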
The null hypothesis states that the things being compared (means or distributions)
are the same. At the 0.05 significance level (95% confidence), we reject the null
hypothesis if: a. the significance of the test (p-value) is less than 0.05, b. the absolute
value of the test statistic is greater than the table lookup (critical) value, or c. the
confidence interval for the difference does not contain zero. For a t-test these three
criteria lead to the same decision. Table 1 shows a summary of the statistics used in this
workbook.
Table 2 shows the SPSS procedures (basic SPSS menu steps) commonly used in a
first-semester statistics course. For example, to create a new variable containing the
standard score (z-score) for each score of a variable, select Descriptive Statistics from
the SPSS Analyze menu, then Descriptives; move the variable you want standardized into
the variable selection box, check “Save standardized values as variables” in the dialog
box, and press OK.
Table 1
Statistics Summary
Statistics | Description | Remarks
Descriptive Statistics
Central Tendency | Typical value | Mean, median, mode, and frequency
Variability | Spread of distribution | Range, variance, and standard deviation (std dev.)
Standard Score | Number of standard deviations from the mean | 68%: mean ± 1 std dev.; 95%: mean ± 2 std dev.; 99.7%: mean ± 3 std dev.
Normal Distribution | Bell-shaped, symmetric distribution | Assumed by most statistics
Correlation | Measures the relationship between 2 variables | Strength and direction of relationship, r
Inference Statistics
One-sample t-test | Compares the mean of a variable to a constant value | For all three t-tests: different if p-value < 0.05; different if test statistic > table lookup value; different if confidence interval does not contain 0
Two-sample t-test (independent) | Compares two sample means | (see above)
Correlated t-test | Compares the means of two related samples | (see above)
Pearson Chi-square | Measures associations between categorical data | Nominal numbers
Table 2
SPSS Statistics Procedure Summary
Statistics | SPSS Procedure
Descriptive Statistics
Central Tendency | Analyze > Descriptive Statistics > Frequencies [Select Variable(s)] > Select Central Tendency statistics from Statistics option > OK (option to check frequency table display)
Variability | Analyze > Descriptive Statistics > Frequencies [Select Variable(s)] > Select Dispersion statistics from Statistics option > OK (option to check frequency table display)
Standard Score | Analyze > Descriptive Statistics > Descriptives [Select Variable] > Check Save standardized values as variables > OK (new variable created with z-scores)
Normal Distribution | Use standard normal probability distribution tables to find probabilities (Pr) of variable (X) from: z = (X – M)/SD
Correlation | Analyze > Correlate > Bivariate [Select at least 2 variables; Correlation (Pearson or Spearman); Significance (Two-tailed or One-tailed)] > Check Flag significant correlations > OK
Inference Statistics
One-sample t-test | Analyze > Compare Means > One-Sample T Test [Select Variable(s)] > Enter Test Value (reference mean) > OK
Two-sample t-test (independent) | Analyze > Compare Means > Independent-Samples T Test [Select Test Variable(s) and Grouping Variable] > Define Groups (option to define confidence interval) > OK
Correlated t-test | Analyze > Compare Means > Paired-Samples T Test [Select two variables] > (option to define confidence interval) > OK
Pearson Chi-square | Analyze > Nonparametric Tests > Chi-Square Test [Select at least two variables] > OK
CHAPTER ONE
Descriptive Statistics
Scales of Measurements
Measurement is defined as the assignment of numbers to objects or events
according to prescribed rules. Once assigned, these numbers have certain properties of
which we must be aware as we perform arithmetic or mathematical operations.
Nominal scale is the assignment of numbers for the sole purpose of
differentiating one object from another. Joe’s book locker is labeled 50 which
differentiates his from Sam’s whose locker is labeled 80. Assigning “1” for female and
“2” for male in order to categorize gender in a survey is a nominal assignment of a
number. Nominal assignments are not subject to arithmetic manipulation.
Ordinal scale is the assignment of numbers for the purpose of differentiating
between objects as well as showing the direction of the difference between them. The
ranking of objects or events is a good example of an ordinal scale. On a survey
questionnaire one may assign “1” for Low Sociability, “2” for Average Sociability, and
“3” for High Sociability. We can now use “more than” or “less than” to compare the
numbers. However, one cannot say that a person with an Average Sociability assignment
of “2” is twice as sociable as a person with an assignment of “1”.
Interval scale is the assignment of numbers to differentiate and assess the
amount of the difference between objects or events in equal intervals. A good example
of an interval scale is measurement of temperature on the Fahrenheit (F) or Celsius (C)
scale. We can say that a temperature increase from 20°F to 40°F is twice as large an
increase as that from 50°F to 60°F.
Ratio Scale has all the characteristics of the interval scale plus an absolute zero
point. An absolute zero point allows us to make statements involving ratios of two
numerical observations, such as “twice as long” or “half as fast”. The Fahrenheit
temperature scale has an arbitrary zero point; therefore, we cannot say that 80°F is
twice as warm as 40°F. If it takes Joe 6 minutes to run a mile and Sam 12 minutes to run
a mile, we can say that Joe is twice as fast as Sam (Sam takes twice as long) because
0 minutes is an absolute zero point; this assignment belongs to a ratio scale.
Most physical scales such as time, length, and weight are ratio scales, but very
few behavioral measurements are of this type.
Frequency Distributions
Frequency Distribution is a table constructed to show how many times a given
score or group of scores occurred in a set of data. A simple frequency distribution (see
Table 3) lists each score in order together with its frequency. When scores are grouped
into intervals showing how many scores occurred in each interval, this is called a
grouped frequency distribution (see Table 4).
Apparent Limits are the limits displayed in a grouped frequency table (Table 4,
col. 1); these limits give a reasonable range between which groups of data exist.
Real Limits (for continuous data – measurements or observations that depend
upon the accuracy of the measuring instrument) of any interval extend from ½ unit below
the apparent lower limit to ½ unit above the apparent upper limit. The real limits of the
20 – 30 interval are 19.5 and 30.5. The real lower limit is designated L and the real upper
limit is designated U.
The Midpoint, MP, of an interval is its exact center. The MP of any interval is
found by adding the apparent upper limit to the apparent lower limit and dividing by 2.
The MP of the 20 – 30 interval is 25.
Interval size is denoted by the symbol i; it is the distance between the real lower
limit and the real upper limit. The interval size is determined by subtracting L from U. It
is recommended that you group data so that there are between 8 and 15 intervals.
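The definitions of apparent limits, real limits, midpoint, and interval size can be checked with a short Python sketch. It assumes whole-number scores and intervals of ten (20–29, 30–39, ...); the function name and the score list are hypothetical, not taken from the guide's data sets.

```python
from collections import Counter

def grouped_frequency(scores, low=20, width=10):
    """Rows of (apparent limits, f, real limits, midpoint, interval size).

    Assumes whole-number scores; intervals with no scores are skipped.
    """
    counts = Counter((s - low) // width for s in scores)
    rows = []
    for k in sorted(counts):
        lower = low + k * width                    # apparent lower limit
        upper = lower + width - 1                  # apparent upper limit
        real_l, real_u = lower - 0.5, upper + 0.5  # real limits L and U
        midpoint = (lower + upper) / 2             # MP: average of apparent limits
        size = real_u - real_l                     # i = U - L
        rows.append(((lower, upper), counts[k], (real_l, real_u), midpoint, size))
    return rows

# Hypothetical whole-number scores
scores = [28, 33, 47, 50, 52, 58, 61, 64, 64, 69, 73, 75, 82]
for row in grouped_frequency(scores):
    print(row)
```

Because every interval here has width 10, a score of 100 would fall into a separate 100–109 interval; a hand-built table can instead widen its top interval to 90–100.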
Variable Measurement Level
You can specify the level of measurement as scale (numeric data on an interval or ratio scale), ordinal, or nominal. Nominal and ordinal data can be either string (alphanumeric) or numeric. Measurement specification is relevant only for:
Custom Tables procedure and chart procedures that identify variables as scale or categorical. Nominal and ordinal are both treated as categorical. (Custom Tables is available only in the Tables add-on component.)
SPSS-format data files used with Answer Tree.
You can select one of three measurement levels:
Scale. Data values are numeric values on an interval or ratio scale--for example, age or income. Scale variables must be numeric.
Ordinal. Data values represent categories with some intrinsic order (for example, low, medium, high; strongly agree, agree, disagree, strongly disagree). Ordinal variables can be either string (alphanumeric) or numeric values that represent distinct categories (for example, 1 = low, 2 = medium, 3 = high).
Note: For ordinal string variables, the alphabetic order of string values is assumed to reflect the true order of the categories. For example, for a string variable with the values of low, medium, high, the order of the categories is interpreted as high, low, medium, which is not the correct order. In general, it is more reliable to use numeric codes to represent ordinal data.
Nominal. Data values represent categories with no intrinsic order--for example, job category or company division. Nominal variables can be either string (alphanumeric) or numeric values that represent distinct categories--for example, 1 = Male, 2 = Female.
For SPSS-format data files created in earlier versions of SPSS products, the following rules apply:
String (alphanumeric) variables are set to nominal.
String and numeric variables with defined value labels are set to ordinal.
Numeric variables without defined value labels but less than a specified number of unique values are set to ordinal.
Numeric variables without defined value labels and more than a specified number of unique values are set to scale.
Figure 1. SPSS: Help menu, measurement levels.
Frequency (f) simply indicates how many scores are located or counted in each
interval.
Number, N, is the number of scores in a distribution or the total of all the
frequencies for all the intervals. N is also called the sample size.
The figures and tables below show how to use SPSS to create a frequency
distribution table and histogram, along with some SPSS output.
Figure 2. SPSS frequency procedure: Analyze -> Descriptive Statistics -> Frequencies -> Charts -> Select Histograms
Table 3
Simple Frequency Distribution of 9th Graders (ODE)
Score | Frequency (f) | Percent | Cumulative Percent
28 | 1 | 1.1 | 1.1
33 | 1 | 1.1 | 2.1
34 | 1 | 1.1 | 3.2
37 | 1 | 1.1 | 4.3
40 | 1 | 1.1 | 5.3
47 | 3 | 3.2 | 8.5
49 | 1 | 1.1 | 9.6
50 | 4 | 4.3 | 13.8
51 | 2 | 2.1 | 16.0
52 | 1 | 1.1 | 17.0
53 | 1 | 1.1 | 18.1
54 | 2 | 2.1 | 20.2
55 | 2 | 2.1 | 22.3
56 | 2 | 2.1 | 24.5
57 | 1 | 1.1 | 25.5
59 | 3 | 3.2 | 28.7
60 | 2 | 2.1 | 30.9
61 | 4 | 4.3 | 35.1
62 | 2 | 2.1 | 37.2
63 | 1 | 1.1 | 38.3
64 | 3 | 3.2 | 41.5
65 | 3 | 3.2 | 44.7
66 | 3 | 3.2 | 47.9
67 | 3 | 3.2 | 51.1
68 | 5 | 5.3 | 56.4
69 | 4 | 4.3 | 60.6
71 | 3 | 3.2 | 63.8
72 | 3 | 3.2 | 67.0
73 | 5 | 5.3 | 72.3
74 | 3 | 3.2 | 75.5
75 | 4 | 4.3 | 79.8
77 | 2 | 2.1 | 81.9
78 | 3 | 3.2 | 85.1
79 | 1 | 1.1 | 86.2
80 | 1 | 1.1 | 87.2
81 | 2 | 2.1 | 89.4
82 | 1 | 1.1 | 90.4
83 | 2 | 2.1 | 92.6
84 | 2 | 2.1 | 94.7
85 | 1 | 1.1 | 95.7
86 | 1 | 1.1 | 96.8
95 | 1 | 1.1 | 97.9
98 | 1 | 1.1 | 98.9
100 | 1 | 1.1 | 100.0
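The Percent and Cumulative Percent columns of Table 3 follow directly from the frequencies: percent = 100 × f / N, and cumulative percent = 100 × (running total of f) / N, each rounded to one decimal place. A short Python sketch using the first six rows of Table 3 with N = 94 (the function name is illustrative):

```python
def frequency_table(freqs, n):
    """freqs: (score, f) pairs in ascending score order; n: total number of scores.

    Returns (score, f, percent, cumulative percent) rows, rounded to 1 decimal.
    """
    rows, cumulative_f = [], 0
    for score, f in freqs:
        cumulative_f += f
        percent = round(100 * f / n, 1)
        cumulative_percent = round(100 * cumulative_f / n, 1)
        rows.append((score, f, percent, cumulative_percent))
    return rows

# First six rows of Table 3 (pass9th, N = 94)
first_rows = [(28, 1), (33, 1), (34, 1), (37, 1), (40, 1), (47, 3)]
rows = frequency_table(first_rows, n=94)
for row in rows:
    print(row)
```

Accumulating frequencies rather than rounded percents avoids drifting rounding errors in the cumulative column.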
Table 4
Grouped Frequency Distribution of 9th Graders (ODE_New)
Apparent Limits (Scores) | Frequency (f) | Real Limits (L – U) | Midpoint (MP) | Interval Size (i)
90 – 100 | 3 | 89.5 – 100.5 | 95.0 | 11
80 – 89 | 7 | 79.5 – 89.5 | 84.5 | 10
70 – 79 | 8 | 69.5 – 79.5 | 74.5 | 10
60 – 69 | 11 | 59.5 – 69.5 | 64.5 | 10
50 – 59 | 9 | 49.5 – 59.5 | 54.5 | 10
40 – 49 | 3 | 39.5 – 49.5 | 44.5 | 10
30 – 39 | 2 | 29.5 – 39.5 | 34.5 | 10
20 – 29 | 1 | 19.5 – 29.5 | 24.5 | 10
Note. This grouped frequency table was created manually.
Table 5
Grouped Frequency SPSS Output
Scores | Frequency | Cum Percent
20 – 29 | 1 | 1.1
30 – 39 | 3 | 4.3
40 – 49 | 5 | 9.6
50 – 59 | 18 | 28.7
60 – 69 | 30 | 60.6
70 – 79 | 24 | 86.2
80 – 89 | 10 | 96.8
90 – 100 | 3 | 100.0
Figure 3. Histogram of grouped frequency (x-axis: grade9th; y-axis: Frequency; Mean = 61.2766, Std. Dev. = 13.5388, N = 94).
Central Tendency
Central tendency is an attempt to devise a statistical method that yields a single
value that tells us something about the typical value(s) of a distribution. The three
most common central tendency statistics are the arithmetic mean, the median, and the
mode.
The mean (arithmetic mean, or average) is the sum of all values divided by the
number of values, N. When the mean is computed from a sample (a subset of an entire
population), it is often denoted by the symbol M. When the mean is of the entire
population, it is designated by the symbol µ.
The median is a central tendency statistic that attempts to find the exact center
or mid-point of the data or scores. The median is the value that separates the upper half of
the data set or distribution from the lower half; it is often called the 50th percentile.
The mode is the statistic that shows which score(s) occur most frequently. The
mode is often found by selecting the score(s) with the highest frequency in the
simple frequency distribution table. A bimodal distribution has two modes. The figure
below shows the SPSS method: load the ODE data; Analyze -> Descriptive Statistics ->
Frequencies -> Statistics, select Mean, Median, and Mode -> select Display frequency tables, Continue.
Table 6 shows the output from an SPSS procedure used to generate the
descriptive statistics showing the mean, median, and mode.
Table 6
Descriptive Statistics on 9th Grade (ODE)
Statistics | Values
Mean | 65.86
Median | 67.00
Mode | 68, 73*
*Multiple modes exist. The smallest value is shown in the frequency statistical summary.
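Because Table 3 lists every score with its frequency, the three measures can be recomputed from it. The Python sketch below expands Table 3 back into its 94 individual scores (assuming the last rows of Table 3 are 86, 95, 98, and 100 with one score each) and should reproduce Table 6. Python's `statistics.multimode` (Python 3.8+) returns every mode rather than only the smallest.

```python
import statistics

# Table 3: (score, frequency) pairs for the pass9th variable, N = 94
TABLE_3 = [
    (28, 1), (33, 1), (34, 1), (37, 1), (40, 1), (47, 3), (49, 1), (50, 4),
    (51, 2), (52, 1), (53, 1), (54, 2), (55, 2), (56, 2), (57, 1), (59, 3),
    (60, 2), (61, 4), (62, 2), (63, 1), (64, 3), (65, 3), (66, 3), (67, 3),
    (68, 5), (69, 4), (71, 3), (72, 3), (73, 5), (74, 3), (75, 4), (77, 2),
    (78, 3), (79, 1), (80, 1), (81, 2), (82, 1), (83, 2), (84, 2), (85, 1),
    (86, 1), (95, 1), (98, 1), (100, 1),
]

# Expand the frequency table back into individual scores
scores = [score for score, f in TABLE_3 for _ in range(f)]

print("N      =", len(scores))
print("Mean   =", round(statistics.mean(scores), 2))
print("Median =", statistics.median(scores))     # average of the two middle scores
print("Modes  =", statistics.multimode(scores))  # all scores with the top frequency
```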
Figure 4. Central tendency measures of pass9th variable from ODE.
Measure of Dispersion
Variation is the fluctuation of scores about a measure of central tendency. To
describe a set of measurements accurately we need to know both the central tendency and
the measure of the variation.
The range is the simplest and most straightforward measure of variability; it is
the difference between the highest score and the lowest score. It does not, however, tell
us anything about the pattern of the distribution of data.
The variance is the average of the squared deviations of a set of data from the
mean and is often denoted by the symbol $S^2$. For a sample, the sum of squared
deviations is divided by N − 1 rather than N, which is the version SPSS reports:
$S^2 = \frac{\sum (X - M)^2}{N - 1}$
The standard deviation is the square root of the variance, that is, the square root of
the average squared deviation from the mean. It is the most widely used measure of
variability and is reported in most research statistical summaries. It is often denoted
by the symbol S (APA style reports it as SD).
Table 7 shows the measures of dispersion for the pass9th variable from the
ODE data table. The figure below shows the SPSS selection process for measures of variation:
Analyze -> Descriptive Statistics -> Frequencies -> Statistics -> select Std. deviation, Variance, Range -> Continue.
Table 7
Measures of Dispersion for pass9th Variable (ODE)
Statistics | Values
Std. Deviation | 13.61
Variance | 185.22
Range | 72.00
Note. N = 94.
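Table 7 can be reproduced the same way. In Python's `statistics` module, `variance` and `stdev` divide by N − 1 (the sample formulas that SPSS reports), while `pvariance` divides by N. The sketch below again rebuilds the 94 scores from Table 3, assuming its last rows are 86, 95, 98, and 100 with one score each.

```python
import statistics

# Table 3: (score, frequency) pairs for the pass9th variable, N = 94
TABLE_3 = [
    (28, 1), (33, 1), (34, 1), (37, 1), (40, 1), (47, 3), (49, 1), (50, 4),
    (51, 2), (52, 1), (53, 1), (54, 2), (55, 2), (56, 2), (57, 1), (59, 3),
    (60, 2), (61, 4), (62, 2), (63, 1), (64, 3), (65, 3), (66, 3), (67, 3),
    (68, 5), (69, 4), (71, 3), (72, 3), (73, 5), (74, 3), (75, 4), (77, 2),
    (78, 3), (79, 1), (80, 1), (81, 2), (82, 1), (83, 2), (84, 2), (85, 1),
    (86, 1), (95, 1), (98, 1), (100, 1),
]
scores = [score for score, f in TABLE_3 for _ in range(f)]

score_range = max(scores) - min(scores)
variance = statistics.variance(scores)          # sum of squares / (N - 1)
std_dev = statistics.stdev(scores)              # square root of the sample variance
population_var = statistics.pvariance(scores)   # sum of squares / N, for comparison

print("Range          =", score_range)
print("Variance       =", round(variance, 2))
print("Std. Deviation =", round(std_dev, 2))
print("Variance (/N)  =", round(population_var, 2))
```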
Figure 5. Variability measures of pass9th variable from ODE.
Box Plots
A box plot is a graphical chart or diagram that illustrates both the central
tendency and the variability of a data set through five values: the minimum, the 25th
percentile (25% of the data lie below this point), the median (50th percentile), the 75th
percentile, and the maximum. SPSS procedure: Analyze -> Descriptive Statistics -> Explore
-> select the variable; under Statistics, select Descriptives and Percentiles; under Plots,
select Boxplots.
Figure 6. SPSS procedure for Boxplot.
The SPSS output for Boxplot with selected statistics may look like the following
Figures and Tables:
Table 8
SPSS Output for Boxplot: Descriptive Statistics Table (Variable: VAR00001)
Statistic | Value | Std. Error
Mean | 85.7000 | .78951
95% Confidence Interval for Mean: Lower Bound | 83.9140 |
95% Confidence Interval for Mean: Upper Bound | 87.4860 |
5% Trimmed Mean | 85.6667 |
Median | 85.5000 |
Variance | 6.233 |
Std. Deviation | 2.49666 |
Minimum | 82.00 |
Maximum | 90.00 |
Range | 8.00 |
Interquartile Range | 3.75 |
Skewness | .373 | .687
Kurtosis | -.336 | 1.334
Table 9
SPSS Output Boxplot: Percentile Table (VAR00001)
Percentiles | 5 | 10 | 25 | 50 | 75 | 90 | 95
Weighted Average (Definition 1) | 82.0000 | 82.1000 | 83.7500 | 85.5000 | 87.5000 | 89.9000 | .
Tukey's Hinges | | | 84.0000 | 85.5000 | 87.0000 | |
Figure 7. SPSS Output: Boxplot diagram (VAR00001; vertical axis from 82.00 to 90.00).
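The five values behind a box plot can be approximated in Python. `statistics.quantiles` with `method="exclusive"` interpolates at position (N + 1)p in the sorted data, which appears to correspond to the "Weighted Average (Definition 1)" row of Table 9; treat that correspondence as an assumption and check it against your own SPSS output. The ten values below are hypothetical, since the VAR00001 data are not listed in the guide, and Tukey's hinges use a different rule.

```python
import statistics

def five_number_summary(values):
    """Minimum, 25th percentile, median, 75th percentile, maximum, and IQR."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return {
        "minimum": min(values),
        "q1": q1,              # 25th percentile
        "median": median,      # 50th percentile
        "q3": q3,              # 75th percentile
        "maximum": max(values),
        "iqr": q3 - q1,        # interquartile range: the length of the box
    }

# Hypothetical data (not the VAR00001 values behind Tables 8 and 9)
data = [82, 83, 84, 85, 85, 86, 86, 87, 88, 90]
summary = five_number_summary(data)
print(summary)
```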
The Standard Score
The standard score or z-score is simply a way of telling how far a score is from
the mean in standard deviation units. Knowing the z-score for a particular data point, X,
tells us not only how far that point is from the mean but also, if the distribution is
approximately normal, what percent of the distribution lies below or above X. We use the
z-score table in the Appendix to determine this percentile.
It is a bit awkward when discussing a score, X, to say that it is “2 standard
deviations above the mean” or “1.5 standard deviations below the mean.” The z-score
states the same fact more compactly: these two scores have z-scores of +2 and −1.5,
respectively.
Note that when the z-score is positive, the score is located above the mean, and when
it is negative, the score is below the mean. A z-score of zero (0) means the score equals
the mean; in a normal distribution, 50% of the data lie below it and 50% above it.
The formula for converting any score, X into its corresponding z-score is:
$z = \frac{X - M}{S}$
where
z is the z-score
X is the observed score or data point being examined
M is the mean of the distribution of scores
S is the standard deviation
From the ODE data table, we know that for the pass9th variable that M=65.86 and
S=13.61.
Arlington schools:
If we would like to know how the Arlington schools did relative to the rest of the schools
on the pass9th variable in the ODE table, we would compute Arlington’s z-score or standard
score. Arlington’s observed score, X, was 84, so its z-score is:
$z = \frac{X - M}{S} = \frac{84 - 65.86}{13.61} = 1.33$
The positive 1.33 tells us that the Arlington schools scored above the mean of all the
schools: their score is 1.33 standard deviations above the mean (84 = 65.86 + 1.33 ×
13.61). The percentile for this z-score (see the shaded cell of the z-score cumulative table
in the Appendix) is 0.9082, or 90.82% (0.9082 × 100). This means that 90.82% of the
schools had pass9th scores below that of Arlington; that is, Arlington’s pass9th score
was higher than about 90.8% of the schools.
Lima schools:
If we would like to know how the Lima schools did relative to the rest of the schools on
the pass9th variable in the ODE table, we would compute Lima’s z-score or standard
score. Lima’s observed score, X, was 40, so its z-score is:
$z = \frac{X - M}{S} = \frac{40 - 65.86}{13.61} = -1.90$
The negative 1.90 tells us that the Lima schools scored below the mean of all the schools:
their score is 1.90 standard deviations below the mean (40 = 65.86 − 1.90 ×
13.61). The percentile for this z-score (see the shaded cell of the z-score cumulative table
in the Appendix) is 0.028716493, or about 2.87%. This means that 2.87% of the schools had
pass9th scores below that of Lima; that is, Lima’s pass9th score was higher than only
2.87% of the schools. One may also state that 97.13% (100 − 2.87) of the schools scored
higher than Lima on the pass9th variable.
SPSS can generate the standard scores for you (see Figure 8) with the following
procedure: Analyze -> Descriptive Statistics -> Descriptives -> select the variable and check
Save standardized values as variables. SPSS will create a new variable containing the
z-score for each value of the selected variable.
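The two worked examples can be checked in Python, with `statistics.NormalDist` (Python 3.8+) standing in for the cumulative z-score table. This sketch uses the pass9th mean and standard deviation given above and, like the table lookup, assumes the scores are approximately normal.

```python
from statistics import NormalDist

M, S = 65.86, 13.61  # pass9th mean and standard deviation (ODE)

def z_score(x, mean=M, sd=S):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

standard_normal = NormalDist()  # mean 0, standard deviation 1
for school, x in [("Arlington", 84), ("Lima", 40)]:
    z = round(z_score(x), 2)        # round to 2 decimals, as the printed table does
    below = standard_normal.cdf(z)  # proportion of a normal distribution below z
    print(f"{school}: z = {z:+.2f}, percent below = {100 * below:.2f}%")
```

With the rounding to two decimals, the printed percentages are about 90.82% for Arlington and 2.87% for Lima, matching the table lookups above.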
The Normal Distribution
Many physiological and psychological measurements are normally distributed;
that is, a graph of the measurements looks like the familiar symmetrical, bell-shaped
distribution shown below. If we know or can assume that our data are similar to a
normal distribution, then there are many statistical conclusions or predictions we can
make about our data. Many statistical calculations assume that the data are normally
distributed. If the data are not normally distributed, we can use non-parametric statistical
tests to draw conclusions or make predictions about our data.
There is a whole family of normal curves, depending on the sample or population
mean, M, and the sample or population standard deviation, S. The normal curve can also
be drawn with z-scores instead of the actual data values.
Figure 8. SPSS procedure: Standard scores
Figure 9. IQ Normal Curve.
Figure 10. Standard Normal Curve (z-scores).
One important characteristic of the normal curve is that the area under the curve
tells us about the probability of the set of data we are studying.
The area under the normal curve to the left of z = 0 is 50%. This is indicated in
the z-score table in the Appendix as 0.50. The area under the normal curve to the left of
z = +1.0 is 84.13%, or 0.8413, from the z-score cumulative probability distribution
table in the Appendix.
Figure 11. Area under curve when z = 0.0.
Figure 12. Area under curve when z = 1.0.
Using the z-score table in the Appendix, the percent of the sample or population
between z-scores of 0 and 1.0 is 84.13% − 50% = 34.13%. The percent of the sample
distribution below a z-score of −1.0 is 15.87%. Therefore, the percent of the sample
between z-scores of −1.0 and +1.0 is 84.13% − 15.87% = 68.26%, or approximately 68%.
Note that the percent of the population or sample between z = −2 and z = +2 is
approximately 95%, and the percent between z = −3 and z = +3 is about 99.7%.
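The areas quoted above can be computed directly instead of read from the table. A minimal Python sketch using `statistics.NormalDist` (Python 3.8+); the helper name `area_between` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # z-scores: mean 0, standard deviation 1

def area_between(z_low, z_high):
    """Proportion of a normal distribution lying between two z-scores."""
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

print(f"0 to +1:  {area_between(0, 1):.4f}")
for k in (1, 2, 3):
    print(f"-{k} to +{k}: {area_between(-k, k):.4f}")
```

The exact area between −1 and +1 is 0.6827; the 68.26% obtained above comes from subtracting two rounded table entries.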
Figure 13. Area under curve when z = -1.
Figure 14. Area between z=-1 and z = +1.
Figure 15. Area between z = -2 and z = +2.
Figure 16. Area above z = 1.65.
Figure 17. SPSS procedure for creating a normal curve fit.
If we wanted to know what percent of the data lies above a z-score of 1.65, we would
observe from the z-score table in the Appendix that the probability below 1.65 is
0.9505 (95.05%). Therefore, the percentage above 1.65 is 4.95% (100 − 95.05).
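The same upper-tail lookup, and its inverse, can be sketched with the `cdf` and `inv_cdf` methods of Python's `statistics.NormalDist`. The inverse lookup finds the z-score with a given proportion of the distribution below it.

```python
from statistics import NormalDist

standard_normal = NormalDist()

# Proportion of a normal distribution above z = 1.65 (upper tail)
above = 1 - standard_normal.cdf(1.65)
print(f"Percent above z = 1.65: {100 * above:.2f}%")

# Inverse lookup: the z-score with 95% of the distribution below it
z_95 = standard_normal.inv_cdf(0.95)
print(f"z with 95% below: {z_95:.3f}")
```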
Figure 18. Normal curve fit for the pass9th variable from ODE (histogram of Passed 9th Grade with Frequency on the vertical axis; Mean = 65.86, Std. Dev. = 13.609, N = 94).
Correlation
Correlation is a statistical method that tries to determine if there is a relationship
between two variables, such as high school GPA and success in college. If there is a
relationship, we say that the two variables are correlated.
Correlation techniques indicate the strength or amount of a relationship, so that a
single value tells us how any two variables are related. This single value is called the
correlation coefficient, r. When we wish to predict scores on one variable from the
scores on another, we use another statistical technique called regression. So
correlation tells us whether a relationship exists, and regression enables us to use this
relationship to predict one variable's score, given the score on the other.
The Pearson correlation coefficient, r, ranges from −1 to +1. An r value of +1
indicates a perfect positive relationship: as the scores of one variable increase, the
scores of the other also increase (Table 10: Ability and Speed). An r value of −1
indicates a perfect negative relationship: as the scores of one variable increase, the
scores of the other decrease (Table 10: Ability and GPA).
When the r value is 0, there is no linear relationship (no correlation).
Three assumptions are made about Pearson r: 1. it requires interval or ratio data,
2. the relationship between variables must be linear, and 3. the technique requires pairs of
data values.
Table 10
Correlation Example Table
Player | Ability | GPA | Speed | Index
A | 1 | 5 | 1 | 3
B | 2 | 4 | 2 | 2
C | 3 | 3 | 3 | 4
D | 4 | 2 | 4 | 2
E | 5 | 1 | 5 | 3
Figure 19. Positive correlation: Ability vs. Speed (scatter plot, Ability on the horizontal axis).
Figure 20. Negative correlation: Ability vs. GPA (scatter plot, Ability on the horizontal axis).
The sign of the correlation coefficient tells us whether one variable increases or
decreases as the other increases. The size (absolute value) of r indicates the
strength of the relationship.
Figure 21. No correlation: Ability vs. Index (scatter plot, Ability on the horizontal axis).
The Pearson r values for the data above are shown below:
Table 11
Correlation Matrix for Correlation Example
Variable | Ability | GPA | Speed | Index
Ability | 1 | -1** | 1** | 0
GPA | | 1 | -1** | 0
Speed | | | 1 | 0
Index | | | | 1
**Correlation is significant at the 0.01 level (2-tailed).
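Table 11 can be checked by hand. The Python sketch below applies the raw-score form of Pearson's formula, r = [NΣXY − ΣXΣY] / √([NΣX² − (ΣX)²][NΣY² − (ΣY)²]), to the Table 10 data. It computes only r, not the significance flags (**).

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient via the raw-score formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Table 10 data (players A to E)
ability = [1, 2, 3, 4, 5]
gpa     = [5, 4, 3, 2, 1]
speed   = [1, 2, 3, 4, 5]
index   = [3, 2, 4, 2, 3]

for name, other in [("GPA", gpa), ("Speed", speed), ("Index", index)]:
    print(f"r(Ability, {name}) = {pearson_r(ability, other):+.2f}")
```

With integer data the numerator is computed exactly, so Ability vs. Index comes out as exactly 0.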
If the relationship between the variables is linear, we may use the Pearson
correlation coefficient, r. For relationships that are monotonic but not linear, or for
ranked (ordinal) data, we may use the Spearman rank correlation coefficient, rs.
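To see why rs suits monotonic but nonlinear data, the sketch below ranks both variables (tied values receive the average of their ranks) and then applies Pearson's formula to the ranks, which is the usual definition of Spearman's rs. The cubic relationship is a hypothetical example, not data from the guide.

```python
import math

def ranks(values):
    """Rank values from 1 upward; tied values get the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1   # sorted positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson_r(x, y):
    """Pearson correlation coefficient via the raw-score formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx, syy = sum(a * a for a in x), sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def spearman_rs(x, y):
    """Spearman rank correlation: Pearson r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]   # hypothetical: monotonic but not linear
print(f"Pearson r = {pearson_r(x, y):.4f}, Spearman rs = {spearman_rs(x, y):.4f}")
```

For this curve, Pearson's r is about 0.94 while rs is exactly 1, because the relationship, though not linear, never changes direction.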
The significance of the correlation coefficient, r, depends on the sample
size and the level of confidence one wishes to have in the correlation coefficient. In the
SPSS correlation output, this is reported as the significance level, or p-value. If the
p-value is small (less than 0.05, for 95% confidence), then the correlation is
significant, and we conclude that the two variables are linearly related (especially so for
the Pearson r). If the p-value is 0.05 or larger, the correlation is not significant, and the
data do not provide evidence that the two variables are linearly related.
Many textbooks use the following scheme to interpret the size of the correlation coefficient:
Table 12
Interpretations for Correlation Coefficient
Correlation Coefficient value (absolute)   Interpretation
0.80 or higher                             Very Strong
0.60 to 0.80                               Strong
0.40 to 0.60                               Moderate
0.20 to 0.40                               Low
0.20 or lower                              Very Low
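The scheme in Table 12 can be written as a small lookup function. This is a sketch that assumes the cutoffs apply to the absolute value of r and that a value falling exactly on a boundary (such as 0.60) goes to the higher category, since the table's ranges share their endpoints.

```python
def interpret_r(r):
    """Strength label for a correlation, using the Table 12 cutoffs on |r|."""
    size = abs(r)
    if size >= 0.80:
        return "Very Strong"
    if size >= 0.60:
        return "Strong"
    if size >= 0.40:
        return "Moderate"
    if size >= 0.20:
        return "Low"
    return "Very Low"

# A few coefficients taken from Table 13
for r in (0.863, 0.774, -0.687, 0.422, -0.31):
    print(r, interpret_r(r))
```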
Figure 22. SPSS correlation procedure: Analyze -> Correlate (Pearson).
Table 13
Correlation Matrix for the First 25 Cases of CPS50
Variable                        (1)      (2)       (3)       (4)       (5)       (6)
(1) Independent living scale    1        0.774**   0.37      0.623**   0.707**   -0.687**
(2) Self confidence score                1         0.559**   0.764**   0.863**   -0.720**
(3) Academic aptitude test                         1         0.552**   0.422*    -0.405*
(4) Personal adjustment scale                                1         0.669**   -0.31
(5) Social skills inventory                                            1         -0.618**
(6) Age of student                                                               1
** Correlation is significant at the 0.01 level (2-tailed).
* Correlation is significant at the 0.05 level (2-tailed).
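The asterisks in Table 13 can be checked by hand with the usual t test for a correlation coefficient, $t = r\sqrt{N-2}/\sqrt{1-r^2}$ with df = N - 2. The sketch below assumes N = 25 (the first 25 cases) and uses the two-tailed critical values for df = 23 from the Appendix table. SPSS reports exact p-values instead, so this is an approximate cross-check rather than a reproduction of its output.

```python
import math

def r_to_t(r, n):
    """t statistic for testing H0: population correlation = 0 (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

n = 25             # Table 13 uses the first 25 cases
t_crit_05 = 2.069  # two-tailed, alpha = 0.05, df = 23 (Appendix)
t_crit_01 = 2.807  # two-tailed, alpha = 0.01, df = 23 (Appendix)

for r in (0.37, 0.422, 0.559):
    t = r_to_t(r, n)
    if abs(t) > t_crit_01:
        verdict = "significant at 0.01"
    elif abs(t) > t_crit_05:
        verdict = "significant at 0.05"
    else:
        verdict = "not significant"
    print(f"r = {r}: t = {t:.3f} -> {verdict}")
```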
CHAPTER TWO
Inference Statistics
Introduction
Read the sections on Sampling Theory and Hypothesis Testing in the Appendix [not attached] for an introduction to inferential statistics. Before you perform any inferential statistical test, it is best to: a. know a little about your sample and/or population, b. have a hypothesis that you want to test, and c. decide which statistical test you will use to evaluate your hypothesis. This chapter deals with selecting appropriate statistical tests to evaluate your hypothesis or to make inferences about your sample(s).
The Student t-test and the t-distribution: The t-distribution is similar to a normal distribution and approaches the normal curve as the sample size becomes very large. The following are properties of the t-distribution:
1. There is an infinite number of Student's t distributions, one for each number of degrees of freedom, df. For the one-sample case the number of degrees of freedom is the sample size minus 1; that is, df = N - 1.
2. The Student's t distribution has the same general shape as the standard normal distribution: it is symmetric and bell-shaped with a mean of t = 0, and the total area under its curve (its probability density function) is 1. Its tails are heavier than those of the standard normal curve, especially for small df.
3. As the sample size gets larger, the curve gets closer to the standard normal curve. Therefore, as a common rule of thumb, z-score statistics can be used when the sample size is 30 or more.
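Property 3 can be seen directly in the Appendix table of two-tailed critical values. The sketch below copies a few of the α = 0.05 entries and compares them with the corresponding standard normal value, obtained here from Python's standard-library `statistics.NormalDist`; the gap shrinks steadily as df grows.

```python
from statistics import NormalDist

# Two-tailed critical t values at alpha = 0.05, copied from the Appendix table
t_crit = {1: 12.706, 5: 2.571, 10: 2.228, 30: 2.042, 60: 2.000, 100: 1.984}

# Standard normal value leaving 0.025 in each tail (about 1.96)
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)

for df, t in t_crit.items():
    print(f"df = {df:>3}: t = {t:.3f}, t - z = {t - z_crit:.3f}")
```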
Figure 23. Normal and t-distribution curves.
Hypothesis Testing - One-sample case for the mean
One-Sample Case: If you wish to compare the mean of your sample with a test value (a constant), you use the one-sample t-test, which is based on the t distribution. For this test you will need the sample mean, M; the sample standard deviation, S; the sample size, N; and the value to which you want the sample mean compared.
Hypothesis: Your null hypothesis, H0 (the statement being tested), is that the mean equals the test value. Your alternate hypothesis, Ha, is that the mean differs from the test value.
H0: M = E, where E is your test value
Ha: M ≠ E
Statistics Used: Once you select the appropriate t-test method from your SPSS program,
it automatically calculates these statistics for you:
1. Degrees of freedom: $df = N - 1$
2. Standard error of the mean: $S_{\bar{X}} = \frac{S}{\sqrt{N}}$
3. Test statistic, t: $t = \frac{M - E}{S_{\bar{X}}}$
4. Alpha level: α = 0.05 for a 95% confidence statement and α = 0.01 for 99%
5. The critical t-value, $t_{\alpha}$: the table lookup value for the given alpha (e.g., 0.05) and df
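Significance level (Sig., or p-value, in the SPSS output): the probability of obtaining a test statistic at least as extreme as the one observed if the null hypothesis were true.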
Acceptance and Rejection Criteria:
Reject the null hypothesis for any of the following reasons:
1. The significance level < alpha or p-value < 0.05
2. The absolute value of the t-test (calculated) > t-value (table lookup)
3. The confidence interval for the mean difference does not contain zero.
Do not reject (accept) the null hypothesis for any of the following reasons:
1. The significance level > alpha or p-value > 0.05
2. The absolute value of the t-test (calculated) < t-value (table lookup)
3. The confidence interval for the mean difference contains zero
The SPSS output for the one-sample t-test provides sufficient information for you to make a decision about your null hypothesis. The following is an example of the one-sample t-test.
The SPSS procedure to calculate the one-sample t-test for a single variable and compare it with a constant value is: Analyze -> Compare Means -> One-Sample T Test -> enter the Test Value, and under Options set the Confidence Interval to 95% (for an alpha, or significance level, of 0.05).
Figure 24. SPSS one-sample t-test procedure
Example 1: Is the mean of the pass9th variable 70? Since we wish to compare the mean of a single variable with a constant value of 70, we can use the one-sample t-test with an alpha (α) of 0.05 (95% confidence). When we performed this analysis using SPSS, we obtained the following results:
The null hypothesis is H0: mean of pass9th = 70, and the alternate hypothesis is Ha: mean of pass9th ≠ 70. SPSS computes t from the difference (sample mean - test value); because the sample mean is below 70, the t-value is negative, so we compare its absolute value with the table value.
Table 14
SPSS One-Sample t-test of pass9th Variable: Basic Statistics
Variable           N    Mean    Std. Deviation   Std. Error Mean
Passed 9th Grade   94   65.86   13.609           1.404
Table 15
SPSS One-Sample t-test of pass9th Variable: t-test Analysis for Test Value of 70
Test Value = 70
                                                                        95% Confidence Interval
                                                                        of the Difference
Variable           t        df   Sig. (2-tailed)   Mean Difference   Lower    Upper
Passed 9th Grade   -2.948   93   .004              -4.138            -6.93    -1.35
Making inference or Results of the test:
The critical t-value from the two-tailed t-distribution table in the Appendix is 1.987 (df = 90, the closest tabled value below df = 93; α = 0.05). Since the following is observed in Table 15, we reject the null hypothesis in favor of the alternate hypothesis that there is a significant difference between the test value of 70 and the sample mean for the pass9th variable. Because the mean difference is negative, we can also say that the mean for the pass9th variable, 65.86, is significantly less than 70. (If a one-tailed test of Ha: mean < 70 were intended instead, the critical value from the one-tailed table would be 1.661 and the one-tailed p-value would be 0.004 / 2 = 0.002; the conclusion is the same.)
Reject the null hypothesis for the following reasons (any one):
1. The significance p-value = 0.004 < 0.05 (SPSS already reports this value as 2-tailed)
2. The absolute value of the t-test, 2.948, > t-value of 1.987 (Appendix)
3. The confidence interval (-6.93 to -1.35) of the mean difference does not contain zero.
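The quantities SPSS reports in Tables 14 and 15 can be reproduced from the summary statistics with the one-sample formulas listed earlier. The sketch below is a minimal illustration that assumes the rounded mean 65.86 from Table 14 (so t differs from the SPSS value in the third decimal place) and the Appendix critical value 1.987 for the confidence interval. The function name `one_sample_t` is hypothetical.

```python
import math

def one_sample_t(mean, sd, n, test_value):
    """Standard error, t statistic, and df for a one-sample t-test from summary statistics."""
    se = sd / math.sqrt(n)            # S / sqrt(N)
    t = (mean - test_value) / se      # (M - E) / SE
    return se, t, n - 1

# Table 14 summary for pass9th; the test value 70 comes from Example 1
mean, sd, n, test_value = 65.86, 13.609, 94, 70
se, t, df = one_sample_t(mean, sd, n, test_value)

# Two-tailed critical value for alpha = 0.05 from the Appendix (df = 90 row)
t_table = 1.987
diff = mean - test_value
lower, upper = diff - t_table * se, diff + t_table * se

print(f"SE = {se:.3f}, t = {t:.3f}, df = {df}")
print(f"95% CI of the difference: {lower:.2f} to {upper:.2f}")
print("Reject H0" if abs(t) > t_table else "Do not reject H0")
```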
Hypothesis Testing - Two-sample case for the mean
Two-Sample Independent Case: If you wish to compare the means of two independent groups, you use the two-sample (independent-samples) t-test, evaluated here as a two-tailed test. For this test you will need to know both sample means, M1 and M2; both sample standard deviations, S1 and S2; and the sample sizes, N1 and N2. We are assuming independence for this case, i.e., a member of one group does not influence the selection of any member of the other group.
Hypothesis: Your null hypothesis, H0: mean1 = mean2. Your alternate hypothesis, Ha is
that there is a difference in the means.
H0: M1 = M2
Statistics Used: Once you select the appropriate t-test method from your SPSS program,
it automatically calculates these statistics for you:
1. Degrees of freedom: $df = N_1 + N_2 - 2$
2. Standard error of the mean difference, $S_D$
3. Test statistic, t: $t = \frac{M_1 - M_2}{S_D}$
4. Test for Equality of Variances (Levene's Test for Equality of Variances):
a. If the significance for Levene's Test is > 0.05 (F-test: p > 0.05), use the row of information for Equal Variance.
b. If the significance for Levene's Test is < 0.05 (F-test: p < 0.05), use the row of information for Unequal Variance.
5. Alpha level: α = 0.05 for a 95% confidence statement and α = 0.01 for 99%
6. The critical t-value, $t_{\alpha}$: the table lookup value for the given alpha (e.g., 0.05) and df (from the two-tailed t-distribution table).
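Significance level (Sig., or p-value, in the SPSS output): the probability of obtaining a test statistic at least as extreme as the one observed if the null hypothesis were true.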
Acceptance and Rejection Criteria:
Reject the null hypothesis for any of the following reasons:
1. The significance level < alpha or p-value < 0.05
2. The absolute value of the t-test (calculated) > t-value (lookup)
3. The confidence interval for the mean difference does not contain zero.
Do not reject (accept) the null hypothesis for any of the following reasons:
1. The significance level > alpha or p-value > 0.05
2. The absolute value of the t-test (calculated) < t-value (table lookup)
3. The confidence interval for the mean difference contains zero
The SPSS output for the two-sample t-test will provide sufficient information for
you to make a decision about your null hypothesis. The following is an example of the
two-sample t-test.
The SPSS procedure to calculate the two-sample t-test for two independent groups is: Analyze -> Compare Means -> Independent-Samples T Test -> select the test variable, then select and define a Grouping Variable (two values) -> under Options make sure the Confidence Interval is 95% (for an alpha, or significance level, of 0.05).
Example: Are the means of the pass9th variable the same for the two values (1 and 2) of the hilo variable? Since we wish to compare two means (two groupings of the pass9th variable by the hilo variable), we can use the two-sample t-test with an alpha of 0.05 (95% confidence; 0.95 = 1.00 - 0.05). When we performed the analysis using SPSS, we obtained the following results:
Figure 25. SPSS two-sample t-test procedure for independent samples.
The null hypothesis is H0: mean1 (hilo = 1) = mean2 (hilo = 2) for the pass9th variable.
Table 16
SPSS Two-Sample t-test of pass9th Variable by Hilo: Basic Statistics
Variable           hilo   N    Mean    Std. Deviation   Std. Error Mean
Passed 9th Grade   1.00   47   62.49   14.368           2.096
                   2.00   47   69.23   12.033           1.755
Table 17
SPSS Two-Sample t-test of pass9th Variable by Hilo: t-test Analysis
                   Levene's Test for
                   Equality of Variances   t-test for Equality of Means
                                                                                             95% Confidence Interval
                   F       Sig.            t        df       Sig. (2-tailed)   Mean Diff   Std Error   Lower      Upper
Equal Variance     0.756   0.387           -2.467   92       0.015             -6.745      2.734       -12.174    -1.315
Unequal Variance                           -2.467   89.251   0.016             -6.745      2.734       -12.176    -1.313
Making inference or Results of the test:
Since the significance level of Levene's F-test, 0.387, is > 0.05, we assume that the variances are equal and use the Equal Variance row to make our inference about the means. (Notice that t and its standard error are identical in both rows: because the two groups have the same size, 47 each, the pooled and separate-variance standard errors are equal, and only the df and p-values differ slightly.)
The critical t-value from the two-tailed t-distribution table in the Appendix is 1.987 (df = 90, the closest tabled value below df = 92), and since the following is observed in Table 17, we reject the null hypothesis in favor of the alternate hypothesis that there is a significant difference between the two means for the hilo values of the pass9th variable.
We reject the null hypothesis for the following reasons:
1. The significance p-value = 0.015 < 0.05
2. The absolute value of the t-test of 2.467 > t-value of 1.987 (Appendix)
3. The confidence interval (-12.174 to -1.315) of the mean difference
does not contain zero.
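Both rows of Table 17 can be reproduced from the summary statistics in Table 16. The sketch below assumes the standard pooled-variance formula for the Equal Variance row and the Welch formulas (separate variances, Welch-Satterthwaite df) for the Unequal Variance row; it does not reproduce Levene's test, which needs the raw scores. It also shows why the two t values match here: with equal group sizes the pooled and separate-variance standard errors coincide. The function name `independent_t` is hypothetical.

```python
import math

def independent_t(m1, s1, n1, m2, s2, n2):
    """Pooled (equal-variance) and Welch (unequal-variance) t-tests from summary statistics."""
    # Pooled variance weights each group's variance by its degrees of freedom
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    se_pooled = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    # Welch keeps the two variances separate and adjusts the df
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se_welch = math.sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    diff = m1 - m2
    return {
        "equal": (diff / se_pooled, n1 + n2 - 2, se_pooled),
        "unequal": (diff / se_welch, df_welch, se_welch),
    }

# Table 16: pass9th by hilo (group 1, then group 2)
res = independent_t(62.49, 14.368, 47, 69.23, 12.033, 47)
for row, (t, df, se) in res.items():
    print(f"{row:>8}: t = {t:.3f}, df = {df:.3f}, SE = {se:.3f}")
```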
Hypothesis Testing - Correlated samples case for the mean
Correlated or Paired Samples: If you wish to compare the means of two variables that are related or correlated, you use the Paired t-test. For this test you will need to know both sample means, M1 and M2; the sample standard deviations, S1 and S2; and the sample sizes, N1 = N2 (the number of pairs). If the test for correlation shows a low correlation coefficient and its significance level is high, consider using the Independent Samples t-test.
Usually the two correlated variables represent the same group of subjects measured at different times (e.g., before and after an event or treatment), or related groups (e.g., husbands and wives, or the left and right wheels of the same automobile).
Hypothesis: Your null hypothesis, H0, is that mean1 = mean2. Your alternate hypothesis, Ha, is that there is a difference in the means.
H0: M1 = M2
Statistics Used: Once you select the appropriate t-test method from your SPSS program,
it automatically calculates these statistics for you:
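1. Degrees of freedom: $df = N - 1$, where N is the number of pairs
2. Standard error of the mean difference: $SE_{\bar{D}} = \frac{S_D}{\sqrt{N - 1}}$, where $S_D = \sqrt{\frac{\sum D^2}{N} - \bar{D}^2}$ and D is the difference between the two scores in each pair
3. Test statistic, t: $t = \frac{\bar{D}}{SE_{\bar{D}}}$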
4. Test for Correlation
a. Correlated if r is high and correlation p < 0.05
b. Use Independent Samples t-test, if r is low and correlation p > 0.05
5. Alpha level: α = 0.05 for a 95% confidence statement and α = 0.01 for 99%
6. The critical t-value, $t_{\alpha}$: the table lookup value for the given alpha (e.g., 0.05) and df (from the two-tailed t-distribution table).
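Significance level (Sig., or p-value, in the SPSS output): the probability of obtaining a test statistic at least as extreme as the one observed if the null hypothesis were true.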
Acceptance and Rejection Criteria:
Reject the null hypothesis for any of the following reasons:
1. The significance level < alpha or p-value < 0.05
2. The absolute value of the t-test (calculated) > t-value (table lookup)
3. The confidence interval for the mean difference does not contain zero.
Do not reject (accept) the null hypothesis for any of the following reasons:
1. The significance level > alpha or p-value > 0.05
2. The absolute value of the t-test (calculated) < t-value (table lookup)
3. The confidence interval for the mean difference contains zero
The SPSS output for the Paired t-test will provide sufficient information for you
to make a decision about your null hypothesis. The following is an example of the Paired
t-test for two correlated variables.
The SPSS procedure to calculate the Paired t-test for two correlated variables is: Analyze -> Compare Means -> Paired-Samples T Test -> select the two correlated variables as a pair -> under Options make sure the Confidence Interval is 95% (for an alpha, or significance level, of 0.05).
Example: Are the means of the Visual and Mathach variables in the HSB500 table the same? Since we wish to compare two means for related or correlated variables, we can use the Paired t-test with a suggested alpha of 0.05 (95% confidence). When we performed the analysis using SPSS, we obtained the following results:
The null hypothesis is that: H0 : mean1 (Visual) = mean2 (Mathach)
Table 19
SPSS Paired t-test of Visual and Mathach Variables: Basic Statistics
         Variable   Mean     N     Std. Deviation   Std. Error Mean
Pair 1   visual     5.697    500   3.887            0.174
         mathach    13.098   500   6.605            0.295
Table 20
SPSS Paired t-test Correlation Table
                            N     Correlation   Sig.
Pair 1   visual & mathach   500   .438          .000
Figure 26. SPSS Paired t-test procedure.
Table 21
SPSS Paired t-test Analysis Table
                   Paired Differences
                                                                95% Confidence Interval
                                                                of the Difference
                   Mean     Std. Deviation   Std. Error Mean   Lower    Upper     t         df    Sig. (2-tailed)
visual - mathach   -7.401   6.018            0.269             -7.930   -6.872    -27.499   499   .000
Making inference from the t-test Analysis
The critical t-value from the two-tailed t-distribution table in the Appendix is 1.960 (the row for very large df; here df = 499), and since the following is observed in Table 21, we reject the null hypothesis in favor of the alternate hypothesis that there is a significant difference between the means of the visual and mathach variables.
We reject the null hypothesis for the following reasons:
1. The significance p-value, reported by SPSS as .000 (that is, p < 0.001), is < 0.05
2. The absolute value of the t-test of 27.499 > t-value of 1.960 (Appendix)
3. The confidence interval (-7.930 to -6.872) of the mean difference
does not contain zero.
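The standard-error formula listed for the paired case divides a divide-by-N standard deviation of the differences by $\sqrt{N-1}$; SPSS instead reports the usual sample standard deviation (divide by N - 1) in Table 21 and divides it by $\sqrt{N}$. The sketch below, using a few hypothetical differences, shows that the two forms give the same standard error, then recomputes t from the Table 21 summary. It also rebuilds the standard deviation of the differences from Tables 19 and 20 with $\sqrt{S_1^2 + S_2^2 - 2rS_1S_2}$, which shows why a positive correlation within the pairs makes the paired test more sensitive. The function names are hypothetical.

```python
import math

def se_textbook(diffs):
    """SE as listed for the paired case: S_D / sqrt(N - 1), S_D = sqrt(sum(D^2)/N - mean(D)^2)."""
    n = len(diffs)
    mean_d = sum(diffs) / n
    s_d = math.sqrt(sum(d * d for d in diffs) / n - mean_d ** 2)
    return s_d / math.sqrt(n - 1)

def se_sample_sd(diffs):
    """Same SE written with the sample SD (divisor N - 1), as SPSS reports it: SD / sqrt(N)."""
    n = len(diffs)
    mean_d = sum(diffs) / n
    sd = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
    return sd / math.sqrt(n)

# Hypothetical paired differences, used only to compare the two forms
diffs = [2, -1, 3, 4, 0, 2]
print(se_textbook(diffs), se_sample_sd(diffs))

# Table 21 summary: mean difference, SD of the differences, number of pairs
mean_d, sd_d, n_pairs = -7.401, 6.018, 500
se = sd_d / math.sqrt(n_pairs)
t = mean_d / se
print(f"SE = {se:.3f}, t = {t:.2f}, df = {n_pairs - 1}")

# SD of the differences rebuilt from Tables 19 and 20
s1, s2, r = 3.887, 6.605, 0.438
sd_from_r = math.sqrt(s1 ** 2 + s2 ** 2 - 2 * r * s1 * s2)
print(f"SD of differences from r: {sd_from_r:.3f}")
```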
Simple Linear Regression
If two variables are correlated, we can find a formula that predicts the value of one variable given the value of the other. The variable that is used for prediction is called the independent variable and is usually denoted X. The predicted variable, Y, is called the dependent variable because its predicted value depends on the other variable, X.
A linear equation is the formula of a straight line and is given by y = mx + b, where m is the slope of the line, b is the y-intercept (the value of y when x = 0, that is, the value of the dependent variable when the independent variable is zero), and x and y are the independent and dependent variables, respectively. If the graph of two related variables looks like a straight line, we say that they appear to be linear (linearity exists between them).
The purpose of linear regression is to find an equation of the best straight line that
represents the linear relationship between both sets of data. Knowing the linear formula
of a pair of data sets allows us to predict the value of one variable given the value of the
other.
Assumptions for the Pearson r in Regression
There are four assumptions needed before one can meaningfully apply the linear
regression model:
1. Both variables must be correlated, r must be significant
2. The relationship between both variables must be linear (close to a straight line)
3. Both variables are fairly normally distributed (in the population)
4. The standard deviation of Y about the regression line is about the same for every value of X (homoscedasticity)
The SPSS procedure for Linear Regression (HWJ100 table) is: Analyze -> Regression -> Linear -> select the Dependent and Independent variables.
Figure 27. Regression SPSS procedure: Equation.
The outputs of the regression analysis are many, and they tell us different things about the analysis. The Regression Summary (Table 22) shows the Pearson correlation coefficient, r. Since r is high, 0.838, the variables are strongly linearly related, and the Coefficient of Determination, $r^2$, is 0.703. The Coefficient of Determination tells us how much of the variation in the dependent variable, Y, is accounted for by its linear relationship with the independent variable, X. An $r^2$ of 0.703 tells us that 70.3% of the variation in Verbal scores is associated with differences in GPA; the remaining 29.7% is associated with other factors. Tables 22 and 23 indicate that GPA is the independent (predictor) variable and Verbal is the dependent variable.
Table 22
SPSS Regression Table: Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .838(a)   .703       .700                7.29910
a Predictors: (Constant), GPA
The regression coefficients table, Table 23, tells us several other things. The t value for GPA, 15.221, is high, and its significance level, .000, is < 0.05. This indicates that GPA significantly predicts the Verbal scores. (The t value of 8.676 in the Constant row tests whether the intercept differs from zero.) Rule of thumb: t values greater than 2 or less than -2 indicate that the independent variable is a significant predictor.
The linear regression model is given by the B column of Table 23: the intercept, b, is the Constant, 40.540, and the slope, m, is the B value for GPA, 25.011. So the equation of this regression line is:
Verbal = 40.540 + 25.011(GPA)
So a GPA of 2.1 would predict a Verbal score of 40.540 + 25.011(2.1) = 93.063.
Table 23
SPSS Regression Table: Coefficients
                 Unstandardized Coefficients   Standardized Coefficients
Model            B        Std. Error           Beta                        t        Sig.
1 (Constant)     40.540   4.673                                            8.676    .000
  gpa            25.011   1.643                .838                        15.221   .000
a Dependent Variable: Verbal
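The "best straight line" in linear regression is the least-squares line, the one that minimizes the sum of squared vertical distances from the points to the line. The sketch below computes its slope and intercept for a few hypothetical points, then applies the Table 23 coefficients to reproduce the GPA = 2.1 prediction. The function names `least_squares` and `predict_verbal` are illustrative only.

```python
def least_squares(x, y):
    """Slope m and intercept b of the least-squares line y = m*x + b."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (c - mean_y) for a, c in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    m = sxy / sxx              # slope
    b = mean_y - m * mean_x    # the line passes through (mean_x, mean_y)
    return m, b

# Hypothetical points lying exactly on y = 2x + 1
m, b = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(f"slope = {m}, intercept = {b}")

def predict_verbal(gpa, intercept=40.540, slope=25.011):
    """Prediction from the Table 23 coefficients: Verbal = 40.540 + 25.011(GPA)."""
    return intercept + slope * gpa

print(f"Predicted Verbal for GPA 2.1: {predict_verbal(2.1):.3f}")
```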
Figure 28. SPSS procedure: Analyze -> Regression -> Curve Estimation.
The Analysis of Variance (ANOVA) table, Table 24, tells us how well the regression model predicts the Verbal variable. An F-test significance level of .000 < 0.05 (p-value < α) shows that the regression model predicts the outcome, Verbal, given GPA, significantly better than a model without GPA. Other information in Table 24 also reflects the significance of the model: a high Sum of Squares for the Regression row relative to the Residual row, and a high F statistic.
Table 24
SPSS Regression Analysis: ANOVA
Model            Sum of Squares   df   Mean Square   F         Sig.
1 Regression     12343.457        1    12343.457     231.685   .000(a)
  Residual       5221.133         98   53.277
  Total          17564.590        99
a Predictors: (Constant), GPA
b Dependent Variable: Verbal
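Several numbers in Tables 22 and 24 follow directly from the sums of squares. The sketch below assumes N = 100 cases (the total df of 99 plus 1) and one predictor, and recomputes R Square, Adjusted R Square, F, and the Std. Error of the Estimate. It also checks that, with a single predictor, F equals the square of the GPA t value from Table 23, up to rounding.

```python
import math

ss_reg, ss_res = 12343.457, 5221.133   # Table 24 sums of squares
n, k = 100, 1                          # cases (total df + 1) and number of predictors

ss_total = ss_reg + ss_res
r_squared = ss_reg / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
ms_res = ss_res / (n - k - 1)          # Residual Mean Square
f_stat = (ss_reg / k) / ms_res
see = math.sqrt(ms_res)                # Std. Error of the Estimate

print(f"R^2 = {r_squared:.3f}, adjusted R^2 = {adj_r_squared:.3f}")
print(f"F = {f_stat:.3f}, Std. Error of the Estimate = {see:.4f}")
print(f"t for GPA squared = {15.221 ** 2:.1f}")  # equals F when there is one predictor
```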
Figure 29. SPSS: Linear plot of GPA and Verbal (observed values and fitted linear line).
Chi-square Tests of Association
The Chi-square test is a nonparametric statistical test that does not assume any particular distribution for the data (the data can be non-normal); it requires only that the sample be random. Like correlation, it examines the association between variables: it tests whether an association exists by comparing the actual (observed) numbers in each group (category) with the numbers expected theoretically or by chance. The Chi-square test requires that the data be expressed as frequencies (numbers in each category); this is the nominal level of measurement.
The reliability of the Chi-square test requires that the expected frequencies in each category be not less than 5 (a common rule of thumb allows no more than 20% of the cells below 5). The categories must be mutually exclusive; that is, no observation should fall into more than one category.
We will illustrate the Chi-square test of association with two examples. The first example tests whether there is any relationship or association between a college student's sex (gender) and his or her father's level of education (students attending college, from the HSB500 table). Before we perform the analysis, what would you expect the outcome to be? The second example tests whether there is any relationship between a college student's sex and grade index.
Example 1: Is there a relationship between sex (0, 1) and father's education (2 - 10)?
A summary of the number of cases (frequencies) in each category is shown below. Table 25 is called a contingency table, specifically a “9 x 2” contingency table (nine columns for the father's education variable and two rows for the sex variable). It is also called a cross-tabulation; the rows and columns are totaled.
Table 25
9 x 2 Contingency Table: Sex and Father's Education
Father's education codes: 2 = less than HS, 3 = HS grad, 4 = less than 2 yr Voc, 5 = more than 2 yr Voc, 6 = less than 2 yr Coll, 7 = more than 2 yr Coll, 8 = Coll grad, 9 = Master's, 10 = MD/PhD
         Father's education
Sex      2     3     4    5    6    7    8    9    10   Total
0        72    73    12   22   14   20   41   15   11   280
1        60    56    2    16   13   6    35   21   11   220
Total    132   129   14   38   27   26   76   36   22   500
The contingency table for these data was generated with SPSS from the HSB500 table. The SPSS procedure for a simple Chi-square association analysis is shown below. Note that there are other types of Chi-square analysis beyond the scope of this text.
SPSS Chi-square procedure: Analyze -> Descriptive Statistics -> Crosstabs -> select row variable
(sex) and column variable(s) (faed) -> under Statistics, check Chi-square.
The output for the Chi-square association analysis is listed in Table 26. Notice the following:
1. The Pearson Chi-square statistic, 13.465, is less than the table (critical) value $\chi^2_{1-\alpha}$: 13.465 < 15.51 (df = 8, α = 0.05). So we do not reject the null hypothesis: there is no evidence of an association between sex and father's education.
2. The significance level, 0.097, is > alpha, 0.05.
3. No (0%) expected frequency is less than 5 (20% maximum allowed).
Figure 30. SPSS Procedure: Chi-square for association.
Table 26
Chi-square Analysis: Sex and Father's Education
                               Value       df   Asymp. Sig. (2-sided)
Pearson Chi-square             13.465(a)   8    .097
Likelihood Ratio               14.467      8    .070
Linear-by-Linear Association   .543        1    .461
N of Valid Cases               500
a 0 cells (.0%) have expected count less than 5. The minimum expected count is 6.16.
Example 2: Is sex related to students’ grade index in the HSB500 table?
The cross-tabulation or 7 x 2 contingency table (7 columns and 2 rows of
frequencies associated with grades and sex) is shown below:
         Grades
Sex      2    3    4    5     6     7     8    Total
0        1    12   26   57    69    79    36   280
1        2    14   40   47    52    42    23   220
Total    3    26   66   104   121   121   59   500
The Chi-square analysis shows that a person's sex (gender) is associated with their grade index, because Pearson’s Chi-square value is 13.987 > 12.59 ($\chi^2_{1-\alpha}$ with df = 6 and α = 0.05, i.e., $\chi^2_{0.95}$). The significance level is 0.030 < α = 0.05, so there is an association between these two variables. Note that fewer than 20% of the expected frequencies (14.3%) are less than 5.
Table 27
Chi-square Analysis: Sex and Grades
                               Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square             13.987(a)   6    .030
Likelihood Ratio               14.014      6    .029
Linear-by-Linear Association   10.505      1    .001
N of Valid Cases               500
a 2 cells (14.3%) have expected count less than 5. The minimum expected count is 1.32.
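Both Chi-square statistics can be recomputed from the contingency tables. For each cell, the expected count is (row total x column total) / grand total, and the statistic sums (observed - expected)² / expected over all cells, with df = (rows - 1)(columns - 1). The sketch below is a minimal illustration of that calculation; it uses the critical values 15.51 and 12.59 quoted in the text and does not compute the Likelihood Ratio or Linear-by-Linear rows. The function name `chi_square` is hypothetical.

```python
def chi_square(table):
    """Pearson chi-square, df, and % of cells with expected count < 5 for observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat, small_cells = 0.0, 0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
            if expected < 5:
                small_cells += 1
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    pct_small = 100 * small_cells / (len(row_totals) * len(col_totals))
    return stat, df, pct_small

# Table 25: sex (rows 0 and 1) by father's education (codes 2 to 10)
faed = [[72, 73, 12, 22, 14, 20, 41, 15, 11],
        [60, 56, 2, 16, 13, 6, 35, 21, 11]]
# Example 2: sex (rows 0 and 1) by grades (codes 2 to 8)
grades = [[1, 12, 26, 57, 69, 79, 36],
          [2, 14, 40, 47, 52, 42, 23]]

# Critical values at alpha = 0.05, as quoted in the text
critical = {8: 15.51, 6: 12.59}

for name, table in (("father's education", faed), ("grades", grades)):
    stat, df, pct = chi_square(table)
    verdict = "association" if stat > critical[df] else "no evidence of association"
    print(f"{name}: chi-square = {stat:.3f}, df = {df}, "
          f"cells with expected < 5 = {pct:.1f}% -> {verdict}")
```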
Appendix - Statistical Tables
Z-score Probability Distribution Table (cumulative)
z 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
-3.6
0.000159146
0.0001531
0.00014734
0.00014175
0.00013635
0.00013115
0.00012614
0.00012131
0.00011665
0.00011216
-3.5
0.000232673
0.0002241
0.00021582
0.00020782
0.0002001
0.00019266
0.00018547
0.00017853
0.00017184
0.00016538
-3.4
0.000336981
0.0003249
0.00031316
0.00030184
0.00029091
0.00028034
0.00027013
0.00026028
0.00025075
0.00024156
-3.3
0.000483483
0.0004665
0.00045014
0.00043429
0.00041895
0.00040411
0.00038977
0.00037589
0.00036248
0.00034952
-3.2
0.000687202
0.0006637
0.00064102
0.00061901
0.00059771
0.00057709
0.00055712
0.0005378
0.0005191
0.000501
-3.1
0.000967671
0.0009355
0.00090432
0.0008741
0.00084481
0.00081642
0.00078891
0.00076226
0.00073644
0.00071143
-3
0.001349967
0.0013063
0.00126394
0.00122284
0.00118296
0.00114428
0.00110675
0.00107036
0.00103507
0.00100085
-2.9
0.00186588
0.0018072
0.00175022
0.00169488
0.00164113
0.00158894
0.00153826
0.00148907
0.00144131
0.00139496
-2.8
0.002555191
0.0024771
0.00240124
0.00232746
0.00225574
0.00218603
0.00211827
0.00205242
0.00198844
0.00192628
-2.7
0.003467023
0.0033642
0.00326415
0.00316677
0.00307201
0.00297982
0.00289012
0.00280287
0.002718
0.00263546
-2.6
0.004661222
0.0045271
0.00439653
0.00426928
0.00414534
0.00402463
0.00390708
0.00379261
0.00368115
0.00357265
-2.5
0.00620968
0.0060366
0.00586776
0.00570315
0.00554265
0.00538617
0.00523363
0.00508495
0.00494005
0.00479883
-2.4
0.008197529
0.0079763
0.00776025
0.00754941
0.00734363
0.00714281
0.00694686
0.00675566
0.00656913
0.00638717
-2.3
0.010724081
0.0104441
0.01017041
0.00990305
0.00964185
0.00938669
0.00913745
0.00889403
0.00865631
0.00842418
-2.2
0.013903399
0.0135525
0.01320934
0.01287368
0.01254542
0.01222443
0.01191059
0.01160376
0.01130381
0.01101063
59
z 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
-2.1
0.017864357
0.0174291
0.01700296
0.01658575
0.01617733
0.01577755
0.01538628
0.01500337
0.01462868
0.01426207
-2
0.022750062
0.0222155
0.02169162
0.0211782
0.02067509
0.02018215
0.0196992
0.01922611
0.0187627
0.01830884
-1.9
0.028716493
0.0280665
0.02742888
0.02680335
0.02618978
0.02558799
0.02499783
0.02441912
0.02385169
0.0232954
-1.8
0.035930266
0.0351478
0.03437945
0.03362491
0.03288406
0.03215671
0.0314427
0.03074184
0.03005397
0.02937891
-1.7
0.044565432
0.0436329
0.04271618
0.0418151
0.04092947
0.04005911
0.03920386
0.03836352
0.03753793
0.0367269
-1.6
0.054799289
0.0536989
0.05261613
0.05155074
0.05050257
0.04947145
0.04845721
0.04745966
0.04647863
0.04551395
-1.5
0.066807229
0.0655217
0.06425551
0.06300838
0.06178019
0.06057077
0.05937995
0.05820756
0.05705344
0.0559174
-1.4
0.080756711
0.0792699
0.07780389
0.07635856
0.07493374
0.0735293
0.07214508
0.07078091
0.06943666
0.06811215
-1.3
0.096800549
0.095098
0.09341757
0.0917592
0.09012273
0.08850805
0.08691502
0.08534351
0.08379338
0.08226449
-1.2
0.115069732
0.1131395
0.1112325
0.10934862
0.10748776
0.10564984
0.10383475
0.10204238
0.10027263
0.09852539
-1.1
0.135666102
0.1334996
0.13135693
0.12923816
0.1271432
0.12507199
0.12302446
0.12100054
0.11900017
0.11702326
-1
0.15865526
0.1562477
0.15386424
0.15150502
0.14916997
0.14685908
0.14457233
0.14230969
0.14007112
0.13785661
-0.9
0.184060092
0.1814112
0.17878635
0.17618552
0.17360876
0.17105611
0.1685276
0.16602324
0.16354306
0.16108706
-0.8
0.211855334
0.20897
0.20610799
0.20326933
0.20045414
0.19766249
0.19489447
0.19215016
0.18942961
0.18673291
-0.7
0.241963578
0.238852
0.23576242
0.23269502
0.22964992
0.22662728
0.22362722
0.22064988
0.21769537
0.21476382
-0.6
0.274253065
0.2709308
0.26762883
0.26434723
0.26108623
0.25784604
0.25462685
0.25142882
0.24825216
0.24509702
-0.5
0.308537533
0.3050257
0.30153177
0.29805594
0.29459849
0.29115966
0.28773968
0.28433881
0.28095726
0.27759528
-0.4
0.344578303
0.340903
0.33724276
0.33359785
0.32996858
0.32635524
0.32275813
0.31917752
0.3156137
0.31206695
-0.3
0.382088643
0.3782805
0.37448423
0.37070005
0.36692833
0.36316941
0.35942363
0.3556913
0.35197276
0.34826832
-0.2
0.420740313
0.4168339
0.41293561
0.40904593
0.40516518
0.40129373
0.39743194
0.39358019
0.38973881
0.38590818
-0.1
0.460172104
0.4562046
0.45224153
0.44828318
0.44432997
0.44038229
0.43644053
0.43250507
0.42857629
0.42465458
60
z 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0 0.5 0.5039894
0.50797835
0.51196653
0.5159535
0.51993887
0.52392225
0.52790324
0.53188144
0.53585646
0.1
0.539827896
0.5437954
0.54775847
0.55171682
0.55567003
0.55961771
0.56355947
0.56749493
0.57142371
0.57534542
0.2
0.579259687
0.5831661
0.58706439
0.59095407
0.59483482
0.59870627
0.60256806
0.60641981
0.61026119
0.61409182
0.3
0.617911357
0.6217195
0.62551577
0.62929995
0.63307167
0.63683059
0.64057637
0.6443087
0.64802724
0.65173168
0.4
0.655421697
0.659097
0.66275724
0.66640215
0.67003142
0.67364476
0.67724187
0.68082248
0.6843863
0.68793305
0.5
0.691462467
0.6949743
0.69846823
0.70194406
0.70540151
0.70884034
0.71226032
0.71566119
0.71904274
0.72240472
0.6
0.725746935
0.7290692
0.73237117
0.73565277
0.73891377
0.74215396
0.74537315
0.74857118
0.75174784
0.75490298
0.7
0.758036422
0.761148
0.76423758
0.76730498
0.77035008
0.77337272
0.77637278
0.77935012
0.78230463
0.78523618
0.8
0.788144666
0.79103
0.79389201
0.79673067
0.79954586
0.80233751
0.80510553
0.80784984
0.81057039
0.81326709
0.9
0.815939908
0.8185888
0.82121365
0.82381448
0.82639124
0.82894389
0.8314724
0.83397676
0.83645694
0.83891294
1
0.84134474
0.8437523
0.84613576
0.84849498
0.85083003
0.85314092
0.85542767
0.85769031
0.85992888
0.86214339
1.1
0.864333898
0.8665004
0.86864307
0.87076184
0.8728568
0.87492801
0.87697554
0.87899946
0.88099983
0.88297674
1.2
0.884930268
0.8868605
0.8887675
0.89065138
0.89251224
0.89435016
0.89616525
0.89795762
0.89972737
0.90147461
1.3
0.903199451
0.904902
0.90658243
0.9082408
0.90987727
0.91149195
0.91308498
0.91465649
0.91620662
0.91773551
1.4
0.919243289
0.9207301
0.92219611
0.92364144
0.92506626
0.9264707
0.92785492
0.92921909
0.93056334
0.93188785
1.5
0.933192771
0.9344783
0.93574449
0.93699162
0.93821981
0.93942923
0.94062005
0.94179244
0.94294656
0.9440826
1.6
0.945200711
0.9463011
0.94738387
0.94844926
0.94949743
0.95052855
0.95154279
0.95254034
0.95352137
0.95448605
1.7
0.955434568
0.9563671
0.95728382
0.9581849
0.95907053
0.95994089
0.96079614
0.96163648
0.96246207
0.9632731
1.8
0.964069734
0.9648522
0.96562055
0.96637509
0.96711594
0.96784329
0.9685573
0.96925816
0.96994603
0.97062109
1.9
0.971283507
0.9719335
0.97257112
0.97319665
0.97381022
0.97441201
0.97500217
0.97558088
0.97614831
0.9767046
2
0.977249938
0.9777845
0.97830838
0.9788218
0.97932491
0.97981785
0.9803008
0.98077389
0.9812373
0.98169116
2.1
0.982135643
0.9825709
0.98299704
0.98341425
0.98382267
0.98422245
0.98461372
0.98499663
0.98537132
0.98573793
2.2
0.986096601
0.9864475
0.98679066
0.98712632
0.98745458
0.98777557
0.98808941
0.98839624
0.98869619
0.98898937
61
2.3
0.989275919
0.9895559
0.98982959
0.99009695
0.99035815
0.99061331
0.99086255
0.99110597
0.99134369
0.99157582
2.4
0.991802471
0.9920237
0.99223975
0.99245059
0.99265637
0.99285719
0.99305314
0.99324434
0.99343087
0.99361283
z 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
2.5  0.99379032  0.9939634   0.99413224  0.99429685  0.99445735  0.99461383  0.99476637  0.99491505  0.99505995  0.99520117
2.6  0.995338778 0.9954729   0.99560347  0.99573072  0.99585466  0.99597537  0.99609292  0.99620739  0.99631885  0.99642735
2.7  0.996532977 0.9966358   0.99673585  0.99683323  0.99692799  0.99702018  0.99710988  0.99719713  0.997282    0.99736454
2.8  0.997444809 0.9975229   0.99759876  0.99767254  0.99774426  0.99781397  0.99788173  0.99794758  0.99801156  0.99807372
2.9  0.99813412  0.9981928   0.99824978  0.99830512  0.99835887  0.99841106  0.99846174  0.99851093  0.99855869  0.99860504
3.0  0.998650033 0.9986937   0.99873606  0.99877716  0.99881704  0.99885572  0.99889325  0.99892964  0.99896493  0.99899915
3.1  0.999032329 0.9990645   0.99909568  0.9991259   0.99915519  0.99918358  0.99921109  0.99923774  0.99926356  0.99928857
3.2  0.999312798 0.9993363   0.99935898  0.99938099  0.99940229  0.99942291  0.99944288  0.9994622   0.9994809   0.999499
3.3  0.999516517 0.9995335   0.99954986  0.99956571  0.99958105  0.99959589  0.99961023  0.99962411  0.99963752  0.99965048
3.4  0.999663019 0.9996751   0.99968684  0.99969816  0.99970909  0.99971966  0.99972987  0.99973972  0.99974925  0.99975844
3.5  0.999767327 0.9997759   0.99978418  0.99979218  0.9997999   0.99980734  0.99981453  0.99982147  0.99982816  0.99983462
3.6  0.999840854 0.9998469   0.99985266  0.99985825  0.99986365  0.99986885  0.99987386  0.99987869  0.99988335  0.99988784
3.7  0.99989217  0.9998963   0.99990036  0.99990423  0.99990796  0.99991156  0.99991502  0.99991835  0.99992156  0.99992465
3.8  0.999927628 0.9999305   0.99993325  0.99993591  0.99993846  0.99994092  0.99994329  0.99994556  0.99994775  0.99994986
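Note. Entries are cumulative probabilities P(Z ≤ z) of the standard normal distribution, generated from the standard normal formula. The row gives z to one decimal place and the column adds the second decimal (e.g., row 2.0, column 0.05 is P(Z ≤ 2.05)).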
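As a cross-check, each entry is the standard normal cumulative probability Φ(z), which can be written with the error function as Φ(z) = ½[1 + erf(z/√2)]. The minimal Python sketch below uses only the standard library; the helper names `phi` and `table_row` are illustrative and are not part of SPSS. Its results should agree with the table entries to about seven decimal places.

```python
import math


def phi(z: float) -> float:
    """Standard normal cumulative probability P(Z <= z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def table_row(z_row: float) -> list[float]:
    """Entries for z_row + 0.00, z_row + 0.01, ..., z_row + 0.09 (one table row)."""
    return [phi(round(z_row + k / 100, 2)) for k in range(10)]


if __name__ == "__main__":
    # Print a few rows in the same layout as the table above
    for z in (1.6, 2.0, 3.0):
        print(f"{z:.1f}  " + "  ".join(f"{p:.8f}" for p in table_row(z)))
```

Reading the table works the same way in reverse: the upper-tail area beyond z is 1 − Φ(z).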
Values of t at the 0.05 and 0.01 Levels: Two-Tailed

df     α = 0.05   α = 0.01
1      12.706     63.657
2      4.303      9.925
3      3.183      5.841
4      2.777      4.604
5      2.571      4.032
6      2.447      3.707
7      2.365      3.500
8      2.306      3.355
9      2.262      3.250
10     2.228      3.169
11     2.201      3.106
12     2.179      3.055
13     2.160      3.012
14     2.145      2.977
15     2.132      2.947
16     2.120      2.921
17     2.110      2.898
18     2.101      2.879
19     2.093      2.861
20     2.086      2.845
21     2.080      2.831
22     2.074      2.819
23     2.069      2.807
24     2.064      2.797
25     2.060      2.787
26     2.056      2.779
27     2.052      2.771
28     2.048      2.763
29     2.045      2.756
30     2.042      2.750
40     2.021      2.705
50     2.009      2.678
60     2.000      2.660
70     1.994      2.648
80     1.990      2.639
90     1.987      2.632
100    1.984      2.626
∞      1.960      2.576
Adapted from Sockloff, A., & Edney, J. (1972). Some extension of Student’s t and Pearson’s r central distributions, Technical Report (May 1972). Measurement and Research, Temple University, Philadelphia.
Values of t at Various Significance Levels: One-Tailed

       Significance level, p
df     0.40    0.25    0.10    0.05    0.025   0.01    0.005   0.0025  0.001   0.0005
1      0.325   1.000   3.078   6.314   12.706  31.821  63.656  127.321 318.289 636.578
2      0.289   0.816   1.886   2.920   4.303   6.965   9.925   14.089  22.328  31.600
3      0.277   0.765   1.638   2.353   3.182   4.541   5.841   7.453   10.214  12.924
4      0.271   0.741   1.533   2.132   2.776   3.747   4.604   5.598   7.173   8.610
5      0.267   0.727   1.476   2.015   2.571   3.365   4.032   4.773   5.894   6.869
6      0.265   0.718   1.440   1.943   2.447   3.143   3.707   4.317   5.208   5.959
7      0.263   0.711   1.415   1.895   2.365   2.998   3.499   4.029   4.785   5.408
8      0.262   0.706   1.397   1.860   2.306   2.896   3.355   3.833   4.501   5.041
9      0.261   0.703   1.383   1.833   2.262   2.821   3.250   3.690   4.297   4.781
10     0.260   0.700   1.372   1.812   2.228   2.764   3.169   3.581   4.144   4.587
11     0.260   0.697   1.363   1.796   2.201   2.718   3.106   3.497   4.025   4.437
12     0.259   0.695   1.356   1.782   2.179   2.681   3.055   3.428   3.930   4.318
13     0.259   0.694   1.350   1.771   2.160   2.650   3.012   3.372   3.852   4.221
14     0.258   0.692   1.345   1.761   2.145   2.624   2.977   3.326   3.787   4.140
15     0.258   0.691   1.341   1.753   2.131   2.602   2.947   3.286   3.733   4.073
16     0.258   0.690   1.337   1.746   2.120   2.583   2.921   3.252   3.686   4.015
17     0.257   0.689   1.333   1.740   2.110   2.567   2.898   3.222   3.646   3.965
18     0.257   0.688   1.330   1.734   2.101   2.552   2.878   3.197   3.610   3.922
19     0.257   0.688   1.328   1.729   2.093   2.539   2.861   3.174   3.579   3.883
20     0.257   0.687   1.325   1.725   2.086   2.528   2.845   3.153   3.552   3.850
21     0.257   0.686   1.323   1.721   2.080   2.518   2.831   3.135   3.527   3.819
22     0.256   0.686   1.321   1.717   2.074   2.508   2.819   3.119   3.505   3.792
23     0.256   0.685   1.319   1.714   2.069   2.500   2.807   3.104   3.485   3.768
24     0.256   0.685   1.318   1.711   2.064   2.492   2.797   3.091   3.467   3.745
25     0.256   0.684   1.316   1.708   2.060   2.485   2.787   3.078   3.450   3.725
26     0.256   0.684   1.315   1.706   2.056   2.479   2.779   3.067   3.435   3.707
27     0.256   0.684   1.314   1.703   2.052   2.473   2.771   3.057   3.421   3.689
28     0.256   0.683   1.313   1.701   2.048   2.467   2.763   3.047   3.408   3.674
29     0.256   0.683   1.311   1.699   2.045   2.462   2.756   3.038   3.396   3.660
30     0.256   0.683   1.310   1.697   2.042   2.457   2.750   3.030   3.385   3.646
40     0.255   0.681   1.303   1.684   2.021   2.423   2.704   2.971   3.307   3.551
50     0.255   0.679   1.299   1.676   2.009   2.403   2.678   2.937   3.261   3.496
60     0.254   0.679   1.296   1.671   2.000   2.390   2.660   2.915   3.232   3.460
70     0.254   0.678   1.294   1.667   1.994   2.381   2.648   2.899   3.211   3.435
80     0.254   0.678   1.292   1.664   1.990   2.374   2.639   2.887   3.195   3.416
90     0.254   0.677   1.291   1.662   1.987   2.368   2.632   2.878   3.183   3.402
100    0.254   0.677   1.290   1.660   1.984   2.364   2.626   2.871   3.174   3.390
110    0.254   0.677   1.289   1.659   1.982   2.361   2.621   2.865   3.166   3.381
120    0.254   0.677   1.289   1.658   1.980   2.358   2.617   2.860   3.160   3.373
∞      0.253   0.674   1.282   1.645   1.960   2.326   2.576   2.807   3.090   3.290
Note. Generated from the t distribution (one-tailed) at the given significance levels. For a two-tailed test at level α, use the column for α/2.
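The one-tailed and two-tailed critical values can also be approximated outside SPSS. For an integer number of degrees of freedom, P(|T| < t) has a classical finite series in θ = arctan(t/√df), and a critical value is the t at which the tail area equals α (one-tailed) or α/2 in each tail (two-tailed). The Python sketch below implements that series and finds the critical value by bisection. It assumes integer df ≥ 1, and the function names are hypothetical helpers rather than SPSS features.

```python
import math


def t_central_prob(t: float, df: int) -> float:
    """P(|T| < t) for Student's t with integer df >= 1 (closed-form finite series)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 0:
        # Even df: sin(theta) * [1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ...] up to cos^(df-2)
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (2 * k - 1) / (2 * k) * c * c
            total += term
        return s * total
    # Odd df: (2/pi) * [theta + sin(theta) * (cos + (2/3)cos^3 + ...)] up to cos^(df-2)
    total = 0.0
    if df > 1:
        term = total = c
        for k in range(1, (df - 1) // 2):
            term *= (2 * k) / (2 * k + 1) * c * c
            total += term
    return (2.0 / math.pi) * (theta + s * total)


def t_upper_tail(t: float, df: int) -> float:
    """One-tailed area P(T > t) for t >= 0."""
    return (1.0 - t_central_prob(t, df)) / 2.0


def t_critical(alpha: float, df: int, tails: int = 2) -> float:
    """Critical t whose tail area is alpha (tails=1) or alpha/2 in each tail (tails=2)."""
    target = alpha / tails
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, df) > target:  # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):  # bisection: the tail area decreases as t grows
        mid = (lo + hi) / 2.0
        if t_upper_tail(mid, df) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    print(round(t_critical(0.05, 10), 3))           # two-tailed, df = 10
    print(round(t_critical(0.05, 10, tails=1), 3))  # one-tailed, df = 10
```

For example, `t_critical(0.05, 10)` gives about 2.228 (the two-tailed table, df = 10) and `t_critical(0.05, 10, tails=1)` about 1.812 (the one-tailed table). A few entries in the two-tailed table, such as df = 3, may differ in the last digit, which probably reflects rounding in the source table.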
Chi-square Table

       Cumulative probability (1 − α)
df     0.005     0.01      0.025     0.05      0.10      0.90      0.95      0.975     0.99      0.995
1      0.000039  0.00016   0.00098   0.0039    0.0158    2.71      3.84      5.02      6.63      7.88
2      0.01      0.0201    0.0506    0.1026    0.2107    4.61      5.99      7.38      9.21      10.6
3      0.0717    0.115     0.216     0.352     0.584     6.25      7.81      9.35      11.34     12.84
4      0.207     0.297     0.484     0.711     1.064     7.78      9.49      11.14     13.28     14.86
5      0.412     0.554     0.831     1.15      1.61      9.24      11.07     12.83     15.09     16.75
6      0.676     0.872     1.24      1.64      2.2       10.64     12.59     14.45     16.81     18.55
7      0.989     1.24      1.69      2.17      2.83      12.02     14.07     16.01     18.48     20.28
8      1.34      1.65      2.18      2.73      3.49      13.36     15.51     17.53     20.09     21.96
9      1.73      2.09      2.7       3.33      4.17      14.68     16.92     19.02     21.67     23.59
10     2.16      2.56      3.25      3.94      4.87      15.99     18.31     20.48     23.21     25.19
11     2.6       3.05      3.82      4.57      5.58      17.28     19.68     21.92     24.73     26.76
12     3.07      3.57      4.4       5.23      6.3       18.55     21.03     23.34     26.22     28.3
13     3.57      4.11      5.01      5.89      7.04      19.81     22.36     24.74     27.69     29.82
14     4.07      4.66      5.63      6.57      7.79      21.06     23.68     26.12     29.14     31.32
15     4.6       5.23      6.26      7.26      8.55      22.31     25        27.49     30.58     32.8
16     5.14      5.81      6.91      7.96      9.31      23.54     26.3      28.85     32        34.27
18     6.26      7.01      8.23      9.39      10.86     25.99     28.87     31.53     34.81     37.16
20     7.43      8.26      9.59      10.85     12.44     28.41     31.41     34.17     37.57     40
24     9.89      10.86     12.4      13.85     15.66     33.2      36.42     39.36     42.98     45.56
30     13.79     14.95     16.79     18.49     20.6      40.26     43.77     46.98     50.89     53.67
40     20.71     22.16     24.43     26.51     29.05     51.81     55.76     59.34     63.69     66.77
60     35.53     37.48     40.48     43.19     46.46     74.4      79.08     83.3      88.38     91.95
120    83.85     86.92     91.58     95.7      100.62    140.23    146.57    152.21    158.95    163.64
Note. Generated from the chi-square distribution. Each column heading is the cumulative probability 1 − α, so each entry is the value of χ² with that proportion of the distribution below it.
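The chi-square entries can be reproduced in the same spirit: the cumulative probability of a chi-square variable with df degrees of freedom is the regularized lower incomplete gamma function P(df/2, x/2). The sketch below evaluates that function with its power series and inverts it by bisection, so `chi2_critical(p, df)` corresponds to the table column headed p = 1 − α. It is a minimal standard-library illustration that assumes moderate df (as in the table), not a production-grade routine, and the function names are hypothetical.

```python
import math


def chi2_cdf(x: float, df: int) -> float:
    """P(X <= x) for a chi-square variable with df degrees of freedom.

    Uses the power series of the regularized lower incomplete gamma
    function P(a, z) with a = df/2 and z = x/2.
    """
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-15 and n < 10_000:  # stop once terms are negligible
        n += 1
        term *= z / (a + n)
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def chi2_critical(p: float, df: int) -> float:
    """Value x with P(X <= x) = p, where p is the table's 1 - alpha column heading."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < p:  # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):  # bisection on the increasing CDF
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    print(round(chi2_critical(0.95, 10), 2))  # df = 10, column 0.95
```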
Critical Values for the Correlation Coefficient, r

             Level of Significance (p) for a Two-Tailed Test
df (n − 2)   0.10    0.05    0.02    0.01
1            0.988   0.997   0.9995  0.9999
2            0.9     0.95    0.98    0.99
3            0.805   0.878   0.934   0.959
4            0.729   0.811   0.882   0.917
5            0.669   0.754   0.833   0.874
6            0.622   0.707   0.789   0.834
7            0.582   0.666   0.75    0.798
8            0.549   0.632   0.716   0.765
9            0.521   0.602   0.685   0.735
10           0.497   0.576   0.658   0.708
11           0.476   0.553   0.634   0.684
12           0.458   0.532   0.612   0.661
13           0.441   0.514   0.592   0.641
14           0.426   0.497   0.574   0.623
15           0.412   0.482   0.558   0.606
16           0.4     0.468   0.542   0.59
17           0.389   0.456   0.528   0.575
18           0.378   0.444   0.516   0.561
19           0.369   0.433   0.503   0.549
20           0.36    0.423   0.492   0.537
21           0.352   0.413   0.482   0.526
22           0.344   0.404   0.472   0.515
23           0.337   0.396   0.462   0.505
24           0.33    0.388   0.453   0.496
25           0.323   0.381   0.445   0.487
26           0.317   0.374   0.437   0.479
27           0.311   0.367   0.43    0.471
28           0.306   0.361   0.423   0.463
29           0.301   0.355   0.416   0.456
30           0.296   0.349   0.409   0.449
35           0.275   0.325   0.381   0.418
40           0.257   0.304   0.358   0.393
45           0.243   0.288   0.338   0.372
50           0.231   0.273   0.322   0.354
60           0.211   0.25    0.295   0.325
70           0.195   0.232   0.274   0.303
80           0.183   0.217   0.256   0.283
90           0.173   0.205   0.242   0.267
100          0.164   0.195   0.23    0.254
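Critical values of r follow from the two-tailed t table, assuming the usual t test of a Pearson correlation with df = n − 2: since t = r√df / √(1 − r²), solving for r gives r = t / √(t² + df). The short Python sketch below applies that conversion; `critical_r` is a hypothetical helper, and the t values in the example are taken from the α = 0.05 column of the two-tailed t table.

```python
import math


def critical_r(t_crit: float, df: int) -> float:
    """Convert a two-tailed critical t with df = n - 2 into a critical Pearson r."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


if __name__ == "__main__":
    # Two-tailed critical t values at alpha = 0.05, read from the t table
    for df, t in [(1, 12.706), (10, 2.228), (30, 2.042)]:
        print(df, round(critical_r(t, df), 3))
```

With df = 10 and t = 2.228 this gives about 0.576, matching the df = 10 entry in the 0.05 column of the r table.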