Exercise Guideline
A Getting familiar with SPSS
B1 Entering data by hand
B2 Using “Variable View”
B3 Creating a frequency table
C Creating a histogram
D Creating a boxplot
E Calculating mean, mode and median
F Calculating measures of spread
Exercise 1
A. Getting familiar with SPSS
> Turn on computer and screen.
> Enter username and password to log in.
> Look up SPSS 14 (etc.) within the RUG menu (within Mathematics &
Statistics) and launch SPSS.
In case SPSS has not been installed on your machine yet, you get a window saying
that you have to restart your computer. Do that. Otherwise, SPSS may have
problems running properly.
> Once SPSS is running, you are offered a menu with choices. Click on
“cancel”.
Now you are in the Data Editor, the window of SPSS in which you can enter data
and work on them. It is a spreadsheet of the kind you may know from other
applications. At the top you find the name of the data file you are working with,
which at this moment is still: Untitled1 [DataSet0].
In the Data Editor, each (vertical) column of numbers represents a variable. Each
variable is given a name, which appears on the top of the column. Use meaningful
names, such as LENGTH, and not something like X24A06.
Each (horizontal) row represents a case. A case is a series of observations
belonging together, such as the answers of a respondent to the questions in a
questionnaire, or different values measured on the same subject of the experiment.
For instance, if you have 32 respondents, then you need 32 rows for the 32 cases. If
the questionnaire contained 40 questions, then you most probably need 40
columns, and so you have 40 variables. (Next week, we learn how to calculate
new, derivative variables from existing ones.)
The Data Editor is composed of two parts: the Data View and the Variable View.
By clicking on the tabs at the bottom left of the window you can switch between
them.
The Variable View offers an overview of your variables, and you can also define
some features of these variables. The most important features are:
1. Name: the name of the variable.
2. Type: defines the type of the variable. Some of the types offered by SPSS:
a. Numeric: the usual way of rendering numbers (e.g., 12345,67). This is what
Moore & McCabe refer to as quantitative.
b. Comma: comma before each group of three digits, dot before decimal digits
(e.g., 12,345.67).
c. Dot: dot before each group of three digits, comma before decimal digits (e.g.,
12.345,67).
d. String: any textual information (e.g., answers to an open question).
3. Width: the number of positions available in the Data View window.
4. Decimals: the number of decimal digits after the comma/dot.
5. Label: text providing more information about the variable.
6. Values: texts providing information about each value of the variable.
7. Missing: the value used to denote missing values (e.g., “no answer”).
8. Columns: the width of the column in the Data View window.
9. Measure: the “measurement scale” of the variable (nominal, ordinal or scale, the
last covering all types of numeric scales).
At the top of the window you find the menu of SPSS: FILE, EDIT, VIEW, DATA,
etc. All statistical calculations are found under ANALYZE, and all diagrams and
charts under GRAPHS. To calculate new variables based on the existing ones, use
the commands under TRANSFORM. The HELP menu offers further assistance,
although its explanations may prove rather terse in the beginning.
> Have a look at the different menus to get a general overview of them.
B. Entering data and creating a frequency table
The MLU (Mean Length of Utterance) measures the length of an utterance (a well-
formed sentence or a sentence-like series of words) by counting the number of
words it contains. It is an important measure of linguistic capabilities of children
acquiring a language, of patients with impaired language, but it is also useful in
identifying authors of texts.
Here are the lengths of test utterances produced by 20 patients:
3, 5, 4, 4, 10, 4, 11, 4, 4, 6, 3, 4, 4, 8, 8, 8, 5, 8, 4, 9.
> Enter these values by hand and give the variable the name MLU.
> In the Variable View, set the number of decimals to 0 (as utterance length
always has an integer value).
When you work with SPSS (as with any other application), it is good practice to
regularly save your data files. Output files are often simpler to create again, but
data files are certainly not. Moreover, SPSS 14 is not always stable, causing the
program to terminate unexpectedly. Finally, we may want to use some of the data
files during several labs.
> Therefore, save your data file to your own network drive (X:\) in a separate
folder that you create specifically for this lab.
A frequency table is a table that shows how often each value of a variable appears
among your data.
> Create a frequency table from this variable. Hint: 'Analyze', 'Descriptive Statistics', 'Frequencies'.
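For readers who want to check the SPSS output independently, the same kind of table can be sketched in a few lines of Python. This is only an illustration under assumptions: Python is not part of this lab, and the function name frequency_table is made up for this example. It counts how often each value occurs and adds percentages and cumulative percentages, similar to the columns SPSS reports.

```python
from collections import Counter

# The 20 utterance lengths listed above.
mlu = [3, 5, 4, 4, 10, 4, 11, 4, 4, 6, 3, 4, 4, 8, 8, 8, 5, 8, 4, 9]


def frequency_table(values):
    """Return rows of (value, count, percent, cumulative percent), sorted by value."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    for value in sorted(counts):
        cumulative += counts[value]
        rows.append((value, counts[value], 100 * counts[value] / n, 100 * cumulative / n))
    return rows


print(f"{'Value':>5} {'Frequency':>9} {'Percent':>8} {'Cumulative':>10}")
for value, count, pct, cum_pct in frequency_table(mlu):
    print(f"{value:>5} {count:>9} {pct:>8.1f} {cum_pct:>10.1f}")
```

If the counts printed here differ from your SPSS table, the most likely cause is a typing error in the Data View.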
During the data entry process, one quite often makes errors. Hence, it is imperative
to always check the data you have just entered. Besides rereading the numbers in
Data View, you should also look for outliers “created” by erroneous data entry: for
instance, typing too many zeros or entering two values in a single cell will create
values much greater than other values. In the present case, check if the frequency
table contains only values you remember having entered (and that make sense).
Also compare your frequency table to those of your neighbors in the lab.
> Check the frequency of each value in your frequency table together with your
neighbor.
* 1. Copy the table to your report.
* 2. How many measurements (data) do you have?
* 3. Which MLU is the second most frequent?
* 4. How often does the highest value of MLU occur?
C. Creating a histogram
A histogram (or frequency diagram) is a graph displaying how frequently each of
the possible values of a variable occurs (or how frequently values falling within a
certain range occur) among the data that have been entered.
> Create a histogram based on the variable MLU. Hint: 'Graphs', 'Histogram'.
> Do it again, but have SPSS also draw a Normal curve. Hint: mark the checkbox “Display normal curve”.
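The idea behind a histogram can also be sketched outside SPSS. The following Python fragment is a rough illustration only (the function name text_histogram is an assumption made for this example, and no Normal curve is drawn): it prints one bar of # signs per integer value between the minimum and the maximum, so that values which never occur show up as empty bars.

```python
from collections import Counter

# The 20 utterance lengths from section B.
mlu = [3, 5, 4, 4, 10, 4, 11, 4, 4, 6, 3, 4, 4, 8, 8, 8, 5, 8, 4, 9]


def text_histogram(values):
    """Return one text line per integer value from min to max, with a bar of '#' per count."""
    counts = Counter(values)  # missing values count as 0
    lines = []
    for value in range(min(values), max(values) + 1):
        lines.append(f"{value:>3} | {'#' * counts[value]}")
    return lines


print("\n".join(text_histogram(mlu)))
```

Because every integer in the range gets its own line, the vertical axis of this sketch corresponds to counts, not percentages.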
* 5. Copy this second graph to your report.
* 6. What does the vertical axis display: numbers or percentages?
* 7. What is the highest value and what is the lowest value of the variable?
* 8. How many peaks are there?
* 9. There is a gap in the graph. At what value can this gap be found? What
does this observation mean? Would you expect to find this gap if you had
many more data?
* 10. Is this distribution approximately Normal?
D. Creating a boxplot
A boxplot is another visualization of a distribution and it proves useful for other
purposes later on.
> Create a boxplot of your variable.
Hint: 'Graphs', 'Boxplot'. Choose: “Simple” and “Summaries of separate variables”.
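The numbers a boxplot is built from can be sketched in Python as follows. Two assumptions should be kept in mind: quartile conventions differ between programs, so the quartiles computed here with the default method of Python's statistics.quantiles need not match the SPSS boxplot exactly; and the 1.5 × IQR rule used for the whisker limits is the common textbook convention, assumed here for illustration. The function name boxplot_numbers is made up for this example.

```python
import statistics

# The 20 utterance lengths from section B.
mlu = [3, 5, 4, 4, 10, 4, 11, 4, 4, 6, 3, 4, 4, 8, 8, 8, 5, 8, 4, 9]


def boxplot_numbers(values):
    """Return the five-number summary plus any points beyond the 1.5 * IQR fences."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "outliers": outliers,
    }


for name, number in boxplot_numbers(mlu).items():
    print(f"{name:>8}: {number}")
```

The box spans q1 to q3, the line inside the box is the median, and points listed under outliers would be drawn separately beyond the whiskers.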
* 11. Copy this boxplot to your report.
* 12. What are the lowest and highest values according to the boxplot?
* 13. What is approximately the median according to the boxplot?
* 14. What percentage of the data lies outside of the box?
* 15. Which data are outside of the “whiskers” of the boxplot?
E. Calculating mean, mode and median
We often would like to summarize a variable as a single number that tells you
roughly where the values of that variable are located. Generally
the mean (average) is used for that purpose. Another option is to use the
mode, that is, the value that appears most frequently. One can also use
the median, the middle value when the observations are sorted from lowest to highest.
When a histogram is created, the mean is automatically calculated. The mode, the
median and the mean can also be obtained by choosing “Analyze”, “Descriptive
Statistics”, and then “Frequencies” in the menu. If you wish, uncheck the box
next to “Display Frequency Table”, and ignore the warning. Then select the mean,
the mode and the median via the “Statistics” button.
> Have SPSS calculate the mean, the mode and the median, and report them
to you in a single table.
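As a cross-check on the SPSS table, the three measures of center can be computed with Python's standard statistics module. This is a minimal sketch and not part of the lab itself; the function name center_measures is an assumption made for this example.

```python
import statistics

# The 20 utterance lengths from section B.
mlu = [3, 5, 4, 4, 10, 4, 11, 4, 4, 6, 3, 4, 4, 8, 8, 8, 5, 8, 4, 9]


def center_measures(values):
    """Return the mean, the median and the mode of a list of numbers."""
    return {
        "mean": statistics.mean(values),      # sum of the values divided by their number
        "median": statistics.median(values),  # middle value of the sorted data
        "mode": statistics.mode(values),      # most frequent value
    }


for name, number in center_measures(mlu).items():
    print(f"{name:>6}: {number}")
```

With an even number of observations, statistics.median returns the average of the two middle values, which is the usual definition.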
* 16. Copy this table to your report.
* 17. Suppose you make an error during data entry: you type 80 instead of 8.
Which of these values will change, and which will not? (Why? What do Moore &
McCabe call this property of a statistical measure?)
* 18. The median of MLU is lower than its mean. This is because the
histogram is skewed to the … (left or right?), and it has a longer tail to the …
(left or right?).
F. Calculating measures of spread
In many cases we are interested not only in roughly where the values of the
variable are located, but also in the “width” of the frequency distribution. There are
different measures for describing the “width” of the histogram. The best known
one is the standard deviation (SD), but the range and the interquartile range (IQR) are also used.
The drawback of the range (the difference of the maximum and minimum values)
is that it is fully dependent on the two most extreme values being measured.
> Have SPSS calculate for you the SD, the range and the quartiles. Hint: “Analyze”, “Descriptive Statistics”, “Frequencies”.
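The three measures of spread can be sketched in Python as well. Assumptions: statistics.stdev computes the sample standard deviation (dividing by n - 1), and the quartiles come from the default method of statistics.quantiles, which may differ slightly from the convention your SPSS version uses. The function name spread_measures is made up for this example.

```python
import statistics

# The 20 utterance lengths from section B.
mlu = [3, 5, 4, 4, 10, 4, 11, 4, 4, 6, 3, 4, 4, 8, 8, 8, 5, 8, 4, 9]


def spread_measures(values):
    """Return the sample SD, the range and the interquartile range."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return {
        "sd": statistics.stdev(values),       # sample SD, denominator n - 1
        "range": max(values) - min(values),   # depends only on the two extremes
        "iqr": q3 - q1,                       # width of the middle half of the data
    }


spread = spread_measures(mlu)
for name, number in spread.items():
    print(f"{name:>6}: {number:.3f}")
print(f"range / sd: {spread['range'] / spread['sd']:.2f}")
```

The last line expresses the range in units of SD, which is the comparison asked for in the questions of this section.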
* 19. Report the SD, the range and the IQR.
* 20. If the range is seen as the width of the histogram, then how many SD is
the width of this histogram? (How many times is the range larger than the SD?)
SPSS Exercises 2
Consider the following data, obtained by a researcher who listens to each of 20
randomly-selected radio stations for one hour and records the number of minutes
devoted to advertisements and to programming.
Station  Format  Advertisements  Programming
WRCE     3       21              35
WVVA     99      9               45
KORR     2       15              40
WXYZ     6       20              19
KWRZ     3       15              35
KLBA     1       10              37
KPKE     2       7               47
WVLA     1       8               44
KENT     3       25              31
WNOB     4       27              25
WRDO     2       18              34
KLUV     6       17              32
KRQQ     2       22              27
WISH     2       31              21
KLTL     100     21              28
WLNR     3       14              41
WHOO     4       12              36
KBRR     99      22              30
KUKU     6       14              29
WONE     3       21              36
1. Input the following information about each of the researcher’s variables
into the Variable View screen in SPSS.
a. Variable name: station
Type: string
b. Variable name: format
Decimals: 0
Label: type of programming
Values:
1=top 40
2=easy listening
3=classic rock/oldies
4=jazz
5=classical
6=talk
99=other
100=no data
c. Variable name: ads
Decimals: 0
Label: minutes of aired advertisements
d. Variable name: program
Decimals: 0
Label: minutes of aired programming
2. Input the data values for the variables identified in no. 1 into the Data
View screen of SPSS.
3. Use the SPSS Sort function to place data in order from least to most time
devoted to advertisements.
4. a. The researcher wishes to analyze stations that have a clearly identified
format. Use the SPSS Filter function to omit stations that fall into the
“other” or “no data” categories from future analysis.
b. Check to be sure that the Data View screen reflects the placement of the
filter.
c. Remove the filter.
5. a. The researcher wishes to focus on only stations with a talk format. Use the SPSS
Filter function to omit all other stations from future analysis.
b. Check to be sure that the Data View screen reflects the placement of the filter.
c. Remove the filter.
6. a. The researcher wishes to obtain separate statistics for the stations in
each of the format categories. Use the SPSS Split File
function to prepare data for such an analysis. (Remember that the
appearance of the data screen does not change.)
b. Remove the split file designation.
7. The researcher knows that all of the radio stations included in his or
her analysis charge $40 in advertising fees for each minute of
advertisements broadcast. Use the SPSS Compute function to determine
how much money each station collects from advertising fees.
8. Use the SPSS Compute function to determine the amount of time per hour
that each station devotes to broadcasts other than programming and
advertisements (e.g. news, traffic, weather, etc.).
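The logic of the Sort, Filter, Split File and Compute steps above can be sketched in Python. This is an illustration under assumptions and not SPSS syntax: the Station type, the field name fmt and all variable names are made up for this example, and the data are copied from the table at the start of this exercise.

```python
from collections import defaultdict
from typing import NamedTuple


class Station(NamedTuple):
    name: str
    fmt: int      # format code; 99 = other, 100 = no data
    ads: int      # minutes of aired advertisements
    program: int  # minutes of aired programming


stations = [Station(*row) for row in [
    ("WRCE", 3, 21, 35), ("WVVA", 99, 9, 45), ("KORR", 2, 15, 40),
    ("WXYZ", 6, 20, 19), ("KWRZ", 3, 15, 35), ("KLBA", 1, 10, 37),
    ("KPKE", 2, 7, 47), ("WVLA", 1, 8, 44), ("KENT", 3, 25, 31),
    ("WNOB", 4, 27, 25), ("WRDO", 2, 18, 34), ("KLUV", 6, 17, 32),
    ("KRQQ", 2, 22, 27), ("WISH", 2, 31, 21), ("KLTL", 100, 21, 28),
    ("WLNR", 3, 14, 41), ("WHOO", 4, 12, 36), ("KBRR", 99, 22, 30),
    ("KUKU", 6, 14, 29), ("WONE", 3, 21, 36),
]]

# Sort: least to most advertising time.
by_ads = sorted(stations, key=lambda s: s.ads)

# Filter: keep only stations with a clearly identified format, or only talk stations.
identified = [s for s in stations if s.fmt not in (99, 100)]
talk_only = [s for s in stations if s.fmt == 6]

# Split file: group the identified stations by format code.
by_format = defaultdict(list)
for s in identified:
    by_format[s.fmt].append(s)

# Compute: advertising income at $40 per minute, and minutes left for other broadcasts.
FEE_PER_MINUTE = 40
revenue = {s.name: s.ads * FEE_PER_MINUTE for s in stations}
other_minutes = {s.name: 60 - s.ads - s.program for s in stations}

for s in by_ads:
    print(f"{s.name}  ads={s.ads:>2}  revenue=${revenue[s.name]:>4}  other={other_minutes[s.name]:>2}")
```

As in SPSS, filtering here does not delete anything: the full list stations stays intact, and the filtered lists are only views used for a particular analysis.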
SPSS PROBLEMS
1. For each of the following, tell which significance test would be most
appropriate.
a. two small, independent samples; ordinal scale data
b. two dependent samples; interval scale data
c. three or more independent samples; ordinal scale data
d. two dependent samples; ordinal scale data
e. two independent samples; frequency data
2. Seven children from families in which there is only one child and six
children with at least one sibling are rated for willingness to share
toys with another child. Each child is given a rating from 0 (no
sharing) to 10 (virtually complete sharing) during a 20-minute
observation period. Use the appropriate test to compare the groups,
and tell what your decision means in the context of the problem.
Only Child   Child With Sibling(s)
5            10
3            9
2            9
2            7
1            4
0            2
0
3. Thirty-one randomly selected rats are assigned to one of three
different experimental diets. After 30 days on the diets, each animal
is given a test of irritability to handling. In the test, the behavior is
rated from 0 to 15 with a higher score reflecting greater irritability.
The scores are shown here. Do an overall significance test. If a
significant result is obtained, do all pairwise comparisons. Tell what
each decision means in the context of the problem.
Diet A Diet B Diet C
6 14 4
5 12 4
5 12 3
4 11 2
3 9 2
2 7 1
1 7 1
0 5 0
0 4 0
0 1 0
0
4. A trained speech analyst has received brief taped excerpts of the
speech of 18 parents. Ten of the parents have schizophrenic children,
and the remaining 8 have nonschizophrenic children. Without
knowing whether the parent has a schizophrenic child, the analyst has
rated the excerpts from 0 to 20 for defectiveness of speech. The
groups do not differ on variables such as IQ, age, education, or social
class. Is there evidence for a difference in the speech patterns of the
parents of the schizophrenic children?
Parent of Schizophrenic   Parent of Nonschizophrenic
16                        12
15                        11
13                        10
12                        9
9                         9
7                         5
5                         4
3                         2
3
2
5. A self-rating scale was used to measure attitudes toward risk taking
before and after alcohol consumption for 12 persons. A high score
indicates a positive attitude toward risk taking; a low score indicates
greater concern. Compare the before and after ratings.
Person  Rating Before  Rating After
A 14 17
B 14 19
C 13 14
D 12 9
E 11 12
F 9 9
G 9 15
H 8 7
I 5 9
J 4 8
K 2 1
L 2 5
6. Twenty-four students are selected randomly from a large
introductory psychology class and assigned randomly to one of two
treatment groups. Half are given an alcohol-flavored drink, and the
other half receive a drink containing an ounce of alcohol. Ten
minutes later, each student fills out a self-rating scale measuring
attitudes toward risk taking. Assume the data are ordinal scale at best.
The results are shown here. Compare the two groups. As before, a
high score indicates a positive attitude toward risk taking.
Alcohol Group No-Alcohol Group
14 19
14 17
13 15
12 14
11 12
9 9
9 9
8 9
5 8
4 7
2 5
2 1
7. Matched pairs of parents have written letters to a child in a state
mental institution. One member of each pair has a schizophrenic
child, and the other member has a nonschizophrenic child in the
hospital. One letter from each parent has been rated for double-bind
statements (incompatible ideas and feelings) on a scale from 1 to 7,
with 7 reflecting a high incidence of double-bind statements. Use the
appropriate test to compare the groups.
Pair  Parent of Schizophrenic  Parent of Nonschizophrenic
A 7 3
B 5 5
C 3 6
D 2 6
E 6 5
F 1 3
G 2 1
H 4 6
I 3 1
J 7 1
K 3 7
L 1 5
8. An investigator wants to see whether creativity (divergent thinking)
can be taught. In one class, the teacher specifically rewards divergent
responses during a 1-hour daily art period. In a second class, a 1-hour
art period is held, but no effort is made to reward divergent
responses. In a third class, a 1-hour study hall is given while the other
classes have the art period. At the end of the year, 10 students are
randomly selected from each class and given a standard test of
creativity on which they receive a score from 1 to 50. A higher score
indicates greater creativity. Assume the data are ordinal scale at best.
Compare the classes with an overall test. If the result is significant,
do all pairwise comparisons, and tell what your conclusions mean in
the context of the problem.
Class 1 Class 2 Class 3
48 41 42
46 40 25
43 27 24
40 25 22
39 13 18
38 11 17
35 10 9
28 10 8
27 9 8
15 7 5