Exercise Guidelinesciencecapacitybuilding.weebly.com/uploads/8/5/9/6/8596480/spss... · Exercise...

Exercise Guideline

A Getting familiar with SPSS

B1 Entering data by hand

B2 Using “Variable View”

B3 Creating a frequency table

C Creating a histogram

D Creating a boxplot

E Calculating mean, mode and median

F Calculating measures of spread

Exercise 1

A. Getting familiar with SPSS

> Turn on computer and screen.

> Enter username and password to log in.

> Look up SPSS 14 (or a later version) in the RUG menu (under Mathematics & Statistics) and launch SPSS.

If SPSS has not yet been installed on your machine, a window will tell you that you have to restart your computer. Do so; otherwise SPSS may not run properly.

> Once SPSS is running, a dialog box with several choices appears. Click “Cancel”.

Now you are in the Data Editor, the SPSS window in which you enter data and work with them. It is a spreadsheet similar to those you may know from other applications. The title bar at the top shows the name of the data file you are working with; at this moment it is still Untitled1 [DataSet0].

In the Data Editor, each (vertical) column represents a variable. Each variable has a name, which appears at the top of the column. Use meaningful names, such as LENGTH, and not something like X24A06.

Each (horizontal) row represents a case. A case is a set of observations that belong together, such as one respondent's answers to the questions in a questionnaire, or different values measured on the same subject in an experiment. For instance, if you have 32 respondents, you need 32 rows for the 32 cases. If the questionnaire contained 40 questions, you most likely need 40 columns, and so you have 40 variables. (Next week, we will learn how to calculate new variables derived from existing ones.)

The Data Editor consists of two parts: the Data View and the Variable View. You can switch between them with the tabs at the bottom left of the window.

The Variable View gives an overview of your variables and also lets you define their properties. The most important properties are:

1. Name: the name of the variable.

2. Type: the type of the variable. Some of the types offered by SPSS:

a. Numeric: the usual way of rendering numbers (e.g., 12345,67). This is what Moore & McCabe refer to as quantitative data.

b. Comma: comma before each group of three digits, dot before decimal digits (e.g., 12,345.67).

c. Dot: dot before each group of three digits, comma before decimal digits (e.g., 12.345,67).

d. String: any textual information (e.g., answers to an open question).

3. Width: the number of character positions available for a value in the Data View window.

4. Decimals: the number of decimal digits after the comma/dot.

5. Label: text providing more information about the variable.

6. Values: value labels, i.e., text describing what each value of the variable stands for (e.g., 1 = top 40).

7. Missing: the value(s) used to code missing data (e.g., a code meaning “no answer”).

8. Column: the width of the column in the Data View window.

9. Measure: the “measurement scale” of the variable (nominal, ordinal or scale, the last covering interval and ratio scales).

At the top of the window you find the SPSS menu bar: FILE, EDIT, VIEW, DATA, etc. All statistical calculations are found under ANALYZE, and all diagrams and charts under GRAPHS. To calculate new variables from existing ones, use the commands under TRANSFORM. The HELP menu offers further assistance, although its explanations may seem rather terse at first.

> Have a look at the different menus to get a general overview of them.

B. Entering data and creating a frequency table

The MLU (Mean Length of Utterance) is based on the length of utterances (well-formed sentences or sentence-like series of words), measured by counting the number of words each utterance contains. It is an important measure of the linguistic abilities of children acquiring a language and of patients with language impairments, and it is also useful for identifying the authors of texts.

Here are the lengths of test utterances produced by 20 patients:

3, 5, 4, 4, 10, 4, 11, 4, 4, 6, 3, 4, 4, 8, 8, 8, 5, 8, 4, 9.

> Enter these values by hand and give the variable the name MLU.

> In the Variable View, set the number of decimals to 0 (as utterance length is always a whole number).

When you work with SPSS (as with any other application), it is good practice to save your data files regularly. Output files are usually easy to recreate, but data files are not. Moreover, SPSS 14 is not always stable and may terminate unexpectedly. Finally, we may want to reuse some of the data files in later labs.

> Therefore, save your data file to your own network drive (X:\) in a separate folder that you create specifically for this lab.

A frequency table is a table that shows how often each value of a variable appears among your data.

> Create a frequency table from this variable. Hint: 'Analyze', 'Descriptive Statistics', 'Frequencies'.
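As a cross-check on the SPSS output, the frequency table can also be computed by hand. The sketch below uses Python (our choice; the exercise itself only needs SPSS), and the function name `frequency_table` is hypothetical. It lists each value with its frequency, percent and cumulative percent, the columns we expect an SPSS frequency table to show.

```python
from collections import Counter

# MLU values of the 20 patients, in the order given above
mlu = [3, 5, 4, 4, 10, 4, 11, 4, 4, 6, 3, 4, 4, 8, 8, 8, 5, 8, 4, 9]


def frequency_table(values):
    """Return (value, frequency, percent, cumulative percent) rows, sorted by value."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    for value in sorted(counts):
        freq = counts[value]
        cumulative += freq
        rows.append((value, freq, 100 * freq / n, 100 * cumulative / n))
    return rows


print(f"{'Value':>5} {'Freq':>5} {'Pct':>7} {'CumPct':>7}")
for value, freq, pct, cum_pct in frequency_table(mlu):
    print(f"{value:>5} {freq:>5} {pct:>7.1f} {cum_pct:>7.1f}")
```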

Errors are quite common during data entry. It is therefore essential to always check the data you have just entered. Besides rereading the numbers in the Data View, you should also look for outliers “created” by erroneous data entry: for instance, typing too many zeros or entering two values in a single cell will produce values much greater than the others. In the present case, check that the frequency table contains only values you remember entering (and that make sense). Also compare your frequency table with those of your neighbors in the lab.
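A minimal sketch of two of these checks, assuming the data are available as Python lists: flagging values that cannot be valid utterance lengths, and comparing your counts with a neighbor's. The list `typed` (with 80 entered instead of 8), the helper names and the plausible range of 1 to 30 words are all illustrative assumptions, not part of the exercise.

```python
from collections import Counter

# Correct MLU values, and a hypothetical copy with one typing error (80 for 8)
mlu = [3, 5, 4, 4, 10, 4, 11, 4, 4, 6, 3, 4, 4, 8, 8, 8, 5, 8, 4, 9]
typed = [3, 5, 4, 4, 10, 4, 11, 4, 4, 6, 3, 4, 4, 80, 8, 8, 5, 8, 4, 9]


def check_entry(entered, plausible_min, plausible_max):
    """List (row, value, reason) for values that cannot be valid utterance lengths."""
    problems = []
    for row, value in enumerate(entered, start=1):
        if value != int(value):
            problems.append((row, value, "not a whole number"))
        elif not plausible_min <= value <= plausible_max:
            problems.append((row, value, "outside plausible range"))
    return problems


def compare_tables(mine, neighbor):
    """Map each value whose frequency differs to (my count, neighbor's count)."""
    a, b = Counter(mine), Counter(neighbor)
    return {v: (a[v], b[v]) for v in sorted(set(a) | set(b)) if a[v] != b[v]}


# The bounds 1..30 words are an illustrative assumption
print(check_entry(typed, 1, 30))
print(compare_tables(typed, mlu))
```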

> Check the frequency of each value in your frequency table together with your neighbor.

* 1. Copy the table into your report.

* 2. How many measurements (data) do you have?

* 3. Which MLU value is the second most frequent?

* 4. How often does the highest value of MLU occur?

C. Creating a histogram

A histogram (or frequency diagram) is a graph that displays how often each possible value of a variable occurs in the data (or how often values within a certain range occur).
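To see how the choice of bins shapes a histogram, here is a sketch that counts values per bin and prints a text histogram. SPSS chooses its own bin width, so its picture may differ; the helper `bin_counts` and the starting value are assumptions made for illustration. Empty bins are kept so that gaps remain visible.

```python
from collections import Counter

mlu = [3, 5, 4, 4, 10, 4, 11, 4, 4, 6, 3, 4, 4, 8, 8, 8, 5, 8, 4, 9]


def bin_counts(values, width, start):
    """Count values in bins [start, start + width), [start + width, start + 2*width), ...
    Empty bins are included. Assumes start <= min(values)."""
    counts = Counter((v - start) // width for v in values)
    return [(start + k * width, counts[k]) for k in range(max(counts) + 1)]


# A text histogram with one bin per whole number
for lower, count in bin_counts(mlu, 1, 3):
    print(f"{lower:>3} | {'*' * count}")
```

Calling `bin_counts(mlu, 2, 3)` instead groups two neighboring values per bar, which shows how a wider bin can hide or reveal features of the distribution.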

> Create a histogram based on the variable MLU. Hint: 'Graphs', 'Histogram'.

> Do it again, but have SPSS also draw a Normal curve. Hint: mark the checkbox “Display normal curve”.
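The Normal curve can be read as the counts a Normal distribution would predict for each bar. This sketch assumes, as we understand SPSS to do, that the curve uses the mean and standard deviation of the data; the bins of width 1 centered on whole numbers and the helper name `normal_curve_counts` are our own choices.

```python
from statistics import NormalDist, mean, stdev

mlu = [3, 5, 4, 4, 10, 4, 11, 4, 4, 6, 3, 4, 4, 8, 8, 8, 5, 8, 4, 9]


def normal_curve_counts(values, bin_lowers, width):
    """Expected count per bin if the data followed a Normal distribution
    with the sample mean and sample standard deviation."""
    dist = NormalDist(mean(values), stdev(values))
    n = len(values)
    return [n * (dist.cdf(lo + width) - dist.cdf(lo)) for lo in bin_lowers]


# Bins of width 1 centered on the whole numbers 3..11
lowers = [x - 0.5 for x in range(3, 12)]
for lo, expected in zip(lowers, normal_curve_counts(mlu, lowers, 1)):
    observed = sum(lo <= v < lo + 1 for v in mlu)
    print(f"{lo + 0.5:>4.0f}  observed {observed:>2}  Normal curve {expected:5.2f}")
```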

* 5. Copy this second graph to your report.

* 6. What does the vertical axis display: numbers or percentages?

* 7. What is the highest value and what is the lowest value of the variable?

* 8. How many peaks are there?

* 9. There is a gap in the graph. At what value can this gap be found? What does this observation mean? Would you expect to find this gap if you had many more data?

* 10. Is this distribution approximately Normal?

D. Creating a boxplot

A boxplot is another way of visualizing a distribution, and it will prove useful for other purposes later on.

> Create a boxplot of your variable.

Hint: 'Graphs', 'Boxplot'. Choose “Simple” and “Summaries of separate variables”.
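The numbers behind a boxplot can be sketched as follows. We assume SPSS bases the box on Tukey's hinges (the medians of the lower and upper halves of the data) and marks cases more than 1.5 box lengths from the box as outliers and more than 3 box lengths as extremes; compare with your own output. The function names are hypothetical.

```python
from statistics import median

mlu = [3, 5, 4, 4, 10, 4, 11, 4, 4, 6, 3, 4, 4, 8, 8, 8, 5, 8, 4, 9]


def tukey_hinges(values):
    """Medians of the lower and upper half of the sorted data
    (with odd n, the middle value belongs to both halves)."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2
    return median(data[:half]), median(data[n - half:])


def boxplot_summary(values):
    data = sorted(values)
    lower, upper = tukey_hinges(data)
    box = upper - lower
    # More than 1.5 box lengths from the box: outlier; more than 3: extreme
    inner = (lower - 1.5 * box, upper + 1.5 * box)
    outer = (lower - 3 * box, upper + 3 * box)
    inside = [v for v in data if inner[0] <= v <= inner[1]]
    return {
        "median": median(data),
        "box": (lower, upper),
        "whiskers": (min(inside), max(inside)),
        "outliers": [v for v in data
                     if not inner[0] <= v <= inner[1] and outer[0] <= v <= outer[1]],
        "extremes": [v for v in data if not outer[0] <= v <= outer[1]],
    }


print(boxplot_summary(mlu))
```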

* 11. Copy this boxplot to your report.

* 12. What are the lowest and highest values according to the boxplot?

* 13. What is the approximate median according to the boxplot?

* 14. What percentage of the data lies outside the box?

* 15. Which data points lie outside the “whiskers” of the boxplot?

E. Calculating mean, mode and median

We often want to summarize a variable with a single number that tells us roughly where its values are located. Usually the mean (average) is used for that purpose. Another option is the mode, the value that appears most frequently. One can also use the median, the middle value when the observations are sorted from lowest to highest (with an even number of observations, the average of the two middle values).
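These three definitions translate directly into code. A sketch using Python's standard `statistics` module; the helper name `central_tendency` is hypothetical. When several values are equally frequent it reports the smallest, which is our understanding of what SPSS does (with a footnote saying that multiple modes exist).

```python
from statistics import mean, median, multimode

mlu = [3, 5, 4, 4, 10, 4, 11, 4, 4, 6, 3, 4, 4, 8, 8, 8, 5, 8, 4, 9]


def central_tendency(values):
    modes = multimode(values)  # every value that reaches the highest frequency
    return {
        "mean": mean(values),
        "median": median(values),  # average of the two middle values when n is even
        "mode": min(modes),  # smallest mode if several values tie
        "all_modes": sorted(modes),
    }


print(central_tendency(mlu))
```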

When SPSS creates a histogram, it automatically calculates the mean and shows it next to the graph. The mode, the median and the mean can also be obtained by choosing “Analyze”, “Descriptive Statistics”, and then “Frequencies” in the menu. If you wish, uncheck “Display frequency tables” and ignore the warning. Then select the mean, the mode and the median via the “Statistics” button.

> Have SPSS calculate the mean, the mode and the median, and report them to you in a single table.

* 16. Copy this table to your report.

* 17. Suppose you make an error during data entry: you type 80 instead of 8. Which of these values will change, and which will not? (Why? What do M&M call this property of a statistical measure?)

* 18. The median of MLU is lower than its mean. This is because the histogram is skewed to the … (left or right?), and it has a longer tail to the … (left or right?).

F. Calculating measures of spread

In many cases we are interested not only in roughly where the values of a variable are located, but also in the “width” of the frequency distribution. There are different measures of the “width” of the histogram. The best known is the standard deviation (SD), but the range and the interquartile range (IQR) are also used. The drawback of the range (the difference between the maximum and minimum values) is that it depends entirely on the two most extreme values measured.
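A sketch of these measures with Python's `statistics` module; the helper name `spread` is hypothetical. `stdev` is the sample SD (dividing by n - 1), and `quantiles(..., method="exclusive")` places quartiles at position (n + 1)p in the sorted data, which we believe matches the default in SPSS Frequencies; other quartile definitions can give slightly different values.

```python
from statistics import quantiles, stdev

mlu = [3, 5, 4, 4, 10, 4, 11, 4, 4, 6, 3, 4, 4, 8, 8, 8, 5, 8, 4, 9]


def spread(values):
    # "exclusive": the p-th quantile sits at position (n + 1) * p in the sorted data
    q1, q2, q3 = quantiles(values, n=4, method="exclusive")
    return {
        "sd": stdev(values),  # sample SD: divides by n - 1
        "range": max(values) - min(values),
        "quartiles": (q1, q2, q3),
        "iqr": q3 - q1,
    }


result = spread(mlu)
print(result)
print("range / SD =", result["range"] / result["sd"])
```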

> Have SPSS calculate the SD, the range and the quartiles. Hint: “Analyze”, “Descriptive Statistics”, “Frequencies”.

* 19. Report the SD, the range and the IQR.

* 20. If the range is seen as the width of the histogram, how many SDs wide is this histogram? (How many times larger than the SD is the range?)

SPSS Exercises 2

Consider the following data, obtained by a researcher who listens to each of 20 randomly selected radio stations for one hour and records the number of minutes devoted to advertisements and to programming.

Station  Format  Advertisements  Programming
WRCE          3              21           35
WVVA         99               9           45
KORR          2              15           40
WXYZ          6              20           19
KWRZ          3              15           35
KLBA          1              10           37
KPKE          2               7           47
WVLA          1               8           44
KENT          3              25           31
WNOB          4              27           25
WRDO          2              18           34
KLUV          6              17           32
KRQQ          2              22           27
WISH          2              31           21
KLTL        100              21           28
WLNR          3              14           41
WHOO          4              12           36
KBRR         99              22           30
KUKU          6              14           29
WONE          3              21           36

1. Enter the following information about each of the researcher's variables in the Variable View screen of SPSS.

a. Variable name: station

Type: string

b. Variable name: format

Decimals: 0

Label: type of programming

Values:

1=top 40

2=easy listening

3=classic rock/oldies

4=jazz

5=classical

6=talk

99=other

100=no data

c. Variable name: ads

Decimals: 0

Label: minutes of aired advertisements

d. Variable name: program

Decimals: 0

Label: minutes of aired programming

2. Enter the data values for the variables identified in step 1 into the Data View screen of SPSS.

3. Use the SPSS Sort function to order the data from least to most time devoted to advertisements.

4. a. The researcher wishes to analyze only stations that have a clearly identified format. Use the SPSS Filter function to exclude stations in the “other” or “no data” categories from further analysis.

b. Check that the Data View screen reflects the filter.

c. Remove the filter.

5. a. The researcher wishes to focus only on stations with a talk format. Use the SPSS Filter function to exclude all other stations from further analysis.

b. Check that the Data View screen reflects the filter.

c. Remove the filter.

6. a. The researcher wishes to obtain separate statistics for the stations in each format category. Use the SPSS Split File function to prepare the data for such an analysis. (Remember that the appearance of the Data View screen does not change.)

b. Remove the split file designation.

7. The researcher knows that all of the radio stations included in his or her analysis charge $40 in advertising fees for each minute of advertising broadcast. Use the SPSS Compute function to determine how much money each station collects in advertising fees.

8. Use the SPSS Compute function to determine the amount of time per hour that each station devotes to broadcasts other than programming and advertisements (e.g., news, traffic, weather).
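The same steps can be sketched outside SPSS to check your results. The data below are transcribed from the table above; the variable and function names are illustrative assumptions. Note that these filters simply keep or drop rows, whereas SPSS keeps filtered-out cases in the file and only excludes them from analysis.

```python
from itertools import groupby

# (station, format, ads, program), transcribed from the table above
stations = [
    ("WRCE", 3, 21, 35), ("WVVA", 99, 9, 45), ("KORR", 2, 15, 40),
    ("WXYZ", 6, 20, 19), ("KWRZ", 3, 15, 35), ("KLBA", 1, 10, 37),
    ("KPKE", 2, 7, 47), ("WVLA", 1, 8, 44), ("KENT", 3, 25, 31),
    ("WNOB", 4, 27, 25), ("WRDO", 2, 18, 34), ("KLUV", 6, 17, 32),
    ("KRQQ", 2, 22, 27), ("WISH", 2, 31, 21), ("KLTL", 100, 21, 28),
    ("WLNR", 3, 14, 41), ("WHOO", 4, 12, 36), ("KBRR", 99, 22, 30),
    ("KUKU", 6, 14, 29), ("WONE", 3, 21, 36),
]
FORMAT_LABELS = {1: "top 40", 2: "easy listening", 3: "classic rock/oldies",
                 4: "jazz", 5: "classical", 6: "talk", 99: "other", 100: "no data"}
FEE_PER_MINUTE = 40  # dollars per minute of advertising (step 7)

# Step 3: sort from least to most advertising time
by_ads = sorted(stations, key=lambda s: s[2])

# Step 4: keep stations with an identified format (drop "other" and "no data")
identified = [s for s in stations if s[1] not in (99, 100)]

# Step 5: keep talk stations only
talk = [s for s in stations if s[1] == 6]


def split_by_format(rows):
    """Step 6: group stations by format (groupby needs rows sorted by format)."""
    ordered = sorted(rows, key=lambda s: s[1])
    return {fmt: list(group) for fmt, group in groupby(ordered, key=lambda s: s[1])}


def compute_new_variables(rows):
    """Steps 7 and 8: advertising fees and minutes of other broadcasting per hour."""
    return [{"station": name, "fees": ads * FEE_PER_MINUTE, "other": 60 - ads - program}
            for name, _fmt, ads, program in rows]


for fmt, group in split_by_format(stations).items():
    mean_ads = sum(s[2] for s in group) / len(group)
    print(f"{FORMAT_LABELS[fmt]:<20} n={len(group):>2}  mean ads={mean_ads:5.1f}")
```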

SPSS PROBLEMS

1. For each of the following, tell which significance test would be most appropriate.

a. two small, independent samples; ordinal scale data

b. two dependent samples; interval scale data

c. three or more independent samples; ordinal scale data

d. two dependent samples; ordinal scale data

e. two independent samples; frequency data

2. Seven children from families in which there is only one child and six children with at least one sibling are rated for willingness to share toys with another child. Each child is given a rating from 0 (no sharing) to 10 (virtually complete sharing) during a 20-minute observation period. Use the appropriate test to compare the groups, and tell what your decision means in the context of the problem.

Only Child  Child With Sibling(s)
5           10
3           9
2           9
2           7
1           4
0           2
0

3. Thirty-one randomly selected rats are assigned to one of three different experimental diets. After 30 days on the diets, each animal is given a test of irritability to handling. In the test, the behavior is rated from 0 to 15, with a higher score reflecting greater irritability. The scores are shown here. Do an overall significance test. If a significant result is obtained, do all pairwise comparisons. Tell what each decision means in the context of the problem.

Diet A  Diet B  Diet C
6       14      4
5       12      4
5       12      3
4       11      2
3       9       2
2       7       1
1       7       1
0       5       0
0       4       0
0       1       0
0

4. A trained speech analyst has received brief taped excerpts of the speech of 18 parents. Ten of the parents have schizophrenic children, and the remaining 8 have nonschizophrenic children. Without knowing whether the parent has a schizophrenic child, the analyst has rated the excerpts from 0 to 20 for defectiveness of speech. The groups do not differ on variables such as IQ, age, education, or social class. Is there evidence for a difference in the speech patterns of the parents of the schizophrenic children?

Parent of Schizophrenic  Parent of Nonschizophrenic
16                       12
15                       11
13                       10
12                       9
9                        9
7                        5
5                        4
3                        2
3
2

5. A self-rating scale was used to measure attitudes toward risk taking before and after alcohol consumption for 12 persons. A high score indicates a positive attitude toward risk taking; a low score indicates greater concern. Compare the before and after ratings.

Person  Rating Before  Rating After
A       14             17
B       14             19
C       13             14
D       12             9
E       11             12
F       9              9
G       9              15
H       8              7
I       5              9
J       4              8
K       2              1
L       2              5

6. Twenty-four students are selected randomly from a large introductory psychology class and assigned randomly to one of two treatment groups. Half are given an alcohol-flavored drink, and the other half receive a drink containing an ounce of alcohol. Ten minutes later, each student fills out a self-rating scale measuring attitudes toward risk taking. Assume the data are ordinal scale at best. The results are shown here. Compare the two groups. As before, a high score indicates a positive attitude toward risk taking.

Alcohol Group  No-Alcohol Group
14             19
14             17
13             15
12             14
11             12
9              9
9              9
8              9
5              8
4              7
2              5
2              1

7. Matched pairs of parents have written letters to a child in a state mental institution. One member of each pair has a schizophrenic child, and the other member has a nonschizophrenic child in the hospital. One letter from each parent has been rated for double-bind statements (incompatible ideas and feelings) on a scale from 1 to 7, with 7 reflecting a high incidence of double-bind statements. Use the appropriate test to compare the groups.

Pair  Parent of Schizophrenic  Parent of Nonschizophrenic
A     7                        3
B     5                        5
C     3                        6
D     2                        6
E     6                        5
F     1                        3
G     2                        1
H     4                        6
I     3                        1
J     7                        1
K     3                        7
L     1                        5

8. An investigator wants to see whether creativity (divergent thinking) can be taught. In one class, the teacher specifically rewards divergent responses during a 1-hour daily art period. In a second class, a 1-hour art period is held, but no effort is made to reward divergent responses. In a third class, a 1-hour study hall is given while the other classes have the art period. At the end of the year, 10 students are randomly selected from each class and given a standard test of creativity on which they receive a score from 1 to 50. A higher score indicates greater creativity. Assume the data are ordinal scale at best. Compare the classes with an overall test. If the result is significant, do all pairwise comparisons, and tell what your conclusions mean in the context of the problem.

Class 1  Class 2  Class 3
48       41       42
46       40       25
43       27       24
40       25       22
39       13       18
38       11       17
35       10       9
28       10       8
27       9        8
15       7        5
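To cross-check SPSS output for these problems, the sketch below computes three rank-based statistics often used with ordinal data: the Mann-Whitney U, the Kruskal-Wallis H (with the usual correction for ties) and the Wilcoxon signed-rank T. Which statistic suits which problem is part of the exercise. The sketch stops at the test statistic; significance must still come from SPSS (under Analyze, Nonparametric Tests) or from critical-value tables. The helper names are hypothetical, and the data are transcribed from problems 2, 3 and 5.

```python
from collections import Counter


def midranks(values):
    """Rank values from low to high; tied values share the mean of their ranks."""
    first_position = {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
    counts = Counter(values)
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]


def mann_whitney_u(x, y):
    """Smaller of the two U statistics for independent samples x and y."""
    ranks = midranks(x + y)
    u_x = sum(ranks[:len(x)]) - len(x) * (len(x) + 1) / 2
    return min(u_x, len(x) * len(y) - u_x)


def kruskal_wallis_h(*groups):
    """H statistic for k independent samples, corrected for ties."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = midranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))


def wilcoxon_t(first, second):
    """Wilcoxon signed-rank T for paired scores; zero differences are dropped.
    Returns (T, number of non-zero differences)."""
    diffs = [b - a for a, b in zip(first, second) if b != a]
    ranks = midranks([abs(d) for d in diffs])
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(t_plus, t_minus), len(diffs)


# Problem 2 (two independent groups)
only_child = [5, 3, 2, 2, 1, 0, 0]
with_siblings = [10, 9, 9, 7, 4, 2]
print("U =", mann_whitney_u(only_child, with_siblings))

# Problem 3 (three independent groups)
diet_a = [6, 5, 5, 4, 3, 2, 1, 0, 0, 0, 0]
diet_b = [14, 12, 12, 11, 9, 7, 7, 5, 4, 1]
diet_c = [4, 4, 3, 2, 2, 1, 1, 0, 0, 0]
print("H =", kruskal_wallis_h(diet_a, diet_b, diet_c))

# Problem 5 (paired scores)
before = [14, 14, 13, 12, 11, 9, 9, 8, 5, 4, 2, 2]
after = [17, 19, 14, 9, 12, 9, 15, 7, 9, 8, 1, 5]
print("T, n =", wilcoxon_t(before, after))
```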
