Working with PASW (SPSS) 17:

Basic Concepts

Prepared by:

Chris Cunningham, Ph.D.

UTC Department of Psychology

UTCOM-C Department of Internal Medicine

Prepared for:

UTCOM-C Faculty and Resident Hands-on Training

Last updated: February 2010

Overview

This guide is designed to make it easier for you to use one common and very powerful statistical analysis

software program, Predictive Analytics Software (PASW) version 17 (note, the name of this software

program was changed from SPSS to PASW in 2009). Although the developers of this program will release

updates just about every year, the basic elements contained in this guide should serve you well for many

years to come. As you get more comfortable working within this particular analytical framework, you

should also be able to adjust quite easily to any new software updates. The material in this guide comes

from the author’s own experience and the following resources in particular:

• Field, A. (2009). Discovering statistics using SPSS (3rd ed.). Sage Publications.

• Dr. Michael Biderman (UTC): http://www.utc.edu/faculty/michael-biderman

• PASW tutorials (built into the program’s Help function) and the SPSS 17 “brief” manual

These excellent resources are highly recommended if you feel you need additional detail on any of the

topics presented to you in this guide.

This guide is structured to address some of the most common preliminary questions or challenges faced

by residents and physicians interested in doing medical research analyses with this software program.

Mastery of the material in this guide will not guarantee you smooth sailing in all of your future research

endeavors, but it should help you avoid common pitfalls and at least get started. A basic understanding

of how to operate within a Microsoft Windows operating environment is assumed.

Please note the following points:

• Research design comes before data collection. If you are at all unsure of how to proceed with your research and data collection, I strongly recommend that you speak with your colleagues and/or someone who has experience in research design/methodology. A few brief conversations can help you to do excellent research. Ignoring this advice can cost you and others significant time, resources, and potentially even reputation. Don’t go it alone when it comes to research.

• Data are only as “good” in a quality sense as the design of the research study can ensure – please remember the axiom, “Garbage in, garbage out.” Take the time when developing your study to identify the best possible sources of the data or information that you need to test and address your research questions and hypotheses.

• Inevitably, there will be times in your research when you will be stumped and unable to find a direct answer to your design or analysis questions in any statistical how-to guide. When these situations arise, it is important once again that you recognize the limitations of your research knowledge and that you consult with someone who might be able to help you work through a tough patch. Following the tips in this guide, however, will help to make sure that calling for help will be a fruitful exercise.

Contents

Overview
Power and sample size calculations
  Magnitude of Effect
  Error Rate
  Variability in the Populations Being Studied
  Sample Size
Navigating within PASW
  Data View
  Variable View
  Output Viewer
Understanding Variables and Data Formatting
  Nominal
  Ordinal
  Scale
Importing Data from Excel
  Ugly Way
  Professional Way
Entering Data within PASW
Choosing the Right Statistical Analysis
Generating Presentation-Quality Output
  Creating Graphics in PASW
  Creating Graphics in Excel
Basic descriptive and correlational statistics
  Descriptive Statistics
  Basic Correlations
Other Types of Analyses
  Comparing groups: t-tests and ANOVA
  Predicting outcomes: Linear Regression
  When the data are “special”: Nonparametric Statistics

Power and sample size calculations

Concerns about statistical power and necessary sample size should actually be addressed in the design

phases of a study. The reason for this is that by the time you are ready to analyze your data, you are

typically stuck with what you have and not able to gather additional data from more people without

restarting the whole collection process. That’s why this topic is being covered first in this guide – to

encourage you to think about this stuff before you begin data collection. If you are in need of some more

in-depth information on how to design research studies from the ground up, then I strongly recommend

Weathington, Cunningham, and Pittenger (2010), Research Methods for the Behavioral and Social

Sciences (Wiley) as an accessible text that links research design and methods with basic statistics (yes,

being one of the authors, I am biased – seriously though, this is a good book). Here’s a primer on

statistical power to hopefully get you thinking about some of these basic design issues when you are

developing a research plan.

When doing research, there are two basic mistakes we can make. The first is to herald an observed

relationship or difference as statistically significant when it really wasn’t – a Type I error, which can be

seen as a false alarm. This error risk is limited by the researcher’s choice of an alpha level or error rate.

The second basic mistake is to fail to identify a relationship or difference as significant when it really

was – a Type II error, which can be thought of as a miss. The probability of this error is referred to as beta. In really general

terms, statistical power (1 – beta) is defined as the probability that we have of correctly identifying a

significant effect when it is really present. In the social sciences we shoot for a power of 80%, meaning

that we have an 80% chance of correctly rejecting a false null hypothesis (and detecting the

hypothesized significant difference or relationship that we designed the study to test for originally). In

medicine, depending on the need for precision in your research, you may wish your power to be slightly

higher or lower.

Statistical power is determined by four main factors:

• Magnitude of the effect or difference between populations being studied

• Established error risk (alpha level)

• Variability in the population being studied

• Sample size

Magnitude of Effect

We can think of small effects as needles in a haystack and large effects as pitchforks in the same

haystack – which one is easier to find? The basic rule of thumb is that larger effects will grant you higher

statistical power, which is a good thing. This being the case, when you design a study that involves some

sort of manipulation of an independent or treatment variable, you should attempt to make the

manipulation as strong as is feasible and relevant to your research aims. This will help to ensure contrast

between control and experimental groups, making it easier for you to spot an effect of the independent

or treatment variable if it indeed exists.

Error Risk

In most of the social sciences we are willing to accept a 5% risk of committing a Type I error (false

alarm). This means that 5% of the time we may identify a statistical effect as significant when it really

was due to chance or error. In medicine, depending on the implications of such an error, you may need

to set your error rate to 1%. Such a change should not be made lightly, however, because the more

stringent you make your Type I error rate, the lower your statistical power drops. Consider it this way –

if you are really scared about making a mistake and calling an effect significant, you are more likely to

ignore effects that are moderate, in favor of only those that are large. In some cases this is probably a

good strategy (e.g., if the treatment has nasty side effects like death). In other cases, however, doing

this will cause you to miss effects that were still large enough to be due to something other than chance

and potentially useful to patients needing your care.
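
To put rough numbers on this trade-off, here is a minimal sketch in Python (not a PASW feature). It assumes a two-sided comparison of two group means and uses a normal approximation in place of the exact t-based calculation. The function name approx_power, the effect size of 0.5, and the 64 participants per group are illustrative assumptions, not values taken from this guide.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power for a two-sided comparison of two group means.

    d is the standardized effect size (mean difference divided by the
    common standard deviation). Normal approximation only (an assumption
    of this sketch, not the exact t-based calculation).
    """
    z = NormalDist()                      # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)     # critical value for the chosen alpha
    shift = d * sqrt(n_per_group / 2)     # expected size of the test statistic
    return z.cdf(shift - z_crit)          # ignores the negligible opposite tail


# Same hypothetical study, two different Type I error rates
power_05 = approx_power(d=0.5, n_per_group=64, alpha=0.05)
power_01 = approx_power(d=0.5, n_per_group=64, alpha=0.01)

print(f"alpha = .05: approximate power = {power_05:.2f}")
print(f"alpha = .01: approximate power = {power_01:.2f}")
```

Under these assumed inputs, tightening alpha from .05 to .01 drops the approximate power from roughly .8 to roughly .6, which is the cost described above.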

Variability in the Populations Being Studied

The data we collect from any group of subjects or participants will include some variability due to factors

that we are not intending to study (and that typically are not controlled). To increase the statistical

power of a test, it may be helpful to constrain the variability present within each of the groups we are

studying, making it easier for us to detect the real effects of any manipulated independent or treatment

variable in the midst of any noise or random variation that will be present. The two best strategies for

reducing variability within our data are to:

• Use samples that are relatively homogeneous (subjects/participants are similar to one another in terms of individual difference factors)

• Use more reliable and accurate measurement techniques (established scales, standardized observation protocols, etc.)

Sample Size

It is important to realize that sample size is not the only factor you can control in a research project to

help protect your statistical power. It does tend to be the most commonly leveraged, however, and to

help you with this, there are some basic guidelines about identifying proper sample sizes. Unfortunately,

there is no single guideline for how many participants or subjects you will need for your statistical power

to be at an acceptable level. It depends on how large an effect you think you are looking for. Because

larger effects are easier to observe unambiguously, fewer participants are needed when the effects are

large. Along these lines, more participants are needed when the effect you are expecting to find is very

small. If you have absolutely no idea what effect size you should expect based on previous, similar

research, then you might shoot for an appropriate sample size given a low to moderate effect.

Conservative sample size estimates will be based on the smallest possible effect size you expect to

observe.

There are very complicated formulas you can use to figure out a rough sample size estimate. Most

research methods books (including Weathington et al., 2010) include tables that can help you estimate

your necessary sample size for adequate levels of statistical power. Using these tables requires you to

know your expected effect size and desired power level. One of the most straightforward treatments of

sample size estimation was provided by Cohen (1992). I have included this article also at the end of this

guide for you to have as a resource. Again, when in doubt, estimate for the lowest likely effect size you

expect to see in your planned research (i.e., the worst-case scenario).
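
As a rough illustration of how effect size drives sample size, the sketch below uses one of the simpler formulas: the normal-approximation estimate n = 2 * ((z_alpha + z_power) / d)^2 per group for comparing two independent group means with a two-sided test. This is an assumption chosen for illustration, not the method used in the tables mentioned above, and it can come out a participant or two lower than exact t-based tables. The function name n_per_group is hypothetical; the d values of 0.2, 0.5, and 0.8 are Cohen's conventional benchmarks for small, medium, and large effects.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Rough participants-per-group estimate for a two-group mean comparison.

    Normal approximation (an assumption of this sketch):
        n = 2 * ((z_alpha + z_power) / d) ** 2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided Type I error rate
    z_power = z.inv_cdf(power)           # desired power (1 - beta)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    print(f"{label:>6} effect (d = {d}): about {n_per_group(d)} per group")
```

The pattern matches the guidance above: the smaller the effect you expect, the more participants you need, so planning around the smallest plausible effect gives the most conservative estimate.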

Navigating within PASW

When you first open the PASW program, you are confronted with a start-up window that immediately

scares most users:

For most PASW users, some options on this list are not helpful. The ones to pay the most attention to

are the following:

• If you have some time and would like to learn more about how to use specific features of the PASW program, by all means, click on “Run the tutorial”

• If you are going to begin a new dataset with manual entry of data (perhaps from some paper surveys that you administered), then click on “Type in data” – more on this later in this guide

• If you are thinking of converting an Excel data file to PASW, then you can click on “Create a new query…” – more details are provided in this guide in a bit

• If you are hoping to open a data file that you’ve already been working on, then click “Open an existing data source”. My recommendation is that if you are opening an existing dataset, you should highlight the “More Files…” option and then click “OK”, so you can locate the most recent version of your dataset yourself before you start working on it again.

Once you have chosen to either open an existing file or start from scratch, you will be looking at what’s

called the “data editor”. If you’re familiar with Microsoft Excel, then this window should look sort of

familiar. Across the very top of the screen is a typical Windows-style menu bar. Although some of the

functions in these menus are quite similar to what you are probably used to seeing in Microsoft

products, some are not. In addition there are a few things to be aware of when using these menus in

PASW. Here’s a summary of some of the main things you will likely use:

• File: Standard menu with save functions and other basic things. Other than the New, Open, and Save functions, the items in this menu require more advanced instruction. What’s important right now is that you use the PASW Save/Save As functions frequently. This program is not without bugs, and it requires significant processing resources on your computer. At times these factors can cause the program to freeze or lock up, and there are few things worse than losing data that you have been working on for a long time. Take-away points:

    o Use this menu to save your files regularly in folders that you can easily find

    o Save your files on back-up media frequently as well, just in case

    o Use file names that make sense to you

    o One other thing to remember is that if you want to open an existing data file, you click File > Open > Data…, and then browse to find the file you seek

• Edit: A standard Edit menu. Highlight cells in the data editor that you want to modify, cut, or copy and use this menu as you would in Excel or other similar programs

• View: Allows you to modify the way your data editor window looks; mess with it if you have extra time.

• Data: Allows you to change things in the data editor. These functions are probably the most useful to you:

    o Insert variable – adds a column into the Data View for a new variable

    o Insert cases – adds a row into the Data View for a new case/subject/person

    o Split file – allows you to split data based on some grouping factor for separate analyses

    o Select cases – allows you to select a subset of your cases for separate analyses

• Transform: Use when you want to modify your variables somehow. These functions are likely useful for you:

    o Recode – If you need to reverse-code an item from a scale, or if you want to change Male/Female to 0/1, this is the function to use. NOTE: I recommend you use “Recode into Different Variables…” so that you do not lose the original raw data. If you use “Recode into Same Variables…”, it can be difficult or impossible to undo a mistake.

    o Compute – Use this if you want to create a new variable that is some combination of two or more other variables (e.g., a summated or averaged scale score based on multiple items)

• Analyze: Presents you with all of your analysis options. Here are the ones you are most likely to utilize:

    o Descriptive Statistics: This is where you can calculate statistics for central tendency, dispersion, and general frequency

    o Compare Means: Here you will be able to run t tests and one-way ANOVAs

    o General Linear Model: This is where you can run more complex factorial ANOVAs, repeated measures ANOVA, and multivariate analysis of variance (MANOVA)

    o Correlate: When you want to calculate Pearson’s r and other forms of correlation statistics

    o Regression: For simple and multiple linear regression, and logistic regression

    o Loglinear: For more complicated non-parametric analysis involving three or more categorical variables

    o Data Reduction: In case you are interested in factor analysis – perhaps this is a topic you might want some help with

    o Scale: Useful when you need to calculate the internal consistency reliability of a set of items from a scale

    o Nonparametric Tests: Helpful when you find out that your real-world data do not meet the assumptions/requirements for standard parametric statistics such as ANOVA and t tests.

• Graphs: Will allow you to generate some basic figures from your data. NOTE: If you hope to impress others with your figures, I recommend that you use the graphic options within Excel, as they are more flexible than those in PASW 17. This may not be the case with future versions of PASW, but this has been my experience in the past.

• Window: Through this menu you can move among multiple PASW windows including the data editor, output, and syntax windows (the latter two will be discussed soon)

• Help: This is actually a very helpful Help menu, giving you access to in-program and online tutorials, which can be really useful when you’re stuck or have forgotten things that others have taught you already.

Many of these menu-based functions are also available through shortcut icons that are displayed in the

toolbar just below the drop-down menus. Holding your mouse over these will tell you what these

functions can do.
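
The logic behind the Transform menu's Recode and Compute functions can be sketched outside PASW in a few lines of Python. This is only an illustration of what those functions do to your data, not how PASW implements them. The three-item scale, the 1-to-5 response range, and all variable names (item1, item2_r, scale_mean, and so on) are hypothetical.

```python
# Hypothetical responses to a three-item scale scored 1-5.
# item2 is negatively worded, so it must be reverse-coded before averaging.
raw = [
    {"id": 1, "item1": 4, "item2": 2, "item3": 5},
    {"id": 2, "item1": 2, "item2": 4, "item3": 1},
]

SCALE_MIN, SCALE_MAX = 1, 5


def reverse_code(score):
    """Flip a score on the 1-5 scale: 1 <-> 5, 2 <-> 4, 3 stays 3."""
    return SCALE_MIN + SCALE_MAX - score


scored = []
for row in raw:
    new_row = dict(row)  # work on a copy so the raw data stay untouched
    # Like "Recode into Different Variables": the result goes in a NEW variable
    new_row["item2_r"] = reverse_code(row["item2"])
    # Like "Compute": an averaged scale score built from several items
    items = [new_row["item1"], new_row["item2_r"], new_row["item3"]]
    new_row["scale_mean"] = sum(items) / len(items)
    scored.append(new_row)

for row in scored:
    print(row["id"], row["item2"], row["item2_r"], round(row["scale_mean"], 2))
```

Because the recoded values live in item2_r, the original item2 responses are still available if the recode turns out to be wrong, which is the reason for preferring “Recode into Different Variables…”.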

Data View

At the bottom-left corner of your screen there are two tabs. The first is labeled “Data View” and the

second, “Variable View”. If you are looking at the Data View, you will see the following:

Along the left-hand side of the Data View window are row numbers, which will typically equate to

individual cases or people in your dataset. Along the top are columns, which will typically be labeled to

represent each of the variables in your dataset. All variables, independent and dependent, predictor and

outcome, grouping/categorization, etc. get their own columns. In the image above, all the row and

column headings are grayed out because you have not entered any data yet.

Variable View

Back at the bottom left of the screen, if you click on the Variable View tab, here’s what your screen will

look like:

This screen is a bit more complicated, and it should be – this is where you really control the organization

of your dataset. Along the left side are numbers reflecting each individual variable (the columns that

you’ll see in the Data View). Along the top are columns for specific properties that you can set for each

of the variables in your data file. Some of these variable properties are really important and some of

them aren’t. Here’s a summary of the important ones:

• Name: This is where you type a brief variable name. Keep it simple and short and don’t use any special characters. The Name cannot start with a digit.

• Type: Mostly you will select “Numeric”, but sometimes you may have text (classified as “String”), dates (“Date”), or dollars (“Dollar”) – selecting the variable Type will also establish the format for how the data have to be entered for this specific variable

• Width: Default is 8 spaces, but you can make it smaller if you want to see more variables on the screen at one time

• Decimals: Default is 2, but sometimes your data will not have decimals, so you can change this to 0

• Label: This is where you can type in a more descriptive and useful variable label. NOTE: I highly recommend you do this for every variable in your dataset, especially if you intend to share your file with others or ask for help from others. Without labels, datasets can easily become unintelligible.

• Values: Use this when you need to assign grouping labels (e.g., Male = 0, Female = 1) to a specific grouping variable

• Missing: This is where you can assign a discrete value to denote missing data (rather than entry error). Common practice is to use a number like 999 here, so that when the data are entered, you can make sure that you didn’t miss a few keystrokes

• Columns: You can set the width of the columns in your data view here

• Measure: This is where you specify what level or form of measurement each variable happens to be (Nominal, Ordinal, or Scale; more on this later in the guide)
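
To see why declaring a code such as 999 in the Missing column matters, consider this small Python sketch. It is an illustration of the idea only, not PASW's own behavior, and the ages are made-up values. If the code is not declared as missing, it is treated as real data and distorts every statistic computed from that variable.

```python
from statistics import mean

MISSING = 999  # user-defined missing-value code, as entered in the Missing column

ages = [34, 51, MISSING, 47, 29, MISSING]  # hypothetical ages; two were not recorded

# Keep only the cases that hold real data
valid_ages = [age for age in ages if age != MISSING]

print("Mean with 999 treated as data:  ", round(mean(ages), 1))
print("Mean with 999 declared missing: ", round(mean(valid_ages), 1))
print("Cases excluded as missing:      ", len(ages) - len(valid_ages))
```

Pick a missing-value code that can never occur as a legitimate score on that variable; 999 works for age, but it would be a poor choice for a variable such as a laboratory value that can reach the hundreds.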

Output Viewer

Whenever you do something in PASW 17, including saving or running an analysis, a third window

appears, the PASW Statistics Viewer. This window summarizes the output of the “dialog” or action that

you just initiated when you selected some function from one of the menus. The Viewer window is really

basic, listing a hierarchical diagram of your output on the left, and the output itself in the middle. You’ll

see some examples of output as we get into other sections of this guide. One important point to

highlight at this stage, however, is that it is from this Viewer that you will most likely want to print or

export your results for interpretation and presentation. All menu functions are quite similar to those that

you’ll see in other Windows programs. One thing to highlight, though, is that you can be selective about

what you print or export, so that you don’t get everything PASW happens to spit out. To do this,

highlight the sections of the output that you want (you can use Ctrl + mouse clicks to accomplish this)

and then when you select File > Print, choose “Selection” before you click “Print” and you’ll get just

what you want to see. You may also want to experiment with copying and pasting selections of your

output directly into Excel where you can easily make figures/graphs and tables.

Understanding Variables and Data Formatting

In research terminology, variables are the things we measure. They represent information that is

observed in the environment or within/about the people or subjects we are studying. Variables can take

many forms, but when working within PASW, there are three main forms to understand.

Nominal

A nominal variable is considered to be the most basic form of measurement. The purpose of this type of

variable is to indicate some sort of qualitative difference among people or things. Common examples

include biological sex, religious preference, marital status, level of education, diagnosis code, etc.

Because responses to a nominal variable are basically names, you cannot use this type of variable for

mathematical or statistical operations, except for frequency-based analyses or to serve as a grouping

variable.
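
As a small illustration of a frequency-based analysis on a nominal variable, here is a sketch in Python. The numeric codes, their labels, and the data are all hypothetical. The codes work like the Values labels in PASW: they are names, so counting them is meaningful but averaging them is not.

```python
from collections import Counter

# Hypothetical coding scheme, comparable to the Values property in Variable View
value_labels = {1: "Single", 2: "Married", 3: "Divorced"}

marital_status = [2, 1, 2, 3, 2, 1]  # one code per participant

counts = Counter(marital_status)  # frequency of each code

for code, label in value_labels.items():
    print(f"{label:<9} {counts[code]}")

# Note: the average of these codes (about 1.83) has no interpretation,
# because the numbers only name categories.
```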

Ordinal

An ordinal variable is numerical and rank-ordered. Common examples of these types of variables would

be finishing position in a race, rating of quality, rank in class. Although rank-ordering type information is

sometimes of interest to researchers, it does tend to limit our statistical analysis options. The problem

with ordinal data is that it tends to be imprecise – is the difference between a 1 and 2 on an ordinal

scale the same as the difference between 3 and 4? Think of a table of finish times in a race and you’ll

quickly realize this is not the case. For this reason, with ordinal data all we can typically do statistically is

make general comparisons (e.g., > or <).
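
The race example can be made concrete with a short Python sketch. The runners and finish times are invented for illustration. Converting times to ranks keeps the order but throws away the size of the gaps between finishers.

```python
# Hypothetical finish times in seconds
finish_times = {
    "Runner A": 1500,
    "Runner B": 1505,
    "Runner C": 1620,
    "Runner D": 1700,
}

# Sort by time, then assign ranks 1, 2, 3, ... (the ordinal version of the data)
ordered = sorted(finish_times.items(), key=lambda pair: pair[1])
ranks = {name: position for position, (name, _) in enumerate(ordered, start=1)}

# Time gaps between adjacent finishers (information the ranks do not carry)
times = [time for _, time in ordered]
gaps = [later - earlier for earlier, later in zip(times, times[1:])]

print("Ranks:", ranks)
print("Gaps between adjacent ranks (seconds):", gaps)
```

Each pair of adjacent ranks differs by exactly 1, yet in this made-up race the gap between 1st and 2nd is 5 seconds and the gap between 3rd and 4th is 80 seconds.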

Scale

The most common type of variable that you will work with in the PASW program is likely to be a scale

variable. Scale variables are quantitative and assumed to have a one-to-one connection between the

construct being measured and the numbers on your response scale. In other words, a higher score on

your measure should typically be linked to a higher amount of the construct being assessed. When you

are measuring things like temperature, scores on most validated tests, physical data (height, weight,

age, speed), reaction times, or number of problems solved, you should set the variable’s Measure

property to Scale.

Importing Data from Excel

Often it is useful to organize data in Excel before trying to analyze it in PASW. Sometimes you don’t have

a choice, especially if you are using an internet-survey program, as Excel tends to be the best default

data download option. This being the case, you may need to import your data from Excel into PASW at

some point to run your analyses.

Before I describe how to do this, some of you may wonder why you can’t use Excel to run your analyses.

In all honesty, sometimes you can, especially if you are just doing basic descriptive analyses. However,

any time you are doing something more complicated than calculating means and standard deviations, I

find PASW to be much simpler, easier, and more powerful. There are two main ways to transfer data

from Excel into PASW, the ugly way and the professional way.

Ugly Way

If you are only transferring a small amount of data from Excel to PASW, it is often easiest to simply copy

and paste it like you would with text in any word processing program. Before you do this, however, I

strongly recommend that you open a new data file in PASW and create all the variables that you’ll need

to make the paste-over happen smoothly. This is especially critical if you will be copying and pasting a

mixture of numerical and qualitative data, because PASW will only accept the qualitative (text)

information if you have set the variable type to “String”.

I know this sounds pretty simple, but to be honest, copying and pasting into PASW does not always work

as it should. Depending on your version of PASW, only 40 rows at a time may be transferred. Sometimes

you also need to highlight in PASW the precise cells in the matrix that you will be pasting into. For these

and other reasons it is critically important that you manually examine your data every time you paste, to

make sure that pieces of it are not missing or pasted into the wrong columns or rows.

Professional Way

Because of the risks associated with the ugly way of importing data from Excel to PASW, there is another

way. Unfortunately, this other way isn’t very simple. Hopefully the following information will help you

make sense of this process though. I’ll try to organize this in steps:

Step 1: Clean up your data in Excel

• Save your import file as a separate workbook with only one sheet

• Make sure what you import includes only the data that you want to analyze (delete everything else and don’t worry, because you should have saved a backup already!)

• Make sure the first row of data in your Excel worksheet contains your variable names (you can insert descriptive labels once the data are in PASW)

• Save this file somewhere you can find it again and move to the next step

• Close the Excel file and program (otherwise PASW won’t be able to run the import query)
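
Before importing, it can help to check that the first row of the worksheet will make acceptable variable names. The Python sketch below applies a deliberately conservative rule assumed for this illustration (start with a letter, then only letters, digits, or underscores, and no repeated names); PASW itself accepts a few more characters than this. The function name check_header and the example header row are hypothetical.

```python
import re

# Conservative rule assumed for this sketch: a letter first, then letters,
# digits, or underscores only. PASW allows a few more characters than this.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def check_header(names):
    """Return (name, problem) pairs for a worksheet's first row."""
    problems = []
    seen = set()
    for name in names:
        if not NAME_PATTERN.fullmatch(name):
            problems.append((name, "start with a letter; no spaces or special characters"))
        if name.lower() in seen:
            problems.append((name, "duplicate variable name"))
        seen.add(name.lower())
    return problems


# Hypothetical first row typed into Excel
header = ["id", "sex", "age", "1st_visit", "pain score", "age"]

for name, problem in check_header(header):
    print(f"{name!r}: {problem}")
```

Here the sketch flags the name that starts with a digit, the name containing a space, and the second use of “age”; fixing these in Excel first makes the import cleaner.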

Your Excel file should look something like this before you move ahead to the next step:

Step 2: Begin the import process from within PASW

• Open the PASW program and wait for the introductory dialog box to appear

• Instead of searching for an existing file, this time select “Create a new query using Database Wizard” and click OK

• In the wizard dialog box that pops up next, select “Excel Files” and click Next

• Click on Browse and locate the Excel file you just created for this import; click OK

• The next dialog box looks confusing (and it is), but don’t freak out. If you set up your Excel file as instructed previously, then all you need to do is select “Sheet 1” and click the blue arrow to shift it from the box on the left to the box on the right. Once you click the arrow, all of the columns from your Excel sheet should be visible (these are the variables you’re importing into PASW). Then click Finish (the other options that you could work with are pretty advanced and not really necessary if you followed the steps outlined previously for setting up your Excel file)

The following screen shots walk through these steps:

At this point (once you click Finish), you’ll be dumped into the PASW data editor window. Make sure you

are looking at the Variable View tab.

Now, you have a little more organization work to do before you can start your analyses.

Step 3: Clean up your PASW file

• Start by saving your new PASW file somewhere you can find it again

• Now, type in meaningful labels for each variable

• Check and correct the Measure type for each variable – most will be Scale, but some may be Nominal and/or Ordinal

• Set Values for any nominal variables (e.g., 1=Male, 2=Female for the variable “sex”)

• Save your file

• Save a backup somewhere else

At this point you can begin your analyses (more on that later).

Entering Data within PASW

Many basic guidelines for working with data in PASW have already been covered in the previous

sections of this little guide. However, if you are planning on entering data manually within PASW (and

not importing it from another program such as Excel), there are a few other things to think about. If you

have some extra time available to get familiar with the PASW program, I highly recommend that you

take the time to walk through its tutorial functions.

• Once you have opened PASW, click on Help > Tutorial

• For entering data, the first two topics are the most relevant guides

    o Reading data

    o Using the data editor

• Other topics in the tutorial will be helpful later

If you don’t have time to work through the PASW tutorial itself, then try to follow these steps and you

should stay on track pretty well.

Step 1: Open PASW and select “Type in data” to open a new data editor window

This is also a great time to save your file for the first time

Step 2: Design your data file

When entering data manually you can basically view the PASW data editor window as a worksheet in

Excel. The rows should represent people or cases and the columns will be your different variables. An

efficient way to begin the data entry process in PASW is to follow these steps:

Create your variables in the Variable View tab of the data editor. For this, you can simply click on the

first cell under the column “Name” and type in a brief variable name, like the ones you used in Excel

before importing data. You can think of your variables as buckets or bins into which you want to

dump data that you’ve gathered – You need as many bins as you have separate pieces of

information on each participant/subject/case. If you are going to be entering data from a survey,

you should have as many variables/columns as there are responses for each participant, or data for

each case.

o Extra special tip: You might want to use a blank copy of the survey or data collection

instrument as a sort of “code book”, on which you can write each variable name next to

each blank for a response as a way of keeping track which variable in your dataset

represents which response on the actual survey or data form.

Once you enter a variable name, PASW will auto-complete the rest of the properties for this

variable, but it may get the details wrong. Check the variable properties (already discussed in

previous sections) to make sure that you’ve included a descriptive variable label, that the variable

type is correct, and that you have the proper number of decimal points, etc.

If any of your variables are coding variables or have meaningful labels associated with points on the

response options, then make sure to also define the Values of the variable (e.g., 1=full-time work,

2=part-time; 0=Male, 1=Female). The convention is to assign codes in the order in which the

response options appear on the survey or template you are using to collect data; typically you will

start with either a 0 or a 1 and count up to assign these values. Because these numerical codes are

arbitrary place-holders for more meaningful Value labels, their specifics do not matter, as long as

you have defined the Values.

If you know or think you will have some missing data in what you are about to enter, then it may be

a good idea to establish a numerical code that you can use to denote missing data (helps you to

check if you simply forgot to enter some values along the way). You can set discrete missing values

under the Missing column in the Variable View. It is common to use a 99 or 999 as a missing code –

the only rule here is that this value should be used consistently throughout your dataset and it

should not be a legitimate possible value on any of the variables (e.g., if age is a variable in your

dataset, then you might not want to use 99).

Once you’ve created all the variables you think you’ll need, save the file again and prepare

to enter your data
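The variable set-up described in Step 2 can also be written down as a small code book. The following is a minimal Python sketch (not PASW syntax) of that idea. The variable names (`sex`, `workstat`, `age`), their labels, the valid age range, and the missing code of 999 are all hypothetical examples chosen for illustration.

```python
# Hypothetical code book: one entry per variable (column) in the data file.
CODEBOOK = {
    "sex": {
        "label": "Participant sex",
        "measure": "nominal",
        "values": {0: "Male", 1: "Female"},
        "missing": 999,
    },
    "workstat": {
        "label": "Work status",
        "measure": "nominal",
        "values": {1: "Full-time work", 2: "Part-time work"},
        "missing": 999,
    },
    "age": {
        "label": "Age in years",
        "measure": "scale",
        "values": {},               # no value labels for a scale variable
        "valid_range": (18, 110),   # assumed plausible range for this sample
        "missing": 999,
    },
}


def missing_code_conflicts(codebook):
    """Return names of variables whose missing code is also a legitimate value."""
    conflicts = []
    for name, spec in codebook.items():
        code = spec["missing"]
        low, high = spec.get("valid_range", (None, None))
        in_range = low is not None and low <= code <= high
        if code in spec["values"] or in_range:
            conflicts.append(name)
    return conflicts


def display_value(codebook, name, raw):
    """Return the value label for a raw code if one is defined; flag the missing code."""
    spec = codebook[name]
    if raw == spec["missing"]:
        return "(missing)"
    return spec["values"].get(raw, raw)


print(missing_code_conflicts(CODEBOOK))    # [] because 999 is never a legitimate value here
print(display_value(CODEBOOK, "sex", 1))   # Female
```

If the missing code for `age` were 99 instead of 999, the first check would flag `age`, which is exactly the situation the guideline above warns about.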

Step 3: Enter your data

At this point you should be prepared for tedium and also a potentially serious strain on your focus (and

eyes, unfortunately). Typically you will have gathered data at this point, in the form of a survey perhaps

or observations from records. Have all of those files near you before you begin entering data so you can

be efficient in your actions. Here are some tips to make this process go smoother:

If the data you need to enter are organized on some form of paper survey or template (one

for each case), start your data entry by assigning an ID code to each survey. Just write it on the

upper right corner of each. This will help you to double-check for data entry errors later or to pull

additional information from these surveys if you are not entering all of it at one time

Make sure you are working in an environment that is relatively free from distractions – few things

are worse than having to fix lots of entry errors because you weren’t able to type data in the right

way the first time

Save your data file frequently

If you are entering numeric data, you may find it easier to work with a keyboard that has a

dedicated number pad. In PASW, you can move across a row (within a participant/case) by pressing

the Tab key. This being the case, the data entry sequence would be: enter a number with the right

hand, hit Tab with your left, enter a number, tab… With practice, you will be able to enter lots of

data quickly

If you have several Nominal or grouping variables in your dataset, and you’ve defined their variable

Values (as previously described), then you may want to see those labels as you enter data. For

example, if you are entering a person’s sex, when you type a 1, you might like to see “Male” and if

you type a 2, you would like to see “Female”. To do this you need to turn on Value Labels

o In the Data View, on the top menu bar, click View → Value Labels to turn this function on

If the Value Labels function is off, you will see the number that you enter; if this

function is on, you will see the Value Label that you assigned to that numerical code

when you were creating the variables in Variable View

From time to time it is important to spot check your data entry to make sure you are hitting the

right keys and/or have your “Num Lock” on for your keyboard. I recommend that you do this every 5

– 10 cases, just by quickly reviewing the last few cases that you entered. Right after this quick check

is also a great time to save your file and your backup copy again (I know I’m a bit paranoid, but trust me,

it’s worth it when you work with data for a living).

Choosing the Right Statistical Analysis

Entering data into PASW is difficult enough. Once you have a complete dataset, however, the real

challenge begins. What statistical analysis should I use to test my hypotheses or answer my research

questions? I wish there was an easy way to answer this question, but there’s not. One of the reasons for

this is that there are typically several ways in which the same research questions can be addressed with

the same set of data.

Here are a few things that may be helpful for you to consider, though:

Check out a decision tree like the one that’s included at the end of this guide (from Field, 2009,

back cover)

Review the SPSS manual (available to you electronically)

Talk to a colleague who might know what approach you should take

Try using the PASW Statistics Coach tool

Here are the steps involved in using the PASW Statistics Coach:

Inside the Data Editor, click on Help → Statistics Coach

Follow the prompts to guide you to instructions for running the specific analyses that you think you

need to run to address your research questions

The PASW Statistics Coach is designed to be intuitive, so I will not review it in depth here. This help

function will assist you once you have entered your data and if you have an idea of the research

questions you hope to address. You also need to know the type of data that you will be analyzing

(nominal, ordinal, or scale – all discussed previously in this guide). In general:

If you are looking for basic descriptive statistics (mean, variability, frequencies) and/or figures and

tables for presentations you should click on the first link in the Statistics Coach window,

“Summarize, describe, or present data”

To dig into the variability within a set of data, the second link is appropriate, “Look at variance and

distribution of data”

OLAP report cubes (typically for descriptive purposes) will probably be more advanced than you will

regularly use, but if you want to learn about this function you can click that link

If you are interested in comparing a control versus one or more experimental groups, then the next

link is a good place to start, “Compare groups for significant differences”

If your research involves demonstrating a relationship between two or more variables (e.g., a

correlation perhaps), then the next link is a good starting place, “Identify significant relationships

between variables”

If you are looking for similarities across cases (clustering of some sort), the next link will guide you,

“Identify groups of similar cases” (warning, this can get rather complex)

If you are trying to identify groups of similar variables, then the last link will get you started.

Generating Presentation-Quality Output

Generating tables and figures in PASW 17 is a lot easier than it was with earlier versions of this program.

I still feel that you have more flexibility with your figure types and table formats if you use Excel, but

either program will allow you to create presentation-quality graphics. Before you get too concerned

about how to generate graphics in either program, it is important to highlight some basic tips about

visual presentation of statistical information.

In general, pictures are worth a thousand words – when they help to represent statistics, pictures are

probably worth more like a million words. Why? Statistics are scary to people, but pictures are not.

Tables are also very efficient, but writing all of your results out in the text of a report or presentation

slide/poster would be ridiculous.

“Good” figures should adhere to these basic principles (adapted and expanded from Field, 2009):

o Show the data simply and cleanly

o Push the viewer to think about the data being presented (and not the colors in the figure)

o Avoid distorting the data

o Present maximum information with minimum “ink”

o Make large data sets coherent

o Encourage the reader to make comparisons among elements of the data

o Avoid 3-d graphics unless you are actually trying to show a three-way relationship

o Make sure all axes are clearly labeled (and that each table/figure has a descriptive note

explaining any abbreviations or symbols that you used in the graphic)

We can do this in several ways, but what’s most appropriate depends on the type of data you are trying

to showcase. There are two basic rules to remember when choosing the type of figure you want to

create:

Continuous data can be plotted with histograms or line-graphs (or something similar)

o These figure formats show continuity between the X-axis elements and help the viewer to

“see between the lines” when doing so is appropriate

Discrete data should be plotted with bar graphs or pie charts (or something similar)

o These figure formats keep each set of responses isolated to their respective category, which

labels the X-axis – there is no in-between when it comes to illustrating the percentage of

your sample who was Male vs. Female

If you’re trying to illustrate a relationship between two variables, a scatterplot may be your best

option (although reporting the Pearson’s r will also do the trick if your audience understands

statistics at least a little bit)

Creating Graphics in PASW

Creating tables in PASW 17 is pretty easy, as there are several pre-set format styles for all statistical

output that is displayed in the PASW Viewer window (this is what pops up after you’ve run an analysis).

You can select your preferred style from within the Data Editor, by following these steps:

Click on Edit → Options → Pivot Tables and find a style that works for you

Then click OK and the next time you open the output Viewer, the tables should conform to your new

style

From within the output Viewer, you can click on any table that you want and then copy and paste it into

Word, PowerPoint, or whatever other program you are using to put together your presentation or

report. This copy and pasting process is just like what you would use within any Windows program.

If you need to create charts/figures, the PASW 17 program comes with two built-in options. The new

Chart Builder function is seen as intuitive by some and utterly confusing to others; I fall somewhere in

between and still prefer to output my results to Excel where I can create most figures I would ever need.

The Legacy Dialogs option may actually be even easier for you to use – my recommendation is that you

try both approaches within PASW to see if one works best for you. If neither one works (and you’ve tried

the built-in PASW tutorial), then I recommend you consider copying and pasting your statistical output

into Excel and making your figures there. If that doesn’t work, then I recommend you find yourself a

research collaborator who knows how to create clear and meaningful graphics – seriously, life’s too

short.

If all you need is a basic bar chart, line graph, or scatterplot, you should probably create these in PASW.

Here are the basic steps to get you started:

Open the data file that contains the information you want to illustrate with a figure/chart

Consider skimming through the Tutorial function on this PASW tool: Help → Tutorial → Creating and

Editing Charts. This is a very helpful tutorial that is strongly recommended if you are doing a fair

amount of graphics work within PASW

If you’re one who hates to follow directions and you prefer to dive in head-first, then click on

Graphs → Chart Builder

o You may get a warning message when you do this, telling you that you have to make sure

measurement levels are set for all your variables before you use the Chart Builder. If you

followed the instructions in this guide, this should be a non-issue, but if you didn’t, follow

the prompts to make sure all variables have properly assigned variable types (scale, ordinal,

nominal)

In the dialog box that pops up, you’ll need to select the type of graphic that best reflects what

you’re looking for (they’re listed on the left side of the Gallery and you can choose from several

general types)

Once you select a graph type, then you can pick a style by clicking and dragging one of the related

options (to the right) up into the white space above the Gallery details

Once this template is in place, you can click and drag over specific variables from the list on the left,

into the figure on the right, assigning them to specific axes as desired.

o If you intend to compare scale scores that are based on multiple items, make sure you have

computed scale scores and listed them as separate variables in the data file before you

attempt to plot them here

o A second dialog box will pop up next to the Chart Builder window. This second box allows you

to control additional details of the elements in your chart, but you can ignore it if this is too

much control

Once you have set the variables in place, then you should work through the other options in the

Chart Builder window to make sure you’ve included a title and any necessary descriptive notes

Once you have set these details in the Chart Builder window, click OK and the chart will appear in your

output Viewer. You can edit it further by double-clicking on the figure in the Viewer and using the

editing tools that will appear

o Note that what you see in the Chart Builder preview is not necessarily what you’ll get in the

output Viewer, so you may have to work through this process a few times before you get

what you want

You can copy and paste this into other programs as necessary

Creating Graphics in Excel

Obviously it is possible to create tables and figures in PASW 17. Sometimes you’ll want to create a

special type of table or a figure in which you can include additional labels or formatting options

(especially if you are doing a visual presentation). If this is the case, you may want to consider creating

your tables and figures in Excel. This is really a topic for a separate workshop and guide, but to get you

started, you can copy and paste the relevant snippets of output from your output Viewer into a blank

Excel workbook. From there, you can manipulate (copy and paste style) the output so that you can build

your ideal figure or chart directly. Because everyone has their own style for handling this more custom

development process, I am not offering specific step-by-step instructions here, but I will suggest the

following:

Make sure you save your Excel file frequently

Don’t forget the basic rules about formatting figures presented earlier in this section (i.e., don’t go

overboard just because Excel lets you plot things in multicolored three-dimensions)

Check the font, type size, and general formatting of your final tables/figures to make sure everything

is uniform and professional looking before you try to copy and paste out of Excel and into a report

or presentation slide

Consider turning the gridlines off in Excel if they are showing up when you copy and paste (this

happens sometimes, and although I can’t tell you why, I can tell you how to turn gridlines off). This

seems to be a more common issue with Excel 2007 than previous versions. To turn gridlines off in

Excel 2007: Click on Page Layout → uncheck the View box under Gridlines (toward the right side of

the menu bar)

So much of the “skill” in presenting statistics properly comes from trial and error. You can shorten your

learning cycle, however, by paying attention to the way statistics are presented in the journals you read

and the presentations you see at conferences. Mimicking the work of other strong researchers is totally

ok – in fact it’s expected, given that the field of science is based on replication. This being the case, the

information presented here is enough to get you started, but now you have to pay attention to how it’s

done within your specialty field.

Basic descriptive and correlational statistics

It is always a good idea to begin your analyses with a basic review of the descriptive statistics pertaining

to your data. Typically this is done at the scale score level (if you have multi-item scales included in your

dataset) or stand-alone single pieces of information (e.g., weight, height, age). This being the case,

before you run descriptive analyses, make sure you’ve computed all the necessary scale scores. Doing

this is pretty easy, but you need to know what the scoring instructions are for the multi-item scale you

are using. In many cases this simply involves adding or averaging each participant’s responses to all of

the items in a set together to create a scale score for each participant or subject in your data set. There

are a few ways to accomplish this in PASW, but the easiest way is by following these steps:

Within PASW, open the data file that has the raw data for each item in the particular test/scale

On the top menu bar, click on Transform → Compute Variable

In the box below “Target Variable:” type a variable name that will be assigned to the new scale score

you’re about to compute

In the “Numeric Expression:” box to the right, create the mathematical formula that will calculate

the total score for a set of scale items (following the scoring instructions from the original source for

the measure you are using).

o Note: Some scales have “reverse-coded” items embedded in them. This means that the item

is worded differently from other items in the scale (typically oppositely and negatively

compared to the more positively worded items). The purpose of such items is to limit

participants from getting stuck in a response set or pattern. These items need to be

manually recoded by flipping their scores before you calculate a total sum or average score

for a set of items that are part of the same scale. The scoring guidelines for an established

measure should tell you if any of the items needs to be reversed. Someone with skill in

assessments and evaluation work can also assist you if you are really not sure how to score a

set of items that you’ve gathered.

Click OK and the new total score variable will be added as a new labeled column at the end of your

data set.

o Note: If you will be adding more data to this data file at some point in the future and think

you might need to rerun this computation, I suggest you learn how to use PASW syntax. This

requires additional training, but for starters you can turn to the built-in PASW help on syntax

– basically simplified computer code that allows you to save frequent operations that you

perform on a data set so you can quickly re-run them without having to build them from

scratch in the menus every time.

Do this for all scales that you have in your data set. Then you’ll be ready to run your analyses.
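The reverse-coding and scoring logic described in these steps can be sketched outside of PASW. This is a minimal Python illustration, assuming a hypothetical four-item scale answered on a 1-to-7 response format in which item `q3` is reverse-coded. The item names and responses are made up, and the scoring instructions for your actual measure always take precedence.

```python
SCALE_MIN, SCALE_MAX = 1, 7          # assumed response format
ITEMS = ["q1", "q2", "q3", "q4"]     # hypothetical item names
REVERSED = {"q3"}                    # hypothetical reverse-coded item


def reverse(score):
    """Flip a response: 1 becomes 7, 2 becomes 6, and so on."""
    return SCALE_MAX + SCALE_MIN - score


def scale_score(case, average=True):
    """Compute one participant's scale score from the raw item responses."""
    scored = [reverse(case[item]) if item in REVERSED else case[item] for item in ITEMS]
    total = sum(scored)
    return total / len(scored) if average else total


participant = {"q1": 6, "q2": 7, "q3": 2, "q4": 5}
print(scale_score(participant))                  # 6.0 (average of 6, 7, 6, 5)
print(scale_score(participant, average=False))   # 24  (sum of 6, 7, 6, 5)
```

Whether you sum or average is a property of the measure, not of the software, so the `average` switch here simply mirrors the "adding or averaging" choice mentioned earlier.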

Descriptive Statistics

Descriptive statistics include measures of the central tendency and variability/dispersion of a set of data.

These statistics can be calculated for each scale score or stand-alone variable in your data set, as long

as the variable is quantitative and scalar in nature. Another form of descriptive statistic is a simple frequency

analysis, in which the focus is on how the scores in a set are distributed. It is this distribution that central

tendency and variability statistics will describe, but a frequency analysis by itself can also be run on

nominal data, to basically show how many people fall in each category for a given variable.

To calculate the basic descriptive statistics for a set of variables in your data set, follow these steps:

Within PASW, open the data file that has the raw data and any available scale scores entered for

each item in the particular test/scale

On the top menu bar, click on Analyze → Descriptive Statistics → Descriptives…

Click and transfer any variables that you want descriptive statistics for from the list on the left to the

empty box on the right.

If you want or need standardized values for these variables click the box on the lower left corner

(new variables will be added to your data set). This can be helpful if you are comparing multiple test

scores that do not share the same variability, or if you are thinking of combining a few different

variables together into some form of composite – standardizing helps to put the various measures

on the same scale, with a mean of 0 and standard deviation of 1.

Once you’ve transferred your variables of interest to the box on the right, click the “Options” button

on the upper right and click to select all the descriptive statistics you need. I recommend clicking just

about everything, although you might need to read up a bit on the importance of some of these

elements when you get more involved with your own analyses. Click “Continue” and then “OK” and

the statistics will be calculated

The output will be displayed in the output Viewer window where you can interpret it or copy and

paste it into Excel for further processing
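The standardized values mentioned in the steps above are z-scores. As a rough illustration of what that option computes, here is a minimal Python sketch. It assumes the sample standard deviation (n - 1 in the denominator) is used, and the weights are made-up values.

```python
from statistics import mean, stdev


def standardize(scores):
    """Convert raw scores to z-scores using the sample standard deviation."""
    m = mean(scores)
    sd = stdev(scores)   # sample SD (n - 1 in the denominator), an assumption here
    return [(x - m) / sd for x in scores]


weights_kg = [60, 70, 80, 90, 100]    # made-up values
z_scores = standardize(weights_kg)
print([round(z, 2) for z in z_scores])   # [-1.26, -0.63, 0.0, 0.63, 1.26]
```

After standardizing, the new variable has a mean of 0 and a standard deviation of 1, which is what makes measures with different units comparable.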

You can obtain much of the same descriptive statistics by running a frequency analysis on the data and

selecting some special options. This analysis can also be a great way to check and make sure that you

didn’t enter any ridiculous values when you were inputting data (i.e., look at the min and max values for

each variable in your data set and you’ll see if you entered something that’s impossible, like a 777 on a

scale for which responses only ranged 1 to 7). Here’s how to do that:



On the top menu bar, click on Analyze → Descriptive Statistics → Frequencies…

Click and transfer any variables that you want frequency statistics for from the list on the left to the

empty box on the right.

o Click on Statistics to select the analyses you would like

o Click on Charts if you want to see histograms and frequency curves printed for each variable

o Click on Format if you want something special (not always necessary)

o Then click OK
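The data-checking use of a frequency analysis described above can be sketched in a few lines. This is only an illustration in Python, not what PASW runs internally. The responses are made up and include one deliberate typo (777) on a 1-to-7 item.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count) pairs sorted by value."""
    return sorted(Counter(values).items())


def out_of_range(values, low, high):
    """Return (row number, value) pairs that fall outside the valid range."""
    return [(row, v) for row, v in enumerate(values, start=1) if not low <= v <= high]


responses = [4, 5, 777, 3, 5, 7, 1, 5]    # made-up data with one entry error

for value, count in frequency_table(responses):
    print(value, count)

print(min(responses), max(responses))     # 1 777
print(out_of_range(responses, 1, 7))      # [(3, 777)]
```

The maximum of 777 is the giveaway, and the row number tells you which case to look up on the original paper survey.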



Basic Correlations

If the data appear to be properly entered and relatively normally distributed (which you can tell from

the histograms and from the skewness and kurtosis statistics), then you can proceed with running

a basic correlation analysis on a set of data. Basic correlations are done with the Pearson’s r statistic, and

PASW will also let you include dichotomous variables such as sex (e.g., 1=female, 0=male) with

no problem (Pearson’s r computed with a dichotomous variable is the point-biserial correlation, so no

extra steps are needed). In general though, correlations should only be calculated on

data that are scalar and relatively continuous in nature (not nominal). A bivariate correlation indicates

the degree of relatedness between two variables. As a rough guideline, an r below .25 (in absolute value)

is a small effect, an r of .25 to .40 is a moderate effect, and an r of .40 or above is a large effect.
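To make the Pearson’s r statistic and the rough guideline above more concrete, here is a minimal Python sketch. The function names and the data are hypothetical, and the cut-offs are simply the rule of thumb from this guide applied to the absolute value of r.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def effect_size_label(r):
    """Apply this guide's rough guideline to the absolute value of r."""
    size = abs(r)
    if size < 0.25:
        return "small"
    if size < 0.40:
        return "moderate"
    return "large"


hours_slept = [1, 2, 3, 4, 5]     # made-up data
test_score = [2, 4, 5, 4, 5]      # made-up data

r = pearson_r(hours_slept, test_score)
print(round(r, 2), effect_size_label(r))   # 0.77 large
```

Note that the sign of r only tells you the direction of the relationship, so a correlation of -.45 counts as large just as +.45 does.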

Once you’ve identified the variables that you believe should be related to one another (hopefully this

stems from your hypotheses or research questions), here are the steps to follow:



On the top menu bar, click on Analyze → Correlate → Bivariate…

Click and transfer any variables that you want to correlate with each other from the list on the left

to the empty box on the right.

Click the Options button at the upper right

o Under Statistics, definitely request Means and standard deviations (the other option may not

be interesting to you)

o Under Missing Values, if your data set is rather large, then click “Exclude cases listwise” so

that there will be an equal number of data points in each of the correlations computed and

therefore a more stable estimate of relationship between the variables. If you have a small

sample then you may want to use the pairwise deletion option – in general though, you’re

better off making sure there is no missing data in your data set, in which case either the

pairwise or listwise options will yield the same result.

o Then click OK



In this particular example, you can see that Sex is negatively correlated with age and that nfrtot is

negatively correlated with General physical health. Your ability to interpret these significant correlations

will be dictated by how clearly you labeled each variable in the data editor and also whether you

understand what each of these variables means.
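The difference between the listwise and pairwise options under Missing Values comes down to how many cases feed each correlation. The following Python sketch illustrates that bookkeeping only. The three variables and their values are made up, and `None` stands in for a missing entry.

```python
# Made-up data: five cases, three variables, None marks a missing value.
data = {
    "age":    [25, 31, None, 40, 52],
    "weight": [70, None, 65, 80, 90],
    "health": [3, 4, 5, None, 2],
}


def listwise_n(data):
    """Count cases that have no missing value on ANY variable."""
    return sum(all(v is not None for v in case) for case in zip(*data.values()))


def pairwise_n(data, a, b):
    """Count cases that have values on BOTH variables a and b."""
    return sum(x is not None and y is not None for x, y in zip(data[a], data[b]))


print(listwise_n(data))                      # 2 (the same 2 cases for every correlation)
print(pairwise_n(data, "age", "weight"))     # 3
print(pairwise_n(data, "age", "health"))     # 3
```

With listwise deletion every correlation rests on the same two complete cases, while pairwise deletion keeps three cases per pair but a different three each time. With no missing data the two counts are identical.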

Other Types of Analyses

Your research questions will likely require you to utilize other statistical tests in addition to the ones

covered so far in this guide. You can think of this guide as a Part I to more advanced material. To help

you keep moving forward, however, here are just a few points and tips to consider with several other

common types of analyses you are likely to confront in your research. You can also get step-by-step

instructions for running all of these analyses by working through the PASW Statistics Coach, which is

accessible from the data editor window, via Help → Statistics Coach.

Comparing groups: t-tests and ANOVA

When your research hypothesis is something along the lines of, “There is a difference between group 1

and group 2 on this variable,” then a t-test may be your best statistic. This statistic is based on a

comparison between the means of two completely separate groups (independent samples t-test) or the

means of two groups that are somehow related (paired/dependent samples t-test). If you are interested

in comparing mean scores across three or more groups you will need to use an ANOVA. In PASW the

following steps will get you started on these types of analyses:



On the top menu bar, click on Analyze → Compare Means → [choose your mean comparison test]

Click and transfer the variables you want to compare from the list on the left into the Test Variable(s)

box. Then transfer the variable that indicates the grouping of individuals into the Grouping Variable

box (a common example is Sex, with the defined groups being 1=Female and 0=Male).

o If you don’t have one or more nominal variables defined already that you can use for

grouping, you should establish this prior to starting this type of analysis.

For a t-test, within the PASW Statistics Coach you would click on the link for “Compare groups for significant

differences” and indicate that you are comparing scale data for two groups. For an ANOVA, within the

PASW Statistics Coach you would click on the link for “Compare groups for significant differences” and

then select that you are comparing scale data across more than two groups.
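For readers who want to see what an independent samples t-test compares, here is a minimal Python sketch of the t statistic and its degrees of freedom. It assumes equal variances in the two groups (the pooled-variance form) and does not compute a p-value. The group scores are made up.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group1, group2):
    """Return (t, df) for an independent samples t-test, equal variances assumed."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    standard_error = sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / standard_error, df


treatment = [5, 6, 7, 8, 9]    # made-up scores, mean 7
control = [3, 4, 5, 6, 7]      # made-up scores, mean 5

t, df = independent_t(treatment, control)
print(round(t, 2), df)   # 2.0 8
```

The t statistic is simply the difference between the two group means divided by the standard error of that difference, so larger group differences or less variable scores both push t further from zero.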

Predicting outcomes: Linear Regression

If you are interested in going beyond a bivariate correlation (r) to try to predict an outcome with

multiple separate predictive factors, a regression of some sort may be most appropriate. In linear

regression the assumption is that the relationship between the multiple predictors and your outcome is

linear (that is, well described by a straight line). There are other forms of regression that can handle

dichotomous outcomes (logistic regression), or non-linear predictor-outcome relationships (polynomial

regression), but all of these are more advanced topics requiring additional training. To do a regression

analysis, I would strongly recommend that you work through the PASW Statistics Coach. Within the

PASW Statistics Coach you would click on the link for “Identify significant relationships between

variables” and then indicate that you have one dependent variable and two or more independent variables that are scalar.
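To show what "predicting an outcome with multiple predictors" means in practice, here is a minimal Python sketch of ordinary least squares with exactly two predictors. It is an illustration under stated assumptions, not how PASW implements regression. The data are made up so that the outcome equals 1 + 2 times the first predictor + 3 times the second, and the function name is hypothetical.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares estimates (b0, b1, b2) for y = b0 + b1*x1 + b2*x2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]       # deviations from each mean
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2      # zero if the two predictors are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


predictor_1 = [1, 2, 3, 4, 5]                 # made-up data
predictor_2 = [2, 1, 4, 3, 6]                 # made-up data
outcome = [9, 8, 19, 18, 29]                  # equals 1 + 2*predictor_1 + 3*predictor_2

b0, b1, b2 = fit_two_predictors(predictor_1, predictor_2, outcome)
print(round(b0, 2), round(b1, 2), round(b2, 2))   # 1.0 2.0 3.0
```

Each slope describes how much the outcome changes for a one-unit change in that predictor while the other predictor is held constant, which is the "straight line" assumption in action.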

When the data are “special”: Nonparametric Statistics

Many times when doing research the data will not “play nice” or conform to the assumptions that

typical statistical tests are based on. All of the statistics discussed up to this point are the relatively

simple (parametric) kind that rely on some basic assumptions about how data should be distributed. When

those assumptions are violated, there are nonparametric equivalents for most of the statistics discussed

already in this guide. The logic behind some of these tests is quite complicated, but if you work through

the PASW Statistics Coach, you will get some guidance on how to actually run these analyses.
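As one concrete example of a nonparametric equivalent, Spearman's rank correlation replaces the raw scores with their ranks and then computes an ordinary Pearson's r on those ranks. This guide does not name a specific test, so treat the following Python sketch as an illustration of the general idea only. The data are made up and include one extreme value.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1      # average of rank positions start+1 .. end+1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


stay_days = [2, 3, 4, 5, 60]      # made-up data with one extreme value
pain_rating = [1, 2, 3, 4, 5]     # made-up data

print(round(pearson_r(stay_days, pain_rating), 2))      # 0.74
print(round(spearman_rho(stay_days, pain_rating), 2))   # 1.0
```

The single extreme stay of 60 days pulls Pearson's r down, while the rank-based version only cares that longer stays always go with higher ratings.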
