Spss Introduction Document-sahyadri

Study Material on SPSS by Ms. Sumitha Achar ([email protected]) for One-Day National level Workshop on “Applications of SPSS in Research Data Analysis”, 4th March, 2014.

SPSS FOR BEGINNERS

SPSS, standing for Statistical Package for the Social Sciences, is a powerful, user-friendly software package for the statistical analysis of data. This package is particularly useful for researchers in psychology, sociology, psychiatry, and other behavioral sciences, as it covers an extensive range of both univariate and multivariate procedures much used in these disciplines.

Practicing researchers and business managers often find it difficult to solve real-life data problems involving statistical methods. The books available on statistics do give a comprehensive picture of statistics as a facilitating tool for decision making, but they invariably fail to help the researcher/manager solve practical problems and get results. Using simple examples, these books very successfully explain simple calculation procedures as well as the concepts behind them. However, manual calculations, being cumbersome, tiresome and error-prone, can succeed only in explaining the concepts, not in solving real-life research problems involving huge amounts of data. For this reason, most practical statistical analyses are done with the help of an appropriate software package. A researcher/manager is only required to prepare the input data and should be able to get the final result easily with the help of software packages, so that focused attention can be given to various other aspects of problem solving and decision making.

A wide variety of software packages such as SPSS, Minitab, SAS, STATA, S-PLUS etc. are available for statistical analyses. Microsoft Excel can also be used very successfully to solve a wide variety of problems.

This study material is an effort towards helping a researcher solve statistical problems using computers. The chosen statistical software is “SPSS”, which is a very comprehensive and widely available package for statistical analysis.

-Sumitha Achar

ADDRESS:
Assistant Professor
AIMIT
St. Aloysius College (Autonomous)
Mangalore, Beeri, 575022
Phone: 99808 85896
Email: [email protected] [email protected]


MODULE-1

INTRODUCTION TO SPSS

INTRODUCTION:

SPSS (originally, Statistical Package for the Social Sciences) was released in its first version in 1968 after being developed by Norman H. Nie and C. Hadlai Hull. SPSS is among the most widely used programs for statistical analysis in social science. Market researchers, health researchers, survey companies, government, education researchers, marketing organizations and others use it. In addition to statistical analysis, the software provides data management (case selection, file reshaping, creating derived data) and data documentation (a metadata dictionary is stored in the data file).

SPSS is a very powerful and user-friendly program for statistical analyses. Anyone with a basic knowledge of statistics who is familiar with Microsoft Office can easily learn to run analyses in SPSS through its point-and-click menus.

NOTE: Statistical analysis is like a sewer. What you get out of it largely depends on what you put into it. Over 82 percent of all statistics are made up on the spot to try to prove a point. You can conclude just about anything if you’re not careful with your data and with your calculations. SPSS watches the performance of the calculations for you, but the raw data, and which calculations should be performed, are up to you.

SPSS works with numbers only, not words. If you cannot express your information as a number, you can’t run it through SPSS. You will see names and descriptions seemingly being processed by SPSS, but that’s because each name has been assigned a number.

For example: suppose a survey question asks, “How much do you enjoy eating burgers?” and you have to select your answer from: Very much, Sort of, Not really, Hate the stuff.

A number (code) is assigned to each of the possible answers, and these numbers are fed through the statistical process.
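
The coding step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not something SPSS itself runs; the particular numbers (1 to 4) and the names ANSWER_CODES and code_responses are assumptions made for the example.

```python
# Hypothetical coding scheme for the burger question. The numbers are
# arbitrary codes chosen for illustration, not values prescribed by SPSS.
ANSWER_CODES = {
    "Very much": 1,
    "Sort of": 2,
    "Not really": 3,
    "Hate the stuff": 4,
}


def code_responses(responses):
    """Translate each text answer into its numeric code."""
    return [ANSWER_CODES[answer] for answer in responses]


raw_answers = ["Sort of", "Very much", "Hate the stuff", "Sort of"]
coded_answers = code_responses(raw_answers)
print(coded_answers)
```

Keeping the code table written down next to the data is what later lets you decode the results.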

You must keep accurate records describing your data, how you got the data, and what it means. SPSS can do all the

calculations for you, but only you can decode/read/interpret what it means.

Identify the variables. You always begin by defining a set of variables, and then you enter data for the variables to create

a number of cases.

For example: if you are doing an analysis of automobiles, each car in your study would be a case. The variables that define

the cases could be things such as the year of manufacture, horsepower, and cubic inches of displacement. Each car in the

study is defined as a single case, and each case is defined as a set of values assigned to the collection of variables. Every

case has a value for each variable. (Well, you can have missing values too)
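
As a rough sketch of the case/variable idea, the automobile example can be written as a small Python structure. The three cars and their values are invented for illustration, and None is used here as a stand-in for a missing value.

```python
# Each case (one car) is one record; each key is a variable.
# All values are made-up illustration data; None marks a missing value.
cars = [
    {"year": 2008, "horsepower": 140, "displacement": 122},
    {"year": 2011, "horsepower": None, "displacement": 153},
    {"year": 2013, "horsepower": 185, "displacement": 146},
]

variables = list(cars[0].keys())
n_cases = len(cars)

# Count how many cases have a recorded (non-missing) value per variable.
valid_counts = {
    name: sum(1 for car in cars if car[name] is not None)
    for name in variables
}
print(variables, n_cases, valid_counts)
```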

Variables have types. That is, each variable is defined as containing a specific kind of number.

For example:

A scale variable is a numeric measurement, for example, weight, Salary, Income or miles per gallon.

A categorical variable contains values that define a category; for example, a variable named gender could be a categorical

variable defined to contain only values 1 for female and 2 for male.

Things that make sense for one type of variable don’t necessarily make sense for another.

For example, it makes sense to calculate the average miles per gallon, but not the average gender.
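
A minimal Python sketch of this point, using invented numbers: the mean suits the scale variable, while a categorical variable is better summarized by counts or the most frequent category. The variable names and data are assumptions for illustration.

```python
from statistics import mean, mode

mpg = [21.5, 30.0, 25.5, 27.0]   # scale variable: the mean is meaningful
gender = [1, 2, 2, 1, 2]         # categorical codes, e.g. 1 = female, 2 = male

average_mpg = mean(mpg)

# mean(gender) would compute without complaint, but the result would not
# correspond to any category. Counts and the mode are the sensible summaries.
gender_counts = {code: gender.count(code) for code in sorted(set(gender))}
most_common_gender = mode(gender)

print(average_mpg, gender_counts, most_common_gender)
```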

Make sense with data. You can instruct SPSS to do any analysis, draw graphs and charts. When preparing SPSS to run

an analysis or draw a graph, the OK button is unavailable until you have made all the choices necessary to produce output.

Not only does SPSS require that you select a sufficient number of variables to produce output, it also requires that you

choose the right kinds of variables.

For Example: If a categorical variable is required for a certain slot, SPSS will not allow you to choose any other kind.

Whether the output makes sense is up to you and your data, but SPSS makes certain that the choices you make can be used

to produce some kind of result.

Interpreting output is what matters. Getting some output from any type of data is not that important; learning to interpret the right output for the right data is all that matters.

Keep multiple copies of your data set. The most valuable possession you have in dealing with statistics is not your computer. It’s not your SPSS software. You can lose either of those, and either can be replaced. Your most valuable possession is your data. Sure, you can always go and get more data, but you can’t go and get the same data. The world doesn’t hold still long enough. Make sure you make backup copies of your data.

____________________________________________________________________________________

STARTING SPSS

The SPSS program can be installed on a computer from a CD or from the network. Once installed, SPSS can be opened like any other Windows-based application by clicking on the Start menu at the bottom left-hand corner of the screen and clicking on SPSS for Windows in the list of programs. Opening the SPSS program for the first time produces a dialog box, as shown in the following figure.

This dialog box is not of any particular use. Select Don’t show this dialog in the future and click on the Cancel button. This activates the window shown in the next figure.

This is the main data editor window, where all the data is entered; it is much like an Excel spreadsheet. At the top of the screen (figure above), there are different menus, which give access to various functions of SPSS. At the bottom of the screen, we have a status bar, in which we can see “SPSS Processor is ready”. This implies that SPSS has been installed properly and the license is valid. The program can be closed by clicking on the Close button at the top right-hand corner, just like in any other Windows application.

READING DATA FROM EXCEL FILES

Suppose we want to import an Excel file into SPSS.

First, open the Excel file and understand how it is formatted. The first row has variable names, and the data starts from the second row. Close the Excel file before reading it into SPSS.

In SPSS, go to File → Open → Data. This brings up the “Open Data” dialog box, as shown below.

First, set “Files of type” to Excel (*.xls).

Then use Look In to browse to the folder where you saved the Excel file, for example “xls_gss93.xls”, and select it.

Then click Open. (See below for a visualized instruction.)

You should now see another dialog box, “Opening Excel Data Source.”

As we first checked, the excel file has variable names in the first row. So check the “Read variable names from the first

row of the data” box. Click OK.
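
To make the “variable names in the first row” idea concrete, here is a small Python sketch. It is not how SPSS reads Excel files; it uses a short comma-separated text as a stand-in for a worksheet, and the variable names and values are invented for illustration.

```python
import csv
import io

# Stand-in for a worksheet: the first row holds variable names and the
# remaining rows hold data. Names and values are invented.
sheet_text = "resp_id,gender,age\n1,2,34\n2,1,27\n"

reader = csv.reader(io.StringIO(sheet_text))
variable_names = next(reader)  # first row becomes the variable names
cases = [dict(zip(variable_names, row)) for row in reader]

print(variable_names)
print(cases)
```

If the first row were not treated as names, it would be read as an ordinary case, which is why the check box matters.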

Now you have a new, unsaved data set in another SPSS Data Editor window.

To save the data in the SPSS format, choose from the pull-down menu:

File → Save

Save it in your working directory under a name such as “xls_gss93”.

OPENING AN SPSS DATA FILE:

To open an existing SPSS data file, choose File → Open → Data or, alternatively, use the Open File button on the toolbar. A dialog box for opening files is displayed. By default, SPSS-format data files (.sav extension) are displayed.

SPSS MAIN MENUS:

The File, Edit, and View menus are very similar to what we get on opening a spreadsheet.

The File menu lets us open, save, print, and close files and provides access to recently used files.

The Edit menu lets us do things like cut, copy, paste etc.

The View menu lets us customize the SPSS desktop. Using the View menu we can hide or show the toolbar, status

bar, gridlines etc.

The Data menu is an important tool in SPSS. It allows us to manipulate the data in various ways. We can define

variables, go to a particular case, sort cases, transpose them, merge cases as well as variables from some other file.

We can also select cases on which we want to run the analysis and split the file to arrange the output of the analysis

in a particular manner.

The Transform menu is another very useful tool, which lets us compute new variables and make changes to

existing ones.

The Analyze menu lets us perform all the statistical analyses. Its statistical tools are grouped under different categories.

The Graphs menu lets us make various types of plots from our data.

The Utilities menu gives us information about variables and files.

The Add-ons menu tells us about other programs of the SPSS family such as Amos, Clementine etc. In addition, we can find the newly added functions under Add-ons.

The Window and Help menus are very similar to other Windows application menus.
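
As a rough illustration of what the Data and Transform menus do, the following Python sketch sorts cases, selects a subset of cases, and computes a new variable. It only mimics the ideas; the data, the variable names and the derived variable age_decade are hypothetical.

```python
# Invented cases for illustration (resp_id, gender code, age in years).
cases = [
    {"resp_id": 3, "gender": 2, "age": 41},
    {"resp_id": 1, "gender": 1, "age": 27},
    {"resp_id": 2, "gender": 2, "age": 34},
]

# Data menu idea: sort cases by a variable.
sorted_cases = sorted(cases, key=lambda case: case["resp_id"])

# Data menu idea: select only the cases to be analysed.
selected_cases = [case for case in sorted_cases if case["gender"] == 2]

# Transform menu idea: compute a new variable from an existing one.
with_decade = [
    {**case, "age_decade": case["age"] // 10 * 10} for case in sorted_cases
]

print([case["resp_id"] for case in sorted_cases])
print(len(selected_cases))
print(with_decade[0])
```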

THE SPSS WINDOWS AND FILES

SPSS Statistics has three main windows, plus a menu bar at the top. These allow you to (1) see your data, (2) see your statistical output, and (3) see any programming commands you have written. Each window corresponds to a separate type of SPSS file. The third window, for programming commands, is the Syntax Editor (.sps files). The first two are:

Data Editor (.sav files) &

Output Viewer (.spv files)

1) SPSS DATA EDITOR(.sav files)

The Data Editor provides a convenient, spreadsheet-like method for creating and editing data files.

The Data Editor window opens automatically when you start a session.

The Data Editor provides two views of your data:

Data View. This view displays the actual data values or defined value labels.

Variable View. This view displays variable definition information, including defined variable and value labels, data

type (for example, string, date, or numeric), measurement level (nominal, ordinal, or scale), and user-defined missing

values.

In both views, you can add, change, and delete information that is contained in the data file.

The data file is displayed in the Data Editor. (File Name: demo.sav)

The Data Editor lets you see and manipulate your data. You will always have at least one Data Editor Open (even if

you have not yet opened a data set).

When you open an SPSS data file, what you see, is a working copy of your data.

Changes you make to your data are not permanent until you save them (click File, Save or Save As). Data files are

saved with a file type of .sav, a file type that most other software cannot work with.

When you close your last Data Editor you are shutting down SPSS and you will be prompted to save all unsaved

files.

In SPSS 13.0 and earlier versions, one could open only one data editor window at a time, however from SPSS 14.0 and later

versions, multiple data editor windows can be opened simultaneously, much like Microsoft Excel.

At the bottom of the data editor there are two tabs—Data View and Variable View.

In Data View, the data editor works pretty much in the same manner as an Excel spreadsheet. One can enter values in

different cells, modify them and even cut and paste to and from an Excel spreadsheet.

In Variable View, the data editor window looks as shown in figure below.

In addition to entering the values of the variables, we have to provide information about them in SPSS. This can be done

when the data editor is in Variable View. Notice that there are 10 columns in the data editor window in Figure above.

1) NAME: VARIABLE NAME WITHOUT ANY SPACES (USE UNDERSCORE INSTEAD)
2) TYPE: USUALLY NUMERIC OR STRING (STATISTICAL ANALYSIS CANNOT BE PERFORMED ON STRING VARIABLES)
3) WIDTH: NUMBER OF CHARACTERS DISPLAYED; SET IT ACCORDING TO THE SIZE OF THE NUMBERS
4) DECIMALS: NUMBER OF DECIMAL PLACES DISPLAYED; THE DEFAULT OF 2 CAN BE KEPT
5) LABEL: COMPLETE DESCRIPTION OF THE VARIABLE NAME
6) VALUES: USED FOR CODED DATA
7) MISSING: CODES TO BE TREATED AS MISSING; THE DEFAULT IS NO MISSING VALUES
8) COLUMNS: WIDTH OF THE COLUMN ON SCREEN; USUALLY LEFT AT ITS DEFAULT
9) ALIGN: LEFT, RIGHT OR CENTER
10) MEASURE: NOMINAL/ORDINAL/SCALE

We will explain the usage of each of them with the help of following small exercise of data entry:

Suppose we want to enter the following data in SPSS:

We observe that we have three variables to enter—respondent number, gender, and age.

1) The first column in the variable view is Name.

Versions of SPSS before 12.0 could take a maximum of eight characters, starting with a letter, to identify a variable. Later versions allow much longer names (up to 64 bytes). In this example, we will name respondent number as resp_id; gender and age can be named as they are.

2) The next column is titled Type.

This lets us define the variable type. If we click on the cell next to the variable name in the Type column, we get a dialog box as shown in the figure.

Data can be of several types, including numeric, date, text etc.

An incorrect type-definition may not always cause problems, but sometimes does, and should therefore be avoided. The

most common type used is “Numeric,” which means that the variable has a numeric value. The other common choice is

“String,” which means that the variable is in text format. We cannot perform any statistical analysis on a numeric variable

if it is specified as a string variable.

Since all our three variables are of the numeric type, we select numeric from the dialog box shown in figure above. We

can also specify the width of the variable and decimal places on this dialog box. It only affects the way variables are

shown when the data editor is on data view. Click on OK to return to the data editor.

3) The next two columns are titled Width and Decimals.

These let us specify the width and the number of decimal places used in Data View. Please note that they have no impact on the actual values we enter in the data editor; they only affect the display of the data.

For example, if the value of a variable in a particular cell is 100000000, which comprises 9 digits, and we have specified the width for this variable as 8, it will appear as ########. This simply means that the width of the variable column is not enough to display the value correctly.
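
The display-only nature of the width setting can be sketched as follows. This is a simplified model of the behavior described above, not SPSS's actual formatting code, and display_value is a hypothetical helper.

```python
def display_value(value, width):
    """Return the text shown in a cell that is `width` characters wide.

    Simplified model: a value that does not fit is shown as '#' characters,
    as described in the text. The stored value itself is never changed.
    """
    text = str(value)
    return text if len(text) <= width else "#" * width


stored = 100000000                     # 9 digits
shown_narrow = display_value(stored, 8)
shown_wide = display_value(stored, 9)

print(shown_narrow)
print(shown_wide)
print(stored)
```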

4) Next, we have a column titled ‘Label’.

Since the variable name in the first column could only be 8 characters long in the earlier versions of the SPSS program, it is sometimes difficult to identify a variable by its name. To avoid this problem, we can write the details about a particular variable in this column.

For example, we can write “Respondent identification number” as the label for the “resp_id” variable. We can ask the SPSS program to show variable labels with or without the names in the output window. This option can be activated by selecting “Names” and “Labels” from the dialog box obtained by clicking Edit → Options → Output Labels.

5) Then, we have a column labeled ‘Values’.

If we click on the cell next to the variable name in the Values column, we get a dialog box as shown in the figure below. In this box, we can specify value labels for our variables.

In the example here, the variable ‘gender’ has two values: 1 representing Male and 2 representing Female. Enter 1 in the empty box labeled Value and specify its name (Male) in the next box labeled Value Label. This will activate the Add button. Click on this button and repeat these steps to specify Female. This way we can keep track of the actual status of qualitative variables such as gender, nation, race, color etc.
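
In spirit, value labels are a lookup table from codes to names. The sketch below shows the idea in Python using the gender coding from this example; the list of coded responses is invented for illustration.

```python
from collections import Counter

# Value labels for 'gender', as defined in the example above.
value_labels = {1: "Male", 2: "Female"}

# Invented coded responses.
gender_codes = [1, 2, 2, 1, 2, 2]

# The data stay numeric; labels are applied only when reporting.
frequencies = Counter(value_labels[code] for code in gender_codes)
print(frequencies)
```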

6) After Values we have a column labeled Missing

This is to specify missing values. While coding data, we often specify certain numbers to variables for which some

respondents have given no response. Unless we specify these values as missing values, SPSS will take them into

consideration for data analyses producing a wrong output.

One way to handle this problem is to recode these numbers to missing values. The other way is to specify, in this column, the numbers that should be treated as missing values. Clicking on the cell next to the variable name in the Missing column produces a dialog box as shown in the figure below.

By default, No missing value is selected here. We can specify up to three discrete values to be considered as missing values. Alternatively, specify a range and all the values in the range will be considered as missing values.
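
Why user-defined missing values matter can be seen in a short Python sketch. It assumes, purely for illustration, that 99 was used to code “no response” for age; is_missing is a hypothetical helper that mirrors the two options above (up to three discrete values, or a range).

```python
from statistics import mean


def is_missing(value, discrete=(), low=None, high=None):
    """Return True if value is a user-defined missing code.

    Mirrors the two options in the text: up to three discrete values,
    or a range in which every value counts as missing.
    """
    if len(discrete) > 3:
        raise ValueError("at most three discrete missing values")
    if value in discrete:
        return True
    return low is not None and high is not None and low <= value <= high


# Invented ages; 99 is assumed to be the code for "no response".
ages = [34, 27, 99, 41, 99, 30]

naive_mean = mean(ages)  # wrongly treats 99 as a real age
valid_ages = [age for age in ages if not is_missing(age, discrete=(99,))]
valid_mean = mean(valid_ages)

print(naive_mean, valid_mean)
```

With the missing code left in, the average age is inflated from 33 to 55.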

7) The next two columns titled Columns and Align

This helps us modify the way we want to view the data on screen. In the Columns column we can specify the width of the

column and in the Align column we can specify if we want our data to be right, left or center aligned. These do not have

any impact on the actual data analyses.

8) Finally, in the column titled Measure, we can specify whether our variable is scale, ordinal, or nominal.

SPSS treats interval and ratio data as scale.

Once the variables are specified, you can switch to Data View and enter the data.

The data file can be saved in the same way as an MS Word or MS Excel file and reopened by double-clicking on the file in its saved location.

VARIABLE MEASUREMENT LEVEL:

Nominal. A variable can be treated as nominal when its values represent categories with no intrinsic ranking (for

example, the department of the company in which an employee works).

Examples of nominal variables include region, zip code, and religious affiliation.

Ordinal. A variable can be treated as ordinal when its values represent categories with some intrinsic ranking (for

example, levels of service satisfaction from highly dissatisfied to highly satisfied).

Examples of ordinal variables include attitude scores representing degree of satisfaction or confidence and preference rating

scores.

Scale. A variable can be treated as scale when its values represent ordered categories with a meaningful metric, so

that distance comparisons between values are appropriate.

Examples of scale variables include age in years and income in thousands of dollars.

Note: For ordinal string variables, the alphabetic order of string values is assumed to reflect the true order of the categories.

For example, for a string variable with the values of low, medium, high, the order of the categories is interpreted as high,

low, medium, which is not in the correct order. In general, it is more reliable to use numeric codes to represent ordinal data.
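
The ordering problem in this note is easy to reproduce. The Python sketch below sorts the three string values alphabetically and then by numeric codes; the codes 1, 2, 3 are an assumed coding chosen for illustration.

```python
levels = ["low", "medium", "high"]

# Alphabetical order of the strings does not match the intended ranking.
alphabetical = sorted(levels)

# Hypothetical numeric codes that carry the intended order.
codes = {"low": 1, "medium": 2, "high": 3}
by_code = sorted(levels, key=codes.get)

print(alphabetical)
print(by_code)
```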

ENTERING DATA

In Data View, you can enter data directly in the Data Editor. You can enter data in any order. You can enter data by case or

by variable, for selected areas or for individual cells.

To enter anything other than simple numeric data, you must define the variable type first.

If you enter a value in an empty column, the Data Editor automatically creates a new variable and assigns a variable

name.

SPSS-format data files are organized by cases (rows) and variables (columns). In this data file, cases (rows in Data View) represent individual respondents to a survey.

Variables (columns in Data View) represent respondents’ responses to each question asked in the survey.

2) SPSS VIEWER [Output Viewer (.spv files)]

SPSS Viewer is opened automatically to show the output when you run SPSS commands.

The Viewer window has its own menu and toolbars. The window itself is divided into two parts:

The left-hand side shows a tree structured outline (list) of the output elements shown in the right hand part, i.e. a structured

table of contents.

Each SPSS command produces a set of output elements (called frames) that are grouped hierarchically; a yellow icon in the

outline identifies the command and a level below you will find the various frames it has produced.

All statistical commands produce a Title frame, a Notes frame (technical notes on the current command) and an Active Dataset frame (containing the name of the dataset used). Most commands add a frame about the number of observations included in the analysis (often labeled Case Processing Summary, or Statistics in the example here), followed by the specific frames for the command, i.e. results, tables and graphs.

The Output Viewer shows you tables of statistical output and any graphs you create. By default it also shows you the

“syntax”. The Output Viewer also allows you to edit and print your results. The contents of the Output Viewer are

saved (click File, Save or Save As) with a file type of .spv, which can only be opened with SPSS software. As with

Data Editors, it is possible to open more than one Output Viewer to look at more than one output file. The “active”

Viewer, marked with a tiny blue plus sign, will receive the results of any commands that you issue. If you close all the

Output Viewers and then issue a new command, a fresh Output Viewer is started.

EXPORTING OUTPUT

To export your output, you go through a special procedure. In the Output Viewer click File, Export to invoke the Export dialog box. There are three main settings to look at. First, pick the type of file to which you want to export: useful file types include Excel, PDF, PowerPoint, or Word. Next, check the Objects to Export setting at the top of the dialog to confirm that you are exporting as much of your output as you want. If you have a part of your output selected, this option will default to exporting just your selection; otherwise you typically will export all your visible output. Finally, change the default file name to something meaningful, and save your file to a location where you will be able to keep it, like your U:\ drive. Once your options are set, click OK.