SPSS-All function.doc

7/30/2019 SPSS-All function.doc


INTRODUCTION TO SPSS

SPSS stands for Statistical Package for the Social Sciences. As its name implies, it is a
statistical package that was originally designed for handling data generated in social science
studies. It is now widely used in other areas as well: by governments, businesses, law
enforcement agencies, health care providers and academics, and in experimental and
observational studies.

So, what is SPSS? SPSS is a simple package to use. Its user interface is a spreadsheet made up
of cells, columns and rows. The columns represent the variables and the rows represent the
cases. Cases and variables are the two main components in statistics. A case is the subject of
analysis, for example an animal in a scientific experiment or a person replying to a
questionnaire. Variables are the measurements obtained on the various characteristics of each
case. The data so obtained can be analyzed with descriptive, bivariate or multivariate
statistical methods.

SPSS differs from other spreadsheets in that analysis is carried out through commands chosen
from pull-down menus rather than within the spreadsheet itself. The output also does not appear
in the spreadsheet, as is common in other spreadsheets, but in a separate window. The output of
SPSS is comprehensive; that is, the package may give additional output to augment what was
requested. For example, in addition to a graph it may give a histogram, the mean and the
standard deviation. SPSS can read data from almost any type of file and use them to generate
tabulated reports, charts, plots of distributions and trends, and descriptive statistics, and to
conduct complex statistical analyses.

DIFFERENT TYPES OF WINDOWS IN A TYPICAL STATISTICAL SOFTWARE

There are a number of different types of windows in typical statistical software:

1. Data Editor Window / Object Window / Variable Window

This is the default window, with a blank data sheet ready for analysis. This window displays the contents

of the data file. You may create new data files, or modify existing ones with the Data Editor.

The Data Editor window opens automatically when you start an SPSS session.


2. Viewer Window / Output Window / Log file / Results Window

The Viewer window displays the statistical results, tables, and charts from the analysis you

performed (e.g., descriptive statistics, correlations, plots, charts). A Viewer window opens

automatically when you run a procedure that generates output. In the Viewer window, you can
edit, move, delete and copy your results. Whenever a procedure is run, the output is directed to
a separate window. You can also keep multiple Output windows open to organize the various
analyses you conduct. These results can later be saved and/or printed.

3. Syntax Editor Window / Do File

You can paste your dialog box choices into a Syntax Editor window in SPSS, where your

selections appear in the form of command syntax. You can then edit the command syntax to

utilize special features of SPSS not available through dialog boxes. You can save these

commands in a file for use in subsequent SPSS sessions. Similarly, in some packages such as
STATA, you can enter several lines of commands in the do-file editor. You can either run only
the selected lines or run the whole file from the first line to the last. A do-file can also be
created from the event history window.

4. Chart Editor Window

You can modify and save high-resolution charts and plots in chart windows. You can change

the colors, select different type fonts or sizes, switch the horizontal and vertical axes, rotate 3-D

scatter plots, and even change the chart type.

5. Script Editor Window

Scripting allows you to customize and automate many tasks in a typical statistical software package. Use

the Script Editor to create and modify basic scripts.

6. Command Window

The Command Window is where you type your commands in syntax form. To send a command to
the software, press the "Return" or "Enter" key.

7. Event History Window

The STATA Review Window lists all of the STATA commands that have been executed since
STATA was opened. These can be repeated by double-clicking them, then clicking in the
Command Window and pressing Enter. The Review Window records your commands. The
Results window, which serves as the log window, displays your output. The Variables window
lists the variables in the data set you are using.

A BRIEF NOTE ON THE DATA SET

Before discussing the types of files generated and the various menus in SPSS, it is useful
to understand the data set that is used as an example throughout.

A Sample Research Problem: The Employee Income study

The SPSS file name of the data set used with this manual is Employee Data.sav, which holds the
Employee Income data. It is based on a sample data file supplied with SPSS. The data set is a
sample of 474 employees drawn randomly from the larger employee population. It contains
several kinds of personal, social and occupational data: Gender, Date of Birth, Minority
Classification, Educational Level (years), Employment Category, Current Salary, Beginning
Salary, Months Since Hire and Previous Experience (months). The variables are described below:

Variable Description:

Variable Name   Variable Description/Label
id              Employee Code
gender          Gender (M/F)
bdate           Date of Birth
educ            Educational Level (years)
jobcat          Employment Category (Clerical, Custodial, Manager)
salary          Current Salary
salbegin        Beginning Salary
jobtime         Months Since Hire
prevexp         Previous Experience (months)
minority        Minority Classification (yes/no)

DIFFERENT WINDOWS AND FILES / FILE EXTENSIONS:

Each window corresponds to a separate type of file in the statistical software.


Data Files

There are two basic types of files in SPSS. The first is the data file (.sav). This is where all
the data for your analysis reside. When you open a data file, it appears in the Data Editor
window. The format is similar to a spreadsheet, with a grid of rows and columns. The columns
represent variables and the rows represent observations. You can place the cursor on a column
heading to see a longer description of that variable. To get complete information on any
variable, go to the Utilities menu and click Variables.

Data view > Utilities > Variables

[Figure: Data view, with callouts for the variable name and the observation (case number)]

Variable view

Output files

The second type of file is an output window file (.spv). When a statistical procedure is run,

output is produced. The Viewer window will automatically open to show the output. The left

pane contains an outline view of the output. The right pane contains the contents of the output

which include tables, charts, and text. There are book icons in the outline view next to the

various objects of output. If the book is open, it indicates that the output is visible. If the book

is closed, it is hidden.

[Figure: Variable view, with callouts for the variable characteristics (entire row) and the variables (entire column)]

Analyze> Descriptive statistics>Frequencies>Employment Category>OK

DIFFERENT MENUS IN A TYPICAL STATISTICAL SOFTWARE

Many of the tasks you may want to perform in a typical statistical software package start with
menu selections. In many packages, each window has its own menu bar with different options.
The Data Editor window in SPSS, for example, has the following menu.

Most menus in this window are similar to those found in other Windows applications, and some
are specific to the tasks of the Data Editor. The Data Editor window has eleven main menus,
which are described in more detail below:


1. File

The File menu has options to create a new SPSS system file, open an existing system file, read
in spreadsheet or database files created by other software programs, and read in an external
ASCII/Excel data file from the Data Editor; to create a command file or retrieve an existing
SPSS command file into the Syntax Editor; to open, save, and print output files from the Viewer
and Pivot Table Editor; and to save chart templates and export charts in external formats from
the Chart Editor.

2. Edit

The Edit menu has an option to cut, copy, and paste data values from the Data Editor; modify or

copy text from the Viewer or Syntax Editor; copy charts for pasting into other applications from

the Chart Editor, etc.

3. View

The View menu has an option to turn toolbars and the status bar on and off, and turn grid lines

on and off from all window types; and control the display of value labels and data values in the

Data Editor.

4. Data

The Data menu has an option to make global changes to SPSS data files, such as transposing

variables and cases, or creating subsets of cases for analysis, and merging files. These changes

are only temporary and do not affect the permanent file unless you save the file with the

changes.

5. Transform

The Transform menu has options to make changes to selected variables in the data file and to
compute new variables based on the values of existing ones. These changes are temporary and
do not affect the permanent file unless you save the file with the changes.

6. Analyze

The Analyze menu is the most important menu; it contains all the statistical procedures of
SPSS. This menu is discussed later in detail.

7. Graphs


The Graphs menu has options to create bar charts, pie charts, histograms, scatterplots, and
other full-color, high-resolution graphs. Some statistical procedures also generate graphs. All
graphs can be customized with the Chart Editor.

8. Utilities

The Utilities menu has an option to display information about variables in the working data file

and control the list of variables from all window types; change the designated Viewer and

Syntax Editor, etc.

9. Add-ons

The Add-ons menu has an option to view the information of add-on modules.

10. Window

The Window menu has an option to switch between SPSS windows or to minimize all open

SPSS windows.

11. Help

The Help menu has a standard Microsoft Help window containing information on how to use

the various features of SPSS. Context-sensitive help is available through the dialog boxes.


TOOLBAR IN SPSS

For each SPSS window there is a toolbar that provides quick and easy access to the common
tasks of that window. Each icon in the toolbar has a Tool Tip, which shows a brief description
of the tool when you place the mouse pointer on the icon.

The Main Toolbar

File Buttons

The first three buttons represent the three most common commands from the File menu: Open
an Existing File, Save the Current File and Print the Current File, respectively.

Dialog Recall

This button gives you quick access to the previous 12 dialog boxes you were working with. This

is particularly useful when you are building up an analysis and frequently going back and forth

to the same box to change or modify options.

Go to Chart

This button helps you to open Chart Editor.

Go to Data

When you are in a window other than the Data Editor window this button will take you back to

the Data Editor.

Go to Case


This button takes you quickly to a specific case in the Data Editor. It is helpful when editing
data, for example when abnormal data points or outliers are found in your analysis and you want
to check the source data.

Variables

This button creates a dialog box containing a list of all the variables defined in the data file.

Selecting a variable from this list displays the properties of variables viz., its name, label, type,

information about missing values and the value labels. This box can be kept open while you

work with the data file so that you can examine a variable's information as you examine the

results of an analysis.

Find

This button helps you to carry out a simple search to find a value.

Insert Case / Insert Variable

It is not unusual to find yourself wanting to add a case or a variable in the middle of data entry.

These two buttons will help you add a blank row or column in your data set.

Split File/Weight Cases/Select Cases

These buttons help you to do three of the Data Menu commands Split File, Weight Cases and

Select Cases respectively. (Data Menu is discussed later in this section)

Value Labels

This button displays the value labels in the Data Editor so that you don't have to remember
what the numbers mean. Turning the button off displays the numbers again.

Use Sets


In the Data Editor window you can group variables into sets so that they can be analysed
together. This button lets you specify which of the sets you have defined you want to use.

SPECIAL (STATISTICS) MENUS

Every statistical software package has some menus that are special and distinct. The special
menus of SPSS are called the Statistics Menus. They are as follows:

1. Data Menu

2. Transform Menu

3. Analyze Menu

DATA MENU IN SPSS

The Data Menu provides procedures to define variables, insert variables or cases, sort cases,
merge files, split files, select cases and use a variable to weight cases. Some of its menu
items, such as those for sorting, merging and transposing data sets, selecting subsets of cases
and splitting files by variables, are explained below.

a) Define variable properties:

SPSS offers a wizard-type tool that helps you set all variable properties through an interactive
interface. Although it can be used for all types of variables, it is especially useful for
categorical variables, as it scans the actual data for all distinct values. From the menus select
Data > Define variable properties. The first panel that appears lets you select the variables for
which you want to set or change properties:

Data > Define variable properties > take gender > Variable to scan


Select the variables. You can also limit the number of cases to scan (useful with very large
files) and, as the tool is best used with categorical variables, limit the number of values
(codes) that are displayed. When you click Continue, the next panel appears.


b) Copy Data Properties

The Copy Data Properties Wizard provides the ability to use an external SPSS Statistics data

file as a template for defining file and variable properties in the active dataset. You can also use

variables in the active dataset as templates for other variables in the active dataset. You can:
- Copy selected file properties from an external data file or open dataset to the active
  dataset. File properties include documents, file labels, multiple response sets, variable
  sets, and weighting.
- Copy selected variable properties from an external data file or open dataset to matching
  variables in the active dataset. Variable properties include value labels, missing values,
  level of measurement, variable labels, print and write formats, alignment, and column width
  (in the Data Editor).
- Copy selected variable properties from one variable in an external data file, open dataset,
  or the active dataset to many variables in the active dataset.
- Create new variables in the active dataset based on selected variables in an external data
  file or open dataset.

When copying data properties, the following general rules apply:
- If you use an external data file as the source data file, it must be a data file in SPSS
  Statistics format.
- If you use the active dataset as the source data file, it must contain at least one variable.
  You cannot use a completely blank active dataset as the source data file.
- Undefined (empty) properties in the source dataset do not overwrite defined properties
  in the active dataset.
- Variable properties are copied from the source variable only to target variables of a
  matching type: string (alphanumeric) or numeric (including numeric, date, and currency).

From the menus in the Data Editor window choose Data > Copy Data Properties. Select the data
file with the file and/or variable properties that you want to copy. This can be a currently open
dataset, an external SPSS Statistics data file, or the active dataset. Follow the step-by-step
instructions in the Copy Data Properties Wizard.

Data > Copy Data Properties > follow the instructions of the wizard

Define Dates

The Define Dates dialog box allows you to generate date variables that can be used to establish

the periodicity of a time series and to label output from time series analysis.

Name       Label
YEAR_      YEAR, not periodic
QUARTER_   QUARTER, period 4
MONTH_     MONTH, period 12
DATE_      DATE. FORMAT: "MMM YYYY"

The following is a partial listing of the new variables:

YEAR_   QUARTER_   MONTH_   DATE_
1950    2          4        APR 1950
1950    2          5        MAY 1950
1950    2          6        JUN 1950
1950    3          7        JUL 1950
1950    3          8        AUG 1950
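The relationship between the four date variables in the listing can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how SPSS itself generates the variables; the function name define_dates and its parameters are invented for this sketch.

```python
# Month abbreviations in the "MMM YYYY" style used by the DATE_ variable.
MONTH_ABBR = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
              "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def define_dates(start_year, start_month, n_cases):
    """Return one dict per case with YEAR_, QUARTER_, MONTH_ and DATE_ values.

    Hypothetical helper: cases are assumed to be consecutive months.
    """
    rows = []
    year, month = start_year, start_month
    for _ in range(n_cases):
        quarter = (month - 1) // 3 + 1      # months 1-3 -> 1, 4-6 -> 2, and so on
        rows.append({
            "YEAR_": year,
            "QUARTER_": quarter,
            "MONTH_": month,
            "DATE_": f"{MONTH_ABBR[month - 1]} {year}",
        })
        month += 1
        if month > 12:                      # roll over into the next year
            month, year = 1, year + 1
    return rows


# Five monthly cases starting in April 1950, as in the listing.
rows = define_dates(1950, 4, 5)
for row in rows:
    print(row["YEAR_"], row["QUARTER_"], row["MONTH_"], row["DATE_"])
```

Starting from April 1950, the sketch yields quarter 2 for April to June and quarter 3 from July onward, matching the listing.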

Define Multiple Response Sets

To define multiple response sets:

1. From the menus, choose Data > Define Multiple Response Sets.
2. Select two or more variables. If your variables are coded as dichotomies, indicate which
   value you want to have counted.
3. Enter a unique name for each multiple response set. The name can be up to 63 bytes
   long. A dollar sign is automatically added to the beginning of the set name.
4. Enter a descriptive label for the set. (This is optional.)
5. Click Add to add the multiple response set to the list of defined sets.

Identify Duplicate Cases

Duplicate cases may occur in your data for many reasons, including:
- Data entry errors in which the same case is accidentally entered more than once.
- Multiple cases share a common primary ID value but have different secondary ID
  values, such as family members who all live in the same house.
- Multiple cases represent the same case but with different values for variables other than
  those that identify the case, such as multiple purchases made by the same person or
  company for different products or at different times.

Note: Use employee id and current salary to check for duplicate cases.
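The idea of matching cases on identifying variables can be sketched in Python as follows. This is a simplified illustration, not SPSS's procedure: the function name flag_duplicates is invented, the sample values are made up, and the sketch assumes the first case in each matching group is the primary one.

```python
from collections import defaultdict


def flag_duplicates(cases, key_vars):
    """Return a list of booleans: True where a case repeats an earlier case
    on all of the identifying variables in key_vars (hypothetical helper)."""
    seen = defaultdict(int)
    flags = []
    for case in cases:
        key = tuple(case[var] for var in key_vars)
        seen[key] += 1
        flags.append(seen[key] > 1)     # second and later occurrences are duplicates
    return flags


# Made-up sample cases; the third repeats the first on id and salary.
cases = [
    {"id": 1, "salary": 57000},
    {"id": 2, "salary": 40200},
    {"id": 1, "salary": 57000},
    {"id": 3, "salary": 21450},
]
flags = flag_duplicates(cases, ["id", "salary"])
print(flags)
```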


Sort Cases

Sort Cases procedure reorders the sequence of cases based on the values of one or more

variables. You can optionally sort cases in ascending or descending order, or you can use

combinations of ascending and descending order for different variables. For example, if you

select gender as the first sorting variable and minority as the second sorting variable, cases will

be sorted by minority classification within each gender category.

Note: Sort cases by jobcat in Descending order
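The ordering described above, one variable sorted within the categories of another, can be illustrated with a short Python sketch. The sample cases are made up and the code only mirrors the logic of the sort, not the SPSS procedure itself.

```python
# Made-up sample cases using variable names from the employee data set.
cases = [
    {"id": 1, "gender": "m", "minority": 0, "jobcat": 3},
    {"id": 2, "gender": "f", "minority": 1, "jobcat": 1},
    {"id": 3, "gender": "m", "minority": 1, "jobcat": 1},
    {"id": 4, "gender": "f", "minority": 0, "jobcat": 2},
]

# gender is the first sorting variable and minority the second, so cases are
# ordered by minority classification within each gender category.
by_gender_minority = sorted(cases, key=lambda c: (c["gender"], c["minority"]))

# Descending sort on jobcat; cases with equal jobcat keep their original order.
by_jobcat_desc = sorted(cases, key=lambda c: c["jobcat"], reverse=True)

print([c["id"] for c in by_gender_minority])
print([c["id"] for c in by_jobcat_desc])
```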


Sort Variables

You can sort the variables in the active dataset based on the values of any of the variable

attributes (e.g., variable name, data type, measurement level), including custom variable

attributes. Values can be sorted in ascending or descending order. You can save the original

(pre-sorted) variable order in a custom variable attribute.

Note: Sort variables by width in ascending order


Transpose

The Transpose procedure creates a new data file in which the rows and columns of the original
data file are transposed, so that cases (rows) become variables and variables (columns) become
cases. Transpose automatically creates new variable names and displays a list of them. The
transposed data set is created as a new Untitled file.

Ex: Flip variables = bdate gender salary salbegin by id


Note: Variable view after transpose

Note: Data view after transpose

Please note: because gender is a string variable, its values are converted to system-missing (SYSMIS) in the transposed file.
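A small Python sketch may help to picture what happens to rows, columns and string values when a file is transposed. It is an illustration under stated assumptions, not SPSS's implementation: the function name transpose, the K_ prefix for the new variable names, the CASE_LBL column and the sample values are all choices made for this sketch, and None stands in for a system-missing value.

```python
def transpose(cases, variables, name_var):
    """Turn each listed variable into a row and each case into a column.

    Hypothetical helper: non-numeric values cannot be carried over and are
    replaced by None (standing in for system-missing).
    """
    flipped = []
    for var in variables:
        row = {"CASE_LBL": var}                     # remembers the original variable name
        for case in cases:
            value = case[var]
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            row[f"K_{case[name_var]}"] = value if is_number else None
        flipped.append(row)
    return flipped


# Made-up sample cases.
cases = [
    {"id": 1, "gender": "m", "salary": 57000, "salbegin": 27000},
    {"id": 2, "gender": "f", "salary": 40200, "salbegin": 18750},
]
flipped = transpose(cases, ["gender", "salary", "salbegin"], "id")
for row in flipped:
    print(row)
```

The gender row comes out with None in every column, while the salary and salbegin rows keep their numeric values.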

Restructuring Data

The Restructuring Data Wizard replaces the current file with a new, restructured file. The
wizard can:
- Restructure selected variables into cases.
- Restructure selected cases into variables.
- Transpose all data.

Restructuring the data takes 7 steps. You supply the variables at each step, following the
instructions given by SPSS. Screenshots of selected steps are given below.


Merging Data Files

This procedure helps you to merge data from two files in two different ways. You can:
- Merge the active dataset with another open dataset or SPSS-format data file
  containing the same variables but different cases.
  Take the employee data file Employee_MergeCase1 and merge Empolyee_MergeCase2
  with it. Verify the number of cases and variables once the data are merged.
- Merge the active dataset with another open dataset or SPSS-format data file
  containing the same cases but different variables.
  Take the employee data file Employee_MergeVariable1 and merge
  Empolyee_MergeVariable2 with it. Verify the number of cases and variables
  once the data are merged.


Note: Merge data file containing the same variables but different cases


Note: Merge data file containing the different variables but same cases
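The two kinds of merge can be contrasted in a minimal Python sketch. The function names add_cases and add_variables, the key variable id and the sample values are assumptions made for illustration; the sketch shows the logic only and does not reproduce SPSS's merge options.

```python
def add_cases(active, other):
    """Same variables, different cases: stack the rows of the two files."""
    return active + other


def add_variables(active, other, key):
    """Same cases, different variables: match rows on a key variable and
    combine their columns (hypothetical helper)."""
    lookup = {case[key]: case for case in other}
    return [{**case, **lookup.get(case[key], {})} for case in active]


# Made-up files with the same variables but different cases.
cases_part1 = [{"id": 1, "salary": 57000}, {"id": 2, "salary": 40200}]
cases_part2 = [{"id": 3, "salary": 21450}]
stacked = add_cases(cases_part1, cases_part2)

# Made-up files with the same cases but different variables.
vars_part1 = [{"id": 1, "gender": "m"}, {"id": 2, "gender": "f"}]
vars_part2 = [{"id": 1, "salary": 57000}, {"id": 2, "salary": 40200}]
joined = add_variables(vars_part1, vars_part2, "id")

print(len(stacked), "cases after adding cases")
print(sorted(joined[0]), "variables after adding variables")
```

Adding cases increases the number of rows and leaves the variables unchanged; adding variables does the opposite, which is what the exercises above ask you to verify.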

Aggregate Data

The Aggregate Data procedure aggregates groups of cases in the dataset into single cases and
either creates a new, aggregated file or creates new variables in the active dataset that contain
aggregated data. Cases are aggregated based on the value of one or more break (grouping)
variables. If you create a new, aggregated data file, it contains one case for each group defined
by the break variables. For example, if there is one break variable with two values, the new
data file will contain only two cases.

Exercise: From the employee database, take gender as the break variable and education, job
category, salary, beginning salary and previous experience as the aggregated variables.
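As a rough illustration of aggregation by a break variable, the following Python sketch computes group means. The function name aggregate_mean, the _mean suffix and the sample values are invented here, and the mean is only one of the summary functions that could be chosen.

```python
from collections import defaultdict
from statistics import mean


def aggregate_mean(cases, break_var, source_vars):
    """Collapse cases into one row per value of break_var, holding the mean
    of each source variable (hypothetical helper)."""
    groups = defaultdict(list)
    for case in cases:
        groups[case[break_var]].append(case)
    result = []
    for value in sorted(groups):
        row = {break_var: value}
        for var in source_vars:
            row[f"{var}_mean"] = mean(c[var] for c in groups[value])
        result.append(row)
    return result


# Made-up sample cases.
cases = [
    {"gender": "f", "salary": 20000, "educ": 12},
    {"gender": "m", "salary": 40000, "educ": 16},
    {"gender": "f", "salary": 30000, "educ": 16},
    {"gender": "m", "salary": 50000, "educ": 18},
]
aggregated = aggregate_mean(cases, "gender", ["salary", "educ"])
for row in aggregated:
    print(row)
```

With gender as the only break variable and two gender values, the aggregated result has exactly two rows.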


Note: Employee dataset before aggregate


Note: New variable after aggregate

Copy Dataset

Selecting this option creates a complete duplicate of the active dataset.

Split File

The Split File procedure splits the data file into separate groups for analysis based on the
values of one or more grouping variables. If you select multiple grouping variables, cases are
grouped by each variable within categories of the preceding variable. Depending on the purpose,
the file may be split in two ways:
- Compare groups: The file is split and the statistical procedures are computed for each of
  the defined groups. The results are presented together for comparison.
- Organize output by groups: All results from each statistical procedure are displayed
  separately for each split-file group.

[Figure callout: variables aggregated by gender]

Select Cases

The Select Cases procedure provides several methods for selecting a subgroup of cases based on
criteria that include variables and complex arithmetic or logical expressions. You can also
select a random sample of cases. The criteria used to define a subgroup can include:
- Variable values and ranges
- Date and time ranges
- Case (row) numbers
- Arithmetic expressions
- Logical expressions
- Functions
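Selecting cases by a logical expression, or drawing a random sample, can be sketched in Python as below. The helper names select_if and random_sample, the seed parameter and the sample values are assumptions made for this illustration and do not correspond to SPSS options.

```python
import random


def select_if(cases, condition):
    """Keep only the cases for which the logical expression is true."""
    return [case for case in cases if condition(case)]


def random_sample(cases, n, seed=0):
    """Draw n cases at random without replacement (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(cases, n)


# Made-up sample cases.
cases = [
    {"id": 1, "gender": "m", "salary": 57000},
    {"id": 2, "gender": "f", "salary": 40200},
    {"id": 3, "gender": "f", "salary": 21450},
    {"id": 4, "gender": "m", "salary": 21900},
]

# A logical expression combining a variable value and a range.
selected = select_if(cases, lambda c: c["gender"] == "f" and c["salary"] > 30000)
sample = random_sample(cases, 2)

print([c["id"] for c in selected])
print(len(sample), "cases sampled")
```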


Weight Cases

The Weight Cases procedure gives cases different weights (equivalent to frequencies) for
statistical analysis. This option helps the researcher to work with sampling schemes other than
simple random sampling.
- The values of the weighting variable should indicate the number of observations represented
  by single cases in your data file.
- Cases with zero, negative, or missing values for the weighting variable are excluded from
  analysis.
- Fractional values are valid; they are used exactly where this is meaningful and most likely
  where cases are tabulated.

Once you apply a weight variable, it remains in effect until you select another weight variable
or turn off weighting. If you save a weighted data file, weighting information is saved with the
data file. You can turn off weighting at any time, even after the file has been saved in weighted
form.
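A minimal Python sketch of how frequency weights might enter a tabulation, following the rules listed above. The function name weighted_counts, the weight variable name w and the sample values are invented for illustration; this is not SPSS's implementation.

```python
def weighted_counts(cases, var, weight_var):
    """Tabulate var, counting each case as many times as its weight says.

    Cases with a zero, negative or missing (None) weight are skipped;
    fractional weights are added as they are.
    """
    counts = {}
    for case in cases:
        weight = case.get(weight_var)
        if weight is None or weight <= 0:
            continue                                  # excluded from the analysis
        counts[case[var]] = counts.get(case[var], 0) + weight
    return counts


# Made-up sample cases with a hypothetical weight variable w.
cases = [
    {"jobcat": 1, "w": 3},
    {"jobcat": 2, "w": 1.5},
    {"jobcat": 1, "w": 0},       # zero weight: excluded
    {"jobcat": 3, "w": None},    # missing weight: excluded
    {"jobcat": 2, "w": -2},      # negative weight: excluded
    {"jobcat": 1, "w": 2},
]
counts = weighted_counts(cases, "jobcat", "w")
print(counts)
```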


TRANSFORM MENU IN SPSS

This menu helps to change, or transform, the values associated with the variables. A number of
data transformation procedures are provided in the Transform Menu. The following are the
procedures available in the Transform Menu.

[Figure callouts: "Weight on"; "No Weights"]

Computing Variables

The Compute procedure opens a dialog box that computes values for a target variable based on
arithmetic computations defined over other variables.
- You can compute values for numeric or string (alphanumeric) variables.
- You can create new variables or replace the values of existing variables. For new variables,
  you can also specify the variable type and label.
- You can compute values selectively for subsets of data based on logical conditions.
- You can use a large variety of built-in functions, including arithmetic functions, statistical
  functions, distribution functions, and string functions.

Exercise: From the Employee database, create a new variable called NewSalary by adding 2000
to the current salary.


Exercise: From the Employee database, create a new variable called Salary_Difference as the
difference between the current salary and the beginning salary.
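The two exercises amount to simple arithmetic on each case, which the following Python sketch illustrates. The helper name compute, its optional condition argument and the sample values are assumptions for this sketch; in SPSS the same result is obtained through the Compute dialog box.

```python
def compute(cases, target, expression, condition=None):
    """Set target to expression(case) for every case, or only for the cases
    that satisfy an optional logical condition (hypothetical helper)."""
    for case in cases:
        if condition is None or condition(case):
            case[target] = expression(case)
    return cases


# Made-up sample cases.
cases = [
    {"id": 1, "salary": 57000, "salbegin": 27000},
    {"id": 2, "salary": 40200, "salbegin": 18750},
]

# NewSalary: add 2000 to the current salary.
compute(cases, "NewSalary", lambda c: c["salary"] + 2000)

# Salary_Difference: current salary minus beginning salary.
compute(cases, "Salary_Difference", lambda c: c["salary"] - c["salbegin"])

for case in cases:
    print(case["id"], case["NewSalary"], case["Salary_Difference"])
```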

[Figure callouts: Salary Difference = Current salary - beginning salary; NewSalary = current Salary + 20000]

Count Values within Cases

This dialog box creates a variable that counts the occurrences of the same value(s) in a list of
variables for each case. For example, a survey might contain a list of magazines with yes/no
check boxes to indicate which magazines each respondent reads. You could count the number
of yes responses for each respondent to create a new variable that contains the total number of
magazines read.

Exercise: From the employee data set, create a new variable empcat_Minority that counts, for
each case, the occurrences of job category = 3 and minority = 0.
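Counting values within a case can be pictured with the short Python sketch below. The function name count_values and the sample cases are invented for illustration; the target values follow the exercise (job category 3 and minority 0).

```python
def count_values(case, targets):
    """Count how many of the listed variables hold their target value
    in a single case (hypothetical helper)."""
    return sum(1 for var, value in targets.items() if case[var] == value)


# Made-up sample cases.
cases = [
    {"jobcat": 3, "minority": 0},   # both values match
    {"jobcat": 3, "minority": 1},   # only jobcat matches
    {"jobcat": 1, "minority": 1},   # nothing matches
]
targets = {"jobcat": 3, "minority": 0}
empcat_minority = [count_values(case, targets) for case in cases]
print(empcat_minority)
```

A count of 2 therefore identifies the cases where job category is 3 and minority is 0 at the same time.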


Shift Values

This procedure creates a new variable from an existing variable, holding either lead or lag
values of the existing variable. It can also be used simply to give the existing variable a new name.

Example: From the employee data set, take the variable Salary_Difference and create
Salary_Gap from it with a lag of 1.
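Lag and lead values can be illustrated with a few lines of Python. The function names lag and lead and the sample values are assumptions for this sketch, and None stands in for the missing values that appear where no earlier or later case exists.

```python
def lag(values, n=1):
    """Each case receives the value from n cases earlier; the first n are None."""
    padded = [None] * n + list(values)
    return padded[:len(values)]


def lead(values, n=1):
    """Each case receives the value from n cases later; the last n are None."""
    padded = list(values) + [None] * n
    return padded[n:]


# Made-up Salary_Difference values for three cases.
salary_difference = [30000, 21450, 9450]
salary_gap = lag(salary_difference, 1)

print(salary_gap)
print(lead(salary_difference, 1))
```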

[Figure callout: Empcat_Minority = 2 where job category is 3 and minority is 0]

Recode into same variable

The Recode into Same Variables dialog box allows you to reassign the values of existing
variables or collapse ranges of existing values into new values. The recoded values replace the
original ones. For example, you could collapse salaries into salary-range categories.
- You can recode numeric and string variables.
- Because the values are replaced in place, the variable type does not change; converting
  numeric values to strings (or vice versa) requires Recode into Different Variables.
- If you select multiple variables, they must all be the same type. You cannot recode numeric
  and string variables together.

Exercise: Take the variable Gender from the Employee database. Recode male as 1 and
female as 0.

[Figure callout: Salary_Gap with one lag from Salary_Difference]

[Figure callouts: before recoding, gender takes the value m for male and f for female; after recoding, m is recoded as 1 and f as 0]

Recode into different variable

The Recode into Different Variables dialog box allows you to reassign the values of existing
variables or collapse ranges of existing values into new values for a new variable. For example,
you could collapse salaries into a new variable containing salary-range categories.
- You can recode numeric and string variables.
- You can recode numeric variables into string variables and vice versa.
- If you select multiple variables, they must all be the same type. You cannot recode numeric
  and string variables together.

Recoding into a different variable is less risky because you retain your original variable.

Exercise: Convert the values of the variable gender, which is in numeric form (1 and 0), to
Male and Female. Create a new variable called Gender_String for this.
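The exercise can be mirrored in a short Python sketch that writes the recoded values to a new variable and leaves the original untouched. The function name recode_into and the sample cases are invented for illustration; the coding of 1 as Male and 0 as Female follows the earlier exercise.

```python
def recode_into(cases, source, target, mapping):
    """Write the recoded value of source into a new variable target,
    leaving source unchanged (hypothetical helper)."""
    for case in cases:
        case[target] = mapping.get(case[source])   # unmatched values become None
    return cases


# Made-up sample cases with gender already coded 1 (male) and 0 (female).
cases = [{"id": 1, "gender": 1}, {"id": 2, "gender": 0}, {"id": 3, "gender": 1}]
recode_into(cases, "gender", "Gender_String", {1: "Male", 0: "Female"})

for case in cases:
    print(case["id"], case["gender"], case["Gender_String"])
```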


Automatic Recode

The Automatic Recode dialog box converts string and numeric values into consecutive
integers. When category codes are not sequential, the resulting empty cells reduce performance
and increase memory requirements for many procedures. Additionally, some procedures cannot
use string variables, and some require consecutive integer values for factor levels.

The new variable(s) created by Automatic Recode retain any defined variable and value labels
from the old variable. For any values without a defined value label, the original value is used as
the label for the recoded value. A table displays the old and new values and value labels.
- String values are recoded in alphabetical order, with uppercase letters preceding their
  lowercase counterparts.
- Missing values are recoded into missing values higher than any nonmissing values, with their
  order preserved. For example, if the original variable has 10 nonmissing values, the lowest
  missing value would be recoded to 11, and the value 11 would be a missing value for the new
  variable.

[Figure callout: Gender_string is a new variable with values in strings. The old variable gender has been retained.]

Use the same recoding scheme for all variables. This option allows you to apply a single
autorecoding scheme to all the selected variables, yielding a consistent coding scheme for all
the new variables.

Exercise: Take the variable Beginning Salary and apply an automatic recode. Name the new
variable Recoded_BeginSalary. Count how many people drew the lowest beginning salary.
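For numeric values, the idea of replacing each distinct value by its rank among the distinct values can be sketched in Python as follows. The function name autorecode and the sample salaries are made up, and the sketch ignores labels, missing values and string ordering, which the real procedure handles.

```python
def autorecode(values):
    """Map each distinct numeric value to a consecutive integer, starting
    at 1 for the lowest value (hypothetical helper)."""
    codes = {value: code for code, value in enumerate(sorted(set(values)), start=1)}
    return [codes[value] for value in values], codes


# Made-up beginning salaries; the lowest value occurs twice.
salbegin = [27000, 18750, 12000, 13200, 12000]
recoded, codes = autorecode(salbegin)

# Code 1 marks the lowest beginning salary, so counting it answers the exercise.
lowest_count = recoded.count(1)

print(recoded)
print(lowest_count)
```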


Visual Binning

Visual Binning is designed to assist the process of creating new variables based on grouping of
continuous values of existing variables into a limited number of distinct categories. We can use
Visual Binning to:
- Create categorical variables from continuous scale variables. For example, we could use
  a scale income variable to create a new categorical variable that contains income ranges.
- Collapse a large number of ordinal categories into a smaller set of categories. For
  example, we could collapse a rating scale of nine down to three categories representing
  low, medium, and high.

In the first step, we select the numeric scale and/or ordinal variables for which we want
to create new categorical (binned) variables. Optionally, we can limit the number of
cases to scan. For data files with a large number of cases, limiting the number of cases
scanned can save time, but we should avoid this if possible because it will affect the
distribution of values used in subsequent calculations in Visual Binning.

[Figure callout: The beginning salary has been ranked in ascending order.]

Note: String variables and nominal numeric variables are not displayed in the source
variable list. Visual Binning requires numeric variables, measured on either a scale or
ordinal level, since it assumes that the data values represent some logical order that can
be used to group values in a meaningful fashion. We can change the defined
measurement level of a variable in Variable View in the Data Editor.

Example: Take Educational Level, which is a continuous variable, and change it into a
categorical variable with the help of binning.
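Binning with fixed cut points can be sketched in Python using the standard bisect module. The function name bin_values, the cut points 12 and 16 and the sample values are assumptions chosen only for illustration; the sketch treats each cut point as the inclusive upper limit of its bin.

```python
from bisect import bisect_left


def bin_values(values, upper_limits):
    """Assign each value to a bin numbered from 1.

    upper_limits must be sorted; a value equal to a limit falls in the bin
    that the limit closes, and values above the last limit go to a final bin.
    """
    return [bisect_left(upper_limits, value) + 1 for value in values]


# Made-up years of education and hypothetical cut points:
# bin 1 = up to 12 years, bin 2 = 13 to 16 years, bin 3 = 17 years or more.
educ = [8, 12, 15, 16, 19]
educ_binned = bin_values(educ, [12, 16])
print(educ_binned)
```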


Optimal Binning

The Optimal Binning procedure discretizes one or more scale variables (referred to henceforth
as binning input variables) by distributing the values of each variable into bins. Bin formation
is optimal with respect to a categorical guide variable that "supervises" the binning process.
Bins can then be used instead of the original data values for further analysis. Reducing the
number of distinct values a variable takes has a number of uses, including:
- Data requirements of other procedures. Discretized variables can be treated as
  categorical for use in procedures that require categorical variables. For example, the
  Crosstabs procedure requires that all variables be categorical.
- Data privacy. Reporting binned values instead of actual values can help safeguard the
  privacy of your data sources. The Optimal Binning procedure can guide the choice of
  bins.

45

Education level

with categories


46/56

Speed performance. Some procedures are more efficient when working with a reduced

number of distinct values. For example, the speed of Multinomial Logistic Regression

can be improved using discredited variables.

Uncovering complete or quasi-complete separation of data.

Optimal versus Visual Binning. The Visual Binning dialog boxes offer several automatic

methods for creating bins without the use of a guide variable. These "unsupervised"

rules are useful for producing descriptive statistics, such as frequency tables, but

Optimal Binning is superior when the end goal is to produce a predictive model.

Output. The procedure produces tables of cut points for the bins and descriptive

statistics for each binning input variable. Additionally, we can save new variables to the

active dataset containing the binned values of the binning input variables and save the

binning rules as command syntax for use in discretizing new data.

Exercise: Group Current Salary with respect to Educational level binned.


Output: Current salary binned with respect to Educational Level group

Current Salary

         End Point              Number of Cases by Level of Educational Level (years) (Binned)
Bin      Lower      Upper       12 - 14    15 - 17    18+    Total
1        a          $31,050     220        68         2      290
2        $31,050    $43,000     28         64         2      94
3        $43,000    $59,375     0          33         8      41
4        $59,375    a           1          10         38     49
Total                           249        175        50     474

Each bin is computed as Lower
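To make the bin assignment concrete, here is a small sketch in Python that places salaries into the four bins using the cut points from the table and then counts cases per bin and education group. It assumes that each bin includes its lower end point and excludes its upper end point; the function name and the sample cases are hypothetical and are not taken from the data file.

```python
from bisect import bisect_right
from collections import Counter

# Cut points taken from the table above; bins 1 and 4 are open-ended.
CUT_POINTS = [31050, 43000, 59375]

def salary_bin(salary):
    """Return the bin number (1-4), assuming Lower <= salary < Upper."""
    return bisect_right(CUT_POINTS, salary) + 1

# Hypothetical cases: (current salary, binned education level).
cases = [(27000, "12 - 14"), (31050, "15 - 17"), (45000, "15 - 17"),
         (59375, "18+"), (90000, "18+")]

# Count cases for each (bin, education group) cell, as a crosstab would.
table = Counter((salary_bin(salary), educ) for salary, educ in cases)
for (bin_no, educ), count in sorted(table.items()):
    print(bin_no, educ, count)
```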



New variable names and descriptive variable labels are automatically generated, based on the

original variable name and the selected measure(s). A summary table lists the original variables,

the new variables, and the variable labels.

Optionally, you can:

Rank cases in ascending or descending order.

Organize rankings into subgroups by selecting one or more grouping variables for the By

list. Ranks are computed within each group. Groups are defined by the combination of

values of the grouping variables. For example, if you select gender and minority as

grouping variables, ranks are computed for each combination of gender and minority.
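The ranking options described above can be sketched in Python as follows. This is an illustration only: the function names and the sample values are hypothetical, and tied values are given the mean of the ranks they span, which is one common convention and is assumed here rather than taken from the text.

```python
from collections import defaultdict

def rank_values(values, descending=False):
    """Rank values starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

def rank_within_groups(values, groups, descending=False):
    """Rank each value only against cases that share its group key."""
    members = defaultdict(list)
    for index, key in enumerate(groups):
        members[key].append(index)
    ranks = [0.0] * len(values)
    for indices in members.values():
        group_ranks = rank_values([values[i] for i in indices], descending)
        for i, r in zip(indices, group_ranks):
            ranks[i] = r
    return ranks

salaries = [12000, 15000, 12000, 18000, 9000]   # hypothetical beginning salaries
gender = ["f", "f", "m", "m", "m"]
print(rank_values(salaries, descending=True))
print(rank_within_groups(salaries, gender, descending=True))
```

To group by more than one variable, such as gender and minority, the group key can be a tuple of the two values for each case.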

Exercise: Rank Beginning salary in descending order.


Create Time Series

Several data transformations that are useful in time series analysis are provided in this

procedure:

Generate date variables to establish periodicity and to distinguish between historical,

validation, and forecasting periods.

Create new time series variables as functions of existing time series variables.

Replace system- and user-missing values with estimates based on one of several

methods.

A time series is obtained by measuring a variable (or set of variables) regularly over a

period of time. Time series data transformations assume a data file structure in which

each case (row) represents a set of observations at a different time, and the length of

time between cases is uniform.

[Figure: Ranks are assigned to Beginning salary in descending order.]

Exercise: Create a time series taking variable current salary using cumulative sum.
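The cumulative sum asked for in this exercise is simply a running total over the cases in file order. A minimal Python sketch, with hypothetical salary values, is shown below. A differencing helper is included as well, because differencing is another common time series transformation and it shows how a transformation can leave missing values at the start of a series. Both function and variable names are assumptions made for the illustration.

```python
from itertools import accumulate

def difference(series, lag=1):
    """Return series[t] - series[t - lag]; the first `lag` positions are missing (None)."""
    return [None] * lag + [series[t] - series[t - lag] for t in range(lag, len(series))]

salary = [100, 120, 90, 150]            # hypothetical values, one per time period
cumulative = list(accumulate(salary))   # running total of the series
print(cumulative)
print(difference(salary))
```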

Replace Missing Values

Missing observations can be problematic in analysis, and some time series measures cannot be

computed if there are missing values in the series. Sometimes the value for a particular

observation is simply not known. In addition, missing data can result from any of the following:

Each degree of differencing reduces the length of a series by 1.

Each degree of seasonal differencing reduces the length of a series by one season.

If you create new series that contain forecasts beyond the end of the existing series (by

clicking a Save button and making suitable choices), the original series and the

generated residual series will have missing data for the new observations.

Some transformations (for example, the log transformation) produce missing data for

certain values of the original series.

[Figure: Cumulative sum of current salary.]

Missing data at the beginning or end of a series pose no particular problem; they simply

shorten the useful length of the series. Gaps in the middle of a series (embedded missing

data) can be a much more serious problem. The extent of the problem depends on the

analytical procedure you are using.

The Replace Missing Values dialog box allows you to create new time series variables from

existing ones, replacing missing values with estimates computed with one of several methods.

Default new variable names are the first six characters of the existing variable name,

followed by an underscore and a sequential number. For example, for the variable price, the

new variable name would be price_1. The new variables retain any defined value labels from

the original variables.

Exercise: Replace missing values in the variable Salary Difference by using series mean.
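As a rough illustration of the series mean method, the sketch below replaces missing entries with the mean of the observed values and builds a default name in the pattern described above. It is not SPSS code: missing values are represented by None, and the function names and data are hypothetical.

```python
from statistics import mean

def replace_with_series_mean(series):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [x for x in series if x is not None]
    if not observed:
        raise ValueError("series has no observed values")
    fill = mean(observed)
    return [fill if x is None else x for x in series]

def default_name(original, sequence=1):
    """Build a default name: first six characters, an underscore, a number."""
    return f"{original[:6]}_{sequence}"

salary_difference = [10, None, 30, None, 20]   # hypothetical data
print(replace_with_series_mean(salary_difference))
print(default_name("price"))
```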


Random Number Seed

The Random Number Seed dialog box allows you to select the random number generator and to set

the seed value so as to reproduce a sequence of random numbers. Two different random number

generators are available:

Version 12 Compatible. The random number generator used in version 12 and previous

releases. If you need to reproduce randomized results generated in previous releases

based on a specified seed value, use this random number generator.

Mersenne Twister. A newer random number generator that is more reliable for

simulation purposes. If reproducing randomized results from version 12 or earlier is not

an issue, use this random number generator.

The random number seed changes each time a random number is generated for use in

transformations (such as random distribution functions), random sampling, or case weighting.

To replicate a sequence of random numbers, set the initialization starting point value prior to

each analysis that uses the random numbers. The value must be a positive integer.
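The effect of fixing the seed can be demonstrated in Python, whose standard random module is also based on the Mersenne Twister. This is only an analogy: the same seed value will not produce the same numbers in Python as in SPSS, and the function name and seed values below are arbitrary choices for the sketch.

```python
import random

def draw(seed, n=3):
    """Seed a private generator and return n uniform random numbers."""
    rng = random.Random(seed)   # Python's generator is a Mersenne Twister
    return [rng.random() for _ in range(n)]

first = draw(12345)
second = draw(12345)
other = draw(54321)
print(first == second)   # same seed, same sequence
print(first == other)    # different seed, different sequence
```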

[Figure: Missing values have been substituted with series mean.]

ANALYZE MENU IN SPSS

The Analyze Menu is the workhorse of SPSS. Nearly all procedures that generate output are

located on this menu. Only the most important of its procedures are discussed here.

1. Descriptive Statistics.

a. Frequencies Statistics

The Frequencies procedure provides statistics and graphical displays that are useful for

describing many types of variables. The Frequencies procedure reports a frequency table along

with selected statistics and graphs, viz., Percentile values, Central Tendency, Dispersion,

Skewness and Kurtosis, and basic charts.
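A frequency table with a few of these summary statistics can be sketched in plain Python as shown below. The data are hypothetical years of education, and the layout of the printed table is an assumption; it does not reproduce the SPSS output format.

```python
from collections import Counter
from statistics import mean, median, stdev

def frequency_table(values):
    """Return (value, count, percent) rows sorted by value."""
    counts = Counter(values)
    total = len(values)
    return [(v, c, 100.0 * c / total) for v, c in sorted(counts.items())]

education = [12, 12, 15, 16, 12, 15, 19, 16]   # hypothetical years of education
for value, count, percent in frequency_table(education):
    print(f"{value:>5} {count:>5} {percent:6.1f}%")
print("mean", mean(education), "median", median(education),
      "sd", round(stdev(education), 3))
```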

b. Descriptives Options

You can choose one or more of the following subgroup statistics for the variables within each

category of each grouping variable: sum, number of cases, mean, median, grouped median,

standard error of the mean, minimum, maximum, range, variable value of the first category of

the grouping variable, variable value of the last category of the grouping variable, standard

deviation, variance, kurtosis, standard error of kurtosis, skewness, standard error of skewness,

percentage of total sum, percentage of total N, percentage of sum in, percentage of N in,

geometric mean, and harmonic mean. You can change the order in which the subgroup statistics

appear. The order in which the statistics appear in the Cell Statistics list is the order in which

they are displayed in the output. Summary statistics are also displayed for each variable across

all categories.

c. Explore Statistics

Along with descriptive statistics, robust M-estimators of location, such as Huber's M-estimator, can be

displayed. The Outliers option displays the five largest and five smallest values with case labels.

d. Crosstabs

The Crosstabs procedure forms two-way and multiway tables and provides a variety of tests and

measures of association for two-way tables.

2. Compare Means

The following set of procedures helps to compare the differences in means among two or more

groups.

a. Independent-Samples T Test

The Independent-Samples T Test procedure compares means for two groups of cases. Ideally,

for this test, the subjects should be randomly assigned to two groups, so that any difference in

response is due to the treatment (or lack of treatment) and not to other factors. This is not the

case if you compare average income for males and females.
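For readers who want to see the arithmetic behind the test, here is a minimal Python sketch of the two-sample t statistic. It covers only the pooled (equal variances assumed) form and returns the statistic and degrees of freedom without a significance level; the function name and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group_a, group_b):
    """Return (t, degrees of freedom) for the pooled-variance two-sample t test."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df

treated = [5, 7, 9]     # hypothetical responses
control = [1, 3, 5]
t, df = independent_t(treated, control)
print(round(t, 4), df)
```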

b. Paired-Samples T Test

The Paired-Samples T Test procedure compares the means of two variables for a single group.

The procedure computes the differences between values of the two variables for each case and

tests whether the average differs from 0.
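The computation just described can be written out in a few lines of Python. This is a sketch with hypothetical data and a hypothetical function name; it returns the t statistic and degrees of freedom only, not the significance level that SPSS also reports.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Return (t, degrees of freedom) for the paired-samples t test."""
    if len(before) != len(after):
        raise ValueError("both variables need one value per case")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    # t is the mean difference divided by its standard error.
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1

begin_salary = [10, 12, 9, 11]      # hypothetical values
current_salary = [12, 15, 10, 15]
t, df = paired_t(begin_salary, current_salary)
print(round(t, 4), df)
```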

c. One-Way ANOVA

The One-Way ANOVA procedure produces a one-way analysis of variance for a quantitative

dependent variable by a single factor (independent) variable. Analysis of variance is used to test

the hypothesis that several means are equal. This technique is an extension of the two-sample t

test.

In addition to determining that differences exist among the means, you may want to know

which means differ. There are two types of tests for comparing means: a priori contrasts and

post hoc tests. Contrasts are tests set up before running the experiment, and post hoc tests are

run after the experiment has been conducted. You can also test for trends across categories.
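The F ratio at the heart of a one-way analysis of variance can be sketched as follows. The code is illustrative only: the function name and data are hypothetical, and contrasts, post hoc tests, and significance levels are left out. The second call shows the link to the two-sample t test, since with two groups the F ratio equals the square of the pooled t statistic.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples, one per factor level."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

three_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]   # hypothetical data
f_ratio, df_b, df_w = one_way_anova(three_groups)
print(f_ratio, df_b, df_w)

two_groups = [[5, 7, 9], [1, 3, 5]]
f_two, _, _ = one_way_anova(two_groups)
print(f_two)   # equals t squared for the same two samples
```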


3. GLM Model

The GLM procedure provides regression analysis and analysis of variance for one dependent

variable by one or more factors and/or variables. The factor variables divide the population into

groups. Using this General Linear Model procedure, you can test null hypotheses about the

effects of other variables on the means of various groupings of a single dependent variable. You

can investigate interactions between factors as well as the effects of individual factors, some of

which may be random. In addition, the effects of covariates and covariate interactions with

factors can be included. For regression analysis, the independent (predictor) variables are

specified as covariates.

4. Bivariate Correlations Options

The Bivariate Correlations procedure computes Pearson's correlation coefficient, Spearman's

rho, and Kendall's tau-b with their significance levels. Correlations measure how variables or

rank orders are related. Before calculating a correlation coefficient, screen your data for outliers

(which can cause misleading results) and evidence of a linear relationship. Pearson's correlation

coefficient is a measure of linear association. Two variables can be perfectly related, but if the

relationship is not linear, Pearson's correlation coefficient is not an appropriate statistic for

measuring their association.
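The point about linearity can be checked with a short Python sketch. The function below computes Pearson's coefficient from its definition; the data are made up so that one variable is an exact linear function of x and the other an exact quadratic function of x. The names are assumptions for the illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
linear = [2 * v + 1 for v in x]   # perfectly linear in x
curved = [v ** 2 for v in x]      # fully determined by x, but not linear
print(pearson_r(x, linear))
print(pearson_r(x, curved))
```

The first coefficient is 1 and the second is 0, even though both variables are completely determined by x.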

5. Partial Correlations

The Partial Correlations procedure computes partial correlation coefficients that describe the

linear relationship between two variables while controlling for the effects of one or more

additional variables. Correlations are measures of linear association. Two variables can be

perfectly related, but if the relationship is not linear, a correlation coefficient is not an

appropriate statistic for measuring their association.
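For a single control variable, the partial correlation can be obtained from the three pairwise Pearson correlations with the standard first-order formula. The sketch below assumes those three correlations have already been computed; the function name and the input values are hypothetical.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical pairwise Pearson correlations.
unchanged = partial_r(0.5, 0.0, 0.0)    # z is unrelated to both x and y
removed = partial_r(0.64, 0.8, 0.8)     # z accounts for the whole x-y correlation
print(unchanged)
print(round(removed, 6))
```

In the first case the partial correlation stays at 0.5; in the second it is approximately zero, because the association between x and y disappears once z is held constant.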

6. Linear Regression Variable Selection Methods

Method selection allows you to specify how independent variables are entered into the analysis.

Using different methods, you can construct a variety of regression models from the same set of

variables.

7. Discriminant Analysis

Discriminant analysis builds a predictive model for group membership. The model is composed

of a discriminant function (or, for more than two groups, a set of discriminant functions) based

on linear combinations of the predictor variables that provide the best discrimination between


the groups. The functions are generated from a sample of cases for which group membership is

known; the functions can then be applied to new cases that have measurements for the predictor

variables but have unknown group membership.

8. Factor Analysis

Factor analysis attempts to identify underlying variables, or factors, that explain the pattern of

correlations within a set of observed variables. Factor analysis is often used in data reduction to

identify a small number of factors that explain most of the variance that is observed in a much

larger number of manifest variables. Factor analysis can also be used to generate hypotheses

regarding causal mechanisms or to screen variables for subsequent analysis (for example, to

identify collinearity prior to performing a linear regression analysis).
