SPSS-All function.doc
Uploaded by: asif-mohammed
7/30/2019 SPSS-All function.doc
INTRODUCTION TO SPSS
SPSS stands for Statistical Package for the Social Sciences. As its name implies, it is a
statistical package that was originally designed for handling data generated in social science
studies. It is now widely used in other areas too: for example, by governments, businesses,
law enforcement agencies, health care providers and academics, in both experimental and
observational studies.
So, what is SPSS? SPSS is a simple package to use. Its user interface is a spreadsheet made up
of cells, columns and rows. The columns represent the variables and the rows represent the
cases. Cases and variables are the two main components in statistics. A case is the subject of
analysis; this could be an animal in a scientific experiment or a person replying to a
questionnaire. Variables are the measurements obtained on the various characteristics of each
case. Data so obtained can be analyzed with this package by means of descriptive, bivariate or
multivariate statistical methods.
SPSS differs from other spreadsheets in that the analysis is carried out through commands
chosen from pull-down menus instead of within the spreadsheet. The output also does not
appear in the spreadsheet itself, as is common in other spreadsheets, but in a separate window.
The output of SPSS is comprehensive; that is, the package may give additional output to
augment the expected output. For example, in addition to a graph it may give a histogram, the
mean and the standard deviation. SPSS can take data from almost any type of file and use it to
generate tabulated reports, charts, plots of distributions and trends, and descriptive statistics,
and to conduct complex statistical analyses.
DIFFERENT TYPES OF WINDOWS IN A TYPICAL STATISTICAL SOFTWARE
There are a number of different types of windows in typical statistical software:
1. Data Editor Window / Object Window / Variable Window
Default window with a blank data sheet ready for analyses. This window displays the contents
of the data file. You may create new data files, or modify existing ones with the Data Editor.
The Data Editor window opens automatically when you start an SPSS session.
2. Viewer Window / Output Window / Log file / Results Window
The Viewer window displays the statistical results, tables, and charts from the analysis you
performed (e.g., descriptive statistics, correlations, plots, charts). A Viewer window opens
automatically when you run a procedure that generates output. In the Viewer window, you can
edit, move, delete and copy your results. Whenever a procedure is run, the output is directed to
a separate window. You can also have multiple [Output] windows open to organize the various
analyses that might be conducted. Later, these results can be saved and/or printed.
3. Syntax Editor Window / Do File
You can paste your dialog box choices into a Syntax Editor window in SPSS, where your
selections appear in the form of command syntax. You can then edit the command syntax to
utilize special features of SPSS not available through dialog boxes. You can save these
commands in a file for use in subsequent SPSS sessions. Similarly, in some software packages
such as STATA you can enter several lines of commands in the Do-file Editor. You can either
run only the selected lines or run the whole file from the first line to the last. A do-file can also
be created from the event history window.
4. Chart Editor Window
You can modify and save high-resolution charts and plots in chart windows. You can change
the colors, select different type fonts or sizes, switch the horizontal and vertical axes, rotate 3-D
scatter plots, and even change the chart type.
5. Script Editor Window
Scripting allows you to customize and automate many tasks in typical statistical software. Use
the Script Editor to create and modify basic scripts.
6. Command Window
The Command Window is where you type your commands in syntax form. To send a command
to the software, press the "Return" or "Enter" key.
7. Event History Window
The STATA Review Window lists all of the STATA commands that have been executed since
STATA was opened. These can be repeated by double-clicking them, or by clicking them into
the Command Window and pressing Enter. The Review Window records your commands. The
Results window displays your output. The Variables window lists the variables in the data set
you are using. The Results window is the Log Window.
A BRIEF ABOUT THE DATA SET
Before discussing the types of files and the various menus in SPSS, it is useful to understand
the dataset that we are going to use as an example throughout.
A Sample Research Problem: The Employee Income study
The SPSS file name of the data set used with this manual is Employee Data.sav; it stands for
Employee Income Data. It is based on a sample data file supplied with SPSS. The data set is a
sample of 474 employees drawn randomly from the larger employee population. It contains
several kinds of personal, social and occupational data such as Gender, Date of Birth, Minority
Classification, Educational Level (years), Employment Category, Current Salary, Beginning
Salary, Months Since Hire and Previous Experience (months). The description of the variables
is given below:
Variable Description:
Variable Name    Variable Description/Label
id               Employee Code
gender           Gender (M/F)
bdate            Date of Birth
educ             Educational Level (years)
jobcat           Employment Category (Clerical, Custodial, Manager)
salary           Current Salary
salbegin         Beginning Salary
jobtime          Months Since Hire
prevexp          Previous Experience (months)
minority         Minority Classification (yes/no)
DIFFERENT WINDOWS AND FILES / FILE EXTENSIONS:
Each window corresponds to a separate type of file in statistical software.
Data Files
There are two basic types of files in SPSS. The first is the data file (.sav). This is where all the
data for your analysis resides. When you open a data file, it appears in the Data Editor window.
The format is similar to a spreadsheet, with a grid of rows and columns. The columns represent
variables and the rows represent observations. You can place the cursor on a column heading to
get a lengthier description of that variable. To get complete information on any variable, go to
the Utilities menu and click Variables.
Data view>Utilities>Variables
[Figure: Data view, with callouts for the Variable Name and the Observation (Case Number)]
[Figure: Variable view]
Output files
The second type of file is an output window file (.spv). When a statistical procedure is run,
output is produced. The Viewer window will automatically open to show the output. The left
pane contains an outline view of the output. The right pane contains the contents of the output
which include tables, charts, and text. There are book icons in the outline view next to the
various objects of output. If the book is open, it indicates that the output is visible. If the book
is closed, it is hidden.
[Figure callouts: Variable characteristics (entire row); Variables (entire column)]
Analyze> Descriptive statistics>Frequencies>Employment Category>OK
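As a rough illustration of what a frequency table for Employment Category contains, the sketch below counts a handful of made-up jobcat values in Python. The eight values are hypothetical, and the code is not how SPSS itself computes the table.

```python
from collections import Counter

# Hypothetical jobcat values for eight cases; the real file has 474 cases.
jobcat = ["Clerical", "Manager", "Clerical", "Custodial",
          "Clerical", "Manager", "Clerical", "Custodial"]

counts = Counter(jobcat)
total = sum(counts.values())

# One entry per category: (frequency, percent), as in a frequency table.
freq_table = {cat: (n, 100.0 * n / total) for cat, n in counts.items()}

for cat in sorted(freq_table):
    n, pct = freq_table[cat]
    print(f"{cat:<10} {n:>3} {pct:6.1f}%")
```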
DIFFERENT MENUS IN A TYPICAL STATISTICAL SOFTWARE
Many of the tasks you may want to perform with typical statistical software start with menu
selections. In many statistical packages each window has its own menu bar with different
options. The Data Editor window in SPSS, for example, has the following menus.
Most menus in this window are similar to the ones found in other Windows applications, and
some are specific to the tasks of the Data Editor. The Data Editor window has eleven main
menus. The different menus are described in more detail below:
1. File
The File menu has options to create a new SPSS system file, open an existing system file, read
in spreadsheet or database files created by other software programs, and read in an external
ASCII/EXCEL data file from the Data Editor; create a command file or retrieve an already
created SPSS command file into the Syntax Editor; open, save, and print output files from the
Viewer and Pivot Table Editor; and save chart templates and export charts in external formats
in the Chart Editor, etc.
2. Edit
The Edit menu has options to cut, copy, and paste data values in the Data Editor; modify or
copy text in the Viewer or Syntax Editor; and copy charts from the Chart Editor for pasting into
other applications, etc.
3. View
The View menu has options to turn toolbars, the status bar and grid lines on and off in all
window types, and to control the display of value labels and data values in the Data Editor.
4. Data
The Data menu has options to make global changes to SPSS data files, such as transposing
variables and cases, creating subsets of cases for analysis, and merging files. These changes
are only temporary and do not affect the permanent file unless you save the file with the
changes.
5. Transform
The Transform menu has options to make changes to selected variables in the data file and to
compute new variables based on the values of existing ones. These changes are temporary and
do not affect the permanent file unless you save the file with the changes.
6. Analyze
The Analyze menu is the most important menu; it contains the statistical procedures of SPSS.
This menu will be discussed later in detail.
[Figure: the File, Edit and View menus]
7. Graphs
The Graphs menu has options to create bar charts, pie charts, histograms, scatterplots, and
other full-color, high-resolution graphs. Some statistical procedures also generate graphs. All
graphs can be customized with the Chart Editor.
8. Utilities
The Utilities menu has options to display information about variables in the working data file,
control the list of variables shown in all window types, and change the designated Viewer and
Syntax Editor, etc.
9. Add-ons
The Add-ons menu has an option to view information about add-on modules.
10. Window
The Window menu has options to switch between SPSS windows or to minimize all open
SPSS windows.
11. Help
The Help menu opens a standard Microsoft Help window containing information on how to use
the various features of SPSS. Context-sensitive help is available through the dialog boxes.
[Figure: the Graphs, Utilities, Add-ons, Window and Help menus]
TOOLBAR IN SPSS
For each SPSS window there exists a toolbar that provides quick and easy access to the common
tasks of that window. Each icon in the toolbar has a Tool Tip, which shows a brief description
of the tool when you rest the mouse pointer on the icon.
The Main Toolbar
File Buttons
The first three buttons represent the three most common commands from the File menu: Open
an Existing File, Save the Current File and Print the Current File, respectively.
Dialog Recall
This button gives you quick access to the previous 12 dialog boxes you were working with. This
is particularly useful when you are building up an analysis and frequently going back and forth
to the same box to change or modify options.
Go to Chart
This button helps you to open Chart Editor.
Go to Data
When you are in a window other than the Data Editor window this button will take you back to
the Data Editor.
Go to Case
This button takes you quickly to a specific case in the Data Editor. It is helpful when editing
data, for example when abnormal data points or outliers are found in your analysis and you
want to check the source data.
Variables
This button opens a dialog box containing a list of all the variables defined in the data file.
Selecting a variable from this list displays its properties, viz. its name, label, type,
information about missing values and the value labels. This box can be kept open while you
work with the data file so that you can examine a variable's information as you examine the
results of an analysis.
Find
This button helps you to carry out a simple search to find a value.
Insert Case / Insert Variable
It is not unusual to find yourself wanting to add a case or a variable in the middle of data entry.
These two buttons will help you add a blank row or column in your data set.
Split File/Weight Cases/Select Cases
These buttons carry out three of the Data Menu commands: Split File, Weight Cases and
Select Cases, respectively. (The Data Menu is discussed later in this section.)
Value Labels
This button displays the value labels in the Data Editor so that you don't have to remember
what the numbers mean. Switching the button off displays the numbers again.
Use Sets
In the Data Editor window you can group variables together into sets so that the variables can
be analysed together. This button lets you specify which of the sets you have defined you want
to use.
SPECIAL (STATISTICS) MENUS
Every statistical package has some menus that are special and distinct. The special menus of
SPSS are called the Statistics Menus. They are as follows:
1. Data Menu
2. Transform Menu
3. Analyse Menu
DATA MENU IN SPSS
The Data Menu provides procedures to define variables, insert variables or cases, sort cases,
merge files, split files, select cases and use a variable to weight cases. Some of its menu items,
such as those for sorting, merging and transposing data sets, selecting subsets of cases and
splitting files by variables, are explained below.
a) Define variable properties:
SPSS offers a wizard-type tool that helps you to set all variable properties using an interactive
interface. Although it can be used for all types of variables, it is especially useful for categorical
variables, as it scans the actual variables for all distinct values. From the menus select Data >
Define Variable Properties; a first panel then appears that lets you select the variables for which
you want to set or change properties:
Data>Define variable properties> take gender>Variable to scan
Select the variables. You can also limit the number of cases to scan (useful with very large
files) and, as the tool is best used with categorical variables, limit the number of values (codes)
that should be displayed. When you click Continue, the next panel will pop up.
b) Copy Data Properties
The Copy Data Properties Wizard provides the ability to use an external SPSS Statistics data
file as a template for defining file and variable properties in the active dataset. You can also use
variables in the active dataset as templates for other variables in the active dataset. You can:
- Copy selected file properties from an external data file or open dataset to the active dataset.
  File properties include documents, file labels, multiple response sets, variable sets, and
  weighting.
- Copy selected variable properties from an external data file or open dataset to matching
  variables in the active dataset. Variable properties include value labels, missing values, level
  of measurement, variable labels, print and write formats, alignment, and column width (in the
  Data Editor).
- Copy selected variable properties from one variable in an external data file, open dataset, or
  the active dataset to many variables in the active dataset.
- Create new variables in the active dataset based on selected variables in an external data file
  or open dataset.
When copying data properties, the following general rules apply:
- If you use an external data file as the source data file, it must be a data file in SPSS
  Statistics format.
- If you use the active dataset as the source data file, it must contain at least one variable.
  You cannot use a completely blank active dataset as the source data file.
- Undefined (empty) properties in the source dataset do not overwrite defined properties
  in the active dataset.
- Variable properties are copied from the source variable only to target variables of a
  matching type: string (alphanumeric) or numeric (including numeric, date, and
  currency).
From the menus in the Data Editor window choose: Data-Copy Data Properties. Select the data
file with the file and/or variable properties that you want to copy. This can be a currently open
dataset, an external SPSS Statistics data file, or the active dataset. Follow the step-by-step
instructions in the Copy Data Properties Wizard.
Data>Copy Data Properties> follow the instructions of the wizard
Define Dates
The Define Dates dialog box allows you to generate date variables that can be used to establish
the periodicity of a time series and to label output from time series analysis.
Name        Label
YEAR_       YEAR, not periodic
QUARTER_    QUARTER, period 4
MONTH_      MONTH, period 12
DATE_       DATE. FORMAT: "MMM YYYY"
The following is a partial listing of the new variables:
YEAR_   QUARTER_   MONTH_   DATE_
1950    2          4        APR 1950
1950    2          5        MAY 1950
1950    2          6        JUN 1950
1950    3          7        JUL 1950
1950    3          8        AUG 1950
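The listing above can be reproduced with a short Python sketch. This is only an illustration of how the four date variables relate to each other for monthly data starting in April 1950; the function name define_dates is hypothetical and is not an SPSS command.

```python
# Month abbreviations in the "MMM YYYY" style used by DATE_.
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

def define_dates(start_year, start_month, n_cases):
    """Return one dict per case with YEAR_, QUARTER_, MONTH_ and DATE_."""
    rows = []
    year, month = start_year, start_month
    for _ in range(n_cases):
        rows.append({
            "YEAR_": year,
            "QUARTER_": (month - 1) // 3 + 1,  # months 1-3 -> 1, 4-6 -> 2, ...
            "MONTH_": month,
            "DATE_": f"{MONTHS[month - 1]} {year}",
        })
        # Advance one month, rolling over into the next year after December.
        month += 1
        if month > 12:
            month, year = 1, year + 1
    return rows

dates = define_dates(1950, 4, 5)
for row in dates:
    print(row["YEAR_"], row["QUARTER_"], row["MONTH_"], row["DATE_"])
```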
Define Multiple Response Sets
To define multiple response sets:
- From the menus, choose Data > Define Multiple Response Sets.
- Select two or more variables. If your variables are coded as dichotomies, indicate which
  value you want to have counted.
- Enter a unique name for each multiple response set. The name can be up to 63 bytes
  long. A dollar sign is automatically added to the beginning of the set name.
- Enter a descriptive label for the set. (This is optional.)
- Click Add to add the multiple response set to the list of defined sets.
Identify Duplicate Cases
Duplicate cases may occur in your data for many reasons, including:
- Data entry errors in which the same case is accidentally entered more than once.
- Multiple cases that share a common primary ID value but have different secondary ID
  values, such as family members who all live in the same house.
- Multiple cases that represent the same case but with different values for variables other than
  those that identify the case, such as multiple purchases made by the same person or
  company for different products or at different times.
Note: Use employee id and current salary as the matching variables to check for duplicate cases.
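The idea behind the check can be sketched in Python as follows. The five cases are made up, and the 1/0 indicator (1 for the first case in each matching group, 0 for later duplicates) is an assumption chosen for illustration rather than a description of the exact SPSS output.

```python
# Hypothetical cases; id 3 has been entered twice with the same salary.
cases = [
    {"id": 1, "salary": 57000},
    {"id": 2, "salary": 40200},
    {"id": 3, "salary": 21450},
    {"id": 3, "salary": 21450},
    {"id": 4, "salary": 21900},
]

def flag_duplicates(rows, keys):
    """Return 1 for the first case in each matching group and 0 for later duplicates."""
    seen = set()
    flags = []
    for row in rows:
        key = tuple(row[k] for k in keys)
        flags.append(0 if key in seen else 1)
        seen.add(key)
    return flags

primary_first = flag_duplicates(cases, ["id", "salary"])
n_duplicates = primary_first.count(0)
print(primary_first, n_duplicates)
```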
Sort Cases
The Sort Cases procedure reorders the sequence of cases based on the values of one or more
variables. You can sort cases in ascending or descending order, or use combinations of
ascending and descending order for different variables. For example, if you select gender as
the first sorting variable and minority as the second sorting variable, cases will be sorted by
minority classification within each gender category.
Note: Sort cases by jobcat in descending order.
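The gender-then-minority example can be sketched in Python with made-up cases. The second part shows one way to mix ascending and descending order by relying on the fact that Python's sort is stable; this illustrates the ordering only, not SPSS internals.

```python
# Hypothetical cases with the two sorting variables from the example.
cases = [
    {"id": 1, "gender": "m", "minority": 0},
    {"id": 2, "gender": "f", "minority": 1},
    {"id": 3, "gender": "m", "minority": 1},
    {"id": 4, "gender": "f", "minority": 0},
]

# Both keys ascending: gender is the first sorting variable, minority the second.
by_gender_minority = sorted(cases, key=lambda c: (c["gender"], c["minority"]))

# Mixed order (gender ascending, minority descending): sort on the last key
# first, then on the first key; the stable sort keeps the inner order intact.
mixed = sorted(cases, key=lambda c: c["minority"], reverse=True)
mixed = sorted(mixed, key=lambda c: c["gender"])

print([c["id"] for c in by_gender_minority])
print([c["id"] for c in mixed])
```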
Sort Variables
You can sort the variables in the active dataset based on the values of any of the variable
attributes (e.g., variable name, data type, measurement level), including custom variable
attributes. Values can be sorted in ascending or descending order. You can save the original
(pre-sorted) variable order in a custom variable attribute.
Note: Sort variables by width in ascending order
Transpose
The Transpose procedure creates a new data file in which the rows and columns of the original
data file are transposed, so that cases (rows) become variables and variables (columns) become
cases. Transpose automatically creates new variable names and displays a list of the new
variable names. A new Untitled file is created with the transposed data set.
Ex: Flip variables = bdate gender salary salbegin by id
Note: Variable view after transpose
Note: Data view after transpose
Please note: the values of gender are strings and hence will be converted into system-missing
values (SYSMIS).
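A minimal Python sketch of the transposition is given below, using three made-up cases and leaving out bdate for brevity. The CASE_LBL column and the K_ prefix for the new variable names are assumptions made for illustration, and None stands in for the system-missing value.

```python
# Hypothetical cases (rows); id supplies the new variable names.
cases = [
    {"id": 1, "gender": "m", "salary": 57000, "salbegin": 27000},
    {"id": 2, "gender": "m", "salary": 40200, "salbegin": 18750},
    {"id": 3, "gender": "f", "salary": 21450, "salbegin": 12000},
]

def transpose(rows, variables, name_var):
    """Turn each listed variable into a case and each original case into a variable."""
    new_names = [f"K_{row[name_var]}" for row in rows]  # assumed naming scheme
    flipped = []
    for var in variables:
        new_case = {"CASE_LBL": var}  # remembers which variable this row came from
        for name, row in zip(new_names, rows):
            value = row[var]
            # String values cannot be carried over; they become missing (None).
            new_case[name] = value if isinstance(value, (int, float)) else None
        flipped.append(new_case)
    return flipped

flipped = transpose(cases, ["gender", "salary", "salbegin"], "id")
for row in flipped:
    print(row)
```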
Restructuring Data
The Restructure Data Wizard replaces the current file with a new, restructured file. The
wizard can:
- Restructure selected variables into cases.
- Restructure selected cases into variables.
- Transpose all data.
There are 7 steps to complete restructuring the data. You just need to supply the variables in
each step, following the instructions given by SPSS. Screenshots of selected steps are given
below.
Merging Data Files
This procedure helps you to merge data from two files in two different ways. You can:
- Merge the active dataset with another open dataset or SPSS-format data file
  containing the same variables but different cases.
  Take the employee data file Employee_MergeCase1 and merge Empolyee_MergeCase2
  with it. Verify the number of cases and variables once the data is merged.
- Merge the active dataset with another open dataset or SPSS-format data file
  containing the same cases but different variables.
  Take the employee data file Employee_MergeVariable1 and merge
  Empolyee_MergeVariable2 with it. Verify the number of cases and variables
  once the data is merged.
Note: Merge data file containing the same variables but different cases
Note: Merge data file containing the different variables but same cases
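The two kinds of merge can be sketched in Python as below. The small lists are hypothetical stand-ins for the exercise files, and matching on id when adding variables is an assumption of this sketch.

```python
# Hypothetical stand-ins for the two pairs of exercise files.
merge_case1 = [{"id": 1, "salary": 57000}, {"id": 2, "salary": 40200}]
merge_case2 = [{"id": 3, "salary": 21450}]

# Add cases: same variables, different cases, so the rows are appended.
added_cases = merge_case1 + merge_case2

merge_var1 = [{"id": 1, "salary": 57000}, {"id": 2, "salary": 40200}]
merge_var2 = [{"id": 1, "educ": 15}, {"id": 2, "educ": 16}]

# Add variables: same cases, different variables, so rows are matched on id.
lookup = {row["id"]: row for row in merge_var2}
added_vars = [{**row, **lookup.get(row["id"], {})} for row in merge_var1]

# Verify the number of cases and variables after each merge.
print(len(added_cases), len(added_cases[0]))
print(len(added_vars), len(added_vars[0]))
```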
Aggregate Data
The Aggregate Data procedure aggregates groups of cases in the dataset into single cases and
either creates a new, aggregated file or creates new variables in the active dataset that contain
the aggregated data. Cases are aggregated based on the value of one or more break (grouping)
variables. If you create a new, aggregated data file, it contains one case for each group defined
by the break variables. For example, if there is one break variable with two values, the new
data file will contain only two cases.
Exercise: From the employee data set, take gender as the break variable and education, job
category, salary, beginning salary and previous experience as the aggregated variables.
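A Python sketch of aggregation by a break variable is shown below with four made-up cases and two of the aggregated variables. Using the mean as the summary function and the names N_BREAK and *_mean are assumptions of this illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cases; gender is the break variable.
cases = [
    {"gender": "m", "salary": 57000, "salbegin": 27000},
    {"gender": "f", "salary": 21450, "salbegin": 12000},
    {"gender": "m", "salary": 40200, "salbegin": 18750},
    {"gender": "f", "salary": 21900, "salbegin": 13200},
]

def aggregate(rows, break_var, agg_vars):
    """Return one aggregated case per distinct value of the break variable."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[break_var]].append(row)
    result = []
    for value in sorted(groups):
        members = groups[value]
        out = {break_var: value, "N_BREAK": len(members)}  # group size
        for var in agg_vars:
            out[f"{var}_mean"] = mean(m[var] for m in members)
        result.append(out)
    return result

aggregated = aggregate(cases, "gender", ["salary", "salbegin"])
for row in aggregated:
    print(row)
```

With one break variable that takes two values, the result has exactly two cases, as described above.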
Note: Employee dataset before aggregate
Note: New variable after aggregate
Copy Dataset
Choosing this option makes SPSS create a complete duplicate of the dataset.
Split File
The Split File procedure splits the data file into separate groups for analysis based on the
values of one or more grouping variables. If you select multiple grouping variables, cases are
grouped by each variable within categories of the preceding variable. Depending on the
purpose, the file may be split in two ways:
- Compare groups: the file is split and the statistical procedures are computed for each of the
  groups defined. The results are presented together for comparison.
- Organize output by groups: all results from each statistical procedure are displayed
  separately for each split-file group.
[Figure callout: Variables aggregated by gender]
Select Cases
The Select Cases procedure provides several methods for selecting a subgroup of cases based on
criteria that include variables and complex arithmetical / logical expressions. You can also
select a random sample of cases. The criteria used to define a subgroup can include:
- Variable values and ranges
- Date and time ranges
- Case (row) numbers
- Arithmetic expressions
- Logical expressions
- Functions
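Three of these selection methods can be sketched in Python as follows. The cases, the salary range and the fixed random seed are all hypothetical and chosen only to make the illustration repeatable.

```python
import random

# Hypothetical cases.
cases = [
    {"id": 1, "gender": "m", "salary": 57000, "educ": 15},
    {"id": 2, "gender": "m", "salary": 40200, "educ": 16},
    {"id": 3, "gender": "f", "salary": 21450, "educ": 12},
    {"id": 4, "gender": "f", "salary": 21900, "educ": 8},
    {"id": 5, "gender": "m", "salary": 45000, "educ": 15},
]

# Logical expression combining a variable value with a range.
selected = [c for c in cases if c["gender"] == "m" and 40000 <= c["salary"] <= 50000]

# Case (row) numbers 2 to 4, counting from 1.
by_row = cases[1:4]

# Random sample of exactly 2 cases; the seed is fixed only for repeatability.
rng = random.Random(0)
sample = rng.sample(cases, 2)

print([c["id"] for c in selected], [c["id"] for c in by_row], len(sample))
```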
Weight Cases
The Weight Cases procedure gives cases different weights (equivalent to frequencies) for
statistical analysis. This option helps the researcher to work with sampling schemes other than
simple random sampling.
- The values of the weighting variable should indicate the number of observations represented
  by single cases in your data file.
- Cases with zero, negative, or missing values for the weighting variable are excluded from
  analysis.
- Fractional values are valid; they are used exactly where this is meaningful and most likely
  where cases are tabulated.
Once you apply a weight variable, it remains in effect until you select another weight variable
or turn off weighting. If you save a weighted data file, weighting information is saved with the
data file. You can turn off weighting at any time, even after the file has been saved in weighted
form.
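The effect of a weight variable on a simple statistic can be sketched in Python. The cases and the variable name wt are hypothetical; the sketch only mirrors the rules listed above (zero, negative and missing weights are excluded, fractional weights are used as they are).

```python
def weighted_mean(rows, value_var, weight_var):
    """Weighted mean of value_var, skipping cases with invalid weights."""
    total = 0.0
    weight_sum = 0.0
    for row in rows:
        w = row.get(weight_var)
        # Cases with zero, negative, or missing weights are excluded.
        if w is None or w <= 0:
            continue
        total += w * row[value_var]
        weight_sum += w
    return total / weight_sum if weight_sum > 0 else None

# Hypothetical cases; wt is the weighting variable.
cases = [
    {"salary": 20000, "wt": 2},
    {"salary": 40000, "wt": 1.5},   # fractional weights are valid
    {"salary": 30000, "wt": 0.5},
    {"salary": 90000, "wt": 0},     # excluded: zero weight
    {"salary": 90000, "wt": None},  # excluded: missing weight
]

print(weighted_mean(cases, "salary", "wt"))
```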
TRANSFORM MENU IN SPSS
This menu helps to change, or transform, the values associated with the variables. A number of
data transformation procedures are provided in the Transform Menu. The following are the
procedures available in the Transform Menu.
[Figure callouts: Weight on; No Weights]
Computing Variables
The Compute procedure opens a dialog box that helps you compute values for a variable based
on arithmetic computations defined over other variables.
- You can compute values for numeric or string (alphanumeric) variables.
- You can create new variables or replace the values of existing variables. For new variables,
  you can also specify the variable type and label.
- You can compute values selectively for subsets of data based on logical conditions.
- You can use a large variety of built-in functions, including arithmetic functions, statistical
  functions, distribution functions, and string functions.
Exercise: From the Employee data set, create a new variable called NewSalary by adding 2000
to the current salary.
Exercise: From the Employee data set, create a new variable called Salary_Difference
containing the difference between current salary and beginning salary.
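The two Compute exercises amount to one arithmetic expression per case. A Python sketch with two made-up cases, using the 2000 increment stated in the first exercise:

```python
# Hypothetical cases with current and beginning salary.
cases = [
    {"id": 1, "salary": 57000, "salbegin": 27000},
    {"id": 2, "salary": 40200, "salbegin": 18750},
]

for case in cases:
    # New variables are computed case by case from existing variables.
    case["NewSalary"] = case["salary"] + 2000
    case["Salary_Difference"] = case["salary"] - case["salbegin"]

for case in cases:
    print(case["id"], case["NewSalary"], case["Salary_Difference"])
```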
[Figure callouts: Salary Difference = Current salary - beginning salary; NewSalary = current Salary + 20000]
Count Values within Cases
This dialog box creates a variable that counts the occurrences of the same value(s) in a list of
variables for each case. For example, a survey might contain a list of magazines with yes/no
check boxes to indicate which magazines each respondent reads. You could count the number
of yes responses for each respondent to create a new variable that contains the total number of
magazines read.
Exercise: From the employee data set, create a new variable empcat_Minority that counts, for
each case, how many of the conditions job category = 3 and minority = 0 hold.
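The counting logic of this exercise can be sketched in Python with three made-up cases. The helper name count_values is hypothetical; the result is 2 when both conditions hold, 1 when only one holds, and 0 otherwise.

```python
def count_values(case, targets):
    """Count how many of the listed variables hold their target value in this case."""
    return sum(1 for var, value in targets.items() if case.get(var) == value)

# Hypothetical cases.
cases = [
    {"id": 1, "jobcat": 3, "minority": 0},
    {"id": 2, "jobcat": 1, "minority": 0},
    {"id": 3, "jobcat": 1, "minority": 1},
]

for case in cases:
    case["empcat_Minority"] = count_values(case, {"jobcat": 3, "minority": 0})

print([c["empcat_Minority"] for c in cases])
```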
Shift Values
This procedure creates a new variable from an existing variable, containing either lead or lag
values of the existing variable. We may also simply assign a new name to the existing variable.
Example: From the employee data set, take the variable Salary_Difference and create
Salary_Gap from it with a lag of 1.
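A lag or lead simply moves values down or up by a number of cases. The Python sketch below uses four made-up Salary_Difference values; the function name shift_values and the use of None for cases with no earlier or later value are assumptions of the illustration.

```python
def shift_values(values, periods):
    """Positive periods give a lag (value from an earlier case); negative give a lead."""
    n = len(values)
    shifted = []
    for i in range(n):
        j = i - periods
        # Cases with no earlier/later case to take the value from get None.
        shifted.append(values[j] if 0 <= j < n else None)
    return shifted

salary_difference = [30000, 21450, 9450, 8700]  # hypothetical values
salary_gap = shift_values(salary_difference, 1)    # lag of 1
salary_lead = shift_values(salary_difference, -1)  # lead of 1

print(salary_gap)
print(salary_lead)
```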
[Figure callout: Empcat_Minority = 2 where job category is 3 and minority is 0]
Recode into same variable
The Recode into Same Variables dialog box allows you to reassign the values of existing
variables or collapse ranges of existing values into new values, overwriting the original
variable. For example, you could collapse salaries into salary-range categories.
- You can recode numeric and string variables.
- The variable type does not change when a variable is recoded in place; to recode numeric
  values into strings or vice versa, use Recode into Different Variables.
- If you select multiple variables, they must all be the same type. You cannot recode numeric
  and string variables together.
Exercise: Take the variable Gender from the Employee data set. Recode male as 1 and
female as 0.
[Figure callout: Salary_Gap with one lag from Salary_Difference]
[Figure callouts: before recoding, Gender takes the value m for male and f for female; after
recoding, m is recoded as 1 and f as 0]
Recode into different variable
The Recode into Different Variables dialog box allows you to reassign the values of existing
variables or collapse ranges of existing values into new values for a new variable. For example,
you could collapse salaries into a new variable containing salary-range categories.
- You can recode numeric and string variables.
- You can recode numeric variables into string variables and vice versa.
- If you select multiple variables, they must all be the same type. You cannot recode numeric
  and string variables together.
This is less risky than recoding into the same variable because you retain your original
variable.
Exercise: Convert the values of the variable gender, which is in numeric form (1 and 0),
into Male and Female. Create a new variable called Gender_String to hold them.
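Recoding into a different variable can be sketched in Python as a lookup that writes to a new key and leaves the original untouched. The cases are made up, and mapping unmatched or missing values to an empty string is a choice made for this sketch only.

```python
# Hypothetical cases in which gender has already been recoded to 1 and 0.
cases = [{"id": 1, "gender": 1}, {"id": 2, "gender": 0}, {"id": 3, "gender": None}]

# Old value -> new value.
recode_map = {1: "Male", 0: "Female"}

for case in cases:
    # The new variable holds the recoded value; gender itself is not modified.
    case["Gender_String"] = recode_map.get(case["gender"], "")

print([(c["gender"], c["Gender_String"]) for c in cases])
```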
Automatic Recode
The Automatic Recode dialog box allows you to convert string and numeric values into
consecutive integers. When category codes are not sequential, the resulting empty cells reduce
performance and increase memory requirements for many procedures. Additionally, some
procedures cannot use string variables, and some require consecutive integer values for factor
levels.
- The new variable(s) created by Automatic Recode retain any defined variable and value labels
  from the old variable. For any values without a defined value label, the original value is used
  as the label for the recoded value. A table displays the old and new values and value labels.
- String values are recoded in alphabetical order, with uppercase letters preceding their
  lowercase counterparts.
- Missing values are recoded into missing values higher than any nonmissing values, with their
  order preserved. For example, if the original variable has 10 nonmissing values, the lowest
  missing value would be recoded to 11, and the value 11 would be a missing value for the new
  variable.
[Figure callout: Gender_string is a new variable with values in strings. The old variable gender
has been retained.]
Use the same recoding scheme for all variables. This option allows you to apply a single
autorecoding scheme to all the selected variables, yielding a consistent coding scheme for all
the new variables.
Exercise: Take the variable Beginning Salary and apply an automatic recode. Name the new
variable Recoded_BeginSalary. Count how many people drew the lowest beginning salary.
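The exercise can be sketched in Python as below. The salbegin values are made up, and treating every missing value as a single code one above the highest nonmissing code is a simplification of the rule described earlier.

```python
def auto_recode(values):
    """Map distinct nonmissing values to 1..k in ascending order; missing gets k + 1."""
    distinct = sorted({v for v in values if v is not None})
    codes = {v: i for i, v in enumerate(distinct, start=1)}
    missing_code = len(distinct) + 1  # simplification: one code for all missing values
    recoded = [codes[v] if v is not None else missing_code for v in values]
    return recoded, codes

salbegin = [27000, 18750, 12000, 13200, 12000, None]  # hypothetical values
recoded_beginsalary, codes = auto_recode(salbegin)

# Code 1 is the lowest beginning salary, so counting 1s answers the exercise.
lowest_count = recoded_beginsalary.count(1)
print(recoded_beginsalary, lowest_count)
```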
Visual Binning
Visual Binning is designed to assist the process of creating new variables by grouping the
continuous values of existing variables into a limited number of distinct categories. We can use
Visual Binning to:
- Create categorical variables from continuous scale variables. For example, we could use
  a scale income variable to create a new categorical variable that contains income ranges.
- Collapse a large number of ordinal categories into a smaller set of categories. For
  example, we could collapse a rating scale of nine down to three categories representing
  low, medium, and high.
In the first step, we select the numeric scale and/or ordinal variables for which we want
to create new categorical (binned) variables. Optionally, we can limit the number of
cases to scan. For data files with a large number of cases, limiting the number of cases
scanned can save time, but we should avoid this if possible because it will affect the
distribution of values used in subsequent calculations in Visual Binning.
[Figure callout: The beginning salary has been ranked in ascending order.]
Note: String variables and nominal numeric variables are not displayed in the source
variable list. Visual Binning requires numeric variables, measured on either a scale or
ordinal level, since it assumes that the data values represent some logical order that can
be used to group values in a meaningful fashion. We can change the defined
measurement level of a variable in Variable View in the Data Editor.
Example: Take Education Level, which is a continuous variable, and convert it into a
categorical variable with the help of binning.
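Binning by cut points can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cut points 14 and 17 are chosen to match the education groups used later in this document, each cut point is treated as an included upper end point, and the name bin_values and the data are hypothetical.

```python
from bisect import bisect_left

def bin_values(values, upper_cutpoints):
    """Return a 1-based bin number for each value.

    A value lands in bin k when it is <= the k-th cut point and greater than
    the previous one; values above the last cut point go into the final bin.
    """
    return [bisect_left(upper_cutpoints, v) + 1 for v in values]

# Hypothetical years of education, for illustration only.
education_years = [12, 14, 15, 16, 17, 18, 19, 21]
cutpoints = [14, 17]                      # assumed upper end points
labels = {1: "<= 14", 2: "15 - 17", 3: "18+"}

bins = bin_values(education_years, cutpoints)
for years, b in zip(education_years, bins):
    print(years, labels[b])
```

Keeping the cut points in a sorted list makes it easy to try a different grouping without touching the function.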
Optimal Binning
The Optimal Binning procedure discretizes one or more scale variables (referred to henceforth
as binning input variables) by distributing the values of each variable into bins. Bin formation is
optimal with respect to a categorical guide variable that "supervises" the binning process. Bins
can then be used instead of the original data values for further analysis. Reducing
the number of distinct values a variable takes has a number of uses, including:
Data requirements of other procedures. Discretized variables can be treated as
categorical for use in procedures that require categorical variables. For example, the
Crosstabs procedure requires that all variables be categorical.
Data privacy. Reporting binned values instead of actual values can help safeguard the
privacy of your data sources. The Optimal Binning procedure can guide the choice of
bins.
[Figure: Education level with categories.]
Speed performance. Some procedures are more efficient when working with a reduced
number of distinct values. For example, the speed of Multinomial Logistic Regression
can be improved using discretized variables.
Uncovering complete or quasi-complete separation of data.
Optimal versus Visual Binning. The Visual Binning dialog boxes offer several automatic
methods for creating bins without the use of a guide variable. These "unsupervised"
rules are useful for producing descriptive statistics, such as frequency tables, but
Optimal Binning is superior when the end goal is to produce a predictive model.
Output. The procedure produces tables of cut points for the bins and descriptive
statistics for each binning input variable. Additionally, we can save new variables to the
active dataset containing the binned values of the binning input variables and save the
binning rules as command syntax for use in discretizing new data.
Exercise: Bin Current Salary using the binned Educational Level as the guide variable.
Output: Current Salary binned with respect to the Educational Level group

Current Salary: number of cases by level of Educational Level (years) (Binned)

Bin     End point (Lower)   End point (Upper)   12 - 14   15 - 17   18+   Total
1       a                   $31,050                 220        68     2     290
2       $31,050             $43,000                  28        64     2      94
3       $43,000             $59,375                   0        33     8      41
4       $59,375             a                         1        10    38      49
Total                                               249       175    50     474
Each bin is computed as Lower
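The margins of the table above can be checked with a short Python sketch. The cell counts are copied from the table; the variable names are only illustrative.

```python
# Cell counts from the table: one row per salary bin, one column per
# education group (12 - 14, 15 - 17, 18+).
counts = {
    1: [220, 68, 2],
    2: [28, 64, 2],
    3: [0, 33, 8],
    4: [1, 10, 38],
}

row_totals = {bin_no: sum(row) for bin_no, row in counts.items()}
column_totals = [sum(col) for col in zip(*counts.values())]
grand_total = sum(row_totals.values())

print(row_totals)
print(column_totals, grand_total)
```

The row totals, column totals and grand total computed this way agree with the Total row and column of the table.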
New variable names and descriptive variable labels are automatically generated, based on the
original variable name and the selected measure(s). A summary table lists the original variables,
the new variables, and the variable labels.
Optionally, you can:
Rank cases in ascending or descending order
Organize rankings into subgroups by selecting one or more grouping variables for the by
list. Ranks are computed within each group. Groups are defined by the combination of
values of the grouping variables. For example, if you select gender and minority as
grouping variables, ranks are computed for each combination of gender and minority.
Exercise: Rank Beginning salary in descending order.
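Ranking in descending order can be sketched as follows. This is a minimal Python illustration, not SPSS's procedure; it assumes tied values share the mean of the ranks they occupy, and the name rank_descending and the data are hypothetical.

```python
def rank_descending(values):
    """Rank values so the largest gets rank 1; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of values tied with the one at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

# Hypothetical beginning salaries, for illustration only.
begin_salary = [12000, 9000, 15000, 9000]
ranks = rank_descending(begin_salary)
print(ranks)
```

Ranking within subgroups amounts to applying the same function separately to the cases of each group.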
Create Time Series
Several data transformations that are useful in time series analysis are provided in this
procedure:
Generate date variables to establish periodicity and to distinguish between historical,
validation, and forecasting periods.
Create new time series variables as functions of existing time series variables.
Replace system- and user-missing values with estimates based on one of several
methods.
A time series is obtained by measuring a variable (or set of variables) regularly over a
period of time. Time series data transformations assume a data file structure in which
each case (row) represents a set of observations at a different time, and the length of
time between cases is uniform.
[Figure: Ranks are assigned to Beginning salary in descending order.]
Exercise: Create a time series taking variable current salary using cumulative sum.
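A cumulative sum replaces each value with the running total of the series up to that case. A minimal Python sketch, using hypothetical salary figures:

```python
from itertools import accumulate

# Hypothetical current salaries in case order, for illustration only.
current_salary = [100, 250, 175, 300]

# Each element is the sum of all values up to and including that case.
cumulative = list(accumulate(current_salary))
print(cumulative)
```

The last element of the result equals the sum of the whole series.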
Replace Missing Values
Missing observations can be problematic in analysis, and some time series measures cannot be
computed if there are missing values in the series. Sometimes the value for a particular
observation is simply not known. In addition, missing data can result from any of the following:
Each degree of differencing reduces the length of a series by 1.
Each degree of seasonal differencing reduces the length of a series by one season.
[Figure: Cumulative sum of current salary.]
If you create new series that contain forecasts beyond the end of the existing series (by
clicking a Save button and making suitable choices), the original series and the
generated residual series will have missing data for the new observations.
Some transformations (for example, the log transformation) produce missing data for
certain values of the original series.
Missing data at the beginning or end of a series pose no particular problem; they simply
shorten the useful length of the series. Gaps in the middle of a series (embedded missing
data) can be a much more serious problem. The extent of the problem depends on the
analytical procedure you are using.
The Replace Missing Values dialog box allows you to create new time series variables from
existing ones, replacing missing values with estimates computed with one of several methods.
Default new variable names are the first six characters of the existing variable used to create it,
followed by an underscore and a sequential number. For example, for the variable price, the
new variable name would be price_1. The new variables retain any defined value labels from
the original variables.
Exercise: Replace missing values in the variable Salary Difference by using series mean.
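The series mean method fills every gap with the mean of the observed values. A minimal Python sketch, assuming missing values are represented by None; the function name and data are hypothetical.

```python
from statistics import mean

def replace_with_series_mean(series):
    """Return a copy of the series with None replaced by the observed mean."""
    observed = [x for x in series if x is not None]
    fill = mean(observed)
    return [fill if x is None else x for x in series]

# Hypothetical salary differences with two missing values.
salary_difference = [10.0, None, 14.0, 12.0, None]
filled = replace_with_series_mean(salary_difference)
print(filled)
```

Because the fill value is the mean of the observed cases, the mean of the completed series is unchanged, while its variance shrinks.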
Random Number Seed
The Random Number Seed dialog box allows you to select the random number generator and to set
the seed value so as to reproduce a sequence of random numbers. Two different random number
generators are available:
Version 12 Compatible. The random number generator used in version 12 and previous
releases. If you need to reproduce randomized results generated in previous releases
based on a specified seed value, use this random number generator.
Mersenne Twister. A newer random number generator that is more reliable for
simulation purposes. If reproducing randomized results from version 12 or earlier is not
an issue, use this random number generator.
The random number seed changes each time a random number is generated for use in
transformations (such as random distribution functions), random sampling, or case weighting.
To replicate a sequence of random numbers, set the initialization starting point value prior to
each analysis that uses the random numbers. The value must be a positive integer.
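The effect of a seed can be illustrated with Python's random module, which also uses a Mersenne Twister generator. This is an analogy rather than SPSS itself; the seed values and the helper name draw are arbitrary choices for the sketch.

```python
import random

def draw(seed, n=5):
    """Return n uniform random numbers from a generator started at `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

first = draw(2019)
second = draw(2019)   # same seed, so the same sequence is reproduced
other = draw(2020)    # different seed, different sequence

# Without resetting the seed, the generator state keeps advancing,
# so consecutive batches differ from each other.
rng = random.Random(2019)
batch_a = [rng.random() for _ in range(5)]
batch_b = [rng.random() for _ in range(5)]

print(first == second, first == other, batch_a == batch_b)
```

This is why the seed has to be set again before each analysis whose random numbers should be replicated.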
[Figure: Missing values have been substituted with series mean.]
ANALYSE MENU IN SPSS
The Analyze menu is the workhorse of SPSS. Nearly all procedures that generate output are
located on this menu. Only the most important items on this menu are discussed here.
1. Descriptive Statistics.
a. Frequencies Statistics
The Frequencies procedure provides statistics and graphical displays that are useful for
describing many types of variables. It reports a frequency table along
with selected statistics and graphs, viz., percentile values, central tendency, dispersion,
skewness and kurtosis, and basic charts.
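The core of a frequency table (count, percent and cumulative percent for each distinct value) together with a few summary statistics can be sketched in Python. The data and variable names are hypothetical, and the layout is not that of SPSS output.

```python
from collections import Counter
from statistics import mean, median, mode, stdev

# Hypothetical years of education for eight cases, for illustration only.
education = [12, 12, 15, 16, 12, 15, 19, 8]

n = len(education)
table = []          # rows of (value, frequency, percent, cumulative percent)
cumulative = 0.0
for value, count in sorted(Counter(education).items()):
    percent = 100 * count / n
    cumulative += percent
    table.append((value, count, percent, cumulative))

for row in table:
    print("%5d %5d %7.1f %7.1f" % row)

summary = {
    "mean": mean(education),
    "median": median(education),
    "mode": mode(education),
    "std_dev": stdev(education),   # sample standard deviation (n - 1)
}
print(summary)
```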
b. Descriptives Options
You can choose one or more of the following subgroup statistics for the variables within each
category of each grouping variable: sum, number of cases, mean, median, grouped median,
standard error of the mean, minimum, maximum, range, variable value of the first category of the
grouping variable, variable value of the last category of the grouping variable, standard
deviation, variance, kurtosis, standard error of kurtosis, skewness, standard error of skewness,
percentage of total sum, percentage of total N, percentage of sum in, percentage of N in,
geometric mean, and harmonic mean. You can change the order in which the subgroup statistics
appear. The order in which the statistics appear in the Cell Statistics list is the order in which
they are displayed in the output. Summary statistics are also displayed for each variable across
all categories.
c. Explore Statistics
Along with descriptive statistics, M-estimators (including Huber's M-estimator) can be displayed.
The Outliers option displays the five largest and five smallest values, with case labels.
d. Crosstabs
The Crosstabs procedure forms two-way and multiway tables and provides a variety of tests and
measures of association for two-way tables.
2. Compare Means
The following set of procedures helps to compare means across two or more
groups.
a. Independent-Samples T Test
The Independent-Samples T Test procedure compares means for two groups of cases. Ideally,
for this test, the subjects should be randomly assigned to two groups, so that any difference in
response is due to the treatment (or lack of treatment) and not to other factors. This is not the
case if you compare average income for males and females.
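The test statistic behind a two-group comparison can be sketched in Python. This illustration computes the pooled (equal-variance) form of the t statistic, which is an assumption of the sketch; the function name and the data are hypothetical, and no p-value is computed.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(a, b):
    """Return the pooled-variance t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    std_error = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / std_error, df

# Hypothetical responses for two independent groups.
group_1 = [1, 2, 3]
group_2 = [4, 5, 6]

t, df = independent_t(group_1, group_2)
print(round(t, 3), df)
```

A large absolute value of t relative to the t distribution with df degrees of freedom indicates that the two group means differ.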
b. Paired-Samples T Test
The Paired-Samples T Test procedure compares the means of two variables for a single group.
The procedure computes the differences between values of the two variables for each case and
tests whether the average differs from 0.
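The computation described above (take the difference for each case, then test whether the mean difference is 0) can be sketched in Python. The function name and the before/after values are hypothetical, and the sketch stops at the t statistic without computing a p-value.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """Return the paired t statistic for y - x and its degrees of freedom."""
    diffs = [b - a for a, b in zip(x, y)]
    n = len(diffs)
    std_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / std_error, n - 1

# Hypothetical measurements on the same four cases at two occasions.
before = [10, 12, 9, 11]
after = [12, 15, 10, 14]

t, df = paired_t(before, after)
print(round(t, 3), df)
```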
c. One-Way ANOVA
The One-Way ANOVA procedure produces a one-way analysis of variance for a quantitative
dependent variable by a single factor (independent) variable. Analysis of variance is used to test
the hypothesis that several means are equal. This technique is an extension of the two-sample t
test.
In addition to determining that differences exist among the means, you may want to know
which means differ. There are two types of tests for comparing means: a priori contrasts and
post hoc tests. Contrasts are tests set up before running the experiment, and post hoc tests are
run after the experiment has been conducted. You can also test for trends across categories.
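The F statistic of a one-way analysis of variance compares the variation between group means with the variation within groups. A minimal Python sketch with hypothetical data and names; contrasts and post hoc tests are not covered.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical responses for three groups.
f3, dfb3, dfw3 = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# With only two groups, F equals the square of the pooled two-sample t.
f2, dfb2, dfw2 = one_way_anova([[1, 2, 3], [4, 5, 6]])
print(f3, f2)
```

The two-group case shows in what sense the technique extends the two-sample t test.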
3. GLM Model
The GLM procedure provides regression analysis and analysis of variance for one dependent
variable by one or more factors and/or variables. The factor variables divide the population into
groups. Using this General Linear Model procedure, you can test null hypotheses about the
effects of other variables on the means of various groupings of a single dependent variable. You
can investigate interactions between factors as well as the effects of individual factors, some of
which may be random. In addition, the effects of covariates and covariate interactions with
factors can be included. For regression analysis, the independent (predictor) variables are
specified as covariates.
4. Bivariate Correlations Options
The Bivariate Correlations procedure computes Pearson's correlation coefficient, Spearman's
rho, and Kendall's tau-b with their significance levels. Correlations measure how variables or
rank orders are related. Before calculating a correlation coefficient, screen your data for outliers
(which can cause misleading results) and evidence of a linear relationship. Pearson's correlation
coefficient is a measure of linear association. Two variables can be perfectly related, but if the
relationship is not linear, Pearson's correlation coefficient is not an appropriate statistic for
measuring their association.
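The point about nonlinear relationships can be made concrete with a small Python sketch. In the hypothetical data below, y is an exact function of x in both cases, yet Pearson's coefficient is 1 for the linear relation and 0 for the quadratic one. The function name pearson_r is illustrative.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
linear = [2 * v + 1 for v in x]   # perfectly linear in x
curved = [v ** 2 for v in x]      # perfectly determined by x, but not linear

r_linear = pearson_r(x, linear)
r_curved = pearson_r(x, curved)
print(r_linear, r_curved)
```

A scatterplot of the data before computing the coefficient would reveal the curved pattern immediately.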
5. Partial Correlations
The Partial Correlations procedure computes partial correlation coefficients that describe the
linear relationship between two variables while controlling for the effects of one or more
additional variables. Correlations are measures of linear association. Two variables can be
perfectly related, but if the relationship is not linear, a correlation coefficient is not an
appropriate statistic for measuring their association.
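For a single control variable, the partial correlation can be computed from the three pairwise correlations with the standard first-order formula. The Python sketch below illustrates that formula only; the function name and the correlation values are hypothetical.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical pairwise correlations, for illustration only.
controlled = partial_r(r_xy=0.6, r_xz=0.5, r_yz=0.5)

# If z is uncorrelated with both x and y, controlling for it changes nothing.
unchanged = partial_r(r_xy=0.6, r_xz=0.0, r_yz=0.0)
print(round(controlled, 4), unchanged)
```

In the first case, part of the raw correlation of 0.6 is attributable to the shared dependence on z, so the partial correlation is smaller.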
6. Linear Regression Variable Selection Methods
Method selection allows you to specify how independent variables are entered into the analysis.
Using different methods, you can construct a variety of regression models from the same set of
variables.
7. Discriminant Analysis
Discriminant analysis builds a predictive model for group membership. The model is composed
of a discriminant function (or, for more than two groups, a set of discriminant functions) based
on linear combinations of the predictor variables that provide the best discrimination between
the groups. The functions are generated from a sample of cases for which group membership is
known; the functions can then be applied to new cases that have measurements for the predictor
variables but have unknown group membership.
8. Factor Analysis
Factor analysis attempts to identify underlying variables, or factors, that explain the pattern of
correlations within a set of observed variables. Factor analysis is often used in data reduction to
identify a small number of factors that explain most of the variance that is observed in a much
larger number of manifest variables. Factor analysis can also be used to generate hypotheses
regarding causal mechanisms or to screen variables for subsequent analysis (for example, to
identify collinearity prior to performing a linear regression analysis).