SPSS-All function.doc


  • 7/30/2019 SPSS-All function.doc


    INTRODUCTION TO SPSS

    SPSS stands for Statistical Package for the Social Sciences. As its name implies, it is a statistical package originally designed to handle data generated by social science studies. Today, however, it is widely used in other areas too: governments, businesses, law enforcement agencies, health care providers and academics use it, including for experimental and observational studies.

    So, what is SPSS? SPSS is a package that is simple to use. Its user interface is a spreadsheet, which, like other spreadsheets, has cells arranged in columns and rows. The columns represent the variables and the rows represent the cases. Cases and variables are the two main components in statistics. A case is the subject of analysis: it could be an animal in a scientific experiment or a person answering a questionnaire. Variables are the measurements obtained on the various characteristics of each case. The data so obtained can be analyzed with this package using descriptive, bivariate or multivariate statistical methods.
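The rows-and-columns layout can be sketched outside SPSS. The short Python example below (an illustration, not how SPSS stores data internally) keeps each case as a dictionary whose keys are variable names; the names come from the Employee Data example used later in this manual, and the values are hypothetical.

```python
# A data set as a list of cases (rows); each key is a variable (column).
# The values are hypothetical and only illustrate the layout.
cases = [
    {"id": 1, "gender": "m", "educ": 15, "salary": 50000},
    {"id": 2, "gender": "f", "educ": 16, "salary": 40000},
    {"id": 3, "gender": "f", "educ": 12, "salary": 30000},
]

# The variables are the column names shared by every case.
variables = list(cases[0].keys())

# Reading down one column gives every measurement of one variable.
salaries = [case["salary"] for case in cases]

print(len(cases), "cases x", len(variables), "variables")
print("salary column:", salaries)
```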

    SPSS differs from other spreadsheets in that the analysis is done through commands chosen from pull-down menus rather than within the spreadsheet itself. The output also does not appear in the spreadsheet, as is common in other spreadsheet programs, but in a separate window. SPSS output is comprehensive; that is to say, the package may give additional output to complement the output you requested. For example, alongside a histogram it may report the mean and the standard deviation. SPSS can take data from almost any type of file and use them to generate tabulated reports, charts, plots of distributions and trends, and descriptive statistics, and to conduct complex statistical analyses.

    DIFFERENT TYPES OF WINDOWS IN A TYPICAL STATISTICAL SOFTWARE PACKAGE

    There are a number of different types of windows in a typical statistical software package:

    1. Data Editor Window / Object Window / Variable Window

    This is the default window, with a blank data sheet ready for analysis. It displays the contents of the data file. You can create new data files or modify existing ones with the Data Editor. The Data Editor window opens automatically when you start an SPSS session.


    2. Viewer Window / Output Window / Log file / Results Window

    The Viewer window displays the statistical results, tables and charts from the analyses you performed (e.g., descriptive statistics, correlations, plots, charts). A Viewer window opens automatically when you run a procedure that generates output. In the Viewer window you can edit, move, delete and copy your results. Whenever a procedure is run, the output is directed to this separate window. You can also have multiple Output windows open to organize the various analyses you conduct. These results can later be saved and/or printed.

    3. Syntax Editor Window / Do File

    You can paste your dialog box choices into a Syntax Editor window in SPSS, where your selections appear in the form of command syntax. You can then edit the command syntax to use special features of SPSS that are not available through dialog boxes, and save these commands in a file for use in later SPSS sessions. Similarly, in other packages such as Stata you can enter several lines of commands in the Do-file Editor, and then run either the selected lines or the whole file from start to end. A do-file can also be created from the commands listed in the event history window.

    4. Chart Editor Window

    You can modify and save high-resolution charts and plots in chart windows. You can change the colors, select different font types or sizes, switch the horizontal and vertical axes, rotate 3-D scatter plots, and even change the chart type.

    5. Script Editor Window

    Scripting allows you to customize and automate many tasks in a typical statistical software package. Use the Script Editor to create and modify basic scripts.

    6. Command Window

    The Command Window is where you type your commands in the package's syntax. To send a command to the software, press the Return or Enter key.

    7. Event History Window

    In Stata, the Review window lists all the Stata commands that have been executed since Stata was opened, so it records your commands. A command can be repeated by selecting it in the Review window, which places it in the Command Window, and then pressing Enter. The Results window displays your output, and the Variables window lists the variables in the data set you are using. The Results window serves as the log window.

    A BRIEF NOTE ABOUT THE DATA SET

    Before discussing the types of files generated and the various menus in SPSS, it is useful to understand the data set that is used as an example throughout this manual.

    A Sample Research Problem: The Employee Income study

    The data set used with this manual is the SPSS file Employee Data.sav (Employee Income Data). It is based on a sample data file supplied with SPSS and consists of 474 employees drawn randomly from a larger employee population. For each employee it records several kinds of personal, social and occupational data: Gender, Date of Birth, Minority Classification, Educational Level (years), Employment Category, Current Salary, Beginning Salary, Months Since Hire and Previous Experience (months). The variables are described below:

    Variable Description:

    Variable Name    Variable Description/Label
    id               Employee Code
    gender           Gender (M/F)
    bdate            Date of Birth
    educ             Educational Level (years)
    jobcat           Employment Category (Clerical, Custodial, Manager)
    salary           Current Salary
    salbegin         Beginning Salary
    jobtime          Months Since Hire
    prevexp          Previous Experience (months)
    minority         Minority Classification (yes/no)

    DIFFERENT WINDOWS AND FILES / FILE EXTENSIONS:

    Each window corresponds to a separate type of file in statistical software.


    Data Files

    There are two basic types of files in SPSS. The first is the data file (.sav), which is where all the data for your analysis reside. When you open a data file, it appears in the Data Editor window. The format is similar to a spreadsheet, with a grid of rows and columns: the columns represent variables and the rows represent observations. You can place the mouse pointer on a column heading to see a longer description of the variable. To get complete information on any variable, go to the Utilities menu and click Variables:

    Data view > Utilities > Variables

    [Figure: Data view, with callouts "Variable Name" and "Observation (Case Number)".]

    [Figure: Variable view, with callouts "Variable characteristics (entire row)" and "Variables (entire column)".]

    Output files

    The second type of file is the output file (.spv). When a statistical procedure is run, output is produced and the Viewer window opens automatically to show it. The left pane contains an outline view of the output. The right pane contains the output itself, which includes tables, charts and text. Book icons appear in the outline view next to the various output objects: an open book indicates that the object is visible, and a closed book indicates that it is hidden.

    Analyze > Descriptive Statistics > Frequencies > Employment Category > OK
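The Frequencies output for a categorical variable is essentially a table of counts and percentages. As a rough Python sketch of that arithmetic (not SPSS's own code), assuming ten hypothetical job category codes with the labels from the variable list and no missing values, so that Percent and Valid Percent would coincide:

```python
from collections import Counter

# Hypothetical job category codes for ten employees
# (1 = Clerical, 2 = Custodial, 3 = Manager).
jobcat = [1, 1, 3, 2, 1, 1, 3, 1, 2, 1]
labels = {1: "Clerical", 2: "Custodial", 3: "Manager"}

def frequency_table(values):
    """Return (value, frequency, percent, cumulative percent) rows in value order."""
    counts = Counter(values)
    total = len(values)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        percent = 100.0 * counts[value] / total
        cumulative += percent
        rows.append((value, counts[value], percent, cumulative))
    return rows

table = frequency_table(jobcat)
for value, freq, pct, cum in table:
    print(f"{labels[value]:<10} {freq:>3} {pct:6.1f} {cum:6.1f}")
```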

    DIFFERENT MENUS IN A TYPICAL STATISTICAL SOFTWARE PACKAGE

    Many of the tasks you may want to perform with a typical statistical software package start with menu selections. In many statistical packages each window has its own menu bar with different options. The Data Editor window in SPSS, for example, has the following menus.

    Most menus in this window are similar to those found in other Windows applications, and some are specific to the tasks of the Data Editor. The Data Editor window has eleven main menus, described in more detail below:


    1. File

    The File menu has options to create a new SPSS system file; open an existing system file; read in spreadsheet or database files created by other software programs; read in an external ASCII or Excel data file from the Data Editor; create a command file or retrieve an existing SPSS command file into the Syntax Editor; open, save and print output files from the Viewer and Pivot Table Editor; and save chart templates and export charts in external formats from the Chart Editor.

    2. Edit

    The Edit menu has options to cut, copy and paste data values in the Data Editor; modify or copy text in the Viewer or Syntax Editor; and copy charts from the Chart Editor for pasting into other applications.

    3. View

    The View menu has options to turn toolbars, the status bar and grid lines on and off in all window types, and to control whether value labels or data values are displayed in the Data Editor.

    4. Data

    The Data menu has options to make global changes to SPSS data files, such as transposing variables and cases, creating subsets of cases for analysis, and merging files. These changes are only temporary and do not affect the saved file unless you save the file with the changes.

    5. Transform

    The Transform menu has options to change selected variables in the data file and to compute new variables based on the values of existing ones. These changes are temporary and do not affect the saved file unless you save the file with the changes.

    6. Analyze

    The Analyze menu is the most important menu; it contains all the statistical procedures available in SPSS. This menu is discussed in detail later.

    7. Graphs


    The Graphs menu has options to create bar charts, pie charts, histograms, scatterplots and other full-color, high-resolution graphs. Some statistical procedures also generate graphs. All graphs can be customized with the Chart Editor.

    8. Utilities

    The Utilities menu has options to display information about variables in the working data file, to control the list of variables displayed in all window types, and to change the designated Viewer and Syntax Editor windows.

    9. Add-ons

    The Add-ons menu has an option to view information about add-on modules.

    10. Window

    The Window menu has an option to switch between SPSS windows or to minimize all open

    SPSS windows.

    11. Help

    The Help menu opens a standard Microsoft Help window containing information on how to use the various features of SPSS. Context-sensitive help is also available from the dialog boxes.


    TOOLBAR IN SPSS

    Each SPSS window has a toolbar that provides quick and easy access to the common tasks of that window. Each toolbar icon has a ToolTip, a brief description of the tool that appears when you place the mouse pointer on the icon.

    The Main Toolbar

    File Buttons

    The first three buttons represent the three most common commands from the File menu: Open an Existing File, Save the Current File and Print the Current File, respectively.

    Dialog Recall

    This button gives you quick access to the previous 12 dialog boxes you were working with. This

    is particularly useful when you are building up an analysis and frequently going back and forth

    to the same box to change or modify options.

    Go to Chart

    This button opens the Chart Editor.

    Go to Data

    When you are in a window other than the Data Editor window this button will take you back to

    the Data Editor.

    Go to Case


    This button lets you go quickly to a specific case in the Data Editor. It is helpful when editing data, for example when abnormal data points (outliers) are found in your analysis and you want to check the source data.

    Variables

    This button creates a dialog box containing a list of all the variables defined in the data file.

    Selecting a variable from this list displays its properties, namely its name, label, type, missing-value information and value labels. This box can be kept open while you work with the data file, so that you can examine a variable's information as you examine the results of an analysis.

    Find

    This button helps you to carry out a simple search to find a value.

    Insert Case / Insert Variable

    It is not unusual to find yourself wanting to add a case or a variable in the middle of data entry.

    These two buttons will help you add a blank row or column in your data set.

    Split File/Weight Cases/Select Cases

    These buttons give quick access to three Data menu commands: Split File, Weight Cases and Select Cases, respectively. (The Data menu is discussed later in this section.)

    Value Labels

    This button displays the value labels in the Data Editor so that you don't have to remember what the numeric codes mean. Clicking the button again switches back to displaying the numbers.

    Use Sets


    In the Data Editor window you can group variables into sets so that the variables can be analysed together. This button lets you specify which of the sets you have defined you want to use.

    SPECIAL (STATISTICS) MENUS

    Every statistical software package has some special, distinctive menus. The special menus of SPSS are called the Statistics Menus. They are:

    1. Data Menu

    2. Transform Menu

    3. Analyze Menu

    DATA MENU IN SPSS

    The Data menu provides procedures to define variables, insert variables or cases, sort cases, merge files, split files, select cases and use a variable to weight cases. Some of the Data menu items, such as sorting, merging and transposing data sets, selecting subsets of cases and splitting files by variables, are explained below.

    a) Define variable properties:

    SPSS offers a wizard-type tool that helps you set all variable properties through an interactive interface. Although it can be used for all types of variables, it is especially useful for categorical variables, because it scans the actual data for all distinct values. From the menus select Data > Define Variable Properties. A first panel appears that lets you select the variables whose properties you want to set or change:

    Data > Define Variable Properties > move gender to Variables to Scan


    Select the variables. You can also limit the number of cases to scan (useful with very large files) and, since the tool is best used with categorical variables, limit the number of values (codes) that are displayed. When you click Continue, the next panel pops up.
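Conceptually, the scanning step counts the distinct values of each selected variable, optionally looking only at the first N cases and reporting at most M values. A minimal Python sketch of that idea follows; the function name scan_values and the sample gender codes are hypothetical, and this is not the wizard's actual implementation.

```python
from collections import Counter

def scan_values(values, max_cases=None, max_values=None):
    """Distinct values of one variable with their counts, in value order.

    max_cases: scan only the first N cases (None means all cases).
    max_values: return at most this many distinct values (None means all).
    """
    scanned = values if max_cases is None else values[:max_cases]
    distinct = sorted(Counter(scanned).items())
    return distinct if max_values is None else distinct[:max_values]

gender = ["m", "m", "f", "m", "f", "f", "m", "f"]   # hypothetical codes
print(scan_values(gender))                 # [('f', 4), ('m', 4)]
print(scan_values(gender, max_cases=4))    # [('f', 1), ('m', 3)]
```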


    b) Copy Data Properties

    The Copy Data Properties Wizard lets you use an external SPSS Statistics data file as a template for defining file and variable properties in the active dataset. You can also use variables in the active dataset as templates for other variables in the active dataset. With the wizard you can:

    copy selected file properties from an external data file or open dataset to the active dataset (file properties include documents, file labels, multiple response sets, variable sets and weighting);

    copy selected variable properties from an external data file or open dataset to matching variables in the active dataset (variable properties include value labels, missing values, level of measurement, variable labels, print and write formats, alignment, and column width in the Data Editor);

    copy selected variable properties from one variable in an external data file, open dataset or the active dataset to many variables in the active dataset;

    create new variables in the active dataset based on selected variables in an external data file or open dataset.

    When copying data properties, the following general rules apply:

    If you use an external data file as the source data file, it must be a data file in SPSS

    Statistics format.


    If you use the active dataset as the source data file, it must contain at least one variable.

    You cannot use a completely blank active dataset as the source data file.

    Undefined (empty) properties in the source dataset do not overwrite defined properties

    in the active dataset.

    Variable properties are copied from the source variable only to target variables of a matching type: string (alphanumeric) or numeric (including numeric, date and currency).

    From the menus in the Data Editor window choose: Data-Copy Data Properties. Select the data

    file with the file and/or variable properties that you want to copy. This can be a currently open

    dataset, an external SPSS Statistics data file, or the active dataset. Follow the step-by-step

    instructions in the Copy Data Properties Wizard.

    Data > Copy Data Properties > follow the instructions of the wizard

    Define Dates

    The Define Dates dialog box allows you to generate date variables that can be used to establish

    the periodicity of a time series and to label output from time series analysis.

    Name        Label
    YEAR_       YEAR, not periodic
    QUARTER_    QUARTER, period 4
    MONTH_      MONTH, period 12
    DATE_       DATE. FORMAT: "MMM YYYY"

    The following is a partial listing of the new variables:

    YEAR_   QUARTER_   MONTH_   DATE_
    1950    2          4        APR 1950
    1950    2          5        MAY 1950
    1950    2          6        JUN 1950
    1950    3          7        JUL 1950
    1950    3          8        AUG 1950
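The listing above follows a simple rule for monthly data: the quarter is derived from the month, and the date label combines the month abbreviation with the year. This Python sketch reproduces the five rows shown, assuming monthly cases starting in April 1950; it illustrates the arithmetic and is not SPSS's implementation.

```python
MONTH_ABBR = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
              "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

def define_dates(start_year, start_month, n_cases):
    """Generate (YEAR_, QUARTER_, MONTH_, DATE_) for consecutive monthly cases."""
    rows = []
    year, month = start_year, start_month
    for _ in range(n_cases):
        quarter = (month - 1) // 3 + 1                   # months 1-3 -> 1, 4-6 -> 2, ...
        date_label = f"{MONTH_ABBR[month - 1]} {year}"   # "MMM YYYY" format
        rows.append((year, quarter, month, date_label))
        month += 1
        if month > 12:                                   # roll over into the next year
            month, year = 1, year + 1
    return rows

rows = define_dates(1950, 4, 5)
for row in rows:
    print(*row)
```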

    Define Multiple Response Sets

    To define multiple response sets:

    1. From the menus choose Data > Define Multiple Response Sets.

    2. Select two or more variables. If your variables are coded as dichotomies, indicate which value you want to have counted.

    3. Enter a unique name for each multiple response set. The name can be up to 63 bytes long. A dollar sign is automatically added to the beginning of the set name.

    4. Enter a descriptive label for the set. (This is optional.)

    5. Click Add to add the multiple response set to the list of defined sets.

    Identify Duplicate Cases

    Duplicate cases may occur in your data for many reasons, including:

    Data entry errors in which the same case is accidentally entered more than once.

    Multiple cases share a common primary ID value but have different secondary ID

    values, such as family members who all live in the same house.

    Multiple cases represent the same case but with different values for variables other than

    those that identify the case, such as multiple purchases made by the same person or

    company for different products or at different times.

    Note: Use employee id and current salary to check for duplicate cases.
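The core of duplicate detection is grouping cases that share the same values on the matching variables. A minimal Python sketch, with hypothetical data, assuming id and salary as the matching variables and marking the last case of each group as the primary one (modelled loosely on the dialog's "last case in each group is primary" choice; the other options are not covered):

```python
def identify_duplicates(cases, keys):
    """Return a 1/0 primary-case indicator for each case.

    Cases with identical values on all `keys` form one matching group;
    the last case of each group is marked 1 (primary), the others 0.
    """
    last_index = {}
    for i, case in enumerate(cases):
        last_index[tuple(case[k] for k in keys)] = i   # later cases overwrite earlier ones
    return [1 if last_index[tuple(case[k] for k in keys)] == i else 0
            for i, case in enumerate(cases)]

cases = [
    {"id": 1, "salary": 30000},
    {"id": 2, "salary": 42000},
    {"id": 1, "salary": 30000},   # accidental re-entry of case 1
    {"id": 3, "salary": 27500},
]
primary = identify_duplicates(cases, keys=["id", "salary"])
print(primary)   # [0, 1, 1, 1]
```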


    Sort Cases

    Sort Cases procedure reorders the sequence of cases based on the values of one or more

    variables. You can optionally sort cases in ascending or descending order, or you can use

    combinations of ascending and descending order for different variables. For example, if you

    select gender as the first sorting variable and minority as the second sorting variable, cases will

    be sorted by minority classification within each gender category.

    Note: Sort cases by jobcat in descending order.
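Sorting on several variables with mixed directions can be sketched with Python's stable sort: sort by the secondary key first, then by the primary key. The cases below are hypothetical; the first sort mirrors the exercise (jobcat descending) and the second mirrors the gender/minority example in the text, with minority in descending order.

```python
cases = [   # hypothetical cases
    {"id": 1, "gender": "m", "minority": 0, "jobcat": 1},
    {"id": 2, "gender": "f", "minority": 1, "jobcat": 3},
    {"id": 3, "gender": "m", "minority": 1, "jobcat": 2},
    {"id": 4, "gender": "f", "minority": 0, "jobcat": 1},
]

# Exercise: jobcat in descending order (ties keep their original order).
by_jobcat_desc = sorted(cases, key=lambda c: c["jobcat"], reverse=True)
print([c["id"] for c in by_jobcat_desc])   # [2, 3, 1, 4]

# Gender ascending, then minority descending within each gender:
# sort by the secondary key first; the stable second sort preserves it.
mixed = sorted(cases, key=lambda c: c["minority"], reverse=True)
mixed = sorted(mixed, key=lambda c: c["gender"])
print([c["id"] for c in mixed])            # [2, 4, 3, 1]
```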


    Sort Variables

    You can sort the variables in the active dataset based on the values of any of the variable

    attributes (e.g., variable name, data type, measurement level), including custom variable

    attributes. Values can be sorted in ascending or descending order. You can save the original

    (pre-sorted) variable order in a custom variable attribute.

    Note: Sort variables by width in ascending order.
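Sorting variables rather than cases works the same way, except that the items being sorted are the variables' attribute records. A Python sketch with hypothetical widths, which also stores each variable's original position, much as SPSS can save the pre-sorted order in a custom attribute:

```python
# Hypothetical variable attributes, in file order.
variables = [
    {"name": "id", "width": 4},
    {"name": "gender", "width": 1},
    {"name": "bdate", "width": 10},
    {"name": "salary", "width": 8},
]

# Keep the original (pre-sorted) position before reordering.
for position, var in enumerate(variables, start=1):
    var["original_order"] = position

by_width = sorted(variables, key=lambda var: var["width"])   # ascending width
print([var["name"] for var in by_width])   # ['gender', 'id', 'salary', 'bdate']
```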


    Transpose

    Transpose procedure creates a new data file in which the rows and columns in the original data

    file are transposed so that cases (rows) become variables and variables (columns) become cases.

    Transpose automatically creates new variable names and displays a list of the new variable

    names. A new Untitled file is created with the transposed data set.

    Example (syntax): FLIP VARIABLES=bdate gender salary salbegin /NEWNAMES=id.

    [Figure: Variable view after transpose.]

    [Figure: Data view after transpose.]

    Please note: gender is a string variable, so its values are converted to system-missing (SYSMIS) in the transposed file.
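The effect of Transpose can be sketched in Python: each selected variable becomes a case, each original case becomes a variable named from its id value, and non-numeric values become missing. The CASE_LBL field follows the variable FLIP creates to hold the original variable names; the id_<value> naming scheme and the sample data are hypothetical.

```python
def transpose(cases, variables, newnames):
    """Flip the selected variables into cases.

    Each selected variable becomes one new case; each original case becomes
    one new variable named "id_<value>" from the newnames variable (a
    hypothetical naming scheme). CASE_LBL keeps the original variable name.
    Non-numeric values become None, standing in for system-missing.
    """
    flipped = []
    for var in variables:
        new_case = {"CASE_LBL": var}
        for case in cases:
            value = case[var]
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            new_case[f"id_{case[newnames]}"] = value if is_number else None
        flipped.append(new_case)
    return flipped

cases = [   # hypothetical values
    {"id": 1, "gender": "m", "salary": 30000, "salbegin": 15000},
    {"id": 2, "gender": "f", "salary": 25000, "salbegin": 12000},
]
flipped = transpose(cases, ["gender", "salary", "salbegin"], newnames="id")
for row in flipped:
    print(row)
```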

    Restructuring Data

    The Restructure Data Wizard can help you replace the current file with a new, restructured file. The wizard can:

    Restructure selected variables into cases.

    Restructure selected cases into variables.

    Transpose all data.

    Restructuring the data takes 7 steps; in each step you specify the variables by following the instructions given by SPSS. Screenshots of selected steps are given below.

    [Figures: screenshots of selected steps of the Restructure Data Wizard.]
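Restructuring selected variables into cases (a "wide to long" reshape) can be sketched in Python as follows. The function name variables_to_cases, the index and value field names, and the data are hypothetical; the wizard itself lets you choose such names in its steps.

```python
def variables_to_cases(cases, id_var, measure_vars, index_name="index", value_name="value"):
    """Restructure selected variables into cases (wide to long).

    Each original case yields one new case per variable in measure_vars;
    index_name records which variable the value came from.
    """
    long_rows = []
    for case in cases:
        for var in measure_vars:
            long_rows.append({id_var: case[id_var], index_name: var, value_name: case[var]})
    return long_rows

wide = [
    {"id": 1, "salbegin": 15000, "salary": 30000},   # hypothetical values
    {"id": 2, "salbegin": 12000, "salary": 25000},
]
long_data = variables_to_cases(wide, "id", ["salbegin", "salary"], value_name="pay")
for row in long_data:
    print(row)
```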

    Merging Data Files

    This procedure lets you merge data from two files in two different ways. You can:

    Merge the active dataset with another open dataset or SPSS-format data file containing the same variables but different cases.

    Exercise: Take the employee data file Employee_MergeCase1 and merge Empolyee_MergeCase2 with it. Verify the number of cases and variables once the data are merged.

    Merge the active dataset with another open dataset or SPSS-format data file containing the same cases but different variables.

    Exercise: Take the employee data file Employee_MergeVariable1 and merge Empolyee_MergeVariable2 with it. Verify the number of cases and variables once the data are merged.

    [Figure: merging data files containing the same variables but different cases.]

    [Figure: merging data files containing different variables but the same cases.]
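The two kinds of merge differ in what is matched. A minimal Python sketch, with hypothetical data: adding cases simply stacks the files, while adding variables joins cases that share the same key value (here id). SPSS's own options for unmatched or duplicate keys are not modelled.

```python
def add_cases(file1, file2):
    """Same variables, different cases: append the cases of file2 below file1."""
    return file1 + file2

def add_variables(file1, file2, key):
    """Same cases, different variables: join the two files case by case on `key`."""
    lookup = {case[key]: case for case in file2}
    merged = []
    for case in file1:
        combined = dict(case)
        combined.update(lookup.get(case[key], {}))  # unmatched cases keep only file1's variables
        merged.append(combined)
    return merged

part1 = [{"id": 1, "salary": 30000}, {"id": 2, "salary": 25000}]
part2 = [{"id": 3, "salary": 41000}]
stacked = add_cases(part1, part2)
print(len(stacked), "cases")          # verify the case count after merging

extra = [{"id": 1, "educ": 15}, {"id": 2, "educ": 12}]
joined = add_variables(part1, extra, key="id")
print(joined)
```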

    Aggregate Data

    The Aggregate Data procedure combines groups of cases in the dataset into single cases and either creates a new, aggregated file or creates new variables in the active dataset that contain the aggregated data. Cases are aggregated based on the values of one or more break (grouping) variables. If you create a new, aggregated data file, it contains one case for each group defined by the break variables. For example, if there is one break variable with two values, the new data file will contain only two cases.

    Exercise: From the employee database, take gender as the break variable and education, job category, salary, beginning salary and previous experience as the aggregated variables.

    [Figure: Employee dataset before aggregation.]

    [Figure: new variables after aggregation.]
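The aggregation in the exercise can be sketched in Python: cases are grouped by the break variable and each group becomes one output case holding a summary of each aggregated variable. The sketch uses the mean as the summary function and adds a case count; the output names such as salary_mean and n_cases are chosen for illustration, and the data are hypothetical.

```python
from collections import defaultdict

def aggregate_mean(cases, break_var, agg_vars):
    """One output case per value of break_var with the mean of each agg_var."""
    groups = defaultdict(list)
    for case in cases:
        groups[case[break_var]].append(case)
    result = []
    for value in sorted(groups):
        members = groups[value]
        row = {break_var: value, "n_cases": len(members)}
        for var in agg_vars:
            row[f"{var}_mean"] = sum(c[var] for c in members) / len(members)
        result.append(row)
    return result

cases = [   # hypothetical values
    {"gender": "m", "educ": 16, "salary": 40000},
    {"gender": "f", "educ": 12, "salary": 24000},
    {"gender": "m", "educ": 12, "salary": 30000},
    {"gender": "f", "educ": 14, "salary": 28000},
]
aggregated = aggregate_mean(cases, "gender", ["educ", "salary"])
for row in aggregated:
    print(row)
```

With one break variable taking two values, the result has exactly two cases, matching the rule stated in the text.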

    Copy Dataset

    Choosing this option makes SPSS create a complete duplicate of the dataset.

    Split File

    The Split File procedure splits the data file into separate groups for analysis based on the values of one or more grouping variables. If you select multiple grouping variables, cases are grouped by each variable within categories of the preceding variable. Depending on the purpose, the file can be split in two ways:

    Compare groups: the file is split and the statistical procedures are computed for each group defined; the results are presented together for comparison.

    Organize output by groups: all results from each statistical procedure are displayed separately for each group.

    [Figure callout (Aggregate Data): variables aggregated by gender.]

    Select Cases

    The Select Cases procedure provides several methods for selecting a subgroup of cases based on criteria that include variables and complex arithmetic or logical expressions. You can also select a random sample of cases. The criteria used to define a subgroup can include:

    Variable values and ranges

    Date and time ranges

    Case (row) numbers

    Arithmetic expressions

    Logical expressions

    Functions
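The selection criteria above can be expressed as simple filters. A Python sketch with hypothetical cases shows a logical expression, a range of case numbers and a random sample of roughly half the cases (the fixed seed only makes the example repeatable; it is not an SPSS setting).

```python
import random

cases = [   # hypothetical cases
    {"id": 1, "gender": "m", "salary": 30000},
    {"id": 2, "gender": "f", "salary": 22000},
    {"id": 3, "gender": "f", "salary": 45000},
    {"id": 4, "gender": "m", "salary": 32000},
    {"id": 5, "gender": "f", "salary": 18000},
]

# Logical expression: female employees earning more than 20000.
by_condition = [c for c in cases if c["gender"] == "f" and c["salary"] > 20000]

# Case (row) numbers 2 to 4; row numbers start at 1, list indexes at 0.
by_range = cases[1:4]

# Random sample of approximately 50% of cases.
rng = random.Random(42)
sample = [c for c in cases if rng.random() < 0.5]

print([c["id"] for c in by_condition])   # [2, 3]
print([c["id"] for c in by_range])       # [2, 3, 4]
print([c["id"] for c in sample])
```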


    Weight Cases

    The Weight Cases procedure gives cases different weights (typically frequency weights) for statistical analysis. This option helps the researcher work with sampling schemes other than simple random sampling.

    The values of the weighting variable should indicate the number of observations represented

    by single cases in your data file.

    Cases with zero, negative, or missing values for the weighting variable are excluded from

    analysis.

    Fractional values are valid; they are used exactly where this is meaningful and most likely

    where cases are tabulated.

    Once you apply a weight variable, it remains in effect until you select another weight variable

    or turn off weighting. If you save a weighted data file, weighting information is saved with the

    data file. You can turn off weighting at any time, even after the file has been saved in weighted

    form.
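The rules above amount to the following arithmetic for a weighted frequency count. A Python sketch, assuming a hypothetical weighting variable freq: each case contributes its weight instead of 1, this sketch adds fractional weights as they are, and cases with zero, negative or missing weights are skipped.

```python
from collections import Counter

def weighted_frequencies(cases, var, weight_var):
    """Weighted count for each value of `var`.

    Each case counts as many times as its weight; cases whose weight is
    zero, negative or missing (None) are excluded.
    """
    totals = Counter()
    for case in cases:
        weight = case.get(weight_var)
        if weight is None or weight <= 0:
            continue
        totals[case[var]] += weight
    return dict(totals)

cases = [
    {"jobcat": 1, "freq": 120},
    {"jobcat": 2, "freq": 30},
    {"jobcat": 3, "freq": 45.5},   # fractional weight: added as is
    {"jobcat": 3, "freq": 0},      # zero weight: excluded
    {"jobcat": 1, "freq": None},   # missing weight: excluded
]
result = weighted_frequencies(cases, "jobcat", "freq")
print(result)   # {1: 120, 2: 30, 3: 45.5}
```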


    TRANSFORM MENU IN SPSS

    This menu helps you change, or transform, the values associated with the variables. A number of data transformation procedures are provided in the Transform menu. The following are the procedures available in the Transform menu.

    [Figure callouts (Weight Cases): "Weight on" and "No Weights".]

    Computing Variables

    The Compute procedure opens a dialog box that helps you compute values for a target variable based on arithmetic computations on other variables.

    You can compute values for numeric or string (alphanumeric) variables.

    You can create new variables or replace the values of existing variables. For new variables,

    you can also specify the variable type and label.

    You can compute values selectively for subsets of data based on logical conditions.

    You can use a large variety of built-in functions, including arithmetic functions, statistical

    functions, distribution functions, and string functions.

    Exercise: From the Employee database, create a new variable called NewSalary by adding 2000 to the current salary.


    Exercise: From the Employee database, create a new variable called Salary_Difference containing the difference between current salary and beginning salary.
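Both exercises are plain per-case arithmetic. As a Python sketch with hypothetical cases, the loop below adds NewSalary and Salary_Difference to every case; the helpers treat a missing operand (None) as giving a missing result, analogous to system-missing in SPSS.

```python
def add(a, b):
    """Sum of two values; a missing (None) operand gives a missing result."""
    return None if a is None or b is None else a + b

def subtract(a, b):
    """Difference of two values; a missing (None) operand gives a missing result."""
    return None if a is None or b is None else a - b

cases = [   # hypothetical values
    {"id": 1, "salary": 30000, "salbegin": 15000},
    {"id": 2, "salary": 25000, "salbegin": 12000},
    {"id": 3, "salary": 28000, "salbegin": None},   # beginning salary missing
]

for case in cases:
    case["NewSalary"] = add(case["salary"], 2000)                           # first exercise
    case["Salary_Difference"] = subtract(case["salary"], case["salbegin"])   # second exercise

for case in cases:
    print(case["id"], case["NewSalary"], case["Salary_Difference"])
```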

    [Figure callouts (Compute Variable exercises): "NewSalary = current Salary + 20000" and "Salary Difference = Current salary - beginning salary".]

    Count Values within Cases

    This dialog box creates a variable that counts the occurrences of the same value(s) in a list of variables for each case. For example, a survey might contain a list of magazines with yes/no check boxes to indicate which magazines each respondent reads. You could count the number of yes responses for each respondent to create a new variable that contains the total number of magazines read.

    Exercise: From the employee data set, create a new variable empcat_Minority that counts, for each case, the occurrences of job category = 3 and minority = 0.
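Count Values within Cases applies the same list of values to every listed variable. A Python sketch with hypothetical cases, counting the values 3 and 0 across jobcat and minority, gives 2 exactly when job category is 3 and minority is 0; note that under this rule a jobcat of 0 or a minority of 3 would also be counted if such codes existed.

```python
def count_values(case, variables, values_to_count):
    """Number of listed variables whose value is one of values_to_count."""
    return sum(1 for var in variables if case[var] in values_to_count)

cases = [   # hypothetical cases
    {"id": 1, "jobcat": 3, "minority": 0},
    {"id": 2, "jobcat": 3, "minority": 1},
    {"id": 3, "jobcat": 1, "minority": 1},
]
for case in cases:
    case["empcat_Minority"] = count_values(case, ["jobcat", "minority"], {3, 0})

print([case["empcat_Minority"] for case in cases])   # [2, 1, 0]
```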


    Shift Values

    In this procedure a new variable is created from an existing variable, using either lead or lag values of the existing variable. We may also simply assign a new name to the existing variable.

    Example: From the employee data set, take the variable Salary_Difference and create Salary_Gap from it with a lag of 1.
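A lag of 1 takes each case's value from the previous case, so the first case has no value. The Python sketch below shows lag and lead on a hypothetical Salary_Difference column, with None standing in for system-missing.

```python
def lag(values, n=1):
    """Value from n cases earlier; cases without a predecessor get None."""
    return [values[i - n] if i - n >= 0 else None for i in range(len(values))]

def lead(values, n=1):
    """Value from n cases later; cases without a successor get None."""
    return [values[i + n] if i + n < len(values) else None for i in range(len(values))]

salary_difference = [15000, 13000, 9000, 12000]   # hypothetical values
salary_gap = lag(salary_difference, 1)
print(salary_gap)   # [None, 15000, 13000, 9000]
```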

    [Figure callout (Count Values exercise): Empcat_Minority = 2 where job category is 3 and minority is 0.]

    Recode into same variable

    The Recode into Same Variables dialog box allows you to reassign the values of existing variables or collapse ranges of existing values into new values, replacing the original values. For example, you could collapse salaries into salary-range categories.

    You can recode numeric and string variables.

    You can recode numeric variables into string variables and vice versa.

    If you select multiple variables, they must all be the same type. You cannot recode numeric

    and string variables together.

    Exercise: Take the variable gender from the Employee database. Recode male (m) as 1 and female (f) as 0.

    [Figure callout (Shift Values example): Salary_Gap, with one lag from Salary_Difference.]

    [Figure callouts (Recode into Same Variables exercise): before recoding, gender takes the value m for male and f for female; after recoding, m is recoded as 1 and f as 0.]
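Recoding replaces each old value with its new value through a lookup table. A Python sketch of the gender exercise with hypothetical data: values not listed in the table are left unchanged (an assumption modelled on Recode into Same Variables, where unspecified values are not changed). Whether SPSS stores the results as strings or numbers is not modelled here.

```python
recode_map = {"m": 1, "f": 0}   # old value -> new value

gender = ["m", "f", "f", "m", "x"]   # hypothetical; "x" is an unlisted code
# Overwrite the same variable; values missing from the map stay as they were.
gender = [recode_map.get(value, value) for value in gender]
print(gender)   # [1, 0, 0, 1, 'x']
```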

    Recode into different variable

    The Recode into Different Variables dialog box allows you to reassign the values of existing variables or collapse ranges of existing values into new values for a new variable. For example, you could collapse salaries into a new variable containing salary-range categories. Because the original variable is retained, this is less risky than recoding into the same variable.

    You can recode numeric and string variables.

    You can recode numeric variables into string variables and vice versa.

    If you select multiple variables, they must all be the same type. You cannot recode numeric and string variables together.

    Exercise: Convert the values of the variable gender, which is coded numerically as 1 and 0, into Male and Female. Create a new variable called Gender_String for the result.
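Recoding into a different variable uses the same kind of lookup but writes the result to a new variable, so the original stays intact. A Python sketch with hypothetical data; here unmatched values become None, a simplification rather than a description of SPSS's exact handling of unspecified values.

```python
labels = {1: "Male", 0: "Female"}   # old value -> new string value

gender = [1, 0, 0, 1]               # hypothetical, after the previous recode
# Build a new variable; the original gender list is not modified.
Gender_String = [labels.get(value) for value in gender]
print(gender)          # [1, 0, 0, 1]
print(Gender_String)   # ['Male', 'Female', 'Female', 'Male']
```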


    Automatic Recode

    The Automatic Recode dialog box allows converting string and numeric values into consecutive

    integers. When category codes are not sequential, the resulting empty cells reduce performance

    and increase memory requirements for many procedures. Additionally, some procedures cannot

    use string variables, and some require consecutive integer values for factor levels.

    The new variable(s) created by Automatic Recode retain any defined variable and value labels from the old variable. For any values without a defined value label, the original value is used as the label for the recoded value. A table displays the old and new values and value labels.

    String values are recoded in alphabetical order, with uppercase letters preceding their lowercase counterparts.

    Missing values are recoded into missing values higher than any nonmissing values, with their order preserved. For example, if the original variable has 10 nonmissing values, the lowest missing value would be recoded to 11, and the value 11 would be a missing value for the new variable.
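    Automatic Recode amounts to sorting the distinct values and numbering them 1, 2, 3, and so on. The Python sketch below follows the rules described above for illustration only; the function name, the salary figures and the user-missing code 99999 are hypothetical, and the string ordering implements the rule as stated in this section (SPSS's exact collation may differ). Because code 1 always marks the lowest nonmissing value, the exercise's count of employees with the lowest beginning salary is simply the number of 1s in the new variable.

```python
def automatic_recode(values, missing=frozenset()):
    """Map each distinct value to a consecutive integer starting at 1.

    Nonmissing values are numbered first, in ascending order; values listed in
    `missing` get the next integers, in their own ascending order. Strings sort
    alphabetically with an uppercase letter before its lowercase counterpart.
    Returns (recoded list, old -> new table). Unlike SPSS, this sketch does not
    flag the new codes for missing values as missing.
    """
    def order(v):
        # (lowercase, original) puts "A" before "a", and both before "B"
        return (v.lower(), v) if isinstance(v, str) else (v,)

    distinct = set(values)
    nonmissing = sorted((v for v in distinct if v not in missing), key=order)
    missing_found = sorted((v for v in distinct if v in missing), key=order)
    table = {old: new for new, old in enumerate(nonmissing + missing_found, start=1)}
    return [table[v] for v in values], table


begin_salary = [15750, 12000, 21450, 12000, 99999]  # hypothetical; 99999 = user-missing
recoded, table = automatic_recode(begin_salary, missing={99999})
print(recoded)           # [2, 1, 3, 1, 4]
print(recoded.count(1))  # 2 cases share the lowest beginning salary
print(automatic_recode(["b", "A", "a", "B"])[1])  # {'A': 1, 'a': 2, 'B': 3, 'b': 4}
```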

    Figure: Gender_String is a new variable containing string values; the original variable Gender has been retained.

    Use the same recoding scheme for all variables. This option allows you to apply a single autorecoding scheme to all the selected variables, yielding a consistent coding scheme for all the new variables.

    Exercise: Apply Automatic Recode to the variable Beginning Salary and name the new variable Recoded_BeginSalary. Count how many employees have the lowest beginning salary.

    Visual Binning

    Visual Binning is designed to help create new variables by grouping the continuous values of existing variables into a limited number of distinct categories. We can use Visual Binning to:

    Create categorical variables from continuous scale variables. For example, we could use a scale income variable to create a new categorical variable that contains income ranges.

    Collapse a large number of ordinal categories into a smaller set of categories. For example, we could collapse a nine-point rating scale into three categories representing low, medium, and high.

    In the first step, we select the numeric scale and/or ordinal variables for which we want to create new categorical (binned) variables. Optionally, we can limit the number of cases to scan. For data files with a large number of cases, limiting the number of cases scanned can save time, but we should avoid this if possible because it will affect the distribution of values used in subsequent calculations in Visual Binning.

    Figure: Beginning salary ranked in ascending order.

    Note: String variables and nominal numeric variables are not displayed in the source variable list. Visual Binning requires numeric variables, measured on either a scale or ordinal level, since it assumes that the data values represent some logical order that can be used to group values in a meaningful fashion. We can change the defined measurement level of a variable in Variable View in the Data Editor.

    Example: Take Educational Level, which is a continuous variable, and convert it into a categorical variable with the help of Visual Binning.
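    Once cutpoints are chosen, the binning step gives each value the number of the interval it falls into. The minimal Python sketch below does this with the standard bisect module; the cutpoints and education values are hypothetical, and the upper_included flag mimics the choice between including (<=) and excluding (<) the upper endpoint of each interval. The last line shows the nine-point rating scale collapsed into low, medium and high.

```python
from bisect import bisect_left, bisect_right


def visual_bin(values, cutpoints, upper_included=True):
    """Assign each value a bin number 1..len(cutpoints)+1.

    cutpoints must be sorted ascending. With upper_included=True a value equal
    to a cutpoint falls in the lower bin (<=); otherwise it starts the next bin (<).
    """
    find = bisect_left if upper_included else bisect_right
    return [find(cutpoints, v) + 1 for v in values]


education = [8, 12, 14, 15, 16, 17, 19, 21]  # hypothetical years of education
print(visual_bin(education, [12, 16]))                        # [1, 1, 2, 2, 2, 3, 3, 3]
print(visual_bin(education, [12, 16], upper_included=False))  # [1, 2, 2, 2, 3, 3, 3, 3]

labels = {1: "low", 2: "medium", 3: "high"}
print([labels[b] for b in visual_bin(range(1, 10), [3, 6])])  # 1-3 low, 4-6 medium, 7-9 high
```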

    Optimal Binning

    The Optimal Binning procedure discretizes one or more scale variables (referred to henceforth as binning input variables) by distributing the values of each variable into bins. Bin formation is optimal with respect to a categorical guide variable that "supervises" the binning process. Bins can then be used instead of the original data values for further analysis. Reducing the number of distinct values a variable takes has a number of uses, including:

    Data requirements of other procedures. Discretized variables can be treated as categorical for use in procedures that require categorical variables. For example, the Crosstabs procedure requires that all variables be categorical.

    Data privacy. Reporting binned values instead of actual values can help safeguard the privacy of your data sources. The Optimal Binning procedure can guide the choice of bins.

    Figure: Educational Level with binned categories.

    Speed performance. Some procedures are more efficient when working with a reduced number of distinct values. For example, the speed of Multinomial Logistic Regression can be improved using discretized variables.

    Uncovering complete or quasi-complete separation of data.

    Optimal versus Visual Binning. The Visual Binning dialog boxes offer several automatic methods for creating bins without the use of a guide variable. These "unsupervised" rules are useful for producing descriptive statistics, such as frequency tables, but Optimal Binning is superior when your end goal is to produce a predictive model.

    Output. The procedure produces tables of cut points for the bins and descriptive statistics for each binning input variable. Additionally, we can save new variables to the active dataset containing the binned values of the binning input variables, and save the binning rules as command syntax for use in discretizing new data.

    Exercise: Bin Current Salary using the binned Educational Level as the guide variable.

    Output: Current Salary binned with respect to the binned Educational Level

    Current Salary: bin end points and number of cases by level of Educational Level (years) (Binned)

    Bin     Lower      Upper      12 - 14   15 - 17   18+   Total
    1       a          $31,050    220       68        2     290
    2       $31,050    $43,000    28        64        2     94
    3       $43,000    $59,375    0         33        8     41
    4       $59,375    a          1         10        38    49
    Total                         249       175       50    474

    Each bin is computed as Lower <= Current Salary < Upper.
    a. Unbounded
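    What makes binning "optimal" is that cut points are chosen by how well they separate the categories of the guide variable. To our understanding, SPSS's Optimal Binning uses an entropy-based method (MDLP); the Python sketch below shows only the core step of such a method, finding the single cut that minimizes the guide variable's size-weighted entropy, and omits the recursive splitting and the stopping rule of the full procedure. The salary and education values are made up, and bins follow the lower <= value < upper convention.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of guide-variable categories."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_cut(values, guide):
    """Single cut point that minimizes the size-weighted entropy of the guide
    variable in the two resulting bins (value < cut and value >= cut).
    Returns None if all values are equal."""
    pairs = sorted(zip(values, guide))
    n = len(pairs)
    best_score, best_point = None, None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot separate equal values
        left = [g for _, g in pairs[:i]]
        right = [g for _, g in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if best_score is None or score < best_score:
            best_score, best_point = score, pairs[i][0]
    return best_point


salary = [20000, 24000, 27000, 35000, 41000, 52000, 60000, 75000]  # made-up data
education = ["12-14", "12-14", "12-14", "15-17", "15-17", "15-17", "18+", "18+"]
print(best_cut(salary, education))  # 35000: first bin is salary < 35000
```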

    Rank Cases

    New variable names and descriptive variable labels are automatically generated, based on the original variable name and the selected measure(s). A summary table lists the original variables, the new variables, and the variable labels.

    Optionally, you can:

    Rank cases in ascending or descending order.

    Organize rankings into subgroups by selecting one or more grouping variables for the By list. Ranks are computed within each group. Groups are defined by the combination of values of the grouping variables. For example, if you select gender and minority as grouping variables, ranks are computed for each combination of gender and minority.

    Exercise: Rank Beginning salary in descending order.
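    Ranking sorts the cases and numbers them; the only subtlety is ties. The minimal Python sketch below assumes that tied values receive the mean of the ranks they occupy (SPSS's default treatment of ties, as far as we know) and that rank 1 goes to the largest value when descending=True. The function name and salaries are hypothetical; the optional groups argument reproduces ranking within subgroups such as gender.

```python
from collections import defaultdict


def rank_cases(values, descending=False, groups=None):
    """Rank values (1 = first), giving tied values the mean of their ranks.

    If groups is given (one key per case, e.g. a (gender, minority) tuple),
    ranks are computed separately within each group.
    """
    if groups is None:
        groups = [None] * len(values)  # one group containing every case
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    ranks = [0.0] * len(values)
    for idx in members.values():
        order = sorted(idx, key=lambda i: values[i], reverse=descending)
        start = 0
        while start < len(order):
            end = start
            # extend the run while the next case has the same value (a tie)
            while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
                end += 1
            mean_rank = (start + end) / 2 + 1  # positions start..end hold ranks start+1..end+1
            for k in range(start, end + 1):
                ranks[order[k]] = mean_rank
            start = end + 1
    return ranks


begin_salary = [15750, 12000, 21450, 12000]  # hypothetical
print(rank_cases(begin_salary, descending=True))  # [2.0, 3.5, 1.0, 3.5]
print(rank_cases(begin_salary, descending=True, groups=["m", "f", "m", "f"]))  # [2.0, 1.5, 1.0, 1.5]
```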

    Create Time Series

    Several data transformations that are useful in time series analysis are provided in this procedure:

    Generate date variables to establish periodicity and to distinguish between historical, validation, and forecasting periods.

    Create new time series variables as functions of existing time series variables.

    Replace system- and user-missing values with estimates based on one of several methods.

    A time series is obtained by measuring a variable (or set of variables) regularly over a period of time. Time series data transformations assume a data file structure in which each case (row) represents a set of observations at a different time, and the length of time between cases is uniform.

    Figure: Ranks assigned to Beginning Salary in descending order.

    Exercise: Use Create Time Series to compute the cumulative sum of Current Salary.
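    A cumulative sum replaces each case with the running total of the series up to and including that case, so the result depends on the order of the cases; this is why the uniform time ordering described above matters. A short Python equivalent using itertools.accumulate, with hypothetical salary values:

```python
from itertools import accumulate

current_salary = [57000, 40200, 21450, 21900]  # hypothetical values, in case order
running_total = list(accumulate(current_salary))
print(running_total)  # [57000, 97200, 118650, 140550]
```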

    Replace Missing Values

    Missing observations can be problematic in analysis, and some time series measures cannot be computed if there are missing values in the series. Sometimes the value for a particular observation is simply not known. In addition, missing data can result from any of the following:

    Each degree of differencing reduces the length of a series by 1.

    Each degree of seasonal differencing reduces the length of a series by one season.

    Figure: Cumulative sum of Current Salary.

    If you create new series that contain forecasts beyond the end of the existing series (by clicking a Save button and making suitable choices), the original series and the generated residual series will have missing data for the new observations.

    Some transformations (for example, the log transformation, which is undefined for zero and negative values) produce missing data for certain values of the original series.

    Missing data at the beginning or end of a series pose no particular problem; they simply shorten the useful length of the series. Gaps in the middle of a series (embedded missing data) can be a much more serious problem. The extent of the problem depends on the analytical procedure you are using.

    The Replace Missing Values dialog box allows you to create new time series variables from existing ones, replacing missing values with estimates computed with one of several methods. Default new variable names are the first six characters of the existing variable used to create it, followed by an underscore and a sequential number. For example, for the variable price, the new variable name would be price_1. The new variables retain any defined value labels from the original variables.

    Exercise: Replace missing values in the variable Salary Difference with the series mean.
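    The series-mean method fills every gap with the average of the values that are present. The minimal Python sketch below shows that calculation together with the default naming rule described above; None marks a missing value, and the function names and data are hypothetical.

```python
from statistics import mean


def replace_with_series_mean(series):
    """Return a copy of the series with each missing value (None) replaced by
    the mean of the valid values. Raises StatisticsError if all are missing."""
    series_mean = mean([x for x in series if x is not None])
    return [series_mean if x is None else x for x in series]


def default_new_name(name, n=1):
    """Naming rule as stated above: first six characters, underscore, sequence number."""
    return f"{name[:6]}_{n}"


salary_difference = [3000, None, 5000, 4000, None]  # hypothetical
print(replace_with_series_mean(salary_difference))  # [3000, 4000, 5000, 4000, 4000]
print(default_new_name("price"))                    # price_1
```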

    Random Number Seed

    The Random Number Seed dialog box allows you to select the random number generator and to set the seed value so that a sequence of random numbers can be reproduced. Two different random number generators are available:

    Version 12 Compatible. The random number generator used in version 12 and previous releases. If you need to reproduce randomized results generated in previous releases based on a specified seed value, use this random number generator.

    Mersenne Twister. A newer random number generator that is more reliable for simulation purposes. If reproducing randomized results from version 12 or earlier is not an issue, use this random number generator.

    The random number seed changes each time a random number is generated for use in transformations (such as random distribution functions), random sampling, or case weighting. To replicate a sequence of random numbers, set the initialization starting point value prior to each analysis that uses the random numbers. The value must be a positive integer.
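    The effect of fixing the seed can be demonstrated in any language. Python's random.Random also happens to be a Mersenne Twister, but it will not reproduce SPSS's numbers; the sketch only shows that the same starting value gives the same sequence and a different one does not. The seed values are arbitrary.

```python
import random


def draw(seed, k=5):
    """Return k uniform random numbers from a generator initialized with `seed`."""
    rng = random.Random(seed)  # Python's generator is a Mersenne Twister (MT19937)
    return [rng.random() for _ in range(k)]


first = draw(12345)
second = draw(12345)  # same starting point...
third = draw(54321)   # ...versus a different one
print(first == second)  # True
print(first == third)   # False
```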

    Figure: Missing values substituted with the series mean.

    ANALYSE MENU IN SPSS

    The Analyze menu is the workhorse of SPSS. Nearly all procedures that generate output are located on this menu. Only the most important of these procedures are discussed here.

    1. Descriptive Statistics.

    a. Frequencies Statistics

    The Frequencies procedure provides statistics and graphical displays that are useful for describing many types of variables. It reports a frequency table along with selected statistics and graphs, namely percentile values, central tendency, dispersion, skewness and kurtosis, and basic charts.
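    At its core, a frequency table counts each distinct value and expresses the counts as percentages and cumulative percentages. The minimal Python sketch below builds such a table for hypothetical job categories; it ignores missing values, so it has no separate Valid Percent column.

```python
from collections import Counter


def frequency_table(values):
    """Rows of (value, frequency, percent, cumulative percent), in ascending value order."""
    counts = Counter(values)
    n = len(values)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        percent = 100 * counts[value] / n
        cumulative += percent
        rows.append((value, counts[value], round(percent, 1), round(cumulative, 1)))
    return rows


job_category = ["Clerical", "Manager", "Clerical", "Custodial", "Clerical"]  # hypothetical
for row in frequency_table(job_category):
    print(row)
# ('Clerical', 3, 60.0, 60.0)
# ('Custodial', 1, 20.0, 80.0)
# ('Manager', 1, 20.0, 100.0)
```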

    b. Descriptives Options

    One or more of the following subgroup statistics for the variables within each category of each grouping variable may be computed: sum, number of cases, mean, median, grouped median, standard error of the mean, minimum, maximum, range, variable value of the first category of the grouping variable, variable value of the last category of the grouping variable, standard deviation, variance, kurtosis, standard error of kurtosis, skewness, standard error of skewness, percentage of total sum, percentage of total N, percentage of sum in, percentage of N in, geometric mean, and harmonic mean. You can change the order in which the subgroup statistics appear. The order in which the statistics appear in the Cell Statistics list is the order in which they are displayed in the output. Summary statistics are also displayed for each variable across all categories.

    c. Explore Statistics

    Along with descriptive statistics, Explore can display M-estimators (robust alternatives to the sample mean and median, such as Huber's M-estimator) and, with the Outliers option, the five largest and five smallest values with case labels.

    d. Crosstabs

    The Crosstabs procedure forms two-way and multiway tables and provides a variety of tests and measures of association for two-way tables.
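    The Pearson chi-square test is one of the tests Crosstabs can report for a two-way table. The minimal Python sketch below builds the table of cell counts from two categorical variables and computes only the chi-square statistic (no p-value); the data are hypothetical.

```python
from collections import Counter


def pearson_chi_square(rows, cols):
    """Pearson chi-square for the two-way table of two categorical variables.

    Each cell contributes (observed - expected)**2 / expected, where
    expected = row total * column total / N.
    """
    n = len(rows)
    observed = Counter(zip(rows, cols))  # the two-way table of counts
    row_totals, col_totals = Counter(rows), Counter(cols)
    chi2 = 0.0
    for r in row_totals:
        for c in col_totals:
            expected = row_totals[r] * col_totals[c] / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    return chi2


gender = ["m", "m", "f", "f"]  # hypothetical cases
minority = ["yes", "no", "no", "no"]
print(Counter(zip(gender, minority)))                  # cell counts of the crosstab
print(round(pearson_chi_square(gender, minority), 3))  # 1.333
```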

    2. Compare Means

    The following procedures help to compare means between two or more groups.

    a. Independent-Samples T Test

    The Independent-Samples T Test procedure compares means for two groups of cases. Ideally, for this test, the subjects should be randomly assigned to two groups, so that any difference in response is due to the treatment (or lack of treatment) and not to other factors. This is not the case if you compare average income for males and females: a person is not randomly assigned to be male or female, so other factors may contribute to any difference.
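    The test statistic divides the difference in group means by its standard error. The minimal Python sketch below computes both the equal-variances (pooled) t with n1 + n2 - 2 degrees of freedom and the unequal-variances (Welch) t; p-values are omitted, and the group data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(a, b):
    """Two-sample t statistics for the difference in means of groups a and b.

    Returns (t_pooled, df_pooled, t_welch).
    """
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a), variance(b)  # sample variances (n - 1 denominator)
    diff = mean(a) - mean(b)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = diff / sqrt(pooled * (1 / n1 + 1 / n2))
    t_welch = diff / sqrt(v1 / n1 + v2 / n2)
    return t_pooled, n1 + n2 - 2, t_welch


group_a = [1, 2, 3, 4, 5]  # hypothetical scores
group_b = [3, 4, 5, 6, 7]
t_pooled, df, t_welch = independent_t(group_a, group_b)
print(round(t_pooled, 3), df, round(t_welch, 3))  # -2.0 8 -2.0
```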

    b. Paired-Samples T Test

    The Paired-Samples T Test procedure compares the means of two variables for a single group. The procedure computes the differences between values of the two variables for each case and tests whether the average differs from 0.
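    The calculation described above reduces to a one-sample t test on the per-case differences. The minimal Python sketch below computes that t statistic and its n - 1 degrees of freedom; the before and after measurements are hypothetical and p-values are omitted.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired-samples t: test whether the mean of (before - after) differs from 0.

    Returns (t, degrees of freedom).
    """
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


before = [10, 12, 9, 11]  # hypothetical measurements on the same four cases
after = [12, 13, 11, 14]
t, df = paired_t(before, after)
print(round(t, 3), df)  # -4.899 3
```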

    c. One-Way ANOVA

    The One-Way ANOVA procedure produces a one-way analysis of variance for a quantitative dependent variable by a single factor (independent) variable. Analysis of variance is used to test the hypothesis that several means are equal. This technique is an extension of the two-sample t test.

    In addition to determining that differences exist among the means, you may want to know which means differ. There are two types of tests for comparing means: a priori contrasts and post hoc tests. Contrasts are tests set up before running the experiment, and post hoc tests are run after the experiment has been conducted. You can also test for trends across categories.
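    The F statistic behind the test compares the spread between group means with the spread within groups. The minimal Python sketch below computes F and its two degrees of freedom for hypothetical groups; contrasts, post hoc tests and p-values are not covered.

```python
from statistics import mean


def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    means = [mean(g) for g in groups]
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, k - 1, n - k


print(one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # (27.0, 2, 6)
```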

    3. GLM Model

    The GLM procedure provides regression analysis and analysis of variance for one dependent variable by one or more factors and/or variables. The factor variables divide the population into groups. Using this General Linear Model procedure, you can test null hypotheses about the effects of other variables on the means of various groupings of a single dependent variable. You can investigate interactions between factors as well as the effects of individual factors, some of which may be random. In addition, the effects of covariates and covariate interactions with factors can be included. For regression analysis, the independent (predictor) variables are specified as covariates.

    4. Bivariate Correlations Options

    The Bivariate Correlations procedure computes Pearson's correlation coefficient, Spearman's rho, and Kendall's tau-b with their significance levels. Correlations measure how variables or rank orders are related. Before calculating a correlation coefficient, screen your data for outliers (which can cause misleading results) and evidence of a linear relationship. Pearson's correlation coefficient is a measure of linear association. Two variables can be perfectly related, but if the relationship is not linear, Pearson's correlation coefficient is not an appropriate statistic for measuring their association.
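    The warning about non-linear relationships is easy to see numerically. The minimal Python sketch below computes Pearson's r and Spearman's rho (Pearson's r on ranks, with tied values sharing their mean rank) for made-up data; Kendall's tau-b and significance levels are not shown. For y = x squared on positive x, rho is exactly 1 while r is not, and on symmetric x around 0, r is 0 even though y is fully determined by x.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r: covariance of x and y divided by the product of their spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks 1..n, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
print(round(pearson(x, [v ** 2 for v in x]), 3))  # 0.981: strong but not perfectly linear
print(spearman(x, [v ** 2 for v in x]))           # 1.0: perfectly monotonic
u = [-2, -1, 0, 1, 2]
print(pearson(u, [v ** 2 for v in u]))            # 0.0 although y is fully determined by u
```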

    5. Partial Correlations

    The Partial Correlations procedure computes partial correlation coefficients that describe the linear relationship between two variables while controlling for the effects of one or more additional variables. Correlations are measures of linear association. Two variables can be perfectly related, but if the relationship is not linear, a correlation coefficient is not an appropriate statistic for measuring their association.
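    With a single control variable z, the partial correlation can be written in terms of the three ordinary correlations: r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)). The minimal Python sketch below applies this textbook formula to made-up data constructed so that x and y are related only through z. The SPSS procedure also handles several control variables, which this sketch does not.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


z = [1, 2, 3, 4]  # hypothetical control variable
x = [2, 1, 2, 5]  # z plus noise unrelated to z
y = [0, 5, 0, 5]  # z plus different, unrelated noise
print(round(pearson(x, y), 3))  # 0.333: x and y look related...
print(partial_corr(x, y, z))    # about 0: ...only because both follow z
```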

    6. Linear Regression Variable Selection Methods

    Method selection allows you to specify how independent variables are entered into the analysis. Using different methods, you can construct a variety of regression models from the same set of variables.

    7. Discriminant Analysis

    Discriminant analysis builds a predictive model for group membership. The model is composed of a discriminant function (or, for more than two groups, a set of discriminant functions) based on linear combinations of the predictor variables that provide the best discrimination between the groups. The functions are generated from a sample of cases for which group membership is known; the functions can then be applied to new cases that have measurements for the predictor variables but have unknown group membership.

    8. Factor Analysis

    Factor analysis attempts to identify underlying variables, or factors, that explain the pattern of correlations within a set of observed variables. Factor analysis is often used in data reduction to identify a small number of factors that explain most of the variance that is observed in a much larger number of manifest variables. Factor analysis can also be used to generate hypotheses regarding causal mechanisms or to screen variables for subsequent analysis (for example, to identify collinearity prior to performing a linear regression analysis).