SPSS Course Undergrads



Introduction to SPSS

1. Introduction
1.1 SPSS Datafiles
2. Coding and Entering Data
2.1 Coding Data (Variable View)
2.2 Entering Data (Data View)
3. Descriptive Statistics
3.1 Frequency Distribution
3.2 Crosstabs
3.3 Correlation
3.4 Means
4. Statistical Tests
4.1 Chi Square Test
4.2 Parametric Tests
4.2.1 Independent Sample T-Test
4.2.2 One Way ANOVA
4.2.3 Paired Sample T-Test
4.3 Non-Parametric Tests
4.3.1 Mann-Whitney U Test
4.3.2 Kruskal-Wallis Test
4.3.3 Wilcoxon Test
5. Linear Regression
6. Re-Coding Data
6.1 Recode Command
6.2 Compute Command

1. INTRODUCTION

SPSS (Statistical Package for the Social Sciences) is a very powerful and widely used Data Analysis Application. It provides a user-friendly tool for analysing Questionnaire Data and other Data Sets. It can also be used in conjunction with most standard spreadsheet and word processing packages to produce professional reports on the findings of any Analysis.

1.1 SPSS DATA FILES

Once SPSS is activated, a new Datafile is opened automatically. This file contains two Windows, a Data View and a Variable View (see tabs on bottom left of the file).

(i) Data View

This Data View consists of a grid of Columns and Rows, similar to a spreadsheet. The Columns represent Variables (these can be questions on a questionnaire etc.) and the Rows represent Cases (individuals or companies who fill in the questionnaire or provide the data). The raw data is entered in this Window.

(ii) Variable View

The variable view also contains a grid of rows and columns. However, in this Window the rows represent the variables in the analysis and the columns are used to define the characteristics of each variable. We will discuss this in greater detail in the next section.

2. CODING AND ENTERING DATA

2.1 Coding Data (Variable View)

When processing data it is essential to develop a coding system. This is a system of numerical codes representing different values in the data set. In SPSS the user can enter this coding system using the Variable View Window. To illustrate this process we will look at a simple questionnaire and show how this should be coded in SPSS. This questionnaire is shown in Appendix A.

The first Question is

What is your gender?

and the codes for the responses are

1 = Male and 2 = Female

How can we represent this variable in SPSS?

Variable View

To code a variable in SPSS we first need to enter the Variable View by clicking the Variable View tab at the bottom of the screen.

Variable Name

In the first column enter the variable name. The variable name can contain only alphanumeric characters, i.e. letters and numbers (no spaces). For the first variable in our questionnaire we will simply use the name Gender. (You could alternatively use the name Q1.)

Variable Type

The Variable Type allows the user to enter the data in different forms: numerical, text, date, financial etc. In quantitative research projects it is recommended to use Numeric when appropriate, and this is the default in SPSS.

Width and Decimals

Width and Decimals will be entered automatically and are only used for presentation purposes. In this case set the decimals to zero.

Variable Label (Optional)

If required, enter a Label in the box provided. The Variable Label provides a more detailed description of the question than the Variable Name, as it can contain many more characters. In our example we might enter "What is your gender" as the Variable Label. If this variable is used in any analysis it is the Variable Label that appears in the Output. If no Variable Label is entered then the Variable Name is used in any Output.

Values

Value labels are used to define the coding system for the variable. To define a coding system for a variable, click on the Values cell and then click the little grey box in the cell. Type the Value and the corresponding Value Label in the boxes provided.

For Gender, type 1 as the Value and Male as the Value Label and then click the ADD button.

Then Type 2 as the Value and Female as the Value Label and click ADD.

When all Values and Value Labels are entered, click OK.
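Outside SPSS, the same idea of a coding system can be sketched in a few lines of Python. This is only an illustration of how numeric codes and value labels relate; the names `GENDER_LABELS` and `label_for` and the sample responses are made up for the example and are not part of SPSS.

```python
# Hypothetical coding scheme mirroring the Gender variable described above.
GENDER_LABELS = {1: "Male", 2: "Female"}


def label_for(code, labels):
    """Return the value label for a numeric code, or the code as text if it has no label."""
    return labels.get(code, str(code))


# Made-up raw data, as it would be typed into the Data View.
responses = [1, 2, 2, 1]

# The codes are what is stored; the labels are what appears in the output.
decoded = [label_for(code, GENDER_LABELS) for code in responses]
print(decoded)
```

Storing the code and looking the label up only when output is produced is the design choice that lets the data stay numeric while reports stay readable.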

Missing Values

Invariably when data is collected using self-completed questionnaire forms there will be several questions left unanswered. This can happen for a variety of reasons, including simple carelessness, a lack of willingness on the part of the respondent to supply the desired information or a lack of competence to answer the question. Also a question may not be relevant to some of the respondents. In SPSS we must enter a code to represent missing data.

In this example we will enter the value 0 to represent a non-response. To enter a missing value code, click the grey box in the Missing cell and select Discrete missing values. Enter 0 as the missing value and click OK.
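To see why the missing value code matters, here is a minimal Python sketch of what excluding a declared missing code does before any statistic is calculated. The data and the names `MISSING_CODE` and `valid_values` are assumptions made for the illustration.

```python
from statistics import mean

MISSING_CODE = 0  # the non-response code chosen in the text


def valid_values(values, missing_code=MISSING_CODE):
    """Drop every response equal to the declared missing-value code."""
    return [v for v in values if v != missing_code]


# Made-up answers to a 1-5 question; 0 marks a question left unanswered.
relax = [2, 0, 3, 1, 0, 2]

valid = valid_values(relax)
n_missing = len(relax) - len(valid)

# The mean is taken over valid responses only, so the zeros do not drag it down.
print(len(valid), n_missing, mean(valid))
```

If the code 0 were not declared as missing, it would be treated as a real answer and would distort the mean.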

Columns, Align

These are purely for data presentation purposes.

Measure

You must also define the measurement type in SPSS. This is important as this definition will affect the analysis you can do with the variable.

- Select the appropriate scale from the drop down list.
- There are three choices: Nominal, Ordinal or Scale* (Ratio or Interval).
- Gender is a Nominal Variable.

*Ratio and Interval Measures are coded as Scale Measures in SPSS

To practice what we have just learned and to illustrate two other points, let's now code some of the other variables on the questionnaire.

Q3. What is your age? ____

Question 3 is relatively easy to code as it does not need value labels, because the responses to the question are numerical.

Q2. What level of education have you achieved?

Post-Graduate 1   Degree 2   Secondary 3

Question 2 can be coded using the same steps as Question 1. Choose an appropriate variable name yourself.

Q4. Do you find time to relax?

Always 1 Usually 2 Sometimes 3 Rarely 4 Never 5

Q5. Do you ever feel stressed at work?

Always 1 Usually 2 Sometimes 3 Rarely 4 Never 5

Questions 4 and 5 can also be coded like question 1. Notice that the responses to both questions are the same. To save time you can copy the labels from Question 4 and paste them in Question 5.

Q6. Does your job involve the following tasks

Evaluating Staff ___

Managing Staff ___

Training Staff ___

Question 6 is a multiple response question. The respondent is being asked three questions, namely

Does your job involve training staff?

Does your job involve managing staff?

Does your job involve evaluating staff?

The respondent can tick all three responses or none or any combination of them, so you will need three variables to represent this question.

Call these variables Task1, Task2 and Task3, use the three questions as the Variable labels and use the Values 1 = Yes and 2 = No for each variable.

Q7. What is your job title? _________________

This is called an open question as there are no response categories given. To code this you need to wait until the questionnaires are returned and then code the different responses that are given. This is called post-coding. Let's assume here that after examining all the questionnaires there were four different responses: Training Officer, HR Manager, Managing Director and HR Director. We can now code these options as 1, 2, 3 and 4 and code the variable as we did Question 1.
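Post-coding can be pictured as building the code list from the answers themselves. The sketch below is a hypothetical Python illustration, not an SPSS feature: the function `post_code` and the sample answers are invented, and the codes are assigned in order of first appearance rather than in any order the course prescribes.

```python
def post_code(responses):
    """Assign codes 1, 2, 3, ... to distinct text answers in order of first appearance."""
    codebook = {}
    codes = []
    for answer in responses:
        # Normalise case and stray spaces so "hr manager " and "HR Manager" share a code.
        key = answer.strip().lower()
        if key not in codebook:
            codebook[key] = len(codebook) + 1
        codes.append(codebook[key])
    return codebook, codes


# Made-up job titles as they might be written on returned questionnaires.
answers = ["HR Manager", "Managing Director", "hr manager ", "HR Director"]
codebook, codes = post_code(answers)
print(codebook)
print(codes)
```

The resulting codebook plays the role of the Values dialog: each code becomes a Value and each job title its Value Label.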

Your final variable view should look like the graphic on the next page. Don't worry if you used different Variable names or labels.

2.2 Entering Data

To enter data into an SPSS file we must first return to the Data View. Data is simply typed into the appropriate cell, with each cell representing one individual's answer to a given question.

The next screen contains the data for a number of respondents to this questionnaire.

From this data we can see that the first respondent is Male (gender = 1), is aged 55, has a postgraduate degree (Educ = 1), usually finds time to relax, is never stressed at work etc. Now, enter some data into your own file. You can make up your responses (4 or 5 respondents is sufficient).

Finally, save the data file using the File Menu. The File Menu in SPSS is similar to other standard Windows packages. SPSS datafiles are given the extension .sav; for instance, we could call this file stdata.sav

3. DESCRIPTIVE STATISTICS

Descriptive Statistics are a group of techniques for describing the breakdown of a variable or variables. They include Frequency Tables (simply tables and charts of counts), Crosstabs, Mean Scores, and other Statistics.

To illustrate the use of these techniques we will use the data in the file staffdata.sav. This file can be found on Moodle.

3.1 Frequencies

The Frequencies procedure provides basic statistics, tables and graphical displays that are useful for describing many types of variables. For a first look at your data, the Frequencies procedure is a good place to start.

To use the Frequencies procedure:

- Select Descriptive Statistics from the Analyse Menu
- Select Frequencies in the sub-group
- The Frequencies Window contains a list of all the variables in the file, a Variables box, a Display Frequency Tables box and three pushbuttons, namely Statistics, Charts and Format.
- Select the variables you are interested in from the Variables list on the left and place them in the Variables Box on the right (using the arrow in the center). For this exercise, choose "Do you find time to relax" and "Do you ever feel stressed".
- For a Frequency Distribution Table tick the Display Frequency Tables box. (This will be ticked by default.)
- A selection of basic statistical values like the mean, standard deviation and others are available by clicking the Statistics button.
- To select the appropriate statistic simply click the adjoining circle (see below). For this example select the mean and the standard deviation. When all required statistics have been chosen click Continue.
- If you require a chart click the Charts Button and choose the type of chart required. There are three basic charts available: Pie Charts, Bar Charts and Histograms. In this case choose a Bar Chart and then click Continue.
- Finally, once all Statistics and Charts have been selected click OK.

Below we can see some of the SPSS Output from the Frequencies command for the variable "Do you find time to relax?" (Interval variable).

Frequency Tables

Do you find time to relax?

                    Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Always             15      11.0            11.0                 11.0
        Usually            62      45.6            45.6                 56.6
        Sometimes          51      37.5            37.5                 94.1
        Rarely              8       5.9             5.9                100.0
        Total             136     100.0           100.0

Statistics

Do you find time to relax?

N   Valid              136
    Missing              0
Mean                2.3824
Std. Deviation      .76069
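The figures in this output can be reproduced by hand from the four counts. The following Python sketch does so, assuming the questionnaire's codes (1 = Always, 2 = Usually, 3 = Sometimes, 4 = Rarely) and the sample standard deviation (divisor N - 1); it is an illustration of the arithmetic, not of how SPSS is implemented.

```python
from math import sqrt

# Counts copied from the frequency table above, keyed by the questionnaire code.
counts = {1: 15, 2: 62, 3: 51, 4: 8}

n = sum(counts.values())

# Percent column: each count as a share of all valid cases.
percent = {code: 100 * c / n for code, c in counts.items()}

# Cumulative Percent column: running total of the Percent column.
cumulative = {}
running = 0.0
for code in sorted(counts):
    running += percent[code]
    cumulative[code] = running

# Mean and sample standard deviation of the coded responses.
mean = sum(code * c for code, c in counts.items()) / n
variance = sum(c * (code - mean) ** 2 for code, c in counts.items()) / (n - 1)
std_dev = sqrt(variance)

print(n, round(mean, 4), round(std_dev, 5))
```

Note that the mean of 2.38 is a mean of codes: it places the average response between Usually (2) and Sometimes (3).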

Charts

Exercise: Generate the Descriptive Statistics for Gender and Age. Remember to use the appropriate descriptive statistics.

The output for both of these variables is shown on the next two pages.

GENDER (Nominal Variable)

Frequency Distribution

What is your gender

                 Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Male           104      76.5            76.5                 76.5
        Female          32      23.5            23.5                100.0
        Total          136     100.0           100.0

Chart

Note: No quantitative statistics are calculated as this is a Nominal Variable.

AGE (Ratio Variable)

Statistics

What is your age

N   Valid                 136
    Missing                 0
Mean                  45.2868
Std. Error of Mean     .79636
Median                45.0000
Std. Deviation        9.28711
Variance               86.250
Minimum                 24.00
Maximum                 65.00
Percentiles   25      39.0000
              50      45.0000
              75      51.7500
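Several of these statistics are tied together by simple formulas: the variance is the square of the standard deviation, and the standard error of the mean is the standard deviation divided by the square root of N. A short Python check, using only the N and Std. Deviation values from the output above:

```python
from math import sqrt

# Values copied from the Age output above.
n = 136
std_dev = 9.28711

# Variance is the standard deviation squared.
variance = std_dev ** 2

# Standard error of the mean: how much the sample mean would vary between samples.
std_error = std_dev / sqrt(n)

print(round(variance, 3), round(std_error, 5))
```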

Chart (Histogram)

Note:
- A Frequency Distribution for this variable would not be a useful descriptive statistic as there are too many values. The data could be recoded into a smaller number of categories and then graphed. (SPSS has a recode function.)
- A Histogram is chosen as Age has a large number of categories.

3.2 Crosstabs

A Crosstab is an extension of a frequency table to 2 or more variables. For example, you may wish to look at a breakdown of level of education by gender. To create a Crosstab Table:

- Select Descriptive Statistics from the Analyse Menu
- Select Crosstabs in the sub-group
- From the Variable List select the variable(s) to represent the columns and place in the Columns box. For this exercise select Gender.
- Do the same thing for the rows box. Select Education Level for the rows.
- To add percentages click the Cells Button
- Choose the percentages required (e.g. column percentages) by clicking the appropriate circle.
- Press Continue
- Finally click OK

The SPSS Output for this Crosstab can be seen on the next page

What level of education have you achieved? * What is your gender Crosstabulation

                                                         What is your gender
What level of education have you achieved?               Male      Female    Total
Post Graduate Studies   Count                              42          14       56
                        % within What is your gender    40.4%       43.8%    41.2%
Primary Degree          Count                              56          16       72
                        % within What is your gender    53.8%       50.0%    52.9%
Leaving Cert            Count                               6           2        8
                        % within What is your gender     5.8%        6.3%     5.9%
Total                   Count                             104          32      136
                        % within What is your gender   100.0%      100.0%   100.0%
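The column percentages in this table, and the Chi-Square statistic that Section 4.1 requests from the same Crosstabs dialog, both come straight from the cell counts. The Python sketch below recomputes them as an illustration of the arithmetic; it uses the standard Pearson chi-square formula (observed minus expected, squared, over expected) and does not attempt to reproduce the significance value SPSS reports.

```python
# Observed counts copied from the crosstab above: (Male, Female) for each education level.
observed = {
    "Post Graduate Studies": (42, 14),
    "Primary Degree": (56, 16),
    "Leaving Cert": (6, 2),
}

# Column totals (one per gender) and the overall number of cases.
col_totals = [sum(row[j] for row in observed.values()) for j in range(2)]
grand_total = sum(col_totals)

# "% within What is your gender": each count as a share of its column total.
column_percent = {
    level: tuple(100 * row[j] / col_totals[j] for j in range(2))
    for level, row in observed.items()
}

# Pearson chi-square: compare each count with the count expected if
# education level and gender were unrelated.
chi_square = 0.0
for row in observed.values():
    row_total = sum(row)
    for j, count in enumerate(row):
        expected = row_total * col_totals[j] / grand_total
        chi_square += (count - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (len(col_totals) - 1)

print(column_percent)
print(chi_square, degrees_of_freedom)
```

A chi-square value this close to zero reflects how similar the male and female columns of percentages are.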

3.3 Correlation

- Select Correlate from the Analyse Menu
- Select Bivariate
- From the Variable List select the variables you wish to correlate. Choose Age and Number of years in the organisation
- Select the appropriate correlation statistic:
    Pearson: Ratio/Interval Data
    Spearman or Kendall's: Ordinal Data
  Note: Age and Tenure are ratio variables, so Pearson is the correct coefficient
- Press OK

Repeat the correlation a second time. This time use the variables "Do you find time to relax" and "Do you get stressed at work". Because these are both interval measurements we can also use the Pearson Correlation coefficient

The Output for both correlations is shown below

Correlation: Age vs Tenure

Correlations

                                                 How many years are you     What is your age
                                                 in the organisation?
How many years are you in the organisation?
    Pearson Correlation                          1                          .884(**)
    Sig. (2-tailed)                                                         .000
    N                                            135                        135
What is your age
    Pearson Correlation                          .884(**)                   1
    Sig. (2-tailed)                              .000
    N                                            135                        136

** Correlation is significant at the 0.01 level (2-tailed).

Correlation between age and number of years in organisation is .884. This indicates a positive relationship between the two variables, which is exactly what we would expect.

Note the two asterisks beside the correlation. This indicates that the correlation is statistically significant.

Correlation: Do you find time to relax vs Do you get stressed at work

Correlations

                                          Do you find time     Do you ever feel
                                          to relax?            stressed at work?
Do you find time to relax?
    Pearson Correlation                   1                    -.294**
    Sig. (2-tailed)                                            .001
    N                                     136                  134
Do you ever feel stressed at work?
    Pearson Correlation                   -.294**              1
    Sig. (2-tailed)                       .001
    N                                     134                  134

**. Correlation is significant at the 0.01 level (2-tailed).

This time the correlation between the variables is -.294, suggesting a significant negative relationship. In other words, people who relax more don't feel as stressed at work.
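For readers who want to see what the Pearson coefficient measures, here is a minimal Python sketch of the textbook formula. The data are made up, and the choice to skip any pair with a missing value (marked `None`) is an assumption of this example; it mirrors why N can be smaller for a pair of variables than for a single variable in the tables above.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length lists, skipping pairs with a missing (None) value."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    # Co-variation of x and y, scaled by the variation in each variable.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / sqrt(sxx * syy)


# Made-up ages and years in the organisation; one respondent left tenure blank.
age = [25, 32, 41, 48, 56, 60]
tenure = [2, 5, None, 20, 25, 31]

r = pearson_r(age, tenure)
print(round(r, 3))
```

A value near +1 means the two variables rise together, a value near -1 means one falls as the other rises, and a value near 0 means no linear relationship.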

Correlation: Grade vs Do you find time to relax

Grade can be treated as an ordinal variable as it has a rank order. Time to relax is an interval variable. When correlating an interval variable with an ordinal variable we must use Spearman's rho or Kendall's Tau.

Correlations

Spearman's rho                            What is your grade?     Do you find time to relax?
What is your grade?
    Correlation Coefficient               1.000                   .166
    Sig. (2-tailed)                       .                       .056
    N                                     133                     133
Do you find time to relax?
    Correlation Coefficient               .166                    1.000
    Sig. (2-tailed)                       .056                    .
    N                                     133                     136

The correlation between Grade and "Do you find time to relax?" is .166, but it is not statistically significant (Sig. = .056), so there is no evidence of a relationship between the two variables. Put simply, this sample does not show that your grade affects the time you find to relax.

3.4 Means

This function allows the user to calculate the mean score of a dependent variable across several groups. For example, in our case study we could look at the average age of staff members with different levels of education.

- Select Compare Means from the Analyse Menu
- Select Means
- Select the dependent variable(s) from the Variable List. Choose age for this example (this variable must be a ratio/interval variable).
- Select the independent variable(s) from the Variable List. Level of Education is the independent variable for this example
- Click OK

Here is the associated Output from SPSS

Report

What is your age?

What level of education have you achieved?      Mean       N     Std. Deviation
Post Graduate Studies                        44.1250      56            9.61639
Primary Degree                               45.9167      72            9.38196
Leaving Cert                                 47.7500       8            4.71320
Total                                        45.2868     136            9.28711
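One reason to report N alongside each mean is that the Total row is a weighted average of the group means, not a simple average of them. The following Python sketch checks this using only the Mean and N columns of the Report above.

```python
# (mean age, N) for each education level, copied from the Report table above.
groups = {
    "Post Graduate Studies": (44.1250, 56),
    "Primary Degree": (45.9167, 72),
    "Leaving Cert": (47.7500, 8),
}

total_n = sum(n for _, n in groups.values())

# Each group mean counts in proportion to the number of people in the group.
overall_mean = sum(m * n for m, n in groups.values()) / total_n

# For contrast: the unweighted average of the three means, which ignores group size.
simple_average = sum(m for m, _ in groups.values()) / len(groups)

print(total_n, round(overall_mean, 4), round(simple_average, 4))
```

The weighted figure matches the Total row, while the unweighted one is pulled upwards by the small Leaving Cert group.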

Note that the Means command also presents the sample size and standard deviation. It is good practice to present these three statistics together.

3.5 Presentation and Tables Command

The output from the basic descriptive commands in SPSS is not always in a form suitable for reports or presentations. There are a couple of approaches that an SPSS user can use to improve the quality of the output.

Copying SPSS Output to MS Word

Tables and Charts generated in SPSS can be copied directly to MS Word using Copy or Copy Objects (if several tables or charts are being copied). Once in MS Word the tables and charts can be manipulated using the standard formatting tools.

Tables Command

In the Analyse Menu there is a Tables sub-menu which allows users to create their own self-defined tables. This is particularly useful if a large number of tables are required. Please experiment with the Tables command to learn how it functions. Please note that variables should always be properly labelled if tables are required for reports or presentations.

4. STATISTICAL TESTS

4.1 Chi-Square Test (See Crosstabs)

The Chi-Square Test is used when the test variable in a statistical test is a Nominal variable. For example, we can test if there is a significant difference between male and female education levels.

- Select Descriptive Statistics from the Analyse Menu
- Select Crosstabs in the sub-group
- From the Variable List select the variable(s) to represent the columns and place in the Columns box (choose Gender).
- Do the same thing for the rows box (choose Education Level)
- Add Column percentages using the Cells button as before
- Click the Statistics button and choose Chi-Square Test
- Click Continue
- Click OK

4.2 Parametric Tests

4.2.1 Independent Sample T-Test

Testing the difference between 2 Independent Groups with 1 Ratio/Interval Test Variable. For example, is there a significant difference in tenure between males and females?

- In Analyse Menu Click Compare Means
- Select Independent Sample T-test
- Select Test Variable(s) from Variable List (How many years in the organisation)
- Select Grouping Variable from Variable List (Gender)
- Click Define Groups button
- Insert the values representing the two independent groups. Here, the grouping variable is Gender (1 = Male, 2 = Female), so we enter 1 as Group 1 and 2 as Group 2.
- Click Continue
- Click OK

Example from Case Study

The following SPSS screen shows the Independent Sample T-Test from the Case Study. "How many years are you in the organisation" is the Test Variable and Gender is the grouping variable. The values for the two groups are 1 and 2, representing males and females.
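As an illustration of what the test calculates, the sketch below computes the t statistic for two independent groups in Python, assuming equal variances in the two groups (the pooled-variance form). The tenure values are made up, and the function name `independent_t` is hypothetical; the significance value that SPSS also reports is not computed here.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group1, group2):
    """Return (t, df) for an independent samples t-test with equal variances assumed."""
    n1, n2 = len(group1), len(group2)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    # Standard error of the difference between the two group means.
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / std_error, n1 + n2 - 2


# Made-up data: years in the organisation and the gender code (1 = Male, 2 = Female).
tenure = [1, 2, 3, 4, 5, 2, 4, 6, 8, 10]
gender = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]

# Split the test variable by the grouping variable, as Define Groups does.
males = [t for t, g in zip(tenure, gender) if g == 1]
females = [t for t, g in zip(tenure, gender) if g == 2]

t_stat, df = independent_t(males, females)
print(round(t_stat, 4), df)
```

The sign of t only reflects which group was entered first; its size relative to the degrees of freedom is what determines significance.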

4.2.2 One-Way Anova

Testing the difference between 3 or more Independent Groups with 1 Ratio/Interval test variable. For example, is there a significant difference in age across educational levels?

- In Analyse Menu Click Compare Means
- Select One-Way ANOVA
- Select the Test Variable(s) (Age) and place it in the Dependent Variable box.
- Select the Grouping Variable (Education Level) and place it in the Factor box.
- Select Options and click Descriptive Statistics.
- Click Continue
- Click OK

Note: if you require Post Hoc Tests, click the Post Hoc button and choose a test (the Tukey Test is the most commonly used).
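The F statistic behind a One-Way ANOVA compares the variation between the group means with the variation inside the groups. The Python sketch below works through that calculation on made-up scores; the function name `one_way_anova_f` is hypothetical, and no significance value or Post Hoc test is computed.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numeric scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)

    # Between-groups sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

    # Within-groups sum of squares: how far each score is from its own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)

    # F is the ratio of the two mean squares.
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up scores for three groups (e.g. three education levels).
scores = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_between, df_within = one_way_anova_f(scores)
print(f_stat, df_between, df_within)
```

A large F means the group means differ by more than the spread within the groups would lead us to expect.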

4.2.3 Paired Sample T-Test

Comparing 2 Ratio/Interval Test Variables measured on the same cases.

The paired sample T-Test compares the mean difference between our paired samples with the difference we would expect to find between the population means, taking account of the standard error of the difference.

1. In Analyse Menu Click Compare Means

2. Select Paired Sample T-test

3. Click on the first variable, and then click the second variable (e.g. "Do you find time to relax" and "Do you feel stressed at work"). They should both appear in the current selection as variables 1 and 2 respectively

4. Click the black arrow to enter these two variables into the Paired Variables Window.

5. Repeat the process for any other pair of variables to be tested

6. Click OK
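The calculation behind the paired test is short enough to sketch in Python: take the difference between the two variables for each case, then divide the mean difference by its standard error. The scores below are made up and the function name `paired_t` is hypothetical; SPSS additionally reports a significance value, which is not computed here.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, df) for a paired samples t-test on two equal-length lists."""
    # One difference per case: the test works on these, not on the raw scores.
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    std_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / std_error, n - 1


# Made-up scores for four respondents on two questions.
relax = [3, 4, 5, 6]
stress = [1, 3, 2, 4]

t_stat, df = paired_t(relax, stress)
print(round(t_stat, 4), df)
```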

4.3 Non-Parametric Tests

These tests are used for ordinal test variables, or for ratio and interval test variables that are not normally distributed.

4.3.1 Mann-Whitney U Test

Comparing the difference between 2 Independent groups with 1 Ordinal test variable.

- In Analyse Menu Click Non-Parametric Tests
- Select Legacy Dialogs
- Select 2 Independent Groups
- Select Test Variable(s) from Variable List (Time to relax)
- Select Grouping Variable from Variable List (Gender)
- Click Define Groups button.
- Insert the values representing the two independent groups.
- Click Continue
- Click OK

4.3.2 Kruskal-Wallis Test

Comparing the difference between K>2 Independent groups with 1 Ordinal test variable.

- In Analyse Menu Click Non-Parametric Tests
- Select Legacy Dialogs
- Select K Independent Groups
- Select Test Variable(s) from Variable List
- Select Grouping Variable from Variable List
- Click Define Range button.
- Insert the range of values representing the independent groups.
- Click Continue
- Click OK

4.3.3 Wilcoxon Test

This test is used for comparing 2 Ordinal test variables.

- In Analyse Menu Click Non-Parametric Tests
- Select Legacy Dialogs
- Select 2 Related Samples
- Click on the first variable, and then click the second variable. They should both appear in the current selection as variables 1 and 2 respectively
- Click the black arrow to enter these two variables into the Paired Variables Window.
- Repeat the process for any other pair of variables to be tested
- Click OK

5. Linear Regression

Regression is used to examine the relationship between one dependent and one independent variable. After performing an analysis, the regression statistics can be used to predict the dependent variable when the independent variable is known. Regression goes beyond correlation by adding prediction capabilities. Linear regression is the most commonly used type and is used when the dependent variable is ratio/interval.

- In the Analyse Menu Select Regression
- Select Linear
- Enter the variable that is being predicted as the Dependent at the top of the dialogue box
- Enter the variable(s) being used to predict as the Independent(s).
- Click OK
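To make the idea of prediction concrete, here is a minimal Python sketch of simple linear regression with one independent variable, using the usual least-squares formulas for the slope and intercept. The data and the names `fit_line` and `predict` are made up for the illustration.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of x and y divided by the variation in x.
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, new_x):
    """Predict the dependent variable for a known value of the independent variable."""
    return intercept + slope * new_x


# Made-up data: x is the independent variable, y the dependent variable.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]

intercept, slope = fit_line(x, y)
print(intercept, slope, predict(intercept, slope, 5))
```

In the SPSS output the same two numbers appear in the Coefficients table, as the constant and the coefficient of the independent variable.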

6. Manipulating Data

For analysis purposes it is often desirable to re-code a variable or to calculate a new variable using a combination of old variables.

6.1 Recode Command

A variable like Age with values (1 = 20-34, 2 = 35-44, 3 = 45-54, 4 = 55+) can be re-coded into a new variable with two values, say (1 = Up to 44, 2 = 45 and over), using the following procedure:

1. In the Transform Menu select Recode

2. Select Into Different Variables

3. Select the Input variable (variable to be re-coded) from the Variable list and click the black arrow.

4. Enter the Name of the New (Output) Variable.

5. Enter a Label for the New Variable (Optional)

6. Click the Old and New Values button.

7. Enter the old value of the variable on the left side of the screen and the new value on the right and click the Add button.

For instance, in the example above a respondent who was in the range 35-44 will be coded as 2 in the old variable, but will be coded as 1 in the new variable.

8. Repeat this for all values of the old variable.

9. Click Continue

10. Click OK

The new variable should appear at the end of the variable list.
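The Old and New Values step amounts to a lookup table from old codes to new codes. The Python sketch below shows the age example as such a table; the names `RECODE_MAP` and `recode` and the sample data are made up, and returning `None` for unmapped values is a choice of this example.

```python
# Old code -> new code, following the age example above.
RECODE_MAP = {1: 1, 2: 1, 3: 2, 4: 2}
NEW_LABELS = {1: "Up to 44", 2: "45 and over"}


def recode(values, mapping):
    """Return a new list of recoded values; codes with no mapping become None."""
    return [mapping.get(v) for v in values]


# Made-up values of the old four-category age variable.
age_group = [1, 2, 3, 4, 2]

# Recoding into a different variable leaves the original list untouched.
age_group2 = recode(age_group, RECODE_MAP)
print(age_group2)
print([NEW_LABELS[v] for v in age_group2])
```

Keeping the old variable alongside the new one makes it possible to check the recode or to regroup the data differently later.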

6.2 Compute Command

The Compute command allows the user to form a new variable using a mathematical combination of several variables in the data set. In many surveys a researcher will ask several questions on a particular topic. To gauge the respondent's overall view on the topic, the researcher can combine these questions and calculate the average score. To do this, or to calculate any other combination of variables, use the following procedure.

1. In the Transform Menu select Compute

2. Enter a name for the New Variable in the Target Variable box.

3. In the Numerical Expression Window enter the formula for calculating the new variable, using the variable list and the calculator.

e.g.
Variable1 + Variable2 + Variable3
Variable1 * Variable2 (* symbol represents multiplication)
Variable1 / Variable2 (/ symbol represents division)

4. Click OK

The new variable will appear at the end of the variable list.
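The average-score example mentioned at the start of this section can be sketched in Python as follows. Each case is a row of the Data View; the variable names follow the placeholders used above, while the data and the function name `compute_mean_score` are made up.

```python
def compute_mean_score(case, variables):
    """Average the listed variables for one case (one row of the Data View)."""
    return sum(case[name] for name in variables) / len(variables)


# Made-up cases, each with three related questions scored 1-5.
cases = [
    {"Variable1": 2, "Variable2": 3, "Variable3": 4},
    {"Variable1": 5, "Variable2": 1, "Variable3": 3},
]

# The new variable is calculated case by case and added as an extra column.
for case in cases:
    case["Average"] = compute_mean_score(case, ["Variable1", "Variable2", "Variable3"])

print([case["Average"] for case in cases])
```

The equivalent formula in the Compute dialog would be (Variable1 + Variable2 + Variable3) / 3.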

Note: There are a large number of mathematical functions available for these calculations.

APPENDIX A

Q1. What is your gender?

Male 1 Female 2

Q2. What level of education have you achieved?

Post-Graduate 1   Degree 2   Secondary 3

Q3. What is your age? ____

Q4. Do you find time to relax?

Always 1 Usually 2 Sometimes 3 Rarely 4 Never 5

Q5. Do you ever feel stressed at work?

Always 1 Usually 2 Sometimes 3 Rarely 4 Never 5

Q6. Does your job involve the following tasks

Evaluating Staff ___

Managing Staff ___

Training Staff ___

Q7. What is your Job Title? ___________________

Q8. What is your grade?

Junior Management 1   Middle Management 2   Senior Management 3

Q9. How many years are you in the organisation? ____
