Spss comd interpret

SPSS: SPSS Commands and Interpreting Statistics

Frequency Distributions

We use frequency distributions to determine the number of cases that fall into each category of a variable. For example, if we classified those running for Senator or Governor as Democrat or Republican, a frequency distribution would tell us the number and percentage of candidates in each party.

In our data file, the variable we used to list candidates as Republican or Democrat was “party.”

1. Go to Analyze > Descriptive Statistics > Frequencies

2. Double click on party and then click OK.

Interpreting Frequency Distributions

1. As you can see in the table below, the two parties are listed: Democrat and Republican. The “Missing” category simply reflects the candidates whose party affiliation we could not determine.

2. Under the “Frequency” column, we have the number of candidates that were Democrat (186), Republican (280), or missing (5).

3. Finally, we typically use the “Valid Percent” column in determining the frequency distribution of Democrats and Republicans because it leaves out the cases where we could not assign a category. In this case, 39.9 percent were Democrat and 60.1 percent were Republican. Clearly, there are more Republican than Democratic candidates.

Political Party

                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid     Democrat          186      39.5            39.9                 39.9
          Republican        280      59.4            60.1                100.0
          Total             466      98.9           100.0
Missing   9.00                5       1.1
Total                       471     100.0
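The difference between “Percent” and “Valid Percent” is easy to check by hand. The short Python sketch below redoes that arithmetic from the “Frequency” column; Python is used here only for illustration (the handout itself works through the SPSS menus), and the variable names are made up for this example.

```python
# Counts copied from the "Frequency" column of the Political Party table.
counts = {"Democrat": 186, "Republican": 280}
missing = 5  # candidates whose party could not be determined

valid_total = sum(counts.values())   # cases with a known party
grand_total = valid_total + missing  # every case, including missing ones

# "Percent" keeps missing cases in the base; "Valid Percent" drops them.
percent = {party: 100 * n / grand_total for party, n in counts.items()}
valid_percent = {party: 100 * n / valid_total for party, n in counts.items()}

for party in counts:
    print(f"{party:<10} {counts[party]:>4} {percent[party]:6.1f} {valid_percent[party]:6.1f}")
```

Rounded to one decimal, the results agree with the SPSS table: the only thing that changes between the two columns is whether the 5 missing cases are counted in the denominator.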

Chi Square Test

Often we have two nominal level variables (gender, party affiliation, or ethnicity, for example) and we need to determine if a relationship exists between them. For example, we may want to know if ethnicity is related to party affiliation. We suspect this is the case, and we hypothesize that minorities are associated with the Democratic Party and whites with the Republican Party.

Using a crosstab table and Chi Square test, we can determine if there is a relationship between two variables that IS NOT DUE TO CHANCE.

1. To do this, go to Analyze > Descriptive Statistics > Crosstabs.

We put “Political Party” in the “Row” box because the dependent variable ALWAYS goes in the “Row” box. We put “Ethnicity” in the “Column” box because the independent (explanatory) variable ALWAYS goes in the “Column” box.

2. Next we click the “Statistics” button and click “Chi Square”, “Phi and Cramer’s V”, and “Lambda.” Click the “Continue” button.

3. Next, click the “Cells” button. Under “Counts”, check “Observed” and under “Percentages” check “Row”, “Column”, and “Total”. Then click the “Continue” button.

4. Click the “OK” Button to run your crosstab.

Interpreting Your Crosstab

1. Reading a crosstabulation can be confusing. Over the years, I have found the following to be helpful in reading them. First, we always begin with the independent variable, which is listed in the columns. In this case it is ethnicity, and since we are looking at ethnicity, we will read the rows labeled “% within Ethnicity (2)”. Here is how we read this table. If we are interested in what party non-whites support, we say:

“Of those who are non-white, 72.1% are Democrats.” And “of those people who are non-white, 27.9% are Republicans.”

If we are interested in the white respondents, we say:

“Of those who are white, 35.5% are Democrat AND 64.5% are Republican.”

If you use this phrase and fill in the blanks, you can interpret this table properly every time!

“Of those who are _____, ____% are _______ AND _____% are _______.”

Political Party * Ethnicity (2) Crosstabulation

                                                       Ethnicity (2)
                                                          White   Non-White    Total
Political Party  Democrat    Count                          146          31      177
                             % within Political Party     82.5%       17.5%   100.0%
                             % within Ethnicity (2)       35.5%       72.1%    39.0%
                             % of Total                   32.2%        6.8%    39.0%
                 Republican  Count                          265          12      277
                             % within Political Party     95.7%        4.3%   100.0%
                             % within Ethnicity (2)       64.5%       27.9%    61.0%
                             % of Total                   58.4%        2.6%    61.0%
Total                        Count                          411          43      454
                             % within Political Party     90.5%        9.5%   100.0%
                             % within Ethnicity (2)      100.0%      100.0%   100.0%
                             % of Total                   90.5%        9.5%   100.0%
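The “% within Ethnicity (2)” figures are simply each cell count divided by its column total. As a rough illustration, the Python sketch below recomputes them from the counts in the crosstab and prints them in the fill-in-the-blanks phrasing. The function name `percent_within_column` is invented for this example and is not part of SPSS.

```python
# Observed counts from the crosstab: rows are party, columns are ethnicity.
observed = {
    "Democrat":   {"White": 146, "Non-White": 31},
    "Republican": {"White": 265, "Non-White": 12},
}

def percent_within_column(table, column):
    """Return each row's share (in percent) of the given column's total."""
    column_total = sum(row[column] for row in table.values())
    return {party: 100 * row[column] / column_total for party, row in table.items()}

for ethnicity in ("White", "Non-White"):
    shares = percent_within_column(observed, ethnicity)
    parts = " AND ".join(f"{pct:.1f}% are {party}" for party, pct in shares.items())
    print(f"Of those who are {ethnicity}, {parts}.")
```

Because the percentages are taken within each ethnicity column, each printed sentence adds up to 100 percent, which is what makes the two groups comparable even though there are far more white than non-white candidates.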

2. We hypothesized that ethnicity was related to party affiliation: non-whites were more likely to be Democrat and whites more likely to be Republican. As you can see from the table above, this is true. 72% of non-whites called themselves Democrats and 65% of whites called themselves Republicans. So our statistics bear out our hypothesis.

3. However, is there a possibility that the relationship between ethnicity and party affiliation is due to chance? That is to say, is there really no statistically significant reason to believe these variables are related to one another?

To answer this question, we use the Pearson Chi-Square test. Look at the table below. In the Pearson Chi-Square row, there are three “Sig.” columns. Use the “Asymp. Sig. (2-sided)” column and disregard the other two for the time being. If the number is between .000 and .050, we can say that the independent variable (ethnicity in this case) is significantly related to the dependent variable (party affiliation). This is another way of saying that the relationship is very unlikely to be due to chance. As you can see below, the significance value for the Pearson Chi-Square is .000 under the “Asymp. Sig. (2-sided)” column. (SPSS shows three decimals, so .000 means less than .0005, not exactly zero.) Therefore, ethnicity is significantly related to party affiliation.

If the number is .051 or above, the relationship may be due to “chance” and we say that we are not confident that ethnicity and party affiliation are related. In that case, our hypothesis that ethnicity is related to party affiliation is not supported.

Chi-Square Tests

                              Value       df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square            21.886(a)   1    .000
Continuity Correction(b)      20.375      1    .000
Likelihood Ratio              21.438      1    .000
Fisher's Exact Test                                                    .000                   .000
Linear-by-Linear Association  21.838      1    .000
N of Valid Cases              454

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 16.76.
b. Computed only for a 2x2 table.
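For readers who want to see where the Pearson Chi-Square value comes from, here is a minimal Python sketch that recomputes it from the four observed counts in the crosstab. It uses only the standard library. The shortcut used for the significance value is valid only when df = 1 (a 2x2 table, as here), and it is an illustration rather than a description of how SPSS computes the figure internally.

```python
import math

# Observed counts from the crosstab.
observed = [[146, 31],   # Democrat:   White, Non-White
            [265, 12]]   # Republican: White, Non-White

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
min_expected = float("inf")
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        # Count we would expect in this cell if party and ethnicity were unrelated.
        expected = row_totals[i] * col_totals[j] / n
        min_expected = min(min_expected, expected)
        chi_square += (count - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 1 ONLY, the chi-square tail probability equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"chi-square = {chi_square:.3f}, df = {df}, minimum expected count = {min_expected:.2f}")
print(f"significance below .0005: {p_value < 0.0005}")
```

The statistic grows as the observed counts move away from the counts expected under “no relationship”, which is why a large value goes with a small “Sig.” number.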

4. How strong is the relationship between the independent variable (ethnicity) and the dependent variable (party affiliation)? There are two measures of association, Phi and Cramer’s V. For our purposes, use Cramer’s V unless SPSS spits out only a Phi statistic. Under the “Value” column, a number is listed. The higher the number, the greater the strength of association. Let’s use the following scale:

0-.30 = no relationship (0) to weak relationship

.31-.70 = moderate relationship

.71-1.0 = strong relationship

A strong relationship means that knowing the ethnicity of a person will give us very good reason to guess the political party with which they are affiliated. A weak relationship means that knowing the ethnicity of a person does not give us much confidence in guessing the person’s political party affiliation. In this case, the association is weak (.220). If I guessed a person’s political affiliation based on the person’s apparent race, I could easily be wrong!

Symmetric Measures

                                   Value    Approx. Sig.
Nominal by Nominal   Phi           -.220    .000
                     Cramer's V     .220    .000
N of Valid Cases                    454
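Cramer’s V can be recovered from the Pearson Chi-Square value and the number of valid cases. The sketch below shows that calculation in Python and then labels the result with the scale used in this handout. The `strength` helper is a hypothetical convenience function written for this example. It treats anything up to .30 as weak and anything up to .70 as moderate, which is one reasonable reading of the scale.

```python
import math

chi_square = 21.886  # Pearson Chi-Square value reported by SPSS
n = 454              # N of Valid Cases
rows, cols = 2, 2    # party (2 categories) by ethnicity (2 categories)

# Cramer's V: square root of chi-square over n times (smaller dimension - 1).
cramers_v = math.sqrt(chi_square / (n * min(rows - 1, cols - 1)))

def strength(value):
    """Label a coefficient using the handout's scale; the sign is ignored."""
    size = abs(value)
    if size <= 0.30:
        return "none to weak"
    if size <= 0.70:
        return "moderate"
    return "strong"

print(f"Cramer's V = {cramers_v:.3f} ({strength(cramers_v)})")
```

For a 2x2 table, Cramer’s V is the absolute value of Phi, which is why the two rows of the table show the same number with and without the minus sign.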

Pearson Correlation

A correlation is a powerful way to determine the association between two interval level variables. An interval level variable is one whose values are an equal distance apart. Examples include income (dollars), age (years), experience in politics (years), and share of the vote (percent). Gender (male or female) is not an interval level variable, because its values are not an equal distance apart. It is a categorical variable.

For example, we may be interested in determining if political experience, as measured by the number of years a person has served in office, is related to campaign funds raised. We suspect that the longer the incumbent is in office, the more campaign funds s/he will raise. After all, an incumbent has political power and is likely to be reelected, so donors will want to contribute to the incumbent.

1. To do a correlation analysis, go to Analyze > Correlate > Bivariate

2. Find and double click the variables “Political Experience” and “Money Raised”. This will put these two variables in the variable window.

3. Click the “OK” button to run your correlation.

Interpreting Your Pearson Correlation

1. A correlation coefficient (number) represents the strength of an association between two variables. The higher the number, the greater the strength of association. Let’s use the following scale:

0-.30 = no relationship (0) to weak relationship

.31-.70 = moderate relationship

.71-1.0 = strong relationship

2. In this case the correlation between “Political Experience” and “Money Raised” is .331**. This would be a moderate relationship.

3. The “Sig. (2-tailed)” is important. It tells us if the relationship may be due to chance. If the “Sig. (2-tailed)” number is between .000 and .050, we can say that political experience and money raised are significantly related and that more political experience is associated with more campaign contributions. If the number is .051 or more, we say that we cannot be confident that political experience and money raised are related or associated.

In this case, we can say that there is a “moderate, significant relationship” between political experience and money raised.

Correlations

                                                     Political Experience (Years)   Money Raised
Political Experience (Years)   Pearson Correlation   1                              .331**
                               Sig. (2-tailed)                                      .000
                               N                     462                            414
Money Raised                   Pearson Correlation   .331**                         1
                               Sig. (2-tailed)       .000
                               N                     414                            421

**. Correlation is significant at the 0.01 level (2-tailed).
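To make the Pearson coefficient less of a black box, the sketch below computes it from scratch in Python. The six data points are HYPOTHETICAL and chosen only to show the mechanics; they are not taken from the candidate data file, so the result will not match the .331 in the table. The function name `pearson_r` is likewise made up for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(spread_x * spread_y)

# HYPOTHETICAL candidates: years in office and money raised (thousands of dollars).
years_in_office = [0, 2, 4, 6, 8, 10]
money_raised = [50, 40, 120, 90, 200, 150]

r = pearson_r(years_in_office, money_raised)
print(f"r = {r:.3f}")
```

A value near +1 means the two variables rise together almost perfectly, a value near -1 means one falls as the other rises, and a value near 0 means there is little linear association.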

Multiple Regression

A very powerful way to analyze data is by using a “multiple regression.” For our purposes, a multiple regression allows us to look at several factors that affect a dependent variable and determine which factors exert a greater influence on the dependent variable. For example, we may suspect that a candidate’s share of the vote is determined by the quality of the candidate (measured here by years of political experience) AND the amount of money raised. After all, better Senate candidates will win a greater percentage of the vote than poorer Senate candidates, and candidates with more money will be able to spend more to get elected. With more money to spend, they should get a greater percent of the vote. But which factor is more important: candidate quality or money raised? To answer this question, we do a multiple regression.

1. Go to Analyze > Regression > Linear

2. Since the dependent variable is the percentage of the vote a candidate received, we put “Vote: Primary or Convention” in the “Dependent” variable box. The two independent variables we expect to influence the dependent variable (“Political Experience (Years)” and “Money Raised”) go in the “Independent(s)” variable box.

3. Click the “OK” button.

Interpreting Your Multiple Regression

Your output produces a number of tables. Let’s look at the most important ones.

1. The first table, “Variables Entered/Removed”, tells you what variables were used in the analysis. As you can see, “Money Raised” and “Political Experience” were used. Under the table, you can see that the dependent variable was “Vote: Primary or Convention.”

Variables Entered/Removed(b,c)

Model   Variables Entered                            Variables Removed   Method
1       Money Raised, Political Experience (Years)   .                   Enter

a. All requested variables entered.
b. Dependent Variable: Vote: Primary or Convention
c. Models are based only on cases for which Office = Senate

2. There are two “coefficients” or numbers that are important: the “R” and “R Square.” The “R” is the combined effect of all the independent variables on the dependent variable. In this case there is a moderate, positive association (.662) between the two independent variables taken together (money raised and political experience) and the vote. The “R Square” simply means that these two variables explain 43.8 percent of the variance in the dependent variable: the vote. This also means that other factors (variables) explain the remaining 56.2 percent of the variance. What might they be? How about incumbency or other measures of candidate quality?

Model Summary

Model   R (Office = Senate, Selected)   R Square   Adjusted R Square   Std. Error of the Estimate
1       .662(a)                         .438       .432                20.33142

a. Predictors: (Constant), Money Raised, Political Experience (Years)
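The three numbers in the Model Summary are tied together by simple arithmetic. The Python sketch below squares R to get R Square and then applies the usual adjustment for the number of predictors. The number of cases (197) is not printed in this table; it is taken here from the ANOVA table further down (total df of 196, plus 1), so treat it as an assumption of this example.

```python
r = 0.662  # "R" from the Model Summary table
n = 197    # ASSUMED number of cases: ANOVA total df (196) + 1
k = 2      # independent variables: money raised and political experience

r_square = r ** 2

# Adjusted R Square penalizes R Square for each predictor added to the model.
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

explained = 100 * r_square
unexplained = 100 - explained

print(f"R Square = {r_square:.3f}, Adjusted R Square = {adjusted_r_square:.3f}")
print(f"Explained: {explained:.1f}%  Unexplained: {unexplained:.1f}%")
```

With only two predictors and nearly 200 cases the adjustment is small, which is why R Square and Adjusted R Square are so close in this output.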

3. In the ANOVA table, look only at the “Sig.” column. If the number is between .000 and .05 inclusive, then we can say that the relationship between the independent variables (money raised and candidate quality in this case) and the dependent variable (share of the vote) is not due to chance, which is the case here. This means that we are confident that money raised and candidate quality influence the vote. If it is greater than .05 (for example .051 or .60 or .154), then the relationship MIGHT BE DUE TO CHANCE and we should say we are not confident that money raised and candidate quality are linked to the percentage of the vote.

ANOVA(b,c)

Model            Sum of Squares   df    Mean Square   F        Sig.
1   Regression   62539.414        2     31269.707     75.646   .000(a)
    Residual     80193.138        194   413.367
    Total        142732.552       196

a. Predictors: (Constant), Money Raised, Political Experience (Years)
b. Dependent Variable: Vote: Primary or Convention
c. Selecting only cases for which Office = Senate
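Although we read only the “Sig.” column, the other ANOVA numbers are related to each other and to the Model Summary. As an illustration, the Python sketch below rebuilds the Mean Square, F, R Square, and “Std. Error of the Estimate” values from the sums of squares and degrees of freedom printed in the table. The variable names are invented for this example.

```python
import math

# Values copied from the ANOVA table.
ss_regression, df_regression = 62539.414, 2
ss_residual, df_residual = 80193.138, 194

# A mean square is a sum of squares divided by its degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F compares explained variation to unexplained variation.
f_statistic = ms_regression / ms_residual

# R Square is the explained share of the total sum of squares.
r_square = ss_regression / (ss_regression + ss_residual)

# The "Std. Error of the Estimate" is the square root of the residual mean square.
std_error_of_estimate = math.sqrt(ms_residual)

print(f"F = {f_statistic:.3f}, R Square = {r_square:.3f}, Std. Error = {std_error_of_estimate:.3f}")
```

A large F means the independent variables together explain much more variation than is left unexplained per degree of freedom, which is what drives the “Sig.” value toward .000.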

4. A very important table is the “Coefficients” table. This table tells us, among other things, how much influence each independent variable exerts on the dependent variable. Note the following columns.

a. Under “Model” are listed the two independent variables: “Political Experience” and “Money Raised.”

b. Really important are the coefficients (numbers) under the column “Standardized Coefficients, Beta”. The higher the number, the more this variable influences the dependent variable, the percentage of the vote. In this case, you can see that “Political Experience” (.398) is more important than “Money Raised” (.370), though not much more. Thus, we can say that political experience is more important than money in explaining voting for Senate candidates, but not by much!

In some cases the Beta coefficient will have a negative sign in front of it. Disregard this sign in interpreting which variable exerts the most influence over the dependent variable. The variable with the larger number, regardless of the sign, exerts more influence.

c. The “Sig.” column simply states whether each independent variable (political experience and money raised) is significantly related to the dependent variable (percent of the vote). If the number is between .000 and .050, we can say that the relationship is NOT due to chance: that there is a significant relationship between this variable and the dependent variable. As you can see, both relationships are significant and we can say that “political experience and money raised are significantly related to the vote.”

Coefficients(a,b)

                                   Unstandardized Coefficients   Standardized Coefficients
Model                              B           Std. Error        Beta                        t       Sig.
1   (Constant)                     16.224      1.698                                         9.556   .000
    Political Experience (Years)   1.077       .166              .398                        6.485   .000
    Money Raised                   2.470E-6    .000              .370                        6.024   .000

a. Dependent Variable: Vote: Primary or Convention
b. Selecting only cases for which Office = Senate
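The “B” column can be read as a prediction equation, and the Beta column as a ranking of influence. The Python sketch below shows both uses. It ASSUMES that “Money Raised” is recorded in dollars and that the vote is recorded as a percentage; the data file is not shown here, so check the variable definitions before relying on the units. The candidate in the example is hypothetical, and `predicted_vote` is a name made up for this illustration.

```python
# Unstandardized coefficients ("B" column) from the Coefficients table.
CONSTANT = 16.224
B_EXPERIENCE = 1.077  # change in vote share per year of political experience
B_MONEY = 2.470e-6    # change in vote share per unit of money (ASSUMED: dollars)

def predicted_vote(years_experience, money_raised):
    """Predicted vote share from the regression equation."""
    return CONSTANT + B_EXPERIENCE * years_experience + B_MONEY * money_raised

# Standardized coefficients ("Beta" column); rank by size, ignoring the sign.
betas = {"Political Experience (Years)": 0.398, "Money Raised": 0.370}
ranked = sorted(betas, key=lambda name: abs(betas[name]), reverse=True)

# HYPOTHETICAL candidate: 10 years in office and 1,000,000 raised.
example = predicted_vote(10, 1_000_000)

print(f"Predicted vote share: {example:.1f}")
print("Most influential first:", ", ".join(ranked))
```

The B values cannot be compared with each other directly because they depend on the units of each variable (years versus money), which is why the handout ranks influence with the standardized Beta values instead.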
