Crosstabulation and Measures of Association
Investigating the relationship between two variables

Generally, a statistical relationship exists if the values of the observations for one variable are associated with the values of the observations for another variable.

Knowing that two variables are related allows us to make predictions: if we know the value of one, we can predict the value of the other.
Determining how the values of one variable are related to the values of another is one of the foundations of empirical science.

In making such determinations we must consider the following features of the relationship.
1.) The level of measurement of the variables. Different variables necessitate different procedures.

2.) The form of the relationship. We can ask if changes in X move in lockstep with changes in Y, or if a more sophisticated relationship exists.

3.) The strength of the relationship. Is it possible that some levels of X will always be associated with certain levels of Y?
4.) Numerical summaries of the relationship. Social scientists strive to boil down the different aspects of a relationship to a single number that reveals the type and strength of the association.

5.) Conditional relationships. The variables X and Y may seem to be related in some fashion, but appearances can be deceiving (spuriousness, for example), so we need to know if the introduction of any other variables into the analysis changes the relationship.
Types of Association

1.) General association – the variables are simply associated in some way.

2.) Positive monotonic correlation – when the variables have order (ordinal or continuous), high values of one variable are associated with high values of the other. The converse is also true.

3.) Negative monotonic correlation – low values of one variable are associated with high values of the other.
Types of Association, Cont.

4.) Positive linear association – a particular type of positive monotonic relationship where the plotted X-Y values fall on a straight line that slopes upward.

5.) Negative linear relationship – the plotted X-Y values fall on a straight line that slopes downward.
Strength of Relationships

Virtually no relationships between variables in social science (and largely in natural science as well) have a perfect form.

As a result, it makes sense to talk about the strength of relationships.
Strength, Cont.

The strength of a relationship between variables can be found by simply looking at a graph of the data.

If the values of X and Y are tied together tightly, then the relationship is strong. If the X-Y points are spread out, then the relationship is weak.
Direction of Relationship

We can also infer direction from a graph by simply observing how the values of our variables move across the graph. This is only true, however, when our variables are ordinal or continuous.
Types of Bivariate Relationships and Associated Statistics

| Variables | Appropriate technique |
|---|---|
| Nominal/ordinal (including dichotomous) | Crosstabulation (lambda, chi-square, gamma, etc.) |
| Interval and dichotomous | Difference of means test |
| Interval and nominal/ordinal | Analysis of variance |
| Interval and ratio | Regression and correlation |
Assessing Relationships between Variables

1. Calculate the appropriate statistic to measure the magnitude of the relationship in the sample.
2. Calculate additional statistics to determine if the relationship holds for the population of interest (statistical significance).

Keep in mind the distinction between substantive significance and statistical significance.
What is a Crosstabulation?

Crosstabulations are appropriate for examining relationships between variables that are nominal, ordinal, or dichotomous.

Crosstabs show values for one variable categorized by another variable. They display the joint distribution of values of the variables by listing the categories for one along the x-axis and the other along the y-axis.
Each case is then placed in the cell of the table that represents the combination of values corresponding to its scores on the variables.
What is a Crosstabulation?

Example: We would like to know if presidential vote choice in 2000 was related to race.

Vote choice = Gore or Bush; Race = White, Hispanic, Black.
Are Race and Vote Choice Related? Why?

|       | Black | Hispanic | White | TOTAL |
|-------|-------|----------|-------|-------|
| Gore  | 106   | 23       | 427   | 556   |
| Bush  | 8     | 15       | 484   | 507   |
| TOTAL | 114   | 38       | 911   | 1063  |
Are Race and Vote Choice Related? Why?

|       | Black      | Hispanic   | White       | TOTAL        |
|-------|------------|------------|-------------|--------------|
| Gore  | 106 (93%)  | 23 (60.5%) | 427 (46.9%) | 556 (52.3%)  |
| Bush  | 8 (7%)     | 15 (39.5%) | 484 (53.1%) | 507 (47.7%)  |
| TOTAL | 114 (100%) | 38 (100%)  | 911 (100%)  | 1063 (100%)  |
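The column percentages above can be reproduced with a short script (a pure-Python sketch; the counts are taken directly from the table):

```python
# Vote counts by race, from the crosstab above.
counts = {
    "Gore": {"Black": 106, "Hispanic": 23, "White": 427},
    "Bush": {"Black": 8, "Hispanic": 15, "White": 484},
}

races = ["Black", "Hispanic", "White"]
col_totals = {r: sum(counts[cand][r] for cand in counts) for r in races}

# Column percentages: each cell divided by its column (race) total.
pct = {cand: {r: round(100 * counts[cand][r] / col_totals[r], 1) for r in races}
       for cand in counts}

print(pct["Gore"])  # {'Black': 93.0, 'Hispanic': 60.5, 'White': 46.9}
```

Because each column sums to 100%, the percentages can be compared across races directly, which is what makes the relationship visible.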
Measures of Association for Crosstabulations

Purpose – to determine if nominal/ordinal variables are related in a crosstabulation.

At least one nominal variable: Lambda, Chi-Square, Cramer's V
Two ordinal variables: Tau, Gamma
Measures of Association for Crosstabulations

These measures of association provide us with correlation coefficients that summarize the data from a table in one number. This is extremely useful when dealing with several tables or very complex tables.

These coefficients measure both the strength and direction of an association.
Coefficients for Nominal Data

When one or both of the variables are nominal, ordinal coefficients cannot be used because there is no underlying ordering. Instead we use PRE measures.
Lambda (a PRE coefficient)

PRE – Proportional Reduction in Error.

Two rules:
1.) Make a prediction of the value of an observation in the absence of any prior information.
2.) Given information on a second variable, take it into account in making the prediction.
Lambda PRE

If the two variables are associated, then the use of rule two should lead to fewer errors in your predictions than rule one. How many fewer errors depends upon how closely the variables are associated.

PRE = (E1 – E2) / E1

The scale goes from 0 to 1.
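The PRE formula can be written directly as a function (a minimal sketch; `e1` and `e2` stand for the error counts under rules one and two):

```python
def pre(e1, e2):
    """Proportional reduction in error: (E1 - E2) / E1.

    e1 -- prediction errors using only the marginal distribution (rule 1)
    e2 -- prediction errors using the second variable (rule 2)
    """
    return (e1 - e2) / e1

# With the voting example worked later in the slides: 60 errors under
# rule 1 and 50 under rule 2 gives lambda = 10/60, about .17.
print(round(pre(60, 50), 2))  # 0.17
```

When rule two eliminates every error, PRE reaches its maximum of 1; when it helps not at all, E2 = E1 and PRE is 0.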
Lambda

Lambda is a PRE coefficient and it relies on rules 1 & 2 above.

When applying rule one, all we have to go on is what proportion of the population fits into one category as opposed to another. So, without any other information, guessing that every observation is in the modal category would give you the best chance of getting the most correct.
Why?

Think of it like this: if you knew that I tended to make exams where the most often used answer was B, then, without any other information, you would be best served to pick B every time.
But if you know each case's value on another variable, rule two directs you to look only at the cases in that category of the new variable and to predict the modal category within it.
Example

Suppose we have a sample of 100 voters and need to predict how they will vote in the general election. Assume we know that overall 30% voted Democrat, 30% voted Republican, and 40% voted independent.

Now suppose we take one person out of the group (John Smith); our best guess would be that he would vote independent.
Now suppose we take another person (Larry Mendez); again we would guess that he voted independent. As a result, our best guess is to predict that all 100 voters were independent. We are sure to get some wrong, but it's the best we can do over the long run.
How many do we get wrong? 60.

Suppose now that we know something about the voters' regions (where they are from) and we know what proportions of the various regions voted in the election.

NE – 30, MW – 20, SO – 30, WE – 20
Lambda

|       | NE | MW | SO | WE | TOTAL |
|-------|----|----|----|----|-------|
| REPUB | 4  | 10 | 6  | 10 | 30    |
| IND   | 12 | 8  | 16 | 4  | 40    |
| DEM   | 14 | 2  | 8  | 6  | 30    |
| TOTAL | 30 | 20 | 30 | 20 | 100   |
Lambda – Rule 1 (prediction based solely on knowledge of the marginal distribution of the dependent variable – partisanship)

Each cell shows observed / predicted:

|       | NE      | MW     | SO      | WE     | TOTAL |
|-------|---------|--------|---------|--------|-------|
| REPUB | 4 / 0   | 10 / 0 | 6 / 0   | 10 / 0 | 30    |
| IND   | 12 / 30 | 8 / 20 | 16 / 30 | 4 / 20 | 40    |
| DEM   | 14 / 0  | 2 / 0  | 8 / 0   | 6 / 0  | 30    |
| TOTAL | 30      | 20     | 30      | 20     | 100   |
Lambda – Rule 2 (prediction based on knowledge provided by the independent variable)

Each cell shows observed / rule 1 prediction / rule 2 prediction:

|       | NE          | MW         | SO          | WE         | TOTAL |
|-------|-------------|------------|-------------|------------|-------|
| REPUB | 4 / 0 / 0   | 10 / 0 / 20 | 6 / 0 / 0   | 10 / 0 / 20 | 30    |
| IND   | 12 / 30 / 0 | 8 / 20 / 0  | 16 / 30 / 30 | 4 / 20 / 0  | 40    |
| DEM   | 14 / 0 / 30 | 2 / 0 / 0   | 8 / 0 / 0    | 6 / 0 / 0   | 30    |
| TOTAL | 30          | 20          | 30           | 20          | 100   |
Lambda – Calculation of Errors

Errors w/ Rule 1: 18 + 12 + 14 + 16 = 60
Errors w/ Rule 2: 16 + 10 + 14 + 10 = 50

Lambda = (Errors R1 – Errors R2) / Errors R1
Lambda = (60 – 50) / 60 = 10/60 ≈ .17
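The whole lambda calculation can be reproduced from the crosstab (a pure-Python sketch using the region-by-partisanship counts above):

```python
# Observed counts: rows are party, columns are region (NE, MW, SO, WE).
table = {
    "REPUB": [4, 10, 6, 10],
    "IND":   [12, 8, 16, 4],
    "DEM":   [14, 2, 8, 6],
}

n = sum(sum(row) for row in table.values())   # 100 cases
row_totals = {p: sum(v) for p, v in table.items()}

# Rule 1: always predict the modal party; errors = everyone else.
e1 = n - max(row_totals.values())             # 100 - 40 = 60

# Rule 2: within each region, predict that region's modal party.
ncols = 4
e2 = sum(
    sum(table[p][j] for p in table) - max(table[p][j] for p in table)
    for j in range(ncols)
)                                             # 16 + 10 + 14 + 10 = 50

lam = (e1 - e2) / e1
print(e1, e2, round(lam, 2))  # 60 50 0.17
```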
Lambda

Lambda is a PRE measure and ranges from 0 to 1.

Potential problems with lambda: it underestimates the relationship when one or both variables are highly skewed, and it is always 0 when the modal category of Y is the same across all categories of X.
Chi-Square (χ²)

Chi-square is also appropriate for any crosstabulation with at least one nominal variable (and another nominal/ordinal variable).

It is based on the difference between the empirically observed crosstab and what we would expect to observe if the two variables were statistically independent.
Background for χ²

Statistical independence – a property of two variables in which the probability that an observation is in a particular category of one variable and also in a particular category of the other variable equals the product of the marginal probabilities of being in those categories.

Statistical independence plays a large role in data analysis and is another way to view the strength of a relationship.
Example

Suppose we have two nominal or categorical variables, X and Y. We label the categories of the first (a, b, c) and those of the second (r, s, t).

Let P(X = a) stand for the probability that a randomly selected case has property a on variable X, and P(Y = r) stand for the probability that a randomly selected case has property r on variable Y.
These two probabilities are called marginal distributions; they simply refer to the chance that an observation has a particular value on a particular variable, irrespective of its value on the other variable.
Finally, let P(X = a, Y = r) stand for the joint probability that a randomly selected observation has both property a and property r simultaneously.

Statistical independence – the two variables are statistically independent only if the chance of observing a combination of categories is equal to the marginal probability of one category times the marginal probability of the other.
Background for χ²

P(X = a, Y = r) = [P(X = a)] [P(Y = r)]

For example, if men are as likely to vote as women, then the two variables (gender and voter turnout) are statistically independent, because the probability of observing a male nonvoter in the sample is equal to the probability of observing a male times the probability of observing a nonvoter.
Example

If 100 of 300 respondents are men and 210 of 300 voted, then the marginal probabilities are:

P(X = m) = 100/300 = .33 and P(Y = v) = 210/300 = .7

.33 × .7 = .23, which is the joint probability we would expect under independence.
If we know that 70 of the voters are male and take that number over the whole sample (70/300), we also get .23. We can therefore say that the two variables are independent.
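This independence check can be written out directly (a minimal sketch using the slide's numbers):

```python
n = 300           # total respondents
men = 100         # number of men
voters = 210      # number of voters
male_voters = 70  # observed men who voted

p_male = men / n                  # about .33
p_vote = voters / n               # .7
expected_joint = p_male * p_vote  # about .23
observed_joint = male_voters / n  # about .23

# Independence holds when the observed joint probability equals the
# product of the marginals (here both are ~0.233).
print(abs(observed_joint - expected_joint) < 1e-9)  # True
```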
The chi-square statistic essentially compares an observed result (the table produced by the sample) with a hypothetical table that would occur if, in the population, the variables were statistically independent.

A value of 0 implies statistical independence, which means no association.
Chi-square increases as the departures of observed values from expected values grow. There is no upper limit to how big the difference can become, but if it passes a critical value then there is reason to reject the null hypothesis that the two variables are independent.
How Do We Calculate χ²?

The observed frequencies are already in the crosstab. The expected frequency in each table cell is found by multiplying the row and column marginal totals and dividing by the sample size.
Chi-Square (χ²)

Observed frequencies (O), with expected frequencies (E) still to be computed:

|       | NE | MW | SO | WE | TOTAL |
|-------|----|----|----|----|-------|
| REPUB | 4  | 10 | 6  | 10 | 30    |
| IND   | 12 | 8  | 16 | 4  | 40    |
| DEM   | 14 | 2  | 8  | 6  | 30    |
| TOTAL | 30 | 20 | 30 | 20 | 100   |
Calculating Expected Frequencies

To calculate the expected cell frequency for NE Republicans:

E/30 = 30/100, therefore E = (30 × 30)/100 = 9

Applying the same rule to every cell (each cell shows O / E):

|       | NE     | MW     | SO      | WE     | TOTAL |
|-------|--------|--------|---------|--------|-------|
| REPUB | 4 / 9  | 10 / 6 | 6 / 9   | 10 / 6 | 30    |
| IND   | 12 / 12 | 8 / 8  | 16 / 12 | 4 / 8  | 40    |
| DEM   | 14 / 9 | 2 / 6  | 8 / 9   | 6 / 6  | 30    |
| TOTAL | 30     | 20     | 30      | 20     | 100   |
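Computing all the expected frequencies follows the same row-total × column-total / n rule (a short sketch):

```python
# Observed counts: rows are party, columns are region (NE, MW, SO, WE).
observed = [
    [4, 10, 6, 10],   # REPUB
    [12, 8, 16, 4],   # IND
    [14, 2, 8, 6],    # DEM
]

n = sum(map(sum, observed))                                    # 100
row_totals = [sum(r) for r in observed]                        # [30, 40, 30]
col_totals = [sum(r[j] for r in observed) for j in range(4)]   # [30, 20, 30, 20]

# Expected cell frequency = (row total * column total) / n.
expected = [[row_totals[i] * col_totals[j] / n for j in range(4)]
            for i in range(3)]

print(expected[0][0])  # 9.0 -- NE Republicans, matching the slide
```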
Calculating the Chi-Square Statistic

The chi-square statistic is calculated as:

χ² = Σ (O_ik – E_ik)² / E_ik

(25/9) + (16/6) + (9/9) + (16/6) + 0 + 0 + (16/12) + (16/8) + (25/9) + (16/6) + (1/9) + 0 = 18

Each cell shows O / E:

|       | NE     | MW     | SO      | WE     | TOTAL |
|-------|--------|--------|---------|--------|-------|
| REPUB | 4 / 9  | 10 / 6 | 6 / 9   | 10 / 6 | 30    |
| IND   | 12 / 12 | 8 / 8  | 16 / 12 | 4 / 8  | 40    |
| DEM   | 14 / 9 | 2 / 6  | 8 / 9   | 6 / 6  | 30    |
| TOTAL | 30     | 20     | 30      | 20     | 100   |
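Summing (O − E)²/E over all twelve cells reproduces the statistic (a sketch continuing from the observed and expected tables above):

```python
observed = [
    [4, 10, 6, 10],
    [12, 8, 16, 4],
    [14, 2, 8, 6],
]
expected = [
    [9, 6, 9, 6],
    [12, 8, 12, 8],
    [9, 6, 9, 6],
]

# Chi-square: sum of squared deviations, each scaled by its expected count.
chi2 = sum((o - e) ** 2 / e
           for orow, erow in zip(observed, expected)
           for o, e in zip(orow, erow))

print(round(chi2, 2))  # 18.0
```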
The value 9 is the expected frequency in the first cell of the table: it is what we would expect in a sample of 100 (with 30 Republicans and 30 Northeasterners) if there were statistical independence in the population. This is more than we have in our sample, so there is a difference.
![Page 49: Crosstabulation and Measures of Association](https://reader036.fdocuments.us/reader036/viewer/2022062408/568134ab550346895d9bbd79/html5/thumbnails/49.jpg)
Just Like the Hyp. Test

Null: Statistical independence between X and Y
Alt: X and Y are not independent.
Interpreting the Chi-Square Statistic

The Chi-Square statistic ranges from 0 to infinity
• 0 = perfect statistical independence
• Even though two variables may be statistically independent in the population, in a sample the Chi-Square statistic may be > 0
• Therefore it is necessary to determine statistical significance for a Chi-Square statistic (given a certain level of confidence)
Cramer’s V

Problem with Chi-Square: not comparable across different sample sizes (and their associated crosstabs)

Cramer’s V is a standardization of the Chi-Square statistic
Calculating Cramer’s V

V = √( Chi-Square / ( N × Min(R−1, C−1) ) )

Where R = # rows and C = # columns
• V ranges from 0 to 1

Example (region and partisanship):
V = √( 18 / (100 × Min(3−1, 4−1)) ) = √.09 = .30
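As a quick check of the arithmetic, the same calculation in Python:

```python
import math

chi_square = 18    # from the region/partisanship crosstab
n = 100            # sample size
rows, cols = 3, 4  # party rows, region columns

# V = sqrt( chi-square / (N * min(R - 1, C - 1)) )
v = math.sqrt(chi_square / (n * min(rows - 1, cols - 1)))

print(round(v, 2))  # 0.3
```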
Relationships between Ordinal Variables

There are several measures of association appropriate for relationships between ordinal variables
• Gamma, Tau-b, Tau-c, Somers’ d

All are based on identifying concordant, discordant, and tied pairs of observations
Concordant Pairs: Ideology and Voting

Ideology - conserv (1), moderate (2), liberal (3)
Voting - never (1), sometimes (2), often (3)

Consider two hypothetical individuals in the sample with scores:
• Individual A: Ideology = 1, Voting = 1
• Individual B: Ideology = 2, Voting = 2
• Pair A&B are considered a concordant pair because B’s ideology score is greater than A’s score, and B’s voting score is greater than A’s score
Concordant Pairs (cont’d)

All of the following are concordant pairs:
• A(1,1) B(2,2)
• A(1,1) B(2,3)
• A(1,1) B(3,2)
• A(1,2) B(2,3)
• A(2,2) B(3,3)

Concordant pairs are consistent with a positive relationship between the IV and the DV (ideology and voting)
Discordant Pairs

All of the following are discordant pairs:
• A(1,2) B(2,1)
• A(1,3) B(2,2)
• A(2,2) B(3,1)
• A(1,2) B(3,1)
• A(3,1) B(1,2)

Discordant pairs are consistent with a negative relationship between the IV and the DV (ideology and voting)
Identifying Concordant Pairs

Concordant Pairs for Never - Conserv (1,1)
#Concordant = 80*70 + 80*10 + 80*20 + 80*80 = 14,400

                Conservative (1)   Moderate (2)   Liberal (3)
Never (1)              80               10             10
Sometimes (2)          20               70             10
Often (3)               0               20             80
Identifying Concordant Pairs

Concordant Pairs for Never - Moderate (1,2)
#Concordant = 10*10 + 10*80 = 900

                Conservative (1)   Moderate (2)   Liberal (3)
Never (1)              80               10             10
Sometimes (2)          20               70             10
Often (3)               0               20             80
Identifying Discordant Pairs

Discordant Pairs for Often - Conserv (1,3)
#Discordant = 0*10 + 0*10 + 0*70 + 0*10 = 0

                Conservative (1)   Moderate (2)   Liberal (3)
Never (1)              80               10             10
Sometimes (2)          20               70             10
Often (3)               0               20             80
Identifying Discordant Pairs

Discordant Pairs for Often - Moderate (2,3)
#Discordant = 20*10 + 20*10 = 400

                Conservative (1)   Moderate (2)   Liberal (3)
Never (1)              80               10             10
Sometimes (2)          20               70             10
Often (3)               0               20             80
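The cell-by-cell counting illustrated in the last four tables can be automated: for each cell, concordant pairs come from cells below and to the right, discordant pairs from cells below and to the left. A sketch, assuming both rows and columns are ordered from low to high (the `concordant_discordant` helper name is just for illustration):

```python
def concordant_discordant(table):
    """Count concordant and discordant pairs in an ordered crosstab."""
    c = d = 0
    rows, cols = len(table), len(table[0])
    for i in range(rows):
        for j in range(cols):
            for i2 in range(i + 1, rows):      # only rows below: same row/column = tied
                for j2 in range(cols):
                    if j2 > j:                 # below and to the right: concordant
                        c += table[i][j] * table[i2][j2]
                    elif j2 < j:               # below and to the left: discordant
                        d += table[i][j] * table[i2][j2]
    return c, d

# Rows: voting (never, sometimes, often); columns: ideology (cons, mod, lib)
votes = [
    [80, 10, 10],
    [20, 70, 10],
    [0, 20, 80],
]
c, d = concordant_discordant(votes)
print(c, d)  # 22900 1500
```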
Gamma

Gamma is calculated by identifying all possible pairs of individuals in the sample and determining if they are concordant or discordant

Gamma = (#C − #D) / (#C + #D)
Interpreting Gamma

• Gamma = 21,400/24,400 = .88
• Gamma ranges from −1 to +1
• Gamma does not account for tied pairs

Tau (b and c) and Somers’ d account for tied pairs in different ways
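Plugging the pair totals into the formula (the counts C = 22,900 and D = 1,500 follow from summing all the cell-by-cell products in the ideology/voting table, giving the slide's C − D = 21,400 and C + D = 24,400):

```python
c = 22900  # total concordant pairs in the ideology/voting table
d = 1500   # total discordant pairs

gamma = (c - d) / (c + d)  # 21400 / 24400

print(round(gamma, 2))  # 0.88
```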
Square tables: Tau-b
Non-square tables: Tau-c
Example

NES 2004 – What explains variation in one’s political ideology?
• Income?
• Education?
• Religion?
• Race?
Bivariate Relationships and Hypothesis Testing (Significance Testing)

1. Determine the null and alternative hypotheses
• Null: There is no relationship between X and Y (X and Y are statistically independent and test statistic = 0).
• Alternative: There IS a relationship between X and Y (test statistic does not equal 0).
Bivariate Relationships and Hypothesis Testing

2. Determine the appropriate test statistic (based on measurement levels of X and Y)

3. Identify the type of sampling distribution for the test statistic, and what it would look like if the null hypothesis were true.
Bivariate Relationships and Hypothesis Testing

4. Calculate the test statistic from the sample data and determine the probability of observing a test statistic this large (in absolute terms) if the null hypothesis is true.

P-value (significance level) – probability of observing a test statistic at least as large as our observed test statistic, if in fact the null hypothesis is true
Bivariate Relationships and Hypothesis Testing

5. Choose an “alpha level” – a decision rule to guide us in determining which values of the p-value lead us to reject/not reject the null hypothesis
• When the p-value is extremely small, we reject the null hypothesis (why?). The relationship is deemed “statistically significant.”
• When the p-value is not small, we do not reject the null hypothesis (why?). The relationship is deemed “statistically insignificant.”
• Most common alpha level: .05
Bottom Line

Assuming we will always use an alpha level of .05:
• Reject the null hypothesis if p-value < .05
• Do not reject the null hypothesis if p-value > .05
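For the region/partisanship example (Chi-Square = 18, df = (3 − 1)(4 − 1) = 6), this decision rule can be sketched end to end. The p-value below uses the closed-form chi-square survival function, which holds when df is even; `chi2_sf_even_df` is a hypothetical helper name, not a library function:

```python
import math

def chi2_sf_even_df(x, df):
    """P(Chi2_df >= x) for even df, via the Poisson-sum identity."""
    k = df // 2
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))

chi_square = 18
dof = (3 - 1) * (4 - 1)  # (rows - 1)(columns - 1) = 6

p_value = chi2_sf_even_df(chi_square, dof)
print(round(p_value, 4))  # 0.0062

# Decision rule with alpha = .05
print("reject the null" if p_value < 0.05 else "do not reject the null")
```

With a p-value of about .006, the region/partisanship relationship would be deemed statistically significant at the .05 alpha level.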
An Example

Dependent variable: Vote choice in 2000 (Gore, Bush, Nader)
Independent variable: Ideology (liberal, moderate, conservative)
An Example

1. Determine the null and alternative hypotheses.
An Example

Null Hypothesis: There is no relationship between ideology and vote choice in 2000.

Alternative (Research) Hypothesis: There is a relationship between ideology and vote choice (liberals were more likely to vote for Gore, while conservatives were more likely to vote for Bush).
An Example

2. Determine the appropriate test statistic (based on measurement levels of X and Y)

3. Identify the type of sampling distribution for the test statistic, and what it would look like if the null hypothesis were true.
Sampling Distributions for the Chi-Squared Statistic (under assumption of perfect independence)

df = (rows − 1)(columns − 1)
Bivariate Relationships and Hypothesis Testing

4. Calculate the test statistic from the sample data and determine the probability of observing a test statistic this large (in absolute terms) if the null hypothesis is true.

P-value (significance level) – probability of observing a test statistic at least as large as our observed test statistic, if in fact the null hypothesis is true
Bivariate Relationships and Hypothesis Testing

5. Choose an “alpha level” – a decision rule to guide us in determining which values of the p-value lead us to reject/not reject the null hypothesis
• When the p-value is extremely small, we reject the null hypothesis (why?). The relationship is deemed “statistically significant.”
• When the p-value is not small, we do not reject the null hypothesis (why?). The relationship is deemed “statistically insignificant.”
• Most common alpha level: .05
In-Class Exercise

For some years now, political commentators have cited the importance of a “gender gap” in explaining election outcomes. What is the source of the gender gap?

Develop a simple theory and corresponding hypothesis (where gender is the independent variable) which seeks to explain the source of the gender gap.

Specifically, determine:
• Theory
• Null and research hypothesis
• Test statistic for a cross-tabulation to test your hypothesis