Chapter14


Transcript of Chapter14

Page 1: Chapter14


Interpreting Data

Page 2: Chapter14

•Empirical research usually uses some type of statistical analysis

•Mathematics – Language for accomplishing logical operations inherent in good data analysis

•Statistics – Branch of math appropriate to research

•Descriptive statistics – Method for describing data in manageable forms

•Inferential statistics – Assist in forming conclusions from our observations

•About a population, based on studying the sample


Page 3: Chapter14

•Univariate Analysis: Only one variable at a time

•Bivariate Analysis: Two variables

•Multivariate Analysis: Three or more variables

•Distributions – Reporting all individual cases

•Marginals – Frequency distributions of grouped data (age of students)

•Frequency Distribution: (2, 7, 11, 14, 16)

Class Frequency

1-9 2

10-19 3
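
A minimal Python sketch of how the grouped frequency distribution above could be tallied from the five raw values. The values and class intervals come from the slide; the function name `frequency_distribution` is an illustrative assumption.

```python
from collections import Counter

def frequency_distribution(values, classes):
    """Count how many values fall in each (low, high) class, bounds inclusive."""
    counts = Counter()
    for v in values:
        for low, high in classes:
            if low <= v <= high:
                counts[(low, high)] += 1
                break
    # Report every class in the order given, including empty ones
    return {f"{low}-{high}": counts[(low, high)] for low, high in classes}

values = [2, 7, 11, 14, 16]      # raw cases from the slide
classes = [(1, 9), (10, 19)]     # class intervals from the slide
table = frequency_distribution(values, classes)

for label, freq in table.items():
    print(label, freq)
```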


Page 4: Chapter14

•“Summary Averages”

•Mode: Most frequent attribute

•Mean: Sum of all values divided by the number of values

•Median: Middle attribute of ranked data
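
The three averages can be sketched with Python's standard `statistics` module. The list of ages is hypothetical, invented only to illustrate the definitions.

```python
import statistics

# Hypothetical ages, invented for illustration
ages = [13, 14, 14, 15, 15, 15, 16, 17, 19]

mode = statistics.mode(ages)      # most frequent value
mean = statistics.mean(ages)      # sum of values divided by number of values
median = statistics.median(ages)  # middle value once the data are ranked

print(mode, mean, median)
```

With skewed data the three measures can differ noticeably, which is one reason to report more than one of them.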


Page 5: Chapter14

•Range: Distance separating the highest value from the lowest value

•Standard Deviation: The average amount of variation about the mean; the square root of the variance

•Variance: Sum of squared deviations from the mean divided by the total number of cases

•Percentile: What percentage of cases fall at or below some value; can be grouped into quartiles

•Rates: Used to standardize some measure for comparative purposes
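
A short sketch of the dispersion measures, assuming the population form of the variance (dividing by the number of cases, as the slide states). The scores and the helper names `percent_at_or_below` and `rate_per` are hypothetical.

```python
import math

# Hypothetical scores, invented for illustration
scores = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(scores) - min(scores)        # highest minus lowest value
mean = sum(scores) / len(scores)
# Variance: sum of squared deviations from the mean / number of cases
variance = sum((x - mean) ** 2 for x in scores) / len(scores)
std_dev = math.sqrt(variance)                  # square root of the variance

def percent_at_or_below(values, cutoff):
    """Percentage of cases that fall at or below `cutoff`."""
    return 100 * sum(1 for v in values if v <= cutoff) / len(values)

def rate_per(events, population, base=100_000):
    """Standardize a count as a rate per `base` population."""
    return events * base / population

print(value_range, variance, std_dev)
print(percent_at_or_below(scores, 5))
print(rate_per(50, 25_000, base=1_000))
```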


Page 6: Chapter14

•We are interested in how variables are related (explanation)

•Contingency table: Used to compare subgroups; percentage down the columns, read across the rows

•Values of the dependent variable are contingent on values of the independent variable


Own a Gun? Men Women

Yes 49% 24%

No 51 76

100% = (1,270) (633)
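
The "percentage down, read across" rule can be sketched in Python. The cell counts below are an assumption: they are approximate values reconstructed from the table's rounded percentages and column bases, not published counts.

```python
# Approximate counts reconstructed from the rounded percentages and the
# column bases (1,270 and 633) in the table; an assumption for illustration.
counts = {
    "Men":   {"Yes": 622, "No": 648},
    "Women": {"Yes": 152, "No": 481},
}

def percentage_down(table):
    """Convert each column's counts to percentages of that column's total."""
    result = {}
    for column, cells in table.items():
        base = sum(cells.values())
        result[column] = {row: round(100 * n / base) for row, n in cells.items()}
    return result

percents = percentage_down(counts)

# Read across each row to compare the subgroups
for row in ("Yes", "No"):
    print(row, percents["Men"][row], percents["Women"][row])
```

Percentaging within categories of the independent variable (gender) lets the two subgroups be compared directly even though their sizes differ.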

Page 7: Chapter14

•Instead of explaining the dependent variable on the basis of a single independent variable, we seek an explanation through the use of more than one independent variable


Assault Rates, Poverty, & Mobility in 60 Boston Neighborhoods

_______________Mobility________________

Poverty Low High Total

Low 12.2 (22) 19.5 (21) 15.8 (43)

High 43.8 (4) 25.0 (13) 29.4 (17)

Total 17.1 (26) 21.6 (34) 19.6 (60)
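
One way to read the table, sketched in Python: assuming each cell shows an average assault rate with the number of neighborhoods in parentheses, the Total row and column are averages of the cell rates weighted by those counts. The function names are illustrative.

```python
# (average assault rate, number of neighborhoods), keyed by (poverty, mobility).
# Values are taken from the table; the interpretation is an assumption.
cells = {
    ("low", "low"): (12.2, 22),
    ("low", "high"): (19.5, 21),
    ("high", "low"): (43.8, 4),
    ("high", "high"): (25.0, 13),
}

def weighted_rate(selected):
    """Average of cell rates, weighted by the number of neighborhoods."""
    total_n = sum(n for _, n in selected)
    return sum(rate * n for rate, n in selected) / total_n

def marginal(position, level):
    """Weighted rate for one level of poverty (position 0) or mobility (1)."""
    return weighted_rate([v for k, v in cells.items() if k[position] == level])

low_poverty = marginal(0, "low")
high_poverty = marginal(0, "high")
low_mobility = marginal(1, "low")
high_mobility = marginal(1, "high")
overall = weighted_rate(list(cells.values()))

print(round(low_poverty, 1), round(high_poverty, 1))
print(round(low_mobility, 1), round(high_mobility, 1), round(overall, 1))
```

Comparing rates within each level of poverty shows how mobility relates to assault once poverty is held constant, which is the point of adding a second independent variable.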

Page 8: Chapter14

•Indicates strength of relationship (0 to 1)

•Based on Proportionate Reduction of Error (PRE):

•How much variation in y can be predicted by x; how much you can reduce your error in predicting y by knowing x

•The greater the relationship between two variables, the greater the reduction of error
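
The PRE logic can be sketched with lambda for two nominal variables: count the errors made when guessing y without knowing x, count them again when x is known, and express the reduction as a proportion. The counts below are hypothetical, and the function name `lambda_pre` is an illustrative assumption.

```python
def lambda_pre(table):
    """Lambda for predicting y from x.

    `table` maps each x category to a dict of {y category: count}.
    """
    # Errors when guessing the overall modal y category for every case
    y_totals = {}
    for cells in table.values():
        for y, n in cells.items():
            y_totals[y] = y_totals.get(y, 0) + n
    errors_without_x = sum(y_totals.values()) - max(y_totals.values())

    # Errors when guessing the modal y category within each x category
    errors_with_x = sum(
        sum(cells.values()) - max(cells.values()) for cells in table.values()
    )

    # Proportionate reduction of error (undefined if y has only one category)
    return (errors_without_x - errors_with_x) / errors_without_x

# Hypothetical counts, invented for illustration only
table = {
    "Men":   {"Employed": 900, "Unemployed": 100},
    "Women": {"Employed": 200, "Unemployed": 800},
}
pre = lambda_pre(table)
print(round(pre, 2))
```

Here guessing "Employed" for everyone produces 900 errors, while guessing the modal category within each gender produces 300, so knowing x removes two thirds of the errors.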


Page 9: Chapter14

•Nominal Variables: Gender, marital status, or race

•Lambda (λ): Based on your ability to guess values on one of the variables

•Ordinal Variables: Occupational status, education

•Gamma (γ): Same as lambda, except based on the ordinal arrangement of values

•Interval or Ratio Variables: Age, income

•Pearson’s product-moment correlation (r)
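
A hedged sketch of the ordinal and interval/ratio measures. Gamma is computed here from pairs of cases ranked the same way or in opposite ways on both variables, and r from the usual sums of cross-products and squares. All data are hypothetical.

```python
import math

def gamma(pairs):
    """Gamma from a list of (x, y) ordinal scores."""
    same = opposite = 0
    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            dx = pairs[i][0] - pairs[j][0]
            dy = pairs[i][1] - pairs[j][1]
            if dx * dy > 0:
                same += 1       # ranked the same way on both variables
            elif dx * dy < 0:
                opposite += 1   # ranked in opposite ways
            # pairs tied on either variable are ignored
    return (same - opposite) / (same + opposite)

def pearson_r(xs, ys):
    """Pearson's product-moment correlation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical ordinal scores (1 = low, 3 = high), invented for illustration
ordinal_pairs = [(1, 1), (1, 2), (2, 2), (2, 3), (3, 3), (3, 1)]
g = gamma(ordinal_pairs)

# Hypothetical interval-level data, invented for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)

print(round(g, 2), round(r, 2))
```

Unlike lambda, gamma and r can be negative, so they indicate the direction of a relationship as well as its strength.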


Page 10: Chapter14

• Variables are linearly related:

• The mean of Y changes linearly with X

• Check scatterplot for general linear trend

• Watch out for non-linear relationships (e.g., U-shaped)

• Y is normally distributed for every outcome of X in the population (“conditional normality”)

• Ex: Income = X, Happiness = Y

• Is a histogram of Happiness approximately normal for those with X = $25K? $50K? $100K?

• If all are roughly normal, the assumption is met


Page 11: Chapter14

• Association between two variables: Y = f(X)

• Regression Line: When the points lie on a straight line, we can superimpose that line over the points; Y' = a + b(X)

• “Unexplained Variation”: The sum of squared differences between actual and estimated values of Y

• Represents errors that exist even when estimates are based on known values of X

• “Explained Variation”: The difference between the total variation and the unexplained variation
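
A minimal sketch of these quantities, assuming the line Y' = a + b(X) is fitted by ordinary least squares (the slide does not name the fitting method). The data points and the function name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates of a (intercept) and b (slope) in Y' = a + b(X)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data, invented for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
predicted = [a + b * x for x in xs]          # estimated values Y'
mean_y = sum(ys) / len(ys)

# Total variation: squared differences between actual Y and its mean
total_variation = sum((y - mean_y) ** 2 for y in ys)
# Unexplained variation: squared differences between actual and estimated Y
unexplained = sum((y - p) ** 2 for y, p in zip(ys, predicted))
# Explained variation: total minus unexplained
explained = total_variation - unexplained
r_squared = explained / total_variation      # share of variation explained

print(round(a, 2), round(b, 2), round(r_squared, 2))
```

The ratio of explained to total variation equals the square of Pearson's r, which links the regression line back to the PRE idea of reducing prediction error.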


Page 12: Chapter14

• When we generalize from samples to larger populations, we use inferential statistics to test the significance of an observed relationship

• Data analysis & sampling

• Most research projects involve samples

• Ultimate purpose is to make inferences about the larger (target) population

• Both univariate and multivariate findings can be interpreted as a basis for inference


Page 13: Chapter14

• Univariate Measures: Percentages & Means

• “Standard Error”: $s = \sqrt{\frac{p \times q}{n}}$

• Any statement of sampling error must contain two essential components:

• Confidence Level

• Confidence Interval

• Inferential statistics apply to sampling error only; they do not take account of nonsampling errors
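
A sketch of how a confidence level and confidence interval follow from the standard error of a proportion, the square root of (p × q) / n, where p is the sample proportion, q = 1 − p, and n is the sample size. The sample values are hypothetical, and the multiplier 1.96 assumes a 95% confidence level under a normal approximation.

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion p with sample size n."""
    q = 1 - p
    return math.sqrt(p * q / n)

def confidence_interval(p, n, z=1.96):
    """Interval around p; z = 1.96 corresponds to a 95% confidence level."""
    margin = z * standard_error(p, n)
    return p - margin, p + margin

# Hypothetical sample: 50% of 400 respondents, invented for illustration
p, n = 0.5, 400
se = standard_error(p, n)
low, high = confidence_interval(p, n)

print(round(se, 3), round(low, 3), round(high, 3))
```

The interval reflects sampling error only; it says nothing about nonsampling errors such as biased questions or nonresponse.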


Page 14: Chapter14

• So, two variables are related? Is the relationship a significant one?

• Parametric tests of significance can tell us

• We report probability that a parameter falls within a certain range (confidence interval) & that degree of uncertainty is due to normal sampling error


Page 15: Chapter14

•Statistical significance is expressed with probabilities

•What does P < .05 or .01 or .001 mean?

•Significance at the .05 level means that the probability of obtaining the result by chance alone (sampling error) is no more than 5 in 100 (1 in 100 at the .01 level)

•If it’s not by chance, it represents a real relationship between the variables!


Page 16: Chapter14

• Based on the Null Hypothesis: the assumption that there is no relationship between two variables in a population

• Compares what you get (empirical) with what you expect given a null hypothesis of no relationship

• Computing: For each cell in the table, we

• Subtract the expected frequency for that cell from the observed frequency

• Square this quantity, and

• Divide the squared difference by the expected frequency

• Sum these quantities across all cells to obtain chi-square (χ²)
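
The cell-by-cell computation described above is the chi-square statistic, sketched here in Python. Expected frequencies under the null hypothesis are taken as row total × column total / N. The observed counts are an assumption: approximate values reconstructed from the rounded percentages in the gun-ownership table earlier in the chapter.

```python
def chi_square(observed):
    """Chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Frequency expected if the two variables were unrelated
            expected = row_totals[i] * col_totals[j] / n
            # Subtract, square, divide by expected, and accumulate
            chi2 += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Rows: own a gun (yes, no); columns: men, women.
# Approximate counts reconstructed from rounded percentages; an assumption.
observed = [[622, 152],
            [648, 481]]

chi2, df = chi_square(observed)

CRITICAL_05_DF1 = 3.841   # chi-square critical value at the .05 level, df = 1
significant = chi2 > CRITICAL_05_DF1
print(round(chi2, 1), df, significant)
```

A larger chi-square means the observed table departs further from what the null hypothesis predicts; the value is then compared with a critical value for the table's degrees of freedom.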


Page 17: Chapter14

• Significance tests are a guideline, not an ultimate standard

•Dangers due to sampling error, sample size, etc.

•Check and compare to other tests

•"Empirical research is, first and foremost, a logical rather than a mathematical operation."


Page 18: Chapter14

• What is a statistically discernible difference?

• A result from tests on a non-random sample that would be considered statistically significant had it been found in a random sample

• Such findings should be viewed as important but not statistically significant
