Inferential Statistics


• Explore relationships between variables

• Test hypotheses

– Research hypothesis: a statement of the relationship between variables.

• An increase in the number of stressors as measured by the LE scale will correspond to an increase in the number of illness incidents as measured by self-report.

– Statistical hypothesis: mathematical statement implying a relationship between variables

– Null hypothesis: mathematical statement implying no relationship


Examples

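• You are looking at the relationship between GPA and gender (μ_f and μ_m are the mean GPAs of female and male students)

– Null hypothesis: H0: μ_f = μ_m

– Statistical hypothesis: H1: μ_f ≠ μ_m

• You are looking at the relationship between stressors and illness (ρ is the correlation between the two in the population)

– Statistical hypothesis: H1: ρ ≠ 0

– Null hypothesis: H0: ρ = 0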


Tests

• Crosstab

• Difference of Means

• ANOVA

• Bivariate correlation


Crosstab

• Used for 2 categorical variables

• Like a sort box, only you are using frequencies instead of objects

[Figure: sort box with shapes (triangles, circles) sorted by color (blue, orange)]


• I want to use a crosstab to see if there is a difference in referral patterns by school

• SPSS: Analyze – Descriptive Statistics – Crosstabs

SCHOOL * Q14. Food Crosstabulation (Count)

                          Q14. Food
SCHOOL                    Yes      No       Total
Madison cross roads       21       55       76
Rainbow                   4        85       89
Rolling hills             14       96       110
New Hope                  22       78       100
Morris                    42       168      210
Total                     103      482      585

This is the raw sorted data. It gives you a sense of what’s going on, but let’s add percentages…


• To read this table, first search for the 100% and then follow the line up or down.

SCHOOL * Q14. Food Crosstabulation

                                              Q14. Food
SCHOOL                                        Yes       No        Total
Madison cross roads   Count                   21        55        76
                      % within SCHOOL         27.6%     72.4%     100.0%
                      % within Q14. Food      20.4%     11.4%     13.0%
Rainbow               Count                   4         85        89
                      % within SCHOOL         4.5%      95.5%     100.0%
                      % within Q14. Food      3.9%      17.6%     15.2%
Rolling hills         Count                   14        96        110
                      % within SCHOOL         12.7%     87.3%     100.0%
                      % within Q14. Food      13.6%     19.9%     18.8%
New Hope              Count                   22        78        100
                      % within SCHOOL         22.0%     78.0%     100.0%
                      % within Q14. Food      21.4%     16.2%     17.1%
Morris                Count                   42        168       210
                      % within SCHOOL         20.0%     80.0%     100.0%
                      % within Q14. Food      40.8%     34.9%     35.9%
Total                 Count                   103       482       585
                      % within SCHOOL         17.6%     82.4%     100.0%
                      % within Q14. Food      100.0%    100.0%    100.0%
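The two kinds of percentages in the crosstab can be reproduced from the raw counts. The following is a minimal sketch in plain Python rather than SPSS; the function name `crosstab_percentages` and the dictionary layout are illustrative assumptions, and only the counts come from the table.

```python
# Counts copied from the SCHOOL * Q14. Food crosstab: (Yes, No) per school.
counts = {
    "Madison cross roads": (21, 55),
    "Rainbow": (4, 85),
    "Rolling hills": (14, 96),
    "New Hope": (22, 78),
    "Morris": (42, 168),
}


def crosstab_percentages(table):
    """Return {row label: (row percentages, column percentages)}.

    Hypothetical helper for illustration; not part of SPSS.
    Row percentages correspond to "% within SCHOOL",
    column percentages to "% within Q14. Food".
    """
    n_cols = len(next(iter(table.values())))
    col_totals = [sum(row[j] for row in table.values()) for j in range(n_cols)]
    result = {}
    for label, row in table.items():
        row_total = sum(row)
        row_pcts = tuple(100 * c / row_total for c in row)
        col_pcts = tuple(100 * c / t for c, t in zip(row, col_totals))
        result[label] = (row_pcts, col_pcts)
    return result


percentages = crosstab_percentages(counts)
for school, (within_school, within_food) in percentages.items():
    print(
        f"{school:20s} % within SCHOOL: {within_school[0]:5.1f} / {within_school[1]:5.1f}"
        f"   % within Q14. Food: {within_food[0]:5.1f} / {within_food[1]:5.1f}"
    )
```

Row percentages answer "of the children at this school, what share said Yes?", while column percentages answer "of all the children who said Yes, what share came from this school?".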


• The significance test for a crosstab is the chi-square (χ²) test

Chi-Square Tests

                        Value       df    Asymp. Sig. (2-sided)
Pearson Chi-Square      19.778(a)   4     .001
Likelihood Ratio        22.855      4     .000
N of Valid Cases        585

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 13.38.

The p-value for this hypothesis test is 0.001; therefore you would reject the null hypothesis.
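To see where the Pearson chi-square value comes from, the calculation can be sketched in plain Python from the crosstab counts. This is an illustrative sketch, not SPSS's implementation; the function names are assumptions, and the p-value helper uses a closed-form tail formula that only holds for an even number of degrees of freedom (here df = 4).

```python
import math

# Observed counts from the SCHOOL * Q14. Food crosstab: [Yes, No] per school.
observed = [
    [21, 55],   # Madison cross roads
    [4, 85],    # Rainbow
    [14, 96],   # Rolling hills
    [22, 78],   # New Hope
    [42, 168],  # Morris
]


def pearson_chi_square(table):
    """Return (chi-square statistic, degrees of freedom, minimum expected count)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            chi2 += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, min_expected


def chi_square_p_value_even_df(chi2, df):
    """Upper-tail chi-square probability; this shortcut is valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed-form tail formula requires a positive even df")
    half = chi2 / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


chi2, df, min_expected = pearson_chi_square(observed)
p_value = chi_square_p_value_even_df(chi2, df)
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value:.4f}, "
      f"minimum expected count = {min_expected:.2f}")
```

If SciPy is available, `scipy.stats.chi2_contingency` performs the same test for any table size and is the more practical choice outside of a teaching example.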


• Pearson's chi-square is by far the most common type of chi-square significance test. If simply "chi-square" is mentioned, it is probably Pearson's chi-square. This statistic is used to test the hypothesis of no association of columns and rows in tabular data. It can be used even with nominal data. Note that chi square is more likely to establish significance to the extent that (1) the relationship is strong, (2) the sample size is large, and/or (3) the number of values of the two associated variables is large. A chi-square probability of .05 or less is commonly interpreted by social scientists as justification for rejecting the null hypothesis that the row variable is unrelated (that is, only randomly related) to the column variable.


Difference of means test

• One variable is dichotomous (2 categories only) and the other is continuous

• t-test for significance of the difference of means

– Can look at means within one sample and means between two samples.

• I want to look at the difference in depression symptoms between women who retained custody of their children vs. women who did not after an allegation of child sexual abuse.


• SPSS: Analyze – Compare Means – Independent-Samples T Test

Group Statistics (DEPTOT)

You kept custody of
your child(ren)         N      Mean       Std. Deviation   Std. Error Mean
no                      17     6.8824     5.7105           1.3850
Yes                     101    10.1881    5.4013           .5375

Independent Samples Test (DEPTOT)

                                           Equal variances   Equal variances
                                           assumed           not assumed
Levene's Test for Equality of Variances
  F                                        .042
  Sig.                                     .837
t-test for Equality of Means
  t                                        -2.316            -2.225
  df                                       116               21.105
  Sig. (2-tailed)                          .022              .037
  Mean Difference                          -3.3058           -3.3058
  Std. Error Difference                    1.4274            1.4856
  95% Confidence Interval of the Difference
    Lower                                  -6.1330           -6.3944
    Upper                                  -.4786            -.2172

I can use the p-value of 0.022 (the "Equal variances assumed" column) because the standard deviations are close enough to assume equal variances. SPSS also tests this with Levene’s test for equality of variances in the table above: its significance of .837 is well above 0.05, so the assumption of equal variances is not rejected.
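The t statistics in the SPSS output can be recomputed from the group statistics alone. The sketch below, in plain Python, assumes only the N, mean, and standard deviation reported for each group; the function names are illustrative, and the p-values are left out because they need the t distribution, which the standard library does not provide.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """t-test assuming equal variances: return (t, df, standard error)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df, se


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """t-test not assuming equal variances: return (t, df, standard error)."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return (mean1 - mean2) / se, df, se


# Group statistics from the slide: custody "no" first, then custody "Yes".
no_custody = (6.8824, 5.7105, 17)
kept_custody = (10.1881, 5.4013, 101)

t_pooled, df_pooled, se_pooled = pooled_t(*no_custody, *kept_custody)
t_welch, df_welch, se_welch = welch_t(*no_custody, *kept_custody)
print(f"equal variances assumed:     t = {t_pooled:.3f}, df = {df_pooled}, SE = {se_pooled:.4f}")
print(f"equal variances not assumed: t = {t_welch:.3f}, df = {df_welch:.3f}, SE = {se_welch:.4f}")
```

With SciPy installed, `scipy.stats.ttest_ind_from_stats` takes the same summary statistics and also returns the p-value; its `equal_var` argument switches between the two versions of the test.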


ANOVA

• One variable is categorical with 3 or more categories and the other is continuous

• Compares the variation between groups with the variation within groups.

– Takes into account the mean and the variability

• The ANOVA uses the F-test for significance.

– F is the between-groups mean square divided by the within-groups mean square


I am trying to find out if the schools serve children of different grades

SPSS: Analyze – Compare Means – One-Way ANOVA

Descriptives (Q11. Grade)

                                                           95% Confidence Interval for Mean
                       N     Mean   Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
Madison cross roads    40    2.42   1.63             .26          1.90          2.95          0         5
Rainbow                73    2.90   1.68             .20          2.51          3.30          0         6
Rolling hills          74    2.76   1.87             .22          2.32          3.19          0         11
New Hope               62    3.68   2.42             .31          3.06          4.29          0         8
Morris                 165   2.18   1.74             .14          1.91          2.45          0         5
Total                  414   2.66   1.92             9.44E-02     2.47          2.85          0         11

ANOVA (Q11. Grade)

                  Sum of Squares   df    Mean Square   F       Sig.
Between Groups    109.159          4     27.290        7.883   .000
Within Groups     1415.819         409   3.462
Total             1524.978         413

The p-value is less than 0.001, so I can reject the null hypothesis
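The F statistic is simply a ratio of two mean squares, which can be checked against the ANOVA table. The sketch below is plain Python with illustrative function names; the first check uses the sums of squares reported by SPSS, and the second uses a small made-up data set (hypothetical, not from the study) to show how the sums of squares arise from raw scores.

```python
def f_ratio(ss_between, df_between, ss_within, df_within):
    """F = between-groups mean square / within-groups mean square."""
    return (ss_between / df_between) / (ss_within / df_within)


def one_way_anova(groups):
    """Return (F, df_between, df_within) computed from raw scores per group."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]
    # Between groups: how far each group mean sits from the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Within groups: how far each score sits from its own group mean.
    ss_within = sum(
        (value - mean) ** 2
        for group, mean in zip(groups, group_means)
        for value in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return f_ratio(ss_between, df_between, ss_within, df_within), df_between, df_within


# Check against the sums of squares in the SPSS ANOVA table.
f_from_table = f_ratio(109.159, 4, 1415.819, 409)
print(f"F from the SPSS table: {f_from_table:.3f}")

# Hypothetical toy data: three groups of three scores each.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_toy, df_b, df_w = one_way_anova(toy_groups)
print(f"F for the toy data: {f_toy:.2f} with df = ({df_b}, {df_w})")
```

A large F means the group means differ by more than the spread inside the groups would lead you to expect.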


Bivariate Correlations

• Both variables are continuous

• Measure of the association between the two variables

• Pearson's r is the usual measure of correlation, sometimes called product-moment correlation. It is a measure of association which varies from -1 to +1, with 0 indicating no linear relationship (random pairing of values) and +1 indicating a perfect positive relationship, taking the form, "The more the x, the more the y, and vice versa." A value of -1 is a perfect negative relationship, taking the form "The more the x, the less the y, and vice versa."


SPSS: Analyze – Correlate – Bivariate

Correlations

                                                   Q5. child age
                                                   first abuse     SEVERE2
Q5. child age first abuse   Pearson Correlation    1.000           .205**
                            Sig. (2-tailed)        .               .000
                            N                      412             412
SEVERE2                     Pearson Correlation    .205**          1.000
                            Sig. (2-tailed)        .000            .
                            N                      412             418

**. Correlation is significant at the 0.01 level (2-tailed).

The Pearson r is 0.205, which shows a weak association between the two variables. The p-value is less than 0.001, so it is significant. If you remember, the proportion of variance explained is r-squared (0.205² ≈ 0.042), which means that child age at first abuse explains about 4% of the variance in abuse severity.
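As a rough illustration of how r and r-squared are obtained, the sketch below computes Pearson's r in plain Python. The function name `pearson_r` and the five data pairs are hypothetical (the study's raw data are not in the slides); only the value 0.205 comes from the SPSS output.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical toy data, for illustration only.
toy_x = [1, 2, 3, 4, 5]
toy_y = [2, 4, 5, 4, 5]
r_toy = pearson_r(toy_x, toy_y)
print(f"toy data: r = {r_toy:.3f}, r squared = {r_toy ** 2:.3f}")

# Variance explained for the r reported by SPSS.
r_reported = 0.205
variance_explained = r_reported ** 2
print(f"reported r = {r_reported}: r squared = {variance_explained:.3f} "
      f"(about {variance_explained:.0%} of the variance)")
```

Squaring r is what turns a "weak but significant" correlation of 0.205 into the more sobering statement that roughly 4% of the variance is shared.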

[Figure: Scatterplot of age of child at first abuse and abuse severity. x-axis: Q5. child age first abuse (-10 to 20); y-axis: SEVERE2 (0 to 12)]


What does the p-value really mean?

• Based on the idea of the sampling distribution.

– If you have a population and repeatedly sample that population, the sample means will form an approximately normal distribution (provided the samples are reasonably large)

– If SPSS reports a p-value of less than 0.05 for a result, that means that if there is no relationship between the variables in the population and you take 100 samples from the population, you would expect to find a relationship at least as strong as the one SPSS found in fewer than 5 of those 100 samples.
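The "how often would this happen if there were no relationship?" idea can be demonstrated by simulation. The sketch below uses a permutation test in plain Python, which is a different technique from the formulas SPSS uses: it shuffles one variable to break any real relationship, then counts how often the shuffled data produce a correlation at least as strong as the observed one. The data, function names, number of shuffles, and seed are all hypothetical choices for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


def permutation_p_value(xs, ys, n_shuffles=5000, seed=0):
    """Share of shuffles whose |r| is at least as large as the observed |r|."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        # Shuffling y pairs each x with a random y: a world with no relationship.
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return hits / n_shuffles


# Hypothetical data with a strong positive relationship.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]

r_observed = pearson_r(x, y)
p_simulated = permutation_p_value(x, y)
print(f"observed r = {r_observed:.3f}, simulated p-value = {p_simulated:.4f}")
```

A simulated p-value below 0.05 says the same thing as the slide: when there is truly no relationship, fewer than 5 in 100 samples would show an association this strong.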