One more example of a hypothesis test Chapter 10 ...jackd/Stat203_2011/Wk08_1_Full.pdfOne more...

Post on 01-Aug-2021

One more example of a hypothesis test

Chapter 10: Scatterplots

If time: Joy of Stats – 200 Countries, 200 Years

You should know this.

You should be familiar with all of this, but don’t waste too much time memorizing.

Alzheimer’s Onset and Gender (From Ch.7 exercises, #21)

We’ve been given a list of Alzheimer’s onset ages for men and women.

We want to find out if there is a difference between the age at which men get Alzheimer’s and the age at which women get it.

Alzheimer’s Onset     Men     Women
Sample Mean           67.75   66.55
Standard Deviation    6.58    5.34
Sample Size           8       9

Is this a one-sided or two-sided test?

“If there’s a difference” tells us this is…

______________

Is this a one-sided or two-sided test?

“If there’s a difference” tells us this is…

a two-sided test.

It’s a two-sided test…

Is it about means or proportions?

We’re talking about ages, so the ___________ is

appropriate.

It’s a two-sided test…

Is it about means or proportions?

We’re talking about ages, so the mean is

appropriate.

It’s a two-sided test of a mean or means.

We weren’t told otherwise, so we don’t know the true

standard deviation.

We use the sample standard deviation instead.

It’s a two-sided test of a mean or means, using the sample

standard deviation

We’re comparing men and women, so we’re

interested in ________________

It’s a two-sided, two-sample test of means, using the sample

standard deviation

We’re comparing men and women, so we’re

interested in two means

It’s a two-sided, two-sample test of means, using the sample

standard deviation

Is it an independent or a paired test?

There is a different number of people in each group, so they must be ________________

It’s a two-sided, independent two-sample test of means, using

the sample standard deviation.

Is it an independent or a paired test?

There is a different number of people in each group, so they must be independent.

Hope you’re not getting caught up in the concepts

We have this data:

Alzheimer’s Onset     Men     Women
Sample Mean           67.75   66.55
Standard Deviation    6.58    5.34
Sample Size           8       9

And we know to do an independent two-sample test.

All our formulae will have t and s in them, instead of z

and sigma.

If we do a confidence interval, it will have mu =

something, instead of pi = something.

We input the raw data into SPSS and then click

Analyze → Compare Means → Independent-Samples T Test

We get this for the first couple columns:

From this result we _____________ assume equal

variances. This is because __________________,

so we use the __________ row.

From this result we can assume equal variances.

This is because Sig. (p-value) is large, so we use the top row.

We get this for the middle five columns of the table.

Sig. (2-tailed) is large so we _________________ the null.

That means we detect __________________ between the Alzheimer’s onset ages of men and women.

Sig. (2-tailed) is large so we fail to reject the null.

That means we detect no significant difference between the Alzheimer’s onset ages of men and women.

We can also tell that the difference between the means was

________ standard errors. (t-score)

The area beyond this score (on both sides) was found on the t-

distribution with _____ degrees of freedom.

The area was ____________ (p-value)

We can also tell that the difference between the means was

0.413 standard errors. (t-score)

The area beyond this score (on both sides) was found on the t-

distribution with 15 degrees of freedom.

The area was .686 (p-value)
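
The t-score, degrees of freedom, and p-value can also be approximated from the summary table alone. The following is a minimal Python sketch, assuming equal variances (the pooled test from the top row). The function names are illustrative, and the p-value is found by numerically integrating the t density, which is not necessarily how SPSS computes it. Because it starts from the rounded summary statistics rather than the raw data, it gives a t-score of roughly 0.415 and a p-value of roughly 0.68, close to but not exactly the 0.413 and .686 in the SPSS output.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for an independent two-sample t-test with pooled variance."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Area beyond |t| on both sides, using Simpson's rule on [0, |t|]."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1 - 2 * area_zero_to_t

# Summary statistics from the table: men first, then women.
t, df = pooled_t(67.75, 6.58, 8, 66.55, 5.34, 9)
p = two_sided_p(t, df)
print(f"t = {t:.3f}, df = {df}, p = {p:.3f}")
```

The degrees of freedom are 8 + 9 - 2 = 15, matching the SPSS output.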

To a new chapter we go!

Chapter 10: Correlations

Correlations are one way to quantify and show the relationship between two features of the same object.

Usually this is between two sets of interval data; otherwise it’s called an association.

If two values increase together, they are said to be

positively correlated. (As one goes up, so does the

other)

If one value increases as the other decreases, they are

said to be negatively correlated.

Example: Longer bearded dragons tend to be larger all

around, so they weigh more.

Length and Weight are positively correlated in bearded

dragons.

Example: Heating bills tend to be a lot less when it’s

warmer out.

Heating Cost and Outdoor Temperature are

negatively correlated.

The most common graph to show two sets of interval

data together is the scatter plot.

Each dot represents a subject. In Length vs. Weight,

each dot is a dragon.

The height of the dot represents the length of the

dragon. How far it is to the right represents the weight

of the dragon.

The dragon for this dot is 18cm long, and weighs 700g.

There is an obvious upward trend in the graph. This

shows a positive correlation.

The negative correlation between heating cost and

outdoor temperature can be shown the same way.

The lack of correlation between two variables can also be shown in a scatterplot.

The strength of a correlation is how well the data points fit onto a straight line.

Stronger correlations are easier to see and have less

random scatter or variation.

We can quantify the strength and direction of a

correlation with the correlation coefficient.

The correlation coefficient, called r when it comes from a sample (we’ll see r frequently) and ρ, or rho, when it comes from a population (we’ll see rho rarely), is a value between -1 and 1 that tells how strong a correlation is and in what direction.

The stronger a correlation, the farther the coefficient is from

zero (and the closer it is to 1 or -1)

Positive correlations have positive coefficients r.

Negative correlations have negative coefficients r.

The stronger the negative correlation, the closer it is to -1.

A perfect correlation, one in which all the values fit perfectly on a line, has a correlation of 1 (for positive) or -1 (for negative).
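
As a rough illustration of these properties, here is a minimal Python sketch of the usual Pearson formula for r. The slides do not give a formula, so the function name `pearson_r` is an assumption, and the dragon and heating numbers are made up for the example.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical bearded dragons: weight in grams, length in cm.
weights = [300, 450, 500, 700, 820]
lengths = [12, 15, 14, 18, 21]

# Hypothetical heating costs that fall exactly on a downward-sloping line.
temps = [-5, 0, 5, 10, 15]
costs = [200 - 8 * temp for temp in temps]

r_dragons = pearson_r(weights, lengths)  # positive and close to 1
r_heating = pearson_r(temps, costs)      # perfect negative correlation
print(round(r_dragons, 3), round(r_heating, 3))
```

The dragon data has some scatter, so its r is strong but below 1. The heating data sits exactly on a line with a negative slope, so its r is -1.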

If there is no correlation at all, r will have a value of zero.

However, since r is from a sample, it will vary like everything

else from a sample. Instead of zero, it usually has some value

close to zero on either side.
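
This sampling variation can be seen in a small simulation. The sketch below is only illustrative: it draws two unrelated variables five times and computes r for each sample. The sample size of 30, the seed, and the function name `pearson_r` are arbitrary choices made for this example.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(203)  # arbitrary seed so the run is repeatable
n = 30            # arbitrary sample size

sample_rs = []
for _ in range(5):
    # xs and ys are generated independently, so the true correlation is zero.
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [random.gauss(0, 1) for _ in range(n)]
    sample_rs.append(pearson_r(xs, ys))

print([round(r, 2) for r in sample_rs])
```

Each printed value should be small in size, some positive and some negative, but none exactly zero.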

Recall the Burnaby vs. Coquitlam gas example from last week.

One reason a paired t-test was appropriate was that gas prices in the two cities were correlated.

Guess the correlation coefficient between Burnaby and

Coquitlam gas prices.

A) r = 0.05 B) r = 0.97

C) r = 0.592 D) r = -0.592

C) r = 0.592

There is a relationship between Burnaby and Coquitlam gas

prices, but it’s not a perfect relationship.

It’s positive, so the correlation coefficient r is positive, not negative.

Which of these is a possible correlation coefficient?

A) r = -0.28

B) r = 1.21

C) r = 0.41 grams per bean.

Which of these is a possible correlation coefficient?

A) r = -0.28

r is always between -1 and 1. Also, it has no units, so ‘grams per bean’ doesn’t make much sense, even though it’s a relationship between two variables.
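
Both points can be checked with a short Python sketch. The helper names and the measurements are hypothetical, chosen only for illustration: the first helper tests whether a number lies in the allowed range, and the second computes r so we can see that switching from grams to kilograms leaves it unchanged.

```python
import math

def is_possible_r(value):
    """A correlation coefficient must lie between -1 and 1 inclusive."""
    return -1 <= value <= 1

def pearson_r(xs, ys):
    """Sample correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(is_possible_r(-0.28))  # option A is in range
print(is_possible_r(1.21))   # option B is out of range

# Made-up measurements: the same weights expressed in two different units.
lengths_cm = [12, 15, 14, 18, 21]
weights_g = [300, 450, 500, 700, 820]
weights_kg = [w / 1000 for w in weights_g]

r_grams = pearson_r(weights_g, lengths_cm)
r_kilograms = pearson_r(weights_kg, lengths_cm)
print(round(r_grams, 6), round(r_kilograms, 6))  # same value either way
```

Changing the units rescales the data but not r, which is why r is reported as a bare number.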

Joy of Stats 28:45 – 33:00 (200 Countries, 200 Years)

Next time: r-squared, significance test for correlation, nonlinearity.