P Values Robin Beaumont 10/10/2011 With much help from Professor Chris Wilds material University of...

P Values

Robin Beaumont 10/10/2011

With much help from Professor Chris Wild's material, University of Auckland

Where do they fit in!

Putting it all together

P Value

sampling

probability

statistic

Rule

Populations and samples

Ever constant, at least for your study!

= Parameter

estimate = statistic

One sample

Size matters – single samples

Size matters – multiple samples

We only have a rippled mirror

Standard deviation - individual level

= measure of variability

'Standard Normal distribution'

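[Figure: standard normal curve. Total area = 1; horizontal axis in SD units (0, 1, 2); about 68% of the area lies within 1 SD of the mean and about 95% within 2 SD]

Area:

Between + and - three standard deviations from the mean = 99.7% of the area. Therefore only 0.3% of the area (scores) is more than 3 standard deviations ('units') away.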
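The areas quoted for the standard normal curve can be checked numerically. The following is a minimal sketch using Python's standard library; the helper name `area_within` is introduced here for illustration only. Note that the slide's 68% and 95% are rounded values.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
z = NormalDist(mu=0.0, sigma=1.0)

def area_within(k):
    """Area under the standard normal curve between -k and +k SDs."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    # Print the central area for 1, 2 and 3 standard deviations
    print(f"within +/- {k} SD: {area_within(k):.4f}")
```

The area outside three standard deviations is simply `1 - area_within(3)`, which corresponds to the 0.3% quoted above.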

But does not take into account sample size

= t distribution

Defined by sample size aspect ~ df

Area! Wait and see

Sampling level -‘accuracy’ of estimate

From: http://onlinestatbook.com/stat_sim/sampling_dist/index.html

SEM = (standard deviation of sample) / √(number in sample)

SEM = 5/√5 = 2.236

SEM = 5/√25 = 1

We can predict the accuracy of your estimate (mean) by just using the SEM formula.
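As a small illustration of the SEM formula, the sketch below reproduces the two slide examples (SD = 5 with n = 5 and n = 25). The function name `sem` is an assumption made for this example.

```python
import math

def sem(sd, n):
    """Standard error of the mean: sample SD divided by the square root of n."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sd / math.sqrt(n)

sem_n5 = sem(5, 5)    # first slide example: 5 / sqrt(5)
sem_n25 = sem(5, 25)  # second slide example: 5 / sqrt(25)
print(round(sem_n5, 3), round(sem_n25, 3))
```

With the same standard deviation, the larger sample gives the smaller SEM, which is the sense in which size matters.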

From a single sample

Talking about means here

http://onlinestatbook.com/stat_sim/sampling_dist/index.html

Example - Bradford Hill, (Bradford Hill, 1950 p.92)

• mean systolic blood pressure for 566 males around Glasgow = 128.8 mm. Standard deviation =13.05

• Determine the ‘precision’ of this mean.

• “We may conclude that our observed mean may differ from the true mean by as much as ± 2.194 (.5485 x 4) but not more than that in around 95% of observations.” (page 93, edited)
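The figures in the Bradford Hill example can be traced back to the SEM formula. This is a sketch of the arithmetic only; the variable names are chosen for illustration and the multiplier of 4 is taken directly from the quoted passage.

```python
import math

n = 566      # males in the Glasgow sample
sd = 13.05   # standard deviation of systolic blood pressure

# Standard error of the mean for this single sample
standard_error = sd / math.sqrt(n)

# The ".5485 x 4" that appears in the quoted passage
quoted_margin = 4 * standard_error

print(round(standard_error, 4), round(quoted_margin, 3))
```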

Sampling summary

• The SEM formula allows us to predict the accuracy of your estimate (i.e. the mean value of our sample)

• From a single sample

• Assumes random sample

Variation what have we ignored!

Onto Probability now

Probabilities are rel. frequencies

[Figure: Probability Distribution. x-axis: scores in bins 40-44, 45-49, 50-54, 55-59, 60-64, 65-69, 70-74, 75-79, 80-84, 85-90; y-axis: relative frequency = probability, 0 to 0.25; the total area = 1; total 48 scores]

[Figure: Frequency Distribution (Histogram) of exam results. x-axis: scores in the same bins; y-axis: frequency, 0 to 12]

All outcomes at any one time = 1

Multiple outcomes at any one time

Probability Density Function

[Figure: probability density curve of the exam scores. x-axis: scores, marked from 33 to 87; y-axis: density; the total area = 1; total 48 scores]

p(score < 45) = area A

p(score > 50) = area B

P(score < 45 or score > 50) = area A + area B

Just add up the individual outcomes
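To make "relative frequency = probability" and "just add up the individual outcomes" concrete, here is a minimal sketch. The counts per bin are hypothetical, chosen only so that they total 48, because the slide's actual bar heights are not recoverable from the text. Binned data cannot separate a score of exactly 50 from scores above 50, so the upper event is taken as the bins from 50-54 upward.

```python
# Hypothetical counts per score bin (assumption: they only need to total 48)
counts = {
    "40-44": 2, "45-49": 4, "50-54": 6, "55-59": 8, "60-64": 10,
    "65-69": 8, "70-74": 5, "75-79": 3, "80-84": 1, "85-90": 1,
}

total = sum(counts.values())

# Relative frequency of each bin = its probability
rel_freq = {label: c / total for label, c in counts.items()}

def bin_start(label):
    """Lower edge of a bin label such as '40-44'."""
    return int(label.split("-")[0])

# The two events do not overlap, so their probabilities simply add
p_below_45 = sum(p for label, p in rel_freq.items() if bin_start(label) < 45)
p_from_50 = sum(p for label, p in rel_freq.items() if bin_start(label) >= 50)
p_either = p_below_45 + p_from_50

print(total, round(sum(rel_freq.values()), 6), round(p_either, 4))
```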

Conditional Probability

What happens in the past affects the present

[Figure: probability tree. First split: male, with probability P(male), or female. Each then splits into Disease X or No Disease X. The branch from male to Disease X has probability P(disease X | male) and ends at "Disease X AND Male"]

Multiply along each branch of the tree to get the end value

P(disease X AND male) = P(male) x P(disease X | male)

P(disease X AND male) / P(male) = P(disease X | male)

Screening Example

0.1% of the population carry a particular faulty gene.

A test exists for detecting whether an individual is a carrier of the gene.

In people who actually carry the gene, the test provides a positive result with probability 0.9.

In people who don’t carry the gene, the test provides a positive result with probability 0.01.

Let G = person carries gene, P = test is positive for gene, N = test is negative for gene.

Errors

If someone gets a positive result when tested, find the probability that they actually are a carrier of the gene.

We want to find P(G | P) = P(G and P) / P(P)

P(G and P) = P(G) x P(P | G) = 0.001 x 0.9 = 0.0009

P(G' and P) = P(G') x P(P | G') = 0.999 x 0.01 = 0.00999

P(P) = P(G and P) + P(G' and P) = 0.0009 + 0.00999 = 0.01089

P(G | P) = 0.0009 / 0.01089 = 0.0826

P(P | G) = 0.9 but P(G | P) = 0.0826, so P(P | G) ≠ P(G | P)

ORDER MATTERS
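The screening calculation can be followed step by step in code. This is a sketch of the same arithmetic as the slide; the variable names are chosen here for readability and are not part of the original material.

```python
# Values given in the screening example
p_gene = 0.001               # 0.1% of the population carry the gene
p_pos_given_gene = 0.9       # P(P | G)
p_pos_given_no_gene = 0.01   # P(P | G')

# Multiply along each branch of the tree
p_gene_and_pos = p_gene * p_pos_given_gene
p_no_gene_and_pos = (1 - p_gene) * p_pos_given_no_gene

# All the ways of getting a positive result
p_pos = p_gene_and_pos + p_no_gene_and_pos

# Conditional probability of carrying the gene given a positive test
p_gene_given_pos = p_gene_and_pos / p_pos

print(round(p_pos, 5), round(p_gene_given_pos, 4))
```

Even with a test that detects 90% of carriers, only about 8% of positive results come from carriers, because carriers are so rare. This is why the order of the conditioning matters.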

Survival analysis

• Each year's survival depends on previous ones, or does it?

Probability summary

• All outcomes at any one time add up to 1
• Probability histogram = area under curve = 1
• -> specific areas = set of outcomes
• Conditional probability: present dependent on past. ORDER MATTERS


P Value

sampling

probability

statistic

Rule

Statistics

• Summary measure: SEM, average etc.
• T statistic: different types, simplest:

T statistic = (observed difference in estimated mean and population value) / (sampling variability in means)

T statistic = (observed difference in estimated mean and population value) / SEM

T statistic = (observed difference in estimated mean and population value) / (expected variability in means due to random sampling)

T statistic = Signal / Noise

So when t = 0: 0/anything = 0, so the estimated and hypothesised population means are equal

So when t = 1: the observed difference is the same size as the SEM

So when t = 10: the observed difference is much greater than the SEM

T statistic example

Serum amylase values from a random sample of 15 apparently healthy subjects. The mean = 96, SD = 35 units/100 ml.

How likely is it that such a sample would be obtained from a population of serum amylase determinations with a mean of 120? (adapted from Daniel 1991 p.202)

T statistic = (96 - 120) / (35/√15) = -24 / 9.037 = -2.656
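The serum amylase calculation is the signal-to-noise ratio described earlier. A minimal sketch, with the function name `one_sample_t` introduced here for illustration:

```python
import math

def one_sample_t(sample_mean, hypothesised_mean, sd, n):
    """Observed difference (signal) divided by the SEM (noise)."""
    sem = sd / math.sqrt(n)
    return (sample_mean - hypothesised_mean) / sem

# Serum amylase example: mean 96, SD 35, n 15, hypothesised mean 120
sem_amylase = 35 / math.sqrt(15)
t_stat = one_sample_t(96, 120, 35, 15)

print(round(sem_amylase, 3), round(t_stat, 3))
```

The sign is negative because the sample mean lies below the hypothesised population mean.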

This looks like a rare occurrence?

But for what

A population value = the null hypothesis

[Figure: t density, standard error (sx) = 9.037, n = 15. Horizontal axis in t units (-2.656, 0, 2.656) and in original units (96, 120); shaded area = 0.0188]


What does the shaded area mean!

Given that the sample was obtained from a population with a mean of 120, a sample with a T(n=15) statistic of -2.656 or 2.656, or one more extreme, will occur 1.8% of the time, that is just under two samples per hundred on average.

Given that the sample was obtained from a population with a mean of 120, a sample of 15 producing a mean of 96 (120-x where x=24) or 144 (120+x where x=24), or one more extreme, will occur 1.8% of the time, that is just under two samples per hundred on average.

But is this not a P value?

p = 2 · P(t(n−1) < t | Ho is true) = 2 · [area to the left of t under a t distribution with df = n − 1], for a negative observed t such as −2.656

P value and probability for t statistic

p value

= 2 x P(t(n−1) values more extreme than the observed t | Ho is true)

= 2 · [area to the left of t under a t distribution with df = n − 1]

A p value is a special type of probability with:

Multiple outcomes + conditional upon the specified parameter value
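The shaded area of 0.0188 can be approximated without a statistics package. The sketch below integrates the t density numerically with Simpson's rule; this numerical approach and the names `t_pdf` and `two_sided_p` are choices made for this example, not the method used to draw the original figure.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p value: twice the tail area beyond |t_stat|.

    steps must be even because Simpson's rule is used.
    """
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        # Simpson weights: 4 for odd interior points, 2 for even ones
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # area between 0 and |t_stat|
    return 2 * (0.5 - central_half)

# Serum amylase example: t = -2.656 with df = n - 1 = 14
p_value = two_sided_p(-2.656, df=14)
print(round(p_value, 4))
```

The calculation is conditional on the null hypothesis: the t distribution used here is the one that applies if the population mean really is 120.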


P Value

sampling

probability

statistic

Rule

Do we need it!

Rules

[Figure: t density, standard error (sx) = 9.037, n = 15. Horizontal axis in t units (-2.656, 0, 2.656) and in original units (96, 120); shaded area = 0.0188]

Set a level of acceptability = critical value (CV)!

Say one in twenty, 1/20 = 0.05

Or 1/100 = 0.01

Or 1/1000 = 0.001, or . . . .

If our result has a P value of less than our level of acceptability, reject the parameter value. Say 1 in 20 (i.e. CV = 0.05).

Given that the sample was obtained from a population with a mean (parameter value) of 120, a sample with a T(n=15) statistic of -2.656 or 2.656, or one more extreme, will occur 1.8% of the time. This is less than one in twenty, therefore we dismiss the possibility that our sample came from a population with a mean of 120.
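The rule itself is a single comparison. The sketch below applies it to the serum amylase p value of 0.0188 at the three levels of acceptability mentioned above; the function name `reject_parameter_value` is hypothetical.

```python
def reject_parameter_value(p_value, level=0.05):
    """Return True when the p value is below the chosen level of acceptability."""
    return p_value < level

p_value = 0.0188  # shaded area from the serum amylase example

decisions = {
    level: reject_parameter_value(p_value, level)
    for level in (1 / 20, 1 / 100, 1 / 1000)
}
print(decisions)
```

The same p value leads to rejection at 1 in 20 but not at 1 in 100 or 1 in 1000, so the decision depends on the level chosen beforehand.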

What do we replace it with?

Fisher: we only know and only consider the model we have, i.e. the parameter we have used in our model. When we reject it we accept that any value but that one can replace it.

Neyman and Pearson + Gosset

Must have an alternative specified value for the parameter

If there is an alternative, what is it? Another distribution!

• Power: sample size
• Effect size: indication of clinical importance

Serum amylase values from a random sample of 15 apparently healthy subjects: mean = 96, SD = 35 units/100 ml. How likely is it that such a sample would be obtained from a population of serum amylase determinations with a mean of 120? (adapted from Daniel 1991 p.202)

α = the reject region

[Figure: two distributions, one with mean = 120 and one with mean = 96]

Correct decisions

incorrect decisions

Insufficient power: never get a significant result even when effect size large

Too much power: get significant result with trivial effect size
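The interplay of power, sample size and effect size can be sketched with a normal approximation. This is a simplification chosen for the example (an exact calculation for a t test would use the noncentral t distribution), and the function name `approx_power` and the alternative scenarios are hypothetical.

```python
import math
from statistics import NormalDist

def approx_power(difference, sd, n, alpha=0.05):
    """Approximate power of a two-sided test of a mean (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)               # critical value for the reject region
    shift = abs(difference) / (sd / math.sqrt(n))   # true difference in SEM units
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Serum amylase setting: difference 24, SD 35, n 15
power_example = approx_power(24, 35, 15)

# Insufficient power: the same large difference but only 4 subjects
power_small_n = approx_power(24, 35, 4)

# Too much power: a trivial difference of 1 unit with 10000 subjects
power_trivial_effect = approx_power(1, 35, 10000)

print(round(power_example, 2), round(power_small_n, 2), round(power_trivial_effect, 2))
```

Under these assumptions a large difference is usually missed with very few subjects, while a clinically trivial difference is usually declared significant once the sample is large enough.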

Life after P values

• Confidence intervals
• Effect size
• Description / analysis
• Bayesian statistics - qualitative approach by the back door!

• Planning to do statistics for your dissertation? See my medical statistics courses:

Course 1: www.robin-beaumont.co.uk/virtualclassroom/stats/course1.html

YouTube videos to accompany course 1: http://www.youtube.com/playlist?list=PL9F0EBD42C0AB37D0

Course 2: www.robin-beaumont.co.uk/virtualclassroom/stats/course2.html

YouTube videos to accompany course 2: http://www.youtube.com/playlist?list=PL05FC4785D24C6E68

Your attitude to your data

Where do they fit in!
