What is the probability that of 10 newborn babies at least 7 are boys ?


description

Lecture 10 Important statistical distributions. What is the probability that of 10 newborn babies at least 7 are boys ? . p(girl) = p(boy) = 0.5. Bernoulli distribution. Bernoulli or binomial distribution. - PowerPoint PPT Presentation

Transcript of What is the probability that of 10 newborn babies at least 7 are boys ?

Page 1: What is the probability that  of 10  newborn babies at least 7  are boys ?

What is the probability that of 10 newborn babies at least 7 are boys?

Lecture 10: Important statistical distributions

Bernoulli distribution

p(girl) = p(boy) = 0.5

$$p(k) = \binom{n}{k} p^k q^{n-k}$$

$$\sum_{i=0}^{n} p_i = 1$$

$$p(k > 6) = \binom{10}{7} 0.5^7\, 0.5^3 + \binom{10}{8} 0.5^8\, 0.5^2 + \binom{10}{9} 0.5^9\, 0.5^1 + \binom{10}{10} 0.5^{10}\, 0.5^0 = 0.172$$

[Figure: p(X) plotted against X = 0 to 10.]
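The sum for at least 7 boys can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `binomial_pmf` and `binomial_tail` are illustrative and do not come from the lecture.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_tail(k_min: int, n: int, p: float) -> float:
    """Probability of at least k_min successes in n trials."""
    return sum(binomial_pmf(k, n, p) for k in range(k_min, n + 1))

# At least 7 boys among 10 newborns, with p(boy) = 0.5 as on the slide.
p_at_least_7 = binomial_tail(7, 10, 0.5)
print(round(p_at_least_7, 3))  # 0.172
```

The four terms add up to 176/1024, which rounds to the 0.172 given on the slide.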

Page 2: What is the probability that  of 10  newborn babies at least 7  are boys ?

Bernoulli or binomial distribution

$$p(k) = \binom{n}{k} p^k q^{n-k}$$

$$F(k) = \sum_{x=0}^{k} p(x) = \sum_{x=0}^{k} \binom{n}{x} p^x q^{n-x}$$

The Bernoulli or binomial distribution comes from the Taylor (binomial) expansion of $(p+q)^n$:

$$(p+q)^n = \sum_{i=0}^{n} \binom{n}{i} p^i q^{n-i} = \sum_{i=0}^{n} \binom{n}{i} p^i (1-p)^{n-i} = 1$$

Mean and variance:

$$\mu = np \qquad \sigma^2 = npq$$

[Figure: f(p) plotted against p = 0 to 10 for $p(k) = \binom{10}{k} 0.2^k\, 0.8^{10-k}$.]
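The mean np and variance npq can be verified directly from the probability function. This is a small sketch for the plotted case n = 10, p = 0.2; the variable names are chosen here for illustration only.

```python
from math import comb

n, p = 10, 0.2
q = 1 - p

# Probability function p(k) for k = 0..n
pmf = [comb(n, k) * p**k * q ** (n - k) for k in range(n + 1)]

# Cumulative distribution F(k) as running sums of p(k)
cdf = []
running = 0.0
for pk in pmf:
    running += pk
    cdf.append(running)

total = sum(pmf)                                            # should equal 1
mean = sum(k * pk for k, pk in enumerate(pmf))              # should equal n*p
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))  # should equal n*p*q

print(total, mean, variance)
```

Up to floating-point rounding the three values agree with 1, np = 2 and npq = 1.6.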

Page 3: What is the probability that  of 10  newborn babies at least 7  are boys ?

Assume the probability to find a certain disease in a tree population is 0.01. A bio-monitoring program surveys 10 stands of trees and takes in each case a random sample of 100 trees. How large is the probability that in these stands 1, 2, 3, and more than 3 cases of this disease will occur?

$$p(1) = \binom{1000}{1} 0.01^1 \cdot 0.99^{999} = 0.0004$$

$$p(2) = \binom{1000}{2} 0.01^2 \cdot 0.99^{998} = 0.0022$$

$$p(3) = \binom{1000}{3} 0.01^3 \cdot 0.99^{997} = 0.0074$$

Mean, variance, standard deviation:

$$\mu = 1000 \cdot 0.01 = 10 \qquad \sigma^2 = 1000 \cdot 0.01 \cdot 0.99 = 9.9 \qquad \sigma = \sqrt{9.9} = 3.146$$

$$p(k > 3) = 1 - \sum_{i=0}^{3} p(i) = 1 - \left[ \binom{1000}{0} 0.01^0\, 0.99^{1000} + \binom{1000}{1} 0.01^1\, 0.99^{999} + \binom{1000}{2} 0.01^2\, 0.99^{998} + \binom{1000}{3} 0.01^3\, 0.99^{997} \right] = 0.99$$

Page 4: What is the probability that  of 10  newborn babies at least 7  are boys ?

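What happens if the number of trials n becomes larger and larger and the event probability p becomes smaller and smaller?

$$p(k) = \binom{n}{k} p^k q^{n-k}$$

Set $n = r$ and $np = rp = \lambda$, so that $p = \lambda / r$ and $q = 1 - p = (r - \lambda)/r$.

$$p(X = k) = \binom{r}{k} \left(\frac{\lambda}{r}\right)^k \left(1 - \frac{\lambda}{r}\right)^{r-k} = \frac{\lambda^k}{k!} \cdot \frac{r!}{(r-k)!\,(r-\lambda)^k} \cdot \left(1 - \frac{\lambda}{r}\right)^r$$

$$\lim_{r \to \infty} \left(1 - \frac{\lambda}{r}\right)^r = e^{-\lambda} \qquad \lim_{r \to \infty} \frac{r!}{(r-k)!\,(r-\lambda)^k} = 1$$

$$p(X = k) = \frac{\lambda^k}{k!} e^{-\lambda}$$

Poisson distribution

The distribution of rare events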
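The limit can be illustrated numerically: with the mean held fixed and the number of trials r growing, the binomial probability approaches the Poisson probability. The sketch below assumes a mean of 2 and k = 3, values chosen only for illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability of exactly k events in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of exactly k events with mean lam."""
    return lam**k * exp(-lam) / factorial(k)

lam, k = 2.0, 3                      # illustrative values, not from the lecture
trial_counts = (10, 100, 1000, 10000)

# Absolute difference between binomial(r, lam/r) and Poisson(lam) at k
errors = [abs(binomial_pmf(k, r, lam / r) - poisson_pmf(k, lam)) for r in trial_counts]

for r, err in zip(trial_counts, errors):
    print(r, err)
```

The difference shrinks roughly by a factor of ten for each tenfold increase in r, which is the behaviour the limit describes.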

Page 5: What is the probability that  of 10  newborn babies at least 7  are boys ?

Assume the probability to find a certain disease in a tree population is 0.01. A bio-monitoring program surveys 10 stands of trees and takes in each case a random sample of 100 trees. How large is the probability that in these stands 1, 2, 3, and more than 3 cases of this disease will occur?

$$\lambda = 1000 \cdot 0.01 = 10$$

Poisson solution:

$$p(1) = \frac{10^1}{1!} e^{-10} = 0.00045 \qquad p(2) = \frac{10^2}{2!} e^{-10} = 0.0023 \qquad p(3) = \frac{10^3}{3!} e^{-10} = 0.0076$$

Bernoulli solution:

$$p(1) = 0.0004 \qquad p(2) = 0.0022 \qquad p(3) = 0.0074$$

The probability that no infected tree will be detected:

$$p(0) = \frac{10^0}{0!} e^{-10} = e^{-10} = 0.000045$$

The probability of more than three infected trees:

$$p(0) + p(1) + p(2) + p(3) = 0.000045 + 0.00045 + 0.0023 + 0.0076 = 0.0104$$

$$p(k > 3) = 1 - 0.0104 = 0.99$$

Bernoulli solution:

$$p(k > 3) = 0.99$$
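The Poisson and Bernoulli (binomial) solutions of the tree example can be compared side by side. This is a sketch with n = 1000 trees and p = 0.01 as in the example; the helper names are illustrative.

```python
from math import comb, exp, factorial

n, p = 1000, 0.01      # 10 stands x 100 trees, disease probability 0.01
lam = n * p            # Poisson mean, equal to 10

def binomial_pmf(k: int) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int) -> float:
    return lam**k * exp(-lam) / factorial(k)

# Probabilities of 0, 1, 2 and 3 infected trees under both models
binom = [binomial_pmf(k) for k in range(4)]
poisson = [poisson_pmf(k) for k in range(4)]

# More than three infected trees = 1 minus the sum of the first four terms
binom_more_than_3 = 1 - sum(binom)
poisson_more_than_3 = 1 - sum(poisson)

for k in range(4):
    print(k, round(binom[k], 5), round(poisson[k], 5))
print(round(binom_more_than_3, 4), round(poisson_more_than_3, 4))
```

Both models give a probability of about 0.99 for more than three infected trees, since the first four terms together contribute only about 0.01.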

Page 6: What is the probability that  of 10  newborn babies at least 7  are boys ?

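[Figure: Poisson distributions p(k) plotted against k = 0 to 13 for $\lambda = 1$, $\lambda = 2$, $\lambda = 3$, $\lambda = 4$, and $\lambda = 6$.]

Variance, mean:

$$\sigma^2 = \mu = \lambda$$

Skewness:

$$\gamma = \frac{1}{\sqrt{\lambda}}$$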

Page 7: What is the probability that  of 10  newborn babies at least 7  are boys ?

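What is the probability of three consecutive cumulations (rollovers without a winner) in Duży Lotek if 14 000 000 people bet the first time, 20 000 000 the second time, and 30 000 000 the third time?

The probability to win is

$$p(6) = \frac{6!\,43!}{49!} \approx \frac{1}{14000000}$$

$$\lambda_1 = 14000000 \cdot \frac{1}{14000000} = 1 \qquad \lambda_2 = 20000000 \cdot \frac{1}{14000000} = 1.428571 \qquad \lambda_3 = 30000000 \cdot \frac{1}{14000000} = 2.142857$$

$$p_1(0) = \frac{1^0}{0!} e^{-1} = 0.368 \qquad p_2(0) = \frac{1.428571^0}{0!} e^{-1.428571} = 0.239 \qquad p_3(0) = \frac{2.142857^0}{0!} e^{-2.142857} = 0.117$$

The events are independent:

$$p_{1,2,3} = 0.368 \cdot 0.239 \cdot 0.117 = 0.01$$

The zero term of the Poisson distribution gives the probability of no event.

The probability of at least one event:

$$p(k \ge 1) = 1 - e^{-\lambda}$$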
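The lottery calculation can be reproduced in a few lines. This sketch uses the exact number of 6-out-of-49 combinations instead of the rounded 14 000 000, so the Poisson means differ slightly from the rounded values on the slide; the variable names are illustrative.

```python
from math import comb, exp

combinations = comb(49, 6)           # number of possible 6-out-of-49 draws
p_win = 1 / combinations             # probability that a single bet wins

bets = [14_000_000, 20_000_000, 30_000_000]

# Expected number of winners per draw (Poisson mean)
lambdas = [b * p_win for b in bets]

# Zero term of the Poisson distribution: probability of no winner in a draw
p_no_winner = [exp(-lam) for lam in lambdas]

# Independent draws: multiply the three probabilities
p_three_cumulations = 1.0
for prob in p_no_winner:
    p_three_cumulations *= prob

print(combinations, [round(x, 3) for x in p_no_winner], round(p_three_cumulations, 3))
```

The product is about 0.01, in agreement with the slide.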

Page 8: What is the probability that  of 10  newborn babies at least 7  are boys ?

The construction of evolutionary trees from DNA sequence data

[Figure: two DNA sequences illustrating single substitution, parallel substitution, back substitution, and multiple substitution.]

T→CTCA→GAG→C→GTG→C→AAACG

TTCA→GAGTGCCCT

[Figure: the four nucleotides A, T, C, G, with each substitution between two of them having probability p.]

Probabilities of DNA substitution

We assume equal substitution probabilities. Let the probability of each particular substitution be p.

The probability that A mutates to T, C, or G is $P_{\neg A} = p + p + p = 3p$.

The probability of no mutation is $p_A = 1 - 3p$.

$$p(A \to T) + p(A \to C) + p(A \to G) + p(A \to A) = 1$$

Independent events:

$$p(A \wedge B) = p(A) \cdot p(B)$$

The probability that A mutates to T and C to G is $P_{AC} = p \times p$.

Page 9: What is the probability that  of 10  newborn babies at least 7  are boys ?

The probability matrix (rows and columns in the order A, T, C, G):

$$P = \begin{pmatrix} 1-3p & p & p & p \\ p & 1-3p & p & p \\ p & p & 1-3p & p \\ p & p & p & 1-3p \end{pmatrix}$$

[Figure: the two DNA sequences illustrating single, parallel, back, and multiple substitution, repeated from the previous slide.]

What is the probability that after 5 generations A did not change?

$$p_5 = (1 - 3p)^5$$

The Jukes-Cantor model (JC69) now assumes that all substitution probabilities are equal.
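The five-generation question can be explored with the probability matrix itself. The sketch below assumes p = 0.01 (a value chosen here for illustration) and compares two quantities: the probability that A never changed in any of the 5 generations, and the entry of the fifth matrix power for A to A, which also counts paths that left A and came back (back substitution).

```python
p = 0.01  # assumed per-generation probability of each particular substitution

# Probability matrix, rows and columns in the order A, T, C, G
P = [[1 - 3 * p if i == j else p for j in range(4)] for i in range(4)]

def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

# Fifth power of P: substitution probabilities after 5 generations
P5 = P
for _ in range(4):
    P5 = matmul(P5, P)

p_never_changed = (1 - 3 * p) ** 5   # A stayed A in every single generation
p_same_after_5 = P5[0][0]            # A is again A after 5 generations

print(round(p_never_changed, 4), round(p_same_after_5, 4))
```

The matrix entry is slightly larger than $(1-3p)^5$ because it includes back substitutions that restore the original nucleotide.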

Page 10: What is the probability that  of 10  newborn babies at least 7  are boys ?

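The Jukes-Cantor model assumes equal substitution probabilities within these 4 nucleotides.

Substitution matrix:

$$P = \begin{pmatrix} 1-3p & p & p & p \\ p & 1-3p & p & p \\ p & p & 1-3p & p \\ p & p & p & 1-3p \end{pmatrix}$$

$$P(t) = P(0)^t$$

Arrhenius model:

$$\frac{dP(t)}{dt} = -\lambda P(t) \;\Rightarrow\; P(t) = P(0)\, e^{-\lambda t}$$

Substitution probability after time t (transition matrix):

$$P(t) = \begin{pmatrix} \frac14 + \frac34 e^{-4pt} & \frac14 - \frac14 e^{-4pt} & \frac14 - \frac14 e^{-4pt} & \frac14 - \frac14 e^{-4pt} \\ \frac14 - \frac14 e^{-4pt} & \frac14 + \frac34 e^{-4pt} & \frac14 - \frac14 e^{-4pt} & \frac14 - \frac14 e^{-4pt} \\ \frac14 - \frac14 e^{-4pt} & \frac14 - \frac14 e^{-4pt} & \frac14 + \frac34 e^{-4pt} & \frac14 - \frac14 e^{-4pt} \\ \frac14 - \frac14 e^{-4pt} & \frac14 - \frac14 e^{-4pt} & \frac14 - \frac14 e^{-4pt} & \frac14 + \frac34 e^{-4pt} \end{pmatrix}$$

[Figure: any of the nucleotides A, T, G, C changing to A after time t.]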

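The probability that nothing changes is the zero term of the Poisson distribution:

$$P(A \to T, C, G) = e^{-\lambda} = e^{-4pt}$$

The probability of at least one substitution is

$$P(A \to T, C, G) = 1 - e^{-\lambda} = 1 - e^{-4pt}$$

The probability to reach a nucleotide from any other is

$$P(A, T, G, C \to A) = \frac14 \left(1 - e^{-4pt}\right)$$

The probability that a nucleotide doesn’t change after time t is

$$P(A \to A \mid T, C, G, A) = 1 - 3 \cdot \frac14 \left(1 - e^{-4pt}\right) = \frac14 + \frac34 e^{-4pt}$$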
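The two Jukes-Cantor probabilities can be written as small functions and checked for consistency: the probability of staying unchanged plus three times the probability of reaching one particular other nucleotide must equal 1. The function names and the value of p below are assumptions made for this sketch.

```python
from math import exp

def p_unchanged(p: float, t: float) -> float:
    """JC69 probability that a nucleotide is the same after time t."""
    return 0.25 + 0.75 * exp(-4 * p * t)

def p_specific_change(p: float, t: float) -> float:
    """JC69 probability of ending at one particular different nucleotide."""
    return 0.25 * (1 - exp(-4 * p * t))

p = 0.01  # assumed substitution probability per unit time
for t in (0, 1, 10, 100, 1000):
    same = p_unchanged(p, t)
    other = p_specific_change(p, t)
    # Each row of the transition matrix sums to 1
    print(t, round(same, 4), round(other, 4), round(same + 3 * other, 4))
```

At t = 0 the nucleotide is unchanged with certainty, and for large t all four probabilities approach 1/4.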

Page 11: What is the probability that  of 10  newborn babies at least 7  are boys ?

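Probability for a single difference

$$P(A \to T, C, G) = 3 \cdot \frac14 \left(1 - e^{-4pt}\right) = \frac34 - \frac34 e^{-4pt}$$

What is the probability of x differences among n nucleotides after time t?

We use the principle of maximum likelihood and the Bernoulli distribution. Writing $P_d = \frac34 - \frac34 e^{-4pt}$ for the probability of a single difference, which takes the place of the success probability in the binomial:

$$p(x, t) = \binom{n}{x} P_d^{\,x} (1 - P_d)^{n-x} = \binom{n}{x} \left(\frac34 - \frac34 e^{-4pt}\right)^x \left(1 - \left(\frac34 - \frac34 e^{-4pt}\right)\right)^{n-x}$$

$$\ln p(x, t) = \ln \binom{n}{x} + x \ln\left(\frac34 - \frac34 e^{-4pt}\right) + (n - x) \ln\left(\frac14 + \frac34 e^{-4pt}\right)$$

Setting the derivative with respect to pt to zero gives $P_d = x/n$ and therefore

$$pt = -\frac14 \ln\left(1 - \frac{4x}{3n}\right)$$

This is the mean time to get x different sites from a sequence of n nucleotides. It is also a measure of distance that depends only on the number of substitutions.

[Figure: f(p) plotted against p = 0 to 10 for $p(k) = \binom{10}{k} 0.2^k\, 0.8^{10-k}$.]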
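The distance formula is easy to evaluate for observed data. This is a minimal sketch; the function name `jc_distance` and the example counts (10 differing sites among 100) are assumptions for illustration. The formula is only defined when fewer than 3/4 of the sites differ.

```python
from math import exp, log

def jc_distance(x: int, n: int) -> float:
    """Estimate pt from x differing sites among n compared nucleotides."""
    fraction = x / n
    if not 0 <= fraction < 0.75:
        raise ValueError("fraction of differing sites must lie in [0, 0.75)")
    return -0.25 * log(1 - 4 * fraction / 3)

d = jc_distance(10, 100)

# Round trip: the estimated pt should reproduce the observed fraction x/n
expected_fraction = 0.75 - 0.75 * exp(-4 * d)
print(d, expected_fraction)
```

Inserting the estimate back into the single-difference probability returns the observed fraction x/n, which is the maximum likelihood condition.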

Page 12: What is the probability that  of 10  newborn babies at least 7  are boys ?

[Figure: phylogenetic tree of Gorilla, Pan paniscus, Pan troglodytes, Homo sapiens, and Homo neandertalensis; axes: time and divergence (number of substitutions).]

$$pt = -\frac14 \ln\left(1 - \frac{4x}{3n}\right)$$

Phylogenetic trees are the basis of any systematic classification.

Page 13: What is the probability that  of 10  newborn babies at least 7  are boys ?

A pile model to generate the binomial. If the number of steps is very, very large the binomial becomes smooth.

The normal distribution is the continuous equivalent to the discrete Bernoulli distribution.

Abraham de Moivre (1667-1754)

$$f(x) = C e^{-(x - \mu)^2}$$

$$f(x) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac12 \left(\frac{x - \mu}{\sigma}\right)^2}$$

Page 14: What is the probability that  of 10  newborn babies at least 7  are boys ?

If we have a series of random variates Xn, a new random variate Yn that is the sum of all Xn will for n→∞ be a variate that is asymptotically normally distributed.

[Figure: six frequency histograms (frequency plotted against X, with X from -2 to 2).]

The central limit theorem
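The statement can be tried out with a short simulation. The sketch below sums uniform random numbers, a choice made here only for illustration, and checks that about 68% of the sums fall within one standard deviation of their mean, as expected for a normal distribution. The seed and sample sizes are arbitrary assumptions.

```python
import random
from statistics import mean, pstdev

random.seed(1)          # fixed seed so that the run is reproducible

n_terms = 30            # number of variates X added up in each sum Y
n_sums = 20000          # number of simulated sums

# Each Y is the sum of n_terms independent uniform(0, 1) variates
sums = [sum(random.random() for _ in range(n_terms)) for _ in range(n_sums)]

m = mean(sums)          # expected to be close to n_terms / 2
s = pstdev(sums)        # expected to be close to sqrt(n_terms / 12)

# Fraction of sums within one standard deviation of the mean
within_1sd = sum(abs(y - m) <= s for y in sums) / n_sums

print(round(m, 2), round(s, 2), round(within_1sd, 2))
```

Although a single uniform variate is far from normal, the fraction within one standard deviation is already close to the normal value of 0.68.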

Page 15: What is the probability that  of 10  newborn babies at least 7  are boys ?

The normal or Gaussian distribution

[Figure: binomial distributions f(x) for n = 10, n = 20, and n = 50, and the normal density f(x) together with its cumulative distribution F(x) for x from 0 to 5.]

$$f(x) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$

$$F(x) = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{(v - \mu)^2}{2\sigma^2}}\, dv$$

Mean: $\mu$. Variance: $\sigma^2$.

Page 16: What is the probability that  of 10  newborn babies at least 7  are boys ?

Important features of the normal distribution

• The function is defined for every real x.

• The frequency at x = μ is given by $p(x = \mu) = \frac{1}{\sigma \sqrt{2\pi}} \approx \frac{0.4}{\sigma}$.

• The distribution is symmetrical around μ.

• The points of inflection are given by the second derivative. Setting this to zero gives $x = \mu - \sigma$ and $x = \mu + \sigma$.

Page 17: What is the probability that  of 10  newborn babies at least 7  are boys ?

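[Figure: binomial distributions f(x) for n = 10, n = 20, and n = 50, and the normal density f(x) with the areas within $\pm\sigma$ (0.68) and $\pm 2\sigma$ (0.95) marked.]

$$f(x) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}} \qquad F(x) = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{(v - \mu)^2}{2\sigma^2}}\, dv$$

$$\frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac12 \left(\frac{x - \mu}{\sigma}\right)^2} dx = 1$$

$$\frac{1}{\sigma \sqrt{2\pi}} \int_{\mu - \sigma}^{\mu + \sigma} e^{-\frac12 \left(\frac{x - \mu}{\sigma}\right)^2} dx = 0.68$$

$$\frac{1}{\sigma \sqrt{2\pi}} \int_{\mu - 2\sigma}^{\mu + 2\sigma} e^{-\frac12 \left(\frac{x - \mu}{\sigma}\right)^2} dx = 0.95$$

$$\frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{\mu} e^{-\frac12 \left(\frac{x - \mu}{\sigma}\right)^2} dx = 0.5$$

$$\frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{\mu + 2\sigma} e^{-\frac12 \left(\frac{x - \mu}{\sigma}\right)^2} dx = 0.975$$

Many statistical tests compare observed values with those of the standard normal distribution and assign the respective probabilities to H1.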
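The areas under the normal curve can be computed with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The sketch uses the standard normal, since the areas in units of the standard deviation do not depend on the mean or variance.

```python
from statistics import NormalDist

nd = NormalDist(mu=0.0, sigma=1.0)   # standard normal distribution

within_1_sigma = nd.cdf(1) - nd.cdf(-1)   # area between mu - sigma and mu + sigma
within_2_sigma = nd.cdf(2) - nd.cdf(-2)   # area between mu - 2 sigma and mu + 2 sigma
below_mean = nd.cdf(0)                    # area left of the mean
below_plus_2_sigma = nd.cdf(2)            # area left of mu + 2 sigma

print(round(within_1_sigma, 4), round(within_2_sigma, 4),
      round(below_mean, 4), round(below_plus_2_sigma, 4))
```

The exact values for two standard deviations are slightly above the rounded 0.95 and 0.975; those rounded figures correspond to 1.96 standard deviations.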

Page 18: What is the probability that  of 10  newborn babies at least 7  are boys ?

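The Z-transform

$$f(x) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac12 \left(\frac{x - \mu}{\sigma}\right)^2}$$

$$Z = \frac{x - \mu}{\sigma}$$

The standard normal:

$$f(Z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac12 Z^2}$$

The variate Z has a mean of 0 and a variance of 1.

A Z-transform standardizes every statistical distribution to a mean of 0 and a variance of 1; it does not change the shape of the distribution. Tables of statistical distributions are usually given for Z-transformed values.

The 95% confidence limit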
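A Z-transform of a data set takes only a few lines. The data values below are hypothetical and serve only to show that the transformed values have mean 0 and variance 1.

```python
from statistics import mean, pstdev

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical measurements

mu = mean(data)          # mean of the data
sigma = pstdev(data)     # standard deviation of the data (population form)

# Z-transform: subtract the mean and divide by the standard deviation
z_values = [(x - mu) / sigma for x in data]

print(mu, sigma, z_values)
print(mean(z_values), pstdev(z_values))
```

The order and relative spacing of the values are unchanged; only location and scale are standardized.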

Page 19: What is the probability that  of 10  newborn babies at least 7  are boys ?

The Z-transformed (standardized) normal distribution

$P(\mu - \sigma < X < \mu + \sigma) = 68\%$

$P(\mu - 1.65\sigma < X < \mu + 1.65\sigma) = 90\%$

$P(\mu - 1.96\sigma < X < \mu + 1.96\sigma) = 95\%$

$P(\mu - 2.58\sigma < X < \mu + 2.58\sigma) = 99\%$

$P(\mu - 3.29\sigma < X < \mu + 3.29\sigma) = 99.9\%$

The Fisherian significance levels

[Figure: binomial distributions f(x) for n = 10, n = 20, and n = 50, and the normal density f(x) with the areas within $\pm\sigma$ (0.68) and $\pm 2\sigma$ (0.95) marked.]
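The multiples of the standard deviation that belong to the usual significance levels follow from the inverse of the cumulative standard normal distribution. This sketch uses `statistics.NormalDist.inv_cdf` from the Python standard library; the dictionary name `z_limits` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1

levels = [0.90, 0.95, 0.99, 0.999]

# For a symmetric two-sided interval with coverage c, the upper limit is
# the quantile at (1 + c) / 2.
z_limits = {c: standard_normal.inv_cdf((1 + c) / 2) for c in levels}

for c, z in z_limits.items():
    print(c, round(z, 2))
```

Rounded to two decimals these are the factors 1.64 to 1.65, 1.96, 2.58 and 3.29 used for the 90%, 95%, 99% and 99.9% intervals.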

Page 20: What is the probability that  of 10  newborn babies at least 7  are boys ?

Why is the normal distribution so important?

The normal distribution is often at least approximately found in nature. Many additive or multiplicative processes generate distributions of patterns that are normal. Examples are body sizes, intelligence, abundances, phylogenetic branching patterns, metabolism rates of individuals, plant and animal organ sizes, or egg numbers. Indeed, following the Belgian statistician Adolphe Quetelet (1796-1874), the normal distribution was long held even to be a natural law. However, new studies showed that most often the normal distribution is only an approximation and that real distributions frequently follow more complicated asymmetrical distributions, for instance skewed normals.

The normal distribution follows from the binomial. Hence if we take samples out of a large population of discrete events we expect the distribution of events (their frequency) to be normally distributed.

The central limit theorem holds that means of additive variables should be normally distributed. This is a generalization of the second argument. In other words, the normal is the expectation when dealing with a large number of influencing variables.

Gauß derived the normal distribution from the distribution of errors within his treatment of measurement errors. If we measure the same thing many times our measurements will not always give the same value. Because many factors might influence our measurement errors, the central limit theorem points again to a normal distribution of errors around the mean.

In the next lecture we will see that the normal distribution can be approximated by a number of other important distributions that form the basis of important statistical tests.

Page 21: What is the probability that  of 10  newborn babies at least 7  are boys ?

The estimation of the population mean from a series of samples

[Figure: a population with parameters $\mu, \sigma$ from which several samples are drawn, each with its own mean and standard deviation $\bar{x}, s$.]

$$Z = \frac{\sum_{i=1}^{n} x_i - n\mu}{\sigma \sqrt{n}} = \frac{n\bar{x} - n\mu}{\sigma \sqrt{n}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

The n samples come from an additive random variate.

Z is asymptotically normally distributed.

Confidence limit of the estimate of a mean from a series of samples:

$$\mu = \bar{x} \pm z_{\alpha} \frac{\sigma}{\sqrt{n}}$$

$\alpha$ is the desired probability level.

Standard error:

$$\frac{\sigma}{\sqrt{n}}$$

[Figure: binomial distributions f(x) for n = 10, n = 20, and n = 50, and the normal density f(x) with the areas within $\pm\sigma$ (0.68) and $\pm 2\sigma$ (0.95) marked.]
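The confidence limit of a mean can be sketched as follows. The sample values are hypothetical, and the sketch makes two assumptions that are not in the lecture: the unknown population standard deviation is replaced by the sample standard deviation s, and the normal quantile is used although for a sample this small a t quantile would usually be preferred.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample of n = 10 measurements
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9]

n = len(sample)
x_bar = mean(sample)            # sample mean
s = stdev(sample)               # sample standard deviation (n - 1 in the denominator)
standard_error = s / sqrt(n)    # standard error of the mean

# Two-sided 95% limit: quantile of the standard normal at 0.975 (about 1.96)
z = NormalDist().inv_cdf(0.975)
lower = x_bar - z * standard_error
upper = x_bar + z * standard_error

print(round(x_bar, 3), round(standard_error, 3), (round(lower, 3), round(upper, 3)))
```

The interval is symmetric around the sample mean, and its half width shrinks with the square root of the sample size.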

Page 22: What is the probability that  of 10  newborn babies at least 7  are boys ?

How to apply the normal distribution

Intelligence is approximately normally distributed with a mean of 100 (by definition) and a standard deviation of 16 (in North America). For an intelligence study we need 100 persons with an IQ above 130. How many persons do we have to test to find this number if we take random samples (and do not test university students only)?

$$F(x > 130) = 1 - \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{130} e^{-\frac{(v - \mu)^2}{2\sigma^2}}\, dv = \frac{1}{\sigma \sqrt{2\pi}} \int_{130}^{\infty} e^{-\frac{(v - \mu)^2}{2\sigma^2}}\, dv$$

$$\Phi(z_a) = F(x \le a) \qquad z_a = \frac{a - \mu}{\sigma}$$

[Figure: normal density f(IQ) plotted against IQ from 40 to 160, divided into the areas IQ < 130 and IQ > 130.]
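The intelligence example can be worked through numerically. This sketch uses `statistics.NormalDist` with the stated mean of 100 and standard deviation of 16; the variable names are illustrative, and the result is an expected number of persons, not a guarantee.

```python
from math import ceil
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=16)

# Z-transform of the threshold and probability of an IQ above 130
z_130 = (130 - 100) / 16
p_above_130 = 1 - iq.cdf(130)

# Expected number of randomly chosen persons needed to find 100 such persons
persons_needed = ceil(100 / p_above_130)

print(z_130, round(p_above_130, 4), persons_needed)
```

About 3% of the population have an IQ above 130, so on average roughly 3300 persons have to be tested.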

Page 23: What is the probability that  of 10  newborn babies at least 7  are boys ?
Page 24: What is the probability that  of 10  newborn babies at least 7  are boys ?

One and two sided tests

We measure blood sugar concentrations and know that our method estimates the concentration with an error of about 3%. What is the probability that our measurement deviates from the real value by more than 5%?
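A possible way to answer the question is sketched below. It rests on an assumption not stated on the slide: the measurement error is normally distributed around the real value and the error of 3% is its standard deviation. A deviation of more than 5% in either direction is then a two-sided problem, while a deviation of more than 5% upwards only would be one-sided.

```python
from statistics import NormalDist

error_sd = 3.0      # assumed standard deviation of the measurement error, in percent
deviation = 5.0     # deviation of interest, in percent

z = deviation / error_sd                     # Z-transformed deviation
standard_normal = NormalDist()

one_sided = 1 - standard_normal.cdf(z)       # more than 5% too high
two_sided = 2 * one_sided                    # more than 5% too high or too low

print(round(z, 3), round(one_sided, 4), round(two_sided, 4))
```

Under these assumptions slightly less than 10% of the measurements deviate by more than 5% from the real value.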

Page 25: What is the probability that  of 10  newborn babies at least 7  are boys ?

Albinos are rare in human populations. Assume their frequency is 1 per 100000 persons. What is the probability to find 15 albinos among 1000000 persons?

$$p(X = 15) = \binom{1000000}{15} (0.00001)^{15} (0.99999)^{999985}$$

=KOMBINACJE(1000000,15)*0.00001^15*(1-0.00001)^999985 = 0.0347

(KOMBINACJE is the Polish-language name of the Excel function COMBIN.)

$$\mu = np \qquad \sigma^2 = npq$$
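The spreadsheet calculation can be repeated with the Python standard library, and compared with the Poisson approximation for a mean of np = 10. The variable names are illustrative.

```python
from math import comb, exp, factorial

n = 1_000_000        # persons examined
p = 0.00001          # frequency of albinos (1 per 100000)
k = 15               # number of albinos of interest

# Exact binomial probability, as in the spreadsheet formula
p_binomial = comb(n, k) * p**k * (1 - p) ** (n - k)

# Poisson approximation with mean lam = n * p
lam = n * p
p_poisson = lam**k * exp(-lam) / factorial(k)

print(round(p_binomial, 4), round(p_poisson, 4))
```

Both values round to 0.0347, as expected for a rare event with very many trials.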

Page 26: What is the probability that  of 10  newborn babies at least 7  are boys ?

Home work and literature

Refresh:

• Bernoulli distribution
• Poisson distribution
• Normal distribution
• Central limit theorem
• Confidence limits
• One and two sided tests
• Z-transform

Prepare for the next lecture:

• χ² test
• Mendel rules
• t-test
• F-test
• Contingency table
• G-test

Literature:

Łomnicki: Statystyka dla biologów
Mendel: http://en.wikipedia.org/wiki/Mendelian_inheritance
Pearson Chi2 test: http://en.wikipedia.org/wiki/Pearson's_chi-square_test