Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.


Page 1: Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.

Statistical model for count data
Speaker: Tzu-Chun Lo
Advisor: Yao-Ting Huang

Page 2: Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.

Outline

• Why use a statistical model
• Target
  ▫ Gene expression
• Binomial distribution
  ▫ Poisson distribution
• Overdispersion
• Negative binomial
  ▫ Chi-square approximation
• Conclusion

Page 3: Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.

Statistical model

• A statistical model is a probability distribution constructed to enable inferences to be drawn or decisions made from data.

[Diagram: a sample (mean, variance, size) is drawn from a population (mean, variance) of measurements such as height or weight. The sample provides information for inference, which leads to a decision through hypothesis testing. Labels: designer, consumer.]

• We have to choose a statistical model for the sample.

Page 4: Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.

Target

• Gene expression
  ▫ We would like to use a statistical model to test whether an observed difference in read counts is significant.

[Figure annotations: "Looks like a significant region"; "How about this one? Can we be sure?"; "Noise or not?"]

Page 5: Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.

Count data

• A type of data in which the observations can take only the non-negative integer values {0, 1, 2, 3, ...}, and where these integers arise from counting rather than ranking.
• An individual piece of count data is often termed a count variable.
• The binomial, Poisson, and negative binomial distributions all describe this type of data.

Page 6: Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.

Binomial distribution

• The number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p.
• Notation: $X \sim B(n, p)$, with $f(k) = \binom{n}{k} p^{k} (1-p)^{n-k}$ for $k = 0, 1, \dots, n$.

Page 7: Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.

Binomial distribution

• Ex: p = 0.8, (1 - p) = 0.2, trials: 3, successes: 2. The qualifying outcomes are (1 1 0), (1 0 1), (0 1 1), so f(2) = 3 × 0.8² × 0.2 = 0.384.
• 33 goals in 110 shots this season, so success: 0.3, fail: 0.7.
• What is the probability that he scores 6 goals in 10 shots?
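The two examples on this slide can be checked numerically. The sketch below is only an illustration: the helper name `binomial_pmf` is an assumption, not something from the slides. It enumerates the three-trial outcomes to show where 0.384 comes from, then applies the same formula to the 6-goals-in-10-shots question.

```python
from itertools import product
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Slide example: 3 trials, p = 0.8, exactly 2 successes.
# Enumerate all 2^3 outcomes and add up the probability of those with two 1s.
p = 0.8
by_enumeration = sum(
    p ** sum(outcome) * (1 - p) ** (3 - sum(outcome))
    for outcome in product([0, 1], repeat=3)
    if sum(outcome) == 2
)
print(round(by_enumeration, 3), round(binomial_pmf(2, 3, 0.8), 3))

# Goal example: p = 33/110 = 0.3, exactly 6 goals in 10 shots.
six_of_ten = binomial_pmf(6, 10, 0.3)
print(round(six_of_ten, 4))
```

Under these assumptions, six goals in ten shots has a probability of roughly 0.037, so it would be an unusual outcome for a player who converts 30% of his shots.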

Page 8: Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.

Binomial distribution

• Exactly six goals: $P(X = 6)$
• At most three goals: $P(X \le 3)$

[Figure: probability mass function of binomial(n=10, p=0.3); x-axis: goals (0 to 10); y-axis: probability (0.00 to 0.25); the bar at 6 goals is marked.]
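The slide's own figures for these two cases were not recovered, so the following is a minimal sketch of how they could be computed for binomial(n=10, p=0.3). The function names are illustrative assumptions. "At most three" is read as the cumulative probability P(X ≤ 3).

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_cdf(k, n, p):
    """P(X <= k): sum the pmf from 0 up to and including k."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


exactly_six = binomial_pmf(6, 10, 0.3)
at_most_three = binomial_cdf(3, 10, 0.3)
print(f"P(X = 6)  = {exactly_six:.4f}")
print(f"P(X <= 3) = {at_most_three:.4f}")
```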

Page 9: Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.

Poisson distribution

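• Expresses the probability of a given number of events occurring in a fixed interval.
• Notation: $X \sim \mathrm{Poisson}(\lambda)$, with $f(k) = \dfrac{\lambda^{k} e^{-\lambda}}{k!}$ for $k = 0, 1, 2, \dots$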

Page 10: Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.
Page 11: Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.

Poisson distribution

• Suppose the interval is one game, so we count goals per game.
• e = 2.718281828…

[Figure: probability mass function; plot title "binomial(n=10,p=0.3)"; x-axis: goals (0 to 10); y-axis: probability (0.00 to 0.25).]

Page 12: Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.

Poisson

• Total: 11 games
• Score: 33 goals
• 33/11 = 3 goals per game
• Poisson:
• Raw data:
• Testing with a Poisson model could be inaccurate in this case.

[Figure: Poisson versus raw data; x-axis: goals (0 to 8); y-axis: games (0 to 3).]

Goals per game   0    1    2    3    4    5    6    7
Poisson          0.5  1.6  2.5  2.5  1.8  1.1  0.6  0.2
Raw data         1    2    2    2    2    0    1    1
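The Poisson row of the table appears to be the expected number of games (out of 11) with each goal count when the rate is 3 goals per game. The sketch below reproduces it under that reading. The helper name `poisson_pmf` is an assumption for illustration.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)


games = 11
goals = 33
lam = goals / games  # 3 goals per game

observed = [1, 2, 2, 2, 2, 0, 1, 1]  # games with 0, 1, ..., 7 goals (raw data row)
expected = [round(games * poisson_pmf(k, lam), 1) for k in range(8)]

for k, (e, o) in enumerate(zip(expected, observed)):
    print(f"{k} goals: Poisson {e:.1f} games, raw data {o} games")
```

The raw counts are flatter than the Poisson expectation, with fewer games near 3 goals and more in the tails, which is the mismatch the slide points at.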

Page 13: Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.

Overdispersion

• The presence of greater variability (statistical dispersion) in a data set than would be expected based on a given simple statistical model.
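A common informal check, sketched here as an assumption rather than as the speaker's method, is the variance-to-mean ratio. A Poisson model implies a ratio of 1, so a ratio well above 1 hints at overdispersion. The data are the goals-per-game tally from Page 12.

```python
# Games with 0, 1, ..., 7 goals, taken from the raw data row on Page 12.
observed = [1, 2, 2, 2, 2, 0, 1, 1]

# Expand the tally into one entry per game.
goals_per_game = [k for k, n_games in enumerate(observed) for _ in range(n_games)]

n = len(goals_per_game)
mean = sum(goals_per_game) / n
# Sample variance (divides by n - 1).
variance = sum((x - mean) ** 2 for x in goals_per_game) / (n - 1)
dispersion = variance / mean

print(f"mean = {mean:.2f}, variance = {variance:.2f}, ratio = {dispersion:.2f}")
```

With only 11 games the ratio is suggestive rather than conclusive, but it points in the direction that motivates the negative binomial model.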

Page 14: Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.

Negative binomial

• Gamma-Poisson (mixture) distribution
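The equations for this slide were not recovered. As a sketch, assume the common (μ, k) parameterization, which matches the symbols used later for the chi-square approximation: if the Poisson rate follows a gamma distribution with shape k and mean μ, the count is negative binomial with mean μ and variance μ + μ²/k. The function name and the values of μ and k below are illustrative only.

```python
from math import exp, lgamma, log


def neg_binomial_pmf(x, mu, k):
    """P(X = x) for a negative binomial with mean mu and shape k (assumed parameterization)."""
    log_p = (
        lgamma(x + k) - lgamma(k) - lgamma(x + 1)
        + k * log(k / (k + mu))
        + x * log(mu / (k + mu))
    )
    return exp(log_p)


mu, k = 3.0, 5.0  # illustrative values, not taken from the slides
support = range(400)  # the tail beyond this point is negligible for these values

total = sum(neg_binomial_pmf(x, mu, k) for x in support)
mean = sum(x * neg_binomial_pmf(x, mu, k) for x in support)
variance = sum((x - mean) ** 2 * neg_binomial_pmf(x, mu, k) for x in support)

print(f"total = {total:.6f}, mean = {mean:.6f}, variance = {variance:.6f}")
print(f"mu + mu^2/k = {mu + mu**2 / k:.6f}")
```

The extra μ²/k term is what lets the variance exceed the mean. As k grows, the term vanishes and the model approaches the Poisson.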

Page 15: Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.

Negative binomial

Page 16: Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.

Parameter estimation

Page 17: Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.

Approximate control limits

•Chi-square approximation

$v = \dfrac{2\mu}{1 + \mu/k}$
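The slide does not show where the degrees of freedom v come from. One plausible route, offered here as an assumption, is moment matching: take a negative binomial count X with mean μ and variance μ(1 + μ/k), and pick a scale c so that cX has the mean and variance of a chi-square variable with v degrees of freedom (mean v, variance 2v).

```latex
\begin{aligned}
E[cX] &= c\mu = v, \\
\operatorname{Var}(cX) &= c^{2}\mu\left(1 + \frac{\mu}{k}\right) = 2v = 2c\mu \\
\Rightarrow\quad c &= \frac{2}{1 + \mu/k}, \qquad v = c\mu = \frac{2\mu}{1 + \mu/k}.
\end{aligned}
```

Under this reading, control limits for X would follow from chi-square quantiles with v degrees of freedom divided by c.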

Page 18: Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.

Example

= 67.0

Page 19: Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.
Page 20: Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.
Page 21: Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.

Conclusion

• Thank you for your attention

Page 22: Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.

Statistical model

• Suitable type
  ▫ Which distribution should we use?
• Parameters
  ▫ Get some information from the data
• Inference
  ▫ What do we want to know?
  ▫ How could we make a decision? Hypothesis testing
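To make the three steps concrete, here is a minimal sketch that reuses the goal example from Page 7. The choice of a one-sided test and the 0.05 significance level are assumptions for illustration, and the function names are hypothetical. The model is binomial, the parameters are n = 10 and p = 0.3, and the question is whether 6 goals in 10 shots is consistent with the usual 30% success rate.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail_p_value(observed, n, p_null):
    """P(X >= observed) under H0: X ~ Binomial(n, p_null)."""
    return sum(binomial_pmf(i, n, p_null) for i in range(observed, n + 1))


alpha = 0.05  # assumed significance level
p_value = upper_tail_p_value(6, 10, 0.3)
decision = "reject H0" if p_value < alpha else "do not reject H0"
print(f"p-value = {p_value:.4f}: {decision}")
```

The p-value is the probability of a result at least as extreme as the one observed if the null model were true. Comparing it with the chosen level turns the inference into a decision.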

Page 23: Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.

Statistical model

• Suitable type
  ▫ Binomial distribution
• Parameters
  ▫ n = 10, p = 0.7
• Inference
  ▫ 2 successes

[Figure: bar plot titled "dbinom(0:10, n=10, p=0.3)"; x-axis: goals (0 to 10); y-axis: probability (0.00 to 0.25).]

Page 24: Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.

Multinomial distribution

• Generalizes the binomial distribution to trials with more than two outcomes. The analog of the Bernoulli distribution is the categorical distribution, where each trial results in exactly one of a fixed finite number k of possible outcomes.
• http://en.wikipedia.org/wiki/Multinomial_distribution

Page 25: Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.

Trinomial distribution

Page 26: Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.

Count data

• A type of data in which the observations can take only the non-negative integer values {0, 1, 2, 3, ...}, and where these integers arise from counting rather than ranking.
• We tend to use fixed fractions of genes.
• The probability that reads appear in this region (binomial distribution)
• The number of read counts in this interval (Poisson distribution)
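The two views are closely related. When many reads each have a small chance of landing in a region, the binomial count is well approximated by a Poisson with rate λ = n·p. The sketch below compares the two. The read total and per-read probability are hypothetical numbers chosen only for illustration.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)


n_reads = 10_000   # hypothetical total number of reads
p_region = 0.0003  # hypothetical chance that one read falls in the region
lam = n_reads * p_region  # expected number of reads in the region

max_gap = max(
    abs(binomial_pmf(k, n_reads, p_region) - poisson_pmf(k, lam))
    for k in range(15)
)
for k in range(7):
    print(k, round(binomial_pmf(k, n_reads, p_region), 4), round(poisson_pmf(k, lam), 4))
print(f"largest difference over k = 0..14: {max_gap:.2e}")
```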

Page 27: Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.
Page 28: Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.

Poisson example

[Figure: Poisson versus raw data; x-axis ticks 0 to 9; y-axis ticks 0 to 3; legend: Poisson, raw data.]

Page 29: Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang.

Negative binomial