Biostatistics 410.645.01 Class 3 Discrete Probability Distributions 2/8/2000.

Biostatistics 410.645.01

Class 3

Discrete Probability Distributions

2/8/2000

Probability distributions of discrete variables

• A table, graph, formula, or other device used to specify all possible values of a discrete random variable along with their respective probabilities, P(X=x)

• Tables – value, frequency, probability

• Graph – usually bar chart or histogram

• Formula – Binomial distribution

Cumulative Distributions

• Probability that X is less than or equal to a specified value, x_i

• Calculated by adding successive probabilities P(X=x_i)

• Easier to work with for many applications

• P(X ≤ x_i)

• Theoretical distribution can be compared to sample distribution to determine appropriateness of theoretical distribution
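As a minimal sketch of how a cumulative distribution is built from a probability table, the running sum below adds successive probabilities P(X=x_i). The values and probabilities are made up for illustration and do not come from the lecture.

```python
from itertools import accumulate

# Hypothetical probability table for a discrete random variable X
# (illustrative numbers only; they are not from the lecture).
values = [0, 1, 2, 3]
probs = [0.1, 0.3, 0.4, 0.2]

# P(X <= x_i) is the running sum of P(X = x_j) over all x_j <= x_i.
cumulative = list(accumulate(probs))

for x, p, c in zip(values, probs, cumulative):
    print(f"x={x}  P(X=x)={p:.2f}  P(X<=x)={c:.2f}")
```

The last cumulative value should equal 1, which is a quick check that the table is a valid probability distribution.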

Theoretical Probability Distributions

• Why bother? Isn’t observation enough?

– If we know (reasonably) that data are from a certain distribution, then we know a lot about it

• Means, standard deviations, other measures of dispersion

– That knowledge makes it easier to make statistical inference; i.e., to test differences

• Many types of distributions

– 1300+ have been documented in the literature

• Three main ones

– Binomial (discrete - 0,1)

– Poisson (discrete counts)

– Normal (continuous)

Binomial Distribution

• Derived from a series of binary outcomes, each called a Bernoulli trial

• When a random process or experiment, called a trial, can result in only one of two mutually exclusive outcomes, such as dead or alive, sick or well, the trial is called a Bernoulli trial

Bernoulli Process

• A sequence of Bernoulli trials forms a Bernoulli process under the following conditions

– Each trial results in one of two possible, mutually exclusive, outcomes: “success” and “failure”

– Probability of success, p, remains constant from trial to trial. Probability of failure is q = 1-p.

– Trials are independent; that is, success in one trial does not influence the probability of success in a subsequent trial.

Bernoulli Process - Example

• Probability of a certain sequence of binary outcomes (Bernoulli trials) is a function of p and q.

• For example, a particular sequence of 3 “successes” and 2 “failures” can be represented by p*p*p*q*q = $p^3 q^2$

• However, if we ask for the probability of 3 “successes” and 2 “failures” in a set of 5 trials, then we need to know how many possible combinations of 3 successes and 2 failures there are out of all of the possible outcomes.
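The counting step can be sketched by brute force: list every possible sequence of 5 trials and keep those with exactly 3 successes. The value p = 0.3 is an assumption chosen only for illustration.

```python
from itertools import product

p = 0.3      # assumed probability of success (illustrative value)
q = 1 - p    # probability of failure

# Enumerate all 2**5 sequences of 5 Bernoulli trials (1 = success, 0 = failure)
# and keep the ones containing exactly 3 successes.
sequences = [s for s in product([0, 1], repeat=5) if sum(s) == 3]

# Every such sequence has the same probability, p^3 * q^2.
prob_one_sequence = p**3 * q**2

# Probability of 3 successes in 5 trials, in any order.
prob_three_successes = len(sequences) * prob_one_sequence

print(len(sequences), prob_three_successes)
```

There are 10 such sequences, so the probability of 3 successes in any order is 10 times the probability of one particular sequence.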

Combinations

• Based on the last example, it is clear that we need an easier way to calculate the probability of a particular result

– If a set consists of n objects, and we wish to form a subset of x objects from these n objects, without regard to order of the objects in the subset, the result is called a combination

• The number of combinations of n objects taken x at a time is given by

– $_nC_x = \frac{n!}{x!\,(n-x)!}$

– Where x! (x factorial) is the product of all whole numbers from x down to 1

• 0! = 1
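A short sketch of the combinations formula, written directly from the factorial definition and compared with Python's built-in `math.comb` (available in Python 3.8 and later). The helper name `n_choose_x` is hypothetical.

```python
from math import comb, factorial

def n_choose_x(n, x):
    """Number of combinations of n objects taken x at a time: n! / (x! (n-x)!)."""
    return factorial(n) // (factorial(x) * factorial(n - x))

# 3 successes among 5 trials can be arranged in 10 ways.
print(n_choose_x(5, 3), comb(5, 3))

# 0! = 1 makes the formula work at the edges: there is exactly one way
# to choose none of the objects and one way to choose all of them.
print(n_choose_x(5, 0), n_choose_x(5, 5))
```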

Combinations

• From this, we can determine the binomial probability density function

– $f(x) = {}_nC_x \, p^x q^{n-x}$ for x = 0, 1, 2, 3, …, n

– f(x) = 0 elsewhere

– This is called the binomial distribution
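The binomial function can be sketched in a few lines. The function name `binomial_pmf` and the example values n = 5, p = 0.3 are assumptions for illustration, and x is assumed to be an integer.

```python
from math import comb

def binomial_pmf(x, n, p):
    """f(x) = nCx * p^x * q^(n-x) for x = 0..n, and 0 elsewhere (x is an integer)."""
    if x < 0 or x > n:
        return 0.0
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

n, p = 5, 0.3  # illustrative values

# Probability of each possible number of successes.
distribution = [binomial_pmf(x, n, p) for x in range(n + 1)]

# The probabilities over x = 0..n should add up to 1.
total = sum(distribution)
print(distribution, total)
```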

Permutations

• Similar to combinations

– If a set consists of n objects, and we wish to form a subset of x objects from these n objects, taking into account the order of the objects in the subset, the result is called a permutation

• The number of permutations of n objects taken x at a time is given by

– $_nP_x = \frac{n!}{(n-x)!}$
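A brief sketch contrasting permutations with combinations, using `math.perm` and `math.comb` (both available in Python 3.8 and later). The helper name `n_perm_x` is hypothetical.

```python
from math import comb, factorial, perm

def n_perm_x(n, x):
    """Number of ordered selections of x objects from n: n! / (n-x)!."""
    return factorial(n) // factorial(n - x)

n, x = 5, 3
print(n_perm_x(n, x), perm(n, x))   # ordered selections
print(comb(n, x))                   # unordered selections

# Each unordered subset of x objects can be arranged in x! orders,
# so permutations = combinations * x!.
same = n_perm_x(n, x) == comb(n, x) * factorial(x)
print(same)
```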

Binomial Table

• Normally, we would look up probabilities in the Binomial Table (Table 1 in the Appendix)

– The table gives the binomial probability distribution function, P(X=k)

– Find the probability of x = 4 successes when n = 10 trials and p (probability of success) = 0.3

– Find the probability that x ≤ 4

– Find the probability that x ≥ 5
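The table lookups can be checked by direct calculation. This sketch assumes the two cumulative questions are P(X ≤ 4) and P(X ≥ 5) for n = 10 and p = 0.3; the function name `binomial_pmf` is hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3

p_eq_4 = binomial_pmf(4, n, p)                          # P(X = 4)
p_le_4 = sum(binomial_pmf(k, n, p) for k in range(5))   # P(X <= 4), k = 0..4
p_ge_5 = 1 - p_le_4                                     # P(X >= 5), the complement

print(f"P(X = 4)  = {p_eq_4:.4f}")   # about 0.2001
print(f"P(X <= 4) = {p_le_4:.4f}")   # about 0.8497
print(f"P(X >= 5) = {p_ge_5:.4f}")   # about 0.1503
```

Because "at most 4" and "at least 5" are complementary events, the last two probabilities add up to 1.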

Binomial Table when p > 0.5

• Restate problem in terms of failures

– P(X=k|n, p>0.50) = P(X=n-k|n,1-p)

– Treat q as p for purposes of using the table

– For cumulative probabilities:

• P(X≤k|n,p>0.5) = P(X≥n-k|n,1-p)

– For the probability that X ≥ some k when p > 0.5,

• P(X≥k|n,p>0.5) = P(X≤n-k|n,1-p)
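The restatement in terms of failures can be verified numerically. The sketch below assumes n = 10, p = 0.7, and k = 6 as illustrative values, and reads the cumulative rule as P(X ≤ k | n, p) = P(X ≥ n-k | n, 1-p); the function name `binomial_pmf` is hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p, k = 10, 0.7, 6   # illustrative values with p > 0.5
q = 1 - p

# Exactly k successes with probability p is the same event as
# exactly n-k failures, where a failure has probability q.
direct_eq = binomial_pmf(k, n, p)
via_failures_eq = binomial_pmf(n - k, n, q)

# At most k successes is the same event as at least n-k failures.
direct_le = sum(binomial_pmf(j, n, p) for j in range(0, k + 1))
via_failures_ge = sum(binomial_pmf(j, n, q) for j in range(n - k, n + 1))

print(direct_eq, via_failures_eq)
print(direct_le, via_failures_ge)
```

Each pair of printed values should agree up to floating-point rounding, which is why a table limited to p ≤ 0.5 is sufficient.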

Binomial parameters

• Mean

µ = np

• Variance

σ² = np(1-p)

• Appropriateness in sampling situations

– Appropriate if the sample size n is small relative to the population size N

– Otherwise, the binomial is not really appropriate in a sampling situation
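The mean and variance formulas can be checked against the definition of expected value by summing over all outcomes. The values n = 10 and p = 0.3 are assumptions for illustration, and `binomial_pmf` is a hypothetical helper.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3   # illustrative values

# Mean: sum of k * P(X = k) over all possible k.
mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))

# Variance: sum of (k - mean)^2 * P(X = k) over all possible k.
variance = sum((k - mean) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))

print(mean, n * p)                 # both close to 3.0
print(variance, n * p * (1 - p))   # both close to 2.1
```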

Poisson Distribution

• Used for counting processes

• If k is the number of occurrences of some random event in an interval of time or space, the probability that k will occur is given by

– $f(k) = \frac{e^{-\mu}\,\mu^{k}}{k!}$, k = 0, 1, 2, …

– where µ is the average number of occurrences of the random event in the interval t.

– e = 2.7183

• Parameters of the Poisson distribution

– Mean = µ

– Variance = µ
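A minimal sketch of the Poisson probability function, assuming it has the standard form e^(-µ) µ^k / k! with µ the average number of occurrences in the interval. The function name `poisson_pmf` and the value µ = 6.0 are illustrative assumptions.

```python
from math import exp, factorial

def poisson_pmf(k, mu):
    """P(X = k) for a Poisson variable with mean mu: e^(-mu) * mu^k / k!."""
    return exp(-mu) * mu**k / factorial(k)

mu = 6.0   # illustrative average number of occurrences

# The distribution has no upper limit on k, so the sums are truncated
# at a point where the remaining probability is negligible.
ks = range(60)
total = sum(poisson_pmf(k, mu) for k in ks)
mean = sum(k * poisson_pmf(k, mu) for k in ks)
variance = sum((k - mean) ** 2 * poisson_pmf(k, mu) for k in ks)

print(total, mean, variance)   # close to 1, mu, and mu
```

The mean and the variance both come out equal to µ, which is the defining feature of the Poisson parameters.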

Poisson Process

• Assumptions

– Occurrences of events are independent; i.e., occurrence of an event has no effect on the probability of the occurrence of a second event

– Theoretically, an infinite number of occurrences of the event must be possible in the interval

– Probability of a single occurrence of the event in a given interval is proportional to the length of the interval; i.e., constant event rate

– In an infinitesimally small portion of the interval, the probability of more than one occurrence of the event is negligible; i.e., the event times are unique and discrete

Application of the Poisson Distribution

• Cancer recurrences

– Bladder cancer

– Breast cancer

• Infections

• Earthquakes

• Plane crashes

Using Table of Poisson Distribution

• Use Table 2 to look up probabilities for Poisson variables

– The table gives exact Poisson probabilities, Pr(X=k)

– Example

• Probability of obtaining exactly 4 events for a Poisson distribution with µ = 6.0

• Probability of at least 12

• Probability of 3 or fewer
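The three example lookups can be reproduced by direct calculation. This sketch assumes the distribution's mean is 6.0 and uses a hypothetical helper `poisson_pmf`.

```python
from math import exp, factorial

def poisson_pmf(k, mu):
    """P(X = k) for a Poisson variable with mean mu."""
    return exp(-mu) * mu**k / factorial(k)

mu = 6.0

# Exactly 4 events.
p_exactly_4 = poisson_pmf(4, mu)

# At least 12 events: 1 minus the probability of 11 or fewer.
p_at_least_12 = 1 - sum(poisson_pmf(k, mu) for k in range(12))

# 3 or fewer events: add the probabilities for k = 0, 1, 2, 3.
p_at_most_3 = sum(poisson_pmf(k, mu) for k in range(4))

print(f"P(X = 4)   = {p_exactly_4:.4f}")     # about 0.1339
print(f"P(X >= 12) = {p_at_least_12:.4f}")   # about 0.0201
print(f"P(X <= 3)  = {p_at_most_3:.4f}")     # about 0.1512
```

"At least 12" is computed through the complement because the upper tail has no finite end to sum over.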

Poisson Approximation to the Binomial Distribution

• When n is large and p is small, the Poisson is a reasonable approximation to the Binomial

– Poisson is easier to work with
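To see how close the approximation is, the sketch below compares binomial probabilities with Poisson probabilities that use the same mean, µ = np. The values n = 1000 and p = 0.005 are assumptions chosen to represent "large n, small p", and both helper names are hypothetical.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, mu):
    """P(X = k) for a Poisson variable with mean mu."""
    return exp(-mu) * mu**k / factorial(k)

n, p = 1000, 0.005   # illustrative: many trials, rare event
mu = n * p           # match the Poisson mean to the binomial mean

# Largest gap between the two probabilities over k = 0..20.
max_diff = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, mu))
               for k in range(21))

for k in range(0, 11, 2):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, mu), 4))
print("largest difference:", max_diff)
```

The Poisson side needs only the single parameter µ and avoids the large factorials in the combinations term, which is one sense in which it is easier to work with.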
