Biostatistics 410.645.01 Class 3 Discrete Probability Distributions 2/8/2000.
Biostatistics 410.645.01
Class 3
Discrete Probability Distributions
2/8/2000
Probability distributions of discrete variables
• A table, graph, formula, or other device used to specify all possible values of a discrete random variable along with their respective probabilities
– $P(X = x)$
• Tables – value, frequency, probability
• Graph – usually bar chart or histogram
• Formula – Binomial distribution
Cumulative Distributions
• Probability that X is less than or equal to a specified value, $x_i$
• Calculated by adding successive probabilities $P(X = x_i)$
• Easier to work with for many applications
• $P(X \le x_i)$
• Theoretical distribution can be compared to sample distribution to determine appropriateness of theoretical distribution
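As a rough illustration of how a cumulative distribution is built by adding successive probabilities, the sketch below uses a small made-up probability table (the values and probabilities are hypothetical, not from the course data).

```python
from itertools import accumulate

# Hypothetical probability table for a discrete variable X;
# these numbers are illustrative only.
values = [0, 1, 2, 3]
pmf = [0.1, 0.4, 0.3, 0.2]          # P(X = x_i), must sum to 1

# Cumulative distribution: running sum of the successive probabilities
cdf = list(accumulate(pmf))          # P(X <= x_i)

for x, p, c in zip(values, pmf, cdf):
    print(f"x = {x}   P(X = x) = {p:.2f}   P(X <= x) = {c:.2f}")
```

The last cumulative value should equal 1 (up to floating-point rounding), which is a quick check that the table is a valid distribution.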
Theoretical Probability Distributions
• Why bother? Isn’t observation enough?
– If we know (reasonably) that data are from a certain distribution, then we know a lot about it
• Means, standard deviations, other measures of dispersion
– That knowledge makes it easier to make statistical inference; i.e., to test differences
• Many types of distributions
– 1300+ have been documented in the literature
• Three main ones
– Binomial (discrete - 0,1)
– Poisson (discrete counts)
– Normal (continuous)
Binomial Distribution
• Derived from a series of binary outcomes, each called a Bernoulli trial
• When a random process or experiment, called a trial, can result in only one of two mutually exclusive outcomes, such as dead or alive, sick or well, the trial is called a Bernoulli trial
Bernoulli Process
• A sequence of Bernoulli trials forms a Bernoulli process under the following conditions
– Each trial results in one of two possible, mutually exclusive, outcomes: “success” and “failure”
– Probability of success, p, remains constant from trial to trial. Probability of failure is q = 1-p.
– Trials are independent; that is, success in one trial does not influence the probability of success in a subsequent trial.
Bernoulli Process - Example
• Probability of a certain sequence of binary outcomes (Bernoulli trials) is a function of p and q.
• For example, a particular sequence of 3 “successes” and 2 “failures” can be represented by $p \cdot p \cdot p \cdot q \cdot q = p^3 q^2$
• However, if we ask for the probability of 3 “successes” and 2 “failures” in a set of 5 trials, then we need to know how many possible combinations of 3 successes and 2 failures there are among all of the possible outcomes.
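One way to see the counting problem is to list every possible sequence of 5 trials by brute force. The sketch below does this, assuming p = 0.3 purely for illustration (the slides do not fix a value of p here).

```python
from itertools import product

p = 0.3          # assumed probability of success, for illustration only
q = 1 - p        # probability of failure

# All 2**5 possible sequences of 5 Bernoulli trials (1 = success, 0 = failure)
sequences = list(product([0, 1], repeat=5))

# Keep only the sequences with exactly 3 successes (and therefore 2 failures)
matching = [s for s in sequences if sum(s) == 3]

# Every such sequence has the same probability, p^3 * q^2
prob_one_sequence = p**3 * q**2
prob_three_successes = len(matching) * prob_one_sequence

print(len(sequences), "possible sequences")
print(len(matching), "of them have 3 successes and 2 failures")
print("P(3 successes in 5 trials) =", prob_three_successes)
```

Enumeration works for 5 trials but grows as 2^n, which is why a counting formula is needed for larger n.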
Combinations
• Based on the last example, it is clear that we need an easier way to calculate the probability of a particular result
– If a set consists of n objects, and we wish to form a subset of k objects from these n objects, without regard to the order of the objects in the subset, the result is called a combination
• The number of combinations of n objects taken k at a time is given by
– ${}_nC_k = \frac{n!}{k!\,(n-k)!}$
– Where k! (k factorial) is the product of all whole numbers from k down to 1
• 0! = 1
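The combination count can be computed directly from the factorial formula. The sketch below does so and compares against Python's built-in `math.comb`; the helper name `n_choose_k` is ours, chosen for illustration.

```python
from math import comb, factorial

def n_choose_k(n: int, k: int) -> int:
    """Number of combinations via the factorial formula n! / (k! (n-k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))

# Ways to place 3 successes among 5 trials
print(n_choose_k(5, 3))

# The standard library gives the same count
print(comb(5, 3))

# 0! = 1 is what makes the edge cases k = 0 and k = n work
print(factorial(0), n_choose_k(5, 0), n_choose_k(5, 5))
```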
Combinations
• From this, we can determine the binomial probability density function
– $f(x) = {}_nC_x \, p^x q^{n-x}$ for $x = 0, 1, 2, 3, \ldots, n$
– $f(x) = 0$ elsewhere
– This is called the binomial distribution
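A minimal sketch of the binomial formula as a function, written with the standard library only. The name `binomial_pmf` and the example values n = 5, p = 0.3 are illustrative assumptions.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """f(x) = nCx * p^x * q^(n-x) for x = 0..n, and 0 elsewhere."""
    if not 0 <= x <= n:
        return 0.0
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

n, p = 5, 0.3    # illustrative values
probabilities = [binomial_pmf(x, n, p) for x in range(n + 1)]

for x, f in enumerate(probabilities):
    print(f"f({x}) = {f:.4f}")

# The probabilities over x = 0..n should add up to 1
print("total =", sum(probabilities))
```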
Permutations
• Similar to combinations
– If a set consists of n objects, and we wish to form a subset of k objects from these n objects, taking into account the order of the objects in the subset, the result is called a permutation
• The number of permutations of n objects taken k at a time is given by
– ${}_nP_k = \frac{n!}{(n-k)!}$
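The permutation count differs from the combination count only by the k! orderings of each subset. The sketch below checks this for n = 5, k = 3; the helper name `n_permute_k` is ours.

```python
from math import comb, factorial, perm

def n_permute_k(n: int, k: int) -> int:
    """Number of permutations via the factorial formula n! / (n-k)!."""
    return factorial(n) // factorial(n - k)

print(n_permute_k(5, 3))            # ordered selections of 3 from 5
print(perm(5, 3))                   # standard-library equivalent

# Each combination of k objects can be arranged in k! different orders
print(comb(5, 3) * factorial(3))
```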
Binomial Table
• Normally, we would look up probabilities in the Binomial Table (Table 1 in the Appendix)
– Tabulates the binomial probability distribution function
– $P(X = k)$
– Find the probability of x = 4 successes when n = 10 trials and p (probability of success) = 0.3
– Find the probability that $x \le 4$
– Find the probability that $x \ge 5$
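The table lookups can be reproduced by direct calculation. The sketch below uses n = 10 and p = 0.3 from the slide, and assumes the two cumulative questions are "at most 4" and "at least 5" successes.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a binomial variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3

p_exactly_4 = binomial_pmf(4, n, p)
# Assumed reading: cumulative probability of 4 or fewer successes (k = 0..4)
p_at_most_4 = sum(binomial_pmf(k, n, p) for k in range(0, 5))
# Assumed reading: 5 or more successes, obtained as the complement
p_at_least_5 = 1 - p_at_most_4

print(f"P(X = 4)  = {p_exactly_4:.4f}")
print(f"P(X <= 4) = {p_at_most_4:.4f}")
print(f"P(X >= 5) = {p_at_least_5:.4f}")
```

To four decimal places these come out to about 0.2001, 0.8497, and 0.1503.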
Binomial Table when p > 0.5
• Restate the problem in terms of failures
– $P(X = k \mid n, p > 0.50) = P(X = n-k \mid n, 1-p)$
– Treat q as p for purposes of using the table
– For cumulative probabilities:
• $P(X \le k \mid n, p > 0.5) = P(X \ge n-k \mid n, 1-p)$
– For the probability that X is at least some k when p > 0.5,
• $P(X \ge k \mid n, p > 0.5) = P(X \le n-k \mid n, 1-p)$
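The restatement in terms of failures rests on the fact that k successes with probability p is the same event as n-k failures with probability 1-p. The sketch below checks the point, lower-tail, and upper-tail versions numerically; the values n = 10, p = 0.7, k = 6 are illustrative assumptions.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k)."""
    return sum(binomial_pmf(j, n, p) for j in range(0, k + 1))

n, p, k = 10, 0.7, 6     # illustrative values with p > 0.5

# Point probability: P(X = k | n, p) versus P(X = n-k | n, 1-p)
lhs_point = binomial_pmf(k, n, p)
rhs_point = binomial_pmf(n - k, n, 1 - p)

# Lower tail: P(X <= k | n, p) versus P(X >= n-k | n, 1-p)
lhs_lower = binomial_cdf(k, n, p)
rhs_lower = 1 - binomial_cdf(n - k - 1, n, 1 - p)

# Upper tail: P(X >= k | n, p) versus P(X <= n-k | n, 1-p)
lhs_upper = 1 - binomial_cdf(k - 1, n, p)
rhs_upper = binomial_cdf(n - k, n, 1 - p)

print(lhs_point, rhs_point)
print(lhs_lower, rhs_lower)
print(lhs_upper, rhs_upper)
```

Each pair should agree up to floating-point rounding, so a table that only covers p up to 0.5 is enough.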
Binomial parameters
• Mean
– $\mu = np$
• Variance
– $\sigma^2 = np(1-p)$
• Appropriateness in sampling situations
– Appropriate if the sample size n is small relative to the population size N
– Otherwise, not really in a sampling situation
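The mean np and variance np(1-p) can be checked against the definitions of expectation and variance applied to the binomial probabilities. The sketch below does this for n = 10 and p = 0.3, values chosen only for illustration.

```python
from math import comb

n, p = 10, 0.3    # illustrative values

# Binomial probabilities for x = 0..n
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

# Mean and variance computed from their definitions
mean = sum(x * f for x, f in enumerate(pmf))
variance = sum((x - mean) ** 2 * f for x, f in enumerate(pmf))

print("mean     =", mean, " versus np      =", n * p)
print("variance =", variance, " versus np(1-p) =", n * p * (1 - p))
```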
Poisson Distribution
• Used for counting processes
• If k is the number of occurrences of some random event in an interval of time or space, the probability that k will occur is given by
– $f(k) = \frac{e^{-\lambda} \lambda^{k}}{k!}$ for $k = 0, 1, 2, \ldots$
– where $\lambda$ is the average number of occurrences of the random event ($\mu$) in the interval t.
– e = 2.7183
• Parameters of the Poisson distribution
– Mean = $\lambda$
– Variance = $\lambda$
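A minimal sketch of the Poisson probability function, taking it to be e^(-λ) λ^k / k! with λ the average number of occurrences in the interval. The name `poisson_pmf`, the value λ = 2.5, and the truncation at k = 59 are illustrative assumptions.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """f(k) = e^(-lam) * lam^k / k! for k = 0, 1, 2, ..., and 0 for negative k."""
    if k < 0:
        return 0.0
    return exp(-lam) * lam**k / factorial(k)

lam = 2.5                 # illustrative average number of events per interval
ks = range(0, 60)         # truncated support; the tail beyond this is negligible here

total = sum(poisson_pmf(k, lam) for k in ks)
mean = sum(k * poisson_pmf(k, lam) for k in ks)
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)

print("total probability =", total)
print("mean              =", mean)
print("variance          =", variance)
```

Both the mean and the variance should come out essentially equal to λ, which is the defining feature of the distribution's parameters.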
Poisson Process
• Assumptions
– Occurrences of events are independent; i.e., occurrence of an event has no effect on the probability of the occurrence of a second event
– Theoretically, an infinite number of occurrences of the event must be possible in the interval
– Probability of a single occurrence of the event in a given interval is proportional to the length of the interval; i.e., constant event rate
– In an infinitesimally small portion of the interval, the probability of more than one occurrence of the event is negligible; i.e., the event times are unique and discrete
Application of the Poisson Distribution
• Cancer recurrences
– Bladder cancer
– Breast cancer
• Infections
• Earthquakes
• Plane crashes
Using Table of Poisson Distribution
• Use Table 2 to look up probabilities for Poisson variables
– Tabulates exact Poisson probabilities Pr(X=k)
– Example
• Probability of obtaining exactly 4 events for a Poisson distribution with $\lambda = 6.0$
• Probability of at least 12
• Probability of 3 or fewer
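The three table lookups in the example can be reproduced by direct calculation. The sketch below takes the Poisson parameter to be 6.0, as in the example, and obtains the "at least 12" probability as a complement.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """Pr(X = k) for a Poisson variable with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

lam = 6.0

p_exactly_4 = poisson_pmf(4, lam)
# At least 12 events: 1 minus the probability of 0..11 events
p_at_least_12 = 1 - sum(poisson_pmf(k, lam) for k in range(0, 12))
# 3 or fewer events: sum over k = 0..3
p_at_most_3 = sum(poisson_pmf(k, lam) for k in range(0, 4))

print(f"Pr(X = 4)   = {p_exactly_4:.4f}")
print(f"Pr(X >= 12) = {p_at_least_12:.4f}")
print(f"Pr(X <= 3)  = {p_at_most_3:.4f}")
```

To four decimal places these come out to about 0.1339, 0.0201, and 0.1512.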
Poisson Approximation to the Binomial Distribution
• When n is large and p is small, the Poisson is a reasonable approximation to the Binomial
– Poisson is easier to work with
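The quality of the approximation can be checked numerically by setting the Poisson mean equal to np. The sketch below compares the two distributions for n = 1000 and p = 0.005; these values are illustrative assumptions, not taken from the slides.

```python
from math import comb, exp, factorial

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Exact binomial probability P(X = k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability Pr(X = k) with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

n, p = 1000, 0.005     # illustrative: large n, small p
lam = n * p            # match the Poisson mean to the binomial mean np

for k in range(0, 11):
    b = binomial_pmf(k, n, p)
    po = poisson_pmf(k, lam)
    print(f"k = {k:2d}   binomial = {b:.5f}   Poisson = {po:.5f}")

# Largest pointwise disagreement over k = 0..20
max_abs_diff = max(
    abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(0, 21)
)
print("largest difference:", max_abs_diff)
```

Trying a larger p (say 0.3 with n = 10) in the same script shows the two columns drifting apart, which is why the approximation is reserved for large n and small p.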