Statistics for Bioinformatics - Probability · 2014-06-11 · (Second Edition) Warren J. Ewens,...

Statistics for Bioinformatics: Probability

Xiuxia [email protected]

Department of Bioinformatics & Genomics, University of North Carolina at Charlotte

Fall 2009

1 / 54

Outline

Logistics

Motivation

Probability

Set operations

Permutation

Combination

Conditional probability

Independence

Bayes’ theorem

2 / 54

Logistics

Reference books

• Statistical Methods in Bioinformatics: An Introduction (Second Edition), Warren J. Ewens, Gregory R. Grant

• Applied Linear Statistical Models (Fifth Edition), Michael H. Kutner, Christopher J. Nachtsheim, John Neter, William Li

• Introduction to Probability and Statistics (Third Edition), J. S. Milton, Jesse C. Arnold

• Practical Nonparametric Statistics (Third Edition), W. J. Conover

• Introductory Statistics with R (Second Edition), Peter Dalgaard

Programming language: R, Matlab, or another language of your choice

Homework: biweekly

Exams: midterm and final

3 / 54

Motivation

Example. BLAST (Basic Local Alignment Search Tool) is used to find regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
Q: How is the statistical significance calculated? How is the search performed?

Example. FDR (false discovery rate). With high-throughput technologies such as modern mass spectrometry (MS) widely used for proteomics and metabolomics studies, hundreds to thousands of peptides/metabolites can be identified from a single MS analysis. Many of these identifications are wrong.
Q: How to select correct identifications and estimate the associated FDR?

Many other examples ...

Introduction

We will learn how to perform statistical analysis in biological studies in this course.

Deterministic models: The equations can be used to determine the value of a specific variable in the model based on knowledge of the values assumed by the other model variables.

Example

Perfect gas law: PV = RT

But there is always some uncertainty in experimental science.

Thus we need:

Statistical models: Allow us to assess the degree of uncertainty present in our experimental results.

5 / 54

Statistical Methods

Descriptive statistics: to describe or picture a data set.

Inferential statistics: to draw conclusions about a large group of objects, called the population, based on observing only a sample, or a portion of the objects in the population.

Model building: to develop prediction equations from experimental data.

[Figure: Population (ideal but theoretical world whose characteristics are to be described) and Sample (real and hands-on world whose characteristics can be observed)]

6 / 54

Probability

The mathematics on which statistical methods rest is probability theory.

Interpreting probabilities:

Probabilities are numbers between 0 and 1, inclusive.

Probabilities near 1 indicate that the event is extremely likely to occur.

Probabilities near zero indicate that the event is not very likely to occur.

Probabilities near 1/2 indicate that the event is just as likely to occur as not.

7 / 54

Assign Probabilities

Personal opinion

Relative frequency: an approximation

$$P(A) = \lim_{n \to \infty} \frac{n_A}{n} \approx \frac{n_A}{n} = \frac{\text{number of times event } A \text{ occurred}}{\text{number of times experiment was run}}$$

Classical approach: accurate assignment

$$P(A) = \frac{n(A)}{n(S)} = \frac{\text{number of ways event } A \text{ can occur}}{\text{number of ways experiment can proceed}}$$

Example

Q: What is the probability that a child born to a couple heterozygous for eye color (each with genes for both brown and blue eyes) will be brown-eyed?
A: Assume the gene for brown eyes is dominant; then we get P = 3/4.

8 / 54

Sample Space

Definition (Sample space and sample point). A sample space for an experiment is a set S with the property that each physical outcome of the experiment corresponds to exactly one element of S. An element of S is called a sample point.

Example: Possible eye color genotypes are: S = {(brown, blue), (blue, brown), (blue, blue), (brown, brown)}, where the first member of each pair represents the gene received from the father.

If a sample space has a finite number of points, it is called a finite sample space. If it has as many points as there are natural numbers 1, 2, ..., it is called a countably infinite sample space. If it has as many points as there are in some interval on the x-axis, such as 0 ≤ x ≤ 1, it is called a noncountably infinite sample space. A sample space which is finite or countably infinite is called a discrete sample space, while one which is noncountably infinite is called a continuous sample space.

9 / 54

Events

Definition (Event). Any subset of a sample space is called an event. The empty set ∅ is called the impossible event, and the subset S is called the certain event.

Example: The event A that the child is brown-eyed = {(brown, blue), (blue, brown), (brown, brown)}.

$$P(A) = \frac{n(A)}{n(S)} = \frac{3}{4}$$

If the outcome of an experiment is an element of event A, we say that event A has occurred. An event consisting of a single point of S is often called a simple or elementary event.

Since events are sets, it is clear that statements concerning events can be translated into the language of set theory. In particular, we have an algebra of events corresponding to the algebra of sets.
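The classical assignment P(A) = n(A)/n(S) for the eye-color example can be checked by listing the sample space explicitly. The following is a minimal Python sketch, assuming (as in the example) that each parent passes on either allele with equal chance and that brown is dominant; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Each heterozygous parent passes on one of two alleles with equal chance.
alleles = ("brown", "blue")

# Sample space S: (allele from father, allele from mother).
sample_space = list(product(alleles, repeat=2))

# Event A: the child is brown-eyed (at least one brown allele, brown dominant).
event_a = [pair for pair in sample_space if "brown" in pair]

# Classical probability: n(A) / n(S).
p_brown = Fraction(len(event_a), len(sample_space))
print(sample_space)
print(p_brown)  # 3/4
```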

10 / 54

Set Operations

Union: A ∪ B

Intersection: A ∩ B

Difference: A \ B (also written A − B)

Complement: A^c

Some theorems involving sets

A ∪ B = B ∪ A

A ∪ (B ∪ C) = (A ∪ B) ∪ C = A ∪ B ∪ C

A ∩ B = B ∩ A

A ∩ (B ∩ C) = (A ∩ B) ∩ C = A ∩ B ∩ C

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)

A − B = A ∩ B^c

If A ⊂ B, then A^c ⊃ B^c, or equivalently B^c ⊂ A^c

cont’d on the next slide

11 / 54

Sets and Events

A ∪ ∅ = A, A ∩ ∅ = ∅

A ∪ S = S, A ∩ S = A

(A ∪ B)^c = A^c ∩ B^c

(A ∩ B)^c = A^c ∪ B^c

A = (A ∩ B) ∪ (A ∩ B^c)

Principle of duality: Any true result involving sets is also true if we replace unions by intersections, intersections by unions, and sets by their complements, and if we reverse the inclusion symbols ⊂ and ⊃.

By using set operations on events in S, we can obtain other events in S.

A ∪ B is the event “either A or B or both.”

A ∩ B is the event “both A and B.”

A^c is the event “not A.”

A − B is the event “A but not B.”

Sets and Events

Definition (Mutually exclusive events). Two events A1 and A2 are mutually exclusive if and only if A1 ∩ A2 = ∅. Events A1, A2, A3, ... are mutually exclusive if and only if Ai ∩ Aj = ∅ for i ≠ j.

Example. If we toss a coin twice, then S = {HT, TH, HH, TT}. Let A be the event “at least one head occurs” and B the event “the second toss results in a tail.” Then A = {HT, HH, TH} and B = {HT, TT}. Then

A ∪ B = {HT, TH, HH, TT} = S, A ∩ B = {HT}

A^c = {TT}, A − B = {TH, HH}

When the physical description of the experiment leads us to believe that the possible outcomes are equally likely, we can compute

$$P(A) = \frac{n(A)}{n(S)}$$

Thus, we need to count n(A) and n(S).
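Because events are sets, the two-toss example maps directly onto a language's built-in set type. The sketch below uses Python sets to reproduce the union, intersection, complement, and difference listed above, and also checks one of De Morgan's laws on this sample space; the variable names are illustrative.

```python
# Sample space for two coin tosses; each outcome is a two-letter string.
S = {"HT", "TH", "HH", "TT"}

A = {o for o in S if "H" in o}      # at least one head occurs
B = {o for o in S if o[1] == "T"}   # the second toss results in a tail

union = A | B           # "either A or B or both"
intersection = A & B    # "both A and B"
complement_a = S - A    # "not A"
difference = A - B      # "A but not B"

# De Morgan: (A ∪ B)^c = A^c ∩ B^c
de_morgan_holds = (S - (A | B)) == ((S - A) & (S - B))

print(union, intersection, complement_a, difference, de_morgan_holds)
```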

13 / 54

Permutations

Definition (Permutation). A permutation is an arrangement of objects in a definite order.

Example. A pentapeptide consisting of the five amino acids

alanine-valine-glycine-cysteine-tryptophan

has different properties and is, in fact, a different compound from the pentapeptide

alanine-glycine-valine-cysteine-tryptophan

which contains the same amino acids. Peptides are permutations of amino acids because the sequence, or order, of the amino acids in the chain is important.

14 / 54

Permutations

Definition (Factorial). Let n be a positive integer. The product n(n − 1)(n − 2) ··· 3 · 2 · 1 is called n factorial and is denoted by n!. Zero factorial, denoted by 0!, is defined to be 1.

Multiplication principle. Consider an experiment taking place in k stages. Let n_i denote the number of ways in which stage i can occur for i = 1, 2, ..., k. Altogether the experiment can occur in

$$\prod_{i=1}^{k} n_i = n_1 \cdot n_2 \cdots n_k$$

ways.

Theorem. There are n! ways of arranging n distinguishable objects into a row.

Example. In how many ways can the five amino acids alanine, valine, glycine, cysteine, and tryptophan be arranged to form a pentapeptide?
A: 5! = 120

15 / 54

Permutations

The multiplication principle should be the first thing that comes to mind when a problem involves order.

Theorem (Sampling without replacement). The number of permutations of n distinct objects used k at a time is

$$P(n, k) = \frac{n!}{(n-k)!}$$

Application notes: The objects to be arranged must be distinct, no repetition is allowed, and there can be no restrictions on any position in the arrangement.
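The formula P(n, k) = n!/(n − k)! can be compared against an explicit enumeration. This is a small Python sketch; the helper name `count_permutations` and the tripeptide case (3 of the 5 amino acids) are illustrative additions, not part of the slides.

```python
import math
from itertools import permutations

def count_permutations(n: int, k: int) -> int:
    """Number of ordered arrangements of k objects chosen from n distinct objects."""
    return math.factorial(n) // math.factorial(n - k)

amino_acids = ["alanine", "valine", "glycine", "cysteine", "tryptophan"]

# Enumerate all orderings of the five amino acids (pentapeptides).
pentapeptides = len(list(permutations(amino_acids)))

# Hypothetical extra case: ordered chains using only 3 of the 5 amino acids.
tripeptides = len(list(permutations(amino_acids, 3)))

print(pentapeptides, count_permutations(5, 5))
print(tripeptides, count_permutations(5, 3))
```

The standard library function `math.perm(n, k)` (Python 3.8 and later) returns the same count directly.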

16 / 54

Permutations

Theorem (Sampling with replacement). The number of ordered selections of k objects out of n objects, with repetition allowed, is

$$n \times n \times \cdots \times n = n^k$$

Example: If an experiment consists of k trials where each trial may result in one of n possible outcomes, there are n^k possible outcomes of the entire experiment.

Example. In a group of k people, what is the probability that at least 2 people will have the same birthday?
A: Assume n = 365 and that birthdays are equally distributed throughout the year, with no twins. Then the number of different assignments of birthdays is n(S) = 365^k.
The number of assignments in which at least 2 birthdays are the same is n(A) = n(S) − P(365, k).

$$P(\text{at least 2 have the same birthday}) = \frac{n(A)}{n(S)} = \frac{n(S) - P(365, k)}{n(S)} = 1 - \frac{P(365, k)}{365^k}$$
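The birthday probability is easy to evaluate numerically once it is written as 1 − P(365, k)/365^k. The following Python sketch assumes equally likely birthdays, as in the example; the function name `p_shared_birthday` is illustrative.

```python
import math

def p_shared_birthday(k: int, days: int = 365) -> float:
    """P(at least two of k people share a birthday), assuming equally likely days."""
    if k > days:
        return 1.0  # pigeonhole: more people than days forces a match
    # math.perm(days, k) counts the assignments with all birthdays distinct.
    return 1.0 - math.perm(days, k) / days**k

for k in (10, 22, 23, 50):
    print(k, round(p_shared_birthday(k), 4))
```

Under these assumptions, k = 23 is the smallest group size for which the probability exceeds 1/2.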

Permutations

Theorem (Sampling with indistinguishable objects). If a group of n objects is composed of n_1 identical objects of type 1, n_2 identical objects of type 2, ..., n_k identical objects of type k, the number of distinguishable arrangements into a row is given by the multinomial coefficient:

$$\binom{n}{n_1, n_2, \ldots, n_k} = \frac{n!}{n_1!\, n_2! \cdots n_k!}, \qquad n = n_1 + n_2 + \cdots + n_k$$

Example. The total number of ways to arrange 2 adenine (A), 3 guanine (G), 2 cytosine (C), and 1 thymine (T) into a DNA sequence is:

$$\frac{(2 + 3 + 2 + 1)!}{2!\,3!\,2!\,1!} = \frac{8!}{2!\,3!\,2!\,1!} = 1680$$
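As a sanity check on the multinomial coefficient, the DNA example is small enough to enumerate by brute force. The Python sketch below compares the formula with the number of distinct orderings of the string "AAGGGCCT"; the helper name `multinomial` is illustrative.

```python
import math
from itertools import permutations

def multinomial(*counts: int) -> int:
    """n! / (n_1! n_2! ... n_k!) with n = sum of the counts."""
    total = math.factorial(sum(counts))
    for c in counts:
        total //= math.factorial(c)  # each partial quotient is an integer
    return total

# 2 A, 3 G, 2 C, 1 T
formula = multinomial(2, 3, 2, 1)

# Brute force: generate all 8! orderings and keep only the distinct sequences.
brute_force = len(set(permutations("AAGGGCCT")))

print(formula, brute_force)
```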

18 / 54

Permutations

Example. 20 members of a club need to be split into 3 committees of 8, 8, and 4 people, respectively. How many ways are there to split the club into these committees?

$$\text{ways to split} = \binom{20}{8, 8, 4} = \frac{20!}{8!\,8!\,4!}$$

Example. When rolling 12 dice, what is the probability that 6 pairs are thrown, that is, that each of the six face values appears exactly twice?
A: There are 6^12 possibilities for the dice throws, as each of the 12 dice has 6 possible values. The number of outcomes in which 6 pairs show up is equal to

$$\binom{12}{2, 2, 2, 2, 2, 2} = \frac{12!}{(2!)^6}$$

Thus,

$$P(\text{6 pairs are thrown}) = \frac{12!}{(2!)^6\, 6^{12}}$$

19 / 54

Combinations

Definition (Combination). A combination is a selection of objects without regard to order.

Theorem. The number of combinations of n distinct objects selected k at a time is given by

$$C(n, k) = \binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

Binomial theorem.

$$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}$$

Example: a red balls, b black balls. The number of distinguishable ways to order them in a row is:

$$\binom{a + b}{a} = \binom{a + b}{b}$$

20 / 54

Combinations

Example. How many ways are there to split n identical balls into k boxes?
A: Visualize the balls in boxes in a line, as shown below.

[Figure: n balls in a row, divided into k boxes by k − 1 separators between two fixed outer walls]

If you fix the outer walls of the first and last boxes, you can rearrange the separators and the balls and count the arrangements with a binomial coefficient. There are n balls and k − 1 separators (k boxes).

Number of different ways to arrange the balls and separators =

$$\binom{n + k - 1}{n} = \binom{n + k - 1}{k - 1}$$
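The balls-and-separators count can be verified for small cases by listing every way to fill the boxes. This Python sketch counts tuples (r_1, ..., r_k) of non-negative integers with r_1 + ... + r_k = n and compares the result with the binomial coefficient; both function names are illustrative.

```python
import math
from itertools import product

def count_splits_brute_force(n: int, k: int) -> int:
    """Count tuples (r_1, ..., r_k) of non-negative integers that sum to n."""
    return sum(1 for r in product(range(n + 1), repeat=k) if sum(r) == n)

def count_splits_formula(n: int, k: int) -> int:
    """Arrangements of n balls and k - 1 separators in a row."""
    return math.comb(n + k - 1, k - 1)

for n, k in [(3, 2), (5, 3), (4, 4)]:
    print(n, k, count_splits_brute_force(n, k), count_splits_formula(n, k))
```

The brute-force version grows as (n + 1)^k, so it is only meant for checking the formula on small inputs.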

21 / 54

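Combinations

Example. For f(x_1, x_2, ..., x_k), take n partial derivatives, for example

$$\frac{\partial^n f}{\partial x_1^2\, \partial x_2\, \partial x_3^5 \cdots \partial x_k}$$

What is the total number of different partial derivatives of order n?
A: Treat the k coordinates as “boxes” and the n differentiations as “balls” (the order of differentiation is taken not to matter):

$$\binom{n + k - 1}{n} = \binom{n + k - 1}{k - 1}$$

Example. In a deck of 52 cards, 5 cards are chosen. What is the probability that all 5 cards have different face values?
A: Total number of outcomes = $\binom{52}{5}$

Total number of face value combinations = $\binom{13}{5}$

Total number of suit possibilities, with replacement = 4^5

$$P(\text{all 5 different face values}) = \frac{\binom{13}{5}\, 4^5}{\binom{52}{5}}$$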
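The card probability can be evaluated exactly with integer arithmetic. A minimal Python sketch follows, using `math.comb` for the binomial coefficients and `Fraction` to keep the result exact; the variable names are illustrative.

```python
import math
from fractions import Fraction

total_hands = math.comb(52, 5)      # all 5-card hands
face_value_sets = math.comb(13, 5)  # choose 5 distinct face values
suit_choices = 4 ** 5               # one suit for each chosen face value

p_all_different = Fraction(face_value_sets * suit_choices, total_hands)

print(p_all_different, float(p_all_different))
```

So slightly more than half of all 5-card hands contain no repeated face value.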

22 / 54

Probability Laws

Axioms of probability

1. P (S) = 1.

2. 0 ≤ P (A) ≤ 1 for every event A.

3. Let A1, A2, ... be a finite or an infinite sequence of mutually exclusive events. Then

$$P(A_1 \cup A_2 \cup \cdots) = P(A_1) + P(A_2) + \cdots, \quad \text{i.e.} \quad P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$$

This property is called “countable additivity.”

Example: The distribution of blood types in the United States is roughly 41% type A, 9% type B, 4% type AB, and 46% type O. An individual is brought into an emergency room and is to be blood-typed. What is the probability that the type will be A, B, or AB?
A: P = .41 + .09 + .04 = .54

23 / 54

Probability Laws

P(∅) = 0.

P(A^c) = 1 − P(A).

General addition rule

$$P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2)$$

$$P(A_1 \cup A_2 \cup A_3) = P(A_1) + P(A_2) + P(A_3) - P(A_1 A_2) - P(A_2 A_3) - P(A_1 A_3) + P(A_1 A_2 A_3)$$

A ⊆ B ⇒ P(A) ≤ P(B)

Example. A doctor knows that P(bacterial infection) = 0.7 and P(viral infection) = 0.4. What is P(B ∩ V) if P(B ∪ V) = 1?
A:

$$P(B \cup V) = P(B) + P(V) - P(B \cap V) \;\Rightarrow\; P(B \cap V) = 0.7 + 0.4 - 1 = 0.1$$

24 / 54

Conditional Probability

Example. In trying to determine the sex of a child, a pregnancy test called “starch gel electrophoresis” is used. This test may reveal the presence of a protein zone called the pregnancy zone. This zone is present in 43% of all pregnant women. Furthermore, it is known that 51% of all children born are male. In 17% of all pregnancies, the child is male and the pregnancy zone is present.

Let A1 denote the event that the pregnancy zone is present, and A2 that the child is male. We know that, for a randomly selected pregnant woman:

P(A1) = .43, P(A2) = .51, P(A1 ∩ A2) = .17

[Figure: Venn diagram of A1 and A2 within S (all pregnant women); the region in A1 only has probability .26, A1 ∩ A2 has .17, and the region in A2 only has .34]

Q: What is the probability that the child is male?
A: .51

25 / 54


Q: What is the probability that the child is male given that the pregnancy zone is present?
A: The sample space no longer includes all pregnant women, but rather consists only of the 43% of women with the pregnancy zone present. Of these, .17/.43 = .395 have male children. Thus

P(male | zone present) = P(A2 | A1) = .395

Definition (Conditional probability). Let A and B be events such that P(B) ≠ 0. The conditional probability of A given B, denoted by P(A | B), is defined by

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

26 / 54


[Figure: Venn diagram of events A and B within S, with the intersection A ∩ B cross-shaded]

Since B is known to have occurred, it becomes the new sample space, replacing the original S.

P(A | B) refers to the probability of the cross-shaded area.

Example. Treatments for a disease, results after 2 years:

Result       A    B    C    Placebo
Relapse      18   13   22   24
No Relapse   22   25   16   10

27 / 54


$$P(\text{Relapse} \mid \text{Placebo}) = \frac{24}{24 + 10} \approx 0.71$$

$$P(\text{Relapse} \mid \text{treatment B}) = \frac{13}{13 + 25} \approx 0.34$$

Remark: Conditional probability can be used to calculate the probability of intersections.

Example. You have r red balls and b black balls in a bin. Draw 2 without replacement. What is P(1st = red, 2nd = black)?
A:

$$P(\text{1st = red}) = \frac{r}{r + b}$$

$$P(\text{2nd = black} \mid \text{1st = red}) = \frac{b}{b + r - 1}$$

$$P(\text{1st = red, 2nd = black}) = P(\text{2nd = black} \mid \text{1st = red})\, P(\text{1st = red}) = \frac{b}{b + r - 1} \cdot \frac{r}{r + b}$$


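Theorem. For any three events A1, A2, A3, we have

P(A1 ∩ A2 ∩ A3) = P(A1) P(A2 | A1) P(A3 | A1 ∩ A2)

Generally,

$$P(A_1 A_2 \cdots A_n) = P(A_1)\, P(A_2 \mid A_1)\, P(A_3 \mid A_2 A_1) \cdots P(A_n \mid A_{n-1} \cdots A_1)$$

because the right-hand side telescopes:

$$P(A_1)\, \frac{P(A_2 A_1)}{P(A_1)}\, \frac{P(A_3 A_2 A_1)}{P(A_2 A_1)} \cdots \frac{P(A_n A_{n-1} \cdots A_1)}{P(A_{n-1} \cdots A_1)} = P(A_n A_{n-1} \cdots A_1)$$

Example. Continue the previous example.

$$P(\text{1st = r, 2nd = b, 3rd = b, 4th = r}) = \frac{r}{r + b} \cdot \frac{b}{r + b - 1} \cdot \frac{b - 1}{r + b - 2} \cdot \frac{r - 1}{r + b - 3}$$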
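The chain rule for the red and black balls can be written as a short loop that multiplies one conditional probability per draw. The Python sketch below does this with exact fractions and compares the result with a brute-force enumeration of ordered draws; the function names and the choice r = 3, b = 2 are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations

def sequence_probability(r: int, b: int, colors: str) -> Fraction:
    """Chain-rule probability of drawing the color sequence (e.g. "rbbr")
    without replacement from r red and b black balls."""
    remaining = {"r": r, "b": b}
    prob = Fraction(1)
    for color in colors:
        total = remaining["r"] + remaining["b"]
        if total == 0 or remaining[color] <= 0:
            return Fraction(0)  # the requested draw is impossible
        prob *= Fraction(remaining[color], total)  # P(this draw | earlier draws)
        remaining[color] -= 1
    return prob

def brute_force(r: int, b: int, colors: str) -> Fraction:
    """Enumerate all equally likely ordered draws of distinguishable balls."""
    balls = ["r"] * r + ["b"] * b
    draws = list(permutations(range(len(balls)), len(colors)))
    hits = sum(1 for d in draws if all(balls[i] == c for i, c in zip(d, colors)))
    return Fraction(hits, len(draws))

print(sequence_probability(3, 2, "rbbr"), brute_force(3, 2, "rbbr"))
```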

29 / 54


Theorem. If an event A must result in one of the mutually exclusive events A1, A2, ..., An, then

P(A) = P(A | A1) P(A1) + P(A | A2) P(A2) + ··· + P(A | An) P(An)

Sometimes P(A2 | A1) = P(A2), which indicates that A1 and A2 have a special relationship to one another.

30 / 54

Independence

Definition (Independent events). Events A1 and A2 are independent if and only if

P(A1 ∩ A2) = P(A1) P(A2)

Example. Consider an experiment that consists of rolling a single fair die once and then tossing a fair coin once. A sample space for this experiment is

S = {(1, H), (1, T), (2, H), (2, T), (3, H), (3, T), (4, H), (4, T), (5, H), (5, T), (6, H), (6, T)}

Consider these events:
A: the die shows one or two
B: the coin shows heads
A ∩ B: the die shows one or two and the coin shows heads.

31 / 54

Independence

Using classical probability:

P(A) = P({(1, H), (1, T), (2, H), (2, T)}) = 4/12 = 1/3

P(B) = P({(1, H), (2, H), (3, H), (4, H), (5, H), (6, H)}) = 6/12 = 1/2

P(A ∩ B) = P({(1, H), (2, H)}) = 2/12 = 1/6

Clearly, P(A ∩ B) = P(A) · P(B)

Thus, events A and B are independent.

Example. Consider an experiment that consists of drawing two coins in succession from a box containing a nickel (N), a dime (D), and a quarter (Q). The first coin is not replaced before the second is drawn. A sample space for this experiment is

S = {(N, D), (N, Q), (D, N), (D, Q), (Q, N), (Q, D)}

Independence

Consider events:

A: the first coin is a dime
B: the second coin is a dime

Using classical probability, we can see that

P(A) = P({(D, N), (D, Q)}) = 2/6

P(B) = P({(N, D), (Q, D)}) = 2/6

P(A ∩ B) = P(∅) = 0

Clearly, P(A ∩ B) ≠ P(A) · P(B). Thus events A and B are not independent.
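Both independence examples (die and coin, and two coins drawn without replacement) can be checked mechanically by enumerating the sample space and testing P(A ∩ B) = P(A)P(B). The Python sketch below assumes equally likely outcomes; the helpers `prob` and `independent` are illustrative names.

```python
from fractions import Fraction
from itertools import permutations, product

def prob(event, space):
    """Classical probability: fraction of outcomes in the space satisfying event."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

def independent(a, b, space):
    """Check the defining identity P(A ∩ B) = P(A) P(B)."""
    p_ab = prob(lambda o: a(o) and b(o), space)
    return p_ab == prob(a, space) * prob(b, space)

# Roll a fair die, then toss a fair coin.
die_coin = list(product(range(1, 7), "HT"))
die_low = lambda o: o[0] in (1, 2)
coin_heads = lambda o: o[1] == "H"

# Draw two coins without replacement from {nickel, dime, quarter}.
two_coins = list(permutations("NDQ", 2))
first_dime = lambda o: o[0] == "D"
second_dime = lambda o: o[1] == "D"

print(independent(die_low, coin_heads, die_coin))      # True
print(independent(first_dime, second_dime, two_coins)) # False
```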

33 / 54

Independence

Example. Consider the experiment of drawing a card from a well-shuffled deck of 52 cards. Let

A1 : a spade is drawn

A2 : an honor (10, J, Q, K, A) is drawn

Then P(A1) = 13/52, P(A2) = 20/52, and P(A1 ∩ A2) = 5/52.
Q: Are the events A1 and A2 independent?
A:

P(A1) P(A2) = (13/52)(20/52) = 5/52 = P(A1 ∩ A2)

So, these events are independent.

The definition of independence can be used to find the probability that two events will occur simultaneously when the events are clearly independent.

34 / 54

Independence

Example. The probability that a couple heterozygous for eye color will parent a brown-eyed child is 3/4 for each child. Genetic studies indicate that the eye color of one child is independent of that of the other. Thus, if the couple has two children, then the probability that both will be brown-eyed is

$$P(\text{first brown and second brown}) = P(\text{first brown})\, P(\text{second brown}) = \frac{3}{4} \cdot \frac{3}{4} = \frac{9}{16}$$

Example. Toss an unfair coin twice; the two tosses are independent. Let P(H) = p, 0 ≤ p ≤ 1. Find P(“TH”), the probability of tails first, heads second.

P(“TH”) = P(T) P(H) = (1 − p) p

Since this is an unfair coin, the probability is not just 1/4. If the coin is fair, the four outcomes HH, HT, TH, TT are equally likely, so

$$P(\text{“TH”}) = \frac{1}{4}$$

35 / 54

Independence

Theorem. Let A1 and A2 be events such that at least one of P(A1) or P(A2) is nonzero. A1 and A2 are independent if and only if

P(A1 | A2) = P(A1) if P(A2) ≠ 0

and

P(A2 | A1) = P(A2) if P(A1) ≠ 0

36 / 54

Independence

If A1, A2, A3 are to be independent, then they must be pairwise independent,

P(Aj ∩ Ak) = P(Aj) P(Ak), j ≠ k, where j, k = 1, 2, 3

and we must also have

P(A1 ∩ A2 ∩ A3) = P(A1) P(A2) P(A3)

Neither of the two conditions above is by itself sufficient. Generalization to more than three events can be easily made as in the following definition.

Definition. Let C = {Ai, i = 1, 2, ..., n} be a finite collection of events. These events are independent if and only if, given any subcollection A(1), A(2), ..., A(m) of elements of C,

P(A(1) ∩ A(2) ∩ ··· ∩ A(m)) = P(A(1)) P(A(2)) ··· P(A(m))

37 / 54

Independence

This definition can be used to

test a collection of events for independence.

find the probability that a series of events that are assumed to be independent will occur. This is the main purpose.

Example. During a space shot, the primary computer system is backed up by two secondary systems. They operate independently of one another and each is 90% reliable. What is the probability that all three systems will be operable at the time of the launch?
Let

A1 : the main system is operable

A2 : the first backup is operable

A3 : the second backup is operable

Then, P(A1 ∩ A2 ∩ A3) = P(A1) P(A2) P(A3) = (.9)(.9)(.9) = .729

38 / 54

Independence

Example. Example of pairwise independence. Consider a fair tetrahedral die. Three of the faces are colored red, blue, and green, respectively, and the last face is multicolored, containing red, blue, and green.

P(red) = 2/4 = 1/2 = P(blue) = P(green)

P(red and blue) = 1/4 = 1/2 × 1/2 = P(red) P(blue)

Therefore, the pair {red, blue} is independent. The same can be shown for {red, green} and {blue, green}. But what about all three together?

P(red, blue, and green) = 1/4 ≠ P(red) P(blue) P(green) = 1/8

Therefore, {red, blue, green} is pairwise independent but not fully (mutually) independent.

Example. Toss an unfair coin 5 times and get “HTHTT” with P(H) = p, P(T) = 1 − p. Then

P(“HTHTT”) = P(H) P(T) P(H) P(T) P(T) = p(1 − p)p(1 − p)(1 − p) = p^2 (1 − p)^3
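The tetrahedral-die example shows that pairwise independence does not imply full independence, and the claim is small enough to verify by enumeration. In the Python sketch below, each face is modelled as the set of colors it shows; the helper `p_shows` and the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# The four equally likely faces, each given as the set of colors it shows.
faces = [{"red"}, {"blue"}, {"green"}, {"red", "blue", "green"}]
colors = ["red", "blue", "green"]

def p_shows(*wanted: str) -> Fraction:
    """P(the face that comes up shows every color in `wanted`)."""
    return Fraction(sum(1 for f in faces if set(wanted) <= f), len(faces))

# Pairwise: P(x and y) = P(x) P(y) for every pair of colors.
pairwise = all(
    p_shows(x, y) == p_shows(x) * p_shows(y) for x, y in combinations(colors, 2)
)

# Full independence would also need the triple product to match.
triple_lhs = p_shows(*colors)
triple_rhs = p_shows("red") * p_shows("blue") * p_shows("green")

print(pairwise, triple_lhs, triple_rhs)
```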

Independence

Example. Continue the previous example. Find P(get 2H and 3T, in any order).

$$P(\text{get 2H and 3T, in any order}) = \text{sum of probabilities over all orderings} = \binom{5}{2} p^2 (1 - p)^3$$

Example. Throw a coin n times, then

$$P(k \text{ heads out of } n \text{ throws}) = \binom{n}{k} p^k (1 - p)^{n-k}$$

Example. Toss a coin until the result is “H”. The first n − 1 tosses must be tails and toss n must be heads, so

$$P(\text{tosses} = n) = P(TT \cdots TH) = (1 - p)^{n-1}\, p$$
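The two coin-tossing formulas are straightforward to evaluate numerically. The Python sketch below implements the probability of k heads in n independent tosses, and the probability that the first head arrives on toss n, which requires n − 1 tails followed by one head, i.e. (1 − p)^(n−1) p. The function names and the value p = 0.3 are illustrative.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k heads in n independent tosses with P(H) = p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def geometric_pmf(n: int, p: float) -> float:
    """P(first head occurs on toss n): n - 1 tails, then one head."""
    return (1 - p) ** (n - 1) * p

p = 0.3
print(binomial_pmf(2, 5, p))                          # 2 heads, 3 tails, any order
print(sum(binomial_pmf(k, 5, p) for k in range(6)))   # probabilities sum to 1
print([round(geometric_pmf(n, p), 4) for n in range(1, 5)])
```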

40 / 54

Independence

Remark: Experiments can be physically independent (e.g. roll 1 die, then roll another die), or seem physically related and still be independent.

Example. Roll a die, let A = {odd}, B = {1, 2, 3, 4}. A and B seem to be related.

$$P(A) = \frac{1}{2}, \quad P(B) = \frac{2}{3}, \quad P(AB) = P(\{1, 3\}) = \frac{1}{3}$$

$$P(AB) = P(A)\,P(B) = \frac{1}{2} \cdot \frac{2}{3} = \frac{1}{3} \;\Rightarrow\; A \text{ and } B \text{ are independent.}$$

Remark: Independence does not imply that the sets do not intersect. Disjoint ≠ independent.

[Figure: Venn diagram of two intersecting events A and B]

41 / 54

Independence

Example. If A and B are independent, find P(AB^c).

Proof: AB^c = A \ AB, so

P(AB^c) = P(A) − P(AB)
= P(A) − P(A) P(B)
= P(A)(1 − P(B))
= P(A) P(B^c)

[Figure: Venn diagram of A and B with the region A \ AB shaded]

Therefore, A and B^c are independent as well. Similarly, A^c and B^c are independent.

42 / 54

Independence

Question: How do we find the probability of the simultaneous occurrence of two events if the events are not independent?
Answer: The multiplication rule

P(A1 ∩ A2) = P(A1 | A2) P(A2)

Example. Recent research indicates that approximately 49% of all infections involve anaerobic bacteria. Furthermore, 70% of all anaerobic infections are polymicrobic; that is, they involve more than one anaerobe. What is the probability that a given infection involves anaerobic bacteria and is polymicrobic?
A: Let A1 denote the event that the infection is anaerobic and A2 that it is polymicrobic. It is given that P(A1) = .49 and P(A2 | A1) = .70. Then

P(A1 ∩ A2) = P(A2 | A1) P(A1) = (.70)(.49) = .343

43 / 54

Bayes’ Theorem

Example. The blood type distribution in the United States is type A, 41%; type B, 9%; type AB, 4%; and type O, 46%. It is estimated that during World War II, 4% of inductees with type O blood were typed as having type A; 88% of those with type A were correctly typed; 4% with type B blood were typed as A; and 10% with type AB were typed as A. A soldier was wounded and brought to surgery. He was typed as having type A blood. What is the probability that this is his true blood type?

[Figure: sample space partitioned into blood types A, B, AB, and O, with the event TA overlapping each part as A ∩ TA, B ∩ TA, AB ∩ TA, and O ∩ TA]

44 / 54

Bayes’ Theorem

A: Let

A : he has type A blood

B : he has type B blood

AB : he has type AB blood

O : he has type O blood

TA : he is typed as type A

We want to find P (A|TA). We are given:

P(A) = .41, P(TA | A) = .88
P(B) = .09, P(TA | B) = .04
P(AB) = .04, P(TA | AB) = .10
P(O) = .46, P(TA | O) = .04

45 / 54

Bayes’ Theorem

To get P(A | TA), we can apply the definition of conditional probability:

$$P(A \mid TA) = \frac{P(A \cap TA)}{P(TA)}$$

By the multiplication rule:

P(A ∩ TA) = P(TA | A) P(A) = (.88)(.41) ≈ .36

Event TA can be partitioned into four mutually exclusive events, that is

TA = (A ∩ TA) ∪ (B ∩ TA) ∪ (AB ∩ TA) ∪ (O ∩ TA)

By axiom 3:

P(TA) = P(A ∩ TA) + P(B ∩ TA) + P(AB ∩ TA) + P(O ∩ TA)

46 / 54

Bayes’ Theorem

Applying the multiplication rule to each of the terms on the right-hand side:

P(TA) = P(TA | A) P(A) + P(TA | B) P(B) + P(TA | AB) P(AB) + P(TA | O) P(O)
= (.88)(.41) + (.04)(.09) + (.10)(.04) + (.04)(.46) ≈ .39

Thus

$$P(A \mid TA) = \frac{P(A \cap TA)}{P(TA)} = \frac{.36}{.39} \approx .92$$

(With unrounded intermediate values, .3608/.3868 ≈ .93.)

47 / 54

Bayes’ Theorem

Out of n couples, P(A) = P(at least 1 couple) = 1 ! P(no couples) = 1 ! !n

i=1(1 ! p) *Each* couple doesn’t satisfy the description, if no couples exist. Use independence property, and multiply. P(A) = 1 ! (1 ! p)n

P(B) = P(at least two) = 1 ! P(0 couples) ! P(exactly 1 couple) = 1 ! (1 ! p)n ! n " p(1 ! p)n!1, keep in mind that P(exactly 1) falls into P(k out of n)

1 ! (1 ! p)n ! np(1 ! p)n!1

P(B A) = |1 ! (1 ! p)n

If n = 8 million people, P(B A) = 0.2966, which is within reasonable doubt! |P(2 couples) < P(1 couple), but given that 1 couple exists, the probability that 2 exist is not insignificant.

In the large sample space, the probability that B occurs when we know that A occured is significant!

It is sometimes useful to split a sample space S into disjoint pieces. Events $B_1, B_2, \dots, B_k$ form a partition of S if

$$\bigcup_{i=1}^{k} B_i = S, \qquad B_i \cap B_j = \emptyset \ \text{ for } i \neq j$$

Total probability:

$$P(A) = \sum_{i=1}^{k} P(AB_i) = \sum_{i=1}^{k} P(A \mid B_i)P(B_i)$$

(The events $AB_i$ are disjoint, and $\bigcup_{i=1}^{k} AB_i = A$.)

Example. In box 1, there are 60 short bolts and 40 long bolts. In box 2, there are 10 short bolts and 20 long bolts. Take a box at random, and pick a bolt. What is the probability that you choose a short bolt?

A: Let B1 = choose box 1 and B2 = choose box 2. Then

$$
\begin{aligned}
P(\text{short}) &= P(\text{short} \mid B_1)P(B_1) + P(\text{short} \mid B_2)P(B_2) \\
&= \frac{60}{60 + 40} \cdot \frac{1}{2} + \frac{10}{10 + 20} \cdot \frac{1}{2} = \frac{7}{15} \approx 0.47
\end{aligned}
$$
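The bolt example is a direct application of total probability, and exact fractions make the arithmetic easy to verify. A small sketch, assuming each box is equally likely to be chosen (as "take a box at random" suggests); the dictionary layout is illustrative:

```python
from fractions import Fraction

# Bolt counts per box, from the example.
boxes = {
    "box 1": {"short": 60, "long": 40},
    "box 2": {"short": 10, "long": 20},
}
# ASSUMPTION: each box is picked with equal probability.
p_box = Fraction(1, len(boxes))

# Total probability: sum over boxes of P(short | box) * P(box)
p_short = sum(
    Fraction(c["short"], c["short"] + c["long"]) * p_box
    for c in boxes.values()
)

print(p_short, float(p_short))
```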

Theorem (Bayes’ theorem). Let $B_1, B_2, \dots, B_n$ be a collection of events which partition S. Let A be an event such that $P(A) \neq 0$. Then for any of the events $B_j$, $j = 1, 2, \dots, n$,

$$P(B_j \mid A) = \frac{P(A \mid B_j)P(B_j)}{\sum_{i=1}^{n} P(A \mid B_i)P(B_i)}$$

This enables us to find the probabilities of the various events $B_1, B_2, \dots, B_n$ which can cause A to occur. For this reason, Bayes’ theorem is often referred to as a theorem on the probability of causes.
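The theorem follows in three short steps from tools already used in this section. A derivation sketch, written with the same symbols as the statement:

```latex
\begin{align*}
P(B_j \mid A)
  &= \frac{P(A \cap B_j)}{P(A)}
     && \text{definition of conditional probability} \\
  &= \frac{P(A \mid B_j)\,P(B_j)}{P(A)}
     && \text{multiplication rule} \\
  &= \frac{P(A \mid B_j)\,P(B_j)}{\sum_{i=1}^{n} P(A \mid B_i)\,P(B_i)}
     && \text{total probability over the partition}
\end{align*}
```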


Example. A medical detection test has 90% accuracy: it gives the correct result 90% of the time, whether or not you have the disease. In the general public, the chance of getting the disease is 1 in 10,000. If the result comes up positive, what is the probability that you actually have the disease?

A: Let B1 = you have the disease and B2 = you do not have the disease. Then the following is given:

$$
\begin{aligned}
P(B_1) &= 0.0001, & P(B_2) &= 0.9999 \\
P(\text{positive} \mid B_1) &= 0.9, & P(\text{positive} \mid B_2) &= 0.1
\end{aligned}
$$

We need to compute $P(B_1 \mid \text{positive})$:

$$
\begin{aligned}
P(B_1 \mid \text{positive}) &= \frac{P(\text{positive} \mid B_1)P(B_1)}{P(\text{positive} \mid B_1)P(B_1) + P(\text{positive} \mid B_2)P(B_2)} \\
&= \frac{(0.9)(0.0001)}{(0.9)(0.0001) + (0.1)(0.9999)} \approx 0.0009
\end{aligned}
$$

The probability that you actually have the disease is still very small.
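One way to see why the answer is so small is to count people instead of multiplying probabilities. The sketch below does this for a hypothetical population of one million; the population size is an arbitrary choice for illustration, and only the rates come from the example.

```python
# Natural-frequency view of the test example.
# ASSUMPTION: the population size is arbitrary; any multiple of 100,000 works.
population = 1_000_000

diseased = population // 10_000        # prevalence of 1 in 10,000
healthy = population - diseased

true_positives = diseased * 9 // 10    # 90% of diseased people test positive
false_positives = healthy // 10        # 10% of healthy people test positive

p_disease_given_positive = true_positives / (true_positives + false_positives)

print("true positives: ", true_positives)
print("false positives:", false_positives)
print(f"P(disease | positive) = {p_disease_given_positive:.6f}")
```

The false positives from the large healthy group far outnumber the true positives from the small diseased group, which is what drives the posterior down.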


Example. (Identify the source of a defective item.) There are 3 machines: M1, M2, M3. The probability that an item made by each machine is defective is 0.01, 0.02, 0.03, respectively. The percentage of items that come from each machine is 20%, 30%, and 50%, respectively. Given a defective item, what is the probability that it comes from machine 1?

A: The following is given:

$$
\begin{aligned}
P(M_1) &= 0.2, & P(M_2) &= 0.3, & P(M_3) &= 0.5 \\
P(D \mid M_1) &= 0.01, & P(D \mid M_2) &= 0.02, & P(D \mid M_3) &= 0.03
\end{aligned}
$$

We need to compute $P(M_1 \mid D)$:

$$
\begin{aligned}
P(M_1 \mid D) &= \frac{P(D \mid M_1)P(M_1)}{P(D \mid M_1)P(M_1) + P(D \mid M_2)P(M_2) + P(D \mid M_3)P(M_3)} \\
&= \frac{(0.01)(0.2)}{(0.01)(0.2) + (0.02)(0.3) + (0.03)(0.5)} \approx 0.087
\end{aligned}
$$
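The same computation gives the posterior for every machine at once, which shows where a defective item most likely came from. A minimal sketch using the numbers from the example; the variable names are illustrative:

```python
# Share of production and defect rate for each machine (from the example).
share = {"M1": 0.2, "M2": 0.3, "M3": 0.5}
defect_rate = {"M1": 0.01, "M2": 0.02, "M3": 0.03}

# Numerators of Bayes' theorem: P(D | M) * P(M)
weights = {m: defect_rate[m] * share[m] for m in share}
p_defective = sum(weights.values())    # P(D), by total probability

# Normalize so the posteriors sum to 1
posterior = {m: w / p_defective for m, w in weights.items()}
most_likely = max(posterior, key=posterior.get)

for machine, prob in posterior.items():
    print(f"P({machine} | D) = {prob:.3f}")
print("most likely source:", most_likely)
```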


Example. A gene has 2 alleles: A and a. The gene exhibits itself through a trait with two versions. The possible phenotypes are dominant, with genotypes AA or Aa, and recessive, with genotype aa. Alleles travel independently, derived from a parent’s genotype. In a population, the probability of having a particular allele is P(A) = 0.5, P(a) = 0.5. Therefore, the probabilities of the genotypes are P(AA) = 0.25, P(Aa) = 0.5, P(aa) = 0.25. If you see that a person has the dominant phenotype, predict the genotypes of the parents.


A: Genotypes of parents: (AA, AA), (AA, Aa), (AA, aa), (Aa, Aa), (Aa, aa), (aa, aa). Assume pairs match regardless of genotype.

parent genotypes    probability            P(child has dominant phenotype)
(AA, AA)            1/16                   1
(AA, Aa)            2(1/4)(1/2) = 1/4      1
(AA, aa)            2(1/4)(1/4) = 1/8      1
(Aa, Aa)            (1/2)(1/2) = 1/4       3/4
(Aa, aa)            2(1/2)(1/4) = 1/4      1/2
(aa, aa)            1/16                   0

$$
P((AA, AA) \mid \text{dominant phenotype})
= \frac{1 \times \frac{1}{16}}{\frac{1}{16}(1) + \frac{1}{4}(1) + \frac{1}{8}(1) + \frac{1}{4}\left(\frac{3}{4}\right) + \frac{1}{4}\left(\frac{1}{2}\right) + \frac{1}{16}(0)}
= \frac{1}{12}
$$
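The table and the posterior can be rebuilt by enumerating the parent pairs. The sketch below assumes, as the table's last column implies, that each parent passes on one of its two alleles with equal probability; the names `genotype_prob` and `pass_A` are illustrative.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

# Genotype frequencies in the population (from the example).
genotype_prob = {"AA": Fraction(1, 4), "Aa": Fraction(1, 2), "aa": Fraction(1, 4)}
# ASSUMPTION: a parent passes on each of its two alleles with probability 1/2,
# so this is the chance that a parent of each genotype transmits allele A.
pass_A = {"AA": Fraction(1), "Aa": Fraction(1, 2), "aa": Fraction(0)}

joint = {}
for g1, g2 in combinations_with_replacement(genotype_prob, 2):
    # Unordered pair: a mixed pair can occur in two orders.
    orders = 1 if g1 == g2 else 2
    p_pair = orders * genotype_prob[g1] * genotype_prob[g2]
    # The child is recessive only if both parents pass on allele a.
    p_dominant = 1 - (1 - pass_A[g1]) * (1 - pass_A[g2])
    joint[(g1, g2)] = p_pair * p_dominant

p_dominant_child = sum(joint.values())
posterior = {pair: j / p_dominant_child for pair, j in joint.items()}

for pair, prob in posterior.items():
    print(pair, prob)
```

Listing all six posteriors, rather than only the one for (AA, AA), makes it possible to see which parent pair is the most probable given a dominant child.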


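Example. You have 1 machine. In good condition, defective items are only produced 1% of the time. In broken condition, defective items are produced 40% of the time. The following is given:

$$
\begin{aligned}
P(G) &= P(\text{in good condition}) = 0.9 \\
P(B) &= P(\text{in broken condition}) = 0.1
\end{aligned}
$$

Sample 6 items and find that 2 are defective. Is the machine broken?

A:

$$
\begin{aligned}
P(G \mid \text{2 of 6 are defective})
&= \frac{P(\text{2 of 6 are defective} \mid G)P(G)}{P(\text{2 of 6 are defective} \mid G)P(G) + P(\text{2 of 6 are defective} \mid B)P(B)} \\
&= \frac{\binom{6}{2}(0.01)^2(0.99)^4(0.9)}{\binom{6}{2}(0.01)^2(0.99)^4(0.9) + \binom{6}{2}(0.4)^2(0.6)^4(0.1)} \\
&\approx 0.04
\end{aligned}
$$

So $P(B \mid \text{2 of 6 are defective}) \approx 0.96$: the machine is very likely broken.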
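Here the likelihood of the observed sample is binomial, so the posterior can be computed with a small helper. A minimal sketch; `binom_pmf` is a hypothetical helper written for this illustration, and the items are assumed to be independent, as the binomial form in the answer implies.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k defectives among n independent items)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Prior on the machine state and defect rate in each state (from the example).
prior = {"good": 0.9, "broken": 0.1}
defect_rate = {"good": 0.01, "broken": 0.40}

n_items, n_defective = 6, 2
weights = {
    state: binom_pmf(n_defective, n_items, defect_rate[state]) * prior[state]
    for state in prior
}
total = sum(weights.values())
posterior = {state: w / total for state, w in weights.items()}

print(f"P(good | data)   = {posterior['good']:.4f}")
print(f"P(broken | data) = {posterior['broken']:.4f}")
```

The binomial coefficient cancels between numerator and denominator, so it does not affect the posterior; it is kept so that each term is a proper probability.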
