
Review of Probability Axioms and Laws

Reference: 1. D. P. Bertsekas, J. N. Tsitsiklis, “Introduction to Probability,” Athena Scientific, 2008.

Berlin Chen
Department of Computer Science & Information Engineering

National Taiwan Normal University


What is “Probability”?

• Probability was developed to describe phenomena that cannot be predicted with certainty
  – Frequency of occurrences
  – Subjective beliefs

• Everyone accepts that the probability (of a certain thing happening) is a number between 0 and 1 (?)

• Measures deduced from probability axioms and theories (laws/rules) can help us deal with and quantify “information”


Sets (1/2)

• A set is a collection of objects, which are the elements of the set
  – If x is an element of the set S, we write x ∈ S
  – Otherwise, we write x ∉ S

• A set that has no elements is called the empty set and is denoted by Ø

• Set specification
  – Countably finite: {1, 2, 3, 4, 5, 6}
  – Countably infinite: {0, 2, −2, 4, −4, ...}
  – With a certain property: {k | k/2 is an integer}, {x | 0 ≤ x ≤ 1}, {x | x satisfies property P}


Sets (2/2)

• If every element of a set S is also an element of a set T, then S is a subset of T
  – Denoted by S ⊂ T or T ⊃ S

• If S ⊂ T and T ⊂ S, then the two sets are equal
  – Denoted by S = T

• The universal set, denoted by Ω, contains all objects of interest in a particular context
  – After specifying the context in terms of the universal set Ω, we only consider sets S that are subsets of Ω


Set Operations (1/3)

• Complement
  – The complement of a set S, with respect to the universe Ω, is the set {x ∈ Ω | x ∉ S} of all elements of Ω that do not belong to S, denoted by Sᶜ
  – The complement of the universe is the empty set: Ωᶜ = Ø

• Union
  – The union of two sets S and T is the set of all elements that belong to S or T, denoted by S ∪ T
  – S ∪ T = {x | x ∈ S or x ∈ T}

• Intersection
  – The intersection of two sets S and T is the set of all elements that belong to both S and T, denoted by S ∩ T
  – S ∩ T = {x | x ∈ S and x ∈ T}


Set Operations (2/3)

• The union or the intersection of several (or even infinitely many) sets
  – S₁ ∪ S₂ ∪ ... = ∪ₙ Sₙ = {x | x ∈ Sₙ for some n}
  – S₁ ∩ S₂ ∩ ... = ∩ₙ Sₙ = {x | x ∈ Sₙ for all n}

• Disjoint
  – Two sets are disjoint if their intersection is empty (i.e., S ∩ T = Ø)

• Partition
  – A collection of sets is said to be a partition of a set S if the sets in the collection are disjoint and their union is S


Set Operations (3/3)

• Visualization of set operations with Venn diagrams


The Algebra of Sets

• The following equations are elementary consequences of the set definitions and operations
  – S ∪ T = T ∪ S (commutative)
  – S ∪ (T ∪ U) = (S ∪ T) ∪ U (associative)
  – S ∩ (T ∪ U) = (S ∩ T) ∪ (S ∩ U) (distributive)
  – S ∪ (T ∩ U) = (S ∪ T) ∩ (S ∪ U) (distributive)
  – (Sᶜ)ᶜ = S,  S ∩ Sᶜ = Ø,  S ∪ Ω = Ω,  S ∩ Ω = S

• De Morgan’s laws
  – (∪ₙ Sₙ)ᶜ = ∩ₙ Sₙᶜ
  – (∩ₙ Sₙ)ᶜ = ∪ₙ Sₙᶜ
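These identities can be sanity-checked numerically. Below is a minimal added sketch (not part of the original slides) that verifies the distributive and De Morgan laws with Python’s built-in set type; the universe and the sets S, T, U are arbitrary choices made only for illustration.

```python
# Sanity check of the set identities above on small, arbitrarily chosen finite sets.
omega = set(range(10))          # universal set (illustrative choice)
S, T, U = {0, 1, 2, 3}, {2, 3, 4, 5}, {5, 6, 7}

def complement(A, universe=omega):
    """Complement of A with respect to the universal set."""
    return universe - A

# Distributive laws
assert S & (T | U) == (S & T) | (S & U)
assert S | (T & U) == (S | T) & (S | U)

# De Morgan's laws (stated here for two sets; the slide's version covers any collection)
assert complement(S | T) == complement(S) & complement(T)
assert complement(S & T) == complement(S) | complement(T)

print("All set identities hold for this example.")
```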


Probabilistic Models (1/2)

• A probabilistic model is a mathematical description of an uncertain situation
  – It has to be in accordance with a fundamental framework to be discussed shortly

• Elements of a probabilistic model
  – The sample space Ω
    • The set of all possible outcomes of an experiment
  – The probability law
    • Assigns to a set A of possible outcomes (also called an event) a nonnegative number P(A) (called the probability of A) that encodes our knowledge or belief about the collective “likelihood” of the elements of A


Probability Axioms

• Nonnegativity: P(A) ≥ 0, for every event A

• Additivity: if A and B are two disjoint events, then P(A ∪ B) = P(A) + P(B); more generally, for a sequence of disjoint events A₁, A₂, ..., P(A₁ ∪ A₂ ∪ ...) = P(A₁) + P(A₂) + ...

• Normalization: the probability of the entire sample space is 1, i.e., P(Ω) = 1


Probabilistic Models (2/2)

• The main ingredients of a probabilistic model


Sample Spaces and Events

• Each probabilistic model involves an underlying process, called the experiment
  – The experiment produces exactly one out of several possible outcomes
  – The set of all possible outcomes is called the sample space of the experiment, denoted by Ω
  – A subset of the sample space (a collection of possible outcomes) is called an event

• Examples of the experiment
  – A single toss of a coin (finite outcomes)
  – Three tosses of two dice (finite outcomes)
  – An infinite sequence of tosses of a coin (infinite outcomes)
  – Throwing a dart on a square (infinite outcomes), etc.


Sample Spaces and Events (2/2)

• Properties of the sample space
  – Elements of the sample space must be mutually exclusive
  – The sample space must be collectively exhaustive
  – The sample space should be at the “right” granularity (avoiding irrelevant details)


Probability Laws

• Discrete Probability Law
  – If the sample space consists of a finite number of possible outcomes, then the probability law is specified by the probabilities of the events that consist of a single element. In particular, the probability of any event {s₁, s₂, ..., sₙ} is the sum of the probabilities of its elements:
    P({s₁, s₂, ..., sₙ}) = P(s₁) + P(s₂) + ... + P(sₙ)

• Discrete Uniform Probability Law
  – If the sample space consists of n possible outcomes which are equally likely (i.e., all single-element events have the same probability), then the probability of any event A is given by
    P(A) = (number of elements of A) / n
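As a quick illustration of the discrete uniform law, the sketch below (an added example, not from the slides) counts outcomes for two rolls of a fair six-sided die, where all 36 ordered pairs are equally likely, and computes the probability that the sum equals 9.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely ordered pairs from two rolls of a fair die.
sample_space = list(product(range(1, 7), repeat=2))

# Event A: the sum of the two rolls is 9.
A = [(i, j) for (i, j) in sample_space if i + j == 9]

# Discrete uniform law: P(A) = (number of elements of A) / n
p_A = Fraction(len(A), len(sample_space))
print(p_A)  # 4/36 = 1/9
```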


Continuous Models

• Probabilistic models with continuous sample spaces
  – It is inappropriate to assign probability to each single-element event (?)
  – Instead, it makes sense to assign probability to any interval (one-dimensional) or area (two-dimensional) of the sample space

• Example: Wheel of Fortune
  – P(0.3) = ?  P(0.33) = ?  P(0.333) = ?
  – P({x | a ≤ x ≤ b}) = ?


Properties of Probability Laws

• Probability laws have a number of properties, which can be deduced from the axioms. Some of them are summarized below
  – If A ⊂ B, then P(A) ≤ P(B)
  – P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
  – P(A ∪ B) ≤ P(A) + P(B)


Conditional Probability (1/2)

• Conditional probability provides us with a way to reason about the outcome of an experiment, based on partial information
  – Suppose that the outcome is within some given event B; we wish to quantify the likelihood that the outcome also belongs to some other given event A
  – Using a new probability law, we have the conditional probability of A given B, denoted by P(A|B), which is defined as:
    P(A|B) = P(A ∩ B) / P(B)
  • If B has zero probability, P(A|B) is undefined
  • We can think of P(A|B) as follows: out of the total probability of the elements of B, it is the fraction that is assigned to possible outcomes that also belong to A


Conditional Probability (2/2)

• When all outcomes of the experiment are equally likely, the conditional probability can also be defined as
  P(A|B) = (number of elements of A ∩ B) / (number of elements of B)

• Some examples having to do with conditional probability
  1. In an experiment involving two successive rolls of a die, you are told that the sum of the two rolls is 9. How likely is it that the first roll was a 6? (worked out in the sketch below)
  2. In a word-guessing game, the first letter of the word is a “t”. What is the likelihood that the second letter is an “h”?
  3. How likely is it that a person has a disease given that a medical test was negative?
  4. A spot shows up on a radar screen. How likely is it that it corresponds to an aircraft?
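Example 1 can be answered by direct counting, since all 36 outcomes of the two rolls are equally likely. The following is a small added sketch using the counting form of conditional probability given above.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))            # 36 equally likely outcomes

B = [(i, j) for (i, j) in outcomes if i + j == 9]          # conditioning event: sum is 9
A_and_B = [(i, j) for (i, j) in B if i == 6]               # first roll is 6 AND sum is 9

# P(A|B) = (number of elements of A ∩ B) / (number of elements of B)
print(Fraction(len(A_and_B), len(B)))   # 1/4
```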


Conditional Probabilities Satisfy the Three Axioms

• Nonnegativity: P(A|B) ≥ 0

• Normalization: P(Ω|B) = P(Ω ∩ B) / P(B) = P(B) / P(B) = 1

• Additivity: if A₁ and A₂ are two disjoint events,
  P(A₁ ∪ A₂ | B) = P((A₁ ∪ A₂) ∩ B) / P(B)
                 = P((A₁ ∩ B) ∪ (A₂ ∩ B)) / P(B)          (distributive; A₁ ∩ B and A₂ ∩ B are disjoint sets)
                 = [P(A₁ ∩ B) + P(A₂ ∩ B)] / P(B)
                 = P(A₁|B) + P(A₂|B)


Multiplication (Chain) Rule

• Assuming that all of the conditioning events have positive probability, we have
  P(A₁ ∩ A₂ ∩ ... ∩ Aₙ) = P(A₁) P(A₂|A₁) P(A₃|A₁ ∩ A₂) ... P(Aₙ | A₁ ∩ ... ∩ Aₙ₋₁)

  – The above formula can be verified by writing
    P(A₁ ∩ A₂ ∩ ... ∩ Aₙ) = P(A₁) · [P(A₁ ∩ A₂) / P(A₁)] · [P(A₁ ∩ A₂ ∩ A₃) / P(A₁ ∩ A₂)] ... [P(A₁ ∩ ... ∩ Aₙ) / P(A₁ ∩ ... ∩ Aₙ₋₁)]

  – For the case of just two events, the multiplication rule is simply the definition of conditional probability
    P(A₁ ∩ A₂) = P(A₁) P(A₂|A₁)
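One classical use of the chain rule (an added illustration, not an example from the slides) is the probability that three cards drawn without replacement from a 52-card deck contain no heart; the chain-rule product is compared against a brute-force count over all ordered 3-card draws.

```python
from fractions import Fraction
from itertools import permutations

# Chain rule: P(A1 ∩ A2 ∩ A3) = P(A1) P(A2|A1) P(A3|A1 ∩ A2)
# Ai = "the i-th card drawn is not a heart" (39 non-hearts in a 52-card deck).
p_chain = Fraction(39, 52) * Fraction(38, 51) * Fraction(37, 50)

# Brute-force check: enumerate all ordered draws of 3 distinct cards.
deck = [(rank, suit) for suit in "SHDC" for rank in range(13)]
favorable = 0
total = 0
for draw in permutations(deck, 3):
    total += 1
    favorable += all(suit != "H" for _, suit in draw)
p_count = Fraction(favorable, total)

assert p_chain == p_count
print(p_chain)   # 703/1700, roughly 0.4135
```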


Total Probability Theorem

• Let A₁, ..., Aₙ be disjoint events that form a partition of the sample space and assume that P(Aᵢ) > 0, for all i. Then, for any event B, we have
  P(B) = P(A₁ ∩ B) + ... + P(Aₙ ∩ B)
       = P(A₁) P(B|A₁) + ... + P(Aₙ) P(B|Aₙ)

  – Note that each possible outcome of the experiment (sample space) is included in one and only one of the events A₁, ..., Aₙ
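A small numerical illustration (the probabilities below are hypothetical values chosen only for the example, not data from the slides): partition the sample space by whether an aircraft is present (A₁) or absent (A₂), and compute the total probability that a spot shows up on the radar screen (event B).

```python
# Hypothetical numbers, chosen only to illustrate the total probability theorem.
p_A = {"aircraft": 0.05, "no_aircraft": 0.95}          # P(A1), P(A2): a partition
p_B_given_A = {"aircraft": 0.99, "no_aircraft": 0.10}  # P(B|A1), P(B|A2): spot on radar

# Total probability theorem: P(B) = Σ_i P(A_i) P(B|A_i)
p_B = sum(p_A[a] * p_B_given_A[a] for a in p_A)
print(round(p_B, 4))   # 0.05*0.99 + 0.95*0.10 = 0.1445
```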


Bayes’ Rule

• Let A₁, A₂, ..., Aₙ be disjoint events that form a partition of the sample space, and assume that P(Aᵢ) > 0, for all i. Then, for any event B such that P(B) > 0, we have
  P(Aᵢ|B) = P(Aᵢ ∩ B) / P(B)
          = P(Aᵢ) P(B|Aᵢ) / P(B)                                                  (multiplication rule)
          = P(Aᵢ) P(B|Aᵢ) / [P(A₁) P(B|A₁) + ... + P(Aₙ) P(B|Aₙ)]                 (total probability theorem)
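Continuing the hypothetical radar illustration from the total-probability sketch above, Bayes’ rule gives the posterior probability that an aircraft is actually present given that a spot shows up (question 4 on the earlier conditional-probability slide).

```python
# Hypothetical prior and likelihoods, as in the total probability sketch above.
p_A = {"aircraft": 0.05, "no_aircraft": 0.95}
p_B_given_A = {"aircraft": 0.99, "no_aircraft": 0.10}

# Bayes' rule: P(A_i|B) = P(A_i) P(B|A_i) / Σ_k P(A_k) P(B|A_k)
p_B = sum(p_A[a] * p_B_given_A[a] for a in p_A)
posterior = {a: p_A[a] * p_B_given_A[a] / p_B for a in p_A}
print({a: round(p, 4) for a, p in posterior.items()})
# {'aircraft': 0.3426, 'no_aircraft': 0.6574}
```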


Independence (1/2)

• Recall that the conditional probability P(A|B) captures the partial information that event B provides about event A

• A special case arises when the occurrence of B provides no such information and does not alter the probability that A has occurred
  – P(A|B) = P(A)
  – Equivalently, P(A ∩ B) = P(B) P(A|B) = P(A) P(B)
  – A is independent of B (B also is independent of A)


Independence (2/2)

• A and B are independent => A and B are disjoint (?)
  – No! Why?

• If A and B are disjoint, then P(A ∩ B) = 0
  • However, if P(A) > 0 and P(B) > 0, independence would require P(A ∩ B) = P(A) P(B) > 0

• Two disjoint events A and B with P(A) > 0 and P(B) > 0 are never independent (see the sketch below)
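The contrast between independent and disjoint can be checked by counting in the two-dice sample space; this is an added sketch with arbitrarily chosen events.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # two rolls of a fair six-sided die
n = len(outcomes)

def prob(event):
    """Probability of an event under the discrete uniform law."""
    return Fraction(sum(1 for o in outcomes if event(o)), n)

A = lambda o: o[0] == 1   # first roll is 1
B = lambda o: o[1] == 6   # second roll is 6
C = lambda o: o[0] == 2   # first roll is 2 (disjoint from A)

p_AB = prob(lambda o: A(o) and B(o))
assert p_AB == prob(A) * prob(B) and p_AB > 0     # independent, but not disjoint
p_AC = prob(lambda o: A(o) and C(o))
assert p_AC == 0 and p_AC != prob(A) * prob(C)    # disjoint, but not independent
print("independent and disjoint are different notions, as claimed")
```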


Conditional Independence (1/2)

• Given an event C, the events A and B are called conditionally independent if
  P(A ∩ B | C) = P(A|C) P(B|C)                                   (1)

  – We also know that, by the multiplication rule,
    P(A ∩ B | C) = P(A ∩ B ∩ C) / P(C) = P(C) P(B|C) P(A|B ∩ C) / P(C) = P(B|C) P(A|B ∩ C)   (2)

  – If P(B|C) > 0, combining (1) and (2) gives an alternative way to express conditional independence
    P(A|B ∩ C) = P(A|C)                                          (3)


Conditional Independence (2/2)

• Notice that independence of two events A and B with respect to the unconditional probability law does not imply conditional independence, and vice versa
  – That is, P(A ∩ B) = P(A) P(B) does not imply P(A ∩ B | C) = P(A|C) P(B|C)

• If A and B are independent, the same holds for
  (i) A and Bᶜ
  (ii) Aᶜ and B
  (iii) Aᶜ and Bᶜ


Independence of a Collection of Events

• We say that the events A₁, A₂, ..., Aₙ are independent if
  P(∩_{i∈S} Aᵢ) = ∏_{i∈S} P(Aᵢ), for every subset S of {1, 2, ..., n}

• For example, the independence of three events A₁, A₂, A₃ amounts to satisfying the four conditions
  P(A₁ ∩ A₂) = P(A₁) P(A₂)
  P(A₁ ∩ A₃) = P(A₁) P(A₃)
  P(A₂ ∩ A₃) = P(A₂) P(A₃)
  P(A₁ ∩ A₂ ∩ A₃) = P(A₁) P(A₂) P(A₃)
  – In general, 2ⁿ − n − 1 such conditions are needed for n events


Random Variables

• Given an experiment and the corresponding set of possible outcomes (the sample space), a random variable associates a particular number with each outcome ω
  – This number is referred to as the (numerical) value of the random variable
  – We can say a random variable X is a real-valued function of the experimental outcome, X: ω ↦ x


Random Variables: Example

• An experiment consists of two rolls of a 4-sided die, and the random variable is the maximum of the two rolls
  – If the outcome of the experiment is (4, 2), the value of this random variable is 4
  – If the outcome of the experiment is (3, 3), the value of this random variable is 3
  – The mapping from outcomes to values can be one-to-one or many-to-one


Discrete/Continuous Random Variables

• A random variable is called discrete if its range (the set of values that it can take) is finite or at most countably infinite
  – E.g., finite: {1, 2, 3, 4}; countably infinite: {1, 2, ...}

• A random variable is called continuous (not discrete) if its range (the set of values that it can take) is uncountably infinite
  – E.g., in the experiment of choosing a point a from the interval [−1, 1], a random variable that associates the numerical value a² to the outcome a is not discrete


Concepts Related to Discrete Random Variables

• For a probabilistic model of an experiment
  – A discrete random variable is a real-valued function of the outcome of the experiment that can take a finite or countably infinite number of values
  – A (discrete) random variable has an associated probability mass function (PMF), which gives the probability of each numerical value that the random variable can take
  – A function of a random variable defines another random variable, whose PMF can be obtained from the PMF of the original random variable


Probability Mass Function

• A (discrete) random variable X is characterized through the probabilities of the values that it can take, which are captured by the probability mass function (PMF) of X, denoted p_X(x)
  p_X(x) = P(X = x) = P({X = x})

  – The sum of probabilities of all outcomes that give rise to a value of X equal to x
  – Upper case characters (e.g., X) denote random variables, while lower case ones (e.g., x) denote the numerical values of a random variable

• The summation of the PMF of a random variable over all its possible numerical values is equal to one
  Σ_x p_X(x) = 1
  – The events {X = x} are disjoint and form a partition of the sample space


Calculation of the PMF

• For each possible value x of a random variable X:
  1. Collect all the possible outcomes that give rise to the event {X = x}
  2. Add their probabilities to obtain p_X(x)

• An example: the PMF p_X(x) of the random variable X = maximum roll in two independent rolls of a fair 4-sided die (worked out in the sketch below)
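The slide’s example can be computed with exactly the two-step recipe above; this short added sketch enumerates the 16 equally likely outcome pairs of the two rolls.

```python
from fractions import Fraction
from itertools import product

# Two independent rolls of a fair 4-sided die: 16 equally likely outcomes.
outcomes = list(product(range(1, 5), repeat=2))

# Steps 1-2: for each value x, collect outcomes with max(roll) = x and add their probabilities.
p_X = {x: Fraction(sum(1 for o in outcomes if max(o) == x), len(outcomes))
       for x in range(1, 5)}
print(p_X)   # {1: 1/16, 2: 3/16, 3: 5/16, 4: 7/16}
assert sum(p_X.values()) == 1
```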


Expectation

• The expected value (also called the expectation or the mean) of a random variable X, with PMF p_X(x), is defined by
  E[X] = Σ_x x p_X(x)

  – It can be interpreted as the center of gravity of the PMF (or a weighted average, in proportion to probabilities, of the possible values of X); the center of gravity c satisfies Σ_x (x − c) p_X(x) = 0

• The expectation is well-defined if
  Σ_x |x| p_X(x) < ∞
  – That is, Σ_x x p_X(x) converges to a finite value
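Reusing the PMF of the maximum of two rolls of a fair 4-sided die from the previous sketch, the expectation is the probability-weighted sum of values, and the center-of-gravity property can be checked at c = E[X]. This is an added illustration.

```python
from fractions import Fraction

# PMF of X = max of two independent rolls of a fair 4-sided die (computed earlier).
p_X = {1: Fraction(1, 16), 2: Fraction(3, 16), 3: Fraction(5, 16), 4: Fraction(7, 16)}

# E[X] = Σ_x x p_X(x)
EX = sum(x * p for x, p in p_X.items())
print(EX)   # 25/8 = 3.125

# Center-of-gravity property: Σ_x (x - E[X]) p_X(x) = 0
assert sum((x - EX) * p for x, p in p_X.items()) == 0
```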


Expectations for Functions of Random Variables

• Let X be a random variable with PMF p_X(x), and let g(X) be a function of X. Then, the expected value of the random variable g(X) is given by
  E[g(X)] = Σ_x g(x) p_X(x)

• To verify the above rule
  – Let Y = g(X), and therefore p_Y(y) = Σ_{x: g(x)=y} p_X(x)
    E[g(X)] = E[Y] = Σ_y y p_Y(y) = Σ_y y Σ_{x: g(x)=y} p_X(x) = Σ_y Σ_{x: g(x)=y} g(x) p_X(x) = Σ_x g(x) p_X(x)

  (Figure: a many-to-one mapping from values x₁, ..., x₇ of X to values y₁, ..., y₄ of Y = g(X))
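The expected value rule can be confirmed on a concrete case; in this added sketch X is again the max-of-two-rolls variable and g(x) = (x − 2)² is an arbitrary (many-to-one) function chosen for illustration.

```python
from collections import defaultdict
from fractions import Fraction

p_X = {1: Fraction(1, 16), 2: Fraction(3, 16), 3: Fraction(5, 16), 4: Fraction(7, 16)}
g = lambda x: (x - 2) ** 2          # an arbitrary many-to-one function, for illustration

# Expected value rule: E[g(X)] = Σ_x g(x) p_X(x)
E_direct = sum(g(x) * p for x, p in p_X.items())

# Alternative: build p_Y for Y = g(X), then E[Y] = Σ_y y p_Y(y)
p_Y = defaultdict(Fraction)
for x, p in p_X.items():
    p_Y[g(x)] += p                  # p_Y(y) = Σ_{x: g(x)=y} p_X(x)
E_via_Y = sum(y * p for y, p in p_Y.items())

assert E_direct == E_via_Y
print(E_direct)   # 17/8
```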


Variance

• The variance of a random variable X is the expected value of the random variable (X − E[X])²
  var(X) = E[(X − E[X])²] = Σ_x (x − E[X])² p_X(x)

  – The variance is always nonnegative (why?)
  – The variance provides a measure of dispersion of X around its mean
  – The standard deviation is another measure of dispersion, defined as the square root of the variance
    σ_X = √var(X)
    • It is easier to interpret, because it has the same units as X


Properties of Mean and Variance

• Let X be a random variable and let
  Y = aX + b        (a linear function of X)
  where a and b are given scalars. Then,
  E[Y] = a E[X] + b
  var(Y) = a² var(X)

• If g(X) is a linear function of X, then E[g(X)] = g(E[X])   (How to verify it?)
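These properties are easy to confirm for a particular PMF and particular scalars; the added sketch below uses the same max-of-two-rolls PMF with the arbitrary choices a = 2 and b = −3.

```python
from fractions import Fraction

p_X = {1: Fraction(1, 16), 2: Fraction(3, 16), 3: Fraction(5, 16), 4: Fraction(7, 16)}

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def var(pmf):
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())

a, b = 2, -3                                        # arbitrary scalars for illustration
p_Y = {a * x + b: p for x, p in p_X.items()}        # PMF of Y = aX + b (the map is one-to-one)

assert mean(p_Y) == a * mean(p_X) + b               # E[aX + b] = a E[X] + b
assert var(p_Y) == a ** 2 * var(p_X)                # var(aX + b) = a² var(X)
print(mean(p_X), var(p_X), mean(p_Y), var(p_Y))     # 25/8, 55/64, 13/4, 55/16
```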


Joint PMF of Random Variables

• Let X and Y be random variables associated with the same experiment (also the same sample space and probability law); the joint PMF of X and Y is defined by
  p_{X,Y}(x, y) = P(X = x, Y = y) = P({X = x} ∩ {Y = y})

• If event A is the set of all pairs (x, y) that have a certain property, then the probability of A can be calculated by
  P((X, Y) ∈ A) = Σ_{(x,y)∈A} p_{X,Y}(x, y)

  – Namely, A can be specified in terms of X and Y


Marginal PMFs of Random Variables

• The PMFs of random variables X and Y can be calculated from their joint PMF
  p_X(x) = Σ_y p_{X,Y}(x, y),    p_Y(y) = Σ_x p_{X,Y}(x, y)

  – p_X(x) and p_Y(y) are often referred to as the marginal PMFs

  – The above two equations can be verified by
    p_X(x) = P(X = x) = Σ_y P(X = x, Y = y) = Σ_y p_{X,Y}(x, y)
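A concrete joint PMF makes the marginalization formulas explicit. In this added sketch, the experiment is two rolls of a fair 4-sided die, with X chosen as the first roll and Y as the maximum of the two rolls (an illustrative choice, not from the slides).

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Experiment: two independent rolls of a fair 4-sided die.
# X = first roll, Y = maximum of the two rolls (illustrative choice).
outcomes = list(product(range(1, 5), repeat=2))
p_XY = defaultdict(Fraction)
for (r1, r2) in outcomes:
    p_XY[(r1, max(r1, r2))] += Fraction(1, len(outcomes))

# Marginal PMFs: p_X(x) = Σ_y p_{X,Y}(x, y),  p_Y(y) = Σ_x p_{X,Y}(x, y)
p_X = defaultdict(Fraction)
p_Y = defaultdict(Fraction)
for (x, y), p in p_XY.items():
    p_X[x] += p
    p_Y[y] += p

print(dict(p_X))   # uniform: each value has probability 1/4
print(dict(p_Y))   # {1: 1/16, 2: 3/16, 3: 5/16, 4: 7/16}
assert sum(p_XY.values()) == 1
```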


Conditioning

• Recall that conditional probability provides us with a way to reason about the outcome of an experiment, based on partial information

• In the same spirit, we can define conditional PMFs, given the occurrence of a certain event or given the value of another random variable


Conditioning a Random Variable on an Event (1/2)

• The conditional PMF of a random variable X, conditioned on a particular event A with P(A) > 0, is defined by (where X and A are associated with the same experiment)
  p_{X|A}(x) = P(X = x | A) = P({X = x} ∩ A) / P(A)

• Normalization Property
  Σ_x p_{X|A}(x) = Σ_x P({X = x} ∩ A) / P(A) = P(A) / P(A) = 1
  – Note that the events {X = x} ∩ A are disjoint for different values of x, and their union is A, so Σ_x P({X = x} ∩ A) = P(A) (total probability theorem)


Conditioning a Random Variable on an Event (2/2)

• A graphical illustration
  – p_{X|A}(x) is obtained by adding the probabilities of the outcomes that give rise to X = x and belong to the conditioning event A


Conditioning a Random Variable on Another (1/2)

• Let X and Y be two random variables associated with the same experiment. The conditional PMF of X given Y is defined as
  p_{X|Y}(x|y) = P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y) = p_{X,Y}(x, y) / p_Y(y)

• Normalization Property
  Σ_x p_{X|Y}(x|y) = 1, where the conditional PMF p_{X|Y} is fixed on some value y with p_Y(y) > 0

• The conditional PMF is often convenient for the calculation of the joint PMF (multiplication/chain rule)
  p_{X,Y}(x, y) = p_Y(y) p_{X|Y}(x|y) = p_X(x) p_{Y|X}(y|x)


Conditioning a Random Variable on Another (2/2)

• The conditional PMF can also be used to calculate the marginal PMFs
  p_X(x) = Σ_y p_{X,Y}(x, y) = Σ_y p_Y(y) p_{X|Y}(x|y)

• Visualization of the conditional PMF
  p_{X|Y}(x|y) = p_{X,Y}(x, y) / p_Y(y)
  – For each fixed y, p_{X|Y}(x|y) is the corresponding slice of the joint PMF, rescaled so that it sums to one
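Building on the same illustrative joint PMF (X = first roll, Y = maximum of two rolls of a fair 4-sided die), this added sketch forms the conditional PMF p_{X|Y}, checks the normalization property, and recovers the marginal p_X from p_Y and p_{X|Y} as in the first formula above.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 5), repeat=2))     # two rolls of a fair 4-sided die
p_XY = defaultdict(Fraction)
for (r1, r2) in outcomes:
    p_XY[(r1, max(r1, r2))] += Fraction(1, len(outcomes))   # X = first roll, Y = max

p_Y = defaultdict(Fraction)
for (x, y), p in p_XY.items():
    p_Y[y] += p

# Conditional PMF: p_{X|Y}(x|y) = p_{X,Y}(x, y) / p_Y(y)
p_X_given_Y = {(x, y): p / p_Y[y] for (x, y), p in p_XY.items()}

# Normalization: each conditional PMF (for a fixed y) sums to 1
for y in p_Y:
    assert sum(p for (x, yy), p in p_X_given_Y.items() if yy == y) == 1

# Recover the marginal: p_X(x) = Σ_y p_Y(y) p_{X|Y}(x|y)
p_X = defaultdict(Fraction)
for (x, y), p in p_X_given_Y.items():
    p_X[x] += p_Y[y] * p
print(dict(p_X))   # each value 1/4, the PMF of the first roll
```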


Independence of a Random Variable from an Event

• A random variable X is independent of an event A if
  P(X = x and A) = P(X = x) P(A) = p_X(x) P(A), for all x
  – This requires the two events {X = x} and A to be independent, for all x

• If a random variable X is independent of an event A with P(A) > 0, then
  p_{X|A}(x) = P({X = x} ∩ A) / P(A) = p_X(x) P(A) / P(A) = p_X(x), for all x


Independence of Random Variables (1/2)

• Two random variables X and Y are independent if
  p_{X,Y}(x, y) = p_X(x) p_Y(y), for all x, y
  or, equivalently, P(X = x and Y = y) = P(X = x) P(Y = y), for all x, y

• If a random variable X is independent of a random variable Y, then
  p_{X|Y}(x|y) = p_{X,Y}(x, y) / p_Y(y) = p_X(x) p_Y(y) / p_Y(y) = p_X(x),
  for all y with p_Y(y) > 0 and all x


Independence of Random Variables (2/2)

• Random variables X and Y are said to be conditionally independent, given a positive-probability event A, if
  p_{X,Y|A}(x, y) = p_{X|A}(x) p_{Y|A}(y), for all x and y
  – Or equivalently, p_{X|Y,A}(x|y) = p_{X|A}(x), for all x and all y with p_{Y|A}(y) > 0

• Note here that, as in the case of events, conditional independence may not imply unconditional independence, and vice versa


Entropy (1/2)

• Three interpretations for quantity of information
  1. The amount of uncertainty before seeing an event
  2. The amount of surprise when seeing an event
  3. The amount of information after seeing an event

• The definition of information:
  I(xᵢ) = log₂(1 / P(xᵢ)) = −log₂ P(xᵢ)
  – P(xᵢ) is the probability of the event xᵢ

• Entropy: the average amount of information
  H(X) = E[I(X)] = E[−log₂ P(X)] = −Σᵢ P(xᵢ) log₂ P(xᵢ), where X = {x₁, x₂, ..., xᵢ, ...}
  – Define 0 log₂ 0 = 0
  – Entropy attains its maximum value when the probability (mass) function is a uniform distribution


Entropy (2/2)

• For Boolean classification (0 or 1)
  Entropy(X) = −p₁ log₂ p₁ − p₂ log₂ p₂,   where p₁ = P(x = 1) and p₂ = 1 − p₁

• Entropy can be expressed as the minimum number of bits of information needed to encode the classification of an arbitrary member of the set of examples
  – If c classes are generated, the maximum of the entropy can be
    Entropy(X) = log₂ c
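The definitions translate directly into a few lines of code; this added sketch computes H(X) for a Boolean variable and confirms that the uniform distribution over c classes attains the maximum log₂ c.

```python
import math

def entropy(pmf):
    """H(X) = -Σ_i P(x_i) log2 P(x_i), with the convention 0·log2(0) = 0."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# Boolean classification: H = -p1 log2 p1 - p2 log2 p2
print(entropy([0.5, 0.5]))    # 1.0 bit (maximum for two classes)
print(entropy([0.9, 0.1]))    # ≈ 0.469 bits
print(entropy([1.0, 0.0]))    # 0.0 bits (no uncertainty)

# The uniform distribution over c classes attains the maximum log2(c)
c = 8
assert abs(entropy([1.0 / c] * c) - math.log2(c)) < 1e-12
```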