CSCI 590: Machine Learning

Lecture 2: Basics of probability
Instructor: Murat Dundar
Acknowledgement: Some of these slides are taken from the course textbook website http://research.microsoft.com/~cmbishop/prml/




Axioms of Probability

In probability, the set of all possible outcomes of an experiment is called the sample space and is usually denoted by 𝝮.

• Tossing a coin: 𝝮={head, tail}

• Rolling a die: 𝝮={1, 2, 3, 4, 5, 6}

Let P(A) denote the probability of an event A. The number P(A) should satisfy the following three conditions:

1. P(A) ≥ 0

2. P(𝝮) = 1

3. If A ∩ B = ∅ then P(A ∪ B) = P(A) + P(B)
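As a minimal sketch, the three conditions can be checked for the die example. The uniform measure `prob` and the events `A` and `B` are illustrative assumptions, not part of the slide.

```python
from fractions import Fraction

# Sample space for rolling a die (from the example above).
omega = frozenset({1, 2, 3, 4, 5, 6})

def prob(event):
    """Assumed uniform measure: every outcome has probability 1/6."""
    return Fraction(len(event & omega), len(omega))

A = frozenset({1, 2})   # hypothetical event
B = frozenset({5})      # hypothetical event, disjoint from A

axiom1 = prob(A) >= 0                           # non-negativity
axiom2 = prob(omega) == 1                       # the whole space has probability 1
disjoint = (A & B == frozenset())
axiom3 = disjoint and prob(A | B) == prob(A) + prob(B)   # additivity
print(axiom1, axiom2, axiom3)
```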


σ-Algebra (1)

Events are subsets of 𝝮, but not all subsets of 𝝮 can be considered events. When 𝝮 has uncountably many outcomes, it is in general impossible to assign probabilities to all of its subsets consistently with the axioms. Such subsets are called non-measurable.

The collection of sets over which a measure can be defined is called a σ-algebra, or σ-field. A σ-field, 𝝨, is a nonempty class of subsets of 𝝮 such that

• If A 𝝐 𝝨 then Aᶜ 𝝐 𝝨 (complement)

• If A1, A2, … 𝝐 𝝨 then A1 ∪ A2 ∪ … 𝝐 𝝨 (countable union)

The pair (𝝮, 𝝨) is called a measurable space.


σ-Algebra (2)

These are the minimal conditions. One can show that a σ-field also satisfies the following:

• If A1, A2, … 𝝐 𝝨 then A1 ∩ A2 ∩ … 𝝐 𝝨 (countable intersection)

• 𝝮 𝝐 𝝨 and ∅ 𝝐 𝝨

Example: If 𝝮={a,b,c,d} one possible σ-algebra on 𝝮 is 𝝨={∅, {a,b}, {c,d}, {a,b,c,d}}

The smallest possible σ-field is a collection of just two sets, {∅, 𝝮}. The largest possible σ-field is the collection of all subsets of 𝝮, also called the power set of 𝝮.
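On a finite sample space the defining conditions can be checked mechanically. The helper `is_sigma_algebra` below is a hypothetical illustration: it relies on the fact that, for a finite 𝝮, closure under countable unions reduces to closure under pairwise unions.

```python
from itertools import combinations

def is_sigma_algebra(omega, sigma):
    """Check the sigma-field conditions on a finite sample space."""
    omega = frozenset(omega)
    sets = {frozenset(s) for s in sigma}
    if not sets:
        return False                      # the class must be nonempty
    for a in sets:
        if not a <= omega:
            return False                  # every member must be a subset of omega
        if omega - a not in sets:
            return False                  # closed under complement
    for a, b in combinations(sets, 2):
        if a | b not in sets:
            return False                  # closed under (pairwise) union
    return True

omega = {"a", "b", "c", "d"}
example = [set(), {"a", "b"}, {"c", "d"}, omega]   # the example above
trivial = [set(), omega]                           # smallest sigma-field
not_closed = [set(), {"a"}, omega]                 # lacks the complement {b, c, d}

print(is_sigma_algebra(omega, example))
print(is_sigma_algebra(omega, trivial))
print(is_sigma_algebra(omega, not_closed))
```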


Borel σ-Algebra

Borel set: A Borel set of 𝝮 is any set that can be formed from open sets (or, equivalently, from closed sets) through the operations of countable union, countable intersection, and relative complement.

The collection of all Borel sets on 𝝮 forms a σ-algebra known as the Borel algebra.

The Borel algebra on 𝝮 is the smallest σ-algebra containing all open sets.


The Real Line

Let 𝝮 be the set of all real numbers. One candidate collection of events is the set of all subsets of the real line. However, because the real line contains uncountably many elements, probabilities cannot in general be assigned consistently to every one of these subsets.

So how do we construct our event set on 𝝮 so that every event can be assigned a probability?

We start with all open intervals (x1 < x < x2), add in all countable unions, countable intersections, and relative complements, and continue this process until the relevant closure properties are achieved.

The set of these events forms the Borel field, which contains all half-lines x ≤ xi. These half-lines can be assigned probabilities, and the probabilities of all other events can be derived from them.
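For instance, a half-open interval follows from the half-lines by additivity, since for a < b the event {x ≤ b} is the disjoint union of {x ≤ a} and {a < x ≤ b}. The symbols a and b are illustrative and do not appear on the slide.

```latex
P(a < x \le b) = P(x \le b) - P(x \le a), \qquad a < b .
```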


Probability Theory

Marginal Probability

Conditional Probability

Joint Probability
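The formulas on this slide did not survive extraction. A minimal sketch, assuming the count-based definitions used in the course textbook: if n_ij of N trials have X = x_i and Y = y_j, and c_i = Σ_j n_ij, then the joint probability is n_ij/N, the marginal is c_i/N, and the conditional is n_ij/c_i. The count table is made up for illustration.

```python
from fractions import Fraction

# Hypothetical counts n[i][j]: number of trials with X = x_i and Y = y_j.
n = [[3, 1],
     [2, 4]]
N = sum(sum(row) for row in n)            # total number of trials

def joint(i, j):
    return Fraction(n[i][j], N)           # p(X = x_i, Y = y_j) = n_ij / N

def marginal_x(i):
    return Fraction(sum(n[i]), N)         # p(X = x_i) = c_i / N

def conditional_y_given_x(j, i):
    return Fraction(n[i][j], sum(n[i]))   # p(Y = y_j | X = x_i) = n_ij / c_i

print(joint(0, 1), marginal_x(0), conditional_y_given_x(1, 0))
```

With these definitions the marginal is the sum of the joint over j, and the joint equals the conditional times the marginal.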


Probability Theory

Sum Rule

Product Rule


The Rules of Probability

Sum Rule

Product Rule
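The equations for the two rules were lost in extraction. In their standard form, as stated in the course textbook, they read:

```latex
\text{Sum rule:}\quad p(X) = \sum_{Y} p(X, Y)
\qquad\qquad
\text{Product rule:}\quad p(X, Y) = p(Y \mid X)\, p(X)
```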


Probability Densities
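The equations on this slide are missing from the transcript. As a reminder of the standard definitions, assuming a continuous variable x with density p(x) and cumulative distribution function P(z):

```latex
p\bigl(x \in (a, b)\bigr) = \int_a^b p(x)\,dx, \qquad
P(z) = \int_{-\infty}^{z} p(x)\,dx, \qquad
p(x) \ge 0, \qquad
\int_{-\infty}^{\infty} p(x)\,dx = 1 .
```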


Transformed Densities

If $y = g^{-1}(x)$ and $x \sim p_x(x)$
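The conclusion of this statement is missing from the transcript. Under the usual assumption that g is differentiable and invertible, so that x = g(y), the standard change-of-variables result is:

```latex
p_y(y) = p_x\bigl(g(y)\bigr)\,\left|\frac{dx}{dy}\right|
       = p_x\bigl(g(y)\bigr)\,\bigl|g'(y)\bigr| .
```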


From P(x) to uniform distribution

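Given a continuous RV X with cdf P(x), the RV U=P(X) is uniformly distributed in the interval [0,1]. (Here capitals denote random variables and lower-case letters denote their values.)

Proof: Let u=P(x). Assuming P is continuous and strictly increasing, monotonicity gives U ≤ u iff X ≤ x. Hence

P_U(u) = Pr(U ≤ u) = Pr(X ≤ x) = P(x) = u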
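A quick empirical check of this result, as a sketch: samples from an exponential distribution are mapped through their own cdf, and the output should look uniform on [0,1]. The exponential distribution and its rate are assumptions chosen only for illustration.

```python
import math
import random

random.seed(0)
rate = 2.0                                   # assumed rate of the exponential

def cdf(x):
    """Exponential cdf P(x) = 1 - exp(-rate * x) for x >= 0."""
    return 1.0 - math.exp(-rate * x)

# Draw x from the exponential distribution, then map it through its own cdf.
u_samples = [cdf(random.expovariate(rate)) for _ in range(100_000)]

# A uniform variable on [0, 1] has mean 0.5 and Pr(u <= 0.25) = 0.25.
mean_u = sum(u_samples) / len(u_samples)
frac_below = sum(u <= 0.25 for u in u_samples) / len(u_samples)
print(mean_u, frac_below)
```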


From uniform distribution to P(x)

This is how we can generate samples with cdf P(x).

Given a uniformly distributed RV u in the interval [0,1], we can generate samples of a RV x with cdf P(x) by transforming u through the inverse cdf, x = P⁻¹(u).
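A minimal sketch of this sampling procedure, assuming an exponential target distribution because its cdf, P(x) = 1 - exp(-rate·x), inverts in closed form. The function name `inverse_cdf` and the rate value are illustrative.

```python
import math
import random

random.seed(0)
rate = 2.0                                   # assumed rate parameter

def inverse_cdf(u):
    """Inverse of the exponential cdf P(x) = 1 - exp(-rate * x)."""
    return -math.log(1.0 - u) / rate

# random.random() is uniform on [0, 1), so 1 - u stays strictly positive.
samples = [inverse_cdf(random.random()) for _ in range(100_000)]

sample_mean = sum(samples) / len(samples)    # should be close to 1 / rate
print(sample_mean)
```

The same pattern applies to any distribution whose inverse cdf can be evaluated, either analytically or numerically.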


Expectations

Conditional Expectation (discrete)

Approximate Expectation (discrete and continuous)
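The formulas on this slide were lost in extraction. The standard sample-based approximation is E[f] ≈ (1/N) Σ f(x_n), with x_1, …, x_N drawn from p(x). The sketch below applies it to an assumed example, f(x) = x² with x ~ N(0, 1), whose exact expectation is 1. The helper name `approximate_expectation` is hypothetical.

```python
import random

random.seed(0)

def approximate_expectation(f, sampler, n):
    """Estimate E[f] by averaging f over n samples drawn from p(x)."""
    return sum(f(sampler()) for _ in range(n)) / n

# Assumed example: x ~ N(0, 1) and f(x) = x^2, so the exact value is 1.
estimate = approximate_expectation(lambda x: x * x,
                                   lambda: random.gauss(0.0, 1.0),
                                   200_000)
print(estimate)
```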


Variances and Covariances
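The equations on this slide are missing from the transcript. The standard definitions, in the notation of the course textbook, are:

```latex
\begin{aligned}
\operatorname{var}[f] &= \mathbb{E}\bigl[(f(x) - \mathbb{E}[f(x)])^2\bigr]
                       = \mathbb{E}[f(x)^2] - \mathbb{E}[f(x)]^2 \\
\operatorname{cov}[x, y] &= \mathbb{E}_{x,y}\bigl[(x - \mathbb{E}[x])(y - \mathbb{E}[y])\bigr]
                          = \mathbb{E}_{x,y}[xy] - \mathbb{E}[x]\,\mathbb{E}[y]
\end{aligned}
```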


Independence

If two events A and B are independent then P(A,B)=P(A)P(B)

In general, if n random variables xi are independent and identically distributed (i.i.d.) according to some distribution p(xi), then we can write

$p(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} p(x_i)$

If two random variables are independent, they are uncorrelated. The converse is not necessarily true!
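A small counterexample makes the last point concrete. As an assumed illustration, take x uniform on {-1, 0, 1} and y = x²: y is completely determined by x, yet the covariance is exactly zero.

```python
from fractions import Fraction

# Assumed example: x uniform on {-1, 0, 1} and y = x^2.
xs = [-1, 0, 1]
p = Fraction(1, 3)
joint = {(x, x * x): p for x in xs}          # joint distribution p(x, y)

def expect(f):
    """Expectation of f(x, y) under the joint distribution."""
    return sum(prob * f(x, y) for (x, y), prob in joint.items())

# cov[x, y] = E[xy] - E[x] E[y]
cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

p_x0 = sum(prob for (x, _), prob in joint.items() if x == 0)   # p(x = 0)
p_y0 = sum(prob for (_, y), prob in joint.items() if y == 0)   # p(y = 0)
p_x0_y0 = joint.get((0, 0), Fraction(0))                       # p(x = 0, y = 0)

print(cov)                      # zero covariance: x and y are uncorrelated
print(p_x0_y0 == p_x0 * p_y0)   # False: the joint does not factorize
```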


The Gaussian Distribution
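The density on this slide did not survive extraction. The standard univariate Gaussian with mean μ and variance σ², in the notation of the course textbook, is:

```latex
\mathcal{N}(x \mid \mu, \sigma^2)
  = \frac{1}{(2\pi\sigma^2)^{1/2}}
    \exp\left\{ -\frac{1}{2\sigma^2}(x - \mu)^2 \right\}
```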


Gaussian Mean and Variance
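The equations on this slide are missing. For a univariate Gaussian the standard results are E[x] = μ, E[x²] = μ² + σ², and var[x] = σ². The sketch below checks them empirically. The parameter values are assumptions chosen for illustration.

```python
import random

random.seed(0)
mu, sigma = 1.5, 2.0                         # assumed mean and standard deviation

samples = [random.gauss(mu, sigma) for _ in range(200_000)]

sample_mean = sum(samples) / len(samples)                           # estimates mu
sample_second_moment = sum(x * x for x in samples) / len(samples)   # estimates mu^2 + sigma^2
sample_var = sample_second_moment - sample_mean ** 2                # estimates sigma^2

print(sample_mean, sample_second_moment, sample_var)
```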


The Gaussian Distribution