CS 1675 Intro to Machine Learning –Recitation Math and...
Transcript of CS 1675 Intro to Machine Learning –Recitation Math and...
![Page 1: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/1.jpg)
MathandProbabilityforMLRecap
JeongminLeeComputerScienceDepartment
UniversityofPittsburgh
CS1675IntrotoMachineLearning– Recitation
![Page 2: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/2.jpg)
Acknowledgement
• Slidecontentsarebasedontheseresources:
• Dr.Hauskrecht’s CS1675LectureNote:http://people.cs.pitt.edu/~milos/courses/cs1675/Lectures/Class5.pdf
• Dr.Ainsworth’sIntrotoMatricesLectureNote:http://www.csun.edu/~ata20315/psy524/docs/Intro%20to%20Matrices.pptx
• WhiteandMarino’sProbabilityRecitationLectureNote:http://www.cs.cmu.edu/~ninamf/courses/401sp18/recitations/probability_recitation.pdf
![Page 3: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/3.jpg)
Outline
• Part1.MatrixOperations• Part2.Probabilities
![Page 4: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/4.jpg)
Part1.MatrixOperations
• VectorandMatrix• Vector– Vectormultiplication•Matrix– MatrixMultiplication• InnerProduct• OuterProduct
![Page 5: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/5.jpg)
Vector
• Arrayofnumbers• Columnvector(verticallyarranged):X• Rowvector(horizontallyarranged):Y
[ ]4 1 1 3
129
7 22 144
0
x xX Y
é ùê úê ú= = -ê ú-ê úë û
![Page 6: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/6.jpg)
Vector-VectorMultiplication
• Twovectorsshouldbeinsamesize
• Inner(dot)product
Basic OperationsMatrix/vector operations: Vector-Vector Multiplication• The product of two vectors of the same length is either a scalar
or a matrix, depending on how the vectors are multiplied.
Inner (dot) product
Outerproduct
»»»
¼
º
«««
¬
ª
»»»
¼
º
«««
¬
ª�
189
,73
2uv
> @ 17)24(18189
732 ��� »»»
¼
º
«««
¬
ª� uvT
> @»»»
¼
º
«««
¬
ª���
»»»
¼
º
«««
¬
ª�
7566432427
21618189
73
2Tvu
Basic OperationsMatrix/vector operations: Vector-Vector Multiplication• The product of two vectors of the same length is either a scalar
or a matrix, depending on how the vectors are multiplied.
Inner (dot) product
Outerproduct
»»»
¼
º
«««
¬
ª
»»»
¼
º
«««
¬
ª�
189
,73
2uv
> @ 17)24(18189
732 ��� »»»
¼
º
«««
¬
ª� uvT
> @»»»
¼
º
«««
¬
ª���
»»»
¼
º
«««
¬
ª�
7566432427
21618189
73
2Tvu
![Page 7: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/7.jpg)
Vector-VectorMultiplication
• Twovectorsshouldbeinsamesize
• Inner(dot)product
• Outerproduct
Basic OperationsMatrix/vector operations: Vector-Vector Multiplication• The product of two vectors of the same length is either a scalar
or a matrix, depending on how the vectors are multiplied.
Inner (dot) product
Outerproduct
»»»
¼
º
«««
¬
ª
»»»
¼
º
«««
¬
ª�
189
,73
2uv
> @ 17)24(18189
732 ��� »»»
¼
º
«««
¬
ª� uvT
> @»»»
¼
º
«««
¬
ª���
»»»
¼
º
«««
¬
ª�
7566432427
21618189
73
2Tvu
Basic OperationsMatrix/vector operations: Vector-Vector Multiplication• The product of two vectors of the same length is either a scalar
or a matrix, depending on how the vectors are multiplied.
Inner (dot) product
Outerproduct
»»»
¼
º
«««
¬
ª
»»»
¼
º
«««
¬
ª�
189
,73
2uv
> @ 17)24(18189
732 ��� »»»
¼
º
«««
¬
ª� uvT
> @»»»
¼
º
«««
¬
ª���
»»»
¼
º
«««
¬
ª�
7566432427
21618189
73
2Tvu
Basic OperationsMatrix/vector operations: Vector-Vector Multiplication• The product of two vectors of the same length is either a scalar
or a matrix, depending on how the vectors are multiplied.
Inner (dot) product
Outerproduct
»»»
¼
º
«««
¬
ª
»»»
¼
º
«««
¬
ª�
189
,73
2uv
> @ 17)24(18189
732 ��� »»»
¼
º
«««
¬
ª� uvT
> @»»»
¼
º
«««
¬
ª���
»»»
¼
º
«««
¬
ª�
7566432427
21618189
73
2Tvu
![Page 8: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/8.jpg)
MatrixAddition andSubtraction• Theyalsoshouldbeinthesamesize• Simplyaddorsubtractthecorrespondingcomponentsofeachmatrix.
![Page 9: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/9.jpg)
2 3 2 3
1 2 3 5 6 7
7 8 9 3 4 5
1 2 3 5 6 7 1 5 2 6 3 7 6 8 107 8 9 3 4 5 7 3 8 4 9 5 10 12 14
1 2 3 5 6 7 1 5 2 6 3 7 4 4 47 8 9 3 4 5 7 3 8 4 9 5 4 4 4
x xA B
A B
A B B A
A B
é ù é ù= =ê ú ê úë û ë û
+ + +é ù é ù é ù é ù+ = + = =ê ú ê ú ê ú ê ú+ + +ë û ë û ë û ë û+ = +
- - - - - -é ù é ù é ù é ù- = - = =ê ú ê ú ê ú ê ú- - -ë û ë û ë û ë û
MatrixAddition
![Page 10: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/10.jpg)
2 3 2 3
1 2 3 5 6 7
7 8 9 3 4 5
1 2 3 5 6 7 1 5 2 6 3 7 6 8 107 8 9 3 4 5 7 3 8 4 9 5 10 12 14
1 2 3 5 6 7 1 5 2 6 3 7 4 4 47 8 9 3 4 5 7 3 8 4 9 5 4 4 4
x xA B
A B
A B B A
A B
é ù é ù= =ê ú ê úë û ë û
+ + +é ù é ù é ù é ù+ = + = =ê ú ê ú ê ú ê ú+ + +ë û ë û ë û ë û+ = +
- - - - - -é ù é ù é ù é ù- = - = =ê ú ê ú ê ú ê ú- - -ë û ë û ë û ë û
MatrixSubtraction
![Page 11: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/11.jpg)
MatrixMultiplication• ThenumberofcolumnsinAequalsthenumberofrowsinB.• AssumingA hasi xjdimensionsandB hasjxkdimensions,theresultingmatrix,C,willhavedimensionsi xk• Inotherwords,inordertomultiplythemtheinnerdimensionsmustmatchandtheresultistheouterdimensions.• EachelementinCcanbycomputedby:
ik j ij jkC A B= S
![Page 12: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/12.jpg)
MatrixMultiplication
( ) ( ) ( )( ) ( ) ( )( ) ( ) ( )( ) ( ) ( )
2 3 3 2
11 12
2 3 3 2 2 221 22
11 1 1
12 1 2
21 2 1
22 2 2
2 3 3 2 2 2
5 31 2 3
' 6 47 8 9
7 5
'
1*5 2*6 3*7 38
1*3 2*4 3*5 26
7*5 8*6 9*7 155
7*3 8*4 9*5 98
38 26'
1
x x
x x x
j j
j j
j j
j j
x x x
A B
c cA B C
c c
c A B
c A B
c A B
c A B
A B C
é ùé ù ê ú= =ê ú ê úë û ê úë û
é ù= = ê ú
ë û= = + + =
= = + + =
= = + + =
= = + + =
= =
åååå
55 98é ùê úë û
Matchinginnerdimensions!!Resultingmatrixhasouterdimensions!!!
![Page 13: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/13.jpg)
MatrixMultiplication
( ) ( ) ( )( ) ( ) ( )( ) ( ) ( )( ) ( ) ( )
2 3 3 2
11 12
2 3 3 2 2 221 22
11 1 1
12 1 2
21 2 1
22 2 2
2 3 3 2 2 2
5 31 2 3
' 6 47 8 9
7 5
'
1*5 2*6 3*7 38
1*3 2*4 3*5 26
7*5 8*6 9*7 155
7*3 8*4 9*5 98
38 26'
1
x x
x x x
j j
j j
j j
j j
x x x
A B
c cA B C
c c
c A B
c A B
c A B
c A B
A B C
é ùé ù ê ú= =ê ú ê úë û ê úë û
é ù= = ê ú
ë û= = + + =
= = + + =
= = + + =
= = + + =
= =
åååå
55 98é ùê úë û
Matchinginnerdimensions!!Resultingmatrixhasouterdimensions!!!
![Page 14: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/14.jpg)
MatrixMultiplication
( ) ( ) ( )( ) ( ) ( )( ) ( ) ( )( ) ( ) ( )
2 3 3 2
11 12
2 3 3 2 2 221 22
11 1 1
12 1 2
21 2 1
22 2 2
2 3 3 2 2 2
5 31 2 3
' 6 47 8 9
7 5
'
1*5 2*6 3*7 38
1*3 2*4 3*5 26
7*5 8*6 9*7 155
7*3 8*4 9*5 98
38 26'
1
x x
x x x
j j
j j
j j
j j
x x x
A B
c cA B C
c c
c A B
c A B
c A B
c A B
A B C
é ùé ù ê ú= =ê ú ê úë û ê úë û
é ù= = ê ú
ë û= = + + =
= = + + =
= = + + =
= = + + =
= =
åååå
55 98é ùê úë û
Matchinginnerdimensions!!Resultingmatrixhasouterdimensions!!!
![Page 15: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/15.jpg)
MatrixMultiplication
( ) ( ) ( )( ) ( ) ( )( ) ( ) ( )( ) ( ) ( )
2 3 3 2
11 12
2 3 3 2 2 221 22
11 1 1
12 1 2
21 2 1
22 2 2
2 3 3 2 2 2
5 31 2 3
' 6 47 8 9
7 5
'
1*5 2*6 3*7 38
1*3 2*4 3*5 26
7*5 8*6 9*7 155
7*3 8*4 9*5 98
38 26'
1
x x
x x x
j j
j j
j j
j j
x x x
A B
c cA B C
c c
c A B
c A B
c A B
c A B
A B C
é ùé ù ê ú= =ê ú ê úë û ê úë û
é ù= = ê ú
ë û= = + + =
= = + + =
= = + + =
= = + + =
= =
åååå
55 98é ùê úë û
Resultingmatrixhasouterdimensions!!!
( ) ( ) ( )( ) ( ) ( )( ) ( ) ( )( ) ( ) ( )
2 3 3 2
11 12
2 3 3 2 2 221 22
11 1 1
12 1 2
21 2 1
22 2 2
2 3 3 2 2 2
5 31 2 3
' 6 47 8 9
7 5
'
1*5 2*6 3*7 38
1*3 2*4 3*5 26
7*5 8*6 9*7 155
7*3 8*4 9*5 98
38 26'
1
x x
x x x
j j
j j
j j
j j
x x x
A B
c cA B C
c c
c A B
c A B
c A B
c A B
A B C
é ùé ù ê ú= =ê ú ê úë û ê úë û
é ù= = ê ú
ë û= = + + =
= = + + =
= = + + =
= = + + =
= =
åååå
55 98é ùê úë û
Matchinginnerdimensions!!
![Page 16: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/16.jpg)
Outline
• Part1.MatrixOperations• Part2.Probabilities
![Page 17: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/17.jpg)
Part2.Probabilities
• Probabilitydefinition• Conditionalprobability• BayesRule• ConceptofIndependence• Randomvariable• Distributions• Discretedistribution• Continuousdistribution• Jointdistribution(multiplerandomvariables)• Meanandvarianceofadistribution
![Page 18: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/18.jpg)
SetBasics
• Asetisacollectionofelements• Intersection:• Union:• Complement:
Set Basics
3
A set is a collection of elements• Intersection: 𝐴 ∩ 𝐵 = 𝑥: 𝑥 ∈ 𝐴 and 𝑥 ∈ 𝐵• Union: 𝐴 ∪ 𝐵 = {𝑥: 𝑥 ∈ 𝐴 or 𝑥 ∈ 𝐵}• Complement: 𝐴C = {𝑥: 𝑥 ∉ 𝐴}
Set Basics
3
A set is a collection of elements• Intersection: 𝐴 ∩ 𝐵 = 𝑥: 𝑥 ∈ 𝐴 and 𝑥 ∈ 𝐵• Union: 𝐴 ∪ 𝐵 = {𝑥: 𝑥 ∈ 𝐴 or 𝑥 ∈ 𝐵}• Complement: 𝐴C = {𝑥: 𝑥 ∉ 𝐴}
![Page 19: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/19.jpg)
ProbabilityDefinitions
• SamplespaceΩ:setofpossibleoutcomes• EventspaceF:collectionofsubsets• ProbabilitymeasureP:assignsprobabilitiestoevents• Probabilityspace(Ω,F,P):setofsamplespace,eventspace,andprobabilitymeasure
![Page 20: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/20.jpg)
ProbabilityDefinitions
• Forexample,let’sconsideracaseofrollingadice:• SamplespaceΩ:={1,2,3,4,5,6}• EventspaceF ={{1},{2},… ,{1,2},… ,{1,2,3,4,5,6},∅ }• P({1})=1/6,P({2,4,6})=½,etc…
![Page 21: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/21.jpg)
ProbabilityAxioms
• P(Ac)=1– P(A)• P(A)≤1• P(∅)=0
![Page 22: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/22.jpg)
Note
• Wewillnotate𝑃 𝐴 ∩ 𝐵 = 𝑃 𝐴, 𝐵 inthisrecitation
![Page 23: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/23.jpg)
ConditionalProbabilities
• ConditionalprobabilityofAgivenB:• Thatis,treatBastheentiresamplespaceandthenfindtheprobabilityofA
Conditional Probabilities
8
The conditional probability of 𝑨 given 𝑩: 𝑃 𝐴 𝐵 = 𝑃(𝐴∩𝐵)𝑃(𝐵)
I.e., treat 𝐵 as the entire sample space, and then find the probability of 𝐴.
This implies 𝑃 𝐴 𝐵 𝑃 𝐵 = 𝑃(𝐴 ∩ 𝐵)“chain rule for probabilities”Given a partition 𝐴1, 𝐴2, … of Ω,𝑃 𝐵 =
𝑖𝑃 𝐵 ∩ 𝐴𝑖 =
𝑖𝑃 𝐵 𝐴𝑖 𝑃(𝐴𝑖)
Conditional Probabilities
8
The conditional probability of 𝑨 given 𝑩: 𝑃 𝐴 𝐵 = 𝑃(𝐴∩𝐵)𝑃(𝐵)
I.e., treat 𝐵 as the entire sample space, and then find the probability of 𝐴.
This implies 𝑃 𝐴 𝐵 𝑃 𝐵 = 𝑃(𝐴 ∩ 𝐵)“chain rule for probabilities”Given a partition 𝐴1, 𝐴2, … of Ω,𝑃 𝐵 =
𝑖𝑃 𝐵 ∩ 𝐴𝑖 =
𝑖𝑃 𝐵 𝐴𝑖 𝑃(𝐴𝑖)
Conditional Probabilities
8
The conditional probability of 𝑨 given 𝑩: 𝑃 𝐴 𝐵 = 𝑃(𝐴∩𝐵)𝑃(𝐵)
I.e., treat 𝐵 as the entire sample space, and then find the probability of 𝐴.
This implies 𝑃 𝐴 𝐵 𝑃 𝐵 = 𝑃(𝐴 ∩ 𝐵)“chain rule for probabilities”Given a partition 𝐴1, 𝐴2, … of Ω,𝑃 𝐵 =
𝑖𝑃 𝐵 ∩ 𝐴𝑖 =
𝑖𝑃 𝐵 𝐴𝑖 𝑃(𝐴𝑖)
![Page 24: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/24.jpg)
ConditionalProbabilities
• ConditionalprobabilityofAgivenB:• Thatis,treatBastheentiresamplespaceandthenfindtheprobabilityofA
• Product(chain)rule• Rewritingoftheconditionalprobability
Conditional Probabilities
8
The conditional probability of 𝑨 given 𝑩: 𝑃 𝐴 𝐵 = 𝑃(𝐴∩𝐵)𝑃(𝐵)
I.e., treat 𝐵 as the entire sample space, and then find the probability of 𝐴.
This implies 𝑃 𝐴 𝐵 𝑃 𝐵 = 𝑃(𝐴 ∩ 𝐵)“chain rule for probabilities”Given a partition 𝐴1, 𝐴2, … of Ω,𝑃 𝐵 =
𝑖𝑃 𝐵 ∩ 𝐴𝑖 =
𝑖𝑃 𝐵 𝐴𝑖 𝑃(𝐴𝑖)
Conditional Probabilities
8
The conditional probability of 𝑨 given 𝑩: 𝑃 𝐴 𝐵 = 𝑃(𝐴∩𝐵)𝑃(𝐵)
I.e., treat 𝐵 as the entire sample space, and then find the probability of 𝐴.
This implies 𝑃 𝐴 𝐵 𝑃 𝐵 = 𝑃(𝐴 ∩ 𝐵)“chain rule for probabilities”Given a partition 𝐴1, 𝐴2, … of Ω,𝑃 𝐵 =
𝑖𝑃 𝐵 ∩ 𝐴𝑖 =
𝑖𝑃 𝐵 𝐴𝑖 𝑃(𝐴𝑖)
P(A|B)=P(A,B)/P(B)
𝑃 𝐴, 𝐵 = 𝑃 𝐴 𝐵 𝑃(𝐵)
𝑎𝑠𝑃 𝐴 ∩ 𝐵 = 𝑃(𝐴, 𝐵)
![Page 25: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/25.jpg)
ConditionalProbabilitiesExample
• Givenadie,• A={1,2,3,4}:i.e.,therollis<5• B={1,3,5}:i.e.,therollisodd• P(A)=2/3• P(B)=1/2• P(A|B)=
• P(B|A)=
Conditional Probability Example
9
Given a die, Ω = {1,2,3,4,5,6}, 𝐹 = 2Ω, 𝑃 𝑖 = 1/6,𝐴 = {1,2,3,4}, i.e., the roll is < 5,𝐵 = 1,3,5 , i.e., the roll is odd.• 𝑃 𝐴 = 2/3• 𝑃 𝐵 = 1/2
• 𝑃 𝐴 𝐵 = 𝑃(𝐴∩𝐵)𝑃(𝐵)
= 𝑃( 1,3 )𝑃(𝐵)
= 23
• 𝑃 𝐵 𝐴 = 𝑃(𝐴∩𝐵)𝑃(𝐴)
= 𝑃( 1,3 )𝑃(𝐴)
= 12
• Note these quantities are not the same!
Conditional Probability Example
9
Given a die, Ω = {1,2,3,4,5,6}, 𝐹 = 2Ω, 𝑃 𝑖 = 1/6,𝐴 = {1,2,3,4}, i.e., the roll is < 5,𝐵 = 1,3,5 , i.e., the roll is odd.• 𝑃 𝐴 = 2/3• 𝑃 𝐵 = 1/2
• 𝑃 𝐴 𝐵 = 𝑃(𝐴∩𝐵)𝑃(𝐵)
= 𝑃( 1,3 )𝑃(𝐵)
= 23
• 𝑃 𝐵 𝐴 = 𝑃(𝐴∩𝐵)𝑃(𝐴)
= 𝑃( 1,3 )𝑃(𝐴)
= 12
• Note these quantities are not the same!
![Page 26: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/26.jpg)
Bayestheorem
• FromChainrule:
• WecanrearrangeittoBayesrule:
Image:wikipedia.org
Bayes’ Rule
10
Using the chain rule,𝑃 𝐴 𝐵 𝑃 𝐵 = 𝑃 𝐴 ∩ 𝐵 = 𝑃 𝐵 𝐴 𝑃(𝐴),Rearranging gives us Bayes’ rule:
𝑃 𝐵 𝐴 =𝑃 𝐴 𝐵 𝑃(𝐵)
𝑃(𝐴)If 𝐵1, 𝐵2, … is a partition of Ω, we have
𝑃 𝐵𝑖 𝐴 =𝑃 𝐴 𝐵𝑖 𝑃(𝐵𝑖) 𝑖 𝑃 𝐴 𝐵𝑖 𝑃(𝐵𝑖)
(from Bayes’ rule + Law of Total Probability)
Bayes’ Rule
10
Using the chain rule,𝑃 𝐴 𝐵 𝑃 𝐵 = 𝑃 𝐴 ∩ 𝐵 = 𝑃 𝐵 𝐴 𝑃(𝐴),Rearranging gives us Bayes’ rule:
𝑃 𝐵 𝐴 =𝑃 𝐴 𝐵 𝑃(𝐵)
𝑃(𝐴)If 𝐵1, 𝐵2, … is a partition of Ω, we have
𝑃 𝐵𝑖 𝐴 =𝑃 𝐴 𝐵𝑖 𝑃(𝐵𝑖) 𝑖 𝑃 𝐴 𝐵𝑖 𝑃(𝐵𝑖)
(from Bayes’ rule + Law of Total Probability)𝑃 𝐵 𝐴 =
𝑃 𝐵 ∩ 𝐴𝑃(𝐴)
𝑃 𝐵 ∩ 𝐴 = 𝑃(𝐴|𝐵)𝑃(𝐴)
![Page 27: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/27.jpg)
Independence
• A,Bareindependentif
• WhenP(A)>0,thenwecanalsowriteP(B|A)=P(B)
𝑃 𝐴, 𝐵 = 𝑃 𝐴 𝑃 𝐵
![Page 28: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/28.jpg)
RandomVariables
• ArandomvariableisafunctionX: Ω → ℝ6
• Example
• Rollingadicen times,X=sumofthenumbers
• Throwingadartatadartboard,𝑋 ∈ ℝ9 arethecoordinateswherethedartlands
Ω:setofpossibleoutcomesℝ6:d-dimensionalrealvalue
![Page 29: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/29.jpg)
Distributions
• Byconsideringrandomvariables,wecanthinkofprobabilitymeasuresasfunctionsontherealnumbers
• Theprobabilitymeasureassociatedwiththerandomvariableischaracterizedbyitscumulativedistributionfunction(CDF):𝐹; 𝑥 = 𝑃 𝑋 ≤ 𝑥 .Wewrite𝑋~𝐹;
• IftworandomvariableshavethesameCDF,wecallthemidenticallydistributed
![Page 30: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/30.jpg)
DiscreteDistributions
• If𝑋 onlyhasacountable numberofvalues,thenwecancharacterizeitusingaprobabilitymassfunction(PMF)whichdescribestheprobabilityofeachvalue
𝑓; 𝑥 = 𝑃 𝑋 = 𝑥
• ∑ 𝑓; 𝑥�; = 1
• Example:Coinflip(Bernoullidistribution)• 𝑋 ∈ 0,1 ,𝑓; 𝑥 = 𝜃F 1 − 𝜃 HIF
• i.e.,𝜃 =probabilityofhead• Bernoullidist.combinesprobabilitiesofaheadandatail• For𝑥=1,𝑓; 𝑥 = 𝜃• For𝑥=0,𝑓; 𝑥 = (1 − 𝜃)
![Page 31: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/31.jpg)
ContinuousDistributions
• WhentheCDFiscontinuous,wecanlookatthederivative:
𝑓; 𝑥 =𝑑𝑑𝑥𝑓; 𝑥
• Thisiscalledtheprobabilitydensityfunction(PDF).• Wecancomputetheprobabilityofaninterval(𝑎,𝑏)withP 𝑎 < 𝑋 < 𝑏 = ∫ 𝑓; 𝑥 𝑑𝑥N
O• Forexample,Gaussiandistribution:
Continuous Distributions
15
• When the CDF is continuous, we can look at the derivative 𝑓𝑋 𝑥 = 𝑑
𝑑𝑥𝐹𝑋 𝑥 .
• This is called the probability density function (PDF).• We can compute the probability of an interval (𝑎, 𝑏) with 𝑃 𝑎 < 𝑋 < 𝑏 = 𝑎
𝑏 𝑓𝑋 𝑥 𝑑𝑥.• Note the probability of any specific point 𝑐, 𝑃 𝑋 = 𝑐 = 0
• E.g. Uniform distribution, 𝑓𝑋 𝑥 = 1𝑏−𝑎
∗ 1 𝑎,𝑏 (𝑥)
• E.g. Gaussian distribution, 𝑓𝑋 𝑥 = 12𝜋𝜎
exp((𝑥−𝜇)2
2𝜎2)
![Page 32: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/32.jpg)
MultipleRandomVariables
• Wecanalsohavemultiplerandomvariables• E.g.,flippingoftwocoinsatthesametime
(firstcoin:X,secondcoin:Y)• Wecanthinkoffourcases:• x=0andy=0• x=0andy=1• x=1andy=0• x=1andy=1
• Assumethatwecanflippingtwocoinsmultipletimesandrecorditsoutcomes
X=0 X=1
Y=0 50 30
Y=1 70 50
![Page 33: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/33.jpg)
MultipleRandomVariables
• Whenwenormalizethetable,wecanrepresentthejointdistribution:
• Sumofallelementsshouldbe1• WewritethejointPMForPDFas𝐹;,P 𝑥, 𝑦
X=0 X=1
Y=0 0.25 0.15
Y=1 0.35 0.25
X=0 X=1
Y=0 50 30
Y=1 70 50
![Page 34: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/34.jpg)
MultipleRandomVariables
• If𝐹;,P 𝑥, 𝑦 = 𝐹; 𝑥 𝐹P 𝑦 ,thenthetwoRVsareindependent
![Page 35: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/35.jpg)
Marginalization
• Summing(discrete)orintegrating(continuous)overonevariable• 𝐹; 𝑥 = ∑ 𝐹;,P(𝑥, 𝑦)�
P :discreteRVs• 𝑓; 𝑥 = ∫ 𝑓;,P 𝑥, 𝑦 𝑑𝑦�
S :continuousRVs
X=0 X=10.6 0.4
X=0 X=1
Y=0 0.25 0.15
Y=1 0.35 0.25
![Page 36: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/36.jpg)
MeanofaDistribution
• Expectationormeanofadistribution:• 𝐸 𝑋 = ∑ 𝑥 ⋅ 𝑓;(𝑥)�
; ifX isdiscrete• 𝐸 𝑋 = ∫ 𝑥 ⋅ 𝑓;𝑑𝑥
VIV ifX iscontinuous
• 𝐸 𝑋 ∗ 𝑌 = 𝐸 𝑋 𝐸 𝑌 onlyifwhenX,Yareindependent• 𝐸 𝐸(𝑋 ) = 𝐸 𝑋
![Page 37: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/37.jpg)
VarianceofaDistribution
• Varianceofadistribution:Var 𝑋 = 𝐸 𝑋 − 𝐸𝑋 9
• Ittellsabouthow“spreadout”thedistributionis
![Page 38: CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science](https://reader033.fdocuments.us/reader033/viewer/2022051605/60133a60fea5de1cd545bdb6/html5/thumbnails/38.jpg)
Thanks!-
Questions?