CS 1675 Intro to Machine Learning –Recitation Math and...

Math and Probability for ML Recap

Jeongmin Lee, Computer Science Department

University of Pittsburgh

CS 1675 Intro to Machine Learning – Recitation

Acknowledgement

• Slide contents are based on these resources:

• Dr. Hauskrecht’s CS 1675 Lecture Note: http://people.cs.pitt.edu/~milos/courses/cs1675/Lectures/Class5.pdf

• Dr. Ainsworth’s Intro to Matrices Lecture Note: http://www.csun.edu/~ata20315/psy524/docs/Intro%20to%20Matrices.pptx

• White and Marino’s Probability Recitation Lecture Note: http://www.cs.cmu.edu/~ninamf/courses/401sp18/recitations/probability_recitation.pdf

Outline

• Part 1. Matrix Operations
• Part 2. Probabilities

Part 1. Matrix Operations

• Vector and Matrix
• Vector-Vector Multiplication
• Matrix-Matrix Multiplication
• Inner Product
• Outer Product

Vector

• Array of numbers
• Column vector (vertically arranged): X
• Row vector (horizontally arranged): Y

[Example: $X_{4 \times 1}$ is a column vector with four entries and $Y_{1 \times 3}$ is a row vector with three entries.]

Vector-Vector Multiplication

• The two vectors should be the same size.
• The product of two vectors of the same length is either a scalar or a matrix, depending on how the vectors are multiplied.

\[
u = \begin{bmatrix} 2 \\ -3 \\ 7 \end{bmatrix}, \qquad
v = \begin{bmatrix} 9 \\ 8 \\ 1 \end{bmatrix}
\]

• Inner (dot) product (the result is a scalar):

\[
u^T v = \begin{bmatrix} 2 & -3 & 7 \end{bmatrix}
\begin{bmatrix} 9 \\ 8 \\ 1 \end{bmatrix}
= 18 + (-24) + 7 = 1
\]

• Outer product (the result is a matrix):

\[
u v^T = \begin{bmatrix} 2 \\ -3 \\ 7 \end{bmatrix}
\begin{bmatrix} 9 & 8 & 1 \end{bmatrix}
= \begin{bmatrix} 18 & 16 & 2 \\ -27 & -24 & -3 \\ 63 & 56 & 7 \end{bmatrix}
\]
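
The two products can be sketched in a few lines of plain Python. This is only an illustration: the helper names `inner` and `outer` are made up for this sketch, and the entries of `u` and `v` are read from the slide's example, with the sign of the second entry of `u` being an assumption.

```python
def inner(u, v):
    """Inner (dot) product of two equal-length vectors: returns a scalar."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def outer(u, v):
    """Outer product u v^T: returns a len(u) x len(v) matrix as a list of rows."""
    return [[a * b for b in v] for a in u]


# Example vectors (assumed reading of the slide's example).
u = [2, -3, 7]
v = [9, 8, 1]

print(inner(u, v))  # 18 + (-24) + 7 = 1
print(outer(u, v))  # a 3 x 3 matrix
```

The inner product collapses the two vectors to one number, while the outer product keeps every pairwise product, which is why its shape is 3 × 3 here.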

Matrix Addition and Subtraction

• The two matrices should also be the same size.
• Simply add or subtract the corresponding components of each matrix.

\[
A_{2 \times 3} = \begin{bmatrix} 1 & 2 & 3 \\ 7 & 8 & 9 \end{bmatrix}, \qquad
B_{2 \times 3} = \begin{bmatrix} 5 & 6 & 7 \\ 3 & 4 & 5 \end{bmatrix}
\]

Matrix Addition

\[
A + B = \begin{bmatrix} 1+5 & 2+6 & 3+7 \\ 7+3 & 8+4 & 9+5 \end{bmatrix}
= \begin{bmatrix} 6 & 8 & 10 \\ 10 & 12 & 14 \end{bmatrix},
\qquad A + B = B + A
\]

Matrix Subtraction

\[
A - B = \begin{bmatrix} 1-5 & 2-6 & 3-7 \\ 7-3 & 8-4 & 9-5 \end{bmatrix}
= \begin{bmatrix} -4 & -4 & -4 \\ 4 & 4 & 4 \end{bmatrix}
\]
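
As a rough sketch of the component-wise rule, the snippet below adds and subtracts the 2 × 3 matrices A and B from the example. The helper name `elementwise` is hypothetical, and matrices are stored as plain lists of rows.

```python
import operator


def elementwise(a, b, op):
    """Apply op to corresponding entries of two same-shaped matrices."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    return [[op(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


A = [[1, 2, 3], [7, 8, 9]]
B = [[5, 6, 7], [3, 4, 5]]

A_plus_B = elementwise(A, B, operator.add)
A_minus_B = elementwise(A, B, operator.sub)

print(A_plus_B)   # [[6, 8, 10], [10, 12, 14]]
print(A_minus_B)  # [[-4, -4, -4], [4, 4, 4]]
```

Addition commutes (A + B = B + A), whereas swapping the operands of the subtraction flips every sign.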

Matrix Multiplication

• The number of columns in A must equal the number of rows in B.
• Assuming A has i × j dimensions and B has j × k dimensions, the resulting matrix, C, will have dimensions i × k.
• In other words, in order to multiply them the inner dimensions must match, and the result has the outer dimensions.
• Each element in C can be computed by:

\[
C_{ik} = \sum_{j} A_{ij} B_{jk}
\]

Matrix Multiplication

\[
A_{2 \times 3} = \begin{bmatrix} 1 & 2 & 3 \\ 7 & 8 & 9 \end{bmatrix}, \qquad
B_{3 \times 2} = \begin{bmatrix} 5 & 3 \\ 6 & 4 \\ 7 & 5 \end{bmatrix}
\]

\[
A_{2 \times 3} B_{3 \times 2} = C_{2 \times 2} = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix}
\]

\[
\begin{aligned}
c_{11} &= \sum_j A_{1j} B_{j1} = 1 \cdot 5 + 2 \cdot 6 + 3 \cdot 7 = 38 \\
c_{12} &= \sum_j A_{1j} B_{j2} = 1 \cdot 3 + 2 \cdot 4 + 3 \cdot 5 = 26 \\
c_{21} &= \sum_j A_{2j} B_{j1} = 7 \cdot 5 + 8 \cdot 6 + 9 \cdot 7 = 146 \\
c_{22} &= \sum_j A_{2j} B_{j2} = 7 \cdot 3 + 8 \cdot 4 + 9 \cdot 5 = 98
\end{aligned}
\]

\[
A_{2 \times 3} B_{3 \times 2} = C_{2 \times 2} = \begin{bmatrix} 38 & 26 \\ 146 & 98 \end{bmatrix}
\]

Matching inner dimensions! The resulting matrix has the outer dimensions!
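
The element formula C_ik = Σ_j A_ij B_jk translates almost directly into code. The following is a minimal sketch in plain Python; the function name `matmul` is chosen here for illustration, and it is applied to the 2 × 3 and 3 × 2 matrices from the example.

```python
def matmul(a, b):
    """Multiply an (i x j) matrix by a (j x k) matrix, giving an (i x k) matrix."""
    inner = len(b)  # number of rows of B
    if any(len(row) != inner for row in a):
        raise ValueError("columns of A must equal rows of B")
    cols = len(b[0])
    return [
        [sum(a[i][j] * b[j][k] for j in range(inner)) for k in range(cols)]
        for i in range(len(a))
    ]


A = [[1, 2, 3], [7, 8, 9]]    # 2 x 3
B = [[5, 3], [6, 4], [7, 5]]  # 3 x 2

C = matmul(A, B)              # 2 x 2: the outer dimensions
print(C)  # [[38, 26], [146, 98]]
```

Calling `matmul(A, A)` raises a `ValueError`, because the inner dimensions (3 and 2) do not match.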


Part 2. Probabilities

• Probability definition
• Conditional probability
• Bayes Rule
• Concept of Independence
• Random variable
• Distributions
• Discrete distribution
• Continuous distribution
• Joint distribution (multiple random variables)
• Mean and variance of a distribution

Set Basics

• A set is a collection of elements
• Intersection: $A \cap B = \{x : x \in A \text{ and } x \in B\}$
• Union: $A \cup B = \{x : x \in A \text{ or } x \in B\}$
• Complement: $A^C = \{x : x \notin A\}$

Probability Definitions

• Sample space Ω: set of possible outcomes
• Event space F: collection of subsets of Ω
• Probability measure P: assigns probabilities to events
• Probability space (Ω, F, P): the triple of sample space, event space, and probability measure

Probability Definitions

• For example, consider the case of rolling a die:
• Sample space Ω = {1, 2, 3, 4, 5, 6}
• Event space F = {{1}, {2}, …, {1,2}, …, {1,2,3,4,5,6}, ∅}
• P({1}) = 1/6, P({2,4,6}) = 1/2, etc.

Probability Axioms

• $P(A^c) = 1 - P(A)$
• $P(A) \le 1$
• $P(\emptyset) = 0$
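
The die example and the three properties above can be checked with a small sketch. It assumes a fair die with the uniform measure, represents events as Python sets, and uses exact fractions; the names `OMEGA` and `prob` are made up for this illustration.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die.
OMEGA = frozenset({1, 2, 3, 4, 5, 6})


def prob(event):
    """P(event) under the uniform measure; event must be a subset of OMEGA."""
    event = frozenset(event)
    if not event <= OMEGA:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(OMEGA))


A = {2, 4, 6}
print(prob({1}))        # 1/6
print(prob(A))          # 1/2
print(prob(OMEGA - A))  # complement of A: 1 - P(A)
print(prob(set()))      # empty event: 0
```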

Note

• We will write $P(A \cap B) = P(A, B)$ in this recitation

Conditional Probabilities

• Conditional probability of A given B (defined when P(B) > 0):

\[
P(A \mid B) = \frac{P(A \cap B)}{P(B)}
\]

• That is, treat B as the entire sample space and then find the probability of A.
• Product (chain) rule: a rewriting of the conditional probability,

\[
P(A, B) = P(A \mid B)\, P(B), \qquad \text{as } P(A \cap B) = P(A, B)
\]

• Given a partition $A_1, A_2, \ldots$ of Ω:

\[
P(B) = \sum_i P(B \cap A_i) = \sum_i P(B \mid A_i)\, P(A_i)
\]

Conditional Probabilities Example

• Given a die, $\Omega = \{1,2,3,4,5,6\}$, $F = 2^{\Omega}$, $P(\{i\}) = 1/6$
• A = {1,2,3,4}: i.e., the roll is < 5
• B = {1,3,5}: i.e., the roll is odd
• P(A) = 2/3
• P(B) = 1/2
• $P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(\{1,3\})}{P(B)} = \frac{1/3}{1/2} = \frac{2}{3}$
• $P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{P(\{1,3\})}{P(A)} = \frac{1/3}{2/3} = \frac{1}{2}$
• Note these quantities are not the same!
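
The same computation can be sketched in code by counting outcomes. The helpers `prob` and `cond_prob` are hypothetical names for this sketch, which assumes a fair die and exact fractions.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # fair six-sided die


def prob(event):
    """P(event) under the uniform measure on OMEGA."""
    return Fraction(len(frozenset(event) & OMEGA), len(OMEGA))


def cond_prob(a, b):
    """P(a | b) = P(a and b) / P(b); undefined when P(b) = 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("cannot condition on an event of probability 0")
    return prob(frozenset(a) & frozenset(b)) / p_b


A = {1, 2, 3, 4}  # the roll is < 5
B = {1, 3, 5}     # the roll is odd

print(cond_prob(A, B))  # 2/3
print(cond_prob(B, A))  # 1/2
```

Conditioning on B shrinks the sample space to three outcomes, two of which lie in A; conditioning on A shrinks it to four outcomes, two of which lie in B.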

Bayes Theorem

• From the chain rule:

\[
P(A \mid B)\, P(B) = P(A \cap B) = P(B \mid A)\, P(A)
\]

• We can rearrange it to Bayes' rule:

\[
P(B \mid A) = \frac{P(A \mid B)\, P(B)}{P(A)}
\]

• Equivalently, $P(B \mid A) = \frac{P(B \cap A)}{P(A)}$ with $P(B \cap A) = P(A \mid B)\, P(B)$.

• If $B_1, B_2, \ldots$ is a partition of Ω, we have

\[
P(B_i \mid A) = \frac{P(A \mid B_i)\, P(B_i)}{\sum_j P(A \mid B_j)\, P(B_j)}
\]

(from Bayes' rule + Law of Total Probability)
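
A minimal sketch of the partition form of Bayes' rule follows. The function name `posterior` is made up here, and the inputs reuse the die example: A = {1,2,3,4}, with the partition B = {1,3,5} (odd) and its complement {2,4,6} (even), so that P(A | B) = P(A | B^c) = 2/3.

```python
from fractions import Fraction


def posterior(likelihoods, priors):
    """Return [P(B_i | A)] given [P(A | B_i)] and [P(B_i)] over a partition."""
    if len(likelihoods) != len(priors):
        raise ValueError("need one likelihood per prior")
    if sum(priors) != 1:
        raise ValueError("priors over a partition must sum to 1")
    joint = [l * p for l, p in zip(likelihoods, priors)]  # P(A | B_i) P(B_i)
    evidence = sum(joint)  # P(A), by the law of total probability
    if evidence == 0:
        raise ValueError("P(A) is 0, so the posterior is undefined")
    return [j / evidence for j in joint]


priors = [Fraction(1, 2), Fraction(1, 2)]       # P(B), P(B^c)
likelihoods = [Fraction(2, 3), Fraction(2, 3)]  # P(A | B), P(A | B^c)

post = posterior(likelihoods, priors)
print(post[0])  # P(B | A) = 1/2, matching the direct computation
```

The denominator never has to be supplied separately: it is recovered from the likelihoods and priors, which is the usual reason for writing Bayes' rule in this form.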

Independence

• A, B are independent if

\[
P(A, B) = P(A)\, P(B)
\]

• When P(A) > 0, we can also write P(B|A) = P(B)
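
The product condition is easy to test by enumeration on a finite sample space. In the sketch below, A and B are the die events used earlier in the recitation, while C = {1, 2} is a hypothetical extra event added only to show a dependent pair; `prob` and `independent` are illustrative names.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # fair six-sided die


def prob(event):
    """P(event) under the uniform measure on OMEGA."""
    return Fraction(len(frozenset(event) & OMEGA), len(OMEGA))


def independent(a, b):
    """True when P(a and b) == P(a) * P(b)."""
    a, b = frozenset(a), frozenset(b)
    return prob(a & b) == prob(a) * prob(b)


A = {1, 2, 3, 4}  # the roll is < 5
B = {1, 3, 5}     # the roll is odd
C = {1, 2}        # hypothetical event: the roll is < 3

print(independent(A, B))  # True:  1/3 == 2/3 * 1/2
print(independent(A, C))  # False: 1/3 != 2/3 * 1/3
```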

Random Variables

• A random variable is a function $X: \Omega \to \mathbb{R}^d$

• Ω: set of possible outcomes; $\mathbb{R}^d$: d-dimensional real values

• Examples

• Rolling a die n times, X = sum of the numbers

• Throwing a dart at a dartboard, $X \in \mathbb{R}^2$ are the coordinates where the dart lands

Distributions

• By considering random variables, we can think of probability measures as functions on the real numbers

• The probability measure associated with the random variable is characterized by its cumulative distribution function (CDF): $F_X(x) = P(X \le x)$. We write $X \sim F_X$

• If two random variables have the same CDF, we call them identically distributed

Discrete Distributions

• If X only has a countable number of values, then we can characterize it using a probability mass function (PMF), which describes the probability of each value:

\[
f_X(x) = P(X = x)
\]

• $\sum_x f_X(x) = 1$

• Example: coin flip (Bernoulli distribution)
• $X \in \{0, 1\}$, $f_X(x) = \theta^{x} (1 - \theta)^{1 - x}$
• i.e., θ = probability of a head
• The Bernoulli distribution combines the probabilities of a head and a tail in one formula
• For x = 1, $f_X(x) = \theta$
• For x = 0, $f_X(x) = 1 - \theta$
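
As a small illustration of how the single Bernoulli formula covers both outcomes, the sketch below evaluates the PMF at x = 1 and x = 0. The function name `bernoulli_pmf` and the value θ = 3/10 are assumptions made for this example only.

```python
from fractions import Fraction


def bernoulli_pmf(x, theta):
    """f_X(x) = theta**x * (1 - theta)**(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    if not 0 <= theta <= 1:
        raise ValueError("theta must lie in [0, 1]")
    return theta**x * (1 - theta) ** (1 - x)


theta = Fraction(3, 10)  # hypothetical probability of a head

p_head = bernoulli_pmf(1, theta)  # the exponent on (1 - theta) is 0
p_tail = bernoulli_pmf(0, theta)  # the exponent on theta is 0
print(p_head, p_tail, p_head + p_tail)
```

Because exactly one of the two exponents is zero for each outcome, the product reduces to θ for a head and to 1 − θ for a tail, and the two values sum to 1.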

Continuous Distributions

• When the CDF is continuous, we can look at the derivative:

\[
f_X(x) = \frac{d}{dx} F_X(x)
\]

• This is called the probability density function (PDF).
• We can compute the probability of an interval (a, b) with

\[
P(a < X < b) = \int_a^b f_X(x)\, dx
\]

• Note the probability of any specific point c is $P(X = c) = 0$
• E.g., uniform distribution on (a, b): $f_X(x) = \frac{1}{b - a} \cdot \mathbf{1}_{(a,b)}(x)$
• E.g., Gaussian distribution: $f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$
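
To make the link between PDF, CDF, and interval probability concrete, here is a rough numerical sketch for a Gaussian. It integrates the density with a simple midpoint rule and compares the result against the CDF expressed through `math.erf`. The function names, the interval (−1, 1), and the standard parameters μ = 0, σ = 1 are choices made for this illustration.

```python
import math


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of a Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def gaussian_cdf(x, mu=0.0, sigma=1.0):
    """F_X(x) = P(X <= x), written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over (a, b)."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


p_numeric = integrate(gaussian_pdf, -1.0, 1.0)
p_exact = gaussian_cdf(1.0) - gaussian_cdf(-1.0)
print(p_numeric, p_exact)  # both are about 0.6827
```

The two numbers agree because the integral of the PDF over (a, b) equals F_X(b) − F_X(a); shrinking the interval to a single point drives the probability to 0.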

Multiple Random Variables

• We can also have multiple random variables
• E.g., flipping two coins at the same time (first coin: X, second coin: Y)
• We can think of four cases:
• x = 0 and y = 0
• x = 0 and y = 1
• x = 1 and y = 0
• x = 1 and y = 1

• Assume that we flip the two coins multiple times and record the outcomes:

        X=0    X=1
Y=0     50     30
Y=1     70     50

• When we normalize the table (divide each count by the total, 200), we can represent the joint distribution:

        X=0    X=1
Y=0     0.25   0.15
Y=1     0.35   0.25

• The sum of all elements should be 1
• We write the joint PMF or PDF as $f_{X,Y}(x, y)$
• If $f_{X,Y}(x, y) = f_X(x)\, f_Y(y)$ for all x and y, then the two RVs are independent

Marginalization

• Summing (discrete) or integrating (continuous) over one variable
• $f_X(x) = \sum_y f_{X,Y}(x, y)$: discrete RVs
• $f_X(x) = \int f_{X,Y}(x, y)\, dy$: continuous RVs

Joint distribution:

        X=0    X=1
Y=0     0.25   0.15
Y=1     0.35   0.25

Marginal distribution of X:

X=0    X=1
0.6    0.4
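
The normalization and marginalization steps can be sketched with the two-coin counts from the table. The dictionary layout, keyed by (x, y), and the helper name `marginal` are choices made for this illustration.

```python
from fractions import Fraction

# Recorded counts from the two-coin table, keyed by (x, y).
counts = {(0, 0): 50, (1, 0): 30, (0, 1): 70, (1, 1): 50}

# Normalize the counts into a joint distribution.
total = sum(counts.values())
joint = {xy: Fraction(c, total) for xy, c in counts.items()}


def marginal(joint, axis):
    """Sum the joint PMF over the other variable (axis 0 keeps X, axis 1 keeps Y)."""
    out = {}
    for xy, p in joint.items():
        out[xy[axis]] = out.get(xy[axis], 0) + p
    return out


p_x = marginal(joint, 0)
p_y = marginal(joint, 1)
print(p_x)  # P(X=0) = 3/5, P(X=1) = 2/5

# Independence requires the joint to factor for every cell.
is_independent = all(joint[(x, y)] == p_x[x] * p_y[y] for (x, y) in joint)
print(is_independent)  # False: 1/4 != 3/5 * 2/5
```

For these counts the joint does not factor into the product of the marginals, so the two coins in this table are not independent.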

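Mean of a Distribution

• Expectation or mean of a distribution:
• $E[X] = \sum_x x \cdot f_X(x)$ if X is discrete
• $E[X] = \int_{-\infty}^{\infty} x \cdot f_X(x)\, dx$ if X is continuous

• $E[XY] = E[X]\, E[Y]$ if X and Y are independent (the converse does not hold in general)
• $E[E[X]] = E[X]$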

Variance of a Distribution

• Variance of a distribution: $\mathrm{Var}(X) = E\left[(X - E[X])^2\right]$

• It tells us how “spread out” the distribution is
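
For a discrete distribution, both the mean and the variance are weighted sums over the PMF. The sketch below applies the definitions to a fair six-sided die; the helper names `mean` and `variance` are made up for this example, and the PMF is stored as a dictionary from value to probability.

```python
from fractions import Fraction


def mean(pmf):
    """E[X] = sum over x of x * f_X(x)."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Var(X) = E[(X - E[X])^2]."""
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())


# PMF of one roll of a fair die: each face has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}

print(mean(die))      # 7/2
print(variance(die))  # 35/12
```

A distribution that puts all its mass on one value has variance 0, and the variance grows as the mass moves further away from the mean.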

Thanks!

Questions?
