L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support...

50
ECE 5424: Introduction to Machine Learning Stefan Lee Virginia Tech Topics: – Expectation Maximization – For GMMs – For General Latent Model Learning Readings: Barber 20.120.3

Transcript of L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support...

Page 1: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

ECE 5424: Introduction to Machine Learning

Stefan LeeVirginia Tech

Topics: – Expectation Maximization

– For GMMs– For General Latent Model Learning

Readings: Barber 20.1-­20.3

Page 2: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

Project Poster• Poster Presentation:

– Dec 6th 1:30-­3:30pm– Goodwin Hall Atrium – Print poster (or bunch of slides)

• Fedex, Library, ECE support, CS support– Format:

• Portrait, 2 feet (width) x 36 inches (height)• See https://filebox.ece.vt.edu/~f16ece5424/project.html

• Submit poster as PDF by Dec 6th 1:30pm– Makes up the final portion of your project grade

2

Best Project Prize!

Page 3: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

How to Style a Scientific Poster• Layout content consistently

– top to bottom, left to right in columns is common– usually numbered headings

3

Page 4: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

How to Style a Scientific Poster• DONT

4

Page 5: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

How to Style a Scientific Poster• Be cautious with colors

5

Page 6: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

How to Style a Scientific Poster• Be cautious with colors

6

Page 7: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

How to Style a Scientific Poster• Less text, more pictures. Bullets are your friend!

7

Page 8: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

How to Style a Scientific Poster• Avoid giant tables• Don’t use more than 2 fonts• Make sure everything is readable• Make sure each figure has an easy to find take-­away message (write it nearby, maybe in an associated box)

8

Page 9: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

How to Style a Scientific Poster

9

Page 10: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

How to Present a Scientific Poster

10

• Typical poster session at a conference:– You stand by your poster and people stop to check it out

• You will need:– 30 second summary

• gives the visitor an overview and gauges interest• if they are not really interested, they will leave after this bit

– 3-­5 minutes full walkthrough• if someone stuck around past your 30 second summary or asked some follow up questions, walk them through your poster in more detail.

• DON’T READ IT TO THEM!

– A bottle of water is typically useful.

Page 11: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

Again• Poster Presentation:

– Dec 6th 1:30-­3:30pm– Goodwin Hall Atrium – Print poster (or bunch of slides)

• Fedex, Library, ECE support, CS support– Format:

• Portrait, 2 feet (width) x 36 inches (height)

• Submit poster as PDF by Dec 6th 1:30pm– Makes up the final portion of your project grade

• If you are worried about your project, talk to me soon.

11

Best Project Prize!

Page 12: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

Homework & Grading

12

• HW3 & HW4 should be graded this week

• Will release solutions this week as well

Page 13: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

Final Exam

13

• Dec 14th in class (DURH 261);; 2:05 -­ 4:05 pm• Content:

– Almost completely about material since the midterm• SVM, Neural Networks, Descision Trees, Ensemble Techniques, K-­means, EM (today), Factor Analysis (Thrusday)

– True/False (explain your choice like last time)– Multiple Choice– Some ‘Prove this’– Some ‘What would happen with algorithm A on this dataset”

Page 14: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

One last thing

14

• SPOT surveys

Page 15: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

Recap of Last Time

(C) Dhruv Batra 15

Page 16: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

Some Data

16(C) Dhruv Batra Slide Credit: Carlos Guestrin

Page 17: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

K-­means

1. Ask user how many clusters they’d like.

(e.g. k=5)

17(C) Dhruv Batra Slide Credit: Carlos Guestrin

Page 18: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

K-­means

1. Ask user how many clusters they’d like.

(e.g. k=5)

2. Randomly guess k cluster Center locations

18(C) Dhruv Batra Slide Credit: Carlos Guestrin

Page 19: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

K-­means

1. Ask user how many clusters they’d like.

(e.g. k=5)

2. Randomly guess k cluster Center locations

3. Each datapoint finds out which Center it’s closest to. (Thus each Center “owns” a set of datapoints)

19(C) Dhruv Batra Slide Credit: Carlos Guestrin

Page 20: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

K-­means

1. Ask user how many clusters they’d like.

(e.g. k=5)

2. Randomly guess kcluster Center locations

3. Each datapoint finds out which Center it’s

closest to.

4. Each Center finds the centroid of the points it owns

20(C) Dhruv Batra Slide Credit: Carlos Guestrin

Page 21: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

K-­means

1. Ask user how many clusters they’d like.

(e.g. k=5)

2. Randomly guess kcluster Center locations

3. Each datapoint finds out which Center it’s

closest to.

4. Each Center finds the centroid of the points it owns…

5. …and jumps there

6. …Repeat until terminated! 21(C) Dhruv Batra Slide Credit: Carlos Guestrin

Page 22: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

K-­means• Randomly initialize k centers

– (0) = 1(0),…, k(0)

• Assign: – Assign each point i1,…n to nearest center:–

• Recenter: – j becomes centroid of its points

22(C) Dhruv Batra Slide Credit: Carlos Guestrin

C(i) argminj

||xi µj ||2

Page 23: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

• Optimize objective function:

• Fix , optimize a (or C)

23(C) Dhruv Batra Slide Credit: Carlos Guestrin

K-­means as Co-­ordinate Descent

minµ1,...,µk

mina1,...,aN

F (µ,a) = minµ1,...,µk

mina1,...,aN

NX

i=1

kX

j=1

aij ||xi µj ||2

Page 24: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

• Optimize objective function:

• Fix a (or C), optimize

24(C) Dhruv Batra Slide Credit: Carlos Guestrin

K-­means as Co-­ordinate Descent

minµ1,...,µk

mina1,...,aN

F (µ,a) = minµ1,...,µk

mina1,...,aN

NX

i=1

kX

j=1

aij ||xi µj ||2

Page 25: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

Object Bag of ‘words’

Fei-­‐Fei Li

Page 26: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

Clustered Image Patches

Fei-­Fei et al. 2005

Page 27: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

(One) bad case for k-­means

• Clusters may overlap• Some clusters may be “wider” than others

• GMM to the rescue!

Slide Credit: Carlos Guestrin(C) Dhruv Batra 27

Page 28: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

GMM

(C) Dhruv Batra 28Figure Credit: Kevin Murphy

−25 −20 −15 −10 −5 0 5 10 15 20 250

5

10

15

20

25

30

35

Page 29: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

GMM

(C) Dhruv Batra 29

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Figure Credit: Kevin Murphy

Page 30: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

K-­means vs GMM• K-­Means

– http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html

• GMM– http://www.socr.ucla.edu/applets.dir/mixtureem.html

(C) Dhruv Batra 30

Page 31: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

Hidden Data Causes Problems #1• Fully Observed (Log) Likelihood factorizes

• Marginal (Log) Likelihood doesn’t factorize

• All parameters coupled!

(C) Dhruv Batra 31

Page 32: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

Hidden Data Causes Problems #2• Identifiability

(C) Dhruv Batra 32

−25 −20 −15 −10 −5 0 5 10 15 20 250

5

10

15

20

25

30

35

µ1

µ2

−15.5 −10.5 −5.5 −0.5 4.5 9.5 14.5 19.5

−15.5

−10.5

−5.5

−0.5

4.5

9.5

14.5

19.5

Figure Credit: Kevin Murphy

Page 33: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

Hidden Data Causes Problems #3• Likelihood has singularities if one Gaussian “collapses”

(C) Dhruv Batra 33x

p(x

)

Page 34: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

Special case: spherical Gaussians and hard assignments

Slide Credit: Carlos Guestrin(C) Dhruv Batra 34

• If P(X|Z=k) is spherical, with same for all classes:

• If each xi belongs to one class C(i) (hard assignment), marginal likelihood:

• M(M)LE same as K-­means!!!

P(xi, y = j)j=1

k

∑i=1

N

∏ ∝ exp − 12σ 2 xi −µC(i)

2%

&'(

)*i=1

N

P(xi | z = j)∝ exp −12σ 2 xi −µ j

2#

$%&

'(

Page 35: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

The K-­means GMM assumption

• There are k components

• Component i has an associated mean vector µι

µ1

µ2

µ3

Slide Credit: Carlos Guestrin(C) Dhruv Batra 35

Page 36: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

The K-­means GMM assumption

• There are k components

• Component i has an associated mean vector µι

• Each component generates data from a Gaussian with mean mi and

covariance matrix σ2Ι

Each data point is generated according to the following recipe:

µ1

µ2

µ3

Slide Credit: Carlos Guestrin(C) Dhruv Batra 36

Page 37: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

The K-­means GMM assumption

• There are k components

• Component i has an associated mean vector µι

• Each component generates data from a Gaussian with mean mi and

covariance matrix σ2Ι

Each data point is generated according to the following recipe:

1. Pick a component at random: Choose component i with

probability P(y=i)

µ2

Slide Credit: Carlos Guestrin(C) Dhruv Batra 37

Page 38: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

The K-­means GMM assumption• There are k components

• Component i has an associated mean vector µι

• Each component generates data from a Gaussian with mean mi and

covariance matrix σ2Ι

Each data point is generated according to the following recipe:

1. Pick a component at random: Choose component i with

probability P(y=i)

2. Datapoint ∼ Ν(µι, σ2Ι )

µ2

x

Slide Credit: Carlos Guestrin(C) Dhruv Batra 38

Page 39: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

The General GMM assumption

µ1

µ2

µ3

• There are k components

• Component i has an associated mean vector µι

• Each component generates data from a Gaussian with mean mi and

covariance matrix ΣiEach data point is generated

according to the following recipe:

1. Pick a component at random: Choose component i with

probability P(y=i)

2. Datapoint ∼ Ν(µι, Σi )

Slide Credit: Carlos Guestrin(C) Dhruv Batra 39

Page 40: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

K-­means vs GMM• K-­Means

– http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html

• GMM– http://www.socr.ucla.edu/applets.dir/mixtureem.html

(C) Dhruv Batra 40

Page 41: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

EM• Expectation Maximization [Dempster ‘77]

• Often looks like “soft” K-­means

• Extremely general• Extremely useful algorithm

– Essentially THE goto algorithm for unsupervised learning

• Plan– EM for learning GMM parameters– EM for general unsupervised learning problems

(C) Dhruv Batra 41

Page 42: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

EM for Learning GMMs• Simple Update Rules

– E-­Step: estimate P(zi = j | xi)– M-­Step: maximize full likelihood weighted by posterior

(C) Dhruv Batra 42

Page 43: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

Gaussian Mixture Example: Start

43(C) Dhruv Batra Slide Credit: Carlos Guestrin

Page 44: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

After 1st iteration

44(C) Dhruv Batra Slide Credit: Carlos Guestrin

Page 45: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

After 2nd iteration

45(C) Dhruv Batra Slide Credit: Carlos Guestrin

Page 46: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

After 3rd iteration

46(C) Dhruv Batra Slide Credit: Carlos Guestrin

Page 47: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

After 4th iteration

47(C) Dhruv Batra Slide Credit: Carlos Guestrin

Page 48: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

After 5th iteration

48(C) Dhruv Batra Slide Credit: Carlos Guestrin

Page 49: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

After 6th iteration

49(C) Dhruv Batra Slide Credit: Carlos Guestrin

Page 50: L23 gmm em - Virginia Techf16ece5424/slides/L23_gmm...• Fedex,#Library,#ECE#support,#CS#support – Format: • Portrait,##2#feet#(width)#x36#inches(height) • Submit#poster#asPDF#byDec6

After 20th iteration

50(C) Dhruv Batra Slide Credit: Carlos Guestrin