CSE/STAT 416: Section 7, 5/16
May 16, 2019
Agenda
Lecture Recap: Pros and Cons of K-means
Intro to Spectral Clustering
Spectral Clustering vs. K-means demo
Time for questions/review
Recall from Class...
The k-means algorithm
Start with k randomly initialized centers µ_j.
Repeat until the centers stop moving:
Fix the centers and assign each point to the closest center (update each datapoint's z_i value).
Fix the z_i and update the centers µ_j (set µ_j as the centroid of all points with z_i = j).
At every step, the objective
$$\sum_{j=1}^{k} \sum_{i : z_i = j} \|\mu_j - x_i\|_2^2$$
gets smaller (clusters get more homogeneous in terms of distance).
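As a minimal sketch of this loop (not the course's own implementation), assuming the data live in a NumPy array X of shape (n, d); the function name and defaults are illustrative:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal k-means sketch: alternate assignments and center updates."""
    rng = np.random.default_rng(seed)
    # Start with k randomly chosen datapoints as the initial centers mu_j
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Fix the centers; assign each point to its closest center (update z_i)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        z = dists.argmin(axis=1)
        # Fix the z_i; update each center mu_j as the centroid of its points
        new_centers = np.array([
            X[z == j].mean(axis=0) if np.any(z == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):  # centers stopped moving
            break
        centers = new_centers
    return z, centers
```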
Cons of K-means
Problem: Sensitive to initial conditions
Solution: Try many initial conditions and compare performance, or pick initial conditions intelligently (k-means++)
Problem: Must select K in advance
Solution: Can try many Ks and compare cluster heterogeneity. Be careful of overfitting: clusters always get more homogeneous as K gets larger, so we should penalize large K (see the sketch after this list).
Problem: Assumes linear cluster boundaries, and assumes that minimizing within-cluster distance is the best objective.
Solution: Use a new algorithm, or at least a new representation of the data
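A hedged sketch of the first two fixes using scikit-learn (X is assumed to hold the data): n_init reruns k-means from several initial conditions with k-means++ seeding, and looping over K lets us compare the within-cluster objective (inertia), keeping in mind that it always shrinks as K grows.

```python
from sklearn.cluster import KMeans

# Rerun from several initializations and compare a few values of K.
# inertia_ is the within-cluster sum of squared distances; it always
# decreases as K grows, so use an elbow/penalty heuristic to pick K.
for k in range(2, 8):
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit(X)
    print(k, km.inertia_)
```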
K-means interactive demo
https://www.naftaliharris.com/blog/visualizing-k-means-clustering/
Motivation for Spectral Clustering
The k-means objective $\sum_{j=1}^{k} \sum_{i : z_i = j} \|\mu_j - x_i\|_2^2$ is not necessarily the best objective function.
K-means prioritizes compactness; what if we want to prioritize connectivity?
Input for Spectral Clustering
First, turn the data into a similarity matrix, or affinity matrix.
Entry (i, j) tells us how similar datapoint i is to datapoint j. The matrix should be symmetric and non-negative.
Figure credit: www.cs.cmu.edu/~aarti/Class/10701/slides/Lecture21_2.pdf
Examples of similarity measures
If the data are represented as vectors:
Entry (i, j) = 1 if datapoint j is one of the k nearest neighbors of datapoint i
Entry $(i, j) = e^{-\|x_i - x_j\|^2 / (2\sigma^2)}$ (Gaussian kernel function)
Cosine similarity
1/(distance) for any measure of distance
A cool thing about a similarity matrix is that you can define one even if your data are not vectors.
Context-specific notions of similarity: co-authorship, friendship, etc.
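A small sketch of building one such matrix with the Gaussian kernel, assuming vector data in a NumPy array X; sigma is a bandwidth you would have to choose.

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    """S[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    # Pairwise squared Euclidean distances between all rows of X
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    # Symmetric, non-negative similarity matrix with ones on the diagonal
    return np.exp(-sq_dists / (2 * sigma ** 2))
```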
Spectral Clustering Algorithm
Main idea: rearrange the rows and columns of the matrix to get a block diagonal form.
Maximize total similarity within the blocks, minimize total similarity outside of the blocks.
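A toy illustration (all labels and numbers made up) of how reordering rows and columns exposes the blocks, here for four points where {a, b} and {c, d} are the true clusters but the points arrive in the order [a, c, b, d]:

```python
import numpy as np

# Similarity matrix for points in the order [a, c, b, d]
S = np.array([[1.0, 0.1, 0.9, 0.0],
              [0.1, 1.0, 0.0, 0.8],
              [0.9, 0.0, 1.0, 0.1],
              [0.0, 0.8, 0.1, 1.0]])

# Reordering rows and columns to [a, b, c, d] reveals two high-similarity
# blocks on the diagonal and low similarity everywhere else:
perm = [0, 2, 1, 3]
print(S[np.ix_(perm, perm)])
# [[1.  0.9 0.1 0. ]
#  [0.9 1.  0.  0.1]
#  [0.1 0.  1.  0.8]
#  [0.  0.1 0.8 1. ]]
```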
How does it work?
Algorithm:
Encode data as similarity matrix
Compute the eigenvalues and eigenvectors of the similarity matrix (or its graph Laplacian); use the eigenvectors for the first few eigenvalues to obtain a low-dimensional representation of the data
(the low-dimensional representation that keeps as much information about similarity as possible)
Apply K-means (or a similar algorithm) in this low-dimensional space to get the clustering
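A rough end-to-end sketch of these steps with NumPy and scikit-learn's KMeans, assuming a symmetric, non-negative similarity matrix S (e.g. from the Gaussian-kernel sketch above) in which every point has nonzero total similarity; it follows the common normalized-Laplacian variant rather than any one canonical implementation (scikit-learn's sklearn.cluster.SpectralClustering does something similar).

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(S, k):
    """Cluster points described by a symmetric, non-negative similarity matrix S."""
    # Normalized graph Laplacian L = I - D^{-1/2} S D^{-1/2}
    d = S.sum(axis=1)                      # node degrees (assumed > 0)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    # Eigenvectors for the k smallest eigenvalues give each point a
    # k-dimensional representation that preserves the similarity structure
    eigvals, eigvecs = np.linalg.eigh(L)   # eigh returns ascending eigenvalues
    U = eigvecs[:, :k]
    # (Some variants also normalize each row of U to unit length here.)
    # Apply K-means in the low-dimensional, transformed space
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)
```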
Discussion
If we end up applying K-means anyway, how does it draw non-linear decision boundaries?
The boundaries it draws are linear, but they are linear in a transformed space.
K-means and Spectral Clustering Demo
Demo notebook
Pros and Cons of Spectral Clustering
Pros:
Can handle arbitrary cluster shapes
Mathematically elegant, and run time is reasonable for medium-sized problems
Allows for interesting, context-specific definitions of similarity
Cons:
Still need to pick K in advance (either know it or test several different Ks)
Results are highly dependent on what similarity metric is chosen