
UVA CS 4501: Machine Learning

Lecture 22: Unsupervised Clustering (I)

Dr. Yanjun Qi

University of Virginia

Department of Computer Science

4/24/18

Dr. Yanjun Qi / UVA CS

Where are we? → major sections of this course

•  Regression (supervised)
•  Classification (supervised)
•  Feature selection
•  Unsupervised models
   –  Dimension Reduction (PCA)
   –  Clustering (K-means, GMM/EM, Hierarchical)
•  Learning theory
•  Graphical models

An unlabeled Dataset X

a data matrix of n observations on p variables x1, x2, …, xp

•  Data / points / instances / examples / samples / records: [rows]
•  Features / attributes / dimensions / independent variables / covariates / predictors / regressors: [columns]

Unsupervised learning = learning from raw (unlabeled, unannotated, etc.) data, as opposed to supervised data, where the label of each example is given.

Today: What is clustering?

•  Are there any "groups"?
•  What is each group?
•  How many?
•  How to identify them?

What is clustering?

•  Find groups (clusters) of data points such that data points in a group will be similar (or related) to one another and different from (or unrelated to) the data points in other groups

[Figure: intra-cluster distances are minimized; inter-cluster distances are maximized]

What is clustering?

•  Clustering: the process of grouping a set of objects into classes of similar objects
   –  high intra-class similarity
   –  low inter-class similarity
   –  It is the most common form of unsupervised learning
•  A common and important task that finds many applications in Science, Engineering, Information Science, and other places, e.g.
   –  Group genes that perform the same function
   –  Group individuals that have similar political views
   –  Categorize documents of similar topics
   –  Identify similar objects from pictures


Toy Examples

•  People
•  Images
•  Language
•  Species

Application (I): Search Result Clustering

Application (II): Navigation

Issues for clustering

•  What is a natural grouping among these objects?
   –  Definition of "groupness"
•  What makes objects "related"?
   –  Definition of "similarity/distance"
•  Representation for objects
   –  Vector space? Normalization?
•  How many clusters?
   –  Fixed a priori?
   –  Completely data driven?
   –  Avoid "trivial" clusters: too large or too small
•  Clustering Algorithms
   –  Partitional algorithms
   –  Hierarchical algorithms
•  Formal foundation and convergence

Today's Roadmap: clustering

•  Definition of "groupness"
•  Definition of "similarity/distance"
•  Representation for objects
•  How many clusters?
•  Clustering Algorithms
   –  Partitional algorithms
   –  Hierarchical algorithms
•  Formal foundation and convergence

What is a natural grouping among these objects?

Another example: clustering is subjective

[Figure: the same objects labeled A or B under two different groupings]

Two possible solutions…


What is Similarity?

Hard to define! But we know it when we see it.

•  The real meaning of similarity is a philosophical question. We will take a more pragmatic approach.
•  Depends on representation and algorithm. For many representations/algorithms, it is easier to think in terms of a distance (rather than similarity) between vectors.

What properties should a distance measure have?

•  D(A,B) = D(B,A)   Symmetry
•  D(A,A) = 0   Constancy of Self-Similarity
•  D(A,B) = 0 iff A = B   Positivity (Separation)
•  D(A,B) <= D(A,C) + D(B,C)   Triangular Inequality

Intuitions behind desirable properties of distance measure

•  D(A,B) = D(B,A)   Symmetry
   –  Otherwise you could claim "Alex looks like Bob, but Bob looks nothing like Alex"
•  D(A,A) = 0   Constancy of Self-Similarity
   –  Otherwise you could claim "Alex looks more like Bob, than Bob does"
•  D(A,B) = 0 iff A = B   Positivity (Separation)
   –  Otherwise there are objects in your world that are different, but you cannot tell apart.
•  D(A,B) <= D(A,C) + D(B,C)   Triangular Inequality
   –  Otherwise you could claim "Alex is very like Bob, and Alex is very like Carl, but Bob is very unlike Carl"
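The four properties can be spot-checked numerically on a small sample. The sketch below is illustrative only: the function name `check_distance_properties`, the tolerance, and the sample points are assumptions, not part of the lecture. Passing on a sample does not prove that a function is a metric, but a failure shows that it is not one.

```python
from itertools import product

def check_distance_properties(points, d, tol=1e-12):
    """Spot-check the four distance properties on a finite sample."""
    results = {"symmetry": True, "self_similarity": True,
               "separation": True, "triangle": True}
    for a, b in product(points, repeat=2):
        if abs(d(a, b) - d(b, a)) > tol:
            results["symmetry"] = False          # D(A,B) != D(B,A)
        if a == b and abs(d(a, b)) > tol:
            results["self_similarity"] = False   # D(A,A) != 0
        if a != b and abs(d(a, b)) <= tol:
            results["separation"] = False        # distinct objects at distance 0
    for a, b, c in product(points, repeat=3):
        if d(a, b) > d(a, c) + d(b, c) + tol:
            results["triangle"] = False          # D(A,B) > D(A,C) + D(B,C)
    return results

def manhattan(x, y):
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

def squared_euclidean(x, y):
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

# Hypothetical sample points, chosen so that three of them are collinear.
points = [(0, 0), (1, 0), (2, 0), (0, 3)]
print(check_distance_properties(points, manhattan))
print(check_distance_properties(points, squared_euclidean))
```

On this sample, Manhattan distance passes every check, while squared Euclidean distance fails the triangular inequality: the squared distance from (0, 0) to (2, 0) is 4, which exceeds 1 + 1 via (1, 0).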

Distance Measures: Minkowski Metric

•  Suppose two objects x and y both have p features:

$$x = (x_1, x_2, \ldots, x_p), \qquad y = (y_1, y_2, \ldots, y_p)$$

•  The Minkowski metric is defined by

$$d(x, y) = \sqrt[r]{\sum_{i=1}^{p} |x_i - y_i|^r}$$

•  Most Common Minkowski Metrics

1, r = 2 (Euclidean distance)

$$d(x, y) = \sqrt{\sum_{i=1}^{p} |x_i - y_i|^2}$$

2, r = 1 (Manhattan distance)

$$d(x, y) = \sum_{i=1}^{p} |x_i - y_i|$$

3, r = +∞ ("sup" distance)

$$d(x, y) = \max_{1 \le i \le p} |x_i - y_i|$$
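The three cases can be sketched as one function. This is a minimal illustration, not the lecture's code: the name `minkowski` and the sample points (chosen to differ by 4 and 3 in their two features) are assumptions.

```python
import math

def minkowski(x, y, r):
    """Minkowski distance between equal-length sequences x and y.

    r = 1 gives Manhattan, r = 2 gives Euclidean, and r = math.inf
    gives the "sup" distance.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of features")
    diffs = [abs(xi - yi) for xi, yi in zip(x, y)]
    if math.isinf(r):
        return max(diffs)                       # limit of the r-th root as r grows
    return sum(diff ** r for diff in diffs) ** (1.0 / r)

# Hypothetical points whose features differ by 4 and by 3.
x, y = (0, 0), (4, 3)
print(minkowski(x, y, 2))         # Euclidean
print(minkowski(x, y, 1))         # Manhattan
print(minkowski(x, y, math.inf))  # sup
```

For these points the Euclidean, Manhattan, and sup distances are 5, 7, and 4.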

An Example

[Figure: two points x and y separated by 4 along one axis and 3 along the other]

1: Euclidean distance: $$\sqrt{4^2 + 3^2} = 5$$

2: Manhattan distance: $$4 + 3 = 7$$

3: "sup" distance: $$\max\{4, 3\} = 4$$


Hamming distance: discrete features

•  Manhattan distance is called Hamming distance when all features are binary or discrete.

$$d(x, y) = \sum_{i=1}^{p} |x_i - y_i|$$

   –  E.g., Gene Expression Levels Under 17 Conditions (1-High, 0-Low)

           1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
Gene A     0  1  1  0  0  1  0  0  1  0  0  1  1  1  0  0  1
Gene B     0  1  1  1  0  0  0  0  1  1  1  1  1  1  0  1  1

Hamming Distance: #(01) + #(10) = 4 + 1 = 5.
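A short sketch of the gene example, assuming the two 17-condition expression vectors read off the slide's table (the extraction of that table is garbled, so treat the vectors as illustrative). The name `hamming` is hypothetical.

```python
def hamming(x, y):
    """Number of positions at which two equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(xi != yi for xi, yi in zip(x, y))

# Expression levels under 17 conditions (1 = high, 0 = low).
gene_a = [0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1]
gene_b = [0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1]

# For binary features, Manhattan distance and Hamming distance coincide.
manhattan = sum(abs(a - b) for a, b in zip(gene_a, gene_b))
print(hamming(gene_a, gene_b), manhattan)
```

Both values are 5 for these vectors: four positions go from 0 to 1 and one goes from 1 to 0.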

Edit Distance: A generic technique for measuring similarity

•  To measure the similarity between two objects, transform one of the objects into the other, and measure how much effort it took. The measure of effort becomes the distance measure.

[Figure: Marge, Patty, and Selma]

The distance between Patty and Selma:
–  Change dress color, 1 point
–  Change earring shape, 1 point
–  Change hair part, 1 point

D(Patty, Selma) = 3

The distance between Marge and Selma:
–  Change dress color, 1 point
–  Add earrings, 1 point
–  Decrease height, 1 point
–  Take up smoking, 1 point
–  Lose weight, 1 point

D(Marge, Selma) = 5

This is called the Edit distance or the Transformation distance.
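The same idea applies to strings. The lecture does not cover string edit distance; the sketch below swaps in the standard Levenshtein distance, where each single-character insertion, deletion, or substitution costs 1 point, as one concrete instance of "effort to transform one object into the other". The name `edit_distance` and the example words are assumptions.

```python
def edit_distance(source, target):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning source into target."""
    # previous[j] = distance between the processed prefix of source and target[:j]
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # turning source[:i] into "" takes i deletions
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(previous[j] + 1,          # delete s_char
                               current[j - 1] + 1,       # insert t_char
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]

print(edit_distance("kitten", "sitting"))
```

Turning "kitten" into "sitting" takes 3 edits: two substitutions and one insertion.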

Similarity Measures: Correlation Coefficient

•  Pearson correlation coefficient

$$s(x, y) = \frac{\sum_{i=1}^{p} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{p} (x_i - \bar{x})^2 \times \sum_{i=1}^{p} (y_i - \bar{y})^2}}$$

where $\bar{x} = \frac{1}{p} \sum_{i=1}^{p} x_i$ and $\bar{y} = \frac{1}{p} \sum_{i=1}^{p} y_i$.

$$|s(x, y)| \le 1$$

•  Special case: cosine distance

$$s(x, y) = \frac{\vec{x} \cdot \vec{y}}{|\vec{x}| \cdot |\vec{y}|}$$

•  Measuring the linear correlation between two sequences, x and y,
•  giving a value between +1 and −1 inclusive, where 1 is total positive correlation, 0 is no correlation, and −1 is total negative correlation.

Correlation is unit independent.

Similarity Measures: e.g., Correlation Coefficient on time series samples

[Figure: three panels plotting expression level against time for Gene A and Gene B]

Correlation is unit independent: if you scale one of the objects ten times, you will get different Euclidean distances and the same correlation distances.
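The correlation coefficient and its unit independence can be sketched as follows. The expression values are made up for illustration, and the name `pearson` is an assumption; `math.dist` needs Python 3.8 or later.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient s(x, y) of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    # Undefined (division by zero) if either sequence is constant.
    return cov / math.sqrt(var_x * var_y)

# Hypothetical expression levels of two genes at four time points.
gene_a = [1.0, 2.0, 4.0, 3.0]
gene_b = [2.0, 3.0, 5.0, 3.5]
gene_b_scaled = [10 * level for level in gene_b]  # same profile, different unit

# Correlation is unchanged by the rescaling; Euclidean distance is not.
print(pearson(gene_a, gene_b), pearson(gene_a, gene_b_scaled))
print(math.dist(gene_a, gene_b), math.dist(gene_a, gene_b_scaled))
```

Scaling one profile by ten leaves the correlation the same up to floating-point rounding, while the Euclidean distance between the two profiles grows substantially.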


Clustering Algorithms

•  Partitional algorithms
   –  Usually start with a random (partial) partitioning
   –  Refine it iteratively
      •  K-means clustering
      •  Mixture-Model based clustering
•  Hierarchical algorithms
   –  Bottom-up, agglomerative
   –  Top-down, divisive


Hierarchical Clustering

•  Build a tree-based hierarchical taxonomy (dendrogram) from a set of objects, e.g. organisms, documents.

[Figure: example taxonomy whose top split is "With backbone" / "Without backbone"]

•  Note that hierarchies are commonly used to organize information, for example in a web portal.
   –  Yahoo! hierarchy is manually created; we will focus on automatic creation of hierarchies


(How-to) Hierarchical Clustering

Clustering: the process of grouping a set of objects into classes of similar objects → high intra-class similarity, low inter-class similarity

The number of dendrograms with n leaves

$$= \frac{(2n - 3)!}{2^{\,n-2} \, (n - 2)!}$$

Number of Leaves    Number of Possible Dendrograms
2                   1
3                   3
4                   15
5                   105
...                 …
10                  34,459,425

Bottom-Up (agglomerative): Starting with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together. → A greedy, locally optimal solution

We begin with a distance matrix which contains the distances between every pair of objects in our database.

0  8  8  7  7
   0  2  4  4
      0  3  3
         0  1
            0

[Figure: two example pairs of objects from the matrix, D( , ) = 8 and D( , ) = 1]
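The count of possible dendrograms can be checked directly from the formula. This is a small sketch; the function name `num_dendrograms` is an assumption.

```python
from math import factorial

def num_dendrograms(n):
    """Number of dendrograms with n leaves (n >= 2):
    (2n - 3)! / (2**(n - 2) * (n - 2)!)."""
    if n < 2:
        raise ValueError("need at least two leaves")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

for n in (2, 3, 4, 5, 10):
    print(n, num_dendrograms(n))
```

The printed values match the table: 1, 3, 15, 105, and 34,459,425. The growth is fast enough that trying every dendrogram is infeasible, which is why a greedy bottom-up search is used.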


Bottom-Up (agglomerative): Starting with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together.

[Figure: three successive rounds, each labeled "Consider all possible merges… Choose the best"]

But how do we compute distances between clusters rather than objects?

How to decide the distances between clusters?

•  Single-Link
   –  Nearest Neighbor: their closest members.
•  Complete-Link
   –  Furthest Neighbor: their furthest members.
•  Average:
   –  average of all cross-cluster pairs.

Computing distance between clusters: Single Link

•  cluster distance = distance of two closest members in each class
   - Potentially long and skinny clusters

Computing distance between clusters: Complete Link

•  cluster distance = distance of two farthest members
   + tight clusters

Computing distance between clusters: Average Link

•  cluster distance = average distance of all pairs
   –  the most widely used measure
   –  Robust against noise
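The three linkage rules differ only in how they aggregate the cross-cluster distances. A minimal sketch follows; the function names and the toy 1-D clusters are assumptions for illustration.

```python
from itertools import product

def pairwise(c1, c2, d):
    """All cross-cluster distances between members of c1 and c2."""
    return [d(a, b) for a, b in product(c1, c2)]

def single_link(c1, c2, d):
    return min(pairwise(c1, c2, d))      # two closest members

def complete_link(c1, c2, d):
    return max(pairwise(c1, c2, d))      # two farthest members

def average_link(c1, c2, d):
    dists = pairwise(c1, c2, d)
    return sum(dists) / len(dists)       # mean over all cross-cluster pairs

def abs_diff(a, b):
    """Object distance for the hypothetical 1-D clusters below."""
    return abs(a - b)

left, right = [0.0, 1.0], [3.0, 7.0]
print(single_link(left, right, abs_diff),
      complete_link(left, right, abs_diff),
      average_link(left, right, abs_diff))
```

Here the cross-cluster distances are 3, 7, 2, and 6, so single link gives 2, complete link gives 7, and average link gives 4.5.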

Example: single link

[Figure: dendrogram over the leaves 1, 2, 3, 4, 5]

Initial distance matrix over objects 1 to 5:

$$
\begin{array}{c|ccccc}
  & 1 & 2 & 3 & 4 & 5 \\ \hline
1 & 0 \\
2 & 2 & 0 \\
3 & 6 & 3 & 0 \\
4 & 10 & 9 & 7 & 0 \\
5 & 9 & 8 & 5 & 4 & 0
\end{array}
$$

The closest pair is (1, 2) at distance 2, so 1 and 2 are merged. Distances from the new cluster (1,2) to the rest:

$$d_{(1,2),3} = \min\{d_{1,3}, d_{2,3}\} = \min\{6, 3\} = 3$$
$$d_{(1,2),4} = \min\{d_{1,4}, d_{2,4}\} = \min\{10, 9\} = 9$$
$$d_{(1,2),5} = \min\{d_{1,5}, d_{2,5}\} = \min\{9, 8\} = 8$$

$$
\begin{array}{c|cccc}
      & (1,2) & 3 & 4 & 5 \\ \hline
(1,2) & 0 \\
3     & 3 & 0 \\
4     & 9 & 7 & 0 \\
5     & 8 & 5 & 4 & 0
\end{array}
$$

The closest pair is now ((1,2), 3) at distance 3, so 3 joins (1,2):

$$d_{(1,2,3),4} = \min\{d_{(1,2),4}, d_{3,4}\} = \min\{9, 7\} = 7$$
$$d_{(1,2,3),5} = \min\{d_{(1,2),5}, d_{3,5}\} = \min\{8, 5\} = 5$$

$$
\begin{array}{c|ccc}
        & (1,2,3) & 4 & 5 \\ \hline
(1,2,3) & 0 \\
4       & 7 & 0 \\
5       & 5 & 4 & 0
\end{array}
$$

The closest pair is now (4, 5) at distance 4, so 4 and 5 are merged. The last merge joins (1,2,3) and (4,5):

$$d_{(1,2,3),(4,5)} = \min\{d_{(1,2,3),4}, d_{(1,2,3),5}\} = \min\{7, 5\} = 5$$
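The worked example can be reproduced with a naive single-link implementation. This is a sketch under assumptions: the function name `single_link_merges` and the dictionary-based distance representation are illustrative choices, and the loop recomputes every cluster distance at every step, so it is far from the efficient variants.

```python
from itertools import combinations

def single_link_merges(dist):
    """Naive single-link agglomerative clustering.

    dist maps frozenset({a, b}) to the distance between objects a and b.
    Returns the merge history as (cluster_1, cluster_2, merge_distance).
    """
    objects = sorted({obj for pair in dist for obj in pair})
    clusters = [frozenset([obj]) for obj in objects]
    merges = []

    def cluster_distance(c1, c2):
        # Single link: distance between the two closest members.
        return min(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        # Consider all possible merges and choose the best (closest) pair.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        merges.append((sorted(c1), sorted(c2), cluster_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges

# Lower triangle of the example's distance matrix (objects 1..5).
rows = {2: [2], 3: [6, 3], 4: [10, 9, 7], 5: [9, 8, 5, 4]}
dist = {frozenset((i, j + 1)): d
        for i, row in rows.items() for j, d in enumerate(row)}

for left, right, height in single_link_merges(dist):
    print(left, "+", right, "at distance", height)
```

The merge heights come out as 2, 3, 4, and 5, matching the hand computation, with (1,2,3) and (4,5) joined last.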

[Figure: dendrograms over 30 labeled objects, one built with average linkage and one with single linkage. Height represents distance between objects / clusters.]

Partitions by cutting the dendrogram at a desired level: each connected component forms a cluster.
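Cutting at a level amounts to applying only the merges whose height is at or below that level. The sketch below assumes the merge history is given in merge order with non-decreasing heights; the function name `cut_dendrogram` and the merge-list format are hypothetical. The example data is the merge history of the five-object single-link example in these slides (heights 2, 3, 4, 5).

```python
def cut_dendrogram(objects, merges, level):
    """Return the clusters obtained by cutting the dendrogram at `level`.

    merges: list of (cluster_a, cluster_b, height) in merge order, where
    cluster_a and cluster_b are collections of object ids.
    Assumes heights are non-decreasing along the list.
    """
    clusters = {frozenset([obj]) for obj in objects}
    for a, b, height in merges:
        if height > level:
            break  # every later merge lies above the cut
        a, b = frozenset(a), frozenset(b)
        clusters -= {a, b}      # the two merged clusters disappear...
        clusters.add(a | b)     # ...and their union takes their place
    return sorted(sorted(c) for c in clusters)

merges = [([1], [2], 2), ([3], [1, 2], 3), ([4], [5], 4), ([1, 2, 3], [4, 5], 5)]
print(cut_dendrogram(range(1, 6), merges, level=3.5))
print(cut_dendrogram(range(1, 6), merges, level=4.5))
```

Cutting at 3.5 leaves three clusters, (1,2,3), (4), and (5); cutting at 4.5 leaves two, (1,2,3) and (4,5). The number of clusters is chosen after the tree is built, by picking the level.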

Hierarchical Clustering

•  Bottom-Up Agglomerative Clustering
   –  Starts with each object in a separate cluster,
   –  then repeatedly joins the closest pair of clusters,
   –  until there is only one cluster.

The history of merging forms a binary tree or hierarchy (dendrogram).

•  Top-Down divisive
   –  Starting with all the data in a single cluster,
   –  Consider every possible way to divide the cluster into two. Choose the best division,
   –  And recursively operate on both sides.

Computational Complexity

•  In the first iteration, all HAC (hierarchical agglomerative clustering) methods need to compute the similarity of all pairs of n individual instances, which is O(n^2 p).
•  In each of the subsequent n−2 merging iterations, compute the distance between the most recently created cluster and all other existing clusters.
•  For the subsequent steps, in order to maintain an overall O(n^2) performance, computing similarity to each other cluster must be done in constant time. Otherwise it is O(n^2 log n), or O(n^3) if done naively.

Summary of Hierarchical Clustering Methods

•  No need to specify the number of clusters in advance.
•  Hierarchical structure maps nicely onto human intuition for some domains.
•  They do not scale well: time complexity of at least O(n^2), where n is the total number of objects.
•  As with any heuristic search algorithm, local optima are a problem.
•  Interpretation of results is (very) subjective.

Hierarchical Clustering

Task:                  Clustering
Representation:        n/a
Score Function:        No clearly defined loss
Search/Optimization:   greedy bottom-up (or top-down)
Models, Parameters:    Dendrogram (tree)

References

•  Hastie, Trevor, et al. The Elements of Statistical Learning. Vol. 2. No. 1. New York: Springer, 2009.
•  Big thanks to Prof. Eric Xing @ CMU for allowing me to reuse some of his slides
•  Big thanks to Prof. Ziv Bar-Joseph @ CMU for allowing me to reuse some of his slides