Lecture12 Clustering
Transcript of Lecture12 Clustering
-
7/26/2019 Lecture12 Clustering
1/34
Introduction to Information Retrieval
ISTE-612 Knowledge Processing Technologies Week 12
Text
-
Today's Topic: Clustering
• Document clustering
  • Motivations
  • Document representations
  • Success criteria
• Clustering algorithms
  • Partitional
  • Hierarchical
-
WHAT IS CLUSTERING?
-
What is clustering?
• Clustering: the process of grouping a set of objects into classes of similar objects
  • Documents within a cluster should be similar.
  • Documents from different clusters should be dissimilar.
• The commonest form of unsupervised learning
  • Unsupervised learning = ?
Ch. 16
-
A data set with clear cluster structure
• How would you design an algorithm for finding the three clusters in this case?
Ch. 16
-
Applications of clustering in IR
• Whole corpus analysis/navigation
-
Applications of clustering in IR
• Whole corpus analysis/navigation
  • Explore data
• For ...
  • Fewer comparisons
Sec. 16.1
-
For improving search recall
• Cluster hypothesis - Documents in the same cluster behave similarly with respect to relevance to information needs
• Therefore, to improve search recall:
  • Cluster docs in corpus a priori
  • When a query matches a doc D, also return other docs in the cluster containing D
• Hope if we do this: The query "car" will also return docs containing automobile
  • Because clustering grouped together docs containing car with those containing automobile.
Why might this happen?
Sec. 16.1
-
Applications of clustering in IR
• Whole corpus analysis/navigation
  • Better user interface: search without typing
• For improving recall in search applications
  • Better search results (like pseudo RF)
• For better navigation of search results
  • Effective "user recall" will be higher
• For speeding up vector space retrieval
  • Cluster-based retrieval gives faster search
Sec. 16.1
-
CLUSTERING ISSUES
-
Issues for clustering
• Representation for clustering
  • Document representation
    • Vector space model
  • Need a notion of similarity/distance
• How many clusters?
  • Fixed a priori?
  • Data driven?
    • Avoid "trivial" clusters - too large or small
  • What is the right size for a cluster?
    • Likely data dependent
Sec. 16.2
-
Notion of similarity/distance
• Ideal: semantic similarity.
• Practical: term-statistical similarity (docs as vectors)
  • Cosine similarity
• For many algorithms, easier to think in terms of a distance (rather than similarity) between docs.
  • But real implementations use cosine similarity
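The contrast between thinking in distances and implementing with cosine similarity can be sketched as follows. This is a minimal illustration, not the lecture's code: the three-term vocabulary, the term counts, and the helper names are all assumptions made for the example. It also checks a standard identity that links the two views: on unit-length vectors, squared Euclidean distance equals 2 * (1 - cosine).

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention assumed here for all-zero vectors
    return dot / (norm_u * norm_v)

def normalize(u):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(a * a for a in u))
    return [a / norm for a in u]

def squared_euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Hypothetical term counts over the vocabulary (car, automobile, engine).
doc_a = [2, 0, 1]
doc_b = [0, 3, 1]
doc_c = [4, 0, 2]  # same direction as doc_a, twice the length

print(round(cosine_similarity(doc_a, doc_c), 6))  # 1.0
print(round(cosine_similarity(doc_a, doc_b), 6))  # 0.141421

# On unit-length vectors: squared distance = 2 * (1 - cosine).
ua, ub = normalize(doc_a), normalize(doc_b)
print(round(squared_euclidean(ua, ub), 6),
      round(2 * (1 - cosine_similarity(ua, ub)), 6))
```

Because of that identity, ranking neighbours by cosine similarity or by Euclidean distance gives the same order once documents are length-normalized.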
-
• Flat algorithms
  • Usually start with a random partitioning
  • Refine it iteratively
    • K means clustering
• Hierarchical algorithms
  • Bottom-up, agglomerative
-
K-Means
• Assumes documents are real-valued vectors.
• Clusters based on centroids (aka the center of gravity or mean) of points in a cluster, c:
$\vec{\mu}(c) = \frac{1}{|c|} \sum_{\vec{x} \in c} \vec{x}$
• Reassignment of instances to clusters is based on distance to the current cluster centroids.
  • (Or one can equivalently phrase it in terms of similarities)
Sec. 16.4
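The centroid definition above is a component-wise mean. A minimal sketch, with a made-up three-point cluster and a hypothetical helper name:

```python
def centroid(cluster):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(cluster)
    dims = len(cluster[0])
    return [sum(vec[d] for vec in cluster) / n for d in range(dims)]

# Illustrative 2-D points, not taken from the lecture.
cluster = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
print(centroid(cluster))  # [2.0, 2.0]
```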
-
K-Means Algorithm
Select K random docs {s1, s2, … sK} as seeds.
Until clustering converges (or other stopping criterion):
    For each doc di:
        Assign di to the cluster cj such that dist(di, sj) is minimal.
    (Next, update the seeds to the centroid of each cluster)
    For each cluster cj:
        sj = μ(cj)
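The loop above can be sketched in plain Python. This is one possible reading of the slide rather than a reference implementation: squared Euclidean distance stands in for dist, the stopping test is "partition unchanged" with an iteration cap, an emptied cluster keeps its previous seed, and the function names and sample points are assumptions.

```python
import random

def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def k_means(docs, k, max_iters=100, rng=None):
    """Return (assignment, seeds); assignment[i] is the cluster index of docs[i]."""
    rng = rng or random.Random(0)
    seeds = [list(d) for d in rng.sample(docs, k)]  # select K random docs as seeds
    assignment = None
    for _ in range(max_iters):  # cap on the number of iterations
        # Assign each doc to the cluster whose seed is nearest.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(d, seeds[j]))
            for d in docs
        ]
        if new_assignment == assignment:  # partition unchanged: converged
            break
        assignment = new_assignment
        # Update each seed to the centroid of its cluster.
        for j in range(k):
            members = [d for d, a in zip(docs, assignment) if a == j]
            if members:  # assumption: an emptied cluster keeps its old seed
                seeds[j] = centroid(members)
    return assignment, seeds

# Two well-separated groups of made-up 2-D points.
docs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
        [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
assignment, seeds = k_means(docs, k=2)
print(assignment)
print(seeds)
```

The cluster indices themselves are arbitrary: which group is labelled 0 depends on the seeds drawn.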
-
K Means Example (K=2)
[Figure: K-means iterations on a 2-D point set. Steps shown: pick seeds, reassign clusters, compute centroids, reassign clusters, compute centroids, reassign clusters, converged.]
Sec. 16.4
-
Termination conditions
• Several possibilities, e.g.,
  • A fixed number of iterations.
  • Doc partition unchanged.
  • Centroid positions don't change.
Does this mean that the docs in a cluster are unchanged?
Sec. 16.4
-
Seed Choice
• Results can vary based on random seed selection.
• Some seeds can result in poor convergence rate, or convergence to sub-optimal clusterings.
  • Select good seeds using a heuristic (e.g., doc least similar to any existing mean)
  • Try out multiple starting points
  • Initialize with the results of another method.
Example showing sensitivity to seeds [figure: six points labeled A to F]
In the above, if you start with B and E as centroids you converge to {A,B,C} and {D,E,F}.
If you start with D and F you converge to {A,B,D,E} {C,F}.
Sec. 16.4
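The seed-sensitivity example can be reproduced with a small experiment. The figure itself is not in this transcript, so the coordinates below are assumed: A, B, C on a top row and D, E, F directly below them, with C and F set apart to the right. The helper name is hypothetical. With this layout, the two seedings give the two clusterings stated above.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def k_means_from_seeds(points, seeds, max_iters=100):
    """Run K-means on a dict name -> (x, y), starting from explicit seeds."""
    names = sorted(points)
    seeds = [tuple(s) for s in seeds]
    assignment = None
    for _ in range(max_iters):
        new_assignment = {
            n: min(range(len(seeds)),
                   key=lambda j: squared_distance(points[n], seeds[j]))
            for n in names
        }
        if new_assignment == assignment:  # partition unchanged: stop
            break
        assignment = new_assignment
        for j in range(len(seeds)):
            members = [points[n] for n in names if assignment[n] == j]
            if members:
                seeds[j] = tuple(sum(c) / len(members) for c in zip(*members))
    clusters = [sorted(n for n in names if assignment[n] == j)
                for j in range(len(seeds))]
    return sorted(clusters)

# Assumed coordinates; the slide's figure is not available in the transcript.
points = {"A": (0, 1), "B": (1, 1), "C": (3, 1),
          "D": (0, 0), "E": (1, 0), "F": (3, 0)}

print(k_means_from_seeds(points, [points["B"], points["E"]]))
# [['A', 'B', 'C'], ['D', 'E', 'F']]
print(k_means_from_seeds(points, [points["D"], points["F"]]))
# [['A', 'B', 'D', 'E'], ['C', 'F']]
```

Both results are fixed points of the algorithm, which is why trying multiple starting points is one of the remedies listed above.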
-
How Many Clusters?
• Number of clusters K is given
  • Partition n docs into predetermined number of clusters
• Finding the "right" number of clusters is part of the problem
  • Given docs, partition into an "appropriate" number of subsets.
  • E.g., for query results - ideal value of K not known up front - though UI may impose limits.
-
HIERARCHICAL CLUSTERING
-
Hierarchical Clustering
• Build a tree-based hierarchical taxonomy (dendrogram) from a set of documents.
• One approach: recursive application of a partitional clustering algorithm.
[Figure: example taxonomy]
animal
  vertebrate: fish, reptile, amphib., mammal
  invertebrate: worm, insect, crustacean
Ch. 17
-
Dendrogram: Hierarchical Clustering
• Clustering obtained by cutting the dendrogram at a desired level: each connected component forms a cluster.
-
Hierarchical Agglomerative Clustering (HAC)
• Starts with each doc in a separate cluster
  • then repeatedly joins the closest pair of clusters, until there is only one cluster.
• The history of merging forms a binary tree or hierarchy.
Sec. 17.1
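A naive sketch of the agglomerative procedure, together with cutting the resulting merge history at a chosen similarity level. The similarity matrix is invented for illustration, single-link is used as the default cluster similarity, and the function names are assumptions. The scan over all cluster pairs at every step is chosen for readability, not speed.

```python
def single_link(sim, ci, cj):
    """Similarity of the most similar pair, one doc from each cluster."""
    return max(sim[x][y] for x in ci for y in cj)

def hac(sim, cluster_sim=single_link):
    """Naive agglomerative clustering over a symmetric similarity matrix.

    Returns the merge history as a list of (cluster_a, cluster_b, similarity).
    """
    clusters = [(i,) for i in range(len(sim))]  # each doc starts alone
    history = []
    while len(clusters) > 1:
        # Find the closest (most similar) pair of current clusters.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = cluster_sim(sim, clusters[a], clusters[b])
                if best is None or s > best[0]:
                    best = (s, a, b)
        s, a, b = best
        history.append((clusters[a], clusters[b], s))
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return history

def cut(history, n_docs, min_sim):
    """Clusters left after applying merges until one falls below min_sim."""
    clusters = {(i,) for i in range(n_docs)}
    for a, b, s in history:
        if s < min_sim:
            break
        clusters -= {a, b}
        clusters.add(a + b)
    return sorted(clusters)

# Hypothetical pairwise similarities for four docs.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
history = hac(sim)
print(history)
# [((0,), (1,), 0.9), ((2,), (3,), 0.8), ((0, 1), (2, 3), 0.3)]
print(cut(history, 4, 0.5))
# [(0, 1), (2, 3)]
```

The history is the binary merge tree in list form; lowering `min_sim` corresponds to cutting the dendrogram closer to the root.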
-
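Closest pair of clusters
• Many variants to defining closest pair of clusters
  • Single-link: similarity of the most cosine-similar (single-link)
  • Complete-link: similarity of the "furthest" points, the least cosine-similar
  • Centroid: clusters whose centroids (centers of gravity) are the most cosine-similar
  • Average-link: average cosine between all pairs of elements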
Sec. 17.2
-
Single Link Agglomerative Clustering
• Use maximum similarity of pairs:
$sim(c_i, c_j) = \max_{x \in c_i,\, y \in c_j} sim(x, y)$
• Can result in "straggly" (long and thin) clusters due to chaining effect.
• After merging c_i and c_j, the similarity of the resulting cluster to another cluster, c_k, is:
$sim((c_i \cup c_j), c_k) = \max(sim(c_i, c_k), sim(c_j, c_k))$
Sec. 17.2
-
Complete Link
• Use minimum similarity of pairs:
$sim(c_i, c_j) = \min_{x \in c_i,\, y \in c_j} sim(x, y)$
• Makes "tighter," spherical clusters that are typically preferable.
• After merging c_i and c_j, the similarity of the resulting cluster to another cluster, c_k, is:
$sim((c_i \cup c_j), c_k) = \min(sim(c_i, c_k), sim(c_j, c_k))$
[Figure: three clusters labeled Ci, Cj, Ck]
Sec. 17.2
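The merge-update rules for single link (max) and complete link (min) can be checked numerically: the similarity of a merged cluster to a third cluster, computed from scratch over all document pairs, matches the max or min of the two pre-merge similarities. The matrix values and the cluster choices below are made up for the check.

```python
def single_link(sim, ci, cj):
    """Maximum similarity over pairs, one doc from each cluster."""
    return max(sim[x][y] for x in ci for y in cj)

def complete_link(sim, ci, cj):
    """Minimum similarity over pairs, one doc from each cluster."""
    return min(sim[x][y] for x in ci for y in cj)

# Hypothetical symmetric similarity matrix for five docs.
sim = [
    [1.0, 0.7, 0.6, 0.2, 0.1],
    [0.7, 1.0, 0.8, 0.4, 0.3],
    [0.6, 0.8, 1.0, 0.5, 0.2],
    [0.2, 0.4, 0.5, 1.0, 0.9],
    [0.1, 0.3, 0.2, 0.9, 1.0],
]
ci, cj, ck = (0,), (1, 2), (3, 4)
merged = ci + cj

# Single link: sim((ci U cj), ck) = max(sim(ci, ck), sim(cj, ck))
print(single_link(sim, merged, ck))    # 0.5
print(max(single_link(sim, ci, ck), single_link(sim, cj, ck)))      # 0.5

# Complete link: sim((ci U cj), ck) = min(sim(ci, ck), sim(cj, ck))
print(complete_link(sim, merged, ck))  # 0.1
print(min(complete_link(sim, ci, ck), complete_link(sim, cj, ck)))  # 0.1
```

This is what lets an implementation update cluster similarities after a merge without revisiting individual documents.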
-
Group Average
• Similarity of two clusters = average similarity of all pairs within merged cluster.
$sim(c_i, c_j) = \frac{1}{|c_i \cup c_j| \, (|c_i \cup c_j| - 1)} \sum_{x \in (c_i \cup c_j)} \; \sum_{y \in (c_i \cup c_j):\, y \neq x} sim(x, y)$
• Compromise between single and complete link.
• Two options:
  • Averaged across all pairs in the merged cluster
  • Averaged over all pairs between the two original clusters
Sec. 17.3
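The two averaging options can be compared on a toy similarity matrix. This sketch assumes disjoint clusters and uses invented values; the function names are hypothetical. The first function follows the group-average formula over all ordered pairs of distinct docs in the merged cluster, the second averages only over pairs that straddle the two original clusters.

```python
from itertools import permutations

def group_average_sim(sim, ci, cj):
    """Average similarity over all ordered pairs of distinct docs in ci U cj."""
    merged = list(ci) + list(cj)
    n = len(merged)
    total = sum(sim[x][y] for x, y in permutations(merged, 2))
    return total / (n * (n - 1))

def between_cluster_average(sim, ci, cj):
    """Average similarity over pairs with one doc in ci and one in cj."""
    total = sum(sim[x][y] for x in ci for y in cj)
    return total / (len(ci) * len(cj))

# Hypothetical pairwise similarities for four docs.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
ci, cj = (0, 1), (2, 3)
print(round(group_average_sim(sim, ci, cj), 4))        # 0.4167
print(round(between_cluster_average(sim, ci, cj), 4))  # 0.2
```

The first value is higher here because it also counts the high within-cluster similarities (0.9 and 0.8).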
-
END FOR THIS WEEK
-
What Is A Good Clustering?
-
External criteria for clustering quality
• Quality measured by its ability to discover some or all of the hidden patterns or latent classes in gold standard data
• Assesses a clustering with respect to ground truth … requires labeled data
• Assume documents with C gold standard classes, while our clustering algorithms produce K clusters, ω1, ω2, …, ωK with ni members.
Sec. 16.3
-
External Evaluation of Cluster Quality
• Simple measure: purity, assign cluster ωi to the most frequent class
$\text{purity} = \frac{1}{N} \sum_k \max_j |\omega_k \cap c_j|$
Sec. 16.3
-
Purity example
Cluster I: (max(5, 1, 0)) = 5 (red)
Cluster II: (max(1, 4, 1)) = 4 (blue)
Cluster III: (max(2, 0, 3)) = 3 (green)
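A short sketch of the purity computation applied to the counts above. It assumes each triple lists one cluster's document counts per gold-standard class, so N is the sum of all counts (17); the function name is hypothetical.

```python
def purity(cluster_class_counts):
    """cluster_class_counts[k][j] = number of docs of class j in cluster k."""
    n_docs = sum(sum(row) for row in cluster_class_counts)
    majority_total = sum(max(row) for row in cluster_class_counts)
    return majority_total / n_docs

counts = [
    [5, 1, 0],  # Cluster I
    [1, 4, 1],  # Cluster II
    [2, 0, 3],  # Cluster III
]
print(round(purity(counts), 3))  # (5 + 4 + 3) / 17 = 0.706
```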