Lecture12 Clustering

  • 7/26/2019 Lecture12 Clustering

    Introduction to Information Retrieval

    ISTE-?12 Knowledge Processing Technologies, Week 12

    Text Clustering

    Today's Topic: Clustering

    - Document clustering
      - Motivations
      - Document representations
      - Success criteria
    - Clustering algorithms
      - Partitional
      - Hierarchical

    What is clustering?

    What is clustering?

    - Clustering: the process of grouping a set of objects into classes of similar objects
      - Documents within a cluster should be similar.
      - Documents from different clusters should be dissimilar.
    - The commonest form of unsupervised learning
      - Unsupervised learning = ?

    Ch. 16

    A data set with clear cluster structure

    - How would you design an algorithm for finding the three clusters in this case?

    Ch. 16

    Applications of clustering in IR

    - Whole corpus analysis/navigation

    Applications of clustering in IR

    - Whole corpus analysis/navigation
    - Explore data
    - For ...
    - Fewer comparisons

    Sec. 16.1

    For improving search recall

    - Cluster hypothesis: documents in the same cluster behave similarly with respect to relevance to information needs.
    - Therefore, to improve search recall:
      - Cluster docs in the corpus a priori.
      - When a query matches a doc D, also return other docs in the cluster containing D.
    - Hope if we do this: the query "car" will also return docs containing "automobile",
      - because clustering grouped together docs containing "car" with those containing "automobile".

    Why might this happen?

    Sec. 16.1
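
The recall idea above can be sketched in a few lines. This is only an illustration under assumed inputs: the function name `expand_with_clusters`, the two lookup tables, and the toy doc ids are hypothetical and not part of the lecture.

```python
def expand_with_clusters(matched_docs, doc_to_cluster, cluster_to_docs):
    """Return the matched docs followed by their cluster mates, without duplicates."""
    results = list(matched_docs)
    seen = set(results)
    for doc in matched_docs:
        for mate in cluster_to_docs[doc_to_cluster[doc]]:
            if mate not in seen:
                seen.add(mate)
                results.append(mate)
    return results


# Hypothetical toy corpus: d1 contains "car", d2 contains "automobile",
# and an a priori clustering has put d1 and d2 in the same cluster.
doc_to_cluster = {"d1": 0, "d2": 0, "d3": 1}
cluster_to_docs = {0: ["d1", "d2"], 1: ["d3"]}

# The query "car" matches only d1; expansion also returns d2.
expanded = expand_with_clusters(["d1"], doc_to_cluster, cluster_to_docs)
print(expanded)  # ['d1', 'd2']
```

Whether the extra docs are actually relevant depends on how well the cluster hypothesis holds for the collection.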

    Applications of clustering in IR

    - Whole corpus analysis/navigation
      - Better user interface: search without typing
    - For improving recall in search applications
      - Better search results (like pseudo RF)
    - For better navigation of search results
      - Effective "user recall" will be higher
    - For speeding up vector space retrieval
      - Cluster-based retrieval gives faster search

    Sec. 16.1

    Clustering issues

    Issues for clustering

    - Representation for clustering
      - Document representation
        - Vector space model
      - Need a notion of similarity/distance
    - How many clusters?
      - Fixed a priori?
      - Data driven?
      - Avoid "trivial" clusters: too large or too small
        - What is the right size for a cluster?
        - Likely data dependent

    Sec. 16.2

    Notion of similarity/distance

    - Ideal: semantic similarity.
    - Practical: term-statistical similarity (docs as vectors)
      - Cosine similarity
    - For many algorithms, it is easier to think in terms of a distance (rather than a similarity) between docs.
      - But real implementations use cosine similarity
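
One reason distance and cosine similarity can be used interchangeably: for length-normalized vectors, the squared Euclidean distance equals 2(1 - cosine). The sketch below checks this on two made-up term vectors; the helper names and the vectors are illustrative assumptions.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def normalize(x):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(a * a for a in x))
    return [a / norm for a in x]


# Hypothetical term-weight vectors for two docs.
doc_a = [2.0, 1.0, 0.0]
doc_b = [1.0, 1.0, 1.0]

cos_ab = cosine_similarity(doc_a, doc_b)
dist_sq = euclidean_distance(normalize(doc_a), normalize(doc_b)) ** 2

# For unit vectors: |a - b|^2 = 2 * (1 - cos(a, b)),
# so ranking by smallest distance matches ranking by largest cosine.
print(abs(dist_sq - 2 * (1 - cos_ab)) < 1e-9)  # True
```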

    Clustering algorithms

    - Flat algorithms
      - Usually start with a random partitioning
      - Refine it iteratively
        - K-means clustering
    - Hierarchical algorithms
      - Bottom-up, agglomerative

    Sec. 16.4

    K-Means

    - Assumes documents are real-valued vectors.
    - Clusters based on centroids (aka the center of gravity or mean) of points in a cluster, c:

      \[ \vec{\mu}(c) = \frac{1}{|c|} \sum_{\vec{x} \in c} \vec{x} \]

    - Reassignment of instances to clusters is based on distance to the current cluster centroids.
      - (Or one can equivalently phrase it in terms of similarities)

    Sec. 16.4
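
The centroid is just the component-wise mean of the cluster's vectors. A minimal sketch, assuming docs are plain lists of floats (the function name and sample points are illustrative):

```python
def centroid(cluster):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(cluster)
    dims = len(cluster[0])
    return [sum(vec[d] for vec in cluster) / n for d in range(dims)]


# Three hypothetical 2-dimensional docs in one cluster.
mu = centroid([[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]])
print(mu)  # [2.0, 2.0]
```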

    K-Means Algorithm

    Select K random docs {s_1, s_2, …, s_K} as seeds.
    Until clustering converges (or other stopping criterion):
        For each doc d_i:
            Assign d_i to the cluster c_j such that dist(d_i, s_j) is minimal.
        (Update the seeds to the centroid of each cluster)
        For each cluster c_j:
            s_j = μ(c_j)
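
A runnable sketch of the loop above, in pure Python. Several choices here are assumptions rather than part of the slide: squared Euclidean distance as `dist`, "doc partition unchanged" as the convergence test, an iteration cap `max_iters`, and keeping the old seed when a cluster ends up empty.

```python
import random


def squared_distance(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def k_means(docs, k, max_iters=100, rng_seed=0):
    """Return (cluster index per doc, final seeds)."""
    rng = random.Random(rng_seed)
    seeds = [list(doc) for doc in rng.sample(docs, k)]  # K random docs as seeds
    assignment = None
    for _ in range(max_iters):
        # Assign each doc to the cluster whose seed is nearest.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(doc, seeds[j]))
            for doc in docs
        ]
        if new_assignment == assignment:  # doc partition unchanged: converged
            break
        assignment = new_assignment
        # Update each seed to the centroid of its cluster.
        for j in range(k):
            members = [doc for doc, a in zip(docs, assignment) if a == j]
            if members:  # assumption: an empty cluster keeps its old seed
                seeds[j] = centroid(members)
    return assignment, seeds


# Hypothetical data: two well-separated groups of three points each.
docs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
        [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
labels, final_seeds = k_means(docs, k=2)
print(labels)
```

The cluster indices themselves are arbitrary; only which docs share an index is meaningful.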

    K-Means Example (K=2)

    1. Pick seeds
    2. Reassign clusters
    3. Compute centroids
    4. Reassign clusters
    5. Compute centroids
    6. Reassign clusters
    7. Converged!

    Sec. 16.4

    Termination conditions

    - Several possibilities, e.g.,
      - A fixed number of iterations.
      - Doc partition unchanged.
      - Centroid positions don't change.

    Does this mean that the docs in a cluster are unchanged?

    Sec. 16.4

    Seed Choice

    - Results can vary based on random seed selection.
    - Some seeds can result in a poor convergence rate, or convergence to sub-optimal clusterings.
      - Select good seeds using a heuristic (e.g., doc least similar to any existing mean)
      - Try out multiple starting points
      - Initialize with the results of another method.

    Example showing sensitivity to seeds (points A to F in the slide's figure):
    - If you start with B and E as centroids, you converge to {A,B,C} and {D,E,F}.
    - If you start with D and F, you converge to {A,B,D,E} and {C,F}.

    Sec. 16.4
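
One possible reading of the "doc least similar to any existing mean" heuristic is a farthest-first pick: each new seed is the doc whose nearest already-chosen seed is farthest away. The sketch below uses squared Euclidean distance in place of similarity; the function name, the `first_index` parameter, and the sample points are assumptions for illustration.

```python
def squared_distance(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def farthest_first_seeds(docs, k, first_index=0):
    """Pick k seeds, each as far as possible from the seeds chosen so far."""
    seeds = [docs[first_index]]
    while len(seeds) < k:
        # Score each doc by the distance to its nearest existing seed,
        # then take the doc with the largest score.
        next_seed = max(
            docs,
            key=lambda d: min(squared_distance(d, s) for s in seeds),
        )
        seeds.append(next_seed)
    return seeds


# Hypothetical data: two well-separated groups of three points each.
docs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
        [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
chosen = farthest_first_seeds(docs, k=2)
print(chosen)  # [[0.0, 0.0], [10.0, 11.0]]
```

A heuristic like this tends to place seeds in different groups, but it is sensitive to outliers, which is one reason to also try multiple starting points.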

    How Many Clusters?

    - Partition n docs into a predetermined number of clusters
    - Finding the "right" number of clusters is part of the problem
      - Given docs, partition into an "appropriate" number of subsets.
      - E.g., for query results: the ideal value of K is not known up front, though the UI may impose limits.

    Hierarchical clustering

    Ch. 17

    Hierarchical Clustering

    - Build a tree-based hierarchical taxonomy (dendrogram) from a set of documents.
    - One approach: recursive application of a partitional clustering algorithm.

    animal
      vertebrate: fish, reptile, amphib., mammal
      invertebrate: worm, insect, crustacean

    Ch. 17

    Dendrogram: Hierarchical Clustering

    - Clustering obtained by cutting the dendrogram at a desired level: each connected component forms a cluster.

    Sec. 17.1

    Hierarchical Agglomerative Clustering (HAC)

    - Starts with each doc in a separate cluster
      - then repeatedly joins the closest pair of clusters, until there is only one cluster.
    - The history of merging forms a binary tree or hierarchy.

    Sec. 17.1
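
A naive sketch of the agglomerative procedure, assuming a precomputed symmetric similarity matrix. The function name `hac`, the `linkage` parameter (here defaulting to `max`, i.e. the most similar pair decides), the id scheme for merged clusters, and the sample matrix are all illustrative assumptions. It recomputes cluster similarities from scratch at every step, so it is meant for reading, not for large collections.

```python
def hac(sim, linkage=max):
    """Merge clusters bottom-up; return the merge history as (id_a, id_b, similarity)."""
    clusters = {i: [i] for i in range(len(sim))}  # each doc starts alone
    merges = []
    next_id = len(sim)  # ids for merged clusters continue after the doc ids
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        # Find the closest (most similar) pair of current clusters.
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                s = linkage(sim[x][y] for x in clusters[a] for y in clusters[b])
                if best is None or s > best[0]:
                    best = (s, a, b)
        s, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, s))
        next_id += 1
    return merges


# Hypothetical similarities for four docs: 0 and 1 are close, 2 and 3 are close.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
history = hac(sim)
print(history)  # [(0, 1, 0.9), (2, 3, 0.8), (4, 5, 0.3)]
```

The merge history is exactly the binary tree: cluster 4 is {0, 1}, cluster 5 is {2, 3}, and the last merge joins them.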

    Closest pair of clusters

    - Many variants to defining closest pair of clusters
      - Single-link: similarity of the most cosine-similar points
      - Complete-link: similarity of the "furthest" points, the least cosine-similar
      - Centroid: clusters whose centroids are the most cosine-similar
      - Average-link: average cosine between all pairs of elements

    Sec. 17.2

    Single Link Agglomerative Clustering

    - Use maximum similarity of pairs:

      \[ \mathrm{sim}(c_i, c_j) = \max_{x \in c_i,\, y \in c_j} \mathrm{sim}(x, y) \]

    - Can result in "straggly" (long and thin) clusters due to the chaining effect.
    - After merging c_i and c_j, the similarity of the resulting cluster to another cluster, c_k, is:

      \[ \mathrm{sim}((c_i \cup c_j), c_k) = \max(\mathrm{sim}(c_i, c_k), \mathrm{sim}(c_j, c_k)) \]

    Sec. 17.2

    Complete Link

    - Use minimum similarity of pairs:

      \[ \mathrm{sim}(c_i, c_j) = \min_{x \in c_i,\, y \in c_j} \mathrm{sim}(x, y) \]

    - Makes "tighter," spherical clusters that are typically preferable.
    - After merging c_i and c_j, the similarity of the resulting cluster to another cluster, c_k, is:

      \[ \mathrm{sim}((c_i \cup c_j), c_k) = \min(\mathrm{sim}(c_i, c_k), \mathrm{sim}(c_j, c_k)) \]

    Sec. 17.2
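
The merge rules for single link (max) and complete link (min) mean the similarity of a merged cluster to any other cluster can be derived from two already-known values, without revisiting individual docs. The sketch below compares the direct pairwise computation with the update rule on a made-up similarity matrix; the helper name and the data are assumptions.

```python
def pairwise_link(sim, cluster_a, cluster_b, reduce_fn):
    """Reduce (max or min) over all cross-cluster pair similarities."""
    return reduce_fn(sim[x][y] for x in cluster_a for y in cluster_b)


# Hypothetical similarities for four docs.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
c_i, c_j, c_k = [0], [1], [2, 3]

# Single link: max over pairs; update rule takes the max of the two old values.
single_direct = pairwise_link(sim, c_i + c_j, c_k, max)
single_update = max(pairwise_link(sim, c_i, c_k, max),
                    pairwise_link(sim, c_j, c_k, max))

# Complete link: min over pairs; update rule takes the min of the two old values.
complete_direct = pairwise_link(sim, c_i + c_j, c_k, min)
complete_update = min(pairwise_link(sim, c_i, c_k, min),
                      pairwise_link(sim, c_j, c_k, min))

print(single_direct, single_update)      # 0.3 0.3
print(complete_direct, complete_update)  # 0.1 0.1
```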

    Group Average

    - Similarity of two clusters = average similarity of all pairs within merged cluster.
    - Compromise between single and complete link.
    - Two options:
      - Averaged across all pairs in the merged cluster
      - Averaged over all pairs between the two original clusters

      \[ \mathrm{sim}(c_i, c_j) = \frac{1}{|c_i \cup c_j| \, (|c_i \cup c_j| - 1)} \sum_{x \in (c_i \cup c_j)} \; \sum_{y \in (c_i \cup c_j):\, y \neq x} \mathrm{sim}(x, y) \]

    Sec. 17.3
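
Both averaging options can be computed directly from a similarity matrix. In this sketch the function names and the sample matrix are illustrative assumptions; the first function follows the formula over ordered pairs in the merged cluster, the second averages only the cross-cluster pairs.

```python
def group_average_merged(sim, c_i, c_j):
    """Average similarity over all ordered pairs x != y in the merged cluster."""
    merged = c_i + c_j
    n = len(merged)
    total = sum(sim[x][y] for x in merged for y in merged if x != y)
    return total / (n * (n - 1))


def group_average_between(sim, c_i, c_j):
    """Average similarity over pairs with one doc from each original cluster."""
    total = sum(sim[x][y] for x in c_i for y in c_j)
    return total / (len(c_i) * len(c_j))


# Hypothetical similarities for four docs.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]

merged_avg = group_average_merged(sim, [0, 1], [2])    # (0.9+0.2+0.3) * 2 / 6
between_avg = group_average_between(sim, [0, 1], [2])  # (0.2+0.3) / 2
print(round(merged_avg, 4), round(between_avg, 4))
```

The first option also counts pairs inside each original cluster, so a tight cluster raises the score even when the cross-cluster similarity is low.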

    End for this Week

    What Is A Good Clustering?

    External criteria for clustering quality

    - Quality measured by its ability to discover some or all of the hidden patterns or latent classes in gold standard data
    - Assesses a clustering with respect to ground truth; requires labeled data
    - Assume documents with C gold standard classes, while our clustering algorithms produce K clusters, ω_1, ω_2, …, ω_K, with n_i members.

    Sec. 16.3

    External Evaluation of Cluster Quality

    - Simple measure: purity. Assign each cluster to its most frequent class, then take the fraction of docs that agree with their cluster's class:

      \[ \mathrm{purity}(\Omega, C) = \frac{1}{N} \sum_{k} \max_{j} |\omega_k \cap c_j| \]

    Sec. 16.3

    Purity example

    Cluster I: max(5, 1, 0) = 5 (red)
    Cluster II: max(1, 4, 1) = 4 (blue)
    Cluster III: max(2, 0, 3) = 3 (green)
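
Taking the per-cluster class counts from this example as given, purity can be computed as follows. The function name and the row/column layout (one row per cluster, one column per class) are assumptions for illustration.

```python
def purity(cluster_class_counts):
    """Purity = (sum of each cluster's majority-class count) / (total docs)."""
    total_docs = sum(sum(counts) for counts in cluster_class_counts)
    majority_sum = sum(max(counts) for counts in cluster_class_counts)
    return majority_sum / total_docs


# Rows: clusters I, II, III. Columns: counts per class, as in the example.
counts = [
    [5, 1, 0],
    [1, 4, 1],
    [2, 0, 3],
]
score = purity(counts)  # (5 + 4 + 3) / 17
print(round(score, 2))  # 0.71
```

Here N = 6 + 6 + 5 = 17 docs, so purity = 12/17, roughly 0.71.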