Transcript of PCA Tutor1

  • Slide 1/54

    Principal Component Analysis and Matrix Factorizations for Learning

    Chris Ding, Lawrence Berkeley National Laboratory

    Supported by Office of Science, U.S. Dept. of Energy

  • Slide 2/54

    Many unsupervised learning methods are closely related in a simple way.

    [Diagram: PCA, NMF, K-means clustering, Spectral Clustering, and Indicator Matrix Quadratic Clustering are linked, with extensions to semi-supervised classification, semi-supervised clustering, and outlier detection.]

  • Slide 3/54

    Part 1.A. Principal Component Analysis (PCA) and Singular Value Decomposition (SVD)

    Widely used in a large number of different fields

    Most widely known as PCA (multivariate statistics)

    SVD is the theoretical basis for PCA

  • Slide 4/54

    Brief history

    PCA: Draw a plane closest to the data points (Pearson, 1901); retain most variance (Hotelling, 1933)

    SVD: Low-rank approximation (Eckart-Young, 1936); practical application / efficient computation (Golub-Kahan, 1965)

    Many generalizations

  • Slide 5/54

    PCA and SVD

    Data: n points in p-dim: $X = (x_1, x_2, \ldots, x_n)$

    Covariance: $C = XX^T = \sum_{k=1}^{p} \lambda_k u_k u_k^T$

    Gram (kernel) matrix: $X^T X = \sum_{k=1}^{r} \lambda_k v_k v_k^T$

    Principal directions $u_k$ (principal axes, subspace)

    Principal components $v_k$ (projection onto the subspace)

    Underlying basis: SVD: $X = \sum_{k=1}^{p} \sigma_k u_k v_k^T = U \Sigma V^T$
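
    The decomposition above is easy to check numerically. Below is a minimal numpy sketch (an addition, not part of the slides; the sizes and random data are arbitrary) showing that the eigenvalues of C = XX^T are the squared singular values of X and that U^T X recovers the principal components.

    import numpy as np

    rng = np.random.default_rng(0)
    p, n = 5, 100
    X = rng.normal(size=(p, n))
    X -= X.mean(axis=1, keepdims=True)          # center the data

    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    C = X @ X.T                                  # covariance (up to a 1/n factor)
    lam, _ = np.linalg.eigh(C)

    print(np.allclose(np.sort(lam)[::-1], s**2))     # eigenvalues of C = squared singular values
    print(np.allclose(U.T @ X, np.diag(s) @ Vt))     # principal components: U^T X = Sigma V^T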

  • Slide 6/54

    Further Developments

    SVD/PCA

    Principal Curves; Independent Component Analysis

    Sparse SVD/PCA (many approaches)

    Mixture of Probabilistic PCA; generalization to the exponential family, max-margin

    Connection to K-means clustering

    Kernel (inner-product) → Kernel PCA

  • Slide 7/54

    Methods of PCA Utilization

    Principal components (uncorrelated random variables): $v_k = u_k(1) X_1 + \cdots + u_k(d) X_d$

    Dimension reduction: $X = (x_1, x_2, \ldots, x_n)$, $X = \sum_{k=1}^{p} \sigma_k u_k v_k^T = U \Sigma V^T$

    Projection to a low-dim subspace: $\tilde{X} = U^T X$, $U = (u_1, \ldots, u_k)$

    Sphering the data (transform data to N(0,1)): $\tilde{X} = C^{-1/2} X = U \Sigma^{-1} U^T X$
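
    A small numpy sketch of the two uses listed above, projection and sphering (my addition; the function name and shapes are illustrative, and X is assumed centered with one data point per column):

    import numpy as np

    def pca_project_and_sphere(X, k):
        """Project centered data X (p x n, one point per column) onto k principal directions."""
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        Uk, sk = U[:, :k], s[:k]
        X_proj = Uk.T @ X                          # dimension reduction: X~ = U^T X
        X_sphere = np.diag(1.0 / sk) @ Uk.T @ X    # whitened coordinates: C^{-1/2} X expressed in the principal basis
        return X_proj, X_sphere

    X = np.random.default_rng(1).normal(size=(10, 200))
    X -= X.mean(axis=1, keepdims=True)
    Xp, Xs = pca_project_and_sphere(X, 3)
    print(Xp.shape, np.allclose(Xs @ Xs.T, np.eye(3)))   # sphered data satisfies X~ X~^T = I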

  • Slide 8/54

    Applications of PCA/SVD

    Most popular in multivariate statistics

    Image processing, signal processing

    Physics: principal axes, diagonalization of 2nd-order tensors (mass)

    Climate: Empirical Orthogonal Functions (EOF)

    Kalman filter, reduced-order analysis: $s^{(t+1)} = A s^{(t)}$, $P^{(t+1)} = A P^{(t)} A^T$

  • Slide 9/54

    Applications of PCA/SVD

    PCA/SVD is as widely used as the Fast Fourier Transform

    Both are spectral expansions: FFT is used more for partial differential equations; PCA/SVD is used more for discrete (data) analysis

    PCA/SVD will surpass FFT as the computational sciences advance further

    PCA/SVD selects combinations of variables: dimension reduction

    An image has 10^4 pixels; its true dimension is ~20!

  • Slide 10/54

    PCA is a Matrix Factorization (spectral/eigen decomposition)

    Covariance: $C = XX^T = \sum_{k=1}^{p} \lambda_k u_k u_k^T = U \Lambda U^T$

    Kernel matrix: $X^T X = \sum_{k=1}^{r} \lambda_k v_k v_k^T = V \Lambda V^T$

    Underlying basis: SVD: $X = \sum_{k=1}^{p} \sigma_k u_k v_k^T = U \Sigma V^T$

    Principal directions: $U = (u_1, u_2, \ldots, u_k)$

    Principal components: $V = (v_1, v_2, \ldots, v_k)$

  • Slide 11/54

    From PCA to spectral clustering using generalized eigenvectors

    Consider the kernel matrix: $W_{ij} = \langle \phi(x_i), \phi(x_j) \rangle$

    In Kernel PCA we compute the eigenvectors: $W v = \lambda v$

    Generalized eigenvectors: $W q = \lambda D q$, where $D = \mathrm{diag}(d_1, \ldots, d_n)$, $d_i = \sum_j w_{ij}$

    This leads to Spectral Clustering!

  • Slide 12/54

    Scaled PCA ⇒ Spectral Clustering

    Re-scaling: $\tilde{W} = D^{-1/2} W D^{-1/2}$, $\tilde{w}_{ij} = w_{ij} / (d_i d_j)^{1/2}$

    PCA: $\tilde{W} = \sum_k \lambda_k v_k v_k^T$

    Scaled PCA: $W = D^{1/2} \tilde{W} D^{1/2} = D \sum_k \lambda_k q_k q_k^T D$

    $q_k = D^{-1/2} v_k$ is the scaled principal component
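
    To make the PCA-to-spectral-clustering link concrete, here is a short numpy sketch (added here, not from the tutorial) that scales a similarity matrix W, takes eigenvectors of W~, and maps them back to generalized eigenvectors q_k = D^{-1/2} v_k; the toy two-block W is an assumption for illustration.

    import numpy as np

    def scaled_pca_eigenvectors(W):
        """Return generalized eigenvectors q solving W q = lambda D q via the scaled matrix."""
        d = W.sum(axis=1)
        D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
        W_tilde = D_inv_sqrt @ W @ D_inv_sqrt        # W~ = D^{-1/2} W D^{-1/2}
        lam, V = np.linalg.eigh(W_tilde)             # W~ = sum_k lambda_k v_k v_k^T
        Q = D_inv_sqrt @ V                           # q_k = D^{-1/2} v_k
        return lam, Q

    # two noisy blocks: the second-largest eigenvector separates the clusters
    W = np.block([[np.full((5, 5), 1.0), np.full((5, 5), 0.1)],
                  [np.full((5, 5), 0.1), np.full((5, 5), 1.0)]])
    lam, Q = scaled_pca_eigenvectors(W)
    print(np.sign(Q[:, -2]))   # +/- sign pattern follows the two blocks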

  • Slide 13/54

    Scaled PCA on a Rectangular Matrix ⇒ Correspondence Analysis

    Re-scaling: $\tilde{P} = D_r^{-1/2} P D_c^{-1/2}$, $\tilde{p}_{ij} = p_{ij} / (p_{i.} p_{.j})^{1/2}$

    Apply SVD on $\tilde{P}$; subtract the trivial component:

    $P - r c^T / p_{..} = D_r \sum_k \lambda_k f_k g_k^T D_c$

    $r = (p_{1.}, \ldots, p_{n.})^T$, $c = (p_{.1}, \ldots, p_{.n})^T$

    $f_k = D_r^{-1/2} u_k$, $g_k = D_c^{-1/2} v_k$ are the scaled row and column principal components (standard coordinates in CA)

    (Zha et al., CIKM 2001; Ding et al., PKDD 2002)

  • Slide 14/54

    Nonnegative Matrix Factorization

    Data matrix: n points in p-dim: $X = (x_1, x_2, \ldots, x_n)$; $x_i$ is an image, document, webpage, etc.

    Decomposition (low-rank approximation): $X \approx F G^T$

    Nonnegative matrices: $X_{ij} \ge 0$, $F_{ij} \ge 0$, $G_{ij} \ge 0$

    $F = (f_1, f_2, \ldots, f_k)$, $G = (g_1, g_2, \ldots, g_k)$

  • Slide 15/54

    Solving NMF with multiplicative updating

    $J = \|X - F G^T\|^2$, $F \ge 0$, $G \ge 0$

    Fix F, solve for G; fix G, solve for F.

    Lee & Seung (2000) propose:

    $G_{jk} \leftarrow G_{jk} \frac{(X^T F)_{jk}}{(G F^T F)_{jk}}$, $F_{ik} \leftarrow F_{ik} \frac{(X G)_{ik}}{(F G^T G)_{ik}}$
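
    A compact numpy sketch of the Lee & Seung multiplicative updates written above (added for illustration; the rank, iteration count, and the small eps guard against division by zero are my choices):

    import numpy as np

    def nmf(X, k, n_iter=200, eps=1e-9):
        """Minimize ||X - F G^T||^2 with F, G >= 0 using multiplicative updates."""
        rng = np.random.default_rng(0)
        p, n = X.shape
        F = rng.random((p, k))
        G = rng.random((n, k))
        for _ in range(n_iter):
            G *= (X.T @ F) / (G @ (F.T @ F) + eps)   # G_jk <- G_jk (X^T F)_jk / (G F^T F)_jk
            F *= (X @ G) / (F @ (G.T @ G) + eps)     # F_ik <- F_ik (X G)_ik / (F G^T G)_ik
        return F, G

    X = np.abs(np.random.default_rng(1).normal(size=(30, 20)))
    F, G = nmf(X, k=5)
    print(np.linalg.norm(X - F @ G.T) / np.linalg.norm(X))   # relative reconstruction error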

  • Slide 16/54

    Matrix Factorization Summary

    Symmetric W (kernel matrix, graph) vs. rectangular X (contingency table, bipartite graph):

    PCA: $W = V \Lambda V^T$; $X = U \Sigma V^T$

    Scaled PCA: $W = D^{1/2} \tilde{W} D^{1/2} = D Q \Lambda Q^T D$; $X = D_r^{1/2} \tilde{X} D_c^{1/2} = D_r F \Lambda G^T D_c$

    NMF: $W \approx Q Q^T$; $X \approx F G^T$

  • Slide 17/54

    Indicator Matrix Quadratic Clustering

    Unsigned cluster indicator matrix $H = (h_1, \ldots, h_K)$

    Kernel K-means clustering: $\max_H \mathrm{Tr}(H^T W H)$, s.t. $H^T H = I$, $H \ge 0$

    K-means: $W = X^T X$; Kernel K-means: $W_{ij} = \langle \phi(x_i), \phi(x_j) \rangle$

    Spectral clustering (normalized cut)

  • Slide 18/54

    Indicator Matrix Quadratic Clustering: additional features

    Semi-supervised classification: $\max_H \mathrm{Tr}(H^T W H + C^T H)$

    Semi-supervised clustering: must-link (A) and cannot-link (B) constraints: $\max_H \mathrm{Tr}(H^T W H + H^T A H - H^T B H)$

    Outlier detection: $\max_H \mathrm{Tr}(H^T W H)$, allowing zero rows in H

    Nonnegative Lagrangian Relaxation: $H_{ik} \leftarrow H_{ik} \frac{(W H + C/2)_{ik}}{(H [H^T (W H + C/2)])_{ik}}$

  • Slide 19/54

    Tutorial Outline

    PCA

    Recent developments on PCA/SVD

    Equivalence to K-means clustering

    Scaled PCA

    Laplacian matrix; Spectral clustering

    Spectral ordering

    Nonnegative Matrix Factorization

    Equivalence to K-means clustering; holistic vs. parts-based

    Indicator Matrix Quadratic Clustering

    Uses Nonnegative Lagrangian Relaxation

    Includes K-means and Spectral Clustering

    Semi-supervised classification

    Semi-supervised clustering

    Outlier detection

  • Slide 20/54

    Part 1.B. Recent Developments on PCA and SVD

    Principal Curves; Independent Component Analysis

    Kernel PCA

    Mixture of PCA (probabilistic PCA)

    Sparse PCA/SVD: semi-discrete, truncation, L1 constraint, direct sparsification

    Column Partitioned Matrix Factorizations

    2D-PCA/SVD

    Equivalence to K-means clustering

  • Slide 21/54

    PCA and SVD

    Data matrix: $X = (x_1, x_2, \ldots, x_n)$

    Covariance: $C = XX^T = \sum_{k=1}^{p} \lambda_k u_k u_k^T$

    Gram (kernel) matrix: $X^T X = \sum_{k=1}^{r} \lambda_k v_k v_k^T$

    Principal directions $u_k$ (principal axes, subspace)

    Principal components $v_k$ (projection onto the subspace)

    Underlying basis: SVD: $X = \sum_{k=1}^{p} \sigma_k u_k v_k^T$

  • Slide 22/54

    Kernel PCA (Scholkopf, Smola, Muller, 1996)

    Kernel: $K_{ij} = \langle \phi(x_i), \phi(x_j) \rangle$, with the feature map $x_i \to \phi(x_i)$

    Feature extraction: the PCA component $v = \sum_i \alpha_i \phi(x_i)$, so $\langle v, \phi(x) \rangle = \sum_i \alpha_i \langle \phi(x_i), \phi(x) \rangle$

    Indefinite kernels

    Generalization to graphs with nonnegative weights
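
    A brief sketch (not from the tutorial) of kernel PCA feature extraction with an RBF kernel; the gamma value and the centering-in-feature-space step are standard assumptions, not something stated on the slide.

    import numpy as np

    def kernel_pca(X, k, gamma=1.0):
        """Return the top-k kernel principal component scores for the columns of X."""
        n = X.shape[1]
        sq = np.sum(X**2, axis=0)
        K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X.T @ X))  # RBF kernel K_ij
        J = np.eye(n) - np.ones((n, n)) / n
        Kc = J @ K @ J                               # center phi(x_i) in feature space
        lam, A = np.linalg.eigh(Kc)
        lam, A = lam[::-1][:k], A[:, ::-1][:, :k]    # top-k eigenpairs
        return A * np.sqrt(np.maximum(lam, 0))       # scores <v_k, phi(x_i)> for each point

    X = np.random.default_rng(2).normal(size=(2, 50))
    print(kernel_pca(X, k=2).shape)   # (50, 2)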

  • Slide 23/54

    Mixture of PCA

    Data has local structure; global PCA on all the data is not useful

    Clustering PCA (Hinton et al.): cluster the data, then perform PCA within each cluster; no explicit generative model

    Probabilistic PCA (Tipping & Bishop): latent variables, generative model (Gaussian)

    Mixture of Gaussians ⇒ mixture of PCA

    Adding Markov dynamics for the latent variables (Linear Gaussian Models)

  • Slide 24/54

    Probabilistic PCA

    Linear Gaussian Model: $x_i = W s_i + \mu + \epsilon$, $\epsilon \sim N(0, \sigma^2 I)$

    Latent variables: $S = (s_1, \ldots, s_n)$

    Gaussian prior: $P(s) \sim N(s_0, \sigma_s^2 I)$

    Marginal: $x \sim N(W s_0, \sigma_s^2 W W^T + \sigma^2 I)$

    (Tipping & Bishop, 1995; Roweis & Ghahramani, 1999)

    Linear Gaussian Model (with dynamics): $s_{i+1} = A s_i + \eta$, $x_i = W s_i + \epsilon$
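
    As a sanity check on the generative view (my addition; W, sigma, and the standard-normal prior s ~ N(0, I) are arbitrary choices), sampling x = W s + mu + eps reproduces the stated marginal covariance W W^T + sigma^2 I:

    import numpy as np

    rng = np.random.default_rng(3)
    p, k, n = 6, 2, 200_000
    W = rng.normal(size=(p, k))
    mu = rng.normal(size=p)
    sigma = 0.3

    S = rng.normal(size=(k, n))                            # latent s_i ~ N(0, I)
    X = W @ S + mu[:, None] + sigma * rng.normal(size=(p, n))

    emp_cov = np.cov(X)                                    # empirical covariance of x
    print(np.allclose(emp_cov, W @ W.T + sigma**2 * np.eye(p), atol=0.05))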

  • Slide 25/54

    Sparse PCA

    Compute a factorization $X \approx U V^T$ where U or V is sparse, or both are sparse

    Why sparse? Variable selection (sparse U); when n >> d; storage savings; other new reasons?

    L1 and L2 constraints

  • Slide 26/54

    Sparse PCA: Truncation and Discretization

    Sparsified SVD: $X \approx U \Sigma V^T$, $U = (u_1, \ldots, u_k)$, $V = (v_1, \ldots, v_k)$

    Compute $\{u_k, v_k\}$ one at a time; truncate entries below a threshold

    Recursively compute all pairs using deflation: $X \leftarrow X - \sigma u v^T$ (Zhang, Zha, Simon, 2002)

    Semi-discrete decomposition: U, V only contain entries in {-1, 0, 1}

    Iterative algorithm to compute U, V using deflation (Kolda & O'Leary, 1999)
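
    A toy numpy sketch of the truncate-and-deflate idea described above (an illustration only; the threshold and the use of a full SVD per step are my simplifications, not the cited algorithms):

    import numpy as np

    def sparsified_svd(X, k, thresh=0.1):
        """Compute k sparse (u, v) pairs by truncating small entries and deflating X."""
        X = X.copy()
        us, vs = [], []
        for _ in range(k):
            U, s, Vt = np.linalg.svd(X, full_matrices=False)
            u, v, sigma = U[:, 0], Vt[0], s[0]
            u = np.where(np.abs(u) < thresh, 0.0, u)   # truncate small entries
            v = np.where(np.abs(v) < thresh, 0.0, v)
            us.append(u); vs.append(v)
            X = X - sigma * np.outer(u, v)             # deflation: X <- X - sigma u v^T
        return np.column_stack(us), np.column_stack(vs)

    X = np.random.default_rng(4).normal(size=(8, 12))
    U, V = sparsified_svd(X, k=3)
    print(U.shape, V.shape, (U == 0).mean())           # fraction of zeroed entries in U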

  • Slide 27/54

    Sparse PCA: L1 constraint

    LASSO (Tibshirani, 1996): $\min \|y - X^T \beta\|^2$, $\|\beta\|_1 \le t$

    SCoTLASS (Joliffe & Uddin, 2003): $\max u^T (X X^T) u$, s.t. $\|u\|_1 \le t$, $u^T u = 1$

    Least Angle Regression (Efron et al., 2004)

    Sparse PCA (Zou, Hastie, Tibshirani, 2004):

    $\min \sum_{i=1}^{n} \|x_i - \alpha \beta^T x_i\|^2 + \lambda \sum_{j=1}^{k} \|\beta_j\|^2 + \sum_{j=1}^{k} \lambda_{1,j} \|\beta_j\|_1$, s.t. $\alpha^T \alpha = I$; $v_j = \beta_j / \|\beta_j\|$

  • Slide 28/54

    Sparse PCA: Direct Sparsification

    Sparse SVD with explicit sparsification (rank-one approximation; minimize a bound; deflation):

    $\min_{u,v} \|X - d\,u v^T\|_F + \mathrm{nnz}(u) + \mathrm{nnz}(v)$  (Zhang, Zha, Simon, 2003)

    Direct sparse PCA, on the covariance matrix S:

    $\max u^T S u = \max \mathrm{Tr}(S u u^T) = \max \mathrm{Tr}(S U)$

    s.t. $\mathrm{Tr}(U) = 1$, $\mathrm{nnz}(U) \le k^2$, $U \succeq 0$, $\mathrm{rank}(U) = 1$  (d'Aspremont, El Ghaoui, Jordan, Lanckriet, 2004)

  • Slide 29/54

    Sparse PCA Summary

    Many different approaches: truncation, discretization; L1 constraint; direct sparsification; other approaches

    Sparse matrix factorization in general; L1 constraint

    Many questions: orthogonality; uniqueness of the solution, global solution

  • Slide 30/54

    PCA: Further Generalizations

    Generalization to the Exponential Family (Collins, Dasgupta, Schapire, 2001)

    Maximum Margin Factorization (Srebro, Rennie, Jaakkola, 2004): collaborative filtering; input Y is binary

    $X = U V^T$, $\|X\|_\Sigma = \min_{X = U V^T} \tfrac{1}{2} (\|U\|_{Fro}^2 + \|V\|_{Fro}^2)$

    Hard margin: $Y_{ia} X_{ia} \ge 1$, $\forall ia \in S$

    Soft margin: $\min \|X\|_\Sigma + c \sum_{ia \in S} \max(0, 1 - Y_{ia} X_{ia})$

  • Slide 31/54

    Column Partitioned Matrix Factorizations

    Column-partitioned data matrix: $X = (x_1, \ldots, x_{n_1} \,|\, x_{n_1+1}, \ldots, x_{n_1+n_2} \,|\, \ldots \,|\, \ldots, x_n) = (X_1, X_2, \ldots, X_k)$, with $n_1 + \cdots + n_k = n$; the partitions are generated by clustering

    Centroid matrix: $U = (u_1, \ldots, u_k)$, where $u_k$ is a centroid. Fix U, compute V: $\min \|X - U V^T\|_F^2 \Rightarrow V = X^T U (U^T U)^{-1}$

    Alternatively, represent each partition by an SVD and pick the leading u's to form $U = (u_1^{(1)}, \ldots, u_{l_1}^{(1)}, \ldots, u_1^{(k)}, \ldots, u_{l_k}^{(k)})$; then fix U and compute V

    Several other variations

    (Zhang & Zha, 2001; Castelli, Thomasian & Li, 2003; Park, Jeon & Rosen, 2003; Dhillon & Modha, 2001; Zeimpekis & Gallopoulos, 2004)
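
    A minimal sketch (mine, not from the cited papers) of the centroid-matrix variant: cluster the columns, stack the centroids as U, then solve the least-squares step V = X^T U (U^T U)^{-1}. The random labels stand in for an actual clustering.

    import numpy as np

    def centroid_factorization(X, labels, k):
        """X ~ U V^T with U holding the cluster centroids of the columns of X."""
        U = np.column_stack([X[:, labels == j].mean(axis=1) for j in range(k)])
        V = X.T @ U @ np.linalg.inv(U.T @ U)     # fix U, solve min ||X - U V^T||_F
        return U, V

    rng = np.random.default_rng(5)
    X = rng.normal(size=(10, 60))
    labels = rng.integers(0, 3, size=60)         # stand-in for a clustering of the columns
    U, V = centroid_factorization(X, labels, k=3)
    print(np.linalg.norm(X - U @ V.T) / np.linalg.norm(X))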

  • Slide 32/54

    Two-dimensional SVD

    A large number of data objects are 2-D: images, maps

    Standard method: convert (re-order) each image into a 1D vector, collect all 1D vectors into a single (big) matrix, apply SVD to the big matrix

    2D-SVD is developed for 2D objects: an extension of the standard SVD that keeps the 2D characteristics

    Improves the quality of the low-dimensional approximation; reduces computation and storage

  • Slide 33/54

    [Figure: linearizing a 2D object (image) into a 1D pixel vector]

  • Slide 34/54

    SVD and 2D-SVD

    SVD: $X = (x_1, x_2, \ldots, x_n)$; $X = U \Sigma V^T$, $\Sigma = U^T X V$; U, V are eigenvectors of $X X^T$ and $X^T X$

    2D-SVD: $\{A\} = \{A_1, A_2, \ldots, A_n\}$

    row-row covariance: $F = \sum_i (A_i - \bar{A})(A_i - \bar{A})^T$

    column-column covariance: $G = \sum_i (A_i - \bar{A})^T (A_i - \bar{A})$

    U, V are eigenvectors of F and G; $A_i = U M_i V^T$, $M_i = U^T A_i V$

  • Slide 35/54

    2D-SVD

    $\{A\} = \{A_1, A_2, \ldots, A_n\}$, assume $\bar{A} = 0$

    row-row cov: $F = \sum_i A_i A_i^T = \sum_k \lambda_k u_k u_k^T$

    col-col cov: $G = \sum_i A_i^T A_i = \sum_k \zeta_k v_k v_k^T$

    $U = (u_1, u_2, \ldots, u_k)$, $V = (v_1, v_2, \ldots, v_k)$

    $M_i = U^T A_i V$, $A_i \approx U M_i V^T$, $i = 1, \ldots, n$

    Bilinear subspace: $A_i \approx U_{r \times k} M^i_{k \times k} V_{c \times k}^T$
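
    A short numpy sketch of the 2D-SVD construction above (added; the image sizes and k are arbitrary): form the row-row and column-column covariances, take their top-k eigenvectors as U and V, and encode each image as M_i = U^T A_i V.

    import numpy as np

    def two_d_svd(A_list, k):
        """2D-SVD: A_i ~ U M_i V^T with U, V from the row-row / col-col covariances."""
        A = np.stack(A_list) - np.mean(A_list, axis=0)      # center so that A_bar = 0
        F = sum(Ai @ Ai.T for Ai in A)                       # row-row covariance
        G = sum(Ai.T @ Ai for Ai in A)                       # column-column covariance
        U = np.linalg.eigh(F)[1][:, ::-1][:, :k]             # top-k eigenvectors of F
        V = np.linalg.eigh(G)[1][:, ::-1][:, :k]             # top-k eigenvectors of G
        M = [U.T @ Ai @ V for Ai in A]                       # k x k core matrices
        return U, V, M

    imgs = [np.random.default_rng(i).normal(size=(20, 30)) for i in range(40)]
    U, V, M = two_d_svd(imgs, k=5)
    recon = U @ M[0] @ V.T
    print(recon.shape, np.linalg.norm(imgs[0] - np.mean(imgs, axis=0) - recon))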

  • Slide 36/54

    2D-SVD Error Analysis

    $J_1 = \min \sum_{i=1}^{n} \|A_i - L M_i\|^2$

    $J_2 = \min \sum_{i=1}^{n} \|A_i - M_i R^T\|^2$

    $J_3 = \min \sum_{i=1}^{n} \|A_i - L M_i R^T\|^2 \approx \sum_{j=k+1}^{r} \lambda_j + \sum_{j=k+1}^{c} \zeta_j$

    $J_4 = \min \sum_{i=1}^{n} \|A_i - L M_i L^T\|^2 \approx 2 \sum_{j=k+1}^{r} \lambda_j$

    with $L_{r \times k}$, $R_{c \times k}$, $M_{i, k \times k}$; the optimal residuals are sums of the trailing eigenvalues of F and G

    SVD: $\min \|X - U \Sigma V^T\|^2 = \sum_{i=k+1}^{p} \sigma_i^2$

  • Slide 37/54

    Temperature maps (January, over 100 years)

    Reconstruction errors: SVD/2D-SVD = 1.1

    Storage: SVD/2D-SVD = 8

  • Slide 38/54

    Reconstructed image

    SVD (K=15), storage 160560

    2D-SVD (K=15), storage 93060

  • Slide 39/54

    2D-SVD Summary

    2D-SVD is an extension of the standard SVD

    Provides optimal solutions for 4 representations of 2D images/maps

    Substantial improvements in storage, computation, and quality of reconstruction

    Captures the 2D characteristics

  • Slide 40/54

    Part 1.C. K-means Clustering ⇔ Principal Component Analysis

    (Equivalence between PCA and K-means)

  • Slide 41/54

    K-means clustering

    Also called isodata, vector quantization

    Developed in the 1960s (Lloyd, MacQueen, Hartigan, etc.)

    Computationally efficient (order mN)

    Widely used in practice; a benchmark to evaluate other algorithms

    Given n points in m-dim: $X = (x_1, x_2, \ldots, x_n)^T$

    K-means objective: $\min J_K = \sum_{k=1}^{K} \sum_{i \in C_k} \|x_i - c_k\|^2$
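
    For reference, a bare-bones Lloyd iteration for this objective (my sketch; the initialization and fixed iteration count are simplistic):

    import numpy as np

    def kmeans(X, K, n_iter=50, seed=0):
        """Minimize sum_k sum_{i in C_k} ||x_i - c_k||^2 for the rows x_i of X."""
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), K, replace=False)]
        for _ in range(n_iter):
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            labels = d.argmin(axis=1)                        # assign each point to its nearest centroid
            centers = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        J = ((X - centers[labels]) ** 2).sum()               # K-means objective J_K
        return labels, centers, J

    rng = np.random.default_rng(6)
    X = np.vstack([rng.normal(m, 0.3, size=(50, 2)) for m in (0, 3, 6)])
    labels, centers, J = kmeans(X, K=3)
    print(J)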

  • Slide 42/54

    PCA is equivalent to K-means

    The continuous optimal solution for the cluster indicators in K-means clustering is given by the principal components.

    The subspace spanned by the K cluster centroids is given by the PCA subspace.

  • Slide 43/54

  • Slide 44/54

    A simple illustration

  • Slide 45/54

    DNA Gene Expression File for Leukemia

    Using v1, the tissue samples separate into 2 clusters, with 3 errors.

    Doing one more K-means step reduces this to 1 error.

  • Slide 46/54

    Multi-way K-means Clustering

    Unsigned cluster membership indicators $h_1, \ldots, h_K$, one column per cluster C1, C2, C3:

    $(h_1, h_2, h_3) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix}$

  • Slide 47/54

    Multi-way K-means Clustering

    (Unsigned) cluster indicators $H = (h_1, \ldots, h_K)$, $h_k = (0, \ldots, 0, 1, \ldots, 1, 0, \ldots, 0)^T / n_k^{1/2}$

    $J_K = \sum_i x_i^2 - \sum_{k=1}^{K} \frac{1}{n_k} \sum_{i,j \in C_k} x_i^T x_j$

    $J_K = \sum_i x_i^2 - \sum_{k=1}^{K} h_k^T X^T X h_k = \mathrm{Tr}(X^T X) - \mathrm{Tr}(H^T X^T X H)$

    Redundancy: $\sum_{k=1}^{K} n_k^{1/2} h_k = e$

    Regularized relaxation: transform $h_1, \ldots, h_K$ to $q_1, \ldots, q_K$ via an orthogonal matrix T:

    $(q_1, \ldots, q_K) = (h_1, \ldots, h_K) T$, $q_1 = e / n^{1/2}$
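
    The trace identity above is easy to verify numerically; a small added sketch with normalized indicators h_k = (0, ..., 1, ..., 0)^T / n_k^{1/2}:

    import numpy as np

    rng = np.random.default_rng(7)
    X = rng.normal(size=(4, 30))                 # columns are data points
    labels = rng.integers(0, 3, size=30)

    H = np.zeros((30, 3))                        # normalized unsigned indicator matrix (n x K)
    for k in range(3):
        idx = labels == k
        H[idx, k] = 1.0 / np.sqrt(idx.sum())

    centroids = np.column_stack([X[:, labels == k].mean(axis=1) for k in range(3)])
    J_direct = sum(((X[:, labels == k] - centroids[:, [k]]) ** 2).sum() for k in range(3))
    J_trace = np.trace(X.T @ X) - np.trace(H.T @ X.T @ X @ H)
    print(np.allclose(J_direct, J_trace))        # J_K = Tr(X^T X) - Tr(H^T X^T X H)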

  • Slide 48/54

  • Slide 49/54

    Consistency: 2-way and K-way approaches

    The orthogonal transform T maps $(h_1, h_2)$ to $(q_1, q_2)$ and recovers the original 2-way cluster indicators:

    $h_1 = (1, \ldots, 1, 0, \ldots, 0)^T / \sqrt{n_1}$, $h_2 = (0, \ldots, 0, 1, \ldots, 1)^T / \sqrt{n_2}$

    $q_1 = (1, \ldots, 1)^T / \sqrt{n}$, $q_2 = (a, \ldots, a, -b, \ldots, -b)^T$, with $a = \sqrt{n_2 / n n_1}$, $b = \sqrt{n_1 / n n_2}$

    $T = \begin{pmatrix} \sqrt{n_1/n} & \sqrt{n_2/n} \\ \sqrt{n_2/n} & -\sqrt{n_1/n} \end{pmatrix}$

  • Slide 50/54

    Test of lower bounds of K-means clustering

    The relative gap $|J_{opt} - J_{LB}| / J_{opt}$: the lower bound is within 0.6-1.5% of the optimal value

  • Slide 51/54

    Cluster Subspace (spanned by the K centroids) = PCA Subspace

    Given a data point x, $P = \sum_k c_k c_k^T$ projects x into the cluster subspace

    The centroid is given by $c_k = \sum_i h_k(i) x_i = X h_k$

    $P = \sum_k c_k c_k^T = \sum_k X h_k h_k^T X^T \;\Rightarrow\; \sum_k X v_k v_k^T X^T = \sum_k \lambda_k u_k u_k^T$

    $P_{K\text{-means}} \approx \sum_k u_k u_k^T = P_{PCA}$

    PCA automatically projects into the cluster subspace

    PCA is an unsupervised version of LDA

  • Slide 52/54

    Effectiveness of PCA Dimension Reduction


  • Slide 53/54

    Kernel K-means Clustering

    Kernel K-means objective (with the feature map $x_i \to \phi(x_i)$):

    $\min J_K = \sum_{k=1}^{K} \sum_{i \in C_k} \|\phi(x_i) - \bar{\phi}_k\|^2$, where $\bar{\phi}_k$ is the centroid of cluster $C_k$ in feature space

    $J_K = \sum_i \|\phi(x_i)\|^2 - \sum_{k=1}^{K} \frac{1}{n_k} \sum_{i,j \in C_k} \phi(x_i)^T \phi(x_j)$

    $\min J_K \;\Leftrightarrow\; \max J_W = \sum_{k=1}^{K} \frac{1}{n_k} \sum_{i,j \in C_k} \langle \phi(x_i), \phi(x_j) \rangle$
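
    A short numpy sketch (my addition) that evaluates the within-cluster kernel objective J_W from a precomputed kernel and a labeling; maximizing J_W over labelings is the kernel K-means problem. The RBF kernel and the two-blob data are assumptions for illustration.

    import numpy as np

    def within_cluster_kernel_score(K, labels):
        """J_W = sum_k (1/n_k) sum_{i,j in C_k} K_ij, the quantity kernel K-means maximizes."""
        J = 0.0
        for k in np.unique(labels):
            idx = np.flatnonzero(labels == k)
            J += K[np.ix_(idx, idx)].sum() / len(idx)
        return J

    rng = np.random.default_rng(8)
    X = np.hstack([rng.normal(0, 1, (2, 20)), rng.normal(5, 1, (2, 20))])  # two blobs, points as columns
    sq = (X ** 2).sum(axis=0)
    K = np.exp(-0.5 * (sq[:, None] + sq[None, :] - 2 * X.T @ X))           # RBF kernel matrix

    good = np.repeat([0, 1], 20)                 # grouping that matches the blobs
    bad = np.tile([0, 1], 20)                    # interleaved grouping
    print(within_cluster_kernel_score(K, good) > within_cluster_kernel_score(K, bad))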

  • Slide 54/54

    Kernel K-means clustering is equivalent to Kernel PCA

    The continuous optimal solutions for the cluster indicators are given by the Kernel PCA components.

    The subspace spanned by the K cluster centroids is given by the Kernel PCA principal subspace.