
Source: biocluster.ucr.edu/.../Rclustering/clustering.pdf

Clustering and Data Mining in R: Introduction

Thomas Girke

December 7, 2012

Clustering and Data Mining in R Slide 1/40

Outline

Introduction

Data Preprocessing: Data Transformations, Distance Methods, Cluster Linkage

Hierarchical Clustering: Approaches, Tree Cutting

Non-Hierarchical Clustering: K-Means, Principal Component Analysis, Multidimensional Scaling, Biclustering

Clustering with R and Bioconductor

Introduction

What is Clustering?

Clustering is the classification of data objects into similarity groups (clusters) according to a defined distance measure.

It is used in many fields, such as machine learning, data mining, pattern recognition, image analysis, genomics, systems biology, etc.


Why Clustering and Data Mining in R?

Efficient data structures and functions for clustering

Reproducible and programmable

Comprehensive set of clustering and machine learning libraries

Integration with many other data analysis tools

Useful Links

Cluster Task Views Link

Machine Learning Task Views Link

UCR Manual Link

Data Preprocessing

Data Transformations (the choice depends on the data set!)

Center & standardize

1 Center: subtract from each vector its mean

2 Standardize: divide by the standard deviation

⇒ Mean = 0 and STDEV = 1

Center & scale with the scale() function

1 Center: subtract from each vector its mean

2 Scale: divide the centered vector by its root mean square (rms), computed from the centered values x_i:

$$x_{\mathrm{rms}} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} x_i^2}$$

For a centered vector this rms equals the standard deviation, so again:

⇒ Mean = 0 and STDEV = 1

Log transformation

Rank transformation: replace measured values by ranks

No transformation
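A minimal base-R sketch of these transformations on a small random matrix (all object names here are illustrative): it checks that centering and dividing by the standard deviation by hand gives the same result as `scale()`, that the rms of a centered column equals its standard deviation, and applies a column-wise rank transformation. Note that `scale()` works column-wise.

```r
# Small random example matrix: 5 observations (rows) x 4 variables (columns)
set.seed(1)
x <- matrix(rnorm(20, mean = 5, sd = 2), nrow = 5, ncol = 4)

# Center & standardize by hand, column-wise: (x - mean) / sd
manual <- sweep(x, 2, colMeans(x))                # subtract column means
manual <- sweep(manual, 2, apply(x, 2, sd), "/")  # divide by column SDs

# scale(): centers each column, then divides by the rms of the centered column
scaled <- scale(x)

# rms as defined above, using the n - 1 denominator
rms <- function(v) sqrt(sum(v^2) / (length(v) - 1))
centered <- sweep(x, 2, colMeans(x))
rms_vals <- apply(centered, 2, rms)  # equals apply(x, 2, sd) for centered data

# Rank transformation: replace each column's values by their ranks
ranked <- apply(x, 2, rank)

round(colMeans(scaled), 10)  # approximately 0 for every column
apply(scaled, 2, sd)         # 1 for every column
```

To transform rows instead, as in the R examples of this tutorial, transpose before and after scaling: `t(scale(t(x)))`.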


Distance Methods (a list of the most common ones)

Euclidean distance for two profiles X and Y:

$$d(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

Disadvantages: not scale invariant; not suited for detecting negative correlations

Maximum, Manhattan, Canberra, binary, Minkowski, ...

Correlation-based distance: 1 − r

Pearson correlation coefficient (PCC):

$$r = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{\left(n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right)\left(n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right)}}$$

Disadvantage: outlier sensitive

Spearman correlation coefficient (SCC)

Same calculation as PCC but with ranked values!
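The following sketch writes out the Euclidean and Pearson formulas above in base R and compares them with `dist()` and `cor()`; the vectors `X` and `Y` are random illustration data. It also shows that Spearman's coefficient is Pearson's coefficient computed on ranks, and that rescaling one profile changes the Euclidean distance but not the correlation.

```r
# Two example profiles X and Y (Y is partly correlated with X)
set.seed(2)
X <- rnorm(8)
Y <- 0.5 * X + rnorm(8)
n <- length(X)

# Euclidean distance, written out from the formula
d_euc <- sqrt(sum((X - Y)^2))

# Pearson correlation coefficient from the computational formula
r_pcc <- (n * sum(X * Y) - sum(X) * sum(Y)) /
  sqrt((n * sum(X^2) - sum(X)^2) * (n * sum(Y^2) - sum(Y)^2))

# Spearman correlation: the same calculation on ranked values
r_scc <- cor(rank(X), rank(Y))

# Correlation-based distance 1 - r (ranges from 0 to 2)
d_cor <- 1 - r_pcc

# Scale sensitivity: multiplying Y by 10 changes the Euclidean distance, not r
d_euc_10 <- sqrt(sum((X - 10 * Y)^2))
r_pcc_10 <- cor(X, 10 * Y)

# A few of the other dist() methods listed above
m2 <- rbind(X, Y)
d_man  <- dist(m2, method = "manhattan")
d_mink <- dist(m2, method = "minkowski", p = 3)

c(euclid = d_euc, euclid_10 = d_euc_10, r = r_pcc, r_10 = r_pcc_10)
```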


Cluster Linkage

Single Linkage

Complete Linkage

Average Linkage
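The three linkage rules differ only in how the distance between two clusters is derived from the member-to-member distances: single linkage uses the closest pair, complete linkage the farthest pair, and average linkage the mean over all pairs. The sketch below (random points; `linkage()` is a hypothetical helper, not a library function) checks each rule against the height of the final merge in `hclust()`, which joins the two clusters returned by `cutree(k = 2)`.

```r
set.seed(3)
pts <- matrix(rnorm(20), ncol = 2)  # 10 random points in 2D
D <- as.matrix(dist(pts))

# Linkage distance between two clusters given as index vectors g1 and g2
linkage <- function(D, g1, g2, method) {
  between <- D[g1, g2, drop = FALSE]  # all member-to-member distances
  switch(method,
         single   = min(between),     # closest pair of members
         complete = max(between),     # farthest pair of members
         average  = mean(between))    # mean over all member pairs
}

# The last merge of the tree joins the two clusters from cutree(k = 2),
# so its height should equal the linkage distance between them
check <- sapply(c("single", "complete", "average"), function(method) {
  hc  <- hclust(dist(pts), method = method)
  grp <- cutree(hc, k = 2)
  c(manual = linkage(D, which(grp == 1), which(grp == 2), method),
    hclust = hc$height[length(hc$height)])
})
check
```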

Hierarchical Clustering

Hierarchical Clustering Steps

1 Identify clusters (items) with closest distance

2 Join them into a new cluster

3 Compute the distances between the new cluster and the remaining clusters (items)

4 Return to step 1
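The four steps can be sketched as a naive loop. This sketch assumes complete linkage and recomputes cluster distances directly from the item distance matrix instead of updating them, so it is slow and only meant to illustrate the procedure; `naive_agglomerative()` is a hypothetical helper. For complete linkage, its merge heights should match those of `hclust()`.

```r
naive_agglomerative <- function(D) {
  D <- as.matrix(D)
  clusters <- as.list(seq_len(nrow(D)))  # every item starts as its own cluster
  heights <- numeric(0)
  while (length(clusters) > 1) {
    k <- length(clusters)
    best <- list(dist = Inf, a = NA, b = NA)
    # Step 1: find the two clusters with the smallest distance
    for (a in 1:(k - 1)) {
      for (b in (a + 1):k) {
        # Step 3 (done lazily): complete linkage = largest member-to-member distance
        dab <- max(D[clusters[[a]], clusters[[b]]])
        if (dab < best$dist) best <- list(dist = dab, a = a, b = b)
      }
    }
    # Step 2: join them into a new cluster (b > a, so removing b keeps index a valid)
    clusters[[best$a]] <- c(clusters[[best$a]], clusters[[best$b]])
    clusters[[best$b]] <- NULL
    heights <- c(heights, best$dist)
    # Step 4: return to step 1 until a single cluster remains
  }
  heights
}

set.seed(4)
m4 <- matrix(rnorm(30), nrow = 10)
h_naive  <- naive_agglomerative(dist(m4))
h_hclust <- hclust(dist(m4), method = "complete")$height
all.equal(sort(h_naive), sort(h_hclust))  # expected: TRUE
```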


Hierarchical Clustering Agglomerative Approach

Figure: panels (a) to (c) show the stepwise agglomerative joining of items g1 to g5, with merge distances 0.1, 0.4, 0.5 and 0.6.


Hierarchical Clustering Approaches

1 Agglomerative approach (bottom-up)

hclust() (stats package) and agnes() (cluster package)

2 Divisive approach (top-down)

diana()
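A short sketch of both approaches on random data, assuming the cluster package (which provides `agnes()` and `diana()`) is installed. With complete linkage, `agnes()` and `hclust()` should produce the same set of merge heights; `diana()` builds the tree top-down instead, and its result can be converted with `as.hclust()` for plotting or cutting.

```r
library(cluster)  # agnes() and diana(); hclust() comes with base R (stats)

set.seed(5)
m5 <- matrix(rnorm(40), nrow = 10,
             dimnames = list(paste0("g", 1:10), paste0("t", 1:4)))
d5 <- dist(m5)

hc5 <- hclust(d5, method = "complete")  # agglomerative (bottom-up)
ag5 <- agnes(d5, method = "complete")   # agglomerative (bottom-up)
dv5 <- diana(d5)                        # divisive (top-down)

# diana (and agnes) results convert to hclust objects, e.g. for cutree()
dv5_hc <- as.hclust(dv5)
cutree(dv5_hc, k = 3)

# Agglomerative coefficient (agnes) and divisive coefficient (diana)
c(ac = ag5$ac, dc = dv5$dc)
```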


Tree Cutting to Obtain Discrete Clusters

1 Node height in tree

2 Number of clusters

3 Search tree nodes by distance cutoff
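The first two strategies map directly onto the `h` and `k` arguments of base R's `cutree()`; the sketch below uses random data. The third strategy (searching tree nodes by a distance cutoff) needs custom code and is not shown here.

```r
set.seed(6)
m6 <- matrix(rnorm(50), nrow = 10,
             dimnames = list(paste0("g", 1:10), paste0("t", 1:5)))
hc6 <- hclust(dist(m6), method = "complete")

# 1) Cut at a node height (here: half the height of the tallest node)
h_cut <- max(hc6$height) / 2
by_height <- cutree(hc6, h = h_cut)

# 2) Ask for a fixed number of clusters
by_k <- cutree(hc6, k = 3)

# For a monotone tree (e.g. complete linkage), the number of clusters at
# height h is 1 + the number of merges that happen above h
1 + sum(hc6$height > h_cut) == length(unique(by_height))
```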


Non-Hierarchical Clustering

Selected Examples


K-Means Clustering

1 Choose the number of clusters k

2 Randomly assign items to the k clusters

3 Calculate the new centroid of each of the k clusters

4 Calculate the distance of all items to the k centroids

5 Assign each item to its closest centroid

6 Repeat steps 3 to 5 until the cluster assignments are stable
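The six steps can be sketched as follows. `simple_kmeans()` is a hypothetical helper for illustration only: it initializes by random assignment (step 2) and does not handle clusters that become empty, which can happen for k > 2. In practice use `stats::kmeans()`, whose default Hartigan-Wong algorithm and random-row initialization differ from this loop.

```r
simple_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  # Step 2: randomly assign items to the k clusters (every cluster gets members)
  cl <- sample(rep_len(seq_len(k), nrow(x)))
  for (iter in seq_len(max_iter)) {
    # Step 3: centroid of each cluster (k x ncol(x) matrix)
    centers <- do.call(rbind, lapply(seq_len(k), function(j)
      colMeans(x[cl == j, , drop = FALSE])))
    # Step 4: squared Euclidean distance of every item to every centroid (n x k)
    d2 <- sapply(seq_len(k), function(j) colSums((t(x) - centers[j, ])^2))
    # Step 5: assign each item to its closest centroid
    new_cl <- max.col(-d2, ties.method = "first")
    # Step 6: stop once the assignments are stable
    if (all(new_cl == cl)) break
    cl <- new_cl
  }
  list(cluster = cl, centers = centers, iter = iter)
}

set.seed(7)
pts <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),  # 20 points around (0, 0)
             matrix(rnorm(40, mean = 5), ncol = 2))  # 20 points around (5, 5)
res <- simple_kmeans(pts, k = 2)
table(res$cluster, rep(c("A", "B"), each = 20))
```

With two well-separated groups the table usually shows each simulated group falling into a single cluster; `kmeans(pts, centers = 2, algorithm = "Lloyd")` applies the same update rule from a different starting point.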


K-Means

Figure: panels (a) to (c) illustrate successive k-means iterations (centroids marked with X).


Principal Component Analysis (PCA)

Principal component analysis (PCA) is a data reduction technique that simplifies multidimensional data sets to 2 or 3 dimensions for plotting purposes and visual variance analysis.


Basic PCA Steps

Center (and standardize) data

First principal component axis

Passes through the centroid of the data cloud

The summed squared distances of the points to this line are minimized, so that it runs along the direction of maximum variation of the data cloud

Second principal component axis

Orthogonal to the first principal component

Along the direction of maximum remaining variation in the data

1st PCA axis becomes x-axis and 2nd PCA axis y-axis

Continue the process until the necessary number of principal components is obtained
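To make these steps concrete, here is a sketch of PCA "by hand" in base R on random illustration data: the eigenvectors of the covariance matrix of the centered and standardized data are the principal component axes, ordered by the variance they explain (the eigenvalues). The result is compared with `prcomp()`, which computes the same quantities via a singular value decomposition; component signs are arbitrary, so scores are compared in absolute value.

```r
set.seed(8)
z <- matrix(rnorm(60), nrow = 20, ncol = 3)
z[, 2] <- z[, 1] + 0.3 * z[, 2]  # make columns 1 and 2 strongly correlated

zs  <- scale(z)                          # center and standardize
eig <- eigen(cov(zs), symmetric = TRUE)  # eigenvectors = PC axes, largest variance first
scores <- zs %*% eig$vectors             # coordinates of each point on the PC axes

# Compare with prcomp() on the same data
p <- prcomp(z, center = TRUE, scale. = TRUE)
sqrt(eig$values)  # should match p$sdev
p$sdev
```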


PCA on Two-Dimensional Data Set

Figure: two panels showing the 1st and 2nd principal component axes of a two-dimensional data set.


Identifies the Amount of Variability between Components

Example

Principal Component      1st   2nd   3rd   Other
Proportion of Variance   62%   34%   3%    rest

1st and 2nd principal components explain 96% of variance.
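The proportion of variance of each component is its variance (`sdev^2`) divided by the total variance. A sketch on random illustration data, compared with the importance table that `summary()` prints for a `prcomp` object; with scaled data the component variances sum to the number of variables.

```r
set.seed(9)
w <- matrix(rnorm(100), nrow = 20, ncol = 5)
w[, 2] <- w[, 1] + 0.2 * w[, 2]  # add some correlation
pw <- prcomp(w, scale. = TRUE)

prop_var <- pw$sdev^2 / sum(pw$sdev^2)  # proportion of variance per component
cum_var  <- cumsum(prop_var)            # cumulative proportion
round(rbind(prop_var, cum_var), 3)

# Compare with the "Proportion of Variance" row printed by summary()
summary(pw)$importance
```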


Multidimensional Scaling (MDS)

Alternative dimensionality reduction approach

Represents distances in 2D or 3D space

Starts from distance matrix (PCA uses data points)
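Classical (Torgerson) MDS, which is what base R's `cmdscale()` implements, can be sketched in a few lines: double-center the squared distance matrix and take the leading eigenvectors scaled by the square roots of their eigenvalues. The random data below is illustrative. For Euclidean distances the resulting configuration matches the PCA scores up to sign, which connects the two approaches.

```r
set.seed(10)
q  <- matrix(rnorm(30), nrow = 10, ncol = 3)  # 10 points in 3D
D2 <- as.matrix(dist(q))^2                   # squared Euclidean distances
n  <- nrow(D2)

J  <- diag(n) - matrix(1 / n, n, n)          # centering matrix
B  <- -0.5 * J %*% D2 %*% J                  # double-centered matrix
eB <- eigen(B, symmetric = TRUE)
coords <- eB$vectors[, 1:2] %*% diag(sqrt(eB$values[1:2]))  # 2D configuration

mds <- cmdscale(dist(q), k = 2)              # base R's classical MDS
```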


Biclustering

Finds subgroups of rows and columns in a matrix that are as similar as possible to each other and as different as possible from the remaining data points.

Unclustered ⇒ Clustered


Remember: There Are Many Additional Techniques!

Additional details can be found in the Clustering Section of the R/Bioc Manual Link

Clustering with R and Bioconductor

Data Preprocessing

Scaling and Distance Matrices

> ## Sample data set
> set.seed(1410)
> y <- matrix(rnorm(50), 10, 5, dimnames=list(paste("g", 1:10, sep=""),
+ paste("t", 1:5, sep="")))
> dim(y)
[1] 10  5
> ## Scaling
> yscaled <- t(scale(t(y))) # Centers and scales y row-wise
> apply(yscaled, 1, sd)
 g1  g2  g3  g4  g5  g6  g7  g8  g9 g10
  1   1   1   1   1   1   1   1   1   1
> ## Euclidean distance matrix
> dist(y[1:4,], method = "euclidean")
         g1       g2       g3
g2 4.793697
g3 4.932658 6.354978
g4 4.033789 4.788508 1.671968


Correlation-based Distances

Correlation matrix

> c <- cor(t(y), method="pearson")
> as.matrix(c)[1:4,1:4]
            g1         g2          g3         g4
g1  1.00000000 -0.2965885 -0.00206139 -0.4042011
g2 -0.29658847  1.0000000 -0.91661118 -0.4512912
g3 -0.00206139 -0.9166112  1.00000000  0.7435892
g4 -0.40420112 -0.4512912  0.74358925  1.0000000

Correlation-based distance matrix

> d <- as.dist(1-c)
> as.matrix(d)[1:4,1:4]
         g1       g2        g3        g4
g1 0.000000 1.296588 1.0020614 1.4042011
g2 1.296588 0.000000 1.9166112 1.4512912
g3 1.002061 1.916611 0.0000000 0.2564108
g4 1.404201 1.451291 0.2564108 0.0000000


Hierarchical Clustering with hclust I

Hierarchical clustering with complete linkage and basic tree plotting

> hr <- hclust(d, method = "complete", members=NULL)
> names(hr)
[1] "merge"       "height"      "order"       "labels"      "method"
[6] "call"        "dist.method"
> par(mfrow = c(1, 2)); plot(hr, hang = 0.1); plot(hr, hang = -1)

Figure: two "Cluster Dendrogram" plots of hclust (*, "complete") on d, drawn with hang = 0.1 (left) and hang = -1 (right); Height axis from 0.0 to 2.0; leaf order g10, g3, g4, g2, g9, g6, g7, g1, g5, g8.


Tree Plotting I

Plot trees horizontally

> plot(as.dendrogram(hr), edgePar=list(col=3, lwd=4), horiz=TRUE)

Figure: horizontal dendrogram of hr (Height axis from 2.0 to 0.0; leaves g10, g3, g4, g2, g9, g6, g7, g1, g5, g8).


Tree Plotting II

The ape package provides more advanced features for tree plotting

> library(ape)
> plot.phylo(as.phylo(hr), type="p", edge.col=4, edge.width=2,
+ show.node.label=TRUE, no.margin=TRUE)

Figure: phylogram (type="p") of hr drawn with ape, with tips g1 to g10.


Tree Cutting

Accessing information in hclust objects

> hr

Call:
hclust(d = d, method = "complete", members = NULL)

Cluster method   : complete
Number of objects: 10

> ## Print row labels in the order they appear in the tree
> hr$labels[hr$order]
 [1] "g10" "g3"  "g4"  "g2"  "g9"  "g6"  "g7"  "g1"  "g5"  "g8"

Tree cutting with cutree

> mycl <- cutree(hr, h=max(hr$height)/2)
> mycl[hr$labels[hr$order]]
g10  g3  g4  g2  g9  g6  g7  g1  g5  g8
  3   3   3   2   2   5   5   1   4   4


Heatmaps

All in one step: clustering and heatmap plotting

> library(gplots)
> heatmap.2(y, col=redgreen(75))

Figure: heatmap of y (rows g1 to g10, columns t1 to t5) with row and column dendrograms and a "Color Key and Histogram" panel (Value vs. Count).


Customizing Heatmaps

Customizes row and column clustering and shows tree cutting result in row color bar.

Additional color schemes can be found here Link

> hc <- hclust(as.dist(1-cor(y, method="spearman")), method="complete")
> mycol <- colorpanel(40, "darkblue", "yellow", "white")
> heatmap.2(y, Rowv=as.dendrogram(hr), Colv=as.dendrogram(hc), col=mycol,
+ scale="row", density.info="none", trace="none",
+ RowSideColors=as.character(mycl))

Figure: row-scaled heatmap of y with rows ordered by hr and columns by hc; the cutree clusters in mycl are shown in the row color bar; color key: Row Z-Score.


K-Means Clustering with PAM

Runs K-means-style clustering with the PAM (partitioning around medoids) algorithm, which uses representative data objects (medoids) instead of centroids, and shows the result in the row color bar of the hierarchical clustering result from before.

> library(cluster)
> pamy <- pam(d, 4)
> (kmcol <- pamy$clustering)
 g1  g2  g3  g4  g5  g6  g7  g8  g9 g10
  1   2   3   3   4   4   4   4   2   3
> heatmap.2(y, Rowv=as.dendrogram(hr), Colv=as.dendrogram(hc), col=mycol,
+ scale="row", density.info="none", trace="none",
+ RowSideColors=as.character(kmcol))

Figure: the same row-scaled heatmap as before, with the PAM clusters in kmcol shown in the row color bar; color key: Row Z-Score.


K-Means Fuzzy Clustering

Runs k-means fuzzy clustering

> library(cluster)
> fannyy <- fanny(d, k=4, memb.exp = 1.5)
> round(fannyy$membership, 2)[1:4,]
   [,1] [,2] [,3] [,4]
g1 1.00 0.00 0.00 0.00
g2 0.00 0.99 0.00 0.00
g3 0.02 0.01 0.95 0.03
g4 0.00 0.00 0.99 0.01
> fannyy$clustering
 g1  g2  g3  g4  g5  g6  g7  g8  g9 g10
  1   2   3   3   4   4   4   4   2   3
> ## Returns multiple cluster memberships for coefficient above a certain
> ## value (here >0.1)
> fannyyMA <- round(fannyy$membership, 2) > 0.10
> apply(fannyyMA, 1, function(x) paste(which(x), collapse="_"))
   g1    g2    g3    g4    g5    g6    g7    g8    g9   g10
  "1"   "2"   "3"   "3"   "4"   "4"   "4" "2_4"   "2"   "3"


Multidimensional Scaling (MDS)

Performs MDS analysis on the geographic distances between European cities

> loc <- cmdscale(eurodist)
> ## Plots the MDS results in 2D plot. The minus is required in this example to
> ## flip the plotting orientation.
> plot(loc[,1], -loc[,2], type="n", xlab="", ylab="", main="cmdscale(eurodist)")
> text(loc[,1], -loc[,2], rownames(loc), cex=0.8)

Figure: 2D MDS map titled "cmdscale(eurodist)" (x axis from about -2000 to 2000) with city names placed at their MDS coordinates: Athens, Barcelona, Brussels, Calais, Cherbourg, Cologne, Copenhagen, Geneva, Gibraltar, Hamburg, Hook of Holland, Lisbon, Lyons, Madrid, Marseilles, Milan, Munich, Paris, Rome, Stockholm, Vienna.


Principal Component Analysis (PCA)

Performs PCA after scaling the data. prcomp() returns a list with class prcomp that contains five components: (1) the standard deviations (sdev) of the principal components, (2) the matrix of eigenvectors (rotation), (3) the principal component data (x), (4) the centering (center) and (5) the scaling (scale) used.

> library(scatterplot3d)
> pca <- prcomp(y, scale.=TRUE)
> names(pca)
[1] "sdev"     "rotation" "center"   "scale"    "x"
> summary(pca) # Prints variance summary for all principal components.
Importance of components:
                          PC1    PC2    PC3     PC4    PC5
Standard deviation     1.3611 1.1777 1.0420 0.69264 0.4416
Proportion of Variance 0.3705 0.2774 0.2172 0.09595 0.0390
Cumulative Proportion  0.3705 0.6479 0.8650 0.96100 1.0000
> scatterplot3d(pca$x[,1:3], pch=20, color="blue")

Figure: 3D scatter plot of the scores on the first three principal components (axes PC1, PC2, PC3).


Additional Exercises

See here Link


Session Information

> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] scatterplot3d_0.3-33 cluster_1.14.3       gplots_2.11.0
 [4] MASS_7.3-22          KernSmooth_2.23-8    caTools_1.13
 [7] bitops_1.0-4.2       gdata_2.12.0         gtools_2.7.0
[10] ape_3.0-6

loaded via a namespace (and not attached):
[1] gee_4.13-18     lattice_0.20-10 nlme_3.1-105    tools_2.15.2
