Clustering Solutions FINAL Exploratory Run Full 10’ Resolution – 41,311 samples


Page 1

Clustering Solutions
FINAL Exploratory Run

Full 10’ Resolution – 41,311 samples

Michael A. Lindgren
EWHALE Laboratory

Institute of Arctic Biology
University of Alaska Fairbanks

February 11, 2011

Page 2

About This Run…

• The label “FINAL” on this exploratory run refers to the decision the group will make about which clustering level to use for the final Biome Shift Analysis.

• I modified the R code so that the very large proximity matrix created in RandomForests could be passed to the PAM (Partitioning Around Medoids) clustering algorithm, which allowed all 41,311 samples at 10’ resolution to be included.

• The clustering levels I am showing, at least for the preliminary decision about the optimal number of clusters, are 5, 10, 15, 20, 25, and 30.

• Also included are silhouette plots for each cluster level.
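For a sense of scale, a full proximity matrix over 41,311 samples is large. The Python sketch below estimates its storage, assuming 8-byte double-precision entries, and shows the common conversion of Random Forest proximities (similarities) into the dissimilarities that PAM works on, d = 1 - p. That conversion is an assumption: the slides do not say which transform this run used, and the function names are hypothetical.

```python
# Back-of-the-envelope storage for an n x n proximity matrix, plus a
# proximity -> dissimilarity conversion (d = 1 - p, an assumed convention).

BYTES_PER_DOUBLE = 8  # IEEE 754 double precision


def matrix_storage_bytes(n):
    """Return (full_matrix_bytes, lower_triangle_bytes) for n samples.

    The lower triangle excludes the diagonal, so it holds n * (n - 1) / 2 entries.
    """
    full = n * n * BYTES_PER_DOUBLE
    lower = n * (n - 1) // 2 * BYTES_PER_DOUBLE
    return full, lower


def proximity_to_dissimilarity(prox):
    """Map a symmetric proximity matrix (1 on the diagonal) to d = 1 - p."""
    return [[1.0 - p for p in row] for row in prox]


if __name__ == "__main__":
    full, lower = matrix_storage_bytes(41_311)
    print(f"full matrix:    {full:,} bytes (~{full / 1e9:.1f} GB)")
    print(f"lower triangle: {lower:,} bytes (~{lower / 1e9:.1f} GB)")
    print(proximity_to_dissimilarity([[1.0, 0.25], [0.25, 1.0]]))
```

For n = 41,311 the estimate is 13,652,789,768 bytes (about 13.7 GB) for the full matrix and 6,826,229,640 bytes (about 6.8 GB) for the lower triangle alone; R's `dist` objects, for example, store only the lower triangle.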

Page 3

• The silhouette value for each point is a measure of how similar that point is to points in its own cluster compared to points in other clusters, and ranges from -1 to +1. It is defined as:

$$ S(i) = \frac{\min_{k} b(i,k) - a(i)}{\max\{a(i),\ \min_{k} b(i,k)\}} $$

• where a(i) is the average distance from the ith point to the other points in its own cluster, b(i,k) is the average distance from the ith point to the points in another cluster k, and the minimum is taken over all clusters k that do not contain point i.

*From the MathWorks website (developers of MATLAB).

See the document attached to this presentation, which discusses silhouette plots as a metric for deciding when an acceptable cluster solution has been achieved.

Silhouette Plots
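To make the definition on this slide concrete, here is a minimal Python sketch that computes S(i) and the average silhouette width from a precomputed dissimilarity matrix. It is not the MATLAB or R implementation; the function names are hypothetical, the toy one-dimensional points are purely illustrative, and setting S(i) = 0 for a point alone in its cluster is an assumed (common) convention.

```python
# Silhouette values S(i) from a precomputed dissimilarity matrix:
# S(i) = (min_k b(i,k) - a(i)) / max(a(i), min_k b(i,k))


def silhouette_values(dist, labels):
    """Return S(i) for every point. dist is a symmetric n x n list of lists;
    labels[i] is the cluster of point i. Needs at least two clusters."""
    n = len(labels)
    clusters = set(labels)
    values = []
    for i in range(n):
        own = labels[i]
        # a(i): mean dissimilarity to the other members of i's own cluster.
        same = [dist[i][j] for j in range(n) if j != i and labels[j] == own]
        if not same:
            values.append(0.0)  # singleton cluster: S(i) = 0 by convention
            continue
        a = sum(same) / len(same)
        # b(i,k) for every other cluster k; keep the smallest (nearest cluster).
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == k)
            / sum(1 for j in range(n) if labels[j] == k)
            for k in clusters
            if k != own
        )
        denom = max(a, b)
        values.append(0.0 if denom == 0 else (b - a) / denom)
    return values


def average_silhouette_width(dist, labels):
    """Mean of S(i) over all points, the quantity summarized per solution."""
    vals = silhouette_values(dist, labels)
    return sum(vals) / len(vals)


if __name__ == "__main__":
    xs = [0.0, 1.0, 10.0, 11.0]  # toy 1-D points, illustrative only
    dist = [[abs(x - y) for y in xs] for x in xs]
    print(average_silhouette_width(dist, [0, 0, 1, 1]))  # compact, separated: close to +1
    print(average_silhouette_width(dist, [0, 1, 0, 1]))  # mixed-up labels: negative
```

Values near +1 indicate points sitting well inside their own cluster, values near 0 indicate points between clusters, and negative values suggest a point may be assigned to the wrong cluster.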

Page 4

Silhouette Plots

[Figure: Assessment of Average Silhouette Widths of Different Cluster Solutions. X-axis: Number of Clusters Returned (5 to 30); Y-axis: Average Silhouette Width (0 to 0.2).]
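The plot above compares average silhouette width across the number of clusters returned. A minimal, self-contained Python sketch of that workflow follows, assuming a precomputed dissimilarity matrix: a naive PAM (greedy BUILD followed by SWAP, in the spirit of the Kaufman and Rousseeuw algorithm, not the optimized routine the R code in this run likely called) is run for each candidate k, and each solution is scored by its average silhouette width. All function names, the toy data, and the candidate k values are hypothetical; the run itself used k = 5 to 30 on 41,311 samples, far beyond this code's roughly O(k²n²) work per SWAP pass.

```python
# Naive PAM (greedy BUILD, then SWAP) on a precomputed dissimilarity matrix,
# scoring each candidate number of clusters by average silhouette width.


def total_cost(dist, medoids):
    """Sum over all points of the dissimilarity to the nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def pam(dist, k):
    """Return (medoid indices, cluster labels) for k clusters."""
    n = len(dist)
    medoids = []
    # BUILD: repeatedly add the point that gives the lowest total cost.
    for _ in range(k):
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(dist, medoids + [c])))
    # SWAP: take the best medoid/non-medoid exchange until none lowers the cost.
    cost = total_cost(dist, medoids)
    while True:
        best_cost, best_set = cost, None
        for idx in range(k):
            for o in range(n):
                if o in medoids:
                    continue
                trial = medoids[:idx] + [o] + medoids[idx + 1:]
                trial_cost = total_cost(dist, trial)
                if trial_cost < best_cost:
                    best_cost, best_set = trial_cost, trial
        if best_set is None:
            break
        medoids, cost = best_set, best_cost
    # Assign every point to its nearest medoid (label = medoid position).
    labels = [min(range(k), key=lambda j: row[medoids[j]]) for row in dist]
    return medoids, labels


def avg_silhouette(dist, labels):
    """Average silhouette width; singleton clusters contribute S(i) = 0."""
    n = len(labels)
    total = 0.0
    for i in range(n):
        means = {}
        for k in set(labels):
            members = [j for j in range(n) if labels[j] == k and j != i]
            if members:
                means[k] = sum(dist[i][j] for j in members) / len(members)
        if labels[i] not in means:
            continue  # singleton cluster
        a = means[labels[i]]
        b = min(v for k, v in means.items() if k != labels[i])
        denom = max(a, b)
        total += 0.0 if denom == 0 else (b - a) / denom
    return total / n


if __name__ == "__main__":
    # Toy 1-D data with three visible groups (illustrative only).
    xs = [0.0, 0.5, 1.0, 5.0, 5.5, 6.0, 12.0, 12.5, 13.0]
    dist = [[abs(x - y) for y in xs] for x in xs]
    widths = {k: avg_silhouette(dist, pam(dist, k)[1]) for k in (2, 3, 4)}
    for k, w in widths.items():
        print(f"k={k}: average silhouette width {w:.3f}")
    print("highest average width at k =", max(widths, key=widths.get))
```

On this toy data, k = 3 gives the highest average width, matching the three visible groups. Picking the k with the highest average width is one common heuristic; the attached document may weigh the silhouette plots differently.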

Page 5

5 Clusters Returned

Page 6

5 Clusters Returned

Page 7

5 Clusters Returned

Page 8

10 Clusters Returned

Page 9

10 Clusters Returned

Page 10

10 Clusters Returned

Page 11

15 Clusters Returned

Page 12

15 Clusters Returned

Page 13

15 Clusters Returned

Page 14

20 Clusters Returned

Page 15

20 Clusters Returned

Page 16

20 Clusters Returned

Page 17

25 Clusters Returned

Page 18

25 Clusters Returned

Page 19

25 Clusters Returned

Page 20

30 Clusters Returned

Page 21

30 Clusters Returned

Page 22

30 Clusters Returned