Malhotra MR6e 20
Transcript of Malhotra MR6e 20
-
7/28/2019 Malhotra MR6e 20
1/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-1
Chapter Twenty
Cluster Analysis
-
7/28/2019 Malhotra MR6e 20
2/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-2
Chapter Outline
1) Overview2) Basic Concept
3) Statistics Associated with Cluster Analysis
4) Conducting Cluster Analysis
i. Formulating the Problem
ii. Selecting a Distance or Similarity Measure
iii. Selecting a Clustering Procedure
iv. Deciding on the Number of Clusters
v. Interpreting and Profiling the Clusters
vi. Assessing Reliability and Validity
-
7/28/2019 Malhotra MR6e 20
3/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-3
Chapter Outline
5) Applications of NonhierarchicalClustering
6) Clustering Variables
7) Summary
-
7/28/2019 Malhotra MR6e 20
4/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-4
Cluster Analysis
Cluster analysis is a class of techniques used to classify
objects or cases into relatively homogeneous groups calledclusters. Objects in each cluster tend to be similar to eachother and dissimilar to objects in the other clusters.Cluster analysis is also called classification analysis, ornumerical taxonomy.
Both cluster analysis and discriminant analysis areconcerned with classification. However, discriminantanalysis requires prior knowledge of the cluster or groupmembership for each object or case included, to developthe classification rule. In contrast, in cluster analysis thereis no a priori information about the group or cluster
membership for any of the objects. Groups or clusters aresuggested by the data, not defined a priori.
-
7/28/2019 Malhotra MR6e 20
5/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-5
An Ideal Clustering Situation
Variable 2
Variable1
Fig. 20.1
-
7/28/2019 Malhotra MR6e 20
6/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-6
A Practical Clustering Situation
X
Variable 2
Va
riable
1
Fig. 20.2
-
7/28/2019 Malhotra MR6e 20
7/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-7
Statistics Associated with Cluster Analysis
Agglomeration schedule. An agglomeration schedulegives information on the objects or cases being combined ateach stage of a hierarchical clustering process.
Cluster centroid. The cluster centroid is the mean valuesof the variables for all the cases or objects in a particularcluster.
Cluster centers. The cluster centers are the initial startingpoints in nonhierarchical clustering. Clusters are builtaround these centers, or seeds.
Cluster membership. Cluster membership indicates thecluster to which each object or case belongs.
-
7/28/2019 Malhotra MR6e 20
8/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-8
Statistics Associated with Cluster Analysis
Dendrogram. A dendrogram, or tree graph, is a graphicaldevice for displaying clustering results. Vertical linesrepresent clusters that are joined together. The position ofthe line on the scale indicates the distances at whichclusters were joined. The dendrogram is read from left to
right. Figure 20.8 is a dendrogram.
Distances between cluster centers. These distancesindicate how separated the individual pairs of clusters are.Clusters that are widely separated are distinct, andtherefore desirable.
-
7/28/2019 Malhotra MR6e 20
9/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-9
Statistics Associated with Cluster Analysis
Icicle plot. An icicle plot is a graphical display ofclustering results, so called because it resemblesa row of icicles hanging from the eaves of ahouse. The columns correspond to the objectsbeing clustered, and the rows correspond to the
number of clusters. An icicle plot is read frombottom to top. Figure 20.7 is an icicle plot.
Similarity/distance coefficient matrix. Asimilarity/distance coefficient matrix is a lower-triangle matrix containing pairwise distancesbetween objects or cases.
-
7/28/2019 Malhotra MR6e 20
10/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-10
Conducting Cluster Analysis
Formulate the Problem
Assess the Validity of Clustering
Select a Distance Measure
Select a Clustering Procedure
Decide on the Number of Clusters
Interpret and Profile Clusters
Fig. 20.3
-
7/28/2019 Malhotra MR6e 20
11/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-11
Attitudinal Data For Clustering
Case No. V1 V2 V3 V4 V5 V6
1 6 4 7 3 2 32 2 3 1 4 5 43 7 2 6 4 1 34 4 6 4 5 3 65 1 3 2 2 6 46 6 4 6 3 3 47 5 3 6 3 3 4
8 7 3 7 4 1 49 2 4 3 3 6 310 3 5 3 6 4 611 1 3 2 3 5 312 5 4 5 4 2 413 2 2 1 5 4 414 4 6 4 6 4 715 6 5 4 2 1 416 3 5 4 6 4 717 4 4 7 2 2 518 3 7 2 6 4 319 4 6 3 7 2 720 2 3 2 4 7 2
Table
20.
1
-
7/28/2019 Malhotra MR6e 20
12/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-12
Conducting Cluster Analysis:Formulate the Problem
Perhaps the most important part of formulating theclustering problem is selecting the variables on whichthe clustering is based.
Inclusion of even one or two irrelevant variables maydistort an otherwise useful clustering solution.
Basically, the set of variables selected should describethe similarity between objects in terms that are relevantto the marketing research problem.
The variables should be selected based on past research,
theory, or a consideration of the hypotheses beingtested. In exploratory research, the researcher shouldexercise judgment and intuition.
-
7/28/2019 Malhotra MR6e 20
13/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-13
Conducting Cluster Analysis:Select a Distance or Similarity Measure
The most commonly used measure of similarity is the Euclidean
distance or its square. The Euclidean distance is the square root ofthe sum of the squared differences in values for each variable. Otherdistance measures are also available. The city-block or Manhattandistancebetween two objects is the sum of the absolute differencesin values for each variable. The Chebychev distance between twoobjects is the maximum absolute difference in values for any
variable. If the variables are measured in vastly different units, the clustering
solution will be influenced by the units of measurement. In thesecases, before clustering respondents, we must standardize the databy rescaling each variable to have a mean of zero and a standarddeviation of unity. It is also desirable to eliminate outliers (cases
with atypical values). Use of different distance measures may lead to different clustering
results. Hence, it is advisable to use different measures and comparethe results.
-
7/28/2019 Malhotra MR6e 20
14/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-14
A Classification of Clustering Procedures
Fig. 20.4
NonhierarchicalHierarchical
Agglomerative Divisive
SequentialThreshold
ParallelThreshold
OptimizingPartitioning
LinkageMethods
VarianceMethods
CentroidMethods
WardsMethod
SingleLinkage
CompleteLinkage
AverageLinkage
Other
Two-Step
Clustering Procedures
-
7/28/2019 Malhotra MR6e 20
15/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-15
Conducting Cluster Analysis:Select a Clustering ProcedureHierarchical
Hierarchical clustering is characterized by thedevelopment of a hierarchy or tree-like structure.Hierarchical methods can be agglomerative or divisive.
Agglomerative clustering starts with each object in aseparate cluster. Clusters are formed by grouping objects
into bigger and bigger clusters. This process is continueduntil all objects are members of a single cluster.
Divisive clustering starts with all the objects grouped in asingle cluster. Clusters are divided or split until each objectis in a separate cluster.
Agglomerative methods are commonly used in marketingresearch. They consist of linkage methods, error sums ofsquares or variance methods, and centroid methods.
-
7/28/2019 Malhotra MR6e 20
16/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-16
Conducting Cluster Analysis:Select a Clustering Procedure Linkage Method
The single linkage method is based on minimum distance,or the nearest neighbor rule. At every stage, the distancebetween two clusters is the distance between their two closestpoints (see Figure 20.5).
The complete linkage method is similar to single linkage,except that it is based on the maximum distance or the
furthest neighbor approach. In complete linkage, the distancebetween two clusters is calculated as the distance betweentheir two furthest points.
The average linkage method works similarly. However, inthis method, the distance between two clusters is defined asthe average of the distances between all pairs of objects,
where one member of the pair is from each of the clusters(Figure 20.5).
-
7/28/2019 Malhotra MR6e 20
17/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-17
Linkage Methods of Clustering
Fig. 20.5 Single Linkage
Minimum Distance
Complete Linkage
MaximumDistance
Average Linkage
Average Distance
Cluster 1 Cluster 2
Cluster 1 Cluster 2
Cluster 1 Cluster 2
C d i Cl A l i
-
7/28/2019 Malhotra MR6e 20
18/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-18
Conducting Cluster Analysis:Select a Clustering Procedure Variance Method
The variance methods attempt to generate clusters tominimize the within-cluster variance.
A commonly used variance method is the Ward's procedure.For each cluster, the means for all the variables are computed.Then, for each object, the squared Euclidean distance to thecluster means is calculated (Figure 20.6). These distances are
summed for all the objects. At each stage, the two clusterswith the smallest increase in the overall sum of squares withincluster distances are combined.
In the centroid methods, the distance between two clustersis the distance between their centroids (means for all thevariables), as shown in Figure 20.6. Every time objects are
grouped, a new centroid is computed. Of the hierarchical methods, average linkage and Ward's
methods have been shown to perform better than the otherprocedures.
-
7/28/2019 Malhotra MR6e 20
19/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-19
Other Agglomerative Clustering Methods
Wards Procedure
Centroid Method
Fig. 20.6
C d ti Cl t A l i
-
7/28/2019 Malhotra MR6e 20
20/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-20
Conducting Cluster Analysis:Select a Clustering Procedure Nonhierarchical
The nonhierarchical clustering methods are frequently
referred to as k-means clustering. These methods includesequential threshold, parallel threshold, and optimizingpartitioning.
In the sequential threshold method, a cluster center isselected and all objects within a prespecified threshold valuefrom the center are grouped together. Then a new clustercenter or seed is selected, and the process is repeated for theunclustered points. Once an object is clustered with a seed, itis no longer considered for clustering with subsequent seeds.
The parallel threshold method operates similarly, except thatseveral cluster centers are selected simultaneously and objects
within the threshold level are grouped with the nearest center. The optimizing partitioning method differs from the two
threshold procedures in that objects can later be reassigned toclusters to optimize an overall criterion, such as average withincluster distance for a given number of clusters.
C d ti Cl t A l i
-
7/28/2019 Malhotra MR6e 20
21/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-21
Conducting Cluster Analysis:Select a Clustering Procedure
It has been suggested that the hierarchical andnonhierarchical methods be used in tandem. First, an initialclustering solution is obtained using a hierarchicalprocedure, such as average linkage or Ward's. The numberof clusters and cluster centroids so obtained are used as
inputs to the optimizing partitioning method.
Choice of a clustering method and choice of a distancemeasure are interrelated. For example, squared Euclideandistances should be used with the Ward's and centroidmethods. Several nonhierarchical procedures also use
squared Euclidean distances.
-
7/28/2019 Malhotra MR6e 20
22/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-22
Results of Hierarchical Clustering
Stage clusterClusters combined first appears
Stage Cluster 1 Cluster 2 Coefficient Cluster 1 Cluster 2 Next stage1 14 16 1.000000 0 0 62 6 7 2.000000 0 0 73 2 13 3.500000 0 0 154 5 11 5.000000 0 0 11
5 3 8 6.500000 0 0 166 10 14 8.160000 0 1 97 6 12 10.166667 2 0 108 9 20 13.000000 0 0 119 4 10 15.583000 0 6 1210 1 6 18.500000 6 7 1311 5 9 23.000000 4 8 1512 4 19 27.750000 9 0 17
13 1 17 33.100000 10 0 1414 1 15 41.333000 13 0 1615 2 5 51.833000 3 11 1816 1 3 64.500000 14 5 1917 4 18 79.667000 12 0 1818 2 4 172.662000 15 17 1919 1 2 328.600000 16 18 0
Agglomeration Schedule Using Wards Procedure
Table 20.2
-
7/28/2019 Malhotra MR6e 20
23/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-23
Results of Hierarchical Clustering
Number of ClustersLabel case 4 3 2
1 1 1 1
2 2 2 2
3 1 1 1
4 3 3 2
5 2 2 2
6 1 1 17 1 1 1
8 1 1 1
9 2 2 2
10 3 3 2
11 2 2 2
12 1 1 1
13 2 2 2
14 3 3 215 1 1 1
16 3 3 2
17 1 1 1
18 4 3 2
19 3 3 2
20 2 2 2
Cluster Membership of Cases Using Wards Procedure
Table
20.
2,
cont.
-
7/28/2019 Malhotra MR6e 20
24/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-24
Vertical Icicle Plot Using Wards MethodFig. 20.7
-
7/28/2019 Malhotra MR6e 20
25/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-25
Fig. 20.8
Dendrogram Using Wards Method
Conducting Cluster Analysis:
-
7/28/2019 Malhotra MR6e 20
26/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-26
Conducting Cluster Analysis:Decide on the Number of Clusters
Theoretical, conceptual, or practical considerations maysuggest a certain number of clusters.
In hierarchical clustering, the distances at which clustersare combined can be used as criteria. This information canbe obtained from the agglomeration schedule or from the
dendrogram. In nonhierarchical clustering, the ratio of total within-group
variance to between-group variance can be plotted againstthe number of clusters. The point at which an elbow or asharp bend occurs indicates an appropriate number of
clusters. The relative sizes of the clusters should be meaningful.
Conducting Cluster Analysis:
-
7/28/2019 Malhotra MR6e 20
27/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-27
Conducting Cluster Analysis:Interpreting and Profiling the Clusters
Interpreting and profiling clusters involvesexamining the cluster centroids. Thecentroids enable us to describe each cluster
by assigning it a name or label.
It is often helpful to profile the clusters interms of variables that were not used forclustering. These may include demographic,
psychographic, product usage, media usage,or other variables.
-
7/28/2019 Malhotra MR6e 20
28/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-28
Cluster Centroids
Table 20.3
Cluster V1 V2 V3 V4 V5 V6
1 5.750 3.625 6.000 3.125 1.750 3.875
2 1.667 3.000 1.833 3.500 5.500 3.333
3 3.500 5.833 3.333 6.000 3.500 6.000
Means of Variables
Conducting Cluster Analysis:
-
7/28/2019 Malhotra MR6e 20
29/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-29
Conducting Cluster Analysis:Assess Reliability and Validity
1. Perform cluster analysis on the same data using differentdistance measures. Compare the results across measures todetermine the stability of the solutions.
2. Use different methods of clustering and compare the results.
3. Split the data randomly into halves. Perform clustering
separately on each half. Compare cluster centroids across thetwo subsamples.
4. Delete variables randomly. Perform clustering based on thereduced set of variables. Compare the results with thoseobtained by clustering based on the entire set of variables.
5. In nonhierarchical clustering, the solution may depend on theorder of cases in the data set. Make multiple runs usingdifferent order of cases until the solution stabilizes.
-
7/28/2019 Malhotra MR6e 20
30/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-30
Table 20.4
Results of Nonhierarchical Clustering
Initial Cluster Centers
4 2 7
6 3 2
3 2 6
7 4 42 7 1
7 2 3
V1
V2
V3
V4V5
V6
1 2 3
Cluster
Convergence achieved due to no or small distancechange. The maximum distance by which any centerhas changed is 0.000. The current iteration is 2. Theminimum distance between initial centers is 7.746.
a.
Iteration Historya
2.154 2.102 2.5500.000 0.000 0.000
Iteration
12
1 2 3
Change in Cluster Centers
-
7/28/2019 Malhotra MR6e 20
31/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-31
Results of Nonhierarchical Clustering
Table 20.4 cont.
Cluster Membership
3 1.414
2 1.323
3 2.550
1 1.404
2 1.848
3 1.225
3 1.500
3 2.1212 1.756
1 1.143
2 1.041
3 1.581
2 2.598
1 1.404
3 2.828
1 1.624
3 2.598
1 3.555
1 2.154
2 2.102
Case Number
1
2
3
4
5
6
7
89
10
11
12
13
14
15
16
17
18
19
20
Cluster Distance
-
7/28/2019 Malhotra MR6e 20
32/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-32
Results of Nonhierarchical Clustering
Table 20.4, cont.Final Cluster Centers
Cluster
1 2 3V1V2
V3V4V5V6
46
3646
23
2463
64
6324
Distances between Final Cluster Centers
1 2 3
123
5.5685.698
5.568
6.928
5.6986.928
Cluster
-
7/28/2019 Malhotra MR6e 20
33/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-33
ANOVA
29.108 2 0.608 17 47.888 0.000
13.546 2 0.630 17 21.505 0.000
31.392 2 0.833 17 37.670 0.000
15.713 2 0.728 17 21.585 0.000
22.537 2 0.816 17 27.614 0.000
12.171 2 1.071 17 11.363 0.001
V1
V2
V3
V4
V5
V6
Mean Square dfCluster
Mean Square dfError
F Sig.
The F tests should be used only for descriptive purposes because the clusters have been chosen to maximize the differences among cases in different clusters. The observedsignificance levels are not corrected for this, and thus cannot be interpreted as tests of the
hypothesis that the cluster means are equal.
Number of Cases in each Cluster
6.000
6.000
8.000
20.000
0.000
1
2
3
Cluster
Valid
Missing
Results of Nonhierarchical Clustering
Table 20.4, cont.
f
-
7/28/2019 Malhotra MR6e 20
34/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-34
Results of Two-Step Clustering
Table 20.5 Auto-Clustering
Number of Clusters
Akaike's
InformationCriterion (AIC)
AICChange(a)
Ratio of AICChanges(b)
Ratio of
DistanceMeasures(c)
1 104.140
2 101.171 -2.969 1.000 .847
3 97.594 -3.577 1.205 1.583
4 116.896 19.302 -6.502 2.115
5 138.230 21.335 -7.187 1.222
6 158.586 20.355 -6.857 1.0217 179.340 20.755 -6.991 1.224
8 201.628 22.288 -7.508 1.006
9 224.055 22.426 -7.555 1.111
10 246.522 22.467 -7.568 1.588
11 269.570 23.048 -7.764 1.001
12 292.718 23.148 -7.798 1.055
13 316.120 23.402 -7.883 1.00214 339.223 23.103 -7.782 1.044
15 362.650 23.427 -7.892 1.004
a The changes are from the previous number of clusters in the table.b The ratios of changes are relative to the change for the two cluster solution.c The ratios of distance measures are based on the current number of clusters
against the previous number of clusters.
-
7/28/2019 Malhotra MR6e 20
35/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-35
Cluster Distribution
Table 20.5, cont.
N% of
Combined % of Total1 6 30.0% 30.0%
2 6 30.0% 30.0%
3 8 40.0% 40.0%
Cluster
Combined 20 100.0% 100.0%
Total 20 100.0%
Cl t P fil
-
7/28/2019 Malhotra MR6e 20
36/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-36
Cluster Profiles
Table 20.5, cont.
Fun Bad for Budget Eating Out
Mean Std. Deviation Mean Std. Deviation Mean Std. Deviation
1 1.67 .516 3.00 .632 1.83 .753
2 3.50 .548 5.83 .753 3.33 .816
3 5.75 1.035 3.63 .916 6.00 1.069
Cluster
Combined 3.85 1.899 4.10 1.410 3.95 2.012
Best Buys Don't Care Compare Prices
Mean Std. Deviation Mean Std. Deviation Mean Std. Deviation
3.50 1.049 5.50 1.049 3.33 .8166.00 .632 3.50 .837 6.00 1.549
3.13 .835 1.88 .835 3.88 .641
4.10 1.518 3.45 1.761 4.35 1.496
Cl t i V i bl
-
7/28/2019 Malhotra MR6e 20
37/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-37
Clustering Variables
In this instance, the units used for analysis are the variables,
and the distance measures are computed for all pairs ofvariables.
Hierarchical clustering of variables can aid in the identificationof unique variables, or variables that make a uniquecontribution to the data.
Clustering can also be used to reduce the number ofvariables. Associated with each cluster is a linear combinationof the variables in the cluster, called the cluster component.A large set of variables can often be replaced by the set ofcluster components with little loss of information. However, a
given number of cluster components does not generallyexplain as much variance as the same number of principalcomponents.
-
7/28/2019 Malhotra MR6e 20
38/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-38
SPSS Windows
To select this procedure using SPSS for Windows,click:
Analyze>Classify>Hierarchical Cluster
Analyze>Classify>K-Means Cluster
Analyze>Classify>Two-Step Cluster
-
7/28/2019 Malhotra MR6e 20
39/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-39
SPSS Windows: Hierarchical Clustering
1. Select ANALYZE from the SPSS menu bar.
2. Click CLASSIFY and then HIERARCHICAL CLUSTER.3. Move Fun [v1], Bad for Budget [v2], Eating Out [v3], Best Buys [v4],
Dont Care [v5], and Compare Prices [v6] into the VARIABLES box.
4. In the CLUSTER box, check CASES (default option). In the DISPLAY box,check STATISTICS and PLOTS (default options).
5. Click on STATISTICS. In the pop-up window, check AGGLOMERATIONSCHEDULE. In the CLUSTER MEMBERSHIP box, check RANGE OFSOLUTIONS. Then, for MINIMUM NUMBER OF CLUSTERS, enter 2 and forMAXIMUM NUMBER OF CLUSTERS, enter 4. Click CONTINUE.
6. Click on PLOTS. In the pop-up window, check DENDROGRAM. In theICICLE box, check ALL CLUSTERS (default). In the ORIENTATION box,check VERTICAL. Click CONTINUE.
7. Click on METHOD. For CLUSTER METHOD, select WARDS METHOD. In theMEASURE box, check INTERVAL and select SQUARED EUCLIDEANDISTANCE. Click CONTINUE.
8. Click OK.
-
7/28/2019 Malhotra MR6e 20
40/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-40
SPSS Windows: K-Means Clustering
1. Select ANALYZE from the SPSS menu bar.2. Click CLASSIFY and then K-MEANS CLUSTER.
3. Move Fun [v1], Bad for Budget [v2], Eating Out [v3],Best Buys [v4], Dont Care [v5], and Compare Prices[v6] into the VARIABLES box.
4. For NUMBER OF CLUSTER, select 3.
5. Click on OPTIONS. In the pop-up window, in theSTATISTICS box, check INITIAL CLUSTER CENTERS andCLUSTER INFORMATION FOR EACH CASE. Click CONTINUE.
6. Click OK.
-
7/28/2019 Malhotra MR6e 20
41/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-41
SPSS Windows: Two-Step Clustering
1. Select ANALYZE from the SPSS menu bar.
2. Click CLASSIFY and then TWO-STEP CLUSTER.
3. Move Fun [v1], Bad for Budget [v2], Eating Out [v3],Best Buys [v4], Dont Care [v5], and Compare Prices[v6] into the CONTINUOUS VARIABLES box.
4. For DISTANCE MEASURE, select EUCLIDEAN.
5. For NUMBER OF CLUSTER, select DETERMINEAUTOMATICALLY.
6. For CLUSTERING CRITERION, select AKAIKES
INFORMATION CRITERION (AIC).
7. Click OK.
-
7/28/2019 Malhotra MR6e 20
42/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-42
SAS Learning Edition
To select this procedure using SAS LearningEdition, click:
Analyze>Classify>Cluster Analysis
i di i i hi l l i
-
7/28/2019 Malhotra MR6e 20
43/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-43
SAS Learning Edition: Hierarchical Clustering
1. Select ANALYZE from the SAS Learning Edition
menu bar.
2. Select Multivariate>Cluster Analysis.
3. Move V1-V6 to the Analysis variables task role.
4. Click Cluster and select Wards minimumvariance method under Cluster method.
5. Click Results and select Simple summarystatistics.
6. Click Run.
SAS Learning Edition Windows:
-
7/28/2019 Malhotra MR6e 20
44/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-44
SAS Learning Edition Windows:K-Means Clustering
1. Select ANALYZE from the SAS Learning Editionmenu bar.
2. Select Multivariate>Cluster Analysis.
3. Move V1-V6 to the Analysis variables task role.
4. Click Cluster and select K-means algorithm as thecluster method and 3 for the Maximum number ofclusters.
5. Click Run.
-
7/28/2019 Malhotra MR6e 20
45/46
Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 20-45
-
7/28/2019 Malhotra MR6e 20
46/46
All rights reserved. No part of this publication may bereproduced, stored in a retrieval system, or transmitted, in
any form or by any means, electronic, mechanical,photocopying, recording, or otherwise, without the priorwritten permission of the publisher. Printed in the United
States of America.
Copyright 2010 Pearson Education, Inc.publishing as Prentice Hall