Dr. Michael R. Hyman Cluster Analysis
Date posted: 22-Dec-2015
2
Introduction
• Also called classification analysis and numerical taxonomy
• Goal: assign objects to groups so that intra-group similarity and inter-group dissimilarity are maximized
• No (in)dependent variables
• Find naturally occurring groupings of objects
3
Uses in Studying Consumers
• Benefit segmentation
• Finding market niches
• Finding homogeneous market segments for future study
• Data reduction
8
Procedure #1: Divisive (tear down)
• Start with profile data
• Find variable with highest variance
• Split objects above and below mean on this variable
• Find the remaining variable with the highest variance and split at its mean
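One way to read the divisive steps above is sketched below in Python. The function name `split_on_highest_variance`, the `used` parameter, and the sample profiles are hypothetical, and sending ties on the mean to the lower group is an assumption. The slide does not say whether later splits are made within each subgroup or across all objects, so the sketch only performs a single split.

```python
from statistics import mean, pvariance

def split_on_highest_variance(profiles, used=()):
    """Split objects at the mean of the highest-variance variable not yet used."""
    n_vars = len(profiles[0])
    candidates = [j for j in range(n_vars) if j not in used]
    # Pick the variable whose values vary most across the objects.
    best = max(candidates, key=lambda j: pvariance([p[j] for p in profiles]))
    cutoff = mean(p[best] for p in profiles)
    above = [p for p in profiles if p[best] > cutoff]
    below = [p for p in profiles if p[best] <= cutoff]  # ties go below (assumption)
    return best, above, below

# Hypothetical profile data: four objects measured on two variables.
profiles = [(1.0, 10.0), (2.0, 30.0), (1.5, 12.0), (2.5, 28.0)]
var_index, above, below = split_on_highest_variance(profiles)
```

Calling the function again with `used=(var_index,)` would pick the remaining variable with the highest variance for the next split.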
9
Procedure #2: Agglomerative (build up)
• Select similarity measure
– Distance (Euclidean, city block)
– Correlation
– Similarity
• Search similarity matrix for most similar cluster pair and merge that pair
• Repeat iteratively until only one cluster remains
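The build-up loop above can be sketched as follows. This is a minimal, unoptimized illustration rather than the procedure from the slides: the names `agglomerate` and `city_block` are hypothetical, the sample points are made up, and the sketch assumes single linkage (distance between the two closest members) when comparing clusters.

```python
from math import dist  # Euclidean distance, Python 3.8+

def city_block(a, b):
    """City block (Manhattan) distance between two profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))

def agglomerate(points, distance=dist):
    """Merge the closest cluster pair until one cluster remains.

    Returns the merge history as (member indices, merge distance) tuples.
    """
    clusters = [[i] for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage (assumption): closest pair of members.
                d = min(distance(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] + clusters[j]
        history.append((sorted(merged), d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history

# Hypothetical data: two tight pairs that are far apart from each other.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
history = agglomerate(points)
```

Passing `distance=city_block` swaps in the city block measure without changing the loop.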
11
Procedure #2: Agglomerative Stopping Rules
• Theory and practice
• Distance at which clusters combine
• Within/between group variance
• Relative sizes of clusters
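As an illustration of the distance-based rule above, one common heuristic is to stop just before the merge distance jumps sharply. The sketch below assumes the merge distances are already available in merge order; the function name `clusters_at_largest_jump` and the sample distances are hypothetical, and this is only one possible reading of the rule.

```python
def clusters_at_largest_jump(merge_distances):
    """Suggest how many clusters to keep: stop before the biggest jump in merge distance."""
    n_objects = len(merge_distances) + 1
    # Increase in merge distance from each merge to the next one.
    jumps = [later - earlier
             for earlier, later in zip(merge_distances, merge_distances[1:])]
    biggest = max(range(len(jumps)), key=jumps.__getitem__)
    # After merge number `biggest` (0-based), n_objects - (biggest + 1) clusters remain.
    return n_objects - (biggest + 1)

# Hypothetical merge distances for four objects (three merges).
suggested = clusters_at_largest_jump([1.0, 1.0, 6.4])
```

The heuristic needs at least two merges to compare, and it should be weighed against the other rules listed, such as theory and relative cluster sizes.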
12
Procedure #2: Agglomerative Linkage Methods
• Single (nearest neighbor)
– Makes long, thin clusters
• Complete (maximum distance to farthest neighbor)
– Sensitive to outliers
• Average distance between objects
• Variance methods (minimum within-cluster variance)
• Nodal (begin with two least similar objects as nodes)
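The first three linkage methods differ only in how the distances between members of two clusters are summarized. A minimal sketch, assuming Euclidean distance and using hypothetical function names and one-dimensional sample data:

```python
from itertools import product
from math import dist
from statistics import mean

def pair_distances(cluster_a, cluster_b):
    """All distances between a member of cluster_a and a member of cluster_b."""
    return [dist(a, b) for a, b in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b):
    return min(pair_distances(cluster_a, cluster_b))   # nearest neighbor

def complete_linkage(cluster_a, cluster_b):
    return max(pair_distances(cluster_a, cluster_b))   # farthest neighbor

def average_linkage(cluster_a, cluster_b):
    return mean(pair_distances(cluster_a, cluster_b))  # average over all pairs

# Hypothetical clusters on a single variable.
cluster_a = [(0.0,), (1.0,)]
cluster_b = [(3.0,), (6.0,)]
single = single_linkage(cluster_a, cluster_b)
complete = complete_linkage(cluster_a, cluster_b)
average = average_linkage(cluster_a, cluster_b)
```

Because complete linkage depends on the single farthest pair, one extreme object can dominate the result, which is consistent with the outlier sensitivity noted above.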
15
Procedure #2: Agglomerative Reliability and Validity Assessment
• Use different distance measures
• Use different clustering methods
• Split data, run both halves, and compare
• Shuffle cases (objects)
• Solve with subset of profile variables
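Several of the checks above come down to comparing two cluster solutions for the same objects. The slides do not name a comparison statistic; the sketch below assumes a simple one, the share of object pairs that both solutions treat the same way (together in both or apart in both), which is the Rand index. The function name `pair_agreement` and the labels are hypothetical.

```python
from itertools import combinations

def pair_agreement(labels_a, labels_b):
    """Fraction of object pairs on which two cluster solutions agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Same grouping with the cluster names swapped: full agreement.
same_grouping = pair_agreement([0, 0, 1, 1], [1, 1, 0, 0])
# A different grouping of the same four objects: low agreement.
different_grouping = pair_agreement([0, 0, 1, 1], [0, 1, 0, 1])
```

Cluster labels are arbitrary, so the measure compares pairs of objects rather than the label values themselves.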
16
General Problems
• Early assignments treated as permanent
– Precludes later revision for improved fit
• Number of clusters
– More clusters means greater intra-group homogeneity but less descriptive power
• No good measure of cluster compactness
• Lack of statistical properties makes inference difficult
17
General Problems (cont.)
• Coping with inter-correlated profile variables
• Must select profile variables that can discriminate among objects
• Sensitive to unit of measurement and outliers
– Fix: Standardize data and delete outliers
• Subjective interpretation of results (i.e., naming clusters)
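The standardization fix mentioned above can be sketched as a z-score transformation of each profile variable, so that no variable dominates the distances merely because of its unit of measurement. The function name `standardize` and the sample data are hypothetical; using the population standard deviation and mapping constant variables to 0.0 are assumptions.

```python
from statistics import mean, pstdev

def standardize(profiles):
    """Convert each profile variable to z-scores (mean 0, standard deviation 1)."""
    columns = list(zip(*profiles))
    means = [mean(c) for c in columns]
    sds = [pstdev(c) for c in columns]
    return [
        # A constant variable (sd == 0) cannot discriminate; map it to 0.0 (assumption).
        tuple((x - m) / s if s else 0.0 for x, m, s in zip(row, means, sds))
        for row in profiles
    ]

# Hypothetical profiles: the second variable is on a much larger scale.
profiles = [(1.0, 100.0), (3.0, 300.0)]
z_scores = standardize(profiles)
```

After standardization both variables contribute equally to a Euclidean or city block distance.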