Dr. Michael R. Hyman

Cluster Analysis

Introduction

• Also called classification analysis and numerical taxonomy

• Goal: assign objects to groups so that intra-group similarity and inter-group dissimilarity are maximized

• No (in)dependent variables

• Find naturally occurring groupings of objects

Uses in Studying Consumers

• Benefit segmentation

• Finding market niches

• Finding homogeneous market segments for future study

• Data reduction

Clusters Formed by Using Data on Two Characteristics

Scatter Plot of Income and Education Data for PC Owners and Non-owners

Procedure #1: Divisive (tear down)

• Start with profile data

• Find variable with highest variance

• Split objects above and below mean on this variable

• Find the remaining variable with the highest variance and split at its mean
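
The divisive steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_on_highest_variance`, the `used` parameter, and the profile data are hypothetical, and objects exactly at the mean are arbitrarily placed in the lower group.

```python
from statistics import mean, pvariance

def split_on_highest_variance(objects, used=()):
    """Split objects (equal-length tuples of numbers) at the mean of the
    not-yet-used variable that has the highest variance."""
    n_vars = len(objects[0])
    candidates = [j for j in range(n_vars) if j not in used]
    # Pick the variable (column index) with the largest variance.
    j = max(candidates, key=lambda k: pvariance([o[k] for o in objects]))
    m = mean(o[j] for o in objects)
    above = [o for o in objects if o[j] > m]
    below = [o for o in objects if o[j] <= m]  # ties go to the lower group
    return j, above, below

# Hypothetical profile data: (income in $000, years of education)
profiles = [(20, 12), (25, 16), (80, 12), (90, 16)]

# First split: income (index 0) has the highest variance.
var_index, high, low = split_on_highest_variance(profiles)

# Second split: within the high-income group, split on the remaining variable.
_, high_educ, low_educ = split_on_highest_variance(high, used=(var_index,))
```

Repeating the second step within each group tears the sample down into progressively smaller clusters.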

Procedure #2: Agglomerative (build up)

• Select similarity measure

– Distance (Euclidean, city block)

– Correlation

– Similarity

• Search similarity matrix for most similar cluster pair and merge that pair

• Repeat iteratively until only one cluster remains
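
A minimal Python sketch of the build-up loop follows. It assumes Euclidean distance and nearest-neighbor (single) linkage purely for illustration; the names `single_link` and `agglomerate` and the sample points are hypothetical, and the pairwise search is recomputed on every pass rather than read from a stored similarity matrix.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+

def single_link(a, b):
    """Distance between two clusters = distance between their closest members."""
    return min(dist(p, q) for p in a for q in b)

def agglomerate(points):
    """Merge the closest pair of clusters until one cluster remains.
    Returns the merge history as (distance, merged_cluster) tuples."""
    clusters = [[p] for p in points]  # start: every object is its own cluster
    merges = []
    while len(clusters) > 1:
        # Search all cluster pairs for the most similar (closest) pair.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]))
        d = single_link(clusters[i], clusters[j])
        merged = clusters[i] + clusters[j]
        merges.append((d, merged))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical two-variable data
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
merges = agglomerate(points)
```

The recorded merge distances are what a stopping rule would later inspect to decide how many clusters to keep.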

Commonly Used Similarity Coefficients
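
As an illustration of three measures named under Procedure #2 (Euclidean distance, city-block distance, and correlation), here is a minimal Python sketch. The function names and the two sample profiles are hypothetical, and the correlation shown is the ordinary Pearson coefficient computed across the profile variables of two objects.

```python
from math import sqrt

def euclidean(x, y):
    """Straight-line distance between two profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def city_block(x, y):
    """Sum of absolute differences (also called Manhattan distance)."""
    return sum(abs(a - b) for a, b in zip(x, y))

def correlation(x, y):
    """Pearson correlation between two profiles; assumes neither is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical profiles: same shape, different level and spread
a = (1, 2, 3)
b = (4, 6, 8)

d_euclid = euclidean(a, b)   # sqrt(50)
d_city = city_block(a, b)    # 12
r = correlation(a, b)        # 1.0
```

The two profiles are far apart by either distance measure yet perfectly correlated, which shows why the choice of measure can change which objects look similar.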

Procedure #2: Agglomerative Stopping Rules

• Theory and practice

• Distance at which clusters combine

• Within/between group variance

• Relative sizes of clusters
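
One way to apply the "distance at which clusters combine" rule is to stop just before the largest jump in merge distance. The sketch below is an assumption about how such a rule might be coded, not a prescribed method; the function name and the distances are hypothetical.

```python
def clusters_before_largest_jump(merge_distances, n_objects):
    """Given the distances at which successive merges occurred (in order),
    return the number of clusters that exist just before the largest jump."""
    jumps = [b - a for a, b in zip(merge_distances, merge_distances[1:])]
    k = jumps.index(max(jumps))  # largest jump lies between merge k and merge k+1
    # After k+1 merges, n_objects - (k+1) clusters remain.
    return n_objects - (k + 1)

# Hypothetical merge distances for 6 objects (5 merges in total)
distances = [1.0, 1.2, 1.5, 8.0, 9.0]
n_clusters = clusters_before_largest_jump(distances, n_objects=6)  # 3
```

A result like this is a starting point only; theory and the relative sizes of the resulting clusters still need to be weighed.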

Procedure #2: Agglomerative Linkage Methods

• Single (nearest neighbor)

– Makes long, thin clusters

• Complete (maximum distance to farthest neighbor)

– Sensitive to outliers

• Average distance between objects

• Variance methods (minimum within-cluster variance)

• Nodal (begin with two least similar objects as nodes)
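
The first three linkage methods differ only in how the distance between two clusters is summarized. A minimal Python sketch, assuming Euclidean distance and using hypothetical function names and data:

```python
from math import dist  # Euclidean distance, Python 3.8+

def single_linkage(a, b):
    """Nearest neighbor: smallest distance between any pair of members."""
    return min(dist(p, q) for p in a for q in b)

def complete_linkage(a, b):
    """Farthest neighbor: largest distance between any pair of members."""
    return max(dist(p, q) for p in a for q in b)

def average_linkage(a, b):
    """Mean distance over all between-cluster pairs of members."""
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))

# Hypothetical clusters lying on a line
left = [(0, 0), (1, 0)]
right = [(3, 0), (5, 0)]

d_single = single_linkage(left, right)      # 2.0
d_complete = complete_linkage(left, right)  # 5.0
d_average = average_linkage(left, right)    # 3.5
```

Because complete linkage depends on the single farthest pair, one extreme object can inflate the distance between otherwise close clusters.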

Procedure #2: Agglomerative Reliability and Validity Assessment

• Use different distance measures

• Use different clustering methods

• Split data, run both halves, and compare

• Shuffle cases (objects)

• Solve with subset of profile variables

General Problems

• Early assignments treated as permanent

– Precludes later revision for improved fit

• Number of clusters

– More clusters means greater intra-group homogeneity but less descriptive power

• No good measure of cluster compactness

• Lack of statistical properties makes inference difficult

General Problems (cont.)

• Coping with inter-correlated profile variables

• Must select profile variables that can discriminate among objects

• Sensitive to unit of measurement and outliers

– Fix: Standardize data and delete outliers

• Subjective interpretation of results (i.e., naming clusters)
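
The standardization fix mentioned above can be sketched as a z-score transformation of each profile variable. This is a minimal illustration; the function name and data are hypothetical, and it assumes every variable has nonzero variance.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Convert each column to z-scores: (value - column mean) / column std. dev.
    Assumes no column is constant (which would give a zero std. dev.)."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    sds = [pstdev(c) for c in columns]
    return [tuple((v - m) / s for v, m, s in zip(row, means, sds))
            for row in rows]

# Hypothetical data: (income in dollars, years of education)
rows = [(20000, 12), (40000, 14), (60000, 16)]
z = standardize(rows)
```

Before standardizing, income differences in dollars would swamp education differences in years in any distance calculation; afterward both variables contribute on the same scale.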

Steps for Conducting a Cluster Analysis: A Summary
