Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods

Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods Julie Sungsoon Hwang Department of Geography, University of Washington Jean-Claude Thill Department of Geography, State University of New York at Buffalo November 10, 2005 North American Meetings of Regional Science Association International


Page 1: Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods

Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods

Julie Sungsoon Hwang
Department of Geography, University of Washington

Jean-Claude Thill
Department of Geography, State University of New York at Buffalo

November 10, 2005
North American Meetings of Regional Science Association International

Page 2: Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods

Outline

• Research objectives

• Methodology: specification

• Methodology: illustration

• Evaluating the performance of fuzzy clustering

• Conclusions

Page 3: Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods

Research objectives

• Demonstrate the use of the fuzzy c-means (FCM) algorithm for delineating housing submarkets
  – Comparison to K-means

• Discuss empirical characteristics of FCM applied to given applications, in particular the choice of parameters
  – Cluster validity index

Page 4: Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods

Challenges

• Are the boundaries of clusters crisp?

[Figure: two scatter plots in attribute space (axes X1, X2), each showing Clusters A, B, and C: one for the housing market in metropolitan area p, one for the housing market in metropolitan area q]

Page 5: Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods

Methodology: specification

Page 6: Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods

• Our task is to group census tracts into homogeneous housing submarkets within a metropolitan area

• Using the fuzzy c-means algorithm

• In order to examine whether fuzzy set-based clustering can do a better job

• Implemented in 85 metropolitan areas

• Most of the data sets are public (e.g. 2000 Census)

• The whole procedure is automated in GIS

Page 7: Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods

Methodology: flow chart

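[Flow chart, repeated for each metropolitan area]

• Inputs: National, Regional, Local, … → Census Tract Layer (Metro)

• Candidate variables: table of x1, x2, x3, …, xm for tracts 1, 2, 3, …, n

• Stepwise regression (k ≤ m) → Significant variables: table of y1, y2, …, yk for tracts 1, 2, 3, …, n

• Cluster Analysis (c ≤ n) → membership table U1, U2, …, Uc for tracts 1, …, n

K-means → Hard Cluster Layer (clusters 1, 2, …, c)

| # | U1 | U2 | … | Uc |
|---|----|----|---|----|
| 1 | 1 | 0 | … | 0 |
| 2 | 0 | 1 | … | 0 |
| … | 0 | 1 | … | 0 |
| n | 0 | 0 | … | 1 |

Fuzzy c-means → Fuzzy Cluster Layer (clusters 1, 2, …, c)

| # | U1 | U2 | … | Uc |
|---|----|----|---|----|
| 1 | 0.85 | 0.05 | … | 0.10 |
| 2 | 0.12 | 0.80 | … | 0.05 |
| … | 0.02 | 0.74 | … | 0.12 |
| n | 0.40 | 0.03 | … | 0.50 |

k: # selected variables

c: # submarkets

Uj: membership to cluster j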

Page 8: Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods

Explanatory variables for house price

| Var_Name | Variable Definition | Data | Year | Spatial Unit |
|----------|---------------------|------|------|--------------|
| Socioeconomic/demographic Characteristics of Residents | | | | |
| pcincome | per capita income | Census | 2000 | Census Tract |
| college | % college degree | Census | 2000 | Census Tract |
| managep | % management workers | Census | 2000 | Census Tract |
| prodp | % production workers | Census | 2000 | Census Tract |
| famcpchl | % family with children | Census | 2000 | Census Tract |
| nfmalone | % nonfamily living alone | Census | 2000 | Census Tract |
| black_p | % black | Census | 2000 | Census Tract |
| nhwht_p | % non-hispanic white | Census | 2000 | Census Tract |
| nativebr | % native born | Census | 2000 | Census Tract |
| Structural Characteristics of Housing Units | | | | |
| medroom | median number of rooms | Census | 2000 | Census Tract |
| hudetp | % detached housing units | Census | 2000 | Census Tract |
| yrhublt | median year structure built | Census | 2000 | Census Tract |
| Locational Characteristics (Amenities) of Neighborhoods | | | | |
| ptratio | pupil to teacher ratio | NCES* | 2002 | School District |
| schexp | school expenditure per student | NCES | 2002 | School District |
| vrlcrime | violent crime rate | FBI** | 2003 | Designated Place |
| prpcrime | property crime rate | FBI | 2003 | Designated Place |
| jobacm | job accessibility (Hansen 1959) | CTPP*** | 2000 | Census Tract |

*NCES: National Center for Education Statistics; **FBI annual report “Crime in the U.S. 2003”; ***CTPP: Census Transportation Planning Package

Dependent variable: median home value of owner-occupied housing units

Page 9: Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods

[Maps: U.S. metropolitan areas (CMSA, MSA) with state boundaries, shown without and with the study set highlighted. Source: TIGER/Line 1999]

Study set: 85 metropolitan areas

Page 10: Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods

What is fuzzy c-means (FCM)?

• Clustering method that minimizes the following objective function:

$$\sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^{m} \, \| x_k - v_i \|_A^{2}$$

  – $x_k$: vector of data point k, 1 ≤ k ≤ n
  – $v_i$: center of cluster i, 1 ≤ i ≤ c
  – $u_{ik}$: membership degree of data point k with cluster i; in [0,1]
  – $m$: fuzziness amount associated with assigning data point k to cluster i, 1 ≤ m ≤ ∞

• Updates cluster means $v_i$ and membership degrees $u_{ik}$ until the algorithm converges:

$$v_i = \frac{\sum_{k=1}^{n} (u_{ik})^{m} x_k}{\sum_{k=1}^{n} (u_{ik})^{m}} \qquad \text{(III-3a)}$$

$$u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{\| x_k - v_i \|_A}{\| x_k - v_j \|_A} \right)^{2/(m-1)} \right]^{-1} \qquad \text{(III-3b)}$$

Source: Bezdek 1981

[Figure: scatter plot of data points in the (x1, x2) plane]
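The two update rules on this slide, (III-3a) for the cluster centers and (III-3b) for the membership degrees, can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' GIS implementation: it assumes the Euclidean norm (A taken as the identity matrix), a random initial partition, and a maximum-absolute-difference stopping rule. The function name `fcm` and its parameters `tol`, `max_iter`, and `seed` are hypothetical.

```python
import numpy as np

def fcm(X, c, m=2.0, tol=1e-5, max_iter=300, seed=0):
    """Fuzzy c-means sketch with the Euclidean norm (A = identity, an assumption).

    X: (n, p) data matrix; c: number of clusters; m > 1: fuzziness exponent.
    Returns U with shape (c, n) and V with shape (c, p).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)              # memberships of each point sum to 1
    for _ in range(max_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)            # (III-3a)
        # d2[i, k] = squared distance between point k and center i
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)                    # avoid division by zero
        inv = d2 ** (-1.0 / (m - 1.0))             # distance^(-2/(m-1))
        U_new = inv / inv.sum(axis=0, keepdims=True)            # (III-3b)
        converged = np.abs(U_new - U).max() <= tol
        U = U_new
        if converged:
            break
    return U, V
```

Calling `U, V = fcm(X, c=3, m=1.3)` on a standardized attribute matrix returns one membership row per cluster; `U.argmax(axis=0)` then gives a hard (defuzzified) label per observation. Note that (III-3b) is undefined at exactly m = 1, so this sketch requires m > 1.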

Page 11: Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods

FCM: missing elements

• Optimal number of clusters c*

• Optimal fuzziness amount m*

[Diagram: m and c shown as inputs to FCM]

Page 12: Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods

Extended fuzzy c-means algorithm

• Step 1: Initialize the parameters related to fuzzy partitioning: c = 2 (2 ≤ c ≤ cmax), m = 1 (1 ≤ m ≤ mmax), where c is an integer and m is a real number; fix minc, the increment of m (0 < minc ≤ 0.1); fix the cut-off threshold L; choose a validity index v

• Step 2: Given c and m, initialize U(0) so that it is a fuzzy partition matrix. Then at step l, l = 0, 1, 2, …:

• Step 3: Calculate the c fuzzy cluster centers {vi(l)} with (III-3a) and U(l)

• Step 4: Update U(l+1) using (III-3b) and {vi(l)}

• Step 5: Compare U(l) to U(l+1) in a convenient matrix norm; if || U(l+1) – U(l) || ≤ L, go to Step 6; otherwise return to Step 3

• Step 6: Compute the validity index for the given c and m

• Step 7: If c < cmax, then increase c ← c + 1 and go to Step 3; otherwise go to Step 8

• Step 8: If m < mmax, then increase m ← m + minc and go to Step 3; otherwise go to Step 9

• Step 9: Obtain the optimal validity index from the values computed for each (c, m), together with the optimal number of clusters c* and the optimal fuzziness exponent m*; the optimal fuzzy partition U is the one obtained with c* and m*
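The nine steps amount to a grid search: run FCM for each pair (c, m), score the resulting partition with a validity index, and keep the best pair. The sketch below is one possible reading under stated assumptions, not the authors' code. It assumes the Euclidean norm, the Xie-Beni index as the validity index (smaller is better), a fresh random initialization for every pair, and a grid of m that starts above 1 because (III-3b) is undefined at m = 1. The names `fcm`, `xie_beni`, `extended_fcm`, `c_max`, and `m_values` are hypothetical.

```python
import numpy as np

def fcm(X, c, m, tol=1e-5, max_iter=300, seed=0):
    # Compact fuzzy c-means (Euclidean norm); returns U (c, n) and V (c, p).
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)                       # Step 2
    for _ in range(max_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)        # Step 3, (III-3a)
        d2 = np.fmax(((X[None] - V[:, None]) ** 2).sum(axis=2), 1e-12)
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)        # Step 4, (III-3b)
        converged = np.abs(U_new - U).max() <= tol          # Step 5, max norm
        U = U_new
        if converged:
            break
    return U, V

def xie_beni(X, U, V):
    # Compactness divided by n times the smallest squared distance between centers.
    d2 = ((X[None] - V[:, None]) ** 2).sum(axis=2)
    sep = ((V[:, None] - V[None]) ** 2).sum(axis=2)
    sep[np.diag_indices_from(sep)] = np.inf                 # ignore i == j
    return (U ** 2 * d2).sum() / (X.shape[0] * sep.min())

def extended_fcm(X, c_max=6, m_values=(1.2, 1.4, 1.6, 1.8, 2.0)):
    # Steps 6 to 9: score every (c, m) pair and keep the minimizer.
    scores = {}
    for m in m_values:
        for c in range(2, c_max + 1):
            U, V = fcm(X, c, m)
            scores[(c, m)] = xie_beni(X, U, V)
    c_star, m_star = min(scores, key=scores.get)
    return c_star, m_star, scores
```

The inputs should be standardized first (the Buffalo illustration uses z-scored attributes); with small m and unscaled data the power in (III-3b) can underflow.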

Page 13: Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods

Cluster validity indices

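Partition coefficient

$$PC(U) = \frac{1}{n} \sum_{i=1}^{c} \sum_{k=1}^{n} (u_{ik})^{2}$$

Partition entropy

$$PE(U) = -\frac{1}{n} \sum_{i=1}^{c} \sum_{k=1}^{n} \left[ u_{ik} \log_{2}(u_{ik}) \right]$$

Xie-Beni index

$$U_{XB} = \frac{\sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^{2} \, \| x_k - v_i \|_A^{2}}{n \, \min_{i \neq j} \| v_i - v_j \|_A^{2}}$$

SVi index, where w is set to 2 in this study

$S_{VI}$ is a ratio. Its numerator is

$$\sum_{i=1}^{c} \frac{\sum_{k=1}^{n} (u_{ik})^{m} \, \| x_k - v_i \|_A^{2}}{\sum_{k=1}^{n} u_{ik}}$$

Its denominator is a double sum over $1 \le i \le c$ and $1 \le j \le c+1$, $j \neq i$, of weights $\mu_{ij}$ raised to a power involving $w$ and multiplied by distances $\| z_j - z_i \|_A$, where

$$[z_1, z_2, \ldots, z_c, z_{c+1}]^{T} = [v_1, v_2, \ldots, v_c, \bar{x}]^{T}$$

and $\mu_{ij}$ is defined from ratios of the distances $\| z_j - z_i \|_A$ to $\| z_j - z_l \|_A$ summed over $l \neq j$, $1 \le l \le c+1$.

[Equation: the exact exponents in the SVi denominator and in the definition of $\mu_{ij}$ are not recoverable from this transcript]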
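The first three indices on this slide (partition coefficient, partition entropy, Xie-Beni) depend only on the membership matrix U, the data X, and the centers V, so they can be sketched directly. The code below is illustrative and rests on assumptions: the Euclidean norm stands in for the A-norm, the entropy uses base-2 logarithms with 0·log 0 taken as 0, and the function names are hypothetical. The SVi index is left out.

```python
import numpy as np

def partition_coefficient(U):
    # PC(U) = (1/n) * sum of squared memberships; U has shape (c, n).
    return float((U ** 2).sum() / U.shape[1])

def partition_entropy(U, base=2.0):
    # PE(U) = -(1/n) * sum of u * log(u), with 0 * log(0) treated as 0.
    safe = np.where(U > 0, U, 1.0)          # log(1) = 0 neutralizes zero memberships
    return float(-(U * np.log(safe)).sum() / (np.log(base) * U.shape[1]))

def xie_beni(X, U, V):
    # Sum of u^2 * squared distance, divided by n * min squared center separation.
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
    sep = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    sep[np.diag_indices_from(sep)] = np.inf  # exclude the i == j pairs
    return float((U ** 2 * d2).sum() / (X.shape[0] * sep.min()))

# Tiny example: four points, two clusters, crisp versus fully fuzzy memberships.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
V = np.array([[0.0, 1.0], [10.0, 1.0]])
U_crisp = np.array([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])
U_fuzzy = np.full((2, 4), 0.5)
```

For the crisp partition PC is 1 and PE is 0; for the uniform one PC drops to 1/c = 0.5 and PE rises to log2(c) = 1. A larger PC and a smaller PE or Xie-Beni value indicate a cleaner partition.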

Page 14: Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods

Determining c* and m*

• Selected validity indices are calibrated over the study set

• The Xie-Beni index is recommended as a validity index

• Average m* is 1.38

[Figure: index value versus number of clusters c (2 to 15) for UXB, PC, PE, and SVI/100]

[Figure: index value versus fuzziness amount m (1.0 to 1.9) for UXB and SVI/100]

Page 15: Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods

Histogram of m* for FCM

Page 16: Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods

Methodology: illustration

Page 17: Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods

Median home value of Buffalo, NY

Page 18: Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods

Dimensionality of Buffalo housing market

Hedonic regression equation of median home value in Buffalo, NY

| Predictor | Coefficient | Standard Error | t-statistic | p-value |
|-----------|-------------|----------------|-------------|---------|
| Constant | -1455768 | 164417 | -8.85 | 0.000 |
| Per capita income | 2.3667 | 0.2791 | 8.48 | 0.000 |
| % college degree | 88221 | 11346 | 7.78 | 0.000 |
| % family: couple with children | 65735 | 18775 | 3.50 | 0.001 |
| % detached housing unit | -31260 | 5527 | -5.66 | 0.000 |
| Housing age (year) | 692.88 | 80.26 | 8.63 | 0.000 |
| % non-hispanic white | 11186 | 3914 | 2.86 | 0.005 |
| % native born status | 130039 | 31111 | 4.18 | 0.000 |
| Job accessibility | -0.05266 | 0.02227 | -2.36 | 0.019 |

Adjusted R sq = 84.3%

Page 19: Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods

Optimal number of housing submarkets c*, Optimal fuzziness amount m*, Buffalo, NY

| c \ m | 1.0 | 1.1 | 1.2 | 1.3 | 1.4 | 1.5 | 1.6 | 1.7 | 1.8 | 1.9 |
|-------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| 2 | 0.4735 | 0.4570 | 0.4380 | 8.0983 | 10.4115 | 12.5478 | 14.4334 | 16.0634 | 17.4645 | 18.6721 |
| 3 | 0.4136 | 0.3889 | 0.3460 | 0.3385 | 10.7864 | 12.9137 | 14.7939 | 16.4217 | 17.8290 | 19.0553 |
| 4 | 0.7802 | 0.7116 | 0.6080 | 0.5241 | 1.3154 | 6.8837 | 7.4807 | 8.0441 | 8.5632 | 9.0391 |
| 5 | 0.5560 | 0.5622 | 0.5940 | 0.6121 | 0.4683 | 0.3404 | 0.6489 | 0.6850 | 0.7206 | 0.7555 |
| 6 | 0.6223 | 0.7578 | 1.0187 | 0.8173 | 0.6907 | 1.3393 | 1.4074 | 1.4819 | 1.5595 | 1.6382 |
| 7 | 0.8836 | 0.6903 | 0.6881 | 0.6016 | 0.6148 | 0.9515 | 2.4397 | 2.6306 | 2.8317 | 3.0383 |
| 8 | 0.5981 | 0.5888 | 0.5703 | 0.5232 | 0.3992 | 0.7381 | 0.8910 | 1.2388 | 1.2926 | 1.3538 |
| 9 | 0.9645 | 0.6160 | 0.4836 | 0.4866 | 0.8449 | 1.4020 | 1.4198 | 1.8317 | 1.8639 | 1.9161 |
| 10 | 0.7053 | 0.6004 | 0.6619 | 0.5873 | 0.5868 | 1.3465 | 1.5081 | 1.6875 | 1.8215 | 1.8591 |
| c* | 3 | 3 | 3 | 3 | 8 | 5 | 5 | 5 | 5 | 5 |

Cell values are the Xie-Beni index for the given c and m
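Because a smaller Xie-Beni value indicates a better partition, the c* row and the overall optimum can be read off the table by taking minima. The short sketch below transcribes the table values and recovers both; the variable names are illustrative only.

```python
# Xie-Beni values from the Buffalo table: keys are c = 2..10, list positions follow m = 1.0..1.9.
m_values = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9]
xb = {
    2: [0.4735, 0.4570, 0.4380, 8.0983, 10.4115, 12.5478, 14.4334, 16.0634, 17.4645, 18.6721],
    3: [0.4136, 0.3889, 0.3460, 0.3385, 10.7864, 12.9137, 14.7939, 16.4217, 17.8290, 19.0553],
    4: [0.7802, 0.7116, 0.6080, 0.5241, 1.3154, 6.8837, 7.4807, 8.0441, 8.5632, 9.0391],
    5: [0.5560, 0.5622, 0.5940, 0.6121, 0.4683, 0.3404, 0.6489, 0.6850, 0.7206, 0.7555],
    6: [0.6223, 0.7578, 1.0187, 0.8173, 0.6907, 1.3393, 1.4074, 1.4819, 1.5595, 1.6382],
    7: [0.8836, 0.6903, 0.6881, 0.6016, 0.6148, 0.9515, 2.4397, 2.6306, 2.8317, 3.0383],
    8: [0.5981, 0.5888, 0.5703, 0.5232, 0.3992, 0.7381, 0.8910, 1.2388, 1.2926, 1.3538],
    9: [0.9645, 0.6160, 0.4836, 0.4866, 0.8449, 1.4020, 1.4198, 1.8317, 1.8639, 1.9161],
    10: [0.7053, 0.6004, 0.6619, 0.5873, 0.5868, 1.3465, 1.5081, 1.6875, 1.8215, 1.8591],
}

# Best number of clusters for each m (reproduces the c* row of the table).
c_star_by_m = [min(xb, key=lambda c: xb[c][j]) for j in range(len(m_values))]

# Overall minimizer over the whole grid gives (c*, m*).
c_star, j_star = min(
    ((c, j) for c in xb for j in range(len(m_values))),
    key=lambda cj: xb[cj[0]][cj[1]],
)
m_star = m_values[j_star]

print(c_star_by_m)      # [3, 3, 3, 3, 8, 5, 5, 5, 5, 5]
print(c_star, m_star)   # 3 1.3
```

The overall minimum, 0.3385 at c = 3 and m = 1.3, narrowly beats 0.3404 at c = 5 and m = 1.5.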

Page 20: Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods

Buffalo housing submarkets

c* = 3; m* = 1.3

[Figure: cluster means (vertical axis, -1.5 to 1.5) of Cluster 1, Cluster 2, and Cluster 3 across the attribute vector ZPCINCOME, ZCOLLEGE, ZFAMCPCHL, ZHUDETP, ZYRHUBLT, ZNHWHT_P, ZNATIVEBR, ZJOBACM]

[Maps: (A) Membership to Cluster 1; (B) Membership to Cluster 2; (C) Membership to Cluster 3, each shading membership degree in ten classes from 0 to 1; (D) Defuzzified Clusters 1, 2, 3. All maps show interstate highways and mark tracts with no data]

Page 21: Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods

Evaluating the performance of fuzzy clustering

Page 22: Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods

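Compare FCM with K-means (KM)

• Compare the sum of squared error derived from KM (m=1) and FCM (m=m*) given c*

$$\sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^{2} \, \| x_k - v_i \|_A^{2}$$

Paired Samples Statistics

| Pair 1 | Mean | N | Std. Deviation | Std. Error Mean |
|--------|------|---|----------------|-----------------|
| j2_hcm | 1026.546 | 85 | 3848.268377 | 417.4033 |
| j2_fcm | 745.7332 | 85 | 3022.266891 | 327.8109 |

Paired Samples Test (paired differences)

| Pair 1 | Mean | Std. Deviation | Std. Error Mean | 95% CI of the Difference, Lower | 95% CI of the Difference, Upper | t | df | Sig. (2-tailed) |
|--------|------|----------------|-----------------|------|------|---|----|-----------------|
| j2_hcm - j2_fcm | 280.8133 | 915.57126275 | 99.30765 | 83.32912 | 478.2974 | 2.828 | 84 | .006 |

Fuzzy clustering outperforms crisp clustering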
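The reported paired-samples test can be checked from its own summary statistics: the standard error is the standard deviation of the 85 paired differences divided by √85, and the t statistic is the mean difference divided by that standard error. The snippet below redoes this arithmetic with the figures from the slide; the variable names are illustrative.

```python
import math

n = 85                       # metropolitan areas in the study set
mean_hcm = 1026.546          # mean error under K-means (j2_hcm)
mean_diff = 280.8133         # mean of j2_hcm - j2_fcm
sd_diff = 915.57126275       # standard deviation of the paired differences

se = sd_diff / math.sqrt(n)          # standard error of the mean difference
t = mean_diff / se                   # paired t statistic, df = n - 1 = 84
reduction = mean_diff / mean_hcm     # relative drop in mean error with FCM

print(f"SE = {se:.5f}, t = {t:.3f}, df = {n - 1}, reduction = {reduction:.1%}")
```

The recomputed values agree with the table (standard error about 99.31, t about 2.828), and the mean error under FCM is roughly 27% lower than under K-means.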

Page 23: Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods

Conclusions

• Fuzzy set theory provides a mechanism for handling the uncertainty involved in classification tasks

• The fuzzy c-means algorithm is of practical use in delineating housing submarkets

• Fuzzy set theory needs further attention in social science fields

• More work on the choice of parameters is needed