IEEE Paper Template in A4 (V1) - University of Bristol · Web viewIn total, 9 models are specified....

Comparing Multilevel Modelling and Artificial Neural Networks in House Price Prediction

Yingyu Feng1, Kelvyn Jones2

School of Geographical Sciences and Centre for Multilevel ModellingThe University of Bristol, the United Kingdom

[email protected]@Bristol.ac.uk

Abstract— Two advanced modelling approaches, Multi-Level Models and Artificial Neural Networks are employed to model house prices.approaches and the standard Hedonic Price Model are compared in terms of predictive accuracy, capability to capture location information, and their explanatory power. These models are applied to 2001-2013 house prices in the Greater Bristol area, using secondary data from the Land Registry, the Population Census and Neighbourhood Statistics so that these models could be applied nationally. The results indicate that MLM offers good predictive accuracy with high explanatory power, especially if neighbourhood effects are explored at multiple spatial scales.

Keywords— House Prices, Multilevel Modelling, Artificial Neural Networks, Predictive Accuracy

I. INTRODUCTION

“Location, location, location” is frequently used by real estate agents when marketing residential properties. Numerous studies have demonstrated the importance of location and the surrounding neighbourhoods of properties in price determination. However, some neighbourhood attributes may be unobservable, and it remains an open question of how best to capture locational information in house price modelling. Here we present two advanced quantitative approaches, Multi-Level Models (MLM) and Artificial Neural Network (ANN) and compared their predictive accuracy, capability to capture location information, and explanatory power against the baseline, widely-deployed, the standard Hedonic Price Model (HPM). There is no published work to date comparing ANN with MLM. The use of a much larger dataset than previous publications is another important contribution of this study.

The rest of the paper is organized as follows. Section 2 provides a schematic specification of each modelling approach and a review of previous studies utilising ANN and MLM. Section 3 considers how the models are operationalised, in terms of study area and data, scenarios to capture locational information, and performance measures for competing models. The results are presented in section 4 with the conclusions in section 5.

II. SPECIFICATION OF HPM, MLM AND ANN

A. The Hedonic Price ModelThe standard Hedonic Price Model (HPM) regresses the

house prices on a range of its constituent attributes (e.g. number of bedrooms, type of house, and distance to the city centre ) and is usually calibrated using the ordinary least squares (OLS) method. It has become a popular approach to modelling house price as it is easy to apply and can reveal the comparative size of effects of various property and neighbourhood characteristics on house prices. There are numerous studies using HPM to model house price (e.g. [1]-[4]) and a number of reviews have also been conducted (e.g. [5]-[8]).

Although this approach has been widely used, it does not recognize that property and neighbourhood characteristics are in fact measured at different scales. Moreover, all the unexplained variation is assumed to be between individual houses. The number of houses is not distinguished from the number of neighbourhoods and the standard errors of the measured neighbourhood variables will be under estimated. There is a considerably increased chance of committing a type 1 error. Multilevel models are designed to analyse such hierarchies and do not suffer these problems.

The HPM’s parametric nature is also questioned and semi- or non-parametric models have been proposed, including ANN as one of these non-parametric approaches such as [9]-[12]. We now examine the MLM and ANN in more detail.

B. Multilevel Model MLM combines the micro-level equation, describing the

within-neighbourhood between-house relationship for individual properties and the macro-level equation, the between-neighbourhood relationship, into one model. The equation can be expressed as:

y ij=β0+β1 x ij+¿)Where y ij and x1 ijrepresent the house price and house

characteristic for level-1 house i in level-2 neighbourhood j (for example, house type), respectively. The parametersβ0∧β1 are respectively the mean price of non-detached houses (x1 ij=0) and the mean price differentials for detached houses (x1 ij=1) averaged across all neighbourhoods. . The terms u0 j and u1 j represent the unexplained price differentials of neighbourhood j for non-detached and detached houses, and e ij represents the differential price for house i in neighbourhood . It is assumed that the house-level

residuals are mutually independent and follow a normal distribution e ij ~ N (0 , σ e

2) and the neighbourhood-residuals follow the distributional assumptions denoted as:

[u0 j

u1 j] N ¿)

The variance σ u 02 summarises the extent to which the non-

detached house price differential varies from place while σ u12

summarises the extent to which the differentials for detached houses differ between neighbourhoods. The covariance assesses the extent to which a neighbourhood that is expensive for non-detached properties is differentially expensive for detached properties. These random effects are unmeasured or latent variables representing the differential ‘desirability’ of a neighbourhood for non-detached and detached property. This allows the unobserved neighbourhood characteristics to play a role ‘behind the scenes’ in the effects of the observed variables on house prices ([13]). We can estimate these latent effects in the multilevel model and use them in our predictions. The local estimates of residuals u0 j and u1 j are precision-weighted and potentially “shrunken” towards the overall mean relationship for all neighbourhoods ([14]). This means that the MLM estimates of the neighbourhood effects are reliable and can be used to improve predictions of prices by including these robust estimates of the un-measured effects.

The model can include further house and neighbourhood characteristics and the standard 2-level model can be easily extended to more than two levels, or used for non-hierarchical data where two types of higher-level units overlap with each other in a cross-classified model.

There is only limited work employing MLM in house prices research and no published work to date comparing it with other modelling approaches either conceptually or empirically in house price prediction. Most of the studies examined the effects of house and various neighbourhood characteristics on house prices (e.g. [15]-[20]) It has also been used as a tool to segment house submarkets ([21], [22]) and model spatiotemporal heterogeneity in house prices ([16], [23]).

C. Artificial Neural NetworksANN is a form of artificial intelligence that consists of a

number of interconnected processing elements, or neurons, that mimic the functions of biological neurons to process information in parallel ([24]). One of the most popular ANNs is Multi-Layer Perceptron (MLP) as it is capable of mapping the non-linear relationships within the data. MLP typically contains an input layer, an output as the last layer, and one or more hidden layer(s) between the input and output layer. Each layer consists of a number of ‘neurons’, which are interconnected to the neurons in the immediate adjacent layers by a set of weights, representing the strength of connection. A Back-propagation (BP) training algorithm as in [25] is commonly used to train the ANNs, which calculate the errors between the predicted output and the target output (actual

house price) and back-propagate the error to adjust the connection weights between the neurons in adjacent layers with the aim to find the optimal connections between the neurons that best map the relationships between inputs (e.g. various attributes of individual houses and neighbourhoods) and output (e.g. house prices). Once the relationship is established, the networks can be used to predict house prices. We use a feed-forward ANN where neurons are only connected forwards and there are no connections back as feedforward is best suited (as argued in [26]) for modelling relationships between a set of inputs and one or more output variables.

The application of ANN in real estate started in 1990s. Almost all the studies compared ANN with multiple regression calibrated with OLS estimation, essentially the standard HPM model, with few exceptions ([27], [28]). Some found that ANN is superior to standard HPM in terms of the predictive accuracy (e.g. [29]-[34]), but others found that this was not necessarily so (e.g.[11], [12], [35]).

There is also no consensus as to whether ANN is more suitable for larger or smaller datasets. Reference [30] and [32] have showed that a larger training set enhance predictive capability since this approach excels at ‘learning’ from the known data to estimate the ‘unknown’. But [12] prefers ANN for smaller datasets as it predicts better and is less influenced by outliers. All the studies to date have used relatively small data set (less than 6,000 observations) with only two exceptions as in [30] with 46,467 samples and [36] with 16,366 samples. The present study serves as a large-scale empirical study using the largest dataset deployed in an ANN approach to house price predictions.

III. DATA AND SCENARIOS

A. Setting and DataThis study selects Greater Bristol as the study area as it has

a diversified urban and rural mixture and includes affluent suburbs and former social, local-authority owned housing. It also has a mixture of different property types. The area has a population of approximately 1.08 million people in 2009 and covers around 1,342 km2 of land.

The house prices and property attributes are from the Land Registry of England and Wales. Each record contains the actual sold price, the date of the sale, the address, the unit postcode, property type, duration (leasehold or freehold), and whether the property is newly-built at the time of the sale. The location of each sale is geocoded based on the unit postcode of the property to obtain its Ordnance Survey National Grid references so that the effect of location on house prices can be modelled. The geographical locations of sold properties in 2001-2013 in Greater Bristol are displayed in Fig. 1.

Before obtaining the neighbourhood characteristics, it is necessary to define the neighbourhood. This is also an essential step in the MLM design which requires the higher-level units to be pre-specified. We decided to use 2001 census Output Areas (OAs) as the finest neighbourhood level because OAs are designed to be homogenous in terms of household

tenure and property type. As house prices may be determined by processes operating at a variety of spatial scales, we treat neighbourhood as having multiple scales where OAs are seen as nested in lower layer super output areas (LSOAs) and then in middle layer super output areas (MSOAs), forming a strictly nested structure. The usage of these off-the-shelf geography units allows the methodology to be applied nationally and routinely without time consuming and expensive research in the submarket delineation. The neighbourhood characteristics are abstracted from 2001 census data and most of them are measured at the OA level except the Index of Multiple Deprivation scores (IMD) (measured at LSOA level) and neighbourhood median income (estimated at MSOA level).

!

!!

!

!

!

!!!

!

!

!

!

!

!

!

!!

!

!

!

! !

!

!

!!

!

!!

!

!!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

! !

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

! !

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

! !

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

! !

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

! !

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

! !

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

! !

!

!

!

!

!

!

!

!

!

!

!

!!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

! !!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

! !

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!!

!

!

!

!

!

!

!

!

!

!

!

! !

!

! !

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

! !

!

!

!!

!

!

!

!

!

!

!

! !

!

!

!

!

!

!

!

!

!!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

! !

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

! !

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

! !

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

! !

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

! !

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

! !

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!!

!!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

! !

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!!

!

!!

!

!

!

!

!

!!

!

!

!!

!

!

!!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!!

!

!

!

!!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

! !

!!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

! !

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

! !

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

! !

!

!

!!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

! !

!

!

!

!

!

!

!

!!

!

!!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!!

!

!

!

! !

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

! !

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!! !

!

! !

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

! !

!

!

!

!

!

! ! !

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

! !

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

! !

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

! !

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

! !

!

!

!

!

!

!

!

!

!

!

!!

!

!!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

! !

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

! !

!

!

!

!

!

!

!

!

!

!

!

!!!

!

!

!

!

!

!

!

!

!!

! !

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

! !

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

! !

!

!

!

!

!

!

! !

!

!

! !

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!!

!

!

!!

!!!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!!

!

!

!!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

! !!

!

!

!

! !

!

!

!

!

!!

!

!

!!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!! !

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!!

!

!

!

!

!

!

!

! !

!!

!!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

Fig. 1 The geographical locations of sold properties in 2001-2013

In total 65,302 house sales were selected as the full sample with up to 20 sales randomly drawn from each OAs. These houses are nested in 3320 OAs, which in turn are nested in 653 LSOAs and 137 MSOAs. This is the biggest sample used in applying ANN to model house prices and should provide a good empirical test of the suitability of ANN for large datasets.

The total sample are split into training set (sales from 2001 to 2012) and testing set (2013). The training set is used for model calibration and then predictions are made for the testing set. This design allows the examination of house price changes over a relatively long period and provides a stringent and demanding test of the models in a period of substantial market turbulence. We have mapped the actual mean house price in 2013 by OAs (Fig. 2) and the predicted errors for the same period will be presented in section IV.

B. Scenarios Three scenarios are designed with different representations

of space and place. In all scenarios, a 4-level hierarchical structure (houses nested in OAs, which in turn nested in LSOAs, and then in MSOAs) is used for MLM. There is no neighbourhood delineation indicator used in HPM or ANN as they are not capable of handling a large number of binary variables required for practical specification.

´Legend<£100K

£100-400K

£401-500K

£501-700K

>700K

Fig. 2 Average 2013 actual price by OAs in Greater Bristol

1) Scenario 1: Property Characteristics OnlyHouse characteristics and the time of the sale are included

in this scenario as predictors. The property characteristics are specified as full interactions representing 16 combinations of property type. Time of the sale is included as a third order polynomial to capture the time trend. No locational or measured neighbourhood characteristics are included in this scenario.

2) Scenario 2: Property Characteristics and Grid References

Building on scenario 1, the Grid references of property location (Easting and Northing) are included as additional explanatory variables. The polynomial expansion of the coordinates in the model is also possible, which effectively is a trend surface analysis ([37]), which is extended by [14] to a multilevel approach. This scenario allows us to test whether neighbourhood still matters when the locations of properties are included in the model.

3) Scenario 3: Property and Neighbourhood Attributes In this scenario the Grid references are replaced by

measured neighbourhood characteristics to examine whether neighbourhood still matters when neighbourhood and property characteristics are accounted for and whether locational information is better represented by its absolute value or by neighbourhood attributes. To facilitate model calibration and interpretation, all continuous explanatory variables are group-mean-centred so the intercept term represents the reference property category in a typical neighbourhood.

Given that there are 20 neighbourhood variables, we have examined the predictors for multicollinearity by calculating variance inflation factors (VIFs) ([38]). Generally VIF greater than 10 indicates potentially damaging multicollinearity. In this set of neighbourhood characteristics, all the VIF are below 10 and the majority are below 5, with the VIF for 2004 IMD score (VIF=8.89) and proportion of people have no qualifications (VIF=7.38) being the highest.

C. Performance MeasuresIn order to reach a balanced view of a model’s

performance, we have compared the model in terms of their goodness-of-fit, predictive accuracy and their explanatory power. R2 is used to as the goodness-of-fit measure, Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) as the accuracy measures. We have also considered the face validity of the models, and we are particularly concerned with which variables are the most influential in determining the predicted prices. The spatial distribution of the residuals are also examined by graphical presentation and Moran’s Index.

IV. EMPIRICAL RESULTS AND DISCUSSION

In total, 9 models are specified. HPM and MLM are fitted in the software package MLwiN 2.32 ([39]) ANN models are calibrated using IBM SPSS package version 21 ([40]) as this implementation accepts categorical variables without the need for manual transformation and has the inbuilt feature to automatically select the number of hidden neurons. Due to limited space, only the comparison results are presented here. Detailed model results are available upon request.

A. Goodness-of-fit The comparison of goodness-of-fit are presented in Table Ⅰ

. The R2of HPM and ANN increase sequentially from scenario 1 to 3 for both in-sample and hold-out samples, but R2 of MLM hardly changes between different scenarios. This indicates that the locations are better represented by neighbourhood characteristics than grid references for HPM and ANN. For MLM, because the influence of measured locational and neighbourhood information has already been represented by the multiple levels of latent place effects, the inclusion of grid references and measured neighbourhood attributes does not t improve the goodness-of-fit.

TABLE Ⅰ COMPARISONS OF GOODNESS-OF-FIT

R2 (training set) R2 (test set)HPM1 0.39 0.23MLM1 0.75 0.75ANN1 0.39 0.23HPM2 0.43 0.3MLM2 0.75 0.75ANN2 0.41 0.26HPM3 0.68 0.65MLM3 0.75 0.74ANN3 0.69 0.67

B. Predictive AccuracyThe accuracy comparisons are summarized Table Ⅱ , based

on the testing set, 4141 properties sold in 2013. This shows how well these modelling strategies extrapolate to new set of data based on historical house prices, at a time of a turbulent market environment. Similar conclusions are reached as in the goodness-of-fit comparison. The predictive accuracy improve

in HPM and ANN sequentially from scenario 1 to 3, while the predictive accuracy of MLM hardly changes. MLM remains to be the best performer in each scenario with an average error

of £49k between the predicted and actual house prices (around 18% of the actual prices). This demonstrates that the extrapolation capability of MLM from modelled data to newly unknown data is reasonably high, even when the housing market has been very turbulent in the model calibration period 2001-2012.

TABLE Ⅱ

COMPARISON OF PREDICTIVE ACCURACY

Test set

MAE (lnP)

MAPE (lnP)

MAE(raw price)

MAPE (raw price)

HPM1 0.319 5.89% 80.4 30.9%MLM1 0.178 3.29% 48.6 17.5%ANN1 0.318 5.85% 80.1 30.0%HPM2 0.304 5.61% 77.0 29.4%MLM2 0.178 3.29% 48.6 17.5%ANN2 0.313 5.76% 79.0 29.8%HPM3 0.210 3.89% 25.3 20.7%MLM3 0.178 3.30% 48.8 17.6%ANN3 0.216 4.00% 55.7 20.9%

C. Model Explanatory Capability We have also compared the explanatory capacity of each

modelling approach, that is their ability to reveal the nature of the estimated relationships and what still remains unexplained. We only presented the results for scenario 3 where ANN and HPM achieve the best goodness-of-fit and predictive accuracy.

Table gives the size of effect of each neighbourhood variable and the 16 house types with for the range from the minimum to the maximum in thousands £. The relative importance of each input of ANN is also compared. ANN is traditionally seen as rather opaque in revealing the relationships between the inputs and the output. But the SPSS implementation produces the relative importance for each input variable using sensitivity analysis computed from the in-sample data.

It can be seen that the average size of the property in OA, property characteristics and the time of the sale are very important predictors producing differences of over £100k These are similar to the ANN results, but there are also some major differences for some other predictors such as the percentage of old people and the 2004 IMD crime score between the two modelling strategies. It should be noted though that importance measure of ANN does not indicate the ‘direction’ of the effects and there are no confidence intervals on the estimates. The exact effects are still very hard to quantify, especially for categorical variables such as the type of property in ANN.

Another advantage of MLM is that it reveals how much remains to be explained and at what level. Fig. 3 shows the percentage unexplained for a series of MLM models. In the null model where no predictor is included, the total unexplained variance is 100%, which is partitioned into 4

levels. The majority of the price variation is at the macro MSOA level, meaning this macro geography play an important role in influencing house prices. The inclusion of the property characteristics brings an improvement in the model and the unexplained variance reduces to 65%. Finally by including 20 neighbourhood predictors, there is a substantial fall in the unexplained variance to 34% and the majority of this now lies between properties. The macro MSOAs remain to be the most important neighbourhood scales. This means that some properties and some neighbourhoods are desired over and above the attributes that we are able to measure and that latent variable, that is multilevel modelling, is needed and brings benefits.

TABLE ⅢEFFECT SIZE OF PREDICTORS BY MLM AND ANN IN SCENARIO 3

Variable Min(£k)

Max(£k)

Range(£k) Level MLM

rankANN Rank

Room 121 323 202 OA 1 1

16 Types 121 311 190 House 2 4,12,14

Year 87 191 105 House 3 2Degree 164 252 88 OA 4 3Old 175 246 71 OA 5 19Flat 178 227 48 OA 6 6IMD04 154 196 42 LSOA 7 5No_edu 162 202 40 OA 8 9Black 150 187 37 OA 9 15Unemply 175 212 37 OA 10 7Det 160 194 34 OA 11 11Occupy 184 211 27 OA 12 16Young 171 197 26 OA 13 8Pri_rent 184 208 24 OA 14 18LnIncom 174 197 23 MSOA 15 13IMDbar 177 200 22 LSOA 16 22IMDenv 183 195 12 LSOA 17 23Green 184 194 10 OA 18 20Noheat 179 188 9 OA 19 21Soc_rent 179 188 9 OA 20 17Terrace 183 188 4 OA 21 24IMDcrime 184 189 4 LSOA 22 10

D. Spatial Assessment of Model Residuals In order to examine whether there is any spatial clustering

pattern of the residuals from the models, we present the predicted error percentage of the actual price to Fig. 4 for the results produced by ANN and MLM in scenario 3. We have selected to only presented the over- and under-prediction by more than 25% (representing around 10% on either end) as it will give us a much clearer picture and we are more concerned about the geographical distribution of the overly inaccurate predictions. The patterns of residuals from the two models are broadly similar, but there are fewer houses in each group (<-25%, 25-50% and >50%) predicted by MLM than by ANN model. There does not appear to be any obvious clustering, but the Moran’s I statistics indicate there are still signification positive autocorrelations between the residuals in all three

models in scenario 3. We are exploring spatial multilevel models as in [41] to take account of this dependency.

Perc

ent U

nexp

lain

ed

Null +Property +Neighbourhood

0

20

40

60

80

100

Total Prop OA LSOAMSOA 0

20

40

60

80

100

Total Prop OA LSOAMSOA 0

20

40

60

80

100

Total Prop OA LSOAMSOA

Fig. 3 Percentage unexplained in log house prices in MLM model

´LegendANN3 predicted error%

<-25%

25-50%

>50%

´Legend

MLM3 predicted error%<-25%25%-50%>50%

Fig. 4 Predicted error percentage by ANN and MLM in Scenario 3

V. CONCLUSIONS

This paper compared MLM and ANN approach to modelling housing prices with the widely accepted HPM approach in terms of goodness-of-fit, predictive accuracy and explanatory power. Neither ANN nor HPM is capable of including neighbourhood delineation variables in the model due to the large number of variables required for specification, while MLM is able to specify by simply defining them as macro-level units at multiple scales. All performance measures show that MLM is superior to ANN and HPM in

each scenario, indicating that specification of neighbourhood is helpful in house price predictions, even when the locations or neighbourhood characteristics have been included in the model. MLM also indicates how much variance remained to be explained and at what level, providing further insights on the importance of neighbourhoods. This study has also demonstrated that ANN can be applied to large dataset, but its predictive accuracy is not necessarily better than standard HPM. This is contrary to many other studies that find ANN is superior to HPM.

Further studies could extend it in a number ways, such as allowing each neighbourhood to have its own time trend, or permitting the effects of property characteristics to be differential between neighbourhoods (random slopes model); including cross-level interactions between property and neighbourhood characteristics; and alternative definitions of neighbourhood. While such developments would not necessarily bring improved explanatory understanding, they are likely to bring improved predictive accuracy. The multilevel approach is capable of more fully exploiting the data on both properties and neighbourhoods to achieve better predictions, to understand what is driving the predictions and pointing to what scale the unexplained elements are to be found.

ACKNOWLEDGMENT

This work was supported by the Economic and Social Research Council [grant number EDUC.SC3325]. The house price paid data covers the transactions received at Land Registry in the period 1st Jan 2001 to 31st Dec 2013. © Crown copyright 2013.

REFERENCES[1] F. Kong, H. Yin and N. Nakagoshi, “Using GIS and landscape metrics

in the hedonic price modeling of the amenity value of urban green space: A case study in Jinan City, China,” Landscape and Urban Planning, vol. 79, pp. 240–252, 2007.

[2] L. Anselin and J. Le Gallo, “Interpolation of Air Quality Measures in Hedonic House Price Models: Spatial Aspects,” Spatial Economic Analysis, vol. 1(1), pp. 31–52, 2006.

[3] A. S. Adair, J.N. Berry and W.S. McGreal, “Hedonic modelling, housing submarkets and residential valuation,” Journal of Property Research, vol. 13(1), pp. 67–83, 1996

[4] M.M. Li and H.J. Brown, “ Micro-Neighborhood Externalities, Micro-neighborhood Prices, Hedonic Housing,” Land Economics,, vol. 56(2), pp. 125–141, 1980.

[5] A. M. Freeman, "Hedonic Prices, Property Values and Measuring Environmental Benefits: A Survey of the Issues," The Scandinavian Journal of Economics, vol. 81(2), pp. 154–173, 2011.

[6] K.W. Chau, C. Y. Yiu, S.K. Wong and L.W. Lai, "Hedonic Price Modelling of Environmental Attributes: A Review of the Literature and a Hong Kong Case Study," Encyclopaedia of Life Support Systems (EOLSS) Oxford, UK: UNESCO, 2004, vol. 2.

[7] S.G. Sirmans, D.A. Macpherson and E. N. Zietz, "The Composition of Hedonic Pricing Models," Journal of Real Estate Literature vol. 13(1), pp. 1-44, 2005.

[8] S. Malpezzi, “Hedonic Pricing Models: A Selective and Applied Review”, in Housing Economics and Public Policy, T. O’Sullivan and K. Gibb (eds.), Malder, MA: Blackwell, 2003.

[9] O. Bin, “A prediction comparison of housing sales prices by parametric versus semi-parametric regressions,” Journal of Housing Economics, vol. 13, pp. 68–84, 2004.

[10] A. Páez and D.M. Scott, “Spatial Statistics for Urban Analysis: a Review of Techniques with Examples,” GeoJournal, vol.61(1), pp. 53–67, 2004.

[11] S. McGreal, A. Adair, D. McBurney and D. Patterson, “Neural Networks: the Prediction of Residential Values,” Journal of Property Valuation and Investment, vol. 16(1), pp. 57–70, 1998.

[12] P. Rossini, “Application of Artificial Neural Networks to the Valuation of Residential Property,” in 3rd Annual Pacific-Rim Real Estate Society Conference Palmerston North, New Zealand, 20th-22nd January. 1997

[13] T. Snijders and R. Bosker, Multi- level Analysis: An Introduction to Basic and Advanced Multilevel Modeling, Thousand Oaks, CA: Sage, 1999.

[14] K. Jones and N. Bullen, “Contextual Models of Urban House Prices : A Comparison of Fixed- and Random-Coefficient Models Developed by Expansion,” Economic Geography, vol. 70(3), pp. 252–272, 1994.

[15] K. Brown and B. Uyar, “A Hierarchical Linear Model Approach for Assessing the Effects of House and Neighborhood Characteristics on Housing Prices,” Journal of Real Estate Practice and Education, 7(1), pp. 15–24, 2004.

[16] M. Habib and E Miller, “Influence of Transportation Access and Market Dynamics on Property Values: Multilevel Spatiotemporal Models of Housing Price,” Transportation Research Record: Journal of the Transportation Research Board, vol. 2076(-1), pp. 183–191, 2008.

[17] W. Shin, J. Saginor and S Zandt, “Evaluating Subdivision Characteristics on Single-Family Housing Value Using Hierarchical Linear Modeling,” Journal of Real Estate Research, vol. 33(3), pp. 317–348, 2011.

[18] C. Chasco and J. Gallo, “The Impact of Objective and Subjective Measures of Air Quality and Noise on House Prices:A Multilevel Approach for Downtown Madrid,” Economic Geography, vol. 89(2), pp. 127–148, 2012.

[19] W. Brunauer, S. Lang and N. Umlauf, “Modelling house prices using multilevel structured additive regression,” Statistical Modelling, vol. 13(2), pp. 95–123, 2013.

[20] B. Uyar and K. Brown, “Neighborhood Affluence, School-Achievement Scores and Housing,” Journal of Housing Research, vol. 16(2), pp. 97–116, 2007.

[21] A. Goodman and T. Thibodeau, “Housing market segmentation and hedonic prediction accuracy,” Journal of Housing Economics, vol. 12(3), pp. 181–201, 2003.

[22] A. Goodman and T. Thibodeau, “Housing Market Segmentation,” Journal of Housing Economics, vol. 7(2), pp. 121–143, 1998.

[23] C. Leishman, “Spatial Change and the Structure of Urban Housing Sub-markets,” Housing Studies, vol. 24(5), pp. 563–585, 2009.

[24] M. Caudill, “Neural Network Primer: Part III,” AI Expert, vol. 3(6), pp. 53-59, 1988.

[25] D. Rumelhart, G. Hinton, and R. Williams, “Learning Representations By Back-Propagating Errors,” Nature, vol. 323(6088), pp. 533–536, 1986

[26] R. Callan, The Essence of Neural Networks, London: Prentice Hall Europe, 1999

[27] W. McCluskey, P. Davis, M. Haran, M. McCord and D. McIlhatton, "The potential of artificial neural networks in mass appraisal: the case revisited", Journal of Financial Management of Property and Construction, vol. 17(3), pp. 274 – 292, 2012.

[28] W. McCluskey, M. McCord, P. Davis, M. Haran and D. McIlhatton, “Prediction accuracy in mass appraisal: a comparison of modern approaches,” Journal of Property Research, vol. 30(4), pp. 239–265, 2013

[29] S. Peterson and A Flanagan, “Neural Network Hedonic Pricing Models in Mass Real Estate Appraisal,” Journal of Real Estate Research, vol. 31(2), pp. 147–164, 2009.

[30] V. Limsombunchai, C. Gan and M. Lee, “House Price prediction: Hedonic Price Model vs.Artificial Neural Network,” American Journal of Applied Sciences, vol. 1(3), pp. 193–201, 2004.

[31] N. Nguyen and A.Cripps, “Predicting Housing ValueL A Comparison of Multiple Reression Analysis and Artificial Neural Networks,” Journal of Real Estate Research, vol. 22(3), pp. 313–336, 2001.

[32] A. Cechin, A. Souto and M. Gonzalez, “Real estate value at Porto Alegre city using artificial neural networks”, in Proc. 6th Brazilian Symposium on Neural Networks, 2000.

[33] W. Mccluskey and S. Anand, “The Application of Intelligent Hybrid Techniques for the Mass Appraisal of Residential Properties,”Journal of Property Investment & Finance, vol. 17(3), pp.218–239, 1999.

[34] Q. Do and G. Grudnitski, “A neural network approach to residential property appraisal,” The Real Estate Appraiser: pp.38-45, 1992.

[35] E. Worzala, M. Lenk and A. Silva, "An Exploration of Neural Networks and Its Application to Real Estate Valuation," The Journal of Real Estate Research, (January), pp.185–202, 1995.

[36] J. Zurada, A. Levitan and J. Guan, “A Comparison of Regression and Artificial Intelligence Methods in a Mass Appraisal Context,” Journal of Real Estate Research, vol. 33(3), pp.349–387, 2011.

[37] D. Unwin, “The areal extension of rainfall records,” Journal of Hydrology, vol. 7, pp. 404–414, 1969.

[38] D. Belsley, E. Kuh, R. Welsch. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, New York: John Wiley and Sons, 1980

[39] Rasbash, J., Charlton, C., Browne, W.J., Healy, M. and Cameron, B. 2009. MLwiN Version 2.1. Centre for Multilevel Modelling, University of Bristol.

[40] IBM SPSS Statistics for Windows, Version 21.0. IBM Corp. Released 2012. Armonk, NY: IBM Corp.

[41] K. Jones and S. V. Subramanian, Developing multilevel models for analysing contextuality, heterogeneity and change, Vol. 2, 2013.

IEEE Paper Template in A4 (V1) - University of Bristol · Web viewIn total, 9 models are specified....

Documents

Transcript of IEEE Paper Template in A4 (V1) - University of Bristol · Web viewIn total, 9 models are specified....