Gary Grudnitski - Valuations of Residential Properties


Economics, Finance, and Business

G6.4 Valuations of residential properties using a neural network

Gary Grudnitski

Abstract

With the advent of large computerized databases, computational techniques are being

relied on more frequently to estimate residential property values. As an alternative to the

most commonly used computational technique of multiple regression, this application

describes how a neural network was applied to estimate the selling price of single-

family residential properties in one area of a large California city. For the holdout

sample of 100 properties, the average absolute difference between the actual selling price

and the estimated selling price generated by the neural network was 9.48%. In terms

of comparative accuracy, the network was able to achieve, on average, more accurate

valuations of properties than the multiple regression model in the holdout sample. The

network also produced more accurate valuations than the multiple regression model for

57 out of the 100 residential properties in the holdout sample.

G6.4.1 Design process

Accurate, economical and justifiable valuation of residential property is of great importance to mortgage

holders who wish to value their portfolios, to prospective lenders who are contemplating the issuance of

new mortgages, and to local government authorities who must know the worth of their tax base. As large

computerized databases become increasingly common, computational techniques, especially multiple

regression, are being relied on more frequently to assess residential property values.

Residential property, like many other commodities, can be viewed as a bundle of attributes. A problem

in valuing residential property exists, however, because the prices of a property's individual components

are both unobservable and devoid of an implicit market. Empirically, the choice of pricing equations that

value a property's individual components often appears to be dictated by the nature of the available data

and the tendency of those providing the estimates to fixate on goodness of fit criteria. On one hand,

this is understandable because pricing equations for residential property represent, in reduced form, an

interaction between both supply and demand, and thus make the specification of an exact functional form

difficult. On the other hand, however, housing price estimates that critically depend on the functional form

chosen can be negatively impacted by this imprecision in the specification of pricing equations.

In an attempt to mitigate the negative effects on estimates of property values due to imprecision in the

specification of the valuation equation, what follows is a description of how a standard backpropagation

neural network (Rumelhart and McClelland 1986; see section C1.2) is applied to estimate the selling price of single-family

residences. To measure the relative performance of the network, prices produced are compared to

estimations generated by a multiple regression model.

© 1997 IOP Publishing Ltd and Oxford University Press Handbook of Neural Computation release 97/1 G6.4:1

Table G6.4.1. An example of data downloaded from the MLS describing a sales transaction.

PT1 SINGLE FAMILY-DETACHED 08/28/93 03:49 PM
LP: 189,000 STATUS: SOLD MT: 82 LD: 01/12/93 XD: 06/12/93 REF# 69
SP: 186,000 OLP: 189,000 FIN: OMD: 05/27/93 LNO: 93 6000909
AD: 13131 OLD WEST DR ZIP: 92129 APN: 3151703700
MC: 30F3 XST: TED WMS PRWY COM: RP NCD: CRM YB: 1987 ZN: NONE
BR: 3 OPBR: BATH: 2.5 ESF: 1638 SSF: ASSES TRM:
LR : 11X17 FP : F PTO: SLAB HOF: 101 TLB: 0
DR : 10X10 TV : C EXT: STUCO HFP: MONTHLY TD1: 0
FAM: 12X14 R/O: R/O ELEC RF : CNSHK HFI: GTC IN1: 0.0 AS1:
KIT: 11X10 DW : DISHWASH SWR: SEWER OF : 0 LT1:
MBR: 13X20 MW : MICRO BI SPA: NONE OFP: NONE KNO TD2: 0
BR2: 11X10 TC : IRR: SPRINKLE TOF: NONE KNO IN2: 0.0 AS2:
BR3: 11X11 HT : FAG FLR: SLAB LDY: GAR LT2:
BR4: 0 WH : ALU: NONE KNO LSZ: 8500 AST: NONE KNOWN
BR5: 0 SEC: EQPT OWN GUEST: NONE ACS: 0.00 BF : NONE KNOWN
XRM: 0 VU : NK AGEREST: NONE LSF: 0 EQP: D,E,F,G,K
STY: 2 STO PL: YES CL: CFA PKG: 2G
REMARKS: THIS PLAN 3 CAMBRIDGE HAS IT ALL! MINT CONDITION WITH NEW BERBER CARPET
NEW WINDOW TREATMENTS, NEW FLOORING IN BATHROOMS SEC SYS, 2 PATIOS, PATIO COVER,
BUILT-IN GAS BRICK BBQ, SOFT WTR SYS, REFINISHED KITCHEN CABINETS, LANDSCAPED WITH
AUTO SPRINKLERS. SHOWS TERRIFIC! GATE CODE * 0289

G6.4.1.1 Description

Source data representing the sale of a residential property were obtained from the San Diego Board of

Realtors multiple listing service (MLS). For this application, data on single-family homes sold during

1992–93 in Rancho Penasquitos, a northern suburb of San Diego County, California, were electronically

downloaded. A typical entry for one of these properties is shown in table G6.4.1.

From the downloaded MLS residential sales data, a parser, written in C, extracted the following nine

descriptors for each property (these descriptors are shown in bold in table G6.4.1): SP is the actual selling

price, YB is the age of the structure in years, derived by subtracting the year the house was built from

1992, BR is the number of bedrooms, BATH is the number of bathrooms in increments of 1/4 baths, ESF

is estimated total square footage of the house, LSZ is the lot size measured in square feet, STY is the

number of stories, PL/SPA indicates whether a pool or spa existed (1 if so, 0 otherwise) and PKG is the number of

car-garages.

For the sample, descriptive statistics for the continuous variables are presented in table G6.4.2. In

addition, for the PL/SPA variable, 31% of the houses in the sample had either a pool or spa. Data from

the parser were then passed to an Excel spreadsheet. Using the spreadsheet, each of the values of the

variables was normalized according to equation (G6.4.1) and output to the neural network software.

$$ i_{\mathrm{norm}} = \frac{i - \mathrm{min}}{\mathrm{range}} \qquad \text{(G6.4.1)} $$

where $i_{\mathrm{norm}}$ is the vector of normalized values of the variable, $i$ is the vector of original values of the

variable, min is the minimum original value of the variable, and range is the range of the original values

of the variable.
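The normalization in equation (G6.4.1) can be illustrated with a short Python sketch. This is not the spreadsheet procedure used in the application; the function name and the handling of a constant column are assumptions made for illustration. The three lot sizes are the minimum and maximum from table G6.4.2 and the LSZ value from table G6.4.1.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] using (i - min) / range, as in equation (G6.4.1)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Assumption: a constant column carries no information, so map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Lot sizes in square feet: sample minimum, the example listing, sample maximum.
lot_sizes = [3746, 8500, 44866]
normalized = min_max_normalize(lot_sizes)
```

The smallest lot maps to 0, the largest to 1, and the example listing falls at (8500 - 3746)/41120 of the way between them.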

G6.4.1.2 Topology

The topology of the network to estimate the selling price of a house is depicted in figure G6.4.1. This

standard backpropagation network consisted of an input layer of eight neurons, a hidden layer of N

neurons, and an output layer of a single neuron. The eight neurons in the input layer of the network

captured the attributes believed to determine a property's value. The single neuron in the output layer

represented the network's determination of the selling price of a house. Values estimated by the network

fell within a range of 0 to 1 to achieve comparability to the previously transformed (also according to

equation (G6.4.1)) actual selling prices of these houses.
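To make the topology concrete, the following Python sketch computes the output of an 8-N-1 network and converts it from the 0 to 1 range back to dollars by inverting equation (G6.4.1). It rests on several assumptions: the logistic activation is the one reported in section G6.4.2, the bias terms are hypothetical (the text does not say whether biases were used), and the all-zero weights are placeholders rather than trained values. The selling price bounds come from table G6.4.2.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: eight normalized inputs, N hidden neurons, one output in (0, 1)."""
    hidden = [
        logistic(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return logistic(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


def denormalize(value, lo, hi):
    """Invert equation (G6.4.1): map a value in [0, 1] back to original units."""
    return value * (hi - lo) + lo


SP_MIN, SP_MAX = 150_000, 365_000  # selling price range, table G6.4.2

# Placeholder network with N = 2 hidden neurons and all-zero weights.
inputs = [0.5] * 8
hidden_weights = [[0.0] * 8, [0.0] * 8]
hidden_biases = [0.0, 0.0]
output_weights = [0.0, 0.0]
output_bias = 0.0

estimate = forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias)
price = denormalize(estimate, SP_MIN, SP_MAX)
```

With zero weights the output neuron sits at the midpoint of the logistic function, so the placeholder network returns 0.5, which corresponds to the midpoint of the price range.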





Table G6.4.2. Descriptive statistics of the continuous variables in the sample.

Variable       Variable                 Overall mean        Minimum
abbreviation   definition               (std. deviation)    (maximum)

SP             Selling price ($)        214 112 (32 997)    150 000 (365 000)
YB             Age (yrs)                9.19 (5.32)         0 (22)
BR             Number of bedrooms       3.72 (0.67)         2 (6)
BATH           Number of bathrooms      2.51 (0.41)         2 (4)
ESF            Total square footage     1991 (413)          1100 (3009)
LSZ            Lot size (sq. ft.)       8246 (4464)         3746 (44 866)
STY            Number of stories        1.75 (0.43)         1 (2)
PKG            Number of car-garages    2.16 (0.37)         1 (3)

Figure G6.4.1. Topology of the neural network.

G6.4.2 Training methods

The data set was randomly divided into three subsets. The first subset of the data, made up of 119

properties, was used to train the network. The second subset of the data, called the training-test set,

consisted of 30 properties. It was used to check the ability of the supposedly trained neural network to

generalize (i.e. to prevent overtraining), and to select the optimal number of hidden-layer neurons (Masters

1993, p 183; see section B3.5). The third subset of the data consisted of 100 properties, and was used to assess the ability

of the network to estimate property values accurately.
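A random three-way partition of this kind might be sketched in Python as follows. The subset sizes (119, 30 and 100) are those given above; the function name, the fixed seed and the use of integer stand-ins for property records are assumptions for illustration only.

```python
import random


def split_dataset(records, n_train=119, n_train_test=30, n_test=100, seed=0):
    """Randomly partition records into training, training-test and test subsets."""
    if len(records) != n_train + n_train_test + n_test:
        raise ValueError("subset sizes must add up to the number of records")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    train = shuffled[:n_train]
    train_test = shuffled[n_train:n_train + n_train_test]
    test = shuffled[n_train + n_train_test:]
    return train, train_test, test


records = list(range(249))  # stand-ins for the 249 property records
train, train_test, test = split_dataset(records)
```

Each record lands in exactly one subset, so the holdout properties never influence training or the choice of hidden-layer size.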

The neural network software was written in C for a personal computer and is available as shareware

from Roy W Dobbins (Eberhart and Dobbins 1990). The network was run on a 33 MHz 486DX. With

random starting weights between ±5.0, and a learning coefficient and momentum factor of 0.1 and 0.6,

respectively, networks employing a logistic activation function and having from two to four neurons in

their hidden layer were trained. Figure G6.4.2 graphs the average absolute error (i.e. |estimated selling

price − actual selling price|/actual selling price) of the training set against the average absolute error

of the training-test set for networks with two to four hidden-layer neurons at 2000, 4000, 6000, 8000 and 10 000

training iterations.
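The training procedure can be sketched in Python as backpropagation with a momentum term. This is not the shareware package used in the application. The learning coefficient (0.1) and momentum factor (0.6) are taken from the text; the bias weights, the per-pattern (online) updates, the squared-error objective, the reading of one iteration as one pass over the training set, the `init_range` default and the one-input toy data are all assumptions.

```python
import math
import random


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(x, w_h, w_o):
    """Network output for inputs x; the last weight in each row acts as a bias."""
    xb = list(x) + [1.0]
    hb = [logistic(sum(w * v for w, v in zip(row, xb))) for row in w_h] + [1.0]
    return logistic(sum(w * v for w, v in zip(w_o, hb)))


def train(samples, n_hidden=2, rate=0.1, momentum=0.6, iterations=2000,
          init_range=1.0, seed=0):
    """Online backpropagation with momentum for a logistic n_in-n_hidden-1 network."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w_h = [[rng.uniform(-init_range, init_range) for _ in range(n_in + 1)]
           for _ in range(n_hidden)]
    w_o = [rng.uniform(-init_range, init_range) for _ in range(n_hidden + 1)]
    dw_h = [[0.0] * (n_in + 1) for _ in range(n_hidden)]  # previous weight changes
    dw_o = [0.0] * (n_hidden + 1)
    for _ in range(iterations):
        for x, target in samples:
            xb = list(x) + [1.0]
            h = [logistic(sum(w * v for w, v in zip(row, xb))) for row in w_h]
            hb = h + [1.0]
            y = logistic(sum(w * v for w, v in zip(w_o, hb)))
            # Error signals, computed before any weight is changed.
            delta_o = (target - y) * y * (1.0 - y)
            delta_h = [delta_o * w_o[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            for j in range(n_hidden + 1):
                dw_o[j] = rate * delta_o * hb[j] + momentum * dw_o[j]
                w_o[j] += dw_o[j]
            for j in range(n_hidden):
                for k in range(n_in + 1):
                    dw_h[j][k] = rate * delta_h[j] * xb[k] + momentum * dw_h[j][k]
                    w_h[j][k] += dw_h[j][k]
    return w_h, w_o


def mean_squared_error(samples, w_h, w_o):
    return sum((t - predict(x, w_h, w_o)) ** 2 for x, t in samples) / len(samples)


# Hypothetical toy data: one normalized input, target rising linearly with it.
toy = [([x], 0.2 + 0.6 * x) for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
untrained = train(toy, iterations=0)
trained = train(toy, iterations=2000)
```

Because both calls share a seed, `untrained` holds the starting weights of the run that produced `trained`, so comparing their errors on `toy` shows the effect of training alone.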

Figure G6.4.2 indicates for this training and training-test sample the superiority of a network with

two neurons in its hidden layer. Specifically, contrast the plot of the error of the network with two neurons

in its hidden layer to the plot of the error of the network with three neurons in its hidden layer. While

the error of the network with two neurons in its hidden layer moves consistently down and to the left as

the number of iterations increases from 2000 to 10 000, the plot of the training-test error for the network

with three neurons in its hidden layer initially declines from 0.0844 at 2000 iterations to 0.0831 at 4000

iterations, but then begins to rise fairly uniformly to 0.0848 at 10 000 iterations.

Figure G6.4.2. Average absolute error for the training set and training-test set when the number of hidden-layer neurons is varied from 2 to 4.
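The overtraining check described for the three-neuron network can be expressed as a small Python sketch. Only the three training-test errors quoted in the text are used; the values at 6000 and 8000 iterations are not reported and are therefore omitted. The function names are hypothetical.

```python
# Training-test errors quoted in the text for the three-neuron network.
train_test_error = {2000: 0.0844, 4000: 0.0831, 10_000: 0.0848}


def best_stopping_point(errors_by_iteration):
    """Iteration count at which the training-test error is lowest."""
    return min(errors_by_iteration, key=errors_by_iteration.get)


def shows_overtraining(errors_by_iteration):
    """True if the error at the last checkpoint exceeds the lowest error observed."""
    ordered = [errors_by_iteration[k] for k in sorted(errors_by_iteration)]
    return ordered[-1] > min(ordered)


best = best_stopping_point(train_test_error)
overtrained = shows_overtraining(train_test_error)
```

For these three checkpoints the lowest training-test error occurs at 4000 iterations, and the later rise is the pattern the text reads as overtraining.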

G6.4.3 Output interpretation

In terms of overall estimation of the selling price of the 100 properties in the test sample, the trained

network with two neurons in its hidden layer resulted in an average absolute error of 9.48%. The smallest

and largest individual absolute errors in estimating the selling price of the test sample residential properties

were 0.3% and 38.7%, respectively. Figure G6.4.3 graphs the absolute error of the network's prediction,

ordered by the size of the absolute error, for the test sample of 100 properties. It shows that 28% of the

determinations were in error by less than 5%, 65% of the determinations were in error by less than 10%,

and 12% of the determinations of the network were in error by more than 20%.

Figure G6.4.3. Absolute error for the 100 test-set properties.
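The summary measures reported in this section (average absolute error and the shares of estimates within 5%, within 10% and beyond 20%) can be computed as in the following Python sketch. The four prices are invented for illustration and are not drawn from the test sample; the function and key names are hypothetical.

```python
def absolute_errors(actual, estimated):
    """Per-property |estimated - actual| / actual."""
    return [abs(e - a) / a for a, e in zip(actual, estimated)]


def summarize(errors):
    """Average absolute error and the share of errors in each reported band."""
    n = len(errors)
    return {
        "mean": sum(errors) / n,
        "under_5pct": sum(e < 0.05 for e in errors) / n,
        "under_10pct": sum(e < 0.10 for e in errors) / n,
        "over_20pct": sum(e > 0.20 for e in errors) / n,
    }


# Invented example prices (dollars), not data from the study.
actual = [200_000, 200_000, 200_000, 200_000]
estimated = [204_000, 184_000, 250_000, 200_000]
summary = summarize(absolute_errors(actual, estimated))
```

Note that the bands overlap in the same way as in the text: every error under 5% is also counted among those under 10%.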





G6.4.4 Comparison with multiple regression

A linear multiple regression model was derived based on the 119 properties in the training sample. The

regression coefficients and their corresponding t values are given in table G6.4.3.

Table G6.4.3. Statistics for the multiple regression model.

Variable       Variable                     Coefficient          t value
abbreviation   definition                   (std. error)         (Prob > |t|)

Intercept                                   132 422 (125042)     11.00 (0.0001)
YB             Age (yrs)                    1769 (208)           8.51 (0.0001)
BR             Number of bedrooms           2878 (1964)          1.47 (0.1458)
BATH           Number of bathrooms          1789 (5227)          0.34 (0.7328)
ESF            Total square footage         36 (4)               8.95 (0.0001)
LSZ            Lot size (sq. ft.)           0.73 (0.29)          2.56 (0.0118)
STY            Number of stories            1337 (3349)          0.40 (0.6909)
PL/SPA         Existence of a pool or spa   5502.23 (3788.47)    1.45 (0.1505)
PKG            Number of car-garages        3281 (4238)          0.77 (0.4405)

In terms of statistical performance, the multiple regression model had an adjusted R-squared of 0.689

and an F value of 33.7. In terms of estimation performance, the multiple regression model resulted in an

average absolute error of 11.6% in estimating the selling price of the test sample properties. Thirty-six

per cent of the determinations of the multiple regression model were in error by less than 5%, 54% of the

determinations were in error by less than 10%, and 9% of the determinations were in error by more than

20%. Further, for 57 out of 100 test sample properties, the absolute error of the multiple regression model

exceeded that of the network.
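The property-by-property comparison amounts to counting the cases in which one model's absolute error is smaller than the other's. A minimal Python sketch follows; the error values are invented for illustration and the function name is hypothetical.

```python
def count_wins(errors_a, errors_b):
    """Number of properties where model A's absolute error is strictly below model B's."""
    return sum(a < b for a, b in zip(errors_a, errors_b))


# Invented absolute errors for three properties, not data from the study.
network_errors = [0.03, 0.12, 0.08]
regression_errors = [0.05, 0.10, 0.09]
network_wins = count_wins(network_errors, regression_errors)
```

Ties are counted for neither model under this strict comparison, which is an assumption; the text does not say how ties, if any, were treated.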

G6.4.5 Conclusion

While for this sample of residential properties the network produced more accurate overall estimates of

selling prices than the multiple regression model, the network's average absolute error was still relatively

high and some of its errors were unacceptably large. These weaknesses are likely to be attributable to

two sources. First and most importantly, a number of potentially significant variables have been omitted

from the pricing equation. These include view characteristics of the property such as canyon, mountain,

and ocean; specific neighborhood location parameters, such as those that might be obtained by reference

to the Thomas Guide 0.25 square-mile grid identifier; and other physical attributes of a house such as the

existence of air conditioning, the type of roof, and the presence of a security system.

A second factor that contributed to the size of the network error was the source data. The source data

describing a property were supplied by the listing agent and are subject to buyer verification. Although

these agents attempt to describe the property as completely as possible, frequently the data were incomplete

or erroneous.

References

Eberhart R C and Dobbins R W (eds) 1990 Neural Network PC Tools (San Diego, CA: Academic)

Masters T 1993 Practical Neural Network Recipes in C++ (San Diego, CA: Academic)

Rumelhart D E and McClelland J L 1986 Parallel Distributed Processing vol 1 (Cambridge, MA: MIT Press)


