Gary Grudnitski - Valuations of Residential Properties


    Economics, Finance, and Business

    G6.4 Valuations of residential properties using a neural network

    Gary Grudnitski

    Abstract

    With the advent of large computerized databases, computational techniques are being

    relied on more frequently to estimate residential property values. As an alternative to the

    most commonly used computational technique of multiple regression, this application

    describes how a neural network was applied to estimate the selling price of single-

    family residential properties in one area of a large California city. For the holdout

    sample of 100 properties, the average absolute difference between the actual selling price and the estimated selling price generated by the neural network was 9.48%. In terms

    of comparative accuracy, the network was able to achieve, on average, more accurate

    valuations of properties than the multiple regression model in the holdout sample. The

    network also produced more accurate valuations than the multiple regression model for

    57 out of the 100 residential properties in the holdout sample.

    G6.4.1 Design process

    Accurate, economical and justifiable valuation of residential property is of great importance to mortgage

    holders who wish to value their portfolios, to prospective lenders who are contemplating the issuance of

    new mortgages, and to local government authorities who must know the worth of their tax base. As large

    computerized databases become increasingly common, computational techniques, especially multiple

    regression, are being relied on more frequently to assess residential property values.

    Residential property, like many other commodities, can be viewed as a bundle of attributes. A problem

    in valuing residential property exists, however, because the prices of a property's individual components

    are both unobservable and devoid of an implicit market. Empirically, the choice of pricing equations that

    value a property's individual components often appears to be dictated by the nature of the available data

    and the tendency of those providing the estimates to fixate on goodness-of-fit criteria. On one hand,

    this is understandable because pricing equations for residential property represent, in reduced form, an

    interaction between both supply and demand, and thus make the specification of an exact functional form

    difficult. On the other hand, however, housing price estimates that critically depend on the functional form

    chosen can be negatively impacted by this imprecision in the specification of pricing equations.

    In an attempt to mitigate the negative effects on estimates of property values due to imprecision in the

    specification of the valuation equation, what follows is a description of how a standard backpropagation

    neural network (Rumelhart and McClelland 1986) is applied to estimate the selling price of single-family

    residences. To measure the relative performance of the network, the prices it produces are compared with

    estimates generated by a multiple regression model.

    © 1997 IOP Publishing Ltd and Oxford University Press Handbook of Neural Computation release 97/1 G6.4:1


    Table G6.4.1. An example of data downloaded from the MLS describing a sales transaction.

    PT1 SINGLE FAMILY-DETACHED 08/28/93 03:49 PM
    LP: 189,000 STATUS: SOLD MT: 82 LD: 01/12/93 XD: 06/12/93 REF# 69
    SP: 186,000 OLP: 189,000 FIN: OMD: 05/27/93 LNO: 93 6000909
    AD: 13131 OLD WEST DR ZIP: 92129 APN: 3151703700
    MC: 30F3 XST: TED WMS PRWY COM: RP NCD: CRM YB: 1987 ZN: NONE
    BR: 3 OPBR: BATH: 2.5 ESF: 1638 SSF: ASSES TRM:
    LR : 11X17 FP : F PTO: SLAB HOF: 101 TLB: 0
    DR : 10X10 TV : C EXT: STUCO HFP: MONTHLY TD1: 0
    FAM: 12X14 R/O: R/O ELEC RF : CNSHK HFI: GTC IN1: 0.0 AS1:
    KIT: 11X10 DW : DISHWASH SWR: SEWER OF : 0 LT1:
    MBR: 13X20 MW : MICRO BI SPA: NONE OFP: NONE KNO TD2: 0
    BR2: 11X10 TC : IRR: SPRINKLE TOF: NONE KNO IN2: 0.0 AS2:
    BR3: 11X11 HT : FAG FLR: SLAB LDY: GAR LT2:
    BR4: 0 WH : ALU: NONE KNO LSZ: 8500 AST: NONE KNOWN
    BR5: 0 SEC: EQPT OWN GUEST: NONE ACS: 0.00 BF : NONE KNOWN
    XRM: 0 VU : NK AGEREST: NONE LSF: 0 EQP: D,E,F,G,K
    STY: 2 STO PL: YES CL: CFA PKG: 2G
    REMARKS: THIS PLAN 3 CAMBRIDGE HAS IT ALL! MINT CONDITION WITH NEW BERBER CARPET
    NEW WINDOW TREATMENTS, NEW FLOORING IN BATHROOMS SEC SYS, 2 PATIOS, PATIO COVER,
    BUILT-IN GAS BRICK BBQ, SOFT WTR SYS, REFINISHED KITCHEN CABINETS, LANDSCAPED WITH
    AUTO SPRINKLERS. SHOWS TERRIFIC! GATE CODE * 0289

    G6.4.1.1 Description

    Source data representing the sale of a residential property were obtained from the San Diego Board of

    Realtors' multiple listing service (MLS). For this application, data on single-family homes sold during

    1992-93 in Rancho Penasquitos, a northern suburb of San Diego County, California, were electronically

    downloaded. A typical entry for one of these properties is shown in table G6.4.1.

    From the downloaded MLS residential sales data, a parser, written in C, extracted the following nine

    descriptors for each property (these descriptors are shown in bold in table G6.4.1): SP is the actual selling

    price; YB is the age of the structure in years, derived by subtracting the year the house was built from

    1992; BR is the number of bedrooms; BATH is the number of bathrooms in increments of 1/4 baths; ESF

    is the estimated total square footage of the house; LSZ is the lot size measured in square feet; STY is the

    number of stories; PL/SPA is 1 if a pool or spa existed (0 otherwise); and PKG is the size of the garage

    in car spaces.
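
    The extraction step can be illustrated with a short sketch. The original parser was written in C and its source is not reproduced here, so the Python below is only an assumed equivalent: the function names field and parse, the regular expression, and the rule that any SPA value other than NONE counts as a spa are all hypothetical. The sample record is a subset of the lines in table G6.4.1.

```python
import re

# A subset of the MLS record from table G6.4.1 (only lines holding the nine descriptors).
RECORD = """\
SP: 186,000 OLP: 189,000 FIN: OMD: 05/27/93 LNO: 93 6000909
MC: 30F3 XST: TED WMS PRWY COM: RP NCD: CRM YB: 1987 ZN: NONE
BR: 3 OPBR: BATH: 2.5 ESF: 1638 SSF: ASSES TRM:
MBR: 13X20 MW : MICRO BI SPA: NONE OFP: NONE KNO TD2: 0
BR4: 0 WH : ALU: NONE KNO LSZ: 8500 AST: NONE KNOWN
STY: 2 STO PL: YES CL: CFA PKG: 2G
"""


def field(record, key):
    """Return the first token after 'KEY:' or None if the key is absent.

    The lookbehind stops BR from matching inside OPBR or MBR.
    Caveat: an empty field would pick up the next key as its value.
    """
    pattern = r"(?<![A-Z/])" + re.escape(key) + r"\s*:\s*(\S+)"
    match = re.search(pattern, record)
    return match.group(1) if match else None


def parse(record, base_year=1992):
    """Build the nine descriptors; age is base_year minus the year built."""
    has_pool = field(record, "PL") == "YES"
    has_spa = field(record, "SPA") not in (None, "NONE")  # assumed coding
    return {
        "SP": int(field(record, "SP").replace(",", "")),
        "YB": base_year - int(field(record, "YB")),
        "BR": int(field(record, "BR")),
        "BATH": float(field(record, "BATH")),
        "ESF": int(field(record, "ESF")),
        "LSZ": int(field(record, "LSZ")),
        "STY": int(field(record, "STY")),
        "PL/SPA": int(has_pool or has_spa),
        "PKG": int(re.match(r"\d+", field(record, "PKG")).group()),  # "2G" -> 2
    }


descriptors = parse(RECORD)
```

    For the sample record this gives an age of 5 years (1992 minus 1987) and a PL/SPA value of 1, because PL is YES even though SPA is NONE.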

    For the sample, descriptive statistics for the continuous variables are presented in table G6.4.2. In

    addition, for the PL/SPA variable, 31% of the houses in the sample had either a pool or spa. Data from

    the parser were then passed to an Excel spreadsheet. Using the spreadsheet, each of the values of the

    variables was normalized according to equation (G6.4.1) and output to the neural network software.

    $i_{\mathrm{norm}} = (i - \mathrm{min}) / \mathrm{range}$    (G6.4.1)

    where $i_{\mathrm{norm}}$ is the vector of normalized values of the variable, $i$ is the vector of original values of the

    variable, min is the minimum original value of the variable, and range is the range of the original values

    of the variable.
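
    As a minimal sketch of equation (G6.4.1), the function below rescales one variable to the interval 0 to 1, and a companion function maps a normalized value back to original units. The original work did this step in an Excel spreadsheet; the Python function names here are hypothetical. The three prices used are the sample minimum and maximum from table G6.4.2 and the selling price from table G6.4.1.

```python
def normalize(values):
    """Apply (i - min) / range to every value; also return min and range."""
    lo = min(values)
    rng = max(values) - lo
    if rng == 0:
        raise ValueError("all values are equal, so the range is zero")
    return [(v - lo) / rng for v in values], lo, rng


def denormalize(x, lo, rng):
    """Invert the normalization, e.g. to turn a network output into dollars."""
    return x * rng + lo


prices = [150000, 186000, 365000]
normalized, price_min, price_range = normalize(prices)
restored = denormalize(normalized[1], price_min, price_range)
```

    With these inputs the middle price maps to 36 000 / 215 000, roughly 0.167, and the inverse mapping recovers 186 000.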

    G6.4.1.2 Topology

    The topology of the network to estimate the selling price of a house is depicted in figure G6.4.1. This

    standard backpropagation network consisted of an input layer of eight neurons, a hidden layer of N

    neurons, and an output layer of a single neuron. The eight neurons in the input layer of the network

    captured the attributes believed to determine a property's value. The single neuron in the output layer

    represented the network's determination of the selling price of a house. Values estimated by the network

    fell within a range of 0 to 1 to achieve comparability to the previously transformed (also according to

    equation (G6.4.1)) actual selling prices of these houses.
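
    The topology described above can be sketched as a forward pass through an 8-N-1 network. This is an illustration under stated assumptions, not the shareware package used in the study: the bias terms, the weight layout, the arbitrary initial weight range of -1 to 1, and the names init_network and forward are all hypothetical. The logistic activation keeps the output between 0 and 1, matching the normalized selling prices.

```python
import math
import random


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_inputs=8, n_hidden=2, seed=0):
    """Random weights; the last entry of each weight list is a bias (assumed)."""
    rng = random.Random(seed)
    return {
        "hidden": [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs + 1)]
                   for _ in range(n_hidden)],
        "output": [rng.uniform(-1.0, 1.0) for _ in range(n_hidden + 1)],
    }


def forward(net, x):
    """Return (hidden activations, output) for one normalized input vector."""
    hidden = [
        logistic(row[-1] + sum(w * v for w, v in zip(row, x)))
        for row in net["hidden"]
    ]
    out_w = net["output"]
    output = logistic(out_w[-1] + sum(w * h for w, h in zip(out_w, hidden)))
    return hidden, output


net = init_network()
# Eight normalized inputs in the order YB, BR, BATH, ESF, LSZ, STY, PL/SPA, PKG
# (illustrative values, not taken from the data set).
hidden, estimate = forward(net, [0.2, 0.25, 0.25, 0.3, 0.1, 1.0, 1.0, 0.5])
```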


    Table G6.4.2. Descriptive statistics of the continuous variables in the sample.

    Variable       Variable                 Overall mean        Minimum
    abbreviation   definition               (std. deviation)    (maximum)
    SP             Selling price ($)        214 112 (32 997)    150 000 (365 000)
    YB             Age (yrs)                9.19 (5.32)         0 (22)
    BR             Number of bedrooms       3.72 (0.67)         2 (6)
    BATH           Number of bathrooms      2.51 (0.41)         2 (4)
    ESF            Total square footage     1991 (413)          1100 (3009)
    LSZ            Lot size (sq. ft.)       8246 (4464)         3746 (44 866)
    STY            Number of stories        1.75 (0.43)         1 (2)
    PKG            Number of car-garages    2.16 (0.37)         1 (3)

    Figure G6.4.1. Topology of the neural network.

    G6.4.2 Training methods

    The data set was randomly divided into three subsets. The first subset of the data, made up of 119

    properties, was used to train the network. The second subset of the data, called the training-test set,

    consisted of 30 properties. It was used to check the ability of the supposedly trained neural network to

    generalize (i.e. to prevent overtraining), and to select the optimal number of hidden-layer neurons (Masters

    1993, p 183). The third subset of the data consisted of 100 properties, and was used to assess the ability

    of the network to estimate property values accurately.
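
    A random three-way split of this kind can be sketched as follows. The subset sizes of 119, 30 and 100 come from the text and imply 249 properties in total; the function name split_dataset, the fixed seed, and the use of integers as stand-in records are assumptions made for illustration.

```python
import random


def split_dataset(records, sizes=(119, 30, 100), seed=0):
    """Shuffle a copy of the records and cut it into three disjoint subsets."""
    if sum(sizes) != len(records):
        raise ValueError("subset sizes must add up to the number of records")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train, n_train_test, _ = sizes
    train = shuffled[:n_train]
    train_test = shuffled[n_train:n_train + n_train_test]
    test = shuffled[n_train + n_train_test:]
    return train, train_test, test


# Integers stand in for the 249 property records.
train, train_test, test = split_dataset(range(249))
```

    Keeping the 100-property test subset untouched during training and model selection is what lets it serve as a holdout sample.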

    The neural network software was written in C for a personal computer and is available as shareware

    from Roy W Dobbins (Eberhart and Dobbins 1990). The network was run on a 33MHz 486DX. With

    random starting weights between ±5.0, and a learning coefficient and momentum factor of 0.1 and 0.6,

    respectively, networks employing a logistic activation function and having from two to four neurons in

    their hidden layer were trained. Figure G6.4.2 graphs the average absolute error (i.e. |estimated selling

    price - actual selling price|/actual selling price) of the training set against the average absolute error

    of the training-test set for two to four hidden-layer neurons at 2000, 4000, 6000, 8000 and 10 000

    training iterations.
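
    The training settings can be illustrated with a minimal backpropagation sketch that uses a learning coefficient of 0.1 and a momentum factor of 0.6. Several details are assumptions not confirmed by the text: weights are drawn uniformly from -5.0 to 5.0 (one reading of the stated starting range), updates are applied after every example, the loss is squared error, and each neuron has a bias. The class name Net is hypothetical, and this is not the code of the shareware package cited.

```python
import math
import random


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


class Net:
    """8-N-1 logistic network trained by backpropagation with momentum."""

    def __init__(self, n_in=8, n_hidden=2, weight_range=5.0, seed=0):
        rng = random.Random(seed)
        draw = lambda: rng.uniform(-weight_range, weight_range)
        # The last weight in every list multiplies a constant 1.0 (bias).
        self.w_h = [[draw() for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w_o = [draw() for _ in range(n_hidden + 1)]
        self.dw_h = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.dw_o = [0.0] * (n_hidden + 1)

    def forward(self, x):
        xb = list(x) + [1.0]
        h = [logistic(sum(w * v for w, v in zip(row, xb))) for row in self.w_h]
        hb = h + [1.0]
        y = logistic(sum(w * v for w, v in zip(self.w_o, hb)))
        return xb, hb, y

    def train_step(self, x, target, lr=0.1, momentum=0.6):
        """One weight update for one example; returns the pre-update loss."""
        xb, hb, y = self.forward(x)
        delta_o = (target - y) * y * (1.0 - y)
        # Hidden deltas are computed from the output weights before updating them.
        delta_h = [delta_o * self.w_o[j] * hb[j] * (1.0 - hb[j])
                   for j in range(len(self.w_h))]
        for j, v in enumerate(hb):
            self.dw_o[j] = lr * delta_o * v + momentum * self.dw_o[j]
            self.w_o[j] += self.dw_o[j]
        for j, row in enumerate(self.w_h):
            for i, v in enumerate(xb):
                self.dw_h[j][i] = lr * delta_h[j] * v + momentum * self.dw_h[j][i]
                row[i] += self.dw_h[j][i]
        return 0.5 * (target - y) ** 2


# Smaller starting weights are used here only to keep the demo out of the
# saturated region of the logistic function.
net = Net(weight_range=0.5, seed=1)
sample, target = [0.5] * 8, 0.2
first_loss = net.train_step(sample, target)
for _ in range(500):
    last_loss = net.train_step(sample, target)
```

    Each weight change is the current gradient step plus 0.6 times the previous change, which is the usual form of the momentum term.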

    Figure G6.4.2 indicates for this training and training-test sample the superiority of a network with

    two neurons in its hidden layer. Specifically, contrast the plot of the error of the network with two neurons

    in its hidden layer to the plot of the error of the network with three neurons in its hidden layer. While

    the error of the network with two neurons in its hidden layer moves consistently down and to the left as

    the number of iterations increases from 2000 to 10 000, the plot of the training-test error for the network

    with three neurons in its hidden layer initially declines from 0.0844 at 2000 iterations to 0.0831 at 4000

    iterations, but then begins to rise fairly uniformly to 0.0848 at 10 000 iterations.

    Figure G6.4.2. Average absolute error for the training set and training-test set when the number of hidden-layer neurons is varied from 2 to 4.
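
    The overtraining pattern in the three-neuron network can be checked mechanically from the training-test errors recorded at each checkpoint. The sketch below uses only the three values quoted in the text; the errors at 6000 and 8000 iterations are not given and are left out rather than guessed. The function names are hypothetical.

```python
# Training-test error of the three-neuron network at the checkpoints quoted
# in the text (iterations -> average absolute error).
THREE_NEURON_ERRORS = {2000: 0.0844, 4000: 0.0831, 10000: 0.0848}


def best_checkpoint(errors_by_iteration):
    """Return (iterations, error) for the lowest training-test error."""
    return min(errors_by_iteration.items(), key=lambda item: item[1])


def shows_overtraining(errors_by_iteration):
    """True if the error at the last checkpoint is above the minimum seen."""
    last = max(errors_by_iteration)
    return errors_by_iteration[last] > min(errors_by_iteration.values())


best = best_checkpoint(THREE_NEURON_ERRORS)
overtrained = shows_overtraining(THREE_NEURON_ERRORS)
```

    Applied to these values, the lowest training-test error is at 4000 iterations and the later rise is flagged, which is the behaviour that argues against the three-neuron network.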

    G6.4.3 Output interpretation

    In terms of overall estimation of the selling price of the 100 properties in the test sample, the trained

    network with two neurons in its hidden layer resulted in an average absolute error of 9.48%. The smallest

    and largest individual absolute errors in estimating the selling price of the test sample residential properties

    were 0.3% and 38.7%, respectively. Figure G6.4.3 graphs the absolute error of the network's prediction, ordered by the size of the absolute error, for the test sample of 100 properties. It shows that 28% of the

    determinations were in error by less than 5%, 65% of the determinations were in error by less than 10%,

    and 12% of the determinations of the network were in error by more than 20%.
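
    The summary figures reported here (mean, smallest and largest absolute error, and the shares of estimates within 5%, within 10% and beyond 20%) can be computed as in the sketch below. The four prices are made-up illustrative values, not data from the study, and the function names are hypothetical.

```python
def absolute_errors(estimated, actual):
    """Absolute error of each estimate as a fraction of the actual price."""
    return [abs(e - a) / a for e, a in zip(estimated, actual)]


def summarize(errors):
    n = len(errors)
    return {
        "mean": sum(errors) / n,
        "min": min(errors),
        "max": max(errors),
        "share_under_5pct": sum(e < 0.05 for e in errors) / n,
        "share_under_10pct": sum(e < 0.10 for e in errors) / n,
        "share_over_20pct": sum(e > 0.20 for e in errors) / n,
    }


# Hypothetical prices for illustration only.
actual = [200000, 250000, 180000, 300000]
estimated = [204000, 230000, 225000, 297000]
summary = summarize(absolute_errors(estimated, actual))
```

    Sorting the list returned by absolute_errors gives the ordering used in figure G6.4.3.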

    Figure G6.4.3. Absolute error for the 100 test-set properties.


    G6.4.4 Comparison with multiple regression

    A linear multiple regression model was derived based on the 119 properties in the training sample. The

    regression coefficients and their corresponding t values are given in table G6.4.3.

    Table G6.4.3. Statistics for the multiple regression model.

    Variable       Variable                      Coefficient          t value
    abbreviation   definition                    (std. error)         (Prob > |t|)
    Intercept                                    132 422 (125042)     11.00 (0.0001)
    YB             Age (yrs)                     1769 (208)           8.51 (0.0001)
    BR             Number of bedrooms            2878 (1964)          1.47 (0.1458)
    BATH           Number of bathrooms           1789 (5227)          0.34 (0.7328)
    ESF            Total square footage          36 (4)               8.95 (0.0001)
    LSZ            Lot size (sq. ft.)            0.73 (0.29)          2.56 (0.0118)
    STY            Number of stories             1337 (3349)          0.40 (0.6909)
    PL/SPA         Existence of a pool or spa    5502.23 (3788.47)    1.45 (0.1505)
    PKG            Number of car-garages         3281 (4238)          0.77 (0.4405)
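
    A linear model with an intercept and one coefficient per descriptor can be fitted by ordinary least squares. The text does not say which statistical package produced table G6.4.3, so the sketch below is only an assumed stand-in: it solves the normal equations by Gauss-Jordan elimination in plain Python, and the names fit_ols and predict are hypothetical. The toy data follow an exact linear rule so that the recovered coefficients can be checked.

```python
def fit_ols(rows, targets):
    """Return [intercept, b1, ..., bk] minimizing the sum of squared errors."""
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Augmented matrix of the normal equations (X^T X) b = X^T y.
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coefs, row):
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Toy data generated from y = 10 + 2*x1 - 3*x2 (illustrative only).
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
targets = [10 + 2 * a - 3 * b for a, b in rows]
coefs = fit_ols(rows, targets)
```

    For descriptors on very different scales, such as square footage next to bedroom counts, a QR-based solver from a numerical library would be the numerically safer choice; the normal equations are used here only to keep the example self-contained.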

    In terms of statistical performance, the multiple regression model had an adjusted R-squared of 0.689

    and an F value of 33.7. In terms of estimation performance, the multiple regression model resulted in an

    average absolute error of 11.6% in estimating the selling price of the test sample properties. Thirty-six

    per cent of the determinations of the multiple regression model were in error by less than 5%, 54% of the

    determinations were in error by less than 10%, and 9% of the determinations were in error by more than

    20%. Further, for 57 out of 100 test sample properties, the absolute error of the multiple regression model

    exceeded that of the network.
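
    The property-by-property comparison behind the 57 out of 100 figure amounts to counting how often one model's absolute error is smaller than the other's. A minimal sketch follows, with made-up error values and a hypothetical function name.

```python
def head_to_head(network_errors, regression_errors):
    """Count properties where the network wins, loses, or ties."""
    pairs = list(zip(network_errors, regression_errors))
    wins = sum(n < r for n, r in pairs)
    losses = sum(n > r for n, r in pairs)
    return wins, losses, len(pairs) - wins - losses


# Hypothetical absolute errors for four properties.
network_errors = [0.02, 0.08, 0.25, 0.01]
regression_errors = [0.05, 0.06, 0.30, 0.01]
result = head_to_head(network_errors, regression_errors)
```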

    G6.4.5 Conclusion

    While for this sample of residential properties the network produced more accurate overall estimates of

    selling prices than the multiple regression model, the network's average absolute error was still relatively

    high and some of its errors were unacceptably large. These weaknesses are likely to be attributable to

    two sources. First and most importantly, a number of potentially significant variables have been omitted

    from the pricing equation. These include view characteristics of the property such as canyon, mountain,

    and ocean; specific neighborhood location parameters, such as those that might be obtained by reference

    to the Thomas Guide 0.25 square-mile grid identifier; and other physical attributes of a house such as the

    existence of air conditioning, the type of roof, and the presence of a security system.

    A second factor that contributed to the size of the network error was the source data. The source data

    describing a property were supplied by the listing agent and are subject to buyer verification. Although these agents attempt to describe the property as completely as possible, frequently the data were incomplete

    or erroneous.

    References

    Eberhart R C and Dobbins R W (eds) 1990 Neural Network PC Tools (San Diego, CA: Academic)

    Masters T 1993 Practical Neural Network Recipes in C++ (San Diego, CA: Academic)

    Rumelhart D E and McClelland J L 1986 Parallel Distributed Processing vol 1 (Cambridge, MA: MIT Press)
