Fauna habitat modelling and mapping: A review and case study in the Lower Hunter Central Coast region of NSW

BRENDAN A. WINTLE,* JANE ELITH AND JOANNE M. POTTS
School of Botany, The University of Melbourne, Vic. 3010, Australia (Email: brendanw@unimelb.edu.au)

Austral Ecology (2005) 30, 719–738
*Corresponding author. Accepted for publication March 2005.

Abstract Habitat models are now broadly used in conservation planning on public lands. If implemented correctly, habitat modelling is a transparent and repeatable technique for describing and mapping biodiversity values, and its application in peri-urban and agricultural landscape planning is likely to expand rapidly. Conservation planning in such landscapes must be robust to the scrutiny that arises when biodiversity constraints are placed on developers and private landholders. A standardized modelling and model evaluation method based on widely accepted techniques will improve the robustness of conservation plans. We review current habitat modelling and model evaluation methods and provide a habitat modelling case study in the New South Wales central coast region that we hope will serve as a methodological template for conservation planners. We make recommendations on modelling methods that are appropriate when presence-absence and presence-only survey data are available and provide methodological details and a website with data and training material for modellers. Our aim is to provide practical guidelines that preserve methodological rigour and result in defendable habitat models and maps. The case study was undertaken in a rapidly developing area with substantial biodiversity values under urbanization pressure. Habitat maps for seven priority fauna species were developed using logistic regression models of species-habitat relationships and a bootstrapping methodology was used to evaluate model predictions. The modelled species were the koala, tiger quoll, squirrel glider, yellow-bellied glider, masked owl, powerful owl and sooty owl. Models ranked sites adequately in terms of habitat suitability and provided predictions of sufficient reliability for the purpose of identifying preliminary conservation priority areas. However, they are subject to multiple uncertainties and should not be viewed as a completely accurate representation of the distribution of species habitat. We recommend the use of model prediction in an adaptive framework whereby models are iteratively updated and refined as new data become available.

Key words: bootstrapping, conservation planning, habitat modelling, logistic regression, model evaluation, ROC.

INTRODUCTION

Governments at all levels place considerable emphasis on urban and regional planning, and have commitments to ensure that developments are socially and ecologically sustainable (Commonwealth of Australia 2003). Protected area planning exercises in Australia over the past 10 years have utilized statistical habitat modelling methods to define biodiversity attributes (National Parks & Wildlife Service 1998; Ferrier et al. 2002a). Many of the theoretical and technical advances in habitat modelling and evaluation methods have come about in response to the need for better information in public land planning in Australia (Austin & Meyers 1996; Ferrier & Watson 1997; Elith & Burgman 2002; Ferrier et al. 2002a,b; Pearce et al. 2001a).

However, there is currently little scientific input into the biodiversity aspects of the urban planning process and consideration of biodiversity values is characteristically ad hoc. Over 40% of Australia’s nationally listed threatened ecological communities and more than 50% of threatened species occur in urban fringe areas (Yencken & Wilkinson 2000) and rapidly increasing urbanization rates are their primary threat. Although the areal extent of urban development is usually relatively small, the magnitude of the impacts is often large. Urbanization is second only to land clearing for agriculture as a threat to Australia’s biodiversity (Burgman & Lindenmayer 1998) and there is an urgent need to improve conservation planning practices in urban fringe areas.

The efficacy of conservation planning relies critically on the quality of the underlying biodiversity information (Pressey et al. 1999; Wilson et al. 2005). Several authors have noted the impracticality of complete biological inventory and problems arising when incomplete biological survey data are used as a basis
for reserve planning (Burgman & Lindenmayer 1998; Ferrier et al. 2002a). The role of habitat modelling methods in addressing this problem is well established (Burgman & Lindenmayer 1998; Ferrier et al. 2002a; Wilson et al. 2005). Reliable and defendable methods for defining and predicting the distribution of wildlife habitat are critical components of conservation planning.

Here we attempt to coalesce recent developments in wildlife habitat modelling into one modelling and evaluation framework, and to present them simply enough that they may be applied by planners with relatively little modelling experience.

REVIEW OF WILDLIFE HABITAT MODELLING

At a simple level, a habitat model is a numerical representation of a species’ habitat preferences. It may be used to make inferences about a species’ habitat requirements and likely response to environmental change, or it may be used to predict a species’ abundance, density, carrying capacity or probability of occupying a location based on its environmental attributes. The primary use of habitat modelling in conservation planning is in predicting the spatial distribution of suitable habitat for species of interest in a landscape. Many habitat modelling methods are available that may be more or less applicable depending on the type of biological and environmental data available, the species of interest and the end use of the model. There are numerous steps involved in fitting most types of habitat model, each requiring subjective judgements that are based on experience and statistical and biological insights. There are several detailed reviews and comparisons of wildlife habitat modelling methods in the literature (Franklin 1995; Manel et al. 1999a,b; Elith 2000; Guisan & Zimmerman 2000; Ferrier et al. 2002a; Zaniewski et al. 2002). Our goal is to briefly outline the available methods and present those that we believe are most appropriate for predicting the distribution of species habitat in a conservation planning context in which technical expertise is limited. We seek to provide enough detail to allow planners with little statistical experience to follow our recommendations. We have provided worked examples, along with code and data, for fitting and evaluating statistical habitat models in the statistical freeware R (R Development Core Team 2004). These materials are available at http://www.botany.unimelb.edu.au/envisci/brendan/model.html. It is important to note that biological knowledge is a critical prerequisite to sound habitat modelling. Our recommendations are confined to addressing limitations in statistical expertise and offer no means of overcoming a lack of available ecological expertise.

Choosing a modelling method appropriate for the available data

A primary consideration in deciding which modelling method to apply in any given situation is the type of biological survey data that are available for model development. There are five main levels of data availability: (i) little or no data are available for habitat modelling; (ii) presence-only (or ad hoc) data are available, where occupied locations are recorded but no systematic attempt has been made to record locations that are unoccupied; (iii) presence–absence (or binary) data are available, where locations that are occupied or unoccupied by a given species are recorded, usually in a systematic survey; (iv) ordinal categorical data are available, where the number of individuals at survey locations is recorded in coarse abundance categories; and (v) counts are available, where an attempt is made to count the actual number of individuals of a given species at survey locations. The latter two situations arise very rarely in conservation planning because of the prohibitive costs associated with capture of these data, and are not dealt with in detail here. The following sections outline the modelling methods available when presence–absence, presence-only or no data are available. Though we do not tackle the no-data situation in our case study, we do provide references and a description of the basic approach. For details about modelling methods appropriate for count and ordinal categorical data, see Agresti (1996), Guisan and Harrell (2000) and Pearce and Ferrier (2001).

Little or no data

The absence of biological survey data does not preclude the development of a habitat model. Habitat suitability indices (HSIs) were introduced by the United States Fish and Wildlife Service as a means of mapping species habitats for the purpose of impact assessment and conservation planning (Van Horne & Wiens 1991). An HSI model for a given species and area of land represents a conceptual model that relates each measured variable of the environment to the suitability of a site for the species, scaled from 0 (for unsuitable habitat) to 1 (for optimum conditions) (Burgman et al. 2001). HSIs are very flexible and have been widely applied in conservation management (Reading et al. 1996; Breininger et al. 1998). The weakness of HSIs is that their credibility depends wholly on the credibility of the expert(s) who construct them. The lack of independent data in the process makes them impossible to evaluate statistically and therefore less robust to scrutiny.
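As an illustration of the HSI idea, the sketch below combines expert-assigned suitability scores (each between 0 and 1) into a single index for a site. The geometric mean used here is only one common way of combining scores and is an assumption, not a rule taken from the studies cited above; the function name and the example scores are hypothetical.

```python
from math import prod


def habitat_suitability_index(scores):
    """Combine per-variable suitability scores (each in [0, 1]) into one HSI.

    Assumption: scores are combined with a geometric mean, so a single
    unsuitable variable (score 0) makes the whole site unsuitable.
    """
    if not scores:
        raise ValueError("at least one suitability score is required")
    if any(s < 0.0 or s > 1.0 for s in scores):
        raise ValueError("suitability scores must lie between 0 and 1")
    return prod(scores) ** (1.0 / len(scores))


# Hypothetical expert scores for one site: tree hollows, food trees, patch size.
site_scores = [0.8, 0.5, 1.0]
hsi = habitat_suitability_index(site_scores)
```

Because every number in such a model comes from expert judgement rather than survey data, the result can be mapped but not evaluated statistically, which is the weakness noted above.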


Presence-only data

Presence-only data are the most common form of observation data and are usually available from museums and herbaria (Graham et al. 2004). Presence-only data suffer from the problems that observations are unplanned and tend to be biased toward towns and roads; they are often of dubious reliability and unspecified spatial accuracy; and the variation in survey effort between different environments and geographical areas cannot be controlled or adjusted in model fitting (Ferrier et al. 2002a; Kadmon et al. 2003). Nonetheless, presence-only modelling methods are widely applied due to the prevalence of presence-only data. Presence-only data may be modelled using a variety of modelling packages that are based on different ecological assumptions. Presence-only methods fall into three main categories: (i) those that use the species data without reference to any environmental data; (ii) those that model a species-environment relationship in reference to the species presence data; and (iii) those that model a species-environment relationship by characterizing the ‘background’ environment across the region of interest and modelling the species presence in comparison to this background. The first category includes hulls and kernels (Worton 1989), which can be thought of as geographical envelopes. They are primarily useful for estimation of ranges but not for more detailed maps of species distribution, because the envelopes will generally encompass many sites that are unsuitable habitat for the species. A limitation of all envelope methods is that they are particularly sensitive to missing data and spatial error. The second category includes BIOCLIM (Nix 1986) and DOMAIN (Carpenter et al. 1993). BIOCLIM is a climate envelope method that maps habitat that is climatically suitable for the species based on the distribution of the known presence records across a suite of climate variables derived from long-term records of temperature, rainfall and radiation. It is useful over large extents for broadly defining climatically suitable regions, but because of its orthogonal geometries the envelope approach tends to include many sites that are in fact unsuitable for the species (Elith & Burgman 2003). DOMAIN takes an opposite approach by determining the similarity of each cell in a map to a known presence site. That is, it measures the environmental similarity between a target site and the most similar known record site using the Gower metric (Legendre & Legendre 1998).
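The DOMAIN idea described above can be sketched in a few lines. This is a minimal illustration for continuous variables only, not the DOMAIN software itself: the function names, the example records and the choice of ranges used to standardize each variable are all assumptions made for the example.

```python
def gower_similarity(site, record, ranges):
    """Gower similarity between two sites described by continuous variables.

    Each absolute difference is divided by that variable's range, the scaled
    differences are averaged, and the result is subtracted from 1.
    """
    distance = sum(abs(a - b) / r for a, b, r in zip(site, record, ranges)) / len(ranges)
    return 1.0 - distance


def domain_score(site, presence_records, ranges):
    """Similarity of a target site to its most similar known presence record."""
    return max(gower_similarity(site, rec, ranges) for rec in presence_records)


# Hypothetical data: (mean annual temperature in deg C, annual rainfall in mm).
presences = [(16.0, 1100.0), (17.5, 950.0)]
ranges = (10.0, 1000.0)  # assumed range of each variable, used for scaling
score_same = domain_score((16.0, 1100.0), presences, ranges)
score_far = domain_score((21.0, 600.0), presences, ranges)
```

A site identical to a presence record scores 1, and the score falls as the site becomes environmentally less like its nearest record; a map is produced by repeating the calculation for every grid cell.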

The third category includes most other presence-only methods, including ENFA (Hirzel 2001), the genetic algorithm GARP (Stockwell & Peters 1999) and presence–absence methods adapted to presence-only data. These have various strengths and weaknesses, and some are more thoroughly tested than others (Elith & Burgman 2003). Of these methods, regression models (generalized linear models, GLMs, and generalized additive models, GAMs) are ecologically realistic and have shown reasonable performance when used as logistic models that have been adapted to allow modelling of presence-only survey data (Ferrier & Watson 1997; Zaniewski et al. 2002). We use them in the case study and present justifications for their use in the following sections.

A substantial drawback to the use of presence-only data is the lack of available and broadly accepted methods for evaluating the predictive performance of fitted models. Some methods are currently under development, but at the time of publication they remain largely untested and not broadly accepted (Philips et al. in press). For this and other reasons mentioned earlier, we would always consider presence-only data, and therefore presence-only models, to be inferior to presence–absence data and models. However, due to the prevalence of presence-only data, we have provided a demonstration of their use in habitat modelling in our case study.

Presence–absence data

A number of public planning exercises in Australia have utilized presence–absence data for deriving habitat models and maps (National Parks & Wildlife Service 1998, 2000; Ferrier et al. 2002a). Presence–absence data may suffer from the problem of uncertain zeros (MacKenzie et al. 2002; Tyre et al. 2003), but have the advantage that they are usually collected in a ‘systematic’ manner involving some level of geographical and environmental stratification in the sampling design (Austin & Heyligers 1989). Consequently, such data are more likely to contain samples that span the environmental gradients of interest, making model fitting more reliable.

There is a broad range of modelling methods that can utilize presence–absence data, and assessment of their relative performance has been the subject of considerable research effort (Ferrier & Watson 1997; Manel et al. 1999a,b; Elith 2000; Moisen & Frescino 2002). Multivariate association methods such as canonical correspondence analysis (ter Braak 1986), machine learning methods such as genetic algorithms (Stockwell & Peters 1999) and neural networks (Moisen & Frescino 2002), and tree-based methods such as classification and regression trees (Breiman et al. 1984) have all been proposed as potentially useful methods for modelling habitat preferences with presence–absence data.

Comparative studies have found the performance of logistic regression to be typically at least as good as other methods, if not better (Ferrier & Watson 1997; Elith 2000). Of the competing methods mentioned above, only logistic regression naturally assumes data are derived from a binomial process, which is the correct distribution when data are binary and observations independent. Regression methods, primarily GLMs (McCullagh & Nelder 1989) and GAMs (Hastie & Tibshirani 1990), have been a commonly applied method for modelling and predicting habitat occupancy for planning purposes (e.g. National Parks & Wildlife Service 1998, 2000; Li et al. 1999; Loyn et al. 2001). One of the advantages of GLMs and GAMs is the availability of free statistical software (R Development Core Team 2004) and detailed documentation and guidance for fitting and interpreting models (Harrell 2001; Hastie et al. 2001; R Development Core Team 2004). For these and other reasons described below, we focus on the use of GLMs and GAMs in describing species-habitat relationships and predicting the spatial distribution of suitable habitats.

GLMs and GAMs in habitat modelling

All GLMs are composed of a random component described by the assumed distribution of the observation data (either binomial or Poisson for many wildlife observation data), a systematic component specifying a linear combination of explanatory (or independent) variables, and a ‘link’ between the random and systematic components of the model that specifies how the mean response (i.e. observation) relates to the explanatory variables in the linear predictor (Agresti 1996). When observation data are binary (presence–absence), the expected value may be modelled as Pr(Y = 1) using the ‘logit’ transformation to link the random and systematic components. In this case, the regression model becomes (Agresti 1996):

$$\operatorname{logit}(p_i) = \log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} \qquad (1)$$

where $p_i$ is the probability that the species will be present at site $i$, $\beta_0$ is the intercept coefficient, the $x_k$ are the habitat variables and the $\beta_k$ are habitat variable coefficients. Equation 1 defines the special case of the GLM known as the logistic regression model.

Generalized additive models are a non-parametric generalization of GLMs in which the relationships between the dependent and independent variables are defined by non-parametric smoothing functions (Hastie et al. 2001). In practice, this means that the linear predictor ($\beta_0 + \sum_{j=1}^{k} \beta_j x_j$) that defines the relationship between the dependent and explanatory variables in GLMs is replaced by smoothing functions ($\beta_0 + \sum_{j=1}^{k} f_j(x_j)$). The $f_j$ are estimated in a ‘flexible’ manner, and there are a range of alternative smoothers available (Hastie et al. 2001). This flexibility confers an advantage to GAMs over GLMs in that they are able to ‘fit’ data more closely for a given number of degrees of freedom because they are not constrained to fit predefined parametric shapes (Bio et al. 1998). However, for the same reason, GAMs cannot be as easily interpreted as GLMs. Indeed, GAMs do not actually have a retrievable model formula in the classic sense, and interpretation generally requires a plot of the fitted response curves. Statistical packages such as R (R Development Core Team 2004) provide this facility for both GLMs and GAMs. GAMs can be fitted with the same ‘link’ functions as GLMs, so are capable of fitting logistic regression models.
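To make the logistic regression GLM concrete, the following sketch fits a model with an intercept and a single habitat variable by Newton-Raphson maximum likelihood. It is a teaching illustration only: in practice a statistical package would be used, and the function names and the small presence–absence data set here are invented for the example.

```python
from math import exp


def inv_logit(eta):
    """Inverse of the logit link: convert a linear predictor to a probability."""
    return 1.0 / (1.0 + exp(-eta))


def fit_logistic(x, y, iterations=25):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson (maximum likelihood)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [inv_logit(b0 + b1 * xi) for xi in x]
        # Gradient of the binomial log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Entries of the 2 x 2 information matrix.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system H * delta = g.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical survey: habitat variable x and presence (1) / absence (0) y.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [0, 0, 1, 0, 1, 1, 0, 1]
b0, b1 = fit_logistic(x, y)
fitted = [inv_logit(b0 + b1 * xi) for xi in x]
```

The fitted values are the estimated probabilities of presence at each site. A GAM would replace the term `b1 * x` with a smooth function of `x`, which is why it yields no comparable pair of coefficients to report.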

It is easy to build models that are ecologically unrealistic. Predictions from unrealistic models will have large errors and are likely to be less robust and generalizable than those from realistic models (Austin 2002). Model interpretability should therefore influence the choice of modelling method, because a model can only be checked for its realism if it is interpretable. Two key features of ecological realism are the choice of explanatory variables included in the model and the shape of the response fitted for those variables (Austin 2002). Methods such as neural networks and genetic algorithms are difficult to interpret on both counts. Ennis et al. (1998) found that logistic regression provided as good or better performance than more complicated methods, including multivariate adaptive regression splines (Friedman 1991) and back-propagated neural networks (Ripley 1995). Similar results have been found in other comparisons (Elith & Burgman 2002; Moisen & Frescino 2002). Even though GAMs tend to perform better than GLMs in such comparisons, the simplicity of GLMs, their broad availability in statistical packages, the ease with which they can be applied within a geographical information system (GIS) framework and the ready availability of prediction intervals mean that they are still useful and frequently implemented. The construction of a mathematical formula that describes the relationship between the dependent variable and the environment provides a compact and communicative representation of a model. A logistic regression GLM formula that is published in a report or thesis can be used to predict species occurrence and compute prediction intervals without any direct reference to the data on which the model was fitted (providing that due care is taken not to extrapolate beyond the environmental scope of the original data). This degree of generality is not a feature of other modelling methods. A further strength of GLMs is the ease with which uncertainty about coefficients and predictions can be conveyed as standard errors and prediction intervals. Prediction intervals are not yet easily obtained from GAMs.
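The portability of a published logistic regression formula can be sketched as follows. The coefficients, variable names and fitted ranges below are invented for illustration and do not come from any model in this paper; the range check is one simple way of guarding against extrapolation beyond the environmental scope of the original data.

```python
from math import exp

# Hypothetical published model: coefficients and the range of each variable
# in the data used to fit it (all values invented for illustration).
INTERCEPT = -2.0
COEFFICIENTS = {"old_forest_prop": 3.0, "rainfall_m": 0.5}
FITTED_RANGE = {"old_forest_prop": (0.0, 1.0), "rainfall_m": (0.6, 1.6)}


def predict_occupancy(site):
    """Return the predicted probability of presence, refusing to extrapolate."""
    for name, (low, high) in FITTED_RANGE.items():
        if not low <= site[name] <= high:
            raise ValueError(f"{name}={site[name]} is outside the fitted range")
    # Linear predictor from Equation 1, then the inverse logit.
    eta = INTERCEPT + sum(COEFFICIENTS[n] * site[n] for n in COEFFICIENTS)
    return 1.0 / (1.0 + exp(-eta))


p = predict_occupancy({"old_forest_prop": 0.5, "rainfall_m": 1.0})
```

Only the coefficients and the ranges are needed to apply the model to a new map; prediction intervals would additionally require the published standard errors or covariance matrix of the coefficients.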


A CASE STUDY: HABITAT MODELLING FOR CONSERVATION PLANNING IN CENTRAL NSW

The Lower Hunter and Central Coast (LHCC) region of New South Wales (NSW) presents an opportunity to study urbanization pressures in a relatively pristine landscape with substantial biodiversity values. The region includes the seven local councils of Cessnock, Gosford, Lake Macquarie, Maitland, Newcastle, Port Stephens and Wyong (Fig. 1). It comprises large areas of native forest in both public and private tenures (National Parks & Wildlife Service 2000). The region is approximately 7200 square kilometres, of which approximately 65% is under forest cover. The LHCC Regional Environmental Management Strategy (LHCCREMS) seeks to integrate biodiversity information into future land use planning and development in the region (LHCCREMS 2004). Biodiversity projects under the LHCCREMS include the development of vegetation mapping for the region; a gap analysis of biological survey data for the region and subsequent fauna surveys to augment existing biodiversity data (Ecotone Ecological Consultants 2001); the development of wildlife habitat models and habitat maps for priority species in the region (Wintle et al. 2004); and the development of preliminary conservation recommendations for each species (LHCCREMS 2004). This paper reports on the methods used to develop and evaluate habitat models and maps for each of the priority species in the region.

Priority species and their habitat requirements

Seven fauna species were selected for modelling on the basis of data availability and ‘priority’ status, with priority largely determined by threatened species status. By choosing species on the basis that they were deemed particularly sensitive to identifiable threats, the selection process loosely followed the rationale of the focal species approach (Lambeck 1997). The primary threatening process in the LHCC region is land clearing for development, though commercial forestry on public land may potentially pose a threat to some species (LHCCREMS 2004).

The koala (Phascolarctos cinereus), the yellow-bellied glider (Petaurus australis) and the squirrel glider (Petaurus norfolcensis) are arboreal marsupials that are broadly distributed throughout forest and woodlands of eastern Australia. The suitability of habitat for arboreal marsupials is influenced by the size and species of trees present, soil nutrients, climate, rainfall and the size and disturbance history of the habitat patches (Reed & Lunney 1990). The two gliders are specifically reliant on tree hollows for shelter. The squirrel glider requires food supplied by flowering acacias and banksias (Russel 1995), and the yellow-bellied glider needs particular species of sap-trees and winter flowering eucalypts (Goldingay & Kavanagh 1993). Available habitat for these species is becoming increasingly fragmented because of habitat destruction caused by urban and infrastructure development, agriculture and mining (Reed & Lunney 1990).

Fig. 1. The Lower Hunter Central Coast (LHCC) region, situated north of Sydney in New South Wales. Seven local councils are located in the region. Light shading indicates areas under forest cover. [Map not reproduced. Map labels: 151°E, 33°S. Legend: vegetation cover (vegetated, non-vegetated); local government areas.]

The tiger quoll (Dasyurus maculatus) is a medium-sized carnivorous marsupial that inhabits a wide range of habitat types, from sclerophyll forest and woodlands to coastal heathlands and rainforests (Edgar & Belcher 1995). The species requires suitable den sites (such as hollow logs, tree hollows, rock outcrops or caves) and a large area of intact vegetation for foraging (Edgar & Belcher 1995).

The powerful owl (Ninox strenua), the sooty owl (Tyto tenebricosa) and the masked owl (Tyto novaehollandiae) are broadly distributed throughout forest, woodland and rainforest of eastern Australia. All three owl species tolerate some degree of fragmentation and discontinuity in forest cover. However, they do rely on large tree hollows for nesting and denning, and prey on forest-dependent species such as marsupial gliders and small ground mammals, making them susceptible to declines in prey abundance resulting from high levels of forest fragmentation (Kavanagh 2002).

The choice of model variables was driven by what was known about the habitat requirements of each species. Of the species modelled, home range estimates vary from as low as 0.65 ha for the squirrel glider (Russel 1995) to 20 km² for the tiger quoll (Edgar & Belcher 1995). A modelling grid cell size of 1 ha was chosen on the basis that it approximates the home range size of the smallest ranging species.

Data collation, filtering and handling

Biological data

Data for the seven target species were obtained from the biological systematic survey (BSS) module of the NSW NPWS Wildlife Atlas database and from surveys commissioned by the LHCCREMS specifically undertaken to fill gaps in the Atlas data for priority species (Ecotone Ecological Consultants 2001). Systematic survey data contained records of both presence and absence for each of the seven priority species. The availability of systematic survey data reduced the effort required to prepare data for modelling, as survey method and effort covariates were available for data filtering. Only data collected after 1990 and using survey methods appropriate to detect each species were used in modelling. Records were filtered to ensure a minimum geographical separation of 500 m in an attempt to increase the probability that observations were independent. Presence records were retained in preference to absence records when two (or more) records fell within 500 m of each other. The data that remained for model building and testing are summarized in Table 1.
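The 500 m separation filter can be sketched as a simple greedy thinning. The paper does not state the exact procedure, so the ordering rule (presences considered before absences, otherwise input order) and the function and variable names are assumptions made for this example; coordinates are assumed to be projected, in metres.

```python
from math import hypot


def filter_records(records, min_separation=500.0):
    """Thin survey records so retained records are at least min_separation apart.

    records: list of (easting_m, northing_m, present) tuples.
    Presences are considered first, so a presence is kept in preference to
    any absence that falls within min_separation of it.
    """
    ordered = sorted(records, key=lambda r: not r[2])  # presences first (stable sort)
    kept = []
    for x, y, present in ordered:
        if all(hypot(x - kx, y - ky) >= min_separation for kx, ky, _ in kept):
            kept.append((x, y, present))
    return kept


# Hypothetical records in projected coordinates (metres).
records = [
    (0.0, 0.0, False),
    (300.0, 0.0, True),    # within 500 m of the first record: the presence is kept
    (1000.0, 0.0, False),
    (1200.0, 0.0, False),  # within 500 m of the previous absence: dropped
]
kept = filter_records(records)
```

With this rule, two presences closer than 500 m are also reduced to one, which is a further assumption rather than something the text specifies.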

Survey method or effort covariates may be incorporated in models when survey data are collected using methods of variable reliability (Pearce et al. 2001a). In our study, such covariates were not required because data used for modelling were relatively uniform with respect to method and effort.

Environmental data

One of the primary limitations on the predictive performance of wildlife habitat models is the availability of broadly mapped environmental variables that are closely related to environmental attributes that affect the distribution of wildlife. Austin (2002) delineated two types of independent variables for model building: (i) ‘proximal’ (direct) variables are those that represent resource, shelter or thermal gradients that have a direct influence on a species’ distribution (e.g. temperature and foliar nutrient); and (ii) ‘distal’ (or indirect) variables are those that have no physiological effect on the species but are correlated with ‘proximal’ variables (e.g. altitude, latitude). Modelling with proximal variables will more often produce a model that makes transportable and robust predictions, whereas models based on distal predictors are likely to be more specific to the location in which they were constructed (Austin 2002). However, direct variables are not always available as GIS layers because they tend to be difficult to map (Guisan & Zimmerman 2000), so model building for the purpose of prediction is often undertaken using distal variables.

Experts should play a role in the identification of environmental variables that may be important predictors of a species’ habitat. They may be useful in developing new variables that are combinations or derivations of distal variables. For example, indices such as a ‘hollows index’ or ‘foliar nutrient index’ have been developed from maps of forest age and floristic composition and used in habitat modelling in the past (National Parks & Wildlife Service 1998). Similarly, vegetation variables with many categories of vegetation are often difficult to use as habitat variables in their raw form. Experts may be used to aggregate vegetation classes into those most likely to be of relevance to the species being modelled, or to develop interaction terms between, for example, vegetation type, topographic context and forest age. Derived variables may serve as useful predictors of habitat where the raw variables do not. Similarly, neighbourhood measures such as ‘the proportion of old forest within 2 km’ have also been used successfully in habitat modelling (Ferrier et al. 2002a). Neighbourhood measures convey the local environmental context of a site, which may be as important as the attributes of the site itself, especially where the home range of a species is larger than the cell size used in modelling or where survey locations are imprecisely known. Compiling a set of candidate environmental variables should involve careful consideration of the biology of the species being modelled.

Table 1. Number of presence and absence sites derived from the LHCCREMS and BSS databases used in model building and evaluation for each of the seven priority species

Species                 Presence records   Absence records   Total
Koala                          88                162           250
Tiger quoll                    36                 75           111
Squirrel glider               112                129           241
Yellow-bellied glider          92                152           244
Masked owl                     55                149           204
Powerful owl                   97                142           239
Sooty owl                      56                156           212

BSS, biological systematic survey; LHCCREMS, The Lower Hunter and Central Coast Regional Environmental Management Strategy.

A set of climatic variables derived from ANUCLIM (Houlder et al. 1999) was available for use in model development. These may have a direct role in determining species distributions through metabolic constraints. However, the remainder of the available predictor variables would be considered distal. Neighbourhood measures were derived and tested for all species at a range of neighbourhood distances. The absence of predictor variables relating to forest growth stage, a commonly used surrogate for tree-hollow and denning availability (National Parks & Wildlife Service 1998, 2000; Loyn et al. 2001), is likely to place substantial limitations on the predictive performance of final models. All environmental variables were stored as spatial layers (or grids) in a GIS with a grid cell resolution of 100 m, which was satisfactory with respect to the home range size of the target species.

Testing the adequacy of survey data based on environmental strata

There was a sufficient number of species survey data points in the region for fitting statistical models for each of the seven priority species. However, having a sufficient number of data points for fitting a regression model does not guard against inherent bias in survey data. To address concerns about geographical and environmental biases, data were tested for their coverage of key environmental strata. This was undertaken to ensure that the models developed would be relevant to the range of environmental conditions present in the region and that excessive extrapolation of fitted responses would not be required. Environmental strata were defined by overlaying GIS raster maps (1 ha grid cell size) of four key variables: broad vegetation cover (seven classes), topographic position (three classes), mean annual temperature (four classes) and mean annual rainfall (seven classes), resulting in 384 possible strata, 256 of which were represented by at least 50 ha of forest in the region. By overlaying survey locations and the map of environmental strata, it is possible to tabulate the proportion of sampled versus unsampled strata for each species. Maps showing the regional coverage of sampled and unsampled strata were created for each species to illustrate where model predictions may be less reliable and where future biological surveys could be targeted.
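The tabulation of sampled versus unsampled strata can be illustrated with a short sketch. The stratum labels, the function name `strata_coverage` and the toy landscape below are hypothetical; the 50-cell cut-off mirrors the 50 ha minimum mentioned above, on the assumption that each cell is 1 ha.

```python
from collections import Counter

def strata_coverage(cell_strata, survey_strata, min_cells=50):
    """Return (sampled strata, unsampled strata, area fraction sampled).

    cell_strata: one stratum label per forested 1 ha grid cell.
    survey_strata: stratum label at each survey location.
    min_cells: strata with fewer cells (hectares) are ignored.
    """
    area = Counter(cell_strata)
    eligible = {s for s, n in area.items() if n >= min_cells}
    sampled = eligible & set(survey_strata)
    unsampled = eligible - sampled
    total = sum(area[s] for s in eligible)
    frac = sum(area[s] for s in sampled) / total if total else 0.0
    return sampled, unsampled, frac

# Hypothetical labels: (vegetation, topography, temperature class, rainfall class).
cells = (
    [("wet", "gully", 1, 3)] * 120
    + [("dry", "ridge", 2, 1)] * 80
    + [("dry", "slope", 2, 2)] * 10   # below the 50 ha cut-off, so ignored
)
surveys = [("wet", "gully", 1, 3), ("wet", "gully", 1, 3)]
sampled, unsampled, frac = strata_coverage(cells, surveys)
```

In this toy landscape one of the two eligible strata is sampled, covering 120 of the 200 eligible hectares.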

Model development and evaluation

Data preparation

Geographical information system layers representing environmental variables were sampled at each survey location using ArcInfo (ESRI 1997) to construct a modelling data frame for each species. Similar functions are available in most GIS software. Each row of data contained the survey observation (1 = species present, 0 = species absent) and values for each of the candidate predictor variables at the survey location. This resulted in an n × (k + 1) data matrix, where n is the number of survey locations and k is the number of potential predictor variables.

In order to construct presence-only models, a second data matrix was created for each species that contained only the records for which the species was present. Background samples (sometimes termed ‘pseudo-absence’ data) were generated for 10 000 random locations across the landscape according to methods described by Ferrier and Watson (1997) and using software developed by Landcare New Zealand (J. Overton, pers. comm. 2005). The rationale for using background samples when no ‘real’ or systematic absence data exist is that they can be used to create an environmental profile of the study area, which is then compared with the environmental profile of known ‘presence’ locations. An n × (k + 1) modelling matrix was created as for the presence–absence data, where n is equal to the number of presence observations plus the 10 000 random ‘observations’ of absence.
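As a rough illustration of how a presence-background matrix of this shape can be assembled, the sketch below stacks presence rows on a random sample of landscape cells. It assumes simple random sampling without replacement and does not reproduce the Ferrier and Watson (1997) procedure or the Landcare software; `build_matrix` and the toy landscape are hypothetical.

```python
import random

def build_matrix(presence_rows, landscape_rows, n_background=10000, seed=1):
    """Stack presence rows (response 1) on random background rows (response 0).

    Each input row is a list of k predictor values; each output row has
    k + 1 entries, with the response in the first column.
    """
    rng = random.Random(seed)
    n_background = min(n_background, len(landscape_rows))
    background = rng.sample(landscape_rows, n_background)
    return [[1] + list(r) for r in presence_rows] + [[0] + list(r) for r in background]

# Hypothetical landscape of 500 cells with k = 2 predictors (temp, rain).
landscape = [[10 + i % 7, 800 + i] for i in range(500)]
presences = [[12, 950], [13, 1010], [12, 990]]
matrix = build_matrix(presences, landscape, n_background=100)
```

The result has n = 3 + 100 rows and k + 1 = 3 columns.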

Data transformation

In some instances the distribution of environmental variables at survey locations may be long-tailed. This may bring about problems with model fitting because points in the tails have high leverage, and it tends to reduce the explanatory power of independent variables. Data may be transformed to avoid this problem. A common transformation is the log-transformation, though a range of other transformations exist. Inspection of our modelling data revealed no substantial problems with long-tailed predictor variables.

Variable reduction

Approximately 50 variables were available that described the environmental characteristics that may govern site occupancy for the seven target species. It is common to offer all possible variables as candidate model predictors (e.g. National Parks & Wildlife Service 1998) and to utilize an automated variable selection routine to eliminate inappropriate predictors and specify the final model. However, offering many candidate variables tends to result in models that include nonsense predictors and exclude variables that in fact influence the probability of occupancy (Derksen & Keselman 1992; Chatfield 1995; Steyerberg et al. 2001a). This is especially problematic when the number of species observations is low compared with the number of candidate predictors.

An alternative approach is to minimize the number of variables that are offered to the variable selection routine. Harrell et al. (1996) recommend a rule of thumb that fewer than m/10 predictor degrees of freedom (PDF) should be offered as candidates to a variable selection routine such as backward selection, where m equals the number of observations of the least prevalent class in the modelling data set (often the number of presence observations). PDF denotes the number of possible parameters estimated in the largest model that could be constructed from the set of candidate predictors. The number of parameters depends on the number of predictors and the form in which the predictor variables are used in the model (see next section). For categorical variables or non-linear responses there is more than one PDF for each environmental variable. For example, categorical variables have a parameter for all but one category, sometimes leading to many PDF per categorical variable. Similarly, quadratic or cubic functional forms require two and three PDF, respectively. In our situation, m ranges from 36 for the tiger quoll to 112 for the squirrel glider (Table 1). Therefore, the maximum number of candidate PDF offered to the variable selection routine should be no more than about 4 or 11, respectively (m/10 = 3.6 and 11.2). In this case, the allowable PDF is much less than the total number available. The number of candidate PDF was firstly reduced by removing distal variables that were highly correlated with proximal variables (R > 0.6) in the context of the biological or metabolic requirements of the species.
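The arithmetic behind the m/10 rule can be made concrete with a small worked example. The counts come from Table 1; the candidate set, the function names and the way predictor forms are encoded are hypothetical and serve only to show how PDF accumulate.

```python
def predictor_df(spec):
    """Degrees of freedom for one candidate predictor.

    spec is ("linear",), ("poly", degree), ("smooth", df) or ("factor", n_levels).
    """
    kind = spec[0]
    if kind == "linear":
        return 1
    if kind in ("poly", "smooth"):
        return spec[1]
    if kind == "factor":
        return spec[1] - 1  # one parameter for all but one category
    raise ValueError(f"unknown predictor form: {kind}")

def pdf_budget(n_presence, n_absence):
    """m/10, where m is the size of the least prevalent class."""
    return min(n_presence, n_absence) / 10

# Counts from Table 1.
quoll_budget = pdf_budget(36, 75)      # 3.6
glider_budget = pdf_budget(112, 129)   # 11.2

# Hypothetical candidate set: linear temp, quadratic rain, 4-class vegetation factor.
candidates = {"temp": ("linear",), "rain": ("poly", 2), "veg": ("factor", 4)}
total_pdf = sum(predictor_df(s) for s in candidates.values())  # 1 + 2 + 3 = 6
```

Even this small candidate set (6 PDF) would exceed the budget for the tiger quoll while fitting comfortably within that for the squirrel glider.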

Irrespective of variable reduction issues, fitting statistical models with collinear predictor variables may cause statistical problems and should be avoided (Belsley et al. 1980). Expert opinion and previous habitat models were consulted where further variable reduction was required. The number of candidate model variables (and PDF) offered to the variable selection routine varied for each species, depending on the amount of biological survey data available. The final set of candidate predictors presented in Table 2 includes variables that were offered as candidates for at least one species.
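A correlation screen of the kind used for variable reduction might look like the following sketch. It assumes Pearson correlation and compares the absolute value against the 0.6 cut-off; the variable values and the function `drop_correlated_distal` are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def drop_correlated_distal(data, proximal, distal, threshold=0.6):
    """Return the distal variable names kept after removing those with
    |r| > threshold against any proximal variable."""
    kept = []
    for d in distal:
        if all(abs(pearson(data[d], data[p])) <= threshold for p in proximal):
            kept.append(d)
    return kept

# Hypothetical site values: elevation tracks temperature closely, rugg500 does not.
data = {
    "temp": [18.0, 17.2, 16.1, 15.3, 14.0, 13.1],
    "elev": [50, 210, 400, 560, 800, 1000],
    "rugg500": [12, 40, 8, 35, 10, 38],
}
kept = drop_correlated_distal(data, proximal=["temp"], distal=["elev", "rugg500"])
```

Elevation is dropped because it is almost perfectly (negatively) correlated with temperature, whereas ruggedness is retained.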

Variable form

One of the strengths of GLMs and GAMs is the possibility of fitting non-linear relationships between the dependent variable (probability of occurrence) and the independent environmental variables. However, it may be difficult to know a priori whether a particular relationship is likely to be linear, quadratic, cubic or other. One strategy that is commonly employed is to allow an automated variable selection routine full freedom to fit any relationship using a trade-off between complexity and variance explained. However, it has been shown that such an approach may result in the fitting of spurious relationships with no logical interpretation (Steyerberg et al. 2001a). Consequently, we recommend either choosing functional forms with no automated variable selection, or limiting the range of functional forms available to the variable selection algorithm to those that are supported by sound ecological intuition and preliminary data analysis. In this exercise, we conducted preliminary analyses on response shapes by fitting univariate GAMs with five degrees of freedom and visually inspecting plots of response shapes (Austin & Meyers 1996). This approach allows the user to assess whether more complicated responses appear sensible and are justified by the data. Visual inspection of fitted response shapes was used again later in the process for evaluating the ecological realism of final selected models (see below).

Variable selection and model fitting

Final GLMs and GAMs for all species were selected and fitted to both presence–absence and presence-background data using a backward stepwise variable selection algorithm (Venables & Ripley 2003) in R, resulting in four models for each species. The variable selection algorithm tests a series of nested models using Akaike’s information criterion (AIC; Akaike 1973; Venables & Ripley 2003) to select between models. The choice of algorithm may influence the structure of the final model, and alternative automated model selection algorithms, including forward selection, are available. The backward selection algorithm is generally preferred to the other standard automated methods because it tends to perform better in the presence of collinear candidate variables and because it requires consideration of the full model fit (Harrell 2001). There is a substantial body of literature dedicated to model selection issues, and automated algorithms have been criticized (Chatfield 1995; Hoeting et al. 1999; Harrell 2001). However, the alternatives tend to be complicated. We have chosen to use a packaged model selection algorithm as a compromise between practicality and sophistication. We reiterate that expert reduction of predictor variables prior to final model selection is critical to the successful application of automated variable selection routines (Harrell 2001).

Model evaluation statistics

Presence–absence models were evaluated using two statistics: (i) the area under the receiver operating characteristic (ROC) curve (Hanley & McNeil 1982) (ROC area), closely related to the Mann–Whitney U-statistic; and (ii) Miller’s calibration slope (Miller et al. 1991; Pearce & Ferrier 2000). The ROC area evaluates a model’s ability to discriminate between presence and absence sites and is therefore referred to as a measure of model (or predictive) ‘discrimination’ (Pearce & Ferrier 2000). This provides an indication of the usefulness of the models for prioritizing areas in terms of their relative importance as habitat for the particular species. The ROC area ranges from 0 to 1, where a score of 1 implies perfect discrimination and a score of 0.5 implies predictive discrimination that is no better than a random guess (Bamber 1975). The actual value of the ROC area has a straightforward interpretation: it is the probability that, for a randomly selected pair of presence and absence observations derived from field surveys, the model prediction for the presence will be greater than the prediction for the absence.
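The pairwise interpretation of the ROC area translates directly into code. The sketch below computes it by brute force over all presence-absence pairs, counting ties as one half (the usual Mann–Whitney convention); the predictions and observations are made up for illustration.

```python
def roc_area(predictions, observed):
    """Probability that a random presence outscores a random absence.

    Ties between a presence and an absence count as half a win.
    """
    pres = [p for p, y in zip(predictions, observed) if y == 1]
    absn = [p for p, y in zip(predictions, observed) if y == 0]
    wins = 0.0
    for p in pres:
        for a in absn:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(pres) * len(absn))

# Hypothetical predictions at three presence and three absence sites.
preds = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
obs = [1, 1, 1, 0, 0, 0]
auc = roc_area(preds, obs)  # 8 of the 9 pairs are ranked correctly
```

The double loop is quadratic in the number of sites, which is adequate for data sets of the size in Table 1; rank-based formulas are preferable for large samples.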

Miller’s calibration statistics (MCS) evaluate the ability of a model to correctly predict the proportion of sites with a given environmental profile that will be occupied. MCS are derived from a logistic regression of observations on the logit of predicted probabilities. The rationale is that the slope of the regression would be equal to one and the regression intercept would equal zero if the predictions from the model were perfectly calibrated (Harrell 2001; Pearce & Ferrier 2000). The model is known as the ‘logistic calibration equation’. The two calibration statistics are literally the logistic regression slope (‘calibration slope’) and intercept (‘calibration intercept’), though it is common to report only the calibration slope.
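The logistic calibration equation can be fitted with a few lines of Newton–Raphson. The sketch below is a minimal, dependency-free illustration rather than the R code referred to elsewhere in this paper; the function names and the toy data are assumptions. The toy data are constructed so that the observed proportion of presences equals the predicted probability in every group, in which case the fitted intercept and slope should be 0 and 1.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def fit_logistic_1d(x, y, iters=50):
    """Maximum-likelihood intercept a and slope b for
    P(y = 1) = 1 / (1 + exp(-(a + b * x))), by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu            # score for the intercept
            g1 += (yi - mu) * xi     # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b

def calibration(predictions, observed):
    """Calibration intercept and slope: regress outcomes on logit(prediction)."""
    return fit_logistic_1d([logit(p) for p in predictions], observed)

# Hypothetical, perfectly calibrated predictions: 2/10, 5/10 and 8/10 occupied.
preds = [0.2] * 10 + [0.5] * 10 + [0.8] * 10
obs = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2
intercept, slope = calibration(preds, obs)
```

Predictions of exactly 0 or 1 have no finite logit and would need to be truncated before this function is applied. A slope below one indicates predictions that are too extreme, which is the typical signature of an overfitted model.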

The presence-background models were evaluated qualitatively. The statistics available for evaluating presence-only models are limited and a fair comparison with the presence–absence models was not possible here.

Table 2. Abbreviated names and definitions of mapped environmental data used as candidate predictor variables for inclusion in habitat models. All environmental data were available in raster format with 100 m (side length) grid cell size

Candidate variable       Definition
Temp                     Mean annual temperature, derived from ANUCLIM
Cold                     Mean temperature of the coldest period, derived from ANUCLIM
Rain                     Mean annual rainfall, derived from ANUCLIM
Dry2000                  The percentage of cells in a 2000-m radius containing dry forest
Rf2000                   The percentage of cells in a 2000-m radius containing rainforest
Solar                    The solar radiation index of a cell, derived from ANUCLIM
Elev                     The elevation of a cell (in metres) above sea level
Rugg250 (500, 1000)      Topographic ruggedness (standard deviation in elevation) in a 250-m, 500-m and 1000-m radius
Terr250 (500, 1000)      Relative terrain position in a 250-m, 500-m and 1000-m radius
Topo                     The topographic position of a cell, ranging from gully to ridge top (0–100)
Unmod500                 The percentage of cells in a 500-m radius containing unmodified forest
Unmod2000                The percentage of cells in a 2000-m radius containing unmodified forest
Wet500                   The percentage of cells in a 500-m radius containing wet forest
Wet2000                  The percentage of cells in a 2000-m radius containing wet forest
Fert                     Index of the soil nutrient content at a site, based on geochemical data (CSIRO)
Percnonfor2k             The percentage of cells in a 2000-m radius classified as cleared land
Ybglexp2000†             The percentage of cells in a 2000-m radius containing suitable ybgl habitat
Sqglexp2000†             The percentage of cells in a 2000-m radius containing suitable sqgl habitat
Sowlexp2000†             The percentage of cells in a 2000-m radius containing suitable sowl habitat
Mowlexp2000†             The percentage of cells in a 2000-m radius containing suitable mowl habitat
Powlexp2000†             The percentage of cells in a 2000-m radius containing suitable powl habitat
Koalexp2000†             The percentage of cells in a 2000-m radius containing suitable koal habitat

†Denotes expert variable derived from vegetation classes (Michael Murray, pers. comm. 2004). koal, koala; mowl, masked owl; powl, powerful owl; quol, tiger quoll; sowl, sooty owl; sqgl, squirrel glider; ybgl, yellow-bellied glider.

The model evaluation method (bootstrapping)

Presence–absence models for each species were evaluated using a ‘bootstrapping’ approach (Efron & Tibshirani 1997; Harrell 2001). Naively testing a model on the data that were used to fit it (also known as ‘in-sample validation’) is inappropriate and known to provide optimistic estimates of model performance (Fielding & Bell 1997; Steyerberg et al. 2001b), despite being a commonly used approach to model evaluation (National Parks & Wildlife Service 1998, 2000). Ideally, models should be tested on a completely independent data set, with data collection and stratification specifically designed for model evaluation. However, this is seldom a practical option and may also be prone to high variance unless samples are large (J. Elith, unpublished data 2005). Bootstrapping provides a realistic estimate of the predictive performance of a model without incurring the expense of collecting a completely new model-testing data set. Bootstrapping involves resampling the modelling data and conducting a series of model building and testing simulations that provide an estimate of the optimism arising from in-sample validation. The estimate of optimism is used to provide an adjusted estimate of the model evaluation statistics (ROC and MCS). The bootstrapping version implemented here is believed to provide the least biased estimate of predictive performance of any of the model evaluation methods that are based on resampling, including cross-validation (Hastie et al. 2001). Cross-validation (Efron & Tibshirani 1997) provides an alternative approach to model evaluation and might be more feasible with methods or data sets that create large computational loads. However, its estimates of error rates with independent data can be less precise than those derived from bootstrapping (Steyerberg et al. 2001b), which can be thought of as a smoothed version of cross-validation (Efron & Tibshirani 1997). The bootstrapping method (the 0.632+ bootstrap) is detailed in Appendix I. R code for obtaining bootstrapped estimates of ROC and MCS is available from the URL given previously.
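The idea of estimating and subtracting optimism can be illustrated with the simple optimism-corrected bootstrap. This sketch is not the 0.632+ variant detailed in Appendix I, and it is not the R code mentioned above: a one-predictor logistic regression stands in for the full model-building procedure (which in practice would include variable selection inside each resample), and all function names and data are hypothetical.

```python
import math
import random

def fit_logistic(x, y, iters=25):
    """Newton-Raphson fit of P(y = 1) = 1 / (1 + exp(-(a + b * x)))."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            eta = max(-30.0, min(30.0, a + b * xi))  # clamp to avoid overflow
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            g0 += yi - mu
            g1 += (yi - mu) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-10:  # degenerate or separable resample: stop updating
            break
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def roc_area(scores, y):
    """Pairwise ROC area; ties count as half."""
    pres = [s for s, yi in zip(scores, y) if yi == 1]
    absn = [s for s, yi in zip(scores, y) if yi == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pres for q in absn)
    return wins / (len(pres) * len(absn))

def optimism_corrected_auc(x, y, n_boot=200, seed=1):
    """Return (apparent ROC area, optimism-corrected ROC area)."""
    rng = random.Random(seed)
    n = len(x)
    a, b = fit_logistic(x, y)
    apparent = roc_area([a + b * xi for xi in x], y)
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xb = [x[i] for i in idx]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # resample lacks one class, so the ROC area is undefined
        ab, bb = fit_logistic(xb, yb)
        auc_boot = roc_area([ab + bb * xi for xi in xb], yb)  # tested on the resample
        auc_orig = roc_area([ab + bb * xi for xi in x], y)    # tested on the original data
        optimism.append(auc_boot - auc_orig)
    return apparent, apparent - sum(optimism) / len(optimism)

# Hypothetical data: 40 sites along one gradient, with an overlap zone in the middle.
x = [(i - 19.5) / 10 for i in range(40)]
y = [0 if i < 12 else 1 if i >= 28 else i % 2 for i in range(40)]
apparent, corrected = optimism_corrected_auc(x, y)
```

With a single predictor the estimated optimism is small; it grows as more candidate variables and more flexible response shapes are offered to the model-building step, which is the situation bootstrapping is intended to guard against.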

Inspecting models for ecological realism

Final models were inspected for their ecological realism using partial plots of univariate fitted functions (Venables & Ripley 2003), similar to those used to determine the complexity of candidate response shapes. Partial plots are available as an option in statistical software and provide a ready means to evaluate the ecological realism of fitted responses. If an implausible fitted relationship between the probability of occupancy by the species and a particular environmental variable was observed in the partial plots, the variable was excluded from the model or the model was re-fitted with fewer degrees of freedom for that particular environmental variable.

Case study results and interpretation

The adequacy of species survey data for model fitting

Environmental strata were reasonably well sampled, even for the species with the least and most restricted survey data. For six of the seven species, sampled strata make up at least 70% of the region that is under forest cover, the only exception being the quoll, for which only 56% of the region comprises sampled strata. Similarly, the geographical spread of sampled strata was reasonable for most species, with maps of sampled strata showing good coverage of the region for all species (Fig. 2). The maps of sampled and unsampled strata provide an indication of the areas in which habitat model predictions may be less reliable.

Models and predictive performance

The best models in terms of both predictive discrimination and calibration were those for the sooty owl, squirrel glider, yellow-bellied glider, koala and tiger quoll, all of which had predictive discrimination greater than 0.75 for their best model (Table 3). The worst models were the powerful owl and masked owl models, with bootstrapped ROC areas ranging from 0.61 to 0.69. In general, GAMs had similar variable structure and degrees of freedom to GLMs, and comparable predictive discrimination. However, bootstrapped predictive calibration was always better for GAMs than for GLMs (Table 3). This is probably because the smooth functions in GAMs are better at fitting complex responses over the complete environmental range of the response, leading to more accurate predicted probabilities.

Presence-only models had similar model structures to the presence–absence models. The result suggests that the survey absence data used in the modelling exercise were not much more useful than a background random sample in defining unsuitable habitat for most species. This may result from the difficulties associated with obtaining reliable absence data for cryptic species. In addition, the quality of the presence-only data used in this modelling exercise is likely to be high compared with presence-only data found in most museums and herbaria because they were derived from systematic stratified surveys. Other studies have shown a decline in performance when fitting models with background samples compared with using presence–absence data (Ferrier & Watson 1997).

Preferred models were primarily chosen on the basis of predictive discrimination and calibration. Plausibility of response shapes was used to discern between models that showed similar predictive performance. Only model structures for the preferred model for each species are presented in Table 3. There was some variation between the four modelling methods in terms of the variables retained and their response shapes for each species. For example, the yellow-bellied glider GLM contained the term ‘dry2000’, while the GAM for the same species did not include this term. Similarly, the sooty owl GLM contained rainfall as a linear term, while the GAM for that species included rainfall as a smoothed spline with two degrees of freedom (a non-linear term). However, the structural differences between GLM and GAM models were minor, and the similarity in ROC areas of the two model types across all species (Table 3) implies that the differences had a minor impact on predictive discrimination.

Fig. 2. Coverage of sampled strata over the greater LHCC region for the best ((a) Koala) and worst ((b) Quoll) sampled species. Unsampled strata are represented by the light grey shading. Remaining areas within the region (not shaded) contain sampled strata or are non-forest and therefore not included in the analysis.

Table 3. Final presence–absence models and bootstrapped estimates of predictive discrimination (ROC) and calibration

Species   Preferred     ROC area       Calibration    Preferred model
          model type    GAM    GLM     GAM    GLM
Koala     GAM           0.76   0.76    0.84   0.75    sp ~ s(temp, 2) + s(dry2000, 3) + s(rugg500, 2) + unmod2000 + percnonfor2k
Quoll     GLM           0.71   0.75    0.73   0.59    sp ~ temp + rain + percnonfor2k + poly(dry2000, 2)
Sqgl      GAM           0.78   0.77    0.76   0.68    sp ~ s(rugg500, 3) + sqglexp500 + s(unmod500, 2)
Ybgl      GAM           0.78   0.77    0.73   0.63    sp ~ s(temp, 2) + s(rain, 2) + s(rf2000, 2) + unmod2000 + s(ybglexp2000, 3) + percnonfor2k
Mowl      GAM           0.63   0.61    0.63   0.55    sp ~ s(cold, 2) + mowlexp2000 + rf2000 + s(unmod2000, 2) + wet2000 + percnonfor2k
Powl      GAM           0.69   0.68    0.80   0.62    sp ~ s(cold, 3) + s(rain, 2) + s(rugg500, 2)
Sowl      GAM           0.85   0.85    0.74   0.67    sp ~ s(rain, 2) + rugg500 + s(sowlexp2000, 3) + s(ter1000, 2) + unmod2000

Only the preferred model selected from the presence–absence models (either a generalized linear model or a generalized additive model) is presented for each species. Column ‘preferred model type’ indicates which of the methods produced the final preferred model for each species. ROC and calibration slope statistics are presented for the model derived under each modelling method. Variable abbreviations and details are given in Table 2. The expressions ‘poly(variable, n)’ and ‘s(variable, n)’ in the preferred model column indicate variables included as polynomial (‘poly’) or smoothed (‘s’) terms with ‘n’ degrees of freedom. ROC, receiver operating characteristic.

Plots of fitted functions

Plotting fitted relationships provided a useful means to visualize and manipulate the behaviour of fitted models to ensure that they were ecologically meaningful. In general, fitted relationships were found to be ecologically plausible. Plots of sooty owl response shapes (Fig. 3) show how the probability of sooty owl site occupancy varies with each of the predictor variables contained in the final GAM for that species. Fitted functions for individual variables in a multivariate model may differ from univariate fitted functions because of interactions between predictor variables. Consequently, it is important to observe response shapes for the final model to ensure that they remain ecologically reasonable in the multivariable context.

Habitat maps

A habitat map is provided for the sooty owl as an example of the type of output that can be used in conservation planning (Fig. 4). The map shows the predicted probability of sooty owl presence throughout the region. Probabilities have been classified into four categories (0–0.2, 0.21–0.5, 0.51–0.7, 0.71–1.0) for presentation. These maps also display the presence records for the species that were used in model fitting, allowing a visual assessment of predictive performance. Mapped predictions and uncertainties (based on upper and lower 95% prediction intervals) for the remaining six species are available from the authors.

Fig. 3. Partial plots of the relationship between the probability of occupancy and environmental variables included in the final sooty owl model. The X-axis represents the range of values for each environmental variable. Probabilities on the Y-axis are plotted in transformed ‘logit’ space so that they can be interpreted in the same way as linear regressions. Response shapes in each plot represent the relationship between each variable and the probability of sooty owl occupancy in the multivariate model context, independent of the other variables included in the model. Dashed lines represent 95% confidence intervals around the fitted response shape. Rugg500, topographic ruggedness (standard deviation in elevation) in a 500-m radius; Sowlexp2000, the proportion of suitable vegetation (as defined by experts) in a 2000-m radius; terr1000, relative terrain position in a 1000-m radius; unmod2000, the percentage of cells in a 2000-m radius containing unmodified forest.

[Fig. 3 panels: rain, rugg500, sowlexp2000, terr1000, unmod2000; Y-axis in each panel: logit (probability of occupancy)]

DISCUSSION

The importance of ecological realism in modelling and the role of experts

The choice of modelling methods and the approach to model building and evaluation presented in this paper reflect the importance we place on maintaining ecological realism throughout the model building process. While maintaining statistical rigour is central to good model building, there is an expansive literature on the statistical nuances of model fitting, interpretation and prediction (Guisan & Zimmerman 2000). Conversely, there is much less literature available concerning the importance of maintaining ecological realism in model building (Austin 1991, 2002). The process of plotting and evaluating fitted functions throughout model building and interpretation is fundamental to ensuring the ecological realism and general credibility of the final models.

Expert opinion forms an integral part of sound model building and evaluation, irrespective of the modelling platform and theoretical approach used (Steyerberg et al. 2001b). Some modellers view expert opinion as a source of undesirable subjectivity and prefer to let the data dictate the model. However, this ignores the limitations commonly encountered with ecological data (small to medium-sized species data sets that may include biases, and many candidate predictor variables) and the related challenges in developing robust and sensible models. Expert opinion can (and should) be used to assist in the identification of candidate model variables, interactions between variables and likely response shapes (Pearce et al. 2001b; Ferrier et al. 2002a), for corroboration of a model’s ecological realism (Austin 2002), for ad hoc evaluation of model predictions (National Parks & Wildlife Service 1998) and for preparation of predictive maps for use in decision making (Ferrier et al. 2002a). Experts may also be used effectively in the creation of indices that may themselves be used as candidate variables for statistical modelling (National Parks & Wildlife Service 1998). Consequently, the role of experts should be thought of as complementary to other, more data-driven methods, rather than as a competing alternative. In our case study, we emphasize the use of experts in variable reduction prior to automated variable selection. We use Harrell’s (2001) rule of thumb to identify the maximum PDF that should be offered to the variable selection routine. We contend that if expert knowledge is so lacking that reducing the candidate set to m/10 is impossible, resulting models should be treated with scepticism.

The lack of an accepted model evaluation statistic for presence-only models detracts from their utility in conservation planning and increases the importance of expert evaluation. Development of robust evaluations of presence-only models is ongoing.

Interpretation of habitat maps

Threshold predictions

In some instances, users of habitat maps may wish to identify a level of predicted probability of occupancy below which an area would be considered ‘unsuitable’ habitat for a given species. In general, such an interpretation of habitat maps is ill-advised unless the model that underpins the map is particularly well calibrated. However, in instances where it is required, thresholds could be assigned in a risk-weighted manner. For example, if it is very important not to classify an area as ‘non-habitat’ when it may be used by a species, a low probability threshold (e.g. probability of occurrence = 0.1) should be specified when delineating non-habitat. By choosing a threshold of 0.1 for a well-calibrated model, it is implied that a 10% chance of a site (a grid cell of 100 m) being occupied by the species is a tolerable risk of failing to include areas that are in fact habitat for the species. This decision must be made on a case-by-case basis by evaluating the management costs of the two types of error (a false-negative and a false-positive prediction), rather than by relying on an arbitrary threshold set by the modeller.

Fig. 4. The predicted probability of sooty owl occupancy across the LHCC region. Probabilities are represented as a grey-scale (white = 0–0.2, light grey = 0.21–0.5, medium grey = 0.51–0.8, dark grey = 0.81–1.0). The open triangles show point survey locations where the sooty owl is recorded as present. Habitat maps were constructed by creating a vector of habitat model predictions (one for every 1 ha grid cell in the landscape) in R using the ‘predict.gam’ command. The prediction vector is then written to an ASCII text file in the correct dimensions for the landscape and imported to ArcInfo (or ArcView) for display.
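One way to make the risk weighting explicit is to attach a cost to each type of error and choose the threshold that minimizes the expected total cost. The sketch below assumes well-calibrated predictions and uses hypothetical costs (a false negative nine times as costly as a false positive); neither the costs nor the function names come from the case study.

```python
def expected_cost(threshold, predictions, cost_fn, cost_fp):
    """Expected cost of calling cells below `threshold` non-habitat.

    Assumes well-calibrated predictions, so a cell with prediction p is
    occupied with probability p.
    """
    cost = 0.0
    for p in predictions:
        if p < threshold:
            cost += p * cost_fn          # occupied cell written off as non-habitat
        else:
            cost += (1.0 - p) * cost_fp  # unoccupied cell retained as habitat
    return cost

def best_threshold(predictions, cost_fn, cost_fp, candidates):
    """Candidate threshold with the lowest expected cost."""
    return min(candidates, key=lambda t: expected_cost(t, predictions, cost_fn, cost_fp))

# Hypothetical map: one cell at each predicted probability from 0.01 to 0.99.
cells = [i / 100 for i in range(1, 100)]
best = best_threshold(cells, cost_fn=9.0, cost_fp=1.0, candidates=[0.05, 0.1, 0.2, 0.5])
```

Under these assumptions the cost-minimizing threshold is cost_fp / (cost_fn + cost_fp) = 0.1, which matches the low threshold used as an example above; a different ratio of management costs would give a different threshold.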

Representing uncertainty

Many forms of uncertainty impact on maps of predicted probability of species occurrence (Elith et al. 2002). The prediction intervals available from regression models describe the part of this uncertainty associated with estimating model parameters, which stems from inadequate data, errors in measurement and natural variation (Elith et al. 2002). There are additional uncertainties, such as model selection uncertainty and errors in explanatory variables, that are not addressed by standard prediction intervals. While some methods have been developed for explicitly incorporating these forms of uncertainty in prediction intervals (Burnham & Anderson 2002; Wintle et al. 2003), there is currently very little guidance on how prediction intervals should be represented and interpreted in applications of habitat modelling and conservation planning.

Interpreting model evaluation statistics

The two statistics used for model evaluation in this study address different model applications. Specifically, model discrimination, measured by the ROC area, measures the degree to which the model successfully ranks presence sites higher than absence sites across the region in terms of predicted probability of presence. Models with satisfactory ROC areas will provide reliable ranking of areas in terms of habitat value. This mode of evaluation is relevant when the goal of a model is to rank or prioritize areas of interest in terms of their relative value as habitat for a species. This goal is different to that of accurately predicting the proportion of sites that are expected to be occupied at a given predicted probability of occurrence. For instance, at sites with a predicted probability of occurrence of 0.5, it is reasonable to expect that approximately 50% of such sites would contain the species. The degree to which this is true is described as model calibration (Miller et al. 1991) and is assessed using the logistic calibration equation. It is possible that a model with good discrimination can have poor calibration. It is also possible to improve a model’s predictive ability by adjusting the model parameters with calibration statistics (Harrell 2001) or other shrinkage methods (Hastie et al. 2001). A model with poor calibration may still be useful for prioritizing or ranking sites in terms of habitat value, but should not be used to predict the raw probability of finding a species at a given site. The choice of whether to focus on ROC areas or calibration statistics depends on the intended application of the models.
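The distinction between discrimination and calibration can be illustrated with a small Python sketch. The data and function names are hypothetical. The calibration check here is a crude comparison of mean predicted probability with observed prevalence, not the logistic calibration equation of Miller et al. (1991). The ROC area is computed as the proportion of presence-absence pairs that are ranked correctly, which is the interpretation given above.

```python
def roc_area(observed, predicted):
    """ROC area as the share of presence-absence pairs ranked correctly (ties count half)."""
    presences = [p for o, p in zip(observed, predicted) if o == 1]
    absences = [p for o, p in zip(observed, predicted) if o == 0]
    if not presences or not absences:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for p in presences:
        for a in absences:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presences) * len(absences))


def mean_prediction_gap(observed, predicted):
    """Mean predicted probability minus observed prevalence (near 0 if calibrated on average)."""
    n = len(observed)
    return sum(predicted) / n - sum(observed) / n


# Hypothetical survey outcomes (1 = present) and model predictions.
observed = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
predicted = [0.1, 0.2, 0.3, 0.4, 0.5, 0.5, 0.6, 0.7, 0.8, 0.9]

# A monotone distortion keeps the ranking of sites but lowers every probability.
distorted = [p ** 3 for p in predicted]

print(roc_area(observed, predicted), mean_prediction_gap(observed, predicted))
print(roc_area(observed, distorted), mean_prediction_gap(observed, distorted))
```

Because the distortion preserves the ordering of sites, the ROC area is unchanged, while the mean prediction falls well below the observed prevalence. The distorted predictions would still rank sites adequately but should not be read as raw probabilities.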

The issue of occupancy versus persistence

Models of the sort discussed in this paper make the implicit assumption that occupancy of a location implies suitability of the habitat. Van Horne (1983) identified problems associated with this assumption, and Pulliam’s (1988) ‘source-sink’ model of population dynamics formalizes its conceptual deficiencies. Pulliam’s model differentiates ‘sink’ areas, where mortality exceeds population growth, from ‘source’ areas that maintain overall population size through emigration. Similarly, if the total population is in a state of decline, it is possible that ‘remnant’ individuals may exist in unsuitable habitat that will not be occupied in the next generation. When a population is expanding its range, observations of absence of the species may falsely imply that the suitability of the habitat is low. These anomalies arise due to what is known as the ‘equilibrium’ assumption, which underpins the current static approach to wildlife habitat modelling (Austin 2002). Inclusion of disturbance history variables, dispersal barriers, competition and successional dynamics may assist in modelling such situations, though examples of such approaches are rare.

Survey design and sampling issues

This paper has focused on the choice and execution of particular modelling methods given a set of data. Issues of data quality and sampling have not been addressed. Because modelling results are only as good as the available data, the issues of data quality and sampling demand some discussion here.

Sample size

From a statistical perspective, the choice of sample size is a function of the desired precision of results. In studies where the goal is to make inference about a particular effect (e.g. the influence of time since logging on the probability of greater glider occupancy), power analysis may be used to determine the sample size required to achieve a statistically significant result for a given effect size, type I error rate and sampling variance (Burgman & Lindenmayer 1998). The relationship between statistical significance, power and model predictive performance is unclear, and there is no guarantee that a model with statistically significant terms will give good predictive performance. Unfortunately, there is no simple way to determine a priori the predictive performance of a habitat model, or to calculate a sample size that will ensure a predictive performance that is statistically significantly better than that of a null or alternative model. We could find only one attempt to address this problem in the statistical literature (Obuchowski 1994), making this an important area of future research. A simulation study that provided rules of thumb about sample sizes required to achieve a given predictive performance under a range of likely modelling scenarios would be an important contribution to the habitat modelling literature.

Data quality

A more subtle consideration is the geographical spread and environmental stratification of observations compared to the extent and environmental variability of the region for which models are to be built. We briefly touch on this issue in our case study when testing the adequacy of survey data using environmental strata. Little guidance is provided in the literature as to the minimum geographical or environmental coverage of data for statistical modelling, though it is common sense to expect that model predictions are unlikely to be reliable in environmental domains for which there are no survey data (Austin & Meyers 1996). Stratified sampling designs such as ‘GRADSECT’ (Austin & Heyligers 1989) that ensure a geographical and bioclimatic coverage of sampling locations are therefore appealing. GRADSECT theory, or newer developments such as Generalized Dissimilarity Modelling (GDM; Ferrier et al. 2002b) or a p-median criterion (Faith & Walker 1996), may also be applied to existing data sets to identify ‘gaps’ in the geographical and environmental coverage of samples.

Most statistical analyses, including the regression methods used in our case study, assume that observations are independent of each other. Survey locations that are close together are much less likely to be independent, especially when surveys target animals with large home ranges. Survey design should aim to ensure a minimum separation distance that is at least as great as the home range radius of the widest ranging species in the study. Random sampling within geographical and environmental strata assists in minimizing unwanted dependencies in observation data. Guidance on sampling theory should be sought from one of the many texts that deal with the topic in detail (e.g. Sutherland 1996; Thompson 2002).
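The minimum separation rule can be applied to a list of candidate survey locations with a simple greedy filter, sketched below in Python. This is an illustration under stated assumptions and not a substitute for a proper stratified design. The coordinates, the 1000 m home range radius and the function name are hypothetical, and the result depends on the order in which candidates are listed.

```python
import math


def thin_sites(sites, min_separation):
    """Greedily keep sites that are at least min_separation from every kept site.

    sites: list of (x, y) coordinates in metres, listed in order of preference.
    """
    kept = []
    for x, y in sites:
        if all(math.hypot(x - kx, y - ky) >= min_separation for kx, ky in kept):
            kept.append((x, y))
    return kept


# Hypothetical candidate sites and a hypothetical 1000 m home range radius.
candidates = [(0, 0), (300, 400), (1200, 0), (1200, 900), (2500, 100)]
print(thin_sites(candidates, min_separation=1000))
```

Listing candidates in random order within each stratum before thinning would combine this rule with the random sampling recommended above.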

Technical issues and promising advances in modelling methods

Species versus community approaches

Models of the distribution of single species are not the only way to approach conservation planning questions. Single species models are important for a subset of species such as threatened, focal or flagship species. However, landscape planning aims to conserve the biodiversity of a region, and there are too many species and too little data to achieve this through single-species modelling. Therefore, there is a role for community-level modelling (Ferrier et al. 2002b). One example of community-level modelling is GDM, already mentioned in the context of sampling; this models compositional dissimilarity across the landscape. This could then contribute to a conservation plan that aimed to span the range of species patterns in the landscape. A further role for methods based on data from many species is to use them for modelling rare species, or species in data sets with relatively few records per species. In such situations there may be insufficient data for developing a robust single-species model. GDMs, canonical correspondence analysis (CCA) or multivariate adaptive regression splines (MARS) may be useful for this purpose.

Spatial autocorrelation

Spatial autocorrelation in wildlife observation data arises when environmental processes and patterns that influence the spatial distribution of wildlife are themselves spatially structured and/or because the species is subject to demographic processes, territoriality or dispersal limitations causing spatial dependence (contagion or dispersion effects). Demographic and environmental processes underlying spatial patterns in wildlife distributions are usually poorly understood and therefore difficult to incorporate in model fitting. Consequently, model residuals are often spatially correlated (i.e. not independent), violating one of the basic assumptions of regression modelling. In practice, non-independence usually results in underestimation of standard errors and overestimation of the importance of habitat variables (Legendre & Fortin 1989). Methods such as autologistic regression (Augustin et al. 1996), generalized estimating equations (Albert & McShane 1995) and geographically weighted regression (Fotheringham et al. 2002) may be used to incorporate spatial autocorrelation in habitat analyses, but they are still not incorporated in standard statistical software and are technically demanding to implement.

Model selection and model uncertainty

The standard approach to representing prediction uncertainty involves the calculation of prediction (or confidence) intervals that incorporate uncertainty about parameter estimates. However, this approach to prediction implicitly assumes that the model chosen is the best available representation of the truth and effectively ignores model selection uncertainty, resulting in overconfident predictions. Automated variable selection algorithms exacerbate this problem by providing a method for dredging through many candidate predictors in search of an explanatory model (Chatfield 1995; Hoeting et al. 1999). One solution is to completely avoid automated model selection and to analyse only a small set of ‘a priori’ models (Burnham & Anderson 2002). This approach may be practical when strong prior knowledge exists and where the number of plausible models is small. However, if prediction is the primary goal and many possible models exist, some level of automated variable selection is often desirable. Model averaging approaches, including Bayesian model averaging, have been promoted in a range of disciplines as a means of incorporating model selection uncertainty into statistical inference and prediction (Hoeting et al. 1999) and have been applied in some ecological examples (Burnham & Anderson 2002; Wintle et al. 2003).
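As a minimal illustration of model averaging, the Python sketch below weights the predictions of several candidate models by their Akaike weights, the information-theoretic weighting described by Burnham & Anderson (2002). It is a simpler stand-in for full Bayesian model averaging, not an implementation of it. The AIC values, the site predictions and the function names are hypothetical.

```python
import math


def akaike_weights(aic_values):
    """Akaike weights: exp(-delta / 2) for each model, normalised to sum to 1."""
    best = min(aic_values)
    relative_likelihoods = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative_likelihoods)
    return [r / total for r in relative_likelihoods]


def averaged_prediction(predictions, weights):
    """Weighted average of the candidate models' predictions for one site."""
    return sum(p * w for p, w in zip(predictions, weights))


# Hypothetical AIC values for three candidate models, and each model's
# predicted probability of occupancy at a single site.
aics = [210.0, 212.0, 218.0]
site_predictions = [0.60, 0.40, 0.10]

weights = akaike_weights(aics)
print(weights, averaged_prediction(site_predictions, weights))
```

The averaged prediction is pulled away from the best model's value towards those of its competitors, in proportion to their support, rather than treating the single selected model as the truth.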

Estimating and incorporating detectability in habitat models

Unless the probability of detecting a species when it is present is equal to 1, false negative observation errors will occur in species surveys. The probability of detecting the presence of the case study species in any single standard survey based on spot-lighting and call elicitation has been found to be very low (Pr[detection|presence] ~ 0.12–0.45; Wintle et al. in press), making the reliability of absence data a potentially serious form of uncertainty in our case study. Recent studies have demonstrated the negative impact that false-negative observation error may have on species-habitat analyses (Tyre et al. 2003), meta-population models (Moilanen 2002) and monitoring studies (MacKenzie et al. 2002). Recently developed techniques for incorporating detectability in model estimation (MacKenzie et al. 2002; Tyre et al. 2003) reduce bias in model estimation brought about by false absences, though little effort has been invested in testing the relative predictive performance of such models.
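The practical consequence of low detectability can be shown with simple arithmetic, sketched here in Python. The sketch assumes that repeat visits are independent and that the detection probability is constant across visits, which will not always hold in the field. The 0.05 target for false absences and the function names are hypothetical choices for illustration.

```python
import math


def prob_false_absence(detection_probability, visits):
    """Chance that an occupied site yields no detection in any of the visits."""
    return (1.0 - detection_probability) ** visits


def visits_needed(detection_probability, max_false_absence=0.05):
    """Smallest number of visits keeping the false-absence chance at or below the target."""
    if not 0.0 < detection_probability < 1.0:
        raise ValueError("detection probability must lie strictly between 0 and 1")
    visits = math.ceil(math.log(max_false_absence) / math.log(1.0 - detection_probability))
    return max(visits, 1)


# The two ends of the single-visit detection range quoted above.
for p in (0.12, 0.45):
    n = visits_needed(p)
    print(p, n, prob_false_absence(p, n))
```

Under these assumptions the required survey effort differs several-fold between the two ends of the quoted range, which is why recorded absences from single visits deserve caution.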

Incorporating biotic interactions

Despite the prevalence of interspecific competition as a key concept in community ecology (Diamond 1975), very little has been attempted by way of incorporating interspecific competition in distribution models (for one exception see Leathwick & Austin 2001). Regression models such as those described by Leathwick and Austin (2001) could be applied to fauna species. Future development of ‘loop analysis’ (Levins 1975) or systems of simultaneous regressions (Guisan & Zimmerman 2000) may facilitate spatial models of wildlife distributions that incorporate competitive interactions.

The GLMs and GAMs presented here are relatively simple, though suited to the conservation problem and available data. There is scope for greater sophistication in the modelling method where a high level of modelling expertise exists, though we believe our recommended approach represents a reasonable trade-off between practicality and rigour.

Model improvement and adaptive management

Due to environmental change, natural and anthropogenic disturbance, and population stochasticity, all predictive models will become redundant if they are not updated through time. We recommend that models be applied in an adaptive framework (Walters & Holling 1990), allowing immediate application of models in management but ensuring a commitment to continual improvement of models through future data collections and refinements to modelling methods.

Future data collection will be central to rigorous model testing and model refinements. It should be targeted to fill gaps in current data sets (Fig. 3). Particular attention should be paid to the collection of high quality observation data based on multiple site visits aimed at minimizing false absences (Wintle et al. 2004). Some important environmental variables, including those based on forest growth stage data, were not available for model fitting in the LHCC region. It is likely that models would have better predictive accuracy if forest growth stage information became available and was incorporated in models.

CONCLUSION

Habitat models are now a widely used tool in public land conservation planning. Systematic conservation planning in the urban fringe will require such tools to ensure transparency and repeatability of planning outcomes. However, access to statistical and ecological expertise for urban conservation planning is likely to be limited when compared with the public land planning processes that have taken place in Australia. It is therefore important that we continue to develop and refine planning tools, including habitat modelling methods, with an emphasis on statistical and ecological rigour and simplicity.

ACKNOWLEDGEMENTS

This work was instigated by the LHCCREMS organization (now Hunter Councils). Steven House (EcoLogical), Michael Murray (Forest Fauna Surveys P/L), Philip Gleeson (NPWS), Daniel Connolly (NPWS), Jill Smith (NPWS), Peter Bone (NPWS), Peter Ewin (NPWS), Mick Andren (NPWS) and Rod Kavanagh (SFNSW) provided data, expert guidance on model construction and valuable insights. Mark Burgman and Kuniko Yamada (University of Melbourne) helped throughout the project. Comments from Terry Walshe (University of Western Australia), David Tierney and Robbie Economos improved the manuscript substantially. Discussions with Trevor Hastie and John Leathwick helped clarify bootstrapping issues. Two anonymous reviewers provided constructive feedback. Sophie Powrie, Kirsty Winter and Meredith Liang (LHCCREMS) provided excellent support and useful feedback throughout the project.

REFERENCES

Agresti A. (1996) An Introduction to Categorical Data Analysis. John Wiley and Sons, New York.

Akaike H. (1973) Information theory and an extension of the maximum likelihood. In: Proceedings of the 2nd International Symposium on Information Theory (eds B. N. Petrov & F. Csáki) pp. 267–81. Akademia Kaido, Budapest.

Albert P. S. & McShane L. M. (1995) A generalized estimating equations approach for spatially correlated data: applications to the analysis of neuroimaging data. Biometrics 51, 627–38.

Augustin N. H., Mugglestone M. A. & Buckland S. T. (1996) An autologistic model for the spatial distribution of wildlife. J. Appl. Ecol. 33, 339–47.

Austin M. P. (1991) Vegetation: data collection and analysis. In: Nature Conservation: Cost Effective Biological Surveys and Data Analysis (eds C. R. Margules & M. P. Austin) pp. 37–41. CSIRO, Canberra.

Austin M. P. (2002) Spatial prediction of species distribution: an interface between ecological theory and statistical modelling. Ecol. Model. 157, 101–18.

Austin M. P. & Heyligers P. C. (1989) Vegetation survey design for conservation: GRADSECT sampling of forests in north-eastern NSW. Biol. Conserv. 50, 13–32.

Austin M. P. & Meyers J. A. (1996) Current approaches to modelling the environmental niche of Eucalypts: implications for management of forest biodiversity. For. Ecol. Manage. 85, 95–106.

Bambar D. (1975) The area above the ordinal dominance graph and the area below the receiver operating characteristics graph. J. Math. Psych. 12, 387–415.

Belsley D. A., Kuh E. & Welsch R. E. (1980) Regression Diagnostics. John Wiley and Sons, New York.

Bio A. M. F., Alkemande R. & Barendregt A. (1998) Determining alternative models for vegetation response analysis – a non-parametric approach. J. Veg. Sci. 9, 5–16.

Breiman L., Friedman J. H., Olshen R. A. & Stone C. J. (1984) Classification and Regression Trees. Wadsworth International Group, Belmont.

Breininger D. R., Larson V. L., Duncan B. W. & Smith R. B. (1998) Linking habitat suitability to demographic success in Florida scrub-jays. Wild. Bull. 26, 118–28.

Burgman M. A. & Lindenmayer D. B. (1998) Conservation Biology for the Australian Environment. Surrey Beatty and Sons, Chipping Norton.

Burnham K. P. & Anderson D. R. (2002) Model Selection and Multimodel Inference: a Practical Information-Theoretic Approach. Springer, New York.

Burgman M. A., Breininger D. R., Duncan B. W. & Ferson S. (2001) Setting reliability bounds on Habitat Suitability Indices. Ecol. Appl. 11, 70–8.

Carpenter G., Gillison A. N. & Winter J. (1993) DOMAIN: a flexible modelling procedure for mapping potential distributions of plants and animals. Biodivers. Conserv. 2, 667–80.

Chatfield C. (1995) Model uncertainty, data mining and statistical inference. J. Roy. Stat. Soc. Ser. A Stat. 158, 419–66.

Commonwealth of Australia (2003) Terms of Reference: Standing Committee on Environment and Heritage Inquiry into Sustainable Cities. Parliament of Australia, House of Representatives, Canberra.

Derksen S. & Keselman H. J. (1992) Backward, forward and stepwise automated subset selection algorithms: frequency of obtaining authentic and noise variables. Br. J. Math. Stat. Psych. 45, 265–82.

Diamond J. M. (1975) Assembly of species communities. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 342–444. Harvard University Press, Cambridge.

Ecotone Ecological Consultants (2001) Lower Hunter Central Coast Regional Biodiversity Strategy: Fauna survey and mapping project – Module 1 Fauna Surveys. Ecotone Ecological Consultants for LHCCREMS, Waratah, pp. 43.

Edgar R. & Belcher C. (1995) Spotted-tailed Quoll. In: The Mammals of Australia (ed. R. Strahan) pp. 67–8. Reed Books, Chatswood.

Efron B. & Tibshirani R. (1997) Improvements on cross-validation: the .632+ bootstrap method. J. Am. Stat. Ass. 92, 548–60.

Elith J. (2000) Quantitative methods for modelling species habitat: comparative performance and an application to Australian plants. In: Quantitative Methods for Conservation Biology (eds S. Ferson & M. A. Burgman) pp. 39–58. Springer-Verlag, New York.

Elith J. & Burgman M. A. (2002) Predictions and their validation: rare plants in the Central Highlands, Victoria, Australia. In: Predicting Species Occurrences: Issues of Accuracy and Scale (eds J. M. Scott, P. J. Heglund, M. L. Morrison, M. G. Raphael, W. A. Wall & F. B. Samson) pp. 303–14. Island Press, Covelo.

Elith J. & Burgman M. A. (2003) Habitat models for PVA. In: Population Viability in Plants (eds C. A. Brigham & M. W. Schwartz) pp. 203–35. Springer-Verlag, New York.

Elith J., Burgman M. A. & Regan H. M. (2002) Mapping epistemic uncertainties and vague concepts in predictions of species distribution. Ecol. Model. 157, 313–29.


Ennis M., Hinton G., Naylor D., Revow M. & Tibshirani R. (1998) A comparison of statistical learning methods on the GUSTO database. Stat. Med. 17, 2501–8.

ESRI (1997) Arcinfo 7.2. Environmental Systems Research Institute Inc., Redlands.

Faith D. P. & Walker P. A. (1996) Environmental diversity: on the best-possible use of surrogate data for assessing the relative biodiversity of sets of areas. Biodivers. Conserv. 5, 399–415.

Ferrier S. & Watson G. (1997) An Evaluation of the Effectiveness of Environmental Surrogates and Modelling Techniques in Predicting the Distribution of Biological Diversity. Department of Environment, Sports and Territories, Commonwealth of Australia, and NSW National Parks and Wildlife Service, Canberra.

Ferrier S., Watson G., Pearce J. & Drielsma M. (2002a) Extended statistical approaches to modelling spatial pattern in biodiversity in north-east New South Wales. I. Species-level modelling. Biodivers. Conserv. 11, 2275–307.

Ferrier S., Drielsma M., Manion G. & Watson G. (2002b) Extended statistical approaches to modelling spatial pattern in biodiversity: the north-east New South Wales experience. II. Community-level modelling. Biodivers. Conserv. 11, 2309–38.

Fielding A. H. & Bell J. F. (1997) A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ. Conserv. 24, 38–49.

Fotheringham A. S., Brunsdon C. & Charlton M. (2002) Geographically Weighted Regression: the Analysis of Spatially Varying Relationships. John Wiley and Sons, London.

Franklin J. (1995) Predictive vegetation mapping: geographic modeling of biospatial patterns in relation to environmental gradients. Prog. Phys. Geog. 19, 494–519.

Friedman J. H. (1991) Multivariate adaptive regression splines (with discussion). Ann. Stat. 19, 1–141.

Goldingay R. L. & Kavanagh R. P. (1993) Home-range estimates and habitat of the yellow-bellied glider (Petaurus australis) at Waratah Creek, New-South-Wales. Wildl. Res. 20, 387–404.

Graham C. H., Ferrier S., Huettman F., Moritz C. & Peterson A. T. (2004) New developments in museum-based informatics and applications in biodiversity analysis. Trends Ecol. Evol. 19, 497–503.

Guisan A. & Harrell F. E. (2000) Ordinal response regression models in ecology. J. Veg. Sci. 11, 617–26.

Guisan A. & Zimmerman N. E. (2000) Predictive habitat distribution models in ecology. Ecol. Model. 135, 147–86.

Hanley J. A. & McNeil B. J. (1982) The meaning and use of the area under a Receiver Operating Characteristic (ROC) curve. Radiology 143, 29–36.

Harrell F. E. (2001) Regression Modelling Strategies with Application to Linear Models, Logistic Regression and Survival Analysis. Springer, New York.

Harrell F. E., Lee K. L. & Mark D. B. (1996) Multivariate prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat. Med. 15, 361–87.

Hastie T. & Tibshirani R. (1990) Generalized additive models. Monographs on statistics and applied probability (eds D. R. Cox, D. V. Hinkley, D. Rubin & B. W. Silverman). Chapman & Hall, London.

Hastie T., Tibshirani R. & Friedman J. H. (2001) The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer-Verlag, New York.

Hirzel A. H. (2001) When GIS come to life. Linking Landscape- and Population Ecology for Large Population Management Modelling: the Case of Ibex (Capra Ibex) in Switzerland. The University of Lausanne, Lausanne.

Hoeting J. A., Madigan D., Raftery A. E. & Volinsky C. T. (1999) Bayesian model averaging: a tutorial. Stat. Sci. 14, 382–401.

Houlder D. J., Hutchinson M. F., Nix H. A. & McMahon J. P. (1999) ANUCLIM User Guide, Version 5.0. Centre for Resource and Environmental Studies, Australian National University, Canberra.

Kadmon R., Farber O. & Danin A. (2003) A systematic analysis of factors affecting the performance of climatic envelope models. Ecol. Appl. 13, 853–67.

Kavanagh R. P. (2002) Conservation and management of large forest owls in southeastern Australia. In: The Ecology and Conservation of Owls (eds I. Newton, R. Kavanagh, J. Olson & I. Taylor) pp. 201–19. CSIRO, Melbourne.

Lambeck R. J. (1997) Focal species: a multi-species umbrella for nature conservation. Conserv. Biol. 11, 849–56.

Leathwick J. R. & Austin M. P. (2001) Competitive interactions between tree species in New Zealand’s old-growth indigenous forests. Ecology 82, 2560–73.

Legendre P. & Fortin M. J. (1989) Spatial pattern and ecological analysis. Vegetatio 80, 107–38.

Legendre L. & Legendre P. (1998) Numerical Ecology. Elsevier, New York.

Levins R. (1975) Evolution of communities near equilibrium. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 16–50. Harvard University Press, Cambridge.

LHCCREMS (2004) Lower Hunter Central Coast Regional Biodiversity Strategy 2004 (4 Volume). Hunter Councils on behalf of the Lower Hunter Central Coast Regional Environmental Strategy, Thornton.

Li W., Wang Z., Ma Z. & Tang H. (1999) Designing the core zone in a biosphere reserve based on suitable habitats: Yancheng Biosphere Reserve and the red crowned crane (Grus japonensis). Biol. Conserv. 90, 167–73.

Loyn R. H., McNabb E. G., Volodina L. & Willig R. (2001) Modelling landscape distributions of large forest owls as applied to managing forests in north-east Victoria, Australia. Biol. Conserv. 97, 361–76.

McCullagh P. & Nelder J. A. (1989) Generalized Linear Models. Chapman & Hall, London.

MacKenzie D. I., Nichols J. D., Lachman G. B., Droege S., Royle J. A. & Langtimm C. A. (2002) Estimating site occupancy rates when detection probabilities are less than one. Ecology 83, 2248–55.

Manel S., Dias J. M., Buckton S. T. & Ormerod S. J. (1999a) Alternative methods for predicting species distribution: an illustration with Himalayan river birds. J. Appl. Ecol. 36, 734–47.

Manel S., Dias J. M. & Ormerod S. J. (1999b) Comparing discriminant analysis, neural networks and logistic regression for predicting species distributions: a case study with a Himalayan river bird. Ecol. Model. 120, 337–47.

Miller M. E., Hui S. L. & Tierney W. M. (1991) Validation techniques for logistic regression models. Stat. Med. 10, 1213–26.

Moilanen A. (2002) Implications of empirical data quality to metapopulation model parameter estimation and application. Oikos 96, 516–30.

Moisen G. G. & Frescino T. S. (2002) Comparing five modeling techniques for predicting forest characteristics. Ecol. Model. 157, 209–25.


National Parks & Wildlife Service (1998) Eden Fauna Modelling. New South Wales National Parks & Wildlife Service, New South Wales Regional Forest Agreement Steering Committee, Canberra.

National Parks and Wildlife Service (2000) Modelling Areas of Habitat Significance for Fauna and Flora in the Southern CRA. NPWS NSW, Canberra.

Nix H. (1986) A biogeographic analysis of Australian elapid snakes. In: Atlas of Elapid Snakes of Australia (ed. R. Longmore) pp. 4–15. Australian Government Publishing Service, Canberra.

Obuchowski N. A. (1994) Computing sample size for receiver operating characteristic studies. Invest. Radiol. 29, 238–43.

Pearce J. & Ferrier S. (2000) Evaluating the predictive performance of habitat models developed using logistic regression. Ecol. Model. 133, 225–45.

Pearce J. & Ferrier S. (2001) The practical value of modelling relative abundance of species for regional conservation planning: a case study. Biol. Conserv. 98, 33–43.

Pearce J., Ferrier S. & Scotts D. (2001a) An evaluation of the predictive performance of distributional models for flora and fauna in north-east New South Wales. J. Environ. Manage. 62, 171–84.

Pearce J. L., Cherry K., Drielsma M., Ferrier S. & Whish G. (2001b) Incorporating expert knowledge and fine-scale vegetation mapping into statistical modelling of faunal distribution. J. Appl. Ecol. 38, 412–24.

Phillips S. J., Anderson R. P. & Schapire R. E. (in press) Maximum entropy modeling of species geographic distributions. Ecol. Model.

Pressey R. L., Possingham H. P., Logan V. S., Day J. R. & Williams P. H. (1999) Effects of data characteristics on the results of reserve selection algorithms. J. Biogeogr. 26, 179–91.

Pulliam H. R. (1988) Sources, sinks and population regulation. Am. Nat. 132, 652–61.

R Development Core Team (2004) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL: http://www.R-project.org

Reading R. P., Clark T. A., Seebeck J. H. & Pearce J. (1996) Habitat Suitability Index model for the eastern barred bandicoot, Perameles gunnii. Wildl. Res. 23, 221–35.

Reed P. C. & Lunney D. (1990) Habitat loss: the key problem for the long-term survival of koalas in New South Wales. In: Koala Summit: Managing Koalas in New South Wales (eds D. Lunney, C. A. Urquhart & P. C. Reed) pp. 9–31. University of Sydney, Sydney.

Ripley B. D. (1995) Pattern Recognition and Neural Networks – a Statistical Approach. Cambridge University Press, Cambridge.

Russel R. (1995) Yellow-bellied glider. In: The Mammals of Australia (ed. R. Strahan) pp. 226–8. Reed Books, Chatswood.

Steyerberg E. W., Eijkemans M. J. C., Harrell F. E. & Habbema J. D. F. (2001a) Prognostic modeling with logistic regression analysis: in search of a sensible strategy in small data sets. Med. Decis. Making 21, 45–56.

Steyerberg E. W., Harrell F. E., Borsboom G. J. J. M., Eijkemans M. J. C., Vergouwe Y. & Habbema J. D. F. (2001b) Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J. Clin. Epidemiol. 54, 774–81.

Stockwell D. & Peters D. (1999) The GARP modelling system: problems and solutions to automated spatial prediction. Int. J. Geogr. Inf. Sci. 13, 143–58.

Sutherland W. J. (1996) Ecological Census Techniques: a Handbook, 1st edn. Cambridge University Press, Cambridge.

ter Braak C. J. F. (1986) Canonical Correspondence Analysis: a new eigenvector technique for multivariate direct gradient analysis. Ecology 67, 1167–79.

Thompson S. K. (2002) Sampling, 2nd edn. Wiley, New York.

Tyre A. J., Tenhumberg B., Field S. A., Possingham H. P., Niejalke D. & Parris K. (2003) Improving precision and reducing bias in biological surveys by estimating false negative error rates in presence-absence data. Ecol. Appl. 13, 1790–801.

Van Horne B. (1983) Density as a misleading indicator of habitat quality. J. Wildl. Man. 47, 893–901.

Van Horne B. & Wiens J. A. (1991) Forest Bird Habitat Suitability Models and the Development of General Habitat Models. United States Department of the Interior, Fish and Wildlife Service, Washington DC.

Venables W. N. & Ripley B. D. (2003) Modern Applied Statistics with S, 4th edn. Springer, New York.

Walters C. & Holling C. S. (1990) Large-scale management experiments and learning by doing. Ecology 71, 2060–8.

Wintle B. A., McCarthy M. A., Volinsky C. T. & Kavanagh R. P. (2003) The use of Bayesian Model Averaging to better represent the uncertainty in ecological models. Conserv. Biol. 17, 1579–90.

Wintle B. A., Elith R. J., Yamada K. & Burgman M. A. (2004) LHCCREMS fauna survey and mapping project. Module 2: Habitat modelling and conservation requirements. Lower Hunter & Central Coast Regional Environmental Management Strategy, Callaghan.

Wintle B. A., McCarthy M. A., Parris K. M. & Burgman M. A. (2004) Precision and bias of methods for estimating point survey detection probabilities. Ecol. Appl. 14, 703–12.

Wilson K. A., Westphal M. I., Possingham H. P. & Elith J. (2005) Sensitivity of conservation planning to different approaches to using species distribution data. Biol. Conserv. 122, 99–112.

Wintle B. A., Burgman M. A., McCarthy M. A. & Kavanagh R. P. (in press) The magnitude and management consequences of false negative observation error in surveys of arboreal marsupials and large forest owls. J. Wildl. Man.

Worton B. J. (1989) Kernel methods for estimating the utilization distribution in home-range studies. Ecology 70, 164–8.

Yencken D. & Wilkinson D. (2000) Resetting the Compass: Australia’s Journey Towards Sustainability. CSIRO Publishing, Melbourne.

Zaniewski A. E., Lehmann A. & Overton J. M. (2002) Predicting species distribution using presence-only data: a case study of native New Zealand ferns. Ecol. Model. 157, 261–80.

APPENDIX 1

Method for 0.632+ bootstrap evaluation, after Harrell et al. (1996) and Efron and Tibshirani (1997).

1. Develop model on all n observations.
2. Calculate the statistic(s) of choice for evaluation on the same data (i.e. the training data) – call this Stat_app because it is the apparent value of the statistic.
3. Take a bootstrap sample, i.e. a sample of size n with replacement of rows of the data matrix. Keep track of which sites are in the bootstrap sample and which are excluded.
4. Fit the model on the bootstrap sample (using the same methods as used on the full set).
5. Compute the statistic on the bootstrap data set (observations vs. fitted values) and call it Stat_boot.
6. Also compute the statistic on a version of the bootstrap data where the observations are randomized (Stat_permute).
7. Use the bootstrap model to predict to the excluded data set; calculate the statistic on these predictions, Stat_excl.
8. Use Stat_boot, Stat_permute and Stat_excl to calculate the amount of overfitting, the relative overfitting rate and weights that are then used to make a best estimate of predictive performance, Stat_best_est. This statistic puts most emphasis on predictions to the excluded data, particularly when the model is overfitted (i.e. when Stat_boot − Stat_excl is large). For details of this step see Steyerberg et al. (2001b).
9. Measure how optimistic the fit on the bootstrap sample was: O = Stat_boot − Stat_best_est.
10. Repeat steps 3–7 100–200 times.
11. Calculate an average optimism, Ō.
12. Use Ō to correct Stat_app for its optimism: Stat_app − Ō. This is a near unbiased estimate of the expected value of the external predictive performance of the process that generated Stat_app.
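The arithmetic in steps 8 to 12 can be sketched in a few lines of Python. This is an illustration only, not the authors' R code: it assumes a statistic for which larger values are better (such as the ROC area), and it assumes the commonly quoted 0.632+ weighting, in which the relative overfitting rate R is the observed overfitting divided by the largest possible overfitting and the weight is w = 0.632 / (1 − 0.368R). The function names and the replicate values are hypothetical.

```python
def best_estimate(stat_boot, stat_permute, stat_excl):
    """Step 8 (assumed 0.632+ weighting, larger statistic = better model)."""
    max_overfit = stat_boot - stat_permute      # fit vs. no-information fit
    if max_overfit <= 0:
        rate = 0.0                              # no measurable overfitting
    else:
        rate = (stat_boot - stat_excl) / max_overfit
    rate = min(max(rate, 0.0), 1.0)             # keep R within [0, 1]
    weight = 0.632 / (1.0 - 0.368 * rate)       # 0.632 when R = 0, 1 when R = 1
    return (1.0 - weight) * stat_boot + weight * stat_excl


def optimism_corrected(stat_app, replicates):
    """Steps 9-12: average the optimism over replicates, subtract from Stat_app."""
    optimisms = [boot - best_estimate(boot, perm, excl)
                 for boot, perm, excl in replicates]
    mean_optimism = sum(optimisms) / len(optimisms)
    return stat_app - mean_optimism


# Hypothetical (Stat_boot, Stat_permute, Stat_excl) values from three replicates.
replicates = [(0.90, 0.50, 0.70), (0.88, 0.52, 0.74), (0.91, 0.49, 0.72)]
corrected = optimism_corrected(0.89, replicates)
```

In practice each replicate tuple would come from refitting the model on a bootstrap sample (steps 3 to 7); only the weighting and correction are shown here.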


for reserve planning (Burgman & Lindenmayer 1998; Ferrier et al. 2002a). The role of habitat modelling methods in addressing this problem is well established (Burgman & Lindenmayer 1998; Ferrier et al. 2002a; Wilson et al. 2005). Reliable and defendable methods for defining and predicting the distribution of wildlife habitat are critical components of conservation planning.

Here we attempt to coalesce recent developments in wildlife habitat modelling into one modelling and evaluation framework and present them in a simple enough manner that they may be applied by planners with relatively little modelling experience.

REVIEW OF WILDLIFE HABITAT MODELLING

At a simple level, a habitat model is a numerical representation of a species’ habitat preferences. It may be used to make inferences about a species’ habitat requirements and likely response to environmental change, or it may be used to predict a species’ abundance, density, carrying capacity or probability of occupying a location based on its environmental attributes. The primary use of habitat modelling in conservation planning is in predicting the spatial distribution of suitable habitat for species of interest in a landscape. Many habitat modelling methods are available that may be more or less applicable depending on the type of biological and environmental data available, the species of interest and the end use of the model. There are numerous steps involved in fitting most types of habitat model, each requiring subjective judgements that are based on experience and statistical and biological insights. There are several detailed reviews and comparisons of wildlife habitat modelling methods in the literature (Franklin 1995; Manel et al. 1999a,b; Elith 2000; Guisan & Zimmerman 2000; Ferrier et al. 2002a; Zaniewski et al. 2002). Our goal is to briefly outline the available methods and present those that we believe are most appropriate for predicting the distribution of species habitat in a conservation planning context in which technical expertise is limited. We seek to provide enough detail to allow planners with little statistical experience to follow our recommendations. We have provided worked examples, along with code and data, for fitting and evaluating statistical habitat models in the statistical freeware R (R Development Core Team 2004). These materials are available at http://www.botany.unimelb.edu.au/envisci/brendan/model.html. It is important to note that biological knowledge is a critical prerequisite to sound habitat modelling. Our recommendations are confined to addressing limitations in statistical expertise and offer no means of overcoming a lack of available ecological expertise.

Choosing a modelling method appropriate for the available data

A primary consideration in deciding on which modelling method to apply in any given situation is the type of biological survey data that are available for model development. There are five main levels of data availability: (i) little or no data are available for habitat modelling; (ii) presence-only (or ad hoc) data are available, where occupied locations are recorded but no attempt has been made to record locations that are unoccupied systematically; (iii) presence–absence (or binary) data are available, where locations that are occupied or unoccupied by a given species are recorded, usually in a systematic survey; (iv) ordinal categorical data are available, where the number of individuals at survey locations is recorded in coarse abundance categories; and (v) counts, where an attempt is made to count the actual number of individuals of a given species at survey locations. The latter two situations arise very rarely in conservation planning because of the prohibitive costs associated with capture of these data and are not dealt with in detail here. The following sections outline the modelling methods available when presence–absence, presence-only or no data are available. Though we do not tackle the no-data situation in our case study, we do provide references and a description of the basic approach. For details about modelling methods appropriate for count and ordinal categorical data, see Agresti (1996), Guisan and Harrell (2000) and Pearce & Ferrier (2001).

Little or no data

The absence of biological survey data does not preclude the development of a habitat model. Habitat suitability indices (HSIs) were introduced by the United States Fish and Wildlife Service as a means of mapping species habitats for the purpose of impact assessment and conservation planning (Van Horne & Wiens 1991). A HSI model for a given species and area of land represents a conceptual model that relates each measured variable of the environment to the suitability of a site for the species, scaled from 0 (for unsuitable habitat) to 1 for optimum conditions (Burgman et al. 2001). HSIs are very flexible and have been widely applied in conservation management (Reading et al. 1996; Breininger et al. 1998). The weakness of HSIs is that their credibility depends wholly on the credibility of the expert(s) who constructs them. The lack of independent data in the process makes them impossible to evaluate statistically and therefore less robust to scrutiny.
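To make the idea of a 0 to 1 suitability scale concrete, the following Python sketch scores a site from two expert-drawn curves. Everything specific in it is an assumption made for illustration: the piecewise-linear curve shape, the two variables and their breakpoints, and the use of a geometric mean to combine the scores (one of several combination rules an expert might choose).

```python
def suitability(value, points):
    """Piecewise-linear expert curve.

    `points` is a list of (x, score) pairs with strictly increasing x and
    scores between 0 (unsuitable) and 1 (optimal).
    """
    if value <= points[0][0]:
        return points[0][1]
    for (x0, s0), (x1, s1) in zip(points, points[1:]):
        if value <= x1:
            return s0 + (s1 - s0) * (value - x0) / (x1 - x0)
    return points[-1][1]


def hsi(scores):
    """Combine per-variable scores with a geometric mean (an assumed rule)."""
    product = 1.0
    for score in scores:
        product *= score
    return product ** (1.0 / len(scores))


# Hypothetical expert curves: hollow-bearing trees per ha and forest cover.
hollow_curve = [(0.0, 0.0), (5.0, 1.0)]
cover_curve = [(0.2, 0.0), (0.8, 1.0)]

site_scores = [suitability(2.5, hollow_curve), suitability(0.5, cover_curve)]
site_hsi = hsi(site_scores)
```

With a geometric mean, a score of 0 on any one variable makes the whole site unsuitable, which is the kind of judgement that has to be defended by the expert rather than by data.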


Presence-only data

Presence-only data are the most common form of observation data and are usually available from museums and herbaria (Graham et al. 2004). Presence-only data suffer from the problems that observations are unplanned and tend to be biased toward towns and roads; they are often of dubious reliability and unspecified spatial accuracy; and the variation in survey effort between different environments and geographical areas cannot be controlled or adjusted in model fitting (Ferrier et al. 2002a; Kadmon et al. 2003). Nonetheless, presence-only modelling methods are widely applied due to the prevalence of presence-only data. Presence-only data may be modelled using a variety of modelling packages that are based on different ecological assumptions. Presence-only methods fall into three main categories: (i) those that use the species data without reference to any environmental data; (ii) those that model a species-environment relationship in reference to the species presence data; and (iii) those that model a species-environment relationship by characterizing the ‘background’ environment across the region of interest and modelling the species presence in comparison to this background.

The first category includes hulls and kernels (Worton 1989), which can be thought of as geographical envelopes. They are primarily useful for estimation of ranges, but not for more detailed maps of species distribution because the envelopes will generally encompass many sites that are unsuitable habitat for the species. A limitation of all envelope methods is that they are particularly sensitive to missing data and spatial error.

The second category includes BIOCLIM (Nix 1986) and DOMAIN (Carpenter et al. 1993). BIOCLIM is a climate envelope method that maps habitat that is climatically suitable for the species based on the distribution of the known presence records across a suite of climate variables derived from long-term records of temperature, rainfall and radiation. It is useful over large extents for broadly defining climatically suitable regions, but because of its orthogonal geometries the envelope approach tends to include many sites that are in fact unsuitable for the species (Elith & Burgman 2003). DOMAIN takes an opposite approach by determining the similarity of each cell in a map to a known presence site. That is, it measures the environmental similarity between a target site and the most similar known record site using the Gower metric (Legendre & Legendre 1998).
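The contrast between an envelope and a similarity approach can be illustrated with a short Python sketch. It is not the BIOCLIM or DOMAIN software: the envelope here is a plain minimum-to-maximum box (an assumed simplification), the Gower similarity is written as one minus the mean range-standardized absolute difference, and the presence records and variable names are hypothetical.

```python
def in_envelope(site, presences):
    """Envelope check: is the site inside the min-max box of the presences?"""
    columns = list(zip(*presences))
    return all(min(col) <= value <= max(col)
               for value, col in zip(site, columns))


def gower_similarity(a, b, ranges):
    """One minus the mean range-standardized absolute difference."""
    distance = sum(abs(x - y) / r for x, y, r in zip(a, b, ranges)) / len(ranges)
    return 1.0 - distance


def domain_score(site, presences, ranges):
    """Similarity of the site to its most similar presence record."""
    return max(gower_similarity(site, p, ranges) for p in presences)


# Hypothetical presence records: (mean annual temperature, mean annual rainfall).
presences = [(14.0, 900.0), (16.0, 1100.0), (15.0, 1300.0)]
ranges = [max(col) - min(col) for col in zip(*presences)]

inside = in_envelope((15.5, 1100.0), presences)
outside = in_envelope((18.0, 1000.0), presences)
score = domain_score((15.5, 1100.0), presences, ranges)
```

The envelope returns only a yes or no for each cell, whereas the similarity score grades cells continuously, which is why the two methods can rank the same sites quite differently.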

The third category includes most other presence-only methods, including ENFA (Hirzel 2001), the genetic algorithm GARP (Stockwell & Peters 1999) and presence–absence methods adapted to presence-only data. These have various strengths and weaknesses and some are more thoroughly tested than others (Elith & Burgman 2003). Of these methods, regression models (generalized linear models, GLMs, and generalized additive models, GAMs) are ecologically realistic and have shown reasonable performance when used as logistic models that have been adapted to allow modelling of presence-only survey data (Ferrier & Watson 1997; Zaniewski et al. 2002). We use them in the case study and present justifications for their use in the following sections.

A substantial drawback to the use of presence-only data is the lack of available and broadly accepted methods for evaluating the predictive performance of fitted models. Some methods are currently under development, but at the time of publication they remain largely untested and not broadly accepted (Philips et al. in press). For this and other reasons mentioned earlier, we would always consider presence-only data, and therefore presence-only models, to be inferior to presence–absence data and models. However, due to the prevalence of presence-only data, we have provided a demonstration of their use in habitat modelling in our case study.

Presence–absence data

A number of public planning exercises in Australia have utilized presence–absence data for deriving habitat models and maps (National Parks & Wildlife Service 1998, 2000; Ferrier et al. 2002a). Presence–absence data may suffer from the problem of uncertain zeros (MacKenzie et al. 2002; Tyre et al. 2003) but have the advantage that they are usually collected in a ‘systematic’ manner involving some level of geographical and environmental stratification in the sampling design (Austin & Heyligers 1989). Consequently, such data are more likely to contain samples that span the environmental gradients of interest, making model fitting more reliable.

There is a broad range of modelling methods that can utilize presence–absence data, and assessment of their relative performance has been the subject of considerable research effort (Ferrier & Watson 1997; Manel et al. 1999a,b; Elith 2000; Moisen & Frescino 2002). Multivariate association methods such as canonical correspondence analysis (ter Braak 1986), machine learning methods such as genetic algorithms (Stockwell & Peters 1999) and neural networks (Moisen & Frescino 2002), and tree-based methods such as classification and regression trees (Breiman et al. 1984) have all been proposed as potentially useful methods for modelling habitat preferences with presence–absence data.

Comparative studies have found the performance of logistic regression to be typically at least as good as other methods, if not better (Ferrier & Watson 1997; Elith 2000). Of the competing methods mentioned above, only logistic regression naturally assumes data are derived from a binomial process, which is the correct distribution when data are binary and observations independent. Regression methods, primarily GLMs (McCullagh & Nelder 1989) and GAMs (Hastie & Tibshirani 1990), have been a commonly applied method for modelling and predicting habitat occupancy for planning purposes (e.g. National Parks & Wildlife Service 1998, 2000; Li et al. 1999; Loyn et al. 2001). One of the advantages of GLMs and GAMs is the availability of free statistical software (R Development Core Team 2004) and detailed documentation and guidance for fitting and interpreting models (Harrell 2001; Hastie et al. 2001; R Development Core Team 2004). For these and other reasons described below, we focus on the use of GLMs and GAMs in describing species-habitat relationships and predicting the spatial distribution of suitable habitats.

GLMs and GAMs in habitat modelling

All GLMs are composed of a random component, described by the assumed distribution of the observation data (either binomial or Poisson for many wildlife observation data); a systematic component, specifying a linear combination of explanatory (or independent) variables; and a ‘link’ between the random and systematic components of the model that specifies how the mean response (i.e. observation) relates to the explanatory variables in the linear predictor (Agresti 1996). When observation data are binary (presence–absence), the expected value may be modelled as Pr(Y = 1) using the ‘logit’ transformation to link the random and systematic component. In this case, the regression model becomes (Agresti 1996):

$$\operatorname{logit}(p_i) = \log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} \qquad (1)$$

where $p_i$ is the probability that the species will be present at site $i$, $\beta_0$ is the intercept coefficient, the $x_k$ are the habitat variables and the $\beta_k$ are habitat variable coefficients. Equation 1 defines the special case of the GLM known as the logistic regression model.

Generalized additive models are a non-parametric generalization of GLMs in which the relationships between the dependent and independent variables are defined by non-parametric smoothing functions (Hastie et al. 2001). In practice, this means that the linear predictor ($\beta_0 + \sum_{j=1}^{k} \beta_j x_j$) that defines the relationship between the dependent and explanatory variables in GLMs is replaced by smoothing functions ($\beta_0 + \sum_{j=1}^{k} f_j(x_j)$). The $f_j$ are estimated in a ‘flexible’ manner and there are a range of alternative smoothers available (Hastie et al. 2001). This flexibility confers an advantage to GAMs over GLMs in that they are able to ‘fit’ data more closely for a given number of degrees of freedom because they are not constrained to fit predefined parametric shapes (Bio et al. 1998). However, for the same reason, GAMs cannot be as easily interpreted as GLMs. Indeed, GAMs do not actually have a retrievable model formula in the classic sense and interpretation generally requires a plot of the fitted response curves. Statistical packages such as R (R Development Core Team 2004) provide this facility for both GLMs and GAMs. GAMs can be fitted with the same ‘link’ functions as GLMs, so are capable of fitting logistic regression models.
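As a small illustration of how the logistic GLM of Equation 1 turns habitat variables into a predicted probability of presence, the Python sketch below evaluates the linear predictor and applies the inverse of the logit link. The coefficient values and the two predictors are hypothetical and are not taken from the fitted case-study models.

```python
import math


def linear_predictor(intercept, coefs, x):
    """Right-hand side of Equation 1: b0 + b1*x1 + ... + bk*xk."""
    if len(coefs) != len(x):
        raise ValueError("one coefficient is needed per habitat variable")
    return intercept + sum(b * v for b, v in zip(coefs, x))


def occupancy_probability(intercept, coefs, x):
    """Inverse logit of the linear predictor, written to avoid overflow."""
    eta = linear_predictor(intercept, coefs, x)
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


# Hypothetical model: intercept, then coefficients for two habitat variables.
intercept = -2.0
coefs = (0.8, -0.01)

p_site = occupancy_probability(intercept, coefs, (2.5, 0.0))     # eta = 0
p_other = occupancy_probability(intercept, coefs, (4.0, 50.0))   # eta = 0.7
```

Because only the coefficients are needed, a published formula of this kind can be applied to new grid cells without access to the original survey data, provided the cells lie within the environmental range of those data.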

It is easy to build models that are ecologically unrealistic. Predictions from unrealistic models will have large errors and are likely to be less robust and generalizable than those from realistic models (Austin 2002). Model interpretability should therefore influence the choice of modelling method, because a model can only be checked for its realism if it is interpretable. Two key features of ecological realism are the choice of explanatory variables included in the model and the shape of the response fitted for those variables (Austin 2002). Methods such as neural networks and genetic algorithms are difficult to interpret on both counts. Ennis et al. (1998) found that logistic regression provided as good or better performance than more complicated methods, including multivariate adaptive regression splines (Friedman 1991) and back-propagated neural networks (Ripley 1995). Similar results have been found in other comparisons (Elith & Burgman 2002; Moisen & Frescino 2002). Even though GAMs tend to perform better than GLMs in such comparisons, the simplicity of GLMs, their broad availability in statistical packages, the ease with which they can be applied within a geographical information system (GIS) framework and the ready availability of prediction intervals mean that they are still useful and frequently implemented. The construction of a mathematical formula that describes the relationship between the dependent variable and the environment provides a compact and communicative representation of a model. A logistic regression GLM formula that is published in a report or thesis can be used to predict species occurrence and compute prediction intervals without any direct reference to the data on which the model was fitted (providing that due care is taken not to extrapolate beyond the environmental scope of the original data). This degree of generality is not a feature of other modelling methods. A further strength of GLMs is the ease with which uncertainty about coefficients and predictions can be conveyed as standard errors and prediction intervals. Prediction intervals are not yet easily obtained from GAMs.


A CASE STUDY: HABITAT MODELLING FOR CONSERVATION PLANNING IN CENTRAL NSW

The Lower Hunter and Central Coast (LHCC) region of New South Wales (NSW) presents an opportunity to study urbanization pressures in a relatively pristine landscape with substantial biodiversity values. The region includes the seven local councils of Cessnock, Gosford, Lake Macquarie, Maitland, Newcastle, Port Stephens and Wyong (Fig. 1). It comprises large areas of native forest in both public and private tenures (National Parks & Wildlife Service 2000). The region is approximately 7200 square kilometres, of which approximately 65% is under forest cover. The LHCC Regional Environmental Management Strategy (LHCCREMS) seeks to integrate biodiversity information into future land use planning and development in the region (LHCCREMS 2004). Biodiversity projects under the LHCCREMS include the development of vegetation mapping for the region; a gap analysis of biological survey data for the region and subsequent fauna surveys to augment existing biodiversity data (Ecotone Ecological Consultants 2001); the development of wildlife habitat models and habitat maps for priority species in the region (Wintle et al. 2004); and the development of preliminary conservation recommendations for each species (LHCCREMS 2004). This paper reports on the methods used to develop and evaluate habitat models and maps for each of the priority species in the region.

Priority species and their habitat requirements

Seven fauna species were selected for modelling on the basis of data availability and ‘priority’ status, with priority largely determined by threatened species status. By choosing species on the basis that they were deemed particularly sensitive to identifiable threats, the selection process loosely followed the rationale of the focal species approach (Lambeck 1997). The primary threatening process in the LHCC region is land clearing for development, though commercial forestry on public land may potentially pose a threat to some species (LHCCREMS 2004).

The koala (Phascolarctos cinereus), the yellow-bellied glider (Petaurus australis) and the squirrel glider (Petaurus norfolcensis) are arboreal marsupials that are broadly distributed throughout forest and woodlands of eastern Australia. The suitability of habitat for arboreal marsupials is influenced by the size and species of trees present, soil nutrients, climate, rainfall and the size and disturbance history of the habitat patches (Reed & Lunney 1990). The two gliders are specifically reliant on tree hollows for shelter. The squirrel glider requires food supplied by flowering acacias and banksias (Russel 1995) and the yellow-bellied glider needs particular species of sap-trees and winter flowering eucalypts (Goldingay & Kavanagh 1993). Available habitat for these species is becoming increasingly fragmented due to habitat destruction and fragmentation caused by urban and infrastructure development, agriculture and mining (Reed & Lunney 1990).

Fig. 1. The Lower Hunter Central Coast (LHCC) region, situated north of Sydney in New South Wales. Seven local councils are located in the region. Light shading indicates areas under forest cover. [Map not reproduced. Legend: vegetation cover (vegetated, non-vegetated); local government areas. Graticule labels: 151°E, 33°S.]

The tiger quoll (Dasyurus maculatus) is a medium-sized carnivorous marsupial that inhabits a wide range of habitat types, from sclerophyll forest and woodlands to coastal heathlands and rainforests (Edgar & Belcher 1995). The species requires suitable den sites (such as hollow logs, tree hollows, rock outcrops or caves) and a large area of intact vegetation for foraging (Edgar & Belcher 1995).

The powerful owl (Ninox strenua), the sooty owl (Tyto tenebricosa) and the masked owl (Tyto novaehollandiae) are broadly distributed throughout forest, woodland and rainforest of eastern Australia. All owl species tolerate some degree of fragmentation and discontinuity in forest cover. However, they do rely on large tree hollows for nesting and denning and prey on forest-dependent species such as marsupial gliders and small ground mammals, making them susceptible to declines in prey abundance resulting from high levels of forest fragmentation (Kavanagh 2002).

The choice of model variables was driven by what was known about the habitat requirements of each species. Of the species modelled, home range estimates vary from as low as 0.65 ha for the squirrel glider (Russel 1995) to 20 km² for the tiger quoll (Edgar & Belcher 1995). A modelling grid cell size of 1 ha was chosen on the basis that it approximates the home range size of the smallest ranging species.

Data collation, filtering and handling

Biological data

Data for the seven target species were obtained from the biological systematic survey (BSS) module of the NSW NPWS Wildlife Atlas database and from surveys commissioned by the LHCCREMS specifically undertaken to fill gaps in the Atlas data for priority species (Ecotone Ecological Consultants 2001). Systematic survey data contained records of both presence and absence for each of the seven priority species. The availability of systematic survey data reduced the effort required to prepare data for modelling, as survey method and effort covariates were available for data filtering. Only data collected after 1990 and using survey methods appropriate to detect each species were used in modelling. Records were filtered to ensure a minimum geographical separation distance of at least 500 m in an attempt to increase the probability that observations were independent. Presence records were retained in preference to absence records when two (or more) records fell within 500 m of each other. The data that remained for model building and testing are summarized in Table 1.
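One way to implement the separation filter described above is a greedy pass that considers presence records first. The Python sketch below is an assumption about how such a filter could be written, not the procedure actually used in the case study: it assumes projected coordinates in metres, and it keeps the first-listed record when two presences fall within 500 m of each other (a case the text does not specify).

```python
import math


def thin_records(records, min_dist=500.0):
    """Greedy spatial thinning.

    `records` is a list of (x, y, presence) tuples with x and y in metres and
    presence coded 1 or 0. Presences are considered before absences so that
    they are retained in preference when records are closer than `min_dist`.
    """
    ordered = sorted(records, key=lambda r: -r[2])  # stable: presences first
    kept = []
    for x, y, presence in ordered:
        if all(math.hypot(x - kx, y - ky) >= min_dist for kx, ky, _ in kept):
            kept.append((x, y, presence))
    return kept


# Hypothetical survey records.
records = [
    (0, 0, 0),        # absence 300 m from a presence: dropped
    (300, 0, 1),
    (1000, 0, 0),
    (1200, 0, 0),     # absence 200 m from a retained absence: dropped
    (5000, 5000, 1),
]
thinned = thin_records(records)
```

A greedy pass depends on record order within each class, so a different ordering can retain a different (equally valid) subset.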

Survey method or effort covariates may be incorporated in models when survey data are collected using methods of variable reliability (Pearce et al. 2001a). In our study, such covariates were not required because data used for modelling were relatively uniform with respect to method and effort.

Environmental data

One of the primary limitations of the predictive performance of wildlife habitat models is the availability of broadly mapped environmental variables that are closely related to environmental attributes that affect the distribution of wildlife. Austin (2002) delineated two types of independent variables for model building: (i) ‘proximal’ (direct) variables are those that represent resource, shelter or thermal gradients that have a direct influence on a species’ distribution (e.g. temperature and foliar-nutrient); and (ii) ‘distal’ (or indirect) variables that have no physiological effect on the species but are correlated with ‘proximal’ variables (e.g. altitude, latitude). Modelling with proximal variables will more often produce a model that makes transportable and robust predictions, whereas models based on distal predictors are likely to be more specific to the location in which they were constructed (Austin 2002). However, direct variables are not always available as GIS layers because they tend to be difficult to map (Guisan & Zimmerman 2000), so model building for the purpose of prediction is often undertaken using distal variables.

Experts should play a role in the identification of environmental variables that may be important predictors of a species’ habitat. They may be useful in developing new variables that are combinations or derivations of distal variables. For example, indices such as a ‘hollows index’ or ‘foliar nutrient index’ have been developed from maps of forest age and floristic composition and used in habitat modelling in the past (National Parks & Wildlife Service 1998). Similarly, vegetation variables with many categories of vegetation are often difficult to use as habitat variables in their raw form. Experts may be used to aggregate vegetation classes into those most likely to be of relevance to the species being modelled, or to develop interaction terms between, for example, vegetation type, topographic context and forest age. Derived variables may serve as useful predictors of habitat where the raw variables do not. Similarly, neighbourhood measures such as ‘the proportion of old forest within 2 km’ have also been used successfully in habitat modelling (Ferrier et al. 2002a). Neighbourhood measures convey the local environmental context of a site, which may be as important as the attributes of the site itself, especially in the case where the home range of a species is larger than the cell size used in modelling or where survey locations are imprecisely known. Compiling a set of candidate environmental variables should involve careful consideration of the biology of the species being modelled.

Table 1. Number of presence and absence sites derived from the LHCCREMS and BSS databases used in model building and evaluation for each of the seven priority species

Species                  Presence records   Absence records   Total
Koala                    88                 162               250
Tiger quoll              36                 75                111
Squirrel glider          112                129               241
Yellow-bellied glider    92                 152               244
Masked owl               55                 149               204
Powerful owl             97                 142               239
Sooty owl                56                 156               212

BSS, biological systematic survey; LHCCREMS, The Lower Hunter and Central Coast Regional Environmental Management Strategy.

A set of climatic variables derived from ANUCLIM (Houlder et al. 1999) were available for use in model development. These may have a direct role in determining species distributions through metabolic constraints. However, the remainder of available predictor variables would be considered distal. Neighbourhood measures were derived and tested for all species at a range of neighbourhood distances. The absence of predictor variables relating to forest growth stage, a commonly used surrogate for tree-hollow and denning availability (National Parks & Wildlife Service 1998, 2000; Loyn et al. 2001), is likely to place substantial limitations on the predictive performance of final models. All environmental variables were stored as spatial layers (or grids) in a GIS with a grid cell resolution of 100 m, which was satisfactory with respect to the home range size of the target species.

Testing the adequacy of survey data based on environmental strata

There were a sufficient number of species survey data points in the region for fitting statistical models for each of the seven priority species. However, having a sufficient number of data points for fitting a regression model does not guard against inherent bias in survey data. To address concerns about geographical and environmental biases, data were tested for their coverage of key environmental strata. This was undertaken to ensure that models developed would be relevant to the range of environmental conditions present in the region and that excessive extrapolation of fitted responses would not be required. Environmental strata were defined by overlaying GIS raster maps (1 ha grid cell size) of four key variables: broad vegetation cover (seven classes), topographic position (three classes), mean annual temperature (four classes) and mean annual rainfall (seven classes), resulting in 384 possible strata, 256 of which were represented by at least 50 ha of forest in the region. By overlaying survey locations and the map of environmental strata, it is possible to tabulate the proportion of sampled versus unsampled strata for each species. Maps showing the regional coverage of sampled and unsampled strata were created for each species to illustrate where model predictions may be less reliable and where future biological surveys could be targeted.
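The tabulation of sampled versus unsampled strata can be sketched as follows. This Python example assumes that each 1 ha forest cell and each survey location has already been assigned a stratum label (here a tuple of class codes) by a GIS overlay; the labels, cell counts and function name are hypothetical.

```python
from collections import Counter


def strata_coverage(cell_strata, survey_strata, min_cells=50):
    """Split well-represented strata into sampled and unsampled sets.

    `cell_strata` holds one stratum label per 1 ha forest cell, so a stratum
    with at least `min_cells` cells covers at least 50 ha. `survey_strata`
    holds the stratum label at each survey location for one species.
    """
    area = Counter(cell_strata)
    represented = {s for s, n in area.items() if n >= min_cells}
    sampled = represented & set(survey_strata)
    unsampled = represented - sampled
    return sampled, unsampled


# Hypothetical labels: (vegetation, topographic position, temperature, rainfall).
ridge = ("forest", "ridge", 1, 2)
gully = ("forest", "gully", 1, 2)
heath = ("heath", "slope", 2, 1)

cell_strata = [ridge] * 60 + [gully] * 80 + [heath] * 10
survey_strata = [ridge, ridge, heath]

sampled, unsampled = strata_coverage(cell_strata, survey_strata)
proportion_sampled = len(sampled) / (len(sampled) + len(unsampled))
```

Mapping the cells that belong to the unsampled set gives the kind of coverage map described above, showing where predictions rest on extrapolation.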

Model development and evaluation

Data preparation

Geographical information system layers representing environmental variables were sampled at each survey location using ArcInfo (ESRI 1997) to construct a modelling data frame for each species. Similar functions are available in most GIS software. Each row of data contained the survey observation (1 = species present, 0 = species absent) and values for each of the candidate predictor variables at the survey locations. This resulted in an n × (k + 1) data matrix, where n is the number of survey locations and k is the number of potential predictor variables.

In order to construct presence-only models, a second data matrix was created for each species that contained records for which the species was present. Background samples (sometimes termed ‘pseudo-absence’ data) were generated for 10 000 random locations across the landscape according to methods described by Ferrier and Watson (1997) and using software developed by Landcare New Zealand (J. Overton, pers. comm. 2005). The rationale for using background samples when no ‘real’ or systematic absence data exist is that they can be used to create an environmental profile of the study area, which is then compared with the environmental profile of known ‘presence’ locations. An n × (k + 1) modelling matrix was created as for the presence–absence data, where n is equal to the number of presence observations plus the 10 000 random ‘observations’ of absence.
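The structure of the presence-only modelling matrix can be illustrated with a brief Python sketch. It does not reproduce the Landcare New Zealand software: it assumes the landscape is available as a list of grid cells with predictor values, draws background cells uniformly at random with replacement, and uses hypothetical predictor values.

```python
import random


def presence_background_matrix(presence_rows, landscape_rows,
                               n_background=10_000, seed=1):
    """Stack presences (coded 1) on top of random background cells (coded 0).

    Each input row holds the k predictor values for one location; each output
    row holds the response followed by those k values, giving n x (k + 1).
    """
    rng = random.Random(seed)   # fixed seed so the sample is repeatable
    background = [rng.choice(landscape_rows) for _ in range(n_background)]
    matrix = [[1] + list(row) for row in presence_rows]
    matrix += [[0] + list(row) for row in background]
    return matrix


# Hypothetical landscape: 50 cells described by (temperature, rainfall).
landscape_rows = [(t, r) for t in range(10, 20) for r in range(800, 1300, 100)]
presence_rows = [(15, 1000), (16, 1100)]

matrix = presence_background_matrix(presence_rows, landscape_rows)
```

Because background cells are drawn without regard to occupancy, some of them will coincide with occupied habitat, which is one reason the zeros in such a matrix should not be read as true absences.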

Data transformation

In some instances, the distribution of environmental variables at survey locations may be long-tailed. This may bring about problems with model fitting due to points in the tails having high leverage, and tends to reduce the explanatory power of independent variables. Data may be transformed to avoid this problem. A common transformation is the log-transformation, though a range of other transformations exist. Inspection of our modelling data revealed no substantial problems with long-tailed predictor variables.

Variable reduction

Approximately 50 variables were available that described the environmental characteristics which may govern site occupancy for the seven target species. It is common to offer all possible variables as candidate model predictors (e.g. National Parks & Wildlife Service 1998) and utilize an automated variable selection routine to eliminate inappropriate predictors and to specify the final model. However, offering many candidate variables tends to result in models that include nonsense predictors and exclude variables that in fact influence the probability of occupancy (Derksen & Keselman 1992; Chatfield 1995; Steyerberg et al. 2001a). This is especially problematic when the number of species observations is low compared with the number of candidate predictors.

An alternative approach is to minimize the number of variables that are offered to the variable selection routine. Harrell et al. (1996) recommend a rule of thumb that less than m/10 predictor degrees of freedom (PDF) should be offered as candidates to a variable selection routine such as backward selection, where m equals the number of observations of the least prevalent class in the modelling data set (often the number of presence observations). PDF indicates the number of possible parameters estimated in the largest model that could be constructed from the set of candidate predictors. The number of parameters depends on the number of predictors and the form in which the predictor variables are used in the model (see next section). For categorical variables or non-linear responses, there is more than one PDF for each environmental variable. For example, categorical variables have a parameter for all but one category, sometimes leading to many PDF per categorical variable. Similarly, quadratic or cubic functional forms require two and three PDF, respectively. In our situation, m ranges from 36 for the tiger quoll to 112 for the squirrel glider (Table 1). Therefore, the maximum number of candidate PDF offered to the variable selection routine should be no more than 4 or 11, respectively. In this case, the allowable PDF is much less than the total number available. The number of candidate PDF was firstly reduced by removing distal variables that were highly correlated with proximal variables (R > 0.6) in the context of biological or metabolic requirements of the species.
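The bookkeeping behind the m/10 rule is simple enough to sketch in Python. The example below counts PDF for a hypothetical candidate set and compares it with the budget for two species in Table 1. Rounding m/10 to the nearest whole number is an assumption made here because it reproduces the limits of 4 and 11 quoted above; the variable names and forms are illustrative only.

```python
# PDF used by each polynomial form of a continuous predictor.
POLY_PDF = {"linear": 1, "quadratic": 2, "cubic": 3}


def pdf_budget(n_presence, n_absence):
    """Rule-of-thumb budget: m / 10, rounded to the nearest whole number."""
    m = min(n_presence, n_absence)   # size of the least prevalent class
    return int(m / 10 + 0.5)


def candidate_pdf(terms):
    """Total PDF of a candidate set.

    `terms` maps a variable name to either a polynomial form (a key of
    POLY_PDF) or, for a categorical variable, its number of categories.
    """
    total = 0
    for form in terms.values():
        if isinstance(form, int):
            total += form - 1        # one parameter for all but one category
        else:
            total += POLY_PDF[form]
    return total


# Presence and absence counts from Table 1.
quoll_budget = pdf_budget(36, 75)
glider_budget = pdf_budget(112, 129)

# Hypothetical candidate set.
terms = {
    "mean_annual_rainfall": "quadratic",
    "topographic_position": 3,        # three categories
    "old_forest_within_2km": "linear",
}
needed = candidate_pdf(terms)
fits_quoll = needed <= quoll_budget
fits_glider = needed <= glider_budget
```

Under these assumptions the candidate set uses five PDF, which is within the squirrel glider budget but exceeds the tiger quoll budget, so at least one term would have to be dropped or simplified before variable selection for the quoll.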

Irrespective of variable reduction issues, fitting statistical models with collinear predictor variables may cause statistical problems and should be avoided (Belsley et al. 1980). Expert opinion and previous habitat models were consulted where further variable reduction was required. The number of candidate model variables (and PDF) offered to the variable selection routine varied for each species, depending on the amount of biological survey data available. The final set of candidate predictors presented in Table 2 includes variables that were offered as candidates for at least one species.

Variable form

One of the strengths of GLMs and GAMs is the possibility of fitting non-linear relationships between the dependent variable (probability of occurrence) and the independent environmental variables. However, it may be difficult to know a priori whether a particular relationship is likely to be linear, quadratic, cubic or other. One strategy that is commonly employed is to allow an automated variable selection routine full freedom to fit any relationships using a trade-off between complexity and variance explained. However, it has been shown that such an approach may result in the fitting of spurious relationships with no logical interpretation (Steyerberg et al. 2001a). Consequently, we recommend either choosing functional forms with no automated variable selection, or limiting the range of functional forms available to the variable selection algorithm to those that are supported by sound ecological intuition and preliminary data analysis. In this exercise, we conduct preliminary analyses on response shapes by fitting univariate GAMs with five degrees of freedom and visually inspecting plots of response shapes (Austin & Meyers 1996). This approach allows the user to assess whether more complicated responses appear sensible and are justified by the data. Visual inspection of fitted response shapes is used again later in the process for evaluating the ecological realism of final selected models (see below).

Variable selection and model fitting

Final GLMs and GAMs for all species were selected and fitted to both presence–absence and presence-background data using a backward stepwise variable selection algorithm (Venables & Ripley 2003) in R, resulting in four models for each species. The variable selection algorithm tests a series of nested models using Akaike’s information criterion (AIC; Akaike 1973; Venables & Ripley 2003) to select between models. The choice of algorithm may influence the structure of the final model, and alternative automated model selection algorithms, including forward selection, are available. The backward selection algorithm is generally preferred to the other standard automated methods because it generally performs better in the presence of collinear candidate variables and because it requires consideration of the full model fit (Harrell 2001). There is a substantial body of literature dedicated to model selection issues, and automated algorithms have been criticized (Chatfield 1995; Hoeting et al. 1999; Harrell 2001). However, the alternatives tend to be complicated. We have chosen to use a packaged model selection algorithm as a compromise between practicality and sophistication. We reiterate that expert reduction of predictor variables prior to final model selection is critical to the successful application of automated variable selection routines (Harrell 2001).
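The logic of backward selection on AIC can be illustrated with a minimal Python sketch. It is not the packaged R routine used in the case study: the function `backward_select`, the callable `aic_fn` and the toy deviance gains are all hypothetical, and a real application would refit a GLM or GAM for every candidate subset rather than look up a number.

```python
def backward_select(candidates, aic_fn):
    """Greedy backward elimination.

    Start from the full model, repeatedly drop the single term whose
    removal gives the lowest AIC, and stop when no removal lowers AIC.
    aic_fn takes a list of term names and returns the AIC of that model.
    """
    current = list(candidates)
    best_aic = aic_fn(current)
    while current:
        trials = [(aic_fn([v for v in current if v != d]), d) for d in current]
        trial_aic, drop = min(trials)
        if trial_aic >= best_aic:
            break  # no single deletion improves the criterion
        current.remove(drop)
        best_aic = trial_aic
    return current, best_aic


# Toy stand-in for model fitting (assumption): each term reduces the
# deviance by a fixed amount, and AIC = deviance + 2 * number of parameters.
gains = {"rain": 12.0, "temp": 5.0, "noise1": 0.4, "noise2": 1.1}

def toy_aic(terms):
    deviance = 100.0 - sum(gains[t] for t in terms)
    return deviance + 2 * (len(terms) + 1)  # +1 for the intercept

selected, aic = backward_select(list(gains), toy_aic)
print(selected, aic)
```

In the toy example a term is retained only if it reduces the deviance by more than the penalty of 2 per parameter, which is the trade-off that AIC-based selection encodes.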

Model evaluation statistics

Presence–absence models were evaluated using two statistics: (i) the area under the receiver operating characteristic (ROC) curve (Hanley & McNeil 1982) (ROC area), closely related to the Mann–Whitney U-statistic; and (ii) Miller’s calibration slope (Miller et al. 1991; Pearce & Ferrier 2000). The ROC area evaluates a model’s ability to discriminate between presence and absence sites and is therefore referred to as a measure of model (or predictive) ‘discrimination’ (Pearce & Ferrier 2000). This provides an indication of the usefulness of the models for prioritizing areas in terms of their relative importance as habitat for the particular species. The ROC area ranges from 0 to 1, where a score of 1 implies perfect discrimination and a score of 0.5 implies predictive discrimination that is no better than a random guess (Bambar 1975). The actual value of the ROC area has a straightforward interpretation: it is the probability that, for a randomly selected pair of presence–absence observations derived from field surveys, the model prediction for presence will be greater than the prediction for absence.
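The pairwise interpretation of the ROC area translates directly into code. The Python sketch below compares every presence prediction with every absence prediction; counting ties as one half is the usual Mann–Whitney convention and is an assumption of the example, as are the function name and the made-up predictions.

```python
def roc_area(pred_presence, pred_absence):
    """ROC area as the proportion of presence-absence pairs in which the
    presence site receives the higher prediction (ties count one half)."""
    wins = 0.0
    for p in pred_presence:
        for a in pred_absence:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(pred_presence) * len(pred_absence))


# Hypothetical model predictions at three presence and four absence sites.
presence = [0.9, 0.7, 0.4]
absence = [0.6, 0.3, 0.2, 0.1]
print(roc_area(presence, absence))
```

Only the ranking of predictions matters here, which is why the ROC area measures discrimination and says nothing about whether the predicted probabilities themselves are accurate.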

Miller’s calibration statistics (MCS) evaluate the ability of a model to correctly predict the proportion of sites with a given environmental profile that will be occupied. MCS are derived from a logistic regression of observations on the logit of predicted probabilities. The rationale is that the slope of the regression would be equal to one and the regression intercept would equal zero if the predictions from the model were perfectly calibrated (Harrell 2001; Pearce & Ferrier 2000). The model is known as the ‘logistic calibration equation’. The two calibration statistics are literally the logistic regression slope (‘calibration slope’) and intercept (‘calibration intercept’), though it is common just to report the calibration slope.
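As a rough illustration of the logistic calibration equation, the Python sketch below regresses observed presence (1) or absence (0) on the logit of the predicted probabilities using a hand-written Newton–Raphson fit. The function names and the small data set are hypothetical, predictions are assumed to lie strictly between 0 and 1, and in practice a standard GLM routine would be used instead.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def calibration_fit(observed, predicted, n_iter=50):
    """Return (intercept, slope) of a logistic regression of observed 0/1
    outcomes on logit(predicted), fitted by Newton-Raphson."""
    x = [logit(p) for p in predicted]
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, observed):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu            # gradient, intercept
            g1 += (yi - mu) * xi     # gradient, slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01  # zero if all predictions are identical
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < 1e-12:
            break
    return a, b


# Hypothetical over-confident model: sites predicted at 0.1 are occupied
# 30% of the time, and sites predicted at 0.9 are occupied 70% of the time.
predicted = [0.1] * 10 + [0.9] * 10
observed = [1] * 3 + [0] * 7 + [1] * 7 + [0] * 3
intercept, slope = calibration_fit(observed, predicted)
print(intercept, slope)
```

A slope below one, as in this made-up case, indicates predictions that are too extreme; a slope of one with an intercept of zero corresponds to perfect calibration.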

The presence-background models were evaluated qualitatively. The statistics available for evaluating presence-only models are limited and a fair comparison with the presence–absence models was not possible here.

Table 2. Abbreviated names and definitions of mapped environmental data used as candidate predictor variables for inclusion in habitat models. All environmental data were available in raster format with 100 m (side length) grid cell size.

Candidate variable   Definition
Temp                 Mean annual temperature derived from ANUCLIM
Cold                 Mean temperature of the coldest period derived from ANUCLIM
Rain                 Mean annual rainfall derived from ANUCLIM
Dry 2000             The percentage of cells in a 2000-m radius containing dry forest
Rf2000               The percentage of cells in a 2000-m radius containing rainforest
Solar                The solar radiation index of a cell derived from ANUCLIM
Elev                 The elevation of a cell (in metres) above sea level
Rugg250 (500, 1000)  Topographic ruggedness (standard deviation in elevation) in a 250-m, 500-m and 1000-m radius
Terr250 (500, 1000)  Relative terrain position in a 250-m, 500-m and 1000-m radius
Topo                 The topographic position of a cell ranging from gully to ridge top (0–100)
Unmod500             The percentage of cells in a 500-m radius containing unmodified forest
Unmod2000            The percentage of cells in a 2000-m radius containing unmodified forest
Wet500               The percentage of cells in a 500-m radius containing wet forest
Wet2000              The percentage of cells in a 2000-m radius containing wet forest
Fert                 Index of the soil nutrient content at a site based on geochemical data (CSIRO)
Percnonfor2k         The percentage of cells in a 2000-m radius classified as cleared land
Ybglexp2000†         The percentage of cells in a 2000-m radius containing suitable ybgl habitat
Sqglexp2000†         The percentage of cells in a 2000-m radius containing suitable sqgl habitat
Sowlexp2000†         The percentage of cells in a 2000-m radius containing suitable sowl habitat
Mowlexp2000†         The percentage of cells in a 2000-m radius containing suitable mowl habitat
Powlexp2000†         The percentage of cells in a 2000-m radius containing suitable powl habitat
Koalexp2000†         The percentage of cells in a 2000-m radius containing suitable koal habitat

†denotes expert variable derived from vegetation classes (Michael Murray, pers. comm. 2004). koal, koala; mowl, masked owl; powl, powerful owl; quol, tiger quoll; sowl, sooty owl; sqgl, squirrel glider; ybgl, yellow-bellied glider.

The model evaluation method (bootstrapping)

Presence–absence models for each species were evaluated using a ‘bootstrapping’ approach (Efron & Tibshirani 1997; Harrell 2001). Naively testing a model on the data that were used to fit it (also known as ‘in-sample validation’) is inappropriate and known to provide optimistic estimates of model performance (Fielding & Bell 1997; Steyerberg et al. 2001b), despite being a commonly used approach to model evaluation (National Parks & Wildlife Service 1998, 2000). Ideally, models should be tested on a completely independent data set with data collection and stratification specifically designed for model evaluation. However, this is seldom a practical option and may also be prone to high variance unless samples are large (J. Elith, unpublished data, 2005). Bootstrapping provides a realistic estimate of the predictive performance of a model without incurring the expenses of collecting a completely new model-testing data set. Bootstrapping involves resampling the modelling data and conducting a series of model building and testing simulations that provide an estimate of the optimism arising from in-sample validation. The estimate of optimism is used to provide an adjusted estimate of the model evaluation statistics (ROC and MCS). The bootstrapping version implemented here is believed to provide the least biased estimate of predictive performance of any of the model evaluation methods that are based on re-sampling, including cross-validation (Hastie et al. 2001). Cross-validation (Efron & Tibshirani 1997) provides an alternative approach to model evaluation and might be more feasible with methods or data sets that create large computational loads. However, its estimates of error rates with independent data can be less precise than those derived from bootstrapping (Steyerberg et al. 2001b), which can be thought of as a smoothed version of cross-validation (Efron & Tibshirani 1997). The bootstrapping method (the 0.632+ bootstrap) is detailed in Appendix I. R code for obtaining bootstrapped estimates of ROC and MCS is available from the URL given previously.
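The idea of estimating optimism by resampling can be sketched as follows. Note that this Python example implements the simple optimism-corrected bootstrap, not the 0.632+ variant detailed in Appendix I, and that the one-parameter ‘model’ (`fit_direction`) is a deliberately trivial stand-in for fitting a GLM or GAM. All names and data are hypothetical.

```python
import random


def roc_area(scores, labels):
    """Pairwise ROC area; ties between a presence and an absence count 0.5."""
    pres = [s for s, l in zip(scores, labels) if l == 1]
    absn = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > a else 0.5 if p == a else 0.0
               for p in pres for a in absn)
    return wins / (len(pres) * len(absn))


def fit_direction(x, y):
    """Toy model: +1 if presences have the higher mean of x, otherwise -1."""
    pres = [xi for xi, yi in zip(x, y) if yi == 1]
    absn = [xi for xi, yi in zip(x, y) if yi == 0]
    return 1.0 if sum(pres) / len(pres) >= sum(absn) / len(absn) else -1.0


def predict(direction, x):
    return [direction * xi for xi in x]


def optimism_corrected_roc(x, y, n_boot=200, seed=1):
    """Apparent ROC area minus the mean bootstrap estimate of optimism.

    Assumes y contains both presences (1) and absences (0)."""
    rng = random.Random(seed)
    n = len(y)
    apparent = roc_area(predict(fit_direction(x, y), x), y)
    optimism = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xb = [x[i] for i in idx]
        yb = [y[i] for i in idx]
        if sum(yb) in (0, n):
            continue  # resample lacks one class, so draw again
        model = fit_direction(xb, yb)                    # rebuild on resample
        boot_apparent = roc_area(predict(model, xb), yb)  # in-sample score
        boot_test = roc_area(predict(model, x), y)        # score on original
        optimism.append(boot_apparent - boot_test)
    return apparent - sum(optimism) / len(optimism)


# Hypothetical data: one environmental variable at 12 sites.
x = [3.1, 4.0, 2.2, 5.5, 6.1, 4.8, 7.0, 5.2, 6.6, 3.9, 7.4, 5.9]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1]
print(optimism_corrected_roc(x, y))
```

The key design point is that every modelling step, including any variable selection, is repeated inside the loop, so that the optimism estimate reflects the whole model-building procedure.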

Inspecting models for ecological realism

Final models were inspected for their ecological realism using partial plots of univariate fitted functions (Venables & Ripley 2003), similar to those used to determine the complexity of candidate response shapes. Partial plots are available as an option in statistical software and provide a ready means to evaluate the ecological realism of fitted responses. If an implausible fitted relationship between the probability of occupancy by the species and a particular environmental variable was observed in the partial plots, the variable was excluded from the model or the model was re-fitted with fewer degrees of freedom for that particular environmental variable.

Case study results and interpretation

The adequacy of species survey data for model fitting

Environmental strata were reasonably well sampled, even for the species with the least and most restricted survey data. For six of the seven species, sampled strata make up at least 70% of the region that is under forest cover, with the only exception being the quoll, for which only 56% of the region comprises sampled strata. Similarly, the geographical spread of sampled strata was reasonable for most species, with maps of sampled strata showing good coverage of the region for all species (Fig. 2). The maps of sampled and unsampled strata provide an indication of the areas in which habitat model predictions may be less reliable.

Models and predictive performance

The best models in terms of both predictive discrimination and calibration were those for the sooty owl, squirrel glider, yellow-bellied glider, koala and tiger quoll, all of which had predictive discrimination greater than 0.75 for their best model (Table 3). The worst models were the powerful owl and masked owl models, with bootstrapped ROC areas ranging from 0.61 to 0.69. In general, GAMs had similar variable structure and degrees of freedom to GLMs, and comparable predictive discrimination. However, bootstrapped predictive calibration was always better for GAMs than for GLMs (Table 3). This is probably because the smooth functions in GAMs are better at fitting complex responses over the complete environmental range of the response, leading to more accurate predicted probabilities.

Presence-only models had similar model structures to the presence–absence models. The result suggests that the survey absence data used in the modelling exercise were not much more useful than a background random sample in defining unsuitable habitat for most species. This may result from the difficulties associated with obtaining reliable absence data for cryptic species. In addition, the quality of the presence-only data used in this modelling exercise is likely to be high compared with presence-only data found in most museums and herbaria because they were derived from systematic stratified surveys. Other studies have shown a decline in performance when fitting models with background samples compared with using presence–absence data (Ferrier & Watson 1997).

Preferred models were primarily chosen on the basis of predictive discrimination and calibration. Plausibility of response shapes was used to discern between models that showed similar predictive performance. Only model structures for the preferred model for each species are presented in Table 3. There was some variation between the four modelling methods in terms of the variables retained and their response shapes for each species. For example, the yellow-bellied glider GLM contained the term ‘dry 2000’, while the GAM for the same species did not include this term. Similarly, the sooty owl GLM contained rainfall as a linear term, while the GAM for that species included rainfall as a smoothed spline with two degrees of freedom (a non-linear term). However, the structural differences between GLM and GAM models were minor, and the similarity in ROC areas of the two model types across all species (Table 3) implies that the differences had a minor impact on predictive discrimination.

Fig. 2. Coverage of sampled strata over the greater LHCC region for the best ((a) Koala) and worst ((b) Quoll) sampled species. Unsampled strata are represented by the light grey shading. Remaining areas within the region (not shaded) contain sampled strata or are non-forest and therefore not included in the analysis.

Table 3. Final presence–absence models and bootstrapped estimates of predictive discrimination (ROC) and calibration

Species  Preferred model type  ROC area (GAM)  ROC area (GLM)  Calibration (GAM)  Calibration (GLM)  Preferred model
Koala    GAM                   0.76            0.76            0.84               0.75               sp ~ s(temp, 2) + s(dry2000, 3) + s(rugg500, 2) + unmod2000 + percnonfor2k
Quoll    GLM                   0.71            0.75            0.73               0.59               sp ~ temp + rain + percnonfor2k + poly(dry2000, 2)
Sqgl     GAM                   0.78            0.77            0.76               0.68               sp ~ s(rugg500, 3) + sqglexp500 + s(unmod500, 2)
Ybgl     GAM                   0.78            0.77            0.73               0.63               sp ~ s(temp, 2) + s(rain, 2) + s(rf2000, 2) + unmod2000 + s(ybglexp2000, 3) + percnonfor2k
Mowl     GAM                   0.63            0.61            0.63               0.55               sp ~ s(cold, 2) + mowlexp2000 + rf2000 + s(unmod2000, 2) + wet2000 + percnonfor2k
Powl     GAM                   0.69            0.68            0.80               0.62               sp ~ s(cold, 3) + s(rain, 2) + s(rugg500, 2)
Sowl     GAM                   0.85            0.85            0.74               0.67               sp ~ s(rain, 2) + rugg500 + s(sowlexp2000, 3) + s(ter1000, 2) + unmod2000

Only the preferred model selected from the presence–absence models (either a generalized linear model or a generalized additive model) is presented for each species. Column ‘preferred model type’ indicates which of the methods produced the final preferred model for each species. ROC and calibration slope statistics are presented for the model derived under each modelling method. Variable abbreviations and details are given in Table 2. The expressions ‘poly(variable, n)’ and ‘s(variable, n)’ in the preferred model column indicate variables included as polynomial (‘poly’) or smoothed (‘s’) terms with ‘n’ degrees of freedom. ROC, receiver operating characteristic.

Plots of fitted functions

Plotting fitted relationships provided a useful means to visualize and manipulate the behaviour of fitted models to ensure that they were ecologically meaningful. In general, fitted relationships were found to be ecologically plausible. Plots of sooty owl response shapes (Fig. 3) show how the probability of sooty owl site occupancy varies with each of the predictor variables contained in the final GAM model for that species. Fitted functions for individual variables in a multivariate model may be different to univariate fitted functions because of interactions between predictor variables. Consequently, it is important to observe response shapes for the final model to ensure that they remain ecologically reasonable in the multivariable context.

Habitat maps

A habitat map is provided for the sooty owl as an example of the type of output that can be used in conservation planning (Fig. 4). The map shows the predicted probability of sooty owl presence throughout the region. Probabilities have been classified into four categories (0–0.2, 0.21–0.5, 0.51–0.7, 0.71–1.0) for presentation. These maps also display the presence records for the species that were used in model fitting, allowing a visual assessment of predictive performance. Mapped predictions and uncertainties (based on upper and lower 95% prediction intervals) for the remaining six species are available from the authors.

Fig. 3. Partial plots of the relationship between the probability of occupancy and environmental variables included in the final sooty owl model. The X-axis represents the range of values for each environmental variable. Probabilities on the Y-axis are plotted in transformed ‘logit’ space so that they can be interpreted in the same way as linear regressions. Response shapes in each plot represent the relationship between each variable and the probability of sooty owl occupancy in the multivariate model context, independent of the other variables included in the model. Dashed lines represent 95% confidence intervals around the fitted response shape. Rugg500, topographic ruggedness (standard deviation in elevation) in a 500-m radius; Sowlexp 2000, the proportion of suitable vegetation (as defined by experts) in a 2000-m radius; terr1000, relative terrain position in a 1000-m radius; unmod2000, the percentage of cells in a 2000-m radius containing unmodified forest.

[Fig. 3 panels, one per predictor: rain, rugg500, sowlexp2000, terr1000, unmod2000. Y-axis in every panel: Logit (probability of occupancy).]

DISCUSSION

The importance of ecological realism in modelling and the role of experts

The choice of modelling methods and approach to model building and evaluation presented in this paper reflects the importance we place on maintaining ecological realism throughout the model building process. While maintaining statistical rigour is central to good model building, there is an expansive literature on the statistical nuances of model fitting, interpretation and prediction (Guisan & Zimmerman 2000). Conversely, there is much less literature available concerning the importance of maintaining ecological realism in model building (Austin 1991, 2002). The process of plotting and evaluating fitted functions throughout model building and interpretation is fundamental to ensuring the ecological realism and general credibility of the final models.

Expert opinion forms an integral part of sound model building and evaluation, irrespective of the modelling platform and theoretical approach used (Steyerberg et al. 2001b). Some modellers view expert opinion as a source of undesirable subjectivity and prefer to let the data dictate the model. However, this ignores the limitations commonly encountered with ecological data (small to medium-sized species data sets that may include biases, and many candidate predictor variables) and the related challenges in developing robust and sensible models. Expert opinion can (and should) be used to assist in the identification of candidate model variables, interactions between variables and likely response shapes (Pearce et al. 2001b; Ferrier et al. 2002a), for corroboration of a model’s ecological realism (Austin 2002), for ad hoc evaluation of model predictions (National Parks & Wildlife Service 1998) and for preparation of predictive maps for use in decision making (Ferrier et al. 2002a). Experts may also be used effectively in the creation of indices that may themselves be used as candidate variables for statistical modelling (National Parks & Wildlife Service 1998). Consequently, the role of experts should be thought of as complementary to other more data-driven methods, rather than as a competing alternative. In our case study, we emphasize the use of experts in variable reduction prior to automated variable selection. We use Harrell’s (2001) rule of thumb to identify the maximum PDF that should be offered to the variable selection routine. We contend that if expert knowledge is so lacking that reducing the candidate set to m/10 is impossible, resulting models should be treated with scepticism.

The lack of an accepted model evaluation statistic for presence-only models detracts from their utility in conservation planning and increases the importance of expert evaluation. Development of robust evaluations of presence-only models is ongoing.

Interpretation of habitat maps

Threshold predictions

In some instances, users of habitat maps may wish to identify a level of predicted probability of occupancy below which an area would be considered ‘unsuitable’ habitat for a given species. In general, such an interpretation of habitat maps is ill-advised unless the model that underpins the map is particularly well calibrated. However, in such instances where it is required, thresholds could be assigned in a risk-weighted manner. For example, if it is very important not to classify an area as ‘non-habitat’ when it may be used by a species, a low probability threshold (e.g. probability of occurrence = 0.1) should be specified when delineating non-habitat. By choosing a threshold of 0.1 for a well-calibrated model, it is implied that a 10% chance of a site (a grid cell of 100 m) being occupied by the species is a tolerable risk of failing to include areas that are in fact habitat for the species. This decision must be made on a case-by-case basis by evaluating the management costs of the two types of error (a false-negative and false-positive prediction), rather than by relying on an arbitrary threshold set by the modeller.

Fig. 4. The predicted probability of sooty owl occupancy across the LHCC region. Probabilities are represented as a grey-scale (white = 0–0.2, light grey = 0.21–0.5, medium grey = 0.51–0.8, dark grey = 0.81–1.0). The open triangles show point survey locations where the sooty owl is recorded as present. Habitat maps were constructed by creating a vector of habitat model predictions (one for every 1 ha grid cell in the landscape) in R using the ‘predict.gam’ command. The prediction vector is then written to an ascii text file in the correct dimensions for the landscape and imported to ArcInfo (or ArcView) for display.
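One way to make the risk-weighted choice of threshold under ‘Threshold predictions’ explicit is a simple expected-cost rule, sketched below in Python. It assumes a well-calibrated model and relative (unitless) costs for the two error types, and the function names and cost values are hypothetical. Labelling a cell ‘non-habitat’ has expected cost p × (cost of a false negative), labelling it ‘habitat’ has expected cost (1 − p) × (cost of a false positive), and the threshold is the value of p at which the two are equal.

```python
def risk_weighted_threshold(cost_false_negative, cost_false_positive):
    """Probability below which calling a cell 'non-habitat' has the lower
    expected cost, given relative costs of the two error types."""
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def classify(probabilities, threshold):
    """Label each predicted probability using the chosen threshold."""
    return ["habitat" if p >= threshold else "non-habitat"
            for p in probabilities]


# If wrongly excluding real habitat is judged nine times as costly as
# wrongly including non-habitat, the threshold works out at 0.1.
threshold = risk_weighted_threshold(cost_false_negative=9, cost_false_positive=1)
print(threshold, classify([0.05, 0.3, 0.8], threshold))
```

The sketch only formalizes the trade-off; deciding what the two costs actually are remains a management judgement to be made case by case.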

Representing uncertainty

Many forms of uncertainty impact on maps of predicted probability of species occurrence (Elith et al. 2002). The prediction intervals available from regression models describe the part of this uncertainty associated with estimating model parameters, and this stems from inadequate data, errors in measurement and natural variation (Elith et al. 2002). There are additional uncertainties, such as model selection uncertainty and errors in explanatory variables, that are, however, not addressed by standard prediction intervals. While some methods have been developed for explicitly incorporating these forms of uncertainty in prediction intervals (Burnham & Anderson 2002; Wintle et al. 2003), there is currently very little guidance on how prediction intervals should be represented and interpreted in applications of habitat modelling and conservation planning.

Interpreting model evaluation statistics

The two statistics used for model evaluation in this study address different model applications. Specifically, model discrimination, measured by the ROC area, measures the degree to which the model successfully ranks presence sites higher than absence sites across the region in terms of predicted probability of presence. Models with satisfactory ROC areas will provide reliable ranking of areas in terms of habitat value. This mode of evaluation is relevant when the goal of a model is to rank or prioritize areas of interest in terms of their relative value as habitat for a species. This goal is different to that of accurately predicting the proportion of sites that are expected to be occupied at a given predicted probability of occurrence. For instance, at sites with a predicted probability of occurrence of 0.5, it is reasonable to expect that approximately 50% of such sites would contain the species. The degree to which this is true is described as model calibration (Miller et al. 1991) and is assessed using the logistic calibration equation. It is possible that a model with good discrimination can have poor calibration. It is also possible to improve a model’s predictive ability by adjusting the model parameters with calibration statistics (Harrell 2001) or other shrinkage methods (Hastie et al. 2001). A model with poor calibration may still be useful for prioritizing or ranking sites in terms of habitat value, but should not be used to predict the raw probability of finding a species at a given site. The choice of whether to focus on ROC areas or calibration statistics depends on the intended application of the models.

The issue of occupancy versus persistence

Models of the sort discussed in this paper make the implicit assumption that occupancy of a location implies suitability of the habitat. Van Horne (1983) identified problems associated with this assumption, and Pulliam’s (1988) ‘source-sink’ model of population dynamics formalizes its conceptual deficiencies. Pulliam’s model differentiates ‘sink’ areas, where mortality exceeds population growth, from ‘source’ areas that maintain overall population size through emigration. Similarly, if the total population is in a state of decline, it is possible that ‘remnant’ individuals may exist in unsuitable habitat that will not be occupied in the next generation. When a population is expanding its range, observations of absence of the species may falsely imply that the suitability of the habitat is low. These anomalies arise due to what is known as the ‘equilibrium’ assumption, which underpins the current static approach to wildlife habitat modelling (Austin 2002). Inclusion of disturbance history variables, dispersal barriers, competition and successional dynamics may assist in modelling such situations, though examples of such approaches are rare.

Survey design and sampling issues

This paper has focused on the choice and execution of particular modelling methods given a set of data. Issues of data quality and sampling have not been addressed. Because modelling results are only as good as the available data, the issues of data quality and sampling demand some discussion here.

Sample size

From a statistical perspective, the choice of sample size is a function of the desired precision of results. In studies where the goal is to make inference about a particular effect (e.g. the influence of time since logging on the probability of greater glider occupancy), power analysis may be used to determine the sample size required to achieve a statistically significant result for a given effect size, type I error rate and sampling variance (Burgman & Lindenmayer 1998). The relationship between statistical significance, power and model predictive performance is unclear, and there is no guarantee that a model with statistically significant terms will give good predictive performance. Unfortunately, there is no simple way to determine a priori the predictive performance of a habitat model, or to calculate a sample size that will ensure a predictive performance that is statistically significantly better than that of a null or alternative model. We could find only one attempt to address this problem in the statistical literature (Obuchowski 1994), making this an important area of future research. A simulation study that provided rules of thumb about sample sizes required to achieve a given predictive performance under a range of likely modelling scenarios would be an important contribution to the habitat modelling literature.

Data quality

A more subtle consideration is the geographical spread and environmental stratification of observations compared to the extent and environmental variability of the region for which models are to be built. We briefly touch on this issue in our case study when testing the adequacy of survey data using environmental strata. Little guidance is provided in the literature as to the minimum geographical or environmental coverage of data for statistical modelling, though it is common sense to expect that model predictions are unlikely to be reliable in environmental domains for which there are no survey data (Austin & Meyers 1996). Stratified sampling designs such as ‘GRADSECT’ (Austin & Heyligers 1989) that ensure a geographical and bioclimatic coverage of sampling locations are therefore appealing. GRADSECT theory, or newer developments such as Generalized Dissimilarity Modelling (GDM; Ferrier et al. 2002b) or a p-median criterion (Faith & Walker 1996), may also be applied to existing data sets to identify ‘gaps’ in the geographical and environmental coverage of samples.

Most statistical analyses, including the regression methods used in our case study, assume that observations are independent of each other. Survey locations that are close together are much less likely to be independent, especially when surveys target animals with large home ranges. Survey design should aim to ensure a minimum separation distance that is at least as great as the home range radius of the widest ranging species in the study. Random sampling within geographical and environmental strata assists in minimizing unwanted dependencies in observation data. Guidance on sampling theory should be sought from one of the many texts that deal with the topic in detail (e.g. Sutherland 1996; Thompson 2002).
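Where existing survey records are being screened rather than a new survey designed, a minimum separation distance can be enforced after the fact. The Python sketch below is one simple, order-dependent way to do this and is offered as an assumption-laden illustration: the function name is hypothetical, coordinates are assumed to be in a projected system with units of metres, and the 1000-m separation is an arbitrary example rather than a home range estimate for any of the case study species.

```python
import math


def thin_by_distance(points, min_separation):
    """Greedy thinning: keep a point only if it lies at least
    min_separation from every point already kept.

    The result depends on the order of the input, so the points could be
    shuffled first to avoid favouring early records."""
    kept = []
    for x, y in points:
        if all(math.hypot(x - kx, y - ky) >= min_separation
               for kx, ky in kept):
            kept.append((x, y))
    return kept


# Hypothetical survey locations (easting, northing) in metres.
sites = [(0, 0), (300, 0), (1000, 0), (1200, 0), (2500, 0)]
print(thin_by_distance(sites, min_separation=1000))
```

Thinning discards data, so it trades sample size for independence; designing the survey with adequate spacing in the first place avoids that trade-off.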

Technical issues and promising advances in modelling methods

Species versus community approaches

Models of the distribution of single species are not the only way to approach conservation planning questions. Single species models are important for a subset of species, such as threatened, focal or flagship species. However, landscape planning aims to conserve the biodiversity of a region, and there are too many species and too little data to achieve this through single-species modelling. Therefore, there is a role for community-level modelling (Ferrier et al. 2002b). One example of community-level modelling is GDM, already mentioned in the context of sampling; this models compositional dissimilarity across the landscape. This could then contribute to a conservation plan that aimed to span the range of species patterns in the landscape. A further role for methods based on data from many species is to use them for modelling rare species, or species in data sets with relatively few records per species. In such situations, there may be insufficient data for developing a robust single-species model. GDMs, canonical correspondence analysis (CCA) or multivariate adaptive regression splines (MARS) may be useful for this purpose.

Spatial autocorrelation

Spatial autocorrelation in wildlife observation data arises when environmental processes and patterns that influence the spatial distribution of wildlife are themselves spatially structured, and/or because the species is subject to demographic processes, territoriality or dispersal limitations causing spatial dependence (contagion or dispersion effects). Demographic and environmental processes underlying spatial patterns in wildlife distributions are usually poorly understood and therefore difficult to incorporate in model fitting. Consequently, model residuals are often spatially correlated (i.e. not independent), violating one of the basic assumptions of regression modelling. In practice, non-independence usually results in underestimation of standard errors and overestimation of the importance of habitat variables (Legendre & Fortin 1989). Methods such as autologistic regression (Augustin et al. 1996), generalized estimating equations (Albert & McShane 1995) and geographically weighted regression (Fotheringham et al. 2002) may be used to incorporate spatial autocorrelation in habitat analyses, but they are still not incorporated in standard statistical software and are technically demanding to implement.
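One simple way to check whether model residuals are spatially correlated is Moran's I, which is not mentioned in the text and is offered here only as an illustrative diagnostic. The sketch below assumes binary neighbour weights (sites within `max_dist` of each other are neighbours); the function name, the distance threshold and the residual values are hypothetical.

```python
import math


def morans_i(coords, values, max_dist):
    """Moran's I with binary weights: w_ij = 1 if sites i and j (i != j)
    lie within max_dist of each other, otherwise 0."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0   # sum of w_ij * dev_i * dev_j over neighbouring pairs
    w_sum = 0.0   # sum of all weights
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= max_dist:
                cross += dev[i] * dev[j]
                w_sum += 1.0
    ss = sum(d * d for d in dev)
    if w_sum == 0.0 or ss == 0.0:
        raise ValueError("no neighbouring pairs, or values are constant")
    return (n / w_sum) * (cross / ss)


# Hypothetical residuals at six sites along a transect, 1 unit apart.
coords = [(float(x), 0.0) for x in range(6)]
clustered = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]    # similar values adjoin
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # dissimilar values adjoin
i_clustered = morans_i(coords, clustered, max_dist=1.0)
i_alternating = morans_i(coords, alternating, max_dist=1.0)
```

Values well above zero, as for the clustered residuals, suggest positive spatial autocorrelation of the kind that leads to underestimated standard errors.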

Model selection and model uncertainty

The standard approach to representing prediction uncertainty involves the calculation of prediction (or confidence) intervals that incorporate uncertainty about parameter estimates. However, this approach to prediction implicitly assumes that the model chosen is the best available representation of the truth and effectively ignores model selection uncertainty, resulting in overconfident predictions. Automated variable selection algorithms exacerbate this problem by providing a method for dredging through many candidate predictors in search of an explanatory model (Chatfield 1995; Hoeting et al. 1999). One solution is to completely avoid automated model selection and to analyse only a small set of ‘a priori’ models (Burnham & Anderson 2002). This approach may be practical when strong prior knowledge exists and where the number of plausible models is small. However, if prediction is the primary goal and many possible models exist, some level of automated variable selection is often desirable. Model averaging approaches, including Bayesian model averaging, have been promoted in a range of disciplines as a means of incorporating model selection uncertainty into statistical inference and prediction (Hoeting et al. 1999) and have been applied in some ecological examples (Burnham & Anderson 2002; Wintle et al. 2003).
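The idea of model averaging can be sketched with Akaike weights, the information-theoretic variant described by Burnham & Anderson (2002). This is not Bayesian model averaging as such, and the AIC values and predictions below are invented for illustration.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC values for a set of candidate models into weights
    that sum to 1; lower AIC gives a larger weight."""
    best = min(aic_values)
    rel_likelihood = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel_likelihood)
    return [r / total for r in rel_likelihood]


def model_averaged_prediction(predictions, weights):
    """Weighted mean of the predictions made by each candidate model."""
    return sum(p * w for p, w in zip(predictions, weights))


# Hypothetical AIC values and predicted probabilities of presence at one
# site from three candidate habitat models.
aic = [100.0, 102.0, 110.0]
preds = [0.60, 0.40, 0.10]
weights = akaike_weights(aic)
averaged = model_averaged_prediction(preds, weights)
```

The averaged prediction leans towards the best-supported model but still reflects the alternatives, which is how model selection uncertainty enters the prediction.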

Estimating and incorporating detectability in habitat models

Unless the probability of detecting a species when it is present is equal to 1, false-negative observation errors will occur in species surveys. The probability of detecting the presence of the case study species in any single standard survey based on spot-lighting and call elicitation has been found to be very low (Pr[detection|presence] ∼ 0.12–0.45; Wintle et al. in press), making the unreliability of absence data a potentially serious source of uncertainty in our case study. Recent studies have demonstrated the negative impact that false-negative observation error may have on species-habitat analyses (Tyre et al. 2003), meta-population models (Moilanen 2002) and monitoring studies (MacKenzie et al. 2002). Recently developed techniques for incorporating detectability in model estimation (MacKenzie et al. 2002; Tyre et al. 2003) reduce bias in model estimation brought about by false absences, though little effort has been invested in testing the relative predictive performance of such models.
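The practical consequence of low single-visit detectability can be illustrated with a short calculation. The sketch below assumes that the species is present and that repeat visits detect it independently with the same probability; both are simplifying assumptions, and the function names are hypothetical.

```python
import math


def prob_detected_at_least_once(p_single, n_visits):
    """Probability of at least one detection in n independent visits
    to a site where the species is present."""
    return 1.0 - (1.0 - p_single) ** n_visits


def visits_needed(p_single, target=0.95):
    """Smallest number of visits for which the cumulative detection
    probability reaches the target."""
    if not 0.0 < p_single < 1.0:
        raise ValueError("p_single must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_single))


# The two ends of the range of single-visit detection probabilities
# reported for the case study species.
visits_low = visits_needed(0.12)    # least detectable
visits_high = visits_needed(0.45)   # most detectable
```

Under these assumptions, a single visit at a detection probability of 0.12 leaves an 88% chance of recording a false absence at an occupied site.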

Incorporating biotic interactions

Despite the prevalence of interspecific competition as a key concept in community ecology (Diamond 1975), very little has been attempted by way of incorporating interspecific competition in distribution models (for one exception see Leathwick & Austin 2001). Regression models such as those described by Leathwick and Austin (2001) could be applied to fauna species. Future development of ‘loop analysis’ (Levins 1975) or systems of simultaneous regressions (Guisan & Zimmerman 2000) may facilitate spatial models of wildlife distributions that incorporate competitive interactions.

The GLMs and GAMs presented here are relatively simple, though suited to the conservation problem and available data. There is scope for greater sophistication in the modelling method where a high level of modelling expertise exists, though we believe our recommended approach represents a reasonable trade-off between practicality and rigour.

Model improvement and adaptive management

Due to environmental change, natural and anthropogenic disturbance, and population stochasticity, all predictive models will become redundant if they are not updated through time. We recommend that models be applied in an adaptive framework (Walters & Holling 1990), allowing immediate application of models in management but ensuring a commitment to continual improvement of models through future data collections and refinements to modelling methods.

Future data collection will be central to rigorous model testing and model refinement. It should be targeted to fill gaps in current data sets (Fig. 3). Particular attention should be paid to the collection of high-quality observation data based on multiple site visits aimed at minimizing false absences (Wintle et al. 2004). Some important environmental variables, including those based on forest growth stage data, were not available for model fitting in the LHCC region. It is likely that models would have better predictive accuracy if forest growth stage information became available and was incorporated in models.

CONCLUSION

Habitat models are now a widely used tool in public land conservation planning. Systematic conservation planning in the urban fringe will require such tools to ensure transparency and repeatability of planning outcomes. However, access to statistical and ecological expertise for urban conservation planning is likely to be limited when compared with the public land planning processes that have taken place in Australia. It is therefore important that we continue to develop and refine planning tools, including habitat modelling methods, with an emphasis on statistical and ecological rigour and simplicity.

ACKNOWLEDGEMENTS

This work was instigated by the LHCCREMS organization (now Hunter Councils). Steven House (EcoLogical), Michael Murray (Forest Fauna Surveys P/L), Philip Gleeson (NPWS), Daniel Connolly (NPWS), Jill Smith (NPWS), Peter Bone (NPWS), Peter Ewin (NPWS), Mick Andren (NPWS) and Rod Kavanagh (SFNSW) provided data, expert guidance on model construction and valuable insights. Mark Burgman and Kuniko Yamada (University of Melbourne) helped throughout the project. Comments from Terry Walshe (University of Western Australia), David Tierney and Robbie Economos improved the manuscript substantially. Discussions with Trevor Hastie and John Leathwick helped clarify bootstrapping issues. Two anonymous reviewers provided constructive feedback. Sophie Powrie, Kirsty Winter and Meredith Liang (LHCCREMS) provided excellent support and useful feedback throughout the project.

REFERENCES

Agresti A. (1996) An Introduction to Categorical Data Analysis. John Wiley and Sons, New York.

Akaike H. (1973) Information theory and an extension of the maximum likelihood. In: Proceedings of the 2nd International Symposium on Information Theory (eds B. N. Petrov & F. Csáki) pp. 267–81. Akademia Kaido, Budapest.

Albert P. S. & McShane L. M. (1995) A generalized estimating equations approach for spatially correlated data: applications to the analysis of neuroimaging data. Biometrics 51, 627–38.

Augustin N. H., Mugglestone M. A. & Buckland S. T. (1996) An autologistic model for the spatial distribution of wildlife. J. Appl. Ecol. 33, 339–47.

Austin M. P. (1991) Vegetation data collection and analysis. In: Nature Conservation: Cost Effective Biological Surveys and Data Analysis (eds C. R. Margules & M. P. Austin) pp. 37–41. CSIRO, Canberra.

Austin M. P. (2002) Spatial prediction of species distribution: an interface between ecological theory and statistical modelling. Ecol. Model. 157, 101–18.

Austin M. P. & Heyligers P. C. (1989) Vegetation survey design for conservation: GRADSECT sampling of forests in north-eastern NSW. Biol. Conserv. 50, 13–32.

Austin M. P. & Meyers J. A. (1996) Current approaches to modelling the environmental niche of Eucalypts: implications for management of forest biodiversity. For. Ecol. Manage. 85, 95–106.

Bambar D. (1975) The area above the ordinal dominance graph and the area below the receiver operating characteristics graph. J. Math. Psych. 12, 387–415.

Belsley D. A., Kuh E. & Welsch R. E. (1980) Regression Diagnostics. John Wiley and Sons, New York.

Bio A. M. F., Alkemande R. & Barendregt A. (1998) Determining alternative models for vegetation response analysis – a non-parametric approach. J. Veg. Sci. 9, 5–16.

Breiman L., Friedman J. H., Olshen R. A. & Stone C. J. (1984) Classification and Regression Trees. Wadsworth International Group, Belmont.

Breininger D. R., Larson V. L., Duncan B. W. & Smith R. B. (1998) Linking habitat suitability to demographic success in Florida scrub-jays. Wild. Bull. 26, 118–28.

Burgman M. A. & Lindenmayer D. B. (1998) Conservation Biology for the Australian Environment. Surrey Beatty and Sons, Chipping Norton.

Burnham K. P. & Anderson D. R. (2002) Model Selection and Multimodel Inference: a Practical Information-Theoretic Approach. Springer, New York.

Burgman M. A., Breininger D. R., Duncan B. W. & Ferson S. (2001) Setting reliability bounds on Habitat Suitability Indices. Ecol. Appl. 11, 70–8.

Carpenter G., Gillison A. N. & Winter J. (1993) DOMAIN: a flexible modelling procedure for mapping potential distributions of plants and animals. Biodivers. Conserv. 2, 667–80.

Chatfield C. (1995) Model uncertainty, data mining and statistical inference. J. Roy. Stat. Soc. Ser. A Stat. 158, 419–66.

Commonwealth of Australia (2003) Terms of Reference: Standing Committee on Environment and Heritage Inquiry into Sustainable Cities. Parliament of Australia, House of Representatives, Canberra.

Derksen S. & Keselman H. J. (1992) Backward, forward and stepwise automated subset selection algorithms: frequency of obtaining authentic and noise variables. Br. J. Math. Stat. Psych. 45, 265–82.

Diamond J. M. (1975) Assembly of species communities. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 342–444. Harvard University Press, Cambridge.

Ecotone Ecological Consultants (2001) Lower Hunter Central Coast Regional Biodiversity Strategy: Fauna survey and mapping project – Module 1: Fauna Surveys. Ecotone Ecological Consultants for LHCCREMS, Waratah, pp. 43.

Edgar R. & Belcher C. (1995) Spotted-tailed Quoll. In: The Mammals of Australia (ed. R. Strahan) pp. 67–8. Reed Books, Chatswood.

Efron B. & Tibshirani R. (1997) Improvements on cross-validation: the .632+ bootstrap method. J. Am. Stat. Ass. 92, 548–60.

Elith J. (2000) Quantitative methods for modelling species habitat: comparative performance and an application to Australian plants. In: Quantitative Methods for Conservation Biology (eds S. Ferson & M. A. Burgman) pp. 39–58. Springer-Verlag, New York.

Elith J. & Burgman M. A. (2002) Predictions and their validation: rare plants in the Central Highlands, Victoria, Australia. In: Predicting Species Occurrences: Issues of Accuracy and Scale (eds J. M. Scott, P. J. Heglund, M. L. Morrison, M. G. Raphael, W. A. Wall & F. B. Samson) pp. 303–14. Island Press, Covelo.

Elith J. & Burgman M. A. (2003) Habitat models for PVA. In: Population Viability in Plants (eds C. A. Brigham & M. W. Schwartz) pp. 203–35. Springer-Verlag, New York.

Elith J., Burgman M. A. & Regan H. M. (2002) Mapping epistemic uncertainties and vague concepts in predictions of species distribution. Ecol. Model. 157, 313–29.


Ennis M., Hinton G., Naylor D., Revow M. & Tibshirani R. (1998) A comparison of statistical learning methods on the GUSTO database. Stat. Med. 17, 2501–8.

ESRI (1997) Arcinfo 7.2. Environmental Systems Research Institute Inc., Redlands.

Faith D. P. & Walker P. A. (1996) Environmental diversity: on the best-possible use of surrogate data for assessing the relative biodiversity of sets of areas. Biodivers. Conserv. 5, 399–415.

Ferrier S. & Watson G. (1997) An Evaluation of the Effectiveness of Environmental Surrogates and Modelling Techniques in Predicting the Distribution of Biological Diversity. Department of Environment, Sports and Territories, Commonwealth of Australia and NSW National Parks and Wildlife Service, Canberra.

Ferrier S., Watson G., Pearce J. & Drielsma M. (2002a) Extended statistical approaches to modelling spatial pattern in biodiversity in north-east New South Wales. I. Species-level modelling. Biodivers. Conserv. 11, 2275–307.

Ferrier S., Drielsma M., Manion G. & Watson G. (2002b) Extended statistical approaches to modelling spatial pattern in biodiversity: the north-east New South Wales experience. II. Community-level modelling. Biodivers. Conserv. 11, 2309–38.

Fielding A. H. & Bell J. F. (1997) A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ. Conserv. 24, 38–49.

Fotheringham A. S., Brunsdon C. & Charlton M. (2002) Geographically Weighted Regression: the Analysis of Spatially Varying Relationships. John Wiley and Sons, London.

Franklin J. (1995) Predictive vegetation mapping: geographic modeling of biospatial patterns in relation to environmental gradients. Prog. Phys. Geog. 19, 494–519.

Friedman J. H. (1991) Multivariate adaptive regression splines (with discussion). Ann. Stat. 19, 1–141.

Goldingay R. L. & Kavanagh R. P. (1993) Home-range estimates and habitat of the yellow-bellied glider (Petaurus australis) at Waratah Creek, New-South-Wales. Wildl. Res. 20, 387–404.

Graham C. H., Ferrier S., Huettman F., Moritz C. & Peterson A. T. (2004) New developments in museum-based informatics and applications in biodiversity analysis. Trends Ecol. Evol. 19, 497–503.

Guisan A. & Harrell F. E. (2000) Ordinal response regression models in ecology. J. Veg. Sci. 11, 617–26.

Guisan A. & Zimmerman N. E. (2000) Predictive habitat distribution models in ecology. Ecol. Model. 135, 147–86.

Hanley J. A. & McNeil B. J. (1982) The meaning and use of the area under a Receiver Operating Characteristic (ROC) curve. Radiology 143, 29–36.

Harrell F. E. (2001) Regression Modelling Strategies with Application to Linear Models, Logistic Regression and Survival Analysis. Springer, New York.

Harrell F. E., Lee K. L. & Mark D. B. (1996) Multivariate prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat. Med. 15, 361–87.

Hastie T. & Tibshirani R. (1990) Generalized additive models. Monographs on statistics and applied probability (eds D. R. Cox, D. V. Hinkley, D. Rubin & B. W. Silverman). Chapman & Hall, London.

Hastie T., Tibshirani R. & Friedman J. H. (2001) The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer-Verlag, New York.

Hirzel A. H. (2001) When GIS come to life. Linking Landscape- and Population Ecology for Large Population Management Modelling: the Case of Ibex (Capra Ibex) in Switzerland. The University of Lausanne, Lausanne.

Hoeting J. A., Madigan D., Raftery A. E. & Volinsky C. T. (1999) Bayesian model averaging: a tutorial. Stat. Sci. 14, 382–401.

Houlder D. J., Hutchinson M. F., Nix H. A. & McMahon J. P. (1999) ANUCLIM User Guide, Version 5.0. Centre for Resource and Environmental Studies, Australian National University, Canberra.

Kadmon R., Farber O. & Danin A. (2003) A systematic analysis of factors affecting the performance of climatic envelope models. Ecol. Appl. 13, 853–67.

Kavanagh R. P. (2002) Conservation and management of large forest owls in southeastern Australia. In: The Ecology and Conservation of Owls (eds I. Newton, R. Kavanagh, J. Olson & I. Taylor) pp. 201–19. CSIRO, Melbourne.

Lambeck R. J. (1997) Focal species: a multi-species umbrella for nature conservation. Conserv. Biol. 11, 849–56.

Leathwick J. R. & Austin M. P. (2001) Competitive interactions between tree species in New Zealand’s old-growth indigenous forests. Ecology 82, 2560–73.

Legendre P. & Fortin M. J. (1989) Spatial pattern and ecological analysis. Vegetatio 80, 107–38.

Legendre L. & Legendre P. (1998) Numerical Ecology. Elsevier, New York.

Levins R. (1975) Evolution of communities near equilibrium. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 16–50. Harvard University Press, Cambridge.

LHCCREMS (2004) Lower Hunter Central Coast Regional Biodiversity Strategy 2004 (4 Volume). Hunter Councils on behalf of the Lower Hunter Central Coast Regional Environmental Strategy, Thornton.

Li W., Wang Z., Ma Z. & Tang H. (1999) Designing the core zone in a biosphere reserve based on suitable habitats: Yancheng Biosphere Reserve and the red crowned crane (Grus japonensis). Biol. Conserv. 90, 167–73.

Loyn R. H., McNabb E. G., Volodina L. & Willig R. (2001) Modelling landscape distributions of large forest owls as applied to managing forests in north-east Victoria, Australia. Biol. Conserv. 97, 361–76.

McCullagh P. & Nelder J. A. (1989) Generalized Linear Models. Chapman & Hall, London.

MacKenzie D. I., Nichols J. D., Lachman G. B., Droege S., Royle J. A. & Langtimm C. A. (2002) Estimating site occupancy rates when detection probabilities are less than one. Ecology 83, 2248–55.

Manel S., Dias J. M., Buckton S. T. & Ormerod S. J. (1999a) Alternative methods for predicting species distribution: an illustration with Himalayan river birds. J. Appl. Ecol. 36, 734–47.

Manel S., Dias J. M. & Ormerod S. J. (1999b) Comparing discriminant analysis, neural networks and logistic regression for predicting species distributions: a case study with a Himalayan river bird. Ecol. Model. 120, 337–47.

Miller M. E., Hui S. L. & Tierney W. M. (1991) Validation techniques for logistic regression models. Stat. Med. 10, 1213–26.

Moilanen A. (2002) Implications of empirical data quality to metapopulation model parameter estimation and application. Oikos 96, 516–30.

Moisen G. G. & Frescino T. S. (2002) Comparing five modeling techniques for predicting forest characteristics. Ecol. Model. 157, 209–25.


National Parks & Wildlife Service (1998) Eden Fauna Modelling. New South Wales National Parks & Wildlife Service, New South Wales Regional Forest Agreement Steering Committee, Canberra.

National Parks and Wildlife Service (2000) Modelling Areas of Habitat Significance for Fauna and Flora in the Southern CRA. NPWS NSW, Canberra.

Nix H. (1986) A biogeographic analysis of Australian elapid snakes. In: Atlas of Elapid Snakes of Australia (ed. R. Longmore) pp. 4–15. Australian Government Publishing Service, Canberra.

Obuchowski N. A. (1994) Computing sample size for receiver operating characteristic studies. Invest. Radiol. 29, 238–43.

Pearce J. & Ferrier S. (2000) Evaluating the predictive performance of habitat models developed using logistic regression. Ecol. Model. 133, 225–45.

Pearce J. & Ferrier S. (2001) The practical value of modelling relative abundance of species for regional conservation planning: a case study. Biol. Conserv. 98, 33–43.

Pearce J., Ferrier S. & Scotts D. (2001a) An evaluation of the predictive performance of distributional models for flora and fauna in north-east New South Wales. J. Environ. Manage. 62, 171–84.

Pearce J. L., Cherry K., Drielsma M., Ferrier S. & Whish G. (2001b) Incorporating expert knowledge and fine-scale vegetation mapping into statistical modelling of faunal distribution. J. Appl. Ecol. 38, 412–24.

Phillips S. J., Anderson R. P. & Schapire R. E. (in press) Maximum entropy modeling of species geographic distributions. Ecol. Model.

Pressey R. L., Possingham H. P., Logan V. S., Day J. R. & Williams P. H. (1999) Effects of data characteristics on the results of reserve selection algorithms. J. Biogeogr. 26, 179–91.

Pulliam H. R. (1988) Sources, sinks and population regulation. Am. Nat. 132, 652–61.

R Development Core Team (2004) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL: http://www.R-project.org

Reading R. P., Clark T. A., Seebeck J. H. & Pearce J. (1996) Habitat Suitability Index model for the eastern barred bandicoot, Perameles gunnii. Wildl. Res. 23, 221–35.

Reed P. C. & Lunney D. (1990) Habitat loss: the key problem for the long-term survival of koalas in New South Wales. In: Koala Summit: Managing Koalas in New South Wales (eds D. Lunney, C. A. Urquhart & P. C. Reed) pp. 9–31. University of Sydney, Sydney.

Ripley B. D. (1995) Pattern Recognition and Neural Networks – a Statistical Approach. Cambridge University Press, Cambridge.

Russel R. (1995) Yellow-bellied glider. In: The Mammals of Australia (ed. R. Strahan) pp. 226–8. Reed Books, Chatswood.

Steyerberg E. W., Eijkemans M. J. C., Harrell F. E. & Habbema J. D. F. (2001a) Prognostic modeling with logistic regression analysis: in search of a sensible strategy in small data sets. Med. Decis. Making 21, 45–56.

Steyerberg E. W., Harrell F. E., Borsboom G. J. J. M., Eijkemans M. J. C., Vergouwe Y. & Habbema J. D. F. (2001b) Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J. Clin. Epidemiol. 54, 774–81.

Stockwell D. & Peters D. (1999) The GARP modelling system: problems and solutions to automated spatial prediction. Int. J. Geogr. Inf. Sci. 13, 143–58.

Sutherland W. J. (1996) Ecological Census Techniques: a Handbook, 1st edn. Cambridge University Press, Cambridge.

ter Braak C. J. F. (1986) Canonical Correspondence Analysis: a new eigenvector technique for multivariate direct gradient analysis. Ecology 67, 1167–79.

Thompson S. K. (2002) Sampling, 2nd edn. Wiley, New York.

Tyre A. J., Tenhumberg B., Field S. A., Possingham H. P., Niejalke D. & Parris K. (2003) Improving precision and reducing bias in biological surveys by estimating false negative error rates in presence-absence data. Ecol. Appl. 13, 1790–801.

Van Horne B. (1983) Density as a misleading indicator of habitat quality. J. Wildl. Man. 47, 893–901.

Van Horne B. & Wiens J. A. (1991) Forest Bird Habitat Suitability Models and the Development of General Habitat Models. United States Department of the Interior, Fish and Wildlife Service, Washington, DC.

Venables W. N. & Ripley B. D. (2003) Modern Applied Statistics with S, 4th edn. Springer, New York.

Walters C. & Holling C. S. (1990) Large-scale management experiments and learning by doing. Ecology 71, 2060–8.

Wintle B. A., McCarthy M. A., Volinsky C. T. & Kavanagh R. P. (2003) The use of Bayesian Model Averaging to better represent the uncertainty in ecological models. Conserv. Biol. 17, 1579–90.

Wintle B. A., Elith R. J., Yamada K. & Burgman M. A. (2004) LHCCREMS fauna survey and mapping project. Module 2: Habitat modelling and conservation requirements. Lower Hunter & Central Coast Regional Environmental Management Strategy, Callaghan.

Wintle B. A., McCarthy M. A., Parris K. M. & Burgman M. A. (2004) Precision and bias of methods for estimating point survey detection probabilities. Ecol. Appl. 14, 703–12.

Wilson K. A., Westphal M. I., Possingham H. P. & Elith J. (2005) Sensitivity of conservation planning to different approaches to using species distribution data. Biol. Conserv. 122, 99–112.

Wintle B. A., Burgman M. A., McCarthy M. A. & Kavanagh R. P. (in press) The magnitude and management consequences of false negative observation error in surveys of arboreal marsupials and large forest owls. J. Wildl. Man.

Worton B. J. (1989) Kernel methods for estimating the utilization distribution in home-range studies. Ecology 70, 164–8.

Yencken D. & Wilkinson D. (2000) Resetting the Compass: Australia’s Journey Towards Sustainability. CSIRO Publishing, Melbourne.

Zaniewski A. E., Lehmann A. & Overton J. M. (2002) Predicting species distribution using presence-only data: a case study of native New Zealand ferns. Ecol. Model. 157, 261–80.

APPENDIX 1

Method for 0.632+ bootstrap evaluation, after Harrell et al. (1996) and Efron and Tibshirani (1997):

1. Develop the model on all n observations.
2. Calculate the statistic(s) of choice for evaluation on the same data (i.e. the training data); call this Stat_app because it is the apparent value of the statistic.
3. Take a bootstrap sample, i.e. a sample of size n, with replacement, of rows of the data matrix. Keep track of which sites are in the bootstrap sample and which are excluded.
4. Fit the model on the bootstrap sample (using the same methods as used on the full set).
5. Compute the statistic on the bootstrap data set (observations vs. fitted values) and call it Stat_boot.
6. Also compute the statistic on a version of the bootstrap data where the observations are randomized (Stat_permute).
7. Use the bootstrap model to predict to the excluded data set; calculate the statistic on these predictions, Stat_excl.
8. Use Stat_boot, Stat_permute and Stat_excl to calculate the amount of overfitting, the relative overfitting rate and weights that are then used to make a best estimate of predictive performance, Stat_best_est. This statistic puts most emphasis on predictions to the excluded data, particularly when the model is overfitted (i.e. when Stat_boot – Stat_excl is large). For details of this step see Steyerberg et al. (2001b).
9. Measure how optimistic the fit on the bootstrap sample was: O = Stat_boot – Stat_best_est.
10. Repeat steps 3–7 100–200 times.
11. Calculate an average optimism, Ō.
12. Use Ō to correct Stat_app for its optimism: Stat_app – Ō. This is a near unbiased estimate of the expected value of the external predictive performance of the process that generated Stat_app.
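Steps 3 and 8 of the procedure can be sketched as follows. The weighting shown is the general form of the .632+ rule of Efron and Tibshirani (1997), adapted here to a statistic for which higher values are better (such as the ROC area); the paper defers the exact details to Steyerberg et al. (2001b), so this should be read as an assumption about step 8 rather than the authors' implementation. Function and argument names are hypothetical, and `stat_noinfo` plays the role of Stat_permute.

```python
import random


def bootstrap_split(n, rng):
    """Step 3: draw n row indices with replacement and record which rows
    were never drawn (the excluded set)."""
    in_sample = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_sample)
    excluded = [i for i in range(n) if i not in drawn]
    return in_sample, excluded


def best_estimate_632_plus(stat_fit, stat_excl, stat_noinfo=0.5):
    """Step 8 (assumed form): combine the statistic on the fitted data
    (stat_fit), on the excluded data (stat_excl) and under no information
    (stat_noinfo) into a single estimate of predictive performance."""
    stat_excl = max(stat_excl, stat_noinfo)  # no worse than no information
    if stat_fit > stat_noinfo and stat_excl < stat_fit:
        # Relative overfitting rate: 0 = no overfitting, 1 = complete.
        rate = (stat_fit - stat_excl) / (stat_fit - stat_noinfo)
    else:
        rate = 0.0
    weight = 0.632 / (1.0 - 0.368 * rate)  # rises from 0.632 towards 1
    return (1.0 - weight) * stat_fit + weight * stat_excl


rng = random.Random(1)
in_sample, excluded = bootstrap_split(20, rng)
estimate = best_estimate_632_plus(stat_fit=0.90, stat_excl=0.80)
optimism = 0.90 - estimate  # step 9 for this single replicate
```

As the gap between the fitted and excluded statistics grows, the weight on the excluded data approaches 1, which matches the emphasis described in step 8.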


Presence-only data

Presence-only data are the most common form of observation data and are usually available from museums and herbaria (Graham et al. 2004). Presence-only data suffer from the problems that observations are unplanned and tend to be biased toward towns and roads; they are often of dubious reliability and unspecified spatial accuracy; and the variation in survey effort between different environments and geographical areas cannot be controlled or adjusted in model fitting (Ferrier et al. 2002a; Kadmon et al. 2003). Nonetheless, presence-only modelling methods are widely applied due to the prevalence of presence-only data. Presence-only data may be modelled using a variety of modelling packages that are based on different ecological assumptions. Presence-only methods fall into three main categories: (i) those that use the species data without reference to any environmental data; (ii) those that model a species-environment relationship in reference to the species presence data; and (iii) those that model a species-environment relationship by characterizing the ‘background’ environment across the region of interest and modelling the species presence in comparison to this background. The first category includes hulls and kernels (Worton 1989), which can be thought of as geographical envelopes. They are primarily useful for estimation of ranges, but not for more detailed maps of species distribution, because the envelopes will generally encompass many sites that are unsuitable habitat for the species. A limitation of all envelope methods is that they are particularly sensitive to missing data and spatial error. The second category includes BIOCLIM (Nix 1986) and DOMAIN (Carpenter et al. 1993). BIOCLIM is a climate envelope method that maps habitat that is climatically suitable for the species, based on the distribution of the known presence records across a suite of climate variables derived from long-term records of temperature, rainfall and radiation. It is useful over large extents for broadly defining climatically suitable regions, but because of its orthogonal geometries the envelope approach tends to include many sites that are in fact unsuitable for the species (Elith & Burgman 2003). DOMAIN takes an opposite approach by determining the similarity of each cell in a map to a known presence site. That is, it measures the environmental similarity between a target site and the most similar known record site using the Gower metric (Legendre & Legendre 1998).
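The DOMAIN idea of scoring a site by its similarity to the nearest presence record can be sketched in a few lines. This is a simplified illustration, assuming continuous variables that are each scaled by a supplied range; the function names, the environmental values and the ranges are hypothetical and are not taken from the DOMAIN software.

```python
def gower_similarity(site, record, ranges):
    """Gower similarity between two sites: 1 minus the mean, over
    variables, of the absolute difference divided by the variable's range."""
    k = len(site)
    distance = sum(abs(a - b) / r for a, b, r in zip(site, record, ranges)) / k
    return 1.0 - distance


def domain_score(site, presence_records, ranges):
    """Similarity of a target site to its most similar presence record."""
    return max(gower_similarity(site, rec, ranges) for rec in presence_records)


# Hypothetical values: (mean annual temperature in deg C, annual rainfall in mm).
presence_records = [(16.0, 1100.0), (17.0, 1300.0)]
ranges = (10.0, 1000.0)   # assumed range of each variable across the region
target = (15.0, 1000.0)
score = domain_score(target, presence_records, ranges)
```

Mapping this score for every cell gives a continuous similarity surface rather than the in-or-out classification produced by a rectilinear climate envelope.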

The third category includes most other presence-only methods, including ENFA (Hirzel 2001), the genetic algorithm GARP (Stockwell & Peters 1999) and presence–absence methods adapted to presence-only data. These have various strengths and weaknesses, and some are more thoroughly tested than others (Elith & Burgman 2003). Of these methods, regression models (generalized linear models, GLMs, and generalized additive models, GAMs) are ecologically realistic and have shown reasonable performance when used as logistic models that have been adapted to allow modelling of presence-only survey data (Ferrier & Watson 1997; Zaniewski et al. 2002). We use them in the case study and present justifications for their use in the following sections.

A substantial drawback to the use of presence-only data is the lack of available and broadly accepted methods for evaluating the predictive performance of fitted models. Some methods are currently under development, but at the time of publication they remain largely untested and not broadly accepted (Phillips et al. in press). For this and other reasons mentioned earlier, we would always consider presence-only data, and therefore presence-only models, to be inferior to presence–absence data and models. However, due to the prevalence of presence-only data, we have provided a demonstration of their use in habitat modelling in our case study.

Presencendashabsence data

A number of public planning exercises in Australia have utilized presence–absence data for deriving habitat models and maps (National Parks & Wildlife Service 1998, 2000; Ferrier et al. 2002a). Presence–absence data may suffer from the problem of uncertain zeros (MacKenzie et al. 2002; Tyre et al. 2003), but have the advantage that they are usually collected in a ‘systematic’ manner involving some level of geographical and environmental stratification in the sampling design (Austin & Heyligers 1989). Consequently, such data are more likely to contain samples that span the environmental gradients of interest, making model fitting more reliable.

There is a broad range of modelling methods that can utilize presence–absence data, and assessment of their relative performance has been the subject of considerable research effort (Ferrier & Watson 1997; Manel et al. 1999a,b; Elith 2000; Moisen & Frescino 2002). Multivariate association methods such as canonical correspondence analysis (ter Braak 1986), machine learning methods such as genetic algorithms (Stockwell & Peters 1999) and neural networks (Moisen & Frescino 2002), and tree-based methods such as classification and regression trees (Breiman et al. 1984) have all been proposed as potentially useful methods for modelling habitat preferences with presence–absence data.

Comparative studies have found the performance of logistic regression to be typically at least as good as other methods, if not better (Ferrier & Watson 1997; Elith 2000). Of the competing methods mentioned above, only logistic regression naturally assumes data are derived from a binomial process, which is the correct distribution when data are binary and observations independent. Regression methods, primarily GLMs (McCullagh & Nelder 1989) and GAMs (Hastie & Tibshirani 1990), have been commonly applied for modelling and predicting habitat occupancy for planning purposes (e.g. National Parks & Wildlife Service 1998, 2000; Li et al. 1999; Loyn et al. 2001). One of the advantages of GLMs and GAMs is the availability of free statistical software (R Development Core Team 2004) and detailed documentation and guidance for fitting and interpreting models (Harrell 2001; Hastie et al. 2001; R Development Core Team 2004). For these and other reasons described below, we focus on the use of GLMs and GAMs in describing species-habitat relationships and predicting the spatial distribution of suitable habitats.

GLMs and GAMs in habitat modelling

All GLMs are composed of a random component, described by the assumed distribution of the observation data (either binomial or Poisson for many wildlife observation data); a systematic component specifying a linear combination of explanatory (or independent) variables; and a ‘link’ between the random and systematic components of the model that specifies how the mean response (i.e. observation) relates to the explanatory variables in the linear predictor (Agresti 1996). When observation data are binary (presence–absence), the expected value may be modelled as Pr(Y = 1), using the ‘logit’ transformation to link the random and systematic components. In this case the regression model becomes (Agresti 1996):

$$\operatorname{logit}(p_i) = \log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} \tag{1}$$

where $p_i$ is the probability that the species will be present at site $i$, $\beta_0$ is the intercept coefficient, the $x_k$ are the habitat variables and the $\beta_k$ are habitat variable coefficients. Equation 1 defines the special case of the GLM known as the logistic regression model.

Generalized additive models are a non-parametric generalization of GLMs in which the relationships between the dependent and independent variables are defined by non-parametric smoothing functions (Hastie et al. 2001). In practice, this means that the linear predictor ($\beta_0 + \sum_{j=1}^{k} \beta_j x_j$) that defines the relationship between the dependent and explanatory variables in GLMs is replaced by smoothing functions ($\beta_0 + \sum_{j=1}^{k} f_j(x_j)$). The $f_j$ are estimated in a ‘flexible’ manner and there are a range of alternative smoothers available (Hastie et al. 2001). This flexibility confers an advantage to GAMs over GLMs in that they are able to ‘fit’ data more closely for a given number of degrees of freedom, because they are not constrained to fit predefined parametric shapes (Bio et al. 1998). However, for the same reason, GAMs cannot be as easily interpreted as GLMs. Indeed, GAMs do not actually have a retrievable model formula in the classic sense, and interpretation generally requires a plot of the fitted response curves. Statistical packages such as R (R Development Core Team 2004) provide this facility for both GLMs and GAMs. GAMs can be fitted with the same ‘link’ functions as GLMs, so are capable of fitting logistic regression models.

It is easy to build models that are ecologically unrealistic. Predictions from unrealistic models will have large errors and are likely to be less robust and generalizable than those from realistic models (Austin 2002). Model interpretability should therefore influence the choice of modelling method, because a model can only be checked for its realism if it is interpretable. Two key features of ecological realism are the choice of explanatory variables included in the model and the shape of the response fitted for those variables (Austin 2002). Methods such as neural networks and genetic algorithms are difficult to interpret on both counts. Ennis et al. (1998) found that logistic regression provided as good or better performance than more complicated methods, including multivariate adaptive regression splines (Friedman 1991) and back-propagated neural networks (Ripley 1995). Similar results have been found in other comparisons (Elith & Burgman 2002; Moisen & Frescino 2002). Even though GAMs tend to perform better than GLMs in such comparisons, the simplicity of GLMs, their broad availability in statistical packages, the ease with which they can be applied within a geographical information system (GIS) framework and the ready availability of prediction intervals mean that they are still useful and frequently implemented. The construction of a mathematical formula that describes the relationship between the dependent variable and the environment provides a compact and communicative representation of a model. A logistic regression GLM formula that is published in a report or thesis can be used to predict species occurrence and compute prediction intervals without any direct reference to the data on which the model was fitted (providing that due care is taken not to extrapolate beyond the environmental scope of the original data). This degree of generality is not a feature of other modelling methods. A further strength of GLMs is the ease with which uncertainty about coefficients and predictions can be conveyed as standard errors and prediction intervals. Prediction intervals are not yet easily obtained from GAMs.
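As a minimal illustration of how a published logistic regression formula might be applied, the sketch below inverts the logit link to obtain a predicted probability of occurrence at a single site. The function name, the coefficient values and the site values are hypothetical and are not taken from the case study; prediction intervals are not computed.

```python
import math

def predict_occupancy(intercept, coefficients, site):
    """Return Pr(presence) at one site from logistic regression coefficients."""
    # Linear predictor: beta_0 + sum over k of beta_k * x_k
    eta = intercept + sum(beta * site[name] for name, beta in coefficients.items())
    # The inverse logit maps the linear predictor to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients and site values, for illustration only
intercept = -2.0
coefficients = {"rain": 0.002, "percnonfor2k": -0.03}
site = {"rain": 1000.0, "percnonfor2k": 20.0}

p = predict_occupancy(intercept, coefficients, site)
print(round(p, 3))
```

Applying the same function to every grid cell of the mapped predictors would yield a habitat map, provided that the cell values stay within the environmental range of the data used to fit the model.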


A CASE STUDY: HABITAT MODELLING FOR CONSERVATION PLANNING IN CENTRAL NSW

The Lower Hunter and Central Coast (LHCC) region of New South Wales (NSW) presents an opportunity to study urbanization pressures in a relatively pristine landscape with substantial biodiversity values. The region includes the seven local councils of Cessnock, Gosford, Lake Macquarie, Maitland, Newcastle, Port Stephens and Wyong (Fig. 1). It comprises large areas of native forest in both public and private tenures (National Parks & Wildlife Service 2000). The region is approximately 7200 square kilometres, of which approximately 65% is under forest cover. The LHCC Regional Environmental Management Strategy (LHCCREMS) seeks to integrate biodiversity information into future land use planning and development in the region (LHCCREMS 2004). Biodiversity projects under the LHCCREMS include the development of vegetation mapping for the region; a gap analysis of biological survey data for the region and subsequent fauna surveys to augment existing biodiversity data (Ecotone Ecological Consultants 2001); the development of wildlife habitat models and habitat maps for priority species in the region (Wintle et al. 2004); and the development of preliminary conservation recommendations for each species (LHCCREMS 2004). This paper reports on the methods used to develop and evaluate habitat models and maps for each of the priority species in the region.

Priority species and their habitat requirements

Seven fauna species were selected for modelling on the basis of data availability and ‘priority’ status, with priority largely determined by threatened species status. By choosing species on the basis that they were deemed particularly sensitive to identifiable threats, the selection process loosely followed the rationale of the focal species approach (Lambeck 1997). The primary threatening process in the LHCC region is land clearing for development, though commercial forestry on public land may potentially pose a threat to some species (LHCCREMS 2004).

The koala (Phascolarctos cinereus), the yellow-bellied glider (Petaurus australis) and the squirrel glider (Petaurus norfolcensis) are arboreal marsupials that are broadly distributed throughout forest and woodlands of eastern Australia. The suitability of habitat for arboreal marsupials is influenced by the size and species of trees present, soil nutrients, climate, rainfall, and the size and disturbance history of the habitat patches (Reed & Lunney 1990). The two gliders are specifically reliant on tree hollows for shelter. The squirrel glider requires food supplied by flowering acacias and banksias (Russel 1995) and the yellow-bellied glider needs particular species of sap-trees and winter flowering eucalypts (Goldingay & Kavanagh 1993). Available habitat for these species is becoming increasingly fragmented as a result of habitat destruction caused by urban and infrastructure development, agriculture and mining (Reed & Lunney 1990).

Fig. 1. The Lower Hunter Central Coast (LHCC) region, situated north of Sydney in New South Wales. Seven local councils are located in the region. Light shading indicates areas under forest cover. [Map legend: vegetation cover (vegetated, non-vegetated); local government areas. Map graticule: 151°E, 33°S.]

The tiger quoll (Dasyurus maculatus) is a medium-sized carnivorous marsupial that inhabits a wide range of habitat types, from sclerophyll forest and woodlands to coastal heathlands and rainforests (Edgar & Belcher 1995). The species requires suitable den sites (such as hollow logs, tree hollows, rock outcrops or caves) and a large area of intact vegetation for foraging (Edgar & Belcher 1995).

The powerful owl (Ninox strenua), the sooty owl (Tyto tenebricosa) and the masked owl (Tyto novaehollandiae) are broadly distributed throughout forest, woodland and rainforest of eastern Australia. All three owl species tolerate some degree of fragmentation and discontinuity in forest cover. However, they rely on large tree hollows for nesting and denning and prey on forest-dependent species such as marsupial gliders and small ground mammals, making them susceptible to declines in prey abundance resulting from high levels of forest fragmentation (Kavanagh 2002).

The choice of model variables was driven by what was known about the habitat requirements of each species. Of the species modelled, home range estimates vary from as low as 0.65 ha for the squirrel glider (Russel 1995) to 20 km² for the tiger quoll (Edgar & Belcher 1995). A modelling grid cell size of 1 ha was chosen on the basis that it approximates the home range size of the smallest ranging species.

Data collation, filtering and handling

Biological data

Data for the seven target species were obtained from the biological systematic survey (BSS) module of the NSW NPWS Wildlife Atlas database and from surveys commissioned by the LHCCREMS, specifically undertaken to fill gaps in the Atlas data for priority species (Ecotone Ecological Consultants 2001). Systematic survey data contained records of both presence and absence for each of the seven priority species. The availability of systematic survey data reduced the effort required to prepare data for modelling, as survey method and effort covariates were available for data filtering. Only data collected after 1990 and using survey methods appropriate to detect each species were used in modelling. Records were filtered to ensure a minimum geographical separation of 500 m, in an attempt to increase the probability that observations were independent. Presence records were retained in preference to absence records when two (or more) records fell within 500 m of each other. The data that remained for model building and testing are summarized in Table 1.
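The filtering rule described above can be sketched as a simple greedy thinning procedure. This is an assumed implementation for illustration only: the text does not specify the order in which records were compared, and the function name and the toy coordinates (in metres) are hypothetical.

```python
import math

def thin_records(records, min_sep=500.0):
    """Greedy thinning of (x, y, present) records; x and y are in metres."""
    # Visit presences first so that they are retained in preference to absences
    ordered = sorted(records, key=lambda r: -r[2])
    kept = []
    for x, y, present in ordered:
        # Keep a record only if it is at least min_sep from every record already kept
        if all(math.hypot(x - kx, y - ky) >= min_sep for kx, ky, _ in kept):
            kept.append((x, y, present))
    return kept

# Toy records: one presence close to an absence, and two absences close together
records = [(0, 0, 0), (300, 0, 1), (1000, 0, 0), (1200, 0, 0)]
kept = thin_records(records)
print(kept)
```

In this sketch, ties between records of the same class are resolved by their order in the input, which is an arbitrary choice.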

Survey method or effort covariates may be incorporated in models when survey data are collected using methods of variable reliability (Pearce et al. 2001a). In our study such covariates were not required because data used for modelling were relatively uniform with respect to method and effort.

Environmental data

One of the primary limitations of the predictive performance of wildlife habitat models is the availability of broadly mapped environmental variables that are closely related to environmental attributes that affect the distribution of wildlife. Austin (2002) delineated two types of independent variables for model building: (i) ‘proximal’ (direct) variables are those that represent resource, shelter or thermal gradients that have a direct influence on a species’ distribution (e.g. temperature and foliar-nutrient); and (ii) ‘distal’ (or indirect) variables that have no physiological effect on the species but are correlated with ‘proximal’ variables (e.g. altitude, latitude). Modelling with proximal variables will more often produce a model that makes transportable and robust predictions, whereas models based on distal predictors are likely to be more specific to the location in which they were constructed (Austin 2002). However, direct variables are not always available as GIS layers because they tend to be difficult to map (Guisan & Zimmerman 2000), so model building for the purpose of prediction is often undertaken using distal variables.

Table 1. Number of presence and absence sites derived from the LHCCREMS and BSS databases used in model building and evaluation for each of the seven priority species

Species                 Presence records   Absence records   Total
Koala                   88                 162               250
Tiger quoll             36                 75                111
Squirrel glider         112                129               241
Yellow-bellied glider   92                 152               244
Masked owl              55                 149               204
Powerful owl            97                 142               239
Sooty owl               56                 156               212

BSS, biological systematic survey; LHCCREMS, The Lower Hunter and Central Coast Regional Environmental Management Strategy.

Experts should play a role in the identification of environmental variables that may be important predictors of a species’ habitat. They may be useful in developing new variables that are combinations or derivations of distal variables. For example, indices such as a ‘hollows index’ or ‘foliar nutrient index’ have been developed from maps of forest age and floristic composition and used in habitat modelling in the past (National Parks & Wildlife Service 1998). Similarly, vegetation variables with many categories of vegetation are often difficult to use as habitat variables in their raw form. Experts may be used to aggregate vegetation classes into those most likely to be of relevance to the species being modelled, or to develop interaction terms between, for example, vegetation type, topographic context and forest age. Derived variables may serve as useful predictors of habitat where the raw variables do not. Similarly, neighbourhood measures such as ‘the proportion of old forest within 2 km’ have also been used successfully in habitat modelling (Ferrier et al. 2002a). Neighbourhood measures convey the local environmental context of a site, which may be as important as the attributes of the site itself, especially in the case where the home range of a species is larger than the cell size used in modelling or where survey locations are imprecisely known. Compiling a set of candidate environmental variables should involve careful consideration of the biology of the species being modelled.
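A neighbourhood measure of this kind can be sketched as the percentage of grid cells within a given radius that belong to the class of interest. The example below is illustrative only: the function name, the treatment of cells beyond the grid edge (ignored) and the toy grid are assumptions, and a GIS would normally perform this calculation with a focal (moving window) operation.

```python
def percent_within_radius(grid, row, col, radius_m, cell_m=100.0):
    """Percentage of cells within radius_m of (row, col) whose value is 1."""
    reach = int(radius_m // cell_m)
    hits = total = 0
    # Scan the bounding square and keep the cells whose centres lie inside the circle
    for r in range(max(0, row - reach), min(len(grid), row + reach + 1)):
        for c in range(max(0, col - reach), min(len(grid[0]), col + reach + 1)):
            distance_m = ((r - row) ** 2 + (c - col) ** 2) ** 0.5 * cell_m
            if distance_m <= radius_m:
                total += 1
                hits += grid[r][c]
    return 100.0 * hits / total

# Toy 3 x 3 grid of 100-m cells: 1 = old forest, 0 = other
grid = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
centre = percent_within_radius(grid, 1, 1, radius_m=100)
print(centre)
```

With a 100-m radius only the focal cell and its four edge neighbours are counted, so two forest cells out of five give 40%.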

A set of climatic variables derived from ANUCLIM (Houlder et al. 1999) were available for use in model development. These may have a direct role in determining species distributions through metabolic constraints. However, the remainder of available predictor variables would be considered distal. Neighbourhood measures were derived and tested for all species at a range of neighbourhood distances. The absence of predictor variables relating to forest growth stage, a commonly used surrogate for tree-hollow and denning availability (National Parks & Wildlife Service 1998, 2000; Loyn et al. 2001), is likely to place substantial limitations on the predictive performance of final models. All environmental variables were stored as spatial layers (or grids) in a GIS with a grid cell resolution of 100 m, which was satisfactory with respect to the home range size of the target species.

Testing the adequacy of survey data based on environmental strata

There were a sufficient number of species survey data points in the region for fitting statistical models for each of the seven priority species. However, having a sufficient number of data points for fitting a regression model does not guard against inherent bias in survey data. To address concerns about geographical and environmental biases, data were tested for their coverage of key environmental strata. This was undertaken to ensure that models developed would be relevant to the range of environmental conditions present in the region and that excessive extrapolation of fitted responses would not be required. Environmental strata were defined by overlaying GIS raster maps (1 ha grid cell size) of four key variables: broad vegetation cover (seven classes), topographic position (three classes), mean annual temperature (four classes) and mean annual rainfall (seven classes), resulting in 384 possible strata, 256 of which were represented by at least 50 ha of forest in the region. By overlaying survey locations and the map of environmental strata, it is possible to tabulate the proportion of sampled versus unsampled strata for each species. Maps showing the regional coverage of sampled and unsampled strata were created for each species to illustrate where model predictions may be less reliable and where future biological surveys could be targeted.
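The tabulation of sampled versus unsampled strata can be sketched as follows. The data structures, the function name and the toy values are hypothetical; each stratum is identified here by its combination of vegetation, topographic position, temperature and rainfall classes, and only strata with at least 50 ha of forest are assessed.

```python
def strata_coverage(stratum_cells, survey_strata, min_cells=50):
    """Summarize how much of the forested region falls in sampled strata.

    stratum_cells: {stratum_id: number of forested 1-ha cells}
    survey_strata: set of stratum ids containing at least one survey site
    """
    # Only strata represented by at least min_cells ha of forest are assessed
    eligible = {s: n for s, n in stratum_cells.items() if n >= min_cells}
    sampled = {s for s in eligible if s in survey_strata}
    area_sampled = sum(eligible[s] for s in sampled)
    area_total = sum(eligible.values())
    return {
        "strata_sampled": len(sampled),
        "strata_total": len(eligible),
        "percent_area_sampled": 100.0 * area_sampled / area_total,
    }

# Toy example: stratum id = (vegetation, topography, temperature, rainfall) class
stratum_cells = {(1, 1, 2, 3): 600, (1, 2, 2, 3): 300, (4, 3, 1, 5): 100, (7, 1, 4, 7): 20}
survey_strata = {(1, 1, 2, 3), (4, 3, 1, 5), (7, 1, 4, 7)}
summary = strata_coverage(stratum_cells, survey_strata)
print(summary)
```

Mapping the cells of the unsampled strata would then show where predictions rely on extrapolation.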

Model development and evaluation

Data preparation

Geographical information system layers representing environmental variables were sampled at each survey location using ArcInfo (ESRI 1997) to construct a modelling data frame for each species. Similar functions are available in most GIS software. Each row of data contained the survey observation (1 = species present, 0 = species absent) and values for each of the candidate predictor variables at the survey locations. This resulted in an n × (k + 1) data matrix, where n is the number of survey locations and k is the number of potential predictor variables.

In order to construct presence-only models, a second data matrix was created for each species that contained records for which the species was present. Background samples (sometimes termed ‘pseudo-absence’ data) were generated for 10 000 random locations across the landscape according to methods described by Ferrier and Watson (1997) and using software developed by Landcare New Zealand (J. Overton, pers. comm. 2005). The rationale for using background samples when no ‘real’ or systematic absence data exist is that they can be used to create an environmental profile of the study area, which is then compared with the environmental profile of known ‘presence’ locations. An n × (k + 1) modelling matrix was created as for the presence–absence data, where n is equal to the number of presence observations plus the 10 000 random ‘observations’ of absence.

Data transformation

In some instances the distribution of environmental variables at survey locations may be long-tailed. This may bring about problems with model fitting due to points in the tails having high leverage, and tends to reduce the explanatory power of independent variables. Data may be transformed to avoid this problem. A common transformation is the log-transformation, though a range of other transformations exist. Inspection of our modelling data revealed no substantial problems with long-tailed predictor variables.

Variable reduction

Approximately 50 variables were available that described the environmental characteristics which may govern site occupancy for the seven target species. It is common to offer all possible variables as candidate model predictors (e.g. National Parks & Wildlife Service 1998) and utilize an automated variable selection routine to eliminate inappropriate predictors and to specify the final model. However, offering many candidate variables tends to result in models that include nonsense predictors and exclude variables that in fact influence the probability of occupancy (Derksen & Keselman 1992; Chatfield 1995; Steyerberg et al. 2001a). This is especially problematic when the number of species observations is low compared with the number of candidate predictors.

An alternative approach is to minimize the number of variables that are offered to the variable selection routine. Harrell et al. (1996) recommend a rule of thumb that fewer than m/10 predictor degrees of freedom (PDF) should be offered as candidates to a variable selection routine such as backward selection, where m equals the number of observations of the least prevalent class in the modelling data set (often the number of presence observations). PDF indicates the number of possible parameters estimated in the largest model that could be constructed from the set of candidate predictors. The number of parameters depends on the number of predictors and the form in which the predictor variables are used in the model (see next section). For categorical variables or non-linear responses there is more than one PDF for each environmental variable. For example, categorical variables have a parameter for all but one category, sometimes leading to many PDF per categorical variable. Similarly, quadratic or cubic functional forms require two and three PDF, respectively. In our situation, m ranges from 36 for the tiger quoll to 112 for the squirrel glider (Table 1). Therefore, the maximum number of candidate PDF offered to the variable selection routine should be no more than 4 or 11, respectively. In this case, the allowable PDF is much less than the total number available. The number of candidate PDF was firstly reduced by removing distal variables that were highly correlated with proximal variables (R > 0.6) in the context of biological or metabolic requirements of the species.
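The arithmetic behind the m/10 rule of thumb can be sketched as below. Rounding m/10 to the nearest integer is an assumption made here because it reproduces the limits of 4 and 11 quoted above; the function names and the example candidate set are hypothetical.

```python
def pdf_budget(n_presence, n_absence):
    """Approximate PDF limit: m / 10, where m is the size of the rarer class."""
    m = min(n_presence, n_absence)
    # Assumption: round to the nearest integer, which reproduces the quoted 4 and 11
    return round(m / 10)

def pdf_cost(term):
    """PDF used by one candidate term, given as (kind, size)."""
    kind, size = term
    if kind == "categorical":   # one parameter for all but one category
        return size - 1
    if kind == "polynomial":    # quadratic = 2 PDF, cubic = 3 PDF
        return size
    return 1                    # a linear term uses a single PDF

quoll = pdf_budget(36, 75)       # tiger quoll counts from Table 1
sqgl = pdf_budget(112, 129)      # squirrel glider counts from Table 1

# Hypothetical candidate set: one linear term, one quadratic, one 7-class factor
candidates = [("linear", 1), ("polynomial", 2), ("categorical", 7)]
total = sum(pdf_cost(t) for t in candidates)
print(quoll, sqgl, total)
```

In this hypothetical case the candidate set costs nine PDF, which would fit within the squirrel glider budget but exceed that of the tiger quoll.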

Irrespective of variable reduction issues, fitting statistical models with collinear predictor variables may cause statistical problems and should be avoided (Belsley et al. 1980). Expert opinion and previous habitat models were consulted where further variable reduction was required. The number of candidate model variables (and PDF) offered to the variable selection routine varied for each species, depending on the amount of biological survey data available. The final set of candidate predictors presented in Table 2 includes variables that were offered as candidates for at least one species.

Variable form

One of the strengths of GLMs and GAMs is the possibility of fitting non-linear relationships between the dependent variable (probability of occurrence) and the independent environmental variables. However, it may be difficult to know a priori whether a particular relationship is likely to be linear, quadratic, cubic or other. One strategy that is commonly employed is to allow an automated variable selection routine full freedom to fit any relationships using a trade-off between complexity and variance explained. However, it has been shown that such an approach may result in the fitting of spurious relationships with no logical interpretation (Steyerberg et al. 2001a). Consequently, we recommend either choosing functional forms with no automated variable selection, or limiting the range of functional forms available to the variable selection algorithm to those that are supported by sound ecological intuition and preliminary data analysis. In this exercise we conduct preliminary analyses on response shapes by fitting univariate GAMs with five degrees of freedom and visually inspecting plots of response shapes (Austin & Meyers 1996). This approach allows the user to assess whether more complicated responses appear sensible and are justified by the data. Visual inspection of fitted response shapes is used again later in the process for evaluating the ecological realism of final selected models (see below).

Variable selection and model fitting

Final GLMs and GAMs for all species were selected and fitted to both presence–absence and presence-background data using a backward stepwise variable selection algorithm (Venables & Ripley 2003) in R, resulting in four models for each species. The variable selection algorithm tests a series of nested models using Akaike’s information criterion (AIC; Akaike 1973; Venables & Ripley 2003) to select between models. The choice of algorithm may influence the structure of the final model, and alternative automated model selection algorithms, including forward selection, are available. The backward selection algorithm is generally preferred to the other standard automated methods because it generally performs better in the presence of collinear candidate variables and because it requires consideration of the full model fit (Harrell 2001). There is a substantial body of literature dedicated to model selection issues and automated algorithms have been criticized (Chatfield 1995; Hoeting et al. 1999; Harrell 2001). However, the alternatives tend to be complicated. We have chosen to use a packaged model selection algorithm as a compromise between practicality and sophistication. We reiterate that expert reduction of predictor variables prior to final model selection is critical to the successful application of automated variable selection routines (Harrell 2001).

Model evaluation statistics

Presence–absence models were evaluated using two statistics: (i) the area under the receiver operating characteristic (ROC) curve (Hanley & McNeil 1982) (ROC area), closely related to the Mann–Whitney U-statistic; and (ii) Miller’s calibration slope (Miller et al. 1991; Pearce & Ferrier 2000). The ROC area evaluates a model’s ability to discriminate between presence and absence sites and is therefore referred to as a measure of model (or predictive) ‘discrimination’ (Pearce & Ferrier 2000). This provides an indication of the usefulness of the models for prioritizing areas in terms of their relative importance as habitat for the particular species. The ROC area ranges from 0 to 1, where a score of 1 implies perfect discrimination and a score of 0.5 implies predictive discrimination that is no better than a random guess (Bamber 1975). The actual value of the ROC area has a straightforward interpretation: it is the probability that, for a randomly selected pair of presence–absence observations derived from field surveys, the model prediction for presence will be greater than the prediction for absence.
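The pairwise interpretation of the ROC area can be computed directly, as in the sketch below. The function name and the toy observations are hypothetical; counting tied predictions as one half is the usual convention for the Mann–Whitney statistic and is assumed here.

```python
def roc_area(observed, predicted):
    """ROC area as the proportion of presence-absence pairs ranked correctly."""
    pres = [p for y, p in zip(observed, predicted) if y == 1]
    absn = [p for y, p in zip(observed, predicted) if y == 0]
    if not pres or not absn:
        raise ValueError("need at least one presence and one absence")
    score = 0.0
    for pp in pres:
        for pa in absn:
            if pp > pa:
                score += 1.0    # presence site ranked above absence site
            elif pp == pa:
                score += 0.5    # tied predictions count as one half
    return score / (len(pres) * len(absn))

# Toy data: 1 = species present, 0 = species absent
observed = [1, 1, 1, 0, 0, 0]
predicted = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
area = roc_area(observed, predicted)
print(round(area, 3))
```

Eight of the nine presence–absence pairs in this toy example are ranked correctly, so the ROC area is 8/9.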

Miller’s calibration statistics (MCS) evaluate the ability of a model to correctly predict the proportion of sites with a given environmental profile that will be occupied. MCS are derived from a logistic regression of observations on the logit of predicted probabilities. The rationale is that the slope of the regression would be equal to one and the regression intercept would equal zero if the predictions from the model were perfectly calibrated (Harrell 2001; Pearce & Ferrier 2000). The model is known as the ‘logistic calibration equation’. The two calibration statistics are literally the logistic regression slope (‘calibration slope’) and intercept (‘calibration intercept’), though it is common just to report the calibration slope.
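The logistic calibration equation can be fitted with a few Newton-Raphson iterations, as in the sketch below. It is a bare-bones illustration written for this purpose, not the code used in the case study: the function names and the toy data are hypothetical, predictions must lie strictly between 0 and 1, and no convergence check is made.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def calibration(observed, predicted, iterations=25):
    """Fit observed ~ a + b * logit(predicted); return (intercept a, slope b)."""
    x = [logit(p) for p in predicted]
    a, b = 0.0, 0.0
    for _ in range(iterations):
        # Gradient (g) and information matrix (h) of the binomial log-likelihood
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, observed):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu
            g1 += (yi - mu) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system h * step = g
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Toy data in which observed frequencies match the predictions exactly
predicted = [0.2] * 10 + [0.5] * 10 + [0.8] * 10
observed = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2
intercept, slope = calibration(observed, predicted)
print(round(intercept, 6), round(slope, 6))
```

Because the observed proportions in the toy data equal the predicted probabilities, the fitted slope is one and the intercept is zero. A slope below one would indicate predictions that are too extreme.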

The presence-background models were evaluated qualitatively. The statistics available for evaluating presence-only models are limited and a fair comparison with the presence–absence models was not possible here.

Table 2. Abbreviated names and definitions of mapped environmental data used as candidate predictor variables for inclusion in habitat models. All environmental data were available in raster format with 100 m (side length) grid cell size

Candidate variable     Definition
Temp                   Mean annual temperature, derived from ANUCLIM
Cold                   Mean temperature of the coldest period, derived from ANUCLIM
Rain                   Mean annual rainfall, derived from ANUCLIM
Dry 2000               The percentage of cells in a 2000-m radius containing dry forest
Rf2000                 The percentage of cells in a 2000-m radius containing rainforest
Solar                  The solar radiation index of a cell, derived from ANUCLIM
Elev                   The elevation of a cell (in metres) above sea level
Rugg250 (500, 1000)    Topographic ruggedness (standard deviation in elevation) in a 250-m, 500-m and 1000-m radius
Terr250 (500, 1000)    Relative terrain position in a 250-m, 500-m and 1000-m radius
Topo                   The topographic position of a cell, ranging from gully to ridge top (0–100)
Unmod500               The percentage of cells in a 500-m radius containing unmodified forest
Unmod2000              The percentage of cells in a 2000-m radius containing unmodified forest
Wet500                 The percentage of cells in a 500-m radius containing wet forest
Wet2000                The percentage of cells in a 2000-m radius containing wet forest
Fert                   Index of the soil nutrient content at a site, based on geochemical data (CSIRO)
Percnonfor2k           The percentage of cells in a 2000-m radius classified as cleared land
Ybglexp2000†           The percentage of cells in a 2000-m radius containing suitable ybgl habitat
Sqglexp2000†           The percentage of cells in a 2000-m radius containing suitable sqgl habitat
Sowlexp2000†           The percentage of cells in a 2000-m radius containing suitable sowl habitat
Mowlexp2000†           The percentage of cells in a 2000-m radius containing suitable mowl habitat
Powlexp2000†           The percentage of cells in a 2000-m radius containing suitable powl habitat
Koalexp2000†           The percentage of cells in a 2000-m radius containing suitable koal habitat

†Denotes expert variable derived from vegetation classes (Michael Murray, pers. comm. 2004). koal, koala; mowl, masked owl; powl, powerful owl; quol, tiger quoll; sowl, sooty owl; sqgl, squirrel glider; ybgl, yellow-bellied glider.

The model evaluation method (bootstrapping)

Presence–absence models for each species were evaluated using a ‘bootstrapping’ approach (Efron & Tibshirani 1997; Harrell 2001). Naively testing a model on the data that were used to fit it (also known as ‘in-sample validation’) is inappropriate and known to provide optimistic estimates of model performance (Fielding & Bell 1997; Steyerberg et al. 2001b), despite being a commonly used approach to model evaluation (National Parks & Wildlife Service 1998, 2000). Ideally, models should be tested on a completely independent data set, with data collection and stratification specifically designed for model evaluation. However, this is seldom a practical option and may also be prone to high variance unless samples are large (J. Elith, unpublished data 2005). Bootstrapping provides a realistic estimate of the predictive performance of a model without incurring the expense of collecting a completely new model-testing data set. Bootstrapping involves resampling the modelling data and conducting a series of model building and testing simulations that provide an estimate of the optimism arising from in-sample validation. The estimate of optimism is used to provide an adjusted estimate of the model evaluation statistics (ROC and MCS). The bootstrapping version implemented here is believed to provide the least biased estimate of predictive performance of any of the model evaluation methods that are based on re-sampling, including cross-validation (Hastie et al. 2001). Cross-validation (Efron & Tibshirani 1997) provides an alternative approach to model evaluation and might be more feasible with methods or data sets that create large computational loads. However, its estimates of error rates with independent data can be less precise than those derived from bootstrapping (Steyerberg et al. 2001b), which can be thought of as a smoothed version of cross-validation (Efron & Tibshirani 1997). The bootstrapping method (the 0.632+ bootstrap) is detailed in Appendix I. R code for obtaining bootstrapped estimates of ROC and MCS is available from the URL given previously.
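The general idea of estimating optimism by resampling can be sketched as follows. Note that this is the simpler optimism-corrected bootstrap, not the 0.632+ bootstrap used in the case study, and that the one-variable ‘model’ is a deliberately trivial stand-in for a fitted GLM or GAM. All function names and data are hypothetical.

```python
import random

def auc(observed, predicted):
    """ROC area by pairwise comparison (ties count as one half)."""
    pres = [p for y, p in zip(observed, predicted) if y == 1]
    absn = [p for y, p in zip(observed, predicted) if y == 0]
    wins = sum((pp > pa) + 0.5 * (pp == pa) for pp in pres for pa in absn)
    return wins / (len(pres) * len(absn))

def fit(xs, ys):
    """Toy stand-in for model fitting: learn only the direction of the response."""
    pres = [x for x, y in zip(xs, ys) if y == 1]
    absn = [x for x, y in zip(xs, ys) if y == 0]
    sign = 1.0 if sum(pres) / len(pres) >= sum(absn) / len(absn) else -1.0
    return lambda x: sign * x

def optimism_corrected_auc(xs, ys, n_boot=200, seed=1):
    rng = random.Random(seed)
    model = fit(xs, ys)
    apparent = auc(ys, [model(x) for x in xs])       # in-sample (optimistic) estimate
    optimism = []
    n = len(xs)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample with replacement
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        if len(set(by)) < 2:
            continue                                 # AUC needs both classes
        boot_model = fit(bx, by)
        auc_boot = auc(by, [boot_model(x) for x in bx])   # tested on its own sample
        auc_orig = auc(ys, [boot_model(x) for x in xs])   # tested on the original data
        optimism.append(auc_boot - auc_orig)
    mean_optimism = sum(optimism) / len(optimism) if optimism else 0.0
    return apparent, apparent - mean_optimism

# Toy data: one predictor and noisy presence-absence observations
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
apparent, corrected = optimism_corrected_auc(xs, ys)
print(apparent, corrected)
```

In practice the model building step inside the loop would repeat the full procedure, including variable selection, so that the optimism introduced by selection is also captured.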

Inspecting models for ecological realism

Final models were inspected for their ecological realism using partial plots of univariate fitted functions (Venables & Ripley 2003), similar to those used to determine the complexity of candidate response shapes. Partial plots are available as an option in statistical software and provide a ready means to evaluate the ecological realism of fitted responses. If an implausible fitted relationship between the probability of occupancy by the species and a particular environmental variable was observed in the partial plots, the variable was excluded from the model or the model was re-fitted with fewer degrees of freedom for that particular environmental variable.

Case study results and interpretation

The adequacy of species survey data for model fitting

Environmental strata were reasonably well sampled, even for the species with the least and most restricted survey data. For six of the seven species, sampled strata make up at least 70% of the region that is under forest cover, with the only exception being the quoll, for which only 56% of the region comprises sampled strata. Similarly, the geographical spread of sampled strata was reasonable for most species, with maps of sampled strata showing good coverage of the region for all species (Fig. 2). The maps of sampled and unsampled strata provide an indication of the areas in which habitat model predictions may be less reliable.

Models and predictive performance

The best models in terms of both predictive discrimination and calibration were those for the sooty owl, squirrel glider, yellow-bellied glider, koala and tiger quoll, all of which had predictive discrimination greater than 0.75 for their best model (Table 3). The worst models were the powerful owl and masked owl models, with bootstrapped ROC areas ranging from 0.61 to 0.69. In general, GAMs had similar variable structure and degrees of freedom to GLMs, and comparable predictive discrimination. However, bootstrapped predictive calibration was always better for GAMs than for GLMs (Table 3). This is probably because the smooth functions in GAMs are better at fitting complex responses over the complete environmental range of the response, leading to more accurate predicted probabilities.

Presence-only models had similar model structures to the presence–absence models. The result suggests that the survey absence data used in the modelling exercise were not much more useful than a background random sample in defining unsuitable habitat for most species. This may result from the difficulties associated with obtaining reliable absence data for cryptic species. In addition, the quality of the presence-only data used in this modelling exercise is likely to be high compared with presence-only data found in most museums and herbaria, because they were derived from systematic stratified surveys. Other studies have shown a decline in performance when fitting models with background samples compared with using presence–absence data (Ferrier & Watson 1997).

Preferred models were primarily chosen on the basis of predictive discrimination and calibration. Plausibility of response shapes was used to discern between models that showed similar predictive performance. Only model structures for the preferred model for each species are presented in Table 3. There was some variation between the four modelling methods in terms of the variables retained and their response shapes for each species. For example, the yellow-bellied glider GLM contained the term ‘dry 2000’ while the GAM for the same species did not include this term. Similarly, the sooty owl GLM contained rainfall as a linear term while the GAM for that species included rainfall as a smoothed spline with two degrees of freedom (a non-linear term). However, the structural differences between GLM and GAM models were minor, and the similarity in ROC areas of the two model types across all species (Table 3) implies that the differences had a minor impact on predictive discrimination.

Fig. 2. Coverage of sampled strata over the greater LHCC region for the best ((a) Koala) and worst ((b) Quoll) sampled species. Unsampled strata are represented by the light grey shading. Remaining areas within the region (not shaded) contain sampled strata or are non-forest and therefore not included in the analysis.

Table 3. Final presence–absence models and bootstrapped estimates of predictive discrimination (ROC) and calibration

| Species | Preferred model | Preferred model type | ROC area (GAM) | ROC area (GLM) | Calibration (GAM) | Calibration (GLM) |
| --- | --- | --- | --- | --- | --- | --- |
| Koala | sp ~ s(temp, 2) + s(dry2000, 3) + s(rugg500, 2) + unmod2000 + percnonfor2k | GAM | 0.76 | 0.76 | 0.84 | 0.75 |
| Quoll | sp ~ temp + rain + percnonfor2k + poly(dry2000, 2) | GLM | 0.71 | 0.75 | 0.73 | 0.59 |
| Sqgl | sp ~ s(rugg500, 3) + sqglexp500 + s(unmod500, 2) | GAM | 0.78 | 0.77 | 0.76 | 0.68 |
| Ybgl | sp ~ s(temp, 2) + s(rain, 2) + s(rf2000, 2) + unmod2000 + s(ybglexp2000, 3) + percnonfor2k | GAM | 0.78 | 0.77 | 0.73 | 0.63 |
| Mowl | sp ~ s(cold, 2) + mowlexp2000 + rf2000 + s(unmod2000, 2) + wet2000 + percnonfor2k | GAM | 0.63 | 0.61 | 0.63 | 0.55 |
| Powl | sp ~ s(cold, 3) + s(rain, 2) + s(rugg500, 2) | GAM | 0.69 | 0.68 | 0.80 | 0.62 |
| Sowl | sp ~ s(rain, 2) + rugg500 + s(sowlexp2000, 3) + s(ter1000, 2) + unmod2000 | GAM | 0.85 | 0.85 | 0.74 | 0.67 |

Only the preferred model selected from the presence-absence models (either a generalized linear model or a generalized additive model) is presented for each species. Column ‘preferred model type’ indicates which of the methods produced the final preferred model for each species. ROC and calibration slope statistics are presented for the model derived under each modelling method. Variable abbreviations and details are given in Table 2. The expressions ‘poly(variable, n)’ and ‘s(variable, n)’ in the preferred model column indicate variables included as polynomial (‘poly’) or smoothed (‘s’) terms with ‘n’ degrees of freedom. ROC, receiver operating characteristic.

Plots of fitted functions

Plotting fitted relationships provided a useful means to visualize and manipulate the behaviour of fitted models to ensure that they were ecologically meaningful. In general, fitted relationships were found to be ecologically plausible. Plots of sooty owl response shapes (Fig. 3) show how the probability of sooty owl site occupancy varies with each of the predictor variables contained in the final GAM model for that species. Fitted functions for individual variables in a multivariate model may be different to univariate fitted functions because of interactions between predictor variables. Consequently, it is important to observe response shapes for the final model to ensure that they remain ecologically reasonable in the multivariable context.

Habitat maps

A habitat map is provided for the sooty owl as an example of the type of output that can be used in conservation planning (Fig. 4). The map shows the predicted probability of sooty owl presence throughout the region. Probabilities have been classified into four categories (0–0.2, 0.21–0.5, 0.51–0.7, 0.71–1.0) for presentation. These maps also display the presence records for the species that were used in model fitting, allowing a visual assessment of predictive performance. Mapped predictions and uncertainties (based on upper and lower 95% prediction intervals) for the remaining six species are available from the authors.

Fig. 3. Partial plots of the relationship between the probability of occupancy and environmental variables included in the final sooty owl model. The X-axis represents the range of values for each environmental variable. Probabilities on the Y-axis are plotted in transformed ‘logit’ space so that they can be interpreted in the same way as linear regressions. Response shapes in each plot represent the relationship between each variable and the probability of sooty owl occupancy in the multivariate model context, independent of the other variables included in the model. Dashed lines represent 95% confidence intervals around the fitted response shape. Rugg500, topographic ruggedness (standard deviation in elevation) in a 500-m radius; Sowlexp 2000, the proportion of suitable vegetation (as defined by experts) in a 2000-m radius; terr1000, relative terrain position in a 1000-m radius; unmod2000, the percentage of cells in a 2000-m radius containing unmodified forest.

[Fig. 3 panels, X-axes: rain, rugg500, sowlexp2000, terr1000, unmod2000. Y-axis in every panel: Logit (probability of occupancy).]
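The classification of predicted probabilities into display classes, as described for the habitat map above, can be sketched in a few lines. The following Python fragment is illustrative only and is not the mapping workflow used in the case study; the function names and the example prediction vector are hypothetical, and the class breaks are simply those quoted in the text.

```python
from bisect import bisect_left

# Upper bounds of the four display classes quoted in the text
# (0-0.2, 0.21-0.5, 0.51-0.7, 0.71-1.0). Pass other bounds to change the breaks.
CLASS_UPPER_BOUNDS = [0.2, 0.5, 0.7, 1.0]


def classify_probability(p, upper_bounds=CLASS_UPPER_BOUNDS):
    """Return the 1-based display class for a predicted probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # bisect_left returns the index of the first class whose upper bound is >= p.
    return bisect_left(upper_bounds, p) + 1


def classify_map(predictions, upper_bounds=CLASS_UPPER_BOUNDS):
    """Classify a flat vector of per-cell predictions (one value per grid cell)."""
    return [classify_probability(p, upper_bounds) for p in predictions]


if __name__ == "__main__":
    # Hypothetical predictions for four grid cells.
    print(classify_map([0.05, 0.35, 0.60, 0.95]))
```

Keeping the breaks in a single list makes it easy to compare alternative classifications of the same prediction vector.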

DISCUSSION

The importance of ecological realism in modelling and the role of experts

The choice of modelling methods and approach to model building and evaluation presented in this paper reflects the importance we place on maintaining ecological realism throughout the model building process. While maintaining statistical rigour is central to good model building, there is an expansive literature on the statistical nuances of model fitting, interpretation and prediction (Guisan & Zimmerman 2000). Conversely, there is much less literature available concerning the importance of maintaining ecological realism in model building (Austin 1991, 2002). The process of plotting and evaluating fitted functions throughout model building and interpretation is fundamental to ensuring the ecological realism and general credibility of the final models.

Expert opinion forms an integral part of sound model building and evaluation, irrespective of the modelling platform and theoretical approach used (Steyerberg et al. 2001b). Some modellers view expert opinion as a source of undesirable subjectivity and prefer to let the data dictate the model. However, this ignores the limitations commonly encountered with ecological data (small to medium-sized species data sets that may include biases, and many candidate predictor variables) and the related challenges in developing robust and sensible models. Expert opinion can (and should) be used to assist in the identification of candidate model variables, interactions between variables and likely response shapes (Pearce et al. 2001b; Ferrier et al. 2002a), for corroboration of a model’s ecological realism (Austin 2002), for ad hoc evaluation of model predictions (National Parks & Wildlife Service 1998), and for preparation of predictive maps for use in decision making (Ferrier et al. 2002a). Experts may also be used effectively in the creation of indices that may themselves be used as candidate variables for statistical modelling (National Parks & Wildlife Service 1998). Consequently, the role of experts should be thought of as complementary to other, more data-driven methods, rather than as a competing alternative. In our case study we emphasize the use of experts in variable reduction prior to automated variable selection. We use Harrell’s (2001) rule of thumb to identify the maximum PDF that should be offered to the variable selection routine. We contend that if expert knowledge is so lacking that reducing the candidate set to m/10 is impossible, resulting models should be treated with scepticism.
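As a minimal sketch of how such a rule of thumb might be applied, the fragment below computes an upper limit of m/10 on the candidate degrees of freedom and compares it with a candidate set. It assumes that m is the limiting sample size, taken here as the smaller of the number of presence and absence records; that definition is an assumption of this sketch and is not stated in this section. The function names, record counts and candidate terms are hypothetical.

```python
def max_candidate_df(n_presence, n_absence, ratio=10):
    """Upper limit on candidate degrees of freedom under an m/ratio rule.

    Assumption: m is the limiting sample size, i.e. the smaller of the
    number of presence and absence records.
    """
    if n_presence < 0 or n_absence < 0:
        raise ValueError("record counts must be non-negative")
    m = min(n_presence, n_absence)
    return m // ratio


def spent_df(candidate_terms):
    """Total degrees of freedom of candidate terms given as {name: df}."""
    return sum(candidate_terms.values())


if __name__ == "__main__":
    # Hypothetical survey: 45 presences and 320 absences.
    limit = max_candidate_df(45, 320)
    # Hypothetical candidate set: one smooth term with 2 df and two linear terms.
    candidates = {"rain": 2, "rugg500": 1, "unmod2000": 1}
    print(limit, spent_df(candidates), spent_df(candidates) <= limit)
```

Counting degrees of freedom rather than variables means that a smoothed or polynomial term uses more of the allowance than a linear one.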

The lack of an accepted model evaluation statistic for presence-only models detracts from their utility in conservation planning and increases the importance of expert evaluation. Development of robust evaluations of presence-only models is ongoing.

Interpretation of habitat maps

Threshold predictions

In some instances, users of habitat maps may wish to identify a level of predicted probability of occupancy below which an area would be considered ‘unsuitable’ habitat for a given species. In general, such an interpretation of habitat maps is ill-advised unless the model that underpins the map is particularly well calibrated. However, in such instances where it is required, thresholds could be assigned in a risk-weighted manner. For example, if it is very important not to classify an area as ‘non-habitat’ when it may be used by a species, a low probability threshold (e.g. probability of occurrence = 0.1) should be specified when delineating non-habitat. By choosing a threshold of 0.1 for a well-calibrated model, it is implied that a 10% chance of a site (a grid cell of 100 m) being occupied by the species is a tolerable risk of failing to include areas that are, in fact, habitat for the species. This decision must be made on a case-by-case basis by evaluating the management costs of the two types of error (a false-negative and false-positive prediction), rather than by relying on an arbitrary threshold set by the modeller.

Fig. 4. The predicted probability of sooty owl occupancy across the LHCC region. Probabilities are represented as a grey-scale (white = 0–0.2, light grey = 0.21–0.5, medium grey = 0.51–0.8, dark grey = 0.81–1.0). The open triangles show point survey locations where the sooty owl is recorded as present. Habitat maps were constructed by creating a vector of habitat model predictions (one for every 1 ha grid cell in the landscape) in R using the ‘predict.gam’ command. The prediction vector is then written to an ASCII text file in the correct dimensions for the landscape and imported to ArcInfo (or ArcView) for display.
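One way to make the risk weighting of a threshold explicit is sketched below. It assumes a well-calibrated model and that the costs of the two error types can be expressed on a common scale. A cell is then classed as non-habitat only when the expected cost of doing so, p × C_FN, is lower than the expected cost of retaining it as habitat, (1 − p) × C_FP, which gives a break-even threshold of C_FP / (C_FP + C_FN). The cost values and function names are hypothetical and do not come from the case study.

```python
def risk_weighted_threshold(cost_false_negative, cost_false_positive):
    """Break-even probability below which a cell is classed as non-habitat.

    cost_false_negative: cost of calling occupied habitat 'non-habitat'.
    cost_false_positive: cost of calling unoccupied land 'habitat'.
    Both costs are hypothetical inputs on a common, positive scale.
    """
    if cost_false_negative <= 0 or cost_false_positive <= 0:
        raise ValueError("costs must be positive")
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def delineate_non_habitat(predictions, threshold):
    """Flag cells whose predicted probability falls below the threshold."""
    return [p < threshold for p in predictions]


if __name__ == "__main__":
    # A false negative judged nine times as costly as a false positive.
    threshold = risk_weighted_threshold(cost_false_negative=9.0,
                                        cost_false_positive=1.0)
    print(threshold)
    print(delineate_non_habitat([0.05, 0.10, 0.40], threshold))
```

With a false negative weighted nine times as heavily as a false positive, the break-even threshold is 0.1, the value used as an example in the text; equal costs would give 0.5.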

Representing uncertainty

Many forms of uncertainty impact on maps of predicted probability of species occurrence (Elith et al. 2002). The prediction intervals available from regression models describe the part of this uncertainty associated with estimating model parameters, and this stems from inadequate data, errors in measurement and natural variation (Elith et al. 2002). There are additional uncertainties, such as model selection uncertainty and errors in explanatory variables, that are, however, not addressed by standard prediction intervals. While some methods have been developed for explicitly incorporating these forms of uncertainty in prediction intervals (Burnham & Anderson 2002; Wintle et al. 2003), there is currently very little guidance on how prediction intervals should be represented and interpreted in applications of habitat modelling and conservation planning.

Interpreting model evaluation statistics

The two statistics used for model evaluation in this study address different model applications. Specifically, model discrimination, measured by the ROC area, describes the degree to which the model successfully ranks presence sites higher than absence sites across the region in terms of predicted probability of presence. Models with satisfactory ROC areas will provide reliable ranking of areas in terms of habitat value. This mode of evaluation is relevant when the goal of a model is to rank or prioritize areas of interest in terms of their relative value as habitat for a species. This goal is different to that of accurately predicting the proportion of sites that are expected to be occupied at a given predicted probability of occurrence. For instance, at sites with a predicted probability of occurrence of 0.5, it is reasonable to expect that approximately 50% of such sites would contain the species. The degree to which this is true is described as model calibration (Miller et al. 1991) and is assessed using the logistic calibration equation. It is possible that a model with good discrimination can have poor calibration. It is also possible to improve a model’s predictive ability by adjusting the model parameters with calibration statistics (Harrell 2001) or other shrinkage methods (Hastie et al. 2001). A model with poor calibration may still be useful for prioritizing or ranking sites in terms of habitat value, but should not be used to predict the raw probability of finding a species at a given site. The choice of whether to focus on ROC areas or calibration statistics depends on the intended application of the models.
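The two statistics can be illustrated with a small, self-contained sketch. Here the ROC area is computed as the proportion of presence-absence site pairs in which the presence site receives the higher prediction (ties counted as one half), and calibration is summarized by fitting logit(Pr[presence]) = a + b × logit(p) to the observations, where an intercept a near 0 and a slope b near 1 indicate good calibration. This is not the bootstrapped procedure used in the case study; the function names and data are hypothetical, and the Newton-Raphson fit is a bare-bones stand-in for a standard logistic regression routine.

```python
import math


def roc_area(observed, predicted):
    """ROC area as the share of presence/absence pairs ranked correctly."""
    presences = [p for y, p in zip(observed, predicted) if y == 1]
    absences = [p for y, p in zip(observed, predicted) if y == 0]
    if not presences or not absences:
        raise ValueError("need at least one presence and one absence")
    score = 0.0
    for pres in presences:
        for absn in absences:
            if pres > absn:
                score += 1.0
            elif pres == absn:
                score += 0.5  # ties count as half a correct ranking
    return score / (len(presences) * len(absences))


def logit(p, eps=1e-6):
    """Logit transform, clamped away from 0 and 1 to stay finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def calibration_line(observed, predicted, max_iter=50, tol=1e-10):
    """Fit logit(Pr[y = 1]) = a + b * logit(p); return (a, b)."""
    x = [logit(p) for p in predicted]
    a, b = 0.0, 1.0  # start from perfect calibration
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, observed):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu            # score for the intercept
            g1 += (yi - mu) * xi     # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            raise ValueError("predictions do not vary; slope is undefined")
        # Newton step: solve the 2 x 2 system (information matrix) * delta = score.
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b


if __name__ == "__main__":
    # Hypothetical overconfident model: 3 of 10 sites occupied where it
    # predicts 0.1, and 7 of 10 occupied where it predicts 0.9.
    y = [1] * 3 + [0] * 7 + [1] * 7 + [0] * 3
    p = [0.1] * 10 + [0.9] * 10
    print(roc_area(y, p), calibration_line(y, p))
```

In the hypothetical data the model ranks sites usefully but its predictions are too extreme, so the fitted slope falls well below 1, which is the situation in which shrinkage of the model coefficients is usually considered.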

The issue of occupancy versus persistence

Models of the sort discussed in this paper make the implicit assumption that occupancy of a location implies suitability of the habitat. Van Horne (1983) identified problems associated with this assumption, and Pulliam’s (1988) ‘source-sink’ model of population dynamics formalizes its conceptual deficiencies. Pulliam’s model differentiates ‘sink’ areas, where mortality exceeds population growth, from ‘source’ areas that maintain overall population size through emigration. Similarly, if the total population is in a state of decline, it is possible that ‘remnant’ individuals may exist in unsuitable habitat that will not be occupied in the next generation. When a population is expanding its range, observations of absence of the species may falsely imply that the suitability of the habitat is low. These anomalies arise due to what is known as the ‘equilibrium’ assumption, which underpins the current static approach to wildlife habitat modelling (Austin 2002). Inclusion of disturbance history variables, dispersal barriers, competition and successional dynamics may assist in modelling such situations, though examples of such approaches are rare.

Survey design and sampling issues

This paper has focused on the choice and execution of particular modelling methods, given a set of data. Issues of data quality and sampling have not been addressed. Because modelling results are only as good as the available data, the issues of data quality and sampling demand some discussion here.

Sample size

From a statistical perspective, the choice of sample size is a function of the desired precision of results. In studies where the goal is to make inference about a particular effect (e.g. the influence of time since logging on the probability of greater glider occupancy), power analysis may be used to determine the sample size required to achieve a statistically significant result for a given effect size, type I error rate and sampling variance (Burgman & Lindenmayer 1998). The relationship between statistical significance, power and model predictive performance is unclear, and there is no guarantee that a model with statistically significant terms will give good predictive performance. Unfortunately, there is no simple way to determine a priori the predictive performance of a habitat model, or to calculate a sample size that will ensure a predictive performance that is statistically significantly better than that of a null or alternative model. We could find only one attempt to address this problem in the statistical literature (Obuchowski 1994), making this an important area of future research. A simulation study that provided rules of thumb about sample sizes required to achieve a given predictive performance under a range of likely modelling scenarios would be an important contribution to the habitat modelling literature.

Data quality

A more subtle consideration is the geographical spread and environmental stratification of observations compared to the extent and environmental variability of the region for which models are to be built. We briefly touch on this issue in our case study when testing the adequacy of survey data using environmental strata. Little guidance is provided in the literature as to the minimum geographical or environmental coverage of data for statistical modelling, though it is common sense to expect that model predictions are unlikely to be reliable in environmental domains for which there are no survey data (Austin & Meyers 1996). Stratified sampling designs such as ‘GRADSECT’ (Austin & Heyligers 1989) that ensure a geographical and bioclimatic coverage of sampling locations are therefore appealing. GRADSECT theory, or newer developments such as Generalized Dissimilarity Modelling (GDM; Ferrier et al. 2002b) or a p-median criterion (Faith & Walker 1996), may also be applied to existing data sets to identify ‘gaps’ in the geographical and environmental coverage of samples.

Most statistical analyses, including the regression methods used in our case study, assume that observations are independent of each other. Survey locations that are close together are much less likely to be independent, especially when surveys target animals with large home ranges. Survey design should aim to ensure a minimum separation distance that is at least as great as the home range radius of the widest ranging species in the study. Random sampling within geographical and environmental strata assists in minimizing unwanted dependencies in observation data. Guidance on sampling theory should be sought from one of the many texts that deal with the topic in detail (e.g. Sutherland 1996; Thompson 2002).
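The minimum-separation guideline above lends itself to a simple screening step before modelling. The sketch below flags pairs of survey sites that lie closer together than a chosen separation distance. It assumes projected coordinates in metres and, for the helper that converts a home range area to a radius, a circular home range; both are simplifying assumptions of this sketch, and the site names, coordinates and home range area are hypothetical.

```python
import math
from itertools import combinations


def home_range_radius_m(home_range_ha):
    """Radius (m) of a circular home range of the given area (ha)."""
    return math.sqrt(home_range_ha * 10000.0 / math.pi)


def too_close_pairs(sites, min_separation_m):
    """Return (site_a, site_b, distance) for pairs closer than the minimum.

    sites: dict mapping a site name to (x, y) coordinates in metres.
    """
    flagged = []
    for (name_a, (xa, ya)), (name_b, (xb, yb)) in combinations(sorted(sites.items()), 2):
        distance = math.hypot(xa - xb, ya - yb)
        if distance < min_separation_m:
            flagged.append((name_a, name_b, distance))
    return flagged


if __name__ == "__main__":
    # Hypothetical sites and a hypothetical 100 ha home range.
    sites = {"A": (0.0, 0.0), "B": (300.0, 400.0), "C": (5000.0, 0.0)}
    radius = home_range_radius_m(100.0)
    print(round(radius, 1), too_close_pairs(sites, radius))
```

Flagged pairs could then be thinned, or the dependence handled explicitly, before the data are used for model fitting.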

Technical issues and promising advances in modelling methods

Species versus community approaches

Models of the distribution of single species are not the only way to approach conservation planning questions. Single species models are important for a subset of species such as threatened, focal or flagship species. However, landscape planning aims to conserve the biodiversity of a region, and there are too many species and too little data to achieve this through single-species modelling. Therefore, there is a role for community-level modelling (Ferrier et al. 2002b). One example of community-level modelling is GDM, already mentioned in the context of sampling; this models compositional dissimilarity across the landscape. This could then contribute to a conservation plan that aimed to span the range of species patterns in the landscape. A further role for methods based on data from many species is to use them for modelling rare species, or species in data sets with relatively few records per species. In such situations there may be insufficient data for developing a robust single-species model. GDMs, canonical correspondence analysis (CCA) or multivariate adaptive regression splines (MARS) may be useful for this purpose.

Spatial autocorrelation

Spatial autocorrelation in wildlife observation data arises when environmental processes and patterns that influence the spatial distribution of wildlife are themselves spatially structured, and/or because the species is subject to demographic processes, territoriality or dispersal limitations causing spatial dependence (contagion or dispersion effects). Demographic and environmental processes underlying spatial patterns in wildlife distributions are usually poorly understood and therefore difficult to incorporate in model fitting. Consequently, model residuals are often spatially correlated (i.e. not independent), violating one of the basic assumptions of regression modelling. In practice, non-independence usually results in underestimation of standard errors and overestimation of the importance of habitat variables (Legendre & Fortin 1989). Methods such as autologistic regression (Augustin et al. 1996), generalized estimating equations (Albert & McShane 1995) and geographically weighted regression (Fotheringham et al. 2002) may be used to incorporate spatial autocorrelation in habitat analyses, but they are still not incorporated in standard statistical software and are technically demanding to implement.
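Before turning to the more demanding methods listed above, a modeller may simply wish to check whether model residuals are spatially correlated. A common diagnostic, not used in the case study and named here as a substitute illustration, is Moran's I. The sketch below computes it with binary neighbour weights (sites within a chosen distance are neighbours); the bandwidth, coordinates and residuals are hypothetical, and no significance test is attempted.

```python
import math


def morans_i(coords, values, bandwidth):
    """Moran's I with binary weights: w_ij = 1 if 0 < d_ij <= bandwidth.

    coords: list of (x, y) site coordinates.
    values: list of model residuals, one per site.
    """
    n = len(values)
    if n != len(coords) or n < 2:
        raise ValueError("need matching coordinates and at least two values")
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    cross_product = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            distance = math.hypot(coords[i][0] - coords[j][0],
                                  coords[i][1] - coords[j][1])
            if distance <= bandwidth:
                weight_sum += 1.0
                cross_product += deviations[i] * deviations[j]
    variance_sum = sum(d * d for d in deviations)
    if weight_sum == 0.0 or variance_sum == 0.0:
        raise ValueError("no neighbours within bandwidth, or constant values")
    return (n / weight_sum) * (cross_product / variance_sum)


if __name__ == "__main__":
    # Four hypothetical sites along a transect, one distance unit apart.
    transect = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    print(morans_i(transect, [1.0, 1.0, -1.0, -1.0], bandwidth=1.0))
    print(morans_i(transect, [1.0, -1.0, 1.0, -1.0], bandwidth=1.0))
```

Values well above zero indicate that neighbouring residuals tend to share a sign (clustering), while values well below zero indicate alternation; either pattern suggests that the independence assumption deserves attention.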

Model selection and model uncertainty

The standard approach to representing prediction uncertainty involves the calculation of prediction (or confidence) intervals that incorporate uncertainty about parameter estimates. However, this approach to prediction implicitly assumes that the model chosen is the best available representation of the truth and effectively ignores model selection uncertainty, resulting in overconfident predictions. Automated variable selection algorithms exacerbate this problem by providing a method for dredging through many candidate predictors in search of an explanatory model (Chatfield 1995; Hoeting et al. 1999). One solution is to completely avoid automated model selection and to analyse only a small set of ‘a priori’ models (Burnham & Anderson 2002). This approach may be practical when strong prior knowledge exists and where the number of plausible models is small. However, if prediction is the primary goal and many possible models exist, some level of automated variable selection is often desirable. Model averaging approaches, including Bayesian model averaging, have been promoted in a range of disciplines as a means of incorporating model selection uncertainty into statistical inference and prediction (Hoeting et al. 1999), and have been applied in some ecological examples (Burnham & Anderson 2002; Wintle et al. 2003).
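To indicate what model averaging involves in its simplest form, the sketch below uses information-theoretic (Akaike) weights of the kind described by Burnham & Anderson (2002): each candidate model receives a weight proportional to exp(−Δ/2), where Δ is its AIC minus the smallest AIC in the set, and predictions are combined as a weighted mean. This is one possible approach and is not the Bayesian model averaging of Hoeting et al. (1999); the AIC values, predictions and function names are hypothetical.

```python
import math


def akaike_weights(aic_values):
    """Convert a list of AIC values into weights that sum to one."""
    if not aic_values:
        raise ValueError("need at least one model")
    best = min(aic_values)
    relative_likelihoods = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative_likelihoods)
    return [r / total for r in relative_likelihoods]


def model_averaged_prediction(predictions, weights):
    """Weighted mean of the predictions made by each model for one site."""
    if len(predictions) != len(weights):
        raise ValueError("one prediction is needed per model")
    return sum(w * p for w, p in zip(weights, predictions))


if __name__ == "__main__":
    # Two hypothetical candidate models for the same species.
    weights = akaike_weights([100.0, 102.0])
    # Hypothetical predicted probabilities of occupancy at a single site.
    print(weights, model_averaged_prediction([0.8, 0.4], weights))
```

Because the weights depend only on AIC differences, adding a constant to every AIC value leaves the averaged prediction unchanged.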

Estimating and incorporating detectability in habitat models

Unless the probability of detecting a species when it is present is equal to 1, false negative observation errors will occur in species surveys. The probability of detecting the presence of the case study species in any single standard survey based on spot-lighting and call elicitation has been found to be very low (Pr[detection|presence] ~ 0.12–0.45; Wintle et al. in press), making the reliability of absence data a potentially serious form of uncertainty in our case study. Recent studies have demonstrated the negative impact that false-negative observation error may have on species-habitat analyses (Tyre et al. 2003), meta-population models (Moilanen 2002) and monitoring studies (MacKenzie et al. 2002). Recently developed techniques for incorporating detectability in model estimation (MacKenzie et al. 2002; Tyre et al. 2003) reduce bias in model estimation brought about by false absences, though little effort has been invested in testing the relative predictive performance of such models.
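The practical consequence of the detection probabilities quoted above can be shown with simple arithmetic. If repeat visits to an occupied site are assumed to be independent and to share the same per-visit detection probability p (an assumption of this sketch, not a result from the case study), the probability of recording a false absence after n visits is (1 − p)^n. The fragment below evaluates this and finds the smallest number of visits that brings the false-absence probability down to a chosen tolerance; the 0.05 tolerance and the function names are hypothetical.

```python
def prob_false_absence(p_detect, n_visits):
    """Probability that an occupied site yields no detection in n visits."""
    if not 0.0 <= p_detect <= 1.0 or n_visits < 0:
        raise ValueError("need 0 <= p_detect <= 1 and n_visits >= 0")
    return (1.0 - p_detect) ** n_visits


def visits_needed(p_detect, max_false_absence=0.05):
    """Smallest n with (1 - p)^n <= max_false_absence."""
    if not 0.0 < p_detect <= 1.0:
        raise ValueError("p_detect must lie in (0, 1]")
    if not 0.0 < max_false_absence < 1.0:
        raise ValueError("tolerance must lie in (0, 1)")
    n = 1
    while prob_false_absence(p_detect, n) > max_false_absence:
        n += 1
    return n


if __name__ == "__main__":
    # The two ends of the per-visit detection range quoted in the text.
    for p in (0.12, 0.45):
        print(p, visits_needed(p), prob_false_absence(p, visits_needed(p)))
```

Under these assumptions, a per-visit detection probability of 0.45 requires 6 visits to keep the false-absence probability at or below 0.05, and a probability of 0.12 requires 24, which illustrates why single-visit absence records for such species carry little weight.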

Incorporating biotic interactions

Despite the prevalence of interspecific competition as a key concept in community ecology (Diamond 1975), very little has been attempted by way of incorporating interspecific competition in distribution models (for one exception see Leathwick & Austin 2001). Regression models such as those described by Leathwick and Austin (2001) could be applied to fauna species. Future development of ‘loop analysis’ (Levins 1975) or systems of simultaneous regressions (Guisan & Zimmerman 2000) may facilitate spatial models of wildlife distributions that incorporate competitive interactions.

The GLMs and GAMs presented here are relatively simple, though suited to the conservation problem and available data. There is scope for greater sophistication in the modelling method where a high level of modelling expertise exists, though we believe our recommended approach represents a reasonable trade-off between practicality and rigour.

Model improvement and adaptive management

Due to environmental change, natural and anthropogenic disturbance, and population stochasticity, all predictive models will become redundant if they are not updated through time. We recommend that models be applied in an adaptive framework (Walters & Holling 1990), allowing immediate application of models in management but ensuring a commitment to continual improvement of models through future data collections and refinements to modelling methods.

Future data collection will be central to rigorous model testing and model refinements. They should be targeted to fill gaps in current data sets (Fig. 3). Particular attention should be paid to the collection of high quality observation data based on multiple site visits aimed at minimizing false absences (Wintle et al. 2004). Some important environmental variables, including those based on forest growth stage data, were not available for model fitting in the LHCC region. It is likely that models would have better predictive accuracy if forest growth stage information became available and was incorporated in models.

CONCLUSION

Habitat models are now a widely used tool in public land conservation planning. Systematic conservation planning in the urban fringe will require such tools to ensure transparency and repeatability of planning outcomes. However, access to statistical and ecological expertise for urban conservation planning is likely to be limited when compared with the public land planning processes that have taken place in Australia. It is therefore important that we continue to develop and refine planning tools, including habitat modelling methods, with an emphasis on statistical and ecological rigour and simplicity.

ACKNOWLEDGEMENTS

This work was instigated by the LHCCREMS organization (now Hunter Councils). Steven House (EcoLogical), Michael Murray (Forest Fauna Surveys P/L), Philip Gleeson (NPWS), Daniel Connolly (NPWS), Jill Smith (NPWS), Peter Bone (NPWS), Peter Ewin (NPWS), Mick Andren (NPWS) and Rod Kavanagh (SFNSW) provided data, expert guidance on model construction and valuable insights. Mark Burgman and Kuniko Yamada (University of Melbourne) helped throughout the project. Comments from Terry Walshe (University of Western Australia), David Tierney and Robbie Economos improved the manuscript substantially. Discussions with Trevor Hastie and John Leathwick helped clarify bootstrapping issues. Two anonymous reviewers provided constructive feedback. Sophie Powrie, Kirsty Winter and Meredith Liang (LHCCREMS) provided excellent support and useful feedback throughout the project.

REFERENCES

Agresti A. (1996) An Introduction to Categorical Data Analysis. John Wiley and Sons, New York.

Akaike H. (1973) Information theory and an extension of the maximum likelihood. In: Proceedings of the 2nd International Symposium on Information Theory (eds B. N. Petrov & F. Csáki) pp. 267–81. Akademia Kaido, Budapest.

Albert P. S. & McShane L. M. (1995) A generalized estimating equations approach for spatially correlated data: applications to the analysis of neuroimaging data. Biometrics 51, 627–38.

Augustin N. H., Mugglestone M. A. & Buckland S. T. (1996) An autologistic model for the spatial distribution of wildlife. J. Appl. Ecol. 33, 339–47.

Austin M. P. (1991) Vegetation: data collection and analysis. In: Nature Conservation: Cost Effective Biological Surveys and Data Analysis (eds C. R. Margules & M. P. Austin) pp. 37–41. CSIRO, Canberra.

Austin M. P. (2002) Spatial prediction of species distribution: an interface between ecological theory and statistical modelling. Ecol. Model. 157, 101–18.

Austin M. P. & Heyligers P. C. (1989) Vegetation survey design for conservation: GRADSECT sampling of forests in north-eastern NSW. Biol. Conserv. 50, 13–32.

Austin M. P. & Meyers J. A. (1996) Current approaches to modelling the environmental niche of Eucalypts: implications for management of forest biodiversity. For. Ecol. Manage. 85, 95–106.

Bambar D. (1975) The area above the ordinal dominance graph and the area below the receiver operating characteristics graph. J. Math. Psych. 12, 387–415.

Belsley D. A., Kuh E. & Welsch R. E. (1980) Regression Diagnostics. John Wiley and Sons, New York.

Bio A. M. F., Alkemande R. & Barendregt A. (1998) Determining alternative models for vegetation response analysis – a non-parametric approach. J. Veg. Sci. 9, 5–16.

Breiman L., Friedman J. H., Olshen R. A. & Stone C. J. (1984) Classification and Regression Trees. Wadsworth International Group, Belmont.

Breininger D. R., Larson V. L., Duncan B. W. & Smith R. B. (1998) Linking habitat suitability to demographic success in Florida scrub-jays. Wild. Bull. 26, 118–28.

Burgman M. A. & Lindenmayer D. B. (1998) Conservation Biology for the Australian Environment. Surrey Beatty and Sons, Chipping Norton.

Burnham K. P. & Anderson D. R. (2002) Model Selection and Multimodel Inference: a Practical Information-Theoretic Approach. Springer, New York.

Burgman M. A., Breininger D. R., Duncan B. W. & Ferson S. (2001) Setting reliability bounds on Habitat Suitability Indices. Ecol. Appl. 11, 70–8.

Carpenter G., Gillison A. N. & Winter J. (1993) DOMAIN: a flexible modelling procedure for mapping potential distributions of plants and animals. Biodivers. Conserv. 2, 667–80.

Chatfield C. (1995) Model uncertainty, data mining and statistical inference. J. Roy. Stat. Soc. Ser. A Stat. 158, 419–66.

Commonwealth of Australia (2003) Terms of Reference: Standing Committee on Environment and Heritage Inquiry into Sustainable Cities. Parliament of Australia, House of Representatives, Canberra.

Derksen S. & Keselman H. J. (1992) Backward, forward and stepwise automated subset selection algorithms: frequency of obtaining authentic and noise variables. Br. J. Math. Stat. Psych. 45, 265–82.

Diamond J. M. (1975) Assembly of species communities. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 342–444. Harvard University Press, Cambridge.

Ecotone Ecological Consultants (2001) Lower Hunter Central Coast Regional Biodiversity Strategy: Fauna survey and mapping project – Module 1: Fauna Surveys. Ecotone Ecological Consultants for LHCCREMS, Waratah, pp. 43.

Edgar R. & Belcher C. (1995) Spotted-tailed Quoll. In: The Mammals of Australia (ed. R. Strahan) pp. 67–8. Reed Books, Chatswood.

Efron B. & Tibshirani R. (1997) Improvements on cross-validation: the .632+ bootstrap method. J. Am. Stat. Ass. 92, 548–60.

Elith J. (2000) Quantitative methods for modelling species habitat: comparative performance and an application to Australian plants. In: Quantitative Methods for Conservation Biology (eds S. Ferson & M. A. Burgman) pp. 39–58. Springer-Verlag, New York.

Elith J. & Burgman M. A. (2002) Predictions and their validation: rare plants in the Central Highlands, Victoria, Australia. In: Predicting Species Occurrences: Issues of Accuracy and Scale (eds J. M. Scott, P. J. Heglund, M. L. Morrison, M. G. Raphael, W. A. Wall & F. B. Samson) pp. 303–14. Island Press, Covelo.

Elith J. & Burgman M. A. (2003) Habitat models for PVA. In: Population Viability in Plants (eds C. A. Brigham & M. W. Schwartz) pp. 203–35. Springer-Verlag, New York.

Elith J., Burgman M. A. & Regan H. M. (2002) Mapping epistemic uncertainties and vague concepts in predictions of species distribution. Ecol. Model. 157, 313–29.


Ennis M., Hinton G., Naylor D., Revow M. & Tibshirani R. (1998) A comparison of statistical learning methods on the GUSTO database. Stat. Med. 17, 2501–8.

ESRI (1997) Arcinfo 7.2. Environmental Systems Research Institute Inc., Redlands.

Faith D. P. & Walker P. A. (1996) Environmental diversity: on the best-possible use of surrogate data for assessing the relative biodiversity of sets of areas. Biodivers. Conserv. 5, 399–415.

Ferrier S. & Watson G. (1997) An Evaluation of the Effectiveness of Environmental Surrogates and Modelling Techniques in Predicting the Distribution of Biological Diversity. Department of Environment, Sports and Territories, Commonwealth of Australia, and NSW National Parks and Wildlife Service, Canberra.

Ferrier S., Watson G., Pearce J. & Drielsma M. (2002a) Extended statistical approaches to modelling spatial pattern in biodiversity in north-east New South Wales. I. Species-level modelling. Biodivers. Conserv. 11, 2275–307.

Ferrier S., Drielsma M., Manion G. & Watson G. (2002b) Extended statistical approaches to modelling spatial pattern in biodiversity: the north-east New South Wales experience. II. Community-level modelling. Biodivers. Conserv. 11, 2309–38.

Fielding A. H. & Bell J. F. (1997) A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ. Conserv. 24, 38–49.

Fotheringham A. S., Brunsdon C. & Charlton M. (2002) Geographically Weighted Regression: the Analysis of Spatially Varying Relationships. John Wiley and Sons, London.

Franklin J. (1995) Predictive vegetation mapping: geographic modeling of biospatial patterns in relation to environmental gradients. Prog. Phys. Geog. 19, 494–519.

Friedman J. H. (1991) Multivariate adaptive regression splines (with discussion). Ann. Stat. 19, 1–141.

Goldingay R. L. & Kavanagh R. P. (1993) Home-range estimates and habitat of the yellow-bellied glider (Petaurus australis) at Waratah Creek, New-South-Wales. Wildl. Res. 20, 387–404.

Graham C H Ferrier S Huettman F Moritz C amp PetersonA T (2004) New developments in museum-based infor-matics and applications in biodiversity analysis Trends EcolEvol 19 497ndash503

Guisan A amp Harrell F E (2000) Ordinal response regressionmodels in ecology J Veg Sci 11 617ndash26

Guisan A amp Zimmerman N E (2000) Predictive habitatdistribution models in ecology Ecol Model 135 147ndash86

Hanley J A amp McNeil B J (1982) The meaning and use of thearea under a Receiver Operating Characteristic (ROC)curve Radiology 143 29ndash36

Harrell F E (2001) Regression Modelling Strategies with Appli-cation to Linear Models Logistic Regression and SurvivalAnalysis Springer New York

Harrell F E Lee K L amp Mark D B (1996) Multivariateprognostic models issues in developing models evaluatingassumptions and adequacy and measuring and reducingerrors Stat Med 15 361ndash87

Hastie T amp Tibshirani R (1990) Generalized additive modelsMonographs on statistics and applied probability (edsD R Cox D V Hinkley D Rubin amp B W Silverman)Chapman amp Hall London

Hastie T Tibshirani R amp Friedman J H (2001) The Elementsof Statistical Learning Data Mining Inference and PredictionSpringer-Verlag New York

Hirzel A. H. (2001) When GIS come to life. Linking Landscape- and Population Ecology for Large Population Management Modelling: the Case of Ibex (Capra Ibex) in Switzerland. The University of Lausanne, Lausanne.

Hoeting J. A., Madigan D., Raftery A. E. & Volinsky C. T. (1999) Bayesian model averaging: a tutorial. Stat. Sci. 14, 382–401.

Houlder D. J., Hutchinson M. F., Nix H. A. & McMahon J. P. (1999) ANUCLIM User Guide, Version 5.0. Centre for Resource and Environmental Studies, Australian National University, Canberra.

Kadmon R., Farber O. & Danin A. (2003) A systematic analysis of factors affecting the performance of climatic envelope models. Ecol. Appl. 13, 853–67.

Kavanagh R. P. (2002) Conservation and management of large forest owls in southeastern Australia. In: The Ecology and Conservation of Owls (eds I. Newton, R. Kavanagh, J. Olson & I. Taylor) pp. 201–19. CSIRO, Melbourne.

Lambeck R. J. (1997) Focal species: a multi-species umbrella for nature conservation. Conserv. Biol. 11, 849–56.

Leathwick J. R. & Austin M. P. (2001) Competitive interactions between tree species in New Zealand’s old-growth indigenous forests. Ecology 82, 2560–73.

Legendre P. & Fortin M. J. (1989) Spatial pattern and ecological analysis. Vegetatio 80, 107–38.

Legendre L. & Legendre P. (1998) Numerical Ecology. Elsevier, New York.

Levins R. (1975) Evolution of communities near equilibrium. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 16–50. Harvard University Press, Cambridge.

LHCCREMS (2004) Lower Hunter Central Coast Regional Biodiversity Strategy 2004 (4 Volume). Hunter Councils on behalf of the Lower Hunter Central Coast Regional Environmental Strategy, Thornton.

Li W., Wang Z., Ma Z. & Tang H. (1999) Designing the core zone in a biosphere reserve based on suitable habitats: Yancheng Biosphere Reserve and the red crowned crane (Grus japonensis). Biol. Conserv. 90, 167–73.

Loyn R. H., McNabb E. G., Volodina L. & Willig R. (2001) Modelling landscape distributions of large forest owls as applied to managing forests in north-east Victoria, Australia. Biol. Conserv. 97, 361–76.

McCullagh P. & Nelder J. A. (1989) Generalized Linear Models. Chapman & Hall, London.

MacKenzie D. I., Nichols J. D., Lachman G. B., Droege S., Royle J. A. & Langtimm C. A. (2002) Estimating site occupancy rates when detection probabilities are less than one. Ecology 83, 2248–55.

Manel S., Dias J. M., Buckton S. T. & Ormerod S. J. (1999a) Alternative methods for predicting species distribution: an illustration with Himalayan river birds. J. Appl. Ecol. 36, 734–47.

Manel S., Dias J. M. & Ormerod S. J. (1999b) Comparing discriminant analysis, neural networks and logistic regression for predicting species distributions: a case study with a Himalayan river bird. Ecol. Model. 120, 337–47.

Miller M. E., Hui S. L. & Tierney W. M. (1991) Validation techniques for logistic regression models. Stat. Med. 10, 1213–26.

Moilanen A. (2002) Implications of empirical data quality to metapopulation model parameter estimation and application. Oikos 96, 516–30.

Moisen G. G. & Frescino T. S. (2002) Comparing five modeling techniques for predicting forest characteristics. Ecol. Model. 157, 209–25.

National Parks & Wildlife Service (1998) Eden Fauna Modelling. New South Wales National Parks & Wildlife Service, New South Wales Regional Forest Agreement Steering Committee, Canberra.

National Parks and Wildlife Service (2000) Modelling Areas of Habitat Significance for Fauna and Flora in the Southern CRA. NPWS NSW, Canberra.

Nix H. (1986) A biogeographic analysis of Australian elapid snakes. In: Atlas of Elapid Snakes of Australia (ed. R. Longmore) pp. 4–15. Australian Government Publishing Service, Canberra.

Obuchowski N. A. (1994) Computing sample size for receiver operating characteristic studies. Invest. Radiol. 29, 238–43.

Pearce J. & Ferrier S. (2000) Evaluating the predictive performance of habitat models developed using logistic regression. Ecol. Model. 133, 225–45.

Pearce J. & Ferrier S. (2001) The practical value of modelling relative abundance of species for regional conservation planning: a case study. Biol. Conserv. 98, 33–43.

Pearce J., Ferrier S. & Scotts D. (2001a) An evaluation of the predictive performance of distributional models for flora and fauna in north-east New South Wales. J. Environ. Manage. 62, 171–84.

Pearce J. L., Cherry K., Drielsma M., Ferrier S. & Whish G. (2001b) Incorporating expert knowledge and fine-scale vegetation mapping into statistical modelling of faunal distribution. J. Appl. Ecol. 38, 412–24.

Phillips S. J., Anderson R. P. & Schapire R. E. (in press) Maximum entropy modeling of species geographic distributions. Ecol. Model.

Pressey R. L., Possingham H. P., Logan V. S., Day J. R. & Williams P. H. (1999) Effects of data characteristics on the results of reserve selection algorithms. J. Biogeogr. 26, 179–91.

Pulliam H. R. (1988) Sources, sinks and population regulation. Am. Nat. 132, 652–61.

R Development Core Team (2004) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org

Reading R. P., Clark T. A., Seebeck J. H. & Pearce J. (1996) Habitat Suitability Index model for the eastern barred bandicoot, Perameles gunnii. Wildl. Res. 23, 221–35.

Reed P. C. & Lunney D. (1990) Habitat loss: the key problem for the long-term survival of koalas in New South Wales. In: Koala Summit: Managing Koalas in New South Wales (eds D. Lunney, C. A. Urquhart & P. C. Reed) pp. 9–31. University of Sydney, Sydney.

Ripley B. D. (1995) Pattern Recognition and Neural Networks – a Statistical Approach. Cambridge University Press, Cambridge.

Russel R. (1995) Yellow-bellied glider. In: The Mammals of Australia (ed. R. Strahan) pp. 226–8. Reed Books, Chatswood.

Steyerberg E. W., Eijkemans M. J. C., Harrell F. E. & Habbema J. D. F. (2001a) Prognostic modeling with logistic regression analysis: in search of a sensible strategy in small data sets. Med. Decis. Making 21, 45–56.

Steyerberg E. W., Harrell F. E., Borsboom G. J. J. M., Eijkemans M. J. C., Vergouwe Y. & Habbema J. D. F. (2001b) Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J. Clin. Epidemiol. 54, 774–81.

Stockwell D. & Peters D. (1999) The GARP modelling system: problems and solutions to automated spatial prediction. Int. J. Geogr. Inf. Sci. 13, 143–58.

Sutherland W. J. (1996) Ecological Census Techniques: a Handbook, 1st edn. Cambridge University Press, Cambridge.

ter Braak C. J. F. (1986) Canonical Correspondence Analysis: a new eigenvector technique for multivariate direct gradient analysis. Ecology 67, 1167–79.

Thompson S. K. (2002) Sampling, 2nd edn. Wiley, New York.

Tyre A. J., Tenhumberg B., Field S. A., Possingham H. P., Niejalke D. & Parris K. (2003) Improving precision and reducing bias in biological surveys by estimating false negative error rates in presence-absence data. Ecol. Appl. 13, 1790–801.

Van Horne B. (1983) Density as a misleading indicator of habitat quality. J. Wildl. Man. 47, 893–901.

Van Horne B. & Wiens J. A. (1991) Forest Bird Habitat Suitability Models and the Development of General Habitat Models. United States Department of the Interior, Fish and Wildlife Service, Washington, DC.

Venables W. N. & Ripley B. D. (2003) Modern Applied Statistics with S, 4th edn. Springer, New York.

Walters C. & Holling C. S. (1990) Large-scale management experiments and learning by doing. Ecology 71, 2060–8.

Wintle B. A., McCarthy M. A., Volinsky C. T. & Kavanagh R. P. (2003) The use of Bayesian Model Averaging to better represent the uncertainty in ecological models. Conserv. Biol. 17, 1579–90.

Wintle B. A., Elith R. J., Yamada K. & Burgman M. A. (2004) LHCCREMS fauna survey and mapping project. Module 2: Habitat modelling and conservation requirements. Lower Hunter & Central Coast Regional Environmental Management Strategy, Callaghan.

Wintle B. A., McCarthy M. A., Parris K. M. & Burgman M. A. (2004) Precision and bias of methods for estimating point survey detection probabilities. Ecol. Appl. 14, 703–12.

Wilson K. A., Westphal M. I., Possingham H. P. & Elith J. (2005) Sensitivity of conservation planning to different approaches to using species distribution data. Biol. Conserv. 122, 99–112.

Wintle B. A., Burgman M. A., McCarthy M. A. & Kavanagh R. P. (in press) The magnitude and management consequences of false negative observation error in surveys of arboreal marsupials and large forest owls. J. Wildl. Man.

Worton B. J. (1989) Kernel methods for estimating the utilization distribution in home-range studies. Ecology 70, 164–8.

Yencken D. & Wilkinson D. (2000) Resetting the Compass: Australia’s Journey Towards Sustainability. CSIRO Publishing, Melbourne.

Zaniewski A. E., Lehmann A. & Overton J. M. (2002) Predicting species distribution using presence-only data: a case study of native New Zealand ferns. Ecol. Model. 157, 261–80.

APPENDIX 1

Method for 0.632+ bootstrap evaluation, after Harrell et al. (1996) and Efron and Tibshirani (1997).

1. Develop model on all n observations.
2. Calculate the statistic(s) of choice for evaluation on the same data (i.e. the training data); call this Stat_app because it is the apparent value of the statistic.
3. Take a bootstrap sample, i.e. a sample of size n with replacement of rows of the data matrix. Keep track of which sites are in the bootstrap sample and which are excluded.
4. Fit the model on the bootstrap sample (using the same methods as used on the full set).
5. Compute the statistic on the bootstrap data set (observations vs. fitted values) and call it Stat_boot.
6. Also compute the statistic on a version of the bootstrap data where the observations are randomized (Stat_permute).
7. Use the bootstrap model to predict to the excluded data set; calculate the statistic on these predictions, Stat_excl.
8. Use Stat_boot, Stat_permute and Stat_excl to calculate the amount of overfitting, the relative overfitting rate and weights that are then used to make a best estimate of predictive performance, Stat_best_est. This statistic puts most emphasis on predictions to the excluded data, particularly when the model is overfitted (i.e. when Stat_boot − Stat_excl is large). For details of this step see Steyerberg et al. (2001b).
9. Measure how optimistic the fit on the bootstrap sample was: O = Stat_boot − Stat_best_est.
10. Repeat steps 3–7 100–200 times.
11. Calculate an average optimism, Ō.
12. Use Ō to correct Stat_app for its optimism: Stat_app − Ō. This is a near unbiased estimate of the expected value of the external predictive performance of the process that generated Stat_app.
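The steps above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names, the `fit` and `stat` callables and the data layout (rows of `(predictors, observation)` pairs) are assumptions made for the example. The weighting in step 8 is assumed to follow the usual 0.632+ form, with weight 0.632 / (1 − 0.368 R) and R the relative overfitting rate; Steyerberg et al. (2001b) should be consulted for the exact procedure. The optimism is recorded inside each replicate, and the statistic is assumed to be one for which larger values indicate better performance.

```python
import random
from statistics import mean


def best_estimate(stat_boot, stat_permute, stat_excl):
    """Step 8 (assumed 0.632+ weighting) for a 'larger is better' statistic."""
    overfit = stat_boot - stat_excl          # amount of overfitting
    headroom = stat_boot - stat_permute      # gap to the no-information value
    if overfit <= 0 or headroom <= 0:
        rate = 0.0                           # no evidence of overfitting
    else:
        rate = min(1.0, overfit / headroom)  # relative overfitting rate
    weight = 0.632 / (1.0 - 0.368 * rate)    # rises from 0.632 towards 1
    return (1.0 - weight) * stat_boot + weight * stat_excl


def corrected_statistic(rows, fit, stat, n_boot=200, seed=0):
    """Steps 1-12; fit(rows) returns a model, stat(model, rows) a number."""
    rng = random.Random(seed)
    n = len(rows)
    stat_app = stat(fit(rows), rows)                      # steps 1-2
    optimism = []
    for _ in range(n_boot):                               # step 10
        picks = [rng.randrange(n) for _ in range(n)]      # step 3
        in_sample = set(picks)
        boot = [rows[i] for i in picks]
        excluded = [rows[i] for i in range(n) if i not in in_sample]
        if not excluded:
            continue                                      # nothing held out
        model = fit(boot)                                 # step 4
        stat_boot = stat(model, boot)                     # step 5
        shuffled = [obs for _, obs in boot]
        rng.shuffle(shuffled)                             # randomize observations
        permuted = [(x, obs) for (x, _), obs in zip(boot, shuffled)]
        stat_permute = stat(model, permuted)              # step 6
        stat_excl = stat(model, excluded)                 # step 7
        best = best_estimate(stat_boot, stat_permute, stat_excl)  # step 8
        optimism.append(stat_boot - best)                 # step 9
    return stat_app - mean(optimism)                      # steps 11-12
```

In use, `fit` would wrap whatever model-building procedure was applied to the full data set (including any variable selection), so that each bootstrap replicate repeats the whole process.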

assumes data are derived from a binomial process, which is the correct distribution when data are binary and observations independent. Regression methods, primarily GLMs (McCullagh & Nelder 1989) and GAMs (Hastie & Tibshirani 1990), have been a commonly applied method for modelling and predicting habitat occupancy for planning purposes (e.g. National Parks & Wildlife Service 1998, 2000; Li et al. 1999; Loyn et al. 2001). One of the advantages of GLMs and GAMs is the availability of free statistical software (R Development Core Team 2004) and detailed documentation and guidance for fitting and interpreting models (Harrell 2001; Hastie et al. 2001; R Development Core Team 2004). For these and other reasons described below, we focus on the use of GLMs and GAMs in describing species-habitat relationships and predicting the spatial distribution of suitable habitats.

GLMs and GAMs in habitat modelling

All GLMs are composed of a random component described by the assumed distribution of the observation data (either binomial or Poisson for many wildlife observation data), a systematic component specifying a linear combination of explanatory (or independent) variables, and a ‘link’ between the random and systematic components of the model that specifies how the mean response (i.e. observation) relates to the explanatory variables in the linear predictor (Agresti 1996). When observation data are binary (presence–absence), the expected value may be modelled as Pr(Y = 1), using the ‘logit’ transformation to link the random and systematic components. In this case the regression model becomes (Agresti 1996):

$$\operatorname{logit}(p_i) = \log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} \qquad (1)$$

where $p_i$ is the probability that the species will be present at site i, $\beta_0$ is the intercept coefficient, the $x_k$ are the habitat variables and the $\beta_k$ are habitat variable coefficients. Equation 1 defines the special case of the GLM known as the logistic regression model.

Generalized additive models are a non-parametric generalization of GLMs in which the relationships between the dependent and independent variables are defined by non-parametric smoothing functions (Hastie et al. 2001). In practice this means that the linear predictor ($\beta_0 + \sum_{j=1}^{k} \beta_j x_j$) that defines the relationship between the dependent and explanatory variables in GLMs is replaced by smoothing functions ($\beta_0 + \sum_{j=1}^{k} f_j(x_j)$). The $f_j$ are estimated in a ‘flexible’ manner and there are a range of alternative smoothers available (Hastie et al. 2001). This flexibility confers an advantage to GAMs over GLMs in that they are able to ‘fit’ data more closely for a given number of degrees of freedom, because they are not constrained to fit predefined parametric shapes (Bio et al. 1998). However, for the same reason, GAMs cannot be as easily interpreted as GLMs. Indeed, GAMs do not actually have a retrievable model formula in the classic sense, and interpretation generally requires a plot of the fitted response curves. Statistical packages such as R (R Development Core Team 2004) provide this facility for both GLMs and GAMs. GAMs can be fitted with the same ‘link’ functions as GLMs, so are capable of fitting logistic regression models.

It is easy to build models that are ecologically unrealistic. Predictions from unrealistic models will have large errors and are likely to be less robust and generalizable than those from realistic models (Austin 2002). Model interpretability should therefore influence the choice of modelling method, because a model can only be checked for its realism if it is interpretable. Two key features of ecological realism are the choice of explanatory variables included in the model and the shape of the response fitted for those variables (Austin 2002). Methods such as neural networks and genetic algorithms are difficult to interpret on both counts. Ennis et al. (1998) found that logistic regression provided as good or better performance than more complicated methods, including multivariate adaptive regression splines (Friedman 1991) and back-propagated neural networks (Ripley 1995). Similar results have been found in other comparisons (Elith & Burgman 2002; Moisen & Frescino 2002). Even though GAMs tend to perform better than GLMs in such comparisons, the simplicity of GLMs, their broad availability in statistical packages, the ease with which they can be applied within a geographical information system (GIS) framework and the ready availability of prediction intervals mean that they are still useful and frequently implemented. The construction of a mathematical formula that describes the relationship between the dependent variable and the environment provides a compact and communicative representation of a model. A logistic regression GLM formula that is published in a report or thesis can be used to predict species occurrence and compute prediction intervals without any direct reference to the data on which the model was fitted (providing that due care is taken not to extrapolate beyond the environmental scope of the original data). This degree of generality is not a feature of other modelling methods. A further strength of GLMs is the ease with which uncertainty about coefficients and predictions can be conveyed as standard errors and prediction intervals. Prediction intervals are not yet easily obtained from GAMs.
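To illustrate how a published logistic regression formula can be applied without the original data, the sketch below inverts the logit link of Equation 1 to recover the probability of presence at a site. The function name and the coefficient values are hypothetical and are not taken from any model in this paper.

```python
import math


def occupancy_probability(intercept, coefficients, habitat_values):
    """Invert Equation 1: p = 1 / (1 + exp(-(b0 + sum_k b_k * x_k)))."""
    if len(coefficients) != len(habitat_values):
        raise ValueError("one habitat value is needed per coefficient")
    linear_predictor = intercept + sum(
        b * x for b, x in zip(coefficients, habitat_values)
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical coefficients for two habitat variables at one grid cell.
p = occupancy_probability(-2.0, [0.03, 1.5], [40.0, 0.8])
```

Applying the same function to every grid cell of the mapped habitat variables would yield a habitat map, subject to the caveat above about extrapolating beyond the environmental range of the fitting data.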

A CASE STUDY: HABITAT MODELLING FOR CONSERVATION PLANNING IN CENTRAL NSW

The Lower Hunter and Central Coast (LHCC) region of New South Wales (NSW) presents an opportunity to study urbanization pressures in a relatively pristine landscape with substantial biodiversity values. The region includes the seven local councils of Cessnock, Gosford, Lake Macquarie, Maitland, Newcastle, Port Stephens and Wyong (Fig. 1). It comprises large areas of native forest in both public and private tenures (National Parks & Wildlife Service 2000). The region is approximately 7200 square kilometres, of which approximately 65% is under forest cover. The LHCC Regional Environmental Management Strategy (LHCCREMS) seeks to integrate biodiversity information into future land use planning and development in the region (LHCCREMS 2004). Biodiversity projects under the LHCCREMS include the development of vegetation mapping for the region; a gap analysis of biological survey data for the region and subsequent fauna surveys to augment existing biodiversity data (Ecotone Ecological Consultants 2001); the development of wildlife habitat models and habitat maps for priority species in the region (Wintle et al. 2004); and the development of preliminary conservation recommendations for each species (LHCCREMS 2004). This paper reports on the methods used to develop and evaluate habitat models and maps for each of the priority species in the region.

Priority species and their habitat requirements

Seven fauna species were selected for modelling on the basis of data availability and ‘priority’ status, with priority largely determined by threatened species status. By choosing species on the basis that they were deemed particularly sensitive to identifiable threats, the selection process loosely followed the rationale of the focal species approach (Lambeck 1997). The primary threatening process in the LHCC region is land clearing for development, though commercial forestry on public land may potentially pose a threat to some species (LHCCREMS 2004).

The koala (Phascolarctos cinereus), the yellow-bellied glider (Petaurus australis) and the squirrel glider (Petaurus norfolcensis) are arboreal marsupials that are broadly distributed throughout forest and woodlands of eastern Australia. The suitability of habitat for arboreal marsupials is influenced by the size and species of trees present, soil nutrients, climate, rainfall, and the size and disturbance history of the habitat patches (Reed & Lunney 1990). The two gliders are specifically reliant on tree hollows for shelter. The squirrel glider requires food supplied by flowering acacias and banksias (Russel 1995) and the yellow-bellied glider needs particular species of sap-trees and winter-flowering eucalypts (Goldingay & Kavanagh 1993). Available habitat for these species is being lost and increasingly fragmented by urban and infrastructure development, agriculture and mining (Reed & Lunney 1990).

Fig. 1. The Lower Hunter Central Coast (LHCC) region, situated north of Sydney in New South Wales. Seven local councils are located in the region. Light shading indicates areas under forest cover. [Map near 151°E, 33°S; legend: vegetation cover (vegetated, non-vegetated) and local government areas.]

The tiger quoll (Dasyurus maculatus) is a medium-sized carnivorous marsupial that inhabits a wide range of habitat types, from sclerophyll forest and woodlands to coastal heathlands and rainforests (Edgar & Belcher 1995). The species requires suitable den sites (such as hollow logs, tree hollows, rock outcrops or caves) and a large area of intact vegetation for foraging (Edgar & Belcher 1995).

The powerful owl (Ninox strenua), the sooty owl (Tyto tenebricosa) and the masked owl (Tyto novaehollandiae) are broadly distributed throughout forest, woodland and rainforest of eastern Australia. All three owl species tolerate some degree of fragmentation and discontinuity in forest cover. However, they rely on large tree hollows for nesting and denning, and prey on forest-dependent species such as marsupial gliders and small ground mammals, making them susceptible to declines in prey abundance resulting from high levels of forest fragmentation (Kavanagh 2002).

The choice of model variables was driven by what was known about the habitat requirements of each species. Of the species modelled, home range estimates vary from as low as 0.65 ha for the squirrel glider (Russel 1995) to 20 km² for the tiger quoll (Edgar & Belcher 1995). A modelling grid cell size of 1 ha was chosen on the basis that it approximates the home range size of the smallest-ranging species.

Data collation, filtering and handling

Biological data

Data for the seven target species were obtained from the biological systematic survey (BSS) module of the NSW NPWS Wildlife Atlas database and from surveys commissioned by the LHCCREMS, specifically undertaken to fill gaps in the Atlas data for priority species (Ecotone Ecological Consultants 2001). Systematic survey data contained records of both presence and absence for each of the seven priority species. The availability of systematic survey data reduced the effort required to prepare data for modelling, as survey method and effort covariates were available for data filtering. Only data collected after 1990 and using survey methods appropriate to detect each species were used in modelling. Records were filtered to ensure a minimum geographical separation of 500 m, in an attempt to increase the probability that observations were independent. Presence records were retained in preference to absence records when two (or more) records fell within 500 m of each other. The data that remained for model building and testing are summarized in Table 1.
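The separation filter described above could be sketched as a simple greedy pass, shown here in Python. The paper does not specify the algorithm used, so this is one plausible reading: the record layout (dictionaries with `x`, `y` in projected metres and a 0/1 `present` flag) and the function name are assumptions for illustration.

```python
import math


def thin_records(records, min_separation=500.0):
    """Keep a record only if it lies at least min_separation metres from
    every record already kept; presences are considered before absences."""
    # Stable sort: presences (1) first, original order preserved within class.
    ordered = sorted(records, key=lambda r: r["present"], reverse=True)
    kept = []
    for rec in ordered:
        far_enough = all(
            math.hypot(rec["x"] - k["x"], rec["y"] - k["y"]) >= min_separation
            for k in kept
        )
        if far_enough:
            kept.append(rec)
    return kept
```

Because presences are processed first, an absence is dropped whenever it falls within 500 m of a retained presence, which mirrors the stated preference. Note that the result of a greedy pass depends on record order within each class.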

Survey method or effort covariates may be incorporated in models when survey data are collected using methods of variable reliability (Pearce et al. 2001a). In our study, such covariates were not required because data used for modelling were relatively uniform with respect to method and effort.

Environmental data

One of the primary limitations on the predictive performance of wildlife habitat models is the availability of broadly mapped environmental variables that are closely related to environmental attributes that affect the distribution of wildlife. Austin (2002) delineated two types of independent variables for model building: (i) ‘proximal’ (direct) variables are those that represent resource, shelter or thermal gradients that have a direct influence on a species distribution (e.g. temperature and foliar nutrients); and (ii) ‘distal’ (or indirect) variables that have no physiological effect on the species, but are correlated with ‘proximal’ variables (e.g. altitude, latitude). Modelling with proximal variables will more often produce a model that makes transportable and robust predictions, whereas models based on distal predictors are likely to be more specific to the location in which they were constructed (Austin 2002). However, direct variables are not always available as GIS layers because they tend to be difficult to map (Guisan & Zimmerman 2000), so model building for the purpose of prediction is often undertaken using distal variables.

Experts should play a role in the identification of environmental variables that may be important predictors of a species habitat. They may be useful in developing new variables that are combinations or derivations of distal variables. For example, indices such as a ‘hollows index’ or ‘foliar nutrient index’ have been developed from maps of forest age and floristic composition and used in habitat modelling in the past (National Parks & Wildlife Service 1998). Similarly, vegetation variables with many categories of vegetation are often difficult to use as habitat variables in their raw form. Experts may be used to aggregate vegetation classes into those most likely to be of relevance to the species being modelled, or to develop interaction terms between, for example, vegetation type, topographic context and forest age. Derived variables may serve as useful predictors of habitat where the raw variables do not. Similarly, neighbourhood measures such as ‘the proportion of old forest within 2 km’ have also been used successfully in habitat modelling (Ferrier et al. 2002a). Neighbourhood measures convey the local environmental context of a site, which may be as important as the attributes of the site itself, especially in the case where the home range of a species is larger than the cell size used in modelling, or where survey locations are imprecisely known. Compiling a set of candidate environmental variables should involve careful consideration of the biology of the species being modelled.

Table 1. Number of presence and absence sites derived from the LHCCREMS and BSS databases used in model building and evaluation for each of the seven priority species

Species                 Presence records   Absence records   Total
Koala                   88                 162               250
Tiger quoll             36                 75                111
Squirrel glider         112                129               241
Yellow-bellied glider   92                 152               244
Masked owl              55                 149               204
Powerful owl            97                 142               239
Sooty owl               56                 156               212

BSS, biological systematic survey; LHCCREMS, The Lower Hunter and Central Coast Regional Environmental Management Strategy.

A set of climatic variables derived from ANUCLIM (Houlder et al. 1999) was available for use in model development. These may have a direct role in determining species distributions through metabolic constraints. However, the remainder of available predictor variables would be considered distal. Neighbourhood measures were derived and tested for all species at a range of neighbourhood distances. The absence of predictor variables relating to forest growth stage, a commonly used surrogate for tree-hollow and denning availability (National Parks & Wildlife Service 1998, 2000; Loyn et al. 2001), is likely to place substantial limitations on the predictive performance of final models. All environmental variables were stored as spatial layers (or grids) in a GIS with a grid cell resolution of 100 m, which was satisfactory with respect to the home range size of the target species.

Testing the adequacy of survey data based on environmental strata

There was a sufficient number of species survey data points in the region for fitting statistical models for each of the seven priority species. However, having a sufficient number of data points for fitting a regression model does not guard against inherent bias in survey data. To address concerns about geographical and environmental biases, data were tested for their coverage of key environmental strata. This was undertaken to ensure that models developed would be relevant to the range of environmental conditions present in the region and that excessive extrapolation of fitted responses would not be required. Environmental strata were defined by overlaying GIS raster maps (1 ha grid cell size) of four key variables: broad vegetation cover (seven classes), topographic position (three classes), mean annual temperature (four classes) and mean annual rainfall (seven classes), resulting in 384 possible strata, 256 of which were represented by at least 50 ha of forest in the region. By overlaying survey locations and the map of environmental strata, it is possible to tabulate the proportion of sampled versus unsampled strata for each species. Maps showing the regional coverage of sampled and unsampled strata were created for each species to illustrate where model predictions may be less reliable and where future biological surveys could be targeted.
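The tabulation of sampled versus unsampled strata might be sketched as below. This is an illustrative Python outline only; the study used GIS overlays, and the data structures here (a dictionary from forested 1 ha cell identifiers to stratum tuples, and a list of surveyed cell identifiers) are assumed for the example.

```python
from collections import Counter


def strata_coverage(cell_strata, survey_cells, min_cells=50):
    """Split well-represented strata into sampled and unsampled sets.

    cell_strata: dict mapping a forested 1 ha cell id to its stratum, e.g. a
        tuple of (vegetation, topography, temperature, rainfall) classes.
    survey_cells: iterable of cell ids containing at least one survey site.
    min_cells: minimum cells (hectares) for a stratum to be considered.
    """
    area = Counter(cell_strata.values())              # hectares per stratum
    represented = {s for s, ha in area.items() if ha >= min_cells}
    surveyed = {cell_strata[c] for c in survey_cells if c in cell_strata}
    sampled = surveyed & represented
    unsampled = represented - sampled
    return sampled, unsampled
```

The proportion of sampled strata for a species is then `len(sampled) / (len(sampled) + len(unsampled))`, and cells belonging to `unsampled` strata indicate where predictions rely on extrapolation.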

Model development and evaluation

Data preparation

Geographical information system layers representing environmental variables were sampled at each survey location using ArcInfo (ESRI 1997) to construct a modelling data frame for each species. Similar functions are available in most GIS software. Each row of data contained the survey observation (1 = species present, 0 = species absent) and values for each of the candidate predictor variables at the survey locations. This resulted in an n × (k + 1) data matrix, where n is the number of survey locations and k is the number of potential predictor variables.

In order to construct presence-only models, a second data matrix was created for each species that contained records for which the species was present. Background samples (sometimes termed ‘pseudo-absence’ data) were generated for 10 000 random locations across the landscape according to methods described by Ferrier and Watson (1997) and using software developed by Landcare New Zealand (J. Overton, pers. comm. 2005). The rationale for using background samples when no ‘real’ or systematic absence data exist is that they can be used to create an environmental profile of the study area, which is then compared with the environmental profile of known ‘presence’ locations. An n × (k + 1) modelling matrix was created as for the presence–absence data, where n is equal to the number of presence observations plus the 10 000 random ‘observations’ of absence.
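As a rough illustration of the structure of the presence-only modelling matrix, the Python sketch below stacks presence rows with randomly drawn background rows. It uses plain uniform random sampling of landscape cells, which is an assumption made for the example and not the Ferrier and Watson (1997) procedure or the Landcare software; all names are hypothetical.

```python
import random


def presence_only_matrix(presence_rows, landscape_rows, n_background=10_000, seed=0):
    """Build an n x (k + 1) matrix: response in column 0, k predictors after.

    presence_rows: predictor values at each presence location.
    landscape_rows: predictor values for every grid cell in the study area.
    """
    rng = random.Random(seed)
    # Background locations are drawn at random (with replacement) from the
    # landscape and coded 0, regardless of whether the species occurs there.
    background = [rng.choice(landscape_rows) for _ in range(n_background)]
    matrix = [[1, *row] for row in presence_rows]
    matrix += [[0, *row] for row in background]
    return matrix
```

The resulting matrix has one row per presence record plus one per background sample, so for the koala (88 presences in Table 1) it would contain 10 088 rows if all presence records were used.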

Data transformation

In some instances the distribution of environmental variables at survey locations may be long-tailed. This may bring about problems with model fitting due to points in the tails having high leverage, and tends to reduce the explanatory power of independent variables. Data may be transformed to avoid this problem. A common transformation is the log-transformation, though a range of other transformations exist. Inspection of our modelling data revealed no substantial problems with long-tailed predictor variables.

Variable reduction

Approximately 50 variables were available that described the environmental characteristics which may govern site occupancy for the seven target species. It is common to offer all possible variables as candidate model predictors (e.g. National Parks & Wildlife Service 1998) and utilize an automated variable selection routine to eliminate inappropriate predictors and to specify the final model. However, offering many candidate variables tends to result in models that include nonsense predictors and exclude variables that in fact influence the probability of occupancy (Derksen & Keselman 1992; Chatfield 1995; Steyerberg et al. 2001a). This is especially problematic when the number of species observations is low compared with the number of candidate predictors.

An alternative approach is to minimize the number of variables that are offered to the variable selection routine. Harrell et al. (1996) recommend a rule of thumb that fewer than m/10 predictor degrees of freedom (PDF) should be offered as candidates to a variable selection routine such as backward selection, where m equals the number of observations of the least prevalent class in the modelling data set (often the number of presence observations). PDF indicates the number of possible parameters estimated in the largest model that could be constructed from the set of candidate predictors. The number of parameters depends on the number of predictors and the form in which the predictor variables are used in the model (see next section). For categorical variables or non-linear responses, there is more than one PDF for each environmental variable. For example, categorical variables have a parameter for all but one category, sometimes leading to many PDF per categorical variable. Similarly, quadratic or cubic functional forms require two and three PDF, respectively. In our situation, m ranges from 36 for the tiger quoll to 112 for the squirrel glider (Table 1). Therefore, the maximum number of candidate PDF offered to the variable selection routine should be no more than 4 or 11, respectively. In this case, the allowable PDF is much less than the total number available. The number of candidate PDF was first reduced by removing distal variables that were highly correlated with proximal variables (R > 0.6) in the context of biological or metabolic requirements of the species.
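The m/10 rule of thumb and the counting of PDF can be made concrete with a short Python sketch. The function names and the example candidate set are hypothetical. Rounding m/10 to the nearest integer is an assumption chosen because it reproduces the limits of 4 and 11 quoted above; a strict reading of the rule would round down.

```python
PDF_PER_FORM = {"linear": 1, "quadratic": 2, "cubic": 3}


def pdf_budget(n_presence, n_absence):
    """Approximate PDF limit: m / 10, with m the size of the rarer class."""
    m = min(n_presence, n_absence)
    return round(m / 10)


def candidate_pdf(candidates):
    """Total PDF for (name, form) pairs.

    form is "linear", "quadratic" or "cubic" for continuous variables, or an
    integer number of categories for a categorical variable (levels - 1 PDF).
    """
    total = 0
    for _name, form in candidates:
        if isinstance(form, int):
            total += form - 1
        else:
            total += PDF_PER_FORM[form]
    return total


# Hypothetical candidate set: 2 + 1 + 6 = 9 PDF.
example = [("rainfall", "quadratic"), ("temperature", "linear"), ("vegetation", 7)]
over_budget = candidate_pdf(example) > pdf_budget(36, 75)
```

With the tiger quoll counts from Table 1 (36 presences, 75 absences), this small candidate set already exceeds the budget, which illustrates why a seven-class vegetation variable would need aggregating before being offered to a selection routine.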

Irrespective of variable reduction issues, fitting statistical models with collinear predictor variables may cause statistical problems and should be avoided (Belsley et al. 1980). Expert opinion and previous habitat models were consulted where further variable reduction was required. The number of candidate model variables (and PDF) offered to the variable selection routine varied for each species, depending on the amount of biological survey data available. The final set of candidate predictors presented in Table 2 includes variables that were offered as candidates for at least one species.

Variable form

One of the strengths of GLMs and GAMs is the possibility of fitting non-linear relationships between the dependent variable (probability of occurrence) and the independent environmental variables. However, it may be difficult to know a priori whether a particular relationship is likely to be linear, quadratic, cubic or other. One strategy that is commonly employed is to allow an automated variable selection routine full freedom to fit any relationships using a trade-off between complexity and variance explained. However, it has been shown that such an approach may result in the fitting of spurious relationships with no logical interpretation (Steyerberg et al. 2001a). Consequently, we recommend either choosing functional forms with no automated variable selection, or limiting the range of functional forms available to the variable selection algorithm to those that are supported by sound ecological intuition and preliminary data analysis. In this exercise, we conduct preliminary analyses on response shapes by fitting univariate GAMs with five degrees of freedom and visually inspecting plots of response shapes (Austin & Meyers 1996). This approach allows the user to assess whether more complicated responses appear sensible and are justified by the data. Visual inspection of fitted response shapes is used again later in the process for evaluating the ecological realism of final selected models (see below).

Variable selection and model fitting

Final GLMs and GAMs for all species were selected and fitted to both presence–absence and presence-background data using a backward stepwise variable selection algorithm (Venables & Ripley 2003) in R, resulting in four models for each species. The variable selection algorithm tests a series of nested models using Akaike’s information criterion (AIC; Akaike 1973; Venables & Ripley 2003) to select between models. The choice of algorithm may influence the structure of the final model, and alternative automated model selection algorithms, including forward selection, are available. The backward selection algorithm is generally preferred to the other standard automated methods because it generally performs better in the presence of collinear candidate variables and because it requires consideration of the full model fit (Harrell 2001). There is a substantial body of literature dedicated to model selection issues, and automated algorithms have been criticized (Chatfield 1995; Hoeting et al. 1999; Harrell 2001). However, the alternatives tend to be complicated. We have chosen to use a packaged model selection algorithm as a compromise between practicality and sophistication. We reiterate that expert reduction of predictor variables prior to final model selection is critical to the successful application of automated variable selection routines (Harrell 2001).
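The greedy logic of backward selection by AIC can be sketched as follows. This is not the packaged R routine used in the case study; it is a minimal Python illustration in which `aic_of` is a hypothetical user-supplied function returning the AIC of a model fitted with a given list of terms, and the toy deviance gains are invented for the example.

```python
def backward_select(candidates, aic_of):
    """Drop one term at a time for as long as doing so lowers the AIC."""
    current = list(candidates)
    best = aic_of(current)
    while current:
        # AIC of every model obtained by deleting a single term.
        trials = [(aic_of([t for t in current if t != drop]), drop)
                  for drop in current]
        trial_aic, drop = min(trials)
        if trial_aic >= best:
            break  # no single deletion improves the model
        current.remove(drop)
        best = trial_aic
    return current, best


# Toy stand-in for model fitting: each term lowers the deviance by a fixed,
# invented amount, and AIC = deviance + 2 * (number of terms + intercept).
GAINS = {"rain": 30.0, "rugg500": 12.0, "solar": 0.5, "elev": 1.0}


def toy_aic(terms):
    deviance = 200.0 - sum(GAINS[t] for t in terms)
    return deviance + 2 * (len(terms) + 1)


selected, aic = backward_select(list(GAINS), toy_aic)
print(selected, aic)
```

In this toy setting the two terms whose deviance gain is smaller than the AIC penalty of 2 are removed, and the search stops once any further deletion would raise the AIC.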

Model evaluation statistics

Presence–absence models were evaluated using two statistics: (i) the area under the receiver operating characteristic (ROC) curve (Hanley & McNeil 1982) (ROC area), closely related to the Mann–Whitney U-statistic; and (ii) Miller’s calibration slope (Miller et al. 1991; Pearce & Ferrier 2000). The ROC area evaluates a model’s ability to discriminate between presence and absence sites and is therefore referred to as a measure of model (or predictive) ‘discrimination’ (Pearce & Ferrier 2000). This provides an indication of the usefulness of the models for prioritizing areas in terms of their relative importance as habitat for the particular species. The ROC area ranges from 0 to 1, where a score of 1 implies perfect discrimination and a score of 0.5 implies predictive discrimination that is no better than a random guess (Bambar 1975). The actual value of the ROC area has a straightforward interpretation. It is the probability that, for a randomly selected pair of presence–absence observations derived from field surveys, the model prediction for presence will be greater than the prediction for absence.
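The pairwise interpretation of the ROC area can be computed directly by counting pairs, as in the Python sketch below. The prediction values are invented for illustration, and counting tied pairs as one half is the usual Mann–Whitney convention rather than something stated in the text.

```python
def roc_area(presence_preds, absence_preds):
    """Proportion of presence-absence pairs in which the presence site
    receives the higher prediction (ties count as one half)."""
    wins = 0.0
    pairs = 0
    for p in presence_preds:
        for a in absence_preds:
            pairs += 1
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / pairs


# Invented predictions for three presence sites and four absence sites.
presence = [0.9, 0.7, 0.4]
absence = [0.6, 0.3, 0.2, 0.1]
print(roc_area(presence, absence))  # 11 of the 12 pairs are ranked correctly
```

Only one pair is ranked wrongly here (the presence site predicted at 0.4 against the absence site predicted at 0.6), so the ROC area is 11/12.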

Miller’s calibration statistics (MCS) evaluate the ability of a model to correctly predict the proportion of sites with a given environmental profile that will be occupied. MCS are derived from a logistic regression of observations on the logit of predicted probabilities. The rationale is that the slope of the regression would be equal to one and the regression intercept would equal zero if the predictions from the model were perfectly calibrated (Harrell 2001; Pearce & Ferrier 2000). The model is known as the ‘logistic calibration equation’. The two calibration statistics are literally the logistic regression slope (‘calibration slope’) and intercept (‘calibration intercept’), though it is common just to report the calibration slope.
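The logistic calibration equation can be illustrated with a small Python sketch that regresses observations on the logit of the predictions using Newton-Raphson. The function names and data are hypothetical, the sketch assumes predictions strictly between 0 and 1 and at least two distinct predicted values, and it is not the R code referred to elsewhere in this paper.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def calibration_stats(observed, predicted, n_iter=100, tol=1e-10):
    """Fit observed ~ intercept + slope * logit(predicted) by maximum
    likelihood and return (intercept, slope)."""
    x = [logit(p) for p in predicted]
    a, b = 0.0, 1.0  # start at perfect calibration
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, observed):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu            # gradient with respect to the intercept
            g1 += (yi - mu) * xi     # gradient with respect to the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b


# Invented example: predictions of 0.1 and 0.9 at sites where the observed
# occupancy rates are 1 in 4 and 3 in 4, i.e. predictions that are too extreme.
observed = [1, 0, 0, 0, 1, 1, 1, 0]
predicted = [0.1] * 4 + [0.9] * 4
intercept, slope = calibration_stats(observed, predicted)
print(round(intercept, 3), round(slope, 3))
```

A slope below one, as in this example, indicates predictions that are more extreme than the data support; a slope of one with an intercept of zero corresponds to perfect calibration.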

The presence-background models were evaluated qualitatively. The statistics available for evaluating presence-only models are limited and a fair comparison with the presence–absence models was not possible here.

Table 2. Abbreviated names and definitions of mapped environmental data used as candidate predictor variables for inclusion in habitat models. All environmental data were available in raster format with 100 m (side length) grid cell size

Candidate variable      Definition
Temp                    Mean annual temperature, derived from ANUCLIM
Cold                    Mean temperature of the coldest period, derived from ANUCLIM
Rain                    Mean annual rainfall, derived from ANUCLIM
Dry2000                 The percentage of cells in a 2000-m radius containing dry forest
Rf2000                  The percentage of cells in a 2000-m radius containing rainforest
Solar                   The solar radiation index of a cell, derived from ANUCLIM
Elev                    The elevation of a cell (in metres) above sea level
Rugg250 (500, 1000)     Topographic ruggedness (standard deviation in elevation) in a 250-m, 500-m and 1000-m radius
Terr250 (500, 1000)     Relative terrain position in a 250-m, 500-m and 1000-m radius
Topo                    The topographic position of a cell, ranging from gully to ridge top (0–100)
Unmod500                The percentage of cells in a 500-m radius containing unmodified forest
Unmod2000               The percentage of cells in a 2000-m radius containing unmodified forest
Wet500                  The percentage of cells in a 500-m radius containing wet forest
Wet2000                 The percentage of cells in a 2000-m radius containing wet forest
Fert                    Index of the soil nutrient content at a site based on geochemical data (CSIRO)
Percnonfor2k            The percentage of cells in a 2000-m radius classified as cleared land
Ybglexp2000†            The percentage of cells in a 2000-m radius containing suitable ybgl habitat
Sqglexp2000†            The percentage of cells in a 2000-m radius containing suitable sqgl habitat
Sowlexp2000†            The percentage of cells in a 2000-m radius containing suitable sowl habitat
Mowlexp2000†            The percentage of cells in a 2000-m radius containing suitable mowl habitat
Powlexp2000†            The percentage of cells in a 2000-m radius containing suitable powl habitat
Koalexp2000†            The percentage of cells in a 2000-m radius containing suitable koal habitat

†Denotes expert variable derived from vegetation classes (Michael Murray, pers. comm. 2004). koal, koala; mowl, masked owl; powl, powerful owl; quol, tiger quoll; sowl, sooty owl; sqgl, squirrel glider; ybgl, yellow-bellied glider.

The model evaluation method (bootstrapping)

Presence–absence models for each species were evaluated using a ‘bootstrapping’ approach (Efron & Tibshirani 1997; Harrell 2001). Naively testing a model on the data that were used to fit it (also known as ‘in-sample validation’) is inappropriate and known to provide optimistic estimates of model performance (Fielding & Bell 1997; Steyerberg et al. 2001b), despite being a commonly used approach to model evaluation (National Parks & Wildlife Service 1998, 2000). Ideally, models should be tested on a completely independent data set with data collection and stratification specifically designed for model evaluation. However, this is seldom a practical option and may also be prone to high variance unless samples are large (J. Elith, unpublished data, 2005). Bootstrapping provides a realistic estimate of the predictive performance of a model without incurring the expenses of collecting a completely new model-testing data set. Bootstrapping involves resampling the modelling data and conducting a series of model building and testing simulations that provide an estimate of the optimism arising from in-sample validation. The estimate of optimism is used to provide an adjusted estimate of the model evaluation statistics (ROC and MCS). The bootstrapping version implemented here is believed to provide the least biased estimate of predictive performance of any of the model evaluation methods that are based on re-sampling, including cross-validation (Hastie et al. 2001). Cross-validation (Efron & Tibshirani 1997) provides an alternative approach to model evaluation and might be more feasible with methods or data sets that create large computational loads. However, its estimates of error rates with independent data can be less precise than those derived from bootstrapping (Steyerberg et al. 2001b), which can be thought of as a smoothed version of cross-validation (Efron & Tibshirani 1997). The bootstrapping method (the 0.632+ bootstrap) is detailed in Appendix I. R code for obtaining bootstrapped estimates of ROC and MCS is available from the URL given previously.
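The idea of estimating optimism by resampling can be illustrated with the simpler refit-and-compare bootstrap sketched below in Python. It is not the 0.632+ estimator detailed in Appendix I. The toy model-building step (choosing the single predictor with the highest in-sample ROC area), the function names and the simulated data are all assumptions made for illustration.

```python
import random


def roc_area(scores, labels):
    """Share of presence-absence pairs ranked correctly (ties count one half)."""
    pres = [s for s, y in zip(scores, labels) if y == 1]
    absn = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > a else 0.5 if p == a else 0.0
               for p in pres for a in absn)
    return wins / (len(pres) * len(absn))


def fit(rows):
    """Toy stand-in for model building: index of the predictor with the
    highest in-sample ROC area."""
    labels = [y for _, y in rows]
    return max(range(len(rows[0][0])),
               key=lambda j: roc_area([x[j] for x, _ in rows], labels))


def score(model, rows):
    """ROC area of the chosen predictor on the given rows."""
    return roc_area([x[model] for x, _ in rows], [y for _, y in rows])


def optimism_corrected(rows, n_boot=200, seed=1):
    """Return (apparent ROC area, optimism-corrected ROC area)."""
    rng = random.Random(seed)
    apparent = score(fit(rows), rows)
    total, used = 0.0, 0
    while used < n_boot:
        boot = [rng.choice(rows) for _ in rows]  # resample sites with replacement
        if len({y for _, y in boot}) < 2:        # need presences and absences
            continue
        model = fit(boot)                        # repeat the model-building step
        total += score(model, boot) - score(model, rows)
        used += 1
    return apparent, apparent - total / used


# Simulated data: 60 sites, 5 pure-noise predictors, random presence/absence.
rng = random.Random(0)
sites = [(tuple(rng.random() for _ in range(5)), rng.randint(0, 1))
         for _ in range(60)]
print(optimism_corrected(sites))
```

Because the predictors here carry no information, any apparent discrimination above 0.5 arises from selecting the best of five noise variables, which is the kind of optimism the correction is intended to remove. Each bootstrap replicate repeats the whole model-building step, not just the final fit.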

Inspecting models for ecological realism

Final models were inspected for their ecological realism using partial plots of univariate fitted functions (Venables & Ripley 2003), similar to those used to determine the complexity of candidate response shapes. Partial plots are available as an option in statistical software and provide a ready means to evaluate the ecological realism of fitted responses. If an implausible fitted relationship between the probability of occupancy by the species and a particular environmental variable was observed in the partial plots, the variable was excluded from the model, or the model was re-fitted with fewer degrees of freedom for that particular environmental variable.

Case study results and interpretation

The adequacy of species survey data for model fitting

Environmental strata were reasonably well sampled, even for the species with the least and most restricted survey data. For six of the seven species, sampled strata make up at least 70% of the region that is under forest cover, with the only exception being the quoll, for which only 56% of the region comprises sampled strata. Similarly, the geographical spread of sampled strata was reasonable for most species, with maps of sampled strata showing good coverage of the region for all species (Fig. 2). The maps of sampled and unsampled strata provide an indication of the areas in which habitat model predictions may be less reliable.

Models and predictive performance

The best models in terms of both predictive discrimination and calibration were those for the sooty owl, squirrel glider, yellow-bellied glider, koala and tiger quoll, all of which had predictive discrimination greater than 0.75 for their best model (Table 3). The worst models were the powerful owl and masked owl models, with bootstrapped ROC areas ranging from 0.61 to 0.69. In general, GAMs had similar variable structure and degrees of freedom to GLMs, and comparable predictive discrimination. However, bootstrapped predictive calibration was always better for GAMs than for GLMs (Table 3). This is probably because the smooth functions in GAMs are better at fitting complex responses over the complete environmental range of the response, leading to more accurate predicted probabilities.

Presence-only models had similar model structures to the presence–absence models. The result suggests that the survey absence data used in the modelling exercise were not much more useful than a background random sample in defining unsuitable habitat for most species. This may result from the difficulties associated with obtaining reliable absence data for cryptic species. In addition, the quality of the presence-only data used in this modelling exercise is likely to be high compared with presence-only data found in most museums and herbaria because they were derived from systematic stratified surveys. Other studies have shown a decline in performance when fitting models with background samples compared with using presence–absence data (Ferrier & Watson 1997).

Preferred models were primarily chosen on the basis of predictive discrimination and calibration. Plausibility of response shapes was used to discern between models that showed similar predictive performance. Only model structures for the preferred model for each species are presented in Table 3. There was some variation between the four modelling methods in terms of the variables retained and their response shapes for each species. For example, the yellow-bellied glider GLM contained the term ‘dry2000’ while the GAM for the same species did not include this term. Similarly, the sooty owl GLM contained rainfall as a linear term while the GAM for that species included rainfall as a smoothed spline with two degrees of freedom (a non-linear term). However, the structural differences between GLM and GAM models were minor, and the similarity in ROC areas of the two model types across all species (Table 3) implies that the differences had a minor impact on predictive discrimination.

Fig. 2. Coverage of sampled strata over the greater LHCC region for the best ((a) Koala) and worst ((b) Quoll) sampled species. Unsampled strata are represented by the light grey shading. Remaining areas within the region (not shaded) contain sampled strata or are non-forest and therefore not included in the analysis.

Table 3. Final presence–absence models and bootstrapped estimates of predictive discrimination (ROC) and calibration

Species   Preferred    ROC area       Calibration    Preferred model
          model type   GAM    GLM     GAM    GLM
Koala     GAM          0.76   0.76    0.84   0.75    sp ~ s(temp, 2) + s(dry2000, 3) + s(rugg500, 2) + unmod2000 + percnonfor2k
Quoll     GLM          0.71   0.75    0.73   0.59    sp ~ temp + rain + percnonfor2k + poly(dry2000, 2)
Sqgl      GAM          0.78   0.77    0.76   0.68    sp ~ s(rugg500, 3) + sqglexp500 + s(unmod500, 2)
Ybgl      GAM          0.78   0.77    0.73   0.63    sp ~ s(temp, 2) + s(rain, 2) + s(rf2000, 2) + unmod2000 + s(ybglexp2000, 3) + percnonfor2k
Mowl      GAM          0.63   0.61    0.63   0.55    sp ~ s(cold, 2) + mowlexp2000 + rf2000 + s(unmod2000, 2) + wet2000 + percnonfor2k
Powl      GAM          0.69   0.68    0.80   0.62    sp ~ s(cold, 3) + s(rain, 2) + s(rugg500, 2)
Sowl      GAM          0.85   0.85    0.74   0.67    sp ~ s(rain, 2) + rugg500 + s(sowlexp2000, 3) + s(ter1000, 2) + unmod2000

Only the preferred model selected from the presence–absence models (either a generalized linear model or a generalized additive model) is presented for each species. Column ‘preferred model type’ indicates which of the methods produced the final preferred model for each species. ROC and calibration slope statistics are presented for the model derived under each modelling method. Variable abbreviations and details are given in Table 2. The expressions ‘poly(variable, n)’ and ‘s(variable, n)’ in the preferred model column indicate variables included as polynomial (‘poly’) or smoothed (‘s’) terms with ‘n’ degrees of freedom. ROC, receiver operating characteristic.

Plots of fitted functions

Plotting fitted relationships provided a useful means to visualize and manipulate the behaviour of fitted models to ensure that they were ecologically meaningful. In general, fitted relationships were found to be ecologically plausible. Plots of sooty owl response shapes (Fig. 3) show how the probability of sooty owl site occupancy varies with each of the predictor variables contained in the final GAM model for that species. Fitted functions for individual variables in a multivariate model may be different to univariate fitted functions because of interactions between predictor variables. Consequently, it is important to observe response shapes for the final model to ensure that they remain ecologically reasonable in the multivariable context.

Habitat maps

A habitat map is provided for the sooty owl as an example of the type of output that can be used in conservation planning (Fig. 4). The map shows the predicted probability of sooty owl presence throughout the region. Probabilities have been classified into four categories (0–0.2, 0.21–0.5, 0.51–0.7, 0.71–1.0) for presentation. These maps also display the presence records for the species that were used in model fitting, allowing a visual assessment of predictive performance. Mapped predictions and uncertainties (based on upper and lower 95% prediction intervals) for the remaining six species are available from the authors.

Fig. 3. Partial plots of the relationship between the probability of occupancy and environmental variables included in the final sooty owl model. The X-axis represents the range of values for each environmental variable. Probabilities on the Y-axis are plotted in transformed ‘logit’ space so that they can be interpreted in the same way as linear regressions. Response shapes in each plot represent the relationship between each variable and the probability of sooty owl occupancy in the multivariate model context, independent of the other variables included in the model. Dashed lines represent 95% confidence intervals around the fitted response shape. Rugg500, topographic ruggedness (standard deviation in elevation) in a 500-m radius; Sowlexp2000, the proportion of suitable vegetation (as defined by experts) in a 2000-m radius; terr1000, relative terrain position in a 1000-m radius; unmod2000, the percentage of cells in a 2000-m radius containing unmodified forest. [Five panels, with X-axes rain, rugg500, sowlexp2000, terr1000 and unmod2000; the Y-axis of each panel is logit (probability of occupancy).]

DISCUSSION

The importance of ecological realism in modelling and the role of experts

The choice of modelling methods and approach to model building and evaluation presented in this paper reflects the importance we place on maintaining ecological realism throughout the model building process. While maintaining statistical rigour is central to good model building, there is an expansive literature on the statistical nuances of model fitting, interpretation and prediction (Guisan & Zimmerman 2000). Conversely, there is much less literature available concerning the importance of maintaining ecological realism in model building (Austin 1991, 2002). The process of plotting and evaluating fitted functions throughout model building and interpretation is fundamental to ensuring the ecological realism and general credibility of the final models.

Expert opinion forms an integral part of sound model building and evaluation, irrespective of the modelling platform and theoretical approach used (Steyerberg et al. 2001b). Some modellers view expert opinion as a source of undesirable subjectivity and prefer to let the data dictate the model. However, this ignores the limitations commonly encountered with ecological data (small to medium-sized species data sets that may include biases, and many candidate predictor variables) and the related challenges in developing robust and sensible models. Expert opinion can (and should) be used to assist in the identification of candidate model variables, interactions between variables and likely response shapes (Pearce et al. 2001b; Ferrier et al. 2002a), for corroboration of a model’s ecological realism (Austin 2002), for ad hoc evaluation of model predictions (National Parks & Wildlife Service 1998) and for preparation of predictive maps for use in decision making (Ferrier et al. 2002a). Experts may also be used effectively in the creation of indices that may themselves be used as candidate variables for statistical modelling (National Parks & Wildlife Service 1998). Consequently, the role of experts should be thought of as complementary to other more data-driven methods, rather than as a competing alternative. In our case study, we emphasize the use of experts in variable reduction prior to automated variable selection. We use Harrell’s (2001) rule of thumb to identify the maximum PDF that should be offered to the variable selection routine. We contend that if expert knowledge is so lacking that reducing the candidate set to m/10 is impossible, resulting models should be treated with scepticism.

The lack of an accepted model evaluation statistic for presence-only models detracts from their utility in conservation planning and increases the importance of expert evaluation. Development of robust evaluations of presence-only models is ongoing.

Interpretation of habitat maps

Threshold predictions

Fig. 4. The predicted probability of sooty owl occupancy across the LHCC region. Probabilities are represented as a grey-scale (white = 0–0.2, light grey = 0.21–0.5, medium grey = 0.51–0.8, dark grey = 0.81–1.0). The open triangles show point survey locations where the sooty owl is recorded as present. Habitat maps were constructed by creating a vector of habitat model predictions (one for every 1 ha grid cell in the landscape) in R using the ‘predict.gam’ command. The prediction vector is then written to an ascii text file in the correct dimensions for the landscape and imported to ArcInfo (or ArcView) for display.

In some instances, users of habitat maps may wish to identify a level of predicted probability of occupancy below which an area would be considered ‘unsuitable’ habitat for a given species. In general, such an interpretation of habitat maps is ill-advised unless the model that underpins the map is particularly well calibrated. However, in such instances where it is required, thresholds could be assigned in a risk weighted manner. For example, if it is very important not to classify an area as ‘non-habitat’ when it may be used by a species, a low probability threshold (e.g. probability of occurrence = 0.1) should be specified when delineating non-habitat. By choosing a threshold of 0.1 for a well-calibrated model, it is implied that a 10% chance of a site (a grid cell of 100 m) being occupied by the species is a tolerable risk of failing to include areas that are in fact habitat for the species. This decision must be made on a case-by-case basis by evaluating the management costs of the two types of error (a false-negative and false-positive prediction), rather than by relying on an arbitrary threshold set by the modeller.
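One way to make the risk weighting described above explicit is the standard expected-cost argument sketched below in Python. This is an illustration rather than a procedure taken from the case study: the cost values and function names are hypothetical, and the reasoning assumes a well-calibrated model. Labelling a cell non-habitat costs p × C_FN on average and labelling it habitat costs (1 − p) × C_FP, so non-habitat is the cheaper label only when p < C_FP / (C_FP + C_FN).

```python
def risk_weighted_threshold(cost_false_negative, cost_false_positive):
    """Predicted probability below which labelling a cell 'non-habitat'
    has the lower expected cost."""
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def classify(prob, threshold):
    """Label a grid cell from its predicted probability of occupancy."""
    return "habitat" if prob >= threshold else "non-habitat"


# Hypothetical costs: wrongly writing off real habitat is judged nine times
# as costly as wrongly protecting non-habitat.
threshold = risk_weighted_threshold(cost_false_negative=9, cost_false_positive=1)
print(threshold)
print(classify(0.15, threshold), classify(0.05, threshold))
```

Under these assumed costs the threshold is 0.1, matching the example in the text; different management costs would give a different threshold.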

Representing uncertainty

Many forms of uncertainty impact on maps of predicted probability of species occurrence (Elith et al. 2002). The prediction intervals available from regression models describe the part of this uncertainty associated with estimating model parameters, and this stems from inadequate data, errors in measurement and natural variation (Elith et al. 2002). There are additional uncertainties, such as model selection uncertainty and errors in explanatory variables, that are, however, not addressed by standard prediction intervals. While some methods have been developed for explicitly incorporating these forms of uncertainty in prediction intervals (Burnham & Anderson 2002; Wintle et al. 2003), there is currently very little guidance on how prediction intervals should be represented and interpreted in applications of habitat modelling and conservation planning.

Interpreting model evaluation statistics

The two statistics used for model evaluation in this study address different model applications. Specifically, model discrimination, measured by the ROC area, is the degree to which the model successfully ranks presence sites higher than absence sites across the region in terms of predicted probability of presence. Models with satisfactory ROC areas will provide reliable ranking of areas in terms of habitat value. This mode of evaluation is relevant when the goal of a model is to rank or prioritize areas of interest in terms of their relative value as habitat for a species. This goal is different to that of accurately predicting the proportion of sites that are expected to be occupied at a given predicted probability of occurrence. For instance, at sites with a predicted probability of occurrence of 0.5, it is reasonable to expect that approximately 50% of such sites would contain the species. The degree to which this is true is described as model calibration (Miller et al. 1991) and is assessed using the logistic calibration equation. It is possible that a model with good discrimination can have poor calibration. It is also possible to improve a model’s predictive ability by adjusting the model parameters with calibration statistics (Harrell 2001) or other shrinkage methods (Hastie et al. 2001). A model with poor calibration may still be useful for prioritizing or ranking sites in terms of habitat value, but should not be used to predict the raw probability of finding a species at a given site. The choice of whether to focus on ROC areas or calibration statistics depends on the intended application of the models.

The issue of occupancy versus persistence

Models of the sort discussed in this paper make the implicit assumption that occupancy of a location implies suitability of the habitat. Van Horne (1983) identified problems associated with this assumption, and Pulliam’s (1988) ‘source-sink’ model of population dynamics formalizes its conceptual deficiencies. Pulliam’s model differentiates ‘sink’ areas, where mortality exceeds population growth, from ‘source’ areas that maintain overall population size through emigration. Similarly, if the total population is in a state of decline, it is possible that ‘remnant’ individuals may exist in unsuitable habitat that will not be occupied in the next generation. When a population is expanding its range, observations of absence of the species may falsely imply that the suitability of the habitat is low. These anomalies arise due to what is known as the ‘equilibrium’ assumption, which underpins the current static approach to wildlife habitat modelling (Austin 2002). Inclusion of disturbance history variables, dispersal barriers, competition and successional dynamics may assist in modelling such situations, though examples of such approaches are rare.

Survey design and sampling issues

This paper has focused on the choice and execution of particular modelling methods given a set of data. Issues of data quality and sampling have not been addressed. Because modelling results are only as good as the available data, the issues of data quality and sampling demand some discussion here.

Sample size

From a statistical perspective, the choice of sample size is a function of the desired precision of results. In studies where the goal is to make inference about a particular effect (e.g. the influence of time since logging on the probability of greater glider occupancy), power analysis may be used to determine the sample size required to achieve a statistically significant result for a given effect size, type I error rate and sampling variance (Burgman & Lindenmayer 1998). The relationship between statistical significance, power and model predictive performance is unclear, and there is no guarantee that a model with statistically significant terms will give good predictive performance. Unfortunately, there is no simple way to determine a priori the predictive performance of a habitat model, or to calculate a sample size that will ensure a predictive performance that is statistically significantly better than that of a null or alternative model. We could find only one attempt to address this problem in the statistical literature (Obuchowski 1994), making this an important area of future research. A simulation study that provided rules of thumb about sample sizes required to achieve a given predictive performance under a range of likely modelling scenarios would be an important contribution to the habitat modelling literature.

Data quality

A more subtle consideration is the geographical spread and environmental stratification of observations compared to the extent and environmental variability of the region for which models are to be built. We briefly touch on this issue in our case study when testing the adequacy of survey data using environmental strata. Little guidance is provided in the literature as to the minimum geographical or environmental coverage of data for statistical modelling, though it is common sense to expect that model predictions are unlikely to be reliable in environmental domains for which there are no survey data (Austin & Meyers 1996). Stratified sampling designs such as ‘GRADSECT’ (Austin & Heyligers 1989) that ensure a geographical and bioclimatic coverage of sampling locations are therefore appealing. GRADSECT theory, or newer developments such as Generalized Dissimilarity Modelling (GDM; Ferrier et al. 2002b) or a p-median criterion (Faith & Walker 1996), may also be applied to existing data sets to identify ‘gaps’ in the geographical and environmental coverage of samples.

Most statistical analyses, including the regression methods used in our case study, assume that observations are independent of each other. Survey locations that are close together are much less likely to be independent, especially when surveys target animals with large home ranges. Survey design should aim to ensure a minimum separation distance that is at least as great as the home range radius of the widest ranging species in the study. Random sampling within geographical and environmental strata assists in minimizing unwanted dependencies in observation data. Guidance on sampling theory should be sought from one of the many texts that deal with the topic in detail (e.g. Sutherland 1996; Thompson 2002).

Technical issues and promising advances in modelling methods

Species versus community approaches

Models of the distribution of single species are not the only way to approach conservation planning questions. Single species models are important for a subset of species such as threatened, focal or flagship species. However, landscape planning aims to conserve the biodiversity of a region, and there are too many species and too little data to achieve this through single-species modelling. Therefore, there is a role for community-level modelling (Ferrier et al. 2002b). One example of community-level modelling is GDM, already mentioned in the context of sampling; this models compositional dissimilarity across the landscape. This could then contribute to a conservation plan that aimed to span the range of species patterns in the landscape. A further role for methods based on data from many species is to use them for modelling rare species, or species in data sets with relatively few records per species. In such situations, there may be insufficient data for developing a robust single-species model. GDMs, canonical correspondence analysis (CCA) or multivariate adaptive regression splines (MARS) may be useful for this purpose.

Spatial autocorrelation

Spatial autocorrelation in wildlife observation data arises when environmental processes and patterns that influence the spatial distribution of wildlife are themselves spatially structured and/or because the species is subject to demographic processes, territoriality or dispersal limitations causing spatial dependence (contagion or dispersion effects). Demographic and environmental processes underlying spatial patterns in wildlife distributions are usually poorly understood and therefore difficult to incorporate in model fitting. Consequently, model residuals are often spatially correlated (i.e. not independent), violating one of the basic assumptions of regression modelling. In practice, non-independence usually results in underestimation of standard errors and overestimation of the importance of habitat variables (Legendre & Fortin 1989). Methods such as autologistic regression (Augustin et al. 1996), generalized estimating equations (Albert & McShane 1995) and geographically weighted regression (Fotheringham et al. 2002) may be used to incorporate spatial autocorrelation in habitat analyses, but they are still not incorporated in standard statistical software and are technically demanding to implement.
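A simple way to check whether model residuals are spatially correlated is to compute Moran's I on them. The sketch below is illustrative only and was not part of the case study analysis. It assumes binary distance-band weights (two sites are neighbours if they lie within a chosen distance), and the function name and the `max_dist` threshold are hypothetical choices.

```python
import math

def morans_i(coords, residuals, max_dist):
    """Moran's I for residuals at point locations.

    Assumption: binary weights, w_ij = 1 if sites i and j (i != j)
    are within max_dist of each other, otherwise 0.
    """
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    cross = 0.0   # sum over neighbouring pairs of dev_i * dev_j
    w_sum = 0.0   # total weight (number of ordered neighbouring pairs)
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= max_dist:
                cross += dev[i] * dev[j]
                w_sum += 1.0
    denom = sum(d * d for d in dev)
    if w_sum == 0.0 or denom == 0.0:
        raise ValueError("no neighbouring pairs, or residuals have zero variance")
    return (n / w_sum) * (cross / denom)

# Four sites on a line, 1 unit apart (hypothetical data).
sites = [(0, 0), (1, 0), (2, 0), (3, 0)]
clustered = morans_i(sites, [1, 1, -1, -1], max_dist=1)    # similar values adjacent
alternating = morans_i(sites, [1, -1, 1, -1], max_dist=1)  # dissimilar values adjacent
```

Values well above the expectation under independence, -1/(n - 1), suggest positive autocorrelation (contagion), and values well below it suggest dispersion. A formal test would need a permutation or normal-approximation step that is omitted here.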

Model selection and model uncertainty

The standard approach to representing prediction uncertainty involves the calculation of prediction (or confidence) intervals that incorporate uncertainty about parameter estimates. However, this approach to prediction implicitly assumes that the model chosen is the best available representation of the truth and effectively ignores model selection uncertainty, resulting in overconfident predictions. Automated variable selection algorithms exacerbate this problem by providing a method for dredging through many candidate predictors in search of an explanatory model (Chatfield 1995; Hoeting et al. 1999). One solution is to completely avoid automated model selection and to analyse only a small set of ‘a priori’ models (Burnham & Anderson 2002). This approach may be practical when strong prior knowledge exists and where the number of plausible models is small. However, if prediction is the primary goal and many possible models exist, some level of automated variable selection is often desirable. Model averaging approaches, including Bayesian model averaging, have been promoted in a range of disciplines as a means of incorporating model selection uncertainty into statistical inference and prediction (Hoeting et al. 1999) and have been applied in some ecological examples (Burnham & Anderson 2002; Wintle et al. 2003).
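The idea of model averaging can be illustrated with Akaike weights, the information-theoretic variant described by Burnham & Anderson (2002). This is a minimal sketch rather than the Bayesian model averaging of Hoeting et al. (1999), and the AIC values and per-model predictions below are hypothetical.

```python
import math

def akaike_weights(aic_values):
    """Convert AIC values for a set of candidate models into weights
    that sum to 1; lower AIC gives a larger weight."""
    best = min(aic_values)
    rel_likelihood = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel_likelihood)
    return [r / total for r in rel_likelihood]

def averaged_prediction(predictions, weights):
    """Weighted average of the candidate models' predictions for one site."""
    return sum(p * w for p, w in zip(predictions, weights))

# Hypothetical AIC values for three candidate habitat models, and each
# model's predicted probability of occurrence at a single site.
aic = [100.0, 102.0, 110.0]
site_predictions = [0.60, 0.40, 0.10]

weights = akaike_weights(aic)
p_avg = averaged_prediction(site_predictions, weights)
```

The averaged prediction leans towards the best-supported model but still reflects the alternatives, which is the sense in which model averaging carries model selection uncertainty through to prediction.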

Estimating and incorporating detectability in habitat models

Unless the probability of detecting a species when it is present is equal to 1, false negative observation errors will occur in species surveys. The probability of detecting the presence of the case study species in any single standard survey based on spot-lighting and call elicitation has been found to be very low (Pr[detection | presence] ∼ 0.12–0.45; Wintle et al. in press), making the unreliability of absence data a potentially serious source of uncertainty in our case study. Recent studies have demonstrated the negative impact that false-negative observation error may have on species-habitat analyses (Tyre et al. 2003), meta-population models (Moilanen 2002) and monitoring studies (MacKenzie et al. 2002). Recently developed techniques for incorporating detectability in model estimation (MacKenzie et al. 2002; Tyre et al. 2003) reduce bias in model estimation brought about by false absences, though little effort has been invested in testing the relative predictive performance of such models.
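The practical consequence of low detectability can be shown with a short calculation. The sketch below assumes that repeat visits to a site are independent and that the per-visit detection probability is constant, which are simplifying assumptions rather than properties established in the case study. The function names are illustrative.

```python
import math

def prob_detected_at_least_once(p, visits):
    """Probability of one or more detections in `visits` surveys of an
    occupied site, given per-visit detection probability p."""
    return 1.0 - (1.0 - p) ** visits

def visits_needed(p, confidence=0.95):
    """Smallest number of visits for which an occupied site is detected
    at least once with probability >= confidence."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

# The two ends of the detection range quoted in the text.
low = visits_needed(0.12)    # 24 visits
high = visits_needed(0.45)   # 6 visits
```

Under these assumptions, a single visit leaves a substantial chance that a recorded absence is false, which is why multiple site visits are recommended for future data collection.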

Incorporating biotic interactions

Despite the prevalence of interspecific competition as a key concept in community ecology (Diamond 1975), very little has been attempted by way of incorporating interspecific competition in distribution models (for one exception see Leathwick & Austin 2001). Regression models such as those described by Leathwick and Austin (2001) could be applied to fauna species. Future development of ‘loop analysis’ (Levins 1975) or systems of simultaneous regressions (Guisan & Zimmerman 2000) may facilitate spatial models of wildlife distributions that incorporate competitive interactions.

The GLMs and GAMs presented here are relatively simple, though suited to the conservation problem and available data. There is scope for greater sophistication in the modelling method where a high level of modelling expertise exists, though we believe our recommended approach represents a reasonable trade-off between practicality and rigour.

Model improvement and adaptive management

Due to environmental change, natural and anthropogenic disturbance, and population stochasticity, all predictive models will become redundant if they are not updated through time. We recommend that models be applied in an adaptive framework (Walters & Holling 1990), allowing immediate application of models in management but ensuring a commitment to continual improvement of models through future data collections and refinements to modelling methods.

Future data collection will be central to rigorous model testing and model refinements. It should be targeted to fill gaps in current data sets (Fig. 3). Particular attention should be paid to the collection of high quality observation data based on multiple site visits aimed at minimizing false absences (Wintle et al. 2004). Some important environmental variables, including those based on forest growth stage data, were not available for model fitting in the LHCC region. It is likely that models would have better predictive accuracy if forest growth stage information became available and was incorporated in models.

CONCLUSION

Habitat models are now a widely used tool in public land conservation planning. Systematic conservation planning in the urban fringe will require such tools to ensure transparency and repeatability of planning outcomes. However, access to statistical and ecological expertise for urban conservation planning is likely to be limited when compared with the public land planning processes that have taken place in Australia. It is therefore important that we continue to develop and refine planning tools, including habitat modelling methods, with an emphasis on statistical and ecological rigour and simplicity.

ACKNOWLEDGEMENTS

This work was instigated by the LHCCREMS organization (now Hunter Councils). Steven House (EcoLogical), Michael Murray (Forest Fauna Surveys P/L), Philip Gleeson (NPWS), Daniel Connolly (NPWS), Jill Smith (NPWS), Peter Bone (NPWS), Peter Ewin (NPWS), Mick Andren (NPWS) and Rod Kavanagh (SFNSW) provided data, expert guidance on model construction and valuable insights. Mark Burgman and Kuniko Yamada (University of Melbourne) helped throughout the project. Comments from Terry Walshe (University of Western Australia), David Tierney and Robbie Economos improved the manuscript substantially. Discussions with Trevor Hastie and John Leathwick helped clarify bootstrapping issues. Two anonymous reviewers provided constructive feedback. Sophie Powrie, Kirsty Winter and Meredith Liang (LHCCREMS) provided excellent support and useful feedback throughout the project.

REFERENCES

Agresti A. (1996) An Introduction to Categorical Data Analysis. John Wiley and Sons, New York.

Akaike H. (1973) Information theory and an extension of the maximum likelihood. In: Proceedings of the 2nd International Symposium on Information Theory (eds B. N. Petrov & F. Csáki) pp. 267–81. Akademia Kaido, Budapest.

Albert P. S. & McShane L. M. (1995) A generalized estimating equations approach for spatially correlated data: applications to the analysis of neuroimaging data. Biometrics 51, 627–38.

Augustin N. H., Mugglestone M. A. & Buckland S. T. (1996) An autologistic model for the spatial distribution of wildlife. J. Appl. Ecol. 33, 339–47.

Austin M. P. (1991) Vegetation: data collection and analysis. In: Nature Conservation: Cost Effective Biological Surveys and Data Analysis (eds C. R. Margules & M. P. Austin) pp. 37–41. CSIRO, Canberra.

Austin M. P. (2002) Spatial prediction of species distribution: an interface between ecological theory and statistical modelling. Ecol. Model. 157, 101–18.

Austin M. P. & Heyligers P. C. (1989) Vegetation survey design for conservation: GRADSECT sampling of forests in north-eastern NSW. Biol. Conserv. 50, 13–32.

Austin M. P. & Meyers J. A. (1996) Current approaches to modelling the environmental niche of Eucalypts: implications for management of forest biodiversity. For. Ecol. Manage. 85, 95–106.

Bambar D. (1975) The area above the ordinal dominance graph and the area below the receiver operating characteristics graph. J. Math. Psych. 12, 387–415.

Belsley D. A., Kuh E. & Welsch R. E. (1980) Regression Diagnostics. John Wiley and Sons, New York.

Bio A. M. F., Alkemande R. & Barendregt A. (1998) Determining alternative models for vegetation response analysis – a non-parametric approach. J. Veg. Sci. 9, 5–16.

Breiman L., Friedman J. H., Olshen R. A. & Stone C. J. (1984) Classification and Regression Trees. Wadsworth International Group, Belmont.

Breininger D. R., Larson V. L., Duncan B. W. & Smith R. B. (1998) Linking habitat suitability to demographic success in Florida scrub-jays. Wild. Bull. 26, 118–28.

Burgman M. A. & Lindenmayer D. B. (1998) Conservation Biology for the Australian Environment. Surrey Beatty and Sons, Chipping Norton.

Burnham K. P. & Anderson D. R. (2002) Model Selection and Multimodel Inference: a Practical Information-Theoretic Approach. Springer, New York.

Burgman M. A., Breininger D. R., Duncan B. W. & Ferson S. (2001) Setting reliability bounds on Habitat Suitability Indices. Ecol. Appl. 11, 70–8.

Carpenter G., Gillison A. N. & Winter J. (1993) DOMAIN: a flexible modelling procedure for mapping potential distributions of plants and animals. Biodivers. Conserv. 2, 667–80.

Chatfield C. (1995) Model uncertainty, data mining and statistical inference. J. Roy. Stat. Soc. Ser. A Stat. 158, 419–66.

Commonwealth of Australia (2003) Terms of Reference: Standing Committee on Environment and Heritage Inquiry into Sustainable Cities. Parliament of Australia, House of Representatives, Canberra.

Derksen S. & Keselman H. J. (1992) Backward, forward and stepwise automated subset selection algorithms: frequency of obtaining authentic and noise variables. Br. J. Math. Stat. Psych. 45, 265–82.

Diamond J. M. (1975) Assembly of species communities. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 342–444. Harvard University Press, Cambridge.

Ecotone Ecological Consultants (2001) Lower Hunter Central Coast Regional Biodiversity Strategy: Fauna survey and mapping project – Module 1: Fauna Surveys. Ecotone Ecological Consultants for LHCCREMS, Waratah, pp. 43.

Edgar R. & Belcher C. (1995) Spotted-tailed Quoll. In: The Mammals of Australia (ed. R. Strahan) pp. 67–8. Reed Books, Chatswood.

Efron B. & Tibshirani R. (1997) Improvements on cross-validation: the .632+ bootstrap method. J. Am. Stat. Ass. 92, 548–60.

Elith J. (2000) Quantitative methods for modelling species habitat: comparative performance and an application to Australian plants. In: Quantitative Methods for Conservation Biology (eds S. Ferson & M. A. Burgman) pp. 39–58. Springer-Verlag, New York.

Elith J. & Burgman M. A. (2002) Predictions and their validation: rare plants in the Central Highlands, Victoria, Australia. In: Predicting Species Occurrences: Issues of Accuracy and Scale (eds J. M. Scott, P. J. Heglund, M. L. Morrison, M. G. Raphael, W. A. Wall & F. B. Samson) pp. 303–14. Island Press, Covelo.

Elith J. & Burgman M. A. (2003) Habitat models for PVA. In: Population Viability in Plants (eds C. A. Brigham & M. W. Schwartz) pp. 203–35. Springer-Verlag, New York.

Elith J., Burgman M. A. & Regan H. M. (2002) Mapping epistemic uncertainties and vague concepts in predictions of species distribution. Ecol. Model. 157, 313–29.


Ennis M., Hinton G., Naylor D., Revow M. & Tibshirani R. (1998) A comparison of statistical learning methods on the GUSTO database. Stat. Med. 17, 2501–8.

ESRI (1997) Arcinfo 7.2. Environmental Systems Research Institute Inc., Redlands.

Faith D. P. & Walker P. A. (1996) Environmental diversity: on the best-possible use of surrogate data for assessing the relative biodiversity of sets of areas. Biodivers. Conserv. 5, 399–415.

Ferrier S. & Watson G. (1997) An Evaluation of the Effectiveness of Environmental Surrogates and Modelling Techniques in Predicting the Distribution of Biological Diversity. Department of Environment, Sports and Territories, Commonwealth of Australia, and NSW National Parks and Wildlife Service, Canberra.

Ferrier S., Watson G., Pearce J. & Drielsma M. (2002a) Extended statistical approaches to modelling spatial pattern in biodiversity in north-east New South Wales. I. Species-level modelling. Biodivers. Conserv. 11, 2275–307.

Ferrier S., Drielsma M., Manion G. & Watson G. (2002b) Extended statistical approaches to modelling spatial pattern in biodiversity: the north-east New South Wales experience. II. Community-level modelling. Biodivers. Conserv. 11, 2309–38.

Fielding A. H. & Bell J. F. (1997) A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ. Conserv. 24, 38–49.

Fotheringham A. S., Brunsdon C. & Charlton M. (2002) Geographically Weighted Regression: the Analysis of Spatially Varying Relationships. John Wiley and Sons, London.

Franklin J. (1995) Predictive vegetation mapping: geographic modeling of biospatial patterns in relation to environmental gradients. Prog. Phys. Geog. 19, 494–519.

Friedman J. H. (1991) Multivariate adaptive regression splines (with discussion). Ann. Stat. 19, 1–141.

Goldingay R. L. & Kavanagh R. P. (1993) Home-range estimates and habitat of the yellow-bellied glider (Petaurus australis) at Waratah Creek, New-South-Wales. Wildl. Res. 20, 387–404.

Graham C. H., Ferrier S., Huettman F., Moritz C. & Peterson A. T. (2004) New developments in museum-based informatics and applications in biodiversity analysis. Trends Ecol. Evol. 19, 497–503.

Guisan A. & Harrell F. E. (2000) Ordinal response regression models in ecology. J. Veg. Sci. 11, 617–26.

Guisan A. & Zimmerman N. E. (2000) Predictive habitat distribution models in ecology. Ecol. Model. 135, 147–86.

Hanley J. A. & McNeil B. J. (1982) The meaning and use of the area under a Receiver Operating Characteristic (ROC) curve. Radiology 143, 29–36.

Harrell F. E. (2001) Regression Modelling Strategies with Application to Linear Models, Logistic Regression and Survival Analysis. Springer, New York.

Harrell F. E., Lee K. L. & Mark D. B. (1996) Multivariate prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat. Med. 15, 361–87.

Hastie T. & Tibshirani R. (1990) Generalized additive models. Monographs on statistics and applied probability (eds D. R. Cox, D. V. Hinkley, D. Rubin & B. W. Silverman). Chapman & Hall, London.

Hastie T., Tibshirani R. & Friedman J. H. (2001) The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer-Verlag, New York.

Hirzel A. H. (2001) When GIS come to life. Linking Landscape- and Population Ecology for Large Population Management Modelling: the Case of Ibex (Capra Ibex) in Switzerland. The University of Lausanne, Lausanne.

Hoeting J. A., Madigan D., Raftery A. E. & Volinsky C. T. (1999) Bayesian model averaging: a tutorial. Stat. Sci. 14, 382–401.

Houlder D. J., Hutchinson M. F., Nix H. A. & McMahon J. P. (1999) ANUCLIM User Guide, Version 5.0. Centre for Resource and Environmental Studies, Australian National University, Canberra.

Kadmon R., Farber O. & Danin A. (2003) A systematic analysis of factors affecting the performance of climatic envelope models. Ecol. Appl. 13, 853–67.

Kavanagh R. P. (2002) Conservation and management of large forest owls in southeastern Australia. In: The Ecology and Conservation of Owls (eds I. Newton, R. Kavanagh, J. Olson & I. Taylor) pp. 201–19. CSIRO, Melbourne.

Lambeck R. J. (1997) Focal species: a multi-species umbrella for nature conservation. Conserv. Biol. 11, 849–56.

Leathwick J. R. & Austin M. P. (2001) Competitive interactions between tree species in New Zealand’s old-growth indigenous forests. Ecology 82, 2560–73.

Legendre P. & Fortin M. J. (1989) Spatial pattern and ecological analysis. Vegetatio 80, 107–38.

Legendre L. & Legendre P. (1998) Numerical Ecology. Elsevier, New York.

Levins R. (1975) Evolution of communities near equilibrium. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 16–50. Harvard University Press, Cambridge.

LHCCREMS (2004) Lower Hunter Central Coast Regional Biodiversity Strategy 2004 (4 Volume). Hunter Councils on behalf of the Lower Hunter Central Coast Regional Environmental Strategy, Thornton.

Li W., Wang Z., Ma Z. & Tang H. (1999) Designing the core zone in a biosphere reserve based on suitable habitats: Yancheng Biosphere Reserve and the red crowned crane (Grus japonensis). Biol. Conserv. 90, 167–73.

Loyn R. H., McNabb E. G., Volodina L. & Willig R. (2001) Modelling landscape distributions of large forest owls as applied to managing forests in north-east Victoria, Australia. Biol. Conserv. 97, 361–76.

McCullagh P. & Nelder J. A. (1989) Generalized Linear Models. Chapman & Hall, London.

MacKenzie D. I., Nichols J. D., Lachman G. B., Droege S., Royle J. A. & Langtimm C. A. (2002) Estimating site occupancy rates when detection probabilities are less than one. Ecology 83, 2248–55.

Manel S., Dias J. M., Buckton S. T. & Ormerod S. J. (1999a) Alternative methods for predicting species distribution: an illustration with Himalayan river birds. J. Appl. Ecol. 36, 734–47.

Manel S., Dias J. M. & Ormerod S. J. (1999b) Comparing discriminant analysis, neural networks and logistic regression for predicting species distributions: a case study with a Himalayan river bird. Ecol. Model. 120, 337–47.

Miller M. E., Hui S. L. & Tierney W. M. (1991) Validation techniques for logistic regression models. Stat. Med. 10, 1213–26.

Moilanen A. (2002) Implications of empirical data quality to metapopulation model parameter estimation and application. Oikos 96, 516–30.

Moisen G. G. & Frescino T. S. (2002) Comparing five modeling techniques for predicting forest characteristics. Ecol. Model. 157, 209–25.


National Parks & Wildlife Service (1998) Eden Fauna Modelling. New South Wales National Parks & Wildlife Service, New South Wales Regional Forest Agreement Steering Committee, Canberra.

National Parks and Wildlife Service (2000) Modelling Areas of Habitat Significance for Fauna and Flora in the Southern CRA. NPWS NSW, Canberra.

Nix H. (1986) A biogeographic analysis of Australian elapid snakes. In: Atlas of Elapid Snakes of Australia (ed. R. Longmore) pp. 4–15. Australian Government Publishing Service, Canberra.

Obuchowski N. A. (1994) Computing sample size for receiver operating characteristic studies. Invest. Radiol. 29, 238–43.

Pearce J. & Ferrier S. (2000) Evaluating the predictive performance of habitat models developed using logistic regression. Ecol. Model. 133, 225–45.

Pearce J. & Ferrier S. (2001) The practical value of modelling relative abundance of species for regional conservation planning: a case study. Biol. Conserv. 98, 33–43.

Pearce J., Ferrier S. & Scotts D. (2001a) An evaluation of the predictive performance of distributional models for flora and fauna in north-east New South Wales. J. Environ. Manage. 62, 171–84.

Pearce J. L., Cherry K., Drielsma M., Ferrier S. & Whish G. (2001b) Incorporating expert knowledge and fine-scale vegetation mapping into statistical modelling of faunal distribution. J. Appl. Ecol. 38, 412–24.

Phillips S. J., Anderson R. P. & Schapire R. E. (in press) Maximum entropy modeling of species geographic distributions. Ecol. Model.

Pressey R. L., Possingham H. P., Logan V. S., Day J. R. & Williams P. H. (1999) Effects of data characteristics on the results of reserve selection algorithms. J. Biogeogr. 26, 179–91.

Pulliam H. R. (1988) Sources, sinks and population regulation. Am. Nat. 132, 652–61.

R Development Core Team (2004) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org

Reading R. P., Clark T. A., Seebeck J. H. & Pearce J. (1996) Habitat Suitability Index model for the eastern barred bandicoot, Perameles gunnii. Wildl. Res. 23, 221–35.

Reed P. C. & Lunney D. (1990) Habitat loss: the key problem for the long-term survival of koalas in New South Wales. In: Koala Summit: Managing Koalas in New South Wales (eds D. Lunney, C. A. Urquhart & P. C. Reed) pp. 9–31. University of Sydney, Sydney.

Ripley B. D. (1995) Pattern Recognition and Neural Networks – a Statistical Approach. Cambridge University Press, Cambridge.

Russel R. (1995) Yellow-bellied glider. In: The Mammals of Australia (ed. R. Strahan) pp. 226–8. Reed Books, Chatswood.

Steyerberg E. W., Eijkemans M. J. C., Harrell F. E. & Habbema J. D. F. (2001a) Prognostic modeling with logistic regression analysis: in search of a sensible strategy in small data sets. Med. Decis. Making 21, 45–56.

Steyerberg E. W., Harrell F. E., Borsboom G. J. J. M., Eijkemans M. J. C., Vergouwe Y. & Habbema J. D. F. (2001b) Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J. Clin. Epidemiol. 54, 774–81.

Stockwell D. & Peters D. (1999) The GARP modelling system: problems and solutions to automated spatial prediction. Int. J. Geogr. Inf. Sci. 13, 143–58.

Sutherland W. J. (1996) Ecological Census Techniques: a Handbook, 1st edn. Cambridge University Press, Cambridge.

ter Braak C. J. F. (1986) Canonical Correspondence Analysis: a new eigenvector technique for multivariate direct gradient analysis. Ecology 67, 1167–79.

Thompson S. K. (2002) Sampling, 2nd edn. Wiley, New York.

Tyre A. J., Tenhumberg B., Field S. A., Possingham H. P., Niejalke D. & Parris K. (2003) Improving precision and reducing bias in biological surveys by estimating false negative error rates in presence-absence data. Ecol. Appl. 13, 1790–801.

Van Horne B. (1983) Density as a misleading indicator of habitat quality. J. Wildl. Man. 47, 893–901.

Van Horne B. & Wiens J. A. (1991) Forest Bird Habitat Suitability Models and the Development of General Habitat Models. United States Department of the Interior, Fish and Wildlife Service, Washington, DC.

Venables W. N. & Ripley B. D. (2003) Modern Applied Statistics with S, 4th edn. Springer, New York.

Walters C. & Holling C. S. (1990) Large-scale management experiments and learning by doing. Ecology 71, 2060–8.

Wintle B. A., McCarthy M. A., Volinsky C. T. & Kavanagh R. P. (2003) The use of Bayesian Model Averaging to better represent the uncertainty in ecological models. Conserv. Biol. 17, 1579–90.

Wintle B. A., Elith R. J., Yamada K. & Burgman M. A. (2004) LHCCREMS fauna survey and mapping project. Module 2: Habitat modelling and conservation requirements. Lower Hunter & Central Coast Regional Environmental Management Strategy, Callaghan.

Wintle B. A., McCarthy M. A., Parris K. M. & Burgman M. A. (2004) Precision and bias of methods for estimating point survey detection probabilities. Ecol. Appl. 14, 703–12.

Wilson K. A., Westphal M. I., Possingham H. P. & Elith J. (2005) Sensitivity of conservation planning to different approaches to using species distribution data. Biol. Conserv. 122, 99–112.

Wintle B. A., Burgman M. A., McCarthy M. A. & Kavanagh R. P. (in press) The magnitude and management consequences of false negative observation error in surveys of arboreal marsupials and large forest owls. J. Wildl. Man.

Worton B. J. (1989) Kernel methods for estimating the utilization distribution in home-range studies. Ecology 70, 164–8.

Yencken D. & Wilkinson D. (2000) Resetting the Compass: Australia’s Journey Towards Sustainability. CSIRO Publishing, Melbourne.

Zaniewski A. E., Lehmann A. & Overton J. M. (2002) Predicting species distribution using presence-only data: a case study of native New Zealand ferns. Ecol. Model. 157, 261–80.

APPENDIX 1

Method for 0.632+ bootstrap evaluation, after Harrell et al. (1996) and Efron and Tibshirani (1997):

1. Develop model on all n observations.
2. Calculate the statistic(s) of choice for evaluation on the same data (i.e. the training data) – call this Stat_app because it is the apparent value of the statistic.
3. Take a bootstrap sample, i.e. a sample of size n, with replacement, of rows of the data matrix. Keep track of which sites are in the bootstrap sample and which are excluded.
4. Fit the model on the bootstrap sample (using the same methods as used on the full set).
5. Compute the statistic on the bootstrap data set (observations vs. fitted values) and call it Stat_boot.
6. Also compute the statistic on a version of the bootstrap data where the observations are randomized (Stat_permute).
7. Use the bootstrap model to predict to the excluded data set; calculate the statistic on these predictions: Stat_excl.
8. Use Stat_boot, Stat_permute and Stat_excl to calculate the amount of overfitting, the relative overfitting rate and weights that are then used to make a best estimate of predictive performance: Stat_best_est. This statistic puts most emphasis on predictions to the excluded data, particularly when the model is overfitted (i.e. when Stat_boot – Stat_excl is large). For details of this step see Steyerberg et al. (2001b).
9. Measure how optimistic the fit on the bootstrap sample was: O = Stat_boot – Stat_best_est.
10. Repeat steps 3–7 100–200 times.
11. Calculate an average optimism, Ō.
12. Use Ō to correct Stat_app for its optimism: Stat_app – Ō. This is a near unbiased estimate of the expected value of the external predictive performance of the process that generated Stat_app.
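The arithmetic of steps 8, 9, 11 and 12 can be sketched as follows. This is one plausible formulation, adapted from the weighting in Efron and Tibshirani (1997) to a statistic for which higher values are better (such as ROC area); the exact expressions used in the case study are those of Steyerberg et al. (2001b) and may differ in detail. Model fitting and resampling (steps 1–7) are not shown, and the function names and example values are hypothetical.

```python
def relative_overfitting(stat_boot, stat_permute, stat_excl):
    """Relative overfitting rate R in [0, 1].

    Assumption: the statistic on excluded data is not allowed to fall
    below the no-information (permuted) value.
    """
    excl = max(stat_excl, stat_permute)
    if excl < stat_boot and stat_permute < stat_boot:
        return (stat_boot - excl) / (stat_boot - stat_permute)
    return 0.0

def best_estimate(stat_boot, stat_permute, stat_excl):
    """Step 8: weighted combination of Stat_boot and Stat_excl.
    The weight on the excluded data rises from 0.632 (no overfitting)
    towards 1 (complete overfitting)."""
    r = relative_overfitting(stat_boot, stat_permute, stat_excl)
    w = 0.632 / (1.0 - 0.368 * r)
    excl = max(stat_excl, stat_permute)
    return (1.0 - w) * stat_boot + w * excl

def optimism_corrected(stat_app, replicates):
    """Steps 9, 11 and 12: average the optimism over bootstrap
    replicates and subtract it from the apparent statistic.
    `replicates` is a list of (stat_boot, stat_permute, stat_excl)."""
    optimism = [b - best_estimate(b, p, e) for b, p, e in replicates]
    mean_optimism = sum(optimism) / len(optimism)
    return stat_app - mean_optimism

# Hypothetical single replicate: ROC area 0.85 on the bootstrap sample,
# 0.50 after permutation, 0.75 on the excluded sites.
corrected = optimism_corrected(0.85, [(0.85, 0.50, 0.75)])
```

In practice the list of replicates would hold the 100–200 bootstrap results from step 10 rather than a single triple.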


A CASE STUDY: HABITAT MODELLING FOR CONSERVATION PLANNING IN CENTRAL NSW

The Lower Hunter and Central Coast (LHCC) region of New South Wales (NSW) presents an opportunity to study urbanization pressures in a relatively pristine landscape with substantial biodiversity values. The region includes the seven local councils of Cessnock, Gosford, Lake Macquarie, Maitland, Newcastle, Port Stephens and Wyong (Fig. 1). It comprises large areas of native forest in both public and private tenures (National Parks & Wildlife Service 2000). The region is approximately 7200 square kilometres, of which approximately 65% is under forest cover. The LHCC Regional Environmental Management Strategy (LHCCREMS) seeks to integrate biodiversity information into future land use planning and development in the region (LHCCREMS 2004). Biodiversity projects under the LHCCREMS include the development of vegetation mapping for the region; a gap analysis of biological survey data for the region and subsequent fauna surveys to augment existing biodiversity data (Ecotone Ecological Consultants 2001); the development of wildlife habitat models and habitat maps for priority species in the region (Wintle et al. 2004); and the development of preliminary conservation recommendations for each species (LHCCREMS 2004). This paper reports on the methods used to develop and evaluate habitat models and maps for each of the priority species in the region.

Priority species and their habitat requirements

Seven fauna species were selected for modelling on the basis of data availability and ‘priority’ status, with priority largely determined by threatened species status. By choosing species on the basis that they were deemed particularly sensitive to identifiable threats, the selection process loosely followed the rationale of the focal species approach (Lambeck 1997). The primary threatening process in the LHCC region is land clearing for development, though commercial forestry on public land may pose a threat to some species (LHCCREMS 2004).

The koala (Phascolarctos cinereus), the yellow-bellied glider (Petaurus australis) and the squirrel glider (Petaurus norfolcensis) are arboreal marsupials that are broadly distributed throughout forest and woodlands of eastern Australia. The suitability of habitat for arboreal marsupials is influenced by the size and species of trees present, soil nutrients, climate, rainfall and the size and disturbance history of the habitat patches (Reed & Lunney 1990). The two gliders are specifically reliant on tree hollows for shelter. The squirrel glider requires food supplied by flowering acacias and banksias (Russel 1995) and the yellow-bellied glider needs particular species of sap-trees and winter flowering eucalypts (Goldingay & Kavanagh 1993). Available habitat for these species is becoming increasingly fragmented as a result of habitat destruction caused by urban and infrastructure development, agriculture and mining (Reed & Lunney 1990).

Fig. 1. The Lower Hunter Central Coast (LHCC) region, situated north of Sydney in New South Wales. Seven local councils are located in the region. Light shading indicates areas under forest cover. [Map legend: vegetation cover (vegetated, non-vegetated); local government areas; graticule labels 151°E, 33°S.]

The tiger quoll (Dasyurus maculatus) is a medium-sized carnivorous marsupial that inhabits a wide range of habitat types, from sclerophyll forest and woodlands to coastal heathlands and rainforests (Edgar & Belcher 1995). The species requires suitable den sites (such as hollow logs, tree hollows, rock outcrops or caves) and a large area of intact vegetation for foraging (Edgar & Belcher 1995).

The powerful owl (Ninox strenua), the sooty owl (Tyto tenebricosa) and the masked owl (Tyto novaehollandiae) are broadly distributed throughout forest, woodland and rainforest of eastern Australia. All three owl species tolerate some degree of fragmentation and discontinuity in forest cover. However, they rely on large tree hollows for nesting and denning, and prey on forest-dependent species such as marsupial gliders and small ground mammals, making them susceptible to declines in prey abundance resulting from high levels of forest fragmentation (Kavanagh 2002).

The choice of model variables was driven by what was known about the habitat requirements of each species. Of the species modelled, home range estimates vary from as low as 0.65 ha for the squirrel glider (Russel 1995) to 20 km² for the tiger quoll (Edgar & Belcher 1995). A modelling grid cell size of 1 ha was chosen on the basis that it approximates the home range size of the smallest ranging species.

Data collation, filtering and handling

Biological data

Data for the seven target species were obtained from the biological systematic survey (BSS) module of the NSW NPWS Wildlife Atlas database and from surveys commissioned by the LHCCREMS, specifically undertaken to fill gaps in the Atlas data for priority species (Ecotone Ecological Consultants 2001). Systematic survey data contained records of both presence and absence for each of the seven priority species. The availability of systematic survey data reduced the effort required to prepare data for modelling, as survey method and effort covariates were available for data filtering. Only data collected after 1990 and using survey methods appropriate to detect each species were used in modelling. Records were filtered to ensure a geographical separation of at least 500 m, in an attempt to increase the probability that observations were independent. Presence records were retained in preference to absence records when two (or more) records fell within 500 m of each other. The data that remained for model building and testing are summarized in Table 1.
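The separation filter described above can be sketched as a simple greedy procedure. The paper does not state the algorithm that was used, so the ordering rule here (all presences considered before any absence, otherwise in input order) is an assumption, as are the function name and the example coordinates, which are taken to be in metres on a projected grid.

```python
import math

def thin_records(records, min_sep=500.0):
    """Greedy spatial filter.

    records: list of (x, y, present) tuples, coordinates in metres.
    A record is kept only if it lies at least min_sep from every record
    already kept. Presences are processed first, so a presence is
    retained in preference to any absence within min_sep of it.
    """
    # sorted() is stable: presences (key False) come first and input
    # order is preserved within presences and within absences.
    ordered = sorted(records, key=lambda rec: not rec[2])
    kept = []
    for x, y, present in ordered:
        if all(math.dist((x, y), (kx, ky)) >= min_sep for kx, ky, _ in kept):
            kept.append((x, y, present))
    return kept

# Hypothetical survey sites along a transect.
sites = [
    (0.0, 0.0, False),     # absence 100 m from a presence: dropped
    (100.0, 0.0, True),    # presence: kept
    (1000.0, 0.0, False),  # absence 900 m from the presence: kept
    (1200.0, 0.0, False),  # absence 200 m from a kept absence: dropped
]
filtered = thin_records(sites)
```

Because the procedure is greedy, the records retained depend on processing order. Other rules, such as keeping the most recent record, would be equally consistent with the description in the text.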

Survey method or effort covariates may be incorporated in models when survey data are collected using methods of variable reliability (Pearce et al. 2001a). In our study such covariates were not required because data used for modelling were relatively uniform with respect to method and effort.

Environmental data

One of the primary limitations of the predictive performance of wildlife habitat models is the availability of broadly mapped environmental variables that are closely related to environmental attributes that affect the distribution of wildlife. Austin (2002) delineated two types of independent variables for model building: (i) ‘proximal’ (direct) variables are those that represent resource, shelter or thermal gradients that have a direct influence on a species distribution (e.g. temperature and foliar-nutrient); and (ii) ‘distal’ (or indirect) variables that have no physiological effect on the species but are correlated with ‘proximal’ variables (e.g. altitude, latitude). Modelling with proximal variables will more often produce a model that makes transportable and robust predictions, whereas models based on distal predictors are likely to be more specific to the location in which they were constructed (Austin 2002). However, direct variables are not always available as GIS layers because they tend to be difficult to map (Guisan & Zimmerman 2000), so model building for the purpose of prediction is often undertaken using distal variables.

Table 1. Number of presence and absence sites derived from the LHCCREMS and BSS databases used in model building and evaluation for each of the seven priority species

Species                  Presence records   Absence records   Total
Koala                    88                 162               250
Tiger quoll              36                 75                111
Squirrel glider          112                129               241
Yellow-bellied glider    92                 152               244
Masked owl               55                 149               204
Powerful owl             97                 142               239
Sooty owl                56                 156               212

BSS, biological systematic survey; LHCCREMS, The Lower Hunter and Central Coast Regional Environmental Management Strategy.

Experts should play a role in the identification of environmental variables that may be important predictors of a species' habitat. They may be useful in developing new variables that are combinations or derivations of distal variables. For example, indices such as a ‘hollows index’ or ‘foliar nutrient index’ have been developed from maps of forest age and floristic composition and used in habitat modelling in the past (National Parks & Wildlife Service 1998). Similarly, vegetation variables with many categories of vegetation are often difficult to use as habitat variables in their raw form. Experts may be used to aggregate vegetation classes into those most likely to be of relevance to the species being modelled, or to develop interaction terms between, for example, vegetation type, topographic context and forest age. Derived variables may serve as useful predictors of habitat where the raw variables do not. Similarly, neighbourhood measures such as ‘the proportion of old forest within 2 km’ have also been used successfully in habitat modelling (Ferrier et al. 2002a). Neighbourhood measures convey the local environmental context of a site, which may be as important as the attributes of the site itself, especially where the home range of a species is larger than the cell size used in modelling or where survey locations are imprecisely known. Compiling a set of candidate environmental variables should involve careful consideration of the biology of the species being modelled.
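A neighbourhood measure of the kind described above can be sketched in a few lines. This is a minimal illustration only: the function name, the circular window measured between cell centres, and the tiny forest mask are all assumptions made for the example, not the procedure used to derive the study's GIS layers.

```python
import math

def neighbourhood_percentage(grid, row, col, radius_m, cell_size_m=100.0):
    """Percentage of cells within radius_m of (row, col) that are flagged 1.

    grid is a list of rows of 0/1 flags (e.g. 1 = old forest). Cells that fall
    outside the grid are ignored rather than counted as 0 (an assumption).
    """
    reach = int(radius_m // cell_size_m)  # number of cells the radius spans
    inside = hits = 0
    for r in range(row - reach, row + reach + 1):
        for c in range(col - reach, col + reach + 1):
            if not (0 <= r < len(grid) and 0 <= c < len(grid[0])):
                continue
            # Distance between cell centres, in metres.
            if math.hypot(r - row, c - col) * cell_size_m <= radius_m:
                inside += 1
                hits += grid[r][c]
    return 100.0 * hits / inside

# Hypothetical 5 x 5 mask (1 = old forest) on 100 m cells.
old_forest = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
pct_100m = neighbourhood_percentage(old_forest, 2, 2, radius_m=100)
pct_200m = neighbourhood_percentage(old_forest, 2, 2, radius_m=200)
```

Computing the same measure at several radii, as in the last two lines, mirrors the idea of testing a range of neighbourhood distances for each species.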

A set of climatic variables derived from ANUCLIM (Houlder et al. 1999) was available for use in model development. These may have a direct role in determining species distributions through metabolic constraints. However, the remainder of the available predictor variables would be considered distal. Neighbourhood measures were derived and tested for all species at a range of neighbourhood distances. The absence of predictor variables relating to forest growth stage, a commonly used surrogate for tree-hollow and denning availability (National Parks & Wildlife Service 1998, 2000; Loyn et al. 2001), is likely to place substantial limitations on the predictive performance of final models. All environmental variables were stored as spatial layers (or grids) in a GIS with a grid cell resolution of 100 m, which was satisfactory with respect to the home range size of the target species.

Testing the adequacy of survey data based on environmental strata

There were a sufficient number of species survey data points in the region for fitting statistical models for each of the seven priority species. However, having a sufficient number of data points for fitting a regression model does not guard against inherent bias in survey data. To address concerns about geographical and environmental biases, data were tested for their coverage of key environmental strata. This was undertaken to ensure that the models developed would be relevant to the range of environmental conditions present in the region and that excessive extrapolation of fitted responses would not be required. Environmental strata were defined by overlaying GIS raster maps (1 ha grid cell size) of four key variables: broad vegetation cover (seven classes), topographic position (three classes), mean annual temperature (four classes) and mean annual rainfall (seven classes), resulting in 384 possible strata, 256 of which were represented by at least 50 ha of forest in the region. By overlaying survey locations and the map of environmental strata, it is possible to tabulate the proportion of sampled versus unsampled strata for each species. Maps showing the regional coverage of sampled and unsampled strata were created for each species to illustrate where model predictions may be less reliable and where future biological surveys could be targeted.
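The overlay of survey locations on the strata map amounts to a simple tabulation, sketched below. The function name, the tuple labels and the toy cell counts are hypothetical; the default minimum of 50 cells echoes the 50 ha cut-off mentioned above, on the assumption of 1 ha cells.

```python
from collections import defaultdict

def sampled_area_percentage(cell_strata, survey_strata, min_cells=50):
    """Percentage of forest cells lying in strata that hold at least one survey site.

    cell_strata: one stratum label per forested 1 ha cell.
    survey_strata: the stratum label found at each survey location.
    Strata covering fewer than min_cells cells are excluded from the tally.
    """
    area = defaultdict(int)
    for stratum in cell_strata:
        area[stratum] += 1
    kept = {s: n for s, n in area.items() if n >= min_cells}
    sampled = set(survey_strata)
    total = sum(kept.values())
    covered = sum(n for s, n in kept.items() if s in sampled)
    return 100.0 * covered / total

# Hypothetical strata labelled (vegetation, topography, temperature, rainfall).
dry = ("dry forest", "ridge", "warm", "low")
wet = ("wet forest", "gully", "cool", "high")
rare = ("rainforest", "gully", "cool", "high")
cells = [dry] * 60 + [wet] * 40 + [rare] * 5
surveys = [dry, dry, dry]  # every survey site fell in the dry-forest stratum
coverage = sampled_area_percentage(cells, surveys, min_cells=10)
```

Here the rare stratum is dropped for being too small, and 60 of the remaining 100 cells belong to a sampled stratum.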

Model development and evaluation

Data preparation

Geographical information system layers representing environmental variables were sampled at each survey location using ArcInfo (ESRI 1997) to construct a modelling data frame for each species. Similar functions are available in most GIS software. Each row of data contained the survey observation (1 = species present, 0 = species absent) and values for each of the candidate predictor variables at the survey locations. This resulted in an n × (k + 1) data matrix, where n is the number of survey locations and k is the number of potential predictor variables.

In order to construct presence-only models, a second data matrix was created for each species that contained records for which the species was present. Background samples (sometimes termed ‘pseudo-absence’ data) were generated for 10 000 random locations across the landscape according to methods described by Ferrier and Watson (1997) and using software developed by Landcare New Zealand (J. Overton, pers. comm. 2005). The rationale for using background samples when no ‘real’ or systematic absence data exist is that they can be used to create an environmental profile of the study area, which is then compared with the environmental profile of known ‘presence’ locations. An n × (k + 1) modelling matrix was created as for the presence–absence data, where n is equal to the number of presence observations plus the 10 000 random ‘observations’ of absence.
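The shape of the presence-background matrix can be illustrated with a short sketch. It is not the Landcare software or the Ferrier and Watson (1997) procedure: the function name, sampling without replacement, and the choice not to exclude presence cells from the background are all assumptions made for illustration.

```python
import random

def build_presence_background(presence_rows, landscape_rows, n_background, seed=0):
    """Stack presence rows (response 1) on random background rows (response 0).

    Each input row is a list of k predictor values; each output row has k + 1
    entries, with the response in the first position.
    """
    rng = random.Random(seed)
    # Draw background cells at random, without replacement (an assumption).
    background = rng.sample(landscape_rows, n_background)
    return ([[1] + list(row) for row in presence_rows]
            + [[0] + list(row) for row in background])

# Hypothetical landscape of 100 cells with k = 2 predictors each.
landscape = [[float(i), float(i % 7)] for i in range(100)]
presences = [[3.0, 1.0], [8.0, 2.0], [15.0, 4.0]]
matrix = build_presence_background(presences, landscape, n_background=10)
```

With three presences and ten background draws, the result has n = 13 rows and k + 1 = 3 columns.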

Data transformation

In some instances, the distribution of environmental variables at survey locations may be long-tailed. This may bring about problems with model fitting due to points in the tails having high leverage, and tends to reduce the explanatory power of independent variables. Data may be transformed to avoid this problem. A common transformation is the log-transformation, though a range of other transformations exist. Inspection of our modelling data revealed no substantial problems with long-tailed predictor variables.

Variable reduction

Approximately 50 variables were available that described the environmental characteristics which may govern site occupancy for the seven target species. It is common to offer all possible variables as candidate model predictors (e.g. National Parks & Wildlife Service 1998) and utilize an automated variable selection routine to eliminate inappropriate predictors and to specify the final model. However, offering many candidate variables tends to result in models that include nonsense predictors and exclude variables that in fact influence the probability of occupancy (Derksen & Keselman 1992; Chatfield 1995; Steyerberg et al. 2001a). This is especially problematic when the number of species observations is low compared with the number of candidate predictors.

An alternative approach is to minimize the number of variables that are offered to the variable selection routine. Harrell et al. (1996) recommend a rule of thumb that fewer than m/10 predictor degrees of freedom (PDF) should be offered as candidates to a variable selection routine such as backward selection, where m equals the number of observations of the least prevalent class in the modelling data set (often the number of presence observations). PDF indicates the number of possible parameters estimated in the largest model that could be constructed from the set of candidate predictors. The number of parameters depends on the number of predictors and the form in which the predictor variables are used in the model (see next section). For categorical variables or non-linear responses, there is more than one PDF for each environmental variable. For example, categorical variables have a parameter for all but one category, sometimes leading to many PDF per categorical variable. Similarly, quadratic or cubic functional forms require two and three PDF, respectively. In our situation, m ranges from 36 for the tiger quoll to 112 for the squirrel glider (Table 1). Therefore, the maximum number of candidate PDF offered to the variable selection routine should be no more than 4 or 11, respectively. In this case, the allowable PDF is much less than the total number available. The number of candidate PDF was firstly reduced by removing distal variables that were highly correlated with proximal variables (R > 0.6) in the context of biological or metabolic requirements of the species.
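The arithmetic behind the m/10 rule can be made concrete with a small sketch. The function names and the candidate set are hypothetical, and rounding m/10 to the nearest whole number is an assumption chosen because it reproduces the limits of 4 and 11 quoted above.

```python
def pdf_budget(n_presence, n_absence):
    """Approximate cap on candidate predictor degrees of freedom (m / 10)."""
    m = min(n_presence, n_absence)  # size of the least prevalent class
    return round(m / 10)            # rounding to nearest is an assumption

def candidate_pdf(terms):
    """Total PDF for a candidate set.

    terms maps a variable name to ("linear",), ("poly", degree) or
    ("categorical", n_levels).
    """
    total = 0
    for spec in terms.values():
        if spec[0] == "linear":
            total += 1
        elif spec[0] == "poly":
            total += spec[1]        # quadratic = 2, cubic = 3
        elif spec[0] == "categorical":
            total += spec[1] - 1    # one parameter per level except the baseline
        else:
            raise ValueError(f"unknown term type: {spec[0]}")
    return total

budget_quoll = pdf_budget(36, 75)     # tiger quoll counts from Table 1
budget_sqgl = pdf_budget(112, 129)    # squirrel glider counts from Table 1

# Hypothetical candidate set, for illustration only.
candidates = {"temp": ("poly", 2), "rain": ("linear",), "veg": ("categorical", 7)}
over_budget = candidate_pdf(candidates) > budget_quoll
```

In this toy case a single seven-class vegetation variable already uses six PDF, so the candidate set exceeds the tiger quoll budget before any further variables are considered.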

Irrespective of variable reduction issues, fitting statistical models with collinear predictor variables may cause statistical problems and should be avoided (Belsley et al. 1980). Expert opinion and previous habitat models were consulted where further variable reduction was required. The number of candidate model variables (and PDF) offered to the variable selection routine varied for each species depending on the amount of biological survey data available. The final set of candidate predictors presented in Table 2 includes variables that were offered as candidates for at least one species.

Variable form

One of the strengths of GLMs and GAMs is the possibility of fitting non-linear relationships between the dependent variable (probability of occurrence) and the independent environmental variables. However, it may be difficult to know a priori whether a particular relationship is likely to be linear, quadratic, cubic or other. One strategy that is commonly employed is to allow an automated variable selection routine full freedom to fit any relationships using a trade-off between complexity and variance explained. However, it has been shown that such an approach may result in the fitting of spurious relationships with no logical interpretation (Steyerberg et al. 2001a). Consequently, we recommend either choosing functional forms with no automated variable selection, or limiting the range of functional forms available to the variable selection algorithm to those that are supported by sound ecological intuition and preliminary data analysis. In this exercise, we conduct preliminary analyses on response shapes by fitting univariate GAMs with five degrees of freedom and visually inspecting plots of response shapes (Austin & Meyers 1996). This approach allows the user to assess whether more complicated responses appear sensible and are justified by the data. Visual inspection of fitted response shapes is used again later in the process for evaluating the ecological realism of final selected models (see below).

Variable selection and model fitting

Final GLMs and GAMs for all species were selected and fitted to both presence–absence and presence-background data using a backward stepwise variable selection algorithm (Venables & Ripley 2003) in R, resulting in four models for each species. The variable selection algorithm tests a series of nested models using Akaike’s information criterion (AIC; Akaike 1973; Venables & Ripley 2003) to select between models. The choice of algorithm may influence the structure of the final model, and alternative automated model selection algorithms, including forward selection, are available. The backward selection algorithm is generally preferred to the other standard automated methods because it generally performs better in the presence of collinear candidate variables and because it requires consideration of the full model fit (Harrell 2001). There is a substantial body of literature dedicated to model selection issues, and automated algorithms have been criticized (Chatfield 1995; Hoeting et al. 1999; Harrell 2001). However, the alternatives tend to be complicated. We have chosen to use a packaged model selection algorithm as a compromise between practicality and sophistication. We reiterate that expert reduction of predictor variables prior to final model selection is critical to the successful application of automated variable selection routines (Harrell 2001).
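The logic of backward selection on AIC can be shown generically. The sketch below is not the packaged R routine used in the study: it takes any caller-supplied function that returns an AIC for a set of terms, and the toy scoring function with its "deviance gains" is invented purely so the example runs.

```python
def backward_select(candidates, aic):
    """Backward stepwise selection: drop one term at a time while AIC improves.

    candidates: iterable of term names making up the full model.
    aic: callable that takes a frozenset of terms and returns that model's AIC.
    """
    current = frozenset(candidates)
    best = aic(current)
    while current:
        # Score every model obtained by deleting a single term.
        trials = [(aic(current - {term}), term) for term in sorted(current)]
        trial_aic, dropped = min(trials)
        if trial_aic >= best:
            break  # no single deletion lowers AIC, so stop
        current, best = current - {dropped}, trial_aic
    return current, best

# Hypothetical reductions in deviance attributed to each term.
GAIN = {"rain": 10.0, "rugg500": 5.0, "noise": 0.5}

def toy_aic(terms):
    deviance = 100.0 - sum(GAIN[t] for t in terms)
    return deviance + 2 * (len(terms) + 1)  # +1 parameter for the intercept

selected, selected_aic = backward_select(GAIN, toy_aic)
```

The weak "noise" term is removed because dropping it lowers AIC, while dropping either remaining term would raise it.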

Model evaluation statistics

Presence–absence models were evaluated using two statistics: (i) the area under the receiver operating characteristic (ROC) curve (Hanley & McNeil 1982) (ROC area), closely related to the Mann–Whitney U-statistic; and (ii) Miller’s calibration slope (Miller et al. 1991; Pearce & Ferrier 2000). The ROC area evaluates a model’s ability to discriminate between presence and absence sites and is therefore referred to as a measure of model (or predictive) ‘discrimination’ (Pearce & Ferrier 2000). This provides an indication of the usefulness of the models for prioritizing areas in terms of their relative importance as habitat for the particular species. The ROC area ranges from 0 to 1, where a score of 1 implies perfect discrimination and a score of 0.5 implies predictive discrimination that is no better than a random guess (Bambar 1975). The actual value of the ROC area has a straightforward interpretation: it is the probability that, for a randomly selected pair of presence–absence observations derived from field surveys, the model prediction for the presence will be greater than the prediction for the absence.
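The pairwise interpretation of the ROC area translates directly into code. The sketch below counts, over all presence-absence pairs, how often the presence site receives the higher prediction; counting ties as one half is the usual Mann–Whitney convention and is assumed here. The predicted probabilities are made up for illustration.

```python
def roc_area(presence_preds, absence_preds):
    """Probability that a random presence site outranks a random absence site."""
    wins = 0.0
    for p in presence_preds:
        for a in absence_preds:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5  # ties count as half (assumed convention)
    return wins / (len(presence_preds) * len(absence_preds))

# Hypothetical predicted probabilities at surveyed sites.
presence = [0.9, 0.7, 0.6, 0.4]
absence = [0.8, 0.5, 0.3, 0.2, 0.1]
area = roc_area(presence, absence)  # 16 of the 20 pairs are ranked correctly
```

The double loop is quadratic in the number of sites, which is adequate for data sets of the size in Table 1; rank-based formulas give the same value more efficiently.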

Miller’s calibration statistics (MCS) evaluate the ability of a model to correctly predict the proportion of sites with a given environmental profile that will be occupied. MCS are derived from a logistic regression of observations on the logit of predicted probabilities. The rationale is that the slope of the regression would be equal to one and the regression intercept would equal zero if the predictions from the model were perfectly calibrated (Harrell 2001; Pearce & Ferrier 2000). The model is known as the ‘logistic calibration equation’. The two calibration statistics are literally the logistic regression slope (‘calibration slope’) and intercept (‘calibration intercept’), though it is common just to report the calibration slope.
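The logistic calibration equation can be fitted with a few lines of Newton-Raphson, sketched below. This is an illustrative implementation rather than the authors' R code; the function name, the fixed iteration count and the toy data are assumptions, and predictions must lie strictly between 0 and 1 for the logit to be defined.

```python
import math

def calibration_line(observed, predicted, iterations=25):
    """Fit logit(P(y = 1)) = intercept + slope * logit(predicted)."""
    x = [math.log(p / (1.0 - p)) for p in predicted]  # logit of each prediction
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, observed):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu            # score for the intercept
            g1 += xi * (yi - mu)     # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system (information matrix) * step = score.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Hypothetical, perfectly calibrated data: of 10 sites predicted at 0.2,
# exactly 2 are occupied; likewise 5 of 10 at 0.5 and 8 of 10 at 0.8.
predicted = [0.2] * 10 + [0.5] * 10 + [0.8] * 10
observed = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2
intercept, slope = calibration_line(observed, predicted)
```

For these data the fitted intercept is zero and the slope is one, as the rationale above requires. A slope below one would indicate predictions that are too extreme relative to the observed occupancy rates.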

The presence-background models were evaluated qualitatively. The statistics available for evaluating presence-only models are limited, and a fair comparison with the presence–absence models was not possible here.

Table 2. Abbreviated names and definitions of mapped environmental data used as candidate predictor variables for inclusion in habitat models. All environmental data were available in raster format with 100 m (side length) grid cell size

Candidate variable     Definition
Temp                   Mean annual temperature, derived from ANUCLIM
Cold                   Mean temperature of the coldest period, derived from ANUCLIM
Rain                   Mean annual rainfall, derived from ANUCLIM
Dry2000                The percentage of cells in a 2000-m radius containing dry forest
Rf2000                 The percentage of cells in a 2000-m radius containing rainforest
Solar                  The solar radiation index of a cell, derived from ANUCLIM
Elev                   The elevation of a cell (in metres) above sea level
Rugg250 (500, 1000)    Topographic ruggedness (standard deviation in elevation) in a 250-m, 500-m and 1000-m radius
Terr250 (500, 1000)    Relative terrain position in a 250-m, 500-m and 1000-m radius
Topo                   The topographic position of a cell, ranging from gully to ridge top (0–100)
Unmod500               The percentage of cells in a 500-m radius containing unmodified forest
Unmod2000              The percentage of cells in a 2000-m radius containing unmodified forest
Wet500                 The percentage of cells in a 500-m radius containing wet forest
Wet2000                The percentage of cells in a 2000-m radius containing wet forest
Fert                   Index of the soil nutrient content at a site, based on geochemical data (CSIRO)
Percnonfor2k           The percentage of cells in a 2000-m radius classified as cleared land
Ybglexp2000†           The percentage of cells in a 2000-m radius containing suitable ybgl habitat
Sqglexp2000†           The percentage of cells in a 2000-m radius containing suitable sqgl habitat
Sowlexp2000†           The percentage of cells in a 2000-m radius containing suitable sowl habitat
Mowlexp2000†           The percentage of cells in a 2000-m radius containing suitable mowl habitat
Powlexp2000†           The percentage of cells in a 2000-m radius containing suitable powl habitat
Koalexp2000†           The percentage of cells in a 2000-m radius containing suitable koal habitat

†Denotes expert variable derived from vegetation classes (Michael Murray, pers. comm. 2004). koal, koala; mowl, masked owl; powl, powerful owl; quol, tiger quoll; sowl, sooty owl; sqgl, squirrel glider; ybgl, yellow-bellied glider.

The model evaluation method (bootstrapping)

Presence–absence models for each species were evaluated using a ‘bootstrapping’ approach (Efron & Tibshirani 1997; Harrell 2001). Naively testing a model on the data that were used to fit it (also known as ‘in-sample validation’) is inappropriate and known to provide optimistic estimates of model performance (Fielding & Bell 1997; Steyerberg et al. 2001b), despite being a commonly used approach to model evaluation (National Parks & Wildlife Service 1998, 2000). Ideally, models should be tested on a completely independent data set, with data collection and stratification specifically designed for model evaluation. However, this is seldom a practical option and may also be prone to high variance unless samples are large (J. Elith, unpublished data 2005). Bootstrapping provides a realistic estimate of the predictive performance of a model without incurring the expense of collecting a completely new model-testing data set. Bootstrapping involves resampling the modelling data and conducting a series of model building and testing simulations that provide an estimate of the optimism arising from in-sample validation. The estimate of optimism is used to provide an adjusted estimate of the model evaluation statistics (ROC and MCS). The bootstrapping version implemented here is believed to provide the least biased estimate of predictive performance of any of the model evaluation methods that are based on resampling, including cross-validation (Hastie et al. 2001). Cross-validation (Efron & Tibshirani 1997) provides an alternative approach to model evaluation and might be more feasible with methods or data sets that create large computational loads. However, its estimates of error rates with independent data can be less precise than those derived from bootstrapping (Steyerberg et al. 2001b), which can be thought of as a smoothed version of cross-validation (Efron & Tibshirani 1997). The bootstrapping method (the 0.632+ bootstrap) is detailed in Appendix I. R code for obtaining bootstrapped estimates of ROC and MCS is available from the URL given previously.
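The idea of estimating optimism by resampling can be sketched as follows. Note that this is the simpler optimism-corrected bootstrap, not the 0.632+ variant that the authors detail in Appendix I; the function names, the default of 200 resamples and the toy majority-class "model" are assumptions made only to keep the example self-contained.

```python
import random

def optimism_corrected(data, fit, score, n_boot=200, seed=0):
    """Apparent performance minus the mean bootstrap estimate of optimism.

    fit(rows) returns a model; score(model, rows) returns a statistic such as
    the ROC area. Both are supplied by the caller.
    """
    rng = random.Random(seed)
    apparent = score(fit(data), data)  # in-sample (optimistic) performance
    optimism = 0.0
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]  # resample with replacement
        model = fit(sample)
        # How much better the model looks on its own training sample
        # than on the original data.
        optimism += score(model, sample) - score(model, data)
    return apparent - optimism / n_boot

# Toy stand-ins: the "model" is just the majority class of the sample.
def fit_majority(rows):
    ones = sum(y for _, y in rows)
    return 1 if 2 * ones >= len(rows) else 0

def accuracy(model, rows):
    return sum(1 for _, y in rows if y == model) / len(rows)

rows = [(0.1, 0), (0.4, 0), (0.5, 1), (0.7, 1), (0.9, 1)]
estimate = optimism_corrected(rows, fit_majority, accuracy, n_boot=50)
```

In a real application `fit` would rerun the whole model-building procedure, including variable selection, on each resample, so that the optimism estimate reflects every data-driven step.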

Inspecting models for ecological realism

Final models were inspected for their ecological realism using partial plots of univariate fitted functions (Venables & Ripley 2003), similar to those used to determine the complexity of candidate response shapes. Partial plots are available as an option in statistical software and provide a ready means to evaluate the ecological realism of fitted responses. If an implausible fitted relationship between the probability of occupancy by the species and a particular environmental variable was observed in the partial plots, the variable was excluded from the model or the model was re-fitted with fewer degrees of freedom for that particular environmental variable.

Case study results and interpretation

The adequacy of species survey data for model fitting

Environmental strata were reasonably well sampled, even for the species with the least and most restricted survey data. For six of the seven species, sampled strata make up at least 70% of the region that is under forest cover, the only exception being the quoll, for which only 56% of the region comprises sampled strata. Similarly, the geographical spread of sampled strata was reasonable for most species, with maps of sampled strata showing good coverage of the region for all species (Fig. 2). The maps of sampled and unsampled strata provide an indication of the areas in which habitat model predictions may be less reliable.

Models and predictive performance

The best models in terms of both predictive discrimination and calibration were those for the sooty owl, squirrel glider, yellow-bellied glider, koala and tiger quoll, all of which had predictive discrimination greater than 0.75 for their best model (Table 3). The worst models were the powerful owl and masked owl models, with bootstrapped ROC areas ranging from 0.61 to 0.69. In general, GAMs had similar variable structure and degrees of freedom to GLMs and comparable predictive discrimination. However, bootstrapped predictive calibration was always better for GAMs than for GLMs (Table 3). This is probably because the smooth functions in GAMs are better at fitting complex responses over the complete environmental range of the response, leading to more accurate predicted probabilities.

Presence-only models had similar model structures to the presence–absence models. The result suggests that the survey absence data used in the modelling exercise were not much more useful than a background random sample in defining unsuitable habitat for most species. This may result from the difficulties associated with obtaining reliable absence data for cryptic species. In addition, the quality of the presence-only data used in this modelling exercise is likely to be high compared with presence-only data found in most museums and herbaria because they were derived from systematic stratified surveys. Other studies have shown a decline in performance when fitting models with background samples compared with using presence–absence data (Ferrier & Watson 1997).

Preferred models were primarily chosen on the basis of predictive discrimination and calibration. Plausibility of response shapes was used to discern between models that showed similar predictive performance. Only model structures for the preferred model for each species are presented in Table 3. There was some variation between the four modelling methods in terms of the variables retained and their response shapes for each species. For example, the yellow-bellied glider GLM contained the term ‘dry2000’, while the GAM for the same species did not include this term. Similarly, the sooty owl GLM contained rainfall as a linear term, while the GAM for that species included rainfall as a smoothed spline with two degrees of freedom (a non-linear term). However, the structural differences between GLM and GAM models were minor, and the similarity in ROC areas of the two model types across all species (Table 3) implies that the differences had a minor impact on predictive discrimination.

Fig. 2. Coverage of sampled strata over the greater LHCC region for the best ((a) koala) and worst ((b) quoll) sampled species. Unsampled strata are represented by the light grey shading. Remaining areas within the region (not shaded) contain sampled strata or are non-forest and therefore not included in the analysis.

Table 3. Final presence–absence models and bootstrapped estimates of predictive discrimination (ROC) and calibration

Species   Preferred     ROC area        Calibration     Preferred model
          model type    GAM     GLM     GAM     GLM
Koala     GAM           0.76    0.76    0.84    0.75    sp ~ s(temp, 2) + s(dry2000, 3) + s(rugg500, 2) + unmod2000 + percnonfor2k
Quoll     GLM           0.71    0.75    0.73    0.59    sp ~ temp + rain + percnonfor2k + poly(dry2000, 2)
Sqgl      GAM           0.78    0.77    0.76    0.68    sp ~ s(rugg500, 3) + sqglexp500 + s(unmod500, 2)
Ybgl      GAM           0.78    0.77    0.73    0.63    sp ~ s(temp, 2) + s(rain, 2) + s(rf2000, 2) + unmod2000 + s(ybglexp2000, 3) + percnonfor2k
Mowl      GAM           0.63    0.61    0.63    0.55    sp ~ s(cold, 2) + mowlexp2000 + rf2000 + s(unmod2000, 2) + wet2000 + percnonfor2k
Powl      GAM           0.69    0.68    0.80    0.62    sp ~ s(cold, 3) + s(rain, 2) + s(rugg500, 2)
Sowl      GAM           0.85    0.85    0.74    0.67    sp ~ s(rain, 2) + rugg500 + s(sowlexp2000, 3) + s(ter1000, 2) + unmod2000

Only the preferred model selected from the presence–absence models (either a generalized linear model or a generalized additive model) is presented for each species. Column ‘preferred model type’ indicates which of the methods produced the final preferred model for each species. ROC and calibration slope statistics are presented for the model derived under each modelling method. Variable abbreviations and details are given in Table 2. The expressions ‘poly(variable, n)’ and ‘s(variable, n)’ in the preferred model column indicate variables included as polynomial (‘poly’) or smoothed (‘s’) terms with ‘n’ degrees of freedom. ROC, receiver operating characteristic.

Plots of fitted functions

Plotting fitted relationships provided a useful means to visualize and manipulate the behaviour of fitted models to ensure that they were ecologically meaningful. In general, fitted relationships were found to be ecologically plausible. Plots of sooty owl response shapes (Fig. 3) show how the probability of sooty owl site occupancy varies with each of the predictor variables contained in the final GAM model for that species. Fitted functions for individual variables in a multivariate model may be different to univariate fitted functions because of interactions between predictor variables. Consequently, it is important to observe response shapes for the final model to ensure that they remain ecologically reasonable in the multivariable context.

Habitat maps

A habitat map is provided for the sooty owl as an example of the type of output that can be used in conservation planning (Fig. 4). The map shows the predicted probability of sooty owl presence throughout the region. Probabilities have been classified into four categories (0–0.2, 0.21–0.5, 0.51–0.7, 0.71–1.0) for presentation. These maps also display the presence records for the species that were used in model fitting, allowing a visual assessment of predictive performance. Mapped predictions and uncertainties (based on upper and lower 95% prediction intervals) for the remaining six species are available from the authors.

Fig. 3. Partial plots of the relationship between the probability of occupancy and environmental variables included in the final sooty owl model. The X-axis represents the range of values for each environmental variable. Probabilities on the Y-axis are plotted in transformed ‘logit’ space so that they can be interpreted in the same way as linear regressions. Response shapes in each plot represent the relationship between each variable and the probability of sooty owl occupancy in the multivariate model context, independent of the other variables included in the model. Dashed lines represent 95% confidence intervals around the fitted response shape. Rugg500, topographic ruggedness (standard deviation in elevation) in a 500-m radius; sowlexp2000, the proportion of suitable vegetation (as defined by experts) in a 2000-m radius; terr1000, relative terrain position in a 1000-m radius; unmod2000, the percentage of cells in a 2000-m radius containing unmodified forest.

[Fig. 3 panels, by x-axis variable: rain, rugg500, sowlexp2000, terr1000, unmod2000. Y-axis in each panel: logit (probability of occupancy).]

DISCUSSION

The importance of ecological realism in modelling and the role of experts

The choice of modelling methods and approach to model building and evaluation presented in this paper reflects the importance we place on maintaining ecological realism throughout the model building process. While maintaining statistical rigour is central to good model building, there is an expansive literature on the statistical nuances of model fitting, interpretation and prediction (Guisan & Zimmerman 2000). Conversely, there is much less literature available concerning the importance of maintaining ecological realism in model building (Austin 1991, 2002). The process of plotting and evaluating fitted functions throughout model building and interpretation is fundamental to ensuring the ecological realism and general credibility of the final models.

Expert opinion forms an integral part of sound model building and evaluation, irrespective of the modelling platform and theoretical approach used (Steyerberg et al. 2001b). Some modellers view expert opinion as a source of undesirable subjectivity and prefer to let the data dictate the model. However, this ignores the limitations commonly encountered with ecological data (small to medium-sized species data sets that may include biases, and many candidate predictor variables) and the related challenges in developing robust and sensible models. Expert opinion can (and should) be used to assist in the identification of candidate model variables, interactions between variables and likely response shapes (Pearce et al. 2001b; Ferrier et al. 2002a), for corroboration of a model’s ecological realism (Austin 2002), for ad hoc evaluation of model predictions (National Parks & Wildlife Service 1998) and for preparation of predictive maps for use in decision making (Ferrier et al. 2002a). Experts may also be used effectively in the creation of indices that may themselves be used as candidate variables for statistical modelling (National Parks & Wildlife Service 1998). Consequently, the role of experts should be thought of as complementary to other, more data-driven methods, rather than as a competing alternative. In our case study, we emphasize the use of experts in variable reduction prior to automated variable selection. We use Harrell’s (2001) rule of thumb to identify the maximum PDF that should be offered to the variable selection routine. We contend that if expert knowledge is so lacking that reducing the candidate set to m/10 is impossible, resulting models should be treated with scepticism.

The lack of an accepted model evaluation statistic for presence-only models detracts from their utility in conservation planning and increases the importance of expert evaluation. Development of robust evaluations of presence-only models is ongoing.

Interpretation of habitat maps

Threshold predictions

Fig. 4. The predicted probability of sooty owl occupancy across the LHCC region. Probabilities are represented as a grey-scale (white = 0–0.2, light grey = 0.21–0.5, medium grey = 0.51–0.8, dark grey = 0.81–1.0). The open triangles show point survey locations where the sooty owl is recorded as present. Habitat maps were constructed by creating a vector of habitat model predictions (one for every 1 ha grid cell in the landscape) in R using the ‘predict.gam’ command. The prediction vector is then written to an ASCII text file in the correct dimensions for the landscape and imported to ArcInfo (or ArcView) for display.

In some instances, users of habitat maps may wish to identify a level of predicted probability of occupancy below which an area would be considered ‘unsuitable’ habitat for a given species. In general, such an interpretation of habitat maps is ill-advised unless the model that underpins the map is particularly well calibrated. However, in such instances where it is required, thresholds could be assigned in a risk-weighted manner. For example, if it is very important not to classify an area as ‘non-habitat’ when it may be used by a species, a low probability threshold (e.g. probability of occurrence = 0.1) should be specified when delineating non-habitat. By choosing a threshold of 0.1 for a well-calibrated model, it is implied that a 10% chance of a site (a grid cell of 100 m) being occupied by the species is a tolerable risk of failing to include areas that are in fact habitat for the species. This decision must be made on a case-by-case basis by evaluating the management costs of the two types of error (a false-negative and false-positive prediction), rather than by relying on an arbitrary threshold set by the modeller.
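One simple way to make a risk-weighted threshold explicit is to compare the expected costs of the two types of error. The sketch below is an illustration under stated assumptions, not a formula prescribed in this paper: it assumes well-calibrated probabilities and constant, user-supplied costs, and the function names and cost values are hypothetical.

```python
def non_habitat_threshold(cost_false_positive, cost_false_negative):
    """Probability below which labelling a cell 'non-habitat' has the lower expected cost.

    cost_false_negative: cost of labelling occupied habitat as non-habitat.
    cost_false_positive: cost of labelling unoccupied land as habitat.
    A cell with occupancy probability p is cheaper to call non-habitat when
    p * cost_false_negative < (1 - p) * cost_false_positive.
    """
    return cost_false_positive / (cost_false_positive + cost_false_negative)

def classify(predictions, threshold):
    """Label each predicted probability as 'habitat' or 'non-habitat'."""
    return ["non-habitat" if p < threshold else "habitat" for p in predictions]

# Hypothetical costs: missing real habitat is judged nine times worse
# than protecting land the species does not use.
threshold = non_habitat_threshold(1.0, 9.0)
labels = classify([0.05, 0.10, 0.30], threshold)
```

With a nine-to-one cost ratio the threshold works out to 0.1, matching the example in the text; a more even weighting of the two errors would push the threshold higher.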

Representing uncertainty

Many forms of uncertainty impact on maps of predicted probability of species occurrence (Elith et al. 2002). The prediction intervals available from regression models describe the part of this uncertainty associated with estimating model parameters, and this stems from inadequate data, errors in measurement and natural variation (Elith et al. 2002). There are additional uncertainties, such as model selection uncertainty and errors in explanatory variables, that are, however, not addressed by standard prediction intervals. While some methods have been developed for explicitly incorporating these forms of uncertainty in prediction intervals (Burnham & Anderson 2002; Wintle et al. 2003), there is currently very little guidance on how prediction intervals should be represented and interpreted in applications of habitat modelling and conservation planning.

Interpreting model evaluation statistics

The two statistics used for model evaluation in this study address different model applications. Specifically, model discrimination, measured by the ROC area, describes the degree to which the model successfully ranks presence sites higher than absence sites across the region in terms of predicted probability of presence. Models with satisfactory ROC areas will provide reliable ranking of areas in terms of habitat value. This mode of evaluation is relevant when the goal of a model is to rank or prioritize areas of interest in terms of their relative value as habitat for a species. This goal is different to that of accurately predicting the proportion of sites that are expected to be occupied at a given predicted probability of occurrence. For instance, at sites with a predicted probability of occurrence of 0.5, it is reasonable to expect that approximately 50% of such sites would contain the species. The degree to which this is true is described as model calibration (Miller et al. 1991) and is assessed using the logistic calibration equation. It is possible that a model with good discrimination can have poor calibration. It is also possible to improve a model’s predictive ability by adjusting the model parameters with calibration statistics (Harrell 2001) or other shrinkage methods (Hastie et al. 2001). A model with poor calibration may still be useful for prioritizing or ranking sites in terms of habitat value, but should not be used to predict the raw probability of finding a species at a given site. The choice of whether to focus on ROC areas or calibration statistics depends on the intended application of the models.
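The distinction between discrimination and calibration can be illustrated with a small numerical sketch. The data below are invented for illustration, and the calibration check is a deliberately crude stand-in (mean predicted probability minus observed prevalence) rather than the logistic calibration equation of Miller et al. (1991) used in the case study. The ROC area is computed in its Mann-Whitney form, as the proportion of presence-absence pairs that are ranked correctly.

```python
def roc_area(observed, predicted):
    """ROC area: share of presence/absence pairs ranked correctly (ties count half)."""
    pres = [p for o, p in zip(observed, predicted) if o == 1]
    absn = [p for o, p in zip(observed, predicted) if o == 0]
    if not pres or not absn:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for a in pres:
        for b in absn:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pres) * len(absn))


def calibration_gap(observed, predicted):
    """Mean predicted probability minus observed prevalence (near 0 if unbiased overall)."""
    return sum(predicted) / len(predicted) - sum(observed) / len(observed)


# Hypothetical survey outcomes (1 = present) and model predictions for ten sites.
observed = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
predicted = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]

# A monotone distortion keeps the ranking of sites but changes the probabilities.
distorted = [p ** 3 for p in predicted]

print(roc_area(observed, predicted), calibration_gap(observed, predicted))
print(roc_area(observed, distorted), calibration_gap(observed, distorted))
```

Because cubing the predictions preserves their order, the ROC area is unchanged, while the distorted predictions now understate prevalence considerably. Such a model would still rank sites sensibly but should not be read as giving raw probabilities of occurrence.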

The issue of occupancy versus persistence

Models of the sort discussed in this paper make the implicit assumption that occupancy of a location implies suitability of the habitat. Van Horne (1983) identified problems associated with this assumption, and Pulliam’s (1988) ‘source-sink’ model of population dynamics formalizes its conceptual deficiencies. Pulliam’s model differentiates ‘sink’ areas, where mortality exceeds population growth, from ‘source’ areas that maintain overall population size through emigration. Similarly, if the total population is in a state of decline, it is possible that ‘remnant’ individuals may exist in unsuitable habitat that will not be occupied in the next generation. When a population is expanding its range, observations of absence of the species may falsely imply that the suitability of the habitat is low. These anomalies arise due to what is known as the ‘equilibrium’ assumption, which underpins the current static approach to wildlife habitat modelling (Austin 2002). Inclusion of disturbance history variables, dispersal barriers, competition and successional dynamics may assist in modelling such situations, though examples of such approaches are rare.

Survey design and sampling issues

This paper has focused on the choice and execution of particular modelling methods, given a set of data. Issues of data quality and sampling have not been addressed. Because modelling results are only as good as the available data, the issues of data quality and sampling demand some discussion here.

Sample size

From a statistical perspective, the choice of sample size is a function of the desired precision of results. In studies where the goal is to make inference about a particular effect (e.g. the influence of time since logging on the probability of greater glider occupancy), power analysis may be used to determine the sample size required to achieve a statistically significant result for a given effect size, type I error rate and sampling variance (Burgman & Lindenmayer 1998). The relationship between statistical significance, power and model predictive performance is unclear, and there is no guarantee that a model with statistically significant terms will give good predictive performance. Unfortunately, there is no simple way to determine a priori the predictive performance of a habitat model, or to calculate a sample size that will ensure a predictive performance that is statistically significantly better than that of a null or alternative model. We could find only one attempt to address this problem in the statistical literature (Obuchowski 1994), making this an important area of future research. A simulation study that provided rules of thumb about sample sizes required to achieve a given predictive performance under a range of likely modelling scenarios would be an important contribution to the habitat modelling literature.

Data quality

A more subtle consideration is the geographical spread and environmental stratification of observations compared to the extent and environmental variability of the region for which models are to be built. We briefly touch on this issue in our case study when testing the adequacy of survey data using environmental strata. Little guidance is provided in the literature as to the minimum geographical or environmental coverage of data for statistical modelling, though it is common sense to expect that model predictions are unlikely to be reliable in environmental domains for which there are no survey data (Austin & Meyers 1996). Stratified sampling designs such as ‘GRADSECT’ (Austin & Heyligers 1989) that ensure a geographical and bioclimatic coverage of sampling locations are therefore appealing. GRADSECT theory, or newer developments such as Generalized Dissimilarity Modelling (GDM; Ferrier et al. 2002b) or a p-median criterion (Faith & Walker 1996), may also be applied to existing data sets to identify ‘gaps’ in the geographical and environmental coverage of samples.

Most statistical analyses, including the regression methods used in our case study, assume that observations are independent of each other. Survey locations that are close together are much less likely to be independent, especially when surveys target animals with large home ranges. Survey design should aim to ensure a minimum separation distance that is at least as great as the home range radius of the widest ranging species in the study. Random sampling within geographical and environmental strata assists in minimizing unwanted dependencies in observation data. Guidance on sampling theory should be sought from one of the many texts that deal with the topic in detail (e.g. Sutherland 1996; Thompson 2002).
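The minimum separation rule can be checked on an existing set of survey locations before modelling. The following is a minimal sketch with invented coordinates (in metres, on a projected grid) and a hypothetical home range radius; the function name is ours and is not part of any survey design package.

```python
import math


def pairs_closer_than(sites, min_separation_m):
    """Return (i, j, distance) for every pair of sites closer than the minimum."""
    close = []
    for i in range(len(sites)):
        for j in range(i + 1, len(sites)):
            dx = sites[i][0] - sites[j][0]
            dy = sites[i][1] - sites[j][1]
            distance = math.hypot(dx, dy)
            if distance < min_separation_m:
                close.append((i, j, distance))
    return close


# Hypothetical site coordinates (easting, northing) in metres.
survey_sites = [(0.0, 0.0), (300.0, 400.0), (5000.0, 5000.0)]
# Hypothetical home range radius of the widest ranging species in the study.
home_range_radius_m = 1000.0

flagged = pairs_closer_than(survey_sites, home_range_radius_m)
print(flagged)
```

Flagged pairs are candidates for thinning, or for treatment with one of the spatially explicit methods discussed later, rather than evidence that the data are unusable.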

Technical issues and promising advances in modelling methods

Species versus community approaches

Models of the distribution of single species are not the only way to approach conservation planning questions. Single species models are important for a subset of species, such as threatened, focal or flagship species. However, landscape planning aims to conserve the biodiversity of a region, and there are too many species and too little data to achieve this through single-species modelling. Therefore, there is a role for community-level modelling (Ferrier et al. 2002b). One example of community-level modelling is GDM, already mentioned in the context of sampling, which models compositional dissimilarity across the landscape. This could then contribute to a conservation plan that aimed to span the range of species patterns in the landscape. A further role for methods based on data from many species is to use them for modelling rare species, or species in data sets with relatively few records per species. In such situations there may be insufficient data for developing a robust single-species model. GDMs, canonical correspondence analysis (CCA) or multivariate adaptive regression splines (MARS) may be useful for this purpose.

Spatial autocorrelation

Spatial autocorrelation in wildlife observation data arises when environmental processes and patterns that influence the spatial distribution of wildlife are themselves spatially structured, and/or because the species is subject to demographic processes, territoriality or dispersal limitations causing spatial dependence (contagion or dispersion effects). Demographic and environmental processes underlying spatial patterns in wildlife distributions are usually poorly understood and therefore difficult to incorporate in model fitting. Consequently, model residuals are often spatially correlated (i.e. not independent), violating one of the basic assumptions of regression modelling. In practice, non-independence usually results in underestimation of standard errors and overestimation of the importance of habitat variables (Legendre & Fortin 1989). Methods such as autologistic regression (Augustin et al. 1996), generalized estimating equations (Albert & McShane 1995) and geographically weighted regression (Fotheringham et al. 2002) may be used to incorporate spatial autocorrelation in habitat analyses, but they are still not incorporated in standard statistical software and are technically demanding to implement.

Model selection and model uncertainty

The standard approach to representing prediction uncertainty involves the calculation of prediction (or confidence) intervals that incorporate uncertainty about parameter estimates. However, this approach to prediction implicitly assumes that the model chosen is the best available representation of the truth and effectively ignores model selection uncertainty, resulting in overconfident predictions. Automated variable selection algorithms exacerbate this problem by providing a method for dredging through many candidate predictors in search of an explanatory model (Chatfield 1995; Hoeting et al. 1999). One solution is to completely avoid automated model selection and to analyse only a small set of ‘a priori’ models (Burnham & Anderson 2002). This approach may be practical when strong prior knowledge exists and where the number of plausible models is small. However, if prediction is the primary goal and many possible models exist, some level of automated variable selection is often desirable. Model averaging approaches, including Bayesian model averaging, have been promoted in a range of disciplines as a means of incorporating model selection uncertainty into statistical inference and prediction (Hoeting et al. 1999) and have been applied in some ecological examples (Burnham & Anderson 2002; Wintle et al. 2003).
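As a rough illustration of how model averaging spreads a prediction across competing models, the sketch below uses Akaike weights in the information-theoretic style of Burnham & Anderson (2002). This is a simpler relative of Bayesian model averaging, not the procedure of Wintle et al. (2003), and the AIC values and site predictions are invented for the example.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC values for a set of candidate models into weights that sum to 1."""
    best = min(aic_values)
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


def average_prediction(predictions, weights):
    """Weighted average of the candidate models' predictions for one site."""
    return sum(p * w for p, w in zip(predictions, weights))


# Hypothetical AIC values for three candidate habitat models, and each model's
# predicted probability of occurrence at a single site.
aics = [210.4, 211.1, 215.9]
site_predictions = [0.62, 0.48, 0.30]

weights = akaike_weights(aics)
averaged = average_prediction(site_predictions, weights)
print([round(w, 3) for w in weights], round(averaged, 3))
```

The averaged prediction sits between the candidate predictions and leans towards the better supported models, so no single selected model is treated as the truth.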

Estimating and incorporating detectability in habitat models

Unless the probability of detecting a species when it is present is equal to 1, false negative observation errors will occur in species surveys. The probability of detecting the presence of the case study species in any single standard survey based on spot-lighting and call elicitation has been found to be very low (Pr[detection|presence] ≈ 0.12–0.45; Wintle et al. in press), making the reliability of absence data a potentially serious form of uncertainty in our case study. Recent studies have demonstrated the negative impact that false-negative observation error may have on species-habitat analyses (Tyre et al. 2003), meta-population models (Moilanen 2002) and monitoring studies (MacKenzie et al. 2002). Recently developed techniques for incorporating detectability in model estimation (MacKenzie et al. 2002; Tyre et al. 2003) reduce bias in model estimation brought about by false absences, though little effort has been invested in testing the relative predictive performance of such models.
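The practical consequence of low detectability can be shown with simple arithmetic. If visits are assumed to be independent with a constant per-visit detection probability p (an assumption made here for illustration, not a result from the case study), the chance of recording a false absence at an occupied site after k visits is (1 − p)^k. The sketch below applies this to the two ends of the reported range; the 5% tolerance is a hypothetical target.

```python
import math


def prob_false_absence(p_detect, n_visits):
    """Chance of never detecting a species that is present, over n independent visits."""
    return (1.0 - p_detect) ** n_visits


def visits_needed(p_detect, max_false_absence=0.05):
    """Smallest number of visits that keeps the false-absence chance at or below the target."""
    if not 0.0 < p_detect < 1.0:
        raise ValueError("p_detect must lie strictly between 0 and 1")
    return math.ceil(math.log(max_false_absence) / math.log(1.0 - p_detect))


# Per-visit detection probabilities at the ends of the range quoted in the text.
for p in (0.12, 0.45):
    k = visits_needed(p)
    print(p, k, round(prob_false_absence(p, k), 4))
```

Under these assumptions roughly 24 visits are needed at p = 0.12 and 6 at p = 0.45, which gives a sense of why single-visit absences are a weak basis for habitat models of these species.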

Incorporating biotic interactions

Despite the prevalence of interspecific competition as a key concept in community ecology (Diamond 1975), very little has been attempted by way of incorporating interspecific competition in distribution models (for one exception see Leathwick & Austin 2001). Regression models such as those described by Leathwick and Austin (2001) could be applied to fauna species. Future development of ‘loop analysis’ (Levins 1975) or systems of simultaneous regressions (Guisan & Zimmerman 2000) may facilitate spatial models of wildlife distributions that incorporate competitive interactions.

The GLMs and GAMs presented here are relatively simple, though suited to the conservation problem and available data. There is scope for greater sophistication in the modelling method where a high level of modelling expertise exists, though we believe our recommended approach represents a reasonable trade-off between practicality and rigour.

Model improvement and adaptive management

Due to environmental change, natural and anthropogenic disturbance and population stochasticity, all predictive models will become redundant if they are not updated through time. We recommend that models be applied in an adaptive framework (Walters & Holling 1990), allowing immediate application of models in management but ensuring a commitment to continual improvement of models through future data collections and refinements to modelling methods.

Future data collections will be central to rigorous model testing and model refinements. They should be targeted to fill gaps in current data sets (Fig. 3). Particular attention should be paid to the collection of high quality observation data based on multiple site visits aimed at minimizing false absences (Wintle et al. 2004). Some important environmental variables, including those based on forest growth stage data, were not available for model fitting in the LHCC region. It is likely that models would have better predictive accuracy if forest growth stage information became available and was incorporated in models.

CONCLUSION

Habitat models are now a widely used tool in public land conservation planning. Systematic conservation planning in the urban fringe will require such tools to ensure transparency and repeatability of planning outcomes. However, access to statistical and ecological expertise for urban conservation planning is likely to be limited when compared with the public land planning processes that have taken place in Australia. It is therefore important that we continue to develop and refine planning tools, including habitat modelling methods, with an emphasis on statistical and ecological rigour and simplicity.

ACKNOWLEDGEMENTS

This work was instigated by the LHCCREMS organization (now Hunter Councils). Steven House (EcoLogical), Michael Murray (Forest Fauna Surveys P/L), Philip Gleeson (NPWS), Daniel Connolly (NPWS), Jill Smith (NPWS), Peter Bone (NPWS), Peter Ewin (NPWS), Mick Andren (NPWS) and Rod Kavanagh (SFNSW) provided data, expert guidance on model construction and valuable insights. Mark Burgman and Kuniko Yamada (University of Melbourne) helped throughout the project. Comments from Terry Walshe (University of Western Australia), David Tierney and Robbie Economos improved the manuscript substantially. Discussions with Trevor Hastie and John Leathwick helped clarify bootstrapping issues. Two anonymous reviewers provided constructive feedback. Sophie Powrie, Kirsty Winter and Meredith Liang (LHCCREMS) provided excellent support and useful feedback throughout the project.

REFERENCES

Agresti A. (1996) An Introduction to Categorical Data Analysis. John Wiley and Sons, New York.

Akaike H. (1973) Information theory and an extension of the maximum likelihood principle. In: Proceedings of the 2nd International Symposium on Information Theory (eds B. N. Petrov & F. Csáki) pp. 267–81. Akadémiai Kiadó, Budapest.

Albert P. S. & McShane L. M. (1995) A generalized estimating equations approach for spatially correlated data: applications to the analysis of neuroimaging data. Biometrics 51, 627–38.

Augustin N. H., Mugglestone M. A. & Buckland S. T. (1996) An autologistic model for the spatial distribution of wildlife. J. Appl. Ecol. 33, 339–47.

Austin M. P. (1991) Vegetation: data collection and analysis. In: Nature Conservation: Cost Effective Biological Surveys and Data Analysis (eds C. R. Margules & M. P. Austin) pp. 37–41. CSIRO, Canberra.

Austin M. P. (2002) Spatial prediction of species distribution: an interface between ecological theory and statistical modelling. Ecol. Model. 157, 101–18.

Austin M. P. & Heyligers P. C. (1989) Vegetation survey design for conservation: GRADSECT sampling of forests in north-eastern NSW. Biol. Conserv. 50, 13–32.

Austin M. P. & Meyers J. A. (1996) Current approaches to modelling the environmental niche of Eucalypts: implications for management of forest biodiversity. For. Ecol. Manage. 85, 95–106.

Bamber D. (1975) The area above the ordinal dominance graph and the area below the receiver operating characteristics graph. J. Math. Psych. 12, 387–415.

Belsley D. A., Kuh E. & Welsch R. E. (1980) Regression Diagnostics. John Wiley and Sons, New York.

Bio A. M. F., Alkemade R. & Barendregt A. (1998) Determining alternative models for vegetation response analysis – a non-parametric approach. J. Veg. Sci. 9, 5–16.

Breiman L., Friedman J. H., Olshen R. A. & Stone C. J. (1984) Classification and Regression Trees. Wadsworth International Group, Belmont.

Breininger D. R., Larson V. L., Duncan B. W. & Smith R. B. (1998) Linking habitat suitability to demographic success in Florida scrub-jays. Wild. Bull. 26, 118–28.

Burgman M. A. & Lindenmayer D. B. (1998) Conservation Biology for the Australian Environment. Surrey Beatty and Sons, Chipping Norton.

Burnham K. P. & Anderson D. R. (2002) Model Selection and Multimodel Inference: a Practical Information-Theoretic Approach. Springer, New York.

Burgman M. A., Breininger D. R., Duncan B. W. & Ferson S. (2001) Setting reliability bounds on Habitat Suitability Indices. Ecol. Appl. 11, 70–8.

Carpenter G., Gillison A. N. & Winter J. (1993) DOMAIN: a flexible modelling procedure for mapping potential distributions of plants and animals. Biodivers. Conserv. 2, 667–80.

Chatfield C. (1995) Model uncertainty, data mining and statistical inference. J. Roy. Stat. Soc. Ser. A Stat. 158, 419–66.

Commonwealth of Australia (2003) Terms of Reference: Standing Committee on Environment and Heritage Inquiry into Sustainable Cities. Parliament of Australia House of Representatives, Canberra.

Derksen S. & Keselman H. J. (1992) Backward, forward and stepwise automated subset selection algorithms: frequency of obtaining authentic and noise variables. Br. J. Math. Stat. Psych. 45, 265–82.

Diamond J. M. (1975) Assembly of species communities. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 342–444. Harvard University Press, Cambridge.

Ecotone Ecological Consultants (2001) Lower Hunter Central Coast Regional Biodiversity Strategy: Fauna survey and mapping project – Module 1: Fauna Surveys. Ecotone Ecological Consultants for LHCCREMS, Waratah, pp. 43.

Edgar R. & Belcher C. (1995) Spotted-tailed Quoll. In: The Mammals of Australia (ed. R. Strahan) pp. 67–8. Reed Books, Chatswood.

Efron B. & Tibshirani R. (1997) Improvements on cross-validation: the .632+ bootstrap method. J. Am. Stat. Ass. 92, 548–60.

Elith J. (2000) Quantitative methods for modelling species habitat: comparative performance and an application to Australian plants. In: Quantitative Methods for Conservation Biology (eds S. Ferson & M. A. Burgman) pp. 39–58. Springer-Verlag, New York.

Elith J. & Burgman M. A. (2002) Predictions and their validation: rare plants in the Central Highlands, Victoria, Australia. In: Predicting Species Occurrences: Issues of Accuracy and Scale (eds J. M. Scott, P. J. Heglund, M. L. Morrison, M. G. Raphael, W. A. Wall & F. B. Samson) pp. 303–14. Island Press, Covelo.

Elith J. & Burgman M. A. (2003) Habitat models for PVA. In: Population Viability in Plants (eds C. A. Brigham & M. W. Schwartz) pp. 203–35. Springer-Verlag, New York.

Elith J., Burgman M. A. & Regan H. M. (2002) Mapping epistemic uncertainties and vague concepts in predictions of species distribution. Ecol. Model. 157, 313–29.


Ennis M., Hinton G., Naylor D., Revow M. & Tibshirani R. (1998) A comparison of statistical learning methods on the GUSTO database. Stat. Med. 17, 2501–8.

ESRI (1997) Arcinfo 7.2. Environmental Systems Research Institute Inc., Redlands.

Faith D. P. & Walker P. A. (1996) Environmental diversity: on the best-possible use of surrogate data for assessing the relative biodiversity of sets of areas. Biodivers. Conserv. 5, 399–415.

Ferrier S. & Watson G. (1997) An Evaluation of the Effectiveness of Environmental Surrogates and Modelling Techniques in Predicting the Distribution of Biological Diversity. Department of Environment, Sports and Territories, Commonwealth of Australia, and NSW National Parks and Wildlife Service, Canberra.

Ferrier S., Watson G., Pearce J. & Drielsma M. (2002a) Extended statistical approaches to modelling spatial pattern in biodiversity in north-east New South Wales. I. Species level modelling. Biodivers. Conserv. 11, 2275–307.

Ferrier S., Drielsma M., Manion G. & Watson G. (2002b) Extended statistical approaches to modelling spatial pattern in biodiversity: the north-east New South Wales experience. II. Community-level modelling. Biodivers. Conserv. 11, 2309–38.

Fielding A. H. & Bell J. F. (1997) A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ. Conserv. 24, 38–49.

Fotheringham A. S., Brunsdon C. & Charlton M. (2002) Geographically Weighted Regression: the Analysis of Spatially Varying Relationships. John Wiley and Sons, London.

Franklin J. (1995) Predictive vegetation mapping: geographic modeling of biospatial patterns in relation to environmental gradients. Prog. Phys. Geog. 19, 494–519.

Friedman J. H. (1991) Multivariate adaptive regression splines (with discussion). Ann. Stat. 19, 1–141.

Goldingay R. L. & Kavanagh R. P. (1993) Home-range estimates and habitat of the yellow-bellied glider (Petaurus australis) at Waratah Creek, New-South-Wales. Wildl. Res. 20, 387–404.

Graham C. H., Ferrier S., Huettman F., Moritz C. & Peterson A. T. (2004) New developments in museum-based informatics and applications in biodiversity analysis. Trends Ecol. Evol. 19, 497–503.

Guisan A. & Harrell F. E. (2000) Ordinal response regression models in ecology. J. Veg. Sci. 11, 617–26.

Guisan A. & Zimmerman N. E. (2000) Predictive habitat distribution models in ecology. Ecol. Model. 135, 147–86.

Hanley J. A. & McNeil B. J. (1982) The meaning and use of the area under a Receiver Operating Characteristic (ROC) curve. Radiology 143, 29–36.

Harrell F. E. (2001) Regression Modelling Strategies with Application to Linear Models, Logistic Regression and Survival Analysis. Springer, New York.

Harrell F. E., Lee K. L. & Mark D. B. (1996) Multivariate prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat. Med. 15, 361–87.

Hastie T. & Tibshirani R. (1990) Generalized additive models. Monographs on statistics and applied probability (eds D. R. Cox, D. V. Hinkley, D. Rubin & B. W. Silverman). Chapman & Hall, London.

Hastie T., Tibshirani R. & Friedman J. H. (2001) The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer-Verlag, New York.

Hirzel A. H. (2001) When GIS come to life. Linking Landscape- and Population Ecology for Large Population Management Modelling: the Case of Ibex (Capra ibex) in Switzerland. The University of Lausanne, Lausanne.

Hoeting J. A., Madigan D., Raftery A. E. & Volinsky C. T. (1999) Bayesian model averaging: a tutorial. Stat. Sci. 14, 382–401.

Houlder D. J., Hutchinson M. F., Nix H. A. & McMahon J. P. (1999) ANUCLIM User Guide, Version 5.0. Centre for Resource and Environmental Studies, Australian National University, Canberra.

Kadmon R., Farber O. & Danin A. (2003) A systematic analysis of factors affecting the performance of climatic envelope models. Ecol. Appl. 13, 853–67.

Kavanagh R. P. (2002) Conservation and management of large forest owls in southeastern Australia. In: The Ecology and Conservation of Owls (eds I. Newton, R. Kavanagh, J. Olson & I. Taylor) pp. 201–19. CSIRO, Melbourne.

Lambeck R. J. (1997) Focal species: a multi-species umbrella for nature conservation. Conserv. Biol. 11, 849–56.

Leathwick J. R. & Austin M. P. (2001) Competitive interactions between tree species in New Zealand’s old-growth indigenous forests. Ecology 82, 2560–73.

Legendre P. & Fortin M. J. (1989) Spatial pattern and ecological analysis. Vegetatio 80, 107–38.

Legendre L. & Legendre P. (1998) Numerical Ecology. Elsevier, New York.

Levins R. (1975) Evolution of communities near equilibrium. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 16–50. Harvard University Press, Cambridge.

LHCCREMS (2004) Lower Hunter Central Coast Regional Biodiversity Strategy 2004 (4 Volume). Hunter Councils on behalf of the Lower Hunter Central Coast Regional Environmental Strategy, Thornton.

Li W., Wang Z., Ma Z. & Tang H. (1999) Designing the core zone in a biosphere reserve based on suitable habitats: Yancheng Biosphere Reserve and the red crowned crane (Grus japonensis). Biol. Conserv. 90, 167–73.

Loyn R. H., McNabb E. G., Volodina L. & Willig R. (2001) Modelling landscape distributions of large forest owls as applied to managing forests in north-east Victoria, Australia. Biol. Conserv. 97, 361–76.

McCullagh P. & Nelder J. A. (1989) Generalized Linear Models. Chapman & Hall, London.

MacKenzie D. I., Nichols J. D., Lachman G. B., Droege S., Royle J. A. & Langtimm C. A. (2002) Estimating site occupancy rates when detection probabilities are less than one. Ecology 83, 2248–55.

Manel S., Dias J. M., Buckton S. T. & Ormerod S. J. (1999a) Alternative methods for predicting species distribution: an illustration with Himalayan river birds. J. Appl. Ecol. 36, 734–47.

Manel S., Dias J. M. & Ormerod S. J. (1999b) Comparing discriminant analysis, neural networks and logistic regression for predicting species distributions: a case study with a Himalayan river bird. Ecol. Model. 120, 337–47.

Miller M. E., Hui S. L. & Tierney W. M. (1991) Validation techniques for logistic regression models. Stat. Med. 10, 1213–26.

Moilanen A. (2002) Implications of empirical data quality to metapopulation model parameter estimation and application. Oikos 96, 516–30.

Moisen G. G. & Frescino T. S. (2002) Comparing five modeling techniques for predicting forest characteristics. Ecol. Model. 157, 209–25.


National Parks & Wildlife Service (1998) Eden Fauna Modelling. New South Wales National Parks & Wildlife Service, New South Wales Regional Forest Agreement Steering Committee, Canberra.

National Parks and Wildlife Service (2000) Modelling Areas of Habitat Significance for Fauna and Flora in the Southern CRA. NPWS NSW, Canberra.

Nix H. (1986) A biogeographic analysis of Australian elapid snakes. In: Atlas of Elapid Snakes of Australia (ed. R. Longmore) pp. 4–15. Australian Government Publishing Service, Canberra.

Obuchowski N. A. (1994) Computing sample size for receiver operating characteristic studies. Invest. Radiol. 29, 238–43.

Pearce J. & Ferrier S. (2000) Evaluating the predictive performance of habitat models developed using logistic regression. Ecol. Model. 133, 225–45.

Pearce J. & Ferrier S. (2001) The practical value of modelling relative abundance of species for regional conservation planning: a case study. Biol. Conserv. 98, 33–43.

Pearce J., Ferrier S. & Scotts D. (2001a) An evaluation of the predictive performance of distributional models for flora and fauna in north-east New South Wales. J. Environ. Manage. 62, 171–84.

Pearce J. L., Cherry K., Drielsma M., Ferrier S. & Whish G. (2001b) Incorporating expert knowledge and fine-scale vegetation mapping into statistical modelling of faunal distribution. J. Appl. Ecol. 38, 412–24.

Phillips S. J., Anderson R. P. & Schapire R. E. (in press) Maximum entropy modeling of species geographic distributions. Ecol. Model.

Pressey R. L., Possingham H. P., Logan V. S., Day J. R. & Williams P. H. (1999) Effects of data characteristics on the results of reserve selection algorithms. J. Biogeogr. 26, 179–91.

Pulliam H. R. (1988) Sources, sinks and population regulation. Am. Nat. 132, 652–61.

R Development Core Team (2004) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL: http://www.R-project.org

Reading R. P., Clark T. A., Seebeck J. H. & Pearce J. (1996) Habitat Suitability Index model for the eastern barred bandicoot, Perameles gunnii. Wildl. Res. 23, 221–35.

Reed P. C. & Lunney D. (1990) Habitat loss: the key problem for the long-term survival of koalas in New South Wales. In: Koala Summit: Managing Koalas in New South Wales (eds D. Lunney, C. A. Urquhart & P. C. Reed) pp. 9–31. University of Sydney, Sydney.

Ripley B. D. (1995) Pattern Recognition and Neural Networks – a Statistical Approach. Cambridge University Press, Cambridge.

Russel R. (1995) Yellow-bellied glider. In: The Mammals of Australia (ed. R. Strahan) pp. 226–8. Reed Books, Chatswood.

Steyerberg E. W., Eijkemans M. J. C., Harrell F. E. & Habbema J. D. F. (2001a) Prognostic modeling with logistic regression analysis: in search of a sensible strategy in small data sets. Med. Decis. Making 21, 45–56.

Steyerberg E. W., Harrell F. E., Borsboom G. J. J. M., Eijkemans M. J. C., Vergouwe Y. & Habbema J. D. F. (2001b) Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J. Clin. Epidemiol. 54, 774–81.

Stockwell D. & Peters D. (1999) The GARP modelling system: problems and solutions to automated spatial prediction. Int. J. Geogr. Inf. Sci. 13, 143–58.

Sutherland W. J. (1996) Ecological Census Techniques: a Handbook, 1st edn. Cambridge University Press, Cambridge.

ter Braak C. J. F. (1986) Canonical Correspondence Analysis: a new eigenvector technique for multivariate direct gradient analysis. Ecology 67, 1167–79.

Thompson S. K. (2002) Sampling, 2nd edn. Wiley, New York.

Tyre A. J., Tenhumberg B., Field S. A., Possingham H. P., Niejalke D. & Parris K. (2003) Improving precision and reducing bias in biological surveys by estimating false negative error rates in presence-absence data. Ecol. Appl. 13, 1790–801.

Van Horne B. (1983) Density as a misleading indicator of habitat quality. J. Wildl. Man. 47, 893–901.

Van Horne B. & Wiens J. A. (1991) Forest Bird Habitat Suitability Models and the Development of General Habitat Models. United States Department of the Interior, Fish and Wildlife Service, Washington, DC.

Venables W. N. & Ripley B. D. (2003) Modern Applied Statistics with S, 4th edn. Springer, New York.

Walters C. & Holling C. S. (1990) Large-scale management experiments and learning by doing. Ecology 71, 2060–8.

Wintle B. A., McCarthy M. A., Volinsky C. T. & Kavanagh R. P. (2003) The use of Bayesian Model Averaging to better represent the uncertainty in ecological models. Conserv. Biol. 17, 1579–90.

Wintle B. A., Elith R. J., Yamada K. & Burgman M. A. (2004) LHCCREMS fauna survey and mapping project. Module 2: Habitat modelling and conservation requirements. Lower Hunter & Central Coast Regional Environmental Management Strategy, Callaghan.

Wintle B. A., McCarthy M. A., Parris K. M. & Burgman M. A. (2004) Precision and bias of methods for estimating point survey detection probabilities. Ecol. Appl. 14, 703–12.

Wilson K. A., Westphal M. I., Possingham H. P. & Elith J. (2005) Sensitivity of conservation planning to different approaches to using species distribution data. Biol. Conserv. 122, 99–112.

Wintle B. A., Burgman M. A., McCarthy M. A. & Kavanagh R. P. (in press) The magnitude and management consequences of false negative observation error in surveys of arboreal marsupials and large forest owls. J. Wildl. Man.

Worton B. J. (1989) Kernel methods for estimating the utilization distribution in home-range studies. Ecology 70, 164–8.

Yencken D. & Wilkinson D. (2000) Resetting the Compass: Australia’s Journey Towards Sustainability. CSIRO Publishing, Melbourne.

Zaniewski A. E., Lehmann A. & Overton J. M. (2002) Predicting species distribution using presence-only data: a case study of native New Zealand ferns. Ecol. Model. 157, 261–80.

APPENDIX 1

Method for 0.632+ bootstrap evaluation, after Harrell et al. (1996) and Efron and Tibshirani (1997)

1. Develop model on all n observations.

2. Calculate the statistic(s) of choice for evaluation on the same data (i.e. the training data); call this Stat_app because it is the apparent value of the statistic.

3. Take a bootstrap sample, i.e. a sample of size n, with replacement, of rows of the data matrix. Keep track of which sites are in the bootstrap sample and which are excluded.

4. Fit the model on the bootstrap sample (using the same methods as used on the full set).

5. Compute the statistic on the bootstrap data set (observations vs. fitted values) and call it Stat_boot.

6. Also compute the statistic on a version of the bootstrap data where the observations are randomized (Stat_permute).

7. Use the bootstrap model to predict to the excluded data set; calculate the statistic on these predictions, Stat_excl.

8. Use Stat_boot, Stat_permute and Stat_excl to calculate the amount of overfitting, the relative overfitting rate and weights that are then used to make a best estimate of predictive performance, Stat_best_est. This statistic puts most emphasis on predictions to the excluded data, particularly when the model is overfitted (i.e. when Stat_boot − Stat_excl is large). For details of this step see Steyerberg et al. (2001b).

9. Measure how optimistic the fit on the bootstrap sample was: O = Stat_boot − Stat_best_est.

10. Repeat steps 3–7 100–200 times.

11. Calculate an average optimism, Ō.

12. Use Ō to correct Stat_app for its optimism: Stat_app − Ō. This is a near unbiased estimate of the expected value of the external predictive performance of the process that generated Stat_app.
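Steps 8 to 12 of the appendix can be sketched in a few lines of Python. The appendix defers the details of step 8 to Steyerberg et al. (2001b), so the weighting below is an assumption: it follows the general form of the 0.632+ estimator of Efron and Tibshirani (1997), rewritten for a statistic where larger values are better (such as the ROC area). The function names and the example values are hypothetical and are not taken from the authors' R code.

```python
def best_estimate_632plus(stat_boot, stat_permute, stat_excl):
    """Step 8 (assumed form): combine the three statistics from one replicate.

    stat_boot    -- statistic on the bootstrap sample (observed vs. fitted)
    stat_permute -- statistic with randomized observations (no-information value)
    stat_excl    -- statistic on the sites excluded from the bootstrap sample
    """
    # Do not let the excluded-data statistic fall below the no-information value.
    stat_excl = max(stat_excl, stat_permute)
    overfit = stat_boot - stat_excl
    if overfit > 0 and stat_boot > stat_permute:
        rate = overfit / (stat_boot - stat_permute)  # relative overfitting rate
    else:
        rate = 0.0
    # The weight on the excluded data rises from 0.632 to 1 as overfitting grows.
    weight = 0.632 / (1.0 - 0.368 * rate)
    best = (1.0 - weight) * stat_boot + weight * stat_excl
    return best, rate, weight


def optimism_corrected(stat_app, replicates):
    """Steps 9 to 12: average the optimism and subtract it from Stat_app.

    replicates is a list of (stat_boot, stat_permute, stat_excl) tuples,
    one per bootstrap replicate.
    """
    optimisms = []
    for stat_boot, stat_permute, stat_excl in replicates:
        best, _, _ = best_estimate_632plus(stat_boot, stat_permute, stat_excl)
        optimisms.append(stat_boot - best)
    mean_optimism = sum(optimisms) / len(optimisms)
    return stat_app - mean_optimism
```

In this sketch a replicate with no overfitting (Stat_boot equal to Stat_excl) contributes zero optimism, while a replicate whose excluded-data statistic is no better than the randomized value is weighted entirely towards the excluded data.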


needs particular species of sap-trees and winter-flowering eucalypts (Goldingay & Kavanagh 1993). Available habitat for these species is becoming increasingly fragmented due to habitat destruction and fragmentation caused by urban and infrastructure development, agriculture and mining (Reed & Lunney 1990).

The tiger quoll (Dasyurus maculatus) is a medium-sized carnivorous marsupial that inhabits a wide range of habitat types, from sclerophyll forest and woodlands to coastal heathlands and rainforests (Edgar & Belcher 1995). The species requires suitable den sites (such as hollow logs, tree hollows, rock outcrops or caves) and a large area of intact vegetation for foraging (Edgar & Belcher 1995).

The powerful owl (Ninox strenua), the sooty owl (Tyto tenebricosa) and the masked owl (Tyto novaehollandiae) are broadly distributed throughout forest, woodland and rainforest of eastern Australia. All owl species tolerate some degree of fragmentation and discontinuity in forest cover. However, they do rely on large tree hollows for nesting and denning, and prey on forest-dependent species such as marsupial gliders and small ground mammals, making them susceptible to declines in prey abundance resulting from high levels of forest fragmentation (Kavanagh 2002).

The choice of model variables was driven by what was known about the habitat requirements of each species. Of the species modelled, home range estimates vary from as low as 0.65 ha for the squirrel glider (Russel 1995) to 20 km² for the tiger quoll (Edgar & Belcher 1995). A modelling grid cell size of 1 ha was chosen on the basis that it approximates the home range size of the smallest ranging species.

Data collation, filtering and handling

Biological data

Data for the seven target species were obtained from the biological systematic survey (BSS) module of the NSW NPWS Wildlife Atlas database and from surveys commissioned by the LHCCREMS, specifically undertaken to fill gaps in the Atlas data for priority species (Ecotone Ecological Consultants 2001). Systematic survey data contained records of both presence and absence for each of the seven priority species. The availability of systematic survey data reduced the effort required to prepare data for modelling, as survey method and effort covariates were available for data filtering. Only data collected after 1990 and using survey methods appropriate to detect each species were used in modelling. Records were filtered to ensure a geographical separation of at least 500 m, in an attempt to increase the probability that observations were independent. Presence records were retained in preference to absence records when two (or more) records fell within 500 m of each other. The data that remained for model building and testing are summarized in Table 1.
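The 500 m filtering rule can be illustrated with a short Python sketch. The paper does not describe the algorithm that was used, so the greedy pass below, the function name thin_records and the (x, y, presence) record layout are all assumptions made for illustration. Coordinates are taken to be in metres on a projected grid.

```python
import math


def thin_records(records, min_sep=500.0):
    """Greedy spatial thinning that favours presence records.

    records -- list of (x, y, presence) tuples, with x and y in metres
               and presence coded 1 (present) or 0 (absent)
    min_sep -- minimum separation in metres between retained records
    """
    # Visit presences first so they are retained in preference to absences;
    # sorted() is stable, so the original order is kept within each class.
    ordered = sorted(records, key=lambda rec: -rec[2])
    kept = []
    for x, y, presence in ordered:
        far_enough = all(
            math.hypot(x - kx, y - ky) >= min_sep for kx, ky, _ in kept
        )
        if far_enough:
            kept.append((x, y, presence))
    return kept
```

A greedy pass of this kind depends on the order in which records are visited, so a different ordering within each class could retain a different (equally valid) subset.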

Survey method or effort covariates may be incorporated in models when survey data are collected using methods of variable reliability (Pearce et al. 2001a). In our study, such covariates were not required because data used for modelling were relatively uniform with respect to method and effort.

Environmental data

One of the primary limitations on the predictive performance of wildlife habitat models is the availability of broadly mapped environmental variables that are closely related to environmental attributes that affect the distribution of wildlife. Austin (2002) delineated two types of independent variables for model building: (i) ‘proximal’ (direct) variables are those that represent resource, shelter or thermal gradients that have a direct influence on a species’ distribution (e.g. temperature and foliar nutrient); and (ii) ‘distal’ (or indirect) variables that have no physiological effect on the species, but are correlated with ‘proximal’ variables (e.g. altitude, latitude). Modelling with proximal variables will more often produce a model that makes transportable and robust predictions, whereas models based on distal predictors are likely to be more specific to the location in which they were constructed (Austin 2002). However, direct variables are not always available as GIS layers because they tend to be difficult to map (Guisan & Zimmerman 2000), so model building for the purpose of prediction is often undertaken using distal variables.

Experts should play a role in the identification of environmental variables that may be important predictors of a species’ habitat. They may be useful in developing new variables that are combinations or derivations of distal variables. For example, indices such as a ‘hollows index’ or ‘foliar nutrient index’ have been developed from maps of forest age and floristic composition and used in habitat modelling in the past (National Parks & Wildlife Service 1998). Similarly, vegetation variables with many categories of vegetation are often difficult to use as habitat variables in their raw form. Experts may be used to aggregate vegetation classes into those most likely to be of relevance to the species being modelled, or to develop interaction terms between, for example, vegetation type, topographic context and forest age. Derived variables may serve as useful predictors of habitat where the raw variables do not. Similarly, neighbourhood measures such as ‘the proportion of old forest within 2 km’ have also been used successfully in habitat modelling (Ferrier et al. 2002a). Neighbourhood measures convey the local environmental context of a site, which may be as important as the attributes of the site itself, especially in the case where the home range of a species is larger than the cell size used in modelling, or where survey locations are imprecisely known. Compiling a set of candidate environmental variables should involve careful consideration of the biology of the species being modelled.

Table 1. Number of presence and absence sites derived from the LHCCREMS and BSS databases used in model building and evaluation for each of the seven priority species

Species                 Presence records   Absence records   Total
Koala                   88                 162               250
Tiger quoll             36                 75                111
Squirrel glider         112                129               241
Yellow-bellied glider   92                 152               244
Masked owl              55                 149               204
Powerful owl            97                 142               239
Sooty owl               56                 156               212

BSS, biological systematic survey; LHCCREMS, The Lower Hunter and Central Coast Regional Environmental Management Strategy.

A set of climatic variables derived from ANUCLIM (Houlder et al. 1999) was available for use in model development. These may have a direct role in determining species distributions through metabolic constraints. However, the remainder of available predictor variables would be considered distal. Neighbourhood measures were derived and tested for all species at a range of neighbourhood distances. The absence of predictor variables relating to forest growth stage, a commonly used surrogate for tree-hollow and denning availability (National Parks & Wildlife Service 1998, 2000; Loyn et al. 2001), is likely to place substantial limitations on the predictive performance of final models. All environmental variables were stored as spatial layers (or grids) in a GIS with a grid cell resolution of 100 m, which was satisfactory with respect to the home range size of the target species.

Testing the adequacy of survey data based on environmental strata

There were a sufficient number of species survey data points in the region for fitting statistical models for each of the seven priority species. However, having a sufficient number of data points for fitting a regression model does not guard against inherent bias in survey data. To address concerns about geographical and environmental biases, data were tested for their coverage of key environmental strata. This was undertaken to ensure that models developed would be relevant to the range of environmental conditions present in the region and that excessive extrapolation of fitted responses would not be required. Environmental strata were defined by overlaying GIS raster maps (1 ha grid cell size) of four key variables: broad vegetation cover (seven classes), topographic position (three classes), mean annual temperature (four classes) and mean annual rainfall (seven classes), resulting in 384 possible strata, 256 of which were represented by at least 50 ha of forest in the region. By overlaying survey locations and the map of environmental strata, it is possible to tabulate the proportion of sampled versus unsampled strata for each species. Maps showing the regional coverage of sampled and unsampled strata were created for each species to illustrate where model predictions may be less reliable and where future biological surveys could be targeted.
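The tabulation of sampled versus unsampled strata can be sketched as follows. This is a hypothetical illustration in Python rather than the GIS workflow used in the study: strata are represented as tuples of class labels, the forested area of each stratum is assumed to be known in hectares, and the function name sampled_fraction is invented here.

```python
def sampled_fraction(stratum_area_ha, survey_strata, min_area_ha=50):
    """Summarize how well survey sites cover the environmental strata.

    stratum_area_ha -- dict mapping a stratum id, e.g. a tuple of
                       (vegetation, topography, temperature, rainfall) classes,
                       to its forested area in hectares
    survey_strata   -- iterable of stratum ids, one per survey location
    min_area_ha     -- strata with less forest than this are ignored

    Returns (fraction of eligible strata sampled,
             fraction of eligible forest area lying in sampled strata).
    """
    eligible = {s: a for s, a in stratum_area_ha.items() if a >= min_area_ha}
    if not eligible:
        raise ValueError("no stratum meets the minimum area threshold")
    sampled = set(survey_strata) & set(eligible)
    total_area = sum(eligible.values())
    sampled_area = sum(eligible[s] for s in sampled)
    return len(sampled) / len(eligible), sampled_area / total_area
```

The second value returned is the area-based measure, which corresponds to statements of the form "sampled strata make up X% of the region that is under forest cover".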

Model development and evaluation

Data preparation

Geographical information system layers representing environmental variables were sampled at each survey location using ArcInfo (ESRI 1997) to construct a modelling data frame for each species. Similar functions are available in most GIS software. Each row of data contained the survey observation (1 = species present, 0 = species absent) and values for each of the candidate predictor variables at the survey locations. This resulted in an n × (k + 1) data matrix, where n is the number of survey locations and k is the number of potential predictor variables.

In order to construct presence-only models, a second data matrix was created for each species that contained records for which the species was present. Background samples (sometimes termed ‘pseudo-absence’ data) were generated for 10 000 random locations across the landscape according to methods described by Ferrier and Watson (1997) and using software developed by Landcare New Zealand (J. Overton, pers. comm., 2005). The rationale for using background samples when no ‘real’ or systematic absence data exist is that they can be used to create an environmental profile of the study area, which is then compared with the environmental profile of known ‘presence’ locations. An n × (k + 1) modelling matrix was created as for the presence–absence data, where n is equal to the number of presence observations plus the 10 000 random ‘observations’ of absence.

Data transformation

In some instances, the distribution of environmental variables at survey locations may be long-tailed. This may bring about problems with model fitting due to points in the tails having high leverage, and tends to reduce the explanatory power of independent variables. Data may be transformed to avoid this problem. A common transformation is the log-transformation, though a range of other transformations exist. Inspection of our modelling data revealed no substantial problems with long-tailed predictor variables.

Variable reduction

Approximately 50 variables were available that described the environmental characteristics which may govern site occupancy for the seven target species. It is common to offer all possible variables as candidate model predictors (e.g. National Parks & Wildlife Service 1998) and utilize an automated variable selection routine to eliminate inappropriate predictors and to specify the final model. However, offering many candidate variables tends to result in models that include nonsense predictors and exclude variables that in fact influence the probability of occupancy (Derksen & Keselman 1992; Chatfield 1995; Steyerberg et al. 2001a). This is especially problematic when the number of species observations is low compared with the number of candidate predictors.

An alternative approach is to minimize the number of variables that are offered to the variable selection routine. Harrell et al. (1996) recommend a rule of thumb that fewer than m/10 predictor degrees of freedom (PDF) should be offered as candidates to a variable selection routine such as backward selection, where m equals the number of observations of the least prevalent class in the modelling data set (often the number of presence observations). PDF indicates the number of possible parameters estimated in the largest model that could be constructed from the set of candidate predictors. The number of parameters depends on the number of predictors and the form in which the predictor variables are used in the model (see next section). For categorical variables or non-linear responses, there is more than one PDF for each environmental variable. For example, categorical variables have a parameter for all but one category, sometimes leading to many PDF per categorical variable. Similarly, quadratic or cubic functional forms require two and three PDF, respectively. In our situation, m ranges from 36 for the tiger quoll to 112 for the squirrel glider (Table 1). Therefore, the maximum number of candidate PDF offered to the variable selection routine should be no more than 4 or 11, respectively. In this case, the allowable PDF is much less than the total number available. The number of candidate PDF was firstly reduced by removing distal variables that were highly correlated with proximal variables (R > 0.6) in the context of biological or metabolic requirements of the species.
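The m/10 arithmetic can be worked through in a few lines of Python. The function names and the (form, size) encoding of candidate terms are hypothetical. The counts per term follow the description above: one PDF for a linear term, two or three for quadratic or cubic forms, and one fewer than the number of categories for a categorical variable. The limits of 4 and 11 quoted in the text appear to correspond to m/10 = 3.6 and 11.2 rounded to the nearest whole number.

```python
def pdf_budget(n_presence, n_absence):
    """Return m / 10, where m is the size of the least prevalent class."""
    m = min(n_presence, n_absence)
    return m / 10


def candidate_pdf(terms):
    """Count predictor degrees of freedom for a list of (form, size) terms.

    form is 'linear', 'poly' (size = polynomial degree) or
    'categorical' (size = number of categories).
    """
    total = 0
    for form, size in terms:
        if form == "linear":
            total += 1
        elif form == "poly":
            total += size        # quadratic = 2, cubic = 3
        elif form == "categorical":
            total += size - 1    # one parameter for all but one category
        else:
            raise ValueError("unknown term form: " + str(form))
    return total


# Counts from Table 1: tiger quoll (36 presences, 75 absences) and
# squirrel glider (112 presences, 129 absences).
quoll_budget = pdf_budget(36, 75)      # 3.6
glider_budget = pdf_budget(112, 129)   # 11.2
```

For example, a candidate set made up of one linear term, one quadratic term and one seven-class categorical variable amounts to 1 + 2 + 6 = 9 PDF, which would exceed the tiger quoll budget but fit within the squirrel glider budget.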

Irrespective of variable reduction issues, fitting statistical models with collinear predictor variables may cause statistical problems and should be avoided (Belsley et al. 1980). Expert opinion and previous habitat models were consulted where further variable reduction was required. The number of candidate model variables (and PDF) offered to the variable selection routine varied for each species, depending on the amount of biological survey data available. The final set of candidate predictors presented in Table 2 includes variables that were offered as candidates for at least one species.

Variable form

One of the strengths of GLMs and GAMs is the possibility of fitting non-linear relationships between the dependent variable (probability of occurrence) and the independent environmental variables. However, it may be difficult to know a priori whether a particular relationship is likely to be linear, quadratic, cubic or other. One strategy that is commonly employed is to allow an automated variable selection routine full freedom to fit any relationships using a trade-off between complexity and variance explained. However, it has been shown that such an approach may result in the fitting of spurious relationships with no logical interpretation (Steyerberg et al. 2001a). Consequently, we recommend either choosing functional forms with no automated variable selection, or limiting the range of functional forms available to the variable selection algorithm to those that are supported by sound ecological intuition and preliminary data analysis. In this exercise, we conduct preliminary analyses on response shapes by fitting univariate GAMs with five degrees of freedom and visually inspecting plots of response shapes (Austin & Meyers 1996). This approach allows the user to assess whether more complicated responses appear sensible and are justified by the data. Visual inspection of fitted response shapes is used again later in the process for evaluating the ecological realism of final selected models (see below).

Variable selection and model fitting

Final GLMs and GAMs for all species were selected and fitted to both presence–absence and presence-background data using a backward stepwise variable selection algorithm (Venables & Ripley 2003) in R, resulting in four models for each species. The variable selection algorithm tests a series of nested models using Akaike’s information criterion (AIC; Akaike 1973; Venables & Ripley 2003) to select between models. The choice of algorithm may influence the structure of the final model, and alternative automated model selection algorithms, including forward selection, are available. The backward selection algorithm is generally preferred to the other standard automated methods because it generally performs better in the presence of collinear candidate variables and because it requires consideration of the full model fit (Harrell 2001). There is a substantial body of literature dedicated to model selection issues, and automated algorithms have been criticized (Chatfield 1995; Hoeting et al. 1999; Harrell 2001). However, the alternatives tend to be complicated. We have chosen to use a packaged model selection algorithm as a compromise between practicality and sophistication. We reiterate that expert reduction of predictor variables prior to final model selection is critical to the successful application of automated variable selection routines (Harrell 2001).

Model evaluation statistics

Presence–absence models were evaluated using two statistics: (i) the area under the receiver operating characteristic (ROC) curve (Hanley & McNeil 1982) (ROC area), closely related to the Mann–Whitney U-statistic; and (ii) Miller’s calibration slope (Miller et al. 1991; Pearce & Ferrier 2000). The ROC area evaluates a model’s ability to discriminate between presence and absence sites and is therefore referred to as a measure of model (or predictive) ‘discrimination’ (Pearce & Ferrier 2000). This provides an indication of the usefulness of the models for prioritizing areas in terms of their relative importance as habitat for the particular species. The ROC area ranges from 0 to 1, where a score of 1 implies perfect discrimination and a score of 0.5 implies predictive discrimination that is no better than a random guess (Bambar 1975). The actual value of the ROC area has a straightforward interpretation. It is the probability that, for a randomly selected pair of presence–absence observations derived from field surveys, the model prediction for presence will be greater than the prediction for absence.
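The pairwise interpretation of the ROC area translates directly into code. The Python sketch below is an illustration only (the function name roc_area is hypothetical, and it is not the authors' R code): it compares every presence prediction with every absence prediction and counts ties as half, following the usual Mann–Whitney convention.

```python
def roc_area(observed, predicted):
    """ROC area as the proportion of presence/absence pairs ranked correctly.

    observed  -- sequence of 1 (presence) and 0 (absence)
    predicted -- sequence of predicted probabilities, same length
    """
    presences = [p for o, p in zip(observed, predicted) if o == 1]
    absences = [p for o, p in zip(observed, predicted) if o == 0]
    if not presences or not absences:
        raise ValueError("need at least one presence and one absence")
    score = 0.0
    for pres in presences:
        for absn in absences:
            if pres > absn:
                score += 1.0   # pair ranked correctly
            elif pres == absn:
                score += 0.5   # tie counts as half
    return score / (len(presences) * len(absences))
```

The double loop is quadratic in the number of sites, which is adequate for data sets of a few hundred records such as those in Table 1; rank-based formulae give the same value more efficiently for large data sets.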

Miller’s calibration statistics (MCS) evaluate the ability of a model to correctly predict the proportion of sites with a given environmental profile that will be occupied. MCS are derived from a logistic regression of observations on the logit of predicted probabilities. The rationale is that the slope of the regression would be equal to one and the regression intercept would equal zero if the predictions from the model were perfectly calibrated (Harrell 2001; Pearce & Ferrier 2000). The model is known as the ‘logistic calibration equation’. The two calibration statistics are literally the logistic regression slope (‘calibration slope’) and intercept (‘calibration intercept’), though it is common just to report the calibration slope.
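The logistic calibration equation can be fitted with a small Newton–Raphson routine. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, predicted probabilities must lie strictly between 0 and 1, and the observations must not be perfectly separated by the predictions (otherwise the estimates do not converge). In practice a standard logistic regression routine would normally be used.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def calibration(observed, predicted, iterations=25):
    """Fit observed ~ intercept + slope * logit(predicted) by Newton-Raphson.

    Returns (intercept, slope); perfectly calibrated predictions give (0, 1).
    """
    x = [logit(p) for p in predicted]
    a, b = 0.0, 1.0  # start from perfect calibration
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, observed):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g_a += yi - mu            # gradient with respect to the intercept
            g_b += (yi - mu) * xi     # gradient with respect to the slope
            h_aa += w                 # entries of the information matrix
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        if det == 0.0:
            raise ValueError("predictions do not vary; slope is undefined")
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a += step_a
        b += step_b
        if abs(step_a) + abs(step_b) < 1e-10:
            break
    return a, b
```

A slope below one indicates predictions that are too extreme (high probabilities too high and low probabilities too low), which is the pattern expected from an overfitted model.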

The presence-background models were evaluated qualitatively. The statistics available for evaluating presence-only models are limited and a fair comparison with the presence–absence models was not possible here.

Table 2. Abbreviated names and definitions of mapped environmental data used as candidate predictor variables for inclusion in habitat models. All environmental data were available in raster format with 100 m (side length) grid cell size

Candidate variable      Definition
Temp                    Mean annual temperature derived from ANUCLIM
Cold                    Mean temperature of the coldest period derived from ANUCLIM
Rain                    Mean annual rainfall derived from ANUCLIM
Dry 2000                The percentage of cells in a 2000-m radius containing dry forest
Rf2000                  The percentage of cells in a 2000-m radius containing rainforest
Solar                   The solar radiation index of a cell derived from ANUCLIM
Elev                    The elevation of a cell (in metres) above sea level
Rugg250 (500, 1000)     Topographic ruggedness (standard deviation in elevation) in a 250-m, 500-m and 1000-m radius
Terr250 (500, 1000)     Relative terrain position in a 250-m, 500-m and 1000-m radius
Topo                    The topographic position of a cell ranging from gully to ridge top (0–100)
Unmod500                The percentage of cells in a 500-m radius containing unmodified forest
Unmod2000               The percentage of cells in a 2000-m radius containing unmodified forest
Wet500                  The percentage of cells in a 500-m radius containing wet forest
Wet2000                 The percentage of cells in a 2000-m radius containing wet forest
Fert                    Index of the soil nutrient content at a site based on geochemical data (CSIRO)
Percnonfor2k            The percentage of cells in a 2000-m radius classified as cleared land
Ybglexp2000†            The percentage of cells in a 2000-m radius containing suitable ybgl habitat
Sqglexp2000†            The percentage of cells in a 2000-m radius containing suitable sqgl habitat
Sowlexp2000†            The percentage of cells in a 2000-m radius containing suitable sowl habitat
Mowlexp2000†            The percentage of cells in a 2000-m radius containing suitable mowl habitat
Powlexp2000†            The percentage of cells in a 2000-m radius containing suitable powl habitat
Koalexp2000†            The percentage of cells in a 2000-m radius containing suitable koal habitat

†Denotes expert variable derived from vegetation classes (Michael Murray, pers. comm., 2004). koal, koala; mowl, masked owl; powl, powerful owl; quol, tiger quoll; sowl, sooty owl; sqgl, squirrel glider; ybgl, yellow-bellied glider.

The model evaluation method (bootstrapping)

Presence–absence models for each species were evaluated using a ‘bootstrapping’ approach (Efron & Tibshirani 1997; Harrell 2001). Naively testing a model on the data that were used to fit it (also known as ‘in-sample validation’) is inappropriate and known to provide optimistic estimates of model performance (Fielding & Bell 1997; Steyerberg et al. 2001b), despite being a commonly used approach to model evaluation (National Parks & Wildlife Service 1998, 2000). Ideally, models should be tested on a completely independent data set, with data collection and stratification specifically designed for model evaluation. However, this is seldom a practical option and may also be prone to high variance unless samples are large (J. Elith, unpublished data, 2005). Bootstrapping provides a realistic estimate of the predictive performance of a model without incurring the expense of collecting a completely new model-testing data set. Bootstrapping involves resampling the modelling data and conducting a series of model building and testing simulations that provide an estimate of the optimism arising from in-sample validation. The estimate of optimism is used to provide an adjusted estimate of the model evaluation statistics (ROC and MCS). The bootstrapping version implemented here is believed to provide the least biased estimate of predictive performance of any of the model evaluation methods that are based on re-sampling, including cross-validation (Hastie et al. 2001). Cross-validation (Efron & Tibshirani 1997) provides an alternative approach to model evaluation and might be more feasible with methods or data sets that create large computational loads. However, its estimates of error rates with independent data can be less precise than those derived from bootstrapping (Steyerberg et al. 2001b), which can be thought of as a smoothed version of cross-validation (Efron & Tibshirani 1997). The bootstrapping method (the 0.632+ bootstrap) is detailed in Appendix 1. R code for obtaining bootstrapped estimates of ROC and MCS is available from the URL given previously.

Inspecting models for ecological realism

Final models were inspected for their ecological realism using partial plots of univariate fitted functions (Venables & Ripley 2003), similar to those used to determine the complexity of candidate response shapes. Partial plots are available as an option in statistical software and provide a ready means to evaluate the ecological realism of fitted responses. If an implausible fitted relationship between the probability of occupancy by the species and a particular environmental variable was observed in the partial plots, the variable was excluded from the model or the model was re-fitted with fewer degrees of freedom for that particular environmental variable.

Case study results and interpretation

The adequacy of species survey data for model fitting

Environmental strata were reasonably well sampled, even for the species with the least and most restricted survey data. For six of the seven species, sampled strata make up at least 70% of the region that is under forest cover, with the only exception being the quoll, for which only 56% of the region comprises sampled strata. Similarly, the geographical spread of sampled strata was reasonable for most species, with maps of sampled strata showing good coverage of the region for all species (Fig. 2). The maps of sampled and unsampled strata provide an indication of the areas in which habitat model predictions may be less reliable.

Models and predictive performance

The best models in terms of both predictive discrimination and calibration were those for the sooty owl, squirrel glider, yellow-bellied glider, koala and tiger quoll, all of which had predictive discrimination of 0.75 or greater for their best model (Table 3). The worst models were the powerful owl and masked owl models, with bootstrapped ROC areas ranging from 0.61 to 0.69. In general, GAMs had similar variable structure and degrees of freedom to GLMs and comparable predictive discrimination. However, bootstrapped predictive calibration was always better for GAMs than for GLMs (Table 3). This is probably because the smooth functions in GAMs are better at fitting complex responses over the complete environmental range of the response, leading to more accurate predicted probabilities.

Presence-only models had similar model structures to the presence–absence models. The result suggests that the survey absence data used in the modelling exercise were not much more useful than a background random sample in defining unsuitable habitat for most species. This may result from the difficulties associated with obtaining reliable absence data for cryptic species. In addition, the quality of the presence-only data used in this modelling exercise is likely to be high compared with presence-only data found in most museums and herbaria, because they were derived from systematic stratified surveys. Other studies have shown a decline in performance when fitting models with background samples compared with using presence–absence data (Ferrier & Watson 1997).

Preferred models were primarily chosen on the basis of predictive discrimination and calibration. Plausibility of response shapes was used to discern between models that showed similar predictive performance. Only model structures for the preferred model for each species are presented in Table 3. There was some variation between the four modelling methods in terms of the variables retained and their response shapes for each species. For example, the yellow-bellied glider GLM contained the term ‘dry 2000’, while the GAM for the same species did not include this term. Similarly, the sooty owl GLM contained rainfall as a linear term, while the GAM for that species included rainfall as a smoothed spline with two degrees of freedom (a non-linear term). However, the structural differences between GLM and GAM models were minor, and the similarity in ROC areas of the two model types across all species (Table 3) implies that the differences had a minor impact on predictive discrimination.

Fig. 2. Coverage of sampled strata over the greater LHCC region for the best ((a) Koala) and worst ((b) Quoll) sampled species. Unsampled strata are represented by the light grey shading. Remaining areas within the region (not shaded) contain sampled strata or are non-forest and therefore not included in the analysis.

Table 3. Final presence–absence models and bootstrapped estimates of predictive discrimination (ROC) and calibration

Species   Preferred model type   ROC area GAM   ROC area GLM   Calibration GAM   Calibration GLM   Preferred model
Koala     GAM                    0.76           0.76           0.84              0.75              sp ~ s(temp, 2) + s(dry2000, 3) + s(rugg500, 2) + unmod2000 + percnonfor2k
Quoll     GLM                    0.71           0.75           0.73              0.59              sp ~ temp + rain + percnonfor2k + poly(dry2000, 2)
Sqgl      GAM                    0.78           0.77           0.76              0.68              sp ~ s(rugg500, 3) + sqglexp500 + s(unmod500, 2)
Ybgl      GAM                    0.78           0.77           0.73              0.63              sp ~ s(temp, 2) + s(rain, 2) + s(rf2000, 2) + unmod2000 + s(ybglexp2000, 3) + percnonfor2k
Mowl      GAM                    0.63           0.61           0.63              0.55              sp ~ s(cold, 2) + mowlexp2000 + rf2000 + s(unmod2000, 2) + wet2000 + percnonfor2k
Powl      GAM                    0.69           0.68           0.80              0.62              sp ~ s(cold, 3) + s(rain, 2) + s(rugg500, 2)
Sowl      GAM                    0.85           0.85           0.74              0.67              sp ~ s(rain, 2) + rugg500 + s(sowlexp2000, 3) + s(ter1000, 2) + unmod2000

Only the preferred model selected from the presence–absence models (either a generalized linear model or a generalized additive model) is presented for each species. Column ‘preferred model type’ indicates which of the methods produced the final preferred model for each species. ROC and calibration slope statistics are presented for the model derived under each modelling method. Variable abbreviations and details are given in Table 2. The expressions ‘poly(variable, n)’ and ‘s(variable, n)’ in the preferred model column indicate variables included as polynomial (‘poly’) or smoothed (‘s’) terms with ‘n’ degrees of freedom. ROC, receiver operating characteristic.

Plots of fitted functions

Plotting fitted relationships provided a useful means to visualize and manipulate the behaviour of fitted models to ensure that they were ecologically meaningful. In general, fitted relationships were found to be ecologically plausible. Plots of sooty owl response shapes (Fig. 3) show how the probability of sooty owl site occupancy varies with each of the predictor variables contained in the final GAM model for that species. Fitted functions for individual variables in a multivariate model may be different to univariate fitted functions because of interactions between predictor variables. Consequently, it is important to observe response shapes for the final model to ensure that they remain ecologically reasonable in the multivariable context.

Habitat maps

A habitat map is provided for the sooty owl as anexample of the type of output that can be used inconservation planning (Fig 4) The map shows thepredicted probability of sooty owl presence through-out the region Probabilities have been classified intofour categories (0ndash02 021ndash05 051ndash07 071ndash10)for presentation These maps also display the pres-ence records for the species that were used in model

Fig 3 Partial plots of the relationship between the probability of occupancy and environmental variables included in thefinal sooty owl model The X-axis represents the range of values for each environmental variable Probabilities on the Y-axisare plotted in transformed lsquologitrsquo space so that they can be interpreted in the same way as linear regressions Response shapesin each plot represent the relationship between each variable and the probability of sooty owl occupancy in the multivariatemodel context independent of the other variables included in the model Dashed lines represent 95 confidence intervalsaround the fitted response shape Rugg500 topographic ruggedness (standard deviation in elevation) in a 500-m radiusSowlexp 2000 the proportion of suitable vegetation (as defined by experts) in a 2000-m radius terr1000 relative terrainposition in a 1000-m radius unmod2000 the percentage of cells in a 2000-m radius containing unmodified forest

[Fig. 3 panels, each with the Y-axis ‘Logit (probability of occupancy)’: rain (X-axis 800–1300), rugg500 (0–60), sowlexp2000 (0–100), terr1000 (−50–100), unmod2000 (0–100).]


fitting, allowing a visual assessment of predictive performance. Mapped predictions and uncertainties (based on upper and lower 95% prediction intervals) for the remaining six species are available from the authors.
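The map-preparation step can be sketched as follows, assuming the predictions already exist as a row-major vector with one value per grid cell. This is an illustrative Python sketch, not the R and ArcInfo workflow used in the case study: the function names, the grid origin and the 100-m cell size are assumptions, and the class boundaries simply follow the four categories quoted in the text.

```python
import bisect

# Upper bounds of the first three classes quoted in the text
# (0-0.2, 0.21-0.5, 0.51-0.7, 0.71-1.0); values above 0.7 fall in class 3.
CLASS_UPPER_BOUNDS = [0.2, 0.5, 0.7]


def classify(prob, upper_bounds=CLASS_UPPER_BOUNDS):
    """Return the class index (0..3) for a predicted probability."""
    return bisect.bisect_left(upper_bounds, prob)


def to_ascii_grid(values, ncols, xllcorner=0.0, yllcorner=0.0,
                  cellsize=100.0, nodata=-9999):
    """Format a row-major vector as ESRI ASCII grid text.

    The origin and cell size are placeholders (hypothetical values).
    """
    if ncols <= 0 or len(values) % ncols != 0:
        raise ValueError("vector length must be a multiple of ncols")
    nrows = len(values) // ncols
    lines = [
        f"ncols {ncols}",
        f"nrows {nrows}",
        f"xllcorner {xllcorner}",
        f"yllcorner {yllcorner}",
        f"cellsize {cellsize}",
        f"NODATA_value {nodata}",
    ]
    # One text line per grid row, first row first.
    for r in range(nrows):
        row = values[r * ncols:(r + 1) * ncols]
        lines.append(" ".join(str(v) for v in row))
    return "\n".join(lines) + "\n"


# Example: six hypothetical cell predictions on a 3-column grid.
predictions = [0.1, 0.3, 0.6, 0.9, 0.05, 0.95]
grid_text = to_ascii_grid([classify(p) for p in predictions], ncols=3)
```

Writing class codes rather than raw probabilities keeps the file small; passing the raw vector to `to_ascii_grid` instead would preserve the continuous predictions for later reclassification.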

DISCUSSION

The importance of ecological realism in modelling and the role of experts

The choice of modelling methods and approach to model building and evaluation presented in this paper reflects the importance we place on maintaining ecological realism throughout the model building process. While maintaining statistical rigour is central to good model building, there is an expansive literature on the statistical nuances of model fitting, interpretation and prediction (Guisan & Zimmerman 2000). Conversely, there is much less literature available concerning the importance of maintaining ecological realism in model building (Austin 1991, 2002). The process of plotting and evaluating fitted functions throughout model building and interpretation is fundamental to ensuring the ecological realism and general credibility of the final models.

Expert opinion forms an integral part of sound model building and evaluation, irrespective of the modelling platform and theoretical approach used (Steyerberg et al. 2001b). Some modellers view expert opinion as a source of undesirable subjectivity and prefer to let the data dictate the model. However, this ignores the limitations commonly encountered with ecological data (small to medium-sized species data sets that may include biases and many candidate predictor variables) and the related challenges in developing robust and sensible models. Expert opinion can (and should) be used to assist in the identification of candidate model variables, interactions between variables and likely response shapes (Pearce et al. 2001b; Ferrier et al. 2002a), for corroboration of a model’s ecological realism (Austin 2002), for ad hoc evaluation of model predictions (National Parks & Wildlife Service 1998) and for preparation of predictive maps for use in decision making (Ferrier et al. 2002a). Experts may also be used effectively in the creation of indices that may themselves be used as candidate variables for statistical modelling (National Parks & Wildlife Service 1998). Consequently, the role of experts should be thought of as complementary to other more data-driven methods, rather than as a competing alternative. In our case study we emphasize the use of experts in variable reduction prior to automated variable selection. We use Harrell’s (2001) rule of thumb to identify the maximum PDF that should be offered to the variable selection routine. We contend that if expert knowledge is so lacking that reducing the candidate set to m/10 is impossible, resulting models should be treated with scepticism.
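As a rough illustration of the variable-reduction limit, the sketch below computes a ceiling on candidate degrees of freedom. It assumes the common reading of Harrell's rule for a binary response, in which m is the smaller of the number of presence and absence records and the ceiling is m/10 (more conservative divisors such as 15 or 20 are also used). The function name and the example counts are hypothetical.

```python
def max_candidate_df(n_presence, n_absence, divisor=10):
    """Ceiling on candidate degrees of freedom under an m/divisor rule.

    Assumption: m is the limiting sample size for a binary response,
    i.e. the smaller of the presence and absence counts.
    """
    if n_presence < 0 or n_absence < 0 or divisor <= 0:
        raise ValueError("counts must be non-negative and divisor positive")
    m = min(n_presence, n_absence)
    return m // divisor


# Hypothetical survey: 45 presences and 300 absences.
limit = max_candidate_df(45, 300)               # m = 45, so 4 degrees of freedom
strict_limit = max_candidate_df(45, 300, 20)    # a more conservative divisor
```

With so few presence records the ceiling is small, which is the situation in which expert pruning of the candidate set matters most.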

The lack of an accepted model evaluation statistic for presence-only models detracts from their utility in conservation planning and increases the importance of expert evaluation. Development of robust evaluations of presence-only models is ongoing.

Interpretation of habitat maps

Threshold predictions

In some instances, users of habitat maps may wish to identify a level of predicted probability of occupancy below which an area would be considered ‘unsuitable’ habitat for a given species. In general, such an interpretation of habitat maps is ill-advised unless the model that underpins the map is particularly well calibrated. However, in such instances where it is required, thresholds could be assigned in a risk-weighted manner. For example, if it is very important not to classify an area as ‘non-habitat’ when it may be used by a species, a low probability threshold (e.g. probability of occurrence = 0.1) should be specified when delineating non-habitat. By choosing a threshold

Fig. 4. The predicted probability of sooty owl occupancy across the LHCC region. Probabilities are represented as a grey-scale (white = 0–0.2, light grey = 0.21–0.5, medium grey = 0.51–0.8, dark grey = 0.81–1.0). The open triangles show point survey locations where the sooty owl is recorded as present. Habitat maps were constructed by creating a vector of habitat model predictions (one for every 1 ha grid cell in the landscape) in R using the ‘predict.gam’ command. The prediction vector is then written to an ASCII text file in the correct dimensions for the landscape and imported to ArcInfo (or ArcView) for display.


of 0.1 for a well-calibrated model, it is implied that a 10% chance of a site (a grid cell of 100 m) being occupied by the species is a tolerable risk of failing to include areas that are, in fact, habitat for the species. This decision must be made on a case-by-case basis by evaluating the management costs of the two types of error (a false-negative and false-positive prediction), rather than by relying on an arbitrary threshold set by the modeller.
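One way to make the risk weighting explicit is sketched below. It assumes a well-calibrated model and costs expressed in a common unit: labelling a cell with occupancy probability p as non-habitat carries an expected cost of p × (cost of a false negative), while labelling it as habitat carries (1 − p) × (cost of a false positive). The two are equal at p = c_FP / (c_FP + c_FN), which serves as the threshold. The function names and cost values are hypothetical.

```python
def risk_weighted_threshold(cost_false_negative, cost_false_positive):
    """Probability above which labelling a cell 'habitat' has the lower expected cost."""
    if cost_false_negative <= 0 or cost_false_positive <= 0:
        raise ValueError("costs must be positive")
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def expected_cost(prob, label_habitat, cost_false_negative, cost_false_positive):
    """Expected cost of one labelling decision for a calibrated probability."""
    if label_habitat:
        return (1.0 - prob) * cost_false_positive
    return prob * cost_false_negative


# Missing real habitat judged nine times as costly as protecting non-habitat.
threshold = risk_weighted_threshold(cost_false_negative=9.0, cost_false_positive=1.0)
```

Under these illustrative costs the threshold equals the 0.1 used in the example in the text; a planner who judged the two errors equally costly would arrive at 0.5 instead.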

Representing uncertainty

Many forms of uncertainty impact on maps of predicted probability of species occurrence (Elith et al. 2002). The prediction intervals available from regression models describe the part of this uncertainty associated with estimating model parameters, and this stems from inadequate data, errors in measurement and natural variation (Elith et al. 2002). There are additional uncertainties, such as model selection uncertainty and errors in explanatory variables, that are, however, not addressed by standard prediction intervals. While some methods have been developed for explicitly incorporating these forms of uncertainty in prediction intervals (Burnham & Anderson 2002; Wintle et al. 2003), there is currently very little guidance on how prediction intervals should be represented and interpreted in applications of habitat modelling and conservation planning.

Interpreting model evaluation statistics

The two statistics used for model evaluation in this study address different model applications. Specifically, model discrimination, measured by the ROC area, measures the degree to which the model successfully ranks presence sites higher than absence sites across the region in terms of predicted probability of presence. Models with satisfactory ROC areas will provide reliable ranking of areas in terms of habitat value. This mode of evaluation is relevant when the goal of a model is to rank or prioritize areas of interest in terms of their relative value as habitat for a species. This goal is different to that of accurately predicting the proportion of sites that are expected to be occupied at a given predicted probability of occurrence. For instance, at sites with a predicted probability of occurrence of 0.5, it is reasonable to expect that approximately 50% of such sites would contain the species. The degree to which this is true is described as model calibration (Miller et al. 1991) and is assessed using the logistic calibration equation. It is possible that a model with good discrimination can have poor calibration. It is also possible to improve a model’s predictive ability by adjusting the model parameters with calibration statistics (Harrell 2001) or other shrinkage methods (Hastie et al. 2001). A model with poor calibration may still be useful for prioritizing or ranking sites in terms of habitat value, but should not be used to predict the raw probability of finding a species at a given site. The choice of whether to focus on ROC areas or calibration statistics depends on the intended application of the models.
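The distinction between the two statistics can be illustrated with a small sketch. The ROC area is computed here as the proportion of presence-absence pairs in which the presence site receives the higher prediction (ties count one half). Calibration is summarized by regressing the observations on the logit of the predictions, so that an intercept near 0 and a slope near 1 indicate good calibration. This is a simplified illustration rather than the evaluation code used in the case study; the function names and data are hypothetical, and the Newton-Raphson fit assumes predictions strictly between 0 and 1 and data that are not perfectly separated.

```python
import math


def roc_area(preds, obs):
    """Proportion of presence-absence pairs ranked correctly (ties count 0.5)."""
    pos = [p for p, y in zip(preds, obs) if y == 1]
    neg = [p for p, y in zip(preds, obs) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    score = sum(1.0 if a > b else 0.5 if a == b else 0.0
                for a in pos for b in neg)
    return score / (len(pos) * len(neg))


def calibration(preds, obs, iterations=25):
    """Fit obs ~ intercept + slope * logit(pred) by Newton-Raphson."""
    x = [math.log(p / (1.0 - p)) for p in preds]
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, obs):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu            # score for the intercept
            g1 += (yi - mu) * xi     # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Hypothetical evaluation data: 10 sites at each of three predicted
# probabilities, with observed occupancy matching the predictions.
preds = [0.2] * 10 + [0.5] * 10 + [0.8] * 10
obs = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2
intercept, slope = calibration(preds, obs)
auc = roc_area(preds, obs)
```

Applying any monotonic distortion to `preds` (for example, squaring them) would leave `auc` unchanged while moving the intercept and slope away from 0 and 1, which is the sense in which a model can discriminate well yet be poorly calibrated.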

The issue of occupancy versus persistence

Models of the sort discussed in this paper make the implicit assumption that occupancy of a location implies suitability of the habitat. Van Horne (1983) identified problems associated with this assumption, and Pulliam’s (1988) ‘source-sink’ model of population dynamics formalizes its conceptual deficiencies. Pulliam’s model differentiates ‘sink’ areas, where mortality exceeds population growth, from ‘source’ areas that maintain overall population size through emigration. Similarly, if the total population is in a state of decline, it is possible that ‘remnant’ individuals may exist in unsuitable habitat that will not be occupied in the next generation. When a population is expanding its range, observations of absence of the species may falsely imply that the suitability of the habitat is low. These anomalies arise due to what is known as the ‘equilibrium’ assumption, which underpins the current static approach to wildlife habitat modelling (Austin 2002). Inclusion of disturbance history variables, dispersal barriers, competition and successional dynamics may assist in modelling such situations, though examples of such approaches are rare.

Survey design and sampling issues

This paper has focused on the choice and execution of particular modelling methods given a set of data. Issues of data quality and sampling have not been addressed. Because modelling results are only as good as the available data, the issues of data quality and sampling demand some discussion here.

Sample size

From a statistical perspective, the choice of sample size is a function of the desired precision of results. In studies where the goal is to make inference about a particular effect (e.g. the influence of time since logging on the probability of greater glider occupancy), power analysis may be used to determine the sample size required to achieve a statistically significant result for a given effect size, type I error rate and sampling variance (Burgman & Lindenmayer 1998). The relationship between statistical significance, power and model predictive performance is unclear, and there is no guarantee that a model with statistically significant terms will give good predictive performance. Unfortunately, there is no simple way to determine a priori the predictive performance of a habitat model, or to calculate a sample size that will ensure a predictive performance that is statistically significantly better than that of a null or alternative model. We could find only one attempt to address this problem in the statistical literature (Obuchowski 1994), making this an important area of future research. A simulation study that provided rules of thumb about sample sizes required to achieve a given predictive performance under a range of likely modelling scenarios would be an important contribution to the habitat modelling literature.

Data quality

A more subtle consideration is the geographical spread and environmental stratification of observations compared to the extent and environmental variability of the region for which models are to be built. We briefly touch on this issue in our case study when testing the adequacy of survey data using environmental strata. Little guidance is provided in the literature as to the minimum geographical or environmental coverage of data for statistical modelling, though it is common sense to expect that model predictions are unlikely to be reliable in environmental domains for which there are no survey data (Austin & Meyers 1996). Stratified sampling designs such as ‘GRADSECT’ (Austin & Heyligers 1989) that ensure a geographical and bioclimatic coverage of sampling locations are therefore appealing. GRADSECT theory, or newer developments such as Generalized Dissimilarity Modelling (GDM; Ferrier et al. 2002b) or a p-median criterion (Faith & Walker 1996), may also be applied to existing data sets to identify ‘gaps’ in the geographical and environmental coverage of samples.

Most statistical analyses, including the regression methods used in our case study, assume that observations are independent of each other. Survey locations that are close together are much less likely to be independent, especially when surveys target animals with large home ranges. Survey design should aim to ensure a minimum separation distance that is at least as great as the home range radius of the widest ranging species in the study. Random sampling within geographical and environmental strata assists in minimizing unwanted dependencies in observation data. Guidance on sampling theory should be sought from one of the many texts that deal with the topic in detail (e.g. Sutherland 1996; Thompson 2002).
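The minimum-separation criterion can be applied to an existing set of candidate survey locations with a simple greedy filter, sketched below under the assumption that coordinates are in metres on a projected grid. The function name and coordinates are hypothetical. The result depends on the order in which sites are considered and is not guaranteed to retain the largest possible subset, so it is a screening aid rather than a substitute for a designed sample.

```python
import math


def thin_sites(sites, min_separation):
    """Keep sites, in order, that are at least min_separation from every site already kept."""
    kept = []
    for x, y in sites:
        if all(math.hypot(x - kx, y - ky) >= min_separation for kx, ky in kept):
            kept.append((x, y))
    return kept


# Hypothetical candidate sites (metres) and a 1000-m separation,
# standing in for the home range radius of the widest ranging species.
candidates = [(0, 0), (300, 0), (1000, 0), (1000, 900), (2500, 0)]
retained = thin_sites(candidates, 1000)
```

Shuffling the candidate list before thinning, and repeating the procedure, gives a sense of how sensitive the retained set is to site order.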

Technical issues and promising advances in modelling methods

Species versus community approaches

Models of the distribution of single species are not the only way to approach conservation planning questions. Single species models are important for a subset of species, such as threatened, focal or flagship species. However, landscape planning aims to conserve the biodiversity of a region, and there are too many species and too little data to achieve this through single-species modelling. Therefore, there is a role for community-level modelling (Ferrier et al. 2002b). One example of community-level modelling is GDM, already mentioned in the context of sampling; this models compositional dissimilarity across the landscape. This could then contribute to a conservation plan that aimed to span the range of species patterns in the landscape. A further role for methods based on data from many species is to use them for modelling rare species, or species in data sets with relatively few records per species. In such situations there may be insufficient data for developing a robust single-species model. GDMs, canonical correspondence analysis (CCA) or multivariate adaptive regression splines (MARS) may be useful for this purpose.

Spatial autocorrelation

Spatial autocorrelation in wildlife observation data arises when environmental processes and patterns that influence the spatial distribution of wildlife are themselves spatially structured, and/or because the species is subject to demographic processes, territoriality or dispersal limitations causing spatial dependence (contagion or dispersion effects). Demographic and environmental processes underlying spatial patterns in wildlife distributions are usually poorly understood and therefore difficult to incorporate in model fitting. Consequently, model residuals are often spatially correlated (i.e. not independent), violating one of the basic assumptions of regression modelling. In practice, non-independence usually results in underestimation of standard errors and overestimation of the importance of habitat variables (Legendre & Fortin 1989). Methods such as autologistic regression (Augustin et al. 1996), generalized estimating equations (Albert & McShane 1995) and geographically weighted regression (Fotheringham et al. 2002) may be used to incorporate spatial autocorrelation in habitat analyses, but they are still not incorporated in standard statistical software and are technically demanding to implement.
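Before turning to those more demanding methods, a simple diagnostic is to check whether model residuals are spatially correlated. The sketch below computes Moran's I with binary weights (1 for pairs of sites within a chosen distance, 0 otherwise). It is offered as an illustrative check and is not a method used in the case study; the function name, coordinates and residuals are hypothetical. Values near 0 suggest little spatial structure, positive values suggest that neighbouring residuals are similar.

```python
import math


def morans_i(coords, residuals, max_distance):
    """Moran's I of residuals using binary distance-band weights."""
    n = len(residuals)
    mean = sum(residuals) / n
    z = [r - mean for r in residuals]
    cross = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.hypot(coords[i][0] - coords[j][0],
                           coords[i][1] - coords[j][1])
            if d <= max_distance:
                weight_sum += 1.0
                cross += z[i] * z[j]
    denom = sum(v * v for v in z)
    if weight_sum == 0.0 or denom == 0.0:
        raise ValueError("no neighbouring pairs or no residual variation")
    return (n / weight_sum) * (cross / denom)


# Four hypothetical sites along a transect, one distance unit apart.
sites = [(0, 0), (1, 0), (2, 0), (3, 0)]
clustered = morans_i(sites, [1.0, 1.0, -1.0, -1.0], max_distance=1.0)
alternating = morans_i(sites, [1.0, -1.0, 1.0, -1.0], max_distance=1.0)
```

In practice, significance would be judged against a permutation distribution obtained by repeatedly shuffling the residuals among sites, which this sketch omits.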

Model selection and model uncertainty

The standard approach to representing prediction uncertainty involves the calculation of prediction (or confidence) intervals that incorporate uncertainty about parameter estimates. However, this approach to prediction implicitly assumes that the model chosen is the best available representation of the truth and effectively ignores model selection uncertainty, resulting in overconfident predictions. Automated variable selection algorithms exacerbate this problem by providing a method for dredging through many candidate predictors in search of an explanatory model (Chatfield 1995; Hoeting et al. 1999). One solution is to completely avoid automated model selection and to analyse only a small set of ‘a priori’ models (Burnham & Anderson 2002). This approach may be practical when strong prior knowledge exists and where the number of plausible models is small. However, if prediction is the primary goal and many possible models exist, some level of automated variable selection is often desirable. Model averaging approaches, including Bayesian model averaging, have been promoted in a range of disciplines as a means of incorporating model selection uncertainty into statistical inference and prediction (Hoeting et al. 1999), and have been applied in some ecological examples (Burnham & Anderson 2002; Wintle et al. 2003).
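A minimal sketch of one model averaging scheme follows. It uses Akaike weights in the information-theoretic style of Burnham & Anderson (2002), not Bayesian model averaging proper, and it averages predictions on the probability scale, which is one of several possible choices. The function names, AIC values and predictions are hypothetical.

```python
import math


def akaike_weights(aics):
    """Convert AIC values for a set of candidate models into weights that sum to 1."""
    best = min(aics)
    relative = [math.exp(-0.5 * (a - best)) for a in aics]
    total = sum(relative)
    return [r / total for r in relative]


def averaged_prediction(predictions, weights):
    """Weighted average of the candidate models' predictions for one site."""
    if len(predictions) != len(weights):
        raise ValueError("one weight is needed per prediction")
    return sum(w * p for w, p in zip(weights, predictions))


# Three hypothetical candidate models for one species.
weights = akaike_weights([100.0, 102.0, 110.0])
site_prob = averaged_prediction([0.62, 0.48, 0.30], weights)
```

Because the weights decay exponentially with the AIC difference, a model 10 units worse than the best contributes almost nothing to the averaged prediction, whereas a model 2 units worse still carries appreciable weight.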

Estimating and incorporating detectability in habitat models

Unless the probability of detecting a species when it is present is equal to 1, false negative observation errors will occur in species surveys. The probability of detecting the presence of the case study species in any single standard survey based on spot-lighting and call elicitation has been found to be very low (Pr[detection|presence] ~ 0.12–0.45; Wintle et al. in press), making the reliability of absence data a potentially serious form of uncertainty in our case study. Recent studies have demonstrated the negative impact that false-negative observation error may have on species-habitat analyses (Tyre et al. 2003), meta-population models (Moilanen 2002) and monitoring studies (MacKenzie et al. 2002). Recently developed techniques for incorporating detectability in model estimation (MacKenzie et al. 2002; Tyre et al. 2003) reduce bias in model estimation brought about by false absences, though little effort has been invested in testing the relative predictive performance of such models.
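The practical consequence of low detectability can be illustrated with a short calculation. Assuming that repeat visits are independent and that the per-visit detection probability p is constant (both simplifications), the probability of recording a false absence at an occupied site after n visits is (1 − p)^n. The sketch below applies this to the two ends of the detectability range quoted above; the function names are hypothetical.

```python
def false_absence_probability(p_detect, n_visits):
    """Probability of never detecting the species at an occupied site in n visits."""
    return (1.0 - p_detect) ** n_visits


def visits_needed(p_detect, confidence=0.95):
    """Smallest number of visits giving at least `confidence` of one or more detections."""
    if not 0.0 < p_detect <= 1.0:
        raise ValueError("p_detect must be in (0, 1]")
    n = 0
    miss = 1.0
    while 1.0 - miss < confidence:
        n += 1
        miss *= 1.0 - p_detect
    return n


low = visits_needed(0.12)    # lower end of the quoted range
high = visits_needed(0.45)   # upper end of the quoted range
```

Under these assumptions, a single visit leaves a large chance of a false absence at either end of the range, which is consistent with the emphasis on multiple site visits in data collection.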

Incorporating biotic interactions

Despite the prevalence of interspecific competition as a key concept in community ecology (Diamond 1975), very little has been attempted by way of incorporating interspecific competition in distribution models (for one exception see Leathwick & Austin 2001). Regression models such as those described by Leathwick and Austin (2001) could be applied to fauna species. Future development of ‘loop analysis’ (Levins 1975) or systems of simultaneous regressions (Guisan & Zimmerman 2000) may facilitate spatial models of wildlife distributions that incorporate competitive interactions.

The GLMs and GAMs presented here are relatively simple, though suited to the conservation problem and available data. There is scope for greater sophistication in the modelling method where a high level of modelling expertise exists, though we believe our recommended approach represents a reasonable trade-off between practicality and rigour.

Model improvement and adaptive management

Due to environmental change, natural and anthropogenic disturbance and population stochasticity, all predictive models will become redundant if they are not updated through time. We recommend that models be applied in an adaptive framework (Walters & Holling 1990), allowing immediate application of models in management but ensuring a commitment to continual improvement of models through future data collections and refinements to modelling methods.

Future data collection will be central to rigorous model testing and model refinements. They should be targeted to fill gaps in current data sets (Fig. 3). Particular attention should be paid to the collection of high quality observation data based on multiple site visits aimed at minimizing false absences (Wintle et al. 2004). Some important environmental variables, including those based on forest growth stage data, were not available for model fitting in the LHCC region. It is likely that models would have better predictive accuracy if forest growth stage information became available and was incorporated in models.

CONCLUSION

Habitat models are now a widely used tool in public land conservation planning. Systematic conservation planning in the urban fringe will require such tools to ensure transparency and repeatability of planning outcomes. However, access to statistical and ecological expertise for urban conservation planning is likely to be limited when compared with the public land planning processes that have taken place in Australia. It is therefore important that we continue to develop and refine planning tools, including habitat modelling methods, with an emphasis on statistical and ecological rigour and simplicity.

ACKNOWLEDGEMENTS

This work was instigated by the LHCCREMS organization (now Hunter Councils). Steven House (EcoLogical), Michael Murray (Forest Fauna Surveys P/L), Philip Gleeson (NPWS), Daniel Connolly (NPWS), Jill Smith (NPWS), Peter Bone (NPWS), Peter Ewin (NPWS), Mick Andren (NPWS) and Rod Kavanagh (SFNSW) provided data, expert guidance on model construction and valuable insights. Mark Burgman and Kuniko Yamada (University of Melbourne) helped throughout the project. Comments from Terry Walshe (University of Western Australia), David Tierney and Robbie Economos improved the manuscript substantially. Discussions with Trevor Hastie and John Leathwick helped clarify bootstrapping issues. Two anonymous reviewers provided constructive feedback. Sophie Powrie, Kirsty Winter and Meredith Liang (LHCCREMS) provided excellent support and useful feedback throughout the project.

REFERENCES

Agresti A. (1996) An Introduction to Categorical Data Analysis. John Wiley and Sons, New York.

Akaike H. (1973) Information theory and an extension of the maximum likelihood. In: Proceedings of the 2nd International Symposium on Information Theory (eds B. N. Petrov & F. Csáki) pp. 267–81. Akademia Kaido, Budapest.

Albert P. S. & McShane L. M. (1995) A generalized estimating equations approach for spatially correlated data: applications to the analysis of neuroimaging data. Biometrics 51, 627–38.

Augustin N. H., Mugglestone M. A. & Buckland S. T. (1996) An autologistic model for the spatial distribution of wildlife. J. Appl. Ecol. 33, 339–47.

Austin M. P. (1991) Vegetation: data collection and analysis. In: Nature Conservation: Cost Effective Biological Surveys and Data Analysis (eds C. R. Margules & M. P. Austin) pp. 37–41. CSIRO, Canberra.

Austin M. P. (2002) Spatial prediction of species distribution: an interface between ecological theory and statistical modelling. Ecol. Model. 157, 101–18.

Austin M. P. & Heyligers P. C. (1989) Vegetation survey design for conservation: GRADSECT sampling of forests in north-eastern NSW. Biol. Conserv. 50, 13–32.

Austin M. P. & Meyers J. A. (1996) Current approaches to modelling the environmental niche of Eucalypts: implications for management of forest biodiversity. For. Ecol. Manage. 85, 95–106.

Bambar D. (1975) The area above the ordinal dominance graph and the area below the receiver operating characteristics graph. J. Math. Psych. 12, 387–415.

Belsley D. A., Kuh E. & Welsch R. E. (1980) Regression Diagnostics. John Wiley and Sons, New York.

Bio A. M. F., Alkemande R. & Barendregt A. (1998) Determining alternative models for vegetation response analysis – a non-parametric approach. J. Veg. Sci. 9, 5–16.

Breiman L., Friedman J. H., Olshen R. A. & Stone C. J. (1984) Classification and Regression Trees. Wadsworth International Group, Belmont.

Breininger D. R., Larson V. L., Duncan B. W. & Smith R. B. (1998) Linking habitat suitability to demographic success in Florida scrub-jays. Wild. Bull. 26, 118–28.

Burgman M. A. & Lindenmayer D. B. (1998) Conservation Biology for the Australian Environment. Surrey Beatty and Sons, Chipping Norton.

Burnham K. P. & Anderson D. R. (2002) Model Selection and Multimodel Inference: a Practical Information-Theoretic Approach. Springer, New York.

Burgman M. A., Breininger D. R., Duncan B. W. & Ferson S. (2001) Setting reliability bounds on Habitat Suitability Indices. Ecol. Appl. 11, 70–8.

Carpenter G., Gillison A. N. & Winter J. (1993) DOMAIN: a flexible modelling procedure for mapping potential distributions of plants and animals. Biodivers. Conserv. 2, 667–80.

Chatfield C. (1995) Model uncertainty, data mining and statistical inference. J. Roy. Stat. Soc. Ser. A Stat. 158, 419–66.

Commonwealth of Australia (2003) Terms of Reference: Standing Committee on Environment and Heritage Inquiry into Sustainable Cities. Parliament of Australia, House of Representatives, Canberra.

Derksen S. & Keselman H. J. (1992) Backward, forward and stepwise automated subset selection algorithms: frequency of obtaining authentic and noise variables. Br. J. Math. Stat. Psych. 45, 265–82.

Diamond J. M. (1975) Assembly of species communities. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 342–444. Harvard University Press, Cambridge.

Ecotone Ecological Consultants (2001) Lower Hunter Central Coast Regional Biodiversity Strategy: Fauna survey and mapping project – Module 1 Fauna Surveys. Ecotone Ecological Consultants for LHCCREMS, Waratah, pp. 43.

Edgar R. & Belcher C. (1995) Spotted-tailed Quoll. In: The Mammals of Australia (ed. R. Strahan) pp. 67–8. Reed Books, Chatswood.

Efron B. & Tibshirani R. (1997) Improvements on cross-validation: the .632+ bootstrap method. J. Am. Stat. Ass. 92, 548–60.

Elith J. (2000) Quantitative methods for modelling species habitat: comparative performance and an application to Australian plants. In: Quantitative Methods for Conservation Biology (eds S. Ferson & M. A. Burgman) pp. 39–58. Springer-Verlag, New York.

Elith J. & Burgman M. A. (2002) Predictions and their validation: rare plants in the Central Highlands, Victoria, Australia. In: Predicting Species Occurrences: Issues of Accuracy and Scale (eds J. M. Scott, P. J. Heglund, M. L. Morrison, M. G. Raphael, W. A. Wall & F. B. Samson) pp. 303–14. Island Press, Covelo.

Elith J. & Burgman M. A. (2003) Habitat models for PVA. In: Population Viability in Plants (eds C. A. Brigham & M. W. Schwartz) pp. 203–35. Springer-Verlag, New York.

Elith J., Burgman M. A. & Regan H. M. (2002) Mapping epistemic uncertainties and vague concepts in predictions of species distribution. Ecol. Model. 157, 313–29.


Ennis M., Hinton G., Naylor D., Revow M. & Tibshirani R. (1998) A comparison of statistical learning methods on the GUSTO database. Stat. Med. 17, 2501–8.

ESRI (1997) Arcinfo 7.2. Environmental Systems Research Institute Inc., Redlands.

Faith D. P. & Walker P. A. (1996) Environmental diversity: on the best-possible use of surrogate data for assessing the relative biodiversity of sets of areas. Biodivers. Conserv. 5, 399–415.

Ferrier S. & Watson G. (1997) An Evaluation of the Effectiveness of Environmental Surrogates and Modelling Techniques in Predicting the Distribution of Biological Diversity. Department of Environment, Sports and Territories, Commonwealth of Australia and NSW National Parks and Wildlife Service, Canberra.

Ferrier S., Watson G., Pearce J. & Drielsma M. (2002a) Extended statistical approaches to modelling spatial pattern in biodiversity in north-east New South Wales. I. Species level modelling. Biodivers. Conserv. 11, 2275–307.

Ferrier S., Drielsma M., Manion G. & Watson G. (2002b) Extended statistical approaches to modelling spatial pattern in biodiversity: the north-east New South Wales experience. II. Community-level modelling. Biodivers. Conserv. 11, 2309–38.

Fielding A. H. & Bell J. F. (1997) A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ. Conserv. 24, 38–49.

Fotheringham A. S., Brunsdon C. & Charlton M. (2002) Geographically Weighted Regression: the Analysis of Spatially Varying Relationships. John Wiley and Sons, London.

Franklin J. (1995) Predictive vegetation mapping: geographic modeling of biospatial patterns in relation to environmental gradients. Prog. Phys. Geog. 19, 494–519.

Friedman J. H. (1991) Multivariate adaptive regression splines (with discussion). Ann. Stat. 19, 1–141.

Goldingay R. L. & Kavanagh R. P. (1993) Home-range estimates and habitat of the yellow-bellied glider (Petaurus australis) at Waratah Creek, New-South-Wales. Wildl. Res. 20, 387–404.

Graham C. H., Ferrier S., Huettman F., Moritz C. & Peterson A. T. (2004) New developments in museum-based informatics and applications in biodiversity analysis. Trends Ecol. Evol. 19, 497–503.

Guisan A. & Harrell F. E. (2000) Ordinal response regression models in ecology. J. Veg. Sci. 11, 617–26.

Guisan A. & Zimmerman N. E. (2000) Predictive habitat distribution models in ecology. Ecol. Model. 135, 147–86.

Hanley J. A. & McNeil B. J. (1982) The meaning and use of the area under a Receiver Operating Characteristic (ROC) curve. Radiology 143, 29–36.

Harrell F. E. (2001) Regression Modelling Strategies with Application to Linear Models, Logistic Regression and Survival Analysis. Springer, New York.

Harrell F. E., Lee K. L. & Mark D. B. (1996) Multivariate prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat. Med. 15, 361–87.

Hastie T. & Tibshirani R. (1990) Generalized additive models. Monographs on statistics and applied probability (eds D. R. Cox, D. V. Hinkley, D. Rubin & B. W. Silverman). Chapman & Hall, London.

Hastie T., Tibshirani R. & Friedman J. H. (2001) The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer-Verlag, New York.

Hirzel A. H. (2001) When GIS come to life. Linking Landscape- and Population Ecology for Large Population Management Modelling: the Case of Ibex (Capra Ibex) in Switzerland. The University of Lausanne, Lausanne.

Hoeting J. A., Madigan D., Raftery A. E. & Volinsky C. T. (1999) Bayesian model averaging: a tutorial. Stat. Sci. 14, 382–401.

Houlder D. J., Hutchinson M. F., Nix H. A. & McMahon J. P. (1999) ANUCLIM User Guide, Version 5.0. Centre for Resource and Environmental Studies, Australian National University, Canberra.

Kadmon R., Farber O. & Danin A. (2003) A systematic analysis of factors affecting the performance of climatic envelope models. Ecol. Appl. 13, 853–67.

Kavanagh R. P. (2002) Conservation and management of large forest owls in southeastern Australia. In: The Ecology and Conservation of Owls (eds I. Newton, R. Kavanagh, J. Olson & I. Taylor) pp. 201–19. CSIRO, Melbourne.

Lambeck R. J. (1997) Focal species: a multi-species umbrella for nature conservation. Conserv. Biol. 11, 849–56.

Leathwick J. R. & Austin M. P. (2001) Competitive interactions between tree species in New Zealand’s old-growth indigenous forests. Ecology 82, 2560–73.

Legendre P. & Fortin M. J. (1989) Spatial pattern and ecological analysis. Vegetatio 80, 107–38.

Legendre L. & Legendre P. (1998) Numerical Ecology. Elsevier, New York.

Levins R. (1975) Evolution of communities near equilibrium. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 16–50. Harvard University Press, Cambridge.

LHCCREMS (2004) Lower Hunter Central Coast Regional Biodiversity Strategy 2004 (4 Volume). Hunter Councils on behalf of the Lower Hunter Central Coast Regional Environmental Strategy, Thornton.

Li W., Wang Z., Ma Z. & Tang H. (1999) Designing the core zone in a biosphere reserve based on suitable habitats: Yancheng Biosphere Reserve and the red crowned crane (Grus japonensis). Biol. Conserv. 90, 167–73.

Loyn R. H., McNabb E. G., Volodina L. & Willig R. (2001) Modelling landscape distributions of large forest owls as applied to managing forests in north-east Victoria, Australia. Biol. Conserv. 97, 361–76.

McCullagh P. & Nelder J. A. (1989) Generalized Linear Models. Chapman & Hall, London.

MacKenzie D. I., Nichols J. D., Lachman G. B., Droege S., Royle J. A. & Langtimm C. A. (2002) Estimating site occupancy rates when detection probabilities are less than one. Ecology 83, 2248–55.

Manel S., Dias J. M., Buckton S. T. & Ormerod S. J. (1999a) Alternative methods for predicting species distribution: an illustration with Himalayan river birds. J. Appl. Ecol. 36, 734–47.

Manel S., Dias J. M. & Ormerod S. J. (1999b) Comparing discriminant analysis, neural networks and logistic regression for predicting species distributions: a case study with a Himalayan river bird. Ecol. Model. 120, 337–47.

Miller M. E., Hui S. L. & Tierney W. M. (1991) Validation techniques for logistic regression models. Stat. Med. 10, 1213–26.

Moilanen A. (2002) Implications of empirical data quality to metapopulation model parameter estimation and application. Oikos 96, 516–30.

Moisen G. G. & Frescino T. S. (2002) Comparing five modeling techniques for predicting forest characteristics. Ecol. Model. 157, 209–25.


National Parks & Wildlife Service (1998) Eden Fauna Modelling. New South Wales National Parks & Wildlife Service, New South Wales Regional Forest Agreement Steering Committee, Canberra.

National Parks and Wildlife Service (2000) Modelling Areas of Habitat Significance for Fauna and Flora in the Southern CRA. NPWS NSW, Canberra.

Nix H. (1986) A biogeographic analysis of Australian elapid snakes. In: Atlas of Elapid Snakes of Australia (ed. R. Longmore) pp. 4–15. Australian Government Publishing Service, Canberra.

Obuchowski N. A. (1994) Computing sample size for receiver operating characteristic studies. Invest. Radiol. 29, 238–43.

Pearce J. & Ferrier S. (2000) Evaluating the predictive performance of habitat models developed using logistic regression. Ecol. Model. 133, 225–45.

Pearce J. & Ferrier S. (2001) The practical value of modelling relative abundance of species for regional conservation planning: a case study. Biol. Conserv. 98, 33–43.

Pearce J., Ferrier S. & Scotts D. (2001a) An evaluation of the predictive performance of distributional models for flora and fauna in north-east New South Wales. J. Environ. Manage. 62, 171–84.

Pearce J. L., Cherry K., Drielsma M., Ferrier S. & Whish G. (2001b) Incorporating expert knowledge and fine-scale vegetation mapping into statistical modelling of faunal distribution. J. Appl. Ecol. 38, 412–24.

Phillips S. J., Anderson R. P. & Schapire R. E. (in press) Maximum entropy modeling of species geographic distributions. Ecol. Model.

Pressey R. L., Possingham H. P., Logan V. S., Day J. R. & Williams P. H. (1999) Effects of data characteristics on the results of reserve selection algorithms. J. Biogeogr. 26, 179–91.

Pulliam H. R. (1988) Sources, sinks and population regulation. Am. Nat. 132, 652–61.

R Development Core Team (2004) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org

Reading R. P., Clark T. A., Seebeck J. H. & Pearce J. (1996) Habitat Suitability Index model for the eastern barred bandicoot, Perameles gunnii. Wildl. Res. 23, 221–35.

Reed P. C. & Lunney D. (1990) Habitat loss: the key problem for the long-term survival of koalas in New South Wales. In: Koala Summit: Managing Koalas in New South Wales (eds D. Lunney, C. A. Urquhart & P. C. Reed) pp. 9–31. University of Sydney, Sydney.

Ripley B. D. (1995) Pattern Recognition and Neural Networks – a Statistical Approach. Cambridge University Press, Cambridge.

Russel R. (1995) Yellow-bellied glider. In: The Mammals of Australia (ed. R. Strahan) pp. 226–8. Reed Books, Chatswood.

Steyerberg E. W., Eijkemans M. J. C., Harrell F. E. & Habbema J. D. F. (2001a) Prognostic modeling with logistic regression analysis: in search of a sensible strategy in small data sets. Med. Decis. Making 21, 45–56.

Steyerberg E. W., Harrell F. E., Borsboom G. J. J. M., Eijkemans M. J. C., Vergouwe Y. & Habbema J. D. F. (2001b) Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J. Clin. Epidemiol. 54, 774–81.

Stockwell D. & Peters D. (1999) The GARP modelling system: problems and solutions to automated spatial prediction. Int. J. Geogr. Inf. Sci. 13, 143–58.

Sutherland W. J. (1996) Ecological Census Techniques: a Handbook, 1st edn. Cambridge University Press, Cambridge.

ter Braak C. J. F. (1986) Canonical Correspondence Analysis: a new eigenvector technique for multivariate direct gradient analysis. Ecology 67, 1167–79.

Thompson S. K. (2002) Sampling, 2nd edn. Wiley, New York.

Tyre A. J., Tenhumberg B., Field S. A., Possingham H. P., Niejalke D. & Parris K. (2003) Improving precision and reducing bias in biological surveys by estimating false negative error rates in presence-absence data. Ecol. Appl. 13, 1790–801.

Van Horne B. (1983) Density as a misleading indicator of habitat quality. J. Wildl. Man. 47, 893–901.

Van Horne B. & Wiens J. A. (1991) Forest Bird Habitat Suitability Models and the Development of General Habitat Models. United States Department of the Interior, Fish and Wildlife Service, Washington, DC.

Venables W. N. & Ripley B. D. (2003) Modern Applied Statistics with S, 4th edn. Springer, New York.

Walters C. & Holling C. S. (1990) Large-scale management experiments and learning by doing. Ecology 71, 2060–8.

Wintle B. A., McCarthy M. A., Volinsky C. T. & Kavanagh R. P. (2003) The use of Bayesian Model Averaging to better represent the uncertainty in ecological models. Conserv. Biol. 17, 1579–90.

Wintle B. A., Elith R. J., Yamada K. & Burgman M. A. (2004) LHCCREMS fauna survey and mapping project. Module 2: Habitat modelling and conservation requirements. Lower Hunter & Central Coast Regional Environmental Management Strategy, Callaghan.

Wintle B. A., McCarthy M. A., Parris K. M. & Burgman M. A. (2004) Precision and bias of methods for estimating point survey detection probabilities. Ecol. Appl. 14, 703–12.

Wilson K. A., Westphal M. I., Possingham H. P. & Elith J. (2005) Sensitivity of conservation planning to different approaches to using species distribution data. Biol. Conserv. 122, 99–112.

Wintle B. A., Burgman M. A., McCarthy M. A. & Kavanagh R. P. (in press) The magnitude and management consequences of false negative observation error in surveys of arboreal marsupials and large forest owls. J. Wildl. Man.

Worton B. J. (1989) Kernel methods for estimating the utilization distribution in home-range studies. Ecology 70, 164–8.

Yencken D. & Wilkinson D. (2000) Resetting the Compass: Australia’s Journey Towards Sustainability. CSIRO Publishing, Melbourne.

Zaniewski A. E., Lehmann A. & Overton J. M. (2002) Predicting species distribution using presence-only data: a case study of native New Zealand ferns. Ecol. Model. 157, 261–80.

APPENDIX 1

Method for 0.632+ bootstrap evaluation, after Harrell et al. (1996) and Efron and Tibshirani (1997).

1. Develop model on all n observations.
2. Calculate the statistic(s) of choice for evaluation on the same data (i.e. the training data); call this Stat_app because it is the apparent value of the statistic.
3. Take a bootstrap sample, i.e. a sample of size n, with replacement, of rows of the data matrix. Keep track of which sites are in the bootstrap sample and which are excluded.
4. Fit the model on the bootstrap sample (using the same methods as used on the full set).
5. Compute the statistic on the bootstrap data set (observations vs. fitted values) and call it Stat_boot.
6. Also compute the statistic on a version of the bootstrap data where the observations are randomized (Stat_permute).
7. Use the bootstrap model to predict to the excluded data set; calculate the statistic on these predictions, Stat_excl.
8. Use Stat_boot, Stat_permute and Stat_excl to calculate the amount of overfitting, the relative overfitting rate and weights that are then used to make a best estimate of predictive performance, Stat_best_est. This statistic puts most emphasis on predictions to the excluded data, particularly when the model is overfitted (i.e. when Stat_boot - Stat_excl is large). For details of this step see Steyerberg et al. (2001b).
9. Measure how optimistic the fit on the bootstrap sample was: O = Stat_boot - Stat_best_est.
10. Repeat steps 3–7 100–200 times.
11. Calculate an average optimism, Ō.
12. Use Ō to correct Stat_app for its optimism: Stat_app - Ō. This is a near unbiased estimate of the expected value of the external predictive performance of the process that generated Stat_app.


as a ‘hollows index’ or ‘foliar nutrient index’ have been developed from maps of forest age and floristic composition and used in habitat modelling in the past (National Parks & Wildlife Service 1998). Similarly, vegetation variables with many categories of vegetation are often difficult to use as habitat variables in their raw form. Experts may be used to aggregate vegetation classes into those most likely to be of relevance to the species being modelled, or to develop interaction terms between, for example, vegetation type, topographic context and forest age. Derived variables may serve as useful predictors of habitat where the raw variables do not. Similarly, neighbourhood measures such as ‘the proportion of old forest within 2 km’ have also been used successfully in habitat modelling (Ferrier et al. 2002a). Neighbourhood measures convey the local environmental context of a site, which may be as important as the attributes of the site itself, especially in the case where the home range of a species is larger than the cell size used in modelling or where survey locations are imprecisely known. Compiling a set of candidate environmental variables should involve careful consideration of the biology of the species being modelled.
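A neighbourhood measure of the kind described above can be sketched in a few lines. This is only an illustration, not the GIS procedure used in the case study: the function name, the nested-list grid representation and the use of cell centres to decide membership of the circular window are all assumptions.

```python
import math


def neighbourhood_proportion(grid, row, col, radius_m, cell_size_m=100.0):
    """Proportion of cells within radius_m of cell (row, col) that are 'on'.

    grid is a list of rows; each cell holds 1 (e.g. old forest), 0 (other)
    or None (no data, ignored). Layout and names are illustrative only.
    """
    reach = int(radius_m // cell_size_m)  # window radius in whole cells
    hits = 0
    total = 0
    for r in range(max(0, row - reach), min(len(grid), row + reach + 1)):
        for c in range(max(0, col - reach), min(len(grid[r]), col + reach + 1)):
            # Keep only cells whose centre lies inside the circular window.
            dist = math.hypot((r - row) * cell_size_m, (c - col) * cell_size_m)
            if dist > radius_m or grid[r][c] is None:
                continue
            total += 1
            hits += 1 if grid[r][c] else 0
    return hits / total if total else float("nan")
```

With 100-m cells, a call such as `neighbourhood_proportion(old_forest, r, c, 2000)` would give a value comparable in spirit to ‘the proportion of old forest within 2 km’, where `old_forest` is a hypothetical binary grid.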

A set of climatic variables derived from ANUCLIM (Houlder et al. 1999) were available for use in model development. These may have a direct role in determining species distributions through metabolic constraints. However, the remainder of available predictor variables would be considered distal. Neighbourhood measures were derived and tested for all species at a range of neighbourhood distances. The absence of predictor variables relating to forest growth stage, a commonly used surrogate for tree-hollow and denning availability (National Parks & Wildlife Service 1998, 2000; Loyn et al. 2001), is likely to place substantial limitations on the predictive performance of final models. All environmental variables were stored as spatial layers (or grids) in a GIS with a grid cell resolution of 100 m, which was satisfactory with respect to the home range size of the target species.

Testing the adequacy of survey data based on environmental strata

There were a sufficient number of species survey data points in the region for fitting statistical models for each of the seven priority species. However, having a sufficient number of data points for fitting a regression model does not guard against inherent bias in survey data. To address concerns about geographical and environmental biases, data were tested for their coverage of key environmental strata. This was undertaken to ensure that models developed would be relevant to the range of environmental conditions present in the region and that excessive extrapolation of fitted responses would not be required. Environmental strata were defined by overlaying GIS raster maps (1 ha grid cell size) of four key variables: broad vegetation cover (seven classes), topographic position (three classes), mean annual temperature (four classes) and mean annual rainfall (seven classes), resulting in 384 possible strata, 256 of which were represented by at least 50 ha of forest in the region. By overlaying survey locations and the map of environmental strata, it is possible to tabulate the proportion of sampled versus unsampled strata for each species. Maps showing the regional coverage of sampled and unsampled strata were created for each species to illustrate where model predictions may be less reliable and where future biological surveys could be targeted.
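The tabulation of sampled versus unsampled strata can be sketched as follows. This is an illustration under stated assumptions rather than the GIS workflow used in the study: each grid cell is assumed to hold a stratum label (for example a tuple of vegetation, topographic, temperature and rainfall classes) or None for non-forest, and the function and argument names are hypothetical.

```python
def strata_coverage(strata_grid, survey_cells, min_cells=50):
    """Tabulate sampled and unsampled environmental strata.

    strata_grid  : list of rows; each cell is a hashable stratum label or None
    survey_cells : iterable of (row, col) survey locations
    min_cells    : minimum cell count for a stratum to be considered
                   (50 cells of 1 ha each corresponds to 50 ha)
    Returns (sampled strata, unsampled strata, fraction of eligible area
    that falls in sampled strata).
    """
    counts = {}
    for row in strata_grid:
        for stratum in row:
            if stratum is not None:
                counts[stratum] = counts.get(stratum, 0) + 1
    eligible = {s for s, n in counts.items() if n >= min_cells}
    sampled = {strata_grid[r][c] for r, c in survey_cells} & eligible
    unsampled = eligible - sampled
    area = sum(counts[s] for s in eligible)
    fraction = sum(counts[s] for s in sampled) / area if area else float("nan")
    return sampled, unsampled, fraction
```

The set of unsampled strata returned here is what would be mapped to show where predictions may be less reliable.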

Model development and evaluation

Data preparation

Geographical information system layers representing environmental variables were sampled at each survey location using ArcInfo (ESRI 1997) to construct a modelling data frame for each species. Similar functions are available in most GIS software. Each row of data contained the survey observation (1 = species present, 0 = species absent) and values for each of the candidate predictor variables at the survey locations. This resulted in an n × (k + 1) data matrix, where n is the number of survey locations and k is the number of potential predictor variables.
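The construction of the n × (k + 1) matrix can be illustrated with a short sketch. It does not reproduce the ArcInfo procedure; the grid layout, the function name and the decision to skip sites that fall on no-data cells are assumptions made for the example.

```python
def build_model_matrix(observations, layers):
    """Sample every environmental layer at each survey location.

    observations : list of (row, col, present), with present equal to 1 or 0
    layers       : dict mapping a variable name to a grid (list of rows)
    Returns (header, rows); each row is [present, v1, ..., vk], so the
    result has n rows and k + 1 columns.
    """
    names = sorted(layers)
    header = ["present"] + names
    rows = []
    for r, c, present in observations:
        values = [layers[name][r][c] for name in names]
        if any(v is None for v in values):
            continue  # assumption: drop sites lacking data in any layer
        rows.append([int(present)] + values)
    return header, rows
```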

In order to construct presence-only models, a second data matrix was created for each species that contained records for which the species was present. Background samples (sometimes termed ‘pseudo-absence’ data) were generated for 10 000 random locations across the landscape according to methods described by Ferrier and Watson (1997) and using software developed by Landcare New Zealand (J. Overton, pers. comm., 2005). The rationale for using background samples when no ‘real’ or systematic absence data exist is that they can be used to create an environmental profile of the study area, which is then compared with the environmental profile of known ‘presence’ locations. An n × (k + 1) modelling matrix was created as for the presence–absence data, where n is equal to the number of presence observations plus the 10 000 random ‘observations’ of absence.
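A presence-background matrix of this form can be sketched as below. This is not the Landcare New Zealand software: uniform random selection of cells with replacement, the fixed seed and all function names are assumptions for illustration only.

```python
import random


def background_sample(layers, n_points, seed=0):
    """Draw background ('pseudo-absence') rows from random grid cells.

    layers is a dict of name -> grid; cells lacking data in any layer are
    excluded. Cells are drawn uniformly with replacement (an assumption).
    Each returned row is [0, v1, ..., vk].
    """
    rng = random.Random(seed)
    names = sorted(layers)
    first = layers[names[0]]
    valid = [
        (r, c)
        for r in range(len(first))
        for c in range(len(first[r]))
        if all(layers[name][r][c] is not None for name in names)
    ]
    rows = []
    for _ in range(n_points):
        r, c = rng.choice(valid)
        rows.append([0] + [layers[name][r][c] for name in names])
    return rows


def presence_background_matrix(presence_values, background_rows):
    """Stack presence rows (response 1) on top of background rows (response 0)."""
    return [[1] + list(values) for values in presence_values] + background_rows
```

In the case study the background sample comprised 10 000 locations, which would correspond to `n_points=10000` here.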

Data transformation

In some instances, the distribution of environmental variables at survey locations may be long-tailed. This may bring about problems with model fitting due to points in the tails having high leverage, and tends to reduce the explanatory power of independent variables. Data may be transformed to avoid this problem. A common transformation is the log-transformation, though a range of other transformations exist. Inspection of our modelling data revealed no substantial problems with long-tailed predictor variables.

Variable reduction

Approximately 50 variables were available that described the environmental characteristics which may govern site occupancy for the seven target species. It is common to offer all possible variables as candidate model predictors (e.g. National Parks & Wildlife Service 1998) and utilize an automated variable selection routine to eliminate inappropriate predictors and to specify the final model. However, offering many candidate variables tends to result in models that include nonsense predictors and exclude variables that in fact influence the probability of occupancy (Derksen & Keselman 1992; Chatfield 1995; Steyerberg et al. 2001a). This is especially problematic when the number of species observations is low compared with the number of candidate predictors.

An alternative approach is to minimize the number of variables that are offered to the variable selection routine. Harrell et al. (1996) recommend a rule of thumb that less than m/10 predictor degrees of freedom (PDF) should be offered as candidates to a variable selection routine such as backward selection, where m equals the number of observations of the least prevalent class in the modelling data set (often the number of presence observations). PDF indicates the number of possible parameters estimated in the largest model that could be constructed from the set of candidate predictors. The number of parameters depends on the number of predictors and the form in which the predictor variables are used in the model (see next section). For categorical variables or non-linear responses, there is more than one PDF for each environmental variable. For example, categorical variables have a parameter for all but one category, sometimes leading to many PDF per categorical variable. Similarly, quadratic or cubic functional forms require two and three PDF, respectively. In our situation, m ranges from 36 for the tiger quoll to 112 for the squirrel glider (Table 1). Therefore, the maximum number of candidate PDF offered to the variable selection routine should be no more than 4 or 11, respectively. In this case, the allowable PDF is much less than the total number available. The number of candidate PDF was firstly reduced by removing distal variables that were highly correlated with proximal variables (R > 0.6) in the context of biological or metabolic requirements of the species.
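The m/10 rule and the counting of PDF can be worked through in a short sketch. The function names and the example candidate set are hypothetical; only the rule itself and the values of m (36 and 112) come from the text. Rounding m/10 to the nearest whole number reproduces the limits of 4 and 11 quoted above.

```python
def max_candidate_pdf(m):
    """Rule of thumb after Harrell et al. (1996): about m/10 candidate PDF,
    where m is the number of observations in the least prevalent class."""
    return m / 10.0


def predictor_pdf(form, n_categories=None):
    """Predictor degrees of freedom used by one candidate variable."""
    if form == "linear":
        return 1
    if form == "quadratic":
        return 2
    if form == "cubic":
        return 3
    if form == "categorical":
        if n_categories is None or n_categories < 2:
            raise ValueError("categorical predictors need at least two categories")
        return n_categories - 1  # one parameter for all but one category
    raise ValueError("unknown functional form: " + str(form))


def total_pdf(candidates):
    """Sum PDF over (name, form, n_categories) tuples."""
    return sum(predictor_pdf(form, n) for _name, form, n in candidates)


# Hypothetical candidate set: a quadratic climate term plus a
# seven-class vegetation variable uses 2 + 6 = 8 PDF.
example = [("temp", "quadratic", None), ("veg", "categorical", 7)]
```

Under these assumptions the hypothetical candidate set would already exceed the allowance for the tiger quoll (m = 36) but not that for the squirrel glider (m = 112).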

Irrespective of variable reduction issues, fitting statistical models with collinear predictor variables may cause statistical problems and should be avoided (Belsley et al. 1980). Expert opinion and previous habitat models were consulted where further variable reduction was required. The number of candidate model variables (and PDF) offered to the variable selection routine varied for each species, depending on the amount of biological survey data available. The final set of candidate predictors presented in Table 2 includes variables that were offered as candidates for at least one species.

Variable form

One of the strengths of GLMs and GAMs is the possibility of fitting non-linear relationships between the dependent variable (probability of occurrence) and the independent environmental variables. However, it may be difficult to know a priori whether a particular relationship is likely to be linear, quadratic, cubic or other. One strategy that is commonly employed is to allow an automated variable selection routine full freedom to fit any relationships using a trade-off between complexity and variance explained. However, it has been shown that such an approach may result in the fitting of spurious relationships with no logical interpretation (Steyerberg et al. 2001a). Consequently, we recommend either choosing functional forms with no automated variable selection, or limiting the range of functional forms available to the variable selection algorithm to those that are supported by sound ecological intuition and preliminary data analysis. In this exercise, we conduct preliminary analyses on response shapes by fitting univariate GAMs with five degrees of freedom and visually inspecting plots of response shapes (Austin & Meyers 1996). This approach allows the user to assess whether more complicated responses appear sensible and are justified by the data. Visual inspection of fitted response shapes is used again later in the process for evaluating the ecological realism of final selected models (see below).

Variable selection and model fitting

Final GLMs and GAMs for all species were selected and fitted to both presence–absence and presence-background data using a backward stepwise variable selection algorithm (Venables & Ripley 2003) in R, resulting in four models for each species. The variable selection algorithm tests a series of nested models using Akaike’s information criterion (AIC; Akaike 1973; Venables & Ripley 2003) to select between models. The choice of algorithm may influence the structure of the final model, and alternative automated model selection algorithms, including forward selection, are available. The backward selection algorithm is generally preferred to the other standard automated methods because it generally performs better in the presence of collinear candidate variables and because it requires consideration of the full model fit (Harrell 2001). There is a substantial body of literature dedicated to model selection issues, and automated algorithms have been criticized (Chatfield 1995; Hoeting et al. 1999; Harrell 2001). However, the alternatives tend to be complicated. We have chosen to use a packaged model selection algorithm as a compromise between practicality and sophistication. We reiterate that expert reduction of predictor variables prior to final model selection is critical to the successful application of automated variable selection routines (Harrell 2001).
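The logic of backward stepwise selection by AIC can be sketched generically. This is not the packaged R routine used in the study: `aic_of` is a hypothetical callback that would, in practice, refit the GLM or GAM on a given set of terms and return its AIC, and the toy scoring function below uses invented log-likelihood gains purely to make the example runnable.

```python
def backward_select(candidates, aic_of):
    """Backward elimination: start from the full model and repeatedly drop
    the single term whose removal lowers AIC the most; stop when no removal
    lowers AIC. Returns (retained terms, AIC of the retained model)."""
    current = list(candidates)
    best = aic_of(tuple(current))
    while current:
        trials = [
            (aic_of(tuple(t for t in current if t != drop)), drop)
            for drop in current
        ]
        trial_aic, drop = min(trials)
        if trial_aic >= best:
            break  # no single deletion improves AIC
        current.remove(drop)
        best = trial_aic
    return current, best


# Toy stand-in for model fitting (invented numbers): each term adds a fixed
# amount to the log-likelihood, and AIC = 2k - 2 * log-likelihood.
GAINS = {"rain": 12.0, "temp": 5.0, "solar": 0.4, "elev": 0.1}


def toy_aic(terms):
    log_lik = -100.0 + sum(GAINS[t] for t in terms)
    return 2 * len(terms) - 2 * log_lik


selected, aic = backward_select(["rain", "temp", "solar", "elev"], toy_aic)
```

In the toy example a term is retained only if it raises the log-likelihood by more than one unit, which is the trade-off between fit and complexity that AIC encodes.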

Model evaluation statistics

Presence–absence models were evaluated using two statistics: (i) the area under the receiver operating characteristic (ROC) curve (Hanley & McNeil 1982) (ROC area), closely related to the Mann–Whitney U-statistic; and (ii) Miller’s calibration slope (Miller et al. 1991; Pearce & Ferrier 2000). The ROC area evaluates a model’s ability to discriminate between presence and absence sites and is therefore referred to as a measure of model (or predictive) ‘discrimination’ (Pearce & Ferrier 2000). This provides an indication of the usefulness of the models for prioritizing areas in terms of their relative importance as habitat for the particular species. The ROC area ranges from 0 to 1, where a score of 1 implies perfect discrimination and a score of 0.5 implies predictive discrimination that is no better than a random guess (Bamber 1975). The actual value of the ROC area has a straightforward interpretation: it is the probability that, for a randomly selected pair of presence–absence observations derived from field surveys, the model prediction for presence will be greater than the prediction for absence.
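The pairwise interpretation of the ROC area translates directly into code. The sketch below compares every presence site with every absence site and counts ties as one half, which is the usual convention for the Mann–Whitney form of the statistic; the function name is an assumption.

```python
def roc_area(observed, predicted):
    """ROC area as the proportion of presence-absence pairs in which the
    presence site receives the higher prediction (ties count one half)."""
    presences = [p for o, p in zip(observed, predicted) if o == 1]
    absences = [p for o, p in zip(observed, predicted) if o == 0]
    if not presences or not absences:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for p in presences:
        for a in absences:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presences) * len(absences))
```

The double loop is quadratic in the number of sites, which is acceptable for survey data sets of a few hundred records; a rank-based calculation would be preferable for much larger data.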

Miller’s calibration statistics (MCS) evaluate the ability of a model to correctly predict the proportion of sites with a given environmental profile that will be occupied. MCS are derived from a logistic regression of observations on the logit of predicted probabilities. The rationale is that the slope of the regression would be equal to one and the regression intercept would equal zero if the predictions from the model were perfectly calibrated (Harrell 2001; Pearce & Ferrier 2000). The model is known as the ‘logistic calibration equation’. The two calibration statistics are literally the logistic regression slope (‘calibration slope’) and intercept (‘calibration intercept’), though it is common just to report the calibration slope.
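The logistic calibration equation can be fitted with a small Newton–Raphson routine. This is a minimal sketch, not the R code referred to in the paper; the function names and iteration settings are assumptions, and predictions of exactly 0 or 1 must be avoided because their logit is undefined.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def calibration(observed, predicted, iterations=25):
    """Fit observed ~ intercept + slope * logit(predicted) by logistic
    regression (Newton-Raphson) and return (intercept, slope)."""
    x = [logit(p) for p in predicted]
    a, b = 0.0, 1.0  # start from perfect calibration
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, observed):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu          # score for the intercept
            g1 += (yi - mu) * xi   # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            raise ValueError("predictions must take at least two distinct values")
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b
```

A slope below one indicates predictions that are too extreme (overfitting), while a slope above one indicates predictions that are too conservative.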

The presence-background models were evaluated qualitatively. The statistics available for evaluating presence-only models are limited and a fair comparison with the presence–absence models was not possible here.

Table 2. Abbreviated names and definitions of mapped environmental data used as candidate predictor variables for inclusion in habitat models. All environmental data were available in raster format with 100 m (side length) grid cell size.

Candidate variable     Definition
Temp                   Mean annual temperature, derived from ANUCLIM
Cold                   Mean temperature of the coldest period, derived from ANUCLIM
Rain                   Mean annual rainfall, derived from ANUCLIM
Dry 2000               The percentage of cells in a 2000-m radius containing dry forest
Rf2000                 The percentage of cells in a 2000-m radius containing rainforest
Solar                  The solar radiation index of a cell, derived from ANUCLIM
Elev                   The elevation of a cell (in metres) above sea level
Rugg250 (500, 1000)    Topographic ruggedness (standard deviation in elevation) in a 250-m, 500-m and 1000-m radius
Terr250 (500, 1000)    Relative terrain position in a 250-m, 500-m and 1000-m radius
Topo                   The topographic position of a cell, ranging from gully to ridge top (0–100)
Unmod500               The percentage of cells in a 500-m radius containing unmodified forest
Unmod2000              The percentage of cells in a 2000-m radius containing unmodified forest
Wet500                 The percentage of cells in a 500-m radius containing wet forest
Wet2000                The percentage of cells in a 2000-m radius containing wet forest
Fert                   Index of the soil nutrient content at a site, based on geochemical data (CSIRO)
Percnonfor2k           The percentage of cells in a 2000-m radius classified as cleared land
Ybglexp2000†           The percentage of cells in a 2000-m radius containing suitable ybgl habitat
Sqglexp2000†           The percentage of cells in a 2000-m radius containing suitable sqgl habitat
Sowlexp2000†           The percentage of cells in a 2000-m radius containing suitable sowl habitat
Mowlexp2000†           The percentage of cells in a 2000-m radius containing suitable mowl habitat
Powlexp2000†           The percentage of cells in a 2000-m radius containing suitable powl habitat
Koalexp2000†           The percentage of cells in a 2000-m radius containing suitable koal habitat

†denotes expert variable derived from vegetation classes (Michael Murray, pers. comm., 2004). koal, koala; mowl, masked owl; powl, powerful owl; quol, tiger quoll; sowl, sooty owl; sqgl, squirrel glider; ybgl, yellow-bellied glider.

The model evaluation method (bootstrapping)

Presence–absence models for each species were evaluated using a ‘bootstrapping’ approach (Efron & Tibshirani 1997; Harrell 2001). Naively testing a model on the data that were used to fit it (also known as ‘in-sample validation’) is inappropriate and known to provide optimistic estimates of model performance (Fielding & Bell 1997; Steyerberg et al. 2001b), despite being a commonly used approach to model evaluation (National Parks & Wildlife Service 1998, 2000). Ideally, models should be tested on a completely independent data set, with data collection and stratification specifically designed for model evaluation. However, this is seldom a practical option and may also be prone to high variance unless samples are large (J. Elith, unpublished data, 2005). Bootstrapping provides a realistic estimate of the predictive performance of a model without incurring the expenses of collecting a completely new model-testing data set. Bootstrapping involves resampling the modelling data and conducting a series of model building and testing simulations that provide an estimate of the optimism arising from in-sample validation. The estimate of optimism is used to provide an adjusted estimate of the model evaluation statistics (ROC and MCS). The bootstrapping version implemented here is believed to provide the least biased estimate of predictive performance of any of the model evaluation methods that are based on re-sampling, including cross-validation (Hastie et al. 2001). Cross-validation (Efron & Tibshirani 1997) provides an alternative approach to model evaluation and might be more feasible with methods or data sets that create large computational loads. However, its estimates of error rates with independent data can be less precise than those derived from bootstrapping (Steyerberg et al. 2001b), which can be thought of as a smoothed version of cross-validation (Efron & Tibshirani 1997). The bootstrapping method (the 0.632+ bootstrap) is detailed in Appendix I. R code for obtaining bootstrapped estimates of ROC and MCS is available from the URL given previously.
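The idea of estimating and subtracting optimism can be illustrated with a deliberately simplified sketch. It implements the ordinary optimism-corrected bootstrap, in which each bootstrap model is tested on the full original data; it is not the 0.632+ variant used in this study, which additionally weights predictions to the excluded sites. The `fit` and `statistic` callbacks, the threshold ‘model’ and the accuracy statistic are hypothetical stand-ins for the GLM or GAM fitting and the ROC or calibration statistics.

```python
import random


def optimism_corrected(data, fit, statistic, n_boot=200, seed=0):
    """Ordinary optimism bootstrap (a simplification, not the 0.632+ method).

    fit(rows) returns a model; statistic(model, rows) returns a number.
    Returns (apparent statistic, optimism-corrected statistic).
    """
    rng = random.Random(seed)
    n = len(data)
    apparent = statistic(fit(data), data)
    optimism = []
    for _ in range(n_boot):
        sample = [data[rng.randrange(n)] for _ in range(n)]  # with replacement
        model = fit(sample)
        # Optimism: performance on the data used for fitting minus
        # performance of the same model on the original data.
        optimism.append(statistic(model, sample) - statistic(model, data))
    return apparent, apparent - sum(optimism) / n_boot


def fit_threshold(rows):
    """Toy model: a cut-point midway between the two classes."""
    zeros = [x for x, y in rows if y == 0]
    ones = [x for x, y in rows if y == 1]
    if not zeros or not ones:
        return 0.0
    return (max(zeros) + min(ones)) / 2.0


def accuracy(threshold, rows):
    """Toy statistic: proportion of rows classified correctly."""
    return sum(1 for x, y in rows if (x > threshold) == (y == 1)) / len(rows)
```

When the two classes are widely separated the apparent and corrected values coincide, because no bootstrap model can be optimistic; with overlapping classes the corrected value is usually lower than the apparent one.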

Inspecting models for ecological realism

Final models were inspected for their ecological realism using partial plots of univariate fitted functions (Venables & Ripley 2003), similar to those used to determine the complexity of candidate response shapes. Partial plots are available as an option in statistical software and provide a ready means to evaluate the ecological realism of fitted responses. If an implausible fitted relationship between the probability of occupancy by the species and a particular environmental variable was observed in the partial plots, the variable was excluded from the model or the model was re-fitted with fewer degrees of freedom for that particular environmental variable.

Case study results and interpretation

The adequacy of species survey data for model fitting

Environmental strata were reasonably well sampled, even for the species with the least and most restricted survey data. For six of the seven species, sampled strata make up at least 70% of the region that is under forest cover, with the only exception being the quoll, for which only 56% of the region comprises sampled strata. Similarly, the geographical spread of sampled strata was reasonable for most species, with maps of sampled strata showing good coverage of the region for all species (Fig. 2). The maps of sampled and unsampled strata provide an indication of the areas in which habitat model predictions may be less reliable.

Models and predictive performance

The best models in terms of both predictive discrimination and calibration were those for the sooty owl, squirrel glider, yellow-bellied glider, koala and tiger quoll, all of which had predictive discrimination greater than 0.75 for their best model (Table 3). The worst models were the powerful owl and masked owl models, with bootstrapped ROC areas ranging from 0.61 to 0.69. In general, GAMs had similar variable structure and degrees of freedom to GLMs and comparable predictive discrimination. However, bootstrapped predictive calibration was always better for GAMs than for GLMs (Table 3). This is probably because the smooth functions in GAMs are better at fitting complex responses over the complete environmental range of the response, leading to more accurate predicted probabilities.

Presence-only models had similar model structures to the presence–absence models. The result suggests that the survey absence data used in the modelling exercise was not much more useful than a background random sample in defining unsuitable habitat for most species. This may result from the difficulties associated with obtaining reliable absence data for cryptic species. In addition, the quality of the presence-only data used in this modelling exercise is likely to be high compared with presence-only data found in most museums and herbaria because they were derived from systematic stratified surveys. Other studies have shown a decline in performance when fitting models with background samples compared with using presence–absence data (Ferrier & Watson 1997).

Preferred models were primarily chosen on the basis of predictive discrimination and calibration. Plausibility of response shapes was used to discern between models that showed similar predictive performance. Only model structures for the preferred model for each species are presented in Table 3. There was some variation between the four modelling methods in terms of the variables retained and their response shapes for each species. For example, the yellow-bellied glider GLM contained the term ‘dry 2000’, while the GAM for the same species did not include this term. Similarly, the sooty owl GLM contained rainfall as a linear term, while the GAM for that species included rainfall as a smoothed spline with two degrees of freedom (a non-linear term). However, the structural differences between GLM and GAM models were minor, and the similarity in ROC areas of the two model types across all species (Table 3) implies that the differences had a minor impact on predictive discrimination.

Fig. 2. Coverage of sampled strata over the greater LHCC region for the best ((a) Koala) and worst ((b) Quoll) sampled species. Unsampled strata are represented by the light grey shading. Remaining areas within the region (not shaded) contain sampled strata or are non-forest and therefore not included in the analysis.

Table 3. Final presence–absence models and bootstrapped estimates of predictive discrimination (ROC) and calibration.

Species  Preferred model                                             Preferred   ROC area      Calibration
                                                                     model type  GAM    GLM    GAM    GLM
Koala    sp ~ s(temp, 2) + s(dry2000, 3) + s(rugg500, 2) +           GAM         0.76   0.76   0.84   0.75
           unmod2000 + percnonfor2k
Quoll    sp ~ temp + rain + percnonfor2k + poly(dry2000, 2)          GLM         0.71   0.75   0.73   0.59
Sqgl     sp ~ s(rugg500, 3) + sqglexp500 + s(unmod500, 2)            GAM         0.78   0.77   0.76   0.68
Ybgl     sp ~ s(temp, 2) + s(rain, 2) + s(rf2000, 2) + unmod2000 +   GAM         0.78   0.77   0.73   0.63
           s(ybglexp2000, 3) + percnonfor2k
Mowl     sp ~ s(cold, 2) + mowlexp2000 + rf2000 +                    GAM         0.63   0.61   0.63   0.55
           s(unmod2000, 2) + wet2000 + percnonfor2k
Powl     sp ~ s(cold, 3) + s(rain, 2) + s(rugg500, 2)                GAM         0.69   0.68   0.80   0.62
Sowl     sp ~ s(rain, 2) + rugg500 + s(sowlexp2000, 3) +             GAM         0.85   0.85   0.74   0.67
           s(ter1000, 2) + unmod2000

Only the preferred model selected from the presence-absence models (either a generalized linear model or a generalized additive model) is presented for each species. Column ‘preferred model type’ indicates which of the methods produced the final preferred model for each species. ROC and calibration slope statistics are presented for the model derived under each modelling method. Variable abbreviations and details are given in Table 2. The expressions ‘poly(variable, n)’ and ‘s(variable, n)’ in the preferred model column indicate variables included as polynomial (‘poly’) or smoothed (‘s’) terms with ‘n’ degrees of freedom. ROC, receiver operating characteristic.

Plots of fitted functions

Plotting fitted relationships provided a useful means to visualize and manipulate the behaviour of fitted models to ensure that they were ecologically meaningful. In general, fitted relationships were found to be ecologically plausible. Plots of sooty owl response shapes (Fig. 3) show how the probability of sooty owl site occupancy varies with each of the predictor variables contained in the final GAM model for that species. Fitted functions for individual variables in a multivariate model may be different to univariate fitted functions because of interactions between predictor variables. Consequently, it is important to observe response shapes for the final model to ensure that they remain ecologically reasonable in the multivariable context.

Fig. 3. Partial plots of the relationship between the probability of occupancy and environmental variables included in the final sooty owl model. The X-axis represents the range of values for each environmental variable. Probabilities on the Y-axis are plotted in transformed ‘logit’ space so that they can be interpreted in the same way as linear regressions. Response shapes in each plot represent the relationship between each variable and the probability of sooty owl occupancy in the multivariate model context, independent of the other variables included in the model. Dashed lines represent 95% confidence intervals around the fitted response shape. Rugg500, topographic ruggedness (standard deviation in elevation) in a 500-m radius; Sowlexp2000, the proportion of suitable vegetation (as defined by experts) in a 2000-m radius; terr1000, relative terrain position in a 1000-m radius; unmod2000, the percentage of cells in a 2000-m radius containing unmodified forest.

[Fig. 3 panels, x-axes: rain, rugg500, sowlexp2000, terr1000, unmod2000; y-axis in each panel: Logit (probability of occupancy).]

Habitat maps

A habitat map is provided for the sooty owl as an example of the type of output that can be used in conservation planning (Fig. 4). The map shows the predicted probability of sooty owl presence throughout the region. Probabilities have been classified into four categories (0–0.2, 0.21–0.5, 0.51–0.7, 0.71–1.0) for presentation. These maps also display the presence records for the species that were used in model fitting, allowing a visual assessment of predictive performance. Mapped predictions and uncertainties (based on upper and lower 95% prediction intervals) for the remaining six species are available from the authors.
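The classification of probabilities for display, and the export of a prediction vector to an ASCII grid as described in the caption of Fig. 4, can be sketched in Python as follows. This is an illustration under stated assumptions, not the R and ArcInfo workflow used in the case study: the function names, the example predictions and the grid origin are hypothetical, the class breaks follow the four categories given in the text, and the header follows the ESRI ASCII grid layout with rows written from the top of the landscape down.

```python
import io

def classify(p, breaks=(0.2, 0.5, 0.7)):
    """Return the display class (0-3) for probability p; upper class limits are inclusive."""
    for k, upper in enumerate(breaks):
        if p <= upper:
            return k
    return len(breaks)

def write_ascii_grid(stream, values, ncols, xll, yll, cellsize, nodata=-9999):
    """Write a row-major vector of cell values as an ESRI ASCII grid."""
    if len(values) % ncols != 0:
        raise ValueError("length of values must be a multiple of ncols")
    nrows = len(values) // ncols
    stream.write(f"ncols {ncols}\nnrows {nrows}\n")
    stream.write(f"xllcorner {xll}\nyllcorner {yll}\n")
    stream.write(f"cellsize {cellsize}\nNODATA_value {nodata}\n")
    for r in range(nrows):
        row = values[r * ncols:(r + 1) * ncols]
        stream.write(" ".join(str(v) for v in row) + "\n")

# Hypothetical predictions for a 2 x 3 landscape, listed row by row from the top-left cell.
predictions = [0.05, 0.21, 0.50, 0.51, 0.70, 0.93]
classes = [classify(p) for p in predictions]

# Write to an in-memory buffer; a file opened for writing could be used instead.
buffer = io.StringIO()
write_ascii_grid(buffer, classes, ncols=3, xll=0.0, yll=0.0, cellsize=100.0)
print(buffer.getvalue())
```

A cell size of 100 m corresponds to the 1 ha grid cells used for prediction. Passing the breaks as a parameter makes it straightforward to apply a different set of display classes.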

DISCUSSION

The importance of ecological realism in modelling and the role of experts

The choice of modelling methods and approach to model building and evaluation presented in this paper reflects the importance we place on maintaining ecological realism throughout the model building process. While maintaining statistical rigour is central to good model building, there is an expansive literature on the statistical nuances of model fitting, interpretation and prediction (Guisan & Zimmerman 2000). Conversely, there is much less literature available concerning the importance of maintaining ecological realism in model building (Austin 1991, 2002). The process of plotting and evaluating fitted functions throughout model building and interpretation is fundamental to ensuring the ecological realism and general credibility of the final models.

Expert opinion forms an integral part of sound model building and evaluation, irrespective of the modelling platform and theoretical approach used (Steyerberg et al. 2001b). Some modellers view expert opinion as a source of undesirable subjectivity and prefer to let the data dictate the model. However, this ignores the limitations commonly encountered with ecological data (small to medium-sized species data sets that may include biases, and many candidate predictor variables) and the related challenges in developing robust and sensible models. Expert opinion can (and should) be used to assist in the identification of candidate model variables, interactions between variables and likely response shapes (Pearce et al. 2001b; Ferrier et al. 2002a), for corroboration of a model’s ecological realism (Austin 2002), for ad hoc evaluation of model predictions (National Parks & Wildlife Service 1998) and for preparation of predictive maps for use in decision making (Ferrier et al. 2002a). Experts may also be used effectively in the creation of indices that may themselves be used as candidate variables for statistical modelling (National Parks & Wildlife Service 1998). Consequently, the role of experts should be thought of as complementary to other, more data-driven methods rather than as a competing alternative. In our case study we emphasize the use of experts in variable reduction prior to automated variable selection. We use Harrell’s (2001) rule of thumb to identify the maximum PDF that should be offered to the variable selection routine. We contend that if expert knowledge is so lacking that reducing the candidate set to m/10 is impossible, resulting models should be treated with scepticism.
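The m/10 check can be illustrated with a few lines of Python. This is a minimal sketch under two assumptions that are not spelled out in this section: that m is the number of observations in the less frequent response class (presences or absences), and that PDF is the total degrees of freedom summed over the candidate predictor terms. The survey counts and the candidate terms are hypothetical.

```python
def max_predictor_df(n_presence, n_absence, ratio=10):
    """Upper limit on candidate predictor degrees of freedom under an m/ratio rule.

    Assumption: m is the number of observations in the less frequent
    response class (presences or absences).
    """
    m = min(n_presence, n_absence)
    return m // ratio

def candidate_df(terms):
    """Total degrees of freedom requested by a candidate set of (name, df) terms."""
    return sum(df for _, df in terms)

# Hypothetical survey: 62 presences and 410 absences.
limit = max_predictor_df(62, 410)

# Hypothetical candidate set after expert-based variable reduction.
candidates = [("rain", 2), ("rugg500", 2), ("terr1000", 1), ("unmod2000", 2)]
requested = candidate_df(candidates)
print(limit, requested, requested <= limit)
```

In this hypothetical case the candidate set requests seven degrees of freedom against a limit of six, so one further term would need to be dropped or simplified before automated selection.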

The lack of an accepted model evaluation statistic for presence-only models detracts from their utility in conservation planning and increases the importance of expert evaluation. Development of robust evaluations of presence-only models is ongoing.

Interpretation of habitat maps

Fig. 4. The predicted probability of sooty owl occupancy across the LHCC region. Probabilities are represented as a grey-scale (white = 0–0.2, light grey = 0.21–0.5, medium grey = 0.51–0.8, dark grey = 0.81–1.0). The open triangles show point survey locations where the sooty owl is recorded as present. Habitat maps were constructed by creating a vector of habitat model predictions (one for every 1 ha grid cell in the landscape) in R using the ‘predict.gam’ command. The prediction vector is then written to an ascii text file in the correct dimensions for the landscape and imported to ArcInfo (or ArcView) for display.

Threshold predictions

In some instances, users of habitat maps may wish to identify a level of predicted probability of occupancy below which an area would be considered ‘unsuitable’ habitat for a given species. In general, such an interpretation of habitat maps is ill-advised unless the model that underpins the map is particularly well calibrated. However, in such instances where it is required, thresholds could be assigned in a risk-weighted manner. For example, if it is very important not to classify an area as ‘non-habitat’ when it may be used by a species, a low probability threshold (e.g. probability of occurrence = 0.1) should be specified when delineating non-habitat. By choosing a threshold of 0.1 for a well-calibrated model, it is implied that a 10% chance of a site (a grid cell of 100 m) being occupied by the species is a tolerable risk of failing to include areas that are, in fact, habitat for the species. This decision must be made on a case-by-case basis by evaluating the management costs of the two types of error (a false-negative and false-positive prediction), rather than by relying on an arbitrary threshold set by the modeller.
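One way to make the risk weighting explicit is to compare the expected costs of the two types of error. The Python sketch below is one possible formalization, offered as an assumption rather than the procedure used in the case study: for a well-calibrated model, labelling a cell ‘non-habitat’ carries an expected cost of p times the false-negative cost, and labelling it ‘habitat’ carries an expected cost of (1 − p) times the false-positive cost. The cost values are hypothetical.

```python
def cost_threshold(cost_false_negative, cost_false_positive):
    """Probability below which labelling a cell 'non-habitat' has the lower expected cost.

    Assumes a well-calibrated model, so that p is the chance the cell is occupied.
    The expected costs p * cost_false_negative and (1 - p) * cost_false_positive
    are equal at the returned threshold.
    """
    return cost_false_positive / (cost_false_positive + cost_false_negative)

def delineate(probabilities, threshold):
    """Label each predicted probability as 'habitat' or 'non-habitat'."""
    return ["non-habitat" if p < threshold else "habitat" for p in probabilities]

# Hypothetical costs: wrongly excluding occupied habitat is nine times as costly
# as wrongly including unoccupied land.
threshold = cost_threshold(cost_false_negative=9.0, cost_false_positive=1.0)
labels = delineate([0.04, 0.12, 0.55], threshold)
print(threshold, labels)
```

Under these hypothetical costs the threshold is 0.1, matching the example in the text; a different cost ratio would give a different threshold, which is why the choice belongs to the planning context and not to the modeller.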

Representing uncertainty

Many forms of uncertainty impact on maps of predicted probability of species occurrence (Elith et al. 2002). The prediction intervals available from regression models describe the part of this uncertainty associated with estimating model parameters, and this stems from inadequate data, errors in measurement and natural variation (Elith et al. 2002). There are additional uncertainties, such as model selection uncertainty and errors in explanatory variables, that are, however, not addressed by standard prediction intervals. While some methods have been developed for explicitly incorporating these forms of uncertainty in prediction intervals (Burnham & Anderson 2002; Wintle et al. 2003), there is currently very little guidance on how prediction intervals should be represented and interpreted in applications of habitat modelling and conservation planning.
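For a logistic regression, an interval of the standard kind can be built on the logit scale from the covariance matrix of the parameter estimates and then back-transformed to a probability. The Python sketch below shows the arithmetic under stated assumptions: the coefficients, covariance matrix and predictor values are hypothetical, and the interval reflects parameter uncertainty only, not model selection uncertainty or errors in the explanatory variables.

```python
import math

def inv_logit(eta):
    """Back-transform a logit value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def prediction_interval(x, beta, cov, z=1.96):
    """Approximate 95% interval for a predicted probability.

    The standard error of the linear predictor is sqrt(x' V x), where V is the
    covariance matrix of the coefficient estimates. Limits are computed on the
    logit scale and back-transformed.
    """
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    n = len(x)
    var = sum(x[i] * cov[i][j] * x[j] for i in range(n) for j in range(n))
    se = math.sqrt(var)
    return inv_logit(eta - z * se), inv_logit(eta), inv_logit(eta + z * se)

# Hypothetical two-parameter model: intercept and one standardized predictor.
beta = [-1.0, 0.8]
cov = [[0.04, 0.01], [0.01, 0.09]]
lower, fitted, upper = prediction_interval([1.0, 1.5], beta, cov)
print(round(lower, 3), round(fitted, 3), round(upper, 3))
```

Mapping the lower and upper limits cell by cell gives the kind of uncertainty maps referred to in the case study.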

Interpreting model evaluation statistics

The two statistics used for model evaluation in this study address different model applications. Specifically, model discrimination, measured by the ROC area, describes the degree to which the model successfully ranks presence sites higher than absence sites across the region in terms of predicted probability of presence. Models with satisfactory ROC areas will provide reliable ranking of areas in terms of habitat value. This mode of evaluation is relevant when the goal of a model is to rank or prioritize areas of interest in terms of their relative value as habitat for a species. This goal is different to that of accurately predicting the proportion of sites that are expected to be occupied at a given predicted probability of occurrence. For instance, at sites with a predicted probability of occurrence of 0.5, it is reasonable to expect that approximately 50% of such sites would contain the species. The degree to which this is true is described as model calibration (Miller et al. 1991) and is assessed using the logistic calibration equation. It is possible that a model with good discrimination can have poor calibration. It is also possible to improve a model’s predictive ability by adjusting the model parameters with calibration statistics (Harrell 2001) or other shrinkage methods (Hastie et al. 2001). A model with poor calibration may still be useful for prioritizing or ranking sites in terms of habitat value, but should not be used to predict the raw probability of finding a species at a given site. The choice of whether to focus on ROC areas or calibration statistics depends on the intended application of the models.
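The two statistics can be computed from a set of predictions and observations as in the following Python sketch. It is a minimal illustration with hypothetical data: the ROC area is calculated as the proportion of presence-absence pairs in which the presence site has the higher prediction, and the calibration intercept and slope are obtained by regressing the observations on the logit of the predictions with a small Newton-Raphson routine. The case study itself used bootstrapped evaluation in R, which is not reproduced here.

```python
import math

def roc_area(pred, obs):
    """ROC area as the share of presence/absence pairs ranked correctly (ties count 0.5)."""
    pres = [p for p, y in zip(pred, obs) if y == 1]
    absn = [p for p, y in zip(pred, obs) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pres for b in absn)
    return wins / (len(pres) * len(absn))

def calibration(pred, obs, iterations=25):
    """Fit logit(P(obs = 1)) = a + b * logit(pred) by Newton-Raphson; return (a, b)."""
    x = [math.log(p / (1.0 - p)) for p in pred]
    a, b = 0.0, 1.0
    for _ in range(iterations):
        mu = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        w = [m * (1.0 - m) for m in mu]
        # Score vector (g) and information matrix (h) for the two parameters.
        g0 = sum(y - m for y, m in zip(obs, mu))
        g1 = sum((y - m) * xi for y, m, xi in zip(obs, mu, x))
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Hypothetical evaluation data: predicted probabilities and observed presence (1) or absence (0).
pred = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
obs = [0, 0, 1, 0, 0, 1, 1, 0, 1]
auc = roc_area(pred, obs)
intercept, slope = calibration(pred, obs)
print(round(auc, 2), round(intercept, 2), round(slope, 2))
```

A slope near 1 with an intercept near 0 indicates good calibration; a slope below 1 indicates predictions that are too extreme, which is the situation that shrinkage is intended to correct.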

The issue of occupancy versus persistence

Models of the sort discussed in this paper make the implicit assumption that occupancy of a location implies suitability of the habitat. Van Horne (1983) identified problems associated with this assumption, and Pulliam’s (1988) ‘source-sink’ model of population dynamics formalizes its conceptual deficiencies. Pulliam’s model differentiates ‘sink’ areas, where mortality exceeds population growth, from ‘source’ areas that maintain overall population size through emigration. Similarly, if the total population is in a state of decline, it is possible that ‘remnant’ individuals may exist in unsuitable habitat that will not be occupied in the next generation. When a population is expanding its range, observations of absence of the species may falsely imply that the suitability of the habitat is low. These anomalies arise due to what is known as the ‘equilibrium’ assumption, which underpins the current static approach to wildlife habitat modelling (Austin 2002). Inclusion of disturbance history variables, dispersal barriers, competition and successional dynamics may assist in modelling such situations, though examples of such approaches are rare.

Survey design and sampling issues

This paper has focused on the choice and execution of particular modelling methods given a set of data. Issues of data quality and sampling have not been addressed. Because modelling results are only as good as the available data, the issues of data quality and sampling demand some discussion here.

Sample size

From a statistical perspective, the choice of sample size is a function of the desired precision of results. In studies where the goal is to make inference about a particular effect (e.g. the influence of time since logging on the probability of greater glider occupancy), power analysis may be used to determine the sample size required to achieve a statistically significant result for a given effect size, type I error rate and sampling variance (Burgman & Lindenmayer 1998). The relationship between statistical significance, power and model predictive performance is unclear, and there is no guarantee that a model with statistically significant terms will give good predictive performance. Unfortunately, there is no simple way to determine a priori the predictive performance of a habitat model, or to calculate a sample size that will ensure a predictive performance that is statistically significantly better than that of a null or alternative model. We could find only one attempt to address this problem in the statistical literature (Obuchowski 1994), making this an important area of future research. A simulation study that provided rules of thumb about sample sizes required to achieve a given predictive performance under a range of likely modelling scenarios would be an important contribution to the habitat modelling literature.

Data quality

A more subtle consideration is the geographical spread and environmental stratification of observations compared to the extent and environmental variability of the region for which models are to be built. We briefly touch on this issue in our case study when testing the adequacy of survey data using environmental strata. Little guidance is provided in the literature as to the minimum geographical or environmental coverage of data for statistical modelling, though it is common sense to expect that model predictions are unlikely to be reliable in environmental domains for which there are no survey data (Austin & Meyers 1996). Stratified sampling designs such as ‘GRADSECT’ (Austin & Heyligers 1989) that ensure a geographical and bioclimatic coverage of sampling locations are therefore appealing. GRADSECT theory, or newer developments such as Generalized Dissimilarity Modelling (GDM; Ferrier et al. 2002b) or a p-median criterion (Faith & Walker 1996), may also be applied to existing data sets to identify ‘gaps’ in the geographical and environmental coverage of samples.

Most statistical analyses, including the regression methods used in our case study, assume that observations are independent of each other. Survey locations that are close together are much less likely to be independent, especially when surveys target animals with large home ranges. Survey design should aim to ensure a minimum separation distance that is at least as great as the home range radius of the widest ranging species in the study. Random sampling within geographical and environmental strata assists in minimizing unwanted dependencies in observation data. Guidance on sampling theory should be sought from one of the many texts that deal with the topic in detail (e.g. Sutherland 1996; Thompson 2002).
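The minimum separation criterion can be applied to a list of candidate or existing survey locations with a simple filter. The Python sketch below uses a greedy rule, which is an assumption made for illustration rather than a method from the case study; the site names, coordinates and the 1500 m home range radius are hypothetical.

```python
import math

def thin_sites(sites, min_separation):
    """Greedily keep sites at least min_separation from every site already kept."""
    kept = []
    for name, x, y in sites:
        if all(math.hypot(x - kx, y - ky) >= min_separation for _, kx, ky in kept):
            kept.append((name, x, y))
    return kept

# Hypothetical site coordinates in metres and a hypothetical home range radius of 1500 m.
sites = [("A", 0, 0), ("B", 900, 0), ("C", 1600, 0), ("D", 1600, 1400), ("E", 4000, 4000)]
kept = thin_sites(sites, min_separation=1500)
kept_names = [name for name, _, _ in kept]
print(kept_names)
```

Because the rule is greedy, the retained set depends on the order in which sites are considered; randomizing the order within strata would be one way to avoid favouring particular sites.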

Technical issues and promising advances in modelling methods

Species versus community approaches

Models of the distribution of single species are not the only way to approach conservation planning questions. Single species models are important for a subset of species, such as threatened, focal or flagship species. However, landscape planning aims to conserve the biodiversity of a region, and there are too many species and too little data to achieve this through single-species modelling. Therefore, there is a role for community-level modelling (Ferrier et al. 2002b). One example of community-level modelling is GDM, already mentioned in the context of sampling; this models compositional dissimilarity across the landscape. This could then contribute to a conservation plan that aimed to span the range of species patterns in the landscape. A further role for methods based on data from many species is to use them for modelling rare species, or species in data sets with relatively few records per species. In such situations there may be insufficient data for developing a robust single-species model. GDMs, canonical correspondence analysis (CCA) or multivariate adaptive regression splines (MARS) may be useful for this purpose.

Spatial autocorrelation

Spatial autocorrelation in wildlife observation data arises when environmental processes and patterns that influence the spatial distribution of wildlife are themselves spatially structured and/or because the species is subject to demographic processes, territoriality or dispersal limitations causing spatial dependence (contagion or dispersion effects). Demographic and environmental processes underlying spatial patterns in wildlife distributions are usually poorly understood and therefore difficult to incorporate in model fitting. Consequently, model residuals are often spatially correlated (i.e. not independent), violating one of the basic assumptions of regression modelling. In practice, non-independence usually results in underestimation of standard errors and overestimation of the importance of habitat variables (Legendre & Fortin 1989). Methods such as autologistic regression (Augustin et al. 1996), generalized estimating equations (Albert & McShane 1995) and geographically weighted regression (Fotheringham et al. 2002) may be used to incorporate spatial autocorrelation in habitat analyses, but they are still not incorporated in standard statistical software and are technically demanding to implement.
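A simple diagnostic for spatially correlated residuals is Moran's I, which is not one of the methods named above but is a widely used check that could precede them. The Python sketch below computes Moran's I with binary neighbourhood weights; the residual values, coordinates and neighbourhood distance are hypothetical, and no significance test is included.

```python
import math

def morans_i(values, coords, max_distance):
    """Moran's I with binary weights: w_ij = 1 if sites i and j lie within max_distance."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0   # sum of w_ij * dev_i * dev_j over neighbouring pairs
    s0 = 0.0      # sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            if d <= max_distance:
                cross += dev[i] * dev[j]
                s0 += 1.0
    return (n / s0) * cross / sum(d * d for d in dev)

# Hypothetical residuals at four sites spaced one unit apart along a transect.
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
clustered = morans_i([1.0, 1.0, -1.0, -1.0], coords, max_distance=1.0)
alternating = morans_i([1.0, -1.0, 1.0, -1.0], coords, max_distance=1.0)
print(round(clustered, 3), round(alternating, 3))
```

Positive values indicate that neighbouring residuals tend to be similar (contagion), and negative values that they tend to differ (dispersion).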

Model selection and model uncertainty

The standard approach to representing prediction uncertainty involves the calculation of prediction (or confidence) intervals that incorporate uncertainty about parameter estimates. However, this approach to prediction implicitly assumes that the model chosen is the best available representation of the truth and effectively ignores model selection uncertainty, resulting in overconfident predictions. Automated variable selection algorithms exacerbate this problem by providing a method for dredging through many candidate predictors in search of an explanatory model (Chatfield 1995; Hoeting et al. 1999). One solution is to completely avoid automated model selection and to analyse only a small set of ‘a priori’ models (Burnham & Anderson 2002). This approach may be practical when strong prior knowledge exists and where the number of plausible models is small. However, if prediction is the primary goal and many possible models exist, some level of automated variable selection is often desirable. Model averaging approaches, including Bayesian model averaging, have been promoted in a range of disciplines as a means of incorporating model selection uncertainty into statistical inference and prediction (Hoeting et al. 1999) and have been applied in some ecological examples (Burnham & Anderson 2002; Wintle et al. 2003).
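The idea of model averaging can be illustrated with Akaike weights, the information-theoretic form described by Burnham & Anderson (2002). The Python sketch below is not Bayesian model averaging proper, and the AIC values and site predictions are hypothetical; it shows only how predictions from several candidate models might be combined in proportion to their relative support.

```python
import math

def akaike_weights(aic_values):
    """Akaike weights: exp(-delta_i / 2) normalized to sum to one."""
    best = min(aic_values)
    raw = [math.exp(-(a - best) / 2.0) for a in aic_values]
    total = sum(raw)
    return [r / total for r in raw]

def average_prediction(predictions, weights):
    """Weighted mean of the predictions made by each candidate model."""
    return sum(p * w for p, w in zip(predictions, weights))

# Hypothetical AIC values for three candidate models and their predicted
# probabilities of occupancy at a single site.
aic = [210.0, 212.0, 216.0]
site_predictions = [0.62, 0.48, 0.30]

weights = akaike_weights(aic)
averaged = average_prediction(site_predictions, weights)
print([round(w, 3) for w in weights], round(averaged, 3))
```

The spread of the individual predictions around the weighted mean gives a direct impression of the model selection uncertainty that a single chosen model would conceal.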

Estimating and incorporating detectability in habitat models

Unless the probability of detecting a species when it is present is equal to 1, false negative observation errors will occur in species surveys. The probability of detecting the presence of the case study species in any single standard survey based on spot-lighting and call elicitation has been found to be very low (Pr[detection|presence] ∼ 0.12–0.45; Wintle et al. in press), making the reliability of absence data a potentially serious form of uncertainty in our case study. Recent studies have demonstrated the negative impact that false-negative observation error may have on species-habitat analyses (Tyre et al. 2003), meta-population models (Moilanen 2002) and monitoring studies (MacKenzie et al. 2002). Recently developed techniques for incorporating detectability in model estimation (MacKenzie et al. 2002; Tyre et al. 2003) reduce bias in model estimation brought about by false absences, though little effort has been invested in testing the relative predictive performance of such models.
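The practical consequence of low single-visit detectability can be shown with a short calculation. The Python sketch below assumes that repeat visits to an occupied site are independent and share the same detection probability, which is a simplification; the two probabilities used are the ends of the range quoted above.

```python
import math

def prob_detected(p_single, visits):
    """Probability of at least one detection in `visits` independent surveys of an occupied site."""
    return 1.0 - (1.0 - p_single) ** visits

def visits_needed(p_single, confidence=0.95):
    """Smallest number of visits giving at least `confidence` of detecting a present species."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_single))

for p in (0.12, 0.45):
    n = visits_needed(p)
    print(p, n, round(prob_detected(p, n), 3))
```

Under these assumptions, 24 visits would be needed at a detection probability of 0.12 and 6 visits at 0.45 before a recorded absence could be treated as 95% reliable, which indicates why single-visit absences are a serious source of uncertainty.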

Incorporating biotic interactions

Despite the prevalence of interspecific competition as a key concept in community ecology (Diamond 1975), very little has been attempted by way of incorporating interspecific competition in distribution models (for one exception see Leathwick & Austin 2001). Regression models such as those described by Leathwick and Austin (2001) could be applied to fauna species. Future development of ‘loop analysis’ (Levins 1975) or systems of simultaneous regressions (Guisan & Zimmerman 2000) may facilitate spatial models of wildlife distributions that incorporate competitive interactions.

The GLMs and GAMs presented here are relatively simple, though suited to the conservation problem and available data. There is scope for greater sophistication in the modelling method where a high level of modelling expertise exists, though we believe our recommended approach represents a reasonable trade-off between practicality and rigour.

Model improvement and adaptive management

Due to environmental change, natural and anthropogenic disturbance, and population stochasticity, all predictive models will become redundant if they are not updated through time. We recommend that models be applied in an adaptive framework (Walters & Holling 1990), allowing immediate application of models in management, but ensuring a commitment to continual improvement of models through future data collections and refinements to modelling methods.

Future data collection will be central to rigorous model testing and model refinements. It should be targeted to fill gaps in current data sets (Fig. 3). Particular attention should be paid to the collection of high quality observation data based on multiple site visits aimed at minimizing false absences (Wintle et al. 2004). Some important environmental variables, including those based on forest growth stage data, were not available for model fitting in the LHCC region. It is likely that models would have better predictive accuracy if forest growth stage information became available and was incorporated in models.

CONCLUSION

Habitat models are now a widely used tool in public land conservation planning. Systematic conservation planning in the urban fringe will require such tools to ensure transparency and repeatability of planning outcomes. However, access to statistical and ecological expertise for urban conservation planning is likely to be limited when compared with the public land planning processes that have taken place in Australia. It is therefore important that we continue to develop and refine planning tools, including habitat modelling methods, with an emphasis on statistical and ecological rigour and simplicity.

ACKNOWLEDGEMENTS

This work was instigated by the LHCCREMS organization (now Hunter Councils). Steven House (EcoLogical), Michael Murray (Forest Fauna Surveys P/L), Philip Gleeson (NPWS), Daniel Connolly (NPWS), Jill Smith (NPWS), Peter Bone (NPWS), Peter Ewin (NPWS), Mick Andren (NPWS) and Rod Kavanagh (SFNSW) provided data, expert guidance on model construction and valuable insights. Mark Burgman and Kuniko Yamada (University of Melbourne) helped throughout the project. Comments from Terry Walshe (University of Western Australia), David Tierney and Robbie Economos improved the manuscript substantially. Discussions with Trevor Hastie and John Leathwick helped clarify bootstrapping issues. Two anonymous reviewers provided constructive feedback. Sophie Powrie, Kirsty Winter and Meredith Liang (LHCCREMS) provided excellent support and useful feedback throughout the project.

REFERENCES

Agresti A. (1996) An Introduction to Categorical Data Analysis. John Wiley and Sons, New York.

Akaike H. (1973) Information theory and an extension of the maximum likelihood. In: Proceedings of the 2nd International Symposium on Information Theory (eds B. N. Petrov & F. Csáki) pp. 267–81. Akademia Kaido, Budapest.

Albert P. S. & McShane L. M. (1995) A generalized estimating equations approach for spatially correlated data: applications to the analysis of neuroimaging data. Biometrics 51, 627–38.

Augustin N. H., Mugglestone M. A. & Buckland S. T. (1996) An autologistic model for the spatial distribution of wildlife. J. Appl. Ecol. 33, 339–47.

Austin M. P. (1991) Vegetation: data collection and analysis. In: Nature Conservation: Cost Effective Biological Surveys and Data Analysis (eds C. R. Margules & M. P. Austin) pp. 37–41. CSIRO, Canberra.

Austin M. P. (2002) Spatial prediction of species distribution: an interface between ecological theory and statistical modelling. Ecol. Model. 157, 101–18.

Austin M. P. & Heyligers P. C. (1989) Vegetation survey design for conservation: GRADSECT sampling of forests in north-eastern NSW. Biol. Conserv. 50, 13–32.

Austin M. P. & Meyers J. A. (1996) Current approaches to modelling the environmental niche of Eucalypts: implications for management of forest biodiversity. For. Ecol. Manage. 85, 95–106.

Bambar D. (1975) The area above the ordinal dominance graph and the area below the receiver operating characteristics graph. J. Math. Psych. 12, 387–415.

Belsley D. A., Kuh E. & Welsch R. E. (1980) Regression Diagnostics. John Wiley and Sons, New York.

Bio A. M. F., Alkemande R. & Barendregt A. (1998) Determining alternative models for vegetation response analysis – a non-parametric approach. J. Veg. Sci. 9, 5–16.

Breiman L., Friedman J. H., Olshen R. A. & Stone C. J. (1984) Classification and Regression Trees. Wadsworth International Group, Belmont.

Breininger D. R., Larson V. L., Duncan B. W. & Smith R. B. (1998) Linking habitat suitability to demographic success in Florida scrub-jays. Wild. Bull. 26, 118–28.

Burgman M. A. & Lindenmayer D. B. (1998) Conservation Biology for the Australian Environment. Surrey Beatty and Sons, Chipping Norton.

Burnham K. P. & Anderson D. R. (2002) Model Selection and Multimodel Inference: a Practical Information-Theoretic Approach. Springer, New York.

Burgman M. A., Breininger D. R., Duncan B. W. & Ferson S. (2001) Setting reliability bounds on Habitat Suitability Indices. Ecol. Appl. 11, 70–8.

Carpenter G., Gillison A. N. & Winter J. (1993) DOMAIN: a flexible modelling procedure for mapping potential distributions of plants and animals. Biodivers. Conserv. 2, 667–80.

Chatfield C. (1995) Model uncertainty, data mining and statistical inference. J. Roy. Stat. Soc. Ser. A Stat. 158, 419–66.

Commonwealth of Australia (2003) Terms of Reference: Standing Committee on Environment and Heritage Inquiry into Sustainable Cities. Parliament of Australia, House of Representatives, Canberra.

Derksen S. & Keselman H. J. (1992) Backward, forward and stepwise automated subset selection algorithms: frequency of obtaining authentic and noise variables. Br. J. Math. Stat. Psych. 45, 265–82.

Diamond J. M. (1975) Assembly of species communities. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 342–444. Harvard University Press, Cambridge.

Ecotone Ecological Consultants (2001) Lower Hunter Central Coast Regional Biodiversity Strategy: Fauna survey and mapping project – Module 1 Fauna Surveys. Ecotone Ecological Consultants for LHCCREMS, Waratah, pp. 43.

Edgar R. & Belcher C. (1995) Spotted-tailed Quoll. In: The Mammals of Australia (ed. R. Strahan) pp. 67–8. Reed Books, Chatswood.

Efron B. & Tibshirani R. (1997) Improvements on cross-validation: the .632+ bootstrap method. J. Am. Stat. Ass. 92, 548–60.

Elith J. (2000) Quantitative methods for modelling species habitat: comparative performance and an application to Australian plants. In: Quantitative Methods for Conservation Biology (eds S. Ferson & M. A. Burgman) pp. 39–58. Springer-Verlag, New York.

Elith J. & Burgman M. A. (2002) Predictions and their validation: rare plants in the Central Highlands, Victoria, Australia. In: Predicting Species Occurrences: Issues of Accuracy and Scale (eds J. M. Scott, P. J. Heglund, M. L. Morrison, M. G. Raphael, W. A. Wall & F. B. Samson) pp. 303–14. Island Press, Covelo.

Elith J. & Burgman M. A. (2003) Habitat models for PVA. In: Population Viability in Plants (eds C. A. Brigham & M. W. Schwartz) pp. 203–35. Springer-Verlag, New York.

Elith J., Burgman M. A. & Regan H. M. (2002) Mapping epistemic uncertainties and vague concepts in predictions of species distribution. Ecol. Model. 157, 313–29.

Ennis M., Hinton G., Naylor D., Revow M. & Tibshirani R. (1998) A comparison of statistical learning methods on the GUSTO database. Stat. Med. 17, 2501–8.

ESRI (1997) Arcinfo 7.2. Environmental Systems Research Institute Inc., Redlands.

Faith D. P. & Walker P. A. (1996) Environmental diversity: on the best-possible use of surrogate data for assessing the relative biodiversity of sets of areas. Biodivers. Conserv. 5, 399–415.

Ferrier S. & Watson G. (1997) An Evaluation of the Effectiveness of Environmental Surrogates and Modelling Techniques in Predicting the Distribution of Biological Diversity. Department of Environment, Sports and Territories, Commonwealth of Australia and NSW National Parks and Wildlife Service, Canberra.

Ferrier S., Watson G., Pearce J. & Drielsma M. (2002a) Extended statistical approaches to modelling spatial pattern in biodiversity in north-east New South Wales. I. Species level modelling. Biodivers. Conserv. 11, 2275–307.

Ferrier S., Drielsma M., Manion G. & Watson G. (2002b) Extended statistical approaches to modelling spatial pattern in biodiversity: the north-east New South Wales experience. II. Community-level modelling. Biodivers. Conserv. 11, 2309–38.

Fielding A. H. & Bell J. F. (1997) A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ. Conserv. 24, 38–49.

Fotheringham A. S., Brunsdon C. & Charlton M. (2002) Geographically Weighted Regression: the Analysis of Spatially Varying Relationships. John Wiley and Sons, London.

Franklin J. (1995) Predictive vegetation mapping: geographic modeling of biospatial patterns in relation to environmental gradients. Prog. Phys. Geog. 19, 494–519.

Friedman J. H. (1991) Multivariate adaptive regression splines (with discussion). Ann. Stat. 19, 1–141.

Goldingay R. L. & Kavanagh R. P. (1993) Home-range estimates and habitat of the yellow-bellied glider (Petaurus australis) at Waratah Creek, New-South-Wales. Wildl. Res. 20, 387–404.

Graham C. H., Ferrier S., Huettman F., Moritz C. & Peterson A. T. (2004) New developments in museum-based informatics and applications in biodiversity analysis. Trends Ecol. Evol. 19, 497–503.

Guisan A. & Harrell F. E. (2000) Ordinal response regression models in ecology. J. Veg. Sci. 11, 617–26.

Guisan A. & Zimmerman N. E. (2000) Predictive habitat distribution models in ecology. Ecol. Model. 135, 147–86.

Hanley J. A. & McNeil B. J. (1982) The meaning and use of the area under a Receiver Operating Characteristic (ROC) curve. Radiology 143, 29–36.

Harrell F. E. (2001) Regression Modelling Strategies with Application to Linear Models, Logistic Regression and Survival Analysis. Springer, New York.

Harrell F. E., Lee K. L. & Mark D. B. (1996) Multivariate prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat. Med. 15, 361–87.

Hastie T. & Tibshirani R. (1990) Generalized additive models. Monographs on statistics and applied probability (eds D. R. Cox, D. V. Hinkley, D. Rubin & B. W. Silverman). Chapman & Hall, London.

Hastie T., Tibshirani R. & Friedman J. H. (2001) The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer-Verlag, New York.

Hirzel A. H. (2001) When GIS come to life. Linking Landscape- and Population Ecology for Large Population Management Modelling: the Case of Ibex (Capra Ibex) in Switzerland. The University of Lausanne, Lausanne.

Hoeting J. A., Madigan D., Raftery A. E. & Volinsky C. T. (1999) Bayesian model averaging: a tutorial. Stat. Sci. 14, 382–401.

Houlder D. J., Hutchinson M. F., Nix H. A. & McMahon J. P. (1999) ANUCLIM User Guide, Version 5.0. Centre for Resource and Environmental Studies, Australian National University, Canberra.

Kadmon R., Farber O. & Danin A. (2003) A systematic analysis of factors affecting the performance of climatic envelope models. Ecol. Appl. 13, 853–67.

Kavanagh R. P. (2002) Conservation and management of large forest owls in southeastern Australia. In: The Ecology and Conservation of Owls (eds I. Newton, R. Kavanagh, J. Olson & I. Taylor) pp. 201–19. CSIRO, Melbourne.

Lambeck R. J. (1997) Focal species: a multi-species umbrella for nature conservation. Conserv. Biol. 11, 849–56.

Leathwick J. R. & Austin M. P. (2001) Competitive interactions between tree species in New Zealand’s old-growth indigenous forests. Ecology 82, 2560–73.

Legendre P. & Fortin M. J. (1989) Spatial pattern and ecological analysis. Vegetatio 80, 107–38.

Legendre L. & Legendre P. (1998) Numerical Ecology. Elsevier, New York.

Levins R. (1975) Evolution of communities near equilibrium. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 16–50. Harvard University Press, Cambridge.

LHCCREMS (2004) Lower Hunter Central Coast Regional Biodiversity Strategy 2004 (4 Volume). Hunter Councils on behalf of the Lower Hunter Central Coast Regional Environmental Strategy, Thornton.

Li W., Wang Z., Ma Z. & Tang H. (1999) Designing the core zone in a biosphere reserve based on suitable habitats: Yancheng Biosphere Reserve and the red crowned crane (Grus japonensis). Biol. Conserv. 90, 167–73.

Loyn R. H., McNabb E. G., Volodina L. & Willig R. (2001) Modelling landscape distributions of large forest owls as applied to managing forests in north-east Victoria, Australia. Biol. Conserv. 97, 361–76.

McCullagh P. & Nelder J. A. (1989) Generalized Linear Models. Chapman & Hall, London.

MacKenzie D. I., Nichols J. D., Lachman G. B., Droege S., Royle J. A. & Langtimm C. A. (2002) Estimating site occupancy rates when detection probabilities are less than one. Ecology 83, 2248–55.

Manel S., Dias J. M., Buckton S. T. & Ormerod S. J. (1999a) Alternative methods for predicting species distribution: an illustration with Himalayan river birds. J. Appl. Ecol. 36, 734–47.

Manel S., Dias J. M. & Ormerod S. J. (1999b) Comparing discriminant analysis, neural networks and logistic regression for predicting species distributions: a case study with a Himalayan river bird. Ecol. Model. 120, 337–47.

Miller M. E., Hui S. L. & Tierney W. M. (1991) Validation techniques for logistic regression models. Stat. Med. 10, 1213–26.

Moilanen A. (2002) Implications of empirical data quality to metapopulation model parameter estimation and application. Oikos 96, 516–30.

Moisen G. G. & Frescino T. S. (2002) Comparing five modeling techniques for predicting forest characteristics. Ecol. Model. 157, 209–25.


National Parks & Wildlife Service (1998) Eden Fauna Modelling. New South Wales National Parks & Wildlife Service, New South Wales Regional Forest Agreement Steering Committee, Canberra.

National Parks and Wildlife Service (2000) Modelling Areas of Habitat Significance for Fauna and Flora in the Southern CRA. NPWS NSW, Canberra.

Nix H. (1986) A biogeographic analysis of Australian elapid snakes. In: Atlas of Elapid Snakes of Australia (ed. R. Longmore) pp. 4–15. Australian Government Publishing Service, Canberra.

Obuchowski N. A. (1994) Computing sample size for receiver operating characteristic studies. Invest. Radiol. 29, 238–43.

Pearce J. & Ferrier S. (2000) Evaluating the predictive performance of habitat models developed using logistic regression. Ecol. Model. 133, 225–45.

Pearce J. & Ferrier S. (2001) The practical value of modelling relative abundance of species for regional conservation planning: a case study. Biol. Conserv. 98, 33–43.

Pearce J., Ferrier S. & Scotts D. (2001a) An evaluation of the predictive performance of distributional models for flora and fauna in north-east New South Wales. J. Environ. Manage. 62, 171–84.

Pearce J. L., Cherry K., Drielsma M., Ferrier S. & Whish G. (2001b) Incorporating expert knowledge and fine-scale vegetation mapping into statistical modelling of faunal distribution. J. Appl. Ecol. 38, 412–24.

Phillips S. J., Anderson R. P. & Schapire R. E. (in press) Maximum entropy modeling of species geographic distributions. Ecol. Model.

Pressey R. L., Possingham H. P., Logan V. S., Day J. R. & Williams P. H. (1999) Effects of data characteristics on the results of reserve selection algorithms. J. Biogeogr. 26, 179–91.

Pulliam H. R. (1988) Sources, sinks, and population regulation. Am. Nat. 132, 652–61.

R Development Core Team (2004) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org

Reading R. P., Clark T. A., Seebeck J. H. & Pearce J. (1996) Habitat Suitability Index model for the eastern barred bandicoot, Perameles gunnii. Wildl. Res. 23, 221–35.

Reed P. C. & Lunney D. (1990) Habitat loss: the key problem for the long-term survival of koalas in New South Wales. In: Koala Summit: Managing Koalas in New South Wales (eds D. Lunney, C. A. Urquhart & P. C. Reed) pp. 9–31. University of Sydney, Sydney.

Ripley B. D. (1995) Pattern Recognition and Neural Networks – a Statistical Approach. Cambridge University Press, Cambridge.

Russel R. (1995) Yellow-bellied glider. In: The Mammals of Australia (ed. R. Strahan) pp. 226–8. Reed Books, Chatswood.

Steyerberg E. W., Eijkemans M. J. C., Harrell F. E. & Habbema J. D. F. (2001a) Prognostic modeling with logistic regression analysis: in search of a sensible strategy in small data sets. Med. Decis. Making 21, 45–56.

Steyerberg E. W., Harrell F. E., Borsboom G. J. J. M., Eijkemans M. J. C., Vergouwe Y. & Habbema J. D. F. (2001b) Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J. Clin. Epidemiol. 54, 774–81.

Stockwell D. & Peters D. (1999) The GARP modelling system: problems and solutions to automated spatial prediction. Int. J. Geogr. Inf. Sci. 13, 143–58.

Sutherland W. J. (1996) Ecological Census Techniques: a Handbook, 1st edn. Cambridge University Press, Cambridge.

ter Braak C. J. F. (1986) Canonical Correspondence Analysis: a new eigenvector technique for multivariate direct gradient analysis. Ecology 67, 1167–79.

Thompson S. K. (2002) Sampling, 2nd edn. Wiley, New York.

Tyre A. J., Tenhumberg B., Field S. A., Possingham H. P., Niejalke D. & Parris K. (2003) Improving precision and reducing bias in biological surveys by estimating false negative error rates in presence-absence data. Ecol. Appl. 13, 1790–801.

Van Horne B. (1983) Density as a misleading indicator of habitat quality. J. Wildl. Man. 47, 893–901.

Van Horne B. & Wiens J. A. (1991) Forest Bird Habitat Suitability Models and the Development of General Habitat Models. United States Department of the Interior, Fish and Wildlife Service, Washington, DC.

Venables W. N. & Ripley B. D. (2003) Modern Applied Statistics with S, 4th edn. Springer, New York.

Walters C. & Holling C. S. (1990) Large-scale management experiments and learning by doing. Ecology 71, 2060–8.

Wintle B. A., McCarthy M. A., Volinsky C. T. & Kavanagh R. P. (2003) The use of Bayesian Model Averaging to better represent the uncertainty in ecological models. Conserv. Biol. 17, 1579–90.

Wintle B. A., Elith R. J., Yamada K. & Burgman M. A. (2004) LHCCREMS fauna survey and mapping project, Module 2: Habitat modelling and conservation requirements. Lower Hunter & Central Coast Regional Environmental Management Strategy, Callaghan.

Wintle B. A., McCarthy M. A., Parris K. M. & Burgman M. A. (2004) Precision and bias of methods for estimating point survey detection probabilities. Ecol. Appl. 14, 703–12.

Wilson K. A., Westphal M. I., Possingham H. P. & Elith J. (2005) Sensitivity of conservation planning to different approaches to using species distribution data. Biol. Conserv. 122, 99–112.

Wintle B. A., Burgman M. A., McCarthy M. A. & Kavanagh R. P. (in press) The magnitude and management consequences of false negative observation error in surveys of arboreal marsupials and large forest owls. J. Wildl. Man.

Worton B. J. (1989) Kernel methods for estimating the utilization distribution in home-range studies. Ecology 70, 164–8.

Yencken D. & Wilkinson D. (2000) Resetting the Compass: Australia's Journey Towards Sustainability. CSIRO Publishing, Melbourne.

Zaniewski A. E., Lehmann A. & Overton J. M. (2002) Predicting species distribution using presence-only data: a case study of native New Zealand ferns. Ecol. Model. 157, 261–80.

APPENDIX 1

Method for 0.632+ bootstrap evaluation, after Harrell et al. (1996) and Efron and Tibshirani (1997).

1. Develop model on all n observations.

2. Calculate the statistic(s) of choice for evaluation on the same data (i.e. the training data) – call this Stat_app, because it is the apparent value of the statistic.

3. Take a bootstrap sample, i.e. a sample of size n with replacement of rows of the data matrix. Keep track of which sites are in the bootstrap sample and which are excluded.

4. Fit the model on the bootstrap sample (using the same methods as used on the full set).

5. Compute the statistic on the bootstrap data set (observations vs. fitted values) and call it Stat_boot.

6. Also compute the statistic on a version of the bootstrap data where the observations are randomized (Stat_permute).

7. Use the bootstrap model to predict to the excluded data set; calculate the statistic on these predictions: Stat_excl.

8. Use Stat_boot, Stat_permute and Stat_excl to calculate the amount of overfitting, the relative overfitting rate and weights that are then used to make a best estimate of predictive performance, Stat_best_est. This statistic puts most emphasis on predictions to the excluded data, particularly when the model is overfitted (i.e. when Stat_boot – Stat_excl is large). For details of this step see Steyerberg et al. (2001b).

9. Measure how optimistic the fit on the bootstrap sample was: O = Stat_boot – Stat_best_est.

10. Repeat steps 3–7 100–200 times.

11. Calculate an average optimism, Ō.

12. Use Ō to correct Stat_app for its optimism: Stat_app – Ō. This is a near unbiased estimate of the expected value of the external predictive performance of the process that generated Stat_app.


points in the tails having high leverage and tends to reduce the explanatory power of independent variables. Data may be transformed to avoid this problem. A common transformation is the log-transformation, though a range of other transformations exist. Inspection of our modelling data revealed no substantial problems with long-tailed predictor variables.

Variable reduction

Approximately 50 variables were available that described the environmental characteristics which may govern site occupancy for the seven target species. It is common to offer all possible variables as candidate model predictors (e.g. National Parks & Wildlife Service 1998) and utilize an automated variable selection routine to eliminate inappropriate predictors and to specify the final model. However, offering many candidate variables tends to result in models that include nonsense predictors and exclude variables that in fact influence the probability of occupancy (Derksen & Keselman 1992; Chatfield 1995; Steyerberg et al. 2001a). This is especially problematic when the number of species observations is low compared with the number of candidate predictors.

An alternative approach is to minimize the number of variables that are offered to the variable selection routine. Harrell et al. (1996) recommend a rule of thumb that fewer than m/10 predictor degrees of freedom (PDF) should be offered as candidates to a variable selection routine such as backward selection, where m equals the number of observations of the least prevalent class in the modelling data set (often the number of presence observations). PDF indicates the number of possible parameters estimated in the largest model that could be constructed from the set of candidate predictors. The number of parameters depends on the number of predictors and the form in which the predictor variables are used in the model (see next section). For categorical variables or non-linear responses, there is more than one PDF for each environmental variable. For example, categorical variables have a parameter for all but one category, sometimes leading to many PDF per categorical variable. Similarly, quadratic or cubic functional forms require two and three PDF, respectively. In our situation, m ranges from 36 for the tiger quoll to 112 for the squirrel glider (Table 1). Therefore, the maximum number of candidate PDF offered to the variable selection routine should be no more than 4 or 11, respectively. In this case, the allowable PDF is much less than the total number available. The number of candidate PDF was firstly reduced by removing distal variables that were highly correlated with proximal variables (R > 0.6) in the context of biological or metabolic requirements of the species.
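The arithmetic behind the m/10 rule can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the function names and the (kind, size) encoding of candidate terms are hypothetical, and rounding m/10 to the nearest integer is an assumption chosen because it reproduces the limits of 4 and 11 quoted above.

```python
# Minimal sketch of the m/10 rule of thumb for predictor degrees of freedom (PDF).
# All names here are hypothetical; rounding to the nearest integer is an assumption.

def pdf_budget(n_least_prevalent):
    """Approximate upper limit on candidate PDF for m observations of the rarer class."""
    return round(n_least_prevalent / 10)

def pdf_cost(term):
    """PDF consumed by one candidate term, given as a (kind, size) pair.

    ("linear", 1)      -> 1 parameter
    ("poly", degree)   -> one parameter per polynomial degree (quadratic = 2, cubic = 3)
    ("categorical", k) -> k - 1 parameters for k categories
    """
    kind, size = term
    if kind == "linear":
        return 1
    if kind == "poly":
        return size
    if kind == "categorical":
        return size - 1
    raise ValueError(f"unknown term kind: {kind}")

def total_pdf(terms):
    """Total PDF of the largest model that could be built from the candidate terms."""
    return sum(pdf_cost(t) for t in terms)

# One linear term, one quadratic term and one four-level categorical variable.
candidates = [("linear", 1), ("poly", 2), ("categorical", 4)]
print("budget for m = 36 and m = 112:", pdf_budget(36), pdf_budget(112))
print("PDF used by candidates:", total_pdf(candidates))
print("within budget for m = 36?", total_pdf(candidates) <= pdf_budget(36))
```

In this toy case, three candidate variables already use six PDF, which exceeds the budget for the species with 36 presence records.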

Irrespective of variable reduction issues, fitting statistical models with collinear predictor variables may cause statistical problems and should be avoided (Belsley et al. 1980). Expert opinion and previous habitat models were consulted where further variable reduction was required. The number of candidate model variables (and PDF) offered to the variable selection routine varied for each species, depending on the amount of biological survey data available. The final set of candidate predictors presented in Table 2 includes variables that were offered as candidates for at least one species.
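A simple screen for highly correlated candidate pairs, using the 0.6 cutoff mentioned above, might look like the following Python sketch. The variable names are borrowed from Table 2, but the values are invented for illustration, and the use of the Pearson coefficient is an assumption (the text does not say which correlation measure was used). Deciding which member of a flagged pair is the distal one remains an ecological judgement.

```python
# Sketch of a pairwise correlation screen; the data values are hypothetical.
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlated_pairs(variables, cutoff=0.6):
    """Return (name_a, name_b, r) for every pair with |r| above the cutoff."""
    names = sorted(variables)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(variables[a], variables[b])
            if abs(r) > cutoff:
                flagged.append((a, b, round(r, 2)))
    return flagged

# Invented values at six sites: temperature falls almost linearly with elevation.
layers = {
    "elev": [10, 120, 300, 450, 600, 820],
    "temp": [18.5, 17.9, 16.8, 16.0, 15.1, 13.9],
    "rain": [900, 1250, 820, 1100, 950, 1300],
}
for a, b, r in correlated_pairs(layers):
    print(f"{a} and {b} are highly correlated (r = {r})")
```

Here elevation would usually be regarded as the distal variable and temperature as the proximal one, so elevation would be the candidate to drop.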

Variable form

One of the strengths of GLMs and GAMs is the possibility of fitting non-linear relationships between the dependent variable (probability of occurrence) and the independent environmental variables. However, it may be difficult to know a priori whether a particular relationship is likely to be linear, quadratic, cubic or other. One strategy that is commonly employed is to allow an automated variable selection routine full freedom to fit any relationships using a trade-off between complexity and variance explained. However, it has been shown that such an approach may result in the fitting of spurious relationships with no logical interpretation (Steyerberg et al. 2001a). Consequently, we recommend either choosing functional forms with no automated variable selection, or limiting the range of functional forms available to the variable selection algorithm to those that are supported by sound ecological intuition and preliminary data analysis. In this exercise, we conduct preliminary analyses on response shapes by fitting univariate GAMs with five degrees of freedom and visually inspecting plots of response shapes (Austin & Meyers 1996). This approach allows the user to assess whether more complicated responses appear sensible and are justified by the data. Visual inspection of fitted response shapes is used again later in the process for evaluating the ecological realism of final selected models (see below).

Variable selection and model fitting

Final GLMs and GAMs for all species were selected and fitted to both presence–absence and presence-background data using a backward stepwise variable selection algorithm (Venables & Ripley 2003) in R, resulting in four models for each species. The variable selection algorithm tests a series of nested models using Akaike's information criterion (AIC; Akaike 1973; Venables & Ripley 2003) to select between models. The choice of algorithm may influence the structure of the final model, and alternative automated model selection algorithms, including forward selection, are available. The backward selection algorithm is generally preferred to the other standard automated methods because it generally performs better in the presence of collinear candidate variables and because it requires consideration of the full model fit (Harrell 2001). There is a substantial body of literature dedicated to model selection issues, and automated algorithms have been criticized (Chatfield 1995; Hoeting et al. 1999; Harrell 2001). However, the alternatives tend to be complicated. We have chosen to use a packaged model selection algorithm as a compromise between practicality and sophistication. We reiterate that expert reduction of predictor variables prior to final model selection is critical to the successful application of automated variable selection routines (Harrell 2001).

Model evaluation statistics

Presence–absence models were evaluated using two statistics: (i) the area under the receiver operating characteristic (ROC) curve (Hanley & McNeil 1982) (ROC area), closely related to the Mann–Whitney U-statistic; and (ii) Miller's calibration slope (Miller et al. 1991; Pearce & Ferrier 2000). The ROC area evaluates a model's ability to discriminate between presence and absence sites and is therefore referred to as a measure of model (or predictive) 'discrimination' (Pearce & Ferrier 2000). This provides an indication of the usefulness of the models for prioritizing areas in terms of their relative importance as habitat for the particular species. The ROC area ranges from 0 to 1, where a score of 1 implies perfect discrimination and a score of 0.5 implies predictive discrimination that is no better than a random guess (Bambar 1975). The actual value of the ROC area has a straightforward interpretation. It is the probability that, for a randomly selected pair of presence–absence observations derived from field surveys, the model prediction for presence will be greater than the prediction for absence.
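The pairwise interpretation of the ROC area translates directly into code. The Python sketch below compares every presence prediction with every absence prediction and counts the proportion of pairs ranked correctly; counting ties as half a win is the usual Mann–Whitney convention and is an assumption here, as are the function name and the example values.

```python
# Sketch of the ROC area computed from its pairwise (Mann-Whitney) definition.
# Observations are coded 1 (presence) and 0 (absence); example values are invented.

def roc_area(observed, predicted):
    """Proportion of presence-absence pairs in which the presence site scores higher."""
    pres = [p for y, p in zip(observed, predicted) if y == 1]
    absn = [p for y, p in zip(observed, predicted) if y == 0]
    if not pres or not absn:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for pp in pres:
        for pa in absn:
            if pp > pa:
                wins += 1.0      # pair ranked correctly
            elif pp == pa:
                wins += 0.5      # tie counts as half (assumed convention)
    return wins / (len(pres) * len(absn))

obs = [1, 1, 1, 0, 0, 0, 0]
pred = [0.9, 0.6, 0.3, 0.5, 0.2, 0.1, 0.3]
print("ROC area:", roc_area(obs, pred))
```

The double loop is quadratic in the number of sites, which is adequate for survey data sets of a few hundred sites; rank-based formulae give the same value more efficiently.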

Miller's calibration statistics (MCS) evaluate the ability of a model to correctly predict the proportion of sites with a given environmental profile that will be occupied. MCS are derived from a logistic regression of observations on the logit of predicted probabilities. The rationale is that the slope of the regression would be equal to one and the regression intercept would equal zero if the predictions from the model were perfectly calibrated (Harrell 2001; Pearce & Ferrier 2000). The model is known as the 'logistic calibration equation'. The two calibration statistics are literally the logistic regression slope ('calibration slope') and intercept ('calibration intercept'), though it is common just to report the calibration slope.
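As a rough illustration of the logistic calibration equation, the following Python sketch regresses observed presence–absence on the logit of the predicted probabilities using Newton–Raphson iterations. The function names, the fitting routine and the grouped example data are assumptions made for this sketch; in practice a standard GLM routine would be used.

```python
# Sketch: calibration intercept and slope from a logistic regression of
# observations on logit(prediction). Names and data are hypothetical.
import math

def logit(p):
    return math.log(p / (1.0 - p))   # undefined for p = 0 or p = 1

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def calibration(observed, predicted, n_iter=50):
    """Return (intercept, slope) maximizing the logistic likelihood."""
    x = [logit(p) for p in predicted]
    a, b = 0.0, 1.0                  # start at perfect calibration
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, observed):
            mu = expit(a + b * xi)
            w = mu * (1.0 - mu)
            g0 += yi - mu            # score for the intercept
            g1 += (yi - mu) * xi     # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det   # Newton step: solve 2 x 2 system
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b

# Ten sites at each of three predicted probabilities; observed occupancy
# (2, 5 and 8 of 10) is less extreme than predicted (0.1, 0.5, 0.9).
obs, pred = [], []
for p, presences in [(0.1, 2), (0.5, 5), (0.9, 8)]:
    obs += [1] * presences + [0] * (10 - presences)
    pred += [p] * 10
intercept, slope = calibration(obs, pred)
print("calibration intercept:", round(intercept, 3))
print("calibration slope:", round(slope, 3))
```

In this example the predictions are more extreme than the observed occupancy rates, so the fitted slope falls below one, which is the pattern expected from an overfitted model.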

The presence-background models were evaluated qualitatively. The statistics available for evaluating presence-only models are limited and a fair comparison with the presence–absence models was not possible here.

Table 2. Abbreviated names and definitions of mapped environmental data used as candidate predictor variables for inclusion in habitat models. All environmental data were available in raster format with 100 m (side length) grid cell size.

Candidate variable     Definition
Temp                   Mean annual temperature, derived from ANUCLIM
Cold                   Mean temperature of the coldest period, derived from ANUCLIM
Rain                   Mean annual rainfall, derived from ANUCLIM
Dry 2000               The percentage of cells in a 2000-m radius containing dry forest
Rf2000                 The percentage of cells in a 2000-m radius containing rainforest
Solar                  The solar radiation index of a cell, derived from ANUCLIM
Elev                   The elevation of a cell (in metres) above sea level
Rugg250 (500, 1000)    Topographic ruggedness (standard deviation in elevation) in a 250-m, 500-m and 1000-m radius
Terr250 (500, 1000)    Relative terrain position in a 250-m, 500-m and 1000-m radius
Topo                   The topographic position of a cell, ranging from gully to ridge top (0–100)
Unmod500               The percentage of cells in a 500-m radius containing unmodified forest
Unmod2000              The percentage of cells in a 2000-m radius containing unmodified forest
Wet500                 The percentage of cells in a 500-m radius containing wet forest
Wet2000                The percentage of cells in a 2000-m radius containing wet forest
Fert                   Index of the soil nutrient content at a site, based on geochemical data (CSIRO)
Percnonfor2k           The percentage of cells in a 2000-m radius classified as cleared land
Ybglexp2000†           The percentage of cells in a 2000-m radius containing suitable ybgl habitat
Sqglexp2000†           The percentage of cells in a 2000-m radius containing suitable sqgl habitat
Sowlexp2000†           The percentage of cells in a 2000-m radius containing suitable sowl habitat
Mowlexp2000†           The percentage of cells in a 2000-m radius containing suitable mowl habitat
Powlexp2000†           The percentage of cells in a 2000-m radius containing suitable powl habitat
Koalexp2000†           The percentage of cells in a 2000-m radius containing suitable koal habitat

†denotes expert variable derived from vegetation classes (Michael Murray, pers. comm. 2004). koal, koala; mowl, masked owl; powl, powerful owl; quol, tiger quoll; sowl, sooty owl; sqgl, squirrel glider; ybgl, yellow-bellied glider.

The model evaluation method (bootstrapping)

Presence–absence models for each species were evaluated using a 'bootstrapping' approach (Efron & Tibshirani 1997; Harrell 2001). Naively testing a model on the data that were used to fit it (also known as 'in-sample validation') is inappropriate and known to provide optimistic estimates of model performance (Fielding & Bell 1997; Steyerberg et al. 2001b), despite being a commonly used approach to model evaluation (National Parks & Wildlife Service 1998, 2000). Ideally, models should be tested on a completely independent data set, with data collection and stratification specifically designed for model evaluation. However, this is seldom a practical option and may also be prone to high variance unless samples are large (J. Elith, unpublished data, 2005). Bootstrapping provides a realistic estimate of the predictive performance of a model without incurring the expenses of collecting a completely new model-testing data set. Bootstrapping involves resampling the modelling data and conducting a series of model building and testing simulations that provide an estimate of the optimism arising from in-sample validation. The estimate of optimism is used to provide an adjusted estimate of the model evaluation statistics (ROC and MCS). The bootstrapping version implemented here is believed to provide the least biased estimate of predictive performance of any of the model evaluation methods that are based on re-sampling, including cross-validation (Hastie et al. 2001). Cross-validation (Efron & Tibshirani 1997) provides an alternative approach to model evaluation and might be more feasible with methods or data sets that create large computational loads. However, its estimates of error rates with independent data can be less precise than those derived from bootstrapping (Steyerberg et al. 2001b), which can be thought of as a smoothed version of cross-validation (Efron & Tibshirani 1997). The bootstrapping method (the 0.632+ bootstrap) is detailed in Appendix 1. R code for obtaining bootstrapped estimates of ROC and MCS is available from the URL given previously.
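The resampling loop can be sketched in Python as follows. This is not the authors' R code. Three things are assumptions and should be checked against Steyerberg et al. (2001b) before any real use: the weighting in step 8 follows the general 0.632+ form of Efron & Tibshirani (1997), with weight 0.632/(1 – 0.368R) and R the relative overfitting rate; replicates in which either the bootstrap or the excluded sample lacks one of the two classes are skipped; and the model is a deliberately simple stand-in (a score weighted by the difference between presence and absence means), not the GLMs or GAMs used in the case study. The data are simulated.

```python
# Sketch of a bootstrap optimism correction for the ROC area (assumptions above).
import random

def roc_area(obs, pred):
    pres = [p for y, p in zip(obs, pred) if y == 1]
    absn = [p for y, p in zip(obs, pred) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pres for b in absn)
    return wins / (len(pres) * len(absn))

def fit_mean_difference(rows):
    """Stand-in model: weights are presence means minus absence means."""
    k = len(rows[0][1])
    pres = [x for y, x in rows if y == 1]
    absn = [x for y, x in rows if y == 0]
    w = [sum(x[j] for x in pres) / len(pres) - sum(x[j] for x in absn) / len(absn)
         for j in range(k)]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))

def evaluate(model, rows):
    return roc_area([y for y, _ in rows], [model(x) for _, x in rows])

def has_both_classes(rows):
    return {y for y, _ in rows} == {0, 1}

def bootstrap_corrected(rows, fit, n_boot=200, seed=1):
    rng = random.Random(seed)
    n = len(rows)
    stat_app = evaluate(fit(rows), rows)               # apparent statistic
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]     # sample rows with replacement
        chosen = set(idx)
        boot = [rows[i] for i in idx]
        excl = [rows[i] for i in range(n) if i not in chosen]
        if not (has_both_classes(boot) and has_both_classes(excl)):
            continue                                   # statistic undefined; skip
        model = fit(boot)
        fitted = [model(x) for _, x in boot]
        stat_boot = roc_area([y for y, _ in boot], fitted)
        shuffled = [y for y, _ in boot]
        rng.shuffle(shuffled)                          # randomized observations
        stat_perm = roc_area(shuffled, fitted)
        stat_excl = evaluate(model, excl)              # predict to excluded sites
        gap, span = stat_boot - stat_excl, stat_boot - stat_perm
        r = min(1.0, gap / span) if gap > 0 and span > 0 else 0.0
        w = 0.632 / (1.0 - 0.368 * r)                  # assumed 0.632+ weight
        stat_best = (1.0 - w) * stat_boot + w * stat_excl
        optimism.append(stat_boot - stat_best)
    mean_optimism = sum(optimism) / len(optimism)
    return stat_app, stat_app - mean_optimism

# Simulated data: 60 sites, one informative predictor and eight noise predictors.
gen = random.Random(0)
rows = []
for i in range(60):
    y = i % 2
    rows.append((y, [gen.gauss(1.5 * y, 1.0)] + [gen.gauss(0.0, 1.0) for _ in range(8)]))
apparent, corrected = bootstrap_corrected(rows, fit_mean_difference)
print("apparent ROC area:", round(apparent, 3))
print("optimism-corrected ROC area:", round(corrected, 3))
```

Because `fit` is passed in as a function, the same wrapper could in principle be applied to any model-fitting procedure, including one that repeats variable selection inside each bootstrap replicate.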

Inspecting models for ecological realism

Final models were inspected for their ecological realism using partial plots of univariate fitted functions (Venables & Ripley 2003), similar to those used to determine the complexity of candidate response shapes. Partial plots are available as an option in statistical software and provide a ready means to evaluate the ecological realism of fitted responses. If an implausible fitted relationship between the probability of occupancy by the species and a particular environmental variable was observed in the partial plots, the variable was excluded from the model or the model was re-fitted with fewer degrees of freedom for that particular environmental variable.

Case study results and interpretation

The adequacy of species survey data for model fitting

Environmental strata were reasonably well sampled, even for the species with the least and most restricted survey data. For six of the seven species, sampled strata make up at least 70% of the region that is under forest cover, with the only exception being the quoll, for which only 56% of the region comprises sampled strata. Similarly, the geographical spread of sampled strata was reasonable for most species, with maps of sampled strata showing good coverage of the region for all species (Fig. 2). The maps of sampled and unsampled strata provide an indication of the areas in which habitat model predictions may be less reliable.

Models and predictive performance

The best models in terms of both predictive discrimination and calibration were those for the sooty owl, squirrel glider, yellow-bellied glider, koala and tiger quoll, all of which had predictive discrimination greater than 0.75 for their best model (Table 3). The worst models were the powerful owl and masked owl models, with bootstrapped ROC areas ranging from 0.61 to 0.69. In general, GAMs had similar variable structure and degrees of freedom to GLMs, and comparable predictive discrimination. However, bootstrapped predictive calibration was always better for GAMs than for GLMs (Table 3). This is probably because the smooth functions in GAMs are better at fitting complex responses over the complete environmental range of the response, leading to more accurate predicted probabilities.

Presence-only models had similar model structures to the presence–absence models. The result suggests that the survey absence data used in the modelling exercise were not much more useful than a background random sample in defining unsuitable habitat for most species. This may result from the difficulties associated with obtaining reliable absence data for cryptic species. In addition, the quality of the presence-only data used in this modelling exercise is likely to be high compared with presence-only data found in most museums and herbaria, because they were derived from systematic stratified surveys. Other studies have shown a decline in performance when fitting models with background samples compared with using presence–absence data (Ferrier & Watson 1997).

Preferred models were primarily chosen on the basis of predictive discrimination and calibration. Plausibility of response shapes was used to discern between models that showed similar predictive performance. Only model structures for the preferred model for each species are presented in Table 3. There was some variation between the four modelling methods in terms of the variables retained and their response shapes for each species. For example, the yellow-bellied glider GLM contained the term 'dry 2000', while the GAM for the same species did not include this term. Similarly, the sooty owl GLM contained rainfall as a linear term, while the GAM for that species included rainfall as a smoothed spline with two degrees of freedom (a non-linear term). However, the structural differences between GLM and GAM models were minor, and the similarity in ROC areas of the two model types across all species (Table 3) implies that the differences had a minor impact on predictive discrimination.

Fig. 2. Coverage of sampled strata over the greater LHCC region for the best ((a) Koala) and worst ((b) Quoll) sampled species. Unsampled strata are represented by the light grey shading. Remaining areas within the region (not shaded) contain sampled strata or are non-forest and therefore not included in the analysis.

Table 3. Final presence–absence models and bootstrapped estimates of predictive discrimination (ROC) and calibration.

Species  Preferred    ROC area       Calibration    Preferred model
         model type   GAM    GLM     GAM    GLM
Koala    GAM          0.76   0.76    0.84   0.75    sp ~ s(temp, 2) + s(dry2000, 3) + s(rugg500, 2) + unmod2000 + percnonfor2k
Quoll    GLM          0.71   0.75    0.73   0.59    sp ~ temp + rain + percnonfor2k + poly(dry2000, 2)
Sqgl     GAM          0.78   0.77    0.76   0.68    sp ~ s(rugg500, 3) + sqglexp500 + s(unmod500, 2)
Ybgl     GAM          0.78   0.77    0.73   0.63    sp ~ s(temp, 2) + s(rain, 2) + s(rf2000, 2) + unmod2000 + s(ybglexp2000, 3) + percnonfor2k
Mowl     GAM          0.63   0.61    0.63   0.55    sp ~ s(cold, 2) + mowlexp2000 + rf2000 + s(unmod2000, 2) + wet2000 + percnonfor2k
Powl     GAM          0.69   0.68    0.80   0.62    sp ~ s(cold, 3) + s(rain, 2) + s(rugg500, 2)
Sowl     GAM          0.85   0.85    0.74   0.67    sp ~ s(rain, 2) + rugg500 + s(sowlexp2000, 3) + s(terr1000, 2) + unmod2000

Only the preferred model selected from the presence–absence models (either a generalized linear model or a generalized additive model) is presented for each species. Column 'preferred model type' indicates which of the methods produced the final preferred model for each species. ROC and calibration slope statistics are presented for the model derived under each modelling method. Variable abbreviations and details are given in Table 2. The expressions 'poly(variable, n)' and 's(variable, n)' in the preferred model column indicate variables included as polynomial ('poly') or smoothed ('s') terms with 'n' degrees of freedom. ROC, receiver operating characteristic.

Plots of fitted functions

Plotting fitted relationships provided a useful means to visualize and manipulate the behaviour of fitted models to ensure that they were ecologically meaningful. In general, fitted relationships were found to be ecologically plausible. Plots of sooty owl response shapes (Fig. 3) show how the probability of sooty owl site occupancy varies with each of the predictor variables contained in the final GAM for that species. Fitted functions for individual variables in a multivariate model may be different to univariate fitted functions because of interactions between predictor variables. Consequently, it is important to observe response shapes for the final model to ensure that they remain ecologically reasonable in the multivariable context.

Habitat maps

A habitat map is provided for the sooty owl as an example of the type of output that can be used in conservation planning (Fig. 4). The map shows the predicted probability of sooty owl presence throughout the region. Probabilities have been classified into four categories (0–0.2, 0.21–0.5, 0.51–0.7, 0.71–1.0) for presentation. These maps also display the presence records for the species that were used in model fitting, allowing a visual assessment of predictive performance. Mapped predictions and uncertainties (based on upper and lower 95% prediction intervals) for the remaining six species are available from the authors.

Fig. 3. Partial plots of the relationship between the probability of occupancy and environmental variables included in the final sooty owl model. The X-axis represents the range of values for each environmental variable. Probabilities on the Y-axis are plotted in transformed 'logit' space so that they can be interpreted in the same way as linear regressions. Response shapes in each plot represent the relationship between each variable and the probability of sooty owl occupancy in the multivariate model context, independent of the other variables included in the model. Dashed lines represent 95% confidence intervals around the fitted response shape. Rugg500, topographic ruggedness (standard deviation in elevation) in a 500-m radius; Sowlexp 2000, the proportion of suitable vegetation (as defined by experts) in a 2000-m radius; terr1000, relative terrain position in a 1000-m radius; unmod2000, the percentage of cells in a 2000-m radius containing unmodified forest.

[Fig. 3 panels, X-axes: rain, rugg500, sowlexp2000, terr1000, unmod2000; Y-axis in each panel: logit (probability of occupancy).]

DISCUSSION

The importance of ecological realism in modelling and the role of experts

The choice of modelling methods and approach to model building and evaluation presented in this paper reflects the importance we place on maintaining ecological realism throughout the model building process. While maintaining statistical rigour is central to good model building, there is an expansive literature on the statistical nuances of model fitting, interpretation and prediction (Guisan & Zimmerman 2000). Conversely, there is much less literature available concerning the importance of maintaining ecological realism in model building (Austin 1991, 2002). The process of plotting and evaluating fitted functions throughout model building and interpretation is fundamental to ensuring the ecological realism and general credibility of the final models.

Expert opinion forms an integral part of sound model building and evaluation, irrespective of the modelling platform and theoretical approach used (Steyerberg et al. 2001b). Some modellers view expert opinion as a source of undesirable subjectivity and prefer to let the data dictate the model. However, this ignores the limitations commonly encountered with ecological data (small to medium-sized species data sets that may include biases, and many candidate predictor variables) and the related challenges in developing robust and sensible models. Expert opinion can (and should) be used to assist in the identification of candidate model variables, interactions between variables and likely response shapes (Pearce et al. 2001b; Ferrier et al. 2002a), for corroboration of a model's ecological realism (Austin 2002), for ad hoc evaluation of model predictions (National Parks & Wildlife Service 1998) and for preparation of predictive maps for use in decision making (Ferrier et al. 2002a). Experts may also be used effectively in the creation of indices that may themselves be used as candidate variables for statistical modelling (National Parks & Wildlife Service 1998). Consequently, the role of experts should be thought of as complementary to other, more data-driven methods, rather than as a competing alternative. In our case study, we emphasize the use of experts in variable reduction prior to automated variable selection. We use Harrell's (2001) rule of thumb to identify the maximum PDF that should be offered to the variable selection routine. We contend that if expert knowledge is so lacking that reducing the candidate set to m/10 is impossible, resulting models should be treated with scepticism.

The lack of an accepted model evaluation statistic for presence-only models detracts from their utility in conservation planning and increases the importance of expert evaluation. Development of robust evaluations of presence-only models is ongoing.

Interpretation of habitat maps

Threshold predictions

Fig. 4. The predicted probability of sooty owl occupancy across the LHCC region. Probabilities are represented as a grey-scale (white = 0–0.2, light grey = 0.21–0.5, medium grey = 0.51–0.8, dark grey = 0.81–1.0). The open triangles show point survey locations where the sooty owl is recorded as present. Habitat maps were constructed by creating a vector of habitat model predictions (one for every 1 ha grid cell in the landscape) in R using the ‘predict.gam’ command. The prediction vector is then written to an ascii text file in the correct dimensions for the landscape and imported to ArcInfo (or ArcView) for display.

In some instances, users of habitat maps may wish to identify a level of predicted probability of occupancy below which an area would be considered ‘unsuitable’ habitat for a given species. In general, such an interpretation of habitat maps is ill-advised unless the model that underpins the map is particularly well calibrated. However, in such instances where it is required, thresholds could be assigned in a risk-weighted manner. For example, if it is very important not to classify an area as ‘non-habitat’ when it may be used by a species, a low probability threshold (e.g. probability of occurrence = 0.1) should be specified when delineating non-habitat. By choosing a threshold of 0.1 for a well-calibrated model, it is implied that a 10% chance of a site (a grid cell of 100 m) being occupied by the species is a tolerable risk of failing to include areas that are, in fact, habitat for the species. This decision must be made on a case-by-case basis by evaluating the management costs of the two types of error (a false-negative and false-positive prediction), rather than by relying on an arbitrary threshold set by the modeller.
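One way to make the risk weighting explicit is to derive the threshold from the relative costs of the two errors. The sketch below is illustrative only and rests on two assumptions that are not part of the case study: that the model is well calibrated, and that the costs of a false negative and a false positive can each be expressed as a single number. Under those assumptions, calling a cell non-habitat has expected cost p × C_FN and calling it habitat has expected cost (1 − p) × C_FP, so a cell is treated as non-habitat only when p < C_FP / (C_FP + C_FN). The function names and cost values are hypothetical.

```python
def risk_weighted_threshold(cost_false_negative, cost_false_positive):
    """Probability below which a cell is labelled non-habitat.

    Minimises expected misclassification cost, assuming the predicted
    probabilities are well calibrated.
    """
    if cost_false_negative <= 0 or cost_false_positive <= 0:
        raise ValueError("costs must be positive")
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def classify(probabilities, threshold):
    """Label each predicted probability as habitat or non-habitat."""
    return ["habitat" if p >= threshold else "non-habitat" for p in probabilities]


# Hypothetical costs: missing real habitat is judged nine times as costly
# as protecting a cell that turns out to be unoccupied.
threshold = risk_weighted_threshold(cost_false_negative=9.0, cost_false_positive=1.0)
labels = classify([0.05, 0.20, 0.50], threshold)
print(round(threshold, 3), labels)
```

With a 9:1 cost ratio the threshold equals the 0.1 used in the example above; equal costs would give 0.5. The cost ratio, not the modeller, then carries the management judgement.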

Representing uncertainty

Many forms of uncertainty impact on maps of predicted probability of species occurrence (Elith et al. 2002). The prediction intervals available from regression models describe the part of this uncertainty associated with estimating model parameters, and this stems from inadequate data, errors in measurement and natural variation (Elith et al. 2002). There are additional uncertainties, such as model selection uncertainty and errors in explanatory variables, that are, however, not addressed by standard prediction intervals. While some methods have been developed for explicitly incorporating these forms of uncertainty in prediction intervals (Burnham & Anderson 2002; Wintle et al. 2003), there is currently very little guidance on how prediction intervals should be represented and interpreted in applications of habitat modelling and conservation planning.

Interpreting model evaluation statistics

The two statistics used for model evaluation in this study address different model applications. Specifically, model discrimination, measured by the ROC area, measures the degree to which the model successfully ranks presence sites higher than absence sites across the region in terms of predicted probability of presence. Models with satisfactory ROC areas will provide reliable ranking of areas in terms of habitat value. This mode of evaluation is relevant when the goal of a model is to rank or prioritize areas of interest in terms of their relative value as habitat for a species. This goal is different to that of accurately predicting the proportion of sites that are expected to be occupied at a given predicted probability of occurrence. For instance, at sites with a predicted probability of occurrence of 0.5, it is reasonable to expect that approximately 50% of such sites would contain the species. The degree to which this is true is described as model calibration (Miller et al. 1991) and is assessed using the logistic calibration equation. It is possible that a model with good discrimination can have poor calibration. It is also possible to improve a model’s predictive ability by adjusting the model parameters with calibration statistics (Harrell 2001) or other shrinkage methods (Hastie et al. 2001). A model with poor calibration may still be useful for prioritizing or ranking sites in terms of habitat value, but should not be used to predict the raw probability of finding a species at a given site. The choice of whether to focus on ROC areas or calibration statistics depends on the intended application of the models.
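The distinction between discrimination and calibration can be seen in a small numerical sketch. The code below computes the ROC area as the proportion of presence-absence pairs in which the presence site receives the higher prediction (ties count as one half), and compares mean predicted probability with observed prevalence as a crude calibration check. That check is a simplification and is not the logistic calibration equation used in the case study; the data and function names are hypothetical.

```python
def roc_area(predicted, observed):
    """ROC area from pairwise comparison of presence and absence sites."""
    presences = [p for p, y in zip(predicted, observed) if y == 1]
    absences = [p for p, y in zip(predicted, observed) if y == 0]
    if not presences or not absences:
        raise ValueError("need at least one presence and one absence")
    score = 0.0
    for a in presences:
        for b in absences:
            if a > b:
                score += 1.0
            elif a == b:
                score += 0.5
    return score / (len(presences) * len(absences))


def mean_prediction_gap(predicted, observed):
    """Mean predicted probability minus observed prevalence (0 is ideal)."""
    return sum(predicted) / len(predicted) - sum(observed) / len(observed)


# Hypothetical evaluation data: 1 = species recorded, 0 = not recorded.
observed = [0, 0, 1, 0, 1, 1]
predicted = [0.1, 0.2, 0.3, 0.4, 0.7, 0.9]
halved = [p / 2 for p in predicted]  # same ranking, systematically too low

print(roc_area(predicted, observed), roc_area(halved, observed))
print(mean_prediction_gap(predicted, observed), mean_prediction_gap(halved, observed))
```

Halving every prediction leaves the ranking of sites, and therefore the ROC area, unchanged, while the predictions now understate prevalence. Such a model could still rank sites but would mislead if read as raw probabilities.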

The issue of occupancy versus persistence

Models of the sort discussed in this paper make the implicit assumption that occupancy of a location implies suitability of the habitat. Van Horne (1983) identified problems associated with this assumption, and Pulliam’s (1988) ‘source-sink’ model of population dynamics formalizes its conceptual deficiencies. Pulliam’s model differentiates ‘sink’ areas, where mortality exceeds population growth, from ‘source’ areas that maintain overall population size through emigration. Similarly, if the total population is in a state of decline, it is possible that ‘remnant’ individuals may exist in unsuitable habitat that will not be occupied in the next generation. When a population is expanding its range, observations of absence of the species may falsely imply that the suitability of the habitat is low. These anomalies arise due to what is known as the ‘equilibrium’ assumption, which underpins the current static approach to wildlife habitat modelling (Austin 2002). Inclusion of disturbance history variables, dispersal barriers, competition and successional dynamics may assist in modelling such situations, though examples of such approaches are rare.

Survey design and sampling issues

This paper has focused on the choice and execution of particular modelling methods given a set of data. Issues of data quality and sampling have not been addressed. Because modelling results are only as good as the available data, the issues of data quality and sampling demand some discussion here.

Sample size

From a statistical perspective, the choice of sample size is a function of the desired precision of results. In studies where the goal is to make inference about a particular effect (e.g. the influence of time since logging on the probability of greater glider occupancy), power analysis may be used to determine the sample size required to achieve a statistically significant result for a given effect size, type I error rate and sampling variance (Burgman & Lindenmayer 1998). The relationship between statistical significance, power and model predictive performance is unclear, and there is no guarantee that a model with statistically significant terms will give good predictive performance. Unfortunately, there is no simple way to determine a priori the predictive performance of a habitat model, or to calculate a sample size that will ensure a predictive performance that is statistically significantly better than that of a null or alternative model. We could find only one attempt to address this problem in the statistical literature (Obuchowski 1994), making this an important area of future research. A simulation study that provided rules of thumb about sample sizes required to achieve a given predictive performance under a range of likely modelling scenarios would be an important contribution to the habitat modelling literature.

Data quality

A more subtle consideration is the geographical spread and environmental stratification of observations compared to the extent and environmental variability of the region for which models are to be built. We briefly touch on this issue in our case study when testing the adequacy of survey data using environmental strata. Little guidance is provided in the literature as to the minimum geographical or environmental coverage of data for statistical modelling, though it is common sense to expect that model predictions are unlikely to be reliable in environmental domains for which there are no survey data (Austin & Meyers 1996). Stratified sampling designs such as ‘GRADSECT’ (Austin & Heyligers 1989) that ensure a geographical and bioclimatic coverage of sampling locations are therefore appealing. GRADSECT theory, or newer developments such as Generalized Dissimilarity Modelling (GDM; Ferrier et al. 2002b) or a p-median criterion (Faith & Walker 1996), may also be applied to existing data sets to identify ‘gaps’ in the geographical and environmental coverage of samples.

Most statistical analyses, including the regression methods used in our case study, assume that observations are independent of each other. Survey locations that are close together are much less likely to be independent, especially when surveys target animals with large home ranges. Survey design should aim to ensure a minimum separation distance that is at least as great as the home range radius of the widest ranging species in the study. Random sampling within geographical and environmental strata assists in minimizing unwanted dependencies in observation data. Guidance on sampling theory should be sought from one of the many texts that deal with the topic in detail (e.g. Sutherland 1996; Thompson 2002).
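The minimum separation criterion described above can be checked against an existing set of survey coordinates. The following is a minimal sketch that assumes projected coordinates in metres and uses a simple greedy rule (keep a site only if it is far enough from every site already kept). The greedy rule is an illustrative choice, its result depends on the order of the sites, and it is not a substitute for a designed stratified sample. The coordinates and the 500 m radius are hypothetical.

```python
import math


def thin_sites(sites, min_separation):
    """Greedily keep sites that are at least min_separation from all kept sites."""
    kept = []
    for x, y in sites:
        if all(math.hypot(x - kx, y - ky) >= min_separation for kx, ky in kept):
            kept.append((x, y))
    return kept


# Hypothetical site coordinates (metres) and a hypothetical home range radius.
sites = [(0.0, 0.0), (300.0, 0.0), (1000.0, 0.0), (1200.0, 900.0)]
independent_sites = thin_sites(sites, min_separation=500.0)
print(independent_sites)
```

In this example the second site lies only 300 m from the first and is dropped; the remaining three sites are all more than 500 m apart.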

Technical issues and promising advances in modelling methods

Species versus community approaches

Models of the distribution of single species are not the only way to approach conservation planning questions. Single species models are important for a subset of species, such as threatened, focal or flagship species. However, landscape planning aims to conserve the biodiversity of a region, and there are too many species and too little data to achieve this through single-species modelling. Therefore, there is a role for community-level modelling (Ferrier et al. 2002b). One example of community-level modelling is GDM, already mentioned in the context of sampling; this models compositional dissimilarity across the landscape. This could then contribute to a conservation plan that aimed to span the range of species patterns in the landscape. A further role for methods based on data from many species is to use them for modelling rare species, or species in data sets with relatively few records per species. In such situations there may be insufficient data for developing a robust single-species model. GDMs, canonical correspondence analysis (CCA) or multivariate adaptive regression splines (MARS) may be useful for this purpose.

Spatial autocorrelation

Spatial autocorrelation in wildlife observation data arises when environmental processes and patterns that influence the spatial distribution of wildlife are themselves spatially structured and/or because the species is subject to demographic processes, territoriality or dispersal limitations causing spatial dependence (contagion or dispersion effects). Demographic and environmental processes underlying spatial patterns in wildlife distributions are usually poorly understood and therefore difficult to incorporate in model fitting. Consequently, model residuals are often spatially correlated (i.e. not independent), violating one of the basic assumptions of regression modelling. In practice, non-independence usually results in underestimation of standard errors and overestimation of the importance of habitat variables (Legendre & Fortin 1989). Methods such as autologistic regression (Augustin et al. 1996), generalized estimating equations (Albert & McShane 1995) and geographically weighted regression (Fotheringham et al. 2002) may be used to incorporate spatial autocorrelation in habitat analyses, but they are still not incorporated in standard statistical software and are technically demanding to implement.
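Before turning to those more demanding methods, a simple diagnostic for spatially correlated residuals is Moran's I. The sketch below is not part of the case study; it assumes binary neighbour weights (1 for pairs of sites within a chosen distance, 0 otherwise) and hypothetical coordinates and residuals. Values near zero are consistent with spatial independence, positive values with clustering of like residuals, and negative values with dispersion.

```python
import math


def morans_i(coords, residuals, max_distance):
    """Moran's I of residuals using binary weights for sites within max_distance."""
    n = len(residuals)
    mean = sum(residuals) / n
    z = [r - mean for r in residuals]
    weight_sum = 0.0
    cross_product = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= max_distance:
                weight_sum += 1.0
                cross_product += z[i] * z[j]
    variance_sum = sum(v * v for v in z)
    if weight_sum == 0 or variance_sum == 0:
        raise ValueError("no neighbouring pairs or no variation in residuals")
    return (n / weight_sum) * cross_product / variance_sum


# Hypothetical transect of four sites, one distance unit apart.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
clustered = morans_i(coords, [1.0, 1.0, -1.0, -1.0], max_distance=1.0)
alternating = morans_i(coords, [1.0, -1.0, 1.0, -1.0], max_distance=1.0)
print(clustered, alternating)
```

Residuals that share sign with their neighbours give a positive index, whereas strictly alternating residuals give a negative one. A formal test would also need a reference distribution (for example by permutation), which is omitted here.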

Model selection and model uncertainty

The standard approach to representing prediction uncertainty involves the calculation of prediction (or confidence) intervals that incorporate uncertainty about parameter estimates. However, this approach to prediction implicitly assumes that the model chosen is the best available representation of the truth and effectively ignores model selection uncertainty, resulting in overconfident predictions. Automated variable selection algorithms exacerbate this problem by providing a method for dredging through many candidate predictors in search of an explanatory model (Chatfield 1995; Hoeting et al. 1999). One solution is to completely avoid automated model selection and to analyse only a small set of ‘a priori’ models (Burnham & Anderson 2002). This approach may be practical when strong prior knowledge exists and where the number of plausible models is small. However, if prediction is the primary goal and many possible models exist, some level of automated variable selection is often desirable. Model averaging approaches, including Bayesian model averaging, have been promoted in a range of disciplines as a means of incorporating model selection uncertainty into statistical inference and prediction (Hoeting et al. 1999) and have been applied in some ecological examples (Burnham & Anderson 2002; Wintle et al. 2003).
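The idea of averaging predictions over several plausible models can be sketched with Akaike weights, in the spirit of the information-theoretic approach of Burnham & Anderson (2002). This is a simpler stand-in for full Bayesian model averaging, not an implementation of it, and the AIC values and predictions below are hypothetical.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC values to weights that sum to one."""
    best = min(aic_values)
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


def average_predictions(predictions_by_model, weights):
    """Weighted average of each site's prediction across models."""
    n_sites = len(predictions_by_model[0])
    return [
        sum(w * preds[site] for w, preds in zip(weights, predictions_by_model))
        for site in range(n_sites)
    ]


# Hypothetical AIC values for three candidate models and their
# predicted probabilities of occupancy at two sites.
weights = akaike_weights([100.0, 102.0, 110.0])
averaged = average_predictions([[0.2, 0.8], [0.4, 0.6], [0.9, 0.1]], weights)
print([round(w, 3) for w in weights], [round(p, 3) for p in averaged])
```

The averaged prediction is dominated by the two best-supported models, while the spread of predictions across models gives a rough indication of model selection uncertainty that a single chosen model would hide.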

Estimating and incorporating detectability in habitat models

Unless the probability of detecting a species when it is present is equal to 1, false negative observation errors will occur in species surveys. The probability of detecting the presence of the case study species in any single standard survey based on spot-lighting and call elicitation has been found to be very low (Pr[detection|presence] ~ 0.12–0.45; Wintle et al. in press), making the reliability of absence data a potentially serious form of uncertainty in our case study. Recent studies have demonstrated the negative impact that false-negative observation error may have on species-habitat analyses (Tyre et al. 2003), meta-population models (Moilanen 2002) and monitoring studies (MacKenzie et al. 2002). Recently developed techniques for incorporating detectability in model estimation (MacKenzie et al. 2002; Tyre et al. 2003) reduce bias in model estimation brought about by false absences, though little effort has been invested in testing the relative predictive performance of such models.
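The practical consequence of low detectability can be illustrated with simple arithmetic. If each visit detects a present species with probability p, and visits are assumed to be independent with constant p (a simplifying assumption, not a result from the case study), the probability of recording a false absence after n visits is (1 − p)^n. The sketch below applies this to the range of single-visit detection probabilities quoted above; the 5% tolerance is a hypothetical choice.

```python
import math


def false_absence_probability(p_detect, n_visits):
    """Probability that a present species is missed on every one of n visits."""
    return (1.0 - p_detect) ** n_visits


def visits_needed(p_detect, max_false_absence):
    """Smallest number of visits keeping the false absence rate at or below the limit."""
    if not 0.0 < p_detect < 1.0:
        raise ValueError("p_detect must be strictly between 0 and 1")
    if not 0.0 < max_false_absence < 1.0:
        raise ValueError("max_false_absence must be strictly between 0 and 1")
    return math.ceil(math.log(max_false_absence) / math.log(1.0 - p_detect))


for p in (0.12, 0.45):
    print(p, visits_needed(p, 0.05))
```

Under these assumptions, a species detected with probability 0.45 per visit would need 6 visits to bring the false absence rate below 5%, and one detected with probability 0.12 would need 24, which shows why single-visit absences are weak evidence of true absence.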

Incorporating biotic interactions

Despite the prevalence of interspecific competition as a key concept in community ecology (Diamond 1975), very little has been attempted by way of incorporating interspecific competition in distribution models (for one exception see Leathwick & Austin 2001). Regression models such as those described by Leathwick and Austin (2001) could be applied to fauna species. Future development of ‘loop analysis’ (Levins 1975) or systems of simultaneous regressions (Guisan & Zimmerman 2000) may facilitate spatial models of wildlife distributions that incorporate competitive interactions.

The GLMs and GAMs presented here are relatively simple, though suited to the conservation problem and available data. There is scope for greater sophistication in the modelling method where a high level of modelling expertise exists, though we believe our recommended approach represents a reasonable trade-off between practicality and rigour.

Model improvement and adaptive management

Due to environmental change, natural and anthropogenic disturbance, and population stochasticity, all predictive models will become redundant if they are not updated through time. We recommend that models be applied in an adaptive framework (Walters & Holling 1990), allowing immediate application of models in management but ensuring a commitment to continual improvement of models through future data collections and refinements to modelling methods.

Future data collection will be central to rigorous model testing and model refinements. New surveys should be targeted to fill gaps in current data sets (Fig. 3). Particular attention should be paid to the collection of high quality observation data based on multiple site visits aimed at minimizing false absences (Wintle et al. 2004). Some important environmental variables, including those based on forest growth stage data, were not available for model fitting in the LHCC region. It is likely that models would have better predictive accuracy if forest growth stage information became available and was incorporated in models.

CONCLUSION

Habitat models are now a widely used tool in public land conservation planning. Systematic conservation planning in the urban fringe will require such tools to ensure transparency and repeatability of planning outcomes. However, access to statistical and ecological expertise for urban conservation planning is likely to be limited when compared with the public land planning processes that have taken place in Australia. It is therefore important that we continue to develop and refine planning tools, including habitat modelling methods, with an emphasis on statistical and ecological rigour and simplicity.

ACKNOWLEDGEMENTS

This work was instigated by the LHCCREMS organization (now Hunter Councils). Steven House (EcoLogical), Michael Murray (Forest Fauna Surveys P/L), Philip Gleeson (NPWS), Daniel Connolly (NPWS), Jill Smith (NPWS), Peter Bone (NPWS), Peter Ewin (NPWS), Mick Andren (NPWS) and Rod Kavanagh (SFNSW) provided data, expert guidance on model construction and valuable insights. Mark Burgman and Kuniko Yamada (University of Melbourne) helped throughout the project. Comments from Terry Walshe (University of Western Australia), David Tierney and Robbie Economos improved the manuscript substantially. Discussions with Trevor Hastie and John Leathwick helped clarify bootstrapping issues. Two anonymous reviewers provided constructive feedback. Sophie Powrie, Kirsty Winter and Meredith Liang (LHCCREMS) provided excellent support and useful feedback throughout the project.

REFERENCES

Agresti A. (1996) An Introduction to Categorical Data Analysis. John Wiley and Sons, New York.

Akaike H. (1973) Information theory and an extension of the maximum likelihood. In: Proceedings of the 2nd International Symposium on Information Theory (eds B. N. Petrov & F. Csáki) pp. 267–81. Akadémiai Kiadó, Budapest.

Albert P. S. & McShane L. M. (1995) A generalized estimating equations approach for spatially correlated data: applications to the analysis of neuroimaging data. Biometrics 51, 627–38.

Augustin N. H., Mugglestone M. A. & Buckland S. T. (1996) An autologistic model for the spatial distribution of wildlife. J. Appl. Ecol. 33, 339–47.

Austin M. P. (1991) Vegetation: data collection and analysis. In: Nature Conservation: Cost Effective Biological Surveys and Data Analysis (eds C. R. Margules & M. P. Austin) pp. 37–41. CSIRO, Canberra.

Austin M. P. (2002) Spatial prediction of species distribution: an interface between ecological theory and statistical modelling. Ecol. Model. 157, 101–18.

Austin M. P. & Heyligers P. C. (1989) Vegetation survey design for conservation: GRADSECT sampling of forests in north-eastern NSW. Biol. Conserv. 50, 13–32.

Austin M. P. & Meyers J. A. (1996) Current approaches to modelling the environmental niche of Eucalypts: implications for management of forest biodiversity. For. Ecol. Manage. 85, 95–106.

Bambar D. (1975) The area above the ordinal dominance graph and the area below the receiver operating characteristics graph. J. Math. Psych. 12, 387–415.

Belsley D. A., Kuh E. & Welsch R. E. (1980) Regression Diagnostics. John Wiley and Sons, New York.

Bio A. M. F., Alkemande R. & Barendregt A. (1998) Determining alternative models for vegetation response analysis – a non-parametric approach. J. Veg. Sci. 9, 5–16.

Breiman L., Friedman J. H., Olshen R. A. & Stone C. J. (1984) Classification and Regression Trees. Wadsworth International Group, Belmont.

Breininger D. R., Larson V. L., Duncan B. W. & Smith R. B. (1998) Linking habitat suitability to demographic success in Florida scrub-jays. Wild. Bull. 26, 118–28.

Burgman M. A. & Lindenmayer D. B. (1998) Conservation Biology for the Australian Environment. Surrey Beatty and Sons, Chipping Norton.

Burnham K. P. & Anderson D. R. (2002) Model Selection and Multimodel Inference: a Practical Information-Theoretic Approach. Springer, New York.

Burgman M. A., Breininger D. R., Duncan B. W. & Ferson S. (2001) Setting reliability bounds on Habitat Suitability Indices. Ecol. Appl. 11, 70–8.

Carpenter G., Gillison A. N. & Winter J. (1993) DOMAIN: a flexible modelling procedure for mapping potential distributions of plants and animals. Biodivers. Conserv. 2, 667–80.

Chatfield C. (1995) Model uncertainty, data mining and statistical inference. J. Roy. Stat. Soc. Ser. A Stat. 158, 419–66.

Commonwealth of Australia (2003) Terms of Reference: Standing Committee on Environment and Heritage Inquiry into Sustainable Cities. Parliament of Australia, House of Representatives, Canberra.

Derksen S. & Keselman H. J. (1992) Backward, forward and stepwise automated subset selection algorithms: frequency of obtaining authentic and noise variables. Br. J. Math. Stat. Psych. 45, 265–82.

Diamond J. M. (1975) Assembly of species communities. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 342–444. Harvard University Press, Cambridge.

Ecotone Ecological Consultants (2001) Lower Hunter Central Coast Regional Biodiversity Strategy: Fauna survey and mapping project – Module 1: Fauna Surveys. Ecotone Ecological Consultants for LHCCREMS, Waratah, pp. 43.

Edgar R. & Belcher C. (1995) Spotted-tailed Quoll. In: The Mammals of Australia (ed. R. Strahan) pp. 67–8. Reed Books, Chatswood.

Efron B. & Tibshirani R. (1997) Improvements on cross-validation: the .632+ bootstrap method. J. Am. Stat. Ass. 92, 548–60.

Elith J. (2000) Quantitative methods for modelling species habitat: comparative performance and an application to Australian plants. In: Quantitative Methods for Conservation Biology (eds S. Ferson & M. A. Burgman) pp. 39–58. Springer-Verlag, New York.

Elith J. & Burgman M. A. (2002) Predictions and their validation: rare plants in the Central Highlands, Victoria, Australia. In: Predicting Species Occurrences: Issues of Accuracy and Scale (eds J. M. Scott, P. J. Heglund, M. L. Morrison, M. G. Raphael, W. A. Wall & F. B. Samson) pp. 303–14. Island Press, Covelo.

Elith J. & Burgman M. A. (2003) Habitat models for PVA. In: Population Viability in Plants (eds C. A. Brigham & M. W. Schwartz) pp. 203–35. Springer-Verlag, New York.

Elith J., Burgman M. A. & Regan H. M. (2002) Mapping epistemic uncertainties and vague concepts in predictions of species distribution. Ecol. Model. 157, 313–29.

Ennis M., Hinton G., Naylor D., Revow M. & Tibshirani R. (1998) A comparison of statistical learning methods on the GUSTO database. Stat. Med. 17, 2501–8.

ESRI (1997) Arcinfo 7.2. Environmental Systems Research Institute Inc., Redlands.

Faith D. P. & Walker P. A. (1996) Environmental diversity: on the best-possible use of surrogate data for assessing the relative biodiversity of sets of areas. Biodivers. Conserv. 5, 399–415.

Ferrier S. & Watson G. (1997) An Evaluation of the Effectiveness of Environmental Surrogates and Modelling Techniques in Predicting the Distribution of Biological Diversity. Department of Environment, Sports and Territories, Commonwealth of Australia, and NSW National Parks and Wildlife Service, Canberra.

Ferrier S., Watson G., Pearce J. & Drielsma M. (2002a) Extended statistical approaches to modelling spatial pattern in biodiversity in north-east New South Wales. I. Species-level modelling. Biodivers. Conserv. 11, 2275–307.

Ferrier S., Drielsma M., Manion G. & Watson G. (2002b) Extended statistical approaches to modelling spatial pattern in biodiversity: the north-east New South Wales experience. II. Community-level modelling. Biodivers. Conserv. 11, 2309–38.

Fielding A. H. & Bell J. F. (1997) A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ. Conserv. 24, 38–49.

Fotheringham A. S., Brunsdon C. & Charlton M. (2002) Geographically Weighted Regression: the Analysis of Spatially Varying Relationships. John Wiley and Sons, London.

Franklin J. (1995) Predictive vegetation mapping: geographic modeling of biospatial patterns in relation to environmental gradients. Prog. Phys. Geog. 19, 494–519.

Friedman J. H. (1991) Multivariate adaptive regression splines (with discussion). Ann. Stat. 19, 1–141.

Goldingay R. L. & Kavanagh R. P. (1993) Home-range estimates and habitat of the yellow-bellied glider (Petaurus australis) at Waratah Creek, New-South-Wales. Wildl. Res. 20, 387–404.

Graham C. H., Ferrier S., Huettman F., Moritz C. & Peterson A. T. (2004) New developments in museum-based informatics and applications in biodiversity analysis. Trends Ecol. Evol. 19, 497–503.

Guisan A. & Harrell F. E. (2000) Ordinal response regression models in ecology. J. Veg. Sci. 11, 617–26.

Guisan A. & Zimmerman N. E. (2000) Predictive habitat distribution models in ecology. Ecol. Model. 135, 147–86.

Hanley J. A. & McNeil B. J. (1982) The meaning and use of the area under a Receiver Operating Characteristic (ROC) curve. Radiology 143, 29–36.

Harrell F. E. (2001) Regression Modelling Strategies with Application to Linear Models, Logistic Regression and Survival Analysis. Springer, New York.

Harrell F. E., Lee K. L. & Mark D. B. (1996) Multivariate prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat. Med. 15, 361–87.

Hastie T. & Tibshirani R. (1990) Generalized additive models. Monographs on statistics and applied probability (eds D. R. Cox, D. V. Hinkley, D. Rubin & B. W. Silverman). Chapman & Hall, London.

Hastie T., Tibshirani R. & Friedman J. H. (2001) The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer-Verlag, New York.

Hirzel A. H. (2001) When GIS come to life. Linking Landscape- and Population Ecology for Large Population Management Modelling: the Case of Ibex (Capra Ibex) in Switzerland. The University of Lausanne, Lausanne.

Hoeting J. A., Madigan D., Raftery A. E. & Volinsky C. T. (1999) Bayesian model averaging: a tutorial. Stat. Sci. 14, 382–401.

Houlder D. J., Hutchinson M. F., Nix H. A. & McMahon J. P. (1999) ANUCLIM User Guide, Version 5.0. Centre for Resource and Environmental Studies, Australian National University, Canberra.

Kadmon R., Farber O. & Danin A. (2003) A systematic analysis of factors affecting the performance of climatic envelope models. Ecol. Appl. 13, 853–67.

Kavanagh R. P. (2002) Conservation and management of large forest owls in southeastern Australia. In: The Ecology and Conservation of Owls (eds I. Newton, R. Kavanagh, J. Olson & I. Taylor) pp. 201–19. CSIRO, Melbourne.

Lambeck R. J. (1997) Focal species: a multi-species umbrella for nature conservation. Conserv. Biol. 11, 849–56.

Leathwick J. R. & Austin M. P. (2001) Competitive interactions between tree species in New Zealand’s old-growth indigenous forests. Ecology 82, 2560–73.

Legendre P. & Fortin M. J. (1989) Spatial pattern and ecological analysis. Vegetatio 80, 107–38.

Legendre L. & Legendre P. (1998) Numerical Ecology. Elsevier, New York.

Levins R. (1975) Evolution of communities near equilibrium. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 16–50. Harvard University Press, Cambridge.

LHCCREMS (2004) Lower Hunter Central Coast Regional Biodiversity Strategy 2004 (4 Volume). Hunter Councils on behalf of the Lower Hunter Central Coast Regional Environmental Strategy, Thornton.

Li W., Wang Z., Ma Z. & Tang H. (1999) Designing the core zone in a biosphere reserve based on suitable habitats: Yancheng Biosphere Reserve and the red crowned crane (Grus japonensis). Biol. Conserv. 90, 167–73.

Loyn R. H., McNabb E. G., Volodina L. & Willig R. (2001) Modelling landscape distributions of large forest owls as applied to managing forests in north-east Victoria, Australia. Biol. Conserv. 97, 361–76.

McCullagh P. & Nelder J. A. (1989) Generalized Linear Models. Chapman & Hall, London.

MacKenzie D. I., Nichols J. D., Lachman G. B., Droege S., Royle J. A. & Langtimm C. A. (2002) Estimating site occupancy rates when detection probabilities are less than one. Ecology 83, 2248–55.

Manel S., Dias J. M., Buckton S. T. & Ormerod S. J. (1999a) Alternative methods for predicting species distribution: an illustration with Himalayan river birds. J. Appl. Ecol. 36, 734–47.

Manel S., Dias J. M. & Ormerod S. J. (1999b) Comparing discriminant analysis, neural networks and logistic regression for predicting species distributions: a case study with a Himalayan river bird. Ecol. Model. 120, 337–47.

Miller M. E., Hui S. L. & Tierney W. M. (1991) Validation techniques for logistic regression models. Stat. Med. 10, 1213–26.

Moilanen A. (2002) Implications of empirical data quality to metapopulation model parameter estimation and application. Oikos 96, 516–30.

Moisen G. G. & Frescino T. S. (2002) Comparing five modeling techniques for predicting forest characteristics. Ecol. Model. 157, 209–25.

National Parks & Wildlife Service (1998) Eden Fauna Modelling. New South Wales National Parks & Wildlife Service, New South Wales Regional Forest Agreement Steering Committee, Canberra.

National Parks and Wildlife Service (2000) Modelling Areas of Habitat Significance for Fauna and Flora in the Southern CRA. NPWS NSW, Canberra.

Nix H. (1986) A biogeographic analysis of Australian elapid snakes. In: Atlas of Elapid Snakes of Australia (ed. R. Longmore) pp. 4–15. Australian Government Publishing Service, Canberra.

Obuchowski N. A. (1994) Computing sample size for receiver operating characteristic studies. Invest. Radiol. 29, 238–43.

Pearce J. & Ferrier S. (2000) Evaluating the predictive performance of habitat models developed using logistic regression. Ecol. Model. 133, 225–45.

Pearce J. & Ferrier S. (2001) The practical value of modelling relative abundance of species for regional conservation planning: a case study. Biol. Conserv. 98, 33–43.

Pearce J., Ferrier S. & Scotts D. (2001a) An evaluation of the predictive performance of distributional models for flora and fauna in north-east New South Wales. J. Environ. Manage. 62, 171–84.

Pearce J. L., Cherry K., Drielsma M., Ferrier S. & Whish G. (2001b) Incorporating expert knowledge and fine-scale vegetation mapping into statistical modelling of faunal distribution. J. Appl. Ecol. 38, 412–24.

Phillips S. J., Anderson R. P. & Schapire R. E. (in press) Maximum entropy modeling of species geographic distributions. Ecol. Model.

Pressey R. L., Possingham H. P., Logan V. S., Day J. R. & Williams P. H. (1999) Effects of data characteristics on the results of reserve selection algorithms. J. Biogeogr. 26, 179–91.

Pulliam H. R. (1988) Sources, sinks and population regulation. Am. Nat. 132, 652–61.

R Development Core Team (2004) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org

Reading R. P., Clark T. A., Seebeck J. H. & Pearce J. (1996) Habitat Suitability Index model for the eastern barred bandicoot, Perameles gunnii. Wildl. Res. 23, 221–35.

Reed P. C. & Lunney D. (1990) Habitat loss: the key problem for the long-term survival of koalas in New South Wales. In: Koala Summit: Managing Koalas in New South Wales (eds D. Lunney, C. A. Urquhart & P. C. Reed) pp. 9–31. University of Sydney, Sydney.

Ripley B. D. (1995) Pattern Recognition and Neural Networks – a Statistical Approach. Cambridge University Press, Cambridge.

Russel R. (1995) Yellow-bellied glider. In: The Mammals of Australia (ed. R. Strahan) pp. 226–8. Reed Books, Chatswood.

Steyerberg E. W., Eijkemans M. J. C., Harrell F. E. & Habbema J. D. F. (2001a) Prognostic modeling with logistic regression analysis: in search of a sensible strategy in small data sets. Med. Decis. Making 21, 45–56.

Steyerberg E. W., Harrell F. E., Borsboom G. J. J. M., Eijkemans M. J. C., Vergouwe Y. & Habbema J. D. F. (2001b) Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J. Clin. Epidemiol. 54, 774–81.

Stockwell D. & Peters D. (1999) The GARP modelling system: problems and solutions to automated spatial prediction. Int. J. Geogr. Inf. Sci. 13, 143–58.

Sutherland W. J. (1996) Ecological Census Techniques: a Handbook, 1st edn. Cambridge University Press, Cambridge.

ter Braak C. J. F. (1986) Canonical Correspondence Analysis: a new eigenvector technique for multivariate direct gradient analysis. Ecology 67, 1167–79.

Thompson S. K. (2002) Sampling, 2nd edn. Wiley, New York.

Tyre A. J., Tenhumberg B., Field S. A., Possingham H. P., Niejalke D. & Parris K. (2003) Improving precision and reducing bias in biological surveys by estimating false negative error rates in presence-absence data. Ecol. Appl. 13, 1790–801.

Van Horne B. (1983) Density as a misleading indicator of habitat quality. J. Wildl. Man. 47, 893–901.

Van Horne B. & Wiens J. A. (1991) Forest Bird Habitat Suitability Models and the Development of General Habitat Models. United States Department of the Interior, Fish and Wildlife Service, Washington DC.

Venables W. N. & Ripley B. D. (2003) Modern Applied Statistics with S, 4th edn. Springer, New York.

Walters C. & Holling C. S. (1990) Large-scale management experiments and learning by doing. Ecology 71, 2060–8.

Wintle B. A., McCarthy M. A., Volinsky C. T. & Kavanagh R. P. (2003) The use of Bayesian Model Averaging to better represent the uncertainty in ecological models. Conserv. Biol. 17, 1579–90.

Wintle B. A., Elith R. J., Yamada K. & Burgman M. A. (2004) LHCCREMS fauna survey and mapping project. Module 2: Habitat modelling and conservation requirements. Lower Hunter & Central Coast Regional Environmental Management Strategy, Callaghan.

Wintle B. A., McCarthy M. A., Parris K. M. & Burgman M. A. (2004) Precision and bias of methods for estimating point survey detection probabilities. Ecol. Appl. 14, 703–12.

Wilson K. A., Westphal M. I., Possingham H. P. & Elith J. (2005) Sensitivity of conservation planning to different approaches to using species distribution data. Biol. Conserv. 122, 99–112.

Wintle B. A., Burgman M. A., McCarthy M. A. & Kavanagh R. P. (in press) The magnitude and management consequences of false negative observation error in surveys of arboreal marsupials and large forest owls. J. Wildl. Man.

Worton B. J. (1989) Kernel methods for estimating the utilization distribution in home-range studies. Ecology 70, 164–8.

Yencken D. & Wilkinson D. (2000) Resetting the Compass: Australia’s Journey Towards Sustainability. CSIRO Publishing, Melbourne.

Zaniewski A. E., Lehmann A. & Overton J. M. (2002) Predicting species distribution using presence-only data: a case study of native New Zealand ferns. Ecol. Model. 157, 261–80.

APPENDIX 1

Method for .632+ bootstrap evaluation, after Harrell et al. (1996) and Efron and Tibshirani (1997).

1. Develop model on all n observations.
2. Calculate the statistic(s) of choice for evaluation on the same data (i.e. the training data) – call this Stat_app because it is the apparent value of the statistic.
3. Take a bootstrap sample, i.e. a sample of size n, with replacement, of rows of the data matrix. Keep track of which sites are in the bootstrap sample and which are excluded.
4. Fit the model on the bootstrap sample (using the same methods as used on the full set).
5. Compute the statistic on the bootstrap data set (observations vs. fitted values) and call it Stat_boot.
6. Also compute the statistic on a version of the bootstrap data where the observations are randomized (Stat_permute).
7. Use the bootstrap model to predict to the excluded data set; calculate the statistic on these predictions: Stat_excl.
8. Use Stat_boot, Stat_permute and Stat_excl to calculate the amount of overfitting, the relative overfitting rate, and weights that are then used to make a best estimate of predictive performance, Stat_best_est. This statistic puts most emphasis on predictions to the excluded data, particularly when the model is overfitted (i.e. when Stat_boot – Stat_excl is large). For details of this step, see Steyerberg et al. (2001b).
9. Measure how optimistic the fit on the bootstrap sample was: O = Stat_boot – Stat_best_est.
10. Repeat steps 3–7 100–200 times.
11. Calculate an average optimism, Ō.
12. Use Ō to correct Stat_app for its optimism: Stat_app – Ō. This is a near unbiased estimate of the expected value of the external predictive performance of the process that generated Stat_app.
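The steps of Appendix 1 can be sketched in code. The following is a minimal illustration only, not the authors' R implementation. It assumes a single-predictor logistic model, the ROC area as the evaluation statistic, and the standard .632+ weighting of Efron and Tibshirani (1997), w = 0.632 / (1 − 0.368 R), adapted to a statistic for which larger values are better. The function names and the simulated data are hypothetical.

```python
import math
import random

def fit_logistic(x, y, iterations=25):
    """Single-predictor logistic regression by Newton-Raphson; returns (a, b)."""
    a = b = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            eta = max(-30.0, min(30.0, a + b * xi))  # clamp to avoid overflow
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            g0 += yi - mu
            g1 += (yi - mu) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def predict(model, x):
    a, b = model
    return [1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, a + b * xi)))) for xi in x]

def roc_area(y, p):
    """ROC area as the share of presence-absence pairs ranked correctly."""
    pres = [pi for yi, pi in zip(y, p) if yi == 1]
    absn = [pi for yi, pi in zip(y, p) if yi == 0]
    if not pres or not absn:
        return None  # undefined without both classes
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pres for b in absn)
    return wins / (len(pres) * len(absn))

def bootstrap_632plus(x, y, n_boot=100, seed=1):
    rng = random.Random(seed)
    n = len(y)
    stat_app = roc_area(y, predict(fit_logistic(x, y), x))    # steps 1-2
    optimism = []
    for _ in range(n_boot):                                    # step 10
        idx = [rng.randrange(n) for _ in range(n)]             # step 3
        chosen = set(idx)
        out = [i for i in range(n) if i not in chosen]
        xb, yb = [x[i] for i in idx], [y[i] for i in idx]
        model = fit_logistic(xb, yb)                           # step 4
        pb = predict(model, xb)
        stat_boot = roc_area(yb, pb)                           # step 5
        shuffled = yb[:]
        rng.shuffle(shuffled)
        stat_perm = roc_area(shuffled, pb)                     # step 6
        stat_excl = roc_area([y[i] for i in out],
                             predict(model, [x[i] for i in out]))  # step 7
        if None in (stat_boot, stat_perm, stat_excl):
            continue  # skip replicates where the statistic is undefined
        gap = stat_boot - stat_perm                            # step 8
        r = (stat_boot - stat_excl) / gap if gap > 0 else 0.0
        r = max(0.0, min(1.0, r))                              # relative overfitting
        w = 0.632 / (1.0 - 0.368 * r)
        stat_best = (1.0 - w) * stat_boot + w * stat_excl
        optimism.append(stat_boot - stat_best)                 # step 9
    if not optimism:
        return stat_app, stat_app
    mean_optimism = sum(optimism) / len(optimism)              # step 11
    return stat_app, stat_app - mean_optimism                  # step 12

# Simulated (hypothetical) survey data: one predictor, 100 sites.
data_rng = random.Random(0)
x = [data_rng.uniform(-2.0, 2.0) for _ in range(100)]
y = [1 if data_rng.random() < 1.0 / (1.0 + math.exp(-1.5 * xi)) else 0 for xi in x]
apparent, corrected = bootstrap_632plus(x, y)
print(round(apparent, 3), round(corrected, 3))
```

The weight w moves from 0.632 towards 1 as the relative overfitting rate R approaches 1, which is how the estimate comes to rely mostly on the excluded data when the model is overfitted. In a real application the same wrapper would be run around the full model-building procedure, including variable selection.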


tion are available. The backward selection algorithm is generally preferred to the other standard automated methods because it generally performs better in the presence of collinear candidate variables and because it requires consideration of the full model fit (Harrell 2001). There is a substantial body of literature dedicated to model selection issues, and automated algorithms have been criticized (Chatfield 1995; Hoeting et al. 1999; Harrell 2001). However, the alternatives tend to be complicated. We have chosen to use a packaged model selection algorithm as a compromise between practicality and sophistication. We reiterate that expert reduction of predictor variables prior to final model selection is critical to the successful application of automated variable selection routines (Harrell 2001).

Model evaluation statistics

Presence–absence models were evaluated using two statistics: (i) the area under the receiver operating characteristic (ROC) curve (Hanley & McNeil 1982) (ROC area), closely related to the Mann–Whitney U-statistic; and (ii) Miller’s calibration slope (Miller et al. 1991; Pearce & Ferrier 2000). The ROC area evaluates a model’s ability to discriminate between presence and absence sites and is therefore referred to as a measure of model (or predictive) ‘discrimination’ (Pearce & Ferrier 2000). This provides an indication of the usefulness of the models for prioritizing areas in terms of their relative importance as habitat for the particular species. The ROC area ranges from 0 to 1, where a score of 1 implies perfect discrimination and a score of 0.5 implies predictive discrimination that is no better than a random guess (Bamber 1975). The actual value of the ROC area has a straightforward interpretation: it is the probability that, for a randomly selected pair of presence–absence observations derived from field surveys, the model prediction for presence will be greater than the prediction for absence.
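The pairwise interpretation of the ROC area can be computed directly. The sketch below is illustrative only: the observations and predictions are hypothetical, and tied predictions are counted as half a correct ranking, the usual Mann–Whitney convention.

```python
def roc_area(observed, predicted):
    """Share of presence-absence pairs in which the presence site
    receives the higher prediction (ties count as half)."""
    pres = [p for y, p in zip(observed, predicted) if y == 1]
    absn = [p for y, p in zip(observed, predicted) if y == 0]
    if not pres or not absn:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for p in pres:
        for a in absn:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(pres) * len(absn))

# Hypothetical survey outcomes (1 = present, 0 = absent) and model predictions.
observed = [1, 1, 1, 0, 0, 0, 0]
predicted = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1, 0.3]
print(roc_area(observed, predicted))
```

For these seven sites, 9.5 of the 12 presence–absence pairs are ranked correctly, giving a ROC area of about 0.79.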

Miller’s calibration statistics (MCS) evaluate the ability of a model to correctly predict the proportion of sites with a given environmental profile that will be occupied. MCS are derived from a logistic regression of observations on the logit of predicted probabilities. The rationale is that the slope of the regression would be equal to one and the regression intercept would equal zero if the predictions from the model were perfectly calibrated (Harrell 2001; Pearce & Ferrier 2000). The model is known as the ‘logistic calibration equation’. The two calibration statistics are literally the logistic regression slope (‘calibration slope’) and intercept (‘calibration intercept’), though it is common just to report the calibration slope.
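As a rough illustration of the logistic calibration equation, the sketch below regresses observations on the logit of the predicted probabilities using a hand-written Newton-Raphson fit. The function names and the grouped example data are hypothetical. In practice a standard GLM routine would be used instead, and predictions of exactly 0 or 1 would need to be handled before taking logits.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def calibration_stats(observed, predicted, iterations=50):
    """Fit observed ~ a + b * logit(predicted); return (intercept a, slope b)."""
    x = [logit(p) for p in predicted]
    a, b = 0.0, 1.0  # start at perfect calibration
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, observed):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu
            g1 += (yi - mu) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b

def grouped(p, n_sites, n_present):
    """n_sites hypothetical sites sharing prediction p, n_present of them occupied."""
    return [(p, 1)] * n_present + [(p, 0)] * (n_sites - n_present)

# An overconfident model: predictions of 0.1 and 0.9 where 20% and 80% are occupied.
data = grouped(0.1, 10, 2) + grouped(0.5, 10, 5) + grouped(0.9, 10, 8)
predicted = [p for p, _ in data]
observed = [y for _, y in data]
intercept, slope = calibration_stats(observed, predicted)
print(round(intercept, 3), round(slope, 3))
```

Here the predictions are more extreme than the observed occupancy rates, so the calibration slope is about 0.63, which is below one, while the intercept stays at zero because the example is symmetric.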

The presence-background models were evaluated qualitatively. The statistics available for evaluating presence-only models are limited, and a fair comparison with the presence–absence models was not possible here.

Table 2. Abbreviated names and definitions of mapped environmental data used as candidate predictor variables for inclusion in habitat models. All environmental data were available in raster format with 100 m (side length) grid cell size.

| Candidate variable | Definition |
|---|---|
| Temp | Mean annual temperature, derived from ANUCLIM |
| Cold | Mean temperature of the coldest period, derived from ANUCLIM |
| Rain | Mean annual rainfall, derived from ANUCLIM |
| Dry2000 | The percentage of cells in a 2000-m radius containing dry forest |
| Rf2000 | The percentage of cells in a 2000-m radius containing rainforest |
| Solar | The solar radiation index of a cell, derived from ANUCLIM |
| Elev | The elevation of a cell (in metres) above sea level |
| Rugg250 (500, 1000) | Topographic ruggedness (standard deviation in elevation) in a 250-m, 500-m and 1000-m radius |
| Terr250 (500, 1000) | Relative terrain position in a 250-m, 500-m and 1000-m radius |
| Topo | The topographic position of a cell, ranging from gully to ridge top (0–100) |
| Unmod500 | The percentage of cells in a 500-m radius containing unmodified forest |
| Unmod2000 | The percentage of cells in a 2000-m radius containing unmodified forest |
| Wet500 | The percentage of cells in a 500-m radius containing wet forest |
| Wet2000 | The percentage of cells in a 2000-m radius containing wet forest |
| Fert | Index of the soil nutrient content at a site, based on geochemical data (CSIRO) |
| Percnonfor2k | The percentage of cells in a 2000-m radius classified as cleared land |
| Ybglexp2000† | The percentage of cells in a 2000-m radius containing suitable ybgl habitat |
| Sqglexp2000† | The percentage of cells in a 2000-m radius containing suitable sqgl habitat |
| Sowlexp2000† | The percentage of cells in a 2000-m radius containing suitable sowl habitat |
| Mowlexp2000† | The percentage of cells in a 2000-m radius containing suitable mowl habitat |
| Powlexp2000† | The percentage of cells in a 2000-m radius containing suitable powl habitat |
| Koalexp2000† | The percentage of cells in a 2000-m radius containing suitable koal habitat |

† denotes expert variable derived from vegetation classes (Michael Murray, pers. comm. 2004). koal, koala; mowl, masked owl; powl, powerful owl; quol, tiger quoll; sowl, sooty owl; sqgl, squirrel glider; ybgl, yellow-bellied glider.

The model evaluation method (bootstrapping)

Presence–absence models for each species were evaluated using a ‘bootstrapping’ approach (Efron & Tibshirani 1997; Harrell 2001). Naively testing a model on the data that were used to fit it (also known as ‘in-sample validation’) is inappropriate and known to provide optimistic estimates of model performance (Fielding & Bell 1997; Steyerberg et al. 2001b), despite being a commonly used approach to model evaluation (National Parks & Wildlife Service 1998, 2000). Ideally, models should be tested on a completely independent data set, with data collection and stratification specifically designed for model evaluation. However, this is seldom a practical option and may also be prone to high variance unless samples are large (J. Elith, unpublished data, 2005). Bootstrapping provides a realistic estimate of the predictive performance of a model without incurring the expense of collecting a completely new model-testing data set. Bootstrapping involves resampling the modelling data and conducting a series of model building and testing simulations that provide an estimate of the optimism arising from in-sample validation. The estimate of optimism is used to provide an adjusted estimate of the model evaluation statistics (ROC and MCS). The bootstrapping version implemented here is believed to provide the least biased estimate of predictive performance of any of the model evaluation methods that are based on re-sampling, including cross-validation (Hastie et al. 2001). Cross-validation (Efron & Tibshirani 1997) provides an alternative approach to model evaluation and might be more feasible with methods or data sets that create large computational loads. However, its estimates of error rates with independent data can be less precise than those derived from bootstrapping (Steyerberg et al. 2001b), which can be thought of as a smoothed version of cross-validation (Efron & Tibshirani 1997). The bootstrapping method (the .632+ bootstrap) is detailed in Appendix 1. R code for obtaining bootstrapped estimates of ROC and MCS is available from the URL given previously.

Inspecting models for ecological realism

Final models were inspected for their ecological realism using partial plots of univariate fitted functions (Venables & Ripley 2003), similar to those used to determine the complexity of candidate response shapes. Partial plots are available as an option in statistical software and provide a ready means to evaluate the ecological realism of fitted responses. If an implausible fitted relationship between the probability of occupancy by the species and a particular environmental variable was observed in the partial plots, the variable was excluded from the model or the model was re-fitted with fewer degrees of freedom for that particular environmental variable.

Case study results and interpretation

The adequacy of species survey data for model fitting

Environmental strata were reasonably well sampled, even for the species with the least and most restricted survey data. For six of the seven species, sampled strata make up at least 70% of the region that is under forest cover, with the only exception being the quoll, for which only 56% of the region comprises sampled strata. Similarly, the geographical spread of sampled strata was reasonable for most species, with maps of sampled strata showing good coverage of the region for all species (Fig. 2). The maps of sampled and unsampled strata provide an indication of the areas in which habitat model predictions may be less reliable.

Models and predictive performance

The best models in terms of both predictive discrimination and calibration were those for the sooty owl, squirrel glider, yellow-bellied glider, koala and tiger quoll, all of which had predictive discrimination greater than 0.75 for their best model (Table 3). The worst models were the powerful owl and masked owl models, with bootstrapped ROC areas ranging from 0.61 to 0.69. In general, GAMs had similar variable structure and degrees of freedom to GLMs, and comparable predictive discrimination. However, bootstrapped predictive calibration was always better for GAMs than for GLMs (Table 3). This is probably because the smooth functions in GAMs are better at fitting complex responses over the complete environmental range of the response, leading to more accurate predicted probabilities.

Presence-only models had similar model structures to the presence–absence models. The result suggests that the survey absence data used in the modelling exercise were not much more useful than a background random sample in defining unsuitable habitat for most species. This may result from the difficulties associated with obtaining reliable absence data for cryptic species. In addition, the quality of the presence-only data used in this modelling exercise is likely to be high compared with presence-only data found in most museums and herbaria because they were derived from systematic stratified surveys. Other studies have shown a decline in performance when fitting models with background samples compared with using presence–absence data (Ferrier & Watson 1997).

Preferred models were primarily chosen on the basis of predictive discrimination and calibration. Plausibility of response shapes was used to discern between models that showed similar predictive performance. Only model structures for the preferred model for each species are presented in Table 3. There was some variation between the four modelling methods in terms of the variables retained and their response shapes for each species. For example, the yellow-bellied glider GLM contained the term ‘dry2000’ while the GAM for the same species did not include this term. Similarly, the sooty owl GLM contained rainfall as a linear term while the GAM for that species included rainfall as a smoothed spline with two degrees of freedom (a non-linear term). However, the structural differences between GLM and GAM models were minor, and the similarity in ROC areas of the two model types across all species (Table 3) implies that the differences had a minor impact on predictive discrimination.

Fig. 2. Coverage of sampled strata over the greater LHCC region for the best ((a) Koala) and worst ((b) Quoll) sampled species. Unsampled strata are represented by the light grey shading. Remaining areas within the region (not shaded) contain sampled strata or are non-forest and therefore not included in the analysis.

Table 3. Final presence–absence models and bootstrapped estimates of predictive discrimination (ROC) and calibration.

| Species | Preferred model | Preferred model type | ROC area (GAM) | ROC area (GLM) | Calibration (GAM) | Calibration (GLM) |
|---|---|---|---|---|---|---|
| Koala | sp ~ s(temp, 2) + s(dry2000, 3) + s(rugg500, 2) + unmod2000 + percnonfor2k | GAM | 0.76 | 0.76 | 0.84 | 0.75 |
| Quoll | sp ~ temp + rain + percnonfor2k + poly(dry2000, 2) | GLM | 0.71 | 0.75 | 0.73 | 0.59 |
| Sqgl | sp ~ s(rugg500, 3) + sqglexp500 + s(unmod500, 2) | GAM | 0.78 | 0.77 | 0.76 | 0.68 |
| Ybgl | sp ~ s(temp, 2) + s(rain, 2) + s(rf2000, 2) + unmod2000 + s(ybglexp2000, 3) + percnonfor2k | GAM | 0.78 | 0.77 | 0.73 | 0.63 |
| Mowl | sp ~ s(cold, 2) + mowlexp2000 + rf2000 + s(unmod2000, 2) + wet2000 + percnonfor2k | GAM | 0.63 | 0.61 | 0.63 | 0.55 |
| Powl | sp ~ s(cold, 3) + s(rain, 2) + s(rugg500, 2) | GAM | 0.69 | 0.68 | 0.80 | 0.62 |
| Sowl | sp ~ s(rain, 2) + rugg500 + s(sowlexp2000, 3) + s(terr1000, 2) + unmod2000 | GAM | 0.85 | 0.85 | 0.74 | 0.67 |

Only the preferred model selected from the presence-absence models (either a generalized linear model or a generalized additive model) is presented for each species. Column ‘preferred model type’ indicates which of the methods produced the final preferred model for each species. ROC and calibration slope statistics are presented for the model derived under each modelling method. Variable abbreviations and details are given in Table 2. The expressions ‘poly(variable, n)’ and ‘s(variable, n)’ in the preferred model column indicate variables included as polynomial (‘poly’) or smoothed (‘s’) terms with ‘n’ degrees of freedom. ROC, receiver operating characteristic.

Plots of fitted functions

Plotting fitted relationships provided a useful means to visualize and manipulate the behaviour of fitted models to ensure that they were ecologically meaningful. In general, fitted relationships were found to be ecologically plausible. Plots of sooty owl response shapes (Fig. 3) show how the probability of sooty owl site occupancy varies with each of the predictor variables contained in the final GAM for that species. Fitted functions for individual variables in a multivariate model may be different to univariate fitted functions because of interactions between predictor variables. Consequently, it is important to observe response shapes for the final model to ensure that they remain ecologically reasonable in the multivariable context.

Habitat maps

A habitat map is provided for the sooty owl as an example of the type of output that can be used in conservation planning (Fig. 4). The map shows the predicted probability of sooty owl presence throughout the region. Probabilities have been classified into four categories (0–0.2, 0.21–0.5, 0.51–0.7, 0.71–1.0) for presentation. These maps also display the presence records for the species that were used in model fitting, allowing a visual assessment of predictive performance. Mapped predictions and uncertainties (based on upper and lower 95% prediction intervals) for the remaining six species are available from the authors.

Fig. 3. Partial plots of the relationship between the probability of occupancy and environmental variables included in the final sooty owl model. The X-axis represents the range of values for each environmental variable. Probabilities on the Y-axis are plotted in transformed ‘logit’ space so that they can be interpreted in the same way as linear regressions. Response shapes in each plot represent the relationship between each variable and the probability of sooty owl occupancy in the multivariate model context, independent of the other variables included in the model. Dashed lines represent 95% confidence intervals around the fitted response shape. Rugg500, topographic ruggedness (standard deviation in elevation) in a 500-m radius; Sowlexp2000, the proportion of suitable vegetation (as defined by experts) in a 2000-m radius; terr1000, relative terrain position in a 1000-m radius; unmod2000, the percentage of cells in a 2000-m radius containing unmodified forest.

[Figure 3: five partial-plot panels (rain, rugg500, sowlexp2000, terr1000, unmod2000); Y-axis in each panel: Logit (probability of occupancy).]
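The map-export workflow summarized in the caption of Fig. 4 (a vector of predictions, one per grid cell, written to an ASCII text file for import into a GIS) can be sketched as follows. This is an illustration rather than the authors' procedure. It assumes the widely used Esri ASCII grid layout, a prediction vector ordered row by row from the north-west corner, and hypothetical corner coordinates; `None` marks cells that were not modelled, such as non-forest.

```python
def to_ascii_grid(predictions, ncols, xll, yll, cellsize=100.0, nodata=-9999):
    """Format a row-ordered prediction vector as Esri ASCII grid text."""
    if ncols <= 0 or len(predictions) % ncols != 0:
        raise ValueError("vector length must be a multiple of ncols")
    nrows = len(predictions) // ncols
    lines = [
        f"ncols {ncols}",
        f"nrows {nrows}",
        f"xllcorner {xll}",
        f"yllcorner {yll}",
        f"cellsize {cellsize}",
        f"NODATA_value {nodata}",
    ]
    for r in range(nrows):
        row = predictions[r * ncols:(r + 1) * ncols]
        lines.append(" ".join(str(nodata) if v is None else f"{v:.3f}" for v in row))
    return "\n".join(lines) + "\n"

# A tiny hypothetical 2 x 2 landscape of 100-m cells.
text = to_ascii_grid([0.1, 0.2, None, 0.9], ncols=2, xll=0, yll=0)
print(text)
```

The returned text can be written to a file and loaded by most GIS packages. The header values must match the extent and resolution of the environmental rasters used for prediction.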

DISCUSSION

The importance of ecological realism in modelling and the role of experts

The choice of modelling methods and approach to model building and evaluation presented in this paper reflects the importance we place on maintaining ecological realism throughout the model building process. While maintaining statistical rigour is central to good model building, there is an expansive literature on the statistical nuances of model fitting, interpretation and prediction (Guisan & Zimmermann 2000). Conversely, there is much less literature available concerning the importance of maintaining ecological realism in model building (Austin 1991, 2002). The process of plotting and evaluating fitted functions throughout model building and interpretation is fundamental to ensuring the ecological realism and general credibility of the final models.

Expert opinion forms an integral part of sound model building and evaluation, irrespective of the modelling platform and theoretical approach used (Steyerberg et al. 2001b). Some modellers view expert opinion as a source of undesirable subjectivity and prefer to let the data dictate the model. However, this ignores the limitations commonly encountered with ecological data (small to medium-sized species data sets that may include biases, and many candidate predictor variables) and the related challenges in developing robust and sensible models. Expert opinion can (and should) be used to assist in the identification of candidate model variables, interactions between variables and likely response shapes (Pearce et al. 2001b; Ferrier et al. 2002a); for corroboration of a model’s ecological realism (Austin 2002); for ad hoc evaluation of model predictions (National Parks & Wildlife Service 1998); and for preparation of predictive maps for use in decision making (Ferrier et al. 2002a). Experts may also be used effectively in the creation of indices that may themselves be used as candidate variables for statistical modelling (National Parks & Wildlife Service 1998). Consequently, the role of experts should be thought of as complementary to other, more data-driven methods, rather than as a competing alternative. In our case study we emphasize the use of experts in variable reduction prior to automated variable selection. We use Harrell’s (2001) rule of thumb to identify the maximum PDF that should be offered to the variable selection routine. We contend that if expert knowledge is so lacking that reducing the candidate set to m/10 is impossible, resulting models should be treated with scepticism.
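The m/10 rule of thumb amounts to simple arithmetic. The sketch below assumes, following the usual statement of Harrell's rule for binary responses, that m is the number of observations in the less frequent class (presences or absences) and that ‘PDF’ refers to predictor degrees of freedom. Neither is defined in this section, so both are assumptions, and the survey counts are hypothetical.

```python
def max_predictor_df(n_presences, n_absences, divisor=10):
    """Upper limit on candidate predictor degrees of freedom under an m/divisor rule."""
    m = min(n_presences, n_absences)  # assumed meaning of m: the rarer outcome
    return m // divisor

# A hypothetical survey with 45 presences among 345 sites.
print(max_predictor_df(45, 300))
```

With 45 presences, at most four degrees of freedom would be offered to the selection routine, for example two linear terms and one two-degree-of-freedom smooth.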

The lack of an accepted model evaluation statistic for presence-only models detracts from their utility in conservation planning and increases the importance of expert evaluation. Development of robust evaluations of presence-only models is ongoing.

Interpretation of habitat maps

Threshold predictions

In some instances, users of habitat maps may wish to identify a level of predicted probability of occupancy below which an area would be considered ‘unsuitable’ habitat for a given species. In general, such an interpretation of habitat maps is ill-advised unless the model that underpins the map is particularly well calibrated. However, in such instances where it is required, thresholds could be assigned in a risk-weighted manner. For example, if it is very important not to classify an area as ‘non-habitat’ when it may be used by a species, a low probability threshold (e.g. probability of occurrence = 0.1) should be specified when delineating non-habitat. By choosing a threshold of 0.1 for a well-calibrated model, it is implied that a 10% chance of a site (a grid cell of 100 m) being occupied by the species is a tolerable risk of failing to include areas that are in fact habitat for the species. This decision must be made on a case-by-case basis by evaluating the management costs of the two types of error (a false-negative and false-positive prediction), rather than by relying on an arbitrary threshold set by the modeller.

Fig. 4. The predicted probability of sooty owl occupancy across the LHCC region. Probabilities are represented as a grey-scale (white = 0–0.2, light grey = 0.21–0.5, medium grey = 0.51–0.8, dark grey = 0.81–1.0). The open triangles show point survey locations where the sooty owl is recorded as present. Habitat maps were constructed by creating a vector of habitat model predictions (one for every 1 ha grid cell in the landscape) in R using the ‘predict.gam’ command. The prediction vector is then written to an ascii text file in the correct dimensions for the landscape and imported to ArcInfo (or ArcView) for display.
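One way to make a risk-weighted threshold explicit is an expected-cost rule: a cell is treated as habitat whenever the expected cost of wrongly excluding it exceeds the expected cost of wrongly including it. The sketch below is an illustration of that idea, not a procedure taken from the paper. It assumes a well-calibrated model, and the relative costs are hypothetical values that managers would have to supply.

```python
def risk_weighted_threshold(cost_false_negative, cost_false_positive):
    """Probability above which calling a cell 'habitat' has the lower expected cost.

    A cell with occupancy probability p is called habitat when
    p * cost_false_negative > (1 - p) * cost_false_positive.
    """
    return cost_false_positive / (cost_false_positive + cost_false_negative)

def classify(probabilities, threshold):
    return ["habitat" if p >= threshold else "non-habitat" for p in probabilities]

# Hypothetical costs: missing real habitat is judged nine times as costly
# as protecting a cell that turns out to be unoccupied.
threshold = risk_weighted_threshold(cost_false_negative=9.0, cost_false_positive=1.0)
print(threshold, classify([0.05, 0.1, 0.5], threshold))
```

A nine-to-one cost ratio reproduces the 0.1 threshold used as the example in the text, and equal costs give a threshold of 0.5.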

Representing uncertainty

Many forms of uncertainty impact on maps of predicted probability of species occurrence (Elith et al. 2002). The prediction intervals available from regression models describe the part of this uncertainty associated with estimating model parameters, and this stems from inadequate data, errors in measurement and natural variation (Elith et al. 2002). There are additional uncertainties, such as model selection uncertainty and errors in explanatory variables, that are, however, not addressed by standard prediction intervals. While some methods have been developed for explicitly incorporating these forms of uncertainty in prediction intervals (Burnham & Anderson 2002; Wintle et al. 2003), there is currently very little guidance on how prediction intervals should be represented and interpreted in applications of habitat modelling and conservation planning.

Interpreting model evaluation statistics

The two statistics used for model evaluation in this study address different model applications. Specifically, model discrimination, quantified by the ROC area, measures the degree to which the model successfully ranks presence sites higher than absence sites across the region in terms of predicted probability of presence. Models with satisfactory ROC areas will provide reliable ranking of areas in terms of habitat value. This mode of evaluation is relevant when the goal of a model is to rank or prioritize areas of interest in terms of their relative value as habitat for a species. This goal is different to that of accurately predicting the proportion of sites that are expected to be occupied at a given predicted probability of occurrence. For instance, at sites with a predicted probability of occurrence of 0.5, it is reasonable to expect that approximately 50% of such sites would contain the species. The degree to which this is true is described as model calibration (Miller et al. 1991) and is assessed using the logistic calibration equation. It is possible that a model with good discrimination can have poor calibration. It is also possible to improve a model’s predictive ability by adjusting the model parameters with calibration statistics (Harrell 2001) or other shrinkage methods (Hastie et al. 2001). A model with poor calibration may still be useful for prioritizing or ranking sites in terms of habitat value, but should not be used to predict the raw probability of finding a species at a given site. The choice of whether to focus on ROC areas or calibration statistics depends on the intended application of the models.

The issue of occupancy versus persistence

Models of the sort discussed in this paper make the implicit assumption that occupancy of a location implies suitability of the habitat. Van Horne (1983) identified problems associated with this assumption, and Pulliam’s (1988) ‘source-sink’ model of population dynamics formalizes its conceptual deficiencies. Pulliam’s model differentiates ‘sink’ areas, where mortality exceeds population growth, from ‘source’ areas that maintain overall population size through emigration. Similarly, if the total population is in a state of decline, it is possible that ‘remnant’ individuals may exist in unsuitable habitat that will not be occupied in the next generation. When a population is expanding its range, observations of absence of the species may falsely imply that the suitability of the habitat is low. These anomalies arise due to what is known as the ‘equilibrium’ assumption, which underpins the current static approach to wildlife habitat modelling (Austin 2002). Inclusion of disturbance history variables, dispersal barriers, competition and successional dynamics may assist in modelling such situations, though examples of such approaches are rare.

Survey design and sampling issues

This paper has focused on the choice and execution of particular modelling methods given a set of data. Issues of data quality and sampling have not been addressed. Because modelling results are only as good as the available data, the issues of data quality and sampling demand some discussion here.

Sample size

From a statistical perspective, the choice of sample size is a function of the desired precision of results. In studies where the goal is to make inference about a particular effect (e.g. the influence of time since logging on the probability of greater glider occupancy), power analysis may be used to determine the sample size required to achieve a statistically significant result for a given effect size, type I error rate and sampling variance (Burgman & Lindenmayer 1998). The relationship between statistical significance, power and model predictive performance is unclear, and there is no guarantee that a model with statistically significant terms will give good predictive performance. Unfortunately, there is no simple way to determine a priori the predictive performance of a habitat model, or to calculate a sample size that will ensure a predictive performance that is statistically significantly better than that of a null or alternative model. We could find only one attempt to address this problem in the statistical literature (Obuchowski 1994), making this an important area of future research. A simulation study that provided rules of thumb about sample sizes required to achieve a given predictive performance under a range of likely modelling scenarios would be an important contribution to the habitat modelling literature.

Data quality

A more subtle consideration is the geographical spread and environmental stratification of observations compared to the extent and environmental variability of the region for which models are to be built. We briefly touch on this issue in our case study when testing the adequacy of survey data using environmental strata. Little guidance is provided in the literature as to the minimum geographical or environmental coverage of data for statistical modelling, though it is common sense to expect that model predictions are unlikely to be reliable in environmental domains for which there are no survey data (Austin & Meyers 1996). Stratified sampling designs such as ‘GRADSECT’ (Austin & Heyligers 1989) that ensure a geographical and bioclimatic coverage of sampling locations are therefore appealing. GRADSECT theory, or newer developments such as Generalized Dissimilarity Modelling (GDM; Ferrier et al. 2002b) or a p-median criterion (Faith & Walker 1996), may also be applied to existing data sets to identify ‘gaps’ in the geographical and environmental coverage of samples.

Most statistical analyses, including the regression methods used in our case study, assume that observations are independent of each other. Survey locations that are close together are much less likely to be independent, especially when surveys target animals with large home ranges. Survey design should aim to ensure a minimum separation distance that is at least as great as the home range radius of the widest ranging species in the study. Random sampling within geographical and environmental strata assists in minimizing unwanted dependencies in observation data. Guidance on sampling theory should be sought from one of the many texts that deal with the topic in detail (e.g. Sutherland 1996; Thompson 2002).
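The minimum-separation guideline can be applied to an existing set of candidate survey locations with a simple greedy filter. The sketch below is illustrative: the site names, coordinates (in metres) and the 1000-m home range radius are hypothetical. The greedy pass depends on the order of the input, so it does not necessarily retain the largest possible set of sites.

```python
import math

def thin_sites(sites, min_distance):
    """Keep each site only if it is at least min_distance from every site already kept."""
    kept = []
    for name, x, y in sites:
        if all(math.hypot(x - kx, y - ky) >= min_distance for _, kx, ky in kept):
            kept.append((name, x, y))
    return kept

# Hypothetical candidate sites (name, easting, northing) in metres.
candidates = [("A", 0, 0), ("B", 300, 0), ("C", 1200, 0), ("D", 1200, 900)]
print([name for name, _, _ in thin_sites(candidates, min_distance=1000)])
```

With a 1000-m separation, sites A and C are retained, while B (300 m from A) and D (900 m from C) are dropped. Shuffling the candidates before thinning is one way to avoid favouring sites that happen to be listed first.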

Technical issues and promising advances in modelling methods

Species versus community approaches

Models of the distribution of single species are not the only way to approach conservation planning questions. Single species models are important for a subset of species such as threatened, focal or flagship species. However, landscape planning aims to conserve the biodiversity of a region, and there are too many species and too little data to achieve this through single-species modelling. Therefore there is a role for community-level modelling (Ferrier et al. 2002b). One example of community-level modelling is GDM, already mentioned in the context of sampling; this models compositional dissimilarity across the landscape. This could then contribute to a conservation plan that aimed to span the range of species patterns in the landscape. A further role for methods based on data from many species is to use them for modelling rare species, or species in data sets with relatively few records per species. In such situations there may be insufficient data for developing a robust single-species model. GDMs, canonical correspondence analysis (CCA) or multivariate adaptive regression splines (MARS) may be useful for this purpose.

Spatial autocorrelation

Spatial autocorrelation in wildlife observation data arises when environmental processes and patterns that influence the spatial distribution of wildlife are themselves spatially structured, and/or because the species is subject to demographic processes, territoriality or dispersal limitations causing spatial dependence (contagion or dispersion effects). Demographic and environmental processes underlying spatial patterns in wildlife distributions are usually poorly understood and therefore difficult to incorporate in model fitting. Consequently, model residuals are often spatially correlated (i.e. not independent), violating one of the basic assumptions of regression modelling. In practice, non-independence usually results in underestimation of standard errors and overestimation of the importance of habitat variables (Legendre & Fortin 1989). Methods such as autologistic regression (Augustin et al. 1996), generalized estimating equations (Albert & McShane 1995) and geographically weighted regression (Fotheringham et al. 2002) may be used to incorporate spatial autocorrelation in habitat analyses, but they are still not incorporated in standard statistical software and are technically demanding to implement.
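A simple diagnostic for spatially correlated residuals is Moran's I, which is not one of the methods listed above but is a common first check before turning to them. The sketch below assumes binary neighbour weights (sites within a hypothetical distance `max_dist` count as neighbours) and uses made-up coordinates and residuals.

```python
import math


def morans_i(coords, values, max_dist):
    """Moran's I with binary weights: w_ij = 1 if sites i and j are within max_dist."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0       # sum of w_ij * dev_i * dev_j over neighbouring pairs
    weight_sum = 0.0  # sum of all w_ij
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = math.hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            if dist <= max_dist:
                cross += dev[i] * dev[j]
                weight_sum += 1.0
    sum_sq = sum(x * x for x in dev)
    if weight_sum == 0 or sum_sq == 0:
        raise ValueError("no neighbouring pairs, or residuals have zero variance")
    return (n / weight_sum) * (cross / sum_sq)


# Hypothetical transect of four sites, one unit apart.
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
clustered = morans_i(coords, [1.0, 1.0, -1.0, -1.0], max_dist=1.0)
alternating = morans_i(coords, [1.0, -1.0, 1.0, -1.0], max_dist=1.0)
print(clustered, alternating)
```

Values well above zero suggest contagion in the residuals, and values well below zero suggest dispersion. A formal test would need a permutation or analytical reference distribution, which this sketch omits.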

Model selection and model uncertainty

The standard approach to representing prediction uncertainty involves the calculation of prediction (or confidence) intervals that incorporate uncertainty about parameter estimates. However, this approach to prediction implicitly assumes that the model chosen is the best available representation of the truth and effectively ignores model selection uncertainty, resulting in overconfident predictions. Automated variable selection algorithms exacerbate this problem by providing a method for dredging through many candidate predictors in search of an explanatory model (Chatfield 1995; Hoeting et al. 1999). One solution is to completely avoid automated model selection and to analyse only a small set of ‘a priori’ models (Burnham & Anderson 2002). This approach may be practical when strong prior knowledge exists and where the number of plausible models is small. However, if prediction is the primary goal and many possible models exist, some level of automated variable selection is often desirable. Model averaging approaches, including Bayesian model averaging, have been promoted in a range of disciplines as a means of incorporating model selection uncertainty into statistical inference and prediction (Hoeting et al. 1999) and have been applied in some ecological examples (Burnham & Anderson 2002; Wintle et al. 2003).
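The idea behind model averaging can be illustrated with Akaike weights, the information-theoretic weighting described by Burnham & Anderson (2002). This is a simpler stand-in for Bayesian model averaging, not the method of Wintle et al. (2003), and the AIC values and site predictions below are invented for illustration.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC values for a set of candidate models into weights that sum to 1."""
    best = min(aic_values)
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


def averaged_prediction(predictions, weights):
    """Weighted average of the candidate models' predictions for one site."""
    return sum(p * w for p, w in zip(predictions, weights))


# Hypothetical AIC values for three candidate models and their
# predicted probabilities of occupancy at a single site.
aics = [100.0, 102.0, 110.0]
weights = akaike_weights(aics)
site_predictions = [0.60, 0.40, 0.10]
avg = averaged_prediction(site_predictions, weights)
print([round(w, 3) for w in weights], round(avg, 3))
```

The averaged prediction is pulled towards the better-supported models rather than resting on a single selected model, which is the sense in which model selection uncertainty is carried into the prediction.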

Estimating and incorporating detectability in habitat models

Unless the probability of detecting a species when it is present is equal to 1, false negative observation errors will occur in species surveys. The probability of detecting the presence of the case study species in any single standard survey based on spot-lighting and call elicitation has been found to be very low (Pr[detection|presence] ~ 0.12–0.45; Wintle et al. in press), making the unreliability of absence data a potentially serious source of uncertainty in our case study. Recent studies have demonstrated the negative impact that false-negative observation error may have on species-habitat analyses (Tyre et al. 2003), meta-population models (Moilanen 2002) and monitoring studies (MacKenzie et al. 2002). Recently developed techniques for incorporating detectability in model estimation (MacKenzie et al. 2002; Tyre et al. 2003) reduce bias in model estimation brought about by false absences, though little effort has been invested in testing the relative predictive performance of such models.
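The practical consequence of low single-visit detectability can be seen with a short calculation. The sketch below assumes that repeat visits are independent and that the detection probability is constant across visits, which is a simplification; the 95% confidence target is an illustrative choice, not a recommendation from the case study.

```python
import math


def prob_detected_at_least_once(p_single, n_visits):
    """Probability of at least one detection in n_visits, given presence."""
    return 1.0 - (1.0 - p_single) ** n_visits


def visits_needed(p_single, confidence=0.95):
    """Smallest number of visits for which a true presence is detected
    at least once with the requested probability."""
    if not 0.0 < p_single < 1.0:
        raise ValueError("p_single must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_single))


# The two ends of the single-visit detection range quoted in the text.
for p in (0.12, 0.45):
    n = visits_needed(p)
    print(p, n, round(prob_detected_at_least_once(p, n), 3))
```

Under these assumptions, a species at the lower end of the quoted range would need many more visits than one at the upper end before a recorded absence could be treated as reliable.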

Incorporating biotic interactions

Despite the prevalence of interspecific competition as a key concept in community ecology (Diamond 1975), very little has been attempted by way of incorporating interspecific competition in distribution models (for one exception see Leathwick & Austin 2001). Regression models such as those described by Leathwick and Austin (2001) could be applied to fauna species. Future development of ‘loop analysis’ (Levins 1975) or systems of simultaneous regressions (Guisan & Zimmerman 2000) may facilitate spatial models of wildlife distributions that incorporate competitive interactions.

The GLMs and GAMs presented here are relatively simple, though suited to the conservation problem and available data. There is scope for greater sophistication in the modelling method where a high level of modelling expertise exists, though we believe our recommended approach represents a reasonable trade-off between practicality and rigour.

Model improvement and adaptive management

Due to environmental change, natural and anthropogenic disturbance, and population stochasticity, all predictive models will become redundant if they are not updated through time. We recommend that models be applied in an adaptive framework (Walters & Holling 1990), allowing immediate application of models in management but ensuring a commitment to continual improvement of models through future data collections and refinements to modelling methods.

Future data collection will be central to rigorous model testing and model refinements. It should be targeted to fill gaps in current data sets (Fig. 3). Particular attention should be paid to the collection of high quality observation data based on multiple site visits aimed at minimizing false absences (Wintle et al. 2004). Some important environmental variables, including those based on forest growth stage data, were not available for model fitting in the LHCC region. It is likely that models would have better predictive accuracy if forest growth stage information became available and was incorporated in models.

CONCLUSION

Habitat models are now a widely used tool in public land conservation planning. Systematic conservation planning in the urban fringe will require such tools to ensure transparency and repeatability of planning outcomes. However, access to statistical and ecological expertise for urban conservation planning is likely to be limited when compared with the public land planning processes that have taken place in Australia. It is therefore important that we continue to develop and refine planning tools, including habitat modelling methods, with an emphasis on statistical and ecological rigour and simplicity.

ACKNOWLEDGEMENTS

This work was instigated by the LHCCREMS organization (now Hunter Councils). Steven House (EcoLogical), Michael Murray (Forest Fauna Surveys P/L), Philip Gleeson (NPWS), Daniel Connolly (NPWS), Jill Smith (NPWS), Peter Bone (NPWS), Peter Ewin (NPWS), Mick Andren (NPWS) and Rod Kavanagh (SFNSW) provided data, expert guidance on model construction and valuable insights. Mark Burgman and Kuniko Yamada (University of Melbourne) helped throughout the project. Comments from Terry Walshe (University of Western Australia), David Tierney and Robbie Economos improved the manuscript substantially. Discussions with Trevor Hastie and John Leathwick helped clarify bootstrapping issues. Two anonymous reviewers provided constructive feedback. Sophie Powrie, Kirsty Winter and Meredith Liang (LHCCREMS) provided excellent support and useful feedback throughout the project.

REFERENCES

Agresti A. (1996) An Introduction to Categorical Data Analysis. John Wiley and Sons, New York.

Akaike H. (1973) Information theory and an extension of the maximum likelihood. In: Proceedings of the 2nd International Symposium on Information Theory (eds B. N. Petrov & F. Csáki) pp. 267–81. Akadémiai Kiadó, Budapest.

Albert P. S. & McShane L. M. (1995) A generalized estimating equations approach for spatially correlated data: applications to the analysis of neuroimaging data. Biometrics 51, 627–38.

Augustin N. H., Mugglestone M. A. & Buckland S. T. (1996) An autologistic model for the spatial distribution of wildlife. J. Appl. Ecol. 33, 339–47.

Austin M. P. (1991) Vegetation: data collection and analysis. In: Nature Conservation: Cost Effective Biological Surveys and Data Analysis (eds C. R. Margules & M. P. Austin) pp. 37–41. CSIRO, Canberra.

Austin M. P. (2002) Spatial prediction of species distribution: an interface between ecological theory and statistical modelling. Ecol. Model. 157, 101–18.

Austin M. P. & Heyligers P. C. (1989) Vegetation survey design for conservation: GRADSECT sampling of forests in north-eastern NSW. Biol. Conserv. 50, 13–32.

Austin M. P. & Meyers J. A. (1996) Current approaches to modelling the environmental niche of Eucalypts: implications for management of forest biodiversity. For. Ecol. Manage. 85, 95–106.

Bamber D. (1975) The area above the ordinal dominance graph and the area below the receiver operating characteristics graph. J. Math. Psych. 12, 387–415.

Belsley D. A., Kuh E. & Welsch R. E. (1980) Regression Diagnostics. John Wiley and Sons, New York.

Bio A. M. F., Alkemande R. & Barendregt A. (1998) Determining alternative models for vegetation response analysis – a non-parametric approach. J. Veg. Sci. 9, 5–16.

Breiman L., Friedman J. H., Olshen R. A. & Stone C. J. (1984) Classification and Regression Trees. Wadsworth International Group, Belmont.

Breininger D. R., Larson V. L., Duncan B. W. & Smith R. B. (1998) Linking habitat suitability to demographic success in Florida scrub-jays. Wild. Bull. 26, 118–28.

Burgman M. A. & Lindenmayer D. B. (1998) Conservation Biology for the Australian Environment. Surrey Beatty and Sons, Chipping Norton.

Burnham K. P. & Anderson D. R. (2002) Model Selection and Multimodel Inference: a Practical Information-Theoretic Approach. Springer, New York.

Burgman M. A., Breininger D. R., Duncan B. W. & Ferson S. (2001) Setting reliability bounds on Habitat Suitability Indices. Ecol. Appl. 11, 70–8.

Carpenter G., Gillison A. N. & Winter J. (1993) DOMAIN: a flexible modelling procedure for mapping potential distributions of plants and animals. Biodivers. Conserv. 2, 667–80.

Chatfield C. (1995) Model uncertainty, data mining and statistical inference. J. Roy. Stat. Soc. Ser. A Stat. 158, 419–66.

Commonwealth of Australia (2003) Terms of Reference: Standing Committee on Environment and Heritage Inquiry into Sustainable Cities. Parliament of Australia, House of Representatives, Canberra.

Derksen S. & Keselman H. J. (1992) Backward, forward and stepwise automated subset selection algorithms: frequency of obtaining authentic and noise variables. Br. J. Math. Stat. Psych. 45, 265–82.

Diamond J. M. (1975) Assembly of species communities. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 342–444. Harvard University Press, Cambridge.

Ecotone Ecological Consultants (2001) Lower Hunter Central Coast Regional Biodiversity Strategy: Fauna survey and mapping project – Module 1: Fauna Surveys. Ecotone Ecological Consultants for LHCCREMS, Waratah, pp. 43.

Edgar R. & Belcher C. (1995) Spotted-tailed Quoll. In: The Mammals of Australia (ed. R. Strahan) pp. 67–8. Reed Books, Chatswood.

Efron B. & Tibshirani R. (1997) Improvements on cross-validation: the .632+ bootstrap method. J. Am. Stat. Ass. 92, 548–60.

Elith J. (2000) Quantitative methods for modelling species habitat: comparative performance and an application to Australian plants. In: Quantitative Methods for Conservation Biology (eds S. Ferson & M. A. Burgman) pp. 39–58. Springer-Verlag, New York.

Elith J. & Burgman M. A. (2002) Predictions and their validation: rare plants in the Central Highlands, Victoria, Australia. In: Predicting Species Occurrences: Issues of Accuracy and Scale (eds J. M. Scott, P. J. Heglund, M. L. Morrison, M. G. Raphael, W. A. Wall & F. B. Samson) pp. 303–14. Island Press, Covelo.

Elith J. & Burgman M. A. (2003) Habitat models for PVA. In: Population Viability in Plants (eds C. A. Brigham & M. W. Schwartz) pp. 203–35. Springer-Verlag, New York.

Elith J., Burgman M. A. & Regan H. M. (2002) Mapping epistemic uncertainties and vague concepts in predictions of species distribution. Ecol. Model. 157, 313–29.

Ennis M., Hinton G., Naylor D., Revow M. & Tibshirani R. (1998) A comparison of statistical learning methods on the GUSTO database. Stat. Med. 17, 2501–8.

ESRI (1997) Arcinfo 7.2. Environmental Systems Research Institute Inc., Redlands.

Faith D. P. & Walker P. A. (1996) Environmental diversity: on the best-possible use of surrogate data for assessing the relative biodiversity of sets of areas. Biodivers. Conserv. 5, 399–415.

Ferrier S. & Watson G. (1997) An Evaluation of the Effectiveness of Environmental Surrogates and Modelling Techniques in Predicting the Distribution of Biological Diversity. Department of Environment, Sports and Territories, Commonwealth of Australia, and NSW National Parks and Wildlife Service, Canberra.

Ferrier S., Watson G., Pearce J. & Drielsma M. (2002a) Extended statistical approaches to modelling spatial pattern in biodiversity in north-east New South Wales. I. Species level modelling. Biodivers. Conserv. 11, 2275–307.

Ferrier S., Drielsma M., Manion G. & Watson G. (2002b) Extended statistical approaches to modelling spatial pattern in biodiversity: the north-east New South Wales experience. II. Community-level modelling. Biodivers. Conserv. 11, 2309–38.

Fielding A. H. & Bell J. F. (1997) A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ. Conserv. 24, 38–49.

Fotheringham A. S., Brunsdon C. & Charlton M. (2002) Geographically Weighted Regression: the Analysis of Spatially Varying Relationships. John Wiley and Sons, London.

Franklin J. (1995) Predictive vegetation mapping: geographic modeling of biospatial patterns in relation to environmental gradients. Prog. Phys. Geog. 19, 494–519.

Friedman J. H. (1991) Multivariate adaptive regression splines (with discussion). Ann. Stat. 19, 1–141.

Goldingay R. L. & Kavanagh R. P. (1993) Home-range estimates and habitat of the yellow-bellied glider (Petaurus australis) at Waratah Creek, New-South-Wales. Wildl. Res. 20, 387–404.

Graham C. H., Ferrier S., Huettman F., Moritz C. & Peterson A. T. (2004) New developments in museum-based informatics and applications in biodiversity analysis. Trends Ecol. Evol. 19, 497–503.

Guisan A. & Harrell F. E. (2000) Ordinal response regression models in ecology. J. Veg. Sci. 11, 617–26.

Guisan A. & Zimmerman N. E. (2000) Predictive habitat distribution models in ecology. Ecol. Model. 135, 147–86.

Hanley J. A. & McNeil B. J. (1982) The meaning and use of the area under a Receiver Operating Characteristic (ROC) curve. Radiology 143, 29–36.

Harrell F. E. (2001) Regression Modelling Strategies with Application to Linear Models, Logistic Regression and Survival Analysis. Springer, New York.

Harrell F. E., Lee K. L. & Mark D. B. (1996) Multivariate prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat. Med. 15, 361–87.

Hastie T. & Tibshirani R. (1990) Generalized additive models. Monographs on statistics and applied probability (eds D. R. Cox, D. V. Hinkley, D. Rubin & B. W. Silverman). Chapman & Hall, London.

Hastie T., Tibshirani R. & Friedman J. H. (2001) The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer-Verlag, New York.

Hirzel A. H. (2001) When GIS come to life. Linking Landscape- and Population Ecology for Large Population Management Modelling: the Case of Ibex (Capra Ibex) in Switzerland. The University of Lausanne, Lausanne.

Hoeting J. A., Madigan D., Raftery A. E. & Volinsky C. T. (1999) Bayesian model averaging: a tutorial. Stat. Sci. 14, 382–401.

Houlder D. J., Hutchinson M. F., Nix H. A. & McMahon J. P. (1999) ANUCLIM User Guide, Version 5.0. Centre for Resource and Environmental Studies, Australian National University, Canberra.

Kadmon R., Farber O. & Danin A. (2003) A systematic analysis of factors affecting the performance of climatic envelope models. Ecol. Appl. 13, 853–67.

Kavanagh R. P. (2002) Conservation and management of large forest owls in southeastern Australia. In: The Ecology and Conservation of Owls (eds I. Newton, R. Kavanagh, J. Olson & I. Taylor) pp. 201–19. CSIRO, Melbourne.

Lambeck R. J. (1997) Focal species: a multi-species umbrella for nature conservation. Conserv. Biol. 11, 849–56.

Leathwick J. R. & Austin M. P. (2001) Competitive interactions between tree species in New Zealand’s old-growth indigenous forests. Ecology 82, 2560–73.

Legendre P. & Fortin M. J. (1989) Spatial pattern and ecological analysis. Vegetatio 80, 107–38.

Legendre L. & Legendre P. (1998) Numerical Ecology. Elsevier, New York.

Levins R. (1975) Evolution of communities near equilibrium. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 16–50. Harvard University Press, Cambridge.

LHCCREMS (2004) Lower Hunter Central Coast Regional Biodiversity Strategy 2004 (4 Volume). Hunter Councils on behalf of the Lower Hunter Central Coast Regional Environmental Strategy, Thornton.

Li W., Wang Z., Ma Z. & Tang H. (1999) Designing the core zone in a biosphere reserve based on suitable habitats: Yancheng Biosphere Reserve and the red crowned crane (Grus japonensis). Biol. Conserv. 90, 167–73.

Loyn R. H., McNabb E. G., Volodina L. & Willig R. (2001) Modelling landscape distributions of large forest owls as applied to managing forests in north-east Victoria, Australia. Biol. Conserv. 97, 361–76.

McCullagh P. & Nelder J. A. (1989) Generalized Linear Models. Chapman & Hall, London.

MacKenzie D. I., Nichols J. D., Lachman G. B., Droege S., Royle J. A. & Langtimm C. A. (2002) Estimating site occupancy rates when detection probabilities are less than one. Ecology 83, 2248–55.

Manel S., Dias J. M., Buckton S. T. & Ormerod S. J. (1999a) Alternative methods for predicting species distribution: an illustration with Himalayan river birds. J. Appl. Ecol. 36, 734–47.

Manel S., Dias J. M. & Ormerod S. J. (1999b) Comparing discriminant analysis, neural networks and logistic regression for predicting species distributions: a case study with a Himalayan river bird. Ecol. Model. 120, 337–47.

Miller M. E., Hui S. L. & Tierney W. M. (1991) Validation techniques for logistic regression models. Stat. Med. 10, 1213–26.

Moilanen A. (2002) Implications of empirical data quality to metapopulation model parameter estimation and application. Oikos 96, 516–30.

Moisen G. G. & Frescino T. S. (2002) Comparing five modeling techniques for predicting forest characteristics. Ecol. Model. 157, 209–25.

National Parks & Wildlife Service (1998) Eden Fauna Modelling. New South Wales National Parks & Wildlife Service, New South Wales Regional Forest Agreement Steering Committee, Canberra.

National Parks and Wildlife Service (2000) Modelling Areas of Habitat Significance for Fauna and Flora in the Southern CRA. NPWS NSW, Canberra.

Nix H. (1986) A biogeographic analysis of Australian elapid snakes. In: Atlas of Elapid Snakes of Australia (ed. R. Longmore) pp. 4–15. Australian Government Publishing Service, Canberra.

Obuchowski N. A. (1994) Computing sample size for receiver operating characteristic studies. Invest. Radiol. 29, 238–43.

Pearce J. & Ferrier S. (2000) Evaluating the predictive performance of habitat models developed using logistic regression. Ecol. Model. 133, 225–45.

Pearce J. & Ferrier S. (2001) The practical value of modelling relative abundance of species for regional conservation planning: a case study. Biol. Conserv. 98, 33–43.

Pearce J., Ferrier S. & Scotts D. (2001a) An evaluation of the predictive performance of distributional models for flora and fauna in north-east New South Wales. J. Environ. Manage. 62, 171–84.

Pearce J. L., Cherry K., Drielsma M., Ferrier S. & Whish G. (2001b) Incorporating expert knowledge and fine-scale vegetation mapping into statistical modelling of faunal distribution. J. Appl. Ecol. 38, 412–24.

Phillips S. J., Anderson R. P. & Schapire R. E. (in press) Maximum entropy modeling of species geographic distributions. Ecol. Model.

Pressey R. L., Possingham H. P., Logan V. S., Day J. R. & Williams P. H. (1999) Effects of data characteristics on the results of reserve selection algorithms. J. Biogeogr. 26, 179–91.

Pulliam H. R. (1988) Sources, sinks and population regulation. Am. Nat. 132, 652–61.

R Development Core Team (2004) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org

Reading R. P., Clark T. A., Seebeck J. H. & Pearce J. (1996) Habitat Suitability Index model for the eastern barred bandicoot, Perameles gunnii. Wildl. Res. 23, 221–35.

Reed P. C. & Lunney D. (1990) Habitat loss: the key problem for the long-term survival of koalas in New South Wales. In: Koala Summit: Managing Koalas in New South Wales (eds D. Lunney, C. A. Urquhart & P. C. Reed) pp. 9–31. University of Sydney, Sydney.

Ripley B. D. (1995) Pattern Recognition and Neural Networks – a Statistical Approach. Cambridge University Press, Cambridge.

Russel R. (1995) Yellow-bellied glider. In: The Mammals of Australia (ed. R. Strahan) pp. 226–8. Reed Books, Chatswood.

Steyerberg E. W., Eijkemans M. J. C., Harrell F. E. & Habbema J. D. F. (2001a) Prognostic modeling with logistic regression analysis: in search of a sensible strategy in small data sets. Med. Decis. Making 21, 45–56.

Steyerberg E. W., Harrell F. E., Borsboom G. J. J. M., Eijkemans M. J. C., Vergouwe Y. & Habbema J. D. F. (2001b) Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J. Clin. Epidemiol. 54, 774–81.

Stockwell D. & Peters D. (1999) The GARP modelling system: problems and solutions to automated spatial prediction. Int. J. Geogr. Inf. Sci. 13, 143–58.

Sutherland W. J. (1996) Ecological Census Techniques: a Handbook, 1st edn. Cambridge University Press, Cambridge.

ter Braak C. J. F. (1986) Canonical Correspondence Analysis: a new eigenvector technique for multivariate direct gradient analysis. Ecology 67, 1167–79.

Thompson S. K. (2002) Sampling, 2nd edn. Wiley, New York.

Tyre A. J., Tenhumberg B., Field S. A., Possingham H. P., Niejalke D. & Parris K. (2003) Improving precision and reducing bias in biological surveys by estimating false negative error rates in presence-absence data. Ecol. Appl. 13, 1790–801.

Van Horne B. (1983) Density as a misleading indicator of habitat quality. J. Wildl. Man. 47, 893–901.

Van Horne B. & Wiens J. A. (1991) Forest Bird Habitat Suitability Models and the Development of General Habitat Models. United States Department of the Interior, Fish and Wildlife Service, Washington, DC.

Venables W. N. & Ripley B. D. (2003) Modern Applied Statistics with S, 4th edn. Springer, New York.

Walters C. & Holling C. S. (1990) Large-scale management experiments and learning by doing. Ecology 71, 2060–8.

Wintle B. A., McCarthy M. A., Volinsky C. T. & Kavanagh R. P. (2003) The use of Bayesian Model Averaging to better represent the uncertainty in ecological models. Conserv. Biol. 17, 1579–90.

Wintle B. A., Elith R. J., Yamada K. & Burgman M. A. (2004) LHCCREMS fauna survey and mapping project. Module 2: Habitat modelling and conservation requirements. Lower Hunter & Central Coast Regional Environmental Management Strategy, Callaghan.

Wintle B. A., McCarthy M. A., Parris K. M. & Burgman M. A. (2004) Precision and bias of methods for estimating point survey detection probabilities. Ecol. Appl. 14, 703–12.

Wilson K. A., Westphal M. I., Possingham H. P. & Elith J. (2005) Sensitivity of conservation planning to different approaches to using species distribution data. Biol. Conserv. 122, 99–112.

Wintle B. A., Burgman M. A., McCarthy M. A. & Kavanagh R. P. (in press) The magnitude and management consequences of false negative observation error in surveys of arboreal marsupials and large forest owls. J. Wildl. Man.

Worton B. J. (1989) Kernel methods for estimating the utilization distribution in home-range studies. Ecology 70, 164–8.

Yencken D. & Wilkinson D. (2000) Resetting the Compass: Australia’s Journey Towards Sustainability. CSIRO Publishing, Melbourne.

Zaniewski A. E., Lehmann A. & Overton J. M. (2002) Predicting species distribution using presence-only data: a case study of native New Zealand ferns. Ecol. Model. 157, 261–80.

APPENDIX 1

Method for 0.632+ bootstrap evaluation, after Harrell et al. (1996) and Efron and Tibshirani (1997):

1. Develop model on all n observations.

2. Calculate the statistic(s) of choice for evaluation on the same data (i.e. the training data) – call this Stat_app because it is the apparent value of the statistic.

3. Take a bootstrap sample, i.e. a sample of size n, with replacement, of rows of the data matrix. Keep track of which sites are in the bootstrap sample and which are excluded.

4. Fit the model on the bootstrap sample (using the same methods as used on the full set).

5. Compute the statistic on the bootstrap data set (observations vs. fitted values) and call it Stat_boot.

6. Also compute the statistic on a version of the bootstrap data where the observations are randomized (Stat_permute).

7. Use the bootstrap model to predict to the excluded data set; calculate the statistic on these predictions, Stat_excl.

8. Use Stat_boot, Stat_permute and Stat_excl to calculate the amount of overfitting, the relative overfitting rate and weights that are then used to make a best estimate of predictive performance, Stat_best_est. This statistic puts most emphasis on predictions to the excluded data, particularly when the model is overfitted (i.e. when Stat_boot – Stat_excl is large). For details of this step see Steyerberg et al. (2001b).

9. Measure how optimistic the fit on the bootstrap sample was: O = Stat_boot – Stat_best_est.

10. Repeat steps 3–7 100–200 times.

11. Calculate an average optimism, Ō.

12. Use Ō to correct Stat_app for its optimism: Stat_app – Ō. This is a near unbiased estimate of the expected value of the external predictive performance of the process that generated Stat_app.
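The resampling and weighting steps can be sketched as follows. This is a minimal illustration, not the authors' R code. The weighting in `best_estimate` assumes the general .632+ form of Efron and Tibshirani (1997), with relative overfitting rate R = (Stat_excl – Stat_boot) / (Stat_permute – Stat_boot) and weight w = 0.632 / (1 – 0.368R); the exact calculation used for step 8 is given in Steyerberg et al. (2001b) and may differ in detail. Model fitting (steps 1, 4 to 7) is omitted, and the statistic values below are invented.

```python
import random


def bootstrap_split(n, rng):
    """Step 3: draw n row indices with replacement and list the rows left out."""
    in_sample = [rng.randrange(n) for _ in range(n)]
    chosen = set(in_sample)
    excluded = [i for i in range(n) if i not in chosen]
    return in_sample, excluded


def best_estimate(stat_boot, stat_permute, stat_excl):
    """Step 8 (assumed form): weight Stat_boot and Stat_excl by relative overfitting."""
    denom = stat_permute - stat_boot
    r = (stat_excl - stat_boot) / denom if denom != 0 else 0.0
    r = min(max(r, 0.0), 1.0)      # keep the overfitting rate within [0, 1]
    w = 0.632 / (1.0 - 0.368 * r)  # w rises from 0.632 to 1 as overfitting grows
    return (1.0 - w) * stat_boot + w * stat_excl


def corrected_statistic(stat_app, replicates):
    """Steps 9, 11 and 12: average the optimism and subtract it from Stat_app.

    replicates: list of (Stat_boot, Stat_permute, Stat_excl), one per bootstrap sample.
    """
    optimism = [boot - best_estimate(boot, perm, excl) for boot, perm, excl in replicates]
    return stat_app - sum(optimism) / len(optimism)


rng = random.Random(1)
in_sample, excluded = bootstrap_split(20, rng)

# Hypothetical ROC areas from two bootstrap replicates (0.5 = no discrimination).
replicates = [(0.80, 0.50, 0.70), (0.82, 0.50, 0.74)]
corrected = corrected_statistic(0.81, replicates)
print(round(corrected, 3))
```

In practice the list of replicates would hold one entry for each of the 100–200 bootstrap samples, with each entry computed from a model refitted to that sample.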


parison with the presence–absence models was not possible here.

The model evaluation method (bootstrapping)

Presence–absence models for each species were evaluated using a ‘bootstrapping’ approach (Efron & Tibshirani 1997; Harrell 2001). Naively testing a model on the data that were used to fit it (also known as ‘in-sample validation’) is inappropriate and known to provide optimistic estimates of model performance (Fielding & Bell 1997; Steyerberg et al. 2001b), despite being a commonly used approach to model evaluation (National Parks & Wildlife Service 1998, 2000). Ideally, models should be tested on a completely independent data set, with data collection and stratification specifically designed for model evaluation. However, this is seldom a practical option and may also be prone to high variance unless samples are large (J. Elith, unpublished data, 2005). Bootstrapping provides a realistic estimate of the predictive performance of a model without incurring the expenses of collecting a completely new model-testing data set. Bootstrapping involves resampling the modelling data and conducting a series of model building and testing simulations that provide an estimate of the optimism arising from in-sample validation. The estimate of optimism is used to provide an adjusted estimate of the model evaluation statistics (ROC and MCS). The bootstrapping version implemented here is believed to provide the least biased estimate of predictive performance of any of the model evaluation methods that are based on re-sampling, including cross-validation (Hastie et al. 2001). Cross-validation (Efron & Tibshirani 1997) provides an alternative approach to model evaluation and might be more feasible with methods or data sets that create large computational loads. However, its estimates of error rates with independent data can be less precise than those derived from bootstrapping (Steyerberg et al. 2001b), which can be thought of as a smoothed version of cross-validation (Efron & Tibshirani 1997). The bootstrapping method (the 0.632+ bootstrap) is detailed in Appendix 1. R code for obtaining bootstrapped estimates of ROC and MCS is available from the URL given previously.
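The ROC statistic referred to above is the area under the ROC curve, which equals the probability that a randomly chosen presence site receives a higher predicted value than a randomly chosen absence site (Hanley & McNeil 1982). The sketch below computes it by direct pairwise comparison. It is an illustrative Python version with invented data, not the R code distributed by the authors, and it does not include the bootstrap adjustment.

```python
def roc_area(observed, predicted):
    """Area under the ROC curve by pairwise comparison of presences and absences.

    observed: sequence of 1 (presence) or 0 (absence).
    predicted: sequence of model predictions, one per observation.
    Ties between a presence and an absence count as half a win.
    """
    presences = [p for o, p in zip(observed, predicted) if o == 1]
    absences = [p for o, p in zip(observed, predicted) if o == 0]
    if not presences or not absences:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for p in presences:
        for a in absences:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presences) * len(absences))


# Hypothetical observations and predicted probabilities for four sites.
area = roc_area([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(area)
```

Computed on the training data this gives the apparent (in-sample) value, which is the optimistic quantity that the bootstrap procedure is designed to correct.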

Inspecting models for ecological realism

Final models were inspected for their ecological realism using partial plots of univariate fitted functions (Venables & Ripley 2003), similar to those used to determine the complexity of candidate response shapes. Partial plots are available as an option in statistical software and provide a ready means to evaluate the ecological realism of fitted responses. If an implausible fitted relationship between the probability of occupancy by the species and a particular environmental variable was observed in the partial plots, the variable was excluded from the model or the model was re-fitted with fewer degrees of freedom for that particular environmental variable.

Case study results and interpretation

The adequacy of species survey data for model fitting

Environmental strata were reasonably well sampled, even for the species with the least and most restricted survey data. For six of the seven species, sampled strata make up at least 70% of the region that is under forest cover, with the only exception being the quoll, for which only 56% of the region comprises sampled strata. Similarly, the geographical spread of sampled strata was reasonable for most species, with maps of sampled strata showing good coverage of the region for all species (Fig. 2). The maps of sampled and unsampled strata provide an indication of the areas in which habitat model predictions may be less reliable.

Models and predictive performance

The best models in terms of both predictive discrimination and calibration were those for the sooty owl, squirrel glider, yellow-bellied glider, koala and tiger quoll, all of which had predictive discrimination greater than 0.75 for their best model (Table 3). The worst models were the powerful owl and masked owl models, with bootstrapped ROC areas ranging from 0.61 to 0.69. In general, GAMs had similar variable structure and degrees of freedom to GLMs, and comparable predictive discrimination. However, bootstrapped predictive calibration was always better for GAMs than for GLMs (Table 3). This is probably because the smooth functions in GAMs are better at fitting complex responses over the complete environmental range of the response, leading to more accurate predicted probabilities.

Presence-only models had similar model structures to the presence–absence models. The result suggests that the survey absence data used in the modelling exercise were not much more useful than a background random sample in defining unsuitable habitat for most species. This may result from the difficulties associated with obtaining reliable absence data for cryptic species. In addition, the quality of the presence-only data used in this modelling exercise is likely to be high compared with presence-only data found in most museums and herbaria, because they were derived from systematic stratified surveys. Other studies have shown a decline in performance when fitting models with background samples compared with using presence–absence data (Ferrier & Watson 1997).

Preferred models were primarily chosen on the basis of predictive discrimination and calibration. Plausibility of response shapes was used to discern between models that showed similar predictive performance. Only model structures for the preferred model for each species are presented in Table 3. There was some variation between the four modelling methods in terms of the variables retained and their response shapes for each species. For example, the yellow-bellied glider GLM contained the term ‘dry 2000’, while the GAM for the same species did not include this term. Similarly, the sooty owl GLM contained rainfall as a linear term, while the GAM for that species included rainfall as a smoothed spline with two degrees of freedom (a non-linear term). However, the structural differences between GLM and GAM models were minor, and the similarity in ROC areas of the two model types across all species (Table 3) implies that the differences had a minor impact on predictive discrimination.

Fig. 2. Coverage of sampled strata over the greater LHCC region for the best ((a) Koala) and worst ((b) Quoll) sampled species. Unsampled strata are represented by the light grey shading. Remaining areas within the region (not shaded) contain sampled strata or are non-forest and therefore not included in the analysis.

Table 3. Final presence–absence models and bootstrapped estimates of predictive discrimination (ROC) and calibration

| Species | Preferred model | Preferred model type | ROC area (GAM) | ROC area (GLM) | Calibration (GAM) | Calibration (GLM) |
|---|---|---|---|---|---|---|
| Koala | sp ~ s(temp, 2) + s(dry2000, 3) + s(rugg500, 2) + unmod2000 + percnonfor2k | GAM | 0.76 | 0.76 | 0.84 | 0.75 |
| Quoll | sp ~ temp + rain + percnonfor2k + poly(dry2000, 2) | GLM | 0.71 | 0.75 | 0.73 | 0.59 |
| Sqgl | sp ~ s(rugg500, 3) + sqglexp500 + s(unmod500, 2) | GAM | 0.78 | 0.77 | 0.76 | 0.68 |
| Ybgl | sp ~ s(temp, 2) + s(rain, 2) + s(rf2000, 2) + unmod2000 + s(ybglexp2000, 3) + percnonfor2k | GAM | 0.78 | 0.77 | 0.73 | 0.63 |
| Mowl | sp ~ s(cold, 2) + mowlexp2000 + rf2000 + s(unmod2000, 2) + wet2000 + percnonfor2k | GAM | 0.63 | 0.61 | 0.63 | 0.55 |
| Powl | sp ~ s(cold, 3) + s(rain, 2) + s(rugg500, 2) | GAM | 0.69 | 0.68 | 0.80 | 0.62 |
| Sowl | sp ~ s(rain, 2) + rugg500 + s(sowlexp2000, 3) + s(ter1000, 2) + unmod2000 | GAM | 0.85 | 0.85 | 0.74 | 0.67 |

Only the preferred model selected from the presence–absence models (either a generalized linear model or a generalized additive model) is presented for each species. Column ‘preferred model type’ indicates which of the methods produced the final preferred model for each species. ROC and calibration slope statistics are presented for the model derived under each modelling method. Variable abbreviations and details are given in Table 2. The expressions ‘poly(variable, n)’ and ‘s(variable, n)’ in the preferred model column indicate variables included as polynomial (‘poly’) or smoothed (‘s’) terms with ‘n’ degrees of freedom. ROC, receiver operating characteristic.

Plots of fitted functions

Plotting fitted relationships provided a useful means to visualize and manipulate the behaviour of fitted models to ensure that they were ecologically meaningful. In general, fitted relationships were found to be ecologically plausible. Plots of sooty owl response shapes (Fig. 3) show how the probability of sooty owl site occupancy varies with each of the predictor variables contained in the final GAM model for that species. Fitted functions for individual variables in a multivariate model may be different to univariate fitted functions because of interactions between predictor variables. Consequently, it is important to observe response shapes for the final model to ensure that they remain ecologically reasonable in the multivariable context.

Habitat maps

A habitat map is provided for the sooty owl as an example of the type of output that can be used in conservation planning (Fig. 4). The map shows the predicted probability of sooty owl presence throughout the region. Probabilities have been classified into four categories (0–0.2, 0.21–0.5, 0.51–0.7, 0.71–1.0) for presentation. These maps also display the presence records for the species that were used in model fitting, allowing a visual assessment of predictive performance. Mapped predictions and uncertainties (based on upper and lower 95% prediction intervals) for the remaining six species are available from the authors.

Fig. 3. Partial plots of the relationship between the probability of occupancy and environmental variables included in the final sooty owl model. The X-axis represents the range of values for each environmental variable. Probabilities on the Y-axis are plotted in transformed ‘logit’ space so that they can be interpreted in the same way as linear regressions. Response shapes in each plot represent the relationship between each variable and the probability of sooty owl occupancy in the multivariate model context, independent of the other variables included in the model. Dashed lines represent 95% confidence intervals around the fitted response shape. Rugg500, topographic ruggedness (standard deviation in elevation) in a 500-m radius; Sowlexp2000, the proportion of suitable vegetation (as defined by experts) in a 2000-m radius; terr1000, relative terrain position in a 1000-m radius; unmod2000, the percentage of cells in a 2000-m radius containing unmodified forest.

[Fig. 3 panels: logit (probability of occupancy) on the Y-axis plotted against rain, rugg500, sowlexp2000, terr1000 and unmod2000 on the X-axis.]

DISCUSSION

The importance of ecological realism in modelling and the role of experts

The choice of modelling methods and approach to model building and evaluation presented in this paper reflects the importance we place on maintaining ecological realism throughout the model building process. While maintaining statistical rigour is central to good model building, there is an expansive literature on the statistical nuances of model fitting, interpretation and prediction (Guisan & Zimmerman 2000). Conversely, there is much less literature available concerning the importance of maintaining ecological realism in model building (Austin 1991, 2002). The process of plotting and evaluating fitted functions throughout model building and interpretation is fundamental to ensuring the ecological realism and general credibility of the final models.

Expert opinion forms an integral part of sound model building and evaluation, irrespective of the modelling platform and theoretical approach used (Steyerberg et al. 2001b). Some modellers view expert opinion as a source of undesirable subjectivity and prefer to let the data dictate the model. However, this ignores the limitations commonly encountered with ecological data (small to medium-sized species data sets that may include biases, and many candidate predictor variables) and the related challenges in developing robust and sensible models. Expert opinion can (and should) be used to assist in the identification of candidate model variables, interactions between variables and likely response shapes (Pearce et al. 2001b; Ferrier et al. 2002a), for corroboration of a model’s ecological realism (Austin 2002), for ad hoc evaluation of model predictions (National Parks & Wildlife Service 1998), and for preparation of predictive maps for use in decision making (Ferrier et al. 2002a). Experts may also be used effectively in the creation of indices that may themselves be used as candidate variables for statistical modelling (National Parks & Wildlife Service 1998). Consequently, the role of experts should be thought of as complementary to other more data-driven methods, rather than as a competing alternative. In our case study we emphasize the use of experts in variable reduction prior to automated variable selection. We use Harrell’s (2001) rule of thumb to identify the maximum PDF that should be offered to the variable selection routine. We contend that if expert knowledge is so lacking that reducing the candidate set to m/10 is impossible, resulting models should be treated with scepticism.
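As a rough illustration of the m/10 rule of thumb, the sketch below computes an upper limit on the candidate degrees of freedom offered to variable selection. It assumes that m is the limiting sample size for a binary response, taken here as the smaller of the number of presences and absences; the function name and the survey counts are hypothetical and are not taken from the case study.

```python
def max_candidate_df(n_presences, n_absences, divisor=10):
    """Upper limit on candidate degrees of freedom under an m/divisor rule.

    Assumption: m is the limiting sample size, i.e. the smaller of the
    presence and absence counts for a binary (presence-absence) response.
    """
    if n_presences < 0 or n_absences < 0:
        raise ValueError("counts must be non-negative")
    m = min(n_presences, n_absences)
    return m // divisor


# Hypothetical survey: 45 presences among 300 surveyed sites.
limit = max_candidate_df(n_presences=45, n_absences=255)
print(limit)  # 4
```

With so few presences, only about four degrees of freedom could be offered to the selection routine, which is why expert-led variable reduction matters for sparsely recorded species.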

The lack of an accepted model evaluation statistic for presence-only models detracts from their utility in conservation planning and increases the importance of expert evaluation. Development of robust evaluations of presence-only models is ongoing.

Interpretation of habitat maps

Threshold predictions

Fig. 4. The predicted probability of sooty owl occupancy across the LHCC region. Probabilities are represented as a grey-scale (white = 0–0.2, light grey = 0.21–0.5, medium grey = 0.51–0.8, dark grey = 0.81–1.0). The open triangles show point survey locations where the sooty owl is recorded as present. Habitat maps were constructed by creating a vector of habitat model predictions (one for every 1 ha grid cell in the landscape) in R using the ‘predict.gam’ command. The prediction vector is then written to an ascii text file in the correct dimensions for the landscape and imported to ArcInfo (or ArcView) for display.

In some instances, users of habitat maps may wish to identify a level of predicted probability of occupancy below which an area would be considered ‘unsuitable’ habitat for a given species. In general, such an interpretation of habitat maps is ill-advised unless the model that underpins the map is particularly well calibrated. However, in such instances where it is required, thresholds could be assigned in a risk-weighted manner. For example, if it is very important not to classify an area as ‘non-habitat’ when it may be used by a species, a low probability threshold (e.g. probability of occurrence = 0.1) should be specified when delineating non-habitat. By choosing a threshold of 0.1 for a well-calibrated model, it is implied that a 10% chance of a site (a grid cell of 100 m) being occupied by the species is a tolerable risk of failing to include areas that are in fact habitat for the species. This decision must be made on a case-by-case basis by evaluating the management costs of the two types of error (a false-negative and false-positive prediction), rather than by relying on an arbitrary threshold set by the modeller.
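One way to make the risk weighting explicit is the standard expected-cost rule, sketched below. It is an illustration only and is not the procedure used in the case study. It assumes a well-calibrated model and that relative costs can be attached to the two error types; the function names and cost values are hypothetical. A cell is labelled non-habitat only when the expected cost of a false negative, p × C_FN, is lower than the expected cost of a false positive, (1 − p) × C_FP, which gives a threshold of C_FP / (C_FP + C_FN).

```python
def risk_weighted_threshold(cost_false_negative, cost_false_positive):
    """Probability below which labelling a cell 'non-habitat' has the lower
    expected cost, assuming predictions are well calibrated."""
    if cost_false_negative <= 0 or cost_false_positive <= 0:
        raise ValueError("costs must be positive")
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def classify(probabilities, threshold):
    """Label each predicted probability as habitat or non-habitat."""
    return ["habitat" if p >= threshold else "non-habitat" for p in probabilities]


# Hypothetical costs: wrongly excluding habitat is judged nine times as
# costly as wrongly including non-habitat.
threshold = risk_weighted_threshold(cost_false_negative=9.0, cost_false_positive=1.0)
labels = classify([0.05, 0.10, 0.40], threshold)
print(threshold, labels)
```

Under these assumed costs the rule reproduces the 0.1 threshold used as the example in the text; different cost ratios would shift it accordingly.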

Representing uncertainty

Many forms of uncertainty impact on maps of predicted probability of species occurrence (Elith et al. 2002). The prediction intervals available from regression models describe the part of this uncertainty associated with estimating model parameters, and this stems from inadequate data, errors in measurement and natural variation (Elith et al. 2002). There are additional uncertainties, such as model selection uncertainty and errors in explanatory variables, that are, however, not addressed by standard prediction intervals. While some methods have been developed for explicitly incorporating these forms of uncertainty in prediction intervals (Burnham & Anderson 2002; Wintle et al. 2003), there is currently very little guidance on how prediction intervals should be represented and interpreted in applications of habitat modelling and conservation planning.

Interpreting model evaluation statistics

The two statistics used for model evaluation in this study address different model applications. Specifically, model discrimination, as measured by the ROC area, describes the degree to which the model successfully ranks presence sites higher than absence sites across the region in terms of predicted probability of presence. Models with satisfactory ROC areas will provide reliable ranking of areas in terms of habitat value. This mode of evaluation is relevant when the goal of a model is to rank or prioritize areas of interest in terms of their relative value as habitat for a species. This goal is different to that of accurately predicting the proportion of sites that are expected to be occupied at a given predicted probability of occurrence. For instance, at sites with a predicted probability of occurrence of 0.5, it is reasonable to expect that approximately 50% of such sites would contain the species. The degree to which this is true is described as model calibration (Miller et al. 1991) and is assessed using the logistic calibration equation. It is possible that a model with good discrimination can have poor calibration. It is also possible to improve a model’s predictive ability by adjusting the model parameters with calibration statistics (Harrell 2001) or other shrinkage methods (Hastie et al. 2001). A model with poor calibration may still be useful for prioritizing or ranking sites in terms of habitat value, but should not be used to predict the raw probability of finding a species at a given site. The choice of whether to focus on ROC areas or calibration statistics depends on the intended application of the models.
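The distinction between discrimination and calibration can be illustrated with a small pure-Python sketch. It assumes the ROC area is computed as the proportion of presence-absence pairs ranked correctly (ties counted as half) and that the logistic calibration equation takes the form logit(Pr[y = 1]) = a + b × logit(predicted), fitted here by Newton-Raphson. The function names and the eight-site data set are invented for illustration, and the bootstrapping used in the case study is not reproduced.

```python
import math


def roc_area(observed, predicted):
    """Proportion of presence-absence pairs in which the presence site has
    the higher predicted probability (ties count as half)."""
    pres = [p for y, p in zip(observed, predicted) if y == 1]
    absn = [p for y, p in zip(observed, predicted) if y == 0]
    if not pres or not absn:
        raise ValueError("need at least one presence and one absence")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pres for b in absn)
    return wins / (len(pres) * len(absn))


def calibration_line(observed, predicted, iterations=25):
    """Fit logit(Pr[y = 1]) = a + b * logit(predicted) by Newton-Raphson.
    Returns (intercept a, slope b); b near 1 suggests good calibration."""
    x = [math.log(p / (1.0 - p)) for p in predicted]
    a, b = 0.0, 1.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, observed):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu            # score for the intercept
            g1 += (yi - mu) * xi     # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Invented data: predictions of 0.1 and 0.9, but observed occupancy rates
# of 0.25 and 0.75, i.e. an overconfident model.
observed = [1, 0, 0, 0, 1, 1, 1, 0]
predicted = [0.1] * 4 + [0.9] * 4
area = roc_area(observed, predicted)
intercept, slope = calibration_line(observed, predicted)
print(area)               # 0.75
print(round(slope, 3))    # slope close to 0.5
```

In this invented example the model ranks sites reasonably well, yet the calibration slope of about 0.5 shows that its predicted probabilities are too extreme to be read as raw occupancy rates.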

The issue of occupancy versus persistence

Models of the sort discussed in this paper make the implicit assumption that occupancy of a location implies suitability of the habitat. Van Horne (1983) identified problems associated with this assumption, and Pulliam’s (1988) ‘source-sink’ model of population dynamics formalizes its conceptual deficiencies. Pulliam’s model differentiates ‘sink’ areas, where mortality exceeds population growth, from ‘source’ areas that maintain overall population size through emigration. Similarly, if the total population is in a state of decline, it is possible that ‘remnant’ individuals may exist in unsuitable habitat that will not be occupied in the next generation. When a population is expanding its range, observations of absence of the species may falsely imply that the suitability of the habitat is low. These anomalies arise due to what is known as the ‘equilibrium’ assumption, which underpins the current static approach to wildlife habitat modelling (Austin 2002). Inclusion of disturbance history variables, dispersal barriers, competition and successional dynamics may assist in modelling such situations, though examples of such approaches are rare.

Survey design and sampling issues

This paper has focused on the choice and execution of particular modelling methods given a set of data. Issues of data quality and sampling have not been addressed. Because modelling results are only as good as the available data, the issues of data quality and sampling demand some discussion here.

Sample size

From a statistical perspective, the choice of sample size is a function of the desired precision of results. In studies where the goal is to make inference about a particular effect (e.g. the influence of time since logging on the probability of greater glider occupancy), power analysis may be used to determine the sample size required to achieve a statistically significant result for a given effect size, type I error rate and sampling variance (Burgman & Lindenmayer 1998). The relationship between statistical significance, power and model predictive performance is unclear, and there is no guarantee that a model with statistically significant terms will give good predictive performance. Unfortunately, there is no simple way to determine a priori the predictive performance of a habitat model, or to calculate a sample size that will ensure a predictive performance that is statistically significantly better than that of a null or alternative model. We could find only one attempt to address this problem in the statistical literature (Obuchowski 1994), making this an important area of future research. A simulation study that provided rules of thumb about sample sizes required to achieve a given predictive performance under a range of likely modelling scenarios would be an important contribution to the habitat modelling literature.

Data quality

A more subtle consideration is the geographical spread and environmental stratification of observations compared to the extent and environmental variability of the region for which models are to be built. We briefly touch on this issue in our case study when testing the adequacy of survey data using environmental strata. Little guidance is provided in the literature as to the minimum geographical or environmental coverage of data for statistical modelling, though it is common sense to expect that model predictions are unlikely to be reliable in environmental domains for which there are no survey data (Austin & Meyers 1996). Stratified sampling designs such as ‘GRADSECT’ (Austin & Heyligers 1989) that ensure a geographical and bioclimatic coverage of sampling locations are therefore appealing. GRADSECT theory, or newer developments such as Generalized Dissimilarity Modelling (GDM; Ferrier et al. 2002b) or a p-median criterion (Faith & Walker 1996), may also be applied to existing data sets to identify ‘gaps’ in the geographical and environmental coverage of samples.

Most statistical analyses, including the regression methods used in our case study, assume that observations are independent of each other. Survey locations that are close together are much less likely to be independent, especially when surveys target animals with large home ranges. Survey design should aim to ensure a minimum separation distance that is at least as great as the home range radius of the widest ranging species in the study. Random sampling within geographical and environmental strata assists in minimizing unwanted dependencies in observation data. Guidance on sampling theory should be sought from one of the many texts that deal with the topic in detail (e.g. Sutherland 1996; Thompson 2002).
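A minimum separation distance can be checked against an existing set of survey locations with a few lines of code. The sketch below assumes site coordinates are in a projected system measured in metres; the function name, site identifiers, coordinates and the 1000-m separation are hypothetical.

```python
import math
from itertools import combinations


def pairs_too_close(sites, min_separation_m):
    """Return (id_a, id_b, distance) for every pair of sites closer than
    the minimum separation. Each site is (identifier, easting, northing)."""
    flagged = []
    for (id_a, xa, ya), (id_b, xb, yb) in combinations(sites, 2):
        distance = math.hypot(xa - xb, ya - yb)
        if distance < min_separation_m:
            flagged.append((id_a, id_b, distance))
    return flagged


# Hypothetical sites and a hypothetical 1000-m home range radius.
sites = [("A", 0.0, 0.0), ("B", 300.0, 400.0), ("C", 5000.0, 0.0)]
close_pairs = pairs_too_close(sites, min_separation_m=1000.0)
print(close_pairs)  # [('A', 'B', 500.0)]
```

Flagged pairs could then be thinned, or treated as non-independent in the analysis, depending on the survey objectives.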

Technical issues and promising advances in modelling methods

Species versus community approaches

Models of the distribution of single species are not the only way to approach conservation planning questions. Single species models are important for a subset of species, such as threatened, focal or flagship species. However, landscape planning aims to conserve the biodiversity of a region, and there are too many species and too little data to achieve this through single-species modelling. Therefore, there is a role for community-level modelling (Ferrier et al. 2002b). One example of community-level modelling is GDM, already mentioned in the context of sampling; this models compositional dissimilarity across the landscape. This could then contribute to a conservation plan that aimed to span the range of species patterns in the landscape. A further role for methods based on data from many species is to use them for modelling rare species, or species in data sets with relatively few records per species. In such situations there may be insufficient data for developing a robust single-species model. GDMs, canonical correspondence analysis (CCA) or multivariate adaptive regression splines (MARS) may be useful for this purpose.

Spatial autocorrelation

Spatial autocorrelation in wildlife observation data arises when environmental processes and patterns that influence the spatial distribution of wildlife are themselves spatially structured and/or because the species is subject to demographic processes, territoriality or dispersal limitations causing spatial dependence (contagion or dispersion effects). Demographic and environmental processes underlying spatial patterns in wildlife distributions are usually poorly understood and therefore difficult to incorporate in model fitting. Consequently, model residuals are often spatially correlated (i.e. not independent), violating one of the basic assumptions of regression modelling. In practice, non-independence usually results in underestimation of standard errors and overestimation of the importance of habitat variables (Legendre & Fortin 1989). Methods such as autologistic regression (Augustin et al. 1996), generalized estimating equations (Albert & McShane 1995) and geographically weighted regression (Fotheringham et al. 2002) may be used to incorporate spatial autocorrelation in habitat analyses, but they are still not incorporated in standard statistical software and are technically demanding to implement.
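Before turning to the more demanding methods, spatial dependence in model residuals can be screened with a simple diagnostic. The sketch below uses Moran's I, a standard index that is not part of the case study, with binary weights that treat two sites as neighbours when they lie within a chosen distance band. The function name, the distance band and the residual values are assumptions made for illustration.

```python
import math


def morans_i(coords, values, max_distance):
    """Moran's I with binary distance-band weights: sites i and j are
    neighbours (weight 1) if they are within max_distance of each other."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    cross_products = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.hypot(coords[i][0] - coords[j][0],
                           coords[i][1] - coords[j][1])
            if d <= max_distance:
                weight_sum += 1.0
                cross_products += z[i] * z[j]
    sum_squares = sum(zi * zi for zi in z)
    if weight_sum == 0.0 or sum_squares == 0.0:
        raise ValueError("no neighbours within the band, or constant values")
    return (n / weight_sum) * cross_products / sum_squares


# Four hypothetical sites along a transect, one distance unit apart.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
clustered = morans_i(coords, [1.0, 1.0, -1.0, -1.0], max_distance=1.0)
alternating = morans_i(coords, [1.0, -1.0, 1.0, -1.0], max_distance=1.0)
print(clustered, alternating)
```

Positive values indicate that neighbouring residuals tend to be similar (contagion), and negative values that they tend to differ (dispersion); values well away from zero would suggest that the independence assumption deserves closer attention.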

Model selection and model uncertainty

The standard approach to representing prediction uncertainty involves the calculation of prediction (or confidence) intervals that incorporate uncertainty about parameter estimates. However, this approach to prediction implicitly assumes that the model chosen is the best available representation of the truth and effectively ignores model selection uncertainty, resulting in overconfident predictions. Automated variable selection algorithms exacerbate this problem by providing a method for dredging through many candidate predictors in search of an explanatory model (Chatfield 1995; Hoeting et al. 1999). One solution is to completely avoid automated model selection and to analyse only a small set of ‘a priori’ models (Burnham & Anderson 2002). This approach may be practical when strong prior knowledge exists and where the number of plausible models is small. However, if prediction is the primary goal and many possible models exist, some level of automated variable selection is often desirable. Model averaging approaches, including Bayesian model averaging, have been promoted in a range of disciplines as a means of incorporating model selection uncertainty into statistical inference and prediction (Hoeting et al. 1999) and have been applied in some ecological examples (Burnham & Anderson 2002; Wintle et al. 2003).
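The idea of model averaging can be sketched with Akaike weights, the information-theoretic variant described by Burnham & Anderson (2002). Bayesian model averaging would use posterior model probabilities in place of these weights. The AIC values and site predictions below are invented, and the function names are hypothetical.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC values to weights: w_i is proportional to
    exp(-0.5 * (AIC_i - AIC_min)), and the weights sum to 1."""
    best = min(aic_values)
    relative = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


def averaged_prediction(predictions, weights):
    """Weighted average of the candidate models' predictions for one site."""
    return sum(p * w for p, w in zip(predictions, weights))


# Three hypothetical candidate models and their predicted probability of
# occupancy at a single site.
aic = [100.0, 102.0, 110.0]
site_predictions = [0.60, 0.40, 0.20]
weights = akaike_weights(aic)
print([round(w, 3) for w in weights])
print(round(averaged_prediction(site_predictions, weights), 3))
```

The averaged prediction is pulled towards the best-supported model but still reflects the alternatives, which is how model selection uncertainty enters the prediction.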

Estimating and incorporating detectability in habitat models

Unless the probability of detecting a species when it is present is equal to 1, false negative observation errors will occur in species surveys. The probability of detecting the presence of the case study species in any single standard survey based on spot-lighting and call elicitation has been found to be very low (Pr[detection|presence] ~ 0.12–0.45; Wintle et al. in press), making the reliability of absence data a potentially serious form of uncertainty in our case study. Recent studies have demonstrated the negative impact that false-negative observation error may have on species-habitat analyses (Tyre et al. 2003), meta-population models (Moilanen 2002) and monitoring studies (MacKenzie et al. 2002). Recently developed techniques for incorporating detectability in model estimation (MacKenzie et al. 2002; Tyre et al. 2003) reduce bias in model estimation brought about by false absences, though little effort has been invested in testing the relative predictive performance of such models.
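The practical consequence of low detectability can be shown with simple arithmetic. The sketch below assumes that repeat visits to an occupied site are independent and share the same per-visit detection probability p, so that the probability of recording a false absence after n visits is (1 − p)^n. The function names and the 5% tolerance are illustrative assumptions; only the range of p comes from the text.

```python
import math


def prob_false_absence(p_detect, n_visits):
    """Probability that an occupied site yields no detections in n visits,
    assuming independent visits with constant detection probability."""
    return (1.0 - p_detect) ** n_visits


def visits_needed(p_detect, max_false_absence=0.05):
    """Smallest number of visits for which the false-absence probability
    falls to or below the stated tolerance."""
    if not 0.0 < p_detect < 1.0:
        raise ValueError("p_detect must lie strictly between 0 and 1")
    return math.ceil(math.log(max_false_absence) / math.log(1.0 - p_detect))


# Per-visit detection probabilities spanning the range reported in the text.
for p in (0.12, 0.45):
    print(p, visits_needed(p))  # 24 visits at 0.12; 6 visits at 0.45
```

Under these assumptions, a single visit leaves a false-absence probability of 0.55 to 0.88 at occupied sites, which is why absence records from single surveys warrant caution.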

Incorporating biotic interactions

Despite the prevalence of interspecific competition as a key concept in community ecology (Diamond 1975), very little has been attempted by way of incorporating interspecific competition in distribution models (for one exception see Leathwick & Austin 2001). Regression models such as those described by Leathwick and Austin (2001) could be applied to fauna species. Future development of ‘loop analysis’ (Levins 1975) or systems of simultaneous regressions (Guisan & Zimmerman 2000) may facilitate spatial models of wildlife distributions that incorporate competitive interactions.

The GLMs and GAMs presented here are relatively simple, though suited to the conservation problem and available data. There is scope for greater sophistication in the modelling method where a high level of modelling expertise exists, though we believe our recommended approach represents a reasonable trade-off between practicality and rigour.

Model improvement and adaptive management

Due to environmental change, natural and anthropogenic disturbance and population stochasticity, all predictive models will become redundant if they are not updated through time. We recommend that models be applied in an adaptive framework (Walters & Holling 1990), allowing immediate application of models in management but ensuring a commitment to continual improvement of models through future data collections and refinements to modelling methods.

Future data collection will be central to rigorous model testing and model refinements. They should be targeted to fill gaps in current data sets (Fig. 3). Particular attention should be paid to the collection of high quality observation data based on multiple site visits aimed at minimizing false absences (Wintle et al. 2004). Some important environmental variables, including those based on forest growth stage data, were not available for model fitting in the LHCC region. It is likely that models would have better predictive accuracy if forest growth stage information became available and was incorporated in models.

CONCLUSION

Habitat models are now a widely used tool in public land conservation planning. Systematic conservation planning in the urban fringe will require such tools to ensure transparency and repeatability of planning outcomes. However, access to statistical and ecological expertise for urban conservation planning is likely to be limited when compared with the public land planning processes that have taken place in Australia. It is therefore important that we continue to develop and refine planning tools, including habitat modelling methods, with an emphasis on statistical and ecological rigour and simplicity.

ACKNOWLEDGEMENTS

This work was instigated by the LHCCREMS organization (now Hunter Councils). Steven House (EcoLogical), Michael Murray (Forest Fauna Surveys P/L), Philip Gleeson (NPWS), Daniel Connolly (NPWS), Jill Smith (NPWS), Peter Bone (NPWS), Peter Ewin (NPWS), Mick Andren (NPWS) and Rod Kavanagh (SFNSW) provided data, expert guidance on model construction and valuable insights. Mark Burgman and Kuniko Yamada (University of Melbourne) helped throughout the project. Comments from Terry Walshe (University of Western Australia), David Tierney and Robbie Economos improved the manuscript substantially. Discussions with Trevor Hastie and John Leathwick helped clarify bootstrapping issues. Two anonymous reviewers provided constructive feedback. Sophie Powrie, Kirsty Winter and Meredith Liang (LHCCREMS) provided excellent support and useful feedback throughout the project.

REFERENCES

Agresti A. (1996) An Introduction to Categorical Data Analysis. John Wiley and Sons, New York.

Akaike H. (1973) Information theory and an extension of the maximum likelihood. In: Proceedings of the 2nd International Symposium on Information Theory (eds B. N. Petrov & F. Csáki) pp. 267–81. Akademia Kaido, Budapest.

Albert P. S. & McShane L. M. (1995) A generalized estimating equations approach for spatially correlated data: applications to the analysis of neuroimaging data. Biometrics 51, 627–38.

Augustin N. H., Mugglestone M. A. & Buckland S. T. (1996) An autologistic model for the spatial distribution of wildlife. J. Appl. Ecol. 33, 339–47.

Austin M. P. (1991) Vegetation: data collection and analysis. In: Nature Conservation: Cost Effective Biological Surveys and Data Analysis (eds C. R. Margules & M. P. Austin) pp. 37–41. CSIRO, Canberra.

Austin M. P. (2002) Spatial prediction of species distribution: an interface between ecological theory and statistical modelling. Ecol. Model. 157, 101–18.

Austin M. P. & Heyligers P. C. (1989) Vegetation survey design for conservation: GRADSECT sampling of forests in north-eastern NSW. Biol. Conserv. 50, 13–32.

Austin M. P. & Meyers J. A. (1996) Current approaches to modelling the environmental niche of Eucalypts: implications for management of forest biodiversity. For. Ecol. Manage. 85, 95–106.

Bambar D. (1975) The area above the ordinal dominance graph and the area below the receiver operating characteristics graph. J. Math. Psych. 12, 387–415.

Belsley D. A., Kuh E. & Welsch R. E. (1980) Regression Diagnostics. John Wiley and Sons, New York.

Bio A. M. F., Alkemande R. & Barendregt A. (1998) Determining alternative models for vegetation response analysis – a non-parametric approach. J. Veg. Sci. 9, 5–16.

Breiman L., Friedman J. H., Olshen R. A. & Stone C. J. (1984) Classification and Regression Trees. Wadsworth International Group, Belmont.

Breininger D. R., Larson V. L., Duncan B. W. & Smith R. B. (1998) Linking habitat suitability to demographic success in Florida scrub-jays. Wild. Bull. 26, 118–28.

Burgman M. A. & Lindenmayer D. B. (1998) Conservation Biology for the Australian Environment. Surrey Beatty and Sons, Chipping Norton.

Burnham K. P. & Anderson D. R. (2002) Model Selection and Multimodel Inference: a Practical Information-Theoretic Approach. Springer, New York.

Burgman M. A., Breininger D. R., Duncan B. W. & Ferson S. (2001) Setting reliability bounds on Habitat Suitability Indices. Ecol. Appl. 11, 70–8.

Carpenter G., Gillison A. N. & Winter J. (1993) DOMAIN: a flexible modelling procedure for mapping potential distributions of plants and animals. Biodivers. Conserv. 2, 667–80.

Chatfield C. (1995) Model uncertainty, data mining and statistical inference. J. Roy. Stat. Soc. Ser. A Stat. 158, 419–66.

Commonwealth of Australia (2003) Terms of Reference: Standing Committee on Environment and Heritage Inquiry into Sustainable Cities. Parliament of Australia, House of Representatives, Canberra.

Derksen S. & Keselman H. J. (1992) Backward, forward and stepwise automated subset selection algorithms: frequency of obtaining authentic and noise variables. Br. J. Math. Stat. Psych. 45, 265–82.

Diamond J. M. (1975) Assembly of species communities. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 342–444. Harvard University Press, Cambridge.

Ecotone Ecological Consultants (2001) Lower Hunter Central Coast Regional Biodiversity Strategy: Fauna survey and mapping project – Module 1 Fauna Surveys. Ecotone Ecological Consultants for LHCCREMS, Waratah. pp. 43.

Edgar R. & Belcher C. (1995) Spotted-tailed Quoll. In: The Mammals of Australia (ed. R. Strahan) pp. 67–8. Reed Books, Chatswood.

Efron B. & Tibshirani R. (1997) Improvements on cross-validation: the .632+ bootstrap method. J. Am. Stat. Ass. 92, 548–60.

Elith J. (2000) Quantitative methods for modelling species habitat: comparative performance and an application to Australian plants. In: Quantitative Methods for Conservation Biology (eds S. Ferson & M. A. Burgman) pp. 39–58. Springer-Verlag, New York.

Elith J. & Burgman M. A. (2002) Predictions and their validation: rare plants in the Central Highlands, Victoria, Australia. In: Predicting Species Occurrences: Issues of Accuracy and Scale (eds J. M. Scott, P. J. Heglund, M. L. Morrison, M. G. Raphael, W. A. Wall & F. B. Samson) pp. 303–14. Island Press, Covelo.

Elith J. & Burgman M. A. (2003) Habitat models for PVA. In: Population Viability in Plants (eds C. A. Brigham & M. W. Schwartz) pp. 203–35. Springer-Verlag, New York.

Elith J., Burgman M. A. & Regan H. M. (2002) Mapping epistemic uncertainties and vague concepts in predictions of species distribution. Ecol. Model. 157, 313–29.

Ennis M., Hinton G., Naylor D., Revow M. & Tibshirani R. (1998) A comparison of statistical learning methods on the GUSTO database. Stat. Med. 17, 2501–8.

ESRI (1997) Arcinfo 7.2. Environmental Systems Research Institute Inc., Redlands.

Faith D. P. & Walker P. A. (1996) Environmental diversity: on the best-possible use of surrogate data for assessing the relative biodiversity of sets of areas. Biodivers. Conserv. 5, 399–415.

Ferrier S. & Watson G. (1997) An Evaluation of the Effectiveness of Environmental Surrogates and Modelling Techniques in Predicting the Distribution of Biological Diversity. Department of Environment, Sports and Territories, Commonwealth of Australia and NSW National Parks and Wildlife Service, Canberra.

Ferrier S., Watson G., Pearce J. & Drielsma M. (2002a) Extended statistical approaches to modelling spatial pattern in biodiversity in north-east New South Wales. I. Species level modelling. Biodivers. Conserv. 11, 2275–307.

Ferrier S., Drielsma M., Manion G. & Watson G. (2002b) Extended statistical approaches to modelling spatial pattern in biodiversity: the north-east New South Wales experience. II. Community-level modelling. Biodivers. Conserv. 11, 2309–38.

Fielding A. H. & Bell J. F. (1997) A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ. Conserv. 24, 38–49.

Fotheringham A. S., Brunsdon C. & Charlton M. (2002) Geographically Weighted Regression: the Analysis of Spatially Varying Relationships. John Wiley and Sons, London.

Franklin J. (1995) Predictive vegetation mapping: geographic modeling of biospatial patterns in relation to environmental gradients. Prog. Phys. Geog. 19, 494–519.

Friedman J. H. (1991) Multivariate adaptive regression splines (with discussion). Ann. Stat. 19, 1–141.

Goldingay R. L. & Kavanagh R. P. (1993) Home-range estimates and habitat of the yellow-bellied glider (Petaurus australis) at Waratah Creek, New-South-Wales. Wildl. Res. 20, 387–404.

Graham C. H., Ferrier S., Huettman F., Moritz C. & Peterson A. T. (2004) New developments in museum-based informatics and applications in biodiversity analysis. Trends Ecol. Evol. 19, 497–503.

Guisan A. & Harrell F. E. (2000) Ordinal response regression models in ecology. J. Veg. Sci. 11, 617–26.

Guisan A. & Zimmerman N. E. (2000) Predictive habitat distribution models in ecology. Ecol. Model. 135, 147–86.

Hanley J. A. & McNeil B. J. (1982) The meaning and use of the area under a Receiver Operating Characteristic (ROC) curve. Radiology 143, 29–36.

Harrell F. E. (2001) Regression Modelling Strategies with Application to Linear Models, Logistic Regression and Survival Analysis. Springer, New York.

Harrell F. E., Lee K. L. & Mark D. B. (1996) Multivariate prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat. Med. 15, 361–87.

Hastie T. & Tibshirani R. (1990) Generalized additive models. Monographs on statistics and applied probability (eds D. R. Cox, D. V. Hinkley, D. Rubin & B. W. Silverman). Chapman & Hall, London.

Hastie T., Tibshirani R. & Friedman J. H. (2001) The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer-Verlag, New York.

Hirzel A. H. (2001) When GIS come to life. Linking Landscape- and Population Ecology for Large Population Management Modelling: the Case of Ibex (Capra Ibex) in Switzerland. The University of Lausanne, Lausanne.

Hoeting J. A., Madigan D., Raftery A. E. & Volinsky C. T. (1999) Bayesian model averaging: a tutorial. Stat. Sci. 14, 382–401.

Houlder D. J., Hutchinson M. F., Nix H. A. & McMahon J. P. (1999) ANUCLIM User Guide, Version 5.0. Centre for Resource and Environmental Studies, Australian National University, Canberra.

Kadmon R., Farber O. & Danin A. (2003) A systematic analysis of factors affecting the performance of climatic envelope models. Ecol. Appl. 13, 853–67.

Kavanagh R. P. (2002) Conservation and management of large forest owls in southeastern Australia. In: The Ecology and Conservation of Owls (eds I. Newton, R. Kavanagh, J. Olson & I. Taylor) pp. 201–19. CSIRO, Melbourne.

Lambeck R. J. (1997) Focal species: a multi-species umbrella for nature conservation. Conserv. Biol. 11, 849–56.

Leathwick J. R. & Austin M. P. (2001) Competitive interactions between tree species in New Zealand’s old-growth indigenous forests. Ecology 82, 2560–73.

Legendre P. & Fortin M. J. (1989) Spatial pattern and ecological analysis. Vegetatio 80, 107–38.

Legendre L. & Legendre P. (1998) Numerical Ecology. Elsevier, New York.

Levins R. (1975) Evolution of communities near equilibrium. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 16–50. Harvard University Press, Cambridge.

LHCCREMS (2004) Lower Hunter Central Coast Regional Biodiversity Strategy 2004 (4 Volume). Hunter Councils on behalf of the Lower Hunter Central Coast Regional Environmental Strategy, Thornton.

Li W., Wang Z., Ma Z. & Tang H. (1999) Designing the core zone in a biosphere reserve based on suitable habitats: Yancheng Biosphere Reserve and the red crowned crane (Grus japonensis). Biol. Conserv. 90, 167–73.

Loyn R H McNabb E G Volodina L amp Willig R (2001)Modelling landscape distributions of large forest owls asapplied to managing forests in north-east Victoria AustraliaBiol Conserv 97 361ndash76

McCullagh P amp Nelder J A (1989) Generalized Linear ModelsChapman amp Hall London

MacKenzie D I Nichols J D Lachman G B Droege SRoyle J A amp Langtimm C A (2002) Estimating site occu-pancy rates when detection probabilities are less than oneEcology 83 2248ndash55

Manel S Dias J M Buckton S T amp Ormerod S J (1999a)Alternative methods for predicting species distribution anillustration with Himalayan river birds J Appl Ecol 36734ndash47

Manel S Dias J M amp Ormerod S J (1999b) Comparingdiscriminant analysis neural networks and logistic regres-sion for predicting species distributions a case study witha Himalayan river bird Ecol Model 120 337ndash47

Miller M E Hui S L amp Tierney W M (1991) Validationtechniques for logistic regression models Stat Med 101213ndash26

Moilanen A (2002) Implications of empirical data quality tometapopulation model parameter estimation and applica-tion Oikos 96 516ndash30

Moisen G G amp Frescino T S (2002) Comparing five modelingtechniques for predicting forest characteristics Ecol Model157 209ndash25


National Parks & Wildlife Service (1998) Eden Fauna Modelling. New South Wales National Parks & Wildlife Service, New South Wales Regional Forest Agreement Steering Committee, Canberra.

National Parks and Wildlife Service (2000) Modelling Areas of Habitat Significance for Fauna and Flora in the Southern CRA. NPWS NSW, Canberra.

Nix H. (1986) A biogeographic analysis of Australian elapid snakes. In: Atlas of Elapid Snakes of Australia (ed. R. Longmore) pp. 4–15. Australian Government Publishing Service, Canberra.

Obuchowski N. A. (1994) Computing sample size for receiver operating characteristic studies. Invest. Radiol. 29, 238–43.

Pearce J. & Ferrier S. (2000) Evaluating the predictive performance of habitat models developed using logistic regression. Ecol. Model. 133, 225–45.

Pearce J. & Ferrier S. (2001) The practical value of modelling relative abundance of species for regional conservation planning: a case study. Biol. Conserv. 98, 33–43.

Pearce J., Ferrier S. & Scotts D. (2001a) An evaluation of the predictive performance of distributional models for flora and fauna in north-east New South Wales. J. Environ. Manage. 62, 171–84.

Pearce J. L., Cherry K., Drielsma M., Ferrier S. & Whish G. (2001b) Incorporating expert knowledge and fine-scale vegetation mapping into statistical modelling of faunal distribution. J. Appl. Ecol. 38, 412–24.

Phillips S. J., Anderson R. P. & Schapire R. E. (in press) Maximum entropy modeling of species geographic distributions. Ecol. Model.

Pressey R. L., Possingham H. P., Logan V. S., Day J. R. & Williams P. H. (1999) Effects of data characteristics on the results of reserve selection algorithms. J. Biogeogr. 26, 179–91.

Pulliam H. R. (1988) Sources, sinks, and population regulation. Am. Nat. 132, 652–61.

R Development Core Team (2004) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org

Reading R. P., Clark T. A., Seebeck J. H. & Pearce J. (1996) Habitat Suitability Index model for the eastern barred bandicoot, Perameles gunnii. Wildl. Res. 23, 221–35.

Reed P. C. & Lunney D. (1990) Habitat loss: the key problem for the long-term survival of koalas in New South Wales. In: Koala Summit: Managing Koalas in New South Wales (eds D. Lunney, C. A. Urquhart & P. C. Reed) pp. 9–31. University of Sydney, Sydney.

Ripley B. D. (1995) Pattern Recognition and Neural Networks – a Statistical Approach. Cambridge University Press, Cambridge.

Russel R. (1995) Yellow-bellied glider. In: The Mammals of Australia (ed. R. Strahan) pp. 226–8. Reed Books, Chatswood.

Steyerberg E. W., Eijkemans M. J. C., Harrell F. E. & Habbema J. D. F. (2001a) Prognostic modeling with logistic regression analysis: in search of a sensible strategy in small data sets. Med. Decis. Making 21, 45–56.

Steyerberg E. W., Harrell F. E., Borsboom G. J. J. M., Eijkemans M. J. C., Vergouwe Y. & Habbema J. D. F. (2001b) Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J. Clin. Epidemiol. 54, 774–81.

Stockwell D. & Peters D. (1999) The GARP modelling system: problems and solutions to automated spatial prediction. Int. J. Geogr. Inf. Sci. 13, 143–58.

Sutherland W. J. (1996) Ecological Census Techniques: a Handbook, 1st edn. Cambridge University Press, Cambridge.

ter Braak C. J. F. (1986) Canonical Correspondence Analysis: a new eigenvector technique for multivariate direct gradient analysis. Ecology 67, 1167–79.

Thompson S. K. (2002) Sampling, 2nd edn. Wiley, New York.

Tyre A. J., Tenhumberg B., Field S. A., Possingham H. P., Niejalke D. & Parris K. (2003) Improving precision and reducing bias in biological surveys by estimating false negative error rates in presence-absence data. Ecol. Appl. 13, 1790–801.

Van Horne B. (1983) Density as a misleading indicator of habitat quality. J. Wildl. Man. 47, 893–901.

Van Horne B. & Wiens J. A. (1991) Forest Bird Habitat Suitability Models and the Development of General Habitat Models. United States Department of the Interior, Fish and Wildlife Service, Washington, DC.

Venables W. N. & Ripley B. D. (2003) Modern Applied Statistics with S, 4th edn. Springer, New York.

Walters C. & Holling C. S. (1990) Large-scale management experiments and learning by doing. Ecology 71, 2060–8.

Wintle B. A., McCarthy M. A., Volinsky C. T. & Kavanagh R. P. (2003) The use of Bayesian Model Averaging to better represent the uncertainty in ecological models. Conserv. Biol. 17, 1579–90.

Wintle B. A., Elith R. J., Yamada K. & Burgman M. A. (2004) LHCCREMS fauna survey and mapping project. Module 2: Habitat modelling and conservation requirements. Lower Hunter & Central Coast Regional Environmental Management Strategy, Callaghan.

Wintle B. A., McCarthy M. A., Parris K. M. & Burgman M. A. (2004) Precision and bias of methods for estimating point survey detection probabilities. Ecol. Appl. 14, 703–12.

Wilson K. A., Westphal M. I., Possingham H. P. & Elith J. (2005) Sensitivity of conservation planning to different approaches to using species distribution data. Biol. Conserv. 122, 99–112.

Wintle B. A., Burgman M. A., McCarthy M. A. & Kavanagh R. P. (in press) The magnitude and management consequences of false negative observation error in surveys of arboreal marsupials and large forest owls. J. Wildl. Man.

Worton B. J. (1989) Kernel methods for estimating the utilization distribution in home-range studies. Ecology 70, 164–8.

Yencken D. & Wilkinson D. (2000) Resetting the Compass: Australia's Journey Towards Sustainability. CSIRO Publishing, Melbourne.

Zaniewski A. E., Lehmann A. & Overton J. M. (2002) Predicting species distribution using presence-only data: a case study of native New Zealand ferns. Ecol. Model. 157, 261–80.

APPENDIX 1

Method for .632+ bootstrap evaluation, after Harrell et al. (1996) and Efron and Tibshirani (1997).

1. Develop model on all n observations.

2. Calculate the statistic(s) of choice for evaluation on the same data (i.e. the training data); call this Stat_app because it is the apparent value of the statistic.

3. Take a bootstrap sample, i.e. a sample of size n with replacement of rows of the data matrix. Keep track of which sites are in the bootstrap sample and which are excluded.

4. Fit the model on the bootstrap sample (using the same methods as used on the full set).

5. Compute the statistic on the bootstrap data set (observations vs. fitted values) and call it Stat_boot.

6. Also compute the statistic on a version of the bootstrap data where the observations are randomized (Stat_permute).

7. Use the bootstrap model to predict to the excluded data set; calculate the statistic on these predictions: Stat_excl.

8. Use Stat_boot, Stat_permute and Stat_excl to calculate the amount of overfitting, the relative overfitting rate and weights that are then used to make a best estimate of predictive performance, Stat_best_est. This statistic puts most emphasis on predictions to the excluded data, particularly when the model is overfitted (i.e. when Stat_boot – Stat_excl is large). For details of this step see Steyerberg et al. (2001b).

9. Measure how optimistic the fit on the bootstrap sample was: O = Stat_boot – Stat_best_est.

10. Repeat steps 3–7 100–200 times.

11. Calculate an average optimism, Ō.

12. Use Ō to correct Stat_app for its optimism: Stat_app – Ō. This is a near unbiased estimate of the expected value of the external predictive performance of the process that generated Stat_app.
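The resampling loop in this appendix can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: it skips the .632+ weighting of steps 6 and 8 and uses Stat_excl directly in place of Stat_best_est, so the optimism is Stat_boot – Stat_excl. The one-predictor logistic fit (`fit_logistic`) is a hypothetical stand-in for whatever GLM or GAM routine is being evaluated, and all function names are assumptions made for the example.

```python
import math
import random


def roc_area(y, p):
    """ROC area: chance that a random presence outranks a random absence (ties count half)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    score = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return score / (len(pos) * len(neg))


def fit_logistic(x, y, steps=200, rate=0.1):
    """Hypothetical stand-in model: one-predictor logistic regression by gradient ascent."""
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            resid = yi - 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += resid
            g1 += resid * xi
        b0 += rate * g0 / len(x)
        b1 += rate * g1 / len(x)
    return b0, b1


def predict(model, x):
    """Predicted probabilities from the (intercept, slope) pair."""
    b0, b1 = model
    return [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]


def optimism_corrected(x, y, n_boot=50, seed=1):
    """Return (Stat_app, Stat_app minus mean optimism) for the ROC area."""
    rng = random.Random(seed)
    n = len(x)
    stat_app = roc_area(y, predict(fit_logistic(x, y), x))   # steps 1-2
    optimism = []
    while len(optimism) < n_boot:                             # step 10
        idx = [rng.randrange(n) for _ in range(n)]            # step 3
        drawn = set(idx)
        out = [i for i in range(n) if i not in drawn]         # excluded sites
        xb, yb = [x[i] for i in idx], [y[i] for i in idx]
        xo, yo = [x[i] for i in out], [y[i] for i in out]
        if len(set(yb)) < 2 or len(set(yo)) < 2:
            continue  # an ROC area needs presences and absences in both sets
        model = fit_logistic(xb, yb)                          # step 4
        stat_boot = roc_area(yb, predict(model, xb))          # step 5
        stat_excl = roc_area(yo, predict(model, xo))          # step 7
        # Simplification: Stat_excl stands in for Stat_best_est (steps 6 and 8 omitted).
        optimism.append(stat_boot - stat_excl)
    mean_optimism = sum(optimism) / len(optimism)             # step 11
    return stat_app, stat_app - mean_optimism                 # step 12
```

A full .632+ evaluation would add the permuted statistic and the overfitting weights described in step 8; see Steyerberg et al. (2001b) for those details.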


shown a decline in performance when fitting models with background samples compared with using presence–absence data (Ferrier & Watson 1997).

Preferred models were primarily chosen on the basis of predictive discrimination and calibration. Plausibility of response shapes was used to discern between models that showed similar predictive performance. Only model structures for the preferred model for each species are presented in Table 3. There was some variation between the four modelling methods in terms of the variables retained and their response shapes for each species. For example, the yellow-bellied glider GLM contained the term 'dry2000', while the GAM for the same species did not include this term. Similarly, the sooty owl GLM contained rainfall as a linear term, while the GAM for that species included rainfall as a smoothed spline with two degrees of freedom (a non-linear term). However, the structural differences between GLM and GAM models were minor, and the similarity in ROC areas of the two model types across all species (Table 3) implies that the differences had a minor impact on predictive discrimination.

Fig. 2. Coverage of sampled strata over the greater LHCC region for the best ((a) Koala) and worst ((b) Quoll) sampled species. Unsampled strata are represented by the light grey shading. Remaining areas within the region (not shaded) contain sampled strata or are non-forest and therefore not included in the analysis.

Table 3. Final presence–absence models and bootstrapped estimates of predictive discrimination (ROC) and calibration

Species | Preferred model | Preferred model type | ROC area (GAM) | ROC area (GLM) | Calibration (GAM) | Calibration (GLM)
Koala | sp ~ s(temp, 2) + s(dry2000, 3) + s(rugg500, 2) + unmod2000 + percnonfor2k | GAM | 0.76 | 0.76 | 0.84 | 0.75
Quoll | sp ~ temp + rain + percnonfor2k + poly(dry2000, 2) | GLM | 0.71 | 0.75 | 0.73 | 0.59
Sqgl | sp ~ s(rugg500, 3) + sqglexp500 + s(unmod500, 2) | GAM | 0.78 | 0.77 | 0.76 | 0.68
Ybgl | sp ~ s(temp, 2) + s(rain, 2) + s(rf2000, 2) + unmod2000 + s(ybglexp2000, 3) + percnonfor2k | GAM | 0.78 | 0.77 | 0.73 | 0.63
Mowl | sp ~ s(cold, 2) + mowlexp2000 + rf2000 + s(unmod2000, 2) + wet2000 + percnonfor2k | GAM | 0.63 | 0.61 | 0.63 | 0.55
Powl | sp ~ s(cold, 3) + s(rain, 2) + s(rugg500, 2) | GAM | 0.69 | 0.68 | 0.80 | 0.62
Sowl | sp ~ s(rain, 2) + rugg500 + s(sowlexp2000, 3) + s(ter1000, 2) + unmod2000 | GAM | 0.85 | 0.85 | 0.74 | 0.67

Only the preferred model selected from the presence-absence models (either a generalized linear model or a generalized additive model) is presented for each species. Column 'preferred model type' indicates which of the methods produced the final preferred model for each species. ROC and calibration slope statistics are presented for the model derived under each modelling method. Variable abbreviations and details are given in Table 2. The expressions 'poly(variable, n)' and 's(variable, n)' in the preferred model column indicate variables included as polynomial ('poly') or smoothed ('s') terms with 'n' degrees of freedom. ROC, receiver operating characteristic.

Plots of fitted functions

Plotting fitted relationships provided a useful means to visualize and manipulate the behaviour of fitted models to ensure that they were ecologically meaningful. In general, fitted relationships were found to be ecologically plausible. Plots of sooty owl response shapes (Fig. 3) show how the probability of sooty owl site occupancy varies with each of the predictor variables contained in the final GAM for that species. Fitted functions for individual variables in a multivariate model may be different to univariate fitted functions because of interactions between predictor variables. Consequently, it is important to observe response shapes for the final model to ensure that they remain ecologically reasonable in the multivariable context.

Fig. 3. Partial plots of the relationship between the probability of occupancy and environmental variables included in the final sooty owl model. The X-axis represents the range of values for each environmental variable. Probabilities on the Y-axis are plotted in transformed 'logit' space so that they can be interpreted in the same way as linear regressions. Response shapes in each plot represent the relationship between each variable and the probability of sooty owl occupancy in the multivariate model context, independent of the other variables included in the model. Dashed lines represent 95% confidence intervals around the fitted response shape. Rugg500, topographic ruggedness (standard deviation in elevation) in a 500-m radius; Sowlexp2000, the proportion of suitable vegetation (as defined by experts) in a 2000-m radius; terr1000, relative terrain position in a 1000-m radius; unmod2000, the percentage of cells in a 2000-m radius containing unmodified forest. [Five panels, with rain, rugg500, sowlexp2000, terr1000 and unmod2000 on the X-axes and logit (probability of occupancy) on the Y-axes.]

Habitat maps

A habitat map is provided for the sooty owl as an example of the type of output that can be used in conservation planning (Fig. 4). The map shows the predicted probability of sooty owl presence throughout the region. Probabilities have been classified into four categories (0–0.2, 0.21–0.5, 0.51–0.7, 0.71–1.0) for presentation. These maps also display the presence records for the species that were used in model fitting, allowing a visual assessment of predictive performance. Mapped predictions and uncertainties (based on upper and lower 95% prediction intervals) for the remaining six species are available from the authors.
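As an illustration of the map-preparation step (classifying predicted probabilities into display categories and writing a prediction vector to an ASCII text file in the dimensions of the landscape, as outlined in the Fig. 4 caption), the following Python sketch may help. It is not the authors' R workflow. The function names are hypothetical, the class boundaries are the four categories quoted in the text, the 100-m cell size assumes 1-ha grid cells, and the header follows the widely used ESRI ASCII grid layout.

```python
def classify(prob, upper_bounds=(0.2, 0.5, 0.7)):
    """Return the display class (0-3) of a predicted probability.

    Class 0 covers values up to 0.2, class 1 up to 0.5, class 2 up to 0.7,
    and class 3 everything above 0.7 (boundaries taken from the text).
    """
    return sum(prob > bound for bound in upper_bounds)


def ascii_grid(values, ncols, cellsize=100.0, xll=0.0, yll=0.0, nodata=-9999):
    """Format a row-major vector (northernmost row first) as ESRI ASCII grid text.

    xll, yll and nodata are placeholder values for this example.
    """
    if ncols <= 0 or len(values) % ncols:
        raise ValueError("prediction vector does not fill a whole number of rows")
    nrows = len(values) // ncols
    lines = [
        f"ncols {ncols}",
        f"nrows {nrows}",
        f"xllcorner {xll}",
        f"yllcorner {yll}",
        f"cellsize {cellsize}",
        f"NODATA_value {nodata}",
    ]
    for r in range(nrows):
        row = values[r * ncols:(r + 1) * ncols]
        lines.append(" ".join(str(v) for v in row))
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file with an `.asc` extension gives a raster that most GIS packages can import; the origin and cell size would need to match the real landscape grid.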

DISCUSSION

The importance of ecological realism in modelling and the role of experts

The choice of modelling methods and approach to model building and evaluation presented in this paper reflects the importance we place on maintaining ecological realism throughout the model building process. While maintaining statistical rigour is central to good model building, there is an expansive literature on the statistical nuances of model fitting, interpretation and prediction (Guisan & Zimmerman 2000). Conversely, there is much less literature available concerning the importance of maintaining ecological realism in model building (Austin 1991, 2002). The process of plotting and evaluating fitted functions throughout model building and interpretation is fundamental to ensuring the ecological realism and general credibility of the final models.

Expert opinion forms an integral part of sound model building and evaluation, irrespective of the modelling platform and theoretical approach used (Steyerberg et al. 2001b). Some modellers view expert opinion as a source of undesirable subjectivity and prefer to let the data dictate the model. However, this ignores the limitations commonly encountered with ecological data (small to medium-sized species data sets that may include biases, and many candidate predictor variables) and the related challenges in developing robust and sensible models. Expert opinion can (and should) be used to assist in the identification of candidate model variables, interactions between variables and likely response shapes (Pearce et al. 2001b; Ferrier et al. 2002a), for corroboration of a model's ecological realism (Austin 2002), for ad hoc evaluation of model predictions (National Parks & Wildlife Service 1998), and for preparation of predictive maps for use in decision making (Ferrier et al. 2002a). Experts may also be used effectively in the creation of indices that may themselves be used as candidate variables for statistical modelling (National Parks & Wildlife Service 1998). Consequently, the role of experts should be thought of as complementary to other, more data-driven methods, rather than as a competing alternative. In our case study we emphasize the use of experts in variable reduction prior to automated variable selection. We use Harrell's (2001) rule of thumb to identify the maximum PDF that should be offered to the variable selection routine. We contend that if expert knowledge is so lacking that reducing the candidate set to m/10 is impossible, resulting models should be treated with scepticism.
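The m/10 rule of thumb can be made concrete with a small calculation. The sketch below assumes that m is the number of observations in the less frequent outcome class (presences or absences), following Harrell (2001), and that 'PDF' refers to the degrees of freedom spent on candidate predictors; both readings are assumptions here, and the survey counts and candidate terms are invented for illustration.

```python
def max_candidate_df(n_presence, n_absence, divisor=10):
    """Largest number of candidate degrees of freedom under an m/divisor rule."""
    m = min(n_presence, n_absence)  # assumption: m is the rarer outcome count
    return m // divisor


def candidate_df(terms):
    """Total degrees of freedom of candidate terms given as {name: df}."""
    return sum(terms.values())


# Hypothetical survey: 62 presences and 410 absences, so m = 62.
budget = max_candidate_df(62, 410)
# Hypothetical candidate set: two smooth terms and two linear terms.
candidates = {"s(rain, 2)": 2, "s(rugg500, 2)": 2, "unmod2000": 1, "temp": 1}
within_budget = candidate_df(candidates) <= budget
```

In this invented case the candidate set uses 6 degrees of freedom against a budget of 6, so any further terms would need to be removed by expert judgement before automated selection.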

The lack of an accepted model evaluation statistic for presence-only models detracts from their utility in conservation planning and increases the importance of expert evaluation. Development of robust evaluations of presence-only models is ongoing.

Interpretation of habitat maps

Threshold predictions

Fig. 4. The predicted probability of sooty owl occupancy across the LHCC region. Probabilities are represented as a grey-scale (white = 0–0.2, light grey = 0.21–0.5, medium grey = 0.51–0.8, dark grey = 0.81–1.0). The open triangles show point survey locations where the sooty owl is recorded as present. Habitat maps were constructed by creating a vector of habitat model predictions (one for every 1 ha grid cell in the landscape) in R using the 'predict.gam' command. The prediction vector is then written to an ascii text file in the correct dimensions for the landscape and imported to ArcInfo (or ArcView) for display.

In some instances, users of habitat maps may wish to identify a level of predicted probability of occupancy below which an area would be considered 'unsuitable' habitat for a given species. In general, such an interpretation of habitat maps is ill-advised unless the model that underpins the map is particularly well calibrated. However, in such instances where it is required, thresholds could be assigned in a risk-weighted manner. For example, if it is very important not to classify an area as 'non-habitat' when it may be used by a species, a low probability threshold (e.g. probability of occurrence = 0.1) should be specified when delineating non-habitat. By choosing a threshold of 0.1 for a well-calibrated model, it is implied that a 10% chance of a site (a grid cell of 100 m) being occupied by the species is a tolerable risk of failing to include areas that are in fact habitat for the species. This decision must be made on a case-by-case basis by evaluating the management costs of the two types of error (a false-negative and false-positive prediction), rather than by relying on an arbitrary threshold set by the modeller.
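One possible way to formalize a risk-weighted threshold, offered only as a sketch and not as the authors' procedure, is to compare expected costs. For a well-calibrated probability p, labelling a cell 'non-habitat' costs p × C_FN on average and labelling it 'habitat' costs (1 – p) × C_FP, so 'non-habitat' is the cheaper label only when p < C_FP / (C_FP + C_FN). The cost values and function names below are hypothetical.

```python
def risk_weighted_threshold(cost_false_negative, cost_false_positive):
    """Probability below which labelling a cell 'non-habitat' has the lower expected cost."""
    if cost_false_negative <= 0 or cost_false_positive <= 0:
        raise ValueError("costs must be positive")
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def is_non_habitat(prob, threshold):
    """True when a calibrated predicted probability falls below the chosen threshold."""
    return prob < threshold


# Hypothetical costs: wrongly excluding real habitat is judged nine times
# worse than wrongly including non-habitat.
threshold = risk_weighted_threshold(cost_false_negative=9.0, cost_false_positive=1.0)
```

With these illustrative costs the threshold works out at 0.1, the example value used in the text; different management contexts would give different cost ratios and therefore different thresholds.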

Representing uncertainty

Many forms of uncertainty impact on maps of predicted probability of species occurrence (Elith et al. 2002). The prediction intervals available from regression models describe the part of this uncertainty associated with estimating model parameters, and this stems from inadequate data, errors in measurement and natural variation (Elith et al. 2002). There are additional uncertainties, such as model selection uncertainty and errors in explanatory variables, that are not addressed by standard prediction intervals. While some methods have been developed for explicitly incorporating these forms of uncertainty in prediction intervals (Burnham & Anderson 2002; Wintle et al. 2003), there is currently very little guidance on how prediction intervals should be represented and interpreted in applications of habitat modelling and conservation planning.

Interpreting model evaluation statistics

The two statistics used for model evaluation in this study address different model applications. Specifically, model discrimination, as measured by the ROC area, is the degree to which the model successfully ranks presence sites higher than absence sites across the region in terms of predicted probability of presence. Models with satisfactory ROC areas will provide reliable ranking of areas in terms of habitat value. This mode of evaluation is relevant when the goal of a model is to rank or prioritize areas of interest in terms of their relative value as habitat for a species. This goal is different to that of accurately predicting the proportion of sites that are expected to be occupied at a given predicted probability of occurrence. For instance, at sites with a predicted probability of occurrence of 0.5, it is reasonable to expect that approximately 50% of such sites would contain the species. The degree to which this is true is described as model calibration (Miller et al. 1991) and is assessed using the logistic calibration equation. It is possible that a model with good discrimination can have poor calibration. It is also possible to improve a model's predictive ability by adjusting the model parameters with calibration statistics (Harrell 2001) or other shrinkage methods (Hastie et al. 2001). A model with poor calibration may still be useful for prioritizing or ranking sites in terms of habitat value, but should not be used to predict the raw probability of finding a species at a given site. The choice of whether to focus on ROC areas or calibration statistics depends on the intended application of the models.
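The distinction between discrimination and calibration can be illustrated numerically. The sketch below is a minimal Python illustration with hypothetical function names, not the code used in the case study. It computes the ROC area as the proportion of presence–absence pairs that are ranked correctly, and estimates a calibration intercept and slope by regressing observations on the logit of the predictions, which is one common form of the logistic calibration equation. The example data are invented so that two sets of predictions share the same ROC area but differ in calibration.

```python
import math


def roc_area(y, p):
    """Proportion of presence-absence pairs ranked correctly (ties count half)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    score = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return score / (len(pos) * len(neg))


def calibration(y, p, iterations=25):
    """Fit logit(Pr[y = 1]) = a + b * logit(p) by Newton-Raphson; return (a, b).

    A slope b near 1 and intercept a near 0 indicate good calibration.
    Predictions must lie strictly between 0 and 1 and take at least two values.
    """
    x = [math.log(pi / (1.0 - pi)) for pi in p]
    a = b = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu
            g1 += (yi - mu) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Invented data: one of four low-prediction sites and three of four
# high-prediction sites are occupied.
observed = [1, 0, 0, 0, 1, 1, 1, 0]
well_calibrated = [0.25] * 4 + [0.75] * 4   # predictions match observed frequencies
overconfident = [0.10] * 4 + [0.90] * 4     # same ranking, more extreme predictions
```

Both prediction sets rank the sites identically and so have the same ROC area, but the overconfident set yields a calibration slope well below 1, which is the situation in which a model remains useful for ranking but not for reading off raw probabilities.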

The issue of occupancy versus persistence

Models of the sort discussed in this paper make the implicit assumption that occupancy of a location implies suitability of the habitat. Van Horne (1983) identified problems associated with this assumption, and Pulliam's (1988) 'source-sink' model of population dynamics formalizes its conceptual deficiencies. Pulliam's model differentiates 'sink' areas, where mortality exceeds population growth, from 'source' areas that maintain overall population size through emigration. Similarly, if the total population is in a state of decline, it is possible that 'remnant' individuals may exist in unsuitable habitat that will not be occupied in the next generation. When a population is expanding its range, observations of absence of the species may falsely imply that the suitability of the habitat is low. These anomalies arise due to what is known as the 'equilibrium' assumption, which underpins the current static approach to wildlife habitat modelling (Austin 2002). Inclusion of disturbance history variables, dispersal barriers, competition and successional dynamics may assist in modelling such situations, though examples of such approaches are rare.

Survey design and sampling issues

This paper has focused on the choice and execution of particular modelling methods, given a set of data. Issues of data quality and sampling have not been addressed. Because modelling results are only as good as the available data, the issues of data quality and sampling demand some discussion here.

Sample size

From a statistical perspective, the choice of sample size is a function of the desired precision of results. In studies where the goal is to make inference about a particular effect (e.g. the influence of time since logging on the probability of greater glider occupancy), power analysis may be used to determine the sample size required to achieve a statistically significant result for a given effect size, type I error rate and sampling variance (Burgman & Lindenmayer 1998). The relationship between statistical significance, power and model predictive performance is unclear, and there is no guarantee that a model with statistically significant terms will give good predictive performance. Unfortunately, there is no simple way to determine a priori the predictive performance of a habitat model, or to calculate a sample size that will ensure a predictive performance that is statistically significantly better than that of a null or alternative model. We could find only one attempt to address this problem in the statistical literature (Obuchowski 1994), making this an important area of future research. A simulation study that provided rules of thumb about sample sizes required to achieve a given predictive performance under a range of likely modelling scenarios would be an important contribution to the habitat modelling literature.

Data quality

A more subtle consideration is the geographical spread and environmental stratification of observations compared to the extent and environmental variability of the region for which models are to be built. We briefly touch on this issue in our case study when testing the adequacy of survey data using environmental strata. Little guidance is provided in the literature as to the minimum geographical or environmental coverage of data for statistical modelling, though it is common sense to expect that model predictions are unlikely to be reliable in environmental domains for which there are no survey data (Austin & Meyers 1996). Stratified sampling designs such as 'GRADSECT' (Austin & Heyligers 1989) that ensure a geographical and bioclimatic coverage of sampling locations are therefore appealing. GRADSECT theory, or newer developments such as Generalized Dissimilarity Modelling (GDM; Ferrier et al. 2002b) or a p-median criterion (Faith & Walker 1996), may also be applied to existing data sets to identify 'gaps' in the geographical and environmental coverage of samples.

Most statistical analyses, including the regression methods used in our case study, assume that observations are independent of each other. Survey locations that are close together are much less likely to be independent, especially when surveys target animals with large home ranges. Survey design should aim to ensure a minimum separation distance that is at least as great as the home range radius of the widest ranging species in the study. Random sampling within geographical and environmental strata assists in minimizing unwanted dependencies in observation data. Guidance on sampling theory should be sought from one of the many texts that deal with the topic in detail (e.g. Sutherland 1996; Thompson 2002).
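A minimum separation distance can be enforced on an existing set of candidate survey locations with a simple greedy filter, sketched below in Python. This is an illustrative approach rather than a recommended sampling design: the result depends on the order in which sites are considered (shuffling first would make the choice random), and the coordinates and the 1000-m separation are invented values, not home-range estimates from the case study.

```python
import math


def thin_sites(sites, min_separation):
    """Keep each site only if it is at least min_separation from every site already kept.

    `sites` is a sequence of (x, y) coordinates in the same units as min_separation.
    """
    kept = []
    for x, y in sites:
        if all(math.hypot(x - kx, y - ky) >= min_separation for kx, ky in kept):
            kept.append((x, y))
    return kept


# Hypothetical easting/northing pairs in metres along a transect.
candidate_sites = [(0, 0), (100, 0), (1500, 0), (1600, 0), (3000, 0)]
independent_sites = thin_sites(candidate_sites, min_separation=1000)
```

In this invented example the sites at 100 m and 1600 m are dropped because each lies within 1000 m of a site that was already retained.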

Technical issues and promising advances in modelling methods

Species versus community approaches

Models of the distribution of single species are not the only way to approach conservation planning questions. Single species models are important for a subset of species such as threatened, focal or flagship species. However, landscape planning aims to conserve the biodiversity of a region, and there are too many species and too little data to achieve this through single-species modelling. Therefore, there is a role for community-level modelling (Ferrier et al. 2002b). One example of community-level modelling is GDM, already mentioned in the context of sampling; this models compositional dissimilarity across the landscape. This could then contribute to a conservation plan that aimed to span the range of species patterns in the landscape. A further role for methods based on data from many species is to use them for modelling rare species or species in data sets with relatively few records per species. In such situations there may be insufficient data for developing a robust single-species model. GDMs, canonical correspondence analysis (CCA) or multivariate adaptive regression splines (MARS) may be useful for this purpose.

Spatial autocorrelation

Spatial autocorrelation in wildlife observation data arises when environmental processes and patterns that influence the spatial distribution of wildlife are themselves spatially structured and/or because the species is subject to demographic processes, territoriality or dispersal limitations causing spatial dependence (contagion or dispersion effects). Demographic and environmental processes underlying spatial patterns in wildlife distributions are usually poorly understood and therefore difficult to incorporate in model fitting. Consequently, model residuals are often spatially correlated (i.e. not independent), violating one of the basic assumptions of regression modelling. In practice, non-independence usually results in underestimation of standard errors and overestimation of the importance of habitat variables (Legendre & Fortin 1989). Methods such as autologistic regression (Augustin et al. 1996), generalized estimating equations (Albert & McShane 1995) and geographically weighted regression (Fotheringham et al. 2002) may be used to incorporate spatial autocorrelation in habitat analyses, but they are still not incorporated in standard statistical software and are technically demanding to implement.
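A simple diagnostic for spatially correlated residuals, which is not one of the methods cited above but is widely used, is Moran's I. The Python sketch below computes it with binary neighbour weights (sites within a chosen distance count as neighbours). The function name, coordinates, residual values and neighbourhood distance are all assumptions made for the example.

```python
import math


def morans_i(coords, residuals, max_distance):
    """Moran's I for residuals, with weight 1 for site pairs within max_distance.

    Values near 0 suggest little spatial structure; positive values suggest that
    nearby residuals are similar, negative values that they alternate.
    """
    n = len(residuals)
    mean = sum(residuals) / n
    z = [r - mean for r in residuals]
    cross = 0.0
    weight_total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= max_distance:
                cross += z[i] * z[j]
                weight_total += 1.0
    return (n / weight_total) * cross / sum(v * v for v in z)


# Invented transect of four sites one unit apart.
transect = [(0, 0), (1, 0), (2, 0), (3, 0)]
clustered = morans_i(transect, [1, 1, -1, -1], max_distance=1)
alternating = morans_i(transect, [1, -1, 1, -1], max_distance=1)
```

The clustered residuals give a positive value and the alternating residuals a negative one; assessing significance would need a permutation test or similar, which is omitted here.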

Model selection and model uncertainty

The standard approach to representing prediction uncertainty involves the calculation of prediction (or confidence) intervals that incorporate uncertainty about parameter estimates. However, this approach to prediction implicitly assumes that the model chosen is the best available representation of the truth and effectively ignores model selection uncertainty, resulting in overconfident predictions. Automated variable selection algorithms exacerbate this problem by providing a method for dredging through many candidate predictors in search of an explanatory model (Chatfield 1995; Hoeting et al. 1999). One solution is to completely avoid automated model selection and to analyse only a small set of 'a priori' models (Burnham & Anderson 2002). This approach may be practical when strong prior knowledge exists and where the number of plausible models is small. However, if prediction is the primary goal and many possible models exist, some level of automated variable selection is often desirable. Model averaging approaches, including Bayesian model averaging, have been promoted in a range of disciplines as a means of incorporating model selection uncertainty into statistical inference and prediction (Hoeting et al. 1999) and have been applied in some ecological examples (Burnham & Anderson 2002; Wintle et al. 2003).
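The core arithmetic of model averaging can be sketched briefly. A common approximation, assumed here and not necessarily the procedure used in the studies cited, weights each candidate model in proportion to exp(–IC/2), where IC is an information criterion such as BIC or AIC, and then averages the models' predictions using those weights. The criterion values, predictions and function names below are invented for illustration.

```python
import math


def model_weights(ic_values):
    """Normalized weights proportional to exp(-0.5 * (IC - min IC))."""
    best = min(ic_values)
    raw = [math.exp(-0.5 * (value - best)) for value in ic_values]
    total = sum(raw)
    return [r / total for r in raw]


def averaged_prediction(predictions, weights):
    """Weighted mean of the candidate models' predictions for one site."""
    return sum(p * w for p, w in zip(predictions, weights))


# Two hypothetical models with information criterion values 100 and 102,
# predicting occupancy probabilities of 0.8 and 0.4 at the same site.
weights = model_weights([100.0, 102.0])
site_prediction = averaged_prediction([0.8, 0.4], weights)
```

Subtracting the smallest criterion value before exponentiating leaves the weights unchanged but avoids numerical underflow when criterion values are large.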

Estimating and incorporating detectability in habitat models

Unless the probability of detecting a species when it is present is equal to 1, false negative observation errors will occur in species surveys. The probability of detecting the presence of the case study species in any single standard survey based on spot-lighting and call elicitation has been found to be very low (Pr[detection|presence] ∼ 0.12–0.45; Wintle et al. in press), making the reliability of absence data a potentially serious form of uncertainty in our case study. Recent studies have demonstrated the negative impact that false-negative observation error may have on species-habitat analyses (Tyre et al. 2003), meta-population models (Moilanen 2002) and monitoring studies (MacKenzie et al. 2002). Recently developed techniques for incorporating detectability in model estimation (MacKenzie et al. 2002; Tyre et al. 2003) reduce bias in model estimation brought about by false absences, though little effort has been invested in testing the relative predictive performance of such models.
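The practical consequence of a low single-visit detection probability can be illustrated with a short calculation. The sketch assumes that repeat visits to an occupied site are independent and share the same detection probability, which is a simplification; the function names are hypothetical.

```python
from math import ceil, log


def cumulative_detection(p_single, n_visits):
    """Probability of at least one detection in n_visits to an occupied site."""
    return 1.0 - (1.0 - p_single) ** n_visits


def visits_needed(p_single, target=0.95):
    """Smallest number of visits giving at least `target` cumulative detection."""
    if not 0.0 < p_single < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("probabilities must lie strictly between 0 and 1")
    return ceil(log(1.0 - target) / log(1.0 - p_single))


# The range of single-visit detection probabilities quoted in the text.
low_visits = visits_needed(0.12)
high_visits = visits_needed(0.45)
```

Under these assumptions, a single-visit detection probability of 0.12 would require 24 visits, and one of 0.45 would require 6 visits, to be 95% sure of detecting a species that is present.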

Incorporating biotic interactions

Despite the prevalence of interspecific competition as a key concept in community ecology (Diamond 1975), very little has been attempted by way of incorporating interspecific competition in distribution models (for one exception see Leathwick & Austin 2001). Regression models such as those described by Leathwick and Austin (2001) could be applied to fauna species. Future development of ‘loop analysis’ (Levins 1975) or systems of simultaneous regressions (Guisan & Zimmerman 2000) may facilitate spatial models of wildlife distributions that incorporate competitive interactions.

The GLMs and GAMs presented here are relatively simple, though suited to the conservation problem and available data. There is scope for greater sophistication in the modelling method where a high level of modelling expertise exists, though we believe our recommended approach represents a reasonable trade-off between practicality and rigour.

Model improvement and adaptive management

Due to environmental change, natural and anthropogenic disturbance and population stochasticity, all predictive models will become redundant if they are not updated through time. We recommend that models be applied in an adaptive framework (Walters & Holling 1990), allowing immediate application of models in management but ensuring a commitment to continual improvement of models through future data collections and refinements to modelling methods.

Future data collection will be central to rigorous model testing and model refinements. Collections should be targeted to fill gaps in current data sets (Fig. 3). Particular attention should be paid to the collection of high quality observation data based on multiple site visits aimed at minimizing false absences (Wintle et al. 2004). Some important environmental variables, including those based on forest growth stage data, were not available for model fitting in the LHCC region. It is likely that models would have better predictive accuracy if forest growth stage information became available and was incorporated in models.

CONCLUSION

Habitat models are now a widely used tool in public land conservation planning. Systematic conservation planning in the urban fringe will require such tools to ensure transparency and repeatability of planning outcomes. However, access to statistical and ecological expertise for urban conservation planning is likely to be limited when compared with the public land planning processes that have taken place in Australia. It is therefore important that we continue to develop and refine planning tools, including habitat modelling methods, with an emphasis on statistical and ecological rigour and simplicity.

ACKNOWLEDGEMENTS

This work was instigated by the LHCCREMS organization (now Hunter Councils). Steven House (EcoLogical), Michael Murray (Forest Fauna Surveys P/L), Philip Gleeson (NPWS), Daniel Connolly (NPWS), Jill Smith (NPWS), Peter Bone (NPWS), Peter Ewin (NPWS), Mick Andren (NPWS) and Rod Kavanagh (SFNSW) provided data, expert guidance on model construction and valuable insights. Mark Burgman and Kuniko Yamada (University of Melbourne) helped throughout the project. Comments from Terry Walshe (University of Western Australia), David Tierney and Robbie Economos improved the manuscript substantially. Discussions with Trevor Hastie and John Leathwick helped clarify bootstrapping issues. Two anonymous reviewers provided constructive feedback. Sophie Powrie, Kirsty Winter and Meredith Liang (LHCCREMS) provided excellent support and useful feedback throughout the project.

REFERENCES

Agresti A. (1996) An Introduction to Categorical Data Analysis. John Wiley and Sons, New York.

Akaike H. (1973) Information theory and an extension of the maximum likelihood. In: Proceedings of the 2nd International Symposium on Information Theory (eds B. N. Petrov & F. Csáki) pp. 267–81. Akadémiai Kiadó, Budapest.

Albert P. S. & McShane L. M. (1995) A generalized estimating equations approach for spatially correlated data: applications to the analysis of neuroimaging data. Biometrics 51, 627–38.

Augustin N. H., Mugglestone M. A. & Buckland S. T. (1996) An autologistic model for the spatial distribution of wildlife. J. Appl. Ecol. 33, 339–47.

Austin M. P. (1991) Vegetation: data collection and analysis. In: Nature Conservation: Cost Effective Biological Surveys and Data Analysis (eds C. R. Margules & M. P. Austin) pp. 37–41. CSIRO, Canberra.

Austin M. P. (2002) Spatial prediction of species distribution: an interface between ecological theory and statistical modelling. Ecol. Model. 157, 101–18.

Austin M. P. & Heyligers P. C. (1989) Vegetation survey design for conservation: GRADSECT sampling of forests in north-eastern NSW. Biol. Conserv. 50, 13–32.

Austin M. P. & Meyers J. A. (1996) Current approaches to modelling the environmental niche of Eucalypts: implications for management of forest biodiversity. For. Ecol. Manage. 85, 95–106.

Bambar D. (1975) The area above the ordinal dominance graph and the area below the receiver operating characteristics graph. J. Math. Psych. 12, 387–415.

Belsley D. A., Kuh E. & Welsch R. E. (1980) Regression Diagnostics. John Wiley and Sons, New York.

Bio A. M. F., Alkemande R. & Barendregt A. (1998) Determining alternative models for vegetation response analysis – a non-parametric approach. J. Veg. Sci. 9, 5–16.

Breiman L., Friedman J. H., Olshen R. A. & Stone C. J. (1984) Classification and Regression Trees. Wadsworth International Group, Belmont.

Breininger D. R., Larson V. L., Duncan B. W. & Smith R. B. (1998) Linking habitat suitability to demographic success in Florida scrub-jays. Wild. Bull. 26, 118–28.

Burgman M. A. & Lindenmayer D. B. (1998) Conservation Biology for the Australian Environment. Surrey Beatty and Sons, Chipping Norton.

Burnham K. P. & Anderson D. R. (2002) Model Selection and Multimodel Inference: a Practical Information-Theoretic Approach. Springer, New York.

Burgman M. A., Breininger D. R., Duncan B. W. & Ferson S. (2001) Setting reliability bounds on Habitat Suitability Indices. Ecol. Appl. 11, 70–8.

Carpenter G., Gillison A. N. & Winter J. (1993) DOMAIN: a flexible modelling procedure for mapping potential distributions of plants and animals. Biodivers. Conserv. 2, 667–80.

Chatfield C. (1995) Model uncertainty, data mining and statistical inference. J. Roy. Stat. Soc. Ser. A Stat. 158, 419–66.

Commonwealth of Australia (2003) Terms of Reference. Standing Committee on Environment and Heritage, Inquiry into Sustainable Cities. Parliament of Australia, House of Representatives, Canberra.

Derksen S. & Keselman H. J. (1992) Backward, forward and stepwise automated subset selection algorithms: frequency of obtaining authentic and noise variables. Br. J. Math. Stat. Psych. 45, 265–82.

Diamond J. M. (1975) Assembly of species communities. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 342–444. Harvard University Press, Cambridge.

Ecotone Ecological Consultants (2001) Lower Hunter Central Coast Regional Biodiversity Strategy: Fauna survey and mapping project – Module 1 Fauna Surveys. Ecotone Ecological Consultants for LHCCREMS, Waratah, pp. 43.

Edgar R. & Belcher C. (1995) Spotted-tailed Quoll. In: The Mammals of Australia (ed. R. Strahan) pp. 67–8. Reed Books, Chatswood.

Efron B. & Tibshirani R. (1997) Improvements on cross-validation: the .632+ bootstrap method. J. Am. Stat. Ass. 92, 548–60.

Elith J. (2000) Quantitative methods for modelling species habitat: comparative performance and an application to Australian plants. In: Quantitative Methods for Conservation Biology (eds S. Ferson & M. A. Burgman) pp. 39–58. Springer-Verlag, New York.

Elith J. & Burgman M. A. (2002) Predictions and their validation: rare plants in the Central Highlands, Victoria, Australia. In: Predicting Species Occurrences: Issues of Accuracy and Scale (eds J. M. Scott, P. J. Heglund, M. L. Morrison, M. G. Raphael, W. A. Wall & F. B. Samson) pp. 303–14. Island Press, Covelo.

Elith J. & Burgman M. A. (2003) Habitat models for PVA. In: Population Viability in Plants (eds C. A. Brigham & M. W. Schwartz) pp. 203–35. Springer-Verlag, New York.

Elith J., Burgman M. A. & Regan H. M. (2002) Mapping epistemic uncertainties and vague concepts in predictions of species distribution. Ecol. Model. 157, 313–29.


Ennis M., Hinton G., Naylor D., Revow M. & Tibshirani R. (1998) A comparison of statistical learning methods on the GUSTO database. Stat. Med. 17, 2501–8.

ESRI (1997) Arcinfo 7.2. Environmental Systems Research Institute Inc., Redlands.

Faith D. P. & Walker P. A. (1996) Environmental diversity: on the best-possible use of surrogate data for assessing the relative biodiversity of sets of areas. Biodivers. Conserv. 5, 399–415.

Ferrier S. & Watson G. (1997) An Evaluation of the Effectiveness of Environmental Surrogates and Modelling Techniques in Predicting the Distribution of Biological Diversity. Department of Environment, Sports and Territories, Commonwealth of Australia and NSW National Parks and Wildlife Service, Canberra.

Ferrier S., Watson G., Pearce J. & Drielsma M. (2002a) Extended statistical approaches to modelling spatial pattern in biodiversity in north-east New South Wales. I. Species-level modelling. Biodivers. Conserv. 11, 2275–307.

Ferrier S., Drielsma M., Manion G. & Watson G. (2002b) Extended statistical approaches to modelling spatial pattern in biodiversity: the north-east New South Wales experience. II. Community-level modelling. Biodivers. Conserv. 11, 2309–38.

Fielding A. H. & Bell J. F. (1997) A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ. Conserv. 24, 38–49.

Fotheringham A. S., Brunsdon C. & Charlton M. (2002) Geographically Weighted Regression: the Analysis of Spatially Varying Relationships. John Wiley and Sons, London.

Franklin J. (1995) Predictive vegetation mapping: geographic modeling of biospatial patterns in relation to environmental gradients. Prog. Phys. Geog. 19, 494–519.

Friedman J. H. (1991) Multivariate adaptive regression splines (with discussion). Ann. Stat. 19, 1–141.

Goldingay R. L. & Kavanagh R. P. (1993) Home-range estimates and habitat of the yellow-bellied glider (Petaurus australis) at Waratah Creek, New-South-Wales. Wildl. Res. 20, 387–404.

Graham C. H., Ferrier S., Huettman F., Moritz C. & Peterson A. T. (2004) New developments in museum-based informatics and applications in biodiversity analysis. Trends Ecol. Evol. 19, 497–503.

Guisan A. & Harrell F. E. (2000) Ordinal response regression models in ecology. J. Veg. Sci. 11, 617–26.

Guisan A. & Zimmerman N. E. (2000) Predictive habitat distribution models in ecology. Ecol. Model. 135, 147–86.

Hanley J. A. & McNeil B. J. (1982) The meaning and use of the area under a Receiver Operating Characteristic (ROC) curve. Radiology 143, 29–36.

Harrell F. E. (2001) Regression Modelling Strategies with Application to Linear Models, Logistic Regression and Survival Analysis. Springer, New York.

Harrell F. E., Lee K. L. & Mark D. B. (1996) Multivariate prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat. Med. 15, 361–87.

Hastie T. & Tibshirani R. (1990) Generalized additive models. Monographs on statistics and applied probability (eds D. R. Cox, D. V. Hinkley, D. Rubin & B. W. Silverman). Chapman & Hall, London.

Hastie T., Tibshirani R. & Friedman J. H. (2001) The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer-Verlag, New York.

Hirzel A. H. (2001) When GIS come to life. Linking Landscape- and Population Ecology for Large Population Management Modelling: the Case of Ibex (Capra Ibex) in Switzerland. The University of Lausanne, Lausanne.

Hoeting J. A., Madigan D., Raftery A. E. & Volinsky C. T. (1999) Bayesian model averaging: a tutorial. Stat. Sci. 14, 382–401.

Houlder D. J., Hutchinson M. F., Nix H. A. & McMahon J. P. (1999) ANUCLIM User Guide, Version 5.0. Centre for Resource and Environmental Studies, Australian National University, Canberra.

Kadmon R., Farber O. & Danin A. (2003) A systematic analysis of factors affecting the performance of climatic envelope models. Ecol. Appl. 13, 853–67.

Kavanagh R. P. (2002) Conservation and management of large forest owls in southeastern Australia. In: The Ecology and Conservation of Owls (eds I. Newton, R. Kavanagh, J. Olson & I. Taylor) pp. 201–19. CSIRO, Melbourne.

Lambeck R. J. (1997) Focal species: a multi-species umbrella for nature conservation. Conserv. Biol. 11, 849–56.

Leathwick J. R. & Austin M. P. (2001) Competitive interactions between tree species in New Zealand’s old-growth indigenous forests. Ecology 82, 2560–73.

Legendre P. & Fortin M. J. (1989) Spatial pattern and ecological analysis. Vegetatio 80, 107–38.

Legendre L. & Legendre P. (1998) Numerical Ecology. Elsevier, New York.

Levins R. (1975) Evolution of communities near equilibrium. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 16–50. Harvard University Press, Cambridge.

LHCCREMS (2004) Lower Hunter Central Coast Regional Biodiversity Strategy 2004 (4 Volume). Hunter Councils on behalf of the Lower Hunter Central Coast Regional Environmental Strategy, Thornton.

Li W., Wang Z., Ma Z. & Tang H. (1999) Designing the core zone in a biosphere reserve based on suitable habitats: Yancheng Biosphere Reserve and the red crowned crane (Grus japonensis). Biol. Conserv. 90, 167–73.

Loyn R. H., McNabb E. G., Volodina L. & Willig R. (2001) Modelling landscape distributions of large forest owls as applied to managing forests in north-east Victoria, Australia. Biol. Conserv. 97, 361–76.

McCullagh P. & Nelder J. A. (1989) Generalized Linear Models. Chapman & Hall, London.

MacKenzie D. I., Nichols J. D., Lachman G. B., Droege S., Royle J. A. & Langtimm C. A. (2002) Estimating site occupancy rates when detection probabilities are less than one. Ecology 83, 2248–55.

Manel S., Dias J. M., Buckton S. T. & Ormerod S. J. (1999a) Alternative methods for predicting species distribution: an illustration with Himalayan river birds. J. Appl. Ecol. 36, 734–47.

Manel S., Dias J. M. & Ormerod S. J. (1999b) Comparing discriminant analysis, neural networks and logistic regression for predicting species distributions: a case study with a Himalayan river bird. Ecol. Model. 120, 337–47.

Miller M. E., Hui S. L. & Tierney W. M. (1991) Validation techniques for logistic regression models. Stat. Med. 10, 1213–26.

Moilanen A. (2002) Implications of empirical data quality to metapopulation model parameter estimation and application. Oikos 96, 516–30.

Moisen G. G. & Frescino T. S. (2002) Comparing five modeling techniques for predicting forest characteristics. Ecol. Model. 157, 209–25.


National Parks & Wildlife Service (1998) Eden Fauna Modelling. New South Wales National Parks & Wildlife Service, New South Wales Regional Forest Agreement Steering Committee, Canberra.

National Parks and Wildlife Service (2000) Modelling Areas of Habitat Significance for Fauna and Flora in the Southern CRA. NPWS NSW, Canberra.

Nix H. (1986) A biogeographic analysis of Australian elapid snakes. In: Atlas of Elapid Snakes of Australia (ed. R. Longmore) pp. 4–15. Australian Government Publishing Service, Canberra.

Obuchowski N. A. (1994) Computing sample size for receiver operating characteristic studies. Invest. Radiol. 29, 238–43.

Pearce J. & Ferrier S. (2000) Evaluating the predictive performance of habitat models developed using logistic regression. Ecol. Model. 133, 225–45.

Pearce J. & Ferrier S. (2001) The practical value of modelling relative abundance of species for regional conservation planning: a case study. Biol. Conserv. 98, 33–43.

Pearce J., Ferrier S. & Scotts D. (2001a) An evaluation of the predictive performance of distributional models for flora and fauna in north-east New South Wales. J. Environ. Manage. 62, 171–84.

Pearce J. L., Cherry K., Drielsma M., Ferrier S. & Whish G. (2001b) Incorporating expert knowledge and fine-scale vegetation mapping into statistical modelling of faunal distribution. J. Appl. Ecol. 38, 412–24.

Phillips S. J., Anderson R. P. & Schapire R. E. (in press) Maximum entropy modeling of species geographic distributions. Ecol. Model.

Pressey R. L., Possingham H. P., Logan V. S., Day J. R. & Williams P. H. (1999) Effects of data characteristics on the results of reserve selection algorithms. J. Biogeogr. 26, 179–91.

Pulliam H. R. (1988) Sources, sinks and population regulation. Am. Nat. 132, 652–61.

R Development Core Team (2004) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org

Reading R. P., Clark T. A., Seebeck J. H. & Pearce J. (1996) Habitat Suitability Index model for the eastern barred bandicoot, Perameles gunnii. Wildl. Res. 23, 221–35.

Reed P. C. & Lunney D. (1990) Habitat loss: the key problem for the long-term survival of koalas in New South Wales. In: Koala Summit: Managing Koalas in New South Wales (eds D. Lunney, C. A. Urquhart & P. C. Reed) pp. 9–31. University of Sydney, Sydney.

Ripley B. D. (1995) Pattern Recognition and Neural Networks – a Statistical Approach. Cambridge University Press, Cambridge.

Russel R. (1995) Yellow-bellied glider. In: The Mammals of Australia (ed. R. Strahan) pp. 226–8. Reed Books, Chatswood.

Steyerberg E. W., Eijkemans M. J. C., Harrell F. E. & Habbema J. D. F. (2001a) Prognostic modeling with logistic regression analysis: in search of a sensible strategy in small data sets. Med. Decis. Making 21, 45–56.

Steyerberg E. W., Harrell F. E., Borsboom G. J. J. M., Eijkemans M. J. C., Vergouwe Y. & Habbema J. D. F. (2001b) Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J. Clin. Epidemiol. 54, 774–81.

Stockwell D. & Peters D. (1999) The GARP modelling system: problems and solutions to automated spatial prediction. Int. J. Geogr. Inf. Sci. 13, 143–58.

Sutherland W. J. (1996) Ecological Census Techniques: a Handbook, 1st edn. Cambridge University Press, Cambridge.

ter Braak C. J. F. (1986) Canonical Correspondence Analysis: a new eigenvector technique for multivariate direct gradient analysis. Ecology 67, 1167–79.

Thompson S. K. (2002) Sampling, 2nd edn. Wiley, New York.

Tyre A. J., Tenhumberg B., Field S. A., Possingham H. P., Niejalke D. & Parris K. (2003) Improving precision and reducing bias in biological surveys by estimating false negative error rates in presence-absence data. Ecol. Appl. 13, 1790–801.

Van Horne B. (1983) Density as a misleading indicator of habitat quality. J. Wildl. Man. 47, 893–901.

Van Horne B. & Wiens J. A. (1991) Forest Bird Habitat Suitability Models and the Development of General Habitat Models. United States Department of the Interior, Fish and Wildlife Service, Washington DC.

Venables W. N. & Ripley B. D. (2003) Modern Applied Statistics with S, 4th edn. Springer, New York.

Walters C. & Holling C. S. (1990) Large-scale management experiments and learning by doing. Ecology 71, 2060–8.

Wintle B. A., McCarthy M. A., Volinsky C. T. & Kavanagh R. P. (2003) The use of Bayesian Model Averaging to better represent the uncertainty in ecological models. Conserv. Biol. 17, 1579–90.

Wintle B. A., Elith R. J., Yamada K. & Burgman M. A. (2004) LHCCREMS fauna survey and mapping project. Module 2: Habitat modelling and conservation requirements. Lower Hunter & Central Coast Regional Environmental Management Strategy, Callaghan.

Wintle B. A., McCarthy M. A., Parris K. M. & Burgman M. A. (2004) Precision and bias of methods for estimating point survey detection probabilities. Ecol. Appl. 14, 703–12.

Wilson K. A., Westphal M. I., Possingham H. P. & Elith J. (2005) Sensitivity of conservation planning to different approaches to using species distribution data. Biol. Conserv. 122, 99–112.

Wintle B. A., Burgman M. A., McCarthy M. A. & Kavanagh R. P. (in press) The magnitude and management consequences of false negative observation error in surveys of arboreal marsupials and large forest owls. J. Wildl. Man.

Worton B. J. (1989) Kernel methods for estimating the utilization distribution in home-range studies. Ecology 70, 164–8.

Yencken D. & Wilkinson D. (2000) Resetting the Compass: Australia’s Journey Towards Sustainability. CSIRO Publishing, Melbourne.

Zaniewski A. E., Lehmann A. & Overton J. M. (2002) Predicting species distribution using presence-only data: a case study of native New Zealand ferns. Ecol. Model. 157, 261–80.

APPENDIX 1

Method for 0.632+ bootstrap evaluation, after Harrell et al. (1996) and Efron and Tibshirani (1997).

1. Develop model on all n observations.

2. Calculate the statistic(s) of choice for evaluation on the same data (i.e. the training data) – call this Stat_app because it is the apparent value of the statistic.

3. Take a bootstrap sample, i.e. a sample of size n, with replacement, of rows of the data matrix. Keep track of which sites are in the bootstrap sample and which are excluded.

4. Fit the model on the bootstrap sample (using the same methods as used on the full set).

5. Compute the statistic on the bootstrap data set (observations vs. fitted values) and call it Stat_boot.

6. Also compute the statistic on a version of the bootstrap data where the observations are randomized (Stat_permute).

7. Use the bootstrap model to predict to the excluded data set; calculate the statistic on these predictions: Stat_excl.

8. Use Stat_boot, Stat_permute and Stat_excl to calculate the amount of overfitting, the relative overfitting rate, and weights that are then used to make a best estimate of predictive performance, Stat_best_est. This statistic puts most emphasis on predictions to the excluded data, particularly when the model is overfitted (i.e. when Stat_boot – Stat_excl is large). For details of this step see Steyerberg et al. (2001b).

9. Measure how optimistic the fit on the bootstrap sample was: O = Stat_boot – Stat_best_est.

10. Repeat steps 3–7 100–200 times.

11. Calculate an average optimism, Ō.

12. Use Ō to correct Stat_app for its optimism: Stat_app – Ō. This is a near unbiased estimate of the expected value of the external predictive performance of the process that generated Stat_app.
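The resampling loop in this appendix can be sketched in a few lines. This is a simplified illustration and not the authors' implementation: it omits steps 6 and 8 (the permuted statistic and the 0.632+ weighting) and uses the statistic on the excluded sites directly in place of the best-estimate statistic of step 8. The toy model fit_direction, the function names and the data are hypothetical; the evaluation statistic is the ROC area, computed as the proportion of presence-absence pairs ranked correctly.

```python
import random


def roc_area(y, scores):
    """ROC area: share of (presence, absence) pairs ranked correctly, ties count half."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def fit_direction(x, y):
    """Toy stand-in for model fitting: orient one predictor so presences score higher."""
    pres = [xi for xi, yi in zip(x, y) if yi == 1]
    absn = [xi for xi, yi in zip(x, y) if yi == 0]
    sign = 1.0 if sum(pres) / len(pres) >= sum(absn) / len(absn) else -1.0
    return lambda xi: sign * xi


def bootstrap_optimism(x, y, fit, stat, n_boot=200, seed=0):
    """Return (apparent statistic, mean optimism, optimism-corrected statistic)."""
    rng = random.Random(seed)
    n = len(y)
    model = fit(x, y)                                   # step 1
    stat_app = stat(y, [model(xi) for xi in x])         # step 2
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]      # step 3: sample with replacement
        in_bag = set(idx)
        out = [i for i in range(n) if i not in in_bag]  # excluded sites
        y_boot = [y[i] for i in idx]
        y_excl = [y[i] for i in out]
        if len(set(y_boot)) < 2 or len(set(y_excl)) < 2:
            continue  # the statistic needs presences and absences in both sets
        m = fit([x[i] for i in idx], y_boot)            # step 4
        stat_boot = stat(y_boot, [m(x[i]) for i in idx])   # step 5
        stat_excl = stat(y_excl, [m(x[i]) for i in out])   # step 7
        optimism.append(stat_boot - stat_excl)          # simplified step 9
    if not optimism:
        raise ValueError("no usable bootstrap replicates")
    mean_optimism = sum(optimism) / len(optimism)       # step 11
    return stat_app, mean_optimism, stat_app - mean_optimism  # step 12


# Hypothetical data: one habitat predictor and presence (1) / absence (0) records.
x = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90, 0.20, 0.60, 0.50, 0.30, 0.75, 0.15]
y = [0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0]
stat_app, mean_optimism, corrected = bootstrap_optimism(x, y, fit_direction, roc_area)
```

In a real application the fit argument would wrap the full model-building procedure, including any variable selection, so that the optimism estimate reflects the whole process.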


included rainfall as a smoothed spline with two degrees of freedom (a non-linear term). However, the structural differences between GLM and GAM models were minor, and the similarity in ROC areas of the two model types across all species (Table 3) implies that the differences had a minor impact on predictive discrimination.

Plots of fitted functions

Plotting fitted relationships provided a useful means to visualize and manipulate the behaviour of fitted models to ensure that they were ecologically meaningful. In general, fitted relationships were found to be ecologically plausible. Plots of sooty owl response shapes (Fig. 3) show how the probability of sooty owl site occupancy varies with each of the predictor variables contained in the final GAM model for that species. Fitted functions for individual variables in a multivariate model may be different to univariate fitted functions because of interactions between predictor variables. Consequently, it is important to observe response shapes for the final model to ensure that they remain ecologically reasonable in the multivariable context.

Habitat maps

A habitat map is provided for the sooty owl as an example of the type of output that can be used in conservation planning (Fig. 4). The map shows the predicted probability of sooty owl presence throughout the region. Probabilities have been classified into four categories (0–0.2, 0.21–0.5, 0.51–0.7, 0.71–1.0) for presentation. These maps also display the presence records for the species that were used in model fitting, allowing a visual assessment of predictive performance. Mapped predictions and uncertainties (based on upper and lower 95% prediction intervals) for the remaining six species are available from the authors.

Fig. 3. Partial plots of the relationship between the probability of occupancy and environmental variables included in the final sooty owl model. The X-axis represents the range of values for each environmental variable. Probabilities on the Y-axis are plotted in transformed ‘logit’ space so that they can be interpreted in the same way as linear regressions. Response shapes in each plot represent the relationship between each variable and the probability of sooty owl occupancy in the multivariate model context, independent of the other variables included in the model. Dashed lines represent 95% confidence intervals around the fitted response shape. Rugg500, topographic ruggedness (standard deviation in elevation) in a 500-m radius; Sowlexp2000, the proportion of suitable vegetation (as defined by experts) in a 2000-m radius; terr1000, relative terrain position in a 1000-m radius; unmod2000, the percentage of cells in a 2000-m radius containing unmodified forest.

[Fig. 3 plot panels not reproduced: X-axes rain, rugg500, sowlexp2000, terr1000 and unmod2000; Y-axis in each panel is logit (probability of occupancy).]

DISCUSSION

The importance of ecological realism in modelling and the role of experts

The choice of modelling methods and approach to model building and evaluation presented in this paper reflects the importance we place on maintaining ecological realism throughout the model building process. While maintaining statistical rigour is central to good model building, there is an expansive literature on the statistical nuances of model fitting, interpretation and prediction (Guisan & Zimmerman 2000). Conversely, there is much less literature available concerning the importance of maintaining ecological realism in model building (Austin 1991, 2002). The process of plotting and evaluating fitted functions throughout model building and interpretation is fundamental to ensuring the ecological realism and general credibility of the final models.

Expert opinion forms an integral part of sound model building and evaluation, irrespective of the modelling platform and theoretical approach used (Steyerberg et al. 2001b). Some modellers view expert opinion as a source of undesirable subjectivity and prefer to let the data dictate the model. However, this ignores the limitations commonly encountered with ecological data (small to medium-sized species data sets that may include biases, and many candidate predictor variables) and the related challenges in developing robust and sensible models. Expert opinion can (and should) be used to assist in the identification of candidate model variables, interactions between variables and likely response shapes (Pearce et al. 2001b; Ferrier et al. 2002a), for corroboration of a model’s ecological realism (Austin 2002), for ad hoc evaluation of model predictions (National Parks & Wildlife Service 1998) and for preparation of predictive maps for use in decision making (Ferrier et al. 2002a). Experts may also be used effectively in the creation of indices that may themselves be used as candidate variables for statistical modelling (National Parks & Wildlife Service 1998). Consequently, the role of experts should be thought of as complementary to other more data-driven methods, rather than as a competing alternative. In our case study, we emphasize the use of experts in variable reduction prior to automated variable selection. We use Harrell’s (2001) rule of thumb to identify the maximum PDF that should be offered to the variable selection routine. We contend that if expert knowledge is so lacking that reducing the candidate set to m/10 is impossible, resulting models should be treated with scepticism.
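The m/10 rule of thumb can be written as a one-line calculation. The sketch assumes that m is the limiting sample size for a binary response, taken here as the smaller of the number of presence and absence records; this section does not define m, so that reading, the function name and the example counts are assumptions.

```python
def max_candidate_df(n_presence, n_absence, records_per_df=10):
    """Upper limit on candidate degrees of freedom under an m / 10 rule of thumb.

    m is assumed to be the smaller of the presence and absence counts.
    """
    m = min(n_presence, n_absence)
    return m // records_per_df


# Hypothetical survey: 60 presence records and 340 absence records.
limit = max_candidate_df(60, 340)
```

With these hypothetical counts the candidate set offered to automated selection would be limited to 6 degrees of freedom, so experts would need to prune any longer list of candidate predictors first.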

The lack of an accepted model evaluation statistic for presence-only models detracts from their utility in conservation planning and increases the importance of expert evaluation. Development of robust evaluations of presence-only models is ongoing.

Interpretation of habitat maps

Threshold predictions

Fig. 4. The predicted probability of sooty owl occupancy across the LHCC region. Probabilities are represented as a grey-scale (white = 0–0.2, light grey = 0.21–0.5, medium grey = 0.51–0.8, dark grey = 0.81–1.0). The open triangles show point survey locations where the sooty owl is recorded as present. Habitat maps were constructed by creating a vector of habitat model predictions (one for every 1 ha grid cell in the landscape) in R using the ‘predict.gam’ command. The prediction vector is then written to an ascii text file in the correct dimensions for the landscape and imported to ArcInfo (or ArcView) for display.

In some instances, users of habitat maps may wish to identify a level of predicted probability of occupancy below which an area would be considered ‘unsuitable’ habitat for a given species. In general, such an interpretation of habitat maps is ill-advised unless the model that underpins the map is particularly well calibrated. However, in such instances where it is required, thresholds could be assigned in a risk-weighted manner. For example, if it is very important not to classify an area as ‘non-habitat’ when it may be used by a species, a low probability threshold (e.g. probability of occurrence = 0.1) should be specified when delineating non-habitat. By choosing a threshold of 0.1 for a well-calibrated model, it is implied that a 10% chance of a site (a grid cell of 100 m) being occupied by the species is a tolerable risk of failing to include areas that are in fact habitat for the species. This decision must be made on a case-by-case basis by evaluating the management costs of the two types of error (a false-negative and false-positive prediction), rather than by relying on an arbitrary threshold set by the modeller.
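One way to make a risk-weighted threshold explicit is to minimize expected misclassification cost. The sketch below assumes a well-calibrated model and constant, user-supplied costs for the two error types; the cost values and function names are hypothetical. A cell with predicted probability p is labelled non-habitat only when the expected cost of that label, p times the false-negative cost, is smaller than the expected cost of labelling it habitat, (1 - p) times the false-positive cost.

```python
def risk_weighted_threshold(cost_false_negative, cost_false_positive):
    """Probability below which labelling a cell 'non-habitat' has the lower expected cost.

    Solves p * cost_false_negative = (1 - p) * cost_false_positive for p.
    """
    if cost_false_negative <= 0 or cost_false_positive <= 0:
        raise ValueError("costs must be positive")
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def classify(probabilities, threshold):
    """Label each predicted probability; ties at the threshold are kept as habitat."""
    return ["habitat" if p >= threshold else "non-habitat" for p in probabilities]


# Hypothetical costs: wrongly excluding habitat is judged nine times as costly
# as wrongly including non-habitat.
threshold = risk_weighted_threshold(cost_false_negative=9.0, cost_false_positive=1.0)
labels = classify([0.05, 0.10, 0.60], threshold)
```

A 9:1 cost ratio reproduces the 0.1 threshold used as the example in the text, and equal costs give the conventional 0.5.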

Representing uncertainty

Many forms of uncertainty impact on maps of predicted probability of species occurrence (Elith et al. 2002). The prediction intervals available from regression models describe the part of this uncertainty associated with estimating model parameters, and this stems from inadequate data, errors in measurement and natural variation (Elith et al. 2002). There are additional uncertainties, such as model selection uncertainty and errors in explanatory variables, that are, however, not addressed by standard prediction intervals. While some methods have been developed for explicitly incorporating these forms of uncertainty in prediction intervals (Burnham & Anderson 2002; Wintle et al. 2003), there is currently very little guidance on how prediction intervals should be represented and interpreted in applications of habitat modelling and conservation planning.

Interpreting model evaluation statistics

The two statistics used for model evaluation in this study address different model applications. Specifically, model discrimination, measured by the ROC area, describes the degree to which the model successfully ranks presence sites higher than absence sites across the region in terms of predicted probability of presence. Models with satisfactory ROC areas will provide reliable ranking of areas in terms of habitat value. This mode of evaluation is relevant when the goal of a model is to rank or prioritize areas of interest in terms of their relative value as habitat for a species. This goal is different to that of accurately predicting the proportion of sites that are expected to be occupied at a given predicted probability of occurrence. For instance, at sites with a predicted probability of occurrence of 0.5, it is reasonable to expect that approximately 50% of such sites would contain the species. The degree to which this is true is described as model calibration (Miller et al. 1991) and is assessed using the logistic calibration equation. It is possible that a model with good discrimination can have poor calibration. It is also possible to improve a model's predictive ability by adjusting the model parameters with calibration statistics (Harrell 2001) or other shrinkage methods (Hastie et al. 2001). A model with poor calibration may still be useful for prioritizing or ranking sites in terms of habitat value, but should not be used to predict the raw probability of finding a species at a given site. The choice of whether to focus on ROC areas or calibration statistics depends on the intended application of the models.
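
The distinction between discrimination and calibration can be illustrated with a short sketch. The Python code below is not the software used in the case study; the function names and the toy data are assumptions made for illustration. The ROC area is computed from ranks (the Mann-Whitney formulation), and calibration is summarized by the intercept and slope of a logistic regression of observations on the logit of the predictions, which is one common form of the logistic calibration equation.

```python
import numpy as np

def roc_area(y, p):
    """ROC area from ranks (Mann-Whitney); tied predictions share an average rank."""
    y = np.asarray(y, dtype=int)
    p = np.asarray(p, dtype=float)
    order = np.argsort(p, kind="mergesort")
    ranks = np.empty(len(p))
    ranks[order] = np.arange(1, len(p) + 1)
    for value in np.unique(p):          # average the ranks within each tie group
        tied = p == value
        ranks[tied] = ranks[tied].mean()
    n_pres = y.sum()
    n_abs = len(y) - n_pres
    return (ranks[y == 1].sum() - n_pres * (n_pres + 1) / 2) / (n_pres * n_abs)

def calibration(y, p, iters=25):
    """Fit logit(Pr[y = 1]) = a + b * logit(p) by Newton-Raphson; return (a, b).

    A well calibrated model has a close to 0 and b close to 1.
    """
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), 1e-6, 1 - 1e-6)
    X = np.column_stack([np.ones_like(p), np.log(p / (1 - p))])
    beta = np.zeros(2)
    for _ in range(iters):
        mu = 1 / (1 + np.exp(-X @ beta))
        W = mu * (1 - mu)
        beta += np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (y - mu))
    return beta[0], beta[1]

# Toy example (hypothetical data): two absences and two presences.
print(roc_area([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Because the ROC area depends only on the rank order of predictions, any monotonic distortion of the predicted probabilities leaves it unchanged while the calibration slope moves away from 1. This is the sense in which a model can discriminate well yet be poorly calibrated.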

The issue of occupancy versus persistence

Models of the sort discussed in this paper make the implicit assumption that occupancy of a location implies suitability of the habitat. Van Horne (1983) identified problems associated with this assumption, and Pulliam's (1988) 'source-sink' model of population dynamics formalizes its conceptual deficiencies. Pulliam's model differentiates 'sink' areas, where mortality exceeds population growth, from 'source' areas that maintain overall population size through emigration. Similarly, if the total population is in a state of decline, it is possible that 'remnant' individuals may exist in unsuitable habitat that will not be occupied in the next generation. When a population is expanding its range, observations of absence of the species may falsely imply that the suitability of the habitat is low. These anomalies arise due to what is known as the 'equilibrium' assumption, which underpins the current static approach to wildlife habitat modelling (Austin 2002). Inclusion of disturbance history variables, dispersal barriers, competition and successional dynamics may assist in modelling such situations, though examples of such approaches are rare.

Survey design and sampling issues

This paper has focused on the choice and execution of particular modelling methods given a set of data. Issues of data quality and sampling have not been addressed. Because modelling results are only as good as the available data, the issues of data quality and sampling demand some discussion here.

Sample size

From a statistical perspective, the choice of sample size is a function of the desired precision of results. In studies where the goal is to make inference about a particular effect (e.g. the influence of time since logging on the probability of greater glider occupancy), power analysis may be used to determine the sample size required to achieve a statistically significant result for a given effect size, type I error rate and sampling variance (Burgman & Lindenmayer 1998). The relationship between statistical significance, power and model predictive performance is unclear, and there is no guarantee that a model with statistically significant terms will give good predictive performance. Unfortunately, there is no simple way to determine a priori the predictive performance of a habitat model, or to calculate a sample size that will ensure a predictive performance that is statistically significantly better than that of a null or alternative model. We could find only one attempt to address this problem in the statistical literature (Obuchowski 1994), making this an important area of future research. A simulation study that provided rules of thumb about sample sizes required to achieve a given predictive performance under a range of likely modelling scenarios would be an important contribution to the habitat modelling literature.

Data quality

A more subtle consideration is the geographical spread and environmental stratification of observations compared to the extent and environmental variability of the region for which models are to be built. We briefly touch on this issue in our case study when testing the adequacy of survey data using environmental strata. Little guidance is provided in the literature as to the minimum geographical or environmental coverage of data for statistical modelling, though it is common sense to expect that model predictions are unlikely to be reliable in environmental domains for which there are no survey data (Austin & Meyers 1996). Stratified sampling designs such as 'GRADSECT' (Austin & Heyligers 1989) that ensure a geographical and bioclimatic coverage of sampling locations are therefore appealing. GRADSECT theory, or newer developments such as Generalized Dissimilarity Modelling (GDM; Ferrier et al. 2002b) or a p-median criterion (Faith & Walker 1996), may also be applied to existing data sets to identify 'gaps' in the geographical and environmental coverage of samples.

Most statistical analyses, including the regression methods used in our case study, assume that observations are independent of each other. Survey locations that are close together are much less likely to be independent, especially when surveys target animals with large home ranges. Survey design should aim to ensure a minimum separation distance that is at least as great as the home range radius of the widest ranging species in the study. Random sampling within geographical and environmental strata assists in minimizing unwanted dependencies in observation data. Guidance on sampling theory should be sought from one of the many texts that deal with the topic in detail (e.g. Sutherland 1996; Thompson 2002).
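
A minimum separation rule of this kind is simple to check against an existing set of survey locations. The sketch below is illustrative only: the site coordinates and the 1000 m home range radius are hypothetical values, and coordinates are assumed to be in metres on a projected grid so that Euclidean distance is meaningful.

```python
import numpy as np

def sites_too_close(coords, min_separation):
    """Return index pairs (i, j), i < j, closer together than min_separation."""
    xy = np.asarray(coords, dtype=float)
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))     # full pairwise distance matrix
    i, j = np.triu_indices(len(xy), k=1)        # each unordered pair once
    close = dist[i, j] < min_separation
    return list(zip(i[close].tolist(), j[close].tolist()))

# Hypothetical survey sites (metres) and a hypothetical home range radius.
sites = [(0, 0), (600, 0), (3000, 0), (3000, 2500)]
print(sites_too_close(sites, min_separation=1000.0))
```

Pairs flagged in this way could be thinned at random, or one site of each pair could be withheld for model evaluation, before model fitting.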

Technical issues and promising advances in modelling methods

Species versus community approaches

Models of the distribution of single species are not the only way to approach conservation planning questions. Single species models are important for a subset of species, such as threatened, focal or flagship species. However, landscape planning aims to conserve the biodiversity of a region, and there are too many species and too little data to achieve this through single-species modelling. Therefore, there is a role for community-level modelling (Ferrier et al. 2002b). One example of community-level modelling is GDM, already mentioned in the context of sampling; this models compositional dissimilarity across the landscape. This could then contribute to a conservation plan that aimed to span the range of species patterns in the landscape. A further role for methods based on data from many species is to use them for modelling rare species, or species in data sets with relatively few records per species. In such situations there may be insufficient data for developing a robust single-species model. GDMs, canonical correspondence analysis (CCA) or multivariate adaptive regression splines (MARS) may be useful for this purpose.

Spatial autocorrelation

Spatial autocorrelation in wildlife observation data arises when environmental processes and patterns that influence the spatial distribution of wildlife are themselves spatially structured, and/or because the species is subject to demographic processes, territoriality or dispersal limitations causing spatial dependence (contagion or dispersion effects). Demographic and environmental processes underlying spatial patterns in wildlife distributions are usually poorly understood and therefore difficult to incorporate in model fitting. Consequently, model residuals are often spatially correlated (i.e. not independent), violating one of the basic assumptions of regression modelling. In practice, non-independence usually results in underestimation of standard errors and overestimation of the importance of habitat variables (Legendre & Fortin 1989). Methods such as autologistic regression (Augustin et al. 1996), generalized estimating equations (Albert & McShane 1995) and geographically weighted regression (Fotheringham et al. 2002) may be used to incorporate spatial autocorrelation in habitat analyses, but they are still not incorporated in standard statistical software and are technically demanding to implement.

Model selection and model uncertainty

The standard approach to representing prediction uncertainty involves the calculation of prediction (or confidence) intervals that incorporate uncertainty about parameter estimates. However, this approach to prediction implicitly assumes that the model chosen is the best available representation of the truth and effectively ignores model selection uncertainty, resulting in overconfident predictions. Automated variable selection algorithms exacerbate this problem by providing a method for dredging through many candidate predictors in search of an explanatory model (Chatfield 1995; Hoeting et al. 1999). One solution is to completely avoid automated model selection and to analyse only a small set of 'a priori' models (Burnham & Anderson 2002). This approach may be practical when strong prior knowledge exists and where the number of plausible models is small. However, if prediction is the primary goal and many possible models exist, some level of automated variable selection is often desirable. Model averaging approaches, including Bayesian model averaging, have been promoted in a range of disciplines as a means of incorporating model selection uncertainty into statistical inference and prediction (Hoeting et al. 1999) and have been applied in some ecological examples (Burnham & Anderson 2002; Wintle et al. 2003).
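
As a minimal illustration of model averaging, the sketch below uses Akaike weights in the information-theoretic style of Burnham and Anderson (2002). It is not the Bayesian model averaging procedure of Hoeting et al. (1999) or Wintle et al. (2003), and the AIC values and site predictions are hypothetical. Averaging on the probability scale is also an assumption; averaging on the logit scale is an alternative.

```python
import numpy as np

def akaike_weights(aic):
    """Convert AIC values for a set of candidate models into Akaike weights."""
    aic = np.asarray(aic, dtype=float)
    delta = aic - aic.min()            # AIC differences from the best model
    w = np.exp(-0.5 * delta)
    return w / w.sum()

def averaged_prediction(weights, predictions):
    """Weighted mean of predictions; predictions has shape (n_models, n_sites)."""
    return np.asarray(weights) @ np.asarray(predictions)

# Hypothetical AIC values for three candidate habitat models, and their
# predicted probabilities of occurrence at two sites.
weights = akaike_weights([100.0, 102.0, 110.0])
preds = [[0.80, 0.10],
         [0.60, 0.20],
         [0.30, 0.50]]
print(weights.round(3), averaged_prediction(weights, preds).round(3))
```

The spread of predictions among well supported models at a site gives a rough indication of how much model selection uncertainty a single-model prediction interval would miss.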

Estimating and incorporating detectability in habitat models

Unless the probability of detecting a species when it is present is equal to 1, false negative observation errors will occur in species surveys. The probability of detecting the presence of the case study species in any single standard survey based on spot-lighting and call elicitation has been found to be very low (Pr[detection|presence] ~ 0.12–0.45; Wintle et al. in press), making the reliability of absence data a potentially serious form of uncertainty in our case study. Recent studies have demonstrated the negative impact that false-negative observation error may have on species-habitat analyses (Tyre et al. 2003), meta-population models (Moilanen 2002) and monitoring studies (MacKenzie et al. 2002). Recently developed techniques for incorporating detectability in model estimation (MacKenzie et al. 2002; Tyre et al. 2003) reduce bias in model estimation brought about by false absences, though little effort has been invested in testing the relative predictive performance of such models.
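
The practical consequence of low single-visit detectability can be worked through with a short calculation. The sketch below assumes that repeat visits are independent and that the detection probability is constant across visits; both are simplifying assumptions, and the function names are illustrative. It is not one of the detectability estimation methods cited above.

```python
import math

def prob_detected_at_least_once(p, n_visits):
    """Probability of at least one detection in n_visits at an occupied site."""
    return 1.0 - (1.0 - p) ** n_visits

def visits_needed(p, confidence=0.95):
    """Smallest number of visits giving at least `confidence` of detection."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

# The two ends of the single-visit detection range reported above.
for p in (0.12, 0.45):
    n = visits_needed(p)
    print(p, n, round(prob_detected_at_least_once(p, n), 3))
```

Under these assumptions, a single-visit detection probability of 0.12 implies 24 visits before an absence record carries 95% confidence, and a probability of 0.45 implies 6 visits.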

Incorporating biotic interactions

Despite the prevalence of interspecific competition as a key concept in community ecology (Diamond 1975), very little has been attempted by way of incorporating interspecific competition in distribution models (for one exception see Leathwick & Austin 2001). Regression models such as those described by Leathwick and Austin (2001) could be applied to fauna species. Future development of 'loop analysis' (Levins 1975) or systems of simultaneous regressions (Guisan & Zimmerman 2000) may facilitate spatial models of wildlife distributions that incorporate competitive interactions.

The GLMs and GAMs presented here are relatively simple, though suited to the conservation problem and available data. There is scope for greater sophistication in the modelling method where a high level of modelling expertise exists, though we believe our recommended approach represents a reasonable trade-off between practicality and rigour.

Model improvement and adaptive management

Due to environmental change, natural and anthropogenic disturbance and population stochasticity, all predictive models will become redundant if they are not updated through time. We recommend that models be applied in an adaptive framework (Walters & Holling 1990), allowing immediate application of models in management but ensuring a commitment to continual improvement of models through future data collections and refinements to modelling methods.

Future data collection will be central to rigorous model testing and model refinements. Collections should be targeted to fill gaps in current data sets (Fig. 3). Particular attention should be paid to the collection of high quality observation data based on multiple site visits aimed at minimizing false absences (Wintle et al. 2004). Some important environmental variables, including those based on forest growth stage data, were not available for model fitting in the LHCC region. It is likely that models would have better predictive accuracy if forest growth stage information became available and was incorporated in models.

CONCLUSION

Habitat models are now a widely used tool in public land conservation planning. Systematic conservation planning in the urban fringe will require such tools to ensure transparency and repeatability of planning outcomes. However, access to statistical and ecological expertise for urban conservation planning is likely to be limited when compared with the public land planning processes that have taken place in Australia. It is therefore important that we continue to develop and refine planning tools, including habitat modelling methods, with an emphasis on statistical and ecological rigour and simplicity.

ACKNOWLEDGEMENTS

This work was instigated by the LHCCREMS organization (now Hunter Councils). Steven House (EcoLogical), Michael Murray (Forest Fauna Surveys P/L), Philip Gleeson (NPWS), Daniel Connolly (NPWS), Jill Smith (NPWS), Peter Bone (NPWS), Peter Ewin (NPWS), Mick Andren (NPWS) and Rod Kavanagh (SFNSW) provided data, expert guidance on model construction and valuable insights. Mark Burgman and Kuniko Yamada (University of Melbourne) helped throughout the project. Comments from Terry Walshe (University of Western Australia), David Tierney and Robbie Economos improved the manuscript substantially. Discussions with Trevor Hastie and John Leathwick helped clarify bootstrapping issues. Two anonymous reviewers provided constructive feedback. Sophie Powrie, Kirsty Winter and Meredith Liang (LHCCREMS) provided excellent support and useful feedback throughout the project.

REFERENCES

Agresti A. (1996) An Introduction to Categorical Data Analysis. John Wiley and Sons, New York.

Akaike H. (1973) Information theory and an extension of the maximum likelihood. In: Proceedings of the 2nd International Symposium on Information Theory (eds B. N. Petrov & F. Csáki) pp. 267–81. Akademia Kaido, Budapest.

Albert P. S. & McShane L. M. (1995) A generalized estimating equations approach for spatially correlated data: applications to the analysis of neuroimaging data. Biometrics 51, 627–38.

Augustin N. H., Mugglestone M. A. & Buckland S. T. (1996) An autologistic model for the spatial distribution of wildlife. J. Appl. Ecol. 33, 339–47.

Austin M. P. (1991) Vegetation: data collection and analysis. In: Nature Conservation: Cost Effective Biological Surveys and Data Analysis (eds C. R. Margules & M. P. Austin) pp. 37–41. CSIRO, Canberra.

Austin M. P. (2002) Spatial prediction of species distribution: an interface between ecological theory and statistical modelling. Ecol. Model. 157, 101–18.

Austin M. P. & Heyligers P. C. (1989) Vegetation survey design for conservation: GRADSECT sampling of forests in north-eastern NSW. Biol. Conserv. 50, 13–32.

Austin M. P. & Meyers J. A. (1996) Current approaches to modelling the environmental niche of Eucalypts: implications for management of forest biodiversity. For. Ecol. Manage. 85, 95–106.

Bambar D. (1975) The area above the ordinal dominance graph and the area below the receiver operating characteristics graph. J. Math. Psych. 12, 387–415.

Belsley D. A., Kuh E. & Welsch R. E. (1980) Regression Diagnostics. John Wiley and Sons, New York.

Bio A. M. F., Alkemande R. & Barendregt A. (1998) Determining alternative models for vegetation response analysis – a non-parametric approach. J. Veg. Sci. 9, 5–16.

Breiman L., Friedman J. H., Olshen R. A. & Stone C. J. (1984) Classification and Regression Trees. Wadsworth International Group, Belmont.

Breininger D. R., Larson V. L., Duncan B. W. & Smith R. B. (1998) Linking habitat suitability to demographic success in Florida scrub-jays. Wild. Bull. 26, 118–28.

Burgman M. A. & Lindenmayer D. B. (1998) Conservation Biology for the Australian Environment. Surrey Beatty and Sons, Chipping Norton.

Burnham K. P. & Anderson D. R. (2002) Model Selection and Multimodel Inference: a Practical Information-Theoretic Approach. Springer, New York.

Burgman M. A., Breininger D. R., Duncan B. W. & Ferson S. (2001) Setting reliability bounds on Habitat Suitability Indices. Ecol. Appl. 11, 70–8.

Carpenter G., Gillison A. N. & Winter J. (1993) DOMAIN: a flexible modelling procedure for mapping potential distributions of plants and animals. Biodivers. Conserv. 2, 667–80.

Chatfield C. (1995) Model uncertainty, data mining and statistical inference. J. Roy. Stat. Soc. Ser. A Stat. 158, 419–66.

Commonwealth of Australia (2003) Terms of Reference: Standing Committee on Environment and Heritage Inquiry into Sustainable Cities. Parliament of Australia, House of Representatives, Canberra.

Derksen S. & Keselman H. J. (1992) Backward, forward and stepwise automated subset selection algorithms: frequency of obtaining authentic and noise variables. Br. J. Math. Stat. Psych. 45, 265–82.

Diamond J. M. (1975) Assembly of species communities. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 342–444. Harvard University Press, Cambridge.

Ecotone Ecological Consultants (2001) Lower Hunter Central Coast Regional Biodiversity Strategy: Fauna survey and mapping project – Module 1: Fauna Surveys. Ecotone Ecological Consultants for LHCCREMS, Waratah, pp. 43.

Edgar R. & Belcher C. (1995) Spotted-tailed Quoll. In: The Mammals of Australia (ed. R. Strahan) pp. 67–8. Reed Books, Chatswood.

Efron B. & Tibshirani R. (1997) Improvements on cross-validation: the .632+ bootstrap method. J. Am. Stat. Ass. 92, 548–60.

Elith J. (2000) Quantitative methods for modelling species habitat: comparative performance and an application to Australian plants. In: Quantitative Methods for Conservation Biology (eds S. Ferson & M. A. Burgman) pp. 39–58. Springer-Verlag, New York.

Elith J. & Burgman M. A. (2002) Predictions and their validation: rare plants in the Central Highlands, Victoria, Australia. In: Predicting Species Occurrences: Issues of Accuracy and Scale (eds J. M. Scott, P. J. Heglund, M. L. Morrison, M. G. Raphael, W. A. Wall & F. B. Samson) pp. 303–14. Island Press, Covelo.

Elith J. & Burgman M. A. (2003) Habitat models for PVA. In: Population Viability in Plants (eds C. A. Brigham & M. W. Schwartz) pp. 203–35. Springer-Verlag, New York.

Elith J., Burgman M. A. & Regan H. M. (2002) Mapping epistemic uncertainties and vague concepts in predictions of species distribution. Ecol. Model. 157, 313–29.

Ennis M., Hinton G., Naylor D., Revow M. & Tibshirani R. (1998) A comparison of statistical learning methods on the GUSTO database. Stat. Med. 17, 2501–8.

ESRI (1997) Arcinfo 7.2. Environmental Systems Research Institute Inc., Redlands.

Faith D. P. & Walker P. A. (1996) Environmental diversity: on the best-possible use of surrogate data for assessing the relative biodiversity of sets of areas. Biodivers. Conserv. 5, 399–415.

Ferrier S. & Watson G. (1997) An Evaluation of the Effectiveness of Environmental Surrogates and Modelling Techniques in Predicting the Distribution of Biological Diversity. Department of Environment, Sports and Territories, Commonwealth of Australia and NSW National Parks and Wildlife Service, Canberra.

Ferrier S., Watson G., Pearce J. & Drielsma M. (2002a) Extended statistical approaches to modelling spatial pattern in biodiversity in north-east New South Wales. I. Species level modelling. Biodivers. Conserv. 11, 2275–307.

Ferrier S., Drielsma M., Manion G. & Watson G. (2002b) Extended statistical approaches to modelling spatial pattern in biodiversity: the north-east New South Wales experience. II. Community-level modelling. Biodivers. Conserv. 11, 2309–38.

Fielding A. H. & Bell J. F. (1997) A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ. Conserv. 24, 38–49.

Fotheringham A. S., Brunsdon C. & Charlton M. (2002) Geographically Weighted Regression: the Analysis of Spatially Varying Relationships. John Wiley and Sons, London.

Franklin J. (1995) Predictive vegetation mapping: geographic modeling of biospatial patterns in relation to environmental gradients. Prog. Phys. Geog. 19, 494–519.

Friedman J. H. (1991) Multivariate adaptive regression splines (with discussion). Ann. Stat. 19, 1–141.

Goldingay R. L. & Kavanagh R. P. (1993) Home-range estimates and habitat of the yellow-bellied glider (Petaurus australis) at Waratah Creek, New-South-Wales. Wildl. Res. 20, 387–404.

Graham C. H., Ferrier S., Huettman F., Moritz C. & Peterson A. T. (2004) New developments in museum-based informatics and applications in biodiversity analysis. Trends Ecol. Evol. 19, 497–503.

Guisan A. & Harrell F. E. (2000) Ordinal response regression models in ecology. J. Veg. Sci. 11, 617–26.

Guisan A. & Zimmerman N. E. (2000) Predictive habitat distribution models in ecology. Ecol. Model. 135, 147–86.

Hanley J. A. & McNeil B. J. (1982) The meaning and use of the area under a Receiver Operating Characteristic (ROC) curve. Radiology 143, 29–36.

Harrell F. E. (2001) Regression Modelling Strategies with Application to Linear Models, Logistic Regression and Survival Analysis. Springer, New York.

Harrell F. E., Lee K. L. & Mark D. B. (1996) Multivariate prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat. Med. 15, 361–87.

Hastie T. & Tibshirani R. (1990) Generalized additive models. Monographs on statistics and applied probability (eds D. R. Cox, D. V. Hinkley, D. Rubin & B. W. Silverman). Chapman & Hall, London.

Hastie T., Tibshirani R. & Friedman J. H. (2001) The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer-Verlag, New York.

Hirzel A. H. (2001) When GIS come to life. Linking Landscape- and Population Ecology for Large Population Management Modelling: the Case of Ibex (Capra Ibex) in Switzerland. The University of Lausanne, Lausanne.

Hoeting J. A., Madigan D., Raftery A. E. & Volinsky C. T. (1999) Bayesian model averaging: a tutorial. Stat. Sci. 14, 382–401.

Houlder D. J., Hutchinson M. F., Nix H. A. & McMahon J. P. (1999) ANUCLIM User Guide, Version 5.0. Centre for Resource and Environmental Studies, Australian National University, Canberra.

Kadmon R., Farber O. & Danin A. (2003) A systematic analysis of factors affecting the performance of climatic envelope models. Ecol. Appl. 13, 853–67.

Kavanagh R. P. (2002) Conservation and management of large forest owls in southeastern Australia. In: The Ecology and Conservation of Owls (eds I. Newton, R. Kavanagh, J. Olson & I. Taylor) pp. 201–19. CSIRO, Melbourne.

Lambeck R. J. (1997) Focal species: a multi-species umbrella for nature conservation. Conserv. Biol. 11, 849–56.

Leathwick J. R. & Austin M. P. (2001) Competitive interactions between tree species in New Zealand's old-growth indigenous forests. Ecology 82, 2560–73.

Legendre P. & Fortin M. J. (1989) Spatial pattern and ecological analysis. Vegetatio 80, 107–38.

Legendre L. & Legendre P. (1998) Numerical Ecology. Elsevier, New York.

Levins R. (1975) Evolution of communities near equilibrium. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 16–50. Harvard University Press, Cambridge.

LHCCREMS (2004) Lower Hunter Central Coast Regional Biodiversity Strategy 2004 (4 Volume). Hunter Councils on behalf of the Lower Hunter Central Coast Regional Environmental Strategy, Thornton.

Li W., Wang Z., Ma Z. & Tang H. (1999) Designing the core zone in a biosphere reserve based on suitable habitats: Yancheng Biosphere Reserve and the red crowned crane (Grus japonensis). Biol. Conserv. 90, 167–73.

Loyn R. H., McNabb E. G., Volodina L. & Willig R. (2001) Modelling landscape distributions of large forest owls as applied to managing forests in north-east Victoria, Australia. Biol. Conserv. 97, 361–76.

McCullagh P. & Nelder J. A. (1989) Generalized Linear Models. Chapman & Hall, London.

MacKenzie D. I., Nichols J. D., Lachman G. B., Droege S., Royle J. A. & Langtimm C. A. (2002) Estimating site occupancy rates when detection probabilities are less than one. Ecology 83, 2248–55.

Manel S., Dias J. M., Buckton S. T. & Ormerod S. J. (1999a) Alternative methods for predicting species distribution: an illustration with Himalayan river birds. J. Appl. Ecol. 36, 734–47.

Manel S., Dias J. M. & Ormerod S. J. (1999b) Comparing discriminant analysis, neural networks and logistic regression for predicting species distributions: a case study with a Himalayan river bird. Ecol. Model. 120, 337–47.

Miller M. E., Hui S. L. & Tierney W. M. (1991) Validation techniques for logistic regression models. Stat. Med. 10, 1213–26.

Moilanen A. (2002) Implications of empirical data quality to metapopulation model parameter estimation and application. Oikos 96, 516–30.

Moisen G. G. & Frescino T. S. (2002) Comparing five modeling techniques for predicting forest characteristics. Ecol. Model. 157, 209–25.

National Parks & Wildlife Service (1998) Eden Fauna Modelling. New South Wales National Parks & Wildlife Service, New South Wales Regional Forest Agreement Steering Committee, Canberra.

National Parks and Wildlife Service (2000) Modelling Areas of Habitat Significance for Fauna and Flora in the Southern CRA. NPWS NSW, Canberra.

Nix H. (1986) A biogeographic analysis of Australian elapid snakes. In: Atlas of Elapid Snakes of Australia (ed. R. Longmore) pp. 4–15. Australian Government Publishing Service, Canberra.

Obuchowski N. A. (1994) Computing sample size for receiver operating characteristic studies. Invest. Radiol. 29, 238–43.

Pearce J. & Ferrier S. (2000) Evaluating the predictive performance of habitat models developed using logistic regression. Ecol. Model. 133, 225–45.

Pearce J. & Ferrier S. (2001) The practical value of modelling relative abundance of species for regional conservation planning: a case study. Biol. Conserv. 98, 33–43.

Pearce J., Ferrier S. & Scotts D. (2001a) An evaluation of the predictive performance of distributional models for flora and fauna in north-east New South Wales. J. Environ. Manage. 62, 171–84.

Pearce J. L., Cherry K., Drielsma M., Ferrier S. & Whish G. (2001b) Incorporating expert knowledge and fine-scale vegetation mapping into statistical modelling of faunal distribution. J. Appl. Ecol. 38, 412–24.

Phillips S. J., Anderson R. P. & Schapire R. E. (in press) Maximum entropy modeling of species geographic distributions. Ecol. Model.

Pressey R. L., Possingham H. P., Logan V. S., Day J. R. & Williams P. H. (1999) Effects of data characteristics on the results of reserve selection algorithms. J. Biogeogr. 26, 179–91.

Pulliam H. R. (1988) Sources, sinks and population regulation. Am. Nat. 132, 652–61.

R Development Core Team (2004) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL: http://www.R-project.org

Reading R. P., Clark T. A., Seebeck J. H. & Pearce J. (1996) Habitat Suitability Index model for the eastern barred bandicoot, Perameles gunnii. Wildl. Res. 23, 221–35.

Reed P. C. & Lunney D. (1990) Habitat loss: the key problem for the long-term survival of koalas in New South Wales. In: Koala Summit: Managing Koalas in New South Wales (eds D. Lunney, C. A. Urquhart & P. C. Reed) pp. 9–31. University of Sydney, Sydney.

Ripley B. D. (1995) Pattern Recognition and Neural Networks – a Statistical Approach. Cambridge University Press, Cambridge.

Russel R. (1995) Yellow-bellied glider. In: The Mammals of Australia (ed. R. Strahan) pp. 226–8. Reed Books, Chatswood.

Steyerberg E. W., Eijkemans M. J. C., Harrell F. E. & Habbema J. D. F. (2001a) Prognostic modeling with logistic regression analysis: in search of a sensible strategy in small data sets. Med. Decis. Making 21, 45–56.

Steyerberg E. W., Harrell F. E., Borsboom G. J. J. M., Eijkemans M. J. C., Vergouwe Y. & Habbema J. D. F. (2001b) Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J. Clin. Epidemiol. 54, 774–81.

Stockwell D. & Peters D. (1999) The GARP modelling system: problems and solutions to automated spatial prediction. Int. J. Geogr. Inf. Sci. 13, 143–58.

Sutherland W. J. (1996) Ecological Census Techniques: a Handbook, 1st edn. Cambridge University Press, Cambridge.

ter Braak C. J. F. (1986) Canonical Correspondence Analysis: a new eigenvector technique for multivariate direct gradient analysis. Ecology 67, 1167–79.

Thompson S. K. (2002) Sampling, 2nd edn. Wiley, New York.

Tyre A. J., Tenhumberg B., Field S. A., Possingham H. P., Niejalke D. & Parris K. (2003) Improving precision and reducing bias in biological surveys by estimating false negative error rates in presence-absence data. Ecol. Appl. 13, 1790–801.

Van Horne B. (1983) Density as a misleading indicator of habitat quality. J. Wildl. Man. 47, 893–901.

Van Horne B. & Wiens J. A. (1991) Forest Bird Habitat Suitability Models and the Development of General Habitat Models. United States Department of the Interior, Fish and Wildlife Service, Washington, DC.

Venables W. N. & Ripley B. D. (2003) Modern Applied Statistics with S, 4th edn. Springer, New York.

Walters C. & Holling C. S. (1990) Large-scale management experiments and learning by doing. Ecology 71, 2060–8.

Wintle B. A., McCarthy M. A., Volinsky C. T. & Kavanagh R. P. (2003) The use of Bayesian Model Averaging to better represent the uncertainty in ecological models. Conserv. Biol. 17, 1579–90.

Wintle B. A., Elith R. J., Yamada K. & Burgman M. A. (2004) LHCCREMS fauna survey and mapping project. Module 2: Habitat modelling and conservation requirements. Lower Hunter & Central Coast Regional Environmental Management Strategy, Callaghan.

Wintle B. A., McCarthy M. A., Parris K. M. & Burgman M. A. (2004) Precision and bias of methods for estimating point survey detection probabilities. Ecol. Appl. 14, 703–12.

Wilson K. A., Westphal M. I., Possingham H. P. & Elith J. (2005) Sensitivity of conservation planning to different approaches to using species distribution data. Biol. Conserv. 122, 99–112.

Wintle B. A., Burgman M. A., McCarthy M. A. & Kavanagh R. P. (in press) The magnitude and management consequences of false negative observation error in surveys of arboreal marsupials and large forest owls. J. Wildl. Man.

Worton B. J. (1989) Kernel methods for estimating the utilization distribution in home-range studies. Ecology 70, 164–8.

Yencken D. & Wilkinson D. (2000) Resetting the Compass: Australia's Journey Towards Sustainability. CSIRO Publishing, Melbourne.

Zaniewski A. E., Lehmann A. & Overton J. M. (2002) Predicting species distribution using presence-only data: a case study of native New Zealand ferns. Ecol. Model. 157, 261–80.

APPENDIX 1

Method for 0.632+ bootstrap evaluation, after Harrell et al. (1996) and Efron and Tibshirani (1997).

1. Develop model on all n observations.
2. Calculate the statistic(s) of choice for evaluation on the same data (i.e. the training data) – call this Stat_app because it is the apparent value of the statistic.
3. Take a bootstrap sample, i.e. a sample of size n, with replacement, of rows of the data matrix. Keep track of which sites are in the bootstrap sample and which are excluded.
4. Fit the model on the bootstrap sample (using the same methods as used on the full set).
5. Compute the statistic on the bootstrap data set (observations vs. fitted values) and call it Stat_boot.
6. Also compute the statistic on a version of the bootstrap data where the observations are randomized (Stat_permute).
7. Use the bootstrap model to predict to the excluded data set; calculate the statistic on these predictions, Stat_excl.
8. Use Stat_boot, Stat_permute and Stat_excl to calculate the amount of overfitting, the relative overfitting rate and weights that are then used to make a best estimate of predictive performance, Stat_best_est. This statistic puts most emphasis on predictions to the excluded data, particularly when the model is overfitted (i.e. when Stat_boot – Stat_excl is large). For details of this step see Steyerberg et al. (2001b).
9. Measure how optimistic the fit on the bootstrap sample was: O = Stat_boot – Stat_best_est.
10. Repeat steps 3–7 100–200 times.
11. Calculate an average optimism, Ō.
12. Use Ō to correct Stat_app for its optimism: Stat_app – Ō. This is a near unbiased estimate of the expected value of the external predictive performance of the process that generated Stat_app.
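
The general shape of the procedure can be sketched in code, with the caveat that the sketch below is a deliberately simplified variant and not the 0.632+ method itself. It omits the permutation statistic (step 6) and the weighting of step 8, and measures optimism directly as Stat_boot – Stat_excl, which tends to give a more pessimistic correction than the weighted estimate. The logistic regression routine, its small ridge penalty and all function names are assumptions made for illustration; X is taken to be a numeric predictor matrix and y a 0/1 NumPy array.

```python
import numpy as np

def fit_logistic(X, y, ridge=1e-3, iters=50):
    """Logistic regression by Newton-Raphson; a small ridge term aids stability."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X1.shape[1])
    penalty = ridge * np.eye(X1.shape[1])
    for _ in range(iters):
        mu = 1 / (1 + np.exp(-X1 @ beta))
        grad = X1.T @ (y - mu) - penalty @ beta
        hess = X1.T @ (X1 * (mu * (1 - mu))[:, None]) + penalty
        beta += np.linalg.solve(hess, grad)
    return beta

def predict(beta, X):
    return 1 / (1 + np.exp(-(beta[0] + X @ beta[1:])))

def roc_area(y, p):
    """Proportion of presence/absence pairs ranked correctly (ties count 0.5)."""
    diff = p[y == 1][:, None] - p[y == 0][None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

def bootstrap_corrected_roc(X, y, n_boot=100, seed=0):
    """Return (apparent ROC area, optimism-corrected ROC area)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    stat_app = roc_area(y, predict(fit_logistic(X, y), X))     # steps 1-2
    optimism = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)                        # step 3
        excl = np.setdiff1d(np.arange(n), idx)                  # excluded sites
        if len(np.unique(y[idx])) < 2 or len(np.unique(y[excl])) < 2:
            continue  # ROC area is undefined without both classes
        beta = fit_logistic(X[idx], y[idx])                     # step 4
        stat_boot = roc_area(y[idx], predict(beta, X[idx]))     # step 5
        stat_excl = roc_area(y[excl], predict(beta, X[excl]))   # step 7
        optimism.append(stat_boot - stat_excl)                  # simplified step 9
    return stat_app, stat_app - np.mean(optimism)               # steps 11-12
```

A call such as `bootstrap_corrected_roc(X, y, n_boot=200)` returns the apparent and corrected ROC areas; the gap between them gives a rough indication of how much the apparent value flatters the model.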

fitting, allowing a visual assessment of predictive performance. Mapped predictions and uncertainties (based on upper and lower 95% prediction intervals) for the remaining six species are available from the authors.

DISCUSSION

The importance of ecological realism in modelling and the role of experts

The choice of modelling methods and approach to model building and evaluation presented in this paper reflects the importance we place on maintaining ecological realism throughout the model building process. While maintaining statistical rigour is central to good model building, there is an expansive literature on the statistical nuances of model fitting, interpretation and prediction (Guisan & Zimmerman 2000). Conversely, there is much less literature available concerning the importance of maintaining ecological realism in model building (Austin 1991, 2002). The process of plotting and evaluating fitted functions throughout model building and interpretation is fundamental to ensuring the ecological realism and general credibility of the final models.

Expert opinion forms an integral part of sound model building and evaluation, irrespective of the modelling platform and theoretical approach used (Steyerberg et al. 2001b). Some modellers view expert opinion as a source of undesirable subjectivity and prefer to let the data dictate the model. However, this ignores the limitations commonly encountered with ecological data (small to medium-sized species data sets that may include biases, and many candidate predictor variables) and the related challenges in developing robust and sensible models. Expert opinion can (and should) be used to assist in the identification of candidate model variables, interactions between variables and likely response shapes (Pearce et al. 2001b; Ferrier et al. 2002a), for corroboration of a model’s ecological realism (Austin 2002), for ad hoc evaluation of model predictions (National Parks & Wildlife Service 1998) and for preparation of predictive maps for use in decision making (Ferrier et al. 2002a). Experts may also be used effectively in the creation of indices that may themselves be used as candidate variables for statistical modelling (National Parks & Wildlife Service 1998). Consequently, the role of experts should be thought of as complementary to other, more data-driven methods, rather than as a competing alternative. In our case study we emphasize the use of experts in variable reduction prior to automated variable selection. We use Harrell’s (2001) rule of thumb to identify the maximum PDF that should be offered to the variable selection routine. We contend that if expert knowledge is so lacking that reducing the candidate set to m/10 is impossible, resulting models should be treated with scepticism.
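The m/10 rule of thumb can be illustrated with a few lines of Python. This sketch assumes that m is the number of observations in the less frequent response class (presences or absences), which is the usual reading of Harrell's limiting sample size for a binary response; the function name and the survey counts are hypothetical.

```python
def max_candidate_terms(n_presence, n_absence, obs_per_term=10):
    """Upper limit on candidate model terms under an m/10 style rule of thumb.

    Assumption: m is the size of the less frequent class in a binary response.
    """
    m = min(n_presence, n_absence)
    return m // obs_per_term


# Hypothetical survey: 45 presence sites and 320 absence sites.
limit = max_candidate_terms(45, 320)
print(limit)  # 4
```

With only 45 presences, the rule suggests that expert knowledge would be needed to cut the candidate set down to about four terms before any automated selection is run.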

The lack of an accepted model evaluation statistic for presence-only models detracts from their utility in conservation planning and increases the importance of expert evaluation. Development of robust evaluations of presence-only models is ongoing.

Interpretation of habitat maps

Threshold predictions

Fig. 4. The predicted probability of sooty owl occupancy across the LHCC region. Probabilities are represented as a grey-scale (white = 0–0.2, light grey = 0.21–0.5, medium grey = 0.51–0.8, dark grey = 0.81–1.0). The open triangles show point survey locations where the sooty owl is recorded as present. Habitat maps were constructed by creating a vector of habitat model predictions (one for every 1 ha grid cell in the landscape) in R using the ‘predict.gam’ command. The prediction vector is then written to an ascii text file in the correct dimensions for the landscape and imported to ArcInfo (or ArcView) for display.

In some instances, users of habitat maps may wish to identify a level of predicted probability of occupancy below which an area would be considered ‘unsuitable’ habitat for a given species. In general, such an interpretation of habitat maps is ill-advised unless the model that underpins the map is particularly well calibrated. However, in such instances where it is required, thresholds could be assigned in a risk-weighted manner. For example, if it is very important not to classify an area as ‘non-habitat’ when it may be used by a species, a low probability threshold (e.g. probability of occurrence = 0.1) should be specified when delineating non-habitat. By choosing a threshold of 0.1 for a well-calibrated model, it is implied that a 10% chance of a site (a grid cell of 100 m × 100 m) being occupied by the species is a tolerable risk of failing to include areas that are in fact habitat for the species. This decision must be made on a case-by-case basis by evaluating the management costs of the two types of error (a false-negative and false-positive prediction), rather than by relying on an arbitrary threshold set by the modeller.
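One way to make the risk weighting explicit is the standard expected-cost argument: for a well-calibrated probability p, labelling a cell non-habitat costs p × (cost of a false negative) on average, while labelling it habitat costs (1 − p) × (cost of a false positive). The sketch below applies that argument; it is an illustration under those assumptions, and the function names and cost values are hypothetical rather than taken from the case study.

```python
def risk_weighted_threshold(cost_false_positive, cost_false_negative):
    """Probability above which labelling a cell 'habitat' has the lower expected cost.

    Assumes the predicted probabilities are well calibrated.
    """
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def classify(probabilities, threshold):
    """Label each predicted probability as habitat or non-habitat."""
    return ["habitat" if p >= threshold else "non-habitat" for p in probabilities]


# Hypothetical costs: wrongly excluding habitat is judged nine times as costly
# as wrongly including non-habitat.
threshold = risk_weighted_threshold(cost_false_positive=1.0, cost_false_negative=9.0)
labels = classify([0.05, 0.1, 0.4], threshold)
print(threshold, labels)
```

Under these assumed costs the threshold works out to 0.1, so a threshold of that size amounts to a statement that missed habitat is about nine times worse than a false alarm.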

Representing uncertainty

Many forms of uncertainty impact on maps of predicted probability of species occurrence (Elith et al. 2002). The prediction intervals available from regression models describe the part of this uncertainty associated with estimating model parameters, and this stems from inadequate data, errors in measurement and natural variation (Elith et al. 2002). There are additional uncertainties, such as model selection uncertainty and errors in explanatory variables, that are not addressed by standard prediction intervals. While some methods have been developed for explicitly incorporating these forms of uncertainty in prediction intervals (Burnham & Anderson 2002; Wintle et al. 2003), there is currently very little guidance on how prediction intervals should be represented and interpreted in applications of habitat modelling and conservation planning.

Interpreting model evaluation statistics

The two statistics used for model evaluation in this study address different model applications. Specifically, model discrimination, measured by the ROC area, is the degree to which the model successfully ranks presence sites higher than absence sites across the region in terms of predicted probability of presence. Models with satisfactory ROC areas will provide reliable ranking of areas in terms of habitat value. This mode of evaluation is relevant when the goal of a model is to rank or prioritize areas of interest in terms of their relative value as habitat for a species. This goal is different to that of accurately predicting the proportion of sites that are expected to be occupied at a given predicted probability of occurrence. For instance, at sites with a predicted probability of occurrence of 0.5, it is reasonable to expect that approximately 50% of such sites would contain the species. The degree to which this is true is described as model calibration (Miller et al. 1991) and is assessed using the logistic calibration equation. It is possible that a model with good discrimination can have poor calibration. It is also possible to improve a model’s predictive ability by adjusting the model parameters with calibration statistics (Harrell 2001) or other shrinkage methods (Hastie et al. 2001). A model with poor calibration may still be useful for prioritizing or ranking sites in terms of habitat value, but should not be used to predict the raw probability of finding a species at a given site. The choice of whether to focus on ROC areas or calibration statistics depends on the intended application of the models.
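The distinction between discrimination and calibration can be shown with a small Python sketch. It uses made-up observations and predictions, and a simple binned comparison of predicted and observed occupancy as a stand-in for the logistic calibration equation, so it should be read as an illustration of the idea only.

```python
def roc_area(y, p):
    """ROC area: proportion of presence/absence pairs ranked correctly (ties = 0.5)."""
    pres = [pi for yi, pi in zip(y, p) if yi == 1]
    absn = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pres for b in absn)
    return wins / (len(pres) * len(absn))


def calibration_table(y, p, n_bins=2):
    """Per bin: (mean predicted probability, observed proportion occupied)."""
    bins = [[] for _ in range(n_bins)]
    for yi, pi in zip(y, p):
        bins[min(int(pi * n_bins), n_bins - 1)].append((pi, yi))
    return [(sum(pi for pi, _ in b) / len(b), sum(yi for _, yi in b) / len(b))
            for b in bins if b]


# Hypothetical observations (1 = present) and predicted probabilities.
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
p = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.65, 0.7, 0.8, 0.9]
p_distorted = [pi ** 2 for pi in p]  # same ranking, systematically lower values

print(roc_area(y, p), roc_area(y, p_distorted))   # identical discrimination
print(calibration_table(y, p))
print(calibration_table(y, p_distorted))          # predictions now too low
```

Squaring the predictions preserves the ranking of sites, so the ROC area is unchanged, while the mean predicted probability falls well below the observed prevalence. Such a model could still rank sites but should not be read as giving raw probabilities of occupancy.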

The issue of occupancy versus persistence

Models of the sort discussed in this paper make the implicit assumption that occupancy of a location implies suitability of the habitat. Van Horne (1983) identified problems associated with this assumption, and Pulliam’s (1988) ‘source-sink’ model of population dynamics formalizes its conceptual deficiencies. Pulliam’s model differentiates ‘sink’ areas, where mortality exceeds population growth, from ‘source’ areas that maintain overall population size through emigration. Similarly, if the total population is in a state of decline, it is possible that ‘remnant’ individuals may exist in unsuitable habitat that will not be occupied in the next generation. When a population is expanding its range, observations of absence of the species may falsely imply that the suitability of the habitat is low. These anomalies arise due to what is known as the ‘equilibrium’ assumption, which underpins the current static approach to wildlife habitat modelling (Austin 2002). Inclusion of disturbance history variables, dispersal barriers, competition and successional dynamics may assist in modelling such situations, though examples of such approaches are rare.

Survey design and sampling issues

This paper has focused on the choice and execution of particular modelling methods given a set of data. Issues of data quality and sampling have not been addressed. Because modelling results are only as good as the available data, the issues of data quality and sampling demand some discussion here.

Sample size

From a statistical perspective, the choice of sample size is a function of the desired precision of results. In studies where the goal is to make inference about a particular effect (e.g. the influence of time since logging on the probability of greater glider occupancy), power analysis may be used to determine the sample size required to achieve a statistically significant result for a given effect size, type I error rate and sampling variance (Burgman & Lindenmayer 1998). The relationship between statistical significance, power and model predictive performance is unclear, and there is no guarantee that a model with statistically significant terms will give good predictive performance. Unfortunately, there is no simple way to determine a priori the predictive performance of a habitat model, or to calculate a sample size that will ensure a predictive performance that is statistically significantly better than that of a null or alternative model. We could find only one attempt to address this problem in the statistical literature (Obuchowski 1994), making this an important area of future research. A simulation study that provided rules of thumb about sample sizes required to achieve a given predictive performance under a range of likely modelling scenarios would be an important contribution to the habitat modelling literature.

Data quality

A more subtle consideration is the geographical spread and environmental stratification of observations compared to the extent and environmental variability of the region for which models are to be built. We briefly touch on this issue in our case study when testing the adequacy of survey data using environmental strata. Little guidance is provided in the literature as to the minimum geographical or environmental coverage of data for statistical modelling, though it is common sense to expect that model predictions are unlikely to be reliable in environmental domains for which there are no survey data (Austin & Meyers 1996). Stratified sampling designs such as ‘GRADSECT’ (Austin & Heyligers 1989) that ensure a geographical and bioclimatic coverage of sampling locations are therefore appealing. GRADSECT theory, or newer developments such as Generalized Dissimilarity Modelling (GDM; Ferrier et al. 2002b) or a p-median criterion (Faith & Walker 1996), may also be applied to existing data sets to identify ‘gaps’ in the geographical and environmental coverage of samples.

Most statistical analyses, including the regression methods used in our case study, assume that observations are independent of each other. Survey locations that are close together are much less likely to be independent, especially when surveys target animals with large home ranges. Survey design should aim to ensure a minimum separation distance that is at least as great as the home range radius of the widest ranging species in the study. Random sampling within geographical and environmental strata assists in minimizing unwanted dependencies in observation data. Guidance on sampling theory should be sought from one of the many texts that deal with the topic in detail (e.g. Sutherland 1996; Thompson 2002).
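A minimum separation rule of this kind is easy to check against an existing set of survey locations. The following Python sketch assumes projected coordinates in metres; the site names, coordinates and the 1000 m separation distance are hypothetical values chosen for illustration.

```python
import math


def pairs_closer_than(sites, min_separation):
    """List (site_a, site_b, distance) for every pair closer than min_separation.

    `sites` maps a site name to projected (x, y) coordinates in metres.
    """
    names = sorted(sites)
    too_close = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = math.hypot(sites[a][0] - sites[b][0], sites[a][1] - sites[b][1])
            if d < min_separation:
                too_close.append((a, b, d))
    return too_close


# Hypothetical survey sites and an assumed home range radius of 1000 m.
sites = {"A": (0.0, 0.0), "B": (300.0, 400.0), "C": (5000.0, 0.0)}
flagged = pairs_closer_than(sites, min_separation=1000.0)
print(flagged)
```

Pairs flagged in this way could be thinned, or one site of each pair held back for model evaluation, before fitting.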

Technical issues and promising advances in modelling methods

Species versus community approaches

Models of the distribution of single species are not the only way to approach conservation planning questions. Single species models are important for a subset of species, such as threatened, focal or flagship species. However, landscape planning aims to conserve the biodiversity of a region, and there are too many species and too little data to achieve this through single-species modelling. Therefore, there is a role for community-level modelling (Ferrier et al. 2002b). One example of community-level modelling is GDM, already mentioned in the context of sampling; this models compositional dissimilarity across the landscape. This could then contribute to a conservation plan that aimed to span the range of species patterns in the landscape. A further role for methods based on data from many species is to use them for modelling rare species, or species in data sets with relatively few records per species. In such situations there may be insufficient data for developing a robust single-species model. GDMs, canonical correspondence analysis (CCA) or multivariate adaptive regression splines (MARS) may be useful for this purpose.

Spatial autocorrelation

Spatial autocorrelation in wildlife observation data arises when environmental processes and patterns that influence the spatial distribution of wildlife are themselves spatially structured and/or because the species is subject to demographic processes, territoriality or dispersal limitations causing spatial dependence (contagion or dispersion effects). Demographic and environmental processes underlying spatial patterns in wildlife distributions are usually poorly understood and therefore difficult to incorporate in model fitting. Consequently, model residuals are often spatially correlated (i.e. not independent), violating one of the basic assumptions of regression modelling. In practice, non-independence usually results in underestimation of standard errors and overestimation of the importance of habitat variables (Legendre & Fortin 1989). Methods such as autologistic regression (Augustin et al. 1996), generalized estimating equations (Albert & McShane 1995) and geographically weighted regression (Fotheringham et al. 2002) may be used to incorporate spatial autocorrelation in habitat analyses, but they are still not incorporated in standard statistical software and are technically demanding to implement.
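Before turning to those more demanding methods, a simple diagnostic such as Moran's I can indicate whether model residuals are spatially correlated. Moran's I is not used in this paper; the Python sketch below is offered only as an illustration, with binary neighbour weights (sites within an assumed distance count as neighbours) and toy values standing in for residuals.

```python
import math


def morans_i(coords, values, max_dist):
    """Moran's I with binary weights: w_ij = 1 if sites i and j lie within max_dist."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross, w_sum = 0.0, 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            if d <= max_dist:
                cross += dev[i] * dev[j]
                w_sum += 1.0
    return (n / w_sum) * cross / sum(d * d for d in dev)


# Toy transect of six sites spaced 100 m apart; values stand in for model residuals.
coords = [(100.0 * i, 0.0) for i in range(6)]
clustered = [1, 1, 1, 0, 0, 0]
alternating = [1, 0, 1, 0, 1, 0]
i_clustered = morans_i(coords, clustered, max_dist=100.0)
i_alternating = morans_i(coords, alternating, max_dist=100.0)
print(round(i_clustered, 3), round(i_alternating, 3))
```

Values well above zero suggest contagion (neighbouring residuals are alike), values well below zero suggest dispersion, and values near zero are consistent with independence.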

Model selection and model uncertainty

The standard approach to representing prediction uncertainty involves the calculation of prediction (or confidence) intervals that incorporate uncertainty about parameter estimates. However, this approach to prediction implicitly assumes that the model chosen is the best available representation of the truth and effectively ignores model selection uncertainty, resulting in overconfident predictions. Automated variable selection algorithms exacerbate this problem by providing a method for dredging through many candidate predictors in search of an explanatory model (Chatfield 1995; Hoeting et al. 1999). One solution is to completely avoid automated model selection and to analyse only a small set of ‘a priori’ models (Burnham & Anderson 2002). This approach may be practical when strong prior knowledge exists and where the number of plausible models is small. However, if prediction is the primary goal and many possible models exist, some level of automated variable selection is often desirable. Model averaging approaches, including Bayesian model averaging, have been promoted in a range of disciplines as a means of incorporating model selection uncertainty into statistical inference and prediction (Hoeting et al. 1999) and have been applied in some ecological examples (Burnham & Anderson 2002; Wintle et al. 2003).
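As a concrete illustration of model averaging, the sketch below computes Akaike weights in the manner described by Burnham and Anderson (2002) and uses them to average predictions from several candidate models. It shows the information-theoretic flavour of model averaging rather than full Bayesian model averaging, and the AIC values and site predictions are invented for the example.

```python
import math


def akaike_weights(aic_values):
    """Akaike weights: relative support for each model in the candidate set."""
    best = min(aic_values)
    rel_likelihood = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel_likelihood)
    return [r / total for r in rel_likelihood]


def averaged_prediction(predictions, weights):
    """Weighted average of the candidate models' predictions for one site."""
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical AIC values for three candidate habitat models, and each model's
# predicted probability of occupancy at a single site.
aic = [100.0, 102.0, 110.0]
site_predictions = [0.60, 0.40, 0.10]
weights = akaike_weights(aic)
p_avg = averaged_prediction(site_predictions, weights)
print([round(w, 3) for w in weights], round(p_avg, 3))
```

The averaged prediction falls between the predictions of the better-supported models, and the spread among models can be carried into a prediction interval instead of being ignored.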

Estimating and incorporating detectability in habitat models

Unless the probability of detecting a species when it is present is equal to 1, false negative observation errors will occur in species surveys. The probability of detecting the presence of the case study species in any single standard survey based on spot-lighting and call elicitation has been found to be very low (Pr[detection|presence] ∼ 0.12–0.45; Wintle et al. in press), making the reliability of absence data a potentially serious form of uncertainty in our case study. Recent studies have demonstrated the negative impact that false-negative observation error may have on species-habitat analyses (Tyre et al. 2003), meta-population models (Moilanen 2002) and monitoring studies (MacKenzie et al. 2002). Recently developed techniques for incorporating detectability in model estimation (MacKenzie et al. 2002; Tyre et al. 2003) reduce bias in model estimation brought about by false absences, though little effort has been invested in testing the relative predictive performance of such models.
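The practical consequence of low detectability can be illustrated with simple arithmetic. If visits are assumed to be independent and detectability constant across visits (both assumptions made here for illustration, not results of the study), the chance of missing a species that is present on all of k visits is (1 − p)^k. The Python sketch below applies this to the range of single-visit detection probabilities quoted above.

```python
import math


def prob_false_absence(p_detect, n_visits):
    """Probability of recording no detections at an occupied site over n_visits."""
    return (1.0 - p_detect) ** n_visits


def visits_needed(p_detect, confidence=0.95):
    """Smallest number of visits giving at least `confidence` of one detection."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_detect))


for p in (0.12, 0.45):
    k = visits_needed(p)
    print(p, k, round(prob_false_absence(p, k), 3))
```

Under these assumptions a species with a single-visit detection probability of 0.45 would need six visits, and one with 0.12 would need 24, before an absence record could be treated as 95% reliable.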

Incorporating biotic interactions

Despite the prevalence of interspecific competition as a key concept in community ecology (Diamond 1975), very little has been attempted by way of incorporating interspecific competition in distribution models (for one exception see Leathwick & Austin 2001). Regression models such as those described by Leathwick and Austin (2001) could be applied to fauna species. Future development of ‘loop analysis’ (Levins 1975) or systems of simultaneous regressions (Guisan & Zimmerman 2000) may facilitate spatial models of wildlife distributions that incorporate competitive interactions.

The GLMs and GAMs presented here are relatively simple, though suited to the conservation problem and available data. There is scope for greater sophistication in the modelling method where a high level of modelling expertise exists, though we believe our recommended approach represents a reasonable trade-off between practicality and rigour.

Model improvement and adaptive management

Due to environmental change, natural and anthropogenic disturbance, and population stochasticity, all predictive models will become redundant if they are not updated through time. We recommend that models be applied in an adaptive framework (Walters & Holling 1990), allowing immediate application of models in management but ensuring a commitment to continual improvement of models through future data collections and refinements to modelling methods.

Future data collection will be central to rigorous model testing and model refinement. It should be targeted to fill gaps in current data sets (Fig. 3). Particular attention should be paid to the collection of high quality observation data based on multiple site visits aimed at minimizing false absences (Wintle et al. 2004). Some important environmental variables, including those based on forest growth stage data, were not available for model fitting in the LHCC region. It is likely that models would have better predictive accuracy if forest growth stage information became available and was incorporated in models.

CONCLUSION

Habitat models are now a widely used tool in public land conservation planning. Systematic conservation planning in the urban fringe will require such tools to ensure transparency and repeatability of planning outcomes. However, access to statistical and ecological expertise for urban conservation planning is likely to be limited when compared with the public land planning processes that have taken place in Australia. It is therefore important that we continue to develop and refine planning tools, including habitat modelling methods, with an emphasis on statistical and ecological rigour and simplicity.

ACKNOWLEDGEMENTS

This work was instigated by the LHCCREMS organization (now Hunter Councils). Steven House (EcoLogical), Michael Murray (Forest Fauna Surveys P/L), Philip Gleeson (NPWS), Daniel Connolly (NPWS), Jill Smith (NPWS), Peter Bone (NPWS), Peter Ewin (NPWS), Mick Andren (NPWS) and Rod Kavanagh (SFNSW) provided data, expert guidance on model construction and valuable insights. Mark Burgman and Kuniko Yamada (University of Melbourne) helped throughout the project. Comments from Terry Walshe (University of Western Australia), David Tierney and Robbie Economos improved the manuscript substantially. Discussions with Trevor Hastie and John Leathwick helped clarify bootstrapping issues. Two anonymous reviewers provided constructive feedback. Sophie Powrie, Kirsty Winter and Meredith Liang (LHCCREMS) provided excellent support and useful feedback throughout the project.

REFERENCES

Agresti A. (1996) An Introduction to Categorical Data Analysis. John Wiley and Sons, New York.

Akaike H. (1973) Information theory and an extension of the maximum likelihood principle. In: Proceedings of the 2nd International Symposium on Information Theory (eds B. N. Petrov & F. Csáki) pp. 267–81. Akadémiai Kiadó, Budapest.

Albert P. S. & McShane L. M. (1995) A generalized estimating equations approach for spatially correlated data: applications to the analysis of neuroimaging data. Biometrics 51, 627–38.

Augustin N. H., Mugglestone M. A. & Buckland S. T. (1996) An autologistic model for the spatial distribution of wildlife. J. Appl. Ecol. 33, 339–47.

Austin M. P. (1991) Vegetation: data collection and analysis. In: Nature Conservation: Cost Effective Biological Surveys and Data Analysis (eds C. R. Margules & M. P. Austin) pp. 37–41. CSIRO, Canberra.

Austin M. P. (2002) Spatial prediction of species distribution: an interface between ecological theory and statistical modelling. Ecol. Model. 157, 101–18.

Austin M. P. & Heyligers P. C. (1989) Vegetation survey design for conservation: GRADSECT sampling of forests in north-eastern NSW. Biol. Conserv. 50, 13–32.

Austin M. P. & Meyers J. A. (1996) Current approaches to modelling the environmental niche of Eucalypts: implications for management of forest biodiversity. For. Ecol. Manage. 85, 95–106.

Bambar D. (1975) The area above the ordinal dominance graph and the area below the receiver operating characteristics graph. J. Math. Psych. 12, 387–415.

Belsley D. A., Kuh E. & Welsch R. E. (1980) Regression Diagnostics. John Wiley and Sons, New York.

Bio A. M. F., Alkemande R. & Barendregt A. (1998) Determining alternative models for vegetation response analysis – a non-parametric approach. J. Veg. Sci. 9, 5–16.

Breiman L., Friedman J. H., Olshen R. A. & Stone C. J. (1984) Classification and Regression Trees. Wadsworth International Group, Belmont.

Breininger D. R., Larson V. L., Duncan B. W. & Smith R. B. (1998) Linking habitat suitability to demographic success in Florida scrub-jays. Wild. Bull. 26, 118–28.

Burgman M. A. & Lindenmayer D. B. (1998) Conservation Biology for the Australian Environment. Surrey Beatty and Sons, Chipping Norton.

Burnham K. P. & Anderson D. R. (2002) Model Selection and Multimodel Inference: a Practical Information-Theoretic Approach. Springer, New York.

Burgman M. A., Breininger D. R., Duncan B. W. & Ferson S. (2001) Setting reliability bounds on Habitat Suitability Indices. Ecol. Appl. 11, 70–8.

Carpenter G., Gillison A. N. & Winter J. (1993) DOMAIN: a flexible modelling procedure for mapping potential distributions of plants and animals. Biodivers. Conserv. 2, 667–80.

Chatfield C. (1995) Model uncertainty, data mining and statistical inference. J. Roy. Stat. Soc. Ser. A Stat. 158, 419–66.

Commonwealth of Australia (2003) Terms of Reference: Standing Committee on Environment and Heritage Inquiry into Sustainable Cities. Parliament of Australia, House of Representatives, Canberra.

Derksen S. & Keselman H. J. (1992) Backward, forward and stepwise automated subset selection algorithms: frequency of obtaining authentic and noise variables. Br. J. Math. Stat. Psych. 45, 265–82.

Diamond J. M. (1975) Assembly of species communities. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 342–444. Harvard University Press, Cambridge.

Ecotone Ecological Consultants (2001) Lower Hunter Central Coast Regional Biodiversity Strategy: Fauna survey and mapping project – Module 1: Fauna Surveys. Ecotone Ecological Consultants for LHCCREMS, Waratah, pp. 43.

Edgar R. & Belcher C. (1995) Spotted-tailed Quoll. In: The Mammals of Australia (ed. R. Strahan) pp. 67–8. Reed Books, Chatswood.

Efron B. & Tibshirani R. (1997) Improvements on cross-validation: the .632+ bootstrap method. J. Am. Stat. Ass. 92, 548–60.

Elith J. (2000) Quantitative methods for modelling species habitat: comparative performance and an application to Australian plants. In: Quantitative Methods for Conservation Biology (eds S. Ferson & M. A. Burgman) pp. 39–58. Springer-Verlag, New York.

Elith J. & Burgman M. A. (2002) Predictions and their validation: rare plants in the Central Highlands, Victoria, Australia. In: Predicting Species Occurrences: Issues of Accuracy and Scale (eds J. M. Scott, P. J. Heglund, M. L. Morrison, M. G. Raphael, W. A. Wall & F. B. Samson) pp. 303–14. Island Press, Covelo.

Elith J. & Burgman M. A. (2003) Habitat models for PVA. In: Population Viability in Plants (eds C. A. Brigham & M. W. Schwartz) pp. 203–35. Springer-Verlag, New York.

Elith J., Burgman M. A. & Regan H. M. (2002) Mapping epistemic uncertainties and vague concepts in predictions of species distribution. Ecol. Model. 157, 313–29.


Ennis M., Hinton G., Naylor D., Revow M. & Tibshirani R. (1998) A comparison of statistical learning methods on the GUSTO database. Stat. Med. 17, 2501–8.

ESRI (1997) Arcinfo 7.2. Environmental Systems Research Institute Inc., Redlands.

Faith D. P. & Walker P. A. (1996) Environmental diversity: on the best-possible use of surrogate data for assessing the relative biodiversity of sets of areas. Biodivers. Conserv. 5, 399–415.

Ferrier S. & Watson G. (1997) An Evaluation of the Effectiveness of Environmental Surrogates and Modelling Techniques in Predicting the Distribution of Biological Diversity. Department of Environment, Sports and Territories, Commonwealth of Australia, and NSW National Parks and Wildlife Service, Canberra.

Ferrier S., Watson G., Pearce J. & Drielsma M. (2002a) Extended statistical approaches to modelling spatial pattern in biodiversity in north-east New South Wales. I. Species-level modelling. Biodivers. Conserv. 11, 2275–307.

Ferrier S., Drielsma M., Manion G. & Watson G. (2002b) Extended statistical approaches to modelling spatial pattern in biodiversity: the north-east New South Wales experience. II. Community-level modelling. Biodivers. Conserv. 11, 2309–38.

Fielding A. H. & Bell J. F. (1997) A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ. Conserv. 24, 38–49.

Fotheringham A. S., Brunsdon C. & Charlton M. (2002) Geographically Weighted Regression: the Analysis of Spatially Varying Relationships. John Wiley and Sons, London.

Franklin J. (1995) Predictive vegetation mapping: geographic modeling of biospatial patterns in relation to environmental gradients. Prog. Phys. Geog. 19, 494–519.

Friedman J. H. (1991) Multivariate adaptive regression splines (with discussion). Ann. Stat. 19, 1–141.

Goldingay R. L. & Kavanagh R. P. (1993) Home-range estimates and habitat of the yellow-bellied glider (Petaurus australis) at Waratah Creek, New-South-Wales. Wildl. Res. 20, 387–404.

Graham C. H., Ferrier S., Huettman F., Moritz C. & Peterson A. T. (2004) New developments in museum-based informatics and applications in biodiversity analysis. Trends Ecol. Evol. 19, 497–503.

Guisan A. & Harrell F. E. (2000) Ordinal response regression models in ecology. J. Veg. Sci. 11, 617–26.

Guisan A. & Zimmerman N. E. (2000) Predictive habitat distribution models in ecology. Ecol. Model. 135, 147–86.

Hanley J. A. & McNeil B. J. (1982) The meaning and use of the area under a Receiver Operating Characteristic (ROC) curve. Radiology 143, 29–36.

Harrell F. E. (2001) Regression Modelling Strategies with Application to Linear Models, Logistic Regression and Survival Analysis. Springer, New York.

Harrell F. E., Lee K. L. & Mark D. B. (1996) Multivariate prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat. Med. 15, 361–87.

Hastie T. & Tibshirani R. (1990) Generalized additive models. Monographs on statistics and applied probability (eds D. R. Cox, D. V. Hinkley, D. Rubin & B. W. Silverman). Chapman & Hall, London.

Hastie T., Tibshirani R. & Friedman J. H. (2001) The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer-Verlag, New York.

Hirzel A. H. (2001) When GIS come to life. Linking Landscape- and Population Ecology for Large Population Management Modelling: the Case of Ibex (Capra Ibex) in Switzerland. The University of Lausanne, Lausanne.

Hoeting J. A., Madigan D., Raftery A. E. & Volinsky C. T. (1999) Bayesian model averaging: a tutorial. Stat. Sci. 14, 382–401.

Houlder D. J., Hutchinson M. F., Nix H. A. & McMahon J. P. (1999) ANUCLIM User Guide, Version 5.0. Centre for Resource and Environmental Studies, Australian National University, Canberra.

Kadmon R., Farber O. & Danin A. (2003) A systematic analysis of factors affecting the performance of climatic envelope models. Ecol. Appl. 13, 853–67.

Kavanagh R. P. (2002) Conservation and management of large forest owls in southeastern Australia. In: The Ecology and Conservation of Owls (eds I. Newton, R. Kavanagh, J. Olson & I. Taylor) pp. 201–19. CSIRO, Melbourne.

Lambeck R. J. (1997) Focal species: a multi-species umbrella for nature conservation. Conserv. Biol. 11, 849–56.

Leathwick J. R. & Austin M. P. (2001) Competitive interactions between tree species in New Zealand’s old-growth indigenous forests. Ecology 82, 2560–73.

Legendre P. & Fortin M. J. (1989) Spatial pattern and ecological analysis. Vegetatio 80, 107–38.

Legendre L. & Legendre P. (1998) Numerical Ecology. Elsevier, New York.

Levins R. (1975) Evolution of communities near equilibrium. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 16–50. Harvard University Press, Cambridge.

LHCCREMS (2004) Lower Hunter Central Coast Regional Biodiversity Strategy 2004 (4 Volume). Hunter Councils on behalf of the Lower Hunter Central Coast Regional Environmental Strategy, Thornton.

Li W., Wang Z., Ma Z. & Tang H. (1999) Designing the core zone in a biosphere reserve based on suitable habitats: Yancheng Biosphere Reserve and the red crowned crane (Grus japonensis). Biol. Conserv. 90, 167–73.

Loyn R. H., McNabb E. G., Volodina L. & Willig R. (2001) Modelling landscape distributions of large forest owls as applied to managing forests in north-east Victoria, Australia. Biol. Conserv. 97, 361–76.

McCullagh P. & Nelder J. A. (1989) Generalized Linear Models. Chapman & Hall, London.

MacKenzie D. I., Nichols J. D., Lachman G. B., Droege S., Royle J. A. & Langtimm C. A. (2002) Estimating site occupancy rates when detection probabilities are less than one. Ecology 83, 2248–55.

Manel S., Dias J. M., Buckton S. T. & Ormerod S. J. (1999a) Alternative methods for predicting species distribution: an illustration with Himalayan river birds. J. Appl. Ecol. 36, 734–47.

Manel S., Dias J. M. & Ormerod S. J. (1999b) Comparing discriminant analysis, neural networks and logistic regression for predicting species distributions: a case study with a Himalayan river bird. Ecol. Model. 120, 337–47.

Miller M. E., Hui S. L. & Tierney W. M. (1991) Validation techniques for logistic regression models. Stat. Med. 10, 1213–26.

Moilanen A. (2002) Implications of empirical data quality to metapopulation model parameter estimation and application. Oikos 96, 516–30.

Moisen G. G. & Frescino T. S. (2002) Comparing five modeling techniques for predicting forest characteristics. Ecol. Model. 157, 209–25.


National Parks & Wildlife Service (1998) Eden Fauna Modelling. New South Wales National Parks & Wildlife Service, New South Wales Regional Forest Agreement Steering Committee, Canberra.

National Parks and Wildlife Service (2000) Modelling Areas of Habitat Significance for Fauna and Flora in the Southern CRA. NPWS NSW, Canberra.

Nix H. (1986) A biogeographic analysis of Australian elapid snakes. In: Atlas of Elapid Snakes of Australia (ed. R. Longmore) pp. 4–15. Australian Government Publishing Service, Canberra.

Obuchowski N. A. (1994) Computing sample size for receiver operating characteristic studies. Invest. Radiol. 29, 238–43.

Pearce J. & Ferrier S. (2000) Evaluating the predictive performance of habitat models developed using logistic regression. Ecol. Model. 133, 225–45.

Pearce J. & Ferrier S. (2001) The practical value of modelling relative abundance of species for regional conservation planning: a case study. Biol. Conserv. 98, 33–43.

Pearce J., Ferrier S. & Scotts D. (2001a) An evaluation of the predictive performance of distributional models for flora and fauna in north-east New South Wales. J. Environ. Manage. 62, 171–84.

Pearce J. L., Cherry K., Drielsma M., Ferrier S. & Whish G. (2001b) Incorporating expert knowledge and fine-scale vegetation mapping into statistical modelling of faunal distribution. J. Appl. Ecol. 38, 412–24.

Phillips S. J., Anderson R. P. & Schapire R. E. (in press) Maximum entropy modeling of species geographic distributions. Ecol. Model.

Pressey R. L., Possingham H. P., Logan V. S., Day J. R. & Williams P. H. (1999) Effects of data characteristics on the results of reserve selection algorithms. J. Biogeogr. 26, 179–91.

Pulliam H. R. (1988) Sources, sinks, and population regulation. Am. Nat. 132, 652–61.

R Development Core Team (2004) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org

Reading R. P., Clark T. A., Seebeck J. H. & Pearce J. (1996) Habitat Suitability Index model for the eastern barred bandicoot, Perameles gunnii. Wildl. Res. 23, 221–35.

Reed P. C. & Lunney D. (1990) Habitat loss: the key problem for the long-term survival of koalas in New South Wales. In: Koala Summit: Managing Koalas in New South Wales (eds D. Lunney, C. A. Urquhart & P. C. Reed) pp. 9–31. University of Sydney, Sydney.

Ripley B. D. (1995) Pattern Recognition and Neural Networks – a Statistical Approach. Cambridge University Press, Cambridge.

Russel R. (1995) Yellow-bellied glider. In: The Mammals of Australia (ed. R. Strahan) pp. 226–8. Reed Books, Chatswood.

Steyerberg E. W., Eijkemans M. J. C., Harrell F. E. & Habbema J. D. F. (2001a) Prognostic modeling with logistic regression analysis: in search of a sensible strategy in small data sets. Med. Decis. Making 21, 45–56.

Steyerberg E. W., Harrell F. E., Borsboom G. J. J. M., Eijkemans M. J. C., Vergouwe Y. & Habbema J. D. F. (2001b) Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J. Clin. Epidemiol. 54, 774–81.

Stockwell D. & Peters D. (1999) The GARP modelling system: problems and solutions to automated spatial prediction. Int. J. Geogr. Inf. Sci. 13, 143–58.

Sutherland W. J. (1996) Ecological Census Techniques: a Handbook, 1st edn. Cambridge University Press, Cambridge.

ter Braak C. J. F. (1986) Canonical Correspondence Analysis: a new eigenvector technique for multivariate direct gradient analysis. Ecology 67, 1167–79.

Thompson S. K. (2002) Sampling, 2nd edn. Wiley, New York.

Tyre A. J., Tenhumberg B., Field S. A., Possingham H. P., Niejalke D. & Parris K. (2003) Improving precision and reducing bias in biological surveys by estimating false negative error rates in presence-absence data. Ecol. Appl. 13, 1790–801.

Van Horne B. (1983) Density as a misleading indicator of habitat quality. J. Wildl. Man. 47, 893–901.

Van Horne B. & Wiens J. A. (1991) Forest Bird Habitat Suitability Models and the Development of General Habitat Models. United States Department of the Interior, Fish and Wildlife Service, Washington, DC.

Venables W. N. & Ripley B. D. (2003) Modern Applied Statistics with S, 4th edn. Springer, New York.

Walters C. & Holling C. S. (1990) Large-scale management experiments and learning by doing. Ecology 71, 2060–8.

Wintle B. A., McCarthy M. A., Volinsky C. T. & Kavanagh R. P. (2003) The use of Bayesian Model Averaging to better represent the uncertainty in ecological models. Conserv. Biol. 17, 1579–90.

Wintle B. A., Elith R. J., Yamada K. & Burgman M. A. (2004) LHCCREMS fauna survey and mapping project. Module 2: Habitat modelling and conservation requirements. Lower Hunter & Central Coast Regional Environmental Management Strategy, Callaghan.

Wintle B. A., McCarthy M. A., Parris K. M. & Burgman M. A. (2004) Precision and bias of methods for estimating point survey detection probabilities. Ecol. Appl. 14, 703–12.

Wilson K. A., Westphal M. I., Possingham H. P. & Elith J. (2005) Sensitivity of conservation planning to different approaches to using species distribution data. Biol. Conserv. 122, 99–112.

Wintle B. A., Burgman M. A., McCarthy M. A. & Kavanagh R. P. (in press) The magnitude and management consequences of false negative observation error in surveys of arboreal marsupials and large forest owls. J. Wildl. Man.

Worton B. J. (1989) Kernel methods for estimating the utilization distribution in home-range studies. Ecology 70, 164–8.

Yencken D. & Wilkinson D. (2000) Resetting the Compass: Australia's Journey Towards Sustainability. CSIRO Publishing, Melbourne.

Zaniewski A. E., Lehmann A. & Overton J. M. (2002) Predicting species distribution using presence-only data: a case study of native New Zealand ferns. Ecol. Model. 157, 261–80.

APPENDIX 1

Method for 0.632+ bootstrap evaluation, after Harrell et al. (1996) and Efron and Tibshirani (1997).

1. Develop model on all n observations.

2. Calculate the statistic(s) of choice for evaluation on the same data (i.e. the training data); call this $\mathrm{Stat}_{\mathrm{app}}$ because it is the apparent value of the statistic.

3. Take a bootstrap sample, i.e. a sample of size n, with replacement, of rows of the data matrix. Keep track of which sites are in the bootstrap sample and which are excluded.

4. Fit the model on the bootstrap sample (using the same methods as used on the full set).

5. Compute the statistic on the bootstrap data set (observations vs. fitted values) and call it $\mathrm{Stat}_{\mathrm{boot}}$.

6. Also compute the statistic on a version of the bootstrap data where the observations are randomized ($\mathrm{Stat}_{\mathrm{permute}}$).

7. Use the bootstrap model to predict to the excluded data set; calculate the statistic on these predictions, $\mathrm{Stat}_{\mathrm{excl}}$.

8. Use $\mathrm{Stat}_{\mathrm{boot}}$, $\mathrm{Stat}_{\mathrm{permute}}$ and $\mathrm{Stat}_{\mathrm{excl}}$ to calculate the amount of overfitting, the relative overfitting rate and weights that are then used to make a best estimate of predictive performance, $\mathrm{Stat}_{\mathrm{best\_est}}$. This statistic puts most emphasis on predictions to the excluded data, particularly when the model is overfitted (i.e. when $\mathrm{Stat}_{\mathrm{boot}} - \mathrm{Stat}_{\mathrm{excl}}$ is large). For details of this step see Steyerberg et al. (2001b).

9. Measure how optimistic the fit on the bootstrap sample was: $O = \mathrm{Stat}_{\mathrm{boot}} - \mathrm{Stat}_{\mathrm{best\_est}}$.

10. Repeat steps 3–7 100–200 times.

11. Calculate an average optimism, $\bar{O}$.

12. Use $\bar{O}$ to correct $\mathrm{Stat}_{\mathrm{app}}$ for its optimism: $\mathrm{Stat}_{\mathrm{app}} - \bar{O}$. This is a near unbiased estimate of the expected value of the external predictive performance of the process that generated $\mathrm{Stat}_{\mathrm{app}}$.
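The appendix steps can be sketched in code. The following is a minimal illustration only, not the implementation used in the case study: it uses the ROC area as the statistic, a crude gradient-ascent logistic fit in place of a proper GLM routine, and assumes the Efron and Tibshirani (1997) weight w = 0.632 / (1 − 0.368R) for step 8, with R the relative overfitting rate clipped to [0, 1]. All function and variable names are hypothetical.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # guard against overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, steps=200, lr=0.5):
    """Approximate logistic regression fit by gradient ascent (illustrative only)."""
    beta = [0.0] * (len(X[0]) + 1)  # intercept first
    n = len(y)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            r = yi - sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
            grad[0] += r
            for j, x in enumerate(xi):
                grad[j + 1] += r * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def predict(beta, X):
    return [sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi))) for xi in X]

def auc(y, p):
    """ROC area: share of presence/absence pairs ranked correctly (ties count half)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc(X, y, n_boot=100, seed=1):
    """Return (apparent AUC, optimism-corrected AUC) following the appendix steps."""
    rng = random.Random(seed)
    n = len(y)
    stat_app = auc(y, predict(fit_logistic(X, y), X))            # steps 1-2
    optimism = []
    for _ in range(n_boot):                                      # step 10
        idx = [rng.randrange(n) for _ in range(n)]               # step 3
        chosen = set(idx)
        out = [i for i in range(n) if i not in chosen]
        yb, yo = [y[i] for i in idx], [y[i] for i in out]
        if len(set(yb)) < 2 or len(set(yo)) < 2:
            continue  # AUC is undefined unless both classes are present
        Xb = [X[i] for i in idx]
        beta = fit_logistic(Xb, yb)                              # step 4
        pb = predict(beta, Xb)
        stat_boot = auc(yb, pb)                                  # step 5
        y_perm = yb[:]
        rng.shuffle(y_perm)
        stat_perm = auc(y_perm, pb)                              # step 6
        stat_excl = auc(yo, predict(beta, [X[i] for i in out]))  # step 7
        gap = stat_boot - stat_perm
        rel = min(1.0, max(0.0, (stat_boot - stat_excl) / gap)) if gap > 0 else 0.0
        w = 0.632 / (1.0 - 0.368 * rel)                          # step 8 (assumed weights)
        stat_best = (1.0 - w) * stat_boot + w * stat_excl
        optimism.append(stat_boot - stat_best)                   # step 9
    mean_optimism = sum(optimism) / len(optimism)                # step 11
    return stat_app, stat_app - mean_optimism                    # step 12
```

With standardized predictors in `X` (a list of rows) and 0/1 observations in `y`, `bootstrap_auc(X, y)` returns the apparent and corrected ROC areas. In practice a tested GLM routine should replace `fit_logistic`.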


of 0.1 for a well-calibrated model, it is implied that a 10% chance of a site (a grid cell of 100 m) being occupied by the species is a tolerable risk of failing to include areas that are in fact habitat for the species. This decision must be made on a case-by-case basis, by evaluating the management costs of the two types of error (a false-negative and a false-positive prediction), rather than by relying on an arbitrary threshold set by the modeller.
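One way to make the cost comparison concrete is the standard expected-cost rule, sketched here under assumptions that are ours rather than the paper's: predictions are well calibrated, and each error type carries a single known cost. A cell with occupancy probability p is then worth treating as habitat when p × C_FN > (1 − p) × C_FP, which gives the threshold C_FP / (C_FP + C_FN). The function names are hypothetical.

```python
def habitat_threshold(cost_false_negative, cost_false_positive):
    """Probability above which treating a cell as habitat has the lower expected cost."""
    if cost_false_negative <= 0 or cost_false_positive <= 0:
        raise ValueError("costs must be positive")
    return cost_false_positive / (cost_false_positive + cost_false_negative)

def classify(probabilities, threshold):
    """Flag cells whose predicted probability meets or exceeds the threshold."""
    return [p >= threshold for p in probabilities]

# A threshold of 0.1 corresponds to valuing a missed habitat cell at
# nine times the cost of wrongly including a non-habitat cell.
threshold = habitat_threshold(cost_false_negative=9.0, cost_false_positive=1.0)
```

Read this way, a threshold is a statement about relative management costs, which is why it cannot sensibly be fixed by the modeller alone.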

Representing uncertainty

Many forms of uncertainty impact on maps of predicted probability of species occurrence (Elith et al. 2002). The prediction intervals available from regression models describe the part of this uncertainty associated with estimating model parameters, which stems from inadequate data, errors in measurement and natural variation (Elith et al. 2002). There are additional uncertainties, such as model selection uncertainty and errors in explanatory variables, that are not addressed by standard prediction intervals. While some methods have been developed for explicitly incorporating these forms of uncertainty in prediction intervals (Burnham & Anderson 2002; Wintle et al. 2003), there is currently very little guidance on how prediction intervals should be represented and interpreted in applications of habitat modelling and conservation planning.

Interpreting model evaluation statistics

The two statistics used for model evaluation in this study address different model applications. Specifically, model discrimination, quantified by the ROC area, is the degree to which the model successfully ranks presence sites higher than absence sites across the region in terms of predicted probability of presence. Models with satisfactory ROC areas will provide reliable ranking of areas in terms of habitat value. This mode of evaluation is relevant when the goal of a model is to rank or prioritize areas of interest in terms of their relative value as habitat for a species. This goal is different to that of accurately predicting the proportion of sites that are expected to be occupied at a given predicted probability of occurrence. For instance, at sites with a predicted probability of occurrence of 0.5, it is reasonable to expect that approximately 50% of such sites would contain the species. The degree to which this is true is described as model calibration (Miller et al. 1991) and is assessed using the logistic calibration equation. It is possible that a model with good discrimination can have poor calibration. It is also possible to improve a model's predictive ability by adjusting the model parameters with calibration statistics (Harrell 2001) or other shrinkage methods (Hastie et al. 2001). A model with poor calibration may still be useful for prioritizing or ranking sites in terms of habitat value, but should not be used to predict the raw probability of finding a species at a given site. The choice of whether to focus on ROC areas or calibration statistics depends on the intended application of the models.
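As an illustration of the calibration idea, the sketch below fits a logistic calibration equation of the usual form, logit(Pr[observed presence]) = a + b × logit(predicted probability), by Newton-Raphson. Under this form a perfectly calibrated model has a = 0 and b = 1, and b < 1 indicates predictions that are too extreme. The function name and the toy data are hypothetical, and predicted probabilities are assumed to lie strictly between 0 and 1.

```python
import math

def calibration_fit(y, p_hat, n_iter=25):
    """Return (intercept a, slope b) of the logistic calibration equation."""
    lp = [math.log(p / (1.0 - p)) for p in p_hat]  # logit of predictions
    a, b = 0.0, 1.0                                # start at perfect calibration
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for yi, li in zip(y, lp):
            mu = 1.0 / (1.0 + math.exp(-(a + b * li)))
            w = mu * (1.0 - mu)
            g0 += yi - mu               # score for the intercept
            g1 += (yi - mu) * li        # score for the slope
            h00 += w
            h01 += w * li
            h11 += w * li * li
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det   # Newton step: inverse information x score
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Toy example: predictions of 0.1 and 0.9 where the observed occupancy rates
# are 0.3 and 0.7. Ranking is intact but the probabilities are overconfident.
y_obs = [1] * 3 + [0] * 7 + [1] * 7 + [0] * 3
p_pred = [0.1] * 10 + [0.9] * 10
intercept, slope = calibration_fit(y_obs, p_pred)
```

In the toy example the ROC area is unaffected by the miscalibration, which is the sense in which good discrimination can coexist with poor calibration.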

The issue of occupancy versus persistence

Models of the sort discussed in this paper make the implicit assumption that occupancy of a location implies suitability of the habitat. Van Horne (1983) identified problems associated with this assumption, and Pulliam's (1988) 'source-sink' model of population dynamics formalizes its conceptual deficiencies. Pulliam's model differentiates 'sink' areas, where mortality exceeds population growth, from 'source' areas that maintain overall population size through emigration. Similarly, if the total population is in a state of decline, it is possible that 'remnant' individuals may exist in unsuitable habitat that will not be occupied in the next generation. When a population is expanding its range, observations of absence of the species may falsely imply that the suitability of the habitat is low. These anomalies arise due to what is known as the 'equilibrium' assumption, which underpins the current static approach to wildlife habitat modelling (Austin 2002). Inclusion of disturbance history variables, dispersal barriers, competition and successional dynamics may assist in modelling such situations, though examples of such approaches are rare.

Survey design and sampling issues

This paper has focused on the choice and execution of particular modelling methods given a set of data. Issues of data quality and sampling have not been addressed. Because modelling results are only as good as the available data, the issues of data quality and sampling demand some discussion here.

Sample size

From a statistical perspective, the choice of sample size is a function of the desired precision of results. In studies where the goal is to make inference about a particular effect (e.g. the influence of time since logging on the probability of greater glider occupancy), power analysis may be used to determine the sample size required to achieve a statistically significant result for a given effect size, type I error rate and sampling variance (Burgman & Lindenmayer 1998). The relationship between statistical significance, power and model predictive performance is unclear, and there is no guarantee that a model with statistically significant terms will give good predictive performance. Unfortunately, there is no simple way to determine a priori the predictive performance of a habitat model, or to calculate a sample size that will ensure a predictive performance that is statistically significantly better than that of a null or alternative model. We could find only one attempt to address this problem in the statistical literature (Obuchowski 1994), making this an important area of future research. A simulation study that provided rules of thumb about sample sizes required to achieve a given predictive performance under a range of likely modelling scenarios would be an important contribution to the habitat modelling literature.

Data quality

A more subtle consideration is the geographical spread and environmental stratification of observations compared to the extent and environmental variability of the region for which models are to be built. We briefly touch on this issue in our case study when testing the adequacy of survey data using environmental strata. Little guidance is provided in the literature as to the minimum geographical or environmental coverage of data for statistical modelling, though it is common sense to expect that model predictions are unlikely to be reliable in environmental domains for which there are no survey data (Austin & Meyers 1996). Stratified sampling designs such as 'GRADSECT' (Austin & Heyligers 1989) that ensure a geographical and bioclimatic coverage of sampling locations are therefore appealing. GRADSECT theory, or newer developments such as Generalized Dissimilarity Modelling (GDM; Ferrier et al. 2002b) or a p-median criterion (Faith & Walker 1996), may also be applied to existing data sets to identify 'gaps' in the geographical and environmental coverage of samples.

Most statistical analyses, including the regression methods used in our case study, assume that observations are independent of each other. Survey locations that are close together are much less likely to be independent, especially when surveys target animals with large home ranges. Survey design should aim to ensure a minimum separation distance that is at least as great as the home range radius of the widest ranging species in the study. Random sampling within geographical and environmental strata assists in minimizing unwanted dependencies in observation data. Guidance on sampling theory should be sought from one of the many texts that deal with the topic in detail (e.g. Sutherland 1996; Thompson 2002).
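The minimum separation rule can be checked against an existing set of survey locations. The sketch below is a simple greedy filter, offered as an illustration rather than a recommended design tool: it assumes planar coordinates in metres, keeps sites in the order supplied, and drops any site closer than the chosen distance to a site already kept. The function name and coordinates are hypothetical.

```python
import math

def thin_sites(sites, min_separation):
    """Return indices of sites kept so that all kept pairs are >= min_separation apart."""
    kept = []
    for i, (x, y) in enumerate(sites):
        far_enough = all(
            math.hypot(x - sites[j][0], y - sites[j][1]) >= min_separation
            for j in kept
        )
        if far_enough:
            kept.append(i)
    return kept

# Four sites along a transect; with a 500 m minimum separation only the
# first and third are retained.
survey_sites = [(0.0, 0.0), (100.0, 0.0), (1000.0, 0.0), (1050.0, 0.0)]
retained = thin_sites(survey_sites, min_separation=500.0)
```

Because the result depends on the order of the input, shuffling sites within strata before filtering is one way to avoid favouring the earliest records.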

Technical issues and promising advances in modelling methods

Species versus community approaches

Models of the distribution of single species are not the only way to approach conservation planning questions. Single species models are important for a subset of species, such as threatened, focal or flagship species. However, landscape planning aims to conserve the biodiversity of a region, and there are too many species and too little data to achieve this through single-species modelling. Therefore, there is a role for community-level modelling (Ferrier et al. 2002b). One example of community-level modelling is GDM, already mentioned in the context of sampling, which models compositional dissimilarity across the landscape. This could then contribute to a conservation plan that aimed to span the range of species patterns in the landscape. A further role for methods based on data from many species is to use them for modelling rare species, or species in data sets with relatively few records per species. In such situations there may be insufficient data for developing a robust single-species model. GDMs, canonical correspondence analysis (CCA) or multivariate adaptive regression splines (MARS) may be useful for this purpose.

Spatial autocorrelation

Spatial autocorrelation in wildlife observation data arises when environmental processes and patterns that influence the spatial distribution of wildlife are themselves spatially structured, and/or because the species is subject to demographic processes, territoriality or dispersal limitations causing spatial dependence (contagion or dispersion effects). Demographic and environmental processes underlying spatial patterns in wildlife distributions are usually poorly understood and therefore difficult to incorporate in model fitting. Consequently, model residuals are often spatially correlated (i.e. not independent), violating one of the basic assumptions of regression modelling. In practice, non-independence usually results in underestimation of standard errors and overestimation of the importance of habitat variables (Legendre & Fortin 1989). Methods such as autologistic regression (Augustin et al. 1996), generalized estimating equations (Albert & McShane 1995) and geographically weighted regression (Fotheringham et al. 2002) may be used to incorporate spatial autocorrelation in habitat analyses, but they are still not incorporated in standard statistical software and are technically demanding to implement.
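A simple diagnostic for spatially correlated residuals, which the paper does not itself prescribe, is Moran's I. The sketch below assumes binary neighbour weights (1 for pairs of sites within a chosen distance band, 0 otherwise) and planar coordinates; values near zero suggest little spatial structure, positive values suggest clustering of similar residuals. The function name and example data are hypothetical.

```python
import math

def morans_i(coords, values, band):
    """Moran's I with binary distance-band weights (needs at least one neighbour pair)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    cross = 0.0   # sum of w_ij * z_i * z_j over neighbouring pairs
    s0 = 0.0      # sum of all weights
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= band:
                cross += z[i] * z[j]
                s0 += 1.0
    return (n / s0) * cross / sum(zi * zi for zi in z)

# Six sites on a line, 1 unit apart; adjacent sites are neighbours.
line = [(float(k), 0.0) for k in range(6)]
clustered = morans_i(line, [1, 1, 1, -1, -1, -1], band=1.5)     # similar residuals adjacent
alternating = morans_i(line, [1, -1, 1, -1, 1, -1], band=1.5)   # dissimilar residuals adjacent
```

A markedly positive value for the residuals of a fitted habitat model would indicate that standard errors from an ordinary regression are likely to be too small.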

Model selection and model uncertainty

The standard approach to representing prediction uncertainty involves the calculation of prediction (or confidence) intervals that incorporate uncertainty about parameter estimates. However, this approach to prediction implicitly assumes that the model chosen is the best available representation of the truth and effectively ignores model selection uncertainty, resulting in overconfident predictions. Automated variable selection algorithms exacerbate this problem by providing a method for dredging through many candidate predictors in search of an explanatory model (Chatfield 1995; Hoeting et al. 1999). One solution is to completely avoid automated model selection and to analyse only a small set of 'a priori' models (Burnham & Anderson 2002). This approach may be practical when strong prior knowledge exists and where the number of plausible models is small. However, if prediction is the primary goal and many possible models exist, some level of automated variable selection is often desirable. Model averaging approaches, including Bayesian model averaging, have been promoted in a range of disciplines as a means of incorporating model selection uncertainty into statistical inference and prediction (Hoeting et al. 1999) and have been applied in some ecological examples (Burnham & Anderson 2002; Wintle et al. 2003).
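A minimal sketch of the weighting step in model averaging follows, assuming that each candidate model has been summarized by an information criterion. With AIC values the result corresponds to the Akaike weights described by Burnham and Anderson (2002); with BIC values the same formula is often used as a rough approximation to posterior model probabilities under equal prior weights. The function names and numbers are hypothetical.

```python
import math

def ic_weights(ic_values):
    """Convert information-criterion values (lower is better) to model weights."""
    best = min(ic_values)
    raw = [math.exp(-0.5 * (v - best)) for v in ic_values]
    total = sum(raw)
    return [r / total for r in raw]

def averaged_prediction(predictions, weights):
    """Weighted average of the candidate models' predictions for one site."""
    return sum(w * p for w, p in zip(weights, predictions))

# Three candidate models with criterion values 100, 102 and 110, each
# predicting a probability of occurrence at the same grid cell.
weights = ic_weights([100.0, 102.0, 110.0])
p_site = averaged_prediction([0.60, 0.45, 0.20], weights)
```

The averaged prediction sits between the individual predictions, and models with little support contribute correspondingly little to it.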

Estimating and incorporating detectability in habitat models

Unless the probability of detecting a species when it is present is equal to 1, false negative observation errors will occur in species surveys. The probability of detecting the presence of the case study species in any single standard survey based on spot-lighting and call elicitation has been found to be very low (Pr[detection | presence] ~ 0.12–0.45; Wintle et al. in press), making the reliability of absence data a potentially serious form of uncertainty in our case study. Recent studies have demonstrated the negative impact that false-negative observation error may have on species-habitat analyses (Tyre et al. 2003), meta-population models (Moilanen 2002) and monitoring studies (MacKenzie et al. 2002). Recently developed techniques for incorporating detectability in model estimation (MacKenzie et al. 2002; Tyre et al. 2003) reduce bias in model estimation brought about by false absences, though little effort has been invested in testing the relative predictive performance of such models.
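The practical consequence of low detectability can be illustrated with a short calculation. The sketch assumes, for illustration only, that repeat visits to an occupied site are independent and share a constant per-visit detection probability p, so that the chance of recording a false absence after k visits is (1 − p)^k. The function names are hypothetical.

```python
import math

def false_absence_probability(p_detect, n_visits):
    """Chance that an occupied site yields no detections in n_visits surveys."""
    return (1.0 - p_detect) ** n_visits

def visits_needed(p_detect, max_false_absence=0.05):
    """Smallest number of visits keeping the false-absence chance at or below the target."""
    if not 0.0 < p_detect < 1.0:
        raise ValueError("p_detect must lie strictly between 0 and 1")
    return math.ceil(math.log(max_false_absence) / math.log(1.0 - p_detect))

# Across the per-visit detection range quoted above (0.12 to 0.45):
visits_low = visits_needed(0.12)    # 24 visits
visits_high = visits_needed(0.45)   # 6 visits
```

Under these assumptions a single visit to an occupied site misses the species between 55% and 88% of the time, which is why multiple site visits matter for the reliability of absence records.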

Incorporating biotic interactions

Despite the prevalence of interspecific competition as a key concept in community ecology (Diamond 1975), very little has been attempted by way of incorporating interspecific competition in distribution models (for one exception see Leathwick & Austin 2001). Regression models such as those described by Leathwick and Austin (2001) could be applied to fauna species. Future development of 'loop analysis' (Levins 1975) or systems of simultaneous regressions (Guisan & Zimmerman 2000) may facilitate spatial models of wildlife distributions that incorporate competitive interactions.

The GLMs and GAMs presented here are relatively simple, though suited to the conservation problem and available data. There is scope for greater sophistication in the modelling method where a high level of modelling expertise exists, though we believe our recommended approach represents a reasonable trade-off between practicality and rigour.

Model improvement and adaptive management

Due to environmental change, natural and anthropogenic disturbance, and population stochasticity, all predictive models will become redundant if they are not updated through time. We recommend that models be applied in an adaptive framework (Walters & Holling 1990), allowing immediate application of models in management but ensuring a commitment to continual improvement of models through future data collections and refinements to modelling methods.

Future data collection will be central to rigorous model testing and model refinement. It should be targeted to fill gaps in current data sets (Fig. 3). Particular attention should be paid to the collection of high quality observation data based on multiple site visits aimed at minimizing false absences (Wintle et al. 2004). Some important environmental variables, including those based on forest growth stage data, were not available for model fitting in the LHCC region. It is likely that models would have better predictive accuracy if forest growth stage information became available and was incorporated in models.

CONCLUSION

Habitat models are now a widely used tool in public land conservation planning. Systematic conservation planning in the urban fringe will require such tools to ensure transparency and repeatability of planning outcomes. However, access to statistical and ecological expertise for urban conservation planning is likely to be limited when compared with the public land planning processes that have taken place in Australia. It is therefore important that we continue to develop and refine planning tools, including habitat modelling methods, with an emphasis on statistical and ecological rigour and simplicity.

ACKNOWLEDGEMENTS

This work was instigated by the LHCCREMS organization (now Hunter Councils). Steven House (EcoLogical), Michael Murray (Forest Fauna Surveys P/L), Philip Gleeson (NPWS), Daniel Connolly (NPWS), Jill Smith (NPWS), Peter Bone (NPWS), Peter Ewin (NPWS), Mick Andren (NPWS) and Rod Kavanagh (SFNSW) provided data, expert guidance on model construction and valuable insights. Mark Burgman and Kuniko Yamada (University of Melbourne) helped throughout the project. Comments from Terry Walshe (University of Western Australia), David Tierney and Robbie Economos improved the manuscript substantially. Discussions with Trevor Hastie and John Leathwick helped clarify bootstrapping issues. Two anonymous reviewers provided constructive feedback. Sophie Powrie, Kirsty Winter and Meredith Liang (LHCCREMS) provided excellent support and useful feedback throughout the project.

REFERENCES

Agresti A. (1996) An Introduction to Categorical Data Analysis. John Wiley and Sons, New York.

Akaike H. (1973) Information theory and an extension of the maximum likelihood principle. In: Proceedings of the 2nd International Symposium on Information Theory (eds B. N. Petrov & F. Csáki) pp. 267–81. Akadémiai Kiadó, Budapest.

Albert P. S. & McShane L. M. (1995) A generalized estimating equations approach for spatially correlated data: applications to the analysis of neuroimaging data. Biometrics 51, 627–38.

Augustin N. H., Mugglestone M. A. & Buckland S. T. (1996) An autologistic model for the spatial distribution of wildlife. J. Appl. Ecol. 33, 339–47.

Austin M. P. (1991) Vegetation: data collection and analysis. In: Nature Conservation: Cost Effective Biological Surveys and Data Analysis (eds C. R. Margules & M. P. Austin) pp. 37–41. CSIRO, Canberra.

Austin M. P. (2002) Spatial prediction of species distribution: an interface between ecological theory and statistical modelling. Ecol. Model. 157, 101–18.

Austin M. P. & Heyligers P. C. (1989) Vegetation survey design for conservation: GRADSECT sampling of forests in north-eastern NSW. Biol. Conserv. 50, 13–32.

Austin M. P. & Meyers J. A. (1996) Current approaches to modelling the environmental niche of Eucalypts: implications for management of forest biodiversity. For. Ecol. Manage. 85, 95–106.

Bambar D. (1975) The area above the ordinal dominance graph and the area below the receiver operating characteristics graph. J. Math. Psych. 12, 387–415.

Belsley D. A., Kuh E. & Welsch R. E. (1980) Regression Diagnostics. John Wiley and Sons, New York.

Bio A. M. F., Alkemande R. & Barendregt A. (1998) Determining alternative models for vegetation response analysis – a non-parametric approach. J. Veg. Sci. 9, 5–16.

Breiman L., Friedman J. H., Olshen R. A. & Stone C. J. (1984) Classification and Regression Trees. Wadsworth International Group, Belmont.

Breininger D. R., Larson V. L., Duncan B. W. & Smith R. B. (1998) Linking habitat suitability to demographic success in Florida scrub-jays. Wild. Bull. 26, 118–28.

Burgman M. A. & Lindenmayer D. B. (1998) Conservation Biology for the Australian Environment. Surrey Beatty and Sons, Chipping Norton.

Burnham K. P. & Anderson D. R. (2002) Model Selection and Multimodel Inference: a Practical Information-Theoretic Approach. Springer, New York.

Burgman M. A., Breininger D. R., Duncan B. W. & Ferson S. (2001) Setting reliability bounds on Habitat Suitability Indices. Ecol. Appl. 11, 70–8.

Carpenter G., Gillison A. N. & Winter J. (1993) DOMAIN: a flexible modelling procedure for mapping potential distributions of plants and animals. Biodivers. Conserv. 2, 667–80.

Chatfield C. (1995) Model uncertainty, data mining and statistical inference. J. Roy. Stat. Soc. Ser. A Stat. 158, 419–66.

Commonwealth of Australia (2003) Terms of Reference: Standing Committee on Environment and Heritage Inquiry into Sustainable Cities. Parliament of Australia, House of Representatives, Canberra.

Derksen S. & Keselman H. J. (1992) Backward, forward and stepwise automated subset selection algorithms: frequency of obtaining authentic and noise variables. Br. J. Math. Stat. Psych. 45, 265–82.

Diamond J. M. (1975) Assembly of species communities. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 342–444. Harvard University Press, Cambridge.

Ecotone Ecological Consultants (2001) Lower Hunter Central Coast Regional Biodiversity Strategy: Fauna survey and mapping project – Module 1: Fauna Surveys. Ecotone Ecological Consultants for LHCCREMS, Waratah, pp. 43.

Edgar R. & Belcher C. (1995) Spotted-tailed Quoll. In: The Mammals of Australia (ed. R. Strahan) pp. 67–8. Reed Books, Chatswood.

Efron B. & Tibshirani R. (1997) Improvements on cross-validation: the .632+ bootstrap method. J. Am. Stat. Ass. 92, 548–60.

Elith J. (2000) Quantitative methods for modelling species habitat: comparative performance and an application to Australian plants. In: Quantitative Methods for Conservation Biology (eds S. Ferson & M. A. Burgman) pp. 39–58. Springer-Verlag, New York.

Elith J. & Burgman M. A. (2002) Predictions and their validation: rare plants in the Central Highlands, Victoria, Australia. In: Predicting Species Occurrences: Issues of Accuracy and Scale (eds J. M. Scott, P. J. Heglund, M. L. Morrison, M. G. Raphael, W. A. Wall & F. B. Samson) pp. 303–14. Island Press, Covelo.

Elith J. & Burgman M. A. (2003) Habitat models for PVA. In: Population Viability in Plants (eds C. A. Brigham & M. W. Schwartz) pp. 203–35. Springer-Verlag, New York.

Elith J., Burgman M. A. & Regan H. M. (2002) Mapping epistemic uncertainties and vague concepts in predictions of species distribution. Ecol. Model. 157, 313–29.


Ennis M., Hinton G., Naylor D., Revow M. & Tibshirani R. (1998) A comparison of statistical learning methods on the GUSTO database. Stat. Med. 17, 2501–8.

ESRI (1997) Arcinfo 7.2. Environmental Systems Research Institute Inc., Redlands.

Faith D. P. & Walker P. A. (1996) Environmental diversity: on the best-possible use of surrogate data for assessing the relative biodiversity of sets of areas. Biodivers. Conserv. 5, 399–415.

Ferrier S. & Watson G. (1997) An Evaluation of the Effectiveness of Environmental Surrogates and Modelling Techniques in Predicting the Distribution of Biological Diversity. Department of Environment, Sports and Territories, Commonwealth of Australia and NSW National Parks and Wildlife Service, Canberra.

Ferrier S., Watson G., Pearce J. & Drielsma M. (2002a) Extended statistical approaches to modelling spatial pattern in biodiversity in north-east New South Wales. I. Species-level modelling. Biodivers. Conserv. 11, 2275–307.

Ferrier S., Drielsma M., Manion G. & Watson G. (2002b) Extended statistical approaches to modelling spatial pattern in biodiversity: the north-east New South Wales experience. II. Community-level modelling. Biodivers. Conserv. 11, 2309–38.

Fielding A. H. & Bell J. F. (1997) A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ. Conserv. 24, 38–49.

Fotheringham A. S., Brunsdon C. & Charlton M. (2002) Geographically Weighted Regression: the Analysis of Spatially Varying Relationships. John Wiley and Sons, London.

Franklin J. (1995) Predictive vegetation mapping: geographic modeling of biospatial patterns in relation to environmental gradients. Prog. Phys. Geog. 19, 494–519.

Friedman J. H. (1991) Multivariate adaptive regression splines (with discussion). Ann. Stat. 19, 1–141.

Goldingay R. L. & Kavanagh R. P. (1993) Home-range estimates and habitat of the yellow-bellied glider (Petaurus australis) at Waratah Creek, New-South-Wales. Wildl. Res. 20, 387–404.

Graham C. H., Ferrier S., Huettman F., Moritz C. & Peterson A. T. (2004) New developments in museum-based informatics and applications in biodiversity analysis. Trends Ecol. Evol. 19, 497–503.

Guisan A. & Harrell F. E. (2000) Ordinal response regression models in ecology. J. Veg. Sci. 11, 617–26.

Guisan A. & Zimmerman N. E. (2000) Predictive habitat distribution models in ecology. Ecol. Model. 135, 147–86.

Hanley J. A. & McNeil B. J. (1982) The meaning and use of the area under a Receiver Operating Characteristic (ROC) curve. Radiology 143, 29–36.

Harrell F. E. (2001) Regression Modelling Strategies with Application to Linear Models, Logistic Regression and Survival Analysis. Springer, New York.

Harrell F. E., Lee K. L. & Mark D. B. (1996) Multivariate prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat. Med. 15, 361–87.

Hastie T. & Tibshirani R. (1990) Generalized additive models. Monographs on statistics and applied probability (eds D. R. Cox, D. V. Hinkley, D. Rubin & B. W. Silverman). Chapman & Hall, London.

Hastie T., Tibshirani R. & Friedman J. H. (2001) The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer-Verlag, New York.

Hirzel A. H. (2001) When GIS come to life. Linking Landscape- and Population Ecology for Large Population Management Modelling: the Case of Ibex (Capra Ibex) in Switzerland. The University of Lausanne, Lausanne.

Hoeting J. A., Madigan D., Raftery A. E. & Volinsky C. T. (1999) Bayesian model averaging: a tutorial. Stat. Sci. 14, 382–401.

Houlder D. J., Hutchinson M. F., Nix H. A. & McMahon J. P. (1999) ANUCLIM User Guide, Version 5.0. Centre for Resource and Environmental Studies, Australian National University, Canberra.

Kadmon R., Farber O. & Danin A. (2003) A systematic analysis of factors affecting the performance of climatic envelope models. Ecol. Appl. 13, 853–67.

Kavanagh R. P. (2002) Conservation and management of large forest owls in southeastern Australia. In: The Ecology and Conservation of Owls (eds I. Newton, R. Kavanagh, J. Olson & I. Taylor) pp. 201–19. CSIRO, Melbourne.

Lambeck R. J. (1997) Focal species: a multi-species umbrella for nature conservation. Conserv. Biol. 11, 849–56.

Leathwick J. R. & Austin M. P. (2001) Competitive interactions between tree species in New Zealand's old-growth indigenous forests. Ecology 82, 2560–73.

Legendre P. & Fortin M. J. (1989) Spatial pattern and ecological analysis. Vegetatio 80, 107–38.

Legendre L. & Legendre P. (1998) Numerical Ecology. Elsevier, New York.

Levins R. (1975) Evolution of communities near equilibrium. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 16–50. Harvard University Press, Cambridge.

LHCCREMS (2004) Lower Hunter Central Coast Regional Biodiversity Strategy 2004 (4 Volume). Hunter Councils on behalf of the Lower Hunter Central Coast Regional Environmental Strategy, Thornton.

Li W., Wang Z., Ma Z. & Tang H. (1999) Designing the core zone in a biosphere reserve based on suitable habitats: Yancheng Biosphere Reserve and the red crowned crane (Grus japonensis). Biol. Conserv. 90, 167–73.

Loyn R. H., McNabb E. G., Volodina L. & Willig R. (2001) Modelling landscape distributions of large forest owls as applied to managing forests in north-east Victoria, Australia. Biol. Conserv. 97, 361–76.

McCullagh P amp Nelder J A (1989) Generalized Linear ModelsChapman amp Hall London

MacKenzie D I Nichols J D Lachman G B Droege SRoyle J A amp Langtimm C A (2002) Estimating site occu-pancy rates when detection probabilities are less than oneEcology 83 2248ndash55

Manel S Dias J M Buckton S T amp Ormerod S J (1999a)Alternative methods for predicting species distribution anillustration with Himalayan river birds J Appl Ecol 36734ndash47

Manel S Dias J M amp Ormerod S J (1999b) Comparingdiscriminant analysis neural networks and logistic regres-sion for predicting species distributions a case study witha Himalayan river bird Ecol Model 120 337ndash47

Miller M E Hui S L amp Tierney W M (1991) Validationtechniques for logistic regression models Stat Med 101213ndash26

Moilanen A (2002) Implications of empirical data quality tometapopulation model parameter estimation and applica-tion Oikos 96 516ndash30

Moisen G G amp Frescino T S (2002) Comparing five modelingtechniques for predicting forest characteristics Ecol Model157 209ndash25

National Parks & Wildlife Service (1998) Eden Fauna Modelling. New South Wales National Parks & Wildlife Service, New South Wales Regional Forest Agreement Steering Committee, Canberra.

National Parks and Wildlife Service (2000) Modelling Areas of Habitat Significance for Fauna and Flora in the Southern CRA. NPWS NSW, Canberra.

Nix H. (1986) A biogeographic analysis of Australian elapid snakes. In: Atlas of Elapid Snakes of Australia (ed. R. Longmore) pp. 4–15. Australian Government Publishing Service, Canberra.

Obuchowski N. A. (1994) Computing sample size for receiver operating characteristic studies. Invest. Radiol. 29, 238–43.

Pearce J. & Ferrier S. (2000) Evaluating the predictive performance of habitat models developed using logistic regression. Ecol. Model. 133, 225–45.

Pearce J. & Ferrier S. (2001) The practical value of modelling relative abundance of species for regional conservation planning: a case study. Biol. Conserv. 98, 33–43.

Pearce J., Ferrier S. & Scotts D. (2001a) An evaluation of the predictive performance of distributional models for flora and fauna in north-east New South Wales. J. Environ. Manage. 62, 171–84.

Pearce J. L., Cherry K., Drielsma M., Ferrier S. & Whish G. (2001b) Incorporating expert knowledge and fine-scale vegetation mapping into statistical modelling of faunal distribution. J. Appl. Ecol. 38, 412–24.

Phillips S. J., Anderson R. P. & Schapire R. E. (in press) Maximum entropy modeling of species geographic distributions. Ecol. Model.

Pressey R. L., Possingham H. P., Logan V. S., Day J. R. & Williams P. H. (1999) Effects of data characteristics on the results of reserve selection algorithms. J. Biogeogr. 26, 179–91.

Pulliam H. R. (1988) Sources, sinks and population regulation. Am. Nat. 132, 652–61.

R Development Core Team (2004) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org

Reading R. P., Clark T. A., Seebeck J. H. & Pearce J. (1996) Habitat Suitability Index model for the eastern barred bandicoot, Perameles gunnii. Wildl. Res. 23, 221–35.

Reed P. C. & Lunney D. (1990) Habitat loss: the key problem for the long-term survival of koalas in New South Wales. In: Koala Summit: Managing Koalas in New South Wales (eds D. Lunney, C. A. Urquhart & P. C. Reed) pp. 9–31. University of Sydney, Sydney.

Ripley B. D. (1995) Pattern Recognition and Neural Networks – a Statistical Approach. Cambridge University Press, Cambridge.

Russel R. (1995) Yellow-bellied glider. In: The Mammals of Australia (ed. R. Strahan) pp. 226–8. Reed Books, Chatswood.

Steyerberg E. W., Eijkemans M. J. C., Harrell F. E. & Habbema J. D. F. (2001a) Prognostic modeling with logistic regression analysis: in search of a sensible strategy in small data sets. Med. Decis. Making 21, 45–56.

Steyerberg E. W., Harrell F. E., Borsboom G. J. J. M., Eijkemans M. J. C., Vergouwe Y. & Habbema J. D. F. (2001b) Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J. Clin. Epidemiol. 54, 774–81.

Stockwell D. & Peters D. (1999) The GARP modelling system: problems and solutions to automated spatial prediction. Int. J. Geogr. Inf. Sci. 13, 143–58.

Sutherland W. J. (1996) Ecological Census Techniques: a Handbook, 1st edn. Cambridge University Press, Cambridge.

ter Braak C. J. F. (1986) Canonical Correspondence Analysis: a new eigenvector technique for multivariate direct gradient analysis. Ecology 67, 1167–79.

Thompson S. K. (2002) Sampling, 2nd edn. Wiley, New York.

Tyre A. J., Tenhumberg B., Field S. A., Possingham H. P., Niejalke D. & Parris K. (2003) Improving precision and reducing bias in biological surveys by estimating false negative error rates in presence-absence data. Ecol. Appl. 13, 1790–801.

Van Horne B. (1983) Density as a misleading indicator of habitat quality. J. Wildl. Man. 47, 893–901.

Van Horne B. & Wiens J. A. (1991) Forest Bird Habitat Suitability Models and the Development of General Habitat Models. United States Department of the Interior, Fish and Wildlife Service, Washington, DC.

Venables W. N. & Ripley B. D. (2003) Modern Applied Statistics with S, 4th edn. Springer, New York.

Walters C. & Holling C. S. (1990) Large-scale management experiments and learning by doing. Ecology 71, 2060–8.

Wintle B. A., McCarthy M. A., Volinsky C. T. & Kavanagh R. P. (2003) The use of Bayesian Model Averaging to better represent the uncertainty in ecological models. Conserv. Biol. 17, 1579–90.

Wintle B. A., Elith R. J., Yamada K. & Burgman M. A. (2004) LHCCREMS fauna survey and mapping project. Module 2: Habitat modelling and conservation requirements. Lower Hunter & Central Coast Regional Environmental Management Strategy, Callaghan.

Wintle B. A., McCarthy M. A., Parris K. M. & Burgman M. A. (2004) Precision and bias of methods for estimating point survey detection probabilities. Ecol. Appl. 14, 703–12.

Wilson K. A., Westphal M. I., Possingham H. P. & Elith J. (2005) Sensitivity of conservation planning to different approaches to using species distribution data. Biol. Conserv. 122, 99–112.

Wintle B. A., Burgman M. A., McCarthy M. A. & Kavanagh R. P. (in press) The magnitude and management consequences of false negative observation error in surveys of arboreal marsupials and large forest owls. J. Wildl. Man.

Worton B. J. (1989) Kernel methods for estimating the utilization distribution in home-range studies. Ecology 70, 164–8.

Yencken D. & Wilkinson D. (2000) Resetting the Compass: Australia’s Journey Towards Sustainability. CSIRO Publishing, Melbourne.

Zaniewski A. E., Lehmann A. & Overton J. M. (2002) Predicting species distribution using presence-only data: a case study of native New Zealand ferns. Ecol. Model. 157, 261–80.

APPENDIX 1

Method for 0.632+ bootstrap evaluation, after Harrell et al. (1996) and Efron and Tibshirani (1997).

1. Develop model on all n observations.
2. Calculate the statistic(s) of choice for evaluation on the same data (i.e. the training data) – call this Stat_app because it is the apparent value of the statistic.
3. Take a bootstrap sample, i.e. a sample of size n, with replacement, of rows of the data matrix. Keep track of which sites are in the bootstrap sample and which are excluded.
4. Fit the model on the bootstrap sample (using the same methods as used on the full set).
5. Compute the statistic on the bootstrap data set (observations vs. fitted values) and call it Stat_boot.
6. Also compute the statistic on a version of the bootstrap data where the observations are randomized (Stat_permute).
7. Use the bootstrap model to predict to the excluded data set; calculate the statistic on these predictions, Stat_excl.
8. Use Stat_boot, Stat_permute and Stat_excl to calculate the amount of overfitting, the relative overfitting rate and weights that are then used to make a best estimate of predictive performance, Stat_best_est. This statistic puts most emphasis on predictions to the excluded data, particularly when the model is overfitted (i.e. when Stat_boot – Stat_excl is large). For details of this step see Steyerberg et al. (2001b).
9. Measure how optimistic the fit on the bootstrap sample was: O = Stat_boot – Stat_best_est.
10. Repeat steps 3–7 100–200 times.
11. Calculate an average optimism, Ō.
12. Use Ō to correct Stat_app for its optimism: Stat_app – Ō. This is a near unbiased estimate of the expected value of the external predictive performance of the process that generated Stat_app.
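
The weighting in steps 8, 9, 11 and 12 can be sketched in a few lines of Python. The appendix defers the exact formulae to Steyerberg et al. (2001b), so the version below is an assumption: it borrows the weight form of the .632+ estimator (weight = 0.632 / (1 – 0.368 × relative overfitting rate)) and applies it to a statistic for which larger values are better, such as ROC area. The function names are hypothetical, and optimism is computed per bootstrap replicate, which the appendix does not specify.

```python
def best_estimate(stat_boot, stat_excl, stat_permute):
    """Best estimate of predictive performance for one bootstrap replicate.

    Assumption: larger values of the statistic mean better performance.
    """
    overfit = stat_boot - stat_excl      # amount of overfitting
    span = stat_boot - stat_permute      # gap to the no-information value
    if overfit <= 0 or span <= 0:
        rate = 0.0                       # no evidence of overfitting
    else:
        rate = min(overfit / span, 1.0)  # relative overfitting rate in [0, 1]
    weight = 0.632 / (1.0 - 0.368 * rate)  # rises from 0.632 towards 1
    return (1.0 - weight) * stat_boot + weight * stat_excl


def corrected_statistic(stat_app, replicates):
    """Subtract the average optimism from the apparent statistic.

    replicates: list of (stat_boot, stat_excl, stat_permute) tuples,
    one per bootstrap sample.
    """
    optimism = [boot - best_estimate(boot, excl, perm)
                for boot, excl, perm in replicates]
    mean_optimism = sum(optimism) / len(optimism)
    return stat_app - mean_optimism
```

With this form, a replicate that shows no overfitting weights the excluded-data statistic by 0.632, and a replicate whose excluded-data performance is no better than the permuted data relies on the excluded-data statistic alone.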

for a given effect size, type I error rate and sampling variance (Burgman & Lindenmayer 1998). The relationship between statistical significance, power and model predictive performance is unclear, and there is no guarantee that a model with statistically significant terms will give good predictive performance. Unfortunately, there is no simple way to determine a priori the predictive performance of a habitat model, or to calculate a sample size that will ensure a predictive performance that is statistically significantly better than that of a null or alternative model. We could find only one attempt to address this problem in the statistical literature (Obuchowski 1994), making this an important area of future research. A simulation study that provided rules of thumb about the sample sizes required to achieve a given predictive performance under a range of likely modelling scenarios would be an important contribution to the habitat modelling literature.

Data quality

A more subtle consideration is the geographical spread and environmental stratification of observations compared to the extent and environmental variability of the region for which models are to be built. We briefly touch on this issue in our case study when testing the adequacy of survey data using environmental strata. Little guidance is provided in the literature as to the minimum geographical or environmental coverage of data for statistical modelling, though it is common sense to expect that model predictions are unlikely to be reliable in environmental domains for which there are no survey data (Austin & Meyers 1996). Stratified sampling designs such as ‘GRADSECT’ (Austin & Heyligers 1989) that ensure a geographical and bioclimatic coverage of sampling locations are therefore appealing. GRADSECT theory, or newer developments such as Generalized Dissimilarity Modelling (GDM; Ferrier et al. 2002b) or a p-median criterion (Faith & Walker 1996), may also be applied to existing data sets to identify ‘gaps’ in the geographical and environmental coverage of samples.
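
As a rough illustration of gap identification, the sketch below bins each environmental variable into fixed-width classes and lists the strata that occur in the region but contain no survey sites. This is plain binning, not GRADSECT, GDM or the p-median criterion; the function names, variables and bin widths are hypothetical choices made for the example.

```python
def stratum(values, bin_widths):
    """Map a tuple of environmental values to a tuple of bin indices."""
    return tuple(int(v // w) for v, w in zip(values, bin_widths))


def coverage_gaps(region_cells, survey_sites, bin_widths):
    """Return strata present in the region but absent from the surveys."""
    region_strata = {stratum(c, bin_widths) for c in region_cells}
    surveyed_strata = {stratum(s, bin_widths) for s in survey_sites}
    return sorted(region_strata - surveyed_strata)


# Hypothetical data: (annual rainfall in mm, mean temperature in degrees C)
region = [(850, 17.5), (1250, 15.0), (1650, 12.5)]
surveys = [(900, 16.8), (1300, 15.9)]
gaps = coverage_gaps(region, surveys, bin_widths=(200, 2))
print(gaps)  # the wet, cool stratum has no survey sites
```

Strata returned by `coverage_gaps` are candidates for additional survey effort, or for flagging model predictions as extrapolations.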

Most statistical analyses, including the regression methods used in our case study, assume that observations are independent of each other. Survey locations that are close together are much less likely to be independent, especially when surveys target animals with large home ranges. Survey design should aim to ensure a minimum separation distance that is at least as great as the home range radius of the widest ranging species in the study. Random sampling within geographical and environmental strata assists in minimizing unwanted dependencies in observation data. Guidance on sampling theory should be sought from one of the many texts that deal with the topic in detail (e.g. Sutherland 1996; Thompson 2002).
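
One simple way to check or enforce a minimum separation distance on an existing list of candidate sites is greedy thinning, sketched below. The function name is hypothetical, coordinates are assumed to be projected (for example metres), and the result depends on the order of the input list, so this is an illustration and not a substitute for a properly randomized design.

```python
import math


def thin_sites(sites, min_separation):
    """Keep each site only if it is at least min_separation from all kept sites.

    sites: list of (x, y) coordinates in the same units as min_separation.
    """
    kept = []
    for x, y in sites:
        if all(math.hypot(x - kx, y - ky) >= min_separation for kx, ky in kept):
            kept.append((x, y))
    return kept


# Hypothetical candidate sites (metres) and a 1000 m home range radius
candidates = [(0, 0), (300, 0), (1000, 0), (1000, 400), (2500, 0)]
selected = thin_sites(candidates, min_separation=1000)
```

Shuffling the candidate list before thinning would avoid always favouring sites that happen to be listed first.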

Technical issues and promising advances in modelling methods

Species versus community approaches

Models of the distribution of single species are not the only way to approach conservation planning questions. Single species models are important for a subset of species, such as threatened, focal or flagship species. However, landscape planning aims to conserve the biodiversity of a region, and there are too many species and too little data to achieve this through single-species modelling. Therefore, there is a role for community-level modelling (Ferrier et al. 2002b). One example of community-level modelling is GDM, already mentioned in the context of sampling; this models compositional dissimilarity across the landscape. This could then contribute to a conservation plan that aimed to span the range of species patterns in the landscape. A further role for methods based on data from many species is to use them for modelling rare species, or species in data sets with relatively few records per species. In such situations there may be insufficient data for developing a robust single-species model. GDMs, canonical correspondence analysis (CCA) or multivariate adaptive regression splines (MARS) may be useful for this purpose.

Spatial autocorrelation

Spatial autocorrelation in wildlife observation data arises when environmental processes and patterns that influence the spatial distribution of wildlife are themselves spatially structured and/or because the species is subject to demographic processes, territoriality or dispersal limitations causing spatial dependence (contagion or dispersion effects). Demographic and environmental processes underlying spatial patterns in wildlife distributions are usually poorly understood and therefore difficult to incorporate in model fitting. Consequently, model residuals are often spatially correlated (i.e. not independent), violating one of the basic assumptions of regression modelling. In practice, non-independence usually results in underestimation of standard errors and overestimation of the importance of habitat variables (Legendre & Fortin 1989). Methods such as autologistic regression (Augustin et al. 1996), generalized estimating equations (Albert & McShane 1995) and geographically weighted regression (Fotheringham et al. 2002) may be used to incorporate spatial autocorrelation in habitat analyses, but they are still not incorporated in standard statistical software and are technically demanding to implement.
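
Before turning to those methods, it can be useful to check whether model residuals are in fact spatially correlated. The sketch below computes Moran's I, a standard index of spatial autocorrelation that is not named in the text above, using binary neighbour weights. The function name and the choice of a fixed neighbourhood distance are assumptions made for illustration.

```python
import math


def morans_i(coords, values, max_distance):
    """Moran's I with weight 1 for pairs within max_distance, else 0."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0    # sum over neighbour pairs of dev_i * dev_j
    total_w = 0.0  # sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = math.hypot(coords[i][0] - coords[j][0],
                              coords[i][1] - coords[j][1])
            if dist <= max_distance:
                cross += dev[i] * dev[j]
                total_w += 1.0
    denom = sum(x * x for x in dev)
    if total_w == 0 or denom == 0:
        raise ValueError("no neighbour pairs, or values have zero variance")
    return (n / total_w) * (cross / denom)


# Hypothetical residuals at four sites along a transect
sites = [(0, 0), (1, 0), (2, 0), (3, 0)]
clustered = morans_i(sites, [1, 1, -1, -1], max_distance=1)
alternating = morans_i(sites, [1, -1, 1, -1], max_distance=1)
```

Values well above zero suggest that neighbouring residuals are similar (contagion), and values well below zero suggest dispersion; a permutation test would be needed to judge significance.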

Model selection and model uncertainty

The standard approach to representing prediction uncertainty involves the calculation of prediction (or confidence) intervals that incorporate uncertainty about parameter estimates. However, this approach to prediction implicitly assumes that the model chosen is the best available representation of the truth and effectively ignores model selection uncertainty, resulting in overconfident predictions. Automated variable selection algorithms exacerbate this problem by providing a method for dredging through many candidate predictors in search of an explanatory model (Chatfield 1995; Hoeting et al. 1999). One solution is to completely avoid automated model selection and to analyse only a small set of ‘a priori’ models (Burnham & Anderson 2002). This approach may be practical when strong prior knowledge exists and where the number of plausible models is small. However, if prediction is the primary goal and many possible models exist, some level of automated variable selection is often desirable. Model averaging approaches, including Bayesian model averaging, have been promoted in a range of disciplines as a means of incorporating model selection uncertainty into statistical inference and prediction (Hoeting et al. 1999) and have been applied in some ecological examples (Burnham & Anderson 2002; Wintle et al. 2003).
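
The idea behind model averaging can be shown with a small sketch. It uses a common shortcut in which posterior model probabilities are approximated from BIC values under equal prior model probabilities; this is an assumption of the example and not necessarily the procedure used in the studies cited above. The function names and the numbers are hypothetical.

```python
import math


def bic_weights(bics):
    """Approximate posterior model probabilities from BIC values."""
    best = min(bics)
    raw = [math.exp(-0.5 * (b - best)) for b in bics]  # relative support
    total = sum(raw)
    return [r / total for r in raw]


def averaged_prediction(predictions, weights):
    """Weighted average of each model's predicted probability at one site."""
    return sum(p * w for p, w in zip(predictions, weights))


# Two hypothetical candidate models predicting occupancy at one site
weights = bic_weights([100.0, 102.0])
p_avg = averaged_prediction([0.8, 0.4], weights)
```

The averaged prediction sits between the two single-model predictions, pulled towards the model with the lower BIC, so a planner is not forced to rely on one selected model.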

Estimating and incorporating detectability in habitat models

Unless the probability of detecting a species when it is present is equal to 1, false negative observation errors will occur in species surveys. The probability of detecting the presence of the case study species in any single standard survey based on spot-lighting and call elicitation has been found to be very low (Pr[detection | presence] ∼ 0.12–0.45; Wintle et al. in press), making the reliability of absence data a potentially serious form of uncertainty in our case study. Recent studies have demonstrated the negative impact that false-negative observation error may have on species-habitat analyses (Tyre et al. 2003), meta-population models (Moilanen 2002) and monitoring studies (MacKenzie et al. 2002). Recently developed techniques for incorporating detectability in model estimation (MacKenzie et al. 2002; Tyre et al. 2003) reduce bias in model estimation brought about by false absences, though little effort has been invested in testing the relative predictive performance of such models.
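
The practical consequence of low single-visit detectability can be seen with a short calculation. The sketch assumes that visits are independent and that the detection probability is the same on every visit; both are simplifications, and the function names are hypothetical. It asks how many visits are needed before the chance of missing a species that is present falls below a chosen level.

```python
import math


def prob_detect_at_least_once(p, visits):
    """Probability of one or more detections in a number of visits."""
    return 1.0 - (1.0 - p) ** visits


def visits_needed(p, confidence):
    """Smallest number of visits giving at least `confidence` of detection."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


# End points of the range of single-visit detection probabilities quoted above
visits_high_p = visits_needed(0.45, 0.95)
visits_low_p = visits_needed(0.12, 0.95)
```

Under these assumptions a species detected with probability 0.45 per visit needs 6 visits for 95% confidence of detection at an occupied site, and one detected with probability 0.12 needs 24, which illustrates why single-visit absences are unreliable.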

Incorporating biotic interactions

Despite the prevalence of interspecific competition as a key concept in community ecology (Diamond 1975), very little has been attempted by way of incorporating interspecific competition in distribution models (for one exception see Leathwick & Austin 2001). Regression models such as those described by Leathwick and Austin (2001) could be applied to fauna species. Future development of ‘loop analysis’ (Levins 1975) or systems of simultaneous regressions (Guisan & Zimmerman 2000) may facilitate spatial models of wildlife distributions that incorporate competitive interactions.

The GLMs and GAMs presented here are relatively simple, though suited to the conservation problem and available data. There is scope for greater sophistication in the modelling method where a high level of modelling expertise exists, though we believe our recommended approach represents a reasonable trade-off between practicality and rigour.

Model improvement and adaptive management

Due to environmental change, natural and anthropogenic disturbance, and population stochasticity, all predictive models will become redundant if they are not updated through time. We recommend that models be applied in an adaptive framework (Walters & Holling 1990), allowing immediate application of models in management but ensuring a commitment to continual improvement of models through future data collections and refinements to modelling methods.

Future data collection will be central to rigorous model testing and model refinement. New surveys should be targeted to fill gaps in current data sets (Fig. 3). Particular attention should be paid to the collection of high quality observation data based on multiple site visits aimed at minimizing false absences (Wintle et al. 2004). Some important environmental variables, including those based on forest growth stage data, were not available for model fitting in the LHCC region. It is likely that models would have better predictive accuracy if forest growth stage information became available and was incorporated in models.

CONCLUSION

Habitat models are now a widely used tool in public land conservation planning. Systematic conservation planning in the urban fringe will require such tools to ensure transparency and repeatability of planning outcomes. However, access to statistical and ecological expertise for urban conservation planning is likely to be limited when compared with the public land planning processes that have taken place in Australia. It is therefore important that we continue to develop and refine planning tools, including habitat modelling methods, with an emphasis on statistical and ecological rigour and simplicity.

ACKNOWLEDGEMENTS

This work was instigated by the LHCCREMS organization (now Hunter Councils). Steven House (EcoLogical), Michael Murray (Forest Fauna Surveys P/L), Philip Gleeson (NPWS), Daniel Connolly (NPWS), Jill Smith (NPWS), Peter Bone (NPWS), Peter Ewin (NPWS), Mick Andren (NPWS) and Rod Kavanagh (SFNSW) provided data, expert guidance on model construction and valuable insights. Mark Burgman and Kuniko Yamada (University of Melbourne) helped throughout the project. Comments from Terry Walshe (University of Western Australia), David Tierney and Robbie Economos improved the manuscript substantially. Discussions with Trevor Hastie and John Leathwick helped clarify bootstrapping issues. Two anonymous reviewers provided constructive feedback. Sophie Powrie, Kirsty Winter and Meredith Liang (LHCCREMS) provided excellent support and useful feedback throughout the project.

REFERENCES

Agresti A. (1996) An Introduction to Categorical Data Analysis. John Wiley and Sons, New York.

Akaike H. (1973) Information theory and an extension of the maximum likelihood. In: Proceedings of the 2nd International Symposium on Information Theory (eds B. N. Petrov & F. Csáki) pp. 267–81. Akademia Kaido, Budapest.

Albert P. S. & McShane L. M. (1995) A generalized estimating equations approach for spatially correlated data: applications to the analysis of neuroimaging data. Biometrics 51, 627–38.

Augustin N. H., Mugglestone M. A. & Buckland S. T. (1996) An autologistic model for the spatial distribution of wildlife. J. Appl. Ecol. 33, 339–47.

Austin M. P. (1991) Vegetation: data collection and analysis. In: Nature Conservation: Cost Effective Biological Surveys and Data Analysis (eds C. R. Margules & M. P. Austin) pp. 37–41. CSIRO, Canberra.

Austin M. P. (2002) Spatial prediction of species distribution: an interface between ecological theory and statistical modelling. Ecol. Model. 157, 101–18.

Austin M. P. & Heyligers P. C. (1989) Vegetation survey design for conservation: GRADSECT sampling of forests in north-eastern NSW. Biol. Conserv. 50, 13–32.

Austin M. P. & Meyers J. A. (1996) Current approaches to modelling the environmental niche of Eucalypts: implications for management of forest biodiversity. For. Ecol. Manage. 85, 95–106.

Bambar D. (1975) The area above the ordinal dominance graph and the area below the receiver operating characteristics graph. J. Math. Psych. 12, 387–415.

Belsley D. A., Kuh E. & Welsch R. E. (1980) Regression Diagnostics. John Wiley and Sons, New York.

Bio A. M. F., Alkemande R. & Barendregt A. (1998) Determining alternative models for vegetation response analysis – a non-parametric approach. J. Veg. Sci. 9, 5–16.

Breiman L., Friedman J. H., Olshen R. A. & Stone C. J. (1984) Classification and Regression Trees. Wadsworth International Group, Belmont.

Breininger D. R., Larson V. L., Duncan B. W. & Smith R. B. (1998) Linking habitat suitability to demographic success in Florida scrub-jays. Wild. Bull. 26, 118–28.

Burgman M. A. & Lindenmayer D. B. (1998) Conservation Biology for the Australian Environment. Surrey Beatty and Sons, Chipping Norton.

Burnham K. P. & Anderson D. R. (2002) Model Selection and Multimodel Inference: a Practical Information-Theoretic Approach. Springer, New York.

Burgman M. A., Breininger D. R., Duncan B. W. & Ferson S. (2001) Setting reliability bounds on Habitat Suitability Indices. Ecol. Appl. 11, 70–8.

Carpenter G., Gillison A. N. & Winter J. (1993) DOMAIN: a flexible modelling procedure for mapping potential distributions of plants and animals. Biodivers. Conserv. 2, 667–80.

Chatfield C. (1995) Model uncertainty, data mining and statistical inference. J. Roy. Stat. Soc. Ser. A Stat. 158, 419–66.

Commonwealth of Australia (2003) Terms of Reference, Standing Committee on Environment and Heritage Inquiry into Sustainable Cities. Parliament of Australia, House of Representatives, Canberra.

Derksen S. & Keselman H. J. (1992) Backward, forward and stepwise automated subset selection algorithms: frequency of obtaining authentic and noise variables. Br. J. Math. Stat. Psych. 45, 265–82.

Diamond J. M. (1975) Assembly of species communities. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 342–444. Harvard University Press, Cambridge.

Ecotone Ecological Consultants (2001) Lower Hunter Central Coast Regional Biodiversity Strategy: Fauna survey and mapping project – Module 1 Fauna Surveys. Ecotone Ecological Consultants for LHCCREMS, Waratah, pp. 43.

Edgar R. & Belcher C. (1995) Spotted-tailed Quoll. In: The Mammals of Australia (ed. R. Strahan) pp. 67–8. Reed Books, Chatswood.

Efron B. & Tibshirani R. (1997) Improvements on cross-validation: the .632+ bootstrap method. J. Am. Stat. Ass. 92, 548–60.

Elith J. (2000) Quantitative methods for modelling species habitat: comparative performance and an application to Australian plants. In: Quantitative Methods for Conservation Biology (eds S. Ferson & M. A. Burgman) pp. 39–58. Springer-Verlag, New York.

Elith J. & Burgman M. A. (2002) Predictions and their validation: rare plants in the Central Highlands, Victoria, Australia. In: Predicting Species Occurrences: Issues of Accuracy and Scale (eds J. M. Scott, P. J. Heglund, M. L. Morrison, M. G. Raphael, W. A. Wall & F. B. Samson) pp. 303–14. Island Press, Covelo.

Elith J. & Burgman M. A. (2003) Habitat models for PVA. In: Population Viability in Plants (eds C. A. Brigham & M. W. Schwartz) pp. 203–35. Springer-Verlag, New York.

Elith J., Burgman M. A. & Regan H. M. (2002) Mapping epistemic uncertainties and vague concepts in predictions of species distribution. Ecol. Model. 157, 313–29.

Ennis M., Hinton G., Naylor D., Revow M. & Tibshirani R. (1998) A comparison of statistical learning methods on the GUSTO database. Stat. Med. 17, 2501–8.

ESRI (1997) Arcinfo 7.2. Environmental Systems Research Institute Inc., Redlands.

Faith D. P. & Walker P. A. (1996) Environmental diversity: on the best-possible use of surrogate data for assessing the relative biodiversity of sets of areas. Biodivers. Conserv. 5, 399–415.

Ferrier S. & Watson G. (1997) An Evaluation of the Effectiveness of Environmental Surrogates and Modelling Techniques in Predicting the Distribution of Biological Diversity. Department of Environment, Sports and Territories, Commonwealth of Australia and NSW National Parks and Wildlife Service, Canberra.

Ferrier S., Watson G., Pearce J. & Drielsma M. (2002a) Extended statistical approaches to modelling spatial pattern in biodiversity in north-east New South Wales. I. Species-level modelling. Biodivers. Conserv. 11, 2275–307.

Ferrier S., Drielsma M., Manion G. & Watson G. (2002b) Extended statistical approaches to modelling spatial pattern in biodiversity: the north-east New South Wales experience. II. Community-level modelling. Biodivers. Conserv. 11, 2309–38.

Fielding A. H. & Bell J. F. (1997) A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ. Conserv. 24, 38–49.

Fotheringham A. S., Brunsdon C. & Charlton M. (2002) Geographically Weighted Regression: the Analysis of Spatially Varying Relationships. John Wiley and Sons, London.

Franklin J. (1995) Predictive vegetation mapping: geographic modeling of biospatial patterns in relation to environmental gradients. Prog. Phys. Geog. 19, 494–519.

Friedman J. H. (1991) Multivariate adaptive regression splines (with discussion). Ann. Stat. 19, 1–141.

Goldingay R. L. & Kavanagh R. P. (1993) Home-range estimates and habitat of the yellow-bellied glider (Petaurus australis) at Waratah Creek, New-South-Wales. Wildl. Res. 20, 387–404.

Graham C. H., Ferrier S., Huettman F., Moritz C. & Peterson A. T. (2004) New developments in museum-based informatics and applications in biodiversity analysis. Trends Ecol. Evol. 19, 497–503.

Guisan A. & Harrell F. E. (2000) Ordinal response regression models in ecology. J. Veg. Sci. 11, 617–26.

Guisan A. & Zimmerman N. E. (2000) Predictive habitat distribution models in ecology. Ecol. Model. 135, 147–86.

Hanley J. A. & McNeil B. J. (1982) The meaning and use of the area under a Receiver Operating Characteristic (ROC) curve. Radiology 143, 29–36.

Harrell F. E. (2001) Regression Modelling Strategies with Application to Linear Models, Logistic Regression and Survival Analysis. Springer, New York.

Harrell F. E., Lee K. L. & Mark D. B. (1996) Multivariate prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat. Med. 15, 361–87.

Hastie T. & Tibshirani R. (1990) Generalized Additive Models. Monographs on Statistics and Applied Probability (eds D. R. Cox, D. V. Hinkley, D. Rubin & B. W. Silverman). Chapman & Hall, London.

Hastie T., Tibshirani R. & Friedman J. H. (2001) The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer-Verlag, New York.

Hirzel A. H. (2001) When GIS come to life. Linking Landscape- and Population Ecology for Large Population Management Modelling: the Case of Ibex (Capra Ibex) in Switzerland. The University of Lausanne, Lausanne.

Hoeting J. A., Madigan D., Raftery A. E. & Volinsky C. T. (1999) Bayesian model averaging: a tutorial. Stat. Sci. 14, 382–401.

Houlder D. J., Hutchinson M. F., Nix H. A. & McMahon J. P. (1999) ANUCLIM User Guide, Version 5.0. Centre for Resource and Environmental Studies, Australian National University, Canberra.

Kadmon R., Farber O. & Danin A. (2003) A systematic analysis of factors affecting the performance of climatic envelope models. Ecol. Appl. 13, 853–67.

Kavanagh R. P. (2002) Conservation and management of large forest owls in southeastern Australia. In: The Ecology and Conservation of Owls (eds I. Newton, R. Kavanagh, J. Olson & I. Taylor) pp. 201–19. CSIRO, Melbourne.

Lambeck R. J. (1997) Focal species: a multi-species umbrella for nature conservation. Conserv. Biol. 11, 849–56.

Leathwick J. R. & Austin M. P. (2001) Competitive interactions between tree species in New Zealand’s old-growth indigenous forests. Ecology 82, 2560–73.

Legendre P. & Fortin M. J. (1989) Spatial pattern and ecological analysis. Vegetatio 80, 107–38.

Legendre L. & Legendre P. (1998) Numerical Ecology. Elsevier, New York.

Levins R. (1975) Evolution of communities near equilibrium. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 16–50. Harvard University Press, Cambridge.

LHCCREMS (2004) Lower Hunter Central Coast Regional Biodiversity Strategy 2004 (4 Volume). Hunter Councils on behalf of the Lower Hunter Central Coast Regional Environmental Strategy, Thornton.

Li W., Wang Z., Ma Z. & Tang H. (1999) Designing the core zone in a biosphere reserve based on suitable habitats: Yancheng Biosphere Reserve and the red crowned crane (Grus japonensis). Biol. Conserv. 90, 167–73.

Loyn R. H., McNabb E. G., Volodina L. & Willig R. (2001) Modelling landscape distributions of large forest owls as applied to managing forests in north-east Victoria, Australia. Biol. Conserv. 97, 361–76.

McCullagh P. & Nelder J. A. (1989) Generalized Linear Models. Chapman & Hall, London.

MacKenzie D. I., Nichols J. D., Lachman G. B., Droege S., Royle J. A. & Langtimm C. A. (2002) Estimating site occupancy rates when detection probabilities are less than one. Ecology 83, 2248–55.

Manel S., Dias J. M., Buckton S. T. & Ormerod S. J. (1999a) Alternative methods for predicting species distribution: an illustration with Himalayan river birds. J. Appl. Ecol. 36, 734–47.

Manel S., Dias J. M. & Ormerod S. J. (1999b) Comparing discriminant analysis, neural networks and logistic regression for predicting species distributions: a case study with a Himalayan river bird. Ecol. Model. 120, 337–47.

Miller M. E., Hui S. L. & Tierney W. M. (1991) Validation techniques for logistic regression models. Stat. Med. 10, 1213–26.

Moilanen A. (2002) Implications of empirical data quality to metapopulation model parameter estimation and application. Oikos 96, 516–30.

Moisen G. G. & Frescino T. S. (2002) Comparing five modeling techniques for predicting forest characteristics. Ecol. Model. 157, 209–25.

National Parks & Wildlife Service (1998) Eden Fauna Modelling. New South Wales National Parks & Wildlife Service, New South Wales Regional Forest Agreement Steering Committee, Canberra.

National Parks and Wildlife Service (2000) Modelling Areas of Habitat Significance for Fauna and Flora in the Southern CRA. NPWS NSW, Canberra.

Nix H. (1986) A biogeographic analysis of Australian elapid snakes. In: Atlas of Elapid Snakes of Australia (ed. R. Longmore) pp. 4–15. Australian Government Publishing Service, Canberra.

Obuchowski N. A. (1994) Computing sample size for receiver operating characteristic studies. Invest. Radiol. 29, 238–43.

Pearce J. & Ferrier S. (2000) Evaluating the predictive performance of habitat models developed using logistic regression. Ecol. Model. 133, 225–45.

Pearce J. & Ferrier S. (2001) The practical value of modelling relative abundance of species for regional conservation planning: a case study. Biol. Conserv. 98, 33–43.

Pearce J., Ferrier S. & Scotts D. (2001a) An evaluation of the predictive performance of distributional models for flora and fauna in north-east New South Wales. J. Environ. Manage. 62, 171–84.

Pearce J. L., Cherry K., Drielsma M., Ferrier S. & Whish G. (2001b) Incorporating expert knowledge and fine-scale vegetation mapping into statistical modelling of faunal distribution. J. Appl. Ecol. 38, 412–24.

Phillips S. J., Anderson R. P. & Schapire R. E. (in press) Maximum entropy modeling of species geographic distributions. Ecol. Model.

Pressey R. L., Possingham H. P., Logan V. S., Day J. R. & Williams P. H. (1999) Effects of data characteristics on the results of reserve selection algorithms. J. Biogeogr. 26, 179–91.

Pulliam H. R. (1988) Sources, sinks and population regulation. Am. Nat. 132, 652–61.

R Development Core Team (2004) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL: http://www.R-project.org

Reading R. P., Clark T. A., Seebeck J. H. & Pearce J. (1996) Habitat Suitability Index model for the eastern barred bandicoot, Perameles gunnii. Wildl. Res. 23, 221–35.

Reed P. C. & Lunney D. (1990) Habitat loss: the key problem for the long-term survival of koalas in New South Wales. In: Koala Summit: Managing Koalas in New South Wales (eds D. Lunney, C. A. Urquhart & P. C. Reed) pp. 9–31. University of Sydney, Sydney.

Ripley B. D. (1995) Pattern Recognition and Neural Networks – a Statistical Approach. Cambridge University Press, Cambridge.

Russel R. (1995) Yellow-bellied glider. In: The Mammals of Australia (ed. R. Strahan) pp. 226–8. Reed Books, Chatswood.

Steyerberg E. W., Eijkemans M. J. C., Harrell F. E. & Habbema J. D. F. (2001a) Prognostic modeling with logistic regression analysis: in search of a sensible strategy in small data sets. Med. Decis. Making 21, 45–56.

Steyerberg E. W., Harrell F. E., Borsboom G. J. J. M., Eijkemans M. J. C., Vergouwe Y. & Habbema J. D. F. (2001b) Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J. Clin. Epidemiol. 54, 774–81.

Stockwell D. & Peters D. (1999) The GARP modelling system: problems and solutions to automated spatial prediction. Int. J. Geogr. Inf. Sci. 13, 143–58.

Sutherland W. J. (1996) Ecological Census Techniques: a Handbook, 1st edn. Cambridge University Press, Cambridge.

ter Braak C. J. F. (1986) Canonical Correspondence Analysis: a new eigenvector technique for multivariate direct gradient analysis. Ecology 67, 1167–79.

Thompson S. K. (2002) Sampling, 2nd edn. Wiley, New York.

Tyre A. J., Tenhumberg B., Field S. A., Possingham H. P., Niejalke D. & Parris K. (2003) Improving precision and reducing bias in biological surveys by estimating false negative error rates in presence-absence data. Ecol. Appl. 13, 1790–801.

Van Horne B. (1983) Density as a misleading indicator of habitat quality. J. Wildl. Man. 47, 893–901.

Van Horne B. & Wiens J. A. (1991) Forest Bird Habitat Suitability Models and the Development of General Habitat Models. United States Department of the Interior, Fish and Wildlife Service, Washington, DC.

Venables W. N. & Ripley B. D. (2003) Modern Applied Statistics with S, 4th edn. Springer, New York.

Walters C. & Holling C. S. (1990) Large-scale management experiments and learning by doing. Ecology 71, 2060–8.

Wintle B. A., McCarthy M. A., Volinsky C. T. & Kavanagh R. P. (2003) The use of Bayesian Model Averaging to better represent the uncertainty in ecological models. Conserv. Biol. 17, 1579–90.

Wintle B. A., Elith R. J., Yamada K. & Burgman M. A. (2004) LHCCREMS fauna survey and mapping project. Module 2: Habitat modelling and conservation requirements. Lower Hunter & Central Coast Regional Environmental Management Strategy, Callaghan.

Wintle B. A., McCarthy M. A., Parris K. M. & Burgman M. A. (2004) Precision and bias of methods for estimating point survey detection probabilities. Ecol. Appl. 14, 703–12.

Wilson K. A., Westphal M. I., Possingham H. P. & Elith J. (2005) Sensitivity of conservation planning to different approaches to using species distribution data. Biol. Conserv. 122, 99–112.

Wintle B. A., Burgman M. A., McCarthy M. A. & Kavanagh R. P. (in press) The magnitude and management consequences of false negative observation error in surveys of arboreal marsupials and large forest owls. J. Wildl. Man.

Worton B. J. (1989) Kernel methods for estimating the utilization distribution in home-range studies. Ecology 70, 164–8.

Yencken D. & Wilkinson D. (2000) Resetting the Compass: Australia’s Journey Towards Sustainability. CSIRO Publishing, Melbourne.

Zaniewski A. E., Lehmann A. & Overton J. M. (2002) Predicting species distribution using presence-only data: a case study of native New Zealand ferns. Ecol. Model. 157, 261–80.

APPENDIX 1

Method for .632+ bootstrap evaluation, after Harrell et al. (1996) and Efron and Tibshirani (1997)

1. Develop model on all n observations.

2. Calculate the statistic(s) of choice for evaluation on the same data (i.e. the training data) – call this $\mathrm{Stat}_{app}$ because it is the apparent value of the statistic.

3. Take a bootstrap sample, i.e. a sample of size n, with replacement, of rows of the data matrix. Keep track of which sites are in the bootstrap sample and which are excluded.

4. Fit the model on the bootstrap sample (using the same methods as used on the full set).

5. Compute the statistic on the bootstrap data set (observations vs. fitted values) and call it $\mathrm{Stat}_{boot}$.

6. Also compute the statistic on a version of the bootstrap data where the observations are randomized ($\mathrm{Stat}_{permute}$).

7. Use the bootstrap model to predict to the excluded data set; calculate the statistic on these predictions: $\mathrm{Stat}_{excl}$.

8. Use $\mathrm{Stat}_{boot}$, $\mathrm{Stat}_{permute}$ and $\mathrm{Stat}_{excl}$ to calculate the amount of overfitting, the relative overfitting rate and weights that are then used to make a best estimate of predictive performance, $\mathrm{Stat}_{best\_est}$. This statistic puts most emphasis on predictions to the excluded data, particularly when the model is overfitted (i.e. when $\mathrm{Stat}_{boot} - \mathrm{Stat}_{excl}$ is large). For details of this step see Steyerberg et al. (2001b).

9. Measure how optimistic the fit on the bootstrap sample was: $O = \mathrm{Stat}_{boot} - \mathrm{Stat}_{best\_est}$.

10. Repeat steps 3–7 100–200 times.

11. Calculate an average optimism, $\bar{O}$.

12. Use $\bar{O}$ to correct $\mathrm{Stat}_{app}$ for its optimism: $\mathrm{Stat}_{app} - \bar{O}$. This is a near unbiased estimate of the expected value of the external predictive performance of the process that generated $\mathrm{Stat}_{app}$.
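The arithmetic in steps 8, 9, 11 and 12 can be sketched as follows. This is only an illustration under stated assumptions: the appendix defers the exact weights to Steyerberg et al. (2001b), so the sketch borrows the weighting of Efron and Tibshirani (1997) and restates it for a statistic where larger values mean better performance (such as ROC area). The function names `best_estimate` and `optimism_corrected` are hypothetical, and model fitting (steps 1 to 7) is assumed to have been done elsewhere.

```python
from statistics import mean


def best_estimate(stat_boot, stat_permute, stat_excl):
    """Step 8 for one bootstrap replicate (assumes higher statistic = better).

    Assumed weighting, adapted from Efron and Tibshirani (1997):
    R = (boot - excl) / (boot - permute), w = 0.632 / (1 - 0.368 * R).
    """
    # Do not let out-of-sample performance fall below the no-information value.
    excl = max(stat_excl, stat_permute)
    gap = stat_boot - stat_permute
    # Relative overfitting rate, kept within [0, 1].
    if gap > 0 and excl < stat_boot:
        r = (stat_boot - excl) / gap
    else:
        r = 0.0
    # The weight on the excluded-data statistic grows from 0.632 to 1 as R grows.
    weight = 0.632 / (1.0 - 0.368 * r)
    return (1.0 - weight) * stat_boot + weight * excl


def optimism_corrected(stat_app, replicates):
    """Steps 9, 11 and 12: average the optimism and subtract it from Stat_app.

    `replicates` is a list of (stat_boot, stat_permute, stat_excl) tuples,
    one per bootstrap sample.
    """
    optimism = [b - best_estimate(b, p, e) for (b, p, e) in replicates]
    return stat_app - mean(optimism)


# Hypothetical ROC areas from two bootstrap replicates.
print(optimism_corrected(0.85, [(0.9, 0.5, 0.7), (0.9, 0.5, 0.9)]))
```

When a replicate shows no overfitting the optimism is zero, and when the excluded-data statistic falls to the no-information level the best estimate collapses onto the excluded-data statistic, which matches the behaviour described in step 8.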


but they are still not incorporated in standard statistical software and are technically demanding to implement.

Model selection and model uncertainty

The standard approach to representing prediction uncertainty involves the calculation of prediction (or confidence) intervals that incorporate uncertainty about parameter estimates. However, this approach to prediction implicitly assumes that the model chosen is the best available representation of the truth and effectively ignores model selection uncertainty, resulting in overconfident predictions. Automated variable selection algorithms exacerbate this problem by providing a method for dredging through many candidate predictors in search of an explanatory model (Chatfield 1995; Hoeting et al. 1999). One solution is to completely avoid automated model selection and to analyse only a small set of ‘a priori’ models (Burnham & Anderson 2002). This approach may be practical when strong prior knowledge exists and where the number of plausible models is small. However, if prediction is the primary goal and many possible models exist, some level of automated variable selection is often desirable. Model averaging approaches, including Bayesian model averaging, have been promoted in a range of disciplines as a means of incorporating model selection uncertainty into statistical inference and prediction (Hoeting et al. 1999) and have been applied in some ecological examples (Burnham & Anderson 2002; Wintle et al. 2003).
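As a rough illustration of how model averaging spreads a prediction across candidate models, the sketch below converts information criterion scores into model weights and averages predicted probabilities of occurrence. It is a minimal example under stated assumptions, not the procedure used in the case study: the weights use the common approximation in which each model's weight is proportional to exp(-Δ/2), where Δ is its score minus the lowest score (with BIC and equal prior model probabilities this approximates posterior model probabilities; with AIC it gives Akaike weights). The function names and all numbers are hypothetical.

```python
import math


def ic_weights(ic_values):
    """Turn information criterion scores (lower is better) into model weights."""
    best = min(ic_values)
    # Subtracting the best score first keeps the exponentials well scaled.
    raw = [math.exp(-0.5 * (ic - best)) for ic in ic_values]
    total = sum(raw)
    return [value / total for value in raw]


def average_prediction(predictions, weights):
    """Weighted average of each model's predicted probability for one site."""
    return sum(p * w for p, w in zip(predictions, weights))


# Hypothetical BIC scores for three candidate habitat models and their
# predicted probabilities of occurrence at a single site.
weights = ic_weights([100.0, 102.0, 110.0])
print(weights)
print(average_prediction([0.8, 0.4, 0.1], weights))
```

The averaged prediction sits between the individual model predictions, so a site is not assigned high suitability solely because the single selected model happens to favour it.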

Estimating and incorporating detectability in habitat models

Unless the probability of detecting a species when it is present is equal to 1, false negative observation errors will occur in species surveys. The probability of detecting the presence of the case study species in any single standard survey based on spot-lighting and call elicitation has been found to be very low (Pr[detection|presence] ≈ 0.12–0.45; Wintle et al. in press), making the unreliability of absence data a potentially serious source of uncertainty in our case study. Recent studies have demonstrated the negative impact that false-negative observation error may have on species-habitat analyses (Tyre et al. 2003), meta-population models (Moilanen 2002) and monitoring studies (MacKenzie et al. 2002). Recently developed techniques for incorporating detectability in model estimation (MacKenzie et al. 2002; Tyre et al. 2003) reduce bias in model estimation brought about by false absences, though little effort has been invested in testing the relative predictive performance of such models.
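To give a sense of what single-visit detection probabilities of 0.12–0.45 imply, the following sketch computes the chance of at least one detection over repeated visits and the number of visits needed to reach a target level. It assumes visits are independent and that the per-visit detection probability is constant, which is a simplification and not a claim made in the text; the function names and the 0.95 target are illustrative choices.

```python
import math


def cumulative_detection(p, visits):
    """Probability of detecting a present species at least once in `visits` visits."""
    return 1.0 - (1.0 - p) ** visits


def visits_needed(p, target=0.95):
    """Smallest number of visits whose cumulative detection reaches `target`."""
    if not 0.0 < p < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("p and target must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


# The two ends of the detection range reported for the case study species.
for p in (0.12, 0.45):
    print(p, visits_needed(p), round(cumulative_detection(p, visits_needed(p)), 3))
```

Under these assumptions, a recorded absence after one or two visits carries little weight at the low end of the range, which is why absence data from single surveys are treated with caution.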

Incorporating biotic interactions

Despite the prevalence of interspecific competition as a key concept in community ecology (Diamond 1975), very little has been attempted by way of incorporating interspecific competition in distribution models (for one exception see Leathwick & Austin 2001). Regression models such as those described by Leathwick and Austin (2001) could be applied to fauna species. Future development of ‘loop analysis’ (Levins 1975) or systems of simultaneous regressions (Guisan & Zimmerman 2000) may facilitate spatial models of wildlife distributions that incorporate competitive interactions.

The GLMs and GAMs presented here are relatively simple, though suited to the conservation problem and available data. There is scope for greater sophistication in the modelling method where a high level of modelling expertise exists, though we believe our recommended approach represents a reasonable trade-off between practicality and rigour.

Model improvement and adaptive management

Due to environmental change, natural and anthropogenic disturbance, and population stochasticity, all predictive models will become redundant if they are not updated through time. We recommend that models be applied in an adaptive framework (Walters & Holling 1990), allowing immediate application of models in management but ensuring a commitment to continual improvement of models through future data collections and refinements to modelling methods.

Future data collection will be central to rigorous model testing and model refinements. Future surveys should be targeted to fill gaps in current data sets (Fig. 3). Particular attention should be paid to the collection of high quality observation data based on multiple site visits aimed at minimizing false absences (Wintle et al. 2004). Some important environmental variables, including those based on forest growth stage data, were not available for model fitting in the LHCC region. It is likely that models would have better predictive accuracy if forest growth stage information became available and was incorporated in models.

CONCLUSION

Habitat models are now a widely used tool in public land conservation planning. Systematic conservation planning in the urban fringe will require such tools to ensure transparency and repeatability of planning outcomes. However, access to statistical and ecological expertise for urban conservation planning is likely to be limited when compared with the public land planning processes that have taken place in Australia. It is therefore important that we continue to develop and refine planning tools, including habitat modelling methods, with an emphasis on statistical and ecological rigour and simplicity.

ACKNOWLEDGEMENTS

This work was instigated by the LHCCREMS organization (now Hunter Councils). Steven House (EcoLogical), Michael Murray (Forest Fauna Surveys P/L), Philip Gleeson (NPWS), Daniel Connolly (NPWS), Jill Smith (NPWS), Peter Bone (NPWS), Peter Ewin (NPWS), Mick Andren (NPWS) and Rod Kavanagh (SFNSW) provided data, expert guidance on model construction and valuable insights. Mark Burgman and Kuniko Yamada (University of Melbourne) helped throughout the project. Comments from Terry Walshe (University of Western Australia), David Tierney and Robbie Economos improved the manuscript substantially. Discussions with Trevor Hastie and John Leathwick helped clarify bootstrapping issues. Two anonymous reviewers provided constructive feedback. Sophie Powrie, Kirsty Winter and Meredith Liang (LHCCREMS) provided excellent support and useful feedback throughout the project.

REFERENCES

Agresti A. (1996) An Introduction to Categorical Data Analysis. John Wiley and Sons, New York.

Akaike H. (1973) Information theory and an extension of the maximum likelihood. In: Proceedings of the 2nd International Symposium on Information Theory (eds B. N. Petrov & F. Csáki) pp. 267–81. Akademia Kaido, Budapest.

Albert P. S. & McShane L. M. (1995) A generalized estimating equations approach for spatially correlated data: applications to the analysis of neuroimaging data. Biometrics 51, 627–38.

Augustin N. H., Mugglestone M. A. & Buckland S. T. (1996) An autologistic model for the spatial distribution of wildlife. J. Appl. Ecol. 33, 339–47.

Austin M. P. (1991) Vegetation: data collection and analysis. In: Nature Conservation: Cost Effective Biological Surveys and Data Analysis (eds C. R. Margules & M. P. Austin) pp. 37–41. CSIRO, Canberra.

Austin M. P. (2002) Spatial prediction of species distribution: an interface between ecological theory and statistical modelling. Ecol. Model. 157, 101–18.

Austin M. P. & Heyligers P. C. (1989) Vegetation survey design for conservation: GRADSECT sampling of forests in north-eastern NSW. Biol. Conserv. 50, 13–32.

Austin M. P. & Meyers J. A. (1996) Current approaches to modelling the environmental niche of Eucalypts: implications for management of forest biodiversity. For. Ecol. Manage. 85, 95–106.

Bambar D. (1975) The area above the ordinal dominance graph and the area below the receiver operating characteristics graph. J. Math. Psych. 12, 387–415.

Belsley D. A., Kuh E. & Welsch R. E. (1980) Regression Diagnostics. John Wiley and Sons, New York.

Bio A. M. F., Alkemande R. & Barendregt A. (1998) Determining alternative models for vegetation response analysis – a non-parametric approach. J. Veg. Sci. 9, 5–16.

Breiman L., Friedman J. H., Olshen R. A. & Stone C. J. (1984) Classification and Regression Trees. Wadsworth International Group, Belmont.

Breininger D. R., Larson V. L., Duncan B. W. & Smith R. B. (1998) Linking habitat suitability to demographic success in Florida scrub-jays. Wild. Bull. 26, 118–28.

Burgman M. A. & Lindenmayer D. B. (1998) Conservation Biology for the Australian Environment. Surrey Beatty and Sons, Chipping Norton.

Burnham K. P. & Anderson D. R. (2002) Model Selection and Multimodel Inference: a Practical Information-Theoretic Approach. Springer, New York.

Burgman M. A., Breininger D. R., Duncan B. W. & Ferson S. (2001) Setting reliability bounds on Habitat Suitability Indices. Ecol. Appl. 11, 70–8.

Carpenter G., Gillison A. N. & Winter J. (1993) DOMAIN: a flexible modelling procedure for mapping potential distributions of plants and animals. Biodivers. Conserv. 2, 667–80.

Chatfield C. (1995) Model uncertainty, data mining and statistical inference. J. Roy. Stat. Soc. Ser. A Stat. 158, 419–66.

Commonwealth of Australia (2003) Terms of Reference, Standing Committee on Environment and Heritage: Inquiry into Sustainable Cities. Parliament of Australia, House of Representatives, Canberra.

Derksen S. & Keselman H. J. (1992) Backward, forward and stepwise automated subset selection algorithms: frequency of obtaining authentic and noise variables. Br. J. Math. Stat. Psych. 45, 265–82.

Diamond J. M. (1975) Assembly of species communities. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 342–444. Harvard University Press, Cambridge.

Ecotone Ecological Consultants (2001) Lower Hunter Central Coast Regional Biodiversity Strategy: Fauna survey and mapping project – Module 1: Fauna Surveys. Ecotone Ecological Consultants for LHCCREMS, Waratah, pp. 43.

Edgar R. & Belcher C. (1995) Spotted-tailed Quoll. In: The Mammals of Australia (ed. R. Strahan) pp. 67–8. Reed Books, Chatswood.

Efron B. & Tibshirani R. (1997) Improvements on cross-validation: the .632+ bootstrap method. J. Am. Stat. Ass. 92, 548–60.

Elith J. (2000) Quantitative methods for modelling species habitat: comparative performance and an application to Australian plants. In: Quantitative Methods for Conservation Biology (eds S. Ferson & M. A. Burgman) pp. 39–58. Springer-Verlag, New York.

Elith J. & Burgman M. A. (2002) Predictions and their validation: rare plants in the Central Highlands, Victoria, Australia. In: Predicting Species Occurrences: Issues of Accuracy and Scale (eds J. M. Scott, P. J. Heglund, M. L. Morrison, M. G. Raphael, W. A. Wall & F. B. Samson) pp. 303–14. Island Press, Covelo.

Elith J. & Burgman M. A. (2003) Habitat models for PVA. In: Population Viability in Plants (eds C. A. Brigham & M. W. Schwartz) pp. 203–35. Springer-Verlag, New York.

Elith J., Burgman M. A. & Regan H. M. (2002) Mapping epistemic uncertainties and vague concepts in predictions of species distribution. Ecol. Model. 157, 313–29.


Ennis M., Hinton G., Naylor D., Revow M. & Tibshirani R. (1998) A comparison of statistical learning methods on the GUSTO database. Stat. Med. 17, 2501–8.

ESRI (1997) Arcinfo 7.2. Environmental Systems Research Institute Inc., Redlands.

Faith D. P. & Walker P. A. (1996) Environmental diversity: on the best-possible use of surrogate data for assessing the relative biodiversity of sets of areas. Biodivers. Conserv. 5, 399–415.

Ferrier S. & Watson G. (1997) An Evaluation of the Effectiveness of Environmental Surrogates and Modelling Techniques in Predicting the Distribution of Biological Diversity. Department of Environment, Sports and Territories, Commonwealth of Australia and NSW National Parks and Wildlife Service, Canberra.

Ferrier S., Watson G., Pearce J. & Drielsma M. (2002a) Extended statistical approaches to modelling spatial pattern in biodiversity in north-east New South Wales. I. Species-level modelling. Biodivers. Conserv. 11, 2275–307.

Ferrier S., Drielsma M., Manion G. & Watson G. (2002b) Extended statistical approaches to modelling spatial pattern in biodiversity: the north-east New South Wales experience. II. Community-level modelling. Biodivers. Conserv. 11, 2309–38.

Fielding A. H. & Bell J. F. (1997) A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ. Conserv. 24, 38–49.

Fotheringham A. S., Brunsdon C. & Charlton M. (2002) Geographically Weighted Regression: the Analysis of Spatially Varying Relationships. John Wiley and Sons, London.

Franklin J. (1995) Predictive vegetation mapping: geographic modeling of biospatial patterns in relation to environmental gradients. Prog. Phys. Geog. 19, 494–519.

Friedman J. H. (1991) Multivariate adaptive regression splines (with discussion). Ann. Stat. 19, 1–141.

Goldingay R. L. & Kavanagh R. P. (1993) Home-range estimates and habitat of the yellow-bellied glider (Petaurus australis) at Waratah Creek, New-South-Wales. Wildl. Res. 20, 387–404.

Graham C. H., Ferrier S., Huettman F., Moritz C. & Peterson A. T. (2004) New developments in museum-based informatics and applications in biodiversity analysis. Trends Ecol. Evol. 19, 497–503.

Guisan A. & Harrell F. E. (2000) Ordinal response regression models in ecology. J. Veg. Sci. 11, 617–26.

Guisan A. & Zimmerman N. E. (2000) Predictive habitat distribution models in ecology. Ecol. Model. 135, 147–86.

Hanley J. A. & McNeil B. J. (1982) The meaning and use of the area under a Receiver Operating Characteristic (ROC) curve. Radiology 143, 29–36.

Harrell F. E. (2001) Regression Modelling Strategies with Application to Linear Models, Logistic Regression and Survival Analysis. Springer, New York.

Harrell F. E., Lee K. L. & Mark D. B. (1996) Multivariate prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat. Med. 15, 361–87.

Hastie T. & Tibshirani R. (1990) Generalized additive models. Monographs on statistics and applied probability (eds D. R. Cox, D. V. Hinkley, D. Rubin & B. W. Silverman). Chapman & Hall, London.

Hastie T., Tibshirani R. & Friedman J. H. (2001) The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer-Verlag, New York.

Hirzel A. H. (2001) When GIS come to life. Linking Landscape- and Population Ecology for Large Population Management Modelling: the Case of Ibex (Capra Ibex) in Switzerland. The University of Lausanne, Lausanne.

Hoeting J. A., Madigan D., Raftery A. E. & Volinsky C. T. (1999) Bayesian model averaging: a tutorial. Stat. Sci. 14, 382–401.

Houlder D. J., Hutchinson M. F., Nix H. A. & McMahon J. P. (1999) ANUCLIM User Guide, Version 5.0. Centre for Resource and Environmental Studies, Australian National University, Canberra.

Kadmon R., Farber O. & Danin A. (2003) A systematic analysis of factors affecting the performance of climatic envelope models. Ecol. Appl. 13, 853–67.

Kavanagh R. P. (2002) Conservation and management of large forest owls in southeastern Australia. In: The Ecology and Conservation of Owls (eds I. Newton, R. Kavanagh, J. Olson & I. Taylor) pp. 201–19. CSIRO, Melbourne.

Lambeck R. J. (1997) Focal species: a multi-species umbrella for nature conservation. Conserv. Biol. 11, 849–56.

Leathwick J. R. & Austin M. P. (2001) Competitive interactions between tree species in New Zealand’s old-growth indigenous forests. Ecology 82, 2560–73.

Legendre P. & Fortin M. J. (1989) Spatial pattern and ecological analysis. Vegetatio 80, 107–38.

Legendre L. & Legendre P. (1998) Numerical Ecology. Elsevier, New York.

Levins R. (1975) Evolution of communities near equilibrium. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 16–50. Harvard University Press, Cambridge.

LHCCREMS (2004) Lower Hunter Central Coast Regional Biodiversity Strategy 2004 (4 Volume). Hunter Councils on behalf of the Lower Hunter Central Coast Regional Environmental Strategy, Thornton.

Li W., Wang Z., Ma Z. & Tang H. (1999) Designing the core zone in a biosphere reserve based on suitable habitats: Yancheng Biosphere Reserve and the red crowned crane (Grus japonensis). Biol. Conserv. 90, 167–73.

Loyn R. H., McNabb E. G., Volodina L. & Willig R. (2001) Modelling landscape distributions of large forest owls as applied to managing forests in north-east Victoria, Australia. Biol. Conserv. 97, 361–76.

McCullagh P. & Nelder J. A. (1989) Generalized Linear Models. Chapman & Hall, London.

MacKenzie D. I., Nichols J. D., Lachman G. B., Droege S., Royle J. A. & Langtimm C. A. (2002) Estimating site occupancy rates when detection probabilities are less than one. Ecology 83, 2248–55.

Manel S Dias J M Buckton S T amp Ormerod S J (1999a)Alternative methods for predicting species distribution anillustration with Himalayan river birds J Appl Ecol 36734ndash47

Manel S Dias J M amp Ormerod S J (1999b) Comparingdiscriminant analysis neural networks and logistic regres-sion for predicting species distributions a case study witha Himalayan river bird Ecol Model 120 337ndash47

Miller M E Hui S L amp Tierney W M (1991) Validationtechniques for logistic regression models Stat Med 101213ndash26

Moilanen A (2002) Implications of empirical data quality tometapopulation model parameter estimation and applica-tion Oikos 96 516ndash30

Moisen G G amp Frescino T S (2002) Comparing five modelingtechniques for predicting forest characteristics Ecol Model157 209ndash25

HABITAT MODELLING FOR CONSERVATION PLANNING 737

National Parks amp Wildlife Service (1998) Eden Fauna ModellingNew South Wales National Parks amp Wildlife Service NewSouth Wales Regional Forest Agreement Steering Commit-tee Canberra

National Parks and Wildlife Service (2000) Modelling Areas ofHabitat Significance for Fauna and Flora in the Southern CRANPWS NSW Canberra

Nix H (1986) A biogeographic analysis of Australian elapidsnakes In Atlas of Elapid Snakes of Australia (ed RLongmore) pp 4ndash15 Australian Government PublishingService Canberra

Obuchowski N A (1994) Computing sample size for receiveroperating characteristic studies Invest Radiol 29 238ndash43

Pearce J amp Ferrier S (2000) Evaluating the predictive perfor-mance of habitat models developed using logistic regres-sion Ecol Model 133 225ndash45

Pearce J amp Ferrier S (2001) The practical value of modellingrelative abundance of species for regional conservation plan-ning a case study Biol Conserv 98 33ndash43

Pearce J Ferrier S amp Scotts D (2001a) An evaluation of thepredictive performance of distributional models for floraand fauna in north-east New South Wales J Environ Man-age 62 171ndash84

Pearce J L Cherry K Drielsma M Ferrier S amp WhishG (2001b) Incorporating expert knowledge and fine-scalevegetation mapping into statistical modelling of faunaldistribution J Appl Ecol 38 412ndash24

Phillips S J Anderson R P amp Schapire R E (in press) Max-imum entropy modeling of species geographic distributionsEcol Model

Pressey R L Possingham H P Logan V S Day J R amp WilliamsP H (1999) Effects of data characteristics on the results ofreserve selection algorithms J Biogeogr 26 179ndash91

Pulliam H R (1988) Sources sinks and population regulationAm Nat 132 652ndash61

R Development Core Team (2004) R A language andenvironment for statistical computing R Foundation forStatistical Computing Vienna Austria URL httpwwwR-projectorg

Reading R P Clark T A Seebeck J H amp Pearce J (1996)Habitat Suitability Index model for the eastern barredbandicoot Perameles gunnii Wildl Res 23 221ndash35

Reed P C amp Lunney D (1990) Habitat loss the key problemfor the long-term survival of koalas in New South Wales InKoala Summit Managing Koalas in New South Wales (eds DLunney C A Urquhart amp P C Reed) pp 9ndash31 Universityof Sydney Sydney

Ripley B D (1995) Pattern Recognition and Neural Networks ndasha Statistical Approach Cambridge University PressCambridge

Russel R (1995) Yellow-bellied glider In The Mammals ofAustralia (ed R Strahan) pp 226ndash8 Reed BooksChatswood

Steyerberg E W Eijkemans M J C Harrell F E amp HabbemaJ D F (2001a) Prognostic modeling with logistic regres-sion analysis in search of a sensible strategy in small datasets Med Decis Making 21 45ndash56

Steyerberg E W Harrell F E Borsboom G J J M Eijke-mans M J C Vergouwe Y amp Habbema J D F (2001b)Internal validation of predictive models efficiency of someprocedures for logistic regression analysis J Clin Epide-miol 54 774ndash81

Stockwell D amp Peters D (1999) The GARP modelling systemproblems and solutions to automated spatial prediction IntJ Geogr Inf Sci 13 143ndash58

Sutherland W J (1996) Ecological Census Techniques a Hand-book 1st edn Cambridge University Press Cambridge

ter Braak C J F (1986) Canonical Correspondence Analysisa new eigenvector technique for multivariate direct gradientanalysis Ecology 67 1167ndash79

Thompson S K (2002) Sampling 2nd edn Wiley New YorkTyre A J Tenhumberg B Field S A Possingham H P

Niejalke D amp Parris K (2003) Improving precision andreducing bias in biological surveys by estimating false neg-ative error rates in presence-absence data Ecol Appl 131790ndash801

Van Horne B (1983) Density as a misleading indicator of hab-itat quality J Wildl Man 47 893ndash901

Van Horne B amp Wiens J A (1991) Forest Bird Habitat SuitabilityModels and the Development of General Habitat ModelsUnited States Department of the Interior Fish and WildlifeService Washington DC

Venables W N amp Ripley B D (2003) Modern Applied Statisticswith S 4th edn Springer New York

Walters C amp Holling C S (1990) Large-scale managementexperiments and learning by doing Ecology 71 2060ndash8

Wintle B A McCarthy M A Volinsky C T amp KavanaghR P (2003) The use of Bayesian Model Averaging to betterrepresent the uncertainty in ecological models ConservBiol 17 1579ndash90

Wintle B A Elith R J Yamada K amp Burgman M A (2004)LHCCREMS fauna survey and mapping project Module2 Habitat modelling and conservation requirements LowerHunter amp Central Coast Regional Environmental Manage-ment Strategy Callaghan

Wintle B A McCarthy M A Parris K M amp Burgman M A(2004) Precision and bias of methods for estimating pointsurvey detection probabilities Ecol Appl 14 703ndash12

Wilson K A Westphal M I Possingham H P amp Elith J(2005) Sensitivity of conservation planning to differentapproaches to using species distribution data Biol Conserv122 99ndash112

Wintle B A Burgman M A McCarthy M A amp KavanaghR P (in press) The magnitude and management conse-quences of false negative observation error in surveys ofarboreal marsupials and large forest owls J Wildl Man

Worton B J (1989) Kernel methods for estimating the utiliza-tion distribution in home-range studies Ecology 70 164ndash8

Yencken D amp Wilkinson D (2000) Resetting the CompassAustraliarsquos Journey Towards Sustainability CSIRO Publish-ing Melbourne

Zaniewski A E Lehmann A amp Overton J M (2002) Predict-ing species distribution using presence-only data a casestudy of native New Zealand ferns Ecol Model 157 261ndash80

APPENDIX 1

Method for 0632+ bootstrap evaluation after Harrellet al (1996) and Efron and Tibshirani (1997)1 Develop model on all n observations2 Calculate the statistic(s) of choice for evaluation

on the same data (ie the training data) ndash call thisStatapp because it is the apparent value of thestatistic

3 Take a bootstrap sample ie a sample of size nwith replacement of rows of the data matrix

738 B A WINTLE ET AL

Keep track of which sites are in the bootstrapsample and which are excluded

4 Fit the model on the bootstrap sample (using thesame methods as used on the full set)

5 Compute the statistic on the bootstrap data set(observations vs fitted values) and call it Statboot

6 Also compute the statistic on a version of thebootstrap data where the observations are ran-domized (Statpermute)

7 Use the bootstrap model to predict to theexcluded data set calculate the statistic on thesepredictions Statexcl

8 Use Statboot Statpermute and Statexcl to calculate theamount of overfitting the relative overfitting rate

and weights that are then used to make a bestestimate of predictive performance Statbest_est Thisstatistic puts most emphasis on predictions to theexcluded data particularly when the model is over-fitted (ie when Statboot ndash Statexcl is large) Fordetails of this step see Steyerberg et al (2001b)

9 Measure how optimistic the fit on the bootstrapsample was O = Statboot ndash Statbest_est

10 Repeat steps 3ndash7 100ndash200 times11 Calculate an average optimism 12 Use to correct Statapp for its optimism Statapp ndash

This is a near unbiased estimate of the expectedvalue of the external predictive performance of theprocess that generated Statapp

OO

Page 17: Fauna habitat modelling and mapping: A review and case study in …au.fsc.org/download.fauna-habitat-modelling-and-mapping... · Fauna habitat modelling and mapping: A review and

HABITAT MODELLING FOR CONSERVATION PLANNING 735

planning processes that have taken place in AustraliaIt is therefore important that we continue to developand refine planning tools including habitat modellingmethods with an emphasis on statistical and ecologi-cal rigour and simplicity

ACKNOWLEDGEMENTS

This work was instigated by the LHCCREMSorganization (now Hunter Councils) Steven House(EcoLogical) Michael Murray (Forest Fauna SurveysPL) Philip Gleeson (NPWS) Daniel Connolly(NPWS) Jill Smith (NPWS) Peter Bone (NPWS)Peter Ewin (NPWS) Mick Andren (NPWS) and RodKavanagh (SFNSW) provided data expert guidanceon model construction and valuable insights MarkBurgman and Kuniko Yamada (University ofMelbourne) helped throughout the project Commentsfrom Terry Walshe (University of Western Australia)David Tierney and Robbie Economos improved themanuscript substantially Discussions with TrevorHastie and John Leathwick helped clarify bootstrap-ping issues Two anonymous reviewers provided con-structive feedback Sophie Powrie Kirsty Winter andMeredith Liang (LHCCREMS) provided excellentsupport and useful feedback throughout the project

REFERENCES

Agresti A. (1996) An Introduction to Categorical Data Analysis. John Wiley and Sons, New York.

Akaike H. (1973) Information theory and an extension of the maximum likelihood principle. In: Proceedings of the 2nd International Symposium on Information Theory (eds B. N. Petrov & F. Csáki) pp. 267–81. Akadémiai Kiadó, Budapest.

Albert P. S. & McShane L. M. (1995) A generalized estimating equations approach for spatially correlated data: applications to the analysis of neuroimaging data. Biometrics 51, 627–38.

Augustin N. H., Mugglestone M. A. & Buckland S. T. (1996) An autologistic model for the spatial distribution of wildlife. J. Appl. Ecol. 33, 339–47.

Austin M. P. (1991) Vegetation: data collection and analysis. In: Nature Conservation: Cost Effective Biological Surveys and Data Analysis (eds C. R. Margules & M. P. Austin) pp. 37–41. CSIRO, Canberra.

Austin M. P. (2002) Spatial prediction of species distribution: an interface between ecological theory and statistical modelling. Ecol. Model. 157, 101–18.

Austin M. P. & Heyligers P. C. (1989) Vegetation survey design for conservation: GRADSECT sampling of forests in north-eastern NSW. Biol. Conserv. 50, 13–32.

Austin M. P. & Meyers J. A. (1996) Current approaches to modelling the environmental niche of Eucalypts: implications for management of forest biodiversity. For. Ecol. Manage. 85, 95–106.

Bamber D. (1975) The area above the ordinal dominance graph and the area below the receiver operating characteristics graph. J. Math. Psych. 12, 387–415.

Belsley D. A., Kuh E. & Welsch R. E. (1980) Regression Diagnostics. John Wiley and Sons, New York.

Bio A. M. F., Alkemade R. & Barendregt A. (1998) Determining alternative models for vegetation response analysis – a non-parametric approach. J. Veg. Sci. 9, 5–16.

Breiman L., Friedman J. H., Olshen R. A. & Stone C. J. (1984) Classification and Regression Trees. Wadsworth International Group, Belmont.

Breininger D. R., Larson V. L., Duncan B. W. & Smith R. B. (1998) Linking habitat suitability to demographic success in Florida scrub-jays. Wild. Bull. 26, 118–28.

Burgman M. A. & Lindenmayer D. B. (1998) Conservation Biology for the Australian Environment. Surrey Beatty and Sons, Chipping Norton.

Burnham K. P. & Anderson D. R. (2002) Model Selection and Multimodel Inference: a Practical Information-Theoretic Approach. Springer, New York.

Burgman M. A., Breininger D. R., Duncan B. W. & Ferson S. (2001) Setting reliability bounds on Habitat Suitability Indices. Ecol. Appl. 11, 70–8.

Carpenter G., Gillison A. N. & Winter J. (1993) DOMAIN: a flexible modelling procedure for mapping potential distributions of plants and animals. Biodivers. Conserv. 2, 667–80.

Chatfield C. (1995) Model uncertainty, data mining and statistical inference. J. Roy. Stat. Soc. Ser. A Stat. 158, 419–66.

Commonwealth of Australia (2003) Terms of Reference: Standing Committee on Environment and Heritage Inquiry into Sustainable Cities. Parliament of Australia, House of Representatives, Canberra.

Derksen S. & Keselman H. J. (1992) Backward, forward and stepwise automated subset selection algorithms: frequency of obtaining authentic and noise variables. Br. J. Math. Stat. Psych. 45, 265–82.

Diamond J. M. (1975) Assembly of species communities. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 342–444. Harvard University Press, Cambridge.

Ecotone Ecological Consultants (2001) Lower Hunter Central Coast Regional Biodiversity Strategy: Fauna survey and mapping project – Module 1: Fauna Surveys. Ecotone Ecological Consultants for LHCCREMS, Waratah, pp. 43.

Edgar R. & Belcher C. (1995) Spotted-tailed Quoll. In: The Mammals of Australia (ed. R. Strahan) pp. 67–8. Reed Books, Chatswood.

Efron B. & Tibshirani R. (1997) Improvements on cross-validation: the .632+ bootstrap method. J. Am. Stat. Ass. 92, 548–60.

Elith J. (2000) Quantitative methods for modelling species habitat: comparative performance and an application to Australian plants. In: Quantitative Methods for Conservation Biology (eds S. Ferson & M. A. Burgman) pp. 39–58. Springer-Verlag, New York.

Elith J. & Burgman M. A. (2002) Predictions and their validation: rare plants in the Central Highlands, Victoria, Australia. In: Predicting Species Occurrences: Issues of Accuracy and Scale (eds J. M. Scott, P. J. Heglund, M. L. Morrison, M. G. Raphael, W. A. Wall & F. B. Samson) pp. 303–14. Island Press, Covelo.

Elith J. & Burgman M. A. (2003) Habitat models for PVA. In: Population Viability in Plants (eds C. A. Brigham & M. W. Schwartz) pp. 203–35. Springer-Verlag, New York.

Elith J., Burgman M. A. & Regan H. M. (2002) Mapping epistemic uncertainties and vague concepts in predictions of species distribution. Ecol. Model. 157, 313–29.


Ennis M., Hinton G., Naylor D., Revow M. & Tibshirani R. (1998) A comparison of statistical learning methods on the GUSTO database. Stat. Med. 17, 2501–8.

ESRI (1997) Arcinfo 7.2. Environmental Systems Research Institute Inc., Redlands.

Faith D. P. & Walker P. A. (1996) Environmental diversity: on the best-possible use of surrogate data for assessing the relative biodiversity of sets of areas. Biodivers. Conserv. 5, 399–415.

Ferrier S. & Watson G. (1997) An Evaluation of the Effectiveness of Environmental Surrogates and Modelling Techniques in Predicting the Distribution of Biological Diversity. Department of Environment, Sports and Territories, Commonwealth of Australia and NSW National Parks and Wildlife Service, Canberra.

Ferrier S., Watson G., Pearce J. & Drielsma M. (2002a) Extended statistical approaches to modelling spatial pattern in biodiversity in north-east New South Wales. I. Species-level modelling. Biodivers. Conserv. 11, 2275–307.

Ferrier S., Drielsma M., Manion G. & Watson G. (2002b) Extended statistical approaches to modelling spatial pattern in biodiversity: the north-east New South Wales experience. II. Community-level modelling. Biodivers. Conserv. 11, 2309–38.

Fielding A. H. & Bell J. F. (1997) A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ. Conserv. 24, 38–49.

Fotheringham A. S., Brunsdon C. & Charlton M. (2002) Geographically Weighted Regression: the Analysis of Spatially Varying Relationships. John Wiley and Sons, London.

Franklin J. (1995) Predictive vegetation mapping: geographic modeling of biospatial patterns in relation to environmental gradients. Prog. Phys. Geog. 19, 494–519.

Friedman J. H. (1991) Multivariate adaptive regression splines (with discussion). Ann. Stat. 19, 1–141.

Goldingay R. L. & Kavanagh R. P. (1993) Home-range estimates and habitat of the yellow-bellied glider (Petaurus australis) at Waratah Creek, New-South-Wales. Wildl. Res. 20, 387–404.

Graham C. H., Ferrier S., Huettmann F., Moritz C. & Peterson A. T. (2004) New developments in museum-based informatics and applications in biodiversity analysis. Trends Ecol. Evol. 19, 497–503.

Guisan A. & Harrell F. E. (2000) Ordinal response regression models in ecology. J. Veg. Sci. 11, 617–26.

Guisan A. & Zimmermann N. E. (2000) Predictive habitat distribution models in ecology. Ecol. Model. 135, 147–86.

Hanley J. A. & McNeil B. J. (1982) The meaning and use of the area under a Receiver Operating Characteristic (ROC) curve. Radiology 143, 29–36.

Harrell F. E. (2001) Regression Modelling Strategies with Application to Linear Models, Logistic Regression and Survival Analysis. Springer, New York.

Harrell F. E., Lee K. L. & Mark D. B. (1996) Multivariate prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat. Med. 15, 361–87.

Hastie T. & Tibshirani R. (1990) Generalized additive models. Monographs on statistics and applied probability (eds D. R. Cox, D. V. Hinkley, D. Rubin & B. W. Silverman). Chapman & Hall, London.

Hastie T., Tibshirani R. & Friedman J. H. (2001) The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer-Verlag, New York.

Hirzel A. H. (2001) When GIS come to life. Linking Landscape- and Population Ecology for Large Population Management Modelling: the Case of Ibex (Capra Ibex) in Switzerland. The University of Lausanne, Lausanne.

Hoeting J. A., Madigan D., Raftery A. E. & Volinsky C. T. (1999) Bayesian model averaging: a tutorial. Stat. Sci. 14, 382–401.

Houlder D. J., Hutchinson M. F., Nix H. A. & McMahon J. P. (1999) ANUCLIM User Guide, Version 5.0. Centre for Resource and Environmental Studies, Australian National University, Canberra.

Kadmon R., Farber O. & Danin A. (2003) A systematic analysis of factors affecting the performance of climatic envelope models. Ecol. Appl. 13, 853–67.

Kavanagh R. P. (2002) Conservation and management of large forest owls in southeastern Australia. In: The Ecology and Conservation of Owls (eds I. Newton, R. Kavanagh, J. Olson & I. Taylor) pp. 201–19. CSIRO, Melbourne.

Lambeck R. J. (1997) Focal species: a multi-species umbrella for nature conservation. Conserv. Biol. 11, 849–56.

Leathwick J. R. & Austin M. P. (2001) Competitive interactions between tree species in New Zealand's old-growth indigenous forests. Ecology 82, 2560–73.

Legendre P. & Fortin M. J. (1989) Spatial pattern and ecological analysis. Vegetatio 80, 107–38.

Legendre L. & Legendre P. (1998) Numerical Ecology. Elsevier, New York.

Levins R. (1975) Evolution of communities near equilibrium. In: Ecology and Evolution of Communities (eds M. L. Cody & J. M. Diamond) pp. 16–50. Harvard University Press, Cambridge.

LHCCREMS (2004) Lower Hunter Central Coast Regional Biodiversity Strategy 2004 (4 Volume). Hunter Councils on behalf of the Lower Hunter Central Coast Regional Environmental Strategy, Thornton.

Li W., Wang Z., Ma Z. & Tang H. (1999) Designing the core zone in a biosphere reserve based on suitable habitats: Yancheng Biosphere Reserve and the red crowned crane (Grus japonensis). Biol. Conserv. 90, 167–73.

Loyn R. H., McNabb E. G., Volodina L. & Willig R. (2001) Modelling landscape distributions of large forest owls as applied to managing forests in north-east Victoria, Australia. Biol. Conserv. 97, 361–76.

McCullagh P. & Nelder J. A. (1989) Generalized Linear Models. Chapman & Hall, London.

MacKenzie D. I., Nichols J. D., Lachman G. B., Droege S., Royle J. A. & Langtimm C. A. (2002) Estimating site occupancy rates when detection probabilities are less than one. Ecology 83, 2248–55.

Manel S., Dias J. M., Buckton S. T. & Ormerod S. J. (1999a) Alternative methods for predicting species distribution: an illustration with Himalayan river birds. J. Appl. Ecol. 36, 734–47.

Manel S., Dias J. M. & Ormerod S. J. (1999b) Comparing discriminant analysis, neural networks and logistic regression for predicting species distributions: a case study with a Himalayan river bird. Ecol. Model. 120, 337–47.

Miller M. E., Hui S. L. & Tierney W. M. (1991) Validation techniques for logistic regression models. Stat. Med. 10, 1213–26.

Moilanen A. (2002) Implications of empirical data quality to metapopulation model parameter estimation and application. Oikos 96, 516–30.

Moisen G. G. & Frescino T. S. (2002) Comparing five modeling techniques for predicting forest characteristics. Ecol. Model. 157, 209–25.


National Parks & Wildlife Service (1998) Eden Fauna Modelling. New South Wales National Parks & Wildlife Service, New South Wales Regional Forest Agreement Steering Committee, Canberra.

National Parks and Wildlife Service (2000) Modelling Areas of Habitat Significance for Fauna and Flora in the Southern CRA. NPWS NSW, Canberra.

Nix H. (1986) A biogeographic analysis of Australian elapid snakes. In: Atlas of Elapid Snakes of Australia (ed. R. Longmore) pp. 4–15. Australian Government Publishing Service, Canberra.

Obuchowski N. A. (1994) Computing sample size for receiver operating characteristic studies. Invest. Radiol. 29, 238–43.

Pearce J. & Ferrier S. (2000) Evaluating the predictive performance of habitat models developed using logistic regression. Ecol. Model. 133, 225–45.

Pearce J. & Ferrier S. (2001) The practical value of modelling relative abundance of species for regional conservation planning: a case study. Biol. Conserv. 98, 33–43.

Pearce J., Ferrier S. & Scotts D. (2001a) An evaluation of the predictive performance of distributional models for flora and fauna in north-east New South Wales. J. Environ. Manage. 62, 171–84.

Pearce J. L., Cherry K., Drielsma M., Ferrier S. & Whish G. (2001b) Incorporating expert knowledge and fine-scale vegetation mapping into statistical modelling of faunal distribution. J. Appl. Ecol. 38, 412–24.

Phillips S. J., Anderson R. P. & Schapire R. E. (in press) Maximum entropy modeling of species geographic distributions. Ecol. Model.

Pressey R. L., Possingham H. P., Logan V. S., Day J. R. & Williams P. H. (1999) Effects of data characteristics on the results of reserve selection algorithms. J. Biogeogr. 26, 179–91.

Pulliam H. R. (1988) Sources, sinks and population regulation. Am. Nat. 132, 652–61.

R Development Core Team (2004) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org

Reading R. P., Clark T. A., Seebeck J. H. & Pearce J. (1996) Habitat Suitability Index model for the eastern barred bandicoot, Perameles gunnii. Wildl. Res. 23, 221–35.

Reed P. C. & Lunney D. (1990) Habitat loss: the key problem for the long-term survival of koalas in New South Wales. In: Koala Summit: Managing Koalas in New South Wales (eds D. Lunney, C. A. Urquhart & P. C. Reed) pp. 9–31. University of Sydney, Sydney.

Ripley B. D. (1995) Pattern Recognition and Neural Networks – a Statistical Approach. Cambridge University Press, Cambridge.

Russell R. (1995) Yellow-bellied glider. In: The Mammals of Australia (ed. R. Strahan) pp. 226–8. Reed Books, Chatswood.

Steyerberg E. W., Eijkemans M. J. C., Harrell F. E. & Habbema J. D. F. (2001a) Prognostic modeling with logistic regression analysis: in search of a sensible strategy in small data sets. Med. Decis. Making 21, 45–56.

Steyerberg E. W., Harrell F. E., Borsboom G. J. J. M., Eijkemans M. J. C., Vergouwe Y. & Habbema J. D. F. (2001b) Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J. Clin. Epidemiol. 54, 774–81.

Stockwell D. & Peters D. (1999) The GARP modelling system: problems and solutions to automated spatial prediction. Int. J. Geogr. Inf. Sci. 13, 143–58.

Sutherland W. J. (1996) Ecological Census Techniques: a Handbook, 1st edn. Cambridge University Press, Cambridge.

ter Braak C. J. F. (1986) Canonical Correspondence Analysis: a new eigenvector technique for multivariate direct gradient analysis. Ecology 67, 1167–79.

Thompson S. K. (2002) Sampling, 2nd edn. Wiley, New York.

Tyre A. J., Tenhumberg B., Field S. A., Possingham H. P., Niejalke D. & Parris K. (2003) Improving precision and reducing bias in biological surveys by estimating false negative error rates in presence-absence data. Ecol. Appl. 13, 1790–801.

Van Horne B. (1983) Density as a misleading indicator of habitat quality. J. Wildl. Man. 47, 893–901.

Van Horne B. & Wiens J. A. (1991) Forest Bird Habitat Suitability Models and the Development of General Habitat Models. United States Department of the Interior, Fish and Wildlife Service, Washington, DC.

Venables W. N. & Ripley B. D. (2003) Modern Applied Statistics with S, 4th edn. Springer, New York.

Walters C. & Holling C. S. (1990) Large-scale management experiments and learning by doing. Ecology 71, 2060–8.

Wintle B. A., McCarthy M. A., Volinsky C. T. & Kavanagh R. P. (2003) The use of Bayesian Model Averaging to better represent the uncertainty in ecological models. Conserv. Biol. 17, 1579–90.

Wintle B. A., Elith R. J., Yamada K. & Burgman M. A. (2004) LHCCREMS fauna survey and mapping project. Module 2: Habitat modelling and conservation requirements. Lower Hunter & Central Coast Regional Environmental Management Strategy, Callaghan.

Wintle B. A., McCarthy M. A., Parris K. M. & Burgman M. A. (2004) Precision and bias of methods for estimating point survey detection probabilities. Ecol. Appl. 14, 703–12.

Wilson K. A., Westphal M. I., Possingham H. P. & Elith J. (2005) Sensitivity of conservation planning to different approaches to using species distribution data. Biol. Conserv. 122, 99–112.

Wintle B. A., Burgman M. A., McCarthy M. A. & Kavanagh R. P. (in press) The magnitude and management consequences of false negative observation error in surveys of arboreal marsupials and large forest owls. J. Wildl. Man.

Worton B. J. (1989) Kernel methods for estimating the utilization distribution in home-range studies. Ecology 70, 164–8.

Yencken D. & Wilkinson D. (2000) Resetting the Compass: Australia's Journey Towards Sustainability. CSIRO Publishing, Melbourne.

Zaniewski A. E., Lehmann A. & Overton J. M. (2002) Predicting species distribution using presence-only data: a case study of native New Zealand ferns. Ecol. Model. 157, 261–80.

APPENDIX 1

Method for 0.632+ bootstrap evaluation, after Harrell et al. (1996) and Efron and Tibshirani (1997).

1. Develop model on all n observations.
2. Calculate the statistic(s) of choice for evaluation on the same data (i.e. the training data) – call this Stat_app because it is the apparent value of the statistic.
3. Take a bootstrap sample, i.e. a sample of size n with replacement of rows of the data matrix. Keep track of which sites are in the bootstrap sample and which are excluded.
4. Fit the model on the bootstrap sample (using the same methods as used on the full set).
5. Compute the statistic on the bootstrap data set (observations vs. fitted values) and call it Stat_boot.
6. Also compute the statistic on a version of the bootstrap data where the observations are randomized (Stat_permute).
7. Use the bootstrap model to predict to the excluded data set; calculate the statistic on these predictions: Stat_excl.
8. Use Stat_boot, Stat_permute and Stat_excl to calculate the amount of overfitting, the relative overfitting rate and weights that are then used to make a best estimate of predictive performance, Stat_best_est. This statistic puts most emphasis on predictions to the excluded data, particularly when the model is overfitted (i.e. when Stat_boot − Stat_excl is large). For details of this step see Steyerberg et al. (2001b).
9. Measure how optimistic the fit on the bootstrap sample was: O = Stat_boot − Stat_best_est.
10. Repeat steps 3–7 100–200 times.
11. Calculate an average optimism, Ō.
12. Use Ō to correct Stat_app for its optimism: Stat_app − Ō. This is a near unbiased estimate of the expected value of the external predictive performance of the process that generated Stat_app.
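The steps of Appendix 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation used in the case study. The single-predictor logistic fit, the choice of ROC area as the statistic, and all function names are hypothetical. Step 8 defers to Steyerberg et al. (2001b) for its details, so the formulas used here for the relative overfitting rate R and the weight w are assumptions: they follow the form given by Efron and Tibshirani (1997), rewritten for a statistic where larger values are better. The loop also repeats steps 3 to 9, since the optimism O is needed for every replicate.

```python
import math
import random


def fit_logistic(xs, ys, steps=200, lr=0.5):
    """Fit p = sigmoid(a + b*x) by plain gradient ascent (illustrative only)."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        ga = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            ga += y - p
            gb += (y - p) * x
        a += lr * ga / n
        b += lr * gb / n
    return a, b


def predict(model, xs):
    a, b = model
    return [1.0 / (1.0 + math.exp(-(a + b * x))) for x in xs]


def auc(ys, ps):
    """ROC area: share of (presence, absence) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(ys, ps) if y == 1]
    neg = [p for y, p in zip(ys, ps) if y == 0]
    if not pos or not neg:
        return None  # undefined unless both classes are present
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_632plus(xs, ys, n_boot=100, seed=1):
    """Return (Stat_app, optimism-corrected statistic) for the toy model above."""
    rng = random.Random(seed)
    n = len(xs)
    stat_app = auc(ys, predict(fit_logistic(xs, ys), xs))        # steps 1-2
    optimism = []
    for _ in range(n_boot):                                       # step 10
        idx = [rng.randrange(n) for _ in range(n)]                # step 3
        out = sorted(set(range(n)) - set(idx))                    # excluded sites
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        if len(set(by)) < 2:
            continue  # resample with a single class: statistic undefined
        model = fit_logistic(bx, by)                              # step 4
        fitted = predict(model, bx)
        stat_boot = auc(by, fitted)                               # step 5
        shuffled = by[:]
        rng.shuffle(shuffled)
        stat_perm = auc(shuffled, fitted)                         # step 6
        stat_excl = auc([ys[i] for i in out],
                        predict(model, [xs[i] for i in out]))     # step 7
        if stat_excl is None:
            continue
        # Step 8 (assumed form): relative overfitting rate R and weight w.
        denom = stat_boot - stat_perm
        r = (stat_boot - stat_excl) / denom if denom > 0 else 0.0
        r = min(1.0, max(0.0, r))
        w = 0.632 / (1.0 - 0.368 * r)
        stat_best = (1.0 - w) * stat_boot + w * stat_excl
        optimism.append(stat_boot - stat_best)                    # step 9
    if not optimism:
        raise ValueError("no usable bootstrap replicates")
    mean_o = sum(optimism) / len(optimism)                        # step 11
    return stat_app, stat_app - mean_o                            # step 12
```

Calling `bootstrap_632plus(xs, ys)` on a list of predictor values and a matching list of 0/1 observations returns the apparent ROC area and its optimism-corrected counterpart. Because the weight w moves towards 1 as R grows, the corrected value leans more heavily on the excluded-data predictions when the model is overfitted, which is the behaviour step 8 describes.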



Page 19: Fauna habitat modelling and mapping: A review and case study in …au.fsc.org/download.fauna-habitat-modelling-and-mapping... · Fauna habitat modelling and mapping: A review and

HABITAT MODELLING FOR CONSERVATION PLANNING 737

National Parks & Wildlife Service (1998) Eden Fauna Modelling. New South Wales National Parks & Wildlife Service, New South Wales Regional Forest Agreement Steering Committee, Canberra.

National Parks and Wildlife Service (2000) Modelling Areas of Habitat Significance for Fauna and Flora in the Southern CRA. NPWS NSW, Canberra.

Nix H. (1986) A biogeographic analysis of Australian elapid snakes. In: Atlas of Elapid Snakes of Australia (ed. R. Longmore) pp. 4–15. Australian Government Publishing Service, Canberra.

Obuchowski N. A. (1994) Computing sample size for receiver operating characteristic studies. Invest. Radiol. 29, 238–43.

Pearce J. & Ferrier S. (2000) Evaluating the predictive performance of habitat models developed using logistic regression. Ecol. Model. 133, 225–45.

Pearce J. & Ferrier S. (2001) The practical value of modelling relative abundance of species for regional conservation planning: a case study. Biol. Conserv. 98, 33–43.

Pearce J., Ferrier S. & Scotts D. (2001a) An evaluation of the predictive performance of distributional models for flora and fauna in north-east New South Wales. J. Environ. Manage. 62, 171–84.

Pearce J. L., Cherry K., Drielsma M., Ferrier S. & Whish G. (2001b) Incorporating expert knowledge and fine-scale vegetation mapping into statistical modelling of faunal distribution. J. Appl. Ecol. 38, 412–24.

Phillips S. J., Anderson R. P. & Schapire R. E. (in press) Maximum entropy modeling of species geographic distributions. Ecol. Model.

Pressey R. L., Possingham H. P., Logan V. S., Day J. R. & Williams P. H. (1999) Effects of data characteristics on the results of reserve selection algorithms. J. Biogeogr. 26, 179–91.

Pulliam H. R. (1988) Sources, sinks, and population regulation. Am. Nat. 132, 652–61.

R Development Core Team (2004) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org

Reading R. P., Clark T. A., Seebeck J. H. & Pearce J. (1996) Habitat Suitability Index model for the eastern barred bandicoot, Perameles gunnii. Wildl. Res. 23, 221–35.

Reed P. C. & Lunney D. (1990) Habitat loss: the key problem for the long-term survival of koalas in New South Wales. In: Koala Summit: Managing Koalas in New South Wales (eds D. Lunney, C. A. Urquhart & P. C. Reed) pp. 9–31. University of Sydney, Sydney.

Ripley B. D. (1995) Pattern Recognition and Neural Networks – a Statistical Approach. Cambridge University Press, Cambridge.

Russel R. (1995) Yellow-bellied glider. In: The Mammals of Australia (ed. R. Strahan) pp. 226–8. Reed Books, Chatswood.

Steyerberg E. W., Eijkemans M. J. C., Harrell F. E. & Habbema J. D. F. (2001a) Prognostic modeling with logistic regression analysis: in search of a sensible strategy in small data sets. Med. Decis. Making 21, 45–56.

Steyerberg E. W., Harrell F. E., Borsboom G. J. J. M., Eijkemans M. J. C., Vergouwe Y. & Habbema J. D. F. (2001b) Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J. Clin. Epidemiol. 54, 774–81.

Stockwell D. & Peters D. (1999) The GARP modelling system: problems and solutions to automated spatial prediction. Int. J. Geogr. Inf. Sci. 13, 143–58.

Sutherland W. J. (1996) Ecological Census Techniques: a Handbook, 1st edn. Cambridge University Press, Cambridge.

ter Braak C. J. F. (1986) Canonical Correspondence Analysis: a new eigenvector technique for multivariate direct gradient analysis. Ecology 67, 1167–79.

Thompson S. K. (2002) Sampling, 2nd edn. Wiley, New York.

Tyre A. J., Tenhumberg B., Field S. A., Possingham H. P., Niejalke D. & Parris K. (2003) Improving precision and reducing bias in biological surveys by estimating false negative error rates in presence-absence data. Ecol. Appl. 13, 1790–801.

Van Horne B. (1983) Density as a misleading indicator of habitat quality. J. Wildl. Man. 47, 893–901.

Van Horne B. & Wiens J. A. (1991) Forest Bird Habitat Suitability Models and the Development of General Habitat Models. United States Department of the Interior, Fish and Wildlife Service, Washington, DC.

Venables W. N. & Ripley B. D. (2003) Modern Applied Statistics with S, 4th edn. Springer, New York.

Walters C. & Holling C. S. (1990) Large-scale management experiments and learning by doing. Ecology 71, 2060–8.

Wintle B. A., McCarthy M. A., Volinsky C. T. & Kavanagh R. P. (2003) The use of Bayesian Model Averaging to better represent the uncertainty in ecological models. Conserv. Biol. 17, 1579–90.

Wintle B. A., Elith R. J., Yamada K. & Burgman M. A. (2004) LHCCREMS fauna survey and mapping project. Module 2: Habitat modelling and conservation requirements. Lower Hunter & Central Coast Regional Environmental Management Strategy, Callaghan.

Wintle B. A., McCarthy M. A., Parris K. M. & Burgman M. A. (2004) Precision and bias of methods for estimating point survey detection probabilities. Ecol. Appl. 14, 703–12.

Wilson K. A., Westphal M. I., Possingham H. P. & Elith J. (2005) Sensitivity of conservation planning to different approaches to using species distribution data. Biol. Conserv. 122, 99–112.

Wintle B. A., Burgman M. A., McCarthy M. A. & Kavanagh R. P. (in press) The magnitude and management consequences of false negative observation error in surveys of arboreal marsupials and large forest owls. J. Wildl. Man.

Worton B. J. (1989) Kernel methods for estimating the utilization distribution in home-range studies. Ecology 70, 164–8.

Yencken D. & Wilkinson D. (2000) Resetting the Compass: Australia's Journey Towards Sustainability. CSIRO Publishing, Melbourne.

Zaniewski A. E., Lehmann A. & Overton J. M. (2002) Predicting species distribution using presence-only data: a case study of native New Zealand ferns. Ecol. Model. 157, 261–80.

APPENDIX 1

Method for 0.632+ bootstrap evaluation, after Harrell et al. (1996) and Efron and Tibshirani (1997).

1. Develop model on all n observations.
2. Calculate the statistic(s) of choice for evaluation on the same data (i.e. the training data); call this Stat_app because it is the apparent value of the statistic.
3. Take a bootstrap sample, i.e. a sample of size n, with replacement, of rows of the data matrix. Keep track of which sites are in the bootstrap sample and which are excluded.
4. Fit the model on the bootstrap sample (using the same methods as used on the full set).
5. Compute the statistic on the bootstrap data set (observations vs. fitted values) and call it Stat_boot.
6. Also compute the statistic on a version of the bootstrap data where the observations are randomized (Stat_permute).
7. Use the bootstrap model to predict to the excluded data set; calculate the statistic on these predictions (Stat_excl).
8. Use Stat_boot, Stat_permute and Stat_excl to calculate the amount of overfitting, the relative overfitting rate and weights that are then used to make a best estimate of predictive performance, Stat_best_est. This statistic puts most emphasis on predictions to the excluded data, particularly when the model is overfitted (i.e. when Stat_boot − Stat_excl is large). For details of this step, see Steyerberg et al. (2001b).
9. Measure how optimistic the fit on the bootstrap sample was: O = Stat_boot − Stat_best_est.
10. Repeat steps 3–7 100–200 times.
11. Calculate an average optimism, Ō.
12. Use Ō to correct Stat_app for its optimism: Stat_app − Ō. This is a near unbiased estimate of the expected value of the external predictive performance of the process that generated Stat_app.
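The procedure above can be sketched in code. The following Python example is a minimal illustration only, not the authors' implementation (the case study itself cites R). The function names `auc`, `fit_logistic` and `bootstrap_632plus`, the synthetic data, and the gradient-ascent logistic model are all hypothetical stand-ins. The appendix defers the formulas for step 8 to Steyerberg et al. (2001b); the sketch assumes the usual 0.632+ form, with relative overfitting rate R = (Stat_boot − Stat_excl) / (Stat_boot − Stat_permute) truncated to [0, 1], weight w = 0.632 / (1 − 0.368 R), and Stat_best_est = (1 − w) Stat_boot + w Stat_excl. Each replicate here runs through step 9 so that an optimism value is available for averaging.

```python
import math
import random


def auc(y, p):
    """ROC area via pairwise comparison of presence and absence scores."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    if not pos or not neg:
        return None  # undefined unless both classes are present
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def fit_logistic(X, y, steps=200, lr=0.5):
    """Hypothetical stand-in model: logistic regression by gradient ascent."""
    w = [0.0] * (len(X[0]) + 1)

    def prob(row):
        z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))
        return 1.0 / (1.0 + math.exp(-z))

    for _ in range(steps):
        grad = [0.0] * len(w)
        for row, yi in zip(X, y):
            err = yi - prob(row)
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w[:] = [wj + lr * g / len(y) for wj, g in zip(w, grad)]
    return lambda rows: [prob(r) for r in rows]


def bootstrap_632plus(X, y, fit, stat, n_boot=100, seed=0):
    """Return (Stat_app, optimism-corrected statistic)."""
    rng = random.Random(seed)
    n = len(y)
    stat_app = stat(y, fit(X, y)(X))                        # steps 1-2
    optimism = []
    for _ in range(n_boot):                                 # step 10
        idx = [rng.randrange(n) for _ in range(n)]          # step 3
        chosen = set(idx)
        out = [i for i in range(n) if i not in chosen]      # excluded sites
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        yo = [y[i] for i in out]
        if len(set(yb)) < 2 or len(set(yo)) < 2:
            continue  # statistic undefined for this replicate; skip it
        predict = fit(Xb, yb)                               # step 4
        fitted = predict(Xb)
        stat_boot = stat(yb, fitted)                        # step 5
        stat_perm = stat(rng.sample(yb, n), fitted)         # step 6
        stat_excl = stat(yo, predict([X[i] for i in out]))  # step 7
        # Step 8, using the assumed 0.632+ formulas described above.
        denom = stat_boot - stat_perm
        if denom > 0 and stat_excl < stat_boot:
            r = min(1.0, (stat_boot - stat_excl) / denom)
        else:
            r = 0.0
        wt = 0.632 / (1.0 - 0.368 * r)
        stat_best = (1.0 - wt) * stat_boot + wt * stat_excl
        optimism.append(stat_boot - stat_best)              # step 9
    if not optimism:
        raise ValueError("no usable bootstrap replicates")
    mean_opt = sum(optimism) / len(optimism)                # step 11
    return stat_app, stat_app - mean_opt                    # step 12


# Synthetic presence-absence data: one informative and one noise predictor.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(80)]
y = [1 if rng.random() < 1 / (1 + math.exp(-2 * row[0])) else 0 for row in X]
stat_app, stat_corrected = bootstrap_632plus(X, y, fit_logistic, auc, n_boot=50)
print(round(stat_app, 3), round(stat_corrected, 3))
```

Passing the model-fitting routine and the statistic as arguments keeps the resampling logic independent of the model, which matches step 4's requirement that each bootstrap fit use the same methods as the full-data fit.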
