HYDROLOGICAL PROCESSES
Hydrol. Process. 23, 3017–3029 (2009)
Published online 26 August 2009 in Wiley InterScience (www.interscience.wiley.com) DOI: 10.1002/hyp.7413

Test of statistical means for the extrapolation of soil depth point information using overlays of spatial environmental data and bootstrapping techniques

Helen E. Dahlke,1* Thorsten Behrens,2 Jan Seibert3,4 and Lotta Andersson5

1 Biological and Environmental Engineering, Cornell University, 165 Riley-Robb Hall, Ithaca, New York, 14853, USA
2 Physical Geography, Institute of Geography, University of Tuebingen, Ruemelinstrasse 19-23, 72070 Tübingen, Germany
3 Department of Geography, University of Zurich, CH-8057 Zurich, Switzerland
4 Department of Physical Geography and Quaternary Geology, Stockholm University, SE-106 91, Stockholm, Sweden
5 Swedish Meteorological and Hydrological Institute, Department of Research and Development, SE-601 76 Norrköping, Sweden

Abstract:

Hydrological modelling depends highly on the accuracy and uncertainty of model input parameters such as soil properties. Since most of these data are field surveyed, geostatistical techniques such as kriging, classification and regression trees or more sophisticated soil-landscape models need to be applied to interpolate point information to the area. Most of the existing interpolation techniques require a random or regular distribution of points within the study area but are not adequate to satisfactorily interpolate soil catena or transect data. The soil-landscape model presented in this study predicts soil information from transect or catena point data using a statistical mean (arithmetic, geometric and harmonic mean) to calculate the soil information based on class means of merged spatial explanatory variables. A data set of 226 soil depth measurements covering a range of 0–6.5 m was used to test the model. The point data were sampled along four transects in the Stubbetorp catchment, SE-Sweden. We overlaid a geomorphology map (8 classes) with digital elevation model-derived topographic index maps (2–9 classes) to estimate the range of error the model produces with changing sample size and input maps. The accuracy of the soil depth predictions was estimated with the root mean square error (RMSE) based on a testing and training data set. RMSE generally ranged between 0.73 and 0.83 m ± 0.013 m depending on the number of classes in the merged layers, but was smallest for a map combination with a low number of classes predicted with the harmonic mean (RMSE = 0.46 m). The results show that the prediction accuracy of this method depends on the number of point values in the sample, the value range of the measured attribute and the initial correlations between point values and explanatory variables, but suggest that the model approach is in general scale invariant. Copyright 2009 John Wiley & Sons, Ltd.

KEY WORDS soil-landscape modelling; hydrological modelling; soil depth; bootstrapping; soil attributes; soil attribute prediction; statistical mean; root mean square error

Received 18 November 2008; Accepted 16 June 2009

* Correspondence to: Helen E. Dahlke, Biological and Environmental Engineering, Cornell University, 165 Riley-Robb Hall, Ithaca, New York, 14853, USA. E-mail: [email protected]

INTRODUCTION

Digital high-resolution soil information and new approaches to capture landscape heterogeneity are in growing demand, both to improve existing hydrological models and to capture the space–time variability of hydrological processes. Soil depth is seen as one of the essential input parameters for distributed hydrological and environmental modelling. Soil depth, or the depth from the ground surface to the surface of the bedrock or an impermeable layer, is seen as a major control on soil–water storage and availability in many environments (Tromp-van Meerveld and McDonnell, 2006a). Soil depth significantly affects spatial soil moisture patterns (Burt and Butcher, 1985; Freer et al., 2002; Tromp-van Meerveld and McDonnell, 2006b) as well as subsurface and groundwater flow (Buttle and McDonald, 2002; Freer et al., 2002; Stieglitz et al., 2003). Soil depth or depth to bedrock is thus a standard variable used in many hydrological models such as the soil & water assessment tool (SWAT) (Arnold and Fohrer, 2005), the distributed hydrology soil vegetation model (DHSVM) (Wigmosta et al., 1994), the soil moisture distribution and routing model (SMDR) (Frankenberger et al., 1999) or TOPMODEL (Beven et al., 1984). To meet the growing demand for high-resolution spatial soil information, so-called quantitative soil-landscape methods are applied to extend conventional soil survey point observations to the landscape scale (Ryan et al., 2000; McBratney et al., 2003). Approaches applied to predict continuous soil attributes such as soil depth comprise simple linear regression, kriging and co-kriging (Odeh et al., 1994, 1995; Ryan et al., 2000), generalized linear models (McKenzie and Ryan, 1999), discriminant analysis (Sinowski and Auerswald, 1999) and landform evolution models (Saco et al., 2006).

The development of these models has especially been facilitated by advances in geographical information systems (GIS), digital elevation models (DEM), terrain analysis, statistical analysis and the increasing computing capacity during the last decade. Based on differences in the quality and type of field measurements of soil properties and the availability of additional spatial environmental explanatory variables, the available methods can be categorized into continuous and discrete approaches (Burrough, 1993). Common continuous approaches analyze the spatial continuity of a specific soil variable based on the variance of its distribution using geostatistical methods (e.g. kriging), or they include known environmental information (e.g. topographic, land use and substrate information) for the spatial distribution of the soil variable based on a regression model (Mertens et al., 2002). Discrete approaches such as Bayesian expert systems model categorical (nominal, ordinal or interval) soil attributes or soil classes through the integration of soil and landscape information into a semantic net and/or the definition of logical rules (Skidmore et al., 1996). Other methods that predict conventionally mapped soil-landscape units are fuzzy logic approaches (Zhu, 2000) and neural networks (Lehmann et al., 1999; Behrens et al., 2005) that use learning algorithms to train a network that predicts the desired output units based on mapped soil units.

Despite the great variety of continuous and discrete soil-landscape models and the advances made in their development, the approaches have limitations in their applicability to provide input parameters for distributed hydrological models. Discrete approaches provide soil information for spatial entities and hence provide the data structure required in most of the distributed or hydrological response unit (HRU)-based hydrological models. HRUs describe areas of homogeneous hydrological response based on similar topographical, pedological and geomorphological characteristics, which are extracted from an overlay of topographic, soil and land use data. The concept is based on the assumption that hydrological processes within a delineated hydrological response unit show a certain degree of homogeneity and therefore less variability as compared with surrounding area units. In comparison to raster-based hydrological models, it aims to reduce parameterization complexity and computing time, especially at regional and catchment scale applications (Flügel, 1995; Leavesley and Stannard, 1995). Following the HRU concept, discrete soil model approaches effectively facilitate the reduction of the spatial variability of hydrological processes in the landscape and reduce the time and effort to collect necessary soil attribute data in a study area (Park and van de Giesen, 2004). However, they bear the risk that the hydrological model application is bound to the scale of the pre-existing conventional soil surveys, which exist mostly in the range of 1 : 50 000 to 1 : 1 000 000 (e.g. 1 : 1 000 000 in Sweden) and are rather inflexible to scaling of the soil information (Olsson, 1999; Behrens and Scholten, 2007). Moreover, the development of soil unit-based quantitative soil models has reached a degree of complexity in the required user expertise and knowledge, on both the soil survey and the model side, that challenges their short-term applicability as simple tools to generate soil input data for hydrological models and modellers.

Continuous approaches have the advantage that they are easily applicable and place few demands on computation software (e.g. they are implemented in common GIS) and user expertise. However, most of the geostatistical methods require a large number of samples or frequent sampling for accurate predictions and bear the problem that even with established model functions, the capabilities to extrapolate the results outside the study area or catchment remain limited (Kravchenko, 2003). Geostatistical methods also assume a certain data structure such as a regular grid or uniform distribution (Odeh et al., 1994, 1995; Lane, 2002; Kravchenko, 2003; Lyon et al., 2006). Methods such as kriging, inverse distance weighting (IDW) and regression trees require a regular or random distribution of the point data that are scattered over the observation area. However, transect or catena data are usually not the subject of interpolation techniques, because their spatial representation for a defined area of interest is limited to the proximate surrounding of the catena and the incremental distance of the points along the catena. The application of common interpolation techniques (e.g. kriging and IDW) to catena point data results in a decrease of the predictive capacity the farther a point/cell to be predicted lies from the field-measured points. Typical artefacts such as stripes or facets are produced in the prediction maps, showing the decreasing ability of the interpolation algorithm to predict in areas that lack point observations.

The interpolation of soil information sampled with the catena approach therefore remains a challenge for geostatistical methods and soil-landscape modelling techniques. Most studies that use catena soil information are thus limited to small-scale applications such as single hillslopes and avoid predictions of larger landscape areas. Most of the interpolation of catena-sampled soil information is facilitated through the integration of digital terrain analysis into the interpolation process (Moore et al., 1993; Sommer and Schlichting, 1997; Gessler et al., 2000; Chamran et al., 2002). Statistical correlations between soil properties (such as soil moisture, net primary productivity, soil organic carbon, soil texture classes and especially soil depths) and terrain attributes generated from a DEM have been investigated since the late 1970s and have greatly enhanced the quantitative investigation of hydrological processes in soils (Beven and Kirkby, 1979; O’Loughlin, 1986; Moore et al., 1991). These studies contribute to the understanding of relations between topography, water movement and ecosystem processes and support quantitative and dynamic modelling of eco-hydrological processes through the integration of GIS-based terrain analysis and field observations (Chamran et al., 2002).

This study presents a soil-modelling technique to extrapolate soil-depth information from four transects (soil depth understood as depth to bedrock) to a small catchment in Sweden based on different maps of explanatory variables. Three statistical means (arithmetic, geometric and harmonic) are tested to predict soil depths based on class means derived from an overlay of the point observations with each class of a geomorphology map and different terrain maps. Using bootstrapping, the capability of the statistical means to predict soil depths and the model uncertainty are estimated for different levels of spatial disaggregation.

SITE DESCRIPTION

The Stubbetorp catchment (58°44′N, 16°21′E) is located about 120 km southwest of Stockholm in the eastern part of central South Sweden (Figure 1). The hilly catchment belongs to the upper part of the Kolmården mountain ridge, a region dominated by low-weathering gneissic granites that bounds the northern shore of Bråviken, a deeply incised bay of the Baltic Sea (Wikström, 1979). The main valley and the two side valleys of Stubbetorp catchment, which covers an area of 0.94 km², are northwest–southeast orientated following the major fault line in this region. Altitude in the catchment ranges from 80 m above sea level (asl) at the gauge to 130 m asl. The Stubbetorp catchment was completely covered with water after the last deglaciation period (Persson, 1982). Both glacial ice movements and the action of ocean waves, which left the top of the hills with little soil cover, influenced the present geomorphology and topography. In large parts of the catchment (46%), the bedrock is covered with till on which usually rather conductive, very stony soils depleted in fine material have developed. The eroded gravel and fine sediments have accumulated in depressions and in the main valley, where ombrotrophic peatlands and swamp forests (in total 10.5%) with a maximum peat depth of 6.5 m occur. The catchment is largely dominated by podzolic forest soils, whereas lithosols with rocky outcrops occur especially in the southeast part of the catchment. The mean slope of the catchment is 5.9° with a maximum slope of 26° in the area of the catchment outlet. Most of the catchment is forested (83%) with Pinus sylvestris and Picea abies of different age; deciduous tree species are less important and occur only in the wetland areas. The climate in the catchment is characterized by a mean annual precipitation of 666 mm and an annual potential evaporation of 432 mm (period 1985–1994). Mean annual runoff measured for the same time period was 230 mm (Pettersson, 1995).

MATERIALS

Soil depth measurements

Soil depth measurements (depth to bedrock) were available for two longer transects (485 m length) crossing the main valley in the upper part of the catchment and for two shorter transects in the central (210 m length) and lower part (120 m length) of the catchment (Figure 1). These soil depth measurements were obtained in 1994 using Georadar (Olofsson and Fleetwood, 1994). The derived data set consists of 226 points with an incremental distance of 5 m, with soil depths varying between zero and 6.5 m (Figure 2).

Figure 1. Study area: Stubbetorp catchment, central-southeast Sweden. Dots indicate locations of soil depth measurements used in this study. Grey areas indicate wetland areas, mapped in July 2005 in the catchment


Figure 2. Histogram and univariate statistics of the soil depth measurements in Stubbetorp catchment

Geomorphology and wetland areas

Geomorphological information about the catchment comprises different types of till, i.e. differentiations of the amount of boulders in the till, sediment deposits (sand, gravel) and bare rock areas mapped by the Department of Land and Water Resources of the Royal Institute of Technology (Olofsson and Fleetwood, 1994). This geomorphology layer originally included 13 classes, but was reclassified into 8 classes (Table I). The geomorphology information was extended by a more detailed map of wetland areas, which was derived using a differential GPS (Trimble TSC1, horizontal accuracy <0.5 m) during a field survey in July 2005. Areas were classified as wetlands if they showed signs of surface saturation or water tables close to the ground surface (e.g. bootprints would fill with water), hydric soils (Histosols, redox or gleyed soils) as well as hydrophytic vegetation (e.g. sedges, rushes and hydrophytic grass species). Both data sets, the wetland map and the geomorphology map, were combined in GIS using an overlay analysis known as Merge.

DEM

A DEM of 10 m resolution was generated in ArcInfo (ESRI Inc.) using the TOPOGRID function and 5 m isoline data of the Swedish land surveying office. This DEM was used to compute terrain maps of different topographic indices. All other data sets were converted to the same raster.

Table I. Class-ids and descriptions of the geomorphology map used in the soil model

ID  Class Description
1   Sand, gravel
2   Till, washed till, less amount of boulders
3   Till, washed till, normal amount of boulders
4   Till, washed till, rich in boulders, large boulders
5   Bare rock
6   Swamp forests
7   Bogs and fens
8   Wet depressions

METHODS

Digital terrain analysis

On the basis of the assumed interrelation between topography and soil depth (Gessler et al., 1995; Mitasova et al., 1995; Moore et al., 1991), we calculated 49 different terrain parameters (Table II) ranging from local parameters such as slope, curvature and aspect to more complex parameters such as distance to drainage divides or hillslope position based on equations found in Zevenbergen and Thorne (1987), Dikau (1989), Wood (1996), Shary et al. (2002) and Behrens (2003).

Terrain attribute selection

To select relevant terrain attributes, the Pearson product–moment correlation coefficients between each terrain parameter and observed soil depth were calculated. These correlation coefficients varied between −0.58 and 0.35. The four terrain attributes that were most strongly correlated with the observed soil depth were selected for further processing (Table II). These were the vertical distance to channel network (vd; r = −0.58) (Olaya, 2004), the elevation above channel (eac; r = −0.54) (McGuire et al., 2005), the relative profile curvature (rpc; r = −0.52) (Behrens et al., 2005) and the relative hillslope position (rhp; r = −0.42) (Behrens et al., 2005). The vd is based on the height difference between a certain cell and the stream-channel base-level elevation. The latter is computed by interpolation of the elevation values of stream cells to the surrounding area (Olaya, 2004). The eac describes the elevation of a cell above a cell in the stream channel. The elevation difference is obtained depending on where cells of the same flow path or steepest gradient flow path enter the stream (McGuire et al., 2005). The rpc is estimated using a moving window approach. Within a moving window of three-by-three cells, first the inclination of the cell in the centre to the surrounding cells is calculated, and secondly the rpc is obtained as the average inclination of all cells higher than the cell in the centre divided by the average inclination of all cells with lower elevation than the central cell (Behrens, 2003). The rhp is based on the subtraction of the distance to the ridges and the distance to channel. For the distance to channel (Behrens, 2003), the flow accumulation in square meters is used and for the distance to ridges, the inverse of the flow accumulation. Thus, values of zero indicate mid slope areas.
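The screening of terrain attributes by their correlation with soil depth can be sketched as follows. This is a minimal illustration that assumes the terrain attribute values have already been sampled at the soil depth points; the function names `pearson_r` and `strongest_attributes`, the attribute names and the synthetic data are hypothetical and are not taken from the study.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

def strongest_attributes(soil_depth, attributes, n_select=4):
    """Rank terrain attributes by |r| against soil depth and keep the top n."""
    r_values = {name: pearson_r(values, soil_depth)
                for name, values in attributes.items()}
    ranked = sorted(r_values, key=lambda name: abs(r_values[name]), reverse=True)
    return [(name, r_values[name]) for name in ranked[:n_select]]

# Synthetic illustration only; these are NOT the Stubbetorp measurements.
rng = np.random.default_rng(0)
depth = rng.uniform(0.0, 6.5, size=226)
attributes = {
    "attr_a": -0.8 * depth + rng.normal(0.0, 1.0, size=226),  # strong negative
    "attr_b": rng.normal(0.0, 1.0, size=226),                 # unrelated
    "attr_c": 0.3 * depth + rng.normal(0.0, 1.0, size=226),   # weak positive
}
selected = strongest_attributes(depth, attributes, n_select=2)
```

Ranking by the absolute value of r keeps strongly negative correlations, such as those reported for vd, eac, rpc and rhp, ahead of weaker positive ones.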

As these indices quantify terrain features by continuous values, we classified these values into a certain number of classes to extract class–average soil depths for the conceptual model. A k-means cluster algorithm was used to achieve an objective reclassification of the terrain attributes into a user-specified number of 2 to 9 classes. K-means clustering is an algorithm that attempts to find the centres of a user-specified number of natural clusters in a data set by minimizing the total intra-cluster variance through iterative shifting of the cluster centroids (Hartigan and Wong, 1979).
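The reclassification of a continuous terrain attribute into k classes can be illustrated with a short sketch. The text cites the Hartigan and Wong (1979) algorithm; the code below instead uses plain Lloyd-style iterations with quantile-based starting centres, which is an assumption made for brevity. The function name `kmeans_1d` and the synthetic values are hypothetical.

```python
import numpy as np

def kmeans_1d(values, k, n_iter=100):
    """Classify 1-D values into k classes with Lloyd-style k-means iterations."""
    values = np.asarray(values, dtype=float)
    # Deterministic start: spread the initial centres over the value quantiles.
    centres = np.quantile(values, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        # Assign every value to its nearest centre.
        labels = np.argmin(np.abs(values[:, None] - centres[None, :]), axis=1)
        new_centres = centres.copy()
        for j in range(k):
            members = values[labels == j]
            if members.size:  # keep the old centre if a cluster runs empty
                new_centres[j] = members.mean()
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    labels = np.argmin(np.abs(values[:, None] - centres[None, :]), axis=1)
    return labels + 1, centres  # class ids start at 1, as in the tables

# Synthetic attribute with two well-separated groups (illustration only).
rng = np.random.default_rng(1)
terrain_values = np.concatenate([rng.normal(0.0, 0.1, 50),
                                 rng.normal(5.0, 0.1, 50)])
class_ids, centres = kmeans_1d(terrain_values, k=2)
```

For a raster, the same function could be applied to the flattened cell values and the class ids reshaped back to the grid.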


Table II. Terrain parameters calculated for Stubbetorp catchment and Pearson product–moment correlation coefficients (r) estimated between soil depths and terrain attributes, respectively

Terrain Attributes                                        r
Vertical distance to channel network (Olaya, 2004)        −0.58
Elevation above channel (McGuire et al., 2005)            −0.54
Relative profile curvature (Behrens, 2003)                −0.52
Relative hillslope position (Hatfield, 1999)              −0.42
Minimum curvature (Wood, 1996)                            −0.41
Waxing/waning slopes (Huber, 1994)                        −0.36
Longitudinal curvature (Wood, 1996)                       −0.35
Mean curvature (Shary et al., 2002)                       −0.33
Mean curvature (Zevenbergen and Thorne, 1987)             −0.33
Mean curvature (Bolstad et al., 1998)                     −0.33
Mean curvature ‘high pass filter’ (Behrens, 2003)         −0.33
Mean curvature (Mc Nab, 1989)                             −0.32
True surface distance from streams (Behrens, 2003)        −0.31
Relative aspect curvature (Lehmeier and Kothe, 1992)      −0.28
Profile curvature (Shary et al., 2002)                    −0.27
Minimum curvature (Shary et al., 2002)                    −0.25
Height above channel (Behrens, 2003)                      −0.23
Maximum curvature (Shary et al., 2002)                    −0.23
Maximum curvature (Wood, 1996)                            −0.16
Horizontal curvature (Shary et al., 2002)                 −0.16
Plan curvature (Zevenbergen and Thorne, 1987)             −0.14
Difference curvature (Shary et al., 2002)                 −0.12
Solar insolation (Shary et al., 2002)                     −0.12
Vertical excess curvature (Shary et al., 2002)            −0.12
Plan curvature (Shary et al., 2002)                       −0.09
Surface volume above minimum elevation (Nogami, 1995)     −0.06
Topographic roughness (Behrens, 2003)                     −0.06
Surface area (Jenness, 2004)                              −0.03
Unsphericity (Shary et al., 2002)                         −0.02
Ring-curvature (Shary et al., 2002)                       −0.02
Aspect (Moore et al., 1993)                               −0.01
Gaussian curvature (Shary et al., 2002)                   0.02
Slope (Horn, 1981)                                        0.04
Surface runoff velocity (Moore et al., 1991)              0.04
Gradient Factor (Shary et al., 2002)                      0.04
Gradient Factor (Behrens, 2003)                           0.04
Total accumulation curvature (Shary et al., 2002)         0.04
Horizontal excess curvature (Shary et al., 2002)          0.07
Cross-curvature (Wood, 1996)                              0.09
Rotor curvature (Shary et al., 2002)                      0.11
Reflectance map (Florinsky, 1998)                         0.12
Topographic index (Beven and Kirkby, 1979)                0.13
Slope-length-factor (Moore et al., 1991)                  0.16
Relative height curvature (Behrens, 2003)                 0.17
Cross-curvature (Moore et al., 1991)                      0.24
Hemispherical dispersion (Hodgson and Gaile, 1999)        0.26
Longitudinal curvature (Moore et al., 1991)               0.27
Steepest downslope (Tarboton, 1997)                       0.28
Profile curvature (Zevenbergen and Thorne, 1987)          0.35

For the final selection of the terrain attributes as input data sets for the soil model, both a clustering of the four single terrain parameters into 2 to 9 classes and parameter combinations of two, three and all four terrain parameters were tested, resulting in 104 data sets. Parameter combinations were tested in order to artificially generate terrain maps with a varying number of classes whose spatial disaggregation could best explain the spatial variability of the measured soil depths. Since one of the aims of this study is to test the model’s applicability to predict soil depth for variously spatially disaggregated input data sets, the lack of sufficient environmental data sets as input data for the model was compensated by terrain maps with a variable number of classes generated through the combination of different terrain parameters. To extract the terrain parameter or parameter combination that showed the highest class dissimilarity, the F-value of a one-way analysis of variance was calculated for each terrain data set. The F-value is a measure of how well the spatial variance of the classified terrain map represents the distribution of soil depth in the catchment, and thus of whether the terrain map can be selected as an input data set for the conceptual soil model (Table III).
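The F-value used to compare classified terrain maps is the standard one-way analysis of variance statistic, i.e. the between-class mean square divided by the within-class mean square. A minimal sketch, assuming each soil depth point has already been assigned the class id of the terrain map it falls in (the function name `one_way_f` and the toy values are hypothetical):

```python
import numpy as np

def one_way_f(values, class_ids):
    """F statistic of a one-way ANOVA of `values` grouped by `class_ids`.

    Requires at least two non-empty classes and more points than classes.
    Classes without any points never appear in `class_ids`, so they are
    ignored automatically.
    """
    values = np.asarray(values, dtype=float)
    class_ids = np.asarray(class_ids)
    groups = [values[class_ids == c] for c in np.unique(class_ids)]
    n, k = values.size, len(groups)
    grand_mean = values.mean()
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return float((ss_between / (k - 1)) / (ss_within / (n - k)))

# Toy example: three classes with means 2, 3 and 6 (illustration only).
toy_depths = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
toy_classes = [1, 1, 1, 2, 2, 2, 3, 3, 3]
f_value = one_way_f(toy_depths, toy_classes)  # between SS = 26, within SS = 6
```

A larger F-value means the class means differ more relative to the scatter within classes, which is the sense in which Table III ranks the terrain maps.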

Soil model approach

The soil model approach aims to generate spatial maps of soil characteristics (in this study: soil depth) based on catena point information. The approach can generate either user-defined discrete landscape units, like the entities used in HRUs, or semi-continuous raster maps. The general approach is based on class means resulting from an overlay of the soil-depth measurements with each class of any nominal data set (e.g. geomorphology and terrain layer). The approach assumes that each environmental data set used in the model represents actual differences in the soil characteristic to be modelled in an area of interest. The class means are calculated as arithmetic means over all points located in spatial units with the same class-id. Assuming that the catena of soil depth points crosses several spatial units in each spatial data layer, the information of each class can be spread over the study site if overlaid with other spatial data sets and their class means. Analogous to the regionalization concept (Diekkrueger et al., 1999), the overlay of two or more spatial data sets thus results in the disaggregation of the study site into smaller discrete units. The ‘real’ soil depth of these units is approached more closely the more data sets are used in the model and the more strongly the explanatory variables correlate with the measured soil attribute.
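The class-mean step can be sketched in a few lines. The example assumes that every soil depth point already carries the class id of the raster cell it falls in; the helper names (`class_means`, `spread_over_raster`, `overlay`) and the tiny rasters are hypothetical and only illustrate the idea, not the GIS workflow used in the study.

```python
import numpy as np

def class_means(depths, point_classes):
    """Arithmetic mean soil depth of the points falling into each class id."""
    depths = np.asarray(depths, dtype=float)
    point_classes = np.asarray(point_classes)
    return {int(c): float(depths[point_classes == c].mean())
            for c in np.unique(point_classes)}

def spread_over_raster(class_raster, means):
    """Write each class mean into all cells of that class (NaN if unsampled)."""
    out = np.full(class_raster.shape, np.nan)
    for class_id, mean_depth in means.items():
        out[class_raster == class_id] = mean_depth
    return out

def overlay(raster_a, raster_b):
    """Combine two class-id rasters into one raster of unique (a, b) units."""
    n_b = int(raster_b.max()) + 1
    return raster_a * n_b + raster_b

# Tiny illustrative rasters; class 3 of `geomorph` holds no depth points.
geomorph = np.array([[1, 1, 2], [2, 3, 3]])
terrain = np.array([[1, 2, 1], [2, 1, 2]])
means = class_means([0.5, 1.5, 4.0], [1, 1, 2])
depth_map = spread_over_raster(geomorph, means)
units = overlay(geomorph, terrain)
```

Cells of classes that no transect crosses stay undefined here, which mirrors the empty classes discussed for Table IV.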

In this study, we tested three statistical means (arithmetic, geometric and harmonic mean) to predict the soil depth for Stubbetorp catchment from class means of the generated terrain maps and the geomorphology map.
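One way to read this step is that, for every raster cell, the class mean from the terrain map and the class mean from the geomorphology map are combined with one of the three means. The sketch below follows that reading, which is an assumption rather than a detail confirmed by the text; the function name `combine_class_means` and the example values are hypothetical.

```python
import numpy as np

def combine_class_means(layers, kind="arithmetic"):
    """Combine per-layer class-mean rasters cell by cell with the chosen mean."""
    stack = np.stack([np.asarray(layer, dtype=float) for layer in layers])
    n_layers = stack.shape[0]
    if kind == "arithmetic":
        return stack.mean(axis=0)
    if kind == "geometric":
        return np.prod(stack, axis=0) ** (1.0 / n_layers)
    if kind == "harmonic":
        # A zero class mean (e.g. bare rock) yields a harmonic mean of zero.
        with np.errstate(divide="ignore"):
            return n_layers / np.sum(1.0 / stack, axis=0)
    raise ValueError(f"unknown mean: {kind}")

# Class-mean depths (m) of two layers for two cells (illustration only).
terrain_means = np.array([1.0, 4.0])
geomorph_means = np.array([4.0, 4.0])
layers = [terrain_means, geomorph_means]
arithmetic = combine_class_means(layers, "arithmetic")
geometric = combine_class_means(layers, "geometric")
harmonic = combine_class_means(layers, "harmonic")
```

For non-negative inputs the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean, so the harmonic mean pulls the prediction towards the shallower of the class means.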

Model fitting and validation

The set of 226 soil depth points was split into trainingand testing data sets of pre-defined size to evaluate thespatial soil depths predictions and the model error of thedifferent soil models. To estimate the model performance,we applied a bootstrapping technique. Bootstrapping is astatistical method to estimate standard errors by sampling,

Copyright 2009 John Wiley & Sons, Ltd. Hydrol. Process. 23, 3017–3029 (2009)DOI: 10.1002/hyp

Page 6: Test of statistical means for the extrapolation of soil ...

3022 H. E. DAHLKE ET AL.

Table III. Variability of F-values as a measure of class dissimilarity in mean soil depth, tested for all possible terrain parameters and parameter combinations of the cluster analysis

Terrain      Combination        Number of classes
parameters                        2       3       4       5       6       7       8       9

1            vd               11.27    9.86    8.18    8.10    6.84    8.11    7.30    7.51
             rhp              37.28   27.54   17.14   17.66   13.10   16.46   10.72    9.61
             eac              46.85   24.13   19.33   17.07   17.41   16.86   19.29   17.12
             rpc              38.81   55.78   24.28   20.36   23.75   18.90   15.68   13.37

2            eac/rhp          46.85   24.13   21.94   17.07   21.66   25.45   27.62   14.87
             eac/rpc          40.10   52.91   30.56   28.46   25.82   25.45   27.62   25.64
             eac/vd           46.18   25.74   21.76   16.85   17.82   15.89   15.33   13.59
             rhp/rpc          39.53   59.21   25.98   23.64   24.81   19.45   15.14   16.30
             vd/rhp           22.18   11.41   19.61   19.92   20.72   11.14   10.16    9.45
             vd/rpc           39.53   53.20   26.80   21.81   25.13   19.46   16.23   15.04

3            eac/vd/rpc       40.10   53.15   30.56   25.99   25.89   26.29   23.18   24.02
             eac/vd/rhp       46.18   25.74   22.51   19.99   19.99   17.73   17.78   14.99
             eac/rhp/rpc      39.74   52.91   30.56   25.99   25.89   25.45   25.58   21.68

4            eac/rhp/rpc/vd    8.95   52.91   30.56   26.94   28.44   21.66   23.18   21.69

Note: vd, vertical distance to channel network; eac, elevation above channel; rpc, relative profile curvature; rhp, relative hillslope position. The higher the F-value, the better is the class separation of the arithmetic class means. The highest F-value reached for each group of classes is highlighted in bold.

Table IV. Number of soil depth points in each class of the raster maps

Number of   Raster maps used    Class id                                        Total
classes     in the overlay        1    2    3    4    5    6    7    8    9

2           eac                  33  193                                         226
3           rhp rpc              24   55  147                                    226
4           eac rpc             101   80   11   34                               226
5           eac rpc              71   94   34    7   20                          226
6           eac vd rhp rpc        5   17   69   45   12   80                     226
7           eac vd rpc           86   30   15   68    5    7   17                226
8           eac rpc              49   47    0   20   67   22   14    7           226
9           eac rpc              64   14   11   15   45   53    4    0   20      226
8           geomorphology        12    3   40   83   37   17   30    4           226

Note: vd, vertical distance to channel network; eac, elevation above channel; rpc, relative profile curvature; rhp, relative hillslope position.

where the samples are repeatedly replaced (Efron, 1981). In this study, we used bootstrapping to estimate the root mean square error (RMSE) between predicted soil depths calculated from the training set and the soil depths of the testing data set, which served as expected values. Although the original data set was split into equally sized training and testing data sets (113/113 points), we expected the RMSE to be strongly influenced by the sample size of some of the raster map classes. Some of the terrain maps with a high number of classes contain classes with only a few soil depth points or even none (empty classes) (Table IV). Owing to the large data range of measured soil depths, the sample mean of these classes and the RMSE are greatly influenced by the values picked during the bootstrapping.

We calculated the RMSE for different scenarios to estimate the quality of the predicted soil depth maps, using bootstrapping with 5000 iterations for each test. In detail, we tested three different scenarios for validation and calculated the RMSE as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - y_i\right)^2}$$

where $x_i$ is the estimated soil depth, calculated from the arithmetic class means of two classes when combining two input maps with one of the statistical means, $y_i$ is a soil depth point of the testing data set and $n$ is the number of compared value pairs. The three measures for validation were the following:

1. The RMSE was calculated between the estimated soil depth ($x_i$) of a certain class combination of the training data set and each of the respective soil depth points of the testing set ($y_i$) located in exactly the same class combination, in the following referred to as the RMSE_single value.

2. The RMSE was calculated between the estimated soil depth ($x_i$) of a class combination of the training data set and the class average of the soil depth points ($y_i$) of either the geomorphology or the terrain map class that the class combination consists of, in the following referred to as the RMSE_class value.

3. The RMSE calculated on class level was averaged to a total RMSE of a given map combination (e.g. geomorphology and 2-classes terrain map) to compare the quality of the different dissolved soil depth maps; for the remainder of this article it is referred to as RMSE_total.
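The validation loop behind the first and third scenario can be sketched as follows. This is a simplified sketch under stated assumptions, not the procedure used in the study: each iteration draws a fresh random 50/50 split (the exact resampling scheme is not spelled out in the text, and sampling with replacement via `random.choices` could be substituted), the harmonic mean is set to zero when a class mean is zero, and the 40 data points are randomly generated placeholders. All function names are introduced here for illustration only.

```python
import math
import random
from collections import defaultdict


def rmse(estimates, observations):
    """Root mean square error between paired estimates and observations."""
    n = len(estimates)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(estimates, observations)) / n)


def class_means(points, key):
    """Arithmetic mean depth per class of the layer named by key."""
    groups = defaultdict(list)
    for p in points:
        groups[p[key]].append(p["depth"])
    return {c: sum(v) / len(v) for c, v in groups.items()}


def harmonic(a, b):
    # Assumption: a zero class mean yields a zero estimate.
    return 0.0 if a == 0 or b == 0 else 2.0 * a * b / (a + b)


def rmse_single(train, test):
    """RMSE per class combination: training estimate vs. testing points
    located in exactly the same combination."""
    geo = class_means(train, "geo")
    ter = class_means(train, "terrain")
    observed = defaultdict(list)
    for p in test:
        # Classes without training points cannot be estimated; skip them.
        if p["geo"] in geo and p["terrain"] in ter:
            observed[(p["geo"], p["terrain"])].append(p["depth"])
    result = {}
    for (g, t), depths in observed.items():
        estimate = harmonic(geo[g], ter[t])
        result[(g, t)] = rmse([estimate] * len(depths), depths)
    return result


def bootstrap_total_rmse(points, iterations=200, seed=0):
    """Average the per-combination RMSE into one total per iteration."""
    rng = random.Random(seed)
    half = len(points) // 2
    totals = []
    for _ in range(iterations):
        shuffled = list(points)
        rng.shuffle(shuffled)
        per_combo = rmse_single(shuffled[:half], shuffled[half:])
        if per_combo:
            totals.append(sum(per_combo.values()) / len(per_combo))
    return totals


# Hypothetical data: 40 points with random classes and depths (m).
data_rng = random.Random(42)
points = [{"geo": data_rng.choice("AB"), "terrain": data_rng.choice([1, 2]),
           "depth": data_rng.uniform(0.0, 3.0)} for _ in range(40)]
totals = bootstrap_total_rmse(points)
mean_total = sum(totals) / len(totals)
```

The list `totals` corresponds to the distribution that a box-and-whisker plot of the total RMSE would summarize; classes with very few points make the per-combination values vary strongly between iterations.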

RESULTS

Terrain layer selection and classification

F-values were calculated for all combinations of terrain attributes. For each class category (number of classes), the highest F-value was determined and the terrain map that showed the best class separation among all raster maps was selected. Table III shows the F-values obtained for all single terrain parameters and terrain parameter combinations. The best F-value reached for each class category is highlighted in bold. Where more than one terrain map reached the best F-value, we chose the terrain map with the lowest number of combined terrain parameters on the basis of Ockham’s razor (Wolpert, 1990).
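The selection rule can be illustrated with the F-values of the 2- and 3-class columns of Table III. The sketch below is only an illustration: `select_terrain_map` is a hypothetical helper, and it resolves ties solely by the number of combined parameters, so ties between equally simple maps are not resolved by this rule.

```python
# F-values from Table III for the 2-class and 3-class terrain maps.
F_VALUES = {
    "vd": (11.27, 9.86),
    "rhp": (37.28, 27.54),
    "eac": (46.85, 24.13),
    "rpc": (38.81, 55.78),
    "eac/rhp": (46.85, 24.13),
    "eac/rpc": (40.10, 52.91),
    "eac/vd": (46.18, 25.74),
    "rhp/rpc": (39.53, 59.21),
    "vd/rhp": (22.18, 11.41),
    "vd/rpc": (39.53, 53.20),
    "eac/vd/rpc": (40.10, 53.15),
    "eac/vd/rhp": (46.18, 25.74),
    "eac/rhp/rpc": (39.74, 52.91),
    "eac/rhp/rpc/vd": (8.95, 52.91),
}


def select_terrain_map(f_values, column):
    """Pick the map with the highest F-value in the given column.

    Ties are broken in favour of the map that combines the fewest
    terrain parameters (Ockham's razor).
    """
    best = max(values[column] for values in f_values.values())
    tied = [name for name, values in f_values.items() if values[column] == best]
    return min(tied, key=lambda name: len(name.split("/")))


best_2_classes = select_terrain_map(F_VALUES, 0)
best_3_classes = select_terrain_map(F_VALUES, 1)
```

For two classes, eac and eac/rhp share the highest F-value, and the single-parameter map eac is kept; for three classes, rhp/rpc has the highest F-value outright. Both choices agree with the raster maps listed in Table IV.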

Variability of soil depth measurements and class combinations

Table V summarizes the number of soil depth points available for each class combination when the geomorphology map is merged with a terrain map, assuming all 226 soil depth points are used in the model. With respect to the three validation scenarios stated in the section ‘Model fitting and validation’, the best way to validate the estimated soil depths is to compare the estimated soil depth of an area with soil depth points located in exactly the same area. Since the soil depth points in our study are non-uniformly distributed over the catchment (see Figure 1), the estimated soil depth can only be verified directly for a few class combinations with soil depth points located in exactly the same area. Table V summarizes the maximum number of available points for each class combination to estimate the soil depths that can be validated directly or indirectly with soil depths from the testing data set. The 8-classes and the 9-classes terrain maps both have ‘empty’ classes (class 3 of eac rpc8; class 8 of eac rpc9) that contain no representative soil depth points for the calculation of a class mean (Table IV).

Soil model test using bootstrapping

Results of the total RMSE averaged over 5000 bootstrapping iterations using the harmonic mean are shown in Figure 3. Tests of the arithmetic and geometric mean to predict soil depths for each geomorphology and terrain map class combination were also performed. However, the results of the total RMSE, RMSE_class value and RMSE_single value indicated a poorer performance of these two statistical means as predictors compared with the harmonic mean. Both showed in general higher RMSE in all validation scenarios and predicted smaller soil depth ranges in the output maps than the original measurements and the predictions made with the harmonic mean. The 226 point observations of soil depth ranged from 0 to 6.5 m. The use of the arithmetic mean to calculate class means would have resulted in non-zero values and would have biased the predicted soil depth in areas (e.g. bare soil areas) where the majority of soil depth points are zero. Initial tests computing the coefficient of determination between the predicted soil depth maps and class means of the original soil depth measurements resulted in the lowest coefficients for the maps predicted with the arithmetic mean (max. R² = 0.60) and the highest coefficients for the maps predicted with the harmonic mean (max. R² = 0.73). Consequently, only assessments based on the harmonic mean were selected for further analyses.

The different map combinations shown in Figure 3 resulted in similar mean RMSE values for the compared statistical means, with slightly decreasing RMSE values with increasing number of classes. The means of the calculated total RMSE values decrease from approximately 0.82 m (12.6% of the total data range) for the 2-classes terrain map combination to about 0.73 m (11.1% of the total data range) for the 9-classes terrain map combination. For convenience, the number of classes in the respective terrain maps is used in the remaining sections to distinguish the tested map combinations in further interpretations.

Validation results of the single RMSE (RMSE_single values), the class RMSE (RMSE_class values) and a comparison of estimated and predicted soil depths are shown for the harmonic mean in Figure 4. For the majority of the estimated soil depths, the single and

Figure 3. Box-and-whisker plot of total RMSE reached for the harmonic mean and different map combinations. The RMSE are sorted according to the number of classes of the terrain map used in the overlay with the geomorphology map. The diagram shows for each map combination the median, the upper and lower quartile and the smallest and largest observed RMSE during the 5000 bootstrapping iterations

Table V. Maximum number and number of exactly located soil depth points available for the prediction of soil depths for each geomorphology-terrain map combination based on all 226 points

Terrain  Terrain map     Class    N   Maximal available number of points       Exactly located points
classes                  id           (geomorphology class id)                 (non-empty cells, in order of
                                        1    2    3    4    5    6    7    8   geomorphology class id)

                                  N    12    3   40   83   37   17   30    4

2        eac             1       33    45   36   73  116   70   50   63   37   3 7 19 4
                         2      193   205  196  233  276  230  210  223  197   12 3 37 76 18 17 3

3        rhp rpc         1       24    36   27   64  107   61   41   54   28   16 8
                         2       55    67   58   95  138   92   72   85   59   12 5 4 2 6 26
                         3      147   159  150  187  230  184  164  177  151   3 35 63 27 11 4 4

4        eac rpc         1      101   113  104  141  184  138  118  131  105   4 1 23 4 3 13 16 1
                         2       80    92   83  120  163  117   97  110   84   2 13 31 31 3
                         3       11    23   14   51   94   48   28   41   15   9 2
                         4       34    46   37   74  117   71   51   64   38   8 4 3 1 4 14

5        eac rpc         1       71    83   74  111  154  108   88  101   75   13 28 28 2
                         2       94   106   97  134  177  131  111  124   98   4 3 21 37 13 16
                         3       34    46   37   74  117   71   51   64   38   8 4 3 1 4 14
                         4        7    19   10   47   90   44   24   37   11   5 2
                         5       20    32   23   60  103   57   37   50   24   2 1 6 2

6        eac vd rhp rpc  1        5    17    8   45   88   42   22   35    9   2 1
                         2       17    29   20   57  100   54   34   47   21   6 1 2 8
                         3       69    81   72  109  152  106   86   99   73   6 1 19 1 13 2
                         4       45    57   48   85  128   82   62   75   49   7 19 17 2
                         5       12    24   15   52   95   49   29   42   16   2 2 6 2
                         6       80    92   83  120  163  117   97  110   84   2 12 5 12 2 2

7        eac vd rpc      1       86    98   89  126  169  123  103  116   90   2 15 55 1 2 2
                         2       30    42   33   70  113   67   47   60   34   4 18 8
                         3       15    27   18   55   98   52   32   45   19   6 1 2 6
                         4       68    80   71  108  151  105   85   98   72   6 1 18 8 13 22
                         5        5    17    8   45   88   42   22   35    9   2 1
                         6        7    19   10   47   90   44   24   37   11   3 3 1
                         7       17    29   20   57  100   54   34   47   21   14 3

8        eac rpc         1       49    61   52   89  132   86   66   79   53   2 13 23 11
                         2       47    59   50   87  130   84   64   77   51   5 1 7 6 8 2
                         3        0    12    3   40   83   37   17   30    4   -
                         4       20    32   23   60  103   57   37   50   24   2 14 4
                         5       67    79   70  107  150  104   84   97   71   1 17 38 7 4
                         6       22    34   25   62  105   59   39   52   26   14 8
                         7       14    26   17   54   97   51   31   44   18   6 2 6
                         8        7    19   10   47   90   44   24   37   11   3 4

9        eac rpc         1       64    76   67  104  147  101   81   94   68   1 17 33 9 4
                         2       14    26   17   54   97   51   31   44   18   6 2 6
                         3       11    23   14   51   94   48   28   41   15   9 2
                         4       15    27   18   55   98   52   32   45   19   2 7 4 2
                         5       45    57   48   85  128   82   62   75   49   5 1 7 6 6 2
                         6       53    65   56   93  136   90   70   83   57   2 13 23 15
                         7        4    16    7   44   87   41   21   34    8   1 3
                         8        0    12    3   40   83   37   17   30    4   -
                         9       20    32   23   60  103   57   37   50   24   14 6

Note: N is the maximum number of soil depth points located in each class of each map. Light grey cells mark class combinations whose estimated soil depths can be validated directly with soil depth points located in exactly the same class combination. Dark grey cells mark class combinations that contain no direct validation points but can be compared with the class mean of the testing data set. Black cells mark class combinations that occur in the final prediction maps but whose soil depths cannot be calculated owing to a lack of soil depth points in one or both of the combined classes (empty classes). White cells mark class combinations that do not occur in the final prediction maps.

class RMSE stay in the range of the calculated total RMSE and the data set’s standard deviation of 1.09 m. The single and class RMSE exceed the mean total RMSE for estimated soil depths greater than 1 m. This was expected considering the value range of measured (0–1 m) and predicted soil depths (0–0.54 m) (Figure 4b). RMSE_class values are generally larger than RMSE_single values because of the greater data range resulting from the comparison of an estimated soil depth point with the mean soil depth of a layer class. The small RMSE_single values indicate that the soil depths predicted with the harmonic mean differ only slightly from the soil depths actually measured in the area of a certain class.

Figure 4. Comparison of RMSE and estimated soil depths calculated with the harmonic mean. Diagram (a) shows the estimated soil depths (black dots) and the results of the two validation scenarios RMSE_single value (black crosses) and RMSE_class value (grey diamonds). The RMSE_single values result from a comparison of the estimated soil depth of a certain class combination with validation points located in areas with the same class combination, and the RMSE_class values show the comparison of the estimated soil depth of a certain class combination with all validation points of either one of the combined classes in a map combination. Diagram (b) shows a comparison of minimum and maximum estimated soil depths predicted with the testing and training data sets

Predicted soil depths maps

Maps of estimated soil depth were generated with the harmonic mean for each map combination (Figure 5). The predicted soil depth maps show an increasing degree of spatial disaggregation the more classes the spatial data sets in the overlay process have. The number of entities in the prediction maps increases from 128 to 1438 when the geomorphology map is overlaid with terrain maps of two classes (minimum) up to nine classes (maximum). Similarly, the size of the largest spatial entity in the predicted soil maps decreases from 160 800 m² to 32 300 m². Soil depth maps predicted with the 8- or 9-classes terrain layer exhibit ‘empty’ or ‘no-data’ areas, where the soil depth cannot be modelled. Both terrain layers lack soil depth points in one of the classes to calculate the class mean. The ‘no-data’ areas cover 0.034 km² in the geomorphology/8-terrain classes map and 0.031 km² in the geomorphology/9-terrain classes map. These areas equal 3.9% and 3.3% of the catchment area (0.942 km²), respectively.

Soil depth maps with a higher degree of spatial disaggregation also show a greater range of predicted soil depths. With an increasing number of terrain classes included in the predicted soil depth map, the minimum soil depth decreased from 0.50 m to 0.31 m, whereas the maximum and average soil depths increased from 2.24 m to 3.04 m and from 1.2 m to 1.68 m, respectively (Figure 6).

RMSE were calculated between the soil catena points and the cell values in the soil depth prediction maps to identify the most suitable soil depth prediction map (Table VI). The combination of the geomorphology map with the 2-terrain-classes map reached the best coefficients among all map combinations and tested statistical means. The lowest RMSE (RMSE = 0.46 m) was reached for the geomorphology/2-terrain classes map predicted with the harmonic mean, which also showed the highest R². The second lowest RMSE (RMSE = 0.61 m) was reached for the geomorphology/5-terrain classes map. The prediction error of these two map combinations was less than 10% of the overall soil depth range measured in the catchment.
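The relative errors quoted here follow from dividing each RMSE by the measured soil depth range of 6.5 m. The short sketch below recomputes them from the RMSE row of Table VI; because the tabulated RMSE values are rounded, the recomputed percentages can differ from the tabulated ones in the last digit. The variable names are chosen here for illustration.

```python
# Measured soil depth range in the catchment (0 to 6.5 m).
DEPTH_RANGE_M = 6.5

# RMSE (m) per number of terrain classes, taken from Table VI.
RMSE_M = {2: 0.46, 3: 0.69, 4: 0.73, 5: 0.61, 6: 0.72, 7: 0.80, 8: 0.94, 9: 0.87}

# RMSE expressed as a percentage of the soil depth range.
error_percent = {k: 100.0 * v / DEPTH_RANGE_M for k, v in RMSE_M.items()}

# Map combination with the lowest RMSE and all maps below the 10% mark.
best_map = min(RMSE_M, key=RMSE_M.get)
below_10_percent = sorted(k for k, e in error_percent.items() if e < 10.0)
```

Only the 2-class and 5-class terrain map combinations stay below 10% of the data range, which matches the statement above.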

DISCUSSION

The R² reached in the soil depth prediction maps agrees well with accuracies achieved for most quantitative spatial soil models (Beckett and Webster, 1971; Ryan et al., 2000). According to Beckett and Webster (1971), R² values greater than 0.7 are unusual for most spatial models and R² values of 0.5 or less are common. In this study, the RMSE of the best soil depth prediction maps (2 and 5 terrain classes, Table VI) was smaller than 10% of the data range. This shows that the presented soil model approach is a computationally undemanding, easily applicable method that predicts the spatial variability of soil depth more accurately than a single explanatory variable.

Figure 5. Maps of estimated soil depths using the harmonic mean as prediction model. Each map shows a combination of the geomorphology layer, consisting of 8 classes, and a terrain layer with a varying number of classes (2–9 terrain classes). Grey areas indicate ‘empty’ classes, where soil depths could not be estimated because point data were lacking in the training data set

Figure 6. Comparison of minimum, maximum and mean soil depth for each produced soil depth map using the harmonic mean. Statistics are sorted according to the number of classes in the terrain map used in the overlay with the geomorphology map

The fact that the geomorphology/2-terrain classes map reached the lowest RMSE among all tested statistical means was unexpected, because both the value range of estimated soil depths and the degree of spatial disaggregation were smaller in the final prediction map than in map combinations with more classes. However, this can be explained by the clustering approach that was used to reclassify the terrain attributes to generate the second input layer for the overlay process. The k-means clustering algorithm used in this study randomly generates k clusters from the continuous terrain attribute maps. The final location and size of the clusters are, however, statistically determined by the convergence criterion that needs to be met for each cluster (Hartigan and Wong, 1979). The terrain classes resulting from the clustering depend on statistical differences in topography but might not reflect the actual soil depth variability in the watershed. An expert-based differentiation and reclassification of the terrain attributes as input layer is therefore suggested for future applications.
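To make the clustering step more tangible, the sketch below groups a handful of raster cells by two terrain attributes. It is not the Hartigan and Wong (1979) algorithm cited above but a plain Lloyd-style k-means, swapped in here because it is short; the attribute values, the function name `kmeans` and its parameters are hypothetical. The clusters depend only on the terrain attributes, which is why the resulting classes need not follow soil depth.

```python
import random


def kmeans(values, k, iterations=50, seed=0):
    """Plain Lloyd-style k-means on a list of feature tuples.

    Returns one cluster label per input tuple and the final cluster centres.
    """
    rng = random.Random(seed)
    centres = rng.sample(values, k)  # random initial centres
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: nearest centre by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centres[c])))
            for v in values
        ]
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, new_labels) if lab == c]
            if members:
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
    return labels, centres


# Hypothetical cells described by (elevation above channel, profile curvature).
cells = [(0.1, 0.0), (0.2, 0.1), (0.0, 0.2),
         (5.0, 1.0), (5.2, 1.1), (4.9, 0.9)]
labels, centres = kmeans(cells, k=2)
```

With these well-separated toy values the three low-lying cells end up in one terrain class and the three elevated cells in the other, whichever cells are drawn as initial centres.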

Table VI. Root mean square error (RMSE) calculated cell-based between predicted soil depths and point-measured soil depths

                Number of classes in terrain maps
                   2      3      4      5      6      7      8      9

RMSE (m)        0.46   0.69   0.73   0.61   0.72   0.80   0.94   0.87
Error (%)        7.1   10.6   11.2    9.3   11.1   12.3   14.4   13.4
R²              0.71   0.13   0.02    0.4    0.1   0.25   0.24   0.35

Note: The lowest RMSE was reached for the map combination of the geomorphology/2-terrain-classes map. Percentage values show RMSE in percent of the soil depth range.

Although the best RMSE suggests that the soil depth map with the lowest disaggregation is the best choice for further applications, a user who desires a higher spatial disaggregation has to balance the prediction accuracy against the number of classes used in the overlay process. Input layers with more classes may lower the probability that the layer class means (e.g. of soil depth) can be calculated, because more classes may contain no points. The overlay of several explanatory variables with a low number of classes will likely increase the probability of complete coverage in the prediction maps and of higher prediction accuracies. However, where unpredictable areas occur, post-processing is needed to complete the soil depth information. Several approaches can be applied, such as taking only the information from the one explanatory variable that has a class mean for this area (e.g. from the geomorphology classes in this study) or interpolating the soil depth in the ‘no-data’ areas with a nearest neighbour interpolation algorithm in GIS.
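The first of these post-processing options can be sketched as a simple fallback rule. The grids, class labels and class means below are hypothetical, and `fill_no_data` is a name introduced for illustration; ‘no-data’ cells are represented by `None`.

```python
def fill_no_data(predicted, geo_class, geo_means):
    """Fill 'no-data' cells with the class mean of the geomorphology layer.

    predicted: 2-D list of predicted depths (m), None where no estimate exists
    geo_class: 2-D list of geomorphology class ids with the same shape
    geo_means: dict mapping geomorphology class id -> class mean depth (m)
    """
    filled = []
    for pred_row, class_row in zip(predicted, geo_class):
        filled.append([geo_means[c] if p is None else p
                       for p, c in zip(pred_row, class_row)])
    return filled


# Hypothetical 2 x 2 raster with two cells that could not be predicted.
predicted = [[0.8, None],
             [None, 1.5]]
geo_class = [["A", "B"],
             ["A", "B"]]
geo_means = {"A": 0.2, "B": 1.3}

filled = fill_no_data(predicted, geo_class, geo_means)
```

Cells that already carry a prediction are left unchanged; only the gaps receive the coarser single-layer estimate, so the filled map mixes two levels of detail.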

In many hydrological modelling studies, heterogeneity and scaling of input variables significantly affect model outputs such as predicted outflow or water balance (Quinn et al., 2005). Modelling success is to a great extent influenced by factors including the heterogeneity of climate and surface data and the presence of lateral connectivity, but also by the ‘mismatch between spatial resolution of measurements and models’ (Arrigo and Salvucci, 2005). According to Blöschl and Sivapalan (1995), the space dimensions of a measurement or a model can be uniquely described using the scale triplet consisting of spacing, extent and support. Spacing refers to the distance between observations, extent refers to the overall coverage and support refers to the integration area (Blöschl and Sivapalan, 1995). Interpolation techniques and trend surfaces provide useful methods to predict point data at user-defined scales. However, interpolations depend on spatial autocorrelation, a function of the spatial variance over the distance (lag) between data points, which is to a great extent influenced by the spatial organization of measurements. Thus, common interpolation techniques such as kriging and inverse distance weighting require a more or less uniform and organized distribution of the data points over the area to meet the geostatistical assumption of a spatially correlated random variable. In the case of non-uniformly distributed data, these techniques need a strategic sampling of missing data to achieve the desired coverage, which is both expensive and time-consuming. The presented soil-modelling approach is to a great extent independent of the spacing of point observations, under the assumption that the extent of the explanatory variables is greater than or equal to the spatial extent of the point observations and that the spacing between the point observations is smaller than the size of the smallest spatial entity of any explanatory variable. In this study, the sampling distance between point observations was 5 m. However, this information is not used in an autocorrelation variogram analysis, since the data structure of catena-sampled point observations invalidates the spatial variability represented by the variogram. The presented approach bypasses the variogram analysis through the overlay of the point observations with the maps of various explanatory variables. It therefore provides a powerful soil-modelling approach for areas with a non-random or organized distribution of point observations.

CONCLUSIONS

In this study, a method entirely based on the use of non-uniformly distributed point information (e.g. catena or transect data) was developed and tested. The method presents a simple way to predict soil attributes over larger areas using environmental variables and statistical means. First, class means are calculated for all points that fall into a class of each explanatory layer; second, this information is predicted for all merged layer classes using a statistical mean. The method is applicable at all scales and the final resolution of the predicted map is adjustable.

In comparing the three statistical means used to predict the soil attribute (soil depth) for merged classes of two explanatory variables, several differences in the predictive ability of the statistical means were identified. The validation results suggest that soil depth maps predicted with the harmonic mean showed the highest agreement with the initial point data set. The prediction accuracy of this method depends in general on the number of point values in the sample, the value range of the measured attribute (soil depth range: 0–6.5 m) and the initial correlations between the point values and the explanatory variables. The best results (RMSE = 0.46 m) were achieved when the merged explanatory variables had a low number of classes. The results indicate that using several explanatory variables with only a few classes in the merging process (e.g. terrain indices, land use maps, geomorphology maps) increases the prediction accuracy and the degree of spatial disaggregation.

Although the scale independence of the presented soil-modelling approach supports its easy application in larger catchments and regions and for different soil attributes (e.g. soil texture, soil moisture and soil hydraulic properties), further validation in other catchments is required. Results from this study show that simpler soil-landscape models can satisfactorily predict soil hydrological parameters on the basis of non-uniformly distributed data. We find that this soil model approach provides a useful method for generating spatial data for hydrological models in regions with sparsely available data. This might help to improve hydrological model predictions. Further analyses of the effects of differently disaggregated prediction maps on modelled runoff, assuming steady conditions for the residual model parameters, will help to quantify uncertainties in hydrological modelling caused by the misrepresentation of scale and landscape properties in the model input data.

ACKNOWLEDGEMENTS

The soil data were collected in a project carried out at the Department of Water and Environmental Studies, Linköping University, funded by the National Research Foundation. Additional information was provided by the Swedish Meteorological and Hydrological Institute and the Swedish cadastral system. Many thanks also to Göran Lindström for his valuable input.

REFERENCES

Arnold JG, Fohrer N. 2005. SWAT2000: current capabilities and research opportunities in applied watershed modelling. Hydrological Processes 19(3): 563–572.

Arrigo JAS, Salvucci GD. 2005. Investigation hydrologic scaling: observed effects of heterogeneity and nonlocal processes across hillslope, watershed, and regional scales. Water Resources Research 41: W11417. DOI:10.1029/2005WR004032.

Beckett PHT, Webster R. 1971. Soil variability: a review. Soils and Fertilizers 34: 1–15.

Behrens T. 2003. Digitale Reliefanalyse als Basis von Boden-Landschaftsmodellen am Beispiel der Verbreitungssystematik periglaziärer Lagen in deutschen Mittelgebirgen. Dissertation, Justus Liebig Universität, Giessen.

Behrens T, Förster H, Scholten T, Steinrücken U, Spies E-D, Goldschmitt M. 2005. Digital soil mapping using artificial neural networks. Journal of Plant Nutrition and Soil Science 168(1): 21–33.

Behrens T, Scholten T. 2007. Digital soil mapping in Germany – a review. Journal of Plant Nutrition and Soil Science 170(1): 181–202.

Beven KJ, Kirkby MJ. 1979. A physically based, variable contributing area model of basin hydrology. Hydrological Sciences Bulletin 24(1): 43–69.

Beven KJ, Kirkby MJ, Schofield N, Tagg AF. 1984. Testing a physically-based flood forecasting model (TOPMODEL) for three U.K. catchments. Journal of Hydrology 69(1–4): 119–143.

Blöschl G, Sivapalan M. 1995. Scale issues in hydrological modelling – a review. Hydrological Processes 9: 251–290.

Bolstad PV, Swank W, Vose J. 1998. Predicting Appalachian overstory vegetation with digital terrain data. Landscape Ecology 13: 271–283.

Burrough P. 1993. Soil variability: a late 20th century view. Soils and Fertilizers 56: 529–562.

Burt TP, Butcher DP. 1985. Topographic controls of soil moisture distributions. European Journal of Soil Science 36(3): 469–486.

Buttle JM, McDonald DJ. 2002. Coupled vertical and lateral preferential flow on a forested slope. Water Resources Research 38(5): 1060–1076.

Chamran F, Gessler PE, Chadwick OA. 2002. Spatially explicit treatment of soil-water dynamics along a semiarid catena. Soil Science Society of America Journal 66: 1571–1583.

Cook SE, Corner RJ, Grealish GJ, Gessler PE, Chartres CJ. 1996. A rule based system to map soil properties. Soil Science Society of America Journal 60: 1893–1900.

Diekkrüger B, Kirkby MJ, Schröder U. 1999. Regionalization in hydrology, IAHS Publication Nr. 254. IAHS Press, Institute of Hydrology: Wallingford, Oxfordshire; 265.

Dikau R. 1989. The application of a digital relief model to landform analysis in geomorphology. In Three Dimensional Application in Geographic Information Systems, Raper J (ed.). Taylor & Francis: London; 51–77.

Efron B. 1981. Nonparametric estimates of standard error: the jackknife, the bootstrap and other methods. Biometrika 68: 589–599.

Florinsky IV. 1998. Accuracy of local topographic variables derived from digital elevation models. Geographic Information Science 12: 47–61.

Flügel W. 1995. Delineating hydrological response units by Geographical Information System analyses for regional hydrological modelling using PRMS/MMS in the drainage basin of the River Bröl, Germany. Hydrological Processes 9: 423–436.

Frankenberger JR, Brooks ES, Walter MT, Walter MF, Steenhuis TS. 1999. A GIS-based variable source area hydrology model. Hydrological Processes 13: 805–822.

Freer J, McDonnell J, Beven KJ, Peters NE, Burns DA, Hooper RP, Aulenbach B, Kendall C. 2002. The role of bedrock topography on subsurface storm flow. Water Resources Research 37(10): 2607–2618.

Gessler PE, Chadwick OA, Chamran F, Althouse L, Holmes K. 2000. Modeling soil–landscape and ecosystem properties using terrain attributes. Soil Science Society of America Journal 64: 2046–2056.

Gessler PE, Moore ID, McKenzie NJ, Ryan PJ. 1995. Soil-landscape modelling and spatial prediction of soil attributes. International Journal of Geographic Information Systems 4: 421–432.

Hartigan JA, Wong MA. 1979. Algorithm AS 136: a K-Means clustering algorithm. Applied Statistics 28(1): 100–108.

Hatfield DC. 1999. TopoTools – A collection of topographic modeling tools for ArcInfo. http://gis.esri.com/library/userconf/proc00/professional/papers/PAP560/p560.htm.

Huber M. 1994. The digital geo-ecological map – concepts, GIS-methods and case studies. Physiogeographica 20: 1–144.

Hodgson ME, Gaile G. 1999. A cartographic modeling approach for surface orientation related applications. Photogrammetric Engineering and Remote Sensing 65(1): 85–95.

Horn BK. 1981. Hillshading and the reflectance map. Proceedings of the IEEE 69(1): 14–47.

Jenness J. 2004. Calculating landscape surface areas from digital elevation models. Wildlife Society Bulletin 32(3): 829–839.

Kravchenko AN. 2003. Influence of spatial structure on accuracy of interpolation methods. Soil Science Society of America Journal 67: 1564–1571.

Lane PW. 2002. Generalized linear models in soil science. European Journal of Soil Science 53: 241–251.

Leavesley GH, Stannard LG. 1995. The precipitation-runoff modeling system – PRMS. In Computer Models of Watershed Hydrology, Singh VP (ed). Water Resources Publications; 281–310.

Lehmann D, Billen N, Lenz R. 1999. Anwendung von Neuronalen Netzen in der Landschaftsökologie – synthetische Bodenkartierung im GIS. In Anwendung von Neuronalen Netzen in der Landschaftsökologie – Synthetische Bodenkartierung im GISBook, Strobl J, Blaschke T (eds). Wichmann: Heidelberg; 330–336.

Lehmeier F, Köthe R. 1992. Geomorphological data as component of a geoscientific information system. Geologisches Jahrbuch 122: 371–380.

Lindström G, Gardelin M, Johansson B, Persson M, Bergström S. 1996. HBV-96 – a distributed hydrological model concept. Contribution to the Nordic Hydrological Conference in Akureyri, 13–15 August 1996, NHP-Report No. 40, Iceland; 708–717.

Lyon SW, Seibert J, Lembo AJ, Walter MT, Steenhuis TS. 2006. Geostatistical investigation into the temporal evolution of spatial structure in a shallow water table. Hydrology and Earth System Sciences 10: 113–125.

McBratney AB, Mendonça Santos ML, Minasny B. 2003. On digital soil mapping. Geoderma 117: 3–52.

McKenzie NJ, Austin MP. 1993. A quantitative Australian approach to medium and small scale surveys based on soil stratigraphy and environmental correlation. Geoderma 57: 329–355.

McKenzie NJ, Ryan PJ. 1999. Spatial prediction of soil properties using environmental correlation. Geoderma 89(1–2): 67–94.

McGuire KJ, McDonnell JJ, Weiler M, Kendall C, McGlynn BL, Welker JM, Seibert J. 2005. The role of topography on catchment-scale water residence time. Water Resources Research 41: W05002. DOI:10.1029/2004WR003657.

McNab HW. 1989. Terrain shape index: quantifying effect of minorlandforms on tree height. Forest Science 35(1): 91–104.

Mertens M, Nestler I, Huwe B. 2002. GIS-based regionalization of soil profiles with Classification and Regression Trees (CART). Journal of Plant Nutrition and Soil Science 165: 39–43.

Mitasova H, Mitas L, Brown WM, Gerdes DP, Kosinovsky I, Baker T. 1995. Modeling spatially and temporally distributed phenomena: new methods and tools for GRASS GIS. International Journal of Geographical Information Systems 9(4): 433–446.

Moore ID, Gessler PE, Nielson GA. 1993. Soil attribute prediction using terrain analysis. Soil Science Society of America Journal 57: 443–452.

Moore ID, Ladson AR, Grayson R. 1991. Digital terrain modelling: a review of hydrological, geomorphological, and biological applications. Hydrological Processes 5: 3–30.

Nogami M. 1995. Geomorphometric measures for digital elevation models. Zeitschrift für Geomorphologie, N.F. Suppl. Bd. 101: 53–67.

Olofsson B, Fleetwood A. 1994. Georadarundersökningar i Stubbetorpsområdet, Kolmården. Avd för mark- och vattenresurser, KTH.

O’Loughlin EM. 1986. Prediction of surface saturation zones in natural catchments by topographic analysis. Water Resources Research 22: 794–804.

Odeh IOA, McBratney AB, Chittleborough DJ. 1994. Spatial prediction of soil properties from landform attributes derived from a digital elevation model. Geoderma 63(3–4): 197–214.

Odeh IOA, McBratney AB, Chittleborough DJ. 1995. Further results on prediction of soil properties from terrain attributes: heterotopic cokriging and regression kriging. Geoderma 67(3): 215–226.

Olaya V. 2004. A Gentle Introduction to SAGA GIS, 1.1 edn, 216.

Olsson M. 1999. Soil Survey in Sweden. In Soil Resources of Europe, Bullock P, Jones RJA, Montanarella L (eds). European Soil Bureau Research Report No. 6, EUR 18991 EN. Office for Official Publications of the European Communities: Luxembourg; 202.

Park SJ, van de Giesen N. 2004. Soil-landscape delineation to define spatial sampling domains for hillslope hydrology. Journal of Hydrology 295: 28–46.

Persson C. 1982. Beskrivning till jordartskartan Katrineholm SO, Serie Ae, Nr 46. SGU (description of the soil map Katrineholm SO, in Swedish).

Pettersson O. 1995. Vattenbalans för fältforskningsområden. SMHI Hydrologi, No 59. SMHI; 21 (Water balance for field research areas, in Swedish).

Quinn T, Zhu AX, Burt JE. 2005. Effects of detailed soil spatial information on watershed modeling across different model scales. International Journal of Applied Earth Observation and Geoinformation 7: 324–338.

Ryan PJ, McKenzie NJ, O’Connell D, Loughhead AN, Leppert PM, Jacquier D, Ashton L. 2000. Integrating forest soils information across scales: spatial prediction of soil properties under Australian forests. Forest Ecology and Management 138: 139–157.

Saco PM, Willgoose GR, Hancock GR. 2006. Spatial organization of soil depths using a landform evolution model. Journal of Geophysical Research 111: 14, F02016. DOI:10.1029/2005JF000351.

Shary PA, Sharaya LS, Mitusov AV. 2002. Fundamental quantitative methods of land surface analysis. Geoderma 107: 1–35.

Sinowski W, Auerswald K. 1999. Using relief parameters in a discriminant analysis to stratify geological areas with different spatial variability of soil properties. Geoderma 89(1–2): 113–128.

Skidmore AK, Watford F, Luckananurug P, Ryan PJ. 1996. An operational GIS expert system to map forest soils. Photogrammetric Engineering and Remote Sensing 62: 501–511.

Sommer M, Schlichting E. 1997. Archetypes of catenas in respect to matter—a concept for structuring and grouping catenas. Geoderma 76: 1–33.

Stieglitz M, Shaman J, McNamara J, Engel V, Shanley J, Kling GW. 2003. An approach to understanding hydrologic connectivity on the hillslope and implications for nutrient transport. Global Biogeochemical Cycles 17(4): 1105. DOI:10.1029/2003GB002041.

Tarboton DG. 1997. A new method for the determination of flow directions and upslope areas in grid digital elevation models. Water Resources Research 33(2): 309–319.

Tromp-van Meerveld HJ, McDonnell JJ. 2006a. Threshold relations in subsurface stormflow: 1. A 147-storm analysis of the Panola hillslope. Water Resources Research 42: W02410.

Tromp-van Meerveld HJ, McDonnell JJ. 2006b. On the interrelations between topography, soil depth, soil moisture, transpiration rates and species distribution at the hillslope scale. Advances in Water Resources 29: 293–310.

Wigmosta MS, Vail L, Lettenmaier DP. 1994. A distributed hydrology-vegetation model for complex terrain. Water Resources Research 30: 1665–1679.

Wikström A. 1979. Beskrivning till berggrundskartan Katrineholm SO, Serie Af, Nr 123. SGU (Description of the geological map, Katrineholm SO, in Swedish).

Wolpert D. 1990. The relationship between Occam’s razor and convergent guessing. Complex Systems 4: 319–368.

Wood J. 1996. The Geomorphological Characterisation of Digital Elevation Models. PhD Thesis. Department of Geography, University of Leicester.

Zevenbergen LW, Thorne CR. 1987. Quantitative analysis of land surface topography. Earth Surface Processes and Landforms 12(1): 47–56.

Zhu AX. 2000. Mapping soil landscape as spatial continua: the neural network approach. Water Resources Research 36: 663–677.
