An integrated pan-tropical biomass map using multiple ...

47
1 An integrated pan-tropical biomass map using multiple reference datasets Avitabile V. 1 , Herold M. 1 , Heuvelink G.B.M. 1 , Lewis S.L. 2,3 , Phillips O.L. 2 , Asner G. P. 4 , Asthon P., Banin L.F. 5 , Bayol N. 6 , Berry N. 12 , Boeckx P., de Jong B. 7 , DeVries B. 1 , Girardin C. 8 , Kearsley E. 9 , Lindsell J 10 , Lopez-Gonzalez G. 2 , Lucas R. 11 , Malhi Y. 8 , Morel A. 8 , Mitchard E. 12 , Nagy L., Qie L. 2 , Quinones M. 13 , Ryan C. 12 , Slik F., Sunderland, T., Vaglio Laurin G. 14 , Valentini R. 15 , Verbeeck H. 9 , Wijaya A. 16 , Willcock S. 17 (1). Wageningen University, the Netherlands; (2). University of Leeds, UK; (3).Universtity College London, UK; (4). Carnegie Institution for Science, USA; (5). Centre for Ecology and Hydrology, UK. (6). Foret Ressources Management, France; (7). Ecosur, Mexico; (8). University of Oxford, UK; (9) Ghent University, Belgium; (10). The RSPB Centre for Conservation Science, UK.; (11). Aberystwyth University, Australia; (12). University of Edinburgh, UK; (13). SarVision, the Netherlands; (14). Centro Euro-Mediterraneo sui Cambiamenti Climatici, Italy; (15). Tuscia University, Italy; (16). Center for International Forestry Research, Indonesia; (17). University of Southampton, UK; Abstract We combined two existing biomass datasets (input maps) into a pan-tropical biomass map at 1 km resolution using an unprecedented reference dataset and a data fusion approach, achieving lower errors than the input maps and almost unbiased estimates at the continental scale. A variety of field observations and locally-calibrated high-resolution biomass maps - not used by the input maps - were harmonized and upscaled to the map resolution, providing 15,969 biomass estimates at 1 km resolution (reference dataset). The input maps were integrated using a data fusion approach based on bias removal and weighted linear averaging that incorporates and spatializes the biomass patterns indicated by the reference data. The method was applied independently in areas (strata) with homogeneous error patterns of the input maps, which were estimated from the reference data and additional covariates. The fused map showed biomass stocks for the tropics 15% and 19% lower than the two input maps, and a different spatial pattern. Compared to the input maps, the fused map shows higher biomass density in the dense forest areas in the Congo basin, West Africa Eastern Amazon and South-East Asia, and lower values in Central America and in most dry vegetation areas of Africa. The validation exercise, based on 3,619 estimates from the reference

Transcript of An integrated pan-tropical biomass map using multiple ...

Page 1: An integrated pan-tropical biomass map using multiple ...

1

An integrated pan-tropical biomass map using multiple reference datasets

Avitabile V. 1, Herold M. 1, Heuvelink G.B.M. 1, Lewis S.L. 2,3, Phillips O.L. 2, Asner G. P. 4,

Asthon P., Banin L.F.5, Bayol N. 6, Berry N.12, Boeckx P., de Jong B. 7, DeVries B. 1, Girardin C. 8,

Kearsley E. 9, Lindsell J 10, Lopez-Gonzalez G. 2, Lucas R. 11, Malhi Y. 8, Morel A. 8, Mitchard E. 12,

Nagy L., Qie L.2, Quinones M. 13, Ryan C. 12, Slik F., Sunderland, T., Vaglio Laurin G. 14, Valentini

R. 15, Verbeeck H. 9, Wijaya A. 16, Willcock S. 17

(1). Wageningen University, the Netherlands; (2). University of Leeds, UK; (3).Universtity College

London, UK; (4). Carnegie Institution for Science, USA; (5). Centre for Ecology and Hydrology,

UK. (6). Foret Ressources Management, France; (7). Ecosur, Mexico; (8). University of Oxford,

UK; (9) Ghent University, Belgium; (10). The RSPB Centre for Conservation Science, UK.; (11).

Aberystwyth University, Australia; (12). University of Edinburgh, UK; (13). SarVision, the

Netherlands; (14). Centro Euro-Mediterraneo sui Cambiamenti Climatici, Italy; (15). Tuscia

University, Italy; (16). Center for International Forestry Research, Indonesia; (17). University of

Southampton, UK;

Abstract

We combined two existing biomass datasets (input maps) into a pan-tropical biomass map at 1 km

resolution using an unprecedented reference dataset and a data fusion approach, achieving lower

errors than the input maps and almost unbiased estimates at the continental scale. A variety of field

observations and locally-calibrated high-resolution biomass maps - not used by the input maps -

were harmonized and upscaled to the map resolution, providing 15,969 biomass estimates at 1 km

resolution (reference dataset). The input maps were integrated using a data fusion approach based

on bias removal and weighted linear averaging that incorporates and spatializes the biomass

patterns indicated by the reference data. The method was applied independently in areas (strata)

with homogeneous error patterns of the input maps, which were estimated from the reference data

and additional covariates. The fused map showed biomass stocks for the tropics 15% and 19%

lower than the two input maps, and a different spatial pattern. Compared to the input maps, the

fused map shows higher biomass density in the dense forest areas in the Congo basin, West Africa

Eastern Amazon and South-East Asia, and lower values in Central America and in most dry

vegetation areas of Africa. The validation exercise, based on 3,619 estimates from the reference

Page 2: An integrated pan-tropical biomass map using multiple ...

2

dataset not used in the fusion process, showed that the fused map had a RMSE 8 – 74% lower than

that of the input maps and, most importantly, nearly unbiased estimates (mean bias 4 Mg dry mass

ha-1 vs. 21 and 28 Mg ha-1 for the input maps). The fusion method can be applied at any scale

including the policy-relevant national level, where it can provide improved biomass estimates by

integrating existing regional biomass maps as input maps and additional, country-specific reference

datasets.

Keywords: aboveground biomass, carbon cycle, forest plots, tropical forest, forest inventory,

REDD+, satellite mapping, remote sensing,

Introduction

Recently, considerable efforts have been made to better quantify the amounts and spatial

distribution of aboveground biomass, a key parameter for estimating carbon emissions and

removals due to land-use change, and related impacts on climate (Baccini et al., 2012; Harris et al.,

2012; Houghton et al., 2012; Saatchi et al. 2011; Mitchard et al. 2014). Particular attention has been

given to the tropical regions, where uncertainties are higher (Ziegler et al., 2012; Grace et al., 2014).

In addition to ground observations acquired by research networks or for forest inventory purposes,

several biomass maps have been recently produced at different scales, using a variety of empirical

modelling approaches based on remote sensing data calibrated by field observations (e.g., Goetz et

al., 2011; Birdsey et al., 2013). Biomass maps at moderate resolution have been produced for the

entire tropical belt by integrating various satellite observations (Saatchi et al., 2011; Baccini et al.,

2012), while higher resolution datasets have been produced at local or national level using medium-

high resolution satellite data (e.g., Avitabile et al., 2012; Cartus et al., 2014), sometimes in

combination with airborne Light Detection and Ranging (LiDAR) data (Asner et al., 2012a, 2012b,

2013, 2014a). The various datasets often have different purposes: research plots provide a detailed

and accurate estimation of biomass (and other ecological parameters or processes) at the local level,

forest inventory networks using a sampling approach to obtain statistics of biomass stocks (or

growing stock volume) per forest type at the sub-national or national level, while high-resolution

biomass maps can provide detailed and spatially explicit estimates of biomass density to assist

natural resource management, and large scale datasets depict biomass distribution for global-scale

carbon accounting and modelling.

Page 3: An integrated pan-tropical biomass map using multiple ...

3

In the context of the United Nations mechanism for Reducing Emissions from Deforestation and

forest Degradation (REDD+), emission estimates obtained from spatially explicit biomass datasets

may be favoured compared to those based on mean values derived from plot networks. This

preference stems from the fact that plot networks are not designed to represent land cover change

events, which usually do not occur randomly and may affect forests with biomass density

systematically different from the mean value (Baccini and Asner, 2013). With very few tropical

countries having national biomass maps or reliable statistics on forest carbon stocks, regional maps

may provide advantages compared to the use of default mean values (e.g., IPCC (2006) Tier 1

values) to assess emissions from deforestation, if their accuracy is reasonable and their estimates are

not affected by systematic errors (Avitabile et al., 2011). However, these conditions are difficult to

assess since proper validation of regional biomass maps remains problematic, given their large area

coverage and large mapping unit (Mitchard et al., 2013), while ground observations are only

available for a limited number of small sample areas.

The comparison of two recent pan-tropical biomass maps (Saatchi et al., 2011; Baccini et al., 2012)

reveals substantial differences between the two products (Mitchard et al., 2013). Further

comparison with ground observations and high-resolution maps in the Amazon basin indicated

substantially different biomass patterns at regional scales (Mitchard et al., 2014; Baccini and Asner,

2013, Hills et al., 2013). Such comparisons have stimulated a debate over the use and capabilities of

different types of biomass products (Saatchi et al., 2014; Langner et al., 2014) and have highlighted

both the importance and sometimes the lack of integration of different datasets. On one hand, the

two pan-tropical maps are consistent in terms of methodology because both use the same primary

data source (GLAS LiDAR) alongside a similar modelling approach to upscale the LiDAR data to

larger scales. Moreover, they have the advantage of being calibrated using hundreds of thousands of

biomass estimates derived from height metrics computed by a spaceborne LiDAR sensor distributed

over the tropics. However, such maps are based on remotely sensed variables that do not directly

measure biomass, but are sensitive to canopy cover and canopy height parameters that do not fully

capture the biomass variability of complex tropical forests. Furthermore, both products assume

global or continental allometric relationships in which biomass varies only with stand height, and

further errors are introduced by upscaling the calibration data to the coarser satellite data. On the

other hand, ground plots use allometric equations to estimate biomass at individual tree level using

directly measurable parameters such as diameter, height and species identity (hence wood density).

However, they have limited coverage, are not error-free, and compiling various datasets over large

areas is made more complex due to differing sampling strategies (e.g., stratification (or not) of

Page 4: An integrated pan-tropical biomass map using multiple ...

4

landscapes, plot size, minimum diameter of trees measured). Considering the rapid increase of

biomass observations at different scales and the different capabilities and limitations of the various

datasets, it is becoming more and more important to identify strategies that are capable of making

best use of existing information and optimally integrate various data sources for improved large

area biomass assessment (e.g., see Willcock et al. 2012).

In the present study, we compiled existing ground observations and high-resolution biomass maps

to obtain a reference dataset of high quality aboveground biomass data for the tropical region,

including both plot data and high-resoltion biomass maps (objective 1). This reference dataset was

used to assess the two pan-tropical biomass maps (objective 2) and to combine them in a fused map

that optimally integrates the two maps, based on the method presented by Ge et al. (2014) (objective

3). Lastly, the fused map was compared to known biomass patterns and stocks across the tropics

(objective 4).

Overall, the approach consisted of pre-processing, screening and harmonizing the Saatchi and

Baccini maps (called ‘input maps’), the high-resolution biomass maps (called ‘reference maps’) and

the field plots (called ‘reference plots’; ‘reference dataset’ refers to the maps and plots combined) to

a common spatial resolution and geospatial reference system (Figure 1). The input maps were

combined using bias removal and weighted linear averaging (‘fusion’). The fusion model was

applied independently in areas representing different error patterns of the input maps (called ‘error

strata’), which were estimated from the reference data and additional covariates (called ‘covariate

maps’). The reference dataset included only a subset of the reference maps (i.e., the cells with

highest confidence) and if a stratum was lacking reference data (‘reference data gaps’), additional

data were extracted from the reference maps (‘consolidation’). The fused map was validated using

independent data and its uncertainty quantified using model parameters. In this study, the term

biomass refers to aboveground live woody biomass and is reported in units of Mg dry mass ha-1.

Page 5: An integrated pan-tropical biomass map using multiple ...

5

Figure 1: methodology flowchart

Data & Methods

Input maps

The input maps used for this study were the two pan-tropical datasets published by Saatchi et al.

(2011) and Baccini et al. (2012), hereafter referred to as the Saatchi and Baccini maps individually,

or collectively as input maps. The Baccini map was provided in MODIS sinusoidal projection with

a spatial resolution of 463 m while the Saatchi map is in a geographic projection (WGS-84) at

0.00833 degrees (c. 1 km) pixel size. The two datasets were harmonized by first projecting the

Baccini map to the coordinate system of the Saatchi map using the Geospatial Data Abstraction

Library (www.gdal.org) and then aggregating to match its spatial resolution and grid. Spatial

aggregation was performed by computing the mean value of the pixels whose centre was located

Page 6: An integrated pan-tropical biomass map using multiple ...

6

within each 1 km cell of the Saatchi map. Resampling was then undertaken using the nearest

neighbor method.

Reference dataset

The reference dataset comprised individual tree-based field data and high-resolution biomass maps.

The field data included biomass estimates derived from field measurement of tree parameters and

allometric equations. The biomass maps included high-resolution (≤ 100 m) datasets derived from

satellite data using empirical models calibrated and validated using local ground observations and,

in some cases, airborne LiDAR measurements (Table S7 – S10). Given the variability of procedures

used to acquire and produce the various datasets, they were first screened according to a set of

quality criteria to select only the most reliable biomass estimates, and then pre-processed to be

harmonized with the pan-tropical biomass maps in terms of spatial resolution and variable observed.

Field and map datasets providing aboveground carbon density were converted to biomass units

using the same coefficients used for their original conversion from biomass to carbon. The sources

of the reference data are listed in table S7 and S9.

Data screening and pre-processing

Reference field data

The reference field data included ground observations in forest inventory plots, for which accurate

geolocation and biomass estimates were available. The pre-processing of the data consisted of a 2-

step screening and a harmonization procedure. A preliminary screening selected only the ground

data that estimated aboveground biomass of all living trees with diameter at breast height ≥ 5-10 cm,

and acquired on or after the year 2000. The taxonomic identities of trees strongly indicate wood

density and hence stand-level biomass (e.g., Baker et al., 2004; Mitchard et al. 2014). Plots were

therefore only selected if tree biomass was estimated using at least tree diameter and wood density

as input parameters, and plot coordinates were measured using a GPS. All datasets not conforming

to these requirements or not providing clear information on the biomass pool measured, the tree

parameters measured in the field, the allometric model applied, the year of measurement and the

plot geolocation and extent were excluded. Next, the plot data were projected to the geographic

reference system WGS-84 and harmonized with the input maps by averaging the biomass values

located within the same 1 km pixel. The field plots not fully located within one pixel were attributed

to the map cell where the majority of the plot area (i.e., the plot centroid) was located.

Page 7: An integrated pan-tropical biomass map using multiple ...

7

Lastly, the representativeness of the plot over the 1km pixels was considered, and the ground data

were further screened to discard plots not representative of the map cells in terms of biomass

density. More specifically, since the two input maps are not aligned and therefore their pixels do not

correspond to the same geographic area, the plot representativeness was assessed on the area of both

pixels (identified before the map resampling). The representativeness was evaluated on the basis of

the homogeneity of the tree cover and crown size within the pixel, and it was assessed using visual

interpretation of high-resolution images provided on the Google Earth platform. If the tree cover

and tree crowns were not homogeneous over at least 90% of the pixel area, the plots located within

the pixel were discarded. More details on the selection procedure are provided in the Supplementary

Information.

Reference biomass maps

The reference biomass maps consisted of high quality local or national maps published in the

scientific literature. Maps providing biomass estimates grouped in classes (e.g., Willcock et al.,

2012) were not used since the class values represent the mean biomass over large areas, usually

spanning multiple strata used in the present study (see ‘Stratification approach’). The reference

biomass maps were first pre-processed to match the input maps through re-projection, aggregation

and resampling using the same procedures described for the pre-processing of the Baccini map.

Then, only the cells with largest confidence (i.e., lowest uncertainty) were selected from the maps.

Since uncertainty maps were usually not available, and considering that the reference maps were

based on empirical models, the map cells with greatest confidence were assumed to be those in

correspondence of the training data (field plots and/or LiDAR data). When the locations of the

training data were not available, random pixels were extracted from the maps. In order to compile a

reference database that was representative of the area of interest and well balanced among the

various input datasets (as defined in ‘Consolidation of the reference dataset’), the amount of

reference data extracted from the biomass maps was proportional to their area and not greater than

the amount of samples provided by the field datasets representing a similar area. In the case where

maps with extensive training areas provided a disproportionate number of reference pixels, a further

screening selected only the areas underpinned by the largest amount of training data.

Selected reference data

Page 8: An integrated pan-tropical biomass map using multiple ...

8

The biomass reference dataset compiled for this study consists of 15,969 1-km reference pixels,

distributed as follows: 953 in Africa, 449 in South America, 9,167 in Central America, 400 in Asia

and 5,000 in Australia (Fig. 2, Table 1). The reference data were relatively uniformly distributed

among the strata (Table S5) but their amount varied considerably by continent. The average amount

of reference data per stratum ranged from 50 (Asia) to 1,144 (Central America) reference pixels and

their variability (computed as standard deviation relative to the mean) ranged from 25% (South

America) to 57% (Central America). The uneven distribution of reference data across the continents

is mostly caused by the availability of ground observations: in order to have a balanced reference

dataset for each stratum, the reference data extracted from biomass maps were limited to the

(smaller) amount of direct field observations. When biomass maps were the only source of data this

constrain was not occurring and larger datasets could be derived from the maps (i.e., Central

America, Australia). Ad example, all pixels (4,263) in correspondence of the training data could be

selected from the biomass map of Mexico while only 13% of the 1,167 pixels with training data

could be selected from the biomass map of Uganda to maintain comparability with smaller

reference datasets available in Africa.

The reference data were selected from 18 ground datasets providing 1,591 research field

observations and 5,036 forest inventory plots, and from 9 high resolution biomass maps calibrated

by field observations and, in four cases, airborne LiDAR data. The field plots used for the

calibration of the maps are not included in this section because they were only used to select the

reference pixels from the maps. The visual screening of the field plots removed 35% of the input

data (from 6,627 to 4,283) and their aggregation to 1 km resolution further removed 70% of the

reference units derived from field plots (from 4,283 to 1,274), while 10,741 reference pixels were

extracted from the high-resolution biomass maps. The criteria used to select the reference pixels for

each map are reported in Table S3. The consolidation procedure added 3,954 reference data to the

final reference dataset that consisted of 15,969 units (Table S2). In particular, ground observations

were mostly discarded in areas characterized by fragmented or heterogeneous vegetation cover and

high biomass spatial variability. In such contexts, reference data were often acquired from the

biomass maps. The reference dataset was compared to the input maps, revealing substantial linear

correlation (ranging from 0.60 to 0.77) between the errors of the Saatchi and Baccini maps and

RMSE values in the same order of magnitude for all continents, ranging from 96 to 124 Mg ha-1,

with the exception of Australia (45 Mg ha-1) (Table S4).

Page 9: An integrated pan-tropical biomass map using multiple ...

9

Figure 2: Biomass reference dataset for the tropics and spatial coverage of the two input maps

Table 1: Number of reference data (plots and 1-km pixels) selected after the screening, upscaling and

consolidating procedures, per continent. The reference data selected for each individual dataset are reported in

Table S2. The field plots underpinning the reference biomass maps are not included.

Continent Available Selected Consolidated

Plots Plots Pixels Pixels

Africa 2,281 1,976 953 953

S. America 648 474 449 449

C. America - - 5,260 9,167

Asia 3,698 1,833 353 400

Australia - - 5,000 5,000

Total TROPICS 6,627 4,283 12,015 16,969

Modelling approach

The fusion model

The integration of the two input maps was performed with a fusion model based on the concept

presented by Ge et al. (2014) and further developed for this study. The fusion model consists of bias

removal and weighted linear averaging of the input maps to produce an output with greater

accuracy than each of the input maps. The reference biomass dataset described above was used to

calibrate the model and to assess the accuracy of the input and fused maps. A specific model was

developed for each stratum (see ‘Stratification approach’).

Page 10: An integrated pan-tropical biomass map using multiple ...

10

Following Ge et al. (2014), the p input maps for locations sD, where D is the geographical domain

of interest common to the input maps, were combined using a weighted linear average:

(1) 1

( ) ( ) ( ( ) ( ))

p

i i iif s w s z s v s

where f is the fused map, the wi(s) are weights, zi the estimate of the i-th input map and vi(s) is the

bias estimate. The bias term was computed as the average difference between the input map and the

reference data. The weights were obtained from a statistical model that assumes the map estimates

zi to be the sum of the true biomass bi with a bias term vi and a random noise term i with zero mean

for each location sD. We further assumed that the i of the input maps are jointly normally

distributed with variance-covariance matrix C(s). Differently from Ge et al. (2014), C(s) was

estimated using a robust covariance estimator as implemented by the ‘robust’ package in R (Wang

et al., 2014), which uses the Stahel-Donoho estimator for strata with fewer than 5,000 observations

and the Fast Minimum Covariance Determinant estimator for larger strata. Under these assumptions,

the variance of the estimation error of the fused map f(s) is minimized by calculating the weights

w(s) as Searle (1971. p. 89):

(2) 1

1 1( ) ( ) ( )

1 C 1 1 CT T Tw s s s

where 1=[1, ..., 1]T is the p-dimensional unit vector and where T means transpose. Larger weights

were assigned to the map with lower error variance. The fusion model assured that the variance of

the error in the fused map was smaller than that of the input maps (Bates and Granger, 1969),

especially if the errors associated with these maps were not strongly positively correlated and their

error variances were close to the smallest error variance. The fusion model can be applied to any

number of input maps. Where there is only one input map, the model estimates and removes its bias

and the weights are set equal to 1.

The model parameters

The fusion model computed a set of bias and weight parameters for each stratum and continent on

the basis of the respective reference data, and used these for the linear weighted combination of the

input maps (Table S5). Since the stratification approach grouped together data with similar error

patterns (see ‘Stratification approach’), the biases varied considerably among the strata and could

reach values up to ±200 Mg ha-1. However, considering the area of the strata, the biases of both

Page 11: An integrated pan-tropical biomass map using multiple ...

11

input maps were smaller than ± 45 Mg ha-1 for at least 50% of the area of all continents and smaller

than ±100 Mg ha-1 for 81% - 98% of the area of all continents.

Stratification approach

Error modelling

Preliminary comparison of the reference data with the input maps showed that the error variances

and biases of the input maps were not spatially homogeneous but varied considerably in different

regions. Since the fusion model is based on bias removal and weighted combination of the input

maps, the more homogeneous the error characteristics in the input maps are, the better they can be

reduced by the model. For this reason, the stratification approach aimed at identifying areas with

homogeneous error structure (hereafter named ‘error strata’) in both input maps. A first

stratification was done by geographic location (namely Central America, South America, Africa,

Asia and Australia) to reflect the regional allometric relationships between biomass and tree

diameter and height (Feldpausch et al., 2011, 2012). Then, the error strata were identified for each

continent, using a two-step process. Firstly, the error maps of the Saatchi and Baccini maps were

predicted separately on the basis of their biomass estimates and land cover, tree cover and tree

height parameters by using a Random Forest model (Breiman, 2001), calibrated on the basis of the

reference dataset. Secondly, the error maps of the Saatchi and Baccini datasets were clustered using

the K-Means approach. Eight clusters (hence, eight error strata) was considered as a sensible trade-

off between homogeneity of the errors of the input maps and number of reference observations

available per stratum (Fig. S1). The performance of this approach was assessed on the basis of the

root mean square error (RMSE) of the models that predicted the errors of the input maps. Since the

ensemble modelling approach used in this study (Random Forest) includes a certain level of

randomness, the model performance was computed as an average of 100 model runs (Fig. S2, Fig.

S3). The RMSE computed on the Out-Of-Bag data (i.e., data not used for training) of the Random

Forest models for the Baccini extent ranged between 22.8 ± 0.3 Mg ha-1 for Central America to 83.7

± 2.5 Mg ha-1 in Africa, with the two models (one for each input map) achieving similar accuracies

in each continent (Fig. S2, Fig. S3). In most cases the main predictors of the errors of the input

maps were the biomass values of the maps themselves, followed by tree cover and tree height, while

land cover was always the least important predictor (Table S1). Further details on error modelling

and processing of the input data are provided in the Supplementary Information.

Page 12: An integrated pan-tropical biomass map using multiple ...

12

The stratification map identifies eight strata for each continent with homogeneous error patterns in

the input maps (Fig. S4). The use of a stratification based on the errors of the input maps was

compared with a stratification based on an alternative variable, such as land cover (used by Ge et al.,

2014), tree cover or tree height. Each of these variables was aggregated into eight classes to

maintain comparability with the number of clusters used in the error strata, and each stratification

map was used to develop a specific fused map. The performance of alternative stratification

approaches was assessed by validating the respective fused maps. The results (Fig. S5)

demonstrated that the stratification based on error modelling and clustering (i.e., the error strata)

produced a fused map with higher accuracy than that of the maps based on other stratification

approaches, and therefore was used in this study.

Consolidation of the reference dataset

Considering that the parameters of the fusion model (i.e. weight and bias) are sensitive to the

characteristics of the reference data, each stratum requires that calibration data are relatively well-

balanced between the various reference datasets. Specifically, if a stratum contains few calibration

data, the model becomes more sensitive to outliers, while if a reference dataset is much larger than

the others, the model is more strongly determined by the dominant dataset. For these reasons, where

the reference dataset was under-represented or un-balanced, it was consolidated by additional

reference data taken from the reference biomass maps, if available. The reference data were

considered insufficient if a stratum had less than half of the average reference data per stratum, and

were considered un-balanced if a single dataset provided more than 75% of the reference data of the

whole stratum and it was not representative of more than 75% of its area. In such cases, additional

reference data were randomly extracted from the reference biomass maps that did not provide more

than 75% of the reference data. The amount of data to be extracted from each map was computed in

a way to obtain a reference dataset with an average number of reference data per stratum and not

dominated by a single dataset. If necessary, additional training data representing areas with no

biomass (e.g., bare soil) were included, using visual analysis of Google Earth images to identify

locations without vegetation.

Post-processing

Predictions outside the coverage of the Baccini map

The Baccini map covers the tropical belt between 23.4 degree north latitude and 23.4 degree south

latitude while the Saatchi map presents a larger latitudinal coverage (Fig. 2). The fusion model was

Page 13: An integrated pan-tropical biomass map using multiple ...

13

firstly applied to the area common to both input maps (Baccini extent) and then extended to the area

where only the Saatchi map is available. In the latter area, the model focused only on removing the

bias of the Saatchi map using the values estimated for the Baccini extent. The model predictions for

the Saatchi extent were mosaicked to those for the Baccini extent using a smoothing function

(inverse distance weight) on an overlapping area of 1 degree within the Baccini extent between the

two maps. The resulting fused map was projected to an equal area reference system (MODIS

Sinusoidal) before computing the total biomass stocks for each continent, which were obtained by

summing the products of the biomass density of each pixel with their area. Water bodies were

masked over the whole study area using the ESA CCI Water Bodies map (ESA, 2014).

Assessing biomass in intact and non-intact forest

The biomass estimates of the fused and input maps in forest areas were further investigated

regarding their distribution in ecozones and between intact and non-intact landscapes. Forest areas

were defined as areas dominated by tree cover according to the GLC2000 map (classes 1-10 in the

global legend of GLC2000). Ecozones were defined according to the Global Ecological Zone (GEZ)

map for the year 2000 (FAO, 2000). The intact landscapes were defined according to the Intact

Forest Landscape (IFL) map for the year 2000 (Potapov et al., 2008). On the basis of these datasets

the mean forest biomass density of the fused and input maps were computed for intact and non-

intact landscapes for each continent and major ecozone. To reduce the impact of spatial

inaccuracies in the maps only ecozones with intact forest areas larger than 1,000 km2 were

considered. The mean biomass density of intact and non-intact forests per continent was computed

as the area-weighted mean of the contributing ecozones.

Validation and uncertainty

Validation was performed by randomly splitting the reference data into a calibration set (70% of the

data) and a validation set (remaining 30%). The ‘final’ fused map presented in Fig. 3 used 100% of

the reference data and for validation purposes a ‘test’ fused map was produced using only the

calibration data and its estimates, as well as those of the input maps, were compared with the

validation data. To maintain full independence, validation data were not used for any step related to

the development of the fused map, including production of the stratification map. To account for

any potential impacts of the random selection of validation data, the procedure was repeated 100

times, computing each time a new random selection of the calibration and validation datasets. This

procedure allowed computing the mean RMSE and assessing its standard deviation for each map.

Page 14: An integrated pan-tropical biomass map using multiple ...

14

The uncertainty of the fused map was computed with respect to model uncertainty, not including the

error sources in the input data (see ‘Discussion’). The model uncertainty consisted of the expected

variance of the error of the fused map (which is assumed bias-free) and was derived for each

stratum from C(s). The error variance was converted to an uncertainty map by reclassifying the

stratification map.

Results

Biomass map

The fusion model produced a biomass map at 1 km resolution for the tropical region, with an extent

equal to that of the Saatchi map (Fig. 3). In terms of aboveground stocks, the fused map gave

biomass estimates lower than both input maps at continental level. The total stock was 451 Pg dry

mass, 17% lower than the estimate of the Saatchi map (545 Pg) for the same extent. Considering the

same common area (Baccini extent), the fused map estimate was 360 Pg, 13% and 21% lower than

the Saatchi (413 Pg) and Baccini (457 Pg) estimates, respectively (Table S6).

Moreover, the fused map presented spatial patterns substantially different from both input maps

(Fig. 4): the biomass estimates were higher than both input maps in the dense forest areas in the

Congo basin, in West Africa, in the north-eastern part of the Amazon basin (Guyana shield) and in

South-East Asia, and lower in Central America and in most dry vegetation areas of Africa. In the

central part of the Amazon basin the fused map showed lower estimates than the Baccini map and

higher estimates than the Saatchi map, while in the southern part of the Amazon basin these

differences were inversed. Similar trends emerged when comparing the maps separately for intact

and non-intact forest ecozones (Supporting Information). In addition, the average difference

between intact and non-intact forests was larger than that derived from the input maps in Africa,

Asia and Australia, similar or slightly larger in South America, and smaller in Central America (Fig.

S7).

The fused map records the highest biomass density (> 400 Mg ha-1) in the Guyana shield, in the

Central and Western part of the Congo basin and in the intact forest areas of Borneo and Papua New

Guinea. The analysis of the distribution of forest biomass in intact and non-intact ecozones showed

that, according to the fused map, the mean biomass density was greatest in intact African (360 Mg

ha-1) and Asian (328 Mg ha-1) forests, followed by intact forests in South America (262 Mg ha-1),

Page 15: An integrated pan-tropical biomass map using multiple ...

15

Asia (202 Mg ha-1), Australia (162 Mg ha-1) and Central America (135 Mg ha-1) (Fig. S7). Biomass

in non-intact forests was much lower in all regions (Africa, 76 Mg ha-1; South America, 136 Mg ha-

1; Asia, 196 Mg ha-1; Australia, 90 Mg ha-1; and Central America, 46 Mg ha-1).

Figure 3: Fused map, representing the distribution of live woody aboveground biomass (AGB) for all land cover

types at 1 km resolution for the tropical region.

Page 16: An integrated pan-tropical biomass map using multiple ...

16

Figure 4: Difference maps obtained by subtracting the fused map from the Saatchi map (top) and the Baccini

map (bottom).

Validation

The validation exercise showed that the fused map achieved a lower RMSE (a decrease of 8 – 74%)

and bias (a decrease of 90 – 153%) than the input maps for all continents (Fig. 5). While the RMSE

of the fused map was consistently lower than that of the input maps but still substantial (87 – 98 Mg

ha-1) in the largest continents (Africa, South America and Asia), the mean error (bias) of the fused

map was almost null in most cases. Moreover, in the three main continents the bias of the input

maps tended to vary with biomass, with overestimation at low values and underestimation at high

values, while the errors of the fused map were more consistently distributed (Fig. 6). When

computing the error statistics as average of the regional validation results weighted by the

respective area coverage, the mean bias (in absolute terms) for the fused, Saatchi and Baccini maps

was 5, 21 and 28 Mg ha-1 and the mean RMSE was 89, 104 and 112 Mg ha-1, respectively (Fig. 5).

The accuracy of the input maps was computed using the validation dataset (30% of the reference

dataset) to be consistent with the accuracy of the fused map. The accuracy of the input maps was

Page 17: An integrated pan-tropical biomass map using multiple ...

17

also computed using all reference data and the results (Table S4) were similar to those based on the

validation dataset.

Figure 5: RMSE (left) and bias (right) of the fused and input maps (in Mg ha-1) per continent obtained using

independent reference data not used for model development. The error bars indicate one standard deviation of

the 100 simulations. Numbers reported in brackets indicate the number of reference observations used for each

continent.

Figure 6: scatterplots of the validation reference data (x-axis) and predictions (y-axis) of the input maps (left

plots) and fused map (right plots) by continent.

Page 18: An integrated pan-tropical biomass map using multiple ...

18

Uncertainty map

The uncertainty of the model predictions at 1 km resolution indicated that the standard deviation of

the error of the fused map for each stratum was in the range 11 - 78 Mg ha-1, with largest

uncertainties in areas with largest biomass estimates (Congo basin and Eastern Amazon basin).

When computed in relative terms (as percentage of the biomass estimate) the model uncertainties

presented opposite patterns, with uncertainties larger than the estimates (> 100%) in low biomass

areas (< 20 Mg ha-1 on average) of Africa, South America and Central America, while high biomass

forests (> 210 Mg ha-1 on average) had uncertainties lower than 25% (Fig. 7). The uncertainty

measure derived from C(s) is computed only when two or more input maps are available. Hence it

could not be calculated for Australia because the model for this continent was based on only one

input map (Saatchi map).

Figure 7: Uncertainty of the fused map, in absolute values (top) and relative to the biomass estimates (bottom),

representing one standard deviation of the error of the fused map.

Page 19: An integrated pan-tropical biomass map using multiple ...

19

Discussion

Biomass patterns and stocks emerging from the reference data

The biomass map produced with the fusion approach is largely driven by the reference dataset and

essentially the method is aimed at spatializing the biomass patterns indicated by the reference data

using the support of the input maps. For this reason, great care was taken in the pre-processing of

the reference data, which included a two-step quality screening based on metadata analysis and

visual interpretation, and their consolidation after stratification. As a result, the reference dataset

provides an unprecedented compilation of biomass estimates at 1 km resolution for the tropical

region, covering a wide range of vegetation types, biomass ranges and ecological regions across the

tropics. It includes the most comprehensive and accurate tropical field plot networks and high

quality maps calibrated with airborne LiDAR, which provide more accurate estimates compared to

those obtained from other sensors (Zolkos et al., 2013). The main trends present in the fused map

emerged from the combination of different and independent reference datasets and are in agreement

with the estimates derived from long-term research plot networks (Malhi et al., 2006; Phillips et al.

2009; Lewis et al. 2009; Slik et al., 2010, 2013; Lewis et al., 2013) and high-resolution maps (Asner

et al., 2012a, 2012b, 2013, 2014a). Specifically, the biomass patterns in South America represent

spatial trends described by research plot networks in the dense intact and non-intact forests in the

Amazon basin, forest inventory plots collected in the dense forests of Guyana and samples extracted

from biomass maps for Colombia and Peru representing a wide range of vegetation types, from arid

grasslands to humid forests. Similarly, biomass patterns depicted in Africa were derived from a

combination of various research plots in dense undisturbed forest (Gabon, Cameroon, Democratic

Republic of Congo, Ghana, Liberia), inventory plots in forest concessions (Democratic Republic of

Congo), biomass maps in woodland and savannah ecosystems (Uganda, Mozambique) and research

plots and maps in montane forests (Ethiopia, Madagascar). Most vegetation types in Central

America, Asia and Australia were also well-represented by the extensive forest inventory plots

(Indonesia, Vietnam and Laos) and high-resolution maps (Mexico, Panama, Australia).

In spite of the extensive coverage, the current database is far from being representative of the

biomass variability across the tropics. As a consequence, the model estimates are expected to be

less accurate in contexts not adequately represented. In the case of the fusion approach, this

corresponds to the areas where the input maps present error patterns different than those identified

in areas with reference data: in such areas the model parameters used to correct the input maps (bias

and weight) may not adequately reflect the errors of the input maps and hence cannot optimally

Page 20: An integrated pan-tropical biomass map using multiple ...

20

correct them. In particular, deciduous vegetation and heavily disturbed forest of Africa and South

America, and large parts of Asia were lacking quality reference data. Moreover, even though plot

data were spatially distributed over the central Amazon and the Congo basin, large extents of these

two main blocks of tropical forest have never been measured (cf. maps in Mitchard et al. 2014;

Lewis et al. 2013). Considering the evidence of significant local differences in forest structure and

biomass density within the same forest ecosystems (Kearsley et al., 2013), additional data are

needed to strengthen the confidence of the fused map as well as that of any other biomass map

covering the tropical region. Moreover, a dedicated gap analysis to assess the main regions lacking

biomass reference data and identify priority areas for new field sampling and LiDAR campaigns

would be very valuable for future improved biomass mapping.

Regarding the biomass stocks, a previous study showed that despite their often very strong local

differences the two input maps tended to provide similar estimates of total stocks at national and

biome scales and presented an overall net difference of 10% for the pan-tropics (Mitchard et al.,

2013). However, such convergence is mostly due to compensation of contrasting estimates when

averaging over large areas. The larger differences with the estimates of the present study (13% and

21%) suggest an overestimation of the total stocks by the input maps. This is in agreement with the

results of two previous studies that, on the basis of reference maps obtained by field-calibrated

airborne LiDAR data, identified an overestimation of 23% - 42% of total stocks in the Saatchi and

Baccini maps in the Colombian Amazon (Mitchard et al., 2013) and a mean overestimation of about

100 Mg ha-1 for the Baccini map in the Colombian and Peruvian Amazon (Baccini and Asner,

2013).

In general, the biomass density values of the fused map were calibrated and therefore in agreement

with the existing estimates obtained from plot networks and high resolution maps. The comparison

of mean biomass values in intact and non-intact forests stratified by ecozone provided further

information on the differences among the maps. The mean biomass values of the fused map in non-

intact forests were mostly lower than those of the input maps, suggesting that in disturbed forests

the biomass estimates derived from stand height parameters retrieved by spaceborne LiDAR (as in

the input maps) tend to be higher compared to those based on tree parameters or very high

resolution airborne LiDAR measurements (as in the fused map and reference data). This difference

occurred especially in Africa, Asia and Central America while it was less evident in South America

and Australia. By contrast, the differences among the maps for intact forests varied by continent,

with the fused map having, on average, higher mean biomass values in Africa, Asia and Australia,

Page 21: An integrated pan-tropical biomass map using multiple ...

21

lower values in Central America, and variable trends within South America, reflecting the different

allometric relationships used by the various datasets in different continents.

As mentioned above, a larger amount of reference data, ideally acquired based on a clear statistical

sampling design, instead of an opportunistic one, will be required to confirm such conclusions.

While dense sampling of tropical forests using field observations is often impractical, new

approaches combining sufficient ground observations of individual trees at calibration plots with

airborne LiDAR measurements for larger sampling transects would allow a major increase in the

quantity of calibration data. In combination with wall-to-wall medium resolution satellite data (e.g.,

Landsat) these may be capable of achieving high accuracy over large areas (10% - 20% uncertainty

at 1-ha scale) while being cost-effective (e.g., Asner et al. 2014b; Asner et al., 2013). In addition,

new technologies, such as terrestrial LiDAR scanning, allows for better estimates at ground level

(Calders et al., 2015), reducing considerably the uncertainties of field estimates based on

generalized allometric equations without employing destructive sampling. Nevertheless, such

techniques benefit from extensive and precise measurements of tree identity in order to determine

wood density patterns, since floristic composition influences biomass at multiple scales (e.g., the

strong pan-Amazon gradient in wood density shown by ter Steege et al., 2006), and cannot account

for variations in hollow stems and rottenness (Nogueira et al., 2006).

Additional error sources

Apart from the uncertainty of the fusion model described above (see ‘Uncertainty’), three other

sources of error were identified and assessed in the present approach: i) errors in the reference

dataset; ii) errors due to temporal mismatch between the reference data and the input maps; iii)

errors in the stratification map.

Errors in the reference dataset

The reference dataset is not error-free but it inherits the errors present in the field data and local

maps and introduces additional uncertainties during the pre-processing of the data by resampling

the maps and by upscaling the plot data to 1 km resolution. In particular, while the geolocation error

of the original datasets was considered relatively small (< 50 m) since plot coordinates were

collected using GPS measurements and the biomass maps were based on satellite data with accurate

geolocation (i.e., Landsat, ALOS, MODIS), larger errors (up to 500 m, half a pixel) could have

been introduced with the resampling of the 1 km input maps. All these error sources were

Page 22: An integrated pan-tropical biomass map using multiple ...

22

minimized by selecting only the datasets that fulfilled certain quality criteria and by further

screening them by visual analysis of high-resolution images available on the Google Earth platform,

discarding the data not representative of the respective map pixels. In case of reference data that

clearly did not match with the high-resolution images and/or with the input maps (e.g., reporting no

biomass in dense forest areas or high biomass on bare land), the data were considered as an error in

the reference dataset, a geolocation error in the plots or maps, or it was assumed that a land change

process occurred between the plot measurement and the image acquisition time (see next paragraph).

Errors due to temporal mismatch

The temporal difference of input and reference data introduced some uncertainty in the fusion

model. The input maps refer to the years 2000 - 2001 (Saatchi) and 2007 - 2008 (Baccini) while the

reference data mostly spanned the period 2000 – 2013. Therefore, the differences between the input

maps and the reference data may also be due to a temporal mismatch of the datasets. However,

changes due to deforestation were most likely excluded during the visual selection of the reference

data, when high-resolution images showed clear land changes (e.g., bare land or agriculture) in

areas where the input maps provided biomass estimates relative to forest areas (or vice-versa,

depending on the timing of acquisition of the datasets). However, changes due to forest regrowth

and forest degradation events that did not affect the forest canopy could not be considered with the

visual analysis and may have affected the mismatch observed between the reference data and the

input maps (< |56 - 78| Mg ha-1 for 50% of the cases of the Saatchi and Baccini maps, respectively).

The mismatch was in the range of biomass changes due to regrowth (1 - 13 Mg ha-1 year-1) (IPCC,

2003) or low-intensity degradation (14 - 100 Mg ha-1, or 3-15% of total stock) (Pearson et al., 2014;

Asner et al., 2010). On the other hand, considering the area affected by degradation (about 20% in

the humid tropics) (Asner et al., 2009), the temporal mismatch was considered responsible only for

a limited part of the large differences observed between the reference data and the input maps.

Small additional offsets may also be caused by the documented secular changes in biomass density

within intact tropical forests, which has been increasing by 0.2 – 0.5% per year (Phillips et al. 1998,

Chave et al. 2008, Phillips and Lewis 2014). It should also be noted that the reference data were

used to optimally integrate the input maps, and in the case of a temporal difference the fused map

was also ‘actualized’ to the state of the vegetation when the reference data were acquired. Therefore

the fused map cannot be attributed to a specific year and it represents the first decade of the 2000’s.

Errors in the stratification map

Page 23: An integrated pan-tropical biomass map using multiple ...

23

The errors in the stratification map (i.e., related to the prediction of the errors of the input maps)

were still substantial in some areas and affected the fused map in two ways. First, the reference data

that were erroneously attributed to a certain stratum introduced ‘noise’ in the estimation of the

model parameters (bias and weight), but the impact of these ‘outliers’ was largely reduced by the

use of a robust covariance estimator. Second, erroneous predictions of the strata caused the use of

incorrect model parameters in the combination of the input maps. The latter is considered to be the

main source of error of the fused map and indicates that the method can achieve improved results if

the errors of the input maps can be predicted more accurately. However, additional analysis showed

that, on average, fused maps based on alternative stratification approaches achieved lower accuracy

than the map based on an error stratification approach (Fig. S5). Therefore, this approach was

preferred over a stratification based on an individual biophysical variable (e.g., tree cover, tree

height, land cover or ecozone).

Application of the method at national scale

The fusion method presented in this study allows for the optimal integration of any number of input

maps to match the patterns indicated by the reference data. However, the accuracy of the fused map

depends on the availability of reference data representative of the error patterns of the input maps.

While the current reference database does not represent adequately all error strata for the tropical

region, and the model estimates are expected to have lower confidence in under-represented areas,

the proposed method may be applied locally and provide improved biomass estimates where

additional reference data are available. For example, the fusion method may be applied at national

level using existing forest inventory data, research plots and local maps that cover only part of the

country to calibrate global or regional maps, which provide national coverage but may not be

tailored to the country context. Such country-calibrated biomass maps may be used to support

natural resource management and national reporting under the REDD+ mechanism, especially for

countries that have limited capacities to map biomass from remote sensing data (Romijn et al. 2012).

Considering the increasing number of global or regional biomass datasets based on different data

and methodologies expected in the coming years, and that likely there will not be a single ‘best

map’ but rather the accuracy of each will vary spatially, the fusion approach may allow to optimally

combine and adjust available datasets to local biomass patterns identified by reference data.

Page 24: An integrated pan-tropical biomass map using multiple ...

24

Acknowledgments

This study was supported by the EU GEOCARON project. Data were also acquired by the

Sustainable Landscapes Brazil project supported by the Brazilian Agricultural Research

Corporation (EMBRAPA), the US Forest Service, and USAID, and the US Department of State. OP,

SLL and LQ acknowledge the support of the European Research Council (T-FORCES), SLL and

TS were supported by CIFOR/USAID; SLL was also supported by a Philip Leverhulme Prize.

References

Asner GP, Clark JK, Mascaro J et al. (2012) High-resolution mapping of forest carbon stocks in the

Colombian Amazon. Biogeosciences, 9, 2683–2696.

Asner GP, Clark JK, Mascaro J et al. (2012) Human and environmental controls over aboveground

carbon storage in Madagascar. Carbon Balance and Management, 7, 2.

Asner GP, Mascaro J, Anderson C et al. (2013) High-fidelity national carbon mapping for resource

management and REDD+. Carbon balance and management, 8, 7.

Asner GP, Mascaro J (2014) Mapping tropical forest carbon: Calibrating plot estimates to a simple

LiDAR metric. Remote Sensing of Environment, 140, 614–624.

Asner GP, Knapp DE, Martin RE et al. (2014) Targeted carbon conservation at national scales with

high-resolution monitoring. Proceedings of the National Academy of Sciences, 111, E5016–

E5022.

Avitabile V, Herold M, Henry M, Schmullius C (2011) Mapping biomass with remote sensing: a

comparison of methods for the case study of Uganda. Carbon Balance and Management, 6, 7.

Avitabile V, Baccini A, Friedl M a., Schmullius C (2012) Capabilities and limitations of Landsat

and land cover data for aboveground woody biomass estimation of Uganda. Remote Sensing of

Environment, 117, 366–380.

Baccini a., Goetz SJ, Walker WS et al. (2012) Estimated carbon dioxide emissions from tropical

deforestation improved by carbon-density maps. Nature Climate Change, 2, 182–185.

Baccini A, Asner GP (2013) Improving pantropical forest carbon maps with airborne LiDAR

sampling. Carbon Management, 4, 591–600.

Bates JM, Granger CWJ (1969) The Combination of Forecasts. Journal of the Operational

Research Society, 20, 451–468.

Birdsey R, Angeles-Perez G, Kurz W a et al. (2013) Approaches to monitoring changes in carbon

stocks for REDD+. Carbon Management, 4, 519–537.

Page 25: An integrated pan-tropical biomass map using multiple ...

25

Breiman L (2001) Random forests. Machine Learning, 45, 5−23.

Calders K, Newnham G, Burt A et al. (2015) Nondestructive estimates of above-ground biomass

using terrestrial laser scanning. Methods in Ecology and Evolution, 6, 198–208.

Cartus O, Kellndorfer J, Walker W, Franco C, Bishop J, Santos L, Michel-Fuentes JM (2014) A

National, Detailed Map of Forest Aboveground Carbon Stocks in Mexico. Remote Sensing, 6,

5559–5588.

Chave J, Olivier J, Bongers F et al. (2008) Above-ground biomass and productivity in a rain forest

of eastern South America. Journal of Tropical Ecology, 24, 355–366.

ESA, 2014. Global Water Bodies. http://www.esa-landcover-cci.org/

FAO, 2000. Global ecological zoning for the global forest resources assessment 2000. FAO FRA

Working Paper Rome, Italy; 2001

Feldpausch TR, Banin L, Phillips OL et al. (2011) Height-diameter allometry of tropical forest

trees. Biogeosciences, 8, 1081–1106.

Feldpausch TR, Lloyd J, Lewis SL et al. (2012) Tree height integrated into pantropical forest

biomass estimates. Biogeosciences, 9, 3381–3403.

Ge Y, Avitabile V, Heuvelink GBM, Wang J, Herold M (2014) Fusion of pan-tropical biomass

maps using weighted averaging and regional calibration data. International Journal of Applied

Earth Observation and Geoinformation, 31, 13–24.

Goetz S, Dubayah R (2011) Advances in remote sensing technology and implications for measuring

and monitoring forest carbon stocks and change. Carbon Management, 2, 231–244.

Grace J, Mitchard E, Gloor E (2014) Perturbations in the carbon budget of the tropics. Global

Change Biology.

Harris NL, Brown S, Hagen SC et al. (2012) Baseline Map of Carbon Emissions from Deforestation

in Tropical Regions. Science, 336, 1573–1576.

Houghton RA, House JI, Pongratz J et al. (2012) Carbon emissions from land use and land-cover

change. Biogeosciences, 9, 5125–5142.

Jiahui W, Zamar R, Marazzi A, et al. (2014). robust: Robust Library. R package version 0.4-16.

http://CRAN.R-project.org/package=robust.

Kearsley E, de Haulleville T, Hufkens K et al. (2013) Conventional tree height-diameter

relationships significantly overestimate aboveground carbon stocks in the Central Congo

Basin. Nature communications, 4, 2269.

IPCC (2003). Good practice guidance for land use, land-use change and forestry. IPCC National

Greenhouse Gas Inventories Programme, Technical Support Unit. Hayama, Japan: Institute for

Global Environmental Strategies.

Page 26: An integrated pan-tropical biomass map using multiple ...

26

Langner A, Achard F, Grassi G (2014) Can recent pan-tropical biomass maps be used to derive

alternative Tier 1 values for reporting REDD+ activities under UNFCCC? Environmental

Research Letters, 9, 124008.

Lewis SL, Sonké B, Sunderland T et al. (2013) Above-ground biomass and structure of 260 African

tropical forests. Philosophical transactions of the Royal Society of London. Series B,

Biological sciences, 368, 20120295.

Lewis SL, Lopez-Gonzalez G, Sonké B et al. (2009) Increasing carbon storage in intact African

tropical forests. Nature, 457, 1003–1006.

Malhi Y, Wood D, Baker TR et al. (2006) The regional variation of aboveground live biomass in

old-growth Amazonian forests. Global Change Biology, 12, 1107–1138.

Mitchard ET, Saatchi SS, Baccini A, Asner GP, Goetz SJ, Harris NL, Brown S (2013) Uncertainty

in the spatial distribution of tropical forest biomass: a comparison of pan-tropical maps.

Carbon balance and management, 8, 10.

Mitchard ET a, Feldpausch TR, Brienen RJW et al. (2014) Markedly divergent estimates of

Amazon forest carbon density from ground plots and satellites. Global Ecology and

Biogeography, 23, 935–946.

Nogueira MA, Diaz G, Andrioli W, Falconi F a., Stangarlin JR (2006) Secondary metabolites from

Diplodia maydis and Sclerotium rolfsii with antibiotic activity. Brazilian Journal of

Microbiology, 37, 14–16.

Pearson TRH, Brown S, Casarim FM (2014) Carbon emissions from tropical forest degradation

caused by logging. Environmental Research Letters, 034017, 11.

Phillips, O. L., Malhi, Y., Higuchi, N., Laurance, W., Nuñez, P., Vásquez, R., Laurance, S.,

Ferreira, L., Stern, M., Brown, S., Grace J (1998) Changes in the carbon balance of Tropical

Forests: Evidence from long-term plots. Science, 282, 439–442.

Phillips OL, Aragão LEOC, Lewis SL et al. (2009) Drought sensitivity of the Amazon Rainforest.

Science, 323, 1344–1347.

Phillips OL, Lewis SL (2014) Evaluating the tropical forest carbon sink. Global Change Biology,

20, 2039–2041.

Potapov P, Yaroshenko A, Turubanova S et al. (2008) Mapping the world’s intact forest landscapes

by remote sensing. Ecology and Society, 13.

Romijn E, Herold M, Kooistra L, Murdiyarso D, Verchot L (2012) Assessing capacities of non-

Annex I countries for national forest monitoring in the context of REDD+. Environmental

Science and Policy, 19-20, 33–48.

Saatchi SS, Harris NL, Brown S et al. (2011) Benchmark map of forest carbon stocks in tropical

regions across three continents. Proceedings of the National Academy of Sciences, 108, 9899–

9904.

Page 27: An integrated pan-tropical biomass map using multiple ...

27

Saatchi SS, Mascaro J, Xu L et al. (2014) Seeing the forest beyond the trees. Global Ecology &

Biogeography, 23, 935 – 946.

Searle SR (1971) Linear Models, Vol. XXI. WILEY-VCH Verlag, York-London-Sydney-Toronto,

532 pp.

Slik JWF, Paoli G, Mcguire K et al. (2013) Large trees drive forest aboveground biomass variation

in moist lowland forests across the tropics. Global Ecology and Biogeography, 22, 1261–1271.

Ter Steege H, Pitman NC a, Phillips OL et al. (2006) Continental-scale patterns of canopy tree

composition and function across Amazonia. Nature, 443, 444–447.

Willcock S, Phillips OL, Platts PJ et al. (2012) Towards Regional, Error-Bounded Landscape

Carbon Storage Estimates for Data-Deficient Areas of the World. PLoS ONE, 7, 1–10.

Wright JS (2013) The carbon sink in intact tropical forests. Global Change Biology, 19, 337–339.

Ziegler AD, Phelps J, Yuen JQ et al. (2012) Carbon outcomes of major land-cover transitions in SE

Asia: Great uncertainties and REDD+ policy implications. Global Change Biology, 18, 3087–

3099.

Zolkos SG, Goetz SJ, Dubayah R (2013) A meta-analysis of terrestrial aboveground biomass

estimation using lidar remote sensing. Remote Sensing of Environment, 128, 289–298.

Supporting Information

Appendix S1. Supplementary methods and results

Page 28: An integrated pan-tropical biomass map using multiple ...

28

Appendix S1

Supplementary methods and results

Stratification approach

Error modelling

The stratification approach implemented in this study aimed to identify areas with homogeneous

error structure in both input maps. The error maps of the Saatchi and Baccini maps were first

predicted separately and then combined in eight error strata per continent using a clustering

approach. Since the biomass estimates of the input maps were mostly based on optical and LiDAR

data that are sensitive to tree cover and tree height, it was assumed that their uncertainties were

related to the spatial variability of these parameters. In addition, the errors of the input maps

resulted to be linearly correlated with the respective biomass estimates. For these reasons, the

biomass maps themselves as well as global datasets of land cover, tree cover and tree height were

used to predict the map errors using a Random Forest model (Breiman, 2001), calibrated on the

basis of the reference dataset. Then, the error maps of the Saatchi and Baccini datasets were

clustered using the K-Means approach. The number of clusters was determined as a trade-off

between homogeneity of the errors of the input maps (quantified as variance (sum of squares)

within groups of the error maps) and number of reference observations available per stratum. Eight

clusters was considered as a sensible compromise between these two parameters, with a larger

number of clusters providing only a marginal increase in homogeneity but leading to a small

number of reference data in some strata (Fig. S1). The resulting stratification map presented missing

values where the predictors presented no data, i.e. areas without coverage of the Baccini map, or for

classes of the categorical predictor (i.e., land cover) without reference data. The missing values

were estimated using an additional Random Forest model that predicted the strata (instead of the

errors of the input maps) based on 10,000 randomly selected training data. These ‘secondary’

training data were extracted from the stratification map and included only the predictors without

missing values (i.e., Saatchi map, tree cover and tree height). The performance of this approach was

assessed on the basis of the RMSE of the models that predicted the errors of the input maps, and on

the error rate of the models that predicted the strata for the areas with missing values (Fig. S2, Fig.

S3). The performance statistics were computed as an average of 100 model repetitions to account

for a certain level of randomness inherent in the ensemble modelling approach used in this study

(Random Forest). The importance of the predictor variables was assessed on the basis of the total

increase in node purities (measured by residual sum of squares) from splitting on the variable,

Page 29: An integrated pan-tropical biomass map using multiple ...

29

averaged over all trees (Liaw and Wiener, 2002) (Table S1). According to this variable, the main

predictors of the errors of the input maps were in most cases the biomass values of the maps

themselves, followed by tree cover and tree height, while land cover was always the least important

predictor.

Figure S1: Relation between the number of clusters (x axis), the average number of reference data per cluster

(red line) and their homogeneity (within groups sum of square) (black line) when the error maps of the Saatchi

and Baccini datasets are clustered using the K-Means approach.

Page 30: An integrated pan-tropical biomass map using multiple ...

30

Figure S2: Performance of the models that predicted the error of the input maps (RMSE) and of the models that

predicted the strata for the areas not covered by the Baccini map (Error rate). Error statistics are computed as

mean of 100 random model repetitions, with the error bars indicating 1 standard deviation.

Figure S3: Comparison of predicted errors on the Out-Of-Bag data (i.e., data not used for training the model)

with observed errors of the Random Forest models that estimated the errors of the input maps.

Table S1: Importance of the variables used to predict the errors of the input maps by the Random Forest models.

The importance is represented by the increase in node purity (x 1000). The variables are the biomass of Saatchi

Page 31: An integrated pan-tropical biomass map using multiple ...

31

map (“AGB Saatchi”), biomass of the Baccini map (“AGB Baccini”), forest height (“Height”), tree cover (“VCF”)

and land cover according to the ESA CCI map (“CCI”).

Africa S. America C. America Asia Australia

Saatchi Baccini Saatchi Baccini Saatchi Baccini Saatchi Baccini Saatchi

AGB Saatchi 2,607 2,963 1,321 839 14,684 963 1,293 618 1,536

AGB Baccini 2,365 2,737 839 923 1,832 6,219 890 766 -

VCF 2,346 3,223 1,170 1,237 2,469 3,071 614 534 4,971

Height 1,629 1,710 784 748 3,320 1,990 586 537 949

CCI 471 937 22 12 433 239 164 104 1,702

Pre-processing of covariate maps

The maps used to predict the error strata were obtained as follows. The land cover information was

derived from the ESA CCI 2005 Land Cover map (ESA, 2014) by first reclassifying it into eight

classes representing the main vegetation types (Evergreen forest, Deciduous forest, Woodland,

Mosaic Vegetation, Shrubland, Grassland, Cropland, Other land) and then resampling it from its

original resolution (330 m) to the resolution of the Saatchi map (1 km) on the basis of the majority

criterion. The tree cover value was obtained from the annual MODIS VCF tree cover composites at

250m for the years 2000 – 2010 (DiMiceli et al., 2011). The eleven annual datasets were averaged

to a mean decadal composite, spatially aggregated to 1 km resolution and warped to the geographic

projection and grid of the Saatchi map (WGS-84). Tree height was directly derived from the Global

3D Vegetation Height map at 1km resolution (Simard et al., 2011). Additional potential land cover

and ecosystem predictors, namely the GLC2000 map (Mayaux et al., 2004), the Synmap map (Jung

et al., 2006) and the Global Ecological Zone (GEZ) map (FAO, 2000), were tested but discarded

after preliminary analysis because they did not provide additional explanatory power to the error

models. All maps were resampled to match the grid of the Saatchi map using the nearest neighbor

method.

Stratification map and alternative approaches

The stratification map obtained with the approach described above identified eight strata depicting

homogeneous error patterns of the input maps for each continent (Central America, South America,

Africa, Asia and Australia). The map, reported in Figure S4, was used to identify the parameters

(bias and weights) of the fusion model.

Page 32: An integrated pan-tropical biomass map using multiple ...

32

Alternative stratification approaches using individual datasets to predict the errors of the input maps,

namely land cover, tree cover or tree height, were tested and their performance assessed by

validating the respective fused maps. The fused maps based on each stratification map were trained

using a calibration dataset (random selection of 70% of the reference data) and compared with the

validation dataset (remaining 30% of the reference data) to compute the respective RMSE. To

account for any potential impacts of the random selection of validation data, the procedure was

repeated 100 times, computing each time a new random selection of the calibration and validation

datasets. This procedure allowed to compute the mean RMSE and assess its standard deviation for

each map.

Figure S4: Stratification map, depicting eight strata per continent with homogeneous error patterns of the input

maps.

Page 33: An integrated pan-tropical biomass map using multiple ...

33

Figure S5: Performance of all stratification approaches, computed as RMSE of the respective fused maps

assessed using independent reference data. The “Model” stratification refers to the approach used in this study

(“Error strata”). The alternative stratification approaches consist of individual datasets used to predict the

errors of the input maps, i.e., forest height (“HEI”), tree cover (“VCF”), land cover according to the GLC2000

map (“GLC”) and land cover according to the ESA CCI map (“CCI”).

Reference dataset

Visual selection of the reference field data

After the preliminary selection of field plots based on metadata information (tree parameters

measured in the field, allometric model applied, year of measurement, plot geolocation and extent),

the ground data were further screened to discard the plots not representative of the biomass density

within the 1 km cells of the input maps. Considering that the two input maps are not aligned and

therefore their pixels do not correspond to the same geographic area, the plots needed to be

representative of the area of both pixels identified before their resampling. For this reason, the plot

representativeness was assessed on the area obtained by merging the pixel extent of the Saatchi map

with the corresponding pixel extent of the Baccini map after its aggregation to 1 km but before its

resampling. The representativeness was assessed by means of visual interpretation of high

resolution images provided on the Google Earth platform, discarding the plots located in pixels with

Page 34: An integrated pan-tropical biomass map using multiple ...

34

heterogeneous tree cover and tree crowns. Pixels were considered heterogeneous if more than 10%

of the area presented forest structure different than that of the majority of the area (i.e., pixels

needed to be homogeneous for at least 90% of their area). Moreover, if the sum of the area of all

plots located within a pixel was smaller than 0.1 ha, these plots were removed because the area

measured on the ground was considered insufficient to represent 1km pixels even in homogeneous

contexts. Even though biomass may still vary substantially within an area having similar forest

structure, using the texture and context information assessed through visual interpretation allowed

for higher confidence than assessing pixel homogeneity using fixed thresholds of tree cover and

texture variability automatically derived from coarser resolution images (i.e., MODIS or Landsat).

Examples of plots selected or discarded through the visual analysis of Google Earth images are

provided in Fig. S6.

Figure S6: visual selection of the field plots. Plots are discarded if not representative of the biomass density in the

1 km2 area of the input maps. The yellow and red polygons represent the area of the corresponding pixels of the

Saatchi and Baccini maps and the blue circles represent the extent of the field plots, superimposed on the Google

Earth images. The right plots were selected (tree cover and texture are homogeneous) and the left plot was

discarded (heterogeneous pixels). FIGURE TO BE ENHANCED: plot area is not visible

The pan-tropical reference dataset

The reference data selected after the screening, upscaling and consolidating procedure, for each

individual dataset are reported in Table S2. The field plots underpinning the reference biomass

maps are not included among the “Available plots” to avoid double-counting, since the respective

reference data are already considered as “Selected pixels”. Table S3 indicates the main type of

training data used by the high resolution biomass maps and the criteria used to select the 1-km

Page 35: An integrated pan-tropical biomass map using multiple ...

35

reference pixels for this study. A complete description of the metadata and the literature reference

of the ground reference data and reference biomass maps are provided in Table S7 – S10.

Table S2: Number of reference data (plots and 1-km pixels) selected after the screening, upscaling and

consolidating procedure, for each individual dataset.

ID Continent Type Country/Region Available Selected Consolidated

Plots Plots Pixels Pixels

AFR1 Africa Plots DRC 1,157 1,067 227 227

AFR2 Africa Plots Sierra Leone 609 576 162 162

AFR3 Africa Plots Tropical Africa 260 175(a) 161 161

AFR4 Africa Plots Ethiopia 119 60 42 42

AFR5 Africa Plots Ghana 74 65 12 12

AFR6 Africa Plots Tanzania 42 23 9 9

AFR7 Africa Plots DRC 20 10 9 9

AFR8 Africa Map Uganda 223 223

AFR9 Africa Map Madagascar 60 60

AFR10 Africa Map Mozambique 38 38

AFR11 Africa Map Cameroon 10 10

TOTAL AFRICA 2,281 1,976 953 953

SAM1 S. America Plots Amazon basin 413 287(b) 221 221

SAM2 S. America Plots Brazil 124 101 48 48

SAM3 S. America Plots Guyana 111 86 30 30

SAM4 S. America Map Peru 100 100

SAM5 S. America Map Colombia 50 50

TOTAL S. AMERICA 648 474 449 449

CAM1 C. America Map Mexico 4,263 6,072

CAM2 C. America Map Panama 997 3,095

TOTAL C. AMERICA 0 0 5,260 9,167

ASI1 Asia Plots Vietnam 3,197 1,547 101 101

ASI2 Asia Plots Laos 122 68 65 65

ASI3 Asia Plots Sabah 104 74 53 53

ASI4 Asia Plots Indonesia 82 39 36 36

ASI5 Asia Plots SE Asia 132 77 77 77

ASI6 Asia Plots Indonesia 25 17 11 11

Page 36: An integrated pan-tropical biomass map using multiple ...

36

ASI7 Asia Plots SE Asia 25 4 4 4

ASI8 Asia Plots Indonesia 11 7 6 6

ASI9 Asia Map Asia 0 47(c)

TOTAL ASIA 3,698 1,833 353 400

AUS1 Australia Map Australia 5,000 5,000

TOTAL AUSTRALIA 0 0 5,000 5,000

TOTAL TROPICS 6,627 4,283 12,015 15,969

(a) 22 plots (DOU, OVG, EKO, LOP-01, CVL, GBO) were removed because they were used by Saatchi et al. (2011) to

calibrate the LiDAR/biomass relationships, and therefore highly correlated with the map values.

(b) 264 plots measured for the last time after the year 2000 were selected. In addition, 23 plots measured from 1990 in

undisturbed forests were included because the visual analysis of high-resolution images did not indicate any sign of

disturbance.

(c) Additional training data representing areas with no biomass (e.g., bare soil) were included only for Asia, where 47

pixels were selected using visual analysis of Google Earth images.

Table S3: Type of calibration data used by the high resolution biomass maps (identified by the “ID”) and criteria

used to select the 1-km reference pixels for this study (“Selected pixels”).

ID Calibration data Selected pixels

AFR8 Field plots Pixels with plots representative of >40% of pixel area

AFR9 Airborne LiDAR Random pixels in an amount proportional to other datasets

AFR10 Field plots All pixels with plots

AFR11 Field plots All pixels with plots

SAM4 Airborne LiDAR Random pixels in an amount proportional to other datasets

SAM5 Airborne LiDAR Random pixels in an amount proportional to other datasets

CAM2 Field plots Pixels with available plots

CAM3 Airborne LiDAR Random pixels in an amount proportional to the map area

AUS1 Field plots Random pixels in an amount proportional to other continents

The reference dataset described above was used to assess the errors of the input maps (RMSE) and

the linear correlation (r) of the errors for each continent (Table S4).

Table S4: Linear correlation of the errors (r) and RMSE of the input maps computed using the complete

reference dataset, per continent.

Africa S. America C. America Asia Australia

Page 37: An integrated pan-tropical biomass map using multiple ...

37

Saatchi Baccini Saatchi Baccini Saatchi Baccini Saatchi Baccini Saatchi

r 0.77 0.72 0.61 0.65 0.60 0.75 0.64 0.72 0.69

RMSE 105 116 104 96 97 101 124 103 45

The fusion model

The biases and weights computed by the fusion model per stratum and continent are reported in

Table S5, along with the number of reference data and relative area of each stratum (in percentage).

Table S5: Bias values and weights applied to the input maps, number of reference data and percentage relative

area per stratum and continent

Bias Weight Ref. data (N) Area

Strata Saatchi Baccini Saatchi Baccini

AFRICA

1 -11 -55 0.29 0.71 111 13%

2 -26 -17 0.57 0.43 123 21%

3 113 203 0.54 0.46 163 1%

4 200 146 0.00 1.00 77 2%

5 77 -51 0.93 0.07 66 3%

6 -91 -49 0.07 0.93 146 7%

7 -56 -87 0.45 0.55 89 8%

8 -7 16 0.60 0.40 178 45%

ASIA

1 -42 -22 0.75 0.25 83 69%

2 82 52 0.00 1.00 19 2%

3 43 116 0.45 0.55 26 2%

4 174 190 0.00 1.00 38 2%

5 -186 -74 0.84 0.16 74 2%

6 -125 -117 0.43 0.57 61 2%

7 11 -31 1.00 0.00 55 14%

8 -77 20 1.00 0.00 44 6%

CENTRAL AMERICA

1 -16 -67 0.43 0.57 1574 20%

2 -12 -112 0.34 0.66 1155 7%

3 -119 -80 0.62 0.38 1511 6%

4 -76 -163 0.53 0.47 748 2%

5 -204 -101 0.45 0.55 617 3%

6 -63 -74 0.65 0.35 2402 9%

7 -14 -27 0.38 0.62 637 51%

8 -178 -178 0.33 0.67 506 1%

Page 38: An integrated pan-tropical biomass map using multiple ...

38

SOUTH AMERICA

1 -37 -112 0.38 0.62 72 6%

2 27 5 0.84 0.16 43 10%

3 -18 -44 0.96 0.04 37 56%

4 38 -64 0.39 0.61 51 7%

5 161 139 0.17 0.83 79 3%

6 -102 -79 0.75 0.25 59 4%

7 -23 3 0.33 0.67 48 10%

8 124 40 0.97 0.03 60 5%

AUSTRALIA

1 -21 NA 1.00 0.00 847 16%

2 -37 NA 1.00 0.00 449 6%

3 79 NA 1.00 0.00 411 2%

4 2 NA 1.00 0.00 859 29%

5 141 NA 1.00 0.00 220 2%

6 41 NA 1.00 0.00 564 4%

7 -11 NA 1.00 0.00 1145 35%

8 16 NA 1.00 0.00 505 6%

Biomass map

Biomass stocks of the fused and input maps

The total biomass stocks (Pg) of all land cover types per continent and biomass map are reported in

Table S6. For comparability among the three maps (Saatchi, Baccini and fused map), the stocks are

computed for the extent of the Baccini map while the values in brackets represent the stocks relative

to the larger extent of the Saatchi map. The values reported for the Saatchi map are higher than

those published by Saatchi et al. (2011) because the latter refer only to areas with tree canopy

cover >10%.

Table S6: Total biomass stocks (Pg) of all land cover types per continent and biomass map. The values refer to

the extent of the Baccini map while the values in brackets represent the stocks relative to the larger extent of the

Saatchi map.

Africa S. America C. America Asia Australia Total

Baccini 129 216 18 93 - 457

Saatchi 113 (117) 178 (196) 15 (20) 107 (192) (21) 413 (545)

Fused 96 (97) 179 (191) 7 (9) 78 (134) (19) 360 (451)

Page 39: An integrated pan-tropical biomass map using multiple ...

39

Comparison of forest biomass in intact and non-intact landscapes

The biomass values per continent for intact and non-intact forests were computed as the area-

weighted mean of the contributing ecozones (see related section in “Data and Methods”). In Africa

the fused map presented higher mean biomass values than the Saatchi and Baccini maps in intact

forests (+86 and +65 Mg ha-1, respectively) and lower values in non-intact forests (-15 and -36 Mg

ha-1, respectively), in Asia and in Central America the fused map was consistently lower than the

input maps in both intact (between -74 and -125 Mg ha-1) and non-intact forests (between -51 and -

65 Mg ha-1), in South America the fused map was similar to the Saatchi map in both intact and non-

intact forests (+20 and +3 Mg ha-1, respectively) and lower than the Baccini map (-18 and -28 Mg

ha-1, respectively), and in Australia the fused map was higher than the Saatchi map in both intact

and non-intact forests (+72 and +30 Mg ha-1, respectively). As expected, intact forests had

consistently higher mean biomass than non-intact forest in all continents and major ecozones. When

considering the average difference between intact and non-intact forests, this was larger in the fused

map than in the Saatchi and Baccini maps in Africa (236, 143 and 154 Mg ha-1 respectively),

similar or slightly larger in South America (78, 63 and 79 Mg ha-1 respectively), and smaller in

Central America (58, 74 and 86 Mg ha-1 respectively) and Asia (39, 59 and 55 Mg ha-1 respectively).

Page 40: An integrated pan-tropical biomass map using multiple ...

40

Figure S7: Mean biomass density of the fused and input maps in intact (I) and non-intact (NI) forests stratified

by continent and ecozone. Only ecozones with intact forest larger than 1,000 km2 are reported

Page 41: An integrated pan-tropical biomass map using multiple ...

41

Table S7: Metadata of the ground reference datasets (part 1)

ID Continent Country / Region Location Extent Vegetation type(s) Year(s) N.

plots

Area -

Range (ha)

Area - Mean

(ha)

Min. DBH

(cm)

Reference

AFR1 Africa DRC Lukenie Local Forest (concession) NA 1157 0.5 0.5 10 Hirsch et al., 2013

AFR2 Africa Sierra Leone Gola Forest Local Forest 2005-2007 609 0.125 0.125 10 Lindsell and Klop, 2013

AFR3 Africa Tropical Africa Tropical Africa Regional Forest (Intact) 1970s - 2013 260 0.2 - 10 1.2 10 Lewis et al., 2013

AFR4 Africa Ethiopia Kafa Local Forest - Woodland 2011-2013 119 0.126 0.126 5 De Vries et al., 2012

AFR5 Africa Ghana Ankasa Local Forest 2012 34 0.05 0.05 10 Laurin et al., 2013

AFR5 Africa Ghana Bia Boin, Dadieso Local Forest 2012-2013 40 0.16 0.16 5 Pirotti et al., 2014

AFR6 Africa Tanzania Eastern Arc Mountain Local Forest 2007-10 42 0.08 - 1 0.66 10 Lopez-Gonzalez et al., 2009, 2011

AFR7 Africa DRC Yangambi Local Forest (Intact) NA 20 1 1 10 Kearsley et al., 2013

SAM1 S. America Amazon Amazon Regional Forest 1956 - 2013 413 0.25 - 9 1 10 Mitchard et al., 2013; Lopez-Gonzalez et al., 2014

SAM2 S. America Brazil Brazil National Forest 2009 - 2013 124 0.16 - 1 0.42 5 - 10 Embrapa, 2014

SAM3 S. America Guyana Guyana Local Forest 2010 - 2011 111 5 NA

CAM1 C. America Mexico Mexico National Forest 2004-2008 4136 7.5 de Jong, 2013

ASI1a Asia Vietnam Quang Nam Province Forest 2007-2009 3035 0.05 0.05 6 Avitabile et al., 2014

ASI1b Asia Vietnam Quang Nam Province Forest 2011-2012 162 0.01 - 0.126 0.08 5 Avitabile et al., 2014

ASI2 Asia Laos Xe Pian Local Forest 2011-2012 122 0.1 - 0.126 0.11 5 WWF and OBf, 2013

ASI3 Asia Indonesia Sabah Local Forest (concession) 2005 - 2008 104 0.5 - 1.5 1 10 Morel et al., 2011

ASI4 Asia Indonesia Riau province Local Forest 2009-2010 82 0.015 0.015 5 Wijaya et al., 2015

ASI5 Asia Asia India, China, Indonesia Local Forest (Intact) circa-2010 132 0.25 - 20 1.5 10 Slik et al., 2013, 2014

ASI6 Asia Malaysia, Indonesia Sarawak, C. Kalimantan Regional Forest (Intact) 2013 - 2014 25 0.25 - 1 0.57 10 Phillips et al., in prep.

ASI7 Asia Indo-Pacific Indo-Pacific Regional Mangrove 2008 - 2009 25 0.015 0.015 5 Donato et al., 2011

ASI8 Asia Indonesia Kalimantan Local Mangrove 2008-2009 11 0.015 0.015 5 Murdiyarso et al., 2010 (a)

(a) The metadata for this dataset are provided in Amira (2008) and Kauffman and Donato (2012)

Page 42: An integrated pan-tropical biomass map using multiple ...

42

Table S8: Metadata of the ground reference datasets (part 2)

ID Parameter

measured

Tree Height Allometric equation Parameters of

allom. eq.

Plot type Permanent plot

AFR1 Dbh, Sp, Hei not used Chave (2005) Moist Dbh, wd, hei For. Inv. No

AFR2 Dbh, Sp, Hei local eq. Chave (2005) Moist Dbh,wd, hei For. Inv. No

AFR3 Dbh, Sp Feldpausch (2012) Chave (2005) Moist Dbh,wd, hei Res. plots Yes

AFR4 Dbh, Sp not used Chave (2005) Wet dbh, wd Res. plots No

AFR5 Dbh, Sp, Hei measured for all trees Chave (2005) Moist Dbh,wd, hei Res. plots No

AFR5 Dbh, Sp, Hei measured for all trees Chave (2005) Moist Dbh,wd, hei Res. plots No

AFR6 Dbh, Sp Feldpausch (2012) Chave (2005) Moist Dbh,wd, hei Res. plots Yes

AFR7 Dbh, Sp, Hei stand-specific eq. Chave (2005) Moist Dbh,wd, hei Res. plots Yes

SAM1 Dbh, Sp Feldpausch (2012) Chave (2005) Moist Dbh,wd, hei Res. plots Yes

SAM2 Dbh, Sp, Hei measured for all trees Chave (2005) Moist Dbh,wd, hei For. Inv. No

SAM3 Dbh, Sp not used Chave (2005) Moist dbh, wd For. Inv. No

CAM1 Dbh, Sp, Hei measured for all trees Urquiza-Haas et al. (2007) Dbh,wd, hei For. Inv. Yes

ASI1a Dbh, Sp, Hei local eq. Chave (2005) Moist Dbh,wd, hei For. Inv. No

ASI1b Dbh, Sp not used Chave (2005) Moist dbh, wd Res. plots No

ASI2 Dbh, Sp not used Chave (2005) Dry/Moist dbh, wd Res. plots No

ASI3 Dbh, Sp, Hei stand-specific eq. Chave (2005) Moist Dbh,wd, hei Res. plots No

ASI4 Dbh, Sp not used Komiyama et al. (2008), Chave (2005) Moist dbh, wd Res. plots No

ASI5 Dbh, sp Feldpausch (2012) Chave (2005) Dry/Moist/Wet Dbh,wd, hei Res. plots No

ASI6 Dbh, sp Feldpausch (2012) Chave (2005) Moist Dbh,wd, hei Res. plots Yes

ASI7 Dbh, sp not used Komiyama et al. (2008) dbh, wd Res. plots No

ASI8 Dbh, Sp not used Komiyama et al. (2008) dbh, wd Res. plots No

Page 43: An integrated pan-tropical biomass map using multiple ...

43

Table S9: Metadata of the reference biomass maps (part 1)

ID Continent Country/Region Location Extent Vegetation types (a)

Year (map) Resolution (m)

Accuracy dataset

RMSE (Mg/ha)

R2 RS data Reference

AFR8 Africa Uganda Uganda National For – Wood – Sav.

1999-2003 30 Validation 13 0.81 Landsat, LC

Avitabile et al., 2012

AFR9 Africa Madagascar North Madagascar

Local Forest 2010 100 Calibration 42 0.88 Landsat, LiDAR

Asner et al., 2012

AFR10 Africa Mozambique Gorongosa Local Wood – Sav 2007 50 Validation 20 0.49 ALOS Ryan et al., 2012

AFR11 Africa Cameroon Mbam Djerem Local For - Sav 2007 100 Calibration 29 NA ALOS Mitchard et al., 2011

SAM4 S. America Peru Peru National For – Wood – Grass

NA 100 Calibration 55 0.82 Landsat, LiDAR

Asner et al., 2014

SAM5 S. America Colombia Colombian amazon

Regional Forest 2010 100 Validation 58 NA Landsat, LiDAR

Asner et al., 2012

CAM2 C. America Mexico Mexico National Forest 2007 30 Validation 28 0.52 Landsat, ALOS

Cartus et al., 2014

CAM3 C. America Panama Panama National For – Wood – Grass

2008 - 2012

100 45 0.62 Landsat, LiDAR

Asner et al., 2013

AUS1 Australia Australia West Australia Local Wood – Sav 50 NA NA NA ALOS Lucas et al., 2010

(a) Forest (For), woodland (Wood), Savannah (Sav), Grassland (Grass)

Page 44: An integrated pan-tropical biomass map using multiple ...

44

Table S10: Metadata of the reference biomass maps (part 2)

ID N. plots Years (Plots) Plot size -

Range (ha)

Plot size -

Mean (ha)

Min. DBH

(cm)

Parameter

measured

Tree Height Allometric equation Parameters of

allometric eq.

Plot type

AFR8 2527 1995-2005 0.25 0.25 3 Dbh, Sp, Crown measured Drichi (2003) Dbh, wd,

crown

For. Inv.

AFR9 19 NA 0.28 0.28 0 Dbh, Sp, Hei local eq. Chave (2005) Wet Dbh, wd, hei Res. plots

AFR10 96 2006-2009 0.1 - 2.2 0.63 5 Dbh not used Ryan et al. (2011) dbh Res. plots

AFR11 25 2007 0.2 - 1 0.6 10 Dbh, Sp, Hei local eq. Chave (2005) Dry/Moist/Wet Dbh, wd, hei Res. plots

SAM4 272 NA 0.3 - 1 0.33 NA Dbh, Sp, Hei local eq. Chave et al., 2014 Dbh, wd, hei For. Inv.

SAM5 11 NA 0.28 0.28 10 Dbh, Sp local eq. Chave (2005) Moist Dbh, wd, hei Res. plots

CAM2 16906 2004 - 2007 1 1 7.5 Dbh, Sp, Hei measured National species-specific eq. Dbh, wd, hei For. Inv.

CAM3 228 NA 0.1 - 0.36 NA 10 Dbh, Sp, Hei local eq. Chave (2005) Dry/Moist/Wet Dbh, wd, hei Res. plots

AUS1 2781

Page 45: An integrated pan-tropical biomass map using multiple ...

45

Additional References

Amira S (2008). An estimation of Rhizophora apiculata Bl. biomass in mangrove forest in Batu

Ampar Kubu Raya Regency, West Kalimantan. Undergraduate Thesis, Bogor Agricultural

University, Indonesia

Avitabile V, 2014. Carbon stocks of vegetation in the Vu Gia Thu Bon river basin, central Vietnam.

Technical Report. Land Use and Climate Change Interactions in Central Vietnam (LUCCi)

project. http://leutra.geogr.uni-jena.de/vgtbRBIS/metadata/start.php

DeVries B, Avitabile V, Kooistra L, Herold M, 2012. Monitoring the impact of REDD+

implementation in the Unesco Kafa biosphere reserve, Ethiopia. Proceedings of the “Sensing a

Changing World” Workshop (2012).http://www.wageningenur.nl/upload_mm/9/d/c/f80b6db7-

9c3c-4957-8717-6fdc6e46e60f_deVries.pdf

Donato DC, Kauffman JB, Murdiyarso D, Kurnianto S, Stidham M, Kanninen M (2011) Mangroves

among the most carbon-rich forests in the tropics. Nature Geoscience, 4, 293–297.

EMBRAPA, 2014. Sustainable Landscape Brazil. http://geoinfo.cnpm.embrapa.br/geonetwork/srv/

eng/main.home

ESA, 2014. Global land cover map for the epoch 2005. http://www.esa-landcover-cci.org/

DiMiceli, C.M., Carroll, M.L., Sohlberg, R.A., Huang, C., Hansen, M.C. and Townshend J.R.G.

(2011). Annual Global Automated MODIS Vegetation Continuous Fields (MOD44B) at 250 m

Spatial Resolution for Data Years Beginning Day 65, 2000 - 2010, Collection 5 Percent Tree

Cover, University of Maryland, College Park, MD, USA

Hirsh, F.; Jourget, J. G.; Feintrenie, L.; Bayol, N.; Atyi, R. E., 2013. REDD+ pilot project in

Lukenie. Center for International Forestry Research (CIFOR), Jakarta, Indonesia, CIFOR

Working Paper, 2013, 111, pp vi + 48 .

Jung M, Henkel K, Herold M, Churkina G (2006) Exploiting synergies of global land cover

products for carbon cycle modeling. Remote Sensing of Environment, 101, 534–553.

Kauffman JB and Donato DC (2012). Protocols for the measurement, monitoring and reporting of

structure, biomass and carbon stocks in mangrove forests. Working Paper 86. CIFOR, Bogor,

Indonesia.

Liaw, A., Wiener, M. (2002). Classification and regression by random forest. R News, 2

(3), 18–22.

Lindsell J a., Klop E (2013) Spatial and temporal variation of carbon stocks in a lowland tropical

forest in West Africa. Forest Ecology and Management, 289, 10–17.

Lopez-Gonzalez, G., Lewis, S.L., Burkitt, M., Baker T.R. and Phillips, O.L. 2009. ForestPlots.net

Database. www.forestplots.net. Date of extraction [09,09,13]

Page 46: An integrated pan-tropical biomass map using multiple ...

46

Lopez-Gonzalez G, Lewis SL, Burkitt M and Phillips OL (2011). ForestPlots.net: a web application

and research tool to manage and analyse tropical forest plot data. Journal of Vegetation

Science 22: 610–613. doi: 10.1111/j.1654-1103.2011.01312.x

Lopez-Gonzalez G, Mitchard ETA, Feldpausch TR et al. (2014) Amazon forest biomass measured

in inventory plots. Plot Data from "Markedly divergent estimates of Amazon forest carbon

density from ground plots and satellites." doi: 10.5521/FORESTPLOTS.NET/2014_1

Lucas R, Armston J, Fairfax R et al. (2010). An evaluation of the ALOS PALSAR L-band

backscatter – above ground biomass relationship over Queensland, Australia. IEEE Journal of

Selected Topics in Earth Observations and Remote Sensing, 3(4): 576-593.

Mayaux P, Bartholome E, Fritz S, Belward A (2004) A New Land Cover Map of Africa for the

Year 2000. Journal of Biogeography, 31, 861–877.

Mitchard ET, Saatchi SS, Lewis SL et al. (2011) Measuring biomass changes due to woody

encroachment and deforestation/degradation in a forest-savanna boundary region of central

Africa using multi-temporal L-band radar backscatter. Remote Sensing of Environment, 115,

2861–2873.

Morel AC, Saatchi SS, Malhi Y et al. (2011) Estimating aboveground biomass in forest and oil

palm plantation in Sabah, Malaysian Borneo using ALOS PALSAR data. Forest Ecology and

Management, 262, 1786–1798.

Murdiyarso D, Donato D, Kauffman JB, Kurnianto S, Stidham M, Kanninen M (2010). Carbon

storage in mangrove and peatland ecosystems: a preliminary account from plots in Indonesia.

CIFOR Working Paper no. 48. 35pp. Bogor – Indonesia.

Phillips O.L., Lewis S.L., Qie L., Banin L., et al. (in prep.) Tropical Forests in the Changing Earth

System Borneo forest dataset'.

Pirotti F, Laurin G, Vettore A, Masiero A, Valentini R (2014) Small Footprint Full-Waveform

Metrics Contribution to the Prediction of Biomass in Tropical Forests. Remote Sensing, 9576–

9599.

Ryan CM, Hill T, Woollen E et al. (2012) Quantifying small-scale deforestation and forest

degradation in African woodlands using radar imagery. Global Change Biology, 18, 243–257.

Simard M, Pinto N, Fisher JB, Baccini A (2011) Mapping forest canopy height globally with

spaceborne lidar. Journal of Geophysical Research: Biogeosciences, 116, 1–12.

Vaglio Laurin G, Chen Q, Lindsell J, Coomes D, Cazzolla-Gatti R, Grieco E, Valentini R (2013)

Above ground biomass estimation from lidar and hyperspectral airbone data in West African

moist forests. EGU General Assembly Conference Abstracts, 15, 6227.

Wijaya A, Liesenberg V, Susanti A, Karyanto O, Verchot LV (2015). Estimation of Biomass

Carbon Stocks over Peat Swamp Forests using Multi-Temporal and Multi-Polarizations SAR

Data. Proceeding of the 36th International Symposium on Remote Sensing of Environment, 11-

15 May 2015, Berlin, Germany.

Page 47: An integrated pan-tropical biomass map using multiple ...

47

WWF and ÖBf. 2013. Xe Pian REDD+ project document. Gland, Switzerland.

http://www.leafasia.org/sites/default/files/public/resources/WWF-REDD-pres-July-2013-

v3.pdf