ESSnet Big Data II Grant Agreement Number: 847375-2018-NL ... · activities related to background...
ESSnet Big Data II
Grant Agreement Number: 847375-2018-NL-BIGDATA
https://webgate.ec.europa.eu/fpfis/mwikis/essnetbigdata
https://ec.europa.eu/eurostat/cros/content/essnetbigdata_en
Work Package H
Earth Observation
Deliverable H1
Interim technical report
Version 2019-09-30
Work package Leader:
Marek Morze (CSO, Poland)
telephone : +48 89 524 36 66
mobile phone :
Prepared by WPH team
Table of contents
1. Introduction
2. Methodological framework
3. Satellite Earth Observation data sources
4. Report on thematic task 1 – Agriculture
4.1. Case study 1 - Crop recognition, mapping and monitoring
4.1.1. Pre-works
4.1.2. Stage 1
4.1.3. Stage 2
4.1.4. Stage 3
4.2. Case study 2 - Monitoring of the off-season vegetation cover
4.2.1. Pre-works
4.2.2. Stage 1
4.2.3. Stage 2
4.2.4. Stage 3
4.3. Case study 3 - Crop recognition with very high-resolution aerial data
4.3.1. Pre-works
5. Report on thematic task 2 - Build-up area
5.1. Case study 4 - Implementing SDG indicator 11.7.1
5.1.1. Pre-works
5.1.2. Stage 1
5.1.3. Stage 2
5.1.4. Stage 3
5.2. Case study 5 - Urban sprawl across urban areas in Europe
5.2.1. Pre-works
5.2.2. Stage 1
5.3. Case study 6 - Combination of administrative and Earth Observation data to determine the quality of housing
5.3.1. Pre-works
5.3.2. Stage 1
5.3.3. Stage 2
5.3.4. Stage 3
6. Report on thematic task 3 - Land cover
6.1. Case study 7 - Comparing «in-situ» and «remote-sensing» collection mode for land cover data
6.1.1. Pre-works
6.1.2. Stage 1
6.1.3. Stage 2
6.1.4. Stage 3
6.2. Case study 8 - Land cover maps at very detailed scale
6.2.1. Pre-works
6.2.2. Stage 1
6.2.3. Stage 2
6.2.4. Stage 3
7. Report on thematic task 4 - Settlements, Enumeration Areas and Forestry
7.1. Case study 9 - Update the INSPIRE Theme Statistical Units dataset and preventing forest fire
7.1.1. Pre-works
7.1.2. Stage 1
7.1.3. Stage 2
7.1.4. Stage 3
8. Report on the meetings
9. Bibliography
1. Introduction
Work package H (WPH) is one of the four pilot projects carried out within ESSnet Big Data II and is
implemented by a partnership of nine institutions from Poland, Belgium, Germany, France, Italy, Finland,
the Netherlands and Portugal. The aim of the pilot is to support areal statistics with Earth Observation
(EO) data. The project results in experimental statistics based on remote sensing data. From the
technological point of view, WPH uses new methods such as machine learning algorithms for image
analysis. EO provides an enormous dataset and thereby creates an unprecedented opportunity, in Europe
and worldwide, for developing operational applications of remote sensing. EO has recently become
increasingly technologically sophisticated. The market offers EO data from high to low resolution,
gathered by platforms ranging from unmanned aerial vehicles through aircraft to satellites. In particular,
the launch of the Sentinels of the Copernicus Programme opened a new chapter in the applicability of
remote sensing data by ensuring free, open access to continuously and systematically acquired satellite
images. One of the important economic and commercial applications of EO data is official statistical
production and landscape mapping for various thematic purposes. There is now an evident need to
facilitate and improve the mandatory statistical registers. In the era of geospatialization of information,
the use of EO data is reasonable and promising, particularly in view of the upcoming Census 2021 and
Agricultural Census as well as other commitments of the European Commission or the United Nations.
The crucial goal of WPH is to use EO data from different sources to help build the geospatial framework
that supports these registers. The project examines the usefulness and practical use of EO data for
filling the gap between statistical and geographical information, referred to as the “geospatial
breakdown”. The main objectives of WPH are implemented through case studies grouped into thematic
tasks: agriculture, build-up area, land cover, and settlements, enumeration areas and forestry. An
overview of the thematic tasks and case studies is presented in Figure 1.1.
Figure 1.1 Overview of thematic tasks and case studies.
2. Methodological framework
Based on the thematic fields addressed in the project, a methodological framework was jointly developed
(Figure 2.1). The framework is closely related to quality, metadata and IT infrastructure issues and is
divided into five general stages: Pre-works and Stages 1 to 4. Pre-works cover all activities related to
the background study, including an in-depth review of the state of the art, the definition of statistical
products that use EO data, and a study of the available data sources and toolkits. Stage 1 focuses on
specifying the test area and collecting the EO data (ordering, downloading etc.) together with other
types of data (administrative, cadastral etc.). Stage 2 is the preparation of the acquired and downloaded
data for the main processing and analysis. It can involve unzipping and importing the original data into
the expected format, data reformatting, database re-shaping and extraction of the necessary information,
radiometric and geometric correction of images, SAR pre-processing from single look complex/ground
range data to calibrated sigma nought orthoimages etc. Stage 3 is the development of the methods and
procedures to be used for producing statistics. This stage includes data processing (e.g. image
segmentation, classification, machine learning etc.) and complex analysis of the results. Stage 4 is the
last part and includes pilot production, validation and final conclusions.
Additionally, quality assessment (in dark blue in Figure 2.1) should be performed for Pre-works and
Stages 1-3, and IT infrastructure (in green) and metadata (in aquamarine) issues should be addressed for
Stages 1-3.
Figure 2.1 The methodological framework of WPH.
3. Satellite Earth Observation data sources
The tasks performed are based on EO data from various sources. In some cases, the data sources are
shared between case studies. Each data source offers many different products. The products used may
vary depending on the case study, which is why the detailed product specifications are included in the
particular case studies. General information on the common data sources is given below.
Copernicus data
The Copernicus programme is the European programme for establishing a European capacity for
Earth Observation. Copernicus includes satellite missions named Sentinels. Sentinel-1A/1B, -2A/B,
-3A/B and -6 are dedicated satellites, while Sentinel-4 and -5 are instruments on board EUMETSAT’s
weather satellites. Figure 3.1 shows all Sentinels of the Copernicus Programme.
Figure 3.1 Sentinels of Copernicus Programme
Sentinel-1 operates day and night and performs C-band synthetic aperture radar imaging, which makes it
possible to acquire imagery regardless of the weather. Sentinel-1 is a two-satellite constellation of
Sentinel-1A (launched in 2014) and Sentinel-1B (launched in 2016), which provides acquisitions at a
6-day interval. Each spacecraft works in four nominal operational modes:
Stripmap mode (SM): 80 km swath, 5 m x 5 m resolution
Interferometric Wide Swath mode (IWS): 240 km swath, 5 m x 20 m resolution
Extra Wide Swath mode (EWS): 400 km swath, 25 m x 80 m resolution
Wave mode (WM): 20 km x 20 km, 20 m x 5 m resolution
Sentinel-2 consists of two polar-orbiting satellites (786 km above sea level), Sentinel-2A (since June
2015) and Sentinel-2B (since March 2017). It allows total coverage of the Earth with a 5-day revisit and
provides images with a 290 km swath and a resolution of 10 to 60 m depending on the spectral band,
ranging from visible to infrared. Sentinel-2 products contain surface reflectance data in a total of 13
spectral bands, 3 of which are in the short-wave infrared (SWIR) (Figure 3.2).
Figure 3.2 Sentinel-2 spectral bands. Source ESA.
The Sentinel-5 Precursor mission is dedicated to monitoring the atmosphere. It consists of one satellite
carrying the TROPOspheric Monitoring Instrument (TROPOMI). The satellite was launched in 2017.
Access to Sentinels data
The open data policy adopted for the Copernicus programme makes Sentinel remote sensing data
available to all users after a simple pre-registration. Sentinel data are available at no cost.
The Sentinel data are available through the Copernicus Open Access Hub (https://scihub.copernicus.eu/).
This platform provides full access to all Sentinel-1, -2 and -3 user products. One possibility is to
interactively select an area and a time period of interest on the Copernicus Open Access Hub. Figure 3.3
shows how images can be searched for and filtered by cloud coverage, area and date.
Figure 3.3 Selection of area and time period of interest on the Copernicus Open Access Hub
Data can also be downloaded using the API Hub, a dedicated interface that gives users access through
scripts.
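Scripted access of this kind typically amounts to sending a search query to the hub and then downloading the returned products. The sketch below only builds such a query for Sentinel-2 level 2A products over a test area; it does not contact the service. The `apihub/search` endpoint path, the query keywords (`platformname`, `producttype`, `beginposition`, `footprint`, `cloudcoverpercentage`) and the bounding polygon are assumptions made for illustration and should be checked against the hub's current documentation.

```python
from urllib.parse import urlencode

# Assumption: search endpoint of the API Hub (not stated in the report).
API_HUB_SEARCH = "https://scihub.copernicus.eu/apihub/search"

# Hypothetical bounding box (lon lat pairs, WKT) roughly enclosing a test area.
AOI_WKT = "POLYGON((19.0 53.0,23.0 53.0,23.0 54.5,19.0 54.5,19.0 53.0))"


def build_query(platform, product_type, start, end, wkt_polygon, max_cloud=None):
    """Combine search filters into one query string (assumed keyword names)."""
    terms = [
        f"platformname:{platform}",
        f"producttype:{product_type}",
        f"beginposition:[{start}T00:00:00.000Z TO {end}T23:59:59.999Z]",
        f'footprint:"Intersects({wkt_polygon})"',
    ]
    if max_cloud is not None:
        # Cloud cover filter is only meaningful for optical products.
        terms.append(f"cloudcoverpercentage:[0 TO {max_cloud}]")
    return " AND ".join(terms)


def build_url(query, rows=100, start=0):
    """Return the full request URL with the query URL-encoded."""
    return API_HUB_SEARCH + "?" + urlencode({"q": query, "rows": rows, "start": start})


query = build_query("Sentinel-2", "S2MSI2A", "2018-03-01", "2018-06-30", AOI_WKT, max_cloud=30)
url = build_url(query)
```

The resulting URL would be requested with the user's hub credentials; paging through results is done by increasing `start`.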
Landsat mission
Landsat, a joint program of the USGS and NASA, has been observing the Earth continuously since 1972.
Landsat satellite imagery currently provides global coverage at a 30-meter resolution about once every
two weeks, with multispectral and thermal data.
The Landsat satellites are a series of civil NASA Earth observation satellites for remote sensing of the
Earth’s land surface and coastal regions. Since 1972, eight satellites of this series have been launched
(one of which failed at launch), spread over four generations. The latest satellite of the Landsat
program is Landsat 8, launched by NASA in February 2013. It is equipped with the OLI and TIRS sensors,
which deliver images in various spectral ranges of visible light and infrared with pixel resolutions of
15 to 100 m on the ground. The thermal bands 10 and 11 cover the range 10.6 – 12.5 µm, while the
resolution in the visible and near infrared is 30 m.
Terra/Aqua MODIS
Terra and Aqua are joint Earth observing missions within NASA's ESE (Earth Science Enterprise)
program between the United States, Japan and Canada. One of the sensors carried by Terra and Aqua
is the Moderate Resolution Imaging Spectroradiometer (MODIS). MODIS has 36 channels between 0.44 µm
and 15 µm with spatial resolution ranging from 250 m to 1 km. The objective of MODIS is to measure
biological and physical processes on a global basis on time scales of 1 to 2 days. MODIS provides
information on, for example, cloud and aerosol properties, surface temperature at 1 km resolution and
chlorophyll concentration.
Access to data
Landsat-8 and MODIS data can be freely downloaded directly from the USGS service EarthExplorer
(https://earthexplorer.usgs.gov/).
4. Report on thematic task 1 – Agriculture
4.1. Case study 1 - Crop recognition, mapping and monitoring
4.1.1. Pre-works
State of the art
Earth observation satellite systems have been providing information on the Earth’s surface for several
decades. The number of active EO systems has now increased significantly, to hundreds, which is over
30% of operational satellite systems. This is due to the constantly growing demand for multiscale
geoinformation in many sectors of life. Thanks to technological development, the data are accurate,
timely and easily accessible. Several official databases of EO missions and sensors show how large the
market of satellite images is, for example the eoPortal of ESA - European Space Agency
(https://directory.eoportal.org/web/eoportal/satellite-missions), the website of CEOS –
Committee on Earth Observation Satellites (http://database.eohandbook.com/), and OSCAR developed by
WMO - World Meteorological Organization (https://www.wmo-sat.info/oscar/satellites). Over the
years, international cooperation between individuals, organizations, institutions and the industrial
sector has developed strongly, leading to the formation of many associations and societies. The most
important international societies of remote sensing users are the Geoscience and Remote Sensing Society
(GRSS) formed in 1961, the International Society for Photogrammetry and Remote Sensing (ISPRS),
established under this name in 1980 but founded in 1910 as the International Society for
Photogrammetry, the European Association of Remote Sensing Laboratories (EARSeL) founded in 1977, the
Committee on Earth Observation Satellites (CEOS) established in 1984, the Group on Earth Observations
(GEO) formed in 2005 and the UN Committee of Experts on Global Geospatial Information Management
(UN-GGIM) established in 2011. A broad look at the satellite image market shows that remote sensing
techniques are widely used and constantly developing. One of the applications of remote sensing is
official statistics. Brief summaries of the use of satellite imagery and earth observation technology in
official statistics were recently presented in (UNECE-CES 2019; United Nation 2017; GSARS 2017).
The main thematic field in which remote sensing is used in official statistics is agriculture, which is
also the subject of case study 1. Remote sensing can be used for many tasks, from cropland mapping, crop
acreage estimation, biomass and yield estimation, and vegetation vigour and drought stress monitoring
to overwintering. Most of the research is based on observations over a longer period (Bargiel 2017;
Demarez et al. 2019; Inglada et al. 2015; Navarro et al. 2017). This is reasonable because of crop
seasonality and phenology. Additionally, crop mapping is difficult because a single crop type can
vary due to different cultivation practices, soil type or moisture. In the context of digital image
classification, this is called intra-class variability. According to the literature, two types of
satellite images are used for mapping agricultural areas: those gathered from passive sensors (Rufin et
al. 2019; Feng et al. 2019; Dimitrov et al. 2019) and those from active sensors (Bargiel 2017; Bargiel
et al. 2014; Hütt and Waldhoff 2018). Optical sensors are passive and well suited for crop mapping and
for monitoring growth conditions (Atzberger 2013). Unfortunately, optical sensors cannot provide
information when it is cloudy. On the other hand, active systems such as Synthetic Aperture Radar (SAR)
generate their own energy at wavelengths of a few centimetres, which penetrates clouds and allows
imaging regardless of weather conditions. In general, SAR and optical data both accurately reproduce
crop growth cycles and may be combined to obtain gap-free time series (Veloso et al. 2017; De Bernardis
et al. 2016). In this study, combined data from optical and SAR systems are investigated. Two
classification approaches can be found in the literature: pixel-based (Xie et al. 2019; Sonobe 2019;
Navarro et al. 2017) and object-based classification (Peña-Barragán et al. 2011; Q. Li et al. 2015;
Csillik et al. 2019). Since our study will use object-oriented administrative data, object-based
classification will be performed. Machine learning algorithms such as support vector machine, random
forest, decision tree, maximum likelihood and artificial neural network will be tested. Examples of
implementations of these algorithms can be found in the literature (Feng et al. 2019; Gómez et al. 2019;
Jia et al. 2012; Sonobe et al. 2015).
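Object-based classification usually starts by summarising the pixels inside each object into per-object features, which a classifier such as random forest can then use. A minimal sketch of this step is given below: it computes the mean backscatter per parcel from a value raster and a raster of parcel identifiers. The toy rasters, the parcel numbering and the use of 0 for pixels outside any parcel are assumptions for illustration, not the procedure used in the case study.

```python
from collections import defaultdict


def zonal_means(values, parcel_ids, nodata_id=0):
    """Mean pixel value per parcel.

    Both rasters are equally sized lists of rows; pixels whose parcel
    identifier equals nodata_id are ignored.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value_row, id_row in zip(values, parcel_ids):
        for value, pid in zip(value_row, id_row):
            if pid == nodata_id:
                continue
            sums[pid] += value
            counts[pid] += 1
    return {pid: sums[pid] / counts[pid] for pid in sums}


# Toy example: one backscatter layer (dB) and the matching parcel raster.
backscatter_db = [
    [-12.0, -11.0, -18.0],
    [-13.0, -19.0, -17.0],
]
parcels = [
    [1, 1, 2],
    [1, 2, 0],
]
features = zonal_means(backscatter_db, parcels)
```

Repeating the aggregation for every layer of a time series gives one feature vector per parcel, which is the usual input to the algorithms listed above.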
It is worth mentioning that ESA has ongoing projects related to agricultural crops:
Sen2Agri - Sentinel-2 for Agriculture (http://www.esa-sen2agri.org/). This project exploits
optical satellite systems such as Sentinel-2 and Landsat 8 for agriculture monitoring.
Sen4CAP - Sentinels for Common Agricultural Policy (http://esa-sen4cap.org/). The project
provides European and national stakeholders of the CAP with validated algorithms, products,
workflows and best practices for agriculture monitoring relevant for the management of the
CAP, based on Sentinel-1 and Sentinel-2 data. Sen4CAP has been set up by ESA in direct
collaboration with, and at the request of, DG-Agri, DG-Grow and DG-JRC.
SEN4Stat – Sentinels for Statistics
(http://www2.rosa.ro/index.php/en/esa/oferte-furnizori/3153-sen4stat-sentinels-for-statistics).
The aim of SEN4Stat is to facilitate the uptake of Sentinel data in NSOs in support of
agricultural statistics. The project will develop and demonstrate EO products as well as best
practices for agricultural monitoring relevant to SDG reporting and to monitoring progress at
national scale.
Only Sen2Agri is operational at present; the other projects are still under development. Moreover,
in 2014 ESA started the Thematic Exploitation Platforms (TEP), one of which is dedicated to food
security and takes into account the use of EO data for agriculture monitoring. The TEP is a
collaborative, virtual work environment providing access to EO data, tools and processors. This TEP is
still being built.
Statistical product definition
The main purpose of the “Agriculture - Crop recognition, mapping and monitoring” case study is to use
EO data gathered from the Sentinel-1 and Sentinel-2 satellites for agricultural crop mapping and area
estimation in Northern European conditions. EO data combined with administrative geodata are a
promising tool for statistics production. The pilot project demonstrates a methodology for using EO
data with machine learning algorithms.
The main product of case study 1 is a crop map. The crop area is estimated on the basis of this map.
The crop map can support agricultural statistics in further projects, including crop yield and growth
models.
Data source and toolkit
The list of satellite systems acquiring earth observation data is long. For detailed crop identification,
systems with high and medium resolution (up to 30 m) should be chosen. Examples of SAR satellite
systems that are or could be used for agriculture are shown in Figure 4.1.
Figure 4.1 Timelines of SAR satellite systems.
Case study 1 requires tools for three types of processing: satellite data processing, administrative
data processing and machine learning. Many commercial as well as open source/free software packages are
available on the market. The list of potential software is shown in Table 4.1.
Table 4.1 List of potential software to use.
Purpose; Type of software; Name; Type of data: SAR images, Optical images, Vector data, Text
Satellite data processing
Commercial ENVI/SARscape + + + +
Commercial GAMMA software + +
Commercial PCI Geomatics + + + +
Commercial Erdas Imagine + + +
Commercial TerrSET + + +
Commercial SARPROZ +
Open source ESA/SNAP + + +
Open source PolSARpro + +
Open source RAT (Radar Tools) +
Open source Sen2Agri + +
Open source Ilwis +
Open source MapReady + +
Administrative data processing
Commercial ArcGIS +
Open source QGIS +
Open source SAGA +
Open source GRASS +
Commercial Microsoft Office +
Open source Libre Office +
Machine learning
Commercial eCognition + + +
Commercial ENVI + + +
Open source ORFEO Toolbox + + +
Open source LEOworks + + +
4.1.2. Stage 1
Test site definition
The investigated area is the Warmian-Masurian Voivodeship (24 173 km²), located in north-eastern
Poland (Figure 4.2). It covers 7.7% of Poland’s area and is the fourth largest voivodeship by area.
Figure 4.2 Geographical localization of the test area
Agricultural land makes up the largest part of the Warmian-Masurian Voivodeship – 54.4% (mainly:
arable land – 66.4%, permanent pastures – 16.9%, permanent meadows - 12.2%). Forests cover 32.2% of the
area. Warmia and Mazury is characterized by a large number of lakes and rivers; water covers 5.7% of
the voivodeship. The soils of the test area are highly variable. The dominant soil types are brown
soils (covering about 70% of the area) and hydrogenic soils (about 14% of the area). The arable land is
dominated by good and medium soils in terms of agricultural usefulness. Soils are mainly of the 3rd and
4th quality classes, which cover 73.8% of the voivodeship’s area. Twenty main types of crops are under
investigation. The crops to be recognized are the following:
sugar beets
buckwheat
spring barley
winter barley
corn
cereal mixes
oat
fruit trees plantations
fruit bushes plantations
spring wheat
winter wheat
spring triticale
winter triticale
spring rape
winter rape
grassland
potatoes
rye
mustard
leguminous crops
Data collection
Two types of data are used for case study 1: satellite images and administrative geodata. For the pilot
study, a long time series of Sentinel-1 images forms the basis of the research, and Sentinel-2 optical
images support the investigation.
The Sentinel-1 data gathered are Ground Range Detected (GRD) products acquired in Interferometric Wide
Swath imaging mode and resampled to a 10 m pixel size. The products are acquired in dual polarization
mode: VV (vertically transmitted and vertically received signal) and VH (vertically transmitted and
horizontally received signal). This means that there are physically two images for each acquisition
date. A single Sentinel-1 image covers 270 km by 200 km. Two scenes are needed to cover the whole test
area. The Sentinel-1 data characteristics are shown in Table 4.2. The second type of images collected
comes from the optical sensor of Sentinel-2. The Sentinel-2 images are bottom-of-atmosphere corrected
images in cartographic geometry (processing level 2A). The optical data contain 13 bands with a
resolution of 10 m, 20 m or 60 m depending on the spectral band. A single Sentinel-2 image covers 100 km
by 100 km; thus, twelve images are required to cover the whole test area. The Sentinel-2 data
characteristics are shown in Table 4.2.
Table 4.2 Characteristics of Sentinel-1 and Sentinel-2 data.
a) Sentinel-1 data characteristics
Imaging mode IW
Product GRD
Relative orbit 51
Pass direction descending
Polarization VV/VH
Resolution 10m
Scene size 270 x 200 km
File size 1.5-2 GB per scene
b) Sentinel-2 data characteristics
Processing level 2A
Product S2 MSI 2A
Relative orbit 79
Pass direction descending
Resolution 10m
Tile size 100 x 100 km
File size 1 GB per tile
Figure 4.3 shows the footprints of the Sentinel images for the test area.
Figure 4.3 Footprints of Sentinel-1 images covering test area are presented in red, footprints of Sentinel-2
images are shown in orange.
The analysed time series covers the 2018 vegetation season. The acquisition of satellite data started in
October 2017, when winter crops had emerged, and ended in September 2018, after the harvest. The
satellite image collection includes 26 acquisitions of Sentinel-1 and 3 acquisitions of Sentinel-2. The
timeline of acquisition dates is shown in Figure 4.4.
Figure 4.4 Timeline of Sentinel-1 and Sentinel-2 acquisitions.
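The figures given in this section allow a rough estimate of the download volume. The sketch below multiplies the number of Sentinel-1 acquisitions by the scenes per acquisition and the per-scene file size from Table 4.2, and gives the Sentinel-2 volume per acquisition date. It is a back-of-the-envelope calculation under the assumption that every acquisition consists of a full set of scenes or tiles.

```python
# Values taken from the text and Table 4.2.
S1_ACQUISITIONS = 26            # Sentinel-1 acquisition dates
S1_SCENES_PER_ACQUISITION = 2   # scenes needed to cover the test area
S1_SCENE_GB = (1.5, 2.0)        # file size range per scene
S2_TILES_PER_ACQUISITION = 12   # tiles needed to cover the test area
S2_TILE_GB = 1.0                # file size per tile


def sentinel1_volume_gb():
    """Lower and upper bound of the total Sentinel-1 volume in GB."""
    scenes = S1_ACQUISITIONS * S1_SCENES_PER_ACQUISITION
    return scenes * S1_SCENE_GB[0], scenes * S1_SCENE_GB[1]


def sentinel2_volume_per_date_gb():
    """Sentinel-2 volume in GB for one acquisition date."""
    return S2_TILES_PER_ACQUISITION * S2_TILE_GB


s1_low, s1_high = sentinel1_volume_gb()
s2_per_date = sentinel2_volume_per_date_gb()
print(f"Sentinel-1: {s1_low:.0f}-{s1_high:.0f} GB in total")
print(f"Sentinel-2: {s2_per_date:.0f} GB per acquisition date")
```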
The second type of data used for case study 1 is geodata obtained from administrative sources: the Land
Parcel Identification System (LPIS), the Agency for Restructuring and Modernisation of Agriculture
(ARMA) and Statistics Poland. LPIS provides details of feature boundaries and land use information in
vector format. ARMA provides information on the crops declared by farmers, and Statistics Poland
provides information on crop type from field campaigns (in-situ measurements).
4.1.3. Stage 2
Data pre-processing
Pre-processing of the satellite data is a crucial and necessary step before further analysis and
classification. The Sentinel-1 data are images in SAR geometry and contain uncalibrated values of
radar backscatter. Pre-processing is therefore required; it consists of radiometric and geometric
transformations that produce orthorectified sigma nought maps. In this case study, pre-processing was
done using the open source software SNAP 7.0. The pre-processing workflow is presented in Figure 4.5.
Figure 4.5 Scheme of Sentinel-1 data pre-processing
The pre-processing of Sentinel-1 data includes:
Selecting a subset of the image if the test area is smaller than half of one Sentinel-1 scene,
since pre-processing the whole scene is more time consuming.
Thermal noise removal
GRD border noise removal - masking the "no-value" samples efficiently with a thresholding
method
Radiometric calibration to Sigma Nought in linear scale
Slice assembly if the area of interest is located on the border of two consecutive images along
track
Sub-pixel coregistration of SAR images
Speckle filtering based on time series in reference to each polarization
Stack creation – joining two datasets after multitemporal filtering
Geocoding – geometric correction to cartographic system including digital elevation model
(SRTM)
Spatial subset selection in the cartographic coordinates
Converting linear scale to the logarithmic scale in decibels
Additional median filtering with window size 3 by 3 pixels
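The last two steps of the list, conversion to decibels and 3 by 3 median filtering, can be illustrated outside SNAP with a few lines of plain Python. This is a minimal sketch, not the SNAP implementation: the small floor value that guards against the logarithm of zero and the shrinking of the window at image borders are assumptions.

```python
import math
from statistics import median


def to_decibels(sigma0_linear, floor=1e-6):
    """Convert linear sigma nought values to decibels: 10 * log10(value).

    Values below `floor` are clipped first (assumed safeguard against log10(0)).
    """
    return [[10.0 * math.log10(max(v, floor)) for v in row] for row in sigma0_linear]


def median_filter_3x3(image):
    """Replace each pixel by the median of its 3 by 3 neighbourhood.

    At the borders the window shrinks to the pixels that exist (assumption).
    """
    rows, cols = len(image), len(image[0])
    filtered = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
            ]
            out_row.append(median(window))
        filtered.append(out_row)
    return filtered


# Toy image: uniform backscatter of 0.1 with one bright outlier in the centre.
sigma0 = [
    [0.1, 0.1, 0.1],
    [0.1, 1.0, 0.1],
    [0.1, 0.1, 0.1],
]
sigma0_db = to_decibels(sigma0)
smoothed_db = median_filter_3x3(sigma0_db)
```

In the toy image the single bright pixel is removed by the median filter, which is the purpose of this additional smoothing step.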
An example of a pre-processing result is shown in Figure 4.6.
Figure 4.6 Sentinel-1 colour composition of Warmian-Masurian Voivodeship (R: 9 May 2018, G: 8 June 2018, B: 7 August 2018; pixel size: 10x10m; polarization VH)
For Sentinel-2 data, the key pre-processing steps are image mosaicking and cloud masking. Pre-processing
is performed in SNAP 7.0 following the steps presented in Figure 4.7. The main outputs are 4 VNIR
bands and calculated NDVI images.
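NDVI is conventionally computed from the near-infrared and red reflectances as (NIR - Red) / (NIR + Red); for Sentinel-2 at 10 m these are bands B8 and B4. The sketch below applies this formula pixel by pixel. It is an illustration only, since SNAP was used in the case study, and returning `None` where both bands are zero is an assumed convention.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    total = nir + red
    if total == 0:
        return None  # assumed convention for pixels without signal
    return (nir - red) / total


def ndvi_image(nir_band, red_band):
    """Apply ndvi() to two equally sized bands given as lists of rows."""
    return [
        [ndvi(nir, red) for nir, red in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Toy reflectances: dense vegetation, bare soil, and an empty pixel.
b8_nir = [[0.5, 0.3, 0.0]]
b4_red = [[0.1, 0.3, 0.0]]
result = ndvi_image(b8_nir, b4_red)
```

Values close to 1 indicate dense green vegetation, while values near 0 are typical of bare soil.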
Figure 4.7 Scheme of Sentinel-2 data pre-processing
An example of the pre-processing results for Sentinel-2 is shown in Figure 4.8.
Figure 4.8 Sentinel-2 natural colour composition of Warmian-Masurian Voivodeship (R: B4, G: B3, B: B2 of 7 June
2018; pixel size: 10x10m)
The full list of the 72 images prepared for further analysis is given below (Table 4.3).
Table 4.3 List of output images after pre-processing.
1 Sigma0_VH_2017_10_17_db 25 Sigma0_VH_2018_08_31_db 49 Sigma0_VV_2018_08_19_db
2 Sigma0_VH_2017_10_23_db 26 Sigma0_VH_2018_09_06_db 50 Sigma0_VV_2018_08_25_db
3 Sigma0_VH_2018_04_03_db 27 Sigma0_VV_2017_10_17_db 51 Sigma0_VV_2018_08_31_db
4 Sigma0_VH_2018_04_09_db 28 Sigma0_VV_2017_10_23_db 52 Sigma0_VV_2018_09_06_db
5 Sigma0_VH_2018_04_15_db 29 Sigma0_VV_2018_04_03_db 53 B2_Blue_443nm_2018_03_19
6 Sigma0_VH_2018_04_21_db 30 Sigma0_VV_2018_04_09_db 54 B3_Green_560nm_2018_03_19
7 Sigma0_VH_2018_04_27_db 31 Sigma0_VV_2018_04_15_db 55 B4_Red_665nm_2018_03_19
8 Sigma0_VH_2018_05_03_db 32 Sigma0_VV_2018_04_21_db 56 B8_NIR_842nm_2018_03_19
9 Sigma0_VH_2018_05_09_db 33 Sigma0_VV_2018_04_27_db 57 B2_Blue_443nm_2018_04_13
10 Sigma0_VH_2018_05_15_db 34 Sigma0_VV_2018_05_03_db 58 B3_Green_560nm_2018_04_13
11 Sigma0_VH_2018_05_21_db 35 Sigma0_VV_2018_05_09_db 59 B4_Red_665nm_2018_04_13
12 Sigma0_VH_2018_05_27_db 36 Sigma0_VV_2018_05_15_db 60 B8_NIR_842nm_2018_04_13
13 Sigma0_VH_2018_06_08_db 37 Sigma0_VV_2018_05_21_db 61 B2_Blue_443nm_2018_05_08
14 Sigma0_VH_2018_06_14_db 38 Sigma0_VV_2018_05_27_db 62 B3_Green_560nm_2018_05_08
15 Sigma0_VH_2018_06_20_db 39 Sigma0_VV_2018_06_08_db 63 B4_Red_665nm_2018_05_08
16 Sigma0_VH_2018_06_26_db 40 Sigma0_VV_2018_06_14_db 64 B8_NIR_842nm_2018_05_08
17 Sigma0_VH_2018_07_02_db 41 Sigma0_VV_2018_06_20_db 65 B2_Blue_443nm_2018_06_07
18 Sigma0_VH_2018_07_08_db 42 Sigma0_VV_2018_06_26_db 66 B3_Green_560nm_2018_06_07
19 Sigma0_VH_2018_07_14_db 43 Sigma0_VV_2018_07_02_db 67 B4_Red_665nm_2018_06_07
20 Sigma0_VH_2018_07_20_db 44 Sigma0_VV_2018_07_08_db 68 B8_NIR_842nm_2018_06_07
21 Sigma0_VH_2018_08_01_db 45 Sigma0_VV_2018_07_14_db 69 NDVI_2018_03_19
22 Sigma0_VH_2018_08_07_db 46 Sigma0_VV_2018_07_20_db 70 NDVI_2018_04_13
23 Sigma0_VH_2018_08_19_db 47 Sigma0_VV_2018_08_01_db 71 NDVI_2018_05_08
24 Sigma0_VH_2018_08_25_db 48 Sigma0_VV_2018_08_07_db 72 NDVI_2018_06_07
Geodata from administrative sources have also been pre-processed. The ingestion of geodata relies on
reformatting, database re-shaping and extraction of the necessary information. In this case study, two
tasks must be done. The first task is to select the area used only for agricultural practices. For this
purpose, information on cadastral parcels (LPIS) and land use (ARMA) was combined (Figure 4.9).
Figure 4.9 Cadastral parcels borders on the left (red lines), Land Use borders in the middle (white lines), border
of agricultural parcels on the right (blue lines filled with white).
The second task is to choose representative parcels of each crop type for supervised classification.
For this task, information on crops declared by farmers (ARMA) and on cadastral parcels (LPIS) was
used. The parcels were chosen under three conditions: i) the parcel is bigger than 1 ha, ii) only one type
of crop is grown on the cadastral parcel, iii) the declared crop type covers at least 98% of the cadastral
parcel (Figure 4.10).
Figure 4.10 Examples of representative parcels (green) and excluded parcels (red) on colour composition of
Sentinel-1 (R: 9 May 2018 VH, G: 8 Jun 2018 VH, B: 7 Aug 2018 VH).
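The three selection conditions can be written as a simple filter. The sketch below assumes a hypothetical record layout (parcel area in hectares and a mapping from declared crop to declared area); the field and function names are illustrative and not taken from the LPIS or ARMA databases.

```python
from dataclasses import dataclass


@dataclass
class Parcel:
    parcel_id: str
    area_ha: float
    declared_crops: dict  # crop name -> declared area in ha (hypothetical layout)


def is_representative(parcel, min_area_ha=1.0, min_share=0.98):
    """Apply the three conditions: size, single crop, crop share of the parcel."""
    if parcel.area_ha <= min_area_ha:          # i) bigger than 1 ha
        return False
    if len(parcel.declared_crops) != 1:        # ii) one crop type per parcel
        return False
    (crop_area,) = parcel.declared_crops.values()
    return crop_area / parcel.area_ha >= min_share  # iii) at least 98% covered


# Invented example parcels.
parcels = [
    Parcel("a", 2.0, {"corn": 1.98}),
    Parcel("b", 0.8, {"oat": 0.8}),
    Parcel("c", 3.0, {"corn": 1.5, "rye": 1.5}),
    Parcel("d", 2.0, {"rye": 1.9}),
]
selected_ids = [p.parcel_id for p in parcels if is_representative(p)]
```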
The output of this task is the set of parcel boundaries, which facilitate image segmentation, object
aggregation and validation. A summary of the selected parcels is given in Table 4.4.
Table 4.4 Summary of the selected parcels for image classification.
Crop type                   Number of samples   Sum of area samples [ha]   % of agriculture area
sugar beets                 75                  520                        0.04
buckwheat                   268                 1174                       0.10
spring barley               848                 3835                       0.32
winter barley               156                 539                        0.04
corn                        1163                5451                       0.45
cereal mixes                553                 1862                       0.15
oat                         610                 2577                       0.21
fruit trees plantations     91                  291                        0.02
fruit bushes plantations    85                  227                        0.02
spring wheat                1097                5618                       0.46
winter wheat                2250                11404                      0.94
spring triticale            222                 626                        0.05
winter triticale            1422                6033                       0.50
spring rape                 147                 792                        0.07
winter rape                 1250                7335                       0.60
grassland                   12013               58108                      4.78
potatoes                    87                  225                        0.02
rye                         878                 3264                       0.27
mustard                     75                  218                        0.02
leguminous crops            755                 4097                       0.34
SUM                         24045               114195                     9.40
4.1.4. Stage 3
Main data processing
To map agricultural crops and estimate their area, the goal of Case Study 1, object-based image
classification will be performed. Because the analysis is based on a time series, the subsets of images
most suitable for object-based classification will be selected. The selection depends on the quality of
the images and on the information they provide. Merging temporal and polarimetric features of the
crops should make it possible to extract the subsets that give maximum separability of crops with a
greatly reduced data volume. This task is critical for the final results. It should show which crops can be
effectively recognized and mapped using very long Sentinel-1A/B time series.
To achieve these goals, object-based classification including machine learning algorithms will be
performed. The key elements of this task are:
• testing the parameters of the mean shift segmentation algorithm for delineating homogeneous areas
(segments) on pre-processed Sentinel-1 and Sentinel-2 data with the open source CNES OrfeoToolbox
software,
• testing the parameters of machine learning algorithms (support vector machine, decision tree,
artificial neural network, random forest and KNN classifiers) to obtain the best accuracy for crop
recognition with the open source CNES OrfeoToolbox software.
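The idea of testing classifier parameters against accuracy can be illustrated with a small sketch. It does not use the OrfeoToolbox API: it is a plain-Python k-nearest-neighbour classifier evaluated on a hold-out set for several values of k. The per-segment features (mean VH backscatter in dB, mean NDVI), the labels and the candidate k values are all invented for illustration, and the features are left unscaled for brevity.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Predict the majority label among the k nearest training samples."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, holdout, k):
    """Share of hold-out samples whose predicted label matches the reference."""
    hits = sum(knn_predict(train, feats, k) == label for feats, label in holdout)
    return hits / len(holdout)


# Invented per-segment features: (mean VH in dB, mean NDVI).
train = [
    ((-20.0, 0.20), "cereal"), ((-19.5, 0.25), "cereal"), ((-19.0, 0.30), "cereal"),
    ((-14.0, 0.70), "grassland"), ((-13.5, 0.75), "grassland"), ((-13.0, 0.80), "grassland"),
]
holdout = [((-19.2, 0.28), "cereal"), ((-13.8, 0.72), "grassland")]

# Evaluate each candidate parameter value and keep the most accurate one.
scores = {k: accuracy(train, holdout, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
```

The same loop structure applies to any classifier and parameter grid; only the prediction function changes.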
4.2. Case study 2 - Monitoring of the off-season vegetation cover
4.2.1. Pre-works
State of the art
Radar remote sensing data enable the classification of the arable land surface at critical moments, such
as in spring when snow is thawing. Passive optical satellite sensors, e.g. on Sentinel-2, rely on the sun
as illumination source and cannot record surface reflectance through clouds, water vapour and
aerosols. A radar satellite such as Sentinel-1 is an active system, which allows operation during day and
night, and because radar uses longer wavelengths it can also penetrate clouds. Radar can also separate
surfaces based on their roughness and water content. Thus, it can be assumed that bare, snow-free
arable land can be distinguished with fairly good accuracy from land covered in vegetation in the time
window between snow thaw and the beginning of the growing season.
Operating at microwave wavelengths, the radar beam can also penetrate the soil to some extent.
Therefore, the soil properties affect the signal and should be included in the model. Whereas radar is
sensitive to the roughness and water content of the surface, optical reflectance in the short-wave
infrared (SWIR) region responds to cellulose absorption by dead vegetation, which makes it sensitive
to the amount of plant residue on the soil surface. For example, the Normalized Difference Tillage Index
(NDTI) could give the model additional information on the residue cover of the soil. In summary,
integrating Sentinel-1 and Sentinel-2 imagery is a promising approach to classifying soil cover.
Statistical product definition
In Finland, the statistics on the structure of agricultural and horticultural enterprises are produced by
the Natural Resources Institute Finland. The statistics include information such as the number of
enterprises, land use, type of production, and educational level of the owners of agricultural holdings.
Regulation (EU) 2018/1091 on integrated farm statistics provides the framework for the statistics on
the structure of agricultural and horticultural enterprises. According to a new implementing regulation,
a new variable will be introduced in 2023: the proportion of agricultural area under vegetation cover in
winter. Large-scale information on off-season vegetation cover will improve the estimation of soil
erosion and nutrient loads to water bodies. The statistics on off-season vegetation cover would also
relate to the UN's Sustainable Development Goals, Indicator 2.4.1: Proportion of agricultural area under
productive and sustainable agriculture. This method would provide grounds for establishing an indicator
on sustainable agriculture, as land management practices closely relate to sustainability. The statistical
product would report the proportion of agricultural area under vegetation cover in winter at regional
level.
Data source & toolkit
The spaceborne radar remote sensing data used in this study are acquired by the Sentinel-1 mission.
The Sentinel-1 data have been radiometrically and geometrically corrected and are provided on the
Finnish Geospatial Platform as 11-day mosaics covering Finland at a spatial resolution of 20 metres. The
Sentinel-1 data are preprocessed by the National Satellite Data Center (NSDC) at the Finnish
Meteorological Institute. NSDC also provides preprocessed NDTI mosaics from the Sentinel-2 mission
at a spatial resolution of 60 metres.
Sentinel-1 data and Sentinel-2 based vegetation indices are examples of preprocessed data sets made
publicly available on the Finnish Geospatial Platform. The platform collects spatial data from various
providers and makes them openly available to users. The aim of the platform is to harmonise and
improve services provided by the public administration, to improve data-based decision-making and
increase transparency, as well as to save public administration costs, e.g. by enabling the efficient
maintenance of data resources, removing overlapping activities and harmonising datasets. The
responsible party of the Geospatial Platform project is the Ministry of Agriculture and Forestry.
Participating in the preparation and implementation of the project are the Ministry of Finance, the
Ministry of the Environment, the Finnish Environment Institute, the National Land Survey of Finland and
other partners from the private and public sectors.
As reference data, we use existing administrative data from the national Integrated Administration
and Control System (IACS) operated by the Finnish Agency for Rural Affairs. IACS provides open data on
annual agricultural land use (Land Parcel Identification System, LPIS) and agricultural payment
entitlements. Under the EU’s Common Agricultural Policy, the Agri-Environmental Support (AES)
Schemes provide financial support for Member States to design and implement agri-environment
measures. Farmers who subscribe, on a voluntary basis, to commitments related to the preservation of
the environment and the maintenance of the countryside receive payments. IACS also contains data on
agri-environment measures at field parcel level. Several measures commit the farmer to off-season
vegetation cover. For each field parcel, we set the type of vegetation cover based on the information
on the farmer's environmental commitments. To combine satellite and reference data, field parcel
geometries (LPIS) are used to extract the backscatter intensity values derived from the satellite data
for each field parcel. All IACS data used in this study are available upon application, subject to a
non-disclosure agreement.
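Extracting backscatter values per field parcel amounts to a zonal statistics operation. The sketch below assumes the parcel geometries have already been rasterised onto the mosaic grid as a parcel-id raster (None outside any parcel); this input layout and the function name are assumptions for illustration, not the workflow used in the study.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zonal_stats(values, parcel_ids, nodata=None):
    """Summarise raster values per parcel id (both inputs are 2-D lists)."""
    groups = defaultdict(list)
    for value_row, id_row in zip(values, parcel_ids):
        for value, pid in zip(value_row, id_row):
            if pid is not None and value != nodata:
                groups[pid].append(value)
    return {
        pid: {
            "n": len(vals),
            "mean": mean(vals),
            "min": min(vals),
            "max": max(vals),
            "std": pstdev(vals),
        }
        for pid, vals in groups.items()
    }


# Invented backscatter values (dB) and a matching parcel-id raster.
backscatter = [
    [-12.0, -14.0, -20.0, -9.0],
    [-13.0, -15.0, -22.0, -9.0],
]
ids = [
    [1, 1, 2, None],
    [1, 1, 2, None],
]
stats = zonal_stats(backscatter, ids)
```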
Our initial plan was to use ancillary data on precipitation, temperature and soil properties. The idea
was that meteorological data would let us determine when the time window between snow thaw and
the beginning of the growing season occurs each year. When we learned that preprocessed 11-day
mosaics are available in analysis-ready format, which in practice solve the problem of changing weather
conditions within the time window, we decided to use the mosaics instead. Moreover, we found a
meteorological product giving the date of the start of the growing season. The date helps to set the
time window suitable for
monitoring the soil cover regionally. In southern Finland the 11-day mosaic would be around the
11th or 21st of April, in the north around the 1st or 11th of May. The meteorological product giving the
date of the start of the growing season is also open data from the Finnish Meteorological Institute.
Data on soil properties are freely available from the Finnish Soil Database. Because soil properties vary
at field scale, we need to reduce this variability to one class per parcel. Our plan is to append the soil
data only after the first modelling results. The hypothesis is that adding soil type may improve the
model.
All data in this project are processed with open source Python libraries. Python is suitable for spatial
data and also for downstream data analysis. Note that the radar satellite images were already
preprocessed, so we can use products in analysis-ready format.
4.2.2. Stage 1
Test site definition
The area of interest (AOI) comprises 23200 field parcels with a total of 91500 ha of arable land in the
primary agricultural production region in southwestern Finland. The arable land consists of
approximately 93% mineral soils and 7% organic soils.
Data collection
In this project, data will be retrieved from a WMS service (GeoTIFFs), by email (IACS data) and from a
database (meteorological data). Files can be saved on standard PC storage.
4.2.3. Stage 2
Data pre-processing
The AOI was extracted from LPIS. Parcels smaller than 1 ha or with holes were masked out. The resulting
set of field parcels was merged with the AES data. The soil cover class was decided based on the
agri-environment measures subscribed for a parcel. If a parcel was not subscribed to any measure,
it was considered potentially ploughed, that is, bare soil. We checked from the following year’s
parcel data that no autumn crop had been sown. In the end, we decided that if the parcel was growing
a spring crop in the following year, then it had most probably been ploughed. After preprocessing we
have 15000 parcels covering 59000 ha of arable land that have vegetation cover in the winter, either
actual vegetation or vegetation residues (reduced tillage), and 8300 ploughed parcels covering 35000 ha.
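The labelling rule described above can be sketched as a small function. The measure names, the coding of the following year's crop as "spring" or "autumn", and the precedence of vegetation cover over reduced tillage are hypothetical choices made for this illustration; the report does not list the actual AES variables.

```python
# Hypothetical measure names; the real AES variable names are not given in the text.
VEGETATION_MEASURES = {"winter_vegetation_cover"}
REDUCED_TILLAGE_MEASURES = {"reduced_tillage"}


def soil_cover_class(measures, next_year_crop):
    """Derive a soil cover label for one parcel.

    measures: iterable of agri-environment measures subscribed for the parcel
    next_year_crop: "spring" or "autumn" (assumed coding of the following year's crop)
    Returns None when the rule cannot decide.
    """
    measures = set(measures)
    if measures & VEGETATION_MEASURES:
        return "vegetation cover"
    if measures & REDUCED_TILLAGE_MEASURES:
        return "reduced tilling"
    if not measures and next_year_crop == "spring":
        # No commitment and a spring crop next year: most probably ploughed.
        return "bare soil"
    return None


labels = [
    soil_cover_class(["winter_vegetation_cover"], "spring"),
    soil_cover_class(["reduced_tillage"], "spring"),
    soil_cover_class([], "spring"),
    soil_cover_class([], "autumn"),
]
```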
We use Sentinel-1 ground range detected mosaics of two polarisations: horizontally polarised
backscatter of a vertically polarised radar pulse (VH) and vertically polarised backscatter of a vertically
polarised radar pulse (VV). Usually there is an overpass every 2-3 days over Finland. As several
measurements are combined over 11 days, we get the maximum, minimum, mean and standard
deviation of these measurements. Sentinel-2 optical sensors acquire 13 spectral bands in the visible,
near-infrared and short-wave infrared (SWIR) wavelengths; the mosaics used here have a spatial
resolution of 60 m. After preprocessing, NDTI is calculated from SWIR bands 11 and 12. In the 15-day
mosaics the index is averaged per pixel.
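As a sketch of the last two sentences, NDTI can be computed as the normalized difference of bands 11 and 12 and then averaged per pixel over the acquisitions in a mosaic period. The data layout (one (B11, B12) pair per pixel, None for cloud-masked pixels) and the function names are assumptions for this example, not the NSDC processing chain.

```python
def ndti(b11, b12):
    """NDTI = (B11 - B12) / (B11 + B12); None where the sum is zero."""
    total = b11 + b12
    if total == 0:
        return None
    return (b11 - b12) / total


def ndti_image(band_pairs):
    """Compute NDTI for a 2-D list of (B11, B12) pairs; None pixels stay None."""
    return [[ndti(*p) if p is not None else None for p in row] for row in band_pairs]


def mean_composite(index_images):
    """Average equally sized index images per pixel, skipping missing values."""
    rows, cols = len(index_images[0]), len(index_images[0][0])
    composite = []
    for r in range(rows):
        row = []
        for c in range(cols):
            valid = [img[r][c] for img in index_images if img[r][c] is not None]
            row.append(sum(valid) / len(valid) if valid else None)
        composite.append(row)
    return composite


# Two invented acquisitions of a 1 x 2 pixel area; None marks a cloud-masked pixel.
acquisition_1 = [[(0.30, 0.20), (0.25, 0.25)]]
acquisition_2 = [[(0.30, 0.10), None]]
mosaic = mean_composite([ndti_image(acquisition_1), ndti_image(acquisition_2)])
```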
4.2.4. Stage 3
Main data processing
For each parcel we have a distribution of backscatter coefficients from the two radar polarisations VV
and VH, their difference, and NDTI. The target variable is a class variable with three soil cover values:
“bare soil”, “reduced tilling” and “vegetation cover”. For the classification task, we divide the data set
into training, validation and test sets in a 60-20-20 ratio to train a convolutional neural network, and in
an 80-20 ratio for a Random Forest baseline.
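The two splits can be produced with one helper. This is a minimal sketch assuming a simple random split at parcel level with a fixed seed; whether the study stratifies by class or region is not stated, so no stratification is shown. The function name and seed are illustrative.

```python
import random


def split_dataset(items, fractions, seed=42):
    """Shuffle items reproducibly and cut them into parts of the given fractions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    parts, start, cumulative = [], 0, 0.0
    for fraction in fractions[:-1]:
        cumulative += fraction
        end = round(cumulative * n)
        parts.append(shuffled[start:end])
        start = end
    parts.append(shuffled[start:])  # the last part takes the remainder
    return parts


parcel_ids = list(range(100))  # stand-in for the real parcel identifiers

# 60-20-20 for the neural network, 80-20 for the Random Forest baseline.
train_set, validation_set, test_set = split_dataset(parcel_ids, (0.6, 0.2, 0.2))
rf_train_set, rf_test_set = split_dataset(parcel_ids, (0.8, 0.2))
```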
4.3. Case study 3 - Crop recognition with very high-resolution aerial data
4.3.1. Pre-works
Introduction
Research question
The use of satellite data with an intermediate resolution (10 m for Sentinel-1/2) for agriculture and more
specifically for crop recognition might not be adequate in areas where the size of parcels is relatively
small and diversity of crops is relatively high. In those cases, aerial photography data with a higher
resolution would offer a solution. This case study addresses two concrete questions:
• Data availability: are aerial photography data available for official statistics, with characteristics
and conditions of use allowing them to be used realistically?
• Testing in practice whether high-resolution aerial photography data can be used for crop
recognition at a more diversified, detailed and small-scale level, using machine learning
algorithms.
Obviously, the second objective can only be attained after the first one has been addressed successfully.
Concrete approach
Belgium is a federal state where competencies linked to territory (such as environment, agriculture,
land use, zoning regulations, construction and housing, GIS, …) are exercised at the level of Belgium’s
regions Flanders, Wallonia and Brussels. In order to tackle both research questions, Statbel opted to
limit itself to Flanders, partnering with Statistics Flanders.
Flanders has 6.6 million inhabitants (about 58% of Belgium’s population) for a surface area of 13,500
km², resulting in a high population density of about 490 persons per km². As a result, its agricultural area
(7,425 km² or 55%) consists of a great number of relatively small plots which may vary considerably as
to the crops under cultivation. Consequently, satellite data with a fairly low resolution are probably
less effective for assessing crops here than in predominantly agricultural regions characterised by large
plots and monocultures.
Statistics Flanders, being interested in the exploitation of satellite and aerial imagery for various
statistical purposes, has agreed to act as ‘unofficial’ non-refunded partner in WPH. Its role is essential
for connecting to the departments and units in the Flemish administration owning satellite and aerial
photography data and/or using machine learning/deep learning/artificial intelligence to analyse these
data.
Data availability
In order to assess the availability of aerial photography data with the required resolution, frequency and
access conditions, Statbel and Statistics Flanders had e-mail and face-to-face exchanges with the major
organisations and units responsible within the Flemish administrations: EODaS (Earth Observation Data
Science) programme of the Department Informatie Vlaanderen (Information Flanders), VITO (Vlaams
Instituut voor Technologisch Onderzoek, Flemish Institute for Technological Research) and the GIS
unit/Geoloket (Geocounter) of the department Agriculture and Fisheries.
EODaS Programme, Ghent (BE)
https://overheid.vlaanderen.be/informatie-vlaanderen/producten-diensten/earth-observation-data-science (Dutch)
The EODaS programme manages the collection of earth observation data for Flanders and their
dissemination as open data in a standardised way via web services.
EODaS has several potentially useful datasets freely available (for Sentinel data, see below, VITO):
• 10-yearly high-resolution aerial photography (RGB) data, resolution 10 cm, and LiDAR data,
resolution 25 cm, collected 2013-2015 for DHMVII (Digitaal Hoogtemodel Vlaanderen, Digital
Elevation Model Flanders);
• triennial aerial photography datasets, summer, resolution 40 cm;
• annual aerial photography datasets, winter, resolution 25 cm.
Data can be accessed via:
• the ‘Orthophoto mosaic webservice’ for aerial photography
(http://www.geopunt.be/catalogus/webservicefolder/418e8e4a-12c1-80a8-8306-fcf4-799c-581d-c4e38594),
with among others the most recent aerial photography dataset 2019.02, ground resolution 25 cm
(http://www.geopunt.be/catalogus/datasetfolder/50134be3-f0cd-47c5-8f6e-4a0936287947);
• the OpenData Viewer for LiDAR raw remote sensing data collected 2013-2015
(https://remotesensing.vlaanderen.be/apps/openlidar/), including RGB.
Other Links
• Beeldverwerkingsketen (BVK) (Image processing chain):
https://overheid.vlaanderen.be/bvk-algemeen
• LiDAR DHMV (Digitaal Hoogtemodel Vlaanderen, Digital Elevation Model Flanders):
http://www.geopunt.be/catalogus/datasetfolder/7e40413e-9c17-492b-ac24-e72d37251e5a
• Geopunt, central Inspire-compliant gateway to Flemish geographic government data made
accessible to government agencies, citizens, organizations and companies:
http://www.geopunt.be/over-geopunt
• Presentation (Dutch) Source Data DHMV:
https://overheid.vlaanderen.be/sites/default/files/media/documenten/informatie-vlaanderen/producten/BVK/documenten/Infosessie_BVKOpenData_LiDAR_21022017%20%281%29.pdf
VITO, Ghent (BE)
VITO (Vlaams Instituut voor Technologisch Onderzoek, Flemish Institute for Technological
Research) undertakes research projects of public interest in various domains, among others
using remote sensing (https://vito.be/nl/technologie-groep/remote-sensing), with some 90
persons employed. They collaborate extensively with EODaS which provides data, but apart
from that they collect or buy remote sensing data (satellite, airplane manned or unmanned,
drone) paid for by projects’ clients.
They provide free access to Sentinel satellite data via the Terrascope data platform
(https://remotesensing.vito.be/case/terrascope).
Agriculture and Fisheries– GIS & Geocounter (Geoloket), Brussels (BE)
Landbouw en Visserij - Geoloket Landbouw (Agriculture and Fisheries - Geocounter Agriculture)
conducts agricultural surveys on crop areas under cultivation, for the calculation of EU CAP
subsidies. To that end, they use machine learning algorithms to automatically recognise crops
as far as is possible; when this proves unfeasible, the traditional method of surveying is used
and its results are fed into the AI algorithm to improve it gradually. LV uses data provided by
EODaS. See
https://www.landbouwvlaanderen.be/eloket/Domain.Eloket.Portaal.Wui/Content/Help/132.Geoloket%20landbouw/geoloket.htm
(Dutch only).
Conclusions
A large amount of fairly recent Earth observation data is freely available online as open data, including
high-resolution and very-high-resolution aerial photography. Unfortunately, these images cannot be
used for crop recognition via machine learning, because they are recorded at too low a frequency
(annually or every three years, and even 10-yearly for the highest-resolution data) to constitute the
time series needed for analysis. Furthermore, many images cannot be used for crop recognition
because they were recorded at an unsuitable time of year (during winter).
Analysis
Because the first subtask, assessing the data situation, came to the conclusion that high-resolution aerial
photography data are not readily available with the required frequency, the next foreseen step of
testing machine learning methods to analyse the data could not be taken.
Nevertheless, a first assessment of the data science capabilities of the various partners to address the
research question was conducted.
Statbel
Statbel has personnel capable of performing this type of analysis, but all are more than fully occupied
with operational business. Although there is awareness of the need to invest in building AI and machine
learning capacity, at this moment the budgetary situation does not allow doing so. Other urgent
priorities unfortunately have to take precedence.
Statistics Flanders
The recruitment of experts able to conduct the type of analysis required by this task is planned.
VITO
VITO has machine learning expertise and in fact has executed or is executing various quite similar
projects using aerial photography (e.g., detecting asbestos roofs from aerial photography, creating Solar
Map of Flanders). However, and although these experts are perfectly willing to provide advice,
comments and feedback when asked, this analytical capacity is only available against payment.
Dept. Agriculture and Fisheries – GIS/Geocounter
The Department Agriculture and Fisheries of the Flemish administration conducts agricultural surveys
on crop areas under cultivation, for the calculation of EU CAP subsidies. They use machine learning
algorithms to determine land use and, where possible, crops from satellite images and aerial
photography as far as they can, thus greatly reducing the need to survey farmers. These survey
responses are then fed
back into the AI algorithms to improve them. LV also is willing to provide advice, comments and
feedback, but their core business is of course not conducting projects. Moreover, their need for
additional precision is not high as they always have the option to ask information about specific crops,
and they need to do so to obtain the level of precision required for the correct assignment of EU
subsidies.
Annotated literature overview
Another potentially useful outcome of Case study 3 is a commented overview of reports, documents
and webpages on similar applications and projects by the organisations and units contacted in the
course of the case study.
EODaS, VITO: Detection of asbestos-containing roofs from the sky via artificial neural networks
(AI) - https://overheid.vlaanderen.be/asbestdaken-monitoren-vanuit-de-lucht-aan-de-hand-van-ai (Dutch)
Asbestos was used as a building material in many houses and buildings in Flanders from the 1970s and
1980s, among others as slate or corrugated-sheet roof covering. Because of the health hazard of
asbestos, OVAM (the Flemish Public Waste Agency) is elaborating an asbestos removal plan, part of
which is the gradual replacement of asbestos-containing slates and corrugated sheets. In order to create
an inventory of asbestos-containing roofs in Flanders, OVAM has turned to remote sensing expertise
provided by EODaS and VITO. To this end the high-resolution aerial images obtained in the context of
the Flanders Digital Height Model (DHMVII 2013-2015) are being analysed with deep-learning
algorithms within the EODaS Machine Learning workflow.
EODaS, VITO: Unmanned aircraft for the operations of VLM (Flemish Land Agency) -
https://overheid.vlaanderen.be/onbemande-vliegtuigen-voor-de-werking-van-de-vlm (Dutch)
EODaS, VITO, VLM and ANB (Agency for Nature and Forests) carry out an analysis on the possible uses
of imagery collected by unmanned aircraft in a limited test natural area, to be analysed via AI techniques
and other methods. The project aims to assess the possible added value for determining groundwater
levels, plant growth, relief or woody vegetation, especially in the less accessible parts.
Solar Map of Flanders (Zonnekaart Vlaanderen) -
https://overheid.vlaanderen.be/bvk-zonnepotentieel-vlaanderen-voorbeeldprojecten
By analysing the very precise LiDAR-based elevation measures from EODaS’ Digital Height Model
Flanders II (DHMV II, 2013-2015) to determine the surface area, orientation and inclination of some 2.5
million roofs, and by combining these data with meteorological, land registry and address data, the
Flemish Energy Agency (VEA) and VITO created the Solar Map of Flanders which shows the ‘solar score’
for all buildings or parts of buildings, and their ‘solar potential’.
An overview of all ‘Image Processing Chain’ remote sensing projects by Information Flanders can be
found here: https://overheid.vlaanderen.be/BVK-remote-sensing-projecten-bij-Informatie-Vlaanderen
(Dutch).
Conclusions
The research question, whether aerial photography rather than satellite data is needed to recognise
crops in areas characterised by relatively small plots and higher diversity of crops, could not be answered
due to unavailability of images with both a high resolution and high frequency. Satellite data, at a fairly
low resolution of 10 m, are freely available at a frequency (every 5 days) seemingly adequate for crop
recognition in areas characterised by fairly large plots and a limited variety of crops. Aerial photography
data with a high resolution, on the other hand, are available only at an annual or even multi-annual
frequency, and sometimes in the winter season when hardly any crops are present.
A possible way out of this dilemma would be to pay for the creation of a high-resolution high-frequency
dataset for a carefully selected area, limited in surface area but predominantly agricultural, with small
and varied plots. The cost might not be as prohibitive as before, due to the continuous evolution of
aerial photography techniques, notably the development of unmanned aerial vehicles (UAVs or
‘drones’).
This may eventually result in a two-way approach to crop recognition, on the one hand using lower-
resolution satellite data for regions with large plots and a limited variety of crops, complemented on
the other hand with high-resolution aerial photography for regions with fairly small plots and a larger
range of different crops.
5. Report on thematic task 2 - Build-up area
5.1. Case study 4 - Implementing SDG indicator 11.7.1
5.1.1. Pre-works
State of the art
As our goal is to implement the UN methodology for computing SDG indicator 11.7.1, we mainly used
the methodological reports written by UN HABITAT. We received several documents on the
methodology from UN HABITAT, some of them drafts.
Moreover, several documents and articles concern the definition of population agglomerations or
cities, which is at the heart of the SDG indicator. Eurostat, for example, defines the concept of Cities
within the Tercet typology based on 1 km² grid cells. For that purpose, Eurostat has released a
Methodological manual on territorial typologies
(https://ec.europa.eu/eurostat/documents/3859598/9507230/KS-GQ-18-008-EN-N.pdf/a275fd66-b56b-4ace-8666-f39754ede66b).
One similarity with the UN HABITAT methodology is that contiguous cells have to be clustered to
define areas.
Statistical product definition
The statistical product is SDG indicator 11.7.1, the “average share of the built-up area of cities that
is open space for public use for all, by sex, age and persons with disabilities”. It consists of a
number between 0 and 1 for every city in France.
Moreover, the statistical product can be refined by:
- distinguishing, within the “open space for public use”, between streets, green open public
spaces and other open public spaces
- giving the proportion of people (by sex and age) living within the cities who have access to these
open public spaces (except streets), meaning that they live less than 400 m from such a space.
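The two quantities described above reduce to simple ratios once the areas and distances are known. The sketch below is an illustration under stated assumptions, not the UN HABITAT procedure: the function names, the input layout (areas per open-space category, and one (population, distance) pair per group of residents) and the example figures are invented.

```python
def open_space_share(built_up_area, open_space_areas):
    """Share of the built-up area that is open space for public use (0 to 1).

    open_space_areas: mapping category -> area, in the same unit as built_up_area.
    """
    if built_up_area <= 0:
        raise ValueError("built-up area must be positive")
    return sum(open_space_areas.values()) / built_up_area


def share_with_access(residents, max_distance_m=400.0):
    """Proportion of people living less than max_distance_m from an open public space.

    residents: list of (population, distance in metres to the nearest space).
    """
    total = sum(population for population, _ in residents)
    if total == 0:
        return None
    near = sum(population for population, d in residents if d < max_distance_m)
    return near / total


# Invented figures for one city (areas in km2).
areas = {"streets": 15.0, "green spaces": 8.0, "other open spaces": 2.0}
indicator = open_space_share(100.0, areas)

# Invented population groups with their distance to the nearest non-street space.
access = share_with_access([(1000, 150.0), (500, 400.0), (500, 900.0)])
```

The breakdown by category follows directly by dividing each entry of `areas` by the built-up area instead of their sum.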
Data source & toolkit
Data
OSO – OSO is a land cover map (raster) with a resolution of 20 m. It is derived from Sentinel 2 imagery
processed through the iota2 chain developed by CESBIO, a research laboratory based in Toulouse,
France. It covers the whole of metropolitan France, and the latest version is based on 2018 imagery.
Bdtopo – The Bdtopo consists of multiple vector maps (layers) describing the human constructions
(roads, buildings, bridges, parks, parking, cemeteries, etc.) as well as the natural elements (waterways,
ground elevation, forests, etc.). It is maintained by the national geographical institute (IGN) in France. It
covers the whole French territory and is updated every year.
OSM – OpenStreetMap is a well-known open data platform giving access to detailed geographical
information. It is available in France, but the quality may vary between areas.
Sentinel 2 – Sentinel 2 is a pair of Earth observation satellites: Sentinel-2A, launched in 2015, and
Sentinel-2B, launched in 2017. They are expected to be active for about 7 years. They provide 20 m
resolution raster imagery of the whole Earth approximately every 5 days. The spectral bands are
located in the visible and infrared ranges.
Cadastral plan – Vectorized cadastral plan with some additional information on land use and owners.
Software
R (packages raster, sf)
5.1.2. Stage 1
Test site definition
We first considered the 31 urban units in France that contain more than 200 000 inhabitants. These
cities are listed in the table below.
Table 5.1 List of cities.
City name Population
1 Paris 10 923 026
2 Lyon 1 668 841
3 Marseille 1 443 980
4 Nice 1 111 658
5 Lille 1 004 759
6 Toulouse 955 238
7 Bordeaux 928 517
8 Nantes 659 454
9 Toulon 596 700
10 Douai 510 698
11 Avignon 480 117
12 Rouen 456 875
13 Grenoble 454 016
14 Strasbourg 440 724
15 Montpellier 427 441
16 Tours 350 648
17 Rennes 321 135
18 Valenciennes 314 183
19 Metz 284 303
20 Saint-Étienne 283 825
21 Armentières 281 176
22 Orléans 280 029
23 Nancy 271 940
24 Clermont-Ferrand 268 475
25 Bayonne 238 495
26 Mulhouse 238 358
27 Angers 235 886
28 Le Havre 234 250
29 Dijon 229 250
30 Le Mans 226 563
31 Reims 207 692
These cities are located all over metropolitan France, but none of them belong to an overseas
department.
Data collection
Data sources
The OSO land cover map was obtained free of charge from http://osr-cesbio.ups-tlse.fr/~oso/. The
Bdtopo and the cadastral plan are available within the institute (Insee): agreements between Insee
and IGN and between Insee and DGFiP (Directorate General of Public Finances) give the institute
access to these data. OSM and Sentinel 2 data are freely accessible online.
Repositories
The BDTopo and the cadastral plan are located on a secured server. The non-confidential data can be
stored on a local platform (non-protected servers) and used on virtual machines.
Collection
For the moment, only the OSM and Sentinel 2 data are incomplete: not all the available data have
been downloaded, because of storage size and download time.
5.1.3. Stage 2
Data pre-processing
The OSO database has been pre-processed. Its pixels are classified into 23 classes, four of which
correspond to built-up areas (dense built-up area, sparse built-up area, industrial and commercial areas,
roads). After pre-processing, each pixel takes one of two values (0 for non-built-up area and 1 for
built-up area).
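The reclassification described above can be sketched as follows. This is a minimal illustration in Python with NumPy (not the R code used in the case study), assuming the OSO raster has already been read into an integer array. The class codes in BUILT_UP_CODES are placeholders: the report does not list the OSO codes, so they must be replaced by the values from the OSO nomenclature.

```python
import numpy as np

# Placeholder codes for the four built-up classes (assumption, not taken from the report).
BUILT_UP_CODES = (41, 42, 43, 44)

def to_built_up_mask(landcover, built_up_codes=BUILT_UP_CODES):
    """Return 1 where the land cover class is built-up and 0 elsewhere."""
    return np.isin(landcover, built_up_codes).astype(np.uint8)

# Tiny illustrative raster of class codes.
landcover = np.array([[11, 41, 42],
                      [31, 44, 12],
                      [43, 51, 41]])
mask = to_built_up_mask(landcover)
```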
The pre-processing of the BDTopo consists of keeping the geographic elements corresponding to open
public spaces. Fortunately, there is a variable in the database that allows those spaces to be targeted
directly. As for the BDTopo, we only keep some elements of the OSM database: those for which the fclass
variable is equal to “park”, “recreation_ground” or “forest” and is not equal to “residential”.
Pre-processing of Sentinel 2 data consists of computing the NDVI for each pixel. Moreover, we only
select images with low cloud cover over the area of interest (the city boundary) and, because we want
to detect green areas, we only take into account images recorded between March and September.
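The scene selection rule above can be sketched as a simple filter. This is only an illustration: the dictionary keys, the scene list and the 10% cloud cover threshold are assumptions, since the report does not state the threshold that was used.

```python
from datetime import date

def select_scenes(scenes, max_cloud_cover=10.0, first_month=3, last_month=9):
    """Keep scenes recorded between first_month and last_month with low cloud cover.

    `cloud_cover` is assumed to be the cloud percentage over the area of interest.
    """
    return [
        s for s in scenes
        if first_month <= s["date"].month <= last_month
        and s["cloud_cover"] <= max_cloud_cover
    ]

# Hypothetical scene metadata.
scenes = [
    {"id": "A", "date": date(2018, 1, 15), "cloud_cover": 2.0},   # winter: rejected
    {"id": "B", "date": date(2018, 5, 3), "cloud_cover": 4.5},    # kept
    {"id": "C", "date": date(2018, 7, 21), "cloud_cover": 63.0},  # too cloudy: rejected
    {"id": "D", "date": date(2018, 9, 30), "cloud_cover": 9.9},   # kept
]
selected = [s["id"] for s in select_scenes(scenes)]
```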
The cadastral plan is still being studied to see what new information it can provide: this source has
yet to be exported.
5.1.4. Stage 3
Main data processing
The data processing consists of two main steps. The first step is to derive the city boundaries from
the OSO land use map. The second step is to delineate, within those boundaries, the areas which are
open for public use.
The first step is based on pixel clustering. First, we compute for each built-up pixel the proportion of
other built-up pixels located within a disk of 1 km². This proportion can vary from 0 (no other built-up
pixels around the considered pixel) to 1 (every neighbouring pixel is also a built-up pixel). Then we only
keep the built-up pixels whose proportion is higher than 0.25. Finally, we cluster those pixels according
to a proximity rule: two pixels that share a common point (an edge or a corner) are related, and the
transitive closure of this relation defines equivalence classes, which correspond to the final clusters.
Among all the clusters we only keep the biggest one located on the administrative boundary of the city.
All the pixels of this cluster are then merged to produce the city boundary.
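The clustering step described above can be sketched as follows. This is a simplified illustration in Python with NumPy, not the R implementation used in the case study. It assumes a binary built-up mask as input, expresses the disk radius in pixels (a 1 km² disk has a radius of about 564 m, to be converted with the raster resolution), computes the share over the neighbours that fall inside the raster, and simply keeps the largest cluster; the test against the administrative boundary is left out. The function names are hypothetical, and the pure Python loops are only suitable for small rasters.

```python
from collections import deque
import numpy as np

def builtup_share(mask, radius):
    """Share of other built-up pixels within `radius` pixels of each built-up pixel."""
    rows, cols = mask.shape
    offsets = [(dr, dc)
               for dr in range(-radius, radius + 1)
               for dc in range(-radius, radius + 1)
               if (dr or dc) and dr * dr + dc * dc <= radius * radius]
    share = np.zeros(mask.shape, dtype=float)
    for r, c in zip(*np.nonzero(mask)):
        inside = built = 0
        for dr, dc in offsets:
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                inside += 1
                built += int(mask[rr, cc])
        share[r, c] = built / inside if inside else 0.0
    return share

def label_clusters(keep):
    """Label clusters of pixels that touch by an edge or a corner (8-connectivity)."""
    rows, cols = keep.shape
    labels = np.zeros(keep.shape, dtype=int)
    current = 0
    for r, c in zip(*np.nonzero(keep)):
        if labels[r, c]:
            continue
        current += 1
        labels[r, c] = current
        queue = deque([(r, c)])
        while queue:  # breadth-first flood fill of one equivalence class
            y, x = queue.popleft()
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and keep[ny, nx] and not labels[ny, nx]):
                        labels[ny, nx] = current
                        queue.append((ny, nx))
    return labels, current

def largest_cluster(labels, n_clusters):
    """Boolean mask of the cluster with the most pixels."""
    if n_clusters == 0:
        return np.zeros(labels.shape, dtype=bool)
    sizes = np.bincount(labels.ravel())[1:]  # drop the background count
    return labels == (int(np.argmax(sizes)) + 1)

# Toy raster: a 3x3 block, a 2x2 block and one isolated built-up pixel.
mask = np.zeros((6, 6), dtype=bool)
mask[0:3, 0:3] = True
mask[4:6, 4:6] = True
mask[0, 5] = True

share = builtup_share(mask, radius=1)
keep = mask & (share > 0.25)          # threshold from the text
labels, n_clusters = label_clusters(keep)
city = largest_cluster(labels, n_clusters)
```

In this toy example the isolated pixel is discarded by the 0.25 threshold, two clusters remain, and the 3x3 block is retained as the city.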
The second step is trickier, as no single source accurately provides all the open public spaces.
Moreover, we must clearly define what type of areas we are looking for. We can take advantage of having
multiple databases: by carefully combining them we can improve the quality of the delineation of open
public spaces. How to combine those sources is still under study.
Results analysis
Step 1 of the UN methodology works well. For every city in France we can define the geographical
delineation based on the OSO land use map. Computing time is around 30 minutes for one city,
depending on the size of the city (it is slower for big cities). For the moment, we have only applied the
method to the 20 main cities in France. The results appear to be very close to the urban units. The
urban unit is a concept based on the continuity of the built-up fabric: any two buildings less than
200 m apart are considered contiguous.
Nevertheless, important differences sometimes appear. For example, for the city of Rouen, located
north-west of Paris, the UN methodology does not link the northern and southern parts of the city,
which are separated by the Seine river (Figure 5.1).
Figure 5.1 The city of Rouen along the Seine river
The blue area is the city as defined by the UN methodology. The orange area with a red border is the
extent of the urban unit defined by Insee. The green areas are the other urban areas according to the
UN methodology.
Unlike the city of Rouen, the match between the UN result and the urban unit result is almost perfect
for the city of Reims (Figure 5.2).
Figure 5.2 City of Reims
The blue area is the city as defined by the UN methodology. The orange area with a red border is the
extent of the urban unit defined by Insee. The green areas are the other urban areas according to the
UN methodology.
Moreover, the sensitivity of the results to the input parameters has been explored for some cities.
It appears that the results are not very sensitive to the radius within which neighbour pixels are
considered, but they can change dramatically with the threshold on the share of neighbour pixels which
are also built-up.
Step 2 is still under development as it is not straightforward to delineate the open public spaces.
Multiple sources have to be used to precisely create the boundaries of the open public spaces.
5.2. Case study 5 - Urban sprawl across urban areas in Europe
5.2.1. Pre-works
State of the art
This case study aims at characterizing urban sprawl across urban areas in the Netherlands by means of
data-driven machine learning methods, in order to evaluate to what extent NSOs can benefit from
Earth observation to monitor and report on built-up area at local to national level. Urban sprawl was
only recently officially acknowledged as an issue in Europe (Hennig et al. 2016) and numerous attempts
at characterizing urban sprawl have been made in recent years. In its 2016 report, EEA estimated that
sprawl is most pronounced in wide rings around city centres, along large transport corridors, and along
many coastlines. It further identifies the two largest clusters of high-sprawl values in Europe. The first
spans from north-eastern France to western Germany including Belgium and the Netherlands and the
second is in the United Kingdom between London and the Midlands. Hennig et al. (2015) concluded that
increasing urban sprawl in Europe causes land-use conflicts and threatens sustainable land use.
At country scale, in order to preserve the nature and the environment, the Netherlands has striven for
the past 60 years to keep existing cities compact to avoid extensive and uncontrolled urban and
suburban sprawl. Although the urban compaction policy has prevented urban sprawl in the Netherlands,
most rural-urban fringes in the Netherlands have seen substantial urbanization in recent years
(Nabielek, Hamers, and Evers 2016). Indeed, growing welfare, global economic forces, improved
transportation networks and increased mobility have made it possible for people to live and work further
away from the cities while retaining most of the cities’ advantages. Further, OECD (2018) estimated
that Dutch urban areas are less fragmented, but also more decentralized, than the OECD average.
In the Netherlands, the population density dispersion across urban space lies indeed far below the OECD
average. It was also evaluated that between 2000 and 2014 the fragmentation index had reduced by
9%, i.e. new development has been constructed in a more contiguous manner.
In this study we follow the EEA (Hennig et al. 2016) approach and define urban sprawl by applying the
method of ’weighted urban proliferation’ (WUP). This method quantifies the degree of urban sprawl for
any given landscape through a combination of three components: (i) the size of the built-up areas; (ii)
the spatial configuration (dispersion) of the built-up areas in the landscape; and (iii) the uptake of
built-up area per inhabitant or job.
Thus, the first step here is to perform a spatial analysis to delimit the built-up area of the urban
agglomeration. The last two decades have seen an exponential increase in the number of satellite
missions acquiring high resolution image time series, and a general consensus has been reached that
satellite remote sensing provides viable means for measurement-based characterization of land use
(Corbane et al. 2017; Pesaresi et al. 2016) and land cover at regional to global scale (Charlotte Pelletier
et al. 2016). Recently, the focus has been on ensemble learning methods. Two approaches will be used
to evaluate the extent of a built-up area.
First, following Belgiu et al. (2016), we will make use of a state-of-the-art Random Forest classifier.
Random Forest (Breiman 2001) is an ensemble method which constructs many decision trees and
classifies a new instance by a majority vote. Each decision tree node uses a subset of attributes
randomly selected from the original set of attributes. Additionally, each tree uses a different
bootstrap sample of the data. The decision rule will be learned from the NDVI and NDBI spectral
features, which stand for vegetation depiction and building detection respectively.
The second approach will explore the added value of deep convolutional neural networks. With the
growing availability of high resolution images (spatial resolution of 10 m or finer), Deep Learning
(Lecun, Bengio, and Hinton 2015) algorithms have recently seen a massive rise in popularity in the
remote sensing community. Deep Learning is characterized by an "end-to-end" learning approach and
depends on a multilayer task module to achieve the final goal. It attempts to mimic the activity in layers
of neurons in the neocortex. The applications span analysis tasks such as image fusion and/or
registration, scene classification, object detection, segmentation, and object-based image analysis.
Most studies focused on land use and land cover classification, where information is retrieved
from hyper-spectral or high resolution images (Ma et al. 2019) by means of CNNs. Further, most studies
investigated supervised Deep Learning models, which require a large amount of training data to unlock
their image classification power. However, the preparation of training datasets is tedious and highly
time- and/or cost-intensive. Therefore, augmentation techniques such as transfer learning and active
learning are being investigated in recent studies to increase the size and/or efficiency of the training
set (Liu, Zhang, and Eom 2017).
Available LULC studies show that built-up area is one of the most important features extracted from
remote sensing images by means of Deep Learning models. Several factors, such as complex
structure, diverse texture and varied backgrounds, are typical challenges for the task of built-up area
extraction (Ehrlich et al. 2018; Ma et al. 2019). Unfortunately, most available studies used only a single
image scene; some used multisource remote sensing data, but time series were rarely analysed using
DL algorithms. For example, Ienco et al. (2017) used multi-temporal remote sensing data (Pléiades
images) and RNNs to perform LULC classification, but only three dates of imagery were used for this
analysis (July and September 2012, and March 2013); see Table 5.2. Therefore, further developments
are needed to exploit the wealth of information present in long time records, such as Landsat and
Sentinel time series.
Table 5.2 Deep Learning for remote sensing related work.
Approach | Sensors | Pre-processing | Features | Applications*
Ruswurm et al 2018 | Sentinel-2 | None | TOA | ConvRNN
Ruswurm et al 2017 | Sentinel-2 | Atmospheric correction | BOA | RNN
Schiachalou et al 2015 | Landsat, RapidEye | Geometric correction and image registrations | TOA | HMM
Hao et al 2015 | MODIS | Atmospheric correction and image registrations | Statistical phen. features | RF
Nougueria et al 2016 | Aerial images, SPOT | - | - | CNN
Ienco et al 2017 | Pleiades | Mosaic of orthorectified and atmospherically corrected scenes | Mean and std of bands, NDVI | RNN
* ConvRNN – Convolutional Recurrent Neural Network, RNN – Recurrent Neural Network, HMM – Hidden Markov Model, RF –
Random Forest, CNN – Convolutional Neural Network
Statistical product definition
The percentage of built-up area (PBA) is the ratio of the size of the built-up areas to the size of the total
area of the reporting unit and is given as a percentage.
The utilization density (UD) measures the number of people working or living (N Inh + Jobs) in a built-up
area (per km²). Built-up areas with more workplaces and/or inhabitants are considered more intensively
used, and hence less sprawled, than areas with a lower density of workplaces and/or inhabitants. Its
reciprocal, the land uptake per person (LUP), can also be used: it is the area of land used per
inhabitant or workplace. High LUP values indicate that more space is used per inhabitant or workplace
than in areas of low LUP values.
Urban sprawl is quantified by means of the Weighted Urban Proliferation (WUP), which is the product of
the dispersion (DIS), a weighting of the dispersion, the percentage of built-up area (PBA) and a weighting
of the land uptake per person (LUP). It is measured in urban permeation units (UPU) per square metre of
landscape (UPU.m−2).
The dispersion (DIS) quantifies the spatial distribution of built-up areas, expressed as UPU per m² of
built-up area (UPU.m−2). The more dispersed the built-up areas, the larger the value of DIS; more
compact built-up areas therefore have lower values of DIS than less compact ones. Urban permeation
(UP) is a measure of the permeation of a landscape by built-up areas. It accounts for both the DIS and
the PBA in the reporting unit and is measured in UPU per m² of landscape.
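The way these quantities combine can be illustrated with a short sketch. It assumes that UP is the product of DIS and PBA (taken as a fraction) and that WUP is UP multiplied by the two weightings, which is our reading of the EEA approach. The weighting functions themselves are not given in this report, so their values are passed in as plain numbers, and DIS is taken as given. All names and input figures are hypothetical.

```python
def sprawl_metrics(built_up_m2, total_m2, inhabitants, jobs, dis, w_dis, w_lup):
    """Combine the sprawl components for one reporting unit.

    dis   : dispersion in UPU per m² of built-up area (assumed to be given)
    w_dis : weighting of the dispersion (assumed to be given)
    w_lup : weighting of the land uptake per person (assumed to be given)
    """
    users = inhabitants + jobs
    pba = 100.0 * built_up_m2 / total_m2   # percentage of built-up area
    ud = users / (built_up_m2 / 1e6)       # inhabitants and jobs per km² of built-up area
    lup = built_up_m2 / users              # m² of built-up area per inhabitant or job
    up = dis * pba / 100.0                 # UPU per m² of landscape (assumed UP = DIS x PBA)
    wup = up * w_dis * w_lup               # UPU per m² of landscape
    return {"PBA": pba, "UD": ud, "LUP": lup, "UP": up, "WUP": wup}

# Hypothetical reporting unit: 100 km² of landscape, 20 km² of which is built-up.
metrics = sprawl_metrics(built_up_m2=20e6, total_m2=100e6,
                         inhabitants=150_000, jobs=50_000,
                         dis=45.0, w_dis=1.0, w_lup=0.5)
```

Note that LUP is expressed here in m² per person while UD is per km², so LUP equals 1e6 / UD.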
The Normalized Difference Vegetation Index (NDVI) is the best-known and most widely used vegetation
index. It is a normalized difference between green leaf scattering in the near-infrared wavelengths and
chlorophyll absorption in the red wavelengths.
NDVI = (NIR - RED) / (NIR + RED)
NDVI values range from -1 to 1. Negative values (approaching -1) indicate the presence of water,
while values close to zero (-0.1 to 0.1) correspond to barren areas of rock, sand, or snow. Low positive
values ranging from 0.2 to 0.4 represent shrub and grassland, and high values (approaching 1)
indicate temperate and tropical rainforests.
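As an illustration, the NDVI formula can be applied to reflectance arrays with NumPy. This is a minimal sketch with made-up reflectance values; returning NaN where NIR + RED is zero is a choice made here, not something prescribed by the report.

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - RED) / (NIR + RED), with NaN where the denominator is zero."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    out = np.full(total.shape, np.nan)
    np.divide(nir - red, total, out=out, where=total != 0)
    return out

# Hypothetical reflectances: vegetated pixel, bare pixel, no-data pixel.
values = ndvi(nir=[0.5, 0.3, 0.0], red=[0.1, 0.3, 0.0])
```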
The Normalized Difference Built-up Index (NDBI) is used to extract built-up features; its values range
from -1 to 1. Built-up areas and bare soil reflect more SWIR than NIR, while water bodies do not
reflect in the infrared spectrum. For vegetated surfaces, the reflection of NIR is higher than that of
SWIR. Negative NDBI values represent water bodies, whereas higher values represent built-up areas;
the NDBI value for vegetation is low. The NDBI is simple to compute.
NDBI = (SWIR - NIR) / (SWIR + NIR)
The Built-up Index (BU) allows urban patterns to be analysed using NDBI and NDVI. It yields a binary
image in which higher positive values indicate built-up and barren areas.
BU = NDBI - NDVI
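The two indices and their difference can be computed together, as in the sketch below. The reflectance values are made up, and the threshold BU > 0 used to obtain the binary image is an assumption, since the report does not give a cut-off.

```python
import numpy as np

def normalized_difference(a, b):
    """(a - b) / (a + b), with NaN where the denominator is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    total = a + b
    out = np.full(total.shape, np.nan)
    np.divide(a - b, total, out=out, where=total != 0)
    return out

# Hypothetical reflectances for a built-up pixel and a vegetated pixel.
swir = np.array([0.30, 0.15])
nir = np.array([0.20, 0.45])
red = np.array([0.15, 0.05])

ndbi = normalized_difference(swir, nir)   # NDBI = (SWIR - NIR) / (SWIR + NIR)
ndvi = normalized_difference(nir, red)    # NDVI = (NIR - RED) / (NIR + RED)
bu = ndbi - ndvi                          # BU = NDBI - NDVI
built_up = bu > 0                         # assumed threshold for the binary image
```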
Data source & toolkit
Data Sources and Access
Satellites Data
Sentinel-1 and Sentinel-2 data were acquired from the Copernicus Open Access Hub. To this end, we
make use of the API Hub, a dedicated interface that gives users scripted access to the data.
API Hub access is currently available to all users registered on SciHub. MODIS and Landsat data
were accessed via the Application for Extracting and Exploring Analysis Ready Samples
(https://lpdaacsvc.cr.usgs.gov/appeears/; AppEEARS) which offers a simple and efficient way to access
and transform geospatial data from a variety of Earth Observation datasets from NASA.
Administrative Datasets
When data were not available in house, the Public Services On the Map (https://www.pdok.nl; PDOK)
website was used to access the latest official release of open access geo-information. These datasets
can be accessed via geo web services and RESTful APIs, and are available as downloads and linked data.
This is current and reliable data for both the public and private sectors. The PDOK services are based
on open data and are therefore freely available to everyone.
Software
Programming language
In this study, all tasks are carried out in the Python (https://www.python.org/about/gettingstarted/)
programming language. Created 30 years ago, Python is a general-purpose, high-level programming
language. Its use is particularly common in the data science and machine learning fields. Furthermore,
Python is developed under an OSI-approved open source license, making it freely usable and
distributable, even for commercial use. Jupyter notebooks and/or PyCharm will be used for executing
Python code.
Classification methods
Here, the first challenge is to classify land use and land cover by means of Earth observation datasets
and machine learning methods. To this end, the Python module scikit-learn is used to implement the
classifiers. This module integrates a wide range of state-of-the-art machine learning algorithms for
medium-scale supervised and unsupervised problems (Pedregosa et al. 2011). For comparison purposes,
two traditional machine learning approaches are under consideration for implementation:
Support Vector Machine (SVM) classifier, which has been widely used and reported as an
outstanding classifier (Cortes and Vapnik 1995). The basic idea of SVM is to classify the input
vectors into two classes using a hyperplane with maximal margin.
Random Forest (RF) (Breiman 2001) is an ensemble method, which constructs many decision
trees to be used for classifying a new instance by the majority vote. Each decision tree node
uses a subset of attributes randomly selected from the original set of attributes. Additionally,
each tree uses a different bootstrap of sample data.
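The two classifiers listed above can be set up with scikit-learn along the following lines. This is a minimal sketch on a handful of made-up training pixels described by NDVI and NDBI; the data, the linear kernel and the hyperparameters are assumptions chosen for illustration, not the settings of the case study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Hypothetical training pixels: columns are [NDVI, NDBI].
X = np.array([
    [0.80, -0.40], [0.70, -0.35], [0.65, -0.30], [0.75, -0.45], [0.60, -0.25],  # vegetation
    [0.10, 0.20], [0.05, 0.25], [0.15, 0.15], [0.00, 0.30], [0.12, 0.22],       # built-up
])
y = np.array([0] * 5 + [1] * 5)  # 0 = not built-up, 1 = built-up

# SVM: separates the two classes with a maximal-margin hyperplane.
svm = SVC(kernel="linear").fit(X, y)

# Random Forest: each tree is grown on a bootstrap sample, each split considers a
# random subset of the features, and the trees' predictions are aggregated.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                            random_state=0).fit(X, y)

new_pixels = np.array([[0.72, -0.38], [0.08, 0.21]])
svm_pred = svm.predict(new_pixels)
rf_pred = rf.predict(new_pixels)
```

On real imagery the training labels would come from a reference layer, and both models would be evaluated on held-out pixels rather than on two hand-picked examples.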
Image processing
Image processing will be carried out mainly by means of Python code. To this end, various libraries and
packages will be used to read and process the data in order to prepare them for the classification step.
Table 5.4 lists the packages used for image processing.
Visualization
During this project the visualization of geospatial data will be done with Folium
(https://python-visualization.github.io/folium/), a Python library which makes it easy to visualize data
on an interactive Leaflet map. It allows both the creation of choropleth maps and the passing of rich
vector/raster/HTML visualizations as markers on the map. Folium supports Image, Video, GeoJSON and
TopoJSON overlays. Moreover, this library has a number of built-in tilesets from OpenStreetMap, Mapbox,
and Stamen, and supports custom tilesets with Mapbox or Cloudmade API keys. In addition, Matplotlib
(https://matplotlib.org/), a plotting library for the Python programming language, and its numerical
mathematics extension NumPy will be used for generating more traditional plots such as histograms,
power spectra, bar charts, error charts, scatter plots, etc.
An overview of the satellite missions used in this study is presented in Table 5.3. The packages and
libraries used are listed and described in Table 5.4.
Table 5.3 Overview of satellite missions used in this study.
Mission | Sensors | Applications | Repeat cycle | Spatial resolution | Formats
Sentinel-1 | C-Band SAR | Sea ice, oil spills, marine winds and waves, land-use change, emergency response | 12 days | 5 m, 20 m, 20 m | .SAFE with GeoTIFF, XML, PNG, XSD, HTML and netCDF files
Sentinel-2 | MSI (13 bands from 443 nm to 2,190 nm) | Agriculture, forests, land-use and land-cover change; mapping biophysical variables; monitoring coastal and inland waters; risk and disaster mapping | 10 days | 10 m, 20 m, 60 m | .SAFE with JPEG2000, XML, GML and HTML files
Landsat 8 | OLI (9 spectral bands from 0.43 μm to 1.38 μm) | Agriculture, forestry and range resources; land use and mapping; geology; hydrology; coastal resources; environmental monitoring | 16 days | 30 m | GeoTIFF (ARD via AppEEARS)
Terra/Aqua | MODIS (36 discrete spectral bands from 405 nm to 14.085 μm) | Atmosphere, land, cryosphere and ocean | 16 days | 250 m, 500 m, 1000 m | GeoTIFF (ARD via AppEEARS)
Table 5.4 List and description of the packages and libraries used.
Name | Description
SCIPY | A Python-based ecosystem of open-source software for mathematics, science, and engineering. https://www.scipy.org/
OSGEO/GDAL | Open source X/MIT licensed translator library for raster and vector geospatial data formats. It presents a single raster abstract data model and a single vector abstract data model to the calling application for all supported formats. https://gdal.org/
OPENCV | Open source computer vision and machine learning software library. OpenCV was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products. https://opencv.org/
PANDAS | Open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. It bridges the gap between rapid iterations of ad-hoc analysis and production quality code. https://pandas.pydata.org/
GEOPANDAS | Open source project to make working with geospatial data in Python easier. GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types. http://geopandas.org/
5.2.2. Stage 1
Test site definition
Two study areas are selected in the Netherlands (Figure 5.3). The first study area is the most
densely populated part of the Netherlands, the Randstad. In 2016, it was estimated that almost 50% of
the Dutch population lived in the Randstad, which represents about 25% of the country’s surface area.
The second study area is Zuid-Limburg, which has seen its number of inhabitants decrease in past
decades and, according to forecasts, is projected to shrink by more than 10% by 2027
(Nabielek, Hamers, and Evers 2016).
Figure 5.3 Location of test areas
Data collection
In this study, we will exploit the spectral and temporal information present in four satellite datasets.
The satellite datasets will be acquired over the area of interest for May, June and July of 2017, 2018,
and 2019.
Earth Observation Data
MODIS
In this study, we make use of the 16-day NDVI composites from MODIS Terra and Aqua. The
MOD13Q1 Version 6 product provides Vegetation Index (VI) values on a per-pixel basis for two primary
vegetation layers: the first is the Normalized Difference Vegetation Index (NDVI) and the second is the
Enhanced Vegetation Index (EVI), which has improved sensitivity over high biomass regions. A detailed
description of the MODIS NDVI retrieval algorithm can be found in Didan (2015). This dataset runs from
2004 to present with a roughly fortnightly (16-day) temporal resolution and a 250 to 500 m spatial
resolution. The datasets are provided with a 16-day pixel reliability layer, which allows the data to be
cleaned and sea pixels to be masked.
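The cleaning step made possible by the pixel reliability layer can be sketched as follows. The sketch assumes the usual MOD13Q1 conventions (NDVI stored as integers with a scale factor of 0.0001, reliability value 0 for good data and -1 for fill); these should be checked against the product documentation before use. The arrays are made up and the function name is hypothetical.

```python
import numpy as np

NDVI_SCALE = 0.0001  # assumed MOD13Q1 scale factor for the NDVI layer
GOOD = 0             # assumed pixel reliability value for good-quality data

def clean_ndvi(raw_ndvi, reliability, keep=(GOOD,)):
    """Scale raw NDVI and set pixels with an unwanted reliability value to NaN."""
    ndvi = np.asarray(raw_ndvi).astype(float) * NDVI_SCALE
    ndvi[~np.isin(reliability, keep)] = np.nan
    return ndvi

# Hypothetical 2x2 tile: one good pixel, one marginal, one fill (e.g. sea), one cloudy.
raw = np.array([[8123, 2500], [-3000, 6100]])
rel = np.array([[0, 1], [-1, 3]])
cleaned = clean_ndvi(raw, rel)
```

Passing `keep=(0, 1)` would also retain pixels flagged as marginal, which trades some quality for coverage.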
Landsat
In this study, we will make use of the Landsat 8 OLI Collection 1 Tier 1 orthorectified scenes, using the
computed surface reflectance to create 16-day NDVI composites. OLI collects data in eight bands at
30 m resolution and one panchromatic band at 15 m resolution. The atmospheric correction, performed
by means of the LEDAPS algorithm, converts the raw Landsat data to surface reflectance.
Sentinel 1&2
In this study, the Sentinel-1 Level-1 Ground Range Detected (GRD) products in dual-pol (VV+VH)
Interferometric Wide swath (IW) mode are used. The Sentinel-2 constellation, launched in 2015 and
2017, aims at providing full and systematic global coverage at a spatial resolution as high as 10 m. In
this study, we will solely make use of the atmospherically corrected bidirectional surface reflectance
from the 10 m resolution bands, namely B02, B03, B04 and B08, of which B04 and B08 will be used to
compute the NDVI. The data volume of Sentinel-1 and Sentinel-2 in their twin constellations is
approximately 3.6 TB/day and 1.6 TB/day respectively (Soille et al. 2018). The synergistic use of
Sentinel-1 and Sentinel-2 promises access to a cloud-free global image database at high spatial
resolution.
Register Data
The Basisregistratie Adressen en Gebouwen
BAG (https://zakelijk.kadaster.nl/basisregistratie-adressen-en-gebouwen) is the main registry
containing all addresses and buildings in the Netherlands. The BAG also contains additional
information such as the object type, the area of a building, the construction date, etc. The BAG is
maintained by local authorities, who are also responsible for the quality of the registry. By using the
BAG as a filter for the satellite images, geographical areas without buildings can be removed from
consideration.
Soil Use Classification
The Soil Use File (https://www.pdok.nl/introductie/-/article/cbs-bestand-bodemgebruik) contains the
digital geometry of land use in the Netherlands. Examples of land use are traffic areas, buildings,
recreational areas, and inland and outer water. The boundaries are largely based on the Top10NL (BRT).
The current classification is mainly based on information from aerial photos.
CORINE Land cover
For validation and training purposes the CORINE Land Cover (CLC) for the year 2006 will be used. The
CLC inventory was initiated in 1985 (reference year 1990) and updated versions have been produced in
2000, 2006, and 2012. It consists of an inventory of land cover in 44 classes. CLC uses a Minimum
Mapping Unit (MMU) of 25 hectares (ha) for areal phenomena and a minimum width of 100 m for linear
phenomena. The time series are complemented by change layers, which highlight changes in land cover
with an MMU of 5 ha. The Eionet network National Reference Centres Land Cover (NRC/LC) is producing
the national CLC databases, which are coordinated and integrated by EEA. CLC is produced by the
majority of countries by visual interpretation of high resolution satellite imagery. In a few countries
semi-automatic solutions are applied, using national in-situ data, satellite image processing, GIS
integration and generalization. The 2012 version of CLC is the first one embedding the CLC time series
in the Copernicus program, thus ensuring sustainable funding for the future.
GHS-BUILT
The Global Human Settlement Layer (GHSL) produces new global spatial information, evidence-based
analytics and knowledge describing the human presence on the planet (Pesaresi, Syrris, and Julea 2016;
Corbane et al. 2017). GHSL aims to provide scientific methods and a system for reliable and automatic
mapping of built-up areas from remote sensing data. GHSL operates under an open and free data and
methods access policy (open input, open method, open output). The GHS P2016 suite consists of
multi-temporal products that offer an insight into the human presence in the past: 1975, 1990, 2000,
and 2014. The European Settlement Maps (GHS-BUILT) are pan-European built-up layers derived from
higher resolution imagery. Information layers on built-up presence are also derived from Sentinel-1
image collections (S1A 2016); they comprise two experimental datasets, made with different sets of
parameters: ESM training (Europe only) and GHSL training (World).
5.3. Case study 6 - Combination of administrative and Earth Observation data to determine
the quality of housing
The aim of this case study is to combine remote sensing data with official statistics and administrative
data in order to investigate the quality of urban life. Remote sensing data can be used to measure
different aspects of quality of life such as air quality, urban heat islands and urban green. Through the
combination of official statistics, earth observation and geodata this topic can be addressed in a more
comprehensive way than with only one data source.
As of the reporting year 2018, the German Microcensus is geocoded so that information about the
surroundings can be linked to a household. The aim of this case study is to investigate the added value
that can be generated from geocoded survey statistics. The topic of urban quality of life was chosen to
demonstrate this, since it is assumed that the wellbeing in a residential environment can be influenced
through its surroundings. In a first step, this case study will identify quality of life aspects that can be
measured through geographic data so that they can be linked to the geocoded statistic. The focus of
the geographic data lies on remote sensing data. The aspects air quality, urban heat islands, urban green
and noise pollution were identified through reviewing quality of life initiatives, discussions with earth
observation experts and a literature review on determining the quality of life using remote sensing data.
Furthermore, a literature review of the identified aspects was conducted. After the determination of
the aspects that could be examined in this study, possible data sources and data sets, which could be
linked to a geocoded statistic, are listed.
The Microcensus contains information about the socio-economic situation of a household. In this case
study, it will be investigated if remote sensing data can be used to determine differences in quality of
life within cities. The data on quality of life on a regional scale and socio-economic characteristics can
be linked to investigate their connection.
5.3.1. Pre-works
State of the art
Firstly, the literature review will focus on the different quality of life initiatives and secondly,
more specifically, on those aspects of urban quality of life which can be monitored with remote sensing
data. In this report, quality of life in urban regions refers to the wellbeing of the inhabitants of urban
areas, based on different aspects. In a first step, these relevant facets will be identified.
Quality of life initiatives
Several initiatives deal with different aspects of quality of life. The initiative “Well-being in
Germany” (“Well-Being in Germany” n.d.) identified different objects of investigation within the
framework of national and international research projects and discussions. The OECD Better Life Index
(http://www.oecdbetterlifeindex.org) is an interactive tool that allows users to analyse well-being
according to their own preferences. These preferences can be saved and compared by region
or gender. Furthermore, a regional score exists, measuring the topics on a regional scale. There are
similarities between these two initiatives: Both include indicators on income, health, education,
environment and work-life-balance. These initiatives were used as a starting point to identify quality of
life aspects that are relevant for the urban environment and filter out issues that can be measured using
remote sensing data. Only the target of air quality in both initiatives is related to urban quality of life
and quantifiable through remote sensing. This aspect is also included in the Sustainable Development
Goals (SDGs), in the indicators 3.9.1 “Mortality attributed to household and ambient air pollution” and
11.6.2 “Fine particulate matter in cities”.
Additionally, the 2030 Agenda for Sustainable Development, adopted by all United Nations Member
States in 2015, provides a shared blueprint for peace and prosperity for people and the planet, now and
into the future (https://sustainabledevelopment.un.org/sdgs). As a result, 17 Sustainable Development
Goals (SDGs) were declared, including indicators that relate to the quality of urban life and could be
generated from remote sensing data. The SDG indicators which are connected to urban quality of life are:
11.1.1 Urban population living in inadequate housing
11.2.1 Convenient access to public transport
11.3.1 Land consumption rate to population growth rate
11.7.1 Built-up area of cities that is open space for public use
Further aspects of quality of life are named in the literature: it includes the aforementioned aspects
but also covers further topics dealing with urban quality of life which can be assessed using remote
sensing, namely noise pollution, urban green and urban heat islands. For these aspects, both geodata
and remote sensing data can be used to find regional differences which could indicate a difference in
quality of life across one urban region.
Air quality
Air pollution, and thus air quality, is an important aspect of urban quality of life, as shown by the
fact that it is included in both initiatives. Remote sensing can be used to aid in air quality
measurements:
Sentinel-5P, from the Copernicus Programme of the EU, carries the spectrometer TROPOMI
(Tropospheric Monitoring Instrument), which measures ozone, nitrogen dioxide and carbon monoxide
with a resolution of 3.5 km to 7 km. The datasets are available from the Sentinel-5P pre-operations
data hub.
The Copernicus Atmosphere Monitoring Service (CAMS) is a component of Copernicus and
consists of two major forecast and analysis systems. First, the CAMS global near real time (NRT)
service, based on the European Centre for Medium-Range Weather Forecasts (ECMWF)
Integrated Forecast System, provides daily analyses and forecasts of reactive trace gases,
greenhouse gases and aerosol concentrations. Secondly, seven regional models in Europe
perform air quality forecasts and analyses on a daily basis. Based on these individual forecasts
and analyses, an ensemble forecast of air quality over Europe, called ENSEMBLE, is produced and
disseminated by Météo France. Predictions of daily mean and maximum concentrations of
greenhouse related gases as well as particulate matter in the air, computed using numerical
models, are available online. The data have a spatial resolution of 0.1 degree and are available
daily at 1-hour intervals.
NASA provides air quality products derived from the Moderate-resolution Imaging
Spectroradiometer (MODIS) instrument.
The project “SAUBER” simulates the air quality with machine learning methods to provide
comprehensive spatial information about the current and future air quality. To achieve this goal satellite
data will be linked with data from local pollution monitoring stations as well as traffic and weather data.
Through this combination a higher spatial resolution will be reached than through satellite data alone.
Urban Heat Islands (UHI)
The following summary is based on the report of the U.S. Environmental Protection Agency (2008): Land
cover influences the temperature of areas: Roofs and pavements have higher temperatures than
vegetated areas or wetlands. Air temperatures in cities are much higher than in surrounding rural areas,
especially after sunset, when the difference can be up to 12 °C. This leads to UHIs, particularly in the
summer, which can result in problems with human health including heat-related mortality. The elderly
and infants are especially vulnerable. Van der Hoeven and Wandl (2014) studied UHI in Amsterdam
retrospectively during a heat wave using Landsat 5 with a resolution of 120 m. They combined their
findings on the location of heat islands with the energy labels of buildings and a quality of life index to
identify vulnerable inhabitants. In general, Landsat satellites are able to provide information about
surface temperatures at a spatial resolution of up to 30 m. The latest Landsat satellite in orbit is Landsat
8, with two thermal bands. Since the launch of the Sentinel-3 satellite, thermal information can also be
derived from this Copernicus satellite at medium spatial resolution (300 m). Due to global warming the
importance of addressing UHIs will grow with time.
Urban green
The presence of urban green improves the quality of life since it improves the quality of air, reduces
noise pollution, regulates the temperature and provides space for recreation.
Vegetation can be detected in satellite images by using the visible bands and the near infrared channel,
for example by calculating the Normalized Difference Vegetation Index (NDVI). An interesting aspect of
urban green is how access to it is distributed across a city.
Possible data sources:
Satellite data from Sentinel-2 or Landsat can be used to identify green covered pixels/areas.
Data from the high-resolution layer Tree cover density can also be used to acquire data
regarding tree cover.
The German digital land cover model LBM-DE can be used to identify potentially green areas by
using land cover and land use information derived from remote sensing data.
The Copernicus Urban Atlas or High-resolution Layer “Imperviousness” can be used to identify
urban or built-up areas.
Noise pollution
The following data sources can be used to model noise pollution. Geodata can be used to get
information about traffic noise emission from railways or motorways. Information about street network
from OpenStreetMap, Points of Interest (like hospitals, schools etc.) and railway network from the
Deutsche Bahn (DB) Netz AG can be used.
StreckeDB: Streckennetz (route network) of DB Netz AG
Street data from TopPlus and OpenStreetMap
The data of the DB route network are supplied to the BKG in MapInfo format at the beginning of each
year, as of November of the previous year. The data set has a positioning accuracy of 10m and
corresponds to a scale of 1:25000. The Deutsche Bahn route network is available to federal institutions
to perform compulsory tasks, subject to approval by DB Netz AG.
Statistical product definition
The output of this case study will be an analysis. Through the combination of different data sources, a
statistical analysis of how urban quality of life differs between socio-economic groups can be conducted.
The collected geographic data is an output as well. It can be used as a basis for further research.
Data source & toolkit
The following describes some useful tools and all data sources which were identified as useful for this
study.
Geodata
In addition to the data sources already mentioned, further useful geodata that were identified are:
From digital elevation and surface models, information about the height of buildings and
infrastructures can be derived.
Data sets like special Points of interest (POI) can be used to evaluate the distance to essential
infrastructures like hospitals, schools etc. The BKG offers the following georeferenced POIs with
additional information, like universities, kindergartens, hospitals, and schools.
A digital data set about the explicit geometry of houses is available via the HK-DE/HU-DE House
coordinates.
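The distance evaluation mentioned for the POI data sets can be sketched as follows. This is a minimal illustration, assuming that grid-cell centres and POIs are given as (easting, northing) pairs in a metric projection such as ETRS89-LAEA; the function name and the coordinates are hypothetical, and straight-line distance is used rather than a routed distance.

```python
import math

def nearest_poi_distance(cell_centres, pois):
    """Return, for each grid-cell centre, the straight-line distance in
    metres to the closest point of interest (POI).

    Both inputs are lists of (easting, northing) tuples in a projected,
    metric coordinate system (assumed here: ETRS89-LAEA, EPSG:3035).
    """
    distances = []
    for cx, cy in cell_centres:
        # Euclidean distance to every POI, keep the smallest one.
        distances.append(min(math.hypot(cx - px, cy - py) for px, py in pois))
    return distances

# Hypothetical coordinates, for illustration only.
cells = [(4200050.0, 3000050.0), (4200150.0, 3000050.0)]
hospitals = [(4200050.0, 3000450.0), (4200450.0, 3000050.0)]
dist = nearest_poi_distance(cells, hospitals)
```

For large numbers of cells and POIs a spatial index would normally replace the brute-force loop.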
Urban Atlas
The Urban Atlas is a product derived from Copernicus data to create a harmonized land cover and land
use map for European cities. The information is derived from Earth Observation but backed by ancillary
data: Very high resolution satellite imagery such as SPOT 5 & 6 and Formosat-2 are used. Basic land
cover classes are determined through automatic segmentation and classification. Furthermore, a visual
and manual interpretation is done from both very high-resolution satellite imagery and navigation data
(OSM or commercial navigation data).
The Urban Atlas 2012 is available for 693 functional urban areas (FUAs) in EU28 and EFTA as well as 107
FUAs in Turkey and the West Balkans countries. In urban areas, 17 urban classes exist, with a minimum
mapping unit of 0.25 ha depending on the class. The classes which are relevant for this case study are
described below:
Urban fabric classes are distinguished by their degree of soil sealing independent of the type of
housing. They are separated into continuous urban fabric (>80% soil sealing), discontinuous
dense urban fabric (50-80%), discontinuous medium density urban fabric (30-50%),
discontinuous low density urban fabric (10-30%), discontinuous very low density urban fabric
(<10%) and isolated structures.
Industrial, commercial, public, military, private and transport units is a land use class where the
share of artificial surface is higher than 30 % and more than half of the surface has non-residential use
such as industrial, commercial or transport.
Roads:
o Fast transit roads and associated land, which are defined as “motorways” in the
navigation data
o Other roads and associated land
o Railways and associated land
Land without current use is defined as areas in proximity to artificial surfaces which are still
waiting to be used or re-used.
Green urban areas are public green areas mainly for recreational use such as gardens, zoos and
parks. Furthermore, forests from surrounding rural areas which extend into urban areas are
also classified as green urban areas if at least two sides are adjacent “to urban areas and
structures and traces of recreational use are visible.”
Sports and leisure facilities can be publicly or commercially managed.
Earth observation data
Sentinel 2
General information on the Sentinel-2 satellite is given in chapter 3 - EO data sources. Sentinel-2 data can
be used to derive green covered areas within a city. Thus, the urban green ratio will be calculated based
on optical Sentinel-2 data.
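One possible way to compute such an urban green ratio is sketched below. It assumes that the red and near infrared reflectances (Sentinel-2 bands B4 and B8 at 10 m) have already been resampled so that blocks of 10 by 10 pixels coincide with the 100 m grid cells; the NDVI threshold of 0.4 and the function names are illustrative assumptions, not values taken from this study.

```python
def ndvi(red, nir):
    """Normalized Difference Vegetation Index for one pixel."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0

def green_ratio(red, nir, block=10, threshold=0.4):
    """Share of vegetated pixels per block of `block` x `block` pixels.

    `red` and `nir` are equally sized 2-D lists of reflectances. With
    block=10 and 10 m pixels, each output value describes one
    100 m x 100 m cell. The threshold is an illustrative assumption.
    """
    rows, cols = len(red), len(red[0])
    ratios = []
    for r0 in range(0, rows - block + 1, block):
        row = []
        for c0 in range(0, cols - block + 1, block):
            # Count pixels in this block whose NDVI exceeds the threshold.
            green = sum(
                1
                for r in range(r0, r0 + block)
                for c in range(c0, c0 + block)
                if ndvi(red[r][c], nir[r][c]) > threshold
            )
            row.append(green / (block * block))
        ratios.append(row)
    return ratios

# Synthetic 10 x 20 pixel scene: the left cell is fully vegetated,
# the right cell is half vegetated and half sealed surface.
red = [[0.05] * 15 + [0.30] * 5 for _ in range(10)]
nir = [[0.45] * 15 + [0.32] * 5 for _ in range(10)]
ratios = green_ratio(red, nir)
```

Plain lists are used here to keep the sketch dependency-free; with real rasters an array library would be the natural choice.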
Sentinel-5P
Sentinel-5P carries TROPOMI, which is the most advanced multispectral imaging spectrometer to date
for measuring air pollution. Five different components of the atmosphere can be measured, as can
be seen in Figure 5.4:
Ozone: Protection from ultraviolet radiation is provided by stratospheric ozone.
Ozone can also form in the lower atmosphere, where it can lead to respiratory problems and damage
vegetation.
Forest fires and wood processing release formaldehyde, which can irritate the eyes and the
lining of the nose and throat.
Sentinel-5P can measure nitrogen dioxide (NO2). NO2 can form naturally from biogenic
emissions such as microbiological processes in the soil. In cities, however, most NO2 emissions
stem from motor vehicle exhaust. Chronic exposure can cause respiratory effects. The total
atmospheric NO2 column between the surface and the top of the troposphere is measured with
Sentinel-5P.
Methane, which is a potent greenhouse gas, stems from the fossil fuel industry, landfill sites,
livestock farming, rice agriculture and permafrost thawing. Headaches and nausea can be a
consequence of exposure.
Volcanic activity, fires and burning of fossil fuels can lead to carbon monoxide pollution. The
amount of oxygen which can be transported in the blood stream can be affected through
breathing air polluted by carbon monoxide.
Figure 5.4 Sentinel-5P (Source: http://www.esa.int/spaceinimages/Images/2017/09/Sentinel-5P_infographic)
Landsat 8
Landsat 8 can be used as a freely available source of optical data. General information on the Landsat 8
satellite is given in chapter 3 - EO data sources.
Georeferenced official Data
The Federal Statistical Office of Germany (Destatis) is currently working on georeferencing all statistics
which have a geographic component. Some examples are listed below focusing on data sources which
might become useful during this ESSnet through the combination with the other data sources:
Census
Demographic data from the census 2011 is georeferenced on a grid of 100 by 100 meters. These results
are published, where data privacy allows it. The data which is available for most urban grid cells are the
number of inhabitants and demographic information such as age and gender.
Microcensus
The microcensus is a yearly survey covering 1% of the population. The survey gathers information on
the topics of demography, economic and social situation of the household, labor market, education and
housing. It is conducted as a panel, where every household is surveyed for four consecutive years. The
microcensus of 2018 will be the first to be geocoded.
However, it is not yet ready and will only be available in the autumn of 2019; it is therefore not
included in this interim report.
Map of traffic accidents
Data exist on all road accidents, including the modes of transportation which were involved, their
severity and their location.
Map of hospital accessibility
Hospital accessibility is available as a data set giving the driving distance to the closest hospital on a grid
of 100 by 100 meters.
Business register
The Statistical Business Register (German: Statistisches Unternehmensregister, URS) is geocoded and
includes among other things the branch of the company, revenue and number of employees.
Toolkit
GIS and R (R Core Team 2018) will be used as a toolkit as well as python scripting.
5.3.2. Stage 1
Test site definition
This case study is limited to urban areas. In a first step only the Frankfurt Rhine-Main area will be
considered. The surroundings of the households covered in the microcensus will be examined. The
characteristics of a household's surroundings will be calculated on the INSPIRE 100 by 100 m grid so as to
match the grid of the georeferenced census.
Data collection
The data was collected and matched to the level of the INSPIRE grid cells, which use the Lambert
azimuthal equal-area (LAEA) projection and have a width of 100 m. The georeferenced census data is
published in this format. Furthermore, a database for this grid is being built up at Destatis; results of
this project can be used to fill the database.
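Matching point data to the 100 m grid can be sketched as follows. The example assumes coordinates in ETRS89-LAEA (EPSG:3035) metres and names each cell after its lower-left corner in units of the cell size; the exact label pattern ("100mN...E...") and the function names are assumptions made for illustration.

```python
import math

def grid_cell_id(easting, northing, size=100):
    """Return an identifier for the grid cell containing a point.

    Coordinates are assumed to be ETRS89-LAEA (EPSG:3035) metres. The
    cell is named after its lower-left corner, expressed in units of the
    cell size. The "100mN...E..." label pattern is an assumption.
    """
    col = math.floor(easting / size)
    row = math.floor(northing / size)
    return f"{size}mN{row}E{col}"

def count_by_cell(points, size=100):
    """Count how many points fall into each grid cell."""
    counts = {}
    for easting, northing in points:
        key = grid_cell_id(easting, northing, size)
        counts[key] = counts.get(key, 0) + 1
    return counts

# Hypothetical point locations, for illustration only.
points = [(4334150.0, 2684050.0), (4334199.0, 2684001.0), (4334200.0, 2684050.0)]
counts = count_by_cell(points)
```

Any variable that can be attached to a point or pixel centre can be aggregated to the grid in the same way.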
Sentinel-2
The Sentinel-2 scenes have been downloaded through the Copernicus Open Access Hub. Figure 5.5
shows the satellite images matching the search parameters, which can be downloaded directly.
Figure 5.5 Results: Sentinel-2 images found for the period of interest, in the area of interest, as shown by the
green shapes.
Sentinel-5P
Sentinel-5P data can be collected from the Sentinel-5P Pre-Operations Data Hub
(https://s5phub.copernicus.eu/). An area and a time frame of interest can be defined. Figure 5.6 shows
the vertical column of NO2.
Figure 5.6 Sentinel-5P Nitrogen dioxide
Landsat
Land surface temperature (LST) derived from Landsat 8 will be collected in order to analyse the heat
development within the study area. The LST has to be calculated based on the published methodology
of Cook et al. (2014). For this, the information from the thermal band and the NIR and red bands
is needed. Figure 5.7 illustrates the result of the calculation for the summer of 2016 for the study area.
Figure 5.7 Land surface temperature derived from Landsat 8 for the Frankfurt Rhine-Main area. Date: 23rd of August 2016.
5.3.3. Stage 2
Data pre-processing
Sentinel-5P
To process Sentinel-5P data in a way in which air quality can be inferred from it, chemical transport
models (CTM) have to be simulated due to the non-linearity of NO2 chemistry. A CTM simulates the
atmospheric chemistry. The satellite data has to be scaled to relate to ground-level measurements.
Satellite data of NO2 measurements is strongly linked to in-situ measurements (Bechle et al. 2013).
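The scaling step can be illustrated with a simple proportional approach: the satellite column is multiplied by the ratio of surface concentration to column simulated by the CTM for the same location and time. The sketch below assumes this proportional scaling for illustration only; it is not confirmed as the method that will be applied in this case study, and the function name and numbers are hypothetical.

```python
def surface_no2(satellite_column, model_surface, model_column):
    """Estimate ground-level NO2 from a satellite tropospheric column.

    surface ~= satellite column * (modelled surface / modelled column)

    `model_surface` and `model_column` come from a chemical transport
    model. The result has the unit of `model_surface`, as long as both
    columns share the same unit.
    """
    if model_column <= 0:
        raise ValueError("model column must be positive")
    return satellite_column * (model_surface / model_column)

# Hypothetical values: columns in molecules/cm2, surface value in ug/m3.
estimate = surface_no2(8.0e15, model_surface=20.0, model_column=5.0e15)
```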
Landsat
In order to get land surface temperature (LST) from Landsat satellites, a couple of pre-processing steps
are needed as long as the USGS Landsat 8 LST ARD (analysis ready data) Level-2 product is not available,
as is the case for the study region. First, the digital numbers need to be converted to radiance and then to
at-sensor (top of atmosphere, TOA) brightness temperature. The land surface temperature
(LST) is then calculated by estimating the surface emissivity, for which land cover information and the
proportion of vegetation (via the Normalized Difference Vegetation Index, NDVI) are needed.
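The chain of conversions described above can be sketched for a single pixel as follows. This is a simplified illustration, not the Cook et al. (2014) procedure: it assumes an NDVI-threshold estimate of emissivity and a single-band inversion without atmospheric correction. The rescaling and thermal constants are typical Landsat 8 band 10 values and should be read from the scene metadata in practice; the NDVI thresholds and emissivity coefficients are assumptions.

```python
import math

# Typical Landsat 8 TIRS band 10 values; read them from the scene
# metadata (MTL file) in practice.
ML, AL = 3.342e-4, 0.1          # radiance rescaling gain and offset
K1, K2 = 774.8853, 1321.0789    # thermal conversion constants
WAVELENGTH_UM = 10.895          # band 10 centre wavelength (micrometres)
RHO_UM_K = 14388.0              # h * c / k_B in micrometre-kelvin

def land_surface_temperature(dn_b10, red, nir, ndvi_soil=0.2, ndvi_veg=0.5):
    """Approximate LST in kelvin for one pixel.

    Steps: digital number -> TOA radiance -> brightness temperature ->
    emissivity from the vegetation proportion -> LST. The NDVI thresholds
    and emissivity coefficients are illustrative assumptions.
    """
    radiance = ML * dn_b10 + AL
    brightness_temp = K2 / math.log(K1 / radiance + 1.0)

    total = nir + red
    ndvi = (nir - red) / total if total != 0 else 0.0
    # Proportion of vegetation, clipped to the range [0, 1].
    share = (ndvi - ndvi_soil) / (ndvi_veg - ndvi_soil)
    pv = min(max(share, 0.0), 1.0) ** 2
    emissivity = 0.004 * pv + 0.986

    correction = (WAVELENGTH_UM * brightness_temp / RHO_UM_K) * math.log(emissivity)
    return brightness_temp / (1.0 + correction)

# Same thermal signal over bare soil and over dense vegetation.
lst_bare = land_surface_temperature(30000, red=0.20, nir=0.22)
lst_veg = land_surface_temperature(30000, red=0.05, nir=0.45)
```

Because vegetation has a higher emissivity than bare surfaces, the same thermal signal yields a slightly lower surface temperature for the vegetated pixel.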
5.3.4. Stage 3
Main data processing
Since the microcensus data is not yet available the main data processing will be included in the final
report.
6. Report on thematic task 3 - Land cover
6.1. Case study 7 - Comparing «in-situ» and «remote-sensing» collection mode for land cover
data
6.1.1. Pre-works
State of the art
Classification of satellite images
Satellite images are processed by a numerical classification technique that uses the information
contained in the values of one or more spectral bands to classify each pixel individually by assigning a
particular land cover class to it (i.e. water, forest, maize, buildings, etc.). The result of the classification
is a new image composed of a mosaic of pixels that each belongs to a particular class. This image is
essentially a thematic representation of the original image. There are several approaches to
classification including supervised and unsupervised classification methods. The main difference is
that in supervised classification the user specifies the different pixel values or spectral signatures to
be associated with each class by selecting representative sampling sites of known cover type. The
computer algorithm uses these training zones to classify the entire image. Unsupervised
classification, on the other hand, does not use training data; the algorithm itself forms the spectral
classes on the basis of the numerical information contained in the data (pixel values for each band
or index; NRCan 2018). Several studies show that supervised classification methods perform better than
unsupervised methods (Khatami, Mountrakis, and Stehman 2016; Szuster, Chen, and Borger 2011).
Many algorithms are used in land use classification. The most well-known are Support Vector
Machine (SVM), Decision Trees and Random Forest (RF; Breiman 2001). Several studies show more
satisfactory results with RF compared to the first two classifiers (C Pelletier et al. 2016; Inglada et
al. 2015; Rodriguez-Galiano et al. 2012; Gislason, Benediktsson, and Sveinsson 2006).
The use of satellite images in the production of land cover maps is becoming increasingly frequent.
Multi-spectral and multi-temporal imaging is used to characterize phenological variations in the state
of vegetation cover (Rodriguez-Galiano et al. 2012) and to detect the different components of the
Earth’s surface. According to Inglada et al. (2017), images with high spatial (metric or decametric) and
temporal resolution are required to produce detailed land cover maps. Sentinel-2 images (Drusch et al.
2012), with their unique characteristics (290 km wide swath, 10-60 m spatial resolution, 5-day revisit
with 2 satellites and 13 spectral bands), provide a powerful tool for mapping and monitoring large, rich,
complex and sensitive ecosystems (Yesou et al. 2016). Ma et al. (2017) also found a positive correlation
between the size of study areas and the spatial resolutions of the images used. Very high spatial
resolution (<2 m) images are the most used. Nevertheless, due to the ease of access and the high
availability, Sentinel-2 images are often used.
In order to automate the classification procedure and process large volumes of data over large French
territories in a short time, the Centre d'Etudes Spatiales de la BIOsphère (CESBIO) developed the iota2
processing chain based on Random Forest. The main product of the chain so far is the land use map of
metropolitan France "OSO", produced from Landsat-8 and Sentinel-2 images at 20 m and 10 m
resolution, respectively.
Statistical product definition
The statistical products of the French TERUTI annual survey are complete breakdowns of the national
area into land cover and land use classifications at NUTS2 level. The land cover classification has 7 main
categories: Artificial Land, Cropland, Woodland, Shrubland, Grassland, Bare land, Water. The statistics
derived from the TERUTI survey combine about 68 000 direct observations made by surveyors
in the field every year with automatic data imputation for about 7.2 million points from administrative
and geographical databases.
The OSO map is a land cover map covering the whole French metropolitan territory with a land cover
classification of 17 categories. Both vector and raster maps are available for 2016, 2017 and 2018. The
vector map has a minimum collection unit of 0.1 ha whereas the raster map is made of 10-meter pixels.
The automatic production process of these OSO maps is based on a supervised classification of time
series of Sentinel-2 images using existing databases as reference data for the training and validation steps.
The statistical products of CS7 should be:
a common land cover classification suitable for both the TERUTI survey and the OSO map;
for the comparison at an individual level (points-pixels): a confusion-type matrix (and derived
indices) crossing land cover classes from TERUTI with land cover classes from OSO, for the
samples of «in situ» points and imputed points, and for different areas (regions) of the French
territory;
for the comparison at an aggregated level (NUTS2): land cover area estimates from TERUTI and
OSO;
an adaptation of the remote-sensing process in order to improve the land cover classification
on TERUTI points;
an investigation of land cover change detection by analysing multi-annual Sentinel-2 image time
series in order to provide a list of points from the TERUTI sample where land cover has likely changed
since 2017.
Data source & toolkit
Sentinel-2 images and iota² processing chain
In France, the Theia Data and Services Center for continental surfaces is in charge of processing
Sentinel-2 images (conversion to level 2A) to correct atmospheric effects and obtain surface reflectance (Olivier
Hagolle, Huc, et al. 2015; Olivier Hagolle, Sylvander, et al. 2015; O. Hagolle et al. 2010; 2008). These
corrections, made by the MUSCATE processing centre of the CNES with the MAJA processing chain
(MACCS ATCOR® Joint Algorithm; Multi-sensor Atmospheric Correction and Cloud Screening;
Atmospheric and Topographic Correction), also make it possible to detect clouds and their shadows.
The Sentinel-2 images are provided according to a precise and fixed tiling, with tiles of 110 km by 110
km in the UTM/WGS84 projection (Universal Transverse Mercator/World Geodetic System 1984) and
a 10 km overlap between adjacent tiles (Figure 6.1). Time series of Sentinel-2 images are the
primary data source for the remote sensing method of the OSO process.
Figure 6.1 Sentinel-2 tiles on France. Source CESBIO.
These high-resolution image time series (Sentinel-2) have opened up new opportunities in satellite image
processing and land use/cover classification. The high revisit frequency (5 days) allows the evolution of land
cover to be monitored very frequently and therefore dynamic classes to be mapped over time (e.g.
agricultural classes). In this context, CESBIO has developed an image processing chain that can integrate
all the satellite images of a given period to automatically produce large-scale land cover maps, such as
the OSO map: the iota² processing chain.
The iota2 processing chain
There is an increasing number of software solutions (commercial and free) for performing supervised
classifications of satellite images (R packages, python modules, ENVI, etc.). For example, the Orfeo
Toolbox (OTB) library developed by the CNES is a solution widely used by the remote sensing
community. These solutions each have their advantages and disadvantages.
In this project, the operational solution chosen is the processing chain iota² (Infrastructure for Land
Cover by Automatic Processing Incorporating Orfeo Toolbox Applications). Iota² is a processing chain
developed by the CESBIO to produce land cover classifications over large areas from satellite images
with one or more sensors. It allows the treatment of large volumes of data, from different tiles and
different acquisition dates. Available as free software (https://framagit.org/iota2-project/iota2/), this
python-based software is based on the Orfeo Toolbox (OTB) library dedicated to image processing
(https://www.orfeo-toolbox.org/) and on the Random Forest algorithm with eco-climatic stratification
(Inglada et al. 2017). It offers several features of large-scale image processing:
management of multiple Sentinel-2 tiles (overlay);
Multi-sensor management (Sentinel-2, Spot 6, Sentinel-1, Landsat, etc.);
gapfilling (filling voids): transition from irregular to regular time series / cloud management;
stratification by large region (e.g. ecoclimatic zone, sylvo-eco-region, etc.).
Iota² was created to be run on POSIX (Portable Operating System Interface–UNIX) operating systems. It
is possible to parallelise processing and use it both on multi-core shared memory machines and on
high-performance computing clusters with hundreds of nodes (C Pelletier et al. 2016). The main product
resulting from the processing of satellite images with iota² is the land cover map of metropolitan France
(« Occupation du Sol - OSO ») based on the analysis of Sentinel-2 time series
(http://osr-cesbio.ups-tlse.fr/~oso/).
Operating iota2
The general classification methodology applied by iota2 is based on a conventional supervised
classification procedure (figure 5.3), with the advantage of being able to process very large territories
and large volumes of data completely automatically in a short time. Because of this automatic approach,
iota2 was designed to be applicable independently of the land cover classes and therefore no date
selection in terms of season or vegetation phenology is applied. It is therefore advisable to use as many
images as possible to better characterize the annual land cover cycle (Inglada et al. 2017).
The processing inputs shown in Figure 6.2 are:
1. the reference data. These are the georeferenced samples labelled with a known land cover
class;
2. the validity masks of the image time series. Each image corresponding to a date is accompanied
by a mask indicating the valid pixels (surface reflectance) and the invalid ones (cloud detection,
cloud shadow, saturation);
3. the level 2A satellite image time series;
4. optional input: a Region of Interest (ROI) mask to exclude areas from classification.
The processing is divided into 6 main steps (green boxes in Figure 6.2):
Figure 6.2 Schematic of the OSO Map Production Procedure (Inglada et al. 2017)
Sample Selection
Iota2 randomly separates reference data into training data and validation data. In order to limit the
effect of spatial autocorrelation, which artificially increases the accuracy of classifications, the
separation takes place at the polygon level and not at the pixel level. This prevents pixels from the same
polygon, with similar characteristics, from being used for both training and validation, which would bias the
assessment of the quality of the classifications (Accuracy, F-Score and Kappa
Index) towards optimistic results.
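A split at polygon level can be sketched as follows. This is an illustrative example and not the iota2 implementation: the function name, the tuple layout of the samples and the 30 % validation share are assumptions.

```python
import random

def split_by_polygon(samples, validation_share=0.3, seed=0):
    """Split pixel samples into training and validation sets so that all
    pixels of one reference polygon end up in the same set.

    `samples` is a list of (polygon_id, label, features) tuples. The
    tuple layout and the default share are illustrative assumptions.
    """
    polygon_ids = sorted({sample[0] for sample in samples})
    rng = random.Random(seed)
    rng.shuffle(polygon_ids)

    # Reserve a share of the polygons (not of the pixels) for validation.
    n_validation = max(1, round(len(polygon_ids) * validation_share))
    validation_ids = set(polygon_ids[:n_validation])

    training = [s for s in samples if s[0] not in validation_ids]
    validation = [s for s in samples if s[0] in validation_ids]
    return training, validation

# Ten hypothetical polygons with five pixels each.
samples = [(pid, "forest", [0.1 * k]) for pid in range(10) for k in range(5)]
training, validation = split_by_polygon(samples, validation_share=0.3, seed=42)
```

Splitting the shuffled pixel list directly would instead scatter neighbouring pixels of one polygon across both sets, which is the situation the polygon-level split avoids.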
Linear Interpolation
The image time series are preprocessed with temporal gapfilling and re-sampling to ensure spatial and
temporal homogeneity. The approach consists of linear interpolation of invalid pixels using surface
reflectance values from previous and subsequent dates for the dates with clouds. For temporal
re-sampling, linear interpolation is applied to all surface reflectance values of all dates (valid and invalid
pixels), in order to have common dates for all pixels in the study area.
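For one pixel and one band, gapfilling and temporal re-sampling can be sketched as below. The example interpolates linearly between the valid observations and evaluates the result on a common set of target dates; holding the first or last valid value constant at the edges of the series is an assumption of this sketch, and the function names are hypothetical.

```python
def interpolate(day, days, values):
    """Linear interpolation at `day` from sorted `days` and `values`.

    Outside the observed range the nearest value is held constant
    (an assumption of this sketch).
    """
    if day <= days[0]:
        return values[0]
    if day >= days[-1]:
        return values[-1]
    for i in range(1, len(days)):
        if day <= days[i]:
            weight = (day - days[i - 1]) / (days[i] - days[i - 1])
            return values[i - 1] + weight * (values[i] - values[i - 1])

def gapfill_and_resample(days, values, valid, target_days):
    """Drop invalid observations (clouds, shadows, saturation) and
    interpolate the remaining series onto the common target dates."""
    good_days = [d for d, ok in zip(days, valid) if ok]
    good_values = [v for v, ok in zip(values, valid) if ok]
    if not good_days:
        raise ValueError("pixel has no valid observation")
    return [interpolate(d, good_days, good_values) for d in target_days]

# Day 10 is cloudy: its reflectance is replaced by the value
# interpolated between day 0 and day 20.
days = [0, 10, 20, 30]
reflectance = [0.10, 0.90, 0.30, 0.40]
valid = [True, False, True, True]
filled = gapfill_and_resample(days, reflectance, valid, target_days=[0, 10, 20, 25, 30])
```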
Feature Extraction
The image time series obtained after interpolation are used for the calculation of spectral indices
(NDVI: Normalized Difference Vegetation Index; NDWI: Normalized Difference Water Index; and
Brightness) for each pixel on each acquisition date. These indices are added to the surface reflectance
data of each pixel, which improves the results of the classifications, especially when the study areas are
very large and have very variable landscapes. These indices are used to highlight specific properties of
observed surfaces such as the presence of vegetation with NDVI, of water and wetlands with NDWI (C
Pelletier et al. 2016).
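The construction of the per-pixel feature vector can be sketched as follows. The index definitions used here are assumptions: NDWI is computed from the green and near infrared bands, and brightness as the Euclidean norm of the bands. The definitions actually used by iota2 may differ, and only three bands are used to keep the example short.

```python
import math

def _ratio(a, b):
    """Normalized difference (a - b) / (a + b), 0 if undefined."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0

def spectral_features(green, red, nir):
    """Reflectances of one date followed by the derived indices.

    Assumed definitions: NDVI from NIR and red, NDWI from green and NIR,
    brightness as the Euclidean norm of the bands.
    """
    ndvi = _ratio(nir, red)
    ndwi = _ratio(green, nir)
    brightness = math.sqrt(green ** 2 + red ** 2 + nir ** 2)
    return [green, red, nir, ndvi, ndwi, brightness]

def pixel_feature_vector(time_series):
    """Concatenate the features of all dates for one pixel.

    `time_series` is a list of (green, red, nir) tuples, one per date.
    """
    features = []
    for green, red, nir in time_series:
        features.extend(spectral_features(green, red, nir))
    return features

# Hypothetical reflectances of a crop pixel in winter and in summer.
vector = pixel_feature_vector([(0.08, 0.10, 0.20), (0.06, 0.04, 0.45)])
```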
Training
Training data are used to train the classifier to identify land cover classes. At this stage, the classification
model is produced. Iota2 is based on the Random Forest classification algorithm by Breiman (2001),
which showed higher overall accuracy than traditional methods such as Decision Trees and SVM. In
addition, this classifier requires shorter processing times with simpler settings (C Pelletier et al.
2016; Inglada et al. 2015; Rodriguez-Galiano et al. 2012; Gislason, Benediktsson, and Sveinsson 2006).
The Random Forest algorithm uses a combination of decision trees: each tree is grown on a sample drawn
randomly with replacement from the training data (bootstrap), and at each step the construction of a tree
node is done on a randomly drawn subset of variables. After generating a large number of trees, the
prediction is the result of a majority vote (ensemble learning). In other words, the class assigned to each
pixel is the most frequently predicted.
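The two ensemble ingredients named above, bootstrap sampling and the majority vote, can be illustrated without a full decision tree implementation. In this sketch the per-tree predictions are simply given as lists; the function names are hypothetical, and ties are resolved in favour of the class encountered first, which is an arbitrary choice.

```python
import random
from collections import Counter

def bootstrap_sample(samples, rng):
    """Draw len(samples) items randomly with replacement."""
    return rng.choices(samples, k=len(samples))

def majority_vote(tree_predictions):
    """Combine the per-pixel predictions of several trees.

    `tree_predictions` holds one list per tree, each with one class
    label per pixel. Ties go to the class encountered first.
    """
    n_pixels = len(tree_predictions[0])
    result = []
    for pixel in range(n_pixels):
        votes = Counter(tree[pixel] for tree in tree_predictions)
        result.append(votes.most_common(1)[0][0])
    return result

# Hypothetical predictions of three trees for three pixels.
trees = [
    ["water", "forest", "maize"],
    ["water", "maize", "maize"],
    ["forest", "forest", "maize"],
]
classes = majority_vote(trees)

# Each tree would be trained on its own bootstrap sample.
sample = bootstrap_sample(list(range(100)), random.Random(1))
```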
Classification
This step assigns a particular land cover class to each pixel of the image using the time series of surface
reflectance and spectral indices with the classification model. The product of this step is a land cover map
with the same classes as the training data.
Validation
The quality of the land cover maps produced by iota2 is assessed with a set of indices derived from a
confusion matrix where the values in the cells correspond to the count of the validation pixels. The rows
correspond to the reference class, called the "true class", and the columns to the class obtained by the
classification. The indices correspond to global statistics which give summarised information on
classification, calculated from the validation data used at the pixel level:
1. Overall Accuracy, calculated by the sum of the diagonal divided by the sum of all the elements
of the confusion matrix, indicates the proportion of pixels that have been well classified, all
classes combined;
2. the Recall, indicating the fraction of pixels correctly classified in relation to the ground truth;
3. the Precision, indicating the fraction of pixels correctly classified in relation to all pixels classified
in the class;
4. the F-Score, denoting the harmonic mean of the Precision and the Recall;
5. and the Kappa Index, which takes into account the part of the agreement between the output
of the classifier and the reference data that may be due to chance. It therefore expresses the
relative difference between the observed agreement and the random agreement that can be
expected if the classification was random (Inglada et al. 2017).
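The indices listed above can be computed from a confusion matrix as in the following sketch, which follows the convention given in the text (rows are reference classes, columns are classified classes). The function name and the example counts are hypothetical.

```python
def classification_scores(matrix):
    """Compute accuracy indices from a square confusion matrix.

    matrix[i][j] is the number of validation pixels of reference class i
    that were assigned to class j by the classifier.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(n))
    row_sums = [sum(matrix[i]) for i in range(n)]
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]

    overall_accuracy = diagonal / total
    recall = [matrix[i][i] / row_sums[i] if row_sums[i] else 0.0 for i in range(n)]
    precision = [matrix[i][i] / col_sums[i] if col_sums[i] else 0.0 for i in range(n)]
    f_score = [
        2 * p * r / (p + r) if (p + r) else 0.0
        for p, r in zip(precision, recall)
    ]

    # Agreement expected by chance, from the row and column totals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (overall_accuracy - expected) / (1 - expected) if expected != 1 else 0.0

    return {
        "overall_accuracy": overall_accuracy,
        "recall": recall,
        "precision": precision,
        "f_score": f_score,
        "kappa": kappa,
    }

# Hypothetical two-class example (e.g. forest vs. non-forest).
scores = classification_scores([[40, 10], [5, 45]])
```

In this example 85 of 100 pixels lie on the diagonal, and half of the agreement would be expected by chance, which gives a Kappa of 0.7.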
OSO product
Thanks to the iota² processing chain, it has been possible to produce the OSO land cover map at the
scale of the French territory, integrating several terabytes of Sentinel-2 satellite images. This land cover map
has the following nomenclature (in brackets, the data sources and corresponding type used for the
training samples; CLC: Corine Land Cover, RPG: the agricultural Land Parcel Information System
“Graphical Parcel Register”, BD Topo: French National Geographic Institute, Randolph: The Randolph
Glacier Inventory):
Artificial Areas
  o Continuous urban fabric (CLC 111)
  o Discontinuous urban fabric (CLC 112)
  o Industrial or commercial units (CLC 121)
  o Road surfaces (BD Topo)
Agricultural Areas
  o Arable lands
      Annual summer crops (RPG)
      Annual winter crops (RPG)
      Intensive grassland (RPG)
  o Perennial crops
      Orchards (RPG)
      Vineyards (RPG)
Forest and Semi-Natural Areas
  o Forests
      Broad-leaved forest (BD Topo)
      Coniferous forest (BD Topo)
  o Shrubs and herbaceous vegetation
      Natural grasslands (CLC 321)
      Woody moorlands (BD Topo)
  o Open spaces with little or no vegetation
      Beaches, dunes and sand plains (CLC 331)
      Bare rock (CLC 332)
      Glaciers and perpetual snow (Randolph)
Water bodies (CLC 523 and BD Topo)
TERUTI data
TERUTI is a statistical area-frame annual survey on land cover and land use covering the French territory
since 1982. The statistical sampling unit is a portion of territory, generally a circular plot 3 meters in
diameter. Since 2017, the TERUTI survey sample has been drawn from a 250 m by 250 m grid (in the
EPSG:3035 coordinate system) comprising around 8.8 million points that cover the whole French metropolitan
territory. Each of these points is classified into one of 11 land cover categories (the strata) on the basis of
a GIS intersection with administrative and topographical geo-databases (see below). Beyond the
geographical characteristics of the point (i.e. its GPS coordinates and the corresponding
NUTS3, NUTS2 and NUTS1 units and municipality), some specific information is added to each point, in
particular the elevation, the distance to the nearest road, the population density within the surrounding 1
km², etc.
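To make the geometry of the sampling frame concrete, the following minimal sketch builds the nodes of a regular 250 m grid inside a bounding box expressed in metres (as in a projected system such as EPSG:3035). The function name and the bounding box are hypothetical and only serve the illustration; this is not the procedure actually used to build the TERUTI master sample.

```python
import numpy as np

def grid_points(xmin, ymin, xmax, ymax, step=250.0):
    """Return the (x, y) nodes of a regular grid covering a bounding box.

    Coordinates are assumed to be metres in a projected system such as
    EPSG:3035; nodes are aligned on multiples of `step`.
    """
    xs = np.arange(np.ceil(xmin / step) * step, xmax, step)
    ys = np.arange(np.ceil(ymin / step) * step, ymax, step)
    xx, yy = np.meshgrid(xs, ys)
    return np.column_stack([xx.ravel(), yy.ravel()])

# Hypothetical 1 km x 1 km box: a 250 m step gives 4 x 4 = 16 nodes per km².
pts = grid_points(3_800_000, 2_700_000, 3_801_000, 2_701_000)
points_per_km2 = len(pts)
```

At 16 points per km², a grid of about 8.8 million points corresponds to roughly 550,000 km², which is the order of magnitude of the metropolitan territory.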
The 11 strata of TERUTI master sample are:
S1-Water areas
S2-Artificial areas (built-up and non-built up)
S3-Agricultural land presently registered in the RPG (French LPIS)
S4-Land parcel previously registered but presently out of the RPG
S51-Heart of forest (> 10 m)
S52-Edge of forest (< 10 m)
S6-Undescribed areas in urban areas with high population density (> 150 hab./km2)
S91-Other natural areas in urban and suburban agglomerations
S92-Other natural areas in sparsely populated suburban or touristic rural areas
S93-Other natural areas in remote rural areas
S100- High elevation or hard-to-reach areas (to be photo-interpreted)
Then the master sample is split into three parts:
strata S1, S2, S3 and S51 are not surveyed in the field; for these 7.3 million points (82 % of the grid),
the land cover and land use information is imputed from the available administrative and
geographical databases. These databases may be incomplete or out of date, so the imputed
information can be updated during field surveys.
strata S4, S52, S6, S91, S92 and S93 are sampled and surveyed in the field; a sample of 200 000
points, out of 1.3 million, is randomly selected by stratum and allocated by department (NUTS3).
The field data collection is carried out on a 3-year cycle: 68 000 points are visited by surveyors
every year of the cycle in order to determine the land cover and land use at a detailed level. In
the next 3-year cycle, the same points will be revisited in order to provide a good estimate
of the rate of change in land cover and land use.
stratum S100 is sampled and photo-interpreted; a sample of 45 000 points, out of 0.2 million, is
randomly selected and allocated by department (NUTS3). Photo-interpretation is also carried
out on a 3-year cycle with an annual sample of 5 000 points.
The questionnaires used by surveyors to collect data in the field are the following:
Land cover (on a 3 m diameter circle)
Artificial land
C111-Building with one to three floors
C112-Building with more than three floors
C113-Artificial non built-up impervious (coated) area
C121-Artificial non built-up pervious (stabilized, compacted) area
C122-Heterogeneous and artificial coverage area
Bare land
C211-Rock, cliff
C212-Sand, stones
C213-Other bare soil
Water land
C221-Water area
C222-Glaciers, permanent snow
Cropland
C311-Annual crop
C312-Fruit and vegetables (excl. Fruit tree)
C313-Fruit tree and small fruit
C314-Vine
C315-Other permanent crop: ornamental, aromatic, ...
Grassland
C411-Temporary or artificial meadow
C412-Natural or permanent pasture
C413-Fallow
C414-Agricultural grass strip
C415-Other grassland (with no agricultural use)
Woodland
C510-Woodland
Shrubland
C521-Shrubland
C522-Low shrub hedge (linear or organized)
C523-Other woodland
Environment (features) of the point
Artificial land
M110-Artificial linear feature
M120-Artificial area feature
Water land
M211-Inland water body
M212-Inland running water
M221-Intertidal area
M222-Salines
M223-Coastal water body
Agricultural land
M310-Greenhouse
M320-Field surrounded by a hedge
M330-Open field (without border)
M340-Agro-forestry
Woodland
M410-Vegetation with sparse tree cover
M420-Woody hedge and other line of trees
M430-Grove
M441-Open forest
M442-Closed forest
Land use
Primary sector
U11-Agriculture
U13-Fishing
U14-Forestry
U15-Mining, quarrying and other primary production
Secondary sector
U20-Industry, manufacturing, energy production and transport
Tertiary sector and residential
U31-Nature preservation, recreation, leisure, sport
U32-Commercial, service and other tertiary activities
U33-Transport, communication networks, storage, protective works
U34-Residential
U91-Unused or abandoned area
U99-Construction site or unknown use
The final statistical estimates are based on the weights derived from the master sample (imputation),
the observations collected in the field (field sample) and photo-interpretation.
TERUTI statistical process involves massive data imputation (for about 7.3 million points) from
administrative and geographical databases.
Administrative database:
the Graphical Parcel Register (RPG) derived from the Integrated Administration and Control
System (IACS) of the Common Agricultural Policy (CAP). The RPG is a geographic information
system for the identification of agricultural parcels. It consists of about 9,000,000 graphic
objects (parcels) covering metropolitan and overseas France. It compiles data from the
agricultural area declarations made by farmers to receive aid under the Common Agricultural
Policy, in particular the identification of farms and agricultural parcels with the type of crop reported
for each parcel: grains, oilseeds, protein crops, vegetables, fruits, grasslands, etc. Introduced in
France in 2002, the RPG is updated annually.
the RPG is administered by the Service and Payment Agency (ASP), which is the only service
authorised to distribute the full RPG to applicants. The ASP is a French public institution
created in 2009 whose mission is to contribute to the implementation of public policies, both
national and European. It pays almost all European aid to farmers who are recipients of the
Common Agricultural Policy (CAP).
Geographical databases:
the BD TOPO® from the National Geographical Institute (IGN) is a 3D vector description
(structured in objects) of the elements of the territory and its infrastructures, with metric
precision. It covers all the geographical and administrative entities of the national territory. The
BD TOPO® objects are grouped by theme: road network, rail network, energy transport
network, river system, buildings, wooded vegetation, etc. The 3D production process provides
the altimetry of the objects, as well as the height of the buildings. Many themes are
continuously updated; for instance: road network delay < 6 months, railway network delay = 1
year, energy transport system, buildings and river system update follows the cycle of aerial
photography update (3-4 years). The BD TOPO® is produced twice a year (semi-annual edition).
It covers all French departments (NUTS3) including overseas departments as well as the
overseas collectivities in Saint-Pierre-et-Miquelon, Saint-Barthélemy and Saint-Martin.
the BD FORET® is a reference vector database for forest space and semi-natural environments
(woodland, shrubland, grassland). It is the geographical reference framework for the
description of forest species. The objects in the BD FORET® are defined by an area greater than
or equal to 5,000 m² (50 ares), according to the following thresholds: excluding areas where
the use of land is exclusively agricultural, width of at least 20 metres, vegetation cover rate of
10% or more. It is produced by photo-interpretation of colour infrared aerial images covering
all the departments (NUTS3) of the metropolitan territory. The outer limits of forest surfaces
are obtained by automatic segmentation of the image and, therefore, are based on the outer
limit of the treetops. The positioning accuracy of the external contours of forest surfaces is
better than 10 meters. BD FORET® version 2 has been available throughout the metropolitan
territory since December 2018.
Access to all IGN data is free for government services and their public administrative
establishments, as well as for education and research activities. The private sector can access
IGN data by purchasing user licenses.
6.1.2. Stage 1
Test site definition
The comparison between «in situ» land cover data from TERUTI and remote-sensing land cover
data from OSO will be carried out over the whole of the French metropolitan territory.
Data collection
Sentinel-2 images
Level 2A products can be downloaded free of charge from the Theia thematic cluster platform
(http://www.theia-land.fr/en/produits/reflectance-sentinelle-2). For the OSO process, they are stored
in the CNES high performance computing cluster.
In-situ land cover data collection in TERUTI survey
Each year, 68,000 points of the TERUTI sample are observed in the field by about 600 surveyors
recruited and trained by the regional statistical services of the Ministry of Agriculture. These points are
optimally allocated by stratum and by French NUTS3 «department» in order to maximize the accuracy of
the statistical estimators (such as the artificialization rate) at the NUTS3 geographical scale. Each surveyor has
to collect land cover and land use information on about 100 TERUTI points between June and
September, and should be able to collect about 10 points per day.
Each surveyor has a paper questionnaire on which to record land cover and land use at the exact location of the
TERUTI point. The conditions of access and observation of the point (distance, visibility, environment)
must also be noted by the surveyor. Getting to the exact location of the plot is very important
for the quality of the survey results. For this, the surveyor has the GPS coordinates of the point
to be surveyed, an aerial photograph at 1/25000 scale and a recent map at 1/25000 and 1/10000 scale.
A SPOT satellite image from the previous year is also provided to the surveyor in case the aerial photo is
too old. Furthermore, surveyors can use a KML file of the points to observe in order to
automatically geolocate the plot with a GPS device or a smartphone.
Once the points have been visited and the questionnaires have been filled in, the surveyors have to
enter the collected data in a computer application for data entry and control. Once data entry is
validated by the application, the surveyor can upload the data to a centralized computer server. Thus,
the collection phase is continuously monitored and supervised by the regional statistical services and
by the central statistical service. A collection assessment is carried out by each regional statistical service
at the end of the operation.
6.1.3. Stage 2
Data pre-processing
Teruti-OSO comparison
An initial comparison between the TERUTI database and the OSO product was carried out in
order to identify the differences in area by land cover class between the two methods and thus highlight
the classes that differ most. The comparison is based on the 2017 products, specifically on
the 13 departments (NUTS3) of the Occitanie region (NUTS2) in the south of France. To do this,
it was necessary to harmonize the two classifications by grouping classes until a nomenclature of 7
classes was obtained (Table 6.1).
Table 6.1 Harmonization of the TERUTI and OSO classifications 2017.
TERUTI classes:
Artificial impervious land
Artificial permeable land
Cropland
Agricultural permanent and temporary grassland
Non-agricultural grassland
Shrubland
Forest
Other wooded land
Bareland
Water areas
OSO classes:
Continuous urban fabric
Discontinuous urban fabric
Industrial and commercial units
Road surfaces
Annual summer crops
Annual winter crops
Orchards
Vineyards
Intensive grasslands
Natural grasslands
Broad-leaved forests
Coniferous forests
Woody moorlands
Beaches, dunes and sand
Bare rocks
Water bodies
Glaciers and perpetual snow
Harmonized TERUTI-OSO classes:
Artificial land
Cropland
Agricultural grassland (e.g. meadows, pastures)
Non-agricultural grassland (e.g. lawns)
Shrubland
Woodland
Other natural land (bareland and water areas)
Once the nomenclature of comparison was defined, the calculation of the areas by new classes by
department was carried out for each product. In the case of the Teruti database, information on areas
in ha by department is provided. As for the OSO map, pixel counts on the raster were performed and
the area per department was calculated (1 pixel = 0.01 ha).
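The pixel-count step for the OSO raster can be sketched as follows. This is a minimal illustration, assuming the classified raster for one department is already loaded as a NumPy array of integer class codes; the class codes, the nodata value and the function name are hypothetical.

```python
import numpy as np

PIXEL_AREA_HA = 0.01  # one 10 m x 10 m pixel = 100 m² = 0.01 ha

def class_areas_ha(class_raster, nodata=0):
    """Count pixels per class code and convert the counts to hectares."""
    values, counts = np.unique(class_raster, return_counts=True)
    return {int(v): float(c) * PIXEL_AREA_HA
            for v, c in zip(values, counts) if v != nodata}

# Toy raster with hypothetical class codes (1 = artificial, 2 = cropland,
# 0 = outside the department).
toy = np.array([[1, 1, 2],
                [2, 2, 2],
                [0, 2, 1]])
areas = class_areas_ha(toy)
```

In practice the same count would be restricted to the pixels falling inside each department boundary before conversion to hectares.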
The results presented in Figure 6.3 show that there is a small to medium gap in artificial land areas in
Occitanie between TERUTI and OSO. It is around +2% in the OSO estimate for the whole NUTS2 region
(+135,000 ha), and larger in the urbanized departments. This is due to the detection of the urban fabric
by OSO, which is tied to the spatial resolution of the Sentinel-2 images (10 m), whereas the TERUTI observation is
based on a 3 m diameter circle. The average difference is small for cultivated areas in Occitanie (+1.2%
or +91,000 ha), with differences that vary by department (+7.5% in 82-Tarn-et-Garonne
and -1.8% in 12-Aveyron). The difference in shrubland areas of -4% for OSO (-275,000 ha) is probably
explained by a difference in the semantic precision of the classes of the two methods, as well as by
confusion with other classes (meadow, lawn, other wooded soils) during remote-sensing classification
(OSO) or field identification (TERUTI). For the woodland areas, the difference is about -8% for OSO
(-580,000 ha). It may be due to the use of national forest inventory data in the TERUTI process. Concerning
agricultural grassland, the difference of -7% for OSO (-500,000 ha) may also be due to very frequent
confusion with other classes of grassland (heaths, lawns) by the remote-sensing classification. It is also
possible that the administrative source used by TERUTI (RPG, CAP declaration) has a reporting bias.
Finally, the largest gap between TERUTI and OSO occurs for non-agricultural grassland (lawns): +16% for
OSO (+1,150,000 ha), which partially compensates for the differences in grasslands and shrublands.
On the other hand, if the TERUTI-OSO comparison were made with another class grouping and with more
accurate class definitions, the gap would be smaller. It should therefore be recalled that each of these two
products is based on completely different methods and spatial resolutions, which may be complementary
for identifying a change in land cover between two different dates.
Figure 6.3 Land cover by NUTS3 department in Occitanie.
TERUTI-oriented classification (work in progress)
The previous results suggest experimenting with an automatic classification of Sentinel-2 images to
automatically assign land cover information to TERUTI points. It is therefore necessary to use iota² to
establish a land cover classification based on the following nomenclature:
Artificial impervious land
Artificial permeable land
Bareland
Water areas
Cropland
Agricultural permanent and temporary grassland
Non-agricultural grassland
Forest
Shrubland
The assignment of TERUTI points takes place in the spring of the reference year (automatic imputation
and field campaign). It is therefore proposed to classify the Sentinel-2 images from the previous year
until March of the current year (from March n-1 to April n). Two different scenarios of classification are
planned:
classification model calibrated from TERUTI points (one point corresponding to one Sentinel-2
pixel),
classification model calibrated from filtered OSO training dataset (Inglada et al. 2017). The
filtering of OSO training polygons will be done using TERUTI points.
This experimental phase will be conducted in a test region, namely the Occitanie region (NUTS2) in the
south of France.
At the end of this phase, a new contingency analysis will be carried out between the new
classifications obtained and the TERUTI points.
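A contingency analysis of this kind amounts to cross-tabulating, point by point, the TERUTI label against the class predicted at the same location. The sketch below is illustrative only: it assumes both label sets have already been recoded to integers 0..n-1 in a common nomenclature, and the six labels used are invented for the example.

```python
import numpy as np

def contingency_table(reference, predicted, n_classes):
    """Cross-tabulate reference labels (rows) against predicted labels (columns)."""
    table = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when an index pair is repeated.
    np.add.at(table, (reference, predicted), 1)
    return table

# Hypothetical labels for six points, coded 0..2.
teruti = np.array([0, 0, 1, 1, 2, 2])
classified = np.array([0, 1, 1, 1, 2, 0])

table = contingency_table(teruti, classified, n_classes=3)
overall_agreement = np.trace(table) / table.sum()  # share of matching points
```

Off-diagonal cells show which pairs of classes are most often confused, which is the information needed to compare the two calibration scenarios.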
6.1.4. Stage 3
Main data processing
The main part of case study 7 has not yet begun. It will consist of a deeper analysis of the differences
between TERUTI and OSO products at the individual level of TERUTI points and OSO pixels. Then some
change detection methods will be tested but, depending on their implementation, their
performance and more broadly their relevance, some may ultimately not be retained.
As described above, the contingency between the OSO product and the TERUTI points is highly variable
depending on the land use/cover classes. The discrepancies have several causes:
automatic imputation from databases that are not always up to date; for example, an IGN forest
database often describes a forest cover that existed a few years earlier.
classification error due to the spatial resolution of the Sentinel-2 image, which integrates several
spectral responses, whereas the surveyor observes within a radius of 3 m (except for vegetation).
However, higher accuracy is observed for high-level classes (which ones?) at a higher scale (zonal
statistics / indicator scale). The land cover/use changes identified from these high-level classes could
allow inter-annual changes to be spatialized with greater thematic and spatial accuracy.
It is therefore important to understand at what spatial scale and with what thematic precision remote
sensing from time series of Sentinel-2 images is relevant for TERUTI issues.
The methodological proposal to be conducted is based on the experimentation of different methods of
detecting land cover/use changes between 2017 and 2019 on the field-based TERUTI points (~70 000
points):
Spectral/Index method
Analyse automatically detected changes and qualify them by photo-interpretation
o urbanization / artificialization
o deforestation (clear cut)
o intensification
o abandonment
Post-classification detection (around TERUTI points)
o classification with high-level classes on 2 successive years (2016 – 2018)
direct changes
classes and changes probabilities
Changes classification
Validation with the 2020 TERUTI survey in summer 2020 (changes from 2017 to 2020).
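Of the approaches listed above, post-classification detection of direct changes is the simplest to illustrate: the classes assigned to the same points at two dates are compared, and each disagreement is recorded as a transition. The following is a minimal sketch under the assumption that both classifications use the same integer class codes; the codes and values are hypothetical.

```python
import numpy as np

def direct_changes(classes_t1, classes_t2):
    """Return a boolean change mask and the list of (from, to) transitions."""
    changed = classes_t1 != classes_t2
    transitions = list(zip(classes_t1[changed].tolist(),
                           classes_t2[changed].tolist()))
    return changed, transitions

# Hypothetical codes: 1 = forest, 2 = cropland, 3 = artificial land.
y1 = np.array([1, 1, 2, 2, 3])  # classes at the first date
y2 = np.array([1, 2, 2, 3, 3])  # classes at the second date
changed, transitions = direct_changes(y1, y2)
```

Because classification errors at either date show up as spurious transitions, the detected changes would still need to be qualified, for example by photo-interpretation or by using class probabilities.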
6.2. Case study 8 - Land cover maps at very detailed scale
6.2.1. Pre-works
State of the Art
Timely and frequently updated Land Cover (LC) information is of paramount importance to modern
National Statistical Institutes (NSI). Since most of – if not all – the facts and events surveyed by NSIs take
place somewhere in the national territory, LC information is structurally complementary to survey and
administrative data. High quality LC data and statistics can lead to a wider and deeper understanding of
many phenomena of interest.
As far as Europe is concerned, two flagship LC projects exist: CORINE (Bossard, Feranec, and Otahel
2000; Büttner 2014), currently run by the Copernicus Program, and LUCAS (Bettio et al. 2002; EUROSTAT
2003), managed by Eurostat. Although these projects address the study of land cover very differently –
CORINE in a cartography (i.e. full-coverage) perspective, LUCAS in a statistical estimation (i.e. sample
survey) perspective – they suffer common shortcomings. Both are very costly, have very complex
production pipelines, rely heavily on clerical work, and produce their outputs with a rather low time
frequency. Most of the shortcomings affecting CORINE and LUCAS depend on the huge amount of
human workload they require. It is, therefore, very tempting to try to overcome these shortcomings
through process automation. Given an input satellite image depicting a portion of territory, a fully
automatic system should ideally be able to (i) classify the territory according to some standard LC
taxonomy, and to (ii) quantify the area (or the proportion) of territory covered by each LC class, without
any human intervention.
The Italian National Institute of Statistics (Istat) is currently investigating whether Deep Learning
(Goodfellow, Bengio, and Courville 2016) methods could be used to derive automated Land Cover
estimates of satisfactory quality from Sentinel-2 satellite images. A prototype software system is being
developed within the scope of this research. The present case study about “land cover maps” focuses
on a very relevant, though quite specific, output artefact of the system.
Methodology
Istat's research goal is to design and develop an automatic LC estimation system. Such a system should
be able to take as input a satellite image depicting a portion of territory, and to return as output a table
of LC statistics.
Although LC estimation is a quantification problem rather than a classification one, we decided to
implement our system according to a ‘classify-and-count’ design. The main driver of this design choice
was to incorporate into our system a Convolutional Neural Network (CNN)1, so as to take advantage of
its tremendous performance in image classification tasks. Without going into technical details, our
classify-and-count design can be summarized as follows:
0. Train a CNN to predict the LC class of a satellite image ‘tile’ (i.e. a small, fixed-size sub-image).
1. Divide the satellite images covering a ‘target area’ (i.e. the territory for which LC statistics have
to be computed) into tiles.
2. Use the trained CNN to predict the LC class of all the generated tiles.
3. Obtain LC statistics for the target area by simply computing the relative frequencies of predicted
LC classes.
It ought to be clear that phases (1), (2), (3) have to be repeated each time LC statistics are requested for
a new target area, whereas the CNN’s training phase is carried out only once (whence the (0) index in
the list).
1 CNNs (LeCun et al. 1989; LeCun and Bengio 1995) are cutting edge Deep Learning architectures that have recently reached superhuman accuracy in many Computer Vision tasks and whose topology was originally inspired by the organization of the visual cortex of mammals.
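The counting part of a classify-and-count design can be sketched in a few lines. In the illustration below the trained CNN is replaced by a hypothetical placeholder function (toy_classifier) and tiles are reduced to short lists of numbers, so only the logic of phases (2) and (3) is shown.

```python
from collections import Counter

def classify_and_count(tiles, classify):
    """Predict a class for every tile and return relative class frequencies."""
    labels = [classify(tile) for tile in tiles]
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}

def toy_classifier(tile):
    """Placeholder for the trained CNN: labels a tile from its mean value."""
    return "Sea & Lake" if sum(tile) / len(tile) < 0.5 else "Forest"

# Four hypothetical tiles, each reduced to two pixel values.
tiles = [[0.1, 0.2], [0.9, 0.8], [0.7, 0.9], [0.6, 0.8]]
shares = classify_and_count(tiles, toy_classifier)
```

Any classifier exposing the same one-tile-in, one-label-out interface could be substituted without changing the counting step.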
6.2.2. Stage 1
Test site & data collection
We tested our automatic LC estimation system on two sample satellite images. Both test images have
been cropped from Sentinel-2 products downloaded from Copernicus Open Access Hub. These products
are TCI (True Color Image) objects (Ledley, Buas, and Golab 1990) encoded in JPEG2000 format
(Christopoulos 2000). They represent two quite different Italian territories:
The first territory is a portion of Apulia that includes the city of Lecce. The corresponding image
crop is shown in Figure 6.4. For conciseness, we will call this crop ‘Lecce image’. The Lecce image
has a size of 2,496 x 3,008 pixels, therefore depicting a surface area of approximately 751 km2.
Note that this is just 3.8% of Apulia’s overall surface.
Figure 6.4 The ‘Lecce image’. This test image has been cropped from a Sentinel-2 TCI object taken on June 26th,
2016. The area of the depicted territory is about 751 km2, i.e. about 3.8% of Apulia.
The second territory is a portion of Tuscany that includes (part of) the city of Pisa. The
corresponding image crop is shown in Figure 6.5. For conciseness, we will call this crop
‘Pisa image’. The Pisa image has a size of 3,008 x 1,472 pixels, therefore depicting a surface area
of approximately 443 km2. Note that this is just 1.9% of Tuscany’s overall surface.
Figure 6.5 The ‘Pisa image’. This test image has been cropped from a Sentinel-2 TCI object taken on March 25th,
2019. The area of the depicted territory is about 443 km2, i.e. about 1.9% of Tuscany.
6.2.3. Stage 2
Pre-processing
We decided to adopt the EuroSAT dataset (Helber et al. 2019) as training set for our CNN. EuroSAT
contains 27,000 manually labelled image patches of size 64 x 64 pixels. These patches have been
cropped from carefully selected Sentinel-2 satellite images covering 34 European countries. EuroSAT
images are multispectral (all 13 Sentinel-2 bands are provided) but we have so far restricted our interest
to Red, Green and Blue bands only (i.e. to RGB color images). Since the resolution of Sentinel-2 images
in the R, G and B bands is 10 meters per pixel, each 64 x 64 EuroSAT patch represents a ground area of
640 x 640 square meters, i.e. about 41 hectares. The LC classification according to which EuroSAT patches
have been manually labelled entails 10 classes: 1) ‘Annual Crop’, 2) ‘Forest’, 3) ‘Herbaceous Vegetation’,
4) ‘Highway’, 5) ‘Industrial’, 6) ‘Pasture’, 7) ‘Permanent Crop’, 8) ‘Residential’, 9) ‘River’, 10) ‘Sea &
Lake’. EuroSAT authors have defined this LC taxonomy following the principle that the patterns of each
class should be visible at the resolution of 10 meters per pixel. The dataset is roughly balanced with
respect to the 10 classes, as class cardinalities range from 2,000 to 3,000 patches.
Figure 6.6 below reports the labels of the 10 classes of EuroSAT’s land cover classification, along with a
convenience color palette that we will use in later sections for visualization purposes.
Figure 6.6 Labels of EuroSAT’s LC classification.
6.2.4. Stage 3
CNN Model Training and Accuracy
To implement the classification engine of the system, we are currently using a cutting-edge, highly
sophisticated CNN model named Inception-V3 (Szegedy et al. 2016), which we customized and trained
on the EuroSAT dataset. As far as the training stage is concerned, we randomly split the EuroSAT data
into training set and test set according to a 75/25 proportion. The generated training set and the test
set contain 20,250 and 6,750 image patches, respectively. Figure 6.7 below reports the Confusion Matrix
obtained contrasting the LC classes predicted by our best model and the true LC labels of the 6,750
image patches belonging to the test set.
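A random 75/25 split of this kind can be reproduced on patch indices as follows. This is a minimal sketch: the random seed and the use of NumPy are assumptions made for the illustration and do not describe the tooling actually used.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for reproducibility
n_patches = 27_000                   # size of the EuroSAT dataset

order = rng.permutation(n_patches)   # random ordering of patch indices
n_train = int(0.75 * n_patches)      # 75 % of the patches go to training
train_idx, test_idx = order[:n_train], order[n_train:]
```

The two index arrays are disjoint by construction and can be used to select patches and labels for training and evaluation.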
Figure 6.7 The Confusion Matrix obtained contrasting the LC classes predicted by our Inception-V3 model and the true LC labels of the 6,750 image patches belonging to the test set. The trace of the matrix gives the overall
number of exact predictions (i.e. 6,644), which implies an accuracy of 6,644/6,750, i.e. 98.43%.
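The relation between the confusion matrix and the overall accuracy can be checked with a short sketch. The 2 x 2 matrix below is a toy example, not the matrix of Figure 6.7; only the totals 6,644 and 6,750 come from the text.

```python
import numpy as np

def overall_accuracy(confusion):
    """Overall accuracy = trace (correct predictions) / total number of cases."""
    confusion = np.asarray(confusion)
    return np.trace(confusion) / confusion.sum()

# Toy confusion matrix (rows: true class, columns: predicted class).
toy_accuracy = overall_accuracy([[8, 2],
                                 [1, 9]])

# Figures quoted in the caption: 6,644 exact predictions out of 6,750 patches.
reported_accuracy_pct = round(100 * 6_644 / 6_750, 2)
```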
Land Cover Estimation Algorithm
Once the CNN has been trained on the EuroSAT dataset, our automatic LC estimation system can be fed
with a satellite image and return LC statistics for the corresponding territory. To do so, a classify-and-
count algorithm is used, whose main logical steps can be summarized as follows:
i. The input Sentinel-2 image is split into a set of (possibly overlapping) tiles of size 64 x 64 pixels.
These tiles are generated by cropping the input image along a regular spatial grid, through a
‘sliding window’ algorithm.
ii. The trained CNN classifies one tile at a time and logically links the predicted LC class to the
corresponding area of the original image. The output of the whole process is a ‘classification
matrix’: each element of this matrix corresponds to a tile of the original image and stores its
predicted LC class.
iii. The area share of each LC class for the whole territory depicted in the input satellite image is
estimated by the relative frequency of the corresponding label within the classification matrix.
iv. A moderate resolution land cover map of the territory depicted in the input satellite image is
obtained by rendering the classification matrix as a raster image.
The working mechanism of the sliding window algorithm mentioned in (i) is schematically illustrated in
Figure 6.8. Basically, a window of 64 x 64 pixels slides horizontally and vertically over the input image
with a stride (i.e. step length) of s pixels, starting from its upper-left corner. For each step of the window,
one tile is generated by cropping the area of the input image that is framed by the window. This way,
the algorithm actually produces a systematic spatial sample of tiles drawn from the input image. Note
that, since each generated tile corresponds to a specific area of the input image, the output sample has
an intrinsic geometrical structure. More specifically, the generated tiles are naturally arranged
according to a regular spatial grid (see Figure 6.8).
Figure 6.8 Illustration of the sliding window algorithm. A convenience image of size 2D x 2D is split into tiles of size D x D. In the upper panel, the window slides horizontally and vertically with a stride of length D, giving rise to
4 non-overlapping tiles arranged according to a 2 x 2 grid. In the lower panel, the stride is reduced to D/2: this generates 9 partially overlapping tiles arranged along a 3 x 3 grid. Note that reducing the stride from D to D/2
makes it possible to resolve more image details: for instance the red ‘A’, which in the upper panel was not framed in any tile of the grid, now pops up in the central tile of the lower panel grid.
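The sliding window mechanism can be sketched as follows. This is an illustrative implementation, assuming the image is held as a NumPy array and that windows which would extend beyond the image border are simply not generated; the function name is hypothetical.

```python
import numpy as np

def sliding_window_tiles(image, size=64, stride=64):
    """Crop size x size tiles along a regular grid, starting from the upper-left.

    Returns an array of shape (rows, cols, size, size, ...) so that tile
    [i, j] keeps its position in the spatial grid.
    """
    height, width = image.shape[:2]
    tops = range(0, height - size + 1, stride)
    lefts = range(0, width - size + 1, stride)
    return np.array([[image[t:t + size, l:l + size] for l in lefts]
                     for t in tops])

# The 2D x 2D example of Figure 6.8, with D = 64 and an empty RGB image.
D = 64
img = np.zeros((2 * D, 2 * D, 3), dtype=np.uint8)
grid_full = sliding_window_tiles(img, size=D, stride=D).shape[:2]       # stride D
grid_half = sliding_window_tiles(img, size=D, stride=D // 2).shape[:2]  # stride D/2
```

Along each axis the number of tiles is (L - size) // stride + 1, where L is the image length in pixels, so halving the stride roughly doubles the grid dimension in each direction.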
In (ii) the trained CNN is used to predict the LC class of all the tiles generated in (i). Note, incidentally,
that different tiles can be processed independently, allowing our system to take advantage of
high-performance parallel computing architectures (GPUs).
In (iii) the system calculates output LC statistics from the classification matrix. In accordance with our
classify-and-count approach, this is accomplished by simply computing class frequencies. If c denotes
a generic LC class, f_c the proportion of class c within the classification matrix, W the
width in pixels of the input image and H its height, then the corresponding area and area share are
estimated by:
\begin{cases}
\mathrm{Area}_c = (f_c \cdot W \cdot H) \cdot 100\ \mathrm{m}^2 \\
\mathrm{AreaShare}_c = f_c
\end{cases} \qquad (6.1)
Note that in the upper equation of (6.1) we took into account that the resolution of the satellite images
processed by our system is 10 meters per pixel.
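As a worked illustration of the area estimate, the sketch below applies the formula to the pixel dimensions of the Lecce image with a purely hypothetical class proportion of 0.25; the function name is illustrative.

```python
PIXEL_AREA_M2 = 100  # one pixel covers 10 m x 10 m

def area_estimates(f_c, width_px, height_px):
    """Return (area in m², area share) for a class with proportion f_c."""
    area_m2 = f_c * width_px * height_px * PIXEL_AREA_M2
    return area_m2, f_c

# Hypothetical proportion f_c = 0.25 on a 2,496 x 3,008 pixel image.
area_m2, share = area_estimates(0.25, 2496, 3008)
area_km2 = area_m2 / 1e6  # about 187.7 km², a quarter of the ~751 km² image
```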
While the LC statistics calculated in (iii) have to be regarded as the main output of our system, a further
interesting artefact can be distilled, as a by-product, from the classification matrix. Indeed, as mentioned
in (iv), a moderate resolution land cover map can be produced by simply rendering the classification
matrix as a raster image. It is worth stressing that this is only possible because of the geometric structure
of the systematic spatial sample of tiles generated by the sliding window algorithm. Clearly, the smaller
the stride, the larger will be the dimension of the classification matrix and, therefore, the resolution of
the obtained land cover map.
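Rendering the classification matrix as a raster image is essentially a palette lookup. The sketch below assumes classes are stored as integer indices; the three colors are hypothetical and do not reproduce the palette of Figure 6.6.

```python
import numpy as np

def render_map(class_matrix, palette):
    """Turn a matrix of class indices into an RGB raster via palette lookup."""
    palette = np.asarray(palette, dtype=np.uint8)
    return palette[np.asarray(class_matrix)]

# Hypothetical palette: index 0 = green, 1 = blue, 2 = brown.
palette = [(0, 128, 0), (0, 0, 255), (165, 42, 42)]
class_matrix = [[0, 1],
                [2, 0]]
rgb = render_map(class_matrix, palette)  # shape (rows, cols, 3)
```

Each element of the classification matrix becomes one pixel of the map, which is why a smaller stride yields a higher-resolution map.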
Automated Land Cover Maps
We briefly analyse here the automated LC maps that our system generated from the Lecce and Pisa
images as by-products of LC estimation. Recall that our system produces LC maps by simply rendering
as a raster image the classification matrix computed for LC estimation. The success of this approach
entirely rests on the inherent spatial structure of the sample of tiles determined by the sliding window
algorithm (Section Land Cover Estimation Algorithm). Since both the LC estimates and the LC maps
produced by our system improve as the stride of the sliding window decreases, we provide here results
obtained by setting the stride to its minimum value of 1 pixel. This setting generated (39 x 47) = 1,833
tiles for the Lecce image, and (47 x 23) = 1,081 tiles for the Pisa image.
Figure 6.9 shows the automated LC map obtained for the territory depicted in the Lecce image. The
adopted color legend is provided in Figure 6.6. Overall, the map exhibits a high degree of spatial
consistency, in that the main structures (urban centres, industrial areas, highways, crops and
vegetation) have been correctly detected and nicely reconstructed.
For instance, focusing on residential areas (brown pixels on the map) and comparing visually the map
with the original image, one can observe that the sizes, the shapes and the relative positions of cities
are all described fairly well by the map.
Figure 6.9 Automated LC map (right panel) of the territory depicted in the Lecce image (left panel). The color legend of the LC map is provided in Figure 6.6.
Figure 6.10 shows the automated LC map obtained for the territory depicted in the Pisa image. The
color legend is again provided in Figure 6.6. The detected LC classes exhibit a much more complex
topology in this map than in the previous one (Figure 6.9). This comes as no surprise, since
the Pisa image reveals at first sight a wider variety of structures that are arranged on the ground in a
much more intricate way. Nevertheless, most of these structures seem nicely reconstructed in the map
of Figure 6.10.
Figure 6.10 Automated LC map (right panel) of the territory depicted in the Pisa image (left panel). The color legend is provided in Figure 6.6.
7. Report on thematic task 4 - Settlements, Enumeration Areas and Forestry
7.1. Case study 9 - Update the INSPIRE Theme Statistical Units dataset and preventing forest
fire
7.1.1. Pre-works
State of the art
The use of cartography has supported census data collection at Statistics Portugal since 1981. In 1995,
Statistics Portugal started the preparation of the 2001 census cartography, which was named
“Geographic Information Referencing Base” (BGRI 2001) and was based on Geographic Information
Systems. Since 2006, with the production of the BGRI 2011 to support the 2011 census, Statistics
Portugal has been developing a Spatial Data Infrastructure (SDI) and carrying out other statistical
activities in a permanent effort to introduce the spatial perspective across the different phases of
statistical production.
To support the 2021 Census, Statistics Portugal is carrying out internal work to create an updated
version of the small statistical units.
The SDI is currently used across Statistics Portugal's activities, promoting the
integration of the spatial component in the statistical production process in order to achieve efficiency
and accuracy in several domains, such as the sampling process, data collection and the
dissemination of statistical information.
Statistics Portugal has only limited experience using Copernicus data.
This context is the framework that supports the actions of Portugal in the present case study.
Literature review
The literature review covered general literature on the use of remote sensing images as well as
literature specific to our task, namely machine learning and deep learning theory (Al-Obeidat et al.
2015; Carfagna and Gallego 2006; Chasmer et al. 2014; Crammer and Singer 2002; Ghosh, Mishra, and
Ghosh 2011; Han and Liu 2015; Hansen et al. 2000; Hawkins 2004; Huang and Zhang 2013; Jackson and
Landgrebe 2002; Lawrence, Giles, and Tsoi 1997; Y. Li et al. 2014; Piotrowski and Napiorkowski 2013;
Santaella 2019; Sun et al. 2019).
Tools/Data
CLC 2018: https://land.copernicus.eu/pan-european/corine-land-cover/clc2018
COS 2015, national land cover map: http://www.dgterritorio.pt/cartografia_e_geodesia/cartografia/cartografia_tematica/cartografia_de_uso_e_ocupacao_do_solo__cos_clc_e_copernicus_/
DIAS (Data and Information Access Services): https://www.copernicus.eu/en/access-data/dias
ESA Science Toolbox Exploitation Platform: http://step.esa.int/main/
Getting started with Google Earth Engine: https://developers.google.com/earth-engine/getstarted
GHSL - Global Human Settlement Layer, https://ghsl.jrc.ec.europa.eu/
SDG Monitoring and Reporting Toolkit for UN Country Teams. Monitoring and Reporting the
SDGs | LAND CONSUMPTION: https://unhabitat.org/wp-content/uploads/2019/02/Indicator-11.3.1-Training-Module_Land-Consumption_Jan-2019.pdf
The Urban Mapper / Trends Earth (http://trends.earth/docs/en/): Open source tool based on
Google Earth Engine for tracking land use change.
Wekeo Platform: https://www.wekeo.eu/
Zonal statistics function, as described in the Python GDAL/OGR Cookbook: https://pcjericks.github.io/py-gdalogr-cookbook/raster_layers.html#calculate-zonal-statistics
Statistical product definition
The output of this case study will be an updated version of the geography of the Settlements and
Enumeration Areas for Census 2021 and an analysis of the feasibility of producing an indicator of the
total eucalyptus plantation area.
Data source & toolkit
We investigated the cloud-based data processing services listed on the DIAS website. We registered as a
beta tester for one of these cloud-based platforms, WEkEO; however, no tests have yet been run on it. The
advantages of this platform relative to Google Earth Engine remain to be evaluated.
We also registered as a user of Google Earth Engine and ran some small tests with the online JavaScript
samples. This platform looks promising, since it is already used for several applications, for example the
“Urban Mapper / Trends Earth” tools.
The SNAP Desktop software was also tested using local data.
7.1.2. Stage 1
Test site definition
For the first action, updating the INSPIRE Theme Statistical Units dataset, we intend to cover the whole
mainland territory of Portugal. For the second action, the investigated area will be a NUTS III area.
Data collection
For the first action data from the following sources were used:
National reference thematic map for Land Cover Land Use in Portugal (2015)
CORINE Land Cover / CLC 2018
GHSL - Global Human Settlement Layer 2014
7.1.3. Stage 2
Data pre-processing
For the update of the INSPIRE Theme Statistical Units dataset the aim is to obtain the following results:
Potential urban areas within residual subsections
Degree of imperviousness of the 2011 enumeration areas using GHSL data
Treatment for COS2015 and CLC18 data
Download data
Selection of urban areas
Pre-processing of GHSL data
Download of the GHS_BUILT datasets at 38 m and 250 m resolution
Clipping of the images to the territory of continental Portugal
For the analysis of forest and eucalyptus plantations, the aim is to study a methodology for
georeferencing eucalyptus areas in mainland Portugal by municipality, using Artificial Intelligence
algorithms, in particular Machine Learning and Deep Learning, on Copernicus satellite images, namely
Sentinel-2.
To train and test the models, data from the COS will be used, together with some field work to validate
the eucalyptus areas. COS is a land use and land cover map of polygons representing homogeneous land
use / occupation units, with a minimum mapping unit of 1 hectare and a minimum distance between lines
of 20 metres. It has been published for the reference years 1995, 2007, 2010 and 2015; the 2015 version
will be used.
In ArcGIS the algorithms Maximum Likelihood, Random Trees, Support Vector Machine and Forest
Based Classification and regression are already implemented and can be used to extract eucalyptus
areas from earth observation images.
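To illustrate the idea behind the first of these algorithms, the sketch below implements a simplified Gaussian maximum likelihood classifier in plain Python. It is not the ArcGIS implementation: it assumes equal class priors and independent bands (a diagonal covariance), and the class labels and two-band reflectance values are toy numbers invented for the example.

```python
import math
from collections import defaultdict


def fit_gaussian_classes(samples, labels):
    """Estimate per-class, per-band mean and variance from training pixels."""
    grouped = defaultdict(list)
    for pixel, label in zip(samples, labels):
        grouped[label].append(pixel)
    model = {}
    for label, pixels in grouped.items():
        n = len(pixels)
        bands = list(zip(*pixels))  # one tuple of values per band
        means = [sum(band) / n for band in bands]
        # A small floor keeps every variance strictly positive.
        variances = [
            max(sum((v - m) ** 2 for v in band) / n, 1e-6)
            for band, m in zip(bands, means)
        ]
        model[label] = (means, variances)
    return model


def log_likelihood(pixel, means, variances):
    """Gaussian log-likelihood of a pixel, treating bands as independent."""
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
        for x, mu, var in zip(pixel, means, variances)
    )


def classify(pixel, model):
    """Assign the class under which the pixel is most likely."""
    return max(model, key=lambda label: log_likelihood(pixel, *model[label]))


# Toy training pixels (assumption): (red, near-infrared) reflectance pairs.
train_pixels = [(0.04, 0.40), (0.05, 0.45), (0.06, 0.42),
                (0.20, 0.25), (0.22, 0.30), (0.18, 0.28)]
train_labels = ["eucalyptus"] * 3 + ["other"] * 3

model = fit_gaussian_classes(train_pixels, train_labels)
print(classify((0.05, 0.43), model))  # eucalyptus
print(classify((0.21, 0.27), model))  # other
```

In practice the training pixels would be sampled from Sentinel-2 bands inside COS polygons, and a full covariance matrix per class would normally replace the per-band variances.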
Deep Learning is one of the methods available for feature extraction and classification. Deep Learning
models can learn to focus on the relevant features by themselves and require little guidance from the
programmer. The approach is loosely inspired by the way the brain functions (Figure 7.1).
Figure 7.1 The deep learning workflow comprises three steps (Create training samples, Train model and Perform inference), all of which can be run in ArcGIS Pro to identify eucalyptus parcels.
7.1.4. Stage 3
Main data processing
Processing executed for update of the INSPIRE Theme Statistical Units dataset
Selection of residential enumeration areas
Selection of areas with urban classification from the CLC18 and COS2015 data
Overlay of the datasets and selection of urban areas within residential enumeration areas
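The three steps above can be sketched on a raster grid as follows. This is a simplified illustration, not the processing chain actually used (which works on vector datasets). It assumes that "urban classification" corresponds to CORINE level-1 class 1 (artificial surfaces, codes 111 to 142), and the small grids are hypothetical.

```python
def is_artificial(clc_code: int) -> bool:
    """True for CORINE level-1 class 1 (artificial surfaces, codes 111-142)."""
    return clc_code // 100 == 1


def overlay_urban(land_cover, area_mask):
    """Flag cells that are artificial AND lie inside a selected enumeration area.

    land_cover: 2-D list of CLC codes; area_mask: 2-D list of 0/1 flags
    (1 = cell belongs to a selected enumeration area). Shapes must match.
    """
    if len(land_cover) != len(area_mask) or any(
        len(lc_row) != len(mask_row)
        for lc_row, mask_row in zip(land_cover, area_mask)
    ):
        raise ValueError("land_cover and area_mask must have the same shape")
    return [
        [is_artificial(code) and bool(flag) for code, flag in zip(lc_row, mask_row)]
        for lc_row, mask_row in zip(land_cover, area_mask)
    ]


# Hypothetical 2 x 3 grids (assumption), for illustration only.
land_cover = [[112, 211, 311],
              [121, 112, 243]]
area_mask = [[1, 1, 0],
             [0, 1, 1]]

print(overlay_urban(land_cover, area_mask))
# [[True, False, False], [False, True, False]]
```

With vector data the same logic becomes an attribute selection on each layer followed by a polygon intersection.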
The work to obtain the degree of imperviousness of the 2011 enumeration areas using GHSL data is still
in progress. So far, the following activities have been carried out:
Selection of study area
Development of a Python 2.7 script that uses the gdal, ogr and numpy
libraries to compute zonal statistics of image data for polygons. This script is based on the
examples listed in the Python GDAL/OGR cookbook.
Further work is needed to assess the value of this kind of data.
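The zonal statistics computation can be sketched in a dependency-free form as shown below. This is not the project script: it assumes the polygons have already been rasterised into a grid of zone identifiers aligned with the image (the step that GDAL/OGR would perform), and the grids, the nodata value of 255 and the convention that 0 means "outside any polygon" are all assumptions made for the example.

```python
from collections import defaultdict


def zonal_statistics(values, zones, nodata=None):
    """Per-zone count, mean, min and max of raster cells.

    values: 2-D list of cell values (e.g. built-up percentage).
    zones:  2-D list of integer zone ids with the same shape; 0 = no zone.
    """
    cells = defaultdict(list)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone == 0 or value == nodata:
                continue  # skip cells outside polygons and nodata cells
            cells[zone].append(value)
    return {
        zone: {
            "count": len(vals),
            "mean": sum(vals) / len(vals),
            "min": min(vals),
            "max": max(vals),
        }
        for zone, vals in cells.items()
    }


# Hypothetical 3 x 3 grids (assumption): built-up share in percent, 255 = nodata.
values = [[0, 50, 100],
          [25, 255, 80],
          [10, 30, 60]]
zones = [[1, 1, 2],
         [1, 2, 2],
         [0, 3, 3]]

stats = zonal_statistics(values, zones, nodata=255)
print(stats[1]["mean"], stats[2]["mean"], stats[3]["mean"])  # 25.0 90.0 45.0
```

Read this way, the mean built-up share per enumeration area is one possible measure of its degree of imperviousness.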
Results analysis
Only preliminary results have been obtained concerning the identification of potential urban areas using
COS2015 and CLC18 data. The following images (Figure 7.2, Figure 7.3) show some results:
Figure 7.2 New artificial surfaces outside localities using CLC18
Figure 7.3 New artificial surfaces outside localities using COS2015
8. Report on the meetings
Four meetings were held within WPH of ESSnet Big Data II. Three of them were held via the WebEx
platform. One internal face-to-face meeting took place at the Statistical Office in Olsztyn (Poland). The
meetings are listed in Table 8.1.
Table 8.1 List of meetings within WPH.
No   Date               Type of meeting
1    28 Feb 2019        WPH WebEx Meeting 1
2    09 May 2019        WPH WebEx Meeting 2
3    26-28 June 2019    Meeting 3 in Olsztyn
4    12 Sep 2019        WPH WebEx Meeting 4
An overview of the WPH Earth Observation meetings can be found on the project wiki
(https://webgate.ec.europa.eu/fpfis/mwikis/essnetbigdata/index.php/WPH_Meetings).
9. Bibliography
Al-Obeidat, Feras, Ahmad T. Al-Taani, Nabil Belacel, Leo Feltrin, and Neil Banerjee. 2015. “A Fuzzy Decision Tree for Processing Satellite Images and Landsat Data.” In Procedia Computer Science. https://doi.org/10.1016/j.procs.2015.05.157.
“AppEEARS.” n.d. Accessed September 19, 2019. https://lpdaacsvc.cr.usgs.gov/appeears/.
Atzberger, Clement. 2013. “Advances in Remote Sensing of Agriculture: Context Description, Existing Operational Monitoring Systems and Major Information Needs.” Remote Sensing. https://doi.org/10.3390/rs5020949.
Bargiel, Damian. 2017. “A New Method for Crop Classification Combining Time Series of Radar Images and Crop Phenology Information.” Remote Sensing of Environment. https://doi.org/10.1016/j.rse.2017.06.022.
Bargiel, Damian, Felix Neuendorf, Michael Schlund, and Uwe Soergel. 2014. “Classification of Crops in Different European Regions Based on TerraSAR-X Data.” In 10TH EUROPEAN CONFERENCE ON SYNTHETIC APERTURE RADAR (EUSAR 2014).
Belgiu, Mariana, and Lucian Drăgu. 2016. “Random Forest in Remote Sensing: A Review of Applications and Future Directions.” ISPRS Journal of Photogrammetry and Remote Sensing 114: 24–31. https://doi.org/10.1016/j.isprsjprs.2016.01.011.
Bernardis, Caleb De, Fernando Vicente-Guijalba, Tomas Martinez-Marin, and Juan M. Lopez-Sanchez. 2016. “Contribution to Real-Time Estimation of Crop Phenological States in a Dynamical Framework Based on NDVI Time Series: Data Fusion with SAR and Temperature.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. https://doi.org/10.1109/JSTARS.2016.2539498.
Bettio, M, J Delincé, P Bruyas, W Croi, and G Eiden. 2002. “Area Frame Surveys: Aim, Principals and Operational Surveys. Building Agri-Environmental Indicators, Focussing on the European Area Frame Survey LUCAS,” 12–27.
Bossard, M, J Feranec, and J Otahel. 2000. “CORINE Land Cover Technical Guide: Addendum 2000.”
Breiman, Leo. 2001. “Random Forests.” Machine Learning 45 (1): 5–32. https://doi.org/10.1017/CBO9781107415324.004.
Büttner, G. 2014. “CORINE Land Cover and Land Cover Change Products. In Land Use and Land Cover Mapping in Europe,” 55–74.
Carfagna, Elisabetta, and F. Javier Gallego. 2006. “Using Remote Sensing for Agricultural Statistics.” International Statistical Review. https://doi.org/10.1111/j.1751-5823.2005.tb00155.x.
Chasmer, L., C. Hopkinson, T. Veness, W. Quinton, and J. Baltzer. 2014. “A Decision-Tree Classification for Low-Lying Complex Land Cover Types within the Zone of Discontinuous Permafrost.” Remote Sensing of Environment. https://doi.org/10.1016/j.rse.2013.12.016.
Christopoulos, Charilaos. 2000. “The Jpeg2000 Still Image Coding System: An Overview.” IEEE Transactions on Consumer Electronics 46 (4): 1103–27. https://doi.org/10.1109/30.920468.
Corbane, Christina, Martino Pesaresi, Panagiotis Politis, Vasileios Syrris, Aneta J. Florczyk, Pierre Soille, Luca Maffenini, et al. 2017. “Big Earth Data Analytics on Sentinel-1 and Landsat Imagery in Support to Global Human Settlements Mapping.” Big Earth Data 1 (1–2): 118–44. https://doi.org/10.1080/20964471.2017.1397899.
Cortes, Corinna, and Vladimir Vapnik. 1995. “Support-Vector Networks.” Machine Learning 20 (3): 273–79. https://doi.org/10.1023/A:1022627411411.
Crammer, and Singer. 2002. “On the Algorithmic Implementation of Multiclass Kernel-Based Vector Machines.” Journal of Machine Learning Research - JMLR.
Csillik, Ovidiu, Mariana Belgiu, Gregory P. Asner, and Maggi Kelly. 2019. “Object-Based Time-Constrained Dynamic Time Warping Classification of Crops Using Sentinel-2.” Remote Sensing. https://doi.org/10.3390/rs11101257.
Demarez, Valérie, Florian Helen, Claire Marais-Sicre, and Frédéric Baup. 2019. “In-Season Mapping of Irrigated Crops Using Landsat 8 and Sentinel-1 Time Series.” Remote Sensing. https://doi.org/10.3390/rs11020118.
Didan, Kamel. 2015. “MOD13Q1 - MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid.” Nasa Lp Daac. https://doi.org/10.5067/MODIS/MOD13Q1.006.
Dimitrov, Petar, Qinghan Dong, Herman Eerens, Alexander Gikov, Lachezar Filchev, Eugenia Roumenina, and Georgi Jelev. 2019. “Sub-Pixel Crop Type Classification Using PROBA-V 100 m NDVI Time Series and Reference Data from Sentinel-2 Classifications.” Remote Sensing. https://doi.org/10.3390/rs11111370.
Drusch, M., U. Del Bello, S. Carlier, O. Colin, V. Fernandez, F. Gascon, B. Hoersch, et al. 2012. “Sentinel-2: ESA’s Optical High-Resolution Mission for GMES Operational Services.” Remote Sensing of Environment 120: 25–36. https://doi.org/10.1016/j.rse.2011.11.026.
Ehrlich, D., T. Kemper, M. Pesaresi, and C. Corbane. 2018. “Built-up Area and Population Density: Two Essential Societal Variables to Address Climate Hazard Impact.” Environmental Science and Policy 90: 73–82. https://doi.org/10.1016/j.envsci.2018.10.001.
EUROSTAT. 2003. “The Lucas Survey - European Statisticians Monitor Territory. Office for Official Publications of the European Communities.”
Feng, Siwen, Jianjun Zhao, Tingting Liu, Hongyan Zhang, Zhengxiang Zhang, and Xiaoyi Guo. 2019. “Crop Type Identification and Mapping Using Machine Learning Algorithms and Sentinel-2 Time Series Data.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. https://doi.org/10.1109/jstars.2019.2922469.
“Folium.” n.d. Accessed September 19, 2019. https://python-visualization.github.io/folium/.
Ghosh, Ashish, Niladri Shekhar Mishra, and Susmita Ghosh. 2011. “Fuzzy Clustering Algorithms for Unsupervised Change Detection in Remote Sensing Images.” Information Sciences. https://doi.org/10.1016/j.ins.2010.10.016.
Gislason, Pall Oskar, Jon Atli Benediktsson, and Johannes R. Sveinsson. 2006. “Random Forests for Land Cover Classification.” Pattern Recognition Letters 27: 294–300. https://doi.org/10.1016/j.patrec.2005.08.011.
Gómez, Salvador, Sanz, and Casanova. 2019. “Potato Yield Prediction Using Machine Learning Techniques and Sentinel 2 Data.” Remote Sensing. https://doi.org/10.3390/rs11151745.
Goodfellow, I, Y Bengio, and A Courville. 2016. “Deep Learning.” MIT Press.
GSARS. 2017. “Handbook on Remote Sensing for Agricultural Statistics.” http://gsars.org/wp-content/uploads/2017/09/GS-REMOTE-SENSING-HANDBOOK-FINAL-04.pdf.
Hagolle, O., G. Dedieu, B. Mougenot, V. Debaecker, B. Duchemin, and A. Meygret. 2008. “Correction of Aerosol Effects on Multi-Temporal Images Acquired with Constant Viewing Angles: Application to Formosat-2 Images.” Remote Sensing of Environment 112: 1689–1701. https://doi.org/10.1016/j.rse.2007.08.016.
Hagolle, O., M. Huc, D. Villa Pascual, and G. Dedieu. 2010. “A Multi-Temporal Method for Cloud Detection, Applied to FORMOSAT-2, VENμS, LANDSAT and SENTINEL-2 Images.” Remote Sensing of Environment 114: 1747–55. https://doi.org/10.1016/j.rse.2010.03.002.
Hagolle, Olivier, Mireille Huc, David Villa Pascual, and Gerard Dedieu. 2015. “A Multi-Temporal and Multi-Spectral Method to Estimate Aerosol Optical Thickness over Land, for the Atmospheric Correction of FormoSat-2, LandSat, VENμS and Sentinel-2 Images.” Remote Sensing 7: 2668–91. https://doi.org/10.3390/rs70302668.
Hagolle, Olivier, Sylvia Sylvander, Mireille Huc, Martin Claverie, Dominique Clesse, Cécile Dechoz, Vincent Lonjou, and Vincent Poulain. 2015. “SPOT-4 (Take 5): Simulation of Sentinel-2 Time Series on 45 Large Sites.” Remote Sensing 7: 12242–64. https://doi.org/10.3390/rs70912242.
Han, Min, and Ben Liu. 2015. “Ensemble of Extreme Learning Machine for Remote Sensing Image Classification.” Neurocomputing. https://doi.org/10.1016/j.neucom.2013.09.070.
Hansen, M. C., R. Sohlberg, R. S. Defries, and J. R.G. Townshend. 2000. “Global Land Cover Classification at 1 Km Spatial Resolution Using a Classification Tree Approach.” International Journal of Remote Sensing. https://doi.org/10.1080/014311600210209.
Hawkins, Douglas M. 2004. “The Problem of Overfitting.” Journal of Chemical Information and Computer Sciences. https://doi.org/10.1021/ci0342472.
Helber, P, B Bischke, A Dengel, and D Borth. 2019. “A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.
Hennig, Ernest I., Christian Schwick, Tomáš Soukup, Erika Orlitová, Felix Kienast, and Jochen A.G. Jaeger. 2015. “Multi-Scale Analysis of Urban Sprawl in Europe: Towards a European de-Sprawling Strategy.” Land Use Policy, 483–98. https://doi.org/10.1016/j.landusepol.2015.08.001.
Hennig, Ernest I., Tomáš Soukup, Erika Orlitová, Christian Schwick, Felix Kienast, and Jochen A.G. Jaeger. 2016. “Annexes 1-5: Urban Sprawl in Europe. Joint EEA-FOEN Report.” Luxembourg. https://doi.org/10.2800/143470b.
Huang, Xin, and Liangpei Zhang. 2013. “An SVM Ensemble Approach Combining Spectral, Structural, and Semantic Features for the Classification of High-Resolution Remotely Sensed Imagery.” IEEE Transactions on Geoscience and Remote Sensing. https://doi.org/10.1109/TGRS.2012.2202912.
Hütt, Christoph, and Guido Waldhoff. 2018. “Multi-Data Approach for Crop Classification Using Multitemporal, Dual-Polarimetric TerraSAR-X Data, and Official Geodata.” European Journal of Remote Sensing. https://doi.org/10.1080/22797254.2017.1401909.
Ienco, DIno, Raffaele Gaetano, Claire Dupaquier, and Pierre Maurel. 2017. “Land Cover Classification via Multitemporal Spatial Data by Deep Recurrent Neural Networks.” IEEE Geoscience and Remote Sensing Letters. https://doi.org/10.1109/LGRS.2017.2728698.
Inglada, Jordi, Marcela Arias, Benjamin Tardy, Olivier Hagolle, Silvia Valero, David Morin, Gèrard Dedieu, et al. 2015. “Assessment of an Operational System for Crop Type Map Production Using High Temporal and Spatial Resolution Satellite Optical Imagery.” Remote Sensing 7: 12356–79. https://doi.org/10.3390/rs70912356.
Inglada, Jordi, Arthur Vincent, Marcela Arias, Benjamin Tardy, David Morin, and Isabel Rodes. 2017. “Operational High Resolution Land Cover Map Production at the Country Scale Using Satellite Image Time Series.” Remote Sensing 9: 95. https://doi.org/10.3390/rs9010095.
Jackson, Qiong, and David A. Landgrebe. 2002. “Adaptive Bayesian Contextual Classification Based on Markov Random Fields.” IEEE Transactions on Geoscience and Remote Sensing. https://doi.org/10.1109/TGRS.2002.805087.
Jia, Kun, Qiangzi Li, Yichen Tian, Bingfang Wu, Feifei Zhang, and Jihua Meng. 2012. “Crop Classification Using Multi-Configuration SAR Data in the North China Plain.” International Journal of Remote Sensing. https://doi.org/10.1080/01431161.2011.587844.
Khatami, Reza, Giorgos Mountrakis, and Stephen V. Stehman. 2016. “A Meta-Analysis of Remote Sensing Research on Supervised Pixel-Based Land-Cover Image Classification Processes: General Guidelines for Practitioners and Future Research.” Remote Sensing of Environment 177: 89–100. https://doi.org/10.1016/j.rse.2016.02.028.
Lawrence, Steve, C. Lee Giles, and Ah Chung Tsoi. 1997. “Lessons in Neural Network Training: Overfitting May Be Harder than Expected.” In Proceedings of the National Conference on Artificial Intelligence.
LeCun, Y., B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. 1989. “Backpropagation Applied to Handwritten Zip Code Recognition.” Neural Computation 1 (4): 541–51. https://doi.org/10.1162/neco.1989.1.4.541.
LeCun, Y, and Y Bengio. 1995. “Convolutional Networks for Images, Speech, and Time Series. The Handbook of Brain Theory and Neural Networks” 3361 (10): 1995.
Lecun, Yann, Yoshua Bengio, and Geoffrey Hinton. 2015. “Deep Learning.” Nature 521 (7553): 436–44. https://doi.org/10.1038/nature14539.
Ledley, RS, M Buas, and TJ Golab. 1990. “Fundamentals of True-Color Image Processing.” In 10th International Conference on Pattern Recognition, 791–95.
Li, Qingting, Cuizhen Wang, Bing Zhang, and Linlin Lu. 2015. “Object-Based Crop Classification with Landsat-MODIS Enhanced Time-Series Data.” Remote Sensing. https://doi.org/10.3390/rs71215820.
Li, Y., X. Zhu, Y. Pan, J. Gu, A. Zhao, and X Liu. 2014. “A Comparison of Model-Assisted Estimators to Infer Land Cover/Use Class Area Using Satellite Imagery.” Remote Sensing 6 (9): 9034–63.
Liu, Peng, Hui Zhang, and Kie B. Eom. 2017. “Active Deep Learning for Classification of Hyperspectral Images.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 10 (2): 712–24. https://doi.org/10.1109/JSTARS.2016.2598859.
Ma, Lei, Manchun Li, Xiaoxue Ma, Liang Cheng, Peijun Du, and Yongxue Liu. 2017. “A Review of Supervised Object-Based Land-Cover Image Classification.” ISPRS Journal of Photogrammetry and Remote Sensing 130: 277–93. https://doi.org/10.1016/j.isprsjprs.2017.06.001.
Ma, Lei, Yu Liu, Xueliang Zhang, Yuanxin Ye, Gaofei Yin, and Brian Alan Johnson. 2019. “Deep Learning in Remote Sensing Applications: A Meta-Analysis and Review.” ISPRS Journal of Photogrammetry and Remote Sensing 152: 166–77. https://doi.org/10.1016/j.isprsjprs.2019.04.015.
“Matplotlib.” n.d. Accessed September 19, 2019. https://matplotlib.org/.
Nabielek, K, D Hamers, and D Evers. 2016. “Cities in Europe.” PBL Publishers 521 (2470).
Navarro, Ana, João Rolim, Irina Miguel, João Catalão, Joel Silva, Marco Painho, Zoltán Vekerdy, et al. 2017. “Regional Scale Cropland Carbon Budgets: Evaluating a Geospatial Agricultural Modeling System Using Inventory Data.” International Geoscience and Remote Sensing Symposium (IGARSS). https://doi.org/10.1016/j.jag.2017.04.009.
NRCan. 2018. “Image Classification and Analysis. Natural Resources Canada Governement of Canada.” 2018. https://www.nrcan.gc.ca/maps-tools-publications/satellite-imagery-air-photos/remote-sensing-tutorials/image-interpretation-analysis/image-classification-and-analysis/9361.
OECD. 2018. Rethinking Urban Sprawl. https://doi.org/10.1787/9789264189881-en.
Pedregosa, Fabian, Gael Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, et al. 2011. “Scikit-Learn: Machine Learning in Python.” Journal of Machine Learning Research 12: 2825–30.
Pelletier, C, S Valero, J Inglada, N Champion, and G Dedieu. 2016. “Assessing the Robustness of Random Forests to Map Land Cover with High Resolution Satellite Image Time Series over Large Areas.” Remote Sensing 187: 156–68.
Pelletier, Charlotte, Silvia Valero, Jordi Inglada, Nicolas Champion, and Gérard Dedieu. 2016. “Assessing the Robustness of Random Forests to Map Land Cover with High Resolution Satellite Image Time Series over Large Areas.” Remote Sensing of Environment 187: 156–68. https://doi.org/10.1016/j.rse.2016.10.010.
Peña-Barragán, José M., Moffatt K. Ngugi, Richard E. Plant, and Johan Six. 2011. “Object-Based Crop Identification Using Multiple Vegetation Indices, Textural Features and Crop Phenology.” Remote Sensing of Environment. https://doi.org/10.1016/j.rse.2011.01.009.
Pesaresi, Martino, Christina Corbane, Andreea Julea, Aneta J. Florczyk, Vasileios Syrris, and Pierre Soille. 2016. “Assessment of the Added-Value of Sentinel-2 for Detecting Built-up Areas.” Remote Sensing 8 (4). https://doi.org/10.3390/rs8040299.
Pesaresi, Martino, Vasileios Syrris, and Andreea Julea. 2016. “A New Method for Earth Observation Data Analytics Based on Symbolic Machine Learning.” Remote Sensing 8 (5). https://doi.org/10.3390/rs8050399.
Piotrowski, Adam P., and Jarosław J. Napiorkowski. 2013. “A Comparison of Methods to Avoid Overfitting in Neural Networks Training in the Case of Catchment Runoff Modelling.” Journal of Hydrology. https://doi.org/10.1016/j.jhydrol.2012.10.019.
“Python.” n.d. Accessed September 19, 2019. https://www.python.org/about/gettingstarted/.
Rodriguez-Galiano, V. F., B. Ghimire, J. Rogan, M. Chica-Olmo, and J. P. Rigol-Sanchez. 2012. “An Assessment of the Effectiveness of a Random Forest Classifier for Land-Cover Classification.” ISPRS Journal of Photogrammetry and Remote Sensing 67: 93–104. https://doi.org/10.1016/j.isprsjprs.2011.11.002.
Rufin, Philippe, David Frantz, Stefan Ernst, Andreas Rabe, Patrick Griffiths, Mutlu Özdoğan, and Patrick Hostert. 2019. “Mapping Cropping Practices on a National Scale Using Intra-Annual Landsat Time Series Binning.” Remote Sensing. https://doi.org/10.3390/rs11030232.
Santaella, Julio. 2019. “In-Depth Review of Satellite Imagery / Earth Observation Technology in Official Statistics.” In Conference of European Statisticians, 67th Plenary Session, Paris, France.
“Sentinel-5P Pre-Operations Data Hub.” n.d. Accessed September 19, 2019. https://s5phub.copernicus.eu/dhus/#/home.
Soille, P., A. Burger, D. De Marchi, P. Kempeneers, D. Rodriguez, V. Syrris, and V. Vasilev. 2018. “A Versatile Data-Intensive Computing Platform for Information Retrieval from Big Geospatial Data.” Future Generation Computer Systems 81: 30–40. https://doi.org/10.1016/j.future.2017.11.007.
Sonobe, Rei. 2019. “Parcel-Based Crop Classification Using Multi-Temporal TerraSAR-X Dual Polarimetric Data.” Remote Sensing. https://doi.org/10.3390/rs11101148.
Sonobe, Rei, Hiroshi Tani, Xiufeng Wang, Nobuyuki Kobayashi, and Hideki Shimamura. 2015. “Discrimination of Crop Types with TerraSAR-X-Derived Information.” Physics and Chemistry of the Earth. https://doi.org/10.1016/j.pce.2014.11.001.
Sun, Zhongchang, Ru Xu, Wenjie Du, Lei Wang, and Lu Dengsheng. 2019. “High-Resolution Urban Land Mapping in China from Sentinel 1A/2 Imagery Based on Google Earth Engine.” Remote Sensing.
“Sustainable Development.” n.d. Accessed September 19, 2019. https://sustainabledevelopment.un.org/sdgs.
Szegedy, C, V Vanhoucke, S Ioffe, J Shlens, and Z Wojna. 2016. “Rethinking the Inception Architecture for Computer Vision.” In IEEE Conference on Computer Vision and Pattern Recognition, 2818–26.
Szuster, Brian W., Qi Chen, and Michael Borger. 2011. “A Comparison of Classification Techniques to Support Land Cover and Land Use Analysis in Tropical Coastal Zones.” Applied Geography 31: 525–32. https://doi.org/10.1016/j.apgeog.2010.11.007.
“The Basisregistratie Adressen En Gebouwen.” n.d. Accessed September 19, 2019. https://zakelijk.kadaster.nl/basisregistratie-adressen-en-gebouwen.
“The Copernicus Open Access Hub.” n.d. Accessed September 19, 2019. https://scihub.copernicus.eu/.
“The OECD Better Life Index.” n.d. Accessed September 19, 2019. http://www.oecdbetterlifeindex.org.
“The Public Services On the Map.” n.d. Accessed September 19, 2019. https://www.pdok.nl.
“The Soil Use File.” n.d. Accessed September 19, 2019. https://www.pdok.nl/introductie/-/article/cbs-bestand-bodemgebruik.
UNECE-CES. 2019. “In-Depth Review of Satellite Imagery / Earth Observation Technology in Official Statistics.” https://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/2019/ECE_CES_2019_16-1906490E.pdf.
United Nations. 2017. “Earth Observations for Official Statistics.” https://unstats.un.org/bigdata/taskteams/satellite/UNGWG_Satellite_Task_Team_Report_WhiteCover.pdf.
Veloso, Amanda, Stéphane Mermoz, Alexandre Bouvet, Thuy Le Toan, Milena Planells, Jean François Dejoux, and Eric Ceschia. 2017. “Understanding the Temporal Behavior of Crops Using Sentinel-1 and Sentinel-2-like Data for Agricultural Applications.” Remote Sensing of Environment 199: 415–26. https://doi.org/10.1016/j.rse.2017.07.015.
“Well-Being in Germany.” n.d. Accessed September 19, 2019. https://www.gut-leben-in-deutschland.de/static/LB/en/.
Xie, Qinghua, Jinfei Wang, Chunhua Liao, Jiali Shang, Juan M. Lopez-Sanchez, Haiqiang Fu, and Xiuguo Liu. 2019. “On the Use of Neumann Decomposition for Crop Classification Using Multi-Temporal RADARSAT-2 Polarimetric SAR Data.” Remote Sensing. https://doi.org/10.3390/rs11070776.
Yesou, Herve, Eric Pottier, Gregoire Mercier, Manuel Grizonnet, Sadri Haouet, Alain Giros, Robin Faivre, Claire Huber, and Julien Michel. 2016. “Synergy of Sentinel-1 and Sentinel-2 Imagery for Wetland Monitoring Information Extraction from Continuous Flow of Sentinel Images Applied to Water Bodies and Vegetation Mapping and Monitoring.” In International Geoscience and Remote Sensing Symposium (IGARSS). https://doi.org/10.1109/IGARSS.2016.7729033.