CDCOL · 2017. 9. 26. · CDCOL A GEOSCIENCE DATA CUBE THAT MEETS COLOMBIAN NEEDS Christian...

CDCOL A GEOSCIENCE DATA CUBE THAT MEETS COLOMBIAN NEEDS

Christian Ariza-Porras, Germán Bravo, Mario Villamizar, Andrés Moreno, Harold Castro, Gustavo Galindo, Edersson Cabera, Saralux Valbuena, and Pilar Lozano

Dimensions

• Latitude, Longitude (x, y)

• Time (t): 2010 to 2016

• Spectral bands

Variety

Source: Landsat, MODIS, IDEAM products, Sentinel

Resolution: temporal, spatial, spectral

Problem

Analysts’ time

Effort replication

Processing

Variety of sources and tools

Replicability

Processing power and storage

Developers are a scarce resource

Results can be reused only if they can be trusted

Traditional remote sensing product generation process

Source: Held, A. 2015. PowerPoint presentation, First Workshop Data Cube Colombia.

New Vision – Analysis ready data

Analysis-ready data reaches the majority of end users, saving up to 80% of collective effort and costs.

Source: Held, A. 2015. PowerPoint presentation, First Workshop Data Cube Colombia.

CDCol Goals

• Data ownership

• Extensibility

• Lineage

• Replicability

• Standardization

• Reusability

• Complexity abstraction

• Ease of use

• Parallelization

Related Works

Background

This work

Solution Strategy

• Roles

• Bank of algorithms and results

• Web UI

• Parallelization strategy

• Bulk ingestion

• Training workshop

CDCol User Roles

System Administrator

Data Administrator

Developer

Analyst

Algorithms Life Cycle

Development

Complexity Abstraction

• Independent of datacube-core

• Automatic parallelization

• Well-known Python libraries

◦ NumPy

◦ xarray
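To illustrate what an algorithm that is independent of datacube-core might look like, here is a minimal sketch that touches only NumPy and xarray objects. The band names `red` and `nir`, the function name `ndvi`, and the synthetic tile are assumptions made for this example; they are not taken from the CDCol code base.

```python
import numpy as np
import xarray as xr


def ndvi(tile: xr.Dataset) -> xr.DataArray:
    """Compute NDVI = (nir - red) / (nir + red) for every pixel and date."""
    red = tile["red"].astype("float64")
    nir = tile["nir"].astype("float64")
    total = nir + red
    # Pixels whose denominator is zero become NaN instead of raising warnings.
    return ((nir - red) / total.where(total != 0)).rename("ndvi")


# Synthetic 2 x 2 tile with two dates, standing in for ingested data (assumed layout).
dims = ("time", "latitude", "longitude")
tile = xr.Dataset(
    {
        "red": (dims, np.full((2, 2, 2), 1000)),
        "nir": (dims, np.full((2, 2, 2), 3000)),
    },
    coords={
        "time": np.array(["2014-01-01", "2015-01-01"], dtype="datetime64[ns]"),
        "latitude": [4.0, 4.1],
        "longitude": [-74.0, -73.9],
    },
)
result = ndvi(tile)
```

Because the function receives and returns plain xarray objects, it can be tested on synthetic arrays like the one above without a running data cube.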

Execution

CDCol Web UI

Empowers users to work on a large set of satellite images from any device

Reduces learning curve

Authentication and roles management

CDCol Demo

Parallelization Strategy

Automatic

By Tile

Generic Task

Celery
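The idea of a generic task applied automatically to each tile can be sketched as follows. This is not the CDCol implementation: the slides name Celery as the task queue, while this sketch swaps in the standard library's `concurrent.futures` so that it runs without a message broker. The 1 x 1 degree tile size and all function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def split_into_tiles(lat_range, lon_range):
    """Return the lower-left corner of every 1 x 1 degree tile in the area."""
    lats = range(lat_range[0], lat_range[1])
    lons = range(lon_range[0], lon_range[1])
    return list(product(lats, lons))


def generic_task(algorithm, tile, params):
    """Run any algorithm on one tile; the task knows nothing about its logic."""
    return tile, algorithm(tile, **params)


def run_by_tile(algorithm, lat_range, lon_range, params, workers=4):
    """Submit one generic task per tile and collect the results by tile."""
    tiles = split_into_tiles(lat_range, lon_range)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(generic_task, algorithm, t, params) for t in tiles]
        return dict(f.result() for f in futures)


def describe_tile(tile, label):
    """Placeholder algorithm (hypothetical): report which tile it received."""
    lat, lon = tile
    return f"{label}: lat {lat}, lon {lon}"


results = run_by_tile(describe_tile, (4, 6), (-75, -73), {"label": "demo"})
```

The algorithm author writes only the per-tile function; splitting the requested area and scheduling the work stay outside the algorithm, which is what makes the parallelization automatic from the developer's point of view.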

Bulk Ingestion

Initial ingestion

15,854 scenes

Landsat 5, 7, and 8 (T1 Surface Reflectance products from USGS)

15 years

Training Workshops

Training and diffusion workshops are essential to the success of the data cube.

Developers

• Python fundamentals

• Multidimensional array manipulation in Python

Analysts

• Datacube workflow

CDCol Components

OpenDatacube/datacube-core

Results

Bank of algorithms

• Algorithms

◦ Temporal median composites

◦ NDVI

◦ Forest/non-forest classification

◦ Change detection using PCA

◦ WOFS (adapted)
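As an illustration of the first algorithm in the list above, a temporal median composite can be sketched in a few lines of NumPy. The no-data value `-9999`, the `(time, y, x)` axis order, and the function name are assumptions for this example and may differ from the CDCol version.

```python
import warnings

import numpy as np

NODATA = -9999  # assumed no-data marker in the ingested band


def temporal_median(stack, nodata=NODATA):
    """Per-pixel median over time of a (time, y, x) array, ignoring no-data."""
    data = np.where(stack == nodata, np.nan, stack.astype("float64"))
    with warnings.catch_warnings():
        # np.nanmedian warns when a pixel has no valid observation at all;
        # the NaN it returns for that pixel is the result wanted here.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmedian(data, axis=0)


# Three dates over a 1 x 2 pixel area: the first pixel has two valid
# observations, the second pixel has none.
stack = np.array(
    [
        [[100, NODATA]],
        [[300, NODATA]],
        [[NODATA, NODATA]],
    ]
)
composite = temporal_median(stack)
```

Using the median rather than the mean keeps a single cloudy or corrupted observation from dominating the composite value of a pixel.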

Workshop participants developed their own algorithms

Repeatable results

A set of tools available to analysts

Time reduction (a task that used to take 72 hours can now be done in 12 hours)

Results

15 years: data from 2000-2015

30 meters: pixel resolution

342 scenes

Landsat 7/8

2 h: processing

Results

15 years: data from 2000-2015

30 meters: pixel resolution

466 images

Landsat 7/8

2 min per year: processing

Legend: Forest, Other land covers

Results

15 years: data from 2000-2015

30 meters: pixel resolution

45 images

Landsat 7

20 min per period: processing

Conclusions

Data ownership

• 15 years of curated images from different sources

Extensibility

• Developers can implement new algorithms with a low learning curve.

• Data administrators can add new images to the collection and create new data types to support new sources.

Lineage and Replicability

• Results are replicable because execution parameters and algorithm versions are logged.
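A minimal sketch of such a log entry, using only the Python standard library, is shown here. The field names and the SHA-256 fingerprint are assumptions chosen for illustration; the slides do not describe how CDCol actually stores its execution records.

```python
import hashlib
import json


def execution_record(algorithm, version, params):
    """Build a log entry that identifies one execution of an algorithm."""
    record = {"algorithm": algorithm, "version": version, "parameters": params}
    # Sorting keys makes the fingerprint independent of parameter order.
    canonical = json.dumps(record, sort_keys=True)
    record["fingerprint"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return record


area = {"lat": [4, 5], "lon": [-75, -74], "years": [2014, 2015]}
reordered = {"years": [2014, 2015], "lon": [-75, -74], "lat": [4, 5]}

first = execution_record("ndvi", "1.0", area)
same = execution_record("ndvi", "1.0", reordered)
newer = execution_record("ndvi", "1.1", area)
```

Two executions with the same algorithm version and parameters share a fingerprint, so a stored result can be matched to the run that produced it, while a new algorithm version yields a different fingerprint.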

Complexity abstraction

• Algorithms are independent of the data cube core API. Developers work only with multidimensional arrays, using well-established Python packages.

Ease of use

• Easy-to-use web user interface.

Parallelism

• Automatic parallelism by tile.

Future Work

Horizontal scaling

Algorithm-dependent parallelization schemes

Workflows management

New sensors

New algorithms

Training

Cloud-enabled CDCol

Acknowledgements

We thank Brian Killough from NASA, and Alfredo Delos Santos and Kayla Fox from the AMA team, for their support and fruitful discussions. We also thank the CEOS Australia group for its work and for sharing it with the world. Finally, we thank the Environmental Ministry for financial support.

CDCol uses the NetCDF format (UCAR/Unidata) to store ingested data and results (http://doi.org/10.5065/D6H70CW6).