Providing Statistical Algorithms as-a-Service

Gianpaolo Coro, Pasquale Pagano, Leonardo Candela, ISTI-CNR, Pisa, Italy

description

In computational statistics, algorithms often have specialized implementations that address very specific problems. Occasionally, these algorithms are also applicable to problems other than the original ones. Interest is growing in modular and pluggable solutions that let scientists repeat and validate experiments made by others and reuse those algorithms in other contexts. Furthermore, such procedures are expected to be remotely hosted and to “hide” the complexity of the calculations, which remote computational infrastructures manage behind the scenes. For these reasons, the usual solution of supplying modular software libraries containing implementations of algorithms is giving way to Web Services that host such implementations and are accessible through standard protocols. The protocols describing the computational capabilities of these Services are becoming more and more elaborate, so that modular workflows can rely on them.

Transcript of Providing Statistical Algorithms as-a-Service

Page 1: Providing Statistical Algorithms as-a-Service

Providing Statistical Algorithms as-a-Service

Gianpaolo Coro, Pasquale Pagano, Leonardo Candela

ISTI-CNR, Pisa, Italy

Page 2: Providing Statistical Algorithms as-a-Service

Statistical Manager is a set of web services that aim to:

• Help scientists in managing marine, biological or climatic statistical problems

• Supply precooked state-of-the-art algorithms as-a-Service

• Perform calculations by using Cloud computing in a transparent way to the users

• Share input, results, parameters and comments with colleagues by means of Virtual Research Environments in the D4Science e-Infrastructure

Statistical Manager

[Diagram: Statistical Manager; D4Science Computational Facilities; Sharing; Setup and execution]

Page 3: Providing Statistical Algorithms as-a-Service

Architecture

Page 4: Providing Statistical Algorithms as-a-Service

Internal Work

Page 5: Providing Statistical Algorithms as-a-Service

Resources and Sharing

Page 6: Providing Statistical Algorithms as-a-Service

Statistical Manager - Interface

Page 7: Providing Statistical Algorithms as-a-Service

Experiment Execution

Page 8: Providing Statistical Algorithms as-a-Service

Computations Check

Summary of the Input, Output and Parameters of the experiment

Page 9: Providing Statistical Algorithms as-a-Service

Data Space - Sharing and Import

Page 10: Providing Statistical Algorithms as-a-Service

Hosted Algorithms

Page 11: Providing Statistical Algorithms as-a-Service

Application Fields

o Ecology

o Environment

o Biodiversity

o Life

Page 12: Providing Statistical Algorithms as-a-Service

Ecology

Page 13: Providing Statistical Algorithms as-a-Service

Niche Modelling

• AquaMaps – Suitable Habitat

• AquaMaps – Native Habitat

• AquaMaps for 2050

• Artificial Neural Networks

• AquaMaps - ANN

Gadus morhua

AquaMaps - Suitable Habitat

Page 14: Providing Statistical Algorithms as-a-Service

Outliers Detection

Presence Points

Density-based Clustering and Outliers detection

Cetorhinus maximus

Distance Based Clustering

K-Means

X-Means

DBScan
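
The density-based approach named on this slide can be sketched in a few lines. What follows is a minimal pure-Python DBSCAN, not the Statistical Manager implementation: the sample coordinates and the eps and min_pts values are illustrative assumptions, and distances are treated as planar for simplicity. Points labelled -1 fall in no dense region and are reported as outliers.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for an outlier."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1                   # points[i] is a core point: start a cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also a core point: keep expanding
    return labels

# Hypothetical presence points: two dense groups and one isolated record.
presence_points = [(0, 0), (0, 1), (1, 0), (1, 1),
                   (10, 10), (10, 11), (11, 10), (11, 11),
                   (50, 50)]
labels = dbscan(presence_points, eps=1.5, min_pts=3)
outliers = [p for p, label in zip(presence_points, labels) if label == -1]
```

With these toy values the isolated record at (50, 50) is the only outlier. On real occurrence data a geodesic distance and tuned parameters would be needed.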

Page 15: Providing Statistical Algorithms as-a-Service

Climate Change Effects on Species

Estimated impact of climate changes over 20 years on 11549 species.

Bioclimate HSpec

Overall occupancy in time

Pseudanthias evansi

The occupancy by Pseudanthias evansi decreases in Area 71 but increases in Area 77

Page 16: Providing Statistical Algorithms as-a-Service

Similarity between habitats

Habitat Representativeness Score:

1. Measures the similarity between the environmental features of two areas

2. Assesses the quality of models and environmental features

Latimeria chalumnae

HRS=10.5

Page 17: Providing Statistical Algorithms as-a-Service

Environment

Page 18: Providing Statistical Algorithms as-a-Service

Rasterization

A polygonal map is transformed into a raster map or into a point map
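
As a rough illustration of this transformation, the sketch below samples the centre of each grid cell and tests it against a polygon with the standard ray-casting rule. It is only one simple way to rasterize, not necessarily the one used by the service; the grid origin, cell size and the square polygon are made-up example values.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def rasterize(polygon, x_min, y_min, cell_size, n_cols, n_rows):
    """Raster map: grid[row][col] is 1 when the cell centre lies inside the polygon.
    Row 0 is the row with the lowest y values."""
    grid = []
    for row in range(n_rows):
        y = y_min + (row + 0.5) * cell_size
        grid.append([
            1 if point_in_polygon(x_min + (col + 0.5) * cell_size, y, polygon) else 0
            for col in range(n_cols)
        ])
    return grid

def to_point_map(grid, x_min, y_min, cell_size):
    """Point map: centre coordinates of every cell marked 1 in the raster."""
    return [
        (x_min + (col + 0.5) * cell_size, y_min + (row + 0.5) * cell_size)
        for row, cells in enumerate(grid)
        for col, value in enumerate(cells)
        if value
    ]

# Hypothetical polygonal map: a single square from (1, 1) to (3, 3).
square = [(1, 1), (3, 1), (3, 3), (1, 3)]
raster = rasterize(square, x_min=0, y_min=0, cell_size=1, n_cols=4, n_rows=4)
points = to_point_map(raster, x_min=0, y_min=0, cell_size=1)
```

Sampling cell centres keeps the code short; a production rasterizer would typically also handle holes, multi-polygons and partial cell coverage.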

Page 19: Providing Statistical Algorithms as-a-Service

Maps Comparison

Compares:

• Species Distribution maps

• Environmental layers

• SAR Images
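
One simple way to compare two maps, once both are on the same grid, is a cell-by-cell difference. The sketch below is an assumption about what such a comparison could report (mean absolute difference, number of discrepant cells, agreement fraction); the slide does not state which statistics the service computes, and the tolerance and sample grids are invented.

```python
def compare_maps(map_a, map_b, tolerance=0.0):
    """Cell-by-cell comparison of two rasters that share the same grid."""
    same_shape = len(map_a) == len(map_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(map_a, map_b)
    )
    if not same_shape:
        raise ValueError("maps must have the same number of rows and columns")
    diffs = [abs(a - b) for row_a, row_b in zip(map_a, map_b)
             for a, b in zip(row_a, row_b)]
    if not diffs:
        raise ValueError("maps are empty")
    discrepant = sum(1 for d in diffs if d > tolerance)
    return {
        "cells": len(diffs),
        "mean_abs_diff": sum(diffs) / len(diffs),
        "discrepant_cells": discrepant,
        "agreement": 1 - discrepant / len(diffs),
    }

# Hypothetical probability maps for the same species from two model runs.
run_1 = [[0.0, 0.5], [1.0, 1.0]]
run_2 = [[0.0, 0.5], [0.5, 1.0]]
report = compare_maps(run_1, run_2, tolerance=0.1)
```

Maps at different resolutions would first have to be resampled onto a common grid before a comparison like this makes sense.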

Page 20: Providing Statistical Algorithms as-a-Service

Periodicity and Seasonality

Periodicity: 12 months

Extraction Tools Fourier Analysis
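
A period such as the 12 months shown here can be recovered from a time series by looking for the strongest component of its Fourier spectrum. The sketch below uses a plain discrete Fourier transform written out by hand; the synthetic monthly series is an assumption made for the example, and the service's actual extraction tool may work differently.

```python
import cmath
import math

def dominant_period(series):
    """Period (in samples) of the strongest non-constant Fourier component."""
    n = len(series)
    mean = sum(series) / n
    centred = [v - mean for v in series]  # remove the constant (zero-frequency) part
    best_k, best_power = None, 0.0
    for k in range(1, n // 2 + 1):        # k cycles over the whole series
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centred))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    if best_k is None:
        raise ValueError("series is constant: no periodicity to extract")
    return n / best_k

# Hypothetical signal: four years of monthly values with a yearly cycle.
monthly = [15 + 5 * math.sin(2 * math.pi * t / 12) for t in range(48)]
period_in_months = dominant_period(monthly)
```

The direct sum costs O(n²) operations, which is acceptable for short series; a fast Fourier transform would be the usual choice for long ones.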

Page 21: Providing Statistical Algorithms as-a-Service

Environmental Signal Processing

Resampling

Spectrogram

Page 22: Providing Statistical Algorithms as-a-Service

Biodiversity

Page 23: Providing Statistical Algorithms as-a-Service

Occurrence Points

Occurrence Data from GBIF, Occurrence Data from OBIS

∩ Intersection

- Difference

∪ Union

DD Duplicates Deletion

Records Similarity

Record A: x,y; Event Date; Modif Date; Author; Species Scientific Name

Record B: x,y; Event Date; Modif Date; Author; Species Scientific Name
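
The four operations on occurrence points all depend on deciding when two records describe the same observation. The sketch below assumes a simple similarity score, the fraction of matching fields with a small spatial tolerance, and builds intersection, difference, union and duplicates deletion on top of it. The scoring rule, the tolerance, the threshold and the sample records are all illustrative assumptions, not the service's definition.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Occurrence:
    x: float
    y: float
    event_date: str
    modif_date: str
    author: str
    species: str

def similarity(a, b, spatial_tolerance=0.01):
    """Fraction of compared fields that agree (assumed scoring rule).
    modif_date is not compared: it describes the record, not the observation."""
    checks = [
        abs(a.x - b.x) <= spatial_tolerance and abs(a.y - b.y) <= spatial_tolerance,
        a.event_date == b.event_date,
        a.author.strip().lower() == b.author.strip().lower(),
        a.species.strip().lower() == b.species.strip().lower(),
    ]
    return sum(checks) / len(checks)

def has_match(record, others, threshold):
    return any(similarity(record, other) >= threshold for other in others)

def intersection(a, b, threshold=1.0):
    return [r for r in a if has_match(r, b, threshold)]

def difference(a, b, threshold=1.0):
    return [r for r in a if not has_match(r, b, threshold)]

def union(a, b, threshold=1.0):
    return list(a) + difference(b, a, threshold)

def delete_duplicates(records, threshold=1.0):
    kept = []
    for record in records:
        if not has_match(record, kept, threshold):
            kept.append(record)
    return kept

# Hypothetical records: the first entry of each source is the same sighting.
gbif = [Occurrence(10.0, 45.0, "2010-05-01", "2011-01-01", "Smith", "Gadus morhua"),
        Occurrence(12.0, 50.0, "2009-03-02", "2010-01-01", "Rossi", "Gadus morhua")]
obis = [Occurrence(10.005, 45.0, "2010-05-01", "2012-06-01", "smith", "Gadus morhua"),
        Occurrence(-3.0, 60.0, "2008-07-07", "2009-01-01", "Lee", "Cetorhinus maximus")]
shared = intersection(gbif, obis)
merged = union(gbif, obis)
```

Lowering the threshold makes the matching more permissive, which merges more records at the risk of treating distinct observations as duplicates.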

Page 24: Providing Statistical Algorithms as-a-Service

BiOnym

A flexible workflow approach to taxon name matching

Accounts for:

• Variations in the spelling and interpretation of taxonomic names

• Combination of data from different sources

• Harmonization and reconciliation of Taxa names

Workflow: Raw Input String (e.g. Gadus morua Lineus 1758) → Preprocessing and Parsing → Taxon name Matcher 1, Taxon name Matcher 2, ..., Taxon name Matcher n → PostProcessing → Correct Transcriptions (e.g. Gadus morhua (Linnaeus, 1758))

Reference Sources: ASFIS, FISHBASE, WoRMS, Other in DwC-A
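
The chain of preprocessing, matchers and postprocessing can be illustrated with a toy version. In the sketch below the reference list, the two matchers (an exact one followed by a fuzzy one based on Python's difflib) and the 0.8 threshold are all assumptions chosen for the example; BiOnym's real matchers and reference sources are not reproduced here.

```python
from difflib import SequenceMatcher

# Toy stand-in for a reference source such as those named on the slide.
REFERENCE_SOURCE = [
    "Gadus morhua (Linnaeus, 1758)",
    "Cetorhinus maximus (Gunnerus, 1765)",
    "Latimeria chalumnae Smith, 1939",
]

def preprocess(raw):
    """Normalise whitespace and case."""
    return " ".join(raw.split()).lower()

def parse(cleaned):
    """Split into (genus + species, authorship); assumes a binomial name."""
    tokens = cleaned.split()
    return " ".join(tokens[:2]), " ".join(tokens[2:])

def canonical(entry):
    return parse(preprocess(entry))[0]

def exact_matcher(name, reference):
    return [(entry, 1.0) for entry in reference if canonical(entry) == name]

def fuzzy_matcher(name, reference, threshold=0.8):
    scored = [(entry, SequenceMatcher(None, name, canonical(entry)).ratio())
              for entry in reference]
    return [(entry, score) for entry, score in scored if score >= threshold]

def match_taxon(raw, reference=REFERENCE_SOURCE,
                matchers=(exact_matcher, fuzzy_matcher)):
    """Run the matchers in order; the first one that finds candidates wins."""
    name, _authorship = parse(preprocess(raw))
    for matcher in matchers:
        hits = matcher(name, reference)
        if hits:
            # Postprocessing: best-scoring transcription first.
            return sorted(hits, key=lambda hit: hit[1], reverse=True)
    return []

candidates = match_taxon("Gadus morua Lineus 1758")
```

Here the misspelt input falls through the exact matcher and is resolved by the fuzzy one to "Gadus morhua (Linnaeus, 1758)". Keeping each matcher as a plain function makes it easy to reorder them or plug in new ones.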

Page 25: Providing Statistical Algorithms as-a-Service

Trendylyzer

• Fill some knowledge gaps on marine species

• Account for sampling biases

• Define trends for common species

Plankton regime shift

Herring recovered after the fishing ban

Can we recognize big changes in species presence?

Page 26: Providing Statistical Algorithms as-a-Service

Life

Page 27: Providing Statistical Algorithms as-a-Service

Length-Weight Relationships

Calculate the a and b parameters for 14 230 species by means of Bayesian Methods

Approach:

• Collaborative development with the final user

• Integration of user’s R Scripts

• Usage of Cloud computing for R Scripts

• Periodic runs

• Porting to the D4Science Statistical Manager allowed the scripts to run in a distributed fashion

• The run time dropped from 20 days to 11 hours, a 95.4% reduction
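
The a and b parameters are those of the standard length-weight relationship W = a·L^b. The slide reports a Bayesian estimation run through R scripts; the sketch below deliberately swaps in a much simpler technique, ordinary least squares on log-transformed data, only to show what the two parameters mean. The sample lengths and weights are synthetic.

```python
import math

def fit_length_weight(lengths_cm, weights_g):
    """Estimate a and b in W = a * L**b by least squares on log10(W) vs log10(L)."""
    if len(lengths_cm) != len(weights_g) or len(lengths_cm) < 2:
        raise ValueError("need at least two (length, weight) pairs")
    xs = [math.log10(length) for length in lengths_cm]
    ys = [math.log10(weight) for weight in weights_g]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                   # slope of the log-log line
    log_a = mean_y - b * mean_x     # intercept of the log-log line
    return 10 ** log_a, b

# Synthetic fish generated with a = 0.01 and b = 3.
lengths = [10.0, 20.0, 30.0, 40.0]
weights = [0.01 * length ** 3 for length in lengths]
a, b = fit_length_weight(lengths, weights)
```

A Bayesian fit would additionally combine the data with prior knowledge about a and b, which matters for species with few measurements.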

Page 28: Providing Statistical Algorithms as-a-Service

Functions Simulation - Spawning Stock Biomass vs Recruits

Estimate biological limits for 50 Northeast Atlantic fish stocks

• Use real measures

• Rely on previous expert knowledge

• Use Bayesian models to combine information

[Figure labels: Re-estimated SSB limit; Re-estimated HS; Rule-based HS; Re-estimated precautionary limit]

Page 29: Providing Statistical Algorithms as-a-Service

Future Work

Page 30: Providing Statistical Algorithms as-a-Service

Plan

• Make the Statistical Manager Algorithms accessible through the OGC WPS standard (currently available via SOAP and Java API)

• Invoke the algorithms from a Workflow Management System (e.g. Taverna)

• Expand the system with new algorithms
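
To give an idea of what access through OGC WPS could look like for a client, the sketch below builds an Execute request in the key-value-pair style of WPS 1.0.0. The endpoint URL, the process identifier and the input names are hypothetical placeholders; the slide describes WPS support as planned work and does not specify any of them.

```python
from urllib.parse import urlencode

def wps_execute_url(endpoint, identifier, inputs):
    """Build a WPS 1.0.0 Execute request URL (key-value-pair encoding)."""
    # Literal inputs are passed as name=value pairs separated by semicolons.
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    query = urlencode({
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": identifier,
        "datainputs": data_inputs,
    })
    return f"{endpoint}?{query}"

# Hypothetical endpoint, algorithm name and parameters.
url = wps_execute_url(
    "https://example.org/wps",
    "DBSCAN",
    {"epsilon": 1.5, "min_points": 3},
)
```

Because the request is a plain URL, the same call can be issued from a browser, a script or a workflow management system without a dedicated client library.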

Page 31: Providing Statistical Algorithms as-a-Service

Thank you