Providing Statistical Algorithms as-a-Service

Post on 13-Jan-2015


In computational statistics, algorithms often have specialized implementations that address very specific problems, yet at times these algorithms are also applicable to problems other than the original ones. Interest is growing in modular and pluggable solutions that let scientists repeat and validate experiments made by others and reuse those algorithms in other contexts. Furthermore, such procedures are increasingly required to be remotely hosted and to “hide” the complexity of the calculations, which remote computational infrastructures manage behind the scenes. For these reasons, the usual solution of supplying modular software libraries containing implementations of algorithms is giving way to Web Services that host such implementations and are accessible through standard protocols. The protocols describing the computational capabilities of these Services are becoming more and more elaborate, so that modular workflows can rely on them.


Providing Statistical Algorithms as-a-Service

Gianpaolo Coro, Pasquale Pagano,

Leonardo Candela

ISTI-CNR, Pisa, Italy

Statistical Manager is a set of web services that aim to:

• Help scientists in managing marine, biological or climatic statistical problems

• Supply precooked state-of-the-art algorithms as-a-Service

• Perform calculations by using Cloud computing in a transparent way to the users

• Share input, results, parameters and comments with colleagues by means of a Virtual Research Environment in the D4Science e-Infrastructure

Statistical Manager

[Diagram: Statistical Manager, D4Science Computational Facilities, Sharing]

Setup and execution

Architecture

Internal Work

Resources and Sharing

Statistical Manager - Interface

Experiment Execution

Computations Check

Summary of the Input, Output and Parameters of the experiment

Data Space - Sharing and Import

Hosted Algorithms

Application Fields

o Ecology

o Environment

o Biodiversity

o Life

Ecology

Niche Modelling

• AquaMaps – Suitable Habitat

• AquaMaps – Native Habitat

• AquaMaps for 2050

• Artificial Neural Networks

• AquaMaps - ANN

Gadus morhua

AquaMaps - Suitable Habitat

Outliers Detection

Presence Points

Density-based Clustering and Outliers detection

Cetorhinus maximus

Distance Based Clustering

K-Means

X-Means

DBScan
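To illustrate how density-based clustering can flag outlying presence points, here is a minimal pure-Python sketch of DBSCAN. It is not the Statistical Manager implementation: the function names, the `eps` and `min_pts` values and the sample coordinates are assumptions made for the example.

```python
import math

NOISE = -1  # label given to points that belong to no cluster (outliers)


def region_query(points, i, eps):
    """Return the indices of all points within eps of points[i] (i included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point of the current cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbours)
    return labels


# Illustrative (made-up) presence points: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
outliers = [p for p, label in zip(points, labels) if label == NOISE]
```

Points labelled `NOISE` are the candidates for outlier removal; the result depends strongly on `eps` and `min_pts`, which would have to be tuned to the spatial resolution of the occurrence data.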

Climate Changes Effects on Species

Bioclimate HSpec

Estimated impact of climate changes over 20 years on 11549 species.

Overall occupancy in time

Pseudanthias evansi

The occupancy by Pseudanthias evansi decreases in Area 71 but increases in Area 77

Similarity between habitats

Habitat Representativeness Score:

1. Measures the similarity between the environmental features of two areas

2. Assesses the quality of models and environmental features

Latimeria chalumnae

HRS=10.5

Environment

Rasterization

A polygonal map is transformed into a raster map or into a point map
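The transformation can be sketched as sampling the polygon at the centre of each grid cell. The example below uses a ray-casting point-in-polygon test; the grid parameters, function names and the square polygon are assumptions for illustration, not the service's actual algorithm.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Does this edge straddle the horizontal line passing through y?
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, x_min, y_min, cell_size, n_cols, n_rows):
    """Return a grid of 0/1 values; row 0 is the row with the lowest y."""
    grid = []
    for r in range(n_rows):
        y = y_min + (r + 0.5) * cell_size  # y of the cell centres in this row
        row = []
        for c in range(n_cols):
            x = x_min + (c + 0.5) * cell_size  # x of the cell centre
            row.append(1 if point_in_polygon(x, y, polygon) else 0)
        grid.append(row)
    return grid


def to_point_map(grid, x_min, y_min, cell_size):
    """Return the centre coordinates of every cell covered by the polygon."""
    return [(x_min + (c + 0.5) * cell_size, y_min + (r + 0.5) * cell_size)
            for r, row in enumerate(grid)
            for c, value in enumerate(row) if value == 1]


# Illustrative square polygon rasterized on a 4 x 4 grid with unit cells.
square = [(1, 1), (3, 1), (3, 3), (1, 3)]
grid = rasterize(square, x_min=0.0, y_min=0.0, cell_size=1.0, n_cols=4, n_rows=4)
point_map = to_point_map(grid, x_min=0.0, y_min=0.0, cell_size=1.0)
```

Sampling only the cell centre is the simplest rule; a cell that is partly covered but whose centre falls outside the polygon is left empty.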

Maps Comparison

Compares:

• Species Distribution maps

• Environmental layers

• SAR Images

Periodicity and Seasonality

Periodicity: 12 months

Extraction Tools Fourier Analysis

Environmental Signal Processing

Resampling

Spectrogram
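A periodicity such as the 12 months reported above can be found by locating the strongest component of the signal's Fourier spectrum. The following is a minimal sketch with a naive discrete Fourier transform; the function name and the synthetic monthly series are assumptions for the example and do not reproduce the hosted tools.

```python
import cmath
import math


def dominant_period(signal):
    """Return the period (in samples) of the strongest Fourier component.

    Returns None for a constant signal, which has no periodic component.
    """
    n = len(signal)
    mean = sum(signal) / n
    centred = [v - mean for v in signal]  # remove the constant (zero-frequency) term
    best_k, best_power = None, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centred))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return None if best_k is None else n / best_k


# Synthetic series: 10 years of monthly values with a yearly cycle.
monthly = [15 + 5 * math.sin(2 * math.pi * t / 12) for t in range(120)]
period = dominant_period(monthly)  # period in months
```

The naive transform costs O(n²) operations, which is acceptable for short series; longer signals would call for a fast Fourier transform.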

Biodiversity

Occurrence Data from GBIF

Occurrence Data from OBIS

Occurrence Points

∩ Intersection

- Difference

∪ Union

DD Duplicates Deletion

Records Similarity

A: x,y, Event Date, Modif Date, Author, Species Scientific Name

B: x,y, Event Date, Modif Date, Author, Species Scientific Name

Duplicates Deletion
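One way to picture these operations is to define a similarity between two records over the fields listed above and to build intersection, difference, union and duplicates deletion on top of it. The sketch below assumes a simple similarity (the fraction of agreeing fields), a coordinate tolerance and a threshold of 0.8; these choices, the names and the sample records are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    x: float
    y: float
    event_date: str
    modif_date: str
    author: str
    species: str


def similarity(a, b, tolerance=0.01):
    """Fraction of the compared fields that agree (x,y counts as one field)."""
    checks = [
        abs(a.x - b.x) <= tolerance and abs(a.y - b.y) <= tolerance,
        a.event_date == b.event_date,
        a.modif_date == b.modif_date,
        a.author.strip().lower() == b.author.strip().lower(),
        a.species.strip().lower() == b.species.strip().lower(),
    ]
    return sum(checks) / len(checks)


def delete_duplicates(records, threshold=0.8):
    """Keep the first record of every group of mutually similar records."""
    kept = []
    for record in records:
        if all(similarity(record, k) < threshold for k in kept):
            kept.append(record)
    return kept


def intersection(a_records, b_records, threshold=0.8):
    """Records of A that have a similar record in B."""
    return [a for a in a_records
            if any(similarity(a, b) >= threshold for b in b_records)]


def difference(a_records, b_records, threshold=0.8):
    """Records of A that have no similar record in B."""
    return [a for a in a_records
            if all(similarity(a, b) < threshold for b in b_records)]


def union(a_records, b_records, threshold=0.8):
    """All records of A and B, with similar records merged."""
    return delete_duplicates(list(a_records) + list(b_records), threshold)


# Made-up records for illustration; b1 is a near copy of a1.
a1 = Occurrence(10.0, 45.0, "2010-05-01", "2011-01-01", "Smith", "Gadus morhua")
a2 = Occurrence(12.5, 40.0, "2012-07-15", "2012-08-01", "Rossi", "Cetorhinus maximus")
b1 = Occurrence(10.001, 45.002, "2010-05-01", "2011-01-01", "smith", "gadus morhua")
b2 = Occurrence(-3.0, 50.0, "2009-03-02", "2009-03-02", "Lee", "Latimeria chalumnae")

common = intersection([a1, a2], [b1, b2])
only_a = difference([a1, a2], [b1, b2])
merged = union([a1, a2], [b1, b2])
```

Working with a similarity threshold rather than strict equality lets records that differ only in rounding or capitalization be treated as the same observation.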

BiOnym

A flexible workflow approach to taxon name matching

Accounts for:

• Variations in the spelling and interpretation of taxonomic names

• Combination of data from different sources

• Harmonization and reconciliation of Taxa names

Raw Input String. E.g. Gadus morua Lineus 1758

Preprocessing and Parsing

Taxon name Matcher 1, Taxon name Matcher 2, ..., Taxon name Matcher n

Reference Sources: ASFIS, FISHBASE, WoRMS, Other in DwC-A

PostProcessing

Correct Transcriptions: E.g. Gadus morhua (Linnaeus, 1758)
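A single matching step of such a workflow can be sketched with approximate string matching from Python's standard `difflib` module. This is not one of BiOnym's matchers: the parsing rule (keep the first two tokens as genus and species), the cutoff and the tiny reference list are assumptions for illustration, and the authorship is not restored.

```python
import difflib

# Hypothetical reference list; a real run would load names from a reference source.
REFERENCE_NAMES = [
    "Gadus morhua",
    "Cetorhinus maximus",
    "Latimeria chalumnae",
    "Pseudanthias evansi",
]


def parse_name(raw):
    """Preprocessing: keep genus and species, drop authorship and year."""
    tokens = raw.split()
    return " ".join(tokens[:2]).lower()


def match_name(raw, reference=REFERENCE_NAMES, cutoff=0.8):
    """Return the closest reference name, or None if nothing is close enough."""
    parsed = parse_name(raw)
    by_lower = {name.lower(): name for name in reference}
    hits = difflib.get_close_matches(parsed, list(by_lower), n=1, cutoff=cutoff)
    return by_lower[hits[0]] if hits else None


best = match_name("Gadus morua Lineus 1758")
unknown = match_name("Xyz abc")
```

Chaining several matchers with different strategies and reference sources, as in the workflow above, would amount to applying functions like `match_name` in sequence and merging their candidates in a postprocessing step.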

Trendylyzer

• Fill some knowledge gaps on marine species

• Account for sampling biases

• Define trends for common species

Plankton regime shift

Herring recovered after the fish ban

Can we recognize big changes in species presence?

Life

Length-Weight Relationships

Calculate the a and b parameters for 14 230 species by means of Bayesian Methods
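The a and b parameters belong to the standard length-weight relationship W = a · L^b, which becomes linear after taking logarithms: log10(W) = log10(a) + b · log10(L). The sketch below estimates them by ordinary least squares on the log-transformed data. This is a simplification and not the Bayesian method used in the experiment; the function name and the synthetic measurements are assumptions for the example.

```python
import math


def fit_lwr(lengths, weights):
    """Least-squares fit of log10(W) = log10(a) + b * log10(L); returns (a, b)."""
    xs = [math.log10(length) for length in lengths]
    ys = [math.log10(weight) for weight in weights]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope of the log-log regression line
    a = 10 ** (mean_y - b * mean_x)      # intercept, transformed back
    return a, b


# Synthetic measurements generated from a = 0.01 and b = 3 (no noise).
lengths = [10.0, 20.0, 30.0, 40.0, 50.0]
weights = [0.01 * length ** 3 for length in lengths]
a, b = fit_lwr(lengths, weights)
```

A Bayesian treatment would additionally place prior distributions on a and b, for instance derived from related species, which matters for species with few measurements.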

Approach:

• Collaborative development with the final user

• Integration of user’s R Scripts

• Usage of Cloud computing for R Scripts

• Periodic runs

• Porting to the D4Science Statistical Manager allowed the scripts to run in a distributed fashion

• The time reduction was from 20 days to 11 hours: a 97.7% reduction

Functions Simulation - Spawning Stock Biomass vs Recruits

Estimate biological limits for 50 Northeast Atlantic fish stocks

• Use real measures

• Rely on previous expert knowledge

• Use Bayesian models to combine information

[Figure labels: Re-estimated SSB limit, Re-estimated HS, Rule-based HS, Re-estimated precautionary limit]

Future Work

Plan

• Make the Statistical Manager Algorithms accessible through the OGC WPS standard (currently available via SOAP and Java API)

• Invoke the algorithms from a Workflow Management System (e.g. Taverna)

• Expand the system with new algorithms
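As a sketch of what invoking an algorithm through OGC WPS could look like, the example below builds an Execute request in the key-value-pair encoding of WPS 1.0.0. The endpoint URL, the algorithm identifier and the input names are hypothetical; the slides do not describe the actual service interface.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def wps_execute_url(endpoint, identifier, inputs):
    """Build a WPS 1.0.0 Execute request URL using key-value-pair encoding."""
    # WPS encodes the inputs as name=value pairs separated by semicolons.
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    query = urlencode({
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": identifier,
        "datainputs": data_inputs,
    })
    return f"{endpoint}?{query}"


# Hypothetical endpoint, algorithm identifier and parameters.
url = wps_execute_url(
    "https://example.org/wps",
    "DBSCAN",
    {"eps": 1.5, "min_pts": 3},
)
params = parse_qs(urlparse(url).query)  # decode the query to inspect it
```

A workflow management system would issue requests of this kind for each step and pass the outputs of one process as inputs to the next.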

Thank you