Providing Statistical Algorithms as-a-Service
Gianpaolo Coro, Pasquale Pagano,
Leonardo Candela
ISTI-CNR, Pisa, Italy
Statistical Manager is a set of web services that aim to:
• Help scientists solve statistical problems in marine, biological and climate science
• Supply ready-made, state-of-the-art algorithms as-a-Service
• Perform calculations on Cloud computing resources, transparently to the users
• Share inputs, results, parameters and comments with colleagues through Virtual Research Environments in the D4Science e-Infrastructure
Diagram labels: Statistical Manager; D4Science Computational Facilities; Sharing
Setup and execution
Architecture
Internal Work
Resources and Sharing
Statistical Manager - Interface
Experiment Execution
Computations Check
Summary of the Input, Output and Parameters of the experiment
Data Space - Sharing and Import
Hosted Algorithms
Application Fields
o Ecology
o Environment
o Biodiversity
o Life
Ecology
Niche Modelling
• AquaMaps – Suitable Habitat
• AquaMaps – Native Habitat
• AquaMaps for 2050
• Artificial Neural Networks
• AquaMaps - ANN
Gadus morhua
AquaMaps - Suitable Habitat
Outlier Detection
Presence Points
Density-based Clustering and Outlier Detection
Cetorhinus maximus
Distance-based Clustering
K-Means
X-Means
DBSCAN
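As an illustration of density-based clustering with outlier detection on presence points, here is a minimal DBSCAN sketch in plain Python. It is not the implementation hosted by the Statistical Manager: the function name, the parameters eps and min_pts, and the use of plain Euclidean distance on (x, y) coordinates are assumptions made for this example.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for an outlier.

    Illustrative sketch only: distances are Euclidean on raw (x, y) pairs.
    """
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # All points (including i itself) within eps of point i.
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1                   # point i is a core point: new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is also a core point: keep expanding
    return labels
```

Points labelled -1 belong to no dense group and are the candidates to review as outliers.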
Climate Change Effects on Species
Estimated impact of climate change over 20 years on 11549 species.
Bioclimate HSpec
Overall occupancy in time
Pseudanthias evansi
The occupancy of Pseudanthias evansi decreases in Area 71 but increases in Area 77.
Similarity between habitats
Habitat Representativeness Score:
1. Measures the similarity between the environmental features of two areas
2. Assesses the quality of models and environmental features
Latimeria chalumnae
HRS = 10.5
Environment
Rasterization
A polygonal map is transformed into a raster map or into a point map
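The transformation of a polygon into a raster map and then into a point map can be sketched as follows. This is only an illustration under stated assumptions: a single polygon without holes, a regular grid sampled at cell centres, and hypothetical function names (point_in_polygon, rasterize, raster_to_points) that do not come from the Statistical Manager.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def rasterize(polygon, x_min, y_min, cell, n_cols, n_rows):
    """Return a grid (list of rows, row 0 at y_min) with 1 where the
    cell centre falls inside the polygon and 0 elsewhere."""
    grid = []
    for row in range(n_rows):
        cy = y_min + (row + 0.5) * cell
        grid.append([
            1 if point_in_polygon(x_min + (col + 0.5) * cell, cy, polygon) else 0
            for col in range(n_cols)
        ])
    return grid

def raster_to_points(grid, x_min, y_min, cell):
    """Return the centre coordinates of every cell set to 1 (the point map)."""
    return [
        (x_min + (col + 0.5) * cell, y_min + (row + 0.5) * cell)
        for row, values in enumerate(grid)
        for col, value in enumerate(values)
        if value == 1
    ]
```

Sampling at cell centres is the simplest choice; a finer grid or an area-coverage rule would follow the polygon boundary more closely.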
Maps Comparison
Compares:
• Species distribution maps
• Environmental layers
• SAR images
Periodicity and Seasonality
Periodicity: 12 months
Extraction Tools
Fourier Analysis
Environmental Signal Processing
Resampling
Spectrogram
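To show how Fourier analysis can reveal a periodicity such as the 12 months reported above, here is a minimal sketch that computes a discrete Fourier transform in plain Python and returns the period of the strongest frequency. The function name dominant_period and the evenly sampled monthly series are assumptions for this example, not part of the Statistical Manager.

```python
import cmath
import math

def dominant_period(signal):
    """Return the period (in samples) of the strongest frequency component,
    or None if the signal is constant. Assumes evenly spaced samples."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [x - mean for x in signal]   # remove the constant (zero-frequency) part
    best_k, best_power = None, 0.0
    for k in range(1, n // 2 + 1):         # positive frequencies only
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centred))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return None if best_k is None else n / best_k

# Ten years of hypothetical monthly values with a yearly cycle
# and a weaker half-yearly one.
monthly = [10 + 5 * math.sin(2 * math.pi * t / 12)
           + 0.5 * math.cos(2 * math.pi * t / 6)
           for t in range(120)]
period_in_months = dominant_period(monthly)
```

This direct transform costs O(n^2) operations, which is fine for short series; a fast Fourier transform would be the usual choice for long signals.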
Biodiversity
Occurrence Points
Occurrence Data from GBIF
Occurrence Data from OBIS
∩ Intersection
− Difference
∪ Union
Duplicates Deletion (DD)
A: x,y; Event Date; Modif Date; Author; Species Scientific Name
B: x,y; Event Date; Modif Date; Author; Species Scientific Name
Records Similarity
Duplicates Deletion
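The operations on occurrence points listed above (intersection, difference, union and duplicates deletion driven by record similarity) can be sketched as below. The similarity rule is an assumption made for this example: two records count as the same when species name and event date agree and the coordinates differ by at most a tolerance. The class and function names are hypothetical as well.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Occurrence:
    x: float
    y: float
    event_date: str
    modif_date: str
    author: str
    species: str

def similar(a, b, tol=0.01):
    """Assumed similarity rule: same species (case-insensitive), same event
    date, and coordinates within tol of each other."""
    return (a.species.strip().lower() == b.species.strip().lower()
            and a.event_date == b.event_date
            and abs(a.x - b.x) <= tol
            and abs(a.y - b.y) <= tol)

def delete_duplicates(records, tol=0.01):
    """Keep the first record of every group of similar records."""
    kept = []
    for record in records:
        if not any(similar(record, k, tol) for k in kept):
            kept.append(record)
    return kept

def intersection(a, b, tol=0.01):
    """Records of a that have a similar record in b."""
    return [r for r in a if any(similar(r, s, tol) for s in b)]

def difference(a, b, tol=0.01):
    """Records of a that have no similar record in b."""
    return [r for r in a if not any(similar(r, s, tol) for s in b)]

def union(a, b, tol=0.01):
    """All records of a and b, with similar records merged."""
    return delete_duplicates(list(a) + list(b), tol)
```

Comparing every pair of records is quadratic in the number of records; for large datasets a spatial index or a hash on rounded coordinates would be a natural refinement.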
BiOnym
A flexible workflow approach to taxon name matching
Accounts for:
• Variations in the spelling and interpretation of taxonomic names
• Combination of data from different sources
• Harmonization and reconciliation of taxa names
Workflow:
Raw Input String. E.g. Gadus morua Lineus 1758
Preprocessing and Parsing
Taxon name Matcher 1
Taxon name Matcher 2
Taxon name Matcher n
Reference Sources: ASFIS, FISHBASE, WoRMS, Other in DwC-A
PostProcessing
Correct Transcriptions: E.g. Gadus morhua (Linnaeus, 1758)
Trendylyzer
• Fill some knowledge gaps on marine species
• Account for sampling biases
• Define trends for common species
Plankton regime shift
Herring recovered after the fishing ban
Can we recognize big changes in species presence?
Life
Length-Weight Relationships
Calculate the a and b parameters for 14 230 species by means of Bayesian methods
Approach:
• Collaborative development with the final user
• Integration of the user's R scripts
• Usage of Cloud computing for R scripts
• Periodic runs
• Porting to the D4Science Statistical Manager allowed the scripts to run in a distributed fashion
• Running time dropped from 20 days to 11 hours, a 95.4% reduction
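For readers unfamiliar with the a and b parameters: a length-weight relationship is commonly written W = a * L^b, which becomes a straight line after taking logarithms, log W = log a + b * log L. The sketch below estimates a and b with ordinary least squares on the log-transformed values. This is a deliberately simpler technique than the Bayesian methods used in the work presented here, and the function name fit_lwr and the sample data are assumptions for illustration.

```python
import math

def fit_lwr(lengths, weights):
    """Estimate (a, b) in W = a * L**b by least squares on log10 values.

    Simplified illustration; the slides describe a Bayesian estimation instead.
    """
    xs = [math.log10(length) for length in lengths]
    ys = [math.log10(weight) for weight in weights]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Slope of the regression line is b; the intercept is log10(a).
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = 10 ** (mean_y - b * mean_x)
    return a, b

# Hypothetical measurements generated from a = 0.01 and b = 3.
sample_lengths = [10.0, 20.0, 30.0, 40.0, 50.0]
sample_weights = [0.01 * length ** 3 for length in sample_lengths]
a_hat, b_hat = fit_lwr(sample_lengths, sample_weights)
```

On noise-free data like this the fit recovers the generating parameters; a Bayesian approach additionally lets prior knowledge about a and b inform the estimate when a species has few measurements.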
Functions Simulation - Spawning Stock Biomass vs Recruits
Estimate biological limits for 50 Northeast Atlantic fish stocks
• Use real measurements
• Rely on previous expert knowledge
• Use Bayesian models to combine information
Figure labels: Re-estimated SSB limit; Re-estimated HS; Rule-based HS; Re-estimated precautionary limit
Future Work
Plan
• Make the Statistical Manager algorithms accessible through the OGC WPS standard (currently available via SOAP and Java API)
• Invoke the algorithms from a Workflow Management System (e.g. Taverna)
• Expand the system with new algorithms
Thank you