ECOTAXA, a web based application for collaborating on...

Marc Picheral¹, Sebastien Colin², Jean Olivier Irisson¹, Amanda Elineau¹, Colomban de Vargas² & Lars Stemmann¹

¹ Sorbonne Universités, UPMC Université Paris 06, CNRS, Laboratoire d’Océanographie de Villefranche (LOV), Observatoire Océanologique, 06230 Villefranche-sur-Mer, France
² Sorbonne Universités, UPMC Université Paris 06, CNRS, Laboratoire Adaptation et Diversité en Milieu Marin, Station Biologique de Roscoff, 29680 Roscoff, France

Over the last 25 years, our team has developed laboratory and in situ imaging instruments to study marine particles and zooplankton. The Zooscan (Gorsky, 2010) and the Underwater Vision Profiler (UVP; Picheral, 2010) are now used worldwide (102 Zooscans and 19 UVPs). We also developed Zooprocess, an ImageJ-based software tool chain for analyzing the acquired images. This suite can now process many kinds of images and has been used with other instruments such as ISIIS and the FlowCam. It can pilot the imaging instrument, subtract the image background, segment regions of interest, measure ~50 properties on each object, and create labelled vignettes. These properties and vignettes were then used by the Plankton Identifier application to classify objects automatically with machine learning. Plankton Identifier assisted users in creating learning sets and in validating the identifications proposed by the classification algorithm; it performed well with Random Forests.

Plankton Identifier paved the way for a new web-based application enabling a network approach to plankton identification on images. Ecotaxa (http://ecotaxa.obs-vlfr.fr/ & http://ecotaxa.sb-roscoff.fr/) was specified in 2015 in collaboration with the CNRS EPEP team in Roscoff, within the framework of the Tara Oceans expeditions, and released in 2016.
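The Zooprocess steps listed above (background subtraction, segmentation of regions of interest, per-object measurements, vignette creation) can be illustrated with a minimal pure-Python sketch. It is not Zooprocess, which is built on ImageJ: the function names, the assumption of dark objects on a bright background, 4-connectivity, the threshold and the minimum area are all hypothetical choices, and only three of the ~50 properties are computed.

```python
from collections import deque

# Illustrative pipeline only: function names, the dark-on-bright assumption
# and the default thresholds are hypothetical, not taken from Zooprocess.


def subtract_background(image, background):
    """Object signal = background minus image, clipped at 0 (dark objects assumed)."""
    return [[max(0, b - p) for p, b in zip(img_row, bg_row)]
            for img_row, bg_row in zip(image, background)]


def segment(signal, threshold):
    """Group 4-connected pixels whose signal exceeds `threshold` into regions."""
    h, w = len(signal), len(signal[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or signal[y][x] <= threshold:
                continue
            # Breadth-first flood fill collects one region of interest.
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and signal[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(pixels)
    return regions


def measure(pixels, signal):
    """Three example properties; Zooprocess reports ~50 per object."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return {
        "area": len(pixels),
        "bbox": (min(ys), min(xs), max(ys), max(xs)),  # (top, left, bottom, right)
        "mean_signal": sum(signal[y][x] for y, x in pixels) / len(pixels),
    }


def crop_vignette(image, bbox, margin=1):
    """Cut a small image around the object's bounding box."""
    top, left, bottom, right = bbox
    h, w = len(image), len(image[0])
    return [row[max(0, left - margin):min(w, right + margin + 1)]
            for row in image[max(0, top - margin):min(h, bottom + margin + 1)]]


def process(image, background, threshold=20, min_area=2):
    """Background subtraction -> segmentation -> measurements + vignettes."""
    signal = subtract_background(image, background)
    objects = []
    for pixels in segment(signal, threshold):
        if len(pixels) < min_area:
            continue  # drop specks smaller than min_area pixels
        props = measure(pixels, signal)
        props["vignette"] = crop_vignette(image, props["bbox"])
        objects.append(props)
    return objects


if __name__ == "__main__":
    bg = [[200] * 6 for _ in range(5)]   # flat, bright background
    img = [row[:] for row in bg]
    for y, x in [(1, 1), (1, 2), (2, 1), (2, 2)]:
        img[y][x] = 100                  # a 2x2 dark "organism"
    img[4][5] = 150                      # a 1-pixel speck
    for obj in process(img, bg):
        print(obj["area"], obj["bbox"], obj["mean_signal"])
```

Run as a script, the toy example keeps the 4-pixel object, discards the 1-pixel speck and prints "4 (1, 1, 2, 2) 100.0". In a real tool chain the measured properties would go to a data file alongside the vignettes, which is what a classifier such as a Random Forest is then trained on.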
Like its predecessor, Ecotaxa handles images of individual organisms, proposes identifications using machine learning, and keeps all metadata associated with each vignette, from acquisition to final identification. However, Ecotaxa differs in its founding principles: (i) the identification of organisms is collaborative, through the internet; (ii) every change, including the simple confirmation of a correct identification, is explicit and recorded in a robust relational database; (iii) identifications are based on a universal taxonomy (http://unieuk.org/) that links the morphology of organisms with genomic information; (iv) Ecotaxa can easily import image datasets from any instrument. The application already hosts over 30 million plankton images, about 30% of which have been verified by experts. We are entering a round of updates that will allow communication between Ecotaxa instances, simplify the registration of new users, make it easier to select subsets of objects, and offer new deep learning algorithms for automatic classification. We are also publishing instrument-specific taxonomic guides to homogenize the sorting of images by different experts worldwide.
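Principle (ii) amounts to an append-only audit trail kept next to the current state of each object. The sketch below illustrates the idea with Python's built-in sqlite3 module; the schema, the status values and the helpers annotate() and history() are hypothetical and do not describe Ecotaxa's actual database.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: table names, column names and status values are
# illustrative only and do not reproduce Ecotaxa's actual database.
SCHEMA = """
CREATE TABLE object (
    id     INTEGER PRIMARY KEY,
    taxon  TEXT,            -- current identification (NULL if none yet)
    status TEXT NOT NULL    -- e.g. 'unclassified', 'predicted', 'validated'
);
CREATE TABLE annotation_history (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    object_id   INTEGER NOT NULL REFERENCES object(id),
    taxon       TEXT NOT NULL,
    status      TEXT NOT NULL,
    annotator   TEXT NOT NULL,  -- expert login or classifier name
    recorded_at TEXT NOT NULL   -- UTC timestamp, ISO 8601
);
"""


def annotate(conn, object_id, taxon, status, annotator):
    """Update the current identification and append a history row atomically.

    A history row is written even when the taxon does not change, so a
    simple confirmation of a correct identification is recorded explicitly.
    """
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "UPDATE object SET taxon = ?, status = ? WHERE id = ?",
            (taxon, status, object_id))
        conn.execute(
            "INSERT INTO annotation_history "
            "(object_id, taxon, status, annotator, recorded_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (object_id, taxon, status, annotator, now))


def history(conn, object_id):
    """All successive annotations of one object, oldest first."""
    return conn.execute(
        "SELECT taxon, status, annotator FROM annotation_history "
        "WHERE object_id = ? ORDER BY id",
        (object_id,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO object (id, taxon, status) VALUES (1, NULL, 'unclassified')")
    annotate(conn, 1, "Copepoda", "predicted", "random_forest")  # machine proposal
    annotate(conn, 1, "Copepoda", "validated", "expert_a")       # explicit confirmation
    for row in history(conn, 1):
        print(row)
```

Running it prints two history rows for the same object, one from the (hypothetical) classifier and one from the expert, even though the taxon never changed. Wrapping both statements in one transaction keeps the current identification and its history consistent with each other.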

ECOTAXA, a web-based application for collaborating on large plankton image datasets

• FREE of use/installation
• CLASSIFICATION (annotation) of ORGANISMS using individual images
• COLLABORATIVE annotation via WEB interface (all successive annotations are recorded)
• OPTIMIZED for large datasets (> 20 000 000 images per instance)
• USES classification tools to assist taxonomists in classifying large datasets:
  • Random Forest
  • Deep Learning
• EXPLICIT VALIDATION of the prediction
• PROVIDES A REFERENCE TAXONOMIC framework for SORTING: UniEuk

Application coded by www.altidev.com

Initial image processing: segmentation, ROI, extraction of features => data file + vignettes
Instruments: UVP5, Zooscan, FlowCam, ISIIS, IFCB, automated microscope

www.ecotaxa.obs-vlfr.fr
www.ecotaxa.sb-roscoff.fr

Today: public exploration of validated images across the oceans

Powerful filters:
• Taxonomy
• Date/time/month/depth/sample…
• Annotators (experts)

Automatic classification:
• Random Forest
• Deep Learning (soon)

POWERFUL manual annotation:
• > 20 000 images/day!!!
• All operations recorded
• Explicit validation

Instrument datasets: IFCB, FlowCam, Zooscan, ISIIS, Camera, UVP5, HCS1, Zoocam…
> 30 × 10⁶ images hosted / > 10 × 10⁶ annotated (prediction + validation by experts)

Today, the MOST IMPORTANT worldwide dataset of annotated plankton images!

THE BOTTLENECK: AUTOMATIC CLASSIFICATION / PREDICTION & MANUAL ANNOTATION / VALIDATION
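Why manual validation remains the bottleneck can be seen from a back-of-the-envelope estimate built only on the poster's round figures. It assumes, as a simplification, that about 30 × 10⁶ hosted minus 10 × 10⁶ annotated images still await expert validation (consistent with the abstract's "about 30%" verified), and that the quoted rate of 20 000 images per day applies to a single annotator; both poster figures are lower bounds, so the result is only an order of magnitude.

```latex
T \approx \frac{N_{\mathrm{hosted}} - N_{\mathrm{validated}}}{r}
  = \frac{(30 - 10) \times 10^{6}\ \text{images}}{2 \times 10^{4}\ \text{images/day}}
  = 10^{3}\ \text{annotator-days}
```

That is close to three years of daily sorting by one expert, without counting the images uploaded in the meantime, which is why better automatic prediction and faster validation go hand in hand.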


Source: marine-imaging-workshop.com/documents/miw17/presentations/public/...
