2015 06 hcmr - emodnet-eubon - associating organisms with their environments - low res
EMODNET/EUBON – 10th June 2015 – HCMR, Crete, Greece
Evangelos Pafilis
Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC)
Hellenic Centre for Marine Research (HCMR), Heraklion Crete, Greece
[email protected], http://epafilis.info
Associating Organisms With Their Environment:
Automated Mining, Assisted Curation and
Exploratory Visualization Approaches
http://eol.org/data_objects/31415353
Information in Free Text
Scientific web pages
Microbes are key players in both healthy and degraded coral reefs. A combination of metagenomics, microscopy, culturing, and water chemistry were used to characterize microbial communities on four coral atolls in the Northern Line Islands, central Pacific.
Source: http://metagenomics.anl.gov/linkin.cgi?metagenome=4440039.3 (“Project Description”, Dinsdale et al., 2008)
In-house documents
Microbial mat samples were collected from the hydrothermal vent field located in the Kolumbo submarine volcanic crater, off the coast of the island of Santorini. The bacteria and archaea community composition was evaluated further via shotgun metagenomics analysis.
Source: In-house HCMR document (Polymenakou, Oulas, et al.)
Literature (abstracts, full-text articles, legends)
Figure 1. Sampling sites on a cross-slope transect. [….] Oceanographically, the stations represent the abyssal plain (GeoB12815), the continental rise (GeoB12808, GeoB12811), the continental slope (GeoB12803, GeoB12802), the shelf break (GeoB12807) and the shelf (GeoB12806). Surface sediments were recovered by gravity and multi-coring. [.…]
Source: http://onlinelibrary.wiley.com/doi/10.1111/1758-2229.12264/full (Lagostina et al., 2015)
terrestrial, aquatic, marine, lagoon, coral reef, sediment, freshwater, soil
ENVIRONMENTS
http://environments.hcmr.gr
http://environments-eol.blogspot.gr/
● Dictionary-based, open source
● Environment Ontology
● Fast (4000 PubMed abstracts / sec) *
● Based on the SPECIES name recognition tagger (Pafilis et al., PLOS ONE)
● E600 gold standard: EnvO-based corpus of EOL species pages
● Recognition accuracy, mention level:
  - F1: 82.0%
  - 87.1% of the TPs: exact ID among the predicted ones
*: based on a single-thread run on an Intel 2.27 GHz machine with 24 GB RAM, processing a set of 536,052 abstracts
Pafilis,E. et al. (2015) ENVIRONMENTS and EOL: identification of Environment Ontology terms in text and the annotation of the Encyclopedia of Life. Bioinformatics, 10.1093/bioinformatics/btv045
biome
environmental feature
environmental material
environmental condition
habitat … … … … …
Based on slides by Dr. Pier Luigi Buttigieg, AWI, Bremerhaven, Germany
http://environmentontology.org ~1600 terms, June 2013
EnvO: source of environment descriptor names and synonyms
ENVIRONMENTS – Improving Accuracy
● Increasing matches in text
  ● Orthographic variation supported, e.g. freshwater, fresh water, and fresh-water
  ● Case-insensitive matching
  ● Synonym generation to reflect the way environment descriptive terms are mentioned in text (both generic and EnvO specific)
● Preventing overmatching (i.e. avoiding increased FP)
  ● “Stopword list” (e.g. spring, well, range)
Synonym generation (Action: Example)
  Add a variant in which non-informative words have been removed: epipelagic zone → epipelagic, estuarine biome → estuarine
  Plural form addition: sediment → sediments
  Adjective form addition: lagoon → lagoonal
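The variant rules on this slide can be illustrated with a minimal Python sketch. It is not the ENVIRONMENTS implementation: the function name `generate_variants` and the word lists are hypothetical, seeded only with the examples shown on the slide, and the plural rule is deliberately naive.

```python
import re

# Hypothetical word lists, seeded only with the examples from the slide;
# the actual lists used by ENVIRONMENTS are not given there.
NON_INFORMATIVE = {"zone", "biome"}
STOPWORDS = {"spring", "well", "range"}
ADJECTIVES = {"lagoon": "lagoonal"}  # adjective forms need a curated mapping


def generate_variants(name):
    """Expand one dictionary name into the set of strings to match in text."""
    base = name.strip().lower()  # case-insensitive matching
    words = re.split(r"[\s-]+", base)
    # Orthographic variation: spaced, hyphenated and joined forms.
    variants = {" ".join(words)}
    if len(words) > 1:
        variants |= {"-".join(words), "".join(words)}
    # Naive plural form addition (real pluralisation is more involved).
    variants |= {v + "s" for v in variants if not v.endswith("s")}
    # Variant with non-informative words removed.
    kept = [w for w in words if w not in NON_INFORMATIVE]
    if kept and len(kept) < len(words):
        variants.add(" ".join(kept))
    # Adjective form addition from the curated mapping.
    if base in ADJECTIVES:
        variants.add(ADJECTIVES[base])
    # Stopword list: drop ambiguous everyday words to limit false positives.
    return {v for v in variants if v not in STOPWORDS}


print(sorted(generate_variants("fresh water")))
print(sorted(generate_variants("epipelagic zone")))
```

The sketch over-generates on purpose (it also emits joined forms such as "epipelagiczone"); a production dictionary would need curation on top of such rules.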
Scope
EnvO parts not included: species, tissues, foods
Limitations – Known Issues
Negation not supported
Conflicts with anatomy terms (e.g. mouth, blowhole)
ENVIRONMENTS – Sample Output
File Name                        Start offset  End offset  Match text  EnvO ID
eol_documents_ascii_nonHTML.txt  346289845     346289853   mud flats   ENVO:00000192
eol_documents_ascii_nonHTML.txt  346289845     346289853   mud flats   ENVO:00002297
eol_documents_ascii_nonHTML.txt  346289845     346289853   mud flats   ENVO:00000043
eol_documents_ascii_nonHTML.txt  346289845     346289853   mud flats   ENVO:00000000
eol_documents_ascii_nonHTML.txt  346289845     346289853   mud flats   ENVO:00000012
eol_documents_ascii_nonHTML.txt  346289871     346289873   mud         ENVO:01000001
eol_documents_ascii_nonHTML.txt  346289871     346289873   mud         ENVO:00010483
eol_documents_ascii_nonHTML.txt  346289905     346289910   mounds      ENVO:00000180
eol_documents_ascii_nonHTML.txt  346289905     346289910   mounds      ENVO:00000191
eol_documents_ascii_nonHTML.txt  346289905     346289910   mounds      ENVO:00002297
eol_documents_ascii_nonHTML.txt  346289905     346289910   mounds      ENVO:00000176
eol_documents_ascii_nonHTML.txt  346289905     346289910   mounds      ENVO:00000000
eol_documents_ascii_nonHTML.txt  346289905     346289910   mounds      ENVO:00000477
Tags corresponding to “Habitat” text data object: http://eol.org/data_objects/31415353 of EOL Taxon Phoenicopterus ruber (Greater Flamingo): http://eol.org/pages/913221
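A small Python sketch of how such output could be read, assuming the five columns are tab-separated (the separator is an assumption; the slide only shows the column order). The names `Tag`, `parse_tag_line` and `group_by_mention` are hypothetical helpers, and the sample rows are copied from the output above.

```python
from collections import defaultdict, namedtuple

Tag = namedtuple("Tag", "file_name start end match_text envo_id")


def parse_tag_line(line):
    """Parse one tagger output line (assumed tab-separated) into a Tag."""
    file_name, start, end, match_text, envo_id = line.rstrip("\n").split("\t")
    return Tag(file_name, int(start), int(end), match_text, envo_id)


def group_by_mention(tags):
    """Collect all EnvO IDs reported for each (start, end, text) mention."""
    mentions = defaultdict(list)
    for tag in tags:
        mentions[(tag.start, tag.end, tag.match_text)].append(tag.envo_id)
    return dict(mentions)


sample = [
    "eol_documents_ascii_nonHTML.txt\t346289845\t346289853\tmud flats\tENVO:00000192",
    "eol_documents_ascii_nonHTML.txt\t346289845\t346289853\tmud flats\tENVO:00002297",
    "eol_documents_ascii_nonHTML.txt\t346289871\t346289873\tmud\tENVO:01000001",
]
tags = [parse_tag_line(line) for line in sample]
mentions = group_by_mention(tags)
for (start, end, text), ids in mentions.items():
    # In these rows, end - start + 1 equals the length of the matched text,
    # which suggests that the end offset is inclusive.
    print(text, end - start + 1 == len(text), ids)
```

Grouping by offsets makes it easy to see that one mention in the text can carry several EnvO identifiers.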
Update to EOLTAGS 346289845
Traversing all IS_A, PART_OF relationships in EnvO
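Traversing IS_A and PART_OF relationships amounts to collecting every ancestor of a matched term. The sketch below shows the idea as a breadth-first walk over a toy graph; the `TOY:` identifiers and their edges are hypothetical and do not reproduce actual EnvO relationships, and `with_ancestors` is an illustrative name, not part of the tagger.

```python
from collections import deque

# Toy parent links (child -> parents reached via is_a or part_of).
# Identifiers and edges are hypothetical, not actual EnvO content.
PARENTS = {
    "TOY:mudflat": ["TOY:tidal_flat", "TOY:coast"],
    "TOY:tidal_flat": ["TOY:landform"],
    "TOY:coast": ["TOY:landform"],
    "TOY:landform": ["TOY:root"],
}


def with_ancestors(term, parents=PARENTS):
    """Return the term followed by every ancestor, in breadth-first order."""
    seen = [term]
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:  # a term reachable by two paths is kept once
                seen.append(parent)
                queue.append(parent)
    return seen


print(with_ancestors("TOY:mudflat"))
```

Under this reading, a single match yields one output row for the matched term and one for each of its ancestors.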
Parr CS, et al. The Encyclopedia of Life v2: Providing Global Access to Knowledge About Life on Earth (2014) Biodiversity Data Journal 2: e1079
http://eol.org/
http://eol.org/info/discover_what
• Encyclopedia of Life (EOL) http://www.eol.org
• One-stop shop for biodiversity knowledge
• Over 3 million taxa
ID: ENVO:00000192 Name: mudflat
ID: ENVO:00000020 Name: lake
http://eol.org/data_objects/31415353
Information in Free Text
Species Descriptions (e.g. “Biology Description”, “Ecology”, “Habitat”)
http://rs.tdwg.org/ontology/voc/SPMInfoItems#Biology
http://rs.tdwg.org/ontology/voc/SPMInfoItems#Conservation
http://rs.tdwg.org/ontology/voc/SPMInfoItems#Description
http://rs.tdwg.org/ontology/voc/SPMInfoItems#Dispersal
http://rs.tdwg.org/ontology/voc/SPMInfoItems#Distribution
http://rs.tdwg.org/ontology/voc/SPMInfoItems#Ecology
http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat
http://rs.tdwg.org/ontology/voc/SPMInfoItems#LifeCycle
http://rs.tdwg.org/ontology/voc/SPMInfoItems#Migration
http://rs.tdwg.org/ontology/voc/SPMInfoItems#Reproduction
http://rs.tdwg.org/ontology/voc/SPMInfoItems#TrophicStrategy
http://www.eol.org/voc/table_of_contents#Wikipedia
More info: http://eol.org/info/98 (“EOL Subject Types”)
ENV-EOL Annotation .tsv
Columns: EOL Taxon ID, EOL Data Object;Subject Type, Match text, EnvO ID (traversing all IS_A, PART_OF relationships in EnvO)
EOL:913221  31415353;http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat  mud flats  ENVO:00000192
EOL:913221  31415353;http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat  mud flats  ENVO:00002297
EOL:913221  31415353;http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat  mud flats  ENVO:00000043
EOL:913221  31415353;http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat  mud flats  ENVO:00000000
EOL:913221  31415353;http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat  mud flats  ENVO:00000012
EOL:913221  31415353;http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat  mud        ENVO:01000001
EOL:913221  31415353;http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat  mud        ENVO:00010483
EOL:913221  31415353;http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat  mounds     ENVO:00000180
EOL:913221  31415353;http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat  mounds     ENVO:00000191
EOL:913221  31415353;http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat  mounds     ENVO:00002297
EOL:913221  31415353;http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat  mounds     ENVO:00000176
EOL:913221  31415353;http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat  mounds     ENVO:00000000
EOL:913221  31415353;http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat  mounds     ENVO:00000477
Annotations corresponding to “Habitat” text data object: http://eol.org/data_objects/31415353 of EOL Taxon Phoenicopterus ruber (Greater Flamingo): http://eol.org/pages/913221
ENV-EOL Annotation .tsv
Columns: EOL Taxon ID, EOL Data Object;Subject Type, Match text, EnvO ID (direct matches only, no hierarchy traversal)
EOL:913221  31415353;http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat  mud flats        ENVO:00000192
EOL:913221  31415353;http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat  mud              ENVO:01000001
EOL:913221  31415353;http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat  mounds           ENVO:00000180
EOL:913221  31415353;http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat  mud flats        ENVO:00000192
EOL:913221  31415353;http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat  estuaries        ENVO:00000045
EOL:913221  31415353;http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat  bodies of water  ENVO:00000063
EOL:913221  31415353;http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat  coastal areas    ENVO:00000303
EOL:913221  31415353;http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat  inlets           ENVO:00000475
EOL:913221  31415353;http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat  rivers           ENVO:00000887
EOL:913221  31415353;http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat  rivers           ENVO:00000890
EOL:913221  31415353;http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat  lakes            ENVO:00000020
EOL:913221  31415353;http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat  coastal          ENVO:00000303
EOL:913221  31415353;http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat  lagoons          ENVO:00000038
EOL:913221  31415353;http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat  volcanic         ENVO:00000354
EOL:913221  31415353;http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat  lakes            ENVO:00000020
Annotations corresponding to “Habitat” text data object: http://eol.org/data_objects/31415353 of EOL Taxon Phoenicopterus ruber (Greater Flamingo): http://eol.org/pages/913221
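As a rough illustration of working with these annotation rows, the Python sketch below splits a row into its fields and reduces repeated tags to unique taxon and EnvO ID pairs. It assumes tab-separated columns with the data object ID and subject type joined by a semicolon, as the rows above suggest; `parse_annotation` and `unique_pairs` are hypothetical names.

```python
def parse_annotation(line):
    """Split one annotation row (assumed tab-separated) into named fields."""
    taxon, source, match_text, envo_id = line.rstrip("\n").split("\t")
    data_object, subject_type = source.split(";", 1)
    return {
        "taxon": taxon,
        "data_object": data_object,
        # Keep only the fragment after '#', e.g. "Habitat".
        "subject_type": subject_type.rsplit("#", 1)[-1],
        "match_text": match_text,
        "envo_id": envo_id,
    }


def unique_pairs(rows):
    """Return the set of distinct (EOL taxon, EnvO ID) pairs."""
    return {(row["taxon"], row["envo_id"]) for row in rows}


HABITAT = "31415353;http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat"
sample = [
    f"EOL:913221\t{HABITAT}\tmud flats\tENVO:00000192",
    f"EOL:913221\t{HABITAT}\tmud\tENVO:01000001",
    f"EOL:913221\t{HABITAT}\tmud flats\tENVO:00000192",
    f"EOL:913221\t{HABITAT}\tlakes\tENVO:00000020",
]
rows = [parse_annotation(line) for line in sample]
pairs = unique_pairs(rows)
print(len(rows), "tags,", len(pairs), "unique taxon-term pairs")
```

The same term can be tagged several times in one text (here "mud flats" and "lakes" both repeat), so the number of tags and the number of unique pairs generally differ.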
Download
ENVIRONMENTS-EOL Annotations (17th Oct 2014)
• 234,582 EOL Taxa
• 1,945,383 Tags
• 1,077,522 Unique EOL Taxon – EnvO ID pairs
ENVIRONMENTS
• Home Page: http://environments.hcmr.gr/
• ENVIRONMENTS-EOL Annotations: http://download.jensenlab.org/EOL/
• Tagger Software: http://download.jensenlab.org/environments_tagger.tar.gz
Phoenicopterus ruber (Greater Flamingo) EOL Taxon Page Overview: http://eol.org/pages/913221
Quick Facts
Data Tab
Phoenicopterus ruber (Greater Flamingo) EOL Taxon Page Data Tab: http://eol.org/pages/913221 Parts of the screenshot truncated for illustration purposes
Integrated in TraitBank
http://eol.org/info/516
Parr CS, et al. TraitBank: Practical semantics for organism attribute data, Semantic Web Journal, under review
http://environments.hcmr.gr, http://environments.jensenlab.org
• Interactive
• Lightweight
• Term look up assistant
• Standards-compliant term suggestions
Prototype: http://environments.hcmr.gr/biocreative.html
https://gold.jgi-psf.org/studies?Study.Metagenomic+Study=Yes&Study.Is+Public=Yes
Prototype: http://environments.hcmr.gr/biocreative.html
retroactive prospective
BioCreative V Track 5: Interactive Curation (IAT)
Dr. L. Hirschman, Dr. C. Arighi et al.
September 2015, Sevilla, Spain
Beta:
• Entity highlighting
• Suggested term:
  • sorting
  • selection
  • exporting
• Integration with Metagenomics Resources
http://jensenlab.org/
http://tissues.jensenlab.org/ - Santos A et al., PeerJ in press preprint: http://biorxiv.org/content/early/2014/11/10/010975
http://diseases.jensenlab.org/ - Pletscher-Frankild,S., et al. (2014) DISEASES: Text mining and data integration of disease-gene associations. Methods, 74, 83–89.
When and where is a species most likely to be found, and vice versa?
Which other species are most likely to be found nearby?
Where?
• Location
• Environment type
When?
• Life stage
• Periods of year
• ~15000 EnvO term tags in the EOL Pages
• For 234 bird species reported in Crete, Greece (according to GBIF, NCBI Taxonomy)
• Statistics (e.g. term count per taxon)
• Data provenance (e.g. EOL Text Type)
• Taxonomy
• Interactive Visualizations
Developed by Dr. Umer Ijaz, University of Glasgow ([email protected]) http://userweb.eng.gla.ac.uk/umer.ijaz/bioinformatics/summarize_v0.2/summarize.html
Try it live: http://environments.hcmr.gr/cretan-birds/summarize.html
Developed by Dr. Umer Ijaz, University of Glasgow ([email protected]), http://userweb.eng.gla.ac.uk/umer.ijaz/bioinformatics/HEAPcloud_v0.1/
Further Inspection
Specific text sources to address specific biological questions
http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat
http://rs.tdwg.org/ontology/voc/SPMInfoItems#Migration
http://rs.tdwg.org/ontology/voc/SPMInfoItems#Reproduction
http://rs.tdwg.org/ontology/voc/SPMInfoItems#TrophicStrategy
Hierarchy of ontological data structures
e.g. habitat only, environmental material only, environmental feature only
Taxonomy
e.g. re-calculate term counts at the family or higher-level taxon
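Re-calculating term counts at a higher taxonomic level can be sketched in a few lines of Python. The species, families and tags below are invented placeholders, and `term_counts` is a hypothetical helper; a real analysis would take the lineage from a taxonomy resource.

```python
from collections import Counter

# Hypothetical species -> family lookup and (species, term) tags,
# invented for illustration only.
FAMILY_OF = {
    "species_a": "family_x",
    "species_b": "family_x",
    "species_c": "family_y",
}
TAGS = [
    ("species_a", "lagoon"),
    ("species_a", "lake"),
    ("species_b", "lagoon"),
    ("species_c", "sediment"),
]


def term_counts(tags, rank_of=None):
    """Count terms per taxon; optionally map each taxon to a higher rank first."""
    counts = Counter()
    for taxon, term in tags:
        key = rank_of[taxon] if rank_of else taxon
        counts[(key, term)] += 1
    return counts


per_species = term_counts(TAGS)
per_family = term_counts(TAGS, FAMILY_OF)
print(per_family)
```

Passing a different lookup (for example species to order) re-aggregates the same tags at another level without touching the counting logic.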
Summary
• Species – Environment association
• ENVIRONMENTS:
  • Dictionary-based environment descriptive term identification
  • Ontological community standards, e.g. EnvO: name source
• EOL: global aggregator of biodiversity knowledge
  • Semantically typed text clauses
• ENVIRONMENTS and EOL
  • Processing the EOL Taxon pages to extract environment descriptors
  • Raw annotations (tabular format)
  • Browse via the EOL Web Interface (broad audience)
  • Search and retrieve via TraitBank@EOL
• Extract: standard-compliant environment term suggestion
  • Lightweight interactive browsing escort
  • Process user-selected text
  • Simple and generic annotation retrieval
• Large-scale biological questions
  • Integrative data analysis / interactive visualization
  • Citizen science project support
Next Steps
Based on processing PubMed, Apr 2013
Co-mention based analysis
Visualization – incorporate in community resources
LifeWatchGreece - RvLab
More entity types (depending on resources)
• Functional traits (feeding type, body features, etc.)
• Localities (coordinates, geographical locations)
Digging-out Information
http://hartpurylrc.files.wordpress.com Photo by Dr Chatzinikolaou E
Acknowledgements
Amvrakikos Lagoons, May 2011
ACTION ES1103
Thank You!
HCMR-IMBG: Christos Arvanitidis, Christina Pavloudi, Katerina Vasileiadou, Lucia Fanini, Sarah Faulwetter, Anastasis Oulas, Alexandros Gkougkousis et al. (RvLAB)
NNF CPR: Lars Juhl Jensen, Sune Frankild
U Mass: Rob Stevenson
Uni Glasgow: Christopher Quince, Umer Ijaz
EOL: Cynthia Parr, Jennifer Hammock, Patrick Leary, Katja Schulz
MM-MPI: J. Schnetzer, E. Pereira
AWI: Dr P. Buttigieg
BioCreative: Dr. L. Hirschman (MITRE, DoE Award No DE-SC0010838)
SEQenv (https://bitbucket.org/seqenv), Reflect (http://reflect.ws)
Genomes OnLine Database, Virome / Metagenomes Online
Funding: EOL Rubenstein Fellowship, LifeWatchGreece, MARBIGEN, NNF-CPR, EOL-BHL NESCent Research Sprint 2014, “SEQenv” Hackathons (COST ES1103)
id: ENVO:00000038 name: lagoon