OntoGene in the BioNLP Shared Task and in BioCreative II.5

Fabio Rinaldi

Outline

Background
  our work on GENIA
  participation in BioCreative II
The IntAct activity
  Interactors (AIME 09), Methods (SMBM08), Organisms (BioNLP 09), Interactions (CICLING 09)
Recent work
  BioNLP shared task
  BioCreative II.5

OntoGene: the beginnings

BioNLP identified as a 'hot' area for research
Leverage the work done on terminology structuring
  original focus: ontology learning
  later refocused on ontology usage
Gradually moved into relation extraction
  leverage dependency structures (Pro3Gres)
  organize different tools into an NLP pipeline

How does it work?

The pipeline delivers:
  tokens with unique identifiers
  terms and their heads
  chunks and their heads
  dependency relations, encoded as (sentenceid, type, head, dependent)
  output can be delivered either as CSV or XML
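As a rough illustration of the tuple encoding listed above, the following Python sketch reads dependency relations from CSV. Only the column order (sentence id, type, head, dependent) comes from the slide; the sample rows, the token identifiers, the relation labels and the function name are assumptions made for this example and may not match the actual OntoGene output.

```python
import csv
import io
from typing import NamedTuple


class Dependency(NamedTuple):
    sentence_id: str
    rel_type: str   # relation label, e.g. "subj" or "obj" (assumed labels)
    head: str       # token id of the head
    dependent: str  # token id of the dependent


def read_dependencies(text: str) -> list[Dependency]:
    """Parse CSV rows of the form sentenceid,type,head,dependent."""
    reader = csv.reader(io.StringIO(text))
    return [Dependency(*row) for row in reader if row]


# Hypothetical sample for "A regulates B" with tokens t1, t2, t3.
SAMPLE = "s1,subj,t2,t1\ns1,obj,t2,t3\n"
deps = read_dependencies(SAMPLE)
```

Each row becomes one named tuple, so later stages can look relations up by head or by dependent without caring whether the source was CSV or XML.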

The OG-RM application uses this information (stored in a Prolog database) to extract domain relations by means of cascading rules

NLP pipeline:
  sentence splitting (mxterminator)
  tokenizer (Penn Treebank tokenizer)
  POS tagger (MXPOST)
  lemmatizer (morpha)
  NG/VG chunker (LTCHUNK)
  dependency parser (Pro3Gres)
Each tool has a wrapper to make its input and output XML-based
Other outputs are possible: CSV, Prolog
Integrates LingPipe, term detection...
Supports various kinds of post-processing of dependency relations
Performance: 1 hour to parse the GENIA corpus
  dual-core AMD Opteron 2.5 GHz, 8 GB RAM
  45 min for parsing

Pro3Gres parse example

OG-RM: cascading rules

[Diagram labels: By [by, through, via]; agent; target]

A regulates B
B is regulated by A
the regulation of B by A

semRel(xrel([H,A,B]), direct_transitive([H,A,B])).
semRel(xrel([H,A,B]), passive([H,B,A])).
semRel(xrel([H,A,B]), nominalization([H,B,A])).
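The three semRel facts map different syntactic realisations onto one normalised relation xrel(H, A, B). The Python sketch below mirrors that argument reordering only; the class and function names are hypothetical, and it does not reproduce the Prolog implementation used in OG-RM.

```python
from typing import NamedTuple, Optional


class SyntacticPattern(NamedTuple):
    kind: str    # "direct_transitive", "passive" or "nominalization"
    head: str    # the relation word H
    first: str   # first argument as listed in the pattern
    second: str  # second argument as listed in the pattern


def normalise(p: SyntacticPattern) -> Optional[tuple[str, str, str]]:
    """Return (H, A, B) with A as agent and B as target, or None if unknown."""
    if p.kind == "direct_transitive":
        # direct_transitive([H, A, B]) -> xrel([H, A, B])
        return (p.head, p.first, p.second)
    if p.kind in ("passive", "nominalization"):
        # passive([H, B, A]) and nominalization([H, B, A]) -> xrel([H, A, B])
        return (p.head, p.second, p.first)
    return None


active = normalise(SyntacticPattern("direct_transitive", "regulate", "A", "B"))
passive = normalise(SyntacticPattern("passive", "regulate", "B", "A"))
nominal = normalise(SyntacticPattern("nominalization", "regulate", "B", "A"))
```

All three example sentences ("A regulates B", "B is regulated by A", "the regulation of B by A") end up as the same triple, which is the point of cascading the rules.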

OG-RM: cascading rules

[Diagram labels: By [by, through, via]; agent; target; A; H [nominalisation]; trigger; B; Prep [of, ..]; deprel]

A triggers the H of B

References

Fabio Rinaldi, Gerold Schneider, Kaarel Kaljurand, Michael Hess, Martin Romacker. An environment for relation mining over richly annotated corpora: the case of GENIA. BMC Bioinformatics 2006, 7(Suppl 3):S3. doi:10.1186/1471-2105-7-S3-S3

Outline

Krallinger et al., Overview of the protein-protein interaction annotation task of BioCreative II, Genome Biology (2008), vol. 9, suppl. 2, p. S4

IPS: Our Approach

Protein Name Detection and Disambiguation
  identification of proteins
  organism-based disambiguation
  further disambiguation
Interaction Detection
  generation of potential interactions
  filtering of candidate interactions
    Syntax-based filter
    Novelty filter
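The generate-then-filter step can be sketched as follows. This is only an illustration under stated assumptions: the slides do not describe how the syntax-based and novelty filters work, so the filter below is a placeholder predicate, and the protein identifiers and function names are hypothetical.

```python
from itertools import combinations
from typing import Callable, Iterable

Pair = tuple[str, str]


def candidate_pairs(proteins: Iterable[str]) -> list[Pair]:
    """Generate every unordered pair of distinct proteins found together."""
    unique = sorted(set(proteins))
    return list(combinations(unique, 2))


def apply_filters(pairs: list[Pair],
                  filters: list[Callable[[Pair], bool]]) -> list[Pair]:
    """Keep only the pairs accepted by every filter."""
    return [p for p in pairs if all(f(p) for f in filters)]


# Hypothetical identifiers detected in one sentence (with one repeat).
detected = ["P12345", "Q67890", "P12345", "O11111"]
pairs = candidate_pairs(detected)

# Placeholder standing in for a syntax-based filter: accept only pairs
# that some earlier step marked as syntactically connected (assumed data).
connected = {("P12345", "Q67890")}
kept = apply_filters(pairs, [lambda p: p in connected])
```

Keeping the filters as a list of predicates makes it easy to add or remove one (for example a novelty filter) without touching the candidate generation.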

Evaluation

UniProt

Annotated Abstract

Detection of experimental methods, based on the PSI-MI taxonomy

Best official results!

References

Fabio Rinaldi, Thomas Kappeler, Kaarel Kaljurand, Gerold Schneider, Manfred Klenner, Simon Clematide, Michael Hess, Jean-Marc von Allmen, Pierre Parisot, Martin Romacker, Therese Vachon. OntoGene in BioCreative II. Genome Biology 2008, 9(Suppl 2):S13.

Outline

The IntAct activity Methods (SMBM08), Interactors (AIME 09), Organisms

Detection of Biological Interactions from Biomedical Literature [SNF 100014 / 118396]

Duration: 18 months (April 2008 – September 2009)
SNF funding: 114'046 CHF
Novartis funding: ~70'000 CHF
University funding: 50% of Fabio's position

IntAct

Can be used as a source of interactions, interactors, methods, organisms, “snippets”
Used to derive distributional frequencies
Used to derive a gold standard for testing purposes (for IMS and TX): 621 PubMed-indexed articles
Subtasks:
  IMS: Experimental Methods
  TX: Organism Detection
  PID: Protein Identification and Disambiguation
  PPI: Protein Interactions

Balance so far

Highlights: IntAct, BioNLP shared task, BioCreative
Publications:
  Genome Biology paper finally published
  4 poster presentations (G2S, LREC, CICLING, ISWC)
  4 conference papers (SMBM, OWLED, CICLING, AIME)
  2 workshop presentations [BioNLP workshop & shared
Invited presentations: FBK, Trento; DBTA, Basel; CCP, Denver.

Introduction

Approach originally developed for participation in the BioCreative protein-protein interaction task

Used also on an internal project based on the IntAct dataset of protein interactions

Adaptation to the BioNLP shared task took approximately one month

Based on straightforward rewriting of syntactic structures to event structures, taking statistics from training data into account

Preprocessing

LingPipe for sentence splitting, tokenization, and PoS tagging (GENIA training model)
Term annotation: only terms provided in a1, a2 files (in 10 cases not compatible with tokenization)
Lemmatization (morpha), used only by the dependency parser
Chunking using LTCHUNK and detecting chunk heads
Dependency parsing with Pro3Gres, only among chunks

Data format

Tokens (tokID > lemma, PoS tag, offset)
Chunks (tokID > chunk, chunk type, head)
Terms (tokID > term ID)
Sentences (sent ID > token IDs)
Dependencies (dependent ID > head ID)
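One way to picture these five mappings is as a small set of dictionaries keyed by token or sentence identifier. The Python sketch below is an assumption about how such data could be held in memory; the class names, field names and sample values are invented for illustration and are not the format used by the system itself.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Token:
    lemma: str
    pos: str     # PoS tag
    offset: int  # character offset in the text


@dataclass
class Chunk:
    chunk_id: str
    chunk_type: str  # e.g. "NG" or "VG"
    head: str        # token id of the chunk head


@dataclass
class AnnotatedText:
    tokens: dict[str, Token] = field(default_factory=dict)        # tokID -> token
    chunks: dict[str, Chunk] = field(default_factory=dict)        # tokID -> chunk
    terms: dict[str, str] = field(default_factory=dict)           # tokID -> term ID
    sentences: dict[str, list[str]] = field(default_factory=dict) # sent ID -> tokIDs
    dependencies: dict[str, str] = field(default_factory=dict)    # dependent -> head

    def head_lemma(self, tok_id: str) -> Optional[str]:
        """Lemma of the token that tok_id depends on, if any."""
        head_id = self.dependencies.get(tok_id)
        return self.tokens[head_id].lemma if head_id in self.tokens else None


# Hypothetical fragment: "Oct2 expression", with Oct2 depending on expression.
doc = AnnotatedText()
doc.tokens = {"t1": Token("Oct2", "NN", 0), "t2": Token("expression", "NN", 5)}
doc.terms = {"t1": "T1"}
doc.sentences = {"s1": ["t1", "t2"]}
doc.dependencies = {"t1": "t2"}
```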

Data from training

word_to_freq(+Word, F)
eword_to_event(+EventWord, EventType, EventArgs, F1, F2)
  F1: frequency of EventWord, EventType, EventArgs
  F2: frequency of EventWord as trigger
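The two frequencies attached to eword_to_event can be derived by simple counting over the annotated training events. The Python sketch below shows one possible way to do that; the input layout (one tuple per annotated event), the function name and the sample data are assumptions, and the original statistics were stored as Prolog facts rather than Python dictionaries.

```python
from collections import Counter

# One annotated event: (trigger word, event type, argument roles)
Event = tuple[str, str, tuple[str, ...]]


def build_event_table(events: list[Event]) -> dict[Event, tuple[int, int]]:
    """Map each (word, type, args) combination to (F1, F2).

    F1: how often this exact combination occurs.
    F2: how often the word occurs as a trigger of any event.
    """
    combo_freq = Counter(events)
    trigger_freq = Counter(word for word, _, _ in events)
    return {key: (f1, trigger_freq[key[0]]) for key, f1 in combo_freq.items()}


# Hypothetical training events.
training = [
    ("expression", "Gene_expression", ("Theme",)),
    ("expression", "Gene_expression", ("Theme",)),
    ("expression", "Transcription", ("Theme",)),
]
table = build_event_table(training)
```

With both counts available, a later step could, for instance, prefer the event type with the highest F1 for a given trigger word; whether the original system combined the counts in this way is not stated in the slides.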

Domination path
  Direct domination: “regulates expression”
  Chunk-internal domination: “inducible Oct2 expression”
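The two kinds of domination can be combined into a single walk up the dependent-to-head links. The Python sketch below is a minimal illustration, assuming that a chunk head dominates every other token of its chunk and that words can stand in for token identifiers; the function names are hypothetical and the actual path computation in the system may differ.

```python
from typing import Optional


def chunk_internal_links(tokens: list[str], head: str) -> dict[str, str]:
    """Treat the chunk head as dominating every other token in the chunk."""
    return {tok: head for tok in tokens if tok != head}


def domination_path(node: str, ancestor: str,
                    head_of: dict[str, str]) -> Optional[list[str]]:
    """Follow dependent -> head links from node up to ancestor.

    Returns the list of visited nodes, or None if ancestor is not reached.
    """
    path = [node]
    seen = {node}
    while node != ancestor:
        node = head_of.get(node)
        if node is None or node in seen:  # no head, or a cycle
            return None
        path.append(node)
        seen.add(node)
    return path


# Direct domination: "regulates expression".
head_of = {"expression": "regulates"}
# Chunk-internal domination: "inducible Oct2 expression", head "expression".
head_of.update(
    chunk_internal_links(["inducible", "Oct2", "expression"], "expression")
)

path = domination_path("Oct2", "regulates", head_of)
```

Here the term Oct2 reaches the trigger candidate "regulates" through the chunk head "expression", which is the kind of path the argument-filling step needs.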

Trigger generation

Event structure generation

Event argument filling

BioNLP shared task

Our approach

Core pipeline delivers rich annotation format
  Used to process training and test data
Entity detection and disambiguation (IntAct approach) [organism-based disambiguation]
Candidate interactions
  Initial training based on GENIA (IntAct approach)
  Statistics adjusted using training data
Results: “impressively good”
  AUC training: ~22%

BioCreative II.5

Acknowledgments
  Kaarel Kaljurand, Gerold Schneider, Thomas Kappeler, Simon Clematide
  Therese Vachon, Martin Romacker, Josef Scheiber

www.ontogene.org
