Textmining activities at BioHackathon 2010

Post on 09-May-2015


description

Summary of the activities developed by the text mining task force during the BioHackathon 2010: http://hackathon3.dbcls.jp/wiki/TextMining

Transcript of Textmining activities at BioHackathon 2010

Semantic Textmining

Goals and achievements

BioHackathon 2010

Team members

• Hammad
• Matthias
• Venkata
• Heiko
• YAMAMOTO-san
• Alberto

Original Proposal

• Integration of text mining results
  - Reflect, Whatizit, MEDIE
  - Results as triplets
• URI and predicates
  - Implementation with SADI
  - Result presentation using aTag
• Explore relations
• Interfaces

The work done

• Integration of text mining results
  - Reflect, Whatizit, MEDIE
  - Results as triplets
• URI and predicates
  - Future BioPython module and REST service
• Explore relations
  - Sesame endpoint
  - Biogateway
  - ARQ for federated queries
• Interfaces
  - Result presentation using aTag
  - Exhibit faceted interface
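The slides do not show the federated queries themselves. As a rough illustration only, the sketch below builds a SPARQL query that uses the SERVICE keyword (the federation mechanism ARQ supports) together with the SIOC properties that appear in the team's RDF schema. The endpoint address, the helper name `build_federated_query`, and the shape of the query are assumptions, not the team's actual code.

```python
from string import Template

# Query skeleton: local text-mining annotations joined with a remote endpoint.
# The sioc: properties mirror the RDF schema slide; everything else is assumed.
FEDERATED_QUERY = Template("""\
PREFIX sioc: <http://rdfs.org/sioc/ns#>
SELECT ?article ?protein ?p ?o
WHERE {
  ?item a sioc:Item ;
        sioc:about ?article ;
        sioc:topic ?protein .
  SERVICE <$remote> {
    ?protein ?p ?o .
  }
}
LIMIT $limit
""")


def build_federated_query(remote_endpoint, limit=10):
    """Return a federated SPARQL query string (hypothetical helper)."""
    # Reject characters that would break out of the <...> IRI reference.
    if any(ch in remote_endpoint for ch in "<> \n\t\""):
        raise ValueError("endpoint is not a plain IRI")
    return FEDERATED_QUERY.substitute(remote=remote_endpoint, limit=int(limit))


if __name__ == "__main__":
    # example.org stands in for a real endpoint such as a Sesame repository.
    print(build_federated_query("http://example.org/sparql", limit=5))
```

The query string can be handed to any SPARQL processor that understands SERVICE; no network access is needed to build it.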

http://whatizit.neurocommons.org

RDF schema for TM

<rdf:Description>
  <rdf:type rdf:resource="http://rdfs.org/sioc/ns#Item"/>
  <sioc:about rdf:resource="http://www.ncbi.nlm.nih.gov/pubmed/9002550"/>
  <sioc:content>SBMA</sioc:content>
  <sioc:topic rdf:resource="http://purl.uniprot.org/uniprot/P10275"/>
  <rdfs:seeAlso rdf:resource="http://www.ncbi.nlm.nih.gov/pubmed/9002550"/>
</rdf:Description>
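To make the record above concrete, here is a minimal sketch that generates the same five statements for any PubMed ID, matched text, and UniProt accession, using only the Python standard library. The function name `annotation_rdf` and the wrapping rdf:RDF root element are assumptions added for illustration; the slides show only the rdf:Description element.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
SIOC = "http://rdfs.org/sioc/ns#"

# Use readable prefixes instead of ns0, ns1, ... when serializing.
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("sioc", SIOC)):
    ET.register_namespace(prefix, uri)


def annotation_rdf(pmid, matched_text, uniprot_acc):
    """Return RDF/XML for one text-mining hit (hypothetical helper)."""
    article = f"http://www.ncbi.nlm.nih.gov/pubmed/{pmid}"
    protein = f"http://purl.uniprot.org/uniprot/{uniprot_acc}"

    root = ET.Element(f"{{{RDF}}}RDF")  # assumed wrapper element
    desc = ET.SubElement(root, f"{{{RDF}}}Description")

    def link(ns, name, target):
        # Emit <ns:name rdf:resource="target"/> under the description.
        ET.SubElement(desc, f"{{{ns}}}{name}", {f"{{{RDF}}}resource": target})

    link(RDF, "type", SIOC + "Item")
    link(SIOC, "about", article)
    ET.SubElement(desc, f"{{{SIOC}}}content").text = matched_text
    link(SIOC, "topic", protein)
    link(RDFS, "seeAlso", article)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(annotation_rdf("9002550", "SBMA", "P10275"))
```

Building the elements through a namespace-aware API rather than string concatenation keeps the prefixes and attribute quoting consistent.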

http://reflect.ws

MEDIE and Enju APIs

• MEDIE is an intelligent search engine that retrieves biomedical correlations from MEDLINE, based on indexing by natural language processing and text mining techniques.
• Enju is a syntactic parser for English.

MEDIE XML output: http://www-tsujii.is.s.u-tokyo.ac.jp/medie/dbcls.cgi?pmid=19116711

Enju XML output: http://docman.dbcls.jp/medie/conv?pmid=17551671

http://togows.dbcls.jp/entry/pubmed/<pmid>.ttl

http://www.uniprot.org/uniprot/P12345.rdf
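The two addresses above suggest that RDF for an article or a protein can be requested by identifier. A minimal sketch, assuming the patterns are `.../entry/pubmed/<pmid>.ttl` for TogoWS and `.../uniprot/<accession>.rdf` for UniProt as they stood in 2010 (the services may have changed since); the helper names are hypothetical and no request is actually sent:

```python
from urllib.parse import quote

# Base addresses as read from the slides; treat them as historical.
TOGOWS_ENTRY = "http://togows.dbcls.jp/entry"
UNIPROT_BASE = "http://www.uniprot.org/uniprot"


def togows_pubmed_ttl(pmid):
    """Turtle representation of a PubMed record via TogoWS (assumed pattern)."""
    return f"{TOGOWS_ENTRY}/pubmed/{quote(str(pmid), safe='')}.ttl"


def uniprot_rdf(accession):
    """RDF/XML representation of a UniProt entry (assumed pattern)."""
    return f"{UNIPROT_BASE}/{quote(accession, safe='')}.rdf"


if __name__ == "__main__":
    print(togows_pubmed_ttl(9002550))
    print(uniprot_rdf("P12345"))
```

Keeping URL construction in small functions makes it easy to swap in current service addresses without touching the rest of a pipeline.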

Workflow

[Workflow diagram. Labels: PubMed; Whatizit, Reflect, MEDIE; TogoWS; UniProt; XML (three times); RDF (three times).]

[Relation diagram. Labels: Substance A, interacts with, Receptor B, Region C, axonal projections, brain region D, Region D, aversive stimuli.]
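The relation labels above lend themselves to a subject-predicate-object representation, which is what "results as triplets" and "explore relations" point at. The sketch below is an in-memory illustration only: the first statement comes straight from the labels, while the other three statements and the helper names `match` and `find_path` are assumptions made up to show how chained relations could be followed.

```python
from collections import deque, namedtuple

Triple = namedtuple("Triple", "subject predicate object")

TRIPLES = [
    Triple("Substance A", "interacts with", "Receptor B"),   # from the labels
    Triple("Receptor B", "found in", "Region C"),            # assumed
    Triple("Region C", "axonal projections", "Region D"),    # assumed
    Triple("Region D", "responds to", "aversive stimuli"),   # assumed
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]


def find_path(triples, start, goal):
    """Breadth-first search from start to goal along subject -> object links."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, trail = queue.popleft()
        if node == goal:
            return trail
        for t in match(triples, subject=node):
            if t.object not in seen:
                seen.add(t.object)
                queue.append((t.object, trail + [t]))
    return None  # no chain of statements connects the two entities


if __name__ == "__main__":
    for step in find_path(TRIPLES, "Substance A", "aversive stimuli"):
        print(step.subject, "|", step.predicate, "|", step.object)
```

A real deployment would run such pattern queries against a triple store rather than a Python list; the list only keeps the example self-contained.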

Interlink these entities with taxonomies & ontologies

TMOntology

http://hackathon3.dbcls.jp/wiki/TextMining
