BioNLPSADI

Presenter: Ahmad C. Bukhari

Google Project Page: https://code.google.com/p/bionlp-sadi/

Project Demo Page: https://cbakerlab:8080/p/bionlp-sadi/


• Motivation and Introduction
• Past Research Work
• Proposed Methodology
• System architecture
• System design
• Ontology Development
• SADI Service development
• Demo and code view
• Experiments and Results
• Conclusion and Future work
• References


Scientific literature is the most up-to-date source of information

Explosive growth observed in scientific literature production

The Internet is full of biology-related databases and search engines

Text formats are provided by PubMed and OMIM.

Sequence data is provided by GenBank (DNA) and UniProt (protein).

Protein structures are provided by PDB, SCOP, and CATH.


Thousands of documents are produced weekly: it is impossible to read them all

Several solutions have been developed based on AI techniques

These lost significance as new terms appeared and their mechanisms stayed static

NLP emerged as a possible solution in the past decade

NLP was widely adopted by scientists

Several NLP-based applications are available on the internet


We introduced a semantically rich, interoperable suite of BioNLP services based on the SADI framework.

It exploits NLP technologies to extract biologically useful information from scientific documents.

It can present the extracted information in a form that is reusable, searchable, and interoperable.

It can display the output in an integrated format, which can in turn lead to better biological system analysis.


Existing text mining services

Existing text mining services with web services

• U-Compare
• Whatizit
• EBIMED


The scientific community is looking for a sophisticated solution that can handle biological data interoperability, usability, and integration challenges.

We coupled useful biological NLP techniques with the SADI framework to cope with biological information logistics issues.

The proposed solution exploits NLP technologies to extract biologically valuable information, with semantic support.

The proposed solution provides output in a reusable, searchable, and interoperable format.


User Interaction Layer

SADI services suite


REST, XML, SOAP, or WSDL

KLEIO, U-Compare, GENIA, FACTA+, etc.

XML, RDF, OWL, RDFS

NLP + WS = XML output

SWS + BNLP
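The contrast on this slide between plain XML output and semantic (RDF) output can be illustrated with a minimal sketch that serializes one extracted mention as N-Triples. It is not the project's actual data model: the `http://example.org/bionlp#` namespace, the `Annotation` class, and the property names (`hasText`, `startOffset`, `foundIn`) are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical namespace; the project's real ontology URIs are not given in the slides.
BASE = "http://example.org/bionlp#"


@dataclass(frozen=True)
class Annotation:
    """One entity mention found by a text-mining step (illustrative structure)."""
    doc_id: str   # document identifier, e.g. a PubMed ID
    kind: str     # entity class, e.g. "Drug" or "Mutation"
    text: str     # surface form found in the document
    start: int    # character offset of the mention


def escape_literal(value: str) -> str:
    """Escape characters that may not appear raw inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def to_ntriples(ann: Annotation) -> list[str]:
    """Serialize one annotation as four N-Triples statements."""
    subject = f"<{BASE}annotation/{ann.doc_id}/{ann.start}>"
    rdf_type = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
    xsd_int = "<http://www.w3.org/2001/XMLSchema#integer>"
    return [
        f"{subject} {rdf_type} <{BASE}{ann.kind}> .",
        f'{subject} <{BASE}hasText> "{escape_literal(ann.text)}" .',
        f'{subject} <{BASE}startOffset> "{ann.start}"^^{xsd_int} .',
        f"{subject} <{BASE}foundIn> <{BASE}document/{ann.doc_id}> .",
    ]


if __name__ == "__main__":
    for line in to_ntriples(Annotation("12345", "Drug", "Amoxicillin", 42)):
        print(line)
```

Because every mention gets its own URI and typed properties, output from different services can be merged and queried together (for example with SPARQL) without agreeing on a shared XML schema first.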


Deal with Annotation

All document related concepts

Feature Modeling


• mutationFinder
• DrugExtractor (enhanced)
• DrugDrug Interaction (80% complete)
• Drug2Food Interaction (business logic complete)
• Pmid2pdf (enhanced)
• Pdf2ascii (upgraded overall)

Note: there are many bugs in the existing SADI client-level integration service
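To make concrete what a mutation-finding service looks for, here is a minimal sketch of regex-based point-mutation detection. It is not the pattern set used by the mutationFinder service listed above; the regular expression and the `find_point_mutations` helper are simplified assumptions that only cover the "wild-type residue, position, mutant residue" notation (for example `A123T` or `Ala123Thr`).

```python
import re

# One-letter and three-letter codes for the 20 standard amino acids.
ONE = "ACDEFGHIKLMNPQRSTVWY"
THREE = ("Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
         "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val")

# Assumed, simplified pattern: wild-type residue, position, mutant residue.
POINT_MUTATION = re.compile(
    rf"\b(?:(?P<wt1>[{ONE}])(?P<pos1>[1-9]\d*)(?P<mt1>[{ONE}])"
    rf"|(?P<wt3>{THREE})(?P<pos3>[1-9]\d*)(?P<mt3>{THREE}))\b"
)


def find_point_mutations(text):
    """Return (wild_type, position, mutant) tuples in order of appearance."""
    hits = []
    for m in POINT_MUTATION.finditer(text):
        # Exactly one of the two alternatives matched, so pick whichever is set.
        wt = m.group("wt1") or m.group("wt3")
        pos = m.group("pos1") or m.group("pos3")
        mt = m.group("mt1") or m.group("mt3")
        hits.append((wt, int(pos), mt))
    return hits


if __name__ == "__main__":
    print(find_point_mutations("The A123T and Gly12Asp substitutions reduced binding."))
```

A pattern this loose also matches tokens that merely look like mutations (gene or cell-line names such as `T4L`), so a practical service would need additional context checks.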


• Java
• Servlet
• RDF
• SPARQL
• JSP
• JSF
• JavaScript
• XHTML
• And several third-party libraries


Tools and technologies used

Demo and Code View


Show where the drug Amoxicillin (DB01060) has a positive effect against higher serum levels

Give me the sentences where a mutation and a drug name occur in the same sentence.

Extract all the drug names from the text and show me the interactions (if any exist) among them

Tell me which foods have a bad interaction with the drug Cytarabine
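The second query above (a mutation and a drug name in the same sentence) can be sketched in a few lines. This is an illustration under stated assumptions, not the system's implementation: the two-entry `DRUG_LEXICON` is taken from the drug names in the example queries, the mutation pattern is a simplified one-letter form, the sentence splitter is naive, and the sample text in the usage block is invented.

```python
import re

# Drug names taken from the example queries; a real lexicon would be far larger.
DRUG_LEXICON = {"amoxicillin", "cytarabine"}

# Assumed, simplified one-letter point-mutation pattern such as "A123T".
MUTATION = re.compile(r"\b[ACDEFGHIKLMNPQRSTVWY][1-9]\d*[ACDEFGHIKLMNPQRSTVWY]\b")

# Naive sentence splitter: break on whitespace that follows '.', '!' or '?'.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def cooccurring_sentences(text):
    """Return (sentence, drugs, mutations) for sentences that mention both."""
    results = []
    for sentence in SENTENCE_END.split(text.strip()):
        words = {w.lower() for w in re.findall(r"[A-Za-z]+", sentence)}
        drugs = sorted(words & DRUG_LEXICON)
        mutations = MUTATION.findall(sentence)
        if drugs and mutations:
            results.append((sentence, drugs, mutations))
    return results


if __name__ == "__main__":
    # Invented sample text, for illustration only.
    sample = ("Resistance to cytarabine was seen with the A123T variant. "
              "Amoxicillin was given to controls. "
              "The G12D mutation was also found.")
    for sentence, drugs, mutations in cooccurring_sentences(sample):
        print(drugs, mutations, "->", sentence)
```

Only the first sample sentence is returned, since it is the only one containing both a lexicon drug and a mutation mention.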


Consolidated output generated by the system


Proposed a generalized architecture for semantic interoperability and integration among BNLP tools

Performed several experiments by designing different corpora and by choosing different combinations of services

In most cases, the system generated results that met our requirements

As future work, we will try to enhance the performance of the system by refining the algorithms

A registry feature will be added to give users more freedom to work.


Topic Finding

Limited availability of tools

Development challenges (countless)

Integration with web

Finding a case study (still an open issue)


E. Gatial, Z. Balogh, M. Ciglan, L. Hluchy, Focused web crawling mechanism based on page relevance, In: Proceedings of (ITAT 2005) information technologies applications and theory, 2005, pp. 41–45

N.F. Noy, D.L. McGuinness, Ontology development 101: a guide to creating your first ontology. http://protege.stanford.edu/publications/ontology_development/ontology101-noy-mcguinness.htm

H. Cunningham, Y. Wilks, R. J. Gaizauskas, GATE, a General Architecture for Text Engineering. Computers and humanities (2002), 1057-1060.

R. Subhashini, V.J.S Kumar, Shallow NLP techniques for noun phrase extraction, In: Proceeding of Trendz in Information Sciences & Computing (TISC), 2010 , pp.73-77.

S. Nasrolahi, M. Nikdast, M. Boroujerdi, The semantic web: a new approach for future world wide web, In: Proceedings of World Academy of Science, Engineering and Technology, 2009, pp. 1149-1154

A.C. Bukhari, Y.G Kim, Exploiting the Heavyweight Ontology with Multi-Agent System Using Vocal Command System: A Case Study on E-Mall, International Journal of Advancements in Computing Technology 3(2011) 233-241.

A.C. Bukhari, Y.G Kim, Ontology-assisted automatic precise information extractor for visually impaired inhabitants, Artificial Intelligence Review (2005) Issn: 0269-2821.

D.H. Fudholi, N. Maneerat, R. Varakulsiripunth, Y. Kato, Application of Protégé, SWRL and SQWRL in fuzzy ontology-based menu recommendation, International Symposium on Intelligent Signal Processing and Communication Systems, 2009, pp. 631-634.

W.A. Baumgartner, K.B. Cohen, L. Fox, G. Acquaah-Mensah, L. Hunter, Manual annotation is not sufficient for curating genomic databases, Bioinformatics 23 (2007) i41-i48.

J. Laurila, N. Naderi, R. Witte, A. Riazanov, A. Kouznetsov, C.J.O. Baker, Algorithms and semantic infrastructure for mutation impact extraction and grounding, BMC Genomics 11 (Suppl 4) (2010) S24.







Many Thanks
