Data Integration and Extraction over Molecular Biological Data

Post on 30-Dec-2015


description

Data Integration and Extraction over Molecular Biological Data. Cui Tao. Supported by NSF. Motivation. Online biological data: highly diverse in granularity and variety; various formats; different terminologies, ID systems, units. How to Build a Gene Extraction Ontology? Concepts

Transcript of Data Integration and Extraction over Molecular Biological Data

1

Data Integration and Extraction over Molecular Biological Data

Cui Tao

supported by NSF

2

Motivation

Online biological data:

Highly diverse in granularity and variety

Various formats

Different terminologies, ID systems, units

3

How to Build a Gene Extraction Ontology?

Concepts

Relationship sets

Constraints

Data Frames

4

How to Build a Gene Extraction Ontology?

(G*A*U*C*)*

(G*A*T*C*)*
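The two patterns on this slide read as recognizers for RNA and DNA sequences (U appears in RNA where T appears in DNA). A minimal sketch of how such a recognizer might be written follows. It assumes the patterns are meant as regular expressions; `(G*A*T*C*)*` accepts exactly the same strings as `[GATC]*`, so the sketch uses the character class. The function names, the `+` quantifier (at least one base), and the `min_length` threshold are illustrative choices, not part of the slides.

```python
import re

# Assumption: the slide patterns are sequence recognizers for a data frame.
# (G*A*T*C*)* matches the same strings as [GATC]*; the character class avoids
# nested quantifiers. Requiring at least one base ("+") is our own choice.
DNA_PATTERN = re.compile(r"[GATC]+")
RNA_PATTERN = re.compile(r"[GAUC]+")


def is_dna(text: str) -> bool:
    """True if the whole string consists only of the bases G, A, T, C."""
    return DNA_PATTERN.fullmatch(text) is not None


def is_rna(text: str) -> bool:
    """True if the whole string consists only of the bases G, A, U, C."""
    return RNA_PATTERN.fullmatch(text) is not None


def find_sequences(text: str, min_length: int = 8) -> list[str]:
    """Return runs of DNA bases in free text that are at least min_length long.

    The length threshold (hypothetical) filters out short accidental matches
    such as a capital "T" at the start of a sentence.
    """
    return [
        match.group(0)
        for match in DNA_PATTERN.finditer(text)
        if len(match.group(0)) >= min_length
    ]
```

For example, `find_sequences("The gene ATGGCGTACCTGA encodes a kinase")` picks out the single long run of bases and ignores the isolated capital letter in "The".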

5

Knowledge Sources

Gene Ontology: thousands of terms (Molecular Function, Biological Process, Cellular Component)

All Species Toolkit: 1,231,935 species names

Protein Databases: thousands of protein names
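One plausible way to use such knowledge sources is as lexicons: a term found in the text is tagged with the concept whose lexicon contains it. The sketch below illustrates that idea only. The `LEXICONS` table, its handful of entries, and the `tag_terms` function are hypothetical stand-ins; the slides do not say how the sources are loaded or matched.

```python
# Toy lexicons standing in for the real knowledge sources. In practice each
# set would be loaded from the corresponding source; these few entries are
# illustrative only.
LEXICONS = {
    "GO Aspect": {"molecular function", "biological process", "cellular component"},
    "Species": {"homo sapiens", "mus musculus", "escherichia coli"},
    "Protein": {"hemoglobin", "insulin"},
}


def tag_terms(text: str) -> list[tuple[str, str]]:
    """Return (concept, term) pairs for every lexicon term found in text.

    Matching is a case-insensitive substring test, which is deliberately
    naive: it would also fire inside longer words.
    """
    lowered = text.lower()
    hits = []
    for concept, terms in LEXICONS.items():
        for term in sorted(terms):  # sorted for a deterministic order
            if term in lowered:
                hits.append((concept, term))
    return hits
```

A real recognizer would need tokenization and word-boundary checks; the substring test is kept here only to show the lookup step.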

6

Extraction Rules

Statistical NLP

Machine learning: Naïve Bayes, Hidden Markov Models, Decision Trees

7

Integration

8

9

10

11

12

13

Integration

Information Hidden behind Links

14

15

16

17

Query-based Extraction

Query the gene extraction ontology

Find applicable resources

Fill out forms

Extract information
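The steps on this slide can be sketched as a small pipeline. This is a sketch under stated assumptions, not the system described in the talk: the `Resource` record, the two stub sources, their form field names, and the placeholder return values are all hypothetical, and form submission is replaced by a local function so that nothing touches the network.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Resource:
    """Hypothetical description of one online source and its query form."""
    name: str
    provides: set[str]   # ontology concepts this source can answer
    form_field: str      # name of the search field in the source's form
    fetch: Callable[[dict[str, str]], dict[str, str]]  # stand-in for form submission


def find_applicable(resources: list[Resource], wanted: set[str]) -> list[Resource]:
    """Step 1: keep only sources that cover at least one requested concept."""
    return [r for r in resources if r.provides & wanted]


def answer(resources: list[Resource], gene: str, wanted: set[str]) -> dict[str, str]:
    """Steps 2 and 3: fill out each form, submit it, and extract the concepts."""
    result: dict[str, str] = {}
    for resource in find_applicable(resources, wanted):
        form = {resource.form_field: gene}   # fill out the form
        record = resource.fetch(form)        # submit (stubbed, no network)
        for concept in wanted & resource.provides:
            if concept in record:
                result.setdefault(concept, record[concept])
    return result


# Stub sources returning placeholder values, not real biological data.
def fake_gene_db(form: dict[str, str]) -> dict[str, str]:
    return {"Gene Name": form["gene"], "Gene Sequence": "placeholder-sequence"}


def fake_protein_db(form: dict[str, str]) -> dict[str, str]:
    return {"Protein Function": "placeholder-function"}


RESOURCES = [
    Resource("gene-db", {"Gene Name", "Gene Sequence"}, "gene", fake_gene_db),
    Resource("protein-db", {"Protein Function"}, "q", fake_protein_db),
]
```

Calling `answer(RESOURCES, "alfR", {"Gene Sequence", "Protein Function"})` consults both stub sources and merges their records into one result keyed by concept.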

18

Query-based Extraction

Example: “Find the alfR gene, its sequence, its protein's function, and any mutant that inhibits this gene.”

Gene Name

Gene Sequence

Gene

Mutant

Protein Function

Mutant Function

19

20

21

22

23

Contribution

Provides a way to automatically integrate online biological data from different sources

Provides an approach that finds suitable online resources, fills out online forms, and extracts data according to the user's query