Data Integration and Extraction over Molecular Biological Data

Cui Tao

supported by NSF

Motivation

Online biological data:
- highly diverse in granularity and variety
- various formats
- different terminologies, ID systems, and units

How to Build a Gene Extraction Ontology?

Components:
- concepts
- relationship sets
- constraints
- data frames

How to Build a Gene Extraction Ontology?

(G*A*U*C*)*

(G*A*T*C*)*
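
The two patterns above appear to be data-frame recognizers: (G*A*U*C*)* accepts sequences over the RNA bases G, A, U, C, and (G*A*T*C*)* accepts sequences over the DNA bases G, A, T, C. The following is a minimal Python sketch that assumes this reading. Because each pattern accepts exactly the strings built from its four letters, the sketch uses the equivalent character classes [GAUC]* and [GATC]*. This avoids the heavy backtracking that nested stars can trigger in backtracking regex engines. The function name sequence_kinds is hypothetical.

```python
import re

# Assumed reading of the slide: (G*A*U*C*)* recognizes RNA, (G*A*T*C*)* DNA.
# Each is equivalent to a plain character class over the same four letters,
# which avoids nested-quantifier backtracking.
RNA = re.compile(r"[GAUC]*")
DNA = re.compile(r"[GATC]*")


def sequence_kinds(s: str) -> set[str]:
    """Return which sequence recognizers accept the whole string s."""
    kinds = set()
    if RNA.fullmatch(s):
        kinds.add("RNA")
    if DNA.fullmatch(s):
        kinds.add("DNA")
    return kinds
```

For example, "GATTACA" is accepted only as DNA and "GAUUACA" only as RNA. A string such as "GGCA" contains neither T nor U, so both recognizers accept it, just as both original patterns do.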

Knowledge Sources

- Gene Ontology: thousands of terms (Molecular Function, Biological Process, Cellular Component)
- All Species Toolkit: 1,231,935 species names
- Protein databases: thousands of protein names
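
One way such knowledge sources can feed extraction is as lexicons: every known term becomes a recognizer for its category. The following is a minimal sketch under that assumption. The category names and the handful of entries are illustrative stand-ins for the full Gene Ontology, species list, and protein databases, and tag is a hypothetical function name.

```python
import re

# Illustrative entries only; a real lexicon would be loaded from the full
# Gene Ontology, the species list, and the protein databases.
LEXICONS = {
    "GO term": ["DNA repair", "cell cycle"],
    "Species": ["Escherichia coli", "Saccharomyces cerevisiae"],
    "Protein": ["lactose permease"],
}


def tag(text):
    """Return (start, matched text, category) for every lexicon hit, in order."""
    hits = []
    for category, terms in LEXICONS.items():
        for term in terms:
            # Whole-word, case-insensitive match of the lexicon term.
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, text, re.IGNORECASE):
                hits.append((m.start(), m.group(), category))
    return sorted(hits)
```

With these entries, tag("In Escherichia coli, DNA repair genes are induced.") finds "Escherichia coli" as a species and "DNA repair" as a GO term.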

Extraction Rules

- Statistical NLP
- Machine learning: Naïve Bayes, Hidden Markov Models, Decision Trees
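
Naïve Bayes, one of the machine-learning techniques listed above, can be sketched as a text classifier that labels a sentence as describing a protein function or not. This is a minimal illustration, not the presented system. The labels, the whitespace tokenization, and the training sentences are hypothetical. Laplace (add-one) smoothing is an assumed choice.

```python
import math
from collections import Counter


def train(examples):
    """Count labels and per-label word frequencies from (tokens, label) pairs."""
    label_counts = Counter(label for _, label in examples)
    word_counts = {label: Counter() for label in label_counts}
    for tokens, label in examples:
        word_counts[label].update(tokens)
    vocab = {w for tokens, _ in examples for w in tokens}
    return label_counts, word_counts, vocab


def classify(model, tokens):
    """Return the label with the highest log posterior (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data: whitespace-tokenized sentences.
EXAMPLES = [
    ("the protein binds dna".split(), "function"),
    ("this enzyme catalyzes hydrolysis".split(), "function"),
    ("samples were collected in june".split(), "other"),
    ("the strain was grown overnight".split(), "other"),
]
MODEL = train(EXAMPLES)
```

Here classify(MODEL, "the enzyme binds atp".split()) returns "function". Working in log space keeps the product of many small probabilities from underflowing.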

Integration

Integration: Information Hidden behind Links
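
One reading of this point is that the data needed for integration often sits on detail pages linked from a result page, rather than on the result page itself. The following minimal sketch works under that assumption. It collects anchor links with Python's html.parser and follows them breadth-first up to a fixed depth. The names fetch, gather, and LinkCollector are hypothetical. fetch is any callable that returns a page's HTML, and in the usage below an in-memory dictionary stands in for the web.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def gather(start, fetch, max_depth=1):
    """Return {url: html} for start and pages within max_depth links of it."""
    pages = {}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if url in pages:
            continue
        pages[url] = fetch(url)
        if depth < max_depth:
            parser = LinkCollector()
            parser.feed(pages[url])
            for href in parser.links:
                # Resolve relative links against the page they appear on.
                queue.append((urljoin(url, href), depth + 1))
    return pages


# Hypothetical in-memory pages standing in for online sources.
SITE = {
    "https://example.org/gene/alfR": '<a href="seq.html">sequence</a> <a href="/protein/1">protein</a>',
    "https://example.org/gene/seq.html": "GATTACA",
    "https://example.org/protein/1": '<a href="/deep">more</a>',
    "https://example.org/deep": "beyond the depth limit",
}
```

Calling gather("https://example.org/gene/alfR", SITE.__getitem__) returns the start page and its two linked pages. The page two links away is not fetched. Breadth-first order means each page is first reached at its shallowest depth.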

Query-based Extraction

1. Query the gene extraction ontology
2. Find applicable resources
3. Fill out forms
4. Extract information
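
The "find applicable resources" and "fill out forms" steps can be sketched as follows. The sketch assumes that each online source has been described by the ontology concept its search form accepts and by the concepts its results contain. The resource table, the form field names, and the example.org URLs are hypothetical placeholders. Only GET forms are handled, and the final extraction from the returned page is omitted.

```python
from urllib.parse import urlencode

# Hypothetical resource descriptions: the concept each search form accepts,
# the form field it goes into, and the concepts its results are expected to contain.
RESOURCES = [
    {"url": "https://example.org/gene/search", "input": ("Gene Name", "symbol"),
     "provides": {"Gene Sequence", "Protein Function"}},
    {"url": "https://example.org/mutant/query", "input": ("Gene Name", "target"),
     "provides": {"Mutant", "Mutant Function"}},
    {"url": "https://example.org/species/find", "input": ("Species", "name"),
     "provides": {"Species"}},
]


def applicable_resources(known, wanted, resources=RESOURCES):
    """Keep resources whose form input is known and that return a wanted concept."""
    return [
        r for r in resources
        if r["input"][0] in known and r["provides"] & wanted
    ]


def fill_form(resource, known):
    """Build a GET request URL by placing the known value in the form field."""
    concept, field = resource["input"]
    return resource["url"] + "?" + urlencode({field: known[concept]})
```

With known = {"Gene Name": "alfR"} and wanted = {"Gene Sequence", "Mutant"}, the gene and mutant resources are selected. The species form is skipped because no species value is known. A POST form would send the same urlencode output as the request body instead.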

Query-based Extraction

Example: “Find the alfR gene, its sequence, its protein's function, and any mutant that inhibits this gene.”

[Figure: ontology concepts relevant to the query: Gene, Gene Name, Gene Sequence, Protein Function, Mutant, Mutant Function]
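
The first step for the example query is deciding which ontology concepts it mentions. This can be sketched with keyword matching, an assumed simplification rather than the presented method. The keyword table is hypothetical. The gene-symbol recognizer assumes the bacterial naming pattern of three lowercase letters followed by one uppercase letter, as in alfR. The function concepts_for_query also returns any constant the query binds, here the gene name.

```python
import re

# Hypothetical trigger words; a real system would draw these from the
# ontology's data frames rather than from a hand-written table.
CONCEPT_KEYWORDS = {
    "Gene": ["gene"],
    "Gene Sequence": ["sequence"],
    "Protein Function": ["function"],
    "Mutant": ["mutant"],
    "Mutant Function": ["inhibits", "activates"],
}

# Assumed recognizer for bacterial-style gene symbols such as "alfR".
GENE_SYMBOL = re.compile(r"\b[a-z]{3}[A-Z]\b")


def concepts_for_query(query: str) -> tuple[set[str], dict[str, str]]:
    """Return the ontology concepts a query mentions and any constants it binds."""
    lowered = query.lower()
    concepts = {
        concept
        for concept, words in CONCEPT_KEYWORDS.items()
        if any(re.search(rf"\b{re.escape(w)}\b", lowered) for w in words)
    }
    constants = {}
    match = GENE_SYMBOL.search(query)  # search the original case, not lowered
    if match:
        concepts.add("Gene Name")
        constants["Gene Name"] = match.group()
    return concepts, constants
```

For the example query, this selects Gene, Gene Name, Gene Sequence, Protein Function, Mutant, and Mutant Function, and it binds Gene Name to "alfR".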

Contribution

- Provides a way to automatically integrate online biological data from different sources
- Provides an approach that finds appropriate online resources, fills out online forms, and extracts data according to the user’s query