Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of...
![Page 1: Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip.](https://reader036.fdocuments.us/reader036/viewer/2022070413/5697bfae1a28abf838c9c8cf/html5/thumbnails/1.jpg)
Databases, Ontologies and Text mining
Session Introduction
Part 2
Carole Goble, University of Manchester, UK
Dietrich Rebholz-Schuhmann, EBI, UK
Philip Bourne, SDSC/UCSD
![Page 2: Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip.](https://reader036.fdocuments.us/reader036/viewer/2022070413/5697bfae1a28abf838c9c8cf/html5/thumbnails/2.jpg)
Resources in Bioinformatics
[Diagram labels: Bioinformatics; Databases (UniProt, LocusLink); Ontologies (The Gene Ontology); Text mining; Applications and Mining; Knowledge mining]
![Page 3: Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip.](https://reader036.fdocuments.us/reader036/viewer/2022070413/5697bfae1a28abf838c9c8cf/html5/thumbnails/3.jpg)
Resources in Bioinformatics
[Diagram labels: Bioinformatics; Databases (UniProt, LocusLink)]
![Page 4: Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip.](https://reader036.fdocuments.us/reader036/viewer/2022070413/5697bfae1a28abf838c9c8cf/html5/thumbnails/4.jpg)
What perspective do I bring?
![Page 5: Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip.](https://reader036.fdocuments.us/reader036/viewer/2022070413/5697bfae1a28abf838c9c8cf/html5/thumbnails/5.jpg)
Preface
• A review of the state and needs of the field from the perspective of a user of biological databases….
… the p53 core domain structure consists of a β sandwich that serves as a scaffold for two large loops and a loop-sheet-helix motif ... (Science Vol. 265, p. 346)
Corresponding structure from the PDB: 1TSR
Oops!
β sandwich? Where?
Large loop? Which one??
Loop-sheet-helix???
![Page 6: Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip.](https://reader036.fdocuments.us/reader036/viewer/2022070413/5697bfae1a28abf838c9c8cf/html5/thumbnails/6.jpg)
Preface
• A review of the state and needs of the field from the perspective of a developer of biological databases….
![Page 7: Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip.](https://reader036.fdocuments.us/reader036/viewer/2022070413/5697bfae1a28abf838c9c8cf/html5/thumbnails/7.jpg)
What are the current biological databases and what does this tell us?
![Page 8: Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip.](https://reader036.fdocuments.us/reader036/viewer/2022070413/5697bfae1a28abf838c9c8cf/html5/thumbnails/8.jpg)
Large Growth in the Number of Biological Databases
[Chart: NAR Database Issue, Number of Entries (y-axis, 0 to 600) by Year (x-axis, 1996 to 2004)]
![Page 9: Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip.](https://reader036.fdocuments.us/reader036/viewer/2022070413/5697bfae1a28abf838c9c8cf/html5/thumbnails/9.jpg)
Resources are Becoming More Diverse
NAR 2004 – Division by Resource Type
[Pie chart legend, Database Types: Nucleotide Sequence; RNA Sequence; Protein Sequence; Structure; Genome (non-human); Pathways; Genome (human); Disease; Gene Expression; Other]
![Page 10: Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip.](https://reader036.fdocuments.us/reader036/viewer/2022070413/5697bfae1a28abf838c9c8cf/html5/thumbnails/10.jpg)
NAR 2004 – A Closer Look
[Pie chart legend, Database Types: Nucleotide Sequence; RNA Sequence; Protein Sequence; Structure; Genome (non-human); Pathways; Genome (human); Disease; Gene Expression; Other]
• Genome scale databases have proliferated
• Traditional sequence databases are now a small part
• Databases around new specific data types are emerging
• Pathway and disease orientated databases are emerging
![Page 11: Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip.](https://reader036.fdocuments.us/reader036/viewer/2022070413/5697bfae1a28abf838c9c8cf/html5/thumbnails/11.jpg)
The Future - ISMB04 Poster Distribution
[Pie chart, ISMB04, Database Types: Nucleotide Sequence; RNA Sequence; Protein Sequence; Structure; Genome (non-human); Pathways; Genome (human); Disease; Gene Expression; Other]
![Page 12: Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip.](https://reader036.fdocuments.us/reader036/viewer/2022070413/5697bfae1a28abf838c9c8cf/html5/thumbnails/12.jpg)
What Does ISMB04 Tell Us About New Biological Databases?
• Microarray data resources are hot
• Genotypic-phenotypic resources are emerging
• Surprisingly, pathway resources are not growing fast
• Disease and species based resources are increasing – notably plants
• Human genome related resources are increasing
![Page 13: Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip.](https://reader036.fdocuments.us/reader036/viewer/2022070413/5697bfae1a28abf838c9c8cf/html5/thumbnails/13.jpg)
What About Data in These Databases?
![Page 14: Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip.](https://reader036.fdocuments.us/reader036/viewer/2022070413/5697bfae1a28abf838c9c8cf/html5/thumbnails/14.jpg)
Data are Becoming More Plentiful and More Complex
![Page 15: Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip.](https://reader036.fdocuments.us/reader036/viewer/2022070413/5697bfae1a28abf838c9c8cf/html5/thumbnails/15.jpg)
Data are Becoming More Redundant
Note: Redundancy at 30% Sequence Identity
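To make the "redundancy at 30% sequence identity" note concrete, here is a minimal Python sketch of how a redundancy filter at such a cutoff might work. It assumes the sequences are already aligned to equal length, and it computes identity over ungapped columns; other definitions of percent identity use a different denominator. The function names and the toy sequences are hypothetical and are not taken from the slides.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical residues over the ungapped columns of two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where either sequence has a gap character.
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


def nonredundant(seqs: dict, cutoff: float = 30.0) -> list:
    """Greedy filter: keep a sequence only if it is below the cutoff against every kept one."""
    kept = []
    for name, seq in seqs.items():
        if all(percent_identity(seq, seqs[k]) < cutoff for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy alignment: "b" differs from "a" at one position, "c" shares nothing.
toy = {"a": "ACDEFGHIKL", "b": "ACDEFGHIKV", "c": "WYWYWYWYWY"}
print(nonredundant(toy))
```

The greedy pass depends on input order, so real pipelines usually sort sequences first (for example by length or quality) before filtering.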
![Page 16: Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip.](https://reader036.fdocuments.us/reader036/viewer/2022070413/5697bfae1a28abf838c9c8cf/html5/thumbnails/16.jpg)
So the amount and complexity of data are increasing across biological scales – what are the challenges?
![Page 17: Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip.](https://reader036.fdocuments.us/reader036/viewer/2022070413/5697bfae1a28abf838c9c8cf/html5/thumbnails/17.jpg)
A Major Challenge
12:00
We suffer from the “high noon syndrome”
Those who can gain and contribute most to biological databases are frequently NOT the users
We need to lower the cost:benefit ratio
![Page 18: Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip.](https://reader036.fdocuments.us/reader036/viewer/2022070413/5697bfae1a28abf838c9c8cf/html5/thumbnails/18.jpg)
How Do We Lower this Barrier?
• Better support of complex data types e.g., networks, images, graphs
• Associated optimized query languages
• Associated ontologies
• Better handling of uncertainty and inconsistency
• More and automated data curation
• Large scale data integration
![Page 20: Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip.](https://reader036.fdocuments.us/reader036/viewer/2022070413/5697bfae1a28abf838c9c8cf/html5/thumbnails/20.jpg)
How Do We Lower this Barrier?
• Support of data provenance
• Support for rapid data and associated schema evolution
• Support for temporal data
• Better integration of data and methods
• Usability engineering
We need more work in these other areas
![Page 22: Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip.](https://reader036.fdocuments.us/reader036/viewer/2022070413/5697bfae1a28abf838c9c8cf/html5/thumbnails/22.jpg)
A Note on Data Provenance
![Page 23: Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip.](https://reader036.fdocuments.us/reader036/viewer/2022070413/5697bfae1a28abf838c9c8cf/html5/thumbnails/23.jpg)
Further Reading
• Jagadish and Olken (2003) Omics 7(1) 131-137. Data Management for Life Sciences Research http://www.lbl.gov/~olken/wmdbio
• Maojo and Kulikowski (2003) J. of AMIA 515-522. Bioinformatics and Medical Informatics – Collaborations on the Road to Genomic Medicine?
![Page 24: Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip.](https://reader036.fdocuments.us/reader036/viewer/2022070413/5697bfae1a28abf838c9c8cf/html5/thumbnails/24.jpg)
GeneXPress: A Visualization and Statistical Analysis Tool for Gene Expression and Sequence Data
Segal, Kaushal, Yelensky, Pham, Regev, Koller, Friedman
[Diagram: Data; Query & Analysis; Biological Results; Curation; Usability; Integration]
• Assign biological meaning to gene expression data through post-processing and visualization
![Page 25: Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip.](https://reader036.fdocuments.us/reader036/viewer/2022070413/5697bfae1a28abf838c9c8cf/html5/thumbnails/25.jpg)
Filtering Erroneous Protein Annotation
Wieser, Kretschmann and Apweiler
[Diagram: Data; Query & Analysis; Biological Results; Curation; Usability; Integration]
• Automated detection of annotation errors using a decision tree approach based upon the C4.5 data mining algorithm
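As background on the C4.5 algorithm named above: C4.5 chooses the attribute to split on by gain ratio, that is, information gain divided by the split information of the attribute. The Python sketch below computes that criterion on a toy table. It is not the authors' implementation, and the records and attribute names (`has_domain`, `kingdom`, `annotation`) are hypothetical, chosen only to illustrate the calculation.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attr, target):
    """C4.5 split criterion: information gain divided by split information."""
    total = len(rows)
    # Partition the target labels by the value of the candidate attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy([row[target] for row in rows]) - remainder
    split_info = -sum((len(g) / total) * log2(len(g) / total) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical annotation records; "annotation" is the class to predict.
records = [
    {"has_domain": "yes", "kingdom": "bacteria", "annotation": "ok"},
    {"has_domain": "yes", "kingdom": "eukaryota", "annotation": "ok"},
    {"has_domain": "no", "kingdom": "bacteria", "annotation": "error"},
    {"has_domain": "no", "kingdom": "eukaryota", "annotation": "error"},
]
best = max(["has_domain", "kingdom"], key=lambda a: gain_ratio(records, a, "annotation"))
print(best)
```

A full C4.5 tree applies this choice recursively and adds handling for continuous attributes, missing values and pruning, none of which is shown here.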
![Page 26: Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip.](https://reader036.fdocuments.us/reader036/viewer/2022070413/5697bfae1a28abf838c9c8cf/html5/thumbnails/26.jpg)
Selecting Biomedical Data Sources According to User Preferences
Cohen-Boulakia, Lair, Stransky, Graziani, Radvanyi, Barillot and Froidevaux
[Diagram: Data; Query & Analysis; Biological Results; Curation; Usability; Integration]
• Understand the characteristics of biological data
• Present a selection of resources relevant to a user query
• Framework for the multiple parametric analysis of cancer
![Page 27: Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip.](https://reader036.fdocuments.us/reader036/viewer/2022070413/5697bfae1a28abf838c9c8cf/html5/thumbnails/27.jpg)
Integration of Biological Data from Web Resources: Management of Multiple Answers through Metadata Retrieval
Devignes, Smail
[Diagram: Data; Query & Analysis; Biological Results; Curation; Usability; Integration]
• Same question – different answers from different resources – How can this be understood?
• Semantic integration based on domain ontologies
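One way to picture the "same question, different answers" problem is to keep each answer together with metadata about where it came from, then group sources by the answer they gave so that disagreements become visible. The Python sketch below does only that. It is an illustration under stated assumptions, not the method of the paper: the `Answer` fields, the resource names and the normalisation rule are all hypothetical, and no ontology-based reconciliation is attempted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    source: str     # hypothetical resource name
    value: str      # the answer that resource returned
    retrieved: str  # metadata: when the answer was retrieved


def group_answers(answers):
    """Group source names by normalised answer value, preserving input order."""
    grouped = defaultdict(list)
    for answer in answers:
        # Assumed normalisation: trim whitespace and ignore case.
        grouped[answer.value.strip().upper()].append(answer.source)
    return dict(grouped)


answers = [
    Answer("ResourceA", "tp53", "2004-07-01"),
    Answer("ResourceB", "TP53 ", "2004-06-15"),
    Answer("ResourceC", "P53", "2003-11-30"),
]
for value, sources in group_answers(answers).items():
    print(value, sources)
```

Here the two spellings that differ only in case and spacing collapse into one group, while the synonym stays separate; resolving synonyms is where a domain ontology would come in.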
![Page 28: Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip.](https://reader036.fdocuments.us/reader036/viewer/2022070413/5697bfae1a28abf838c9c8cf/html5/thumbnails/28.jpg)
Criticality-based Task Composition in Distributed Bioinformatics Systems
Karasavvas, Baldock, Burger
[Diagram: Data; Query & Analysis; Biological Results; Curation; Usability; Integration]
• Task composition in workflow systems requires decision support
• Provision of data provenance information provides that support
![Page 29: Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip.](https://reader036.fdocuments.us/reader036/viewer/2022070413/5697bfae1a28abf838c9c8cf/html5/thumbnails/29.jpg)
ENJOY !!