Analysis Environments For Scientific Communities From Bases to Spaces Bruce R. Schatz Institute for...

Post on 11-Jan-2016

213 views 0 download

Tags:

Transcript of Analysis Environments For Scientific Communities From Bases to Spaces Bruce R. Schatz Institute for...

Analysis EnvironmentsAnalysis Environments For Scientific CommunitiesFor Scientific Communities

From Bases to SpacesFrom Bases to Spaces

Bruce R. SchatzInstitute for Genomic Biology

University of Illinois at Urbana-Champaignschatz@uiuc.edu,www.beespace.uiuc.edu

Baker Center for BioinformaticsIowa State University

October 6, 2006

What are Analysis EnvironmentsWhat are Analysis Environments

Functional Analysis Find the underlying Mechanisms Of Genes, Behaviors, Diseases

Comparative Analysis Top-down data mining (vs Bottom-up) Multiple Sources especially literature

Building Analysis EnvironmentsBuilding Analysis Environments

Manual by Humans Interaction user navigation Classification collection indexing

Automatic by Computers Federation search bridges Integration results links

Trends in Analysis EnvironmentsTrends in Analysis Environments

Central versus Distributed Viewpoints

The 90s Pre-Genome Entrez (NIH NCBI) versus WCS (NSF Arizona)

The 00s Post-Genome GO (NIH curators) versus BeeSpace (NSF Illinois)

Pre-Genome EnvironmentsPre-Genome Environments

Focused on Syntax pre-Web

WCS (Worm Community System) Search words across sources Follow links across sources Words automatic, Links manual

Towards Integrated Searching

Post-Genome EnvironmentsPost-Genome Environments

Focused on Semantics post-Web

BeeSpace (Honey Bee Inter Space) Navigate concepts across sources Integrate data across sources Concepts automatic, Links automatic

Towards Conceptual Navigation

Worm Community SystemWorm Community System WCS Information:Literature BIOSIS, MEDLINE, newsletters,

meetings

Data Genes, Maps, Sequences, strains, cells

WCS FunctionalityBrowsing search, navigationFiltering selection, analysisSharing linking, publishing

WCS: 250 users at 50 labs across Internet (1991)

WCSMolecular

WCS Cellular

WCS invokes

gm

WCS vis-à-vis

acedb

from Objects to Concepts

from Syntax to Semantics

Infrastructure is Interaction with Abstraction

Internet is packet transmission across computers

Interspace is concept navigation across repositories

Towards the InterspaceTowards the Interspace

THE THIRD WAVE OF NET EVOLUTIONTHE THIRD WAVE OF NET EVOLUTION

PACKETS

OBJECTS

CONCEPTS

Technology

Engineering

Electrical

FORMAL

INFORMAL

(manual)

(automatic)

IEEE

communities

groups

individuals

LEVELS OF INDEXESLEVELS OF INDEXES

Post-Genome Informatics IPost-Genome Informatics I

Comparative Analysis within theDry Lab of Biological Knowledge

Classical Organisms have Genetic Descriptions.There will be NO more classical organisms beyondMice and Men, Worms and Flies, Yeasts and Weeds.

Must use comparative genomics on classical organismsVia sequence homologies and literature analysis.

Post-Genome Informatics IIPost-Genome Informatics II

Functional Analysis within theDry Lab of Biological Knowledge

Automatic annotation of genes to standard classifications, e.g. Gene Ontology via homology on computed protein sequences.

Automatic analysis of functions to scientific literature, e.g. concept spaces via text extractions. Thus must use functions in literature descriptions.

Informatics: From Bases to SpacesInformatics: From Bases to Spaces

data Bases support genome datae.g. FlyBase has sequences and mapsGenes annotated by GeneOntology and

linked to biological literature

information Spaces support biological literaturee.g. BeeSpace uses automatically generated conceptual relationships to navigate functions

BeeSpace FIBR ProjectBeeSpace FIBR Project

BeeSpace project is NSF FIBR flagshipFrontiers Integrative Biological Research, $5M for 5 years at University of Illinois

Analyzing Nature and Nurture in Societal Roles using honey bee as model

(Functional Analysis of Social Behavior)

Genomic technologies in wet lab and dry lab BeeBee [Biology] gene expressions SpaceSpace [Informatics] concept navigations

System ArchitectureSystem Architecture

Concept Navigation in BeeSpaceConcept Navigation in BeeSpace

NeuroscienceLiterature

MolecularBiology

Literature

BeeLiterature

Flybase,WormBase

BeeGenome

Brain RegionLocalization

Brain GeneExpression

Profiles

BehavioralBiologist

MolecularBiologist

Neuro-scientist

V1 BeeSpace Community CollectionsV1 BeeSpace Community Collections

Organism Honey Bee / Fruit Fly Song Bird / Soy Bean

Behavior Social / Territorial Foraging / Nesting

Development Behavioral Maturation Insect Development Insect Communication

 Structure Fly Genetics / Fly Biochemistry Fly Physiology / Insect Neurophysiology

CONCEPT SWITCHINGCONCEPT SWITCHING

“Concept” versus “Term” set of “semantically” equivalent terms

Concept switching region to region (set to set) match

term

Semantic region

Concept SpaceConcept Space

BeeSpace Analysis EnvironmentBeeSpace Analysis Environment Build Concept Space of Biomedical Literature

for Functional Analysis of Bee Genes

-Partition Literature into Community Collections-Extract and Index Concepts within Collections-Navigate Concepts within Documents-Follow Links from Documents into Databases

Locate Candidate Genes in Related Literatures then follow links into Genome Databases

Well Characterized GeneWell Characterized Gene

Poorly Characterized GenePoorly Characterized Gene

Gene Summarization, BeeSpace V2

Collaboration across UsersCollaboration across Users

Category Browse (Collection)Category Browse (Collection)

Category Browse (Search)Category Browse (Search)

PlantSpace ExamplesPlantSpace Examples

Interactive Functional AnalysisInteractive Functional AnalysisBeeSpace will enable users to navigate a uniform space of

diverse databases and literature sources for hypothesis development and testing, with a software system beyond a searchable database, using literature analyses to discover functional relationships between genes and behavior.

Genes to BehaviorsBehaviors to GenesConcepts to ConceptsClusters to ClustersNavigation across Sources

BeeSpace Information SourcesBeeSpace Information Sources

General for All Spaces: Scientific Literature-Medline, Biosis, CAB Abstracts Genome Databases-GenBank, ProteinDataBank, ArrayExpress

Special for BeeSpace: Model Organisms (heredity)-Gene Descriptions (FlyBase, WormBase) Natural Histories (environment)-BeeKeeping Books (Cornell, Harvard)

XSpace Information SourcesXSpace Information SourcesOrganize Genome Databases (XBase)Compute Gene Descriptions from Model OrganismsPartition Scientific Literature for Organism XCompute XSpace using Semantic Indexing

Boost the Functional Analysis from Special SourcesCollecting Useful Data about Natural Historiese.g. CowSpace Leverage in AIPL Databases

Towards SoySpaceTowards SoySpace Organize Genome Databases (SoyBase) Partition Scientific Literature for SoyBean Gene Descriptions from Models (TAIR) Natural Histories from Population Databases

Key to Functional Analysis is Special Sources Collecting Appropriate Text about Genes Extracting Adequate Data about Histories Leverage is National Archives of germplasm

and Historical Records for soybean crops

Towards the InterspaceTowards the Interspace

The Analysis Environment technology is GENERAL!

BirdSpace? BeeSpace?PigSpace? CowSpace? BehaviorSpace? BrainSpace?SoySpace? PlantSpace?

BioSpace… Interspace