Analysis Environments For Scientific Communities From Bases to Spaces Bruce R. Schatz Institute for...



Page 1: Analysis Environments For Scientific Communities

From Bases to Spaces

Bruce R. Schatz
Institute for Genomic Biology
University of Illinois at Urbana-Champaign
www.beespace.uiuc.edu

Baker Center for Bioinformatics
Iowa State University

October 6, 2006

Page 2: What are Analysis Environments?

Functional Analysis: find the underlying mechanisms of genes, behaviors, and diseases

Comparative Analysis: top-down data mining (vs. bottom-up) over multiple sources, especially literature

Page 3: Building Analysis Environments

Manual, by humans:
- Interaction: user navigation
- Classification: collection indexing

Automatic, by computers:
- Federation: search bridges across sources
- Integration: links among results
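To make the two automatic techniques concrete, here is a minimal sketch, assuming federation means sending one query to several independent sources and merging the hits, and integration means linking hits from different sources that refer to the same entity. The sources, records, and the `gene` field are hypothetical, and this does not describe how WCS or BeeSpace were actually built.

```python
from collections import defaultdict

# Hypothetical in-memory sources (invented records); real systems query remote databases.
SOURCES = {
    "literature": [
        {"id": "doc1", "text": "geneA mutants show altered behavior", "gene": "geneA"},
        {"id": "doc2", "text": "muscle structure in adults", "gene": None},
    ],
    "genes": [
        {"id": "g1", "text": "geneA sequence and map position", "gene": "geneA"},
    ],
}


def federated_search(query, sources):
    """Federation: send one keyword query to every source and tag each hit with its source."""
    hits = []
    for name, records in sources.items():
        for rec in records:
            if query.lower() in rec["text"].lower():
                hits.append((name, rec))
    return hits


def integrate(hits):
    """Integration: link hits from different sources that share a gene identifier."""
    links = defaultdict(list)
    for name, rec in hits:
        if rec["gene"] is not None:
            links[rec["gene"]].append((name, rec["id"]))
    return dict(links)


if __name__ == "__main__":
    print(integrate(federated_search("geneA", SOURCES)))
```

Keeping the two steps separate mirrors the slide: federation only bridges the search, while integration is what connects a literature hit to a data record.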

Page 4: Trends in Analysis Environments

Central versus distributed viewpoints:
- The 1990s, pre-genome: Entrez (NIH NCBI) versus WCS (NSF, Arizona)
- The 2000s, post-genome: GO (NIH curators) versus BeeSpace (NSF, Illinois)

Page 5: Pre-Genome Environments

Focused on syntax (pre-Web)

WCS (Worm Community System):
- Search words across sources
- Follow links across sources
- Words: automatic; links: manual

Towards integrated searching
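The split between automatic words and manual links can be sketched with an inverted index built automatically over all sources, plus a hand-curated link table. This is an illustration under stated assumptions: the documents, the `MANUAL_LINKS` table, and the function names are hypothetical, not the actual WCS design.

```python
import re
from collections import defaultdict

# Hypothetical documents from two sources, keyed by (source, document id).
DOCS = {
    ("newsletter", "n1"): "Mapping data for chromosome III",
    ("abstracts", "a7"): "New alleles mapped on chromosome III",
}

# Links are curated by hand ("links manual"); these entries are invented.
MANUAL_LINKS = {
    ("abstracts", "a7"): [("maps", "chrIII")],
}


def build_index(docs):
    """Words are indexed automatically: each lowercase token maps to the documents containing it."""
    index = defaultdict(set)
    for key, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(key)
    return index


def search(index, *words):
    """Return the documents, from any source, that contain every query word."""
    if not words:
        return set()
    return set.intersection(*(index.get(w.lower(), set()) for w in words))


def follow_links(key):
    """Follow the manually curated links out of a document, if any."""
    return MANUAL_LINKS.get(key, [])


if __name__ == "__main__":
    index = build_index(DOCS)
    for key in sorted(search(index, "chromosome", "III")):
        print(key, follow_links(key))
```

One word query returns hits from both sources at once, but only the hand-linked document leads on into the map data.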

Page 6: Post-Genome Environments

Focused on semantics (post-Web)

BeeSpace (Honey Bee Interspace):
- Navigate concepts across sources
- Integrate data across sources
- Concepts: automatic; links: automatic

Towards conceptual navigation

Page 7: Worm Community System

WCS Information:
- Literature: BIOSIS, MEDLINE, newsletters, meetings
- Data: genes, maps, sequences, strains, cells

WCS Functionality:
- Browsing: search, navigation
- Filtering: selection, analysis
- Sharing: linking, publishing

WCS: 250 users at 50 labs across the Internet (1991)

Page 8: WCS Molecular

Page 9: WCS Cellular

Page 10: WCS invokes gm

Page 11: WCS vis-à-vis acedb

Page 12: Towards the Interspace

From objects to concepts
From syntax to semantics

Infrastructure is interaction with abstraction:
- Internet is packet transmission across computers
- Interspace is concept navigation across repositories

Page 13: THE THIRD WAVE OF NET EVOLUTION

PACKETS

OBJECTS

CONCEPTS

Page 14: LEVELS OF INDEXES

[Figure: levels of indexes, from FORMAL (manual) to INFORMAL (automatic). Labels: Technology, Engineering, Electrical, IEEE; communities, groups, individuals.]

Page 15: Post-Genome Informatics I

Comparative analysis within the dry lab of biological knowledge

Classical organisms have genetic descriptions. There will be NO more classical organisms beyond mice and men, worms and flies, yeasts and weeds.

We must therefore use comparative genomics on classical organisms, via sequence homologies and literature analysis.

Page 16: Post-Genome Informatics II

Functional analysis within the dry lab of biological knowledge

Automatic annotation of genes to standard classifications, e.g. Gene Ontology, via homology on computed protein sequences.

Automatic analysis of functions in the scientific literature, e.g. concept spaces via text extraction. Thus we must use the functions found in literature descriptions.
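The first technique above, annotation via homology, can be illustrated by transferring classification terms from the most similar annotated protein. This is a minimal sketch under strong assumptions: sequences are already aligned to equal length (real pipelines use alignment tools such as BLAST), the 40% identity threshold is arbitrary, and `REFERENCE`, its sequences, and the `term-1`/`term-2` labels are hypothetical stand-ins for GO-annotated proteins.

```python
# Hypothetical annotated reference proteins; "term-N" stands in for a GO identifier.
REFERENCE = {
    "refA": ("MKTAYIAKQR", {"term-1"}),
    "refB": ("MKSAYLAKQE", {"term-2"}),
}


def percent_identity(a, b):
    """Identity over aligned, non-gap positions; assumes a and b are pre-aligned."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def annotate(query, reference, min_identity=40.0):
    """Transfer terms from the best-matching reference protein above a threshold."""
    best_id, best_score = None, -1.0
    for ref_id, (seq, _terms) in reference.items():
        score = percent_identity(query, seq)
        if score > best_score:
            best_id, best_score = ref_id, score
    if best_id is None or best_score < min_identity:
        return None, set()
    return best_id, set(reference[best_id][1])


if __name__ == "__main__":
    print(annotate("MKTAYIAKQE", REFERENCE))
```

The query matches refA at 9 of 10 positions and refB at 8 of 10, so refA's term is transferred; a sequence with no reasonable match receives no annotation.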

Page 17: Informatics: From Bases to Spaces

Data bases support genome data, e.g. FlyBase has sequences and maps; genes are annotated by Gene Ontology and linked to the biological literature.

Information spaces support biological literature, e.g. BeeSpace uses automatically generated conceptual relationships to navigate functions.
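The slides do not say how BeeSpace computes its conceptual relationships. One simple, common approach is term co-occurrence: terms that appear together in many documents are treated as related. The sketch below assumes each document has already been reduced to a set of extracted terms and ranks neighbors by the asymmetric weight co(a, b) / df(a). The toy documents and function names are hypothetical, and this is not presented as the BeeSpace algorithm.

```python
from collections import Counter
from itertools import combinations

# Toy documents invented for illustration, each reduced to a set of extracted terms.
DOC_TERMS = [
    {"foraging", "octopamine", "mushroom body"},
    {"foraging", "period gene"},
    {"foraging", "octopamine"},
    {"nursing", "vitellogenin"},
]


def build_concept_space(doc_terms):
    """Count document frequency per term and co-occurrence per ordered term pair."""
    df = Counter()
    co = Counter()
    for terms in doc_terms:
        df.update(terms)
        for a, b in combinations(sorted(terms), 2):
            co[(a, b)] += 1
            co[(b, a)] += 1
    return df, co


def related(term, df, co, top=3):
    """Rank neighbors of term by co(term, other) / df(term), ties broken by name."""
    if df[term] == 0:
        return []
    scores = {b: n / df[term] for (a, b), n in co.items() if a == term}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


if __name__ == "__main__":
    df, co = build_concept_space(DOC_TERMS)
    print(related("foraging", df, co))
```

Normalizing by the source term's frequency makes the relation asymmetric: here foraging points to octopamine with weight 2/3, while octopamine points back to foraging with weight 1.0.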

Page 18: BeeSpace FIBR Project

The BeeSpace project is an NSF FIBR flagship (Frontiers in Integrative Biological Research): $5M for 5 years at the University of Illinois

Analyzing nature and nurture in societal roles, using the honey bee as a model
(functional analysis of social behavior)

Genomic technologies in wet lab and dry lab:
- Bee [biology]: gene expressions
- Space [informatics]: concept navigations

Page 20: System Architecture

Page 21: Concept Navigation in BeeSpace

[Figure: users (behavioral biologist, molecular biologist, neuroscientist) and sources (neuroscience literature, molecular biology literature, bee literature, FlyBase, WormBase, bee genome, brain region localization, brain gene expression profiles).]

Page 22: V1 BeeSpace Community Collections

Organism: Honey Bee / Fruit Fly / Song Bird / Soy Bean
Behavior: Social / Territorial / Foraging / Nesting
Development: Behavioral Maturation / Insect Development / Insect Communication
Structure: Fly Genetics / Fly Biochemistry / Fly Physiology / Insect Neurophysiology

Page 23: CONCEPT SWITCHING

“Concept” versus “term”: a concept is a set of “semantically” equivalent terms

Concept switching: region-to-region (set-to-set) match

[Figure: Concept Space, showing a term within its semantic region]
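Concept switching can be sketched by treating each concept as a set of equivalent terms: look up the region containing the user's term in the source space, then pick the target concept whose term set overlaps that whole region most. The two concept spaces, their contents, and the use of Jaccard overlap as the set-to-set score are all assumptions made for illustration.

```python
# Hypothetical concept spaces for two community collections: concept -> equivalent terms.
BEE_SPACE = {
    "foraging": {"foraging", "food collection", "nectar collection"},
    "nursing": {"nursing", "brood care"},
}
FLY_SPACE = {
    "feeding behavior": {"feeding", "food collection", "food intake"},
    "parental care": {"brood care", "offspring care"},
}


def concept_of(term, space):
    """Find the concept (semantic region) whose term set contains the term."""
    for concept, terms in space.items():
        if term in terms:
            return concept
    return None


def jaccard(a, b):
    """Set overlap score in [0, 1]; an assumed choice of region-to-region measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def switch(term, source, target):
    """Match the term's whole region in source against every region in target."""
    concept = concept_of(term, source)
    if concept is None or not target:
        return None
    region = source[concept]
    best = max(target, key=lambda c: jaccard(region, target[c]))
    return best if jaccard(region, target[best]) > 0 else None


if __name__ == "__main__":
    print(switch("nectar collection", BEE_SPACE, FLY_SPACE))
```

The term "nectar collection" never appears in the fly space, so a term-to-term match fails; matching its region still finds "feeding behavior" through the shared term "food collection".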

Page 29: BeeSpace Analysis Environment

Build a concept space of biomedical literature for functional analysis of bee genes:

- Partition literature into community collections
- Extract and index concepts within collections
- Navigate concepts within documents
- Follow links from documents into databases

Locate candidate genes in related literatures, then follow links into genome databases.
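The last step, locating candidate genes and following links into databases, can be sketched as: search one community collection for a concept, count the gene symbols mentioned in matching documents, and look those genes up in a genome database. The collections, gene lexicon, and `XDB:` identifiers are hypothetical, and plain substring matching stands in for real concept extraction.

```python
from collections import Counter

# Literature already partitioned into community collections (documents invented).
COLLECTIONS = {
    "insect behavior": [
        "geneX expression differs in foraging workers",
        "geneX and geneY expression in the brain during foraging",
        "queen pheromone effects on worker physiology",
    ],
    "fly genetics": [
        "geneY mutants show altered circadian rhythm",
    ],
}
GENE_LEXICON = {"geneX", "geneY"}
GENOME_DB = {"geneX": "XDB:0001", "geneY": "XDB:0002"}  # hypothetical database ids


def candidate_genes(collection, concept):
    """Count gene symbols in documents of one collection that mention the concept."""
    counts = Counter()
    for doc in COLLECTIONS[collection]:
        if concept.lower() in doc.lower():
            counts.update(w for w in set(doc.split()) if w in GENE_LEXICON)
    return counts.most_common()


def follow_to_database(genes):
    """Follow links from candidate genes into the (hypothetical) genome database."""
    return {gene: GENOME_DB.get(gene) for gene, _count in genes}


if __name__ == "__main__":
    genes = candidate_genes("insect behavior", "foraging")
    print(genes, follow_to_database(genes))
```

Restricting the search to one collection is what the partitioning buys: the same concept searched in "fly genetics" yields no candidates here.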

Page 30: Well-Characterized Gene

Page 31: Poorly Characterized Gene

Page 32: Gene Summarization, BeeSpace V2

Page 34: Collaboration across Users

Page 41: Category Browse (Collection)

Page 44: Category Browse (Search)

Page 47: PlantSpace Examples

Page 56: Interactive Functional Analysis

BeeSpace will enable users to navigate a uniform space of diverse databases and literature sources for hypothesis development and testing. It is a software system that goes beyond a searchable database, using literature analyses to discover functional relationships between genes and behavior.

- Genes to behaviors
- Behaviors to genes
- Concepts to concepts
- Clusters to clusters
- Navigation across sources
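Navigating "genes to behaviors" and "behaviors to genes" needs the same associations indexed in both directions. A minimal sketch, assuming gene-behavior pairs have already been extracted (for example from literature co-mentions); the pairs and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical gene-behavior associations, e.g. co-mentions extracted from literature.
ASSOCIATIONS = [
    ("geneX", "foraging"),
    ("geneX", "circadian rhythm"),
    ("geneY", "foraging"),
    ("geneZ", "nursing"),
]


def build_navigation(pairs):
    """Index each association in both directions: gene -> behaviors, behavior -> genes."""
    genes_to_behaviors = defaultdict(set)
    behaviors_to_genes = defaultdict(set)
    for gene, behavior in pairs:
        genes_to_behaviors[gene].add(behavior)
        behaviors_to_genes[behavior].add(gene)
    return genes_to_behaviors, behaviors_to_genes


def neighbors_via_behavior(gene, g2b, b2g):
    """Go gene -> behaviors -> genes, returning other genes that share a behavior."""
    related = set()
    for behavior in g2b.get(gene, set()):
        related |= b2g[behavior]
    related.discard(gene)
    return related


if __name__ == "__main__":
    g2b, b2g = build_navigation(ASSOCIATIONS)
    print(neighbors_via_behavior("geneX", g2b, b2g))
```

Chaining the two directions gives a simple gene-to-gene hop through shared behaviors; the same pattern would apply to concepts or clusters.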

Page 57: BeeSpace Information Sources

General for all spaces:
- Scientific literature: MEDLINE, BIOSIS, CAB Abstracts
- Genome databases: GenBank, Protein Data Bank, ArrayExpress

Special for BeeSpace:
- Model organisms (heredity): gene descriptions (FlyBase, WormBase)
- Natural histories (environment): beekeeping books (Cornell, Harvard)

Page 58: XSpace Information Sources

- Organize genome databases (XBase)
- Compute gene descriptions from model organisms
- Partition scientific literature for organism X
- Compute XSpace using semantic indexing

Boost the functional analysis from special sources:
- Collecting useful data about natural histories, e.g. CowSpace leverage in AIPL databases

Page 59: Towards SoySpace

- Organize genome databases (SoyBase)
- Partition scientific literature for soybean
- Gene descriptions from models (TAIR)
- Natural histories from population databases

Key to functional analysis is special sources:
- Collecting appropriate text about genes
- Extracting adequate data about histories
- Leverage: national archives of germplasm and historical records for soybean crops

Page 60: Towards the Interspace

The Analysis Environment technology is GENERAL!

BirdSpace? BeeSpace? PigSpace? CowSpace? BehaviorSpace? BrainSpace? SoySpace? PlantSpace?

BioSpace… Interspace