BeeSpace Informatics: Interactive System for Functional Analysis
Bruce Schatz
Institute for Genomic Biology
University of Illinois at Urbana-Champaign
www.beespace.uiuc.edu
Fifth Annual Project Workshop
IGB, Urbana IL, May 22, 2009
Concept Navigation in BeeSpace
[Figure: concept navigation diagram. Labels: Neuroscience Literature, Molecular Biology Literature, Bee Literature, FlyBase, WormBase, Bee Genome, Brain Region Localization, Brain Gene Expression Profiles. User roles: Behavioral Biologist, Molecular Biologist, Neuroscientist.]
Informatics: From Bases to Spaces
Data Bases support genome data
- e.g. FlyBase has sequences and maps
- Genes annotated by Gene Ontology and linked to biological literature
Information Spaces support biological literature
- e.g. BeeSpace uses automatically generated conceptual relationships to navigate functions
System Architecture
System Versions
V1 Filter Concept Graph
- Search, Expand, Merge, Switch, Visualize
V2 Cluster Conceptual Groupings
- Small Worlds (Natural), Language Model (Steerable), Concepts/Documents
V3 Summarize Gene Descriptions
- Gene Extraction, Sentence Classification
V4 Analyze Functional Concepts
- Concept Identification, Category Grouping
V5 Answer Entity Relationships
- Entities, Relations, Templates
Informatics Researchers (Faculty)
Investigators:
- Bruce Schatz, systems (Medical Information Science)
- ChengXiang Zhai, algorithms (Computer Science)
Collaborators (students):
- Saurabh Sinha, Computer Science
- Jiawei Han, Computer Science
- Sheng Zhong, Bioengineering
- Nathan Price, Chemical & Biomolecular Engineering
Collaborators (advice):
- John MacMullen, Library & Information Science
- Dan Roth, Computer Science
- Roxana Girju, Linguistics
- Karrie Karahalios, Computer Science
Informatics Researchers (Staff)
V1-V3
- Todd Littell, research programmer
- Jim Buell, research coordinator
- Nyla Ismail, biology postdoc
- Moushumi Sen Sarma, biology postdoc
V4-V5
- David Arcoleo, research programmer
- Barry Sanders, research programmer
- Moushumi Sen Sarma, biology postdoc
- Radhika Khetani, biology postdoc
Informatics Researchers (Students)
V1 Filter (parse): Jing Jiang, Azadeh Shakery, Yuanhua Lv
V2 Cluster (group): Brant Chee, Qiaozhu Mei, Peixiang Zhao
V3 Summarize (classify): Xu Ling, Jing Jiang, Qiaozhu Mei, Xin He
V4 Analyze (annotate): Xin He, Brant Chee, Moushumi Sarma, Xu Ling
V5 Answer (extract): Xu Ling, Xin He, Yanen Li, Yue Lu
Analysis Environment: Features
SPACE is a Paradigm not a Metaphor!
Point of View for YOUR Problem
Externally:
- Dynamically describe custom Region of Space
- Merge Regions to form Hypothesis Space
- Differentially express genes against Space
Analysis Environment: System
Concepts and Genes are Universal Entities!
Uniformly Represented, Uniformly Manipulated
Internally:
- Extract and Index Concepts within Collections
- Navigate Concepts within Documents
- Follow Genes from Documents into Databases
Automatic Categorization v2
Sorting of Spaces based on Metadata
Sorting of Spaces based on Ontology
- MeSH for Medline Abstracts
- Gene Ontology computed for documents
Sorting of Spaces based on Clustering
- Natural Maps from Small Worlds
- Steerable Maps from Language Models
Semantic Indexing of Dynamic Spaces
Fast System enables Interactive Sorting!
Small World Graph
Semantics Deeper and Faster
Semantic Indexing across all of Medline
- Previous Attempts used Word Co-Occurrence
- Now Phrase Parser works general-purpose
- Now Mutual Information full differential
Parallel Optimization of MI Graph
- Real-time Computation Shared Memory Cluster
- Interactive on our 16PC 256GB RAM workerbee
- Dynamic Spaces then Dynamic Semantic Indexing
Interactive Clustering Natural Map
- Heuristic Approximation Small Worlds Graphs
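The mutual-information (MI) graph named on this slide can be illustrated with a minimal sketch. It assumes document-level co-occurrence of parsed phrases and uses pointwise mutual information as the edge weight. The function name `pmi_graph`, the `min_pmi` threshold, and the toy documents are hypothetical; the slides do not specify the actual BeeSpace computation or its parallel optimization.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_graph(docs, min_pmi=0.0):
    """Return {(phrase_a, phrase_b): pmi} for co-occurring phrase pairs.

    docs: list of sets of phrases, one set per document (assumed input format).
    PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) ), estimated from document counts.
    """
    n_docs = len(docs)
    phrase_count = Counter()
    pair_count = Counter()
    for phrases in docs:
        phrase_count.update(phrases)
        # Sort so each unordered pair is counted under one canonical key.
        pair_count.update(combinations(sorted(phrases), 2))
    edges = {}
    for (a, b), n_ab in pair_count.items():
        pmi = math.log2(n_ab * n_docs / (phrase_count[a] * phrase_count[b]))
        if pmi > min_pmi:
            edges[(a, b)] = pmi
    return edges


# Toy input, for illustration only.
TOY_DOCS = [
    {"foraging", "brain"},
    {"foraging", "brain"},
    {"kinase"},
    {"kinase", "brain"},
]
edges = pmi_graph(TOY_DOCS)
```

On the toy input only the pair ("brain", "foraging") keeps a positive weight; ("brain", "kinase") co-occurs less often than chance and is dropped.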
Dynamic Clustering
Automatic Curation v3
Automatic Summarization of Genes
- Retrieve relevant sentences about gene
- Classify sentences into important aspects: protein domain, homolog/ortholog, expression pattern, phenotype, function, regulatory element, genetic interaction
Generalizing to Biology Entities
- Genes, anatomical, behavior, chemical
- Question answering from biology factoids
Computed Curation from Literature
Gene Summary (FlyBase)
[Figure: FlyBase gene summary, annotated with aspect labels GP, EL, SI, GI, MP, WFPI]
Gene Summary (BeeSpace)
Structured summary consists of relevant sentences covering 6 aspects of a gene:
- Gene Products (GP)
- Expression Location (EL)
- Sequence Information (SI)
- Wild-type Function & Phenotypic Information (WFPI)
- Mutant Phenotype (MP)
- Genetic Interaction (GI)
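The retrieve-then-classify idea behind the structured summary can be sketched as follows. This is only an illustration: it assumes a simple keyword-cue scorer, whereas the slides do not say how BeeSpace classifies sentences. The cue lists, the function names `classify_sentence` and `summarize`, and the example sentences are all hypothetical.

```python
import re

# Hypothetical cue phrases per aspect; a real system would learn these.
ASPECT_CUES = {
    "GP": ("protein", "encodes", "kinase"),
    "EL": ("expressed in", "localized"),
    "SI": ("sequence", "exon", "cdna"),
    "WFPI": ("required for", "functions in"),
    "MP": ("mutant", "mutation"),
    "GI": ("interacts with", "suppressor", "enhancer"),
}


def classify_sentence(sentence):
    """Return the aspect whose cues match most often, or None if none match."""
    text = sentence.lower()
    scores = {a: sum(cue in text for cue in cues) for a, cues in ASPECT_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def summarize(gene, sentences):
    """Group sentences that mention the gene under the six aspects."""
    mentions_gene = re.compile(r"\b" + re.escape(gene) + r"\b", re.IGNORECASE)
    summary = {aspect: [] for aspect in ASPECT_CUES}
    for sentence in sentences:
        if not mentions_gene.search(sentence):
            continue  # retrieval step: keep only sentences about this gene
        aspect = classify_sentence(sentence)
        if aspect is not None:
            summary[aspect].append(sentence)
    return summary


# Illustrative sentences only, not taken from any curated source.
SENTENCES = [
    "Abl encodes a tyrosine kinase.",
    "Abl is expressed in the embryonic nervous system.",
    "Abl mutant embryos show axon defects.",
    "This sentence does not mention the gene.",
]
summary = summarize("Abl", SENTENCES)
```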
Drosophila gene Abelson (Abl) tyrosine kinase
Tribolium gene Scr
Gene Summarizer New Aspects
New categories (proposed by FlyBase curators):
- GP + SI => PS (protein domain or structure)
- SI => HO (homologs or orthologs)
- EL => EP (spatial/temporal expression patterns)
- SI => RE (regulatory element information)
- WFPI + MP => PF (wild-type or mutant phenotype and function)
- GI => IT (genetic or physical interaction)
- New (beyond FlyBase) => PG (population genetics)
Utilize cross-domain information for improving the GS on other organisms.
BeeSpace System v3
SPACES and REGIONS: Dynamic and Relative
Space is collection of documents
Region is collection of terms
- Extract creates new Region from old Space
- Map creates new Space from old Region
- New from Old Spaces and Regions via merges
- Summarize classifies Gene within Space
- Annotate finds differential functional expression
BeeSpace Semantic Operations
Merge (S1,S2) into S3
Summarize (S) into Gene classify
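The Extract, Map, and Merge operations can be sketched with plain sets, taking a Space to be a set of document ids and a Region to be a set of terms. This is a minimal sketch under stated assumptions: Extract is taken to pick the most frequent terms, Map to retrieve documents containing any Region term, and Merge to be set union. The toy `CORPUS` and the function names are hypothetical; the slides do not define the actual selection criteria.

```python
from collections import Counter

# Toy corpus, for illustration only: document id -> text.
CORPUS = {
    "d1": "foraging behavior in honey bee brain",
    "d2": "kinase signaling in fly brain",
    "d3": "honey bee foraging gene expression",
}


def extract(space, top_k=3):
    """Extract: new Region (terms) from an old Space (document ids)."""
    counts = Counter(w for doc_id in space for w in set(CORPUS[doc_id].split()))
    # Rank by document frequency, breaking ties alphabetically for determinism.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return set(ranked[:top_k])


def map_region(region):
    """Map: new Space (document ids) from an old Region (terms)."""
    return {doc_id for doc_id, text in CORPUS.items() if region & set(text.split())}


def merge(a, b):
    """Merge two Spaces (or two Regions) into a new one; union is an assumption."""
    return a | b


bee_region = extract({"d1", "d3"})
brain_space = map_region({"brain"})
merged_space = merge({"d1"}, {"d3"})
```

Because both operations return ordinary sets, the output of one can feed the other, which mirrors the "New from Old" chaining described on the System v3 slide.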
New Interface v4
Single Window, Multiple Panes
- Space Panel, Service Tabs
SPACES: custom, system
FILTER: searching, sorting
CLUSTER: map natural and steerable
SUMMARIZE: categorize using space
ANALYZE: annotate using space
Functional Analysis v4
The software system goes beyond a searchable database, using statistical literature analyses to discover functional relationships between genes and behavior.
This research will enable all scientists who study bee genes to live on the frontier of integrative biology, where biotechnology enables routine expression analysis and bioinformatics enables functional analysis unconstrained by pre-existing categories.
Genelist Analyzer v4
- Differential Expression of Gene Names against Space
- Background is custom made Literature Space
- Produces Concept List from Gene List
- Analyze using Concept Navigation and Gene Summarization
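One way to read "differential expression of gene names against Space" is as an over-representation score: concepts that appear more often in documents mentioning the listed genes than in the background Space as a whole. The sketch below assumes that reading and uses a smoothed log ratio of document frequencies. The scoring formula, the function name `concept_list`, and the toy Space are assumptions, not the documented BeeSpace method.

```python
import math


def concept_list(gene_list, space, concepts, top_k=5):
    """Rank concepts over-represented in documents that mention the genes.

    space: list of sets of terms, one set per document (background Space).
    Returns concepts with a positive smoothed log2 frequency ratio.
    """
    genes = set(gene_list)
    foreground = [terms for terms in space if terms & genes]
    if not foreground:
        return []
    scored = []
    for concept in concepts:
        fg = sum(concept in terms for terms in foreground)
        bg = sum(concept in terms for terms in space)
        # Add-one smoothing keeps the ratio finite for unseen concepts.
        fg_rate = (fg + 1) / (len(foreground) + 2)
        bg_rate = (bg + 1) / (len(space) + 2)
        scored.append((math.log2(fg_rate / bg_rate), concept))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [concept for score, concept in scored[:top_k] if score > 0]


# Toy background Space, for illustration only.
TOY_SPACE = [
    {"for", "foraging"},
    {"for", "foraging", "brain"},
    {"abl", "axon"},
    {"brain"},
    {"axon", "brain"},
    {"memory"},
]
ranked = concept_list(["for"], TOY_SPACE, ["foraging", "brain", "axon", "memory"])
```

Swapping in a different background Space changes the scores, which is the point of making the background custom.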
Question Answering v5
Entities and Relations
Question Answering templates
Entity
- Gene, Anatomical
- Behavior, Chemical
Relation
- Regulation (Gene-Gene)
- Expression (Gene-Anatomy)
- Function (Gene-Behavior): Biological Process
- Function (Gene-Chemical): Molecular Function
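The entity and relation types listed above can be written down as a small typed schema, with a question template treated as a lookup over extracted relations. This is a sketch under assumptions: the class names, the `answer` function, and the placeholder facts are hypothetical, and the slides do not describe how BeeSpace stores or queries extracted relations.

```python
from dataclasses import dataclass

# Relation name -> (subject entity type, object entity type), from the slide.
SCHEMA = {
    "Regulation": ("Gene", "Gene"),
    "Expression": ("Gene", "Anatomy"),
    "Biological Process": ("Gene", "Behavior"),
    "Molecular Function": ("Gene", "Chemical"),
}


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str  # "Gene", "Anatomy", "Behavior", or "Chemical"


@dataclass(frozen=True)
class Relation:
    kind: str
    subject: Entity
    obj: Entity

    def __post_init__(self):
        # Reject relations whose entity types do not fit the schema.
        expected = SCHEMA[self.kind]
        if (self.subject.kind, self.obj.kind) != expected:
            raise ValueError(f"{self.kind} expects {expected}")


def answer(facts, kind, subject_name):
    """Template 'What does <gene> <relation>?': return matching object names."""
    return [f.obj.name for f in facts if f.kind == kind and f.subject.name == subject_name]


# Placeholder facts with made-up names, for illustration only.
gene_a = Entity("geneA", "Gene")
gene_b = Entity("geneB", "Gene")
FACTS = [
    Relation("Regulation", gene_a, gene_b),
    Relation("Expression", gene_a, Entity("region X", "Anatomy")),
    Relation("Biological Process", gene_b, Entity("behavior Y", "Behavior")),
]
expressed_in = answer(FACTS, "Expression", "geneA")
```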
Towards the Interspace
The Analysis Environment technology is GENERAL!
BirdSpace? BeeSpace?
PigSpace? CowSpace?
ArthropodSpace? AnimalSpace?
BioSpace? MedSpace?