BioHealthBase: The Bioinformatics Resource Center for Francisella tularensis

Shubhada Godbole (1), Stephen M. Beckstrom-Sternberg (2,3), Paul S. Keim (2,3), Raymond K. Auerbach (2,3), Zuoming Deng (4), Aihui Wang (4), Jianjun Wang (4), Burke Squires (1), Christopher N. Larsen (5), Alvin Ramsey (5), Kevin Biersack (4), Tom Brettin (6) and Richard H. Scheuermann (1)

(1) Department of Pathology and Division of Biomedical Informatics, University of Texas Southwestern Medical Center, Dallas, TX
(2) Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ
(3) Pathogen Genomics Division, Translational Genomics Research Institute (TGen), Phoenix, AZ
(4) Northrop Grumman Information Technology, Rockville, MD
(5) Vecna Technologies, Inc., College Park, MD
(6) Bioscience Division and the Joint Genome Institute, Los Alamos National Laboratory, Los Alamos, NM

Abstract

BioHealthBase (BHB) is a Bioinformatics Resource Center (BRC) for Biodefense and Emerging/Re-emerging Infectious Diseases funded by the Division of Microbiology and Infectious Diseases (DMID) of the National Institute of Allergy and Infectious Diseases (NIAID). The goal of BHB is to provide bioinformatics resources and support to research communities involved in the development of vaccines, therapeutics, and diagnostics for organisms considered potential agents of bioterrorism, including Francisella tularensis. BioHealthBase (www.biohealthbase.org) is an integrated and comprehensive relational database designed to collect, analyze, annotate, store, query, view and display genomics, protein structure, protein function, metabolic and signaling pathway, and polymorphism data. It is equipped with a user-friendly interface that provides a single robust point of access for the scientific community. With the sequencing of new Francisella genomes, there is an increasing need to compile genomic data, improve genome annotations, and provide comparative genomics analyses as well as information on available scientific resources for Francisella researchers. An overview of the BioHealthBase database and its current resources for Francisella genomics data will be presented, along with preliminary genome annotation and comparative genomics data for the Francisella tularensis Wyoming (FTW) strain.

BioHealthBase Mission

To assist scientific researchers in their development of vaccines, therapeutics, and diagnostics by providing data integration, data accessibility and bioinformatics support.

BioHealthBase Goals

- To provide a central, integrated and comprehensive data repository, equipped with a user-friendly interface for data retrieval and analysis, for a wide variety of scientific data on selected pathogenic organisms.
- To provide a platform for software tools that support investigator-driven data analysis.

Francisella strains in BioHealthBase

- Francisella tularensis subspecies tularensis strain Schu S4
- Francisella tularensis subspecies tularensis strain FSC198
- Francisella tularensis subspecies holarctica strain OSU18
- Francisella tularensis subspecies holarctica strain LVS

Types of Data Available

- Genomic sequence data
- Protein sequence, molecular weight, domains, motifs, secondary structure, etc.
- Operons
- GO annotations (Molecular Function, Localization, Biological Process)
- Predicted tRNA and rRNA genes
- Mutation sites, phenotypes and links to mutant strain resources
- Pathways: metabolic and cellular signaling
- Host-pathogen interactions
- Links to research community and organism-specific news, resources, literature, etc.
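To make the idea of a relational store for these data types more concrete, the following is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, locus tags and gene rows are hypothetical placeholders chosen for illustration; they are not the actual BioHealthBase schema or real annotations. Only the four strain names come from the poster.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE strain (
            strain_id  INTEGER PRIMARY KEY,
            subspecies TEXT NOT NULL,
            name       TEXT NOT NULL
        );
        CREATE TABLE gene (
            gene_id      INTEGER PRIMARY KEY,
            strain_id    INTEGER NOT NULL REFERENCES strain(strain_id),
            locus_tag    TEXT NOT NULL,   -- placeholder identifiers only
            feature_type TEXT NOT NULL,   -- e.g. CDS, tRNA, rRNA
            product      TEXT
        );
        """
    )
    # Strain names are taken from the poster; the IDs are arbitrary.
    strains = [
        (1, "tularensis", "Schu S4"),
        (2, "tularensis", "FSC198"),
        (3, "holarctica", "OSU18"),
        (4, "holarctica", "LVS"),
    ]
    conn.executemany("INSERT INTO strain VALUES (?, ?, ?)", strains)
    # Made-up gene rows, present only so the query below returns something.
    genes = [
        (1, 1, "DEMO_0001", "CDS", "hypothetical protein"),
        (2, 1, "DEMO_0002", "tRNA", None),
        (3, 4, "DEMO_0003", "CDS", "hypothetical protein"),
    ]
    conn.executemany("INSERT INTO gene VALUES (?, ?, ?, ?, ?)", genes)
    return conn


def count_features(conn, feature_type):
    """Return (strain name, feature count) pairs for one feature type."""
    query = """
        SELECT s.name, COUNT(g.gene_id)
        FROM strain AS s
        LEFT JOIN gene AS g
               ON g.strain_id = s.strain_id AND g.feature_type = ?
        GROUP BY s.strain_id
        ORDER BY s.strain_id
    """
    return conn.execute(query, (feature_type,)).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    for name, n in count_features(db, "CDS"):
        print(name, n)
```

The LEFT JOIN keeps strains with no matching features in the result, which is usually what a per-strain summary view needs.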

Tools Available at BioHealthBase

- BLAST
- Multiple sequence alignments
- Genome browser viewer

Future Releases

- Immune epitope data from IEDB
- 3-D structure visualization tools for available protein structures
- Enhanced manual curation and literature-derived GO annotations
- Proteomics data to support ORF predictions
- Comparative genomics analyses of Francisella strains

[Figure panels, screenshots not reproduced: Francisella homepage; Query interface; Query results; Genome browser visualization; Gene prediction]

BHB Genome Curation Dataflow

[Flow diagram not reproduced. Components as labeled: Raw Sequence Data; Automated Annotation Pipeline; Manual Curation (Literature and Other Public Databases); Curation Database; Reference Sequence Database; Analytical Tools; Visualization Tools; Query Interface; Research Community Users]

Gene Prediction

[Flow diagram not reproduced. Components as labeled:]

- Input: Francisella genomic sequences
- tRNA genes: tRNAscan-SE
- rRNA genes: blastN
- Prediction of protein-coding genes (CDS): Glimmer3
- Structural features: RBS, start/stop codons
- Manual curation, drawing on literature and public data sources
- Output: predicted genes and protein sequences
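As a rough illustration of what start/stop codon structure means for CDS prediction, the sketch below scans the three reading frames of one strand for open reading frames. It is deliberately naive and is not Glimmer3, which scores candidate genes with interpolated Markov models. The function names, the choice of ATG/GTG/TTG as start codons, and the `min_codons` threshold are assumptions made for this example.

```python
# Naive ORF scan for illustration only; real pipelines use model-based
# gene finders such as Glimmer3.
START_CODONS = {"ATG", "GTG", "TTG"}  # common bacterial start codons (assumed set)
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=30):
    """Find ORFs on the given strand.

    Returns (start, end, frame) tuples with 0-based start, exclusive end,
    and the stop codon included in the span. An ORF runs from the first
    start codon seen in a frame to the next in-frame stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


if __name__ == "__main__":
    demo = "CC" + "ATG" + "AAA" * 4 + "TAA" + "GG"  # toy sequence, not real data
    print(find_orfs(demo, min_codons=3))
    # To scan the opposite strand, pass reverse_complement(demo) instead.
```

Coordinates returned for the reverse complement refer to the reversed sequence and would need converting back before being compared with forward-strand features.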

Protein Annotation

[Flow diagram not reproduced. Components as labeled:]

- Input: predicted protein sequences
- BlastP vs. integrated, non-redundant protein database
- Signal peptide
- Transmembrane segments
- Secondary structure
- Hydrophobicity plot
- Functional domains
- Motifs
- Protein family
- Homologs/orthologs
- Manual curation, drawing on literature and public data sources
- Output: protein sequence with functional assignment, GO annotation, EC number, pathway affiliation, or hypothetical protein or pseudogene status
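One of the annotation features listed above, the hydrophobicity plot, can be sketched as a sliding-window average over the Kyte-Doolittle hydropathy scale. The poster does not say which scale or window width BioHealthBase uses, so both are assumptions here, as is the function name.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Return the mean hydropathy of each full window along the sequence.

    The result has len(protein) - window + 1 values, or is empty when the
    sequence is shorter than the window. Non-standard residue codes raise
    KeyError rather than being silently skipped.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    toy = "MKKLLIVLAAVLLAGCSSDE"  # made-up peptide, not a Francisella protein
    for position, value in enumerate(hydropathy_profile(toy), start=1):
        print(position, round(value, 2))
```

Plotting these values against window position gives the usual hydropathy curve; sustained positive stretches are the kind of signal that transmembrane segment predictors look for, although dedicated tools use more than a simple average.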

BioHealthBase Genome Annotation

BioHealthBase has adapted TIGR’s software infrastructure for bacterial genome annotation. It includes automated annotation tools as well as a manual curation interface. We are currently collaborating with LANL and TGen on whole-genome annotation of a newly sequenced Francisella tularensis Wyoming strain, an A2-type isolate from the Western/Rocky Mountain region of the USA.

References

* For TIGR’s annotation infrastructure, please refer to http://manatee.sourceforge.net/

Supported by NIAID/NIH N01-AI-40041