Posted on 12-Jan-2016
Abstract
BarleyBase is a USDA-funded public repository for plant microarray data. It houses raw and normalized expression data from the 22K Affymetrix Barley1 and Arabidopsis ATH1 GeneChips (presently the only two Affymetrix high-density arrays available for plants), along with experiment and sample information.
BarleyBase features a web-based, MIAME-compliant experiment submission tool, BarleyExpress, which allows users to efficiently submit and manage their experiment descriptions, array designs, and expression analysis information.
BarleyBase offers a broad set of query and display options at all data levels, from experiments and hybridizations down to probe sets and individual probes. Users can query microarray elements by expression profile and by the biological annotation of the probe sets. Probe set queries are integrated with visualization and analysis tools such as scatter plots, the R statistical toolbox, and data filters.
BarleyBase collaborates with the PlantGDB and Gramene databases to perform gene prediction and genome-level cross-species comparison using the Barley1 GeneChip exemplar sequences.
BarleyBase is accessible at http://www.BarleyBase.org/
BARLEYBASE – AN EXPRESSION PROFILING DATABASE FOR CEREAL GENOMICS
Xiaoyun Tang, Jian Gong, Jianqiang Xin, Lishuang Shen, Stacy Turner, Rico A. Caldo, Dan Nettleton, Roger P. Wise, Julie A. Dickerson*
Virtual Reality Applications Center, Iowa State University, Ames, Iowa 50011
Acknowledgments
1. BarleyBase is funded by USDA-NRI/CGP #2002-03582; the USDA-CSREES North American Barley Genome Project; and the USDA Initiative for Future Agriculture and Food Systems (IFAFS) #01-52100-11346.
2. PlantGDB, Gramene, KEGG, and TAIR, for providing tools or genomic data.
3. The many people who provided technical support and advice on BarleyBase development.
Fig. 2. BarleyBase Homepage
BarleyExpress Features
• MIAME-compliant, web-based data submission and annotation tool.
• Submission of experiments, array designs, protocols, samples, and expression data.
• Enforces the plant ontology, in collaboration with Gramene.
• Uses controlled vocabulary for descriptions wherever possible.
• First database to explicitly capture information on experiment factors and levels, so that experiments can be presented as factorial designs.
• Images and other supporting information can be uploaded.
• Minimal demands on users' computer skills and effort.
• Flexible access control lets submitters grant designated individuals or groups access to their private data before publication.
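The access-control rule for private submissions can be illustrated with a minimal sketch. It assumes, hypothetically, that each submission records its owner plus sets of individually granted users and granted groups, and that data become open to everyone once published; the class, method, and field names are illustrative and do not describe BarleyBase's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Submission:
    """Hypothetical record of a submission and the access grants made by its owner."""
    owner: str
    published: bool = False
    granted_users: set[str] = field(default_factory=set)
    granted_groups: set[str] = field(default_factory=set)

    def grant_user(self, user: str) -> None:
        self.granted_users.add(user)

    def grant_group(self, group: str) -> None:
        self.granted_groups.add(group)

    def can_view(self, user: str, user_groups: set[str]) -> bool:
        # After publication the data are public.
        if self.published:
            return True
        # Before publication: the owner, an individually granted user,
        # or any member of a granted group may view the data.
        return (
            user == self.owner
            or user in self.granted_users
            or bool(user_groups & self.granted_groups)
        )
```

For example, after `grant_group("wise-lab")`, a user who belongs to `wise-lab` can view the private submission, while an unrelated user cannot until it is published.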
BarleyBase Data Model
• BarleyBase uses a hierarchical data model, based on the Affymetrix GeneChip data formats, to store gene expression data.
• The highest-level data structure is the experiment. Each experiment contains one or more treatments, each treatment has one or more samples as replicates, and each sample has one or more hybridizations.
• Protocols are associated with an experiment at the hybridization level.
• Five types of tables: Array, Expression, Experiment, Protocol, and Submitter.
• Follows the MIAME principles recommended by MGED and implemented in MIAMExpress, but removes the Extract level and captures hybridization protocol information instead.
• Adds fields for statistical experimental design factors.
• Uses plant ontology and controlled vocabulary in experiment descriptions.
• Biological annotation for microarray probe sets and exemplars.
• Presently stores expression data only from Affymetrix GeneChips.
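The experiment, treatment, sample, and hybridization hierarchy can be sketched with Python dataclasses. This is a minimal in-memory illustration under our own assumptions: the class names follow the poster's terms, but fields such as `cel_file` and `levels` are hypothetical and do not reflect BarleyBase's actual MySQL tables.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Hybridization:
    cel_file: str                   # hypothetical: raw GeneChip data file for one hybridization
    protocol: Optional[str] = None  # protocols attach at the hybridization level


@dataclass
class Sample:
    name: str  # one biological replicate
    hybridizations: list[Hybridization] = field(default_factory=list)


@dataclass
class Treatment:
    levels: dict[str, str]  # hypothetical: experiment factor -> chosen level
    samples: list[Sample] = field(default_factory=list)


@dataclass
class Experiment:
    title: str
    treatments: list[Treatment] = field(default_factory=list)

    def hybridization_count(self) -> int:
        # Walk the hierarchy: experiment -> treatments -> samples -> hybridizations.
        return sum(len(s.hybridizations) for t in self.treatments for s in t.samples)
```

Each level owns a list of the next, so one experiment can hold several treatments, each with replicate samples, each with one or more hybridizations.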
Data Access
• Download complete data sets (experiment annotation plus raw and normalized expression data) in MAGE-ML, comma-separated values (CSV), or CEL file formats.
• Browse and query experiments, hybridizations, and probe sets.
• Query and filter probe sets by expression profile.
• Search by biological criteria: annotation keywords, sequence, probe set name, or pathway and gene family membership.
• Create and manage data sets of filtered probe sets.
• Owner-controlled group access to private submissions.
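As a rough illustration of expression-profile filtering and data set creation, the sketch below keeps probe sets that are both expressed and changing across treatments, then collects their full profiles into a data set for later analysis. The thresholds (a two-fold change and a minimum signal of 100) and the function names are hypothetical choices for illustration, not BarleyBase's documented filter criteria.

```python
from __future__ import annotations


def filter_probe_sets(
    expression: dict[str, list[float]],
    min_fold_change: float = 2.0,
    min_signal: float = 100.0,
) -> list[str]:
    """Return names of probe sets whose profile passes two simple filters.

    expression maps a probe set name to its linear-scale signals, one value per
    treatment. A probe set passes if its highest signal reaches min_signal and
    its highest-to-lowest signal ratio is at least min_fold_change.
    """
    selected = []
    for name, signals in expression.items():
        high, low = max(signals), min(signals)
        if high < min_signal or low <= 0:
            continue  # too weak, or a non-positive value that makes the ratio undefined
        if high / low >= min_fold_change:
            selected.append(name)
    return sorted(selected)


def make_data_set(
    expression: dict[str, list[float]], probe_sets: list[str]
) -> dict[str, list[float]]:
    """Build a data set holding the full profiles of the selected probe sets."""
    return {name: expression[name] for name in probe_sets}
```

With `{"ps1": [100.0, 450.0, 120.0], "ps2": [300.0, 310.0, 290.0]}`, only `ps1` passes, since its 4.5-fold change clears the threshold while `ps2` barely varies.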
Visualization & Analysis
• Visualization for experiments, hybridizations, probe sets, and probes.
• Data analysis uses data sets obtained from probe set filtering.
• Analysis methods include hierarchical clustering, k-means partitioning, principal component analysis (PCA), self-organizing maps (SOM), and multidimensional scaling (MDS).
• Identification of differentially expressed and co-expressed genes.
• Most data analyses and visualizations use R and Bioconductor.
• Probe alignments with exemplar sequences.
• Gene prediction through interconnection with the PlantGDB database.
• Cross-species comparative genomics through the Gramene database.
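k-means partitioning, one of the listed analysis methods, can be sketched in pure Python. BarleyBase runs its analyses in R and Bioconductor; this stand-alone version only illustrates the algorithm (Lloyd's iterations with Euclidean distance), and the function name and the deterministic initialization from the first k profiles are our own assumptions.

```python
from __future__ import annotations

import math


def kmeans(profiles: dict[str, list[float]], k: int, iterations: int = 100) -> dict[str, int]:
    """Partition expression profiles into k clusters with Lloyd's k-means.

    Centroids start at the first k profiles in name order so results are
    deterministic; practical analyses usually use several random restarts.
    """
    names = sorted(profiles)
    if not 1 <= k <= len(names):
        raise ValueError("k must be between 1 and the number of profiles")
    centroids = [list(profiles[n]) for n in names[:k]]
    assignment: dict[str, int] = {}
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid.
        new_assignment = {
            n: min(range(k), key=lambda c: math.dist(profiles[n], centroids[c]))
            for n in names
        }
        if new_assignment == assignment:
            break  # converged: no profile changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [profiles[n] for n in names if assignment[n] == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment
```

The returned mapping gives a cluster index for each probe set; probe sets with similar profiles across hybridizations end up sharing an index.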
Future Plans
1. Cross-experiment analysis.
2. Visualization and analysis tool development.
3. Barley1 exemplar annotation.
BarleyExpress Submission Steps
• Submit experiment design information.
• Submit experiment factors and factor levels as treatments.
• Batch upload raw GeneChip data.
• Associate raw data files with each studied treatment.
• Submit protocols (optional).
• Provide sample preparation details for each hybridization.
• Finalize the experiment submission.
• Grant access to designated individuals and groups.
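The step of submitting factors and factor levels as treatments can be illustrated for a full factorial design, in which every combination of levels forms one treatment. This sketch assumes a fully crossed design and does not cover incomplete designs; the function name and the factor names and levels in the usage note are hypothetical.

```python
from __future__ import annotations

from itertools import product


def treatments_from_factors(factors: dict[str, list[str]]) -> list[dict[str, str]]:
    """Enumerate the treatments of a full factorial design.

    Each treatment takes exactly one level from every factor, so the number of
    treatments is the product of the numbers of levels.
    """
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*(factors[n] for n in names))]
```

For a hypothetical experiment with `{"genotype": ["resistant", "susceptible"], "time_h": ["0", "24", "48"]}`, this yields 2 × 3 = 6 treatments, starting with `{"genotype": "resistant", "time_h": "0"}`.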
Data Acquisition & Processing
• Submitters upload experiment descriptions and raw expression data.
• BarleyBase normalizes submitted raw data with the Affymetrix MAS 5.0 statistical algorithm and with RMA (Robust Multi-array Average) from Bioconductor.
• Summary statistics and graphs are computed for raw and normalized expression data for quality diagnostics.
• All types of data are stored in an open-source MySQL database.
• BarleyBase assigns unique accession numbers to experiments, hybridizations, and samples.
• BarleyBase generates MAGE-ML and CSV files for batch download.
• Submitted experiments and associated data are available for online access and analysis.
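RMA's normalization stage uses quantile normalization, which forces every array in an experiment to share the same intensity distribution. The pure-Python sketch below shows only that one step; full RMA also performs background correction, log2 transformation, and median-polish summarization of probes into probe sets, and BarleyBase relies on the Bioconductor implementation rather than code like this. The function name and the position-based tie handling are our own simplifications.

```python
from __future__ import annotations


def quantile_normalize(columns: list[list[float]]) -> list[list[float]]:
    """Quantile-normalize arrays so they share the same empirical distribution.

    columns[j][i] is the intensity of probe i on array j. Ties are broken by
    position rather than averaged, a simplification of the usual procedure.
    """
    n_probes = len(columns[0])
    if any(len(col) != n_probes for col in columns):
        raise ValueError("all arrays must have the same number of probes")
    # Rank order of probes within each array, smallest intensity first.
    orders = [sorted(range(n_probes), key=col.__getitem__) for col in columns]
    # Reference distribution: the mean of the r-th smallest value across arrays.
    reference = [
        sum(col[order[r]] for col, order in zip(columns, orders)) / len(columns)
        for r in range(n_probes)
    ]
    # Replace every value with the reference value at its rank.
    normalized = []
    for order in orders:
        out = [0.0] * n_probes
        for r, i in enumerate(order):
            out[i] = reference[r]
        normalized.append(out)
    return normalized
```

After normalization each array contains exactly the same set of values; only their assignment to probes differs, preserving each array's within-array ranking.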
Fig. 1. BarleyBase Overview
Fig. 3. Major Steps in Experiment Submission
Fig. 6. Graphs for Hybridization Expression & Cluster
Fig. 5. Probe Set Query and Result Visualization
Fig. 4. Probe Alignment with Barley1 GeneChip Exemplar
[Fig. 1 diagram labels: User; Internet; BarleyExpress; BarleyBase data processing pipeline (MAS5.0, RMA); Query & Analysis; Batch download (MAGE-ML, CSV, raw data)]