Bioinformatics Tools for Microarray Analysis
Connie Wu
Dr. Jim Breaux
Dr. Sandeep Gulati
ViaLogy
Southern California Bioinformatics Institute
Summer 2004
Funded by the National Science Foundation and National Institutes of Health
Company Overview
• Discovered and developed software implementation of Active Signal Processing (called Quantum Resonance Interferometry)
• Applying QRI to analysis of DNA microarrays enhances performance:
  • Increased detection sensitivity and dynamic range
  • Increased specificity
  • Increased reproducibility
Company Overview
• VMAxS: web-based service for analyzing microarrays using QRI.
[Figure: VMAxS workflow diagram. Labels: Microarray image, Active Signal Processing, Signal Values, Cel Report, Cel Report File Reader, Further Analysis in R]
Project 1: Development of a more efficient file reader
• VMAxS generates a Cel Report with gene- and feature-level signal for a single microarray:
  • ~22000 genes
  • ≤ 69 features per gene
  • ≤ 7 statistical values for each gene and feature
[Figure: Cel Report]
Project 1: Development of a more efficient file reader
• Goals:
  • Read through the entire file in the shortest amount of time
  • Store the data in an R data structure for further analysis
  • Extract the statistic of interest with all labels attached (e.g. gene names, gene feature names)
R version of the Cel Report reader: average run time for one execution is over 30 sec.
Things to consider…
• Reading a file when no header information is disclosed
• Reading a file as efficiently as possible = “open, read, close” in one step
• Use a more efficient language: C
• Interface C with R
• Transferring the C data structure to an R data structure
C Data Structure
1. Gene Feature ID
2. Gene Feature
3. Gene ID
4. Number of features per Gene
5. Gene Results
R Data Structure
1. Feature Data
2. Number of Features
3. Gene Results
Advantages…
All vectors in C are dynamically allocated.
Both time and memory efficient:
1. File is only read once
2. Only the appropriate amount of memory is allocated for each data set
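
The read-once idea can be sketched as below. This is a Python illustration, not the C reader from the project. The slides do not describe the Cel Report layout, so the tab-delimited format, the `GENE` and `FEAT` row tags, and every name in the sketch are hypothetical. The output mirrors the vectors listed under "C Data Structure": IDs, results, and a per-gene feature count.

```python
import io

def read_report(stream):
    """Parse a report in a single pass over the stream.

    Assumed layout (hypothetical, not from the slides), tab-delimited:
      GENE <gene_id>    <stat_1> ... <stat_k>
      FEAT <feature_id> <stat_1> ... <stat_k>   (features follow their gene)
    """
    gene_ids, gene_results, n_features = [], [], []
    feature_ids, feature_results = [], []
    for line in stream:                        # each line is visited exactly once
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "":                    # skip blank lines
            continue
        tag, ident = fields[0], fields[1]
        stats = [float(x) for x in fields[2:]]
        if tag == "GENE":
            gene_ids.append(ident)
            gene_results.append(stats)
            n_features.append(0)               # incremented as FEAT rows arrive
        elif tag == "FEAT":
            feature_ids.append(ident)
            feature_results.append(stats)
            n_features[-1] += 1
    return {
        "gene_ids": gene_ids,
        "gene_results": gene_results,
        "n_features": n_features,
        "feature_ids": feature_ids,
        "feature_results": feature_results,
    }

sample = (
    "GENE\tg1\t10.0\t0.5\n"
    "FEAT\tg1_f1\t9.0\t0.4\n"
    "FEAT\tg1_f2\t11.0\t0.6\n"
    "GENE\tg2\t3.0\t0.1\n"
    "FEAT\tg2_f1\t3.0\t0.1\n"
)
report = read_report(io.StringIO(sample))
print(report["n_features"])
```

Storing the number of features per gene lets the flat feature vectors be mapped back to their genes without repeating the gene ID on every feature row.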
Runtime Comparison
Cel Reports (~22000 genes each)    R Version        C Version
16                                 9 min 25 sec     28 sec
42                                 37 min 57 sec    1 min 12 sec
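
As a worked check of the timings above, the speedup can be computed directly. The short Python sketch below uses only the four reported run times; the helper name `to_seconds` is illustrative.

```python
def to_seconds(minutes, seconds):
    """Convert a (minutes, seconds) timing to seconds."""
    return 60 * minutes + seconds

# number of Cel Reports -> (R reader time, C reader time), in seconds
runs = {
    16: (to_seconds(9, 25), to_seconds(0, 28)),
    42: (to_seconds(37, 57), to_seconds(1, 12)),
}

speedups = {n: r_time / c_time for n, (r_time, c_time) in runs.items()}
for n, factor in speedups.items():
    print(f"{n} Cel Reports: C reader is about {factor:.1f}x faster")
```

With these figures the C reader comes out roughly 20 times faster on 16 reports and roughly 32 times faster on 42 reports.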
Project 2: Development of an automated comparative performance report
• Compare the performance of ViaLogy’s analytical process to that of the current standard approach (e.g., GCOS from Affymetrix)
• Write an R script to automatically generate the following plots for the performance report:
1. Sensitivity Bar Plots
2. CV Plots
3. ECDF Plots
Sensitivity Bar Plots
• Compares the Sensitivity of VMAxS to GCOS
1. Genes called Present in GCOS
2. Genes called Present in VMAxS
3. Genes called Present in GCOS, Absent in VMAxS
4. Genes called Present in VMAxS, Absent in GCOS
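
The four bar heights can be derived from per-gene detection calls with set operations. The sketch below is in Python rather than the R script used in the project, and it assumes each method supplies one call per gene, with any call other than "P" counted as Absent. The function name and input layout are hypothetical.

```python
def sensitivity_counts(gcos_calls, vmaxs_calls):
    """Count genes in the four categories shown in the bar plot.

    Each argument maps gene ID -> detection call ("P" or "A").
    Assumption: any call other than "P" is treated as Absent.
    """
    genes = gcos_calls.keys() & vmaxs_calls.keys()   # genes scored by both methods
    gcos_p = {g for g in genes if gcos_calls[g] == "P"}
    vmaxs_p = {g for g in genes if vmaxs_calls[g] == "P"}
    return {
        "Present in GCOS": len(gcos_p),
        "Present in VMAxS": len(vmaxs_p),
        "Present in GCOS, Absent in VMAxS": len(gcos_p - vmaxs_p),
        "Present in VMAxS, Absent in GCOS": len(vmaxs_p - gcos_p),
    }

# Toy calls, made up for illustration only
gcos = {"g1": "P", "g2": "A", "g3": "P", "g4": "A"}
vmaxs = {"g1": "P", "g2": "P", "g3": "A", "g4": "P"}
counts = sensitivity_counts(gcos, vmaxs)
for label, n in counts.items():
    print(f"{label}: {n}")
```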
CV Plots
• Purpose: Compare reproducibility
• Displays scatter plots of CV values for each gene.
• CV_i = std. dev. / mean of the replicate signal values for gene i
• For each group of replicates, plot CV_i,GCOS vs. CV_i,VMAxS
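
A minimal Python sketch of the CV calculation follows (the project itself used R). It assumes the sample standard deviation (n - 1 denominator), which the slides do not specify. The function names and the layout of the inputs are hypothetical.

```python
from statistics import mean, stdev

def cv(values):
    """Coefficient of variation: sample std. dev. divided by the mean."""
    return stdev(values) / mean(values)

def cv_pairs(gcos_signals, vmaxs_signals):
    """Return gene -> (CV under GCOS, CV under VMAxS).

    Each argument maps gene ID -> list of replicate signal values.
    Each pair is one point in the scatter plot.
    """
    return {
        gene: (cv(gcos_signals[gene]), cv(vmaxs_signals[gene]))
        for gene in gcos_signals
        if gene in vmaxs_signals
    }

# Toy replicate signals, made up for illustration only
gcos = {"g1": [100, 110, 90], "g2": [50, 50, 50]}
vmaxs = {"g1": [100, 105, 95], "g2": [40, 60, 50]}
pairs = cv_pairs(gcos, vmaxs)
print(pairs["g1"])
```

A point below the diagonal of the scatter plot means the gene has a smaller CV, and so better reproducibility, under the method on the vertical axis.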
ECDF Plot
• Displays empirical cumulative distribution function (ECDF) of the CV values for each analytical method
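
The ECDF at a value x is the fraction of genes whose CV is at most x. A small Python sketch, with a hypothetical helper name and made-up CV values:

```python
from bisect import bisect_right

def ecdf(values):
    """Return a function F where F(x) is the fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)

    def F(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(ordered, x) / n

    return F

# Toy CV values, made up for illustration only
F = ecdf([0.05, 0.10, 0.10, 0.30])
print(F(0.10))
```

When the two methods' curves are drawn together, the curve that rises faster belongs to the method with more genes at low CV.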
Subgroup Analysis
• For a given set of replicates, break down the data into smaller groups and compare the reproducibility in smaller sets of data
• One way to break down: consider PRESENT/ABSENT calls
• Divide the genes into groups based on the number of PRESENT calls received for each analytical method, e.g.:
  • 6 P in VMAxS, 0 P in GCOS
  • 6 P in VMAxS, 1 P in GCOS
  • 6 P in VMAxS, 2 P in GCOS
  • …
  • 0 P in VMAxS, 6 P in GCOS
• Total of 49 (7x7) groups for 6 replicates.
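
The grouping can be sketched by keying each gene on its pair of PRESENT counts. This is a Python illustration under the assumption that each method provides one "P" or "A" call per replicate; the function name and data layout are hypothetical.

```python
from collections import defaultdict

def subgroup_by_present_calls(vmaxs_calls, gcos_calls):
    """Group genes by (number of P calls in VMAxS, number of P calls in GCOS).

    Each argument maps gene ID -> list of per-replicate calls ("P" or "A").
    """
    groups = defaultdict(list)
    for gene, calls in vmaxs_calls.items():
        if gene in gcos_calls:
            key = (calls.count("P"), gcos_calls[gene].count("P"))
            groups[key].append(gene)
    return dict(groups)

n_replicates = 6
# Each method can give 0..6 PRESENT calls, so there are 7 x 7 possible groups
n_possible_groups = (n_replicates + 1) ** 2

# Toy calls for 6 replicates, made up for illustration only
vmaxs = {"g1": ["P"] * 6, "g2": ["P"] * 6, "g3": ["A"] * 6}
gcos = {
    "g1": ["A"] * 6,
    "g2": ["P", "P", "A", "A", "A", "A"],
    "g3": ["P"] * 6,
}
groups = subgroup_by_present_calls(vmaxs, gcos)
print(n_possible_groups, sorted(groups))
```

The reproducibility comparison (CV and ECDF) can then be repeated within each group rather than over all genes at once.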