Design of a Clinical Microarray Chip
J. Jäger and R. Spang
Department of Computational Molecular Biology
Max-Planck-Institute for Molecular Genetics, Ihnestr. 73, D-14195 Berlin (Germany)
E-mail: [email protected], [email protected]
Problem setting
First results
Our goal is to reduce the costs of a clinical diagnostic system based on microarray chips. Currently, whole genome chips are used to examine the expression of as many genes as possible. We study the problem of how many samples should be analyzed before moving from a full genome chip approach to a smaller and therefore more cost-efficient custom diagnostic chip.
[Figure: a series of whole genome chips is reduced by gene subset selection to a new compact diagnostic chip.]
How many whole genome chips do we have to look at before we can design a new diagnostic chip?
Experimental design
Diagnostic signature and gene selection
[Figure axes: Intensity, Frequency]
Questions
Normal scenario: Rank all genes based on a test statistic that evaluates differences between diagnostic groups. Then select the top genes from this list.
Perfect diagnostic marker gene
Perfect diagnostic signature using more than one gene
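The ranking step of the normal scenario can be illustrated with a minimal sketch. The poster does not name the test statistic; the Welch t-statistic used here is an assumption, and the function names and toy expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch t-statistic for two lists of expression values (assumed statistic)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_genes(expr, labels, n_top):
    """Rank genes by |t| between the two diagnostic groups and keep the top n_top.

    expr:   dict mapping gene name -> list of expression values, one per sample
    labels: list of 0/1 group labels, one per sample
    """
    scores = {}
    for gene, values in expr.items():
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        scores[gene] = abs(welch_t(group0, group1))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_top]


# Hypothetical toy data: three genes measured in three samples per group.
expr = {
    "gene_a": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],  # strong group difference
    "gene_b": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # no group difference
    "gene_c": [0.5, 0.7, 0.6, 1.0, 1.4, 0.9],  # moderate group difference
}
labels = [0, 0, 0, 1, 1, 1]
top = rank_genes(expr, labels, n_top=2)
print(top)
```

Any other two-group statistic could be substituted for `welch_t` without changing the rank-then-select structure.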
Boxplot sampling k relapse and k control patients 20 times. Number of genes on the new chip fixed to 300. Accuracy using all data: 0.77
Boxplot sampling 20 relapse and 20 control patients 10 times. Varying the number of genes from 10 to 300 in steps of 10.
[Figure: intensity distributions (axes: Intensity, Frequency) and a plot of Gene 1 against Gene 2 for Group 1 and Group 2]
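The idea of a perfect diagnostic signature that needs more than one gene can be made concrete with a small constructed example. The values below are hypothetical and chosen only for illustration: neither gene separates the groups on its own, but the difference of the two genes does.

```python
def separable(a, b):
    """True if a single threshold separates the values in a from those in b."""
    return max(a) < min(b) or max(b) < min(a)


# Hypothetical (gene 1, gene 2) expression pairs for each patient.
group1 = [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]
group2 = [(2.0, 1.0), (3.0, 2.0), (4.0, 3.0)]

# Each gene alone: the value ranges of the two groups overlap.
gene1_alone = separable([g1 for g1, _ in group1], [g1 for g1, _ in group2])
gene2_alone = separable([g2 for _, g2 in group1], [g2 for _, g2 in group2])

# Both genes together: the difference gene 1 - gene 2 separates the groups.
combined = separable([g1 - g2 for g1, g2 in group1],
                     [g1 - g2 for g1, g2 in group2])

print(gene1_alone, gene2_alone, combined)
```

A ranking that scores each gene separately could miss both genes in such a case, which is one reason a chip may need to carry genes that are not the most discriminative individually.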
For the gene subset selection we would like to cover as many of the potential candidates as necessary for a consistent and reliable class prediction. These genes do not necessarily have to be the most discriminative ones. Rather, they should form a reliable subset to which further feature selection can be applied. Chip design therefore does not yet select genes for diagnostic signatures; it limits further feature selection to a subset of genes to choose from, and in this respect differs from the well-known feature/gene and diagnostic signature selection.
To study the effects of a new chip design for clinical trials, we used the St. Jude* acute lymphoblastic leukemia dataset with 335 patient samples (68 of which had a relapse) and simulated chip design from this dataset. To learn a classifier we used support vector machines with feature selection prefiltering. The performance was evaluated using leave-one-out cross-validation (LOOCV).
* E.-J. Yeoh, M.E. Ross et al.: Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell, 1:133-145, March 2002.
Difference of Chip Design and Feature Selection
A virtual chip holds only a subset of the genes of the original chip. To compare virtual chips, we measure the performance of a classifier trained on the genes of each virtual chip. To simulate different study sizes, we randomly sample patients from the study and determine the subset of genes for the virtual chip from this sample alone.
[Figure: simulation workflow]
All chips of the whole study → sampling (random sample) → select subset of genes for chip design → top genes for all virtual chips → apply subset to whole study → virtual chips for whole study → feature selection → learn SVM classifier → calculate LOOCV performance of this classifier
How many genes should we put on the new chip? (tradeoff between accuracy and budget)
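The simulation described in this section can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code used for the poster: a nearest-centroid classifier stands in for the support vector machine, a Welch t-score stands in for the unspecified ranking statistic, and all function names and the synthetic data are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, variance


def t_score(values, labels):
    """Absolute Welch t-statistic of one gene between groups 0 and 1 (assumed)."""
    a = [v for v, lab in zip(values, labels) if lab == 0]
    b = [v for v, lab in zip(values, labels) if lab == 1]
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0


def top_genes(X, y, genes, n):
    """Return the n gene indices from `genes` with the largest t-score."""
    return sorted(genes, key=lambda g: t_score([row[g] for row in X], y),
                  reverse=True)[:n]


def design_virtual_chip(X, y, k, chip_size, rng):
    """Sample k patients per group and choose the chip genes from that sample only."""
    idx0 = [i for i, lab in enumerate(y) if lab == 0]
    idx1 = [i for i, lab in enumerate(y) if lab == 1]
    sample = rng.sample(idx0, k) + rng.sample(idx1, k)
    return top_genes([X[i] for i in sample], [y[i] for i in sample],
                     range(len(X[0])), chip_size)


def predict(X_train, y_train, x, genes):
    """Nearest-centroid prediction restricted to `genes` (stand-in for the SVM)."""
    dist = {}
    for lab in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == lab]
        centroid = [mean(r[g] for r in rows) for g in genes]
        dist[lab] = sum((x[g] - c) ** 2 for g, c in zip(genes, centroid))
    return min(dist, key=dist.get)


def loocv_accuracy(X, y, chip, n_features):
    """Leave-one-out accuracy on the whole study, using only genes on the chip."""
    correct = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        # Feature selection is repeated inside each fold, on chip genes only.
        feats = top_genes(X_tr, y_tr, chip, n_features)
        correct += predict(X_tr, y_tr, X[i], feats) == y[i]
    return correct / len(X)


# Synthetic study: 20 patients per group, 30 genes, the first 3 informative.
rng = random.Random(0)
n_genes = 30
y = [0] * 20 + [1] * 20
X = [[rng.gauss(3.0 if (lab == 1 and g < 3) else 0.0, 1.0)
      for g in range(n_genes)] for lab in y]

chip = design_virtual_chip(X, y, k=8, chip_size=5, rng=rng)
accuracy = loocv_accuracy(X, y, chip, n_features=3)
print(chip, accuracy)
```

Repeating `design_virtual_chip` over several random samples for each value of `k` or `chip_size` gives the distribution of accuracies that a boxplot would summarize.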