Design of a Clinical Microarray Chip · compdiag.molgen.mpg.de/docs/jaeger@recomb2003.pdf

Post on 18-Jan-2021



Design of a Clinical Microarray Chip
J. Jäger and R. Spang

Department of Computational Molecular Biology, Max-Planck-Institute for Molecular Genetics, Ihnestr. 73, D-14195 Berlin (Germany)

E-mail: Jochen.Jaeger@molgen.mpg.de, Rainer.Spang@molgen.mpg.de

Problem setting

First results

Our goal is to reduce the cost of a clinical diagnostic system based on microarray chips. Currently, whole-genome chips are used to examine the expression of as many genes as possible. We study how many samples should be analyzed before moving from a whole-genome chip approach to a smaller, and therefore more cost-efficient, custom diagnostic chip.

[Figure: series of whole-genome chips → gene subset selection → new compact diagnostic chip]

How many whole-genome chips do we have to look at before we can design a new diagnostic chip?

Experimental design

Diagnostic signature and gene selection

[Figure: expression histograms, intensity vs. frequency]

Questions

Normal scenario: rank all genes by a test statistic that evaluates differences between diagnostic groups, then select the top genes from this list.
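The normal scenario can be sketched in a few lines of Python. The poster does not name the test statistic; this sketch assumes the absolute Welch two-sample t-statistic, and the function names `welch_t`, `rank_genes` and `select_top` are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t-statistic for the expression values of one gene in two groups."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # No within-group spread: infinitely separated if the means differ, else no signal.
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / se


def rank_genes(expr, labels):
    """Return gene indices sorted by |t|, most discriminative first.

    expr:   one list of per-gene expression values per sample
    labels: one 0/1 group label per sample (each group needs >= 2 samples)
    """
    n_genes = len(expr[0])
    scores = []
    for g in range(n_genes):
        group1 = [s[g] for s, y in zip(expr, labels) if y == 1]
        group0 = [s[g] for s, y in zip(expr, labels) if y == 0]
        scores.append(abs(welch_t(group1, group0)))
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)


def select_top(expr, labels, n_top):
    """Keep the n_top highest-ranked genes."""
    return rank_genes(expr, labels)[:n_top]


# Toy data: 4 samples x 3 genes; only gene 0 separates the two groups clearly.
expr = [[5.0, 1.0, 2.0], [5.2, 1.1, 2.5], [1.0, 1.0, 2.1], [1.2, 0.9, 2.4]]
labels = [1, 1, 0, 0]
print(select_top(expr, labels, 2))  # [0, 1]
```

Any other two-group statistic could be dropped into `welch_t`; only the resulting order matters for selecting the top genes.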

[Figure panels: perfect diagnostic marker gene; perfect diagnostic signature using more than one gene]

Boxplot: sampling k relapse and k control patients 20 times. Number of genes on the new chip fixed at 300. Accuracy using all data: 0.77.

Boxplot: sampling 20 relapse and 20 control patients 10 times, varying the number of genes from 10 to 300 in steps of 10.

[Figure: intensity vs. frequency histograms for Group 1 and Group 2; Gene 1 vs. Gene 2 plot]

For gene subset selection, we want to cover as many of the potential candidate genes as are needed for consistent and reliable class prediction. These genes need not be the most discriminative ones; rather, they should form a reliable subset to which further feature selection can be applied. Chip design therefore does not yet select genes for diagnostic signatures. Instead, it limits later feature selection to a subset of candidate genes, which is what distinguishes it from the well-known feature (gene) selection and diagnostic-signature selection.

To study the effects of a new chip design for clinical trials, we used the St. Jude* acute lymphoblastic leukemia dataset with 335 patient samples (68 of which relapsed) and simulated chip designs from this dataset. To learn a classifier, we used support vector machines with feature-selection prefiltering. Performance was evaluated using leave-one-out cross-validation.

* E.-J. Yeoh, M. E. Ross, et al.: Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell, 1:133-145, March 2002.
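The evaluation protocol can be sketched as follows, under clearly stated assumptions. To keep the sketch dependency-free, the support vector machine is replaced by a nearest-centroid classifier (a swapped-in technique, not the poster's method), and the prefilter is assumed to be the absolute Welch t-statistic. Gene prefiltering is repeated inside every leave-one-out fold so the held-out sample never influences which genes are kept; the poster does not say whether this was done. All function names are hypothetical.

```python
import math
from statistics import mean, variance


def t_scores(X, y):
    """|Welch t| per gene between label-1 and label-0 samples (assumed prefilter)."""
    scores = []
    for g in range(len(X[0])):
        a = [x[g] for x, lbl in zip(X, y) if lbl == 1]
        b = [x[g] for x, lbl in zip(X, y) if lbl == 0]
        diff = mean(a) - mean(b)
        se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
        scores.append(abs(diff) / se if se > 0 else (math.inf if diff else 0.0))
    return scores


def top_genes(X, y, n):
    s = t_scores(X, y)
    return sorted(range(len(s)), key=lambda g: s[g], reverse=True)[:n]


def fit_centroids(X, y, genes):
    """Nearest-centroid classifier: a stand-in for the SVM used on the poster."""
    return {lab: [mean(x[g] for x, lbl in zip(X, y) if lbl == lab) for g in genes]
            for lab in set(y)}


def predict(centroids, x, genes):
    """Assign x to the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda lab: math.dist([x[g] for g in genes], centroids[lab]))


def loocv_accuracy(X, y, n_genes):
    """Leave-one-out accuracy; gene prefiltering is redone inside each fold."""
    correct = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = top_genes(X_train, y_train, n_genes)
        model = fit_centroids(X_train, y_train, genes)
        correct += predict(model, X[i], genes) == y[i]
    return correct / len(X)


# Toy data: 3 relapse (1) and 3 control (0) samples; gene 0 is the informative one.
X = [[5.0, 0.3, 1.0], [5.5, 0.1, 1.2], [4.8, 0.2, 0.9],
     [1.0, 0.2, 1.1], [1.3, 0.3, 1.0], [0.9, 0.1, 1.2]]
y = [1, 1, 1, 0, 0, 0]
print(loocv_accuracy(X, y, n_genes=1))  # 1.0
```

Swapping in a real SVM only changes `fit_centroids` and `predict`; the fold loop and the in-fold prefilter stay the same.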

Difference between Chip Design and Feature Selection

A virtual chip holds only a subset of the genes on the original chip. To compare virtual chips, we measure the performance of a classifier built on each virtual chip. To simulate different study sizes, we randomly sample from the study and determine the subset of genes for the virtual chip from this sample only.

[Figure: simulation workflow. All chips of the whole study → random sample → select subset of genes for chip design → apply subset to whole study → virtual chips for whole study → feature selection → learn SVM classifier → calculate LOOCV performance of this classifier; top genes for all virtual chips]

How many genes should we put on the new chip? (Tradeoff between accuracy and budget.)
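The virtual-chip simulation described above can be sketched as follows. Assumptions: each pilot study is a balanced random sample of k relapse and k control patients (as in the boxplot captions); the default gene ranking is a hypothetical stand-in (absolute difference of class means), and any other ranking can be passed as `rank`; `evaluate` is a caller-supplied function that returns the accuracy of a classifier trained on the virtual-chip data, for example a LOOCV routine with its own feature selection. All names are hypothetical.

```python
import random
from statistics import mean


def mean_diff_ranking(X, y):
    """Hypothetical default ranking: genes sorted by |mean(relapse) - mean(control)|."""
    def score(g):
        return abs(mean(x[g] for x, lbl in zip(X, y) if lbl == 1)
                   - mean(x[g] for x, lbl in zip(X, y) if lbl == 0))
    return sorted(range(len(X[0])), key=score, reverse=True)


def virtual_chip_runs(X, y, k, chip_size, repeats, evaluate,
                      rank=mean_diff_ranking, seed=0):
    """Simulate `repeats` chip designs, each based on k relapse + k control patients.

    Returns one evaluate() result per repetition, i.e. the values a boxplot summarizes.
    """
    rng = random.Random(seed)
    relapse = [i for i, lbl in enumerate(y) if lbl == 1]
    control = [i for i, lbl in enumerate(y) if lbl == 0]
    results = []
    for _ in range(repeats):
        # 1. Random sample: a small pilot study measured on whole-genome chips.
        idx = rng.sample(relapse, k) + rng.sample(control, k)
        # 2. Chip design: keep the top `chip_size` genes, ranked on the pilot study only.
        chip = rank([X[i] for i in idx], [y[i] for i in idx])[:chip_size]
        # 3. Apply the subset to the whole study: every patient measured on the virtual chip.
        X_chip = [[x[g] for g in chip] for x in X]
        # 4. Evaluate a classifier (with its own feature selection) on the virtual chip.
        results.append(evaluate(X_chip, y))
    return results


# Toy study: gene 0 is strongly differential, genes 1-4 are uniform noise in [0, 1).
data_rng = random.Random(1)
y = [1] * 6 + [0] * 10
X = [[10.0 if lbl == 1 else 0.0] + [data_rng.random() for _ in range(4)] for lbl in y]

# Stand-in evaluator: patient 0's value for the top chip gene, showing gene 0 is always kept.
runs = virtual_chip_runs(X, y, k=3, chip_size=2, repeats=5,
                         evaluate=lambda Xc, yc: Xc[0][0])
print(runs)  # [10.0, 10.0, 10.0, 10.0, 10.0]
```

Keeping the ranking and the evaluation as parameters mirrors the separation on the poster: chip design only fixes the candidate genes, and feature selection and classifier training happen afterwards on the restricted data.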