Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e...
-
Upload
ethan-parks -
Category
Documents
-
view
215 -
download
1
Transcript of Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e...
![Page 1: Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei.](https://reader034.fdocuments.us/reader034/viewer/2022042821/56649b4e550346318e8c7340/html5/thumbnails/1.jpg)
Statistical Issues in the Design of Statistical Issues in the Design of Microarray ExperimentsMicroarray Experiments
Lara LusaLara LusaU.O. Statistica Medica e BiometriaU.O. Statistica Medica e Biometria
Istituto Nazionale per lo Studio e la Cura dei Tumori, MilanoIstituto Nazionale per lo Studio e la Cura dei Tumori, Milano
NETTAB 2003NETTAB 2003
Bologna, 28th November 2003Bologna, 28th November 2003
![Page 2: Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei.](https://reader034.fdocuments.us/reader034/viewer/2022042821/56649b4e550346318e8c7340/html5/thumbnails/2.jpg)
Outline
• Biostatistics and microarrays
• Study objectives
• Design of microarray experiments
• A case study: a designed experiment
![Page 3: Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei.](https://reader034.fdocuments.us/reader034/viewer/2022042821/56649b4e550346318e8c7340/html5/thumbnails/3.jpg)
Biostatistics and microarrays
• Microarray research: unique challenge for interdisciplinary collaboration
• Can biostatisticians be useful in microarray research?
• Are available software tools a valid substitution for collaboration with biostatisticians?
![Page 4: Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei.](https://reader034.fdocuments.us/reader034/viewer/2022042821/56649b4e550346318e8c7340/html5/thumbnails/4.jpg)
What can biostatisticians do?
• Active collaboration with researchers from biomedical and bioinformatics fields– to develop and critically evaluate methods for
• design of microarray experiments
• analysis of data
– to perform data-analysis– to develop software tools and train biomedical
researchers to use them
![Page 5: Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei.](https://reader034.fdocuments.us/reader034/viewer/2022042821/56649b4e550346318e8c7340/html5/thumbnails/5.jpg)
Italian inter-university research group
• Statistical issues in design and analysis of microarray data
• MIUR grant 2003-2005– Firenze (Annibale Biggeri)– Milano (Giuseppe Gallus)– Padova (Monica Chiogna)– Torino (Mauro Gasparini)– Udine(Corrado Lagazio)
![Page 6: Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei.](https://reader034.fdocuments.us/reader034/viewer/2022042821/56649b4e550346318e8c7340/html5/thumbnails/6.jpg)
Collaborations
•Milano •Istituto Nazionale per lo Studio e la Cura dei Tumori, Milano (statisticians, biologists, molecular oncologists)• IFOM, Milano (biologists, bioinformatics)• Biometric Research Branch, NCI, Bethesda (statisticians)•“Edo Tempia” Foundation, Biella• Bioconductor poject (software development)
![Page 7: Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei.](https://reader034.fdocuments.us/reader034/viewer/2022042821/56649b4e550346318e8c7340/html5/thumbnails/7.jpg)
Study objectives
• Class comparison (supervised)– establish differences in gene expression between
predetermined classes
• Class prediction (supervised)– prediction of phenotype using gene expression data
• Class discovery (unsupervised)– discover groups of samples or genes with similar
expression
![Page 8: Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei.](https://reader034.fdocuments.us/reader034/viewer/2022042821/56649b4e550346318e8c7340/html5/thumbnails/8.jpg)
Design of microarray experiments
• Design of arrays
• Allocation of samples– Replication– Labeling of samples (cDNA)
• reference design
• balanced block design
• loop design
![Page 9: Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei.](https://reader034.fdocuments.us/reader034/viewer/2022042821/56649b4e550346318e8c7340/html5/thumbnails/9.jpg)
Levels of replication
• Biological replicates– multiple samples from different populations
• Technical replicates– multiple samples from the same subject– multiple samples from the same mRNA– multiple clones or probes of the same gene on
the array
![Page 10: Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei.](https://reader034.fdocuments.us/reader034/viewer/2022042821/56649b4e550346318e8c7340/html5/thumbnails/10.jpg)
How many replicates?• Biological replicates essential to make inference
about population• Technical replicates useful for quality control
and for increasing precision• How to determine sample size?
– Problem-dependent– simple methods available for class comparison
problems – not yet clear what to use for class discovery
![Page 11: Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei.](https://reader034.fdocuments.us/reader034/viewer/2022042821/56649b4e550346318e8c7340/html5/thumbnails/11.jpg)
Common pitfalls in microarray experiments
• Too little or no replication• Use of replication at the wrong level• Experiments with cell lines: assuming no
variability among cell lines of the same type• Inappropriate use of pooling
– ok: use of multiple independent pools– but: is it useful? – individual information lost
![Page 12: Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei.](https://reader034.fdocuments.us/reader034/viewer/2022042821/56649b4e550346318e8c7340/html5/thumbnails/12.jpg)
Case study: a designed experiment
• Biological aim: assess the effect of Toremifen on MCF-7 breast cancer cell line, in terms of gene expression
![Page 13: Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei.](https://reader034.fdocuments.us/reader034/viewer/2022042821/56649b4e550346318e8c7340/html5/thumbnails/13.jpg)
BATCH
A1 A2 A3
Control
B1 B2 B3
Control Control Treatment Treatment Treatment
CDNA
Affymetrix
CDNA
Affymetrix
CDNA
Affymetrix
CDNA
Affymetrix
CDNA
Affymetrix
CDNA
Affymetrix
POOL POOL
CDNA Affymetrix CDNA Affymetrix
Week 1
![Page 14: Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei.](https://reader034.fdocuments.us/reader034/viewer/2022042821/56649b4e550346318e8c7340/html5/thumbnails/14.jpg)
BATCH
A1 A2 A3
Control
B1 B2 B3
Control Control Treatment Treatment Treatment
POOL POOL
CDNA Affymetrix CDNA Affymetrix
Week 2 and 3
![Page 15: Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei.](https://reader034.fdocuments.us/reader034/viewer/2022042821/56649b4e550346318e8c7340/html5/thumbnails/15.jpg)
Statistical aims
• comparison of microarray platforms (cDNA vs Affymetrix)
• hybridization of individual samples vs pools
• variability of cell lines
• robustness of commonly used statistical methods
![Page 16: Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei.](https://reader034.fdocuments.us/reader034/viewer/2022042821/56649b4e550346318e8c7340/html5/thumbnails/16.jpg)
Data available (so far)
• Hybridizations from Affymetrix HGU133 Chips
• Summary measure of intensities: MAS5 (Affymetrix, 2002)
• most commonly used, but other possibilities available
– Robust Multichip Analysis (Irizarry et al., 2002)– Model-Based Expression Index (Li and Wong,
2001) (at least 16 chips!)
![Page 17: Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei.](https://reader034.fdocuments.us/reader034/viewer/2022042821/56649b4e550346318e8c7340/html5/thumbnails/17.jpg)
Brief summary of data• HGU133A:
– chipA : 22.283 probe sets– chipB : 22.645 probe sets
• Present – chipA: 48.5%– chipB: 38.2%
• pm<mm– chipA: 27%– chipB: 31%
![Page 18: Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei.](https://reader034.fdocuments.us/reader034/viewer/2022042821/56649b4e550346318e8c7340/html5/thumbnails/18.jpg)
Methods for exploring reproducibility among arrays
• Pearson’s coefficient of correlation (common, but wrong!)
• Coefficient of variation
• Distribution of differences of intensities
• Altman and Bland’s plot (MA plot)
![Page 19: Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei.](https://reader034.fdocuments.us/reader034/viewer/2022042821/56649b4e550346318e8c7340/html5/thumbnails/19.jpg)
![Page 20: Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei.](https://reader034.fdocuments.us/reader034/viewer/2022042821/56649b4e550346318e8c7340/html5/thumbnails/20.jpg)
Class comparison• Identification of differentially expressed genes
between treated and not treated cell lines
• t-tests (adjusting for multiple comparisons)– all arrays– only pooled arrays– only individual arrays
• ANOVA (linear) model – estimation of treatment effect, adjusting for pool effect
and week effect
![Page 21: Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei.](https://reader034.fdocuments.us/reader034/viewer/2022042821/56649b4e550346318e8c7340/html5/thumbnails/21.jpg)
Some results...
• Pooled variance t-test on whole data– treated versus controls:
• chipA– 1948 p<0.001 (356 p<0.001 and abs(FC)>2)
– 240 with Bonferroni correction
• chipB– 743 p<0.001 (143 p<0.001 and abs(FC)>2)
– 76 with Bonferroni correction
![Page 22: Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei.](https://reader034.fdocuments.us/reader034/viewer/2022042821/56649b4e550346318e8c7340/html5/thumbnails/22.jpg)
Some results...
• Pooled variance t-test on pooled data– treated versus controls:
• chipA– 204 p<0.001
» 189/(204, 1948) common to overall analysis
– 82 p<0.001 and abs(FC)>2
» 82/(82, 356) common to overall analysis
• chipB– 80 p<0.001
» 69/(80, 743) common to overall analysis
– 37 p<0.001 and abs(FC)>2
» 37/(37, 143) common to overall analysis
![Page 23: Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei.](https://reader034.fdocuments.us/reader034/viewer/2022042821/56649b4e550346318e8c7340/html5/thumbnails/23.jpg)
Some results...
• Pooled variance t-test on “individual” data– treated versus controls:
• chipA– 669 p<0.001
» 594/(669, 1948) common to overall analysis
– 226 p<0.001 and abs(FC)>2
» 221/(226, 356) common to overall analysis
• chipB– 245 p<0.001
» 196(245, 743) common to overall analysis
– 80 p<0.001 and abs(FC)>2
» 77/(80, 143) common to overall analysis
![Page 24: Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei.](https://reader034.fdocuments.us/reader034/viewer/2022042821/56649b4e550346318e8c7340/html5/thumbnails/24.jpg)
Some results...
• Pooled variance ANOVA results– treated versus controls:
• chipA– 1913 p<0.001
» 1624/(1913, 1948) common to overall analysis
– 343 p<0.001 and abs(FC)>2
» 339/(343, 356) common to overall analysis
• chipB– 245 p<0.001
» 196(245, 743) common to overall analysis
– 80 p<0.001 and abs(FC)>2
» 77/(80, 143) common to overall analysis
![Page 25: Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei.](https://reader034.fdocuments.us/reader034/viewer/2022042821/56649b4e550346318e8c7340/html5/thumbnails/25.jpg)
![Page 26: Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei.](https://reader034.fdocuments.us/reader034/viewer/2022042821/56649b4e550346318e8c7340/html5/thumbnails/26.jpg)
Conclusions
• ?????????
• So far no evidence for the usefulness of pooling data from cell lines… no evidence of decreased variability
• … but need to further investigate the differences in the “individual” versus “pooled” results
• Need for a plan of biological (quantitative) validations of expression measures