
An Analysis of MicroArray Quality Control Data James J. Chen, Ph.D. Division of Biometry and Risk Assessment National Center for Toxicological Research U.S. Food and Drug Administration 2006 FDA and Industry Workshop September 29, 2006 The views expressed in this presentation do not represent those of the U.S. Food and Drug Administration



Page 2

Outline

Background: MAQC experimental design and data

Microarray platform comparisons
  - Inter-platform analysis
  - Intra-platform analysis and platform performance: concordance, site effects, consistency, discriminability; sensitivity, specificity, and accuracy in gene selection; self-consistency of the titration mixtures

TaqMan and microarray platform comparability

Conclusion

Page 3

MicroArray Quality Control Project

Objective: To compare expression data generated at multiple test sites (labs) using several microarray-based and alternative technology platforms

Microarray platforms            Alternative platforms
Applied Biosystems  ABI (1)     Applied Biosystems  (TAQ)
Affymetrix          AFX (1)     Panomics            (QGN)
Agilent             AGI (1, 2)  Gene Express        (GEX)
Eppendorf           EPP (1)
GE Healthcare       GEH (1)
Illumina            ILM (1)
NCI_Operon          NCI (2)

Nature Biotechnology v24(9), Sep (2006)

Page 4

MAQC Experimental Design

Four RNA samples:
  - Sample A: Universal human reference RNA (Stratagene)
  - Sample B: Human brain reference RNA (Ambion)
  - Sample C: 75% A + 25% B
  - Sample D: 25% A + 75% B

Three sites for each microarray platform (NCI: 2 sites); one site for TAQ, QGN, and GEX.

Five technical replicates for each microarray platform; four replicates for TAQ, three replicates for QGN and GEX.

EPP: 294 target genes; QGN: 245; GEX: 205

Page 5

MAQC Data Used for Comparisons

Platform   ABI     AFX     AGI     GEH     ILM     TAQ
Probe      32,878  54,675  43,931  54,359  47,293  1,004
Site       3       3       3       3       3       1
Array(2)   58      60      56      60      59      N/A
Rep(1)     5       5       5       5       5       4
Sample     4       4       4       4       4       4

(1) Technical replicates. (2) A total of 293 arrays.

12,091 genes are common among the microarray platforms; 906 TAQ genes are among the 12,091 genes.

Page 6

Hierarchical clustering of the 293 arrays on 12,091 genes, based on all pairwise correlations between arrays.

[Figure: dendrogram of the arrays, annotated by platform (AFX, ABI, AG1, GEH, ILM), sample (A, B, C, D), and site (Site 1, Site 2, Site 3).]
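The slides do not state which distance or linkage rule was used for the dendrogram. The sketch below assumes a common choice, distance 1 − r between arrays with average linkage, written in pure Python so it runs without extra packages; the array names and toy intensities are hypothetical.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_linkage(arrays):
    """Agglomerative clustering with distance 1 - r and average linkage (assumed).

    arrays: dict mapping an array name to its log-intensities (same genes, same order).
    Returns the merge history as (cluster1, cluster2, linkage_distance) tuples.
    """
    names = list(arrays)
    dist = {}
    for a, b in combinations(names, 2):
        dist[a, b] = dist[b, a] = 1.0 - pearson(arrays[a], arrays[b])

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        # Average linkage: mean distance over all cross-cluster array pairs.
        link, c1, c2 = min(
            (mean(dist[a, b] for a in c1 for b in c2), c1, c2)
            for c1, c2 in combinations(clusters, 2)
        )
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2, link))
    return merges


# Hypothetical toy arrays: two replicates of one sample, two of an anti-correlated one.
toy = {
    "AFX_A_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "AFX_A_2": [1.1, 2.1, 2.9, 4.2, 4.9],
    "ILM_B_1": [5.0, 4.0, 3.0, 2.0, 1.0],
    "ILM_B_2": [4.8, 4.1, 3.1, 1.9, 1.2],
}
for c1, c2, d in average_linkage(toy):
    print(c1, c2, round(d, 3))
```

Replicates merge first at small distances; the final merge joins clusters whose correlation is negative, so its distance approaches 2. For 293 arrays a library routine (for example SciPy's hierarchical clustering) would be the practical choice.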

Page 7

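Concordance: all pairwise inter-platform sample correlation coefficients between two arrays from different platforms.

Up to 2,250 (10 × 15 × 15) correlations computed for each sample: 10 platform pairs, with up to 15 arrays (3 sites × 5 replicates) from each platform in the pair.

[Figure: box plots of the correlations for samples A, B, C, D (y-axis 0.5 to 0.8); annotated values .74, .70, .71, .68, .82, .45.]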
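The count of 2,250 follows from pairing every array of one platform with every array of another, over all 10 platform pairs. A minimal sketch, assuming each platform's arrays for one sample are available as lists of log-intensities over the common genes; the simulated data and platform offsets are hypothetical.

```python
import random
from itertools import combinations, product
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def inter_platform_correlations(platforms):
    """All correlations between two arrays from different platforms, for one sample.

    platforms: dict mapping a platform code to its list of arrays (log-intensities
    over the common genes); arrays from the same platform are never paired.
    """
    corrs = []
    for p1, p2 in combinations(sorted(platforms), 2):
        for x, y in product(platforms[p1], platforms[p2]):
            corrs.append(pearson(x, y))
    return corrs


# Hypothetical simulated data: 5 platforms x 15 arrays (3 sites x 5 replicates), 50 genes.
rng = random.Random(2006)
truth = [rng.gauss(8.0, 2.0) for _ in range(50)]
sim = {
    code: [[g + shift + rng.gauss(0.0, 0.5) for g in truth] for _ in range(15)]
    for code, shift in zip(["ABI", "AFX", "AG1", "GEH", "ILM"], [0.0, 0.3, -0.2, 0.5, 0.1])
}
corrs = inter_platform_correlations(sim)
print(len(corrs), round(median(corrs), 3))  # number of correlations and their median
```

With 15 arrays per platform this yields 10 × 15 × 15 correlations; platforms with missing arrays give fewer, which is why the slide says "up to".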

Page 8

Concordance: all pairwise inter-platform fold-change correlation coefficients between two arrays from different platforms.

90 (10 × 3 × 3) correlations for each fold change: 10 platform pairs × 3 sites × 3 sites.

[Figure: box plots for fold changes B/A, C/A, D/A, C/B, D/B, C/D (y-axis 0.6 to 0.9); annotated values .85, .75, .82, .78, .84, .78, .92, .53.]
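The 10 × 3 × 3 count suggests one fold-change vector per site, correlated across sites of different platforms. The sketch below assumes, as a hypothetical reading, that a site's log2 fold change is the difference of replicate means on the log2 scale; the data layout and simulated values are illustrative only.

```python
import random
from itertools import combinations, product
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def site_log2_fold_change(num_reps, den_reps):
    """Gene-wise log2 fold change at one site: mean log2(num) minus mean log2(den)."""
    n_genes = len(num_reps[0])
    return [mean(r[g] for r in num_reps) - mean(r[g] for r in den_reps)
            for g in range(n_genes)]


def fold_change_correlations(data, num="B", den="A"):
    """Correlations of site-level fold changes between sites on different platforms.

    data[platform][site][sample] is a list of replicate arrays on the log2 scale.
    """
    fcs = {
        platform: [site_log2_fold_change(sites[s][num], sites[s][den]) for s in sorted(sites)]
        for platform, sites in data.items()
    }
    return [
        pearson(f1, f2)
        for p1, p2 in combinations(sorted(fcs), 2)
        for f1, f2 in product(fcs[p1], fcs[p2])
    ]


# Hypothetical simulated data: 5 platforms x 3 sites x samples A/B x 5 replicates, 40 genes.
rng = random.Random(9)
a_true = [rng.gauss(8.0, 2.0) for _ in range(40)]
b_true = [x + rng.gauss(0.0, 1.5) for x in a_true]
data = {
    code: {
        site: {
            "A": [[x + rng.gauss(0, 0.3) for x in a_true] for _ in range(5)],
            "B": [[x + rng.gauss(0, 0.3) for x in b_true] for _ in range(5)],
        }
        for site in (1, 2, 3)
    }
    for code in ["ABI", "AFX", "AG1", "GEH", "ILM"]
}
fc_corrs = fold_change_correlations(data)
print(len(fc_corrs), round(min(fc_corrs), 3))
```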

Page 9

Cross Platform Consistency

Proportion of genes showing a significant Platform*Sample interaction in the gene-by-gene ANOVA:

y = m + P + Sample + P*Sample + e

Significant interaction: the patterns of expression of the four samples are inconsistent across the platforms.
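For a single gene, the interaction test compares the Platform*Sample mean square with the residual mean square. A minimal sketch, assuming a balanced layout with the same number of replicates in every platform × sample cell (the real MAQC data have missing arrays and a site structure that this ignores); the p-value would come from the F distribution, for example `scipy.stats.f.sf(F, df_int, df_err)`, which is left out to keep the example stdlib-only.

```python
from statistics import mean


def interaction_f(cells):
    """Platform*Sample interaction F statistic for one gene (balanced two-way ANOVA).

    cells: dict (platform, sample) -> list of replicate log-intensities.
    Returns (F, df_interaction, df_error).
    """
    platforms = sorted({p for p, _ in cells})
    samples = sorted({s for _, s in cells})
    sizes = {len(v) for v in cells.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes a balanced design")
    n = sizes.pop()

    cell_mean = {k: mean(v) for k, v in cells.items()}
    grand = mean(cell_mean.values())
    p_mean = {p: mean(cell_mean[p, s] for s in samples) for p in platforms}
    s_mean = {s: mean(cell_mean[p, s] for p in platforms) for s in samples}

    # Interaction SS: departures of cell means from the additive (P + Sample) fit.
    ss_int = n * sum(
        (cell_mean[p, s] - p_mean[p] - s_mean[s] + grand) ** 2
        for p in platforms for s in samples
    )
    ss_err = sum((y - cell_mean[k]) ** 2 for k, v in cells.items() for y in v)
    df_int = (len(platforms) - 1) * (len(samples) - 1)
    df_err = len(platforms) * len(samples) * (n - 1)
    return (ss_int / df_int) / (ss_err / df_err), df_int, df_err


# Hypothetical gene: additive platform and sample effects, 3 replicates per cell.
platform_effect = {"AFX": 0.0, "ABI": 0.5, "AG1": -0.3}
sample_effect = {"A": 1.0, "B": -1.0, "C": 0.5, "D": -0.5}
additive = {
    (p, s): [pe + se + e for e in (-0.1, 0.0, 0.1)]
    for p, pe in platform_effect.items()
    for s, se in sample_effect.items()
}
# Same gene, but sample A reads 2 units higher on AFX only: an inconsistent pattern.
bumped = {k: [y + 2.0 for y in v] if k == ("AFX", "A") else list(v) for k, v in additive.items()}
print(interaction_f(additive))
print(interaction_f(bumped))
```

The additive gene has an interaction F of essentially zero (consistent pattern), while the gene with a platform-specific shift in one sample has a large F.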

Page 10

[Figure: plot of the p-values versus ranking proportions; x-axis log10 p (−10 to 0), y-axis proportion of significances (0.2 to 1.0).]

The proportion of significant genes is 30% (0.3) at α = 0.01.
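The curve on this slide is the fraction of genes whose interaction p-value falls below each cutoff on a log10 grid. A minimal sketch, using the four example p-values shown on Page 11 as input; 1e-12 is a hypothetical stand-in for the reported "< 10⁻¹¹".

```python
def proportion_significant(pvalues, alpha):
    """Fraction of genes whose p-value falls below alpha."""
    return sum(p < alpha for p in pvalues) / len(pvalues)


def significance_curve(pvalues, log10_alphas=range(-10, 1)):
    """(log10 alpha, proportion significant) pairs over the slide's x-axis range."""
    return [(k, proportion_significant(pvalues, 10.0 ** k)) for k in log10_alphas]


# Example genes from Page 11; 1e-12 stands in for "p < 1e-11".
example = [1e-12, 0.001, 0.11, 0.991]
print(proportion_significant(example, 0.01))  # 0.5
print(significance_curve(example))
```

Applied to the interaction p-values of all 12,091 genes, the value at α = 0.01 is the 30% reported above.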

Page 11

[Figure: expression of four example genes for samples A, B, C, D across platforms AFX, ABI, AG1, ILM, GEH (y-axis −1 to 3). Inconsistency (p < 0.01): Gene5, p-value < 10⁻¹¹; Gene21, p-value = 0.001. Consistency (p > 0.01): Gene312, p-value = 0.11; Gene15, p-value = 0.991.]

Page 12

Intra-Platform Analysis

Concordance: all pairwise correlations between two arrays from different sites, for samples A, B, C, and D (3 × 5 × 5 = 75 correlations per sample: 3 site pairs × 5 × 5 replicates).

Site effects: ANOVA y = m + Sample + Site + Sample*Site + e

Site effect: the variance ratio F = MSE_site / MSE_e.

Consistency: the proportion of genes shown to have a significant Sample*Site interaction (at level α).

Discriminability: ANOVA y = m + Sample + e

Variability: the residual mean square (total variation other than sample differences).

Discriminability: the proportion of genes shown to have significant sample effects (at level α).
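The site F ratio and the one-way residual mean square can be sketched for a single gene as follows. This assumes a balanced layout (same replicate count in every sample × site cell), which the real data do not always satisfy; the sample levels and site shifts are hypothetical. Discriminability would then be the fraction of genes whose Sample F test is significant, which is not coded here.

```python
from statistics import mean


def site_f_ratio(cells):
    """F = MS_site / MS_e from y = m + Sample + Site + Sample*Site + e (balanced).

    cells: dict (sample, site) -> list of replicate values for one gene.
    """
    samples = sorted({smp for smp, _ in cells})
    sites = sorted({site for _, site in cells})
    sizes = {len(v) for v in cells.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes a balanced design")
    n = sizes.pop()
    cell_mean = {k: mean(v) for k, v in cells.items()}
    grand = mean(cell_mean.values())
    site_mean = {t: mean(cell_mean[smp, t] for smp in samples) for t in sites}
    ss_site = n * len(samples) * sum((site_mean[t] - grand) ** 2 for t in sites)
    ss_err = sum((y - cell_mean[k]) ** 2 for k, v in cells.items() for y in v)
    ms_site = ss_site / (len(sites) - 1)
    ms_err = ss_err / (len(samples) * len(sites) * (n - 1))
    return ms_site / ms_err


def one_way_residual_ms(groups):
    """Residual mean square of y = m + Sample + e (the slide's "variability").

    groups: dict sample -> all values for that sample, sites pooled.
    """
    ss = sum((y - mean(v)) ** 2 for v in groups.values() for y in v)
    df = sum(len(v) for v in groups.values()) - len(groups)
    return ss / df


# Hypothetical gene: two samples, three sites with small additive site shifts.
site_shift = {1: 0.0, 2: 0.3, 3: -0.3}
sample_level = {"A": 10.0, "B": 7.0}
cells = {
    (smp, t): [lvl + sh + e for e in (-0.1, 0.0, 0.1)]
    for smp, lvl in sample_level.items()
    for t, sh in site_shift.items()
}
groups = {smp: [y for (s2, _), v in cells.items() if s2 == smp for y in v] for smp in sample_level}
print(round(site_f_ratio(cells), 3), round(one_way_residual_ms(groups), 4))
```

Note that the site shifts inflate the one-way residual mean square, because that model leaves site differences in the error term.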

Page 13

Individual Platform’s Performance

Reproducibility and Consistency Performance

Platform   Median correlation          Site    Consistency   MSE    Discriminability
           A     B     C     D         Fm
AFX        .988  .988  .991  .992      24.     .012          .066   .618
ABI        .968  .964  .972  .969      15.     .008          .107   .620
AG1        .978  .982  .982  .981      28.     .063          .090   .633
ILM        .980  .979  .980  .981      242.    .020          .266   .441
GEH        .925  .904  .872  .862      64.     .097          .267   .453

Page 14

Gold Standard Set

A gene is differentially expressed if it is shown to be significant in at least 2 of the 5 platforms at α = 10⁻⁵.

H0: A − B = 0 versus H1: A − B ≠ 0 (8,265 genes were selected)

A gene is non-differentially expressed if its fold change is shown to lie between 0.90 and 1/0.90 in at least 2 of the 5 platforms at α = 10⁻³. Let δ = −log2(0.90) ≈ 0.152.

Equivalence test: H0: |A − B| > δ versus H1: |A − B| < δ

(498 genes were selected)

Gold standard: 8,607 genes (the 78 genes that appear in both lists are deleted from both: 8,265 + 498 − 2 × 78 = 8,607)
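The gold-standard construction is a vote across platforms followed by removal of conflicting genes. A minimal sketch, assuming the per-platform calls (DE test at 10⁻⁵, equivalence test at 10⁻³) have already been made; the tests themselves are not shown, and the gene and platform calls below are hypothetical.

```python
import math
from collections import Counter

# Equivalence margin from the slide: fold change within [0.90, 1/0.90] on the log2 scale.
DELTA = -math.log2(0.90)


def called_on_at_least(calls, min_platforms):
    """Genes called on at least min_platforms platforms.

    calls: dict platform -> set of genes called on that platform.
    """
    counts = Counter(g for genes in calls.values() for g in genes)
    return {g for g, c in counts.items() if c >= min_platforms}


def gold_standard(de_calls, eq_calls, min_platforms=2):
    """Return (DE set, non-DE set), deleting genes that land in both lists."""
    de = called_on_at_least(de_calls, min_platforms)
    non_de = called_on_at_least(eq_calls, min_platforms)
    both = de & non_de
    return de - both, non_de - both


# Hypothetical per-platform calls for a handful of genes.
de_calls = {"AFX": {"g1", "g2", "g3"}, "ABI": {"g1", "g2"}, "AG1": {"g3", "g4"},
            "ILM": set(), "GEH": {"g5"}}
eq_calls = {"AFX": {"g3", "g6"}, "ABI": {"g6"}, "AG1": set(),
            "ILM": {"g3"}, "GEH": {"g7"}}
de, non_de = gold_standard(de_calls, eq_calls)
print(sorted(de), sorted(non_de), round(DELTA, 3))
print(8265 + 498 - 2 * 78)  # 8607, matching the slide's gold-standard size
```

Here g3 passes both votes and is dropped from both lists, mirroring the 78 overlapping genes on the slide.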

Page 15

Accuracy (AC), sensitivity (SN), specificity (SP), and FDR of each platform's gene selection against the gold standard, using FWE = 0.05* and FDR = 0.05 as thresholds.

           FWE = 0.05*               FDR = 0.05
Platform   AC   SN   SP   FDR        AC   SN   SP   FDR
AFX        .77  .76  .95  .004       .92  .94  .55  .024
ABI        .74  .73  .95  .004       .89  .91  .59  .023
AG1        .81  .80  .80  .003       .92  .94  .55  .024
ILM        .55  .53  1.0  .000       .88  .88  .95  .023
GEH        .54  .52  .95  .005       .82  .82  .69  .019

* α = 0.05/8,607 = 5.8 × 10⁻⁶
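The four measures follow from a confusion table of a platform's selected genes against the gold standard. A minimal sketch, assuming only gold-standard genes are scored; the Bonferroni cutoff reproduces the footnote, while the FDR procedure is not named on the slide, so Benjamini-Hochberg is shown as an assumed, commonly used choice. Gene names are hypothetical.

```python
def classification_metrics(selected, gold_de, gold_non_de):
    """AC, SN, SP, and FDR of a selected gene set, scored on gold-standard genes only."""
    tp = len(selected & gold_de)
    fn = len(gold_de - selected)
    fp = len(selected & gold_non_de)
    tn = len(gold_non_de - selected)
    return {
        "AC": (tp + tn) / (tp + fn + fp + tn),
        "SN": tp / (tp + fn),
        "SP": tn / (tn + fp),
        "FDR": fp / (tp + fp) if tp + fp else 0.0,
    }


# Bonferroni per-gene cutoff for FWE = 0.05 over the 8,607 gold-standard genes.
BONFERRONI_ALPHA = 0.05 / 8607


def benjamini_hochberg(pvalues, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure (assumed method)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k = rank  # largest rank meeting its threshold
    return set(order[:k])


gold_de = {f"g{i}" for i in range(1, 9)}   # hypothetical: 8 true DE genes
gold_non_de = {"n1", "n2"}                  # hypothetical: 2 true non-DE genes
selected = {"g1", "g2", "g3", "g4", "g5", "g6", "n1"}
print(classification_metrics(selected, gold_de, gold_non_de))
print(BONFERRONI_ALPHA)
```

The stricter FWE cutoff selects fewer genes, which is consistent with the lower sensitivity and higher specificity in the left half of the table.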

Page 16

Comment on MAQC: Gene Selection

The MAQC project used technical replicates (small variance) with two distinct biological samples (large difference).

The number of differentially expressed genes is much larger than in typical microarray experiments.

Generating a gene list is not the problem; the problem is determining how many genes the list should contain.

General principle: to identify a list of differentially expressed genes as accurately as possible.

Page 17

Reproducibility of lists of differentially expressed genes – Percentage of Overlapping Genes (POG)

For AFX, 6,319 genes have p < 10⁻⁵ and 4,370 genes have FC > 2.

For ABI, 6,127 genes have p < 10⁻⁵ and 4,835 genes have FC > 2.

More than 4,000 genes can be selected with an estimated FDR below 2/4,000.

(From MAQC supplementary Fig. S2.)
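POG compares the top of two ranked gene lists of the same length. A minimal sketch, assuming POG is the number of shared genes divided by the list length, expressed as a percentage, and that the lists are already ranked (for example by p-value or fold change); the gene identifiers are hypothetical.

```python
def pog(list1, list2):
    """Percentage of overlapping genes between two gene lists of equal length."""
    if len(list1) != len(list2):
        raise ValueError("this sketch assumes lists of equal length")
    return 100.0 * len(set(list1) & set(list2)) / len(list1)


def pog_curve(ranked1, ranked2, sizes):
    """POG of the top-k genes of two ranked lists, for each list size k."""
    return [(k, pog(ranked1[:k], ranked2[:k])) for k in sizes]


# Hypothetical rankings of the same genes from two sites or platforms.
r1 = ["g1", "g2", "g3", "g4"]
r2 = ["g2", "g1", "g5", "g3"]
print(pog_curve(r1, r2, [2, 4]))  # [(2, 100.0), (4, 75.0)]
```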

Page 18

Assessment of Titration Trend

Titration correlations: between 0.75A + 0.25B and C, and between 0.25A + 0.75B and D.

Titration model (a two-step test):
  - The titration relationship can be modelled by M1t: y = m + Conc + Site + e
  - Full ANOVA model M1: y = m + Sample + Site + e
  - S1: test for a sample difference under M1, H0t1: A = B = C = D
  - S2: test for goodness of fit, H0t2: M1t = M1

Reported: the proportion of genes that reject H0t1 and accept H0t2.
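The two steps are an overall F test for sample differences and a lack-of-fit F test of the linear-in-Conc model against the full sample model. A minimal sketch for one gene, assuming Conc is the fraction of sample A (B = 0, D = 0.25, C = 0.75, A = 1, as on the next slide's x-axis) and, as a simplification not in the slides, omitting the Site term; p-values would come from the F distribution (for example `scipy.stats.f.sf`). The gene profiles are hypothetical.

```python
from statistics import mean

# Titration level = fraction of sample A in each mixture.
CONC = {"B": 0.0, "D": 0.25, "C": 0.75, "A": 1.0}


def titration_f_tests(groups):
    """Two-step titration test for one gene, Site term omitted.

    groups: dict sample label -> replicate log-intensities.
    Returns (F1, F2, df1, df2, df_err): F1 tests H0t1 (A = B = C = D) under the full
    model; F2 tests H0t2, the lack of fit of the linear-in-Conc model.
    """
    points = [(CONC[s], y) for s, values in groups.items() for y in values]
    n, k = len(points), len(groups)
    grand = mean(y for _, y in points)

    # Full model (Sample as a factor): residuals around each sample mean.
    ss_full = sum((y - mean(values)) ** 2 for values in groups.values() for y in values)
    df_err = n - k
    # Intercept-only model, the null hypothesis of step S1.
    ss_null = sum((y - grand) ** 2 for _, y in points)
    # Titration model: least-squares line in Conc.
    xbar = mean(x for x, _ in points)
    sxx = sum((x - xbar) ** 2 for x, _ in points)
    slope = sum((x - xbar) * (y - grand) for x, y in points) / sxx
    ss_lin = sum((y - grand - slope * (x - xbar)) ** 2 for x, y in points)

    ms_err = ss_full / df_err
    f1 = ((ss_null - ss_full) / (k - 1)) / ms_err
    f2 = ((ss_lin - ss_full) / (k - 2)) / ms_err
    return f1, f2, k - 1, k - 2, df_err


noise = (-0.1, 0.0, 0.1)
linear = {s: [2.0 + 4.0 * c + e for e in noise] for s, c in CONC.items()}
curved = {s: [2.0 + 4.0 * c ** 2 + e for e in noise] for s, c in CONC.items()}
print(titration_f_tests(linear))  # large F1, F2 near 0: follows the titration model
print(titration_f_tests(curved))  # large F1 and large F2: departs from linearity
```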

Page 19

Linear Titration Model

[Figure: three example genes, expression (y-axis 2 to 12) plotted against titration level B_0, D_0.25, C_0.75, A_1. Left: H0t1 accepted (p1 = 0.316). Middle: H0t1 rejected, H0t2 accepted (p1 < 0.0001, p2 = 0.108). Right: H0t1 rejected, H0t2 rejected (p1 < 0.0001, p2 < 0.0001).]

Page 20

Titration correlation for samples C and D, and the proportions of the genes that follow the titration relationship.

           Correlation            Titration model (α1, α2)
Platform   Sample C   Sample D    (5%, 5%)   (1%, 1%)
AFX        .909       .911        .963       .976
ABI        .916       .928        .954       .967
AG1        .930       .939        .923       .944
ILM        .930       .936        .937       .954
GEH        .923       .934        .988       .988
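The titration correlation compares an observed mixture sample with the mixture expected from A and B. A minimal sketch; the slides do not say on which scale the mixture is formed, so the function offers a hypothetical `log2_input` option that mixes on the linear scale and converts back to log2. The intensity values are hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def titration_correlation(a, b, observed, frac_a, log2_input=False):
    """Correlation between frac_a*A + (1 - frac_a)*B and the observed mixture sample.

    If log2_input is True (an assumption), A and B are log2 intensities: they are mixed
    on the linear scale and the expectation is returned to the log2 scale.
    """
    if log2_input:
        expected = [math.log2(frac_a * 2 ** x + (1 - frac_a) * 2 ** y) for x, y in zip(a, b)]
    else:
        expected = [frac_a * x + (1 - frac_a) * y for x, y in zip(a, b)]
    return pearson(expected, observed)


a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [5.0, 3.0, 4.0, 1.0, 2.0]
c_linear = [0.75 * x + 0.25 * y for x, y in zip(a, b)]                    # ideal sample C
c_log2 = [math.log2(0.75 * 2 ** x + 0.25 * 2 ** y) for x, y in zip(a, b)]
print(titration_correlation(a, b, c_linear, 0.75))
print(titration_correlation(a, b, c_log2, 0.75, log2_input=True))
```

A perfectly mixed sample gives a correlation of 1; the observed values of about .91 to .94 in the table reflect measurement noise and departures from the ideal mixture.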

Page 21

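TaqMan and microarray platform concordance: box plots of all pairwise sample correlation coefficients between TAQ and each microarray platform.

60 (4 × 15) correlations computed for each sample: 4 TaqMan replicates × 15 arrays per microarray platform.

[Figure: correlations of TAQ vs. microarrays for samples A and B across AFX, ABI, AG1, ILM, GEH (y-axis 0.50 to 0.80); annotated values .78, .62, .77, .75, .74, .76, .66, .71, .74, .71, .52, .80.]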

Page 22

TaqMan and microarray platform concordance: box plots of fold-change (B/A) correlation coefficients between TAQ and each microarray platform.

[Figure: B/A fold-change correlations of TAQ vs. AFX, ABI, AG1, ILM, GEH (y-axis 0.82 to 0.90); annotated values .86, .88, .89, .86, .89, .82, .90.]

Page 23

Consistency of TaqMan and Microarray platforms

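Proportion of significant genes for TaqMan and microarray platforms (one value per microarray platform): 0.72, 0.57, 0.49, 0.65, 0.39. Proportion of significant genes among the microarray platforms: 0.30.

[Figure: expression of an example gene for samples A, B, C, D on AFX, ABI, AG1, ILM, GEH (y-axis 0.0 to 2.0). Left panel, microarray platforms: p-value = 0.74. Right panel, TaqMan and microarray: p-values 10⁻¹⁰, 10⁻⁸, 10⁻⁴, 10⁻⁹, 10⁻⁷.]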

Page 24

Conclusion (1)

Inter-platform (microarray and TaqMan)
  - Concordance: sample correlations range from 0.45 (D) to 0.82 (A); fold-change correlations are highest for B/A and lowest for C/A.
  - Inconsistency: among the microarray platforms, thirty percent (30%) of genes show inconsistent expression patterns at α = 0.01. Between TaqMan and the microarray platforms, the proportions range from 0.34 to 0.74 for the five platforms.
  - Comparability: intensities measured by different microarray platforms, and those measured by microarray versus TaqMan platforms, are different.

Page 25

Conclusion (2)

Titration trend
  - Titration correlation: the correlations between observed and expected intensities are above 0.90.
  - Titration trend: all five platforms follow the linear titration relationship well.

Intra-platform performance (microarray)
  - Concordance: intra-platform correlations are high.
  - Site effect: all platforms show site effects.
  - Consistency: the patterns of expression are consistent across the three sites.