Pooling data across Micorarray

50
Pooling Data Across Microarray Experiments Using Different Versions of Affymetrix Oligonucleotide Arrays Jeffrey S. Morris Li Zhang, Chunlei Wu, Keith Baggerly and Kevin Coombes UT MD Anderson Cancer Center Houston, TX, USA

Transcript of Pooling data across Micorarray

Page 1: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 1/50

Pooling Data Across Microarray

Experiments Using Different Versions of AffymetrixOligonucleotide Arrays

Jeffrey S. Morris

Li Zhang, Chunlei Wu, Keith

Baggerly and Kevin CoombesUT MD Anderson Cancer Center

Houston, TX, USA

Page 2: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 2/50

Combining Information across

Microarray Studies

Many publicly available microarray data sets

Can combine information across studies to:1. Validate results from individual studies Find intersection of differentially expressed genes Build model using one study, validate using another

2. Discover new biological insights by analysespooling data across studies.

Potential for increased statistical power Important since many individual studies are

underpowered.

Page 3: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 3/50

Pooling Data across Studies Challenge: In general, microarray data

from different studies not comparable Clinical differences

Different study populations

Technical differences Laboratory differences: sample collection and

storage, microarray protocol Different platforms: cDNA/oligo, different 

versions of same technology (e.g. Affy chips)

Page 4: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 4/50

Pooling Data across Studies Approaches in Existing Literature:1. Include study effects in model

Gene-specific study effects SVD, Distance-weighted DiscriminationDrawback: First-order corrections not enough

2. Model unitless summary measures

standardized log fold-change t-statistics probabilities of +/0/- expressionDrawback: Implicit assumptions about comparability

of clinical populations across studies

Page 5: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 5/50

Pooling Data across Studies Sequence-related reasons for incomparability

of raw expression levels across platforms:

Cross-hybridization RNA degradation (near 5 end) Probe validity map to RefSeq? Alternative splicing

It may be possible that, by taking these into

account, we can obtain more comparable rawexpression levels to use in pooled analyses

Our focus: combining information acrossdifferent versions of Affymetrix genechips

Page 6: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 6/50

Overview of Affymetrix

GeneChips Probes: 25-base sequences from gene

of interest  Probesets: set of probes corresponding

to same gene. Obtained from current sequence information

in GenBank, Unigene, RefSeq

Generations of human chips: HuGeneFL: 5600 genes, 20 probes/gene U95Av2: 10,000 genes, 16 probes/gene U133A: 14,500 genes, 11 probes/gene

Page 7: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 7/50

Example: CAMDA Lung Cancer

Data CAMDA: Critical Assessment of

Microarray Data Analysis: annualconference at Duke University

CAMDA 2003: Two studies relating geneexpression data to survival in lung cancerpatients

1. Harvard (Bhattacharjee, et al. 2001) 124 lung adenocarcinoma samples

2. Michigan (Beer, et al. 2002) 86 lung adenocarcinoma samples

GOAL: Pool data across studies to identify

prognostic genes for lung cancer.

Page 8: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 8/50

Pooling Information across Studies

Harvardpatients worseprognosis

Page 9: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 9/50

Pooling Information across Chip

Types

Michigan: HuGeneFl Chip

6,633 probe sets -- 20 probe pairs each

Harvard: HG_U95Av2 Chip

12,453 probe sets 16 probe pairs each

Problems:

Different genes

Incomparable expression levels

Page 10: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 10/50

Example: CAMDA 2003 Data Two studies used different chip types:

Michigan: HuGeneFL

~130,000 probes in 6,633 probesets Harvard: U95Av2

~200,000 probes in 12,625 probesets

34,428 matching probes combined into 4,101partial probesets

After preprocessing, 1036 probesets consideredin subsequent analyses

We used PDNN (Zhang et al. 2003 NatureBiotech) method for quantification

Page 11: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 11/50

Partial Probeset Method

HuGeneFL :HG_U95Av2:

Matching ProbesPartial Probesets

1. Identify matching probes

2. Recombine into new probesets based on UNIGENE

clusters, which we refer to as partial probesets3. Eliminate any probesets containing just one or two

probes

Note: Any quantification method can subsequently beused (MAS, dChip, RMA, PDNN)

Page 12: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 12/50

Pooling Information across Chip

Types

Page 13: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 13/50

Quantification of Expression Levels

Gene expressions quantified by applying LisPDNN model to our partial probesets

Uses probe sequence info to predict patterns of specific and nonspecific hybridization intensities

Allows borrowing of strength across probe sets

Model is not overparameterized O(N probesets)

See Zhang, et al. (2003) Nature Biotech forfurther details on method and comparison

Page 14: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 14/50

Page 15: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 15/50

Assessing Our Method for Combining

Information Across Chip Types

Partial

Probesetmethodappears to givecomparableexpressionlevels acrosschip types.

Page 16: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 16/50

Assessing our Method for Combining

Information across Chip Types

Median partial probesetsize is 7, vs. 16 or 20

Loss of precision?

No evidence of significant precision loss

Page 17: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 17/50

Assessing Partial Probeset Method

Agreement inrelative

quantificationsacross samples

Page 18: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 18/50

Assessing Partial Probeset Method Agreement in

relative

quantificationsacross samples

Less variablegenes worse

Page 19: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 19/50

Assessing Partial Probeset Method Agreement in

relative

quantificationsacross samples

Less variablegenes worse

Eliminate geneswith sd<0.20 orr<0.90

Page 20: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 20/50

Assessing Partial Probeset Method Agreement in

relative

quantificationsacross samples

Less variablegenes worse

Eliminate geneswith sd<0.20 orr<0.90

1,036 genes

Page 21: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 21/50

Identifying Prognostic Genes1. Preprocess raw microarray data

Outlier Detection, Normalization, Quantification,

Remove Some Genes? Left with n-by-p matrix of expression levels for

p genes on n microarrays.

2. Identify which genes are correlated to

outcome of interest  Perform standard statistical test for each gene

obtain (permutation) p-values

Find cutpoint on p-values to declaresignificance that accounts for multiplicities.

Page 22: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 22/50

Identifying Prognostic Genes After preprocessing:

1036 genes, 200 samples Identify genes related to survival

After adjusting for known clinical

predictors Provide prognostic information on survival

above and beyond clinical predictors

Page 23: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 23/50

Identifying Prognostic Genes:

Cox Regression Modeling Hazard : P(t) ~ Prob(X<t +(t | X>t )

Cox Model: Pi(t) = P0(t) exp(X i F )

X i = Vector of covariates for subject i F = Vector of regression coefficients

Key Assumption: Proportional Hazards Hazard ratio between subjects with different 

covariates does not vary over time.

Pi(t )/Pk(t ) = exp{ (X i-X k) F } Exp(F) =Change in hazard per unit change in X 

Page 24: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 24/50

Identifying Prognostic Genes:

Cox Regression Modeling Best Clinical Model:

Factor FExp(F) Z p

StudyMichigan = 0

Harvard  = 1

0.67 1.95 2.73 0.0062

Age 0.03 1.03 2.60 0.0094

StageEarly (1-2) = 0

Late  (3-4) = 1

1.53 4.61 6.61 <0.000000001

Page 25: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 25/50

Identifying Prognostic Genes Series of 1036 multivariable Cox models fit to

identify prognostic genes. Each model contained: Study (Michigan=-1, Harvard=1). Age (continuous factor). Stage (early=0/late=1). Probeset (log intensity value as continuous factor).

Exact p-values for each probeset computed

using permutation approach By using multivariate modeling, we search for genes

offering prognostic information beyond clinicalpredictors

Page 26: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 26/50

Identifying Prognostic Genes:

BUM Method No prognostic genes pvals Uniform

Prognostic genes

smaller pvals Fit Beta-Uniform mixture to histogram

of p-values BUM method

(Pounds and Morris, 2003 Bioinformatics)

Method can be used to identifyprognostic genes while controlling FDR

Page 27: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 27/50

Results Histogram suggests

there are somesignificant probesets

FDR=0.20 correspondspval cutoff of 0.0024(BUM, Pounds andMorris 2003)

26 probesets flaggedas significant 

Page 28: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 28/50

Selected Flagged GenesRank Gene F p Function

1 FCGRT -2.07 <0.00001 Induced by IF-K in treating SCLC

2 ENO2 1.46 0.00001 Marker of NSCLC4 RRM1 1.81 0.00002 Linked to survival in NSCLC

8 CHKL -1.43 0.00010 Marker of NSCLC

11 CPE 0.72 0.00031 Marker of SCLC

12 ADRBK1 -2.20 0.00044 Co-expressed with Cox-2 in lung ADC

16 CLU -0.52 0.00109 Marker of SCLC

20 SEPW1 -1.29 0.00145 H202 cytotox. in NSCLC cell lines

21 FSCN1 0.66 0.00150 Marker of invasiveness in Stg 1 NSCLC

25 BTG2 -0.75 0.00232 Induced by p53 in SCLC cell lines

Page 29: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 29/50

Selected Flagged GenesRank Gene F p Function

1 FCGRT -2.07 <0.00001 Induced by IF-K in treating SCLC

2 ENO2 1.46 0.00001 Marker of NSCLC

4 RRM1 1.81 0.00002 Linked to survival in NSCLC

8 CHKL -1.43 0.00010 Marker of NSCLC

11 CPE 0.72 0.00031 Marker of SCLC

12 ADRBK1 -2.20 0.00044Co-expressed with Cox-2 in lung ADC

16 CLU -0.52 0.00109 Marker of SCLC

20 SEPW1 -1.29 0.00145 H202 cytotox. in NSCLC cell lines

21 FSCN1 0.66 0.00150 Marker of invasiveness in Stg 1 NSCLC

25 BTG2 -0.75 0.00232 Induced by p53 in SCLC cell lines

Page 30: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 30/50

Selected Flagged GenesRank Gene F p Function

1 FCGRT -2.07 <0.00001 Induced by IF-K in treating SCLC

2 ENO2 1.46 0.00001 Marker of NSCLC

4 RRM1 1.81 0.00002 Linked to survival in NSCLC

8 CHKL -1.43 0.00010 Marker of NSCLC

11 CPE 0.72 0.00031 Marker of SCLC

12 ADRBK1 -2.20 0.00044Co-expressed with Cox-2 in lung ADC

16 CLU -0.52 0.00109 Marker of SCLC

20 SEPW1 -1.29 0.00145 H202 cytotox. in NSCLC cell lines

21 FSCN1 0.66 0.00150 Marker of invasiveness in Stg 1 NSCLC

25 BTG2 -0.75 0.00232 Induced by p53 in SCLC cell lines

Page 31: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 31/50

Selected Flagged GenesRank Gene F p pStage Function

1 FCGRT -2.07 <0.00001 0.154 Induced by IF-K in treating SCLC

2 ENO2 1.46 0.00001 0.282Marker of NSCLC

4 RRM1 1.81 0.00002 0.321 Linked to survival in NSCLC

8 CHKL -1.43 0.00010 0.979 Marker of NSCLC

11 CPE 0.72 0.00031 0.088 Marker of SCLC

12 ADRBK1 -2.20 0.00044 0.484 Co-expressed with Cox-2 in PUC

16 CLU -0.52 0.00109 0.014 Marker of SCLC

20 SEPW1 -1.29 0.00145 0.028 H202 cytotox. in NSCLC cell lines

21 FSCN1 0.66 0.00150 0.082 Marker of invasiveness in Stage 1NSCLC

Page 32: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 32/50

Selected Flagged GenesRank Gene F p pStage Function

1 FCGRT -2.07 <0.00001 0.154 Induced by IF-K in treating SCLC

2 ENO2 1.46 0.00001 0.282Marker of NSCLC

4 RRM1 1.81 0.00002 0.321 Linked to survival in NSCLC

8 CHKL -1.43 0.00010 0.979 Marker of NSCLC

11 CPE 0.72 0.00031 0.088 Marker of SCLC

12 ADRBK1 -2.20 0.00044 0.484 Co-expressed with Cox-2 in PUC

16 CLU -0.52 0.00109 0.014 Marker of SCLC

20 SEPW1 -1.29 0.00145 0.028 H202 cytotox. in NSCLC cell lines

21 FSCN1 0.66 0.00150 0.082 Marker of invasiveness in Stage 1NSCLC

Page 33: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 33/50

Selected Flagged GenesRank Gene F p pStage Function

1 FCGRT -2.07 <0.00001 0.154 Induced by IF-K in treating SCLC

2 ENO2 1.46 0.00001 0.282Marker of NSCLC

4 RRM1 1.81 0.00002 0.321 Linked to survival in NSCLC

8 CHKL -1.43 0.00010 0.979 Marker of NSCLC

11 CPE 0.72 0.00031 0.088 Marker of SCLC

12 ADRBK1 -2.20 0.00044 0.484 Co-expr with Cox-2 in lung adenocarc

16 CLU -0.52 0.00109 0.014 Marker of SCLC

20 SEPW1 -1.29 0.00145 0.028 H202 cytotox. in NSCLC cell lines

21 FSCN1 0.66 0.00150 0.082 Marker of invasiveness in Stage 1NSCLC

Page 34: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 34/50

Selected Flagged GenesRank Gene F p pStage Function

1 FCGRT -2.07 <0.00001 0.154 Induced by IF-K in treating SCLC

2 ENO2 1.46 0.00001 0.282 Marker of NSCLC

4 RRM1 1.81 0.00002 0.321 Linked to survival in NSCLC

8 CHKL -1.43 0.00010 0.979 Marker of NSCLC

11 CPE 0.72 0.00031 0.088 Marker of SCLC

12 ADRBK1 -2.20 0.00044 0.484Co-expressed with Cox-2 in PUC

16 CLU -0.52 0.00109 0.014 Marker of SCLC

20 SEPW1 -1.29 0.00145 0.028 H202 cytotox. in NSCLC cell lines

21 FSCN1 0.66 0.00150 0.082 Marker of invasiveness in Stage 1NSCLC

Page 35: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 35/50

Selected Flagged Genes

Rank Gene F p pStage Function

3 NFRKB -2.81 0.00001 0.058 Amplified in AML

7 ATIC 1.81 0.00009 0.771 Fusion partner of ALK which definessubtype of ALCL

13 BCL9 -1.64 0.00069 0.057 Over-expressed in ALL

15 TPS1 -0.64 0.00107 0.882 Associated with pulmonaryinflammation

25 BTG2 -0.75 0.00232 0.726 Inhibits cell proliferation in primarymouse embryo fibroblasts lackingfunctional p53

Page 36: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 36/50

Selected Flagged Genes

Rank Gene F p pStage Function

3 NFRKB -2.81 0.00001 0.058 Amplified in AML

7 ATIC 1.81 0.00009 0.771 Fusion partner of ALK which definessubtype of ALCL

13 BCL9 -1.64 0.00069 0.057 Over-expressed in ALL

15 TPS1 -0.64 0.00107 0.882 Associated with pulmonaryinflammation

25 BTG2 -0.75 0.00232 0.726 Inhibits cell proliferation in primarymouse embryo fibroblasts lackingfunctional p53

Page 37: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 37/50

Selected Flagged Genes

Rank Gene F p pStage Function

3 NFRKB -2.81 0.00001 0.058 Amplified in AML

7 ATIC 1.81 0.00009 0.771 Fusion partner of ALK which definessubtype of ALCL

13 BCL9 -1.64 0.00069 0.057 Over-expressed in ALL

15 TPS1 -0.64 0.00107 0.882 Associated with pulmonaryinflammation

25 BTG2 -0.75 0.00232 0.726 Inhibits cell proliferation in primarymouse embryo fibroblasts lackingfunctional p53

Page 38: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 38/50

Results Our gene list has almost no overlap with other

publications of these data. Reasons:

1. We addressed a different research question Us: ID Genes offering prognostic info beyond clinical

Michigan: Univariate Cox models fit; results used toconstruct dichotomous risk index

Harvard: Cluster analysis done; clusters linked tosurvival; found genes driving the clustering

2. Pooling across studies yielded significantgains in statistical power.

Most genes (17/26) in our study are not flagged if we

analyze 2 data sets separately (i.e. no pooling)

Page 39: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 39/50

CAMDA 2003 Results

We Won!

Page 40: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 40/50

Limitations of Partial Probeset Method Worked well for combining across HuGeneFL/

U95Av2

~25% probes from HuGeneFL on U95Av2, with4,101 probesets

Not enough matching probes for use withU95Av2/U133A ~6% of probes from U95Av2 also on U133A, with

only 628 probesets

Requiring matching probes strong criterion,maybe weaker criterion would suffice?

Page 41: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 41/50

Page 42: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 42/50

Full-Length Transcript Based Probesets New probeset definition (FLTBP): probes match

the same set of full-length mRNA sequences

Procedure1. Construct comprehensive library of full-length mRNA

transcript sequences from RefSeq and HinvDB

2. For each probe, identify all matching full-length

transcripts using Blast programU95Av2: 15% matched no sequence, 33% matched multiple seq.

U133A:   18% matched no sequence, 38% matched multiple seq.

3. Group probes with same matched target lists (FLTBPs)U95Av2: 23,972 probesets, U133A: 14,148 probesets

Page 43: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 43/50

Full-Length Transcript Based Probesets Matching across chip types:

9,642 FLTBPs match across U95Av2 and U133A

Affymetrix has their own method for mapping theirprobesets across arrays 9,480 pairs of probesets(only about ½ map the same way as FLTBPs)

Example: Lung cancer cell line data

28

cell lines, each hybridized onto both U95Av2 andU133A arrays.

Paired design suggests any differences between pairedmeasurements due to technical, not biological, sources.

Different quantification methods (PDNN, RMA, MAS, dChip)

Page 44: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 44/50

Results Densityestimate of chip-to-chipcorrelations for

each gene Positive shift 

for FLTBPsuggests bettercorrelations

Improvement greatest forPDNN

Correlation stillnot perfect 

-0.5 0.0 0.5 1.0

0.0

0.5

1

.0

1.5

2.0

2.5

Gene-gene c orrelation

D

ensity

A

PDNN(P<0.00001)

-0.5 0.0 0.5 1.0

0.0

0.5

1

.0

1.5

2.0

2.5

Gene-gene c orrelat ion

D

ensity

B

RMA(P<0.00031)

-0 .5 0.0 0.5 1.0

0.0

0.5

1.0

1.5

2.0

2.5

Gene-gene c orrelation

D

ensity

C

MAS5(P<0.00575)

-0.5 0.0 0.5 1.0

0.0

0.5

1.0

1.5

2.0

2.5

Gene-gene c orrelat ion

D

ensity

D

dChip(P<0.00005)

Page 45: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 45/50

Page 46: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 46/50

Example: Sample Gene 2

4

5

6

7

8

9

10

1 6 11 16 21 26

Sample

Ln(probe sign

al)

4

5

6

7

8

9

10

1 6 11 16 21 26

Sample

Ln(probe sign

al)

11

12

13

14

15

7 8 9 10 11

U95Av2

U133A

R2

=0.87

R2

=0.25

Again, significantly higher correlation using FLTBP thanusing Affymetrix definition

Page 47: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 47/50

Results Boxplot of 

chip-to-chipcorrelations

(over genes)for eachsample

PDNN resulted

in highercorrelations

PDNN RMA MAS5 dChip

FLTBP

AffyPS

FLTBP

AffyPS

FLTBP

AffyPS

FLTBP

AffyPS

0.6

0.7

0.8

0.9

1.0

Sam

pl

la

ti

Page 48: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 48/50

Conclusions New method for pooling info across studies

using different versions of Affymetrix chips. Recombine matched probes into new

probesets using Unigene clusters. Method appears to obtain comparable

expression levels across chips without sacrificingmuch precision or significantly altering therelative ordering of the samples.

Worked well combining information acrossHuGeneFL/U95Av2, but not U95Av2/U133A

Page 49: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 49/50

Conclusions

Discussed new probeset definition based on

full-length transcript sequences. Removes effect of known alternative splicing Yields stronger between-chip correlations than

Affymetrix standard definitions

Pooling information across studies is difficult there is still more work to be done but worththe effort.

Page 50: Pooling data across Micorarray

8/7/2019 Pooling data across Micorarray

http://slidepdf.com/reader/full/pooling-data-across-micorarray 50/50

ReferencesMorris JS, Yin G, Baggerly KA, Wu C, and Zhang L (2005).Pooling Information Across Different Studies and OligonucleotideMicroarray Chip Types to Identify Prognostic Genes for LungCancer.  Methods of Microarray Data Analysis IV, eds. JSShoemaker and SM Lin, pp. 51-66, New York: Springer-Verlag.

Wu C, Morris JS, Baggerly KA, Coombes KR, Minna JD, andZhang L (2005). A probe-to-transcripts mapping method forcross-platform comparisons of microarray data taking into

account the effects of alternative splicing.  Under review.

Morris JS, Wu C, Coombes KR, Baggerly KA, Wang J, and ZhangL (2006). Alternative Probeset Definitions for CombiningMicroarray Data Across Studies Using Different Versions of Affymetrix Oligonucleotide Arrays. To appear in Meta-Analysis in

Genetics edited by Rudy Guerra and David Allison Chapman