Microarray data analysis using BioConductorngl9/topics/inotes/MicroarrayAnalysis4CRI.pdf · A...
Transcript of Microarray data analysis using BioConductorngl9/topics/inotes/MicroarrayAnalysis4CRI.pdf · A...
outlinePre-process of data
Model Fitting for Identifying Differential ExpressionNetwork Inference
Microarray data analysis using BioConductor
Guiyuan Lei
Centre for Integrated Systems Biology of Ageing and Nutrition (CISBAN)School of Mathematics & Statistics
Newcastle Universityhttp://www.mas.ncl.ac.uk/∼ngl9/
6 June, 2008
Guiyuan Lei Microarray data analysis using BioConductor
outlinePre-process of data
Model Fitting for Identifying Differential ExpressionNetwork Inference
Outline
Outline
Bioconductor http://www.bioconductor.orgPre-process dataIdentify differential expressionNetwork inference
Guiyuan Lei Microarray data analysis using BioConductor
outlinePre-process of data
Model Fitting for Identifying Differential ExpressionNetwork Inference
Entering data into BioconductorExtraction of Cerevisiae probesetsExploratory data analysisNormalisationPrincipal Component Analysis
Pre-process of data
Entering data into BioconductorExtraction of Cerevisiae probesetsExploratory data analysisNormalising Microarray dataProbeset level expression to gene level expressionPrincipal Component Analysis
Guiyuan Lei Microarray data analysis using BioConductor
outlinePre-process of data
Model Fitting for Identifying Differential ExpressionNetwork Inference
Entering data into BioconductorExtraction of Cerevisiae probesetsExploratory data analysisNormalisationPrincipal Component Analysis
Entering data into Bioconductor
library(affy)fns2 = list.celfiles(path="data2", full.names=TRUE)rawdata = ReadAffy(filenames=fns2)print(rawdata)
Strain 0 hours 1 hour 2 hours 3 hours 4 hoursMutant 1 yeast01.cel yeast02.cel yeast03.cel yeast04.cel yeast05.celWild type 1 yeast06.cel yeast07.cel yeast08.cel yeast09.cel yeast10.celMutant 2 yeast11.cel yeast12.cel yeast13.cel yeast14.cel yeast15.celWild type 2 yeast16.cel yeast17.cel yeast18.cel yeast19.cel yeast20.celMutant 3 yeast21.cel yeast22.cel yeast23.cel yeast24.cel yeast25.celWild type 3 yeast26.cel yeast27.cel yeast28.cel yeast29.cel yeast30.cel
Guiyuan Lei Microarray data analysis using BioConductor
outlinePre-process of data
Model Fitting for Identifying Differential ExpressionNetwork Inference
Entering data into BioconductorExtraction of Cerevisiae probesetsExploratory data analysisNormalisationPrincipal Component Analysis
Mask file for Cerevisiae probesets
Mask file to filter out pombe probesetshttp://www.affymetrix.com/Auth/support/downloads/mask files/s cerevisiae.zips_cerevisiae<-scan("s_cerevisiae.msk", skip=2, list("", ""))pombe_filter_out<-s_cerevisiae[[1]]
RemoveProbe [1]source("RemoveProbes.r")library(yeast2probe)cleancdf = cleancdfname("yeast2")RemoveProbes(listOutProbes=NULL, pombe_filter_out,
"yeast2cdf","yeast2probe")
Guiyuan Lei Microarray data analysis using BioConductor
outlinePre-process of data
Model Fitting for Identifying Differential ExpressionNetwork Inference
Entering data into BioconductorExtraction of Cerevisiae probesetsExploratory data analysisNormalisationPrincipal Component Analysis
Exploratory data analysis
Examining raw imagesProbe intensitiesMA plotsRNA degradation
Guiyuan Lei Microarray data analysis using BioConductor
outlinePre-process of data
Model Fitting for Identifying Differential ExpressionNetwork Inference
Entering data into BioconductorExtraction of Cerevisiae probesetsExploratory data analysisNormalisationPrincipal Component Analysis
Pre-process of data: Normalisation
yeast01.cel yeast08.cel yeast15.cel yeast22.cel yeast29.cel
68
1012
1416
Cerevisiae Probe intensities
yeast01.cel yeast08.cel yeast15.cel yeast22.cel yeast29.cel
46
810
1214
Cerevisiae RMA expression values
Guiyuan Lei Microarray data analysis using BioConductor
outlinePre-process of data
Model Fitting for Identifying Differential ExpressionNetwork Inference
Entering data into BioconductorExtraction of Cerevisiae probesetsExploratory data analysisNormalisationPrincipal Component Analysis
Principal Component Analysis [5]
−20 −10 0 10 20 30 40
−10
−5
05
10
1st PC
2nd
PC
Clustering Samples
Principal Component Projection
yeast01.cel
yeast02.cel
yeast03.cel
yeast04.cel
yeast05.cel
yeast06.cel
yeast07.cel
yeast08.cel
yeast09.celyeast10.cel
yeast11.cel
yeast12.celyeast13.cel
yeast14.cel
yeast15.cel
yeast16.cel
yeast17.celyeast18.cel
yeast19.celyeast20.cel
yeast21.cel
yeast22.cel
yeast23.cel
yeast24.cel
yeast25.cel
yeast26.celyeast27.cel
yeast28.cel
yeast29.cel
yeast30.cel
Guiyuan Lei Microarray data analysis using BioConductor
outlinePre-process of data
Model Fitting for Identifying Differential ExpressionNetwork Inference
LimmaPlot time course for top differential expressionHeatmap
Model Fitting for Identifying Differential Expression
Limma model [4]Construct design matrixConstruct constrasts
Up-regulated and down-regulated listPlot time course for top differential expressionVolcano Plot for viewing p-value and fold-changeHeatmap
Guiyuan Lei Microarray data analysis using BioConductor
outlinePre-process of data
Model Fitting for Identifying Differential ExpressionNetwork Inference
LimmaPlot time course for top differential expressionHeatmap
Up and down regulated listFor identifying differential expression, combine the contrasts bycomparing mutant type and wild type at time point 1,2,3 and 4.
#Model Fittingfit<-lmFit(CerevisiaeProbeData, design)mc<-makeContrasts(’m1-w1’,’m2-w2’,’m3-w3’,’m4-w4’,levels=design)fit2<-contrasts.fit(fit, mc)eb<-eBayes(fit2)
Probeset ID Gene Symbol T1 T2 T3 T4ProbesetID 1 Gene 1 -1 -1 -1 -1ProbesetID 2 Gene 2 1 1 1 1ProbesetID 3 Gene 3 -1 -1 -1 -1
Table: Up and down regulated list1
1Not use real gene names hereGuiyuan Lei Microarray data analysis using BioConductor
outlinePre-process of data
Model Fitting for Identifying Differential ExpressionNetwork Inference
LimmaPlot time course for top differential expressionHeatmap
Plot time course for top differential expression
M
M
M
M
M
0 1 2 3 4
89
1011
1213
ProbesetID 1 Gene 1 Rank= 1
time
Exp
ress
ion
M
M
MM
M
M
M
M
M
M
W W W WW
W W W W WW
W WW
W
M
M
M
M M
0 1 2 3 4
89
1011
1213
14
ProbesetID 2 Gene 2 Rank= 2
time
Exp
ress
ion
M
M
M
M M
M
M
M
MM
W W
W W W
W
W
W W W
W W
W W W
M
M
M
M
M
0 1 2 3 4
78
910
11
ProbesetID 3 Gene 3 Rank= 3
time
Exp
ress
ion
M
M
M
M
M
M
M
M
M
M
W
WW W W
W
W W W
W
W W
W W
W
Figure: Time course expression for top 3 differentially expressedYeast genes2
2Not use real gene names hereGuiyuan Lei Microarray data analysis using BioConductor
outlinePre-process of data
Model Fitting for Identifying Differential ExpressionNetwork Inference
LimmaPlot time course for top differential expressionHeatmap
Volcano Plot for viewing p-value and fold-change
Figure: Bonferroni adjusted p-value less than 0.01 and fold-changelarger than 3.5
Guiyuan Lei Microarray data analysis using BioConductor
outlinePre-process of data
Model Fitting for Identifying Differential ExpressionNetwork Inference
LimmaPlot time course for top differential expressionHeatmap
Heatmap
r1t0
r2t0
r3t0
r1t1
r2t1
r3t1
r1t2
r2t2
r3t2
r1t3
r2t3
r3t3
r1t4
r2t4
r3t4
Gene 2Gene 6Gene 38Gene 93Gene 80Gene 94Gene 88Gene 63Gene 85Gene 12Gene 91Gene 60Gene 46Gene 50Gene 19Gene 74Gene 92Gene 73Gene 100Gene 71Gene 69Gene 9Gene 43Gene 72Gene 32Gene 55Gene 64Gene 61Gene 75Gene 59Gene 86Gene 89Gene 45Gene 44Gene 83Gene 67Gene 68Gene 24Gene 53Gene 84Gene 90Gene 37Gene 87Gene 96Gene 15Gene 13Gene 21Gene 8Gene 39Gene 7Gene 3Gene 4Gene 5Gene 22Gene 42Gene 16Gene 95Gene 48Gene 77Gene 65Gene 76Gene 36Gene 20Gene 29Gene 33Gene 54Gene 52Gene 79Gene 51Gene 70Gene 97Gene 58Gene 62Gene 66Gene 14Gene 47Gene 23Gene 41Gene 1Gene 35Gene 57Gene 10Gene 99Gene 81Gene 98Gene 27Gene 40Gene 34Gene 31Gene 56Gene 11Gene 49Gene 82Gene 78Gene 18Gene 26Gene 25Gene 17Gene 30Gene 28
Guiyuan Lei Microarray data analysis using BioConductor
outlinePre-process of data
Model Fitting for Identifying Differential ExpressionNetwork Inference
Network Inference
GeneNet [2]: simple, quick, but not robustMetropolis Hastings for decomposable graphs (MH-d) [3].Computationaly intensive but more stableReversiable Jump MCMC for time course data
Yeast Network
Gene 41
Gene 87
Gene 64
Gene 99Gene 57
Gene 81
Gene 94 Gene 91
Gene 98
Gene 96
Gene 58
Gene 68
Gene 88
Gene 92
Gene 80
Gene 95
Gene 69
Gene 97
Gene 65
Gene 63
Gene 66Gene 74
Gene 49
Gene 67 Gene 89
Gene 70
Gene 82Gene 75
Gene 100
Gene 90
Gene 22
Gene 53
Gene 44
Gene 93
Gene 60
Gene 56
Gene 59Gene 55Gene 76
Gene 54
Gene 34
Gene 10
Gene 35
Gene 71
Gene 28
Gene 50
Gene 84
Gene 79
Gene 73
Gene 16
Gene 86
Gene 42
Gene 20
Gene 31
Gene 27
Gene 40Gene 39
Gene 77
Gene 18
Gene 38
Gene 24
Gene 48
Gene 33
Gene 15
Gene 32
Gene 14
Gene 83
Gene 11
Gene 23
Gene 25
Gene 36 Gene 4
Gene 52
Figure: Inferred network by GeneNet package for top 100differentially expressed Yeast genes
Guiyuan Lei Microarray data analysis using BioConductor
outlinePre-process of data
Model Fitting for Identifying Differential ExpressionNetwork Inference
References
W.G. Alvord, J.A. Roayaei, O.A. Quinones, and K.T. Schneider.A microarray analysis for differential gene expression in the soybean genomeusing Bioconductor and R.Briefings in Bioinformatics, September 29, 2007.
Schafer J. and K. Strimmer.An empirical bayes approach to inferring large-scale gene association networks.Bioinformatics, 21:754–764, 2005.
Beatrix Jones, Carlos Carvalho, Adrian Dobra, Chris Hans, Chris Carter, andMike West.Experiments in stochastic computation for high-dimensional graphical models.Statistical Science, 20(4):388–400, 2005.
G.K. Smyth.Linear models and empirical Bayes methods for assessing differential expressionin microarray experiments.Statistical Applications in Genetics and Molecular Biology, 3, 2004.
E. Wit and J. Mcclure.Statistics for Microarrays: Design, Analysis and Inference.Wiley, 2004.
Guiyuan Lei Microarray data analysis using BioConductor