Bringing A Statistical Package To The Biologist’s Fingertips With Applications to Microarray Analysis




Microarray Experiments

Some examples of the many types of microarray experiments currently being considered:
• Comparison with normal cells.
• Comparison of many cell types, using an appropriate pool of RNA as a reference.
• Time series, using either time 0 or an earlier time point as the reference.
• Knockout experiments.
• Factor experiments.


Statistical issues to be addressed

Image analysis
• Spot identification
• Background correction

Data analysis
• Normalisation
• Transformation
• Significant genes
• Large amounts of data
• …

Need a flexible approach.


A tool for analysis: R

R is free software that is rapidly becoming very widely used.

It can handle the large data files used to analyse microarrays.

It is available for Unix, Linux and Windows.

It has excellent documentation and online help.


Image Analysis and R

In collaboration with the CSIRO (Sydney), Jean Yee Hwa Yang and Terry Speed are developing a microarray image analysis package, currently being implemented using Z-image and R.

This automated image analysis program overcomes some of the problems and limitations of commercial packages.

Its output will automatically be set up for further analysis in R.


Using R at WEHI

R is currently only available on unix02.

Access from a Macintosh is limited to the command-line window. The graphics window can only be seen if an X Window System program is installed on the Mac.

However, if there is demand for R at WEHI, the Computer Centre will investigate options to change this situation.

Alternatively, install R for Windows on a PC, or install R for Linux.


Using R at WEHI (2)

NAT>R

R : Copyright 2000, The R Development Core Team
Version 1.0.0  (February 29, 2000)

Type "demo()" for some demos, "help()" for on-line help, or
"help.start()" for a HTML browser interface to help.
Type "q()" to quit R.

> q()
Save workspace image? [y/n/c]: y

To start R 1.0 with a larger vector heap (--vsize) and more cons cells (--nsize) for large data files:

NAT>R --vsize=50M --nsize=2000k


How to make a vector

> x <- c(1,3,5,4,7,8)
> x
[1] 1 3 5 4 7 8

> t(x)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    3    5    4    7    8

> length(x)
[1] 6

> index <- c(2,3,4)
> x[index]
[1] 3 5 4


How to make a matrix

> xmat <- matrix(x, nrow=2, ncol=3, byrow=TRUE)
> xmat
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    4    7    8

> xmat[1,2]
[1] 3
> xmat[,3]
[1] 5 8

> xmat <- matrix(x, nrow=2, ncol=3, byrow=FALSE)
> xmat
     [,1] [,2] [,3]
[1,]    1    5    7
[2,]    3    4    8


Adding and removing a column

> addcol <- c(9,2)
> newxmat <- cbind(xmat, addcol)
> newxmat
           addcol
[1,] 1 5 7      9
[2,] 3 4 8      2

> oldxmat <- newxmat[,-4]
> oldxmat
[1,] 1 5 7
[2,] 3 4 8


A script to find the column means

for (i in 1:3) {
  print(mean(xmat[,i]))
}

m <- 0
for (i in 1:3) {
  m <- c(m, mean(xmat[,i]))
}
m <- m[-1]   # drop the initial 0

> dim(xmat)
[1] 2 3
> for (i in 1:3) {
+   print(mean(xmat[,i]))
+ }
[1] 2
[1] 4.5
[1] 7.5
> m <- 0
> for (i in 1:3) {
+   m <- c(m, mean(xmat[,i]))
+ }
> m <- m[-1]
> m
[1] 2.0 4.5 7.5
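The loop above builds the result one element at a time. As a sketch that is not part of the original slides, R's built-in colMeans() and apply() give the same column means without an explicit loop; the matrix is rebuilt here so the block runs on its own.

```r
# Rebuild the example matrix (filled column by column, byrow = FALSE)
x <- c(1, 3, 5, 4, 7, 8)
xmat <- matrix(x, nrow = 2, ncol = 3, byrow = FALSE)

# Vectorised column means
m_colmeans <- colMeans(xmat)

# apply() runs mean() over margin 2, i.e. over the columns
m_apply <- apply(xmat, 2, mean)

print(m_colmeans)
print(m_apply)
```

colMeans() is usually the faster choice on large matrices; apply() is more general because any summary function can replace mean().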


Reading in Data

num GR GC SR SC NAME   X       Y      CH1I        CH1B       CH1ISD     CH1BSD    CH2I        CH2B       CH2ISD     CH2BSD
1   1  1  1  1  CL0001 1220.00 890.00 1223.317505 168.473679 435.352264 37.599304 1014.603149 139.578949 446.614960 21.937578
2   1  1  1  2  CL0001 1400.00 890.00 1257.714233 233.368423 337.946320 90.568703 975.333313  142.684204 354.194031 22.934818
3   1  1  1  3  CL0008 1580.00 890.00 333.555542  144.000000 145.992569 15.944347 277.730164  126.842102 156.314529 9.719757


Reading in data from a text file

> # Check that every line of the file has the same number of fields
> count.fields(file="tp04sk1.txt", sep="\t", skip=0)
  . . .
[9145] 16 16 16 16 16 16 16 16 16 16 16 16 16 16 . . .
[9169] 16 16 16 16 16 16 16 16 16 16 16 16 16 16 . . .
[9193] 16 16 16 16 16 16 16 16 16 16 16 16 16 16 . . .
[9217] 16

> tp4sk1 <- read.table("tp04sk1.txt", header=TRUE, sep="\t", skip=0, row.names=1)
> attach(tp4sk1)
> median(CH1I)
[1] 375.627
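To try these steps without the full data file, the sketch below writes the three rows from the Reading in Data slide to a temporary tab-delimited file (the temporary file and the object name spots are assumptions for illustration) and reads them back. table(count.fields(...)) summarises the field counts in a single line instead of printing one count per line, and with() evaluates an expression inside the data frame without attach(), which avoids masking other objects that share a column name.

```r
header <- c("num", "GR", "GC", "SR", "SC", "NAME", "X", "Y",
            "CH1I", "CH1B", "CH1ISD", "CH1BSD",
            "CH2I", "CH2B", "CH2ISD", "CH2BSD")
rows <- list(
  c("1", "1", "1", "1", "1", "CL0001", "1220.00", "890.00",
    "1223.317505", "168.473679", "435.352264", "37.599304",
    "1014.603149", "139.578949", "446.614960", "21.937578"),
  c("2", "1", "1", "1", "2", "CL0001", "1400.00", "890.00",
    "1257.714233", "233.368423", "337.946320", "90.568703",
    "975.333313", "142.684204", "354.194031", "22.934818"),
  c("3", "1", "1", "1", "3", "CL0008", "1580.00", "890.00",
    "333.555542", "144.000000", "145.992569", "15.944347",
    "277.730164", "126.842102", "156.314529", "9.719757")
)

# Write a small tab-delimited file: header line plus three data lines
f <- tempfile(fileext = ".txt")
writeLines(c(paste(header, collapse = "\t"),
             vapply(rows, paste, character(1), collapse = "\t")), f)

# One entry per distinct field count; a well-formed file gives a single entry
field_counts <- table(count.fields(f, sep = "\t"))
print(field_counts)

# Use the "num" column as row names, as on the slide
spots <- read.table(f, header = TRUE, sep = "\t", row.names = 1)
med_ch1i <- with(spots, median(CH1I))
print(med_ch1i)
```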


Getting spot info from the data frame

> cy3 <- CH2I            # Green
> cy5 <- CH1I            # Red
> cy3bc <- CH2I - CH2B   # Background corrected
> cy5bc <- CH1I - CH1B

> # Get duplicates: the two copies of each spot are in adjacent rows
> d1 <- seq(1, (dim(tp4sk1)[1]-1), 2)   # odd rows: first copy
> d2 <- seq(2, (dim(tp4sk1)[1]), 2)     # even rows: second copy
> cy3d1 <- cy3bc[d1]
> cy3d2 <- cy3bc[d2]
> cy5d1 <- cy5bc[d1]
> cy5d2 <- cy5bc[d2]
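Because the two copies of each spot sit in consecutive rows, the odd/even indexing above is equivalent to reshaping the vector into a two-column matrix filled by row. A minimal sketch on a toy vector (the names v and pairs, and the values, are hypothetical):

```r
v <- c(11, 12, 21, 22, 31, 32)   # toy values: copies of spot k are 10k+1 and 10k+2

n <- length(v)
d1 <- seq(1, n - 1, 2)   # first copy (odd positions)
d2 <- seq(2, n, 2)       # second copy (even positions)

# Filling a 2-column matrix by row puts copy 1 in column 1 and copy 2 in column 2
pairs <- matrix(v, ncol = 2, byrow = TRUE)

print(v[d1])
print(pairs[, 1])
```

The matrix form keeps both copies of a spot on the same row, which can be convenient when comparing duplicates.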


Always log the intensities

> par(mfrow=c(2,3))
> Cy3 <- log2(cy3)   # Use log base 2
> Cy5 <- log2(cy5)
> hist(cy3, col="green")
> plot(density(cy3), col="green")
> plot(density(Cy3), col="green")
> hist(cy5, col="red")
> plot(density(cy5), col="red")
> plot(density(Cy5), col="red")
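A short illustration of why base 2 is convenient: on the log2 scale a two-fold increase and a two-fold decrease are symmetric about zero, and one unit corresponds to one doubling. The reference intensity and the intensities below are made-up values for illustration.

```r
ref <- 1000                              # hypothetical reference intensity
intensities <- c(250, 500, 1000, 2000, 4000)

ratios <- intensities / ref              # raw ratios are asymmetric around 1
log_ratios <- log2(intensities / ref)    # log ratios are symmetric around 0

print(ratios)
print(log_ratios)
```

On the raw scale a halving (0.5) sits much closer to 1 than a doubling (2) does; after log2 they become -1 and +1.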


Normalisation

> par(mfrow=c(2,1))
> plot(density(Cy3), type="n")
> lines(density(Cy3), col="green")
> lines(density(Cy5), col="red")
> plot(Cy3, Cy5, xlab="Log(cy3) Background Corrected",
+      ylab="Log(cy5) Background Corrected",
+      main="The Need For Normalisation Between Green and Red Intensities")
> lines(lowess(Cy3, Cy5), col="yellow")


Normalisation (2)

Assume the green intensity is a constant multiple of the red intensity:
cy3 = k * cy5

So when you take logs (base 2):
log2(cy3) = K + log2(cy5), where K = log2(k)

Therefore, estimate K by the median difference of the log intensities, and rescale the red intensities by k = 2^K:

> K <- median(log2(cy3) - log2(cy5))   # same as median(Cy3 - Cy5)
> k <- 2^K
> cy5n <- k * cy5                      # normalised red intensities
> Cy5n <- log2(cy5n)
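The median normalisation above can be checked on simulated data. In this sketch all data are simulated, and the true scale factor of 1.5 and the noise level are arbitrary assumptions: green is 1.5 times red with multiplicative noise. After rescaling red by k = 2^K, the median log ratio is zero by construction, and k should land close to 1.5.

```r
set.seed(1)
n <- 2000
cy5 <- 2^rnorm(n, mean = 9, sd = 1)                 # simulated red intensities
cy3 <- 1.5 * cy5 * 2^rnorm(n, mean = 0, sd = 0.3)   # green = 1.5 x red, with noise

K <- median(log2(cy3) - log2(cy5))   # estimated log2 scale factor
k <- 2^K                             # estimated scale factor
cy5n <- k * cy5                      # normalised red intensities
Cy5n <- log2(cy5n)

print(k)
print(median(log2(cy3) - Cy5n))      # zero up to rounding error
```

Using the median rather than the mean keeps the estimate of K robust to the few genes that are genuinely differentially expressed.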


Approximate normality of log ratios

> par(mfrow=c(2,1))
> plot(density(Cy5n - Cy3), col="purple")
> qqnorm(Cy5n - Cy3,
+        col=rep(c("red","yellow","green","blue","pink","orange","purple","black"), each=2))
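qqnorm() can also return its coordinates instead of drawing them (plot.it = FALSE). The correlation between the theoretical and sample quantiles is one simple numerical summary of how straight the Q-Q plot is. A sketch on simulated log ratios (the data and the names lr and q are assumptions), with qqline() shown in a comment for adding a reference line to an interactive plot:

```r
set.seed(2)
lr <- rnorm(1000, mean = 0, sd = 0.4)   # simulated log ratios

q <- qqnorm(lr, plot.it = FALSE)        # list: theoretical (x) and sample (y) quantiles
qq_cor <- cor(q$x, q$y)                 # close to 1 when the points lie on a line
print(qq_cor)

# Interactively: qqnorm(lr); qqline(lr)
# qqline() draws a line through the first and third quartiles
```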


A question of significance

> par(mfrow=c(1,1))
> plot(0.5*(Cy3 + Cy5n), Cy5n - Cy3,
+      xlab="Average of logRed and logGreen",
+      ylab="Difference of logRed and logGreen",
+      main="Variation In Intensities Is Not Constant",
+      col=rep(c("red","yellow","green","blue","pink","orange","purple","black"), each=2))
> lines(lowess(0.5*(Cy3 + Cy5n), Cy5n - Cy3), col="yellow")
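One way to see the effect numerically is to compare the spread of the log ratios within bins of average log intensity. The sketch below uses simulated data in which the spread shrinks as intensity increases (an assumption chosen to mimic the message of the plot); M and A are just short names chosen here for the difference and the average.

```r
set.seed(3)
n <- 5000
A <- runif(n, min = 6, max = 12)            # simulated average log2 intensity
M <- rnorm(n, mean = 0,
           sd = 0.1 + 0.5 * (12 - A) / 6)   # spread falls from 0.6 to 0.1

bins <- cut(A, breaks = 6:12)               # six bins of width 1
spread <- tapply(M, bins, sd)               # standard deviation of M per bin
print(round(spread, 2))
```

A spread that changes with intensity is why a single cut-off on the log ratio treats low- and high-intensity spots unequally.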


Identifying a spot on a plot

> par(mfrow=c(1,1))
> plot(0.5*(Cy3 + Cy5n), Cy5n - Cy3,
+      xlab="Average of logRed and logGreen",
+      ylab="Difference of logRed and logGreen",
+      main="Variation In Intensities Is Not Constant",
+      type="n", ylim=c(-4,4), xlim=c(6,12))
> text(0.5*(Cy3 + Cy5n), Cy5n - Cy3, labels=1:9216,
+      col=rep(c("red","yellow","green","blue","pink","orange","purple","black"), each=2),
+      cex=1)
> lines(lowess(0.5*(Cy3 + Cy5n), Cy5n - Cy3), col="yellow")
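The slide labels every spot with its index. Two alternatives, offered as suggestions: identify() lets you click points on an open plot and returns their indices (interactive, so it is only shown as a comment), and which() lists the spots inside a chosen rectangle of the plot. The small vectors below are made-up values.

```r
A <- c(7.2, 9.1, 10.5, 10.6, 11.8)   # made-up average log intensities
M <- c(0.1, -2.5, 0.4, 0.9, 3.1)     # made-up log ratios

# Spots inside the window xlim = c(10, 11), ylim = c(-0.1, 1)
in_window <- which(A > 10 & A < 11 & M > -0.1 & M < 1)
print(in_window)

# Interactive alternative on an open plot:
# plot(A, M); identify(A, M, labels = seq_along(A))
```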


Saving graphics to a file (postscript)

> postscript("filename.ps")
> par(mfrow=c(1,1))
> plot(0.5*(Cy3 + Cy5n), Cy5n - Cy3,
+      xlab="Average of logRed and logGreen",
+      ylab="Difference of logRed and logGreen",
+      main="Variation In Intensities Is Not Constant",
+      type="n", ylim=c(-0.1,1), xlim=c(10,11))
> text(0.5*(Cy3 + Cy5n), Cy5n - Cy3, labels=1:9216,
+      col=rep(c("red","yellow","green","blue","pink","orange","purple","black"), each=2),
+      cex=1)
> dev.off()
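A self-contained sketch of the same open, plot, close pattern, writing to temporary files so it can run anywhere (the file names and the plotted data are placeholders). pdf() works the same way if a PDF is preferred.

```r
ps_file <- tempfile(fileext = ".ps")
pdf_file <- tempfile(fileext = ".pdf")

postscript(ps_file)          # open the PostScript device
plot(1:10, main = "Example")
invisible(dev.off())         # close the device, which writes the file

pdf(pdf_file)                # same pattern with the PDF device
plot(1:10, main = "Example")
invisible(dev.off())
```

Nothing is written until dev.off() closes the device, so forgetting it can leave an empty or truncated file.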


Using R help

> ?plot
Generic X-Y Plotting

Description:
     Generic function for plotting of R objects.  For more details
     about the graphical parameter arguments, see `par'.

Usage:
     plot(x, ...)
     plot(x, y, xlim=range(x), ylim=range(y), type="p",
          main, xlab, ylab, ...)
     plot(y ~ x, ...)

Arguments:
       x: the coordinates of points in the plot. Alternatively, a
          single plotting structure or any R object with a `plot'
          method can be provided.
:


Using R help (2)

> help.start()


R Help (3)


[Figure: layout of the 16 grids on the array, numbered 1–16. Each grid is a 24 × 24 block of 576 spots numbered row by row (spots 1–24 in the first row, 25–48 in the second, and so on); grid 1 holds spots 1–576 and grid 2 holds spots 577–1152.]


Level colour plot of background

> bkgmat <- matrix(1:24, nrow=24, ncol=1)   # placeholder column, removed below
> for (i in 1:16) {                         # one 24 x 24 block per grid
+   s <- c((((i-1)*576)+1):(i*576))          # spot indices of grid i
+   m <- matrix(CH1B[s], nrow=24, ncol=24, byrow=TRUE)
+   bkgmat <- cbind(bkgmat, m)
+ }
> bkgmat <- bkgmat[,-1]
> m1 <- bkgmat[,1:96]
> m2 <- bkgmat[,97:192]
> m3 <- bkgmat[,193:(3*96)]
> m4 <- bkgmat[,((3*96)+1):(4*96)]
> bkg <- rbind(m1, m2, m3, m4)

> filled.contour(1:96, 1:96, bkg, nlevels=100, color.palette=heat.colors)
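The same reshaping can be wrapped in a function that places each grid directly into a preallocated image matrix. This is a sketch, not the authors' code: the name grid_image is hypothetical, and it assumes, as the slide's code does, that grids are numbered row by row across a 4 × 4 layout and that spots within a grid run row by row. Feeding it the spot indices 1:9216 confirms where each spot lands.

```r
# Hypothetical helper: arrange a per-spot vector into a (4*24) x (4*24) image
grid_image <- function(v, grid_rows = 4, grid_cols = 4, side = 24) {
  stopifnot(length(v) == grid_rows * grid_cols * side^2)
  img <- matrix(NA_real_, grid_rows * side, grid_cols * side)
  for (g in seq_len(grid_rows * grid_cols)) {
    spots <- ((g - 1) * side^2 + 1):(g * side^2)   # spot indices of grid g
    gr <- (g - 1) %/% grid_cols                    # 0-based grid row
    gc <- (g - 1) %% grid_cols                     # 0-based grid column
    img[gr * side + 1:side, gc * side + 1:side] <-
      matrix(v[spots], side, side, byrow = TRUE)
  }
  img
}

# Check the layout using the spot index itself as the value
img <- grid_image(1:9216)
print(dim(img))
print(img[1:2, c(1, 24, 25)])
```

With the data attached, filled.contour(1:96, 1:96, grid_image(CH1B), nlevels = 100, color.palette = heat.colors) should give the same plot as the code above. Note that filled.contour() draws rows of the matrix along the x axis, so the image appears transposed relative to the physical array.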


Conclusion

R is flexible and powerful

• Easy to read in data.

• Enables manipulation of data.

• Extensive control over a wide range of graphics.

• Wide range of statistical functions.

• Add-on packages are available.

• Scripts can be written as text files and sent to collaborators to run in R (use source("filename") to import and execute the code).

• Can save all the work you do in a session.
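The last two points can be sketched together. The block writes a two-line script to a temporary file, runs it with source() into a separate environment, then saves the resulting objects with save() and restores them with load(); the file names and the environments env and restored are assumptions for illustration. save.image() saves the whole workspace instead of selected objects.

```r
# A tiny script, as a collaborator might receive it
script <- tempfile(fileext = ".R")
writeLines(c("x <- c(1, 3, 5, 4, 7, 8)",
             "xbar <- mean(x)"), script)

env <- new.env()
source(script, local = env)     # run the script; its objects go into env
print(env$xbar)

# Save selected objects and restore them later, e.g. in a new session
rdata <- tempfile(fileext = ".RData")
save(list = c("x", "xbar"), envir = env, file = rdata)

restored <- new.env()
load(rdata, envir = restored)
print(ls(restored))
```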


Acknowledgements

Terry Speed

Melanie Bahlo

Asa Wirapati

George Rudy

Jean Yee Hwa Yang

Chuang Fong Kong

Keith Slattery