Statistical Techniques for Temporal Microarray Data Analysis

Ritesh Krishna
Department of Computer Science

WPCCS July 1, 2008

Why should you listen to my talk?

• Systems biology is everybody’s playground in this room: image processing, algorithms, parallel processing, etc.
• Importance of systems biology in today’s context:
  • Agriculture
  • Energy sources (biofuels)
  • Gene therapy
  • Waste clean-up

Use of Computational Techniques

• Massive data generated by molecular biology experiments
• Need to analyse output files produced in various formats, store bulk data, retrieve it quickly and precisely, and, most importantly, understand the behaviour and patterns in the data

How are these experiments performed?

• Major revolution in the world of molecular biology
• No longer limited to one gene per experiment
• Possible to monitor expression levels of thousands of genes simultaneously

An example: Arabidopsis thaliana

• Popular in plant biology as a model plant
• One of the smallest plant genomes
• First plant genome to be sequenced

Present Study

• The present study is about understanding the leaf senescence process in Arabidopsis.
• Senescence refers to the biological processes of a living organism approaching an advanced age; in plants it is caused by age and stress.
• It is a programmed event that responds to a wide range of external and internal signals and is tightly regulated by different genes and proteins.

Experimental Design

Issues with data

• Biological variations vs. technical variations
• Technical variations: sample bias, dye bias, slide bias, variations in experimental conditions, scanning and imaging errors, human errors
• Massive dataset with ~31,000 genes
• Goal is to understand the functioning of certain sets of genes (needle in the haystack)

Step one – Clean the raw data using Normalization

• To assess different sources of technical bias
• To remove the correlations between replicates to make them independent of each other
• Fitting a multivariate error model: normal distribution with mean zero and constant variance for the residuals associated with genes
• Propose statistical tests for evaluating the effects of normalization
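
The cleaning step can be pictured with a minimal sketch. It assumes that each array (slide or replicate) adds a constant offset on the log2 scale, which is a simplification chosen for illustration and not the multivariate error model used in the study. The function name `normalize_arrays` and the synthetic numbers are hypothetical.

```python
import numpy as np

def normalize_arrays(intensity):
    """Median-centre log2 intensities per array (column).

    intensity: 2-D array, rows = genes, columns = arrays (replicates).
    Returns the normalised log2 matrix and the per-gene residuals.
    """
    log_int = np.log2(np.asarray(intensity, dtype=float))
    # Assumed technical bias: each array's median log-intensity,
    # measured relative to the median over all arrays.
    array_offset = np.median(log_int, axis=0) - np.median(log_int)
    normalised = log_int - array_offset
    # Residuals: deviation of each replicate from its gene's mean.
    # After a good normalisation these should scatter around zero.
    residuals = normalised - normalised.mean(axis=1, keepdims=True)
    return normalised, residuals

# Synthetic data: 4 genes on 3 replicate arrays whose only difference
# is a multiplicative gain (1x, 2x, 4x), i.e. a pure array bias.
true_signal = np.array([100.0, 200.0, 400.0, 800.0])
array_gain = np.array([1.0, 2.0, 4.0])
raw = true_signal[:, None] * array_gain[None, :]

normalised, residuals = normalize_arrays(raw)
print(np.round(np.abs(residuals).max(), 6))
```

On real data the residuals would not vanish; their distribution is what a test of the zero-mean, constant-variance assumption would examine.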

Step two – Clustering

• Reduce the dimensionality of the data
• Similar genes sit in the same cluster.
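
As a rough illustration of how similar genes end up in the same cluster, the sketch below standardises each gene's time profile and groups the profiles with a small k-means routine. The talk does not say which clustering algorithm was used, so k-means, the helper names `zscore_rows` and `kmeans`, and the four toy profiles are all assumptions.

```python
import numpy as np

def zscore_rows(profiles):
    """Standardise each gene profile so clusters reflect shape, not scale.

    Assumes no profile is perfectly flat (zero standard deviation).
    """
    profiles = np.asarray(profiles, dtype=float)
    centred = profiles - profiles.mean(axis=1, keepdims=True)
    return centred / profiles.std(axis=1, keepdims=True)

def kmeans(points, k, n_iter=50):
    """Plain k-means with a deterministic farthest-point initialisation."""
    centroids = [points[0]]
    for _ in range(1, k):
        # Squared distance of every point to its nearest chosen centroid.
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centroids], axis=0)
        centroids.append(points[np.argmax(dist)])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy data: two rising and two falling expression profiles over 6 time points.
profiles = np.array([
    [1, 2, 3, 4, 5, 6],
    [10, 21, 30, 41, 50, 60],
    [6, 5, 4, 3, 2, 1],
    [58, 50, 41, 30, 20, 10],
])
labels, centroids = kmeans(zscore_rows(profiles), k=2)
print(labels)
```

Each cluster centroid can then stand in for its member genes, which is one way the data dimension gets reduced before network inference.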

Step three – Causal Network inference

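[Figure: time-series plot with axes labelled "time (day)" and "int." for ELF4, LFY, CCA1 and TOC1; a second plot with axis labelled "fre. mag." and series "CCA1 + LHY" and "ELF4 + TOC1"; a network diagram with nodes ELF4, TOC1, CCA1 and LFY.]

[Figure: network diagram with nodes ERS1, ERS2, ETR1, ETR2, EIN2, EIN3, EIN4, EIN6, CTR1, ERF1, EIL1, EIL2, EIL3, EIL4, EIL5 and PDF1.2.]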
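
One common way to infer a directed network from time-series expression data is to fit a first-order linear autoregressive model, x(t+1) ≈ A x(t), and read a large entry A[i, j] as a candidate influence of gene j on gene i. The sketch below assumes that model purely for illustration; the talk does not state which inference method was used, and the gene names `geneA`, `geneB`, `geneC`, the helper names and the threshold are hypothetical.

```python
import numpy as np

def fit_var1(series):
    """Least-squares fit of x[t+1] ≈ A @ x[t].

    series: array of shape (T, n_genes), one row per time point.
    """
    series = np.asarray(series, dtype=float)
    past, future = series[:-1], series[1:]
    # Solve past @ B ≈ future; since rows are x[t]^T, A is B transposed.
    coeffs, *_ = np.linalg.lstsq(past, future, rcond=None)
    return coeffs.T

def edges(A, names, threshold=0.1):
    """List (source, target, weight) for off-diagonal entries above threshold."""
    n = len(names)
    return [
        (names[j], names[i], A[i, j])
        for i in range(n) for j in range(n)
        if i != j and abs(A[i, j]) > threshold
    ]

# Synthetic, noise-free system: geneA activates geneB, geneB represses geneC.
A_true = np.array([
    [0.9, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, -0.4, 0.8],
])
x = np.array([1.0, 0.5, 2.0])
trajectory = [x]
for _ in range(9):
    x = A_true @ x
    trajectory.append(x)
series = np.array(trajectory)

A_hat = fit_var1(series)
edge_list = edges(A_hat, ["geneA", "geneB", "geneC"])
for source, target, weight in edge_list:
    print(f"{source} -> {target} ({weight:+.2f})")
```

With noisy measurements and far more genes than time points, a plain least-squares fit like this one is underdetermined, which is one reason to cluster the genes first.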

More information

• Affymetrix Inc. (http://www.affymetrix.com/index.affx)
• Agilent Technologies (http://www.chem.agilent.com)
• Gibson G (2003) Microarray Analysis. PLoS Biol 1(1): e15
