Statistical Techniques for Temporal Microarray Data Analysis
Ritesh Krishna
Department of Computer Science
WPCCS July 1, 2008
Why should you listen to my talk?
• Systems Biology is everybody’s playground in this room: image processing, algorithms, parallel processing, etc.
Importance of Systems Biology in today’s context
• Agriculture
• Energy sources (biofuels)
• Gene therapy
• Waste clean-up
Use of Computational Techniques
• Massive data generated by molecular biology experiments
• Need to analyse output files produced in various formats, facilitate storage of bulk data, enable quick and precise retrieval, and, most importantly, understand the behaviour and patterns in the data
How are these experiments performed?
• Major revolution in the world of molecular biology
• No limitation of one gene in one experiment
• Possible to monitor expression levels of thousands of genes simultaneously
An example - Arabidopsis thaliana
• Popular in plant biology as a model plant
• One of the smallest plant genomes
• First plant genome to be sequenced
Present Study
• The present study is about understanding the leaf senescence process in Arabidopsis.
• Senescence refers to the biological processes of a living organism approaching an advanced age; in plants it is brought on by age and stress.
• It is a programmed event that responds to a wide range of external and internal signals and is tightly regulated by different genes and proteins.
Experimental Design
Issues with data
• Biological variations vs. technical variations
• Technical variations: sample bias, dye bias, slide bias, variations in experimental conditions, scanning and imaging errors, human errors
• Massive dataset with ~31,000 genes
• Goal is to understand the functioning of certain sets of genes (needle in the haystack)
Step one – Clean the raw data using Normalization
• To assess different sources of technical biases
• To remove the correlations between replicates to make them independent from each other
• Fitting a multivariate error model: Normal distribution with mean zero and constant variance for the residuals associated with genes
• Propose statistical tests for evaluating the effects of normalization
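To make the normalization idea concrete, here is a minimal sketch and not the procedure used in this study. It assumes two-colour slides (suggested by the dye bias mentioned earlier), uses made-up intensities, and swaps in simple per-slide median centring for the multivariate error model. The function names are hypothetical.

```python
import math
from statistics import mean, median

def log_ratios(red, green):
    """log2(red / green) per gene for one two-colour slide."""
    return [math.log2(r / g) for r, g in zip(red, green)]

def median_centre(values):
    """Subtract the slide median so that each slide is centred on zero."""
    m = median(values)
    return [v - m for v in values]

def residuals(slides):
    """Per-gene residuals: replicate value minus the gene's mean across slides."""
    n_genes = len(slides[0])
    gene_means = [mean(s[g] for s in slides) for g in range(n_genes)]
    return [[s[g] - gene_means[g] for g in range(n_genes)] for s in slides]

# Hypothetical intensities for 4 genes on 2 replicate slides.
# Slide B has a global 2x bias in the red channel (a slide/dye effect).
green = [100, 400, 400, 400]
slide_a = median_centre(log_ratios([200, 400, 800, 1600], green))
slide_b = median_centre(log_ratios([400, 800, 1600, 3200], green))

# After centring, the replicates agree and the residuals vanish.
res = residuals([slide_a, slide_b])
print(slide_a, slide_b, res)
```

With real data the residuals would not be exactly zero; the point of the error model is that they should look like draws from a zero-mean, constant-variance Normal distribution once technical bias is removed.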
Step two - Clustering
• Reduce the data dimension
• Similar genes sit in the same cluster.
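As an illustration of "similar genes sit in the same cluster", the sketch below groups expression profiles by Pearson correlation. It is a simple greedy threshold ("leader") clustering chosen for brevity, not necessarily the clustering method used in this study. The gene names, profiles and the 0.9 threshold are invented for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cluster_by_correlation(profiles, threshold=0.9):
    """Greedy leader clustering: a gene joins the first cluster whose leader
    (first member) it correlates with at or above `threshold`; otherwise it
    starts a new cluster."""
    clusters = []
    for name, profile in profiles.items():
        for members in clusters:
            if pearson(profile, profiles[members[0]]) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters

# Hypothetical time-course profiles: two rising genes, two falling genes.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 4.0, 6.2, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.1, 4.0, 2.2],
}
clusters = cluster_by_correlation(profiles)
print(clusters)
```

Each cluster can then be represented by a single profile, which is how clustering reduces ~31,000 genes to a manageable number of groups.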
Step three – Causal Network inference
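[Figure: intensity (int.) against time (day) for ELF4, LFY, CCA1 and TOC1; magnitude (mag.) against frequency (fre.) for CCA1 + LHY and ELF4 + TOC1; network among ELF4, TOC1, CCA1 and LFY]
[Figure: network with nodes ERS2, ETR2, ETR1, CTR1, ERF1, EIN3, EIL3, EIL4, EIL5, EIN2, EIL1, EIL2, EIN4, PDF1.2, EIN6 and ERS1]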
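The slides do not say which inference method was used, so the following is only a toy sketch of the general idea: propose a directed edge from one gene to another when the first gene's past values correlate strongly with the second gene's later values. Lagged correlation is a screening heuristic, not proof of causation. The series names, values, lag and threshold are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def lagged_edges(series, lag=1, threshold=0.9):
    """Return (source, target, r) for every ordered pair where source at
    time t correlates with target at time t + lag at or above `threshold`
    in absolute value."""
    edges = []
    for src, x in series.items():
        for dst, y in series.items():
            if src == dst:
                continue
            r = pearson(x[:-lag], y[lag:])
            if abs(r) >= threshold:
                edges.append((src, dst, round(r, 3)))
    return edges

# Hypothetical time course: "target" repeats "driver" one time point later.
series = {
    "driver": [1, 3, 2, 5, 4, 6, 3, 7],
    "target": [0, 1, 3, 2, 5, 4, 6, 3],
}
edges = lagged_edges(series)
print(edges)
```

Only the driver-to-target direction passes the threshold here, because the reverse lagged correlation is weaker.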
More information
• Affymetrix Inc. (http://www.affymetrix.com/index.affx)
• Agilent Technologies (http://www.chem.agilent.com)
• Gibson G (2003) Microarray Analysis. PLoS Biol 1(1): e15