Gene Expression Clustering. The Main Goal Gain insight into the gene’s function. Using: Sequence...


Gene Expression Clustering


The Main Goal

Gain insight into the gene’s function.

Using: Sequence. Transcription levels.


Microarray Technology


Microarray Technology

Microarray - standard laboratory technique. Information about gene expression. Tens of thousands of data points. Analyze by computational methods.


Gene Clustering

To cluster genes means to group together genes with similarity in their expression patterns.


Why do we need to cluster genes?

Unknown gene function. Common regulatory elements. Pathways and biological processes. Defining new disease subclasses. Predict categorization of new samples. Data reduction and visualization.


Gene Clustering

Clustering methods can be divided into two major groups:

Supervised clustering – classify according to previous knowledge (group prediction).

Unsupervised clustering – no previous knowledge is used (pattern discovery).


Unsupervised clustering

In many cases we have little a priori knowledge about genes.

There are many different methods of unsupervised clustering.

We will present Hierarchical clustering.


The Method


Hierarchical clustering

All data instances start in their own clusters. Two most closely related clusters are merged. Repeated until a single cluster remains.

Arranges the data into a tree structure. Can be broken into the desired number of clusters.


Hierarchical clustering The raw data

Gene     Chip1      Chip2      …   Chip20
1        x1,1       x1,2       …   x1,20
2        x2,1       x2,2       …   x2,20
3        x3,1       x3,2       …   x3,20
…        …          …          …   …
12,000   x12000,1   x12000,2   …   x12000,20


Hierarchical clustering Normalized data


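Hierarchical clustering Calculate the Distance Matrix

Euclidean distance formula:

$$x = (x_1, x_2, \ldots, x_n), \qquad y = (y_1, y_2, \ldots, y_n)$$

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

Correlation coefficient:

$$d(x, y) = \frac{\frac{1}{N} \sum_{i=1}^{N} (X_i - E(X))(Y_i - E(Y))}{\sqrt{V(X)\,V(Y)}}$$

$$E(X) = \frac{1}{N} \sum_{i=1}^{N} X_i, \qquad V(X) = \frac{1}{N} \sum_{i=1}^{N} (X_i - E(X))^2$$

[Figure: points A, B, C]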
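The two measures above could be computed as in the following sketch. It is an illustration only: the function names are invented here, and the correlation assumes the population mean and variance (division by N) with the square root of V(X)V(Y) in the denominator, which is the usual Pearson form. The sample profiles are arbitrary input.

```python
import math


def euclidean_distance(x, y):
    """Square root of the summed squared differences between two profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def correlation(x, y):
    """Pearson correlation using the population mean and variance (divide by N)."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("profiles must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x) / n
    var_y = sum((yi - mean_y) ** 2 for yi in y) / n
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    return cov / math.sqrt(var_x * var_y)


# Two short sample profiles (two chips each), used only as example input.
distance_example = euclidean_distance([-2.0, 1.0], [-1.5, -0.5])

# Profiles with the same shape but a different scale: correlation is 1.
same_shape = correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
opposite_shape = correlation([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Note that the correlation coefficient is a similarity rather than a distance: profiles that differ only in scale have correlation 1 but a nonzero Euclidean distance.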


Hierarchical clustering Calculate the Distance Matrix

Average linkage - midpoint. Single linkage – smallest distance. Complete linkage - largest distance.
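A minimal sketch of the three linkage rules, assuming that "midpoint" for average linkage means the mean of all pairwise distances between members of the two clusters. The function name cluster_distance and the sample distances are illustrative.

```python
def cluster_distance(cluster_a, cluster_b, dist, linkage="average"):
    """Distance between two clusters under a given linkage rule.

    dist maps frozenset({gene1, gene2}) to the distance between two genes.
    """
    pairwise = [dist[frozenset((a, b))] for a in cluster_a for b in cluster_b]
    if linkage == "single":      # smallest distance between any two members
        return min(pairwise)
    if linkage == "complete":    # largest distance between any two members
        return max(pairwise)
    if linkage == "average":     # mean of all member-to-member distances
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage: {linkage}")


# Illustrative pairwise distances between three genes.
dist = {
    frozenset(("A", "B")): 1.58,
    frozenset(("A", "C")): 3.09,
    frozenset(("B", "C")): 2.61,
}

single = cluster_distance(("A", "B"), ("C",), dist, "single")
complete = cluster_distance(("A", "B"), ("C",), dist, "complete")
average = cluster_distance(("A", "B"), ("C",), dist, "average")
```

Single linkage tends to chain clusters together through their closest members, while complete linkage favors compact clusters; average linkage sits between the two.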


Hierarchical clustering Calculate the Distance Matrix

Gene   Chip1   Chip2
A      -2.0    1.0
B      -1.5    -0.5
C      1.0     0.25

       A      B      C
A      0.00   1.58   3.09
B      1.58   0.00   2.61
C      3.09   2.61   0.00


Hierarchical clustering Average Linkage Algorithm

       A      B      C      D
A      0.00   1.58   3.09   4.74
B      1.58   0.00   2.61   5.00
C      3.09   2.61   0.00   2.70
D      4.74   5.00   2.70   0.00

[Figure labels: A, D, B, C]


Hierarchical clustering Average Linkage Algorithm

       AB     C      D
AB     0.00   2.85   4.87
C      2.85   0.00   2.70
D      4.87   2.70   0.00

[Figure labels: A, B, D, C]


Hierarchical clustering Average Linkage Algorithm

       AB     CD
AB     0.00   3.86
CD     3.86   0.00

[Figure labels: A, B, C, D]
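The merging steps on the preceding slides can be reproduced with a short sketch. It assumes average linkage is computed as the mean of all pairwise gene-to-gene distances between two clusters (the UPGMA convention); the function name and data layout are choices made for this example, not part of the original presentation.

```python
from itertools import combinations


def average_linkage(labels, dist):
    """Agglomerative clustering; returns the merge history.

    Each entry is (cluster_a, cluster_b, distance), with clusters as tuples
    of gene labels. dist maps frozenset({g1, g2}) to a pairwise distance.
    """
    def between(c1, c2):
        pairs = [dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    clusters = [(label,) for label in labels]  # every gene starts alone
    merges = []
    while len(clusters) > 1:
        # Find the two closest clusters and merge them.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: between(*pair))
        merges.append((c1, c2, round(between(c1, c2), 2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Pairwise distances from the four-gene distance matrix.
values = {
    ("A", "B"): 1.58, ("A", "C"): 3.09, ("A", "D"): 4.74,
    ("B", "C"): 2.61, ("B", "D"): 5.00, ("C", "D"): 2.70,
}
dist = {frozenset(pair): d for pair, d in values.items()}

merges = average_linkage(["A", "B", "C", "D"], dist)
for left, right, d in merges:
    print("+".join(left), "joins", "+".join(right), "at distance", d)
```

The merge history is exactly the information a dendrogram draws: each merge is a branch point, and its distance sets the branch height.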


Hierarchical clustering Dendrogram

[Figure: dendrogram with leaves A, B, C, D]


Hierarchical clustering heat maps

Red corresponds to high expression levels.

Green corresponds to low expression levels.

Black corresponds to intermediate expression levels.
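One way such a color scale might be implemented is sketched below. It assumes expression values are centered on zero (for example log ratios against a reference), so that positive values shade toward red, negative values toward green, and zero is black; the saturation cutoff of 3.0 is an arbitrary illustrative choice.

```python
def expression_to_rgb(value, saturation=3.0):
    """Map a centered expression value to an (r, g, b) tuple.

    Values at or beyond +saturation are pure red, values at or beyond
    -saturation are pure green, and 0 is black. `saturation` is a
    hypothetical cutoff chosen for this example.
    """
    if saturation <= 0:
        raise ValueError("saturation must be positive")
    intensity = min(abs(value) / saturation, 1.0)
    level = round(255 * intensity)
    if value > 0:
        return (level, 0, 0)
    if value < 0:
        return (0, level, 0)
    return (0, 0, 0)


high = expression_to_rgb(3.0)
low = expression_to_rgb(-10.0)
middle = expression_to_rgb(0.0)
```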


Hierarchical clustering Experiment Control

Random 1 – randomized by rows.

Random 2 – randomized by columns.

Random 3 – randomized by both rows and columns.
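A sketch of how the three randomized controls might be generated. It assumes "randomized by rows" means that the values within each row are shuffled independently, and likewise for columns; the slide does not spell this out. Clustering the shuffled tables with the same procedure then shows how much structure appears by chance.

```python
import random


def shuffle_within_rows(matrix, rng):
    """Random 1 (assumed meaning): shuffle the values inside each row."""
    return [rng.sample(row, len(row)) for row in matrix]


def shuffle_within_columns(matrix, rng):
    """Random 2 (assumed meaning): shuffle the values inside each column."""
    columns = [list(col) for col in zip(*matrix)]
    shuffled = [rng.sample(col, len(col)) for col in columns]
    return [list(row) for row in zip(*shuffled)]


def shuffle_rows_and_columns(matrix, rng):
    """Random 3 (assumed meaning): shuffle within rows, then within columns."""
    return shuffle_within_columns(shuffle_within_rows(matrix, rng), rng)


rng = random.Random(0)  # fixed seed so the example is repeatable
data = [
    [-2.0, 1.0, 0.5],
    [-1.5, -0.5, 2.0],
    [1.0, 0.25, -1.0],
]
random_1 = shuffle_within_rows(data, rng)
random_2 = shuffle_within_columns(data, rng)
random_3 = shuffle_rows_and_columns(data, rng)
```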


Examples


Example I

We present here an experiment by Spellman et al., published in Mol. Biol. Cell 9, 3273-3297 (1998).

Goals of the experiment: Identify all cell cycle regulated genes in Yeast. Show clustering at work.


Example I Cell Cycle


Example I Methods

DNA microarrays contained all the yeast genome.

Measure levels of mRNA as a function of time.


Example I Methods

Synchronization: α factor. Elutriation – size based. cdc15 – temperature-sensitive mutation.

Factors: Cln3p, Clb2p deletion. Induced with these factors.

Data from a previously published study (Cho et al. 1998)

Control sample: asynchronous cultures.


Example I Methods

Measurements analyzed based on:

Fourier algorithm - assesses periodicity.

Correlation measurement - compared with previously identified cell cycle regulated genes.
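As a rough illustration of how a Fourier-style periodicity score can work, the sketch below projects a time series onto a sine and a cosine of a chosen period and reports the combined magnitude. This is a simplified stand-in, not the exact scoring used by Spellman et al.; the function name, the 60-minute period, and the sampling times are assumptions.

```python
import math


def periodicity_score(values, times, period):
    """Magnitude of the Fourier component of `values` at the given period."""
    omega = 2 * math.pi / period
    a = sum(v * math.sin(omega * t) for v, t in zip(values, times))
    b = sum(v * math.cos(omega * t) for v, t in zip(values, times))
    return math.sqrt(a * a + b * b)


# Hypothetical sampling: every 10 minutes over two 60-minute cycles.
times = list(range(0, 120, 10))
periodic_gene = [math.sin(2 * math.pi * t / 60) for t in times]
flat_gene = [1.0] * len(times)

periodic_score = periodicity_score(periodic_gene, times, period=60)
flat_score = periodicity_score(flat_gene, times, period=60)
```

A gene that oscillates with the assumed period scores high, while a constant profile scores close to zero, so a threshold on the score separates the two.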


Example I Methods

Calculate a score for each gene - "CDC score".

Threshold CDC value.

91% of the genes previously shown to be cell cycle regulated are included.

About 800 genes were identified as cell cycle regulated.


Example I Phasing

By time of peak expression:
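Ordering genes by the time of peak expression can be sketched as follows, assuming each gene has one expression value per time point. The gene names and values are made up for illustration.

```python
def order_by_peak_time(profiles, times):
    """Return gene names sorted by the time at which expression is highest."""
    def peak_time(gene):
        values = profiles[gene]
        peak_index = max(range(len(values)), key=values.__getitem__)
        return times[peak_index]

    return sorted(profiles, key=peak_time)


# Hypothetical expression profiles sampled at four time points (minutes).
times = [0, 10, 20, 30]
profiles = {
    "gene1": [0.0, 1.0, 5.0, 1.0],   # peaks at 20 min
    "gene2": [4.0, 1.0, 0.0, 0.0],   # peaks at 0 min
    "gene3": [0.0, 0.0, 1.0, 6.0],   # peaks at 30 min
}
ordered = order_by_peak_time(profiles, times)
```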


Example I Clustering

By similarity of expression across the measurements:


Example I Clustering

Hierarchical clustering. Identified 9 clusters.

Genes in each cluster share: Common upstream elements. Regulation by similar transcription factors. Common function (only for known genes). Cln3p and Clb2p have the same effect on the genes in a cluster.


Example I Clustering

Histone cluster: A very tight cluster. Repeated SCB motif in promoter. Induced by Cln3. Unaffected by Clb2. Peak during S phase.


Example I Results

Genes with known functionality:

Cell cycle regulated functions The MET cluster. Genes involved in secretion and lipid synthesis.

Known genes discovered as cell cycle regulated.


Example I Results

New binding sites for regulators.

The CLB cluster is highly regulated. Aligning the genes in the cluster. New consensus for MCM1+SFF binding site.


Example I Results

MCM1: T-T-A-C-C-N-A-A-T-T-N-G-G-T-A-A

SFF: G-T-M-A-A-C-A-A

New motif: T-T-W-C-C-Y-A-A-W-N-N-G-G-W-A-A-W-W-N-R-T-A-A-A-Y-A-A
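Consensus motifs like these use IUPAC nucleotide codes (W = A or T, Y = C or T, R = A or G, M = A or C, N = any base). Below is a small sketch of how such a consensus could be turned into a regular expression for scanning promoter sequences; the helper name and the test sequences are invented for this example.

```python
import re

# IUPAC nucleotide codes that appear in the motifs above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "W": "[AT]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Compile a dash-separated IUPAC consensus into a regular expression."""
    symbols = consensus.replace("-", "")
    return re.compile("".join(IUPAC[symbol] for symbol in symbols))


mcm1 = consensus_to_regex("T-T-A-C-C-N-A-A-T-T-N-G-G-T-A-A")
sff = consensus_to_regex("G-T-M-A-A-C-A-A")
new_motif = consensus_to_regex(
    "T-T-W-C-C-Y-A-A-W-N-N-G-G-W-A-A-W-W-N-R-T-A-A-A-Y-A-A"
)

# Made-up sequences used only to exercise the patterns.
mcm1_hit = mcm1.search("GGTTACCGAATTAGGTAACC")
sff_hit = sff.search("CCGTCAACAAGG")
new_hit = new_motif.fullmatch("TTACCTAAAGGGGAAAATGGTAAACAA")
```

Scanning the upstream region of every gene in a cluster with such a pattern is one way to check how many members carry the site.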


Example II

Gasch AP et al. Genomic expression programs in the response of yeast cells to environmental changes. Mol Biol Cell. 2000; 11(12): 4241-57.

Main Goal: Characterize the yeast response to environmental changes, and particularly to stress conditions.


Example II Methods

Yeast cells responding to diverse environmental stresses.

Microarray contained all yeast genes. Results were organized by hierarchical clustering.


Example II General features of the stress response

Massive and rapid changes. Transient changes.

Correlated with the magnitude of the shift: Duration. Amplitude. Steady-state difference.


Example II General features of the stress response

Some genes responded in a stereotypical manner.

Some genes had a unique response. No two expression programs were identical.


Example II The Environmental Stress Response (ESR)

About 900 genes responded in a stereotypical manner.

ESR – Environmental Stress Response.

Two large clusters of genes: repressed genes (~600) and induced genes (~300).

Showed reciprocal response.


Example II The Environmental Stress Response (ESR)

Response to different shifts in: Temperature. Osmolarity.


Example II The Environmental Stress Response (ESR)

[Figure panels: Osmolarity; Heat shock]

The ESR is not a response to all environmental changes.


Example II The Environmental Stress Response (ESR)

Shift between two equally stressful environments: 29°C and hyper-osmotic medium. 33°C with normal osmolarity.

The response was the sum of the individual responses.

Independent response to each of the changes.


Example II The Environmental Stress Response (ESR)

Previously known: STRE promoter. Recognized by Msn2p and Msn4p.

One all-purpose regulatory system?


Example II The Environmental Stress Response (ESR)

TRX2 cluster genes: Dependent on Msn2/Msn4p in response to heat shock. Unaffected by Msn2/Msn4p in response to H2O2.

Contained binding site for Yap1p.

Yap1p deletion strain.


Example II The Environmental Stress Response (ESR)

Revealed that TRX2 cluster genes: Induced by Yap1p in response to H2O2 treatment. Unaffected by the deletion in response to heat shock.

ESR regulated by different transcription factors.

Regulation is condition-specific and gene-specific.


Example II Specific Response

Response to stress: Stereotypic response (ESR). Specific response.

Characterizes the cell’s response to a specific stress.

Example: Heat-shock response. ESR initiated fast (minutes). Induction of chaperones. Alternative carbon source utilization.


Conclusions


Hierarchical clustering Conclusion

Difficulty: Post-transcriptional regulation.

Solution: Use the method in cases where the main regulation is at the transcription level (example – yeast cell cycle).


Hierarchical clustering Conclusion

Difficulty: No statistical foundation for the decision of where to cut the dendrogram.

Solution: Split the tree in a way that produces homogeneous clusters of genes. Such a split is considered evidence that the grouping is correct.
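One simple way to quantify homogeneity, offered here as an assumption since the slide does not define a measure, is the mean pairwise distance among genes that share a cluster: lower values mean tighter clusters. The sketch compares two candidate splits using illustrative distances between four genes.

```python
from itertools import combinations


def mean_within_cluster_distance(clusters, dist):
    """Average pairwise distance among genes that share a cluster.

    Lower values indicate more homogeneous clusters. This measure is an
    illustrative choice, not one prescribed by the presentation.
    """
    pairs = [
        dist[frozenset(pair)]
        for cluster in clusters
        for pair in combinations(cluster, 2)
    ]
    if not pairs:  # every cluster holds a single gene
        return 0.0
    return sum(pairs) / len(pairs)


# Illustrative pairwise distances between four genes.
values = {
    ("A", "B"): 1.58, ("A", "C"): 3.09, ("A", "D"): 4.74,
    ("B", "C"): 2.61, ("B", "D"): 5.00, ("C", "D"): 2.70,
}
dist = {frozenset(pair): d for pair, d in values.items()}

tight_split = mean_within_cluster_distance([("A", "B"), ("C", "D")], dist)
loose_split = mean_within_cluster_distance([("A", "C"), ("B", "D")], dist)
```

Comparing the score across several cut heights gives a rough, non-statistical guide to where the tree separates into coherent groups.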


Hierarchical clustering Conclusion

Difficulty: The algorithm will produce clusters in any case.

Solution: Introduce a small amount of random noise into the data, re-cluster the data, and compare the results with the original clustering. If the results are not the same, the clustering may not represent true biological meaning.


Hierarchical clustering Conclusion

Discover gene’s function. Status of cellular processes. Information on regulatory mechanisms. General cell behaviors. Assign genes to pathways. Unknown biological pathways.


References

Eisen M. B., Spellman P. T., Brown P. O., Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA, 1998, 95: 14863-14868.

Spellman P. T. et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell, 1998, 9: 3273-3297.

Gasch A. P., Spellman P. T., Kao C. M., Carmel-Harel O., Eisen M. B., Storz G., Botstein D., Brown P. O. Genomic expression programs in the response of yeast cells to environmental changes. Mol. Biol. Cell, 2000, 11(12): 4241-57.

Shannon W., Culverhouse R., Duncan J. Analyzing microarray data using cluster analysis. Pharmacogenomics, 2003, 4(1): 41-51. Review.

Kaminski N., Friedman N. Practical approaches to analyzing results of microarray experiments. American Journal of Respiratory Cell and Molecular Biology, 2002, 27: 125-132. Review.