Post on 28-Dec-2015
Analyzing transcription modules in the pathogenic yeast Candida albicans
Elik Chapnik, Yoav Amiram
Supervisor: Dr. Naama Barkai
Background (1) – C. albicans
• Opportunistic fungal pathogen
• Genome was recently sequenced
• Lack of sufficient annotation of genes
• Distant cousin: S. cerevisiae (SC)
  – SC is the yeast model organism
  – SC is used as a model to study C. albicans (CA)
  – Comparative genomics: what are the tools?
[Figure: expression matrix of genes × conditions]
• BLAST
• DNA microarrays
  – monitor thousands of genes simultaneously
  – co-expression patterns can provide functional links
• Cluster analysis, SVD
  – limited size of data sets
  – mutually exclusive clusters
  – expression analyzed under all conditions
Background (2) – Tools
• “Transcription Modules” (TMs):
  – a self-consistent regulatory unit
  – co-regulated genes and their regulating conditions
• Signature Algorithm
  – global decomposition into TMs
  – robust, fast
  – integration of external data
  – if no a priori information exists, can be applied iteratively (ISA)
Background (2) – Tools
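A single step of the signature algorithm can be sketched roughly as below. This is a simplified illustration, not the group's implementation: the function name `signature_step`, the thresholds `t_c` and `t_g`, and the toy expression values are assumptions, and the normalization of the expression matrix used in the published method is left out.

```python
from statistics import mean, pstdev

def signature_step(expr, input_genes, t_c=1.0, t_g=1.0):
    """One simplified signature-algorithm step (illustrative sketch only).

    expr        : dict mapping gene -> list of expression values per condition
    input_genes : genes assumed to be co-regulated (e.g. a homologue module)
    Returns (selected condition indices, refined gene set).
    """
    n_cond = len(next(iter(expr.values())))
    # Condition score: mean expression of the input genes in each condition.
    c_score = [mean(expr[g][c] for g in input_genes) for c in range(n_cond)]
    c_mu, c_sd = mean(c_score), pstdev(c_score)
    # Keep conditions whose score deviates strongly, in either direction.
    conds = [c for c in range(n_cond) if abs(c_score[c] - c_mu) > t_c * c_sd]
    if not conds:
        return [], set()
    # Gene score: expression weighted by condition score, over kept conditions.
    g_score = {g: mean(row[c] * c_score[c] for c in conds)
               for g, row in expr.items()}
    scores = list(g_score.values())
    g_mu, g_sd = mean(scores), pstdev(scores)
    genes = {g for g, s in g_score.items() if s - g_mu > t_g * g_sd}
    return conds, genes

# Toy data (assumed): genes a, b, c are induced in conditions 0 and 1.
expr = {
    "a": [2.0, 1.8, 0.1, 0.0, -0.1, 0.0],
    "b": [1.9, 2.1, 0.0, 0.1, 0.0, -0.1],
    "c": [2.1, 2.0, -0.1, 0.0, 0.1, 0.0],
    "w": [0.1, -0.1, 0.0, 0.1, 0.0, -0.1],
    "x": [0.0, 0.1, 0.1, -0.1, 0.0, 0.0],
    "y": [-0.1, 0.0, 0.0, 0.0, 0.1, 0.1],
    "z": [0.0, 0.0, -0.1, 0.1, 0.0, 0.1],
}
# The input set wrongly contains x and misses c; the step corrects both.
conds, genes = signature_step(expr, ["a", "b", "x"])
print(conds, sorted(genes))
```

In this toy case the imperfect input set is refined: the unrelated gene is dropped and the missing co-regulated gene is picked up. Applying the step repeatedly to its own output is the idea behind the iterative variant (ISA).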
• Expression levels of SC have been measured for over 1000 conditions
• A growing number of CA microarray experiments
• Both genomes are fully sequenced
What can be done with all this?
1. Large-scale expression analysis of CA (Dr. Barkai’s group and Prof. Judith Berman)
2. Use the homology between SC and CA
  – focus on selected annotated SC transcription modules
  – use the information from SC TMs to study CA
Better understanding of CA via SC data
Measures:
1. Computing pair-wise correlations between genes in TMs (Pearson correlation coefficient)
Annotating C. albicans ORFs with unknown functions
Main goal of the project (1)
Measures (cont.):
2. Search for cis-regulatory elements (CREs) in the upstream region of genes
  – find over-represented sequences in the upstream regions of genes in the SC modules, using computational DNA pattern recognition methods
  – search for previously identified cis-regulatory elements in the CA homologue modules
Main goal of the project (2)
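The idea of an over-represented upstream sequence can be illustrated with a toy k-mer count. This is not how MEME or CONSENSUS work internally; it is a much cruder stand-in that compares how often each k-mer occurs in module upstream regions versus background regions. The function names, the pseudocount and the example sequences are all assumptions, and only the forward strand is scanned.

```python
from collections import Counter

def kmer_presence(seqs, k):
    """For each k-mer, count the number of sequences that contain it."""
    counts = Counter()
    for s in seqs:
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts

def enriched_kmers(module_seqs, background_seqs, k=6, pseudo=1.0):
    """Rank k-mers by (fraction of module seqs) / (fraction of background seqs)."""
    mod = kmer_presence(module_seqs, k)
    bg = kmer_presence(background_seqs, k)
    n_mod, n_bg = len(module_seqs), len(background_seqs)
    scores = {}
    for kmer, c in mod.items():
        f_mod = c / n_mod
        # The pseudocount avoids division by zero for k-mers absent
        # from the background.
        f_bg = (bg[kmer] + pseudo) / (n_bg + pseudo)
        scores[kmer] = f_mod / f_bg
    # Highest enrichment first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy upstream regions (made up): every module sequence carries TGACTC.
module_seqs = ["AATGACTCAA", "CCTGACTCGG", "GTTGACTCTA"]
background_seqs = ["AACCGGTTAA", "CCGGAATTCC", "GGTTAACCGG", "TTAACCGGTT"]
top = enriched_kmers(module_seqs, background_seqs)
print(top[0])
```

A real analysis would use longer upstream regions, both strands and a proper statistical model, which is what the dedicated tools listed under "Tools and methods" provide.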
• Programming software: MATLAB 6.5
• Cluster analysis tools: GeneHopping
• Sequence data: Stanford Genome Technology Center
• Expression data: C. albicans expression data was provided by Prof. Berman’s lab
• Software for CRE prediction: MEME, TESS, EPD, CONSENSUS
Tools and methods
Generating modules
Yeast Module → BLAST → Candida Homologue Module → signature algorithm → Candida Refined Module
And the modules are: [Figure: the resulting modules]
Identifying co-regulation
For each of the Yeast Module, the Candida Homologue Module and the Candida Refined Module:
• Find all pair-wise correlations between the module genes using the Pearson correlation coefficient
• Apply statistical significance tests: generate random modules to compute Z-scores
• Compare the resulting average correlation and Z-score between the three modules
1. Generate random modules by reshuffling genes in the whole-genome database
2. Compute average correlations for the random and “real” modules
3. Calculate the mean and standard deviation from the set of random modules
4. Calculate Z-scores of the “real” modules
5. A high Z-score (>2) represents a statistically significant correlated module
Statistical analysis
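The procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the MATLAB code used in the project: the function names, the number of random modules (`n_random`), the seed and the synthetic expression data are all made up for the example.

```python
import random
from itertools import combinations
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0

def average_correlation(expr, genes):
    """Mean Pearson correlation over all gene pairs in a module."""
    return mean(pearson(expr[g], expr[h]) for g, h in combinations(genes, 2))

def module_z_score(expr, module, n_random=200, seed=0):
    """Return (average correlation, Z-score) of a module against random modules."""
    rng = random.Random(seed)
    all_genes = sorted(expr)
    real = average_correlation(expr, module)
    # Steps 1-3: random modules of the same size drawn from all genes.
    rand = [average_correlation(expr, rng.sample(all_genes, len(module)))
            for _ in range(n_random)]
    # Step 4: distance of the real module from the random mean, in SD units.
    return real, (real - mean(rand)) / pstdev(rand)

# Synthetic data (assumed): 5 module genes share a profile, 40 genes are noise.
rng = random.Random(1)
profile = [(-1) ** c * (1 + c % 3) for c in range(20)]
expr = {f"m{i}": [p + rng.gauss(0, 0.3) for p in profile] for i in range(5)}
expr.update({f"b{i}": [rng.gauss(0, 1) for _ in range(20)] for i in range(40)})
module = [f"m{i}" for i in range(5)]
real, z = module_z_score(expr, module)
print(f"average correlation = {real:.3f}, Z-score = {z:.1f}")
```

With the threshold from step 5, a module would be called significantly correlated when the returned Z-score exceeds 2.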
Yeast Module → BLAST → Candida Homologue Module → signature algorithm → Candida Refined Module
Two slides ago…
[Figure: overlap between the Candida Homologue Module and the Candida Refined Module]
• Candida Homologue Module = Rejected genes + Overlapped genes
• Candida Refined Module = Overlapped genes + Included genes
• Find common CRE in the Yeast Module
Identification of cis-regulatory elements
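The three gene groups in the diagram can be expressed as plain set operations. This reading of the diagram is an assumption, although it agrees with the gene counts in the results chart (for amino acid biosynthesis, 77 + 21 = 98 and 13 + 21 = 34). The function name and the ORF identifiers are hypothetical.

```python
def partition_modules(homologue_module, refined_module):
    """Split genes into rejected / overlapped / included groups.

    rejected   : in the homologue module but dropped by the refinement
    overlapped : in both modules
    included   : added by the refinement, not among the homologues
    """
    homologue, refined = set(homologue_module), set(refined_module)
    return {
        "rejected": homologue - refined,
        "overlapped": homologue & refined,
        "included": refined - homologue,
    }

# Hypothetical ORF names, for illustration only.
homologue_module = ["orf1", "orf2", "orf3", "orf4"]
refined_module = ["orf3", "orf4", "orf5"]
groups = partition_modules(homologue_module, refined_module)
for name, genes in groups.items():
    print(name, sorted(genes))
```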
[Figure: the CRE found in the Yeast Module is searched for in the Rejected, Overlapped and Included genes of the Candida Homologue Module and the Candida Refined Module]
Our prediction for CRE % and Mean CRE in each module
Identification of cis-regulatory elements
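The two quantities can be computed as in the sketch below. The definitions are assumptions inferred from the reported values: "CRE %" is taken as the percentage of genes whose upstream region contains the element at least once, and "Mean CRE" as the mean number of occurrences among those genes (which would explain why it never drops below 1). The function names and sequences are made up, and only the forward strand is scanned.

```python
def count_motif(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    m = len(motif)
    return sum(1 for i in range(len(seq) - m + 1) if seq[i:i + m] == motif)

def cre_stats(upstream, motif):
    """Return (CRE %, Mean CRE) for a dict gene -> upstream sequence."""
    counts = [count_motif(seq.upper(), motif) for seq in upstream.values()]
    hits = [c for c in counts if c > 0]
    cre_pct = 100.0 * len(hits) / len(counts)
    # Mean number of copies, taken over genes that carry the element at all.
    mean_cre = sum(hits) / len(hits) if hits else 0.0
    return cre_pct, mean_cre

# Made-up upstream regions; TGACTC is the element shown for the aa modules.
upstream = {
    "g1": "AATGACTCAATGACTC",  # two copies
    "g2": "CCTGACTCGG",        # one copy
    "g3": "GGGGCCCC",          # none
    "g4": "ATATATAT",          # none
}
cre_pct, mean_cre = cre_stats(upstream, "TGACTC")
print(f"CRE % = {cre_pct:.0f}%, Mean CRE = {mean_cre:.2f}")
```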
Average Correlation 0.34816
Z-Score = 106.9
Results – co-regulation of SC aa Module
Cell values: mean correlation ± standard deviation [Z-score]
Color scale: correlation bins from 0.0-0.1 up to 0.9-1.0

Module name \ Module type | S. cerevisiae | C. albicans homologue module | C. albicans refined module
Amino acid Biosynthesis | 0.34816 ± 0.0029 [106.9] | 0.043325 ± 0.0038 [7.5693] | 0.26942 ± 0.0082 [31.038]
Cell Cycle G1 | 0.2921 ± 0.0028 [90.0693] | 0.0475 ± 0.0047 [7.0945] | 0.18 ± 0.0079 [20.926]
rRNA Processing | 0.674 ± 0.0045 [142.113] | 0.3216 ± 0.0051 [60.2796] | 0.3097 ± 0.0023 [127.507]
Proteasome Subunits | 0.4211 ± 0.0054 [71.2679] | 0.1611 ± 0.0078 [18.8772] | 0.2342 ± 0.0045 [48.9743]
Results – co-regulation of modules
Cell values: mean correlation ± standard deviation [Z-score]
Legend (cell shading): modules are anti-regulated; modules are co-regulated

Module | Amino acid Biosynthesis (13.7) | Cell Cycle G1 (12.9) | rRNA Processing (12.6) | Proteasome subunits (11.31)
Amino acid Biosynthesis (13.7) | --- | -0.0216 ± 0.0017 [-35.0476] | 0.0042 ± 0.0025 [-13.9315] | 0.0337 ± 0.0031 [-1.6166]
Cell Cycle G1 (12.9) | -0.0216 ± 0.0017 [-35.0476] | --- | 0.0779 ± 0.0024 [16.1595] | 0.0203 ± 0.0025 [-7.2475]
rRNA Processing (12.6) | 0.0042 ± 0.0025 [-13.9315] | 0.0779 ± 0.0024 [16.1595] | --- | -0.1241 ± 0.0033 [-48.9049]
Proteasome subunits (11.31) | 0.0337 ± 0.0031 [-1.6166] | 0.0203 ± 0.0025 [-7.2475] | -0.1241 ± 0.0033 [-48.9049] | ---
Results – co-regulation between SC modules
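The correlation between two modules can be sketched as the mean Pearson correlation over all cross-module gene pairs; a positive mean suggests co-regulation and a negative mean anti-regulation. That this is exactly how the values on the slide were obtained is an assumption, as are the function names and the toy profiles.

```python
from itertools import product
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0

def between_module_correlation(expr, module_a, module_b):
    """Mean correlation over all pairs (g, h) with g in A and h in B."""
    return mean(pearson(expr[g], expr[h])
                for g, h in product(module_a, module_b))

# Toy profiles (assumed): module A rises across conditions, module B falls.
expr = {
    "a1": [1, 2, 3, 4],
    "a2": [2, 4, 6, 9],
    "b1": [4, 3, 2, 1],
    "b2": [8, 6, 4, 1],
}
r_ab = between_module_correlation(expr, ["a1", "a2"], ["b1", "b2"])
print(f"mean cross-module correlation = {r_ab:.3f}")  # negative: anti-regulated
```

The significance of such a value could then be assessed with the same random-module Z-score procedure used for single modules.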
Cell values: mean correlation ± standard deviation [Z-score]
Legend (cell shading): modules are anti-regulated; modules are co-regulated

Module | Amino acid Biosynthesis (13.7) | Cell Cycle G1 (12.9) | rRNA Processing (12.6) | Proteasome subunits (11.31)
Amino acid Biosynthesis (13.7) | --- | -0.0078 ± 0.0051 [-4.5555] | 0.0622 ± 0.0032 [14.8978] | -2.02E-04 ± 0.0041 [-3.5271]
Cell Cycle G1 (12.9) | -0.0078 ± 0.0051 [-4.5555] | --- | 0.0117 ± 0.0034 [-0.9320] | 0.0341 ± 0.0041 [4.7324]
rRNA Processing (12.6) | 0.0622 ± 0.0032 [14.8978] | 0.0117 ± 0.0034 [-0.9320] | --- | -0.0028 ± 0.0026 [-6.6787]
Proteasome subunits (11.31) | -2.02E-04 ± 0.0041 [-3.5271] | 0.0341 ± 0.0041 [4.7324] | -0.0028 ± 0.0026 [-6.6787] | ---
Results – co-regulation between CA modules
[Figure: the TGACTC CRE in the Yeast Module, searched for in the Rejected, Overlapped and Included genes of the Candida Homologue Module and the Candida Refined Module]
Results - cis-regulatory elements in the aa modules
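CRE %, Mean CRE:
• Yeast Module: 46%, 1.25
• Candida Homologue Module: 34%, 1.06
• Rejected genes: 29%, 1.00
• Overlapped genes: 52%, 1.18
• Included genes: 54%, 1.29
• Candida Refined Module: 53%, 1.22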
Results – cis-regulatory elements chart
Cell values: # of genes / CRE % / Mean CRE

Module name \ Module type | S. cerevisiae | C. albicans homologue module | Rejected genes | Included genes | Overlapped genes | C. albicans refined module
Amino acid Biosynthesis | 156 / 46% / 1.25 | 98 / 34% / 1.06 | 77 / 29% / 1 | 13 / 54% / 1.285 | 21 / 52% / 1.181 | 34 / 53% / 1.222
rRNA Processing 12.6 | 61 / 67% / 1.585 | 55 / 42% / 1.304 | 9 / 44% / 1.25 | 219 / 32% / 1.225 | 46 / 41% / 1.315 | 265 / 34% / 1.24
Proteasome subunits 10.14 | 41 / 37% / 1 | 37 / 19% / 1.428 | 11 / 18% / 1 | 38 / 16% / 1.166 | 26 / 19% / 1.6 | 64 / 17% / 1.363
Proteasome subunits 11.31 | 45 / 62% / 1.071 | 39 / 23% / 1 | 13 / 23% / 1 | 38 / 13% / 1 | 26 / 23% / 1 | 64 / 17% / 1
Cell Cycle G1 12.9 | 124 / 59% / 1.41 | 71 / 46% / 1 | 52 / 42% / 1 | 14 / 29% / 1 | 19 / 58% / 1 | 33 / 45% / 1
Cell Cycle G1 16.4 | 158 / 52% / 1.378 | 88 / 45% / 1.025 | 67 / 40% / 1.037 | 13 / 23% / 1 | 21 / 62% / 1 | 34 / 47% / 1
• Co-regulation:
  – Different co-regulation schemes can point to alternative gene functions between SC and CA
  – Investigate the relations between “real” CA modules and refined CA modules with a similar annotation
• cis-regulatory elements:
  – CRE as a function of homology
  – CRE as a function of co-regulation
  – Low expression of SC CRE as an indicator for biological importance
  – Not all CREs are conserved between the organisms: GCN4 vs. GAL4
Conclusions
• Experimental validation of functional assignment:
  – verify whether the cis-regulatory elements found in C. albicans are biologically active
  – test the conservation of function across homologue modules of S. cerevisiae and C. albicans
Future research tasks
• Naama Barkai – Weizmann Institute
• Judith Berman – University of Minnesota
• Sven Bergmann – Barkai’s group
• Jan Ihmels – Barkai’s group
Acknowledgements