RECONSTRUCTING GENE REGULATORY NETWORKS FROM FUNGAL TRANSCRIPTOMIC DATA USING BAYESIAN NETWORK
Li Guo Fungal Comparative Genomics Laboratory
Department of Biochemistry and Molecular Biology, University of Massachusetts Amherst
October 19, 2012
Outline
! Introduction to gene expression and regulation
! Fungal transcriptomics
! Data collection and analysis
! Building transcription regulation networks using Bayesian model learning
Why are life forms so diverse? What determines what they are?
! Phenotype: biological characteristics or traits of an organism
! Genotype: genetic background or makeup of an organism
! Gene: a unit of heredity in a living organism, usually a DNA fragment that codes for a protein or an RNA
! Genome: the entire collection of DNA in an organism
! Transcription: DNA to RNA
! Translation: RNA to protein
Code of Life
Whole genome shotgun sequencing
Gene expression regulation through transcription factors (TF)
What is a gene regulatory network?
! Cells respond to internal and external cues by rapidly regulating the expression of thousands of genes
! Genes work together via a fine regulatory network to control life processes
! Two major kinds of regulatory molecules:
  ! Transcription Factors (TF)
  ! Signaling Proteins (SP)
! SPs and TFs work in a combinatorial way to determine the mRNA levels in a living cell
An example of regulatory networks
http://osf1.gmu.edu/~rcouch/protprot.jpg
Why study regulatory networks?
! Understanding the mechanisms of cellular function
! Applications in biomedical practices and agricultural production
! Facilitating comparative analysis of regulatory networks in different organisms
Where to start?
! How about studying one gene at a time?
! Genome sequencing and annotation
! Global transcriptional profiling (transcriptomics)
  ! DNA microarray
  ! RNA sequencing
! Expression of thousands of genes (including SPs and TFs) can be simultaneously measured under various conditions and perturbations
DNA Microarray RNA sequencing
Pathogenic Fusarium fungus | Host | Disease | Mycotoxins | Genome sequenced? | Genome size (Mbp)
Fusarium graminearum PH-1 | Wheat, barley, maize | Head blight, ear rot and stalk rot | Deoxynivalenol, zearalenone | Yes | 36
Fusarium verticillioides 7600 | Maize, sorghum | Ear rot and stalk rot | Fumonisins | Yes | 42
Fusarium oxysporum 4287 | Tomato | Vascular wilt | Unknown | Yes | 60
Research focuses:
! Functional characterization of Fusarium genes essential for fungal development and host infection
! Comparative genomics
! Transcriptomics
! Fusarium comparative transcriptomics
Experimental procedure
[Figure: experimental workflow. Labels: minimal N, minimal C; mutations in key signaling pathways; mRNA; cDNA; hybridization image; data processing and normalization; statistical analysis; data after normalization]
The expression level of every gene is measured under each condition and reported on a log2 scale
We have the expression data, now what?
! Statistical analysis can reveal the genes that are differentially expressed in test samples versus the reference sample
! These genes are selected for downstream analysis such as gene ontology (GO) term enrichment and mutagenesis studies
! They are also used to construct transcription regulation networks
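The selection of differentially expressed genes can be illustrated with a minimal sketch. It assumes the expression values are already normalized and on a log2 scale, so that a fold change is simply a difference of means. The gene names, the values, and the 2-fold cutoff are made up for illustration; a real analysis would also apply a statistical test across replicates.

```python
from statistics import mean

def log2_fold_changes(test, reference):
    """Return {gene: log2 fold change} from log2-scale replicate values.

    On a log2 scale the fold change is a difference of means, so +1 means
    2-fold up and -1 means 2-fold down in the test sample.
    """
    return {gene: mean(test[gene]) - mean(reference[gene]) for gene in test}

def differentially_expressed(fold_changes, cutoff=1.0):
    """Keep genes whose absolute log2 fold change reaches the cutoff."""
    return {g: fc for g, fc in fold_changes.items() if abs(fc) >= cutoff}

# Hypothetical log2 expression values (three replicates per sample).
reference = {
    "geneA": [8.0, 8.2, 7.8],
    "geneB": [10.0, 10.1, 9.9],
    "geneC": [6.0, 6.1, 5.9],
}
test = {
    "geneA": [11.0, 11.2, 10.8],
    "geneB": [10.2, 10.0, 10.1],
    "geneC": [4.0, 4.1, 3.9],
}

fold_changes = log2_fold_changes(test, reference)
hits = differentially_expressed(fold_changes, cutoff=1.0)
for gene, fc in sorted(hits.items(), key=lambda kv: -kv[1]):
    print(f"{gene}\t{fc:+.2f}")
```

Sorting the hits by fold change gives ranked lists of the same shape as the "top up-regulated" and "top down-regulated" gene tables.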
Guo et al. unpublished
Fusarium gene expression changes in response to stress and mutation
QIAGEN
Gene ID | Log2(FC) | Functional Description
FGSG_03969 | 8.74 | conserved hypothetical protein
FGSG_07894 | 7.67 | Phosphate transport protein
FGSG_03737 | 7.47 | related to major facilitator MirA
FGSG_03738 | 7.42 | conserved hypothetical protein
FGSG_00773 | 7.02 | copper transport protein
FGSG_02050 | 6.9 | conserved hypothetical protein
FGSG_02324 | 6.86 | PKS12
FGSG_02327 | 6.85 | aurF
FGSG_02329 | 6.72 | conserved hypothetical protein
FGSG_03593 | 6.66 | 6-hydroxy-D-nicotine oxidase
FGSG_03586 | 6.53 | conserved hypothetical protein
FGSG_04780 | 6.52 | ferric reductase Fre2p
FGSG_03735 | 6.45 | ABC1 transport protein
FGSG_02325 | 6.43 | conserved hypothetical protein
FGSG_02326 | 6.32 | aurJ
FGSG_07804 | 6.18 | cytochrome p450
FGSG_06564 | 6.15 | conserved hypothetical protein
FGSG_04512 | 6.1 | PMR1 - Ca++-transporting P-type ATPase located in Golgi
FGSG_02328 | 6.09 | GIP1
FGSG_03736 | 6.05 | transferase family protein
FGSG_08697 | 5.92 | conserved hypothetical protein
FGSG_07798 | 5.9 | PKS10
FGSG_07801 | 5.7 | conserved hypothetical protein
FGSG_03046 | 5.69 | conserved hypothetical protein
FGSG_10655 | 5.69 | ferric reductase FRE2 precursor
FGSG_08151 | 5.56 | heme peroxidase
FGSG_13222 | 5.56 | conserved hypothetical protein
FGSG_08624 | 5.55 | conserved hypothetical protein
FGSG_00793 | 5.41 | conserved hypothetical protein
FGSG_02880 | 5.4 | nitrate reductase
FGSG_02051 | 5.28 | probable manganese superoxide dismutase precursor (sod-2)
FGSG_04089 | 5.21 | conserved hypothetical protein
FGSG_03692 | 5.19 | related to Fre1p and Fre2p
FGSG_07792 | 5.18 | related to integral membrane protein
FGSG_10608 | 5.15 | conserved hypothetical protein
FGSG_02589 | 4.94 | probable ZRT2 - Zinc transporter II
FGSG_07802 | 4.87 | putative multidrug transporter Mfs1.1
FGSG_07375 | 4.8 | alkaline phosphatase
FGSG_11205 | 4.79 | Probable SnodProt1 precursor
FGSG_08375 | 4.72 | dicarboxylate carrier protein
FGSG_03592 | 4.66 | conserved hypothetical protein
FGSG_07567 | 4.65 | related to Staphylococcus multidrug resistance protein
FGSG_07596 | 4.58 | Bifunctional P-450
FGSG_13223 | 4.54 | related to elongation factor 1-gamma
FGSG_00260 | 4.47 | conserved hypothetical protein
FGSG_11379 | 4.47 | conserved hypothetical protein
FGSG_11310 | 4.39 | conserved hypothetical protein
FGSG_10506 | 4.32 | monocarboxylate transporter 2
FGSG_07678 | 4.29 | acid phosphatase Pho610
FGSG_07805 | 4.21 | related to S-adenosylmethionine
FGSG_00742 | 4.19 | related to S-adenosylmethionine
FGSG_09727 | 4.19 | conserved hypothetical protein
FGSG_03372 | 4.18 | conserved hypothetical protein
FGSG_03957 | 4.18 | related to myo-inositol transport protein ITR1
FGSG_12214 | 4.07 | conserved hypothetical protein
[Figure: bar chart. Y axis: Frequency %; X axis: GO Description (potassium binding; cellular transport; cell rescue, defense and virulence; interaction with the environment). Series: Genome, FgMac1 up]
Top up-regulated genes in F. graminearum mac-1 mutant
P<0.05
Guo et al. unpublished
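GO term enrichment asks whether a functional category is more frequent in a gene list than in the whole genome. The transcript does not say which test produced the P<0.05 threshold; the sketch below uses the hypergeometric test, one common choice, and every count in it is hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(genome_size, genome_with_term,
                           selected_size, selected_with_term):
    """Upper-tail hypergeometric p-value for GO term enrichment.

    Probability of seeing at least `selected_with_term` annotated genes
    when `selected_size` genes are drawn at random, without replacement,
    from a genome in which `genome_with_term` genes carry the term.
    """
    total = comb(genome_size, selected_size)
    upper = min(genome_with_term, selected_size)
    tail = sum(
        comb(genome_with_term, k)
        * comb(genome_size - genome_with_term, selected_size - k)
        for k in range(selected_with_term, upper + 1)
    )
    return tail / total

# Hypothetical counts: 1000 genes in the genome, 50 annotated with the term,
# 20 up-regulated genes, 6 of which carry the term.
p_value = hypergeom_enrichment_p(1000, 50, 20, 6)
print(f"enrichment p-value: {p_value:.2e}")
```

With many GO categories tested at once, a multiple-testing correction would normally be applied on top of the raw p-values.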
Gene ID | Log2(FC) | Functional Description
FGSG_02034 | -7.1 | alcohol dehydrogenase I - ADH1
FGSG_03162 | -7.01 | formate transport protein
FGSG_04468 | -6.49 | neutral amino acid permease
FGSG_07522 | -6 | conserved hypothetical protein
FGSG_13979 | -6 | conserved hypothetical protein
FGSG_08415 | -5.52 | extracellular invertase
FGSG_07411 | -5.47 | sulphite efflux pump protein
FGSG_01947 | -5.37 | nitrate reductase
FGSG_12890 | -5.17 | Related to brt1 protein
FGSG_11413 | -5.14 | alpha methylacyl- racemase
FGSG_03026 | -5.07 |
FGSG_03881 | -4.96 | related to TRI15 - putative transcription factor
FGSG_06537 | -4.85 | neutral amino acid permease
FGSG_08402 | -4.82 | nitrite reductase
FGSG_11992 | -4.82 | related to UDPglucose 4-epimerase
FGSG_08357 | -4.65 | conserved hypothetical protein
FGSG_00817 | -4.63 | conserved hypothetical protein
FGSG_02821 | -4.56 | conserved hypothetical protein
FGSG_11412 | -4.54 | conserved hypothetical protein
FGSG_04458 | -4.52 | probable flavohemoglobin
FGSG_00285 | -4.48 | conserved hypothetical protein
FGSG_02950 | -4.45 | related to neutral amino acid permease
FGSG_05683 | -4.45 | related to monooxygenase
FGSG_11500 | -4.34 | related to pyridoxamine 5'-phosphate oxidase
FGSG_08721 | -4.32 | probable superoxide dismutase [Cu-Zn]
FGSG_03368 | -4.27 | conserved hypothetical protein
FGSG_03838 | -4.27 | related to 3-oxoacyl-[acyl-carrier-protein] reductase
FGSG_02949 | -4.25 | conserved hypothetical protein
FGSG_10647 | -4.21 | methylase involved in ubiquinone menaquinone biosynthesis
FGSG_01234 | -4.15 | MAC1
FGSG_08055 | -4.15 | related to neutral amino acid permease
FGSG_03990 | -4.14 | conserved hypothetical protein
FGSG_11073 | -4.09 | related to aminopeptidase
FGSG_11385 | -4.09 | Related to integral membrane protein PTH11
FGSG_10598 | -3.93 | conserved hypothetical protein
FGSG_03430 | -3.87 | related to D-arabinitol 2-dehydrogenase
FGSG_07672 | -3.8 | conserved hypothetical protein
FGSG_03882 | -3.79 | probable ABC1 transport protein
FGSG_11741 | -3.77 | conserved hypothetical protein
FGSG_02901 | -3.72 | related to sulfatase
FGSG_07837 | -3.7 | carbon-nitrogen hydrolase
FGSG_06974 | -3.69 | related to tRNA 2'-phosphotransferase
FGSG_01980 | -3.64 | conserved hypothetical protein
FGSG_09354 | -3.62 | probable neutral amino acid permease
FGSG_09706 | -3.6 | related to positive effector protein GCN20
FGSG_03422 | -3.57 | conserved hypothetical protein
FGSG_07705 | -3.53 | conserved hypothetical protein
FGSG_04892 | -3.52 | retinol dehydrogenase 8
FGSG_04251 | -3.51 | short chain oxidoreductase
FGSG_07832 | -3.5 | Related to CCC1 protein (involved in calcium homeostasis)
FGSG_13962 | -3.5 | related to 3-hydroxybutyryl-CoA dehydrogenase
Top down-regulated genes in F. graminearum mac-1 mutant
[Figure: bar chart. Y axis: Frequency %; X axis: GO Description (cellular transport; lipid metabolism and transport; rRNA synthesis). Series: Genome, FgMac1 down]
Guo et al. unpublished
[Figure: F. verticillioides mac1xcpka. Labels: 6 TFs, 4 TFs]
How to use transcriptomic data to elucidate the regulatory networks?
! Regulatory interactions between genes give rise to statistical dependencies between the random variables that represent their expression levels
! Bayesian network models
! Why Bayesian networks?
  ! Unknown network structures
  ! Microarray data are typically noisy and need a probabilistic model
  ! Hidden variables
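The idea of a statistical dependency between expression levels can be made concrete with a small sketch. It assumes expression has been discretized into "up" and "down" states and uses mutual information as the dependence measure; both choices are assumptions for illustration, and the profiles are made up.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

# Hypothetical discretized expression states across eight conditions.
regulator = ["up", "up", "down", "down", "up", "down", "up", "down"]
target    = ["up", "up", "down", "down", "up", "down", "up", "down"]
unrelated = ["up", "down", "up", "down", "up", "down", "down", "up"]

mi_dependent = mutual_information(regulator, target)
mi_independent = mutual_information(regulator, unrelated)
print(mi_dependent, mi_independent)
```

A target that tracks its regulator shares information with it, while an unrelated gene does not. With only a handful of noisy samples such estimates are unreliable, which is one motivation for a probabilistic model.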
Bayesian network
! Representation: graph
! Reasoning: probability theory
[Figure: example network with nodes Earthquake, Burglary, Alarm, NeighborCalls]
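How the graph and probability theory fit together can be shown with a minimal sketch. It assumes the usual textbook structure for this example (Earthquake and Burglary are parents of Alarm, and Alarm is the parent of NeighborCalls), which the transcript does not spell out. All probabilities are hypothetical.

```python
from itertools import product

# Hypothetical probabilities, chosen only for illustration.
P_E = 0.01                      # P(Earthquake = True)
P_B = 0.02                      # P(Burglary = True)
P_A = {                         # P(Alarm = True | Earthquake, Burglary)
    (True, True): 0.95, (True, False): 0.30,
    (False, True): 0.90, (False, False): 0.01,
}
P_N = {True: 0.80, False: 0.05}  # P(NeighborCalls = True | Alarm)

def bernoulli(p_true, value):
    """Probability of a binary value given the probability of True."""
    return p_true if value else 1.0 - p_true

def joint(e, b, a, n):
    """P(E, B, A, N) as the product of each node's local conditional probability."""
    return (bernoulli(P_E, e) * bernoulli(P_B, b)
            * bernoulli(P_A[(e, b)], a) * bernoulli(P_N[a], n))

def posterior_burglary_given_call():
    """P(Burglary = True | NeighborCalls = True), summing out E and A."""
    num = sum(joint(e, True, a, True)
              for e, a in product([True, False], repeat=2))
    den = sum(joint(e, b, a, True)
              for e, b, a in product([True, False], repeat=3))
    return num / den

total = sum(joint(*values) for values in product([True, False], repeat=4))
posterior = posterior_burglary_given_call()
print(f"sum of joint = {total:.6f}, P(B | call) = {posterior:.3f}")
```

The graph supplies the factorization of the joint distribution; ordinary probability rules then answer queries such as how much a neighbor's call raises the belief in a burglary.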
Score-based Learning
Data over (E, B, A): <Y,N,N>, <Y,Y,Y>, <N,N,Y>, <N,Y,Y>, ..., <N,Y,Y>
[Figure: candidate network structures over E, B and A]
! Define a scoring function that evaluates how well a structure matches the data
! Search for a structure that maximizes the score
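The two ingredients, a scoring function and a search, can be sketched on a toy data set over E, B and A. The sketch uses the BIC score, a common large-sample approximation, in place of the exact Bayesian score, and it searches by simply comparing three hand-picked candidate structures. The counts are made up.

```python
from collections import Counter
from math import log

def bic_score(data, parents):
    """BIC score: maximized log-likelihood minus a complexity penalty.

    data    -- list of dicts mapping variable name to a binary value
    parents -- dict mapping each variable to a tuple of its parents
    """
    n = len(data)
    score = 0.0
    for var, pa in parents.items():
        joint = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        marginal = Counter(tuple(row[p] for p in pa) for row in data)
        # Log-likelihood under maximum-likelihood parameters.
        score += sum(c * log(c / marginal[cfg]) for (cfg, _), c in joint.items())
        # One free parameter per parent configuration of a binary variable.
        score -= 0.5 * log(n) * (2 ** len(pa))
    return score

# Hypothetical counts of (E, B, A) observations; A mostly follows B.
counts = {(0, 0, 0): 14, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 12,
          (1, 0, 0): 4, (1, 0, 1): 2, (1, 1, 0): 1, (1, 1, 1): 5}
data = [{"E": e, "B": b, "A": a}
        for (e, b, a), c in counts.items() for _ in range(c)]

candidates = {
    "no edges":    {"E": (), "B": (), "A": ()},
    "B -> A":      {"E": (), "B": (), "A": ("B",)},
    "E -> A <- B": {"E": (), "B": (), "A": ("E", "B")},
}
scores = {name: bic_score(data, g) for name, g in candidates.items()}
best = max(scores, key=scores.get)
print(best, {name: round(s, 2) for name, s in scores.items()})
```

The penalty term is what keeps the densest graph from always winning: an extra edge is accepted only if it improves the fit by more than the cost of its parameters.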
Bayesian Score
! The score evaluates the posterior probability of the graph given the data:
$$P(G \mid D) = \frac{P(D \mid G)\, P(G)}{P(D)} \qquad (1)$$
(2)
(3)
where P(G | D) is the posterior, P(D | G) the likelihood, P(G) the prior, and P(D) the probability of the data.
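Because P(D) does not depend on the graph G, structures can be compared without it. The lines below are the standard way to write this, offered as a sketch rather than as a reconstruction of the missing equations; the notation Score(G : D) and the parameter symbol θ are assumptions.

```latex
\log P(G \mid D) = \log P(D \mid G) + \log P(G) - \log P(D)
\quad\Longrightarrow\quad
\operatorname{Score}(G : D) = \log P(D \mid G) + \log P(G),
\qquad
P(D \mid G) = \int P(D \mid G, \theta)\, P(\theta \mid G)\, d\theta
```

The integral averages the likelihood over all parameter values θ of the network, so a structure is rewarded only if it fits the data well across plausible parameters, not just at the best one.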
Using Prior Biological Knowledge
Pe’er et al. 2006
Problem Definition
! Input
  ! Gene expression levels of 13331 genes in 10 samples under different conditions
  ! List of candidate regulators (TFs and SPs)
! Output
  ! List of the most activated regulators (k = 30)
  ! Bayesian network for the genes
Learning Bayesian Networks
! Parameter Estimation
! Structure Learning
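Parameter estimation, the simpler of the two tasks, can be sketched for a fixed structure with complete, discretized data. The estimate below is the relative frequency of each child state per parent configuration (the maximum-likelihood estimate), with an optional pseudocount for smoothing. The variable names `TF` and `gene` and all counts are hypothetical.

```python
from collections import Counter

def estimate_cpt(data, child, parents, pseudocount=0.0, states=(0, 1)):
    """Estimate P(child | parents) from complete data.

    With pseudocount=0 this is the maximum-likelihood estimate (relative
    frequencies); a positive pseudocount smooths the estimate.
    Only parent configurations observed in the data are returned.
    """
    counts = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    configs = sorted({cfg for cfg, _ in counts})
    cpt = {}
    for cfg in configs:
        total = sum(counts[(cfg, s)] + pseudocount for s in states)
        cpt[cfg] = {s: (counts[(cfg, s)] + pseudocount) / total for s in states}
    return cpt

# Hypothetical discretized states (1 = high, 0 = low) for one regulator
# and one target gene across 16 samples.
data = ([{"TF": 1, "gene": 1}] * 6 + [{"TF": 1, "gene": 0}] * 2
        + [{"TF": 0, "gene": 1}] * 1 + [{"TF": 0, "gene": 0}] * 7)

cpt_mle = estimate_cpt(data, child="gene", parents=("TF",))
cpt_smoothed = estimate_cpt(data, child="gene", parents=("TF",), pseudocount=1.0)
print(cpt_mle)
print(cpt_smoothed)
```

Smoothing matters when samples are few, since an unobserved combination would otherwise get probability zero.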
Structure Learning
! Using a typical heuristic, greedy hill-climbing search, each iteration picks the candidate regulator that gives the highest score among all remaining candidates and adds it to the output regulators.
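[Figure: greedy search illustration. Start: 294 candidate regulators, output regulators empty. After one iteration: output regulators = Gat1, 293 candidate regulators. Network sketch with nodes Z, A, D, B, X, Y, C]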
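The greedy selection loop can be sketched as follows. The loop itself follows the description above (move the best-scoring candidate to the output list, one per iteration), but the scoring function is a stand-in: it sums, over target genes, the strongest absolute Pearson correlation with any selected regulator, which is an assumption for illustration and not the Bayesian score used in the talk. Regulator names, target names, and expression profiles are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def make_score(regulators, targets):
    """Stand-in score: how well the selected regulators explain the targets."""
    def score(selected):
        return sum(max(abs(pearson(regulators[r], t)) for r in selected)
                   for t in targets.values())
    return score

def greedy_select(candidates, score, k):
    """Move the best-scoring candidate to the output list, one per iteration."""
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda c: score(selected + [c]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical expression profiles across five conditions.
regulators = {
    "reg1": [1, 2, 3, 4, 5],
    "reg2": [5, 1, 4, 2, 3],
    "reg3": [1, 1, 2, 1, 1],
}
targets = {
    "t1": [2, 4, 6, 8, 10],
    "t2": [1.1, 2.0, 2.9, 4.2, 5.0],
    "t3": [10, 2, 8, 4, 6],
    "t4": [5, 4, 3, 2, 1],
}

selected = greedy_select(regulators, make_score(regulators, targets), k=2)
print(selected)
```

With 294 candidates and k = 30, this loop evaluates a few thousand candidate sets instead of every possible subset, which is the point of the greedy heuristic.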
Predicted networks will be validated against existing biological knowledge.
Summary
! Regulation of gene expression determines the cell state and the response to the environment
! Transcriptomics studies gene expression at given times and in given cell types on a global scale
! Gene expression (transcript) levels are measured by microarray and RNA sequencing technologies
! Connections between genes and their expression regulators, such as transcription factors and signaling molecules, can be revealed by probabilistic graphical models such as Bayesian networks
Acknowledgment
! UMass Amherst
  ! Dr. Li-Jun Ma
  ! Dr. Lixin Gao
  ! Guoyi Zhao
  ! Andy Berg
  ! Jiangtao Yin
! Collaborators:
  ! Corby Kistler (University of Minnesota)
  ! Jin-Rong Xu (Purdue University)