RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

RECONSTRUCTING GENE REGULATORY NETWORKS FROM FUNGAL TRANSCRIPTOMIC DATA USING BAYESIAN NETWORK Li Guo Fungal Comparative Genomics Laboratory Department of Biochemistry and Molecular biology University of Massachusetts Amherst October 19, 2012


Page 1: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …


Page 2: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

Outline
• Introduction to gene expression and regulation
• Fungal transcriptomics
• Data collection and analysis
• Building transcription regulation networks using Bayesian model learning

Page 3: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

Why are life forms so diverse? What determines what they are?

Page 4: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …
Page 5: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

• Phenotype: biological characteristics or traits of an organism
• Genotype: genetic background or makeup of an organism
• Gene: a unit of heredity in a living organism, usually a DNA fragment that codes for a type of protein or RNA
• Genome: the entire collection of DNA in an organism
• Transcription: DNA to RNA
• Translation: RNA to protein

Code of Life

Page 6: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …
Page 7: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

Whole genome shotgun sequencing

Page 8: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

Gene expression regulation through transcription factors (TF)

Page 9: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

What is a gene regulatory network?
• Cells respond to internal and external cues by rapidly regulating the expression of thousands of genes
• Genes work together via a finely tuned regulatory network to control life processes
• Two major kinds of regulatory molecules:
  - Transcription factors (TFs)
  - Signaling proteins (SPs)
• SPs and TFs work combinatorially to determine the mRNA levels in a living cell

Page 10: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

An example of regulatory networks

http://osf1.gmu.edu/~rcouch/protprot.jpg

Page 11: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

Why study regulatory networks?

• Understanding the mechanisms of cellular function
• Applications in biomedical practice and agricultural production
• Facilitating comparative analysis of regulatory networks in different organisms

Page 12: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

Where to start?
• How about studying one gene at a time?
• Genome sequencing and annotation
• Global transcriptional profiling (transcriptomics)
  - DNA microarray
  - RNA sequencing
• Expression of thousands of genes (including SPs and TFs) can be measured simultaneously under various conditions and perturbations

Page 13: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

DNA Microarray RNA sequencing

Page 14: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

Pathogenic Fusarium

Fungus | Host | Disease | Mycotoxins | Genome sequenced? | Genome size (Mbp)
Fusarium graminearum PH-1 | Wheat, barley, maize | Head blight, ear rot and stalk rot | Deoxynivalenol, zearalenone | Yes | 36
Fusarium verticillioides 7600 | Maize, sorghum | Ear rot and stalk rot | Fumonisins | Yes | 42
Fusarium oxysporum 4287 | Tomato | Vascular wilt | Unknown | Yes | 60

Page 15: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

Research focuses:

• Functional characterization of Fusarium genes essential for fungal development and host infection
• Comparative genomics
• Transcriptomics
• Fusarium comparative transcriptomics

Page 16: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

Experimental procedure

[Figure: experimental workflow. Labels: minimal N and minimal C conditions; mutations in key signaling pathways; mRNA; cDNA; hybridization image; data processing and normalization; statistical analysis]

Page 17: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

Data after normalization

Page 18: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

The expression level of every gene is measured under each condition on a log2 scale

Page 19: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

We have the expression data, now what?
• Statistical analysis can reveal the genes that are differentially expressed in test samples versus the reference sample
• These genes are selected for downstream analysis such as gene ontology (GO) term enrichment and mutagenesis studies
• The expression data are also used to construct transcription regulation networks
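As a rough illustration of the first bullet, the sketch below computes log2 fold changes between a test and a reference sample and picks genes beyond a cutoff. The gene names, the expression values and the cutoff of 2.0 are all made up for illustration; a real analysis would use replicates and a proper statistical test rather than a bare threshold.

```python
# Toy log2 expression values. Gene names and numbers are hypothetical.
reference = {"geneA": 8.0, "geneB": 10.5, "geneC": 6.0, "geneD": 9.0}
test = {"geneA": 11.0, "geneB": 10.4, "geneC": 2.5, "geneD": 9.8}


def log2_fold_changes(test, reference):
    """On a log2 scale, the fold change is a simple difference of values."""
    return {g: test[g] - reference[g] for g in reference if g in test}


def select_differential(log2fc, threshold=2.0):
    """Split genes into up- and down-regulated sets using an assumed cutoff."""
    up = sorted(g for g, v in log2fc.items() if v >= threshold)
    down = sorted(g for g, v in log2fc.items() if v <= -threshold)
    return up, down


log2fc = log2_fold_changes(test, reference)
up, down = select_differential(log2fc)
print("up-regulated:", up)
print("down-regulated:", down)
```

Because the data are already on a log2 scale, a difference of 1 corresponds to a two-fold change in expression.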

Page 20: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

Guo et al. unpublished

Fusarium gene expression changes in response to stress and mutation

Page 21: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

QIAGEN

Page 22: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

Gene ID | Log2(FC) | Functional description
FGSG_03969 | 8.74 | conserved hypothetical protein
FGSG_07894 | 7.67 | phosphate transport protein
FGSG_03737 | 7.47 | related to major facilitator MirA
FGSG_03738 | 7.42 | conserved hypothetical protein
FGSG_00773 | 7.02 | copper transport protein
FGSG_02050 | 6.9 | conserved hypothetical protein
FGSG_02324 | 6.86 | PKS12
FGSG_02327 | 6.85 | aurF
FGSG_02329 | 6.72 | conserved hypothetical protein
FGSG_03593 | 6.66 | 6-hydroxy-D-nicotine oxidase
FGSG_03586 | 6.53 | conserved hypothetical protein
FGSG_04780 | 6.52 | ferric reductase Fre2p
FGSG_03735 | 6.45 | ABC1 transport protein
FGSG_02325 | 6.43 | conserved hypothetical protein
FGSG_02326 | 6.32 | aurJ
FGSG_07804 | 6.18 | cytochrome p450
FGSG_06564 | 6.15 | conserved hypothetical protein
FGSG_04512 | 6.1 | PMR1 - Ca++-transporting P-type ATPase located in Golgi
FGSG_02328 | 6.09 | GIP1
FGSG_03736 | 6.05 | transferase family protein
FGSG_08697 | 5.92 | conserved hypothetical protein
FGSG_07798 | 5.9 | PKS10
FGSG_07801 | 5.7 | conserved hypothetical protein
FGSG_03046 | 5.69 | conserved hypothetical protein
FGSG_10655 | 5.69 | ferric reductase FRE2 precursor
FGSG_08151 | 5.56 | heme peroxidase
FGSG_13222 | 5.56 | conserved hypothetical protein
FGSG_08624 | 5.55 | conserved hypothetical protein
FGSG_00793 | 5.41 | conserved hypothetical protein
FGSG_02880 | 5.4 | nitrate reductase
FGSG_02051 | 5.28 | probable manganese superoxide dismutase precursor (sod-2)
FGSG_04089 | 5.21 | conserved hypothetical protein
FGSG_03692 | 5.19 | related to Fre1p and Fre2p
FGSG_07792 | 5.18 | related to integral membrane protein
FGSG_10608 | 5.15 | conserved hypothetical protein
FGSG_02589 | 4.94 | probable ZRT2 - Zinc transporter II
FGSG_07802 | 4.87 | putative multidrug transporter Mfs1.1
FGSG_07375 | 4.8 | alkaline phosphatase
FGSG_11205 | 4.79 | probable SnodProt1 precursor
FGSG_08375 | 4.72 | dicarboxylate carrier protein
FGSG_03592 | 4.66 | conserved hypothetical protein
FGSG_07567 | 4.65 | related to Staphylococcus multidrug resistance protein
FGSG_07596 | 4.58 | bifunctional P-450
FGSG_13223 | 4.54 | related to elongation factor 1-gamma
FGSG_00260 | 4.47 | conserved hypothetical protein
FGSG_11379 | 4.47 | conserved hypothetical protein
FGSG_11310 | 4.39 | conserved hypothetical protein
FGSG_10506 | 4.32 | monocarboxylate transporter 2
FGSG_07678 | 4.29 | acid phosphatase Pho610
FGSG_07805 | 4.21 | related to S-adenosylmethionine
FGSG_00742 | 4.19 | related to S-adenosylmethionine
FGSG_09727 | 4.19 | conserved hypothetical protein
FGSG_03372 | 4.18 | conserved hypothetical protein
FGSG_03957 | 4.18 | related to myo-inositol transport protein ITR1
FGSG_12214 | 4.07 | conserved hypothetical protein

[Figure: bar chart of frequency (%) by GO description, comparing the whole genome with genes up-regulated in FgMac1 (P<0.05). GO categories: potassium binding; cellular transport; cell rescue, defense and virulence; interaction with the environment]

Top up-regulated genes in F. graminearum mac-1 mutant

Guo et al. unpublished

Page 23: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

Gene ID | Log2(FC) | Functional description
FGSG_02034 | -7.1 | alcohol dehydrogenase I - ADH1
FGSG_03162 | -7.01 | formate transport protein
FGSG_04468 | -6.49 | neutral amino acid permease
FGSG_07522 | -6 | conserved hypothetical protein
FGSG_13979 | -6 | conserved hypothetical protein
FGSG_08415 | -5.52 | extracellular invertase
FGSG_07411 | -5.47 | sulphite efflux pump protein
FGSG_01947 | -5.37 | nitrate reductase
FGSG_12890 | -5.17 | related to brt1 protein
FGSG_11413 | -5.14 | alpha methylacyl- racemase
FGSG_03026 | -5.07 |
FGSG_03881 | -4.96 | related to TRI15 - putative transcription factor
FGSG_06537 | -4.85 | neutral amino acid permease
FGSG_08402 | -4.82 | nitrite reductase
FGSG_11992 | -4.82 | related to UDPglucose 4-epimerase
FGSG_08357 | -4.65 | conserved hypothetical protein
FGSG_00817 | -4.63 | conserved hypothetical protein
FGSG_02821 | -4.56 | conserved hypothetical protein
FGSG_11412 | -4.54 | conserved hypothetical protein
FGSG_04458 | -4.52 | probable flavohemoglobin
FGSG_00285 | -4.48 | conserved hypothetical protein
FGSG_02950 | -4.45 | related to neutral amino acid permease
FGSG_05683 | -4.45 | related to monooxygenase
FGSG_11500 | -4.34 | related to pyridoxamine 5'-phosphate oxidase
FGSG_08721 | -4.32 | probable superoxide dismutase [Cu-Zn]
FGSG_03368 | -4.27 | conserved hypothetical protein
FGSG_03838 | -4.27 | related to 3-oxoacyl-[acyl-carrier-protein] reductase
FGSG_02949 | -4.25 | conserved hypothetical protein
FGSG_10647 | -4.21 | methylase involved in ubiquinone menaquinone biosynthesis
FGSG_01234 | -4.15 | MAC1
FGSG_08055 | -4.15 | related to neutral amino acid permease
FGSG_03990 | -4.14 | conserved hypothetical protein
FGSG_11073 | -4.09 | related to aminopeptidase
FGSG_11385 | -4.09 | related to integral membrane protein PTH11
FGSG_10598 | -3.93 | conserved hypothetical protein
FGSG_03430 | -3.87 | related to D-arabinitol 2-dehydrogenase
FGSG_07672 | -3.8 | conserved hypothetical protein
FGSG_03882 | -3.79 | probable ABC1 transport protein
FGSG_11741 | -3.77 | conserved hypothetical protein
FGSG_02901 | -3.72 | related to sulfatase
FGSG_07837 | -3.7 | carbon-nitrogen hydrolase
FGSG_06974 | -3.69 | related to tRNA 2'-phosphotransferase
FGSG_01980 | -3.64 | conserved hypothetical protein
FGSG_09354 | -3.62 | probable neutral amino acid permease
FGSG_09706 | -3.6 | related to positive effector protein GCN20
FGSG_03422 | -3.57 | conserved hypothetical protein
FGSG_07705 | -3.53 | conserved hypothetical protein
FGSG_04892 | -3.52 | retinol dehydrogenase 8
FGSG_04251 | -3.51 | short chain oxidoreductase
FGSG_07832 | -3.5 | related to CCC1 protein (involved in calcium homeostasis)
FGSG_13962 | -3.5 | related to 3-hydroxybutyryl-CoA dehydrogenase

Top down-regulated genes in F. graminearum mac-1 mutant

[Figure: bar chart of frequency (%) by GO description, comparing the whole genome with genes down-regulated in FgMac1]

Guo et al. unpublished

Page 24: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

[Figure: F. verticillioides mac1 x cpka. Labels: cellular transport; lipid metabolism and transport; rRNA synthesis; 6 TFs; 4 TFs]

Page 25: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

How to use transcriptomic data to elucidate the regulatory networks?

• Regulatory interactions between genes induce statistical dependencies between the random variables that represent their expression levels
• Bayesian network models
• Why Bayesian networks?
  - Unknown network structures
  - Microarray data are typically noisy and need a probabilistic model
  - Hidden variables

Page 26: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

Bayesian network
• Representation: graph
• Reasoning: probability theory

[Figure: example Bayesian network with nodes Earthquake, Burglary, Alarm and NeighborCalls]
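To make "representation: graph, reasoning: probability theory" concrete, here is a minimal sketch of the alarm example. It assumes the usual textbook structure (Earthquake and Burglary are parents of Alarm, and Alarm is the parent of NeighborCalls), which the slide residue does not spell out, and every probability in the tables is a made-up number. The joint distribution factorizes along the graph, and a query is answered by summing the joint over the unobserved variables.

```python
from itertools import product

# Hypothetical conditional probability tables (all values are assumptions).
P_E = {True: 0.01, False: 0.99}              # P(Earthquake)
P_B = {True: 0.02, False: 0.98}              # P(Burglary)
P_A_TRUE = {                                  # P(Alarm=True | Earthquake, Burglary)
    (True, True): 0.95, (True, False): 0.30,
    (False, True): 0.90, (False, False): 0.01,
}
P_N_TRUE = {True: 0.80, False: 0.05}         # P(NeighborCalls=True | Alarm)


def bern(p_true, value):
    """Probability of a boolean value given the probability of True."""
    return p_true if value else 1.0 - p_true


def joint(e, b, a, n):
    """Joint probability as the product of each node given its parents."""
    return P_E[e] * P_B[b] * bern(P_A_TRUE[(e, b)], a) * bern(P_N_TRUE[a], n)


def posterior_burglary_given_call():
    """P(Burglary=True | NeighborCalls=True) by exhaustive enumeration."""
    values = [True, False]
    num = sum(joint(e, True, a, True) for e, a in product(values, repeat=2))
    den = sum(joint(e, b, a, True) for e, b, a in product(values, repeat=3))
    return num / den


print("P(Burglary | NeighborCalls) =", round(posterior_burglary_given_call(), 4))
```

Enumeration is only practical for a handful of variables; it is used here because it mirrors the definition directly.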

Page 27: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

Score-based Learning

Data (E, B, A): <Y,N,N>, <Y,Y,Y>, <N,N,Y>, <N,Y,Y>, ..., <N,Y,Y>

[Figure: candidate network structures over the variables E, B and A]

Define a scoring function that evaluates how well a structure matches the data

Search for a structure that maximizes the score
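The two steps above can be sketched on a toy dataset. The score used below is BIC (maximized log-likelihood minus a complexity penalty), chosen as a simple stand-in and not the Bayesian score itself. The eight samples are made up, and the three candidate structures are only examples of what a search might compare.

```python
import math
from collections import Counter

VARS = ("E", "B", "A")

# Toy samples of (E, B, A). These values are hypothetical.
data = [
    ("Y", "N", "N"), ("Y", "Y", "Y"), ("N", "N", "N"), ("N", "Y", "Y"),
    ("N", "N", "N"), ("N", "Y", "Y"), ("Y", "N", "Y"), ("N", "N", "N"),
]


def log_likelihood(structure, data):
    """Maximized log-likelihood; structure maps each variable to its parents."""
    idx = {v: i for i, v in enumerate(VARS)}
    ll = 0.0
    for var, parents in structure.items():
        pairs = Counter(
            (tuple(row[idx[p]] for p in parents), row[idx[var]]) for row in data
        )
        parent_counts = Counter(tuple(row[idx[p]] for p in parents) for row in data)
        for (pa, _value), n in pairs.items():
            ll += n * math.log(n / parent_counts[pa])
    return ll


def bic(structure, data):
    """BIC score, assuming binary variables: one free parameter per parent configuration."""
    n_params = sum(2 ** len(parents) for parents in structure.values())
    return log_likelihood(structure, data) - 0.5 * n_params * math.log(len(data))


candidates = {
    "empty": {"E": (), "B": (), "A": ()},
    "B->A": {"E": (), "B": (), "A": ("B",)},
    "E->A<-B": {"E": (), "B": (), "A": ("E", "B")},
}
scores = {name: bic(s, data) for name, s in candidates.items()}
best = max(scores, key=scores.get)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
print("best structure:", best)
```

Adding parents never lowers the likelihood, so the penalty term is what keeps the densest structure from winning automatically.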

Page 28: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

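Bayesian Score
• The score evaluates the posterior probability of the graph given the data:

$$P(G \mid D) = \frac{P(D \mid G)\,P(G)}{P(D)} \tag{1}$$

Here $P(G \mid D)$ is the posterior, $P(D \mid G)$ the likelihood, $P(G)$ the prior, and $P(D)$ the probability of the data.

[Equations (2) and (3): not recoverable]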
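For reference, a common way to turn the posterior into a usable score is sketched below. This is the standard textbook form and is not necessarily what the slide's remaining equations showed. Because $P(D)$ does not depend on $G$, it can be dropped when comparing structures, and the likelihood is obtained by integrating over the network parameters $\theta$ (a symbol introduced here for illustration).

```latex
\begin{align*}
\log P(G \mid D) &= \log P(D \mid G) + \log P(G) - \log P(D) \\
\mathrm{Score}(G : D) &= \log P(D \mid G) + \log P(G) \\
P(D \mid G) &= \int P(D \mid G, \theta)\, P(\theta \mid G)\, d\theta
\end{align*}
```

Maximizing $\mathrm{Score}(G : D)$ over graphs is then equivalent to maximizing the posterior $P(G \mid D)$.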

Page 29: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

Using Prior Biological Knowledge

Pe’er et al. 2006

Page 30: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

Problem Definition

• Input
  - 10 samples of gene expression levels under different conditions, covering 13,331 genes
  - List of candidate regulators (TFs and SPs)
• Output
  - List of the most activated regulators (k = 30)
  - Bayesian network for the genes

Page 31: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

Learning Bayesian Networks
• Parameter Estimation
• Structure Learning

Page 32: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

Structure Learning

• Using a standard greedy hill-climbing heuristic, each iteration picks the candidate regulator that gives the highest score among all remaining candidates and adds it to the set of output regulators.

Before the first iteration: 294 candidate regulators; output regulators: empty
After the first iteration: 293 candidate regulators; output regulators: Gat1
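The greedy loop can be sketched as follows. The score here is a deliberately simple stand-in (each target gene is credited with its best absolute Pearson correlation to any selected regulator), not the Bayesian score used in the talk, and the regulator names, target names and expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def score(selected, regulators, targets):
    """Stand-in score: sum over targets of the best |correlation| to a selected regulator."""
    if not selected:
        return 0.0
    return sum(
        max(abs(pearson(regulators[r], t)) for r in selected)
        for t in targets.values()
    )


def greedy_select(regulators, targets, k):
    """Add one regulator per iteration, stopping at k or when the score stops improving."""
    selected, candidates = [], set(regulators)
    current = score(selected, regulators, targets)
    while candidates and len(selected) < k:
        best = max(
            sorted(candidates),
            key=lambda r: score(selected + [r], regulators, targets),
        )
        best_score = score(selected + [best], regulators, targets)
        if best_score <= current:
            break
        selected.append(best)
        candidates.remove(best)
        current = best_score
    return selected


# Hypothetical expression profiles over five conditions.
regulators = {
    "reg1": [1, 2, 3, 4, 5],
    "reg2": [5, 1, 4, 2, 3],
    "reg3": [2, 2, 3, 3, 2],
}
targets = {
    "t1": [2, 4, 6, 8, 10],
    "t2": [10, 2, 8, 4, 6],
    "t3": [1.1, 2.0, 2.9, 4.2, 5.1],
}
print(greedy_select(regulators, targets, k=2))
```

Each iteration moves one regulator from the candidate pool to the output list, matching the 294 to 293 step shown on the slide. Greedy search can settle on a local optimum, which is the usual trade-off for its low cost.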

Page 33: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

[Figure: example predicted network with nodes A, B, C, D, X, Y and Z]

Predicted networks will be validated against existing biological knowledge.

Page 34: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

Summary
• Regulation of gene expression determines the cell state and the response to the environment
• Transcriptomics studies gene expression at given times and in given cell types on a global scale
• Gene expression (transcript) levels are measured by microarray and RNA sequencing technology
• Connections between genes and their expression regulators, such as transcription factors and signaling molecules, can be revealed by probabilistic graphical models such as Bayesian networks

Page 35: RECONSTRUCTING GENE REGULATORY NETWORKS FROM …

Acknowledgment
• UMass Amherst
  - Dr. Li-Jun Ma
  - Dr. Lixin Gao
  - Guoyi Zhao
  - Andy Berg
  - Jiangtao Yin
• Collaborators:
  - Corby Kistler (University of Minnesota)
  - Jin-Rong Xu (Purdue University)