NetBioSIG2014-Talk by Ashwini Patil
TimeXNet: Identifying active gene sub-networks using time-course gene expression profiles
Ashwini Patil
Institute of Medical Science, University of Tokyo
NetBio SIG, ISMB 2014
Goal
• Comprehensive computational analysis of the innate immune response
Mouse interaction network
• 103,218 interactions: protein-protein, protein-DNA, post-translational modifications
Time-course gene expression
• RNA-seq expression levels in dendritic cells on LPS stimulus at 8 time points
Method - TimeXNet
• Partition differentially expressed genes into 3 time-based groups
• Identify the most probable paths in the network connecting the three groups
- Patil et al., PLOS Comp. Biol., 2013
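
The partitioning step can be illustrated with a minimal Python sketch. It assumes each gene is assigned to the group whose time window contains its maximal fold change, using the early (0.5-1 hour), intermediate (2-4 hours) and late (6-8 hours) windows from the candidate-gene slide. The function name, the `min_fold_change` threshold of 2.0, the handling of up-regulation only, and the toy data are hypothetical and not taken from the talk.

```python
# Time windows in hours, as listed on the candidate-gene slide.
WINDOWS = {"early": (0.5, 1.0), "intermediate": (2.0, 4.0), "late": (6.0, 8.0)}


def partition_by_peak_time(fold_changes, windows=WINDOWS, min_fold_change=2.0):
    """Group genes by the time window in which their fold change peaks.

    fold_changes: {gene: {time_in_hours: fold_change}}
    min_fold_change: hypothetical cut-off for calling a gene differentially expressed.
    """
    groups = {name: [] for name in windows}
    for gene, series in fold_changes.items():
        peak_time, peak_fc = max(series.items(), key=lambda item: item[1])
        if peak_fc < min_fold_change:
            continue  # not differentially expressed under this cut-off
        for name, (start, end) in windows.items():
            if start <= peak_time <= end:
                groups[name].append(gene)
                break
    return groups


# Toy fold-change profiles (made up for illustration).
toy = {
    "GeneA": {0.5: 1.2, 1.0: 5.0, 2.0: 2.0, 4.0: 1.1, 6.0: 1.0, 8.0: 1.0},
    "GeneB": {0.5: 1.0, 1.0: 1.3, 2.0: 4.2, 4.0: 3.0, 6.0: 1.5, 8.0: 1.1},
    "GeneC": {0.5: 1.0, 1.0: 1.0, 2.0: 1.2, 4.0: 2.0, 6.0: 3.5, 8.0: 6.0},
    "GeneD": {0.5: 1.1, 1.0: 1.0, 2.0: 1.1, 4.0: 1.0, 6.0: 1.1, 8.0: 1.0},
}
groups = partition_by_peak_time(toy)
print(groups)
```

GeneD never exceeds the cut-off, so it is left out of all three groups.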
Minimum cost flow optimization
• ResponseNet
  • Identifies paths between two groups of genes (genetic hits and differentially expressed genes in yeast)
  - Yeger-Lotem et al., Nat. Genetics, 2009
TimeXNet methodology
• Edge cost: inversely proportional to edge reliability
• Edge capacity: directly proportional to
  • Fold change in expression of adjacent gene(s)
  • Absolute tag counts of adjacent gene(s)
• Objective function
  • Minimize cost of flow through the network from T1 to T3 genes
• Constraint
  • Flow must pass through intermediate nodes (T2 genes)
• Most probable paths connecting T1 -> T2 -> T3 genes
• 2681 scored interactions among 1225 proteins
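
To make the idea of minimum cost flow concrete, here is a small self-contained Python sketch. It is not TimeXNet's implementation: the slides indicate that TimeXNet relies on GLPK, whereas this sketch uses a generic successive-shortest-path algorithm, and it omits the gamma parameters. The T2 constraint is only mimicked by the layered toy graph, in which every path from S to T crosses an intermediate gene. The gene names are borrowed from the slides as labels, but the capacities and costs are made up.

```python
from collections import defaultdict
from math import inf


def min_cost_flow(edges, source, sink, demand):
    """Send up to `demand` units from source to sink at minimum total cost.

    edges: list of (u, v, capacity, cost) directed edges.
    Uses successive shortest augmenting paths found with Bellman-Ford.
    Returns (flow_sent, total_cost, flow_on_each_edge).
    """
    head, cap, cost = [], [], []  # residual arcs; arc k and k ^ 1 form a pair
    adj = defaultdict(list)
    for u, v, c, w in edges:
        adj[u].append(len(head)); head.append(v); cap.append(float(c)); cost.append(float(w))
        adj[v].append(len(head)); head.append(u); cap.append(0.0); cost.append(-float(w))
    nodes = list(adj)
    sent = total = 0.0
    while sent < demand:
        dist, prev = {source: 0.0}, {}
        for _ in range(len(nodes)):  # Bellman-Ford relaxation rounds
            changed = False
            for u in nodes:
                if u not in dist:
                    continue
                for k in adj[u]:
                    if cap[k] > 1e-12 and dist[u] + cost[k] < dist.get(head[k], inf) - 1e-12:
                        dist[head[k]] = dist[u] + cost[k]
                        prev[head[k]] = k
                        changed = True
            if not changed:
                break
        if sink not in dist:
            break  # no augmenting path left
        push, v = demand - sent, sink
        while v != source:  # bottleneck capacity along the path
            push = min(push, cap[prev[v]])
            v = head[prev[v] ^ 1]
        v = sink
        while v != source:  # update residual capacities
            k = prev[v]
            cap[k] -= push
            cap[k ^ 1] += push
            v = head[k ^ 1]
        sent += push
        total += push * dist[sink]
    flows = {(u, v): cap[2 * i + 1] for i, (u, v, _, _) in enumerate(edges)}
    return sent, total, flows


# Toy layered network: S -> early -> intermediate -> late -> T (values are hypothetical).
toy_edges = [
    ("S", "Jun", 2, 0.0),
    ("Jun", "Socs3", 1, 1.0),
    ("Jun", "Jak2", 1, 3.0),
    ("Socs3", "Cxcl10", 1, 1.0),
    ("Jak2", "Cxcl10", 1, 1.0),
    ("Cxcl10", "T", 2, 0.0),
]
sent, total, flows = min_cost_flow(toy_edges, "S", "T", demand=2)
print(sent, total, flows)
```

The cheaper path through Socs3 is saturated first, and the remaining unit of flow is routed through Jak2.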
Candidate genes
Early genes       Intermediate genes   Late genes        Genes with no change
(0.5-1 hour)      (2-4 hours)          (6-8 hours)       in expression
Gene    Flow      Gene     Flow        Gene    Flow      Gene    Flow
Jun     13.68     Socs3    85.85       Cxcl10  10.91     Stat1   8.74
Fos     10.34     Nfκb1    76.87       Ddx58   9.33      Mapk8   8.72
Il1b    9.86      Jak2     54.44       Stat2   8.65      Irf5    7.60
Tnf     9.36      Src      38.30       Atf3    8.29      Adcy5   7.43
Cxcl2   7.59      Pik3r5   27.86       Isg15   8.15      Mapk1   7.40
Il1a    7.40      Rela     23.35       Irf7    7.30      Sp1     7.37
Akt1    6.43      Stat5a   20.40       Nos2    6.91      Stat6   7.17
Atf4    5.49      Met      18.94       Ifnar2  5.20      Sp3     7.13
Candidate networks
[Figure: predicted sub-network centred on Socs3; node labels include interleukin and interferon receptors (e.g. Ifnar1, Ifnar2, Ifngr1, Ifngr2, Il6ra, Il10rb, Lifr, Cntfr, Csf2rb), Jak2, Stat1, Stat2, Stat4, Stat5a, Stat5b, Stat6, Nfkb1, Rela, Jun, Fos, Ddx58 and Sp3]
• Socs3
  • Suppressor of cytokine signaling 3
  • Induced by Nfkb and inhibits a large number of proteins, specifically the interleukin receptors
Method evaluation
• Comparison with experimentally identified regulators
  • Amit et al., Science 2009: 49.6% previously unknown genes identified
  • Chevrier et al., Cell 2011: 69.8% regulators (novel and known) and 54.9% TLR target genes identified
• Overlap with KEGG pathways
  • Directed paths of 3 to 7 edges identified in 13 KEGG pathways
  • Jak-STAT signaling pathway, Chemokine signaling pathway, Toll-like receptor pathway, MAPK signaling pathway
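
The percentages above are presumably the fraction of each reference gene set that appears in the predicted network. A minimal sketch of that calculation follows. The function name and both gene sets are hypothetical, and the actual reference lists from Amit et al. and Chevrier et al. are not reproduced here.

```python
def percent_recovered(predicted_genes, reference_genes):
    """Percentage of the reference set that is present among the predicted genes."""
    reference = set(reference_genes)
    if not reference:
        raise ValueError("reference set is empty")
    return 100.0 * len(reference & set(predicted_genes)) / len(reference)


# Hypothetical gene sets, for illustration only.
predicted = {"Stat1", "Socs3", "Jak2", "Rela"}
known_regulators = {"Stat1", "Rela", "Irf7", "Nfkb1"}
recovered = percent_recovered(predicted, known_regulators)
print(f"{recovered:.1f}% of reference regulators recovered")
```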
Comparison with other methods
Method         Experimentally confirmed regulators   KEGG pathways with predicted   Execution time   Prior knowledge   Time-course
               (3 datasets)                          paths (max length)                              required          data
               (1)       (2)       (3)
TimeXNet       49.6%     69.8%     54.9%             13 (7 edges)                   3 min            None              Yes
ResponseNet*   39.2%     53.5%     39.2%             0 (3 edges)                    1 min            None              No
SDREM          12.0%     32.6%     11.8%             2 (4 edges)                    ~10 days         Initial genes     Yes
(1) Regulatory genes from Amit et al., Science, 2009
(2) Regulatory genes from Chevrier et al., Cell, 2011
(3) Target genes from Chevrier et al., Cell, 2011
Execution time measured on 4 CPUs, 2.4 GHz, 12 GB RAM
* Local implementation using GLPK
Yeast osmotic stress response
• Time-course gene expression (min) in yeast on hyperosmotic stress
  - Romero-Santacreu et al., RNA 2009
• Previously used to evaluate SDREM and ResponseNet
  - Gitter et al., Genome Research 2013
• Genes with 1.5 fold change in expression
  • Initial response genes: 2-4 min
  • Intermediate regulators: 6-8 min
  • Final effectors: 10-15 min
Predicted osmotic stress response network
[Figure: predicted network; legend entries: 2-4 min, 6-8 min, 10-15 min, Predicted]
Method         Gold standard*   TFs*   Hog1   Runtime
TimeXNet       19               5      Yes    5 sec
SDREM*         10               4      Yes    -
ResponseNet*   3                2      No     -
* Taken from Gitter et al., Genome Research 2013
Circadian regulation of metabolism in mouse liver cells
- Unpublished
• Paths connecting genes showing rhythmic patterns of expression over 24 hours
• Network predicted by TimeXNet contains Sphk2, Pld1, Pld2, Glud1
Running TimeXNet
• Standalone application
• Command line version
• Iterative command line version to identify optimal parameters
• Input
  • 3 sets of genes with scores
  • Weighted interaction network
  • Parameters gamma1 and gamma2
  • Location of the glpsol executable from GLPK
  • Directory where results will be stored
[Figure label: Cytoscape]
- Patil & Nakai, under review
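
Since TimeXNet asks for the location of the glpsol executable, a flow problem is presumably handed to GLPK as a linear program. The sketch below shows one generic way to do that: it writes a tiny minimum cost flow problem in CPLEX LP format and builds a glpsol command line using the `--lp` and `-o` options. The file names, the `flow_lp` helper and the toy network are hypothetical; the files TimeXNet actually writes and the way it invokes glpsol are not described in the slides.

```python
def flow_lp(edges, source, sink, demand):
    """Return a CPLEX LP-format string for a small minimum cost flow problem.

    edges: list of (u, v, capacity, cost); node names must be valid LP identifiers.
    """
    var = {(u, v): f"f_{u}_{v}" for u, v, _, _ in edges}
    objective = " + ".join(f"{cost} {var[u, v]}" for u, v, _, cost in edges)
    lines = ["Minimize", f" cost: {objective}", "Subject To"]
    nodes = sorted({n for u, v, _, _ in edges for n in (u, v)})
    for n in nodes:  # flow conservation: outflow - inflow = net supply
        out_terms = [f"+ {var[u, v]}" for u, v, _, _ in edges if u == n]
        in_terms = [f"- {var[u, v]}" for u, v, _, _ in edges if v == n]
        rhs = demand if n == source else -demand if n == sink else 0
        lines.append(f" bal_{n}: " + " ".join(out_terms + in_terms) + f" = {rhs}")
    lines.append("Bounds")
    lines += [f" 0 <= {var[u, v]} <= {cap}" for u, v, cap, _ in edges]
    lines.append("End")
    return "\n".join(lines) + "\n"


# Toy network with two alternative routes from S to T (values are hypothetical).
toy_edges = [("S", "g1", 1, 1), ("S", "g2", 1, 3), ("g1", "T", 1, 1), ("g2", "T", 1, 1)]
lp_text = flow_lp(toy_edges, "S", "T", demand=1)

glpsol_path = "glpsol"  # assumption: replace with the real location of the executable
command = [glpsol_path, "--lp", "toy_flow.lp", "-o", "toy_flow.sol"]
print(lp_text)
print(" ".join(command))
```

Writing `lp_text` to `toy_flow.lp` and running the printed command would let glpsol solve the problem and write a plain-text solution report to `toy_flow.sol`.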
Conclusion
• TimeXNet: A method to predict active gene sub-networks using time-course gene expression profiles
• Advantages
  • Accurate and fast
  • Independent of biological system: innate immune response, circadian regulation of metabolism in mouse, yeast osmotic stress response
  • Amenable to incorporation of other time-course data types: phosphorylation levels, protein levels, epigenetic information
• Issues to be addressed
  • Allowing path prediction between more than 3 groups of genes while maintaining speed and accuracy
  • Incorporating other forms of time-course information
  • Enhancements: automatic install of GLPK, allowing users to enter non-numeric gene IDs
- Patil et al., PLOS Comp. Biol., 2013
Acknowledgements
• Innate immune response
  • Prof. Kenta Nakai - University of Tokyo
  • Dr. Yutaro Kumagai - Osaka University
  • Dr. Kuo-ching Liang - University of Tokyo
  • Prof. Yutaka Suzuki - University of Tokyo
  • Dr. Tomonao Inobe - Toyama University
• Yeast osmotic stress response
  • Dr. Anthony Gitter - Microsoft Research
• Circadian regulation of metabolism
  • Dr. Craig Jolley - RIKEN Center for Developmental Biology, Kobe
• Funding
  • Japan Society for the Promotion of Science (JSPS) FIRST Program
  • JSPS Grant-in-Aid for Young Scientists
  • Takeda Science Foundation (with Dr. Tomonao Inobe)
• Computational resources
  • Supercomputer at the Human Genome Center, Institute of Medical Science, University of Tokyo
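Edge Capacities
For edges between the auxiliary source, S, and the initial response genes GT1,
\[
C_{Si} = \log_2\!\left( \frac{fc_{i\max}}{\sum_i fc_{i\max}/N} \cdot \frac{\bar{e}_i}{\sum_i \bar{e}_i/N} \right) \quad \forall\, i \in G_{T1} \tag{3}
\]
For edges connected to the intermediate regulators GT2,
\[
C_{ij} = \log_2\!\left( \frac{fc_{i\max}}{\sum_i fc_{i\max}/N} \cdot \frac{\bar{e}_i}{\sum_i \bar{e}_i/N} \right) \quad \forall\, i \in G_{T2},\ j \notin G_{T2} \tag{4}
\]
\[
C_{ij} = \frac{1}{2}\left[ \log_2\!\left( \frac{fc_{i\max}}{\sum_i fc_{i\max}/N} \cdot \frac{\bar{e}_i}{\sum_i \bar{e}_i/N} \right) + \log_2\!\left( \frac{fc_{j\max}}{\sum_j fc_{j\max}/N} \cdot \frac{\bar{e}_j}{\sum_j \bar{e}_j/N} \right) \right] \quad \forall\, i, j \in G_{T2} \tag{5}
\]
For edges between the late effectors, GT3, and the auxiliary sink T,
\[
C_{iT} = \log_2\!\left( \frac{fc_{i\max}}{\sum_i fc_{i\max}/N} \cdot \frac{\bar{e}_i}{\sum_i \bar{e}_i/N} \right) \quad \forall\, i \in G_{T3} \tag{6}
\]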
• Graph G = (V, E) with E edges and V nodes (containing S, the auxiliary source, and T, the auxiliary sink)
• fc = fold change
• ē = average expression level at all time points
• N = number of genes with expression values
• S = auxiliary source node
• T = auxiliary sink node
• GT1, GT2, GT3 = genes having maximal fold change at times T1, T2 and T3
For all other edges, not connected to the intermediate regulators or the auxiliary source and sink,
\[
C_{ij} = 1 \quad \forall\, i, j \notin \{S\} \cup G_{T2} \cup \{T\} \tag{7}
\]
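
As a worked illustration of the capacity definitions, the sketch below computes a per-gene capacity term under one reading of the slide: the log2 of the product of the mean-normalised maximal fold change and the mean-normalised average expression, with means taken over all N genes that have expression values. That reading, the function names and all numbers are assumptions made for illustration.

```python
import math


def gene_capacity_terms(max_fold_change, mean_expression, genes):
    """Per-gene capacity term, assuming log2(normalised fc * normalised expression).

    max_fold_change, mean_expression: {gene: value} for all N genes with expression values.
    genes: the genes for which a capacity term is wanted.
    """
    n = len(max_fold_change)
    mean_fc = sum(max_fold_change.values()) / n
    mean_e = sum(mean_expression.values()) / n
    return {
        g: math.log2((max_fold_change[g] / mean_fc) * (mean_expression[g] / mean_e))
        for g in genes
    }


def pair_capacity(terms, gene_i, gene_j):
    """Capacity of an edge joining two intermediate genes: mean of the two terms."""
    return (terms[gene_i] + terms[gene_j]) / 2


# Toy data: two responsive genes plus four background genes (values are made up).
fc = {"gA": 4.0, "gB": 2.0, "bg1": 1.5, "bg2": 1.5, "bg3": 1.5, "bg4": 1.5}
expr = {"gA": 40.0, "gB": 40.0, "bg1": 10.0, "bg2": 10.0, "bg3": 10.0, "bg4": 10.0}
terms = gene_capacity_terms(fc, expr, ["gA", "gB"])
print(terms, pair_capacity(terms, "gA", "gB"))
```

Here the mean fold change is 2 and the mean expression is 20, so gA gets log2(2 x 2) = 2 and gB gets log2(1 x 2) = 1.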
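Edge costs
\[
w_{Si} = C_{Si} \quad \forall\, i \in G_{T1} \tag{8}
\]
\[
w_{ij} = C_{ij} \quad \forall\, i \in G_{T2} \tag{9}
\]
\[
w_{iT} = C_{iT} \quad \forall\, i \in G_{T3} \tag{10}
\]
\[
w_{ij} = f(s_{ij}) \quad \forall\, i, j \notin \{S\} \cup G_{T2} \cup \{T\}, \text{ as per equation (2)}
\]
The edge costs were calculated as:
\[
A_{ij} = -\log_{10} w_{ij} \quad \forall\, (i, j) \in E \tag{11}
\]
Where f() = scaling function
\[
s_{ij} = \text{likelihood ratio} \quad \forall\, (i, j) \in \text{HitPredict}; \quad 0.163 \le s_{ij} \le 999
\]
\[
s_{ij} = 999 \quad \forall\, (i, j) \in \text{InnateDB, KEGG}
\]
\[
s_{ij} = \text{TRANSFAC score} \quad \forall\, (i, j) \in \text{TRANSFAC}; \quad 1 \le s_{ij} \le 6
\]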
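
The effect of a log-based cost can be seen in a short sketch. It assumes the cost is the negative log10 of the edge weight, so that more reliable edges are cheaper, and it uses a hypothetical scaling function that divides a HitPredict-style score by its maximum of 999. The scaling function actually used by TimeXNet is defined in equation (2), which is not shown in these slides, and TRANSFAC scores (1 to 6) would need their own scaling.

```python
import math


def scale(score, max_score=999.0):
    """Hypothetical scaling function f(): map a score into the interval (0, 1]."""
    return score / max_score


def edge_cost(weight):
    """Cost as the negative log10 of the weight (assumed form of the cost equation)."""
    return -math.log10(weight)


# Scores spanning the HitPredict range quoted on the slide.
for score in (999.0, 99.9, 0.163):
    print(score, round(edge_cost(scale(score)), 3))
```

Under these assumptions the most reliable interactions cost 0, and each tenfold drop in score adds 1 to the cost.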