June 24th, ENS, Lyon Kami (Knowledge Aggregator and Model...
1
Kami (Knowledge Aggregator and Model Instantiator): a graph rewriting pipeline from the biological literature to rule-based models
Adrien Basso-Blandin, Russell Harmer
June 24th, ENS, Lyon
2
Biological knowledge
Problem:
• Large and complex systems
• Multiple interaction levels
• Fragmented literature and data
• Systems studied piecewise, in specific contexts
• Inconsistent data in some cases
Objectives:
• Understand biological processes globally
• Aggregate knowledge into large mechanistic models
• Infer new knowledge from models, graph analysis and simulations
3
Big Mechanism
[Diagram: knowledge sources (literature, databases, data fragments) go through extraction to give biological data (nuggets); aggregation builds a knowledge representation from the nuggets; compilation turns that representation into simulations.]
4
Knowledge representation pipeline
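[Diagram: pipeline overview. Labels shown: Nuggets, Action graph, Mechanistic rules, Model constraints, Logical contact graph + meta-data, Instantiation rules, Multiple instances of the model (site graph), Simulation. The illustrative graph has nodes labelled A, B, K, M, y445 and p, with unknown elements marked ?.]
• Semantic check (C. Froidevaux)
• Inference (F. Fages)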
5
Knowledge representation pipeline
[Diagram: the pipeline annotated with the formalisms involved. Stages shown: Nuggets, Action graph, Logical contact graph, Ontology, Mechanistic rules, Model constraints, Instantiation rules, Simulation. Arrows are labelled projection, compilation, projection and translation.]
Annotations on the diagram:
• Meta-model: layer graph; graph rewriting rules
• Graph patterns (?); description logic (DL)
• Meta-model: layer graph; nuggets projection
• Meta-model: site graph; nuggets projection
6
Nuggets
8
Biological nuggets
1. “EGFR binds the SH2 domain of Grb2 provided that EGFR is phosphorylated and residue 90 of Grb2 is a serine”
2. “Shc binds Grb2 on the SH2 domain provided that Shc has a phosphorylated tyrosine”
3. “Grb2 can dimerize”
4. “Shc tyrosine phosphorylation has a positive influence on Grb2 dimerization”
5. “Shc concentration has a negative influence on EGFR – Grb2 binding”
Rule-based modeling with Kappa:
• Site graphs
• Graph rewriting rules
• Modeling of protein-protein interactions
• Complete signature of each agent (protein)
#### Signatures
%agent: EGFR(s2~u~p,s3)
%agent: Shc(y~u~p,s1)
%agent: Grb2(sh2,s4,s5)
#### Rules
# Rate constants (after '@') are not specified here.
EGFR(s2~p,s3),Grb2(sh2) -> EGFR(s2~p,s3!1),Grb2(sh2!1)@
Shc(y~p,s1),Grb2(sh2) -> Shc(y~p,s1!1),Grb2(sh2!1)@
Shc(y~u) -> Shc(y~p)@
Grb2(s4),Grb2(s5) -> Grb2(s4!1),Grb2(s5!1)@
#### Variables
#### Initial conditions
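[Diagram: the nuggets drawn as graphs. EGFR and Grb2 are linked by a Bnd (binding) node on the sh2 site of Grb2, with a phosphorylation flag (P:ph) and residue S90 (aa:S, Loc:90). Shc carries a residue y (aa:y) with flag P:ph and is attached to Bnd and Mod (modification) nodes. Edges marked + and - represent the positive and negative influences.]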
9
Biological nuggets
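[Figure: the five nuggets drawn as numbered graphs.
1: EGFR and Grb2 joined by a Bnd node on sh2, with flag P:ph and residue S90 (aa:S, Loc:90).
2: Shc (residue y, aa:y, flag P:ph) and Grb2 joined by a Bnd node on sh2.
3: Grb2 and Grb2 joined by a Bnd node.
4: a + edge relating Shc (residue y, P:ph, Mod) to a Grb2 Bnd node.
5: a - edge relating Shc to the EGFR-Grb2 Bnd node.
The graphs are labelled as either partial mechanistic information or influence (correlation).]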
10
Meta-model
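[Diagram: meta-model graph. Labels shown: Agent, reg, res, s, flag, st, int, loc, aa, Is_bnd, bnd_rc, brk_rc, mod_rc, and the action nodes BND and MOD. Value domains shown: ∈ [a-z], ∈ {u, p}, ∈ Bool, ∈ ℝ.]
Flags ≠ attributes:
• Flags impact the system during simulation
• Attributes impact the system during instantiation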
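One way to read a meta-model like this is as a typing graph: every node of a nugget is mapped to a meta-model node, and every nugget edge has to map onto a meta-model edge. The Python sketch below illustrates that check only. The edge set `META_EDGES` and the helper `is_well_typed` are assumptions introduced for illustration, since the slide does not spell out which edges the meta-model allows.

```python
# Hypothetical meta-model edges (source type, target type).
# These are assumed for the example, not taken from the slide.
META_EDGES = {
    ("reg", "agent"),   # a region belongs to an agent
    ("res", "agent"),   # a residue belongs to an agent
    ("res", "reg"),     # ... or to a region
    ("flag", "res"),    # a flag (e.g. phosphorylation) sits on a residue
    ("agent", "BND"),   # agents and regions can take part in a binding
    ("reg", "BND"),
    ("MOD", "flag"),    # a modification acts on a flag
}


def is_well_typed(typing, edges):
    """Return True if every edge maps onto a meta-model edge.

    typing: node name -> meta-model type
    edges:  iterable of (source node, target node)
    """
    return all((typing[s], typing[t]) in META_EDGES for s, t in edges)


# A heavily simplified version of the EGFR-Grb2 binding nugget.
typing = {"EGFR": "agent", "Grb2": "agent", "sh2": "reg", "bnd": "BND"}
edges = [("sh2", "Grb2"), ("EGFR", "bnd"), ("sh2", "bnd")]

print(is_well_typed(typing, edges))              # True
print(is_well_typed(typing, [("Grb2", "sh2")]))  # False: agent -> reg is not allowed
```

With this reading, a nugget that attaches an agent to a region (instead of the reverse) is rejected before it ever reaches the aggregation step.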
12
Graph rewriting rules
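Examples:
• Rule for complexes
• Rule for families
[Diagrams: the rule for complexes involves a node Ac_complex and the agents A and C. The rule for families involves a family node F:A with members A1, A2 and A3, shown next to G, C and T; the other side of the rule shows F:A with G, C and T directly.]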
14
Constraints on the model
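[Diagram: the nugget graph for EGFR, Grb2 (site sh2, residue S90 with aa:S and Loc:90) and Shc (residue y with aa:y and flag P:ph), with its Bnd and Mod nodes. The + and - influence edges are the constraints on the model.]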
16
Biological meta-data
[Diagram: the same nugget graph (EGFR, Grb2, Shc, Bnd and Mod nodes, + and - edges) annotated with meta-data. Annotations shown: t:sh2Bnd (twice), T:phos, Rate:2,1.]
17
Nuggets aggregation
19
“Hand-made” knowledge aggregation
[Diagram: action graph for EGFR, Grb2 and Shc, with Bnd and Mod nodes, residue S90 (aa:S, Loc:90), residue y (aa:y, P:ph) and the + and - influence edges.]
1. Nugget aggregation and correctness
2. Diversity of knowledge (non-homogeneous)
3. Sparse models
20
Semantic check
[Diagram: examples of checks. Labels shown: Mod and Binds actions with source s and target t on agents B, C and D; type:phos, Phos:1; type:kinase; agent K[1-122]; regions [50-52], [40-52] and [30-47]; residue 51 with Aa:y; unknown elements marked ?; a # marker.]
• Verification over amino acids
• Verification of sequence inclusion
• Verification of action type / coherence
  • Over binds
  • Over modification actions
• Verification of sequence overlap / edge conflicts
• Domain-specific action aggregation
! Work in progress !
• Ontology representing:
  • The formal model
  • Biological constraints
• DL constraints for structural integrity and automated inferences
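The sequence checks listed on this slide (inclusion and overlap) reduce to comparisons between inclusive intervals. The Python sketch below reuses the numbers that appear on the diagram (K[1-122], [50-52], residue 51, [40-52], [30-47]). The function names `contains` and `overlaps` are illustrative and are not taken from Kami.

```python
from typing import Tuple

Interval = Tuple[int, int]  # inclusive (start, end) positions on a sequence


def contains(outer: Interval, inner: Interval) -> bool:
    """Sequence inclusion: `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def overlaps(a: Interval, b: Interval) -> bool:
    """Sequence overlap: the two intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


agent_k = (1, 122)   # K[1-122]
region = (50, 52)    # [50-52]
residue = (51, 51)   # residue 51, written as a one-position interval

print(contains(agent_k, region))      # True: the region fits inside the agent
print(contains(region, residue))      # True: the residue fits inside the region
print(overlaps((40, 52), (30, 47)))   # True: positions 40-47 are shared
print(overlaps((30, 47), region))     # False: 47 < 50
```

Two binding actions whose regions overlap are candidates for an edge conflict; whether the overlap is an actual conflict is a biological question that the interval test alone cannot settle.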
21
Action aggregation
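• EGFR binds Grb2 on the SH2 domain provided EGFR is phosphorylated on Y1092
• Tyrosine-phosphorylated Shc binds the SH2 domain of Grb2
[Diagram: the two nuggets aggregated around Grb2 and its SH2 domain. EGFR carries residue Y (aa:y, Loc:1092, Phos:1) and SHC carries residue Y (aa:y, Phos:1); each is attached to a BND action with source s. Further labels shown: aa:S, pY, Loc:90, Phos:1.]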
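A minimal sketch of the aggregation idea, assuming each nugget is a set of edges over named nodes and that nodes with the same name denote the same biological entity. That identification rule is a simplifying assumption: the deck does not describe how Kami decides that two nodes should be merged, and the node names below are made up for the example.

```python
def aggregate(nuggets):
    """Union the nodes and edges of several nuggets.

    Nodes with identical names are merged, so the shared Grb2 SH2 domain
    appears only once in the result.
    """
    nodes, edges = set(), set()
    for nugget in nuggets:
        for source, target in nugget:
            nodes.update((source, target))
            edges.add((source, target))
    return nodes, edges


# An edge (x, y) reads "x is attached to, or takes part in, y".
nugget_egfr = {
    ("EGFR.Y1092", "EGFR"),
    ("EGFR", "bnd_1"),
    ("Grb2.SH2", "Grb2"),
    ("Grb2.SH2", "bnd_1"),
}
nugget_shc = {
    ("Shc.Y", "Shc"),
    ("Shc", "bnd_2"),
    ("Grb2.SH2", "Grb2"),
    ("Grb2.SH2", "bnd_2"),
}

nodes, edges = aggregate([nugget_egfr, nugget_shc])
print(len(nodes), len(edges))  # 8 7
```

The two nuggets contribute four edges each, but the edge from the SH2 domain to Grb2 is shared, so the aggregated graph has seven edges and a single SH2 node with two binding actions attached.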
22
Compilation
24
Logical contact graph
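A logical graph:
• Actions
• Agents
• Mechanistic edges
• Key residue/region instantiation
• Virtual sites added
[Diagram: two panels, the nuggets model and the logical contact graph. Both show EGFR, Grb2 (site sh2, residue S90 with aa:S and Loc:90) and Shc (residue y with aa:y and P:ph) with their Bnd and Mod nodes. Unknown elements are marked ? and are resolved by user choice.]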
25
Attributes instantiation
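[Diagram: the same graph as on the previous slide (EGFR, Grb2, Shc, Bnd and Mod nodes, unknown elements marked ? and resolved by user choice), now with attribute domains to instantiate: Aa:{y,t,u} and Loc:[75,95].]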
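To make the attribute domains concrete, the sketch below enumerates every candidate value pair for the two attributes shown on the slide. It assumes that `Loc:[75,95]` is an inclusive integer range and that an instantiation picks one value per attribute; the deck only says the choice is left to the user, so the enumeration is an illustration of the search space, not Kami's procedure.

```python
from itertools import product


def instantiations(amino_acids, loc_range):
    """Return every (amino acid, location) pair allowed by the two domains."""
    low, high = loc_range
    return list(product(sorted(amino_acids), range(low, high + 1)))


candidates = instantiations({"y", "t", "u"}, (75, 95))

print(len(candidates))  # 63: 3 amino-acid codes x 21 positions
print(candidates[0])    # ('t', 75)
```

Even this small example gives 63 candidate instances, which is one reason the pipeline produces multiple instances of a model rather than a single one.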
26
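From LCG to simulation
[Diagram: logical contact graph with EGFR (sites s2 with P:p, s3), Grb2 (sites sh2, s4, s5) and Shc (sites y with P:p, s1), connected through Bnd and Mod nodes.]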
#### Signatures
%agent: EGFR(s2~u~p,s3)
%agent: Shc(y~u~p,s1)
%agent: Grb2(sh2,s4,s5)
#### Rules
EGFR(s2~p,s3),Grb2(sh2) -> EGFR(s2~p,s3!1),Grb2(sh2!1)@
Shc(y~p,s1),Grb2(sh2) -> Shc(y~p,s1!1),Grb2(sh2!1)@
Shc(y~u) -> Shc(y~p)@
Grb2(s4),Grb2(s5) -> Grb2(s4!1),Grb2(s5!1)@
#### Variables
#### Initial conditions
Nodes involved in each rule:
Egfr,Egfr.s2,Egfr.s2.p=p,Egfr.s3,Grb2,Grb2.sh2
Shc,Shc.y,Shc.y.p=p,Shc.s1,Grb2,Grb2.sh2
Grb2,Grb2.s4,Grb2.s5
Shc,Shc.y,Shc.y.p
F(P) = {u->p}
Rate labels on the diagram: Rc:0.3, Rc:0.2, 0.3, 0.3, 0.3
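The step from a contact graph to Kappa text can be sketched as string generation. The Python below builds a binding rule in the same syntax as the rules on this slide, from a description of the two agents, their binding sites and the states that are tested. The helpers `agent_pattern` and `binding_rule` are hypothetical names, the rate after `@` is left empty as on the slide, and nothing here is claimed to match Kami's actual compiler.

```python
def agent_pattern(name, sites):
    """Render one agent, e.g. EGFR(s2~p,s3!1), from (site, state, bond) triples."""
    parts = []
    for site, state, bond in sites:
        text = site
        if state is not None:
            text += "~" + state      # internal state, e.g. ~p
        if bond is not None:
            text += "!" + str(bond)  # bond label, e.g. !1
        parts.append(text)
    return f"{name}({','.join(parts)})"


def binding_rule(left, right):
    """Build 'A(...),B(...) -> A(...!1),B(...!1)@' for a binding action.

    left, right: (agent, binding site, [(tested site, required state), ...])
    """
    def side(bond):
        rendered = []
        for agent, bind_site, tests in (left, right):
            sites = [(s, st, None) for s, st in tests]
            sites.append((bind_site, None, bond))
            rendered.append(agent_pattern(agent, sites))
        return ",".join(rendered)

    # Left-hand side has free binding sites, right-hand side shares bond 1.
    return f"{side(None)} -> {side(1)}@"


print(binding_rule(("EGFR", "s3", [("s2", "p")]), ("Grb2", "sh2", [])))
# EGFR(s2~p,s3),Grb2(sh2) -> EGFR(s2~p,s3!1),Grb2(sh2!1)@
```

Keeping the tested sites and the binding site separate in the input mirrors the distinction between conditions on a nugget and the action itself.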
27
Knowledge inference
! work in progress !
28
Inferences
[Diagram: the Big Mechanism pipeline (knowledge sources: literature, databases, data fragments; biological data; knowledge representation; simulation; steps: extraction, aggregation, compilation), extended with a semantic check, inference steps and a refinement step.]
29
Knowledge inference
[Diagram: agents A and B, site y445, an Ac_complex node, + and - influence edges involving C, and unknown elements marked ?.]
Two kinds of knowledge:
• Mechanistic rules, e.g. “A binds B” (Ph : Bind(A,B))
• Behaviours, e.g. “C has a negative influence on B”
Problem:
• How can formal rules be inferred automatically from assertions?
Approach:
• Represent the model as a formula
• Derive a graph from the formula
• Run a SAT solver over the formula
• Use a proof-assistant approach!
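As a toy illustration of representing a model as a formula and checking it for satisfiability, the sketch below encodes two influence assertions as propositional constraints and enumerates all truth assignments. Exhaustive enumeration stands in for a real SAT solver, and reading a positive or negative influence as a hard implication is a crude assumption made for this example, not the encoding used by the authors. The variable names are invented.

```python
from itertools import product

VARIABLES = ("phos_B", "C_present", "bind_BB")


def constraints(phos_B, C_present, bind_BB):
    """Crude propositional reading of two influence assertions (an assumption)."""
    positive = (not phos_B) or bind_BB           # phosphorylated B implies B-B binding
    negative = (not C_present) or (not bind_BB)  # C present implies no B-B binding
    return positive and negative


def models(extra=lambda **values: True):
    """List the truth assignments that satisfy the constraints and `extra`."""
    result = []
    for values in product((False, True), repeat=len(VARIABLES)):
        assignment = dict(zip(VARIABLES, values))
        if constraints(**assignment) and extra(**assignment):
            result.append(assignment)
    return result


print(len(models()))  # 4 of the 8 assignments satisfy both constraints
print(models(lambda phos_B, C_present, **_: phos_B and C_present))  # []
```

Under this encoding the query "B is phosphorylated while C is present" is unsatisfiable, which is the kind of consequence a solver could report back as a candidate inference or as a sign that the encoding is too strict.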
30
Knowledge inference example
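[Diagram: agents A and B, site y445 with a Ph node, an Ac_complex node, + and - influence edges involving C, and unknown elements marked ?.]
1. Phosphorylation of B has a positive influence on its homodimerization
2. C is a known inhibitor of B homodimerization
3. C is often found in complex with A
Assertions and the corresponding mechanistic rules:
• Assertion: $\mathrm{Complex}(A, C)$
  Rule: $\exists A, C, x, y,\ \mathrm{agent}(A),\ \mathrm{agent}(C),\ \mathrm{site}(A, x),\ \mathrm{site}(C, y),\ \mathrm{bind}(x, y)$
• Assertion: $\mathrm{Phos}(B) \mathrel{\text{++>}} [\mathrm{bind}(B, B)]$
  Rule: $\exists b_1, b_2, x, y,\ b_1 : B,\ b_2 : B,\ \mathrm{agent}(b_1),\ \mathrm{site}(b_1, x),\ \mathrm{site}(b_2, y),\ \mathrm{agent}(b_2),\ \mathrm{phos}(x) \to \mathrm{bind}(x, y)$
• Assertion: $C \mathrel{\text{-->}} [\mathrm{bind}(B, B)]$ ?
  Rule: $\exists b_1, b_2, x, y, C,\ \mathrm{agent}(b_1),\ \mathrm{agent}(b_2),\ b_1 : B,\ b_2 : B,\ \dots$ ?
Ph : Bind(A,B)
PhBind : Phos(y445)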
31
Collaborations
[Diagram: the pipeline (knowledge sources, biological data, knowledge representation, simulation; extraction, aggregation, compilation, semantic check, inference, refinement) annotated with collaborators at each stage.]
Collaborators shown on the diagram:
• ? (James Allen's team)
• Human reading
• Walter Fontana (1)
• Hector Medina (1)
• John Bachman (1)
• Pierre Boutillier (1), Jean Krivine (2), Jérôme Feret (3)
• François Fages (4)
• Christine Froidevaux (5)
Affiliations:
1. Harvard Medical School
2. IRIF (PPS)
3. ENS Paris
4. Inria Saclay
5. LRI Orsay
Conclusion
32
Long-term objectives?
[Timeline: Thesis, PostDoc, PostDoc]
33
Thank you