SBML (the Systems Biology Markup Language)

53
SBML (the Systems Biology Markup Language) Michael Hucka, Ph.D. Department of Computing and Mathematical Sciences California Institute of Technology Pasadena, CA, USA COMBINE/ERASysApp tutorial, Melbourne, Australia, September 2014 Email: [email protected] Twitter: @mhucka

description

Morning tutorial given at the COMBINE/ERASysApp day of tutorials on "Modelling and Simulation of Biological Models" on Sunday, September 14, ahead of ICSB 2014 in Melbourne, Australia.

Transcript of SBML (the Systems Biology Markup Language)

Page 1: SBML (the Systems Biology Markup Language)

SBML (the Systems Biology Markup Language)Michael Hucka, Ph.D.

Department of Computing and Mathematical Sciences California Institute of Technology

Pasadena, CA, USA

COMBINE/ERASysApp tutorial, Melbourne, Australia, September 2014

Email: [email protected] Twitter: @mhucka

Page 2: SBML (the Systems Biology Markup Language)

What is SBML for, and why would anyone care?

Page 3: SBML (the Systems Biology Markup Language)

ABC-SysBio CellNetAnalyzer Karyote* PaVESy SBW: Auto Layout acslXtreme CellNOpt KEGGconverter PAYAO sbw: javasim ALC Cellware KEGGtranslator PET sbw: stochastic simulator AMIGO CLEML Kineticon PhysioLab Modeler SCIpath Antimony CL-SBML Kinsolver PINT SED-ML Web Tools APMonitor COBRA libAnnotationSBML PK-Sim / MoBi semanticSBML Arcadia CompuCell3D libRoadRunner PNK SensSB Asmparts ConsensusPathDB libSBML PottersWheel SGMP Athena COPASI libSBMLSim PRISM Sigmoid* AutoSBW CRdata libStruct ProcessDB SIGNALIGN AVIS CycSim MASS Toolbox ProMoT SignaLink BALSA CySBML MatCont PROTON SigPath BASIS Cytoscape MathSBML pybrn SigTran BetaWB Cyto-Sim Medicel PyDSTool SIMBA Bifurcation Discovery Tool DBSolve MEMOSys PySB SimBiology BiGG DEDiscover MesoRD PySCeS Simpathica BiNoM Dizzy Meta-All RANGE SimPheny* BiNoM Cytoscape Plugin DOTcvpSB Metaboflux RAVEN Simulate3D Bio Sketch Pad E-CELL MetaCrop Reactome Simulation Core Library BioBayes ecellJ MetaFluxNet ReMatch Simulation Tool BIOCHAM EPE Metannogen RMBNToolbox SimWiz BioCharon ESS Metatool roadRunner SloppyCell BioCyc Facile MetExplore RSBML SmartCell BioGRID FAME MetNetMaker SABIO-RK Snoopy Biological Networks FASIMU MIRIAM Resources Saint SOSlib BioMet Toolbox FBASBW MMT2 SBFC SPDBS BioModels Database FERN modelMaGe SBML Harvester SRS BioModels Importer FluxBalance ModeRator SBML Layout STEPS BioNessie Fluxor Modesto SBML Reaction Finder StochKit BioNetGen Genetdes Moleculizer SBML Translators StochPy BioPARKIN Genetic Network Analyzer MonaLisa SBML2APM StochSim BioPathwise Gepasi Monod SBML2BioPax STOCKS BioPAX2SBML Gillespie2 MOOSE SBML2LaTeX SurreyFBA BioRica GINsim MuVal (Multi-valued logic) SBML2NEURON SyBiL BioSens GNAT Narrator SBML2Octave SYCAMORE BioSPICE Dashboard GNU MCSim nemo SBML2SMW SynBioSS BioSpreadsheet GRENDEL NetBuilder' SBML2TikZ Systrip BioSyS HSMB NetPath SBML2XPP TERANODE Suite BioTapestry HybridSBML NetPro SBMLEditor The Cell Collective BioUML iBioSim Odefy SBML-PET-MPI Tide BoolNet IBRENA Omix SBMLR TinkerCell braincirc Insilico Discovery ONDEX SBML-SAT Trelis BRENDA insilicoIDE optflux SBML-shorthand UTKornTools BSTLab iPathways Oscill8 SBMLSim VANTED ByoDyn JACOBIAN PANTHER Pathway SBMLsqueezer Vcell CADLIVE Jacobian Viewer PathArt sbmltidy WebCell Cain Jarnac Pathway Access SBMLToolbox WinSCAMP CARMEN JarnacLite Pathway Analyser SBMM assistant Wolfram SystemModeler Cell Illustrator JDesigner Pathway Builder SBO xCellerator CellDesigner JigCell Pathway Solver SBSI Xholon Cellerator JSBML Pathway Tools SBToolbox2 XPPAUT CellMC JSim PathwayLab sbtranslate CellML2SBML JWS Online PATIKAweb SBW

Many software tools for modelingand simulation are available

Page 4: SBML (the Systems Biology Markup Language)

https://www.behance.net/gallery/d/7465033

Research often involves the use of more than one tool

Page 5: SBML (the Systems Biology Markup Language)

Need flexible way to exchange results between tools (and researchers)

Page 6: SBML (the Systems Biology Markup Language)
Page 7: SBML (the Systems Biology Markup Language)

Why not simply distribute a model in the original format?

Page 8: SBML (the Systems Biology Markup Language)

Why not simply distribute a model in the original format?Of course, you should do that anyway – it’s vital for good science

• Allows others to run the original model, understand it, verify it, etc.

Page 9: SBML (the Systems Biology Markup Language)

Why not simply distribute a model in the original format?Of course, you should do that anyway – it’s vital for good science

• Allows others to run the original model, understand it, verify it, etc.

But native formats by themselves are not ideal for communication

• Not everyone can run the same software

- One tool’s native format is not necessarily another tool’s format

• Biological semantics often not encoded in native formats

• A neutral format can allow more flexible model reuse

• A common format can allow people to relate your work to theirs more directly (and algorithmically)

Page 10: SBML (the Systems Biology Markup Language)

Why not simply distribute a model in the original format?Of course, you should do that anyway – it’s vital for good science

• Allows others to run the original model, understand it, verify it, etc.

But native formats by themselves are not ideal for communication

• Not everyone can run the same software

- One tool’s native format is not necessarily another tool’s format

• Biological semantics often not encoded in native formats

• A neutral format can allow more flexible model reuse

• A common format can allow people to relate your work to theirs more directly (and algorithmically)

!global p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 p17 p18 p19; … !% Declare parameters u1 = inf1; u4 = inf4; kdelay = 5/8; … p1 = 0.00174155; p2 = 8; p3 = 0.868; p4 = 0.108; p5 = 584; … q4F = (p24+p25*q(1)+p26*q(1)^2+p27*q(1)^3)*q(4); q1F = (p7 +p8 *q(1)+p9 *q(1)^2+p10*q(1)^3)*q(1); SR3 = (p19*q(19))*d3; SR4 = (p1 *q(19))*d1; … % ODEs q1dot = SR4+p3*q(2)+p4*q(3)-(p5+p6)*q1F+p11*q(11)+u1; q2dot = p6*q1F-(p3+p12+p13/(p14+q(2)))*q(2); q3dot = p5*q1F-(p4+p15/(p16+q(3))+p17/(p18+q(3)))*q(3); …

Example fragment of model in MATLAB

Page 11: SBML (the Systems Biology Markup Language)

Why not simply distribute a model in the original format?Of course, you should do that anyway – it’s vital for good science

• Allows others to run the original model, understand it, verify it, etc.

But native formats by themselves are not ideal for communication

• Not everyone can run the same software

- One tool’s native format is not necessarily another tool’s format

• Biological semantics often not encoded in native formats

• A neutral format can allow more flexible model reuse

• A common format can allow people to relate your work to theirs more directly (and algorithmically)

Page 12: SBML (the Systems Biology Markup Language)

What is SBML?

Page 13: SBML (the Systems Biology Markup Language)

SBML is a file format based on XML

Page 14: SBML (the Systems Biology Markup Language)

SBML is a file format based on XML

Don’t work with it directly! Let software do it.

Page 15: SBML (the Systems Biology Markup Language)

Format for representing models of biological processes

• Data structures + principles + serialization to XML

• (Mostly) Declarative, not procedural—not a scripting language

(Mostly) neutral with respect to modeling framework

• E.g., ODE, stochastic systems, etc.

Does not store experimental data, or simulation descriptions

• But software may write their own metadata (annotations) in SBML

For software to read/write, not humans

SBML = Systems Biology Markup Language

Page 16: SBML (the Systems Biology Markup Language)

The process is central

• Literally called “reaction” (not necessarily biochemical)

• Participants are pools of entities of the same kind (“species”) !

!

!

!

• Species are located in containers (“compartments”)

- Core SBML assumes well-mixed compartments

Models can further include:

• Other constants & variables

• Discontinuous events

Core SBML concepts are fairly simple

• Unit definitions

• Annotations

• Other, explicit math

na1 A nb1 B+ nc1 Cf1(...)

na2 A nd2 D+ ne1 Ef2(...)

. . .nc3 C nf3 F

f3(...) + ng3 G

Page 17: SBML (the Systems Biology Markup Language)

Core SBML constructs support many types of models

!

Typical ODE models (e.g., cell differentiation)

Conductance-based models (e.g., Hodgin-Huxley)

Typically do not use SBML “reaction” construct,but instead use “rate rules” construct

Neural models (e.g., spiking neurons)

Typically use “events” for discontinuous changes

Pharmacokinetic/dynamics models

“Species” are not required to be biochemical entities

Infectious diseases BioModels Database model #MODEL1008060001

BioModels Database model #BIOMD0000000451

BioModels Database model #BIOMD0000000020

BioModels Database model #BIOMD0000000127

BioModels Database model #BIOMD0000000234

Example of model type Example model

List originally by Nicolas Le Novére

Page 18: SBML (the Systems Biology Markup Language)

Many examples of SBML and software resources are available

Accepted by dozens of journals *

100’s of software tools available today

• Libraries: libSBML, JSBML

• 260+ listed in SBML Software Guide †

1000’s of models available

• ... in public databases, e.g., BioModels Database, Reactome

• ... as supplementary data to papers

• ... in private repositories

!!* http://sbml.org/Documents/Publications_known_to_accept_submissions_in_SBML_format † http://sbml.org/SBML_Software_Guide

http://sbml.org

Page 19: SBML (the Systems Biology Markup Language)

Where to find software applications compatible with SBML

Page 20: SBML (the Systems Biology Markup Language)

Where to find software applications compatible with SBML

Find SBML software

Page 21: SBML (the Systems Biology Markup Language)

Results of 2011 survey of SBML-compatible softwareQuestion: Which of the following categories best describe your software? (Check all that apply.)

Simulation software

Analysis s/w (in addition, or instead of, simulation)

Creation/model development software

Visualization/display/formatting software

Utility software (e.g., format conversion)

Data integration and management software

Repository or database

Framework or library (for use in developing s/w)

S/w for interactive env. (e.g., MATLAB, R, ...)

Annotation software0 20 40 60 80

11

13

13

14

16

23

31

31

40

42

Out of 81 responses

Page 22: SBML (the Systems Biology Markup Language)

What are some practical details useful to know about SBML?

Page 23: SBML (the Systems Biology Markup Language)

From modeling environments From databases

From papersFrom other people

SBML filescan come from

anywhere

Page 24: SBML (the Systems Biology Markup Language)

SBML FilesFormat: UTF-8 (a text file format)

Extension: usually .xml (sometimes .sbml)

Applications usually have their own native format

• Import/export SBML

Models exist in different “Levels and Versions” of SBML

• Indicated at top of file

Page 25: SBML (the Systems Biology Markup Language)

SBML identifiers and namesMost elements have both an “id” and a “name” field

• Identifier field has restricted syntax: abc123 or _abc123, etc.

- This “id” field value is what you use in math formulas in SBML

• Value of “name” field is almost completely unrestricted

- Names with spaces, $tr@nge and Fμηηγ characters!

• Must assign a value to “id”, but “name” is optional

Some tools let you use names and ignore ids—they autogenerate ids

• But some (especially those w/ script language features) expose ids

Page 26: SBML (the Systems Biology Markup Language)

SBML “rules”“Rules” in SBML define extra mathematical expressions

• E.g.: if need to express additional mathematical relationships beyond what is implied by the system of reactions

3 subtypes:

!

!

!

!

!

Rules define relationships that hold at all times

Rule type General form Example

algebraic 0 = S1 + S2

assignment x = y + z

rate dS/dt = 10.5

x = f(V)

0 = f(W)

dx/dt = f(W)

Page 27: SBML (the Systems Biology Markup Language)

Rules in the context of the overall model

dS1/dt = r1 + r2 + r3 + . . .

dS2/dt = �r1 + r5 + . . .

. . .

0 = f1(W)0 = f2(W)

. . .

x = g1(W)y = g2(W)

. . .

dm/dt = h1(W)dq/dt = h2(W)

. . .

Page 28: SBML (the Systems Biology Markup Language)

Rules in the context of the overall model

dS1/dt = r1 + r2 + r3 + . . .

dS2/dt = �r1 + r5 + . . .

. . .

0 = f1(W)0 = f2(W)

. . .

x = g1(W)y = g2(W)

. . .

dm/dt = h1(W)dq/dt = h2(W)

. . .

Equations derived from reaction

definitions

Page 29: SBML (the Systems Biology Markup Language)

Rules in the context of the overall model

dS1/dt = r1 + r2 + r3 + . . .

dS2/dt = �r1 + r5 + . . .

. . .

0 = f1(W)0 = f2(W)

. . .

x = g1(W)y = g2(W)

. . .

dm/dt = h1(W)dq/dt = h2(W)

. . .

Algebraic rules

Page 30: SBML (the Systems Biology Markup Language)

Rules in the context of the overall model

dS1/dt = r1 + r2 + r3 + . . .

dS2/dt = �r1 + r5 + . . .

. . .

0 = f1(W)0 = f2(W)

. . .

x = g1(W)y = g2(W)

. . .

dm/dt = h1(W)dq/dt = h2(W)

. . .

Assignment rules

Page 31: SBML (the Systems Biology Markup Language)

Rules in the context of the overall model

dS1/dt = r1 + r2 + r3 + . . .

dS2/dt = �r1 + r5 + . . .

. . .

0 = f1(W)0 = f2(W)

. . .

x = g1(W)y = g2(W)

. . .

dm/dt = h1(W)dq/dt = h2(W)

. . .

Rate rules

Page 32: SBML (the Systems Biology Markup Language)

Rules in the context of the overall model

Rules and equations from reactions are

taken together

dS1/dt = r1 + r2 + r3 + . . .

dS2/dt = �r1 + r5 + . . .

. . .

0 = f1(W)0 = f2(W)

. . .

x = g1(W)y = g2(W)

. . .

dm/dt = h1(W)dq/dt = h2(W)

. . .

Page 33: SBML (the Systems Biology Markup Language)

rate law = k · [S1]rate law =

Why are SBML rate expression not identical to rate laws?

• Consider reaction between 2 compartments, #1 and #2

• Assume told . What does this mean?

!

• But what if volume ?

- Look at # of molecules in each compartment:

!- How molecules leave & enter each compartment?

molecules leave the first compartment

molecules enter the second compartment

Interpreting reactions in SBML

S1 � S2

S1

S2

V1V2

d[S2]dt

= �d[S1]dt

= k · [S1]

V2 = 3V1

nS1 = [S1] · V1 nS2 = [S2] · V2

k · [S1] · V1

3 · k · [S1] · V1

Page 34: SBML (the Systems Biology Markup Language)

rate law = k · [S1]rate law =

Why are SBML rate expression not identical to rate laws?

• Consider reaction between 2 compartments, #1 and #2

• Assume told . What does this mean?

!

• But what if volume ?

- Look at # of molecules in each compartment:

!- How molecules leave & enter each compartment?

molecules leave the first compartment

molecules enter the second compartment

Interpreting reactions in SBML

S1 � S2

S1

S2

V1V2

d[S2]dt

= �d[S1]dt

= k · [S1]

V2 = 3V1

nS1 = [S1] · V1 nS2 = [S2] · V2

k · [S1] · V1

3 · k · [S1] · V1

Oops! Something’s wrong …

Page 35: SBML (the Systems Biology Markup Language)

“Kinetic law” in SBML

Rate expressions are (usually) substance/time, not substance/size/time

Conversion for basic cases is simple:

• Multiply by volume of compartment where reactants are located: !

!• Express rates of changes of reactants & products in terms of substances:

!

!

!

!

• Can easily recover concentrations:

rate = k · [S1] · V1

dnS1

dt= �k1 · [S1] · V1

[S1] =nS1

V1[S2] =

nS2

V2

moles, or items, or mass, or

“dimensionless”

dnS2

dt= k1 · [S1] · V1

Page 36: SBML (the Systems Biology Markup Language)

SBML itself provides syntax and only limited semantics

Page 37: SBML (the Systems Biology Markup Language)

SBML itself provides syntax and only limited semantics

No standard identifiers

Page 38: SBML (the Systems Biology Markup Language)

SBML itself provides syntax and only limited semantics

Low info content

No standard identifiers

Page 39: SBML (the Systems Biology Markup Language)

SBML itself provides syntax and only limited semanticsRaw models alone are insufficient

Need standard schemes for machine-readable annotations

• Identify entities

• Clarify mathematical semantics

• Link to other data resources

• Provide authorship & pub. info

Low info content

No standard identifiers

Page 40: SBML (the Systems Biology Markup Language)

SBML supports two annotation schemesSBO (Systems Biology Ontology)

• For mathematical semantics

• One SBML object ← one SBO term

• Short, compact, tightly coupled but limited scope

MIRIAM (Minimum Information Requested In the Annotation of Models)

• For any kind of annotation

• One SBML object ← multiple MIRIAM annotations

• Larger, more free-form, wider scope

Both are externalized and independent of SBML

Page 41: SBML (the Systems Biology Markup Language)

<sbml ...> ... <listOfCompartments> <compartment id="cell" size="1e-15" /> </listOfCompartments> <listOfSpecies> <species compartment="cell" id="S1" initialAmount="1000" /> <species compartment="cell" id="S2" initialAmount="0" /> <listOfSpecies> <listOfParameters> <parameter id="k" value="0.005" sboTerm="SBO:0000036" /> <listOfParameters> <listOfReactions> <reaction id="r1" reversible="false"> <listOfReactants> <speciesReference species="S1" stoichiometry="2" sboTerm="SBO:0000010" /> </listOfReactants> <listOfProducts> <speciesReference species="S1" stoichiometry="2" sboTerm="SBO:0000011" /> </listOfProducts> <kineticLaw sboTerm="SBO:0000052"> <math> ... <math> ... </sbml>

Page 42: SBML (the Systems Biology Markup Language)

<sbml ...> ... <listOfCompartments> <compartment id="cell" size="1e-15" /> </listOfCompartments> <listOfSpecies> <species compartment="cell" id="S1" initialAmount="1000" /> <species compartment="cell" id="S2" initialAmount="0" /> <listOfSpecies> <listOfParameters> <parameter id="k" value="0.005" sboTerm="SBO:0000036" /> <listOfParameters> <listOfReactions> <reaction id="r1" reversible="false"> <listOfReactants> <speciesReference species="S1" stoichiometry="2" sboTerm="SBO:0000010" /> </listOfReactants> <listOfProducts> <speciesReference species="S1" stoichiometry="2" sboTerm="SBO:0000011" /> </listOfProducts> <kineticLaw sboTerm="SBO:0000052"> <math> ... <math> ... </sbml>

SBO:0000036

“forward bimolecular rate constant, continuous case”

Page 43: SBML (the Systems Biology Markup Language)

Why care about standard ways of writing annotations?

Structured, machine-readable annotations increase your model’s utility

• Allow more precise identification of model components

- Understand model structure

- Search/discover models

- Compare models

• Adds a semantic layer—integrates knowledge into the model

- Helps recipients understand the underlying biology

- Allows for better reuse of models

- Supports conversion of models from one form to another

Page 44: SBML (the Systems Biology Markup Language)

What has been happening with SBML lately?

Page 45: SBML (the Systems Biology Markup Language)

SBML Level 1 SBML Level 2 SBML Level 3

predefined math functions user-defined functions user-defined functions

text-string math notation MathML subset MathML subset

reserved namespaces for annotations

no reserved namespaces for annotations

no reserved namespaces for annotations

no controlled annotation scheme

RDF-based controlled annotation scheme

RDF-based controlled annotation scheme

no discrete events discrete events discrete events

default values defined default values defined no default values

monolithic monolithic modular & extensible

Page 46: SBML (the Systems Biology Markup Language)

Level 3 packages add constructs on top of SBML Level 3 Core

Page 47: SBML (the Systems Biology Markup Language)

Level 3 package What it enablesHierarchical model composition Models containing submodels ✔

Flux balance constraints Constraint-based models ✔

Qualitative models Petri net models, Boolean models ✔

Graph layout Diagrams of models ✔

Multicomponent/state species Entities w/ structure; also rule-based models draft

Spatial Nonhomogeneous spatial models draft

Graph rendering Diagrams of models draft

Groups Arbitrary grouping of components draft

Arrays Arrays or sets of entities draft

Dynamic structures Creation & destruction of components draft

Distributions Numerical values as statistical distributions draft

Annotations Richer annotation syntax

Status

Page 48: SBML (the Systems Biology Markup Language)

Level 3 package What it enablesHierarchical model composition Models containing submodels ✔

Flux balance constraints Constraint-based models ✔

Qualitative models Petri net models, Boolean models ✔

Graph layout Diagrams of models ✔

Multicomponent/state species Entities w/ structure; also rule-based models draft

Spatial Nonhomogeneous spatial models draft

Graph rendering Diagrams of models draft

Groups Arbitrary grouping of components draft

Arrays Arrays or sets of entities draft

Dynamic structures Creation & destruction of components draft

Distributions Numerical values as statistical distributions draft

Annotations Richer annotation syntax

Status

Updated

Updated

Updated

New

Page 49: SBML (the Systems Biology Markup Language)

Community-based development processDefines process for

• Proposing changes and additions to SBML and SBML packages

• Developing specifications

• Voting

• The roles of editors

!

Small changes forthcoming in package requirements and procedures

Page 50: SBML (the Systems Biology Markup Language)

Acknowledgments

Page 51: SBML (the Systems Biology Markup Language)

Mike Hucka, Sarah Keating, Frank Bergmann, Lucian Smith, Andrew Finney, Herbert Sauro, Hamid Bolouri, Ben Bornstein, Maria Schilstra, Jo Matthews, Bruce Shapiro, Linda Taddeo, Akira Funahashi, Akiya Juraku, Ben Kovitz, Nicolas Rodriguez, Andreas Dräger, Alex Thomas

SBML & JSBML Team:

SBML Editors:Mike Hucka, Frank Bergmann, Sarah Keating, Nicolas Le Novère, Chris Myers, Lucian Smith, Stefan Hoops, Sven Sahle, James Schaff, Dagmar Waltemath, Darren Wilkinson, Brett Olivier

And another huge thanks to everyone in the SBML and COMBINE communities for massive contributions to SBML development and continuing support

GoSC students: Victor Kofia, Ibrahim Vazirabad, Leandro Watanabe

Attendees of COMBINE 2013, Paris, France

A huge thanks to the many contributors to SBML Level 3 package specifications!

Page 52: SBML (the Systems Biology Markup Language)

SBML funding sources over the past 14 years

National Institute of General Medical Sciences (USA) Air Force Office of Scientific Research (USA) BBSRC (UK) Beckman Institute, Caltech (USA) DARPA IPTO Bio-SPICE Bio-Computation Program (USA) Drug Disease Model Resources (EU-EFPIA Innovative Medicine Initiate) ELIXIR (UK) European Molecular Biology Laboratory (EMBL) Google Summer of Code International Joint Research Program of NEDO (Japan) Japanese Ministry of Agriculture Japanese Ministry of Education, Culture, Sports, Science and Technology JST ERATO Kitano Symbiotic Systems Project (Japan) (to 2003) JST ERATO-SORST Program (Japan) Keio University (Japan) Molecular Sciences Institute (USA) National Science Foundation (USA) STRI, University of Hertfordshire (UK)

Page 53: SBML (the Systems Biology Markup Language)

I’d like your feedback! You can use this anonymous form:

http://tinyurl.com/mhuckafeedback