SLML Level 1 Version 1.6 Release 1 - Murphy Lab
Subcellular Location Markup Language (SLML)
Level 1 Version 1.6 Release 1
Communicating subcellular location protein
patterns for systems biology
A Thesis Presented To
Carnegie Mellon University
Pittsburgh, Pennsylvania
In Partial Fulfillment of the Requirements
For the Degree of
Master of Science
In Computational Biology
By
Iván E. Cao-Berg
Spring 2009
“The nice thing about standards is that there are so many to choose from.”
- Andrew S. Tanenbaum
Contents
Abstract
Introduction
Subcellular Location Markup Language Level 1
Document Conventions
Preliminary Definitions and Principles
Matrix
Identification, Name, Meta and Notes
Identification
Name
Meta
Value
Mathematical notation support
SLML Components
The SLML Container
Documentation
ListOfCells
Cell
Information
ListOfModels
Model
ListOfPatterns
ListOfObjects
Object
Shape
Texture
Frequency
ListOfParameters
Parameter
XML Definition
Materials and Methods
SLML Toolbox for Matlab
Main Tools
SLML Model Trainer
Results
Discussion
Future Levels in SLML
Language Integration
SBML
VCML
Software Integration
MCell
References
Appendix
List of Tests
Abstract
The Subcellular Location Markup Language (SLML) Level 1 Version 1.6 Release 1 is a model
representation format for generative models of protein subcellular location patterns. SLML is
oriented towards describing and annotating the parameters and relationships of these models,
which can be used to synthesize, among other things, multicolor images. SLML is an XML-based
language; that is, it is written in a neutral fashion with respect to programming languages and
software encoding. SLML was built as a tool for communicating these patterns with fewer bits
than the original data by describing a model that is automated, generative and statistically
accurate. Thus it provides a foundation for accurately describing compartmental volumes that
can be incorporated into other systems biology markup languages like SBML and CellML, as well as
into biochemical simulation applications like MCell and VCell. A detailed description of the
language model is presented together with a set of tools to train the generative models and
synthesize multicolor images from SLML instances.
Introduction
In the study of complex biological phenomena, one of the most popular mathematical tools is the
use of ordinary and partial differential equations to represent biochemical kinetics
(Doyle 2001). Studying the behavior of such models involves much more than finding the
solutions to the system of equations, and approaches such as equilibrium analysis
provide deep insight into the behavior of the model. Nevertheless, in recent years, research
has shown that understanding complex biological systems requires the integration
of experimental and computational research, in other words a systems biology approach
(Kitano 2002).
Simulations of one or several biochemical pathways found in the literature have the
potential to be used in a systems-wide approach because they serve as building blocks for more
complex phenomena. Hence, computational models that reproduce and predict the detailed
behaviors of cellular systems at this level are the Holy Grail of systems biology (Kitano 2006).
Because multiple tools allow simulation at the systems level, their existence has
fueled the development of languages that enable the use and reuse of mathematical models
without the need to rewrite them for each tool. This allows instances of models to
become building blocks of more complex simulations. Languages like the Virtual Cell Markup Language
(VCML), CellML and the Systems Biology Markup Language (SBML) allow the communication of
mathematical models in a neutral fashion. Yet their support for compartmental geometries is
limited to constructions of geometrical shapes or the pixelation of 2D experimental images.
Thus, we strive to provide more detailed information about compartmental topologies that
could be mapped into a language similar to SBML and used in other applications. The
Subcellular Location Markup Language (SLML) Level 1 Version 1.6 Release 1 is a model
representation format for generative models of protein subcellular location patterns. SLML is
defined in the eXtensible Markup Language (W3C 2001) and is supported by an XML Schema that
defines the components and relationships of the language model. The models
described in SLML instances allow the systematic and comprehensive study of protein subcellular
location and provide useful descriptions of these patterns. The models described in SLML Level 1
instances are (1) automated, (2) generative, (3) statistically accurate and (4) compact (Zhao
and Murphy 2007).
The definition of the model description language presented in this document only specifies
generative model parameters and the relationship between models. It doesn’t specify how
programs should use SLML instances nor does it describe how to implement them.
Nevertheless, a collection of applications was written for reading and writing SLML instances as
well as for generating new examples from them. The SLML Toolbox for Matlab is also
described in detail in this document.
Subcellular Location Markup Language Level 1
The Subcellular Location Markup Language (SLML) Level 1 Version 1.6 Release 1 is a model
representation format for generative models of protein subcellular location patterns (Zhao and
Murphy, 2007). SLML is oriented towards communicating models that are
1. automated, in the sense that they are learned from experimental data,
2. generative, in the sense that we can synthesize new examples from the SLML instance,
3. statistically accurate, in the sense that the SLML instance describes the variations from cell
to cell, and
4. compact, in the sense that we can communicate these variations using fewer bits than the
original data set.
SLML is described as a collection of components in UML and mapped into an XML schema. This
allows its contents to be described in a neutral, system-independent fashion that is
supported by most modern programming languages (Quackenbush 2006).
Document Conventions
All the components and attributes of the language model are described in the Unified Modeling
Language (UML). UML was chosen to describe the main components of the language model because
it provides a system-independent representation of the model that is both intuitive and clear.
In the XML Schema 1.0 language there are two main classes of relationships between components.
The first is the superclass relationship. In SLML Level 1 all major components have
the Name and Identification components as parent classes. This notation will allow future
developers of SLML to easily make changes across the schema without modifying the general
structure of the language. The second is the “composed of” relationship, which may seem
similar to the first to readers who are not familiar with XML. This second kind of
relationship describes the case where a compartment is composed of other
subcompartments that do not inherit attributes from the parent compartment. Most XML
languages express this relationship by nesting elements, in a similar fashion to HTML.
Figure 1 listOfParameters class in SLML Level 1 in UML. In SLML, classes do not possess operations, so the third (operations) part of the class diagram is ignored.
For simplicity, parent classes are omitted from the diagrams in this document, while “composed of”
relationships are shown.
Figure 2 A snippet of SLML Level 1 that shows the two main relationships. Parent classes are omitted throughout this document since
Identification and Name are parent classes of every major component.
Preliminary Definitions and Principles
SLML Level 1.0 inherits all primitive data types from XML Schema 1.0 (Biron and Malhotra,
2000), but only a small subset of them is actually used in the language model. These
data types are (1) integers, (2) strings, (3) booleans and (4) doubles.
Matrix
Figure 3 The Matrix, Mrow and Cn components in UML format. Only the Matrix component is presented in detail.
The Matrix component is a helper container used to define multidimensional arrays or
matrices. Matrices are used in SLML to hold multidimensional parameters. The Matrix
component follows a notation similar to that of the matrix element defined in MathML (W3C 2001),
but supports additional dimensions through its length, width and height attributes. A matrix
with only a length is considered a vector, a matrix with a length and a width is considered a 2D
matrix, and one that also has a height is considered a 3D matrix.
The Matrix component may contain other matrices, which in turn are composed of matrix rows.
Each row may contain only numbers; the use of variables as array entries has not been
considered in this Version. Hence the Matrix component in SLML cannot be mapped into the
MathML namespace.
XML Definition
<!-- Definition:Matrix -->
<xsd:complexType name="Matrix">
  <xsd:sequence>
    <xsd:element name="mrow" type="Mrow" minOccurs="1"
                 maxOccurs="unbounded" />
  </xsd:sequence>
  <xsd:attribute name="id" type="Identification" use="optional" />
  <xsd:attribute name="name" type="Name" use="required" />
  <xsd:attribute name="length" type="xsd:int" use="optional" />
  <xsd:attribute name="width" type="xsd:int" use="optional" />
  <xsd:attribute name="height" type="xsd:int" use="optional" />
  <xsd:attribute name="notes" type="xsd:string" use="optional" />
</xsd:complexType>
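The following Python sketch illustrates how a two-dimensional Matrix instance could be assembled with the standard xml.etree.ElementTree module. The element names matrix and cn are assumptions made for this example: cn mirrors the Cn component of Figure 3 and the MathML element of the same name. The convention that length counts rows and width counts columns is also an assumption. 3D matrices are not covered.

```python
import xml.etree.ElementTree as ET


def matrix_element(name, rows):
    """Build a 2D Matrix element from a list of numeric rows.

    Hypothetical layout: <matrix> holds one <mrow> per row and each <mrow>
    holds one <cn> per entry; length = number of rows, width = number of columns.
    """
    if not rows or any(len(row) != len(rows[0]) for row in rows):
        raise ValueError("rows must be non-empty and all of the same length")
    matrix = ET.Element("matrix", name=name,
                        length=str(len(rows)), width=str(len(rows[0])))
    for row in rows:
        mrow = ET.SubElement(matrix, "mrow")
        for value in row:
            # Version 1.6 allows only numbers as entries, not variables.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError(f"matrix entries must be numbers, got {value!r}")
            ET.SubElement(mrow, "cn").text = str(value)
    return matrix


covariance = matrix_element("covariance", [[1.0, 0.5], [0.5, 2.0]])
print(ET.tostring(covariance, encoding="unicode"))
```

Rejecting non-numeric entries at construction time mirrors the restriction stated above. A parser following the same convention would simply reverse the loop over mrow and cn.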
Identification, Name, Meta and Notes
These are the minor components of SLML. They define the data patterns used by the
major components and their attributes.
Identification
Figure 4 The Identification component in UML format.
The Identification component describes the characters that can be used for the identification
attribute of all main and minor components. Even though the identification attribute is optional
and left to the user, it is good practice to make all identifications unique across
an instance whenever they are used.
XML Definition
<!-- Definition:Identification -->
<xsd:simpleType name="Identification">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="(_|[a-z]|[A-Z])(_|[a-z]|[A-Z]|[0-9])*" />
  </xsd:restriction>
</xsd:simpleType>
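XML Schema patterns must match the entire value, so the Identification pattern can be checked in Python with re.fullmatch. The sketch below also lists identifiers that are used more than once, following the good practice recommended above. The element names cell and model in the example instance are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Same regular expression as the Identification simpleType.
IDENTIFICATION = re.compile(r"(_|[a-z]|[A-Z])(_|[a-z]|[A-Z]|[0-9])*")


def is_valid_identification(value):
    """XSD patterns are implicitly anchored, hence fullmatch()."""
    return IDENTIFICATION.fullmatch(value) is not None


def duplicate_ids(root):
    """Return, sorted, every id attribute value used more than once under root."""
    counts = Counter(el.get("id") for el in root.iter() if el.get("id") is not None)
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical element names, used only to exercise the check.
instance = ET.fromstring('<cell id="c1"><model id="m1"/><model id="m1"/></cell>')
print(is_valid_identification("nuclear_shape"))  # True
print(is_valid_identification("2nd_model"))      # False: starts with a digit
print(duplicate_ids(instance))                   # ['m1']
```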
Name
Figure 5 The Name component in UML format.
The Name component defines the character set that can be used for the name attribute of all
containers. Its use is required for all major components and optional for minor ones.
Even though the name assigned to a component is left to the user, the main idea behind this
attribute is to be able to map major components, such as models, to other languages that may
reside in a different namespace but in the same file, e.g. having a generative model mapped
into an SBML instance.
The pattern allows only ASCII letters, digits and underscores, and the first character must be a
letter or an underscore. The Name component is a parent class of every component in SLML Level 1.
XML Definition
<!-- Definition:Name -->
<xsd:simpleType name="Name">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="(_|[a-z]|[A-Z])(_|[a-z]|[A-Z]|[0-9])*" />
  </xsd:restriction>
</xsd:simpleType>
Meta
The Meta component defines the meta container used by the Information and Documentation
components. It follows the notation of HTML meta tags and is mainly used for annotation. All Meta
components are optional.
Its name and value attributes are restricted by the Name and Value components, respectively.
XML Definition
<!-- Definition:Meta -->
<xsd:complexType name="Meta">
  <xsd:attribute name="name" type="Name" use="required" />
  <xsd:attribute name="value" type="Value" use="required" />
</xsd:complexType>
Value
The Value component defines the character set that can be used for value attributes in
Meta components. The pattern allows only ASCII letters, digits and underscores, and the first
character must be a letter or an underscore.
XML Definition
<!-- Definition:Value -->
<xsd:simpleType name="Value">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="(_|[a-z]|[A-Z])(_|[a-z]|[A-Z]|[0-9])*" />
  </xsd:restriction>
</xsd:simpleType>
Mathematical notation support
SLML Level 1.0 includes support for MathML (W3C, 2008). Nevertheless, SLML itself doesn't use
MathML at this point because a new Matrix class was defined in the SLML namespace for
Version 1.6. Yet support for MathML will be necessary to provide a new model in future
Levels of the language: the inclusion of methods in MathML content format has been discussed,
which would allow any generic parser to synthesize images directly from the XML instance.
MathML parsers are available for most programming languages, and conversion of
MathML content markup into equations is supported in popular environments
like Matlab and Java.
SLML Components
This section discusses the main components of SLML. Some of the components contain the
parameters of the generative models, while others exist to describe relationships between
compartments.
The SLML Container
Figure 6 The main component of SLML.
The SLML container is the main component of the language and follows XML Schema 1.0
notation. It is the root class of the language and has four attributes:
1. The namespace of the language, i.e.
http://murphylab.web.cmu.edu/services/SLML/level1. By convention
it should point to the actual schema. Its use is required.
2. The Level of the current schema. The current Level is 1, which corresponds to the first
public release of SLML. Its use is required.
3. The Version of the current schema. The current Version is 1.6 which is the version
discussed in this document. Its use is required.
4. The Release of the current schema. The current Release is 1. Its use is required.
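A minimal Python sketch of reading the four container attributes follows. The namespace is the one given above. The root element name slml and the attribute names level, version and release are assumptions for illustration and are not taken from the schema.

```python
import xml.etree.ElementTree as ET

SLML_NS = "http://murphylab.web.cmu.edu/services/SLML/level1"


def read_header(xml_text):
    """Return (level, version, release) from the root of an SLML instance.

    Assumptions for illustration: the root element is named slml, lives in the
    SLML namespace, and stores the three values in attributes named level,
    version and release.
    """
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced tags as "{namespace}localname".
    if root.tag != f"{{{SLML_NS}}}slml":
        raise ValueError(f"unexpected root element {root.tag!r}")
    missing = [a for a in ("level", "version", "release") if a not in root.attrib]
    if missing:
        raise ValueError(f"required attributes missing: {missing}")
    return root.get("level"), root.get("version"), root.get("release")


example = f'<slml xmlns="{SLML_NS}" level="1" version="1.6" release="1"></slml>'
print(read_header(example))  # ('1', '1.6', '1')
```

Checking the namespace first follows the convention that it identifies the language. Comparing the returned values against the expected Level, Version and Release is left to the caller.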
Documentation
Figure 7 The Documentation component of SLML in UML format.
The Documentation container allows the user to add information to the SLML instance.
The main purpose of this class is to let the user annotate the SLML instance with
additional data that may be useful to readers of the instance. This class is in turn
composed of a Meta component that is similar in notation to the meta tag used in HTML. The
use of this component is optional, and so is the Meta component it contains.
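The Documentation and Meta components amount to a list of name/value pairs. The Python sketch below round-trips such pairs through ElementTree. The element names documentation and meta are assumed. The example values (taken from the data set described in Materials and Methods) satisfy the Value pattern as written, with no spaces and no leading digit.

```python
import xml.etree.ElementTree as ET


def documentation_element(annotations):
    """Build a Documentation element with one Meta child per annotation.

    Hypothetical element names; name and value are the two required
    attributes of the Meta component.
    """
    documentation = ET.Element("documentation")
    for name, value in annotations.items():
        ET.SubElement(documentation, "meta", name=name, value=value)
    return documentation


def read_documentation(documentation):
    """Recover the annotations as a dictionary of name -> value."""
    return {meta.get("name"): meta.get("value")
            for meta in documentation.findall("meta")}


notes = {"cell_line": "HeLa", "protein": "giantin"}
print(ET.tostring(documentation_element(notes), encoding="unicode"))
```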
ListOfCells
Figure 8 The ListOfCells component in UML format.
The ListOfCells component is merely a container of all cell models in an SLML Level 1.0 instance.
It aggregates all cell models, making them easier to search.
Cell
Figure 9 The Cell component in UML format.
The Cell component defines the cell container, which is composed of a list of models and
information regarding the data set. Several three-color generative models may be contained
within a single Cell component, which means that they come from the same data set.
Information
Figure 10 The Information component in UML format.
The Information component is a container of information regarding the Cell component. Its
purpose is to annotate the experiment or dataset used to train the generative model.
ListOfModels
Figure 11 The ListOfModels component in UML format.
The ListOfModels is an aggregator of models that facilitates searching.
Model
Figure 12 The Model component in UML format.
A model is composed of a list of patterns, and several patterns can make up a model. For example,
a vesicular model is composed of a medial-axis model for the nuclear shape, a radial distance
model for the cell membrane and a Gaussian mixture model for the protein distribution of
vesicular compartments.
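The Python sketch below makes this composition concrete. It builds a skeleton of the vesicular model through the ListOfPatterns and ListOfObjects aggregators described in the following sections. The lowerCamelCase element names are guesses modelled on listOfParameters in Figure 1, and the pattern and object names are hypothetical. No model parameters are filled in.

```python
import xml.etree.ElementTree as ET


def model_skeleton(model_name, patterns):
    """Nest model > listOfPatterns > pattern > listOfObjects > object > shape.

    `patterns` maps each pattern name to the names of its objects. The element
    names are hypothetical, modelled on listOfParameters (Figure 1).
    """
    model = ET.Element("model", name=model_name)
    list_of_patterns = ET.SubElement(model, "listOfPatterns")
    for pattern_name, object_names in patterns.items():
        pattern = ET.SubElement(list_of_patterns, "pattern", name=pattern_name)
        list_of_objects = ET.SubElement(pattern, "listOfObjects")
        for object_name in object_names:
            obj = ET.SubElement(list_of_objects, "object", name=object_name)
            # Shape is the only required member of an object; its parameters
            # (e.g. of a medial-axis model) would go in a listOfParameters here.
            ET.SubElement(obj, "shape", name=f"{object_name}_shape")
    return model


vesicular = model_skeleton("vesicular", {
    "nuclear": ["nucleus"],      # medial-axis nuclear shape model
    "cell": ["cell_membrane"],   # radial distance cell membrane model
    "protein": ["vesicle"],      # Gaussian mixture protein model
})
# The aggregators give every object a fixed path, which keeps searching simple.
names = [o.get("name") for o in
         vesicular.iterfind("listOfPatterns/pattern/listOfObjects/object")]
print(names)  # ['nucleus', 'cell_membrane', 'vesicle']
```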
ListOfPatterns
Figure 13 The ListOfPatterns component in UML format.
The ListOfPatterns is an aggregator of patterns that facilitates searching.
ListOfObjects
Figure 14 The ListOfObjects component in UML format.
The ListOfObjects is an aggregator of objects that facilitates searching.
Object
Figure 15 The Object component in UML format. The order of the containers it is composed of is irrelevant.
The Object component contains a description of an object in the pattern. Every object in SLML
Level 1.0 is composed of four main components:
1. Shape component. It describes the shape of the object, e.g. a nuclear shape model.
2. Texture component. It describes the texture of the object, e.g. nuclear texture model.
3. Position component. It describes the position of the object with respect to other objects,
e.g. Gaussian object position model.
4. Frequency component. It describes the number of objects in the pattern.
The minimum number of objects supported by this language is 1 and the maximum is
unbounded. It should be pointed out that a pattern may be composed of several object types.
Shape
Figure 16 The Shape component in UML format. This is the only member of the Object container that is required.
The Shape component contains a description of the shape of the object. A shape model is
composed of a list of parameters that describe the model. This list should contain all the
parameters needed to synthesize the shape of the object. This is the only member of the Object
component that is required, since the minimum information needed to synthesize an object is
its shape.
Texture
Figure 17 The Texture component in UML format.
The Texture component contains a description of the texture of an object. A texture model is
composed of a list of parameters that describe the model. This list should contain all the
parameters needed to synthesize a textured object. This member of the Object
component is optional since some objects can be synthesized without texture. That is,
software that parses an SLML instance in which an object doesn't contain a texture model should
synthesize only the outline of the object.
Frequency
Figure 18 The Frequency component in UML format.
The Frequency component contains a description of the number of objects in a pattern. A
frequency model is composed of a list of parameters that describe the model. This list should
contain all the parameters needed to determine how many objects to synthesize. This
member of the Object component is optional since some patterns are composed of a single
object. That is, software that parses an SLML instance in which an object doesn't contain a
frequency model should synthesize a single object.
Even though the frequency model is composed of parameters needed to sample from a
distribution, SLML also allows frequency models to be described as integers that simply state how
many objects should be synthesized.
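The defaults above can be summarized in a short function. This Python sketch is one possible reading of them. It assumes child elements named shape, texture and frequency, and an integer frequency stored as a parameter named count whose value is the element text. None of these names come from the schema. Sampling from a frequency distribution is model-specific and is left out.

```python
import xml.etree.ElementTree as ET


def synthesis_plan(obj):
    """Apply the Level 1 defaults to an object element.

    Returns (textured, count). Hypothetical conventions: child elements are
    named shape, texture and frequency, and an integer frequency is stored as
    a parameter named count whose value is the element text.
    """
    if obj.find("shape") is None:
        raise ValueError("every object needs a shape model")
    # Without a texture model only the outline of the object is synthesized.
    textured = obj.find("texture") is not None
    frequency = obj.find("frequency")
    if frequency is None:
        return textured, 1  # no frequency model: synthesize a single object
    count = frequency.find("listOfParameters/parameter[@name='count']")
    if count is not None and count.get("type") == "integer":
        return textured, int(count.text)
    # Any other frequency model describes a distribution to sample from,
    # which depends on the particular model and is not covered here.
    raise NotImplementedError("sampling a frequency distribution")


nucleus = ET.fromstring('<object name="nucleus"><shape name="s"/></object>')
print(synthesis_plan(nucleus))  # (False, 1)
```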
ListOfParameters
Figure 19 ListOfParameters component in UML format.
The ListOfParameters component is a mere aggregator for the parameters of shape, texture,
position and frequency models. The use of this component is optional, though its absence
means that no parameters are present.
Parameter
Figure 20 The Parameter component in UML format.
The Parameter component is probably the most important container of SLML Level 1. It holds
the attributes of a parameter as well as its value. It contains six attributes:
1. Identification. Similar to other components. Its use is optional.
2. Name. Similar to other components. Its use is required.
3. Constant. True if the parameter is constant. The default value is true. Its use is
optional.
4. Complex. True for parameters that contain other parameters. The default is false. Its use is
optional.
5. Type. The data type of this parameter. It includes basic data types as well as a matrix
definition. The DataType class is a list container that includes the data types supported by
the parameter container. Its use is optional, and the default is double.
6. Notes. Free-text notes about the parameter. Its use is optional.
The Parameter component may hold a scalar, another parameter or a Matrix component.
XML Definition
<!-- Definition:Parameter -->
<xsd:complexType name="Parameter">
  <xsd:attribute name="id" type="Identification" use="optional" />
  <xsd:attribute name="name" type="Name" use="required" />
  <xsd:attribute name="constant" type="xsd:boolean" default="true" />
  <xsd:attribute name="complex" type="xsd:boolean" default="false" />
  <xsd:attribute name="type" type="DataType"
                 use="optional" default="double" />
  <xsd:attribute name="notes" type="xsd:string" use="optional" />
</xsd:complexType>
<!-- Definition:DataType -->
<xsd:simpleType name="DataType">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="double" />
    <xsd:enumeration value="integer" />
    <xsd:enumeration value="string" />
    <xsd:enumeration value="boolean" />
  </xsd:restriction>
</xsd:simpleType>
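The defaults declared for the Parameter attributes can be applied when reading an instance. The Python sketch below assumes that a scalar value is stored as the text content of the parameter element, since the complexType above declares only attributes. It does not handle complex parameters or Matrix values.

```python
import xml.etree.ElementTree as ET


def xsd_boolean(text):
    """Parse the xsd:boolean lexical forms true, false, 1 and 0."""
    value = text.strip()
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    raise ValueError(f"not an xsd:boolean: {text!r}")


# One converter per value of the DataType enumeration.
CONVERTERS = {"double": float, "integer": int, "string": str, "boolean": xsd_boolean}


def read_parameter(el):
    """Describe a parameter element, filling in the schema defaults.

    Assumption: a scalar value is stored as the element's text content.
    Complex (nested) parameters and Matrix values are outside this sketch.
    """
    if "name" not in el.attrib:
        raise ValueError("the name attribute is required")
    data_type = el.get("type", "double")
    if data_type not in CONVERTERS:
        raise ValueError(f"unsupported DataType {data_type!r}")
    if xsd_boolean(el.get("complex", "false")):
        raise NotImplementedError("nested parameters are not handled here")
    return {
        "name": el.get("name"),
        "constant": xsd_boolean(el.get("constant", "true")),
        "type": data_type,
        "value": CONVERTERS[data_type]((el.text or "").strip()),
    }


print(read_parameter(ET.fromstring('<parameter name="sigma">0.25</parameter>')))
# {'name': 'sigma', 'constant': True, 'type': 'double', 'value': 0.25}
```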
Materials and Methods
Figure 21 Examples of synthesized images from SLML instances.
The data used to train the SLML Level 1 instances come from a 3D HeLa dataset. The data
contains three fluorescence channels for each field, corresponding to the DNA distribution, the
total protein and one of six proteins in the dataset: (1) giantin, (2) gpp130, (3) LAMP2, (4) a
mitochondrial protein, (5) nucleolin and (6) transferrin. The data used for this project can be
found at http://murphylab.web.cmu.edu/data/2007_Cytometry_GenModel.html
The algorithms and software in this work were implemented in Matlab 2008a and Java. The
software written for this project can be found at
http://murphylab.web.cmu.edu/software/SLML. Generative models were trained as described in
(Zhao and Murphy 2009). After the models are learned, they are written in XML format
following the rules of the SLML Level 1 schema.
Figure 22 Flowchart describing the process of learning the SLML instances from a collection of microscope images.
SLML Toolbox for Matlab
The Subcellular Location Markup Language (SLML) Toolbox 2009 (v1.5.2) for Matlab is a
collection of scripts and functions that perform the most common tasks on SLML Level 1 Version
1.* instances. The toolbox can (1) read, write, edit and save SLML instances; (2) validate SLML
instances; (3) train generative models of protein subcellular location patterns and (4) synthesize
multicolor images from SLML instances.
The SLML Toolbox 2009 (v1.5.2) for Matlab is compliant with SLML Level 1. For more
information about the SLML Level 1 as well as other tools, visit
http://murphylab.web.cmu.edu/services/SLML/level1
Main Tools
The Toolbox contains five tools for training generative models and synthesizing multicolor
images. These tools were used in this work.
img2slml
img2slml is a command-line tool that trains a generative model of protein subcellular
location and saves the model as an SLML Level 1 instance.
slml2img
slml2img is a command-line tool that synthesizes multicolor images from one or several
SLML Level 1 instances.
SLML Image Synthesizer
This GUI-based tool allows the user to synthesize multicolor images from multiple SLML Level 1
instances.
SLML Model Trainer
The SLML Model Trainer will train a generative model of protein subcellular location from a
collection of three-color images. To use this tool run the command
Results
The generative models of protein subcellular location patterns were mapped to an XML
language known as SLML Level 0. This level merely represented a mapping of all the variables
needed to synthesize a three-channel digital image.
Level | Release | Description
Level 0 Version 0 | Private | An XML dump of the generative model data structure. (Deprecated)
Level 0 Version 0.5 | Private | An XML dump of the generative model data structure. The data structure changed from the previous version. Supported by a DTD. (Deprecated)
Level 1 Version 1.6 | Public | A detailed description of the generative model data structure that is not a dump, yet allows a one-to-one mapping between documents. It supports documentation and is supported by an XML schema.
Figure 23 Three main versions of SLML.
After further examination of the first private release, new compartment relationships were
included to account for the dependencies within the vesicular model, which can be described as
containing (1) a nuclear shape and texture model, (2) a cell membrane model and (3) a Gaussian
mixture protein pattern model. With these new relationships, the language became Level 1 as
described in this document.
Since SLML is software independent, a set of tools for reading, editing and writing SLML
instances was written and used to test the validity of the SLML instances with respect to
the schema language definition.
After the models were verified to be syntactically and semantically correct, two
applications were constructed: first, a GUI-based application to synthesize multicolor images
that allows the user to view the generated images in gallery form, and second, a GUI-based
application for training generative models of protein subcellular location patterns from a
three-color image collection.
The set of applications, known as the SLML Toolbox for Matlab, was then ported as executables
for Windows, MacOSX and Linux.
Name | Release | Description
SLML Toolbox 2006 | Private | Train generative models from three-channel images; synthesize three-color images; XML dump of the models
SLML Toolbox 2007 | Private | Train generative models from three-channel images; synthesize three-color images; XML dump of the models supported (but not validated) by a DTD; verification through recursion
SLML Toolbox 2008 | Public | Train generative models from three-channel images; synthesize multicolor images; validation using a Java parser; joining of multiple models
SLML Toolbox 2009 | Public | Train generative models from three-channel images; synthesize multicolor images
Figure 24 History of the SLML Toolbox for Matlab.
The four main distributions were then tested on different operating systems to verify their integrity.
OS / Matlab | Matlab R13 | Matlab 2006a | Matlab 2006b | Matlab 2007a | Matlab 2007b | Matlab 2008a
Win XP SP2 | Untested | Failed | Passed | Passed | Passed | Passed
Win Vista | Not Compatible | Not Compatible | Not Compatible | Untested | Untested | Passed
Cygwin | Failed | Failed | Failed | Failed | Failed | Untested
MacOSX Tiger | Passed | Passed | Passed | Passed | Passed | Passed
MacOSX Leopard | Untested | Untested | Passed | Passed | Untested | Passed
Mandrake | Untested | Passed | Passed | Passed | Untested | Passed
OpenSuse | Passed | Passed | Passed | Passed | Untested | Passed
Ubuntu Dapper | Untested | Passed | Passed | Passed | Untested | Passed
Figure 25 Combinations of operating system and Matlab version on which the Toolbox was tested.
Discussion
Future Levels in SLML
As it stands, SLML Level 1 provides a robust description for generative models of protein
subcellular location. It is designed to cater to a wide variety of new models that can be easily
mapped as SLML instances. Yet the language lacks a powerful descriptor for the relationships
between compartments and models. Future developments of the language should consider the
consequences of modeling dependencies and be able to map these to a set of rules.
Another important aspect is metadata. Although SLML can annotate a wide range of additional
data through its Documentation and Information components, a dedicated collection of metadata
should be included so that information about the data set from which the models were trained,
e.g. the resolution of the original images, can be mapped into SLML.
Language Integration
SBML
As it stands, SLML instances can easily be mapped into SBML instances. Even though the two
languages share some class names, they reside in different namespaces, so their names do not
clash.
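The point about namespaces can be illustrated with Python's standard xml.etree.ElementTree module. In this sketch both vocabularies are assumed to define an element with the local name compartment; because each is bound to a different namespace URI, the parser keeps them apart. The SBML URI follows the SBML Level 2 Version 4 convention, while the SLML URI and the combined root document are hypothetical placeholders invented for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative namespace URIs: the SBML one follows the SBML Level 2 Version 4
# convention; the SLML one is a hypothetical placeholder.
SBML_NS = "http://www.sbml.org/sbml/level2/version4"
SLML_NS = "http://example.org/slml/level1"  # hypothetical

# A combined (invented) document in which both vocabularies use "compartment".
DOC = f"""<root xmlns:sbml="{SBML_NS}" xmlns:slml="{SLML_NS}">
  <sbml:compartment id="cytosol" size="1.0"/>
  <slml:compartment id="nucleus"/>
</root>"""


def compartments_by_namespace(xml_text):
    """Return the ids of <compartment> children, grouped by vocabulary."""
    root = ET.fromstring(xml_text)
    ns = {"sbml": SBML_NS, "slml": SLML_NS}
    return {
        "sbml": [e.get("id") for e in root.findall("sbml:compartment", ns)],
        "slml": [e.get("id") for e in root.findall("slml:compartment", ns)],
    }


if __name__ == "__main__":
    print(compartments_by_namespace(DOC))  # {'sbml': ['cytosol'], 'slml': ['nucleus']}
    # ElementTree stores tags as "{namespace-uri}localname", so the two
    # compartment tags are distinct strings even though the local names match.
    for child in ET.fromstring(DOC):
        print(child.tag)
```

Because the namespace URI is part of every qualified tag, code written for one vocabulary never matches elements of the other by accident.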
Since we built the SLML Toolbox for Matlab and an SBML Toolbox for Matlab already exists,
future work on integrating SLML with other languages such as SBML can start from Matlab, where
parsing a generative model is trivial, as is parsing an SBML model.
VCML
VCML resides in its own namespace, so including SLML instances in VCML is just as trivial as with
SBML. Nevertheless, VCML has powerful components for describing compartmental
geometries. Thus, future work on integrating SLML with VCML should aim to generate
compartmental geometries in this format rather than synthesizing multicolor images.
Software Integration
The PSLID-VCell application is an integration of the Protein Subcellular Location Image Database
(PSLID) and VCell that allows users to create geometries from generative models of subcellular
location protein patterns.
Integrating SLML instances into PSLID-VCell should allow SLML files to be imported to generate
new compartmental geometries, in the same way that geometries can be generated from experimental data.
MCell
MCell is a modeling tool for cellular microphysiology in 3D. Even though 3D models were not
considered as part of this project, generative models of the 3D cell framework are known (Zhao
and Murphy 2007). Since MCell uses the Model Description Language (MDL), which is not
XML-based, integration of SLML with MCell should occur at a different level. An intermediate
solution would be to generate 3D meshes and map them to the Virtual Reality Modeling
Language (VRML). VRML instances can easily be read in Blender, a free, open source 3D
content creation suite, and then mapped to MDL.
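As a rough illustration of the intermediate step, the sketch below serializes a polygon mesh (vertex coordinates plus faces given as vertex indices) as a minimal VRML97 scene containing a single IndexedFaceSet. How such a mesh would be extracted from a generative model is outside its scope; the function name mesh_to_vrml and the tetrahedron used as input are hypothetical.

```python
def mesh_to_vrml(vertices, faces):
    """Serialize a polygon mesh as a minimal VRML97 (VRML 2.0) scene.

    vertices: sequence of (x, y, z) coordinates.
    faces: sequence of faces, each a sequence of 0-based vertex indices,
           listed counter-clockwise when seen from outside (VRML's default).
    """
    n = len(vertices)
    for face in faces:
        if any(not 0 <= i < n for i in face):
            raise ValueError(f"face {face} references a missing vertex")

    points = ",\n        ".join(f"{x:g} {y:g} {z:g}" for x, y, z in vertices)
    # VRML terminates each face's index list with -1.
    indices = ",\n        ".join(
        " ".join(str(i) for i in face) + " -1" for face in faces
    )
    return (
        "#VRML V2.0 utf8\n"
        "Shape {\n"
        "  geometry IndexedFaceSet {\n"
        "    coord Coordinate {\n"
        f"      point [\n        {points}\n      ]\n"
        "    }\n"
        f"    coordIndex [\n        {indices}\n    ]\n"
        "  }\n"
        "}\n"
    )


if __name__ == "__main__":
    # A unit tetrahedron as a stand-in for a compartment surface.
    verts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
    tris = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
    print(mesh_to_vrml(verts, tris))
```

Saved with a .wrl extension, output of this kind is plain text that VRML-aware tools can open, which is what makes VRML a convenient hand-off format before the final conversion to MDL.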
References
1. Aderem. Systems Biology: Its Practice and Challenges (2005). Cell 121:511-513.
2. Biron and Malhotra. XML Schema Part 2: Datatypes (2000). Retrieved from
http://www.w3c.org/TR/xmlschema-2 on May 1, 2009.
3. Butler. Computing 2010: from black holes to biology (1999). Nature 402:C67-C70.
4. Chou and Cai. Prediction and classification of protein subcellular location – sequence-order
effect and pseudo amino acid composition (2003). Journal of Cellular Biochemistry 90:1250-
1260.
5. Doyle. Beyond the spherical cow (2001). Nature 411:151-152.
6. Editorial. Towards a theory of biological robustness (2007). Molecular Systems Biology
3:137.
7. Hucka et al. Systems biology markup language (SBML) Level 1: structures and facilities for
basic model definitions. Retrieved from http://www.sbml.org on January 23, 2009.
8. Hucka et al. The systems biology markup language (SBML): a medium for representation
and exchange of biochemical network models (2003). Bioinformatics 19(4):524-531.
9. Hunter et al. Beginning XML (2007). Wrox. ISBN: 978-0-470-11487-2.
10. Kitano. Computational cellular dynamics: a network-physics integral (2006). Nature Reviews
7:163.
11. Kitano. Computational systems biology (2002). Nature 420:206-210.
12. Kitano. International alliances for quantitative modeling in systems biology (2005).
Molecular Systems Biology 1:1-2.
13. Kitano. Systems Biology: A Brief Overview (2002). Science 295(5560):1662-1664.
14. Lloyd, Halstead and Nielsen. CellML: its future, present and past. Progress in Biophysics and
Molecular Biology 85(2-3):433-450.
15. Nature. Are you ready for the revolution? (2001) Nature 409:758-760.
16. Noble. The rise of computational biology (2002). Nature Reviews 3:460-463.
17. Oram and Wilson. Beautiful Code: Leading Programmers Explain How They Think (2007).
O'Reilly. ISBN-10: 0-596-51004-7.
18. Quackenbush. Standardizing the standards (2006). Molecular Systems Biology 10:1-2.
19. Slepchenko et al. Computational Cell Biology – Spatiotemporal Simulation of Cellular
Events (2002). Annu. Rev. Biophys. Biomol. Struct. 31:423-441.
20. W3C. Extensible Markup Language (XML) 1.0 (2008). Retrieved from
http://www.w3.org/TR/2008/REC-xml-20081126/ on May 1, 2009.
21. W3C. Mathematical Markup Language (MathML) 2.0 (2003). Retrieved from
http://www.w3.org/TR/MathML2/ on May 1, 2009.
22. W3C. Namespaces in XML 1.0 (2006). Retrieved from http://www.w3.org/TR/REC-xml-names/
on May 1, 2009.
23. W3C. XML Schema Data Types (2004). Retrieved from http://www.w3.org/TR/xmlschema-2/
on May 1, 2009.
24. You. Toward Computational Systems Biology (2004). Cell Biochemistry and Biophysics
40:167-185.
25. Zhao and Murphy. Automated learning of generative models for subcellular location:
building blocks for systems biology (2007). Cytometry Part A 71A:978-990.
Appendix
List of Tests
The following table contains a description of all the tests performed on the SLML Toolbox. In
order for a distribution/OS to be compatible with the toolbox, all tests must pass.
Test Description
0000 Train a generative model of protein subcellular location pattern. This
test trains the models from (Zhao & Murphy, 2007) and then parses
them to SLML Level 1.
0001 Test isCompatible.m on juggernaut.cbi.cmu.edu, troll.cbi.cmu.edu and
alien.cbi.cmu.edu, the three systems that were used to create the standalone applications.
0002 Test array2mathml.m
0003 Test mathml2array.m
0004 Parses a set of generative models of protein subcellular location to
SLML.
0005 Train a generative model of protein subcellular location from a
collection of microscope images and save it as SLML instances. The
number of images used trains a negligible model, because the purpose
of this test is only to test parsing into the SLML vocabulary.
0006 Train a generative model of protein subcellular location from a subset
of a collection of microscope images. The number of images used
trains a negligible model, because the purpose of this test is only to test
the set of functions that train the model.
0007 Train a generative model of protein subcellular location from a
collection of TfR microscope images and save them as SLML instances.
The purpose of this test is to assess the time it takes to train a model.
0008 Train a generative model of protein subcellular location pattern from a
collection of giantin microscope images and save them as SLML
instances. The purpose of this test is to assess the time it takes to train
a model.
0009 Generate multicolor images from multiple generative models.
0010 Test model2slml.m
0011 Generate framework from a single SLML instance. Used to test
ml_gencellcomp from SLIC.
0012 Generate cell framework from lysosome.mat
0013 Generate cell framework from nucleolus.mat
0014 Generate cell framework from endosome.mat
0015 Generate cell framework from giantin.mat