Characterization of Molecular Communication Based on Cell ...

75
University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Computer Science and Engineering: eses, Dissertations, and Student Research Computer Science and Engineering, Department of 12-2016 Characterization of Molecular Communication Based on Cell Metabolism rough Mutual Information and Flux Balance Analysis Zahmeeth Sayed Sakkaff University of Nebraska - Lincoln, [email protected] Follow this and additional works at: hp://digitalcommons.unl.edu/computerscidiss Part of the Biochemical and Biomolecular Engineering Commons , Bioinformatics Commons , Biological Engineering Commons , Biology Commons , Biomaterials Commons , Biomechanics and Biotransport Commons , Catalysis and Reaction Engineering Commons , Computational Engineering Commons , Computer Engineering Commons , Molecular, Cellular, and Tissue Engineering Commons , Nanoscience and Nanotechnology Commons , and the Other Biomedical Engineering and Bioengineering Commons is Article is brought to you for free and open access by the Computer Science and Engineering, Department of at DigitalCommons@University of Nebraska - Lincoln. It has been accepted for inclusion in Computer Science and Engineering: eses, Dissertations, and Student Research by an authorized administrator of DigitalCommons@University of Nebraska - Lincoln. Sakkaff, Zahmeeth Sayed, "Characterization of Molecular Communication Based on Cell Metabolism rough Mutual Information and Flux Balance Analysis" (2016). Computer Science and Engineering: eses, Dissertations, and Student Research. 114. hp://digitalcommons.unl.edu/computerscidiss/114

Transcript of Characterization of Molecular Communication Based on Cell ...

Page 1: Characterization of Molecular Communication Based on Cell ...

University of Nebraska - LincolnDigitalCommons@University of Nebraska - LincolnComputer Science and Engineering: Theses,Dissertations, and Student Research Computer Science and Engineering, Department of

12-2016

Characterization of Molecular CommunicationBased on Cell Metabolism Through MutualInformation and Flux Balance AnalysisZahmeeth Sayed SakkaffUniversity of Nebraska - Lincoln, [email protected]

Follow this and additional works at: http://digitalcommons.unl.edu/computerscidissPart of the Biochemical and Biomolecular Engineering Commons, Bioinformatics Commons,

Biological Engineering Commons, Biology Commons, Biomaterials Commons, Biomechanics andBiotransport Commons, Catalysis and Reaction Engineering Commons, ComputationalEngineering Commons, Computer Engineering Commons, Molecular, Cellular, and TissueEngineering Commons, Nanoscience and Nanotechnology Commons, and the Other BiomedicalEngineering and Bioengineering Commons

This Article is brought to you for free and open access by the Computer Science and Engineering, Department of at DigitalCommons@University ofNebraska - Lincoln. It has been accepted for inclusion in Computer Science and Engineering: Theses, Dissertations, and Student Research by anauthorized administrator of DigitalCommons@University of Nebraska - Lincoln.

Sakkaff, Zahmeeth Sayed, "Characterization of Molecular Communication Based on Cell Metabolism Through Mutual Informationand Flux Balance Analysis" (2016). Computer Science and Engineering: Theses, Dissertations, and Student Research. 114.http://digitalcommons.unl.edu/computerscidiss/114

Page 2: Characterization of Molecular Communication Based on Cell ...

CHARACTERIZATION OF MOLECULAR COMMUNICATION BASED ONCELL METABOLISM THROUGH MUTUAL INFORMATION AND FLUX

BALANCE ANALYSIS

by

Zahmeeth Sayed Sakkaff

A THESIS

Presented to the Faculty of

The Graduate College at the University of Nebraska

In Partial Fulfilment of Requirements

For the Degree of Master of Science

Major: Computer Science

Under the Supervision of Professor Massimiliano Pierobon

Lincoln, Nebraska

December, 2016

Page 3: Characterization of Molecular Communication Based on Cell ...

CHARACTERIZATION OF MOLECULAR COMMUNICATION BASED ON

CELL METABOLISM THROUGH MUTUAL INFORMATION AND FLUX

BALANCE ANALYSIS

Zahmeeth Sayed Sakkaff, M. S.

University of Nebraska, 2016

Adviser: Massimiliano Pierobon

Synthetic biology is providing novel tools to engineer cells and access the basis

of their molecular information processing, including their communication channels

based on chemical reactions and molecule exchange. Molecular communication is a

discipline in communication engineering that studies these types of communications

and ways to exploit them for novel purposes, such as the development of ubiquitous

and heterogeneous communication networks to interconnect biological cells with nano

and biotechnology-enabled devices, i.e., the Internet of Bio-Nano Things. One major

problem in realizing these goals stands in the development of reliable techniques

to control the engineered cells and their behavior from the external environment.

A possible solution may stem from exploiting the natural mechanisms that allow

cells to regulate their metabolism, the complex network of chemical reactions that

underlie their growth and reproduction, as a function of chemical compounds in the

environment.

In this thesis, molecular communication concepts are applied to study the poten-

tial of cell metabolism, and its regulation, to channel information from the outside

environment into the cell as function of chemical compounds in the environment, and

quantify how much information of the internal state of the metabolic network can be

perceived from the outside environment. For this, cell metabolism is characterized in

Page 4: Characterization of Molecular Communication Based on Cell ...

this work through two abstractions, namely, as a molecular communication encoder

and a modulator, respectively. The former models the cell metabolism as a binary

encoder of the mechanisms underlying the regulation of the cell metabolic network

state in function of the chemical composition of the external environment. The latter

models the metabolic network inside the cell as a digital modulator of metabolite

exchange/growth according to the information contained in its state. Based on these

abstractions, the aforementioned potential of cell metabolism is quantified with the

information theoretic mutual information parameter obtained through the use of a

well-known and computationally efficient metabolic simulation technique.

Numerical results are obtained through simulation of cell metabolism based on

the standard processes of Genome Scale Modeling (GEM) and Flux Balance Anal-

ysis (FBA). These preliminary proof-of-concept results are based on the following

three main cellular species: Escherichia coli (E. coli), the “standard” organism in

microbiology, and two important human gut microbes studied in our collaborators’

lab, namely, the Bacteroides thetaiotaomicron (B. theta) and the Methanobrevibacter

smithii (M. smithii), which provide a direct connection of this work to future practical

applications.

Page 5: Characterization of Molecular Communication Based on Cell ...

iv

ACKNOWLEDGMENTS

At this place I would like to thank many people who helped me to complete this

thesis successfully. First of all, Prof. Massimiliano Pierobon, who supervised me

throughout this research and introduced me to the challenging Telecommunication

Systems Modeling and Engineering of Cell Communication Pathways research area.

I am very grateful to him for the immense amount of time that he dedicated for

supervision, for his patience, for his continual support and motivation and knowledge

leading me into the appropriate methods and techniques needed in this work. Your

knowledge and enormous support has been the key to all my work, thanks! Next,

I would like to thank my committee members Dr. Myra Cohen and Dr. Juan Cui,

thank you for devoting your valuable time to improve the thesis. I would also like

to thank Dr. Myra Cohen and Dr. Nicole Buan, Mikaela Cashman, Jennie Catlett

and Dr. Christine Kelley for their constructive feedback, and for encouraging me

to delve into the subject of cell metabolism, and flux balance analysis through the

KBase software application suite.

I would like to thank all the Molecular and Biochemical Telecommunications

(MBiTe) Lab colleagues, in particular, Aditya Immaneni who worked with me to

visualizing the complex network of cell metabolisms. I would also like to thank the

KBase developers team for their active support throughout the development of this

research and US National Science Foundation for supporting this research through

grant MCB-1449014. Apart from these, I would also like to acknowledge Allison

Haindfield as the proof reader of this thesis, and I am gratefully indebted to her for

her very valuable comments on this thesis. Finally, I must express my very profound

gratitude to my family members, and friends for providing me with unfailing support

and continuous encouragement throughout my years of study and through the pro-

Page 6: Characterization of Molecular Communication Based on Cell ...

v

cess of researching and writing this thesis. This accomplishment would not have been

possible without them. Thank you.

Page 7: Characterization of Molecular Communication Based on Cell ...

vi

Contents

Contents vi

List of Figures ix

List of Tables xii

1 Introduction 1

2 Background 5

2.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.2 Biological Pathways . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.3 Cell Metabolism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.4 The Mathematical Relation Between Transcription Factors and Enzymes 10

3 Cell Metabolism as a Molecular Communication System 13

3.1 Molecular Communication Abstraction of Cell Metabolism . . . . . . 13

3.2 Steady-State Mutual Information . . . . . . . . . . . . . . . . . . . . 18

3.2.1 Stage I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.2.2 Stage II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

Page 8: Characterization of Molecular Communication Based on Cell ...

vii

4 In silico Characterization of the Mutual Information of Cell Metabolism

through Flux Balance Analysis 21

4.1 Estimation of Optimal Enzyme Expression Regulation through Flux

Balance Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

4.2 An Upper Bound Steady-State Mutual Information of Cell Metabolism 27

4.2.1 Upper Bound for Stage I . . . . . . . . . . . . . . . . . . . . . 27

4.2.2 Upper Bound for Stage II . . . . . . . . . . . . . . . . . . . . 29

5 Data Generation Workflow 31

5.1 Data generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

5.2 In silico experimentation for E. coli . . . . . . . . . . . . . . . . . . . 34

5.3 In silico experimentation for B. theta and M. smithii . . . . . . . . 35

6 Numerical Results 37

6.1 Stage I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

6.1.1 Numerical Results for E. coli . . . . . . . . . . . . . . . . . . 38

6.1.2 Numerical Results for B. theta and M. smithii . . . . . . . . . 42

6.2 Stage II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

6.2.1 Upper bound of steady-state mutual information over internal

metabolic state changing reactions with respect to biomass only 47

6.2.2 Upper bound of steady-state mutual information over internal

metabolic state changing reactions with respect to uptake and

secretion of compounds and biomass . . . . . . . . . . . . . . 48

6.2.3 Upper bound of steady-state mutual information over seven in-

put compounds with respect to uptake and secretion of com-

pounds and biomass . . . . . . . . . . . . . . . . . . . . . . . 49

6.3 Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

Page 9: Characterization of Molecular Communication Based on Cell ...

viii

7 Conclusions and Future Work 55

Bibliography 57

Page 10: Characterization of Molecular Communication Based on Cell ...

ix

List of Figures

1.1 Proposed model structure for interpretation of biological system with molec-

ular communication system . . . . . . . . . . . . . . . . . . . . . . . . . 3

2.1 Graphical representation of the interconnection of signal transduction,

gene regulation and metabolic pathways. . . . . . . . . . . . . . . . . . . 7

3.1 Sketch of the proposed molecular communication abstraction of cell metabolism. 14

3.2 Sketch of the proposed molecular communication system based on cell

metabolism. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3.3 Sketch of the proposed Metabolic Reaction Abstraction. . . . . . . . . . 16

4.1 KEGG module of Bacteroides thetaiotaomicron metabolic pathways. (a)

Summary of the biological processes shown in the pathway map of Glycol-

ysis / Gluconeogenesis and Glyoxylate and dicarboxylate metabolism (b)

Enlarged fine details of a section of a complete metabolic model, (c) Part

of the complete KEGG database pathway maps of Bacteroides thetaio-

taomicron. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4.2 From metabolic network to stoichiometric matrix. (a) Conceptual model

of a metabolic network. (b) List of the reactions participating in the

metabolic network. (c) Corresponding stoichiometric matrix. . . . . . . . 24

Page 11: Characterization of Molecular Communication Based on Cell ...

x

4.3 Conceptual model of the FBA LP formulation for finding the optimal

solution. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

5.1 KBase Workflow for simulation data. . . . . . . . . . . . . . . . . . . . . 32

5.2 KBase workflow to generate silico data. . . . . . . . . . . . . . . . . . . . 33

6.1 Optimal E.Coli K12 MG1655 growth as a function of the input flux of

D-Glucose and Lactose in the environment. . . . . . . . . . . . . . . . . 39

6.2 FBA-estimated binary chemical reaction states {r∗i }Mi=1 for each combi-

nation of D-Glucose and Lactose input fluxes, where white = ON state;

black = OFF state. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

6.3 FBA-estimated binary chemical reaction states {r∗i }Mi=1 for each combina-

tion of Glucose, Hematin, Formate, H2, V itaminB12, Acetate, and Vita-

min K input fluxes, where yellow = ON state; violet = OFF state. . . . 42

6.4 FBA-estimated binary chemical reaction states {r∗i }Mi=1 for each combina-

tion of Glucose, Hematin, Formate, H2, V itaminB12, Acetate, and Vita-

min K input fluxes, where yellow = ON state; violet = OFF state. . . . 43

6.5 14 FBA groups of B. theta based on similar FBA-estimated chemical re-

action states for each combination of Glucose, Hematin, Formate, H2,

V itaminB12, Acetate, and Vitamin K input fluxes. . . . . . . . . . . . . 45

6.6 31 FBA groups of M. smithii based on similar FBA-estimated chemical

reaction states for each combination of Glucose, Hematin, Formate, H2,

V itaminB12, Acetate, and Vitamin K input fluxes. . . . . . . . . . . . . 46

6.7 A hive plot for the FBA group F is shown in the figure. The reactions are

placed on the Z axis, the reactants on the X axis and the products on the

Y axis. Further the External compounds are placed higher on the X and

Y axes than the Internal compounds. . . . . . . . . . . . . . . . . . . . . 51

Page 12: Characterization of Molecular Communication Based on Cell ...

xi

6.8 Shows differential hive plots of F vs G and F vs Z. The groups F and G

in F vs G hive plot has the same biomass whereas, the groups F and Z in

F vz Z hive plot have the least and highest biomass respectively. When a

reaction is present in F and absent in G or Z the reaction is represented

along with its links to the compounds. When a reaction is present in the

other groups but absent in group F the reaction is shown as a node not

connected to any other compounds. . . . . . . . . . . . . . . . . . . . . 53

6.9 Shows the state changing reactions represented as a network diagram. The

size of a node is proportional to the number of links connecting to or from

it. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

Page 13: Characterization of Molecular Communication Based on Cell ...

xii

List of Tables

6.1 FBA group matrix of B. theta where the rows are the 14 groups and the

columns represent the chemical compounds uptaken, secreted, and the

generation biomass. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

6.2 FBA group matrix of M. smithii where the rows are the 31 groups and

the columns represent the chemical compounds uptaken, secreted, and the

generated biomass. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

Page 14: Characterization of Molecular Communication Based on Cell ...

1

Chapter 1

Introduction

Molecular communication is one of the latest frontiers in communication engineer-

ing [1], where tools from computer communications, information theory, signal pro-

cessing, and wireless networking are applied to the domain of chemical reactions

and molecule exchange. Recent results in molecular communication research range

from theoretical studies of the communication channels and the expression of their

communication capacity [36], [14], [43], [4], to the more practical design of suitable

modulation and coding techniques [31], and networking protocols [16].

Synthetic biology is today providing novel tools for the design, realization, and

control of biological processes through the programming of cells’ genetic code [23].

These tools are allowing engineers to study and access the basis of molecular infor-

mation processing in biological cells, which can be potentially utilized for the real-

ization of practical molecular communication systems [35], [29]. The future pervasive

deployment of genetically engineered cells and their interaction with other bio, mi-

cro and nano-technology enabled devices through molecular communication systems

and networks has been recently envisioned as the novel paradigm of the Internet

of Bio-Nano Things [2]. These ubiquitous and heterogeneous communications will

Page 15: Characterization of Molecular Communication Based on Cell ...

2

enable advanced applications in many fields, including medicine (e.g., developing bio-

compatible diagnosis and treatment systems), industry (e.g., biologically-controlled

food production), and agriculture (e.g.,, monitoring and control of soil chemical and

microbiological status).

One major problem in synthetic biology stands in the control from the external

environment to the internal functionalities of genetically engineered cells. Various

techniques to realize this control have been explored, such as the use of light, i.e.,

optogenetics [45], magnetic fields, i.e., magnetic nanoparticles [12], and dedicated sig-

naling circuits [28]. A possible solution, may stem from exploiting the natural mech-

anisms involved in the regulation of cell metabolism. Cell metabolism is a complex

network of chemical reactions that underlie the cell’s growth and reproduction, which

consumes and transforms chemical compounds present in its environment. Cells have

mechanisms to regulate and optimize their metabolism (metabolic state) according

to the chemical composition of the surrounding environment.

In order to achieve our goal of gaining more control of the internal cell function-

alities from the external environment, we need a deeper understanding of how the

information flows from the cell’s environment to the metabolic state and how much

information of the internal cell metabolic state can be perceived from the outside en-

vironment. This requires a deeper understanding of cell regulation mechanisms from

beginning to end.

Page 16: Characterization of Molecular Communication Based on Cell ...

3

CellMetabolicState

Chemical compound 1Chemical compound 2Chemical compound 3Chemical compound 4

Environment

Cell

Regulationofcellmetabolism

InputInformation OutputInformation

BiologicalRealityMolecularCommunicationAbstraction

BinaryEncoder

CellMetabolicState

InputInformationDigital

Modulator OutputInformation

Figure 1.1: Proposed model structure for interpretation of biological system withmolecular communication system

In this thesis, we apply molecular communication (MC) concepts to study the

potential of cell metabolism, and its regulation, to channel information from the

outside environment into the cell as shown in Figure 1.1. In the view of molecular

communication systems, the potential of cell metabolism and its regulation can be

understood by how much information (in bits) the cell can take from the external

environment and encode into changes in its internal metabolic behavior. For this

end, we abstract the cell metabolism as a cascade of two MC components, namely, i)

as a binary encoder of mechanisms underlying the regulation of cell metabolic state

as function of chemical compounds of the external environment, and ii) as a digital

modulator of metabolite growth/exchange according to the information contained in

the metabolic state of the cell.

Inspired by [39], we apply information theory tools to express the performance

Page 17: Characterization of Molecular Communication Based on Cell ...

4

of this binary encoder (Stage I) and digital modulator (Stage II) in terms of steady-

state mutual information. Subsequently, we define an upper bound to this mutual

information that can easily be quantified in silico through the use of a well-known

and computationally efficient metabolic simulation techniques, namely, Genome Scale

Modeling (GEM) and Flux Balance Analysis (FBA), which relies only on the a priori

knowledge of the cell’s DNA code (genome) and epigenetic regulation.

Next, we present numerical results obtained by analyzing the binary encoder model

of the Escherichia coli (abbreviated as E. coli) bacterium metabolism and its reg-

ulation with respect to two different input chemical compounds, namely, D-Glucose

and Lactose. We also show the results of FBA in terms of growth rate computed for

different combinations of values of input fluxes of the aforementioned two compounds.

Further, we report the numerical results of Stage I and Stage II proposed for a case

study with human gut microbes metabolism and its regulation named as Bacteroides

thetaiotaomicron (abbreviated as B. theta) and Methanobrevibacter smithii (abbre-

viated as M. smithii) with respect to seven different input chemical compounds.

The rest of this thesis is organized as follows. Chapter 2 covers some necessary

background in microbiology and mathematical relation between transcription factors

and enzymes. Chapter 3 presents the cell metabolism as a molecular communication

system. Here, we examine how cell metabolism is characterized by the two abstrac-

tions, namely, as an MC binary encoder and a digital modulator. Chapter 4 discusses

the information theory to characterize the steady-state mutual information of the

proposed abstractions. Chapter 5 discusses in silico experimentations to generate the

data presented in this thesis. Chapter 6 presents the numerical results for three main

cellular species, E. coli, B. theta and M. smithii, through several simulations and

experiments. The final chapter covers the analysis of our findings, the conclusion,

and details some future avenues.

Page 18: Characterization of Molecular Communication Based on Cell ...

5

Chapter 2

Background

2.1 Motivation

A living cell environment is composed of various biochemical compounds. Different

compounds work together to encode a particular information by combining various

biochemical signals [39], [33]. This environmental information is then sensed by the

cell, which regulates its own internal metabolic network state consequently [39]. A

cell is able to extract this information from the environment by means of biochemical

processes present inside the cell. These biological processes are generally composed

of signal transduction and gene regulation [18], [33], [7], which occur along a chain of

chemical reactions known as biological pathways. This extracted information can be

used to alter the gene expression, thereby modifying the cell metabolic network state

[33], [39]. Hence, the final outcome of the modified cell metabolic network state is

correlated to the information that the cell extracts from the environment. However,

during this process there will be some limitation in the amount of information that the

cell can extract from the environment. This is due to the generation of biochemical

noise coming from feedback loops, cross talks, amplification, integration, and a delay

Page 19: Characterization of Molecular Communication Based on Cell ...

6

inherent within the aforementioned biological processes [18], [46]. As a result, the

extracted information can be distorted, and the cell may not be successful in precisely

modify its metabolic network accordingly.

In this thesis, we abstract the behavior of the aforementioned biological processes

of a cell as a communication system and analyze them with communication engineer-

ing tools. The goal of our work is to characterize the aforementioned limitations of a

cell in extracting input information from the environment. By stemming from this,

we apply communication engineering concepts in order to characterize the potential

of a cell to represent information from the external environment to the metabolic

state and, subsequently, we quantify how much information of the internal state of

the metabolic network can be perceived from the external environment as a form

of biomass (growth), uptake, and secretion1. In order quantify the limits of this

information flow, inspired by the mathematical model proposed by [39], [46], [15],

we apply Information theory tools. Based on the latter, we develop a mathematical

framework to compute the amount of information flowing from the external environ-

ment through the cell’s metabolic state to the cell growth, uptake, and secretion of

chemical compounds. We envision that the results of this work will enable the future

design of communication engineering techniques to operate a fine-tuned control of the

cell behavior based on information transmission through its metabolism.

2.2 Biological Pathways

In this section, we briefly discuss what a biological pathway is and how the interaction

between different biological pathways vary based on the variations in the chemical

1Biomass is a production of organic materials in the cell which maximize the growth. An uptakeis an absorption of chemical compounds by the cell from the environment to help the chemicalreactions during the cell metabolism process. Secretion is the release of chemical compounds, thatwere the products of some chemical reactions and not needed by the cell, into the environment.

Page 20: Characterization of Molecular Communication Based on Cell ...

7

compounds present in the environment. A better understanding of the workings of

biological pathways will help us better understand the cell metabolism, which is the

main focus of this study.

EnvironmentChemical

Composition

P

P

P

P

P

P

ATP

P

ATP

External Signal as compound 1, 2, 3,..SignalTranscription factorGene (DNA)mRNAEnzymeMetabolite

Receptors

Transporters

PhosphateCofactor

Chromosomal(DNA)

CytosolP

P

P

P

P

P

P

Figure 2.1: Graphical representation of the interconnection of signal transduction,gene regulation and metabolic pathways.

A biological pathway can be defined as a chain of reactions among chemical com-

pounds in a cell which leads to the production of a certain product, such as protein or

fat molecules. In some cases, a biological pathway can cause a change inside the cell

such as turning a gene expression ON/OFF. In order to carry out their designated

task, the compounds involved in biological pathways interact with each other and

with chemical signals inside the cell. The biological pathways can essentially be cate-

gorized into three kinds: signal transduction pathways, gene regulation pathways, and

cell metabolic pathways (which compose the aforementioned metabolic networks).

Page 21: Characterization of Molecular Communication Based on Cell ...

8

Signal transduction pathways transport information from the cell’s environment

to its interior. Cells have proteins on their surface, called receptors to which com-

pounds from the environment bind. After this binding, the information about the

compounds in the environment travels inside the cell where it is transported by spe-

cialized proteins that trigger specific reactions, these reactions are organized into

cascades. This cascade of phosphorylations, which is the addition of a phosphate

group to a molecule. This cascade of phosphorylation dictates the activation of tran-

scription factors. Generally, the output of these reactions is relayed to one or more

gene regulation pathways, detailed next. Figure 2.1 shows the chemical compounds

(small structures outside of the cell) from the cell’s environment and how they bind to

the receptors (yellow cylinder and red cube) located on the surface of the cell mem-

brane. This leads to the activation of the cell’s signaling pathways (interconnected

blue lines). Once these pathways are activated, special proteins (blue pentagons)

transport the signal internally which regulate the gene expression by activating or in-

hibiting transcription factors (purple triangles) which are initially inactive proteins

floats freely in the cytosol.

Gene regulation pathways consist of chemical reactions that govern the genes and

their expression level into proteins. Some of these proteins are enzymes (red stars in

Figure 2.1), that act as catalysts of specific biochemical reactions in metabolic path-

ways. They work by regulating the transcription and translation processes, where

transcription factors interfere with the deoxyribonucleic acid (DNA) copy into mes-

senger ribonucleic acid (mRNA) and the subsequent synthesis of the gene-encoded

proteins [18]. The transcription factors can be either active or inactive based on the

output of the aforementioned signal transduction pathways. Figure 2.1 also shows

how the activated transcription factors interact with the genes (DNA), which then

produce molecules mRNA. These mRNA molecules are subsequently translated into

Page 22: Characterization of Molecular Communication Based on Cell ...

9

proteins. The transcription factors, genes, and mRNA form the gene regulatory path-

ways. The newly produced proteins are used for internal cellular mechanisms such

as building cellular components, transporting the information in signal transduction

pathways, and acting as catalysts for the metabolic reactions.

Metabolic pathways are chains of chemical reactions inside the cell that break

down compounds present in the environment, such as sugar, minerals, or vitamins to

obtain energy and materials to expand the cell (biomass). In Figure 2.1 the cofactor

is an energy molecule (e.g. ATP) which drives the chemical reactions. Metabolic

pathways are regulated by the aforementioned enzymes, that control rate of the re-

actions constituting them. As result of enzyme regulation, chemical compounds are

exchanged between the cell and environment via special protein called transporters.

As we can see, the three main types of biological pathways such as signal transduc-

tion, gene regulation and metabolic pathways are tightly interconnected, and cells’

behavior can only be properly understood by studying these three processes together.

In this thesis we will apply on molecular communication concepts in order to study

the potential of cell metabolism, and its regulation, to channel information from the

outside environment into the cell. We also look at how the internal state changes of

cell metabolism are perceived from the outside environment. In the next section, we

detail the cell metabolism and its key aspects.

2.3 Cell Metabolism

Cell metabolism is a complex process comprising of pathways that break down the

chemical compounds to produce energy, synthesize proteins, and produce building

blocks required for the cell growth and reproduction [33]. Some of these compounds

are consumed as inputs for subsequent chemical reactions in the pathways, while the

Page 23: Characterization of Molecular Communication Based on Cell ...

10

remaining metabolites are used to either generate energy or build cell components

(biomass), while consuming energy or otherwise discarded through secretion. Addi-

tionally, these reactions also receive an uptake of chemical compounds from the cell’s

environment in the form of inputs to keep the chain of reactions going [42]. Most of

the chemical reactions that compose the aforementioned metabolic pathways do not

take place spontaneously, but are catalyzed by enzymes, defined above.

Cells have predefined mechanisms to control the activities of enzymes [30], mostly

through the aforementioned signal transduction and gene expression pathways. For

example, the cell can fine tune the rate at which the corresponding catalyzed reactions

occur by controlling the expression of the corresponding enzyme-encoding gene, which

in turn is controlled by the information about the compounds present in the external

environment that signal transduction pathways propagate inside the cell. Among

different adaptation mechanisms, we focus on the regulation of enzyme expression

from their corresponding DNA genes as a function of the input chemical compounds.

As we stated, the metabolic process is regulated by enzymes. The enzyme ex-

pression mechanism should be controlled depending on the required outcome. This is

achieved by regulating the transcription process, as discussed in the previous section.

2.4 The Mathematical Relation Between

Transcription Factors and Enzymes

The process underlying the expression of enzymes can be formalized mathematically.

This formalization is important as it forms the basis of the molecular communication

abstraction covered in the next section. Since the enzymes act as catalysts for most

of the chemical reactions that happen during cell metabolism, the rate at which

Page 24: Characterization of Molecular Communication Based on Cell ...

11

each enzyme is expressed directly influences the rate of the corresponding catalyst

reaction. According to a commonly accepted biological model, given a determinate

concentration of active transcription factors [TF* ], the rate Re at which an enzyme

is expressed is given by one of the following two sigmoidal expressions, called Hill’s

functions: [3], [44]:

Re =β[TF ∗]n

Knd + [TF ∗]n

if activation , (2.1)

Re =β

1 +(

[TF ∗]Kd

)n if repression ; (2.2)

where β is the maximum expression level of the enzyme, n is the Hill’s coefficient, hav-

ing values between 1 and 4 depending on how many transcription factors cooperatively

interact with the DNA gene and the equilibrium constant, Kd [44]. Equation (2.1)

models the situation where a higher concentration of transcription factors increase

the enzyme expression from zero to β (activation), while (2.2) models the opposite

(repression). Due to the sigmoidal behavior of the above two expressions with respect

to the active transcription factor concentration [TF* ], they can be expressed with a

logical approximation as follows [3]:

Re ' βH ([TF ∗]−Kd) if activation , (2.3)

Re ' βH (Kd − [TF ∗]) if repression ;

where H(.) is the Heaviside step function, equal to 1 when the argument is positive,

and 0 vice versa. This logical approximation makes sense because the sigmoidal

behavior expressed by (2.1) and (2.2) has the curve values close to two distinct states,

namely, zero and maximum expression level β. According to the approximation in

(2.3), the enzyme expression, and the rate of chemical reaction, can be separated into

Page 25: Characterization of Molecular Communication Based on Cell ...

12

distinct states: ON or OFF, depending on the concentration of active transcription

factors, which in turns depends on the information about the compounds present

in the environment propagated by the cell signal transduction pathways. The ON

state represents the maximum enzyme expression rate and corresponding metabolic

reaction rate, while OFF state represents no enzyme expression and absence of the

corresponding chemical reactions in the cell metabolism. Hence, variations in the

chemical compounds in the environment ultimately cause changes in the metabolic

state by activating or deactivating the chemical reactions. These changes in the

chemical reaction state (ON/OFF) are then reflected as variations in the uptake

and secretion of chemical compounds and the biomass (cell growth), which can be

experimentally measurable parameters when we consider the cell metabolic process

from the perspective of the environment.

Page 26: Characterization of Molecular Communication Based on Cell ...

13

Chapter 3

Cell Metabolism as a Molecular

Communication System

3.1 Molecular Communication Abstraction of

Cell Metabolism

In order to understand how the information flows from the cell’s environment to the

metabolic state, and how much information of the internal cell metabolic state can

be perceived from the outside environment, we need to understand the cell metabolic

regulation mechanism from the beginning to end. For, this, we propose a molecular

communication abstraction, sketched in Figure 3.1.

Part (a) in the below Figure 3.1 shows that the cell takes certain concentrations of

chemical compounds into the cell metabolism, where the variations in these concen-

trations cause state changes in the cell’s metabolic network, represented by dashed

and solid lines connecting dots. The dots represent the chemical compounds involved

in the metabolic reactions, or metabolites, the dashed lines represent the inactive

Page 27: Characterization of Molecular Communication Based on Cell ...

14

reactions, and the black and pink solid lines represent the active reactions, respec-

tively. The pink solid lines represent the active state- changing reactions i.e., the

reactions that change their state as function of variations in the chemical compounds

concentrations.

ZS 2016 10

c1c2c3

Input

Concentration 1

Concentration 2

Concentration 3

Chemical compound 1Chemical compound 2Chemical compound 3Chemical compound 4

r1=1

r3=1

r2=1r4=1

r5=0

rM=0

r7=1

r8=1

rN=1

Biological Cell

[c1…N]J1…N

Enzyme Expression Regulation

Molecular Communication Abstraction

r1=1

r3=1

r2=1r4=1

r5=0

rM=0

r7=1

r8=1

rN=1

U1…T

S1…JBiomass Growth/No

GrowthOutput

Metabolic Reactions

(a) Stage I

(b) Stage II

Figure 3.1: Sketch of the proposed molecular communication abstraction of cellmetabolism.

We model Part (a), or Stage I, as the molecular communication binary encoder

abstraction as in the form of expression of enzyme regulation and Part (b), or Stage II,

as the molecular communication digital modulator abstraction. These state changes

are reflected as variations in the uptake and secretion of chemical compounds and the

biomass (cell growth), which can be measured when we consider the cell metabolic

process from the perspective of the environment shown in Part (b) of the figure. The

uptake of compounds are represented by Uw, while the secretion of compounds are

represented by Sx. Hence, variations in the chemical compounds in the environment

Page 28: Characterization of Molecular Communication Based on Cell ...

15

can be modeled as input, whereas variations in the uptake, secretion, and biomass

can be modeled as the output of the whole cell metabolic regulation process.

MAXP 2016

Enzyme Expr. Reg. Channel

16

cN

c1

c3

c2

r1=1

r3=1

r2=1r4=1

r5=0

rM=0Transmitter

Receiver

Channel

c1 c2 c3 cN. . .

Transmitted Signal Channel

Chemical compound 1Chemical compound 2Chemical compound 3Chemical compound 4

Received Signal

r1 r2 r3 rM. . .

Metabolic Network State

Environment Chemical

CompositionEnzyme

ExpressionRegulation

Enzyme Expression Regulation

JN

J1

J3

J2

Figure 3.2: Sketch of the proposed molecular communication system based on cellmetabolism.

Part (a), molecular communication binary encoder abstraction, also shown in Fig-

ure 3.2, abstracts the system into a Transmitter, a Receiver and a Channel. The goal

of this abstraction is to model the enzyme regulation expression as a binary encoding

of the information contained in the chemical composition of the cell’s environment

“Some of this material appears in [38]”. The transmitter represents the environment

surrounding the cell, where the transmitted signal is the set of chemical compounds

present in the environment that are input of the pathways that compose the cell

metabolic network. The channel represents the mechanisms that regulate the expres-

sion of determinate enzymes in function of the chemical compounds in input, and

the receiver represents the cell metabolism, where the received signal is the result-

Page 29: Characterization of Molecular Communication Based on Cell ...

16

ing aforementioned activity (ON/OFF) of the chemical reactions catalyzed by these

enzymes. This abstraction is more formally expressed as

{c1, c2, ...cN}Enzyme Expression−−−−−−−−−−−→

Regulation{r1, r2, ...rM} , (3.1)

where ci is the concentration (number of molecules per unit volume) of the chemi-

cal compound i, N is the number of chemical compounds present in the environment

surrounding the cell and input of the metabolic pathway network, ri is a binary value

equal to 1 if the enzyme-expression-regulated reaction i is ON, and equal to 0 if the

same reaction is OFF, M is the number of enzyme-expression-regulated reactions that

change their state upon variations in the concentrations of input chemical compounds

ci.

ZS 2016 16

U1…T

U1… , S1… , GrTransmitted Signal Channel Received Signalr1 r2 r3 rM. . . Metabolic Reactions UT , SJ

r1=1

r3=1

r2=1

r5=0

rM=0

r7=1

r8=1

rN=1

S1…JBiomass

Growth/No Growth

TransmitterMetabolic Network

Receiver

Metabolic Interactionsw/ Environment

ChannelMetabolic Reactions

Chemical compound 1Chemical compound 2Chemical compound 3Chemical compound 4…

Abstraction of Our Digital Modulator (Stage II)

Figure 3.3: Sketch of the proposed Metabolic Reaction Abstraction.

In addition to the above abstraction, which lets us study the enzyme regulation

to channel information from the cell’s environment, we also need a model to under-

Page 30: Characterization of Molecular Communication Based on Cell ...

17

stand how much information of the internal cell metabolic state can be perceived

from the outside environment. For this, we propose another abstraction model called

digital modulator abstraction (Part b, Figure 3.1), also shown in Figure 3.3 which

models the metabolic network inside the cell as a digital modulator of metabolite

exchange/growth according to the information contained in the metabolic network

state. In this abstraction, the transmitter represents the cell itself, where the trans-

mitted signal is the metabolic state of the cell, which means the ON/OFF activity of

the state-changing enzyme-regulated reactions. The channel represents mechanisms

involved in the metabolic reactions as a function of the chemical compounds within

the cell’s input. The receiver represents i) the cell’s environment, and ii) the biomass

(as a form of cell growth) where the received signal is the variation in the uptake and

biomass, respectively. This is more formally expressed as

{r1, r2, ...rM}Metabolic−−−−−→Reactions

{U1, ...Uk, S1, ...., Sj, Gr} , (3.2)

The channel model of the above mentioned abstraction is shown in Equation (3.2)

where ri is a binary value equal to 1 if the enzyme-expression-regulated reaction i

is ON, and equal to 0 if the same reaction is OFF, M is the number of enzyme-

expression-regulated reactions. Uw is the flux, the velocity of molecule concentration

propagating in space (e.g., from environment to cell), of metabolites uptaken from the

environment, Sx is the flux of metabolites excreted by the cell into its environment

and Growth (Gr) represents the flux of added components to the cell in the form of

biomass.

We can conclude that the system shown in Figure 3.1 illustrates that the cell’s

behavior can be modeled as a transceiver. This is because the cell i) acts as a receiver

when its internal metabolic state changes as it takes as input a signal in the form of

Page 31: Characterization of Molecular Communication Based on Cell ...

18

variations of the compounds in its environment, and ii) the cell acts as a transmitter

when the state changes of its metabolic network reactions act as a signal, which is

transmitted into the output changes in chemical compounds uptake and secretion

from/into environment and in the biomass.

3.2 Steady-State Mutual Information

3.2.1 Stage I

We define the steady-state mutual information I of the molecular communication sys-

tem abstraction (explained in Chapter 3) for the aforementioned Stage I as the amount

of information about the chemical composition of the cell’s environment measured in

bits that a cell is able to represent in the binary state of its enzyme-expression-

regulated metabolic reactions at steady state, after any evolution of the enzyme-

expression regulation channel “Some of this material appears in [38]”. According to

information theory [9], the mutual information for Stage I can be defined as follows:

I({ci}Ni=1 ; {ri}Mi=1) = H({ci}Ni=1)−H({ci}Ni=1 | {ri}Mi=1) , (3.3)

where the input entropy H({ci}Ni=1) can be defined as

H({ci}Ni=1) = −∫P({ci}Ni=1

)log2 P

({ci}Ni=1

)d {ci}Ni=1 , (3.4)

where the integration∫

is performed throughout the possible values that the set of

chemical compound concentrations {ci}Ni=1 can assume. The conditional entropy of

Page 32: Characterization of Molecular Communication Based on Cell ...

19

the input given the output H({ci}Ni=1 | {ri}Mi=1) is then defined as follows:

H({ci}Ni=1| {ri}Mi=1) = −

K∑k=1

P([{ri}Mi=1

]k

)(3.5)∫

P({ci}Ni=1

∣∣∣[{ri}Mi=1

]k

)log2 P

({ci}Ni=1

∣∣∣[{ri}Mi=1

]k

)d {ci}Ni=1 ;

respectively, where K is equal to the total number of different sets of binary values

at the output of the system[{ri}Mi=1

]k

resulting from the all the possible values that

the input chemical compound concentrations {ci}Ni=1 can assume, and P (.) is the

probability distribution of the argument random variable/s. In the aforementioned

definition of mutual information, we are ignoring possible memory in the system, i.e.,

the values in {ri}Mi=1 could depend on the past trajectory of the values of the input

concentrations {ci}Ni=1. This might be the effect of hysteretic behaviors in the gene

regulatory functions, which we currently ignore all the assumptions are in Section 2,

with the justification that many of these mechanisms are even poorly understood in

biology [10].

3.2.2 Stage II

We define the steady-state mutual information I for the aforementioned Stage II

as the amount of information of the internal binary cell metabolic state that can

be perceived from the outside environment through the metabolic-state-modulated

values of the fluxes of uptaken and secreted metabolites, and the biomass (growth).

Page 33: Characterization of Molecular Communication Based on Cell ...

20

According to information theory [9], this can be defined as

I({ri}Mi=1 ; {Ut}Tt=1 , {Sj}Jj=1 , Gr) = H(({ri}Mi=1)−H({ri}Mi=1 | {Ut}Tt=1 , {Sj}Jj=1 , Gr) ,

(3.6)

where the input entropy H({ri}Mi=1) can be defined as

H({ri}Mi=1) = −K∑k=1

P({rik}

Mi=1

)log2 P

({rik}

Mi=1

), (3.7)

where the summation∑

is performed throughout input of the FBA Groups

based on similar binary enzyme-expression-regulated state changing {ri}Mi=1 within cell

metabolism. The conditional entropy of the input given the outputH({ri}Mi=1 | {Ut}Tt=1

, {Sj}Jj=1 , Gr) is then defined as follows:

H({ri}Mi=1| {Ut}Tt=1 , {Sj}Jj=1 ,Gr) = −Q∑

q=1

P

([{Ut}Tt=1 , {Sj}Jj=1 ,Gr

]q

)(3.8)

K∑k=1

P

({rik}

Mi=1

∣∣∣∣[{Ut}Tt=1 , {Sj}Jj=1 ,Gr]q

)log2 P

({rik}

Mi=1

∣∣∣∣[{Ut}Tt=1 , {Sj}Jj=1 ,Gr]q

);

respectively, where Q is equal to the total number of different sets of flux values at

the output of the system {Ui}ki=1 , {Si}ji=1 ,Growth (Gr) resulting from the enzyme-

expression-regulated reactions {ci}Ni=1 within cell metabolism, and P (.) is the proba-

bility distribution of the argument random variable/s.

Page 34: Characterization of Molecular Communication Based on Cell ...

21

Chapter 4

In silico Characterization of the

Mutual Information of Cell

Metabolism through Flux Balance

Analysis

4.1 Estimation of Optimal Enzyme Expression

Regulation through Flux Balance Analysis

Flux Balance Analysis (FBA) is a widely used and computationally efficient math-

ematical method to estimate the chemical reactions that might be active in the

metabolic network, a cell metabolic network, given determinate environmental condi-

tions [34], [5]. FBA uses a set of linear equations and applies optimization techniques

to estimate the metabolic network state that results in the maximum growth rate

under given determinate conditions. As a consequence, FBA can be utilized for es-

Page 35: Characterization of Molecular Communication Based on Cell ...

22

timating differences in optimal metabolic network states of a cell corresponding to

different environmental conditions. As detailed in the following, these estimates are

the basis for our in silico mutual information “Some of this material appears in [38]”.

Through FBA we are able to obtain an estimate of the state {r∗i }Mi=1 of the afore-

mentioned enzyme-expression-regulated chemical reactions that results into an overall

maximum biomass production. The FBA-estimated chemical reaction states {r∗i }Mi=1

are those that maximize the growth of the cell given a chemical composition of the

surrounding environment {ci}Ni=1, and represent the best regulation of these chemical

reactions that the cell might ever achieve. The aforementioned mechanisms of activa-

tion or repression that might be in place for the regulation of enzyme expression have

been most probably acquired through evolution. Although they tend to reach this

optimal solution, they might just realize a subset of the needed reaction state adapta-

tions [11]. The estimation of the optimal enzyme expression regulation through FBA

can be formalized as follows:

{c1, c2, ...cN}Flux Balance−−−−−−−→

Analysis{r∗1, r∗2, ...r∗M} , (4.1)

The FBA computation stems from the construction of a GEnome-scale Model

(GEM) for the organism under analysis.The construction of a GEM greatly benefits

from the availability of cell genome information and the latest advances in bioinfor-

matics techniques [5]. In particular, this is realized by searching for known genes

that encode metabolic enzymes, defined in Section 2.2, which are possibly activating

metabolic chemical reactions. These reactions are described in extensively curated

online catalogs. Subsequently, further chemical reactions are included in the GEM

through comparisons with the genomes and the corresponding known metabolic path-

ways of other similar organisms that have been already extensively studied and an-

Page 36: Characterization of Molecular Communication Based on Cell ...

23

notated. Subsequently, further chemical reactions are included in the GEM through

comparisons with the genomes and the corresponding known metabolic pathways of

other similar organisms that have been already extensively studied and annotated.

Glyoxylateanddicarboxylatemetabolism

Glycolysis/Gluconeogenesis

Metabolicpathways-Bacteroides thetaiotaomicron

(a)(b)

(c)

Figure 4.1: KEGG module of Bacteroides thetaiotaomicron metabolic pathways. (a)Summary of the biological processes shown in the pathway map of Glycolysis / Glu-coneogenesis and Glyoxylate and dicarboxylate metabolism (b) Enlarged fine detailsof a section of a complete metabolic model, (c) Part of the complete KEGG databasepathway maps of Bacteroides thetaiotaomicron.

Figure 4.1 (c) shows a graphical representation of parts of a GEM (on the right)

for the organisms Bacteroides thetaiotaomicron, also referred to as B. theta, used in

our study, which is obtained from the Kyoto Encyclopedia of Genes and Genomes

(KEGG) database [26]. The nodes represent compounds that are inputs/outputs to

the reactions, and edges represent the chemical reactions. Inputs from the environ-

ment are taken by the organism are involved in and are involved in the reactions of

metabolic pathways, resulting in the exchange of fluxes with the environment (uptake

and secretion), or in the production of biomass (growth) to in the exchange of fluxes

Page 37: Characterization of Molecular Communication Based on Cell ...

24

with the environment (uptake and secretion), or in the production of biomass.

The set of possible metabolic reactions inside a GEM can be expressed through

a stoichiometric matrix S, where each row represents a chemical compound and each

column represents a possibly active metabolic reaction in cell metabolism. Figure 4.2

shows an example of how a stoichiometric matrix S can be obtained from a metabolic

network. Figure 4.2. (a) shows a conceptual model of a metabolic network where the

internal chemical reactions are represented by Ri, and the exchange fluxes with the

environment are represented by Ej, as shown in Figure 4.2 (b). The stoichiometric

matrix S in Figure 4.2. (c) is organized in a way that each row corresponds to a

chemical compound in the metabolic network, while each column corresponds the

reaction or flux exchanged with the environment. Each entry of the stoichiometric

matrix S is the stoichiometric coefficient that indicates how many molecules of a

chemical compound, represented by row entry, are consumed (coefficient < 0) or

produced (coefficient > 0) in one of the possible reactions,

R1 R2 R3R4R5 R6 R7 R8 E1 E2 E3 E4

RBiomass

A -1 0 0 0 0 0 0 0 0.1 0 0 0 0

B 0 -1 0 0 0 0 0 0 0 1 0 0 0

C 1 1 -1 0 0 0 0 0 0 0 0 0 0

D 0 0 1 -1 0 0 0 0 0 0 0 0 0

E 0 0 0 1 0 -1 0.8 0 0 0 0 0 -1

F 0 0 0 0 3 0 0 0 0 0 -1 0 0

G 0 0 0 0 0 1 -1 -1 0 0 0 0 0

H 0 0 0 0 0 0 0 3 0 0 0 -1 -1

Aout 0 0 0 0 0 0 0 0 -1 0 0 0 0

Bout 0 0 0 0 0 0 0 0 0 -1 0 0 0

Fout 0 0 0 0 0 0 0 0 0 0 1 0 0

Hout 0 0 0 0 0 0 0 0 0 0 0 1 0

ATP 0 0 -1 3 4 0 0 -1 0 0 0 0 -5

NADH 0 0 0 2 0 0 3 -5 0 0 0 0 0

BIOMASS 0 0 0 0 0 0 0 0 0 0 0 0 1

E1- E4,BIOMASS– ExchangeFlux,R1– R8– InternalReactions

R1 A ---->C

R2 B---->C

R3 C+ATP---->D

R4 D---->E+3ATP+2NADH

R5 E---->3F+4ATP

R6 E---->G

R7 G<---->0.8E+3NADH

R8 G+ATP+5NADH<---->3H

E1 Aout <--->0.1A

E2 Bout <--->B

E3 F<--->Fout

E4 H<--->Hout

RBiomass E+H+5ATP ---->BIOMASS

Reactions

Metabolites

Stoichiometricmatrix,S

ListofReactions

*

vR1vR2vR3vR4vR5vR6vR7vR8vE1vE2vE3vE4

vBiomass

=0

Fluxes,v

(b) (c)

Aout

BIOMASS

ATP

Cytoplasm

Extracellular

Metabolite Cofactor Reversible Irreversible

E1 E2

E3

E4

R1 R2

R3

R4

RBiomass

(a)

Aout

A B

C

D

ATP

NADH

E

G

R6

NADHFR5

Fout

ATP NADH

R8

H Fout

ATPR7

Figure 4.2: From metabolic network to stoichiometric matrix. (a) Conceptual modelof a metabolic network. (b) List of the reactions participating in the metabolicnetwork. (c) Corresponding stoichiometric matrix.

Page 38: Characterization of Molecular Communication Based on Cell ...

25

The FBA solution is expressed in terms of v∗ which is a column vector that

contains the optimal flux of each reaction, defined as the number of molecules per unit

volume and unit time that are consumed/produced by that reaction. It is obtained

through a Linear Program (LP), formalized as follows [34]:

maximize a′v

subject to Sv = 0

vmin ≤v ≤ vmax ,

where a is a column vector that contains the weight coefficients of the fluxes

that the FBA optimizes. In our case, the entries of a are equal to 1 only at the

indexes corresponding to the chemical compounds that are considered part of the

aforementioned biomass, produced by the cell and responsible for cell growth, while

the other entries are equal to 0. The column vectors vmin and vmax constrain the

minimum and maximum flux, respectively, of each corresponding reaction considered

in the FBA, and define the space where the LP searches for the optimal solution.

This constraint based modeling can be explained by the Figure 4.3 [34]:

V3

V2

V1UnconstrainedSolutionSpace

Mass–balanceconstraint

! • $ = 0Bounds

'()* ≤ ' ≤ '(,-

V3

V2

V1Allowable

SolutionSpace

.,-)()/012)3(,44

UsingLinearProgramming

V3

V2

V1

OptimalSolution

(a) (b) (c)

Figure 4.3: Conceptual model of the FBA LP formulation for finding the optimalsolution.

Page 39: Characterization of Molecular Communication Based on Cell ...

26

When growth/flux constraints are applied by the lower and upper bounds vmin and

vmax, in addition to the constraints from the aforementioned stoichiometric matrix

S, the entire solution space (shown in (a)) of the metabolic network can be reduced

to a smaller allowable solution space, shown in (b). Then, optimization is applied

on the objective function a′v to identify a single optimal solution, which is a flux

distribution lying on the edge of the allowable solution space as shown in (c).

The values of vmin and vmax are set to reasonable biological limiting values [34],

with the exception of the reaction corresponding to the uptake of the input chemical

compounds present in the surrounding environment {ci}Ni=1 for which we are estimat-

ing the chemical reaction states {r∗i }Mi=1. This is expressed as follows:

vmin,i = vmax,i = Ji

({ci}Ni=1

), i = 1, ..., N (4.2)

where Ji is in general a function of all the input concentrations {ci}Ni=1 that returns the

flux of input chemical compound i, and depends on the particular method employed by

cells to uptake this chemical compound (e.g. facilitated diffusion or active transport

through the cell membrane as shown in Figure 2.1). The expression of Ji (.) is in

general known from biochemistry literature. As an example, the expressions for the

input glucose Jgl and lactose Jlac fluxes considered in part of the numerical examples

of this thesis are as follows [40]:

Jgl (cgl) =Jmaxgl cgl

Φgl + cgl, (4.3)

Jlac (cgl, clac) =Jmaxlac clac

klac + clac

(1− φgl

Jmaxgl cgl

kgl + cgl

),

where the parameter values can be found in Table 3 of [40].

The estimates of the chemical reaction states {r∗i }Mi=1 are finally computed from the

Page 40: Characterization of Molecular Communication Based on Cell ...

27

optimal flux vector v∗, which is obtained by the FBA given the chemical composition

of the surrounding environment {ci}Ni=1, as follows:

r∗i =

0, if v∗i = 0

1, otherwise, (4.4)

4.2 An Upper Bound Steady-State Mutual

Information of Cell Metabolism

4.2.1 Upper Bound for Stage I

Given the optimal estimates of the chemical reaction states {r∗i }Mi=1 obtained through

the FBA from the knowledge of the cell’s genome for all the values that our input

set of chemical compound concentrations {ci}Ni=1 can assume, we can compute the

following steady-state mutual information “Some of this material appears in [38]”:

I({ci}Ni=1 ; {r∗i }Mi=1) = H({ci}Ni=1)−H({ci}Ni=1 | {r

∗i }

Mi=1) , (4.5)

where H({ci}Ni=1 | {r∗i }Mi=1) is computed through Equation (3.5) by substituting the

chemical reaction states {ri}Mi=1 resulting from the real regulation of the enzyme ex-

pression with the FBA-estimated chemical reaction states {r∗i }Mi=1.

In this thesis, we consider the mutual information in Equation (4.5) computed with

the results of the FBA as an upper bound to the real steady-state mutual information

in Equation (3.3) that we would obtain in reality as a result of the enzyme expression

regulation. This is formalized as follows:

I({ci}Ni=1 ; {r∗i }Mi=1) ≥ I({ci}Ni=1 ; {ri}Mi=1) . (4.6)

Page 41: Characterization of Molecular Communication Based on Cell ...

28

The expression in Equation (4.6) can be proven through the Data Processing

Inequality from information theory [9], which states that the aforementioned inequal-

ity holds true if the steady-state chemical reaction states {ri}Mi=1 given a set {ci}Ni=1

of values for the input concentrations can be probabilistically determined from the

sole knowledge of the chemical reaction states {r∗i }Mi=1, without the need of having

knowledge of the input concentrations. This is expressed as follows [9]:

P({ri}Mi=1

∣∣∣{ci}Ni=1 , {r∗i }

Mi=1

)= P

({ri}Mi=1

∣∣∣{r∗i }Mi=1

). (4.7)

Equation (4.7) can be explained by considering that the chemical reaction states

{r∗i }Mi=1 are those that underlie the optimally regulated cell metabolism that maxi-

mizes the cell growth rate (or biomass production) given a set of values for the input

concentrations {ci}Ni=1. In reality, when subject to the same input concentrations

{ci}Ni=1, a cell reaches the steady-state chemical reaction states {ri}Mi=1, which might

be in general different from the aforementioned optimal states. If these states are

indeed not optimal, the cell will not grow (produce biomass) and reproduce at the

maximum rate possible given the input concentrations {ci}Ni=1. When considering

multiple cells in a population subject to the same input concentrations, if different

cells show different steady-state chemical reaction states (because of cell-cell variabil-

ity), those that have states closer to the optimal states will grow faster, and ultimately

outnumber other cells. As a consequence, cells have evolved gene expression regula-

tion mechanisms, such as those described in Section. 2.3, through which they adapt

their steady-state chemical reaction states as close as possible to optimality given a set

of input concentrations [10]. Given the optimal chemical reaction states {r∗i }Mi=1 and

the knowledge of the gene expression regulation mechanisms in place in a particular

cell species, we are theoretically able to estimate the probability distribution of the

Page 42: Characterization of Molecular Communication Based on Cell ...

29

steady-state chemical reaction states {ri}Mi=1, which are those that best approximate

the optimal states. As a consequence, under the aforementioned assumptions, the

conditional probability of the steady-state chemical reaction states {ri}Mi=1 given the

input concentrations {ci}Ni=1 and chemical reaction states {r∗i }Mi=1 is equal to the same

probability but only conditioned to the chemical reaction states {r∗i }Mi=1, as expressed

in Equation (4.7).

4.2.2 Upper Bound for Stage II

In order to compute the amount of information of the internal optimal binary metabolic

state that can be perceived from the external environment (update/secretion fluxes,

biomass) of the cell, the calculation of the steady-state mutual information for Stage

II, defined in Section 3.2.2, is performed by considering the optimal chemical reac-

tion states {r∗i }Mi=1 as input, and the optimal values of the fluxes of uptaken {U∗t }

Tt=1

and secreted{S∗j}Jj=1

metabolites, and biomass Gr∗. The latter values are obtained

from the FBA solution as the values in the aforementioned optimal flux vector v∗

corresponding to uptake, secretion, or biomass reactions, respectively. This mutual

information is expressed as

I({r∗i }Mi=1 ; {U∗t }

Tt=1 ,

{S∗j}Jj=1

, Gr∗) = (4.8)

H({r∗i }Mi=1)−H({r∗i }

Mi=1 | {U

∗t }

Tt=1 ,

{S∗j}Jj=1

, Gr∗) ,

whereH({r∗i }Mi=1) andH({r∗i }

Mi=1 | {U∗t }

Tt=1 ,

{S∗j}Jj=1

, Gr∗) are computed through Equa-

tions (3.7) and (3.8) by substituting the chemical reaction states {r∗i }Mi=1 and the fluxes

{U∗t }Tt=1,

{S∗j}Jj=1

, and biomass Gr∗ in place of those resulting from the real regula-

tion of cell metabolism. In general, the values computed through the expression in

Equation (4.5) might differ from those from Equation 3.6. Nevertheless, since Stage

Page 43: Characterization of Molecular Communication Based on Cell ...

30

II is in series to Stage I, and the optimal flux vector v∗ can be deterministically com-

puted from the optimal chemical reaction states {r∗i }Mi=1 by LP optimization, we only

need to consider Equation (4.8) to obtain an upper bound to the mutual information

of the overall system.

A detailed description of the computation of Equations (4.5) and (4.8) for the

organisms considered in this work are provided in Chapter 6.

Page 44: Characterization of Molecular Communication Based on Cell ...

31

Chapter 5

Data Generation Workflow

5.1 Data generation

For in silico, we generated simulation data using KBase [25] which is an open source

software and data platform that allows users to upload their data, collaborate by

sharing with other users, perform automated analysis, and publish their results and

conclusions. The simulation being an execution of a static model, allows us to collect

the chemical reactions that occur in the metabolic model, under selected configura-

tions.

We follow a similar data generation process for the three organisms E. coli, B.

theta and M. smithii. As shown in Figure 5.1, first we build an initial metabolic model

for these organisms. To achieve this, we run the Build Metabolic Model app provided

by KBase, which uses a protein phylogeny database to translate both the organ-

ism’s genome into protein sequences. This initial metabolic model is a genome-scale

metabolic network of biochemical reactions reconstructed from functional protein an-

notations derived from biochemistry literature and similarity with other known pro-

tein sequences. These protein functions are then mapped onto biochemical reactions

Page 45: Characterization of Molecular Communication Based on Cell ...

32

in the KEGG model [26]. This initial model is a draft model and only a starting point

because this metabolic model might be missing some crucial reactions and might have

incorrect protein functional annotations in its network that are required to generate

biomass. A total of 121 initial draft models were created for E. coli, one for each

of the 121 configurations. For B. theta and M. smithii, a total of 128 initial draft

models were created, one for each of the 128 configurations. The details on how these

configurations are created will be discussed in detail in the Sections 5.2 and 5.3.

AnnotatedGenome/Organism

GapfillMetabolicModel

Draftmodels

Gapfillmodels

RunFluxBalanceAnalysis

FBAmodels/objective“Growth”

Media,Reactiondatabase Media

Figure 5.1: KBase Workflow for simulation data.

The next step is the gap filling process [20] where a minimal number of bio-

Page 46: Characterization of Molecular Communication Based on Cell ...

33

chemical reactions and compounds are added to the initial metabolic model with the

goal of making the metabolic network able to synthesize the biomass in a specified

medium. For each of the configurations, gap filled models were produced from their

corresponding initial draft models.

Finally the growth rate of an organism (biomass production rate) in a speci-

fied medium can be obtained by running the FBA. FBA simulates how metabolites

flow through the metabolic network of an organism in a specified medium and uses

constraint-based approach discussed in Section 4.1 a to estimate the steady-state

biomass production rate. The biomass information resulting from the FBA is used to

quantify the mutual information parameter which will be discussed in the following

section.

AnnotatedGenome/Organism

GapfillMetabolicModel

Draftmodels

Gapfillmodels

FBAmodels/objective

“Growth”

Media,Reaction

database

Media

RunFluxBalanceAnalysis

CompareFBA

Solutions

Inputfluxconcentration

State-ChangingReaction

cpd00001_c0rxn00001_c_c0

cpd00012_c0

cpd00009_c0

cpd00067_c0

rxn00007_c_c0

rxn00011_c_c0

cpd00011_c0

cpd03049_c0

cpd00020_c0

cpd00056_c0

rxn00018_c_c0

cpd00871_c0

cpd00169_c0

rxn00022_c_c0

rxn00023_c_c0

cpd00338_c0

rxn00029_c_c0

cpd00689_c0

rxn00048_c_c0

cpd02656_c0

cpd00220_c0

cpd02882_c0

rxn00060_c_c0

cpd00013_c0

cpd00755_c0

rxn00065_c_c0

rxn00077_c_c0

cpd00006_c0 rxn00085_c_c0

cpd00023_c0

cpd00005_c0

cpd00024_c0

cpd00053_c0

rxn00086_c_c0

rxn00090_c_c0

cpd00002_c0

rxn00100_c_c0

cpd00655_c0

cpd00008_c0

cpd00010_c0

rxn00102_c_c0

rxn00105_c_c0

cpd00355_c0cpd00003_c0

rxn00113_c_c0

cpd00146_c0

rxn00117_c_c0

cpd00014_c0cpd00062_c0

cpd00018_c0

rxn00121_c_c0

cpd00050_c0

cpd00015_c0

rxn00125_c_c0

cpd00017_c0

cpd00227_c0

cpd00147_c0

rxn00126_c_c0

rxn00138_c_c0

rxn00139_c_c0

cpd00103_c0

cpd00128_c0

rxn00140_c_c0

rxn00141_c_c0

cpd00019_c0

cpd00135_c0

cpd00182_c0

rxn00147_c_c0

cpd00061_c0

rxn00157_c_c0

cpd00022_c0

cpd00047_c0

rxn00171_c_c0

rxn00173_c_c0

cpd00196_c0

rxn00178_c_c0cpd00279_c0

rxn00184_c_c0

rxn00187_c_c0

rxn00193_c_c0

cpd00186_c0

rxn00213_c_c0

cpd00089_c0

cpd00026_c0

cpd00027_c0

rxn00223_c_c0

cpd00082_c0

rxn00224_c_c0

cpd00028_c0

cpd10515_c0

cpd01476_c0

rxn00237_c_c0

cpd00031_c0

cpd00038_c0

cpd00126_c0

rxn00239_c_c0

rxn00250_c_c0

cpd00242_c0

cpd00032_c0

rxn00258_c_c0cpd00070_c0

rxn00260_c_c0

cpd00041_c0

rxn00268_c_c0

rxn00278_c_c0

cpd00035_c0

cpd00004_c0

rxn00283_c_c0

cpd00117_c0

rxn00285_c_c0

cpd00036_c0

cpd00078_c0

rxn00293_c_c0

cpd00037_c0

cpd02611_c0

rxn00299_c_c0

cpd02978_c0

rxn00300_c_c0

cpd00957_c0

rxn00305_c_c0

rxn00313_c_c0

cpd00516_c0

cpd00039_c0

rxn00324_c_c0

rxn00333_c_c0

rxn00337_c_c0

cpd01977_c0

rxn00346_c_c0

cpd00085_c0

rxn00350_c_c0

cpd01017_c0

cpd00042_c0

rxn00364_c_c0

cpd00046_c0

cpd00096_c0

rxn00371_c_c0

rxn00383_c_c0

rxn00392_c_c0

rxn00407_c_c0

cpd00052_c0

rxn00409_c_c0

rxn00410_c_c0

rxn00412_c_c0

rxn00414_c_c0

rxn00416_c_c0

cpd00132_c0

rxn00420_c_c0

rxn00423_c_c0

cpd00054_c0

cpd00722_c0

rxn00438_c_c0

cpd00793_c0

cpd00482_c0

rxn00459_c_c0

rxn00461_c_c0

cpd02820_c0

rxn00465_c_c0

cpd00274_c0

cpd00064_c0

rxn00470_c_c0

cpd00118_c0

rxn00471_c_c0

cpd00129_c0

rxn00474_c_c0cpd00359_c0

cpd00065_c0

rxn00493_c_c0

cpd00066_c0

cpd00143_c0

cpd00029_c0

rxn00507_c_c0

cpd00071_c0

rxn00515_c_c0

cpd00090_c0

cpd00068_c0

rxn00519_c_c0

rxn00527_c_c0

cpd00069_c0

cpd00868_c0

cpd00162_c0

rxn00539_c_c0

rxn00541_c_c0

rxn00542_c_c0

cpd00159_c0

rxn00548_c_c0

cpd00072_c0

cpd00236_c0

rxn00549_c_c0

rxn00555_c_c0

cpd00288_c0

rxn00558_c_c0

cpd00084_c0

rxn00566_c_c0

cpd00239_c0

rxn00599_c_c0

rxn00611_c_c0

cpd00080_c0

cpd00095_c0

rxn00612_c_c0

rxn00623_c_c0

rxn00649_c_c0

rxn00650_c_c0

cpd00033_c0

rxn00680_c_c0

rxn00686_c_c0

cpd00087_c0

cpd00330_c0

rxn00690_c_c0

cpd00201_c0

rxn00692_c_c0

cpd00125_c0

rxn00693_c_c0

cpd00060_c0

cpd00345_c0

rxn00703_c_c0

cpd00179_c0

rxn00704_c_c0

rxn00710_c_c0

cpd00810_c0

cpd00091_c0

rxn00726_c_c0

cpd00216_c0

cpd00093_c0

rxn00737_c_c0

rxn00740_c_c0

cpd00822_c0

cpd00094_c0

rxn00747_c_c0

cpd00102_c0

rxn00770_c_c0

cpd00101_c0

rxn00772_c_c0

cpd00105_c0

rxn00773_c_c0

rxn00775_c_c0

rxn00777_c_c0

cpd00171_c0

rxn00781_c_c0

cpd00203_c0

rxn00782_c_c0

rxn00785_c_c0

cpd00290_c0

rxn00786_c_c0

rxn00789_c_c0

cpd01775_c0

rxn00790_c_c0

cpd01982_c0

rxn00791_c_c0

cpd02642_c0

rxn00798_c_c0

cpd00130_c0

rxn00799_c_c0

cpd00106_c0

cpd02375_c0

rxn00800_c_c0

cpd02152_c0

rxn00802_c_c0

cpd00051_c0

rxn00806_c_c0

cpd00107_c0

cpd00200_c0

rxn00829_c_c0

cpd00113_c0cpd00841_c0

rxn00830_c_c0

cpd00202_c0

rxn00832_c_c0

cpd00114_c0

cpd02884_c0

rxn00834_c_c0

cpd00497_c0

rxn00838_c_c0

rxn00839_c_c0

rxn00851_c_c0

cpd00731_c0

rxn00859_c_c0

rxn00863_c_c0

cpd01324_c0

cpd00119_c0

cpd02498_c0

rxn00898_c_c0

cpd00123_c0rxn00902_c_c0

cpd01646_c0

rxn00903_c_c0

cpd00156_c0

rxn00908_c_c0

rxn00912_c_c0

cpd00712_c0

rxn00914_c_c0

cpd00311_c0

rxn00917_c_c0

rxn00926_c_c0

cpd00226_c0

rxn00927_c_c0

rxn00941_c_c0

rxn00949_c_c0

rxn00952_c_c0

rxn00966_c_c0

cpd00136_c0

rxn00979_c_c0

rxn01000_c_c0

cpd00219_c0

rxn01018_c_c0

cpd00343_c0

rxn01019_c_c0

rxn01054_c_c0

rxn01069_c_c0

cpd00809_c0

cpd00161_c0

cpd00229_c0

rxn01072_c_c0

cpd00007_c0

cpd00025_c0

rxn01100_c_c0

rxn01101_c_c0

rxn01106_c_c0rxn01111_c_c0

rxn01114_c_c0

cpd00258_c0

rxn01116_c_c0

cpd00198_c0

rxn01134_c_c0

rxn01138_c_c0

rxn01151_c_c0

cpd00185_c0

cpd00875_c0

rxn01152_c_c0

rxn01200_c_c0

cpd00238_c0

rxn01207_c_c0

cpd01882_c0

rxn01208_c_c0

cpd02605_c0

rxn01213_c_c0

cpd00283_c0

cpd00932_c0

rxn01255_c_c0

rxn01256_c_c0

rxn01257_c_c0

cpd08210_c0

rxn01258_c_c0

cpd00658_c0

cpd00218_c0

rxn01265_c_c0

cpd00873_c0

rxn01268_c_c0

rxn01269_c_c0

rxn01298_c_c0

cpd00309_c0

rxn01300_c_c0

rxn01301_c_c0

cpd00346_c0

rxn01302_c_c0

rxn01303_c_c0

rxn01304_c_c0

rxn01332_c_c0

cpd02857_c0

rxn01334_c_c0

rxn01343_c_c0

rxn01362_c_c0

cpd00247_c0

rxn01434_c_c0

rxn01451_c_c0

rxn01452_c_c0

rxn01454_c_c0

cpd00292_c0

rxn01465_c_c0

cpd00282_c0

rxn01466_c_c0

cpd00350_c0

rxn01484_c_c0

cpd00293_c0

rxn01485_c_c0

rxn01492_c_c0

cpd00802_c0

rxn01500_c_c0

cpd00332_c0

rxn01505_c_c0

rxn01512_c_c0

cpd00297_c0

cpd00357_c0

rxn01513_c_c0

cpd00298_c0

rxn01519_c_c0

cpd00358_c0

cpd00299_c0

rxn01520_c_c0

rxn01545_c_c0

cpd01217_c0

rxn01547_c_c0

rxn01575_c_c0

cpd00322_c0

cpd00508_c0

rxn01603_c_c0

cpd00683_c0

rxn01607_c_c0

cpd00812_c0

rxn01629_c_c0

cpd02345_c0

rxn01636_c_c0

cpd00342_c0

cpd00477_c0

rxn01637_c_c0

cpd00918_c0

rxn01643_c_c0

rxn01644_c_c0

cpd02120_c0

rxn01649_c_c0

rxn01669_c_c0

rxn01672_c_c0

cpd00356_c0

rxn01675_c_c0

cpd00626_c0

cpd02210_c0

rxn01682_c_c0

rxn01739_c_c0

cpd00383_c0

cpd02030_c0

rxn01740_c_c0

cpd01716_c0

rxn01790_c_c0

cpd00408_c0

rxn01791_c_c0

rxn01792_c_c0

cpd00644_c0

rxn01917_c_c0cpd02552_c0

rxn01934_c_c0

cpd00484_c0

cpd02043_c0

rxn01961_c_c0

cpd00504_c0

rxn01974_c_c0

rxn01997_c_c0

cpd00521_c0

rxn02000_c_c0

cpd00522_c0

rxn02003_c_c0

cpd02113_c0

rxn02008_c_c0

cpd00525_c0

cpd00890_c0

rxn02011_c_c0

cpd02964_c0

rxn02056_c_c0

cpd00557_c0

cpd03426_c0

rxn02154_c_c0

rxn02155_c_c0

rxn02159_c_c0

cpd00641_c0

rxn02160_c_c0

cpd00807_c0

rxn02175_c_c0

cpd00834_c0

rxn02185_c_c0

cpd00668_c0

rxn02186_c_c0

rxn02187_c_c0

cpd02569_c0

rxn02200_c_c0

cpd00443_c0

cpd00954_c0

rxn02212_c_c0

cpd00699_c0

rxn02213_c_c0

rxn02264_c_c0

cpd00774_c0

rxn02269_c_c0

cpd00760_c0

rxn02285_c_c0

cpd00773_c0

rxn02286_c_c0

rxn02288_c_c0

cpd02083_c0

rxn02303_c_c0

cpd00791_c0

rxn02304_c_c0

cpd02654_c0

rxn02305_c_c0

cpd02894_c0

rxn02320_c_c0

cpd00930_c0

rxn02322_c_c0

rxn02341_c_c0

cpd02666_c0

rxn02402_c_c0

cpd02333_c0

rxn02405_c_c0

cpd02546_c0

rxn02465_c_c0

cpd02843_c0

rxn02473_c_c0

rxn02474_c_c0

cpd02720_c0

cpd00931_c0

rxn02475_c_c0

rxn02476_c_c0

rxn02484_c_c0

cpd00939_c0

cpd02775_c0

cpd02961_c0

rxn02504_c_c0

rxn02507_c_c0

cpd00956_c0

rxn02508_c_c0

rxn02774_c_c0

cpd01620_c0

rxn02789_c_c0

cpd01710_c0

rxn02811_c_c0cpd02693_c0

rxn02831_c_c0

cpd01772_c0

cpd02021_c0

rxn02832_c_c0

cpd03451_c0

rxn02834_c_c0cpd01777_c0

rxn02835_c_c0

cpd02979_c0

rxn02895_c_c0

cpd02394_c0

rxn02897_c_c0cpd01997_c0

cpd02904_c0

rxn02898_c_c0

cpd02295_c0

rxn02914_c_c0

rxn02928_c_c0

cpd02465_c0

rxn02929_c_c0

rxn02937_c_c0

cpd02826_c0

cpd02140_c0

rxn02988_c_c0

cpd03470_c0

rxn03004_c_c0

cpd02678_c0

rxn03062_c_c0

rxn03068_c_c0

rxn03080_c_c0

cpd11225_c0

rxn03084_c_c0

rxn03108_c_c0

rxn03130_c_c0

cpd02930_c0

cpd02835_c0

rxn03135_c_c0

cpd02851_c0

cpd02991_c0

cpd02921_c0

rxn03136_c_c0

rxn03137_c_c0

rxn03146_c_c0

cpd02886_c0

cpd03584_c0

rxn03147_c_c0

cpd02893_c0

rxn03150_c_c0

cpd03423_c0

rxn03159_c_c0

cpd03002_c0

rxn03164_c_c0

cpd02968_c0

rxn03167_c_c0

rxn03175_c_c0

rxn03181_c_c0

cpd02993_c0

rxn03182_c_c0

cpd03585_c0

rxn03194_c_c0

cpd00498_c0

rxn03393_c_c0

cpd03443_c0

cpd03444_c0

rxn03395_c_c0

cpd03446_c0

cpd03445_c0

rxn03397_c_c0

cpd03447_c0

cpd03448_c0

rxn03408_c_c0

cpd03494_c0

cpd03495_c0

rxn03435_c_c0

cpd02535_c0

cpd10162_c0

rxn03436_c_c0

rxn03437_c_c0

rxn03439_c_c0

cpd03586_c0

rxn03511_c_c0

cpd03831_c0

cpd03830_c0

rxn03536_c_c0

cpd03919_c0

cpd03918_c0

rxn03537_c_c0

cpd03920_c0

rxn03538_c_c0

cpd00166_c0

cpd02039_c0

rxn03541_c_c0cpd03917_c0

rxn03638_c_c0

rxn03643_c_c0

cpd00055_c0

cpd03583_c0

rxn03644_c_c0

rxn03841_c_c0

rxn03891_c_c0

cpd02590_c0

cpd00906_c0

rxn03893_c_c0

cpd02557_c0

rxn03901_c_c0

cpd02229_c0

cpd00286_c0

rxn03904_c_c0

rxn03909_c_c0

cpd08289_c0

rxn03916_c_c0

cpd04920_c0

rxn03917_c_c0

cpd04918_c0

rxn03918_c_c0

cpd08316_c0

rxn03919_c_c0

rxn03970_c_c0

rxn03971_c_c0

rxn03978_c_c0

rxn04043_c_c0

rxn04139_c_c0

cpd03449_c0

rxn04675_c_c0

rxn04794_c_c0

rxn04954_c_c0

rxn05005_c_c0

cpd11206_c0rxn05006_c_c0

rxn05029_c_c0

cpd03422_c0

cpd00421_c0

rxn05039_c_c0

rxn05040_c_c0

rxn05054_c_c0

rxn05117_c_c0

rxn05119_c_c0

rxn05142_c_c0

rxn05143_c_c0

rxn05144_c_c0

cpd00016_c0

rxn05145_c_c0

cpd00009_e0

rxn05147_c_c0

rxn05148_c_c0

rxn05164_c_c0

rxn05195_c_c0

cpd10516_e0

cpd10516_c0

rxn05201_c_c0

rxn05209_c_c0

cpd00971_e0

cpd00067_e0

cpd00971_c0

rxn05221_c_c0

cpd00129_e0

rxn05229_c_c0

rxn05232_c_c0

cpd11421_c0

cpd00115_c0

cpd11420_c0

rxn05234_c_c0

cpd00241_c0

rxn05235_c_c0

rxn05236_c_c0

rxn05237_c_c0

cpd00084_e0

rxn05289_c_c0

rxn05291_c_c0

cpd00214_c0

rxn05299_c_c0

rxn05315_c_c0

cpd00034_c0

cpd00205_e0

cpd00034_e0

cpd00205_c0

rxn05319_c_c0

cpd00001_e0

rxn05342_c_c0

cpd11484_c0

cpd11491_c0

cpd11468_c0

rxn05345_c_c0

cpd11493_c0

cpd11492_c0

rxn05358_c_c0

cpd11495_c0

rxn05359_c_c0

cpd11496_c0

rxn05360_c_c0

cpd11497_c0

rxn05361_c_c0

cpd11498_c0

cpd11499_c0

rxn05363_c_c0

cpd11500_c0

rxn05364_c_c0

cpd11501_c0

rxn05365_c_c0

cpd11502_c0

rxn05367_c_c0

cpd11504_c0

cpd11503_c0

rxn05368_c_c0

cpd11505_c0

rxn05369_c_c0

cpd11506_c0

rxn05371_c_c0

cpd11508_c0

cpd11507_c0

rxn05372_c_c0

cpd11509_c0

rxn05373_c_c0cpd11510_c0

cpd11511_c0

rxn05375_c_c0

cpd11512_c0

rxn05376_c_c0

cpd11513_c0

rxn05377_c_c0

cpd11514_c0

rxn05379_c_c0

cpd11515_c0

cpd11516_c0

rxn05380_c_c0

cpd11517_c0

rxn05381_c_c0

cpd11518_c0

rxn05383_c_c0

cpd11520_c0

rxn05384_c_c0

cpd11521_c0

rxn05385_c_c0

cpd11522_c0

rxn05386_c_c0

cpd11523_c0

cpd11524_c0

rxn05388_c_c0

cpd11525_c0

rxn05389_c_c0

cpd11526_c0

rxn05390_c_c0

cpd11527_c0

rxn05392_c_c0

cpd11528_c0cpd11529_c0

rxn05393_c_c0

cpd11530_c0

rxn05394_c_c0

cpd11531_c0

rxn05396_c_c0

cpd11532_c0

cpd11533_c0

rxn05397_c_c0

cpd11534_c0

rxn05398_c_c0

cpd11535_c0

rxn05400_c_c0

cpd11537_c0

cpd11536_c0

rxn05401_c_c0

cpd11538_c0

rxn05402_c_c0

cpd11539_c0

rxn05404_c_c0

cpd11541_c0

cpd11540_c0

rxn05405_c_c0

cpd11542_c0

rxn05406_c_c0

cpd11543_c0rxn05433_c_c0

rxn05434_c_c0

rxn05435_c_c0

rxn05436_c_c0

rxn05437_c_c0

rxn05438_c_c0

cpd11519_c0

rxn05439_c_c0

rxn05440_c_c0

rxn05441_c_c0

rxn05442_c_c0

rxn05443_c_c0

rxn05444_c_c0

cpd11544_c0

rxn05452_c_c0

cpd11434_c0

rxn05454_c_c0

cpd11432_c0

cpd00013_e0

rxn05466_c_c0

rxn05467_c_c0

cpd00011_e0

rxn05468_c_c0

cpd00007_e0

rxn05488_c_c0

rxn05514_c_c0

cpd00063_c0

cpd00063_e0

rxn05516_c_c0

cpd01012_c0

cpd01012_e0

rxn05517_c_c0

rxn05528_c_c0

rxn05555_c_c0

cpd10515_e0

cpd00047_e0

rxn05559_c_c0

cpd00027_e0

rxn05573_c_c0

rxn05596_c_c0

rxn05618_c_c0

cpd00030_c0

cpd00030_e0

rxn05651_c_c0

cpd00048_c0

cpd00048_e0

rxn05728_c_c0

rxn05730_c_c0

rxn05733_c_c0

rxn05744_c_c0

cpd03847_c0

rxn05778_c_c0

rxn05779_c_c0

rxn05893_c_c0

rxn05902_c_c0

rxn05938_c_c0

cpd11620_c0

cpd11621_c0

rxn05939_c_c0

rxn05957_c_c0

rxn06022_c_c0

cpd12370_c0

rxn06060_c_c0

rxn06078_c_c0

rxn06591_c_c0

cpd12227_c0

cpd11912_c0

rxn06723_c_c0

rxn06729_c_c0

cpd11466_c0

rxn06848_c_c0

cpd03587_c0

cpd03736_c0

rxn06865_c_c0

rxn06937_c_c0

rxn07441_c_c0

rxn07758_c_c0

cpd15073_c0

rxn07759_c_c0

rxn07760_c_c0

rxn07764_c_c0

cpd04399_c0

rxn07766_c_c0

rxn08126_c_c0

rxn08128_c_c0

cpd15495_c0

cpd01329_c0

rxn08129_c_c0

rxn08311_c_c0

cpd15526_c0

cpd15421_c0

rxn08333_c_c0

cpd15353_c0

rxn08335_c_c0

rxn08336_c_c0

cpd15500_c0

cpd15499_c0

rxn08434_c_c0

rxn08527_c_c0

rxn08528_c_c0cpd15352_c0

rxn08583_c_c0

cpd15469_c0

cpd15479_c0

rxn08618_c_c0

cpd15489_c0

rxn08619_c_c0

cpd15475_c0

rxn08620_c_c0

cpd15477_c0

cpd00100_c0

rxn08669_c_c0

cpd02090_c0

rxn08708_c_c0 cpd15484_c0

cpd15549_c0

rxn08709_c_c0

cpd15550_c0

cpd15488_c0

rxn08710_c_c0

cpd15485_c0

rxn08711_c_c0

rxn08712_c_c0

rxn08713_c_c0

cpd15432_c0

rxn08766_c_c0

cpd11488_c0

rxn08792_c_c0

cpd01080_c0

rxn08801_c_c0

cpd15329_c0

rxn08850_c_c0

cpd15348_c0

rxn08933_c_c0

rxn08935_c_c0

rxn08936_c_c0

rxn08954_c_c0

cpd15492_c0

rxn09016_c_c0

rxn09142_c_c0

cpd15540_c0

rxn09149_c_c0

rxn09177_c_c0

rxn09180_c_c0

cpd15557_c0

rxn09202_c_c0

cpd15533_c0

rxn09210_c_c0

rxn09225_c_c0

rxn09272_c_c0

cpd15560_c0

cpd15561_c0

rxn09296_c_c0

rxn09310_c_c0

cpd15378_c0

rxn09341_c_c0

rxn09345_c_c0

rxn09429_c_c0

rxn09436_c_c0

rxn09449_c_c0

rxn09456_c_c0

cpd15268_c0

rxn09458_c_c0

rxn09467_c_c0

rxn09468_c_c0

rxn09562_c_c0

rxn09997_c_c0

rxn10052_c_c0

rxn10060_c_c0

rxn10094_c_c0rxn10122_c_c0

rxn10199_c_c0

cpd15666_c0

cpd15665_c0

rxn10205_c_c0

cpd15671_c0

rxn10206_c_c0

cpd15672_c0

rxn10214_c_c0

cpd15677_c0

rxn10215_c_c0

cpd15678_c0

rxn10220_c_c0

cpd15683_c0

rxn10221_c_c0

cpd15684_c0

rxn10226_c_c0

cpd15689_c0

rxn10227_c_c0

cpd15690_c0

rxn10232_c_c0

cpd15695_c0

rxn10233_c_c0

cpd15696_c0

rxn10259_c_c0

cpd15716_c0

rxn10260_c_c0

cpd15717_c0

rxn10265_c_c0

cpd15722_c0

rxn10266_c_c0

cpd15723_c0

rxn10336_c_c0

cpd15793_c0

rxn10337_c_c0

cpd15794_c0

rxn10338_c_c0

cpd15795_c0

cpd00099_e0 rxn10473_c_c0

cpd00099_c0

cpd00149_e0

rxn10474_c_c0

cpd00149_c0

rxn10481_c_c0

cpd00058_e0

cpd00058_c0

rxn10541_c_c0

rxn10571_c_c0

cpd00254_e0

cpd00254_c0

rxn10823_c_c0

rxn10867_c_c0

rxn10904_c_c0

rxn10954_c_c0

rxn11132_c_c0

rxn11674_c_c0

rxn11732_c_c0

rxn11946_c_c0

rxn12008_c_c0

rxn12060_c_c0

rxn12188_c_c0

rxn12217_c_c0

rxn12220_c_c0

rxn12224_c_c0

rxn12225_c_c0

rxn12239_c_c0

rxn12510_c_c0

cpd02201_c0

rxn12512_c_c0

rxn13085_c_c0

cpd00264_c0

rxn13150_c_c0

rxn13477_c_c0

rxn13782_c_c0

cpd17041_c0

rxn13783_c_c0

cpd17042_c0

rxn13784_c_c0

cpd17043_c0

jHive

plots

Figure 5.2: KBase workflow to generate silico data.

The flow for the process involved in generating the Mutual Information for Stage I

Page 47: Characterization of Molecular Communication Based on Cell ...

34

and II can be summarized by the figure above. The flow is very similar to the process

explained in Figure 5.1, except that it involves three additional steps at the end. We

now describe these last three steps.

The output of the FBA will be 128 files, for each of the organisms, B. theta

and M. smithii. For E. coli, the output of the FBA will be 121 files. These are FBA

objects that display the growth of the model, reaction fluxes, biomass compounds and

coefficient values, gene IDs, compound fluxes and gene knockout data in a table-based

format. We take these FBA objects and send them to the Compare FBA Solutions

app in KBase to compare flux profiles predicted by the FBA. This app compares

the objective value, flux through each reaction in FBA, uptake, and secretion of

metabolites in each of the FBA outputs. The reaction fluxes inside the FBA outputs

are categorized into “not in model,” “no flux,” “forward flux,” and “reverse flux”

whereas metabolite fluxes are categorized into “not in model,” “no flux,” “uptake,”

and “secretion.”

The results pertaining to reaction fluxes and metabolite fluxes were analyzed

using a script written in MATLAB and python respectively, to generate what we

call a binary state changing matrix. This matrix shows whether the state changing

reactions are ON or OFF for each of the FBAs. Additionally, we also use our binary

state changing matrices to visualize the relation between compounds and reactions,

the details and implications of which will be discussed in the following section.

5.2 In silico experimentation for E. coli

For our study, we initially focused on the E. coli bacterium strain K-12 MG1655,

which is considered one of the golden standard model systems in synthetic biology

labs, and whose genome is completely known [19] “Some of this material appears

Page 48: Characterization of Molecular Communication Based on Cell ...

35

in [38]”. By stemming from this genome, we built the corresponding GEM, and

subsequently performed the FBA by using the KBase (Department of Energy Systems

Biology Knowledgebase) software application suite [25]. To obtain our numerical

results, we developed a model of the external environment containing the known

minimal set of chemical compounds necessary for this E. coli strain to grow, i.e.,

produce biomass, and at the same time allowing the variation of the concentrations

of key compounds that result in changes in the optimal FBA-computed states of

metabolic reactions. In particular, we obtained our numerical results by performing

FBA on every combination of input fluxes of two input compounds D-Glucose and

Lactose ranging from 0 to 100 [mmol/g CDW/hr] with increments of 10 [mmol/g

CDW/hr], for a total of 121 different combinations of input fluxes.

5.3 In silico experimentation for B. theta and

M. smithii

We also studied two other organisms, namely, the B. theta and the M. smithii, both

found in the human gut, with the goal of understanding their individual metabolic

networks and their interactions with each other and the human body [8], [37].

A set of nutritional compounds such as food, nutrients, toxins which are required

as inputs and products of each organism’s metabolism were identified as Independent

Variables. Seven such compounds were selected to obtain ground truth by running

the entire configuration space in the laboratory. These seven compounds which are

hypothesized to impact whether or not the organisms will grow are Glucose, Hematin,

Formale, H2, Vitamin B12, Acetate and Vitamin K. These compounds can either be

present in the solution (ON) or not (OFF). These seven compounds along with the

Page 49: Characterization of Molecular Communication Based on Cell ...

36

base compounds (which should be present always / minimal media) form the input

media for the experimentation. Each of these seven compounds may or may not be

present in the media, giving us a total of 27 = 128 media [37].

The growth of the organisms was studied as the Dependent Variable of the exper-

imentation, which is measured as the biomass dry weight in grams of cells in vitro

and as the sum of flux through the biomass reactions using an FBA analysis [20].

Page 50: Characterization of Molecular Communication Based on Cell ...

37

Chapter 6

Numerical Results

In this section, we present a proof-of-concept numerical example of both abstractions

(Stage I and Stage II) and their analysis method proposed in this thesis. In particular,

we report the results of different combinations of values of the input fluxes of D-

Glucose and Lactose in terms of growth rate of the E. coli bacterium. For this, we

based our environment on the K-12 MG1655 minimal media [13], enriched with metal

tracers common to other two standard media, namely, the Lysogeny Broth (LB) and

the Carbon-D-Glucose media [6]. All compound fluxes contained in vmin and vmax

were set -100 and 100 [mmol per gram cell dry weight per hour] ([mmol/g CDW/hr]),

respectively. On top of the defined media environment, we introduced two other input

compounds, namely, D-Glucose and Lactose, for which we simulated a variation in

their concentration, and consequent corresponding values in vmin and vmax according

to Equation (4.2) and Equation (4.3). Also, we compute the upper bound steady-state

mutual information for Stage I and Stage II for the same organism.

Moreover, we extend our analysis on the two members of human gut microbiota,

the B. theta bacterium, and M. smithii, which is a methanogenic archaeon. The

genome of both these organisms are completely known [41], [47]. We first built the

Page 51: Characterization of Molecular Communication Based on Cell ...

38

GEM model, and performed the FBA analysis using the KBase [25]. For modeling

the external environment, we have to consider the known minimal set of compounds

necessary for both organisms to grow (i.e. produce biomass) while also allowing the

variation of the concentrations of key compounds that result in changes in the optimal

FBA-computed states of metabolic reactions. For this, we based our environment on

the minimal media [24], enriched with seven compounds, namely Carbon-D-Glucose

(G), Hematin (He), Formate (F), H2, V itamin B12 (B12), Acetate (A), and Vitamin

K (Vk), to obtain the ground truth for our study. Each of these seven compounds

may or may not be present in the media, giving us a total of 128 media. Along

with these important compounds, there are other compounds that are common in all

media, namely, Calcium, Chloride ion, Carbon dioxide, Cobalt, Copper, Ferrous ion,

Water, H+, Potassium, L-Cysteine, Magnesium, Manganese, Sodium, Ammonium,

Nickel, Phosphate, Sodium bicarbonate, Sulfate, Zinc, and Ferric ion.

All compound fluxes contained in vmin were set to -100 [mmol per gram cell

dry weight per hour] ([mmol/g CDW/hr]), and vmax values varied between a low of

0.0000037 and max of 46166.8875 [mmol per gram cell dry weight per hour] ([mmol/g

CDW/hr]). We generated the numerical results for both abstractions by performing

FBA on every combination of input fluxes for all seven compounds mentioned earlier

for a total of 128 different combinations.

6.1 Stage I

6.1.1 Numerical Results for E. coli

In Figure 6.1 we show the results of the FBA in terms of growth rate, or equivalently,

output flux of produced biomass, as defined in Section 4.1, computed for different

Page 52: Characterization of Molecular Communication Based on Cell ...

39

Lactose Input Flux [mmol/g CDW/hr] D-Glucose Input Flux [mmol/g CDW/hr]

3

4

100

5

6

80

7

100

8

Gro

wth

(Bio

mas

s O

utpu

t Flu

x [m

mol

/g C

DW

/hr])

9

60 80

10

11

6040

12

4020 200 0

4

5

6

7

8

9

10

11

Varia

tion

in G

row

th

Figure 6.1: Optimal E.Coli K12 MG1655 growth as a function of the input flux ofD-Glucose and Lactose in the environment.

combinations of values of the input fluxes of D-Glucose and Lactose. In these results,

the variation in optimal growth rate, which is dependent on the optimal metabolic

reaction states, as discussed below, varies from a minimum value of 3.061 [mmol/g

CDW/hr] when the fluxes of D-Glucose and Lactose are absent from the environment,

to a maximum value of 11.67 [mmol/g CDW/hr]. These curves show also a saturation

in the optimal growth rate for D-Glucose fluxes on the higher end of the range, and

the minimal value of D-Glucose flux to obtain this saturation varies as function of the

Lactose flux, from a minimum of 30 [mmol/g CDW/hr] to a maximum of 70 [mmol/g

CDW/hr].

In Figure 6.2, for each of the 121 tested combinations of the input fluxes of D-

Glucose and Lactose, one for each column of the matrix, we show the binary values

Page 53: Characterization of Molecular Communication Based on Cell ...

40

20 40 60 80 100 120Input Flux Combination

50

100

150

200

250

Stat

e-C

hang

ing

Rea

ctio

n

Figure 6.2: FBA-estimated binary chemical reaction states {r∗i }Mi=1 for each combina-

tion of D-Glucose and Lactose input fluxes, where white = ON state; black = OFFstate.

of FBA-estimated chemical reaction states {r∗i }Mi=1 as defined in Section 4.1, one for

each column, where the number of metabolic reactions M that show a state change

within the considered combination of input fluxes of D-Glucose and Lactose is equal

to 251.

The computation of the upper bound of the steady-state mutual information is

finally realized by applying the expressions in Equations (4.5), (3.4), and (3.5), tak-

ing into account that the possible combinations of input fluxes of D-Glucose and

Lactose are drawn from a discrete set. For these preliminary results, we make the

assumption that these combinations are equiprobable. As a consequence, the corre-

sponding combinations {ci}Ni=1 = {cgl, clac} computed through Equation (4.3) can be

Page 54: Characterization of Molecular Communication Based on Cell ...

41

as well considered equiprobable with probability density P ({cgl, clac}) = 1/(#of in-

put combinations) = 1/121. The resulting input entropy H({ci}Ni=1), where N = 2, is

then computed through Equation (3.4) by substituting the integral with a summation

over the number of input combinations, which results into log2(121) = 6.92 bits. To

compute the conditional entropy of the input given the output H({ci}Ni=1 | {ri}Mi=1),

where N = 2 and M = 251, we translated Equation (3.5) into the following formula:

H ({cgl, clac}| {r∗i }Mi=1) = −

Y∑y=1

PY

Xy∑x=1

P (x|y) log2 P (x|y) , (6.1)

where Y and PY correspond to the number of times and the probability, respectively,

that a reaction state combination is found more than once in the data shown in Fig-

ure 6.2, Xy is the number of times the reaction state combination y is found in the

data, and P (x|y) is the probability of having a combination of input concentrations

x = {cgl,x, clac,x} given the reaction state y, which we consider as a uniform distribu-

tion in the number of different combinations of input concentrations that have been

found resulting into the same the reaction state y. In our data we found a total of 4

different reaction combinations that are repeated twice, which results in a conditional

entropy of the input given the output H ({cgl, clac}| {r∗i }Mi=1) = 0.53 bits. Finally, the

following value is found for the upper bound of the steady-state mutual information:

I ({cgl, clac}} ; {r∗i }115i=1) = 6.39 bits (6.2)

Page 55: Characterization of Molecular Communication Based on Cell ...

42

6.1.2 Numerical Results for B. theta and M. smithii

20 40 60 80 100 120Input Flux Combination

20

40

60

80

100

120

140

160

180

200

Stat

e-C

hang

ing

Rea

ctio

n

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Figure 6.3: FBA-estimated binary chemical reaction states {r∗i }Mi=1 for each combina-

tion of Glucose, Hematin, Formate, H2, V itaminB12, Acetate, and Vitamin K inputfluxes, where yellow = ON state; violet = OFF state.

Page 56: Characterization of Molecular Communication Based on Cell ...

43

20 40 60 80 100 120Input Flux Combination

50

100

150

200

250

300

350

400

450

500

550

Stat

e-C

hang

ing

Rea

ctio

n

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Figure 6.4: FBA-estimated binary chemical reaction states {r∗i }Mi=1 for each combina-

tion of Glucose, Hematin, Formate, H2, V itaminB12, Acetate, and Vitamin K inputfluxes, where yellow = ON state; violet = OFF state.

In Figure 6.3 and 6.4, for each of the 128 tested combinations of the input fluxes of

seven compounds, one for each column of the matrix, we show the binary values of

FBA-estimated chemical reaction states {r∗i }Mi=1 as defined in Section 4.1. In addition

one for each column, where the number of metabolic reactions M that show a state

change within the considered combination of input fluxes of seven compounds is equal

to 212 and 556 for B. theta and M. smithii, respectively.

The computation of the upper bound of the steady-state mutual information is

finally realized by applying the expressions in Equations (4.5), (3.4), and (3.5), taking

into account that the possible combinations of input fluxes of the seven compounds

are drawn from a discrete set. For these preliminary results, we make the assumption

that these combinations are equiprobable. As a consequence, the corresponding com-

binations {ci}Ni=1 ={cG, cHe, cF , cH2 , cB12,cA,cVk

}computed through Equation (4.3)

can be as well considered equiprobable with probability density

Page 57: Characterization of Molecular Communication Based on Cell ...

44

P({cG, cHe, cF , cH2 , cB12,cA,cVk

})= 1/(#of input combinations) = 1/128. The re-

sulting input entropy H({ci}Ni=1), where N = 7, is then computed through Equa-

tion (3.4) by substituting the integral with a summation over the number of input

combinations, which results into log2(128) = 7 bits.

To compute the conditional entropy of the input for both organisms, given the

output H({ci}Ni=1 | {ri}Mi=1), where for B. theta N = 7 and M = 212, and for M.

smithii N = 7 and M = 556, we translated Equation (3.5) into the following formula:

H ({cG, cHe, cF , cH2 , cB12 , cA, cVk}| {r∗i }

Mi=1) = −

Y∑y=1

PY

Xy∑x=1

P (x|y) log2 P (x|y) , (6.3)

where Y and PY correspond to the number of times and the probability, respec-

tively, that a reaction state combination is found more than once in the data shown

in Figure 6.3 and 6.4. Xy is the number of times the reaction state combination y

is found in the data, and P (x|y) is the probability of having a combination of input

concentrations x = {cG,x, cHe,x, cF,x, cH2,x, cB12,x, cA,x, cVk,x} given the reaction state y,

which we consider as a uniform distribution in the number of different combinations

of input concentrations that have been found resulting into the same reaction state

y.

The experimental data for B. theta shows that there are a total of 114 different

reaction combinations that are repeated more than once, which results in a conditional

entropy of the input given the output H ({cG, cHe, cF , cH2 , cB12 , cA, cVk}| {r∗i }

Mi=1) =

3.6933 bits. Finally, the following value is found for the upper bound of the steady-

state mutual information:

I ({cG, cHe, cF , cH2 , cB12 , cA, cVk}} ; {r∗i }

113i=1) = 3.3068 bits (6.4)

Page 58: Characterization of Molecular Communication Based on Cell ...

45

On the other hand, for M. smithii we found a total of 97 different reaction com-

binations that are repeated more than once, which results in a conditional entropy of

the input given the output H ({cG, cHe, cF , cH2 , cB12 , cA, cVk}| {r∗i }

Mi=1) = 2.4778 bits.

Finally, the following value is found for the upper bound of the steady-state mutual

information:

I ({cG, cHe, cF , cH2 , cB12 , cA, cVk}} ; {r∗i }

136i=1) = 4.5222 bits (6.5)

6.2 Stage II

0 20 40 60 80 100 120 140Input flux combination

0

20

40

60

80

100

120

140

Num

ber o

f mem

bers

in e

ach

grou

p

ABCDEFGHIJKLMN

Number of Groups

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Figure 6.5: 14 FBA groups of B. theta based on similar FBA-estimated chemicalreaction states for each combination of Glucose, Hematin, Formate, H2, V itaminB12,Acetate, and Vitamin K input fluxes.

Page 59: Characterization of Molecular Communication Based on Cell ...

46

0 20 40 60 80 100 120 140Input flux combination

0

20

40

60

80

100

120

140N

umbe

r of m

embe

rs in

eac

h gr

oup

ABCDEFGHIJKLMNOPQRSTUVWXYZAAABACADAE

Number of Groups

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Figure 6.6: 31 FBA groups of M. smithii based on similar FBA-estimated chemicalreaction states for each combination of Glucose, Hematin, Formate, H2, V itaminB12,Acetate, and Vitamin K input fluxes.

From the matrix shown in Figure 6.5 and 6.6 we grouped the FBAs based on the

similar FBA-estimated chemical reaction states, giving us 14 groups for B. theta and

31 groups for M. smithii. We generate a new matrix, shown in Table 6.1 and 6.2,

where the rows are the groups derived in the previous step and the columns represent

the chemical compounds uptaken, secreted, and the biomass generated during the

metabolism.

FBAs Groups cpd00254_e0 cpd00067_e0 cpd00205_e0 cpd00034_e0 cpd00009_e0cpd00013_e0cpd10515_e0cpd00099_e0cpd00030_e0cpd00012_e0cpd00048_e0cpd00149_e0cpd00073_e0cpd10516_e0cpd00001_e0cpd00011_e0cpd00007_e0cpd11416_c0cpd00058_e0cpd00084_e0cpd00063_e0cpd00027_e0cpd00029_e0 cpd00028_e0 Biomassm4 A 0.00271953 1 0.00271953 0.00271953 0.713146 7.33267 0 0.00271953 0.00271953 0 0.00271953 0.00271953 0 0.0108781 -58.0811 34.002 -0.279739 -0.87827 0.00271953 0.192719 0.00271953 0 0 0 0.87827m36 B 0.00271953 1 0.00271953 0.00271953 0.713146 7.33267 0 0.00271953 0.00271953 0 0.00271953 0.00271953 0 0.0108781 -58.2629 34.002 -0.188847 -0.87827 0.00271953 0.192719 0.00271953 0 0 0 0.87827m2 C 0.000628154 1 0.000628154 0.000628154 0.164721 7.5 0 0.00062815 0.00062815 0 0.00062815 0.00062815 -2.90316 0.00251262 -8.10121 2.53695 -0.0633574 -0.202862 0.00062815 0.044514 0.00062815 0 4.10997 0 0.202862m16 D 0.00271953 1 0.00271953 0.00271953 0.713146 7.33267 0 0.00271953 0.00271953 0 0.00271953 0.00271953 0 0.0108781 -58.092 34.002 -0.2743 -0.87827 0.00271953 0.192719 0.00271953 0 0 0 0.87827m19 E 0.00122742 -2.27388 0.00122742 0.00122742 0.321867 3.36137 0 0.00122742 0.00122742 0 0.0531145 0.00122742 0 0.00490967 -9.75258 -1.17806 -0.224507 -0.396393 0.00122742 0.0350936 0.00122742 2.78 0 0 0.396393m10 F 0.00063984 1 0.00063984 0.00063984 0.167786 7.5 0 0.00063984 0.00063984 0 0.00063984 0.00063984 -2.88996 0.00127968 -8.14871 2.5206 -0.065176 -0.206636 0.00063984 0.0453421 0.00063984 0 4.16285 0.00127968 0.206636m18 G 0.000628154 1 0.000628154 0.000628154 0.164721 7.5 0 0.00062815 0.00062815 0 0.00062815 0.00062815 -2.90316 0.00251262 -8.10121 2.53695 -0.0633574 -0.202862 0.00062815 0.044514 0.00062815 0 4.10997 0 0.202862base H 0.00271953 -9.13362 0.00271953 0.00271953 0.713146 7.33267 0.0081586 0.00271953 0.00271953 0 0.00271953 0.00271953 0 0.00271953 -58.0961 34.002 -0.272261 -0.87827 0.00271953 0.192719 0.00271953 0 0 0 0.87827m6 I 0.000636923 1 0.000636923 0.000636923 0.167021 7.5 0 0.00063692 0.00063692 0 0.00063692 0.00063692 -2.89133 0.00254769 -8.16305 2.51856 -0.0629681 -0.205694 0.00063692 0.0451354 0.00063692 0 4.16808 0 0.205694m12 J 0.00273441 -94.565 0.00273441 0.00273441 0.717046 7.35089 0 0.00273441 0.00273441 0 0.00273441 0.00273441 0 0.00546881 -58.0378 34.002 -0.286738 -0.883073 0.00273441 0.193773 0.00273441 0 0 0.00546881 0.883073m8 K 0.00273441 -57.6517 0.00273441 0.00273441 0.717046 7.35089 0 0.00273441 0.00273441 0 0.00273441 0.00273441 0 0.00546881 -58.0433 34.002 -0.284003 -0.883073 0.00273441 0.193773 0.00273441 0 0 0.00546881 0.883073m14 L 0.000630991 1 0.000630991 0.000630991 0.165465 7.5 0 0.00063099 0.00063099 0 0.00063099 0.00063099 -2.90185 0.00126198 -8.14297 2.46415 0 -0.203778 0.00063099 0.044715 0.00063099 0 4.14201 0.00126198 0.203778m1 M 0.00122742 -1.29133 0.00122742 0.00122742 1.50444 3.76143 0 0.00122742 0.00122742 -0.591287 0.0531145 0.00122742 -0.20003 0.00490967 -10.5439 -0.978028 -0.224507 -0.396393 0.00122742 0.0350936 0.00122742 2.78 0 0 0.396393m9 N 0.00123469 -2.28441 0.00123469 0.00123469 1.43357 5.71184 0.00123469 0.00123469 0.00123469 -0.554898 0.053429 0.00123469 -1.17022 0.00123469 -11.4657 0 -0.231083 -0.398741 0.00123469 0.0353014 0.00123469 2.78 0 0.00246937 0.398741

Table 6.1: FBA group matrix of B. theta where the rows are the 14 groups and thecolumns represent the chemical compounds uptaken, secreted, and the generationbiomass.

Page 60: Characterization of Molecular Communication Based on Cell ...

47

FBAs Groups cpd00129_e0cpd00001_e0cpd00058_e0cpd00013_e0cpd10515_e0cpd00009_e0cpd00067_e0cpd00254_e0cpd00205_e0cpd00034_e0cpd00047_e0cpd10516_e0cpd00226_e0cpd00007_e0cpd00011_e0cpd00133_e0cpd00028_e0cpd00030_e0cpd00048_e0cpd00149_e0cpd00099_e0cpd00063_e0cpd11416_c0cpd00084_e0cpd00268_e0cpd00239_e0cpd00027_e0cpd00075_e0cpd00029_e0cpd01414_e0cpd00119_e0biomassm7 A 0 -11.1671 0.001292 3.48092 0.003875 0.338695 -1.27254 0.001292 0.001292 0.001292 1.72326 0.001292 0 -0.00306 -2.25316 0 0 0.001292 0.001292 0.001292 0.001292 0.001292 -0.41711 0.091528 0 0 2.78 0 0 0 0 0.417114m10 B 0 0 0.000897 -0.71661 -0.01731 0.235182 -0.49248 0.000897 0.000897 0.000897 -2.55938 0.000897 -0.00538 -0.04686 2.3803 -0.00855 0.02 0.000897 0 0.000897 0.000897 0.000897 -0.28963 2.8 0 0 0 0 0.935237 -0.68389 0.118615 0.289634m38 C -0.45462 -1.43648 0.000502 0.010865 0.001506 0.131653 -0.70568 0.000502 0.000502 0.000502 0.498827 0.000502 -0.2419 0 0.968163 0 0 0.000502 0.000502 0.000502 0.000502 0.000502 -0.16214 2.8 -1.38221 0 0 0 0 0 0 0.162135m40 D -0.07453 -0.19099 0.000479 -0.71213 -0.01856 0.125625 0.200737 0.000479 0.000479 0.000479 0 0.000479 -0.00287 0 -0.14048 0 0.02 0.000479 0.000479 0.000479 0.000479 0.000479 -0.15471 1.85314 0 -0.77133 0 0 0 -0.26196 0.063359 0.154711m64 E -0.03319 -1.16681 0.000402 -0.63719 0.001206 0.105416 -1.23626 0.000402 0.000402 0.000402 0 0.000402 0 -0.019 -0.06883 -0.50886 0 0.000402 0.000402 0.000402 0.000402 0.000402 -0.12982 2.8 -0.74707 -1.27737 0 0 0 0 0 0.129823m69 F -0.10483 -13.0678 0.001341 3.71889 0.004023 0.35165 1 0.001341 0.001341 0.001341 4.21521 0.001341 0 -0.00469 -3.60326 0 0 0.001341 0.001341 0.001341 0.001341 0.001341 -0.43307 0.095028 0 0 2.78 0 0 0 0 0.433068m87 G -0.10483 -13.0678 0.001341 3.71889 0.004023 0.35165 1 0.001341 0.001341 0.001341 4.21521 0.001341 0 -0.00469 -3.60326 0 0 0.001341 0.001341 0.001341 0.001341 0.001341 -0.43307 0.095028 0 0 2.78 0 0 0 0 0.433068m114 H 0 -2.33928 0.000948 7.5 0.002844 0.248589 1 0.000948 0.000948 0.000948 -20.2616 0.000948 -0.02756 -0.11 23.7909 0 0 0.000948 0.000948 0.000948 0.000948 0.000948 -0.30615 0.067178 0 0 0 -4.8349 4.23105 0 0 0.306145m118 I -0.22488 0 0.000567 -0.49215 0.0017 0.148594 -1.48104 0.000567 0.000567 0.000567 -0.8061 0.000567 -0.12891 0 1.38098 0 0 0.000567 0.000567 0.000567 0.000567 0.000567 -0.183 2.8 -1.37992 0 0 0 0 0 0 0.182998m124 J -0.12002 -0.95444 0.000316 -0.40997 -0.01905 0.082913 1 0.000316 0.000316 0.000316 1.54337 0.000316 -0.0019 -0.06447 -1.51317 0 0.02 0.000316 0.000316 0.000316 0.000316 0.000316 -0.10211 1.25593 -0.08143 -1.07066 0 0 0 0 0.025395 0.102111m0 K -0.03319 -1.16681 0.000402 -0.63719 0.001206 0.105416 -1.23626 0.000402 0.000402 0.000402 0 0.000402 0 -0.019 -0.06883 -0.50886 0 0.000402 0.000402 0.000402 0.000402 0.000402 -0.12982 2.8 -0.74707 -1.27737 0 0 0 0 0 0.129823m1 L 0 -12.271 0.000804 4.34652 0.002411 0.210743 -4.04467 0.000804 0.000804 0.000804 0 0.000804 -0.02336 -0.03478 -0.25276 -1.04358 0 0.000804 0.000804 0.000804 0.000804 0.000804 -0.25954 0.05695 0 0 2.78 0 0 0 0 0.259537m20 M -0.04365 -1.54442 0.000529 -0.83799 0.001586 0.138636 1 0.000529 0.000529 0.000529 1.97463 0.000529 0 -0.0242 -2.06515 -0.01523 0 0.000529 0.000529 0.000529 0.000529 0.000529 -0.17074 2.3744 -0.32574 -1.68546 0 0 0 0 0 0.170735m21 N 0 -11.1671 0.001292 3.48092 0.003875 0.338695 -1.27254 0.001292 0.001292 0.001292 1.72326 0.001292 0 -0.00306 -2.25316 0 0 0.001292 0.001292 0.001292 0.001292 0.001292 -0.41711 0.091528 0 0 2.78 0 0 0 0 0.417114m57 O 0 -12.8631 0.00073 4.62859 0.00146 0.19142 -4.3515 0.00073 0.00073 0.00073 0 0.00073 -0.06619 -0.03305 0.008086 -1.22776 0.00073 0.00073 0.00073 0.00073 0.00073 0.00073 -0.23574 0.051729 0 0 2.78 0 0 0 0.018686 0.23574m5 P 0 -11.1671 0.001292 3.48092 0.003875 0.338695 -1.27254 0.001292 0.001292 0.001292 1.72326 0.001292 0 -0.00306 -2.25316 0 0 0.001292 0.001292 0.001292 0.001292 0.001292 -0.41711 0.091528 0 0 2.78 0 0 0 0 0.417114m23 Q 0 -11.1671 0.001292 3.48092 0.003875 0.338695 -1.27254 0.001292 0.001292 0.001292 1.72326 0.001292 0 -0.00306 -2.25316 0 0 0.001292 0.001292 0.001292 0.001292 0.001292 -0.41711 0.091528 0 0 2.78 0 0 0 0 0.417114m16 R -0.03319 -1.16681 0.000402 -0.63719 0.001206 0.105416 -1.23626 0.000402 0.000402 0.000402 0 0.000402 0 -0.019 -0.06883 -0.50886 0 0.000402 0.000402 0.000402 0.000402 0.000402 -0.12982 2.8 -0.74707 -1.27737 0 0 0 0 0 0.129823m37 S 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0m4 T -0.04365 -1.54442 0.000529 -0.83799 0.001586 0.138636 1 0.000529 0.000529 0.000529 1.97463 0.000529 0 -0.0242 -2.06515 -0.01523 0 0.000529 0.000529 0.000529 0.000529 0.000529 -0.17074 2.3744 -0.32574 -1.68546 0 0 0 0 0 0.170735m6 U -0.22488 0 0.000567 -0.49215 0.0017 0.148594 -1.48104 0.000567 0.000567 0.000567 -0.8061 0.000567 -0.12891 0 1.38098 0 0 0.000567 0.000567 0.000567 0.000567 0.000567 -0.183 2.8 -1.37992 0 0 0 0 0 0 0.182998m9 V 0 -12.8631 0.00073 4.62859 0.00146 0.19142 -4.3515 0.00073 0.00073 0.00073 0 0.00073 -0.06619 -0.03305 0.008086 -1.22776 0.00073 0.00073 0.00073 0.00073 0.00073 0.00073 -0.23574 0.051729 0 0 2.78 0 0 0 0.018686 0.23574m14 W -0.09988 -0.43458 0.000318 -0.114 -0.01905 0.083269 0.404991 0.000318 0.000318 0.000318 0.303422 0.000318 0 0 0 0 0.02 0.000318 0.000318 0.000318 0.000318 0.000318 -0.10255 0.987782 -0.48264 0 0 0 0.270944 0 0.008128 0.102548m2 X 0 -2.33928 0.000948 7.5 0.002844 0.248589 1 0.000948 0.000948 0.000948 -20.2616 0.000948 -0.02756 -0.11 23.7909 0 0 0.000948 0.000948 0.000948 0.000948 0.000948 -0.30615 0.067178 0 0 0 -4.8349 4.23105 0 0 0.306145m8 Y -0.07453 -0.19099 0.000479 -0.71213 -0.01856 0.125625 0.200737 0.000479 0.000479 0.000479 0 0.000479 -0.00287 0 -0.14048 0 0.02 0.000479 0.000479 0.000479 0.000479 0.000479 -0.15471 1.85314 0 -0.77133 0 0 0 -0.26196 0.063359 0.154711m12 Z -0.12002 -0.95444 0.000316 -0.40997 -0.01905 0.082913 1 0.000316 0.000316 0.000316 1.54337 0.000316 -0.0019 -0.06447 -1.51317 0 0.02 0.000316 0.000316 0.000316 0.000316 0.000316 -0.10211 1.25593 -0.08143 -1.07066 0 0 0 0 0.025395 0.102111m26 AA 0 0 0.000897 -0.71661 -0.01731 0.235182 -0.49248 0.000897 0.000897 0.000897 -2.55938 0.000897 -0.00538 -0.04686 2.3803 -0.00855 0.02 0.000897 0 0.000897 0.000897 0.000897 -0.28963 2.8 0 0 0 0 0.935237 -0.68389 0.118615 0.289634m11 AB 0 -12.8631 0.00073 4.62859 0.00146 0.19142 -4.3515 0.00073 0.00073 0.00073 0 0.00073 -0.06619 -0.03305 0.008086 -1.22776 0.00073 0.00073 0.00073 0.00073 0.00073 0.00073 -0.23574 0.051729 0 0 2.78 0 0 0 0.018686 0.23574m13 AC 0 -12.1561 0.001307 3.89189 0.002615 0.342826 -1.08361 0.001307 0.001307 0.001307 2.31201 0.001307 -0.11854 -0.00261 -2.29753 0 0.001307 0.001307 0.001307 0.001307 0.001307 0.001307 -0.4222 0.092644 0 0 2.78 0 0 0 0.033465 0.422201m31 AD 0 -12.1535 0.001307 3.89189 0.002615 0.342826 -1.08623 0.001307 0.001307 0.001307 2.3094 0.001307 -0.11854 -0.00392 -2.29492 0 0.001307 0.001307 0.001307 0.001307 0.001307 0.001307 -0.4222 0.092644 0 0 2.78 0 0 0 0.033465 0.422201m3 AE 0 -12.271 0.000804 4.34652 0.002411 0.210743 -4.04467 0.000804 0.000804 0.000804 0 0.000804 -0.02336 -0.03478 -0.25276 -1.04358 0 0.000804 0.000804 0.000804 0.000804 0.000804 -0.25954 0.05695 0 0 2.78 0 0 0 0 0.259537

Table 6.2: FBA group matrix of M. smithii where the rows are the 31 groups andthe columns represent the chemical compounds uptaken, secreted, and the generatedbiomass.

6.2.1 Upper bound of steady-state mutual information over

internal metabolic state changing reactions with

respect to biomass only

We computed the upper bound of the steady-state mutual information over the groups

of internal state changing reactions with respect to biomass by applying the ex-

pressions in Equations (4.8), (3.7), and (3.8). The corresponding groups {gi}Ki=1 ={r∗1k , ..., rMk

∗}Ki=1

. Here {gi}Ki=1 represents the list of groups, which are 14 for B. theta

and 31 for M. smithii. These groups can be considered equiprobable with probability

density P({{gi}Ki=1

})= 1/(#of input combinations). This value is 1/14 for B. theta

and 1/31 for M. smithii. The resulting input entropy H({gi}Ki=1), where K = 14 for

B. theta and K = 31 for M. smithii, is then computed through Equation (3.7) by

substituting the integral with a summation over the number of groups, which results

into log2(14) = 3.8074 bits and log2(31) = 4.9542 bits for B. theta and M. smithii

respectively.

To compute the conditional entropy of the input of internal state changing reac-

tions for both organisms B. theta and M. smithii given the output H({gi}Ki=1 |Gr),

Page 61: Characterization of Molecular Communication Based on Cell ...

48

where for B. theta K = 14 and M = 8 and for M. smithii K = 31 and M = 15, we

use the same formula shown in Equation 3.8 over only biomass, where M is number

of groups based on similar biomass flux values shown in the tables above. The con-

ditional entropy for B. theta was found to be 1 bit and M. smithii was found to be

1.1455 bits.

The information loss in function of the input state changing reactions within inter-

nal cell metabolism and output biomass is then be calculated as a difference of input

entropy and conditional entropy of the input given the output, which was found to

be 2.8074 bits for B. theta and 3.8087 bits for M. smithii.

6.2.2 Upper bound of steady-state mutual information over

internal metabolic state changing reactions with

respect to uptake and secretion of compounds and

biomass

We further group the FBA groups based on the similar biomass values. This gave us

8 groups for B. theta and 15 groups for M. smithii. As we described in Sections 3.2.2

and 4.2.2 we use the same equations to compute the upper bound of the steady-

state mutual information with respect to chemical compounds uptaken, secreted and

biomass flux values.

The corresponding groups are {gi}Ki=1 ={r∗1k , ..., rMk

∗}Ki=1

. Here {gi}Ki=1 represents

the list of groups, which are 14 for B. theta and 31 M. smithii. The resulting input

entropy H({gi}Ki=1), where K = 14 and 31, is then computed through Equation (3.4)

by substituting the integral with a summation over the number of input combinations,

which results into log2(14) = 3.8074 bits for B. theta and log2(31) = 4.9542 bits for

Page 62: Characterization of Molecular Communication Based on Cell ...

49

M. smithii. To compute the conditional entropy of the input for both organisms B.

theta and M. smithii given the output H({gi}Ki=1 | {Uk}Kk=1 , {Sj}Kj=1 , Gr), where for

B. theta K = 14 and M = 13, and for M. smithii K = 31 and M = 16, we use the

same formula shown in Equation 3.8 over uptaken compounds, secreted compounds,

and biomass with the only change that M is number of groups based on similar

compounds secreted, uptaken and biomass flux values shown in the tables above.

The conditional entropy for B. theta was found to be 0.1429 bits and M. smithii was

found to be 1.0810 bits.

The information loss in the function of input chemical compounds and output

as uptaken compounds, secreted compounds, and biomass could be calculated as a

difference of input entropy and the conditional entropy of the input given the output,

which was found to be 3.6645 bits for B. theta and 3.8732 bits for M. smithii.

6.2.3 Upper bound of steady-state mutual information over

seven input compounds with respect to uptake and

secretion of compounds and biomass

We further group the FBA groups based on the similar biomass values. This gave us

8 groups for B. theta and 15 groups for M. smithii. As we described in Sections 3.2.2

and 4.2.2 we use the same equations to compute the upper bound of the steady-state

mutual information over seven input chemical compounds with respect to uptaken,

secreted and biomass flux values.

The corresponding groups are {gi}Ki=1 ={r∗1k , ..., rMk

∗}Ki=1

. Here {gi}Ki=1 repre-

sents the list of groups, which are 14 for B. theta and 31 M. smithii. The resulting

input entropy H({gi}Ki=1), where K = 14 and 31, is then computed through Equa-

tion (3.4) by substituting the integral with a summation over the number of input

Page 63: Characterization of Molecular Communication Based on Cell ...

50

combinations, which results into log2(128) = 7 bits. To compute the conditional

entropy of the input for both organisms B. theta and M. smithii given the output

H({gi}Ki=1 | {Uk}Kk=1 , {Sj}Kj=1 , Gr), where for B. theta K = 14 and M = 8, and for M.

smithii K = 31 and M = 15, we use the same formula shown in Equation 3.8 over

uptaken compounds, secreted compounds, and biomass with the only change that M

is number of groups based on similar compounds secreted, uptaken and biomass flux

values shown in the tables above. The conditional entropy for B. theta was found to

be 4.2644 bits and M. smithii was found to be 4.3802 bits.

The information loss in the function of input chemical compounds and output

as uptaken compounds, secreted compounds, and biomass could be calculated as a

difference of input entropy and the conditional entropy of the input given the output,

which was found to be 2.7356 bits for B. theta and 2.6198 bits for M. smithii.

Page 64: Characterization of Molecular Communication Based on Cell ...

51

6.3 Visualization

Figure 6.7: A hive plot for the FBA group F is shown in the figure. The reactionsare placed on the Z axis, the reactants on the X axis and the products on the Y axis.Further the External compounds are placed higher on the X and Y axes than theInternal compounds.

Hive plots [27] were utilized to visualize the complex network of reactions taking

place within the cell. In a hive plot the nodes of a network are arranged along a

predetermined number of axes. The hive plot shown is a representation of the FBA

group for M. smithii. We obtained 31 FBA-estimated chemical reaction states groups

and labeled them A-AE as shown in Table 6.2. It was determined that three axes

Page 65: Characterization of Molecular Communication Based on Cell ...

52

were sufficient to represent the entire set of data available. The first axis consists of

all the state changing reactions present across all the FBA groups. The second and

third axes represent the same set of compounds arranged in the same order in both

the axes. However, the second axes represents the compounds which are products of

a reaction while the third axes represents the compounds which are reactants in a

reaction. This differentiation helps to better see the difference between multiple FBA

groups. Also, the compounds that are external (uptaken or secreted) to the cell are

arranged at higher positions on the axes when compared to the internal compounds.

The Software Jhive 3.0 [22] utilized to generate the hive plots. Jhive is imple-

mented in Java and therefore can be used on any operating system that supports

Java. Jhive can also generate differential hive plots which allows the comparison of

two different hive plots with ease. Figures show the comparison of FBA groups F, G

and Z. While group F and G have the same value of production of biomass, group Z

has the least production of biomass.

Page 66: Characterization of Molecular Communication Based on Cell ...

53

Figure 6.8: Shows differential hive plots of F vs G and F vs Z. The groups F and G inF vs G hive plot has the same biomass whereas, the groups F and Z in F vz Z hive plothave the least and highest biomass respectively. When a reaction is present in F andabsent in G or Z the reaction is represented along with its links to the compounds.When a reaction is present in the other groups but absent in group F the reaction isshown as a node not connected to any other compounds.

The comparison shows that groups F and G differ by a few reactions while F and

Z vary by more reactions. The figure 6.9 represents the same FBA group as the first

figure but as a network diagram instead. The hive plots show clearly that there is still

internal change in the ones with same biomass and much more (but expected) change

when the biomass differs. The software Gephi [17] was used to plot the network di-

agram. The size of a node is proportional to the number of links to that node. The

network nodes represent reactions as well as compounds. It is difficult to represent

the large set of data as a network diagram. Hive plots provide a way to arrange the

nodes based on predetermined classification.

Visualization of cell metabolic network helps in observing the changes due to dif-

ferent combinations of FBAs. By stemming from the aforementioned abstraction, we

Page 67: Characterization of Molecular Communication Based on Cell ...

54

Figure 6.9: Shows the state changing reactions represented as a network diagram.The size of a node is proportional to the number of links connecting to or from it.

presented an in silico computation of the aforementioned upper bound to the mutual

information by using the FBA metabolic simulation technique based on the knowl-

edge of the genome of three different sample organisms. This allowed to estimate the

information capacity for Stage I and Stage II, and ultimately, the flow of information

measured in bits through cell metabolism.

Page 68: Characterization of Molecular Communication Based on Cell ...

55

Chapter 7

Conclusions and Future Work

In this thesis, we have introduced a method to obtain an upper bound to the steady-

state mutual information of a communication system based on cell metabolism and

its regulation. This upper bound stands as theoretical limit of the ability of a cell

to internally represent the information contained in the chemical composition of the

external environment, and to convey this information to experimentally observable

parameters underlying its interactions with the environment. For this, we introduced

an abstraction of cell metabolism based on the theory of molecular communication

systems. In this abstraction, the cell metabolism is modeled by two subsequent stages.

Stage I models the regulation of the chemical reaction activity in cell metabolism as a

binary encoder of the external concentration of chemical compounds. Stage II models

the cell metabolic interaction with the external environment as a digital modulator

of the state of the binary metabolic reactions into the metabolite uptake/secretion

and the growth of the cell itself.

The abstraction and analysis method developed in this thesis will contribute to the

characterization of organisms in biology by introducing a novel point of view, based on

communication engineering abstractions, to reason about and quantify how organisms

Page 69: Characterization of Molecular Communication Based on Cell ...

56

process information. In particular, we envision that our in silico computation can be

utilized to understand how organisms can be fine tuned and controlled by properly

varying the chemical composition of their living environment. In the long run, this

approach will potentially help the design of techniques to control functionalities in

cells engineered through genetic circuits [32]. Additionally, we provided a sample

visualization of the metabolic network parameters used to help in computation of the

mutual information based on Hive plots.

Future work will be focused on a thorough modeling and evaluation of this molec-

ular communication system, including models of the noise source and the dynamic

behavior of metabolic regulation using Dynamic Flux balance Analysis [21], includ-

ing the investigation of its information theoretical capacity. We will also focus on

developing a model to study how to extract information from lab data and compare

it to the optimal (upper bound) solution, and set the basis for the design of the afore-

mentioned techniques to operate a fine-tuned control on the cell behavior based on

information transmission through its metabolism.

Page 70: Characterization of Molecular Communication Based on Cell ...

57

Bibliography

[1] I. F. Akyildiz, J. M. Jornet, and M. Pierobon. Nanonetworks: A new frontier in

communications. Communications of the ACMs, 54(11):84–89, November 2011.

[2] I. F. Akyildiz, M. Pierobon, S. Balasubramaniam, and Y. Koucheryavy. The in-

ternet of bio-nano things. IEEE Communications Magazine, 53(3):32–40, March

2015.

[3] Uri Alon. An introduction to systems biology: design principles of biological

circuits. CRC press, 2006.

[4] G. Aminian, H. Arjmandi, A. Gohari, M.N. Kenari, and U. Mitra. Capacity of

lti-poisson channel for diffusion based molecular communication. In In Proc. of

2015 IEEE International Conference on Communications (ICC), June 2015.

[5] Gino JE Baart and Dirk E Martens. Genome-scale metabolic models: recon-

struction and analysis. Neisseria meningitidis: Advanced Methods and Protocols,

pages 107–126, 2012.

[6] G. Bertani. Lysogeny at mid-twentieth century: P1, P2, and other experimental

systems. Journal of Bacteriology, 186(3):595–6008, 2004.

[7] N Campbell and J Reece. Biology 7th edition, ap, 2005.

Page 71: Characterization of Molecular Communication Based on Cell ...

58

[8] Mikaela Cashman, Jennie L. Catlett, Myra B. Cohen, Nicole Buan, Zahmeeth

Sakkaff, Massimiliano Pierobon, and Christine Kelley. Sampling and Inference

in Configurable Biological Systems: A Software Testing Perspective. Technical

Report TR-UNL-CSE-2016-0007, University of Nebraska-Lincoln, 2016.

[9] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory, 2nd

Edition. Wiley, 2006.

[10] E. H. Davidson. The Regulatory Genome: Gene Regulatory Networks In Devel-

opment And Evolution. Academic Press, Elsevier, 2006.

[11] Eric H Davidson. The regulatory genome: gene regulatory networks in develop-

ment and evolution. Academic press, 2010.

[12] Jon Dobson. Remote control of cellular behaviour with magnetic nanoparticles.

Nature Nanotechnology, 3:139–143, 2008.

[13] EcoCyc. Escherichia coli K-12 substr. MG1655 Growth Medium: M9

medium with 2% glycerol. http://biocyc.org/ECOLI/NEW-IMAGE?type=

Growth-Media\&object=MIX0-59. [Online; accessed 13-March-2016].

[14] A. Einolghozati, M. Sardari, and F. Fekri. Capacity of diffusion-based molecular

communication with ligand receptors. In IEEE Information Theory Workshop

(ITW), October 2011.

[15] Francesco Fabris. Shannon information theory and molecular biology. Journal

of Interdisciplinary Mathematics, 12(1):41–87, 2009.

[16] L. Felicetti, M. Femminella, G. Reali, T. Nakano, and A.V. Vasilakos. Tcp-like

molecular communications. IEEE Journal on Selected Areas in Communications,

32(12):2354–2367, 2014.

Page 72: Characterization of Molecular Communication Based on Cell ...

59

[17] Gephi. The Open Graph Viz Platform. https://gephi.org/. [Online; accessed

23 September-2016].

[18] Emanuel Goncalves, Joachim Bucher, Anke Ryll, Jens Niklas, Klaus Mauch, Stef-

fen Klamt, Miguel Rocha, and Julio Saez-Rodriguez. Bridging the layers: towards

integration of signal transduction, regulation and metabolism into mathematical

models. Molecular BioSystems, 9(7):1576–1583, 2013.

[19] Koji Hayashi, Naoki Morooka, Yoshihiro Yamamoto, Katsutoshi Fujita, Katsumi

Isono, Sunju Choi, Eiichi Ohtsubo, Tomoya Baba, Barry L Wanner, Hirotada

Mor, and Takashi Horiuchi. Highly accurate genome sequences of Escherichia

coli K-12 strains MG1655 and W3110. Mol Syst Biol., 2, February 2006.

[20] Christopher S Henry, Matthew DeJongh, Aaron A Best, Paul M Frybarger, Ben

Linsay, and Rick L Stevens. High-throughput generation, optimization and anal-

ysis of genome-scale metabolic models. Nature biotechnology, 28(9):977–982,

2010.

[21] K Hoffner, SM Harwood, and PI Barton. A reliable simulator for dynamic flux

balance analysis. Biotechnology and bioengineering, 110(3):792–802, 2013.

[22] jhive. jhive - A Java GUI for Hive Plots. https://www.bcgsc.ca/wiki/

display/jhive/home. [Online; accessed 23 September-2016].

[23] L. J Kahl and D. Endy. A survey of enabling technologies in synthetic biology.

Journal of Biological Engineering, 7(1):13, May 2013.

[24] Andrew L Kau, Philip P Ahern, Nicholas W Griffin, Andrew L Goodman, and

Jeffrey I Gordon. Human nutrition, the gut microbiome and the immune system.

Nature, 474(7351):327–336, 2011.

Page 73: Characterization of Molecular Communication Based on Cell ...

60

[25] KBase. Department of Energy Systems Biology Knowledgebase (KBase). http:

//kbase.us. [Online; accessed 22-September-2016].

[26] KEGG. Kyoto encyclopedia of genes and genomes (KEGG). http://www.

genome.jp/kegg/. [Online; accessed 22-September-2016].

[27] Martin Krzywinski, Inanc Birol, Steven JM Jones, and Marco A Marra. Hive

plotsrational approach to visualizing networks. Briefings in bioinformatics,

13(5):627–644, 2012.

[28] Wendell A. Lim. The promise of optogenetics in cell biology: interrogating

molecular circuits in space and time. Nature Reviews, 11:393–403, 2010.

[29] Diego Barcena Menendez, Vivek Raj Senthivel, and Mark Isalan. Sender-receiver

systems and applying information theory for quantitative synthetic biology. Curr

Opin Biotechnol, 31C:101–107, March 2015.

[30] Christian M. Metallo and Matthew G. Vander Heiden. Understanding metabolic

regulation and its influence on cell physiology. Molecular Cell, 49:388–398, Febru-

ary 2013.

[31] Reza Mosayebi, Hamidreza Arjmandi, Amin Gohari, Masoumeh Nasiri-Kenari,

and Urbashi Mitra. Receivers for diffusion-based molecular communication: Ex-

ploiting memory and sampling rate. IEEE Journal on Selected Areas in Com-

munications, 32(12):2368–2380, December 2014.

[32] Chris J. Myers. Engineering genetic circuits. Chapman & Hall/CRC, Mathe-

matical and Computational Biology Series, 2009.

[33] D. L. Nelson and M. M. Cox. Lehninger Principles of Biochemistry, chapter

12.2, pages 425–429. W. H. Freeman, 2005.

Page 74: Characterization of Molecular Communication Based on Cell ...

61

[34] Jeffrey D Orth, Ines Thiele, and Bernhard Ø Palsson. What is flux balance

analysis? Nature biotechnology, 28(3):245–248, 2010.

[35] Stephen Payne and Lingchong You. Engineered cell-cell communication and its

applications. Adv Biochem Eng Biotechnol, 146:97–121, 2014.

[36] M. Pierobon and I. F. Akyildiz. Capacity of a diffusion-based molecular commu-

nication system with channel memory and molecular noise. IEEE Transactions

on Information Theory, 59(2):942–954, February 2013.

[37] Massimiliano Pierobon, Myra B. Cohen, Nicole Buan, and Christine Kelley.

SCIM: Sampling, Characterization, Inference and Modeling of Biological Consor-

tia. Technical Report TR-UNL-CSE-2015-0002, University of Nebraska-Lincoln,

2015.

[38] Massimiliano Pierobon, Zahmeeth Sakkaff, Jennie L Catlett, and Nicole R Buan.

Mutual information upper bound of molecular communication based on cell

metabolism. In 2016 IEEE 17th International Workshop on Signal Processing

Advances in Wireless Communications (SPAWC), pages 1–6. IEEE, 2016.

[39] Alex Rhee, Raymond Cheong, and Andre Levchenko. The application of infor-

mation theory to biochemical signaling systems. Phys Biol., 9(4):045011, August

2012.

[40] Moises Santillan. Bistable behavior in a model of the lac operon in escherichia

coli with variable growth rate. Biophysical journal, 94(6):2065–2081, 2008.

[41] Anna Sheydina, Ruth Y Eberhardt, Daniel J Rigden, Yuanyuan Chang, Zhanwen

Li, Christian C Zmasek, Herbert L Axelrod, and Adam Godzik. Structural

genomics analysis of uncharacterized protein families overrepresented in human

Page 75: Characterization of Molecular Communication Based on Cell ...

62

gut bacteria identifies a novel glycoside hydrolase. BMC bioinformatics, 15(1):1,

2014.

[42] Orkun S Soyer, Marcel Salathe, and Sebastian Bonhoeffer. Signal transduction

networks: topology, response and biochemical processes. Journal of Theoretical

Biology, 238(2):416–425, 2006.

[43] K.V. Srinivas, A.W. Eckford, and R.S. Adve. Molecular communication in fluid

media: The additive inverse gaussian noise channel. IEEE Transactions on In-

formation Theory, 58(7):4678–4692, 2012.

[44] Gasper Tkacik, Curtis G Callan Jr, and William Bialek. Information capacity

of genetic regulatory elements. Physical Review E, 78(1):011910, 2008.

[45] Jared E Toettcher, Christopher A Voigt, Orion D Weiner, and Wendell A Lim.

The promise of optogenetics in cell biology: interrogating molecular circuits in

space and time. Nature Methods, 8:35–38, 2011.

[46] Alejandro F Villaverde and Julio R Banga. Reverse engineering and identification

in systems biology: strategies, perspectives and challenges. Journal of The Royal

Society Interface, 11(91):20130505, 2014.

[47] M Zhou, Y-H Chung, KA Beauchemin, L Holtshausen, M Oba, TA McAllister,

and LL Guan. Relationship between rumen methanogens and methane produc-

tion in dairy cows fed diets supplemented with a feed enzyme additive. Journal

of applied microbiology, 111(5):1148–1158, 2011.