Why bacteria run Linux while eukaryotes run Windows?
Sergei Maslov
Brookhaven National Laboratory, New York
Physical vs. Biological Laws
• Physical laws are often discovered by finding a simple common explanation for very different phenomena
• Newton’s law: apples fall to the ground; planets revolve around the Sun
• Discovery of biological laws is slowed down by our having a cookie-cutter explanation in terms of natural selection:
Drawing from Facebook group: Trust me, I'm a "Biologist"
Genes encoded in bacterial genomes ~ Packages installed on Linux computers
• Complex systems have many components
  • Genes (Bacteria)
  • Software packages (Linux OS)
• Components do not work alone: they need to be assembled to work
• In individual systems only a subset of components is installed
  • Genome (Bacteria): collection of genes
  • Computer (Linux OS): collection of software packages
• Components have vastly different frequencies of installation
Justin Pollard, http://www.designboom.com
IKEA kits have many components
They need to be assembled to work
Different frequencies of use: common vs. rare
What determines the frequency of installation/use of a gene/package?
• Popularity, AKA preferential attachment
  • Frequency ~ self-amplifying popularity
  • Relevant for social systems: WWW links, Facebook friendships, scientific citations
• Functional role
  • Frequency ~ breadth or importance of the functional role
  • Relevant for biological and technological systems, where selection adjusts undeserved popularity
Empirical data on component frequencies
• Bacterial genomes (eggnog.embl.de): 500 sequenced prokaryotic genomes, 44,000 orthologous gene families
• Linux packages (popcon.ubuntu.com): 200,000 Linux packages installed on 2,000,000 individual computers
• Binary tables: a component is either present or not in a given system
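As a rough illustration of how a frequency is read off such a binary table, the sketch below uses a tiny made-up presence/absence matrix (the data and the helper name `component_frequencies` are hypothetical, not taken from eggNOG or popcon). The frequency of a component is simply the fraction of systems that contain it.

```python
import numpy as np

# Hypothetical toy table: rows are systems (genomes or computers),
# columns are components (gene families or packages).
presence = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
], dtype=bool)

def component_frequencies(table):
    """Fraction of systems in which each component is present."""
    return table.mean(axis=0)

freq = component_frequencies(presence)
print(freq)
```

In this toy table the first component is "universal" (present in every system), while the last two are rare.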
Frequency distributions
$P(f) \sim f^{-1.5}$, except the top $\sqrt{N}$ “universal” components with $f \sim 1$
[Figure: frequency distribution with regions labeled Core, Shell, Cloud, and ORFans]
TY Pang, S. Maslov, PNAS (2013)
How to quantify functional importance?
• We want to check Frequency ~ Importance
• Usefulness = Importance ~ the component is needed for proper functioning of other components
• Dependency network
  • A → B means A depends on B for its function
  • Formalized for Linux software packages
  • For metabolic enzymes, given by upstream-downstream positions in pathways
• Frequency ~ dependency degree, Kdep
• Kdep = the total number of components that directly or indirectly depend on the selected one
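The definition of Kdep can be sketched as a graph traversal. The example below is only an illustration under stated assumptions: the edge list and the function name `dependency_degree` are hypothetical, and an edge (A, B) is read as "A depends on B". For each component it counts every component reachable by following "is depended on by" links.

```python
from collections import defaultdict

# Hypothetical toy dependency list: (A, B) means A depends on B.
depends_on = [("app", "lib"), ("lib", "libc"), ("tool", "libc"), ("plugin", "app")]

def dependency_degree(edges):
    """Return Kdep for each component: the number of components that
    directly or indirectly depend on it."""
    dependents = defaultdict(set)  # B -> components that directly depend on B
    nodes = set()
    for a, b in edges:
        dependents[b].add(a)
        nodes.update((a, b))
    k_dep = {}
    for node in nodes:
        seen = set()
        stack = [node]
        while stack:  # depth-first walk over direct and indirect dependents
            current = stack.pop()
            for d in dependents.get(current, ()):
                if d not in seen:
                    seen.add(d)
                    stack.append(d)
        seen.discard(node)  # guard against counting the node itself in a cycle
        k_dep[node] = len(seen)
    return k_dep

k_dep = dependency_degree(depends_on)
print(k_dep)
```

Here `libc` gets the largest Kdep because every other component depends on it directly or through a chain.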
TY Pang, S. Maslov, PNAS (2013)
• Correlation coefficient ~0.4 for both Linux and genes
• Could be improved by using weighted dependency degree
Frequency is positively correlated with functional importance
TY Pang, S. Maslov, PNAS (2013)
Warm-up: tree-like metabolic network
[Figure: tree-like metabolic network including the TCA cycle; example nodes labeled Kdep = 5 and Kdep = 15]
TY Pang, S. Maslov, PNAS (2013)
Dependency degree distribution on a critical branching tree
• $P(K) \sim K^{-1.5}$ for a critical branching tree
• Paradox: $K_{max}^{-0.5} \sim 1/N \Rightarrow K_{max} = N^2 > N$
• Answer: the parent tree size imposes a cutoff: there will be $\sqrt{N}$ “core” nodes with $K_{max} = N$, present in almost all systems (ribosomal genes or core metabolic enzymes)
• Need a new model: in a tree $D = 1$, while in real systems $D \sim 2 > 1$
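The paradox and its resolution can be spelled out with a short heuristic estimate. This is a sketch, assuming N nodes whose K values follow the power law above without any cutoff; it is not taken from the slides.

```latex
% Heuristic estimate, assuming N nodes with K drawn from P(K) ~ K^{-1.5}
\begin{align*}
P(K' \ge K) &\sim \int_K^\infty K'^{-1.5}\, dK' \sim K^{-0.5}, \\
N \cdot K_{max}^{-0.5} \sim 1 &\;\Rightarrow\; K_{max} \sim N^2 > N, \\
\#\{K \ge N\} &\sim N \cdot N^{-0.5} = \sqrt{N}.
\end{align*}
```

The second line is the paradox: no node can have more than N dependents. The third line counts how many nodes the uncut power law would place at or beyond the cutoff K = N, which matches the $\sqrt{N}$ core nodes quoted above.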
Bottom-down model of dependency network evolution
• Components are added gradually over evolutionary time
• A new component directly depends on D previously existing components selected randomly
• Versions:
  • D is drawn from some distribution → same as above
  • Recent components are preferentially selected → citations
  • There is a fixed probability to connect to any previously existing component → food webs
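A minimal simulation of the basic version of this model might look as follows. It is a sketch under assumptions not stated on the slide: the D targets are chosen uniformly and without repetition, the first components link to as many predecessors as exist, and the function name `simulate_dependency_model` is hypothetical.

```python
import random

def simulate_dependency_model(n_components, d=2, seed=0):
    """Grow a dependency network one component at a time and return Kdep.

    Each new component directly depends on d earlier components chosen
    uniformly at random (assumption: distinct targets, fewer if not enough
    earlier components exist).
    """
    rng = random.Random(seed)
    # ancestors[i] = all components that i depends on, directly or indirectly
    ancestors = [set()]
    for new in range(1, n_components):
        targets = rng.sample(range(new), min(d, new))
        closure = set(targets)
        for t in targets:
            closure |= ancestors[t]
        ancestors.append(closure)
    # Kdep of a component = number of components that list it as an ancestor
    k_dep = [0] * n_components
    for closure in ancestors:
        for a in closure:
            k_dep[a] += 1
    return k_dep

k_dep = simulate_dependency_model(200, d=2, seed=1)
print(sorted(k_dep, reverse=True)[:10])
```

Sorting the returned Kdep values in decreasing order gives the kind of rank plot that can be compared against Linux or metabolic data. The oldest component always ends up with Kdep equal to the number of other components, since every later component reaches it through some chain.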
• p(t,T): probability that a component added at time T directly or indirectly depends on one added at time t
Kdep and Kout degree distributions
Kdep decreases with layer number
[Figure panels: Linux; Model with D=2]
TY Pang, S. Maslov, PNAS (2013)
Zipf plot for Kdep distributions
[Figure panels: Metabolic enzymes vs. model; Linux vs. model]
TY Pang, S. Maslov, PNAS (2013)
Frequency distributions
$P(f) \sim f^{-1.5}$, except the top $\sqrt{N}$ “universal” components with $f \sim 1$
[Figure: frequency distribution with regions labeled Core, Shell, Cloud, and ORFans]
TY Pang, S. Maslov, PNAS (2013)
What experiments does P(f) help to interpret?
Pan-genome of E. coli strains
M Touchon et al. PLoS Genetics (2009)
Metagenomes
The Human Microbiome Project Consortium, Nature (2012)
Pan-genome scaling
Pan-genome of all bacteria
• Slope = -0.4; prediction of the toolbox model: -0.5
P. Lapierre, JP Gogarten, TIG (2009)
• (# of genes in pan-genome) ~ (# of sequenced genomes)^0.5
• (# of new genes added to pan-genome) ~ (# of sequenced genomes)^(-0.5)
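One way to see how exponents like these can follow from $P(f) \sim f^{-1.5}$ is the estimate below. It is a sketch under assumptions that are not on the slide: genomes are treated as independent samples, a component of frequency $f$ is present in each genome with probability $f$, $G(n)$ denotes the expected pan-genome size after $n$ genomes, and $n$ is large enough that the integration limits can be extended.

```latex
\begin{align*}
G(n) &= \int P(f)\,\bigl[1 - (1-f)^n\bigr]\, df
      \approx \int f^{-1.5}\,\bigl(1 - e^{-nf}\bigr)\, df \\
     &= n^{0.5} \int x^{-1.5}\,\bigl(1 - e^{-x}\bigr)\, dx \;\propto\; n^{0.5}
      \qquad (x = nf), \\
\frac{dG}{dn} &\propto n^{-0.5}.
\end{align*}
```

The remaining integral over $x$ converges and does not depend on $n$, so the pan-genome grows as the square root of the number of genomes and the number of new genes per added genome falls off with exponent -0.5.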
Bacterial genome evolution happens in cooperation with phages
Comparative genomics of E. coli implicates phages for BitTorrent
• Phage capacity: 20 kb
• Other strains: up to 40 kb
• K-12 to B comparison
• 1 kb: gene length
Phage-Bacteria Infection Network
Data from Flores et al. (2011); experiments by Moebus, Nattkemper (1981)
WWW from AT&T website circa 1996, visualized by Mark Newman
Why eukaryotes run Windows?
• Dependency network = reuse of components
  • Bacteria do not keep redundant genes after HGT
  • Linux developers rely on previous efforts
• Pros: smaller genomes, open source, economies of scale
• Cons: less specialized, potentially unstable, “dependency hell”
• Eukaryotes are like Windows or Mac OS X
  • Keep redundant components
  • Proprietary software
Figure adapted from S. Maslov, TY Pang, K. Sneppen, S. Krishna, PNAS (2009)
[Figure: # of pathways (or their regulators) vs. # of genes, log-log axes]
[Figure: Software packages for Linux; axes: # of installed packages and # of selected packages, log-log; Linux data with slope 1.7]
$N_{\text{selected packages}} \sim N_{\text{installed packages}}^{1.7}$
Collaborators: Tin Yau Pang, Stony Brook University
Support:
Office of Biological and Environmental Research
Thank you!