
Why bacteria run Linux while eukaryotes run Windows?

Sergei Maslov
Brookhaven National Laboratory, New York


Physical vs. Biological Laws

Physical laws are often discovered by finding a simple common explanation for very different phenomena

Newton’s law: apples fall to the ground; planets revolve around the Sun

The discovery of biological laws is slowed down by our having a cookie-cutter explanation in terms of natural selection

Drawing from the Facebook group "Trust me, I'm a 'Biologist'"

Genes encoded in bacterial genomes ~ Packages installed on Linux computers

Complex systems have many components:
• Genes (bacteria)
• Software packages (Linux OS)

Components do not work alone: they need to be assembled to work

In individual systems only a subset of components is installed:
• Genome (bacteria): a collection of genes
• Computer (Linux OS): a collection of software packages

Components have vastly different frequencies of installation

Justin Pollard, http://www.designboom.com

IKEA kits have many components


They need to be assembled to work

Different frequencies of use: common vs. rare

What determines the frequency of installation/use of a gene/package?

• Popularity, a.k.a. preferential attachment: frequency ~ self-amplifying popularity. Relevant for social systems: WWW links, Facebook friendships, scientific citations
• Functional role: frequency ~ breadth or importance of the functional role. Relevant for biological and technological systems, where selection corrects undeserved popularity
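To make the contrast between the two mechanisms concrete, here is a minimal toy sketch in Python. It is our own illustration, not the model used in the talk: the function names, the "+1" baseline weight, and the importance scores are hypothetical choices.

```python
import random


def preferential_attachment(n_components: int, n_installs: int, seed: int = 0) -> list[int]:
    """Popularity: each new installation picks a component with probability
    proportional to (current count + 1), so early luck is self-amplifying.
    The +1 baseline (an assumption) lets never-installed components be picked."""
    rng = random.Random(seed)
    counts = [0] * n_components
    for _ in range(n_installs):
        weights = [c + 1 for c in counts]
        i = rng.choices(range(n_components), weights=weights, k=1)[0]
        counts[i] += 1
    return counts


def functional_role(importance: list[float], n_installs: int, seed: int = 0) -> list[int]:
    """Functional role: the probability of picking a component is fixed by a
    (hypothetical) importance score and ignores past popularity."""
    rng = random.Random(seed)
    counts = [0] * len(importance)
    for i in rng.choices(range(len(importance)), weights=importance, k=n_installs):
        counts[i] += 1
    return counts


print(preferential_attachment(n_components=5, n_installs=1000))
print(functional_role(importance=[5.0, 3.0, 1.0, 1.0, 0.0], n_installs=1000))
```

In the first function, identical components can end up with very different counts depending on early random choices; in the second, counts track the importance weights, and a component with zero importance is never installed.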

Empirical data on component frequencies

Bacterial genomes (eggnog.embl.de): 500 sequenced prokaryotic genomes, 44,000 orthologous gene families

Linux packages (popcon.ubuntu.com): 200,000 Linux packages installed on 2,000,000 individual computers

Binary tables: each component is either present or absent in a given system
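Given such a binary table, the frequency f of a component is simply the fraction of systems in which it is present. A minimal Python sketch, using a small made-up table rather than the eggNOG or popcon data:

```python
def component_frequencies(table: list[list[int]]) -> list[float]:
    """table[s][c] is 1 if component c is present in system s, else 0.
    Returns f_c = fraction of systems that contain component c."""
    n_systems = len(table)
    n_components = len(table[0])
    return [sum(row[c] for row in table) / n_systems for c in range(n_components)]


# Hypothetical 4-system x 3-component table (not real data).
table = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
]
print(component_frequencies(table))  # [1.0, 0.5, 0.25]
```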

Frequency distributions

$P(f) \sim f^{-1.5}$, except for the top √N "universal" components with $f \sim 1$

[Figure: frequency classes labeled Core, Shell, Cloud, and ORFans]

TY Pang, S. Maslov, PNAS (2013)

How to quantify functional importance?

We want to check: frequency ~ importance

Usefulness = importance ~ the component is needed for the proper functioning of other components

Dependency network: A → B means A depends on B for its function
• Formalized for Linux software packages
• For metabolic enzymes, given by upstream-downstream positions in pathways

Frequency ~ dependency degree, $K_{dep}$

$K_{dep}$ = the total number of components that directly or indirectly depend on the selected one

TY Pang, S. Maslov, PNAS (2013)
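The definition of $K_{dep}$ amounts to counting, for each component, everything that can reach it along dependency edges. A minimal Python sketch, assuming the dependency network is given as a dictionary mapping each component to the components it directly depends on; the package names are hypothetical:

```python
from collections import defaultdict


def dependency_degrees(depends_on: dict[str, list[str]]) -> dict[str, int]:
    """depends_on[a] lists the components that a directly depends on (a -> b).
    Returns K_dep[b] = number of components that directly or indirectly depend on b."""
    # Reverse the edges: dependents[b] = components that directly depend on b.
    dependents: dict[str, list[str]] = defaultdict(list)
    nodes = set(depends_on)
    for a, targets in depends_on.items():
        for b in targets:
            dependents[b].append(a)
            nodes.add(b)

    k_dep = {}
    for b in nodes:
        # Depth-first search over the reversed edges collects all direct
        # and indirect dependents of b.
        seen: set[str] = set()
        stack = list(dependents[b])
        while stack:
            a = stack.pop()
            if a not in seen:
                seen.add(a)
                stack.extend(dependents[a])
        k_dep[b] = len(seen)
    return k_dep


deps = {"app": ["libgtk", "libc"], "libgtk": ["libc"], "libc": []}
print(sorted(dependency_degrees(deps).items()))  # [('app', 0), ('libc', 2), ('libgtk', 1)]
```

The sketch assumes an acyclic network; with a dependency cycle, a component would count itself among its own dependents.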

Correlation coefficient ~0.4 for both Linux packages and genes; it could be improved by using a weighted dependency degree

Frequency is positively correlated with functional importance
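The slide does not say which correlation coefficient was computed or whether the values were log-transformed. As one plausible reading, here is a minimal Python sketch of Pearson's r between component frequency f and $\log(1 + K_{dep})$; the transform and the numbers are our assumptions, not the authors' data.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical per-component values (not real data).
freq = [0.95, 0.60, 0.30, 0.05, 0.02]
k_dep = [120, 3, 40, 0, 1]
# log1p handles K_dep = 0 (components nothing depends on).
print(pearson(freq, [math.log1p(k) for k in k_dep]))
```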


Warm-up: tree-like metabolic network

[Figure: example nodes with $K_{dep} = 5$ and $K_{dep} = 15$; TCA cycle]


Dependency degree distribution on a critical branching tree

$P(K) \sim K^{-1.5}$ for a critical branching tree

Paradox: for the largest of N nodes, $P(K \ge K_{max}) \sim K_{max}^{-0.5} \sim 1/N$, which gives $K_{max} = N^2 > N$

Answer: the size of the parent tree imposes a cutoff: there will be √N "core" nodes with $K_{max} = N$, present in almost all systems (ribosomal genes or core metabolic enzymes)
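The paradox and its resolution follow from the tail of the distribution. A short derivation, treating the N nodes' $K$ values as if they were independent draws from $P(K) \sim K^{-1.5}$ (a rough extreme-value argument):

```latex
\begin{aligned}
P(K' \ge K) &\sim \int_K^{\infty} K'^{-1.5}\,dK' \sim K^{-0.5} \\
N\,P(K' \ge K_{max}) \sim 1 &\;\Rightarrow\; K_{max}^{-0.5} \sim 1/N \;\Rightarrow\; K_{max} \sim N^{2} > N \\
N\,P(K' \ge N) &\sim N \cdot N^{-0.5} = \sqrt{N}
\end{aligned}
```

The second line is the paradox: the largest of N values would exceed N. Since $K_{dep}$ cannot exceed N, the tail is cut off at $K = N$, and the third line counts the nodes that pile up at the cutoff: these are the √N core nodes.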

Need a new model: in a tree D = 1 (each component directly depends on exactly one other), while in real systems D ≈ 2 > 1

Bottom-down model of dependency network evolution

Components are added gradually over evolutionary time

A new component directly depends on D previously existing components, selected randomly

Versions:
• D is drawn from some distribution
• Same as above, but recent components are preferentially selected (as in citation networks)
• There is a fixed probability to connect to any previously existing component (as in food webs)
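A minimal Python sketch of the basic version of this model (fixed D, uniform random choice of earlier components). Details such as how the first few components are handled (here, component t links to min(D, t) earlier ones) are our assumptions, and the function names are hypothetical.

```python
import random


def grow_dependency_network(n: int, d: int, seed: int = 0) -> list[list[int]]:
    """Component t (added at step t) directly depends on min(d, t) distinct
    earlier components chosen uniformly at random; component 0 depends on nothing."""
    rng = random.Random(seed)
    return [rng.sample(range(t), min(d, t)) for t in range(n)]


def k_dep(deps: list[list[int]]) -> list[int]:
    """K_dep[b] = number of components that directly or indirectly depend on b.
    ancestors[t] is a bitset (a Python int) of everything t depends on."""
    n = len(deps)
    ancestors = [0] * n
    for t in range(n):
        for b in deps[t]:  # b < t, so ancestors[b] is already complete
            ancestors[t] |= (1 << b) | ancestors[b]
    return [sum((ancestors[t] >> b) & 1 for t in range(n)) for b in range(n)]


deps = grow_dependency_network(n=1000, d=2, seed=1)
kd = k_dep(deps)
print(sorted(kd, reverse=True)[:10])  # the ten largest K_dep values
```

Because every component after the first depends on at least one earlier one, every dependency chain bottoms out at component 0, so its $K_{dep}$ equals n - 1; in general the earliest components tend to accumulate the largest $K_{dep}$.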


• p(t,T): probability that a component added at time T directly or indirectly depends on one added at time t


$K_{dep}$ and $K_{out}$ degree distributions

$K_{dep}$ decreases with layer number

[Figure panels: Linux; model with D = 2]


Zipf plot for $K_{dep}$ distributions

[Figure panels: metabolic enzymes vs. model; Linux vs. model]




What experiments does P(f) help to interpret?

Pan-genome of E. coli strains

M Touchon et al. PLoS Genetics (2009)

Metagenomes

The Human Microbiome Project Consortium, Nature (2012)


Pan-genome scaling

Pan-genome of all bacteria

Slope = -0.4 vs. the prediction of the toolbox model (-0.5)

P. Lapierre, JP Gogarten, TIG (2009)

$(\#\text{ of genes in pan-genome}) \sim (\#\text{ of sequenced genomes})^{0.5}$

$(\#\text{ of new genes added to pan-genome}) \sim (\#\text{ of sequenced genomes})^{-0.5}$
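The √N pan-genome scaling can be checked numerically. A minimal Python sketch, assuming (our simplification) that each gene family i is present in each sequenced genome independently with probability $f_i$, with the $f_i$ drawn from $P(f) \propto f^{-1.5}$ on $[f_{min}, 1]$; the expected pan-genome size after n genomes is then $\sum_i [1 - (1 - f_i)^n]$. The cutoff $f_{min}$ and the number of families are arbitrary illustration values.

```python
import math
import random


def sample_frequencies(n_families: int, f_min: float, seed: int = 0) -> list[float]:
    """Inverse-transform sampling from P(f) ∝ f^-1.5 on [f_min, 1]."""
    rng = random.Random(seed)
    a = f_min ** -0.5
    # CDF: F(f) = (a - f^-0.5) / (a - 1), so f = (a - u * (a - 1))^-2.
    return [(a - rng.random() * (a - 1.0)) ** -2 for _ in range(n_families)]


def expected_pan_genome(freqs: list[float], n_genomes: int) -> float:
    """Expected number of distinct gene families seen in n_genomes genomes."""
    return sum(1.0 - (1.0 - f) ** n_genomes for f in freqs)


freqs = sample_frequencies(n_families=50_000, f_min=1e-6, seed=1)
for n in (10, 100, 1000):
    print(n, round(expected_pan_genome(freqs, n)))

# Local exponent between n = 100 and n = 1000 on log-log axes.
slope = math.log(expected_pan_genome(freqs, 1000) / expected_pan_genome(freqs, 100)) / math.log(10)
print(round(slope, 2))
```

In the regime $1 \ll n \ll 1/f_{min}$ the exponent should come out close to 0.5, and the number of new families per added genome, the derivative of the pan-genome size, then falls off as $n^{-0.5}$.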

Bacterial genome evolution happens in cooperation with phages

Comparative genomics of E. coli implicates phages as the BitTorrent of gene sharing

Phage capacity: 20 kb (in other strains, up to 40 kb)

K-12 to B comparison

1 kb: typical gene length

Phage-bacteria infection network: data from Flores et al. 2011, based on experiments by Moebus and Nattkemper, 1981

WWW from the AT&T website circa 1996, visualized by Mark Newman

Why do eukaryotes run Windows?

Dependency network = reuse of components
• Bacteria do not keep redundant genes after horizontal gene transfer (HGT)
• Linux developers rely on previous efforts
• Pros: smaller genomes, open source, economies of scale
• Cons: less specialized, potentially unstable, "dependency hell"

Eukaryotes are like Windows or Mac OS X
• Keep redundant components
• Proprietary software

Figure adapted from S. Maslov, TY Pang, K. Sneppen, S. Krishna, PNAS (2009)

[Figure: # of pathways (or their regulators) vs. # of genes; # of selected packages vs. # of installed packages (Linux data, slope 1.7)]

$N_{\text{selected packages}} \sim N_{\text{installed packages}}^{1.7}$

Software packages for Linux
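The exponent 1.7 is the slope of the data on log-log axes. A minimal Python sketch of one common way to estimate such a slope, an ordinary least-squares line through the log-transformed points; the talk does not say how its slope was fitted, and the check data below are synthetic.

```python
import math


def loglog_slope(x: list[float], y: list[float]) -> float:
    """Least-squares slope of log(y) vs. log(x): the exponent a in y ~ x^a."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


# Synthetic check (not the Linux data): points on y = 2 * x^1.7.
xs = [10, 100, 1000, 10000]
ys = [2 * x ** 1.7 for x in xs]
print(round(loglog_slope(xs, ys), 3))  # 1.7
```

On real data the fitted slope depends on the range of points included and on how they are binned, so the same fit can give somewhat different exponents.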


Collaborators: Tin Yau Pang, Stony Brook University

Support: Office of Biological and Environmental Research

Thank you!