Olga V. Kalinina Pavel S. Novichkov Andrey A. Mironov Mikhail S. Gelfand
Evolution of regulatory interactions in bacteria Mikhail Gelfand Research and Training Center...
Evolution of regulatory interactions in bacteria
Mikhail Gelfand
Research and Training Center “Bioinformatics”,
Institute for Information Transmission Problems, RAS
Moscow, Russia
Singapore, 17-18 July 2006
Comparative genomics of regulation
• Why
– Functional annotation of genes
– Metabolic modeling
– Practical applications in genetic engineering, drug targeting etc.
• How
– Close genomes: phylogenetic footprinting. Regulatory sites are seen as conservation islands in alignments of gene upstream regions
– Distant genomes: consistency filtering. Candidate sites in one genome may be unreliable, but independent occurrence upstream of orthologous genes in many genomes yields reliable predictions
• Caveats
– Presence of (predicted) binding sites does not immediately imply functional regulation
– Operon structure
– Need to verify presence of orthologous transcription factors in the studied genomes
– Orthologous factors may have different binding motifs
– One functional system may be regulated by different factors within and between genomes
• Many genomes
– Taxon-specific regulation
– Evolution
• individual sites
• transcription-factor families
• transcription factors and their binding motifs
• simple and complex regulatory systems
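The consistency-filtering idea for distant genomes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the software used for the talk: the input format (one `(ortholog_group, genome)` pair per candidate site), the function name `consistency_filter`, the threshold `min_genomes`, and the toy data are all hypothetical.

```python
from collections import defaultdict


def consistency_filter(candidate_sites, min_genomes=3):
    """Keep ortholog groups with candidate sites in enough distinct genomes.

    candidate_sites: iterable of (ortholog_group, genome) pairs, one pair per
    candidate site found upstream of a gene (hypothetical input format).
    """
    genomes_by_group = defaultdict(set)
    for group, genome in candidate_sites:
        genomes_by_group[group].add(genome)
    # A group passes only if sites occur independently in several genomes.
    return {
        group: sorted(genomes)
        for group, genomes in genomes_by_group.items()
        if len(genomes) >= min_genomes
    }


# Toy data (hypothetical): group names and genome labels are placeholders.
toy_sites = [
    ("groupA", "genome1"), ("groupA", "genome2"), ("groupA", "genome3"),
    ("groupB", "genome1"), ("groupB", "genome1"),  # two sites, but one genome
]
kept = consistency_filter(toy_sites, min_genomes=3)
print(kept)
```

Counting distinct genomes rather than sites reflects the point made above: several weak sites in a single genome are not independent evidence, whereas sites upstream of orthologous genes in many genomes are.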
How it works: Two simple examples
• Biotin regulator of alpha-proteobacteria
• Universal regulator of ribonucleotide reductases: reconstruction of the regulatory system and the mechanism of regulation
BirA (biotin regulator in eubacteria and archaea): conserved signal, changed spacing
Profile 1: Gram-positive bacteria, Archaea
Profile 2: Gram-negative bacteria
BirA of alpha-proteobacteria: no DNA-binding domain
Identification of the candidate regulator (BioR) in alpha-proteobacteria
1. Candidate binding sites: similar palindromes upstream of biotin biosynthesis and transport genes in different genomes
TTATAGATAA TTATCTATAA TTATAGATAg TTATCTATAA TTATCTATAA TTATAGATAg TTATCTATAA TcATATATtA TcATAGATAg TTATCTATAA TTATCTATAA TTATCTATtA TTATCTAcAA TTATCTATAA TTATCTATAA TTATCTATAA TcATAGATtA cTATAGATAA TTATCTAcAA
2. Positional clustering: a candidate transcription factor from the GntR family is often found in the same loci (black arrows)
3. Phyletic patterns: phyletic distribution of candidate sites (red circles) exactly coincides with the phyletic distribution of the candidate regulator
4. Autoregulation: in many cases there are candidate sites upstream of the bioR gene itself
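Step 1 relies on the candidate sites being similar palindromes. A minimal Python sketch of that check is given below: it counts how many positions of a site differ from its own reverse complement. The helper names are illustrative, the two example sequences are 10-mers read from the site list above, and treating each 10-mer as one site is an assumption.

```python
# Complement table for both upper- and lower-case bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def palindrome_mismatches(seq):
    """Count positions where a site differs from its reverse complement.

    0 means a perfect palindrome; small values mean a near-palindrome.
    """
    site = seq.upper()
    return sum(a != b for a, b in zip(site, reverse_complement(site)))


for site in ("TTATAGATAA", "TTATCTATAA"):
    print(site, reverse_complement(site), palindrome_mismatches(site))
```

The two example sites are reverse complements of each other, so they represent the same signal read from opposite strands, and each deviates from a perfect palindrome only at its two central positions.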
Conserved signal upstream of nrd genes
Identification of the candidate regulator by the analysis of phyletic patterns
• COG1327: the only COG with exactly the same phylogenetic pattern as the signal
– “large scale” on the level of major taxa
– “small scale” within major taxa:
• absent in small parasites among alpha- and gamma-proteobacteria
• absent in Desulfovibrio spp. among delta-proteobacteria
• absent in Nostoc sp. among cyanobacteria
• absent in Oenococcus and Leuconostoc among Firmicutes
• present only in Treponema denticola among four spirochetes
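The phyletic-pattern search described above amounts to comparing sets of genomes. The following Python sketch shows the idea under stated assumptions: the function name, the genome labels, and every COG name other than COG1327 are hypothetical placeholders, and the toy patterns are not the real data.

```python
def cogs_matching_signal(signal_genomes, cog_genomes):
    """Return COGs whose phyletic pattern equals that of the signal.

    signal_genomes: genomes in which the conserved signal was found.
    cog_genomes: dict mapping a COG identifier to the genomes that encode it.
    """
    signal = frozenset(signal_genomes)
    return sorted(
        cog for cog, genomes in cog_genomes.items()
        if frozenset(genomes) == signal
    )


# Toy patterns (hypothetical): only the comparison logic is the point here.
signal = {"genome1", "genome2", "genome4"}
cogs = {
    "COG1327": {"genome1", "genome2", "genome4"},   # exact match
    "COG_toyA": {"genome1", "genome2"},             # missing in genome4
    "COG_toyB": {"genome1", "genome2", "genome3", "genome4"},  # extra genome
}
matches = cogs_matching_signal(signal, cogs)
print(matches)
```

Requiring an exact match is the strictest choice; with incomplete or noisy genome data one might instead rank COGs by the number of disagreeing genomes.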
COG1327 “Predicted transcriptional regulator, consists of a Zn-ribbon and ATP-cone domains”: regulator of the riboflavin pathway?
Additional evidence – 1
• nrdR is sometimes clustered with nrd genes or with replication genes dnaB, dnaI, polA
Additional evidence – 2
• In some genomes, candidate NrdR-binding sites are found upstream of other replication-related genes
– dNTP salvage
– topoisomerase I, replication initiator dnaA, chromosome partitioning, DNA helicase II
Multiple sites (nrd genes): FNR, DnaA, NrdR
Mode of regulation
• Repressor (overlaps with promoters)
• Co-operative binding:
– most sites occur in tandem (> 90% cases)
– the distance between the copies (centers of palindromes) equals an integer number of DNA turns:
• mainly (94%) 30-33 bp, in 84% 31-32 bp – 3 turns
• 21 bp (2 turns) in Vibrio spp.
• 41-42 bp (4 turns) in some Firmicutes
• experimental confirmation in Streptomyces (Borovok et al., 2004)
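The arithmetic behind "an integer number of DNA turns" can be checked with a short Python sketch. It assumes a helical repeat of about 10.5 bp per turn for B-DNA; that value, the tolerance, and the function name are assumptions made for illustration and are not stated on the slide.

```python
BP_PER_TURN = 10.5  # assumed helical repeat of B-DNA, in base pairs


def helical_turns(distance_bp, tolerance=0.15):
    """Return (nearest whole number of turns, whether the spacing fits it).

    distance_bp: distance between the centers of two palindromic sites.
    tolerance: allowed deviation from a whole turn, in turns (an assumption).
    """
    turns = distance_bp / BP_PER_TURN
    nearest = round(turns)
    return nearest, abs(turns - nearest) <= tolerance


for distance in (21, 31, 32, 42, 26):
    print(distance, helical_turns(distance))
```

Under this assumption, 21 bp corresponds to 2 turns, 31-32 bp to 3 turns, and 41-42 bp to 4 turns, so tandem sites face the same side of the helix, whereas a spacing such as 26 bp would put them on opposite faces.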
Evolutionary processes that shape regulatory systems
• Expansion and contraction of regulons
• Duplications of regulators with or without regulated loci
• Loss of regulators with or without regulated loci
• Re-assortment of regulators and structural genes
• … especially in complex systems
• Horizontal transfer
Loss of regulators, and cryptic sites
Loss of RbsR in Y. pestis (the ABC transporter is also lost)
Start codon of rbsD
RbsR binding site
Regulon expansion: how FruR has become CRA
[Figure, three slides: the same map of sugar catabolism and central metabolism genes, shown at three evolutionary stages]
Genes shown: icdA, aceA, aceB, aceEF, pckA, ppsA, pykF, adhE, gpmA, pgk, tpiA, gapA, pfkA, fbp, fruK, fruBA, eda, edd, epd, ptsHI-crr, manXYZ, mtlD, mtlA
Substrates shown: Fructose, Glucose, Mannose, Mannitol
Slide 1: Common ancestor of Enterobacteriales (legend: Gamma-proteobacteria)
Slide 2: Common ancestor of Escherichia and Salmonella (legend: Gamma-proteobacteria, Enterobacteriales)
Slide 3: legend: Gamma-proteobacteria, Enterobacteriales, E. coli and Salmonella spp.
Trehalose/maltose catabolism, alpha-proteobacteria
Duplicated LacI-family regulators: lineage-specific post-duplication loss
The binding signals are very similar (the blue branch is somewhat different: to avoid cross-recognition?)
Utilization of an unknown galactoside, gamma-proteobacteria
Loss of regulator and merger of regulons: it seems that LacI-X was present in the common ancestor (Klebsiella is an outgroup)
Yersinia and Klebsiella: two regulons, GalR (not shown, includes genes galK and galT) and LacI-X
Erwinia: one regulon, GalR
Utilization of maltose/maltodextrin, Firmicutes
Two different ABC transporters (shades of red)
PTS (pink)
Glucoside hydrolases (shades of green)
Two regulators (black and grey)
Modularity of the functional subsystem
Two different ABC systems
Three hydrolases in one operon (E. faecalis) or separately
Changes of regulation
Displacement: invasion of a regulator from a different subfamily (horizontal transfer from a related species?) – blue sites
Orthologous TFs with completely different regulons (alpha-proteobacteria and Xanthomonadales)
Catabolism of gluconate, proteobacteria
extreme variability of regulation of “marginal” regulon members
[Figure labels: γ, Pseudomonas spp., β]
Combined regulatory network for iron homeostasis genes in α-proteobacteria
[Figure: regulatory network diagram]
Transcription factors: RirA, Irr, Fur, IscR
Signals: FeS (FeS status of cell), heme, Fe; labels also include "RirA degraded" and edge marks [+Fe] and [-Fe]
Regulated gene groups: iron uptake systems (siderophore uptake, Fe2+/Fe3+ uptake), iron storage ferritins, FeS synthesis, heme synthesis, iron-requiring enzymes [iron cofactor], transcription factors
The connecting lines denote regulatory interactions, with the thickness reflecting the frequency of the interaction in the analyzed genomes. The suggested negative or positive mode of operation is shown by a dead end or an arrow end of the line, respectively.
Fe and Mn regulons
Distribution of Irr, Fur/Mur, MntR, RirA, and IscR regulons in α-proteobacteria
[Table: columns are Organism, Abb., Irr, Fur/Mur, MntR, RirA, IscR; organisms are assigned to groups A, B, C, and D; cells are marked + or -]
Taxonomic labels: Rhizobiales, Bradyrhizobiaceae, Rhizobiaceae, Rhodobacterales, Hyphomonadaceae, Rhodobacteraceae, Caulobacterales, Parvularculales, Sphingomonadales, Rhodospirillales, Rickettsiales (Rickettsia and Ehrlichia species), SAR11 cluster
Organisms (abbreviation), in table order:
Sinorhizobium meliloti (SM)
Rhizobium leguminosarum (RL)
Rhizobium etli (RHE)
Agrobacterium tumefaciens (AGR)
Mesorhizobium loti (ML)
Mesorhizobium sp. BNC1 (MBNC)
Brucella melitensis (BME)
Bartonella quintana and spp. (BQ)
Bradyrhizobium japonicum (BJ)
Rhodopseudomonas palustris (RPA)
Nitrobacter hamburgensis (Nham)
Nitrobacter winogradskyi (Nwi)
Rhodobacter capsulatus (RC)
Rhodobacter sphaeroides (Rsph)
Silicibacter sp. TM1040 (STM)
Silicibacter pomeroyi (SPO)
Jannaschia sp. CC51 (Jann)
Rhodobacterales bacterium HTCC2654 (RB2654)
Roseobacter sp. MED193 (MED193)
Roseovarius nubinhibens ISM (ISM)
Roseovarius sp. 217 (ROS217)
Loktanella vestfoldensis SKA53 (SKA53)
Sulfitobacter sp. EE-36 (EE36)
Oceanicola batsensis HTCC2597 (OB2597)
Oceanicaulis alexandrii HTCC2633 (OA2633)
Caulobacter crescentus (CC)
Parvularcula bermudensis HTCC2503 (PB2503)
Erythrobacter litoralis (ELI)
Novosphingobium aromaticivorans (Saro)
Sphingopyxis alaskensis RB2256 (Sala)
Zymomonas mobilis (ZM)
Gluconobacter oxydans (GOX)
Rhodospirillum rubrum (Rrub)
Magnetospirillum magneticum (Amb)
Pelagibacter ubique HTCC1002 (PU1002)
“#?” in the RirA column denotes the absence of the rirA gene in an unfinished genomic sequence and the presence of candidate RirA-binding sites upstream of the iron uptake genes.
Phylogenetic tree of the Fur family of transcription factors in α-proteobacteria - I
[Figure: phylogenetic tree]
Branches in α-proteobacteria: Fur, Mur, Irr
Outgroup branches: Fur in γ- and β-proteobacteria, Fur in ε-proteobacteria, Fur in Firmicutes
Legend:
– Mur: regulator of manganese uptake genes (sit, mntH)
– Fur: regulator of iron uptake and metabolism genes
Reference sequences: Escherichia coli sp|P0A9A9 (ECOLI), Pseudomonas aeruginosa sp|Q03456 (PSEAE), Neisseria meningitidis sp|P0A0S7 (NEIMA), Helicobacter pylori sp|O25671 (HELPY), Bacillus subtilis sp|P54574 (BACSU)
Leaf identifiers: MBNC03003593, RB2654 19538, AGR C 620, RL mur, Nwi 0013, RPA0450, BJ fur, ROS217 18337, Jann 1799, SPO2477, STM1w01000993, MED193 22541, OB2597 02997, SKA53 03101, Rsph03000505, ISM 15430, GOX0771, ZM01411, Saro02001148, Sala 1452, ELI1325, OA2633 10204, PB2503 04877, CC0057, Rrub02001143, Amb1009, Amb4460, SM mur, MBNC03003179, BQ fur2, BMEI0375, Nham 0990, EE36 12413, RHE_CH00378, PU1002 04436
Organisms: Mesorhizobium sp. BNC1 (I), Sinorhizobium meliloti, Bartonella quintana, Rhodopseudomonas palustris, Bradyrhizobium japonicum, Caulobacter crescentus, Zymomonas mobilis, Rhodobacter sphaeroides, Silicibacter sp. TM1040, Silicibacter pomeroyi, Agrobacterium tumefaciens, Rhizobium leguminosarum, Brucella melitensis, Mesorhizobium sp. BNC1 (II), Rhodobacterales bacterium HTCC2654, Nitrobacter winogradskyi, Nitrobacter hamburgensis X14, Jannaschia sp. CC51, Roseovarius sp. 217, Roseobacter sp. MED193, Oceanicola batsensis HTCC2597, Loktanella vestfoldensis SKA53, Roseovarius nubinhibens ISM, Gluconobacter oxydans, Erythrobacter litoralis, Novosphingobium aromaticivorans, Sphingopyxis alaskensis RB2256, Oceanicaulis alexandrii HTCC2633, Rhodospirillum rubrum, Parvularcula bermudensis HTCC2503, Magnetospirillum magneticum (I), Sulfitobacter sp. EE-36, Magnetospirillum magneticum (II), Rhizobium etli, Pelagibacter ubique HTCC1002
[Figure: sequence logos of Fur-family binding sites]
– Identified Mur-binding sites: the A, B, and C groups of α-proteobacteria
– Sequence logos for the identified Fur-binding sites in the D group of α-proteobacteria
– Sequence logos for the known Fur-binding sites in Escherichia coli and Bacillus subtilis
Organism labels: Caulobacter crescentus, Zymomonas mobilis, Gluconobacter oxydans, Erythrobacter litoralis, Novosphingobium aromaticivorans, Rhodospirillum rubrum, Magnetospirillum magneticum, Escherichia coli, Sphingopyxis alaskensis, Parvularcula bermudensis, Oceanicaulis alexandrii, Bacillus subtilis
Phylogenetic tree of the Fur family of transcription factors in α-proteobacteria - II
[Figure: phylogenetic tree]
Branches in α-proteobacteria: Irr (regulator of iron homeostasis), Mur / Fur
Outgroup branches: Fur in γ- and β-proteobacteria, Fur in ε-proteobacteria, Fur in Firmicutes
Reference sequences: Escherichia coli sp|P0A9A9 (ECOLI), Pseudomonas aeruginosa sp|Q03456 (PSEAE), Neisseria meningitidis sp|P0A0S7 (NEIMA), Helicobacter pylori sp|O25671 (HELPY), Bacillus subtilis sp|P54574 (BACSU)
Leaf identifiers (Irr branch): AGR C 249, SM irr, RL irr1, RL irr2, MLr5570, MBNC03003186, BQ fur1, BMEI1955, BMEI1563, BJ blr1216, RB2654 182, SKA53 01126, ROS217 15500, ISM 00785, OB2597 14726, Jann 1652, Rsph03001693, EE36 03493, STM1w01001534, MED193 17849, SPOA0445, RC irr, RPA2339, RPA0424*, BJ irr*, Nwi 0035*, Nham 1013*, RHE CH00106, PU1002 04361
Organisms: Nitrobacter hamburgensis X14, Nitrobacter winogradskyi, Bradyrhizobium japonicum (I), Agrobacterium tumefaciens, Rhizobium leguminosarum (I), Mesorhizobium sp. BNC1, Sinorhizobium meliloti, Mesorhizobium loti, Bartonella quintana, Brucella melitensis (I), Bradyrhizobium japonicum (II), Rhodobacter sphaeroides, Rhodobacter capsulatus, Silicibacter pomeroyi, Silicibacter sp. TM1040, Roseobacter sp. MED193, Sulfitobacter sp. EE-36, Jannaschia sp. CC51, Oceanicola batsensis HTCC2597, Roseovarius nubinhibens ISM, Roseovarius sp. 217, Loktanella vestfoldensis SKA53, Rhodobacterales bacterium HTCC2654, Rhizobium etli, Rhizobium leguminosarum (II), Brucella melitensis (II), Rhodopseudomonas palustris (II), Rhodopseudomonas palustris (I), Pelagibacter ubique HTCC1002
Sequence logos for the identified Irr-binding sites in α-proteobacteria
The A group (8 species) - Irr
The B group (4 species) - Irr
The C group (12 species) - Irr
Phylogenetic tree of the Rrf2 family of transcription factors in α-proteobacteria
Legend:
– proteins with the conserved C-X(6-9)-C-X(4-6)-C motif within the effector-responsive domain
– proteins without a cysteine triad motif
Characterized family members:
– Iron repressor RirA (Rhizobium leguminosarum)
– Nitrite/NO-sensing regulator NsrR (Nitrosomonas europaea, Escherichia coli)
– Cysteine metabolism repressor CymR (Bacillus subtilis)
– Iron-sulfur cluster synthesis repressor IscR (Escherichia coli)
– Cytochrome complex regulator Rrf2 (Desulfovibrio vulgaris)
Positional clustering of rrf2-like genes with:
– iron uptake and storage genes
– Fe-S cluster synthesis operons
– genes involved in nitrosative stress protection
– sulfate uptake/assimilation genes
– thioredoxin reductase
– carboxymuconolactone decarboxylase-family genes
– hmc cytochrome operon
[Figure: phylogenetic tree of Rrf2-family proteins]
Branch labels: RirA, NsrR, IscR, IscR-II
Taxon labels: Rhizobiales, Rhodobacterales, Ricket.
Reference proteins: RL RirA, EC IscR, EC NsrR, NE NsrR, RC NsrR, BS CymR, DV Rrf2
Leaf identifiers: ZMO0116, GOX0099, Rrub02000219, ZMO0422, Sala_1236, ELI0458, Saro3534, OA2633_03246, CC1866, Amb3030, Rrub02002540, PB2503_09884, STM_3629, MED193_04321, ISM_16015, OB2597_03589, ROS217_20542, RB2654_04009, SKA53_05183, RC_0477, Rsph023725, SPO2025, EE36_14302, RPA0663, GOX1196, Amb0200, Rrub_1115, Sala_2595, Saro02001620, CC2625, PB2503_03712, Rrub02002859, RC_0031, Rsph023756, AGR_C_1499, RHE_CH01133, RL_1316, AGR_L_2801, SMb20994, SMc02267, RHE_CH03364, RL_3916, MLl4516, MLr1674, Rrub02001767, Amb1054, ROS217_16231, STM_634, MED193_09800, SPO0432, Rsph023178, RB2654_19993, RC 0780, BQ04990, MBNC02002196, MLr1147, BMEII0707, AGR_C_344, SMc00785, RHE CH00735, OA2633_11510, Nwi_0743, Amb1318, GOX0860, ROS217_15206, Rsph03001477, SPOA0186, Sala_1049, Saro02000305, OB2597_05195, ROS217_02155, ROS217_14291, CC0132, SMc01160, BJ blr7974, RL_5159, AGR_L_2343, AGR_C_402, AGR_L_1131, SPO3722, RHE_CH02777, RL_3336, SPO1393, MBNC02000669, MLl1642, SMc02238, AGR_C_872, RL_619, RHE_CH00547, MBNC03004487, Jann_2366
Sequence logos for the identified RirA-binding sites in α-proteobacteria
The A group (8 species) - RirA
The C group (12 species) - RirA
Gene functions: iron uptake, iron storage, FeS synthesis, iron usage, heme biosynthesis, regulatory genes, manganese uptake
Distribution of the conserved members of the Fe- and Mn-responsive regulons and the predicted RirA, Fur/Mur, Irr, and DtxR binding sites in α-proteobacteria
An attempt to reconstruct the history
Regulators and their signals
• Subtle changes at close evolutionary distances
• Cases of motif conservation at surprisingly large distances
• Correlation between contacting nucleotides and amino acid residues
DNA signals and protein-DNA interactions
CRP PurR
IHF TrpR
Entropy at aligned sites and the number of contacts (heavy atoms in a base pair at a distance <cutoff from a protein atom)
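The per-position entropy mentioned here can be computed from a set of aligned sites as follows. This is a minimal Python sketch assuming plain Shannon entropy in bits without pseudocounts or background correction; the function name is illustrative, and the four example sites are taken from the BioR site list earlier in the talk (with lower-case letters normalized), not from the CRP, PurR, IHF, or TrpR data shown on this slide.

```python
import math
from collections import Counter


def column_entropies(sites):
    """Shannon entropy (bits) of each column in a list of aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    entropies = []
    for i in range(length):
        counts = Counter(site[i].upper() for site in sites)
        total = sum(counts.values())
        entropy = sum(
            -(n / total) * math.log2(n / total) for n in counts.values()
        )
        entropies.append(abs(entropy))  # abs() avoids printing -0.0
    return entropies


# Example sites from the BioR slide (assumption: each 10-mer is one site).
bior_sites = ["TTATAGATAA", "TTATCTATAA", "TTATAGATAg", "TTATCTATAA"]
entropies = column_entropies(bior_sites)
print([round(h, 3) for h in entropies])
```

Low-entropy columns are the conserved positions; the slide relates this measure to the number of protein contacts at each base pair.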
Specificity-determining positions in the LacI family
• Training set: 459 sequences, average length: 338 amino acids, 85 specificity groups
10 residues contact NPF (analog of the effector)
6 residues in the intersubunit contacts
7 residues contact the operator sequence
7 residues in the effector contact zone (5 Å < dmin < 10 Å)
5 residues in the intersubunit contact zone (5 Å < dmin < 10 Å)
6 residues in the operator contact zone (5 Å < dmin < 10 Å)
– 44 SDPs
LacI from E. coli
The LacI family: subtle changes in signals at close distances
CRP/FNR family of regulators
Regulators: FNR (Gamma-proteobacteria), HcpR (Desulfovibrio), CooA (Desulfovibrio)
Binding signals:
TGTCGGCnnGCCGACA
TTGTgAnnnnnnTcACAA
TTGTGAnnnnnnTCACAA
TTGATnnnnATCAA
Correlation between contacting nucleotides and amino acid residues
• CooA in Desulfovibrio spp.
• CRP in Gamma-proteobacteria
• HcpR in Desulfovibrio spp.
• FNR in Gamma-proteobacteria
DD COOA ALTTEQLSLHMGATRQTVSTLLNNLVR
DV COOA ELTMEQLAGLVGTTRQTASTLLNDMIR
EC CRP  KITRQEIGQIVGCSRETVGRILKMLED
YP CRP  KXTRQEIGQIVGCSRETVGRILKMLED
VC CRP  KITRQEIGQIVGCSRETVGRILKMLEE
DD HCPR DVSKSLLAGVLGTARETLSRALAKLVE
DV HCPR DVTKGLLAGLLGTARETLSRCLSRMVE
EC FNR  TMTRGDIGNYLGLTVETISRLLGRFQK
YP FNR  TMTRGDIGNYLGLTVETISRLLGRFQK
VC FNR  TMTRGDIGNYLGLTVETISRLLGRFQK
TGTCGGCnnGCCGACA
TTGTgAnnnnnnTcACAA
TTGTGAnnnnnnTCACAA
TTGATnnnnATCAA
Contacting residues: REnnnR
TG: 1st arginine
GA: glutamate and 2nd arginine
The correlation holds for other factors in the family
Open problems
• Model the evolution of regulatory systems (a catalog of elementary events, estimates of probabilities)
– Birth of a binding site; what are the mechanisms?
– Loss of a binding site
– Duplication of a regulated gene and/or a regulator
– Horizontal transfer of a regulated gene and/or a regulator
– Loss of a structural gene and/or a regulator
– General properties?
• Distribution of TF family and regulon sizes
• Stable cores and flexible margins of functional systems (in terms of gene presence and regulation)
• Co-evolution of TFs and DNA sites:
– “Neutral” model for the evolution of binding sites (with invariant functional pressure from the bound protein)
– How do the signals evolve? What is the driving force – changes in TFs?
– TF-family, position-specific protein-DNA recognition code?
All of this needs to take into account the incompleteness and noise in the data
Acknowledgements
• Andrei A. Mironov (algorithms and software)
• Alexandra B. Rakhmaninova (SDPs)
• Dmitry Rodionov (now at Burnham Institute) (BioR, NrdR, iron)
• Olga Laikova (LacI, sugars)
• Dmitry Ravcheev (FruR)
• Olga Kalinina (SDPs/LacI)
• Leonid Mirny, MIT (protein/DNA contacts, SDPs)
• Andy Johnston, University of East Anglia (iron)
• Howard Hughes Medical Institute
• Russian Fund of Basic Research
• Russian Academy of Sciences, program “Molecular and Cellular Biology”
• INTAS