Multi-omics analysis of niche specificity provides new insights into ecological adaptation in bacteria

Bo Zhu1*, Muhammad Ibrahim1*, Zhouqi Cui1, Guanlin Xie1, Gulei Jin2, Michael

Kube3, Bin Li1$, Xueping Zhou1$

1State Key Laboratory of Rice Biology, Institute of Biotechnology, Zhejiang University, Hangzhou 310029, China

2Hangzhou Guhe Info Co., Ltd, Hangzhou 310029, China

3Albrecht Daniel Thaer-Institute of Agricultural and Horticultural Sciences, Humboldt-Universität zu Berlin, 14195 Berlin, Germany

Running title: Ecological adaptation in B. seminalis

*These authors contributed equally to this work.

$Corresponding authors:

Bin Li, Xueping Zhou

Mailing address: State Key Laboratory of Rice Biology, Institute of Biotechnology,

Zhejiang University, 310058, Hangzhou, China.

Phone: 86-571-88982412. Fax: 86-571-88982412.

[email protected]; [email protected]

Conflict of Interest Statement

The authors declare no conflict of interest.

Materials and Methods

Strains used in this study

B. seminalis strains DSM 23518 (= LMG 24067 T), 0901, S9 and R456 originated

from the sputum of a CF patient (Vanlaere et al 2008), diseased apricot (Fang et al 2009),

West Lake water (Fang et al 2011), and rice rhizosphere soil (Li et al 2011),

respectively. Unless otherwise specified, cultures of bacterial strains were maintained

on nutrient agar (NA) or nutrient broth (NB) media at 30°C prior to use. Cultures

were stored long term in 20% aqueous glycerol at -80°C.

Characterization of ecological roles

B. seminalis strains were tested for virulence in the alfalfa model (Bernier et al 2003),

which was carried out as described by Ibrahim et al. (2012). Pathogenicity of B.

seminalis to apricot was examined according to the method of Fang et al. (2009)

except that premature fruits were inoculated with 10 μL of bacterial suspension at a

concentration of 1 × 10^5 CFU/mL using sterilized tips. Inhibition of the mycelial growth

of Rhizoctonia solani by B. seminalis was determined according to the method of Li et al.

(2011). The morphology of bacterial cells was observed using a JEOL JSM-6400

scanning electron microscope (Hitachi, Tokyo, Japan).

Growth in various niches

Adaptation of B. seminalis strains to various niches was investigated by incubating

the four strains in CF, water and soil extract media, respectively; the plant

condition was excluded because only strain 0901 was pathogenic to apricot. Water medium,

which contained M9 minimal salts with 3% glycerol, was used to simulate the water

environment (Schell et al 2011). CF medium was prepared to mimic the sputum of CF

patients according to the method of Dinesh (2010). Soil extract medium was prepared

to mimic soil conditions as described previously (Yoder-Himes et al 2009), with the

exception that soil was collected from the rice rhizosphere, which was the original

niche for strain R456. In addition, three different niche conditions were tested for

each of strains S9, DSM 23518, and R456. Bacterial numbers were estimated from

OD600 measurements (Ibrahim et al 2012).

Whole genome sequencing, assembly and annotation

Bacterial genomic DNA, isolated using Wizard Genomic DNA Purification Kit

(Promega, Madison, WI, USA), was used for whole-genome sequencing, which was

performed using PacBio sequencing (Pacific Biosciences, Menlo Park, CA, USA),

454 sequencing (Roche, Branford, CT, USA) and Illumina sequencing (Illumina, San

Diego, CA, USA). Sequence runs for four single-molecule real-time (SMRT) cells

were performed on the PacBio RS II sequencer with a 120-minute movie time/SMRT

cell. SMRT Analysis portal version 2.1 was used for read filtering and adapter

trimming, with default parameters, and post-filtered data of 350-580 Mb (around

40-60X coverage) for each cell/strain with an average read length of 7 kb were

considered for further assembly. All four genomes were first assembled de novo

using the HGAP assembly protocol, which is available with the SMRT Analysis packages

and accessed through the SMRT Analysis Portal version 2.1. After this first round,

PBJelly V14.1.14 was used to fill and reduce as many captured gaps as possible to

produce upgraded draft genomes (English et al 2012). As B. seminalis genomes are

much larger than those of typical bacteria, around 50 scaffolds were generated after

this step. Quality-filtered Illumina and 454 sequencing reads were then used to

correct false SNPs and indels caused by low coverage in some regions. These

reads were also used for gap closure on the pre-assembled genomes with WGS-assembler

and SSPACE (Boetzer et al 2011, Myers et al 2000). Finally, the consensus

was obtained based on the above procedure. If the sequence was not complete,

scaffolding and gap closure were repeated until nearly complete

bacterial genome sequences were obtained.
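
As a quick consistency check on the sequencing yield quoted above, fold coverage is total sequenced bases divided by genome size, so the stated yield and coverage ranges imply a genome size. The following is a minimal Python sketch using only the figures given in the text; the function name is hypothetical and the calculation is illustrative, not part of the original pipeline.

```python
def implied_genome_size_mb(data_mb, coverage):
    """Genome size (Mb) implied by a data yield (Mb) and a fold coverage."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return data_mb / coverage


# Figures quoted in the text: 350-580 Mb of post-filtered data at about 40-60X.
low = implied_genome_size_mb(350, 40)
high = implied_genome_size_mb(580, 60)
print(f"Implied genome size: {low:.2f} Mb to {high:.2f} Mb")
```

Under these figures the implied genome size is roughly 8.75 to 9.67 Mb, which agrees with the statement that B. seminalis genomes are large for bacteria.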

Coding DNA Sequences (CDSs) were predicted using Prodigal version 2.6 with

default parameters (Hyatt et al 2010). RNA-Seq results were also used to refine

gene prediction. Gene functions were automatically

assigned by the RAST annotation engine (Aziz et al 2008). Predicted genes were

compared via BLASTN against the genomic sequences to verify their accuracy. rRNA

operons and tRNAs were predicted with RNAmmer and tRNAscan-SE (Lagesen et al

2007, Lowe and Eddy 1997), while additional analysis was carried out using

NCBI’s uniprot database (http://www.ncbi.nlm.nih.gov/), COG (Tatusov et al 2001),

KEGG (Ogata et al 1999) and GO terms (Ashburner et al 2000).

Variant calling

Paired-end reads generated from Illumina sequencing were mapped onto the genome

sequence using the Burrows-Wheeler Aligner (BWA) (Li and Durbin 2009). Default

settings were used except that the maximum edit distance was set to 0.02 (-n 0.02).

The MarkDuplicates command in Picard (http://picard.sourceforge.net/) was used to

remove reads that mapped to the same positions in the strain DSM 23518 genome

(PCR duplicates). After running IndelRealigner and BaseRecalibrator, SNPs and indels

were called using GATK (Gac et al 2013, Tenaillon et al 2012). Default settings were

used except that the maximum read depth in GATK was set to 500 (-dcov 500). The

resulting SNPs and indels were then filtered using custom Perl scripts to minimize

false-positive mutation calls. First, mutations with a total read depth below 20X

were discarded. Second, SNPs and indels with a Phred quality score below 30 were

removed. Third, mutation calls were kept only when at least 80% of the reads supported

the variant. The lists of SNPs/indels were then annotated with in-house Perl scripts. For

mutations in coding regions, PROVEAN was used to predict

whether a protein sequence variation is deleterious or neutral (Chieng et al 2012).
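
The three filtering rules above can be illustrated with a short sketch. The original filtering was done with custom Perl scripts that are not reproduced here; the Python below is a stand-in written for illustration, and the record layout (`VariantCall`), field names and example calls are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    position: int    # 1-based reference coordinate (hypothetical field)
    depth: int       # total reads covering the site
    alt_reads: int   # reads supporting the variant allele
    qual: float      # Phred-scaled quality of the call


def passes_filters(call, min_depth=20, min_qual=30.0, min_alt_fraction=0.8):
    """Apply the three thresholds stated in the text, in the same order."""
    if call.depth < min_depth:       # rule 1: total read depth below 20X
        return False
    if call.qual < min_qual:         # rule 2: Phred quality below 30
        return False
    # rule 3: at least 80% of the reads must support the variant
    return call.alt_reads / call.depth >= min_alt_fraction


# Made-up calls, one per outcome.
calls = [
    VariantCall(1200, 45, 44, 812.0),   # passes all three rules
    VariantCall(5310, 12, 12, 300.0),   # fails rule 1 (depth 12)
    VariantCall(9877, 60, 58, 22.5),    # fails rule 2 (quality 22.5)
    VariantCall(15002, 50, 30, 640.0),  # fails rule 3 (60% support)
]
kept = [c for c in calls if passes_filters(c)]
print([c.position for c in kept])
```

Applying the depth filter first avoids evaluating the allele fraction at poorly covered sites.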

Phylogenetic and comparative genome analysis

The genome sequences of the four sequenced strains were aligned and

visualized using Murasaki software (Popendorf et al 2010). For genome-based

phylogeny, in addition to the four B. seminalis genomes sequenced in this study,

28 complete Burkholderia genome sequences were obtained from the Burkholderia

Genome Database (Winsor et al 2008). Furthermore, a well-resolved phylogenetic

tree was also generated based on multi-locus sequence analysis (MLSA) of the

atpD, gltB, gyrB, lepA, phaC, recA and trpB genes, which has been widely applied in

identification and discrimination of Burkholderia species (Spilker et al 2009). The

identity of strains was confirmed by calculating whole-genome average nucleotide

identity (ANI) based on the BLAST and MUMmer algorithms using JSpecies (Richter and

Rosselló-Móra 2009). Multiple sequence alignment was done using MUSCLE 3.8

(Edgar 2004) and the maximum-likelihood (ML) tree was generated with MEGA 6 (Tamura et al 2013). In

addition, genomic islands (GIs) were detected with IslandViewer, which integrates the

widely used GI detection algorithms IslandPick, SIGI-HMM and IslandPath-DIMOB

(Langille and Brinkman 2009).
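
MLSA trees are commonly built from a single concatenated alignment of the individual loci. The text does not say how the seven loci were combined, so the sketch below only illustrates one usual approach under that assumption; the function name, the input layout and the toy sequences are hypothetical.

```python
# The seven MLSA loci named in the text.
MLSA_LOCI = ["atpD", "gltB", "gyrB", "lepA", "phaC", "recA", "trpB"]


def concatenate_loci(alignments, loci=MLSA_LOCI):
    """Join per-locus alignments into one sequence per strain.

    `alignments` maps locus name -> {strain name: aligned sequence}.
    Every locus must cover the same strains, and all sequences within a
    locus must have the same aligned length.
    """
    strains = set(alignments[loci[0]])
    for locus in loci:
        if set(alignments[locus]) != strains:
            raise ValueError(f"strain set differs for locus {locus}")
        lengths = {len(seq) for seq in alignments[locus].values()}
        if len(lengths) != 1:
            raise ValueError(f"unequal aligned lengths for locus {locus}")
    return {s: "".join(alignments[l][s] for l in loci) for s in sorted(strains)}


# Toy two-locus example with invented sequences.
toy = {
    "atpD": {"0901": "AC-G", "R456": "ACTG"},
    "gltB": {"0901": "TT", "R456": "TA"},
}
supermatrix = concatenate_loci(toy, loci=["atpD", "gltB"])
print(supermatrix)
```

The resulting concatenated alignment would then be passed to tree-building software such as MEGA.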

DNA methylation analysis pipeline

SMRT sequencing data were analyzed with the RS_Modification and Motif Analysis

pipeline in SMRT Analysis 2.2, which was provided by the Pacific Biosciences SMRT

portal, with default parameters. With these default parameters, coverage and the IPD

(inter-pulse duration) ratio were calculated; the IPD ratio divides the IPD at a methylated

base in the DNA template by the IPD of an incorporation opposite a canonical base (Lluch-Senar et al 2013).

All data sets contain kinetic values for each reference position and DNA strand,

with the corresponding sequences generated from the assembly procedure. For statistical

analysis, methylation site positions were divided into three parts (the 200 bp upstream of the

coding region, the coding region, and the 200 bp downstream of the coding region). For every gene,

the most highly methylated strain was then selected for further analysis.
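
The three-way division of methylation sites relative to a gene can be sketched as follows. This is an illustration only: it assumes 1-based inclusive gene coordinates and that upstream and downstream are defined relative to the coding strand, neither of which is stated in the text, and the function name is hypothetical.

```python
def classify_site(pos, gene_start, gene_end, strand="+", flank=200):
    """Assign a methylation site to 'upstream', 'coding' or 'downstream'.

    Coordinates are 1-based and inclusive, with gene_start <= gene_end.
    Returns None when the site lies outside the gene and both flanks.
    """
    if gene_start <= pos <= gene_end:
        return "coding"
    if gene_start - flank <= pos < gene_start:
        on_left = True
    elif gene_end < pos <= gene_end + flank:
        on_left = False
    else:
        return None
    # For a minus-strand gene the left flank is downstream of the gene.
    if strand == "+":
        return "upstream" if on_left else "downstream"
    return "downstream" if on_left else "upstream"


# A hypothetical gene at positions 1000-2000 on the plus strand.
for site in (850, 1500, 2100, 2500):
    print(site, classify_site(site, 1000, 2000))
```

Counting the labels per gene and per strain would then give the input for choosing the most highly methylated strain.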

Growth conditions for RNA-Seq analysis

To simulate the original niche environments of the four B. seminalis strains, 2 mL

of overnight bacterial culture was inoculated into 50 mL of each of the following four types

of media. Water medium, which contained M9 minimal salts (0.6% Na2HPO4 + 0.3%

KH2PO4 + 0.05% NaCl + 0.1% NH4Cl + 0.02% MgSO4 + 0.015% CaCl2) with 3%

glycerol, was used to simulate the water environment (Schell et al 2011). CF

medium was prepared according to the method of Dinesh (2010). Briefly, 5.0 g/L

mucin from pig stomach mucosa (Sangon Biotech), 4.0 g/L low molecular-weight

salmon sperm DNA (Fluka), 5.9 mg diethylenetriaminepentaacetic acid (DTPA)

(Sigma), 5.0 g/L NaCl (Sigma), 2.2 g/L KCl (Sigma) and 1.8 g/L Tris base (Sigma) were

mixed together and autoclaved; 5.0 mL/L egg yolk emulsion (Oxoid) and 5.0 g/L

casamino acids (Sangon Biotech) were added once the temperature had fallen to 37°C after

autoclaving. Soil extract medium was prepared to mimic soil conditions as described

previously (Yoder-Himes et al 2009), with the exception that soil was collected from

the rice rhizosphere, which was the original niche for strain R456. The plant condition used to

obtain in vivo bacteria was prepared according to the method of our recent paper (Li

et al 2014).

Total RNA harvesting

Each bacterial strain was incubated under its respective condition to stationary phase. After

centrifugation at 4500 × g at 4°C, pellets were resuspended in 3 mL of PBS. One

milliliter of bacterial culture was subjected to RNA purification with the RNeasy Mini Kit

(Qiagen) and eluted in 50 μL of RNase-free water. Samples were treated with DNase I

to remove any residual DNA and purified by phenol-chloroform-isoamyl alcohol

extraction and ethanol precipitation.

mRNA purification and cDNA synthesis

Ten micrograms of each total RNA sample were treated with the MICROBExpress

Bacterial mRNA Enrichment kit (Ambion) and RiboMinus™ Transcriptome Isolation

Kit (Bacteria) (Invitrogen) following the manufacturer’s instructions. Samples were

resuspended in 15 μL of RNase-free water. Bacterial mRNAs were chemically

fragmented to the size range of 200-250 bp using 1 × fragmentation solution

(Ambion) for 2.5 min at 94°C. cDNA was generated according to instructions given in

SuperScript Double-Stranded cDNA Synthesis Kit (Invitrogen). Briefly, each mRNA

sample was mixed with 100 pmol of random hexamers, incubated at 65°C for 5 min,

chilled on ice, mixed with 4 μL of First-Strand Reaction Buffer (Invitrogen), 2 μL of

0.1 M DTT, 1 μL of 10 mM RNase-free dNTP mix, and 1 μL of SuperScript III reverse

transcriptase (Invitrogen), and incubated at 50°C for 1 h. To generate the second

strand, the following Invitrogen reagents were added: 51.5 μL of RNase-free water, 20

μL of second-strand reaction buffer, 2.5 μL of 10 mM RNase-free dNTP mix, 50 U of E.

coli DNA polymerase, and 5 U of E. coli RNase H; the mixture was then incubated at 16°C for 2.5 h.

RNA Sequencing

The Illumina Paired End Sample Prep kit was used for RNA-Seq library creation

according to the manufacturer’s instructions as follows: Fragmented cDNA was end-

repaired, ligated to Illumina adaptors, and amplified by 18 cycles of PCR. Paired-end

100-bp reads were generated by high-throughput sequencing with the Illumina

HiSeq 2000 instrument.

RNA-Seq data analysis

After removing low-quality reads and adaptors, RNA-Seq reads were aligned to

the corresponding B. seminalis genome using TopHat 2.0.7 (Trapnell et al 2009),

allowing for a maximum of two mismatches. If reads mapped to more than one

location, only the one showing the highest score was kept. Reads mapping to rRNA

and tRNA regions were removed from further analysis. After read counts were obtained

for every sample, edgeR with the TMM normalization method was used to identify

differentially expressed genes (DEGs). Significant DEGs (FDR < 0.05 and at least

a two-fold change) were selected for further analysis. Cluster 3.0 and Treeview 1.1.6

were used to generate the heatmap cluster based on the RPKM values (de Hoon et al

2004, Saldanha 2004).
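
The RPKM values and the DEG thresholds mentioned above can be illustrated with a small sketch. It does not reimplement edgeR or TMM normalization; it only applies the stated cut-offs (FDR < 0.05, at least two-fold change) to hypothetical edgeR-style output and uses the standard RPKM definition. All gene names and numbers are invented.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def is_deg(log2_fold_change, fdr, max_fdr=0.05, min_fold=2.0):
    """True when FDR < max_fdr and the absolute fold change is >= min_fold."""
    return fdr < max_fdr and abs(log2_fold_change) >= math.log2(min_fold)


# Hypothetical results: gene -> (log2 fold change, FDR).
results = {
    "geneA": (2.3, 0.001),    # up-regulated, significant
    "geneB": (-1.0, 0.04),    # exactly two-fold down, significant
    "geneC": (0.4, 0.0001),   # significant but below two-fold
    "geneD": (4.0, 0.2),      # large change but FDR too high
}
degs = sorted(g for g, (lfc, fdr) in results.items() if is_deg(lfc, fdr))
print(degs)
print(rpkm(500, 1000, 10_000_000))
```

Working on the log2 scale treats up- and down-regulation symmetrically, so a two-fold decrease passes the same threshold as a two-fold increase.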

COG enrichment analysis

All DEGs between different strains or conditions were classified by COG

category (Tatusov et al 2001). Based on the whole-genome COG classification, the

significance of the enrichment of DEGs in each COG category was

tested with the hypergeometric distribution:

$$p = \sum_{i=x}^{n} \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n}}$$

where N is the number of genes in the genome, M is the number of genes

assigned to one COG category in the whole genome, n is the number of DEGs,

and i is the number of DEGs falling into that COG category (summed from the observed count x up to n). The results

are shown in Table S4.
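
The hypergeometric enrichment test described above can be sketched in a few lines of Python using exact integer binomials. N, M and n follow the definitions in the text; x, the observed number of DEGs in the COG category, and the function name are introduced here as assumptions, and the p-value is taken as the upper tail P(X >= x).

```python
from math import comb


def hypergeom_enrichment_p(N, M, n, x):
    """Upper-tail probability P(X >= x) for X ~ Hypergeometric(N, M, n).

    N: genes in the genome
    M: genes assigned to the COG category in the whole genome
    n: number of DEGs
    x: observed DEGs that fall into the COG category
    """
    upper = min(n, M)
    total = comb(N, n)
    tail = sum(comb(M, i) * comb(N - M, n - i) for i in range(x, upper + 1))
    return tail / total


# Toy numbers: 10 genes, 4 in the category, 3 DEGs, 2 of them in the category.
p = hypergeom_enrichment_p(N=10, M=4, n=3, x=2)
print(p)
```

Using `math.comb` keeps the arithmetic exact until the final division, which avoids rounding problems with genome-sized counts.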

Validation of mix sample method

Each sample was derived from a pool of five biological replicates, a pooling approach that has been

developed to increase efficiency and cost-effectiveness with equivalent statistical

power (Greenwald et al 2012, Peng et al 2003). To validate the accuracy of the mix-sample

method, a single biological RNA sample from soil extract medium (SE) of strain DSM 23518 was

prepared. The correlation coefficient between samples was determined by statistical

analysis.

Quantitative real-time PCR

Total RNA was extracted from exponentially growing cells using an RNeasy Mini

spin column kit (Qiagen) and treated with one unit of RNase-free DNase I

(Qiagen), and cDNA synthesis was performed with a Moloney murine leukemia virus

reverse transcriptase first-strand cDNA synthesis kit (Qiagen). The cDNA was then

used directly as the template for qRT-PCR using a SYBR Green master mix (Protech

Technology Enterprise Co., Ltd.) on an ABI Prism 7000 sequence detection system

(Applied Biosystems). Primers for quantitative real-time PCR (qRT-PCR) of the

selected genes were designed using Primer3 based on the genome sequences

(Untergasser et al 2012). All primers are listed in Table S3, and an annealing

temperature of 58°C was used for all primers. Short-chain dehydrogenase

(BCAL2694), which has been shown to be stably expressed in the Burkholderia cepacia complex (Bcc), was used as

the internal control (Van Acker et al 2013). Fold changes were calculated according to the

delta-delta CT method and the values are also shown in Table S3. The correlation

between RNA-Seq results and qRT-PCR results was tested by Pearson's correlation

method.
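
The delta-delta CT calculation and the Pearson correlation mentioned above can be sketched as follows. This is a minimal illustration that assumes 100% amplification efficiency (a fold change of 2 per cycle), which the text does not state; the function names and CT values are hypothetical.

```python
from math import sqrt


def fold_change_ddct(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression by the delta-delta CT method: 2 ** -(ddCT)."""
    d_test = ct_target_test - ct_ref_test    # normalize to the internal control
    d_ctrl = ct_target_ctrl - ct_ref_ctrl
    return 2.0 ** -(d_test - d_ctrl)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented CT values: target gene vs internal control, test vs control condition.
fc = fold_change_ddct(22.0, 18.0, 24.0, 18.0)
print(fc)

# Invented log2 fold changes from RNA-Seq and qRT-PCR for three genes.
r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(r)
```

In practice the log2 fold changes from both platforms would be compared gene by gene, with the internal control CT taken from the same cDNA sample as the target.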

Supplementary references

Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM et al (2000). Gene Ontology: tool for the unification of biology. Nat Genet 25: 25-29.

Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA et al (2008). The RAST Server: rapid annotations using subsystems technology. BMC Genomics 9: 75.

Bernier SP, Silo-Suh L, Woods DE, Ohman DE, Sokol PA (2003). Comparative analysis of plant and animal models for characterization of Burkholderia cepacia virulence. Infect Immun 71: 5306-5313.

Boetzer M, Henkel CV, Jansen HJ, Butler D, Pirovano W (2011). Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27: 578-579.

Chieng S, Carreto L, Nathan S (2012). Burkholderia pseudomallei transcriptional adaptation in macrophages. BMC Genomics 13: 328.

de Hoon MJL, Imoto S, Nolan J, Miyano S (2004). Open source clustering software. Bioinformatics 20: 1453-1454.

Dinesh SD (2010). Artificial Sputum Medium. Protocol Exchange doi:10.1038/protex.2010.212.

Edgar RC (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32: 1792-1797.

English AC, Richards S, Han Y, Wang M, Vee V, Qu J et al (2012). Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology. PLoS One 7: e47768.

Fang Y, Li B, Wang F, Liu B, Wu Z, Su T et al (2009). Bacterial fruit rot of apricot caused by Burkholderia cepacia in China. Plant Pathol J 25: 429-432.

Fang Y, Xie G, Lou M, Li B, Muhammad I (2011). Diversity analysis of Burkholderia cepacia complex in the water bodies of West Lake, Hangzhou, China. J Microbiol 49: 309-314.

Gac M, Cooper TF, Cruveiller S, Médigue C, Schneider D (2013). Evolutionary history and genetic parallelism affect correlated responses to evolution. Mol Ecol 22: 3292-3303.

Greenwald JW, Greenwald CJ, Philmus BJ, Begley TP, Gross DC (2012). RNA-seq analysis reveals that an ECF σ Factor, AcsS, regulates achromobactin biosynthesis in Pseudomonas syringae pv. syringae B728a. PLoS One 7: e34804.

Hyatt D, Chen GL, LoCascio PF, Land ML, Larimer FW, Hauser LJ (2010). Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11.

Ibrahim M, Tang Q, Shi Y, Almoneafy A, Fang Y, Xu L et al (2012). Diversity of potential pathogenicity and biofilm formation among Burkholderia cepacia complex water, clinical, and agricultural isolates in China. World J Microb Biot 28: 2113-2123.

Lagesen K, Hallin P, Rødland EA, Stærfeldt HH, Rognes T, Ussery DW (2007). RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res 35: 3100-3108.

Langille MGI, Brinkman FSL (2009). IslandViewer: an integrated interface for computational identification and visualization of genomic islands. Bioinformatics 25: 664-665.

Li B, Liu BP, Yu RR, Lou MM, Wang YL, Xie GL et al (2011). Phenotypic and molecular characterization of rhizobacterium Burkholderia sp. strain R456 antagonistic to Rhizoctonia solani, sheath blight of rice. World J Microb Biot 27: 2305-2313.

Li B, Ibrahim M, Ge M, Cui Z, Sun G, Xu F et al (2014). Transcriptome analysis of Acidovorax avenae subsp. avenae cultivated in vivo and co-culture with Burkholderia seminalis. Sci Rep 4.

Li H, Durbin R (2009). Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25: 1754-1760.

Lluch-Senar M, Luong K, Lloréns-Rico V, Delgado J, Fang G, Spittle K et al (2013). Comprehensive methylome characterization of Mycoplasma genitalium and Mycoplasma pneumoniae at single-base resolution. PLoS Genet 9: e1003191.

Lowe TM, Eddy SR (1997). tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25: 955-964.

Myers EW, Sutton GG, Delcher AL, Dew IM, Fasulo DP, Flanigan MJ et al (2000). A whole-genome assembly of Drosophila. Science 287: 2196-2204.

Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M (1999). KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 27: 29-34.

Peng X, Wood CL, Blalock EM, Chen KC, Landfield PW, Stromberg AJ (2003). Statistical implications of pooling RNA samples for microarray experiments. BMC Bioinformatics 4: 26.

Popendorf K, Tsuyoshi H, Osana Y, Sakakibara Y (2010). Murasaki: A Fast, Parallelizable Algorithm to Find Anchors from Multiple Genomes. PLoS One 5: e12651.

Richter M, Rosselló-Móra R (2009). Shifting the genomic gold standard for the prokaryotic species definition. Proc Natl Acad Sci U S A 106: 19126-19131.

Saldanha AJ (2004). Java Treeview—extensible visualization of microarray data. Bioinformatics 20: 3246-3248.

Schell MA, Zhao P, Wells L (2011). Outer Membrane Proteome of Burkholderia pseudomallei and Burkholderia mallei From Diverse Growth Conditions. J Proteome Res 10: 2417-2424.

Spilker T, Baldwin A, Bumford A, Dowson CG, Mahenthiralingam E, LiPuma JJ (2009). Expanded multilocus sequence typing for Burkholderia species. J Clin Microbiol 47: 2607-2610.

Tamura K, Stecher G, Peterson D, Filipski A, Kumar S (2013). MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol 30: 2725-2729.

Tatusov RL, Natale DA, Garkavtsev IV, Tatusova TA, Shankavaram UT, Rao BS et al (2001). The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res 29: 22-28.

Tenaillon O, Rodríguez-Verdugo A, Gaut RL, McDonald P, Bennett AF, Long AD et al (2012). The molecular diversity of adaptive convergence. Science 335: 457-461.

Trapnell C, Pachter L, Salzberg SL (2009). TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25: 1105-1111.

Untergasser A, Cutcutache I, Koressaar T, Ye J, Faircloth BC, Remm M et al (2012). Primer3—new capabilities and interfaces. Nucleic Acids Res 40: e115.

Van Acker H, Sass A, Bazzini S, De Roy K, Udine C, Messiaen T et al (2013). Biofilm-grown Burkholderia cepacia complex cells survive antibiotic treatment by avoiding production of reactive oxygen species. PLoS One 8: e58943.

Vanlaere E, LiPuma JJ, Baldwin A, Henry D, De Brandt E, Mahenthiralingam E et al (2008). Burkholderia latens sp. nov., Burkholderia diffusa sp. nov., Burkholderia arboris sp. nov., Burkholderia seminalis sp. nov. and Burkholderia metallica sp. nov., novel species within the Burkholderia cepacia complex. Int J Syst Evol Microbiol 58: 1580-1590.

Winsor GL, Khaira B, Van Rossum T, Lo R, Whiteside MD, Brinkman FSL (2008). The Burkholderia Genome Database: facilitating flexible queries and comparative analyses. Bioinformatics 24: 2803-2804.

Yoder-Himes D, Chain P, Zhu Y, Wurtzel O, Rubin E, Tiedje JM et al (2009). Mapping the Burkholderia cenocepacia niche response via high-throughput sequencing. Proc Natl Acad Sci U S A 106: 3976-3981.

Supplementary Figure and Table Legends

Figure S1: Distribution of differentially expressed genes along the chromosome.

Grey thick circles sorted by strain 0901 from inner to outer represent strains 0901,

DSM 23518, R456 and S9 chromosomes, respectively. The red, green, blue and black

peaks outside the chromosome represent the log2 RPKM values of genes under CF,

apricot, soil and water conditions. Outside the black peak (water RPKM value) is a

heatmap of gene density per 10 kb along the chromosome, colored from blue to red.

Figure S2: Full genome alignment among the four strains 0901, DSM 23518, R456

and S9 of Burkholderia seminalis.

Figure S3: Expression pattern cluster based on the normalized RPKM values.

RNA-Seq samples were clustered based on the log2 RPKM values.

Figure S4: Histogram of the number of DNA methylation sites in Burkholderia

seminalis strains 0901, S9, R456 and DSM 23518.

Figure S5: Phylogenetic relationship of four Burkholderia seminalis strains to other

species of Burkholderia. (a) A maximum-likelihood tree was constructed using

MLSA of the four B. seminalis strains sequenced in this study and 28 other

Burkholderia strains. Among these strains, B. seminalis DSM 23518 (= LMG 24067),

B. lata 383, B. thailandensis E264, B. mallei ATCC 23344, B. phymatum STM815, B.

phytofirmans PsJN and B. xenovorans LB400 are type strains. (b) A maximum-likelihood

tree was constructed based on whole-genome sequences. The type

strains are the same as in (a).

Figure S6: Correlation coefficient between SE-single sample and SE-mix sample of

strain DSM 23518 based on the log2 RPKM values.

Figure S7: Correlation coefficient between SE-mix sample and W-mix sample of

strain DSM 23518 based on the log2 RPKM values.

Table S1: Physiological characteristics of Burkholderia seminalis strains 0901, S9,

R456 and DSM 23518.

Table S2: Comparison of general genomic features between Burkholderia seminalis

strains 0901, DSM 23518, R456 and S9.

Table S3: Summary of RNA-Seq results (Illumina HiSeq 2000).

Table S4: Integrated information of Burkholderia seminalis strains 0901, S9, R456

and DSM 23518.

Table S5: Average Nucleotide Identity (ANI) among the Burkholderia seminalis

genomes and the selected Burkholderia cenocepacia genomes.

Table S6: COG enrichment results from DEGs. a), strain 0901; b), strain DSM

23518; c), strain S9; d), strain R456.

Table S7: Gene clusters involved in niche adaptation.

Table S8: (a): Primers of qRT-PCR used in this study. (b): Internal primer used in

qRT-PCR and its RPKM values in different strains and conditions.
