SUPPLEMENTARY INFORMATION
DOI: 10.1038/NCLIMATE3087
NATURE CLIMATE CHANGE | www.nature.com/natureclimatechange
Signatures of transgenerational molecular response to ocean acidification in a reef fish

Celia Schunter, Megan J. Welch, Taewoo Ryu, Huoming Zhang, Michael L. Berumen, Göran E. Nilsson, Philip L. Munday* and Timothy Ravasi*

Supplementary information

Study organism & selection of parental pairs by behavioral phenotype
Acanthochromis polyacanthus was chosen for this study because of its advantageous life history traits for laboratory rearing and because of previous information on the effects of CO2 on the behavior of this species [1,2]. Adult A. polyacanthus were caught on multiple reefs around the Orpheus Island region of the northern Great Barrier Reef (GBR), Australia (18°38'24.3"S, 146°29'31.8"E) using a baited barrier net and small hand nets. The fish were then brought to and maintained at James Cook University’s Experimental Marine Aquarium Facility. The individuals used in the study came from a single population to ensure that the analysis of CO2-related molecular mechanisms was not confounded by population differences (genetic or environmental) among the wild adults. Nevertheless, similar behavioural impairments have been recorded in populations of A. polyacanthus from other locations on the GBR (e.g. Lizard Island; Welch and Munday unpublished data) and in other reef fish species [3], indicating that the behavioural effects of high CO2 are not restricted to this population of A. polyacanthus adults.

Molecular signatures of transgenerational response to ocean acidification in a species of
reef fish
© 2016 Macmillan Publishers Limited. All rights reserved.
Individual adult A. polyacanthus were held under high CO2 (754 μatm) for 7 days, after which they were tested for changes in their olfactory responses to chemical alarm cues (CAC). A two-channel flume (30 cm x 13 cm), scaled from previous studies [2,4] to accommodate adult fish, was used to test for olfactory preference between blank seawater and CAC. CAC water and untreated water were fed into the flume at a constant rate of 450 ml min⁻¹, monitored by a flow meter. To extract CAC, conspecific adult fish were euthanized with a quick blow to the head; superficial cuts were made along each side of the donor fish and each side was rinsed with 30 ml of control water, a concentration based on previous CAC response ratios [3]. The extracted CAC was mixed with 10 L of high CO2 water in the tank used to supply CAC to the flume to ensure a consistent concentration of fresh CAC for the duration of each trial. A ratio of one donor fish to one test fish was used. Behavioral sensitivity to the high CO2 treatment was measured as the proportion of time an individual spent in the CAC: ≤ 30% of time spent in the cue was considered “tolerant”, and ≥ 70% was considered “sensitive”. Adults were further categorized by size, and breeding pairs were formed from a tolerant male and a tolerant female, or a sensitive male and a sensitive female, of approximately equal size.
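
The phenotype thresholds described above can be written as a small rule. The following is only an illustrative sketch in Python, not code used in the study: the function name and the "intermediate" label for fish falling between the two thresholds are assumptions introduced here, since the text only defines the tolerant and sensitive categories.

```python
def classify_co2_phenotype(percent_time_in_cue: float) -> str:
    """Classify an adult by the percentage of trial time spent in the CAC channel.

    Thresholds follow the text: <= 30% is "tolerant", >= 70% is "sensitive".
    The "intermediate" label is an assumption; the text does not name this group.
    """
    if not 0.0 <= percent_time_in_cue <= 100.0:
        raise ValueError("percent_time_in_cue must be between 0 and 100")
    if percent_time_in_cue <= 30.0:
        return "tolerant"
    if percent_time_in_cue >= 70.0:
        return "sensitive"
    return "intermediate"
```

Under this sketch, only fish labelled "tolerant" or "sensitive" would be candidates for the breeding pairs.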

Climate models project that CO2 levels in the atmosphere and surface ocean will exceed 700 µatm by the end of this century [5–7]. Elevated CO2 levels cause a range of sensory and behavioural impairments in coral reef fish, including altered antipredator responses [8]. Importantly, individuals vary in their sensitivity to high CO2, with the greatest variation among individuals occurring around 700 µatm [3,9,10]. Therefore, a CO2 level of approximately 700 µatm was chosen because it will be experienced by fish during the second half of this century and because it is the CO2 level at which the maximum expression of phenotypic variation in behaviour should favour adaptation.

Experimental design & sample description
Sensitive and tolerant breeding pairs were divided equally between control (414 μatm) and high CO2 (754 μatm, Table S1) treatments, where they were held for three months prior to the start of the breeding season. A. polyacanthus lay demersal clutches of eggs that are cared for by both parents until hatching at approximately 10 days post-fertilization (personal observation). Immediately after hatching, clutches of offspring were removed from their parents and placed into tanks with the same environmental conditions as their parents: either control water or high CO2 water (Table S1). This provided four offspring groups: (1) offspring from tolerant parents reared in control water, (2) offspring from tolerant parents reared in high CO2 water, (3) offspring from sensitive parents reared in control water, and (4) offspring from sensitive parents reared in high CO2 water. Multiple family lines (parental pairs) were used to ensure that the effects seen were not due to a single breeding pair. To further remove bias due to specific breeding pairs, one tolerant parental pair and one sensitive parental pair were first kept and bred at control levels, and their offspring stayed at control levels. Afterwards these two breeding pairs were transferred into high CO2 to breed, and these offspring were subsequently kept at high CO2. Juveniles were reared in their respective environmental conditions until they were 5 months old. At this age, the brain is large enough to yield sufficient RNA and protein for high-throughput sequencing and mass spectrometry analysis. Body weight was measured directly after euthanasia and photos were taken for length measurements. Whole brains were dissected out, snap frozen in liquid nitrogen and stored at -80°C. Dissection was randomized among treatments to eliminate any possibility of a sampling time effect.

Supplementary Table 1. Mean (± s.d.) seawater parameters in the experimental system for adults and juveniles during the experimental seasons. Temperature, pH, salinity, and total alkalinity (TA) were measured directly. pCO2 was estimated from these parameters using CO2SYS. Seawater parameters were consistent for breeding and experimental components of the study.

Treatment  pH (NBS)      Temperature (°C)  Salinity     TA (μmol kg⁻¹ SW)  pCO2 (μatm)
Control    8.15 (±0.04)  28.5 (±0.2)       35.0 (±1.2)  2146 (±125)        414 (±46)
CO2        7.94 (±0.04)  28.5 (±0.3)       35.1 (±1.2)  2223 (±146)        754 (±92)

Preparation and sequencing of samples
When whole brains were removed from the freezer, 350 µl of Buffer RLT Plus from a Qiagen AllPrep DNA/RNA Mini Kit was added to the brain tissue. Approximately 30 RNase- and DNase-free single-use silica beads (Daintree Scientific, Australia) were placed into Eppendorf tubes and samples were homogenized for 30 seconds in a pre-frozen metal tray with a Thermo Fisher Scientific bead beater. Samples were processed following the Qiagen AllPrep DNA/RNA Kit protocol, except that at the RNA purification stage the flow-through was kept on ice for protein extraction. DNA and total RNA were purified and kept at -80°C. Protease inhibitor was added to the flow-through (3.5 µl of Halt protease & phosphatase inhibitor cocktail 100X, Thermo Fisher Scientific) and the sample was split into two Eppendorf tubes. 1000 µl of cold acetone was added to each tube and the tubes were vortexed for 10 seconds. The samples were left to precipitate for 30 minutes on ice and then spun at full speed in a 4°C centrifuge for 10 minutes. Acetone was pipetted out without touching the pellet, and the pellet was left to dry for 15 minutes in a fume hood and finally stored dry at -80°C.

De novo genome assembly and annotation
In brief, a wild A. polyacanthus fish was previously collected from the same locality on the GBR in Australia and reared in the aquaria as described in Veilleux et al. 2015 [11]. Liver genomic DNA of an F1 fish that was ‘developmentally’ reared at +3°C was extracted using standard phenol-chloroform extraction. Seven mate-pair libraries ranging from 3 to 8 kb and five paired-end libraries were produced and sequenced on the Illumina HiSeq 2000 platform. De novo assembly was performed by combining contig assembly with ABySS v1.5.2 (k=65) [12] and scaffolding with SSPACE v3.0 [13]. The assembled genome size was 992 Mb with 30,414 scaffolds (> 500 bp) and an N50 of 334,400 bp. Gene annotation was accomplished with Maker2 [14], using the transcriptome (de novo assembly in Veilleux et al. 2015 [11]) and a reference-based assembly by Cufflinks v2.2.1 [15] as mRNA evidence, and the combination of UniProtKB/Swiss-Prot [16] and CEGMA core proteins [17] as protein evidence, as well as the ab initio predictors SNAP [18] and AUGUSTUS [19]. This resulted in 25,301 gene models with an average length of 2,466 bp. Sequence matching and annotation of the gene models were performed with BLASTP v2.2.30 against the nr database (version 01/2015; e-value cutoff: 10⁻⁴), BLASTN against the eukaryotic nt database (version 01/2015) and BLASTX against the UniProt database (version 05/2015; e-value cutoff: 10-e10). Functional annotation of the transcripts was obtained with Blast2GO [20] version 3.1.2. Gene Ontology (GO) terms, InterPro IDs and KEGG pathways were added to each transcript where possible.
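
For readers unfamiliar with the N50 statistic quoted for the assembly, the sketch below shows one common way to compute it from a list of scaffold lengths. It is an illustration only: the function name is introduced here, the example lengths are made up, and the assembly tools themselves report this value directly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Made-up toy lengths, not data from the A. polyacanthus assembly.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
example_n50 = n50(example_lengths)
```

In the toy example the two longest scaffolds (8 + 8) already cover half of the 32 total bases, so the N50 is 8.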

Transcriptome mapping
Total RNA integrity was measured on an Agilent Bioanalyzer and all samples had a RIN value of at least seven. Illumina sequencing libraries were produced for each sample with a TruSeq RNA Library Preparation Kit and run on the HiSeq 2000 platform by Macrogen (Macrogen, Korea). Nine samples were individually barcoded and multiplexed in one Illumina lane to obtain approximately 50 million paired-end reads per sample. In total, 36 samples were sequenced on four lanes. Information on RNA quality, the randomized multiplexing used to avoid batch effects, and raw read counts can be found in Supplementary Table 5. Raw fastq reads were quality checked with FastQC [21] and quality trimmed with Trimmomatic [22]. Only high-quality reads were accepted for further analysis after removing Illumina adapters and low-quality bases at the start and end of each read (if below Phred of 35). The sliding window option was set to 4:20 with a minimum read length of 40. Reads were only included if both reads of a pair passed quality trimming. High-quality paired-end reads were then mapped against the assembled A. polyacanthus genome sequence with TopHat2 [23], using the Bowtie2 very-sensitive alignment mode and the custom-made transcriptome GFF file with transcript annotations. Read counts were obtained for all genome exons and transcripts with htseq-count using the union mode in HTSeq [24].

Differential expression analysis
Differential expression analysis was performed with DESeq2 [25] in Bioconductor version 3.2 in R version 3.2.1. Analysis of within-treatment variation revealed one family line of tolerant parents at high CO2 to be an outlier. This difference is most likely because the parent pair, once placed into CO2 (after they had already bred in control conditions), took a long time to reproduce, so that these offspring reached 5 months of age in May 2015, whereas all other samples were collected at 5 months of age between January and February 2015. This seasonal difference in collection most likely caused a large difference in gene expression. To avoid this environmental bias we removed these three individuals from the final transcriptome analysis, leaving two family lines and six samples for the tolerant CO2 treatment. All major results described in this study are also found when these three outliers are included, so their removal does not skew the main findings.

For the final analysis, global expression differences between the control (18 samples) and high CO2 (15 samples) conditions were first analyzed with a multifactor analysis that factored in the parental phenotype (tolerant or sensitive). The same type of analysis was done comparing all offspring from tolerant parents (n=15) with those from sensitive parents (n=18), factoring in the environmental treatment (control or high CO2). To get a more detailed picture of the expression patterns of each treatment group we analyzed the gene expression differences of: a) control versus high CO2 for offspring with tolerant parents (n=9 vs. 6), b) control versus high CO2 for offspring with sensitive parents (n=9 vs. 9), c) offspring of tolerant versus sensitive parents at control condition (n=9 vs. 9) and d) offspring of tolerant versus sensitive parents at high CO2 condition (n=6 vs. 9). The significance limit was set at an FDR-adjusted p-value of 0.05. In addition, a minimum log2 fold change of 0.3 was applied, and genes were accepted if they were statistically significant and the within-treatment standard deviation was small (SD < mean). A Principal Component Analysis (PCA) was performed to visualize the expression patterns of each of the four groups of samples using MeV version 4.9 [26], with median as the centering mode and the number of neighbors for K-Nearest Neighbor (KNN) imputation set to 10 (Supplementary Fig. 1).
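
As an illustration of the two main cutoffs described above, the sketch below filters a list of per-gene results in Python. It is not the DESeq2 workflow used in the study. Several points are assumptions made here: the `GeneResult` type and gene names are hypothetical, the adjusted p-value comparison is taken as strictly below 0.05, the fold-change cutoff is applied to the absolute value, and the additional SD < mean check is left out.

```python
from typing import NamedTuple, Optional


class GeneResult(NamedTuple):
    gene: str                   # hypothetical gene identifier
    log2_fold_change: float     # log2 fold change between two groups
    padj: Optional[float]       # FDR-adjusted p-value; None stands in for NA


def is_differentially_expressed(result, padj_cutoff=0.05, min_abs_log2fc=0.3):
    """Apply the adjusted p-value and minimum log2 fold change cutoffs."""
    if result.padj is None:
        return False  # genes without an adjusted p-value are not called
    return (result.padj < padj_cutoff
            and abs(result.log2_fold_change) >= min_abs_log2fc)


# Made-up values for illustration only.
results = [
    GeneResult("gene_a", 1.20, 0.001),
    GeneResult("gene_b", 0.10, 0.001),   # significant but below the fold-change cutoff
    GeneResult("gene_c", -0.45, 0.040),
    GeneResult("gene_d", 2.00, 0.200),   # large change but not significant
    GeneResult("gene_e", 0.80, None),
]
de_genes = [r.gene for r in results if is_differentially_expressed(r)]
```

For orientation, a log2 fold change of 0.3 corresponds to roughly a 1.23-fold difference in expression.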

We performed hierarchical clustering of the differentially expressed genes to investigate a possible family effect on expression patterns. In the heatmap, if the three samples from each family (or six for one family line) cluster together, then there is a possible family effect, as these individuals show more similar patterns than other individuals from a different parent pair in the same treatment group (Supplementary Fig. 2). We do not see such an effect when comparing sibling offspring reared in control and high CO2 conditions, suggesting that individuals express transcripts more similarly at the treatment level than at the family level.

Fisher’s exact tests were performed with Blast2GO [20] to evaluate the presence of functional enrichment within significantly differentially expressed (DE) genes for the different comparisons. This was done by comparing the GO terms of the DE genes with those of the entire transcriptome set at a significance level of FDR 0.05. The final enriched GO terms were reduced to higher-level ontology terms with REVIGO [27] using the small setting. The different comparisons resulted in different enriched functions. The control versus high CO2 comparisons resulted in L-serine biosynthesis processes and organic acid metabolic processes, which are shared between offspring phenotypes (Supplementary Table 3). Control versus CO2 for the sensitive individuals revealed one unique (not shared) process, and the tolerant versus sensitive comparison at CO2 showed circadian rhythm and rhythmic processes as enriched functions. Gene expression networks for the differentially expressed gene sets were created in GeneMANIA [28] using the Danio rerio genome.
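
The enrichment test for a single GO term can be sketched as a one-sided Fisher's exact test, which is equivalent to the upper tail of a hypergeometric distribution. The Python below is a minimal illustration under stated assumptions and is not Blast2GO's implementation: the function name and argument names are introduced here, the DE genes are treated as a subset of the background transcriptome, and the FDR correction across terms is not shown.

```python
from math import comb


def enrichment_p_value(de_with_term, de_total, bg_with_term, bg_total):
    """One-sided Fisher's exact test for over-representation of one GO term.

    de_with_term: DE genes annotated with the term
    de_total:     all DE genes
    bg_with_term: background genes annotated with the term (includes DE genes)
    bg_total:     all background genes (includes DE genes)
    Returns P(X >= de_with_term) under the hypergeometric distribution.
    """
    if not (0 <= de_with_term <= de_total <= bg_total):
        raise ValueError("need 0 <= de_with_term <= de_total <= bg_total")
    if not (de_with_term <= bg_with_term <= bg_total):
        raise ValueError("need de_with_term <= bg_with_term <= bg_total")
    upper = min(de_total, bg_with_term)
    favourable = sum(
        comb(bg_with_term, i) * comb(bg_total - bg_with_term, de_total - i)
        for i in range(de_with_term, upper + 1)
    )
    return favourable / comb(bg_total, de_total)
```

A small p-value indicates that the term occurs among the DE genes more often than expected from its frequency in the background set.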

Genetic variant analysis
To evaluate whether the transgenerational signature in phenotype is due to genetic variation passed on to the next generation, we searched for single nucleotide polymorphisms (SNPs) within the coding regions of the genome. To confidently call variants (SNPs) in the transcripts of the different treatment groups, several modifications were made to the TopHat BAM alignment files with SAMtools [29]. All 36 samples were first sorted, reordered and then deduplicated with Picard tools (http://picard.sourceforge.net/) to eliminate possible PCR biases. Read groups were added to each sample file to distinguish each sample when merging all BAM files into one. To avoid misalignment and false positives we identified regions with insertions and deletions and realigned them with the Genome Analysis Toolkit (GATK) version 3.5 [30]. The UnifiedGenotyper in GATK was then used to call variants with a minimum Phred score of 30. Recalibration of all quality variant sites was performed against the high quality variants using a Gaussian mixture model with VariantRecalibrator to better distinguish true variants from sequencing errors. Finally, the set of high-quality SNPs (Phred score ≥ 30) was obtained from the recalibrated set. To look for variants with a signal of selection that show clear differences between the offspring of tolerant and sensitive parents we used BayeScan [31]. The software was run with the high-quality SNPs VCF file including all 18 individuals per phenotype (T or S), and a false discovery rate threshold of 0.05 was applied (Supplementary Fig. 3).

Protein digestion and iTRAQ labeling
Dried fish brain protein pellets stored in a -80°C freezer were resuspended in lysis buffer containing 8 M urea and centrifuged at 15000 rpm for 5 minutes. The supernatant was transferred to a new Eppendorf tube. Protein concentrations were measured using a 2-D Quant kit (GE Healthcare, UK). For each offspring group (tolerant at control, tolerant at high CO2, sensitive at control and sensitive at high CO2) we pooled six of the samples that were also used for transcriptome sequencing. Because one tolerant high CO2 family line had been removed, we reduced the number of individuals for proteomics to six (instead of nine) in every group to avoid a bias due to sample size. For groups in which nine individuals were available, the six samples were chosen randomly but included samples from all three family lines per group. Suspended proteins were pooled at equal concentrations to a final amount of 100 µg. Proteins were reduced and alkylated following the instructions in the iTRAQ 4plex Kit manual (Applied Biosystems, USA). The protein samples were then diluted 1:7 with 50 mM triethylammonium bicarbonate and digested using trypsin (Promega, USA) at an enzyme:protein ratio of 1:40 for 16 h at 37°C. The trypsin was inactivated by adding trifluoroacetic acid to a final concentration of 2%. The peptides were desalted using 100 mg capacity Sep-Pak C18 cartridges (Waters Corporation, USA). Samples were then incubated with iTRAQ Reagents-4plex (Applied Biosystems) for 60 minutes before the individually labeled peptide samples were pooled and dried [32].

Peptide fractionation and mass spectrometry analysis
Samples were fractionated by strong cation exchange chromatography (SCX) as described earlier [26]. Briefly, the iTRAQ-labeled peptides were resuspended in 85 µL SCX buffer A and fractionated using an Accela 1250 LC system (Thermo Scientific, USA). A total of 15 peptide fractions were obtained, desalted using Sep-Pak C18 cartridges and dried. The fractions were resuspended in 20 µL of LC-MS sample buffer (97% H2O, 3% ACN, 0.1% formic acid) and analyzed three times using a Q Exactive HF mass spectrometer (Thermo Scientific, Germany) coupled with an UltiMate™ 3000 UHPLC (Thermo Scientific). Peptides were separated using an Acclaim PepMap100 C18 column (75 µm I.D. × 15 cm, 3 µm particle size, 100 Å pore size) with a flow rate of 300 nL/minute. A 60-minute gradient was established using mobile phase A (0.1% formic acid in H2O) and mobile phase B (0.1% formic acid in 80% acetonitrile): 5%-40% B for 40 minutes, 5-minute ramping to 90% B, 90% B for 5 minutes, and 2% B for 10-minute column conditioning. The sample was introduced into the mass spectrometer through a Nanospray Flex (Thermo Scientific) with an electrospray potential of 1.5 kV. The ion transfer tube temperature was set at 160°C. The Q Exactive was set to perform data acquisition in the positive ion mode. A full MS scan (350-1400 m/z range) was acquired in the Orbitrap at a resolution of 60,000 (at 200 m/z) in profile mode, with a maximum ion accumulation time of 100 milliseconds and a target value of 3 × 10⁶. Charge state screening for precursor ions was activated. The ten most intense ions above a 2 × 10⁴ threshold and carrying multiple charges were selected for fragmentation using higher energy collision dissociation (HCD). The resolution was set to 15,000. Dynamic exclusion for HCD fragmentation was 20 seconds. Other settings for fragment ions included a maximum ion accumulation time of 100 milliseconds, a target value of 1 × 10⁵, a normalized collision energy of 28%, and an isolation width of 1.8.

Protein identification and quantitation
Raw MS data were converted into Mascot generic format (mgf) files using Proteome Discoverer 1.4 software (Thermo Scientific). These files were submitted to MASCOT v2.3 (Matrix Science Ltd, United Kingdom) for a database search against an Acanthochromis polyacanthus brain protein dataset developed in-house from the transcriptome data. The mass tolerance was set to 10 ppm for precursors and 0.5 Da for the MS/MS fragment ions. A maximum of one missed cleavage was allowed. Variable modifications included 4-plex iTRAQ at tyrosine and oxidation at methionine. The fixed modifications were set to methyl methanethiosulfonate at cysteine and lysine, and 4-plex iTRAQ at the N-terminus. The MASCOT result files were processed using Scaffold v4.1.1 (Proteome Software Inc., USA) for validation of peptide and protein identifications with a threshold of 95% using the Prophet algorithm with Scaffold delta-mass correction. iTRAQ label-based quantitation of the identified proteins was performed using the Scaffold Q+ algorithm. The intensities of all labeled peptides were normalized across all runs [33]. Individual quantitative data acquired in each run were normalized using the i-Tracker algorithm [34]. Peptide intensity was normalized within the assigned protein. The reference channel (e.g. 114) was normalized to produce a 1:1 fold change, and the iTRAQ ratios were then transformed to a log scale. P-values were calculated using a paired t-test. We allowed for missing data in one of the technical replicates and accepted differential expression at a fold-change level of 1.5 (consistent over technical replicates).

Differential Protein Expression
A total of 2,702 proteins were confidently detected; however, not all of these proteins had data for all three technical replicates per sample group, and the final number varied between 2,100 and 2,300 per comparison. As in the transcriptome analysis, four comparisons were performed by changing the reference sample in Scaffold v4.1.1: control versus high CO2 for sensitive and for tolerant offspring, and sensitive versus tolerant offspring at control and at CO2 conditions. The number of differentially expressed proteins can be found in Figure 2c and the list of proteins in Supplementary Table 2a-d.

Comparative analysis of transcriptome and proteome
It is generally difficult to compare the two levels of molecular response directly, as transcriptomes are quantitative expression values and mass spectrometry-based proteomes are relative values. Furthermore, the proteins detected in a proteome are generally a small fraction of the total number of proteins, whereas RNA-Seq recovers a large quantity of genome-wide expression. Hence, the absence of a protein from the proteome data set could mean that the protein is not expressed, that it was not detected, or that it did not pass the filtering criteria. Besides the technical issues of comparing the two molecular levels, direct overlap has been shown to be quite low (on average 27%) even in model species [35]. Differential expression overlap varies depending on the comparison, but is on average also low (Table S2). Nonetheless, some differentially expressed proteins match the differential expression of transcripts. For the seven transcripts that were commonly (regardless of parental phenotype) differentially expressed, we could detect three of the directly matching proteins (43% overlap, Table S2). Only one of these three proteins (33%) was also differentially expressed between control and high CO2 conditions, at least for the offspring of tolerant parents (gene name: phgdh; protein name: d-3-phosphoglycerate dehydrogenase). For the 18 proteins that were commonly differentially expressed (between control and CO2 conditions regardless of parental phenotype), none of the directly related transcripts were differentially expressed. However, for one protein (inactive serine protease PAMR1-like) serine plays an important role, which is also the case for several transcripts (discussed in main text). The rest of the commonly differentially expressed proteins are mostly involved in structural maintenance (such as collagen or myoglobin). One interesting protein that is not differentially expressed at the transcript level is vasotocin, which is upregulated at the high CO2 level. For the sensitive versus tolerant offspring comparison at high CO2, 21% of matching transcripts and proteins (Table S2) were commonly differentially expressed. For example, the circadian rhythm gene nr1d1 and parvalbumin (pvalb) are differentially expressed at both the transcript and protein level.

Supplementary Table 2. Overlap of differentially expressed transcripts and proteins for the different comparisons. Recovered matching proteins are the proteins detected through mass spectrometry-based proteomics that matched the sequence of the transcript. Overlapping differential expression refers to the exactly matching transcripts and proteins that are both differentially expressed.

Comparison              Parental phenotype  Condition  Differentially expressed transcripts  Recovered matching proteins  Overlapping differential expression
Control vs. CO2         T & S               -          7                                     3                            1
Control vs. CO2         T                   -          173                                   12                           1
Control vs. CO2         S                   -          62                                    14                           0
Tolerant vs. Sensitive  -                   high CO2   152                                   14                           3
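
The overlap percentages quoted in the text can be re-derived from the counts in Supplementary Table 2. The short Python sketch below is only an arithmetic check: the dictionary layout and the `percent` helper are introduced here for illustration, and the counts are copied from the table.

```python
# Row label -> (DE transcripts, recovered matching proteins, overlapping DE)
TABLE_S2 = {
    "Control vs. CO2 (T & S)": (7, 3, 1),
    "Control vs. CO2 (T)": (173, 12, 1),
    "Control vs. CO2 (S)": (62, 14, 0),
    "Tolerant vs. Sensitive (high CO2)": (152, 14, 3),
}


def percent(part, whole):
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


transcripts, recovered, overlapping = TABLE_S2["Control vs. CO2 (T & S)"]
recovered_of_common = percent(recovered, transcripts)      # 3 of 7 transcripts
overlap_of_recovered = percent(overlapping, recovered)     # 1 of 3 proteins

_, recovered_ts, overlapping_ts = TABLE_S2["Tolerant vs. Sensitive (high CO2)"]
overlap_high_co2 = percent(overlapping_ts, recovered_ts)   # 3 of 14 proteins
```

Rounded to whole numbers, these give the 43%, 33% and 21% figures reported in the comparative analysis.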

qRT-PCR validation
For RNAseq validation we used quantitative real-time PCR (qRT-PCR) to test the expression of a selection of genes. For this we used samples from the same treatments and families as in the RNAseq analysis, but different biological replicates, to reinforce the findings. In the RNAseq analysis we used three individuals per family (2-3 families per treatment), and here we used two other biological replicates per family (2-3 families per treatment group). qRT-PCR primers were designed from the associated transcript of the gene of interest with Primer3Plus using the qPCR settings [36]. Selected primers were additionally checked for specificity with the NCBI Primer-BLAST tool. The 20 bp forward and reverse primers were then obtained from SIGMA (Sigma-Aldrich, Germany) with HPSF purification. A total RNA quantity of 550 ng for each sample was reverse transcribed using a high capacity reverse transcription kit from ABI (Applied Biosystems). 15 ng of the produced cDNA was used for each reaction, and three reactions were run per sample. This was done with a Fast SYBR green PCR mix (ABI), with the setup per reaction being: 5 µl of 2X master mix, 0.25 µl of 10 µM forward primer, 0.25 µl of 10 µM reverse primer, 3.5 µl H2O and 1 µl of cDNA template, for a total reaction volume of 10 µl. The PCR was run in a 384-well clear plate (ABI): 95°C for 20 seconds and then 40 cycles of 95°C for 10 seconds and 60°C for 20 seconds. The qRT-PCR was done with a negative control on the 7900 HT Fast Real Time PCR system (ABI) in the Genomics section of the Biosciences Core Lab of the King Abdullah University of Science and Technology. Three transcripts with the lowest standard deviation in gene expression across all samples were selected as ‘housekeeping genes’, each expressed at a different level: low (dhrs7b), intermediate (smurf1) and high (akt1s1) expression. All samples for all analyzed genes were run in triplicate, and medians of CT values across technical replicates were used for final quantification and comparison. We used the Livak method and calculated Delta Delta CTs by normalizing the CT values against the housekeeping gene average (Delta CT) and then comparing Delta CTs of different treatments with each other (Delta Delta CT) [37]. Values of Delta Delta CT were then compared with log2 fold differences in the RNAseq data (Supplementary Fig. 4). For the comparison of qRT-PCR results with RNAseq data we performed four comparisons: tolerant versus sensitive at (1) the control level and (2) the high CO2 level, and expression at control versus high CO2 for (3) only tolerant or (4) only sensitive specimens. Of the three selected ‘small deviation genes’, only the highly expressed one had a significant correlation between RNAseq and qRT-PCR (Pearson's product-moment correlation, p=0.022), which is not usual due to the more variable nature of qRT-PCR. Nine out of ten genes used for validation showed the same expression pattern in qRT-PCR as found with RNAseq (Pearson's product-moment correlation, p<0.05). Only gabra3 did not match significantly, probably due to the very low twofold expression differences of the gene. This high percentage of validation shows that the RNAseq results can be replicated not only with a different method but also with different biological samples, and therefore that the observed pattern is clearly linked to the treatment.
References
1. Nilsson, G. E. et al. Near-future carbon dioxide levels alter fish behaviour by interfering with neurotransmitter function. Nat. Clim. Chang. 2, 201–204 (2012).
2. Welch, M. J., Watson, S.-A., Welsh, J. Q., McCormick, M. I. & Munday, P. L. Effects of elevated CO2 on fish behaviour undiminished by transgenerational acclimation. Nat. Clim. Chang. 4, 1086–1089 (2014).
3. Ferrari, M. C. O. et al. Intrageneric variation in antipredator responses of coral reef fishes affected by ocean acidification: implications for climate change projections on marine communities. Glob. Chang. Biol. 17, 2980–2986 (2011).
4. Munday, P. L. et al. Ocean acidification impairs olfactory discrimination and homing ability of a marine fish. Proc. Natl. Acad. Sci. 106, 1848–1852 (2009).
5. Meinshausen, M. et al. The RCP greenhouse gas concentrations and their extensions from 1765 to 2300. Clim. Change 109, 213–241 (2011).
6. McNeil, B. I. & Sasse, T. P. Future ocean hypercapnia driven by anthropogenic amplification of the natural CO2 cycle. Nature 529, 383–6 (2016).
7. Collins, M. et al. in Clim. Chang. 2013 Phys. Sci. Basis. Contrib. Work. Gr. I to Fifth Assess. Rep. Intergov. Panel Clim. Chang. [Stocker, T.F., D. Qin, G.-K. Plattner, M. Tignor, S.K. Allen, J. Boschung, A. Nauels, Y. Xia, (Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA., 2013).
8. Nagelkerken, I. & Munday, P. L. Animal behaviour shapes the ecological effects of ocean acidification and warming: moving from individual to community-level responses. Glob. Chang. Biol. 22, 974–89 (2016).
9. Munday, P. L. et al. Replenishment of fish populations is threatened by ocean acidification. Proc. Natl. Acad. Sci. U. S. A. 107, 12930–4 (2010).
10. Munday, P. L. et al. Selective mortality associated with variation in CO2 tolerance in a marine fish. Ocean Acidif. 1, 1–5 (2012).
11. Veilleux, H. D. et al. Molecular processes of transgenerational acclimation to a warming ocean. Nat. Clim. Chang. 5, 1074–1078 (2015).
12. Simpson, J. T. et al. ABySS: a parallel assembler for short read sequence data. Genome Res. 19, 1117–23 (2009).
13. Boetzer, M., Henkel, C. V, Jansen, H. J., Butler, D. & Pirovano, W. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27, 578–9 (2011).
14. Holt, C. & Yandell, M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 12, 491 (2011).
15. Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–5 (2010).
16. UniProt: a hub for protein information. Nucleic Acids Res. 43, D204–12 (2014).
17. Parra, G., Bradnam, K. & Korf, I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23, 1061–7 (2007).
18. Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004).
19. Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435–9 (2006).
20. Conesa, A. et al. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21, 3674–6 (2005).
21. Andrews, S. FASTQC. A quality control tool for high throughput sequence data. (2010). at <http://www.bioinformatics.babraham.ac.uk/projects/fastqc/>
22. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
23. Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013).
24. Anders, S., Pyl, P. T. & Huber, W. HTSeq - A Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2014).
25. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
26. Saeed, A. I. et al. TM4: a free, open-source system for microarray data management and analysis. Biotechniques 34, 374–8 (2003).
27. Supek, F., Bošnjak, M., Škunca, N. & Šmuc, T. REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS One 6, e21800 (2011).
28. Zuberi, K. et al. GeneMANIA prediction server 2013 update: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Res. 41, W115–22 (2013).
29. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–9 (2009).
30. DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–8 (2011).
31. Foll, M. & Gaggiotti, O. A genome-scan method to identify selected loci appropriate for both dominant and codominant markers: a Bayesian perspective. Genetics 180, 977–93 (2008).
32. Chandramouli, K. H., Reish, D., Zhang, H., Qian, P.-Y. & Ravasi, T. Proteomic Changes Associated with Successive Reproductive Periods in Male Polychaetous Neanthes arenaceodentata. Sci. Rep. 5, 13561 (2015).
33. Zhang, H. et al. Study of monocyte membrane proteome perturbation during lipopolysaccharide-induced tolerance using iTRAQ-based quantitative proteomic approach. Proteomics 10, 2780–9 (2010).
34. Shadforth, I. P., Dunkley, T. P. J., Lilley, K. S. & Bessant, C. i-Tracker: for quantitative proteomics using iTRAQ. BMC Genomics 6, 145 (2005).
35. Ghazalpour, A. et al. Comparative analysis of proteome and transcriptome variation in mouse. PLoS Genet. 7, e1001393 (2011).
36. Untergasser, A. et al. Primer3Plus, an enhanced web interface to Primer3. Nucleic Acids Res. 35, W71–4 (2007).
37. Livak, K. J. & Schmittgen, T. D. Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods 25, 402–8 (2001).
Supplementary Figure 1

Supplementary Figure 1. Two-dimensional projection of the Principal Component
Analysis (PCA) performed on the four sample treatments. The figure shows the PC1 and
PC3 projections.

Supplementary Figure 2

Supplementary Figure 2. Heatmap of sensitive offspring at Control and CO2 level.
Heatmap color intensity is proportional to the expression levels. Two-way hierarchical
clustering was performed according to condition as well as family lines (F).

[Heatmap image not reproduced. Sample labels, in order: 41−1_C5, 33−1_C3, 41−1_C4, 33−1_C4, 33−1_C5, 41−1_C2, 57−1_C3, 57−1_C2, 57−1_C4, 41−2_H1, 73−1_H4, 41−2_H2, 41−2_H3, 73−1_H1, 73−1_H3, 71−1_H1, 71−1_H3, 71−1_H4. Annotation bars: Condition (C, CO2) and Family (F1 to F5). Color scale: −3 to 3.]
Supplementary Figure 3

Supplementary Figure 3. Bayesian FST outlier detection for SNPs between offspring
of tolerant and sensitive parents. The four statistical outliers encountered are marked with
the gene name.

Supplementary Figure 4

Supplementary Figure 4. Transcript expression validation by quantitative real-time
PCR. Nine genes were validated with qRT-PCR and two fold changes are
represented for the four different comparisons: CO2 versus control for tolerant
samples only (CO2 vs. C_T) or for sensitive samples only (CO2 vs. C_S), and expression
between tolerant and sensitive samples at control condition (T vs. S_C) or at high CO2
(T vs. S_CO2). The red line corresponds to the qRT-PCR expression levels and the blue
line to the RNA-seq results. All genes have a significant correlation between qRT-PCR and
RNA-seq (P value of Pearson's product-moment correlation <0.05) except gabra3.
[Plot panels not reproduced. One panel per gene: pck1, fgf1, ciart, per1, nfil3, shmt2, glrk, gabra1, gabra3. X axis: CO2vsC_T, TvsS_C, CO2vsC_S, TvsS_CO2. Y axis: 2FoldChange. Legend (method): qPCR, RNAseq.]
Supplementary Table Legends
Supplementary Table 1a-d. List of significantly differentially expressed transcripts.
Supplementary Table 2a-d. List of differentially expressed proteins.
Supplementary Table 3. Significant (FDR<0.05) enrichment of biological functions for each comparison.
Supplementary Table 4. Differentially expressed Solute Carrier Transporter Genes (SLC) and transporter proteins.
Supplementary Table 5. Sequencing information for each biological sample.