Supporting Information - pnas.org · Supporting Information ... C24, Ler, C24 × Ler, and Ler ×...



Supporting Information

Greaves et al. 10.1073/pnas.1201043109 (www.pnas.org/cgi/content/short/1201043109)

SI Materials and Methods

Plant Material. Hybrids between Arabidopsis thaliana C24 and Landsberg erecta (Ler) were generated by hand pollination. Seeds were sterilized, stratified for 2 d at 4 °C, and sown onto MSN medium, pH 7 (MS medium supplemented with 0.6% wt/vol Noble agar). At 18 d, plants were transferred to soil (Debco Seed Raising Mix supplemented with 1 g/L Osmocote Extract Mini patterned release fertilizer pellets) and grown in a 16-h day cycle at 22 °C/18 °C (day/night). Light intensity was maintained between 120 and 190 μmol·m−2·s−1 by using fluorescent tubes and incandescent bulbs. Immature floral buds were collected from several plants per sample, frozen in liquid N2, and stored at −80 °C. For seedling analysis, whole seedlings were frozen in liquid N2 at 15 d after sowing and stored at −80 °C.

Bisulphite PCR. Five hundred nanograms to 2 μg of genomic DNA from immature floral buds was isolated using the Plant DNeasy Minikit (Qiagen). Isolated DNA was bisulphite converted by using the MethylEasy Xceed kit (Human Genetic Signatures). Bisulphite PCR was carried out (1) by using unbiased bisulphite primers, allowing the amplification of both methylated and unmethylated fragments (Dataset S1, Table S12). Annealing temperatures for primers were set at either 50 or 55 °C. At least eight clones were sequenced for each sample, and the level of methylation was analyzed and visualized by using Microsoft Excel 2007. For allelic discrimination of hybrid methylation, at least seven clones of each allele were analyzed.

MethylC-seq. MethylC-seq was carried out as described in Lister et al. (2). Immature floral bud genomic DNA was extracted by using the Plant DNeasy Minikit (Qiagen) and stored at −20 °C. DNA was sonicated to a size of 200–500 bp. DNA was end-repaired and ligated with Illumina methylated adapters. DNA was run on a gel and size selected by gel purification (≈300 bp in size) before undergoing bisulphite conversion by using the MethylEasy Xceed Kit (Human Genetic Signatures). Libraries were amplified, and single 75-bp reads were obtained on a Genome Analyzer II by Geneworks. The bisulphite conversion efficiency was checked by analyzing the conversion of chloroplast DNA using standard bisulphite PCR and Sanger sequencing (3).

mRNA-seq. Total RNA was extracted from 15-d-old seedlings for C24, Ler, C24 × Ler, and Ler × C24 (24 plants per sample) by using the RNeasy Minikit (Qiagen). mRNA-seq libraries for Illumina sequencing were then prepared by the Biomolecular Resource Facility, The Australian National University. Approximately 25 million single 75-bp reads were obtained for each sample and mapped to the Arabidopsis genome (TAIR 9) by using TopHat. Expression levels of genes were validated by using qRT-PCR, except for At3g43340 and At5g45790. At3g43340 is nearly identical to At2g39140, which did not allow a reliable analysis of the gene expression level via qRT-PCR. At5g45790 had low expression levels and could not be reliably analyzed. qRT-PCR results were not in full agreement with mRNA-seq data for At4g22310 and At5g38720.

qRT-PCR. Total RNA from 15-d-old seedlings was extracted by using the RNeasy Plant Mini Kit (Qiagen) and treated with DNase I amplification grade (Invitrogen). Two micrograms of total RNA was used to perform cDNA synthesis (20 μL) by using oligo dT primers and SuperScript III RNase Reverse transcriptase (RT; Invitrogen). Each qRT-PCR was performed by using SYBR green and 0.25 μL of the RT product by using platinum Taq (Invitrogen). All gene expression levels were compared with the reference gene At4g26410 (Dataset S1, Table S12). qRT-PCR conditions were as follows: step 1: 94 °C for 10 min; step 2: 95 °C for 20 s, 60 °C for 20 s, and 72 °C for 20 s (45 cycles). The Corbett RG-6000 machine was used to run all qRT-PCR reactions.

Mapping and mC Cluster Formation. Seventy-five nucleotide MethylC-seq reads were trimmed and mapped to the TAIR9 A. thaliana Columbia reference genome by using BSMAP (4). Mapping was accomplished by using a reiterative single-nucleotide trimming of the read until either a successful hit was found or 30 nt had been removed, resulting in >90% of the reads mapped with >60% mapped uniquely. All reads with identical genomic location and sequences were removed before analysis of the mapped data. An error rate was determined as the ratio of the number of uniquely mapped methylated C’s to the number of uniquely mapped C’s on the chloroplast genome. A bisulphite conversion rate of ≥98% for all four samples was determined. True methylated C’s were determined by a P value under the binomial distribution P = binomial (m, x, error_rate), where m = number of methylated reads, n = number of unmethylated reads, and x = m + n, which is the sequencing depth on a single C; “error_rate” was the error rate for nonconversion of the samples. True mC cytosines were determined as any C with a P value under the binomial distribution of 0.0025 (p cutoff). This P value depended on the coverage of reads over that given C.
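The binomial call described above can be illustrated with a minimal Python sketch. It is not the authors' script: it assumes that "binomial (m, x, error_rate)" denotes the upper-tail probability of seeing at least m methylated reads out of x by nonconversion alone, that a C is called methylated when this probability falls below the 0.0025 cutoff, and that error_rate is 0.02 (an illustrative value suggested by the ≥98% conversion rate). The function names are hypothetical.

```python
from math import comb


def binomial_tail(m: int, x: int, error_rate: float) -> float:
    """Return P(X >= m) for X ~ Binomial(x, error_rate)."""
    return sum(
        comb(x, k) * error_rate**k * (1.0 - error_rate) ** (x - k)
        for k in range(m, x + 1)
    )


def is_methylated(m: int, n: int, error_rate: float = 0.02,
                  p_cutoff: float = 0.0025) -> bool:
    """Call a cytosine methylated from m methylated and n unmethylated reads.

    error_rate and the strict '<' comparison are assumptions of this sketch.
    """
    x = m + n  # sequencing depth on this C
    if x == 0:
        return False  # no coverage, no call
    return binomial_tail(m, x, error_rate) < p_cutoff


if __name__ == "__main__":
    # The same methylated fraction can pass or fail depending on depth.
    for m, n in [(1, 0), (2, 0), (1, 9), (5, 5)]:
        print(m, n, is_methylated(m, n))
```

Under these assumptions a single methylated read at depth 1 is not called, whereas two methylated reads at depth 2 are, which is one way to see why the P value depends on the coverage of a given C.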

mC clusters were formed by using MergeBed from the BEDTools package version 1.2 (5). Clusters were defined by using multiple combinations of the minimal number of mC required (ranging from 3 to 6) and the minimal distance between each mC (ranging from 15 to 25 nt). We settled on examining two representative cluster sets: (i) clusters formed when at least three mC were within 20 nt of the adjacent mC; and (ii) clusters formed when at least six mC were within 20 nt of the adjacent mC (Dataset S2). The first approach yielded ≈150,000 mC clusters for each sample. Cluster methylation levels were calculated as the average methylation level of the mC within the cluster, with unique values attained for mTotal (all mC), mCG, mCHG, and mCHH contexts. Depending on the level of coverage, or the presence of mC’s, cluster lengths and individual mC cluster methylation levels were different for each sample. As a result, a two-way analysis was undertaken: first, comparing regions that were methylated in C24 (referred to as C24-identified), and second, comparing regions that were methylated in Ler (referred to as Ler-identified). This approach resulted in two datasets for the parental comparison and four datasets (two for each hybrid) for the hybrid analysis that were analyzed independently. Numbers throughout the paper are represented as the average from the different analyses, with the number of clusters in each class found in Dataset S1, Tables S3, S4, and S6. All analyses gave consistent results and patterns. For comparative analysis between parents or between parents and hybrids, clusters with ≥75% coverage of the same C were used. In such cases, where clusters had between 75 and 99% coverage of the same C, the C’s not covered in all samples were trimmed from the analysis and not used to establish a cluster methylation level. Any cluster with <75% of the same C covered between samples was not used in any comparative analysis.

Clusters were allocated a methylation index value ranging from 0 (low/unmethylated) to ≥5 (highly methylated) based on the average methylation of the mC’s contained within (Dataset S1, Table S5). Context-specific methylation index values were assigned for mTotal, mCG, mCHG, and mCHH methylation, reflective of the differences in the distribution of methylation levels between the contexts. If a particular cluster contained only one mC context, that context’s methylation index was used for mTotal index values. Clusters in parents were considered differentially methylated if their indexing scores differed by a value ≥2. For hybrid comparisons, clusters were considered nonadditively methylated if the hybrid index score differed by a value ≥2 from the midparent value (MPV) index score. Once a region was registered as differentially methylated, we also tested the region to make sure that the overall region was different in its levels and not just different C’s methylated in the same region.
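The clustering rule used for the two representative cluster sets (at least three or at least six mC, each within 20 nt of the adjacent mC) can be sketched in plain Python. This stands in for the MergeBed step rather than reproducing it: it works on mC positions from a single chromosome, and the function name, tuple layout, and example coordinates are hypothetical.

```python
from typing import Iterable, List, Tuple

Cluster = Tuple[int, int, int]  # (first mC position, last mC position, mC count)


def form_clusters(positions: Iterable[int], max_gap: int = 20,
                  min_sites: int = 3) -> List[Cluster]:
    """Chain mC positions whose neighbors are at most max_gap nt apart.

    Chains with fewer than min_sites mC are discarded.
    """
    clusters: List[Cluster] = []
    current: List[int] = []
    for pos in sorted(positions):
        # A gap larger than max_gap closes the chain being built.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_sites:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    # Flush the final chain.
    if len(current) >= min_sites:
        clusters.append((current[0], current[-1], len(current)))
    return clusters


if __name__ == "__main__":
    mc_positions = [100, 110, 125, 300, 310, 500, 515, 530, 549]
    print(form_clusters(mc_positions))               # three-mC rule
    print(form_clusters(mc_positions, min_sites=6))  # six-mC rule
```

In this toy input the pair at 300 and 310 is dropped under the three-mC rule, and no chain is long enough to survive the six-mC rule.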

Distribution of mC Clusters over the Arabidopsis Genome. The annotations for genes and transposable elements (TE) were downloaded from the TAIR9 genomic release. The genomic feature overlapping the mC cluster was identified by using a script invoking various commands from the BEDTools package version 1.2 (5). Assigning clusters to a genomic feature was prioritized by TE > Gene body > ±1 kb from gene > Intergenic. If a mC cluster overlapped two genomic features, such as gene body and upstream, the mC cluster was assigned based on the feature that the majority of the cluster overlapped. For profiling purposes, clusters that had different levels of methylation between parents, or hybrids and MPV, yet still had overall high levels of methylation (mC index value ≥4) were not considered differentially methylated because these regions would still have levels of methylation likely to result in a repressive state. To determine whether a statistical enrichment for differentially methylated clusters in particular features occurred, the results were subjected to a Fisher exact test (P < 0.05) by using the R software package.
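One possible reading of the assignment rule above is sketched below. The text does not spell out how the priority order and the majority-overlap rule interact, so this sketch assumes that the feature with the largest overlap wins and that the priority order (TE, then gene body, then ±1 kb flank) only breaks ties; a cluster with no overlap is intergenic. The feature labels and the input dictionary of overlap lengths (in nt) are hypothetical.

```python
from typing import Dict

# Highest priority first; "intergenic" is the fallback when nothing overlaps.
PRIORITY = ["TE", "gene_body", "flank_1kb"]


def assign_feature(overlaps: Dict[str, int]) -> str:
    """Pick the feature class for one cluster from its overlap lengths (nt)."""
    best, best_len = "intergenic", 0
    # Iterating in priority order with a strict '>' means ties are kept
    # by the higher-priority feature.
    for feature in PRIORITY:
        length = overlaps.get(feature, 0)
        if length > best_len:
            best, best_len = feature, length
    return best


if __name__ == "__main__":
    print(assign_feature({"gene_body": 30, "flank_1kb": 10}))
    print(assign_feature({"TE": 20, "gene_body": 20}))
    print(assign_feature({}))
```

If TE overlap were instead meant to override any other feature regardless of length, the loop would return "TE" as soon as its overlap is nonzero; the source does not say which was intended.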

SNP Methylation Analysis. SNPs within the MethylC-seq reads were identified by using the genomic sequences for the C24 and Ler accessions obtained from the 1001 Genomes website (www.1001genomes.org). This analysis resulted in 220,778 SNP-containing reads through which the mC could be tracked to the parent of origin in the C24 × Ler hybrid. SNP-trackable mC’s (165,416) were mapped onto the original mC clusters and filtered for quality and adherence to the methylation level of the original cluster (Dataset S1, Table S13), resulting in 5,479 SNP clusters (Dataset S1, Table S8). Allelic cluster methylation levels derived from SNPs were then compared with the expected methylation level by using the methylation index scale described earlier, allowing us to observe how the allelic methylation levels were inherited.

Extrapolating the Number of TCM and TCdM Events in Hybrids. A second approach using SNPs was used to determine how many of our original nonadditive mC clusters (Dataset S1, Table S6) were a direct result of TCM and TCdM events. We were able to show that 50% of our nonadditive increases and 46% of nonadditive decreases, which contained SNPs, were due to TCM and TCdM, respectively (Dataset S1, Table S9 and Fig. S7). This number is likely to be an underestimate given our defined criteria for differential allelic methylation. Of the ≈70,000 mC clusters used for the hybrid analysis, ≈2,000 (3%) had a nonadditive increase and ≈3,000 (4%) had a nonadditive decrease in methylation, suggesting that ≈1,000 (50% of 2,000) of our nonadditive increase mC clusters and ≈1,300 (46% of 3,000) of our nonadditive decrease mC clusters are a result of TCM and TCdM, respectively (Dataset S1, Table S9). Our datasets contained ≈150,000 clusters; however, only ≈50% of them were incorporated in the hybrid analysis (given the need for coverage in both parents and at least one hybrid as described above). Therefore, we can estimate that there are ≈2,100 TCM and ≈3,000 TCdM events occurring in our dataset (Dataset S1, Table S9). Our mC clusters contained ≈93% of total mC sequenced, whereas our sequencing covered ≈68% of all C’s in C24 and Ler (Dataset S1, Table S9). Assuming the sequenced regions are representative of the genome as a whole, we would expect ≈3,500 TCM events [(≈2,100/≈0.93)/≈0.68 = ≈3,500] and ≈4,700 TCdM events [(≈3,000/≈0.93)/≈0.68 = ≈4,700] across the entire hybrid genome (Dataset S1, Table S9).
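The scaling step above is simple enough to check numerically. The sketch below applies the two corrections (fraction of sequenced mC captured by clusters, fraction of all C's covered by sequencing) to the rounded figures quoted in the text; the function name is hypothetical. With these rounded inputs the TCdM estimate comes out near 4,700, while the TCM estimate comes out nearer 3,300 than the quoted ≈3,500, a gap that presumably reflects rounding of the inputs relative to the values in Dataset S1, Table S9 (not verified here).

```python
def extrapolate_events(events_in_dataset: float,
                       frac_mc_in_clusters: float = 0.93,
                       frac_c_sequenced: float = 0.68) -> float:
    """Scale an event count from the clustered, sequenced fraction to the genome.

    Assumes, as the text does, that sequenced regions are representative
    of the genome as a whole.
    """
    return events_in_dataset / frac_mc_in_clusters / frac_c_sequenced


if __name__ == "__main__":
    # Rounded inputs as quoted in the text.
    print(round(extrapolate_events(2100)))  # TCM
    print(round(extrapolate_events(3000)))  # TCdM
```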

siRNAs. Twenty-three to twenty-four nucleotide siRNA reads from a published dataset (3) were used to form siRNA clusters by using both uniquely mapped reads and repetitive reads. Clusters were formed by using MergeBED from the BEDTools package version 1.2 with the -d option set at 200 (5). Only clusters with ≥5 siRNA reads were kept for subsequent analysis. For the analysis of methylation spreading beyond siRNA clusters, the siRNA clusters were normalized to 500 nt in length. IntersectBED with the -c option was used to report back the frequency of methylation at a given C within the siRNA cluster and the ±1 kb flanking region of the siRNA cluster. This analysis resulted in ≈18,000 siRNA clusters, ≈9,000 5′ flanking regions, and ≈9,000 3′ flanking regions. The number of flanking regions was lower than the number of siRNA clusters because only flanking regions not overlapping adjacent clusters were used. Graphs of mC frequency over siRNA clusters clearly showed that methylation frequencies were highest directly under the siRNA clusters. They also showed that mC levels do not reach a background level until 400 nt outside of a siRNA cluster. Any mC within a siRNA cluster or within 400 nt of a siRNA cluster was therefore considered to be siRNA-directed DNA methylation and was classed as siRNA-associated. To determine the role of siRNAs in nonadditive methylation, all mC clusters where one parent was methylated and the other unmethylated were compared with the position of siRNA clusters and characterized as siRNA-associated or siRNA-independent using IntersectBED from the BEDTools package version 1.2. Because we were interested in whether siRNAs had directed methylation on another allele, a mC cluster was only considered siRNA-associated if the siRNA was in the parent that was methylated. Clusters were only considered siRNA-independent if there were no siRNAs in either parent or the hybrid being analyzed.
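The siRNA cluster formation and the 400-nt association window described above can be approximated in plain Python. This is only a stand-in for the MergeBED and IntersectBED steps on one chromosome: the gap is measured here as the next read start minus the current cluster end, which may differ by a base from BEDTools' own distance convention, and all names and coordinates are hypothetical.

```python
from typing import List, Tuple

Read = Tuple[int, int]               # (start, end) of one siRNA read
SirnaCluster = Tuple[int, int, int]  # (start, end, number of reads)


def merge_reads(reads: List[Read], max_gap: int = 200,
                min_reads: int = 5) -> List[SirnaCluster]:
    """Merge reads separated by at most max_gap nt; keep clusters with >= min_reads."""
    clusters: List[SirnaCluster] = []
    for start, end in sorted(reads):
        if clusters and start - clusters[-1][1] <= max_gap:
            s, e, count = clusters[-1]
            clusters[-1] = (s, max(e, end), count + 1)  # extend the open cluster
        else:
            clusters.append((start, end, 1))            # open a new cluster
    return [c for c in clusters if c[2] >= min_reads]


def is_sirna_associated(mc_pos: int, clusters: List[SirnaCluster],
                        window: int = 400) -> bool:
    """True if the mC lies inside a cluster or within `window` nt of one."""
    return any(s - window <= mc_pos <= e + window for s, e, _ in clusters)


if __name__ == "__main__":
    reads = [(100, 124), (110, 134), (300, 324), (320, 344),
             (500, 524), (2000, 2024)]
    clusters = merge_reads(reads)
    print(clusters)
    print(is_sirna_associated(900, clusters), is_sirna_associated(1000, clusters))
```

For the parent-specific rule in the text, the cluster list passed to `is_sirna_associated` would be the one built from the methylated parent's reads.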

Identification of Candidate Genes That May Be Regulated Through DNA Methylation. Flanking regions. We identified 1,155 clusters that were nonadditively methylated and in the ±1-kb flanking regions of genes. These genes were then filtered for the number of mC (≥5), differential expression between parents (≥1.1-fold difference), and hybrid gene expression different from MPV (≥1.2-fold difference). This analysis resulted in 75 candidate genes that could be potentially altered in hybrids because of nonadditive methylation. This list was then manually filtered by examining the relationship of methylation to gene expression. Only those genes where the relationship between methylation and gene expression was consistent between both parent–parent and hybrid–MPV were analyzed, resulting in 32 candidate genes with altered expression levels in the hybrid that may be due to nonadditive changes in methylation.

Gene body. To study the effect of methylation changes in the gene body, we selected all genes that contained a mC cluster (C24 × Ler = 13,125; Ler × C24 = 13,542). All mC clusters within the same gene were merged and an overall methylation level produced for the gene as a whole. These ≈13,000 genes were then filtered for the number of mC [10–20 depending on the context present, as each context was weighted differentially because of presumed differences in suppressive activity in a gene (i.e., CHG is strongly suppressive, whereas CG is less so)], hybrid gene expression different from MPV (≥1.2-fold difference), and hybrid differential methylation, resulting in 323 candidate genes. These genes were then manually filtered by studying total methylation levels, numbers of C’s, and the relationship between methylation and gene expression in the parents and between the parents and hybrid. This filtering left 23 candidate genes with differential expression in the hybrid that may be due to the associated changes in gene body methylation.
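The automatic part of the flanking-region filter (≥5 mC, ≥1.1-fold difference between parents, ≥1.2-fold difference between hybrid and MPV) can be written as a small predicate. The sketch assumes that the MPV is the arithmetic mean of the two parental expression values and that "fold difference" is the ratio of the larger to the smaller value; the function names and the expression units are hypothetical, and the later manual filtering is not modeled.

```python
def fold_difference(a: float, b: float) -> float:
    """Symmetric fold difference: larger value divided by the smaller one."""
    lo, hi = sorted((a, b))
    if lo <= 0:
        # Avoid dividing by zero; treat 0 vs. >0 as an unbounded difference.
        return float("inf") if hi > 0 else 1.0
    return hi / lo


def passes_flank_filter(n_mc: int, expr_parent1: float, expr_parent2: float,
                        expr_hybrid: float, min_mc: int = 5,
                        min_parent_fold: float = 1.1,
                        min_mpv_fold: float = 1.2) -> bool:
    """Apply the three numeric thresholds quoted for flanking-region clusters."""
    mpv = (expr_parent1 + expr_parent2) / 2.0  # midparent value (assumed mean)
    return (
        n_mc >= min_mc
        and fold_difference(expr_parent1, expr_parent2) >= min_parent_fold
        and fold_difference(expr_hybrid, mpv) >= min_mpv_fold
    )


if __name__ == "__main__":
    print(passes_flank_filter(6, 10.0, 20.0, 10.0))  # hybrid below MPV
    print(passes_flank_filter(6, 10.0, 20.0, 15.0))  # hybrid equals MPV
```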

Software. R scripts, BEDTools, and the AWK programming language were used for all bioinformatics analyses in this paper. Final graphs were generated in Microsoft Excel 2007.

1. Clark SJ, Harrison J, Paul CL, Frommer M (1994) High-sensitivity mapping of methylated cytosines. Nucleic Acids Res 22:2990–2997.

2. Lister R, et al. (2008) Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell 133:523–536.

3. Groszmann M, et al. (2011) Changes in 24-nt siRNA levels in Arabidopsis hybrids suggest an epigenetic contribution to hybrid vigor. Proc Natl Acad Sci USA 108:2617–2622.

4. Li Y, Li W (2009) BSMAP: Whole genome bisulfite sequence MAPping program. BMC Bioinformatics 10:232.

5. Quinlan AR, Hall IM (2010) BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics 26:841–842.

Fig. S1. Genomic methylation. (A) Histogram of methylation frequencies on the y axis for different levels (%) of CG (blue), CHG (green), and CHH (red) methylation (x axis). (B) Methylation levels across genic regions (gene body and ±1 kb) for CG (blue), CHG (green), and CHH (red).


Fig. S2. Distribution of 23/24-nt siRNA clusters, mC read coverage, and mC frequencies across all five Arabidopsis chromosomes of both accessions and the reciprocal hybrids. Gene, TE, and repeat densities are represented as a heat map underneath each chromosome.


Fig. S3. (A) Genomic distribution of methylated clusters (for details of the method of assigning a mC cluster to genomic features, see SI Materials and Methods). (B) Length and density (methylated C’s / total C’s in cluster) of methylated clusters in different genomic features. (C) Regions of differential methylation between the parental ecotypes C24 and Ler. (D) Genomic distribution of nonadditive methylation in hybrids. Two graphs each are shown for C24-identified and Ler-identified clusters. Arrows indicate statistical enrichment or underrepresentation of differentially methylated mC clusters in different genomic features (Fisher exact test; P < 0.05). (E) Genomic distribution of methylated clusters analyzed in the hybrids.


Fig. S4. (A) Average methylation levels of clusters shown in the permutation tables in Fig. 3. Red graphs denote allelic combinations expected to result in increased methylation, blue graphs denote allelic combinations expected to result in decreased levels of methylation, and gray graphs denote allelic combinations expected to result in additive levels of methylation. Error bars = SEM. (B) Average methylation levels of hybrid mC clusters when one parent is highly methylated, the other parent has no methylation, and the changes in hybrid methylation levels can only be attained by TCM or TCdM.


Fig. S5. Examples of altered allelic methylation in the hybrids through TCM or TCdM. (A) Allelic methylation levels for the Ler × C24 hybrid (see Fig. 4 for parents and C24 × Ler). (B) Example of a low TCM event (Locus C) and a decrease in mCG at several sites in the C24 × Ler hybrid (Locus E). mCG (blue), mCHG (red), and mCHH (green) methylated residues are plotted along the x axis (each tick mark equaling 100 nt), with the y axis representing the methylation level. The distribution and abundance of siRNA reads are plotted in red above each graph and are duplicated between the two alleles of the hybrids. Regions undergoing TCM and TCdM events are underlined in red and blue, respectively. (C) The number of siRNA reads over a SNP present at Locus F (arrow).


Fig. S6. Nonadditive methylation is associated with siRNAs. (A) The frequency of a given cytosine being methylated within siRNA clusters and ±1 kb flanking regions. Red, the cluster region; green, adjacent flanking 200 nt; blue, remaining 800 nt. Average methylation levels are indicated in the top graphs. n: clusters = ≈18,000; flanking regions = ≈9,000 (refer to SI Materials and Methods). (B) Average mCG levels of siRNA-associated and siRNA-independent hybrid mC clusters plotted against the expected MPV, showing an above-expected trend in mCG in siRNA-associated clusters. Dot colors denote categorization of mC status in the hybrid: red, nonadditive increase; blue, nonadditive decrease; gray, additive methylation. (C) Examples of genes correlating with altered methylation levels in the hybrids. mC clusters were located in or within ±1 kb of the described genes. Gray histograms show the average methylation level of the mC cluster (highlighted in yellow) and levels of gene expression obtained from mRNA-seq data. The next set of graphs depicts the distribution and abundance of siRNAs (red) and mC (blue) in both parents and each reciprocal hybrid. Arrows at the bottom of each example show gene orientation or location of the neighboring gene. Numbers at each extreme of the regions show genomic position. IC, insufficient coverage.


Fig. S7. (A) Allelic inheritance of clusters containing SNPs that were reported as nonadditively methylated in the whole dataset. mC clusters with a nonadditive increase in methylation are enriched for allelic contributions that would result in an increase in overall hybrid methylation levels. mC clusters with a nonadditive decrease in methylation are enriched for allelic mC contributions that would result in an overall decrease in hybrid methylation. (B) Average allelic methylation levels of clusters reported in A. Red denotes allelic combinations that would result in an increase in methylation, blue allelic combinations that would result in a decrease in methylation, and black allelic combinations that would result in additive methylation levels in the hybrid.

Dataset S1, Table S1. Overview of MethylC-seq data

Dataset S1 (XLSX)

Dataset S1, Table S2. Contextual methylation level differences

Dataset S1 (XLSX)

All C’s were divided into unmethylated [not significantly different from false methylation rate (FMR; as determined by the chloroplast conversion rate)], semimethylated (statistically different from FMR and <80%), and highly methylated (>80%). CG methylation is either unmethylated or highly methylated. CHG methylation is evenly distributed between semimethylated and highly methylated. Most CHH sites are considered semimethylated.


Dataset S1, Table S3. Shows the number of clusters formed for each sample

Dataset S1 (XLSX)

A cluster was formed when at least three mC were within 20 nt of each other. Cluster characteristics are similar between all four samples.

Dataset S1, Table S4. Differential methylation between C24 and Ler accessions

Dataset S1 (XLSX)

Dataset S1, Table S5. Methylation index was used to categorize cluster methylation levels into different categories

Dataset S1 (XLSX)

The methylation index values differed for each context as CG, CHG, and CHH methylation have different frequencies of methylation levels (Fig. S1).

Dataset S1, Table S6. The number of nonadditively methylated clusters in Arabidopsis hybrids

Dataset S1 (XLSX)

Only clusters with 75% or greater of the same C’s covered in parents and at least one hybrid were analyzed. Clusters were classed as nonadditively methylated if the methylation index score differed by ≥2 from the MPV.
*Nonadditive mC clusters in the “Any context” category refer to clusters having at least one context with altered methylation.

Dataset S1, Table S7. Both hybrids show similar trends in either additive or nonadditive methylation at the same locus

Dataset S1 (XLSX)

Dataset S1, Table S8. Raw data on mC’s trackable via SNPs contained in the same reads, the number that matched our clusters, and the number of mC’s that passed the filter and were used for analysis (Fig. S8)

Dataset S1 (XLSX)

Dataset S1, Table S9. Number of possible TCM and TCdM events in an intraspecific hybrid cross between C24 and Ler accessions

Dataset S1 (XLSX)

Dataset S1, Table S10. Numbers of mC clusters associated with siRNAs and independent of siRNAs together with whether they were nonadditively or additively methylated

Dataset S1 (XLSX)

Averages are given for all clusters and then clusters divided into intragenic and nonintragenic.

Dataset S1, Table S11. Pattern of candidate gene’s methylation level and siRNA level in Col

Dataset S1 (XLSX)

Data based on ref. 2. Table shows siRNA level in Col, the level of methylation, and which accession (C24 or Ler) it was similar to. If Columbia was unmethylated, then there is no way it can be altered in mutants of genes in the methylation pathways (N/A).
*At2g32160 was down-regulated in a rdd mutant.


Dataset S1, Table S12. Primer sequences

Dataset S1 (XLSX)

Dataset S1, Table S13. Filtering table for SNP analysis

Dataset S1 (XLSX)

This table shows the original number of C’s present in a cluster and the number required to be covered in the SNP analysis for us to study a particular cluster. This filtering system enabled a good correlation between original cluster methylation levels and SNP cluster methylation levels (Fig. S9) and resulted in 5,479 clusters that were analyzed during the SNP analysis.

Dataset S2. The MethylC-seq analysis was done by using two different clustering approaches

Dataset S2 (XLSX)

The first produced clusters with a minimum of three mC (each had to be within 20 bp of the next mC), while the second approach produced clusters with at least six mC (each had to be within 20 bp of the next mC). (A–H) Hybrid clusters were divided into 10 groups based on the similarity of methylation levels between parents (H, highly methylated; SH, semi-high; SL, semi-low; L/U, low or unmethylated; refer to Dataset S1, Table S5 for respective index scores). (A–D) The three mC-derived clustering method. (E–H) The six mC-derived clustering approach. Both clustering approaches show that the frequency of nonadditive methylation correlates with increased differences in parental methylation levels. (I and J) Profiling of the three mC clustering approach and the six mC clustering approach. Both approaches gave consistent results concerning genomic locations of additive and nonadditive mC clusters. (K) A list of 54 candidate genes whose altered hybrid expression levels may be a result of altered DNA methylation.
