Uncovering the Cannabis sativa methylome through Enzymatic ...
Transcript of Uncovering the Cannabis sativa methylome through Enzymatic ...
Brittany S. Sexton1, Louise Williams1, V.K. Chaithanya Ponnaluri1, Keerthana Krishnan1, Luo Sun1, Yvonne Helbert2, Lei Zhang2, Christine J. Sumner1, Fiona J. Stewart1, Bradley W. Langhorst1,
Timothy Harkins2, Kevin J. McKernan2, Eileen T. Dimalanta1, Theodore B. Davis11New England Biolabs, Inc., Ipswich, MA 01938 2Medicinal Genomics Corp., Woburn, MA 01801
Uncovering the Cannabis sativa methylome through Enzymatic Methyl-seq
INTRODUCTION
CONCLUSIONS
METHODS
RESULTS
REFERENCES
Cannabis sativa is an industrial crop producing more economic value than the top five crops inCalifornia combined. In the medical field, cannabinoids are used to treat and alleviate symptomsfor a wide range of diseases, including nausea in cancer patients undergoing chemotherapy. Sexdetermination in cannabis is an important economic factor. Pollination dramatically reducescannabinoid expression in female plants. To express high levels of cannabinoids, female plantsare grown in isolation of male pollen. However, female plants are capable of stress inducedhermaphroditism that can lead to acres of pollination and low cannabinoid yields.
5-methylcytosine (5mC) is an important epigenetic mechanism that regulates many cellularprocesses, including sex determination. To better understand the hermaphroditic process incannabis, more robust 5mC surveying tools are required. The current gold standard fordetermining 5mC, whole genome bisulfite sequencing (WGBS), introduces DNA damage and GC-biased sequences resulting in skewed methylation profiles. To further complicate matters, thecannabis genome is 66% AT-rich and 83% AT-rich after bisulfite conversion. Very few technologiescurrently address the complexity of plant methylation signals, and to date no methylome exists forcannabis.
NEBNext Enzymatic Methyl-seq (EM-seq™) is a novel enzymatic method used to detect 5mC.EM-seq successfully identified the 5mCs within the cannabis genome. Libraries were generatedusing genomic DNA extracted from female leaf for EM-seq and WGBS. To give insight into themethylation-based regulation of the hermaphroditic process, EM-seq libraries were also preparedusing genomic DNA from female flowers, female seeded flowers, and male flowers. We concludethat EM-seq is superior to WGBS and delivers higher library yields, more accurate methylationinformation, reduced DNA damage, increased sequencing length, and decreased GC-bias. EM-seq enables the study of the previously unknown cannabis methylome and opens up thepossibility to discover critical and exciting regulatory functions for the hermaphroditic process.
PCRAmplificationAPOBEC
Ultra II End Prep/
Ligation
ShearedGenomic
DNATET2/Oxidation
EnhancerIllumina
Sequencing
• Jamaican Lion genomic DNA was extracted from femaleclones (leaf, seeded and unseeded flowers) and male sibling(flowers) plants.
• 50 ng of genomic DNA, spiked with control (unmethylatedlambda DNA and CpG methylated pUC19) were shearedusing the Covaris® S2 instrument
• Sheared DNA was end-repaired then ligated to modifiedsequencing adaptors
• 5mC and 5hmC were protected from APOBEC deamination by TET2/Oxidation Enhancer
• Cytosines were deaminated to uracils with APOBEC• Libraries were amplified using NEBNext® Q5UTM Master Mix
and NEBNext Multiplex Oligos for Illumina (96 Unique DualIndex Primer Pairs, E6440S)
• Libraries were sequenced using an Illumina Nextseq 500,2x76 base paired reads.
• Bisulfite conversion was performed using Zymo Research EZDNA Methylation-GoldTM kit
SAMPLE PREPARATION
DATA ANALYSIS
Methyl KitMethylDackel
Methyl Extraction
Align(BWAmeth)
SamblasterDeduplicate Picard
• Reads were aligned to Jamaican Lion reference genome(August 2018 assembly) and analyzed using the tools inabove flowchart.
• Four contigs from this assembly showed anomalousmethylation patterns and were excluded.
8 PCR cycles6 PCR cycles
0
0.5
1
1.5
2
2.5
3
3.5
Female Leaf EM-seq-1 Female Leaf EM-seq-2 Female Leaf WGBS-1 Female Leaf WGBS-2
Perc
ent D
uplic
atio
n
0
5
10
15
20
25
Female Leaf EM-seq-1 Female Leaf EM-seq-2 Female Leaf WGBS-1 Female Leaf WGBS-2
Yiel
d (n
M)
HIGHER QUALITY SEQUENCING METRICS WITH EM-SEQ COMPARED TO WGBS
EM-seq libraries outperform WGBS forCannabis sativa input DNA as shown infigures A-E: (A) Less PCR cycles are required(6 vs 8 cycles respectively) for EM-seq. (B)EM-seq libraries have larger library insertsizes than WGBS. Additionally, the EM-seqprotocol has been optimized for both standardand large insert libraries. (C) EM-seq librarieshave lower duplication percentages. (D) GCdistribution is more even for EM-seq thanWGBS. (E) More even base coverage for EM-seq than WGBS. 50 million, 2 x 76 basereads were used for analysis (Figures A-D)and 130 million 2 x 76 base reads were usedfor Figure E.
C
A
E
Coverage Depth
% o
f Bas
es a
t thi
s De
pth
Insert Size (bp)
B
PCR
Yiel
d (n
M)
% D
uplic
atio
n
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
Coverage Depth
Proje
ct
Medic
inalG
enom
ics_Leaf_
WG
BS
_E
Mseq_C
om
bin
edB
am
s
0.0%
1.0%
2.0%
3.0%
4.0%
5.0%
6.0%
7.0%
8.0%
9.0%
10.0%
Avg. F
rac B
ases
20%
16%
12%
8%
4%
0%50 100 150 200 250 300 350 400 450 500 550 600 650 700 750 800
Insert Size
JL
0.00%
0.02%
0.04%
0.06%
0.08%
0.10%
0.12%
0.14%
0.16%
0.18%
0.20%
0.22%
Avg. %
of in
sert
s
% o
f Rea
ds
EM-seq (standard)EM-seq (large insert)WGBS
50 100 150 200 250 300 350 400 450 500 550 600 650 700
Norm
aliz
ed C
over
age
Percent GC
0 10 20 30 40 50 60 70 80 90 100
5 10 15 20 25 30
EM-seq-1 EM-seq-2 WGBS-1 WGBS-2
EM-seq-1 EM-seq-2 WGBS-1 WGBS-2
EM-seq WGBS10
0
4
2
6
8
1
0
1.0
0.2
0.4
0.6
0.8
1.2
1.4
1.6
1.8
2.0
2.2
25
0
5
10
15
20
0
0.5
1.0
1.5
2.0
2.5
3.0
3.5
WGBS
CANNABIS EM-SEQ LIBRARIES ARE SUPERIOR TO WGBS
DIFFERENTIAL CpG METHYLATION IDENTIFIED BETWEEN CANNABIS FLOWER TISSUES
A
A
EM-seq enables the identification of differential methylation across Cannabis sativa flower tissues: femaleflower and female seeded flower (clones) and the male flower (sibling). (A) % 5mC levels in the CpG, CHG,and CHH contexts for all flower EM-seq libraries. The female methylation levels for flower and seededflower were higher than the male flower indicating methylation patterns are potentially determined by sex.Unmethylated Lambda control was <0.5% methylated and pUC19 was >97.5% CpG methylated (data notshown). (B) Volcano plot of the significant (q<0.01) differential methylation calls for the CpG contextbetween (1) female flower and female seeded flower (2) female flower and male flower. The differentialmethylation calls for female flower and female seeded flower identified 30 hypomethylated CpGs but nohypermethylated CpGs. The differential methylation calls for female and male flower identified >50,000hypomethylated CpGs and >11,000 hypermethylated. (C) Comparison of the hypomethylated CpGs withdifferential methylation between the female flower samples & female and male flowers.
0
25
50
75
100
CpG CHG CHH CpG CHG CHH CpG CHG CHH
0
10
20
30
40
LC-MS Female Leaf EM-seq Female Leaf WGBS Female Leaf
B
0
25
50
75
100
CpG CHG CHH CpG CHG CHH CpG CHG CHH CpG CHG CHH
Tota
l % 5
mC
EM-seq Female Leaf WGBS Female Leaf
Female Flower-1
Female Flower-2
Female Seeded Flower-1
Female Seeded Flower-2
Male Flower-1
Male Flower-2
MethylatedUnmethylated
Genomic Location (504 bp)
MethylatedUnmethylated
Differential methylation analysis comparing female flower and seeded flower (clones) and the male flower(sibling) were performed. Significant differential methylation calls (q<0.01) for CpG context (1x coverageminimum) between female flower with female seeded flower and/or male flower were identified. (A)Region 1 BLASTs to the edestin gene family of the Cannabis sativa genome. These genes are linked tothe positive regulation of seed production. The female flower is CpG methylated in this region, suggestingthe expression of the genes are turned off, while the female seeded flower is unmethylated at this CpG.(B) Region 2 BLASTs to the THC acid synthase (THCA) gene of the Cannabis sativa genome and thisgene is linked to the positive regulation of THCA production. Interestingly, the female flower is CpGunmethylated in this region, suggesting the expression of the gene is turned on, while the male flower ismethylated at this CpG, suggesting the expression of the gene is turned off. Male flowers produce logscale less THCA.
(A) Comparison of total % 5mC determined using LC-MS, EM-seq and WGBS for female leaf. The % 5mCfor LC-MS is determined as a percentage of total cytosines ((5mC/(total C))x100). 5mC percentages for EM-seq and WGBS were determined by combining 5mC in the 3 methylated cytosine contexts (CpG. CHH,CHG). EM-seq cytosine methylation numbers are closer to LCMS methylation values than WGBS. (B) %5mC levels in the CpG, CHG, and CHH contexts for EM-seq and WGBS for female leaf. UnmethylatedLambda control was <0.5% methylated for EM-seq and <2% for WGBS and was >97.5% CpG methylatedfor both EM-seq and WGBS (data not shown). (C) CpG context correlations of EM-seq and WGBS libraries(1x minimum coverage). Both EM-seq and WGBS libraries were highly correlated between replicates andmethods (CHG and CHH context data not shown). 50 million, 2 x76 base reads were used for methylationanalysis.
q-va
lue
Methylation Difference
Hypo Hyper
Female & Female Seeded Flower Female & Male Flower
A B C
B
C
METHYLATION PROFILES IDENTIFIED GENES INVOLVED IN SEED AND CANNABINOID PRODUCTION
Genomic Location (992 bp)
Female Flower-1
Female Flower-2
Female Seeded Flower-1
Female Seeded Flower-2
Male Flower-1
Male Flower-2
0.0000
0.0025
0.0050
0.0075
0.0100
−100 −50 0 50 100meth.diff
qvalue factor(factor)hypo
0.0000
0.0025
0.0050
0.0075
0.0100
−100 −50 0 50 100meth.diff
qvalue
factor(factor)hypo
hyper
0.0100
0.0075
0.0050
0.0025
0.0000
-100 0 100Methylation Difference
-100 0 100
0.0100
0.0075
0.0050
0.0025
0.0000
100
75
50
25
0
% 5
mC
per C
con
text
40
30
20
10
0EM-seqLC-MS
100
75
50
25
0
% 5
mC
per C
con
text
Female Flower
Female Seeded Flower
Male Flower
q-va
lue
Female & Female Seeded Flower Female & Male Flower
RESULTS
EM-seq enables the first in depth analysis of the Cannabis sativa methylome anddifferential methylation identified genes involved in seed production and THCA production.
EM-seq libraries compared to WGBS libraries had:• Higher Library Yields with less PCR cycles• Lower Percent Duplication• More Even Base Coverage
• Larger Library insert Sizes• Less GC Bias• Similar Percentage Methylation as LC-MS
1. Faust, G. G., and I. M. Hall. “SAMBLASTER: Fast Duplicate Marking and Structural Variant ReadExtraction.”2. Pedersen et al. (2014) “Fast and Accurate Alignment of Long Bisulfite-Seq Reads.”3. Ryan D, Pederson, B “MethylDackel”4. Broad Institute. (2016). Picard tools.
DEM-seq WGBS
EM-seq WGBS
UUGTCGGAUUGC
TTGTCGGATTGC
Converted:
Sequenced:
CCGTCGGACCGC
mhm
CCGTCGGACCGC
ca/g
UUGTCGGAUUGC
TET2/Oxidation Enhancer
APOBEC
TTGTCGGATTGC
ca/g
CCGTCGGACCGC
mhm