www.sciencemag.org/cgi/content/full/science.aag0291/DC1
Supplementary Materials for
Structure of a yeast activated spliceosome at 3.5 Å resolution
Chuangye Yan, Ruixue Wan, Rui Bai, Gaoxingyu Huang, Yigong Shi*
*Corresponding author. E-mail: [email protected]
Published 21 July 2016 on Science First Release DOI: 10.1126/science.aag0291
This PDF file includes
Materials and Methods Figs. S1 to S23 Tables S1 to S4 Supporting References
Yan & Wan et al
Materials and Methods
Cef1-TAP tagging in S. cerevisiae
The reported cryo-EM structure of the S. pombe spliceosome likely reflects that of the
intron lariat spliceosomal (ILS) complex, because EM density for the 5’-exon is
largely absent (1). To isolate the spliceosomes at the early-to-intermediate stages, we
relied on an affinity purification protocol to screen five different protein tags,
including Brr2, Cef1, Hsh155, Prp8, and Snu114. For each protein tag, we
experimented with several different purification conditions mainly by altering the salt
concentration. In the end, Cef1 as the affinity tag gave rise to the best outcome in
terms of spliceosome yield and quality.
The carboxyl-terminus of Cef1 was TAP-tagged by a PCR-based gene
targeting method using the plasmid pF6Aa-CTAP-HphMX6 as a PCR template. The
TAP tag comprises protein A and calmodulin binding peptide. The PCR product was
transformed into haploid Saccharomyces cerevisiae (S. cerevisiae) cells by the lithium
acetate method (2), allowing homologous recombination. Transformants were
selected on hygromycin B-YPD solid medium. Correct integration of the tag into the
genome was confirmed by PCR at the sequence level and by Western blots at the
protein level. The resulting strain carries the TAP tag, followed by the HphMX6
marker, at the carboxyl-terminus of Cef1.
Purification of the spliceosomal complexes
Purification of the yeast spliceosomal complexes was carried out essentially as
described (3) (Fig. S1A). Briefly, 54 liters of S. cerevisiae were cultured in YPD
medium at 30 °C to an OD600 of 3.5~4. The cell pellets (~135 mL) were collected by
centrifugation and resuspended in Buffer A containing 20 mM HEPES-KOH, pH 7.9,
150 mM KCl, 1.5 mM MgCl2, and 20% glycerol. The cell suspension was dropped
into liquid nitrogen to form yeast beads with a diameter of 3-6 mm and pulverized to
powder using a SPEX 6870 Freezer Mill. The frozen cell powder was thawed at room
temperature and resuspended in Buffer A in the presence of a protease inhibitor
cocktail containing final concentrations of 0.5 mM phenylmethylsulphonyl fluoride
(PMSF), 2 mM benzamidine, 2.6 μg/ml aprotinin, 1.4 μg/ml pepstatin and 5 μg/ml
leupeptin. The cell lysate was first centrifuged at 18,000g for 1 hour and the
supernatant was centrifuged again at 100,000g for 1 hour, yielding ~132 mL of cell
extract and a pellet of cell debris. The supernatant was incubated with IgG
Sepharose-6 Fast Flow resin (GE Healthcare) and cleaved by TEV protease at 18 °C
for 1.5 hours in Buffer B containing 10 mM Tris-HCl, pH 8.0, 150 mM NaCl, 0.1%
NP40, 1 mM DTT, and 0.5 mM EDTA. The eluent was supplemented with 2 mM
CaCl2 and loaded onto calmodulin affinity resin (Stratagene). Finally, the complex
was eluted by the CEB buffer (10 mM Tris-HCl, pH 8.0, 75 mM NaCl, 1 mM
Mg(OAc)2, 1 mM imidazole, 0.01% NP40, 1 mM TCEP, 0.5 mM EGTA), and
concentrated using an Amicon Ultra-15 Centrifugal Filter Unit with an Ultracel-100 membrane
(Merck Millipore). The final eluent was analyzed by urea PAGE for RNA detection
and negative staining EM for particle intactness. The resulting complex was used for
sample preparation for cryo-EM imaging.
The purified sample contains a mixture of different spliceosomal complexes,
as judged by the presence of all five snRNA species (U2, U4, U5L, U5S, and U6) on
the urea PAGE gel (Fig. S1B) and the appearance of the particles by negative staining
EM (Fig. S1C). By applying the sample to 10-30% glycerol gradient centrifugation,
we were able to individually enrich the Bact and C complexes (Fig. S1D);
unfortunately, this additional purification step results in the loss and dissociation of
the majority of the spliceosomal complexes. A rough estimate indicates that only
about 30-40 percent of the original Bact or C complex remained intact after the
centrifugation step. Thus, although the additional purification step allows enrichment
of the individual spliceosomal complexes, the final yield is unacceptably low.
Because the different spliceosomal particles can be conveniently differentiated by
two-dimensional (2D) and three-dimensional (3D) classification during electron
microscopy (EM) data analysis, we decided not to apply this additional purification
step. This strategy proved to be effective.
EM data acquisition and processing
Uranyl acetate (2%, w/v) was used for negative staining. Briefly, the copper grids
supported by a thin layer of carbon film (Zhongjingkeyi Technology Co. Ltd) were
glow-discharged. 4 µl of the sample at a concentration of ~0.02 mg/ml were applied
onto the grid for 1 minute and stored at room temperature. Images were taken on an
FEI Tecnai Spirit Bio TWIN microscope operating at 120 kV to verify the sample
quality. The same carbon-coated copper grids as those used for negative staining were
used for cryo-EM specimen preparation. Cryo-EM grids were prepared with Vitrobot
Mark IV (FEI Company), using 8 °C and 100 percent humidity. Aliquots of 4 µl
sample at a concentration of ~0.27 mg/mL were applied to glow-discharged grids,
blotted for 2.5 seconds and plunged into liquid ethane cooled by liquid nitrogen.
Images were taken by an FEI Titan Krios electron microscope operating at 300 kV
with a nominal magnification of 22,500x. Images were recorded by a Gatan K2
Summit detector (Gatan Company) using the super-resolution mode, with a pixel size
of 0.653 Å. Defocus values varied from 1.6 to 2.6 μm. Each image was
dose-fractionated to 32 frames with a dose rate of ~8.2 counts/sec/physical-pixel
(~4.7 e-/sec/Å2), a total exposure time of 8.0 seconds, and 0.25 second per frame.
UCSFImage4 was used for all data collection (4).
Preliminary image processing
A total of 12,142 cryo-EM micrographs were collected. All 32 frames in each image
were aligned and summed using the whole-image motion correction program
MOTIONCORR (5), with 2-fold binning to a pixel size of 1.306 Å. The anisotropic
magnification distortion of the micrographs collected on the FEI Titan Krios
microscope was estimated by the program “mag_distortion_estimate” and corrected
by the program “mag_distortion_correct” (6). The defocus value of each image was
determined by CTFFIND3 (7).
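The pixel sizes and exposure time quoted in this section follow from simple arithmetic on the acquisition parameters. The sketch below only illustrates that bookkeeping. The function and variable names are hypothetical, and the binning factors of 4 and 8 are assumptions inferred from the quoted pixel sizes of 2.612 Å and 5.224 Å rather than values stated in the text.

```python
# Acquisition values quoted in the text.
SUPER_RES_PIXEL_A = 0.653   # super-resolution pixel size, in Å
N_FRAMES = 32               # frames per dose-fractionated image
SECONDS_PER_FRAME = 0.25    # exposure per frame, in seconds


def binned_pixel_size(base_pixel_a, bin_factor):
    """Return the pixel size (Å) after binning by an integer factor."""
    if bin_factor < 1:
        raise ValueError("bin_factor must be a positive integer")
    return base_pixel_a * bin_factor


# Total exposure is the number of frames times the per-frame exposure.
total_exposure_s = N_FRAMES * SECONDS_PER_FRAME

# Factor 2 is stated in the text (motion correction); 4 and 8 are
# assumptions inferred from the pixel sizes used for 3D and 2D classification.
for factor in (2, 4, 8):
    size = binned_pixel_size(SUPER_RES_PIXEL_A, factor)
    print(f"{factor}x binning: {size:.3f} Å per pixel")
print(f"total exposure: {total_exposure_s:.1f} s")
```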
An initial data set of 841 micrographs was collected and processed to
inspect the sample quality and composition (Fig. S2). 124,646 particles were
semi-autopicked using the reference-based particle picking subroutine in RELION (8).
The templates for particle picking were obtained from the 2D class averages
calculated from ~3,000 manually picked particles. Particle sorting and two rounds of
reference-free 2D classification were performed to remove ice spots, contaminants,
damaged particles and aggregates using particles binned to a pixel size of 5.224 Å,
yielding 53,066 particles (Fig. S2).
Relying on the STAR file derived from 2D classification, all 53,066 particles
were read back to their original images based on their refined centers. This procedure
is the same as that employed in structure determination of a late-stage spliceosome
from S. cerevisiae and a U4/U6.U5 tri-snRNP from S. pombe (1, 9). Examination of
the micrographs and comparison with the original semi-autopicked 124,646 particles
revealed three findings. First, most ice spots, contaminants, damaged particles and
aggregates have been successfully removed by prior classifications. Second, a sizable
fraction of the good particles appear to have been removed by prior classifications.
Third, some of the remaining particles are mis-centered or appear to be aggregates.
These considerations necessitated one round of manual particle picking and
discarding, starting from the refined coordinate files, so as to improve the overall
quality of the picked particles. This step yielded a final set of 55,772 particles for
further processing (Fig. S2).
The published 3.6 Å spliceosome cryo-EM map was low-pass filtered to 40 Å by
CHIMERA (10) and used as the reference for 3D classification. Using binned
particles with a pixel size of 2.612 Å, one round of 3D classification was performed
by global search. As predicted from our biochemical characterization (Fig. S1), one
class of the particles, representing only about 8.3 percent of the total, displays an
umbrella appearance that is similar to the reported EM structure of the Bact complex
(11). These particles are the subject of this study. In addition, another class,
representing about 34.5 percent of the total particles, exhibits an appearance that is
characteristic of the C complex or the ILS complex (Fig. S2). These particles are the
subject of an ongoing study that will be reported in the near future. Notably, the 3D
reconstruction of the Bact complex differs in a major way from that of the C complex
or ILS complex by having a chunk of density on the periphery, which represents the
SF3b complex, the RES complex, and well-defined Brr2 (see later). The remaining three
classes, representing 57.2 percent of the total, fail to display features that are
characteristic of the known spliceosomal complexes and were thus excluded from
immediate investigation.
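As a rough guide to what these percentages mean in particle numbers, the sketch below converts the class fractions quoted above into approximate counts for the 55,772-particle set. The percentages in the text are rounded ("about"), so the counts are estimates rather than reported values, and the class labels are hypothetical shorthand.

```python
TOTAL_PARTICLES = 55_772  # preliminary data set, from the text

# Class fractions (percent) quoted in the text; labels are shorthand only.
class_percentages = {
    "Bact-like": 8.3,
    "C/ILS-like": 34.5,
    "other three classes": 57.2,
}

# Approximate particle counts implied by the rounded percentages.
approx_counts = {
    label: round(TOTAL_PARTICLES * pct / 100.0)
    for label, pct in class_percentages.items()
}

for label, count in approx_counts.items():
    print(f"{label}: ~{count} particles")
```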
Image processing
The above-described procedure was applied to all 12,142 micrographs (Fig. S3). In
total, 1,304,231 particles were picked using the reference-based particle picking
subroutine in RELION (8). Particle sorting and two rounds of reference-free 2D
classification were performed to remove ice spots, contaminants, damaged particles
and aggregates, yielding 573,541 particles. All 573,541 particles were read back to
their original images based on their refined centers. One round of manual particle
picking and discarding was performed, resulting in the removal of 97,667 bad
particles and inclusion of 285,893 new particles. In the end, a data set of 761,767
particles was used for further processing.
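The particle counts reported in this paragraph can be checked against one another with a few lines of arithmetic. The sketch below uses only the numbers given above; the variable names are hypothetical.

```python
# Counts quoted in the text.
auto_picked = 1_304_231       # reference-based picking
after_2d = 573_541            # after sorting and 2D classification
removed_manually = 97_667     # bad particles discarded by hand
added_manually = 285_893      # new particles picked by hand

# Final data set: particles kept after 2D classification,
# minus manual removals, plus manual additions.
final_set = after_2d - removed_manually + added_manually

# Share of the auto-picked particles that survived 2D classification.
retained_pct = 100.0 * after_2d / auto_picked

print(f"final data set: {final_set} particles")
print(f"retained after 2D classification: {retained_pct:.1f} percent")
```

The computed total agrees with the 761,767 particles stated in the text.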
Our preliminary analysis revealed that the spliceosomal Bact complex is only
represented by about 8.3 percent of the total particles (Fig. S2); thus finding all
particles that represent the genuine Bact complex is important to the improvement of
data quality. Based on our experience, one round of 3D classification may be
insufficient for selection of all Bact particles into one class or Bact-related classes, and
some of the Bact particles might be classified into other spliceosomal complexes or
even other contaminating complexes such as the ribosome. Our 3D classification
results also show that different parameters often result in the classification of different
percentages for the Bact complex. To avoid the problem of discarding good particles
for the low-abundance complex, we simultaneously performed three independent 3D
classifications (K=5, 6, and 7) (Fig. S3). Then we merged all four classes that appear
to represent the Bact complex; using an in-house script, we removed the duplicated
particles according to the unique index of each particle given by RELION. This
procedure allowed us to generate 151,323 particles, representing 19.8 percent of the
total. This procedure was repeated one more time, resulting in 105,685 good particles
(13.9 percent of the total). The Bact complex was gradually enriched. A third round of
3D classification (K=4) yielded 84,486 particles in one major class, which have an
average resolution of 3.9 Å after auto-refinement. Following per-particle motion
correction and radiation-damage weighting (known as particle polishing) (12), these
polished particles give a reconstruction with an improved resolution of 3.58 Å after
auto-refinement. Finally, an additional round of 3D classification was performed
without alignment using the refined polished particles. 77,312 particles in one class,
or 91.5 percent of the input, yield a final reconstruction with an average resolution of
3.52 Å (Fig. S3). The angular distribution of the particles used for the final
reconstruction of the Bact complex is reasonable, and the refinement of the atomic
coordinates did not suffer from severe overfitting (Fig. S4). The resulting density
maps show clear features for the secondary structural elements and amino acid side
chains for most protein components of the Bact complex (Figs. S5-S12). The RNA
elements and their interacting proteins are also well defined by the EM density maps
(Figs. S13-S16).
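The in-house script used to merge classes from independent 3D classifications is not described beyond the statement that duplicates were removed by each particle's unique index. A minimal sketch of that idea is given below. It is not the authors' script: the `image_name` key is a hypothetical stand-in for the RELION particle index, and the three input lists are toy data.

```python
def merge_unique(particle_sets, key="image_name"):
    """Merge particle lists, keeping the first occurrence of each identifier."""
    seen = set()
    merged = []
    for particles in particle_sets:
        for particle in particles:
            identifier = particle[key]
            if identifier not in seen:
                seen.add(identifier)
                merged.append(particle)
    return merged


# Toy selections standing in for the Bact-like classes from three runs.
run_k5 = [{"image_name": "000001@mic_a"}, {"image_name": "000002@mic_a"}]
run_k6 = [{"image_name": "000002@mic_a"}, {"image_name": "000003@mic_b"}]
run_k7 = [{"image_name": "000003@mic_b"}, {"image_name": "000004@mic_b"},
          {"image_name": "000001@mic_a"}]

merged = merge_unique([run_k5, run_k6, run_k7])
print(len(merged), "unique particles")
```

Taking the union in this way keeps a particle if any one of the runs assigns it to the class of interest, which is the point of running several classifications for a low-abundance complex.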
Reported resolutions are calculated on the basis of the gold-standard FSC
0.143 criterion, and the FSC curves were corrected for the effects of a soft mask
using high-resolution noise substitution (13). Prior to visualization, all
density maps were corrected for the modulation transfer function (MTF) of the
detector, and then sharpened by applying a negative B-factor that was estimated using
automated procedures (14). Local resolution variations were estimated using ResMap
(15).
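To make the FSC 0.143 criterion concrete, the sketch below finds the spatial frequency at which an FSC curve first falls below 0.143, using linear interpolation between sampled shells, and reports its reciprocal as the resolution. It is an illustration only: the curve is toy data rather than data from this study, the function name is hypothetical, and the mask correction by noise substitution is not implemented.

```python
def resolution_at_threshold(spatial_freqs, fsc_values, threshold=0.143):
    """Return the resolution (Å) where the FSC first drops below threshold.

    spatial_freqs are in 1/Å and must be increasing. Returns None if the
    curve never crosses the threshold.
    """
    for i in range(1, len(fsc_values)):
        prev_fsc, curr_fsc = fsc_values[i - 1], fsc_values[i]
        if prev_fsc >= threshold > curr_fsc:
            # Linear interpolation between the two neighbouring shells.
            fraction = (prev_fsc - threshold) / (prev_fsc - curr_fsc)
            freq = spatial_freqs[i - 1] + fraction * (
                spatial_freqs[i] - spatial_freqs[i - 1]
            )
            return 1.0 / freq
    return None


# Toy FSC curve (not from the paper).
freqs = [0.10, 0.20, 0.30, 0.40]   # 1/Å
fsc = [0.99, 0.80, 0.243, 0.043]

resolution = resolution_at_threshold(freqs, fsc)
print(f"resolution at FSC = 0.143: {resolution:.2f} Å")
```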
Model building and refinement
Due to a wide range of resolution limits for the various regions of the spliceosomal
Bact complex, we combined de novo model building and homologous structure
modeling to generate an atomic model (Table S1). Identification and docking of the
components of the Bact complex were facilitated by the published structures of the
spliceosome at 3.6 Å resolution and the tri-snRNP at 3.8 Å resolution (1, 9). The
proteins and the corresponding PDB accession codes are summarized in Table S2:
3JCM for Prp8, Snu114, U5 RNA, U5 Sm ring and 5’-exon of pre-mRNA; 3JB9 for
U6 snRNA, U2 snRNA, all proteins of the NTC core complex, all NTC-related
proteins, Prp17 and the branch point sequence of pre-mRNA; 2K0A for
Rds3; 1CVJ for Hsh49; 2MKC for the RES complex; 2JKD for Pml1; 4C9B for
Cwc22; 2CSY for Cwc24; 3BKP for Cwc27; and 2OZB for Prp2. The atomic
coordinates of Syf2 and part of Cef1 were obtained from the unpublished cryo-EM
structure of the spliceosomal C complex. These structures were docked into the
density map using COOT (16) and fitted into density using CHIMERA (10).
The atomic models of U5 snRNA, Prp8, Snu114 from the published
cryo-EM structure of the tri-snRNP (9) and the crystal structure of Brr2 (accession
code 5DCA (17)) were directly docked into the density maps and manually rebuilt
using COOT (16). The atomic models of Rds3, Pml1, and NTC-related (NTR) proteins
were generated by CHAINSAW (18) and their backbones were manually adjusted using
COOT (16). After that, automated model rebuilding was performed with RosettaCM
using the adjusted model as the template and the experimental cryo-EM density as a
guide (19-21). The Rosetta distance homology modeling function “RosettaCM” was
specifically designed to assist cryo-EM model building. In this step, we allowed
Rosetta to generate 10 models for each protein and selected the best model by
individually comparing the models with the cryo-EM density maps. Then the
hydrogen atoms of the generated model were removed and model building was
further performed manually using COOT (16). An atomic model for Rse1 was first
predicted by the I-TASSER server (22), and the model was further manually corrected
and rebuilt using COOT (16).
For those protein sequences that lack a homologous structure, de novo model
building was performed; these are marked in Table S2 with a “De novo building”
annotation. These proteins include Hsh155, Cus1, Ysf3, Prp11, Bud13, Cwc21, and
parts of Cwc22 and Cwc24. The chemical properties of proteins and amino acids were
considered to facilitate model building. Sequence assignment was guided mainly by
bulky residues such as Phe, Tyr, Trp and Arg. Unique patterns of sequences were
exploited for validation of residue assignment.
The RNA sequence assignment was greatly aided by the cryo-EM structures
of the ILS complex (1, 23) and the U4/U6.U5 tri-snRNP (9), reported secondary
structures, published base pairing specifics, and the relative sizes of the purine and
pyrimidine bases. The RNA sequences were manually built using COOT (16).
Pre-mRNA was de novo modeled on the basis of the EM density maps using COOT.
The RNA nucleotides, together with all protein components, were refined using
REFMAC in reciprocal space (24). To further improve the geometries of the RNA
nucleotides, the RNA elements alone were adjusted using RCrane (25). The
conformations of the RNA components were further refined using phenix.erraser (26).
ERRASER is a Rosetta program for modeling RNA nucleotides into density.
On the basis of the EM density maps, we identified four metal ions that are
bound by nucleotides in the ISL of U6 snRNA. These metal ions were tentatively
assigned as Mg2+. In all cases, the local maxima of the EM density that may
correspond to ions are 2.0–2.4 Å away from the oxygen atoms of the phosphate
groups, consistent with the metrics for Mg2+ coordination (27-31). In contrast, K+ is
usually measured at 2.8–3.5 Å from the coordinating ligands (27, 28, 32). Therefore,
the densities seen here are likely those of Mg2+. Despite the high likelihood, we
acknowledge that at the reported resolution we cannot unambiguously assign these
metal ions to Mg2+. In addition, we cannot conclusively differentiate Mg2+ from water
molecules, although water molecules should be much less visible in the EM density
maps.
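The distance argument above can be written as a simple range check. The sketch below uses only the two distance windows quoted in the text. The function name and return strings are hypothetical, and, as noted above, such a check cannot by itself distinguish Mg2+ from water.

```python
# Ion-to-ligand distance windows quoted in the text, in Å.
MG_RANGE = (2.0, 2.4)
K_RANGE = (2.8, 3.5)


def classify_ion_distance(distance_a):
    """Label a density-peak-to-oxygen distance by the window it falls in."""
    if MG_RANGE[0] <= distance_a <= MG_RANGE[1]:
        return "consistent with Mg2+"
    if K_RANGE[0] <= distance_a <= K_RANGE[1]:
        return "consistent with K+"
    return "outside both ranges"


# Example distances (illustrative values, not measurements from the map).
for d in (2.2, 3.0, 2.6):
    print(f"{d:.1f} Å: {classify_ion_distance(d)}")
```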
Structure refinement of individual proteins was carried out in real space using the
phenix.real_space_refine application in PHENIX (33), with secondary
structure and geometry restraints to prevent over-fitting. The final overall model was
refined against the overall 3.52 Å map using REFMAC in reciprocal space (24), using
secondary structure restraints that were generated by ProSMART (34). Overfitting of
the overall model was monitored by refining the model in one of the two independent
maps from the gold-standard refinement approach, and testing the refined model
against the other map (35) (Fig. S4B).
Protein structures in the Bact complex were individually validated through
examination of their Molprobity scores, statistics of Ramachandran plots, and
EMRinger scores (Table S3). Only protein structures that were solved by homology
modeling or de novo building in Table S2 are included in this analysis. For obvious
reasons, those structures that were fitted into the cryo-EM density maps by rigid-body
docking were omitted from this model validation. Molprobity scores were calculated
as described (36). EMRinger scores were calculated as described (37). EMRinger is a
side chain–directed model and map validation tool for cryo-EM structure
determination. EMRinger evaluates how precisely an atomic model is fitted into the
cryo-EM map during refinement. EMRinger scores should be above 1.0 for
well-refined structures with maps in the 3- to 4-Å range. The RNA nucleotides in the
Bact complex were validated directly by the Molprobity server, and the results are
shown in Table S4.
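The EMRinger guideline stated above (scores above 1.0 for maps in the 3- to 4-Å range) can be applied as a simple filter. The sketch below does this for a handful of scores copied from Table S3. The subset is chosen only for illustration, and the function name is hypothetical.

```python
EMRINGER_THRESHOLD = 1.0  # guideline quoted in the text

# A few EMRinger scores copied from Table S3 (illustrative subset).
emringer_scores = {
    "Prp8": 3.67,
    "Snu114": 3.39,
    "Brr2": 1.89,
    "Pml1": 0.85,
    "Clf1": 1.72,
}


def below_threshold(scores, threshold=EMRINGER_THRESHOLD):
    """Return the names whose score falls below the threshold, sorted."""
    return sorted(name for name, score in scores.items() if score < threshold)


flagged = below_threshold(emringer_scores)
print("below guideline:", flagged)
```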
Table S1 Cryo-EM data collection and refinement statistics.
Data collection
EM equipment | FEI Titan Krios
Voltage (kV) | 300
Detector | Gatan K2
Pixel size (Å) | 1.306
Electron dose (e-/Å2) | 45.6
Defocus range (µm) | 1.6~2.6
Reconstruction
Software | RELION 1.4
Number of used Particles | 77,312
Accuracy of rotation (˚) | 0.462
Accuracy of translation (pixels) | 0.31
Final Resolution (Å) | 3.52
Model building software | Coot, Rosetta, RCrane
Refinement
Software | Phenix & Refmac
Map sharpening B-factor (Å2) | -84.6
Average Fourier shell correlation | 0.865
R-factor | 0.299
Model composition
Protein residues | 13,505
RNA nucleotides | 357
GTP | 1
ATP | 1
Validation
R.m.s deviations
Bonds length (Å) | 0.010
Bonds Angle (˚) | 1.311
Ramachandran plot statistics (%)
Preferred | 92.61
Allowed | 5.49
Outlier | 1.90
Table S2 Summary of model building for the yeast Bact spliceosomal complex.
Molecule | Length | Domain/Region | PDB code | Modeling | Resolution (Å)
U5 snRNP
U5 snRNA | 214 | 28:183 | 3JCM | Homology modeling | 2.8~4.0
Prp8 | 2413 | N-terminal Domain (127:839); RT finger/palm (840:1253); Thumb/X (1254:1377); Linker (1378:1650); Endonuclease (1651:1829); RNaseH-like (1830:2085); Jab1/MPN (2148-2398) | 3JCM / 4BGD | Homology modeling / Homology modeling | 2.8~4.0 / ~4.0
Snu114 | 984 | 67:975 | 3JCM | Homology modeling | 2.8~4.0
Brr2 | 2176 | 113:2163 | 5DCA | Homology modeling | 3.0~4.0
SmB1 | 196 | Sm fold | 3JCM | Rigid docking | 4.0~6.0
SmD1 | 146 | Sm fold | 3JCM | Rigid docking | 4.0~6.0
SmD2 | 110 | Sm fold | 3JCM | Rigid docking | 4.0~6.0
SmD3 | 101 | Sm fold | 3JCM | Rigid docking | 4.0~6.0
SmE1 | 94 | Sm fold | 3JCM | Rigid docking | 4.0~6.0
SmF1 | 86 | Sm fold | 3JCM | Rigid docking | 4.0~6.0
SmG1 | 77 | Sm fold | 3JCM | Rigid docking | 4.0~6.0
U6 snRNP
U6 snRNA | 112 nt | 1:106 | 3JB9 | Homology modeling | 2.8~4.0
U2 snRNP
U2 RNA | 1175 nt | 1:66 | 3JB9 | Homology modeling | 2.8~4.0
Rse1 | 1361 | WD40 domain (56:1361) | - | I-TASSER/Modeling | 3.0~3.8
Hsh155 | 971 | N-terminal Domain (16:74) | - | De novo building | 3.0~3.5
 | | HEAT Repeat domain (156:971) | - | De novo building | 3.0~3.5
Cus1 | 436 | 131:289 | - | De novo building | 3.0~4.0
Rds3 | 107 | PHF domain (2:104) | 2K0A | Homology modeling | 3.0~3.5
Ysf3 | 85 | 1:84 | - | De novo building | 3.0~3.5
Hsh49 | 213 | 2 RRM domains | | Rigid docking | ~8
Prp9 | 530 | - | - | Not modelled | -
Prp11 | 266 | Zinc Finger domain (1:108) | - | De novo building | 3.0~3.5
Prp21 | 280 | - | | Not modelled | -
RES complex
Bud13 | 266 | 213:265 | 2MKC | De novo building | 3.0~4.2
Pml1 | 204 | N-terminal Domain (20-42) | 2MKC | Homology modeling | 3.5~4.2
 | | Forkhead associated domain | 2JKD | Rigid docking | 4.0~5.0
Ist/Snu17 | 194 | RRM_ist3 like domain | 2MKC | Homology modeling | 3.5~4.2
NTC/Prp19 Complex
Clf1 | 687 | TPR domain (40:275) | 3JB9 | Homology modeling | 3.0~4.5
 | | - | | Rigid docking | ~30
Syf1 | 859 | - | | Rigid docking | ~30
Cef1 | 590 | Myb Domain (9:111) | | Homology modeling | 2.8~3.5
 | | - | | Rigid docking | ~30
Prp19 | 503 | - | | Rigid docking | ~30
Syf2 | 215 | 92:215 | From C complex | Rigid docking | ~7
Snt309 | 175 | - | 3JB9 | Rigid docking | ~30
NTC-Related proteins
Bud31 | 157 | 1:157 | 3JB9 | Homology modeling | 3.0~3.5
Cwc2 | 339 | 1:261 | 3JB9 | Homology modeling | 3.0~4.0
Cwc15 | 175 | 3:41/127:175 | 3JB9 | Homology modeling | 3.0~4.0
Ecm2 | 364 | RRM domain (3:288) | 3JB9 | Homology modeling | 3.0~5.0
Prp45 | 379 | 34:350 | 3JB9 | Homology modeling | 3.0~4.0
Prp46 | 451 | WD40 domain (111:447) | 3JB9 | Homology modeling | 2.8~3.5
Known Splicing Factors
Cwc21 | 135 | 2:28 | - | De novo building | 3.0~3.5
Cwc22 | 577 | MIF4G domain (11:263) | 4C9B | Rigid docking | 5~10
 | | MA3 domain (279:485) | - | De novo building | 3.0~3.5
Cwc24 | 259 | Zf-CCCH Domain (126:169) | - | De novo building | 2.8~3.2
 | | RING domain | 2CSY | Homology modeling | 3.5~4.5
Cwc27 | 301 | Cyclophilin domain (4:170) | 3BKP | Homology modeling | 3.0~4.0
Prp2 | 876 | | 2OZB | Rigid docking | 5~15
Spp2 | 185 | - | - | Not modelled | -
Yju2 | 278 | - | - | Not modelled | -
Step2 proteins
Prp17 | 455 | 50:75 | 3JB9 | Homology modeling | 3.0~4.2
Pre-mRNA
Pre-mRNA | - | 61 nt | - | De novo building | 2.8~4.2
Table S3 Summary of model validation for individual proteins of the yeast Bact complex (Proteins solved by homology modeling or de novo building in Table S2 are included here).
*EMRinger: a side chain–directed model and map validation tool for 3D cryo-electron microscopy that assesses the precise fitting of an atomic model into the map during refinement. To validate the model-to-map correctness of atomic models from cryo-EM, refinement should result in EMRinger scores above 1.0 for well-refined structures with maps in the 3- to 4-Å range.
Molecule | Molprobity Scores | Ramachandran Preferred (%) | Ramachandran Allowed (%) | Ramachandran Outlier (%) | EMRinger* Score
Prp8 | 2.06 | 93.00 | 5.72 | 1.28 | 3.67
Snu114 | 2.13 | 92.09 | 6.54 | 1.38 | 3.39
Brr2 | 2.09 | 94.24 | 5.04 | 0.71 | 1.89
Rse1 | 2.33 | 91.15 | 6.96 | 1.89 | 2.88
Hsh155 | 1.99 | 94.83 | 2.87 | 2.30 | 3.57
Cus1 | 2.16 | 94.56 | 5.44 | 0.00 | 2.47
Rds3 | 2.83 | 86.14 | 11.88 | 1.98 | 4.43
Ysf3 | 2.00 | 90.24 | 8.54 | 1.22 | 4.75
Prp11 | 1.94 | 92.63 | 6.32 | 1.05 | 2.98
Bud13 | 2.09 | 94.12 | 5.88 | 0.00 | 2.71
Pml1 | 2.02 | 95.78 | 3.61 | 0.60 | 0.85
Ist/Snu17 | 2.51 | 94.07 | 3.70 | 2.22 | 3.24
Clf1 | 2.09 | 94.87 | 5.13 | 0.00 | 1.72
Cef1 | 2.00 | 93.91 | 4.35 | 1.74 | 4.09
Bud31 | 2.01 | 91.61 | 6.45 | 1.94 | 3.00
Cwc2 | 2.22 | 93.44 | 4.25 | 2.32 | 2.43
Cwc15 | 1.79 | 89.23 | 10.77 | 0.00 | 4.78
Ecm2 | 2.27 | 88.14 | 10.73 | 1.13 | 1.57
Prp45 | 2.02 | 90.34 | 7.14 | 2.52 | 2.93
Prp46 | 2.20 | 90.45 | 7.46 | 2.09 | 3.28
Cwc22 | 1.85 | 92.68 | 5.85 | 1.46 | 3.45
Cwc24 | 2.53 | 87.16 | 11.01 | 1.83 | 2.86
Cwc27 | 2.27 | 89.04 | 8.90 | 2.05 | 3.13
Table S4 Summary of model building, refinement and validation for RNA components of the yeast Bact spliceosome complex.
Model building Software | Coot & RCrane
Refinement Software | Phenix/Phenix.Erraser
Validation | Molprobity Server
Validation | All RNAs | U6 snRNA | U5 snRNA | U2 snRNA | Pre-mRNA
Clash scores | 4.86 | 1.82 | 2.15 | 3.85 | 8.55
Correct sugar puckers (%) | 99.44 | 100.00 | 99.15 | 100.00 | 98.59
Good backbone conf. (%) | 80.11 | 85.44 | 82.05 | 86.36 | 63.38
Good bonds (%) | 99.98 | 100.00 | 100.00 | 99.94 | 100.00
Good angles (%) | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
The percentages of correct sugar puckers, good backbone conformations, good angles, and good bonds were calculated by subtracting the percentages of Probably Wrong sugar puckers, Bad backbone conformations, Bad angles, and Bad bonds reported in the MolProbity server, respectively, from 100 percent.
Fig. S1 Purification and characterization of the spliceosomal complexes from Saccharomyces cerevisiae (S. cerevisiae). (A) A cartoon diagram of the purification protocol. The protein Cef1 was tagged with protein A and a calmodulin binding peptide. (B) The affinity-purified sample contained a mixture of different spliceosomal complexes. Shown here is a denaturing Urea-PAGE gel stained by SYBR® Gold. At least five major RNA species are clearly present, with their sizes corresponding to those of U6 snRNA (112 nucleotides), U4 snRNA (160 nucleotides), two forms of U5 snRNA (179 and 214 nucleotides), and U2 snRNA (1175 nucleotides). (C) A representative electron microscopy (EM) micrograph of the affinity-purified sample stained by uranyl acetate. At least three different spliceosomal complexes are present. Scale bar, 100 nm. (D) Two representative negative-stained EM micrographs of the sample that had been purified through one round of 10-30% glycerol gradient centrifugation. Scale bar, 100 nm. The spliceosomal Bact complex (right) and the C complex (left) had been greatly enriched, but the concentration and final yield for these complexes became unacceptably low. We decided to directly use the affinity-purified sample for cryo-EM data acquisition and to rely on two-dimensional (2D) and three-dimensional (3D) classifications to separate the different spliceosomal complexes.
Fig. S2 Analysis of the initial data set of 841 micrographs of the
affinity-purified spliceosomal complexes from S. cerevisiae. This analysis shows
that a small proportion of the particles (about 8.3 percent) corresponds to the Bact
complex. Thus enriching the Bact complex is important for data processing. Please
refer to Materials and Methods for details. This figure, together with Figs. S3 and S4A,
was prepared using CHIMERA (10). All other structural images were created using
PyMol (38).
Fig. S3 A flow chart for the cryo-EM data processing and structure determination of the spliceosomal Bact complex from S. cerevisiae. The final reconstruction has an average resolution of 3.52 Å. Please refer to Materials and Methods for details.
Fig. S4 Cryo-EM analysis of the spliceosomal Bact complex from S. cerevisiae.
(A) Angular distribution of the particles used for the final reconstruction of the
spliceosomal Bact complex. Each cylinder represents one view and the height of
the cylinder is proportional to the number of particles for that view. Two orientations
of the Bact complex are shown. (B) FSC curves of the final refined model versus the
overall 3.52 Å map it was refined against (black); of the model refined in the first of
the two independent maps used for the gold-standard FSC versus that same map (red);
and of the model refined in the first of the two independent maps versus the second
independent map (green). The small difference between the red and green curves
indicates that the refinement of the atomic coordinates did not suffer from severe
overfitting.
Fig. S5 EM density maps for Prp8 in the spliceosomal Bact complex. Shown here are EM density maps for the N-domain (A), RT Palm/Finger (B), Thumb/X (C), Linker (D), endonuclease domain (E), RNaseH-like domain (F), and representative secondary structural elements from these regions of Prp8 (G). The side chain features for many residues are clearly visible, allowing assignment of specific amino acids.
Fig. S6 EM density maps for Snu114 (Cwf10 in S. pombe) and Brr2. (A) Overall EM density maps for Snu114. (B) EM density maps for 12 representative secondary structural elements. Bulky residues are labeled. (C) Overall EM density maps for Brr2. (D) EM density maps for four α-helices and two β-strands of Brr2.
Fig. S7 EM density maps for Rse1 and Rds3. (A) Overall EM density maps for Rse1. It contains three β-propellers, each comprising seven WD40 repeats. (B) EM density maps for 14 representative secondary structural elements of Rse1. The quality of the density maps allowed de novo modeling of Rse1. (C) Overall EM density maps for Rds3. (D) EM density maps for two representative regions of Rds3.
Fig. S8 EM density maps for Hsh155, Ysf3, and Cus1. (A) Overall EM density maps for Hsh155. (B) EM density maps for 18 representative α-helices of Hsh155. The quality of the density maps allowed de novo modeling of Hsh155. (C) Overall EM density maps for Ysf3. (D) EM density maps for two representative α-helices of Ysf3. (E) Overall EM density maps for Cus1. (F) EM density maps for four representative secondary structural elements of Cus1. The density maps allowed de novo modeling of Cus1.
Fig. S9 EM density maps for the retention and splicing (RES) complex. (A) Overall EM density maps for the RES complex. The RES complex comprises Pml1, Bud13, and Snu17. The quality of the density maps allowed de novo modeling of these three proteins. The NTR component Prp45 interacts with both Pml1 and Bud13 to stabilize the RES complex. EM density maps of the four boxed regions are shown in panels B through E. (F) EM density maps for Bud13 and a representative α-helix. (G) EM density maps for Snu17 and a representative α-helix.
Fig. S10 EM density maps for Prp45, Prp46, and Clf1. (A) Overall EM density maps for the highly extended protein Prp45. (B) EM density maps for seven representative secondary structural elements of Prp45. (C) Overall EM density maps for the β-propeller protein Prp46. (D) EM density maps for nine representative β-strands of Prp46. (E) Overall EM density maps for the HAT repeat protein Clf1 (Syf3 in S. pombe). (F) EM density maps for six representative α-helices of Clf1.
Fig. S11 EM density maps for Bud31, Cwc2, Ecm2, and Prp11. (A) Overall EM density maps for Bud31. (B) EM density maps for four local structural elements of Bud31. (C) Overall EM density maps for Cwc2. (D) EM density maps for six local structural elements of Cwc2. (E) Overall EM density maps for Ecm2. (F) EM density maps for two representative local structural elements of Ecm2. (G) Overall EM density maps for the SF3a component Prp11 (SF3a66 in human). (H) EM density maps for two representative α-helices of Prp11.
Fig. S12 EM density maps for the splicing factors Cwc22, Cwc24 and Cwc27, and the NTC component Cef1 (Cdc5 in S. pombe). (A) Overall EM density maps for Cwc22. (B) EM density maps for four representative α-helices of Cwc22. (C) Overall EM density maps for Cwc24. (D) EM density maps for four local structural elements of Cwc24. (E) Overall EM density maps for Cwc27. (F) EM density maps for two representative local structural elements of Cwc27. (G) Overall EM density maps for Cef1. (H) EM density maps for five representative α-helices and one loop of Cef1.
Fig. S13 EM density maps of the RNA elements. (A) Overall EM density maps for the RNA elements. The four RNA molecules are color-coded. Two perpendicular views are shown. (B) Overall EM density maps for U5 snRNA. (C) Two close-up views on the EM density maps of loop I of U5 snRNA (left panel) and its base-pairing interactions with the 5’-exon sequences (right panels). (D) Two close-up views on the EM density maps of the duplex regions of U5 snRNA. (E) Three close-up views on the EM density maps of U5 snRNA.
Fig. S14 EM density maps of U6 snRNA and the active site. (A) Overall EM density maps for U6 snRNA. (B) Two close-up views of the local EM density maps of U6 snRNA. (C) Two perpendicular views of the EM density maps of the intramolecular stem loop (ISL) of U6 snRNA and Helix I of the U2/U6 duplex. (D) A close-up view on Helix I of the U2/U6 duplex. (E) A close-up view on the RNA triplex between U2 and U6 snRNA. (F) A close-up view on the structural Mg2+ ions that help stabilize the ISL of U6 snRNA and its surrounding nucleotides. (G) A close-up view on the duplex between ACAGA box and the 5’SS of the intron. (H) A close-up view on the loop I base-pairing interactions with 5’-exon sequences. (I) Two close-up views of the active-site magnesium ions and their surrounding structural elements.
Fig. S15 EM density maps of U2 snRNA and the surrounding structural components. (A) EM density maps of the duplex between U2 snRNA and the branch point sequence (BPS) of the intron. (B) Two close-up views of the local EM density maps of U2 snRNA. (C) An overall view on how the intron is bound by U2 snRNA and the protein components Hsh155 (yellow) and Rds3 (brown). (D) A close-up view on the EM density maps surrounding the invariant adenine nucleotide in the BPS. A number of residues from both Hsh155 and Rds3 together form a pocket to recognize the adenine nucleotide. (E) Two views of the EM density maps on the interactions among the RES complex, the SF3b complex, and the NTR protein Prp45. The RES complex, positioned at the bottom of Hsh155, directly recognizes the exposed intron sequences just outside of the SF3b complex. Prp45 closely interacts with the RES complex.
Fig. S16 EM density maps at the active site. (A) An overall view of the EM density maps at the active site region. Four surrounding proteins are shown: Cwc24 (yellow), Cus1 (cyan), Prp11 (forest), and Prp8 (magenta). (B) A close-up view of the EM density maps surrounding the invariant guanine nucleotide at the 5’-end of the 5’SS. The guanine base is surrounded by residues from the amino-terminal zinc finger domain of Cwc24, particularly Tyr155, Lys160, and Phe161. The zinc ion is located close to the guanine base. (C) A close-up view of the catalytic magnesium ion. An amino-terminal loop from Prp11 is positioned close to the active site. Two positively charged residues, Lys10 and Lys11, are shown here. (D) A close-up view of the EM density maps of the active site and the base-pairing between the 5’-exon and loop I of U5 snRNA. Note the presence of a Tyr residue from Prp11. (E) A close-up view of the local EM density maps of the protein components that shape the active site. The four protein components are Prp11, Cwc24, Cus1, and the 1585 loop of Prp8, which closely interact with each other. (F) A close-up view of the local EM density maps of Cus1 around the active site. Two positively charged residues, Arg230 and Lys226, are H-bonded to the nucleotides A51 and A52 of U6 snRNA.
Fig. S17 Overall structure of the spliceosomal Bact complex. In the structure,
five subcomplexes are color-coded: orange for U5 snRNP, marine for U2 snRNP,
grey for NTC, cyan for NTR, and red for the RES complex. The splicing factors Prp2,
Cwc21, Cwc22, Cwc24, and Cwc27 are colored purple. Four views are shown.
Fig. S18 Structural comparison of Prp8 and Spp42. The most highly conserved and the largest spliceosomal component is Prp8 in S. cerevisiae or Spp42 in S. pombe. Shown here are Prp8 from the U4/U6.U5 tri-snRNP (left panels), Prp8 from the Bact complex (middle panels), and Spp42 from the ILS complex (right panels). Compared to Prp8 in the tri-snRNP, the N-domain of Prp8 in the Bact complex and the N-domain of Spp42 in the ILS complex are similarly moved closer to the core (represented by double-headed arrows).
Fig. S19 Structures of the protein components in the Bact complex. (A) Structure of the SF3b central scaffold component Hsh155 in two perpendicular views. (B) Structure of Hsh155 with Ysf3, Cus1, and Prp11 bound. (C) Structure of Rse1 in two views. (D) Interactions within the SF3b complex. In this representation, the starting point is the structure of Rse1 bound to Ysf3, Cus1, Prp11, and Rds3. The three components of the RES complex are Snu17 (E), Pml1 (F), and Bud13 (G). (H) Structure of the splicing factor Cwc27. (I) Structure of the ATPase/helicase Brr2. (J) Structure of the ATPase/helicase Prp2. (K) Structure of the splicing factor Cwc22. (L) Structure of the splicing factor Cwc24.
Fig. S20 The splicing factors Cwc21, Cwc22, Cwc24, and Cwc27 and the
ATPase/helicase Prp2. (A) Cwc21 and the MA3 domain of Cwc22 bind to Prp8
and are both located close to the 5’-exon. Cwc21 forms a β-sheet with the Switch
loop of Prp8 and directly interacts with 5’-exon. (B) The RING domain of Cwc24,
containing two zinc fingers, is bound to Rse1 of the SF3b complex. Another zinc
finger domain at the amino-terminus of Cwc24 is located at the active site and
coordinates the bases GU in the 5’SS. (C) Cwc27, which is a peptidyl-prolyl
cis-trans isomerase (39), binds both the HLH domain of Brr2 and the endonuclease
domain of Prp8. (D) The ATPase/helicase Prp2, which mediates the structural
rearrangement of the Bact to B* complex, is bound to Hsh155 and located in close
proximity to the RES complex and the 3’-end sequences of the intron.
Fig. S21 Structural comparison of the RNA elements between the U4/U6.U5 tri-snRNP and the Bact complex. (A) Overall views of the RNA map in the Bact complex (left panel) and in the U4/U6.U5 tri-snRNP (9) (right panel). (B) Alignment of the RNA elements between the U4/U6.U5 tri-snRNP (9) and the Bact complex. The alignment was performed on the two U5 snRNA molecules (left panel). A close-up view focusing on the pre-mRNA molecules is shown in the right panel. In the tri-snRNP structure (9), the 5’-exon sequences of the pre-mRNA are already bound to loop I of U5 snRNA, and the 5’SS of the pre-mRNA is recognized by the ACAGA box of U6 snRNA. In the Bact complex, the 5’-exon sequences of the pre-mRNA are similarly bound to loop I of U5 snRNA, and the 5’SS of the pre-mRNA is similarly recognized by the ACAGA box of U6 snRNA.
Fig. S22 Comparison of the catalytic centers between the spliceosomal Bact complex from S. cerevisiae and the pre-catalytic self-splicing group IIC intron. (A) Structure of the catalytic center of the spliceosomal Bact complex from S. cerevisiae, with a close-up view of the active site. The 5’-exon is paired with loop I of U5 snRNA. The Mg2+ ion shown here is likely a catalytic metal (M2) and is coordinated by A59 and G60, which are part of the RNA triplex. The 5’SS forms a kink in the backbone that presents the scissile phosphodiester bond of the splice site to the active site. (B) Structure of the catalytic center of the self-splicing group IIC intron from Oceanobacillus iheyensis in the pre-catalytic state (40), with a close-up view of the active site. The sequence corresponding to the 5’-exon is paired with EBS1. The only Mg2+ ion near the active site is not a catalytic metal ion. Similar to the Bact complex, the backbone is kinked at the junction of the 5’-exon and the splice site. This RNA conformation represents the state just prior to catalysis. Notably, predictions for the active site of the spliceosome were first proposed on the basis of structural studies of the group II introns (41, 42). Comparison between the spliceosomal Bact complex and the self-splicing group IIC intron was performed in PyMOL (38).
Fig. S23 Zinc-binding sites in the Bact complex are shown for Rds3 (A), Prp11
(B), Bud31 (C), Cwc2 (D), Ecm2 (E), and Cwc24 (F). In our structure, Rds3
contains three C4-type zinc fingers, Prp11 has a C2H2-type zinc finger, and Bud31
contains three zinc ions coordinated by nine Cys residues. Cwc2 contains a
C3H1-type zinc finger, Ecm2 has two C4-type zinc fingers, and Cwc24 contains three
zinc fingers: two C3H1 and one C4.
References
1. C. Yan et al., Structure of a yeast spliceosome at 3.6-angstrom resolution. Science 349, 1182 (Sep 11, 2015).
2. R. D. Gietz, R. H. Schiestl, Quick and easy yeast transformation using the LiAc/SS carrier DNA/PEG method. Nat Protoc 2, 35 (2007).
3. O. Puig et al., The tandem affinity purification (TAP) method: a general procedure of protein complex purification. Methods 24, 218 (Jul, 2001).
4. X. Li, S. Zheng, D. A. Agard, Y. Cheng, Asynchronous data acquisition and on-the-fly analysis of dose fractionated cryoEM images by UCSFImage. Journal of structural biology 192, 174 (Nov, 2015).
5. X. Li et al., Electron counting and beam-induced motion correction enable near-atomic-resolution single-particle cryo-EM. Nature methods 10, 584 (Jun, 2013).
6. T. Grant, N. Grigorieff, Automatic estimation and correction of anisotropic magnification distortion in electron microscopes. Journal of structural biology 192, 204 (Nov, 2015).
7. J. A. Mindell, N. Grigorieff, Accurate determination of local defocus and specimen tilt in electron microscopy. Journal of structural biology 142, 334 (Jun, 2003).
8. S. H. Scheres, RELION: implementation of a Bayesian approach to cryo-EM structure determination. Journal of structural biology 180, 519 (Dec, 2012).
9. R. Wan et al., The 3.8 Å structure of the U4/U6.U5 tri-snRNP: Insights into spliceosome assembly and catalysis. Science 351, 466 (Jan 29, 2016).
10. E. F. Pettersen et al., UCSF Chimera--a visualization system for exploratory research and analysis. Journal of computational chemistry 25, 1605 (Oct, 2004).
11. P. Fabrizio et al., The evolutionarily conserved core design of the catalytic activation step of the yeast spliceosome. Molecular cell 36, 593 (Nov 25, 2009).
12. S. H. Scheres, Beam-induced motion correction for sub-megadalton cryo-EM particles. eLife 3, e03665 (2014).
13. S. Chen et al., High-resolution noise substitution to measure overfitting and validate resolution in 3D structure determination by single particle electron cryomicroscopy. Ultramicroscopy 135, 24 (Dec, 2013).
14. P. B. Rosenthal, R. Henderson, Optimal determination of particle orientation, absolute hand, and contrast loss in single-particle electron cryomicroscopy. J Mol Biol 333, 721 (Oct 31, 2003).
15. A. Kucukelbir, F. J. Sigworth, H. D. Tagare, Quantifying the local resolution of cryo-EM density maps. Nature methods 11, 63 (Jan, 2014).
16. P. Emsley, K. Cowtan, Coot: model-building tools for molecular graphics. Acta Crystallogr. D 60, 2126 (2004).
17. E. Absmeier et al., The large N-terminal region of the Brr2 RNA helicase guides productive spliceosome activation. Genes & development 29, 2576 (Dec 15, 2015).
18. N. Stein, CHAINSAW: a program for mutating pdb files used as templates in molecular replacement. J Appl Crystallogr 41, 641 (Jun, 2008).
19. F. DiMaio, M. D. Tyka, M. L. Baker, W. Chiu, D. Baker, Refinement of Protein Structures into Low-Resolution Density Maps Using Rosetta. J Mol Biol 392, 181 (Sep 11, 2009).
20. Y. Song et al., High-resolution comparative modeling with RosettaCM. Structure 21, 1735 (Oct 8, 2013).
21. F. DiMaio et al., Atomic-accuracy models from 4.5-Å cryo-electron microscopy data with density-guided iterative local refinement. Nature methods 12, 361 (Apr, 2015).
22. J. Yang et al., The I-TASSER Suite: protein structure and function prediction. Nature methods 12, 7 (Jan, 2015).
23. J. Hang, R. Wan, C. Yan, Y. Shi, Structural basis of pre-mRNA splicing. Science 349, 1191 (Sep 11, 2015).
24. G. N. Murshudov, A. A. Vagin, E. J. Dodson, Refinement of macromolecular structures by the maximum-likelihood method. Acta crystallographica. Section D, Biological crystallography 53, 240 (May 1, 1997).
25. K. S. Keating, A. M. Pyle, RCrane: semi-automated RNA model building. Acta crystallographica. Section D, Biological crystallography 68, 985 (Aug, 2012).
26. F. C. Chou, P. Sripakdeevong, S. M. Dibrov, T. Hermann, R. Das, Correcting pervasive errors in RNA crystallography through enumerative structure prediction. Nature methods 10, 74 (Jan, 2013).
27. M. M. Harding, Geometry of metal-ligand interactions in proteins. Acta crystallographica. Section D, Biological crystallography 57, 401 (Mar, 2001).
28. M. M. Harding, Metal-ligand geometry relevant to proteins and in proteins: sodium and potassium. Acta crystallographica. Section D, Biological crystallography 58, 872 (May, 2002).
29. M. C. Erat, R. K. Sigel, Divalent metal ions tune the self-splicing reaction of the yeast mitochondrial group II intron Sc.ai5gamma. Journal of biological inorganic chemistry : JBIC : a publication of the Society of Biological Inorganic Chemistry 13, 1025 (Aug, 2008).
30. P. Auffinger, L. Bielecki, E. Westhof, Anion binding to nucleic acids. Structure 12, 379 (Mar, 2004).
31. M. Marcia, A. M. Pyle, Principles of ion recognition in RNA: insights from the group II intron structures. RNA 20, 516 (Apr, 2014).
32. J. Mahler, I. Persson, A study of the hydration of the alkali metal ions in aqueous solution. Inorganic chemistry 51, 425 (Jan 2, 2012).
33. P. D. Adams et al., PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta crystallographica. Section D, Biological crystallography 66, 213 (Feb, 2010).
34. R. A. Nicholls, M. Fischer, S. McNicholas, G. N. Murshudov, Conformation-independent structural comparison of macromolecules with ProSMART. Acta crystallographica. Section D, Biological crystallography 70, 2487 (Sep, 2014).
35. A. Amunts et al., Structure of the yeast mitochondrial large ribosomal subunit. Science 343, 1485 (Mar 28, 2014).
36. I. W. Davis et al., MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. Nucleic acids research 35, W375 (Jul, 2007).
37. B. A. Barad et al., EMRinger: side chain-directed model and map validation for 3D cryo-electron microscopy. Nature methods 12, 943 (Oct, 2015).
38. W. L. DeLano, The PyMOL Molecular Graphics System (2002); http://www.pymol.org.
39. A. Ulrich, M. C. Wahl, Structure and evolution of the spliceosomal peptidyl-prolyl cis-trans isomerase Cwc27. Acta crystallographica. Section D, Biological crystallography 70, 3110 (Dec 1, 2014).
40. R. T. Chan, A. R. Robart, K. R. Rajashankar, A. M. Pyle, N. Toor, Crystal structure of a group II intron in the pre-catalytic state. Nature structural & molecular biology 19, 555 (May, 2012).
41. K. S. Keating, N. Toor, P. S. Perlman, A. M. Pyle, A structural analysis of the group II intron active site and implications for the spliceosome. RNA 16, 1 (Jan, 2010).
42. N. Toor, K. S. Keating, S. D. Taylor, A. M. Pyle, Crystal structure of a self-spliced group II intron. Science 320, 77 (Apr 4, 2008).