Structure determination of noncanonical RNA motifs from H ...

39
1 Structure determination of noncanonical RNA motifs from 1 H NMR chemical shift data alone Parin Sripakdeevong 1 , Mirko Cevec 2 , Andrew T. Chang 3 , Michèle C. Erat 4,9 , Melanie Ziegeler 2 , Qin Zhao 5 , George E. Fox 6 , Xiaolian Gao 6 , Scott D. Kennedy 7 , Ryszard Kierzek 8 , Edward P. Nikonowicz 3 , Harald Schwalbe 2 , Roland K. O. Sigel 9 , Douglas H. Turner 10 , and Rhiju Das 1,11,12,* 1 Biophysics Program, Stanford University, Stanford, CA 94305, USA 2 Center for Biomolecular Magnetic Resonance, Institute for Organic Chemistry and Chemical Biology, Johann Wolfgang Goethe University Frankfurt, Max-von-Laue-Strasse 7, 60438 Frankfurt, Germany 3 Department of Biochemistry and Cell Biology, Rice University, Houston, TX 77251, USA 4 Department of Biochemistry, University of Oxford, Oxford OX1 3QU, United Kingdom 5 Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA 6 Department of Biology and Biochemistry, University of Houston, Houston, TX 77004, USA 7 Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, New York 14642, USA 8 Institute of Bioorganic Chemistry, Polish Academy of Sciences, 60-714 Poznan, Noskowskiego 12/14, Poland 9 Institute of Inorganic Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland 10 Department of Chemistry, University of Rochester, Rochester, New York 14627, USA 11 Department of Biochemistry, Stanford University, Stanford, CA 94305, USA 12 Department of Physics, Stanford University, Stanford, CA 94305, USA * To whom correspondence should be addressed. Phone: (650) 723-5976. Fax: (650) 723-6783. E-mail: [email protected].

Transcript of Structure determination of noncanonical RNA motifs from H ...

Page 1: Structure determination of noncanonical RNA motifs from H ...

1

Structure determination of noncanonical RNA motifs from 1H NMR chemical shift data alone

Parin Sripakdeevong1, Mirko Cevec2, Andrew T. Chang3, Michèle C. Erat4,9, Melanie Ziegeler2, Qin

Zhao5, George E. Fox6, Xiaolian Gao6, Scott D. Kennedy7, Ryszard Kierzek8, Edward P. Nikonowicz3, Harald Schwalbe2, Roland K. O. Sigel9, Douglas H. Turner10, and Rhiju Das1,11,12,*

1Biophysics Program, Stanford University, Stanford, CA 94305, USA 2Center for Biomolecular Magnetic Resonance, Institute for Organic Chemistry and Chemical Biology, Johann Wolfgang Goethe University Frankfurt, Max-von-Laue-Strasse 7, 60438 Frankfurt, Germany 3Department of Biochemistry and Cell Biology, Rice University, Houston, TX 77251, USA 4Department of Biochemistry, University of Oxford, Oxford OX1 3QU, United Kingdom 5Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA 6Department of Biology and Biochemistry, University of Houston, Houston, TX 77004, USA 7Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, New York 14642, USA 8Institute of Bioorganic Chemistry, Polish Academy of Sciences, 60-714 Poznan, Noskowskiego 12/14, Poland 9Institute of Inorganic Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland 10Department of Chemistry, University of Rochester, Rochester, New York 14627, USA 11Department of Biochemistry, Stanford University, Stanford, CA 94305, USA 12Department of Physics, Stanford University, Stanford, CA 94305, USA * To whom correspondence should be addressed. Phone: (650) 723-5976. Fax: (650) 723-6783. E-mail: [email protected].

Page 2: Structure determination of noncanonical RNA motifs from H ...

2

Abstract

Structured non-coding RNAs play key roles in fundamental processes from gene regulation to viral

pathogenesis, but determining these molecules’ three-dimensional conformation remains time-consuming,

even for the smallest functional motifs. We demonstrate herein that the integration of non-exchangeable 1H chemical shift data with Rosetta de novo modeling can consistently produce high-resolution RNA

structures, without the use of other NMR measurements. On a benchmark set of 21 noncanonical motifs,

including 11 blind targets, CS-ROSETTA-RNA recovered the experimental structure with high accuracy

(0.6 to 2.0 Å all-heavy-atom rmsd) in 16 cases. These successes included blind predictions of a group II

intron internal loop, two tRNA anticodon loops, four UNAC tetraloops, and a highly irregular 5´-GAGU-

3´/3´-UGAG-5´ internal loop that required additional synthesis of samples with modified sequence to

solve by conventional NMR means. Four cases poorly resolved by CS-ROSETTA-RNA involved

dynamics or lack of structure that also complicated traditional structure determination. By using

previously untapped information in 1H chemical shifts, CS-ROSETTA-RNA provides a new route for

enabling, accelerating, and validating structure determination of noncanonical RNA motifs.

Page 3: Structure determination of noncanonical RNA motifs from H ...

3

Introduction

RNA molecules form complex three-dimensional structures that play key roles in a multitude of cellular

processes [see e.g., ref. (1)]. These RNAs are typically composed of canonical helices interconnected by

motifs with intricate, noncanonical structure. With size ranges on the order of a dozen nucleotides or less,

these RNA motifs offer compelling targets for solution NMR approaches (2, 3). Nevertheless, NMR

characterization of RNA motifs does not always produce sufficient constraints to generate confident

atomic-resolution 3D models (4-9). Compared to protein NMR spectroscopy, RNAs present

approximately half the density of protons, give lower dispersion of spin resonances due to the limited

chemical diversity of the four nucleobases, and lack a robust methodology that achieves unambiguous

sequential assignment via through-bond correlation experiments (2, 3). The development of new RNA

structure modeling approaches that extract more information from limited NMR data would accelerate the

structure determination process, permit cross-checks of model validity, and enable the determination of

structures for RNA sequences that require extraordinary efforts to solve by conventional NMR methods.

NMR chemical shifts have long been recognized as an important source of structural information

for functional macromolecules. Backbone chemical shifts are widely used for protein analysis, including

determination of protein secondary structures and backbone torsions (10, 11) and refinement of three-

dimensional models (12-14). More recently, chemical shift data have been leveraged for accurate de novo

protein structure determination (15-17). The development of chemical shift methodologies for RNA could

be similarly fruitful (18). First, non-exchangeable 1H chemical shifts are routinely collected in most NMR

experiments and account for 55% of all RNA chemical shift data deposited in the Biological Magnetic

Resonance Bank (BMRB) (19). Second, algorithms have been developed to ‘back-calculate’ non-

exchangeable 1H chemical shifts from RNA 3D structure (20-23). In particular, the well-calibrated

Page 4: Structure determination of noncanonical RNA motifs from H ...

4

NUCHEMICS program has been used to refine models generated from conventional NMR measurements

(NOE, J-couplings, residual dipolar couplings) (24, 25) and to successfully determine de novo structures

of simple helical forms of nucleic acids (26). Nevertheless, structural information contained in NMR

chemical shift data is still not routinely leveraged for RNA structure determination.

In this work, we demonstrate that assigned 1H chemical shift data alone provide sufficient

information to determine the structures of noncanonical RNA motifs at high resolution, without other

NMR measurements. As with recent CS-ROSETTA approaches developed for proteins (16, 27, 28), the

key innovation has been the integration of chemical shift data with high-resolution de novo structure

prediction. Recent years have seen the development of a physically realistic Rosetta energy function for

RNA (29) and advances in RNA conformational sampling (30, 31) that enable atomic-accuracy recovery

of noncanonical RNA structures in favorable cases. We therefore reasoned that incorporating

experimentally determined 1H NMR chemical shifts into de novo modeling might enable consistent

structure determination of noncanonical RNA motifs. This article presents the resulting protocol,

Chemical-Shift-ROSETTA for RNA (CS-ROSETTA-RNA), and its extensive benchmark on 21 RNA

motifs, including 11 blind targets.

Results

Coupling 1H chemical shifts to Rosetta RNA structure modeling

We illustrate the CS-ROSETTA-RNA method first on a complex test motif that was challenging for prior

Rosetta approaches, a conserved UUAAGU hexaloop from 16S ribosomal RNA involved in RNA/protein

contacts (32, 33) (Fig. 1A). On one hand, application of two previously described RNA structure

prediction methods, Fragment Assembly with Full-Atom Refinement (FARFAR) (29) and Stepwise

Page 5: Structure determination of noncanonical RNA motifs from H ...

5

Assembly (SWA) (31) successfully generated models with atomic-resolution agreement to this

hexaloop’s crystallographic structure (0.52 Å all-heavy-atom rmsd; see Figs. 1A-B). On the other hand,

the Rosetta all-atom energy function incorrectly evaluated the energy of these near-native models

compared to highly non-native models by at least 8 kBT (Fig. 1C). We reasoned that the structural

information contained in chemical shifts could be used to better discriminate the near-native models.

Indeed, the Rosetta near-native model explained several noncanonical 1H chemical shifts observed

experimentally (33), such as a significant upfield chemical shift for the H4´ atom of the hexaloop’s U7

nucleotide due to its apposition against the nucleobase of A5 [3.39 ppm, more than six standard

deviations upfield of the BMRB average (4.38 ± 0.15 ppm); see Fig. 1B]. Overall, the NUCHEMICS

back-calculated chemical shifts for the near-native Rosetta model gave good agreement with the

experimental non-exchangeable 1H chemical shifts as quantified by a chemical shift root-mean-square-

deviation (rmsdshift) of 0.19 ppm (Fig. 1D). Furthermore, for the entire set of Rosetta models, the chemical

shift deviation became progressively larger for models with greater conformational discrepancy from the

crystallographic structure (Fig. 1E). We therefore supplemented the Rosetta energy function with a

chemical shift pseudo-energy term (Eshift) proportional to the sum of squared deviations between

experimental and back-calculated chemical shifts (see Methods). After this pseudo-energy term was

added to the Rosetta all-atom energy function, the near-native model indeed gave the lowest energy

among all Rosetta models (Fig. 1F).

A benchmark for high resolution RNA modeling

To evaluate the generality and accuracy of the CS-ROSETTA-RNA method, we carried out modeling on

a benchmark set containing 21 RNA motifs (see Table 1). First, we applied CS-ROSETTA-RNA to a test

set of 10 noncanonical RNA motifs for which published chemical shift data as well as structural models

Page 6: Structure determination of noncanonical RNA motifs from H ...

6

derived from NMR and, in some cases, crystallography were available (see SI Appendix, Table S1). These

RNA motifs ranged in size from six to thirteen nucleotides and included both hairpins and internal loops.

On average, 6.2 non-exchangeable 1H chemical shifts per nucleotide (out of 7-8 total) were assigned,

including both backbone and base protons (SI Appendix, Table S1). To assess the applicability of CS-

ROSETTA-RNA to RNAs that form heterogeneous ensembles, this test set included 3 motifs that were

known to adopt multiple conformations and/or display structural dynamics in solution: a G:G mismatch

with two alternative conformations (35), a poorly structured apical loop of the HIV-1 TAR RNA (36),

and a tandem UG:UA mismatch that contains no hydrogen bonds (37) (SI Appendix, Fig. S2). In addition

to these cases, we further tested CS-ROSETTA-RNA on 11 blind RNA targets concurrently under

investigation by five independent NMR laboratories. Sequences and assigned chemical shifts for these

targets, but no other information, were made available for chemical-shift-guided modeling. Subsequent

comparison of CS-ROSETTA-RNA models with structures derived from conventional NMR approaches

thus served as blind evaluations for the new method.

Over the entire benchmark of 21 RNA motifs, CS-ROSETTA-RNA returned 16 cases (76%) in

which at least one of the five lowest energy cluster centers achieved better than 2.0 Å rmsd (over all

heavy atoms; see Table 1). The method performed well on both the test set of known structures (8/10

success cases) and the blind targets set (8/11 success cases). Furthermore, 10 of the 21 cases satisfied a

more stringent success criterion: the lowest energy (top-ranked) model was within atomic-accuracy of the

experimental structure (under 1.5 Å all-heavy-atom rmsd). As expected, this number of stringent success

cases was significantly higher than the 2 of 21 cases achieved by modeling without chemical shift data

(see SI Appendix, Table S2).

CS-ROSETTA-RNA success cases included atomic-accuracy models from diverse sources, such

Page 7: Structure determination of noncanonical RNA motifs from H ...

7

as the most conserved internal loop from the signal recognition particle (SRP) RNA (6, 7, 38) (rmsd of

0.81 Å; Fig. 2A); a two-way junction with a 90° turn from the hepatitis C virus IRES (39, 40) (rmsd of

1.48 Å; Fig. 2B); and a 4x4 internal loop from an R2 retrotransposon (41) (rmsd of 1.17 Å; Fig. 2C).

Successful blind predictions included the cuUCCaa anticodon stem-loop of Bacillus subtilis tRNAGly

(rmsd of 1.41 Å; Fig. 2D); a 5´-GU-3´/3´-UAU-5´ internal loop from a group II intron (rmsd of 1.37 Å;

Fig. 2E); and all four UNAC tetraloops (Figs. 2F-G). In each of these cases, the high accuracy of the CS-

ROSETTA-RNA model was reflected not only in low rmsd to the experimental structure but also

accurate recovery of the base pair and base stack geometries as classified in the Leontis-Westhof scheme

(34) (see SI Appendix, Table S3).

High-confidence models and structure cross-validation

Several CS-ROSETTA-RNA predictions gave strong convergence, as defined by a distinct energy

‘funnel’: a single dominant conformation and geometrically similar models achieved better energy than

all other conformations. In six benchmark cases, the lowest energy model gave an energy gap of >3.0 kBT

to the next-lowest energy cluster and, in each of these cases, the model achieved atomic-accuracy (under

1.5 Å rmsd to experimental structure; see SI Appendix, Fig. S3). This energy gap thus appears to be a

hallmark of CS-ROSETTA-RNA accuracy (see also A criterion for confidence prediction subsection of

SI Appendix, Supporting Results). In one apparent exception, the SRP conserved internal loop, a large

energy gap (5.5 kBT) strongly suggested that the CS-ROSETTA-RNA prediction should be accurate, but

the lowest energy CS-ROSETTA-RNA model showed disagreements with the experimental NMR models

(6) (>2.0 Å rmsd; see SI Appendix, Figs. S4A-B). Further analysis revealed that the experimental NMR

models poorly explained the 1H chemical shift data published in the same study (6) (rmsdshift = 0.52 ppm)

Page 8: Structure determination of noncanonical RNA motifs from H ...

8

and poorly agreed with subsequently solved crystallographic structures (7, 38) [rmsd of 2.30 Å to PDB:

1LNT (38)]. In contrast, the CS-ROSETTA-RNA model gave excellent agreement with the chemical shift

data (rmsdshift = 0.18 ppm) and closely matched the crystallographic structures (rmsd of 0.81 Å to PDB:

1LNT; see Fig. 2A and SI Appendix, Figs. S4C-D). The SRP motif case suggests that the CS-ROSETTA-

RNA method can be used to help cross-validate or remodel NMR-derived structures.

Blind modeling of a highly irregular internal loop

Perhaps the most striking CS-ROSETTA-RNA result was the blind prediction of a self-complementary

4x4 internal loop with the sequence 5´-GABrGU-3´/3´-UBrGAG-5´ (BrG, denoting 8-bromoguanosine, was

introduced to stabilize the major of two conformations observed in the RNA). The original NMR study

(42) of this motif provided a descriptive 2D schematic (reproduced in Fig. 3A) but did not produce

sufficient constraints to generate an atomic-detail three-dimensional structure due to ambiguity in NOE

assignments. In contrast, CS-ROSETTA-RNA was able to produce a 3D model (Fig. 3B) with a large

energy gap (28.1 kBT), giving high confidence in the model’s accuracy. Further supporting its accuracy,

the CS-ROSETTA-RNA model recovered information proposed by the NMR study but set aside during

structure modeling, including two G-G trans Watson-Crick/Hoogsteen base pairs, the two-fold rotational

symmetry of the structure, and C2´-endo ribose conformations at six nucleotides (see SI Appendix,

Supporting Results). Finally, chemical shift data strongly supported an unusual feature of the CS-

ROSETTA-RNA model, previously unseen in the database of RNA structures [as assessed by FR3D (43);

see SI Appendix, Supporting Results]: the nucleobases of the two central adenosines were stacked on each

other, formed no base pairs, and formed hydrogen bonds to the opposing strand backbone. To

accommodate these interactions, the adenosines adopted an irregular backbone conformation, involving

Page 9: Structure determination of noncanonical RNA motifs from H ...

9

~180° inversions in their ribose orientation relative to the riboses in the preceding and following

nucleotides. The resulting geometry positioned the adenosines’ H4´, 1H5´, and 2H5´ atoms directly above

the neighboring guanosine nucleobases, leading to strongly predicted upfield shifts that agreed well with

the experimentally measured chemical shifts (rmsdshift = 0.18 ppm; Fig. 3C). Here, the enumerative SWA

method (29, 31), which does not rely on a fragment database of previous structures, was critical for

modeling the conformation of this highly irregular internal loop.

Subsequently, an additional series of NMR experiments with newly synthesized non-self-

complementary duplex allowed for the assignment of previously ambiguous NOEs. The resulting

conventionally determined NMR structure of the 5´-GABrGU-3´/3´-UBrGAG-5´ internal loop further

confirmed the accuracy of the CS-ROSETTA-RNA model at atomic resolution (see Fig. 3D and SI

Appendix, Fig. S5). CS-ROSETTA-RNA high-confidence modeling of the 5´-GABrGU-3´/3´-UBrGAG-5´

motif suggests that the method can be used to facilitate structure determination in cases where NOE

distances and assignments cannot be determined unambiguously due to overlap and/or symmetry (42).

Poor accuracy cases

Despite generally giving high-resolution models, CS-ROSETTA-RNA returned 5 of 21 cases with poorer

than 2.0 Å rmsd accuracy. In one case (uuGCCaa anticodon stem-loop of B. subtilis tRNAGly), either

FARFAR modeling using a different fragment database or SWA modeling alone returned <2.0 Å rmsd

structures among the five lowest energy clusters (SI Appendix, Fig. S6), suggesting that this motif would

be recoverable with different sampling schemes. For the other four failure cases, examination of the

original experimental NMR models revealed potential complications for structure modeling. Each of

these cases presented large deviations in the coordinates of the experimental NMR ensemble (>2.5 Å

Page 10: Structure determination of noncanonical RNA motifs from H ...

10

average pairwise rmsd for HIV-1 TAR apical loop and two HAR1 internal loops) or more localized

fluctuations (tandem UG:UA mismatch; see SI Appendix, Fig. S2). Such structural dynamics in solution

precluded high-resolution agreement between these structures and the CS-ROSETTA-RNA models.

Interestingly, a separate benchmark case involving a heterogeneous ensemble was recovered by CS-

ROSETTA-RNA: models for two populated conformations of the G:G mismatch were attained and

agreed with the experimental NMR ensemble (35) (Figs. 2H and 2I). Thus, overall, our benchmark results

demonstrate the general applicability of CS-ROSETTA-RNA for high resolution RNA structure

determination but highlight the possibility of ambiguity in cases where the RNA conformation is dynamic

or unstructured.

Discussion

Natural RNA sequences form intricate three-dimensional structures involving the interconnection of

canonical helices and small, functional noncanonical motifs. By integrating NMR chemical shift data into

a new generation of RNA de novo modeling algorithms, CS-ROSETTA-RNA enables confident and rapid

determination of noncanonical RNA motif structures in a manner fundamentally distinct from prior

methods, using independent and far less experimental information. While obtaining resonance

assignments is a necessary and time-consuming step of NMR-based RNA characterization, the structural

information contained in chemical shifts are typically left aside during the generation of RNA structural

models. Furthermore, the standard operating procedure (2, 3) of determining NOEs, J-couplings, and, in

some cases, residual dipolar coupling, does not always yield sufficient information to determine an

RNA’s three-dimensional structure by conventional means, as illustrated by the 5´-GABrGU-3´/3´-

UBrGAG-5´ example above (Fig. 3). Here, incorporating assigned 1H chemical shift data alone into

Page 11: Structure determination of noncanonical RNA motifs from H ...

11

Rosetta-based RNA modeling gives high accuracy structures for the majority of cases in a 21-RNA

benchmark, including 8 of 11 blind targets. A large energy gap between the lowest and next lowest

energy cluster center provided a powerful criterion for assessing confidence in the modeling. Since the

models are based on independent data, the CS-ROSETTA-RNA structures can provide strong cross-

validation or point out potential inaccuracies for NMR models generated by conventional means, as in the

SRP RNA motif case (see Fig. 2C and SI Appendix, Fig. S4). Furthermore, in cases like the irregular 5´-

GABrGU-3´/3´-UBrGAG-5´ loop where ambiguities hindered NOE assignments, the CS-ROSETTA-RNA

approach permits the bypass of experiments with additional samples to yield high-confidence, all-atom

models.

The CS-ROSETTA-RNA method can be used immediately for modeling or cross-validating new

RNA motifs. In addition, several extensions for the CS-ROSETTA-RNA approach appear feasible, based

on recent work in chemical-shift-guided protein modeling (16, 27, 28). Integration of chemical shift

information with residual dipolar coupling in de novo structure prediction could permit the rapid

modeling of larger RNAs composed of multiple motifs (39, 44, 45). Furthermore, the development and

calibration of back-calculation algorithms for RNA 13C, 15N, and exchangeable 1H chemical shifts (46,

47) will enable the incorporation of these additional, structural data into CS-ROSETTA-RNA. Finally,

NMR experiments can reveal chemical shifts of slowly exchanging or rarely populated ‘hidden’ states of

nucleic acids with potential functional importance (48-50). Thus, further integration of de novo modeling

and NMR methodologies may allow not just the acceleration of structure determination but eventually

lead to the solution of currently intractable three-dimensional RNA structures.

Methods

Page 12: Structure determination of noncanonical RNA motifs from H ...

12

Generation of Rosetta Models

Two complementary structure-modeling methods, Fragment Assembly with Full-Atom Refinement

(FARFAR) and Stepwise Assembly (SWA), were used in parallel to generate the Rosetta models for each

motif. SWA models were constructed using a series of recursive building steps, as described previously

(31). Each step involved enumerating several million conformations for each nucleotide, and all step-by-

step build-up paths were covered in N2 building steps where N is the number of nucleotides in the motif.

At the final building steps, all models are finely clustered and a maximum of 10,000 low energy SWA

models were retained. The SWA approach is effective at generating models that are highly optimized to

the underlying all-atom energy function, but can produce primarily incorrect models when the assumed

energy function is inaccurate. Therefore, models were also generated by fragment assembly followed by

full-atom refinement (FARFAR) in the Rosetta framework, as described previously (29); the fragment

source was the large ribosomal subunit of H. marismortuii (PDB: 1JJ2). For each motif, 250,000

FARFAR models were generated; these models were then finely clustered and a maximum of 10,000 low

energy FARFAR models were retained. The SWA and FARFAR models were then combined leading to

~10,000-20,000 final Rosetta models for each motif. The total computational costs for the generation of

SWA and FARFAR models in term of modern central processing unit (CPU) are as follow. For SWA

runs, the computational cost range from ~5,000 CPU hours for the smallest motif (6-nt) to ~50,000 CPU

hours for the largest motif (13-nt) investigated in this work. For FARFAR runs, the computational cost

range from ~3,000 CPU hours for the smallest motif (6-nt) to ~8,000 CPU hours for the largest motif (13-

nt). Algorithms and complete documentation are being incorporated into Rosetta release 3.5

(www.rosettacommons.org), freely available for academic use.

To further encourage usage of the CS-ROSETTA-RNA method by the general NMR RNA

community, a web server where users can access and submit CS-ROSETTA-RNA jobs is made available

Page 13: Structure determination of noncanonical RNA motifs from H ...

13

at http://rosie.rosettacommons.org/rna_denovo. Due to computational resource limitations and to ensure

short queue time, the web server runs a slightly modified version of CS-ROSETTA-RNA where the

models are generate using only the FARFAR method and the maximum number of models per job

submission is limited to 50,000.

Incorporation of 1H chemical shifts into structure modeling

Information from the experimental non-exchangeable 1H chemical shifts were incorporated into the

modeling process through the chemical shift pseudo-energy term:

!

Eshift = c " # iexp $# i

calc( )i%

2 [1]

where and are, respectively, the experimental and back-calculated chemical shift in ppm units

(the index i sums over all experimentally assigned non-exchangeable 1H chemical shifts in the RNA

motif), and c is a weighting factor set to 4.0 kBT/ppm-2 based on test runs with different motifs. The

NUCHEMICS program (22) was used to back-calculate non-exchangeable 1H chemical shifts. In the 21

RNA motifs benchmark set, only 3 chemical shift datasets (UUCG tetraloop, Chimp HAR1 GAA loop

and Human HAR1 GAA loop) included stereospecific assignments of the diastereotopic 1H5´ and 2H5´

protons pair. For the remaining 18 chemical shift datasets, the assignment of 1H5´ and 2H5´ was

determined for each model based on which values gave better agreement between the experimental and

back-calculated chemical shifts.

Each Rosetta model was refined and rescored under the hybrid all-atom energy:

Ehybrid = ERosetta + Eshift [2]

where ERosetta is the standard Rosetta all-atom energy function for RNA (29), and Eshift is the chemical

shift pseudo-energy term. Refinement of the models under the Ehybrid all-atom energy function was carried

Page 14: Structure determination of noncanonical RNA motifs from H ...

14

out using continuous minimization in torsional space with the Davidson–Fletcher–Powell algorithm under

the Rosetta framework [For this purpose, the NUCHEMICS algorithm was rewritten inside the Rosetta

codebase (51)]. After refinement, the models were rescored and re-ranked under the Ehybrid all-atom

energy function. Finally, all models were clustered, such that models with pairwise all-heavy-atom rmsd

below 1.5 Å are grouped. The lowest energy member of each cluster was designated as the cluster center

and the five lowest energy cluster centers were designated the CS-ROSETTA-RNA predictions.

Page 15: Structure determination of noncanonical RNA motifs from H ...

15

Acknowledgments

We thank Hashim Al-Hashimi for suggesting the problem and the Rosetta community for sharing code.

We are grateful to G. Varani and J. D. Puglisi for providing experimental 1H chemical shift data (for

HIV-1 TAR apical loop and Hepatitis C Virus IRES IIa, respectively). Calculations were carried out on

the BioX2 cluster (NSF award CNS-0619926) and XSEDE resources (allocation MCB090153). We

acknowledge financial support from a Burroughs Wellcome Career Award at the Scientific Interface (to

RD), a C. V. Starr Asia/Pacific Stanford Graduate Fellowship (to PS), the National Institutes of Health

Grant GM73969 to (EPN), the Swiss National Science Foundation and the European Commission (BIO-

NMR, ERC-Starting Grant, Marie Curie Fellowship) (to MCE and RKOS), and the Deutsche

Forschungsgemeinschaft (Cluster of Excellence) and the European Commission (Bio-NMR, WeNMR,

Marie Curie Fellowship) (to MC, MZ and HS).

Page 16: Structure determination of noncanonical RNA motifs from H ...

16

References

1. Gesteland RF, Cech T, & Atkins JF (2006) The RNA world: the nature of modern RNA suggests a

prebiotic RNA world (Cold Spring Harbor Lab Press, Cold Spring Harbor, NY). 2. Furtig B, Richter C, Wohnert J, & Schwalbe H (2003) NMR spectroscopy of RNA.

Chembiochem 4(10):936-962. 3. Scott LG & Hennig M (2008) RNA structure determination by NMR. Methods Mol Biol 452:29-

61. 4. Peterson RD & Feigon J (1996) Structural change in Rev responsive element RNA of HIV-1 on

binding Rev peptide. J Mol Biol 264(5):863-877. 5. Ippolito JA & Steitz TA (2000) The structure of the HIV-1 RRE high affinity rev binding site at

1.6 A resolution. J Mol Biol 295(4):711-717. 6. Schmitz U, James TL, Lukavsky P, & Walter P (1999) Structure of the most conserved internal

loop in SRP RNA. Nat Struct Biol 6(7):634-638. 7. Jovine L, et al. (2000) Crystal structure of the ffh and EF-G binding sites in the conserved domain

IV of Escherichia coli 4.5S RNA. Structure 8(5):527-540. 8. Tolbert BS, et al. (2010) Major groove width variations in RNA structures determined by NMR

and impact of 13C residual chemical shift anisotropy and 1H-13C residual dipolar coupling on refinement. J Biomol NMR 47(3):205-219.

9. Nabuurs SB, Spronk CAEM, Vuister GW, & Vriend G (2006) Traditional Biomolecular Structure Determination by NMR Spectroscopy Allows for Major Errors. PLoS Comput Biol 2(2):e9.

10. Szilágyi L (1995) Chemical shifts in proteins come of age. Progress in Nuclear Magnetic Resonance Spectroscopy 27(4):325-442.

11. Cornilescu G, Delaglio F, & Bax A (1999) Protein backbone angle restraints from searching a database for chemical shift and sequence homology. J Biomol NMR 13(3):289-302.

12. Kuszewski J, Gronenborn AM, & Clore GM (1995) The impact of direct refinement against proton chemical shifts on protein structure determination by NMR. J Magn Reson B 107(3):293-297.

13. Clore GM & Gronenborn AM (1998) New methods of structure refinement for macromolecular structure determination by NMR. Proc Natl Acad Sci U S A 95(11):5891-5898.

14. Vila JA, et al. (2008) Quantum chemical 13C(alpha) chemical shift calculations for protein NMR structure determination, refinement, and validation. Proc Natl Acad Sci U S A 105(38):14389-14394.

15. Cavalli A, Salvatella X, Dobson CM, & Vendruscolo M (2007) Protein structure determination from NMR chemical shifts. Proc Natl Acad Sci U S A 104(23):9615-9620.

16. Shen Y, et al. (2008) Consistent blind protein structure generation from NMR chemical shift data. Proc Natl Acad Sci U S A 105(12):4685-4690.

17. Wishart DS, et al. (2008) CS23D: a web server for rapid protein structure generation using NMR chemical shifts and sequence data. Nucleic Acids Res 36(Web Server issue):W496-502.

18. Wishart DS & Case DA (2001) Use of chemical shifts in macromolecular structure determination. Methods Enzymol 338:3-34.

19. Ulrich EL, et al. (2008) BioMagResBank. Nucleic Acids Res 36(Database issue):D402-408. 20. Prado FR & Giessner-Prettre C (1981) Parameters for the calculation of the ring current and

atomic magnetic anisotropy contributions to magnetic shielding constants: Nucleic acid bases and

Page 17: Structure determination of noncanonical RNA motifs from H ...

17

intercalating agents. Journal of Molecular Structure: THEOCHEM 76(1):81-92. 21. Dejaegere A, Bryce Richard A, & Case David A (1999) An Empirical Analysis of Proton

Chemical Shifts in Nucleic Acids. Modeling NMR Chemical Shifts, ACS Symposium Series, (American Chemical Society), Vol 732, pp 194-206.

22. Cromsigt JA, Hilbers CW, & Wijmenga SS (2001) Prediction of proton chemical shifts in RNA. Their use in structure refinement and validation. J Biomol NMR 21(1):11-29.

23. Lam SL (2007) DSHIFT: a web server for predicting DNA chemical shifts. Nucleic Acids Res 35(Web Server issue):W713-717.

24. Huthoff H, Girard F, Wijmenga SS, & Berkhout B (2004) Evidence for a base triple in the free HIV-1 TAR RNA. RNA 10(3):412-423.

25. Girard FC, Ottink OM, Ampt KA, Tessari M, & Wijmenga SS (2007) Thermodynamics and NMR studies on Duck, Heron and Human HBV encapsidation signals. Nucleic Acids Res 35(8):2800-2811.

26. van der Werf RM (2011) New Methods for Studying Structure and Dynamics of RNA. Ph.D. Thesis (Radboud University Nijmegen Nijmegen, The Netherlands).

27. Shen Y, Vernon R, Baker D, & Bax A (2009) De novo protein structure generation from incomplete chemical shift assignments. J Biomol NMR 43(2):63-78.

28. Raman S, et al. (2010) NMR structure determination for larger proteins using backbone-only data. Science 327(5968):1014-1018.

29. Das R, Karanicolas J, & Baker D (2010) Atomic accuracy in predicting and designing noncanonical RNA structure. Nat Methods 7(4):291-294.

30. Das R & Baker D (2007) Automated de novo prediction of native-like RNA tertiary structures. Proceedings of the National Academy of Sciences 104(37):14664-14669.

31. Sripakdeevong P, Kladwang W, & Das R (2011) An enumerative stepwise ansatz enables atomic-accuracy RNA loop modeling. Proc Natl Acad Sci U S A 108(51):20573-20578.

32. Carter AP, et al. (2000) Functional insights from the structure of the 30S ribosomal subunit and its interactions with antibiotics. Nature 407(6802):340-348.

33. Zhang H, Fountain MA, & Krugh TR (2001) Structural characterization of a six-nucleotide RNA hairpin loop found in Escherichia coli, r(UUAAGU). Biochemistry 40(33):9879-9886.

34. Leontis NB & Westhof E (2001) Geometric nomenclature and classification of RNA base pairs. RNA 7(4):499-512.

35. Burkard ME & Turner DH (2000) NMR structures of r(GCAGGCGUGC)2 and determinants of stability for single guanosine-guanosine base pairs. Biochemistry 39(38):11748-11762.

36. Aboul-ela F, Karn J, & Varani G (1996) Structure of HIV-1 TAR RNA in the absence of ligands reveals a novel conformation of the trinucleotide bulge. Nucleic Acids Res 24(20):3974-3981.

37. Shankar N, et al. (2007) NMR reveals the absence of hydrogen bonding in adjacent UU and AG mismatches in an isolated internal loop from ribosomal RNA. Biochemistry 46(44):12665-12678.

38. Deng J, Xiong Y, Pan B, & Sundaralingam M (2003) Structure of an RNA dodecamer containing a fragment from SRP domain IV of Escherichia coli. Acta Crystallogr D Biol Crystallogr 59(Pt 6):1004-1011.

39. Lukavsky PJ, Kim I, Otto GA, & Puglisi JD (2003) Structure of HCV IRES domain II determined by NMR. Nat Struct Biol 10(12):1033-1038.

40. Zhao Q, Han Q, Kissinger CR, Hermann T, & Thompson PA (2008) Structure of hepatitis C virus IRES subdomain IIa. Acta Crystallogr D Biol Crystallogr 64(Pt 4):436-443.

41. Lerman YV, et al. (2011) NMR structure of a 4 x 4 nucleotide RNA internal loop from an R2

Page 18: Structure determination of noncanonical RNA motifs from H ...

18

retrotransposon: identification of a three purine-purine sheared pair motif and comparison to MC-SYM predictions. RNA 17(9):1664-1677.

42. Hammond NB, Tolbert BS, Kierzek R, Turner DH, & Kennedy SD (2010) RNA internal loops with tandem AG pairs: the structure of the 5'GAGU/3'UGAG loop can be dramatically different from others, including 5'AAGU/3'UGAA. Biochemistry 49(27):5817-5827.

43. Sarver M, Zirbel C, Stombaugh J, Mokdad A, & Leontis N (2008) FR3D: finding local and composite recurrent structural motifs in RNA 3D structures. Journal of Mathematical Biology 56(1):215-252.

44. Davis JH, et al. (2005) RNA helical packing in solution: NMR structure of a 30 kDa GAAA tetraloop-receptor complex. J Mol Biol 351(2):371-382.

45. Wang J, et al. (2009) A method for helical RNA global structure determination in solution using small-angle x-ray scattering and NMR measurements. J Mol Biol 393(3):717-734.

46. Rossi P & Harbison GS (2001) Calculation of 13C chemical shifts in rna nucleosides: structure-13C chemical shift relationships. J Magn Reson 151(1):1-8.

47. Aeschbacher T, Schubert M, & Allain FH (2012) A procedure to validate and correct the 13C chemical shift calibration of RNA datasets. J Biomol NMR 52(2):179-190.

48. Johnson JE, Jr. & Hoogstraten CG (2008) Extensive backbone dynamics in the GCAA RNA tetraloop analyzed using 13C NMR spin relaxation and specific isotope labeling. J Am Chem Soc 130(49):16757-16769.

49. Bothe JR, et al. (2011) Characterizing RNA dynamics at atomic resolution using solution-state NMR spectroscopy. Nat Methods 8(11):919-931.

50. Nikolova EN, et al. (2011) Transient Hoogsteen base pairs in canonical duplex DNA. Nature 470(7335):498-502.

51. Leaver-Fay A, et al. (2011) ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol 487:545-574.

Page 19: Structure determination of noncanonical RNA motifs from H ...

19

Figure Legends Figure 1. The CS-ROSETTA-RNA method illustrated on a conserved UUAAGU hairpin from the

16S rRNA. (A) The crystallographic model of the UUAAGU hairpin (PDB: 1FJG). (B) Rosetta near-

native model with a 0.52 Å all-heavy-atom rmsd to the crystallographic model (rmsd calculated over the

entire loop, excluding the flexible G6 extra-helical bulge). Two-dimensional schematics follow the

Leontis and Westhof nomenclature (34). (C) Plot of the Rosetta energy vs. rmsd to the crystallographic

conformation for all Rosetta models before the inclusion of the chemical shift pseudo-energy term. The

Rosetta all-atom energy function incorrectly assigns numerous non-native models (>5.0 Å rmsd) with

significantly lower energies (~8 kBT) than the near-native models. (D) Comparison of the back-calculated

chemical shifts from the Rosetta near-native model vs. experimental 1H chemical shift values shows

excellent agreement (rmsdshift= 0.19 ppm). (E) Plot of the average rmsdshift of all Rosetta models in 0.5-Å

rmsd bins from the crystallographic structure. On average, the agreement between the experimental and

back-calculated chemical shifts becomes progressively worse for models with greater conformational

discrepancy from the crystallographic structure. (F) Plot of the Rosetta energy vs. rmsd to the

crystallographic conformation for all Rosetta models after the inclusion of the chemical shift pseudo-

energy term. With chemical shift data, the near-native model (G) becomes the lowest energy model

overall (green circle).

Figure 2. Comparison of experimental and CS-ROSETTA-RNA models for diverse RNA motifs.

The Rosetta models (shown in color) are overlaid over the experimental structures (shown in white). (A)

4x4 internal loop from the most conserved domain of the signal recognition particle (SRP) RNA (PDB:

1LNT). (B) Hepatitis C Virus IRES Subdomain IIa (PDB: 2PN4). (C) 4x4 internal loop from an R2

retrotransposon (PDB: 2L8F). (D) Glycine tRNA(UCC) anticodon stem-loop from Bacillus subtilis

(PDB: 2LBL). (E) 5´-GU-3´/3´-UAU-5´ internal loop from a group II intron (PDB: not yet deposited). (F)

UAAC tetraloop (PDB: 4A4R). (G) UCAC tetraloop (PDB: 4A4S). (H) G(anti):G(syn) single mismatch

(~75% populated; PDB: 1F5G). (I) G(syn):G(anti) single mismatch (~25% populated; PDB: 1F5H). The

rmsds between Rosetta models (energy cluster rank) and the experimental structure are (A) 0.81 Å (first),

(B) 1.48 Å (second), (C) 1.17 Å (first), (D) 1.41 (third), (E) 1.37 Å (first), (F) 0.94 Å (first), (G) 1.00 Å

(first), (H) 0.71 Å (first), and (I) 0.63 Å (second). The two-dimensional schematics are annotated based

Page 20: Structure determination of noncanonical RNA motifs from H ...

20

on the experimental structure and follow the Leontis and Westhof nomenclature (34).

Figure 3. CS-ROSETTA-RNA modeling of the 5´-GAGU-3´/3´-UGAG-5´ self-complementary

internal loop. (A) A 2D schematic proposed in the original NMR study (42) and (B) CS-ROSETTA-

RNA model of the 5´-GABrGU-3´/3´-UBrGAG-5´ symmetric duplex. In the CS-ROSETTA-RNA model,

the backbone of A5 and A5* adopts a highly irregular conformation, which leads to a ~180° inversion in

the adenosines’ ribose orientation relative to the riboses in the preceding and following nucleotides (G4,

G6, G4*, G6*; see direction of arrows at the O4* ribose atoms). This inverted ribose conformation

positions the H4´, 1H5´ and 2H5´ ribose protons of A5 and A5* directly above the G6 and G6* guanine

rings and the resulting large ring current effects explain the large upfield shifts of these protons. (C)

Back-calculated chemical shift values give excellent agreement with experimentally determined values

(rmsdshift = 0.18 ppm). The upfield shifted H4´, 1H5´ and 2H5´ ribose protons of A5 and A5* are

highlighted with green circles. (D) Subsequently determined experimental NMR ensemble of the 5´-

GABrGU-3´/3´-UBrGAG-5´ internal loop (PDB: 2LX1) agrees with the CS-ROSETTA-RNA model at

atomic resolution (1.07 Å rmsd between the first member of the NMR ensemble and the CS-ROSETTA-

RNA model).

Table Legends Table 1. The CS-ROSETTA-RNA method benchmarked on 21 RNA motifs.

Page 21: Structure determination of noncanonical RNA motifs from H ...

Figure 1

Crystal (PDB: 1FJG) With Chemical Shift Data (1st-ranked model)

DATA Linear Fit

y = 1.02x ! 0.02 R2 = 0.985_ _ _

rmsdshift= 0.193 ppms

E F D

A C B

Page 22: Structure determination of noncanonical RNA motifs from H ...

A Figure 2

E

B

C

F

H

F

I

G

D

Page 23: Structure determination of noncanonical RNA motifs from H ...

Figure 3 A

C DATA Linear Fit

y = 1.003x ! 0.045 R2 = 0.976__ ___

rmsdshift = 0.183 ppm _ _

D NMR Ensemble

1H5"

2H5"

H4"

B CS-ROSETTA-RNA Model

Page 24: Structure determination of noncanonical RNA motifs from H ...

Table 1. The CS-ROSETTA-RNA method benchmarked on 21 RNA motifs. Motif name PDBa Nnt

b rmsdc,d

(Å) rmsd-top5c,e

(Å)

Known structures Single G:G mismatch 1F5G 6 0.71 0.71 UUCG tetraloop 2KOC 6 0.84 0.84 Tandem GA:AG mismatch 1MIS 8 1.10 1.10 Tandem UG:UA mismatch 2JSE 8 3.02 2.52 16s rRNA UUAAGU loop 1FJG 8 0.52 0.52 HIV-1 TAR apical loop 1ANR 8 5.86 5.86 tRNAi

Met ASL 1SZY 9 3.89 1.35 Conserved SRP internal loop 1LNT 12 0.81 0.81 R2 retrotransposon 4x4 loop 2L8F 12 1.17 1.17 Hepatitis C virus IRES IIa 2PN4 13 3.21 1.48 Blind targets UAAC tetraloop 4A4R 6 0.97 0.97 UCAC tetraloop 4A4S 6 1.00 1.00 UGAC tetraloop 4A4U 6 3.60 1.67 UUAC tetraloop 4A4T 6 1.72 1.72 Chimp HAR1 GAA loop 2LHP 7 2.88 2.88 Human HAR1 GAA loop 2LUB 7 2.26 2.03 GU:UAU internal loop – f 9 1.37 1.37 tRNAGly ASL (cuUCCaa)g 2LBL 9 3.28 1.41 tRNAGly ASL (cuUCCcg)g 2LBK 9 3.42 1.94 tRNAGly ASL (uuGCCaa)g 2LBJ 9 3.08 2.93 Self-symmetric GABrGU loop 2LX1 12 1.14 1.14 rmsd < 1.50 Å ! ! 10/21 13/21 rmsd < 2.00 Å ! ! 11/21 16/21

Additional information and full motif names provided in SI Appendix, Tables S1 and S3. Energy vs. rmsd plots for all 21 benchmark motifs provided in SI Appendix, Figure S1.

a PDB ID of experimental reference structure. b Motif size, the number of nucleotides in the modeled RNA motif. Each motif consists of non-canonical core nucleotides closed by boundary canonical (W.C. or G-U wobble) base pairs. c All-heavy-atom rmsd over all nucleotides, excluding the boundary canonical base pairs after alignment over all nucleotides. Nucleotides found to be extra-helical bulges (both unpaired and unstacked) in the experimental structure were excluded from both the alignment and the rmsd calculation. Bold text indicates rmsd better than 2.0 Å. d All-heavy-atom rmsd of the first-ranked (lowest energy) model to the experimental structure. e Lowest all-heavy-atom rmsd to the experimental structure among the five lowest energy cluster centers. f The experimental structure has not yet been deposited into the PDB database. g The sequence of the 7-nt anticodon loop is given in parentheses with the anticodon triplet in uppercase.

Page 25: Structure determination of noncanonical RNA motifs from H ...

1

Supporting Information for “Structure determination of noncanonical RNA motifs from 1H NMR chemical shift data alone”

Parin Sripakdeevong1, Mirko Cevec2, Andrew T. Chang3, Michèle C. Erat4,9, Melanie Ziegeler2, Qin Zhao5, George E. Fox6, Xiaolian Gao6, Scott D. Kennedy7, Ryszard Kierzek8, Edward P. Nikonowicz3,

Harald Schwalbe2, Roland K. O. Sigel9, Douglas H. Turner10, and Rhiju Das1,11,12,* 1Biophysics Program, Stanford University, Stanford, CA 94305, USA 2Center for Biomolecular Magnetic Resonance, Institute for Organic Chemistry and Chemical Biology, Johann Wolfgang Goethe University Frankfurt, Max-von-Laue-Strasse 7, 60438 Frankfurt, Germany 3Department of Biochemistry and Cell Biology, Rice University, Houston, TX 77251, USA 4Department of Biochemistry, University of Oxford, Oxford OX1 3QU, United Kingdom 5Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA 6Department of Biology and Biochemistry, University of Houston, Houston, TX 77004, USA 7Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, New York 14642, USA 8Institute of Bioorganic Chemistry, Polish Academy of Sciences, 60-714 Poznan, Noskowskiego 12/14, Poland 9Institute of Inorganic Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland 10Department of Chemistry, University of Rochester, Rochester, New York 14627, USA 11Department of Biochemistry, Stanford University, Stanford, CA 94305, USA 12Department of Physics, Stanford University, Stanford, CA 94305, USA

To whom correspondence should be addressed. Phone: (650) 723-5976. Fax: (650) 723-6783. E-mail: [email protected].

Page 26: Structure determination of noncanonical RNA motifs from H ...

2

This Supporting Information contains the following sections:

Supporting Methods.

Supporting Results.

Figure S1. Energy vs. all-heavy-atom RMSD to the experimental structure.

Figure S2. Four dynamic and/or unstructured motifs from the RNA motif benchmark.

Figure S3. Egap vs. RMSD of the CS-ROSETTA-RNA lowest energy model to the experimental

structure.

Figure S4. Comparisons between CS-ROSETTA, NMR and crystallographic models of the most

conserved internal loop from the signal recognition particle RNA.

Figure S5. Comparisons between the CS-ROSETTA-RNA and NMR models of the 5´-GABrGU-3´/3´-

UBrGAG-5´ self-complementary internal loop.

Figure S6. Modeling results on the uuGCCaa anticodon stem-loop of B. subtilis tRNAGly by three

different models generation methods.

Table S1. Supplemental information on the RNA motifs benchmark

Table S2. Supplemental Rosetta modeling results (before inclusion of chemical shift data)

Table S3. Supplemental Rosetta modeling results (after inclusion of chemical shift data)

Page 27: Structure determination of noncanonical RNA motifs from H ...

3

Supporting Methods Updates to the standard Rosetta all-atom energy function for RNA Minor updates were made to the Rosetta all-atom energy function for RNA (1). The energy unit reported herein is in Rosetta units (RU), which is used internally by the Rosetta program to store evaluated energy values. Comparisons to RNA Watson-Crick helix thermodynamic parameters (2) indicate that 1 Rosetta unit (RU) is approximately equal to 1 kBT (1) (note that the structure modeling results in this study do not depend on the absolute scale of this energy unit). Compared to the Rosetta energy function used in the prior published work (3), two minor updates were made:

a. Glycosidic (!) torsional potential: In the prior implementation of the glycosidic torsional potential, the glycosidic torsion at the syn minima (~69° if north pucker; ~70° if south pucker) were penalized by 2.2 RU relative to conformations at anti minima (199° if north pucker; ~237° if south pucker) based on the lower frequency of syn glycosidic conformation found in the crystallographic structure of the large ribosomal subunit of H. marismortuii (PDB: 1JJ2). Prior to this work, in separate studies on RNA loop modeling (3) and ab initio motif structure prediction (unpublished results), we observed poor recovery of syn guanosines in noncanonical motifs. Hence, we have modified the glycosidic torsional potential to remove the energy penalty of the syn conformation for guanosine nucleotide.

b. Intra-nucleotide energetic terms: For historical reasons, the energetic contributions from the standard Rosetta all-atom energy terms (van der Waals interactions, hydrogen bonds and solvation effects) were omitted for atom-pairs belonging to the same nucleotide (the exception being a very weak van der Waals interactions repulsion term). These energy terms have been reintroduced here for all base-phosphate intra-nucleotide atom-pairs, including those forming potential hydrogen bonds. The energetic contribution from the standard Rosetta energy terms between intra-nucleotide base-ribose and ribose-phosphate atom-pairs are still presently omitted and instead captured through existing torsional potential terms (1).

CS-ROSETTA-RNA modeling for the 4x4 5´-GABrGU-3´/3´-UBrGAG-5´ self-complementary internal loop

Among the 11 blind motifs in the 21-RNA benchmark, the 4x4 5´-GABrGU-3´/3´-UBrGAG-5´ self-complementary internal loop was unique in that an initial NMR study on the motif was already published prior to the CS-ROSETTA-RNA modeling (4). As noted in the main text, this prior study did not produce sufficient constraints to generate an atomic-detail three-dimensional structure, but a descriptive 2D schematic was proposed (Fig. 3A in main text). We therefore took the following approach to the modeling process. First, approximately half of the features proposed by the 2D schematic were directly incorporated as constraints during the modeling process, including:

i). G4, G4*, G6 and G6* were assumed to adopt the syn glycosidic conformation.

ii). G4-G6* and G4*-G6 were assumed to form base pairs (but the exact base-pairing pattern was

Page 28: Structure determination of noncanonical RNA motifs from H ...

4

not specified).

iii). U7 and U7* were assumed to be extra-helical bulges.

In this case, only Stepwise Assembly (3) (SWA) modeling was performed since the Fragment Assembly with Full-Atom Refinement (1) (FARFAR) method did not have a framework to support incorporating the constraints. Additional features proposed in the prior study that were not used as modeling constraints included:

i). The exact base-pairing pattern (trans Watson-Crick/Hoogsteen) of G4-G6* and G4*-G6.

ii). Two-fold rotational symmetry of the structure.

iii.) 2´-endo sugar conformations at G4, G4*, A5, A5*, G6, and G6*.

The CS-ROSETTA-RNA model was able to independently recover these additional features.

The original blind modeling of the 4x4 5´-GABrGU-3´/3´-UBrGAG-5´ self-complementary internal loop was also completed during the early development stage of the CS-ROSETTA-RNA method. At the time of the original modeling, the intra-nucleotide energetic terms update to the Rosetta all-atom energy function for RNA (see above) was not yet implemented. The lowest energy model obtained from this original CS-ROSETTA-RNA modeling (referred to here as the original Rosetta model) thus contained steric clashes between the amino atoms in the base and the phosphate group atoms of both the G4 and G4* nucleotides. Once the intra-nucleotide energetic terms update to the Rosetta all-atom energy function for RNA was implemented, the original Rosetta model was refined using the following protocol: Stepwise Assembly (SWA) modeling was carried out again on the RNA motif, but now focused only on conformations near the original Rosetta model; a filter was imposed at each SWA building step requiring models to be with 3.0 Å rmsd of the original Rosetta model. The refined Rosetta model generated from this procedure closely resembles the original Rosetta model (1.20 Å pairwise all-heavy-atom rmsd) and retained all the base-pairing and base-stacking features. The refined model also removed steric clashes between the amino atoms in the base and the phosphate group atoms of both the G4 and G4* nucleotides, as expected, and now contain an energetically favorable hydrogen bond between the 1H2 amino proton and the OP1 phosphate oxygen at these nucleotides. The results from the original blind modeling are presented in Table 1 of main text and the Supporting Tables while the refined model is presented in Fig. 3 of the main text.

Supporting Results A criterion for confidence prediction

We discovered that the energy gap, between the lowest energy model and the next lowest energy model that is structurally distinct from the former (at least 1.5 Å rmsd separation), could be used as a metric to assess the confidence of the CS-ROSETTA-RNA modeling. A large energy gap indicates the presence of

Page 29: Structure determination of noncanonical RNA motifs from H ...

5

a single dominant lowest energy conformation with much better energy than all others. Conversely, a small energy gap indicates the presence of multiple structurally distinct conformations with similar energies. As a metric to assess the confidence level of future predictions, we defined the Egap:

where Elowest is the Rosetta energy (including chemical shift pseudo-energy) of the lowest energy model and Enext is the Rosetta energy (including chemical shift pseudo-energy) of the next lowest energy model that is at least 1.5 Å all-heavy-atom rmsd from the former. Based on the results from the 21-RNA benchmark, an Egap value greater than 3.0 kBT is a strong indicator that the lowest energy Rosetta model is accurate. For all 6 cases from the 21-motif benchmark that were found to have an Egap value greater than 3.0 kBT, the lowest energy model was within atomic-accuracy of the experimental structure (under 1.5 Å all-heavy-atom rmsd; see Fig. S3 and Table S3). The criterion was applicable to motifs in both the known structure test set and the blind target set. Interestingly, the same criterion was not applicable to the benchmark results prior to the addition of the chemical shift pseudo-energy term (see Table S2). Thus, inclusion of the experimental chemical shift data was critical for both achieving accurate predictions and assessing their confidence.

FR3D search on the 4x4 5´-GABrGU-3´/3´-UBrGAG-5´ self-complementary internal loop In the Rosetta model for the 4x4 5´-GABrGU-3´/3´-UBrGAG-5´ self-complementary internal loop, the backbone of A5 and A5* adopted a highly irregular conformation, which led to a ~180° inversion in the adenosines’ ribose orientation relative to the riboses in the preceding and following nucleotides (G4, G6, G4*, G6*; see main text Fig. 3B). We used the FR3D program (5) to determine whether this conformational arrangement had been previously observed. The FR3D search, which relies on geometric comparisons of the base conformations, was carried out on a non-redundant list of 654 RNA-containing PDB structures with resolution better than 4 Å (downloaded from the FR3D website on June 02, 2012). The atomic coordinates of the G4-A5-G6 tri-nucleotide from the Rosetta model were used as the query conformation and FR3D was used to search for any geometrically similar consecutive tri-nucleotide conformation in the non-redundant PDB database. FR3D found no matching candidate with geometric discrepancy below the 0.50 Å default cutoff value. The candidate with the lowest geometric discrepancy (0.62 Å), the G1297-U1298-A1299 tri-nucleotide of the 16S rRNA of H. marismortuii (PDB: 2AW7), does not display the aforementioned ~180° inverted ribose conformational arrangement. Lastly, a manual search reveals a potentially similar 180° inverted ribose conformation at the G1426-A1427-G1428 tri-nucleotide of the 23S rRNA of E. coli (PDB: 1VS6). However, all bases in this prior structure were in the anti glycosidic configuration, distinct from the syn G configurations exhibited in the 5´-GABrGU-3´/3´-UBrGAG-5´ Rosetta model.

Page 30: Structure determination of noncanonical RNA motifs from H ...

6

References

1. Das R, Karanicolas J, & Baker D (2010) Atomic accuracy in predicting and designing noncanonical RNA structure. Nat Methods 7(4):291-294.

2. Xia T, et al. (1998) Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick base pairs. Biochemistry 37(42):14719-14735.

3. Sripakdeevong P, Kladwang W, & Das R (2011) An enumerative stepwise ansatz enables atomic-accuracy RNA loop modeling. Proc Natl Acad Sci U S A 108(51):20573-20578.

4. Hammond NB, Tolbert BS, Kierzek R, Turner DH, & Kennedy SD (2010) RNA internal loops with tandem AG pairs: the structure of the 5'GAGU/3'UGAG loop can be dramatically different from others, including 5'AAGU/3'UGAA. Biochemistry 49(27):5817-5827.

5. Sarver M, Zirbel C, Stombaugh J, Mokdad A, & Leontis N (2008) FR3D: finding local and composite recurrent structural motifs in RNA 3D structures. Journal of Mathematical Biology 56(1):215-252.

Page 31: Structure determination of noncanonical RNA motifs from H ...

Figure S1. Energy vs. all-heavy-atom RMSD to the experimental structure. The energy (y-axis) is the sum of the standard Rosetta all-atom energy function for RNA and the chemical shift pseudo-energy term.

All-heavy-atom RMSD (Å) to Experimental Structure

Ene

rgy

Page 32: Structure determination of noncanonical RNA motifs from H ...

Motif name! NMR PDB ID!

Number of models in NMR ensemble!

Average pairwise rmsd (Å) of NMR ensemble!Include closing W.C.

base pairs!Exclude closing W.C.

base pairs!HIV-1 TAR apical loop! 1ANR! 20! 4.07! 4.42!Chimp HAR1 GAA loop! 2LHP! 10! 2.61! 3.33!Human HAR1 GAA loop! 2LUB! 10! 1.98! 2.58!Tandem UG:UA mismatch! 2JSE ! 10! 1.39! 1.77!

Figure S2. Four dynamic and/or unstructured motifs from the RNA motif benchmark. (A) The table reports the average rmsd value calculated between every possible pair of models in the NMR-derived ensemble. Inclusion of the well-structured boundary Watson-Crick base pairs in the rmsd calculation lowers the rmsd value. Four structurally diverse models from the NMR-derived ensemble of the (B) HIV-1 TAR apical loop, (C) chimp HAR1F GAA loop, (D) human HAR1F GAA loop, and (E) tandem UG:UA mismatch. In the UG:UA mismatch case, the structural dynamics appear to be localized to a single gaunosine nucleotide although the motif is known to be thermodynamically destabilizing and contains no base pairs [Biochemistry. 2007 Nov 6;46(44):12665-78].

A

E

PDB: 2JSE | NMR model #1 PDB: 2JSE | NMR model #2 PDB: 2JSE | NMR model #3 PDB: 2JSE | NMR model #5

C

PDB: 2LHP | NMR model #1 PDB: 2LHP | NMR model #5 PDB: 2LHP | NMR model #8 PDB: 2LHP | NMR model #10

D

PDB: 2LUB | NMR model #1 PDB: 2LUB | NMR model #3 PDB: 2LUB | NMR model #6 PDB: 2LUB | NMR model #10

B

PDB: 1ANR | NMR model #1 PDB: 1ANR | NMR model #3 PDB: 1ANR | NMR model #4 PDB: 1ANR | NMR model #6

Page 33: Structure determination of noncanonical RNA motifs from H ...

Figure S3. Egap vs. RMSD of the CS-ROSETTA-RNA lowest energy model to the experimental structure. An energy gap (Egap) value greater than 3.0 kBT (red line) is a strong indicator that the lowest energy Rosetta model will achieve atomic-accuracy. In the 21-RNA benchmark, six motifs have an Egap value greater than 3.0 kBT, and the lowest energy CS-ROSETTA-RNA model for all of these 6 cases were found to be within 1.5 Å rmsd (green line) of the experimental structure.

Page 34: Structure determination of noncanonical RNA motifs from H ...

Figure S4. Comparisons between CS-ROSETTA, NMR and crystallographic models of the most conserved internal loop from the signal recognition particle RNA. The NMR study on this motif produced a tightly defined ensemble, where (A) the mean minimized model and (B) the ensemble member with the best rmsdshift both gave back-calculated chemical shifts for the G4 H1!, G4 H8 and G9 H1! protons that disagreed with the experimental data by at least 1.0 ppm (green circles). Since the experimental 1H chemical shift data and the NMR models originated from the same NMR study, these disagreements could not have been due to differences in experimental conditions (e.g. temperature, salt concentrations and pH). The NMR ensemble also showed disagreements with both (C) the CS-ROSETTA-RNA lowest energy model and (D) the crystallographic model, particularly at the G4-G5 backbone suite (blue dashed circle), the G9-C10 backbone suite, and the C10 nucleobase. .

A NMR mean minimized model (PDB: 28SR)

G9 H1!

DATA Linear Fit

B NMR ensemble model #6 (PDB: 28SP)

C CS-ROSETTA-RNA lowest energy model

D Crystallographic model (PDB: 1LNT)

G4 H8

G4 H2!

G4 H8!

G4 H2!

G4 H8

G4 H2!

G4 H8

G4 H2!

DATA Linear Fit

rmsdshift=0.441 ppm_

rmsdshift =0.504 ppm_

G4 H2!

G4 H8

G9 H1! G4 H2!

G4 H8

rmsdshift=0.180 ppm_

rmsdshift=0.274 ppm_

DATA Linear Fit

DATA Linear Fit

Page 35: Structure determination of noncanonical RNA motifs from H ...

Figure S5. Comparisons between the CS-ROSETTA-RNA and NMR models of the 5´-GABrGU-3´/3´-UBrGAG-5´ self-complementary internal loop. (A-B) The CS-ROSETTA-RNA lowest energy model in cartoon and stick view. (C-D) The first model from the NMR ensemble subsequently solved by the Turner and Kennedy group (PDB: 2LX1) in cartoon and stick view. The pairwise all-heavy-atom rmsd value between the CS-ROSETTA-RNA and the NMR model is 1.07 Å.

A CS-ROSETTA-RNA model (stick view)

C NMR ensemble model #1 (stick view)

B CS-ROSETTA-RNA model (cartoon view)

D NMR ensemble model #1 (cartoon view)

Page 36: Structure determination of noncanonical RNA motifs from H ...

Figure S6. Modeling results on the uuGCCaa anticodon stem-loop of B. subtilis tRNAGly by three different models generation methods. During the original blind modeling, we generated models using (A) SWA and (B) FARFAR with RNA09 fragment database (obtained from http://kinemage.biochem.duke.edu/databases/rnadb.php). CS-ROSETTA-RNA prediction using models from both source (A) and (B) did not return a <2.0 Å rmsd structure (to NMR PDB: 2LBJ) among the five lowest energy clusters. However, modeling with (A) SWA alone does return a <2.0 Å rmsd structure among the five lowest energy clusters (green circle). Lastly, modeling with (C) FARFAR but using the PDB:1JJ2 fragment database also returned a <2.0 Å rmsd structure among the five lowest energy clusters (blue circle).

A Stepwise Assembly (SWA)

B FARFAR with RNA09 fragment database

C FARFAR with PDB:1JJ2 fragment database

Page 37: Structure determination of noncanonical RNA motifs from H ...

Motif nam

e M

otif properties N

on-exchangable 1H chem

ical shift dataa

NM

R structure

b C

rystallographic structureb

Size Strands

Ntotal

Nper

nucleotide Source

PDB

nucleotide-segment

(chain ID)

rmsd

shift (ppm

) PD

B nucleotide-segm

ent(chain ID

) rm

sd shift

(ppm)

Know

n Structures Single G

:G m

ismatch

6 2

39 6.5

BM

RB

Entry 4614 1F5G

3-5 (A

) , 6-8 (B)

0.19 !

! !

UU

CG

tetraloop 6

1 46

7.7 B

MR

B Entry 5705

2KO

C

5-10 (A)

0.24 1F7Y

8-13 (B

) 0.28

Tandem G

A:A

G m

ismatch

8 2

60 7.5

Biochem

istry. 1996 Jul 30;35(30):9677-89 1M

IS 3-6 (A

), 11-14 (B)

0.32 !

! !

Tandem U

G:U

A m

ismatch

8 2

38 4.8

Biochem

istry. 2007 Nov 6;46(44):12665-78

2JSE 4-7 (A

), 16-19 (A)

0.26 !

! !

16s rRN

A U

UA

AG

U loop

8 1

46 5.8

Biochem

istry. 2001 Aug 21;40(33):9879-86

1HS2

3-10 (A)

0.33 1FJG

1089-1096 (A

) 0.25

HIV-1 TA

R apical loop

8 1

56 7.0

Nucleic A

cids Res. 1996 O

ct 15;24(20):3974-81 1A

NR

29-36 (A

) 0.28

! !

! tR

NA

i Met A

SL 9

1 67

7.4 J M

ol Biol. 1997 A

pr 4;267(3):505-19 1SZY

7-15 (A

) 0.30

! !

! C

onserved SRP internal loop

12 2

67 5.6

Nat Struct B

iol. 1999 Jul;6(7):634-8 28SR

5-10 (A

), 19-24 (A)

0.50 1L

NT

4-9 (A

), 16-21 (B)

0.27 R

2 retrotransposon 4x4 loop 12

2 70

5.8 B

MR

B Entry 17406

2L8F 2-7 (A

), 10-15 (A)

0.32 !

! !

Hepatitis C

virus IRES IIa

13 2

93 7.2

Nat Struct B

iol. 2003 Dec;10(12):1033-8

1P5M

9-17 (A), 45-48 (A

) 0.41

2PN4

51-59 (A), 109-112 (B

) 0.20

Blind Targets

UA

AC

tetraloop 6

1 28

4.7 Fox group

4A4R

9-14 (A

) 0.31

! !

! U

CA

C tetraloop

6 1

30 5.0

Fox group 4A

4S 9-14 (A

) 0.25

! !

! U

GA

C tetraloop

6 1

28 4.7

Fox group 4A

4U

9-14 (A)

0.39 !

! !

UU

AC

tetraloop 6

1 28

4.7 Fox group

4A4T

9-14 (A)

0.50 !

! !

Chim

p HA

R1 G

AA

loop 7

2 38

5.4 Schw

albe group 2LH

P 9-11 (A

), 26-29 (A)

0.35 !

! !

Hum

an HA

R1 G

AA

loop 7

2 34

4.9 Schw

albe group 2LU

B

9-11 (A), 26-29 (A

) 0.35

! !

! G

U:U

AU

internal loop 9

2 50

5.6 Sigel group

!c

6-9 (A), 30-34 (A

) 0.44

! !

tRN

AG

ly ASL (cuU

CC

aa) 9

1 65

7.2 N

ikonowicz group

2LBL

5-13 (A)

0.32 !

! !

tRN

AG

ly ASL (cuU

CC

cg) 9

1 59

6.6 N

ikonowicz group

2LBK

5-13 (A

) 0.19

! !

! tR

NA

Gly A

SL (uuGC

Caa)

9 1

69 7.7

Nikonow

icz group 2LB

J 5-13 (A

) 0.37

! !

! Self-sym

metric G

AB

rGU

loop 12

2 90

7.5 K

ennedy and Turner group 2LX

1 3-8 (A

), 14-19 (A)

0.26 !

! !

Average 8.4

1.5 52.7

6.2 !

! !

0.33 !

! 0.25

HIV, hum

an imm

unodeficiency virus; TAR

, trans-activation response; tRN

Ai M

et, initiator methionine tR

NA

; ASL, anticodon stem

-loop; SRP, signal recognition particle;

IRES, internal ribosom

e entry site; HA

R1, hum

an accelerated region 1; tRN

AG

ly, glycine tRN

A; B

rG, 8-brom

oguanosine.

a Ntotal and N

per nucleotide are, respectively, the number of experim

entally assigned non-exchangeable 1H chem

ical shift data (including H1´, H

2´, H3´, H

4´, 1H5´ and 2H

5´ ribose protons, and H

2, H5, H

6 and H8 base protons). The chem

ical shift data were obtained from

the BM

RB

database, from published literature (see citation) and from

N

MR

structure determination studies conducted in parallel w

ith this work (see group nam

e). In two know

n structure cases, the data were not directly available in the

published literature and were graciously provided to us by the authors of the source publication (G

. Varani for the HIV-1 TA

R apical loop and J. D

. Puglisi for the Hepatitis

C V

irus IRES IIa). In B

MR

B Entry 5705, correction w

ere made for the sw

itched chemical shift assignm

ents of 1H5´ and 2H

5´ for nucleotides C8, G

9 and G10.

b The NM

R structures cam

e from the sam

e source as the experimental non-exchangeable 1H

chemical shift data, except for the self-sym

metric G

AB

rGU

loop structure which

came from

a subsequent study by the same group. The first m

odel of the NM

R ensem

ble was used as the experim

ental reference structure (except in the 3 cases below). In 4

cases with know

n structure, we found that the m

otif had also been solved by crystallography. In 3 of the 4 cases, the crystallographic structure provided a better agreement

to the chemical shift data (i.e., low

er rmsd

shift than every mem

ber of the NM

R ensem

ble). In these 3 cases (PDB

ID highlighted in bold text), the crystallographic structure

was used as the experim

ental reference structure. c The experim

ental structure has not yet been deposited into the PDB

database.

Table S1. Supplemental inform

ation on the RN

A m

otifs benchmark

Page 38: Structure determination of noncanonical RNA motifs from H ...

Table S2. Supplemental R

osetta modeling results (before inclusion of chem

ical shift data)

Motif nam

e M

otif properties Low

est energy model (top-1)

Best of five low

est energy cluster centers (top-5)

Size Strands

PDB

a E

gap b rm

sd (Å)

rmsd

shift (ppm

) B

ase-pair recovery

c B

ase-stack recovery

c C

luster rank

rmsd (Å

) rm

sdshift

(ppm)

Base-pair

recoveryc

Base-stack

recoveryc

Know

n Structures Single G

:G m

ismatch

6 2

1F5G

0.33 0.72

0.16 1/1

4/4 1

0.72 0.16

1/1 4/4

UU

CG

tetraloop 6

1 1F7Y

1.54

2.38 0.42

0/0 1/2

1 2.38

0.42 0/0

1/2 Tandem

GA

:AG

mism

atch 8

2 1M

IS 2.35

3.84 0.36

0/2 4/6

2 0.99

0.22 2/2

6/6 Tandem

UG

:UA

mism

atch 8

2 2JSE

0.80 6.36

0.38 0/0

0/3 3

3.43 0.32

0/0 2/3

16s rRN

A U

UA

AG

U loop

8 1

1FJG

0.27 6.63

0.52 0/1

0/3 2

1.99 0.38

1/1 3/3

HIV-1 TA

R apical loop

8 1

1AN

R

0.00 4.86

0.30 0/0

0/0 1

4.86 0.30

0/0 0/0

tRN

Ai M

et ASL

9 1

1SZY

0.38 5.57

0.32 0/0

2/3 2

4.33 0.25

0/0 2/3

Conserved SR

P internal loop 12

2 1LN

T 1.80

1.89 0.40

1/3 9/10

2 0.91

0.26 3/3

10/10 R

2 retrotransposon 4x4 loop 12

2 2L8F

0.06 4.68

0.48 0/4

3/10 5

1.39 0.44

2/4 10/10

Hepatitis C

virus IRES IIa

13 2

2PN4

2.82 6.11

0.35 2/2

5/7 4

5.33 0.35

1/2 5/7

Blind Targets

UA

AC

tetraloop 6

1 4A

4R

0.33 0.85

0.33 0/1

3/3 1

0.85 0.33

0/1 3/3

UC

AC

tetraloop 6

1 4A

4S 0.09

4.66 0.57

0/1 0/4

2 0.72

0.28 0/1

4/4 U

GA

C tetraloop

6 1

4A4U

2.11

5.94 0.49

0/0 1/2

2 1.60

0.38 0/0

2/2 U

UA

C tetraloop

6 1

4A4T

0.44 5.05

0.51 0/0

0/0 3

1.43 0.39

0/0 0/0

Chim

p HA

R1 G

AA

loop 7

2 2LH

P 0.52

5.04 0.50

0/0 1/2

5 3.78

0.38 0/0

2/2 H

uman H

AR

1 GA

A loop

7 2

2LUB

0.90

4.44 0.48

0/1 2/5

4 2.28

0.27 1/1

4/5 G

U:U

AU

internal loop 9

2 !

d 0.16

4.12 0.25

1/2 3/4

1 4.12

0.25 1/2

3/4 tR

NA

Gly A

SL (cuUC

Caa)

9 1

2LBL

0.41 5.56

0.29 0/0

2/5 3

5.47 0.33

0/0 2/5

tRN

AG

ly ASL (cuU

CC

cg) 9

1 2LB

K

0.55 5.34

0.35 1/1

1/5 5

5.00 0.29

1/1 1/5

tRN

AG

ly ASL (uuG

CC

aa) 9

1 2LB

J 1.23

5.02 0.44

1/1 1/3

3 4.11

0.38 1/1

1/3 Self-sym

metric G

AB

rGU

loop 12

2 2LX

1 0.55

5.46 0.53

0/2 2/7

3 1.25

0.29 2/2

7/7 Average

8.4 1.5

! 0.84

4.50 0.40

0.33/1.05 2.10/4.19

2.62 2.71

0.32 0.76/1.05

3.43/4.19 R

MSD

< 1.50 Å

! !

! !

2/21 !

! !

! 8/21

! !

! R

MSD

< 2.00 Å

! !

! !

3/21 !

! !

! 10/21

! !

! E

gap >3.00 kB T

! !

! 0/21

! !

! !

! !

! !

! a PD

B ID

of experimental reference crystallographic or N

MR

structure (see Table S1 for details). b Energy gap (see A criterion for confidence prediction subsection of Supporting R

esults for definition). c N

umber of native base-pairs and native base-stacks correctly recovered by the R

osetta model. B

ase pairs and the base stacks are automatically annotated using the program

M

C-annotate [J M

ol Biol. 2001 M

ay 18;308(5):919-36]. Base-pairing annotation follow

s the Leontis and Westhof nom

enclature [RN

A. 2001 A

pr;7(4):499-512] and recovery entails having the correct edge-to-edge interaction (W

atson-Crick, H

oogsteen, or Sugar-edge) and local strand orientation (cis or trans). Counts of both native base pairs and

correctly recovered base pairs are lowered ow

ing to ambiguities in assignm

ent of bifurcated base pairs, pairs connected by single hydrogen bonds and pairs that are not com

pletely co-planar. Boundary/closing canonical base pairs w

ere not counted. Base-stacks are classified as either upw

ard, downw

ard, outward or inw

ard [RN

A. 2009 O

ct;15(10):1875-85] and recovery entails having the correct base-stacking type. d The experim

ental structure has not yet been deposited into the PDB

database.

Page 39: Structure determination of noncanonical RNA motifs from H ...

Motif nam

e M

otif properties Low

est energy model (top-1)

Best of five low

est energy cluster centers (top-5)

Size Strands

PDB

a E

gap b rm

sd (Å)

rmsd

shift (ppm

) B

ase-pair recovery

c B

ase-stack recovery

c C

luster rank

rmsd (Å

) rm

sdshift

(ppm)

Base-pair

recoveryc

Base-stack

recoveryc

Know

n Structures Single G

:G m

ismatch

6 2

1F5G

0.46 0.71

0.14 1/1

4/4 1

0.71 0.14

1/1 4/4

UU

CG

tetraloop 6

1 1F7Y

3.30

0.84 0.18

0/0 2/2

1 0.84

0.18 0/0

2/2 Tandem

GA

:AG

mism

atch 8

2 1M

IS 7.58

1.10 0.13

2/2 6/6

1 1.10

0.13 2/2

6/6 Tandem

UG

:UA

mism

atch 8

2 2JSE

1.41 3.02

0.12 0/0

2/3 5

2.52 0.15

0/0 3/3

16s rRN

A U

UA

AG

U loop

8 1

1FJG

2.11 0.52

0.19 1/1

3/3 1

0.52 0.19

1/1 3/3

HIV-1 TA

R apical loop

8 1

1AN

R

2.23 5.86

0.18 0/0

0/0 1

5.86 0.18

0/0 0/0

tRN

Ai M

et ASL

9 1

1SZY

0.48 3.89

0.18 0/0

2/3 3

1.35 0.17

0/0 3/3

Conserved SR

P internal loop 12

2 1LN

T 5.51

0.81 0.18

3/3 10/10

1 0.81

0.18 3/3

10/10 R

2 retrotransposon 4x4 loop 12

2 2L8F

4.10 1.17

0.17 3/4

10/10 1

1.17 0.17

3/4 10/10

Hepatitis C

virus IRES IIa

13 2

2PN4

2.08 3.21

0.19 2/2

7/7 2

1.48 0.17

2/2 7/7

Blind Targets

UA

AC

tetraloop 6

1 4A

4R

3.92 0.97

0.24 1/1

3/3 1

0.97 0.24

1/1 3/3

UC

AC

tetraloop 6

1 4A

4S 2.56

1.00 0.20

1/1 4/4

1 1.00

0.20 1/1

4/4 U

GA

C tetraloop

6 1

4A4U

0.74

3.60 0.28

0/0 1/2

2 1.67

0.28 0/0

2/2 U

UA

C tetraloop

6 1

4A4T

0.45 1.72

0.27 0/0

0/0 1

1.72 0.27

0/0 0/0

Chim

p HA

R1 G

AA

loop 7

2 2LH

P 0.73

2.88 0.18

0/0 2/2

3 2.88

0.21 0/0

1/2 H

uman H

AR

1 GA

A loop

7 2

2LUB

1.33

2.26 0.22

1/1 4/5

4 2.03

0.17 0/1

4/5 G

U:U

AU

internal loop 9

2 !

d 1.10

1.37 0.16

2/2 4/4

1 1.37

0.16 2/2

4/4 tR

NA

Gly A

SL (cuUC

Caa)

9 1

2LBL

0.89 3.28

0.19 0/0

5/5 3

1.41 0.20

0/0 5/5

tRN

AG

ly ASL (cuU

CC

cg) 9

1 2LB

K

1.50 3.42

0.17 1/1

4/5 2

1.94 0.16

1/1 5/5

tRN

AG

ly ASL (uuG

CC

aa) 9

1 2LB

J 2.19

3.08 0.16

1/1 3/3

3 2.93

0.17 1/1

3/3 Self-sym

metric G

AB

rGU

loop 12

2 2LX

1 28.13

1.14 0.22

2/2 7/7

1 1.14

0.22 2/2

7/7 Average

8.4 1.5

! 3.39

2.18 0.19

1.00/1.05 3.95/4.19

1.90 1.69

0.19 0.95/1.05

4.10/4.19 R

MSD

< 1.50 Å

! !

! !

10/21 !

! !

! 13/21

! !

! R

MSD

< 2.00 Å

! !

! !

11/21 !

! !

! 16/21

! !

! E

gap >3.00 kB T

! !

! 6/21

! !

! !

! !

! !

! a PD

B ID

of experimental reference crystallographic or N

MR

structure (see Table S1 for details). b Energy gap (see A criterion for confidence prediction subsection of Supporting R

esults for definition). c N

umber of native base-pairs and native base-stacks correctly recovered by the R

osetta model. B

ase pairs and the base stacks are automatically annotated using the program

M

C-annotate [J M

ol Biol. 2001 M

ay 18;308(5):919-36]. Base-pairing annotation follow

s the Leontis and Westhof nom

enclature [RN

A. 2001 A

pr;7(4):499-512] and recovery entails having the correct edge-to-edge interaction (W

atson-Crick, H

oogsteen, or Sugar-edge) and local strand orientation (cis or trans). Counts of both native base pairs and

correctly recovered base pairs are lowered ow

ing to ambiguities in assignm

ent of bifurcated base pairs, pairs connected by single hydrogen bonds and pairs that are not com

pletely co-planar. Boundary/closing canonical base pairs w

ere not counted. Base-stacks are classified as either upw

ard, downw

ard, outward or inw

ard [RN

A. 2009 O

ct;15(10):1875-85] and recovery entails having the correct base-stacking type. d The experim

ental structure has not yet been deposited into the PDB

database.

Table S3. Supplemental R

osetta modeling results (after inclusion of chem

ical shift data)