Investigating diproline segments in proteins: Occurrences, conformation and classification


Indranil Saha,* Narayanaswamy Shamala

Department of Physics, Indian Institute of Science, Bangalore-560012, India

Received 3 May 2011; revised 18 July 2011; accepted 18 July 2011

Published online 6 September 2011 in Wiley Online Library (wileyonlinelibrary.com). DOI 10.1002/bip.21703

This article was originally published online as an accepted preprint. The "Published Online" date corresponds to the preprint version. You can request a copy of the preprint by emailing the Biopolymers editorial office at biopolymers@wiley.com

Additional Supporting Information may be found in the online version of this article.

Correspondence to: Prof. N. Shamala; e-mail: [email protected]

*Present address: Structural Biology Laboratory, ELETTRA Synchrotron Light Laboratory, Trieste-34149, Italy; e-mail: [email protected].

ABSTRACT:

The covalent linkage between the side chain and the backbone nitrogen atom of proline leads to the formation of the five-membered pyrrolidine ring and hence to restriction of the backbone torsion angle φ to values of −60° ± 30° for L-proline. Diproline segments constitute a chain fragment with considerably reduced conformational choices. In the current study, the conformational states of the diproline segment (LPro-LPro) found in proteins have been investigated, with an emphasis on the cis and trans states of the Pro-Pro peptide bond. The occurrence of diproline segments in turns and other secondary structures has been studied and compared with that of Xaa-Pro-Yaa segments in proteins, which gives a better understanding of the restrictions imposed on other residues by the diproline segment and by a single proline residue. The study indicates that PII–PII and PII–α are the most favorable conformational states for the diproline segment. The analysis of Xaa-Pro-Yaa sequences reveals that the Xaa-Pro peptide bond exists preferably as the trans conformer rather than the cis conformer. The present study may lead to a better understanding of the behavior of proline in diproline segments, which can facilitate the design of diproline-based synthetic templates for biological and structural studies. © 2011 Wiley Periodicals, Inc. Biopolymers 97: 54–64, 2012.

Keywords: diproline segments; conformational states; cis Pro-Pro peptide bond; trans Pro-Pro peptide bond; flanking residue conformation

INTRODUCTION

The uniqueness of proline among the 20 genetically coded amino acids lies in the covalent linkage between its side chain and the backbone nitrogen atom. The resulting pyrrolidine ring restricts the torsion angle φ to values of −60° ± 30° for the L-proline residue, a property that has been exploited in the design of well-structured peptides.1–6 Generally, proline exists in three stable conformations: (i) the polyproline (PII) region (φ ≈ −60°, ψ ≈ 120°), (ii) the C7 region or γ-turn (φ ≈ −70°, ψ ≈ 70°), and (iii) the right-handed α-helical region αR (φ ≈ −60°, ψ ≈ −30°). It is now well established that the presence of L-proline in proteins has a large effect on their conformation.7–10 This effect arises because proline is part of a five-membered ring. As a consequence, proline, when linked in a peptide chain, lacks an amide hydrogen atom and can therefore contribute to hydrogen-bonded structures only through its carbonyl group.

A major consequence of the absence of the NH group capable of participating in intramolecular hydrogen bonding, an interaction characteristic of polypeptide secondary
structure,11–19 is that proline occurs infrequently in regular α-helices and β-sheets. However, the presence of proline in the N-cap region of helices is explained by the fact that the backbone torsion angles needed for α-helix formation are readily adopted by proline and by the N-terminal residues, which have solvent-exposed NH groups.20–27 An earlier study also demonstrated the stabilization of α-helices by C–H···O hydrogen bonding involving the proline residue.28 In proteins, the loop regions29–32 and the turn regions1–3 are rich in prolines, which facilitate polypeptide chain reversal and so permit the formation of a globular structure. Several insights have emerged from this large body of published work.30,33–39 The rotational isomeric states, conformational energies, and statistical weights of the local minimum-energy conformers of di-, tri-, tetra-, and penta-L-proline with N-terminal acetyl and C-terminal methyl ester groups have been investigated earlier.40 The occurrence of proline at position i+2 of the β-turn facilitates the formation of a type VIa1 turn,2,41,42 with a cis peptide bond within the diproline segment. Diproline segments constitute a chain fragment with considerably reduced conformational choices. Proline has been recognized as playing a special role in the folding and unfolding of globular proteins7 and has a relatively high intrinsic probability (between 0.1 and 0.3, depending on the adjacent sequence) of existing as the cis rather than the trans peptide isomer,43,44 whereas for other amino acids the probability is much smaller.45 The presence of a cis-proline residue in the α-helix of a protein or polypeptide has been suggested to cause a reversal of direction of the helix,46,47 while a trans-proline residue disrupts the hydrogen bonding in four residues of the α-helix.48 The occurrence of cis peptide bonds involving proline is a noteworthy feature of proteins.49–51

The allowed regions of φ, ψ space for both L- and D-proline residues have been investigated in earlier studies5,6 using designed peptides containing homochiral and heterochiral diproline segments. In the current analysis, the possible conformational states of the diproline segment (LPro-LPro) in proteins taken from a non-redundant dataset have been identified, with an emphasis on the cis and trans states of the peptide bond within the diproline segment. The occurrence of diproline segments in type VIa1 turns (cis Pro-Pro peptide bond) and in other regular secondary structures such as type III β-turns and α-helices has also been studied. This is followed by an analysis of the amino acid distribution flanking the diproline segment and of the conformations adopted by Xaa-Pro and Pro-Yaa segments in proteins.

MATERIALS AND METHODS

Selection of the Non-Redundant Dataset of Proteins from PDB

A dataset comprising 2190 largely non-homologous (≤25% sequence similarity), high-resolution (≤1.8 Å) protein crystal structures was culled from the entire PDB52 using the PISCES server.53 The R-factor cutoff was ≤25% for all structures in the search procedure. From this dataset, proteins containing the sequence stretches XPPY, XPPPY, XPPPPY, and XPPPPPY were identified and selected, which reduced the number of proteins from 2190 to 653. Of these, 47 proteins were discarded because the crystallographic data did not contain complete electron density for the desired sequence stretches. The analysis was therefore carried out on the remaining 606 protein structures in the non-redundant data set. The PDB entries in this data set are given in List S1 of the Supporting Information (polypeptide chain identifiers are indicated wherever multiple homologous chains are present).
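The sequence screen described above can be sketched with a regular expression. This is only an illustration, not the authors' procedure: it assumes one-letter amino acid sequences, takes X and Y to be any non-proline residue, and uses a hypothetical helper name, `find_proline_stretches`. A lookahead is used so that two stretches sharing a flanking residue are both reported.

```python
import re

# Assumption: sequences are one-letter amino acid strings and 'P' is proline.
# X and Y are taken to be any non-proline residue, with 2 to 5 prolines between
# them (XPPY, XPPPY, XPPPPY, XPPPPPY).
# The pattern sits inside a lookahead so that overlapping stretches are found.
STRETCH = re.compile(r"(?=([^P]P{2,5}[^P]))")


def find_proline_stretches(sequence):
    """Return (start_index, stretch) for every X-P(2..5)-Y stretch in sequence."""
    return [(m.start(), m.group(1)) for m in STRETCH.finditer(sequence)]
```

For a made-up sequence such as `"MAPPGKPPPPLT"`, the function returns `[(1, "APPG"), (5, "KPPPPL")]`; a run of six or more prolines is not reported.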

Segregation of Sequences and Assignment of Conformational Regions for the Proline Residue

In the dataset, 809 sequences containing the stretch Xaa-Pro-Pro-Yaa were retrieved. Of these, 60 contained a cis Pro-Pro peptide bond and the rest had a trans peptide bond between the prolines. The 60 cis examples were segregated according to the secondary structure adopted by the Pro-Pro segment. A similar classification was made for the remaining 749 sequences, in which the peptide bond between the prolines is trans. In addition, 20,654 sequences of the type Xaa-Pro-Yaa were retrieved from 2132 of the 2190 proteins (List S2 of the Supporting Information) and grouped into conformational blocks adopted by the Xaa-Pro and Pro-Yaa segments, with cis and trans peptide bonds treated separately. The flowchart in Figure 1 shows the scheme used. The following ranges were used to define the conformational regions: α region, φ = −30° to −90°, ψ = −20° to −80°; γ region, φ = −40° to −100°, ψ = 40° to 100°; PII region, φ = −30° to −90°, ψ = 90° to 180°; Bridge region, φ = −50° to −140°, ψ = −25° to 25°; Extended region, φ = −100° to −160°, ψ = 90° to 180°.
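A minimal sketch of how a (φ, ψ) pair could be assigned to one of the regions defined above. The function name and the tie-breaking rule are assumptions: some of the listed ranges overlap at their edges (for example, the γ and PII regions for ψ between 90° and 100°), and the text does not say how such cases were resolved, so the sketch simply returns the first matching region in the order given here.

```python
# Region boundaries in degrees, transcribed from the Methods text:
# (name, (phi_min, phi_max), (psi_min, psi_max)).
# Assumption: overlapping ranges are resolved by list order (first hit wins);
# the paper does not state its own rule.
REGIONS = [
    ("alpha",    (-90.0, -30.0),   (-80.0, -20.0)),
    ("gamma",    (-100.0, -40.0),  (40.0, 100.0)),
    ("PII",      (-90.0, -30.0),   (90.0, 180.0)),
    ("Bridge",   (-140.0, -50.0),  (-25.0, 25.0)),
    ("Extended", (-160.0, -100.0), (90.0, 180.0)),
]


def classify_region(phi, psi):
    """Return the name of the first region containing (phi, psi), else 'Others'."""
    for name, (phi_min, phi_max), (psi_min, psi_max) in REGIONS:
        if phi_min <= phi <= phi_max and psi_min <= psi <= psi_max:
            return name
    return "Others"
```

With illustrative angles, `classify_region(-71, 168)` gives `"PII"` and `classify_region(-74, -10)` gives `"Bridge"`, the pairing expected for a type VIa1 turn.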

RESULTS AND DISCUSSION

Distribution of Proline Conformations for the

Sequence Xaa-Pro-Pro-Yaa

The (φ, ψ) plots for the proline residues at positions i+1 and i+2 of diproline segments (cis and trans Pro-Pro peptide bond) are shown in Figures 2A–2D. The data indicate that in the cis examples, proline at position i+1 invariably adopts the PII conformation, whereas proline at position i+2 prefers a conformation in the PII or Bridge region. Hence, the most favorable conformations for the prolines in these diproline segments are the PII–Bridge and PII–PII combinations. In the trans case, proline at position i+1 overwhelmingly populates the PII and right-handed α-helical regions, whereas at position i+2 the major conformations are PII and α, with a substantial number of occurrences in the Bridge and C7 (γ-turn) regions. Hence, considerable conformational diversity is observed in diproline segments with a trans peptide linkage between the prolines.

Classification of the Sequences Having a cis Pro-Pro Peptide Bond and trans Pro-Pro Peptide Bond

In our non-redundant dataset of 606 proteins, the Pro-Pro bond was observed to be cis in 60 instances and trans in 749 instances. Among the examples with cis bonds, 30 participated in type VIa1 β-turns (Table I), 21 adopted the PII–PII conformation, and the remaining 9 occurred in loop regions. Among the examples with trans bonds, 32 participated in α-helices (23), 3₁₀ helices (7), or type III β-turns (2) (Table I). γ-Turns occurred in 27 examples, with one of the prolines in the diproline segment adopting a conformation in the C7 (γ-turn) region (Table I). Of the rest, 231 examples belonged to loop regions in proteins and 459 examples belonged to the PII–PII category.

Conformation of Pro-Pro Segments in Proteins

Table II groups the Pro-Pro segments in proteins into the following 10 categories: (i) α–α, (ii) α–PII, (iii) PII–α, (iv) PII–PII, (v) PII–γ, (vi) PII–Bridge, (vii) γ–PII, (viii) Extended–PII, (ix) Extended–Bridge, and (x) Others (segments that did not belong to any of the categories above). For the sequence Xaa-Pro-Pro-Yaa, the first cis/trans label in the table refers to the Xaa-Pro peptide bond and the second to the Pro-Pro peptide bond.
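The labeling convention of Table II can be illustrated with a short sketch. The helper names are hypothetical, and the ω cutoff separating cis from trans (|ω| < 90°) is an assumption, since the text does not state the criterion that was used.

```python
# The nine named categories of Table II, as (first proline, second proline).
NAMED_CATEGORIES = {
    ("alpha", "alpha"), ("alpha", "PII"), ("PII", "alpha"), ("PII", "PII"),
    ("PII", "gamma"), ("PII", "Bridge"), ("gamma", "PII"),
    ("Extended", "PII"), ("Extended", "Bridge"),
}


def bond_state(omega, cis_cutoff=90.0):
    """Label a peptide bond from its omega angle in degrees.

    Assumption: |omega| below the cutoff is called cis, anything else trans.
    """
    return "cis" if abs(omega) < cis_cutoff else "trans"


def configuration(omega_xaa_pro, omega_pro_pro):
    """First label: Xaa-Pro bond; second label: Pro-Pro bond."""
    return f"{bond_state(omega_xaa_pro)}-{bond_state(omega_pro_pro)}"


def category(region_first, region_second):
    """Return the Table II category name, or 'Others' if the pair is not listed."""
    pair = (region_first, region_second)
    return f"{region_first}-{region_second}" if pair in NAMED_CATEGORIES else "Others"
```

For example, `configuration(178.0, 2.0)` yields `"trans-cis"`, and `category("PII", "Bridge")` yields `"PII-Bridge"`.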

Cis–Cis and Trans–Cis Configurations. The data indicate that the cis–cis configuration of the peptide bonds is quite rare for the diproline segment. In six of the seven examples, the first proline adopts the PII conformation, and the second proline occurs preferentially in the Bridge region. A total of 53 examples were observed in the trans–cis category, six of them belonging to the "Others" category. The first proline residue invariably adopts a conformation in the PII region (even in the six examples of the "Others" category, the first proline residue is PII), with the second proline overwhelmingly populating the Bridge region, which accounts for the large number of type VIa1 turns. The next most populated conformation at position i+2 is PII. It is noteworthy that for a cis Pro-Pro peptide bond there is almost no occurrence (only one example) of the PII–α conformation, which is one of the most favored conformations for a trans Pro-Pro peptide bond.

FIGURE 1 Flowchart of the scheme used for the analysis.

Cis–Trans and Trans–Trans Configurations. The fifty-five examples in the cis–trans category indicate that the energy difference between the cis and trans isomers is small for peptide bonds involving proline residues.49–51 Unlike the two categories discussed above, a substantial population (43 examples) in this category adopts the PII–PII conformation, followed by 8 examples of the PII–α conformation and 3 examples of the PII–γ conformation. The numbers of examples in the PII–Bridge and Extended–PII categories are negligibly small (one each). The data indicate that with a cis–trans peptide linkage, PII–PII is the most stable and favored conformation for the Pro-Pro segment in proteins. The highest number of examples (691) in the category trans–trans

FIGURE 2 Ramachandran map showing the distribution of backbone torsion angles (φ, ψ) for (A) Pro(i+1) and (B) Pro(i+2) for the sequence Xaa-Pro-Pro-Yaa in which the Pro-Pro peptide bond is cis, and for (C) Pro(i+1) and (D) Pro(i+2) for the sequence Xaa-Pro-Pro-Yaa in which the Pro-Pro peptide bond is trans.


Table I List of Type VIa1 β-Turns, Type III β-Turns, Helices, and γ-Turns

Type VIa1 β-turns (cis Pro-Pro peptide bond)

PDB ID  Location      (φ, ψ, ω) Pro i+1   (φ, ψ, ω) Pro i+2 (deg)
1NNL A  CP188P189A    −71, 168, 0         −74, −10, 178
1V0W A  CP349P350L    −75, 173, 2         −91, −2, −180
2F7V A  NP23P24R      −64, 163, 0         −85, −1, −180
2DPL A  KP301P302A    −73, 169, 1         −82, −17, −176
1AK0    NP88P69T      −68, 150, 0         −91, 3, −179
1LUG A  TP200P201L    −47, 137, 12        −86, 8, 168
1BS0 A  RP350P351T    −46, 152, −2        −94, 19, 177
1H4G A  RP106P107G    −82, 169, 3         −115, 15, 175
1IFR A  AP508P509T    −64, 155, 0         −93, 3, −178
1JFB A  DP91P92E      −63, 149, −3        −88, −6, −178
1JZ8 A  NP111P112F    −59, 142, 5         −102, 20, −176
1K0M A  CP90P91R      −62, 154, 4         −83, −12, −174
1N97 A  HP201P202L    −71, 148, 0         −77, −4, −179
2ERL    CP37P38Y      −65, 148, −3        −93, −2, 179
1V54 A  YP130P131L    −90, 163, 2         −77, −14, −180
1V6S A  VP322P323F    −59, 150, 0         −87, 17, −178
1S1D A  QP207P208G    −55, 147, 4         −85, 12, 175
1VLP A  RP284P285Y    −56, 141, 5         −83, −18, −176
1SG4 A  NP23P24V      −66, 161, 3         −81, −12, −169
1YXY A  YP81P82N      −67, 156, 4         −81, −6, −177
1ZZM A  FP15P16F      −68, 151, 1         −84, 5, −177
1TUO A  TP241P242S    −55, 153, 0         −91, 19, 180
2BW4 A  KP22P23F      −55, 149, 10        −87, 3, 177
1WXC A  LP168P169Y    −51, 141, 8         −88, −1, 173
2FHZ B  TP54P55D      −70, 158, 0         −94, −2, −180
2B0J A  KP114P115K    −62, 144, 6         −91, −1, −179
2F26 A  RP315P316A    −64, 149, 0         −101, −4, −179
2DCF A  EP29P30H      −80, 151, 0         −86, 1, −172
2JEK A  GP99P100D     −52, 144, 6         −91, 11, −176
2OJ5 A  LP376P377L    −48, 142, 8         −95, 15, 174

Helices and type III β-turns (trans Pro-Pro peptide bond)

PDB ID  Location      (φ, ψ, ω) Pro i+1   (φ, ψ, ω) Pro i+2 (deg)   β-Turn/Helix
1D3G A  GP364P365V    −51, −35, 179       −62, −19, 178             α
1I6L A  YP126P127L    −57, −45, 178       −62, −34, 178             3₁₀
1IRD B  TP324P325V    −54, −44, 177       −66, −35, 174             α
1JG1 A  FP154P155K    −52, −37, 173       −56, −28, −178            3₁₀
1M22 A  MP327P328L    −42, −51, −180      −61, −39, −179            α
1O2D A  TP205P206S    −60, −48, 179       −65, −39, −180            α
1O8X A  CP41P42A      −57, −54, −178      −64, −20, 177             α
1QQF A  VP1250P1251V  −53, −44, −180      −64, −37, 179             α
1VNS    GP47P48L      −61, −58, −179      −61, −45, −180            α
1VNS    TP300P301R    −50, −43, 179       −53, −31, 179             α
3SIL    FP267P268M    −56, −36, −176      −62, −25, 177             Type III
1V54 A  LP106P107S    −57, −45, 179       −61, −42, 180             α
1R7A A  LP256P257L    −49, −58, 180       −71, −29, 179             α
1NZJ A  DP49P50R      −59, −35, 168       −53, −28, 180             3₁₀
1RK6 A  YP193P194A    −49, −44, 179       −61, −21, −177            3₁₀
1UCD A  WP11P12A      −45, −50, −180      −59, −41, −180            α
1V9F A  YP124P125I    −46, −36, 178       −58, −19, 177             3₁₀
1T6U A  KP65P66H      −45, −45, 179       −61, −33, 177             α
1OJ8 A  RP41P42R      −50, −48, −180      −52, −44, −179            α
1S7I A  IP106P107G    −43, −55, −180      −60, −23, −179            α/3₁₀
1VPM A  LFP20P21D     −50, −37, 176       −63, −19, −176            3₁₀
1YDI A  MP75P76A      −48, −48, −179      −55, −36, −180            α
2AVD A  NP139P140E    −57, −43, −176      −57, −32, 174             α
2AXO A  CP58P59A      −42, −46, 180       −56, −40, −180            α
2AEU A  NP241P242L    −53, −40, 178       −59, −40, 174             α
1WBE A  LP31P32F      −48, −40, 178       −57, −24, −180            α/3₁₀
2ETV A  YP213P214F    −49, −43, 176       −62, −35, 174             α
2DE3 A  IP76P77L      −59, −48, 180       −67, −30, 179             α
2HA8 A  KP86P87N      −51, −40, 177       −62, −20, 179             α/3₁₀
2IOI A  SP1125P1126L  −55, −52, −175      −60, −29, 180             Type III
2INU A  YP344P345F    −31, −54, 180       −48, −31, −178            3₁₀
2NT0 A  SP98P99A      −49, −44, 176       −59, −40, 180             α

γ-Turns (trans Pro-Pro peptide bond)

PDB ID  Location      (φ, ψ, ω) Pro i+1   (φ, ψ, ω) Pro i+2 (deg)
1T9H A  RP73P74I      −65, 161, 178       −77, 96, 178
1U71 A  LP322P323H    −71, 156, −173      −76, 63, 177
1C61 A  YP339P340I    −74, 133, −176      −80, 63, 179
2CUL A  VP168P169G    −68, 50, 65         −42, 162, 180
1F8E A  SP166P167T    −88, 161, 179       −79, 98, −180
1LLF A  EP31P32V      −54, 122, −176      −83, 56, −174
2BKM A  GP61P62L      −57, 149, −174      −79, 63, −172
1PMI    DP353P354I    −77, 158, 180       −81, 42, 176
1QW9 A  AP315P316L    −66, 155, −178      −82, 64, −177
1UEK A  DP196P197Y    −69, 141, 174       −77, 60, 170
1VNS    KP395P396F    −51, 133, −179      −72, 56, 179
1SU8 A  LP584P585I    −81, 165, −171      −84, 67, −177
1S9U A  SP85P86W      −61, 126, −174      −82, 42, 179
1VLA A  EP101P102K    −66, 150, 175       −79, 73, −165
1X6O A  MP125P126D    −71, 157, 174       −77, 46, −177
1W23 A  VP239P240F    −78, 167, 173       −78, 83, −165
1T1U A  LP240P241I    −54, 158, −178      −78, 61, −165
2B61 A  TP203P204D    −76, 160, −179      −79, 63, −175
2EX4 A  IP24P25T      −55, 139, −179      −86, 72, −178
2FPQ A  KP61P62R      −78, 159, 179       −80, 55, 179
2AGY A  LP172P173K    −66, 151, −172      −77, 59, −179
2CDU A  VP120P121I    −81, 168, −175      −83, 47, 175
2EX2 A  AP163P164A    −52, 119, −176      −81, 76, −171
2FUK A  AP142P143A    −59, 125, −179      −70, 60, 178
2G8J A  FP62P63V      −75, 163, −177      −80, 68, −177
2ICY A  YP189P190G    −86, 131, −177      −77, 46, 177
2GCI A  VP149P150L    −71, 132, 174       −76, 60, −158

immediately leads to the conclusion that the trans peptide bond is strongly favored within the diproline segment. The thirty-two examples of helical conformation (Table I), together with a nearly equal number of examples (33) in which the diproline segment adopts the PII–Bridge conformation, suggest that with a trans peptide linkage the α–α and PII–Bridge conformations are equally likely. However, the data strongly indicate that with a trans peptide linkage between the prolines, PII–PII is the most stable and favored conformation (416 examples). Left-handed polyproline II helices, which are "very locally driven,"54 have not been investigated in this analysis. Twenty-four examples were noted in the C7 (γ-turn) region. The overall percentage distribution of conformational states (Table II) reveals that PII–PII and PII–α are the most favored states for the diproline segment, with occurrences of 59.26% and 22%, respectively, followed by PII–Bridge, whose percentage occurrence is much lower than that of the first two categories. The table also shows that the populations of the trans–cis and cis–trans states are comparable, indicating that the energy difference between these states is small. However, trans–trans is by far the most populated state, with a percentage occurrence of 85.43%.
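The configuration totals and percentages quoted above can be cross-checked against the counts in Table II. The sketch below transcribes those counts (dashes taken as zero) and recomputes the column totals and their shares of the 809 segments; the names are illustrative. Recomputed shares may differ from the published row in the last decimal place because of rounding.

```python
# Counts transcribed from Table II, in the column order
# (cis-cis, cis-trans, trans-cis, trans-trans); a dash in the table is taken as 0.
TABLE_II = {
    "alpha-alpha":     (0, 0, 0, 32),
    "alpha-PII":       (0, 0, 1, 2),
    "PII-alpha":       (0, 8, 1, 169),
    "PII-PII":         (2, 43, 19, 416),
    "PII-gamma":       (0, 3, 0, 23),
    "PII-Bridge":      (4, 1, 26, 33),
    "gamma-PII":       (0, 0, 0, 1),
    "Extended-PII":    (0, 1, 0, 0),
    "Extended-Bridge": (0, 0, 0, 1),
    "Others":          (1, 2, 6, 14),
}
CONFIGURATIONS = ("cis-cis", "cis-trans", "trans-cis", "trans-trans")


def column_totals(table):
    """Sum each configuration column over all conformational categories."""
    return tuple(sum(row[i] for row in table.values()) for i in range(4))


def percent_shares(totals):
    """Express each configuration total as a percentage of all segments."""
    grand_total = sum(totals)
    return {name: 100.0 * n / grand_total for name, n in zip(CONFIGURATIONS, totals)}
```

Here `column_totals(TABLE_II)` returns `(7, 58, 53, 691)`, which sums to the 809 segments analyzed; adding the two columns with a cis Pro-Pro bond gives 60, and the two with a trans Pro-Pro bond give 749.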

The puckering states of the pyrrolidine ring and their possible influence on diproline segment conformation have been studied in a separate analysis (to be published later).

α–PII Conformations in Trans–Cis and Trans–Trans Configurations

Three examples of the α–PII conformation observed in the data merit mention, as this conformation has not been observed earlier.55 The first example occurs unexpectedly in the trans–cis category, in histone-lysine N-methyltransferase (PDB ID: 2F69 A), with prolines at positions 341 and 342 of the amino acid sequence. The proline at position 342 was inserted by molecular modeling methods, and hence the cis peptide bond may be an outcome of this procedure.56 The other two examples are found in the trans–trans category. In the protein "Probable Glutaminase YBAS" (PDB ID: 1U60 A),57 the diproline segment is part of a bend, characterized by the α–PII conformation, that connects two β-strands. In the third example (Periplasmic Binding Protein BUGD; PDB ID: 2F5X A),58 the diproline segment is part of an eight-residue loop that connects a helix and a β-strand. Hence, the data suggest that adoption of the α–PII conformation is governed primarily by the local interactions present in that region of the protein, which facilitate folding and initiate specific interactions.

Number of Occurrences of Proline in Various

Conformational States

Table III shows the number of occurrences of each conformation for prolines in diproline segments. For cis proline, PII is the most preferred conformation at position i+1, whereas at position i+2 the preferred conformations lie in the PII and Bridge regions. For trans proline, PII is still the dominant conformation at both positions i+1 and i+2, followed by an appreciable number of occurrences of the right-handed α-helical conformation. The numbers of occurrences of proline in the Bridge region and in the γ-turn conformation are comparable. Thus, the table clearly indicates that PII is the most preferred conformation at position i+1 for both cis and trans proline, and that the number of appreciably populated conformational states at position i+2 is greater for trans proline than for cis proline.

Distribution of Flanking Residues and Their Preferred Conformational States (φ, ψ)

The number of occurrences of each amino acid other than proline in the flanking positions is given in Table IV. A histogram of these occurrences (grouped into hydrophobic, polar, and charged categories) is shown in Figure 3. For the cis Pro-Pro peptide bond (60 examples), the left flanking position (i) is more often occupied by polar and charged amino acids such as Thr, Arg, and Lys, whereas the right flanking position (i+3) favors hydrophobic amino acids such as Leu and Phe. For the trans configuration (749 examples), the left flanking position (i) shows a preference for hydrophobic amino acids (Leu in particular), whereas the right flanking position (i+3) tends to be occupied by hydrophobic amino acids such as Gly, Ala, Val, and Leu and by charged amino acids such as Glu and Lys. Among the polar amino acids, only Thr shows an appreciable tendency to occur at this position.

Table II Conformation of Pro-Pro Segments in Proteins

Type              Cis–Cis   Cis–Trans   Trans–Cis   Trans–Trans   Overall %
α–α               –         –           –           32            3.95
α–PII             –         –           1           2             0.37
PII–α             –         8           1           169           22.00
PII–PII           2         43          19          416           59.26
PII–γ             –         3           –           23            3.21
PII–Bridge        4         1           26          33            8.27
γ–PII             –         –           –           1             0.12
Extended–PII      –         1           –           –             0.12
Extended–Bridge   –         –           –           1             0.12
Others            1         2           6           14            2.83
Total             7         58          53          691           809
%                 0.86      7.16        6.54        85.43

The distributions of the backbone torsion angles φ, ψ at the left flanking position (i) and the right flanking position (i+3) for all 19 amino acids other than proline are shown in Supporting Information Figures S1–S5. Analysis of the left flanking position (i) reveals that the Extended region is preferred by Gly, Val, and Leu, whereas both the Extended and polyproline regions are favored by Ala, Ser, Thr, Ile, Asp, Glu, Lys, Arg, Cys, Phe, and Tyr. The γ-turn region is markedly populated by Asn, His, Phe, and Tyr, whereas the helical region is substantially populated only by Asn. For the right flanking position (i+3), the following preferences are observed: helical region (Asn, Gln, Val, and Leu); Bridge and helical regions (Gly, Lys, Arg, His, Phe, and Tyr); Extended and helical regions (Ala, Ser, Thr, Asp, Glu, Ile, and His); polyproline region (Asp, Glu, and Ile). Met and Trp are present in very small numbers in the dataset, and hence no statistically meaningful conclusion can be drawn from their distribution plots.

Conformation of Xaa-Pro and Pro-Yaa Segments in Proteins (Sequences of the Type Xaa-Pro-Yaa)

A total of 20,654 sequences of the type Xaa-Pro-Yaa (Xaa and Yaa are amino acids other than proline) were retrieved from the dataset and grouped into conformational categories, with cis and trans peptide bonds in the Xaa-Pro and Pro-Yaa segments treated separately. Table V lists the various conformational categories. Comparison with Table II reveals more allowed combinations of conformational states for these segments than for the diproline segment, which underlines the effect of the torsional restriction imposed by the pyrrolidine ring of proline on its own conformation and on that of the neighboring residue. Table V indicates that the Xaa-Pro peptide bond exists preferably as the trans conformer. The same holds for Pro-Yaa segments, with the cis conformer populated to an even lesser extent. The data show that α–α, PII–α, PII–PII, and Extended–PII are the most populated states for Xaa-Pro and Pro-Yaa segments, compared with the PII–α and PII–PII states observed for the Pro-Pro segment.

Table III Number of Occurrences of Proline in Various Conformations

               Xaa-P-P-Yaa (cis Proline)   Xaa-P-P-Yaa (trans Proline)   Xaa-P-P-P-Yaa (trans Proline)
Conformation   i+1        i+2              i+1         i+2               i+1     i+2     i+3
α              1          1                34          209               –       2       12
PII            52 + 7     22               697 + 12    463 + 4           41      38      20
Bridge         –          30               –           35                –       –       8
γ              –          –                1           26                –       –       1
Extended       –          –                2           –                 –       –       –

Cis or trans proline refers to the Pro-Pro peptide bond being cis or trans. Entries are numbers of occurrences; the numbers following a plus sign are examples from the category "Others."

Table IV Number of Amino Acid Occurrences for the Left Flanking Position (i) and the Right Flanking Position (i+3) (seq. Xaa-P-P-Yaa)

              Left Flanking Position (i)          Right Flanking Position (i+3)
Amino acid    Cis Pro-Pro     Trans Pro-Pro       Cis Pro-Pro     Trans Pro-Pro
              Peptide Bond    Peptide Bond        Peptide Bond    Peptide Bond
Gly (G)       4               41                  3               85
Ala (A)       –               56                  4               68
Val (V)       2               65                  3               58
Leu (L)       5               126                 8               63
Ile (I)       –               43                  1               26
Met (M)       –               14                  –               8
Phe (F)       1               53                  11              23
Trp (W)       –               5                   –               4
Asn (N)       5               39                  1               28
Cys (C)       5               19                  –               11
Ser (S)       2               28                  1               41
Thr (T)       8               56                  5               47
Tyr (Y)       3               26                  6               24
Gln (Q)       4               31                  2               34
Glu (E)       2               31                  4               83
Asp (D)       4               29                  3               36
Arg (R)       7               36                  3               32
His (H)       1               24                  4               28
Lys (K)       7               27                  1               50

Cis Peptide Linkage. Table V indicates that the α–α, α–PII, PII–α, PII–γ, γ–PII, and Extended–Bridge conformational combinations are less favored for the Xaa-Pro segment, as in the case of Pro-Pro segments. PII–PII and PII–Bridge are favored combinations in both cases. However, the Extended–PII combination, which is absent in the case of Pro-Pro, is heavily populated. This clearly indicates that when proline takes up a PII conformation, the preceding amino acid residue adopts either an Extended or a PII conformation in the majority of cases. However, when proline takes up a conformation in the Bridge region, the preceding amino acid invariably takes up a PII conformation. This agrees with the occurrence of Type VIA1 turns in Xaa-Pro segments with a cis peptide linkage between Xaa and proline. Two categories that merit mention are the Extended–α (96 examples) and Extended–PII (334 examples) combinations, which are absent for diproline segments having a cis Pro-Pro peptide bond. These two categories represent examples belonging to cis-Pro-touch turns.59 The other combinations listed in Table V are infrequent.

The Pro-Yaa segment very rarely takes up a cis configuration of the peptide bond. There is no appreciable population in any category other than “Others.” Only nine examples of the PII–Extended conformation are observed.

Trans Peptide Linkage. For the trans Xaa-Pro peptide bond, PII–PII is the most favored combination of conformations, followed by the Extended–PII combination (unlike in the diproline segment analysis). α–α and PII–α are populated to a substantial extent, as in the case of Pro-Pro.

FIGURE 3 Amino acid distribution (except proline) showing the number of occurrences of all amino acids in the flanking positions i and i+3 for the sequence stretch X-P-P-Y with (A) cis Pro-Pro peptide bond, (B) trans Pro-Pro peptide bond.


However, both the PII–γ and PII–Bridge conformational blocks show a lower percentage than that observed for Pro-Pro. A total of 38 examples are observed for the α–PII combination, which was present only in trace quantities for Pro-Pro.

For the trans Pro-Yaa peptide bond, the conformational blocks are more populated than in any other category. The PII–PII, α–α, and PII–α conformational combinations are highly preferred for this segment in the trans case, which corroborates the analysis of diproline segments presented earlier. The α–PII combination is present in large numbers in this category. Interestingly, the α–Bridge combination is also quite heavily populated in this category.
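The two-residue labels used above (for example PII–Bridge with a cis peptide bond) combine a Ramachandran region for each residue with the state of the connecting peptide bond. The following Python sketch illustrates one way such labels could be assigned; it is not the procedure used in this study. The PII, γ (C7), and α centres follow the values quoted in the Introduction, whereas the Bridge and Extended centres, the 40° cutoff, the 90° split on ω, and all function names are assumptions made only for illustration.

```python
import math

# Hypothetical region centres (phi, psi) in degrees.
# PII, gamma (C7) and alpha follow the Introduction; Bridge and Extended
# centres and the cutoff are assumptions for this illustration only.
CENTRES = {
    "alpha": (-60.0, -30.0),
    "PII": (-60.0, 120.0),
    "gamma": (-70.0, 70.0),
    "Bridge": (-90.0, 0.0),
    "Extended": (-120.0, 130.0),
}
CUTOFF = 40.0  # assumed maximum distance (degrees) from a region centre


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def region(phi, psi):
    """Nearest region centre within CUTOFF, otherwise 'Others'."""
    best, best_dist = "Others", CUTOFF
    for name, (phi0, psi0) in CENTRES.items():
        dist = math.hypot(angle_diff(phi, phi0), angle_diff(psi, psi0))
        if dist < best_dist:
            best, best_dist = name, dist
    return best


def peptide_bond(omega):
    """Label omega as cis (near 0) or trans (near 180); 90 degree split assumed."""
    return "cis" if angle_diff(omega, 0.0) < 90.0 else "trans"


def segment_label(res1, res2, omega):
    """Combine the bond state with the regions of two consecutive residues."""
    return f"{peptide_bond(omega)} {region(*res1)}-{region(*res2)}"


if __name__ == "__main__":
    # Illustrative angles only, not taken from any structure in the data set.
    print(segment_label((-60.0, 140.0), (-85.0, 5.0), 3.0))
```

A nearest-centre rule is used here only because it is compact; explicit rectangular (φ, ψ) ranges for each region would be an equally reasonable choice and would change the counts at region boundaries.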

CONCLUSIONS

The data and analysis presented in this study lead to a number of conclusions pertaining to the preferred proline conformation in diproline segments. For a cis Pro-Pro peptide bond, the first proline lies in the PII region, whereas the second proline invariably adopts a conformation in the Bridge region, leading to the formation of the type VIA1 β-turn structure. In the trans case, however, the first proline overwhelmingly populates the PII (polyproline) and right-handed α-helical regions. At position i+2, the major conformations adopted by proline are PII and α, with a substantial number of occurrences in the Bridge and C7 (γ-turn) regions. The analysis also reveals that the cis–cis configuration of the peptide bond is very rare in the diproline segment (Table II). With a cis–trans peptide linkage, the PII–PII conformation is the most stable and favored conformation for the Pro-Pro segment in proteins. The trans peptide bond is strongly favored within diproline segments in proteins. With a trans peptide bond between the proline residues, the α–α and PII–Bridge conformations are equally likely. The overall percentage distribution of conformational states for the diproline segment reveals that PII–PII and PII–α are

Table V Conformation of Xaa-Pro and Pro-Yaa Segments in Proteins (20,654 Sequences of the Type Xaa-Pro-Yaa)

                     Xaa-Pro Segments     Pro-Yaa Segments
Type                 Cis       Trans      Cis       Trans      Total      Overall %
α–α                  1         1624       -         3886       5511       13.34
α–PII                6         38         -         138        182        0.44
α–γ                  -         10         -         26         36         0.10
α–Bridge             1         211        -         1944       2156       5.22
α–Extended           -         3          -         456        459        1.11
PII–α                11        2418       -         2007       4436       10.74
PII–PII              150       4344       3         3121       7618       18.44
PII–γ                1         257        -         169        427        1.03
PII–Bridge           188       655        1         563        1407       3.41
PII–Extended         5         1          9         2983       2998       7.26
Bridge–α             3         5          -         1193       1201       2.91
γ–α                  -         -          -         99         99         0.24
γ–PII                -         3          -         117        120        0.29
γ–γ                  -         -          -         3          3          0.01
γ–Bridge             2         2          -         38         42         0.10
γ–Extended           1         1          1         192        195        0.47
Bridge–PII           1         4          -         285        290        0.70
Bridge–γ             -         -          -         24         24         0.06
Bridge–Extended      2         -          -         295        297        0.72
Extended–α           96        1383       -         11         1490       3.61
Extended–PII         334       4290       -         20         4644       11.24
Extended–γ           1         169        -         -          170        0.41
Extended–Bridge      -         519        -         -          519        1.26
Extended–Extended    2         4          -         11         17         0.04
Others               231       3677       33        3026       6967       16.86
Total                1036      19,618     47        20,607     41,308
%                    2.51      47.49      0.11      49.89

Cis or trans proline refers to the Xaa-Pro/Pro-Yaa peptide bond being cis or trans.
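As a consistency check on Table V, the column totals and overall percentages can be recomputed from the individual rows. The Python sketch below is illustrative only and uses hypothetical names. Empty cells are entered as zeros, and their placement is an assumption inferred from the printed row and column totals; region labels are spelled out in ASCII.

```python
# Rows of Table V as (Xaa-Pro cis, Xaa-Pro trans, Pro-Yaa cis, Pro-Yaa trans).
# Assumption: empty cells are zero, placed so that the printed row and
# column totals are reproduced.
TABLE_V = {
    "alpha-alpha": (1, 1624, 0, 3886),
    "alpha-PII": (6, 38, 0, 138),
    "alpha-gamma": (0, 10, 0, 26),
    "alpha-Bridge": (1, 211, 0, 1944),
    "alpha-Extended": (0, 3, 0, 456),
    "PII-alpha": (11, 2418, 0, 2007),
    "PII-PII": (150, 4344, 3, 3121),
    "PII-gamma": (1, 257, 0, 169),
    "PII-Bridge": (188, 655, 1, 563),
    "PII-Extended": (5, 1, 9, 2983),
    "Bridge-alpha": (3, 5, 0, 1193),
    "gamma-alpha": (0, 0, 0, 99),
    "gamma-PII": (0, 3, 0, 117),
    "gamma-gamma": (0, 0, 0, 3),
    "gamma-Bridge": (2, 2, 0, 38),
    "gamma-Extended": (1, 1, 1, 192),
    "Bridge-PII": (1, 4, 0, 285),
    "Bridge-gamma": (0, 0, 0, 24),
    "Bridge-Extended": (2, 0, 0, 295),
    "Extended-alpha": (96, 1383, 0, 11),
    "Extended-PII": (334, 4290, 0, 20),
    "Extended-gamma": (1, 169, 0, 0),
    "Extended-Bridge": (0, 519, 0, 0),
    "Extended-Extended": (2, 4, 0, 11),
    "Others": (231, 3677, 33, 3026),
}


def column_totals(table):
    """Sum each of the four columns over all conformational types."""
    return tuple(sum(row[i] for row in table.values()) for i in range(4))


def overall_percent(table, label):
    """Percentage of all segments that fall into the given type."""
    grand_total = sum(column_totals(table))
    return round(100.0 * sum(table[label]) / grand_total, 2)


if __name__ == "__main__":
    totals = column_totals(TABLE_V)
    grand = sum(totals)
    print("column totals:", totals, "grand total:", grand)
    print("column %:", [round(100.0 * t / grand, 2) for t in totals])
    print("PII-PII overall %:", overall_percent(TABLE_V, "PII-PII"))
```

The grand total counts every Xaa-Pro-Yaa sequence twice, once as an Xaa-Pro segment and once as a Pro-Yaa segment, which is why it is twice the 20,654 sequences given in the table title.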


the most favorable conformational states for the diproline segment, with percentage occurrences of 59.26% and 21.97%, respectively. PII–Bridge is the third most preferred conformation, even though its percentage occurrence is much lower than that of the first two categories. The α–α and PII–γ conformations are nearly equally populated. The populations of the trans–cis and cis–trans states are comparable, indicating that the energy difference between these states is small. However, trans–trans is the most populated state, with a percentage occurrence of 85.43%.

The analysis and comparison of conformational states with the Xaa-Pro-Yaa sequence reveals that the Xaa-Pro peptide bond exists preferably as the trans conformer rather than the cis conformer. The same is valid for the Pro-Yaa segment, with the cis conformer being populated to an even lesser extent. The data show that α–α, PII–α, PII–PII, and Extended–PII are the most populated states for Xaa-Pro and Pro-Yaa segments, as compared to the PII–PII and PII–α states observed for the Pro-Pro segment. Considering individual proline residues, PII is the most preferred conformation at position i+1 for both cis and trans proline. The data presented in Table V lead to the conclusion that, when proline takes up a right-handed helical (α) conformation, the amino acid following the proline adopts either a right-handed helical (α) or a Bridge conformation in the majority of cases. When proline takes up a PII conformation, the amino acid following proline preferably adopts a conformation in the right-handed helical (α), polyproline (PII), or Extended region of the Ramachandran map. The Extended–α and Extended–PII blocks that were quite heavily populated in the case of trans Xaa-Pro segments are nearly absent in the case of trans Pro-Yaa segments. These results may in turn lead to a better understanding of the behavior of proline occurring in diproline segments, which can then be utilized for designing various diproline-based synthetic templates for biological and structural studies.

The authors sincerely thank Prof. P. Balaram (MBU, IISc) and Prof. N.V. Joshi (CES, IISc) for their valuable comments on the preparation of the manuscript. The authors also thank Dr. Raghurama Hegde, who helped to prepare the Perl scripts needed for the study.

REFERENCES

1. Chou, P. Y.; Fasman, G. D. J Mol Biol 1977, 115, 135–175.

2. Wilmot, C. M.; Thornton, J. M. J Mol Biol 1988, 203, 221–232.

3. Richardson, J. S.; Richardson, D. C. Prediction of Protein Conformation; Plenum: New York, 1989; pp 1–98.

4. Richardson, J. S.; Richardson, D. C. Trends Biochem Sci 1989,

14, 304–309.

5. Chatterjee, B.; Saha, I.; Raghothama, S.; Aravinda, S.; Rai, R.;

Shamala, N.; Balaram, P. Chem—Eur J 2008, 14, 6192–6204.

6. Saha, I.; Chatterjee, B.; Shamala, N.; Balaram, P. Biopolymers

(Peptide Sci) 2008, 90, 537–543.

7. Levitt, M. J Mol Biol 1981, 145, 251–263.

8. MacArthur, M. W.; Thornton, J. M. J Mol Biol 1991, 218, 397–412.

9. Reimer, U.; Scherer, G.; Drewello, M.; Kruber, S.; Schutkowski, M.; Fischer, G. J Mol Biol 1998, 279, 449–460.

10. Eyles, S. J.; Gierasch, L. M. J Mol Biol 2000, 301, 737–747.

11. Schimmel, P. R.; Flory, P. J. J Mol Biol 1968, 34, 105–120.

12. Chou, P. Y.; Fasman, G. D. Biochemistry 1974, 13, 211–222.

13. Chou, P. Y.; Fasman, G. D. Biochemistry 1974, 13, 222–245.

14. Anfinsen, C. B.; Scheraga, H. A. Adv Protein Chem 1975, 29,

205–300.

15. Robson, B.; Suzuki, E. J Mol Biol 1976, 107, 327–356.

16. Zimmerman, S. S.; Scheraga, H. A. Proc Natl Acad Sci USA

1977, 74, 4126–4129.

17. Richardson, J. S.; Richardson, D. C. Science 1988, 240, 1648–1652.

18. Smith, C. K.; Withka, J. M.; Regan, L. Biochemistry 1994, 33,

5510–5517.

19. Minor, D. L., Jr.; Kim, P. S. Nature 1994, 367, 660–663.

20. Piela, L.; Nemethy, G.; Scheraga, H. A. Biopolymers 1987, 26,

1587–1600.

21. Presta, L. G.; Rose, G. D. Science 1988, 240, 1632–1641.

22. Yun, R. H.; Anderson, A. D.; Hermans, J. Proteins Struct Funct

Genet 1991, 10, 219–228.

23. Adzhubei, A. A.; Sternberg, M. J. E. J Mol Biol 1993, 229, 472–493.

24. Aurora, R.; Rose, G. D. Protein Sci 1998, 7, 21–38.

25. Gunasekaran, K.; Gomathi, L.; Ramakrishnan, C.; Balaram, P.

J Mol Biol 1998, 284, 1505–1516.

26. Viguera, A. R.; Serrano, L. Protein Sci 1999, 8, 1733–1742.

27. Kim, M. K.; Kang, Y. K. Protein Sci 1999, 8, 1492–1499.

28. Chakrabarti, P.; Chakrabarti, S. J Mol Biol 1998, 284, 867–873.

29. Searle, M. S.; Williams, D. H.; Packman, L. C. Nat Struct Biol

1995, 2, 999–1006.

30. Gunasekaran, K.; Ramakrishnan, C.; Balaram, P. Protein Eng

1997, 10, 1131–1141.

31. Simpson, E. R.; Meldrum, J. K.; Bofill, R.; Crespo, M. D.; Holmes, E.; Searle, M. S. Angew Chem Int Ed Engl 2005, 44, 4939–4944.

32. Bofill, R.; Simpson, E. R.; Platt, G. W.; Crespo, M. D.; Searle, M. S. J Mol Biol 2005, 349, 205–221.

33. Venkatachalapathi, Y. V.; Balaram, P. Nature 1979, 281, 83–84.

34. Smith, J. A.; Pease, L. G. CRC Crit Rev Biochem 1980, 8, 315–399.

35. Balaram, P. Proc Ind Acad Sci Chem Sci 1984, 93, 703–717.

36. Sibanda, B. L.; Thornton, J. M. Nature 1985, 316, 170–174.

37. Gellman, S. H. Curr Opin Chem Biol 1998, 2, 717–725.

38. Balaram, P. J Pept Res 1999, 54, 195–199.

39. Kaul, R.; Balaram, P. Bioorg Med Chem 1999, 7, 105–117.

40. Tanaka, S.; Scheraga, H. A. Macromolecules 1974, 7, 698–705.

41. Richardson, J. S. Adv Protein Chem 1981, 34, 167–339.

42. Hutchinson, E. G.; Thornton, J. M. Protein Sci 1994, 3, 2207–2216.

43. Brandts, J. F.; Halvorson, H. R.; Brennan, M. Biochemistry

1975, 14, 4953–4963.

44. Grathwohl, C.; Wuthrich, K. Biopolymers 1976, 15, 2025–2041.

45. Ramachandran, G. N.; Mitra, A. K. J Mol Biol 1976, 107, 85–92.


46. Pauling, L. J Am Chem Soc 1940, 62, 2643–2657.

47. Edsall, J. T. J Polym Sci 1954, 12, 253–280.

48. Toma, F.; Fermandjian, S.; Low, M.; Kisfaludy, L. Biochim Biophys Acta 1978, 534, 112–122.

49. Stewart, D. E.; Sarkar, A.; Wampler, J. E. J Mol Biol 1990, 214,

253–260.

50. Pal, D.; Chakrabarti, P. J Mol Biol 1999, 294, 271–288.

51. Wedemeyer, W. J.; Welker, E.; Scheraga, H. A. Biochemistry

2002, 41, 14637–14644.

52. Bernstein, F. C.; Koetzle, T. F.; Williams, G. J. B.; Meyer, E. F., Jr.; Brice, M. D.; Rodgers, J. R.; Kennard, O.; Shimanouchi, T.; Tasumi, M. J Mol Biol 1977, 112, 535–542.

53. Wang, G.; Dunbrack, R. L., Jr. Bioinformatics 2003, 19, 1589–1591.

54. Creamer, T. P. Proteins Struct Funct Genet 1998, 33, 218–226.

55. Rai, R.; Aravinda, S.; Kanagarajadurai, K.; Raghothama, S.;

Shamala, N.; Balaram, P. J Am Chem Soc 2006, 128, 7916–7928.

56. Couture, J. F.; Collazo, E.; Hauk, G.; Trievel, R. C. Nat Struct

Mol Biol 2006, 13, 140–146.

57. Brown, G.; Singer, A.; Proudfoot, M.; Skarina, T.; Kim, Y.; Chang, C.; Dementieva, I.; Kuznetsova, E.; Gonzalez, C. F.; Joachimiak, A.; Savchenko, A.; Yakunin, A. F. Biochemistry 2008, 47, 5724–5735.

58. Huvent, I.; Belrhali, H.; Antoine, R.; Bompard, C.; Locht, C.;

Dubuisson, F. J.; Villeret, V. J Mol Biol 2006, 356, 1014–1026.

59. Videau, L. L.; Arendall, W. B., III; Richardson, J. S. Proteins

Struct Funct Bioinformatics 2004, 56, 298–309.

Reviewing Editor: J. Andrew McCammon
