Putative homeodomain proteins identified in prokaryotes based on pattern and sequence similarity
Shashi Kant (a), Ashima Bagaria (b), and S. Ramakumar (b,c,*)
(a) School of Biotechnology, Madurai Kamaraj University, Madurai 625021, India
(b) Department of Physics, Indian Institute of Science, Bangalore 560012, India
(c) Bioinformatics Center, Indian Institute of Science, Bangalore 560012, India
Received 13 October 2002
Abstract
A putative homeodomain has been identified in eubacterial genomes, including those of several pathogens. The domain is related in sequence to the homeodomain, a component specific to transcription factors that plays a very important role in eukaryotes, for example in controlling the developmental processes of the organism. The putative homeodomain has been characterized using the eukaryotic homeodomain protein sequence signature present in PROSITE as well as sequence similarity searches with the BLAST suite against different eubacterial genomes. These findings provide evidence for the occurrence in prokarya of a DNA-binding motif similar to that in eukarya.
© 2002 Elsevier Science (USA). All rights reserved.
Keywords: Transcription factors; Homeodomain; Sequence pattern; Homeobox; DNA-binding domain; Prokaryote; Eukaryote; Pro-homeodomain
Homeotic genes, the master control genes that regulate developmental processes [1] in a precise spatial and temporal fashion [2] in higher organisms, share a common sequence element known as the homeobox. The homeobox is an important element of homeotic genes, and a number of these genes appear to have retained both their precisely ordered tandem arrangement in the genome and their developmental roles in axial patterning across vast evolutionary time [2]. The homeobox encodes a self-folding, stable protein domain of about 60 amino acids, the homeodomain, which is composed of three helical regions and represents the sequence-specific DNA-binding domain of much larger transcription factor proteins [2,3].
The homeodomain was first identified in Drosophila homeotic loci, where the proteins play a role in determination of the body plan [4]. Sequences related to the homeodomain are found in several other organisms, but in moving from the original Drosophila homeodomains to mammalian homeodomains, the similarity between conserved regions drops significantly [5]. However, some residues in the sequence are almost invariant, leading to similar structure and function of homeodomains from different organisms [6]. The homeodomain is a DNA-binding domain with three helices, where the first and the second helices are almost antiparallel to each other and the third helix is almost perpendicular to the other two [7]. The second and the third helices form a helix-turn-helix motif [8].
Over the past few years, a large number of homeobox genes have been isolated from taxonomic groups ranging from yeast to human. A vast amount of sequence data on homeodomains has accumulated, which provides useful and important information about the evolution of the homeobox gene family and the phylogeny of eukaryotic organisms [9,10]. In addition, a large number of gene sequences from fully or partially sequenced genomes, including those of several pathogens, are available in the public domain. However, not much is known as to whether proteins similar in sequence and structure to eukaryotic homeodomains also occur in prokaryotes. An attempt to answer this question using currently available bioinformatics tools such as the BLAST series of software [11], FASTA [12], CLUSTALW [13], SCAN PROSITE [14], and PHD [15] forms the basis of the present investigation.
Biochemical and Biophysical Research Communications 299 (2002) 229–232
* Corresponding author. Fax: +91-80-360-2602/91-80-334-1683.
E-mail address: [email protected] (S. Ramakumar).
PII: S0006-291X(02)02607-4
Materials and methods
The PROSITE [14] database has two homeobox domain signatures, PS00027 and PS50071. The signature PS00027 includes the second and the third helices, and has the pattern
[LIVMFYG]-[ASLVR]-X(2)-[LIVMSTACN]-X-[LIVM]-X(4)-[LIV]-[RKNQESTAIY]-[LIVFSTNKH]-W-[FYVC]-X-[NDQTAH]-X(5)-[RKNAIMW].
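As a rough illustration of how such a signature is applied, the PROSITE syntax can be translated into a regular expression and scanned against a protein sequence. The sketch below is not the ScanProsite implementation; the helper name prosite_to_regex is hypothetical, and the test sequence is the commonly quoted 60-residue Drosophila Antennapedia homeodomain, which is an assumption brought in for illustration and is not taken from this paper.

```python
import re

# One pattern element: a residue class [..], an exclusion {..}, or a single
# letter (X = any residue), optionally followed by a repeat count (n) or (n,m).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core in ("x", "X"):
            core = "."                       # wildcard position
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


PS00027 = ("[LIVMFYG]-[ASLVR]-X(2)-[LIVMSTACN]-X-[LIVM]-X(4)-[LIV]-"
           "[RKNQESTAIY]-[LIVFSTNKH]-W-[FYVC]-X-[NDQTAH]-X(5)-[RKNAIMW]")

# Assumed test sequence (Antennapedia homeodomain, residues 1-60).
ANTP = "RKRGRQTYTRYQTLELEKEFHFNRYLTRRRRIEIAHALCLTERQIKIWFQNRRMKWKKEN"

hit = re.search(prosite_to_regex(PS00027), ANTP)
if hit:
    # Report 1-based residue numbers, as used in the text.
    print(hit.start() + 1, hit.end(), hit.group())
```

With this assumed input the match spans residues 34 to 57, that is, the region of the second and third helices, which is consistent with the description of PS00027 above.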
Analysis of more than 1100 homeodomain sequences (http://genome.nhgri.gov/homeodomain/fasta/?domain=1) by different multiple alignment methods such as CLUSTALW revealed the presence of conserved residues in the first helix as well (e.g., Leu16 and Phe/Leu20). On the basis of this finding we extended the homeodomain signature to
[LMFAC]-X(3)-[FYILW]-X(3,8)-[PLVIMSQKRDE]-X(4)-[KRMILWQA]-X(2)-[LVIYMFHK]-[ASDTVFLRH]-X(2)-[LTIVMFAC]-X-[LMIAVNY]-X(4)-[LVCI]-X-[LIVSKARNTHG]-W-[FYTVELI]-X-[NIAHDGKQ]-X-[RAPLSNK]-X(3)-[RGYKLWNAIV]
which includes the first helix as well, and performed a pattern search in SWISSPROT and TrEMBL [16] through www.expasy.ch. The extended signature gave almost the same number of hits as the PROSITE signature PS00027 (869 hits in 865 sequences from 783 entries in our case and 869 hits in 867 sequences from 780 entries in the case of the PROSITE signature). Some hypothetical proteins in the TrEMBL databank, such as Q09546 from Caenorhabditis elegans, also had sequences compatible with the extended sequence signature.
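To make the role of the first-helix positions concrete, the extended signature can be hand-translated into a regular expression with named groups on the positions that the text ties to Leu16, Phe/Leu20, and the conserved tryptophan. This is only a sketch: the group names are hypothetical, the translation assumes that X stands for any residue, and the Antennapedia homeodomain used as input is an assumed illustrative sequence, not data from this study.

```python
import re

# Hand translation of the extended signature; named groups mark the
# first-helix anchors and the tryptophan of the third helix.
EXTENDED = re.compile(
    r"(?P<helix1_leu>[LMFAC]).{3}(?P<helix1_phe>[FYILW]).{3,8}"
    r"[PLVIMSQKRDE].{4}[KRMILWQA].{2}[LVIYMFHK][ASDTVFLRH].{2}"
    r"[LTIVMFAC].[LMIAVNY].{4}[LVCI].[LIVSKARNTHG](?P<trp>W)"
    r"[FYTVELI].[NIAHDGKQ].[RAPLSNK].{3}[RGYKLWNAIV]"
)

# Assumed test sequence (Antennapedia homeodomain, residues 1-60).
ANTP = "RKRGRQTYTRYQTLELEKEFHFNRYLTRRRRIEIAHALCLTERQIKIWFQNRRMKWKKEN"

m = EXTENDED.search(ANTP)
anchors = {}
if m:
    # Map each named group to (residue, 1-based position in the domain).
    anchors = {
        name: (ANTP[m.start(name)], m.start(name) + 1)
        for name in ("helix1_leu", "helix1_phe", "trp")
    }
print(anchors)
```

On this input the three anchors fall on L16, F20, and W48, the same numbering the text uses for the conserved first-helix residues and the tryptophan.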
To further explore the sequence space, we used the BLAST series of programs, which are widely used for searching protein and DNA databases for sequence similarities. In particular, the Position Specific Iterated BLAST (PSI-BLAST) program is useful for identifying distant relationships among organisms by finding protein families. In the present context we carried out a genome-specific BLAST [11] search using one of these hypothetical proteins, Q09546, which gave a number of hits in human as well as in other organisms such as Candida albicans and Aspergillus sp. Surprisingly, FASTA3 [12] and PSI-BLAST searches with the Homo sapiens gene sequence NM_002586 (pre-B-cell leukemia transcription factor 2) showed a few hits in the eubacterium Staphylococcus aureus (Q932B9, SWALL code) compatible with the extended homeodomain sequence pattern. The sequence Q932B9 has been annotated in the TrEMBL [16] database as a hypothetical protein containing the helix-turn-helix motif.
A sequence search was then carried out with the BLAST suite using Q932B9 (ref|NP_371375.1|NC_002758.1) as the query. This search detected a number of hits in different species of bacteria (Table 1), such as Staphylococcus aureus, Clostridium difficile, and Neisseria gonorrhoeae (Fig. 1). Multiple sequence alignment (Fig. 1) of these sequences using the CLUSTALW program [13] revealed a sequence pattern similar to the extended homeodomain signature.
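The way a consensus pattern is read off such an alignment can be sketched as follows, using the 78% conservation threshold and the hydrophobic (h) and polar (p) residue classes given in the legend of Fig. 1. The function name, the toy alignment, and the rule that h is tested before p (S and T belong to both classes) are assumptions made for illustration; this is not the procedure used to shade the published figure.

```python
from collections import Counter

# Residue classes as listed in the Fig. 1 legend (S and T appear in both).
HYDROPHOBIC = set("LIVMAFWST")
POLAR = set("KRDEQSTHN")


def consensus_line(alignment, threshold=0.78):
    """Return one consensus symbol per alignment column."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    symbols = []
    for column in zip(*alignment):
        n = len(column)
        residue, count = Counter(column).most_common(1)[0]
        if residue != "-" and count / n >= threshold:
            symbols.append(residue)          # a single residue dominates
        elif sum(r in HYDROPHOBIC for r in column) / n >= threshold:
            symbols.append("h")              # mostly hydrophobic column
        elif sum(r in POLAR for r in column) / n >= threshold:
            symbols.append("p")              # mostly polar column
        else:
            symbols.append(".")              # no class reaches the threshold
    return "".join(symbols)


# Hypothetical toy alignment, not taken from Fig. 1.
toy = ["WLKG", "WIRP", "WVDL", "WLEK", "WMAN"]
print(consensus_line(toy))
```

A column dominated by a single residue keeps that letter, which is how an invariant tryptophan would show up in the consensus line.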
To further characterize the prokaryotic sequences, their secondary structure was predicted using the PHD [15] server, which indicated the presence of three consecutive helical segments interrupted by non-helical regions containing an N/G sequence pattern at the 22nd and 33rd positions (Fig. 1). Hydropathy analysis of the helical segments using the HELICALWHEEL module of the Wisconsin GCG package [17] identified the helices as amphipathic (Fig. 2). Attempts were made to predict the 3D structure of Q932B9 and the other prokaryotic sequences (Fig. 1) using various 3D model prediction servers such as ModBase (http://alto.rockfeller.edu/modbasecgi/index.cgi), 3DPSSM (http://www.sbg.bio.ic.ac.uk/servers/3dpssm), and 3DJIGSAW (http://www.bmm.icnet.uk/servers/3djigsaw). However, it was not possible to generate a reliable 3D model because no known protein 3D structure is significantly similar in sequence to the prokaryotic sequences.
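Amphipathicity of a predicted helix can also be expressed numerically. The sketch below places residues on a helical wheel at 100 degrees per residue and computes a mean hydrophobic moment; it is a stand-in for, not a reproduction of, the HELICALWHEEL analysis. The Kyte-Doolittle hydropathy scale, the function names, and the two test peptides are assumptions chosen for illustration.

```python
import cmath
import math

# Kyte-Doolittle hydropathy values (assumed scale; not necessarily the one
# used by the HELICALWHEEL program).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def wheel_angles(length, step_deg=100.0):
    """Angular position of each residue on a helical wheel (degrees)."""
    return [(i * step_deg) % 360.0 for i in range(length)]


def hydrophobic_moment(segment, step_deg=100.0):
    """Mean hydrophobic moment per residue for an ideal alpha-helix."""
    total = sum(
        KYTE_DOOLITTLE[res] * cmath.exp(1j * math.radians(angle))
        for res, angle in zip(segment, wheel_angles(len(segment), step_deg))
    )
    return abs(total) / len(segment)


# Hypothetical peptides for illustration only.
print(round(hydrophobic_moment("LKKLLKKLLKKLLKKLLK"), 2))  # alternating Leu/Lys
print(round(hydrophobic_moment("L" * 18), 2))              # uniform helix
```

A helix with hydrophobic residues clustered on one face gives a clearly non-zero moment, whereas a uniform 18-residue helix gives a moment close to zero because its residues are spread evenly around the wheel.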
Results and discussions
It is already known that the eukaryotic homeodomain has amphipathic helices and that these helices interact with each other in three dimensions, forming a hydrophobic core. It is interesting to note that a typical prokaryotic putative homeodomain sequence (Q932B9) also contains amphipathic helices (Fig. 2), which may be expected to interact with each other to form a hydrophobic core as depicted in Fig. 2.
The results of CLUSTALW (Fig. 1) showed that many of these sequences had conserved Leu and Phe/Leu residues separated by three residues in the first helix, as in the case of the homeodomain (www.bioinfo.de/isb/1999/01/main.htm). Tryptophan (W), which has only one codon, is not a very common residue in proteins, and is also the least mutable amino acid, was present in almost all putative homeodomain sequences at the same position (Fig. 1). The functional significance of the conserved tryptophan is further strengthened by the fact that Trp48 interacts with Leu16 and Phe/Leu20 of the first helix and Leu31 of the second helix, and that these interactions play a very important role in the eukaryotic homeodomain 3D structure [9,18]. The residues conserved at equivalent positions in the bacterial sequences might play the same role as in the eukaryotic homeodomain 3D fold.
Table 1
References to the prokaryotic sequences (Fig. 1) considered for multiple alignment
Sequence        Organism                                Protein identification                           Coding region in the genome  Frame
Seq 1           Staphylococcus aureus aureus N315 (Sa)  ref|NC003140.1                                   22,523–22,684                +2
Seq 2           Bacillus anthracis strain AMES (Ba)     gnl|TIGR_198094|contig:6615:b_anthracis          1,013,325–1,013,504          +3
Seq 3           Staphylococcus epidermidis (Se)         gnl|TIGR_1282|407                                672,424–672,603              +1
Seq 4           Geobacillus stearothermophilus (Bs)     gnl|OUACGT_1422|bstearo.fasta.screen.Contig 375  5391–5230                    -2
Seq 5           Enterococcus faecalis (Ef)              gnl|TIGR_1351|glf_11370                          2,234,369–2,234,196          -1
Seq 6           Clostridium difficile (Cd)              gnl|SANGER_1496|Contig 30                        68,031–68,204                +3
Seq 7           Listeria monocytogenes (Lm)             gnl|TIGR_1639|contig:761:1_monocytogenes-4b      905,763–905,584              -2
Seq 8           Streptococcus mutans (Sm)               gnl|OUACGT_1309|smutans.fasta.screen.Contig 2    741,652–741,831              +1
Query sequence  Staphylococcus aureus strain Mu50 (Sa)  ref|NC_002758.1|NP_371375.1                      -                            -
Seq 9           Neisseria gonorrhoeae (Ng)              gnl|OUACGT_485|Ngon_contig 1                     464,684–464,529              -3
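The coordinates and frames listed in Table 1 can be cross-checked with simple arithmetic. The sketch below assumes the usual translated-search convention in which a plus-strand frame is fixed by the 1-based start coordinate modulo three; that convention, and the function names, are assumptions rather than statements from the paper. Minus-strand frames depend on the contig length, which Table 1 does not give, so only the codon count is computed for those rows.

```python
# Plus-strand rows of Table 1: (label, start, end, reported frame).
PLUS_STRAND_ROWS = [
    ("Seq 1", 22_523, 22_684, +2),
    ("Seq 2", 1_013_325, 1_013_504, +3),
    ("Seq 3", 672_424, 672_603, +1),
    ("Seq 6", 68_031, 68_204, +3),
    ("Seq 8", 741_652, 741_831, +1),
]


def plus_frame(start: int) -> int:
    """Forward reading frame (1-3) implied by a 1-based start coordinate."""
    return (start - 1) % 3 + 1


def codon_count(start: int, end: int) -> int:
    """Number of whole codons in an inclusive coordinate range."""
    length = abs(end - start) + 1
    if length % 3:
        raise ValueError("region length is not a multiple of three")
    return length // 3


for label, start, end, reported in PLUS_STRAND_ROWS:
    # Columns: label, codons in the region, computed frame, frame in Table 1.
    print(label, codon_count(start, end), plus_frame(start), reported)
```

Under this assumption the computed plus-strand frames agree with the table, and every listed region is a whole number of codons (52 to 60), in line with a domain of about 60 amino acids.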
The possibility that the three helices of the prokaryotic sequences form a hydrophobic core, taken together with the presence of a conserved tryptophan (W) residue that can stabilize the characteristic 3D fold as in eukaryotic homeodomains, suggests that the prokaryotic sequences are likely to assume a fold similar to that of eukaryotic homeodomains.
Based on these findings, we propose the presence of a putative homeodomain in eubacteria. It will be of interest to determine its exact role in prokaryotic cells. Structural similarity with the eukaryotic homeodomain suggests a probable role as a transcription regulator in cell division processes, an important prokaryotic event shared with the eukaryotic developmental program. The BLAST search also revealed many hits in Staphylococcus species, which are mostly pathogenic in nature. The genome context of the gene for Q932B9 from Staphylococcus aureus aureus was investigated (www.ebi.ac.uk), and the region of the genome in which it lies was found to contain genes that code for a single-strand DNA-binding protein (BAB57026), an excisionase (BAB57010), an integrase (BAB57009), and some hypothetical proteins. Thus, based on gene context, some role requiring interaction with DNA may be expected for the pro-homeodomain.
Conclusions
Thus our work, which combines motif-based assignments with BLAST database searches in a semi-automatic protocol with a view to identifying distant relationships in organisms, has revealed pro-homeodomain or homeodomain-like sequences in prokaryotes, which may be expected to have a 3D structure similar to that of eukaryotic homeodomains.
A complex regulatory switch in eukaryotes requires varied forms of protein–DNA interactions. In the case of eukaryotic homeodomain proteins, the N-terminal region participates in recognition of DNA to augment the specificity [19,20]. Owing to comparatively simpler regulation, the N-terminal specificity might not be required in prokaryotic homeodomain-like proteins. Future studies of the role of these pro-homeodomains may unravel many interesting features in prokaryotes. Most interestingly, since these proteins are also seen in pathogenic bacteria, they may be suitable targets for drug design, provided they play some crucial role in cellular events. Finally, our work also outlines a generally applicable method that combines pattern and sequence similarity searches for the identification of families and the detection of distant relationships in proteins.
Fig. 1. Multiple sequence alignment of the homeodomain-like proteins of prokaryotes showing the region of alignment. The representative sequences (Table 1) were aligned using the CLUSTALW [13] program and the alignment was refined manually. The shading of conserved residues is according to the consensus and includes residues conserved in at least 78% of the aligned sequences. In the consensus line, hydrophobic residues (L, I, V, M, A, F, W, S, and T) are represented by h and polar residues (K, R, D, E, Q, S, T, H, and N) are represented by p. The secondary structure prediction was performed using the PHD program [15]. On the multiple alignment, H indicates α-helix. Abbreviations: Ba, Bacillus anthracis; Bs, Geobacillus stearothermophilus; Cd, Clostridium difficile; Ef, Enterococcus faecalis; Lm, Listeria monocytogenes; Ng, Neisseria gonorrhoeae; Sa, Staphylococcus aureus; Se, Staphylococcus epidermidis; Sm, Streptococcus mutans.
Fig. 2. Helical wheel [17] representation of the three helices of Staphylococcus aureus Q932B9 (SWALL code). The three helical segments are as predicted by the PHD server [15]. A schematic representation of the possible association of the three helices constituting the hydrophobic core is shown. Hydrophobic residues are enclosed in rectangles.
Acknowledgments
We thank Mr. Raju Mukherjee, Mr. Kalyan Kumar Sinha, and
Mr. Rudresh for useful discussions. Access to Bioinformatics Centre
and Interactive Graphics Facility both funded by the Department of
Biotechnology (DBT) is gratefully acknowledged. A.B. thanks the
Council of Scientific and Industrial Research (India) for a fellowship.
Some of the sequence data for this project was obtained from The
Institute for Genomic Research website at http://www.tigr.org. Our
sincere thanks to all the funding agencies that have supported the sequencing projects at TIGR.
References
[1] M. Billeter, Homeodomain-type DNA recognition, Prog. Biophys. Mol. Biol. 66 (1996) 211–225.
[2] W.J. Gehring, M. Affolter, T. Bürglin, Homeodomain proteins, Annu. Rev. Biochem. 63 (1994) 487–526.
[3] Y.Q. Qian, M. Billeter, G. Otting, M. Müller, W.J. Gehring, K. Wüthrich, The structure of the Antennapedia homeodomain determined by NMR spectroscopy in solution: comparison with prokaryotic repressors, Cell 59 (1989) 573–580.
[4] B. Lewin, Genes VII, seventh ed., Oxford University Press, Oxford, 2000.
[5] C. Sander, R. Schneider, Database of homology-derived structures and the structural meaning of sequence alignment, Proteins 9 (1991) 56–68.
[6] S. Banerjee-Basu, E.S. Ferlanti, J.F. Ryan, A.D. Baxevanis, The homeodomain resource: sequences, structures and genomic information, Nucleic Acids Res. 27 (1999) 336–337.
[7] D.S. Wilson, B. Guenther, C. Desplan, J. Kuriyan, High resolution crystal structure of a paired (Pax) class cooperative homeodomain dimer on DNA, Cell 82 (1995) 709–719.
[8] S.C. Harrison, A.K. Aggarwal, DNA recognition by proteins with the helix-turn-helix motif, Annu. Rev. Biochem. 59 (1990) 933–969.
[9] S. Banerjee-Basu, A.D. Baxevanis, Molecular evolution of the homeodomain family of transcription factors, Nucleic Acids Res. 29 (2001) 3258–3269.
[10] C. Kappen, Analysis of a complete homeobox gene repertoire: implications for the evolution of diversity, Proc. Natl. Acad. Sci. USA 97 (2000) 4481–4486.
[11] S.F. Altschul, T.L. Madden, A.A. Schäffer, J. Zhang, Z. Zhang, W. Miller, D.J. Lipman, Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res. 25 (1997) 3389–3402.
[12] W.R. Pearson, D.J. Lipman, Improved tools for biological sequence comparison, Proc. Natl. Acad. Sci. USA 85 (1988) 2444–2448.
[13] J.D. Thompson, D.G. Higgins, T.J. Gibson, CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice, Nucleic Acids Res. 22 (1994) 4673–4680.
[14] K. Hofmann, P. Bucher, L. Falquet, A. Bairoch, The PROSITE database, its status in 1999, Nucleic Acids Res. 27 (1999) 215–219.
[15] B. Rost, C. Sander, Combining evolutionary information and neural networks to predict protein secondary structure, Proteins 19 (1994) 55–72.
[16] A. Bairoch, R. Apweiler, The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000, Nucleic Acids Res. 28 (2000) 45–48.
[17] D.D. Womble, GCG: the Wisconsin package of sequence analysis programs, Methods Mol. Biol. 132 (2000) 3–22.
[18] C.R. Kissinger, Crystal structure of an engrailed homeodomain–DNA complex at 2.8 Å resolution: a framework for understanding homeodomain–DNA interactions, Cell 63 (1990) 579–590.
[19] S.E. Ades, R.T. Sauer, Specificity of minor-groove and major-groove interactions in a homeodomain–DNA complex, Biochemistry 34 (1995) 14601–14608.
[20] M. Sharkey, Y. Graba, M.P. Scott, Hox genes in evolution: protein surfaces and paralog groups, Trends Genet. 13 (1997) 145–151.