Integrating Phosphorylation and Catalytic Sites Information into AH-DB
The SIJ Transactions on Computer Science Engineering & its Applications (CSEA), Vol. 2, No. 3, May 2014
ISSN: 2321-2381 © 2014 | Published by The Standard International Journals (The SIJ) 77
Abstract: Proteins are molecular machines that bind various molecules,
and these interactions are critical to many biological processes. A
protein is said to be in the "apo" state before an interaction and in
the "holo" state after it. Our recent study constructed the Apo-Holo
DataBase (AH-DB), the largest repository of apo-holo structure pairs in
the world, which enables researchers to understand protein interactions
and their associated biological processes. This work is a follow-up to
AH-DB and makes two enhancements: (a) the integration of
phosphorylation and catalytic sites and (b) a novel measure of
structural variation. The phosphorylation data come from phospho.ELM
version 9.0, which contains more than 42,500 phosphorylation sites
spanning 8,718 substrate proteins, and the catalytic site information is
collected from the Catalytic Site Atlas. The second enhancement,
L-RMSD, stands for local root mean square deviation. A case study
conducted in this work shows that the two enhancements allow AH-DB to
support a wider range of applications.
Keywords — Catalytic Site; Conformational Transition; Molecular Interaction; Phosphorylation Site; Protein
Structure; RMSD.
Abbreviations — Apo-Holo DataBase (AH-DB); Catalytic Sites (C-sites); Local Root Mean Square Deviation
(L-RMSD); Nuclear Magnetic Resonance (NMR); Phosphorylation Sites (P-sites); Protein Data Bank (PDB);
Root Mean Square Deviation (RMSD).
I. INTRODUCTION
PROTEIN interactions with other molecules play an important role in
various biological processes. While binding other molecules, many
proteins undergo conformational transitions [Kim et al., 2000; Goh et
al., 2004; Michel 2007; Dan et al., 2010]. Knowing such conformational
transitions helps to understand protein interactions and their
associated biological processes. To this end, it is important to
compare the structure of a protein before binding (apo structure) with
that after binding (holo structure). Our recent work constructed a
comprehensive library, the Apo-Holo DataBase (AH-DB), of apo-holo
structure pairs for researchers to study conformational transitions
[Chang et al., 2012]. It provides sophisticated interfaces for searching
apo-holo structure pairs and exploring conformational transitions in the
paired structures. This article demonstrates two major enhancements of
AH-DB.
The first enhancement is the inclusion of phosphorylation and
catalytic sites. Amino acid phosphorylation is an important
post-translational protein modification that is frequently seen in
eukaryotic cells [Lemeer & Heck, 2009]. Mapping large-scale
phosphoproteomics to various data [e.g. de la Fuente van Bentem et al.,
2008; Morandell et al., 2008; Preisinger et al., 2008] helps to
elucidate more of the signalling networks. Because the available
phosphorylation motifs are short [Linding et al., 2007; Zanzoni et al.,
2011] and reside in unstructured and rapidly evolving regions [Brown et
al., 2002], comparative studies that estimate evolutionary conservation
are made possible by complete phosphoproteomic data across species. As
such phosphoproteomic data have recently become available, some efforts
to estimate the evolutionary conservation of phospho-residues have
appeared [Boekhorst et al., 2008; Holt et al., 2009; Tan et al., 2009;
2009A]. Enzymes are another important class of molecules for all
biological mechanisms. The catalytic region is usually highly conserved
and short. In contrast, the region responsible for binding the
substrate is not as critical to catalytic function. Thus, exploring
enzyme activities will be helpful in understanding the relationship
between protein structure and function.
However, no effort has been made to map phosphoproteomics to
structural data. With the integration of these phosphoproteomic and
catalysis data, AH-DB enables researchers to assess the extent of
conformational change associated with phosphorylation and catalysis and
to view the phosphorylation sites in a 3D, interactive interface.

Hao Wang*, Wen-Hao Chiang**, Chia-Wei Chou*** & Darby Tien-Hao Chang****
*Department of Electrical Engineering, National Cheng Kung University, Tainan, TAIWAN.
E-Mail: n26001636{at}mail{dot}ncku{dot}edu{dot}tw
**Department of Electrical Engineering, National Cheng Kung University, Tainan, TAIWAN.
E-Mail: n26000305{at}mail{dot}ncku{dot}edu{dot}tw
***Department of Electrical Engineering, National Cheng Kung University, Tainan, TAIWAN.
E-Mail: chouwill79131{at}mail{dot}ncku{dot}edu{dot}tw
****Department of Electrical Engineering, National Cheng Kung University, Tainan, TAIWAN.
E-Mail: darby{at}mail{dot}ncku{dot}edu{dot}tw
The second enhancement is a novel measure of structure variation,
L-RMSD, which stands for local root mean square deviation. The root
mean square deviation (RMSD) measures the average distance between the
equivalent points of two superimposed structures. In the context of
protein structures, the RMSD is usually computed over the Cα atoms of
two superimposed protein structures. The RMSD is widely used to compare
two protein structures, where one structure is translated and rotated
onto the other so as to minimize the RMSD. A small RMSD is assumed to
indicate a good structure superimposition (as given in the Gauss-Markov
theorem) when the distances between equivalent atoms are equally
distributed [Seber & Wild, 1989]. If this assumption does not hold, the
RMSD becomes an inappropriate index of structure comparison because of
heteroscedasticity [Hogg & Tanis, 2010]. In protein superimpositions,
however, the distances between equivalent atoms usually vary across
protein sub-regions, because proteins are flexible rather than rigid
bodies. For example, the local structures of protein binding sites
usually undergo conformational transitions during binding, while most
of the other sub-regions of the protein behave like rigid bodies. This
leads to varying local precision when comparing protein structures with
similar binding domains: some equivalent atoms superimpose well while
others do not. Thus, a single RMSD that measures the global
superimposition quality is not enough. The proposed L-RMSD supplements
the global RMSD by providing the local superimposition quality of
user-specified protein sub-regions.
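As a hypothetical illustration (the numbers are invented for this example and are not taken from AH-DB), suppose a superimposition has N = 100 equivalent Cα pairs, of which 90 rigid-body pairs each lie 0.5 Å apart and 10 binding-site pairs each lie 8 Å apart. Taking the RMSD as the root of the mean squared distance, the global value and the two local values are:

$$
\mathrm{RMSD}_{\text{global}} = \sqrt{\frac{90 \cdot 0.5^{2} + 10 \cdot 8^{2}}{100}} = \sqrt{6.625} \approx 2.57\ \text{Å},
\qquad
\mathrm{RMSD}_{\text{rigid}} = \sqrt{\frac{90 \cdot 0.5^{2}}{90}} = 0.5\ \text{Å},
\qquad
\mathrm{RMSD}_{\text{site}} = \sqrt{\frac{10 \cdot 8^{2}}{10}} = 8\ \text{Å}.
$$

The global figure of about 2.57 Å describes neither sub-region well, which is the situation a local measure is meant to expose.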
In summary:
The motivation of this research is to explore protein conformational
changes upon binding other molecules.
The objectives of this research are: a) to determine the impact of
phosphorylation and catalytic activities on conformational change and
b) to quantify conformational changes of protein local structures.
The contributions of this manuscript are: a) integrating
phosphorylation and catalytic information into AH-DB and b) enabling
researchers to assess the extent of conformational change associated
with phosphorylation and catalytic activities in a 3D, interactive
interface.
II. METHODS
This section first describes the fundamental architecture of
AH-DB that has been constructed in our previous work
[Chang et al., 2012], which helps readers to know how the
two enhancements were integrated to AH-DB. Follows are
the collection procedure of phosphorylation sites and the
calculation of the L-RMSD.
2.1. Architecture of AH-DB
Figure 1 shows the architecture of AH-DB. It is a structure-based
database with a friendly user interface. In AH-DB, the protein
structure before interaction is called the "apo" state, which has the
meaning of "standard" in medicine, while the protein structure after
interaction is called the "holo" state, which has the meaning of
"comprehensive". Thanks to advances in X-ray crystallography, the
tertiary structures of many proteins, namely the three-dimensional
coordinates of all atoms, have been determined at different states. One
may think of an X-ray crystal structure as a snapshot of a protein at a
specific state. These snapshots are well organized in the Protein Data
Bank (PDB) database. However, these snapshots are static.
Figure 1: Architecture of AH-DB
(a) a protein. (b) a protein binding with another molecule,
represented by the red triangle. In this figure, (a) and (b) represent
two states of the same protein. These protein states are X-ray
crystallized (denoted "X-ray" in the figure) into structures. These
protein structures are snapshots of proteins and are organized into the
Protein Data Bank database (denoted "PDB, snapshots" in the figure).
AH-DB selects, analyzes and visualizes suitable structure pairs to
reconstruct the processes of proteins binding other molecules (denoted
"Select", "Analyze" and "Visualize" in the figure), which makes it a
repository of changes rather than snapshots.
AH-DB comprises three major components that maximize the usage of
these static snapshots. The first is a procedure to "select" suitable
structure pairs of similar proteins at different states. The second,
which works on the selected structures, is a collection of analysis
algorithms/packages that identify the structural changes of proteins
during the processes of binding other molecules. This component aims to
reconstruct the binding process from a set of static snapshots. The
third is the interface, which provides much flexibility for searching
apo-holo structure pairs and a highly interactive means of exploring
the paired structures. This sophisticated interface is necessary to
let users explore the complicated relations among molecules and
conformational transitions more comprehensively. In brief, the three
components (Select, Analyze and Visualize in figure 1) make up AH-DB,
which is a repository of changes rather than snapshots.
2.2. Phosphorylation and Catalytic Sites
The phosphorylation site (P-site) data of AH-DB were collected from
the phospho.ELM database [Diella et al., 2008]. The data deposited in
phospho.ELM are experimentally verified and/or extracted from the
literature. The release of the phospho.ELM dataset used in this work is
version 9.0, which contains more than 42,500 phosphorylation sites
spanning 8,718 substrate proteins. The correspondence between
phospho.ELM sequences and Protein Data Bank (PDB) chains was
established via the UniProt identifier [Apweiler et al., 2010]. The
catalytic site (C-site) data were collected from the Catalytic Site
Atlas [Porter et al., 2004] and were likewise mapped via the UniProt
identifier. The local region of each phosphorylation or catalytic site
in AH-DB is defined as the site itself and its 20 neighbor residues in
sequence (ten preceding and ten following). The phosphorylation and
catalytic information can be visualized in the sequence and structure
views of AH-DB (Figure 2).
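The mapping and the local-region definition above can be sketched as follows. This is a minimal illustration, not the AH-DB implementation: the function names, the record layout, and the accession and chain labels are all hypothetical, and clipping the window at the sequence termini is an assumption, since the text does not say how sites near the ends are handled.

```python
from collections import defaultdict

FLANK = 10  # residues on each side of the site, as defined in the text


def local_region(site_pos, seq_len, flank=FLANK):
    """Return the 1-based inclusive (start, end) of a site's local region.

    Assumption: the window is clipped at the sequence termini.
    """
    if not 1 <= site_pos <= seq_len:
        raise ValueError("site position outside the sequence")
    return max(1, site_pos - flank), min(seq_len, site_pos + flank)


def sites_by_chain(sites, chain_to_uniprot):
    """Attach annotated sites to PDB chains sharing a UniProt accession.

    sites: iterable of (uniprot_acc, position, kind), kind "P" or "C".
    chain_to_uniprot: dict mapping a chain label to a UniProt accession.
    """
    by_acc = defaultdict(list)
    for acc, pos, kind in sites:
        by_acc[acc].append((pos, kind))
    # Chains whose accession has no annotated site get an empty list.
    return {chain: sorted(by_acc.get(acc, []))
            for chain, acc in chain_to_uniprot.items()}


if __name__ == "__main__":
    # Hypothetical records; accessions and chain labels are placeholders.
    sites = [("ACC001", 379, "P"), ("ACC001", 485, "P"), ("ACC002", 57, "C")]
    chains = {"XXXX_A": "ACC001", "YYYY_B": "ACC003"}
    print(sites_by_chain(sites, chains))
    print(local_region(379, 700))  # a site far from either terminus
    print(local_region(5, 700))    # a site near the N-terminus is clipped
```

Keying both data sources on the UniProt accession keeps the join independent of how either source numbers its own entries.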
2.3. Local Root Mean Square Deviation (L-RMSD)
The equation of L-RMSD is identical to global RMSD as
follows:
$$\text{L-RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} d_i^{2}} \qquad (1)$$
where N is the number of equivalent Cα pairs in the superimposition
and d_i is the distance between the i-th equivalent Cα pair. The only
difference between L-RMSD and RMSD is that the N equivalent Cα pairs
are taken from a local structure rather than the entire protein. In
AH-DB, the problem of determining the local structure is solved by its
interactive interface. As shown in figure 2C, AH-DB provides various
facilities for users to highlight local structures. The phosphorylation
sites added in this work are one example of such a local structure. The
L-RMSD is calculated on the fly and shown in the structural view
immediately (Figure 3) when users enable the L-RMSD display checkbox.
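Equation (1) can be sketched in a few lines. The function below is an illustrative sketch rather than the AH-DB code: it assumes the two structures are already superimposed and that each is given as a hypothetical dictionary from residue number to Cα coordinate. Passing a residue selection yields the L-RMSD; omitting it yields the global RMSD.

```python
import math


def rmsd(apo, holo, residues=None):
    """RMSD over equivalent C-alpha pairs of two superimposed structures.

    apo, holo: dicts mapping residue number -> (x, y, z) C-alpha coordinate.
    residues: optional iterable restricting the calculation to a local
    region (L-RMSD); None uses every residue present in both structures.
    """
    common = set(apo) & set(holo)
    if residues is not None:
        common &= set(residues)
    if not common:
        raise ValueError("no equivalent C-alpha pairs in the selected region")
    # Sum of squared distances d_i^2 over the N equivalent pairs.
    squared = sum(math.dist(apo[r], holo[r]) ** 2 for r in common)
    return math.sqrt(squared / len(common))


if __name__ == "__main__":
    # Toy coordinates: only residue 3 moves, by 5 units.
    apo = {1: (0, 0, 0), 2: (1, 0, 0), 3: (2, 0, 0), 4: (3, 0, 0)}
    holo = {1: (0, 0, 0), 2: (1, 0, 0), 3: (2, 3, 4), 4: (3, 0, 0)}
    print(rmsd(apo, holo))                   # global RMSD
    print(rmsd(apo, holo, residues=[3]))     # L-RMSD of the moved residue
    print(rmsd(apo, holo, residues=[1, 2]))  # L-RMSD of a rigid region
```

In the toy data the global value is 2.5, while the local value is 5.0 for the moved residue and 0.0 for the rigid region, so the same superimposition reads very differently depending on the region selected.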
III. CASE STUDY
This section uses a protein of which the phosphorylation sites
have been comprehensively studied in a recent paper [Soroka
et al., 2012] as a case study to demonstrate the enhancements
of AH-DB. The used protein is Hsp90, which is a critical
molecular chaperone in the eukaryotic cytosol. The activity
of Hsp90 is largely influenced by the posttranslational
modifications. In Soroka et al., (2012) mutated the major phosphorylation sites of yeast Hsp90 and identified three
critical ones (S379, S485, and S604) that influenced yeast
growth. The mutation at S379 breaks the active conformation
of Hsp90; the one at S485 alters structural changes during the
ATP hydrolysis; the one at S604 affects inter-subunit
communication. As a result, the lethal effect of the first two
mutations comes from structural changes near the
phosphorylation sites.
Figure 2 shows the results of analysing Hsp90 with the following
procedure. [Organism] was set to S. cerevisiae, [Target protein] was
set to HSP82_YEAST, [compact pairs] was set to "by tar-add" (which asks
AH-DB to compact similar structure pairs for a condensed result),
[representative pair] was set to "most SSE transitions" (which asks
AH-DB to select the structure pair with the most secondary structure
transitions when compacting), and [exclude NMR], [ignore ligand],
[ignore ion] and [technology consistency] were enabled (the last
requires that the paired structures be determined by the same
technology). The terms in square brackets are search options of AH-DB;
more details of each option can be found in our previous work [Chang et
al., 2012] and/or on the web [http://ahdb.ee.ncku.edu.tw/]. In the
search page that lists apo-holo structure pairs satisfying the above
options, the fourth pair, which contains the most secondary structure
transitions, was chosen. In the pair page that shows details of the
chosen apo-holo structure pair, the phosphorylation sites were
highlighted with L-RMSD enabled. The added molecules were hidden to
focus on the target protein, Hsp90.
Figure 2: Interface to Access the Phosphorylation Information in AH-DB
(A) Sequence view; (B) structure view; (C) display controls.
Users can utilize the controls in (C) to highlight
phosphorylation sites in (A) and (B). The protein shown in
this figure is yeast Hsp90, of which the phosphorylation sites
have been comprehensively studied in a recent paper. The
mutation at S379 breaks the active conformation of Hsp90
while the mutation at S485 alters structural changes during
the ATP hydrolysis.
Figure 2 shows an obvious difference between the apo (blue) and holo
(red) structures at S379. The large difference is also reflected in the
L-RMSD (8.15) in figure 3. However, the structural difference between
the apo and holo structures is less obvious at S485. The only hint in
figure 2 is that S485 lies at the structure boundary of the apo
structure (the area to the right of S485 contains only the blue
structure). Though there are various reasons why a PDB entry may not
contain the entire protein structure (such as a disordered region or an
experimental limitation), it is commonly inferred that the missing
region in the apo structure is an independent domain of Hsp90. Thus,
the mutation at S485 might influence the conformation of two domains in
Hsp90 and result in lethal effects. This conjecture is consistent with
the experimental results of Soroka et al. (2012), who observed the
strongest reduction in ATPase activity at S485 even though this
phosphorylation site is far from the ATP-binding site. Soroka et al.
(2012) therefore anticipated that S485 introduces variations in the
conformational dynamics and so has a distant effect in Hsp90. They also
conducted an experiment showing that S485 does decrease the structural
flexibility of Hsp90 and thus prevents formation of the N-terminal
dimers.
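A site near a structure boundary has fewer equivalent Cα pairs in its local window, which is worth checking before reading too much into its L-RMSD. The sketch below counts how much of a site's window is resolved in each structure of a pair. It is illustrative only: the function name is hypothetical, the two structures are labelled a and b rather than apo and holo, and the residue ranges in the example are invented, not taken from the Hsp90 entries.

```python
def window_coverage(site, resolved_a, resolved_b, flank=10):
    """Count residues of a site's local window resolved in each structure.

    resolved_a, resolved_b: iterables of residue numbers that have
    coordinates in structure a and structure b respectively.
    Returns (n_a, n_b, n_both); n_both is the N available to L-RMSD.
    """
    window = set(range(site - flank, site + flank + 1))
    in_a = window & set(resolved_a)
    in_b = window & set(resolved_b)
    return len(in_a), len(in_b), len(in_a & in_b)


if __name__ == "__main__":
    # Invented ranges: structure a ends shortly after residue 485.
    a = range(300, 491)  # residues 300..490
    b = range(300, 601)  # residues 300..600
    print(window_coverage(379, a, b))  # window fully resolved in both
    print(window_coverage(485, a, b))  # window truncated in structure a
```

With these invented ranges the window at 379 has all 21 residues in both structures, while the window at 485 has only 16 shared residues, so its L-RMSD would describe just the part of the region preceding the boundary.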
Figure 3: Interface to Show L-RMSD
IV. CONCLUSION
AH-DB is the largest repository of structural changes in the world.
Its data support various analyses of, for example, protein disorder,
secondary structure transition, catalysis, translational regulation and
molecular dynamics. The enhancements introduced in this work allow
AH-DB to support a wider range of applications. As structure
determination techniques continue to improve, AH-DB has the potential
to greatly expedite and extend analyses in related fields such as
protein flexibility/plasticity, protein interaction and
post-translational modification, and it may be combined with our recent
work on machine learning techniques and protein interaction analysis
[Fan et al., 2013; Lin et al., 2013].