Integrating Phosphorylation and Catalytic Sites Information into AH-DB


The SIJ Transactions on Computer Science Engineering & its Applications (CSEA), Vol. 2, No. 3, May 2014 

ISSN: 2321-2381 © 2014 | Published by The Standard International Journals (The SIJ) 77 

Abstract: Proteins are molecular machines that bind various molecules, and these interactions are critical to many biological processes. A protein is said to be in the "apo" state before an interaction and in the "holo" state after it. Our recent study constructed the Apo-Holo DataBase (AH-DB), the largest repository of apo-holo structure pairs in the world, which enables researchers to understand protein interactions and their associated biological processes. This work follows up on AH-DB with two enhancements: (a) the integration of phosphorylation and catalytic sites and (b) a novel measure of structural variation. The phosphorylation data come from phospho.ELM version 9.0, which contains more than 42,500 phosphorylation sites spanning 8,718 substrate proteins, and the catalytic site information is collected from the Catalytic Site Atlas. The second enhancement is a novel measure of structural variation, L-RMSD, which stands for local root mean square deviation. A case study conducted in this work shows that the two enhancements enable AH-DB to support a wider range of applications.

Keywords: Catalytic Site; Conformational Transition; Molecular Interaction; Phosphorylation Site; Protein Structure; RMSD.

Abbreviations: Apo-Holo DataBase (AH-DB); Catalytic Sites (C-sites); Local Root Mean Square Deviation (L-RMSD); Nuclear Magnetic Resonance (NMR); Phosphorylation Sites (P-sites); Protein Data Bank (PDB); Root Mean Square Deviation (RMSD).

I.  INTRODUCTION 

Protein interactions with other molecules play an important role in various biological processes. When binding other molecules, many proteins undergo conformational transitions [Kim et al., 2000; Goh et al., 2004; Michel 2007; Dan et al., 2010]. Knowing such conformational transitions helps to understand protein interactions and their associated biological processes. To this end, it is important to compare the structure of a protein before binding (apo structure) with that after binding (holo structure). Our recent work constructed a comprehensive library, the Apo-Holo DataBase (AH-DB), of apo-holo structure pairs for researchers to study conformational transitions [Chang et al., 2012]. It provides sophisticated interfaces for searching apo-holo structure pairs and exploring conformational transitions in the paired structures. This article presents two major enhancements of AH-DB.

The first enhancement is the inclusion of phosphorylation and catalytic sites. Amino acid phosphorylation is an important post-translational protein modification that is frequently seen in eukaryotic cells [Lemeer & Heck, 2009]. Mapping large-scale phosphoproteomics to various data [e.g. de la Fuente van Bentem et al., 2008; Morandell et al., 2008; Preisinger et al., 2008] facilitates the elucidation of signalling networks. Because the available phosphorylation motifs are short [Linding et al., 2007; Zanzoni et al., 2011] and reside in unstructured and rapidly evolving regions [Brown et al., 2002], comparative studies that estimate their evolutionary conservation depend on complete phosphoproteomic data across species. As such phosphoproteomic data have recently become available, some efforts to estimate the evolutionary conservation of phospho-residues have appeared [Boekhorst et al., 2008; Holt et al., 2009; Tan et al., 2009; 2009A]. Enzymes are another class of molecules important to all biological mechanisms. The catalytic regions are usually highly conserved and short. In contrast, the regions responsible for binding the substrate are not as critical to catalytic function. Thus, exploring enzyme activities will be helpful in understanding the relationship between protein structure and function.

However, no effort has been made to map phosphoproteomic data to structural data. With the integration of these phosphoproteomic and catalysis data, AH-DB enables researchers to examine the extent of conformational change associated with phosphorylation and catalysis and to view the phosphorylation sites in a 3D, interactive interface.

Hao Wang*, Wen-Hao Chiang**, Chia-Wei Chou*** & Darby Tien-Hao Chang****

*Department of Electrical Engineering, National Cheng Kung University, Tainan, TAIWAN. E-Mail: n26001636{at}mail{dot}ncku{dot}edu{dot}tw

**Department of Electrical Engineering, National Cheng Kung University, Tainan, TAIWAN. E-Mail: n26000305{at}mail{dot}ncku{dot}edu{dot}tw

***Department of Electrical Engineering, National Cheng Kung University, Tainan, TAIWAN. E-Mail: chouwill79131{at}mail{dot}ncku{dot}edu{dot}tw

****Department of Electrical Engineering, National Cheng Kung University, Tainan, TAIWAN. E-Mail: darby{at}mail{dot}ncku{dot}edu{dot}tw

The second enhancement is a novel measure of structural variation, L-RMSD, which stands for local root mean square deviation. The root mean square deviation (RMSD) measures the average distance between the equivalent points of two superimposed structures. In the context of protein structures, the RMSD is usually computed over the Cα atoms of two superimposed protein structures. The RMSD is widely used to compare two protein structures, where one structure is translated and rotated onto the other so as to minimize the RMSD. A small RMSD is assumed to indicate a good structure superimposition (as given in the Gauss-Markov theorem) when the distances among the equivalent atoms are equally distributed [Seber & Wild, 1989]. If this assumption is false, the RMSD becomes an inappropriate index of structure comparison because of the heteroscedasticity issue in statistics [Hogg & Tanis, 2010]. In protein superimpositions, however, the distances among equivalent atoms usually vary across protein sub-regions, because proteins are flexible rather than rigid bodies. For example, the local structures of protein binding sites usually undergo conformational transitions during binding, while most other sub-regions of the protein behave like rigid bodies. This leads to varying local precision when comparing protein structures of similar binding domains: some equivalent atoms superimpose well while others do not. Thus, a single RMSD that measures the global superimposition quality is not enough. The proposed L-RMSD supplements the global RMSD by providing the local superimposition quality of user-specified protein sub-regions.
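A small worked calculation may make the point concrete. The numbers here are hypothetical and chosen only for illustration; they are not taken from AH-DB. Suppose a superimposition has 100 equivalent Cα pairs, of which 90 lie in rigid regions at a distance of 0.5 each and 10 lie in a flexible binding site at a distance of 8.0 each. Using the standard RMSD definition over all pairs and then over the binding site alone:

```latex
\mathrm{RMSD}_{\text{global}}
  = \sqrt{\frac{90\,(0.5)^2 + 10\,(8.0)^2}{100}}
  = \sqrt{\frac{22.5 + 640}{100}}
  = \sqrt{6.625} \approx 2.57

\mathrm{RMSD}_{\text{site}}
  = \sqrt{\frac{10\,(8.0)^2}{10}}
  = 8.0
```

Under these assumed values the global figure sits far from both the rigid regions (0.5) and the binding site (8.0), which is the situation a local measure is meant to expose.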

In summary:

• The motivation of this research is to explore protein conformational changes upon binding other molecules.

• The objectives of this research are: a) to understand the impact of phosphorylation and catalytic activities on conformational change and b) to quantify conformational changes of protein local structures.

• The contributions of this manuscript are: a) integrating phosphorylation and catalytic information into AH-DB and b) enabling researchers to examine, in a 3D and interactive interface, the extent of conformational change associated with phosphorylation and catalytic activities.

II.  METHODS 

This section first describes the fundamental architecture of AH-DB constructed in our previous work [Chang et al., 2012], which helps readers see how the two enhancements were integrated into AH-DB. It then describes the collection procedure for phosphorylation and catalytic sites and the calculation of the L-RMSD.

2.1. Architecture of AH-DB

Figure 1 shows the architecture of AH-DB. It is a structure-based database with a friendly user interface. In AH-DB, the protein structure before interaction is called the "apo" state, which has the meaning of "standard" in medicine, while the protein structure after interaction is called the "holo" state, which has the meaning of "comprehensive". Thanks to advances in X-ray crystallography, the tertiary structures of many proteins, namely the three-dimensional coordinates of all atoms, have been determined at different states. One may think of an X-ray crystal structure as a snapshot of a protein at a specific state. These snapshots are well organized in the Protein Data Bank (PDB) database. However, these snapshots are static.

Figure 1: Architecture of AH-DB

(a) A protein. (b) A protein binding with another molecule, represented by the red triangle. In this figure, (a) and (b) represent two states of the same protein. These protein states are determined as structures by X-ray crystallography (denoted "X-ray" in the figure). These protein structures are snapshots of proteins and are organized into the Protein Data Bank database (denoted "PDB, snapshots" in the figure). AH-DB selects, analyzes and visualizes suitable structure pairs to reconstruct the processes of proteins binding other molecules (denoted "Select", "Analyze" and "Visualize" in the figure), which makes it a repository of changes rather than snapshots.

AH-DB comprises three major components that maximize the usage of these static snapshots. The first is a procedure to "select" suitable structure pairs of similar proteins at different states. Based on the selected structures, the second is a collection of analysis algorithms and packages that identify the structural changes of proteins while they bind other molecules. This component aims to reconstruct the binding process from a set of static snapshots. The third is the interface, which provides much flexibility for searching apo-holo structure pairs and a highly interactive means of exploring the paired structures. This sophisticated interface is necessary to let users explore more comprehensively the complicated relations among molecules and conformational transitions. In brief, the three components (Select, Analyze and Visualize in Figure 1) make up AH-DB, which is a repository of changes rather than snapshots.

2.2. Phosphorylation and Catalytic Sites

The phosphorylation site (P-site) data in AH-DB were collected from the phospho.ELM database [Diella et al., 2008]. The data deposited in phospho.ELM are experimentally verified and/or extracted from the literature. The release of the phospho.ELM dataset used in this work is version 9.0, which contains more than 42,500 phosphorylation sites spanning 8,718 substrate proteins. The correspondence between phospho.ELM sequences and Protein Data Bank (PDB) chains was established via the UniProt identifier [Apweiler et al., 2010]. The catalytic site (C-site) data were collected from the Catalytic Site Atlas [Porter et al., 2004] and were also classified via the UniProt identifier. The local region of each phosphorylation or catalytic site in AH-DB is defined as the site residue and its 20 neighboring residues in sequence (ten preceding and ten following). The phosphorylation and catalytic information can be visualized in the sequence and structure views of AH-DB (Figure 2).
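The local-region definition above can be sketched in a few lines of Python. This is only an illustration, not the AH-DB implementation: the function name `local_region`, the 1-based inclusive indexing, and the choice to truncate the window at the sequence termini are all assumptions, since the text does not say how sites near the ends of a chain are handled.

```python
def local_region(site_index, sequence_length, flank=10):
    """Return the (start, end) residue positions of the local region.

    Positions are 1-based and inclusive. The region is the site residue
    plus up to `flank` residues on each side; truncating at the chain
    termini is an assumption made for this sketch.
    """
    if not 1 <= site_index <= sequence_length:
        raise ValueError("site_index must lie within the sequence")
    start = max(1, site_index - flank)               # ten preceding residues
    end = min(sequence_length, site_index + flank)   # ten following residues
    return start, end


# Hypothetical 700-residue chain with a site at position 379.
print(local_region(379, 700))
```

For a site well inside the chain the window covers 21 residues (the site and its 20 neighbors); near a terminus it is shorter under the truncation assumption.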

2.3. Local Root Mean Square Deviation (L-RMSD)

The equation of L-RMSD is identical to global RMSD as

follows:

$$\text{L-RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} d_i^{2}} \qquad (1)$$

where N is the number of equivalent Cα pairs in the superimposition and d_i is the distance between the i-th equivalent Cα pair. The only difference between the L-RMSD and the RMSD is that the search for the N equivalent Cα pairs is restricted to a local structure rather than the entire protein. In AH-DB, the problem of determining the local structure is solved by its interactive interface. As shown in Figure 2C, AH-DB provides various facilities for users to highlight local structures. The phosphorylation sites added in this work, for example, are such local structures. The L-RMSD is calculated on the fly and shown immediately in the structure view (Figure 3) when users enable the L-RMSD display checkbox.
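A minimal sketch of the calculation follows, assuming the two structures have already been superimposed and that their Cα coordinates are given as equal-length lists of (x, y, z) tuples in equivalent order. The function names `rmsd` and `l_rmsd` and the 0-based `region_indices` argument are hypothetical and are not part of AH-DB; the sketch only restricts the standard RMSD formula to a chosen subset of pairs.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD over paired coordinates of two already-superimposed structures."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def l_rmsd(coords_a, coords_b, region_indices):
    """RMSD restricted to the pairs listed in region_indices (0-based)."""
    local_a = [coords_a[i] for i in region_indices]
    local_b = [coords_b[i] for i in region_indices]
    return rmsd(local_a, local_b)


# Toy coordinates: the first two pairs coincide, the last two deviate.
apo = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
holo = [(0, 0, 0), (1, 0, 0), (2, 0, 3), (3, 0, 4)]
print(rmsd(apo, holo))            # global value over all four pairs
print(l_rmsd(apo, holo, [2, 3]))  # local value over the deviating pairs
```

In this toy case the local value over the deviating pairs is larger than the global value, because the global figure is diluted by the pairs that superimpose perfectly.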

III.  CASE STUDY 

This section uses a protein whose phosphorylation sites have been comprehensively studied in a recent paper [Soroka et al., 2012] as a case study to demonstrate the enhancements of AH-DB. The protein is Hsp90, a critical molecular chaperone in the eukaryotic cytosol. The activity of Hsp90 is largely influenced by post-translational modifications. Soroka et al. (2012) mutated the major phosphorylation sites of yeast Hsp90 and identified three critical ones (S379, S485, and S604) that influenced yeast growth. The mutation at S379 breaks the active conformation of Hsp90; the one at S485 alters structural changes during ATP hydrolysis; the one at S604 affects inter-subunit communication. Thus, the lethal effect of the first two mutations comes from structural changes near the phosphorylation sites.

Figure 2 shows the results of analysing Hsp90 with the following procedure. In this case, [Organism] was set to S. cerevisiae, [Target protein] was set to HSP82_YEAST, [compact pairs] was set to "by tar-add" (which asks AH-DB to compact similar structure pairs for a condensed result), [representative pair] was set to "most SSE transitions" (which asks AH-DB to select the structure pair with the most secondary structure transitions when compacting), [exclude NMR] was enabled, [ignore ligand] was enabled, [ignore ion] was enabled and [technology consistency] was enabled (which requires that the paired structures be determined by the same technology). The square-bracketed terms above are search options of AH-DB; more details of each option can be found in our previous work [Chang et al., 2012] and/or on the web [http://ahdb.ee.ncku.edu.tw/]. In the search page that lists apo-holo structure pairs satisfying the above options, the fourth pair, which contains the most secondary structure transitions, was chosen. In the pair page that shows details of the chosen apo-holo structure pair, the phosphorylation sites were highlighted with L-RMSD enabled. The added molecules were hidden to focus on the target protein, Hsp90.
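For readers who want to reproduce the search, the option values listed above can be recorded as a plain Python dictionary. This is only a note-keeping sketch: AH-DB is used through its web interface, and the dictionary below is not an AH-DB API or file format. The keys simply reuse the bracketed option labels from the text.

```python
# Search options used in the Hsp90 case study, keyed by the option labels
# shown in the text. This is a plain record, not an AH-DB API call.
search_options = {
    "Organism": "S. cerevisiae",
    "Target protein": "HSP82_YEAST",
    "compact pairs": "by tar-add",
    "representative pair": "most SSE transitions",
    "exclude NMR": True,
    "ignore ligand": True,
    "ignore ion": True,
    "technology consistency": True,
}

# List the checkbox-style options that were enabled.
enabled_flags = sorted(
    name for name, value in search_options.items() if value is True
)
print(enabled_flags)
```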


Figure 2: Interface to Access the Phosphorylation Information in AH-DB

(A) Sequence view; (B) structure view; (C) display controls.

Users can utilize the controls in (C) to highlight

 phosphorylation sites in (A) and (B). The protein shown in

this figure is yeast Hsp90, of which the phosphorylation sites

have been comprehensively studied in a recent paper. The

mutation at S379 breaks the active conformation of Hsp90

while the mutation at S485 alters structural changes during

the ATP hydrolysis. 

Figure 2 shows an obvious difference between the apo (blue) and holo (red) structures at S379. The large difference is also reflected in the L-RMSD (8.15) in Figure 3. However, the structural difference between the apo and holo structures is less obvious at S485. The only hint in Figure 2 is that S485 is located at the boundary of the apo structure (the area to the right of S485 has only the blue structure). Although there are various reasons why a PDB entry may not contain the entire protein structure (such as a disordered region or an experimental limitation), it is commonly inferred that the missing region in the apo structure is an independent domain of Hsp90. Thus, the mutation at S485 might influence the conformation of two domains in Hsp90 and result in lethal effects. This conjecture is consistent with the experimental results of Soroka et al. (2012), who observed the strongest effect in reducing ATPase activity at S485. However, this phosphorylation site is far away from the ATP-binding site. Thus, Soroka et al. (2012) anticipated that S485 introduces variations in the conformational dynamics and results in a distant effect in Hsp90. They also conducted an experiment to show that S485 does decrease the structural flexibility of Hsp90 and thus prevents the formation of N-terminal dimers.

Figure 3: Interface to Show L-RMSD

IV.  CONCLUSION 

AH-DB is the largest repository of structural changes in the world. Its data support various analyses of, for example, protein disorder, secondary structure transition, catalysis, translational regulation and molecular dynamics. The enhancements introduced in this work enable AH-DB to support a wider range of applications. As structure determination techniques continue to improve, AH-DB has the potential to greatly expedite and extend analyses in related fields, for example, protein disorder, secondary structure transition, protein flexibility/plasticity, protein interaction, post-translational modification and molecular dynamics, and may even be combined with our recent work on machine learning techniques and protein interaction analysis [Fan et al., 2013; Lin et al., 2013].
