Assessment of predictions submitted for the CASP7 function prediction category
PROTEINS: Structure, Function, and Bioinformatics
Function: Prediction
Gonzalo Lopez, Ana Rojas, Michael Tress,* and Alfonso Valencia
Structural and Computational Biology Programme, Spanish National Cancer Research Centre, Almagro, Madrid, Spain
ABSTRACT
Here we present a full overview of the Critical Assessment of Protein Structure Prediction (CASP7) function prediction category. Predictions were submitted for Gene Ontology molecular function terms, Enzyme Commission numbers, and ligand binding site residues. The first two categories were difficult to assess because very little new functional information becomes available after the experiment. The majority of the known Gene Ontology terms and all the Enzyme Commission numbers were available a priori to predictors before the experiment, so prediction for these two categories was not blind. Nevertheless, for Gene Ontology terms we were able to demonstrate that some groups made better predictions than others. In the binding residue category, the predictors did not know in advance which ligands were bound and therefore blind evaluation was possible, but there were disappointingly few predictions in this category. After CASP 6 and 7 the need to organize a more effective blind function prediction category is obvious, even if it means focusing on binding site prediction as the only category that can be truly assessed in the CASP spirit.
Proteins 2007; 69(Suppl 8):165–174. © 2007 Wiley-Liss, Inc.
Key words: target structures; function prediction; 3D models; binding sites; GO terms; EC numbers.
The authors state no conflict of interest.
Grant sponsor: BioSapiens; Grant number: LSHC-CT-2003-505265; Grant sponsor: GENEFUN; Grant number: LSHG-CT-2004-503567.
*Correspondence to: Michael Tress, Structural Biological Computation Programme, CNIO, c./ Melchor Fernandez Almagro, 3, Madrid, Spain. E-mail: [email protected]
Received 1 March 2007; Revised 16 April 2007; Accepted 30 April 2007
Published online 24 July 2007 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/prot.21651
INTRODUCTION
Whole-genome sequencing projects are generating unannotated sequences in increasing numbers and at the same time there are a substantial number of known structures that have little or no functional information, many of these generated by structural genomics projects. There is a great deal of interest in predicting function for these proteins and it is clear that function prediction is becoming an increasingly important field.1–4
Function assignment is far from simple. Although functional annotations can be transferred by homology, common evolutionary origin does not guarantee identical function and the more distant the evolutionary relationship, the less reliable the transfer will be.5
Protein 3D structure can be of use in predicting function. In theory, structure-based prediction ought to succeed more often than sequence-based prediction because structural patterns tend to be conserved long after sequence patterns become undetectable. However, predicting function for proteins with known structure still presents researchers with problems. While structure may be conserved within a superfamily of proteins, it is not always true that function is conserved to the same extent.6
Structure-based function prediction may present researchers with some challenges, but it does seem probable that protein 3D structure can directly aid in functional annotation. Function prediction was included in CASP6 for the first time7 with the aim of discovering whether computational methods could use 3D structure to add useful molecular or biological information to the target proteins.
One interesting side effect of the large increase in known 3D structures is that it is now possible to build homology models for a large number of proteins. Model databases8–10 that extend the range of known structures are springing up all the time. The rise of the protein model databases generates a number of questions. For example, to what
extent could we use these 3D models to make reliable
functional assignments? How much can remotely homol-
ogous models aid the prediction of general function?
And for what kind of biological situations are model-
based function predictions most useful?
Answering these questions under the format of CASP
is a tall order. Targets are released in CASP because we
are close to deducing their structure not their function,
and for that reason the function remains an unknown at
the end of the experiment. Experience shows that with a
bit of luck the probable function of the protein may be
refined after the structure is released, but unless a group
has been carrying out combined structural and functional
studies it is generally not possible to come up with a de-
finitive answer. The fact that we know little new about
the function of the target proteins after the CASP predic-
tion season is closed complicates the way that the func-
tion prediction category is assessed.
The first specific challenge that we faced in setting up
the function prediction experiment for CASP7 was the
definition of function itself. What is function and what
do we expect predictors to be able to predict? We
decided to evaluate function prediction with three sepa-
rate measures. The methods we chose were two standard
measures of general function, Enzyme Commission num-
bers (EC numbers11) and Gene Ontology (GO) molecu-
lar function terms.12 These two measures tended to over-
lap somewhat. We included a more specific measure that
we felt could have been predicted by the predicting
groups and that we were sure we would be able to evaluate
with confidence after the experiment: the prediction
of ligand binding residues.
In this report, we present the results of our assessment
of the predictions submitted to the CASP7 function prediction
experiment. The number of groups making predictions
in this section was disappointingly low given its
importance, but we were able to reach some conclusions
on the state of the art in function prediction. The results
also allowed us to make firm proposals as to the future
of function prediction in CASP.
METHODS AND RESULTS
Prediction methods
One of the fields in the CASP submission format
allowed predictors to enter information on the type of
methods used for function prediction. The aim was to
have a measure of the contribution of different approaches
to function predictions and try to answer to what extent
3D model structures can be used in the assignment of
functional information. The methods used by predictors
were encoded as binary vectors representing the usage or
not of the six different proposed approaches, and the frequency
of use of these vectors is represented in Figure 1.
It is clear from the figure that almost all the predictors
used sequence analysis as part of their prediction and
that the second most commonly used technique was the
use of information from the GO database. Structural information
was used in very few cases: three of the four
most common vectors did not employ structural information.
It is perhaps interesting to note, however, that arguably
the most successful prediction group (FN408) did use
structural information in their predictions.
Figure 1. Summary of methods used for function prediction: methods are encoded in binary vectors representing a group of method types. (a) Frequency of vector usage; note that each group sent the same vector in all predictions and that not all groups submitted method usage information. (b) Method preferences in each prediction category. Note that binding site bars (BS) are biased given that only one group predicted systematically and submitted method vectors for this category. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]
Predictors made use of a wide range of methods reflecting an increasing activity in the area of function annotation that goes beyond standard annotation pipelines based on simple transference from homologous sequences. Indeed, some groups employed as many methods as possible in their predictions. For example FN408 (KIHARA_PFP) used a sequence-based GO term predictor (PFP2), which makes annotations based on the frequency of GO terms in PSI-BLAST13 searches, in conjunction with information from GO, from domain, motif and protein–protein interaction databases and literature. Group FN510 (IUB-Info) made predictions with a support vector machine trained on residue-based, sequence alignment, and literature derived features. FN687 (YU-BA, probably the better of the two binding residue predictors) made their predictions after superimposing
CASP7 server models onto PDB structures14 with bound
ligands and selecting the bound ligand through nearest
neighbor clustering. Groups FN212/FN418 (HHPred)
based their GO term predictions on profile–profile align-
ments between hidden Markov model alignments gener-
ated around the target sequences and those alignments
generated around sequences from PDB structures and
functional domains. The GO terms were drawn from the
relative numbers of GO terms/descriptors found in the
searches. FN788 (LCBDavis) used short fragments of
structure (‘‘local descriptors’’) and ROSETTA to generate
a set of rules that could discriminate function.
Targets
There were 100 target proteins evaluated in the CASP7
function prediction set. Five extra targets were included
in the function assessment category: targets T0343,
T0352, and T0310 that were excluded from the structural
assessment by the assessors15 and targets T0344 and
T0355 for which predictors made predictions, but for
which no structure was deposited with CASP. The final
structure of these targets was not relevant for function
prediction evaluation. Function prediction targets were
not split into domains; predictors had to predict function
for the whole of the given sequence.
The functional annotation used in the evaluation came
from various sources. The main source of reliable infor-
mation was Uniprot,16 from which we took GO terms
and EC numbers. Residues that bound ligands came
from the target structures that were deposited with the
organizers. In addition, we were able to add terms by
crosschecking GO terms with EC number and vice versa,
and by crosschecking GO terms with bound ligands.
Sixty-six targets had some sort of functional informa-
tion associated to them in terms of EC numbers, GO
terms, or bound ligands. These targets formed the pool
of targets that we were able to use for this evaluation. It
follows that there were 34 targets for which there was no
usable functional information and these targets formed
no part of this evaluation.
Targets with bound ligands
Binding residues came directly from the structures in
the PDB or those deposited with the CASP organizers.
Binding site residues were defined as those residues in
contact with biologically relevant ligands. Two atoms
were considered to be in contact when they were within
a distance of 0.5 Å plus the sum of their van der Waals
radii. This definition was established and made public
before the prediction season in the submission format on the
CASP web pages. There were 63 targets with 112 candidate
bound molecules.
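The contact rule above can be illustrated with a short Python sketch. It is not the assessors' code: the van der Waals radii, the atom tuples, and the function names are assumptions introduced only for this example.

```python
import math

# Illustrative van der Waals radii in angstroms (assumed values, not taken from the paper).
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80, "ZN": 1.39}

def in_contact(atom_a, atom_b, tolerance=0.5):
    """Atoms are (element, (x, y, z)). Contact means the distance is within
    the sum of the two radii plus the tolerance (0.5 angstroms in the text)."""
    (elem_a, xyz_a), (elem_b, xyz_b) = atom_a, atom_b
    cutoff = VDW_RADII[elem_a] + VDW_RADII[elem_b] + tolerance
    return math.dist(xyz_a, xyz_b) <= cutoff

def binding_residues(protein_atoms, ligand_atoms, tolerance=0.5):
    """protein_atoms: list of (residue_number, element, xyz).
    Returns the sorted residue numbers with at least one atom in contact."""
    hits = set()
    for res_num, elem, xyz in protein_atoms:
        if any(in_contact((elem, xyz), lig, tolerance) for lig in ligand_atoms):
            hits.add(res_num)
    return sorted(hits)
```

Raising `tolerance` to 1.0, 1.5, and so on would give the residue sets needed for looser distance cut-offs.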
While the contact criteria are straightforward, deter-
mining biological relevance is more difficult. Solvent
molecules were discarded a priori and the remaining
bound ions, metabolites, and coenzymes were assessed
using a combination of the database FireDB,17 LGA18
structural alignments, and literature information. If a tar-
get with a bound ligand had a structural homologue that
bound the same or an equivalent ligand at the same site
and the binding residues were conserved, the ligand was
considered to be biologically relevant. Conservation of
the binding residues was determined with the server fire-
star.19 An example is shown in Figure 2.
We were able to demonstrate biological importance for
a total of 28 ligands bound to 21 targets (Table I). Of the
ligands, 17 were ions (Mg, Zn, Ni, and Na), 7 were
nucleotides (ADP, GTP) or derivatives (FAD, PLR, SAH,
SAM), 3 were metabolites (PO4, Oxalate), and 1 target
bound coenzyme A. Ligand codes come from the compo-
nents.cif file of the PDB.
Figure 2. Binding site biological relevance: (a) Target T0348 sequence is submitted to firestar. The output generates an alignment with PSI-BLAST, the reliability of alignments is assessed by SQUARE (Ref. 20; conservation is shown in blue) and template ligand binding sites are mapped on to the alignment. The first box represents the target aligned with itself and shows the zinc binding residues. The second and third boxes show the alignment of T0348 with the 1pft and 1p91A templates respectively. The three cysteines from the zinc-binding site are conserved while the glutamate is changed in both templates. (b) LGA structural alignment of T0348 and NMR model 1 from 1pft. The figure shows that side chains are oriented towards the zinc atom in the three conserved cysteines but also that Glu 32 (T0348) and Cys 30 (1pft) are close to the zinc ion. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]
There were two cases where nonstandard ligands were
bound in biological binding sites. Target T0292 is
known to bind ATP, but the structure deposited with
CASP bound nucleotide analog 5-[(Z)-(5-chloro-2-oxo-
1,2-dihydro-3H-indol-3-ylidene)methyl]-N-(diethylamino)
ethyl]-2,4-dimethyl-1H-pyrrole-3-carboxamide (5Z5).
Meanwhile target T0339 is a pyridoxal phosphate (PLP)
binding protein, but the structure contained a close derivative,
(5-hydroxy-4,6-dimethylpyridin-3-yl)methyl dihydrogen
phosphate (PLR). To assess the similarity between
the natural ligand and the analog, we obtained Tanimoto
coefficients from the Super Ligands database.21 PLP and
PLR had a Tanimoto score of 88.57%, while ATP and
5Z5 had a score of 60.14%. It has been established that
Tanimoto scores higher than 0.7 (70%) indicate that two
molecules have high structural similarity, which in turn
is a good indication of similar biological activity.22 For
this reason 5Z5 was rejected while PLR was included in the
list of biologically relevant sites.
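For readers unfamiliar with the measure, the Tanimoto coefficient and the 0.7 threshold can be sketched as below. The fingerprint representation (a set of "on" bit positions) and both function names are assumptions made for illustration; the fingerprints actually used by the Super Ligands database are not described here.

```python
def tanimoto(fingerprint_a, fingerprint_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions:
    shared bits divided by bits set in either fingerprint."""
    a, b = set(fingerprint_a), set(fingerprint_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

def is_acceptable_analog(score, threshold=0.7):
    """Apply the 'higher than 0.7' rule quoted in the text to a score on the 0..1 scale."""
    return score > threshold
```

With the scores reported above, `is_acceptable_analog(0.8857)` accepts PLR as a stand-in for PLP, whereas `is_acceptable_analog(0.6014)` rejects 5Z5 as a stand-in for ATP.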
Binding site evaluation
We measured binding site prediction with simple cover-
age and accuracy scores. Coverage was the number of cor-
rectly predicted residues divided by the number of anno-
tated residues. Accuracy represented the number of
correctly predicted residues from the total number of pre-
dicted residues.
\[
C_{0.5} = 100 \times \frac{N_{0.5}}{T_{0.5}}, \qquad A_{0.5} = 100 \times \frac{N_{0.5}}{P_{0.5}}
\]
where $T_{0.5}$ is the number of residues in contact with the ligand under the 0.5 Å distance cut-off, $P_{0.5}$ is the number of predicted residues, and $N_{0.5}$ is the number of correctly predicted residues at the 0.5 Å distance cut-off.
One problem with simple measures such as the accuracy
and coverage at 0.5 Å is that they do not take into
account surrounding residues. A residue at 0.54 Å plus
van der Waals might also play a role in the binding of the
ligand and if it is predicted by one of the groups, it ought
to receive some score. For that reason we also calculated
scores similar to the GDT-TS structure prediction scores
that take this into account. It is also possible to calculate
ROC-like curves from the accuracy and coverage because
these measures can also be obtained for a range of differ-
ent distance cut-offs (C1.0, A1.0, C1.5, A1.5, etc).
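The coverage and accuracy measures, and their repetition over several distance cut-offs, can be sketched as follows. This is a minimal illustration under the definitions given above; the function names and the dictionary layout are assumptions, and the GDT-TS-like combination of the per-cut-off values is not reproduced because its exact form is not given here.

```python
def coverage_accuracy(predicted, annotated):
    """Return (coverage, accuracy) in percent for one target at one cut-off."""
    predicted, annotated = set(predicted), set(annotated)
    correct = len(predicted & annotated)
    coverage = 100.0 * correct / len(annotated) if annotated else 0.0
    accuracy = 100.0 * correct / len(predicted) if predicted else 0.0
    return coverage, accuracy

def scores_by_cutoff(predicted, annotated_by_cutoff):
    """annotated_by_cutoff maps a distance cut-off (in angstroms) to the residues
    in contact with the ligand at that cut-off."""
    return {cutoff: coverage_accuracy(predicted, residues)
            for cutoff, residues in sorted(annotated_by_cutoff.items())}
```

For the T0289 example in Figure 3, predicting residues 20 and 23 against the annotated residues 20, 23, and 115 gives a coverage of about 67% and an accuracy of 100%.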
While we had the means to assess this category, only two groups (FN408 and FN687) submitted systematic predictions for binding residues, while a third group (FN337) sent predictions for five targets. As a result it was not possible to carry out a real evaluation of predictions. However, there were some interesting predictions, such as those in Figure 3.

Table I. Binding Site Summary Table

| Target | PDB | Ligand | Residue numbers | Residues, one letter code aa | Homologues by PSI-Blast |
|---|---|---|---|---|---|
| T0284 | – | OXL-MG2 | 48, 49, 50, 88, 159, 212, 235 | GGSDRYH | 1oqfA, 1o5qC, 1f8iA, 1m1bA... |
| T0289 | 2gu2 | ZN | 20, 23, 115 | HEH | 2bcoB, 2i3cA, 1yw4A... |
| T0292 | 2cl1 | 5Z5^a | 12, 33, 34, 66, 84–88, 90, 146, 160 | IVKVMEYCEGFL | 1ol7A, 1q24A... |
| T0293 | 2h00 | SAH | 34, 36, 41, 69–71, 75–76, 91–93, 97, 119, 121–124, 143–145, 147, 186, 189 | LPRGTGIYTEVDCVQTLNPPFFR | 2h00A, 2b3tA, 1sq9A... |
| T0308 | 2h57 | GTP-MG | 10–16, 31, 34, 55–56, 59, 114, 115, 117, 118, 147, 149 | ANSGKTTITSNKGRDLDAI | 1hurA, 1mr3F, 1r8sA... |
| T0312 | 2h6l | ZN | 89, 91, 104 | HHH | 1xv2C |
| T0313 | 2h58 | ADP-MG | 7, 9, 10, 12, 86–92 | RRPTTGAGKTY | 1bq2, 1vfzA, 1gojA... |
| T0315 | 2gzx | NI-NI | 6, 8, 92, 128, 153, 204 | HHEHHD | 1yixA, 1xwyA...^b |
| T0316 | 2hma | SAM-MG | 12–14, 16, 18, 19, 36–38, 100, 104, 108, 126–128, 152, 155 | GMSGDSIFMDNKTGHNF | 1qpmA, 1xnqB... |
| T0318 | 2hb6 | ZN-ZN | 252, 257, 275, 334, 336 | KDDDE | 2ewbA, 1qytF... |
| T0319 | 2j6a | ZN | 11, 16, 112, 115 | CCCC | 2hf1A, 1dx8A |
| T0320 | – | FAD | 59, 61, 66, 106, 107, 144, 148, 161, 163–165, 181, 182, 185, 188, 190, 300 | SNCFIMFIIGITDTFRR | |
| T0324 | 2hdo | PO4 | 9–11, 104, 105, 137 | DIDTSK | 2gfhA |
| T0329 | 2hl0 | NA | 9, 11, 189 | DDD | 2gfhA |
| T0330 | 2hcf | MG | 9, 11, 177 | DDD | 2fdrA, 1lvhA... |
| T0332 | 2ha8 | SAH | 87–89, 110–112, 115, 129, 130, 132, 137–139, 141, 144 | VENGNEGEINRSLVS | 1v2xA, 1j85A... |
| T0339 | 2hdy | PLR^a | 71, 72, 75, 117, 119, 166, 205, 207, 208, 228, 230, 231, 267, 268 | GTNHSMDAQVHKGT | 1p3wA, 1ecxB, 1t3iA... |
| T0341 | 2h04 | PO4-MG | 13, 15, 46, 47, 179, 204 | DNTNKD | 1yv9A, 2c4nA |
| T0348 | 2hf1 | ZN | 11, 14, 29, 32 | CCCD | 1pft, 1p91A, 1p9wA... |
| T0369 | 2hkv | NI | 48, 123, 127 | HHH | 1rxqA, 2f22A^c |
| T0371 | 2hx1 | MG | 19, 21, 232 | DFD | 1zjjB, 1ydfA, 2c4n... |
| T0372 | 2hqy | COA | 175, 190, 246, 270, 272, 275 | WEYLIL | |

Ligand codes come from the components.cif file from the PDB.15 Residue numbers are relative to CASP target sequence.
^a Noncanonical ligands. Not used in evaluation.
^b T0315 templates bind Zn more commonly than NI.
^c Homologs found by structural similarity with LGA.
GO molecular function terms
We fixed a date for the evaluation of the GO terms as
GO annotations are changed with relative frequency in the
Uniprot database. Uniprot GO terms came from a version
of the Uniprot database frozen on February 15th 2007.
For the purposes of this evaluation, we only considered
the GO molecular function category. Root terms such as
‘‘binding’’ or ‘‘catalytic activity’’ were not included in the
evaluation. Thirty-six of the 100 targets already had 46
GO molecular function terms associated to them in Uni-
prot. In addition, we were able to add GO terms to a total
of 34 targets from bound ligands (21 targets), from cross-
referencing the associated EC number (4 targets) and
from recent publications (6 targets). Since the close of the
prediction season a total of 19 targets have had modifica-
tions to their Uniprot-associated GO terms, with a total of
12 new terms and 9 terms removed from the pages.
The GO terms new to Uniprot since the close of the
prediction season totaled 56, although, apart from
the information from the bound ligands, just two of the
terms from recent papers can be regarded as completely
“new.”
Evaluation
Relations between terms in the Gene Ontology are of
two types: “is a” and “part of.” The proportions of both
types of linkage differ in the three different main ontologies
(biological process, cellular localization, and molecular
function). Linkage of the type “is a” leads to a
hierarchy while linkage of the type “part of” complicates
matters by including other graph features. GO
term predictions in CASP7 are restricted to molecular
function where practically all the relations are of the
type “is a.” For that reason, we treated the ontology as
a hierarchy of terms in the evaluation, with each of the
terms being separated from the root term by a variable
number of steps. In this evaluation “term depth” is set
to the maximum number of steps needed to climb to
the root.
Over-prediction is not implicitly penalized. This is
because proteins can have many GO terms and most tar-
gets are annotated with just a few. Therefore, we have to
assume that some terms remain to be elucidated. How-
ever, a number of groups entered redundant predictions
for some targets and we did filter the submitted predic-
tions before calculating the scores.
Each annotated term was compared directly with
the most similar predicted term in the target predic-
tions. This pairing between annotated term and the
most similar predicted term is referred to as the
‘‘computable pair.’’ Common ancestor depth is the
depth of the first common parent and measures sim-
ilarity between two terms. Common ancestor depth
was calculated for all computable pairs in a target.
The prediction score for each target prediction was
obtained by summing the common ancestor depths
of all computable pairs. The final score was normal-
ized by dividing by the maximum possible score for a
given target (the sum of the annotated term depths):
\[
\text{GO Score} = \frac{\text{sum of common ancestor depths of computable pairs}}{\text{sum of the annotated term depths}}
\]
Scores range between 0 and 1.

Figure 3. Binding site prediction: (a) Target T0289 binds zinc at residues 20 (His), 23 (Glu), and 115 (His). Group FN337 was able to predict 20 and 23 (green) while 115 (blue) was missed out, resulting in an accuracy of 100%, but coverage of 67%. (b) Best prediction for T0332 target binding S-Adenosyl Homocysteine (sticks) from FN687. Correctly predicted residues in green, missed in blue and misplaced in red dots. Scores were 53% in coverage and 80% in accuracy.
Two datasets were constructed from the total set of
terms, one containing all the annotations and the other
including only terms which were not in the databases
before the prediction deadlines. We computed the scores
as described above for both datasets.
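The GO score defined above can be sketched on a toy hierarchy. Everything below is illustrative: the `PARENTS` table is a hypothetical fragment rather than the real ontology, the function names are invented for the example, and "first common parent" is interpreted here as the deepest term shared by the two ancestor sets.

```python
from functools import lru_cache

# Toy "is a" hierarchy (hypothetical fragment): term -> list of parent terms.
PARENTS = {
    "molecular_function": [],
    "binding": ["molecular_function"],
    "ion binding": ["binding"],
    "metal ion binding": ["ion binding"],
    "zinc ion binding": ["metal ion binding"],
    "magnesium ion binding": ["metal ion binding"],
    "catalytic activity": ["molecular_function"],
    "kinase activity": ["catalytic activity"],
}

@lru_cache(maxsize=None)
def depth(term):
    """Term depth: the maximum number of steps needed to climb to the root."""
    parents = PARENTS[term]
    return 0 if not parents else 1 + max(depth(p) for p in parents)

def ancestors(term):
    """The term itself plus every term reachable by climbing 'is a' links."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen

def common_ancestor_depth(term_a, term_b):
    """Depth of the deepest term shared by both ancestor sets."""
    return max(depth(t) for t in ancestors(term_a) & ancestors(term_b))

def go_score(annotated, predicted):
    """Sum over annotated terms of the best common ancestor depth with any
    predicted term, divided by the sum of the annotated term depths."""
    max_score = sum(depth(t) for t in annotated)
    if not predicted or max_score == 0:
        return 0.0
    achieved = sum(max(common_ancestor_depth(a, p) for p in predicted)
                   for a in annotated)
    return achieved / max_score
```

In this toy hierarchy, predicting "magnesium ion binding" for a target annotated with "zinc ion binding" scores 3/4, because the two terms first meet at "metal ion binding" (depth 3) while the annotated term has depth 4. Extra predicted terms do not lower the score, in line with the decision not to penalize over-prediction.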
GO consensus predictions
As a means of comparison with the predictors we gen-
erated ‘‘consensus’’ predictions. Here we wanted to assess
whether a consensus prediction extracted from the par-
ticipants would improve predictions as it does in other
prediction fields. The previous CASP assessors published
a retrospective analysis of the CASP6 function predic-
tions23 where they concluded that function prediction
could be useful for function annotation, as long as a
large number of groups participate in the predictions.
For the GO terms, we generated three consensus sets.
One included all the terms predicted by any two or more
predictors. This set had an average of 7.3 terms per tar-
get. However, the average number of filtered terms per
target predicted by the prediction groups was much lower,
3.1 terms per target. For this reason, we generated a sec-
ond consensus set where predicted terms were ranked
and then selected so that the final average of terms per
target in consensus was similar to 3.1.
In addition, we wanted to generate a consensus predic-
tion that was nonredundant by method in the style of
Pellegrini-Calace et al.23 in order to see if it was possible
to reproduce their results. To do this, we grouped all the
predictions by the method used to make the prediction
(Fig. 1) and included all the terms predicted by any two
or more of the method clusters. The average of terms per
target in this nonredundant consensus set was 3.45. Pre-
dictions from several groups were left out of the non-
redundant consensus because these groups did not
include a method type.
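The consensus sets described above can be sketched as a simple vote count. This is an assumption-laden illustration: the function names are invented, and ranking terms by the number of votes is only one plausible reading of "ranked and then selected", since the ranking criterion is not stated.

```python
from collections import Counter

def consensus_terms(predictions_by_group, min_votes=2, max_terms=None):
    """predictions_by_group maps a group (or method cluster) id to the terms it
    predicted for one target. Returns terms with at least min_votes votes,
    ordered by decreasing votes, then alphabetically."""
    votes = Counter()
    for terms in predictions_by_group.values():
        votes.update(set(terms))  # each group votes at most once per term
    ranked = sorted((t for t, n in votes.items() if n >= min_votes),
                    key=lambda t: (-votes[t], t))
    return ranked if max_terms is None else ranked[:max_terms]

def merge_by_method(predictions_by_group, method_of_group):
    """Pool the predictions of groups that share a method vector.
    Groups without a method vector are left out, as in the text."""
    clusters = {}
    for group, terms in predictions_by_group.items():
        method = method_of_group.get(group)
        if method is not None:
            clusters.setdefault(method, set()).update(terms)
    return clusters
```

Calling `consensus_terms(merge_by_method(preds, methods))` gives the consensus that is nonredundant by method; setting `max_terms` trims a set towards a target average number of terms per target.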
Results for GO term category
Results are not identical to the results presented at the
Asilomar CASP meeting in November 2006 because of
the large proportion of terms that have been added or
subtracted from the Uniprot annotation since the CASP
experiment ended.
For the complete dataset, groups FN408 and FN087 made the highest scoring predictions [Fig. 4(a)]. We also computed results for the newly annotated GO terms, where groups FN408 and FN788 stood out [Fig. 4(b)].
Figure 4. GO term prediction evaluation: average scores of predictions. The blue bar shows the 3.1 term consensus predictions, the green bar shows the nonredundant consensus selections and the red bars show the results of the predictors. Only groups submitting predictions for more than 10 targets are included. Two datasets of terms are evaluated: (a) all terms found for a common subset of 39 targets and (b) terms not found in Uniprot before prediction deadlines (a common subset of 20 targets). [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]
Figure 5. Head-to-head comparisons of GO terms prediction: in orange the number of wins for each group from head-to-head comparisons of GO term predictions made for common targets. Statistically significant wins are in yellow. The significance of the wins in head-to-head comparisons is calculated from standard paired t-tests and over a minimum of five common targets (P-value for significance = 0.05). [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]
In general, the three consensus predictions are better
than the predictors: the 7.3 GO term average set is
always better than the predictors, while the 3.1 GO term
consensus set and the nonredundant consensus set are
better than all but the best predictor (FN408) over both
groups of GO terms.
Because few groups predicted all targets in each sec-
tion, all comparisons between groups were head-to-head
over common subsets of predicted targets (Fig. 5). We
also calculated how significant the differences between
groups were. As can be seen from the results, group
FN408 performs better than all other groups over this set
of targets and GO terms. In addition, they are significantly
better than all but three groups: FN549 (where the
comparison was made over 5 targets), FN753 (just 16
targets and the server of group FN408), and FN510.
The same calculations were made for the subset of
new GO terms and here again group FN408 had better
scores than every other group when compared head-to-
head, although over these targets the scores for group
FN510 were very similar and this time the differences
between the top two groups and groups FN212, FN418,
and FN490 were not significant. In part this is because
the comparisons were made over a smaller, more biased
set of GO terms.
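A head-to-head comparison over common targets can be sketched with the standard library alone. The function name and the dictionary layout are assumptions for illustration. The sketch computes only the paired t statistic; turning it into a P-value requires the Student t distribution with n - 1 degrees of freedom, which a statistics package (for example `scipy.stats.ttest_rel`) would supply.

```python
import math
from statistics import mean, stdev

def head_to_head(scores_a, scores_b, min_common=5):
    """scores_a and scores_b map target ids to scores for two groups.
    Returns (mean difference, paired t statistic, number of common targets),
    or None when too few targets are shared. A positive mean difference
    counts as a win for group A."""
    common = sorted(set(scores_a) & set(scores_b))
    if len(common) < max(min_common, 2):
        return None
    diffs = [scores_a[t] - scores_b[t] for t in common]
    mean_diff = mean(diffs)
    spread = stdev(diffs)
    if spread == 0:
        # All differences identical: the statistic is undefined or unbounded.
        t_stat = 0.0 if mean_diff == 0 else math.copysign(math.inf, mean_diff)
    else:
        t_stat = mean_diff / (spread / math.sqrt(len(diffs)))
    return mean_diff, t_stat, len(diffs)
```

Restricting each comparison to the shared targets is what allows groups that predicted different subsets to be compared at all, at the cost of each pairing resting on a different, sometimes very small, set of targets.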
EC numbers
We recorded 35 targets with EC numbers in total: 14
had an experimentally verified EC number assigned in
Uniprot and 21 had Uniprot EC numbers that were putative
or assigned by sequence similarity. Many of these
were assigned by working backwards from GO terms
assigned by electronic annotation and we found two
experimentally verified EC numbers from the literature.
All but 4 of the targets with EC numbers in Uniprot
have experimental confirmatory evidence. No targets
have had modifications to their Uniprot-associated EC
number since the close of the prediction season.
EC annotations were not classified into subgroups,
because we felt that these subgroups would contain too
few targets to evaluate properly.
Evaluation details
First, predictions were filtered as in the GO term evaluation
by eliminating predecessor codes when children
appeared in the same predictions. Each annotated term
was then compared directly with the most similar pre-
dicted term in the target predictions. EC numbers are a
hierarchy of four levels (for example target T0375,
2.7.1.3), and comparison between predicted with anno-
tated terms is relatively simple.
Lower levels are only computed if the higher levels are
identical, with the exception of target T0384. In general,
each level exists only to differentiate one enzymatic func-
tion from the others in the same level. However, there
are cases where the numbering of a lower level has signif-
icance irrespective of the numbering of the preceding
level. For example, the third number in all oxidoreduc-
tases (EC:1.0.0.0) describes the acceptor molecule inde-
pendent of the numbering of the second level number
(which relates to the donor group). Target T0384 was
defined as EC:1.0.1.0 by similarity. The second number
(the donor group) is missing but the acceptor group is
probably NAD+ or NADP+, so the third number is
defined.
Unlike the GO terms, where proteins may have more
than one term, there is generally just a single EC number
per protein so it is important to take over-prediction
into account. Here, over-prediction was penalized
because the scores for each prediction were divided by
the number of EC numbers predicted by each
group for each target.
\[
\text{EC Score} = \frac{\text{sum of computable pair scores}}{\text{maximum possible score}}
\]
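The level-by-level comparison and the over-prediction penalty described above can be sketched as follows. This is a simplified reading, not the assessors' implementation: the function names are invented, a pair is scored by the number of leading levels that agree, and the special handling of oxidoreductases such as target T0384 (where the third level is compared even though the second is undefined) is deliberately left out.

```python
def matching_levels(annotated, predicted):
    """Count the leading EC levels that agree, stopping at the first mismatch."""
    count = 0
    for a, p in zip(annotated.split("."), predicted.split(".")):
        if a != p:
            break
        count += 1
    return count

def ec_score(annotated, predicted_list):
    """Best level match for the annotated EC number, scaled to 0..1 and
    divided by the number of EC numbers predicted for the target."""
    if not predicted_list:
        return 0.0
    best = max(matching_levels(annotated, p) for p in predicted_list)
    max_score = len(annotated.split("."))
    return best / max_score / len(predicted_list)
```

For the T0375 example (2.7.1.3), a single exact prediction scores 1.0, a single prediction of 2.7.1.1 scores 0.75, and an exact prediction submitted alongside one unrelated EC number scores 0.5.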
A consensus prediction was also generated for EC num-
bers. Here we selected the EC number predicted by most
participants; sometimes two or three EC numbers were
predicted equally by predictors and hence were included
in the consensus.
EC results
Fewer predictors made systematic EC number predic-
tions and there were fewer new annotations. Groups
scored well, some with EC scores of over 0.80. However,
it should be pointed out that none of the EC numbers
can be considered as ‘‘new.’’ Groups FN087, FN677,
FN490, and FN408 had the best scores, but FN408 only
predicted 21 of 35 targets (Table II). The consensus pre-
diction scored higher than any single group in this cate-
gory. When the groups were compared head-to-head
over common subsets of targets, FN087 and FN677 stood out, but were not significantly better than groups FN408 or FN490.

Table II. EC Predictions

| Group | Score |
|---|---|
| Consensus | 0.93 |
| FN087 | 0.90 |
| FN490 | 0.84 |
| FN677 | 0.84 |

Average scores of predictions for the top three groups over a common subset of 29 targets. Results for the EC consensus prediction are also shown as a comparison. Maximum possible score is set to one. Head to head comparisons between groups show that the three groups and the consensus were not significantly different.
CONCLUSIONS
CASP is an experiment that evaluates the state of
structure prediction. The assessment is based on known
structures that can be hidden from the predictors, thus
making predictions blind. The same cannot be done with
the function prediction category. Assessment is hampered
by the lack of new functional information. With the
exception of bound ligands, the assessors have no more
functional information at the end of the experiment than
was available to the predictors during the experiment. Six
months after the first evaluation there are only a few
targets with newly annotated GO terms and no newly
annotated EC numbers at all. Previous experience from
CASP6 (ref. 23) has shown that even a full year is barely
enough to gather additional experimental information.
The majority of the GO functional terms that we were
able to add to the existing annotations are related to
ligand binding since 21 target structures clearly bound a
biologically relevant ligand. As it is difficult to draw con-
clusions based on a small and biased set of functional
annotation data, we had to include functional annotations
that were already known (present in UniProt) in
the evaluation of GO terms and EC numbers.
In the GO term category there were more functional
annotations that came from known (and therefore not
blind) sources than new functional annotations. This was
even more of a problem in the EC number category.
What this means is that we are evaluating the all-round
predictive ability of the prediction methods, not only
how well they can predict unknown function for proteins
without functional annotation, but also how well they
can reproduce the functional annotations that are already
known for the target. Nevertheless, the new functional
annotations form a substantial and growing part of the
evaluation and the proportion of newly annotated terms
will continue to grow. There have even been a number of
new functional annotations since the November meetings
that have allowed us to monitor trends.
The fact that only 22 groups submitted predictions
was another surprise. This is an important and growing
field, so it is unfortunate that few groups were prepared
to predict and that groups that have published methods
in this area did not participate in the experiment. It is
almost certainly true that the slow release of functional
information that hampers the assessment was also the
cause of this low turnout. However, binding site predic-
tions were something that could have readily been eval-
uated so we were surprised that only two groups made
consistent predictions for this category.
The only category for which it was possible to make
any sort of conclusions about the efficacy of the methods
used was the GO molecular function category. This is
because there were far too few predictors in the binding
residue prediction category and too few new terms in the
EC number prediction category. Hence, the results from
the EC number prediction section will be biased towards
methods that are able to use easily available, nonblind
functional information.
In the GO molecular function prediction category, it
was possible to evaluate the predictions of 14 groups
over 57 targets. We can tentatively conclude that at
present group FN408 stands out as an all-round prediction
method. It was significantly better than all but three
groups when evaluated in head-to-head comparisons
over the set of all molecular function terms. Of those
three groups, one group made predictions for just five
targets and another was the prediction server of the same
group. These conclusions are conditional because we
know that GO molecular function terms are subject to
constant revision and will be changed and added to year
on year. Group FN510 also stood out in the subset of
newer GO terms, although this group was less good at
“predicting” known GO terms.
The consensus predictions did not turn out to be as
good a predictor of function as we had hoped. This was
possibly because there were so few predictors. However,
one thing was clear: as the GO terms were updated and
the proportion of new terms within the subset of reliable
GO terms increased, the scores for the consensus
predictions actually decreased relative to the best predictors,
suggesting that the average predictor was less good at
predicting the more difficult GO terms.
There are a number of difficulties in running a func-
tion prediction assessment in CASP, but nevertheless the
general opinion during the CASP meeting was that the
function prediction category is important and should be
maintained. Whether it should continue under exactly the
same rules as the other categories is another question. At
present the main problem for an experiment like CASP
is the fact that it may take several years for functional
annotations to be known. While it does mean that the
evaluation can be revisited some time in the future, it is
not ideal for a rapidly developing field where predictors
need to make use of the results and the evaluation in
order to refine their methods.
One other possibility is to select a small number of
targets that can be kept hidden from the predictors.
Inevitably this would mean that there would be very few
targets to evaluate and hence any meaningful evaluation
would have to run over a longer time period than a
CASP experiment. This would entail a rolling evaluation
in the style of the CAPRI docking experiment24 that
could be run in conjunction with experimental groups.
There are many other function prediction experiments
such as AFP (http://biofunctionprediction.org/), BioCreative
(http://biocreative.sourceforge.net), GeneFun
(http://www.genefun.org), and MouseFunc
(http://www.mousefunc.org)25–28 that exploit a range of predictive methods to
make functional annotations and it might be fruitful to
bring some of these groups into the CASP function
prediction process. Some of these experiments have been
able to persuade databases or experimental groups to
provide annotations prior to their release; for example,
the sets of annotated GO terms in BioCreative I and the
protein interaction data in BioCreative II were both
provided before release.
One category in this evaluation would certainly be
suitable for a CASP-style experiment. Since target
structures are submitted to the CASP organizers complete
with bound ligands, it should be possible to evaluate
predictions for ligand-binding residues. Two subcategories of
binding site prediction were proposed at the CASP7
meeting. In the first, predictors would not know whether
a ligand was bound to a target and would be expected to
make automatic predictions of binding residues. In the
second, predictors would know in advance that a certain
ligand would bind to the structure. The second part
of the experiment would apply to fewer targets (20% if
this CASP is a representative example), so more
detailed methods could be applied and human predictors
would be able to spend more time on the predictions. It is
certainly possible to predict binding sites for even the
hardest of remote homologues, as can be seen in Figure 6.
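Template-based binding site prediction of the kind shown in Figure 6 rests on carrying known binding residues from a template across an alignment to the target. The sketch below illustrates only that mapping step; it is not the firestar algorithm, the function name `transfer_binding_residues` is hypothetical, and the two aligned sequences are invented toy data rather than the real T0369 and 1rxq_A alignment.

```python
def transfer_binding_residues(template_aln: str, target_aln: str,
                              template_sites: set[int]) -> dict[int, tuple[int, str, bool]]:
    """Map template binding-residue numbers onto the target through a gapped alignment.

    Returns {template_position: (target_position, target_residue, conserved)}.
    Positions are 1-based and count residues only (gaps "-" are skipped).
    """
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned sequences must have the same length")
    mapped = {}
    t_pos = 0  # residue counter for the template
    q_pos = 0  # residue counter for the target
    for t_res, q_res in zip(template_aln, target_aln):
        if t_res != "-":
            t_pos += 1
        if q_res != "-":
            q_pos += 1
        # A site is transferred only if it is aligned to a target residue.
        if t_res != "-" and q_res != "-" and t_pos in template_sites:
            mapped[t_pos] = (q_pos, q_res, t_res == q_res)
    return mapped


# Toy alignment (invented): three template histidines at positions 3, 6, and 9.
template = "MKH-AGHLLH"
target = "M-HPAGH-LH"
print(transfer_binding_residues(template, target, {3, 6, 9}))
```

In this toy case all three histidines are aligned and conserved, so each is transferred to the target with its own residue number, analogous to the shift from histidines 64, 157, and 161 in the template to 48, 123, and 127 in the target described in the Figure 6 caption.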
However, for this experiment to function properly,
CASP would almost certainly need to encourage function
prediction groups that do not normally participate in
CASP experiments to submit predictions.
ACKNOWLEDGMENTS
Thanks to all those at the Prediction Center, especially
Andriy Kryshtafovych. Thanks to Angela del Pozo for
invaluable help in handling GO terms.
Figure 6. A difficult (but perfectly possible) prediction: (a) Representation of the nickel-binding site in PDB structure 1rxqA. The nickel is bound to histidines 64, 157, and 161. (b)
Target T0369 binds nickel on histidines 48, 123, and 127. (c) 3D-jury alignment for T0369 and 1rxq_A passed through firestar. The three binding histidines are
conserved in both template and target. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

REFERENCES
1. Friedberg I, Harder T, Godzik A. JAFA: a protein function annota-
tion meta-server. Nucleic Acids Res 2006;34:W379–W381.
2. Hawkins T, Luban S, Kihara D. Enhanced automated function pre-
diction using distantly related sequences and contextual association
by PFP. Protein Sci 2006;15:1550–1556.
3. Laskowski RA, Watson JD, Thornton JM. ProFunc: a server for pre-
dicting protein function from 3D structure. Nucleic Acids Res
2005;33:W89–W93.
4. Pazos F, Bang J-W. Computational prediction of functionally im-
portant regions in proteins. Curr Bioinform 2006;1:15–23.
5. Devos D, Valencia A. Practical limits of function prediction. Pro-
teins 2000;41:98–107.
6. Todd AE, Orengo CA, Thornton JM. Evolution of function in pro-
tein superfamilies, from a structural perspective. J Mol Biol 2001;
307:1113–1143.
7. Soro S, Tramontano A. The prediction of protein function at
CASP6. Proteins 2005;61:201–213.
8. Kopp J, Schwede T. The SWISS-MODEL Repository of annotated
three-dimensional protein structure homology models. Nucleic
Acids Res 2004;32:D230–D234.
9. Castrignano T, De Meo PD, Cozzetto D, Talamo IG, Tramontano
A. The PMDB protein model database. Nucleic Acids Res 2006;34:
D306–D309.
10. Pieper U, Eswar N, Davis FP, Braberg H, Madhusudhan MS, Rossi
A, Marti-Renom M, Karchin R, Webb BM, Eramian D, Shen MY,
Kelly L, Melo F, Sali A. MODBASE: a database of annotated com-
parative protein structure models and associated resources. Nucleic
Acids Res 2006;34:D291–D295.
11. IUBMB. Enzyme Nomenclature. New York: Academic Press; 1992.
12. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM,
Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP,
Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE,
Ringwald M, Rubin GM, Sherlock G. Gene ontology: tool for the
unification of biology. The gene ontology consortium. Nat Genet
2000;25:25–29.
13. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W,
Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs. Nucleic Acids Res 1997;25:3389–
3402.
14. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig
H, Shindyalov IN, Bourne PE. The protein data bank. Nucleic Acids
Res 2000;28:235–242.
15. Clarke ND, Ezkurdia I, Kopp J, Read R, Schwede T, Tress ML. Domain
definition and target classification for CASP7. Proteins, this issue.
16. Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro
S, Gasteiger E, Huang H, Lopez R, Magrane M, Natale DA,
O’Donovan C, Redaschi N, Yeh LS. The universal protein resource
(UniProt). Nucleic Acids Res 2005;33:D154–D159.
17. Lopez G, Valencia A, Tress ML. FireDB—a database of functionally
important residues from proteins of known structure. Nucleic Acids
Res 2007;35:D219–D223.
18. Zemla A. LGA—a method for finding 3D similarities in protein
structures. Nucleic Acids Res 2003;31:3370–3374.
19. Lopez G, Valencia A, Tress ML. firestar—prediction of functionally
important residues using structural templates and alignment reli-
ability. Nucleic Acids Res 2007;35: Web server issue, in press.
20. Tress ML, Grana O, Valencia A. SQUARE-determining reliable
regions in sequence alignments. Bioinformatics 2004;20:974–975.
21. Michalsky E, Dunkel M, Goede A, Preissner R. SuperLigands—a
database of ligand structures derived from the protein data bank.
BMC Bioinform 2005;6:122.
22. Mitchell JBO. The relationship between the sequence identities of
alpha helical proteins in the PDB and the molecular similarities of
their ligands. J Chem Inf Comput Sci 2001;41:1617–1622.
23. Pellegrini-Calace M, Soro S, Tramontano A. Revisiting the predic-
tion of protein function at CASP6. FEBS J 2006;273:2977–2983.
24. Janin J, Henrick K, Moult J, Ten Eyck L, Sternberg MJE, Vajda S,
Vakser I, Wodak SJ. CAPRI: a critical assessment of predicted
interactions. Proteins 2003;52:2–9.
25. Friedberg I, Jambon M, Godzik A. New avenues in protein function
prediction. Protein Sci 2006;15:1527–1529.
26. Hirschman L, Yeh A, Blaschke C, Valencia A. Overview of BioCreA-
tIvE: critical assessment of information extraction for biology. BMC
Bioinform 2005;6:S1.
27. In-silico prediction of gene function. Ref. U.E.: LSHG-CT-2004-
503567.
28. Ginalski K, Elofsson A, Fischer D, Rychlewski L. 3D-Jury: a simple
approach to improve protein structure predictions. Bioinformatics
2003;19:1015–1018.