Assessment of predictions submitted for the CASP7 function prediction category


PROTEINS: Structure, Function, Bioinformatics

Function: Prediction

Gonzalo Lopez, Ana Rojas, Michael Tress,* and Alfonso Valencia

Structural and Computational Biology Programme, Spanish National Cancer Research Centre, Almagro, Madrid, Spain

INTRODUCTION

Whole-genome sequencing projects are generating unannotated sequen-

ces in increasing numbers and at the same time there are a substantial

number of known structures that have little or no functional information,

many of these generated by structural genomics projects. There is a great

deal of interest in predicting function for these proteins and it is clear that

function prediction is becoming an increasingly important field.1–4

Function assignment is far from simple. Although functional annotations

can be transferred by homology, common evolutionary origin does not

guarantee identical function and the more distant the evolutionary relation-

ship, the less reliable the transfer will be.5

Protein 3D structure can be of use in predicting function. In theory,

structure-based prediction ought to succeed more often than sequence-

based prediction because structural patterns tend to be conserved long after

sequence patterns become undetectable. However, predicting function for

proteins with known structure still presents researchers with problems.

While structure may be conserved within a superfamily of proteins, it is

not always true that function is conserved to the same extent.6

Structure-based function prediction may present researchers with some

challenges, but it does seem probable that protein 3D structure can directly

aid in functional annotation. Function prediction was included in CASP6

for the first time7 with the aim of discovering whether computational

methods could use 3D structure to add useful molecular or biological in-

formation to the target proteins.

One interesting side effect of the large increase in known 3D structures is

that it is now possible to build homology models for a large number of

proteins. Model databases8–10 that extend the range of known structures

are springing up all the time.

The authors state no conflict of interest.

Grant sponsor: BioSapiens; Grant number: LSHC-CT-2003-505265; Grant sponsor: GENEFUN; Grant number: LSHG-CT-2004-503567.

*Correspondence to: Michael Tress, Structural Biological Computation Programme, CNIO, c./ Melchor Fernandez Almagro, 3, Madrid, Spain. E-mail: [email protected]

Received 1 March 2007; Revised 16 April 2007; Accepted 30 April 2007

Published online 24 July 2007 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/prot.21651

ABSTRACT

Here we present a full overview of the Critical Assessment of Protein Structure Prediction (CASP7) function prediction category. Predictions were submitted for Gene Ontology molecular function terms, Enzyme Commission numbers, and ligand binding site residues. The first two categories were difficult to assess because very little new functional information becomes available after the experiment. The majority of the known Gene Ontology terms and all the Enzyme Commission numbers were available a priori to predictors before the experiment, so prediction for these two categories was not blind. Nevertheless, for Gene Ontology terms we were able to demonstrate that some groups made better predictions than others. In the binding residue category, the predictors did not know in advance which ligands were bound and therefore blind evaluation was possible, but there were disappointingly few predictions in this category. After CASP 6 and 7 the need to organize a more effective blind function prediction category is obvious, even if it means focusing on binding site prediction as the only category that can be truly assessed in the CASP spirit.

Proteins 2007; 69(Suppl 8):165–174. © 2007 Wiley-Liss, Inc.

Key words: target structures; function prediction; 3D models; binding sites; GO terms; EC numbers.

The rise of the protein model databases generates a number of questions. For example, to what

extent could we use these 3D models to make reliable

functional assignments? How much can remotely homol-

ogous models aid the prediction of general function?

And for what kind of biological situations are model-

based function predictions most useful?

Answering these questions under the format of CASP

is a tall order. Targets are released in CASP because we

are close to deducing their structure not their function,

and for that reason the function remains an unknown at

the end of the experiment. Experience shows that with a

bit of luck the probable function of the protein may be

refined after the structure is released, but unless a group

has been carrying out combined structural and functional

studies it is generally not possible to come up with a de-

finitive answer. The fact that we know little new about

the function of the target proteins after the CASP predic-

tion season is closed complicates the way that the func-

tion prediction category is assessed.

The first specific challenge that we faced in setting up

the function prediction experiment for CASP7 was the

definition of function itself. What is function and what

do we expect predictors to be able to predict? We decided to evaluate function prediction with three separate measures. Two were standard measures of general function, Enzyme Commission numbers (EC numbers11) and Gene Ontology (GO) molecular function terms,12 which tended to overlap somewhat. The third was a more specific measure that we felt the predicting groups could have predicted and that we were sure we would be able to evaluate with confidence after the experiment: the prediction of ligand binding residues.

In this report, we present the results of our assessment

of the predictions submitted to the CASP7 function pre-

diction experiment. The number of groups making pre-

dictions in this section was disappointingly poor given its

importance, but we were able to reach some conclusions

on the state of the art in function prediction. The results

also allowed us to make firm proposals as to the future

of function prediction in CASP.

METHODS AND RESULTS

Prediction methods

One of the fields in the CASP submission format

allowed predictors to enter information on the type of

methods used for function prediction. The aim was to measure the contribution of different approaches to function prediction and to determine to what extent 3D model structures can be used in the assignment of functional information. The methods used by predictors were encoded as binary vectors representing the use or non-use of the six proposed approaches, and the frequency of use of these vectors is shown in Figure 1. It is clear from the figure that almost all the predictors used sequence analysis as part of their prediction and that the second most commonly used technique was the use of information from the GO database. Structural information was used in very few cases: three of the four most common vectors did not employ structural information. It is perhaps interesting to note, however, that arguably the most successful prediction group (FN408) did use structural information in their predictions.
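The binary method vectors described above can be illustrated with a short sketch. It is only an illustration: the text names three of the six approaches (sequence analysis, GO database information, structural information), so the remaining labels, the group names, and the submissions below are hypothetical placeholders rather than CASP7 data.

```python
from collections import Counter

# Hypothetical labels for the six approaches; only the first three are
# named in the text, the rest are placeholders.
APPROACHES = ("sequence", "go_database", "structure",
              "approach_4", "approach_5", "approach_6")


def encode(methods_used):
    """Return a 6-element tuple: 1 if the approach was used, else 0."""
    return tuple(int(name in methods_used) for name in APPROACHES)


# Made-up submissions, one method set per group (not CASP7 data).
submissions = {
    "group_A": {"sequence", "go_database"},
    "group_B": {"sequence", "go_database"},
    "group_C": {"sequence", "structure"},
}

# Frequency of each distinct vector across groups.
vector_counts = Counter(encode(m) for m in submissions.values())
most_common_vector, frequency = vector_counts.most_common(1)[0]
```

Counting identical tuples is all that is needed to obtain a vector frequency summary of the kind the text attributes to Figure 1(a).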

Predictors made use of a wide range of methods

reflecting an increasing activity in the area of function

annotation that goes beyond standard annotation pipe-

lines based on simple transference from homologous sequen-

ces. Indeed, some groups employed as many methods

as possible in their predictions. For example FN408

(KIHARA_PFP) used a sequence-based GO term predic-

tor (PFP2), which makes annotations based on the fre-

quency of GO terms in PSI-BLAST13 searches, in con-

junction with information from GO, from domain, motif

and protein–protein interaction databases and literature.

Figure 1. Summary of methods used for function prediction: methods are encoded in binary vectors representing a group of method types. (a) Frequency of vector usage; note that each group sent the same vector in all predictions and that not all groups submitted method usage information. (b) Method preferences in each prediction category. Note that binding site bars (BS) are biased given that only one group predicted systematically and submitted method vectors for this category. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

Group FN510 (IUB-Info) made predictions with a support vector machine trained on residue-based, sequence

alignment, and literature derived features. FN687 (YU-

BA, probably the better of the two binding residue pre-

dictors) made their predictions after superimposing

CASP7 server models onto PDB structures14 with bound

ligands and selecting the bound ligand through nearest

neighbor clustering. Groups FN212/FN418 (HHPred)

based their GO term predictions on profile–profile align-

ments between hidden Markov model alignments gener-

ated around the target sequences and those alignments

generated around sequences from PDB structures and

functional domains. The GO terms were drawn from the

relative numbers of GO terms/descriptors found in the

searches. FN788 (LCBDavis) used short fragments of structure (“local descriptors”) and ROSETTA to generate a set of rules that could discriminate function.

Targets

There were 100 target proteins evaluated in the CASP7 function prediction set. Five extra targets were included in the function assessment category: targets T0343, T0352, and T0310, which were excluded from the structural assessment by the assessors,15 and targets T0344 and T0355, for which predictors made predictions but for which no structure was deposited with CASP. The final structure of these targets was not relevant for function prediction evaluation. Function prediction targets were not split into domains; predictors had to predict function for the whole of the given sequence.

The functional annotation used in the evaluation came

from various sources. The main source of reliable infor-

mation was Uniprot,16 from which we took GO terms

and EC numbers. Residues that bound ligands came

from the target structures that were deposited with the

organizers. In addition, we were able to add terms by

crosschecking GO terms with EC number and vice versa,

and by crosschecking GO terms with bound ligands.

Sixty-six targets had some sort of functional information associated with them in terms of EC numbers, GO terms, or bound ligands. These targets formed the pool of targets that we were able to use for this evaluation. It follows that there were 34 targets for which there was no usable functional information, and these targets formed no part of this evaluation.

Targets with bound ligands

Binding residues came directly from the structures in the PDB or those deposited with the CASP organizers. Binding site residues were defined as those residues in contact with biologically relevant ligands. Two atoms were considered to be in contact when they were within a distance of 0.5 Å plus the sum of their van der Waals radii. This definition was established and made public before the prediction season in the submission format on the CASP web pages. There were 63 targets with 112 candidate bound molecules.
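The contact criterion can be written down as a small sketch. This is not the assessors' code: the radii table below uses approximate Bondi-style van der Waals radii as an assumption (the text does not say which values were used), and the atom representation and function names are hypothetical.

```python
import math

# Approximate van der Waals radii in angstroms (assumed values; the radii
# actually used in the assessment are not given in the text).
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80, "ZN": 1.39}


def in_contact(atom_a, atom_b, tolerance=0.5):
    """Each atom is (element, (x, y, z)). True if the two atoms lie within
    `tolerance` angstroms plus the sum of their van der Waals radii."""
    (elem_a, xyz_a), (elem_b, xyz_b) = atom_a, atom_b
    cutoff = tolerance + VDW_RADII[elem_a] + VDW_RADII[elem_b]
    return math.dist(xyz_a, xyz_b) <= cutoff


def binding_residues(residue_atoms, ligand_atoms, tolerance=0.5):
    """residue_atoms maps residue number -> list of atoms. Returns the
    sorted residue numbers with at least one atom touching the ligand."""
    return sorted(
        number
        for number, atoms in residue_atoms.items()
        if any(in_contact(a, lig, tolerance)
               for a in atoms for lig in ligand_atoms)
    )


# Toy coordinates: residue 20 sits 2 angstroms from a zinc ion, residue 50
# sits 6 angstroms away.
toy_residues = {20: [("N", (2.0, 0.0, 0.0))], 50: [("C", (6.0, 0.0, 0.0))]}
toy_ligand = [("ZN", (0.0, 0.0, 0.0))]
contacts = binding_residues(toy_residues, toy_ligand)
```

Raising `tolerance` to 1.0 or 1.5 gives the wider residue sets needed for scores computed at larger distance cut-offs.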

While the contact criteria are straightforward, deter-

mining biological relevance is more difficult. Solvent

molecules were discarded a priori and the remaining

bound ions, metabolites, and coenzymes were assessed

using a combination of the database FireDB,17 LGA18

structural alignments, and literature information. If a tar-

get with a bound ligand had a structural homologue that

bound the same or an equivalent ligand at the same site

and the binding residues were conserved, the ligand was

considered to be biologically relevant. Conservation of

the binding residues was determined with the server fire-

star.19 An example is shown in Figure 2.

We were able to demonstrate biological importance for a total of 28 ligands bound to 21 targets (Table I). Of the ligands, 17 were ions (Mg, Zn, Ni, and Na), 7 were nucleotides (ADP, GTP) or derivatives (FAD, PLR, SAH, SAM), 3 were metabolites (PO4, oxalate), and 1 was coenzyme A. Ligand codes come from the components.cif file of the PDB.

Figure 2. Binding site biological relevance: (a) Target T0348 sequence is submitted to firestar. The output generates an alignment with PSI-BLAST, the reliability of alignments is assessed by SQUARE (Ref. 20; conservation is shown in blue) and template ligand binding sites are mapped on to the alignment. The first box represents the target aligned with itself and shows the zinc binding residues. The second and third boxes show the alignment of T0348 with the 1pft and 1p91A templates respectively. The three cysteines from the zinc-binding site are conserved while the glutamate is changed in both templates. (b) LGA structural alignment of T0348 and NMR model 1 from 1pft. The figure shows that side chains are oriented towards the zinc atom in the three conserved cysteines but also that glu 32 (T0348) and cys 30 (1pft) are close to the zinc ion. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

There were two cases where nonstandard ligands were

bound in biological binding sites. Target T0292 is

known to bind ATP, but the structure deposited with

CASP bound nucleotide analog 5-[(Z)-(5-chloro-2-oxo-

1,2-dihydro-3H-indol-3-ylidene)methyl]-N-(diethylamino)

ethyl]-2,4-dimethyl-1H-pyrrole-3-carboxamide (5Z5).

Meanwhile target T0339 is a pyridoxal phosphate (PLP) binding protein, but the structure contained a close derivative, (5-hydroxy 4,6-dimethylpyridin-3-YL)methyl dihydrogen phosphate (PLR). To assess the similarity between the natural ligand and the analog, we obtained Tanimoto coefficients from the Super Ligands database.21 PLP and PLR had a Tanimoto score of 88.57%, while ATP and 5Z5 had a score of 60.14%. It has been established that Tanimoto scores higher than 0.7 (70%) indicate that two molecules have high structural similarity, which in turn is a good indication of similar biological activity.22 For this reason 5Z5 was rejected, while PLR was included in the list of biologically relevant sites.
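For readers unfamiliar with the Tanimoto coefficient, the sketch below shows the usual set-based definition and the 0.7 cut-off applied to the two scores quoted above. The fingerprint representation is an assumption (the text does not say how the Super Ligands database encodes molecules), and the function names are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits:
    shared bits divided by the bits present in either fingerprint."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def is_similar(score, threshold=0.7):
    """Apply the 0.7 cut-off quoted in the text (strictly higher than)."""
    return score > threshold


# Scores reported in the text, expressed as fractions.
reported = {"PLP/PLR": 0.8857, "ATP/5Z5": 0.6014}
accepted = {pair: is_similar(score) for pair, score in reported.items()}
```

With these inputs PLR passes the threshold and 5Z5 does not, matching the decision described in the paragraph above.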

Binding site evaluation

We measured binding site prediction with simple cover-

age and accuracy scores. Coverage was the number of cor-

rectly predicted residues divided by the number of anno-

tated residues. Accuracy represented the number of

correctly predicted residues from the total number of pre-

dicted residues.

$$
C_{0.5} = 100 \times \frac{N_{0.5}}{T_{0.5}}, \qquad A_{0.5} = 100 \times \frac{N_{0.5}}{P_{0.5}}
$$

where $T_{0.5}$ is the number of residues in contact with the ligand under the 0.5 Å distance cut-off, $P_{0.5}$ is the number of predicted residues, and $N_{0.5}$ is the number of correctly predicted residues at the 0.5 Å distance cut-off.

One problem with simple measures such as the accuracy and coverage at 0.5 Å is that they do not take into account surrounding residues. A residue at 0.54 Å plus van der Waals might also play a role in the binding of the ligand and if it is predicted by one of the groups, it ought to receive some score. For that reason we also calculated scores similar to the GDT-TS structure prediction scores that take this into account. It is also possible to calculate ROC-like curves from the accuracy and coverage because these measures can also be obtained for a range of different distance cut-offs ($C_{1.0}$, $A_{1.0}$, $C_{1.5}$, $A_{1.5}$, etc).
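As a worked example of the coverage and accuracy definitions, the sketch below scores the T0289 prediction described in the Figure 3 caption (annotated zinc-binding residues 20, 23, and 115; residues 20 and 23 predicted). The function name and the handling of empty sets are assumptions made for illustration.

```python
def coverage_and_accuracy(annotated, predicted):
    """Return (coverage, accuracy) as percentages.

    coverage = 100 * N / T and accuracy = 100 * N / P, where N is the number
    of correctly predicted residues, T the number of annotated residues and
    P the number of predicted residues.
    """
    annotated, predicted = set(annotated), set(predicted)
    correct = len(annotated & predicted)
    coverage = 100.0 * correct / len(annotated) if annotated else 0.0
    accuracy = 100.0 * correct / len(predicted) if predicted else 0.0
    return coverage, accuracy


# T0289: annotated residues 20, 23, 115; predicted residues 20 and 23.
coverage, accuracy = coverage_and_accuracy({20, 23, 115}, {20, 23})
```

This reproduces the 67% coverage and 100% accuracy quoted for that prediction. Repeating the calculation with annotated sets built at wider distance cut-offs gives the pairs needed for a ROC-like curve.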

Table I. Binding Site Summary Table

Target | PDB | Ligand | Residue numbers | Residues (one-letter code) | Homologues by PSI-BLAST
T0284 | – | OXL-MG2 | 48, 49, 50, 88, 159, 212, 235 | GGSDRYH | 1oqfA, 1o5qC, 1f8iA, 1m1bA...
T0289 | 2gu2 | ZN | 20, 23, 115 | HEH | 2bcoB, 2i3cA, 1yw4A...
T0292 | 2cl1 | 5Z5 (a) | 12, 33, 34, 66, 84–88, 90, 146, 160 | IVKVMEYCEGFL | 1ol7A, 1q24A...
T0293 | 2h00 | SAH | 34, 36, 41, 69–71, 75–76, 91–93, 97, 119, 121–124, 143–145, 147, 186, 189 | LPRGTGIYTEVDCVQTLNPPFFR | 2h00A, 2b3tA, 1sq9A...
T0308 | 2h57 | GTP-MG | 10–16, 31, 34, 55–56, 59, 114, 115, 117, 118, 147, 149 | ANSGKTTITSNKGRDLDAI | 1hurA, 1mr3F, 1r8sA...
T0312 | 2h6l | ZN | 89, 91, 104 | HHH | 1xv2C
T0313 | 2h58 | ADP-MG | 7, 9, 10, 12, 86–92 | RRPTTGAGKTY | 1bq2, 1vfzA, 1gojA...
T0315 | 2gzx | NI-NI | 6, 8, 92, 128, 153, 204 | HHEHHD | 1yixA, 1xwyA... (b)
T0316 | 2hma | SAM-MG | 12–14, 16, 18, 19, 36–38, 100, 104, 108, 126–128, 152, 155 | GMSGDSIFMDNKTGHNF | 1qpmA, 1xnqB...
T0318 | 2hb6 | ZN-ZN | 252, 257, 275, 334, 336 | KDDDE | 2ewbA, 1qytF...
T0319 | 2j6a | ZN | 11, 16, 112, 115 | CCCC | 2hf1A, 1dx8A
T0320 | – | FAD | 59, 61, 66, 106, 107, 144, 148, 161, 163–165, 181, 182, 185, 188, 190, 300 | SNCFIMFIIGITDTFRR |
T0324 | 2hdo | PO4 | 9–11, 104, 105, 137 | DIDTSK | 2gfhA
T0329 | 2hl0 | NA | 9, 11, 189 | DDD | 2gfhA
T0330 | 2hcf | MG | 9, 11, 177 | DDD | 2fdrA, 1lvhA...
T0332 | 2ha8 | SAH | 87–89, 110–112, 115, 129, 130, 132, 137–139, 141, 144 | VENGNEGEINRSLVS | 1v2xA, 1j85A...
T0339 | 2hdy | PLR (a) | 71, 72, 75, 117, 119, 166, 205, 207, 208, 228, 230, 231, 267, 268 | GTNHSMDAQVHKGT | 1p3wA, 1ecxB, 1t3iA...
T0341 | 2h04 | PO4-MG | 13, 15, 46, 47, 179, 204 | DNTNKD | 1yv9A, 2c4nA
T0348 | 2hf1 | ZN | 11, 14, 29, 32 | CCCD | 1pft, 1p91A, 1p9wA...
T0369 | 2hkv | NI | 48, 123, 127 | HHH | 1rxqA, 2f22A (c)
T0371 | 2hx1 | MG | 19, 21, 232 | DFD | 1zjjB, 1ydfA, 2c4n...
T0372 | 2hqy | COA | 175, 190, 246, 270, 272, 275 | WEYLIL |

Ligand codes come from the components.cif file from the PDB.15 Residue numbers are relative to CASP target sequence.
(a) Noncanonical ligands. Not used in evaluation.
(b) T0315 templates bind Zn more commonly than NI.
(c) Homologs found by structural similarity with LGA.

While we had the means to assess this category, only two groups (FN408 and FN687) submitted systematic predictions for binding residues, while a third group (FN337) sent predictions for five targets. As a result it

was not possible to carry out a real evaluation of predic-

tions. However, there were some interesting predictions,

such as those in Figure 3.

GO molecular function terms

We fixed a date for the evaluation of the GO terms as

GO annotations are changed with relative frequency in the

Uniprot database. Uniprot GO terms came from a version

of the Uniprot database frozen on February 15th 2007.

For the purposes of this evaluation, we only considered

the GO molecular function category. Root terms such as

‘‘binding’’ or ‘‘catalytic activity’’ were not included in the

evaluation. Thirty-six of the 100 targets already had 46

GO molecular function terms associated to them in Uni-

prot. In addition, we were able to add GO terms to a total

of 34 targets from bound ligands (21 targets), from cross-

referencing the associated EC number (4 targets) and

from recent publications (6 targets). Since the close of the

prediction season a total of 19 targets have had modifica-

tions to their Uniprot-associated GO terms, with a total of

12 new terms and 9 terms removed from the pages.

The GO terms new to Uniprot since the close of the prediction season totaled 56, although, apart from the information from the bound ligands, just two of the terms from recent papers can be regarded as completely “new.”

Evaluation

Relations between terms in the Gene Ontology are of two types: “is a” and “part of.” The proportions of both types of linkage differ in the three main ontologies (biological process, cellular component, and molecular function). Linkage of the type “is a” leads to a hierarchy, while linkages of the type “part of” complicate matters by including other graph features. GO term predictions in CASP7 are restricted to molecular function, where practically all the relations are of the type “is a.” For that reason, we treated the ontology as a hierarchy of terms in the evaluation, with each of the terms being separated from the root term by a variable number of steps. In this evaluation “term depth” is set to the maximum number of steps needed to climb to the root.
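The term depth definition can be made concrete with a small sketch. The hierarchy below is a toy example with made-up term names rather than real GO identifiers, and the function name is hypothetical. One term has two parents to show why the maximum is needed.

```python
from functools import lru_cache

# Toy "is a" hierarchy mapping each term to its parents (hypothetical terms).
PARENTS = {
    "root": (),
    "binding": ("root",),
    "catalytic": ("root",),
    "ion_binding": ("binding",),
    "zinc_binding": ("ion_binding",),
    "metalloenzyme": ("catalytic", "zinc_binding"),
}


@lru_cache(maxsize=None)
def term_depth(term):
    """Maximum number of steps needed to climb from `term` to the root."""
    parents = PARENTS[term]
    if not parents:
        return 0
    return 1 + max(term_depth(parent) for parent in parents)


depths = {term: term_depth(term) for term in PARENTS}
```

Here `metalloenzyme` reaches the root in two steps via `catalytic` but in four via `zinc_binding`, so its depth under this definition is four.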

Over-prediction is not implicitly penalized. This is

because proteins can have many GO terms and most tar-

gets are annotated with just a few. Therefore, we have to

assume that some terms remain to be elucidated. How-

ever, a number of groups entered redundant predictions

for some targets and we did filter the submitted predic-

tions before calculating the scores.

Each annotated term was compared directly with

the most similar predicted term in the target predic-

tions. This pairing between annotated term and the

most similar predicted term is referred to as the

‘‘computable pair.’’ Common ancestor depth is the

depth of the first common parent and measures sim-

ilarity between two terms. Common ancestor depth

was calculated for all computable pairs in a target.

The prediction score for each target prediction was

obtained by summing the common ancestor depths

of all computable pairs. The final score was normal-

ized by dividing by the maximum possible score for a

given target (the sum of the annotated term depths):

$$
\text{GO Score} = \frac{\text{sum of common ancestor depths of computable pairs}}{\text{sum of the annotated term depths}}
$$

Scores range between 0 and 1.

Figure 3. Binding site prediction: (a) Target T0289 binds zinc at residues 20 (his), 23 (glu), and 115 (his). Group FN337 was able to predict 20 and 23 (green) while 115 (blue) was missed out, resulting in an accuracy of 100%, but coverage of 67%. (b) Best prediction for T0332 target binding S-Adenosyl Homocysteine (sticks) from FN687. Correctly predicted residues in green, missed in blue and misplaced in red dots. Scores were 53% in coverage and 80% in accuracy.

Two datasets were constructed from the total set of

terms, one containing all the annotations and the other

including only terms which were not in the databases

before the prediction deadlines. We computed the scores

as described above for both datasets.
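The GO score can be sketched as follows. This is one possible reading of the description, not the assessors' implementation: "most similar predicted term" is taken to mean the predicted term with the deepest common ancestor, a term counts as its own ancestor, and the toy hierarchy and all names are hypothetical.

```python
# Toy "is a" hierarchy mapping each term to its parents (hypothetical terms).
PARENTS = {
    "root": (),
    "binding": ("root",),
    "catalytic": ("root",),
    "ion_binding": ("binding",),
    "zinc_binding": ("ion_binding",),
    "nickel_binding": ("ion_binding",),
    "kinase": ("catalytic",),
}


def depth(term):
    """Maximum number of steps from `term` up to the root."""
    parents = PARENTS[term]
    return 0 if not parents else 1 + max(depth(p) for p in parents)


def ancestors(term):
    """The term itself plus every term reachable by climbing parents."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def common_ancestor_depth(term_a, term_b):
    """Depth of the deepest term shared by both ancestor sets."""
    return max(depth(t) for t in ancestors(term_a) & ancestors(term_b))


def go_score(annotated, predicted):
    """Sum of best common ancestor depths over annotated terms, divided by
    the sum of annotated term depths (the maximum possible score)."""
    max_score = sum(depth(t) for t in annotated)
    if not predicted or max_score == 0:
        return 0.0
    total = sum(
        max(common_ancestor_depth(a, p) for p in predicted)
        for a in annotated
    )
    return total / max_score


# Annotated depths are 3 and 2; the single prediction shares "ion_binding"
# (depth 2) with the first term and only the root with the second.
example = go_score(["zinc_binding", "kinase"], ["nickel_binding"])
```

Because each annotated term is paired only with its best match, extra predicted terms can never lower the score, which is consistent with over-prediction not being penalized.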

GO consensus predictions

As a means of comparison with the predictors we gen-

erated ‘‘consensus’’ predictions. Here we wanted to assess

whether a consensus prediction extracted from the par-

ticipants would improve predictions as it does in other

prediction fields. The previous CASP assessors published

a retrospective analysis of the CASP6 function predic-

tions23 where they concluded that function prediction

could be useful for function annotation, as long as a

large number of groups participate in the predictions.

For the GO terms, we generated three consensus sets. One included all the terms predicted by any two or more predictors. This set had an average of 7.3 terms per target. However, the average number of filtered terms per target predicted by the prediction groups was much lower, 3.1 terms per target. For this reason, we generated a second consensus set where predicted terms were ranked and then selected so that the final average of terms per target in the consensus was similar to 3.1.

In addition, we wanted to generate a consensus prediction that was nonredundant by method in the style of Pellegrini-Calace et al.23 in order to see if it was possible to reproduce their results. To do this, we grouped all the predictions by the method used to make the prediction (Fig. 1) and included all the terms predicted by any two or more of the method clusters. The average of terms per target in this nonredundant consensus set was 3.45. Predictions from several groups were left out of the nonredundant consensus because these groups did not include a method type.
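The consensus sets can be approximated with a simple voting sketch. Several details are assumptions: the text does not say how terms were ranked for the second set, so ranking by vote count with an optional cap (`max_terms`) is used here, and the group names, method labels, and terms are made up.

```python
from collections import Counter


def consensus_terms(predictions, min_votes=2, max_terms=None):
    """predictions maps voter -> iterable of terms for one target.
    Returns terms with at least `min_votes` votes, most voted first."""
    votes = Counter(
        term for terms in predictions.values() for term in set(terms)
    )
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    kept = [term for term, count in ranked if count >= min_votes]
    return kept if max_terms is None else kept[:max_terms]


def pool_by_method(predictions, method_of):
    """Merge groups sharing a method vector into a single voter. Groups
    with no method information are left out, as in the text."""
    pooled = {}
    for group, terms in predictions.items():
        if group in method_of:
            pooled.setdefault(method_of[group], set()).update(terms)
    return pooled


# Made-up predictions for a single target.
predictions = {
    "g1": ["term_a", "term_b"],
    "g2": ["term_a", "term_c"],
    "g3": ["term_a", "term_b"],
}
method_of = {"g1": "m1", "g3": "m1", "g2": "m2"}

by_group = consensus_terms(predictions)
by_method = consensus_terms(pool_by_method(predictions, method_of))
```

In this toy case `term_b` survives the group-level consensus but not the method-level one, because both of its votes come from groups that share a method.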

Results for GO term category

Results are not identical to the results presented at the

Asilomar CASP meeting in November 2006 because of

the large proportion of terms that have been added or

subtracted from the Uniprot annotation since the CASP

experiment ended.

Figure 4. GO term prediction evaluation: average scores of predictions. The blue bar shows the 3.1 term consensus predictions, the green bar shows the nonredundant consensus selections and the red bars show the results of the predictors. Only groups submitting predictions for more than 10 targets are included. Two datasets of terms are evaluated: (a) all terms found for a common subset of 39 targets and (b) terms not found in Uniprot before prediction deadlines (a common subset of 20 targets). [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

Figure 5. Head-to-head comparisons of GO terms prediction: in orange the number of wins for each group from head-to-head comparisons of GO term predictions made for common targets. Statistically significant wins are in yellow. The significance of the wins in head-to-head comparisons is calculated from standard paired t-tests and over a minimum of five common targets (P-value for significance = 0.05). [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

For the complete dataset, groups FN408 and FN087 made the highest scoring predictions [Fig. 4(a)]. We also computed results for the newly annotated GO terms,

where group FN408 and FN788 stood out [Fig. 4(b)].

In general, the three consensus predictions are better than the predictors: the 7.3 GO term average set is always better than the predictors, while the 3.1 GO term consensus set and the nonredundant consensus set are better than all but the best predictor (FN408) over both groups of GO terms.

Because few groups predicted all targets in each sec-

tion, all comparisons between groups were head-to-head

over common subsets of predicted targets (Fig. 5). We

also calculated how significant the differences between

groups were. As can be seen from the results, group

FN408 performs better than all other groups over this set

of targets and GO terms. In addition, they are signifi-

cantly better than all but three groups FN549 (where the

comparison was made over 5 targets), FN753 (just 16

targets and the server of group FN408) and FN510.

The same calculations were made for the subset of

new GO terms and here again group FN408 had better

scores than every other group when compared head-to-

head, although over these targets the scores for group

FN510 were very similar and this time the differences

between the top two groups and groups FN212, FN418,

and FN490 were not significant. In part this is because

the comparisons were made over a smaller, more biased

set of GO terms.
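A head-to-head comparison over common targets can be sketched as below. The sketch computes per-target wins and the paired t statistic using only the standard library; converting the statistic to a P-value needs the t distribution (available, for example, through `scipy.stats.ttest_rel`), which is left out here. The scores and target names are invented for illustration, and the minimum of five common targets follows the Figure 5 caption.

```python
import math
import statistics


def head_to_head(scores_a, scores_b, min_common=5):
    """scores_* map target -> score for one group. Returns None when the
    groups share fewer than `min_common` targets."""
    common = sorted(set(scores_a) & set(scores_b))
    if len(common) < min_common:
        return None
    diffs = [scores_a[t] - scores_b[t] for t in common]
    spread = statistics.stdev(diffs)
    # Paired t statistic: mean difference over its standard error.
    t_stat = None
    if spread > 0:
        t_stat = statistics.mean(diffs) / (spread / math.sqrt(len(diffs)))
    return {
        "n": len(common),
        "wins_a": sum(d > 0 for d in diffs),
        "wins_b": sum(d < 0 for d in diffs),
        "t": t_stat,
        "df": len(common) - 1,
    }


# Invented scores; T6 is ignored because only one group predicted it.
group_a = {"T1": 0.8, "T2": 0.6, "T3": 0.9, "T4": 0.7, "T5": 0.5}
group_b = {"T1": 0.6, "T2": 0.5, "T3": 0.9, "T4": 0.4, "T5": 0.4, "T6": 0.9}
result = head_to_head(group_a, group_b)
```

Restricting each comparison to shared targets is what allows groups that predicted different subsets to be compared at all, at the cost of each pairwise test resting on a different and sometimes very small sample.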

EC numbers

We recorded 35 targets with EC numbers in total: 14 had an experimentally verified EC number assigned in Uniprot and 21 had Uniprot EC numbers that were putative or assigned by sequence similarity. Many of these were assigned by working backwards from GO terms assigned by electronic annotation, and we found two experimentally verified EC numbers in the literature.

All but 4 of the targets with EC numbers in Uniprot

have experimental confirmatory evidence. No targets

have had modifications to their Uniprot-associated EC

number since the close of the prediction season.

EC annotations were not classified into subgroups,

because we felt that these subgroups would contain too

few targets to evaluate properly.

Evaluation details

First, predictions were filtered as in the GO term evaluation by eliminating predecessor codes when children appeared in the same predictions. Each annotated term was then compared directly with the most similar predicted term in the target predictions. EC numbers are a hierarchy of four levels (for example target T0375, 2.7.1.3), and comparison between predicted and annotated terms is relatively simple.

Lower levels are only computed if the higher levels are

identical, with the exception of target T0384. In general,

each level exists only to differentiate one enzymatic func-

tion from the others in the same level. However, there

are cases where the numbering of a lower level has signif-

icance irrespective of the numbering of the preceding

level. For example, the third number in all oxidoreduc-

tases (EC:1.0.0.0) describes the acceptor molecule inde-

pendent of the numbering of the second level number

(which relates to the donor group). Target T0384 was

defined as EC:1.0.1.0 by similarity. The second number

(the donor group) is missing but the acceptor group is probably NAD+ or NADP+, so the third number is defined.

Unlike the GO terms, where proteins may have more

than one term, there is generally just a single EC number

per protein so it is important to take over-prediction

into account. Here, over-prediction was penalized

because the scores for each prediction were divided by

the number of predicted EC numbers predicted by each

group for each target.

$$
\text{EC Score} = \frac{\text{sum of computable pair scores}}{\text{maximum possible score}}
$$
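The text does not spell out how a single pair of EC numbers is scored, so the sketch below makes explicit assumptions: a pair scores one point per matching level counted from the first level, stopping at the first mismatch; the maximum possible score is the number of defined levels in the annotation; and the over-prediction penalty divides by the number of EC numbers predicted for the target. The special handling of target T0384 is not reproduced, and all function names are hypothetical.

```python
def parse_ec(code):
    """'2.7.1.3' -> ('2', '7', '1', '3'). Parsing stops at the first
    undefined level, written here as '-'."""
    levels = []
    for part in code.split("."):
        if part == "-":
            break
        levels.append(part)
    return tuple(levels)


def matching_levels(annotated, predicted):
    """Number of leading levels on which the two EC numbers agree."""
    count = 0
    for level_a, level_p in zip(parse_ec(annotated), parse_ec(predicted)):
        if level_a != level_p:
            break
        count += 1
    return count


def ec_score(annotated, predictions):
    """Best pair score over the maximum, divided by the number of
    predictions as an over-prediction penalty (assumed form)."""
    max_score = len(parse_ec(annotated))
    if not predictions or max_score == 0:
        return 0.0
    best = max(matching_levels(annotated, p) for p in predictions)
    return best / max_score / len(predictions)


# T0375 is annotated as 2.7.1.3 in the text; the predictions are invented.
exact = ec_score("2.7.1.3", ["2.7.1.3"])
near_miss = ec_score("2.7.1.3", ["2.7.1.1"])
over_predicted = ec_score("2.7.1.3", ["2.7.1.3", "3.1.1.1"])
```

Under these assumptions a correct prediction accompanied by one unrelated EC number scores half as much as the correct prediction alone.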

A consensus prediction was also generated for EC numbers. Here we selected the EC number predicted by most participants; sometimes two or three EC numbers were predicted equally often and hence were all included in the consensus.
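The EC consensus rule amounts to a majority vote that keeps ties. A minimal sketch, assuming one EC number per group per target and using invented group names:

```python
from collections import Counter


def ec_consensus(predictions):
    """predictions maps group -> EC number for one target. Returns every
    EC number tied for the highest number of votes, sorted."""
    votes = Counter(predictions.values())
    if not votes:
        return []
    top = max(votes.values())
    return sorted(ec for ec, count in votes.items() if count == top)


clear_winner = ec_consensus({"g1": "2.7.1.3", "g2": "2.7.1.3", "g3": "3.1.1.1"})
tied = ec_consensus({"g1": "2.7.1.3", "g2": "3.1.1.1"})
```

Keeping all tied EC numbers means the consensus itself can contain more than one EC number for a target.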

EC results

Fewer predictors made systematic EC number predic-

tions and there were fewer new annotations. Groups

scored well, some with EC scores of over 0.80. However,

it should be pointed out that none of the EC numbers

can be considered as ‘‘new.’’ Groups FN087, FN677,

FN490, and FN408 had the best scores, but FN408 only

predicted 21 of 35 targets (Table II). The consensus prediction scored higher than any single group in this category. When the groups were compared head-to-head over common subsets of targets, FN087 and FN677 stood out, but were not significantly better than groups FN408 or FN490.

Table II. EC Predictions

Group | Score
Consensus | 0.93
FN087 | 0.90
FN490 | 0.84
FN677 | 0.84

Average scores of predictions for the top three groups over a common subset of 29 targets. Results for the EC consensus prediction are also shown as a comparison. Maximum possible score is set to one. Head-to-head comparisons between groups show that the three groups and the consensus were not significantly different.

CONCLUSIONS

CASP is an experiment that evaluates the state of structure prediction. The assessment is based on known structures that can be hidden from the predictors, thus making predictions blind. The same cannot be done with the function prediction category. Assessment is hampered by the lack of new functional information. With the exception of bound ligands, the assessors have no more functional information at the end of the experiment than was available to the predictors during the experiment. Six months after the first evaluation there are only a few targets with newly annotated GO terms and no newly annotated EC numbers at all. Previous experience from CASP6 (Ref. 23) has shown that even a full year is barely enough to gather additional experimental information.

The majority of the GO functional terms that we were able to add to the existing annotations are related to ligand binding, since 21 target structures clearly bound a biologically relevant ligand. As it is difficult to draw conclusions based on a small and biased set of functional annotation data, we had to include functional annotations that were already known (present in UniProt) in the evaluation of GO terms and EC numbers.

In the GO term category there were more functional annotations that came from known (and therefore not blind) sources than new functional annotations. This was even more of a problem in the EC number category. What this means is that we are evaluating the all-round predictive ability of the prediction methods: not only how well they can predict unknown function for proteins without functional annotation, but also how well they can reproduce the functional annotations that are already known for the target. Nevertheless, the new functional annotations form a substantial and growing part of the evaluation and the proportion of newly annotated terms will continue to grow. There have even been a number of new functional annotations since the November meetings that have allowed us to monitor trends.

The fact that only 22 groups submitted predictions was another surprise. This is an important and growing field, so it is unfortunate that few groups were prepared to predict and that groups that have published methods in this area did not participate in the experiment. It is almost certainly true that the slow release of functional information that hampers the assessment was also the cause of this low turnout. However, binding site predictions were something that could have readily been evaluated, so we were surprised that only two groups made consistent predictions for this category.

The only category for which it was possible to draw any sort of conclusions about the efficacy of the methods used was the GO molecular function category. This is because there were far too few predictors in the binding residue prediction category and too few new terms in the EC number prediction category. Hence, the results from the EC number prediction section will be biased towards methods that are able to use easily available, nonblind functional information.

In the GO molecular function prediction category, it was possible to evaluate the predictions of 14 groups over 57 targets. We can tentatively conclude that at present group FN408 stands out as an all-round prediction method. It was significantly better than all but three groups when evaluated in head-to-head comparisons over the set of all molecular function terms. Of those three groups, one made predictions for just five targets and another was the prediction server of the same group. These conclusions are conditional because we know that GO molecular function terms are subject to constant revision and will be changed and added to year on year. Group FN510 also stood out in the subset of newer GO terms, although this group was less good at "predicting" known GO terms.

The consensus predictions did not turn out to be as good a predictor of function as we had hoped. This was possibly because there were so few predictors. However, one thing was clear: as the GO terms were updated and the proportion of new terms within the subset of reliable GO terms increased, the scores for the consensus predictions decreased relative to the best predictors, suggesting that the average predictor was less good at predicting the more difficult GO terms.

There are a number of difficulties in running a function prediction assessment in CASP, but nevertheless the general opinion during the CASP meeting was that the function prediction category is important and should be maintained. Whether it should continue under exactly the same rules as the other categories is another question. At present the main problem for an experiment like CASP is that it may take several years for functional annotations to become known. While this does mean that the evaluation can be revisited some time in the future, it is not ideal for a rapidly developing field where predictors need to make use of the results and the evaluation in order to refine their methods.

One other possibility is to select a small number of targets that can be kept hidden from the predictors. Inevitably this would mean that there would be very few targets to evaluate and hence any meaningful evaluation would have to run over a longer time period than a CASP experiment. This would entail a rolling evaluation in the style of the CAPRI docking experiment24 that could be run in conjunction with experimental groups.

There are many other function prediction experiments such as AFP (http://biofunctionprediction.org/), BioCreative (http://biocreative.sourceforge.net), GeneFun (http://www.genefun.org), and MouseFunc (http://www.mousefunc.org)25–28 that exploit a range of predictive methods to make functional annotations and it might be fruitful to bring some of these groups into the CASP function prediction process. Some of these experiments have been able to persuade databases or experimental groups to provide annotations prior to their release; for example, the sets of annotated GO terms in BioCreative I and the protein interaction data in BioCreative II were both provided prior to release.

One category in this evaluation would certainly be suitable for a CASP-style experiment. Since target structures are submitted to the CASP organizers complete with bound ligands, it should be possible to evaluate predictions for ligand-binding residues. Two subcategories of binding site prediction were proposed in the CASP7 meeting. In the first, predictors would not know whether a ligand was bound to a target and would be expected to make automatic predictions of binding residues. In the second, the predictors would know in advance that a certain ligand would bind to the structure. The second part of the experiment would apply to fewer targets (20% if this CASP was a representative example), so more detailed methods could be used and human predictors would be able to spend more time on the predictions. It is certainly possible to predict binding sites for even the hardest of remote homologues, as can be seen in Figure 6.

However, for this experiment to function properly, CASP would almost certainly need to encourage function prediction groups that do not normally participate in CASP experiments to submit predictions.

ACKNOWLEDGMENTS

Thanks to all those at the Prediction Center, especially Andriy Kryshtafovych. Thanks to Angela del Pozo for invaluable help in handling GO terms.

REFERENCES

1. Friedberg I, Harder T, Godzik A. JAFA: a protein function annotation meta-server. Nucleic Acids Res 2006;34:W379–W381.

2. Hawkins T, Luban S, Kihara D. Enhanced automated function prediction using distantly related sequences and contextual association by PFP. Protein Sci 2006;15:1550–1556.

Figure 6. A difficult (but perfectly possible) prediction: (a) Representation of the nickel-binding site in PDB structure 1rxqA. The nickel is bound to histidines 64, 157, and 161. (b) Target T0369 binds nickel on histidines 48, 123, and 127. (c) 3D-jury alignment for T0369 and 1rxq_A passed through firestar. The three binding histidines are conserved in both template and target. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]


3. Laskowski RA, Watson JD, Thornton JM. ProFunc: a server for predicting protein function from 3D structure. Nucleic Acids Res 2005;33:W89–W93.

4. Pazos F, Bang J-W. Computational prediction of functionally important regions in proteins. Curr Bioinform 2006;1:15–23.

5. Devos D, Valencia A. Practical limits of function prediction. Proteins 2000;41:98–107.

6. Todd AE, Orengo CA, Thornton JM. Evolution of function in protein superfamilies, from a structural perspective. J Mol Biol 2001;307:1113–1143.

7. Soro S, Tramontano A. The prediction of protein function at CASP6. Proteins 2005;61:201–213.

8. Kopp J, Schwede T. The SWISS-MODEL Repository of annotated three-dimensional protein structure homology models. Nucleic Acids Res 2004;32:D230–D234.

9. Castrignano T, De Meo PD, Cozzetto D, Talamo IG, Tramontano A. The PMDB protein model database. Nucleic Acids Res 2006;34:D306–D309.

10. Pieper U, Eswar N, Davis FP, Braberg H, Madhusudhan MS, Rossi A, Marti-Renom M, Karchin R, Webb BM, Eramian D, Shen MY, Kelly L, Melo F, Sali A. MODBASE: a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res 2006;34:D291–D295.

11. IUBMB. Enzyme Nomenclature. New York: Academic Press; 1992.

12. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. Gene ontology: tool for the unification of biology. The gene ontology consortium. Nat Genet 2000;25:25–29.

13. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997;25:3389–3402.

14. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The protein data bank. Nucleic Acids Res 2000;28:235–242.

15. Clarke ND, Ezkurdia I, Kopp J, Read R, Schwede T, Tress ML. Domain definition and target classification for CASP7. Proteins, this issue.

16. Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Natale DA, O’Donovan C, Redaschi N, Yeh LS. The universal protein resource (UniProt). Nucleic Acids Res 2005;33:D154–D159.

17. Lopez G, Valencia A, Tress ML. FireDB: a database of functionally important residues from proteins of known structure. Nucleic Acids Res 2007;35:D219–D223.

18. Zemla A. LGA: a method for finding 3D similarities in protein structures. Nucleic Acids Res 2003;31:3370–3374.

19. Lopez G, Valencia A, Tress ML. firestar: prediction of functionally important residues using structural templates and alignment reliability. Nucleic Acids Res 2007;35: Web server issue, in press.

20. Tress ML, Grana O, Valencia A. SQUARE: determining reliable regions in sequence alignments. Bioinformatics 2004;20:974–975.

21. Michalsky E, Dunkel M, Goede A, Preissner R. SuperLigands: a database of ligand structures derived from the protein data bank. BMC Bioinform 2005;6:122.

22. Mitchell JBO. The relationship between the sequence identities of alpha helical proteins in the PDB and the molecular similarities of their ligands. J Chem Inf Comput Sci 2001;41:1617–1622.

23. Pellegrini-Calace M, Soro S, Tramontano A. Revisiting the prediction of protein function at CASP6. FEBS J 2006;273:2977–2983.

24. Janin J, Henrick K, Moult J, Ten Eyck L, Sternberg MJE, Vajda S, Vakser I, Wodak SJ. CAPRI: a critical assessment of predicted interactions. Proteins 2003;52:2–9.

25. Friedberg I, Jambon M, Godzik A. New avenues in protein function prediction. Protein Sci 2006;15:1527–1529.

26. Hirschman L, Yeh A, Blaschke C, Valencia A. Overview of BioCreAtIvE: critical assessment of information extraction for biology. BMC Bioinform 2005;6:S1.

27. In-silico prediction of gene function. Ref. U.E.: LSHG-CT-2004-503567.

28. Ginalski K, Elofsson A, Fischer D, Rychlewski L. 3D-Jury: a simple approach to improve protein structure predictions. Bioinformatics 2003;19:1015–1018.
