BMMB597E Protein Evolution Protein classification 1.

Posted on 16-Jan-2016




BMMB597E Protein Evolution

Protein classification

Protein families

• The first protein structures determined by X-ray crystallography, myoglobin and haemoglobin, were solved (in 1959–60) before their amino acid sequences were determined

• It came as a surprise that the structures were quite similar

• Soon it became clear, on the basis of both sequences and structures, that there were families of proteins

(Figure: structures of myoglobin and haemoglobin)

50 years earlier, there were some hints …

• E.T. Reichert & A.P. Brown. The differentiation and specificity of corresponding proteins and other vital substances in relation to biological classification and organic evolution: the crystallography of hemoglobins. (Carnegie Institution of Washington, 1909)

• Crystallography three years before the discovery of X-ray diffraction?

Reichert and Brown studied interfacial angles in haemoglobin crystals

• Steno’s law (1669): different crystals of the same substance may have different sizes and shapes, but the angles between corresponding faces are constant for each substance

• They found that the angles differed from species to species

• Similarities in the values of interfacial angles were consistent with the classical taxonomic tree

• They even found differences between oxy- and deoxyhaemoglobin

Most premature scientific result ever?

• These results implied:
  – That proteins adopted (or at least could adopt) unique structures, to form a crystal
  – That protein structures varied between species
  – That this variation paralleled the evolution of the species
  – That proteins could change structure as a result of changes in state of ligation
• In 1909!

M.O. Dayhoff

• Pioneer of bioinformatics
• Collected protein sequences
• First curated ‘database’
• Recognized that proteins form families, on the basis of amino acid sequences
• Computational sequence alignments
• First evolutionary tree
• First amino-acid substitution matrix (PAM; later largely replaced by BLOSUM)
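Dayhoff's substitution matrix and its successor BLOSUM both tabulate log-odds scores. Here is a minimal sketch of how one entry is derived: s(a, b) = log(q_ab / (p_a · p_b)). In this formula, q_ab is how often residues a and b are aligned in trusted alignments, and p_a, p_b are their background frequencies. The frequencies below are made-up toy values and do not come from any published matrix.

```python
import math


def log_odds_score(q_ab: float, p_a: float, p_b: float, base: float = 2.0) -> float:
    """Log-odds score for aligning residue a with residue b.

    q_ab     -- observed frequency of the pair (a, b) in trusted alignments
    p_a, p_b -- background frequencies of a and b
    A positive score means the pair is seen more often than chance predicts;
    a negative score means it is seen less often.
    """
    return math.log(q_ab / (p_a * p_b), base)


# Toy frequencies (hypothetical, for illustration only).
print(log_odds_score(0.02, 0.10, 0.10))   # pair twice as common as chance
print(log_odds_score(0.001, 0.10, 0.04))  # pair four times rarer than chance
```

With base 2 the scores are in bits. Published matrices additionally scale and round them to integers.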

Can relationships among proteins be extended beyond families?

• Families = sets of proteins with such obvious similarities that we assume that they are related

• One question: how much similarity do we need to believe in a relationship?

• How far can evolution go?
• Convergent evolution?
• Cautionary tale: chymotrypsin / subtilisin

Chymotrypsin-subtilisin

• Both are proteolytic enzymes
  – Chymotrypsin: mammalian
  – Subtilisin: from B. subtilis
• Both have catalytic triads
• Same function, same mechanism
• Sequences 12% similar (near noise level)

• However, structures show them to be unrelated
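A figure such as "12% similar" depends on how it is counted. "Similarity" may also credit conservative substitutions, and the denominator varies between conventions. The minimal sketch below computes the stricter percent identity from a pairwise alignment. It assumes gaps are written as '-' and divides by the number of gap-free columns; both choices are assumptions of this sketch. The toy fragments are made up and are not real protease sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two sequences taken from a pairwise alignment.

    '-' marks a gap. Identical residues are counted over the columns where
    neither sequence has a gap; other conventions divide by the alignment
    length or by the length of the shorter sequence instead.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


# Toy aligned fragments (made up for illustration).
print(percent_identity("AC-GT", "ACTGA"))
```

Because gapped alignments of unrelated sequences still match some residues by chance, low values like the one quoted for chymotrypsin and subtilisin are hard to distinguish from noise.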

Chymotrypsin / Subtilisin

Catalytic triad in serine proteinases

Chymotrypsin and subtilisin have similar catalytic triads

How can we classify proteins that belong to families?

• Align sequences
• Calculate a phylogenetic tree (there are various ways to do this; they depend on the sequence alignment)
• Usually, the phylogenetic tree of homologous proteins from different species follows the phylogenetic tree based on classical taxonomy
• That is reassuring
• But what happens as divergence proceeds?
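One of the "various ways" to build a tree from an alignment is to convert it into pairwise distances and then cluster them. Below is a minimal sketch of UPGMA (average-linkage clustering), chosen here only because it is short. It assumes a roughly constant rate of change along all lineages, and methods used in practice (neighbour joining, maximum likelihood) relax that assumption. The distances in the example are hypothetical.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a rooted tree by UPGMA (average-linkage clustering).

    names -- leaf labels, e.g. protein or species names
    dist  -- dict mapping frozenset({a, b}) to the distance between leaves a, b
    Returns the tree as nested tuples; branch lengths are omitted for brevity.
    """
    size = {n: 1 for n in names}  # current cluster -> number of leaves in it
    d = dict(dist)                # distances between current clusters
    while len(size) > 1:
        # Merge the closest pair of clusters.
        a, b = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        for c in size:
            if c != a and c != b:
                # Distance to the new cluster is the size-weighted average.
                d[frozenset((merged, c))] = (
                    d[frozenset((a, c))] * size[a] + d[frozenset((b, c))] * size[b]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))


# Hypothetical distances, e.g. derived from (1 - fraction identity) of an alignment.
pairs = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
         ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10}
dist = {frozenset(p): v for p, v in pairs.items()}
print(upgma(["A", "B", "C", "D"], dist))
```

Comparing such a tree with the species tree from classical taxonomy is the check the slide describes.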

How can we classify proteins that do not obviously belong to families?

• Base this on structure rather than sequence
• Structural similarities are maintained better than sequence similarities as divergence proceeds
• For closely related proteins, expect no difference between sequence-based and structure-based classification
• How far can classification be extended?
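Structural similarity needs a number, just as sequence similarity does. A common one is the root-mean-square deviation (RMSD) of corresponding atoms (often Cα) after optimal superposition. The minimal sketch below uses the Kabsch algorithm and NumPy. It assumes the residue correspondence is already known, meaning row i of one array matches row i of the other. Tools such as DALI use other scores and must also find the correspondence themselves, and this sketch does neither. The 5-point "structure" is made up.

```python
import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """RMSD between two equal-length coordinate sets after optimal superposition.

    P, Q -- (N, 3) arrays; row i of P is assumed to correspond to row i of Q
            (e.g. equivalent C-alpha atoms of two aligned proteins).
    Kabsch algorithm: remove translation by centring both sets, then take the
    best rotation from the SVD of the 3x3 covariance matrix.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                                # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # -1 would mean a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T    # optimal proper rotation
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# A made-up 5-point "structure" and a copy rotated 40 degrees about z and shifted.
P = np.array([[0, 0, 0], [1.5, 0, 0], [1.5, 1.5, 0], [0, 1.5, 1.5], [1, 2, 3]])
t = np.radians(40)
Rz = np.array([[np.cos(t), -np.sin(t), 0], [np.sin(t), np.cos(t), 0], [0, 0, 1]])
Q = P @ Rz.T + [4.0, -1.0, 2.0]
print(kabsch_rmsd(P, Q))  # close to 0: same shape, different position
```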

SCOP Structural Classification of Proteins

• Idea of A.G. Murzin, based on old work by C. Chothia and M. Levitt

• Even if two proteins are not obviously homologous, they may share structural features, to a greater or lesser degree.

• For instance, the secondary structures of some proteins are only α-helices

• Others have β-sheets but no α-helices

SCOP

• SCOP is a database that gives a hierarchical classification of all protein domains of known structure

• Recall that a domain is a compact subunit of a protein structure that ‘looks as if’ it would have independent stability

Fragment of fibronectin

Dissection of structure into domains

• It is not always so obvious how to divide a protein into domains

• There is some (not a lot of) room for argument
• Note that sometimes the chain passes back and forth between domains
• In these cases, one or both domains do not consist entirely of a consecutive set of residues
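Because the chain can pass back and forth between domains, a domain is naturally described as a list of residue segments rather than a single range. Below is a minimal sketch of that representation. The domain names and boundaries are hypothetical and are not lactoferrin's actual domain assignment.

```python
from typing import Dict, List, Optional, Tuple

Segment = Tuple[int, int]  # inclusive residue-number range (start, end)


def domain_of(residue: int, domains: Dict[str, List[Segment]]) -> Optional[str]:
    """Return the name of the domain containing this residue, or None."""
    for name, segments in domains.items():
        if any(start <= residue <= end for start, end in segments):
            return name
    return None


def is_contiguous(segments: List[Segment]) -> bool:
    """True if the (non-overlapping) segments form one consecutive run of residues."""
    ordered = sorted(segments)
    return all(nxt[0] == prev[1] + 1 for prev, nxt in zip(ordered, ordered[1:]))


# Hypothetical boundaries: the chain leaves domain A, builds domain B, then returns to A.
domains = {
    "A": [(1, 90), (250, 320)],
    "B": [(91, 249)],
}
print(domain_of(100, domains), domain_of(300, domains))
print({name: is_contiguous(segs) for name, segs in domains.items()})
```

Here domain B is one consecutive run of residues, while domain A is not.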

lactoferrin

SCOP, CATH and the DALI Database classify protein structures

• SCOP (Structural Classification of Proteins)
• CATH (Class, Architecture, Topology, Homologous superfamily)
• DALI Database
• These web sites have many useful features:
  – information-retrieval engines, including search by keyword or sequence
  – presentation of structure pictures
  – links to other related sites, including bibliographical databases

SCOP http://www.scop.mrc-lmb.cam.ac.uk

• SCOP organizes protein structures in a hierarchy according to evolutionary origin and structural similarity.

• Domains are extracted from Protein Data Bank entries.

• Sets of domains are grouped into families: sets of domains for which similarities in structure, function and sequence imply a common evolutionary origin.

The SCOP hierarchy

• Families that share a common structure, or even a common structure and a common function, but lack adequate sequence similarity – so that the evidence for evolutionary relationship is suggestive but not compelling – are grouped into superfamilies

• Superfamilies that share a common folding topology, for at least a large central portion of the structure, are grouped as folds.

• Finally, each fold group falls into one of the general classes.

Major classes in SCOP

• α: secondary structure all helical
• β: secondary structure all sheet
• α/β: helices and sheets, containing β-α-β supersecondary structure (mainly parallel β-sheets)
• α+β: helices and sheets, but in different parts of the structure
• ‘small proteins’: often have little secondary structure and are held together by disulphide bridges or ligands (for instance, wheat-germ agglutinin)

Summary of SCOP hierarchy

• Class
• Fold
• Superfamily
• Family
• Domain

SCOP classification of flavodoxin

Protein: Flavodoxin from Clostridium beijerinckii [TaxId: 1520]
Lineage:
  Root: scop
  Class: Alpha and beta proteins (a/b) [51349]
    Mainly parallel beta sheets (beta-alpha-beta units)
  Fold: Flavodoxin-like [52171]
    3 layers, a/b/a; parallel beta-sheet of 5 strand, order 21345
  Superfamily: Flavoproteins [52218]
  Family: Flavodoxin-related [52219]
    binds FMN
  Protein: Flavodoxin [52220]
  Species: Clostridium beijerinckii [TaxId: 1520] [52226]
PDB Entry Domains:
  5nul complexed with fmn; mutant chain a [31191]
  2fax complexed with fmn; mutant chain a [31194]
  … many others

Clostridium beijerinckii flavodoxin (stereo pair)

Flavodoxin and NADPH-cytochrome P450 reductase

same superfamily, different family

Flavodoxin and CheY: same fold, different superfamily

Flavodoxin and spinach ferredoxin reductase

same class, different folds

Flavodoxin in the SCOP hierarchy

• To give some idea of the nature of the similarities expressed by the different levels of the hierarchy:

• Flavodoxin from Clostridium beijerinckii and NADPH-cytochrome P450 reductase are in the same superfamily, but different families.

• Flavodoxin and the signal transduction protein CheY are in the same fold category, but different superfamilies.

• Flavodoxin and spinach ferredoxin reductase are in the same class (α/β) but have different folds.
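The comparisons above amount to finding the most specific level at which two lineages still agree. The minimal sketch below does that for the four named SCOP levels. Only flavodoxin's names are taken from its SCOP lineage listed earlier. Every string marked "placeholder" is hypothetical and stands in for a name the slides do not give.

```python
# SCOP levels compared, from most general to most specific.
LEVELS = ("class", "fold", "superfamily", "family")


def deepest_shared_level(a: dict, b: dict):
    """Return the most specific SCOP level at which two domains agree, or None.

    A level only counts as shared if every more general level is shared too,
    as a strict hierarchy implies.
    """
    shared = None
    for level in LEVELS:
        if a[level] != b[level]:
            break
        shared = level
    return shared


flavodoxin = {"class": "Alpha and beta proteins (a/b)", "fold": "Flavodoxin-like",
              "superfamily": "Flavoproteins", "family": "Flavodoxin-related"}
# Hypothetical lineages: "placeholder" names are not real SCOP names.
p450_reductase = dict(flavodoxin, family="placeholder family")
chey = dict(flavodoxin, superfamily="placeholder superfamily",
            family="placeholder family")
fd_reductase = dict(flavodoxin, fold="placeholder fold",
                    superfamily="placeholder superfamily",
                    family="placeholder family")

for name, other in [("P450 reductase", p450_reductase), ("CheY", chey),
                    ("ferredoxin reductase", fd_reductase)]:
    print(name, deepest_shared_level(flavodoxin, other))
```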

CATH presents a classification scheme similar to that of SCOP

• CATH = Class, Architecture, Topology, Homologous superfamily, the levels of its hierarchy.

• In CATH, proteins with very similar structures, sequences and functions are grouped into sequence families.

• A homologous superfamily contains proteins for which similarity of sequence and structure gives evidence of common ancestry

• A topology or fold family comprises sets of homologous superfamilies that share the spatial arrangement and connectivity of helices and strands

• Architectures are groups of proteins with similar arrangements of helices and sheets, but with different connectivity. For instance, different four-α-helix bundles with different connectivities would share the same architecture but not the same topology in CATH

• General classes of architectures in CATH are: α, β, α-β (subsuming the α/β and α+β classes of SCOP), and domains of low secondary structure content.

Do different classification schemes agree?

• To classify protein structures (or any other set of objects) you need to be able to measure the similarities among them.

• The measure of similarity induces a tree-like representation of the relationships.

• CATH, SCOP, DALI and the others agree, for the most part, on what is similar, and the tree structures of their classifications are therefore also similar.

• However, even an objective measure of similarity does not specify how to define the different levels of the hierarchy.

• These are interpretative decisions, and any apparent differences in the names and distinctions between the levels disguise the underlying general agreement about what is similar and what is different.
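This sketch illustrates why the levels are interpretative decisions. It takes one fixed set of similarity scores and groups them at different cutoffs, and each cutoff produces a different number of groups. The grouping rule is single linkage: any pair at or above the cutoff is joined. That rule is an illustrative choice and is not how SCOP or CATH actually assign their levels. The proteins P1 to P4 and their scores are hypothetical.

```python
def groups_at_threshold(names, similarity, threshold):
    """Group items by single linkage: link any pair with similarity >= threshold.

    names      -- list of item labels
    similarity -- dict mapping (a, b) pairs to a similarity score; absent
                  pairs are treated as never linked
    Returns the groups as sorted lists, themselves sorted, for stable output.
    """
    parent = {n: n for n in names}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), score in similarity.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical similarity scores between four made-up proteins.
scores = {("P1", "P2"): 0.9, ("P3", "P4"): 0.8, ("P1", "P3"): 0.4, ("P2", "P4"): 0.3}
names = ["P1", "P2", "P3", "P4"]
for cutoff in (0.85, 0.7, 0.35):
    print(cutoff, groups_at_threshold(names, scores, cutoff))
```

The similarity scores never change; only the choice of where to draw each level does.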