BMMB597E Protein Evolution Protein classification 1.

BMMB597E Protein Evolution

Protein classification

1

Protein families

• The first protein structures determined by X-ray crystallography, myoglobin and haemoglobin, were solved (in 1959–60) before the amino acid sequences were determined

• It came as a surprise that the structures were quite similar

• Soon it became clear, on the basis of both sequences and structures, that there were families of proteins

[Figure: myoglobin and haemoglobin]

50 years earlier, there were some hints …

• E.T. Reichert & A.P. Brown. The differentiation and specificity of corresponding proteins and other vital substances in relation to biological classification and organic evolution: the crystallography of hemoglobins. (Carnegie Institution of Washington, 1909)

• Crystallography 3 years before discovery of X-ray diffraction?

Reichert and Brown studied interfacial angles in haemoglobin crystals

• Steno’s law (1669): different crystals of the same substance may have different sizes and shapes, but the angles between corresponding faces are constant for each substance

• They found that the angles differed from species to species

• Similarities in the values of interfacial angles were consistent with the classical taxonomic tree

• They even found differences between oxy- and deoxyhaemoglobin

Most premature scientific result ever?

• These results implied:

– That proteins adopted (or at least could adopt) unique structures, to form a crystal

– That protein structures varied between species

– That this variation was parallel with the evolution of the species

– That proteins could change structure as a result of changes in state of ligation

• In 1909!

M.O. Dayhoff

• Pioneer of bioinformatics

• Collected protein sequences

• First curated ‘database’

• Recognized that proteins form families, on the basis of amino acid sequences

• Computational sequence alignments

• First evolutionary tree

• First amino-acid substitution matrix (later replaced by BLOSUM)

Can relationships among proteins be extended beyond families?

• Families = sets of proteins with such obvious similarities that we assume that they are related

• One question: how much similarity do we need to believe in a relationship?

• How far can evolution go?

• Convergent evolution?

• Cautionary tale: chymotrypsin / subtilisin

Chymotrypsin-subtilisin

• Both proteolytic enzymes

– Chymotrypsin: mammalian

– Subtilisin: from B. subtilis

• Both have catalytic triads

• Same function – same mechanism

• Sequences 12% similar (near noise level)

• However, structures show them to be unrelated
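A figure such as "12% similar" comes from comparing two sequences position by position once they have been aligned. The following is a minimal sketch of one common convention (percent identity over columns where neither sequence has a gap); the function name and the toy sequences are invented for illustration, and other conventions use a different denominator or count conservative substitutions as similar.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of aligned (non-gap) columns where the residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy aligned fragments, invented for illustration (not real protein sequences).
seq_a = "ACDEFGHIKL"
seq_b = "ACDQFG-IKV"
print(round(percent_identity(seq_a, seq_b), 1))
```

With only 20 amino acids, two unrelated sequences will match at some positions by chance, which is why values near 12% carry little evidence of relationship on their own.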

Chymotrypsin / Subtilisin

Catalytic triad in serine proteinases

Chymotrypsin and subtilisin have similar catalytic triads

How can we classify proteins that belong to families?

• Align sequences

• Calculate phylogenetic tree (various ways to do this; they depend on the sequence alignment)

• Usually, the phylogenetic tree of homologous proteins from different species follows the phylogenetic tree based on classical taxonomy

• That is reassuring

• But what happens as divergence proceeds?
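The two steps above (align, then build a tree) can be sketched with a very simple distance method. The slides do not say which tree-building method is meant; UPGMA is used here only because it is short, and it assumes roughly constant rates of change. The sequences are invented and already aligned, and the species labels are only for readability.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs: dict[str, str]) -> str:
    """Cluster sequences by UPGMA and return the tree as a nested string."""
    # Each cluster is keyed by its label; size counts the leaves it contains.
    size = {name: 1 for name in seqs}
    dist = {frozenset(p): p_distance(seqs[p[0]], seqs[p[1]])
            for p in combinations(seqs, 2)}
    while len(size) > 1:
        # Join the closest pair of clusters (ties broken by sorted labels).
        pair = min(dist, key=lambda k: (dist[k], sorted(k)))
        a, b = sorted(pair)
        merged = f"({a},{b})"
        # Distance from the merged cluster to each other cluster is the
        # size-weighted average of the two old distances.
        for other in size:
            if other in pair:
                continue
            dist[frozenset((merged, other))] = (
                size[a] * dist[frozenset((a, other))]
                + size[b] * dist[frozenset((b, other))]
            ) / (size[a] + size[b])
        # Drop all distances that involve the two clusters just merged.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))


# Invented, pre-aligned toy sequences (not real proteins).
seqs = {
    "human": "ACDEFGHIKL",
    "chimp": "ACDEFGHIKV",
    "mouse": "ACDQFGHLKV",
    "yeast": "TCNQWGHLRV",
}
print(upgma(seqs))
```

Because every distance is computed from the alignment columns, a different alignment would give different distances and possibly a different tree.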

How can we classify proteins that do not obviously belong to families?

• Base this on structure rather than sequence

• Structural similarities are maintained as divergence proceeds, better than sequence similarities

• For closely related proteins, expect no difference between sequence-based and structure-based classification

• How far can classification be extended?

SCOP Structural Classification of Proteins

• Idea of A.G. Murzin, based on old work by C. Chothia and M. Levitt

• Even if two proteins are not obviously homologous, they may share structural features, to a greater or lesser degree.

• For instance, the secondary structures of some proteins are only α-helices

• Others have β-sheets but no α-helices

SCOP

• SCOP is a database that gives a hierarchical classification of all protein domains

• Recall that a domain is a compact subunit of a protein structure that ‘looks as if’ it would have independent stability

Fragment of fibronectin

Dissection of structure into domains

• It is not always quite so obvious how to divide a protein into domains

• There is some (not a lot of) room for argument

• Note that sometimes the chain passes back and forth between domains

• In these cases one or both domains do not consist entirely of a consecutive set of residues
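One way to picture the last point is to describe a domain as a set of residue ranges rather than a single range. The sketch below is illustrative only: the `Domain` class and the residue numbers are invented and do not correspond to any real structure or database format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """A structural domain described by one or more inclusive residue ranges."""
    name: str
    segments: tuple[tuple[int, int], ...]

    def contains(self, residue: int) -> bool:
        """True if the residue number falls inside any segment of the domain."""
        return any(start <= residue <= end for start, end in self.segments)

    def is_contiguous(self) -> bool:
        """True if the domain is a single consecutive run of residues."""
        return len(self.segments) == 1


# Hypothetical two-domain protein of 300 residues; the ranges are invented.
# The chain starts in domain A, crosses into B, then returns to A.
domain_a = Domain("A", ((1, 90), (250, 300)))
domain_b = Domain("B", ((91, 249),))

print(domain_a.is_contiguous())  # False
print(domain_b.is_contiguous())  # True
print(domain_a.contains(260))    # True
```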

lactoferrin

SCOP, CATH, DALI Database classify protein structures

• SCOP (Structural Classification of Proteins)

• CATH (Class, Architecture, Topology, Homologous superfamily)

• DALI Database

• These web sites have many useful features:

– information-retrieval engines, including search by keyword or sequence

– presentation of structure pictures

– links to other related sites including bibliographical databases.

SCOP: http://www.scop.mrc-lmb.cam.ac.uk

• SCOP organizes protein structures in a hierarchy according to evolutionary origin and structural similarity.

• Domains are extracted from the Protein Data Bank entries.

• Sets of domains are grouped into families: sets of domains for which similarities in structure, function and sequence imply a common evolutionary origin.

The SCOP hierarchy

• Families that share a common structure, or even a common structure and a common function, but lack adequate sequence similarity – so that the evidence for evolutionary relationship is suggestive but not compelling – are grouped into superfamilies

• Superfamilies that share a common folding topology, for at least a large central portion of the structure, are grouped as folds.

• Finally, each fold group falls into one of the general classes.

Major classes in SCOP

• α – secondary structure all helical

• β – secondary structure all sheet

• α + β – helices and sheets, but in different parts of the structure

• α/β – contain β-α-β supersecondary structure

• ‘small proteins’ – which often have little secondary structure and are held together by disulphide bridges or ligands (for instance, wheat-germ agglutinin)

Summary of SCOP hierarchy

• Class

• Fold

• Superfamily

• Family

• Domain

SCOP classification of flavodoxin

Protein: Flavodoxin from Clostridium beijerinckii [TaxId: 1520]

Lineage:
Root: scop
Class: Alpha and beta proteins (a/b) [51349]
    Mainly parallel beta sheets (beta-alpha-beta units)
Fold: Flavodoxin-like [52171]
    3 layers, a/b/a; parallel beta-sheet of 5 strand, order 21345
Superfamily: Flavoproteins [52218]
Family: Flavodoxin-related [52219]
    binds FMN
Protein: Flavodoxin [52220]
Species: Clostridium beijerinckii [TaxId: 1520] [52226]

PDB Entry Domains:
5nul complexed with fmn; mutant chain a [31191]
2fax complexed with fmn; mutant chain a [31194]
… many others

Clostridium beijerinckii flavodoxin (stereo pair)

[Figure: flavodoxin and NADPH-cytochrome P450 reductase: same superfamily, different family]

[Figure: flavodoxin and CHEY: same fold, different superfamily]

[Figure: flavodoxin and spinach ferredoxin reductase: same class, different folds]

Flavodoxin in the SCOP hierarchy

• To give some idea of the nature of the similarities expressed by the different levels of the hierarchy:

• Flavodoxin from Clostridium beijerinckii and NADPH-cytochrome P450 reductase are in the same superfamily, but different families.

• Flavodoxin and the signal transduction protein CHEY are in the same fold category, but different superfamilies.

• Flavodoxin and spinach ferredoxin reductase are in the same class (α/β) but have different folds.
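The three comparisons above amount to asking how far down the hierarchy two lineages agree. A minimal sketch, assuming each domain is represented as a (class, fold, superfamily, family) tuple: flavodoxin's entries are taken from the SCOP lineage quoted earlier, while every entry marked "placeholder" is an invented label, chosen only so that the pattern of agreement matches the slides.

```python
SCOP_LEVELS = ("class", "fold", "superfamily", "family")


def deepest_shared_level(lineage_a, lineage_b):
    """Return the deepest SCOP level at which two lineages agree, or None."""
    shared = None
    for level, a, b in zip(SCOP_LEVELS, lineage_a, lineage_b):
        if a != b:
            break  # once the lineages diverge, lower levels cannot be shared
        shared = level
    return shared


# Lineage from the SCOP entry for flavodoxin quoted earlier.
flavodoxin = ("a/b", "Flavodoxin-like", "Flavoproteins", "Flavodoxin-related")

# "placeholder" names are invented; only whether they equal or differ
# from flavodoxin's entries follows the slides.
p450_reductase = ("a/b", "Flavodoxin-like", "Flavoproteins",
                  "placeholder family X")
chey = ("a/b", "Flavodoxin-like", "placeholder superfamily Y",
        "placeholder family Y")
ferredoxin_reductase = ("a/b", "placeholder fold Z",
                        "placeholder superfamily Z", "placeholder family Z")

print(deepest_shared_level(flavodoxin, p450_reductase))        # superfamily
print(deepest_shared_level(flavodoxin, chey))                  # fold
print(deepest_shared_level(flavodoxin, ferredoxin_reductase))  # class
```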

CATH presents a classification scheme similar to that of SCOP

• CATH = Class, Architecture, Topology, Homologous superfamily, the levels of its hierarchy.

• In CATH, proteins with very similar structures, sequences and functions are grouped into sequence families.

• A homologous superfamily contains proteins for which similarity of sequence and structure gives evidence of common ancestry

• A topology or fold family comprises sets of homologous superfamilies that share the spatial arrangement and connectivity of helices and strands

• Architectures are groups of proteins with similar arrangements of helices and sheets, but with different connectivity. For instance, different four-α-helix bundles with different connectivities would share the same architecture but not the same topology in CATH

• General classes of architectures in CATH are: α, β, α-β (subsuming the α/β and α+β classes of SCOP), and domains of low secondary structure content.

Do different classification schemes agree?

• To classify protein structures (or any other set of objects) you need to be able to measure the similarities among them.

• The measure of similarity induces a tree-like representation of the relationships.

• CATH, SCOP, DALI and the others agree, for the most part, on what is similar, and the tree structures of their classifications are therefore also similar.

• However, even an objective measure of similarity does not specify how to define the different levels of the hierarchy.

• These are interpretative decisions, and any apparent differences in the names and distinctions between the levels disguise the underlying general agreement about what is similar and what is different.
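The distinction between the similarity measure and the choice of levels can be illustrated with a toy clustering. In this sketch the similarity scores are invented, and single-linkage grouping stands in for whatever procedure a real database uses. The scores fix which groups nest inside which, but the thresholds that turn those groups into named levels are chosen by whoever builds the classification.

```python
def clusters_at(threshold, items, similarity):
    """Single-linkage groups: items end up together if they are connected
    by a chain of pairs whose similarity is at least `threshold`."""
    parent = {x: x for x in items}

    def find(x):
        # Follow parent links to the group's representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), s in similarity.items():
        if s >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for x in items:
        groups.setdefault(find(x), []).append(x)
    return sorted(sorted(g) for g in groups.values())


# Invented similarity scores (0 to 1) among four hypothetical domains.
items = ["d1", "d2", "d3", "d4"]
similarity = {
    ("d1", "d2"): 0.9,
    ("d1", "d3"): 0.6,
    ("d2", "d3"): 0.5,
    ("d1", "d4"): 0.2,
    ("d2", "d4"): 0.2,
    ("d3", "d4"): 0.3,
}

# The cut-offs are arbitrary; each one defines a different "level".
for cut in (0.8, 0.4, 0.1):
    print(cut, clusters_at(cut, items, similarity))
```

Two schemes that used cut-offs of 0.8 and 0.4 for their finest level would give that level different contents, yet both would be reading the same underlying tree.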