Lecture 10 Protein Tertiary (3D)...

Introduction to Bioinformatics for Medical Research

Gideon [email protected]

Lecture 10: Protein Tertiary (3D) Structure

Protein Tertiary Structure

• Defining Structure
• Determining experimentally
  – PDB
• Predicting Structure
  – TOPITS
  – GenTHREADER
• Structural classification
  – SCOP

Defining Structure

     1  N   MET A 1  -14.830  -2.121  10.034
     2  CA  MET A 1  -14.608  -1.535   8.679
     3  C   MET A 1  -15.821  -1.799   7.781
     4  O   MET A 1  -15.713  -2.464   6.770
     5  CB  MET A 1  -13.372  -2.254   8.135
     6  CG  MET A 1  -13.531  -3.764   8.330
     7  SD  MET A 1  -12.739  -4.636   6.956
     8  CE  MET A 1  -13.839  -6.072   6.937
     9  1H  MET A 1  -15.554  -2.865   9.976
    10  2H  MET A 1  -13.942  -2.531  10.386

Labelled fields: atom name (hydrogen number, atomic symbol, remoteness), residue, chain, residue number, 3D co-ords
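The atom records above can be read into named fields with a few lines of code. The following is a minimal sketch that assumes the simplified, whitespace-separated layout shown on the slide; real PDB files use fixed-width columns and an `ATOM` record prefix, so this parser would need adapting for them. The names `Atom`, `parse_atoms`, and `SAMPLE` are illustrative and not part of the lecture.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    serial: int          # running atom number (first column)
    name: str            # atom name, e.g. "CA" or "1H"
    residue: str         # three-letter residue code, e.g. "MET"
    chain: str           # chain identifier
    residue_number: int  # position of the residue in the chain
    x: float
    y: float
    z: float


# A few rows copied from the slide, in its simplified layout.
SAMPLE = """\
1 N MET A 1 -14.830 -2.121 10.034
2 CA MET A 1 -14.608 -1.535 8.679
7 SD MET A 1 -12.739 -4.636 6.956
9 1H MET A 1 -15.554 -2.865 9.976
"""


def parse_atoms(text: str) -> List[Atom]:
    """Parse whitespace-separated atom rows; rows without 8 fields are skipped."""
    atoms = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue
        serial, name, residue, chain, resnum, x, y, z = fields
        atoms.append(Atom(int(serial), name, residue, chain,
                          int(resnum), float(x), float(y), float(z)))
    return atoms


atoms = parse_atoms(SAMPLE)
for atom in atoms:
    print(atom.name, atom.residue, atom.chain, atom.residue_number,
          (atom.x, atom.y, atom.z))
```

Splitting on whitespace is only safe here because every field in the sample is present and separated by spaces; in fixed-width files, adjacent fields can run together.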

X-ray Crystallography

• Create repetitive crystal of molecule
  – Often difficult, especially hydrophobic portions
• X-rays generate diffraction pattern
  – Pattern represents electron density
• Generate comparison patterns
  – Add ions or change wavelength
• Obtain electron density map
  – Fit protein sequence to map

Nuclear Magnetic Resonance

• Dissolve molecules in water
  – Allows free tumbling and vibration
• Detect activity of atoms with quantum spin
  – ¹Hydrogen (natural), ¹³Carbon, ¹⁵Nitrogen
• Defines set of atomic distance constraints
  – Ensemble of models
• Can detect motion
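To illustrate what a set of atomic distance constraints means in practice, the sketch below checks one candidate model against a list of upper-bound distances. The coordinates are the backbone atoms of MET 1 from the "Defining Structure" slide; the constraint values and the helper name `violations` are hypothetical, chosen only for illustration (real NMR constraints usually involve pairs of hydrogen atoms).

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]

# Backbone atoms of MET 1, copied from the "Defining Structure" slide.
coords: Dict[str, Coord] = {
    "N": (-14.830, -2.121, 10.034),
    "CA": (-14.608, -1.535, 8.679),
    "C": (-15.821, -1.799, 7.781),
}

# Hypothetical constraints: (atom, atom, upper bound in angstroms).
constraints: List[Tuple[str, str, float]] = [
    ("N", "CA", 2.0),
    ("N", "C", 2.0),
]


def violations(model: Dict[str, Coord],
               bounds: List[Tuple[str, str, float]]) -> List[Tuple[str, str, float]]:
    """Return the constraints whose measured distance exceeds the upper bound."""
    broken = []
    for a, b, upper in bounds:
        d = math.dist(model[a], model[b])  # Euclidean distance in 3D
        if d > upper:
            broken.append((a, b, round(d, 3)))
    return broken


print("N-CA distance:", round(math.dist(coords["N"], coords["CA"]), 3))
print("Violated constraints:", violations(coords, constraints))
```

Each model in an NMR ensemble could be passed through the same check; models that violate few or no constraints are the ones consistent with the measurements.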

PDB

• Database of molecular structures
  – Obtained by crystallography or NMR
  – Carefully curated and validated
• Founded in 1971
  – 19375 proteins, 2117 other structures
• Additional protein information
  – Secondary structure
  – References, external links

PDB: Summary Information

[Figure: PDB summary page. Labels: molecule in PDB entry; chains in molecule; experimental method; link to SCOP]

PDB: 3D Structure

• Still images at fixed orientation
  – Generate at any size
• Interactive molecule explorer
  – Requires Java or Chime plug-in
• Download structure file
  – Display in RasMol, Swiss-PDBViewer, etc…
• Demonstration

Predicting 3D Structure

• Outstanding difficult problem
• Based only on protein sequence
  – Comparative modeling (homology)
  – Ab-initio modeling
• Based on secondary structure
  – Fold recognition
  – Protein threading

Comparative Modeling

• Similar sequence suggests similar structure
  – Amino acid characteristics determine folding
• Similarity particularly high in core
  – Alpha helices and beta sheets preserved
  – But even near-identical sequences vary in loops
• Effectiveness depends on protein length
  – Longer → less sequence similarity required
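Sequence similarity between a query and a protein of known structure is often summarised as percent identity over an alignment. The sketch below shows one way to compute it, assuming the two sequences are already aligned, gaps are written as `-`, and identity is measured over gap-free columns only (other definitions divide by a different length). The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over alignment columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only the columns where both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical aligned fragments, for illustration only.
query = "MKT-AYIAK"
known = "MKTLAY-SK"
print(f"{percent_identity(query, known):.1f}% identity")
```

Here seven columns are gap-free and six of them match, so the result is about 85.7%. How much identity is needed to trust a structural model depends on the alignment length, as noted on the slide.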

Ab Initio Modeling

• Compute molecular structure from laws of physics and chemistry alone
  – Ideal solution (theoretically)
• Simulate process of protein folding
  – Apply minimum energy considerations
• Practically nearly impossible
  – Exceptionally complex calculations
  – Biophysics understanding incomplete
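To give a feel for minimum-energy search and why it becomes so expensive, here is a toy sketch. It does not use real physics: it swaps in the simplified HP lattice model, which is not mentioned in the lecture, where each residue is either hydrophobic (H) or polar (P), the chain lies on a 2D grid, and each pair of non-consecutive H residues on neighbouring grid points lowers the energy by 1. All names below are illustrative.

```python
from typing import List, Optional, Tuple

Coord = Tuple[int, int]
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def contact_energy(seq: str, coords: List[Coord]) -> int:
    """Energy is -1 per pair of non-consecutive H residues on adjacent grid points."""
    index_at = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        # Look only right and up, so each neighbouring pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and seq[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


def lowest_energy_fold(seq: str) -> Tuple[int, List[Coord], int]:
    """Try every self-avoiding chain placement; return (energy, coords, number tried)."""
    if not seq:
        raise ValueError("sequence must not be empty")
    best_energy: Optional[int] = None
    best_coords: List[Coord] = []
    examined = 0

    def extend(coords: List[Coord]) -> None:
        nonlocal best_energy, best_coords, examined
        if len(coords) == len(seq):
            examined += 1
            energy = contact_energy(seq, coords)
            if best_energy is None or energy < best_energy:
                best_energy, best_coords = energy, list(coords)
            return
        x, y = coords[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in coords:  # the chain may not cross itself
                coords.append(nxt)
                extend(coords)
                coords.pop()

    extend([(0, 0)])
    return best_energy, best_coords, examined


energy, coords, examined = lowest_energy_fold("HPPH")
print(energy, coords, examined)
```

Even in this crude model the number of conformations examined grows very quickly with chain length (12 for three residues, 36 for four), which hints at why exhaustive search over realistic proteins with realistic energy functions is impractical.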

Protein Folds

• A combination of secondary structural units
  – Forms basic level of classification
• Each protein family belongs to a fold
  – Estimated 700–1500 different folds

Fold Recognition / Threading

• Compare sequence against known structures
  – Try to ‘thread’ sequence along chain
• Score suitability of the threading
  – Can adjacent amino acids bond?
  – Are amino acids close to or far from water?
  – Are secondary structures similar?
• Examine list of most threadable structures
  – Correct answer is often in top 10 or so
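The scoring idea above can be sketched with a deliberately crude toy. It is not the scoring used by TOPITS or GenTHREADER: it assumes each known structure is reduced to two strings, secondary structure (`H` helix, `E` strand) and burial (`b` buried, `e` exposed), slides the query along without gaps, and awards one point when secondary structures agree and one point when a hydrophobic residue sits in a buried position or a non-hydrophobic residue sits in an exposed one. The hydrophobic set, the template names, and all function names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Assumed hydrophobic residues for this toy; other classifications exist.
HYDROPHOBIC = set("AVLIMFWC")


def thread_score(query_seq: str, query_ss: str,
                 template_ss: str, template_burial: str, offset: int) -> int:
    """Score an ungapped placement of the query starting at `offset` on the template."""
    score = 0
    for i, (aa, ss) in enumerate(zip(query_seq, query_ss)):
        j = i + offset
        if not 0 <= j < len(template_ss):
            continue  # residues hanging off the template are left unscored
        if ss == template_ss[j]:
            score += 1  # predicted and known secondary structure agree
        if (template_burial[j] == "b") == (aa in HYDROPHOBIC):
            score += 1  # hydrophobic residue buried, or polar residue exposed
    return score


def rank_templates(query_seq: str, query_ss: str,
                   templates: Dict[str, Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Return (template name, best score over all offsets), best first."""
    ranked = []
    for name, (t_ss, t_burial) in templates.items():
        offsets = range(-(len(query_seq) - 1), len(t_ss))
        best = max(thread_score(query_seq, query_ss, t_ss, t_burial, off)
                   for off in offsets)
        ranked.append((name, best))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical query and templates, for illustration only.
templates = {
    "helix_template": ("HHHH", "bbee"),
    "strand_template": ("EEEE", "eebb"),
}
ranking = rank_templates("LLKK", "HHHH", templates)
print(ranking)
```

A real threading program would also allow gaps in the alignment and use energy terms derived from known structures, and the user would then inspect the top of the ranked list rather than trust the first hit.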

Threading Example

[Figure: threading example. Labels: known structure; query sequence; gaps in threading]

TOPITS Output (1)

[Figure: annotated TOPITS output. Labels: alignment score; alignment length; length of indels; number of indels; length of sequence; alignment significance; matched sequence; % sequence identity]

TOPITS Output (2)

[Figure: annotated TOPITS alignment. Labels: query sequence; predicted structure; buried / outside; database sequence; amino acid matches; database known secondary structure]

GenTHREADER Output

[Figure: annotated GenTHREADER output. Labels: prediction confidence; expected errors; score from neural network; sequence alignment score and length; energy measurements; length of sequence; structure from PDB]

Prediction Flowchart

[Flowchart. Boxes: PSI-BLAST; ab initio methods; TOPITS, GenTHREADER; PHDsec, PSIpred]

Structure Classification

• Class
  – All alpha, all beta, alpha/beta, alpha+beta
• Fold
  – Strong structural similarity
• Superfamily
  – Probably common evolutionary origin
• Family
  – Evolutionary relationship, sequence similarity

SCOP

• Structural Classification of Proteins
  – Based on known protein structures
  – Manually created by visual inspection
• Hierarchical database structure
  – Class, fold, superfamily, family
  – Proteins/domains, species instances
• Founded in 1995
  – 765 folds, 1232 superfamilies, 2164 families
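The hierarchy described above maps naturally onto a tree in which each node knows its parent and its children. The sketch below is one possible in-memory representation, not SCOP's actual data format; the class name `Node` and its methods are illustrative. The example path uses the classification of the globins (class "All alpha proteins", fold "Globin-like").

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    level: str   # "class", "fold", "superfamily", "family", ...
    name: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)

    def add(self, level: str, name: str) -> "Node":
        """Create a child node under this node and return it."""
        child = Node(level, name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> List[str]:
        """Path from the root down to this node."""
        names = []
        node: Optional["Node"] = self
        while node is not None:
            names.append(f"{node.level}: {node.name}")
            node = node.parent
        return list(reversed(names))


root = Node("root", "SCOP")
all_alpha = root.add("class", "All alpha proteins")
globin_fold = all_alpha.add("fold", "Globin-like")
globin_superfamily = globin_fold.add("superfamily", "Globin-like")
globins = globin_superfamily.add("family", "Globins")

print(" > ".join(globins.path()))
print([child.name for child in all_alpha.children])
```

The `path` and `children` lookups correspond to the "path from root to node" and "children of node" elements used when browsing the classification.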

SCOP: Navigation

[Figure: SCOP navigation page. Labels: node name; node description; path from root to node; children of node]

Other Resources

• CATH (classification of protein domains)
  – http://www.biochem.ucl.ac.uk/bsm/cath/
• SWISS-MODEL (comparative modeling)
  – http://www.expasy.ch/swissmod/
• CASP (structure prediction competition)
  – http://predictioncenter.llnl.gov/
• GTSP (guide to structure prediction)
  – http://speedy.embl-heidelberg.de/gtsp/
