Protein Secondary Structures

April 8, 2003. Claus Lundegaard, Center for Biological Sequence Analysis, Technical University of Denmark (DTU). Protein Secondary Structures: Assignment and prediction

description

Protein Secondary Structures: assignment and prediction. Use of secondary structure: classification of protein structures; definition of loops/core; use in fold recognition methods; improvements of alignments; definition of domain boundaries. Secondary structure elements.

Transcript of Protein Secondary Structures

Page 1: Protein Secondary Structures

Protein Secondary Structures

Assignment and prediction

Page 2: Protein Secondary Structures

Use of secondary structure

• Classification of protein structures
• Definition of loops/core
• Use in fold recognition methods
• Improvements of alignments
• Definition of domain boundaries

Page 3: Protein Secondary Structures

Secondary Structure Elements

Page 4: Protein Secondary Structures

Classification of secondary structure

• Defining features
– Dihedral angles
– Hydrogen bonds
– Geometry

• Assigned manually by crystallographers, or

• Automatically
– DSSP (Kabsch & Sander, 1983)
– STRIDE (Frishman & Argos, 1995)
– Continuum (Andersen et al.)

Page 5: Protein Secondary Structures

Dihedral Angles

phi - dihedral angle about the N-Calpha bond
psi - dihedral angle about the Calpha-C bond
omega - dihedral angle about the C-N (peptide) bond

From http://www.imb-jena.de
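The three angles defined above can each be computed from four consecutive backbone atoms. The following is a minimal sketch, not part of the original slides: it assumes the usual atom quadruples (phi: C(i-1), N(i), Calpha(i), C(i); psi: N(i), Calpha(i), C(i), N(i+1); omega: Calpha(i), C(i), N(i+1), Calpha(i+1)), and the function name `dihedral` and the toy coordinates are illustrative only.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond for four 3-D points."""
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of the two planes
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Toy coordinates (not taken from a real structure): a quarter turn about z.
quarter_turn = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
# A planar trans arrangement, as expected for omega in most peptide bonds.
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
```

Using `atan2` rather than `acos` keeps the sign of the angle, which is what distinguishes, for example, phi = -57.8 from phi = +57.8.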

Page 6: Protein Secondary Structures

Alpha helices

                           phi(deg)  psi(deg)  H-bond pattern
------------------------------------------------------------------
right-handed alpha-helix    -57.8     -47.0     i+4
pi-helix                    -57.1     -69.7     i+5
3-10 helix                  -74.0      -4.0     i+3
------------------------------------------------------------------
(omega is 180 deg in all cases)

From http://www.imb-jena.de

Page 7: Protein Secondary Structures

Beta Strands

               phi(deg)  psi(deg)  omega(deg)
------------------------------------------------------------------
beta strand     -120      120       180
------------------------------------------------------------------

Hydrogen bond patterns in beta sheets. A four-stranded beta sheet, containing three antiparallel strands and one parallel strand, is drawn schematically. Hydrogen bonds are indicated with red lines (antiparallel strands) and green lines (parallel strands) connecting the hydrogen and the acceptor oxygen.

From http://broccoli.mfn.ki.se/pps_course_96/

Page 8: Protein Secondary Structures

Secondary Structure Types

* H = alpha helix
* B = residue in isolated beta-bridge
* E = extended strand, participates in beta ladder
* G = 3-helix (3/10 helix)
* I = 5 helix (pi helix)
* T = hydrogen bonded turn
* S = bend

Page 9: Protein Secondary Structures

Automatic assignment programs

• DSSP ( http://www.cmbi.kun.nl/gv/dssp/ )

• STRIDE ( http://www.hgmp.mrc.ac.uk/Registered/Option/stride.html )

# RESIDUE AA STRUCTURE BP1 BP2 ACC N-H-->O O-->H-N N-H-->O O-->H-N TCO KAPPA ALPHA PHI PSI X-CA Y-CA Z-CA
1 4 A E 0 0 205 0, 0.0 2,-0.3 0, 0.0 0, 0.0 0.000 360.0 360.0 360.0 113.5 5.7 42.2 25.1
2 5 A H - 0 0 127 2, 0.0 2,-0.4 21, 0.0 21, 0.0 -0.987 360.0-152.8-149.1 154.0 9.4 41.3 24.7
3 6 A V - 0 0 66 -2,-0.3 21,-2.6 2, 0.0 2,-0.5 -0.995 4.6-170.2-134.3 126.3 11.5 38.4 23.5
4 7 A I E -A 23 0A 106 -2,-0.4 2,-0.4 19,-0.2 19,-0.2 -0.976 13.9-170.8-114.8 126.6 15.0 37.6 24.5
5 8 A I E -A 22 0A 74 17,-2.8 17,-2.8 -2,-0.5 2,-0.9 -0.972 20.8-158.4-125.4 129.1 16.6 34.9 22.4
6 9 A Q E -A 21 0A 86 -2,-0.4 2,-0.4 15,-0.2 15,-0.2 -0.910 29.5-170.4 -98.9 106.4 19.9 33.0 23.0
7 10 A A E +A 20 0A 18 13,-2.5 13,-2.5 -2,-0.9 2,-0.3 -0.852 11.5 172.8-108.1 141.7 20.7 31.8 19.5
8 11 A E E +A 19 0A 63 -2,-0.4 2,-0.3 11,-0.2 11,-0.2 -0.933 4.4 175.4-139.1 156.9 23.4 29.4 18.4
9 12 A F E -A 18 0A 31 9,-1.5 9,-1.8 -2,-0.3 2,-0.4 -0.967 13.3-160.9-160.6 151.3 24.4 27.6 15.3
10 13 A Y E -A 17 0A 36 -2,-0.3 2,-0.4 7,-0.2 7,-0.2 -0.994 16.5-156.0-136.8 132.1 27.2 25.3 14.1
11 14 A L E >> -A 16 0A 24 5,-3.2 4,-1.7 -2,-0.4 5,-1.3 -0.929 11.7-122.6-120.0 133.5 28.0 24.8 10.4
12 15 A N T 45S+ 0 0 54 -2,-0.4 -2, 0.0 2,-0.2 0, 0.0 -0.884 84.3 9.0-113.8 150.9 29.7 22.0 8.6
13 16 A P T 45S+ 0 0 114 0, 0.0 -1,-0.2 0, 0.0 -2, 0.0 -0.963 125.4 60.5 -86.5 8.5 32.0 21.6 6.8
14 17 A D T 45S- 0 0 66 2,-0.1 -2,-0.2 1,-0.1 3,-0.1 0.752 89.3-146.2 -64.6 -23.0 33.0 25.2 7.6
15 18 A Q T <5 + 0 0 132 -4,-1.7 2,-0.3 1,-0.2 -3,-0.2 0.936 51.1 134.1 52.9 50.0 33.3 24.2 11.2
16 19 A S E < +A 11 0A 44 -5,-1.3 -5,-3.2 2, 0.0 2,-0.3 -0.877 28.9 174.9-124.8 156.8 32.1 27.7 12.3
17 20 A G E -A 10 0A 28 -2,-0.3 2,-0.3 -7,-0.2 -7,-0.2 -0.893 15.9-146.5-151.0-178.9 29.6 28.7 14.8
18 21 A E E -A 9 0A 14 -9,-1.8 -9,-1.5 -2,-0.3 2,-0.4 -0.979 5.0-169.6-158.6 146.0 28.0 31.5 16.7
19 22 A F E +A 8 0A 3 12,-0.4 12,-2.3 -2,-0.3 2,-0.3 -0.982 27.8 149.2-139.1 120.3 26.5 32.2 20.1
20 23 A M E -AB 7 30A 0 -13,-2.5 -13,-2.5 -2,-0.4 2,-0.4 -0.983 39.7-127.8-152.1 161.6 24.5 35.4 20.6
21 24 A F E -AB 6 29A 45 8,-2.4 7,-2.9 -2,-0.3 8,-1.0 -0.934 23.9-164.1-112.5 137.7 21.7 37.0 22.6
22 25 A D E -AB 5 27A 6 -17,-2.8 -17,-2.8 -2,-0.4 2,-0.5 -0.948 6.9-165.0-123.7 138.3 18.9 38.9 20.8
23 26 A F E > S-AB 4 26A 76 3,-3.5 3,-2.1 -2,-0.4 -19,-0.2 -0.947 78.4 -27.2-127.3 111.5 16.4 41.3 22.3
24 27 A D T 3 S- 0 0 74 -21,-2.6 -20,-0.1 -2,-0.5 -1,-0.1 0.904 128.9 -46.6 50.4 45.0 13.4 42.1 20.2
25 28 A G T 3 S+ 0 0 20 -22,-0.3 2,-0.4 1,-0.2 -1,-0.3 0.291 118.8 109.3 84.7 -11.1 15.4 41.4 17.0
26 29 A D E < S-B 23 0A 114 -3,-2.1 -3,-3.5 109, 0.0 2,-0.3 -0.822 71.8-114.7-103.1 140.3 18.4 43.4 18.1
27 30 A E E -B 22 0A 8 -2,-0.4 -5,-0.3 -5,-0.2 3,-0.1 -0.525 24.9-177.7 -74.1 127.5 21.8 41.8 19.1

Page 10: Protein Secondary Structures

Secondary Structure Prediction

• What to predict?
– All 8 types or pool types into groups

* H = helix
* B = residue in isolated β-bridge
* E = extended strand, participates in β ladder
* G = 3-helix (3/10 helix)
* I = 5 helix (π helix)
* T = hydrogen bonded turn
* S = bend
* C/. = random coil

[Figure: the 8 types pooled into the three groups H, E and C. Labels on the figure: Straight HEC, CASP, Q3.]
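Pooling the 8 types into H, E and C, and scoring a prediction with Q3 (the percentage of residues whose three-state class is predicted correctly), can be sketched as below. The transcript does not preserve which types the figure assigns to which group, so both mappings are assumptions: `CASP_STYLE` follows the commonly used reduction (H, G, I to H; E, B to E; everything else to C) and `STRAIGHT` keeps only H and E. All names are illustrative.

```python
# Assumed pooling schemes; the slide's figure is not recoverable from the transcript.
CASP_STYLE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}
STRAIGHT = {"H": "H", "E": "E"}

def reduce_states(dssp_string, mapping):
    """Map an 8-state assignment string to 3 states; unmapped symbols become C."""
    return "".join(mapping.get(state, "C") for state in dssp_string)

def q3(predicted, observed):
    """Percentage of positions where the 3-state prediction matches the assignment."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

observed = reduce_states("HGIEBTS-", CASP_STYLE)   # "HHHEECCC"
score = q3("HHHC", "HHEC")                          # 3 of 4 correct -> 75.0
```

The choice of pooling matters when comparing published accuracies, because the same prediction scores differently against differently reduced assignments.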

Page 11: Protein Secondary Structures

Secondary Structure Prediction

• Simple alignments
• Heuristic methods (e.g., Chou-Fasman, 1974)
• Neural networks (different inputs)
– Raw sequence (late 80’s)
– BLOSUM matrix (e.g., PHD, early 90’s)
– Position specific alignment profiles (e.g., PSIPRED, late 90’s)
– Multiple networks balloting, probability conversion, output expansion (Petersen et al., 2000)

Page 12: Protein Secondary Structures

Improvement of accuracy

1974 Chou & Fasman ~50-53%
1978 Garnier 63%
1987 Zvelebil 66%
1988 Qian & Sejnowski 64.3%
1993 Rost & Sander 70.8-72.0%
1997 Frishman & Argos <75%
1999 Cuff & Barton 72.9%
1999 Jones 76.5%
2000 Petersen et al. 77.9%

Page 13: Protein Secondary Structures

Simple Alignments

• Solved structures homologous to the query are needed
• Homologous proteins have ~88% identical (3-state) secondary structure
• If no homologue can be identified, alignment will give almost random results

Page 14: Protein Secondary Structures

Amino acid preferences in α-helix

Page 15: Protein Secondary Structures

Amino acid preferences in β-strand

Page 16: Protein Secondary Structures

Amino acid preferences in coil

Page 17: Protein Secondary Structures

Chou-Fasman

Name  P(a)  P(b)  P(turn)  f(i)   f(i+1)  f(i+2)  f(i+3)
Ala   142   83    66       0.06   0.076   0.035   0.058
Arg   98    93    95       0.070  0.106   0.099   0.085
Asp   101   54    146      0.147  0.110   0.179   0.081
Asn   67    89    156      0.161  0.083   0.191   0.091
Cys   70    119   119      0.149  0.050   0.117   0.128
Glu   151   37    74       0.056  0.060   0.077   0.064
Gln   111   110   98       0.074  0.098   0.037   0.098
Gly   57    75    156      0.102  0.085   0.190   0.152
His   100   87    95       0.140  0.047   0.093   0.054
Ile   108   160   47       0.043  0.034   0.013   0.056
Leu   121   130   59       0.061  0.025   0.036   0.070
Lys   114   74    101      0.055  0.115   0.072   0.095
Met   145   105   60       0.068  0.082   0.014   0.055
Phe   113   138   60       0.059  0.041   0.065   0.065
Pro   57    55    152      0.102  0.301   0.034   0.068
Ser   77    75    143      0.120  0.139   0.125   0.106
Thr   83    119   96       0.086  0.108   0.065   0.079
Trp   108   137   96       0.077  0.013   0.064   0.167
Tyr   69    147   114      0.082  0.065   0.114   0.125
Val   106   170   50       0.062  0.048   0.028   0.053

Page 18: Protein Secondary Structures

Chou-Fasman

1. Assign all of the residues in the peptide the appropriate set of parameters.

2. Scan through the peptide and identify regions where 4 out of 6 contiguous residues have P(a-helix) > 100. That region is declared an alpha-helix. Extend the helix in both directions until a set of four contiguous residues with an average P(a-helix) < 100 is reached. That is declared the end of the helix. If the segment defined by this procedure is longer than 5 residues and the average P(a-helix) > P(b-sheet) for that segment, the segment can be assigned as a helix.

3. Repeat this procedure to locate all of the helical regions in the sequence.

4. Scan through the peptide and identify a region where 3 out of 5 of the residues have a value of P(b-sheet) > 100. That region is declared a beta-sheet. Extend the sheet in both directions until a set of four contiguous residues with an average P(b-sheet) < 100 is reached. That is declared the end of the beta-sheet. Any segment of the region located by this procedure is assigned as a beta-sheet if the average P(b-sheet) > 105 and the average P(b-sheet) > P(a-helix) for that region.

5. Any region containing overlapping alpha-helical and beta-sheet assignments is taken to be helical if the average P(a-helix) > P(b-sheet) for that region. It is a beta-sheet if the average P(b-sheet) > P(a-helix) for that region.

6. To identify a bend at residue number j, calculate the following value:

p(t) = f(j) f(j+1) f(j+2) f(j+3)

where the f(i) column of the parameter table is used for residue j, the f(i+1) column for residue j+1, the f(i+2) column for residue j+2 and the f(i+3) column for residue j+3. If (1) p(t) > 0.000075; (2) the average value of P(turn) in the tetrapeptide is > 100 (1.00 on the unscaled propensity scale); and (3) the averages for the tetrapeptide obey the inequality P(a-helix) < P(turn) > P(b-sheet), then a beta-turn is predicted at that location.
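Two of the steps above, the helix nucleation scan of step 2 and the turn test of step 6, can be sketched as follows. This is a partial illustration rather than a full Chou-Fasman implementation: helix extension, the beta-sheet scan and overlap resolution are left out. The parameter values are copied from the table on the preceding slide; the names `CF`, `helix_nuclei` and `turn_at` are made up for this sketch, and the P(turn) threshold is taken as 100 on the assumption that the slide's 1.00 refers to the unscaled propensities.

```python
# Per residue: P(a), P(b), P(turn), f(i), f(i+1), f(i+2), f(i+3) from the slide's table.
CF = {
    "A": (142, 83, 66, 0.06, 0.076, 0.035, 0.058),
    "R": (98, 93, 95, 0.070, 0.106, 0.099, 0.085),
    "D": (101, 54, 146, 0.147, 0.110, 0.179, 0.081),
    "N": (67, 89, 156, 0.161, 0.083, 0.191, 0.091),
    "C": (70, 119, 119, 0.149, 0.050, 0.117, 0.128),
    "E": (151, 37, 74, 0.056, 0.060, 0.077, 0.064),
    "Q": (111, 110, 98, 0.074, 0.098, 0.037, 0.098),
    "G": (57, 75, 156, 0.102, 0.085, 0.190, 0.152),
    "H": (100, 87, 95, 0.140, 0.047, 0.093, 0.054),
    "I": (108, 160, 47, 0.043, 0.034, 0.013, 0.056),
    "L": (121, 130, 59, 0.061, 0.025, 0.036, 0.070),
    "K": (114, 74, 101, 0.055, 0.115, 0.072, 0.095),
    "M": (145, 105, 60, 0.068, 0.082, 0.014, 0.055),
    "F": (113, 138, 60, 0.059, 0.041, 0.065, 0.065),
    "P": (57, 55, 152, 0.102, 0.301, 0.034, 0.068),
    "S": (77, 75, 143, 0.120, 0.139, 0.125, 0.106),
    "T": (83, 119, 96, 0.086, 0.108, 0.065, 0.079),
    "W": (108, 137, 96, 0.077, 0.013, 0.064, 0.167),
    "Y": (69, 147, 114, 0.082, 0.065, 0.114, 0.125),
    "V": (106, 170, 50, 0.062, 0.048, 0.028, 0.053),
}

def helix_nuclei(seq, window=6, needed=4):
    """Start indices of windows where at least `needed` residues have P(a) > 100."""
    starts = []
    for i in range(len(seq) - window + 1):
        hits = sum(1 for aa in seq[i:i + window] if CF[aa][0] > 100)
        if hits >= needed:
            starts.append(i)
    return starts

def turn_at(seq, j):
    """Apply the three turn conditions of step 6 to the tetrapeptide seq[j:j+4]."""
    tetra = seq[j:j + 4]
    if len(tetra) < 4:
        return False
    rows = [CF[aa] for aa in tetra]
    p_t = 1.0
    for offset, row in enumerate(rows):
        p_t *= row[3 + offset]  # f(i) for residue j, f(i+1) for j+1, ...
    avg_a = sum(r[0] for r in rows) / 4
    avg_b = sum(r[1] for r in rows) / 4
    avg_turn = sum(r[2] for r in rows) / 4
    return p_t > 0.000075 and avg_turn > 100 and avg_a < avg_turn > avg_b
```

For instance, `turn_at("NPDG", 0)` satisfies all three conditions, while a hydrophobic stretch such as `"VIAL"` fails on p(t) and on the average P(turn).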

Page 19: Protein Secondary Structures

Chou-Fasman

• Generally applicable
• Works for sequences with no solved homologs
• Low accuracy

Page 20: Protein Secondary Structures

Neural Networks

• Benefits
– Generally applicable
– Can capture higher order correlations
– Inputs other than sequence information
• Drawbacks
– Needs a lot of data (different solved structures)
– Risk of overtraining

Page 21: Protein Secondary Structures

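Architecture

[Figure: feed-forward neural network. A window (IKEEHVI IQAE) slides along the sequence IKEEHVIIQAEFYLNPDQSGEF…..; the window feeds the input layer, which is connected through weights to a hidden layer and to an output layer with the three classes H, E and C.]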
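The sliding window in the architecture figure can be sketched as a generator that yields one window per residue, centred on the residue whose class is being predicted. How the slides' networks treat the chain termini is not stated, so the padding symbol used here is an assumption, as is the function name `windows`.

```python
def windows(seq, size, pad="-"):
    """Yield (centre_index, window) for every residue, padding past the termini."""
    if size % 2 == 0:
        raise ValueError("window size must be odd so that it has a central residue")
    half = size // 2
    padded = pad * half + seq + pad * half
    for i in range(len(seq)):
        yield i, padded[i:i + size]

# The 11-residue window shown on the slide is centred on the sixth residue (index 5).
sequence = "IKEEHVIIQAEFYLNPDQSGEF"
centre, window = list(windows(sequence, 11))[5]   # (5, "IKEEHVIIQAE")
```

Each window is then encoded numerically before it reaches the input layer; the output for that window is the predicted class of the central residue.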

Page 22: Protein Secondary Structures

Sparse encoding

Inp Neuron  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
AAcid
A           1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
R           0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
N           0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
D           0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
C           0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
Q           0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
E           0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0
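The sparse encoding in the table gives each amino acid its own input neuron. A minimal sketch follows; the table shows only the first seven residues, so the order of the remaining thirteen is assumed here to follow the column order of the BLOSUM 62 slide, and the function names are illustrative.

```python
# Assumed residue order: the seven shown in the table, then the BLOSUM 62 column order.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def sparse_encode(residue):
    """20-element vector with a single 1 at the residue's position."""
    vector = [0] * len(AMINO_ACIDS)
    vector[AMINO_ACIDS.index(residue)] = 1  # raises ValueError for unknown symbols
    return vector

def encode_window(window):
    """Concatenate the sparse vectors of all residues in a window."""
    flat = []
    for residue in window:
        flat.extend(sparse_encode(residue))
    return flat

inputs = encode_window("IKEEHVIIQAE")   # 11 residues x 20 neurons = 220 inputs
```

With this encoding an 11-residue window needs 220 input neurons, exactly 11 of which are active for any given window.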

Page 23: Protein Secondary Structures

Input Layer

[Figure: the window IKEEHVI IQAE presented to the input layer in sparse encoding; each residue contributes a column of 0s containing a single 1.]

Page 24: Protein Secondary Structures

BLOSUM 62

   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  B  Z  X  *
A  4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0 -2 -1  0 -4
R -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3 -1  0 -1 -4
N -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3  3  0 -1 -4
D -2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3  4  1 -1 -4
C  0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1 -3 -3 -2 -4
Q -1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2  0  3 -1 -4
E -1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4
G  0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3 -1 -2 -1 -4
H -2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3  0  0 -1 -4
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3 -3 -3 -1 -4
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1 -4 -3 -1 -4
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2  0  1 -1 -4
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1 -3 -1 -1 -4
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1 -3 -3 -1 -4
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2 -2 -1 -2 -4
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2  0  0  0 -4
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0 -1 -1  0 -4
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3 -4 -3 -2 -4
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1 -3 -2 -1 -4
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4 -3 -2 -1 -4

Page 25: Protein Secondary Structures

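Input Layer

[Figure: the window IKEEHVI IQAE presented to the input layer as BLOSUM 62 scores; each residue contributes a column of substitution scores instead of 0s and a single 1.]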

Page 26: Protein Secondary Structures

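Structure to Structure

[Figure: second-level network. The H, E, C outputs predicted along the sequence IKEEHVIIQAEFYLNPDQSGEF….. (HEC HEC HEC) are read through a window into the input layer, which is connected through weights to a hidden layer and to an output layer giving H, E and C.]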

Page 27: Protein Secondary Structures

PHD method (Rost and Sander)

• Combine neural networks with sequence profiles
– 6-8 percentage points increase in prediction accuracy over standard neural networks
• Use second layer “Structure to structure” network to filter predictions
• Jury of predictors
• Set up as mail server

Page 28: Protein Secondary Structures

Position specific scoring matrices (BLAST profiles)

Pos    A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
1   I -2 -4 -5 -5 -2 -4 -4 -5 -5  6  0 -4  0 -2 -4 -4 -2 -4 -3  4
2   K -1 -1 -2 -2 -3 -1  3 -3 -2 -2 -3  4 -2 -4 -3  1  1 -4 -3  2
3   E  5 -3 -3 -3 -3  3  1 -2 -3 -3 -3 -2 -2 -4 -3 -1 -2 -4 -3  1
4   E -4 -3  2  5 -6  1  5 -4 -3 -6 -6 -2 -5 -6 -4 -2 -3 -6 -5 -5
5   H -4  2  1  1 -5  1 -2 -4  9 -5 -2 -3 -4 -4 -5 -3 -4 -5  1 -5
6   V -3  0 -4 -5 -4 -4 -2 -3 -5  1 -2  1  0  1 -4 -3  3 -5 -3  5
7   I  0 -2 -4  1 -4 -2 -4 -4 -5  1  0 -2  0  2 -5  1 -1 -5 -3  4
8   I -3  0 -5 -5 -4 -2 -5 -6  1  2  4 -4 -1  0 -5 -2  0 -3  5 -1
9   Q -2 -3 -2 -3 -5  4 -1  3  5 -5 -3 -3 -4 -2 -4  2 -1 -4  2 -2
10  A  2 -4 -4 -3  2 -3 -1 -4 -2  1 -1 -4 -3 -4  1  2  3 -5 -1  1
11  E -1  3  1  1 -1  0  1 -4 -3 -1 -3  0  3 -5  4 -1 -3 -6 -3 -1
12  F -3 -5 -5 -5 -4 -4 -4 -1 -1  1  1 -5  2  5 -1 -4 -4 -3  5  2
13  Y  3 -5 -5 -6  3 -4 -5 -2 -1  0 -4 -5 -3  3 -5 -2 -2 -2  7  1
14  L -1 -3 -4 -2  1  5  1 -1 -1 -1  1 -3 -3  1 -5 -1 -1 -2  3 -2
15  N -1 -4  4  1  5 -3 -4  2 -4 -4 -4 -3 -2 -4 -5  2  0 -5  0  0
16  P -2  4 -4 -4 -5  0 -3  3  2 -5 -4  0 -4 -3  0  1 -2 -1  5 -3
17  D -3 -2  1  5 -6 -2  2  2 -1 -2 -2 -3 -5 -4 -5 -1  2 -6 -3 -4

Page 29: Protein Secondary Structures

PSI-Pred (Jones, DT)

• Use alignments from iterative sequence searches (PSI-Blast) as input to a neural network

• Better predictions due to better sequence profiles

• Available as a stand-alone program and via the web

Page 30: Protein Secondary Structures

Benchmarking secondary structure predictions

• CASP
– Critical Assessment of Structure Predictions
– Sequences from about-to-be-solved structures are given to groups who submit their predictions before the structure is published
• EVA
– Newly solved structures are sent to prediction servers

Page 31: Protein Secondary Structures

EVA results (Rost et al., 2001)

• PROFphd 77.0%
• PSIPRED 76.8%
• SAM-T99sec 76.1%
• SSpro 76.0%
• Jpred2 75.5%
• PHD 71.7%

– Cubic.columbia.edu/eva

Page 32: Protein Secondary Structures

Output expansion

[Figure: the same network layout as in the architecture figure (window IKEEHVI IQAE over the sequence IKEEHVIIQAEFYLNPDQSGEF….., input layer, hidden layer, output layer, weights), but with three sets of H, E, C outputs instead of one.]

Page 33: Protein Secondary Structures

Several different architectures

• Sequence-to-structure
– Window sizes 15, 17, 19 and 21
– Hidden units 50 and 75
– 10-fold cross validation => 80 predictions

• Structure-to-structure
– Window size 17
– Hidden units 40
– 10-fold cross validation => 800 predictions
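The counts of 80 and 800 predictions follow from multiplying out the listed options. The short enumeration below reproduces the arithmetic; it assumes that every first-level prediction is passed through each of the 10 cross-validated structure-to-structure networks, which is consistent with the totals on the slide but is not stated explicitly.

```python
from itertools import product

window_sizes = (15, 17, 19, 21)
hidden_units = (50, 75)
folds = range(10)  # 10-fold cross validation

# 4 window sizes x 2 hidden-layer sizes x 10 folds
sequence_to_structure = list(product(window_sizes, hidden_units, folds))

# One second-level architecture (window 17, 40 hidden units) trained in 10 folds,
# assumed to be applied to each of the first-level predictions.
structure_to_structure = [(first, fold) for first in sequence_to_structure for fold in range(10)]

n_first = len(sequence_to_structure)     # 4 * 2 * 10 = 80
n_second = len(structure_to_structure)   # 80 * 10 = 800
```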

Page 34: Protein Secondary Structures

Balloting procedure

• Confidence of a per residue prediction
– P(Highest) – P(second highest)
– H: 0.80 E: 0.05 C: 0.15 => conf. = 0.65

• Mean per chain confidence for all 800 predictions
– Calculate mean and standard deviation
– Averaging of per chain predictions with
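The confidence measure used in the balloting can be sketched as below. Only the parts spelled out on the slide are implemented: the per-residue confidence, the mean per chain, and the mean and standard deviation over a set of predictions. How these statistics select or weight the predictions that are finally averaged is cut off in the transcript and is not reconstructed here. The function names and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev

def residue_confidence(probs):
    """Highest class probability minus the second highest."""
    ranked = sorted(probs.values(), reverse=True)
    return ranked[0] - ranked[1]

def chain_confidence(chain):
    """Mean per-residue confidence over one predicted chain."""
    return mean([residue_confidence(p) for p in chain])

def summarize(chain_confidences):
    """Mean and (population) standard deviation over many predictions of a chain."""
    return mean(chain_confidences), pstdev(chain_confidences)

# The example from the slide: 0.80 - 0.15 = 0.65
example = residue_confidence({"H": 0.80, "E": 0.05, "C": 0.15})
```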

Page 35: Protein Secondary Structures

Activities to probabilities

[Figure: coil conversion table. Helix activities and strand activities, each binned from 0.05 to 1.0 in steps of 0.05, index a table of coil probabilities; the entries legible on the slide are 0.99, 0.9, 0.83 and 0.75.]

Page 36: Protein Secondary Structures

EVA (400 low homology proteins)

Ranking  Group name        Q3 Performance
1        SBI-AT            77.2 %
2        PROFsec B.Rost    76.3 %
3        Psi-pred D.Jones  76.2 %

• Sequence profiles as input
• Neural network technology
• Balloting of large number of neural network predictions (0.2%)
• Output expansion (0.5%)
• Probability transformation (1.2%)

Petersen et al., Proteins, 41: 17-20, 2000

Page 37: Protein Secondary Structures

Links to servers

• Database of links
– http://mmtsb.scripps.edu/cgi-bin/renderrelres?protmodel

• ProfPHD
– http://cubic.bioc.columbia.edu/

• PSIPRED
– http://bioinf.cs.ucl.ac.uk/psipred/

• JPred
– www.compbio.dundee.ac.uk/Software/JPred/jpred.html

Page 38: Protein Secondary Structures

Practical Conclusion

• If you need a secondary structure prediction, use one of the newer methods, such as
– ProfPHD,
– PSIPRED, and
– JPred

• And not one of the older ones, such as
– Chou-Fasman, and
– Garnier