1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.
-
Upload
neal-flynn -
Category
Documents
-
view
239 -
download
1
Transcript of 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.
![Page 1: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/1.jpg)
1
CAP5510 – BioinformaticsProtein Structures
Tamer Kahveci
CISE Department
University of Florida
![Page 2: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/2.jpg)
2
What and Why?
• Proteins fold into a three dimensional shape
• Structure can reveal functional information that we can not find from sequence
• Misfolding proteins can cause diseases
– Sickle cell anemia, mad cow disease
• Used in drug design
Hemoglobin
Normal v.s. sickled blood cells
E → V
HIV proteaseinhibitor
![Page 3: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/3.jpg)
3
Goals
• Understand protein structures• Primary, secondary, tertiary
• Learn how protein shapes are– determined– Predicted
• Structure comparison (?)
![Page 4: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/4.jpg)
4
A Protein Sequence
>gi|22330039|ref|NP_683383.1| unknown protein; protein id: At1g45196.1 [Arabidopsis thaliana]
MPSESSYKVHRPAKSGGSRRDSSPDSIIFTPESNLSLFSSASVSVDRCSSTSDAHDRDDSLISAWKEEFEVKKDDESQNL
DSARSSFSVALRECQERRSRSEALAKKLDYQRTVSLDLSNVTSTSPRVVNVKRASVSTNKSSVFPSPGTPTYLHSMQKGW
SSERVPLRSNGGRSPPNAGFLPLYSGRTVPSKWEDAERWIVSPLAKEGAARTSFGASHERRPKAKSGPLGPPGFAYYSLY
SPAVPMVHGGNMGGLTASSPFSAGVLPETVSSRGSTTAAFPQRIDPSMARSVSIHGCSETLASSSQDDIHESMKDAATDA
QAVSRRDMATQMSPEGSIRFSPERQCSFSPSSPSPLPISELLNAHSNRAEVKDLQVDEKVTVTRWSKKHRGLYHGNGSKM
![Page 5: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/5.jpg)
5
• Basic Amino AcidStructure:– The side chain, R,
varies for each ofthe 20 amino acids
C
RR
C
H
NO
OHH
H
Aminogroup
Carboxylgroup
Side chain
Amino Acid Composition
![Page 6: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/6.jpg)
6
The Peptide Bond
• Dehydration synthesis• Repeating backbone: N–C –C –N–C –C
– Convention – start at amino terminus and proceed to carboxy terminus
O O
![Page 7: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/7.jpg)
7
Peptidyl polymers
• A few amino acids in a chain are called a polypeptide. A protein is usually composed of 50 to 400+ amino acids.
• We call the units of a protein amino acid residues.
carbonylcarbonylcarboncarbon
amideamidenitrogennitrogen
![Page 8: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/8.jpg)
8
Side chain properties
• Carbon does not make hydrogen bonds with water easily – hydrophobic
• O and N are generally more likely than C to h-bond to water – hydrophilic
• We group the amino acids into three general groups:– Hydrophobic– Charged (positive/basic & negative/acidic)– Polar
![Page 9: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/9.jpg)
9
The Hydrophobic Amino Acids
![Page 10: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/10.jpg)
10
The Charged Amino Acids
![Page 11: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/11.jpg)
11
The Polar Amino Acids
![Page 12: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/12.jpg)
12
More Polar Amino Acids
And then there’s…And then there’s…
![Page 13: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/13.jpg)
13
Phi () – the angle of rotation about the N-C bond.
Psi () – the angle of rotation about the C-C bond.
The planar bond angles and bond lengths are fixed.
Planarity of the Peptide Bond
![Page 14: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/14.jpg)
14
• Primary structure = the linear sequence of amino acids comprising a protein:
AGVGTVPMTAYGNDIQYYGQVT…• Secondary structure
– Regular patterns of hydrogen bonding in proteins result in two patterns that emerge in nearly every protein structure known: the -helix and the-sheet
– The location of direction of these periodic, repeating structures is known as the secondary structure of the protein
Primary & Secondary Structure
![Page 15: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/15.jpg)
15
60°
The Alpha Helix
![Page 16: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/16.jpg)
16
Properties of the Alpha Helix
60°• Hydrogen bonds
between C=O ofresidue n, andNH of residuen+4
• 3.6 residues/turn• 1.5 Å/residue rise• 100°/residue turn
![Page 17: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/17.jpg)
17
Properties of -helices
• 4 – 40+ residues in length
• Often amphipathic or “dual-natured”– Half hydrophobic and half hydrophilic
• If we examine many -helices,we find trends…– Helix formers: Ala, Glu, Leu, Met– Helix breakers: Pro, Gly, Tyr, Ser
![Page 18: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/18.jpg)
18
The beta strand (& sheet)
135° +135°
![Page 19: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/19.jpg)
19
Properties of beta sheets
• Formed of stretches of 5-10 residues in extended conformation
• Parallel/aniparallel,contiguous/non-contiguous
![Page 20: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/20.jpg)
20
Anti-Parallel Beta Sheets
![Page 21: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/21.jpg)
21
Parallel Beta Sheets
![Page 22: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/22.jpg)
22
Mixed Beta Sheets
![Page 23: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/23.jpg)
23
Turns and Loops
• Secondary structure elements are connected by regions of turns and loops
• Turns – short regions of non-, non- conformation
• Loops – larger stretches with no secondary structure. – Sequences vary much more than secondary
structure regions
![Page 24: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/24.jpg)
24
Ramachandran Plot
![Page 25: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/25.jpg)
25
Levels of Protein
Structure
• Secondary structure elements combine to form tertiary structure
• Quaternary structure occurs in multienzyme complexes
![Page 26: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/26.jpg)
26
Protein Structure Example
Beta Sheet
Helix Loop
ID: 12as2 chains
![Page 27: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/27.jpg)
27
Wireframe Ball and stick
Views of a Protein
![Page 28: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/28.jpg)
28
Views of a protein
Spacefill Cartoon
CPK colors
Carbon = green, black, or grey
Nitrogen = blue
Oxygen = red
Sulfur = yellow
Hydrogen = white
![Page 29: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/29.jpg)
29
Common Protein Motifs
![Page 30: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/30.jpg)
30
• Four helical bundle:
Globin domain:
Mostly Helical Folding Motifs
![Page 31: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/31.jpg)
31
/ barrel:
/ Motifs
![Page 32: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/32.jpg)
32
Open TwistedBeta Sheets
![Page 33: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/33.jpg)
33
Beta Barrels
![Page 34: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/34.jpg)
34
Determining the Structure of a Protein
Experimental Methods
•X-ray
•NMR
As of August 2013, structure of > 85,000 proteins are determined
![Page 35: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/35.jpg)
35
X-Ray Crystallography
Crystals diffract X-rays in regular patterns (Max Von Laue, 1912)
Discovery of X-rays(Wilhelm Conrad Röntgen, 1895)
The first X-ray diffraction pattern from a protein crystal (Dorothy Hodgkin, 1934)
![Page 36: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/36.jpg)
36
X-Ray Crystallography
• Grow millions of protein crystals– Takes months
• Expose to radiation beam• Analyze the image with
computer– Average over many copies
of images• PDB• Not all proteins can be
crystallized!
![Page 37: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/37.jpg)
37
NMR
• Nuclear Magnetic Resonance• Nuclei of atoms vibrate when exposed to
oscillating magnetic field• Detect vibrations by external sensors• Computes inter-atomic distances.
• Requires complex analysis. NMR can be used for short sequences (<200 residues)
• More than one model can be derived from NMR.
![Page 38: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/38.jpg)
38
Determining the Structure of a Protein
Computational Methods
![Page 39: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/39.jpg)
39
The Protein Folding Problem
• Central question of molecular biology:“Given a particular sequence of amino acid residues (primary structure), what will the secondary/tertiary/quaternary structure of the resulting protein be?”
• Input: AAVIKYGCAL…Output: 11, 22…
![Page 40: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/40.jpg)
40
Structure v.s. Sequence
• Observation: A protein with the same sequence (under the same circumstances) yields the same shape.
• Protein folds into a shape that minimizes the energy needed to stay in that shape.
• Protein folds in ~10-15 seconds.
![Page 41: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/41.jpg)
41
Secondary Structure Prediction
![Page 42: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/42.jpg)
42
Chou-Fasman methods
• Uses statistically obtained Chou-Fasman parameters.
• For each amino acid has– P(a): alpha– P(b): beta– P(t): turn– f(): additional turn parameter.
![Page 43: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/43.jpg)
43
Chou-Fasman ParametersName Abbrv P(a) P(b) P(turn) f(i) f(i+1) f(i+2) f(i+3)Alanine A 142 83 66 0.06 0.076 0.035 0.058Arginine R 98 93 95 0.07 0.106 0.099 0.085Aspartic Acid D 101 54 146 0.147 0.11 0.179 0.081Asparagine N 67 89 156 0.161 0.083 0.191 0.091Cysteine C 70 119 119 0.149 0.05 0.117 0.128Glutamic Acid E 151 37 74 0.056 0.06 0.077 0.064Glutamine Q 111 110 98 0.074 0.098 0.037 0.098Glycine G 57 75 156 0.102 0.085 0.19 0.152Histidine H 100 87 95 0.14 0.047 0.093 0.054Isoleucine I 108 160 47 0.043 0.034 0.013 0.056Leucine L 121 130 59 0.061 0.025 0.036 0.07Lysine K 114 74 101 0.055 0.115 0.072 0.095Methionine M 145 105 60 0.068 0.082 0.014 0.055Phenylalanine F 113 138 60 0.059 0.041 0.065 0.065Proline P 57 55 152 0.102 0.301 0.034 0.068Serine S 77 75 143 0.12 0.139 0.125 0.106Threonine T 83 119 96 0.086 0.108 0.065 0.079Tryptophan W 108 137 96 0.077 0.013 0.064 0.167Tyrosine Y 69 147 114 0.082 0.065 0.114 0.125Valine V 106 170 50 0.062 0.048 0.028 0.053
![Page 44: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/44.jpg)
44
C.-F. Alpha Helix Prediction (1)
• Find P(a) for all letters• Find 6 contiguous letters, at least 4 of them have P(a) >
100• Declare these regions as alpha helix
A E A T T L C M Q S T Y C Y V
142 151 142 83 83 121 70 145 111 77 83 69 70 69 106
83 37 83 119 119 130 119 105 110 75 119 147 119 147 170
P(a)
P(b)
![Page 45: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/45.jpg)
45
C.-F. Alpha Helix Prediction (2)
• Extend in both directions until 4 consecutive letters with P(a) < 100 found
A E A T T L C M Q S T Y C Y V
142 151 142 83 83 121 70 145 111 77 83 69 70 69 106
83 37 83 119 119 130 119 105 110 75 119 147 119 147 170
P(a)
P(b)
![Page 46: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/46.jpg)
46
C.-F. Alpha Helix Prediction (3)
• Find sum of P(a) (Sa) and sum of P(b) (Sb) in the extended region– If region is long enough ( >= 5 letters) and P(a) > P(b) then
declare the extended region as alpha helix
A E A T T L C M Q S T Y C Y V
142 151 142 83 83 121 70 145 111 77 83 69 70 69 106
83 37 83 119 119 130 119 105 110 75 119 147 119 147 170
P(a)
P(b)
![Page 47: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/47.jpg)
47
C.-F. Beta Sheet Prediction
• Same as alpha helix replace P(a) with P(b)
• Resolving overlapping alpha helix & beta sheet– Compute sum of P(a) (Sa) and sum of P(b)
(Sb) in the overlap.– If Sa > Sb => alpha helix– If Sb > Sa => beta sheet
![Page 48: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/48.jpg)
48
C.-F. Turn Prediction
• An amino acid is predicted as turn if all of the following holds:– f(i)*f(i+1)*f(i+2)*f(i+3) > 0.000075– Avg(P(i+k)) > 100, for k=0, 1, 2, 3– Sum(P(t)) > Sum(P(a)) and Sum(P(b)) for i+k, (k=0, 1, 2, 3)
f()
A E A T T L C M Q S T Y C Y V
142 151 142 83 83 121 70 145 111 77 83 69 70 69 106
83 37 83 119 119 130 119 105 110 75 119 147 119 69 170
66 74 66 96 96 59 119 60 98 143 96 114 119 114 50
i i+1 i+2 i+3
P(a)
P(b)
P(t)
![Page 49: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/49.jpg)
49
Other Methods for SSE Prediction
• Similarity searching– Predator
• Markov chain
• Neural networks– PHD
• ~65% to 80% accuracy
![Page 50: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/50.jpg)
50
Tertiary Structure Prediction
![Page 51: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/51.jpg)
51
Forces driving protein folding
• It is believed that hydrophobic collapse is a key driving force for protein folding– Hydrophobic core– Polar surface interacting with solvent
• Minimum volume (no cavities)• Disulfide bond formation stabilizes• Hydrogen bonds• Polar and electrostatic interactions
![Page 52: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/52.jpg)
52
• Simple lattice models (HP-models or Hydrophobic-Polar models)– Two types of residues:
hydrophobic and polar– 2-D or 3-D lattice– The only force is
hydrophobic collapse– Score = number of
HH contacts
Fold Optimization
![Page 53: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/53.jpg)
53
Scoring Lattice Models
• H/P model scoring: count noncovalent hydrophobic interactions.
• Sometimes:– Penalize for buried polar or surface hydrophobic residues
![Page 54: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/54.jpg)
54
• For smaller polypeptides, exhaustive search can be used– Looking at the “best” fold, even in such a simple
model, can teach us interesting things about the protein folding process
• For larger chains, other optimization and search methods must be used– Greedy, branch and bound– Evolutionary computing, simulated annealing
Can we use lattice models?
![Page 55: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/55.jpg)
55
The “hydrophobic zipper” effect
Ken Dill ~ 1997
![Page 56: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/56.jpg)
56
Representing a lattice model
• Absolute directions– UURRDLDRRU
• Relative directions– LFRFRRLLFFL– Advantage, we can’t have
UD or RL in absolute– Only three directions: LRF
• What about bumps? LFRRR– Bad score– Use a better representation
![Page 57: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/57.jpg)
57
Preference-order representation
• Each position has two “preferences”– If it can’t have either of the
two, it will take the “least favorite” path if possible
• Example: {LR},{FL},{RL},{FR},{RL},{RL},{FL},{RF}
• Can still cause bumps:{LF},{FR},{RL},{FL},{RL},{FL},{RF},{RL},{FL}
![Page 58: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/58.jpg)
58
More realistic models
• Higher resolution lattices (45° lattice, etc.)• Off-lattice models
– Local moves– Optimization/search methods and /
representations• Greedy search• Branch and bound• EC, Monte Carlo, simulated annealing, etc.
![Page 59: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/59.jpg)
59
How to Evaluate the Result?
• Now that we have a more realistic off-lattice model, we need a better energy function to evaluate a conformation (fold).
• Theoretical force field: G = Gvan der Waals + Gh-bonds + Gsolvent + Gcoulomb
• Empirical force fields– Start with a database– Look at neighboring residues – similar to known
protein folds?
![Page 60: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/60.jpg)
60
Comparative Modeling
1. Identify similar protein sequences from a database of known proteins (BLAST)
2. Find conserved regions by aligning these proteins (CLUSTAL-W)
3. Predict alpha helices and beta sheets from conserved regions, backbone
4. Predict loops5. Predict side chain positions6. Evaluate
![Page 61: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/61.jpg)
61
Threading: Fold recognition
• Given:– Sequence: IVACIVSTEYDVMKAAR…
– A database of molecular coordinates
• Map the sequence onto each fold
• Evaluate– Objective 1: improve
scoring function– Objective 2: folding
![Page 62: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/62.jpg)
62
Folding : still a hard problem
• Levinthal’s paradox – Consider a 100 residue protein. If each residue can take only 3 positions, there are 3100 = 5 1047 possible conformations.– If it takes 10-13s to convert from 1 structure to
another, exhaustive search would take 1.6 1027 years.
![Page 63: 1 CAP5510 – Bioinformatics Protein Structures Tamer Kahveci CISE Department University of Florida.](https://reader036.fdocuments.us/reader036/viewer/2022062801/56649e7d5503460f94b80c7c/html5/thumbnails/63.jpg)
63
Protein Classification
• Class: Similar secondary structure properties– All alpha, all beta, alpha/beta, alpha+beta
• Fold: major secondary structure similarity.– Globin like (6 helices, folded leaf, partly opened)
• Super family: distant homologs. 25-30% sequence identity.
• Family: close homologs. Evolved from the same ancestor. High identity.