Date posted: 19-Dec-2015
Finding the Beta Helix Motif By Marcin Mejran
Papers
Predicting the β-Helix Fold from Protein Sequence Data by Phil Bradley, Lenore Cowen, Matthew Menke, Jonathan King, Bonnie Berger
Segmentation Conditional Random Fields (SCRFs): A New Approach for Protein Fold Recognition by Yan Liu, Jaime Carbonell, Peter Weigele, and Vanathi Gopalakrishnan
Secondary Structure
Beta Strand
• Forms β-sheets
Alpha Helix
• Stand alone
Can combine into more complex structures:
• Beta sheets
• Beta helixes
Images from: http://www.people.virginia.edu/~rjh9u/prot2ndstruct.html
Second and a half Structure
beta helix
beta barrel
beta trefoil
β-Helix
• Helix composed of three parallel sheets
• Three β-strands per “rung”
• Connecting “loops”
• Not in eukaryotes
• Secreted by various bacteria
• Right and left handed
β-Helix
• Few solved structures
• 9 SCOP superfamilies
• 14 RH solved structures in PDB
• Solved structures differ widely
β-Helix
[Figure: one rung of the helix, with strands B1, B2, B3 and turn T2 labeled]
• T2 turn: unique two-residue loop
• β-strands are 3 to 5 residues
• T1 and T3 vary in size, may contain secondary structures
• β-strands interact between rungs
β-Helix
Good choice from a computational point of view
• “Nice” structure
• Repeating parallel β-strands
• Rungs have similar structure
• Stacking is predictable
• Well-conserved β-strand across superfamilies
β-Helix
• Long-term interactions
  • Close in 3D but not 1D
• “Non-unique” features
  • B2-T2-B3 segment
• Unique features not clearly shown in sequence
• Usual methods don’t work
Image from: http://www.cryst.bbk.ac.uk/PPS2/course/section10/all_beta.html
BetaWrap
• “Wraps” sequences around helix
• Finds best “wrap”
• Uses B2, B3 strands and T2 turn
  • Rest of rung varies greatly in size
• Decomposes into sub-problems
  • Rungs
  • Find multiple rungs
  • Find B1 by local optimization
Hydrophobic/charged
• Hydrophobic: dislikes water
• Hydrophilic: likes water
• Charged: on outside
[Figure: one rung with B1, B2, B3 and T2 labeled]
Image from: http://betawrap.lcs.mit.edu/BetaTalk.ppt
BetaWrap: Rungs
• Given a T2 turn, find the next T2 turn
[Figure: B2, T2 and B3 of one rung shown against a candidate rung]
Image from: http://betawrap.lcs.mit.edu/BetaTalk.ppt
BetaWrap: Rungs
• More weight given to inward pairs
• Certain stacked amino acids preferred
• Penalty for highly charged inward residues
• Penalizes too few or too many residues
[Figure: one rung with B1, B2, B3 and T2 labeled]
Image from: http://betawrap.lcs.mit.edu/BetaTalk.ppt
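The scoring ideas on this slide can be illustrated with a toy function. This is only a sketch under stated assumptions: the weights, the penalty values and the function names are hypothetical, and the hydrophobic-pair test stands in for the stacking preferences, which the slides do not spell out.

```python
# Toy rung-pair score. All numeric values and names are hypothetical.
HYDROPHOBIC = frozenset("AFILMVWY")  # hydrophobic class from the slides
CHARGED = frozenset("DERK")          # charged class from the slides


def rung_pair_score(strand_a, strand_b, inward,
                    w_inward=2.0, w_outward=1.0, charge_penalty=3.0):
    """Score two stacked strands position by position.

    inward[i] is True when position i points into the helix core.
    Stacked hydrophobic pairs are rewarded, with more weight when they
    point inward; charged residues that point inward are penalized.
    """
    if not (len(strand_a) == len(strand_b) == len(inward)):
        raise ValueError("strands and inward mask must have equal length")
    score = 0.0
    for a, b, is_in in zip(strand_a, strand_b, inward):
        weight = w_inward if is_in else w_outward
        if a in HYDROPHOBIC and b in HYDROPHOBIC:
            score += weight
        if is_in:
            score -= charge_penalty * ((a in CHARGED) + (b in CHARGED))
    return score


def gap_penalty(n_residues, lo, hi, per_residue=1.0):
    """Penalize a rung whose residue count falls outside [lo, hi]."""
    return per_residue * max(lo - n_residues, 0, n_residues - hi)
```

A full rung score in this sketch would be the pair score minus the gap penalty for the residues between the two T2 turns.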
BetaWrap: Multiple Rungs
Find multiple initial B2-T2-B3 segments
Match pattern based on hydrophobic residues (appear on the inside)
Φ – A,F,I,L,M,V,W,Y
– D,E,R,K
X - Any
AFDEMVRKYE FIFDDEAK EDEMVMVFD
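A class-based pattern scan of this kind might look like the sketch below. The residue classes come from the legend above; the template string `"HXHXXC"` and the one-letter class codes (`H` hydrophobic, `C` charged, `X` any) are hypothetical and chosen only for illustration, not taken from BetaWrap.

```python
# Scan a sequence for windows that fit a residue-class template.
HYDROPHOBIC = frozenset("AFILMVWY")
CHARGED = frozenset("DERK")
CLASSES = {"H": HYDROPHOBIC, "C": CHARGED}  # "X" matches any residue


def matches_at(sequence, start, template):
    """True if the window starting at `start` fits every template symbol."""
    window = sequence[start:start + len(template)]
    if len(window) < len(template):
        return False
    return all(sym == "X" or res in CLASSES[sym]
               for sym, res in zip(template, window))


def find_matches(sequence, template):
    """Return every start index whose window fits the template."""
    last_start = len(sequence) - len(template)
    return [i for i in range(last_start + 1)
            if matches_at(sequence, i, template)]


# Example sequence from the slide, with the spaces removed.
starts = find_matches("AFDEMVRKYEFIFDDEAKEDEMVMVFD", "HXHXXC")
```

Each returned index is a candidate starting point from which rungs could then be searched in both directions.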
BetaWrap: Multiple Rungs
• Dynamic programming (DP) is used to find 5 rungs in either direction from initial positions
• α-helix filtering
• Take average score of top 10 remaining wraps
Image from: http://betawrap.lcs.mit.edu/BetaTalk.ppt
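The dynamic programming step can be pictured as chaining candidate rung positions so that the summed rung-to-rung score is maximal. The sketch below is a simplified, one-directional version under assumptions: `candidates`, `pair_score`, `min_gap` and `max_gap` are hypothetical names, and the real method scores B2-T2-B3 alignments rather than bare positions.

```python
def best_wrap(candidates, pair_score, n_rungs, min_gap, max_gap):
    """Pick n_rungs increasing positions that maximize the summed pair score.

    best[r][j] holds (score, previous index) for the best chain of r + 1
    rungs ending at candidate j. Returns (score, positions) or None.
    """
    cands = sorted(candidates)
    best = [{j: (0.0, None) for j in range(len(cands))}]
    for r in range(1, n_rungs):
        layer = {}
        for j, pos in enumerate(cands):
            options = [
                (best[r - 1][i][0] + pair_score(cands[i], pos), i)
                for i in best[r - 1]
                if min_gap <= pos - cands[i] <= max_gap
            ]
            if options:
                layer[j] = max(options)
        best.append(layer)
    if not best[-1]:
        return None
    # Trace back from the best-scoring final rung.
    j = max(best[-1], key=lambda k: best[-1][k][0])
    score = best[-1][j][0]
    path = []
    for r in range(n_rungs - 1, -1, -1):
        path.append(cands[j])
        j = best[r][j][1]
    return score, path[::-1]


def wrap_score(scores, top=10):
    """Average of the `top` highest wrap scores that survive filtering."""
    kept = sorted(scores, reverse=True)[:top]
    return sum(kept) / len(kept)
```

In this sketch the final score of a sequence would be `wrap_score` applied to the scores of all wraps that pass the α-helix filter.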
BetaWrap: Completing
• Find B1 positions
  • Highest scoring parse
  • Does not affect wrap score
• Further filtering on hydrophobic residues in T1 and T2
Training
• Seven-fold cross-validation
  • Partitioned based on families
• Scores calculated for
  • α-helix filtering threshold
  • B1-score threshold
  • Hydrophobic count threshold
  • Distribution of unmatched residues between rungs
Image from: http://www.ornl.gov/info/ornlreview/v37_1_04/article_21.shtml
BetaWrap: Results
• Correctly identifies beta-helixes
• Correctly separates helixes and non-helixes
• Can predict β-helixes across families
BetaWrap: Summary
Pros:
• Finds beta-helixes
• Accurate
Cons:
• Still makes errors
• Rung placement
• Hard-coded information
• Over-fitting
• Hard to generalize
Conditional Random Fields (CRFs)
[Figure: graphical models of an HMM and a CRF, each over states y1 … y6 and observations x1 … x6]
Hidden Markov Model
• Set of states
• Transition probabilities
• Emission probabilities
• Only given sequence of emitted residues
• Find sequence of true states
• Generative
[Figure: three states, each with an emission table (Res A: Prob .2, Res B: Prob .8)]
Hidden Markov Model
• HMM: Maximize P(x,y|θ) = P(y|x,θ)P(x|θ)
  • x: emitted state/given sequence
  • y: “hidden”/true state
  • P(x,y|θ): joint probability of x and y
  • P(y|x,θ): probability of y given x
  • P(x|θ): probability of x
• Need to make assumptions about the distribution of x
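To make the joint probability concrete, the sketch below multiplies start, transition and emission probabilities along one state path. The two-state model and all its numbers are made up for illustration; only the A/B emission alphabet echoes the slide.

```python
def hmm_joint_probability(states, observations, start, trans, emit):
    """P(x, y): probability of a state path y together with observations x."""
    p = start[states[0]] * emit[states[0]][observations[0]]
    for prev, cur, obs in zip(states, states[1:], observations[1:]):
        p *= trans[prev][cur] * emit[cur][obs]
    return p


# Hypothetical two-state model over the residues "A" and "B".
START = {"S1": 0.5, "S2": 0.5}
TRANS = {"S1": {"S1": 0.9, "S2": 0.1}, "S2": {"S1": 0.1, "S2": 0.9}}
EMIT = {"S1": {"A": 0.2, "B": 0.8}, "S2": {"A": 0.9, "B": 0.1}}

p = hmm_joint_probability(["S1", "S1"], "BB", START, TRANS, EMIT)
```

Because the model has to assign a probability to the observations themselves, the emission tables are exactly where the distributional assumptions about x enter.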
Viterbi Algorithm HMM
• Find most likely path/most likely sequence of hidden states
[Figure: trellis with one column per observation x1 … x4 and one node per state, labeled with the emissions e1(xi), e2(xi), e3(xi)]
• v(i,j): score of the best path that ends in state j at position i, for k states with transitions t and emissions e
• v(i,j) = max( v(i-1,1)·t_{1,j}·e_j(x_i), v(i-1,2)·t_{2,j}·e_j(x_i), …, v(i-1,k)·t_{k,j}·e_j(x_i) )
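The recurrence can be sketched in a few lines of Python. This is a minimal illustration, not code from either paper: the dictionary-based model layout and the example numbers are assumptions, and plain probabilities are used instead of log scores for readability.

```python
def viterbi(observations, states, start, trans, emit):
    """Return (probability, path) of the most likely hidden state sequence."""
    # v[i][j]: probability of the best path ending in state j at position i.
    v = [{s: start[s] * emit[s][observations[0]] for s in states}]
    back = [{}]
    for i in range(1, len(observations)):
        v.append({})
        back.append({})
        for j in states:
            # Best predecessor of state j; the emission term is shared by
            # all predecessors, so it is applied after taking the max.
            prev = max(states, key=lambda s: v[i - 1][s] * trans[s][j])
            v[i][j] = v[i - 1][prev] * trans[prev][j] * emit[j][observations[i]]
            back[i][j] = prev
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for i in range(len(observations) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return v[-1][last], path


# Hypothetical two-state model over the residues "A" and "B".
STATES = ["S1", "S2"]
START = {"S1": 0.5, "S2": 0.5}
TRANS = {"S1": {"S1": 0.9, "S2": 0.1}, "S2": {"S1": 0.1, "S2": 0.9}}
EMIT = {"S1": {"A": 0.2, "B": 0.8}, "S2": {"A": 0.9, "B": 0.1}}

prob, path = viterbi("BBAA", STATES, START, TRANS, EMIT)
```

For long sequences a practical version would sum log probabilities to avoid numerical underflow.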
HMM Disadvantages
• There is a strong independence assumption
• Long-term interactions are difficult to model
• Overlapping features are difficult to model
Conditional Random Fields (CRFs)
• Replace transition and emission probabilities with a set of feature functions f(i,j,k)
• Feature functions based on all xs, not just one
• Not generative
[Figure: trellis over x1 … x4 whose nodes carry feature functions, f(j,0,1) in the first column and f(j,i,k) in columns k = 2 … 4, for states j = 1 … 3]
Conditional Random Fields (CRFs)
• HMM: Maximize P(x,y|θ) = P(y|x,θ)P(x|θ)
• CRF: Maximize P(y|x,θ)
• Do not make assumptions about underlying distribution
Viterbi CRFs
• Same method as for HMM
[Figure: the same trellis over x1 … x4, with feature functions f(j,0,1) and f(j,i,k) on the nodes]
Conditional Random Fields (CRFs)
• States should form a chain
• Likelihood function is convex for chain
• Z0 = normalization factor
• λk = weights
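For a chain, the conditional probability of a label sequence is the exponentiated weighted sum of feature functions, divided by the same quantity summed over every possible label sequence. The sketch below computes it by brute force for short inputs. The feature functions, their `(prev_label, label, x, i)` signature and the labels `"B"` and `"T"` are hypothetical choices for illustration.

```python
import math
from itertools import product

HYDROPHOBIC = frozenset("AFILMVWY")


def f_hydrophobic_strand(prev_label, label, x, i):
    """Fires when a hydrophobic residue is labeled as strand ("B")."""
    return 1.0 if label == "B" and x[i] in HYDROPHOBIC else 0.0


def f_same_label(prev_label, label, x, i):
    """Fires when the label repeats; may look at any part of x if needed."""
    return 1.0 if prev_label == label else 0.0


def sequence_score(y, x, features, weights):
    """Weighted sum of all feature functions over all positions."""
    total = 0.0
    for i in range(len(x)):
        prev_label = y[i - 1] if i > 0 else None
        for weight, feature in zip(weights, features):
            total += weight * feature(prev_label, y[i], x, i)
    return total


def conditional_probability(y, x, labels, features, weights):
    """P(y | x), normalized over every label sequence of the same length."""
    z = sum(math.exp(sequence_score(c, x, features, weights))
            for c in product(labels, repeat=len(x)))
    return math.exp(sequence_score(y, x, features, weights)) / z


FEATURES = [f_hydrophobic_strand, f_same_label]
p = conditional_probability(("B", "B", "B"), "VDV", ("B", "T"), FEATURES, [2.0, 0.5])
```

The brute-force sum grows exponentially with sequence length; on a chain the same normalizer can be computed with a forward pass, which is why the chain structure matters.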
Segmented CRFs
• Each state corresponds to a structure
• Represented as a graph G
• Nodes represent secondary structures
• Edges represent interactions
• Chains are nicer than graphs
Segmented CRFs
• G = <V, E1, E2>
  • E1: Edges between neighbors
  • E2: Edges for long-term interactions
• E1 edges can be implied in model
• Only E2 needs to be explicitly considered
• However
  • Graph needs to be a chain for E2
  • Deterministic state transitions
Beta-Helix CRF
• Combined states
  • B23: B2, B3, T2
• Size assumptions:
  • B23: 8 residues
  • B1: 3 residues
  • T1, T3: 1 to 80 residues
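The size assumptions can be read as constraints on any candidate segmentation. The check below is a small sketch: the length limits come from the slide, while the cyclic state order B1, T1, B23, T3 is an assumption based on the rung layout (B1-T1-B2-T2-B3-T3), and the function name is hypothetical.

```python
# Length limits from the slide: (minimum, maximum) residues per segment.
LENGTH_LIMITS = {"B23": (8, 8), "B1": (3, 3), "T1": (1, 80), "T3": (1, 80)}

# Assumed deterministic order of states within and across rungs.
NEXT_STATE = {"B1": "T1", "T1": "B23", "B23": "T3", "T3": "B1"}


def is_valid_segmentation(segments):
    """Check a list of (state, length) pairs against limits and state order."""
    for idx, (state, length) in enumerate(segments):
        if state not in LENGTH_LIMITS:
            return False
        lo, hi = LENGTH_LIMITS[state]
        if not lo <= length <= hi:
            return False
        if idx > 0 and NEXT_STATE[segments[idx - 1][0]] != state:
            return False
    return True


ok = is_valid_segmentation(
    [("B23", 8), ("T3", 5), ("B1", 3), ("T1", 12), ("B23", 8)]
)
```

Fixing the B23 and B1 lengths and making transitions deterministic leaves only the T1 and T3 lengths free, which keeps the search over segmentations small.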
Intra-Node Features
Regular Expression Template for B23
FIFDDEAK
Φ – A,F,I,L,M,V,W,Y
– D,E,R,K
X - Any
Intra-Node Features
• Probabilistic motif profiles for B23 and B1
• Use HMMER to generate profiles from known B23 and B1 sequences
Intra-Node Features
• Secondary structure prediction
  • PSIPRED
  • Helps locate T1 and T3
  • 76 to 78% accuracy for α-helixes and coils
• Segment length for T1 and T3
  • Estimated as density function
Inter-Node Features
• Side chain alignment scores
  • Alignment between B23 regions
  • More weight given to inward pairs
[Figure: stacked rungs with B2, B3 and T2 labeled]
Inter-Node Features
• Parallel beta-sheet alignment scores
• Distance between adjacent B23 segments
SCRF: Results
Summary
• Discovered new beta-helix protein
  • Sf6 gp14
• Detected beta-helixes in plants
  • None known of before
• More robust than BetaWrap
Questions