Functional prediction in proteins (purifying and positive selection)
Date posted: 21-Dec-2015
1
Functional prediction in proteins (purifying and positive selection)
2
1. Introduction: evolution & sequence analysis
3
Darwin – the theory of natural selection
Adaptive evolution:
Favorable traits will become more frequent in the population
4
Adaptive evolution
Natural selection favors a single allele, and the allele frequency therefore shifts continuously in one direction
5
Kimura – the theory of neutral evolution
Neutral evolution:
Most molecular changes have no effect on the phenotype (neutral)
Selection operates to preserve a trait (no change)
6
Purifying Selection
Stabilizes a trait in a population:
Small babies → more illness
Large babies → more difficult birth…
Baby weight is stabilized around 3-4 kg
7
Purifying selection (conservation) - the molecular level
Histone 3
8
Synonymous vs. non-synonymous substitutions
Purifying selection: excess of synonymous substitutions relative to non-synonymous substitutions
Synonymous substitution: GUU → GUC (Val → Val)
Non-synonymous substitution: GUU → GCU (Val → Ala)
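The two substitutions on this slide can be checked mechanically. The following is a minimal sketch, assuming the standard genetic code; the function name `classify_substitution` is illustrative and not part of the original slides.

```python
from itertools import product

# Standard genetic code, with codons ordered U, C, A, G at each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_before: str, codon_after: str) -> str:
    """Return 'synonymous' if both codons encode the same amino acid."""
    # Accept DNA (T) or RNA (U) spelling of the codons.
    before = CODON_TABLE[codon_before.upper().replace("T", "U")]
    after = CODON_TABLE[codon_after.upper().replace("T", "U")]
    return "synonymous" if before == after else "non-synonymous"


# The two examples from the slide.
print("GUU -> GUC:", classify_substitution("GUU", "GUC"))
print("GUU -> GCU:", classify_substitution("GUU", "GCU"))
```

A real analysis compares aligned coding sequences codon by codon and also counts how many synonymous and non-synonymous sites are available; this sketch only classifies a single observed change.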
9
Synonymous vs. non-synonymous substitutions
Histone 3
[Figure labels: Non-syn. / Syn.]
10
Conservation as a means of predicting function
Infer the rate of evolution at each site
11
Conservation as a means of predicting function
Low rate of evolution → constraints on the site to prevent disruption of function/structure: active sites, protein-protein interactions, protein core etc.
Site            1  2  3  4  5  6  7
Human           D  M  A  A  H  A  M
Chimp           D  E  A  A  G  G  C
Cow             D  Q  A  A  W  A  P
Fish            D  L  A  A  C  A  L
S. cerevisiae   D  D  G  A  F  A  A
S. pombe        D  D  G  A  L  G  E
12
Which site is more conserved?
Site            1  2  3  4  5  6  7
Human           D  M  A  A  H  A  M
Chimp           D  E  A  A  G  G  C
Cow             D  Q  A  A  W  A  P
Fish            D  L  A  A  C  A  L
S. cerevisiae   D  D  G  A  F  A  A
S. pombe        D  D  G  A  L  G  E
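One naive way to answer the question is to score each alignment column by how mixed its residues are. The sketch below uses Shannon entropy for this. That choice is an assumption made here for illustration: it ignores the phylogeny and is not the rate-based estimate that ConSurf computes.

```python
from collections import Counter
from math import log2

# The seven-site toy alignment from the slide.
ALIGNMENT = {
    "Human": "DMAAHAM",
    "Chimp": "DEAAGGC",
    "Cow": "DQAAWAP",
    "Fish": "DLAACAL",
    "S. cerevisiae": "DDGAFAA",
    "S. pombe": "DDGALGE",
}


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one column; 0.0 means fully conserved."""
    total = len(column)
    counts = Counter(column)
    return sum((n / total) * log2(total / n) for n in counts.values())


def site_entropies(alignment: dict) -> list:
    """Entropy of every column, in site order."""
    columns = zip(*alignment.values())
    return [column_entropy("".join(col)) for col in columns]


for site, entropy in enumerate(site_entropies(ALIGNMENT), start=1):
    print(f"site {site}: entropy = {entropy:.2f} bits")
```

Sites 1 and 4 are invariant and score 0, while sites 5 and 7 differ in every species and score highest. A column-wise score like this treats all species as independent, which is the weakness that the use of phylogenetic information addresses.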
13
Use phylogenetic information
Site            1  2  3  4  5  6  7
Human           D  M  A  A  H  A  M
Chimp           D  E  A  A  G  G  C
Cow             D  Q  A  A  W  A  P
Fish            D  L  A  A  C  A  L
S. cerevisiae   D  D  G  A  F  A  A
S. pombe        D  D  G  A  L  G  E
[Figure: phylogenetic tree with A/G character states at its nodes (labels: A, G, A, A, A, G, A, A, A, A, G, G)]
14
ConSurf/ConSeq web servers:
Prediction of conserved residues by estimating evolutionary rates at each site
15
Working process
Input a protein with a known 3D structure (PDB ID or file provided by the user)
Find homologous protein sequences (PSI-BLAST)
Perform multiple sequence alignment (removing duplicates)
Construct an evolutionary tree
Calculate the conservation score for each site
Project the results on the 3D structure
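The duplicate-removal step of the working process can be sketched as follows. This is a minimal illustration, not ConSurf's own code: the function name, the record format of (name, sequence) pairs, and the example sequences are all hypothetical.

```python
def remove_duplicate_sequences(records):
    """Keep the first record for each distinct sequence (case-insensitive).

    `records` is an iterable of (name, sequence) pairs, an assumed format.
    """
    seen = set()
    unique = []
    for name, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((name, seq))
    return unique


# Hypothetical homolog hits; hit_2 repeats the sequence of hit_1.
hits = [
    ("hit_1", "MKVLAAG"),
    ("hit_2", "MKVLAAG"),
    ("hit_3", "MKVLSAG"),
]
for name, seq in remove_duplicate_sequences(hits):
    print(name, seq)
```

Identical sequences add no information about site-specific rates, so dropping them before alignment and tree building keeps the later steps smaller.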
16
ConSurf example: potassium channel
An integral membrane protein with sequence similarity to all known K+ channels, particularly in the pore region.
PDB ID: 1bl8 chain A
17
ConSurf results
18
http://conseq.bioinfo.tau.ac.il/
ConSeq performs the same analysis as ConSurf but presents the results on the sequence.
Predicts buried/exposed relation
exposed & conserved → functionally important sites
buried & conserved → structurally important sites
19
2. Positive selection & drug resistance
20
Darwin – the theory of natural selection
Adaptive evolution:
Favorable traits will become more frequent in the population
21
Adaptive evolution at the molecular level
22
Adaptive evolution at the molecular level
Look for changes which confer an advantage
23
Naïve detection
Observe a multiple sequence alignment:
variable regions = adaptive evolution??
24
Naïve detection
The problem – how do we know which sites are not under any selection pressure (“non-important” sites) and which are under adaptive evolution?
25
Solution – we look at the DNA
synonymous
non-synonymous
26
Solution – we look at the DNA
Purifying selection: Syn > Non-syn
Adaptive evolution = Positive selection: Non-syn > Syn
Neutral evolution: Syn = Non-syn
27
Also known as… Ka/Ks (or dN/dS, or ω) ratio
Purifying selection: Ka < Ks (Ka/Ks < 1)
Neutral evolution: Ka = Ks (Ka/Ks = 1)
Positive selection: Ka > Ks (Ka/Ks > 1)
Ka = non-synonymous substitution rate
Ks = synonymous substitution rate
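The three regimes map directly onto a comparison of ω = Ka/Ks with 1. The sketch below shows only that mapping. The function name and the `tolerance` parameter are assumptions for illustration; tools such as Selecton decide significance with statistical tests rather than a fixed cutoff.

```python
def classify_selection(ka: float, ks: float, tolerance: float = 0.0) -> str:
    """Classify a site or gene by its Ka/Ks ratio (omega).

    `tolerance` is a hypothetical band around 1 that is still called neutral.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    omega = ka / ks
    if omega < 1 - tolerance:
        return "purifying selection"
    if omega > 1 + tolerance:
        return "positive selection"
    return "neutral evolution"


# Illustrative rate values, not measurements from the slides.
for ka, ks in [(0.1, 0.5), (0.5, 0.5), (0.9, 0.3)]:
    print(f"Ka={ka}, Ks={ks}: {classify_selection(ka, ks)}")
```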
28
Examples for positive selection
Proteins involved in the immune system
Proteins involved in host-pathogen interaction (‘arms-race’)
Proteins following gene duplication
Proteins involved in reproductive systems
29
Synonymous vs. non-synonymous substitutions
Accumulation of substitutions (syn. or non-syn.) depends on the evolutionary time that has elapsed since the divergence of the analyzed species.
When distant species are analyzed, saturation of syn. substitutions is often encountered.
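Saturation can be illustrated with the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), which converts the observed fraction p of differing nucleotide sites into an estimated number of substitutions per site. Using this model is an assumption made here for illustration; the slides do not name a substitution model, and Jukes-Cantor is not specific to synonymous sites.

```python
from math import log


def jukes_cantor_distance(p: float) -> float:
    """Estimated substitutions per site from the observed difference p."""
    if not 0 <= p < 0.75:
        # Two random sequences differ at about 75% of sites under this model,
        # so the correction is undefined once that level is reached.
        raise ValueError("p must be in [0, 0.75); at 0.75 the sites are saturated")
    return -0.75 * log(1 - 4 * p / 3)


for p in (0.05, 0.30, 0.60, 0.74):
    d = jukes_cantor_distance(p)
    print(f"observed p = {p:.2f} -> corrected distance = {d:.2f}")
```

As p approaches 0.75 the corrected distance grows without bound, so small errors in p produce large errors in the estimated rate. This is why synonymous rates become unreliable for distant species.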
30
Selecton – a server for the detection of purifying and positive selection
http://selecton.bioinfo.tau.ac.il
Stern et al., Nucleic Acids Res 35, W506 (2007).
31
Detecting drug resistance using Selecton
32
HIV: molecular evolution paradigm
Rapidly evolving virus:
1. High mutation rate (low fidelity of reverse transcriptase)
2. High replication rate
33
Drug resistance
No drug
Drug
Adaptive evolution (positive selection)
34
HIV Protease
Protease is an essential enzyme for viral replication
Drugs against Protease are always part of the “cocktail”
35
Ritonavir Inhibitor
Ritonavir (RTV) is a specific protease inhibitor (drug)
C37H48N6O5S2
36
Selecton was used to analyse HIV-1 protease gene sequences from patients who were treated with RTV only
37
38
Example: HIV Protease
Primary mutations
Secondary mutations
Novel predictions (experimental validation)
39
Rate shifts and HIV sub-types
40
Rate shifts
V Chimp
V Rhesus
A Squirrel
K Rat
M Mouse
V Human
41
Rate shifts
V
V
A
K
M
V
Low evolutionary rate
High evolutionary rate
42
Rate shifts
Specificity determinants: Different phylogenetic groups
V Chimp
V Rhesus
A Squirrel
A Rat
A Mouse
V Human
Gain of function?
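The pattern on this slide, where each group is invariant but the groups differ, can be tested for a single site as sketched below. The function name and the group labels "primates" and "rodents" are assumptions added for illustration. This is a simple pattern check, not the likelihood-based rate-shift detection used in the actual analysis.

```python
def is_specificity_determinant(residues_by_group: dict) -> bool:
    """True if every group is internally invariant but the groups differ."""
    consensus = []
    for residues in residues_by_group.values():
        distinct = set(residues)
        if len(distinct) != 1:
            return False  # variation inside a group breaks the pattern
        consensus.append(distinct.pop())
    return len(set(consensus)) > 1


# Site from the slide: V in chimp, rhesus, human; A in squirrel, rat, mouse.
candidate = {"primates": ["V", "V", "V"], "rodents": ["A", "A", "A"]}
# Site with V in the first group but A, K, M in the second: one group
# is conserved and the other is variable, so the groups do not each hold
# a fixed residue.
rate_shift_only = {"primates": ["V", "V", "V"], "rodents": ["A", "K", "M"]}

print(is_specificity_determinant(candidate))
print(is_specificity_determinant(rate_shift_only))
```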
43
Rate shifts
Specificity determinants: Following gene duplication
V S. paradoxus
V S. mikatae
A S. cerevisiae
A S. paradoxus
A S. mikatae
V S. cerevisiae
Tropomyosin 1
Tropomyosin 2
44
Rate shifts in HIV subtypes
45
HIV subtypes
46
Which sites are responsible for the differences between the subtypes?
Detection of rate-shifts in all 9 subtypes
47
Significant rate shift in all HIV genes
Gene   # rate-shift sites   Proportion
Env    84                   0.1
Gag    20                   0.04
Nef    21                   0.17
Pol    33                   0.03
Rev    29                   0.25
Tat    13                   0.15
Vif    13                   0.07
Vpr    4                    0.05
Vpu    29                   0.35
48
Gag position 12
Wild-type (E)
Site which contributes to Protease Inhibitor (Amprenavir) drug resistance (K)
[Figure: residues at Gag position 12 across sequences (labels: E, E, E, K, Q, R, K, K)]
49
[Figure labels: C, C, A, G, F, D, J; residues K, E, N, R]
50
Summary
Sequence analysis can provide valuable information about protein function
The basic signal: conservation: http://consurf.tau.ac.il
Positive “Darwinian” selection: http://selecton.bioinfo.tau.ac.il
Rate-shifts (specificity determinants)