Protein Sequence Amino Acid Composition IEC RP HPLC Ancient Sequencing methods Modern Sequencing...
Protein Sequence
  Amino Acid Composition
    IEC
    RP HPLC
  Ancient Sequencing methods
  Modern Sequencing methods
  Sequencing the Gene
  Then what?
Amino Acid Composition
1952
  Complete acid hydrolysis
  Ion-exchange chromatography with programmed buffer changes (~3 hr)
  Post-column derivatization with
    Ninhydrin
    Fluorescamine
1980
  Complete acid hydrolysis
  Precolumn derivatization to phenylthiohydantoins
  Reversed-phase HPLC (~30 min)
Sequencing
Sanger End-group Analysis
  Modify the protein with fluorodinitrobenzene (reacts with amines), aka FDNB, Sanger’s reagent
    Alternative reagent: dansyl chloride, which is fluorescent
  Hydrolyze the protein
  Separate by TLC
  Identify the N-terminal amino acid by Rf
Treat the protein with aminopeptidase
  Repeat until the end gets ragged
  Use proteolytic fragments for simplicity
Sequencing
Generate proteolytic fragments
  Use more than one protease in separate experiments
    Trypsin cleaves after Arg and Lys residues
    Chymotrypsin cleaves after Phe, Tyr, Trp
Separate fragments (HV paper electrophoresis/HPLC)
Sequence all peptides independently
Assemble the sequence using overlap info
[Figure: overlapping fragments, labeled Trypsin and Chtr (chymotrypsin)]
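The cleavage rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the peptide is hypothetical, the function name `digest` is made up for this sketch, and real proteases have exceptions (for example, cleavage next to proline is hindered) that are ignored here.

```python
def digest(sequence, cleave_after):
    """Split a peptide after every residue that appears in cleave_after."""
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in cleave_after:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Whatever is left after the last cleavage site is the C-terminal fragment.
        fragments.append(sequence[start:])
    return fragments

TRYPSIN = "RK"         # cleaves after Arg and Lys
CHYMOTRYPSIN = "FYW"   # cleaves after Phe, Tyr, Trp

peptide = "MGKARSMFLKHSTYARS"  # hypothetical sequence, for illustration only
tryptic = digest(peptide, TRYPSIN)
chymotryptic = digest(peptide, CHYMOTRYPSIN)
print(tryptic)
print(chymotryptic)
```

In this example the chymotryptic fragment LKHSTY spans the junction between the tryptic fragments SMFLK and HSTYAR, which is the kind of overlap used to order the fragments.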
Automated Sequencing
Use proteolytic fragments
Sequence each peptide using automated Edman degradation
  Each Edman cycle removes one amino acid
  Converts it to a PTH amino acid for HPLC
Assemble the sequence using overlap info
[Figure: overlapping fragments, labeled Trypsin and Chtr (chymotrypsin)]
N-Terminal Edman Degradation
[Figure: reaction scheme. Labels, in order: Peptide; Attack on phenylisothiocyanate; + H+; Anilinothiazolinone amino acid + Peptide N-1; Rearrangement; PTH-amino acid (absorbs 260-275 nm, RP-HPLC compatible)]
C-Terminal Edman Degradation
[Figure: reaction scheme. Labels, in order: Activation of carboxyl by acetic anhydride; Attack by thiocyanate; Hydrolysis (+H2O); Peptide N-1 + TH-amino acid]
Alternative Sequencing - MS
Use non-fragmenting ionization
  Electrospray ionization + traditional mass spec
  Matrix-assisted laser desorption-ionization + time-of-flight mass spec (MALDI-TOF)
Measures mass of mature, intact protein and/or complexes
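To compare a measured intact mass with a sequence, one can sum residue masses and add one water. The sketch below assumes commonly tabulated average residue masses (check them against a reference table before relying on them) and an unmodified, single-chain protein. The `esi_mz` helper and the idea of a multiply protonated ion are additions for illustration, not part of the slide.

```python
# Average residue masses in Da (amino acid minus water); assumed reference values.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153   # one water for the free N- and C-termini
PROTON = 1.00728  # mass of a proton in Da

def average_mass(sequence):
    """Average mass of an unmodified linear peptide or protein."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence) + WATER

def esi_mz(mass, charge):
    """m/z of an ion carrying `charge` extra protons (hypothetical helper)."""
    return (mass + charge * PROTON) / charge

mass = average_mass("MGKARSMVLKHSTKARS")  # example sequence from these notes
print(round(mass, 2), round(esi_mz(mass, 2), 2))
```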
Sequencing the Gene
DNA synthesis in vitro requires
  Template (the DNA you want to sequence)
  Primer (complementary to the region upstream of where you want to sequence)
  Polymerase
  dXTPs, Mg2+
Primer pairs with template, free 3’-OH group ready for action
As dXTPs base-pair with the template, the 3’-OH attacks the α-phosphate of the dXTP, displacing PPi, making a phosphodiester, and extending the nascent DNA chain by one base
The Polymerase Reaction
[Figure: primer-template duplex with the incoming nucleoside triphosphate]
Elongation of a primer that is base-paired with a template
Requires a free 3’-OH group
Di-deoxy Terminators
If only 2’,3’-dideoxy nucleoside triphosphates were used, the reaction would proceed for only one cycle because there would be no free 3’-OH group to attack the next dXTP
If a fraction of a percent of ONE 2’,3’-dideoxy nucleoside triphosphate (say ddTTP) were used
  SOME polymer would be terminated EACH time that base was incorporated, i.e., each time dA occurs in the template
  If 1/1000th of the dTTP were ddTTP, then 1/1000th of the polymers would terminate at each dA in the template… the rest would continue
  You would get many polymers of different sizes, each corresponding to the occurrence of a dA in the template
Use four separate reactions, one with ddTTP, one with ddATP, one with ddGTP, and one with ddCTP (and all other components)
  For each base in the template, one of the four reaction mixtures would contain a polymer that terminated at that position
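A minimal sketch of the four-reaction idea, assuming a short hypothetical template written 3’ to 5’ and ignoring the primer length. It lists which product lengths can appear in each reaction, not how abundant they are. The function names are made up for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def terminated_lengths(template_3to5, dd_base):
    """Lengths of the products that end in dd_base.

    The template is read 3'->5', so position i of the template directs
    the (i + 1)-th base added to the growing chain.
    """
    return [i + 1 for i, t in enumerate(template_3to5)
            if COMPLEMENT[t] == dd_base]

def four_reactions(template_3to5):
    """One list of product lengths per dideoxy terminator."""
    return {f"dd{base}TP": terminated_lengths(template_3to5, base)
            for base in "ATCG"}

template = "TACGGA"  # hypothetical template, written 3'->5'
reactions = four_reactions(template)
for terminator, lengths in reactions.items():
    print(terminator, lengths)
```

Every length from 1 to the template length shows up in exactly one of the four reactions, which is what makes the gel readable.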
Dideoxy Terminators
Use fluorescent or radioactive primer so you can see every polymer
Separate them by size (gel electrophoresis)
Read sequence of polymers from gel
Infer the sequence of the template by Watson-Crick
[Figure: gel with lanes ddATP, ddTTP, ddCTP, ddGTP, labeled "Agarose gel"; size axis runs from small to large; annotations: "Base in polymer" and "Sequence of template"]
3’ATGTCACAGGACAGA5’
5’TACAGTCTCCTGTCT3’
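Reading the gel and inferring the template can be sketched as follows. The lane contents here are hypothetical (they are not the bands from the figure), and the function names are invented for this illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def read_gel(lanes):
    """Read the polymer 5'->3' from the smallest band to the largest.

    `lanes` maps the terminating base of each reaction to the product
    lengths seen in that lane.
    """
    base_at_length = {length: base
                      for base, lengths in lanes.items()
                      for length in lengths}
    return "".join(base_at_length[n] for n in sorted(base_at_length))

def template_from_polymer(polymer_5to3):
    """Watson-Crick partner of each base, written 3'->5' under the polymer."""
    return "".join(COMPLEMENT[base] for base in polymer_5to3)

lanes = {"A": [1], "T": [2, 6], "C": [4, 5], "G": [3]}  # hypothetical bands
polymer = read_gel(lanes)
template = template_from_polymer(polymer)
print("5'" + polymer + "3'")
print("3'" + template + "5'")
```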
A, T, G, and C. What are the Amino Acids?
Standard Genetic Code (*** = stop)

First \ Second   U          C          A          G
U                UUU Phe    UCU Ser    UAU Tyr    UGU Cys
                 UUC Phe    UCC Ser    UAC Tyr    UGC Cys
                 UUA Leu    UCA Ser    UAA ***    UGA ***
                 UUG Leu    UCG Ser    UAG ***    UGG Trp
C                CUU Leu    CCU Pro    CAU His    CGU Arg
                 CUC Leu    CCC Pro    CAC His    CGC Arg
                 CUA Leu    CCA Pro    CAA Gln    CGA Arg
                 CUG Leu    CCG Pro    CAG Gln    CGG Arg
A                AUU Ile    ACU Thr    AAU Asn    AGU Ser
                 AUC Ile    ACC Thr    AAC Asn    AGC Ser
                 AUA Ile    ACA Thr    AAA Lys    AGA Arg
                 AUG Met    ACG Thr    AAG Lys    AGG Arg
G                GUU Val    GCU Ala    GAU Asp    GGU Gly
                 GUC Val    GCC Ala    GAC Asp    GGC Gly
                 GUA Val    GCA Ala    GAA Glu    GGA Gly
                 GUG Val    GCG Ala    GAG Glu    GGG Gly
ORFs - Look for longest uninterrupted sequence
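A rough sketch of an ORF search using the standard genetic code. Assumptions made here and not stated on the slide: only the three forward frames are scanned, an ORF must start with Met, and the stretch after the last stop codon is treated as open. The DNA string is hypothetical.

```python
from itertools import product

# Standard genetic code, DNA alphabet, in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna, frame=0):
    """Translate one reading frame; '*' marks a stop, 'X' an unknown codon."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def longest_orf(dna):
    """Longest Met-initiated stretch without a stop, over the 3 forward frames."""
    best = ""
    for frame in range(3):
        for segment in translate(dna, frame).split("*"):
            start = segment.find("M")
            if start != -1 and len(segment) - start > len(best):
                best = segment[start:]
    return best

print(longest_orf("CCATGGCTAAATGACC"))  # hypothetical DNA
```

A fuller search would also scan the reverse complement, giving six frames in total.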
So, you’ve got the sequence… So what?
Next topic: Bioinformatics
Inferences based on homology
Questions
1. Has the gene been sequenced before? (Will I be able to publish?)
2. What is the sequence of the protein encoded by the gene?
3. Has the protein been sequenced before?
4. Is the gene similar to one that has been sequenced before?
   1. Did I sequence the right gene?
   2. Will I be able to find structural or functional relatives?
5. Is the protein similar to one that has been sequenced before?
   1. How similar?
   2. What does the similarity mean?
6. Can I predict the function of the gene product, or is the predicted function consistent with what I know about the protein?
7. Can I get information about structural features of the gene product?
   1. Secondary structure
   2. Folding domains or other common patterns
   3. Hydropathy profiles
      1. How might predicted helices and/or sheets pack?
      2. Is it likely to be a membrane protein, a transmembrane protein?
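A hydropathy profile is a sliding-window average of per-residue hydrophobicity. The sketch below assumes the Kyte-Doolittle scale, which the slide does not name; the window size is a free choice and the test sequence is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; other scales exist).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=9):
    """Mean hydropathy of each window of `window` consecutive residues."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

# Hypothetical sequence: a hydrophobic stretch followed by a charged one.
profile = hydropathy_profile("LLLLKKKK", window=4)
print([round(value, 3) for value in profile])
```

Long runs of strongly positive values are the usual hint of a membrane-spanning segment; the threshold and window used to call one are a matter of convention.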
Answers: Sequence Similarities and Similarity Searches
1. Search sequence databases for homologous proteins.
2. Find families of proteins that are similar to your protein.
3. Use information about the structure and properties of the similar protein(s) to establish inferences about your protein. If the exact sequence is in the database, the similarity search routines will find that, too.
4. Determine whether two sequences are related (or identical) by aligning them so that homologous regions are adjacent.
5. For two identical sequences:
   MGKARSMVLKHSTKARS
   MGKARSMVLKHSTKARS
But, what about:
Imperfect homology
   MGKARSMLLKHSTKARS
   MGKARTMVLKHSTRARS
Gaps/insertions
   MGKARSMLLKHSLKARS
   MGRA----LKHSLRART
And, how homologous is homologous?
Need
  Similarity scores for pairs of amino acids
  Method for dealing with gaps
  Algorithms for comparing a sequence with a database
  Ways to assess the degree of homology
  Ways to link structural info with sequence info
Dynamic Programming
Needleman-Wunsch Algorithm
Compares similarity of two proteins a & b at positions i & j:
$NW_{i,j} = \max\left(NW_{i-1,j-1} + s(a_i, b_j),\; NW_{i-1,j} + g,\; NW_{i,j-1} + g\right)$
$NW_{i-1,j-1}$ = running total
$s(a_i, b_j)$ = similarity between residue i of protein a and residue j of protein b
$g$ = gap penalty
http://www.avatar.se/molbioinfo2001/dynprog/dynamic.html
Fill a matrix with all possibilities
Simple example: s = 1 (match) or 0 (mismatch), and g = 0
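The matrix fill can be sketched directly from the recurrence. The boundary condition (row 0 and column 0 hold accumulated gap penalties) is an assumption, since the slide does not state it; with g = 0 it reduces to a border of zeros. Only the final score is returned here, with no traceback.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=0, gap=0):
    """Global alignment score from the NW recurrence (score only)."""
    rows, cols = len(a) + 1, len(b) + 1
    nw = [[0] * cols for _ in range(rows)]
    # Assumed boundary: aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        nw[i][0] = i * gap
    for j in range(1, cols):
        nw[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            nw[i][j] = max(nw[i - 1][j - 1] + s,  # align a_i with b_j
                           nw[i - 1][j] + gap,    # gap in b
                           nw[i][j - 1] + gap)    # gap in a
    return nw[-1][-1]

# Simple scoring from the slide: s = 1 or 0, g = 0.
print(needleman_wunsch_score("MGKARSMLLKHSLKARS", "MGRALKHSLRART"))
```

With s = 1/0 and g = 0 the score is simply the largest number of residues that can be matched in order, which is why real searches use graded similarity scores and a negative gap penalty.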
Smith-Waterman
Always compare the NW terms to zero so that the running score never drops below zero.
$NW_{i,j} = \max\left(NW_{i-1,j-1} + s(a_i, b_j),\; NW_{i-1,j} + g,\; NW_{i,j-1} + g,\; 0\right)$
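A minimal sketch of the local-alignment variant. The scoring values (match 2, mismatch -1, gap -2) are arbitrary choices for illustration; the zero floor only matters when mismatches or gaps score below zero. The sequences are hypothetical.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score: the NW recurrence with a floor of zero."""
    cols = len(b) + 1
    previous = [0] * cols  # row i - 1 of the matrix
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * cols
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            current[j] = max(previous[j - 1] + s,  # align a_i with b_j
                             previous[j] + gap,    # gap in b
                             current[j - 1] + gap, # gap in a
                             0)                    # restart the alignment here
            best = max(best, current[j])
        previous = current
    return best

# Only the shared MGK core contributes to the score.
print(smith_waterman_score("XXXMGKYYY", "ZZMGKZZ"))
```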
BLAST & FASTA
FASTA - great, we won’t talk about it
  much faster and more selective than SW, but less sensitive
Basic Local Alignment Search Tool
  less selective and more sensitive than FASTA, i.e., you may get more hits, but some of them may be wrong
BLAST
Divide sequence into “words” of length W (e.g., BLASTp, initial W = 3)
Compare all W-length words
  Retain only pairs with similarity above a threshold, T
  Call them High-Scoring Pairs (HSPs)
Increase W, repeat with HSPs
Keep going while the score remains above a minimum similarity, and compare to random probability (E)
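The word-splitting step can be sketched as below. This is a deliberate simplification: it keeps only exact word matches, whereas the slide describes retaining word pairs whose similarity score exceeds a threshold T. The function names and sequences are hypothetical.

```python
def words(sequence, w=3):
    """All overlapping words of length w, in order of position."""
    return [sequence[i:i + w] for i in range(len(sequence) - w + 1)]

def seed_hits(query, subject, w=3):
    """(query_position, subject_position) pairs where a word matches exactly."""
    index = {}
    for j, word in enumerate(words(subject, w)):
        index.setdefault(word, []).append(j)
    return [(i, j)
            for i, word in enumerate(words(query, w))
            for j in index.get(word, [])]

print(words("MGKAR"))
print(seed_hits("MGKARS", "TTGKARTT"))
```

Hits that fall on the same diagonal (same difference between the two positions) are the natural candidates to extend into a longer pair.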
Scoring Matrices - Making similarity quantitative
Compare the actual frequency to the frequency expected by chance alone.
Probability that alanine appears at position x in a protein = fraction of Ala in all proteins, $p_{Ala}$
Probability that one protein has Ala at position x, and another protein has Gly? $= p_{Ala}\,p_{Gly}$
  The frequency due to chance alone.
Similarity
$q_{Ala,Gly}$ = ACTUAL frequency that Ala and Gly are at position x in two proteins (in your database)
$R_{i,j} = q_{i,j}/(p_i p_j)$
Score: $S_{i,j} = \log_2(R_{i,j}) = \log_2\left(q_{i,j}/(p_i p_j)\right)$ “Log-Odds Scores”
Remember Chou & Fasman?
$S_{i,j} = \frac{1}{\lambda}\log\left(\frac{q_{i,j}}{p_i p_j}\right)$, where $\lambda$ is a scaling factor
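A short numerical sketch of the log-odds score. The frequencies below are invented for illustration and do not come from any real database; choosing lambda = ln 2 reproduces the base-2 form.

```python
import math

def log_odds(q_ij, p_i, p_j, lam=math.log(2)):
    """S_ij = (1 / lambda) * ln(q_ij / (p_i * p_j)).

    With lam = ln 2 the score is in base-2 units (bits).
    """
    return math.log(q_ij / (p_i * p_j)) / lam

# Hypothetical frequencies: each residue makes up 10% of all positions.
p_ala, p_gly = 0.10, 0.10
observed_more = log_odds(0.020, p_ala, p_gly)  # pair seen twice as often as chance
observed_less = log_odds(0.005, p_ala, p_gly)  # pair seen half as often as chance
print(round(observed_more, 3), round(observed_less, 3))
```

Pairs observed more often than chance get positive scores and pairs observed less often get negative scores, which is what lets a local alignment stop extending.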
PAM Matrices
Margaret Dayhoff assembled the Atlas of Protein Sequence and Structure
Evolutionarily accepted mutations
  Calculated $q_{i,j}$ for all amino acids in closely related proteins
  These were accepted by Nature as similar/close enough
Generate half matrices: Point Accepted Mutation/Percent Accepted Mutations
Scale, so PAM1 reflects 1 mutation per 100 residues; PAM50, 50 accepted mutations per 100
BLOSUM
Henikoff and Henikoff
BLOcks of Amino Acid SUbstitution Matrix
BLOCKS is a database of related proteins
BLAST Search
Go to BLAST Website
Enter nucleotide or AA sequence
Choose BLAST type
  Nucleotide-nucleotide: BLASTn
  Protein-protein: BLASTp
  6-frame-translated nucleotide-protein: BLASTx
  others
Then?
  Does it make sense?
  Multisequence alignment
  Secondary structure prediction
  Domains
  Families
Caveat
It ain't what you don't know that'll kill you,
it's what you know that ain't so.