Abstract
Accurate determination of the molecular weight (MW) of a protein is an important step toward its isolation, purification, and identification. Sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) in one dimension with single-percentage gels is traditionally used for this purpose. Gradient gels, which incorporate a range of percentages, have been considered less accurate, in part because reliable mathematical models are lacking. The purpose of this project was to develop statistical models that accurately predict protein MWs on gradient gels. Six mathematical models were applied to protein standards of known MW to determine the best-fitting model. The relative mobility (Rm) of each protein standard was calculated and compared with its actual MW to make this determination. The "Cubic Model" gave the best fit and will be used to identify unknown proteins that may be involved in amphibian fertilization.
Goal
To determine which model provides the best fit to the known protein standards.
Conclusions
We examined 6 mathematical models relating relative mobility to the molecular weights of known protein standards. The cubic model was judged best by examining the predicted weights, residuals, and R-squared values for each model. This model was then used to estimate the molecular weights of the unknown proteins.
Future
The cubic model will be tested on proteins involved in frog fertilization. Other ways to reduce the error and improve the model will be studied.
4 Step Procedure
Comparison of Mathematical Models to Determine Molecular Weight of Proteins: A Statistical Analysis
Jennifer Wright (1), Edward J. Carroll, Jr. (2), and Lawrence Clevenson (1)
Departments of (1) Mathematics and (2) Biology, California State University Northridge
NASA/PAIR Program
Fig. 1 Electrophoresis Gel of Raw Data
[Plot: Relative Mobility vs. Molecular Weight (raw data used in determining the models). X-axis: Molecular Weight (log scale, 3.8 to 5.2); Y-axis: Relative Mobility (0 to 1). Series: Male Frog 7.5%; Female Frog #2: 7.5%; Sea Urchin #3: 10%; Sea Urchin #4: 10% rep.; Sea Urchin #1: 12%; Sea Urchin #2: 12% rep.; Jelly & Seminal #1; Jelly & Seminal #2.]
Fig. 2 – Graph of relative mobility of raw data vs. log molecular weights starting with two 7.5% gels, two 10%, two 12% and two gradient gels.
Fig. 3 Raw Standards: Actual Molecular Weights vs. Predicted Molecular Weights of Standards
Residuals for Cubic Model: log(MW) = a + b * Rm + c * Rm^2 + d * Rm^3
GEL ID (columns, left to right): Male Frog #1: 7.5%; Female Frog #2: 7.5%; Sea Urchin #3: 10%; Sea Urchin #4: 10%; Male Frog #3: 12%; Female Frog #3: 12%; Sea Urchin #1: 12%; Semin. Plasma #5; Semin. Plasma #7; Jelly & Semin. #1; Jelly & Semin. #2; Jelly & Semin. #3
R Squared: 0.9981 0.9990 0.9996 0.9999 0.9991 0.9995 0.9993 0.9990 0.9986 0.9833 0.9871 0.9844
Residuals:
0.001 0.001 0.001 0.001 0.006 0.007 0.004 0.012 0.013 0.024 0.025 0.024
0.012 0.009 0.003 0.001 0.018 0.009 0.002 0.017 0.024 0.041 0.046 0.037
0.017 0.012 0.009 0.004 0.014 0.001 0.016 0.004 0.007 0.038 0.035 0.036
0.007 0.005 0.008 0.004 0.005 0.002 0.007 0.003 0.000 0.029 0.030 0.024
0.001 0.000 0.003 0.002 0.005 0.012 0.008 0.012 0.010 0.054 0.049 0.041
0.001 0.000 0.003 0.008 0.009 0.008 0.007 0.024 0.028 0.032
0.001 0.002 0.004 0.002 0.002 0.053 0.046 0.039
0.098 0.084 0.105
0.098 0.078 0.095
Table 2: Residuals and R-squared values for the Cubic model. The red numbers are negative and black are positive.
[Plot: Comparison of 3 models with a Standard. Relative Mobility (y) vs. Log Molecular Weight (x), Gel #2 VE & SE. X-axis: Log Molecular Weight (3.8 to 5.4); Y-axis: Relative Mobility (0.1 to 0.9). Trendlines:
Log Linear: y = -0.4775x + 2.705, R^2 = 0.9644
Quad.: y = -0.154x^2 + 0.9288x - 0.474, R^2 = 0.985
Cubic: y = 0.2922x^3 - 4.1443x^2 + 18.975x - 27.493, R^2 = 0.9977]
Fig. 4 One set of raw data (Gel #2 VE) is set against 3 of the models tested (Log Linear, Quad., Cubic).
Fig. 5 Raw Data
Thanks to: Carol Shubin, Virginia Latham, Larry Clevenson, Edward Carroll, Gregory Frye, John Handy, Jennifer Rosales, Alicia Maravilla, and Celia Smith.
This work was supported by NASA CSUN/JPL PAIR, Grant #NASA-NCC5-489.
Final Predicted Weights of Unknown Proteins Using Cubic Model
Table 3: The Cubic model was applied to unknown proteins to predict their molecular weights.
Molecular Weight (Daltons), Whole VE
Columns, left to right: time "0" Sup; time "60" Sup; time "0" Pel; time "60" Pel
Band 1: 139,917 138,672 124,966 126,508
Band 2: 46,835 77,697 74,554 75,113
Band 3: 41,750 46,911 45,578 45,819
Band 4: 39,590 41,388 39,888 40,138
Band 5: 39,359 37,468 37,646
Band 6: 22,038 35,588
Band 7: 23,302
Models Tested
Cubic: Log(MW) = a + b * Rm + c * Rm^2 + d * Rm^3
-LN^2: Log(MW) = a + b * (-Ln(Rm)) + c * (-Ln(Rm))^2
Log-Log: Log(MW) = a + b * Log(Rm) + c * Log(Rm)^2
Quad: Log(MW) = a + b * Rm + c * Rm^2
Log Linear: Log(MW) = a + b * Rm
SLIC: Log(Ln(MW)) = a + b * Ln(-Ln(Rm))
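The six models listed above are all linear in their coefficients, so each can be fitted by ordinary least squares once the response and regressors are transformed. The following is a minimal sketch of that idea, not the authors' actual analysis. It assumes Log means base 10 (consistent with the plot axes), takes the seven standard MWs from Table 4, and, because the poster does not list measured Rm values, generates SYNTHETIC Rm values from the cubic trendline printed in Fig. 4. All function names are hypothetical.

```python
import math

# Seven standards' molecular weights in Daltons (from Table 4).
STANDARD_MW = [200000, 116250, 97400, 66200, 45000, 31000, 21500]

def trendline_rm(log_mw):
    """Fig. 4 cubic trendline: relative mobility as a function of log10(MW)."""
    return 0.2922 * log_mw**3 - 4.1443 * log_mw**2 + 18.975 * log_mw - 27.493

# ASSUMPTION: synthetic Rm values from the trendline, not gel measurements.
RM = [trendline_rm(math.log10(mw)) for mw in STANDARD_MW]

def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def least_squares(rows, y):
    """Ordinary least squares through the normal equations (X^T X) beta = X^T y."""
    p = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def r_squared(y, fitted):
    """1 - SSE/SST, computed on the scale of the model's response."""
    mean = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean) ** 2 for yi in y)
    return 1.0 - sse / sst

# Model name -> (response computed from MW, regressors computed from Rm).
MODELS = {
    "Cubic": (math.log10, lambda r: [1.0, r, r**2, r**3]),
    "-LN^2": (math.log10, lambda r: [1.0, -math.log(r), (-math.log(r)) ** 2]),
    "Log-Log": (math.log10, lambda r: [1.0, math.log10(r), math.log10(r) ** 2]),
    "Quad": (math.log10, lambda r: [1.0, r, r**2]),
    "Log Linear": (math.log10, lambda r: [1.0, r]),
    "SLIC": (lambda mw: math.log10(math.log(mw)),
             lambda r: [1.0, math.log(-math.log(r))]),
}

def fit_all(rm, mw):
    """Fit every model; return name -> (coefficients, R-squared)."""
    results = {}
    for name, (response, regressors) in MODELS.items():
        rows = [regressors(r) for r in rm]
        y = [response(m) for m in mw]
        beta = least_squares(rows, y)
        fitted = [sum(b * v for b, v in zip(beta, row)) for row in rows]
        results[name] = (beta, r_squared(y, fitted))
    return results

RESULTS = fit_all(RM, STANDARD_MW)

def predict_mw_cubic(rm_value):
    """Apply the fitted cubic model to an unknown band's relative mobility."""
    beta = RESULTS["Cubic"][0]
    return 10 ** sum(b * rm_value**k for k, b in enumerate(beta))

for name, (_, r2) in RESULTS.items():
    print(f"{name:10s} R^2 = {r2:.4f}")
```

Because the Rm values here are synthetic, the printed R-squared values will not match the poster's tables. With measured Rm values substituted for RM, the same loop covers steps 2 and 3 of the procedure, and predict_mw_cubic covers step 4.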
1. Analysis of standards in the gels.
2. Test models on known protein standards.
3. Decide on best fitting model.
4. Apply model to unknown proteins.
Determinations
1.) R-squared is good for most of the models; only the SLIC model is noticeably low (average 0.863). R-squared is the ratio of the predicted variation, Σ(ûi - ū)^2, to the total variation, Σ(ui - ū)^2, where ûi is the predicted value of ui for a particular model and ū is the mean of the ui. Ideally R-squared equals 1, meaning that the predicted and actual values are equal. The Cubic model has the highest average R-squared (0.996) of the 6 models.
2.) The MW predictions are good for most of the models, but the Cubic model shows the least variation.
3.) The residuals are the differences between the actual data points and the predicted points. The Cubic model produces smaller residuals than the other 5 models (see example above).
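The residual comparison in point 3 can be illustrated with the predicted and actual weights reported in Table 1. The sketch below assumes residuals are taken on the log10(MW) scale as actual minus predicted, which is an assumption since the poster does not state the scale. The variable names are hypothetical. These are not the per-gel residuals of Table 2, only a check on the Table 1 predictions.

```python
import math

# Actual standard molecular weights (Daltons), from Table 1.
ACTUAL = [200000, 116250, 97400, 66200, 45000, 31000, 21500, 14400, 6500]

# Predicted molecular weights per model, from Table 1 (same row order as ACTUAL).
PREDICTED = {
    "SLIC":       [144849, 107140, 97197, 81441, 57331, 36594, 17906, 11494, 9211],
    "Log Linear": [164210, 117416, 102609, 78955, 50028, 29506, 17963, 12820, 10472],
    "Quad.":      [183306, 120246, 101049, 72876, 44679, 28090, 21197, 12789, 9374],
    "Log-Log":    [197683, 117919, 97280, 69995, 44248, 28736, 20518, 13438, 10635],
    "-LN^2":      [197751, 118022, 96991, 70111, 44271, 28822, 20514, 13618, 10267],
    "Cubic":      [201028, 115949, 94775, 67689, 45519, 31647, 20933, 12277, 8007],
}

def log_residuals(actual, predicted):
    """Residuals on the log10 scale: log10(actual) - log10(predicted)."""
    return [math.log10(a) - math.log10(p) for a, p in zip(actual, predicted)]

# Mean absolute log10 residual per model; smaller means a closer fit.
MEAN_ABS = {
    name: sum(abs(r) for r in log_residuals(ACTUAL, pred)) / len(ACTUAL)
    for name, pred in PREDICTED.items()
}

for name, value in sorted(MEAN_ABS.items(), key=lambda item: item[1]):
    print(f"{name:10s} mean |log10 residual| = {value:.4f}")
```

On these Table 1 values the Cubic model has the smallest mean absolute residual, in line with the determination above.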
TWO METHODS USED IN MEASURING GELS
1. SPOTFINDER - A computer program designed to find and measure the bands.
PROS - quick, with less measurement error.
CONS - counts 2 closely spaced bands as one and counts spots as bands.
2. ADOBE PHOTOSHOP - The ruler in Adobe Photoshop is used to measure the location of the bands.
PROS - more bands can be identified by sight.
CONS - more measurement error and more subjective.
The Photoshop method was chosen for this project because Spotfinder misidentified some of the bands.
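Ruler measurements of band locations are typically converted to relative mobility by dividing the distance a band migrated by the distance the dye front migrated. The poster does not spell out its formula, so the sketch below assumes this conventional definition. The function name and the pixel positions in the example are hypothetical.

```python
def relative_mobility(band_px, origin_px, front_px):
    """Rm = (band distance from origin) / (dye front distance from origin).

    All positions are ruler readings along the lane, in the same units
    (pixels here). ASSUMPTION: conventional definition of Rm.
    """
    front_distance = front_px - origin_px
    if front_distance <= 0:
        raise ValueError("dye front must lie beyond the origin")
    return (band_px - origin_px) / front_distance

# Hypothetical readings: lane starts at 100 px, dye front at 900 px,
# band measured at 500 px.
print(relative_mobility(band_px=500, origin_px=100, front_px=900))
```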
Actual Molecular Weights and Predicted MW by Model (Daltons)
Actual MW        SLIC      Log Linear   Quad.     Log-Log   -LN^2     Cubic
200,000          144,849   164,210      183,306   197,683   197,751   201,028
116,250          107,140   117,416      120,246   117,919   118,022   115,949
97,400           97,197    102,609      101,049   97,280    96,991    94,775
66,200           81,441    78,955       72,876    69,995    70,111    67,689
45,000           57,331    50,028       44,679    44,248    44,271    45,519
31,000           36,594    29,506       28,090    28,736    28,822    31,647
21,500           17,906    17,963       21,197    20,518    20,514    20,933
14,400           11,494    12,820       12,789    13,438    13,618    12,277
6,500            9,211     10,472       9,374     10,635    10,267    8,007
R-Squared Ave.   0.863     0.949        0.985     0.989     0.990     0.996
Table 1: Comparison of the 6 models and the R-squared values produced by each model.
Margin of Error
Fig. 5 Graph of Standards and upper/lower predicted confidence interval at 95%.
[Plot: Actual MW of Standards Compared With Confidence Interval for Predicted MW. X-axis: Bands (name of standards): Myosin, Beta-Galactosidase, Phosphorylase B, Serum albumin, Ovalbumin, Carbonic Anhydrase, Trypsin Inhibitor; Y-axis: Molecular Weight (0 to 220,000). Series: Standards, Lower Confidence Limit, Upper Confidence Limit.]
GEL 1 Trypsin Supernatant Standard
Actual MW   Pred MW   Lower Confidence Limit   Upper Confidence Limit
200,000     202,292   187,866                  217,825
116,250     115,864   110,945                  121,000
97,400      95,109    90,856                   99,561
66,200      66,590    63,278                   70,075
45,000      45,999    43,361                   48,797
31,000      30,284    28,388                   32,307
21,500      21,744    20,136                   23,480
Table 4: Actual and predicted molecular weights of standards with a 95% C.I.
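The limits in Table 4 are not symmetric in Daltons, but they appear to be symmetric on the log scale, which is what one would expect if the interval was computed for log(MW) and then converted back to MW. That reading is an inference from the reported numbers, not something the poster states. A minimal sketch that checks it (hypothetical names, values copied from Table 4):

```python
import math

# (actual MW, predicted MW, lower limit, upper limit) from Table 4, in Daltons.
TABLE_4 = [
    (200000, 202292, 187866, 217825),
    (116250, 115864, 110945, 121000),
    (97400, 95109, 90856, 99561),
    (66200, 66590, 63278, 70075),
    (45000, 45999, 43361, 48797),
    (31000, 30284, 28388, 32307),
    (21500, 21744, 20136, 23480),
]

def log_half_widths(pred, lower, upper):
    """Distance from the prediction to each limit, in log10(MW) units."""
    return math.log10(pred / lower), math.log10(upper / pred)

for actual, pred, lower, upper in TABLE_4:
    below, above = log_half_widths(pred, lower, upper)
    covered = lower <= actual <= upper
    print(f"pred {pred:>7,}: -{below:.4f} / +{above:.4f} log10 units, "
          f"actual inside interval: {covered}")
```

If the two half-widths agree for every row, each interval can be summarized by a single multiplicative factor, 10 raised to the half-width, applied above and below the predicted MW.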
Actual and Predicted Standards with Confidence Interval (C.I.)
Gel #4 Whole VE, time "0" Sup
Pred MW   Lower Confidence Limit   Upper Confidence Limit
139,917   134,326                  145,740
46,835    44,596                   49,185
41,750    39,458                   44,175
39,590    37,269                   42,056
Table 5: The predicted molecular weights of unknown proteins involved in frog fertilization with a 95% C.I.
Unknown Predicted Weights with C.I.