

Abstract

Accurate determination of the molecular weight (MW) of a protein is an important step toward its isolation, purification and identification. One-dimensional sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) with single-percentage gels is traditionally used for this purpose. Gradient gels, which incorporate a range of percentages, have been considered less accurate, in part because reliable mathematical models have been lacking. The purpose of this project was to develop statistical models that accurately predict protein MWs on gradient gels. Six mathematical models were applied to protein standards of known MW to determine the best-fitting model. The relative mobility (Rm) of each protein standard was calculated and compared with its actual MW to make this determination. The "Cubic Model" was found to fit best and will be used to identify unknown proteins that may be involved in amphibian fertilization.

Goal

To determine which model provides the best fit to the known molecular weights of the protein standards.

Conclusions

We examined six mathematical models relating relative mobility to the molecular weights of known protein standards. The Cubic model was judged best after examining the predicted weights, residuals, and R-squared values for each model. This model was then used to estimate the molecular weights of the unknown proteins.

Future

The Cubic model will be tested on proteins involved in frog fertilization. Other ways to reduce the error and improve the model will be studied.

4-Step Procedure

Comparison of Mathematical Models to Determine Molecular Weight of Proteins: A Statistical Analysis

Jennifer Wright¹, Edward J. Carroll, Jr.², and Lawrence Clevenson¹

Departments of ¹Mathematics and ²Biology, California State University Northridge

NASA/PAIR Program

Fig. 1 Electrophoresis Gel of Raw Data

[Figure: Relative Mobility vs. Molecular Weight, plot of raw data used in determining the models. x-axis: molecular weight (log scale, 3.8 to 5.2); y-axis: relative mobility (0 to 1). Series: Male Frog 7.5%, Female Frog #2: 7.5%, Sea Urchin #3: 10%, Sea Urchin #4: 10% rep., Sea Urchin #1: 12%, Sea Urchin #2: 12% rep., Jelly & Seminal #1, Jelly & Seminal #2.]

Fig. 2 – Relative mobility of the raw data vs. log molecular weight for two 7.5% gels, two 10% gels, two 12% gels and two gradient gels.

Fig. 3 Raw Standards: Actual Molecular Weights vs. Predicted Molecular Weights of Standards

Residuals for Cubic Model: $\log(\mathrm{MW}) = a + b R_m + c R_m^2 + d R_m^3$

GEL ID                 R Squared   Residuals 1 to 5
Male Frog #1: 7.5%     0.9981      0.001  0.012  0.017  0.007  0.001
Female Frog #2: 7.5%   0.9990      0.001  0.009  0.012  0.005  0.000
Sea Urchin #3: 10%     0.9996      0.001  0.003  0.009  0.008  0.003
Sea Urchin #4: 10%     0.9999      0.001  0.001  0.004  0.004  0.002
Male Frog #3: 12%      0.9991      0.006  0.018  0.014  0.005  0.005
Female Frog #3: 12%    0.9995      0.007  0.009  0.001  0.002  0.012
Sea Urchin #1: 12%     0.9993      0.004  0.002  0.016  0.007  0.008
Semin. Plasma #5       0.9990      0.012  0.017  0.004  0.003  0.012
Semin. Plasma #7       0.9986      0.013  0.024  0.007  0.000  0.010
Jelly & Semin. #1      0.9833      0.024  0.041  0.038  0.029  0.054
Jelly & Semin. #2      0.9871      0.025  0.046  0.035  0.030  0.049
Jelly & Semin. #3      0.9844      0.024  0.037  0.036  0.024  0.041

Residuals 6 to 9 are reported for only some of the gels:
Residual 6 (10 gels): 0.001 0.000 0.003 0.008 0.009 0.008 0.007 0.024 0.028 0.032
Residual 7 (8 gels): 0.001 0.002 0.004 0.002 0.002 0.053 0.046 0.039
Residual 8 (3 gels): 0.098 0.084 0.105
Residual 9 (3 gels): 0.098 0.078 0.095

Table 2: Residuals and R-squared values for the Cubic model. The red numbers are negative and black are positive.

[Figure: Relative Mobility vs. Log Molecular Weight, Comparison of 3 models with a Standard (Gel #2 VE & SE). x-axis: log molecular weight (3.8 to 5.4); y-axis: relative mobility (0.1 to 0.9). Trendlines, with y = relative mobility and x = log molecular weight:
Log Linear: $y = -0.4775x + 2.705$, $R^2 = 0.9644$
Cubic: $y = 0.2922x^3 - 4.1443x^2 + 18.975x - 27.493$, $R^2 = 0.9977$
Quad.: $y = -0.154x^2 + 0.9288x - 0.474$, $R^2 = 0.985$]

Fig. 4 One set of raw data (Gel #2 VE) plotted against three of the tested models (Log Linear, Quad., Cubic).
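The trendlines in Fig. 4 run in the opposite direction to the models. They give relative mobility as a function of log molecular weight. Below is a minimal sketch of reading a molecular weight back off the cubic trendline. It assumes x is log10(MW), which is consistent with the 3.8 to 5.4 axis range. The helper `mw_from_trendline` and its search bounds are hypothetical and not part of the poster. The bounds matter because a cubic can have up to three real solutions.

```python
import numpy as np

# Cubic trendline from Fig. 4: Rm = 0.2922x^3 - 4.1443x^2 + 18.975x - 27.493,
# where x is assumed to be log10(MW).
TRENDLINE = [0.2922, -4.1443, 18.975, -27.493]


def mw_from_trendline(rm, coeffs=TRENDLINE, x_lo=3.8, x_hi=5.4):
    """Solve trendline(x) = rm for x in [x_lo, x_hi] and return 10**x (Daltons).

    Hypothetical helper: the bounds keep only the solution inside the plotted
    range, since a cubic can have up to three real solutions.
    """
    shifted = list(coeffs)
    shifted[-1] -= rm                      # roots of trendline(x) - rm
    roots = np.roots(shifted)
    real = [r.real for r in roots
            if abs(r.imag) < 1e-9 and x_lo <= r.real <= x_hi]
    if len(real) != 1:
        raise ValueError(f"expected one solution in range, found {len(real)}")
    return 10 ** real[0]


mw = mw_from_trendline(0.5)
print(f"Rm = 0.5 -> MW ~ {mw:,.0f} Da")
```

Fixing the search window trades generality for an unambiguous answer. Outside the calibrated range the inversion is not meaningful anyway.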

Fig. 5 Raw Data

Thanks to: Carol Shubin, Virginia Latham, Larry Clevenson, Edward Carroll, Gregory Frye, John Handy, Jennifer Rosales, Alicia Maravilla and Celia Smith.

This work was supported by NASA CSUN/JPL PAIR, Grant #NASA-NCC5-489.

Final Predicted Weights of Unknown Proteins Using Cubic Model

Table 3: The Cubic model was applied to unknown proteins to predict their molecular weights.

Molecular Weight (Daltons)

          Whole VE time "0" Sup   time "60" Sup   time "0" Pel   time "60" Pel
Band 1    139,917                 138,672         124,966        126,508
Band 2    46,835                  77,697          74,554         75,113
Band 3    41,750                  46,911          45,578         45,819
Band 4    39,590                  41,388          39,888         40,138

Bands 5 to 7 are reported for only some of the samples:
Band 5: 39,359; 37,468; 37,646
Band 6: 22,038; 35,588
Band 7: 23,302
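Applying the Cubic model to an unknown band (step 4 of the procedure) means evaluating the fitted polynomial at the band's Rm and undoing the logarithm. Below is a minimal sketch that assumes base-10 logs. The coefficient values and band mobilities in the usage lines are hypothetical placeholders, because the poster does not report the fitted a, b, c and d.

```python
def predict_mw_cubic(rm, a, b, c, d):
    """Cubic model: log10(MW) = a + b*Rm + c*Rm^2 + d*Rm^3; returns MW in Daltons."""
    log_mw = a + rm * (b + rm * (c + rm * d))  # Horner form of the cubic
    return 10 ** log_mw


# Hypothetical coefficients and band mobilities, for illustration only.
coeffs = (5.6, -2.1, 0.4, -0.2)
for band, rm in enumerate([0.15, 0.42, 0.47], start=1):
    print(f"Band {band}: Rm = {rm:.2f} -> {predict_mw_cubic(rm, *coeffs):,.0f} Da")
```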

Models Tested

Cubic: $\log(\mathrm{MW}) = a + b R_m + c R_m^2 + d R_m^3$
-LN^2: $\log(\mathrm{MW}) = a + b(-\ln R_m) + c(-\ln R_m)^2$
Log-Log: $\log(\mathrm{MW}) = a + b \log R_m + c (\log R_m)^2$
Quad: $\log(\mathrm{MW}) = a + b R_m + c R_m^2$
Log Linear: $\log(\mathrm{MW}) = a + b R_m$
SLIC: $\log(\ln \mathrm{MW}) = a + b \ln(-\ln R_m)$
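Each model above is linear in its coefficients once MW and Rm are transformed, so each can be fitted to the standards by ordinary least squares. Below is a minimal NumPy sketch. It assumes "Log" means log10 and "Ln" means the natural log; the base of Log(Rm) only rescales b and c. The functions `design` and `fit` and the synthetic standards are illustrative, not the authors' code.

```python
import numpy as np

MODELS = ("Cubic", "-LN^2", "Log-Log", "Quad", "Log Linear", "SLIC")


def design(model, rm, mw):
    """Return (X, y): design matrix and transformed response for one model form."""
    rm = np.asarray(rm, dtype=float)
    mw = np.asarray(mw, dtype=float)
    one = np.ones_like(rm)
    y = np.log10(mw)                       # assumed: "Log" = log10
    if model == "Cubic":
        cols = [one, rm, rm**2, rm**3]
    elif model == "-LN^2":
        u = -np.log(rm)                    # requires 0 < Rm
        cols = [one, u, u**2]
    elif model == "Log-Log":
        u = np.log10(rm)
        cols = [one, u, u**2]
    elif model == "Quad":
        cols = [one, rm, rm**2]
    elif model == "Log Linear":
        cols = [one, rm]
    elif model == "SLIC":
        cols = [one, np.log(-np.log(rm))]  # requires 0 < Rm < 1
        y = np.log10(np.log(mw))
    else:
        raise ValueError(f"unknown model: {model}")
    return np.column_stack(cols), y


def fit(model, rm, mw):
    """Ordinary least-squares coefficients (a, b, c, ...) for one model."""
    X, y = design(model, rm, mw)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Synthetic standards generated from a known cubic (illustration only).
rm = np.linspace(0.1, 0.9, 9)
true = np.array([5.3, -1.2, 0.4, -0.3])
mw = 10 ** (true[0] + true[1] * rm + true[2] * rm**2 + true[3] * rm**3)
for m in MODELS:
    print(m, np.round(fit(m, rm, mw), 4))
```

Because the synthetic data are an exact cubic, the Cubic fit recovers `true`. The other forms give their own best approximations.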

1. Analysis of standards in the gels.

2. Test models on known protein standards.

3. Decide on the best-fitting model.

4. Apply model to unknown proteins.

Determinations

1.) R-squared is good for most of the models. The exception is the SLIC model, whose average R-squared (0.863) is noticeably lower than those of the other five models (0.949 to 0.996). R-squared is the ratio of the explained variation to the total variation, $R^2 = \sum_i (\hat{u}_i - \bar{u})^2 / \sum_i (u_i - \bar{u})^2$, where $\hat{u}_i$ is the value of $u_i$ predicted by a particular model and $\bar{u}$ is the mean. Ideally R-squared equals 1, meaning that the predicted values equal the actual values. The Cubic model has the highest average R-squared (0.996) of the six models.

2.) Most of the models predict the MWs of the standards well, but the Cubic model's predictions deviate least from the actual values.

3.) The residuals of the models are the differences between the actual data points and the predicted points. The residuals (see Table 2 for the Cubic model) show that the Cubic model produces smaller residuals than the other five models.
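The determinations can be checked numerically. The sketch below computes R-squared exactly as defined in point 1. It also computes the mean absolute log10 residual of each model's predictions in Table 1, with the values transcribed from that table. Using that statistic to rank the models is an assumption; the poster compares residuals per gel (Table 2) rather than reporting a single summary.

```python
import math


def r_squared(u, u_hat):
    """R^2 as defined in point 1: explained variation over total variation.

    For a least-squares fit that includes an intercept this equals 1 - SSE/SST.
    """
    u_bar = sum(u) / len(u)
    explained = sum((uh - u_bar) ** 2 for uh in u_hat)
    total = sum((ui - u_bar) ** 2 for ui in u)
    return explained / total


# Actual and predicted MWs (Daltons) of the nine standards, from Table 1.
ACTUAL = [6500, 14400, 21500, 31000, 45000, 66200, 97400, 116250, 200000]
PREDICTED = {
    "Cubic":      [8007, 12277, 20933, 31647, 45519, 67689, 94775, 115949, 201028],
    "-LN^2":      [10267, 13618, 20514, 28822, 44271, 70111, 96991, 118022, 197751],
    "Log-Log":    [10635, 13438, 20518, 28736, 44248, 69995, 97280, 117919, 197683],
    "Quad":       [9374, 12789, 21197, 28090, 44679, 72876, 101049, 120246, 183306],
    "Log Linear": [10472, 12820, 17963, 29506, 50028, 78955, 102609, 117416, 164210],
    "SLIC":       [9211, 11494, 17906, 36594, 57331, 81441, 97197, 107140, 144849],
}


def mean_abs_log_residual(actual, predicted):
    """Average of |log10(actual) - log10(predicted)| across the standards."""
    diffs = [abs(math.log10(a) - math.log10(p)) for a, p in zip(actual, predicted)]
    return sum(diffs) / len(diffs)


scores = {name: mean_abs_log_residual(ACTUAL, pred) for name, pred in PREDICTED.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:<11}{score:.4f}")
```

With the Table 1 values, the Cubic model has the smallest mean absolute log residual, which agrees with point 3.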

TWO METHODS USED IN MEASURING GELS

1. SPOTFINDER - A computer program designed to find and measure the bands.

PROS - less measurement error, and quick. CONS - counts two closely spaced bands as one, and counts spots as bands.

2. ADOBE PHOTOSHOP - The ruler in Adobe Photoshop is used to measure the locations of the bands.

PROS - more bands can be identified by sight. CONS - more measurement error, and more subjective.

The Photoshop method was chosen for this project because SPOTFINDER misidentified some of the bands.
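The poster does not spell out how Rm was computed from the Photoshop ruler readings. Below is a minimal sketch that assumes the common definition of relative mobility: the distance a band migrates divided by the distance the dye front migrates, both measured from the top of the resolving gel. The function name and the pixel positions are hypothetical.

```python
def relative_mobility(band_y, top_y, front_y):
    """Rm = distance migrated by the band / distance migrated by the dye front.

    Assumed definition. All positions are in the same units (e.g. pixels read
    off the ruler), measured along one lane; top_y is the top of the gel.
    """
    if front_y == top_y:
        raise ValueError("dye front and origin coincide")
    rm = (band_y - top_y) / (front_y - top_y)
    if not 0.0 <= rm <= 1.0:
        raise ValueError(f"band lies outside the origin-to-front range (Rm = {rm:.3f})")
    return rm


# Hypothetical ruler readings (pixels) for one lane.
origin, dye_front = 40, 840
bands = [120, 300, 520]
print([round(relative_mobility(b, origin, dye_front), 3) for b in bands])
```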

Molecular Weights (Daltons): actual vs. predicted MW for each model

Actual     Cubic     -LN^2     Log-Log   Quad.     Log Linear  SLIC
6,500      8,007     10,267    10,635    9,374     10,472      9,211
14,400     12,277    13,618    13,438    12,789    12,820      11,494
21,500     20,933    20,514    20,518    21,197    17,963      17,906
31,000     31,647    28,822    28,736    28,090    29,506      36,594
45,000     45,519    44,271    44,248    44,679    50,028      57,331
66,200     67,689    70,111    69,995    72,876    78,955      81,441
97,400     94,775    96,991    97,280    101,049   102,609     97,197
116,250    115,949   118,022   117,919   120,246   117,416     107,140
200,000    201,028   197,751   197,683   183,306   164,210     144,849

R-Squared Ave.:  Cubic 0.996, -LN^2 0.990, Log-Log 0.989, Quad. 0.985, Log Linear 0.949, SLIC 0.863

Table 1: Comparison of the 6 models and the R-squared values produced by each model.

Margin of Error

Fig. 5 Graph of Standards and upper/lower predicted confidence interval at 95%.

[Figure: Actual MW of Standards Compared With Confidence Interval for Predicted MW. x-axis: bands (names of standards): Myosin, Beta-Galactosidase, Phosphorylase B, Serum albumin, Ovalbumin, Carbonic Anhydrase, Trypsin Inhibitor; y-axis: molecular weight (0 to 220,000). Series: Standards, Lower Confidence Limit, Upper Confidence Limit.]

Actual and Predicted Standards with Confidence Interval (C.I.)

GEL 1 Trypsin Supernatant Standard

Standard              Actual MW   Pred MW   Lower Confidence Limit   Upper Confidence Limit
Myosin                200,000     202,292   187,866                  217,825
Beta-Galactosidase    116,250     115,864   110,945                  121,000
Phosphorylase B       97,400      95,109    90,856                   99,561
Serum albumin         66,200      66,590    63,278                   70,075
Ovalbumin             45,000      45,999    43,361                   48,797
Carbonic Anhydrase    31,000      30,284    28,388                   32,307
Trypsin Inhibitor     21,500      21,744    20,136                   23,480

Table 4: Actual and predicted molecular weights of standards with a 95% C.I.
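The intervals in Table 4 are symmetric on a logarithmic scale rather than in Daltons. For myosin, 202,292/187,866 and 217,825/202,292 are both about 1.077. That pattern is consistent with an interval computed for log10(MW) and then back-transformed.

Below is a minimal NumPy sketch of such an interval for the Cubic model. The poster does not say whether its limits are for a mean response or for a new observation, so the hypothetical `new_observation` flag offers both. The t critical value is passed in; 2.571 is the two-sided 95% value for 5 degrees of freedom.

```python
import numpy as np


def cubic_interval(rm_std, mw_std, rm_new, t_crit, new_observation=True):
    """Fit log10(MW) = a + b*Rm + c*Rm^2 + d*Rm^3 by least squares and return
    (predicted MW, lower limit, upper limit) for a band at rm_new.

    The interval is built on the log10 scale and back-transformed, so it is
    symmetric in log10(MW) but not in Daltons. t_crit is the two-sided
    t quantile for n - 4 degrees of freedom.
    """
    rm_std = np.asarray(rm_std, dtype=float)
    X = np.vander(rm_std, 4, increasing=True)         # columns 1, Rm, Rm^2, Rm^3
    y = np.log10(np.asarray(mw_std, dtype=float))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    dof = len(y) - X.shape[1]
    s2 = np.sum((y - X @ coef) ** 2) / dof            # residual variance
    x0 = np.vander([rm_new], 4, increasing=True)[0]
    leverage = x0 @ np.linalg.inv(X.T @ X) @ x0
    factor = 1.0 + leverage if new_observation else leverage
    se = np.sqrt(s2 * factor)
    y0 = x0 @ coef
    return 10 ** y0, 10 ** (y0 - t_crit * se), 10 ** (y0 + t_crit * se)


# Synthetic standards (illustration only): a cubic in log10(MW) plus small scatter.
rm = np.linspace(0.1, 0.9, 9)
log_mw = 5.3 - 1.2 * rm + 0.4 * rm**2 - 0.3 * rm**3
log_mw += 0.01 * np.array([1, -1, 1, -1, 1, -1, 1, -1, 1])
pred, lo, hi = cubic_interval(rm, 10 ** log_mw, 0.45, t_crit=2.571)  # df = 9 - 4 = 5
print(f"{pred:,.0f} Da (95% limits {lo:,.0f} to {hi:,.0f})")
```

The upper limit always sits farther from the prediction than the lower limit does, which matches the pattern in Table 4.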

Unknown Predicted Weights with C.I.

Gel #4 Whole VE time "0" sup

Band      Pred MW   Lower Confidence Limit   Upper Confidence Limit
Band 1    139,917   134,326                  145,740
Band 2    46,835    44,596                   49,185
Band 3    41,750    39,458                   44,175
Band 4    39,590    37,269                   42,056

Table 5: The predicted molecular weights of unknown proteins involved in frog fertilization with a 95% C.I.