Handling Data and Figures of Merit
Data comes in different formats: time plots, histograms, lists.
But... they can contain the same information about quality.
What is meant by quality? (figures of merit)
Precision, separation (selectivity), limits of detection, linear range
My weight (lbs)
day  weight    day  weight    day  weight
1    140       31   143.9     61   144
2    140.1     32   144       62   144.2
3    139.8     33   142.5     63   144.5
4    140.6     34   142.9     64   144.2
5    140       35   142.8     65   143.9
6    139.8     36   143.9     66   144.2
7    139.6     37   144       67   144.5
8    140       38   144.8     68   144.3
9    140.8     39   143.9     69   144.2
10   139.7     40   144.5     70   144.9
11   140.2     41   143.9     71   144
12   141.7     42   144       72   143.8
13   141.9     43   144.2     73   144
14   141.4     44   143.8     74   143.8
15   142.3     45   143.5     75   144
16   142.3     46   143.8     76   144.5
17   141.9     47   143.2     77   143.7
18   142.1     48   143.5     78   143.9
19   142.5     49   143.6     79   144
20   142.3     50   143.4     80   144.2
21   142.1     51   143.9     81   144
22   142.5     52   143.6     82   144.4
23   143.5     53   144       83   143.8
24   143       54   143.8     84   144.1
25   143.2     55   143.6
26   143       56   143.8
27   143.4     57   144
28   143.5     58   144.2
29   142.7     59   144
30   143.7     60   143.9
Plot as a function of the time at which the data was acquired:
[Figure: weight (lbs) versus day]
Do not use curved lines to connect data points: that assumes you know more about the relationship of the data than you really do.
Comments: background is white (less ink); font size is larger than the Excel default (use 14 or 16).
A bin is the range of weights that are grouped together. It is like a grade curve that lists the number of students who got between 95 and 100 pts: 95-100 would be a bin.
Assume my weight is a single, random set of similar data.
Make a frequency chart (histogram) of the data.
[Figure: histogram, number of observations versus weight (lbs)]
Create a “model” of my weight and determine the average weight and how consistent my weight is.
[Figures: weight (lbs) versus day; histogram of number of observations versus weight (lbs) with the model curve overlaid and its inflection point marked]
average = 143.11 lbs
s = 1.4 lbs
s = standard deviation = measure of the consistency, or similarity, of weights
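The average and standard deviation quoted here can be checked directly from the weight table. The following is a minimal sketch, assuming all 84 days are treated as one population and that s is the sample standard deviation (n - 1 in the denominator); the variable names are illustrative.

```python
import statistics

# Daily weights (lbs) for days 1-84, copied from the weight table.
weights = [
    140, 140.1, 139.8, 140.6, 140, 139.8, 139.6, 140, 140.8, 139.7,
    140.2, 141.7, 141.9, 141.4, 142.3, 142.3, 141.9, 142.1, 142.5, 142.3,
    142.1, 142.5, 143.5, 143, 143.2, 143, 143.4, 143.5, 142.7, 143.7,
    143.9, 144, 142.5, 142.9, 142.8, 143.9, 144, 144.8, 143.9, 144.5,
    143.9, 144, 144.2, 143.8, 143.5, 143.8, 143.2, 143.5, 143.6, 143.4,
    143.9, 143.6, 144, 143.8, 143.6, 143.8, 144, 144.2, 144, 143.9,
    144, 144.2, 144.5, 144.2, 143.9, 144.2, 144.5, 144.3, 144.2, 144.9,
    144, 143.8, 144, 143.8, 144, 144.5, 143.7, 143.9, 144, 144.2,
    144, 144.4, 143.8, 144.1,
]

mean_weight = statistics.mean(weights)
s_weight = statistics.stdev(weights)  # sample standard deviation (n - 1)

print(f"n = {len(weights)}, mean = {mean_weight:.2f} lbs, s = {s_weight:.2f} lbs")
```

With these 84 values the result should agree with the slide (about 143.11 lbs and 1.4 lbs).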
Characteristics of the Model Population (Random, Normal)
Peak height, A
Peak location (mean or average), μ
Peak width, W, at baseline
Peak width at half height, W1/2
Standard deviation, s, estimates the variation in an infinite population, σ
Related concepts:
$$f(x) = \frac{A}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$
[Figure: amplitude versus x in units of s, normal error curve from -5s to +5s]
Width is measured at the inflection point = s
W1/2 = width at half height
Triangulated peak: base width is 2s < W < 4s
[Figure: amplitude versus x in units of s, with the width at the inflection point and W1/2 marked]
Area +/- 1s = 68.3%
Area +/- 2s = 95.4%
Area +/- 3s = 99.74%
pp ~ 6s
pp = peak to peak, or the largest separation of measurements
Peak to peak is sometimes easier to “see” on the data vs. time plot
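The areas quoted for +/- 1s, 2s, and 3s follow from the normal error curve. A short sketch using the error function from the Python standard library (the helper name `area_within` is illustrative, not from the slides):

```python
import math

def area_within(k):
    """Fraction of a normal population lying within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"+/- {k}s: {100 * area_within(k):.2f}%")
```

The values agree with the slide figures to within rounding in the last digit.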
[Figure: weight (lbs) versus day, with the peak-to-peak range marked from 139.5 to 144.9 lbs]
pp ~ 6s
s ~ pp/6 = (144.9 - 139.5)/6 ~ 0.9
(Calculated s = 1.4)
[Figure: histogram, number of observations versus weight (lbs)]
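The peak-to-peak shortcut is a one-line calculation. The sketch below only repeats the arithmetic with the limits read off the plot (139.5 and 144.9 lbs, as given on the slide); the variable names are illustrative.

```python
# Peak-to-peak limits read off the weight-versus-day plot (values from the slide).
pp_low, pp_high = 139.5, 144.9

# pp ~ 6s, so a quick estimate of s is pp / 6.
s_estimate = (pp_high - pp_low) / 6

print(f"s ~ pp/6 = {s_estimate:.1f} lbs")
```

This is a rough estimate only; as noted on the slide, the calculated s for the same data is 1.4.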
There are some other important characteristics of a normal (random) population.
[Figures: amplitude versus x in units of s for the population (0th derivative), the 1st derivative, and the 2nd derivative; the derivatives are scaled up to see them better]
Population: 0th derivative.
1st derivative: its peak is at the inflection point of the population curve, which determines the std. dev.
2nd derivative: its peak is at the inflection point of the first derivative. It should be symmetrical for a normal population, and it goes to zero at the std. dev.
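These derivative properties can be checked numerically. The sketch below assumes a normal error curve with mean 0 and s = 1 and uses simple central finite differences on a grid; the grid spacing and variable names are choices made for illustration.

```python
import math

def gauss(x):
    """Normal error curve with mean 0 and s = 1."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

h = 0.01
xs = [i * h for i in range(-500, 501)]  # -5s ... +5s
f0 = [gauss(x) for x in xs]

# Central finite differences at the interior grid points.
f1 = [(f0[i + 1] - f0[i - 1]) / (2 * h) for i in range(1, len(xs) - 1)]
f2 = [(f0[i + 1] - 2 * f0[i] + f0[i - 1]) / h**2 for i in range(1, len(xs) - 1)]
inner = xs[1:-1]

# Location of the peak of the 1st derivative (steepest rising slope).
x_max_slope = inner[max(range(len(f1)), key=lambda i: f1[i])]

# The 2nd derivative changes sign where it crosses zero.
zero_crossings = [inner[i] for i in range(len(f2) - 1) if f2[i] * f2[i + 1] < 0]

print("1st derivative peaks near x =", round(x_max_slope, 2))
print("2nd derivative crosses zero near x =", [round(z, 2) for z in zero_crossings])
```

Both locations fall at about one standard deviation from the mean, which is why the inflection point can be used to read s from the curve.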
Asymmetry can be determined from principal component analysis.
A.F. (≠ Alanah Fitch) = asymmetry factor
Is there a difference between my “baseline” weight and school weight?
Can you “detect” a difference?
Can you “quantitate” a difference?
[Figure: weight (lbs) versus day, with regions labeled Baseline, Vacation, and School Begins]
Comparing TWO populations of measurements
[Figure: histogram, number of observations versus weight (lbs), 138 to 147 lbs, with bars labeled baseline and school]
Exact same information displayed differently, but now we divide the data into different measurement populations: baseline and school.
Model of the data as two normal populations
[Figure: histogram, number of observations versus weight (lbs), with two normal curves; marked are the average baseline weight, the average school weight, the standard deviation of the baseline weight, and the standard deviation of the school weight]
[Figures repeated: weight (lbs) versus day; histogram of all weights; histogram of weights from 138 to 147 lbs]
We have two models to describe the population of measurements of my weight. In one we assume that all measurements fall into a single population. In the second we assume that the measurements have sampled two different populations.
Which is the better model? How do we quantify “better”?
[Figures: two histograms, number of observations versus weight (lbs), one with the single-population model and one with the two-population model overlaid]
Compare how closely the measured data fits the model.
Did I gain weight?
The red bars represent the difference between the two-population model and the data.
The purple lines represent the difference between the single-population model and the data.
Which model has the smaller summed differences?
This process (summing the squares of the differences) is essentially what occurs in an ANOVA (analysis of variance).
The differences are normally squared before summing so that both positive and negative differences are counted.
In the bad old days you had to work out all the sums of squares. In the good new days you can ask Excel to do it for you.
Anova: Single Factor (5% certainty)

SUMMARY
Groups     Count   Sum      Average   Variance
Column 1   12      277.41   23.1175   8.70360227
Column 2   12      345.72   28.81     6.50010909

ANOVA
Source of Variation   SS         df   MS         F            P-value    F crit
Between Groups        194.4273   1    194.4273   25.5762995   4.59E-05   4.300949
Within Groups         167.2408   22   7.601856
Total                 361.6682   23

Test: is F < Fcritical?
If true, the hypothesis is true: single population.
If false, the hypothesis is false: the data cannot be explained by a single population at the 5% certainty level.
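The Excel numbers can be reproduced from the SUMMARY rows alone. This is a sketch of the standard single-factor sums of squares, assuming only the counts, averages, and variances listed in the table; F crit is copied from the Excel output rather than computed, and the variable names are illustrative.

```python
# Summary values copied from the Excel output.
groups = [
    {"n": 12, "mean": 23.1175, "var": 8.70360227},  # Column 1
    {"n": 12, "mean": 28.81, "var": 6.50010909},    # Column 2
]
f_crit = 4.300949  # F crit as reported by Excel

n_total = sum(g["n"] for g in groups)
grand_mean = sum(g["n"] * g["mean"] for g in groups) / n_total

# Between groups: how far each group mean sits from the grand mean.
ss_between = sum(g["n"] * (g["mean"] - grand_mean) ** 2 for g in groups)
# Within groups: scatter of each group about its own mean.
ss_within = sum((g["n"] - 1) * g["var"] for g in groups)

df_between = len(groups) - 1
df_within = n_total - len(groups)

f_stat = (ss_between / df_between) / (ss_within / df_within)
single_population = f_stat < f_crit

print(f"SS between = {ss_between:.4f}, SS within = {ss_within:.4f}")
print(f"F = {f_stat:.4f}, F crit = {f_crit}, single population? {single_population}")
```

Here F is much larger than F crit, so the single-population hypothesis is rejected for these two columns.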
[Figure: frequency versus length (cm), modeled as two populations, White and Red]
Population   N    Sum sq diff   stdev
White        12   0.037         2.55
White        38   0.028         2.15
Red          12   0.11          3.27
Red          40   0.017         2.67
[Figure: frequency versus length (cm), modeled as a single population]
N    Sum sq diff   stdev
24   0.0449        3.96
78   0.108         4.05
In an analysis of variance you test the hypothesis that the sample is best described as a single population.
1. Create the expected frequency (Gaussian from normal error curve)
2. Measure the deviation between the histogram point and the expected frequency
3. Square to remove signs
4. SS = sum of squares
5. Compare to the expected SS, which scales with population size
6. If larger than expected, then the deviations cannot be explained assuming a single population
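Steps 1 to 4 can be sketched with the weight data. The split below is an assumption made for illustration: days 1-11 are taken as “baseline”, days 31-84 as “school”, and the transition days 12-30 are left out; the slides do not state the split. Bins are 1 lb wide from 138 to 147 lbs, and the expected frequency in each bin is taken from the normal error curve. All function names are illustrative.

```python
import math
import statistics

# ASSUMPTION: days 1-11 = baseline, days 31-84 = school, days 12-30 omitted.
baseline = [140, 140.1, 139.8, 140.6, 140, 139.8, 139.6, 140, 140.8, 139.7, 140.2]
school = [
    143.9, 144, 142.5, 142.9, 142.8, 143.9, 144, 144.8, 143.9, 144.5,
    143.9, 144, 144.2, 143.8, 143.5, 143.8, 143.2, 143.5, 143.6, 143.4,
    143.9, 143.6, 144, 143.8, 143.6, 143.8, 144, 144.2, 144, 143.9,
    144, 144.2, 144.5, 144.2, 143.9, 144.2, 144.5, 144.3, 144.2, 144.9,
    144, 143.8, 144, 143.8, 144, 144.5, 143.7, 143.9, 144, 144.2,
    144, 144.4, 143.8, 144.1,
]
edges = list(range(138, 148))  # 1 lb bins: 138-139, ..., 146-147

def cdf(x, mean, s):
    """Cumulative normal distribution."""
    return 0.5 * (1 + math.erf((x - mean) / (s * math.sqrt(2))))

def expected(values, weight=1.0):
    """Step 1: expected frequency per bin from a normal error curve."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return [weight * (cdf(hi, m, s) - cdf(lo, m, s))
            for lo, hi in zip(edges, edges[1:])]

def observed(values):
    """Histogram of the data as a fraction of observations per bin."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        counts[int(v - edges[0])] += 1
    return [c / len(values) for c in counts]

def sum_sq(obs, exp):
    """Steps 2-4: deviation, square, sum."""
    return sum((o - e) ** 2 for o, e in zip(obs, exp))

everything = baseline + school
obs = observed(everything)
w = len(baseline) / len(everything)  # fraction of points in the baseline group

one_pop = expected(everything)
two_pop = [a + b for a, b in zip(expected(baseline, w), expected(school, 1 - w))]

ss_one = sum_sq(obs, one_pop)
ss_two = sum_sq(obs, two_pop)
print(f"SS single population = {ss_one:.3f}, SS two populations = {ss_two:.3f}")
```

Under these assumptions the two-population model leaves the smaller sum of squares. Steps 5 and 6 (comparing against the expected SS) are what the F test in the Excel output carries out.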
[Figure: square difference (expected - measured) versus length (cm)]
The square differences for an assumption of a single population are larger than for the assumption of two individual populations.
There are other measurements which describe the two populations.
Resolution of two peaks:
$$R = \frac{x_a - x_b}{\frac{W_a}{2} + \frac{W_b}{2}}$$
x = mean or average; W = baseline width
[Figures: signal versus x for two peaks at three separations, marking x_a, x_b, the separation x_a - x_b, and the half-widths W_a/2 and W_b/2]
Peaks are baseline resolved when R > 1.
Peaks are just baseline resolved when R = 1.
Peaks are not baseline resolved when R < 1.
2008 Data
[Figure: frequency versus length (cm). White, N=12, sum sq diff=0.037; Red, N=12, sum sq diff=0.11]
What is the R for this data?
x W Wp R W 12
R 1
Anonymous 2009 student analysis of Needleman data
[Figures: two panels, % measured versus verbal IQ (0 to 160), titled “Comparison of 1978 Low Lead to 1979 High Lead” (Series2, Series3) and “Comparison of 1978 Low Lead to 1978 High Lead”; one is labeled “Visually less resolved” and the other “Visually better resolved”]
W_a/2 ~ 112 - 70 ~ 42
W_b/2 ~ 130 - 95 ~ 35
x_a - x_b ~ 112 - 95 ~ 17
$$R = \frac{x_a - x_b}{\frac{W_a}{2} + \frac{W_b}{2}} \approx \frac{17}{42 + 35} \approx 0.22$$
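The resolution arithmetic for the Needleman example can be written as a small function. This is a sketch that takes the peak means as about 112 and 95 and the half-widths as 112 - 70 and 130 - 95, the values read off the histograms in the student analysis; the function name and the use of an absolute value (so the order of the peaks does not matter) are choices made here.

```python
def resolution(x_a, x_b, half_w_a, half_w_b):
    """R = (x_a - x_b) / (W_a/2 + W_b/2); the half-widths are passed in directly."""
    return abs(x_a - x_b) / (half_w_a + half_w_b)

# Values read off the Needleman verbal IQ histograms.
r = resolution(x_a=112, x_b=95, half_w_a=112 - 70, half_w_b=130 - 95)

print(f"R ~ {r:.2f}")
```

Since R is well below 1, the two populations are far from baseline resolved.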
Other measures of the quality of separation of the peaks
1. Limit of detection
2. Limit of quantification
3. Signal to noise (S/N)
[Figures: amplitude versus x in units of s, showing the blank population centered on x_blank and a second population whose mean lies 3s higher, at x_limit of detection]
$$x_{LOD} = x_{blank} + 3 s_{blank}$$
99.74% of the observations of the blank lie within +/- 3s of its mean, so nearly all of them (99.87%) lie below the mean of the first detectable signal (LOD).
[Figure: amplitude versus x in units of s, blank and LOD populations summed, 3s apart]
Two peaks are visible when all the data is summed together.
Estimate the LOD (signal) of this data.
[Figures: weight (lbs) versus day; histogram, number of observations versus weight (lbs)]
$$x_{LOQ} = x_{blank} + 9 s_{blank}$$
Your book suggests 10.
[Figures: amplitude versus x in units of s, blank and LOQ populations separated by 9s]
Limit of quantification requires absolute certainty that no blank is part of the measurement.
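The signal LOD and LOQ are both simple offsets from the blank. The sketch below assumes that days 1-11 of the weight table stand in for the “blank” (baseline); that choice is an assumption for illustration and is not stated on the slides.

```python
import statistics

# ASSUMPTION: days 1-11 of the weight table are treated as the blank (baseline).
blank = [140, 140.1, 139.8, 140.6, 140, 139.8, 139.6, 140, 140.8, 139.7, 140.2]

x_blank = statistics.mean(blank)
s_blank = statistics.stdev(blank)  # sample standard deviation of the blank

x_lod = x_blank + 3 * s_blank  # limit of detection (signal)
x_loq = x_blank + 9 * s_blank  # limit of quantification (some texts use 10)

print(f"x_blank = {x_blank:.2f} lbs, s_blank = {s_blank:.2f} lbs")
print(f"LOD = {x_lod:.2f} lbs, LOQ = {x_loq:.2f} lbs")
```

A different choice of baseline days would shift both limits, so the numbers are only as good as the assumed blank.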
Estimate the LOQ (signal) of this data.
[Figures: weight (lbs) versus day; histogram, number of observations versus weight (lbs)]
Signal = x_sample - x_blank
Noise = N = standard deviation, s
$$\frac{S}{N} = \frac{x_{sample} - x_{blank}}{s} \approx \frac{x_{sample} - x_{blank}}{pp/6}$$
Estimate the S/N of this data.
[Figure: weight (lbs) versus day, with regions labeled Baseline, Vacation, and School Begins, and the signal marked]
Peak-to-peak variation about the school mean ~ 6s, where s = N for noise.
(This assumes pp school ~ pp baseline.)
[Figure: histogram, number of observations versus weight (lbs)]
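A rough S/N estimate can be sketched the same way. The split is again an assumption for illustration: days 1-11 are taken as the baseline (blank) and days 61-84 as the school (sample) data, with the noise estimated as pp/6 of the school data.

```python
import statistics

# ASSUMPTION: days 1-11 = baseline (blank), days 61-84 = school (sample).
baseline = [140, 140.1, 139.8, 140.6, 140, 139.8, 139.6, 140, 140.8, 139.7, 140.2]
school = [
    144, 144.2, 144.5, 144.2, 143.9, 144.2, 144.5, 144.3, 144.2, 144.9,
    144, 143.8, 144, 143.8, 144, 144.5, 143.7, 143.9, 144, 144.2,
    144, 144.4, 143.8, 144.1,
]

signal = statistics.mean(school) - statistics.mean(baseline)
noise = (max(school) - min(school)) / 6  # N ~ pp / 6
snr = signal / noise

print(f"signal = {signal:.2f} lbs, noise = {noise:.2f} lbs, S/N = {snr:.0f}")
```

Using the calculated standard deviation instead of pp/6 would give a somewhat different S/N, which is the point of the “~” in the formula.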
[Figure: length (cm) versus sample number]
Can you “tell” where the switch between red and white potatoes begins?
What is the signal (length of white)?
What is the background (length of red)?
What is the S/N?
Effect of sample size on the measurement
Error curve: peak height grows with the number of measurements. +/- 1s always has the same proportion of the total number of measurements.
However, the actual value of s decreases as the population grows:
$$s_{sample} = \frac{s_{population}}{\sqrt{n_{sample}}}$$
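The square-root dependence is easy to tabulate. The sketch below applies the formula exactly as written on the slide; the value s_population = 1.4 is borrowed from the weight example purely for illustration. (This quantity is commonly called the standard deviation of the mean, or standard error.)

```python
import math

def s_sample(s_population, n_sample):
    """s_sample = s_population / sqrt(n_sample), as written on the slide."""
    return s_population / math.sqrt(n_sample)

# Illustrative value only: s_population = 1.4 lbs from the weight example.
for n in (1, 4, 16, 64):
    print(f"n = {n:2d}: s_sample = {s_sample(1.4, n):.3f}")
```

Quadrupling the number of samples halves the value, so each further gain in precision costs more measurements.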
[Figure: running average and running stdev of red length versus sample number]
2008 Data
[Figure: stdev of red length (cm) versus sqrt(number of samples); trendline y = -0.8807x + 5.9303, R² = 0.9491]
Calibration Curve
A calibration curve is based on a selected measurement that is linear in response to the concentration of the analyte.
Or... a prediction of a measurement due to some change.
Can we predict my weight change if I had spent a longer time on vacation?
$$y = a + bx$$
$$\text{Fitch lbs} = a + b\,(\text{days on vacation})$$
[Figure: histogram, number of observations versus weight (lbs), with the 5 days on vacation marked]
The calibration curve contains information about the sampling of the population.
[Figure: Fitch weight (lbs) versus days on vacation; trendline y = 0.3542x + 140.04, R² = 0.7425]
Can get this by using “trend line”.
[Figure: stdev of red length (cm) versus sqrt(number of samples); trendline y = -0.8807x + 5.9303, R² = 0.9491]
This is just a trendline from “format” data.
Sample   sqrt(#samples)   stdev
1        1                #DIV/0!
2        1.414213562      2.036468
3        1.732050808      4.475727
4        2                4.31441
5        2.236067977      3.844045
6        2.449489743      3.844604
7        2.645751311      3.735124
8        2.828427125      3.458414
9        3                3.235055
10       3.16227766       3.093053
11       3.31662479       2.935944
12       3.464101615      2.950187
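The trendline can be reproduced with an ordinary least-squares fit of stdev against sqrt(number of samples). The sketch assumes the trendline was fitted to samples 5 through 12 only; that range is an inference (it is the range that reproduces the quoted slope and intercept), not something the slides state.

```python
import math

# Running stdev of red length (cm) after n samples, copied from the table.
# ASSUMPTION: the trendline uses n = 5..12 only.
stdev_by_n = {
    5: 3.844045, 6: 3.844604, 7: 3.735124, 8: 3.458414,
    9: 3.235055, 10: 3.093053, 11: 2.935944, 12: 2.950187,
}

xs = [math.sqrt(n) for n in stdev_by_n]
ys = list(stdev_by_n.values())

x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)
sxx = sum((x - x_mean) ** 2 for x in xs)
syy = sum((y - y_mean) ** 2 for y in ys)
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))

slope = sxy / sxx                        # b in y = a + bx
intercept = y_mean - slope * x_mean      # a in y = a + bx
r_squared = sxy ** 2 / (sxx * syy)

print(f"y = {slope:.4f}x + {intercept:.4f}, R2 = {r_squared:.4f}")
```

This is the same calculation Excel performs when a linear trendline is added to a chart.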
Using the Analysis ToolPak:

SUMMARY OUTPUT
Regression Statistics
Multiple R          0.296113395
R Square            0.087683143
Adjusted R Square   -0.013685397
Standard Error      0.703143388
Observations        11

ANOVA
             df   SS            MS         F          Significance F
Regression   1    0.427662048   0.427662   0.864994   0.376617
Residual     9    4.449695616   0.494411
Total        10   4.877357664

               Coefficients   Standard Error   t Stat     P-value    Lower 95%
Intercept      3.884015711    0.514960076      7.542363   3.53E-05   2.719094
X Variable 1   -0.06235252    0.067042092      -0.93005   0.376617   -0.21401

Get an error associated with the intercept.
In the best of all worlds you should have a series of blanks that determine the “noise” associated with the background:
$$x_{LOD} = x_{blank} + 3 s_{blank}$$
Sometimes you forget, so to fall back and punt, estimate the standard deviation of the “blank” from the linear regression.
But remember, in doing this you are acknowledging a failure to plan ahead in your analysis.
Extrapolation of the associated error can be obtained from the linear regression data.
With the sensitivity (slope) b:
$$x_{LOD} = x_{blank} + b\,[\text{conc. LOD}]$$
$$x_{blank} + 3 s_{blank} = x_{blank} + b\,[\text{conc. LOD}]$$
$$[\text{conc. LOD}] = \frac{3 s_{blank}}{b}$$
The concentration LOD depends on BOTH the stdev of the blank and the sensitivity.
!!Note!! Signal LOD ≠ Conc. LOD. We want Conc. LOD.
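The conversion from signal LOD to concentration LOD is a single division by the slope. The numbers in the sketch below are hypothetical and chosen only to show how the result depends on both the blank's standard deviation and the sensitivity; the function name is illustrative, and the absolute value is used so that a negative calibration slope still gives a positive concentration.

```python
def conc_lod(s_blank, slope):
    """[conc. LOD] = 3 * s_blank / b, with b the calibration slope (sensitivity)."""
    return 3 * s_blank / abs(slope)

# HYPOTHETICAL values for illustration only:
# s_blank in signal units, slope in signal units per concentration unit.
print(conc_lod(s_blank=0.5, slope=2.0))  # lower sensitivity
print(conc_lod(s_blank=0.5, slope=4.0))  # doubling the sensitivity halves the LOD
```

A quieter blank or a steeper calibration curve both lower the concentration LOD.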
[Figures: electrode response in mV versus pH or pM (axis running from 12 to 0), with curves labeled Pb2+ and H+. Trendlines: y = -31.143x - 74.333, R² = 0.9994; y = -41x - 118.5, R² = 0.9872]
Selectivity
Difference in slope is one measure of selectivity.
In a perfect method the sensing device would have zero slope for the interfering species.
Limit of linearity: 5% deviation
Summary: Figures of Merit Thus Far
R = resolution
S/N
LOD = both signal and concentration
LOQ
LOL (limit of linearity)
Sensitivity (calibration curve slope)
Selectivity (essentially difference in slopes)
Can be expressed in terms of signal, but the better expression is in terms of concentration.
Tests: ANOVA
Why is the limit of detection important?
Why has the limit of detection changed so much in the last 20 years?
The End
[Figures: two panels, % of measurements versus verbal IQ (40 to 160)]
Which of these two data sets would be likely to have a better numerical value for the ability to distinguish between two different populations?
Needleman’s data
2008 Data
[Figure: frequency versus length (cm). White, N=12, sum sq diff=0.037; Red, N=12, sum sq diff=0.11]
Height for normalized bell curve < 1
Which population is more variable? How can you tell?
[Figure: frequency versus length (cm). White: N=12, stdev=2.55 and N=38, stdev=2.15. Red: N=12, stdev=3.27 and N=40, stdev=2.67]
Increasing the sample size decreases the std dev and increases the separation of the populations. Notice that the means also change, and will do so until we have a reasonable sample of the population.