1
A Bayesian Approach to Assessing Lab Proficiency With Qualitative PCR Assays Used to Detect Biotech Traits in Crop Seed
7th Seminar on Statistics in Seed Testing ISTA
August 2005
Kirk Remund
2
Overview
• Establish seed purity testing need
• Establish lab proficiency need
• Lab proficiency setup & challenges
• Bayesian approach to overcome challenges
• Examples
• Final comments/references
3
Testing Objectives
• Test conventional crop seed for biotech trait seed impurities (test seeds in pools)
• Test biotech trait crop seed for purity (test individual seeds)
Assay Types
• Detect gene event in seed or tissue (PCR)
• Detect protein in seed or tissue (ELISA)
• This talk focuses on qualitative assays
– Positive (+) = one or more seeds in pool are positive for trait
– Negative (-) = all seeds in pool are negative for trait
4
Seed Sampling/Testing Process
[Figure: seed lot (example: StarLink (Cry9c) impurity = 0.05%) → sample, subject to sampling error (other values shown: 0.08%, 0.11%, < 0.19%) → assay system (sample grinding → DNA extraction → PCR amplification & scoring) → tally results & draw conclusion]
5
Lab Proficiency with “Assay System”
• Qualitative assay error metrics
– False positive rate (Fp): scoring a (-) sample as (+)
– False negative rate (Fn): scoring a (+) sample as (-)
– Do not confuse with Type I and Type II errors!
• Properties
– Sensitivity, specificity, robustness
– Repeatability/reproducibility
• Lab must demonstrate that its error rates are low enough to achieve confidence in test results
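The two error rates are computed from different subsets of the blind samples: Fp only from the truly negative samples, Fn only from the truly positive ones. A minimal Python sketch; the `BlindResult` record and the kit counts are hypothetical, chosen only to illustrate the definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlindResult:
    truly_positive: bool    # known status of the blind kit sample (hypothetical record)
    scored_positive: bool   # call reported by the lab


def error_rates(results):
    """Return (Fp, Fn) from a list of blind kit results.

    Fp = share of truly negative samples scored (+);
    Fn = share of truly positive samples scored (-).
    """
    negatives = [r for r in results if not r.truly_positive]
    positives = [r for r in results if r.truly_positive]
    fp = sum(r.scored_positive for r in negatives) / len(negatives)
    fn = sum(not r.scored_positive for r in positives) / len(positives)
    return fp, fn


# Hypothetical kit run: 50 blind (+) with 2 missed, 100 blind (-) with 1 false call.
results = ([BlindResult(True, True)] * 48 + [BlindResult(True, False)] * 2
           + [BlindResult(False, False)] * 99 + [BlindResult(False, True)])
fp, fn = error_rates(results)
print(f"Fp = {fp:.1%}, Fn = {fn:.1%}")  # Fp = 1.0%, Fn = 4.0%
```

Note that Fp and Fn describe individual sample calls, which is why they should not be read as the Type I/II error probabilities of a lot-acceptance decision.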
6
Is quantifying error rates important?
[Figure: probability of accepting lot (0–100%) vs. actual lot impurity (0.00–0.15%); curves for no errors, 10% Fn rate, 5% Fp rate, 10% Fp rate. Test 10 pools of 300 seeds; accept the lot if at most C = 0 pools test positive]
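The curves on this slide are operating-characteristic (OC) curves. The slides do not give the formula, so this is a minimal sketch assuming the standard pooled-testing model: seeds are independent, a pool of n seeds is truly (+) with probability 1 − (1 − p)^n, a pool is scored (+) with probability (1 − Fn)·P(truly +) + Fp·P(truly −), and the lot is accepted when at most C pools are scored (+). The function names are hypothetical.

```python
from math import comb


def prob_pool_scored_positive(impurity, pool_size, fp=0.0, fn=0.0):
    """Probability that a pool is scored (+), given lot impurity and assay error rates.

    Assumes independent seeds, so a pool is truly (-) with prob (1 - impurity)**pool_size.
    """
    p_truly_negative = (1.0 - impurity) ** pool_size
    return (1.0 - fn) * (1.0 - p_truly_negative) + fp * p_truly_negative


def prob_accept(impurity, n_pools, pool_size, c, fp=0.0, fn=0.0):
    """Probability that at most c of n_pools pools are scored (+), i.e. the lot is accepted."""
    q = prob_pool_scored_positive(impurity, pool_size, fp, fn)
    return sum(comb(n_pools, j) * q**j * (1.0 - q) ** (n_pools - j) for j in range(c + 1))


# Plan from this slide: 10 pools of 300 seeds, C = 0. Impurity is given in percent.
scenarios = {"No errors": (0.0, 0.0), "10% Fn rate": (0.0, 0.10),
             "5% Fp rate": (0.05, 0.0), "10% Fp rate": (0.10, 0.0)}
for label, (fp_rate, fn_rate) in scenarios.items():
    curve = [prob_accept(pct / 100, 10, 300, 0, fp_rate, fn_rate) for pct in (0.0, 0.05, 0.10, 0.15)]
    print(f"{label:>12}: " + "  ".join(f"{v:6.1%}" for v in curve))
```

Calling `prob_accept(p, 10, 150, 5, fp, fn)` gives the corresponding curves for the 10 pools of 150 seeds, C = 5 plan. Under the C = 0 plan, a 5% Fp rate alone lowers the chance of accepting a perfectly clean lot to 0.95^10, about 60%.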
7
Is quantifying error rates important?
[Figure: probability of accepting lot (0–100%) vs. actual lot impurity (0.00–1.40%); curves for no errors, 5% Fp rate, 5% Fn rate, 10% Fn rate. Test 10 pools of 150 seeds; accept the lot if at most C = 5 pools test positive]
8
Lab Proficiency Test Kit
• Kit contents
– Known positive and negative control samples
– Blind (+) samples, which could be individual (+) seeds or seed pools [one (+) seed spiked into a pool of numerous (-) seeds]
– Blind (-) samples, which could be individual (-) seeds or seed pools [all (-) seeds]
• Lab runs blind kit samples over multiple days, equipment, and lab technicians
9
Lab Proficiency Testing Challenges
• PCR methods lack the robustness needed to follow the traditional method validation/handoff approach
• Qualitative assays require hundreds of samples, rather than tens, to get reasonable estimates of error rates
• Many assays (e.g., 100+) for which a lab must demonstrate proficiency
• Difficult to find reference material with the required purity
• Substantial resources are required to make test kits for all labs doing testing (sample stability, accurate kit preparation)
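To see why hundreds of samples are needed: even when a lab makes no errors at all on n blind samples, the one-sided 95% exact (Clopper-Pearson) upper limit on its error rate is 1 − 0.05^(1/n). A minimal sketch; the 1% target is an illustrative assumption, not a requirement from the talk.

```python
def zero_failure_upper_bound(n, confidence=0.95):
    """One-sided exact (Clopper-Pearson) upper limit on an error rate after 0 errors in n samples."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


def samples_needed(target, confidence=0.95):
    """Smallest number of error-free samples whose upper limit is at or below `target`."""
    n = 1
    while zero_failure_upper_bound(n, confidence) > target:
        n += 1
    return n


for n in (10, 30, 100, 300):
    print(f"{n:>4} error-free samples -> 95% upper limit {zero_failure_upper_bound(n):.1%}")
print(samples_needed(0.01))  # 299
```

With 30 error-free samples the upper limit is still about 9.5%, and 299 error-free samples are needed to bring it down to 1%. Any observed errors push the required number higher.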
10
Basis for a Bayesian Approach
• Error rates are similar across assays within a lab for the same
– Crop, tissue, pool size, method of detection
• Use historical or prior proficiency data to help calculate error rates for the present assay
[Figure: assay system (sample grinding → DNA extraction → PCR amplification & scoring)]
11
Bayesian Approach Checks
• Process checklist: to ensure that the lab process for the present data is the “same” as the one used to generate the historical data
• Test for differences between kit datasets to determine whether a pooled or unpooled Bayesian approach is used (Fisher’s Exact Test, α = 0.1)
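The slides do not say which form of Fisher’s Exact Test was used when more than two datasets are compared (a 2 × k extension exists). As a minimal sketch, assuming one dataset is compared against the others pooled into a 2 × 2 table of miscalls vs. correct calls, the two-sided test needs only the standard library. `fisher_exact_two_sided` is a hypothetical helper name; `scipy.stats.fisher_exact` is a library alternative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are two kit datasets; columns are miscalls and correct calls.
    The p-value sums the hypergeometric probabilities of every table with
    the same margins that is no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(k):  # P(row 1 holds k of the col1 miscalls), margins fixed
        return comb(row1, k) * comb(n - row1, col1 - k) / total

    p_obs = prob(a)
    k_min, k_max = max(0, col1 - (n - row1)), min(row1, col1)
    # Small relative tolerance so floating-point ties with p_obs are counted.
    return sum(p for p in map(prob, range(k_min, k_max + 1)) if p <= p_obs * (1 + 1e-7))


# Example #1: H4 (9 of 90 miscalled) vs. H1-H3 combined (4 of 180 miscalled).
p_value = fisher_exact_two_sided(9, 90 - 9, 4, 180 - 4)
print(f"p = {p_value:.4f}; pool H4 with the others? {p_value >= 0.1}")
```

With the Example #1 counts the p-value falls well below α = 0.1, consistent with the slide’s conclusion that H4 differs from the other historical datasets.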
12
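Flowchart of Approach
Inputs: present (P) assay data; historical datasets H1, H2, . . ., Hk
• Is each Hi from the same process as P? If no, disregard that Hi; if yes, keep it
• Can all Hi be pooled? If no, use the unpooled Bayesian approach to get error rates and upper bounds
• If yes, can P be pooled with all Hi?
– Yes: pooled Bayesian approach to get error rates and upper bounds (noninformative prior)
– No: unpooled Bayesian approach to get error rates and upper bounds, with the historical data forming the prior and the present data the likelihood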
13
Notation
x = number of miscalls in present dataset
n = number of samples in present dataset
y_i = number of miscalls in ith historical dataset
m_i = number of samples in ith historical dataset
θ = misclassification rate
14
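Unpooled Bayesian Calculations
Likelihood: $x \mid \theta \sim \mathrm{Bin}(n, \theta)$
Prior: $\theta \sim \mathrm{Beta}(\alpha, \beta)$
Posterior: $\theta \mid x \sim \mathrm{Beta}(\alpha + x,\ \beta + n - x)$
Method of moments: $\hat{\alpha} = f(\bar{y}, \hat{\sigma}^2_y)$, $\hat{\beta} = g(\bar{y}, \hat{\sigma}^2_y)$, with $\bar{y}$ and $\hat{\sigma}^2_y$ computed from the historical miscall rates $y_i/m_i$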
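The slides leave f and g unspecified. This minimal sketch assumes the common beta method-of-moments fit: with ȳ and σ̂²_y the sample mean and sample variance (n − 1 denominator) of the historical rates y_i/m_i, set c = ȳ(1 − ȳ)/σ̂²_y − 1, α̂ = ȳ·c and β̂ = (1 − ȳ)·c. With the Example #2 historical data this gives about Beta(0.09, 0.40), consistent with the Beta(0.1, 0.40) prior shown for that example, and the posterior means match the θ̂ values (4.2%, 4.1%, 29.9%) in the unpooled examples. The upper limit is taken as the posterior 0.95 quantile, computed with a hand-written regularized incomplete beta (Numerical Recipes-style continued fraction) so that only the standard library is needed; `scipy.stats.beta.ppf` would do the same job. All function names are hypothetical.

```python
import math
import statistics


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction for step m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta I_x(a, b), i.e. the Beta(a, b) CDF."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def beta_ppf(q, a, b):
    """Beta(a, b) quantile by bisection on the increasing CDF."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mom_beta_prior(historical):
    """Method-of-moments Beta(alpha, beta) prior from (y_i, m_i) pairs (assumed form of f, g)."""
    rates = [y / m for y, m in historical]
    mean = statistics.mean(rates)
    var = statistics.variance(rates)  # sample variance, n - 1 denominator
    if var == 0:
        raise ValueError("historical rates have zero variance")
    c = mean * (1.0 - mean) / var - 1.0
    if c <= 0:
        raise ValueError("historical rates give no valid beta prior")
    return mean * c, (1.0 - mean) * c


def unpooled_posterior(x, n, historical, level=0.95):
    """Return ((alpha, beta) prior, posterior mean, posterior `level` quantile)."""
    alpha, beta = mom_beta_prior(historical)
    a_post, b_post = alpha + x, beta + n - x
    return (alpha, beta), a_post / (a_post + b_post), beta_ppf(level, a_post, b_post)


# Unpooled Example #2: P = 2 of 50; H1-H4 as (miscalls, samples), H4 = 60 of 90.
hist = [(2, 30), (0, 50), (2, 100), (60, 90)]
(alpha, beta), theta_hat, upper = unpooled_posterior(2, 50, hist)
print(f"prior Beta({alpha:.2f}, {beta:.2f}); theta hat {theta_hat:.1%}; upper {upper:.1%}")
```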
15
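Pooled Bayesian Calculations
$s = x + \sum_{i=1}^{k} y_i, \qquad N = n + \sum_{i=1}^{k} m_i$
Prior (non-informative): $\theta \sim \mathrm{Beta}(1, 1) = \mathrm{Uni}(0, 1)$
Likelihood: $s \mid \theta \sim \mathrm{Bin}(N, \theta)$
Posterior: $\theta \mid s \sim \mathrm{Beta}(s + 1,\ N - s + 1)$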
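With the uniform prior the posterior is available in closed form, so the pooled calculation is short. In this minimal sketch θ̂ is taken as the posterior mode s/N and the 95% upper limit as the posterior 0.95 quantile; these choices are a reading of the pooled example slide (they reproduce its 3.4% and about 5.6%), not something the talk states. Because both Beta parameters are integers, the Beta CDF is computed exactly from a binomial tail via I_x(a, b) = P(Bin(a + b − 1, x) ≥ a). Function names are hypothetical.

```python
from math import comb


def beta_cdf_int(x, a, b):
    """Beta(a, b) CDF for positive integers a, b, using I_x(a, b) = P(Bin(a + b - 1, x) >= a)."""
    n = a + b - 1
    return 1.0 - sum(comb(n, j) * x**j * (1.0 - x) ** (n - j) for j in range(a))


def pooled_posterior(datasets, level=0.95):
    """datasets: (miscalls, samples) for P and every Hi that can be pooled with it.

    Uniform Beta(1, 1) prior + Bin(N, theta) likelihood -> Beta(s + 1, N - s + 1) posterior.
    Returns (posterior mode s/N, posterior `level` quantile).
    """
    s = sum(x for x, _ in datasets)
    n_total = sum(n for _, n in datasets)
    a, b = s + 1, n_total - s + 1
    lo, hi = 0.0, 1.0
    for _ in range(100):  # bisection on the increasing CDF
        mid = 0.5 * (lo + hi)
        if beta_cdf_int(mid, a, b) < level:
            lo = mid
        else:
            hi = mid
    return s / n_total, 0.5 * (lo + hi)


# Pooled example: P, H1, H2, H3, H4 as (false negatives, blind positive samples).
data = [(2, 50), (2, 30), (0, 50), (2, 100), (5, 90)]
theta_hat, upper = pooled_posterior(data)
print(f"theta hat {theta_hat:.1%}, 95% upper limit {upper:.1%}")  # theta hat 3.4%, 95% upper limit 5.6%
```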
16
Pooled Bayesian Example
Dataset   N*    False neg.   Error rate
P         50    2            4.0%
H1        30    2            6.7%
H2        50    0            0.0%
H3        100   2            2.0%
H4        90    5            5.6%
[Figure: prior and posterior densities of θ; θ axis from 0.0 to 0.20]
                         theta hat   95% upper limit
Present data only (P)    4.0%        12.1%
P and all Hi             3.4%        5.6%
* N is the number of blind positive samples
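The slides do not say how the “Present data only” rows were computed. One standard choice, assumed in this sketch, is the point estimate x/n with a one-sided 95% exact (Clopper-Pearson) binomial upper limit; for 2 of 50 and 15 of 50 it gives about 12.1% and 42.4%, in line with the values shown in the examples. Function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Bin(n, p)."""
    return sum(comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k + 1))


def exact_upper_limit(x, n, confidence=0.95):
    """One-sided Clopper-Pearson upper limit: the theta where P(X <= x) = 1 - confidence."""
    if x >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(100):  # P(X <= x) decreases as theta increases
        mid = 0.5 * (lo + hi)
        if binom_cdf(x, n, mid) > 1.0 - confidence:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for x, n in [(2, 50), (15, 50)]:  # dataset P in the pooled and unpooled examples
    print(f"x = {x}, n = {n}: theta hat {x / n:.1%}, 95% upper limit {exact_upper_limit(x, n):.1%}")
```

Unlike the Bayesian rows, this limit uses the present data alone, which is why it is much wider when the historical data are consistent with P.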
17
Unpooled Bayesian Example #1
Dataset   N     False neg.   Error rate
P         50    2            4.0%
H1        30    2            6.7%
H2        50    0            0.0%
H3        100   2            2.0%
H4        90    9            10.0%
[Figure: prior and posterior densities of θ; θ axis from 0.0 to 0.20]
                         theta hat   95% upper limit
Present data only (P)    4.0%        12.1%
P and all Hi             4.2%        8.7%
H4 is significantly different from the other Hi (Fisher’s Exact Test, α = 0.1)
18
Compare Posterior Densities
[Figure: pooled vs. unpooled posterior densities of θ; θ axis from 0.0 to 0.20]
19
Unpooled Bayesian Example #2
Dataset   N     False neg.   Error rate
P         50    2            4.0%
H1        30    2            6.7%
H2        50    0            0.0%
H3        100   2            2.0%
H4        90    60           66.7%
[Figure: prior and posterior densities of θ; θ axis from 0.0 to 1.0]
                         theta hat   95% upper limit
Present data only (P)    4.0%        12.1%
P and all Hi             4.1%        9.5%
Do the estimate and upper limit seem too low given the H4 result?
20
Unpooled Bayesian Example #2
[Figure: densities of θ on 0.0 to 1.0: posterior; prior, Beta(0.1, 0.40); Jeffreys prior, Beta(0.5, 0.5)]
21
Unpooled Bayesian Example #3
Dataset   N     False neg.   Error rate
P         50    15           30.0%
H1        30    2            6.7%
H2        50    0            0.0%
H3        100   2            2.0%
H4        90    60           66.7%
[Figure: prior and posterior densities of θ; θ axis from 0.0 to 1.0]
                         theta hat   95% upper limit
Present data only (P)    30.0%       42.4%
P and all Hi             29.9%       40.1%
Estimates and upper limits are close (the prior has little weight)
22
Final Comments
• At present, lab proficiency testing for PCR takes a very large amount of time and money
• Implementing this approach will likely yield significant cost savings
• Even frequentists might agree: is this a legitimate Bayesian application?
• I welcome any comments/criticisms to help us determine if this is a reasonable approach
23
References
1. Carlin, B. P., Louis, T. A., Bayes and Empirical Bayes Methods for Data Analysis, 2nd ed. Chapman & Hall/CRC, New York, 2000.
2. Lindley, D. V., Introduction to Probability and Statistics from a Bayesian Viewpoint, Part 2: Inference, 1st ed. Cambridge University Press, Cambridge.
3. Remund, K. M., Dixon, D. A., Wright, D. L., Holden, L. R., “Statistical considerations in seed purity testing for transgenic traits”, Seed Science Research, June 2001, pages 101-119. (Contact the author by e-mail to receive a copy.)