YFiler®Plus Sensitivity Study
Yfiler® Plus Sensitivity Study (early release kit)
Sonja Klein Mark Timken
Martin Buoncristiani
Cal DOJ Jan Bashinski DNA Lab
Presentation Outline
• What is a sensitivity study?
• Pre-PCR sampling statistics
• Results of the empirical study compared to predicted results based on sampling statistics
• How the results might help with interpretation and provide guidance in addressing the SWGDAM Y-STR guidelines
Yfiler® Plus from Applied Biosystems
• 27-locus Y-STR kit
– 3 known multi-copy loci: DYS389, DYS385, DYF387S1
• 25 uL reaction volume
• 30 cycles
• 1 ng target template
• 6-dye system (5 plus LIZ)
• 3500
– 1.2 kV, 16 sec
– 175 RFU analytical threshold
Yfiler® Plus Plot- 1 ng 10 new loci
What is a Sensitivity Study?
• Validation requirement
• Typically:
– Replicate amps in a dilution series
• Example: 2 ng to 16 pg (for a 1 ng target system)
– Positive control DNA or other known single-source samples (extracted, purified DNA)
Some Sensitivity Study Goals
• Is the target template appropriate for the system?
– Cycle number, reaction volume, instrument/injection settings
– Mid to low-mid range of CCD camera detection
• Over what range of input is signal linear?
• Over what range of input are all peaks likely to be detected?
• What stochastic effects are observed?
– PHRs, dropout, (stutter)
– Stochastic threshold
• Multi-copy loci for Y STRs
– Probability of dropout (for probabilistic genotyping)
• Null allele vs. dropout for Y STRs
Sensitivity Study
• DNA Dilution Series
Template (pg)   # of replicates   Loci   # expected alleles
1000            2                 27     54
500             2                 27     54
250             4                 27     108
125             8                 27     216
62.5            14                27     378
31.25           14                27     378
15.6            14                27     378
7.8             14                27     378
Total           72 amps                  1944 alleles (loci)
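The totals in the dilution-series table can be re-derived from the replicate counts. The sketch below is only a bookkeeping check; the variable names are illustrative and not part of the original study.

```python
# Dilution series from the table above: template (pg) -> number of replicate amps.
replicates = {1000: 2, 500: 2, 250: 4, 125: 8, 62.5: 14, 31.25: 14, 15.6: 14, 7.8: 14}
LOCI_PER_AMP = 27  # loci counted per amplification, as listed in the table

total_amps = sum(replicates.values())
expected_alleles = {pg: n * LOCI_PER_AMP for pg, n in replicates.items()}
total_alleles = sum(expected_alleles.values())

print(total_amps, total_alleles)  # 72 1944
```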
Linearity
[Plot: empirical average peak height (RFU) vs. template (pg), 0 to 1000 pg]
Peak Height Dispersion
[Plot: individual empirical peak heights and empirical average peak height (RFU) vs. template (pg), 0 to 1000 pg]
Peak Height Relative Stdev
[Plot: peak height %CV (stdev PH / avg PH × 100) vs. template (pg), 7.8 to 1000 pg]
Sources of Peak-Height Variation
• Stochastic:
– Pre-PCR allelic sampling
– PCR synthesis
• Systematic:
– Inter-locus imbalance
– Preferential amplification
– Cap-to-cap
– Injection-to-injection
– Degradation
– Inhibition
Pre-PCR Stochastic Sampling Effects
• Publications pointing to pre-PCR sampling as the main source of peak height variance at low template amounts (for extracted DNA, i.e. dissociated alleles)
– Walsh et al., CSH Genome Res. 1992
– Taberlet et al., NAR 1996
– Stenman and Orpana, Nature Biotechnology 2001
– Gill et al., NAR 2005
– Timken et al., FSIG 2014
• This paper compared empirical results of 2372 with ID+ and MF to predicted results based on pre-PCR sampling alone
Stenman and Orpana (2001):
“Assuming that the template molecules are evenly distributed in a solution of known concentration, the probability of a certain number of molecules to be present in an aliquot pipetted from this solution can be calculated according to the Poisson distribution”
Uniform Sampling (what we imagine) vs. Stochastic (Poisson) Sampling (what we get)
Assume a homogeneous DNA solution at 105.6 pg per 10 uL, i.e., 16 copies per 10 uL. This situation is depicted below as a uniform distribution of copies (black balls) in squares representing 10 uL volumes.
If we transfer 10 uL from this solution into a PCR tube, at a single locus with an average of 16 copies, what do stochastic sampling statistics (using the Poisson distribution) say that we’ll get?
Example 10 uL draws: 22 copies, 11 copies, 18 copies, 13 copies
Avg = 16 = λ, Stdev = √λ = 4
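To get a feel for how much individual 10 uL draws scatter around the average of 16 copies, the sampling can be simulated. This is a minimal sketch, not the presenters' method: `poisson_draw` is a hypothetical helper implementing Knuth's multiplication algorithm, and the seed and number of draws are arbitrary choices.

```python
import math
import random

def poisson_draw(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

rng = random.Random(2372)  # arbitrary seed, only for reproducibility
draws = [poisson_draw(16.0, rng) for _ in range(20000)]

mean = sum(draws) / len(draws)
variance = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)

print(draws[:4])                 # a few individual "pipetting" outcomes
print(round(mean, 2), round(variance, 2))  # both should be close to lambda = 16
```

Individual draws routinely land several copies above or below 16, while the mean and variance over many draws both approach λ.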
Poisson Distribution
Poisson assumptions:
1. Alleles have the same average concentration (λ)
2. Alleles are sampled equivalently
3. Alleles are sampled independently
Poisson is a discrete probability distribution.
Poisson properties:
1. The mean of the distribution is equal to λ
2. The variance is also equal to λ
•C.M. Grinstead, J.L. Snell, Introduction to Probability: Second Revised Edition, American Mathematical Society, Rhode Island, 1997.
$$P(X \mid \lambda) = \frac{\lambda^{X} e^{-\lambda}}{X!}$$
The probability that we actually sample exactly 16 according to the Poisson distribution is:
$$P(16 \mid 16) = \frac{16^{16} \times e^{-16}}{16!} = 0.0992$$
(~9.9% of the time exactly 16 copies will be sampled)
We can also calculate a cumulative probability. For example, the probability we sample 16 or fewer copies with an average of 16 (calculated in Excel) is:
P(16 or fewer) = POISSON(16,16,TRUE) =0.566
Pre-PCR Sampling Statistics and Sensitivity Study with ID+/MF
Applying pre-PCR Sampling Statistics to Yfiler® Plus Sensitivity Study
• Test if Poisson sampling statistics can predict Yfiler® Plus peak height variance
• Need:
– Accurate average starting copy number (λ)
– Signal proportional to template (linear)
– Dissociated alleles (extracted DNA)
NIST SRM® 2372A
• NIST Standard Reference Material (SRM® 2372A)
– Single-source male (white blood cell) extract that was developed as a standard for human DNA quantitation*
– 57 ng/uL concentration (known λ)
– Most accurate template available
*Kline et al., Production and certification of NIST Standard Reference Material 2372 Human DNA Quantitation Standard, Anal. Bioanal. Chem. 394 (2009) 1183-1192.
Sensitivity Study 2372A
• DNA Dilution Series
Template (pg)   # of replicates   Loci   # expected alleles   Avg starting copy #
1000            2                 27     54                   151.5
500             2                 27     54                   75.8
250             4                 27     108                  37.9
125             8                 27     216                  18.9
62.5            14                27     378                  9.5
31.25           14                27     378                  4.7
15.6            14                27     378                  2.4
7.8             14                27     378                  1.2
Assuming 6.6 pg per diploid cell (one Y-chromosome copy per male cell):
λ = template (pg) / 6.6 pg
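The average starting copy numbers in the table follow from dividing each template amount by 6.6 pg. A minimal sketch of that conversion, assuming one Y-chromosome copy per 6.6 pg of male DNA (the constant and variable names are illustrative):

```python
PG_PER_COPY = 6.6  # pg of DNA per diploid male cell, i.e. per single Y-chromosome copy

templates_pg = [1000, 500, 250, 125, 62.5, 31.25, 15.6, 7.8]

# Average starting copy number (lambda) for each template amount.
avg_copies = {pg: pg / PG_PER_COPY for pg in templates_pg}

for pg, lam in avg_copies.items():
    print(f"{pg:>7} pg -> lambda = {lam:.1f} copies")
```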
Linearity
RFU/copy = ~ 48
y = 47.66x - 33.196
[Plot: empirical average peak height (RFU) vs. average copy number, 0 to 160 copies, with linear fit]
Signal per starting copy #
[Bar chart: average signal per starting copy (RFU) vs. template (pg), 7.8 to 1000 pg. Bar labels as extracted: 49.8, 46.8, 50.2, 50.9, 41.2, 46.4, 41.2, 48.9]
Detection Sensitivity Analytical Threshold (in copies)
$$AT_c = \frac{AT_{RFU}}{RFU/\text{copy}} = \frac{175}{48.9} = 3.6 \text{ copies}$$
For reference, 2372 with ID+ (25 uL volume, 28 cycles) on the 3500 at an AT of 175 RFU gave an ATc of 4.1 (175/42.4 = 4.1).
$$AT_{pg} = 3.6 \text{ copies} \times 6.6 \text{ pg/copy} \approx 24 \text{ pg}$$
For repeated 24 pg amps, on average, half the alleles will be above the AT and half will be below.
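The conversion of the analytical threshold from RFU to copies and then to picograms is simple arithmetic. The sketch below repeats it with the values quoted on the slide; the variable names are illustrative.

```python
AT_RFU = 175.0       # analytical threshold on the 3500, in RFU
RFU_PER_COPY = 48.9  # signal per starting copy used on the slide
PG_PER_COPY = 6.6    # pg of male DNA per Y-chromosome copy

at_copies = AT_RFU / RFU_PER_COPY  # threshold expressed in starting copies (ATc)
at_pg = at_copies * PG_PER_COPY    # threshold expressed as template mass

print(round(at_copies, 1))  # 3.6
print(round(at_pg))         # 24
```

At a template amount equal to this threshold the average peak sits right at the AT, which is why about half the alleles are expected above it and half below.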
STRBase http://www.cstl.nist.gov/strbase/ystrpos1.htm
Alleles on the Y chromosome are associated in terms of inheritance, but dissociated in terms of sampling from extracted DNA.
Sequences searched by using BLAT Human Genome Assembly (Feb. 2009)
STR          Y location (bases)   Distance to next locus (bases)
DYS393       3.131E+06            1.140E+06
DYS456       4.271E+06            2.590E+06
DYS570       6.861E+06            1.920E+05
DYS576       7.053E+06            8.148E+05
DYS458       7.868E+06            3.502E+05
DYS449       8.218E+06            2.080E+05
DYS481       8.426E+06            2.240E+05
DYS627       8.650E+06            8.719E+05
DYS19        9.522E+06            4.581E+06
DYS391       1.410E+07            2.772E+05
DYS635       1.438E+07            8.711E+04
DYS437       1.447E+07            4.826E+04
DYS439       1.452E+07            9.680E+04
DYS389 I     1.461E+07
DYS389 II    1.461E+07            3.257E+05
DYS438       1.494E+07            2.337E+06
DYS390       1.727E+07            4.512E+04
DYS518       1.732E+07            1.073E+06
DYS533       1.839E+07            3.506E+05
Y GATA H4    1.874E+07            2.058E+06
DYS385 a     2.080E+07            4.023E+04
DYS385 b     2.084E+07            2.090E+05
DYS460       2.105E+07            1.583E+06
DYS392       2.263E+07            1.731E+06
DYS448       2.436E+07            1.566E+06
DYF387S1     2.593E+07            2.100E+06
DYF387S1     2.803E+07
The minimum distance between the 27 Yfiler Plus loci is approximately 40,000 bases (except for DYS389I/II). High quality extracted DNA is ~10,000-25,000 bases so each Y locus should be dissociated and follow sampling statistics (with the exception of DYS389I/II).
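The dissociation argument can be checked against the listed inter-locus distances. The sketch below takes the distances as transcribed from the slide (DYS389 I/II has no listed distance and is left out) and compares the smallest gap with the upper end of the quoted fragment length; the names are illustrative.

```python
# (locus, distance in bases to the next listed locus), as transcribed from the slide.
distances = [
    ("DYS393", 1.140e6), ("DYS456", 2.590e6), ("DYS570", 1.920e5),
    ("DYS576", 8.148e5), ("DYS458", 3.502e5), ("DYS449", 2.080e5),
    ("DYS481", 2.240e5), ("DYS627", 8.719e5), ("DYS19", 4.581e6),
    ("DYS391", 2.772e5), ("DYS635", 8.711e4), ("DYS437", 4.826e4),
    ("DYS439", 9.680e4), ("DYS389 II", 3.257e5), ("DYS438", 2.337e6),
    ("DYS390", 4.512e4), ("DYS518", 1.073e6), ("DYS533", 3.506e5),
    ("Y GATA H4", 2.058e6), ("DYS385 a", 4.023e4), ("DYS385 b", 2.090e5),
    ("DYS460", 1.583e6), ("DYS392", 1.731e6), ("DYS448", 1.566e6),
    ("DYF387S1", 2.100e6),
]

MAX_FRAGMENT_BASES = 25_000  # upper end of the quoted fragment size for extracted DNA

closest_locus, min_distance = min(distances, key=lambda item: item[1])
all_dissociated = min_distance > MAX_FRAGMENT_BASES

print(closest_locus, int(min_distance), all_dissociated)  # DYS385 a 40230 True
```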
Empirical vs. Predicted
• Results:
– Comparison of the Yfiler® Plus empirical dropout rate to the dropout rate predicted by pre-PCR sampling statistics
[Plot: individual empirical peak heights and empirical average peak height (RFU) vs. template (pg), 0 to 125 pg, with the 175 RFU analytical threshold marked]
Empirical Frequency of Dropout at 175 RFU
pg      # alleles < 175 RFU   Total expected alleles   Fr(D) < 175 RFU
7.8     357                   378                      0.94
15.6    295                   378                      0.78
31.25   139                   378                      0.37
62.5    28                    378                      0.074
125     0                     216                      0
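The empirical dropout frequency is the count of alleles below the analytical threshold divided by the number of expected alleles. A short sketch using the counts from the table (variable names are illustrative):

```python
# template (pg) -> (alleles below 175 RFU, total expected alleles), from the table above
counts = {
    7.8: (357, 378),
    15.6: (295, 378),
    31.25: (139, 378),
    62.5: (28, 378),
    125: (0, 216),
}

# Fr(D): fraction of expected alleles that fell below the analytical threshold.
fr_dropout = {pg: below / total for pg, (below, total) in counts.items()}

for pg, fr in fr_dropout.items():
    print(f"{pg:>6} pg  Fr(D) = {fr:.3f}")
```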
Probability of Dropout at ATc of 3.6
pg      copies   P(D) = NORM.DIST(ATc, λ, sqrtλ, TRUE)
7.8     1.18     0.99
15.6    2.36     0.79
31.25   4.73     0.30
62.5    9.47     0.028
125     18.94    2.11E-04
POISSON: a discrete probability distribution (defined only for whole counts, e.g. 0, 1, 2, 3, etc.), so to model the continuous peak height data, we use a Normal approximation to the Poisson.
NORM.DIST(x, mean, stdev, TRUE)
For example, starting with 31.25 pg, P(D) = NORM.DIST(ATc, λ, sqrtλ, TRUE) = NORM.DIST(3.6, 4.7, 2.18, TRUE) = 0.306924
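The Normal approximation can be reproduced without Excel by writing the cumulative Normal distribution in terms of the error function. This is a minimal sketch: `normal_cdf` stands in for NORM.DIST(x, mean, stdev, TRUE), and `p_dropout` is an illustrative helper name, not something defined in the presentation.

```python
import math

PG_PER_COPY = 6.6  # pg per Y-chromosome copy
AT_COPIES = 3.6    # analytical threshold expressed in copies (ATc)

def normal_cdf(x: float, mean: float, stdev: float) -> float:
    """Cumulative Normal distribution, equivalent to NORM.DIST(x, mean, stdev, TRUE)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (stdev * math.sqrt(2.0))))

def p_dropout(template_pg: float, at_copies: float = AT_COPIES) -> float:
    """P(D): probability that the sampled copy number falls below the threshold."""
    lam = template_pg / PG_PER_COPY
    return normal_cdf(at_copies, lam, math.sqrt(lam))

for pg in (7.8, 15.6, 31.25, 62.5, 125):
    print(f"{pg:>6} pg  P(D) = {p_dropout(pg):.3g}")

# The slide's worked example uses rounded inputs (lambda = 4.7, stdev = 2.18).
print(round(normal_cdf(3.6, 4.7, 2.18), 6))
```

Using unrounded λ values gives results that match the table to the precision shown; the worked example differs slightly from the 31.25 pg table row only because it rounds λ and its square root first.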
Fr(D) from Sensitivity Study vs. P(D) using λ and Sampling Statistics
[Plot: P(D) (Normal) or Fr(D) YFP vs. template (pg) in the PCR, 0 to 140 pg. Series: P(D) Normal at ATc 3.6; Fr(D) at AT 175 RFU]
[Same plot with an added series: P(D) LR Analysis (1-18 cells) at ATc 3.6]
Logistic Regression:
$$P(D) = \frac{1}{1 + e^{-(a + bx)}}$$
For ATc 3.6, a = 3.029319, b = −0.11497
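The fitted logistic curve can be evaluated directly from the two coefficients. The transcript does not state what the predictor variable is; the sketch below assumes it is template amount in pg, because that is the x-axis of the accompanying plot. Treat that as an assumption rather than a documented fact.

```python
import math

A = 3.029319   # intercept quoted for ATc 3.6
B = -0.11497   # slope quoted for ATc 3.6

def p_dropout_logistic(x: float, a: float = A, b: float = B) -> float:
    """Logistic dropout probability; x is ASSUMED to be template amount in pg."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

for pg in (7.8, 15.6, 31.25, 62.5, 125):
    print(f"{pg:>6} pg  P(D) = {p_dropout_logistic(pg):.3g}")
```

Under this assumption the curve falls from near 0.9 at 7.8 pg to well below 0.01 at 125 pg, which is at least in the same range as the empirical frequencies reported for the dilution series.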
SWGDAM Interpretation Guidelines for Y-Chromosome STR Typing by Forensic DNA Laboratories (Jan. 9, 2014)
• 3.2.1 The laboratory should establish guidelines for the identification of such null alleles. As an example, this guideline may be based on experimental studies designed to distinguish a null allele from an undetected allele resulting from low template amounts, DNA degradation or inhibition.
• 5.2 The laboratory should establish a stochastic threshold for known multi-copy Y-STR loci based on empirical data derived within the laboratory and specific to the quantitation and amplification systems (e.g., kits) and the detection instrumentation used. It is noted that a stochastic threshold may be established by assessing peak height ratios across any multi-copy locus in a dilution series of DNA amplified in replicate. The RFU value above which it is reasonable to assume that, at a given locus, allelic dropout of a sister allele has not occurred constitutes a stochastic threshold.
Yfiler® Plus Plot - 31 pg
[Electropherogram at 31 pg; two positions are marked with "?"]
Probability of Dropout vs. 2372 Average Peak Height Could Aid Null Allele Assessment
[Plot: P(D) (Normal) or Fr(D) YFP vs. 2372 average peak height (RFU), 0 to 1000 RFU. Series: P(D) LR Analysis (1-18 cells) at ATc 3.6; P(D) Normal at ATc 3.6; Fr(D) at AT 175 RFU]
PHRs for Multi-copy Loci
• PHRs for Y kits
– Very few multi-copy loci to measure PHRs from
• For example, 10 amps with YFP yields only 10 × 2 = 20 PHRs, compared to ~120 PHRs for 10 amps with ID+ (~12 het loci × 10)
– Can use the Poisson distribution to predict PHR dispersion
Poisson Generated PHRs
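One way to generate peak height ratios from the Poisson model is to draw two independent copy counts at the same average λ and take the smaller over the larger. The sketch below is an illustration under that assumption; the transcript does not describe how the presenters generated their PHRs, and `poisson_draw` and `simulate_phrs` are hypothetical helpers. The analytical threshold is ignored here, so a pair with one unsampled copy simply gives a PHR of 0.

```python
import math
import random
import statistics

def poisson_draw(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate_phrs(lam: float, n_pairs: int, rng: random.Random) -> list:
    """Simulate PHRs for a two-copy locus where each copy is sampled independently."""
    phrs = []
    for _ in range(n_pairs):
        a, b = poisson_draw(lam, rng), poisson_draw(lam, rng)
        if max(a, b) == 0:
            continue  # neither copy sampled: no peaks, so no ratio
        phrs.append(min(a, b) / max(a, b))
    return phrs

rng = random.Random(2372)  # arbitrary seed, only for reproducibility
for lam in (9.5, 37.9, 151.5):  # average copies at 62.5, 250 and 1000 pg
    phrs = simulate_phrs(lam, 5000, rng)
    print(lam, round(statistics.median(phrs), 2), round(min(phrs), 2))
```

The spread of simulated PHRs widens quickly as λ drops, which is the dispersion the bullet above refers to.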
Stochastic Threshold for Multi-copy Loci
• Stochastic Threshold
– Can set an ST relative to P(D) risk using the logistic regression curve at the ATc
Logistic Regression (Semi-Log Plot) (Normal approx. of Poisson) for multi-copy ST
[Semi-log plot: logistic regression of P(D) vs. peak height in copies, with 17 copies marked]
A “peak height” of 17 copies has a “sister allele” dropout probability of 1 in 1000 (using an ATc of 3.6).
17 copies × 48.9 RFU/copy = 831 RFU
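Converting a stochastic threshold from copies back to RFU uses the same signal-per-copy factor as the analytical threshold. A minimal sketch with the values quoted above (variable names are illustrative):

```python
RFU_PER_COPY = 48.9  # signal per starting copy used on the slide
ST_COPIES = 17       # copies at which sister-allele dropout is quoted as 1 in 1000

st_rfu = ST_COPIES * RFU_PER_COPY  # stochastic threshold expressed in RFU

print(round(st_rfu))  # 831
```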
Conclusions
A sensitivity study was conducted with NIST SRM® 2372A and Yfiler® Plus on the 3500. Peak heights, on average, were proportional to input template amount. Pre-PCR sampling statistics predicted the empirical dropout rates at the analytical threshold (175 RFU, or an ATc of 3.6 copies). Amplification of an accurate standard can serve as a good starting point for characterizing a system’s sensitivity and estimating dropout probabilities, ST, and PHR ranges.
Limitations
• Repeat in the presence of large amounts of female DNA to verify RFU per copy values
• Systematic signal differences
– Run-to-run
– Inter-color or inter-locus signal differences
• Predictions assume extracted, diluted DNA (100% dissociated model)
– Poisson sampling will overestimate the variance if performing direct amps (associated alleles)