Cybergenetics © 2007-2011 1
Taming Uncertainty inForensic DNA Evidence
ENFSI MeetingENFSI MeetingApril, 2011April, 2011
Brussels, BelgiumBrussels, Belgium
Mark W Mark W PerlinPerlin, PhD, MD, PhD, PhD, MD, PhDCybergenetics, Pittsburgh, PACybergenetics, Pittsburgh, PA
Cybergenetics © 2003-2011Cybergenetics © 2003-2011
Uncertainty
•• people (and the law) want people (and the law) want certaintycertainty•• scientific data are scientific data are uncertainuncertain
•• probabilityprobability describes uncertainty describes uncertainty•• informationinformation is is a change in probabilitya change in probability
•• DNA identification is an information science DNA identification is an information science•• goal is goal is maximal objective informationmaximal objective information
Genotype
?
Prior probabilityPrior probability
10, 1010, 1110, 1210, 1311, 1111, 1212, 1313, 13 …
unknown personpopulation
Cybergenetics © 2007-2011 2
Genotype
?
Likelihood functionLikelihood function
10, 1010, 1110, 1210, 13biological father
child
mother
Genotype
?10, 1010, 1110, 1210, 13
Posterior probabilityPosterior probability
Objective Bayesian inference posterior = prior x likelihood
biological father
child
mother
Match
?10, 1010, 1110, 1210, 13 Pr(matching genotype)
10, 11
biological father
child
mother
putative father
50%
Cybergenetics © 2007-2011 3
Match
?10, 1010, 1110, 1210, 13
10, 1010, 1110, 1210, 1311, 1111, 1212, 1313, 13 …
Pr(matching genotype)
Pr(cooincidental genotype)
10, 11
50%
5%
biological father
child
mother
putative father
population
InformationPr(matching genotype)
Pr(cooincidental genotype)LR =
Likelihood ratioLikelihood ratio
log(LR) is a standard measure of information
50%5%
= 10=
log10(10) = 1 information unit
MW Perlin. Explaining the likelihood ratio in DNA mixture interpretation. Proceedingsof Promega's 21st International Symposium on Human Identification, 2010.
Kinship
?10, 1010, 1110, 1210, 13
10, 1010, 1110, 1210, 1311, 1111, 1212, 1313, 13 …
Pr(matching genotype)
Pr(cooincidental genotype)
10, 11
missing person
son
spouse
daughter
father
biological remains
90%
5%population
Cybergenetics © 2007-2011 4
Phenotype
+
evidence
reality
PhenotypeLab
+
10 11 12
evidence STR data
reality observation
Phenotype
evidence STR data genotype
10 11 12
10, 10 @ 30%10, 11 @ 50%10, 12 @ 20%
Lab Infer
+
reality observation model
Cybergenetics © 2007-2011 5
Phenotype
?10, 1010, 1110, 1210, 13
10, 1010, 1110, 1210, 1311, 1111, 1212, 1313, 13 …
Pr(matching genotype)
Pr(cooincidental genotype)
10, 11
50%
5%
evidence
suspect
population
10 11 12
Computer Interpretation
• quantitative computer interpretation• statistical search of probability model• preserve all identification information• objectively infer genotype, then match
• any number of mixture contributors• stutter, imbalance, degraded DNA• calculate uncertainty of every peak
Quantitative Data
Cybergenetics © 2007-2011 6
Quantitative Interpretation
Perlin MW, Szabady B. Linear mixture analysis: a mathematical approach to resolving mixedDNA samples. Journal of Forensic Sciences, 2001.
Calculate Peak Uncertainty
MW Perlin, A Sinelnikov. An information gap in DNA evidence interpretation. PLoS ONE, 2009.
Infer Accurate Genotype
MW Perlin, MM Legler, CE Spencer, JL Smith, WP Allan, JL Belrose, BW Duceman.Validating TrueAllele DNA mixture interpretation. Journal of Forensic Sciences, 2011.
Cybergenetics © 2007-2011 7
Qualitative Thresholds
Over threshold,peaks are treatedas allele events.
Under threshold,alleles do not exist.
list ofincluded alleles
Qualitative Method VariationNational Institute of Standards and Technology
Two Contributor Mixture Data, Known Victim
31 thousand (4)
213 trillion (14)
Fingernail: 7% MixtureCommonwealth v. Foley
Score Method13 thousand inclusion
23 million use victim189 billion quantitative
• probability modeling preserves information• peak threshold discards information
The DNA Investigator™ Newsletter, 2009Same Data, More Information – Murder, Match and DNA
Cybergenetics © 2007-2011 8
More Data, More Information
• low template mixture• three DNA contributors• triplicate amplification• post-PCR enhancement
vWA locus data
• no match score found• computer interpretation• joint likelihood function
Information
Gen
oype
pro
babi
lility
(vW
A)
Assume theta = 1%, and three contributors. A match between the suspect and the evidenceis 3,620,000 times more probable than coincidence.
6x
More Data, More Information
33
4455
66
Perlin MW, Greenhalgh M. Scientific combination of DNA evidence: a handgun mixture ineight parts. Twentieth International Symposium on the Forensic Sciences of the Australian
and New Zealand Forensic Science Society, Sydney, Australia. 2010.
Cybergenetics © 2007-2011 9
Data: Locus D18
33
44
55
66
11
22
44
88
Preserve vs. Discard
MW Perlin, A Sinelnikov. An information gap in DNA evidence interpretation. PLoS ONE, 2009.
Cybergenetics © 2007-2011 10
MW Perlin, MM Legler, CE Spencer, JL Smith, WP Allan, JL Belrose, BW Duceman.Validating TrueAllele DNA mixture interpretation. Journal of Forensic Sciences, 2011.
7.03 7.03
6.24 6.2413.2613.26
Quantitative InterpretationQuantitative Interpretation
Inclusion methodInclusion method
Preserve vs. Discard
Validated Reproducibility
0
5
10
15
20
25
2A 2C 2H 2D 2B 2F 2G 2E
log(L
R)
Rep 1
Rep 2
0.1750.175
MW Perlin, MM Legler, CE Spencer, JL Smith, WP Allan, JL Belrose, BW Duceman.Validating TrueAllele DNA mixture interpretation. Journal of Forensic Sciences, 2011.
Preserve vs. Discard
• quantitative interpretation preserves information - every time• peak threshold discards information - 70% of the time
Perlin MW, Duceman BW. Profiles in productivity: greater yield at lower cost with computerDNA interpretation. Twentieth International Symposium on the Forensic Sciences of the
Australian and New Zealand Forensic Science Society, Sydney, Australia. 2010.
Cybergenetics © 2007-2011 11
Data Peak Height is a RandomVariable with Mean and Variance
Accurate vs. DispersedGenotype Probability
Missed Allele Error > 100%
Threshold = 200 rfu
0.00
0.25
0.50
0.75
1.00
1.25
1.50
1.75
50:50 30:70 10:90
DNA mixture ratio
Fals
e n
eg
ati
ve r
ate
(all
ele
s/
locu
s)
1000 pg
500 pg
250 pg
125 pg
Perlin MW. Reliable interpretation of stochastic DNA evidence.Canadian Society of Forensic Sciences 57th Annual Meeting; Toronto, ON. 2010.
Cybergenetics © 2007-2011 12
Investigative DNA Database10, 1010, 1110, 1210, 13
evidence genotypes suspect genotypes
LR
• genotype probability representation (not alleles)• fully preserves DNA identification information• enables LR calculation with every match
10, 1010, 1110, 1210, 13
• connect crimes to criminals• disaster victim identification (WTC)• find missing people• automatic familial search• combat terrorism through DNA
MW Perlin, JB Kadane, RW Cotton. Match likelihood ratio for uncertain genotypes.Law, Probability and Risk, 2009.
Societal Consequences• much informative DNA evidence goes unused• lost in "interim" qualitative interpretation methods, thresholds applied for analyst convenience• DNA databases (e.g., CODIS) lose information, using inclusion-based "alleles", instead of ISFG's preferred LR-based "genotype" probability
Inaccurate, uninformative DNA interpretation:• DNA analysts lose genotypes • prosecutors lose criminal cases• databases lose investigative leads • innocent law abiding citizens lose lives
The DNA Investigator™ Newsletter, 2011DNA Intelligence and Forensic Failure – What you don't know can kill you
International Consensus1. DNA data is continuous, and has random variation2. Thresholds do not work for low template DNA3. Mathematical models can account for random variation
4. The 21st century might be a good time to move awayfrom potentially biased human review of low level (oralmost any) DNA data to some sort of objectivecomputer interpretation that can infer genotypes up toprobability, without ever looking at suspects, that givessome (possibly uninformative) objective answer.
M Perlin; P Gill, J Buckleton, B Budowle, A van Daal. Low template DNA controversy.Twentieth International Symposium on the Forensic Sciences of the Australian and New
Zealand Forensic Science Society, Sydney, Australia. 2010.
Cybergenetics © 2007-2011 13
Strengthening Forensic Science
http://www.cybgen.com/[email protected]
•• objective, reliable, consistent interpretation objective, reliable, consistent interpretation•• preserve all available information, preserve all available information, from the scene of crime tofrom the scene of crime to courtcourt
•• probabilistic genotypesprobabilistic genotypes•• modern statistics and computation modern statistics and computation•• solid mathematical foundation solid mathematical foundation
•• platinum standard for preserving DNA information platinum standard for preserving DNA information•• move beyond less informative interim methods move beyond less informative interim methods
Learning More
www.cybgen.com/information
• Newsletters gentle introduction to ideas• Courses for scientists and lawyers• Presentations handouts, movies, transcripts• Publications abstracts, manuscripts
The science of quantitative DNA mixture interpretation
Top Related