Applications of in Forensic Science
Pushing Out the Frontiers Nick D. K. Petraco and Many Others John Jay College of Criminal Justice!
Outline • Admissibility of Scientific Evidence is a
problem! • Frye and the Daubert Standards
• How chemistry, engineering, math and computers can help forensic science • Current Projects At John Jay:
• Petroleum Distillates (Fire Debris) • Dust (Trace Evidence) • Cartridge cases (Firearms)
Admissibility of scientific evidence!
• Principal legal standards: Frye and Daubert
• Frye (1923) – Testimony offered as “scientific” must “...have gained general acceptance in the particular field in which it belongs”.
• New York is still a “Frye State”
• Daubert (1993)- Judges are the “gatekeepers” of scientific evidence.
• Must determine if the science is reliable • Has empirical testing been done?
• Falsifiability • Has the science been subject to peer review? • Are there known error rates? • Is there general acceptance?
• Federal Government and 26(-ish) States are “Daubert States”
Frye and Daubert
Raising Standards with Data and Statistics
• DNA profiling the most successful application of statistics in forensic science. • Responsible for current interest in “raising standards” of
other branches in forensics.
• No protocols for the application of statistics to physical evidence. • Our goal: application of objective, numerical
computational pattern comparison to physical evidence
• Statistical pattern comparison! • Modern algorithms are called
machine learning • Idea is to measure
features of the physical evidence that characterize it
• Train algorithm to recognize “major” differences between groups of features while taking into account natural variation and measurement error.
What Statistics Can Be Used?
• R is not a proprietary black box! • Open-source and totally transparent!
• R maintained by a professional group of statisticians, and computational scientists • From very simple to state-of-the-art procedures available • Very good graphics for exhibits and papers
• R is extensible (it is a full scripting language) • R has “commercial versions” too, Revolution
R, S+
Why ?
Fire Debris Analysis Casework
• Liquid gasoline samples recovered during investigation: • Unknown history • Subjected to various real world conditions.
• If an individual sample can be discriminated from the larger group, this can be of forensic interest.
• Gas-Chromatography Commonly Used to ID gas. • Peak comparisons of chromatograms difficult and time
consuming. • Does “eye-balling” satisfy Daubert, or even Frye .....????
Study Design • This study was undertaken to examine the variability of
gasoline components in • Twenty liquid gasoline samples • Samples from fire investigations in the New York City area
• All samples analyzed using Gas Chromatography-Mass Spectrometry • Keto and Wineman target compounds • Fifteen peaks were chosen in this study that represented the
common components present in gasoline.
• Normalized GC-MS peak areas were utilized to test the discrimination potential of multiple multivariate methods for discrimination.
Chosen Peaks
M. Gil!
• Use prcomp, lda (MASS) and rgl: 10D PCA-3D CVA • HOO-CV correct classification rate: 100%.
Dust
N. Petraco!
Hans Gross
A 19th Century German magistrate influenced by the writings of Sir Arthur Conan Doyle suggested that Dust and other traces be allowed in legal proceedings.
Dust
N. Petraco!
It enables one to identify the people places and things involved in an event.
What can it tell you?
It helps one to associate the people, places and things involved in an event.
It can often tell a story.
It can help one reconstruct the event.
Develop a simple method that enables you to identify the trace materials commonly found in dust samples
Develop a simple generic data sheet (Tool) that allows you to quickly collect data on the trace materials commonly found in dust samples
Write some analysis scripts
Analyze the data
Convert the data sheet in to an Excel Spreadsheet and load into
N. Petraco!
• Conformal Prediction TheoryVovk et al.
• New, but has roots in 1960’s with Kolmogorov’s ideas on randomness and algorithmic complexity.
• Can be used with any statistical pattern classification algorithm.
• Independent of data’s underlying probability distribution. • This is a very important property for forensic pattern
recognition!! • Well, …sample should be I.I.D.
• For identification of patterns, method produces • “Confidence region” at Level of confidence, 1- α
• Confidence: Measure of how likely I.D. procedure is to be correct • Results are valid: P(identification error) ≤α
3D PCA-Clustering can show potential for discrimination
• Use e1071, caret, pls and custom scripts: • PCA-SVM 27D, refined bootstrap
error rate estimate= 0.7%, • 95% CI [0.0%,3.3 %]
• CPT 99% level of confidence “I.D.” • Empirical Error rate = 0% • Unique and correct ID intervals
= 93.1% • PLS-DA 35D refined Bootstrap error
rate estimate= 0.8% • 95% CI [0.0%,3.3%]
Known Match Comparisons 5/8” Consecutively manufactured chisels
G. Petillo
Tool Marks
• Obtain striation pattern profiles form 3D confocal microscopy
Approach For Striated Tool Marks
Glock 19 firing pin impression
Primer!shear!
P. Diaczuk!
• 3D confocal image of entire shear pattern
Shear marks on primer of two different Glock 19s
Mean total profile:
Mean “waviness” profile:
Mean “roughness” profile:
• Primer shears (82-91 profiles) – PCA-SVM, CPT at the 95% level of confidence
• Empirical error rate was 4.7% • 90.7% of I.D. intervals were unique and correct • 7% of I.D. intervals had more than 1 I.D. • No “uninformative” intervals were returned
– PCA-SVM, HOO-CV • Error rate estimate is 0.0%-4.4%, depending on the number of replicates
– PLS-DA, Bootstrap (>10 replicates only) • 95% confidence interval for error rate: [0%, 0%] • 95% confidence interval for average false positive rate: [0%, 0%] • 95% confidence interval for average false negative rate: [0%, 0%]
– PLS-DA, HOO-CV • Error rate estimate is 0.0%-4.3%, depending on the number of replicates
• Results so far are on par with expectations
Primer Shear
• 3D PCA 36 Glocks, 1080 simulated and real primer shear profiles:
• 18D PCA-SVM, refined bootstrap gun I.D. error rate 0.3%, 95% CI [0%, 0.8%]
Empirical Bayes’ • Bayes’ Rule: can we realistically estimate posterior
error probabilities empirically/falsifiably??
Pr S- | t +( ) = Pr t+ | S-( )
Pr t +( ) Pr S-( )Probability of no actual association given a test/algorithm indicates a positive ID
• Perhaps. Genomics has spawned similar questions: • What is the probability of no disease (S-) given the
differences in expression “scores” of thousands of genes.
Empirical Bayes’ • Erfon’s machinery for “empirical Bayes’ two-groups
model”Efron 2007
• Surprisingly simple! • S-, truly no association, Null hypothesis • S+, truly an association, Non-null hypothesis • z, a Gaussian random variate derived from a machine learning task
to ID an unknown pattern with a group
• Scheme yields estimate of Pr(S-|z) along with it’s standard error • Called the local false discovery rate (fdr) or posterior error
probability
• Given a similarity score, fdr is an estimate of the probability that the computer is wrong in calling a “match”
• Catch: you need A LOT of z and they should be fairly independent and Pr(S-) > 0.9
Empirical Bayes’ • Machine learning algorithms dump out tons of
“scores” measuring how much they think each unknown “piece of evidence”, “matches” with a known
1 2 3 4 5 6 7 8 1 0.819231296 0.02159198 0.029272183 0.025411563 0.0225503 0.010915617 0.024680949 0.046346112 2 0.765918964 0.02255741 0.050851857 0.030990821 0.02792472 0.016217858 0.028200947 0.057337426 3 0.879527253 0.01795078 0.0184986 0.022467998 0.01577359 0.007162571 0.011941812 0.026677397 4 0.800343998 0.02064226 0.045323988 0.024858598 0.02244252 0.012063858 0.022874428 0.051450344 5 0.767143734 0.02275608 0.040918155 0.035104182 0.02878297 0.012899321 0.028778064 0.063617494 6 0.85119471 0.02110206 0.023900293 0.019113873 0.02001155 0.008916055 0.018147978 0.037613483 7 0.74589297 0.0218173 0.046976363 0.042961529 0.03109739 0.019528189 0.033183337 0.058542922 8 0.858658608 0.01868246 0.028668362 0.019978748 0.01658029 0.008302535 0.015666725 0.033462271 9 0.757389572 0.02335122 0.031192719 0.041061437 0.03268829 0.015409691 0.036750576 0.062156499
10 0.861134581 0.01798733 0.019716501 0.032786291 0.01893542 0.009636956 0.012723958 0.027078964 11 0.705447085 0.04062574 0.039299857 0.038929197 0.03988337 0.021602265 0.031111236 0.083101251 12 0.880200022 0.02285615 0.015669536 0.011783052 0.01489933 0.024089363 0.00854029 0.021962262 13 0.846512708 0.03225958 0.01772897 0.018825927 0.02479879 0.012076544 0.015157749 0.032639732 14 0.906210922 0.01286627 0.015271791 0.014028597 0.01511204 0.006311254 0.009610058 0.020589068 15 0.924204618 0.01267558 0.010580565 0.012674802 0.01076464 0.007229735 0.006838076 0.015031985 16 0.743561031 0.04290822 0.013293883 0.020473147 0.04545213 0.080523563 0.019370505 0.034417526 17 0.863369692 0.01235278 0.045802525 0.02472657 0.01088378 0.00856469 0.007809193 0.026490772 18 0.918347765 0.01613136 0.008822265 0.011735291 0.01131915 0.012266314 0.006711902 0.01466595
Gun ID (knowns)
Primer Shear # (unknowns)
• Platt SVM “probability scores” from Glock 19 ID study:
Empirical Bayes’ • Use these SVM “Platt scores” to form p-values
• Transform p-values with probit function • Produces A LOT of z values • The z are fairly independent……
Null (Non−Match) Histogram (via HOO)
z−values
Densi
ty
−4 −2 0 2 4
0.00.1
0.20.3
0.4
−4 −2 0 2 4
0.00.1
0.20.3
0.4
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!!
!
!
!
!!
!
!
!
!!
!
!
!
!
!
!
!!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!!
!
!!
!
!
!!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!!
!!
!!
!
!
!!!
!!
!!!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!!
!
!
!!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!!
!
!!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!!
!
!
!
!
!!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!!
!!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!!
!
!!!
!
!
!!
!
!
!
!
!
!
!
!
!
!!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
−3 −2 −1 0 1 2 3
−3−2
−10
12
Normal Q−Q Plot
Theoretical Quantiles
Sample
Quantile
s
Checking assumptions on z-scores of Glock 19 primer shears!
Empirical Bayes’ • Use locfdr: Gives a “calibrated posterior association
probability” model
−4 −2 0 2
0.00.2
0.40.6
0.81.0
1.2
Estimated Posterior Error Probilities (Local FDRs)
z value
Pr(S−|z
) est.
−4 −2 0 2
0.00.2
0.40.6
0.81.0
1.2
−4 −2 0 2
0.00.2
0.40.6
0.81.0
1.2
−4.8 −4.6 −4.4 −4.2
0.000
00.0
005
0.001
00.0
015
Estimated Posterior Error Probilities (Local FDRs)
z value
Pr(S−|z
) est.
−4.8 −4.6 −4.4 −4.2
0.000
00.0
005
0.001
00.0
015
−4.8 −4.6 −4.4 −4.2
0.000
00.0
005
0.001
00.0
015
This is the est. prob of no association
Computer outputs “match” for: unknown-known from “Bob the burglar”, falls here!
This is an uncertainty in the estimate
Empirical Bayes’
−5.5 −5.0 −4.5 −4.0 −3.5 −3.0 −2.5
0.00.2
0.40.6
0.81.0
1.21.4
z value
Pr(S−|
z) est.
−5.5 −5.0 −4.5 −4.0 −3.5 −3.0 −2.5
0.00.2
0.40.6
0.81.0
1.21.4
−5.5 −5.0 −4.5 −4.0 −3.5 −3.0 −2.5
0.00.2
0.40.6
0.81.0
1.21.4
!! ! !! !! ! !!! !!! !!!! !!! !! !!!
!
!
!
! !! !! !
!
! !! !!!!! ! ! !! !!!! !!!! !! !! !!!! !!
!
!! !!! !! !!! !! ! !! !!!!!! ! !! !! ! !! !!
!
!! !!!
!!
−5.5 −5.0 −4.5 −4.0 −3.5 −3.0 −2.5
0.00.2
0.40.6
0.81.0
1.21.4
• For the Glock 19 primer shear study: • Posterior error probs. (black circles) estimated by HOO-CV
The SVM alg got these Primer shear IDs wrong!
Just Getting Started: Things to Come • Footwear • Soil • Wrenches • Chisels • Q.D.
• Tire Tracks • Hair • Blood Spatter • Gun Shot Residue
N. Petraco!
N. Petraco!
Acknowledgements • Research Team:
• Mr. Peter Diaczuk • Dr. Peter De Forest • Ms. Carol Gambino • Mr. Mark Gil • Dr. James Hamby • Dr. Thomas Kubic • Off. Patrick McLaughlin • Dr. Linton Mohammed • Mr. Jerry Petillo • Mr. Nicholas Petraco • Dr. Peter A. Pizzola • Dr. Graham Rankin • Dr. Jacqueline Speir • Dr. Peter Shenkin • Mr. Peter Tytell
• Helen Chan • Manny Chaparro • Julie Cohen • Aurora Dimitrova • Eric Gosslin • Frani Kammerman • Brooke Kammrath
• Loretta Kuo • Dale Purcel • Stephanie Pollut • Rebecca Smith • Elizabeth Willie • Chris Singh • Melodie Yu
Top Related