Download - Applications of in Forensic Science - Meetupfiles.meetup.com › 1406240 › Forensic_Science.pdfApplications of in Forensic Science Pushing Out the Frontiers Nick D. K. Petraco and

Applications of in Forensic Science

Pushing Out the Frontiers Nick D. K. Petraco and Many Others John Jay College of Criminal Justice!

Outline •  Admissibility of Scientific Evidence is a

problem! •  Frye and the Daubert Standards

•  How chemistry, engineering, math and computers can help forensic science •  Current Projects At John Jay:

•  Petroleum Distillates (Fire Debris) •  Dust (Trace Evidence) •  Cartridge cases (Firearms)

Admissibility of scientific evidence!

•  Principal legal standards: Frye and Daubert

•  Frye (1923) – Testimony offered as “scientific” must “...have gained general acceptance in the particular field in which it belongs”.

•  New York is still a “Frye State”

•  Daubert (1993)- Judges are the “gatekeepers” of scientific evidence.

•  Must determine if the science is reliable •  Has empirical testing been done?

•  Falsifiability •  Has the science been subject to peer review? •  Are there known error rates? •  Is there general acceptance?

•  Federal Government and 26(-ish) States are “Daubert States”

Frye and Daubert

Raising Standards with Data and Statistics

•  DNA profiling the most successful application of statistics in forensic science. •  Responsible for current interest in “raising standards” of

other branches in forensics.

•  No protocols for the application of statistics to physical evidence. •  Our goal: application of objective, numerical

computational pattern comparison to physical evidence

•  Statistical pattern comparison! •  Modern algorithms are called

machine learning •  Idea is to measure

features of the physical evidence that characterize it

•  Train algorithm to recognize “major” differences between groups of features while taking into account natural variation and measurement error.

What Statistics Can Be Used?

•  R is not a proprietary black box! •  Open-source and totally transparent!

•  R maintained by a professional group of statisticians, and computational scientists •  From very simple to state-of-the-art procedures available •  Very good graphics for exhibits and papers

•  R is extensible (it is a full scripting language) •  R has “commercial versions” too, Revolution

R, S+

Why ?

Fire Debris Analysis Casework

•  Liquid gasoline samples recovered during investigation: •  Unknown history •  Subjected to various real world conditions.

•  If an individual sample can be discriminated from the larger group, this can be of forensic interest.

•  Gas-Chromatography Commonly Used to ID gas. •  Peak comparisons of chromatograms difficult and time

consuming. •  Does “eye-balling” satisfy Daubert, or even Frye .....????

Study Design •  This study was undertaken to examine the variability of

gasoline components in •  Twenty liquid gasoline samples •  Samples from fire investigations in the New York City area

•  All samples analyzed using Gas Chromatography-Mass Spectrometry •  Keto and Wineman target compounds •  Fifteen peaks were chosen in this study that represented the

common components present in gasoline.

•  Normalized GC-MS peak areas were utilized to test the discrimination potential of multiple multivariate methods for discrimination.

Chosen Peaks

M. Gil!

•  Use prcomp, lda (MASS) and rgl: 10D PCA-3D CVA •  HOO-CV correct classification rate: 100%.

Dust

N. Petraco!

Hans Gross

A 19th Century German magistrate influenced by the writings of Sir Arthur Conan Doyle suggested that Dust and other traces be allowed in legal proceedings.

Dust

N. Petraco!

It enables one to identify the people places and things involved in an event.

What can it tell you?

It helps one to associate the people, places and things involved in an event.

It can often tell a story.

It can help one reconstruct the event.

Develop a simple method that enables you to identify the trace materials commonly found in dust samples

Develop a simple generic data sheet (Tool) that allows you to quickly collect data on the trace materials commonly found in dust samples

Write some analysis scripts

Analyze the data

Convert the data sheet in to an Excel Spreadsheet and load into

N. Petraco!

•  Conformal Prediction TheoryVovk et al.

•  New, but has roots in 1960’s with Kolmogorov’s ideas on randomness and algorithmic complexity.

•  Can be used with any statistical pattern classification algorithm.

•  Independent of data’s underlying probability distribution. •  This is a very important property for forensic pattern

recognition!! •  Well, …sample should be I.I.D.

•  For identification of patterns, method produces •  “Confidence region” at Level of confidence, 1- α

•  Confidence: Measure of how likely I.D. procedure is to be correct •  Results are valid: P(identification error) ≤α

3D PCA-Clustering can show potential for discrimination

•  Use e1071, caret, pls and custom scripts: •  PCA-SVM 27D, refined bootstrap

error rate estimate= 0.7%, •  95% CI [0.0%,3.3 %]

•  CPT 99% level of confidence “I.D.” •  Empirical Error rate = 0% •  Unique and correct ID intervals

= 93.1% •  PLS-DA 35D refined Bootstrap error

rate estimate= 0.8% •  95% CI [0.0%,3.3%]

Known Match Comparisons 5/8” Consecutively manufactured chisels

G. Petillo

Tool Marks

•  Obtain striation pattern profiles form 3D confocal microscopy

Approach For Striated Tool Marks

Glock 19 firing pin impression

Primer!shear!

P. Diaczuk!

•  3D confocal image of entire shear pattern

Shear marks on primer of two different Glock 19s

Mean total profile:

Mean “waviness” profile:

Mean “roughness” profile:

•  Primer shears (82-91 profiles) –  PCA-SVM, CPT at the 95% level of confidence

•  Empirical error rate was 4.7% •  90.7% of I.D. intervals were unique and correct •  7% of I.D. intervals had more than 1 I.D. •  No “uninformative” intervals were returned

–  PCA-SVM, HOO-CV •  Error rate estimate is 0.0%-4.4%, depending on the number of replicates

–  PLS-DA, Bootstrap (>10 replicates only) •  95% confidence interval for error rate: [0%, 0%] •  95% confidence interval for average false positive rate: [0%, 0%] •  95% confidence interval for average false negative rate: [0%, 0%]

–  PLS-DA, HOO-CV •  Error rate estimate is 0.0%-4.3%, depending on the number of replicates

•  Results so far are on par with expectations

Primer Shear

•  3D PCA 36 Glocks, 1080 simulated and real primer shear profiles:

•  18D PCA-SVM, refined bootstrap gun I.D. error rate 0.3%, 95% CI [0%, 0.8%]

Empirical Bayes’ •  Bayes’ Rule: can we realistically estimate posterior

error probabilities empirically/falsifiably??

Pr S- | t +( ) = Pr t+ | S-( )

Pr t +( ) Pr S-( )Probability of no actual association given a test/algorithm indicates a positive ID

•  Perhaps. Genomics has spawned similar questions: •  What is the probability of no disease (S-) given the

differences in expression “scores” of thousands of genes.

Empirical Bayes’ •  Erfon’s machinery for “empirical Bayes’ two-groups

model”Efron 2007

•  Surprisingly simple! •  S-, truly no association, Null hypothesis •  S+, truly an association, Non-null hypothesis •  z, a Gaussian random variate derived from a machine learning task

to ID an unknown pattern with a group

•  Scheme yields estimate of Pr(S-|z) along with it’s standard error •  Called the local false discovery rate (fdr) or posterior error

probability

•  Given a similarity score, fdr is an estimate of the probability that the computer is wrong in calling a “match”

•  Catch: you need A LOT of z and they should be fairly independent and Pr(S-) > 0.9

Empirical Bayes’ •  Machine learning algorithms dump out tons of

“scores” measuring how much they think each unknown “piece of evidence”, “matches” with a known

1 2 3 4 5 6 7 8 1 0.819231296 0.02159198 0.029272183 0.025411563 0.0225503 0.010915617 0.024680949 0.046346112 2 0.765918964 0.02255741 0.050851857 0.030990821 0.02792472 0.016217858 0.028200947 0.057337426 3 0.879527253 0.01795078 0.0184986 0.022467998 0.01577359 0.007162571 0.011941812 0.026677397 4 0.800343998 0.02064226 0.045323988 0.024858598 0.02244252 0.012063858 0.022874428 0.051450344 5 0.767143734 0.02275608 0.040918155 0.035104182 0.02878297 0.012899321 0.028778064 0.063617494 6 0.85119471 0.02110206 0.023900293 0.019113873 0.02001155 0.008916055 0.018147978 0.037613483 7 0.74589297 0.0218173 0.046976363 0.042961529 0.03109739 0.019528189 0.033183337 0.058542922 8 0.858658608 0.01868246 0.028668362 0.019978748 0.01658029 0.008302535 0.015666725 0.033462271 9 0.757389572 0.02335122 0.031192719 0.041061437 0.03268829 0.015409691 0.036750576 0.062156499

10 0.861134581 0.01798733 0.019716501 0.032786291 0.01893542 0.009636956 0.012723958 0.027078964 11 0.705447085 0.04062574 0.039299857 0.038929197 0.03988337 0.021602265 0.031111236 0.083101251 12 0.880200022 0.02285615 0.015669536 0.011783052 0.01489933 0.024089363 0.00854029 0.021962262 13 0.846512708 0.03225958 0.01772897 0.018825927 0.02479879 0.012076544 0.015157749 0.032639732 14 0.906210922 0.01286627 0.015271791 0.014028597 0.01511204 0.006311254 0.009610058 0.020589068 15 0.924204618 0.01267558 0.010580565 0.012674802 0.01076464 0.007229735 0.006838076 0.015031985 16 0.743561031 0.04290822 0.013293883 0.020473147 0.04545213 0.080523563 0.019370505 0.034417526 17 0.863369692 0.01235278 0.045802525 0.02472657 0.01088378 0.00856469 0.007809193 0.026490772 18 0.918347765 0.01613136 0.008822265 0.011735291 0.01131915 0.012266314 0.006711902 0.01466595

Gun ID (knowns)

Primer Shear # (unknowns)

•  Platt SVM “probability scores” from Glock 19 ID study:

Empirical Bayes’ •  Use these SVM “Platt scores” to form p-values

•  Transform p-values with probit function •  Produces A LOT of z values •  The z are fairly independent……

Null (Non−Match) Histogram (via HOO)

z−values

Densi

ty

−4 −2 0 2 4

0.00.1

0.20.3

0.4

−4 −2 0 2 4

0.00.1

0.20.3

0.4

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!!

!

!

!

!!

!

!

!

!!

!

!

!

!

!

!

!!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!!

!

!!

!

!

!!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!!

!!

!!

!

!

!!!

!!

!!!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!!

!

!

!!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!!

!

!!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!!

!

!

!

!

!!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!!

!!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!!

!

!!!

!

!

!!

!

!

!

!

!

!

!

!

!

!!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

−3 −2 −1 0 1 2 3

−3−2

−10

12

Normal Q−Q Plot

Theoretical Quantiles

Sample

Quantile

s

Checking assumptions on z-scores of Glock 19 primer shears!

Empirical Bayes’ •  Use locfdr: Gives a “calibrated posterior association

probability” model

−4 −2 0 2

0.00.2

0.40.6

0.81.0

1.2

Estimated Posterior Error Probilities (Local FDRs)

z value

Pr(S−|z

) est.

−4 −2 0 2

0.00.2

0.40.6

0.81.0

1.2

−4 −2 0 2

0.00.2

0.40.6

0.81.0

1.2

−4.8 −4.6 −4.4 −4.2

0.000

00.0

005

0.001

00.0

015

Estimated Posterior Error Probilities (Local FDRs)

z value

Pr(S−|z

) est.

−4.8 −4.6 −4.4 −4.2

0.000

00.0

005

0.001

00.0

015

−4.8 −4.6 −4.4 −4.2

0.000

00.0

005

0.001

00.0

015

This is the est. prob of no association

Computer outputs “match” for: unknown-known from “Bob the burglar”, falls here!

This is an uncertainty in the estimate

Empirical Bayes’

−5.5 −5.0 −4.5 −4.0 −3.5 −3.0 −2.5

0.00.2

0.40.6

0.81.0

1.21.4

z value

Pr(S−|

z) est.

−5.5 −5.0 −4.5 −4.0 −3.5 −3.0 −2.5

0.00.2

0.40.6

0.81.0

1.21.4

−5.5 −5.0 −4.5 −4.0 −3.5 −3.0 −2.5

0.00.2

0.40.6

0.81.0

1.21.4

!! ! !! !! ! !!! !!! !!!! !!! !! !!!

!

!

!

! !! !! !

!

! !! !!!!! ! ! !! !!!! !!!! !! !! !!!! !!

!

!! !!! !! !!! !! ! !! !!!!!! ! !! !! ! !! !!

!

!! !!!

!!

−5.5 −5.0 −4.5 −4.0 −3.5 −3.0 −2.5

0.00.2

0.40.6

0.81.0

1.21.4

•  For the Glock 19 primer shear study: •  Posterior error probs. (black circles) estimated by HOO-CV

The SVM alg got these Primer shear IDs wrong!

Just Getting Started: Things to Come •  Footwear •  Soil •  Wrenches •  Chisels •  Q.D.

•  Tire Tracks •  Hair •  Blood Spatter •  Gun Shot Residue

N. Petraco!

N. Petraco!

Acknowledgements •  Research Team:

•  Mr. Peter Diaczuk •  Dr. Peter De Forest •  Ms. Carol Gambino •  Mr. Mark Gil •  Dr. James Hamby •  Dr. Thomas Kubic •  Off. Patrick McLaughlin •  Dr. Linton Mohammed •  Mr. Jerry Petillo •  Mr. Nicholas Petraco •  Dr. Peter A. Pizzola •  Dr. Graham Rankin •  Dr. Jacqueline Speir •  Dr. Peter Shenkin •  Mr. Peter Tytell

•  Helen Chan •  Manny Chaparro •  Julie Cohen •  Aurora Dimitrova •  Eric Gosslin •  Frani Kammerman •  Brooke Kammrath

•  Loretta Kuo •  Dale Purcel •  Stephanie Pollut •  Rebecca Smith •  Elizabeth Willie •  Chris Singh •  Melodie Yu