forensim: an open source initiative for method evaluation...

33
Background The forensim package Method evaluation : An example forensim: an open source initiative for method evaluation in forensic genetics Hinda Haned Laboratory of Biometry and Evolutionary Biology, University of Lyon, France Wroskhop in Forensic Genetics, Institute of Forensic Medecine, Oslo, November 2009 1 / 33

Transcript of forensim: an open source initiative for method evaluation...

Page 1: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

forensim: an open source initiative for methodevaluation in forensic genetics

Hinda Haned

Laboratory of Biometry and Evolutionary Biology, University of Lyon, France

Wroskhop in Forensic Genetics, Institute of Forensic Medecine,Oslo, November 2009

1 / 33

Page 2: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

Forensic DNA mixtures : A challenging task

Interpretation issues

Is it a mixture ?

How many peopleinvolved ?

Weight of the stain asan evidence ?

2 / 33

Page 3: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

Available methods

I Several methods dedicated to mixtures interpretation areavailable :

LR in case of population substructure Curran et al 1999Number of contributors Egeland et al 2003Unknown related contributors Fung and Hu 2003Genotyping errors Thompson et al 2003

⇒ Lack of evaluation of methods’ efficiency and robustness

3 / 33

Page 4: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

How to evaluate these methods ?

On simulated DNA stains where the circumstances of thehypothetical crime are known by the experimenter.

The experimenter would evaluate method’s efficiency :1 While varying accurate parameters :

type of markers analyzednumber of markers analyzednumber of contributors to the DNA evidence

2 In critical situations :

population subdivision (co-ancestry)partial profilesrelatedness between contributors to the DNA stainallele dropout

4 / 33

Page 5: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

How to evaluate these methods ?

I Laboratory simulated DNA stains :Some scenarios are hard to test in laboratory (ex. populationsubstructure)Cost issues : new experiments are to be conducted for eachtested scenario

I Computer simulated DNA stains :

Complex scenarios can be simulatedNo cost issues

Currently, there is no free software providing simulation toolsspecific to forensic genetics.

5 / 33

Page 6: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

How to evaluate these methods ?

I Laboratory simulated DNA stains :

Some scenarios are hard to test in laboratory (ex. populationsubstructure)Cost issues : new experiments are to be conducted for eachtested scenario

I Computer simulated DNA stains :Complex scenarios can be simulatedNo cost issues

Currently, there is no free software providing simulation toolsspecific to forensic genetics.

6 / 33

Page 7: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

The forensim package

Main features

I forensim is a package for the statistical software

I forensim is freely available

I Relies on object oriented programming

I Sources freely available on

I Compiles and runs on a wide variety of UNIX platforms,Windows and MacOS

7 / 33

Page 8: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

forensim’s structure

Simulation tools

Simulation of datacommonly encounteredin forensic casework

Statistical tools

Main statistical methodsfor forensic DNAevidence interpretation

8 / 33

Page 9: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

Simulation tools

Object oriented programming

Structured code, can easily be modified/ enriched ⇒ allows a widevariety of scenarios

9 / 33

Page 10: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

Statistical tools

Statistical methods usually used to report the weight of a DNAevidence are implemented :

Random man exclusion probability

θ correction for allele dependencies Weir, In Buckelton et al, 2005

Likelihood ratios

General formula for likelihood ratios Curran et al, 1999

Random match probabilities

Accounts for :I relatednessI allele dependencies

Balding & Nichols, 1994

10 / 33

Page 11: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

Simulation tools : Focus on DNA mixtures

Two kinds of information stored :

Usual information

• Alleles present in the stain

• Marker names

• Allele frequencies of the putative population

Simulation-related information

• Number of individuals involved

• Contributors’ genotypes

• Contributors’ populations

11 / 33

Page 12: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

Simulating forensic DNA mixtures

Simulating a 3-person mixture, using the African American allelefrequencies (Butler et al, 2003) :

Step1 : load the package

> library(forensim)

### forensim 1.1.2 is loaded ###

Step2 : generate the data

> data(strusa)> geno <- simugeno(strusa, n = c(100, 0, 0))> mix3 <- simumix(geno, ncontri = c(3, 0, 0))

12 / 33

Page 13: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

Simulating forensic DNA mixtures

Simulating a 3-person mixture, using the African American allelefrequencies (Butler et al, 2003) :

Step1 : load the package

> library(forensim)

### forensim 1.1.2 is loaded ###

Step2 : generate the data

> data(strusa)> geno <- simugeno(strusa, n = c(100, 0, 0))> mix3 <- simumix(geno, ncontri = c(3, 0, 0))

13 / 33

Page 14: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

Simulating forensic DNA mixtures

Mixture representation in forensim

> mix3

# Simumix object: simulated mixture #

@which.loc: vector of 15 locus names@ncontri: [email protected]: 3 x 15 data frame of the contributors [email protected]: list of the alleles found in the mixture@popinfo: populations of the contributors

Display stain profiles at locus FGA

> mix3$mix.all$FGA

[1] "20" "21" "24" "25"

Display contributors profiles at locus FGA

> mix3$mix.prof[, "FGA"]

ind70 ind58 ind1"21/24" "24/25" "21/20"

14 / 33

Page 15: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

Simulating forensic DNA mixtures

Mixture representation in forensim

> mix3

# Simumix object: simulated mixture #

@which.loc: vector of 15 locus names@ncontri: [email protected]: 3 x 15 data frame of the contributors [email protected]: list of the alleles found in the mixture@popinfo: populations of the contributors

Display stain profiles at locus FGA

> mix3$mix.all$FGA

[1] "20" "21" "24" "25"

Display contributors profiles at locus FGA

> mix3$mix.prof[, "FGA"]

ind70 ind58 ind1"21/24" "24/25" "21/20"

15 / 33

Page 16: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

Simulating forensic DNA mixtures

Mixture representation in forensim

> mix3

# Simumix object: simulated mixture #

@which.loc: vector of 15 locus names@ncontri: [email protected]: 3 x 15 data frame of the contributors [email protected]: list of the alleles found in the mixture@popinfo: populations of the contributors

Display stain profiles at locus FGA

> mix3$mix.all$FGA

[1] "20" "21" "24" "25"

Display contributors profiles at locus FGA

> mix3$mix.prof[, "FGA"]

ind70 ind58 ind1"21/24" "24/25" "21/20"

16 / 33

Page 17: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

Reporting the weight of the evidence

What is the exclusion probability of the DNA evidence ?

>PE(mix3, freq = strusa, refpop = "Afri", theta = 0, byloc =FALSE)

PE0.999989

Help page

I mix : the DNA mixture

I freq : the allele frequencies to use

I refpop : the reference population, used only if freq contains allele frequencies formultiple populations

I theta : θ correction for allele dependencies

I byloc : logical indicating whether the PE is computed by/overall loci

17 / 33

Page 18: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

Reporting the weight of the evidence

What is the exclusion probability of the DNA evidence ?

>PE(mix3, freq = strusa, refpop = "Afri", theta = 0, byloc =FALSE)

PE0.999989

Help page

I mix : the DNA mixture

I freq : the allele frequencies to use

I refpop : the reference population, used only if freq contains allele frequencies formultiple populations

I theta : θ correction for allele dependencies

I byloc : logical indicating whether the PE is computed by/overall loci

18 / 33

Page 19: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

Reporting the weight of the evidence

By locus exclusion probability

> PE(mix3, freq = strusa, refpop = "Afri", byloc = TRUE)

PE_lCSF1PO 0.6315FGA 0.6320TH01 0.4140TPOX 0.2629VWA 0.1739D3S1358 0.2893D5S818 0.2018D7S820 0.2259D8S1179 0.6082D13S317 0.1739D16S539 0.4404D18S51 0.5828D21S11 0.5426D2S1338 0.6339D19S433 0.8437

19 / 33

Page 20: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

Determining the number of contributors to a DNA mixture

I In many situations, scarce data is available about the origin ofthe satin

No available suspectUnknown contributorsScarce non genetic evidence

An estimate of the number of contributors can help theinvestigators !

20 / 33

Page 21: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

Determining the number of contributors to a DNA mixture

I A common laboratory practice : the number of contributorsset to the minimum required to explain the profiles.

An alternative approach :

I A maximum-likelihood estimator of the number ofcontributors to a forensic DNA mixture

Egeland et al. Estimating the number of contributors to a DNA profile. Int J Legal

Med 2003 ;117(5) : 271-5.

21 / 33

Page 22: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

The maximum likelihood apporach

Let A be a specific locus with alleles A1, ...,Ak withfrequencies p1, ..., pk in a given population.

Crime scene profiles : A1 and A2 .

What is the likelihood of these profiles, if there were twocontributors supplying these alleles ?

22 / 33

Page 23: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

The maximum likelihood apporach

7 genotype pairs are possible :

(A1A1,A2A2) (A2A2,A1A1) (A1A1,A1A2)(A2A2,A1A2) (A1A2,A1A1) (A1A2,A1A2)(A1A2,A2A2)

Assuming the independence of alleles between and withinindividuals :

Pr(A1A1) = p21 and Pr(A1A2) = 2p1p2

Pr(A1A1,A1A2) = Pr(A1A1)× Pr(A1A2)

Adding the genotype probabilities for all 7 genotype pairs

LA(x = 2) = 4p31p2 + 6p2

1p22 + 4p1p

32

23 / 33

Page 24: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

The likelihood function

I Generalization :Multiallelic lociAllele dependencies due to population subdivision

I Need for automatizationInspired from the general formula for likelihood ratios fromCurran et al. (1999)

LA(x) =r∑

r1=0

r∑r2=0

...

r−r1−r2−...−rc−2∑rc−1

(2x)!c∏

i=1

ui !

×

c∏i=1

ui−1∏j=0

[(1− θ)pi + jθ]

2x−1∏j=0

[(1− θ) + jθ]

Haned et al Estimating the number of contributors to forensic DNA mixtures : Does maximum likelihood perform

better than maximum allele count ? J. Forensic. Sci., accepted (2010) 24 / 33

Page 25: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

Maximum likelihood estimation

The maximum likelihood estimation of x , when a single marker Ais considered, verifies :

maxj=1,2,3,...

LA(x = j)

When multiple loci are considered simultaneously :

maxj=1,2,3...

∏A

LA(x = j)

25 / 33

Page 26: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

Method evaluation

Does maximum likelihood perform better then maximumallele count ?

26 / 33

Page 27: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

Implementation

Maximum allele count

>mincontri(mix3)

[1] 3

Maximum likelihood

>likestim(mix = mix3, freq = strusa, refpop = "Afri"’, theta = 0)

max maxval3 2.6e-26

Help page

I mix : the DNA mixture

I freq : the allele frequencies to use

I refpop : the reference population, used only if freq contains allele frequencies formultiple populations

I theta : θ correction for allele dependencies

27 / 33

Page 28: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

Implementation

Maximum allele count

>mincontri(mix3)

[1] 3

Maximum likelihood

>likestim(mix = mix3, freq = strusa, refpop = "Afri"’, theta = 0)

max maxval3 2.6e-26

Help page

I mix : the DNA mixture

I freq : the allele frequencies to use

I refpop : the reference population, used only if freq contains allele frequencies formultiple populations

I theta : θ correction for allele dependencies

28 / 33

Page 29: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

Method evaluation procedure

I 1000 DNA stains comprising x contributors, x=1,...,5.

> Mix2<-replicate(1000,simumix(geno, ncontri = c(2, 0, 0)))

I For each mixture : an error is scored if the value of x thatmaximizes the likelihood is different from the true number ofcontributors.

> res<-sapply(Mix2,likestim,strusa,"Afri")

29 / 33

Page 30: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

Mixture simulated with African American allele frequencies

I DNA stains comprising 1 to 5 individuals belonging to thesame population : African Americans

x 1 2 3 4 5

Max. Likelihood 1 1 0.94 0.79 0.67Max. All. count. 1 1 0.99 0.45 0.05

30 / 33

Page 31: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

Other situations can be investigated

More functionalities available via other packages :

• Basic statistical inference

• Bayesian inference

• Familial analysis

• Population genetics

31 / 33

Page 32: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

How to get help

You are not familiar with :

Do not worry ! A detailed tutorial with practical and reproducibleexamples is available online :http://forensim.r-forge.r-project.org/

You are encountering problems using forensim :

I Post a message on forensim mailing list :[email protected].

I Contact me :[email protected]

32 / 33

Page 33: forensim: an open source initiative for method evaluation ...forensim.r-forge.r-project.org/misc/oslo.pdf · for forensic DNA evidence interpretation 8/33. Background The forensim

BackgroundThe forensim package

Method evaluation : An example

Contributions are greatly encouraged !

forensim is evolving, and you can participate !

I Suggestions ?

I Particular needs ?

I Contributions to the package : data, methods... are welcome !

http://forensim.r-forge.r-project.org/

33 / 33