LoComatioN: A software tool for the analysis of low copy number DNA profiles


Forensic Science International 166 (2007) 128–138


Peter Gill (a,*), Amanda Kirkham (a), James Curran (b)

(a) Forensic Science Service, Trident Court, 2960 Solihull Parkway, Solihull B37 7YN, UK
(b) Department of Statistics, The University of Auckland, Private Bag 92019, Auckland, New Zealand

Received 23 February 2006; accepted 11 April 2006

Available online 8 June 2006

Abstract

Previously, the interpretation of low copy number (LCN) STR profiles has been carried out using the biological or ‘consensus’ method: essentially, alleles are not reported unless duplicated in separate PCR analyses [P. Gill, J. Whitaker, C. Flaxman, N. Brown, J. Buckleton, An investigation of the rigor of interpretation rules for STRs derived from less than 100 pg of DNA, Forens. Sci. Int. 112 (2000) 17–40]. The method is now widely used throughout Europe. Although a probabilistic theory was introduced at the same time, its time-consuming complexity meant that it could not easily be applied in practice. The ‘consensus’ method is not as efficient as the probabilistic approach, because it wastes information in DNA profiles. The probabilistic theory was subsequently extended to allow for DNA mixtures and population substructure in a programmed solution by Curran et al. [J.M. Curran, P. Gill, M.R. Bill, Interpretation of repeat measurement DNA evidence allowing for multiple contributors and population substructure, Forens. Sci. Int. 148 (2005) 47–53]. In this paper, we describe an expert interpretation system (LoComatioN) which removes this computational burden and enables application of the full probabilistic method. This is the first expert system that can be used to rapidly evaluate numerous alternative explanations in a likelihood ratio approach, greatly facilitating court evaluation of the evidence. This would not be possible with manual calculation. Finally, the Gill et al. and Curran et al. papers both rely on the ability of the user to specify two quantities: the probability of allelic drop-out, and the probability of allelic contamination (“drop-in”). In this paper, we offer some guidelines on how these quantities may be specified.

© 2006 Elsevier Ireland Ltd. All rights reserved.

Keywords: Low copy number (LCN); Automation; LoComatioN; Likelihood ratio; Propositions

1. Introduction

Low copy number (LCN) DNA profiling is a term used to describe the analysis of very small amounts of DNA from a few cells (<200 pg). In ideal conditions it is possible to obtain a profile from a single cell (≈6 pg) by raising the number of PCR amplification cycles from 28 to 34 [1]. Although the sensitivity of the test is greatly improved, Gill et al. [2] and Whitaker et al. [3] showed that interpretation is complicated.

One phenomenon is that heterozygotes become highly “imbalanced”. Heterozygote imbalance arises when one of the alleles in a heterozygous genotype amplifies more strongly than the other, even though one person contributed both of the alleles. We observe this “imbalance” as a difference in peak heights or peak areas. Secondly, an extreme form of heterozygote imbalance results in the disappearance of an allele, giving the false appearance of a homozygote. This phenomenon is known as allele ‘drop-out’. Finally, ‘drop-in’ (contamination) occurs when one or two additional alleles appear in the profile; these must also be included in the assessment [4].

LCN profiles are often DNA mixtures. In modern DNA mixture interpretation it has become customary to use peak height or area information to guide the choice of feasible genotype combinations. However, interpretation methods that include a consideration of peak height/area are not appropriate for LCN evidence. These “quantitative” methods have primarily been developed for profiles where there is a significant amount (>200 pg) of DNA present. The assumption that the peak height/area of alleles is proportional to the actual amount of DNA present [5–7] is well established; however, with LCN, stochastic effects compromise this relationship [8].

* Corresponding author: P. Gill.
0379-0738/$ – see front matter © 2006 Elsevier Ireland Ltd. All rights reserved.
doi:10.1016/j.forsciint.2006.04.016


Although a probabilistic method has been published [9,10], the likelihood ratio (LR) calculations are far too complex to carry out manually, especially once the theory was extended to include mixture interpretation with multiple contributors. An interim method, called the “biological model”, was therefore introduced. The biological model depended upon the derivation of a “consensus” profile. A consensus profile only reports alleles that were reproducible from two or more replicate analyses of an extracted DNA sample [9,11]. As contamination tended to be a single tube event of low probability, it was unlikely that contaminant alleles would be replicated in different analyses and reported in the consensus profile. The biological model tended to behave conservatively relative to the formal statistical model, but it does not make full use of the information available in the replicate DNA profiles.

Curran et al. [12] recently introduced a set-theoretic formalization to allow the automatic calculation of LRs for LCN profiles. This method has been implemented in a fully functional software application called LoComatioN. LoComatioN is a hypothesis-driven expert system that enables LRs for any number of different LCN propositions to be evaluated. The construction of the LR follows the standard format, requiring an evaluation of the probability of observing the evidence under the prosecution and the defence hypotheses, Hp and Hd, respectively. We call these hypotheses “propositions”. An example might be a rape case where a woman alleges she was raped by exactly one man. The prosecution proposition (Hp) is that the crime scene stain consists of DNA from the victim (V) and the suspect (S). The alternative or defence proposition (Hd) is that the victim and someone unrelated to the suspect were the only contributors. We denote this V + unknown (U). Of course, more complex propositions may be suggested by the defence, and it may be desirable to evaluate the LR with respect to several different pairs of propositions. Although the theory to analyse different propositions exists, in practice the calculations are very time-consuming (and therefore potentially error prone) for a reporting officer working manually. As a result, this option is often precluded. This inability to provide adequate calculations to the court for multiple propositions is a limiting factor, and might be detrimental because cases may be reported as “inconclusive”.

The advantage of LoComatioN is that the scientist is able to input data from up to five replicate analyses and to consider up to five contributors to any mixture, with propositions that can be altered at will. This means that, for virtually all mixtures, the scientist can now rapidly evaluate any number of propositions that the court requires. We hope that the “inconclusive” category will therefore become a thing of the past.

The ability to evaluate multiple propositions means that LoComatioN has an important role as an exploratory tool. We show how sensitive the LR is to different conditioning statements/propositions by reference to a complex case. To facilitate the court-going process and to resolve potential uncertainties about the effects of different conditioning statements, we have introduced guidance on formulating propositions by incorporating some generalisations of Brenner et al. [13], Weir [14] and Buckleton et al. [15].

2. Formulation of propositions

We use the following notation to show the respective propositions in a typical mixture case conditioned on a victim (V), suspect (S) and unknown (U), where the propositions are Hp: V + S and Hd: V + U:

$$LR = \frac{\Pr(E \mid H_p)}{\Pr(E \mid H_d)}$$

where the likelihood ratio comprises Hp (the prosecution proposition) in the numerator and Hd (the defence proposition) in the denominator. E is the evidence of the DNA crime profile.

The prosecution proposition (Hp) is initially based upon the testimony of witnesses and other circumstances of the case. DNA profiling is carried out on a crime stain and the results are used to confirm or to refute the proposition. If the profile matches the suspect (S), then the proposition Hp is supported. In a DNA mixture, alleles that match S may be present, providing support for Hp. However, additional alleles from other sources may also be present, and these may provide support for the alternative defence proposition (Hd). Further refinement of propositions might be required [16,17].

It is not always easy to specify propositions in complex cases where multiple perpetrators/victims may be present. The DNA result itself may indicate that different explanations are possible. Furthermore, Hp and Hd could be very different from each other. For example, under Hp we might consider a victim and suspect to be the contributors (V + S), whereas under Hd we might examine more complex propositions such as three unknowns being the contributors to the stain (U1 + U2 + U3). There is a common misconception that the number of contributors (nc) under Hp and Hd should be the same; it need not be.

3. Allele drop-out and the Q designation

Drop-out is an important defining feature of LCN. There are two aspects to be included in probabilistic calculations: the first is to estimate the probability of drop-out, Pr(D), and the second is to include the dropped-out allele in the probabilistic assessment.

Originally, the F designation [9] was used to signify the possibility of a drop-out event: a sample that shows a single allele, a, can be designated aF, where F can be any allele, including a. Since F includes all allelic possibilities, Pr(F) = 1; the probability of a is pa, hence Pr(aF) = 2pa. However, this formula may not be conservative, i.e. it can overestimate the LR in favour of Hp [9]. This is more likely to happen when the probability of drop-out is low.

We have introduced an improved concept into LoComatioN to facilitate programming. If drop-out is required to support a proposition, we consider that the identity of the unknown allele Q can be anything except the alleles already observed in the DNA profile:

$$\Pr(Q) = 1 - \sum_{i=1}^{n} p_i$$

where n alleles are observed in the profile and pi is the frequency of the ith allele.

Consider the following simple example. The crime stain profile at locus THO1 has one allele of type 11. The suspect (S) is genotype 9,11. Under Hp, we argue that allele 9 must have dropped out. Under Hd, evaluation of the alternative explanation (U) would include a probabilistic determination of all possible pairwise combinations that include allele 11 (4,11; 5,11; 6,11; etc.), a total of nine different combinations to be computed. The Q designation is used, given drop-out, where Pr(Q) = pQ = 1 − p11, and this achieves exactly the same result in just one computational step. The homozygote 11,11 (probability p11²) is included under the hypothesis that no drop-out has occurred. When mixtures are considered, the computational savings are even greater.

3.1. Using the Q designation to formulate Hp and Hd

As an example, suppose the stain profile is E = abc, S = ab, nc = 2, and all three alleles are low level. Under Hp, if drop-out has occurred, we consider all pairwise combinations of cQ, where pQ = 1 − pa − pb − pc:

$$\Pr(E \mid H_p, D) = \Pr(E \mid H_p)\Pr(D), \qquad \Pr(E \mid H_p) = 2 p_c p_Q \qquad (1)$$

Alternatively, if no drop-out has occurred:

$$\Pr(E \mid H_p, \bar{D}) = \Pr(E \mid H_p)\Pr(\bar{D}), \qquad \Pr(E \mid H_p) = p_c^2 + 2 p_a p_c + 2 p_b p_c \qquad (2)$$

Hence, Pr(E|Hp) comprises the sum of terms (1) and (2).

Under Hd, with two unknown contributors (U1, U2), given drop-out:

$$\Pr(E \mid H_d, D) = \Pr(E \mid H_d)\Pr(D), \qquad \Pr(E \mid H_d) = 24 p_a p_b p_c p_Q \qquad (3)$$

With no drop-out, such that alleles a, b, c are shared between two contributors:

$$\Pr(E \mid H_d, \bar{D}) = \Pr(E \mid H_d)\Pr(\bar{D}), \qquad \Pr(E \mid H_d) = 12 p_a p_b p_c (p_a + p_b + p_c) \qquad (4)$$

The likelihood ratio is LR = Pr(E|Hp)/Pr(E|Hd). Summing (1) and (2) for the numerator and (3) and (4) for the denominator, the common factor pc cancels:

$$LR = \frac{\Pr(\bar{D})(2 p_a + 2 p_b + p_c) + \Pr(D)(2 p_Q)}{12 p_a p_b \left[\Pr(\bar{D})(p_a + p_b + p_c) + \Pr(D)(2 p_Q)\right]} \qquad (5)$$
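As a consistency check on eqs (1) to (5), the sketch below enumerates every genotype of the unknown contributor(s) by brute force and compares the result with the closed form of eq. (5). It assumes Hardy–Weinberg genotype probabilities, lumps all unobserved alleles into one pseudo-allele Q with frequency pQ, and, as in eqs (1) to (4), applies Pr(D) and Pr(D̄) as single weights on the drop-out and no-drop-out branches. The function names are hypothetical.

```python
import math
from itertools import combinations_with_replacement, product


def genotype_prob(g, freqs):
    """Hardy-Weinberg probability of an unordered genotype g = (x, y)."""
    x, y = g
    return freqs[x] ** 2 if x == y else 2 * freqs[x] * freqs[y]


def sum_unknowns(n_unknown, known_alleles, target, freqs):
    """Sum Pr(U1, ..., Un) over ordered unknown genotypes such that the known
    contributors' alleles plus the unknowns' alleles are exactly `target`."""
    genotypes = list(combinations_with_replacement(sorted(freqs), 2))
    total = 0.0
    for combo in product(genotypes, repeat=n_unknown):
        if set(known_alleles).union(*combo) == target:
            total += math.prod(genotype_prob(g, freqs) for g in combo)
    return total


def lr_enumerated(pa, pb, pc, pr_d):
    """E = abc, S = ab, nc = 2; Hp: S + U, Hd: U1 + U2, built term by term from (1)-(4)."""
    freqs = {"a": pa, "b": pb, "c": pc, "Q": 1 - pa - pb - pc}  # Q lumps all other alleles
    no_drop, drop = {"a", "b", "c"}, {"a", "b", "c", "Q"}
    num = ((1 - pr_d) * sum_unknowns(1, {"a", "b"}, no_drop, freqs)
           + pr_d * sum_unknowns(1, {"a", "b"}, drop, freqs))
    den = ((1 - pr_d) * sum_unknowns(2, set(), no_drop, freqs)
           + pr_d * sum_unknowns(2, set(), drop, freqs))
    return num / den


def lr_eq5(pa, pb, pc, pr_d):
    """Closed form of eq. (5)."""
    pq = 1 - pa - pb - pc
    num = (1 - pr_d) * (2 * pa + 2 * pb + pc) + pr_d * 2 * pq
    den = 12 * pa * pb * ((1 - pr_d) * (pa + pb + pc) + pr_d * 2 * pq)
    return num / den


for pr_d in (0.01, 0.38, 0.9):
    print(pr_d, lr_enumerated(0.1, 0.2, 0.05, pr_d), lr_eq5(0.1, 0.2, 0.05, pr_d))
```

The enumeration recovers the coefficients 24 and 12 of eqs (3) and (4) automatically, so the two functions should agree to rounding error for any valid frequencies.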

4. Estimation of Pr(D)

From Gill et al. [8], for low copy number DNA, in the absence of degradation, it is reasonable to assume that the chance of allele drop-out is independent of the locus. Note that if significant degradation has occurred, then high molecular weight loci will be affected preferentially. Under LCN conditions, where DNA is amplified for 34 cycles, the biochemistry/detection system can detect a single copy of DNA at any SGM+ locus [1]. We provide a method to estimate Pr(D) by simulation, based on the assumption that Pr(D) is constant across all loci (Appendix I).

5. Estimation of Pr(C)

A contaminant event is the spurious occurrence of single alleles from multiple sources, which are assumed to be independent events. The probability of contamination is estimated from negative controls as described by Gill and Kirkham [4]. Laboratory records indicate a level of approximately $\widehat{\Pr}(C) = 0.05$ per sample, where

$$\widehat{\Pr}(C) = \frac{n}{LN}$$

n is the number of alleles observed in a series of negative controls, N is the total number of negative controls analysed, and L is the number of loci tested per sample (whether or not alleles are actually observed). The “hat” over Pr(C) indicates that this is an estimate.

The probability of any given allele appearing as a contaminant is approximated by the probability of its occurrence in the white Caucasian population (from a frequency database).
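A minimal sketch of the estimator above, together with the frequency-based probability of a particular contaminant allele. The counts in the example call are hypothetical and were chosen only so that the result matches the laboratory figure of 0.05; multiplying Pr(C) by the allele frequency is the same product, Pr(C)pa, that appears in the Q,Q row of Table 1. Function names are illustrative.

```python
import math


def estimate_pr_c(n_alleles, n_loci, n_controls):
    """Pr^(C) = n / (L * N), the estimator of Section 5.

    n_alleles:  alleles observed in a series of negative controls (n)
    n_loci:     loci tested per sample, whether or not alleles are seen (L)
    n_controls: total number of negative controls analysed (N)
    """
    if n_loci <= 0 or n_controls <= 0:
        raise ValueError("L and N must both be positive")
    return n_alleles / (n_loci * n_controls)


def pr_specific_contaminant(allele, pr_c, freqs):
    """Probability that a contaminant event produces this particular allele,
    taking the allele's chance as its population frequency."""
    return pr_c * freqs[allele]


pr_c = estimate_pr_c(n_alleles=50, n_loci=10, n_controls=100)  # hypothetical counts
print(pr_c)
print(pr_specific_contaminant("a", pr_c, {"a": 0.1}))
```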

6. A fully worked example with drop-out and contamination

A suspect’s genotype at a particular locus is ab. The crime sample profile (E) is a. The prosecution proposition (Hp) states that the suspect (S) is the offender. This can only be explained if drop-out of allele b has occurred. The defence proposition (Hd) is that the offence was committed by an unknown individual (U), unrelated to the suspect. Using our previously defined notation, the likelihood ratio using propositions Hp: S and Hd: U is

$$LR = \frac{\Pr(E \mid H_p)}{\Pr(E \mid H_d)}$$

Formulae for the numerator and denominator are given in Table 1, illustrating use of the Q virtual allele designation in conjunction with the probability of drop-out, Pr(D), and the probability of no contamination, Pr(C̄) = 1 − Pr(C).

The calculations for this simple example are just about manageable by hand, but most propositions will be much more complicated than this, comprising mixtures from two or more people and two or more replicates. An example of LoComatioN output and associated statistical analysis is given in Appendix II.
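The Table 1 calculation is short enough to write out directly. This Python sketch (function and argument names are hypothetical) multiplies each row's genotype probability Pr(Mj) by Pr(E = a | Mj) exactly as tabulated, sums the products for the denominator, and takes pQ = 1 − pa because Q excludes only the observed allele a.

```python
import math


def table1_lr(pa, pr_d, pr_c):
    """LR for crime stain E = a, suspect S = ab; Hp: S, Hd: U (one replicate).

    Each row follows Table 1: prior genotype probability Pr(Mj) times Pr(E = a | Mj).
    """
    pr_d_bar, pr_c_bar = 1 - pr_d, 1 - pr_c
    pq = 1 - pa  # Q: any allele except the observed allele a

    # Numerator: a survives, b drops out, no contamination.
    numerator = pr_d * pr_d_bar * pr_c_bar

    # Denominator: sum over the 'random man' genotypes of Table 1.
    rows = {
        "a,a": (pa ** 2, pr_d_bar ** 2 * pr_c_bar),
        "a,Q": (2 * pa * pq, pr_d * pr_d_bar * pr_c_bar),
        "Q,Q": (pq ** 2, pr_d ** 2 * pr_c * pa),  # both Q copies drop out, a is drop-in
    }
    denominator = sum(pr_m * pr_e for pr_m, pr_e in rows.values())
    return numerator / denominator


for pr_d in (0.1, 0.5, 0.9):
    print(pr_d, table1_lr(pa=0.1, pr_d=pr_d, pr_c=0.05))
```

Setting pr_d to 0 makes the numerator, and hence the LR, zero: without drop-out the suspect's genotype ab cannot produce a profile showing only a.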

7. Casework example to illustrate evaluation of multiple propositions

7.1. Case circumstances

Late one night, two cohabiting females were woken by a masked man who had broken into their flat. The intruder threatened the women with a hammer. He ordered them to engage in sexual acts, but the victims did not comply. One shouted for help and the other fought off the assailant. Both victims sustained injuries caused by the hammer. The assailant ran away, discarding the hammer outside the flat, where it was subsequently recovered. On questioning, the suspect denied that the hammer was his.


Table 1
An illustration of the correct use of Q when drop-out is considered

Hp numerator calculation
Suspect (Mj) | Pr(Mj) | Pr(E = a | Mj) | Product
a,b | 1 | Pr(D)Pr(D̄)Pr(C̄) | Pr(D)Pr(D̄)Pr(C̄)

Hd denominator calculation
Possible random men (Mj) | Pr(Mj) | Pr(E = a | Mj) | Product (a)
a,a | pa² | Pr(D̄)²Pr(C̄) | Pr(D̄)²Pr(C̄)pa²
a,Q | 2pa pQ | Pr(D)Pr(D̄)Pr(C̄) | 2Pr(D)Pr(D̄)Pr(C̄)pa pQ
Q,Q | pQ² | Pr(D)²Pr(C)pa | Pr(D)²Pr(C)pa pQ²

The crime stain is of type a, the suspect is genotype ab and, under Hp, we assume that, given S, allele b has dropped out with probability Pr(D). Under Hd, given that the suspect is innocent, drop-out may or may not have happened. We evaluate the set of possible “random man” genotypes worth considering, M1, M2, M3.
(a) Denominator = sum of the products.

Table 2
Tabulated PCR amplification results from casework example

Allelic results observed at each locus tested
Amelo D3 VWA D16 D2 D8 D21 D18 D19 THO FGA

Sample (R1) | XY 14 16 15 16 19 11 13 14 20 23 24 25 11 12 13 15 28 31 12 14 15.2 17.2 6 8 9 9.3 22
Sample (R2) | XY 14 16 15 16 17 19 11 13 14 20 24 25 11 12 13 15 28 29 30 31 31.2 13 14 16 17 12 13 14 15.2 17.2 6 8 9 9.3 22 23 25
Victim 1 | XX 16 16 15 16 13 13 20 20 11 15 29 30 17 17 12 14 6 8 22 25
Victim 2 | XX 15 17 16 19 12 13 18 25 11 13 29 30 15 17 14 14 6 7 20 22
Suspect | XY 14 16 15 19 11 14 24 25 12 13 28 31 14 17 15.2 17.2 9 9.3 22 23

7.2. Propositions and DNA analysis

The overall purpose of the investigation was to establish whether the hammer was relevant evidence, i.e. was the hammer used or not used in the attack? The specific purpose of the DNA investigation was to establish whether there was evidence to support or to refute alternative propositions [16] of the kind:

• Hp: the DNA from the hammer originated from the suspect and two victims;
• Hd: the DNA from the hammer originated from an unknown individual unrelated to the suspect, and two victims.

The hammer-head was swabbed and two LCN PCR amplification replicates (R1 and R2) were obtained (Table 2). The results showed that at some loci more than two alleles were present, suggesting a mixture (following the guidelines of Clayton et al. [6]). Both PCR amplification and extraction reagent negatives were blank, indicating no obvious source of gross contamination. From laboratory records of negative controls, Pr(C) = 0.05.

Table 3
Tabulated consensus PCR amplification results from R1 and R2 in the casework example

Allelic results (consensus) observed at each locus tested
Amelo D3 VWA D16 D2 D8 D21 D18 D19 THO FGA

Consensus result | XY 14 16 15 16 19 11 13 14 20 24 25 11 12 13 15 28 31 12 14 15.2 17.2 6 8 9 9.3 22

NB. The alleles in bold denote alleles that could be attributed to the victims.

7.3. Traditional consensus method (biological model)

The consensus approach [9] was dependent upon the experimental reproducibility of individual alleles. The method compared two separate PCR amplification results, and the LR was calculated from the consensus of duplicated alleles at each locus in R1 and R2 (Table 3). The consensus approach uses the F designation to signify drop-out. The assumptions in this model were:

1. a three person mixture, nc = 3;
2. both victims were considered to be contributors under both Hp and Hd.

We evaluate propositions Hp: V1 + V2 + S and Hd: V1 + V2 + U.

The standard approach was used: any alleles that matched either of the victims were subtracted to leave a partial profile (Table 4), interpreted as S under Hp and U under Hd.

There were seven alleles shared between the victims and the suspect. The F designation was subsequently assigned to any locus where one allele was present [9] to signify the possibility of allele drop-out:

$$LR = 1.55 \times 10^{6} \quad \text{(white Caucasian reference database)}$$

Table 4
Tabulated ‘foreign’ alleles defining the assailant’s DNA components in the DNA mixture taken from the hammer

‘Foreign’ alleles defining the offender’s DNA profile
Locus | Amelo | D3 | VWA | D16 | D2 | D8 | D21 | D18 | D19 | THO | FGA
Offender | Y | 14 F | F F | 11 14 | 24 F | 12 F | 28 31 | F F | 15.2 17.2 | 9 9.3 | F F

7.3.1. LoComatioN analysis

We now evaluate the effect of comparing different alternative pairs of propositions in the context of the fully probabilistic model that incorporates probabilities of drop-out and contamination into the LR [9,12]. This model is much more powerful than the consensus approach, taking the interpretation process a stage further. A consensus profile is not derived. Consequently, it is possible to calculate the LR relative to a single analysis (R1), although replicate analyses (R1, R2, ..., Rn) are much to be preferred, because more information is incorporated into the calculation. The Q virtual allele is used when drop-out occurs, instead of F in the ‘consensus’ method.
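To show how extra replicates enter the calculation, the sketch below extends the single-locus Table 1 example (suspect ab, every replicate showing only allele a) to n replicates. It assumes, as a simplification, that replicates are independent given the contributor's genotype, so the per-replicate probabilities from Table 1 are multiplied before summing over random-man genotypes. The function names are hypothetical, and this is not LoComatioN's implementation.

```python
import math


def pr_replicate_shows_a(genotype, pa, pr_d, pr_c):
    """Pr(E = a | Mj) for one replicate, copied from the Table 1 column."""
    pr_d_bar, pr_c_bar = 1 - pr_d, 1 - pr_c
    return {
        "a,b": pr_d * pr_d_bar * pr_c_bar,  # the suspect under Hp
        "a,a": pr_d_bar ** 2 * pr_c_bar,
        "a,Q": pr_d * pr_d_bar * pr_c_bar,
        "Q,Q": pr_d ** 2 * pr_c * pa,
    }[genotype]


def lr_all_replicates_a(n_reps, pa, pr_d, pr_c):
    """LR when each of n_reps replicates shows only allele a (suspect ab).

    Assumes replicates are independent given the genotype, so per-replicate
    probabilities are multiplied before summing over random-man genotypes.
    """
    pq = 1 - pa
    numerator = pr_replicate_shows_a("a,b", pa, pr_d, pr_c) ** n_reps
    priors = {"a,a": pa ** 2, "a,Q": 2 * pa * pq, "Q,Q": pq ** 2}
    denominator = sum(prior * pr_replicate_shows_a(g, pa, pr_d, pr_c) ** n_reps
                      for g, prior in priors.items())
    return numerator / denominator


for n in (1, 2, 3):
    print(n, lr_all_replicates_a(n, pa=0.1, pr_d=0.5, pr_c=0.05))
```

With n_reps = 1 the function reproduces the single-replicate Table 1 LR.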

7.4. Application of the theory to evaluate multiple propositions

Casework circumstances are often complex. Multiple pairs of propositions may be possible, but the prime consideration is that the suspect S always appears in the numerator under Hp and is replaced by U in the denominator under Hd. A dialogue may ensue in court where the scientist is requested to evaluate the LR using multiple ‘what-if’ propositions. LoComatioN can be used as an exploratory tool for this purpose.

The profile in the example can be interpreted using several different propositions conditioned on either two-person (nc = 2) or three-person (nc = 3) mixtures, from an average of 32 bands in the R1 and R2 DNA profiles (Table 2). The estimated upper bounds on the probability of drop-out are $\widehat{\Pr}(D)_{0.95} = 0.16$ and 0.38 for nc = 2 and nc = 3, respectively (Appendix I).

From a preliminary assessment of evidence in this case, the first iteration of propositions is as follows.

Proposition 1. Hp: V1 + V2 + S and Hd: V1 + V2 + U.

However, examination of the DNA results suggested a possible alternative explanation. All of the alleles that could be attributed to victim two are shared with either victim one or the suspect. Therefore, the propositions could be modified as follows.

Proposition 2. Hp: V1 + S and Hd: V1 + U.

Now we condition upon a two person mixture. However, this would require five alleles to be explained as contamination events (D18-16, D21-31.2, D2-23, D16-12, VWA-17). As Pr(C) = 0.05 per DNA profile, this would be unlikely. A more plausible explanation would be that DNA from three contributors was present, where one was unknown under both Hp and Hd (i.e. transfer of DNA to the hammer from an unknown person could have occurred before the crime event). The absence of a DNA profile from V2 does not imply that she was not hit with the hammer, since transfer of DNA as a result of physical contact depends upon unquantifiable factors and is not a foregone conclusion [18].

Proposition 3. Hp: V1 + S + U and Hd: V1 + U1 + U2.

For illustrative purposes only, we also consider two further, albeit highly improbable, pairs of propositions (since we believe that V2 DNA is absent), because it is interesting to determine the effect on the LR if V2 is substituted for V1.

Proposition 4. Hp: V2 + S and Hd: V2 + U.

Proposition 5. Hp: V2 + S + U1 and Hd: V2 + U1 + U2.

Finally, to illustrate an unbalanced pair of propositions where Hp is anchored on V1 and S, we evaluate Hd using V1 + U1 + U2. Since nc is different under Hp and Hd, Pr(D) is conditioned on nc = 2 and 3, respectively.

Proposition 6. Hp: V1 + S and Hd: V1 + U1 + U2.

The probability of contamination was kept constant (Pr(C) = 0.05) for all propositions, with Pr(D) varied from 0.01 to 0.95 in 0.05 increments. LRs were calculated across all loci for each level of Pr(D) (Fig. 1).
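The sweep over Pr(D) can be sketched as follows. Because the casework allele frequencies are not reproduced here, the example applies the single-locus eq. (5) to three loci with hypothetical frequencies and combines loci by summing log10 LRs, which assumes independence between loci. The grid 0.01, 0.05, 0.10, ..., 0.95 is our reading of the stated range and increment, and none of this reproduces Fig. 1.

```python
import math


def lr_eq5(pa, pb, pc, pr_d):
    """Single-locus LR of eq. (5): E = abc, S = ab; Hp: S + U, Hd: U1 + U2."""
    pq = 1 - pa - pb - pc
    num = (1 - pr_d) * (2 * pa + 2 * pb + pc) + pr_d * 2 * pq
    den = 12 * pa * pb * ((1 - pr_d) * (pa + pb + pc) + pr_d * 2 * pq)
    return num / den


# Hypothetical (pa, pb, pc) at three loci; illustrative values, not casework data.
LOCI = [(0.10, 0.20, 0.05), (0.08, 0.15, 0.12), (0.25, 0.05, 0.10)]

# Assumed grid: 0.01, then 0.05, 0.10, ..., 0.95.
PR_D_GRID = [0.01] + [round(0.05 * k, 2) for k in range(1, 20)]


def log10_lr_profile(pr_d, loci=LOCI):
    """Multiply per-locus LRs (sum their log10 values), assuming independent loci."""
    return sum(math.log10(lr_eq5(pa, pb, pc, pr_d)) for pa, pb, pc in loci)


for pr_d in PR_D_GRID:
    print(f"Pr(D) = {pr_d:.2f}  log10 LR = {log10_lr_profile(pr_d):.3f}")
```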

The highest LRs were calculated using Proposition 6 (Hp: V1 + S and Hd: V1 + U1 + U2), followed by Proposition 2 (Hp: V1 + S and Hd: V1 + U). However, we restate that neither is optimal for court reporting, for the reasons outlined previously. Whereas the proposition V1 + S appeared to favour Hp the most, given the large number of unknown alleles that cannot realistically be explained by contamination, we advocate Hp: V1 + S + U as the simplest and most realistic prosecution proposition. Proposition 1 (Hp: V1 + V2 + S and Hd: V1 + V2 + U) and Proposition 3 (Hp: V1 + S + U and Hd: V1 + U1 + U2) give LRs that are very similar. Substituting an unknown contributor for V2 makes very little difference to the result, i.e. it does not assist the defence to argue whether V2 or an unknown person was present in the crime profile.

The lowest LRs were calculated with Proposition 4 (Hp: V2 + S and Hd: V2 + U). This result was not unexpected, as seven (out of twenty) of the alleles of victim two were not reproduced in any of the amplification replicate results, giving a much smaller numerator value.


Fig. 1. Casework example: log10 genotype likelihood ratios vs. probability of drop-out for each pair of propositions tested. The large striped arrows mark the x-axis estimate of the probability of drop-out, $\widehat{\Pr}(D)_{0.95}$, for nc = 2, and the large solid arrows mark $\widehat{\Pr}(D)_{0.95}$ for nc = 3 (32 allele profile). A horizontal line to the y-axis gives an estimate of the log10 LR. For each line on the graph, the alternative prosecution and defence propositions are given in the format Hp/Hd.

Finally, Proposition 6 (Hp: V1 + S and Hd: V1 + U1 + U2) gave the greatest LR but, as previously indicated, invoking multiple independent contaminant alleles is not particularly realistic, and this pair was therefore not advocated. Proposition 3 was preferred, whilst noting that Proposition 1 made very little difference to the final LR at the predicted drop-out level Pr(D) = 0.38. The main purpose of this demonstration was to show how easy it is to rapidly evaluate any propositions required by the court. An important feature is that all calculations are relatively insensitive to Pr(D), since the fall in LR was small over the realistic range of Pr(D).

7.5. Comparison with the consensus model

The consensus (biological model) calculation evaluated Hp: V1 + V2 + S and Hd: V1 + V2 + U, giving LR = Pr(E|Hp)/Pr(E|Hd) = 1.55 × 10^6. This was conservative relative to all propositions tested except for the unrealistic Proposition 4 (Hp: V2 + S and Hd: V2 + U).

8. Discussion

Whereas the contamination parameter is relatively straightforward to estimate from experimental observation of negative controls [4], the drop-out parameter is more problematic. Under the assumption that allelic drop-out is random [8], we currently estimate the distribution of this parameter from the number of alleles present in the DNA profile, relative to profiles randomly generated from a reference population database such as Caucasian. Different distributions result from different population databases, but the differences are minor (data not shown). It is currently impracticable to estimate multiple drop-out parameters (one for each potential contributor); consequently, we effectively use an average (unweighted) value.

It is informative to evaluate the effect of altering the drop-out parameter at individual loci, comparing Hp: S + U and Hd: U1 + U2 (Table 5). Under Hp, S = ab and U = cd. To simplify calculations, we evaluate a locus where alleles are either common (p = 0.1) or rare (p = 0.02). We have not considered the effect of FST in these comparisons. Nevertheless, we illustrate that the following generalisations are useful when evaluating any locus:

(a) If it is not necessary to invoke drop-out or contamination under Hp in order to explain S, then the LR is constant, because Pr(D) cancels out in the numerator and denominator.
(b) If one S allele has dropped out, then the evidence tends to be neutral, or to favour Hd, depending upon whether the remaining S allele is rare.
(c) If both S alleles have dropped out, i.e. complete locus drop-out under Hp, then the evidence always favours Hd, independent of Pr(D).
(d) Similarly, Pr(D) cancels when a contamination event occurs, provided both suspect alleles are present (the profile is of type abcde). Hp is favoured.
(e) If one contaminant band and one drop-out event have occurred under Hp, then the LR will favour Hd; the greater Pr(D), the greater the LR becomes.
(f) Conversely, if an unknown allele is alleged to have dropped out under Hd, then this also reduces the LR: the greater Pr(D), the lower the LR becomes.

Table 5
LRs calculated for typical drop-out and contamination events at a single locus where Pr(allele) = 0.1 or 0.02, respectively, evaluating Hp: S + U and Hd: U1 + U2 (p: match probability of allele)

Condition | p = 0.1, Pr(D) = 0.1 | p = 0.1, Pr(D) = 0.5 | p = 0.1, Pr(D) = 0.9 | p = 0.02, Pr(D) = 0.1 | p = 0.02, Pr(D) = 0.5 | p = 0.02, Pr(D) = 0.9
No drop-out; no contamination | 8.3 | 8.3 | 8.3 | 208 | 208 | 208
1 suspect allele dropped out | 0.4 | 0.98 | 1.16 | 3.44 | 4.3 | 4.42
1 unknown allele dropped out | 9.1 | 2.5 | 0.3 | 77.5 | 10.7 | 1.2
Both suspect alleles dropped out | 0.21 | 0.21 | 0.21 | 0.17 | 0.17 | 0.17
1 contamination event; no drop-out | 5 | 5 | 5 | 125 | 125 | 125
1 contamination event; 1 suspect allele dropped out | 0.03 | 0.24 | 1.3 | 0.14 | 1.2 | 6.7
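Generalisation (a) can be checked against the first row of Table 5. The sketch below assumes a simple per-allele model in which every explanation with no drop-out and no contamination carries the same factor Pr(D̄)^4·Pr(C̄); that common factor cancels, leaving LR = 2p²/(24p⁴) = 1/(12p²), i.e. about 8.3 for p = 0.1 and 208 for p = 0.02, whatever the value of Pr(D). Function names are hypothetical, and the other rows of Table 5 are not reproduced.

```python
import math
from itertools import combinations_with_replacement, product


def genotype_prob(g, freqs):
    """Hardy-Weinberg probability of an unordered genotype g = (x, y)."""
    x, y = g
    return freqs[x] ** 2 if x == y else 2 * freqs[x] * freqs[y]


def lr_no_dropout_no_contamination(p, pr_d, pr_c=0.05):
    """Profile abcd with every allele at frequency p; Hp: S + U (S = ab, U = cd),
    Hd: U1 + U2. Only explanations without drop-out or contamination are counted.

    Assumed per-allele model: each such explanation carries the same factor
    w = Pr(D-bar)^4 * Pr(C-bar), so w cancels and Pr(D) has no effect.
    """
    freqs = {allele: p for allele in "abcd"}
    w = (1 - pr_d) ** 4 * (1 - pr_c)
    numerator = genotype_prob(("c", "d"), freqs) * w
    genotypes = list(combinations_with_replacement("abcd", 2))
    denominator = sum(
        genotype_prob(g1, freqs) * genotype_prob(g2, freqs) * w
        for g1, g2 in product(genotypes, repeat=2)
        if set(g1) | set(g2) == set("abcd")
    )
    return numerator / denominator


for p in (0.1, 0.02):
    print(p, [round(lr_no_dropout_no_contamination(p, d), 1) for d in (0.1, 0.5, 0.9)])
```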

The biggest effect occurs when Hp can only be explained if drop-out has occurred (e.g. the profile is abd): regardless of the value of Pr(D) chosen within the range 0.1 < Pr(D) < 0.9, the LR drops by approximately an order of magnitude. In addition, the lower the Pr(D), the less likely it is that drop-out is a satisfactory explanation under Hp, and consequently the lower the LR becomes.

8.1. General conclusions on forming propositions

LoComatioN enables rapid evaluation of multiple propositions. Sometimes it is difficult to formulate propositions in casework because of uncertainties surrounding the casework circumstances. This is especially true for DNA profiles where the amount of DNA is limited. In addition, there may be ample opportunity for transfer of DNA to have occurred before the crime event. The case example described provided an opportunity to evaluate the effect of choosing different propositions for analysis. The profile was a mixture where it was unclear whether a victim’s DNA was present. We showed that the issue of whether V2 or U was the best explanation under Hd was of trivial consequence. This leads us to propose a possible new approach to assist in the evaluation of evidence.

Reasonable (multiple) pairs of propositions can be selected in agreement with the court requirements. A minimum, LRmin (the lowest LR calculated), can provide a base-line. It is worth noting that all propositions will have S in the numerator substituted by U in the denominator, i.e. we have shown that any differences between LRs are a result of secondary issues that relate to the number and conditioning of contributors to the crime stain evidence. If LR differences are trivial or bounded by LRmin, then the court may take the view that the peripheral issues are simply not relevant to the evidence, as they do not affect the primary consideration of whether the suspect contributed to the crime stain.

If there are several alleles from an unknown source in a crime sample, then it is unlikely that they are explained by a contamination probability, which is strictly only valid under the assumption that the contaminant alleles present are independent and not from a single source. With Pr(C) = 0.05, on average, we would expect only one to two contaminant alleles. Consequently, we recommend that profiles with three or more alleles that cannot be explained by the casework circumstances are always evaluated by invoking an additional unknown (U) contributor as the most reasonable explanation.
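The recommendation can be illustrated with a binomial calculation. The sketch assumes, for illustration only, that Pr(C) = 0.05 acts as an independent per-locus probability of a single contaminant allele (as the n/(LN) estimator of Section 5 suggests) and that 10 loci are typed; under those assumptions the number of contaminant alleles in a profile is binomial. The function name is hypothetical.

```python
import math


def pr_at_least_k_contaminants(k, n_loci, pr_c):
    """P(at least k contaminant alleles in one profile).

    Assumes at most one contaminant allele per locus, occurring independently at
    each locus with probability pr_c, so the count is Binomial(n_loci, pr_c).
    """
    return sum(math.comb(n_loci, j) * pr_c ** j * (1 - pr_c) ** (n_loci - j)
               for j in range(k, n_loci + 1))


for k in (1, 2, 3, 5):
    print(k, pr_at_least_k_contaminants(k, n_loci=10, pr_c=0.05))
```

Under these assumptions, three or more independent contaminant alleles have a probability of about 0.012 in a 10-locus profile, and five or more are far less likely still.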

The second recommendation is to use the Q designation with caution under Hp, since it always increases Pr(E|Hp). Conversely, to maximise Pr(E|Hd), it is reasonable to use Q if the alleles are at low level.

8.2. LoComatioN as a LR calculator for ‘conventional’ DNA profiles

LoComatioN can also be used to calculate LRs from conventional 28 cycle DNA profiles. There is a misconception that the low copy number definition applies only to elevated PCR cycle number. However, the defining features of LCN are drop-out and drop-in, and these phenomena also occur with 28 PCR cycles. Most laboratories have guidelines to indicate whether a given profile is sufficient for conventional interpretation (i.e. precluding allele drop-out). Many will report major/minor mixtures where the minor component is attributed to the suspect under Hp, but allele drop-out may be observed. All of the considerations described previously also apply to low level DNA analysed using 28 PCR cycles.


If the alleles at a locus are above an experimentally defined threshold level (e.g. 150 rfu), then allele drop-out is unlikely to occur. In this case Pr(D) ≈ 0, and consequently the Q designation is not relevant to the calculation of the LR. Under these conditions the theory used by LoComatioN converges to models previously described [19]; however, the advantage is that Pr(C) can be incorporated, multiple propositions can be evaluated, and the information from several replicates can be combined into one LR if necessary.

Appendix A

A.1. Simulation of the empirical likelihood for the probability of drop-out

In the following simulations we consider the number of contributors, nc, and the probability of contamination, Pr(C), to be fixed in advance. The goal of the simulations is to estimate the probability of observing x alleles at L loci given that the probability of drop-out is equal to D, Pr(D) = D. That is, we wish to estimate Pr(x|D, C, nc). Given that Pr(C) and nc are constant, this becomes Pr(x|D). The problem is that we do not know D. Therefore we use the data, x, to estimate D by maximum likelihood. Regarded as a function of D for the observed x, Pr(x|D) is called the likelihood of D and is denoted L(D). However, we do not know the likelihood function of D given x either, so we have constructed a simulation to estimate it. As L(D) is estimated by simulation, we call it the empirical likelihood of D.

A.1.1. Simulation details

There are three parts to the simulation. Firstly, we must specify the value of D. Secondly, we must repeatedly generate nc random DNA profiles and combine them subject to drop-out. Finally, we must consider that contamination may have occurred. Each iteration of the simulation (for a given value of D) produces a random profile that could have resulted from the contributions of nc unrelated individuals, and from this profile we count the number of observed alleles, x. Note that because we are not considering quantitative information such as peak heights or areas, allele masking can occur. For example, if nc = 2 and the two random profiles at a locus are ab and bc, we will observe only abc in the resulting scene stain. Hence, even with no drop-out (Pr(D) = 0), it is possible to observe fewer than 2ncL alleles. The relative frequency with which each value of x occurs for a given value of D is an estimate of Pr(x|D).

Fig. 2. The likelihood surface for the probability of drop-out, given two contributors and Pr(C) = 0.05.
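Because only allele presence is recorded, the observed profile at a locus is simply the union of the contributors' alleles. A minimal sketch of the ab + bc masking example; the function name `observed_alleles` is hypothetical.

```python
def observed_alleles(*genotypes):
    """Distinct alleles seen at one locus when peak heights are ignored."""
    seen = set()
    for genotype in genotypes:
        seen.update(genotype)
    return seen


scene = observed_alleles(("a", "b"), ("b", "c"))
print(sorted(scene), len(scene))  # ['a', 'b', 'c'] 3, one fewer than 2 * nc = 4
```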

A.1.2. Simulation pseudo-code

Descriptions of simulations are always problematic. For that reason, we describe our simulation in pseudo-code so that those who are interested may replicate the work.

for D = 0.00, 0.01, 0.02, ..., 0.90
    let f = [0, ..., 0], where f is a vector of length (2nc + 1)L + 1
    for i = 1, ..., N
        make the scene profile blank
        for j = 1, ..., nc
            for l = 1, ..., L
                select two alleles at random, A_l1 and A_l2, with probability p(A_lk), k = 1, 2
                generate two random uniform numbers, u1, u2 ~ U[0, 1]
                if u1 > D then add allele A_l1 to the scene profile
                if u2 > D then add allele A_l2 to the scene profile
        for l = 1, ..., L
            generate a random uniform number, u ~ U[0, 1]
            if u < Pr(C) then add a random allele A_l1, selected with probability p(A_l1), to the scene profile
        record x, the total number of alleles observed
        let f_x = f_x + 1 (the elements of f are labelled 0 to (2nc + 1)L)
    let Pr(x|D) = f_x / N, x = 0, 1, ..., (2nc + 1)L

where L is the number of loci in the multiplex (L = 10 for SGM+) and N is the number of iterations per value of D. Increasing N will reduce the Monte Carlo sampling error in the estimates of Pr(x|D). p(A_lk) is the frequency of the kth allele at the lth locus in the population database; that is, we select alleles randomly with probability proportional to their frequency in the database (population).

The range of x is from 0 to (2nc + 1)L because each individual can contribute at most two distinct peaks per locus, and furthermore we allow at most one contaminant allele per locus, which may also be distinct. So when nc = 2, we may observe 0, ..., 5 peaks at each locus and 0, ..., 50 peaks over 10 loci.
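The pseudo-code above can be sketched in Python as follows. This is a minimal, unoptimised rendering under stated assumptions, not the authors' implementation: the function names are hypothetical, and the allele-frequency tables in the usage block are made-up placeholders (10 loci with 8 equally common alleles), not SGM+ data. Storing each locus as a set of distinct alleles means masking is handled automatically.

```python
import random


def simulate_pr_x_given_d(D, pr_c, n_c, freqs, n_iter, rng):
    """Monte Carlo estimate of Pr(x | D) following the Appendix A.1.2 pseudo-code.

    D      -- probability of drop-out for each allele copy
    pr_c   -- probability of one contaminant allele per locus
    n_c    -- number of contributors
    freqs  -- one {allele: frequency} dict per locus (hypothetical input)
    n_iter -- number of simulated scene profiles (N in the text)
    rng    -- a random.Random instance, so runs are reproducible
    Returns a list whose x-th entry estimates Pr(x | D), x = 0..(2 n_c + 1) L.
    """
    n_loci = len(freqs)
    counts = [0] * ((2 * n_c + 1) * n_loci + 1)
    tables = [(list(t), list(t.values())) for t in freqs]
    for _ in range(n_iter):
        scene = [set() for _ in range(n_loci)]  # blank scene profile
        for _ in range(n_c):
            for locus, (alleles, weights) in enumerate(tables):
                # Two alleles drawn in proportion to their population frequency.
                for allele in rng.choices(alleles, weights=weights, k=2):
                    if rng.random() > D:  # this copy survives drop-out
                        scene[locus].add(allele)
        for locus, (alleles, weights) in enumerate(tables):
            if rng.random() < pr_c:  # at most one contaminant allele per locus
                scene[locus].add(rng.choices(alleles, weights=weights)[0])
        x = sum(len(s) for s in scene)  # distinct alleles, so masking is automatic
        counts[x] += 1
    return [c / n_iter for c in counts]


def likelihood_surface(pr_c, n_c, freqs, n_iter, seed=0):
    """Pr(x | D) for D = 0.00, 0.01, ..., 0.90, matching the pseudo-code grid."""
    rng = random.Random(seed)
    return {d / 100: simulate_pr_x_given_d(d / 100, pr_c, n_c, freqs, n_iter, rng)
            for d in range(91)}


if __name__ == "__main__":
    # Hypothetical database: 10 loci, each with 8 equally common alleles.
    toy_freqs = [{f"A{k}": 1 / 8 for k in range(8)} for _ in range(10)]
    surface = likelihood_surface(pr_c=0.05, n_c=2, freqs=toy_freqs, n_iter=500)
    print(surface[0.0][32], surface[0.2][32])
```

A small N keeps the example quick; as noted above, a larger N is needed to reduce Monte Carlo error to a useful level.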

A.1.3. Simulation results

Fig. 2 shows the likelihood surface for the probability of drop-out, given two contributors (nc = 2) and Pr(C) = 0.05. How is this used? This is best demonstrated by example. Consider the case in Section 5. A total of 32 alleles were observed across ten loci. Let us initially postulate that there were only two contributors to this profile. If x is fixed at 32, then the likelihood surface in Fig. 2 lets us answer the question ‘‘what is the most likely value for Pr(D) if x = 32?’’ We do this by taking a ‘‘slice’’ of Fig. 2 along the line x = 32. This yields the graph in Fig. 3.

Fig. 3. Likelihood function for the probability of drop-out when x = 32 and nc = 2.

Fig. 4. The cumulative distribution function (cdf) F(D|x = 32) for a profile with 32 alleles. The solid line is the cdf for D assuming that there are three (nc = 3) contributors to this mixture, whereas the dashed line is the cdf for D assuming that there are two (nc = 2) contributors. The y-axis gives the probability that D is smaller than the value on the x-axis. For example, if a vertical line is drawn from the x-axis at 0.16 up to the dashed line, and a horizontal line is then drawn from that point to the y-axis, it meets the y-axis at about 0.95. We interpret this as ‘‘assuming only two people contributed to this mix, we are 95% sure that the true value of Pr(D) is less than 0.16’’.
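Taking a slice of the surface means fixing x and reading Pr(x|D) across the grid of D values. A minimal sketch, assuming the surface is stored as a mapping from each grid value of D to a list indexed by x; the function names are hypothetical and the three-point surface is made-up purely to exercise the code.

```python
def slice_at_x(surface, x):
    """Likelihood function {D: Pr(x | D)} for one observed allele count x."""
    return {d: probs[x] for d, probs in surface.items()}


def max_likelihood_d(likelihood):
    """Grid value of D with the largest likelihood (the first one wins on ties)."""
    return max(likelihood, key=likelihood.get)


# Made-up surface for illustration only: D -> [Pr(x=0 | D), Pr(x=1 | D)].
toy_surface = {0.0: [0.1, 0.9], 0.1: [0.5, 0.5], 0.2: [0.9, 0.1]}
lik = slice_at_x(toy_surface, 1)
print(max_likelihood_d(lik))   # 0.0
print(lik[0.0] / lik[0.2])     # how many times more likely D = 0 is than D = 0.2
```

The same ratio, read off the simulated slice at x = 32, is the kind of comparison behind the statement that Pr(D) = 0.2 is about 16 times less likely than Pr(D) = 0.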

From Fig. 3 we can see that the maximum occurs when Pr(D) = 0. This means that 32 alleles are not uncommon when there is no drop-out and there are two contributors to the stain. However, we can also see that observing 32 alleles remains plausible even if Pr(D) = 0.2. It is about 16 times less likely, but the point we wish to make is that it is not impossible to observe 32 alleles when Pr(D) = 0.2. We would therefore like to place a confidence bound on Pr(D). That is, we would choose a value D* so that 95% of intervals of the form [0, D*] would contain the true value. Although we use 95% as an example throughout this paper, there is no reason why a more stringent value (e.g. 99.9%) could not be used. To do this we need to estimate the cumulative distribution function (cdf) for the probability of drop-out given a certain value of x. We can change the likelihood function in Fig. 3 into a probability function by normalising it, i.e. making sure that its values over the grid of D sum to one (Fig. 4). In doing this, we are making the assumption that the probability of drop-out is a discrete random variable.¹ In theory it is not, but in practice, if we know the probability of drop-out to the nearest 1% (0.01), this will be sufficient to calculate the LR without substantial bias against the defendant. Once we have the probability function for D, f(D|x), we can calculate the cumulative distribution function:

$$F(D \mid x) = \sum_{\substack{d \in \{0,\, 0.01,\, 0.02,\, \ldots\} \\ d \le D}} f(D = d \mid x)$$
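The normalisation and the cumulative sum can be sketched directly. A minimal version, assuming the likelihood is held as a mapping from grid values of D to L(D); dividing by the total corresponds to the uniform prior on the grid. The function names are hypothetical.

```python
def normalise(likelihood):
    """Rescale {D: L(D)} on a grid so the values sum to one, giving f(D | x).

    Equivalent to placing a uniform prior on the grid of D values.
    """
    total = sum(likelihood.values())
    if total <= 0:
        raise ValueError("likelihood must have positive mass")
    return {d: v / total for d, v in likelihood.items()}


def cumulative(f):
    """Return [(D, F(D | x))] in ascending D, where F(D | x) sums f(d | x) over d <= D."""
    running, points = 0.0, []
    for d in sorted(f):
        running += f[d]
        points.append((d, running))
    return points


print(cumulative(normalise({0.0: 2.0, 0.01: 1.0, 0.02: 1.0})))
# [(0.0, 0.5), (0.01, 0.75), (0.02, 1.0)]
```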

The actual level of drop-out used in the LR calculations was taken from the 5th or 95th percentile of the cdf, depending upon which level minimised the LR; in practice this is usually the 95th percentile. Mathematically, we evaluate $q_\alpha = F^{-1}(\alpha)$, where the inverse cumulative distribution function $F^{-1}(\alpha)$ is given by finding the value $x$ such that

$$\int_{-\infty}^{x} f(t)\,dt = \alpha,$$

with α = 0.05 for the 5th percentile and α = 0.95 for the 95th percentile. In practice we approximate the cdf as a piecewise linear function. We find two points q1 and q2 such that F(q1) < α < F(q2), and we return

$$F^{-1}(\alpha) \approx (1 - w)\,q_1 + w\,q_2, \qquad w = \frac{\alpha - F(q_1)}{F(q_2) - F(q_1)}.$$

In our example this yields values of 0.16 and 0.38.

¹ And we are implicitly placing a uniform prior on it as well. Technically, the normalisation of the likelihood is a Bayesian operation; hence the resulting intervals are correctly interpreted in a Bayesian sense.
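The piecewise-linear inverse can be sketched as a scan for the bracketing grid points followed by linear interpolation. A minimal version, assuming the cdf is supplied as an ascending list of (D, F(D)) pairs; the function name `quantile` is hypothetical. If α does not exceed the first cdf value, the first grid point is returned.

```python
def quantile(cdf_points, alpha):
    """Approximate F^{-1}(alpha) by linear interpolation between grid points.

    cdf_points -- ascending list of (D, F(D)) pairs
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    q1, f1 = cdf_points[0]
    if alpha <= f1:
        return q1  # the first grid point already carries probability >= alpha
    for q2, f2 in cdf_points[1:]:
        if f1 < alpha <= f2:  # bracketing pair found; f2 > f1, so no division by zero
            w = (alpha - f1) / (f2 - f1)
            return (1.0 - w) * q1 + w * q2
        q1, f1 = q2, f2
    return cdf_points[-1][0]


pts = [(0.0, 0.5), (0.01, 0.75), (0.02, 1.0)]
print(quantile(pts, 0.95))  # approximately 0.018
```

With w = 0 at F(q1) and w = 1 at F(q2), the interpolated value moves from q1 to q2 as α increases.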

Appendix B. A more detailed example of LoComatioN principles

In LoComatioN [12] the Q allele designation enables probabilistic evaluation of all possible allelic combinations, including those that could be explained if drop-out and contamination had occurred. From the casework example, we evaluate all possible allele propositions for each locus in turn. For example, for the crime stain evidence (E) at the D3 locus we have two identical results: R1 = R2 = 14,16. The suspect, S, has genotype 14,16 and the victim, V, has genotype 16,16. The propositions under consideration are:

• Hp: the victim, the suspect and one unknown unrelated contributor are the only people who have contributed to this stain (V + S + U);

• Hd: the victim and two unknown unrelated contributors are the only people who have contributed to this stain (V + U1 + U2).


Table 6

Illustration of probabilistic principles employed to formulate the probabilities under Hp

| Proposed contributing genotypes V + S + U | Pr(R1 = 14,16 ∣ Mj) | Pr(R2 = 14,16 ∣ Mj) | Pr(Mj) | Product |
|---|---|---|---|---|
| 16,16 + 14,16 + 14,14 | Pr(D̄)^6 Pr(C̄) (no drop-out, no contamination) | Pr(D̄)^6 Pr(C̄) (no drop-out, no contamination) | p14^3 p16^3 | Pr(D̄)^12 Pr(C̄)^2 p14^3 p16^3 |
| 16,16 + 14,16 + 14,16 | Pr(D̄)^6 Pr(C̄) | Pr(D̄)^6 Pr(C̄) | 2 p14^2 p16^4 | 2 Pr(D̄)^12 Pr(C̄)^2 p14^2 p16^4 |
| 16,16 + 14,16 + 16,16 | Pr(D̄)^6 Pr(C̄) | Pr(D̄)^6 Pr(C̄) | p14 p16^5 | Pr(D̄)^12 Pr(C̄)^2 p14 p16^5 |
| 16,16 + 14,16 + 14,Q | Pr(D̄)^5 Pr(D) Pr(C̄) (no drop-out, drop-out and no contamination) | Pr(D̄)^5 Pr(D) Pr(C̄) (no drop-out, drop-out and no contamination) | 2 p14^2 p16^3 pQ | 2 Pr(D̄)^10 Pr(D)^2 Pr(C̄)^2 p14^2 p16^3 pQ |
| 16,16 + 14,16 + 16,Q | Pr(D̄)^5 Pr(D) Pr(C̄) | Pr(D̄)^5 Pr(D) Pr(C̄) | 2 p14 p16^4 pQ | 2 Pr(D̄)^10 Pr(D)^2 Pr(C̄)^2 p14 p16^4 pQ |
| 16,16 + 14,16 + Q,Q | Pr(D̄)^4 Pr(D)^2 Pr(C̄) | Pr(D̄)^4 Pr(D)^2 Pr(C̄) | p14 p16^3 pQ^2 | Pr(D̄)^8 Pr(D)^4 Pr(C̄)^2 p14 p16^3 pQ^2 |

The numerator is then calculated by summing the entire Product column, using the law of total probability.
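Summing the Product column can be sketched by transcribing each row of Table 6 as exponent counts. A minimal version, assuming Pr(D̄) = 1 − Pr(D) and Pr(C̄) = 1 − Pr(C), and two identical replicates as in the example. The Pr(Mj) weights are copied from the table rather than derived, and the function name and all numeric inputs (including pQ) are hypothetical.

```python
# Rows of Table 6 as (U genotype, copies not dropped, copies dropped,
# coefficient, exponent of p14, exponent of p16, exponent of pQ), per replicate.
TABLE_6_ROWS = [
    ("14,14", 6, 0, 1, 3, 3, 0),
    ("14,16", 6, 0, 2, 2, 4, 0),
    ("16,16", 6, 0, 1, 1, 5, 0),
    ("14,Q", 5, 1, 2, 2, 3, 1),
    ("16,Q", 5, 1, 2, 1, 4, 1),
    ("Q,Q", 4, 2, 1, 1, 3, 2),
]


def hp_numerator(pr_d, pr_c, p14, p16, p_q, replicates=2):
    """Sum of the Product column of Table 6, i.e. Pr(E | Hp) for this locus."""
    total = 0.0
    for _, n_kept, n_dropped, coef, e14, e16, e_q in TABLE_6_ROWS:
        # Pr(R | Mj) for one replicate: no drop-out, drop-out and no contamination terms.
        per_replicate = (1 - pr_d) ** n_kept * pr_d ** n_dropped * (1 - pr_c)
        total += coef * per_replicate ** replicates * p14 ** e14 * p16 ** e16 * p_q ** e_q
    return total


print(hp_numerator(pr_d=0.1, pr_c=0.05, p14=0.1, p16=0.2, p_q=0.7))
```

Setting pr_d = 0 removes every row containing Q, which matches the observation in Section 8.2 that the Q designation becomes irrelevant when drop-out is negligible.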

Evaluation of the probability of the evidence under Hp is straightforward: the unknown contributor, U, is allowed to have a genotype formed by any combination of alleles 14, 16 and Q, so that the possibility of drop-out is considered. Hence, the genotypes considered for the unknown contributor under Hp are: 14,14; 14,16; 16,16; 14,Q; 16,Q; Q,Q.
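The six candidate genotypes for U are all unordered pairs drawn with replacement from {14, 16, Q}. A minimal sketch using the standard library; the function name `candidate_genotypes` is hypothetical.

```python
from itertools import combinations_with_replacement


def candidate_genotypes(alleles):
    """All unordered genotypes formed from the given allele labels (Q included)."""
    return [",".join(pair) for pair in combinations_with_replacement(alleles, 2)]


print(candidate_genotypes(["14", "16", "Q"]))
# ['14,14', '14,16', '14,Q', '16,16', '16,Q', 'Q,Q']
```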

In order to illustrate the probabilistic principles employed in the software, the calculations for the Hp alternatives have been formulated in Table 6.

The Hd calculations proceed in a similar fashion; however, under Hd there are two unknown contributors, which makes the list of possible alternative genotypes for U1 and U2 a great deal longer (see Fig. 5 for the allele combination listings). Table 7 has been included to demonstrate that the principles applied to Hp also apply to Hd.

Fig. 5. LoComatioN screen-shot showing some of the allelic combinations to be considered under Hp: V + S + U and Hd: V + U1 + U2 from a casework example (Table 2). LR = Pr(E|Hp)/Pr(E|Hd). Under Hd, all potential genotypes from U1 + U2 contributors are considered.

Table 7

Expansion of the first four rows of Fig. 5, to illustrate probabilistic principles employed to formulate probabilities under Hd

| Proposed contributing genotypes V + U1 + U2 | Pr(R1 = 14,16 ∣ Mj) | Pr(R2 = 14,16 ∣ Mj) | Pr(Mj) | Product |
|---|---|---|---|---|
| 16,16 + 14,14 + 14,14 | Pr(D̄)^6 Pr(C̄) (no drop-out, no contamination) | Pr(D̄)^6 Pr(C̄) (no drop-out, no contamination) | p14^5 p16^3 | Pr(D̄)^12 Pr(C̄)^2 p14^5 p16^3 |
| 16,16 + 14,14 + 14,16 | Pr(D̄)^6 Pr(C̄) | Pr(D̄)^6 Pr(C̄) | 4 p14^4 p16^4 | 4 Pr(D̄)^12 Pr(C̄)^2 p14^4 p16^4 |
| 16,16 + 14,14 + 14,Q | Pr(D̄)^5 Pr(D) Pr(C̄) (no drop-out, drop-out and no contamination) | Pr(D̄)^5 Pr(D) Pr(C̄) (no drop-out, drop-out and no contamination) | 4 p14^4 p16^3 pQ | 4 Pr(D̄)^10 Pr(D)^2 Pr(C̄)^2 p14^4 p16^3 pQ |
| 16,16 + 14,14 + Q,Q | Pr(D̄)^4 Pr(D)^2 Pr(C̄) | Pr(D̄)^4 Pr(D)^2 Pr(C̄) | 6 p14^3 p16^3 pQ^2 | 6 Pr(D̄)^8 Pr(D)^4 Pr(C̄)^2 p14^3 p16^3 pQ^2 |
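The longer Hd list comes from pairing genotypes for U1 and U2. A minimal sketch that enumerates unordered (U1, U2) pairs from {14, 16, Q} and keeps those which, together with V = 16,16, carry both observed alleles. It makes a simplifying assumption that is ours, not the software's: no contamination, so every observed allele must come from a contributor. It therefore does not reproduce the exact listing, ordering or coefficients of Fig. 5 and Table 7. All names are hypothetical.

```python
from itertools import combinations_with_replacement

ALLELES = ("14", "16", "Q")  # Q stands for any allele not seen in the crime stain
VICTIM = ("16", "16")
OBSERVED = {"14", "16"}      # R1 = R2 = 14,16 at D3


def hd_candidate_pairs():
    """Unordered (U1, U2) genotype pairs that, with V, carry every observed allele.

    Simplifying assumption for this sketch: no contamination, so each observed
    allele must be carried by a contributor; Q alleles are treated as dropped out.
    """
    genotypes = list(combinations_with_replacement(ALLELES, 2))
    pairs = combinations_with_replacement(genotypes, 2)
    return [(u1, u2) for u1, u2 in pairs
            if OBSERVED <= set(VICTIM) | set(u1) | set(u2)]


candidates = hd_candidate_pairs()
print(len(candidates))  # 15
```

Of the 21 unordered pairs, only the 6 in which neither unknown carries allele 14 are excluded under this assumption; allowing contamination would readmit them with a Pr(C) term.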

References

[1] I. Findlay, A. Taylor, P. Quirke, R. Frazier, A. Urquhart, DNA fingerprinting from single cells, Nature 389 (1997) 555–556.

[2] P. Gill, R. Sparkes, C. Kimpton, Development of guidelines to designate alleles using an STR multiplex system, Forens. Sci. Int. 89 (1997) 185–197.

[3] J.P. Whitaker, E.A. Cotton, P. Gill, A comparison of the characteristics of profiles produced with the AMPFlSTR SGM Plus multiplex system for both standard and low copy number (LCN) STR DNA analysis, Forens. Sci. Int. 123 (2001) 215–223.

[4] P. Gill, A. Kirkham, Development of a simulation model to assess the impact of contamination in casework using STRs, J. Forens. Sci. 49 (2004) 485–491.

[5] M. Bill, P. Gill, J. Curran, T. Clayton, R. Pinchin, M. Healy, J. Buckleton, PENDULUM – a guideline based approach to the interpretation of STR mixtures, Forens. Sci. Int. 148 (2004) 181–189.

[6] T.M. Clayton, J.P. Whitaker, R. Sparkes, P. Gill, Analysis and interpretation of mixed forensic stains using DNA STR profiling, Forens. Sci. Int. 91 (1998) 55–70.

[7] P. Gill, R. Sparkes, R. Pinchin, T. Clayton, J. Whitaker, J. Buckleton, Interpreting simple STR mixtures using allele peak areas, Forens. Sci. Int. 91 (1998) 41–53.

[8] P. Gill, J. Curran, K. Elliot, A graphical simulation model of the entire DNA process associated with the analysis of short tandem repeat loci, Nucleic Acids Res. 33 (2005) 632–643.

[9] P. Gill, J. Whitaker, C. Flaxman, N. Brown, J. Buckleton, An investigation of the rigor of interpretation rules for STRs derived from less than 100 pg of DNA, Forens. Sci. Int. 112 (2000) 17–40.

[10] J. Buckleton, P. Gill, Low copy number, in: J. Buckleton, C.M. Triggs, J.S. Walsh (Eds.), Forensic DNA Evidence Interpretation, CRC Press, 2005, pp. 275–297.

[11] P. Taberlet, S. Griffin, B. Goossens, S. Questiau, V. Manceau, N. Escaravage, L.P. Waits, J. Bouvet, Reliable genotyping of samples with very low DNA quantities using PCR, Nucleic Acids Res. 24 (1996) 3189–3194.

[12] J.M. Curran, P. Gill, M.R. Bill, Interpretation of repeat measurement DNA evidence allowing for multiple contributors and population substructure, Forens. Sci. Int. 148 (2005) 47–53.

[13] C.H. Brenner, R. Fimmers, M.P. Baur, Likelihood ratios for mixed stains when the number of donors cannot be agreed, Int. J. Legal Med. 109 (1996) 218–219.

[14] B.S. Weir, DNA statistics in the Simpson matter, Nat. Genet. 11 (1995) 365–368.

[15] J. Buckleton, J.M. Curran, P. Gill, Towards understanding the effect of uncertainty in the number of contributors to DNA stains, Forens. Sci. Int., in press.

[16] I.W. Evett, G. Jackson, J.A. Lambert, More on the hierarchy of propositions: exploring the distinction between explanations and propositions, Sci. Justice 40 (2000) 3–10.

[17] R. Cook, I.W. Evett, G. Jackson, P.J. Jones, J.A. Lambert, A model for case assessment and interpretation, Sci. Justice 38 (1998) 151–156.

[18] A. Lowe, C. Murray, J. Whitaker, G. Tully, P. Gill, The propensity of individuals to deposit DNA and secondary transfer of low level DNA from individuals to inert surfaces, Forens. Sci. Int. 129 (2002) 25–34.

[19] I.W. Evett, C. Buffery, G. Willott, D. Stoney, A guide to interpreting single locus profiles of DNA mixtures in forensic cases, J. Forens. Sci. Soc. 31 (1991) 41–47.
