TRANSCRIPT OF PROCEEDINGS
S CR 2014 0007
SUPREME COURT OF VICTORIA
CRIMINAL JURISDICTION
MELBOURNE
THURSDAY 20 APRIL 2017
(2nd day of hearing)
BEFORE THE HONOURABLE JUSTICE EMERTON
DIRECTOR OF PUBLIC PROSECUTIONS v. CLINTON JAMES TUITE
VICTORIAN GOVERNMENT REPORTING SERVICE
7/436 Lonsdale Street, Melbourne Vic 3000 - Telephone 9603 9134
161889
Pages 119 - 177
DR ROGERS: Good morning, Your Honour.
HER HONOUR: Good morning.
DR ROGERS: Dr Duncan Taylor appears on the video-link and is
ready to give his evidence.
HER HONOUR: Good morning, Dr Taylor?---Good morning.
Yes, Dr Rogers. We have got to swear the witness in, I'm
sorry.
.DF:DM:CAT 20/04/17 SC 11A 119 DISCUSSIONTuite
<DUNCAN ALEXANDER TAYLOR, sworn and examined:
DR ROGERS: Dr Taylor, do you have - did you prepare a
statement, a 19 page document dated 17 April 2015?---Yes,
I did.
Do you have a copy of that in front of you?---Yes.
And that's a truthful document?---Yes.
I tender that.
#EXHIBIT B - - Statement of Dr Duncan Taylor dated 17/4/2015.
Do you have another statement, 20 pages, dated 15 August 2016,
signed by you?---Yes.
And that's a truthful document?---Yes, it is.
Yes, I tender that.
HER HONOUR: What date did you give for that?
DR ROGERS: 15 August 2016.
HER HONOUR: All right. I have got here a statement dated 2
September 2016, is that not - that's the cover, is it?
DR ROGERS: Have I got the wrong date?
MR DESMOND: Yes, well, the cover page.
DR ROGERS: The cover page has got the notice of additional
evidence.
MR DESMOND: That's the 2nd of September - - -
HER HONOUR: It's the cover page that's the wrong date.
DR ROGERS: It's the date it was served, Your Honour, I
understand.
HER HONOUR: All right. Yes, I see this is dated 15 August.
#EXHIBIT C - - Statement of Dr Duncan Taylor dated 15/8/2016.
DR ROGERS: Yes, thank you.
HER HONOUR: Yes, Mr Desmond.
MR DESMOND: Thank you, Your Honour.
<CROSS-EXAMINED BY MR DESMOND:
Good morning, Dr Taylor?---Good morning.
Sir, were you provided with a copy of Professor Balding's
statement at some point?---No, I wasn't.
Okay. Is it correct - he's given some evidence that he did
have some conversations with you, as best I could
approximate it, it would be in the lead up to 22 June
2015 when he wrote his statement?---Yes, I think we did.
He asked you perhaps some questions in more detail about STRmix
and the program, those sorts of matters?---Yes, that's
right.
Did you happen to make notes of those conversations, by the
way?---No, I don't have any notes of those.
I just want to read to you, doctor, part of the professor's
report - this is on p.4. He says, "From Dr Taylor's
comments on p.1", so he's referring to your statement,
he's identified the statement of 1 July 2014, so it
appears he's referring to that. He goes on - - -
HER HONOUR: 3 July?
MR DESMOND: - - - and says, "It appears that this statement
has been prepared in response to a defence request for
further documentation about the workings and validation
of STRmix". He says, "I agree that such documentation
would be valuable". Firstly, do you agree with
that?---The documentation about the workings of STRmix
would be valuable?
Yes?---Yes.
He goes on to say, "And this statement goes some way" -
referring to your statement - "towards meeting this
request".
HER HONOUR: Can we just clarify, for Dr Taylor's benefit,
which statement we're talking about here, which of his
statements? It seems to me it's the one dated 17 April
2015.
MR DESMOND: Well, I would have thought so, I couldn't find the
date that he refers to that, Your Honour.
HER HONOUR: Yes. So, Dr Taylor, it's your 2015 statement, the
one with all the algorithms in it.
MR DESMOND: Yes. Thank you, Your Honour, I missed the heading
to that paragraph. He identifies your second statement
of 17 April 2015?---Okay.
And goes on to say, "From Dr Taylor's comments on p.1 it
appears that this statement has been prepared in response
to a defence request for further documentation about the
workings and validation of STRmix. I agree that such
documentation would be valuable and this statement goes
some way towards meeting this request. I found
Dr Taylor's discussion of the difficulty arising because
the STRmix algorithm is 'not entirely step-by-step' to be
confusing and unnecessary". Do you agree that quote of
yours in your statement about the difficulty arising
because of the STRmix algorithm is confusing and
unnecessary?---I wouldn't necessarily agree that my
statement is confusing and unnecessary.
Well, it's the particular subject matter within the statement,
it may incorporate other subject matters but this
particular one is you express some difficulty arising
because of the STRmix algorithm and setting it out, the
maths step-by-step, did you not?---Yes, and the reason
for that was, as I understood what I was being asked to
provide, was some step-by-step manner by which a person
could take the input material, put them through a series
of equations and then end up with the answer that STRmix
gives at the end. Now, you can't do that because of the
way that STRmix works.
I'm sorry, doctor.
HER HONOUR: Can I just stop you for a minute, Dr Taylor.
Could we turn the sound up a bit, please, Mr Hansen.
You might be better off - I know you probably hate using
them - - -
MR DESMOND: I am generally, Your Honour, but I then find that
I speak lower because it's very loud in my ear.
HER HONOUR: Thank you.
MR DESMOND: I'll see how we go?---Shall I repeat my last
statement?
HER HONOUR: Yes, please.
MR DESMOND: Yes?---So the - what I was being asked to do, as I
understood it, when I wrote this report was provide some
step-by-step mechanism by which a person could take the
input material that was being provided to STRmix, take it
through a series of equations and end up with the answer
at the end that STRmix gave and because of the way in
which STRmix works it's not possible to provide that
linear series of equations that lead from inputs to
answers; that's the difficulty that I refer to.
Balding goes on to say, "There is no difficulty in principle to
describe a stochastic algorithm and there would clearly
be no interest in describing the actual realised values
of every step of the algorithm". Do you agree with
that?---That seems fair enough.
He then says that, "Also on his statement" - referring to your
statement on p.2 - "However, I can provide the rules (or
more technically the models) that govern". And Balding
goes on to say, "Is not completely satisfactory". Do you
agree that statement of yours is not completely
satisfactory?---Well, I think I would need a little bit
more context as to what he considers to be satisfactory
to answer that.
Well, he goes on to say, "Yes, the rules is what is requested
but the mathematical models underlying STRmix are
generally well described in the literature and, in my
view, are not only cause for concern. What is less well
documented is a high level technical description of the
algorithm. I believe that this is not being made
available because of commercial considerations but I do
not think this is satisfactory in the criminal legal
context". Do you agree with those propositions?---Well,
there's more to the points really what you've just said
there. I agree that these sorts of algorithms should be
available to the judiciary or to the scientific public.
I would disagree that we haven't given those, as far as
how STRmix works, because we've published extensively on
every aspect of the way that STRmix works, all the
models, all the maths and all the algorithms.
He goes on to say that, "While the published literature
relating to STRmix is impressively large and of high
quality, it is at different levels and in different
places published at different times with much repetition
and referring to different versions of the program". Do
you agree with that?---Yes.
He goes on to say, "Defence experts should have available a
single document explaining how the program works and the
changes over different versions as well as the results of
validation checks and which versions they relate to". Do
you agree with that?---Yes, that would be reasonable.
Do you agree that's not been done at this point for whatever
reason?---Well, the most complete compilation of all the
STRmix algorithms that are kept all in one place and all
up-to-date with the latest version of STRmix was in the
STRmix user's manual. That also includes a series
of changes across version numbers in the back of the
manual there and we have defence disclosure policies to
make that available to defence. And just further to what
you said regarding the validation reports, typically they
- I assume that they would be talking about validation
reports for each - for the specific laboratory that's
validated STRmix, which would be available from the
laboratory. If you're talking about developmental
validations, then that's also present in the STRmix
manual.
He says, this is Balding, "This documentation would be
substantial and difficult for any one expert to absorb
and critically appraise, however, if such a document were
available to the international community of experts,
there would be long-term advantages through additional
opportunities for scrutiny that might uncover areas for
improvement and give greater reassurance than is
currently possible". Do you agree with that, sir?---I'm
not sure whether I agree or disagree with that.
Certainly I agree with the sentiment that the more
information you provide to the scientific public, the
more you're going to advance the field. I would say that
we have made all that information available but, as
previously pointed out, it's in different publications
spanning time.
M'mm?---It's really - whether or not we provide that material
all in one block in a STRmix manual is really a matter of
convenience rather than access.
Well, I understood you to be saying if - and if it's correct
that all the algorithms and developmental validations for
each version are contained within the manual, I
understood you to be saying the manual is available, in
effect, in a litigation sense to the defence and the
judiciary but it's not generally freely available. Have
I misunderstood that?---No, that's correct.
Now, apart from other things that have occurred since we last,
since I last had an opportunity to ask you questions,
SWGDAM, that's the Scientific Working Group on DNA
Analysis Methods, have published guidelines for the
validation of probabilistic genotyping systems?---Yes.
As I understand it, and just reading from the copy I've got
here, "Following the public comment period the ad hoc
working group forwarded the final guidelines to the
SWGDAM executive board and they were approved for posting
on the SWGDAM website on June 15 2015. That's correct,
as you - - -?---Okay.
And clearly, as disclosed by the question and the document
itself, it sets out a series of guidelines addressing the
validation of probabilistic genotyping systems in
general, one of which is STRmix?---Yes.
You've sought to comply or adhere to those guidelines as the
developer or co-developer of STRmix; is that
right?---Yes.
You jointly authored a paper that was published in Forensic
Science International Genetics, volume 23 2016 pp.226-239
entitled, "Developmental validation of STRmix expert
software for the interpretation of forensic DNA
profiles", agreed?---Yes.
Just quickly going to the abstract, it reads, "In 2015 the
scientific working group on DNA analysis methods
published the SWGDAM guidelines for the validation of
probabilistic genotyping systems. STRmix is a
probabilistic genotyping software that employs a
continuous model of DNA profile interpretation. This
paper describes the developmental validation activities
of STRmix following the SWGDAM guidelines." That sounds
accurate, as I've read that part of the abstract to
you?---That seems correct.
Okay. Now, in the body of the article you go on to - the
precursor had been the description as to how the science
had developed, manual techniques for DNA profile
interpretation heuristically based, et cetera, and you,
the authors, then get to identifying again the 2015
guidelines and you go on to say, could I suggest, "The
developmental validation of STRmix was initially
undertaken in 2012 following the requirements outlined
within the FBI quality assurance standards by analysts at
Forensic Science South Australia and the Institute of
Environmental Science and Research Limited", ie, ESR,
agreed?---Yes.
In paragraph 1.1 of the article you identified guideline 3.1 of
the 2015 SWGDAM guidelines being publication of
underlying scientific principles, agreed?---Okay.
And the paragraph commences, "All significant portions of the
statistical algorithms and underlying scientific
principles behind STRmix have been published in peer
reviewed scientific literature. Within table one we
provide a summary of these models and algorithms and
their references aligned with the software version in
which they were introduced". Okay?---Okay, yes.
Now, when one then goes to table one, it's identified as a,
"Summary of the scientific principles the STRmix version
in which they were introduced and their publications".
The next line is, "Algorithms scientific principles and
methods", the first one of which is - it's listed in the
table is "allele and stutter peak height variability as
separate constants within the MCMC"?---Okay.
Before I go through each of them as necessary, are you able to
answer - I assume you haven't got access to the document
where you are at the moment - if you have, you're welcome
to open it up?---No, I don't have it.
Are we talking about mass parameters within this table?---Okay.
The first one that I have just read out, "allele and stutter
peak variability as separate constants within the MCMC",
the heading under the column "version introduced" is
version 2.0; that's accurate?---I assume so, yes, yeah.
Does that mean for earlier versions there was no algorithm or a
different algorithm for modelling allele and stutter peak
height variability as separate constants within the
MCMC?---Could you just repeat that, sorry?
Okay. Perhaps you could explain what is meant by the phrase
"separate constants within the MCMC"?---For the allele
and stutter peak height variability?
Yes, yes, what's a separate constant mean?---Sure.
Is it a constant at each marker or- - -?---So within the DNA
profile that we're analysing, some of the fluorescence
that we're seeing, some of the peaks is going to be due
to alleles. If I use that term, you familiar with what I
mean?
Yes?---And some of the fluorescence is going to be due to
stutters from those alleles.
Yes?---In early versions of STRmix we modelled the peak height
variability of stutters and alleles with one over-arching
model and then in later versions we refined the model we
used so that it still uses the same algorithm but
stutters and alleles had different constants within that
algorithm which means stutters - the peak height
variability of stutter peaks were tolerated less than the
peak height variability in allelic peaks in later versions.
Okay. But the change in version two, you may be able to tell
us approximately when version two commenced operations or
you may not, was done to improve the ultimate outcome of
STRmix, that is the production of the LRs?---Yes, and so
it is for a number of algorithms and ongoing efforts, we
continually refine and improve the algorithms within
the program.
I understand. But you might recall the case of Tuite involves
the use of STRmix version prior to version two, it's
version one point zero something?---Yes, that's right.
The next category is peak height variability as random
variables within the MCMC. That was introduced,
according to the table, in version 2.3. Firstly, peak
height variability, is that a reference to both stutter,
alleles, spikes, blobs and any other artefacts or is it
only a reference to stutter and allelic peaks?---It's
only in reference to stutter and allelic peaks because
all those other artifactual peaks that you just
mentioned, it's presumed that an analyst has screened
those out prior to using STRmix.
Okay. But where in the first category, allele and stutter peak
height variability as separate constants, which you have
explained came in with version two, peak height
variability as random variables within the MCMC, what is
that, how is that different from the first
category?---Sure. So within this new change that you're
talking about that came in with version 2.3 we actually
allowed STRmix to change how tolerant it was to peak
height imbalances for both stutters and alleles so that
if profiles were being produced in a lab that happened to
be slightly more variable than average DNA profiles, then
STRmix could adjust for that. Similarly, if they were
slightly less variable in peak height than the average,
then STRmix could adjust for that. Prior to that version
there was just one fixed value for the tolerance of peak
height variability for stutters and alleles that couldn't
shift.
So we could change variability there for the imbalance issue of
peak heights looking at, across an entire profile, that's
looking at the balance and whether it confirms?---No.
With this one you're still talking about within a locus,
you're still talking about individual peak height
variabilities, you're just talking about the ability of
STRmix to adjust to the variability that's been seen.
Okay. So it's restricted to locus or does it include genotype
as well across the profile?---It's for all peaks in the
whole profile.
Okay. Well, does have to - does it either have to discriminate
and/or is it able to discriminate with, between allele
imbalance and genotype imbalance?---Well, there is no
real - - -
Is that unnecessary- - -?---I'm not quite sure - - -
- - - that sort of distinction, genotype imbalance across a
profile, genotype - - -?---So allelic - there is no real
term "genotype imbalance" as far as I'm aware. Allelic
imbalance, I suppose, would cover that.
Would cater for it. Okay. The third category is model for
calibrating laboratory peak height variability. That's
said to have been introduced in version 2.0. Now,
firstly, is that model for it to be used, that requires
input of the individual laboratory validation data as
they have modelled peak heights in the past or is it
talking about something else?---That's talking about the
process of labs taking some validation data and using
this component of STRmix to calibrate STRmix for how that
lab is performing.
I didn't ask you, so I will just re-trace, for that second
category, peak height variability as random variables
within the MCMC, did it do it, and, if so, what model,
how did it do it prior to version 2.3?---No, 2.3 we
introduced that feature of the model where STRmix could
adjust for the profile it was seeing with regards to peak
height variability. Prior to that STRmix couldn't adjust
to the individual profile's peak height variability, it
used a - like an average value from the laboratory
calibration data.
Okay. Well, that may then answer the next question. For the
third category of model for calibrating laboratory peak
height variability which came in with version 2, was that
addressed prior to version 2?---Yes, it was.
Application of - it says "Gaussian" G-a-u-s-s-i-a-n "random
walk to the MCMC process came in in version 2.3". What
is the Gaussian random walk to the MCMC process in
layman's language?---In layman's language will be a
challenge but I'll see how I go. When you are trying to
describe a DNA profile or when STRmix is trying to
describe a DNA profile it will choose values for the
different mass parameters so it will choose a DNA amount
for each contributor and that would make a difference as
to how high it's expecting to see peaks. It will choose
values for degradation for each contributor and that will
dictate how much it's expecting the peak heights to drop
off as the profile goes from left to right, and similar
for other mass parameters in order to build up an
expected profile of what it's expecting to see. Now,
how closely that expected profile aligns with the profile
you have observed sort of dictates how well those mass
parameters are describing what you've got. With each
iteration in the MCMC, so each new round of estimating
new values for these mass parameters, STRmix will take a
small step away from the current value that it's sitting
on and that small step away is called a Gaussian random
walk because the size of the step depends on a Gaussian
curve.
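[Illustrative note, not part of the evidence: the answer above describes each MCMC iteration as a small, Gaussian-sized step away from the current mass parameter values, with the fit between expected and observed profiles deciding how well those values describe the data. A minimal sketch of that idea follows. It is a toy random-walk Metropolis sampler, not STRmix's algorithm. The two-parameter model (template amount and degradation), the exponential fall-off of peak height, the log-ratio penalty, and all step sizes and starting values are assumptions made for illustration.]

```python
import math
import random


def expected_heights(template, degradation, positions):
    # Toy assumption: expected peak height falls off exponentially
    # as the profile goes from left to right.
    return [template * math.exp(-degradation * x) for x in positions]


def log_likelihood(observed, expected, sigma=0.1):
    # Toy assumption: penalise the log-ratio of observed to expected height.
    return sum(-0.5 * (math.log(o / e) / sigma) ** 2
               for o, e in zip(observed, expected))


def gaussian_random_walk(observed, positions, iterations=20000, seed=1):
    rng = random.Random(seed)
    template, degradation = 500.0, 0.0  # arbitrary starting values
    current = log_likelihood(
        observed, expected_heights(template, degradation, positions))
    samples = []
    for _ in range(iterations):
        # Small step away from the current value; the step size is drawn
        # from a Gaussian curve centred on zero.
        new_template = template + rng.gauss(0.0, 25.0)
        new_degradation = degradation + rng.gauss(0.0, 0.02)
        if new_template > 0 and new_degradation >= 0:
            proposed = log_likelihood(
                observed,
                expected_heights(new_template, new_degradation, positions))
            # Metropolis rule: always accept a better fit, sometimes a worse one.
            if rng.random() < math.exp(min(0.0, proposed - current)):
                template, degradation, current = (
                    new_template, new_degradation, proposed)
        samples.append((template, degradation))
    return samples
```

[Calling `gaussian_random_walk(expected_heights(1000.0, 0.1, range(6)), range(6))` and averaging the later samples should recover values near the template and degradation used to generate the toy profile.]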
The application of that Gaussian step, was that catered for in
versions prior to version 2.3?---Prior to 2.3 we didn't
use a Gaussian random walk, we used a different method of
proposing these new mass parameters.
What was the method used in the version which I think is 1.08
but you might be able to correct me in the Tuite
case?---That method was a clipping and sliding method.
Sorry, clipping and what?---And sliding, which is to say we
have a small window from which those values could be
chosen, that window would be sitting around the current
value and that window could slide up or down as the
analysis proceeded.
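[Illustrative note, not part of the evidence: the answer above describes the earlier proposal method as a small window sitting around the current value that can slide up or down. The actual algorithm is said to be in the STRmix manuals and is not reproduced here. The sketch below is only one possible reading: a new value is drawn uniformly from a window centred on the current value, and "clipping" is assumed to mean trimming that window to an allowed range. The function name, the bounds and the window width are hypothetical.]

```python
import random


def window_proposal(current, half_width, lower, upper, rng):
    # The window sits around the current value...
    window_low = current - half_width
    window_high = current + half_width
    # ...and is clipped so it never extends past the allowed range (assumption).
    window_low = max(lower, window_low)
    window_high = min(upper, window_high)
    # Any value inside the window is equally likely to be proposed.
    return rng.uniform(window_low, window_high)


def slide(start, steps, half_width=0.5, lower=0.0, upper=10.0, seed=0):
    # Each accepted value becomes the centre of the next window,
    # so the window slides as the analysis proceeds.
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        path.append(window_proposal(path[-1], half_width, lower, upper, rng))
    return path
```

[The contrast with a Gaussian random walk is that every value inside the window is equally likely here, whereas a Gaussian step favours values close to the current one.]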
That clipping and sliding method, is that identifying a
particular algorithm or a mathematical equation?---A
particular algorithm.
Can you identify where, either what the algorithm is or where I
would find it?---That algorithm would be present in the
STRmix manuals that related to STRmix prior to version
2.3.
Okay. The next category is "modelling of back stutter by
regressing stutter ratio against allelic designation
which was introduced in version 2". Firstly, can you
just explain what's that identifying? I know what "back
stutter" is, but what's "back stutter by regressing
stutter ratio" as opposed to just looking at the stutter
ratio?---Well, what that means is that within a locus you
can create the equation of a line that lets you convert
an allele to a stutter ratio. So that's what we used.
Was modelling of back stutter by regressing stutter ratio
against allelic designation catered for in versions prior
to version 2?---Yes.
With a different algorithm?---No, with that same algorithm,
with a regressive, but with a regression algorithm.
What's the distinction, what's being identified that was
introduced in version 2 that apparently wasn't in the
earlier versions?---Well, I believe, from memory, in that
table that earliest version number we used is version 2
because that was the first commercially released version
of STRmix, so that's as low down as we go but a lot of
these things were present in prior versions because
they're fundamentals to the working of STRmix.
Well, the document certainly doesn't say this and this is
published in the Forensic Science International, you
know, peer review - it's peer reviewed literature,
agreed?---Correct.
As I understand it, the STRmix program, correct me if I am
wrong, but was actually sold to FSL or did they just get
it for free, it was a commercial product when it was sold
to FSL Victoria?---So all the Australian labs received
pre-commercial versions of STRmix for free. The first
commercial version was version 2, which is what we've
published on because that's the version that's available
to the international community.
Well, this particular category, that was in the earlier - well
- and specifically was in the Tuite version?---Yes.
Next category, "Modelling of back stutter by regressing stutter
ratio against LUS" - Professor Balding explained LUS
yesterday, it's longest- - -?---Uninterrupted sequence.
Uninterrupted sequence, thank you. And according to the table
introduced in version 2.3?---Yes.
So I'm assuming that because it's first introduced in 2.3, it
wasn't in the first commercial program, version 2, and,
therefore, wasn't in the non-commercial programs prior to
version 2, is that correct?---Correct, yes.
That modelling - firstly, perhaps, what's distinction here for
the modelling of back stutter by regressing stutter ratio
against LUS as opposed to the previous description of
regressing stutter ratio against allelic
designation?---Well, in the previous model where you
regress it against allelic designation, for each locus
you get an equation of a line, as I said, which lets you
take an allelic value and compare that to the equation of
the line and get an expected stutter ratio come out of
that. What we found was that there was a model that
better described stutter ratio than simply this one
equation of the line, in particular some loci where the
stutter or with a repeat region wasn't a simple sequence
of repeats but was broken up by non-repeating regions, we
found this longest uninterrupted sequence was a better
description of the expected stutter ratio, so we refined
the model to use that longest uninterrupted sequence
rather than the allelic designation.
By what measure is it better?---Well, you can - - -
If you used a percentage term, if that was appropriate, is it
10 per cent better than the previous - - -?---Um, so what
you can do is graph allelic designation versus the
observed stutter ratio and you can fit an equation of the
line and you can get a value, called an R-squared value, that
tells you how well that line describes the data, and I'm
relying on my memory a bit here but I believe that that
R-squared value was around about 0.34, 34 per cent of the
variability was described by allele.
Okay?---We then did the same thing graphing longest
uninterrupted sequence against the observed stutter ratio
and that R-squared value, from memory, was around 60 to 70
per cent, so it was - - -
So perhaps twice as effective?---You can't - it's not quite as
simple as that but it was more effective.
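[Illustrative note, not part of the evidence: the answers above describe fitting the equation of a line to observed stutter ratios, first against allelic designation and then against longest uninterrupted sequence (LUS), and comparing R-squared values. A minimal sketch of that comparison follows. The data are invented so that stutter ratio tracks LUS closely; they are not the data behind the figures the witness recalls.]

```python
def fit_line(xs, ys):
    # Ordinary least-squares line: y = slope * x + intercept.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    # Share of the variability in ys that the fitted line describes.
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Invented example: six alleles at one locus, their LUS values,
# and the stutter ratio observed for each.
ALLELES = [10, 11, 12, 13, 14, 15]
LUS = [10, 7, 12, 8, 14, 9]
STUTTER_RATIOS = [0.101, 0.069, 0.122, 0.079, 0.138, 0.092]

r2_allele = r_squared(ALLELES, STUTTER_RATIOS, *fit_line(ALLELES, STUTTER_RATIOS))
r2_lus = r_squared(LUS, STUTTER_RATIOS, *fit_line(LUS, STUTTER_RATIOS))
print(f"R-squared against allele: {r2_allele:.2f}")
print(f"R-squared against LUS:    {r2_lus:.2f}")
```

[As the witness cautions, a higher R-squared shows that one line describes the data better; it does not by itself mean the model is "twice as effective".]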
Okay. The next category is, "Modelling of forward stutter" -
forward stutter, is that restricted to, is it two base
pairs along the profile or four base pairs?---That's
one repeat unit - - -
One repeat unit?---- - - up from the allele rather than being
the standard stutter we have been talking about is one
repeat unit less than the allele, this is one repeat unit
more than the allele.
But if you were reading one repeat forward or back, is it plus
or minus 2 bp or not?---It depends on the repeat in the
region that you're looking, so if it was a four base pair
repeat it's plus or minus 4, if it's a five base pair repeat
it's minus five.
Okay. While the modelling of forward stutter was first
introduced in the version 2.4, so do you accept by
inference it wasn't being modelled prior to version 2.4
in either commercial or non-commercial
programs?---Correct.
Next is, "Modelling of allelic drop-in using a simple
exponential or uniform distribution", said to be
introduced in version 2.0, so allowing for your
explanation which you previously gave, was that modelled
in the Tuite STRmix version?---Yes.
Got the same algorithm as has been described- - -?---Yes.
- - - coming in with the first commercial one?---Yes.
Next is, "Modelling of allelic drop-in using a gamma
distribution" which is first introduced in version 2.3,
so applying the same logic it wouldn't have been
introduced in the non-commercial version,
agreed?---Agreed.
What's the distinction between the gamma distribution that's
measuring modelling allelic drop-in as opposed to using
the simple or uniform distribution? What are you
achieving by the different modelling with gamma?---The
gamma distribution came from a publication by a guy named
Roberto Puch-Solis. He came up with a very neat way of
using a gamma distribution to model all the expected
drop-in that you see in a laboratory and when we saw that
we quite liked the way that that algorithm worked and so
we incorporated it into the next version of STRmix.
But am I correct in understanding when you say you used the
phrase "we liked", what you mean by that it was a
superior model to what we were previously doing by
modelling with the simple or uniform exponential
distribution?---Yes, it provided a better estimate in
more situations.
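[Illustrative note, not part of the evidence: the exchange above contrasts a simple exponential or uniform distribution with a gamma distribution for modelling allelic drop-in. The sketch below only writes out the three textbook density functions so their shapes can be compared. Applying them to drop-in peak height, and every parameter value shown, is an assumption for illustration; how STRmix parameterises or applies these distributions is not stated in the evidence.]

```python
import math


def uniform_pdf(height, max_height):
    # Every height up to max_height is treated as equally likely.
    return 1.0 / max_height if 0.0 <= height <= max_height else 0.0


def exponential_pdf(height, rate):
    # Low heights are most likely; probability decays at a single fixed rate.
    return rate * math.exp(-rate * height) if height >= 0.0 else 0.0


def gamma_pdf(height, shape, scale):
    # Two parameters (shape and scale) give more freedom to match observed data.
    # Heights at or below zero are given density 0 here for simplicity.
    if height <= 0.0:
        return 0.0
    return (height ** (shape - 1.0) * math.exp(-height / scale)
            / (math.gamma(shape) * scale ** shape))


# Hypothetical parameter values, chosen only to show the three shapes.
for h in (10.0, 50.0, 100.0):
    print(h,
          uniform_pdf(h, 100.0),
          exponential_pdf(h, 1.0 / 20.0),
          gamma_pdf(h, 2.0, 10.0))
```

[With a shape of 1 the gamma density reduces to the exponential density, so the gamma family includes the simpler model as a special case.]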
Okay. Modelling - the next one is "modelling of degradation
and dropout". It identifies version 2. So are you able
to say that there was modelling of degradation and
dropout in the Tuite version?---There was.
Same - is that one algorithm, modelling of degradation and
dropout or is that two?---That's two separate algorithms.
Okay. Same algorithms for each in the Tuite version as is
identified in version 2 in table 1 of this
document?---Yes.
Next is, "Modelling of uncertainties in the allele frequencies
using the HPD". Just define what the HPD is for me,
please?---HPD stands for highest posterior density and
it's basically just a fancy name to take into account our
uncertainty in the true allele frequencies in a
population because we're basing them on a small sub-set
we've used to create a database.
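[Illustrative note, not part of the evidence: the answer above describes the highest posterior density (HPD) as a way of accounting for uncertainty in allele frequencies estimated from a small database. The sketch below shows the general idea for a single allele frequency. It assumes a Beta posterior under a uniform prior and finds the narrowest interval holding 99 per cent of simulated values. The prior, the counts and the function names are assumptions; the evidence does not describe how STRmix performs this calculation.]

```python
import random


def allele_frequency_samples(count, database_size, draws=20000, seed=7):
    # Assumption: uniform prior, so the posterior for the frequency of an
    # allele seen `count` times among `database_size` alleles is
    # Beta(count + 1, database_size - count + 1).
    rng = random.Random(seed)
    return [rng.betavariate(count + 1, database_size - count + 1)
            for _ in range(draws)]


def hpd_interval(samples, mass=0.99):
    # Highest posterior density interval: the narrowest interval that
    # contains the requested share of the sampled values.
    ordered = sorted(samples)
    n = len(ordered)
    window = max(1, int(round(mass * n)))
    start = min(range(n - window + 1),
                key=lambda i: ordered[i + window - 1] - ordered[i])
    return ordered[start], ordered[start + window - 1]


# Same observed proportion (7.5 per cent), two hypothetical database sizes.
large_db = hpd_interval(allele_frequency_samples(30, 400))
small_db = hpd_interval(allele_frequency_samples(3, 40))
print("400-allele database:", large_db)
print("40-allele database: ", small_db)
```

[The smaller database gives a wider interval for the same observed proportion, which is the uncertainty the witness says the HPD is meant to capture.]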
And that identifies version 2, so I'm assuming that that was
being modelled in the Tuite version, is that
right?---Yes, that's correct.
Same algorithm?---Yes.
Next, "Modelling of the uncertainties in the MCMC". To the
layman that reads very vague, it's probably more specific
modelling of the uncertainties in MCMC. To you what does
that mean?---That means that each time you run an MCMC
analysis you get a slightly different answer. You expect
a certain amount of variability purely from that
stochastic MCMC process so in the later versions of
STRmix we were able to encapsulate that expected level of
uncertainty on an analysis by analysis case and include
that in the HPD estimate.
I'm sorry, does that mean if you did the run again or the
modelling again using STRmix you should get the same
result because you're allowing for the random uncertainty
using MCMC?---What it means is - well, for each analysis
that you run you get a point estimate and you get this
lower bound interval that we're reporting.
Yes?---If you were to run that analysis multiple times then 99
per cent of the time the point estimates will appear
above the intervals, if you do cross comparisons
between the different runs.
Okay. That came in with version 2.3 according to table
1?---Yes.
So that wasn't being modelled in the earlier versions,
agreed?---Agreed.
Next is, "Database searching of mixed DNA profiles", it relates
to version 2, so I'm assuming that was being modelled in
the Tuite version?---Yes.
But what does that mean "database searching of mixed profiles"?
For what aspect? Because, I mean, you would, for a
particular analysis the crime scene stain profile would
be put in and perhaps the known reference sample of a
victim and for the purpose of the likelihood ratio the
person of interest, in that sense there's no database to
search, so what is that identifying there?---Well, prior
to STRmix the standard way of dealing with perhaps no
suspect cases would be that you would test an exhibit and
attempt to interpret a single component from one person,
load that to a database in the hopes that it might match
someone and identify a potential offender. That process
relied on being able to interpret a single component from
a mixture. With the use of STRmix we were able to search
unresolvable mixtures against the database and identify
potential contributors to that mixture. So it just
opened up a lot more profiles to the database
searching function mainly for no suspect cases.
I'm just making sure I've got this - as I read it, that came
in, there's just such a gap between the description and
then the version, I've got to make sure I get the - I
think that came in with version 2, so that was in the
Tuite version?---Yes.
And you understand this is a cold hit match case?---Yes.
So that particular modelling figured prominently in the Tuite
case or not?---Well, I don't know how the cold hit came
to be made.
Okay?---It may have been using standard - - -
HER HONOUR: Is this a modelling issue or just a feature of the
system?
MR DESMOND: I think it's a good question?---Well, it's just a
feature of the system.
HER HONOUR: That's right?---If you accept the fact that there
is a way that we can calculate a likelihood ratio if you
supply a reference, really this is just a feature which says
if you now supply a thousand references you can get a
thousand likelihood ratios all at once and look for
potential contributors.
MR DESMOND: Okay. I'll skip "familial searching" and move on
to - well, the next one is relatives as alternative
contributors, I'll skip that. The third last one is,
"Modelling expected stutter peak heights in saturated
data" and I raise this because it's a - there's a
particular aspect that Professor Chakraborty raises
concerning one item. That came in in version 2.3, so you
accept that modelling wasn't done prior to version
2.3?---That's right.
And you best explain what it actually means?---Certainly. When
you generate a DNA profile you will get alleles and
you'll get much smaller peaks which we have been talking
about called stutter peaks.
Yes?---As more DNA is used in the amplification reaction those
peak heights are going to increase and it happens at
a roughly linear rate to a certain point and that point
would be when the instrument is no longer capable of
detecting any more fluorescence than what it's already
detected and that's called the saturation level.
Yes?---So above that particular level you can't really use the
observed peak heights of the allele to obtain the
expected peak heights of the stutters in that case
because they're saturated and the allelic peak heights
aren't representative any more of how much DNA is in the
DNA extract of that person.
M'mm?---So instead of using the observed peak height, which is
saturated, to come up with the expected stutter peak
height, we use the expected parent peak height that is
within STRmix to come up with the expected stutter peak
height.
HER HONOUR: Why would - - -?---And it gets around that
saturation - - -
- - - saturate?---What was that, sorry?
Why would you saturate? Can't you control the amount of
phosphorescence or whatever it is that you're
measuring?---You can do and typically the laboratories
will have a certain DNA amount which is commonly called
the optimal DNA amount which just basically means their
peaks most of the time will be relatively strong but not
so strong that they get into that saturation zone but
there are certain situations where just by random
sampling of the DNA extract you might sample more DNA
than what you're expecting and peaks will become
saturated or you may wish to push the amount of DNA that
you use in a PCR in order to try to obtain more
information about minor components which can drive the
major components more towards saturation. In that
situation, it's sort of a balancing point between getting
more information from the minor components and losing
information in the major component due to saturation.
Thank you.
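
[Illustrative note, not part of the evidence: the saturation point described above can be sketched in a few lines of Python. Every number and name here (the cap, the stutter ratio, the peak heights, the function names) is an assumption chosen for illustration; none comes from the case file or from STRmix itself. The sketch only shows why a stutter expectation derived from a capped observed parent peak comes out lower than one derived from the modelled parent peak.]

```python
# Hypothetical values throughout; none are taken from the case file or from STRmix.
SATURATION_CAP = 8000.0   # RFU, assumed instrument ceiling
STUTTER_RATIO = 0.08      # assumed stutter-to-parent ratio


def observed_height(true_height, cap=SATURATION_CAP):
    """The instrument cannot report a peak taller than the cap."""
    return min(true_height, cap)


def expected_stutter(parent_height, ratio=STUTTER_RATIO):
    """Expected stutter height as a fixed fraction of the parent height."""
    return ratio * parent_height


expected_parent = 12000.0                            # modelled parent height, above the cap
observed_parent = observed_height(expected_parent)   # clipped to the cap

# Using the clipped observation understates the stutter expectation;
# using the modelled parent height avoids that.
stutter_from_observed = expected_stutter(observed_parent)
stutter_from_expected = expected_stutter(expected_parent)
```

[In this toy example the stutter expectation based on the observed peak is lower than the one based on the modelled parent peak, which is the gap the witness describes the later versions as closing.]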
MR DESMOND: Have you got Chakraborty's report handy? As I
understand it, you were given a copy of it because you
sort of responded to issues that he raised?---Yes, I do.
If you could go to p.6, doctor, please?---Yes.
The third bullet point he says, "The concept of expected peak
height and variances in observed stochastic effects in
LTDNA are learned through execution of MCMC algorithms of
the model maker component of the software which uses
genotype data from multiple single source samples (90
recommended for the version 108 version) with a range of
various template amounts. A user specified parameter
'saturation cap' defines the upper limit allowable for
evaluation of expected peak heights while comparing the
observed data. The choice of such saturation caps in
evaluating expected peak heights critically affects the
ratio of variance and expected peak heights
(overestimates the ratio) in case work use of stochastic
effects. During the review apparently it was noted that
in the R v. Tuite case at least one case example was
analysed in the STRmix by using a considerably larger
saturation cap of 32,000 RFU making the ratio of variance
over expected peak heights much tighter". Now, firstly,
is that issue, which I know you disagree with, and I'll
take you to your response in a minute, is that addressing
at all this particular modelling, that is the modelling
expected stutter peak heights in saturated data?---Yes,
that's referring to that saturation issue.
Okay. Do you agree he's at least correct in identifying that
there was one case work sample analysed by STRmix in this
Tuite case that used what he describes as a considerably
larger saturation cap of 32,000 RFUs?---I don't disagree,
I don't have any information to the contrary.
Okay. Are you able to tell us, from your clearly informed
knowledge of the program, what the general saturation cap
or the recommended saturation cap is or do you not make
any recommendation?---What we recommend is that each lab
develops a saturation cap that is calibrated to their
particular instruments. There's generally two different
levels of saturation cap that we find that laboratories
come to. One would be around about 7 or 8 thousand RFUs,
that's for data that's produced on a particular model of
instrument known as a 31/30 or 31 hundred Capillary
Electrophoresis instrument.
Yes?---In the past few years there's been a new model of that
instrument, a new model of the hardware, which are called
35 hundred instruments. They use a different scale of
fluorescence so typically saturation caps for those
instruments tend to be around the 30,000 mark.
I'm looking for it but I don't know that if I can find it. The
settings used in calculation are generally set out within
the output data, aren't they?---Yes, that should have
saturation cap listed there.
To give you an example, I'm looking at F190 - it's a page
number of the case file in Tuite - if I can pick it up,
I'll tell you what item that relates to. I think it
relates to item 1-2. But in the heading, "Settings used
in calculation" - they go down the various parameters and
the saturation cap, I assume it is, but it just says
"saturation" is listed as 8,000 which would accord with
the 7-8 thousand range you identified a minute ago for
the 31/10, I think you said, or a 31/50 machine?---Yes.
To give you an example of the 32,000 at p.F212 of the case
file, which seems to relate to item 1-3, again going to
the page with, "Settings used in calculation" we go down
to the stutter cap and it's recorded 32,000?---The
saturation.
That's what it says under saturation - the word "saturation"
says 32,000?---Yes.
It doesn't say RFU but we're clearly talking about RFUs?---Yes.
So without trying to address the specific item issue, do you
agree with Chakraborty or take issue with the balance of
his sentence there, "making the ratio of variance over
expected peak heights much tighter"?---No, that's not
necessarily correct.
Well, is it possibly correct as opposed to necessarily
correct?---It's not necessarily correct, it's not
necessarily incorrect. It would depend on - he is
talking about the ratio of variance over expected peak
heights, so when you go to a 35 hundred that expected peak
height, because you've got a larger saturation cap, is
going to increase, so the bottom - - -
The variance is going to increase?---The expected peak height
is going to increase because you've got a larger
saturation cap, so he's talking about - - -
Sorry, are we looking at the difference between two peaks and
trying to determine does it fall within the stutter
range?---No, we're not talking about stutter at all here.
Okay?---We're just talking about saturation, okay.
Yes?---We're talking about the ratio, if we're using
Chakraborty's words, the ratio of variance over expected
peak heights at the very last - - -
Well, your modelling in table 1 as it's described is modelling
expected stutter peak heights in saturation data?---Yes,
but are we talking about my point in the table or are you
talking about Chakraborty's point?
Well, I asked you is Chakraborty's point relevant to this issue
in the table "modelling expected stutter peak heights"
and you said it's relevant, that doesn't necessarily mean
it's only addressing stutter peak - I'm not trying to
trick you up here?---Yeah.
Perhaps you just explain what you believe Chakraborty's
addressing there?---So what Chakraborty is addressing
there is the ratio of the variance over the expected peak
heights. Now, sometimes that's going to be the expected
stutter peak heights, which is what that table was
talking about, or sometimes it's going to be the expected
parent - allelic peak heights.
Yes?---He says that the ratio is going to be much tighter and
he says that because when you increase your saturation
cap your expected peak heights are going to be larger, so
the bottom part of that ratio is going to get bigger and
that's what he describes as "getting tighter" but the
other point he hasn't addressed is the fact that the top
value on that line, so the variance when you're using
these larger saturation cap instruments, is also going to
increase, so the ratio may not change at all.
Well, I suppose it depends on the given situation, it may or it
may not change?---That's right.
Okay. But are you able to say whether - and I expect you won't
be because you're not, without being critical, hands on
in terms of the Tuite case with the items?---(No audible
response).
Whether it's that item that he's identifying there with the
larger saturation cap was seemingly done because the 35
hundred machine was now being used as opposed to the
31/50 machine?---I can't say with certainty but all I can
say is you wouldn't increase the saturation cap to 32,000
if you used a 31/30 instrument. It would have to be a
change of model or instrument.
That would be a significant error if that was done?---That
would be an error if that had occurred, yeah.
Okay. The second last item in table 1 is taking into account
the factor of two, which is a quote, the factor of two,
and I have read about that somewhere but I can't
remember what it means, in LR calculations which is
introduced in version 2.3 and applying the same logic
wasn't being taken into account in earlier versions.
What is the - firstly, what is the factor of two in LR
calculations?---That is to do with likelihood ratio
calculations and it's to do with the way that you set up
hypotheses so if you don't use this factor of two or this
- it's called "factor of N" in the scientific literature.
Yes?---Then really you're talking about a particular component
of a mixture. If you do use the factor of N, you're
talking about the mixture as a whole.
In the Tuite case, likelihood ratios were calculated based on a
Crown hypothesis and competing defence hypothesis?---Yes.
But not with taking into account the factor of two in the LR
calculation as is necessary and recommended in versions
2.3 onwards?---Correct.
So what's not been addressed in the LR calculations, say, in
earlier versions that's now being addressed because they
both, on either, earlier and later versions, they both
would have a Crown hypothesis and a defence
hypothesis?---Perhaps I can explain by way of an example.
You could imagine you'd have a two person mixture where
you've got one major component that matches a person of
interest and you've got one minor component that's
different. Earlier versions of STRmix would give you the
likelihood ratio considering that the person of interest
is the major component and unknown is the minor component
compared to two unknown people being the source of DNA.
Yes?---Now, what STRmix version 2.3 and onwards does is
compares the person of interest to the major component
and so you have the hypothesis that they're the major
contributor and there's an unknown minor.
Yes?---But you also consider them as the minor contributor with
an unknown major, so you're comparing it to both
components of the mixture and reporting like an average
of all of those comparisons, whereas in the prior
versions of STRmix you're just comparing it to the major
component in my example.
But it would also discriminate between the two?---What do you
mean by that?
Well, when it - you say it caters for determining when the
person of interest is the major contributor, it will come
up with a likelihood ratio?---Yes.
But it also looks at where he's possibly the minor
contributor?---Yes.
It just - it doesn't come up with just one figure?---Well, in
both versions of STRmix the references would be compared
to all components of the mixture but in the versions
prior to STRmix 2.3 it would give you the likelihood
ratio for, say, the comparison to the major component.
Yes?---Only.
Yes?---Whereas later on in versions of STRmix where we
introduce this factor of two, it reports like an average
across all the different comparisons to all the
components of the mixture.
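
[Illustrative note, not part of the evidence: the difference the witness describes between the earlier and later reporting can be sketched as follows. This is a simplified toy, assuming the later figure is a plain average of the per-component likelihood ratios, which is only how the witness loosely characterises it ("like an average"); it is not STRmix's actual calculation, and the function names and numbers are hypothetical.]

```python
def lr_single_component(lr_by_component, component):
    """Earlier-style figure in this sketch: the comparison to one component only."""
    return lr_by_component[component]


def lr_averaged(lr_by_component):
    """Later-style figure in this sketch: a plain average over every component."""
    values = list(lr_by_component.values())
    return sum(values) / len(values)


# Hypothetical two-person mixture: the person of interest fits the major
# component well and cannot be the minor component at all.
lrs = {"major": 1.0e6, "minor": 0.0}

major_only = lr_single_component(lrs, "major")
averaged = lr_averaged(lrs)
```

[With two components and a person who can only be the major, the averaged figure in this toy is half the major-only figure, which is one way to read the label "factor of two".]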
In the real life situation, assume the person of interest was
the actual contributor of the major component?---Okay.
STRmix would still give weightings, albeit I assume they would
be lesser in value, but would still give some weightings
to the person of interest possibly being the minor
contributor?---It could do, yes.
When in this silly example I'm giving we would know that he's
just in fact not?---Say that again, sorry?
In this example that I'm giving, we would know in fact that
he's not the contributor of the minor sample, STRmix
potentially would still give some weighting to that
person of interest being the contributor of the minor
component to the mixed sample?---Well, as I said, it
could do, it depends if that amount of peak heights
variability was allowed so that if major peaks could pair
with minor peaks under the calibration settings produced
in the lab, then it would.
Yes?---If that level of imbalance is simply never seen in a
lab, then when it went to compare the person of interest
to the minor component the weights would be zero because
it would deem it impossible for him to be that minor.
Is that able to be done now with the items in the Tuite case,
that calculation?---Yes.
Is it your understanding - I know from what I have read about
STRmix and from what you've told us previously that if
defence makes requests for further calculations, then so
long as they're not silly requests and wasting
everybody's time, it would be expected that calculations
would be done?---Yes.
The last item in table 1 is, "Model for incorporating prior
beliefs in mixture proportions" and that was introduced
in the version 2.3, so a priori it wasn't introduced or
existent in earlier versions to 2.3, agreed?---Correct.
Does that have a scientific meaning? It sounds like a layman's
term, almost a religious term, "prior beliefs"?---What
that means is if you have some prior expectation on what
the mixture proportions should be in a particular sample
and you analyse that sample with STRmix and the results
that STRmix gives don't necessarily align with your
prior expectations, you can provide these, what we call
informed priors to, I guess, assist STRmix or provide it
some indication of your prior expectation.
Is that modelling both subjective and objective? It's
objective in terms of there's a known algorithm but it
requires some subjective input by the operator to the
particular prior belief that he or she believed is
relevant?---Yes, there's a subjective component to that.
How does one have a prior belief about a crime scene stain and
its mixture proportions?---So the most common example or
scenario where this would be used is you may have a
profile where there appears to be peaks that are below
your analytical threshold, so peaks you haven't detected
and therefore peaks that are not going into STRmix to be
analysed.
Yes?---You might use those sub-threshold peaks to inform you
about how many contributors you believe there are to that
DNA profile but, of course, you're making that assumption
or you're making that decision based on information that
you're then not providing to STRmix, so there's a
disconnect between the information that you're using and
the information that STRmix is using.
Yes?---So then you can use these informed priors to tell
STRmix, "Look, I've said, for example, three
contributors, I believe one of these contributors is very
low-level, so I'll give a prior belief that one of the
contributors is at very low levels in this mixture
because I'm using this low-level sub-threshold
information that you're not seeing", and then STRmix will
proceed with that.
Okay. That model or the algorithm that's the base for that
model, can you identify that at all in your statement -
which I have got to find where I put. It's the first of
the two today that you identified which sets out the
amount. The math really commences from about p.6, from
memory?---Yes. Well, it's not in that particular
statement because it's more of a user option rather than
a fundamental mathematical principle.
But it wasn't a user option for the Tuite case, in any
event?---Correct.
Can you tell me what the algorithm is or is it too long-winded
to regurgitate and you'd need it in front of you to see
it?---I can give hopefully a layman's explanation.
No, no the actual equation?---Well, the equation is the
equations for a normal distribution.
Okay?---You provide your prior beliefs and mixture proportions
by supplying means and variances of normal distributions
for what you believe the mixture proportions should be
and then STRmix uses the densities of those normal
distributions in order to adjust the probabilities that
it obtains.
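
[Illustrative note, not part of the evidence: the answer above says the user supplies a mean and variance of a normal distribution for a mixture proportion, and the density is used to adjust probabilities. A minimal Python sketch of that general idea follows. The function names, the candidate proportions and the way the density is multiplied in and renormalised are assumptions made for illustration; the transcript does not say how STRmix applies the density internally.]

```python
import math


def normal_pdf(x, mean, variance):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


def adjust(likelihoods, proportions, mean, variance):
    """Weight each candidate by the prior density, then renormalise to sum to one."""
    weighted = [
        lik * normal_pdf(p, mean, variance)
        for lik, p in zip(likelihoods, proportions)
    ]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical candidate mixture proportions for a suspected low-level contributor.
proportions = [0.05, 0.20, 0.40]
likelihoods = [1.0, 1.0, 1.0]   # assumed equal before the prior is applied

# Prior belief: this contributor sits at about 5 per cent of the mixture.
posterior = adjust(likelihoods, proportions, mean=0.05, variance=0.01)
```

[In this sketch the candidate closest to the stated prior mean ends up with the largest adjusted probability, and the further candidates are down-weighted.]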
Okay. You go on in that article, "Developmental Validation of
STRmix" - "so STRmix uses the quantitative information
from an electropherogram, such as peak heights, to
calculate probability" et cetera. "The weights across
all combinations of that locus are normalised so they sum
to one". Do I understand another way or they sum to 100
per cent, the total weightings, which often we express in
percentage terms and see them in Deakin tables, they all
seem to add up to 100 per cent, is that saying the same
thing?---Yes.
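
[Illustrative note, not part of the evidence: "normalised so they sum to one" and "add up to 100 per cent" describe the same step, as the following Python sketch shows. The genotype labels and raw weights are hypothetical.]

```python
def normalise(raw_weights):
    """Divide each raw weight by the total so the results sum to one."""
    total = sum(raw_weights.values())
    return {genotype: w / total for genotype, w in raw_weights.items()}


# Hypothetical raw weights for three genotype combinations at one locus.
raw = {"[7,9]": 3.0, "[7,7]": 1.0, "[9,9]": 1.0}

weights = normalise(raw)                                   # fractions summing to one
percentages = {g: 100.0 * w for g, w in weights.items()}   # same values as per cent
```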
Okay. You then go on to say, "STRmix describes the
fluorescence observed in one or more EPGs using a number
of models that describe various properties of DNA profile
behaviour. These are described as mass parameters and
include a template for each contributor, a locus specific
amplification efficiency for each locus, a replication
efficiency for each PCR template and a degradation for
each contributor. This biological model is described in
Bright et al. reference 16." Reference 16 is Bright,
yourself, Curran, Buckleton, developing allelic and
stutter peak height models for a continuous method of DNA
interpretation, FSI Genetics Volume 7 in 2013. So, I
then want to go to that paper. Firstly, I should ask
you, that sentence incorporates a number of propositions,
so I'll just quickly read them. "These are described as
mass parameters and include a template for each
contributor, a locus specific amplification efficiency
for each locus, a replication efficiency for each PCR
replicate and a degradation for each contributor. This
biological model is described in Bright et al." When I
was looking at this the other day, I'm reading that as
you relying upon this Bright reference as giving you the
biological model for degradation for each contributor.
Have I misunderstood that?---It's difficult for me to
comment without the paper in front of me. It could be
for the whole sentence.
Well, can we approach it this way, I'll now read you the
extract or the relevant parts of the - not the extract,
the abstract of reference 16 developing allelic and
stutter peak height models, et cetera?---Okay.
"Traditional forensic DNA interpretation methods are restricted
as they are unable to deal completely with complex
low-level or mixed DNA profiles. This type of data has
become more prevalent as DNA typing technologies become
more sensitive. In addition, they do not make full use
of the information available in peak heights. Existing
methods of interpretation are often described as binary
which describes the fact that the probability of the
evidence is assigned as zero or one (hence binary). See
example one." New sentence, "These methods are being
replaced by more advanced interpretation methods such as
continuous models. In this paper we describe a series of
models that can be used to calculate expected values for
allele and stutter peak heights and their ratio SR. This
model could inform methods which implement a continuous
method for the interpretation of DNA profiling data".
You wouldn't take issue, I've read out accurately the
abstract?---I'm happy with that, yes.
In the body of the article, "Continuous methods make assumptions
about the underlying behaviour of peak height or the
variability in the ratio of the two peaks of a
heterozygote and the ratio of an allelic peak height to
stutter peak height to evaluate the probability of a set
of peak heights. These models may be developed from
empirical data external to the profile under
interpretation, by a combination of external data and the
profile under consideration, or simply by the profile
under consideration. We would tend to favour the
combination approach". The combination approach is the
combination of external data and the profile under
consideration. Now, I think I know but what is the
external data and is that combination approach - - -?---That
would be - - -
- - - which forms the basis for STRmix; is that right?---Yes.
So in that instance the external data would be the
calibration data that's been used in the validation of
STRmix for that particular laboratory and you use that
calibration data or that external data to inform some
parts of the workings of STRmix and the models within
STRmix and then, as you're analysing the profile itself,
that is informing STRmix of the values of certain mass
parameters and all of that combined ultimately leads to
your deconvolution or your likelihood ratio.
Well, the calibration data, does that include the internal data
from the particular laboratory that's used for variance
inputs?---Yes.
Do you agree, doctor - pardon me while I find it - there are a
number of unknowables in reality in relation to the
interpretation of DNA and the attempt to interpret it by
a DNA interpretation system such as STRmix?---Yes.
I want to list what I would suggest to you are the unknowables
and put it in this context that it's really these
unknowables which STRmix is attempting to model, in other
words, to fill in the blanks?---Okay.
The blanks in the sense of without the modelling they are
unknowable?---Okay.
The number of contributors to the profile, that's unknowable,
you never truly know the number of contributors to any
given profile, agreed?---Agreed.
The DNA amounts of each contributor?---Agreed.
The degradation of each contributor?---Yes, agreed.
The amplification efficiency of each locus?---Yes.
The replicate amplification strength?---That's right.
And the level of peak height variability within the
sample?---That's right.
And at least those and there are probably a number of others, if
there are you may be able to tell the court what they are
now, that are being modelled discretely each of these
unknowns?---Yes. As you said before, this is what STRmix
is attempting to inform us on when it analyses a DNA
profile.
Now, within this article, "Developing allelic and stutter peak
height models" there are certainly sections which address
stutter, 3.3 is entitled, "Modelling peak heights", 3.4
is entitled, "Application of the model" that you have
identified in the article for mass and stutter. Then it
goes on to the discussion and the final acknowledgement
for the work, okay?---Okay.
Just one particular issue before I get on to what I'm
addressing. There's a sentence here, "Unexpectedly the
scatter plots" that you've identified in figure 8 in the
article, and there's figure 8A and figure 8B - so I'll
read out what they are - figure 8A log 0 to the H over
capital E to the H for the high molecular weight allele
versus log 0 to the little L, O to L over E to L for the
low molecular weight allele for each heterozygote locus
and B, diagram B, log O little a, I think it is, it's
hard to read, over E little a for the allelic peak versus
log OA negative 1 over E to the A negative 1 for stutter
peak but the sentence I'm asking you about is this,
"Unexpectedly the scatter plots in figures 8A and B
indicate there is no detectable correlation between
stutter and allele in this biological model". When I
read it that sort of gave me some concerns but I'm
probably not understanding what you and the other authors
are describing there. Are you able to explain that
without the article in front of you or not?---I have to
try and cast my memory back as to what - all right, I
think, I know what we were talking about with that
particular sentence. Our thinking to that point was that
- well, stutter is a process that occurs during PCR so
within each cycle of the PCR the allele that we're
targeting is replicated over and over again in an
exponential fashion.
Sure?---At some point or in some of those amplifications there
will be a misalignment of the template strand and the
strand that's being made, so the copying strand, and
that's what leads to a stutter peak. The thinking is,
and it's still a reasonable theory, is that if that
stutter occurs very early on in the PCR process, what you
would expect to see is a smaller than expected allele
peak height and a larger than expected stutter peak
height because more of the fluorescence has gone into the
stutter rather than being part of the allele as it should
be. So what you would expect is that you - if that was
the case you would see that correlation as you would see
smaller allele - parent peak heights that were smaller
than expected correlating with stutter peak heights that
were larger than expected, however, we didn't see that in
the data that we looked at, which means one of two
things, either that correlation doesn't exist and we need
to rethink the way that works, that model works, or that
the correlation does exist but it is too small to observe
in amongst the other noise present in the system.
The discussion then refers back to previous publications
suggested that LUS is a better explanatory variable for
SR than allele designation. What I'm putting to you is
this article is really addressing modelling in relation
to stutter ratio?---Yes.
So I go back to the sentence which included things other than
stutter ratio, "Mass parameters include a template for
each contributor, a locus specific amplification
efficiency for each locus" - that's not stutter, is it,
that's a different parameter?---No, that's different.
A replication efficiency for each PCR replicate, again, that's
not stutter, that's a different model?---Correct.
And a degradation for each contributor. Now, what I'm putting
to you, I was concentrating - I had in mind concentrating
on the degradation issue but I might address the others
as well, but at least as at this point in your validation
paper, you're referencing reference 16, "Developing
allelic and stutter peak height models" as a basis for
the modelling of the degradation for each contributor and
that article, I'm suggesting to you, does not identify a
model for degradation as it's not addressing degradation
or the rate of degradation; do you accept that?---Yes,
that sounds fair.
Okay. You then go on to say in the validation article,
"Drop-in is optionally modelled as a gamma distribution
following" - I'm not sure how you'd pronounce
this - Puch-Solis, I'm being told. "In addition STRmix
employs a per allele stutter model, the parameters of
which are based on empirical data". One of the
references is the one I have just taken you to, what I
was calling your reference 16?---Yes.
And you also reference papers at 20 and 21, the first one being
by Messrs Brooks, Bright, Harbison and Buckleton
characterising stutter in forensic STR multiplexes and 21
is an article by Messrs Bright, Curran and Buckleton,
investigation into the performance of different models
for predicting stutter. I'm going to go to those
articles, Your Honour. I wouldn't mind a short
mid-morning break, if possible.
HER HONOUR: Yes. Is there any difficulty with the video-link
if we take a short break?
MR DESMOND: Keep it going as far as I'm concerned.
DR ROGERS: Yes, I think it would be more prudent to keep it
going rather than to stop it and start again.
HER HONOUR: All right. Dr Taylor, we are going to take a
10-minute break, because we will sit through until one
when the luncheon adjournment will take place?---Okay.
So I'm happy - we are going to keep the video-link running,
that doesn't mean you have to keep sitting there though.
My associate will be or Mr Hansen will be in touch about
resuming at 10 past 12?---Okay, thank you.
The court will resume sitting at 10 past 12.
<(THE WITNESS WITHDREW)
(Short adjournment.)
<DUNCAN ALEXANDER TAYLOR, recalled:
HER HONOUR: Mr Desmond.
MR DESMOND: Thank you, Your Honour. Just to give you the
context, doctor, of where we were at, so
this validation article, the authors say, "Profile
degradation is modelled as exponential" and you have got
references 17 and 18 there, 17 is Bright, Taylor and
Buckleton, Degradation of Forensic DNA Profiles, Australian
Journal of Forensic Science Volume 45, 2013. 18 is
Buckleton, Kelly, Bright, Taylor, Tvedebrink and
Curran, utilising allelic drop-out probabilities estimated
by logistic regression in case work, Forensic Science
International Volume 9 2014. So I will take you firstly
then to reference 17 degradation of Forensic DNA
Profiles. The abstract reading. "Selected profiles
typed at the Promega Power Plex 21, PP21 loci
were examined to determine if a linear or exponential
model best described the relationship between peak height
and molecular weight. There were fewer large departures
from observed and expected peak heights using the
exponential model, the larger differences that were
observed were exclusively at the high molecular weight
loci. We conclude that the data supports the use of an
exponential curve to model peak heights versus molecular
weight in PP21 profiles, we believe this observation will
improve our ability to model expected peak heights for
use in DNA interpretation software". The article then
commences, I will read this part of it. "Typically the
samples will be amplified using commercially manufactured
STR multiplexes that analyse many loci simultaneously".
Can you just remind me, what is meant by
multiplexes?---That means a combination of a whole - a
combination of looking at many different regions of the
DNA all within the one reaction.
So looking at many - I just didn't hear the word?---Many
different regions of the DNA all within one PCR reaction.
The sentence then continues, "With subsequent polymerase chain
reaction PCR product generated in a capillary
electrophoresis instrument. The resulting DNA profile is
an EPG, the heights or areas of the peaks within the
EPG are approximately proportionate to the amount of
undegraded template DNA. However this relationship is
affected by a number of systematic factors". You accept
that?---Yes.
A bit further on there is a paragraph that commences, "The
modelling of expected peak heights is important in the
interpretation of forensic mixtures", that's obviously
true?---Yes.
You go on to say, the authors have previously described a
series of models that can be used to calculate expected
values for allele and stutter peak heights and their SR,
that is their ratio, known shortcomings of the binary
model have led to the development of new and improved
models that factor in the probability of drop-out.
Subsequently fully continuous interpretation models have
been developed. These models take the quantitative
information from the EPG, for example, peak heights, and
use them to calculate the probability of the peak heights
given all the possible genotype combinations for the
individual contributors". That sounds accurate?---Yes.
You go on then to say, "It is important to understand how
degradation affects these models. The simplest model is
linear. That is the expected peak height declines
constantly with respect to molecular weight. This can be
demonstrated crudely by taking a paper EPG and drawing a
downward sloping straight line across the apex of the
heterozygote peaks from the lowest molecular weight
locus to the highest molecular weight locus. A linear
model has previously been suggested by the current
authors". You go on to say, "In this work we investigate
linear and exponential equations for modelling
degradation within single source Promega Power Plex 21
profiles". You agree that was the investigation, it was
for modelling degradation within single source Promega
PP21 profiles?---Yes.
It is a relatively short article, without being critical. You
then describe the method, there are some algorithms and
then there is a brief paragraph, results and conclusions,
part of which reads. "We can see that more extreme
positive departures from expectation occur at the high
molecular weight end, in approximately 350 BP" - just
remind me again, what's that, base points?---Base pairs.
Base pairs, thanks. "Approximately 350 BP and above and
extreme negative departures occur in the mid-zone. This
is expected if we force a straight line on an
exponential curve". Can you just explain that firstly,
the reference to more extreme positive departures than
expected at the high molecular weight end and the extreme
negative departures in the mid-zone, what is that
identifying?---You can imagine if some data that you are
looking at has an exponential curve and if you were to
describe an exponential curve you could imagine like a
ski slope that starts off very steep and then it flattens
out as it goes along. So now imagine that you wanted to
try to describe that curvy ski slope with a straight
line, what you would do is you would start at one end
and sort of draw a line as best you could through the
slope and what you would find is in that mid-section the
bendy curve would be below the straight line and then as
you got to the end of the curve the bendy bit would be
above the straight line because the straight line
continued downwards into the ground, that's really what
that is talking about.
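[Illustrative note, not part of the evidence: the ski slope description above can be sketched numerically. The fragment sizes, starting peak height and decay rate below are assumptions chosen only for illustration; none of them comes from the paper. The sketch fits an ordinary least-squares straight line to an exponential curve and prints the departures, that is, the curve minus the line.]

```python
import math

# Hypothetical fragment sizes (base pairs) and an assumed exponential decline in peak height.
sizes = [100, 150, 200, 250, 300, 350, 400, 450]
heights = [3000 * math.exp(-0.008 * (s - 100)) for s in sizes]


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line(sizes, heights)

# Departure = exponential ("observed") height minus the height the straight line expects.
departures = [h - (intercept + slope * s) for s, h in zip(sizes, heights)]

for s, d in zip(sizes, departures):
    print(f"{s} bp: departure {d:+.0f}")
```

[Under these assumed numbers the departures are negative through the mid-zone and positive at the largest fragment sizes, where the fitted straight line has dropped below zero, which is the pattern the witness describes.]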
Thank you. And you go on to conclude, "We conclude this
evidence supports the use of an exponential curve to
model peak heights versus molecular weight in PP21
profiles." Am I correct in understanding that the
modelling of peak height versus molecular weight that's
addressing the degradation or the rate of
degradation?---Correct.
Do you agree this article does not address an investigation
into modelling degradation within complex mixtures
sourced from PP21 profiles?---Would I agree that it
doesn't address degradation?
Does not investigate. I will read out the investigation
sentence. "In this work we investigate linear and
exponential equations for modelling degradation within
single source Promega Power Plex 21 profiles"?---Yes,
that's right, so we used single source profiles to
generate our models.
Now the other reference was reference 18, just reminding you
again, one of the authors is yourself, titled "utilising
allelic drop-out probabilities estimated by logistic
regression in case work", which is FSI Genetics Volume 9
2014 pp.9-11. I will read out part of the abstract.
"Some advanced methods for DNA interpretation require a
probability for the event of drop-out. Methods have been
suggested based on logistic regression. Two of these
respectively use proxy for template et cetera, both of
these methods allow different modelling constants from
each locus, a variant of the model using an exponential
curve is discussed, this variant constrains the constants
to be the same for every locus. We tested these two
methods and the variant by developing the constants
(training) on one set of data and testing them on
another. This mimics the likely use in case work. We
find that the new variant appears to be the most useful
in that it performs better than the other two options
when trained on one data set and used on another. The
hypothesised reason for this is that locus to locus
variation and the amplification efficiency varies with
time, multi-mix batch, or from sample to sample". So
that's the end of the extract. The reference to
multi-mix batch, I think I know what that means, perhaps
can you explain that?---I'm not exactly sure what that
refers to.
Then I withdraw my suggestion that I think I know what that
means. In any event, I'm addressing this degradation
issue and on p.2 of the document, "Experience in case
work has also suggested that there is a locus effect in
addition to a general downward slope", do you agree
that?---Yes.
"As an example, in one report 3 loci within the Identifiler,
that's the trademark, Multiplex, were preferentially
inhibited to varying extents in the presence of a
laboratory cleaning agent. Multi-mix is produced in
batches and it is conceivable that the locus balance in
one batch is different from another". Are you able on
the spot to say, well, yes, that's accurate, or do you
say, well, it must be because it is in it, but I really
need to see the article because of the multi-mix and
batch reference?---I think I am happy to agree that's
accurate. I assume Multi-mix perhaps means a Multiplex
as I was speaking about before.
"Collectively these factors suggest that loci may be above or
below the trend line", is that a reference to the
exponential curve or the straight line?---The
exponential curve.
"And whether a specific locus is above or below may change from
time to time or even sample to sample",
correct?---Correct.
"Models ignorant of such effects are likely to underperform";
correct?---Yes, by underperform, they won't have as much
ability to distinguish contributors from non-contributors
so there would be lower discrimination power as a result.
Would you say that again, they won't be able to distinguish
between contributors and non-contributors?---And
non-contributors, there will be less discrimination power
in the system.
I am just trying to get my head around, if there is a
non-contributor there would be nothing to see?---When you
compare someone who is not a contributor to a profile you
expect a likelihood ratio that favours their exclusion.
I see what you mean?---That's what I mean.
The authors then say "there are currently two published
logistic regression drop-out models". I will not
describe them for present purposes. But you go on to say,
"However, the question presents do locus effects
developed for one set of training data translate to a
future set of data? This question is more than academic,
if multiplex batches or even samples differ in locus
amplification efficiency then transportability of the
model to future profiles may be an issue, accordingly it
may be advantageous to consider a model that incorporates
the concept of degradation but does not include a locus
effect", is that accurate?---Yes.
Do you say that the model that accounts for the rate of
degradation in the current version does or does not
include locus effect?---In all versions of STRmix we have
always had a locus effect as well as a degradation model.
Can you then explain the sentences I have just read out,
"Accordingly it may be advantageous to consider a model
that incorporates the concepts of degradation but does
not include a locus effect"?---So I think what that
sentence is saying is, a locus effect may be important or
it may not be important and it is worth investigating
if those are the cases.
If I put to you what it is saying is, if Multiplex mixed
batches or even samples differ in locus amplification
efficiency then you can't transport the model to a future
profile and, therefore, the advantage would be we better
investigate to see the concept of degradation absent
locus effect because if we can do that that would be the
favoured route to take?---I can give you some sort of
theory here. What you are trying to do when you are
analysing a DNA profile or describing a DNA profile is to
do so with as few variables as possible because that's
going to be the most powerful system, so the offset, or
what is playing against that is the model has to
reasonably describe the DNA profile it is seeing, so if
you cut out too many variables it is just going to fail
to explain the profile you are seeing. So it is always
favourable to have as few variables as possible, but you
have to obviously increase those until you get to the
point where you are explaining your profiles, so whilst
it might be in a general way it might be favourable to a
simpler system, it is to a point.
The results section of the paper says, "We use the mean log
(likelihood) per allele to score each of the models. The
model which yields the highest mean log (likelihood) is
regarded as a preferred model. Therefore, if on average
one model yields higher log (likelihood) values than the
other models then we would regard this as evidence of
superior performance", and the results are given in what
are described as tables one and tables two. Do you agree
with that?---Yes.
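[Illustrative note, not part of the evidence: the scoring rule read out above can be sketched as follows. The model names and probabilities are invented for illustration and are not the paper's data. Each list holds the probability a candidate model assigned to what was actually observed for each allele in a test set; the model with the highest mean log likelihood per allele is preferred.]

```python
import math


def mean_log_likelihood(probabilities):
    """Mean natural-log likelihood per allele for one model."""
    return sum(math.log(p) for p in probabilities) / len(probabilities)


# Hypothetical probabilities that each model gave to the observed outcome per allele.
model_probabilities = {
    "model_A": [0.90, 0.80, 0.70, 0.95],
    "model_B": [0.60, 0.90, 0.50, 0.80],
}

scores = {name: mean_log_likelihood(ps) for name, ps in model_probabilities.items()}

# The preferred model is the one with the highest (least negative) mean log likelihood.
preferred = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: mean log likelihood {score:.3f}")
print("preferred:", preferred)
```

[Scoring on a separate test set rather than the training set, as the abstract describes, is what lets the comparison speak to how a model would carry over to future profiles.]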
"For the Power Plex 21 data T with a little 1 regularly gave
the highest mean (likelihood) in the training and test
sets, refers to table one, we interpret this as meaning
that pristine source data is too good to show the
expected degradation effect and, therefore, not suitable
to train these logistic models". Is that
accurate?---Yes.
Is that meaning degradation rates should not be calculated or
modelled based on samples taken from say scientists at
FSL as a data set, because they are pristine ones?---I
think what this paper is talking about, when you are
talking about the logistic models in the paper, they are
talking about a model for probability of drop-out, not
degradation necessarily. It is a drop-out probability.
That will shorten the exercise. I will just read the next
sentence. "However, we conclude that further development
is required in the application of locus specific effects
and it is likely that these locus effects will vary from
profile to profile or time to time". Then there are the
acknowledgements. So I think from your previous answer
you are agreeing this paper is not an authority or a
reference paper for a model for the rate of
degradation?---It wouldn't be the main focus for the
paper, no, it is more of a drop-out focus.
I am just a bit concerned about the caveat, because the point
I'm trying to address is, in your validation paper to
accommodate the SWGDAM 2015 guidelines for probabilistic
genotyping you are quoting the profile degradation is
modelled as exponential and the references are those,
this one I have just taken you to, which I'm suggesting
is not an authority, doesn't provide a model for profile
degradation, nor indeed did the previous one which was
reference 17, which I have now got to find again, because
that only investigated single source PP21
profiles?---Well, the model we produced for degradation
from the single source samples is applied to mixtures.
Now, do you have any peer reviewed article supporting the
proposition that it is appropriate to apply the model
that you developed for the investigation into single
source PP21 profiles to complex mixtures that have been
amplified with PP21?---We have papers that look at the
application of some of the models within STRmix between
single source and mixed profiles and found that they are
applicable.
Can you identify for me now if possible the relevant papers or
alternatively will you undertake to identify those papers
in due course and provide them to the Crown so they can
be made available to the defence?---I can do that. One
of the papers in particular is called something along the
lines of factors affecting peak height variability in STR
data, but I will have to track it down.
Are you one of the joint authors of that?---Yes.
Factors affecting peak height variability in?---In STR data, I
think.
I understand. Returning then to the validation article. You
go on to say, I think I read this out previously, "Drop in
is optionally modelled as gamma distribution following
Puch-Solis. In addition STRmix employs a per allele
stutter model, the parameters of which are based on
empirical data", and you reference for that sentence
articles, references 16, which is the one entitled
developing allelic and stutter peak height models, 20 is the
Brooks Bright Harbison Buckleton characterising stutter
in forensic STR Multiplexes and the last one, which I
have just taken you to, investigation into the
performance of different models for predicting stutter.
I may not have just taken you to it but I will take you to
that. So if we look then at the first reference, 16,
which is the developing allelic and stutter peak height
models for a continuous method of DNA interpretation. We
are looking for a stutter model, allele stutter model,
the parameters of which are based on empirical
data?---Okay.
Within this article, I am obviously trying to avoid reading out
an entire article, but the first reference to a
definition for stutter ratio follows on from the
paragraph "loci where the alleles were separated by one
repeat were disregarded because stutter is likely to
interfere with the allele height of the low molecular
weight allele in an additive manner. These have
previously been referred to as stutter affected
heterozygotes. In total 2,323 heterozygote loci were
identified as being suitable for analysis. Stutter ratio
was defined as SR equals O little (a) minus 1 over O
little (a), where O little (a) minus 1 refers to the
observed height of the stutter peak and O little (a) to
that of the parent peak". Now that
definition, that's not the model, is that right?---That's
just an equation for determining stutter ratio.
I am trying to identify the model in this paper that was
sourced from empirical data. Are you able to describe
the model and I will search for it in the paper, or we
can - - - ?---At some point.
Under the paragraph 3.1 "stutter" you have got "the following
linear model was proposed to describe the relationship
between SR and the explanatory variables LUS and locus
L"?---That's the model.
It then reads "SR to the little i equals Beta to the little O,
L plus Beta to the, looks like 1.1, or 1.L, LUS to the
little i" and the sentence after is, "This was termed the
stutter model, linear modelling of stutter has been
reported previously" and then there are two references,
22 and 23. So that's the stutter model?---That's the
stutter model.
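[Illustrative note, not part of the evidence: the two formulas read out above, the stutter ratio definition and the linear stutter model, can be sketched as follows. The locus names, coefficients and peak heights are hypothetical; actual values would come from a laboratory's own empirical data and are not given in the transcript.]

```python
def observed_stutter_ratio(stutter_height, parent_height):
    """SR = O_(a-1) / O_a: observed stutter peak height over parent peak height."""
    return stutter_height / parent_height


def expected_stutter_ratio(beta0, beta1, lus):
    """Linear stutter model as read out: SR_i = beta_(0,l) + beta_(1,l) * LUS_i."""
    return beta0 + beta1 * lus


# Hypothetical per-locus coefficients (intercept, slope); not real validation values.
LOCUS_COEFFICIENTS = {
    "LOCUS_A": (-0.02, 0.008),
    "LOCUS_B": (0.01, 0.005),
}

beta0, beta1 = LOCUS_COEFFICIENTS["LOCUS_A"]

# Expected ratio for an allele whose longest uninterrupted sequence (LUS) is 12 repeats.
expected = expected_stutter_ratio(beta0, beta1, lus=12)

# Observed ratio for a hypothetical stutter peak of 80 beside a parent peak of 1000.
observed = observed_stutter_ratio(stutter_height=80, parent_height=1000)

print(f"expected SR {expected:.3f}, observed SR {observed:.3f}")
```

[In this sketch the expected ratio depends only on the locus coefficients and the allele's LUS, so the same locus and allele always give the same expected value.]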
And do you rely upon where you said in the validation article
"STRmix employs a per allele stutter model,
the parameters of which are based on empirical data". Do
you rely upon those references 22 and 23, I can read out
the titles to them if you want me to, in this paper as
authority for the proposition that stutter model is a per
allele stutter model based on empirical data?---Yes,
yes.
In short, if you can, can you explain, is that model
calculating a stutter ratio at each marker, at each peak
or at each genotype, what?---The model itself there has
parameters which are dependent upon the marker or upon
the locus and upon the longest uninterrupted sequence
within that locus. That longest uninterrupted
sequence is an allele specific property so that model is
locus specific, but it is also sort of allele specific
because longest uninterrupted sequences are allele
specific.
But does it in the end provide, and I am just pulling a figure
from mid-air at the moment, does it give you a figure per
locus that if, for example, there is a greater than 30
per cent ratio difference between what seems to be the
peak height and the sister peak, then that sister peak,
if it is greater than the 30 per cent is stutter, is that
what it is doing?---No, because all these results are
based on single source samples of known origin, we know
when the peak is a stutter or is an allele, this model
just provides the expected stutter ratio for each allele
at each locus, it does not talk about peak height
variability of stutters.
But it is giving a per cent type figure at each locus that
STRmix uses in deciding whether, how much weight it gives
to a particular peak as being allelic or not?---Yes,
that's, it uses that information that is part of the
information that STRmix uses in its assessment of the
profile.
Is that a constant, it would seem to me that must be a
constant, not a constant per marker, but if an analysis
was done today by STRmix and an analysis was done in two
weeks' time by STRmix this appears to be a constant at a
given locus for any given profile analysis?---The
expected stutter ratio is a constant per locus and per
allele.
Where is that published, or is that information published as to
that the expected stutter - what's the phrase I'm looking
for, stutter ratio?---Expected stutter ratios.
Expected stutter ratio per locus, where would I find that, for
further reference?---That sort of information would be
present in individual labs' validation reports in
preparation for STRmix.
So that's an internal validation phase in this case?---It is
not the sort of, yes, it is not the sort of information
that journals are interested in publishing.
Is that - you may have answered this I can't recall. Is that
novel or new with subsequent versions or that's always
been the case that it is a per locus stutter ratio that
is inputted?---There has always been a per locus stutter
ratio, but from version 2.3 onwards it was also
incorporating the longest uninterrupted sequence rather
than just an allelic designation so there was an
improvement in the model when we went to version 2.3.
Your next substantive paragraph in the validation article is
addressing guideline 3.2 of the SWGDAM guidelines,
sensitivity and specificity studies. You recall that
guideline 3.2?---Yes.
"With respect to interpretation methods sensitivity is defined
as the ability of the software to reliably resolve the
DNA profile of known contributors within a mixed DNA
profile for a range of starting DNA template", that's the
objective?---Yes.
Within the paragraphs addressing this issue there is a
paragraph that commences with, "Sensitivity and
specificity studies, however, have a scientific component
to them and it may be desirable to use the best estimate
available for these. If these studies are used to
formulate decisions such as assigning terms to a verbal
scale then it should be noted that they refer to the
point estimate and not the lower bound. This has an
additional and possibly undesirable consequence that if
the verbal scale is calibrated from the sensitivity and
specificity plots and then this scale is applied to
the lower bound the scale itself now possesses an element
of conservativeness". Were you able to take that in and
explain to me what you are addressing there where you say
this has an additional and possibly undesirable
consequence?---Yes. When we are carrying out these
specificity and sensitivity tests it involves generating
a large number of likelihood ratios using STRmix for
contributors and for non-contributors and we have a look
at those likelihood ratios for DNA samples over a range
of input DNA amount. If you were to generate those
sensitivity and specificity plots, which sort of graph
these likelihood ratios over DNA amount, there are
different values you could choose to graph, one would be
the likelihood ratio point estimate, which is what we
have used in all of our studies, and another would be,
that lower bound HPD interval that we spoke about just
before the break, so we always choose the point estimate
as the value to graph and what that sentence or couple of
sentences that you just read out is saying is that if you
create the sensitivity plots using the point estimate and
then you use those sensitivity plots to come up with a
verbal scale that you are going to report in court as to
how much or what level of support the results give to
particular inclusions or exclusion, but then you were to
actually report the lower bound likelihood ratio in your
case work it means you have generated your verbal
equivalent scale on the point estimates and then applying
it to the lower bound interval instead, so there is a
disconnect. It is basically a warning to say not to do
that.
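[Illustrative note, not part of the evidence: the disconnect the witness warns about can be sketched with a made-up verbal scale. The thresholds, wording and likelihood ratio values below are hypothetical and are not any laboratory's actual scale. The sketch assumes the scale was calibrated on point estimates and shows the same wording being looked up for a point estimate and for a smaller lower bound value from the same analysis.]

```python
import math

# Hypothetical verbal scale: (minimum log10 LR, wording), assumed calibrated on point estimates.
VERBAL_SCALE = [
    (6, "extremely strong support"),
    (4, "very strong support"),
    (2, "strong support"),
    (0, "slight support"),
]


def verbal_term(likelihood_ratio):
    """Look up the wording for a likelihood ratio on the hypothetical scale."""
    if likelihood_ratio <= 1:
        return "neutral or supports exclusion"
    log_lr = math.log10(likelihood_ratio)
    for minimum, term in VERBAL_SCALE:
        if log_lr >= minimum:
            return term
    return "slight support"


# Hypothetical outputs for one comparison: the point estimate and a smaller lower bound.
point_estimate = 5e6
lower_bound = 8e5

print("point estimate:", verbal_term(point_estimate))
print("lower bound:   ", verbal_term(lower_bound))
```

[With these invented numbers the lower bound falls into a weaker category than the point estimate the scale was built on, so reporting it against that scale adds a further layer of conservativeness.]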
Shortly after that paragraph that I read there is a paragraph
that commences "our preferred procedure when using STRmix
is that the analyst assesses whether a person of interest
is excluded prior to either their assessment of the
results of software calculations or interpretation of the
profile using the software at all. Following this
procedure STRmix is being continually checked against
human expectations and hence is being continually
validated"; is that correct?---That's reasonable, yes.
I'm concentrating more on the first sentence within the
paragraph, as I read it, it is you the developer's
preferred procedure that the biologist actually assesses
the person of interest's profile first before employing
STRmix?---Or before looking at the STRmix results.
You go on to say - - - ?---I was just going to elaborate on
that. If you form your, what you believe to be the case,
so someone being included in a mixture or excluded in a
mixture or whether it is difficult to say first, then you
can objectively assess the STRmix output in the
likelihood ratio, which is why we always suggest you form
your opinions before looking at the STRmix results rather
than looking at the STRmix results, because the risk is
if you look at the STRmix results and STRmix says or
gives a likelihood ratio that supports their inclusion
you may then be biased to go oh, yes, well, I would have
chosen inclusion anyway. So does that make sense?
It does, but I would have thought the bias argument would more
exist with your preferred method on the basis there is a
subjective assessment that I have got a match here
potentially with one of the components of this profile
that I have not yet objectively in a sense independently
analysed with STRmix, what would you say to that?---I
think it is always important to generate your
interpretations before looking at the STRmix results, I
think that's - - -
Do you know if that is done in Victoria by the way, if that
preferred procedure, have you made that specific
recommendation to FSL in Victoria?---I'm not aware of the
specific practices there, no.
Have you made, have you published to FSL in Victoria that's the
preferred practice by you, the developer?---Well, it is
published in that paper you just read out.
I understand that, they may not have the paper, they should
have, is it in the manual, is it the verbal
communications, "look, this is the way you go about it,
up to you ultimately, I am not your supervisor", because
they have training, I would anticipate you probably
either established or participated in establishing the
core product of the training program; is that right?---I
would have, yes, yes, I would have to check the manuals,
the STRmix manuals to see what it says about the way that
these things were gone about.
But if it is not in the manual you would expect it would be
part of the training component for a biologist?---Yes,
and I believe in the training that we give there is a
component of STRmix implementation where we go through
those sorts of things.
Moving on to a slightly different issue, but still within the
paragraphs sensitivity and specificity studies. You say,
to highlight the matter, having said the LR is an
assessment of the weight of evidence, you raise this
issue, "to highlight the matter, consider that we make up
a DNA mixture and hence we know the donors. Consider
that this mixture is made from Smith and Brown. If we
test the proposition that it contains Smith we expect a
high LR, suppose the LR is a billion, is this correct?
It is larger than one and as such that part is correct,
but is a billion too large or too small or just right?
The problem is that we do not have the 'true answer' and
this cannot be obtained by any method", that's
correct?---Correct.
You then deal with the issue of false exclusions and say, "A
false exclusion occurs when" bullet point (i) "the PCR
reaction runs sufficiently poorly that the peak or
stutter heights give misleading information or" bullet
point (ii) "a non-contributor is assumed to be present
or" bullet point (iii) "there is an operator error
notably inclusion of an artefact in the peak information
used by STRmix at interpretation. An artifactual peak
that has been retained within the input file will become
part of the information used by STRmix to build genotype
combinations. This will result in genotype combinations
.DF:DM:CAT 20/04/17 SC 11A 174 TAYLOR XN XXNTuite
containing the artefact which will not align with the
true genotypes of contributors to the profile. If the
POI aligns with one of these altered (false) genotypes
this might result in a false exclusion". You'd stand by
that?---Yes.
"There are a number of factors within STRmix under the control
of the operator or the lab that affect errors. Most
significant are the two variance terms",
correct?---Yes.
"If these are set too low they increase false exclusions, set
too high they increase false inclusions"; correct?
---Correct.
"These variances are set during the lab's internal validation
by modelling the observed variation in allelic and
stutter peak heights within a set of single source
profiles of varying quality. There are a number of
diagnostics output by STRmix that allow a human check of
the results including the genotypic weights" - then
there's an equation - "the posterior mean of the variance
terms and summary statistics of the MCMC (discussed
later)." Now, I'll just read out the sentence again.
"The variances are set during the lab's internal
validation by modelling the observed variation in allelic
and stutter peak heights within a set of single
source profiles of varying quality". You then give a
reference of yourself, Buckleton and Bright, "Factors
affecting peak height variability for STR repeat data",
FSI Genetics 21 in 2016?---Just before you go on, that's
the paper that I was referring to earlier.
Yes?---That compares mixture and single source profiles.
Thanks for that. But this sentence doesn't refer to mixed
.DF:DM:CAT 20/04/17 SC 11A 175 TAYLOR XN XXNTuite
source profiles. I'll read the sentence again. "These
variances are set during a lab's internal validation by
modelling the observed variation in allelic and stutter
peak heights within a set of single source profiles of
varying quality." I'm inferring the variances are not
set during a lab's internal validation by modelling
observed variation in allelic and stutter peak heights
from a set of complex mixture profiles?---That's right,
so a number of properties of DNA profiles including the
peak height variability and including stutter ratios and
locus amplification efficiency are all set using single
source profiles and there's no reason to believe that any
of those factors would be different if there is more than
one person's DNA in a DNA extract - - -
Okay. Do you have any- - -?---And any tests that we have done
that look at that have indicated there's no difference
between single source and mixed profiles.
Well, can you identify either now or in the fullness of time
any peer reviewed literature supporting that
statement?---I will do.
Thank you.
HER HONOUR: Is that a convenient time, Mr Desmond?
MR DESMOND: Yes, thank you, Your Honour.
HER HONOUR: We will break for lunch until 2.15. Adjourn the
court, please.
<(THE WITNESS WITHDREW)
.DF:DM:CAT 20/04/17 SC 11A 176 TAYLOR XN XXNTuite
LUNCHEON ADJOURNMENT
.DF:DM:CAT 20/04/17 SC 11A 177 DISCUSSIONTuite
WITNESS AND EXHIBIT LIST: PAGE:
DISCUSSION 119
DUNCAN ALEXANDER TAYLOR, SWORN AND EXAMINED 120
EXHIBIT B - - STATEMENT OF DR DUNCAN TAYLOR DATED 17/4/2015. 120
EXHIBIT C - - STATEMENT OF DR DUNCAN TAYLOR DATED 15/8/2016. 120
CROSS-EXAMINED BY MR DESMOND 120
THE WITNESS WITHDREW 157
DUNCAN ALEXANDER TAYLOR, RECALLED 157
THE WITNESS WITHDREW 176
LUNCHEON ADJOURNMENT 177
DISCUSSION 177