New Timo Berthold - Zuse Institute Berlin · 2020. 9. 7. · © 2019 Fair Isaac Corporation....
Transcript of New Timo Berthold - Zuse Institute Berlin · 2020. 9. 7. · © 2019 Fair Isaac Corporation....
![Page 1: New Timo Berthold - Zuse Institute Berlin · 2020. 9. 7. · © 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced](https://reader034.fdocuments.us/reader034/viewer/2022051815/603fdee10ab3a2434257a327/html5/thumbnails/1.jpg)
© 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced or shared without Fair Isaac Corporation’s express consent. 1
© 2019 Fair Isaac Corporation. Confidential.
This presentation is provided for the recipient only and cannot be reproduced or shared without Fair Isaac Corporation’s express consent.
Timo Berthold
![Page 2: New Timo Berthold - Zuse Institute Berlin · 2020. 9. 7. · © 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced](https://reader034.fdocuments.us/reader034/viewer/2022051815/603fdee10ab3a2434257a327/html5/thumbnails/2.jpg)
© 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced or shared without Fair Isaac Corporation’s express consent. 2
© 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced or shared without Fair Isaac Corporation’s express consent. 2
• Performance variability
• Statistical tests
• Best practices
![Page 3: New Timo Berthold - Zuse Institute Berlin · 2020. 9. 7. · © 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced](https://reader034.fdocuments.us/reader034/viewer/2022051815/603fdee10ab3a2434257a327/html5/thumbnails/3.jpg)
© 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced or shared without Fair Isaac Corporation’s express consent. 3
0,25
0,5
1
2
4
8
16
32
64ra
il50
7n
s12
08
40
0g
mu
-35
-40
sa
telli
tes
1-2
5e
ilB1
01
ran
16
x16
cs
ch
ed
01
0s
p9
8ic
roll3
00
0m
ik-2
50
-1-1
00
-1n
eo
s1
3a
flo
w4
0b
no
sw
ot
ne
wd
an
om
ine
-90
-10
ac
c-t
igh
t5n
eo
s-1
10
98
24
ne
os
-84
97
02
da
no
int
ne
os
-16
01
93
6u
nit
ca
l_7
tim
tab
1b
na
tt3
50
pg
5_3
4iis
-bu
pa
-co
vn
eo
s1
8n
s18
30
65
3b
ley_
xl1
ms
pp
16
n3
seq
24
ma
cro
ph
ag
erm
ine
6m
sc
98
-ip
iis-1
00
-0-c
ov
op
m2
-z7
-s2
bie
ns
t2b
inka
r10
_1c
ov1
07
5d
fn-g
win
-UU
Mle
cts
ch
ed
-4-o
bj
ns1
68
83
47
ns1
76
60
74
pig
eo
n-1
0q
iurm
atr
10
0-p
5n
3d
iv3
6n
eo
s-4
76
28
3m
ap
18
pw
-myc
iel4
roc
II-4
-11
ma
p2
0n
et1
2n
eo
s-9
34
27
8rm
atr
10
0-p
10
ns1
75
89
13
min
e-1
66
-5iis
-pim
a-c
ov
ap
p1
-2n
etd
ive
rsio
nm
cs
ch
ed
ne
os
-13
96
12
5a
sh6
08
gp
ia-3
co
lm
zzv1
1zi
b5
4-U
UE
be
as
leyC
3re
blo
ck6
7b
iella
1e
il33
-2n
eo
s-9
16
79
2ro
co
co
C1
0-0
01
00
0n
4-3
sp
98
irtr
ipti
m1
tan
gle
gra
m2
gla
ss4
co
re2
53
6-6
91
ex9
ne
os
-68
61
90
air
04
ba
b5
ne
os
-13
37
30
7vp
ph
ard
en
ligh
t13
en
ligh
t14
m1
00
n5
00
k4r1
tan
gle
gra
m1
30
n2
0b
8
Single solver version-to-version improvement: appx. 30%. HOWEVER:
• ~40% of the instances got slower (~50% got faster)
• Performance drop by up to a factor of 4 (versus improvements of up to factor 45)
Picture courtesy of Thorsten Koch.
Test set: MIPLIB2010 Benchmark (87 instances)
![Page 4: New Timo Berthold - Zuse Institute Berlin · 2020. 9. 7. · © 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced](https://reader034.fdocuments.us/reader034/viewer/2022051815/603fdee10ab3a2434257a327/html5/thumbnails/4.jpg)
© 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced or shared without Fair Isaac Corporation’s express consent. 4
![Page 5: New Timo Berthold - Zuse Institute Berlin · 2020. 9. 7. · © 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced](https://reader034.fdocuments.us/reader034/viewer/2022051815/603fdee10ab3a2434257a327/html5/thumbnails/5.jpg)
© 2018 Fair Isaac Corporation. Confidential. 5
• Performance variability is a change in performance by seemingly performance-neutral changes, like changing the
• Input format
• Essentially, this leads to a permutation of the problem
• Operating system (Does NOT occur with Xpress)
• Different number of threads (can be overcome by control setting in Xpress)
• This occurs even though the underlying software is deterministic
• Reasons are imperfect tie-breaking, numerical round-off differences on different platforms…
• Related: Adding redundant information (E.g., a constraint σ𝑖=1𝑛 𝑥𝑖 ≤ 𝑛, 𝑥𝑖 ∈ {0,1})
![Page 6: New Timo Berthold - Zuse Institute Berlin · 2020. 9. 7. · © 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced](https://reader034.fdocuments.us/reader034/viewer/2022051815/603fdee10ab3a2434257a327/html5/thumbnails/6.jpg)
© 2018 Fair Isaac Corporation. Confidential. 6
• Performance variability
• Can indicate improvements where there are none
• Can shadow improvements
• Can lead to wrong conclusions (“on some instances our method was bad, but on some, it was good”)
• Running Xpress with ten different random permutations:
• 118/240 instances have a variability of more than a factor of 2
• For 30, it is more than a factor of 10
• Most extreme case: 0.02 seconds versus timeout
• For large test sets, the effect averages out
• Still 4% difference between „best“ and „worst“ permutation on MIPLIB2017
![Page 7: New Timo Berthold - Zuse Institute Berlin · 2020. 9. 7. · © 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced](https://reader034.fdocuments.us/reader034/viewer/2022051815/603fdee10ab3a2434257a327/html5/thumbnails/7.jpg)
© 2018 Fair Isaac Corporation. Confidential. 7
• Performance variability can be utilized to
• Distinguish between structural improvements and random noise
• The more permutations/seeds you run, the clearer the picture for each single instance
• Mimic performance on unknown instances from same application
• Very, VERY often, customers give you one single example
• Performance variability gives you a poor man’s parallelization:
• Fire off X permutations of the problem, stop when the first one solves
• Doesn’t scale very well ;-)
![Page 8: New Timo Berthold - Zuse Institute Berlin · 2020. 9. 7. · © 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced](https://reader034.fdocuments.us/reader034/viewer/2022051815/603fdee10ab3a2434257a327/html5/thumbnails/8.jpg)
© 2018 Fair Isaac Corporation. Confidential. 8
• Performance variability for benchmarking is typically induced by
• Permuting the matrix: High variability 😊 but structural performance “loss” 😒
• Also: more cache misses
• Random seeds: Lower variability 😒 but preserves average performance 😊
• New method: Cyclic shifts
• Light-weight permutation that shifts the rows and columns
• Only two „breakpoints“
• High variability 😊 and preserves average performance😊
• Can be combined with random seeds
• As a solver developer, after each release:
• We change the benchmarking permutation seeds
• The offset to the random seed
![Page 9: New Timo Berthold - Zuse Institute Berlin · 2020. 9. 7. · © 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced](https://reader034.fdocuments.us/reader034/viewer/2022051815/603fdee10ab3a2434257a327/html5/thumbnails/9.jpg)
© 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced or shared without Fair Isaac Corporation’s express consent. 9
![Page 10: New Timo Berthold - Zuse Institute Berlin · 2020. 9. 7. · © 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced](https://reader034.fdocuments.us/reader034/viewer/2022051815/603fdee10ab3a2434257a327/html5/thumbnails/10.jpg)
© 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced or shared without Fair Isaac Corporation’s express consent. 10
• To test a hypothesis, we compare it to the null-hypothesis: “There is no such relationship”
• When the null-hypothesis can be rejected with high probability, the result is deemed significant
• Essentially, this expresses the confidence in the result. How safe is it to draw conclusions from the test results?
• “How likely is it that a random draw would have delivered the same (or an even more extreme) result?”
• Typically, p-values less than 5% are considered significant.
• For a normal distribution, 95% are within mean +/- two standard deviations
• There is a huge variety of statistical tests, require different assumptions, e.g., concerning the distribution. In our case typically: distribution-free (non-parametric)
• Distinguish between nominal data (yes/no) and rational data
![Page 11: New Timo Berthold - Zuse Institute Berlin · 2020. 9. 7. · © 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced](https://reader034.fdocuments.us/reader034/viewer/2022051815/603fdee10ab3a2434257a327/html5/thumbnails/11.jpg)
© 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced or shared without Fair Isaac Corporation’s express consent. 11
• Applied to 2x2 contigency table of paired nominal data
• E.g., does the solver find a solution with/without a certain setting (yes/no, yes/no)
• Considers the cases where both differ (the counter diagonal of the table)
• Compute 𝜒2 =𝑏−𝑐 2
𝑏+𝑐
• For random drawed 𝑏, 𝑐, this would follow a chi-squared-distribution.
• Lookup p-value
• Well-suited for categorical data (found a solution yes/no, solved to optimality yes/no)
![Page 12: New Timo Berthold - Zuse Institute Berlin · 2020. 9. 7. · © 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced](https://reader034.fdocuments.us/reader034/viewer/2022051815/603fdee10ab3a2434257a327/html5/thumbnails/12.jpg)
© 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced or shared without Fair Isaac Corporation’s express consent. 12
• Non-parametric difference test for paired rational data
• Sorts observations by absolute value of the logarithm of their ratio
• Assigns ranks from 1 to n
• Split into positive and negative group, compute rank sums
• The more different those sums, the more likely that the setting with the larger sum outperforms the one with the lower sum
• Wilcoxon statistic: 𝑧 =min 𝑊−,𝑊+ −
𝑁(𝑁+1)
4
𝑁(𝑁+1)(2𝑁+1)
24
• Would follow a normal distribution for randomly drawn data
• Well-suited for numerical data (runtime, number of nodes, PDI)
![Page 13: New Timo Berthold - Zuse Institute Berlin · 2020. 9. 7. · © 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced](https://reader034.fdocuments.us/reader034/viewer/2022051815/603fdee10ab3a2434257a327/html5/thumbnails/13.jpg)
© 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced or shared without Fair Isaac Corporation’s express consent. 13
![Page 14: New Timo Berthold - Zuse Institute Berlin · 2020. 9. 7. · © 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced](https://reader034.fdocuments.us/reader034/viewer/2022051815/603fdee10ab3a2434257a327/html5/thumbnails/14.jpg)
© 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced or shared without Fair Isaac Corporation’s express consent. 14
• Choose a suitable test set
• Are there standard test sets in your field?
• Use large test sets
• Use diverse test sets (within your target domain)
• Use „real“ instances if possible, not randomly generated
• Be careful, what to exclude and how to split test sets
• NEVER EVER do „easy“ vs. „hard“ only w.r.t. one „baseline“ setting
• Some measures might only make sense on „all optimal“ or „all timeout“
• Report number of solved instances
• Explain why certain instances had to be excluded and how many were affected
![Page 15: New Timo Berthold - Zuse Institute Berlin · 2020. 9. 7. · © 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced](https://reader034.fdocuments.us/reader034/viewer/2022051815/603fdee10ab3a2434257a327/html5/thumbnails/15.jpg)
© 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced or shared without Fair Isaac Corporation’s express consent. 15
• Choose a suitable test set
• Are there standard test sets in your field?
• Use large test sets
• Use diverse test sets (within your target domain)
• Use „real“ instances if possible, not randomly generated
• Be careful, what to exclude and how to split test sets
• NEVER EVER do „easy“ vs. „hard“ only w.r.t. one „baseline“ setting
• Some measures might only make sense on „all optimal“ or „all timeout“
• Report number of solved instances
• Explain why certain instances had to be excluded and how many were affected
![Page 16: New Timo Berthold - Zuse Institute Berlin · 2020. 9. 7. · © 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced](https://reader034.fdocuments.us/reader034/viewer/2022051815/603fdee10ab3a2434257a327/html5/thumbnails/16.jpg)
© 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced or shared without Fair Isaac Corporation’s express consent. 16
• Report machine specification, software versions and working limits used
• Make fair comparisons
• What is the state-of-the-art?
• Not only software, but also models
• Does the benchmark solver have the same purpose (e.g., heuristics vs exact solvers)
![Page 17: New Timo Berthold - Zuse Institute Berlin · 2020. 9. 7. · © 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced](https://reader034.fdocuments.us/reader034/viewer/2022051815/603fdee10ab3a2434257a327/html5/thumbnails/17.jpg)
© 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced or shared without Fair Isaac Corporation’s express consent. 17
• State the question(s) that you want to answer and the means of doing so
• @Reviewers: This might not be the same question and the same methodology that you would have chosen....
• Run on otherwise idle machines
• Run at most one job per CPU (and bind to CPU)
• Never trust your own results
• Investigate outliers, negative and positive ones
• Use methods like permutations, statistical tests etc to double-check your results
• Use diverse measures
• Also report negative results
![Page 18: New Timo Berthold - Zuse Institute Berlin · 2020. 9. 7. · © 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced](https://reader034.fdocuments.us/reader034/viewer/2022051815/603fdee10ab3a2434257a327/html5/thumbnails/18.jpg)
© 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced or shared without Fair Isaac Corporation’s express consent. 18
![Page 19: New Timo Berthold - Zuse Institute Berlin · 2020. 9. 7. · © 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced](https://reader034.fdocuments.us/reader034/viewer/2022051815/603fdee10ab3a2434257a327/html5/thumbnails/19.jpg)
© 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced or shared without Fair Isaac Corporation’s express consent. 19
• Performance variability describes
a) Large runtime differences caused by seemingly neutral changes
b) A method to tell whether a result is significant
c) A performance measure
• What is NOT a best practice for computational experiments?
a) Use diverse machine setups
b) Use diverse performance measures
c) Run on otherwise idle machines
• When comparing runtimes, the significance should be checked by
a) A McNemar test
b) A Cyclic shift test
c) A Wilcoxon signed rank test
![Page 20: New Timo Berthold - Zuse Institute Berlin · 2020. 9. 7. · © 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced](https://reader034.fdocuments.us/reader034/viewer/2022051815/603fdee10ab3a2434257a327/html5/thumbnails/20.jpg)
© 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced or shared without Fair Isaac Corporation’s express consent. 20
• Performance variability describes
a) Large runtime differences caused by seemingly neutral changes
b) A method to tell whether a result is significant
c) A performance measure
• What is NOT a best practice for computational experiments?
a) Use diverse machine setups
b) Use diverse performance measures
c) Run on otherwise idle machines
• When comparing runtimes, the significance should be checked by
a) A McNemar test
b) A Cyclic shift test
c) A Wilcoxon signed rank test
![Page 21: New Timo Berthold - Zuse Institute Berlin · 2020. 9. 7. · © 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced](https://reader034.fdocuments.us/reader034/viewer/2022051815/603fdee10ab3a2434257a327/html5/thumbnails/21.jpg)
© 2019 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced or shared without Fair Isaac Corporation’s express consent. 21
© 2019 Fair Isaac Corporation. Confidential.
This presentation is provided for the recipient only and cannot be reproduced or shared without Fair Isaac Corporation’s express consent.
Timo Berthold