Slides: hubicka/slides/opensuse2018-e.pdf · 2018. 5. 27.
Jan Hubička and Martin Liška, SUSE Labs
[email protected], [email protected]
Building openSUSE with link-time optimizations
Outline
● What is link-time optimization?
● Link-time optimization and GCC
● Benchmarks
● Can we build openSUSE with link-time optimization by default?
What is link-time optimization?
Per-file compilation
File1.c … File4.c → GCC (one run per file) → File1.o … File4.o → ld / gold → a.out
Link-time optimization
File1.c … File4.c → GCC (one run per file) → File1.o … File4.o (each carrying IL) → ld / gold with the LTO plug-in, which invokes the link-time compiler → a.out
Benefits of LTO
● Symbol promotion (from the linker’s resolution data most symbols become “static”)
● Cross-module inlining, constant propagation
● Aggressive unreachable code removal
● Profile propagation
● EH optimization (propagation of “nothrow”)
● Identical code folding
● Optimized code layout
● and more :)
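A minimal sketch of the kind of code that benefits from symbol promotion and cross-module inlining. The file names util.c and main.c and both function names are hypothetical; the two functions are shown in one block only so that the example compiles on its own.

```c
/* Sketch only: in a real project these two functions would live in
   separate source files (say util.c and main.c, both hypothetical). */

/* util.c: externally visible helper. In per-file compilation the compiler
   has to keep it as an exported, callable function, because it cannot
   see who uses it. */
int scale(int x, int factor)
{
    return x * factor;
}

/* main.c: the only caller. With -flto the link-time compiler sees both
   bodies, so it may inline scale(), propagate the constant factor 3,
   and treat scale() as local ("static") if nothing else references it. */
int scaled_sum(const int *values, int count)
{
    int sum = 0;
    for (int i = 0; i < count; i++)
        sum += scale(values[i], 3);
    return sum;
}
```

Whether the compiler actually performs these transformations depends on its heuristics; the sketch only shows where the opportunity comes from once both files are visible at link time.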
Problems of LTO
● Whole toolchain has to be restructured
● Slow compile-edit cycle
● Harder bug reports (often the whole program is needed to reproduce an issue)
● Not 100% transparent to the user, but in most cases all one needs to do is add -flto
Link-time optimization and GCC
Modernizing GCC (LTO perspective)
● 1999 (GCC 2.95): Function-at-a-time
● 2001 (GCC 3.0): New inliner (first high-level opt. in GCC)
● 2004 (GCC 3.4): Unit-at-a-time; intermodule compilation for C; inter-procedural optimization framework
● 2005 (GCC 4.0): New SSA optimization framework
● 2006 (GCC 4.1): Inter-procedural optimizations: profile-guided inlining, pure/const discovery, mod/ref, inter-procedural constant propagation, -fwhole-program --combine
● 2008 (GCC 4.4): Inter-procedural optimization on SSA; early optimization and inlining
● 2010 (GCC 4.5): Basic LTO framework (5 years in development)
● 2011 (GCC 4.6): WHOPR (parallel link-time optimization); Firefox builds
Parallelized link-time optimization (WHOPR)
File1.c … File4.c → GCC (one run per file) → File1.o … File4.o (each carrying IL) → ld / gold with the LTO plugin → whole-program analysis → local optimization (several instances in parallel) → a.out
Modernizing GCC (LTO perspective)
● 2012 (GCC 4.7.0): Memory use optimizations, new inliner heuristics, new inter-procedural constant propagation with cloning
● 2013 (GCC 4.8.0): Symbol table; propagation of values passed through aggregates
● 2014 (GCC 4.9.0): Slim LTO objects by default; on-demand loading of functions; devirtualization pass; feedback-directed code layout
● 2015 (GCC 5): Identical Code Folding; COMDAT optimization; One Definition Rule for C++; alignment propagation; correct command line options handling with LTO
● 2016 (GCC 6): Linker plugin now detects type of output binary; C & Fortran type merging; better alias analysis
● 2017 (GCC 7): Inter-procedural value range propagation; bitwise propagation
● 2018 (GCC 8): Early debug info; profile representation rewrite; function splitting now by default; reworked runtime estimation; malloc attribute propagation
GCC optimization pipeline
● Compile time
– Parser
– IL generation
– Early opts: early inliner, constant prop., forward prop., jump threading, scalar repl. of aggr., alias analysis, redundancy elim., dead store elim., dead code elim., tail recursion, switch conversion, pure/const/nothrow, EH optimization, profile guessing
– IP analysis, streaming out
● Link-time serial
– Symbol & type streaming in + merging
– Inter-procedural (whole program) opts: dead symbol elim., symbol promotion, profile analysis, identical code folding, devirtualization, constant propagation, const/destr merging, inlining, pure/const/nothrow, mod/ref, comdat
– Partitioning, streaming out
● Link-time parallel
– Streaming in symbols, types and declarations
– Stream in and apply transformations
– High level opts: constant prop., complete unroll, forward prop., alias analysis, return slot opt., redundancy elim., jump threading, dead code elim., conditional store elim., copy prop., if combine, tail recursion, copy loop headers, scalar repl. of aggr., dead store elim., dead code elim., reassociation, sincos, bswap opt., loop invariant motion, partial redundancy elim., loop splitting, unroll and jam, loop distribution, loop interchange, ...
– Low level opts: common subexpression elim., forward propagation, copy propagation, partial redundancy elim., code hoisting, copy propagation, store motion, if conversion, loop invariant motion, loop unrolling, doloop optimization, web construction, copy propagation, common subexpression elim., dead store elim., instruction combine, function partitioning, instruction splitting, live range shrinking, scheduling, register allocation, global common subexpr. elim., shrink wrapping, stack adjustment opt., register renaming, constant prop., code reordering, scheduling, x87 register stack, code/data alignment, machine dependent reorg., code output
– Link
Benchmarks
SPECint 2006 performance (relative to GCC 6)
[Figure: bar chart, series generic and native; configurations: GCC 7 -O2, GCC 8 -O2; axis 0 to 2.5]
SPECint 2006 performance (relative to GCC 6)
[Figure: bar chart, series generic and native; configurations: GCC 7 -O2, GCC 8 -O2, GCC 6 -Ofast, GCC 7 -Ofast, GCC 8 -Ofast; axis 0 to 14]
SPECint 2006 performance (relative to GCC 6)
[Figure: per-benchmark bar chart "GCC -Ofast relative to GCC 6 -O2"; series GCC 6, GCC 7, GCC 8; benchmarks: 429.mcf, 458.sjeng, 445.gobmk, 403.gcc, 462.libquantum, 473.astar, 464.h264ref, 471.omnetpp, 401.bzip2, 483.xalancbmk, 400.perlbench, 456.hmmer, Geomean; axis -20 to 120]
SPECint 2006 performance (relative to GCC 6)
[Figure: bar chart, series generic and native; configurations: GCC 7 -O2, GCC 8 -O2, GCC 6 -Ofast, GCC 7 -Ofast, GCC 8 -Ofast, clang/flang 6 -Ofast, ICC 18 -Ofast; axis 0 to 14]
SPECint 2006 performance (relative to GCC 6)
[Figure: per-benchmark bar chart "Clang & ICC -Ofast relative to GCC 8"; series clang/flang 6, ICC 18; benchmarks: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 456.hmmer, 458.sjeng, 462.libquantum, 464.h264ref, 471.omnetpp, 473.astar, 483.xalancbmk, Geomean; axis -50 to 10]
SPECint 2006 performance (relative to GCC 6)
[Figure: bar chart, series generic and native; configurations: GCC 7 -O2, GCC 8 -O2, GCC 6 -Ofast, GCC 7 -Ofast, GCC 8 -Ofast, clang/flang 6 -Ofast, ICC 18 -Ofast, GCC 8 -O2 -flto, GCC 8 -Ofast -flto; axis 0 to 14]
SPECint 2006 performance (relative to GCC 6)
[Figure: per-benchmark bar chart "GCC -Ofast -flto relative to -Ofast"; benchmarks: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 456.hmmer, 458.sjeng, 462.libquantum, 464.h264ref, 471.omnetpp, 473.astar, 483.xalancbmk, Geomean; axis -4 to 8]
SPECint 2006 performance (relative to GCC 6)
[Figure: bar chart, series generic and native; configurations: GCC 7 -O2, GCC 8 -O2, GCC 6 -Ofast, GCC 7 -Ofast, GCC 8 -Ofast, clang/flang 6 -Ofast, ICC 18 -Ofast, GCC 8 -O2 -flto, GCC 8 -Ofast -flto, clang/flang 6 -Ofast -flto, ICC 18 -Ofast -flto; axis 0 to 25]
SPECint 2006 performance (relative to GCC 6)
[Figure: per-benchmark bar chart "Clang/flang 6 and ICC 18 relative to GCC 8 (-O2 -flto)"; series clang/flang 6, ICC 18; benchmarks: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 456.hmmer, 458.sjeng, 462.libquantum, 464.h264ref, 471.omnetpp, 473.astar, 483.xalancbmk, Geomean; axis -40 to 120]
Profile feedback & LTO work well together
The hmmer benchmark has problems and was excluded.
To build with profile feedback you can often use: ./configure ; CFLAGS="-O2 -fprofile-generate" make ; make check ; make clean ; CFLAGS="-O2 -fprofile-use" make
[Figure: per-benchmark bar chart "Performance relative to GCC 8 -Ofast"; series LTO, FDO, FDO+LTO; benchmarks: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 458.sjeng, 462.libquantum, 464.h264ref, 471.omnetpp, 473.astar, 483.xalancbmk, Geomean; axis -10 to 25]
SPECint 2006 code size
[Figure: bar chart, series Non-LTO and LTO; configurations: GCC 7 -O2, GCC 8 -O2, GCC 7 -Ofast, GCC 8 -Ofast, GCC 8 -Ofast + FDO, Clang 6 -O2, Clang 6 -Ofast, ICC 18 -Ofast; axis 0 to 12000000]
SPECfp 2006 performance (relative to GCC 6)
[Figure: bar chart, series generic and native; configurations: GCC 7 -O2, GCC 8 -O2, GCC 6 -Ofast, GCC 7 -Ofast, GCC 8 -Ofast, ICC 18 -Ofast, GCC 8 -O2 -flto, GCC 8 -Ofast -flto, ICC 18 -Ofast -flto; axis -10 to 60]
Firefox performance summary
[Figure: bar chart "Firefox performance relative to non-LTO build"; series static, Profile feedback; benchmarks: responsiveness tp5o, dromaeo dom, Displaylist mutate, tappaint, speedometer, a11yr, svgr_opacity, tp5o, ARES6, stylebench, tart, dromaeo css, tsvgx, startup time; axis -5 to 35]
Firefox performance – dromaeo DOM
Firefox performance – tp5o page responsiveness
Firefox binary size
[Figure: stacked bar chart, segments EH, data, relocations, text; axis 0 to 140000000; configurations: clang6 -Oz -flto, clang6 -Oz -flto=thin, clang6 -O2 -flto, clang6 -O3 -flto=thin + FDO, clang6 -O3 -flto + FDO, clang6 -O3 -flto, clang6 -O3 -flto=thin, clang6 -Oz, clang6 -Os, clang6 -O2, clang6 -O3 + FDO, clang6 -O3, gcc8 -Os -flto, gcc8 -O3 -flto + FDO, gcc8 -O2 -flto, gcc8 -O3 -flto, gcc8 -Os, gcc8 -O3 + FDO, gcc8 -O2, gcc8 -O3, gcc8 -O3 -flto + FDO, gcc7 -O3 -flto + FDO, gcc6 -O3 -flto + FDO]
Summary
● LTO now works and scales for large applications (Firefox, LibreOffice, …)
● Important size optimization, especially for large C++ programs
● Often an important performance optimization, especially when combined with profile feedback
● Space for future improvements (both on the GCC side and in re-optimizing programs for the LTO model)
LTO in openSUSE Factory
● A new staging project (openSUSE:Factory:Staging:N)
● Setting: Optflags: * -flto
● ~80 packages failed (out of all ~2300)
● Branch all failing packages and disable LTO
● LTO Factory (w/ some disabled packages) can run the openQA test suite (with a small fallout)
Package statistics
● Total ELF files in distribution: ~6700
● Reduction: 1839 MB to 1736 MB (-5.6%)
● libmergedlo.so: reduction 10 MB (-16%)
● Biggest reduction:
– mysql_upgrade (-92%)
– innochecksum (-94%)
– mbstream (-94%)
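The headline figure above can be checked with simple arithmetic. The helper below is a hypothetical illustration (its name is not from the slides); it computes the relative size change in percent from the two totals quoted in the list.

```c
/* Relative size change in percent; a negative result means the output
   shrank. The helper name is hypothetical, used only for illustration. */
double size_change_percent(double before_mb, double after_mb)
{
    return (after_mb - before_mb) / before_mb * 100.0;
}
```

Calling size_change_percent(1839.0, 1736.0) gives roughly -5.6, which matches the reduction quoted for the whole distribution.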
Issues
● Argyllcms: lto1: fatal error: multiple prevailing defs for 'xcal_read_icc' (LD PR: https://sourceware.org/PR23079)
● GCC LTO miscompilation: http://gcc.gnu.org/PR85248
● .symver in shared libraries (https://gcc.gnu.org/PR48200):
– __asm__(".symver old_foo,foo@VERS_1.1")
– A new symver function attribute must be added (GCC 9.1.0)
Issues (cont.)
● static libraries (*.la):
– e2fsprogs, btrfsprogs, …
– Fat LTO objects must be used (-ffat-lto-objects)
– LTO ELF sections should be stripped
– OBS sanitizer should be extended
– LTO mode can combine both LTO objects and assembly objects
Issues (cont. 2)
● LTO warnings (-Wodr, ...):
– ltrace: error: type of 'filter_matches_symbol' does not match original declaration [-Werror=lto-type-mismatch]
– gdb: error: type 'struct ipa_sym_addresses' violates the C++ One Definition Rule [-Werror=odr]
● mpir: weak configure script that scans *.o file as a blob
● Weak support for LTO debug info in the dwz tool
● Possibly higher memory constraints for selected packages
Issues (cont. 3)
● Usage of top-level assembler (https://gcc.gnu.org/PR57703):
– Example of syscall.cc in the Chromium project:
asm volatile(".text\n"
".align 16, 0x90\n"
".type SyscallAsm, @function\n"
"SyscallAsm:.cfi_startproc\n"
– Error:
nacl_helper.ltrans1.ltrans.o: In function `playground2::SandboxSyscall(int, long, long, long, long, long, long)':
nacl_helper.ltrans1.o:(.text+0x4503): undefined reference to `SyscallAsm'
Histogram of text segment sizes
[Figure: histogram; x-axis 0 to 40,000,000, y-axis 0 to 140]
Conclusion
● LTO looks mature enough to be used by default
● Do we want it for openSUSE Factory?
● For the future, can we offer it also for SLE?
License
This slide deck is licensed under the Creative Commons Attribution-ShareAlike 4.0 International license. It can be shared and adapted for any purpose (even commercially) as long as Attribution is given and any derivative work is distributed under the same license.
Details can be found at https://creativecommons.org/licenses/by-sa/4.0/
General Disclaimer
This document is not to be construed as a promise by any participating organisation to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. openSUSE makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for openSUSE products remains at the sole discretion of openSUSE. Further, openSUSE reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All openSUSE marks referenced in this presentation are trademarks or registered trademarks of SUSE LLC, in the United States and other countries. All third-party trademarks are the property of their respective owners.
Credits
Template: Richard Brown
Design & Inspiration: openSUSE Design Team
http://opensuse.github.io/branding-guidelines/