Building openSUSE with link-time optimizationshubicka/slides/opensuse2018-e.pdf · 2018. 5. 27. ·...

Jan Hubička and Martin LiškaSUSElabs

jh@suse.cz, mliska@suse.cz

Building openSUSE with link-time optimizations

Outlilne

● What is link-time optimization? ● Link-time optimization and GCC● Benchmarks● Can we build openSUSE with link-time optimization by

default?

What is link-time optimization?

Per-file compilation

File1.c

File2.c

File3.c

File4.c

File1.o

File2.o

File3.o

File4.o

ld / gold a.out

Link-time optimization

File1.c

File2.c

File3.c

File4.c

File1.oIL

File2.oIL

File3.oIL

File4.oIL

ld / gold a.out

LTO plug-inlink-timecompiler

Benefits of LTO● Symbol promotion

(from linker’s resolution data most symbols become “static”)● Cross-module inlining, constant propagation● Aggressive unreachable code removal● Profile propagation● EH optimization (propagation of “nothrow”)● Identical code folding● Optimized code layout● and more :)

Problems of LTO

● Whole toolchain has to be restructured● Slow compile-edit cycle● Harder bugreports

(often whole program needed to reproduce issue)● Not 100% transparent to user, but in most cases all one needs

to do is to add -flto

Link-time optimization and GCC

Modernizing GCC (LTO perspective)● 1999 (GCC 2.95): Function-at-a-time● 2001 (GCC 3.0): New inliner (first high-level opt . in gcc)● 2004 (GCC 3.4): Unit-at-a-time; intermodule compilation for C; Inter-procedural

optimization framework● 2005 (GCC 4.0): New SSA optimization framework● 2006 (GCC 4.1): Inter-procedural optimizations: profile guided inlining, pure/const

discovery, mod/ref, inter-procedural constant propagation, --fwhole-program –combine

● 2008 (GCC 4.4): Inter-procedural optimization on SSA; early optimization and inlining.● 2010 (GCC 4.5): Basic LTO framework (5 years in development)● 2011 (GCC 4.6): WHOPR (parallel link-time optimization); Firefox builds

Link-time optimization

File1.c

File2.c

File3.c

File4.c

File1.oIL

File2.oIL

File3.oIL

File4.oIL

ld / gold a.out

LTO plug-inlink-timecompiler

Parallelized Link-time optimization (WHOPR)

File1.c

File2.c

File3.c

File4.c

File1.oIL

File2.oIL

File3.oIL

File4.oIL

ld / gold a.out

LTO plguin

Whole ProgramAnalysis

Local opt.

Local opt

Local opt.

Modernizing GCC (LTO perspective)● 2012 (GCC 4.7.0): Memory use optimizations, new inliner heuristics, new inter-procedural

constant propagation with clonning● 2013 (GCC 4.8.0): symbol table; propagation of values passed through aggregates● 2014 (GCC 4.9.0): slim LTO objects by default; on demand loading of functions; devirtualization

pass; feedback directed code layout● 2015 (GCC 5): Identical Code Folding; COMDAT optimization; One Definition Rule for C++;

alignment propagation; correct command line options handling with LTO● 2016 (GCC 6): Linker-plugin now detects type of output binary. C&Fortran type merging. Better

alias anaysis● 2017 (GCC 7): Inter-procedural value range propagaion; bitwise propagation● 2018 (GCC 8): Early debug info. Profile representation rewrite. Function splitting now by default.

Reworked runtime estimation; Malloc attribute propagation

GCC optimization pipeline

Parser

IL generation

Early opts:

Early InlinerConstant prop.Forwward prop.Jump threading

Scalar repl. of aggr.Alias analysis

Redundancy ellim.Dead store ellim.Dead code ellim.

Tail recursionSwitch conversionpure/const/nothrow

EH optimizationProfile guessing

Compile time Link-time serial

IP analysisstreaming out

Symbol & typestreaming in+ merging

Inter-procedural(whole program)

Dead symbol ellim.Symbol promotion

profile analysisIdentical code folding

devirtualizationConstant propagationconst/destr merging

Inliningpure/const/nothrow

mod/refcomdat

Partitioningstreaming out

Streaming in symbols, types and declarations

& link

Stream inand apply

transformations

High level opts:

Constant prop.Complette unrollForward prop.Alias analysis

Return slot opt.Redudancy ellim.Jump threading

Dead code ellim.Conditional store ellim.

Copy prop.If combine

Tail recursionCopy loop headersScalar repl. of aggr.

Dead store ellim.Dead code ellim.

ReassociationSincos, bswap opt.

Loop invariant motionPartial redundancy ellim.

Loop splittingUnroll and jam

Loop dsitributionLoop interchange ...

Low level opts:

Common subexpression ellim.Forward propagation

Copy propagationPartial Redundancy Ellim.

Code hoistingCopy propagation

Store motionIf conversion

Loop invariant motionLoop unrolling

Doloop optimizationWeb constructionCopy propagation

Common subexpression ellim.Dead store ellim.

Instruction combineFunction partitioningInstruction splitting

Live range shrinkingScheduling

Register allocationGlobal common subexpr. Ellim.

Shrink wrappingStack adjustment opt.

Register renamingConstant prop.

Code reorderingScheduling

X87 register stackCode/data alignment

Machine dependent reorg.Code output

Link-time parallel

Benchmarks

SPECint 2006 performance (relative to GCC 6)

GCC 7 -O

GCC 8 -O

generic

native

GCC 7 -O

GCC 8 -O

GCC 6 -O

GCC 7 -O

GCC 8 -O

generic

native

GCC 7 -O

GCC 8 -O

GCC 6 -O

GCC 7 -O

GCC 8 -O

generic

native

.libqu

GCC -Ofast relative to GCC 6 -O2

GCC 7 -O

GCC 8 -O

GCC 6 -O

GCC 7 -O

GCC 8 -O

generic

native

GCC 7 -O

GCC 8 -O

GCC 6 -O

GCC 7 -O

GCC 8 -O

generic

native

GCC 7 -O

GCC 8 -O

GCC 6 -O

GCC 7 -O

GCC 8 -O

generic

native

.libqu

Clang & ICC -Ofast relative to GCC 8

clang/flang 6

ICC 18

GCC 7 -O

GCC 8 -O

GCC 6 -O

GCC 7 -O

GCC 8 -O

generic

native

GCC 7 -O

GCC 8 -O

GCC 6 -O

GCC 7 -O

GCC 8 -O

generic

native

GCC 7 -O

GCC 8 -O

GCC 6 -O

GCC 7 -O

GCC 8 -O

generic

native

.libqu

GCC -Ofast -flto relative to -Ofast

GCC 7 -O

GCC 8 -O

GCC 6 -O

GCC 7 -O

GCC 8 -O

generic

native

GCC 7 -O

GCC 8 -O

GCC 6 -O

GCC 7 -O

GCC 8 -O

/fllan

generic

native

GCC 7 -O

GCC 8 -O

GCC 6 -O

GCC 7 -O

GCC 8 -O

/fllan

generic

native

.libqu

Clang/flang 6 and ICC 18 relative to GCC 8 (-O2 -flto)

clang/flang 6

ICC 18

GCC 7 -O

GCC 8 -O

GCC 6 -O

GCC 7 -O

GCC 8 -O

/fllan

generic

native

Profile feedback & LTO works well together

Hmmer benchmark has problems; was excluded.

To build with profile feedback often you can use: ./configure ; CFLAGS=”-O2 -fprofile-generate” make ; make check ; make clean ; CFLAGS=”-O2 -fprofile-use” make

.libqu

Performance relative to GCC 8 -Ofast

FDO+LTO

SPECint 2006 code size

GCC 7 -O2 GCC 8 -O2 GCC 7 -Ofast GCC 8 -OfastGCC 8 -Ofast + FDO Clang 6 -O2 Clang 6 -Ofast Icc 18 -Ofast0

2000000

4000000

6000000

8000000

10000000

12000000

Non-LTO

SPECfp 2006 performance (relative to GCC 6)

GCC 7 -O

GCC 8 -O

GCC 6 -O

GCC 7 -O

GCC 8 -O

generic

native

Firefox performance summary

Display

list m

Firefox performance relative to non-LTO build

static

Profile feedback

Firefox performance – dromaeo DOM

Firefox performance – tp5o page responsiveness

Firefox binary size

clang6 -Oz -fltoclang6 -Oz -flto=thin

clang6 -O2 -fltoclang6 -O3 -flto=thin + FDO

clang6 O3 -flto + FDOclang6 -O3 -flto

clang6 -O3 -flto=thin

clang6 -Ozclang6 -Osclang6 -O2

clang6 -O3 + FDOclang6 -O3

gcc8 -Os -fltogcc8 -O3 -flto + FDO

gcc8 -O2 -fltogcc8 -O3 -flto

gcc8 -Osgcc8 -O3 + FDO

gcc8 -O2gcc8 -O3

gcc8 -O3 -flto + FDOgcc7 -O3 -flto + FDOgcc6 -O3 -flto + FDO

0 20000000 40000000 60000000 80000000 100000000 120000000 140000000

relocations

Firefox binary size

gcc8 -O2gcc8 -O3

0 20000000 40000000 60000000 80000000 100000000 120000000 140000000

relocations

Firefox binary size

gcc8 -O2gcc8 -O3

0 20000000 40000000 60000000 80000000 100000000 120000000 140000000

relocations

Firefox binary size

gcc8 -O2gcc8 -O3

0 20000000 40000000 60000000 80000000 100000000 120000000 140000000

relocations

Firefox binary size

gcc8 -O2gcc8 -O3

0 20000000 40000000 60000000 80000000 100000000 120000000 140000000

relocations

Summary

● LTO now works and scale for large applications (Firefox, Libreoffice,…)

● Important size optimization especially for large C++ programs● Often important performance optimization especially when

combined with profile feedback● Space for future improvements

(both on GCC side and re-optimization of programs to LTO model)

LTO in openSUSE Factory

● A new staging project (openSUSE:Factory:Staging:N)● Setting: Optflags: * -flto● ~80 packages failed (out of all ~2300)● Branch all failing packages and disable LTO● LTO Factory (w/ some disabled packages) can run openQA test-

suite (with a small fallout)

Package statistics

● Total ELF files in distribution: ~6700● Reduction: 1839 MB to 1736MB (-5.6%)● libmergedlo.so: reduction 10MB (-16%) ● Biggest reduction:

– mysql_upgrade (-92%)

– Innochecksum (-94%)

– mbstream (-94%)

Issues

● Argyllcms: lto1: fatal error: multiple prevailing defs for 'xcal_read_icc': LD PR: https://sourceware.org/PR23079

● GCC LTO miscompilation: http://gcc.gnu.org/PR85248● .symver in shared libraries: https://gcc.gnu.org/PR48200:

– __asm__(".symver old_foo,foo@VERS_1.1")

– New symver function attribute must be added (GCC 9.1.0)

Issues (cont.)

● static libraries (*.la):– e2fsprogs, btrfsprogs, …– Fat LTO objects must be used (-ffat-lto-objects)– LTO elf sections should be stripped– OBS sanitizer should be extended– LTO mode can combine both LTO objects and assembly objects

Issues (cont. 2)

● LTO warnings (-Wodr, ...):– ltrace: error: type of 'filter_matches_symbol' does not match

original declaration [-Werror=lto-type-mismatch]– gdb: error: type 'struct ipa_sym_addresses' violates the C++ One

Definition Rule [-Werror=odr]● mpir: weak configure script that scans *.o file as a blob● weak support for LTO debug info in dwz tool● Maybe higher memory constaints for selected packages

Issues (cont. 3)

● Usage of top-level assembler (https://gcc.gnu.org/PR57703):– Example of syscall.cc in Chromium project:

asm volatile(".text\n"

".align 16, 0x90\n"

".type SyscallAsm, @function\n"

"SyscallAsm:.cfi_startproc\n"

– Error:nacl_helper.ltrans1.ltrans.o: In function `playground2::SandboxSyscall(int, long, long, long, long, long, long)':

nacl_helper.ltrans1.o:(.text+0x4503): undefined reference to `SyscallAsm'

Histogram of text segment sizes

0 5,000,000 10,000,000 15,000,000 20,000,000 25,000,000 30,000,000 35,000,000 40,000,0000

Conclusion

● LTO looks mature enough to be used by default● Do we want it for openSUSE Factory?● For the future, can we offer it also for SLE?

LicenseThis slide deck is licensed under the Creative Commons Attribution-ShareAlike 4.0 International license. It can be shared and adapted for any purpose (even commercially) as long as Attribution is given and any derivative work is distributed under the same license.

Details can be found at https://creativecommons.org/licenses/by-sa/4.0/

General DisclaimerThis document is not to be construed as a promise by any participating organisation to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. openSUSE makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for openSUSE products remains at the sole discretion of openSUSE. Further, openSUSE reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All openSUSE marks referenced in this presentation are trademarks or registered trademarks of SUSE LLC, in the United States and other countries. All third-party trademarks are the property of their respective owners.

Credits

TemplateRichard Brown

rbrown@opensuse.org

Design & InspirationopenSUSE Design Team

http://opensuse.github.io/branding-guidelines/

Building openSUSE with link-time optimizationshubicka/slides/opensuse2018-e.pdf · 2018. 5. 27. ·...

Documents

Transcript of Building openSUSE with link-time optimizationshubicka/slides/opensuse2018-e.pdf · 2018. 5. 27. ·...

Tutorial Opensuse Zimbra

OpenSuse Customizing Tutorial

openSUSE Build Service

Book Opensuse Reference

Booting directｌy opensuse iso file ｂｙ grub2 @ openSUSE Asia Summit2015

Community Week: openSUSE Weekly News · Community Week: openSUSE Weekly News - Translate OWN into your Language - Satoru Matsumoto aka 'HeliosReds' openSUSE Member Weekly News Team

OpenSUSE Asia Summit 2016

openSUSE Project Presentation

openSUSE and SUSE€¦ · openSUSE and SUSE Collaboration in the open is not always easy Jos Poortvliet openSUSE Community Manager jpoortvliet@suse.com Robert Schweikert Public Cloud

OpenSUSE-KIWI Image System

Link building in 2016 | Link Building Master Class - SEMPO Cities

The openSUSE project fileWhat is openSUSE / SUSE Linux? • openSUSE – is a community project (not a distribution) – wiki, mailinglist, IRC, documentation, sub projects etc. –

Tutorial Opensuse

SEO link building

openSUSE Community week / openSUSE Weekly News Translation

OpenSUSE - Security Guide

openSUSE Packaging for the osmocom stack...openSUSE •Thanks to various openSUSE community contributors most of the common libraries from the osmocom project are already packaged

openSUSE 10.3 GNOME Quick Start...openSUSE®10.3providesthetoolsthatLinux*usersrequireintheirdailyactivities.Itcomeswithaneasy-to-usegraphicaluserinterface(theGNOME*desktop ...

Provides link wheel link building services

openSUSE – What's New? - Open Source Conference · 2 Topics Introduction to openSUSE® openSUSE & SUSE® Linux Enterprise What's new in openSUSE 13.2 Base OS Desktop Server Cloud

openSUSE 10.3 GNOME Quick Start...openSUSE®10.3providesthetoolsthatLinuxusersrequireintheirdailyactivities.Itcomeswithaneasy-to-usegraphicaluserinterface(theGNOMEdesktop ...