Modular Codegen - LLVM

Transcript of Modular Codegen - LLVM

Page 1: Modular Codegen - LLVM

Modular Codegen: Further Benefits of Explicit Modularization

Page 2: Modular Codegen - LLVM

Module Flavours

Page 3: Modular Codegen - LLVM

Motivating Example

// foo.h
#ifndef FOO_H
#define FOO_H
inline void foo() { ... }
#endif

// bar.cpp
#include "foo.h"
void bar() {
  foo();
}

// baz.cpp
#include "foo.h"
void baz() {
  foo();
}

Page 4: Modular Codegen - LLVM

Implicit Modules

● User writes .modulemap files

module foo {
  header "foo.h"
  export *
}

Page 5: Modular Codegen - LLVM

Implicit Modules Build Process

[Diagram: the Build System runs one clang++ process on bar.cpp and another on baz.cpp.]

Page 6: Modular Codegen - LLVM

Implicit Modules Build Process

[Diagram: the Build System runs clang++ on bar.cpp and on baz.cpp; a further clang++ process reads foo.mm and produces foo.pcm.]

Page 7: Modular Codegen - LLVM

Implicit Modules Build Process

[Diagram: the Build System runs clang++ on bar.cpp, producing bar.o, and on baz.cpp, producing baz.o; a further clang++ process reads foo.mm and produces foo.pcm.]

Page 8: Modular Codegen - LLVM

Implicit Modules

● User writes .modulemap files
● Compiler finds them and implicitly builds module descriptions in a filesystem cache
● Build system agnostic
● Difficult to parallelize - build system isn’t aware of the dependencies
● Doesn’t distribute (clang doesn’t know about distribution scheme)

Page 9: Modular Codegen - LLVM

Explicit Modules

● Build system explicitly invokes the compiler on .modulemap files
● Passes resulting .pcm files when compiling .cpp files for use

Page 10: Modular Codegen - LLVM

Explicit Modules Build Process

[Diagram: the Build System runs clang++ on foo.mm to produce foo.pcm.]

Page 11: Modular Codegen - LLVM

Explicit Modules Build Process

[Diagram: the Build System runs clang++ on foo.mm to produce foo.pcm; separate clang++ processes compile bar.cpp to bar.o and baz.cpp to baz.o.]

Page 12: Modular Codegen - LLVM

Modules TS (Technical Specification)

● New file type (C++ with some new syntax - .cppm?)
● New import syntax
● Also needs build system support

Page 13: Modular Codegen - LLVM

Modular Codegen

Page 14: Modular Codegen - LLVM

Duplication in Object Files

Each object file contains independent definitions of:

● Uninlined ‘inline’ functions (& some other bits)

● Debug information descriptions of classes

[Diagram: bar.cpp and baz.cpp are compiled using foo.mm / foo.pcm; the resulting objects are linked into a.out.]

bar.o: bar(), f1()
baz.o: baz(), f1()
a.out: bar(), baz(), f1()

Page 15: Modular Codegen - LLVM

Modular Objects

The module can be used as a ‘home’ for these entities so they don’t need to be carried by every user.

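[Diagram: bar.cpp and baz.cpp are compiled using foo.mm / foo.pcm; foo.o is built as a separate object and everything is linked into a.out.]

bar.o: bar()
baz.o: baz()
foo.o: f1()
a.out: bar(), baz(), f1()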
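
The difference between the two layouts can be sketched as a count of how many surplus copies of each definition reach the linker. This is only a toy model: `ObjectFile`, `redundantDefinitions`, `duplicatedLayout` and `homedLayout` are hypothetical names introduced for illustration, and an object file is reduced to a list of function names.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: an object file is a name plus the functions it defines.
struct ObjectFile {
    std::string name;
    std::vector<std::string> definitions;
};

// Counts the definitions the linker receives beyond the first copy of each
// symbol, i.e. the copies it has to deduplicate and discard.
std::size_t redundantDefinitions(const std::vector<ObjectFile>& objects) {
    std::map<std::string, std::size_t> copies;
    for (const ObjectFile& object : objects)
        for (const std::string& symbol : object.definitions)
            ++copies[symbol];
    std::size_t redundant = 0;
    for (const auto& entry : copies)
        redundant += entry.second - 1;
    return redundant;
}

// Without modular codegen: every user of the header carries its own f1().
std::vector<ObjectFile> duplicatedLayout() {
    return {{"bar.o", {"bar", "f1"}}, {"baz.o", {"baz", "f1"}}};
}

// With modular codegen: foo.o is the single home for f1().
std::vector<ObjectFile> homedLayout() {
    return {{"bar.o", {"bar"}}, {"baz.o", {"baz"}}, {"foo.o", {"f1"}}};
}
```

In this model the duplicated layout hands the linker one surplus copy of f1(), and the homed layout hands it none. With more users of the header, the surplus grows with the number of users, while the homed layout stays at zero.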

Page 16: Modular Codegen - LLVM

Risks

Unused entities may increase linker inputs.

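[Diagram: bar.cpp and baz.cpp are compiled using foo.mm / foo.pcm; foo.o is built as a separate object and everything is linked into a.out.]

bar.o: bar()
baz.o: baz()
foo.o: f1(), f2(), f3()
a.out: bar(), baz(), f1()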
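
A rough way to picture this risk is to list which homed functions nobody references. The helper below is a hypothetical sketch, not part of any tool; it also assumes homed functions do not call each other, which a real analysis would have to account for.

```cpp
#include <set>
#include <string>
#include <vector>

// Returns the functions homed in the module's object file that no user
// object references. Assumption: homed functions do not reference each other.
std::vector<std::string> unusedHomedFunctions(
    const std::vector<std::string>& homed,
    const std::set<std::string>& referenced) {
    std::vector<std::string> unused;
    for (const std::string& symbol : homed)
        if (referenced.count(symbol) == 0)
            unused.push_back(symbol);
    return unused;
}
```

For the slide's layout (foo.o homes f1(), f2() and f3(), and only f1() is referenced) the helper reports f2() and f3(): code that is compiled and passed to the linker even though the final program does not need it.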

Page 17: Modular Codegen - LLVM

Constraints

● Headers are compiled separately (& only once) from uses
● Dependencies must be well formed
  ○ Headers cannot be implemented by a different library - they form circular dependencies no longer broken by duplicated definitions at every use.
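
One way to check the "well formed" constraint is to look for cycles in the library dependency graph. The sketch below is a plain depth-first search over a hypothetical `DependencyGraph` (library name mapped to the libraries it depends on); the type and function names are assumptions made for illustration and do not come from any build system.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical representation: each library maps to the libraries it depends on.
using DependencyGraph = std::map<std::string, std::vector<std::string>>;

namespace detail {
// Depth-first search; returns true if a cycle is reachable from `node`.
bool reachesCycle(const DependencyGraph& graph, const std::string& node,
                  std::set<std::string>& inProgress,
                  std::set<std::string>& done) {
    if (done.count(node)) return false;                 // already fully explored
    if (!inProgress.insert(node).second) return true;   // node is on the current path
    auto it = graph.find(node);
    if (it != graph.end())
        for (const std::string& dependency : it->second)
            if (reachesCycle(graph, dependency, inProgress, done)) return true;
    inProgress.erase(node);
    done.insert(node);
    return false;
}
}  // namespace detail

// True if any library (transitively) depends on itself.
bool hasCircularDependency(const DependencyGraph& graph) {
    std::set<std::string> inProgress, done;
    for (const auto& entry : graph)
        if (detail::reachesCycle(graph, entry.first, inProgress, done)) return true;
    return false;
}
```

A graph in which a header library depends on a second library for its implementation, while that second library depends on the headers, is reported as circular. That is the shape the constraint rules out.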

Page 18: Modular Codegen - LLVM

Diversion: ‘How Unix Linkers Work (lite)’

void a1() { b(); }
void a2() { … }

void b() { a2(); }

Page 19: Modular Codegen - LLVM

Diversion: ‘How Unix Linkers Work (lite)’

void a1() { b(); }
void a2() { … }

void b() { a2(); }

a1()?

Page 20: Modular Codegen - LLVM

Diversion: ‘How Unix Linkers Work (lite)’

void a1() { b(); }
void a2() { … }

void b() { a2(); }

a1()?  ->  a1()✓ b()?

Page 21: Modular Codegen - LLVM

Diversion: ‘How Unix Linkers Work (lite)’

void a1() { b(); }
void a2() { … }

void b() { a2(); }

a1()?  ->  a1()✓ b()?  ->  a1()✓ b()✓ a2()?

Page 22: Modular Codegen - LLVM

Diversion: ‘How Unix Linkers Work (lite)’

void a1() { b(); }
void a2() { … }

void b() { a2(); }

a1()?  ->  a1()✓ b()?  ->  a1()✓ b()✓ a2()?  ->  a1()✓ b()✓ a2()❌
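
The progression on these slides can be reproduced with a small simulation of single-pass archive resolution. This is a simplified sketch, not a description of any particular linker: `Member`, `Archive`, `unresolvedAfterLink`, `libA` and `libB` are hypothetical names, and it assumes that a1() and a2() live in separate members of one archive, that b() lives in a second archive, and that the link starts out needing only a1().

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// One object file inside an archive (hypothetical model).
struct Member {
    std::vector<std::string> defines;
    std::vector<std::string> references;
};
using Archive = std::vector<Member>;

// Visits each archive once, left to right. A member is pulled in only if it
// defines a symbol that is undefined at that moment; archives that have
// already been visited are never reopened.
std::set<std::string> unresolvedAfterLink(const std::vector<Archive>& commandLine,
                                          std::set<std::string> undefined) {
    std::set<std::string> defined;
    for (const Archive& archive : commandLine) {
        std::vector<bool> pulled(archive.size(), false);
        bool progress = true;
        while (progress) {  // rescan this archive until nothing new is pulled
            progress = false;
            for (std::size_t i = 0; i < archive.size(); ++i) {
                if (pulled[i]) continue;
                bool needed = false;
                for (const std::string& symbol : archive[i].defines)
                    if (undefined.count(symbol)) needed = true;
                if (!needed) continue;
                pulled[i] = true;
                progress = true;
                for (const std::string& symbol : archive[i].defines) {
                    defined.insert(symbol);
                    undefined.erase(symbol);
                }
                for (const std::string& symbol : archive[i].references)
                    if (!defined.count(symbol)) undefined.insert(symbol);
            }
        }
    }
    return undefined;
}

// Assumption: a1() and a2() are separate members of the first archive.
Archive libA() { return {Member{{"a1"}, {"b"}}, Member{{"a2"}, {}}}; }
// Assumption: b() is the only member of the second archive.
Archive libB() { return {Member{{"b"}, {"a2"}}}; }
```

Linking libA then libB while needing a1() leaves a2() unresolved in this model: a2()'s member was skipped while libA was being scanned because nothing needed it yet, and libA is not revisited once b() introduces the reference. Listing libA a second time after libB resolves it, which is the usual workaround for a circular dependency between archives.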

Page 23: Modular Codegen - LLVM

Clang/LLVM Codebase

● *.def files are textual/non-modular
● lib/Support/regc* are non-modular
● MCTargetOptionsCommandFlags.h non-modular
● CommandFlags.h non-modular
● Target ASM Parsers depend on MC Target Description
● static namespace-scope functions in headers -> inline, non-static
● Missing #includes
● No idea what to do with abi-breaking.h
● Weird things in Hexagon (non-modular headers that are included exactly once…)
● ASTMatchers defining global variables in headers… no idea how this isn’t causing link errors, maybe they’ve got implicit internal linkage.
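
The "static -> inline, non-static" item can be illustrated with a before/after pair. The function `clampToByte` and the two namespaces are hypothetical, used only so both forms fit in one file; in a real header only one of them would exist.

```cpp
// Before: a static namespace-scope function has internal linkage, so every
// translation unit that includes the header gets its own distinct function.
namespace before_fix {
static int clampToByte(int value) {
    return value < 0 ? 0 : (value > 255 ? 255 : value);
}
}  // namespace before_fix

// After: an inline, non-static function has external linkage and is a single
// entity across the program, so one definition can serve every user.
namespace after_fix {
inline int clampToByte(int value) {
    return value < 0 ? 0 : (value > 255 ? 255 : value);
}
}  // namespace after_fix
```

Both versions behave identically when called; the change only affects linkage, which is what matters once a header's definitions are meant to live in one place.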

Page 24: Modular Codegen - LLVM

Results

Page 25: Modular Codegen - LLVM

Object Section Sizes: -O0 -fmodules-codegen -gsplit-dwarf

Page 26: Modular Codegen - LLVM

Object Section Sizes: -O0 -fmodules-codegen -gsplit-dwarf

Page 27: Modular Codegen - LLVM

Object Section Sizes: -O0 -fmodules-codegen -gsplit-dwarf

Page 28: Modular Codegen - LLVM

Object Section Sizes: -O0 -fmodules-codegen -gsplit-dwarf

Page 29: Modular Codegen - LLVM

-O3

Page 30: Modular Codegen - LLVM

Further Work

● Other aspects needed for Modules TS
  ○ Variables (implemented - could be backported to non-TS style, may not be needed)
  ○ ???

● Avoid homing alwaysinline functions (maybe other reasonable inlining heuristics to avoid homing functions unlikely to remain uninlined)

● Avoid type units when a home is likely to be unique (not an implicit template instantiation, or has a strong vtable, etc)

Page 31: Modular Codegen - LLVM

Thanks!

David Blaikie

Email/etc: [email protected]
@dwblaikie
