LLVM Overview


Page 1: LLVM Overview

LLVM Overview

Constantin Lungu, 2014

Page 2: LLVM Overview

Agenda

• What is LLVM

• Why is it good

• How does it work

• IR, SSA, Phi nodes, data alignment

Page 3: LLVM Overview

What is LLVM?

Page 4: LLVM Overview

What is LLVM?

• A set of reusable libraries for implementing compilers

• Started in 2000

• Written in C++, 811k SLOC

• As of today, works with C, C++, ObjC, Ada, D, Fortran

• Not an acronym; LLVM's scope is not limited to the creation of virtual machines

• LLVM = umbrella project + IR + debugger + C++ standard library

Page 5: LLVM Overview

Why is it good?

Page 6: LLVM Overview

Why is it good?

• Supports lots of instruction sets: ARM, Hexagon, MIPS, NVPTX, R600, SPARC, x86/x86-64, even PowerPC!

• It's a layer between high-level source code and the executable

• It decouples the front end from the back end

• Supports runtime compilation (JIT)

• Has lots of optimizers

Page 7: LLVM Overview

How does it work?

Page 8: LLVM Overview

How does it work?

In a nutshell:

• Generate LLVM IR from your compiler

• Run optimizers

• Create object files, assembly, or machine code in memory
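
To make that flow concrete, here is a minimal sketch of emitting IR through LLVM's C++ API. It assumes a reasonably recent LLVM; the function name mul2 is made up for the example.

// Minimal sketch: build "int mul2(int x) { return x * 2; }" as LLVM IR and print it.
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Verifier.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

int main() {
  LLVMContext Ctx;
  Module M("demo", Ctx);
  IRBuilder<> B(Ctx);

  // Declare i32 @mul2(i32) and give it an entry block.
  FunctionType *FT = FunctionType::get(B.getInt32Ty(), {B.getInt32Ty()}, false);
  Function *F = Function::Create(FT, Function::ExternalLinkage, "mul2", &M);
  B.SetInsertPoint(BasicBlock::Create(Ctx, "entry", F));

  // return x * 2;
  Value *X = &*F->arg_begin();
  B.CreateRet(B.CreateMul(X, B.getInt32(2), "r"));

  verifyFunction(*F);        // sanity-check the generated function
  M.print(outs(), nullptr);  // dump human-readable .ll to stdout
  return 0;
}

From here the module can be handed to the optimizers and a target back end; a typical build line is something like clang++ demo.cpp $(llvm-config --cxxflags --ldflags --libs core support), though the exact flags depend on the installation.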

Page 9: LLVM Overview

How does it work?

• Tokenise the source code

• Parse the token stream

• Build the AST

• Generate LLVM IR from the AST

• Optimize the IR

• Assemble

Page 10: LLVM Overview

LLVM IR

• An instruction set for a register machine with an unlimited number of SSA registers

• Representations:

• Human-readable LLVM assembly (.ll)

• Dense 'bitcode' binaries (.bc)

• C++ classes
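
All three representations hold the same module, so converting between them is mechanical: llvm-as and llvm-dis do it on the command line, and the sketch below does the same through the C++ classes, parsing a .ll file and writing it back out as bitcode. It assumes a reasonably recent LLVM; the file names are placeholders.

// Minimal sketch: read human-readable IR (input.ll) and emit bitcode (output.bc).
#include "llvm/Bitcode/BitcodeWriter.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

int main() {
  LLVMContext Ctx;
  SMDiagnostic Err;

  // .ll text -> in-memory C++ representation (Module).
  std::unique_ptr<Module> M = parseIRFile("input.ll", Err, Ctx);
  if (!M) {
    Err.print("ll2bc", errs());
    return 1;
  }

  // In-memory Module -> dense bitcode on disk.
  std::error_code EC;
  raw_fd_ostream Out("output.bc", EC, sys::fs::OF_None);
  if (EC)
    return 1;
  WriteBitcodeToFile(*M, Out);
  return 0;
}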

Page 11: LLVM Overview

Static Single Assignment

Let's consider the following code:

int main(int argc, const char* argv[]) {
    int i = 1;
    i = i * 2;
    return 0;
}

Which yields us...

Page 12: LLVM Overview

Static Single Assignment

Note: LLVM registers are numbered in order, so %1 is the first allocated register, %2 the second, and so on. Clang emits this unoptimized form with every variable kept on the stack; a later pass, mem2reg, promotes such allocas to real SSA registers.

define i32 @main(i32 %argc, i8** %argv) #0 {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  %3 = alloca i8**, align 8
  %i = alloca i32, align 4              ; declare i
  store i32 0, i32* %1
  store i32 %argc, i32* %2, align 4
  store i8** %argv, i8*** %3, align 8   ; startup till here
  store i32 1, i32* %i, align 4         ; store 1 in i
  %4 = load i32* %i, align 4            ; load i into %4
  %5 = mul nsw i32 %4, 2                ; multiply %4 by 2 and store in %5
  store i32 %5, i32* %i, align 4        ; store %5 in i
  ret i32 0
}

Page 13: LLVM Overview

But wait...

d0: y := 1
d1: y := 2
d2: x := y

d0 is redundant, right? It has no effect on the final value of x. We can see that at a glance, but compilers aren't that smart: they have to run Reaching Definitions analysis to figure it out. Let's convert the code to SSA form:

d0: y1 := 1
d1: y2 := 2
d2: x1 := y2

Mmm... much better.

Page 14: LLVM Overview

Benefits of SSA

• Removes the need to compute use-define chains via reaching-definitions analysis: in SSA each use has exactly one reaching definition

• If a variable has N uses and M definitions, representing its use-def chains takes space and time proportional to N·M (e.g. 4 uses and 3 definitions give 12 use-def pairs), while the size of the SSA form is linear in the size of the original program

• Simplifies, or even eliminates, other optimization algorithms and the data structures behind them

Page 15: LLVM Overview

And what about those φ nodes?

Well, these are needed when a variable can be assigned a different value depending on the control flow. Sample code first, then its SSA form with a Phi node below it.

y = 1
if (condition)
    y = 2
x = y

becomes, in SSA form:

y1 = 1
if (condition)
    y2 = 2
x1 = Φ(y1, y2)

The Phi node selects y1 or y2, depending on which block control flow arrived from. The argument y1 is associated with the block that defines y1, and the same goes for y2.
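
For completeness, here is a hedged sketch of how that Phi node would be built through LLVM's C++ API; the function name select_y and the block names are made up for the example, and a reasonably recent LLVM is assumed.

// Minimal sketch: the y/x example above, constructed with IRBuilder and a Phi node.
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Verifier.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

int main() {
  LLVMContext Ctx;
  Module M("phi_demo", Ctx);
  IRBuilder<> B(Ctx);

  // i32 @select_y(i1 %condition): y = 1; if (condition) y = 2; x = y; return x;
  FunctionType *FT = FunctionType::get(B.getInt32Ty(), {B.getInt1Ty()}, false);
  Function *F = Function::Create(FT, Function::ExternalLinkage, "select_y", &M);

  BasicBlock *Entry = BasicBlock::Create(Ctx, "entry", F);
  BasicBlock *Then  = BasicBlock::Create(Ctx, "then", F);
  BasicBlock *Merge = BasicBlock::Create(Ctx, "merge", F);

  // entry: y1 = 1, then branch on %condition
  B.SetInsertPoint(Entry);
  Value *Y1 = B.getInt32(1);
  B.CreateCondBr(&*F->arg_begin(), Then, Merge);

  // then: y2 = 2, fall through to merge
  B.SetInsertPoint(Then);
  Value *Y2 = B.getInt32(2);
  B.CreateBr(Merge);

  // merge: x1 = phi(y1 from entry, y2 from then)
  B.SetInsertPoint(Merge);
  PHINode *X1 = B.CreatePHI(B.getInt32Ty(), 2, "x1");
  X1->addIncoming(Y1, Entry);
  X1->addIncoming(Y2, Then);
  B.CreateRet(X1);

  verifyFunction(*F);
  M.print(outs(), nullptr);
  return 0;
}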

Page 16: LLVM Overview

A better example of φ nodes

void func(bool first, bool second) {
    bool third = first || second;
}

Which yields us...

Page 17: LLVM Overview

define void @func(i1 zeroext %first, i1 zeroext %second) #0 {
  %1 = alloca i8, align 1
  %2 = alloca i8, align 1
  %third = alloca i8, align 1
  %3 = zext i1 %first to i8
  store i8 %3, i8* %1, align 1
  %4 = zext i1 %second to i8
  store i8 %4, i8* %2, align 1
  %5 = load i8* %1, align 1              ; decide what first || second is
  %6 = trunc i8 %5 to i1                 ; and store it in %6
  br i1 %6, label %10, label %7          ; the actually interesting part (this block is labeled %0)

; <label>:7                              ; this block is labeled %7
  %8 = load i8* %2, align 1
  %9 = trunc i8 %8 to i1
  br label %10

; <label>:10                             ; preds = %7, %0
  %11 = phi i1 [ true, %0 ], [ %9, %7 ]  ; yield true if we came from %0, otherwise yield %9
  %12 = zext i1 %11 to i8
  store i8 %12, i8* %third, align 1
  ret void
}

Page 18: LLVM Overview

And what about those align keywords?

• The CPU accesses memory one word at a time

• If a value's highest and lowest bits do not fall within the same memory word, the CPU has to split the access into two reads! :(

• Not very good when you are optimizing machine code

• Solution: pad the values so that the data can always be read in as few accesses as possible (see the sketch below)

• ARM, for instance, does not support unaligned memory access
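
A quick way to see this padding in action is to let the compiler lay out a struct. The sketch below is illustrative C++ only; the struct is made up and the exact numbers depend on the target ABI.

// Illustrative only: how the compiler pads a struct so each field is aligned.
#include <cstdint>
#include <iostream>

struct Example {
  std::uint8_t  a;  // 1 byte
  // 3 bytes of padding are typically inserted here...
  std::uint32_t b;  // ...so this 4-byte field starts on a 4-byte boundary
};

int main() {
  std::cout << sizeof(Example)  << '\n';  // typically 8, not 5
  std::cout << alignof(Example) << '\n';  // typically 4
  return 0;
}

The align annotations in the IR above record exactly this: the alignment that allocas, loads and stores are guaranteed to have, so the back end can emit the cheapest possible memory accesses.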

Page 20: LLVM Overview

Thanks!