LLVM Overview


Page 1: LLVM Overview

LLVM Overview

Constantin Lungu, 2014

Page 2: LLVM Overview

Agenda

• What is LLVM

• Why is it good

• How does it work

• IR, SSA, Phi nodes, data alignment

Page 3: LLVM Overview

What is LLVM?

Page 4: LLVM Overview

What is LLVM?

• A set of reusable libraries for implementing compilers

• Started in 2000

• Written in C++, 811k SLOC

• As of today, works with C, C++, ObjC, Ada, D, Fortran

• Not an acronym; LLVM's scope is not limited to the creation of virtual machines

• LLVM = umbrella project + IR + debugger + C++ standard library

Page 5: LLVM Overview

Why is it good?

Page 6: LLVM Overview

Why is it good?

• Supports lots of instruction sets: ARM, Hexagon, MIPS, NVPTX, R600, SPARC, x86/x86-64, even PowerPC!

• It's a layer between high-level source code and the executable

• It decouples the front end from the back end

• Supports runtime compilation (JIT)

• Has lots of optimizers

Page 7: LLVM Overview

How does it work?

Page 8: LLVM Overview

How does it work?

In a nutshell:

• Generate LLVM IR from your compiler

• Run optimizers

• Create object files, assembly, or machine code in memory
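
To make that flow concrete, here is a minimal sketch of emitting IR through LLVM's C++ API. It assumes a reasonably recent LLVM; the function name mul2 is made up for the example.

// Minimal sketch: build "int mul2(int x) { return x * 2; }" as LLVM IR and print it.
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Verifier.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

int main() {
  LLVMContext Ctx;
  Module M("demo", Ctx);
  IRBuilder<> B(Ctx);

  // Declare i32 @mul2(i32) and give it an entry block.
  FunctionType *FT = FunctionType::get(B.getInt32Ty(), {B.getInt32Ty()}, false);
  Function *F = Function::Create(FT, Function::ExternalLinkage, "mul2", &M);
  B.SetInsertPoint(BasicBlock::Create(Ctx, "entry", F));

  // return x * 2;
  Value *X = &*F->arg_begin();
  B.CreateRet(B.CreateMul(X, B.getInt32(2), "r"));

  verifyFunction(*F);        // sanity-check the generated function
  M.print(outs(), nullptr);  // dump human-readable .ll to stdout
  return 0;
}

From here the module can be handed to the optimizers and a target back end; a typical build line is something like clang++ demo.cpp $(llvm-config --cxxflags --ldflags --libs core support), though the exact flags depend on the installation.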

Page 9: LLVM Overview

How does it work?

• Tokenise the source code

• Parse the token stream

• Build the AST

• Generate LLVM IR from the AST

• Optimize the IR

• Assemble

Page 10: LLVM Overview

LLVM IR

• An instruction set for a register machine with an unlimited number of SSA registers

• Representations:

• Human-readable LLVM assembly (.ll)

• Dense 'bitcode' binaries (.bc)

• C++ classes
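
All three representations hold the same module, so converting between them is mechanical: llvm-as and llvm-dis do it on the command line, and the sketch below does the same through the C++ classes, parsing a .ll file and writing it back out as bitcode. It assumes a reasonably recent LLVM; the file names are placeholders.

// Minimal sketch: read human-readable IR (input.ll) and emit bitcode (output.bc).
#include "llvm/Bitcode/BitcodeWriter.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

int main() {
  LLVMContext Ctx;
  SMDiagnostic Err;

  // .ll text -> in-memory C++ representation (Module).
  std::unique_ptr<Module> M = parseIRFile("input.ll", Err, Ctx);
  if (!M) {
    Err.print("ll2bc", errs());
    return 1;
  }

  // In-memory Module -> dense bitcode on disk.
  std::error_code EC;
  raw_fd_ostream Out("output.bc", EC, sys::fs::OF_None);
  if (EC)
    return 1;
  WriteBitcodeToFile(*M, Out);
  return 0;
}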

Page 11: LLVM Overview

Static Single Assignment

Let's consider the following code:

int main(int argc, const char* argv[]) {
    int i = 1;
    i = i * 2;
    return 0;
}

Which yields us...

Page 12: LLVM Overview

Static Single Assignment

Note: LLVM registers are numbered in order, so %1 is the first allocated register, %2 the second, and so on. Clang emits this unoptimized form with every variable kept on the stack; a later pass, mem2reg, promotes such allocas to real SSA registers.

define i32 @main(i32 %argc, i8** %argv) #0 {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  %3 = alloca i8**, align 8
  %i = alloca i32, align 4              ; declare i
  store i32 0, i32* %1
  store i32 %argc, i32* %2, align 4
  store i8** %argv, i8*** %3, align 8   ; startup till here
  store i32 1, i32* %i, align 4         ; store 1 in i
  %4 = load i32* %i, align 4            ; load i into %4
  %5 = mul nsw i32 %4, 2                ; multiply %4 by 2 and store in %5
  store i32 %5, i32* %i, align 4        ; store %5 in i
  ret i32 0
}

Page 13: LLVM Overview

But wait...

d0: y := 1
d1: y := 2
d2: x := y

d0 is redundant, right? It has no effect on the final value of x. We can see that at a glance, but compilers aren't that smart: they have to run Reaching Definitions analysis to figure it out. Let's convert the code to SSA form:

d0: y1 := 1
d1: y2 := 2
d2: x1 := y2

Mmm... much better.

Page 14: LLVM Overview

Benefits of SSA

• Removes the need to compute use-define chains via reaching-definitions analysis: in SSA each use has exactly one reaching definition

• If a variable has N uses and M definitions, representing its use-def chains takes space and time proportional to N·M (e.g. 4 uses and 3 definitions give 12 use-def pairs), while the size of the SSA form is linear in the size of the original program

• Simplifies, or even eliminates, other optimization algorithms and the data structures behind them

Page 15: LLVM Overview

And what about those φ nodes?

Well, these are needed when a variable can be assigned a different value depending on the control flow. Sample code first, then its SSA form with a Phi node below it.

y = 1
if (condition)
    y = 2
x = y

becomes, in SSA form:

y1 = 1
if (condition)
    y2 = 2
x1 = Φ(y1, y2)

The Phi node selects y1 or y2, depending on which block control flow arrived from. The argument y1 is associated with the block that defines y1, and the same goes for y2.
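
For completeness, here is a hedged sketch of how that Phi node would be built through LLVM's C++ API; the function name select_y and the block names are made up for the example, and a reasonably recent LLVM is assumed.

// Minimal sketch: the y/x example above, constructed with IRBuilder and a Phi node.
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Verifier.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

int main() {
  LLVMContext Ctx;
  Module M("phi_demo", Ctx);
  IRBuilder<> B(Ctx);

  // i32 @select_y(i1 %condition): y = 1; if (condition) y = 2; x = y; return x;
  FunctionType *FT = FunctionType::get(B.getInt32Ty(), {B.getInt1Ty()}, false);
  Function *F = Function::Create(FT, Function::ExternalLinkage, "select_y", &M);

  BasicBlock *Entry = BasicBlock::Create(Ctx, "entry", F);
  BasicBlock *Then  = BasicBlock::Create(Ctx, "then", F);
  BasicBlock *Merge = BasicBlock::Create(Ctx, "merge", F);

  // entry: y1 = 1, then branch on %condition
  B.SetInsertPoint(Entry);
  Value *Y1 = B.getInt32(1);
  B.CreateCondBr(&*F->arg_begin(), Then, Merge);

  // then: y2 = 2, fall through to merge
  B.SetInsertPoint(Then);
  Value *Y2 = B.getInt32(2);
  B.CreateBr(Merge);

  // merge: x1 = phi(y1 from entry, y2 from then)
  B.SetInsertPoint(Merge);
  PHINode *X1 = B.CreatePHI(B.getInt32Ty(), 2, "x1");
  X1->addIncoming(Y1, Entry);
  X1->addIncoming(Y2, Then);
  B.CreateRet(X1);

  verifyFunction(*F);
  M.print(outs(), nullptr);
  return 0;
}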

Page 16: LLVM Overview

A better example of φ nodes

void func(bool first, bool second) {
    bool third = first || second;
}

Which yields us...

Page 17: LLVM Overview

define void @func(i1 zeroext %first, i1 zeroext %second) #0 {
  %1 = alloca i8, align 1
  %2 = alloca i8, align 1
  %third = alloca i8, align 1
  %3 = zext i1 %first to i8
  store i8 %3, i8* %1, align 1
  %4 = zext i1 %second to i8
  store i8 %4, i8* %2, align 1
  %5 = load i8* %1, align 1              ; decide what first || second is
  %6 = trunc i8 %5 to i1                 ; and store it in %6
  br i1 %6, label %10, label %7          ; the actually interesting part (this block is labeled %0)

; <label>:7                              ; this block is labeled %7
  %8 = load i8* %2, align 1
  %9 = trunc i8 %8 to i1
  br label %10

; <label>:10                             ; preds = %7, %0
  %11 = phi i1 [ true, %0 ], [ %9, %7 ]  ; yield true if we came from %0, otherwise yield %9
  %12 = zext i1 %11 to i8
  store i8 %12, i8* %third, align 1
  ret void
}

Page 18: LLVM Overview

And what about those align keywords?

• The CPU accesses memory one word at a time

• If a value's highest and lowest bits do not fall within the same memory word, the CPU has to split the access into two reads! :(

• Not very good when you are optimizing machine code

• Solution: pad the values so that the data can always be read in as few accesses as possible (see the sketch below)

• ARM, for instance, does not support unaligned memory access
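
A quick way to see this padding in action is to let the compiler lay out a struct. The sketch below is illustrative C++ only; the struct is made up and the exact numbers depend on the target ABI.

// Illustrative only: how the compiler pads a struct so each field is aligned.
#include <cstdint>
#include <iostream>

struct Example {
  std::uint8_t  a;  // 1 byte
  // 3 bytes of padding are typically inserted here...
  std::uint32_t b;  // ...so this 4-byte field starts on a 4-byte boundary
};

int main() {
  std::cout << sizeof(Example)  << '\n';  // typically 8, not 5
  std::cout << alignof(Example) << '\n';  // typically 4
  return 0;
}

The align annotations in the IR above record exactly this: the alignment that allocas, loads and stores are guaranteed to have, so the back end can emit the cheapest possible memory accesses.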

Page 20: LLVM Overview

Thanks!