The Architecture of Open Source Applications (Volume 1) LLVM5
transformations, etc. The most important aspect of it, though, is that it is itself defined as a first-class
language with well-defined semantics. To make this concrete, here is a simple example of a
.ll file:
recurse:
  %tmp2 = sub i32 %a, 1
  %tmp3 = add i32 %b, 1
  %tmp4 = call i32 @add2(i32 %tmp2, i32 %tmp3)
  ret i32 %tmp4

done:
  ret i32 %b
}
This LLVM IR corresponds to this C code, which provides two different ways to add integers:
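The sketch below is one way to write two such functions in C so that they line up with the IR fragment shown. The add2 function mirrors the recurse and done blocks; the name add1, the unsigned types, and the a == 0 test are assumptions, since that part of the listing is not visible here.

```c
/* Assumed name and types: the straightforward way to add two integers. */
unsigned add1(unsigned a, unsigned b) {
    return a + b;
}

/* Recursive variant matching the visible IR: the "done" block returns b,
   and the "recurse" block calls add2(a - 1, b + 1).
   The a == 0 test is an assumption about the branch that selects them. */
unsigned add2(unsigned a, unsigned b) {
    if (a == 0)
        return b;               /* done:    ret i32 %b */
    return add2(a - 1, b + 1);  /* recurse: sub, add, call, ret */
}
```

Both functions return the same sum; add2 simply moves one unit from a to b on each call until a reaches zero.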
As you can see from this example, LLVM IR is a low-level RISC-like virtual instruction set. Like a
real RISC instruction set, it supports linear sequences of simple instructions like add, subtract,
compare, and branch. These instructions are in three-address form, which means that they take
some number of inputs and produce a result in a different register.5 LLVM IR supports labels and
generally looks like a weird form of assembly language.
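To get a feel for three-address form, it can help to write C in the same style, with one simple operation per statement and a fresh variable for every result. The function below is a hypothetical illustration (mul_add and its temporaries are not from the text); the comments show roughly what the matching IR instruction could look like.

```c
/* Hypothetical example: computes a * b + c one operation at a time.
   Each statement reads some inputs and writes a new, distinct result,
   which is the shape a three-address instruction has. */
unsigned mul_add(unsigned a, unsigned b, unsigned c) {
    unsigned tmp1 = a * b;     /* roughly: %tmp1 = mul i32 %a, %b    */
    unsigned tmp2 = tmp1 + c;  /* roughly: %tmp2 = add i32 %tmp1, %c */
    return tmp2;               /* roughly: ret i32 %tmp2             */
}
```

No input is overwritten along the way: a, b, c, and tmp1 all keep their values after they are used.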
Unlike most RISC instruction sets, LLVM IR is strongly typed with a simple type system (e.g., i32 is a
32-bit integer, i32** is a pointer to pointer to 32-bit integer) and some details of the machine are
abstracted away. For example, the calling convention is abstracted through call and ret
instructions and explicit arguments. Another significant difference from machine code is that
LLVM IR doesn't use a fixed set of named registers; it uses an infinite set of temporaries named